
Half the Battle

When AI comes for Wikipedia, knowledge itself is at stake

Late into his New York City mayoral campaign, Andrew Cuomo posted an attack ad stunning for its ability to break numerous election and copyright laws. A talking legislative bill, lipsticked and pregnant, converses with the likeness of Zohran Mamdani about his qualifications for the mayoral office until an anthropomorphized smartphone with a ChatGPT logo for a face tells us—in the smooth, automated voice of epistemic authority—that Mamdani would be the “least unqualified” mayor of the city. Cartoon Zohran then confirms the factoid, telling us to “try it with your own Chat GPT at home, kids!”

Like most of today’s internet content, the ad traffics in references it expects us to autocomplete, ham-fistedly promising us the answers, never mind the question. It’s not immediately clear who, or what, generated the ad. Nor is it obvious that OpenAI, valued at half a trillion dollars on the day of the ad’s release and currently mired in a series of class-action copyright lawsuits, consented to the ChatGPT logo’s use. Don’t get too excited: the same aggressive copyright litigation posture that makes it possible to bring AI companies to heel over stolen writing has seen semilegal file-sharing sites like LibGen or Sci-Hub sued into oblivion. I don’t think any of us will find much to be happy about online for a long, long time.

But we certainly used to. Even as the information superhighway was turning into a nightmarish parking lot of paywalls, security cameras, and slot machines, a few sites held onto the web’s original, idealistic promise. We all have our favorites. For as long as I can remember, I’ve relaxed in the evening by browsing Wikipedia. Occasionally stoned, always enthralled, I read about the dances of Wallis and Futuna, the history of fertilizer, Nestor Makhno and whether he was really an anarchist, Felipe Rose and whether he is really Native American, the early performance history of Chekhov’s The Seagull, all the very distinct plants known as “tea tree,” European monarchic precedence before 1815, the repetition compulsion, and Basic Instinct. These midnight snacks had always been my private delight, but it was only when I walked past WikiBär, a brick-and-mortar Wikipedia storefront, logo and all, near my apartment, that I saw how the sausage is made. Four nerdy guys typed quietly on a weeknight, during what I learned is a regular open editing session. After fifteen years of reading the site, I got home and finally clicked the “Talk” tab to figure out how editors talk to each other, and what they talk about.

Charmingly antiquated, unwieldy enough to form a distinct internal culture without alienating newcomers, Wikipedia’s self-referential backchannel reveals the website’s origins in 1990s computer-programmer idealism. In brief, internauts Larry Sanger and Jimmy Wales had the ingenious notion of combining an online encyclopedia with a wiki—that is, a collaborative website editable by any user, from any internet browser. The first wiki predates their project by several years: WikiWikiWeb (1995) compiled open-source software and used hyperlinks to connect pages. This architecture would prove instrumental to Wikipedia (see rabbit-holes and the piles of open tabs that attend each dig). Hyperlinks allowed for the emergence of what is now called folksonomy (a portmanteau, and special case, of folk taxonomies). By tagging text in articles and linking it to other articles, editors organize knowledge in a necessarily decentralized and non-hierarchical manner. Over time, the collective work of tagging, categorizing, list-making, subsuming, and disambiguating articles yields structural features like portals, disambiguation pages, and chains of links often so uncanny that editors and readers have devised numerous games about them. Some pages are further linked in tables called navigation templates, like the extensive template of the structure of the Communist Party of the Soviet Union (CPSU) or the impressive template of Colombian emeralds.
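The mechanics of folksonomy are simple enough to sketch. In the hypothetical miniature below (the page titles and links are invented for illustration, not drawn from Wikipedia’s actual link tables), each article records only its own outgoing hyperlinks; inverting them reveals shared categories that no committee ever declared:

```python
from collections import defaultdict

# Each article lists the pages it hyperlinks to -- purely local tagging
# decisions made by individual editors, with no central taxonomy.
articles = {
    "Tea tree": ["Melaleuca", "Leptospermum"],
    "Melaleuca": ["Myrtaceae", "Essential oil"],
    "Leptospermum": ["Myrtaceae"],
    "Essential oil": ["Melaleuca"],
}

# Invert the links: which articles point *to* each page?
# Heavily linked-to pages emerge as de facto categories.
backlinks = defaultdict(set)
for page, links in articles.items():
    for target in links:
        backlinks[target].add(page)

# "Myrtaceae" emerges as a shared category of two plant articles,
# though nobody designated it a category in advance.
print(sorted(backlinks["Myrtaceae"]))  # ['Leptospermum', 'Melaleuca']
```

The category structure is an output of the graph rather than an input to it, which is exactly what distinguishes a folksonomy from the committee-built tree diagrams discussed below.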

These organizational stratagems are the fruit of open collaboration in the Talk pages over Wikipedia’s twenty-five-year history. Their culture of dispute and deliberation, governed by fairly extensive guidelines, constitutes the widest-ranging experiment in organizing human knowledge of all time, not because of the flurry of interesting articles themselves but rather because of this consensus model of encyclopedia writing, which has been likened to Quaker deliberation. What is significant is how it has managed to organize collaboration on a mass scale by fostering a culture, gate-kept by fairly attainable shibboleths, that can accommodate editors indefinitely.

While encyclopedias have always existed in some format, the modern one was born of the eighteenth-century dictionary form, which organized entries alphabetically and cross-referenced them, as well as the development of collaborative authorship. A timely solution to the question of how to organize knowledge amassed by multiple authors came in the form of centralized taxonomies of knowledge: tree diagrams of categories and subcategories, deliberated by committee and published alongside modern encyclopedias. The first modern encyclopedia, Diderot and d’Alembert’s Encyclopédie, first published in 1751, included the “Figurative System of Human Knowledge” (Système figuré des connaissances humaines), which divided knowledge into three branches: memory-history, reason-philosophy, and imagination-poetry. From there, the tree branched into subcategories: “glassmaking,” for instance, we find under the uses of glass, which is under arts, crafts, and manufactures, which is under uses of nature, which is under the natural history subcategory. Hierarchical taxonomies became the spinal column of future encyclopedias, as in Encyclopedia Britannica’s Propædia and its Outline of Knowledge. They also force encyclopedia editors to make categorical choices, and the authors of the Système figuré chose, significantly for their time, to order theology as a subcategory of philosophy. Rather than a source of knowledge on par with human reason, it became one kind of reason among others.

All encyclopedias create an image of the world, not in what they contain, but in how they classify, categorize, and organize their content. The Encyclopédie’s image of the world was explicitly rationalist and tacitly atheist. This did not preclude the inclusion of a moderately conservative Catholic theologian among the Encyclopédie’s authors, but it did mean that Diderot permitted himself some clever cross-references. Most famous among them is in the entry on cannibalism, which concludes with the following cross-reference: “Voyez Eucharistie, Communion, Autel, etc.” (“See: Eucharist, Communion, Altar, etc.”). In the assessment of the 1911 edition of the Encyclopedia Britannica, Diderot’s Encyclopédie “sought not only to give information, but to guide opinion.”

The Britannica originated, in fact, out of just such a difference of opinion. Scotsmen Colin Macfarquhar and Andrew Bell envisioned it as a conservative version of the Encyclopédie, without the heretical cross-references or Diderot’s weird jokes. Theirs would be objective and neutral, and their gripes about the French encyclopédistes find a distant echo in complaints about Wikipedia and its editorial cadres over the past twenty years. Truly there are no clean breaks in the history of knowledge. Diderot’s Encyclopédie itself contained many articles translated from Ephraim Chambers’s 1728 Cyclopædia, and back in our timeline, Wikipedia got its start by reproducing and citing articles from the 1911 edition of the Encyclopedia Britannica, among other encyclopedias, dictionaries, and histories available in the public domain.

Despite their very different aims and forms, encyclopedias have conventionally followed rigorous citation and referencing guidelines. Wikipedia’s may be byzantine, governing not just the provenance of sources but also the various styles in which they can be included in articles, but they are what formally distinguishes it from all preceding encyclopedias. Referencing took on a new significance through Wikipedia’s commitment to open access for research and open knowledge more broadly: elaborations of the open source movement, which has extended the idea of access from computer code to human language and, by extension, all written knowledge. This political commitment finds its abstract form in Wikipedia’s Five Pillars: a shorthand for the online encyclopedia’s epistemology; its concrete expressions, on the other hand, we can see in Wikipedia’s editorial culture and the marvelous, intersecting lists, and lists of lists, that it has created. These include craters on the far side of the moon, names for the biblical nameless, and, my favorite, Errors in the Encyclopædia Britannica that have been corrected in Wikipedia. Rigorous recordkeeping, including of archived versions of articles and talk pages, made this editorial culture appealing to demographics far more varied than the white, male computer programmers who started the site, despite the ongoing shortcomings recorded in Wikimedia’s 2023 Community Insights Report. Wikipedia comes out of the happy marriage between a 1990s hacker culture that provided its lingo and its digital infrastructure and the detail-oriented persnicketiness of indexers, lexicographers, fact-checkers, history buffs, trivia-collectors, and other bookish oddballs.

A lucky stroke of historical coincidence likely came from the systematic defunding of most other intellectual forums in twenty-first-century America, from universities decimated by financial austerity measures to local newspapers bought out and shut down. Unemployed geeks and nerds of all stripes make for enthusiastic editors. Anybody who’s been out of work for a while knows how heartening it can feel to contribute to a collective project, even unpaid. Of course, the other side of this coin is the self-selecting personal temperaments and social subject-positions that tend to populate the editorial cadres. No better description of these issues can be found than in Wikipedia’s internal articles. More than a repository for dispossessed intellectuals and enthusiasts, however, Wikipedia’s intellectual culture of open collaboration allowed it to become the fifth most visited domain on the internet between 2012 and 2020, and the most-visited website run without any profit incentive over the past two decades. Among the ad-infested wetlands and drainage basins of social media, Wikipedia seems the last piece of solid ground in sight. To understand its social function in the broader history of the internet, however, we have to return to the beginning.

The theoretical apogee of the open-source software movement was The Cathedral and the Bazaar, Eric S. Raymond’s 1997 essay, later turned into a book. Raymond’s central thesis was that public testing of the source code would allow it to be debugged more efficiently (“given enough eyeballs, all bugs are shallow”), creating better code more quickly, using the internet to solicit volunteer user-programmer input. He distinguished between source code restricted to closed teams of developers and available to consumers with official software releases (cathedrals) and source code developed on the internet, in public view, and available to everyone to edit (bazaars). What was an open question in 1997 is now a closed case. Wherever we log on, we find ourselves inside one of several grubby cathedrals, all of them enshittified by overvalued tech firms scrambling to counteract the falling rate of profit. Wikipedia is one of a few bazaars left, and it might not be left standing for long.


Even Wikipedia’s cofounder has turned against it. In September 2025, Larry Sanger posted “9 Theses” describing its supposed liberal biases on his website. His chief grievance has to do with its labeling of Fox News and the New York Post as unreliable sources. Elon Musk, omniscient mastermind, took this opportunity to announce the launch of Grokipedia: an AI-powered encyclopedia that would eliminate the human bias of actual editors. Most journalism about Grokipedia has fixated on the question of neutrality as it relates to the simplistic schema of conservative and liberal bias. It goes without saying that such a framing reduces everything to highly mediated culture war positions on hot-button issues, of more interest to media corporations driving engagement and consumers addicted to getting upset than to anybody interested in thinking original thoughts.

Wikipedia’s own approach to epistemic neutrality is outlined in the fundamental principle of NPOV, or neutral point of view. This approach is, to say the least, a bit more sophisticated than what commercial media ventures have been able to offer. Consider the lengthy discussions about NPOV in the Talk page about the Gaza Genocide article, which includes a robust, publicly available, open discussion about news outlets, definitions, framings, and references. One would be hard-pressed to find a more nuanced discussion of epistemological neutrality anywhere on the internet. Unfortunately this did not prevent Jimmy Wales, the other cofounder, from chiming in with his personal opinions on the topic, the content of which is less surprising than his ill-advised decision to make a public statement in the first place. But let’s assume good faith, as all Wikipedians are encouraged to do. Wales and Sanger are not stupid. Wales’s comments, quoted in multiple news sources, seem to have been deleted from the Gaza Genocide Talk page. Sanger, for his part, reviewed the newly launched Grokipedia and correctly surmised that it’s “bullshittery.” These founders are certainly misguided, but it’s almost as if they are also ignorant of the history of encyclopedic projects. Like Britannica’s attempts to smooth out the supposed heresies of Diderot’s Encyclopédie, an apparent struggle over bias, neutrality, and truth turns out to be a political struggle about authority and its domains. The positions in the present struggle are the project of collective knowledge fostered by a culture of open collaboration on the one hand and the product sold by AI companies on the other.

Wikipedia has done its best to paper over the cracks that formed during its meteoric expansion: a 2006 New Yorker piece described “five robots [that] troll the site,” and, as of this writing, Wikipedia has 309 algorithmic bots, fixing tags, reverting vandalism, and finding archived webpages to replace outdated sources. From some two hundred thousand registered users in 2006, there are now about fifty-one million registered users on English Wikipedia alone; just under twelve million of them have contributed at least one edit. The collective efforts of Wikipedia users, the bots they made, and the epistemic categories and verification practices they devised together, over many years, are the reason why Wikipedia has become the superlative source of truth in the world today. Despite what your high school teacher may have told you a decade or two ago, you’d be hard-pressed to encounter a factual inaccuracy on the site. These collective labors are what built this monument to human inquisitiveness, and this anonymous, volunteer labor is what is currently being steamrolled by LLMs and repackaged as chatbot responses available by subscription.

All the text on the World Wide Web has become terrain for AI companies to strip-mine for language patterns, and Wikipedia is uniquely valuable because everything on it has been referenced, checked, and edited for NPOV, not to mention openly available. So much so, in fact, that the Wikimedia Foundation announced last April that AI bots are straining the bandwidth on their servers. Six months later, the foundation announced that its website traffic from human visitors has plummeted as more people get their info from generative AI chatbots and search engine summaries trained on Wikipedia’s articles. But even the form of these chatbots and e-summaries is indebted to the work of Wikipedia editors and the Wikimedia Foundation, which has played an ever-growing role in governing the encyclopedia, its intellectual culture, and those of the more than a dozen other wiki projects it oversees, like the Wikimedia Commons.

Take Knowledge Engine, a controversial 2015 project on the part of the foundation to build an alternative to conventional search engines that would reclaim lost web traffic, and tell me if it doesn’t sound familiar. Knowledge Engine was a response to Google’s 2012 rollout of its Knowledge Graph: infoboxes that accompanied some search results, reproducing and linking to Wikipedia without necessarily redirecting web traffic. Out of character for a foundation whose premise includes radical transparency, the Knowledge Engine project was kept under wraps and scrapped not long after it came to light. It was widely despised by editors and administrators for radically upending Wikipedia’s interface. Unable to let a bad idea go to waste, Wikimedia Deutschland, the foundation’s German chapter, announced the Wikidata Embedding Project earlier this year, around when Musk was announcing his own enlightened alternative to the site. Like most German technological ideas of the twenty-first century, this one, which promises to tag Wikipedia’s content to make it more legible to large language models, is unoriginal, superfluous, and retrograde all at once.

This is because all major AI companies have already tagged this data for themselves, repackaging knowledge collected and organized by living humans as personalized and allegedly artificial chatbot responses. We can see this as another instance of digital enclosure of the commons or as a kind of primitive accumulation of cyberspace: capturing the virtual terrains of open-source, open-access knowledge to resell it back to users. But what happens when the resource of primitive accumulation is not land or agricultural commodities but knowledge itself? We can answer this by taking a cue from Wikipedia’s toolbox and disambiguating “knowledge.”

Knowledge has always been indissociable from the media ecology in which we encounter it, increasingly so since the postwar emergence of mass print and broadcast communication. That the Encyclopedia Britannica was disseminated by door-to-door salesmen who pitched this hefty commodity’s ability to confer social status or erudition is by no means insignificant. How we come to learn something about the world teaches us as much as what we learn: the dynamic of writing, disseminating, reading, and perhaps participating is what creates intellectual cultures in the broadest sense. They make knowledge into something more than information by deliberating and validating its authority and explanatory power.

Knowledge tends toward social institutionalization, not only because thinking is something we do with others, even if they are separated from us by continents or centuries, but also because complex intellectual cultures require the social organization of hundreds and thousands of people. The open knowledge movement, with Wikipedia at its apogee, showed us the superior efficiency and scope of informal, decentralized, and semi-anonymous social institutions. How exciting, how uncanny, that amidst the historical decline of the past century’s knowledge institutions, collaborative thinking and collective self-organization gave us all a massive internet encyclopedia.

All this knowledge infrastructure impresses onto us an image of the world and our place in it. Consider Wikipedia’s Talk pages and the species that inhabit them, called WikiFauna. Among them we find WikiMastodons, WikiMimes, and WikiGnomes, the last only making “useful incremental edits without clamoring for attention.” A WikiCyclops has a narrow focus, while a WikiVampire has it out for newbies, and neither is as dangerous as a WikiZombie, who seeks only to sabotage consensus. What are all these festive terms if not social roles in an intellectual culture, which teach editors and readers how to think together in productive ways? The names might be straight out of Dungeons & Dragons, but these mentalities will be familiar to anybody who grew up online.

Wikipedia represents just one highly developed, surviving form of the many intellectual cultures that made up most of the old internet before social media’s rise in the 2010s. These included innumerable specialist forums, blogs in the wilderness, semilegal file-sharing, anonymous debates, and early memes. Learning about the world through this internet is probably why so many of us millennials find ourselves on the left.

We might consider the past decade of well-heeled social media campaigns of right-wing influence as a revanchist strategy to counteract decades of a relatively organic, open-access internet culture of shared knowledge making untold numbers of people vaguely more anarchist. It also made people more inquisitive, comfortable with anonymity, and keen on institutional transparency and personal privacy. The old internet may have been no golden age, but only at this late hour can we discern how it fostered intellectual cultures which, in turn, shaped our generation’s political consciousness, from biohacking weightlifters and LiveJournal queers to small-nation nationalists and DSA electoral socialists.

This is why the full-throated alignment of right-wing and neoliberal authoritarians with AI technology is totally unsurprising. They have good reason to harvest and repackage all of the above as the error-prone effluvia of corny chatbots, and they’ve almost finished the job. But the social dimensions of knowledge reveal the fundamental difference between encyclopedias and AI chatbots: namely, the complete vacuum of any corresponding intellectual culture in the latter. No living context, no shared endless task—and, in its place, endlessly personalized responses. They spit out factoids without distinguishing the right answer to someone’s question from the answer somebody’s asking for. One doubts whether the subtle intellects at the helms of these AI firms could parse the distinction.

Amid the compulsory enthusiasm and fatalistic inevitability that seems to be the hallmark of AI chatbot marketing, we should ask ourselves whether it’s really all so simple. The key question may not be whether a computer can be as “intelligent” as a doctoral student, or even as smart as a fifth grader, but rather: What image of the world are these tech firms trying to create? For a few years, we saw knowledge workers spontaneously organize themselves to create knowledge through collaboration and consensus. We are unlikely to see this again and certainly not online. Fortunately for us, there’s still a whole world out there. See for yourself.