newsletter #5 – wikihistories

Editorial: The elusiveness of gaps

This week in the wikihistories newsletter, we look at Michael Mandiberg’s analysis of Wikipedia’s race and ethnicity gap. What Mandiberg begins as an attempt to determine how far Indigenous and historically nondominant ethnic groups are underrepresented in the content and creation of Wikipedia becomes an exploration of the methodological and epistemological intricacies of categorisation, especially given the invisibility of whiteness as an unquestioned default.

In other news, Heather Ford presented “What I learned from studying Wikipedia bias for 18 years” at Wikimania in Singapore. She talked about her experience working as a Wikipedia community advocate in the 2000s in South Africa, about what we know about Wikipedia’s bias, and why she thinks that continuing to quantify gaps at a global scale limits what research can do for the Wikimedia movement. The project team also caught up with the wikihistories international expert advisory board to discuss the project and what is on the agenda for 2024. Unsurprisingly, the entanglements of Wikipedia and generative AI are top of mind for many.

This is my first newsletter as part of the wikihistories team and I am thrilled to be on board. This first month has given me the briefest of glimpses of the iceberg that is Wikipedia. It has been an eye-opening experience, to say the least. I’ve also learnt that Wikipedia races and speedruns exist.

Francesca Sidoti
wikihistories research assistant

Is Wikipedia a colonial project?

In their article, ‘Wikipedia’s Race and Ethnicity Gap and the Unverifiability of Whiteness’, Michael Mandiberg clearly outlines the impossibility of calculating the race and ethnicity gap on Wikipedia because of three factors: “cultural norms, limitations of the data, and the unverifiability of whiteness” (37). In connection with the first of these factors, Mandiberg makes some fascinating observations about the geographies of Wikipedia and Wikidata. “Wikipedia is a global project”, Mandiberg writes, but “questions or principles that may make sense in one national context often produce disagreement on Wikipedia and Wikidata because of … varying cultural norms and contexts” (25). This is especially visible at the level of nomenclature: “one country’s terminology [in connection with race and ethnicity] may seem nonsensical or outright offensive in another national context” (25). Moreover, not only are terms specific to place, but they “are limited and reinscribe colonial power/knowledge relationships” (26). “These considerations”, concludes Mandiberg, “make it very challenging to design a global survey about race and ethnicity that successfully captures specific cultural meaning that allows for analysis at local and global scales” (27). With such comments, Mandiberg identifies the inherent problems of a project with global aspirations built on local forms of production. Not only is it impossible to create equivalences between different contexts, but the attempt itself reinscribes existing structural inequalities (including those that make whiteness unverifiable).

These astute observations led me to wonder about the global aspirations of the platform itself. Wikipedia was designed and built at the turn of the century and the tensions that Mandiberg describes in many ways reflect the moment of its creation. The early 2000s was a time that, in retrospect, stands out as a high point in utopian thinking about globalisation. Arguing that contemporary free trade had created a more level playing field for international commerce, Thomas Friedman’s 2005 book, The World Is Flat: A Brief History of the Twenty-first Century, might be seen to exemplify this thinking. Historians too were quick to take up the notion of globalisation and marked out a new field that would, in the words of one of the founding editors of the Journal of Global History, seek to “construct negotiable meta-narratives, based upon serious scholarship that will become cosmopolitan in outlook and meet the needs of our globalizing world.” Not everyone, however, was so taken by the promises of cosmopolitanism. As the historian of colonial Africa, Fred Cooper, wrote in 2001, “The world has long been — and still is — a space where economic and political relations are very uneven; it is filled with lumps, places where power coalesces surrounded by those where it does not, where social relations become dense amidst others that are diffuse. Structures and networks penetrate certain places and do certain things with great intensity, but their effects tail off elsewhere.” Cooper’s caution seems well founded for those of us interested in Wikipedia and its inequalities too. How does the world of Wikipedia produce relations that are uneven? Where are its lumps and channels? Where are networks concentrated and where do they tail off? Colonial projects, as Cooper elsewhere writes (with Ann Laura Stoler), “were fundamentally predicated on a tension between notions of incorporation and differentiation that were weighted differently at different times” (1997, 10).
How might it be useful, as we try to make sense of Wikipedia and its gaps and fissures, to see it as a kind of colonial project?

Tamson Pietsch
wikihistories chief investigator

Definition defiance and researching Wikipedia gaps

In our first newsletter, we considered Katy Weathington and Jed Brubaker’s ‘Queer Identities, Normative Databases’, which demonstrated Wikidata’s heteronormativity. This edition, we consider a new piece by Michael Mandiberg, which demonstrates the whiteness of Wikipedia and Wikidata’s racial and ethnic categories.

Mandiberg arrives at a strikingly similar conclusion to Weathington and Brubaker. In their paper, Weathington and Brubaker noted that heterosexuality is rarely marked in Wikidata. Generally speaking, only LGBT+ people have their sexuality recorded in Wikidata. Wikidata’s verification policy entrenches this outcome. Any statement about a person’s sexuality in Wikidata must be supported by a reliable source – and it is almost impossible to find a source that directly confirms a person is straight. A similar pattern afflicts Wikipedia’s racial and ethnic categorisations, observes Mandiberg. Whiteness is typically unmarked. Only members of racial or ethnic minorities are racially or ethnically classified. Wikipedia’s verification policy makes it especially difficult to categorise a person as white. In the English-speaking world, whiteness is the default category. It is essentially undefined, and it is difficult to find a reliable source that directly confirms a person is white (p. 23).

Mandiberg builds on this observation to critique the methods of Wikipedia research. Since so many articles lack a racial or ethnic classification, and those that are so classified predominantly describe members of minorities, it is difficult to get an accurate picture of the race and ethnicity gap: ‘It is statistically fallacious either to analyze only the biographies that have ethnicity data, or to treat the uncategorized biographies as white’ (p. 25). Whiteness is doubly unverifiable. It cannot be verified by Wikipedia’s editors, and it cannot be verified by Wikipedia researchers who are trying to observe the demographics of Wikipedia articles.

Mandiberg makes some powerful observations about whiteness on English Wikipedia, and has some interesting points to make about the differences between American and European editors on the topic. But their analysis founders when they try to critique the methods of Wikipedia research.

It is not ‘statistically fallacious’ to try and estimate the ethnic or racial profile of articles on the encyclopaedia. Mandiberg quite rightly points out that a population study is impossible given the encyclopaedia’s current state – but as they later admit, sampling methods are perfectly feasible (p. 38). If researchers cannot rely on the overall picture provided by Wikipedia categories or Wikidata properties, then they can instead draw random samples from the database and make their own estimates of the ‘race and ethnicity gap’. Whether researchers want to categorise people by race in order to compute such a metric is another question.
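
For readers who want to see what such a sampling approach might look like in practice, here is a minimal sketch in Python. It is only an illustration, not a method taken from Mandiberg’s article or from the wikihistories project. The names draw_sample and wilson_interval, the list of biography identifiers, and the hand-coded labels are all hypothetical. The sketch assumes that a researcher has already drawn up a list of biography identifiers and has decided how to code each sampled article by hand, and that coding step is precisely where the conceptual difficulties arise.

```python
import math
import random


def draw_sample(article_ids, n, seed=0):
    """Draw a simple random sample of n article IDs without replacement.

    `article_ids` is a hypothetical list of biography identifiers; the
    fixed seed only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(article_ids), n)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (roughly 95% when z = 1.96)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical population of 10,000 biography IDs.
    population = range(10_000)
    sample = draw_sample(population, 100)

    # Stand-in for the hand-coding step: a researcher would read each
    # sampled article and record whether it falls in the category of
    # interest. Here the labels are invented for illustration only.
    labels = [article_id % 2 == 0 for article_id in sample]

    low, high = wilson_interval(sum(labels), len(labels))
    print(f"estimated share: between {low:.2f} and {high:.2f}")
```

The interval only expresses sampling uncertainty. It says nothing about whether the categories used in the coding step are meaningful or appropriate, which is the harder question.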

Mandiberg is on stronger ground when they discuss the heterogeneity of racial and ethnic concepts, both within Wikipedia and in the wider world. They observe that the concept of ‘race’ is especially important in American discourse, and that in other countries such as India or Brazil, different ideas about how to divide and categorise people prevail. In the wikihistories project, we are grappling with the representation of Australia’s colonial heritage in Wikipedia. The categorisation of Indigenous Australians on the platform raises a host of issues that echo, but are quite different to, the issues Mandiberg raises in their paper.

Ultimately, Mandiberg’s most powerful insight is that the very concepts of ‘race’ or ‘ethnicity’ defy definition. Since the meaning and application of these concepts vary so much from place to place and time to time, it probably makes little sense to even try and compute a ‘race and ethnicity gap’ for Wikipedia as a whole. Instead, researchers are well-advised to conduct more focussed studies, which examine how particular racial or ethnic regimes are encoded by different communities on the platform.

Michael Falk
wikihistories chief investigator

Contact us

Any news for us to share about Wikipedia and its role(s) in history making? Contact us!