Dear friends of the wikihistories project,
We’ve been silent for a while, but that’s because three major outputs from the wikihistories project have been released in the past few months. The jewel in our crown, our report exploring how Australians are represented in Wikipedia, launches today. Other outputs include the paper “Gender and the invisibility of care on Wikipedia” in Big Data & Society and the release of wikkitidy, our new R client library for conducting Wikipedia research, here. Read all about them in our latest newsletter below. We’re all now due for a well-earned rest and will be reporting back early next year! We wish all friends of the wikihistories project a safe and restful break over the end-of-year period and look forward to new collaborations in the new year.
wikihistories chief investigator
‘How Australians are represented in Wikipedia’: Our first Annual Report launches
The team at the wikihistories project is interested in how Wikipedia represents Australian people, places and events over time. One of the main ways we explore these topics is through our Annual Reports, and we are thrilled to announce the launch of our very first one! Our inaugural report, ‘How Australians are represented in Wikipedia’, focuses on Australian people. It’s a fascinating read with lots of insights into how Wikipedia categorises Australians, how biographies of underrepresented groups have evolved over time, and what some of the implications of these processes and practices might be. We believe this is the first comprehensive national quantitative analysis of representation on Wikipedia. Analysis at the national level is crucial for Wikipedia research because it lets us understand how trends in representation are shaped by local contexts such as colonial histories and gender politics. You can find it here or download a PDF for your very own here.
Nationality, biography, Australianness: these are topics beset with complexities and politics before we even get to Wikipedia. On Wikipedia, further complexities arise. Even getting to the point of being able to map Wikipedia’s representation over time requires understanding which biographies on Wikipedia refer to Australians. This involves further questions: who counts as Australian in Wikipedia and its sister-projects? How are Wikipedia articles or Wikidata items marked as ‘Australian’ by the system? How does Wikipedia define or represent ‘Australianness’?
In this report, we address these questions using a dataset of biographical articles culled from English Wikipedia, with the aim of revealing the definition of ‘Australianness’ implicit in Wikipedia’s systems and assessing how well Wikipedia represents the diversity of Australians. To do so, we used an innovative methodology, which is explored in depth in the report’s Appendix for all those data fiends out there.
The report fills important gaps in Wikipedia analysis. It is a rare quantitative national study of Wikipedia. It also drills down into the ways in which individuals are categorised on Wikipedia, which is crucial to any quantitative analysis of representation on the encyclopedia. As a result, we found that Wikipedia generally ‘assumes that an “Australian” is a white cis-gendered male: only if a person departs from this norm are their racial or sexual characteristics described’.
The data and report also argue convincingly for the importance of attentiveness to who is represented on Wikipedia and how: who does and does not have a Wikipedia biography matters. How Wikipedia represents people also matters. It matters in relation to the production of knowledge, for how we understand ourselves, others, and the notion of “Australianness”, and it matters in a nuts-and-bolts way: being well represented on Wikipedia makes a difference for one’s reputation and for the kinds of opportunities available to people in work and life.
And my favourite parts? The glorious complexities around how entities–like a Kylie Minogue song–become categorised as an Australian person, the phrase “categorical anarchy”, and the gem of a question ‘does Wikidata collect structured data, or does it structure collected data?’
wikihistories research assistant
Who deserves a Wikipedia page?
The thing about interdisciplinary work, particularly when it involves the critical study of data, is that it takes time: time to experiment with data, to reflect on data produced, to try out new analyses and check against our assumptions. The wikihistories team recently published our first journal article, “Gender and the invisibility of care on Wikipedia” in Big Data & Society after more than 3 years of work. Tamson and I had started this project with a small grant from the School of Communications at UTS in August, 2020. Led by the data visualisation expert, Kelly Tall and with help from Toby Hudson, Alex Lum and Pru Mitchell, we published a data essay and an open dataset at the end of 2021. We were so excited about the collaboration and new methods we were experimenting with in the project that we developed a funding proposal to expand the work and we began the wikihistories project in mid-2022, funded by the Australian Research Council.
When we first started the project, I was really curious to find out how Wikipedia’s designation of notability mirrors (or fails to mirror) other (external) notability signals after the controversy over Nobel Prize winner, Donna Strickland’s representation on Wikipedia. When Strickland won the Nobel Prize in physics in 2018, Wikipedia faced significant criticism when it was discovered that a Wikipedia editor had tried to create a page about her a few months prior to the award but it had been rejected because it didn’t meet the project’s notability requirements. It appeared to outsiders that it took winning the Nobel Prize for a woman to get a page on Wikipedia.
Notability has been the subject of a lot of excellent recent research on gender bias on Wikipedia (e.g. by Francesca Tripodi, Franziska Martini and Mackenzie Lemieux, Rebecca Zhang and Francesca Tripodi) because it is an excellent example of how Wikipedians must make decisions about representation that rely on their own judgement about a person’s worth. Assessing evidence of recognition by external notability sources is part of that evaluation, but a person’s recognition in one external source of notability is by no means a guarantee of acceptance on Wikipedia. This makes notability a really great place to understand how the seemingly neutral practice of deciding who should have a biography on Wikipedia is actually the result of (ever-present) practices of qualitative judgement. These include practices that can be gamed and can be subject to ideological politics or bias as much as they can be the result of fair and balanced discussion.
What we found was super interesting. Situating the study in a particular national context, we compared English Wikipedia’s biographies to awardees of the Order of Australia, a national awards system that recognises citizens for “excellence, achievement or meritorious service and contributions to [Australian] society”. Since Wikipedia began in 2001 (the Order actually stretches back to 1975), the majority of awardees’ pages were created between 2004 and 2008, but there has been a significant decline in page creation for Order of Australia recipients since 2014 (in our recent report we learned that this is part of a larger trend of declining biography creation overall). I expected that female awardees of the Australian Honours would be less likely to have a Wikipedia biography. Women, it turns out, actually receive a larger proportion of a shrinking base of the created biographies.
But when we turn to what women are recognised for on Wikipedia vs the Order of Australia, we see some problematic results. We found that women whose service is for labour relating to the caring professions are less likely to have a Wikipedia article written about them than if their service is for sports, arts and films, politics or the judiciary. Women who do not have a Wikipedia biography but do have an Order of Australia have more mentions in their citations of terms such as education (519 versus 203), medicine (314 versus 42), care (114 versus 29), nursing (104 versus 12), teacher (34 versus 10) and aged (29 versus 4). The citations of women who have an Order and a Wikipedia biography have more mentions of words such as sport (148 versus 4), arts (135 versus 64), gold (114 versus 1), industry (46 versus 13), parliament (51 versus 10), judiciary (35 versus 11) or film (24 versus 1).
What is interesting about this analysis, which took us over three years of mulling over the data, experimenting and exploring different questions and representations before we published it, is that you don’t really see this problem if you only look at who has a biography according to their gender. Nor do you see it if you only look at Wikipedia without examining the sources on which Wikipedia depends. What we were able to see is yet another example of how Wikipedia does not only mirror the world (by, for example, mirroring notability decisions made elsewhere) but also produces notability through the independent judgement of Wikipedia editors. What we hope is that Wikipedians recognise the ways in which their judgements matter, and that those in caring professions who have made significant contributions to the nation are also represented. People in caring professions, after all, deserve recognition just as much as, and perhaps more than, those who have already been rewarded through fame and/or fortune.
Check out the paper (it’s open access!): “Gender and the invisibility of care on Wikipedia” by Heather Ford, Tamson Pietsch and Kelly Tall in Big Data & Society
wikihistories chief investigator
Team sightings IRL
Michael Falk gave a talk entitled ‘The Need for Mixed Infrastructure: The Case of Wikipedia Studies’ at Creative Approaches to Open Social Scholarship: Australasia.
Michael presented the wikihistories project in light of the question of what kind of infrastructure digital research requires. He said that Wikipedia is born-digital, and that data streams out of it by the terabyte. To make sense of this surging datastream, scholars require adequate ‘digital infrastructure’ in the narrow sense of ‘infrastructure which is digital’. They need client libraries to access Wikipedia’s APIs, high-performance computing resources that can handle Wikipedia’s large database dumps, and digital publishing platforms where analysis can be shared. But this is only part of the story. Like other social networks, Wikipedia reflects the practices, ideals, meanings, hopes and skullduggeries of its human protagonists. Perhaps more than any other social network, Wikipedia is dominated by a highly skilled subset of power users who have mastered Wikipedia’s systems and can shape the stream of data to reflect their own wishes. To understand this aspect of Wikipedia, scholars require another kind of ‘digital infrastructure’, namely ‘infrastructure to support the understanding of the digital’. This kind of infrastructure is largely intangible: it is the know-how, the concepts, the personal connections, the cunning required to enter the datastream and observe it as a stream of culture. Only with both kinds of infrastructure can Wikipedia scholars, particularly those at the start of their careers, develop into hybrid data-scientist-ethnographers capable of engaging with Wikipedia on both fronts. At wikihistories, we are trying to build up the mixed infrastructure required to support such hybrid scholars, and to bridge the divide between computational social science and digital ethnography more broadly. See our new R client library for Wikipedia, called wikkitidy, here. We are also investigating a range of tools and training events to embed wikkitidy in a larger ethnographic research program.
wikihistories chief investigator
Any news for us to share about Wikipedia and its role(s) in history making? Contact us!