Watch Laurie’s talk on YouTube.
Wikipedia’s global audience has created a dense web of narratives that vary across languages. Far from being a problem to be corrected through standardization, this variation in content across language editions encodes distinct and valuable consensus perspectives and insights into collective memory. Yet the scope, diversity, and evolution of memory curation on Wikipedia remain methodologically and theoretically under-examined. For example, the Arab Spring was a global event, but the overlap in the concepts linked from the English and Arabic articles was very low, indicating divergent representations and recollections of these events. The lack of overlapping links, and even the absence of corresponding articles documenting the Arab Spring across languages, reflect differences in how these two communities reference concepts popular within their own language, cite sources written in their own language, and contextualize events within their own past. Using natural language processing and social network analysis methods, we provide descriptive quantitative evidence about the emergence and stability of these cross-language differences in content, authorship, and consumption. Highlighting these linguistic silos, and the perspectives that are intentionally or unintentionally forgotten, misrepresented, or unknown, has larger social and technical implications. Open and linked data sources like Wikipedia play an important role as training data for many emerging technologies, such as machine translation, conversational agents, and large language models, that run the risk of amplifying cultural and linguistic discrepancies and biases far beyond the audience of Wikipedia readers. Striking a responsible balance between preserving and aligning these diverse representations of events and people should be a priority of human-centered peer production.
Laurie Jones is a second-year PhD student in the Information Science department at the University of Colorado Boulder. She is advised by Professor Brian Keegan in Information Science and Professor Alexandra Siegel in Political Science. She has a background in socio-physics research, using mathematical models of complex phenomena to analyze human interaction, decision-making, and movement. She is a computational social scientist focusing on discrepancies in collective memory and narrative evolution between English- and Arabic-speaking communities, building on her 5 years of Arabic language experience.