We are developing a range of tools and associated training materials for Wikipedia scholars.

Wikipedia is born-digital. Data streams out of it by the terabyte. To make sense of this surging datastream, scholars require adequate ‘digital infrastructure’ in the narrow sense of ‘infrastructure that is digital’. They need client libraries to access Wikipedia’s APIs, high-performance computing resources that can handle Wikipedia’s large database dumps, and digital publishing platforms where analyses can be shared.

But this is only part of the story. Like other social networks, Wikipedia reflects the practices, ideals, meanings, hopes and skullduggeries of its human protagonists. Perhaps more than any other social network, Wikipedia is dominated by a highly skilled subset of power users who have mastered Wikipedia’s systems and can shape the stream of data to reflect their own wishes. To understand this aspect of Wikipedia, scholars require another kind of ‘digital infrastructure’, namely ‘infrastructure to support the understanding of the digital’. This kind of infrastructure is largely intangible: it is the know-how, the concepts, the personal connections and the cunning required to enter the datastream and observe it as a stream of culture. Only with both kinds of infrastructure can Wikipedia scholars, particularly those at the start of their careers, develop into hybrid data-scientist-ethnographers capable of engaging with Wikipedia on both fronts. At wikihistories, we are building the mixed infrastructure required to support such hybrid scholars, and to bridge the divide between computational social science and digital ethnography more broadly.


Among these tools is a new R package for Wikipedia Studies, developed by Michael Falk. The package will give scholars a single consistent interface to Wikipedia’s three APIs and to its XML data dumps. It will also include analysis tools for common research tasks, such as ‘blaming’ edits on particular users or navigating the category tree.
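
To give a sense of what ‘blaming’ involves, here is a minimal sketch in Python. It is not the R package’s interface, which is not described here. The function name `blame`, the whitespace tokenisation and the three sample revisions are all assumptions made for illustration. The idea is to walk through a page’s revisions in chronological order, diff each one against its predecessor, and credit every word in the latest version to the user whose revision first introduced it.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Attribute each word of the latest revision to a user.

    `revisions` is a chronological list of (user, text) pairs.
    This is a hypothetical helper for illustration, not a real API.
    """
    prev_tokens, prev_authors = [], []
    for user, text in revisions:
        tokens = text.split()  # assumption: words are whitespace-separated
        # Start by crediting every word to the current editor...
        authors = [user] * len(tokens)
        # ...then restore the earlier author for words that survived unchanged.
        matcher = SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                authors[block.b + offset] = prev_authors[block.a + offset]
        prev_tokens, prev_authors = tokens, authors
    return list(zip(prev_tokens, prev_authors))


# Invented sample history, purely for demonstration.
history = [
    ("Alice", "Wikipedia is a free encyclopedia"),
    ("Bob", "Wikipedia is a free online encyclopedia"),
    ("Carol", "Wikipedia is a free online encyclopedia anyone can edit"),
]

for word, user in blame(history):
    print(f"{user:>5}  {word}")
```

A word-level diff of this kind is only a rough approximation. It does not recognise text that is moved, or deleted and later restored, so a production tool would need a more careful attribution strategy.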