On 19th March I was invited to a seminar hosted by the CILIP Special Interest Group, Multimedia Information & Technology (MmIT) at the University of Huddersfield to talk about some of our work around University Collections and Wikimedia.

Full programme and links to slides:

What is crowdsourcing?

In her opening talk of the afternoon, Caroline Pringle, Senior Lecturer in Journalism and Media, invoked Wikipedia as the most obvious example of crowdsourcing in the internet age before taking us back to 1907, when Francis Galton observed that, when guessing the weight of an ox at a county fair, the average of a crowd’s guesses was more accurate than the estimates of individual cattle experts.
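As a rough illustration of the averaging effect Galton described, here is a minimal Python sketch. The weight and the guesses are invented for the example (they are not Galton’s data) and were chosen so that errors on either side of the true value largely cancel out; a real crowd will not always behave so neatly.

```python
from statistics import mean

# Hypothetical figures, invented purely for illustration (not Galton's data).
TRUE_WEIGHT = 1200  # weight of the ox in pounds
GUESSES = [1050, 1120, 1180, 1230, 1290, 1340]  # individual guesses in pounds


def crowd_error(guesses, truth):
    """Absolute error of the crowd's average guess."""
    return abs(mean(guesses) - truth)


def individual_errors(guesses, truth):
    """Absolute error of each individual guess."""
    return [abs(guess - truth) for guess in guesses]


if __name__ == "__main__":
    print("Crowd average error:", round(crowd_error(GUESSES, TRUE_WEIGHT), 2))
    print("Individual errors:", individual_errors(GUESSES, TRUE_WEIGHT))
```

With these made-up numbers the average lands closer to the true weight than any single guess, because the over-estimates and under-estimates offset one another.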


Wikipedia, of course, is far from infallible and individual contributors will often be incorrect, whether guessing the weight of an ox or contributing to Francis Galton’s Wikipedia page, which is where citation comes in, ideally to authoritative and open sources. More on that in a minute…

Note: Francis Galton, half-cousin to that most celebrated scientist Charles Darwin, is today mostly associated with the unsavoury "science" of eugenics; the Heredity and eugenics section on Wikipedia currently needs additional citations for verification.


So crowdsourcing isn’t a new idea, and it isn’t just Wikipedia and guessing games, but “a distributed problem-solving and production model”. The term as we know it today, however, does connote utilising digital technologies to access an untapped workforce, and other examples of the phenomenon include crowdfunding (e.g. Kickstarter), participatory journalism and citizen science.

Caroline presented examples from marketing and design, film making, music and community journalism, and introduced the concept of “cognitive surplus”, the free time that people are prepared to devote to collaborative online activity.

In this brave new world, users are not only consumers but also producers motivated to contribute to a specific task.

Engaging the Crowd

The British Library certainly has the profile to engage the crowd to contribute to their unique collections. As an introductory exercise, our handout from Mia Ridge (available as a Google doc) highlighted three current crowdsourcing projects at the BL as well as a wide range of examples elsewhere that might encourage people to volunteer their time:


The declared mission of the British Library is “to make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment”. Crowdsourcing ticks all those boxes and, as Mia emphasised, should be inherently rewarding and make participants feel they’ve made a difference. It should offer a modest intellectual challenge without being too onerous.

One of the first British Library projects was The British Sound Maps (2010/2011) and at least one of our small crowd recalled contributing to it by recording the demolition of the Fire Station in Sheffield.

The more recent projects linked above are Georeferencer, to identify accurate locations for thousands of historic maps in the BL collection; Convert-a-Card, to help convert a digitised, handwritten catalogue card into a digital entry; and In the Spotlight, asking volunteers to transcribe names and performances on the playbills of Britain’s old theatres. This last project in particular has generated lots of interest and engagement on the project forum.

University collections and Wikimedia

Now it was my turn to talk about our nascent initiatives around Wikimedia here at the University of Leeds. I mainly covered this blogpost, which I won’t repeat here, other than to emphasise an exciting discussion with the CORE team about potentially leveraging their database of over 10 million open access articles to automatically add OA links to Wikipedia citations, as well as exploring their recommender service to suggest potential citations for Wikipedia articles.

OAbot already utilises the Unpaywall API to crowdsource valid OA links for Wikipedia citations generated at random (log in here with a Wikipedia account). However, the CORE dataset comprises actual documents that have been identified, downloaded and hosted on the CORE servers. This is in contrast to Unpaywall’s unverified URIs, which is why OAbot needs a human actor to validate accuracy. Potentially, therefore, CORE could enable greater automation, for example to provide OA links for all 1000 references associated with the University of Leeds cited across Wikipedia. (N.B. Wikipedia governance might mean potential barriers to an automated model.)
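To give a flavour of the kind of lookup involved, here is a minimal Python sketch against the public Unpaywall v2 endpoint. It is not how OAbot itself is implemented; the helper names, the DOI and the email address are hypothetical, and the returned link would still need a human check as described above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the lookup URL for one DOI (Unpaywall asks for a contact email)."""
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?email={quote(email)}"


def best_oa_link(record):
    """Return the best open access URL from a response, or None if absent."""
    location = record.get("best_oa_location") or {}
    return location.get("url")


def fetch_record(doi, email):
    """Fetch and decode the JSON record for a DOI (makes a network call)."""
    with urlopen(unpaywall_url(doi, email)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical DOI and address, shown only to illustrate the URL shape.
    print(unpaywall_url("10.1234/example", "someone@example.org"))
```

A candidate link could then be obtained with best_oa_link(fetch_record(doi, email)) and presented to an editor for validation before it is added to a citation.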

The updated citation:


[Citation needed]

CORE incorporates sophisticated text and data mining (TDM) technologies which underpin the CORE recommender, a plugin that can be installed in repositories and journal systems to suggest similar articles. The CORE team are working with Wikipedia to leverage the same technology to generate potential citations for Wikipedia articles, which I’m told should be fairly straightforward with a RESTful call to the API.

(N.B. See also https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements#Citation_Need_Modeling )

Wikimedia Commons

Another development since the original blogpost back in October is the facility to track page view numbers for pages on Wikipedia (and other Wikimedia sites) containing Commons files in a specific category. This reveals that the two Wikipedia pages currently containing Media from the Research Data Leeds Repository were viewed a total of 11,238 times in March.
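The category-level figure above came from a dedicated tool, but view counts for an individual page can also be requested from the Wikimedia Pageviews REST API. The Python sketch below only builds the request URL and totals a response; the helper names, the dates and the sample payload are hypothetical, and a real request should send a descriptive User-Agent header.

```python
from urllib.parse import quote

PAGEVIEWS_BASE = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
)


def pageviews_url(project, article, start, end):
    """Build a monthly per-article pageviews URL (dates as YYYYMMDD)."""
    title = quote(article.replace(" ", "_"), safe="")
    return (
        f"{PAGEVIEWS_BASE}/{project}/all-access/user/"
        f"{title}/monthly/{start}/{end}"
    )


def total_views(payload):
    """Sum the 'views' field across all items in a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


if __name__ == "__main__":
    # Hypothetical article and date range, for illustration only.
    print(pageviews_url("en.wikipedia.org", "Francis Galton",
                        "20190301", "20190331"))
    # Invented payload in the shape the API returns.
    print(total_views({"items": [{"views": 10}, {"views": 5}]}))
```

Summing total_views over each page that uses a file from the category would give a figure comparable to the one reported by the tracking tool.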


Running an Editathon

The last talk of the day was from Laura Woods and Lindsay on the practicalities and benefits of hosting a Wikipedia editathon.


I was particularly interested in this talk and would very much like to facilitate an editathon at Leeds, in an area where the institution has well established academic expertise such as climate science, for example, or water research.

Laura and Lindsay also wanted to work with academics but decided to run their first event in the Library archives focusing on “The History of the University of Huddersfield”.

A subsequent event in 2018 was organised around the #1lib1ref initiative, which encourages librarians to add (at least) one reference to Wikipedia, with participants at the Huddersfield event in March contributing on everything from Albania to Widnes and Boudica to Nabokov.

Some of the lessons learned (apart from the obvious importance of caffeine and biscuits) were that themes are useful and that experienced Wikipedians should be on hand to help the inexperienced.

There is, of course, a Wikipedia page with guidance on how to run an edit-a-thon – https://en.wikipedia.org/wiki/Wikipedia:How_to_run_an_edit-a-thon.

Laura and Lindsay plan to run more editathons through Staff Development and the Research Office as well as exploring how they might embed into specific schools and departments.

I for one will be watching with interest. Please do let us know if you would be interested in contributing your own expertise to the global commons. Perhaps you already are?