Open in order to…contribute to the global digital commons: University collections and Wikimedia

In today’s post for International Open Access Week we explore the value of Wikipedia and Wikimedia Commons for exposing University collections to a wider audience, whether research articles, data or Special Collections.

Thanks to Richard Nevell at Wikimedia UK for his input to this post.

Beyond compliance

So much of the discussion around open access is focussed on compliance that it’s easy to lose sight of the more noble ambitions of OA, to democratise knowledge by ensuring that primary research is freely available to all. To inform the global public no less!

Like an individual vote in a referendum, uploading a green version of your article or dataset to an institutional repository might not seem to make much difference, yet its scientific rigour, academic objectivity or sociological insight incontrovertibly contributes to the digital global commons.

Knowledge networks

Making your work available online is only the first step in helping the right audience to find it, an “audience” that might no longer even be (only) human, and it is increasingly important to ensure that both you and your work are networked online, using persistent digital identifiers for example.

According to data from Crossref, Wikipedia was the sixth-largest referrer of DOI clicks in 2015/2016 and, as of 18th October 2017, 1004 DOIs associated with the University of Leeds (978 articles / 26 books / 4 chapters) are cited across 1197 individual Wikipedia pages (thanks to Terry Bucknell of Digital Science for kindly supplying this data).

In a previous post we discussed adding repository links to cited sources in Wikipedia, in addition to the DOI. (N.B. I have since been informed that I should use the archiveurl | archivedate protocol, though there seems to be some disagreement about how best to add repository links, as discussed on WikiProject Open Access.)
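As a rough illustration of the archiveurl | archivedate approach, the sketch below assembles a citation template as a string. It assumes Wikipedia's {{cite journal}} template and the hyphenated parameter names archive-url and archive-date; the helper name cite_journal is made up for this example, and every value is a placeholder rather than a real article or repository record.

```python
def cite_journal(fields):
    """Render a dict of template parameters as a one-line {{cite journal}} call.

    Hypothetical helper for illustration only; empty values are skipped.
    """
    parts = [f"{name}={value}" for name, value in fields.items() if value]
    return "{{cite journal |" + " |".join(parts) + "}}"


# Placeholder values throughout: not a real article, DOI or repository record.
citation = cite_journal({
    "last": "Author",
    "first": "A.",
    "title": "Example article title",
    "journal": "Example Journal",
    "year": "2016",
    "doi": "10.1234/example",                      # link to the version of record
    "url": "https://publisher.example/article",
    "archive-url": "https://repository.example/id/eprint/0000/",  # OA copy
    "archive-date": "2017-10-18",
})
print(citation)
```

The DOI stays in place, so the version of record remains linked, and the repository copy is added alongside it rather than replacing it. Whether the archive parameters are the right home for a repository link is exactly the point of disagreement mentioned above.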

In any case, the University of Leeds has very nearly 1000 publications cited across 1157 pages* yet a search reveals a mere 96 links to records in WRRO – interestingly there are also 155 links to PhD theses in WREO. As both repositories are shared services across the Universities of Leeds, Sheffield and York, the numbers specifically for Leeds outputs will be lower still.

* I’ve discounted books as these aren’t necessarily directly associated with Leeds, i.e. the citation can be to a chapter within a book to which a Leeds author has contributed a different chapter.

Despite these relatively low numbers, referrals from Wikipedia are significant for both WRRO and WREO, being the 17th and 19th top referrers respectively.


Given the immense potential value of OA links to Wikipedia and the inevitable frustration of paywalled DOIs it seems clear that we should encourage contributors to include legitimate links to OA versions where possible (without replacing the DOI or other link to the version of record). However, given the scale of the issue might there be an opportunity to leverage the Jisc supported CORE aggregation service, for example, which will potentially provide OA links to documents from the global repository network?

Wikimedia Commons

Wikimedia Commons is a repository of openly licensed media files – images, video and audio – for use in education and, like Wikipedia itself, anyone can upload or edit material. It also makes it very easy to embed media files across Wikimedia projects.

Haua Fteah cave

According to Wikipedia, Haua Fteah is a large karstic cave located in the Cyrenaica in northeastern Libya. The page includes a section on stratigraphy and layout of the cave which cites a 2014 article archived in WRRO – the affiliation is York rather than Leeds but the global digital commons is obviously bigger than any one institution or repository! The article is published OA on the publisher’s own site so there is no need to add the WRRO link, the DOI provides full access to everyone. As the article is CC-BY, we can use an image of the cave from the paper to illustrate Wikipedia, and the easiest way to do that is from Wikimedia Commons – also embedded here.

Research data – more than just spreadsheets

Using its DOI, a dataset can of course be cited in Wikipedia in exactly the same way as a journal article, or it might provide a unique source of additional material, as in the case of Hugh Davies.

To quote Wikipedia once again, Hugh Seymour Davies (23 April 1943 – 1 January 2005) was a musicologist, composer, and inventor of experimental musical instruments. He has also been the subject of extensive research by Leeds academic Dr James Mooney.

On Saturday 17 October, 2015 a concert of music composed by, or in response to the work of, Hugh Davies, was staged at the Clothworkers’ Centenary Concert Hall, including a pre-concert talk by Dr Mooney which has been preserved as a dataset in the Research Data Leeds repository and which is included on Wikipedia as an external link.

In truth, research data often is spreadsheets or other forms of numeric or textual data. However, as in the case of this one-off concert footage, it can also comprise all sorts of rich media material that can be uploaded to Wikimedia Commons and embedded in Wikipedia with a suitable citation. Baxter, for example, is an industrial robot also used in robotics courses at Universities including Leeds.


Natural Language Acquisition and Grounding for Embodied Robotic Systems is a conference paper presented at the Thirty-First AAAI Conference on Artificial Intelligence in San Francisco. The associated dataset includes videos of Baxter manipulating different objects, which can be uploaded to Wikimedia Commons under the terms of CC-BY, with a full citation, and used to illustrate the Wikipedia page using a single line of embed code.
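For readers who haven't seen it, the single line of embed code is ordinary wikitext of the form [[File:…|thumb|caption]]. The sketch below builds such a line; the function name, file name and caption are placeholders invented for illustration and do not refer to the actual file on Wikimedia Commons.

```python
def embed_file(filename, caption, width_px=None):
    """Build the wikitext line that embeds a Commons file as a thumbnail.

    Hypothetical helper: the file name and caption are supplied by the caller.
    """
    options = ["thumb"]
    if width_px is not None:
        options.append(f"{width_px}px")   # optional display width
    options.append(caption)               # the caption goes last
    return f"[[File:{filename}|" + "|".join(options) + "]]"


# Placeholder file name and caption, for illustration only.
line = embed_file("Example robot video.webm",
                  "Baxter manipulating objects (placeholder caption)")
print(line)
```

The same line works on any Wikimedia project, which is what makes Commons convenient for reusing one uploaded file in several places.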

Special Collections

Leeds University Library is the only library to have as many as 5 Designated collections. Designation status is a mark of distinction awarded by Arts Council England to outstanding collections of national and international importance held by non-national institutions. One of these is the Cookery Collection, currently on display in the Treasures of the Brotherton Gallery and the first to have its very own Wikipedia article created as part of a Wikimedia internship run by Special Collections in 2016-2017.

The future

At Leeds University Library, our exploration of the potential of Wikimedia projects is at an early stage. Nevertheless we recognise their immense potential to share information with the world.

In the future we would like to organise an Edit-a-thon, related to another designated collection perhaps, or around a particular discipline where there is established expertise at the University of Leeds.

Repository Fringe, 2017 – Beyond Borders

Posted by Rachel Proudfoot – programme – presentations from the event

Repository Fringe is an annual event in Edinburgh where anyone interested in repositories and research outputs can share experience, expertise and learn about developments in the repository field. 2017 marks the 10th Repo Fringe and this year was, in part, a celebration of how we have shared content ‘beyond borders’ over the last decade. The Research Data Leeds team explored the theme in our ‘Galactic Interfaces’ poster about working with arts and humanities researchers and data. (The poster is currently on display in the Research Hub on Level 13 of the Edward Boyle Library).

The conference offered a mix of keynote talks, short presentations, ‘birds of a feather’ sessions, posters and of course lots of informal networking over tea and biscuits.

Repositories: problem or solution?

Shortly before the conference, Elsevier announced it had acquired BePress. Concern about the amount of control large commercial publishers have over research dissemination was a recurrent theme in the conference. This is nothing new, but we are seeing large publishers increasingly pushing into the ‘open access’ arena. Keynote Kathleen Shearer, Executive Director of COAR, suggested that, financially, Universities are as much over a barrel now with article processing charges for ‘gold’ open access articles as we were (still are) with journal ‘big deals’ and hikes in journal subscription costs. Shearer challenged the conference: are repositories helping to perpetuate a highly flawed scholarly communications system? Shearer is part of the Next Generation Repositories Working Group which will be publishing a set of recommendations in Sept 2017. She suggested we need to rethink repository design so we have ‘repositories of the web, not just on the web’. This may involve supporting peer review (another speaker pointed out Elsevier’s controversial patent of the online peer review system), improving discovery of research more than we currently do, making sure metadata is machine readable and taking a stronger lead on digital preservation. We should also develop a shared, international vision and common ways of working which reduce the risk of academic research being disproportionately shaped, controlled and charged-for by commercial interests. For Shearer, we need a more coherent alternative – and we’re certainly not there yet.

Active promotion of content

A few presentations suggested ways that repositories can promote content in addition to curating it. Gavin Willshaw from University of Edinburgh gave a great example of promotion as part of a project to digitize 17,000 PhD theses. Edinburgh have highlighted theses from notable alumni, such as Gordon Brown, Arthur Conan Doyle and Helen Pankhurst, have linked PhD theses to author pages in Wikipedia and have uploaded older theses to Wikisource, Wikimedia’s online library of out-of-copyright works.

Other discussion looked at what role, if any, a repository can have for impact case studies / research impact more generally. Could the repository promote research and/or capture more usage and impact data? Is there a role for repositories to host lay summaries of research to make research more accessible to a non-specialist audience – be they the ‘general public’ or researchers from other academic disciplines?

Easier, embedded metadata creation which will make researchers’ lives easier

Well, we can dream! One of the keynote speakers, Andrew Millar, outlined a vision of specialist tools designed to support an academic ‘community of practice’ which make it easier to capture metadata and contextual information as a routine part of research practice. Millar is a systems biologist and suggested Fairdom is a widely used tool which helps to capture metadata in a standard experimental workflow.

Such domain-specific tools could link painlessly to shared repositories if we adopted common standards of data exchange. Tools discussed in this context were:

  •– packages documents, code and data into a zip file with manifest. Designed to be flexible across different subject areas.
  • – a way of packaging documents, models and data together using the Open Modelling EXchange format (OMEX).
  • BagIt – uses a file naming convention for structuring digital content
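To make the BagIt convention a little more concrete, here is a minimal sketch of the layout as I understand it from the BagIt specification (RFC 8493): a bagit.txt declaration, a data/ directory holding the payload, and a manifest file listing a checksum for each payload file. The function name make_bag and the example file names are invented for illustration, and this is not a substitute for an established BagIt library.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal BagIt-style bag.

    payload maps flat file names to their content as bytes
    (a hypothetical interface chosen to keep the sketch short).
    """
    bag = Path(bag_dir)
    data = bag / "data"                      # payload files live under data/
    data.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration: spec version and tag file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One "<checksum>  <path>" line per payload file.
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


# Example with placeholder content, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "example-bag",
                   {"readme.txt": b"placeholder", "results.csv": b"a,b\n1,2\n"})
    for path in sorted(bag.rglob("*")):
        print(path.relative_to(bag))
```

Because the checksums travel with the files, a receiving repository can verify that nothing was lost or altered in transfer, which is much of the appeal for deposit workflows.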

A presentation by Rory Macneil and Megan Hardeman demonstrated an end-to-end workflow, capturing information via an electronic lab notebook in the RSpace digital research platform and depositing directly into the Figshare repository via an easy-to-use, embedded tool.

Hopefully repository uptake will increase – and we’ll get more enthusiastic engagement from researchers – if we can get closer to their everyday workflows and provide relatively pain free deposit options.


Anthea Wallace’s absolutely excellent presentation showed examples of how public domain works – either deliberately or unintentionally – have had restrictions imposed on reuse. As Wallace put it, closed licences won’t stop bad people from doing bad stuff with your data but may well stop good people doing good stuff. Wallace promoted the Copyright Cortex as a helpful resource for researchers in digital humanities. I partly mention this presentation as an excuse to use one of Wallace’s examples: the transcription of music from a human bottom in Hieronymus Bosch’s Garden of Earthly Delights, which you can see below. You can also listen to an adaptation of the ‘Butt Music’ on YouTube.

Hieronymus Bosch’s Garden of Earthly Delights

More from the Fringe

Incentivising open practices – Digital Curation Centre (Authored by Sarah Jones with corrections from Dr Paul Ayris)

Wikipedia, information literacy and open access

In the 1980s my parents invested in an edition of the Encyclopædia Britannica. It’s still there, taking up shelf space in its burgundy livery, unopened since 1993, the information frozen in time like Britpop and New Labour.

Encyclopaedia Britannica 15 with 2002

SEWilco (Own work) [GFDL, CC-BY-SA-3.0 or CC BY-SA 2.5-2.0-1.0], via Wikimedia Commons

Wikipedia was launched in 2001, some 12 years after Berners-Lee invented the Web, and is often maligned in academia. Yet it remains a default information source for many a denizen of the Web, whether layperson, undergraduate or PhD (citation needed).

Unlike a leather-bound Britannica, its information is dynamic, updated by a small army of volunteers, and there can be little argument regarding the success of the project in terms of sheer scale and cultural impact. Wikipedia and its sister projects are edited at the rate of 10 edits per second (which would equate to more than three-quarters of a billion updates since 1993. That’s a lot of crossings out.)

Whether all those edits are accurate and properly referenced is a moot point of course and, like any informational resource, digital or otherwise, Wikipedia requires its readership to exercise its critical faculties.

Information and digital literacy

“Information Literacy is an umbrella term which encompasses concepts such as digital, visual and media literacies, academic literacy, information handling, information skills, data curation and data management”  SCONUL Working Group on Information Literacy 2011

Cognitive bias is universal, no less so in an information environment mediated by search engine algorithms, which is why peer review is essential in scholarship. Wikipedia’s model of collaborative authorship arguably provides a form of peer review and also supports formal academic citation, with many articles referencing peer reviewed sources, often by DOI. However, we are still a long way from full open access and many such references will inevitably be behind a paywall, inaccessible to those without access to a subscription through a university library, i.e. those laypeople who might benefit most.

Fortunately it is quick and easy to sign up for a Wikipedia account and add a link to an open access version, in the White Rose Research Repository for example. WRRO also displays a colour coded Altmetric score to help identify when an article has been cited in Wikipedia – look out for dark grey in the patented donut:

Altmetric detail
Wikipedia links for “Investment and risk appraisal in energy storage systems: A real options approach”

Above is the Altmetric page for Investment and risk appraisal in energy storage systems: A real options approach (DOI: 10.1016/…), which is linked from a Wikipedia page on Energy Storage. It took just a few moments to change the linked title from the published version to the WRRO record (the link to the version of record is maintained via the DOI; click on the image to see the citation on Wikipedia):

Editing a Wikipedia page

To what extent the casual visitor will actually consult a citation and follow a link to a peer reviewed source is another question. Nevertheless, the very act of contributing to Wikipedia in this way will help to embed information literate practice into this culturally significant informational resource. It also extends the network of open scholarship which, in turn, will subtly influence those search engine algorithms.

A single link perhaps not so much, but three quarters of a billion…

For more information on contributing to Wikipedia and Open Access see Wikipedia:WikiProject Open Access

Working with Wikipedia in Special Collections

Our intern, Imogen, talks about the work she’s been doing with Wikipedia and how it benefits Special Collections.

The Collections Enhancement internship is well under way, with only a month remaining. Fortunately I will be moving to another project with Special Collections this summer. But for now, I wanted to explain the work I have been doing with Wikipedia.

Recently I wrote a Wikipedia article about Leeds University Library’s Cookery Collection. It’s tricky for a library staff member to create a Wikipedia page about Special Collections itself due to our conflict of interest.  Instead I created an article about this designated collection.

Other Wikipedia editors will delete work that is not in keeping with the principles of Wikipedia. The main principle is that articles need to be well referenced, use independent sources and evidence the topic’s notability.

It was a big job to bring together my research about Wikipedia editing and the Cookery Collection as a finished article, but I’m really pleased that it is now live and has been awarded a B-class rating.  I have put together a comprehensive how-to guide for library staff on writing Wikipedia articles to facilitate future work.

I have also been enriching Special Collections’ representation on Wikipedia by adding links to Wikipedia pages which refer readers on to our relevant archives and collections. For example, if somebody was researching Jon Silkin and checked his Wikipedia page, they would find links to our collections about him in the page’s references and ‘External Links’ section.

To create these links I have been using Wikidata – a sister project to Wikipedia. Wikidata stores structured data. This means that information about people represented in our collections can be stored as items with properties that describe characteristics such as their gender, name or occupation.

I have been using the property ‘archives at’ to link relevant items with Leeds University Library archives. Storing this information with Wikidata makes it possible to pull out data and display it in lists, diagrams or on maps. For example, click on run in this query to see a bubble diagram illustrating the institutions with the most ‘archives at’ properties in Wikidata.
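For anyone curious what such a query looks like, here is a small sketch against the Wikidata Query Service. It is not the query linked above: it assumes that ‘archives at’ is Wikidata property P485 and simply counts items per holding institution. The function names and the User-Agent string are invented for this example, and fetch_counts needs network access, so it is defined but not called here.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Count items per institution named in an 'archives at' (assumed P485) statement.
QUERY = """
SELECT ?institution ?institutionLabel (COUNT(?item) AS ?items) WHERE {
  ?item wdt:P485 ?institution .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?institution ?institutionLabel
ORDER BY DESC(?items)
LIMIT 20
"""


def query_url(query=QUERY):
    """Return the GET URL that asks the query service for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def fetch_counts(query=QUERY):
    """Run the query and return (institution label, item count) pairs."""
    request = urllib.request.Request(
        query_url(query),
        headers={"User-Agent": "archives-at-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        rows = json.load(response)["results"]["bindings"]
    return [(row["institutionLabel"]["value"], int(row["items"]["value"]))
            for row in rows]


print(query_url())
```

Calling fetch_counts() returns the same kind of data that the bubble diagram visualises, as a plain list that could feed a table or chart.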

Alexa Internet cites Wikipedia as the fifth most popular website, viewed more often than Reddit, Amazon or Twitter. It is widely used as a first stop for research and 70% of students use Wikipedia to begin their research and obtain a summary of a topic. This is why Wikipedia is an important platform for the Library to be working with. Improving our representation on Wikipedia will strengthen our overall online presence and help make Library collections more visible to students and researchers.