This is the second in a series of three posts reflecting on our data management engagement award*. In our last post we considered the sheer scale of Wikipedia and its potential to engage with a global audience. So how is this free global information ecosystem being used at universities in teaching and to expose diverse collections including research and Special Collections? In December 2018 Kirstine McDermid and Nick Sheppard from the Research Support Team visited the University of Edinburgh to find out.

* Award sponsored by SPARC Europe, Jisc and the University of Cambridge https://sparceurope.org/first-ever-data-management-engagement-award-winner-named/

A note on Wikimedia

Wikipedia, “the free encyclopedia that anyone can edit” is a ubiquitous resource and the 5th most visited website in the world; perhaps less well-known are a wide range of related projects under the Wikimedia umbrella (currently 16) that together comprise a powerful information ecosystem. In this post we refer to two specific projects:

• Wikimedia Commons is a repository of openly licensed media files – images, video and audio – for use in education. Like Wikipedia itself, anyone can upload or edit material. It also makes it very easy to embed media files across Wikimedia projects.

• Wikidata is a free and open knowledge base that can be read and edited by humans or machines and acts as central storage for the structured data of Wikimedia. It also provides support to many other sites and services beyond Wikimedia.

After my talk about the project at Internet Librarian International (slides here) the first point raised during the Q & A was that, at the questioner's library, Wikipedia is most definitely not regarded as a reliable source of information and should be avoided. This is perhaps a common view, especially in academia. The attitude is changing, however: as far back as 2012, in an article for Ariadne, Amber Thomas described how “Wikipedia is reflective of the story of knowledge in the digital age” while emphasising its untapped potential.

Information literacy

"For God (sic) sake, you’re in college; don’t cite the encyclopedia"

So said Jimmy Wales himself, the founder of Wikipedia, on record in 2006 and quoted in the Chronicle of Higher Education. Well over a decade later the same caveat applies; Wikipedia is not a primary source and should not be cited as such. However, thinking about her own digital practice Amber admits that “I often use a Wikipedia entry as an identifier for a concept…my first point of call is often Wikipedia”. In the age of Google, which explicitly uses Wikipedia in its knowledge graph to inform results, this is no doubt true of many of us whether scholar or lay person. With this in mind it’s important to ensure that students, scholars and an educated public possess the necessary information literacy skills to assess the information they encounter online.

Of course, the beauty of “the free encyclopedia that anyone can edit” is that those critical skills can be taken one step further, to add missing information, correct errors and cite reputable sources, which is why some lecturers and universities are beginning to explore the potential of Wikipedia in teaching or summative assessment. Here at Leeds, for example, Antonio Martínez-Arboleda uses Wikipedia in his Spanish module to review the Spanish Wikipedia entry for the Economy of Ecuador, critically discussing questions of style and framing and potentially editing the content.

At the University of Edinburgh Wikipedia is increasingly embedded in the curriculum with Senior Honours students on the Reproductive Biomedicine programme undertaking a Wikipedia research assignment to “help address knowledge gaps and allow students a motivating opportunity to share their scholarship with the world for the common good”. The approach is also being trialled in the Global Health and Translation Studies MSc while Wikidata has been used on the Data Science for Design MSc.

Constructing knowledge online

Related to information literacy is an awareness of “how knowledge is created, curated and contested online” as Ewan McAndrew put it in a presentation at Maynooth University on 18 June 2018. The collaborative scale of Wikipedia puts it at the very centre of this debate with a YouGov survey in 2014 finding that the British public trust it significantly more than mainstream news outlets including the BBC and “upmarket” newspapers like the Times, Guardian and Telegraph.

Wikipedia is only as reliable as its contributors and while “regular” contributors number in the tens of thousands, a more modest 3,000 are considered to be ‘very active Wikipedians’. The majority of these are men, which undoubtedly contributes to an egregious example of bias: only 17% of biographies on the site cover notable women. This bolsters the argument that academic experts have nothing less than a professional responsibility to contribute to this most extensive reference work in human history, to ensure that it is accurate and properly referenced with primary sources (that, ideally, are themselves open access).

Manage it locally to share it globally

At the University of Leeds, this was the essential idea behind our successful proposal for the data management engagement award: to encourage academics and others to share openly licensed research outputs on Wikimedia Commons and to run a series of editathons that use this material to improve Wikipedia. Now, with the help of experienced Wikimedians Ewan McAndrew and Martin Poulter, we’ve begun to flesh out the idea and explore how it might utilise other tools in the Wikimedians’ workshop, notably Wikidata.

Florence Bell
Wikimedia projects are interlinked. You can read a Wikipedia article on British Scientist Florence Bell, see her image on Wikimedia Commons and view the structured data about her on Wikidata.

Ewan and Martin are the Wikimedians in Residence at the Universities of Edinburgh and Oxford respectively, currently the only two universities in the UK with such a role, though Wikimedians in Residence are more common in the US and at so-called GLAM institutions (Galleries, Libraries, Archives and Museums) on this side of the pond.

At the beginning of December we visited Edinburgh to take part in a Women in Red Wikipedia editathon, led by Ewan, where Kirstine created her first Wikipedia page on accused witch Issobell Young and we added structured data about Scottish suffragettes to Wikidata so that it could be imported into the Women’s suffrage in Scotland timeline. The following morning Martin joined us via Skype to tell us about his work importing scholarly works into Wikidata with the Source Import tool, utilising identifiers like DOI and ORCID.

One such work is “Properties of expanding universes”, the seminal PhD thesis of Professor Stephen Hawking made openly available by the University of Cambridge in 2017, resulting in such demand that it crashed the Cambridge repository. It is of course now linked from the great scientist’s Wikipedia page which receives around half a million views a month, excepting March 2018 when it received more than 12 million (Professor Hawking died on 14 March 2018). Discounting the initial spike and another corresponding to his death in March 2018, the thesis itself is downloaded around 25,000 times a month with much of that traffic undoubtedly driven by Wikipedia. Martin has also manually created its Wikidata item (Q42307084).

Wikidata is a knowledge database containing millions of statements such as “Stephen Hawking was a British theoretical physicist, cosmologist, and author” (Q17714) or indeed “Properties of expanding universes is the doctoral thesis of Stephen Hawking” (Q42307084). Additional statements link each record to related ‘properties’ such as ‘date of birth’ (Property:P569) and ‘date of death’ (Property:P570) or, in the case of a thesis, ‘full work available at’ (Property:P953); there is also an ‘EThOS thesis ID’ property (P4536) to link to the British Library EThOS thesis record. The Wikidata database can be queried using SPARQL to answer all manner of questions and build data visualisations. The Women’s suffrage in Scotland timeline, for example, is built using a third-party application called Histropedia that uses data from Wikipedia and Wikidata to automatically generate interactive timelines with events linked to Wikipedia articles.
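
To make the SPARQL idea a little more concrete, here is a minimal Python sketch that builds (but does not send) a query for the date of birth and date of death of the Stephen Hawking item (Q17714). The property IDs P569 and P570 are the ones mentioned above; the function names are our own invention, and the endpoint URL and the wd:/wdt: prefixes are those of the public Wikidata Query Service as we understand it, so treat this as an illustration rather than a definitive recipe.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed here; check the current docs).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_dates_query(item_id: str) -> str:
    """Build a SPARQL query for an item's date of birth (P569) and death (P570)."""
    # Accept only well-formed item IDs such as "Q17714".
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return (
        "SELECT ?birth ?death WHERE {\n"
        f"  wd:{item_id} wdt:P569 ?birth .\n"
        # The date of death is optional so that living people still match.
        f"  OPTIONAL {{ wd:{item_id} wdt:P570 ?death . }}\n"
        "}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    query = build_dates_query("Q17714")
    print(query)
    print(build_request_url(query))
```

The same query text can be pasted straight into the Wikidata Query Service web interface, which is probably the easiest way to experiment before writing any code.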

Manually adding records to Wikidata is time-consuming, so Martin has used a tool called QuickStatements to bulk import thesis metadata from the Oxford Research Archive, which means we can run queries like “English Wikipedia articles of people whose doctoral theses are available full-text in Oxford Research Archive”. In addition, these theses automatically appear in the Wikidata-driven bibliographic tool Scholia – https://tools.wmflabs.org/scholia/author/Q17714
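
As a rough idea of what a bulk import involves, the sketch below turns one thesis record into a batch of QuickStatements commands. It assumes the tab-separated "version 1" command format as we understand it (CREATE to make a new item, LAST to refer to it, Len for the English label, string values in double quotes) and uses only the two thesis properties mentioned above, P953 and P4536. The function name and the sample record are hypothetical, and this is not a description of Martin's actual workflow.

```python
from typing import Optional


def thesis_to_quickstatements(
    label: str, full_text_url: str, ethos_id: Optional[str] = None
) -> str:
    """Return QuickStatements (assumed V1 syntax) commands for one new thesis item."""
    values = [label, full_text_url] + ([ethos_id] if ethos_id else [])
    for value in values:
        # Tabs, newlines and double quotes would break the command format.
        if any(ch in value for ch in '\t\n"'):
            raise ValueError(f"unsupported character in value: {value!r}")

    lines = [
        "CREATE",                              # create a new, empty item
        f'LAST\tLen\t"{label}"',               # English label for that item
        f'LAST\tP953\t"{full_text_url}"',      # full work available at
    ]
    if ethos_id:
        lines.append(f'LAST\tP4536\t"{ethos_id}"')  # EThOS thesis ID
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    print(thesis_to_quickstatements(
        "An example doctoral thesis",
        "https://example.org/theses/1",
        "000000",
    ))
```

A real import would loop over records exported from the repository and would normally check whether an item already exists before creating a new one, which this sketch leaves out.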

Scholia is a tool built on Wikidata to visualise scholarly networks which, among other things, has huge potential as a tool to demonstrate the benefits of ORCID and advocate open access via an institutional repository like White Rose Research Online. Here, for example, is a query on the topic of climate change, listing recently published works, authors publishing on the topic, a co-author graph, highly cited works etc – https://tools.wmflabs.org/scholia/topic/Q125928

The potential for Academic Libraries is almost limitless – just think about related material in data silos across the campus. A University archive might have primary material about a famous scientist for example, a pioneer in their field, carefully catalogued, perhaps digitised, while elsewhere in the Library another team are uploading research papers citing that pioneering work to the institutional repository and curating datasets related to those papers in a data repository. Meanwhile the gallery has digitised their portrait. Using Wikidata all of these collections can be interlinked with each other and with related material from institutions and archives worldwide.

Martin is currently focussing on taking his message to senior managers at universities to help them understand the huge benefits of linking up their disparate datasets in this way; Wikidata is a free, scalable solution and conduit to the global commons, the pin that the Wikimedia Universe wheels around.

Huge thanks to Ewan and Martin for leading us systematically through this Universe and helping us make sense of the conceptually abstract (which I may not necessarily have succeeded in fully articulating here!).

Here are a couple of useful resources from Ewan at Edinburgh with Navino Evans from Histropedia:

How to build SPARQL queries to retrieve/visualise data in Wikidata

Wikidata Sparql Query Tutorial

Also see Martin’s blog post about using Wikidata to make visual representations of contextual knowledge about GLAM collection items and you can watch him talking about Wikidata on YouTube: The global knowledge base: open data about everything.

In our next post we will discuss our next steps for Wikimedia in general and the data management engagement award in particular.