Universities really can’t afford not to have a Wikimedian in Residence these days. It still surprises me how few do.

Melissa Highton, Director of Learning, Teaching and Web Services

A Wikimedian in Residence (WiR) delivers training within and outside an organisation and liaises with the online community. They are not paid to directly improve Wikipedia but to share skills and content that empower others to do so. The University of Edinburgh, for example, has a well-established WiR, which you can hear about from Lorna Campbell in this previous event recording: A global challenge: digital and open education for inclusive societies

In April we were pleased to welcome Dr Martin Poulter from the University of Bristol and former ‘Wikimedian in Residence’ at the Bodleian Library, University of Oxford. You can see a full recording of Martin’s talk on YouTube or a brief summary below.

Wikimedia: an open, free and scalable infrastructure

Funders, not just the Research Councils but also organisations like the National Lottery Heritage Fund, increasingly expect research outputs to be open, reusable and remixable and to feed into wider civic debate. They may expect lay summaries of research as an output alongside journal articles and there’s a drive to make research relevant to public debate…

And then there’s Wikimedia: not only Wikipedia itself, “the free encyclopedia that anyone can edit”, but also its family of related sites, which together form an ecosystem of shared knowledge that is seen by a massive global audience.

However, they are “works in progress”, (in)famously incomplete and lacking in many ways. Universities are uniquely placed to repurpose different kinds of research output to both improve Wikipedia and achieve a greater public reach for their research.

Martin first came across this idea when he encountered the scientist Dr Darren Logan, whose team had published a paper about urinary proteins in the open access journal PLOS ONE. The figures for the paper included a diagram of evolutionary relations between different families of proteins. PLOS ONE is fully open access under CC-BY, so he could upload the figure to Wikimedia Commons, from where it can easily be used to illustrate articles on Wikipedia, including in different languages:

Major urinary proteins (English language Wikipedia)

Proteínas urinarias mayores (Spanish language Wikipedia)

Proteïnes urinàries majors (Catalan language Wikipedia)

遺伝子ファミリー (Japanese language Wikipedia)

To see all the different Wikipedias that include the diagram, see the file on Wikimedia Commons – File:Phylogenetic tree of Mups.jpg – with the attribution and the DOI (Digital Object Identifier) showing where the file has come from and the link back to the original paper. Dr Logan also worked on the English Wikipedia article using this peer-reviewed paper as well as the figure to illustrate the article. He also used other relevant research and got it to a high enough standard for it to be a “featured article” which is shown on the front page of English Wikipedia and seen by millions of people.

Everyone uses Wikipedia

Dr Logan suggests that his paper received a lot more attention than other, similar papers due to this public exposure to a massive audience it wouldn’t otherwise have had. While it’s hard to test, this does appear to be reflected in citations and scholarly attention on the article.

While they may not readily admit it, not only a lay audience but scholars, researchers and doctors also use Wikipedia. There is evidence, for example, that Wikipedia shapes language in science papers.

Facebook and YouTube use Wikipedia extracts to provide contextual information about videos and news outlets, while Google extracts text and key facts for the boxes that appear alongside search results. Voice assistants like Siri and Alexa mine Wikipedia and Wikidata to answer questions. These services sometimes strip extracts of vital context or fail to make clear the provenance of the information (Poulter and Sheppard, 2020).

A question of bias

During his time at Oxford University, Martin was an editorial advisor to the open access journal Vestiges: Traces of Record. One of its papers was about the Cameroon Press Photo Archive. A significant problem with Wikipedia is its various biases, which reflect its contributor demographics. There are major discrepancies in geographical coverage, for example, with more articles about the Netherlands than about the whole continent of Africa. The Vestiges paper described an important archive in Cameroon about which Wikipedia had no article, so Martin was able to adapt it into a purely descriptive, factual article that could also reuse the photographs, and this has generated lots of citations from Wikipedia.

Blogpost as (wiki)data

Wikidata, an example of a knowledge graph that represents knowledge through the connections between things.

Poulter and Sheppard (2020)

At the Bodleian there was also a lot of work digitising Middle Eastern manuscripts, including an important manuscript, “The Shahnameh of Ibrahim Sultan”, announced in a blogpost. Martin went through the post sentence by sentence and turned each statement into a Wikidata statement – https://www.wikidata.org/wiki/Q53676578

This provides a sort of data ‘model’ of this illuminated manuscript, where it’s from and why it’s important. Meanwhile other items and records are being added to Wikidata all the time, from disparate collections of other illuminated manuscripts for example.
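To make the idea of a data ‘model’ a little more concrete, here is a minimal Python sketch (not something shown in Martin’s talk) of how an item’s public record might be retrieved and summarised. It assumes Wikidata’s Special:EntityData export URL and the usual layout of that JSON (entities, labels, claims); the function names are hypothetical, and the network call is kept in its own function so the summary logic can be tried on any saved record.

```python
import json
import urllib.request

# Assumed export URL pattern for a single Wikidata item.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def entity_url(qid: str) -> str:
    """Return the JSON export URL for a Wikidata item such as Q53676578."""
    return ENTITY_DATA_URL.format(qid=qid)


def fetch_entity(qid: str) -> dict:
    """Download the item's JSON record (needs network access)."""
    request = urllib.request.Request(
        entity_url(qid),
        headers={"User-Agent": "wikidata-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarise(record: dict, qid: str, language: str = "en") -> dict:
    """Return the item's label and the number of statements per property."""
    entity = record["entities"][qid]
    label = entity.get("labels", {}).get(language, {}).get("value")
    counts = {
        prop: len(statements)
        for prop, statements in entity.get("claims", {}).items()
    }
    return {"label": label, "statement_counts": counts}
```

With network access, summarise(fetch_entity("Q53676578"), "Q53676578") would list how many statements the manuscript’s item currently holds for each property.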

Wikidata can be queried using the database query language SPARQL, to answer all manner of questions and build data visualisations. So a query for things in collections that are connected somehow to Timur, the Turco-Mongol conqueror who founded the Timurid Empire in the 14th Century is illustrated below, including paintings depicting him, a letter written by him as well as things commissioned by or dedicated to him. A similar query can be run for any kind of historical figure:

Illustration from Wikidata query, of things in collections connected somehow to Timur, the Turco-Mongol conqueror who founded the Timurid Empire
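For readers who want to try something similar, the following is a rough Python sketch of such a query. It is not the query Martin used. It assumes the public Wikidata Query Service endpoint, and it guesses that “connected somehow” can be approximated with four properties: creator (P170), depicts (P180), commissioned by (P88) and dedicated to (P825), restricted to items that have a collection (P195). The Wikidata ID for Timur is not given in this post, so the person’s ID is left as a parameter to look up; the function names are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: these properties approximate "connected somehow" to a person.
LINK_PROPERTIES = ("P170", "P180", "P88", "P825")


def build_query(person_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for collection items linked to one person."""
    if not re.fullmatch(r"Q[1-9]\d*", person_qid):
        raise ValueError(f"not a Wikidata item ID: {person_qid!r}")
    links = " ".join(f"wdt:{prop}" for prop in LINK_PROPERTIES)
    return (
        "SELECT DISTINCT ?item ?itemLabel ?collectionLabel WHERE {\n"
        f"  VALUES ?link {{ {links} }}\n"
        f"  ?item ?link wd:{person_qid} ;\n"
        "        wdt:P195 ?collection .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the result rows (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikidata-sketch/0.1 (illustrative example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Passing a different person’s ID to build_query gives the equivalent query for another historical figure, and the text it returns can also be pasted into the query service’s web interface.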

The same principle in the scientific field could apply to proteins or species, to create a linked dataset that improves over time to provide an ever more detailed picture of a specific topic.

Wikimedia: more than just an encyclopedia

There are more than a dozen different content projects of which Wikipedia is just the best known. The four that Martin suggests are most relevant to the work in universities are: Wikipedia, Wikimedia Commons, Wikisource and Wikidata.

The four Wikimedia projects most relevant to the work of universities: Wikipedia, Wikimedia Commons, Wikisource, Wikidata
Wikimedia Commons is a repository of openly licensed media files including photographs, diagrams, video and audio. Wikisource is a free library of out-of-copyright texts, while Wikidata is a store of structured data that can be read and edited by humans or machines.

For a given topic like a historical individual, ideally there will be an article in Wikipedia about their life, what they did and why they’re notable, illustrated with images from Wikimedia Commons: perhaps a portrait, their signature, coat of arms or other related manuscripts and artwork. Wikisource can be used for (out of copyright) text: you wouldn’t include all of Lord Byron’s poetry in Wikipedia, for example, but it can be added to Wikisource instead. Finally, Wikidata expresses the relations between things, so Lord Byron was a member of the Royal Society and the father of Ada Lovelace, both of which are facts represented in Wikidata.

Wikidata also records how things are named and identified: that Lord Byron and George Gordon Byron are the same person, for instance, and that Lord Byron the person is different from ‘Lord Byron’ the book or the film. Lord Byron also has a particular Library of Congress number, which is recorded in Wikidata too.

A license to share

The content of all of the Wikimedia sites is ‘free’. Not only in the sense that it doesn’t cost money, but also in the sense that it is free to reuse, often described as ‘open’ rather than ‘free’, which means we can only use the top tier of licenses in this illustration. In particular, as it’s impractical to attribute to the tens of thousands of individuals that may have contributed, Wikidata must be CC0.

Illustration of copyright and the different Creative Commons licenses from least (CC0) to most restrictive ((c) all rights reserved)

These more open licenses are increasingly the ones mandated by funders, and represent an opportunity to contribute to the global commons in a way that was limited by non-commercial or non-derivative licenses in the past.

This is hugely valuable to open science, to share diagrams such as the phylogenetic tree of proteins described above, or micrographs, maps and scans, sound or video clips, geographical shapefiles and 3D structures. The list goes on…

…or if an image is generated with code, you might have the coordinates of supernovae distribution in a galaxy and use Python or R to generate a visualisation, for instance. That can also be shared on Wikimedia Commons, perhaps linked to a larger data table, to contribute to open and reproducible science with much richer context than simply sharing a static image.
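As a rough illustration of an image generated with code, the sketch below turns a list of coordinates into an SVG scatter plot using only the Python standard library. The coordinates are made up for the example (they are not real supernova data), the function name is hypothetical, and a real project would more likely use a plotting library; the point is simply that the code, the data and the resulting image can all be shared together.

```python
from xml.sax.saxutils import escape


def scatter_svg(points, title, size=400, margin=20, radius=3):
    """Render (x, y) pairs as a standalone SVG scatter plot."""
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    span = size - 2 * margin

    def scale(value, low, high):
        # Map a data value onto the drawable area; a zero range is centred.
        if high == low:
            return margin + span / 2
        return margin + (value - low) / (high - low) * span

    circles = []
    for x, y in points:
        cx = scale(x, x_min, x_max)
        cy = size - scale(y, y_min, y_max)  # SVG's y axis points downwards
        circles.append(
            f'  <circle cx="{cx:.1f}" cy="{cy:.1f}" r="{radius}" fill="black"/>'
        )
    return "\n".join(
        [
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
            f'height="{size}" viewBox="0 0 {size} {size}">',
            f"  <title>{escape(title)}</title>",
            *circles,
            "</svg>",
        ]
    )


# Made-up coordinates purely for illustration; not real supernova data.
EXAMPLE_POINTS = [(0.0, 0.0), (1.5, 2.0), (-2.0, 1.0), (3.0, -1.5)]
svg_text = scatter_svg(EXAMPLE_POINTS, "Example distribution (illustrative data)")
```

Writing svg_text to a file with an .svg extension gives an image that could be uploaded alongside the script and the data table that produced it.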

Persistent identifiers are another crucial component of open science, so if your research project is about people, artworks, places, species, genes… all of those will have identifiers, and their value is turbocharged by linking up with Wikidata.

Tools to visualise Wikidata

Reasonator: a site which gives you an overview of everything that Wikidata knows about a particular topic e.g. Jane Austen

Histropedia: generates timelines from Wikidata including links to Wikipedia articles and images from Wikimedia Commons e.g. Women Engineers (colour coded by country)

Instead of Wikipedia, Histropedia can be configured to link anywhere, to a university-curated catalogue for example.

Scholia: Scholia is a service that creates visual scholarly profiles for topics, people, organizations, species, chemicals, etc using bibliographic and other information in Wikidata

Universities can improve Scholia by adding repository links to the Wikidata records for published papers and tagging with appropriate topics.

What next at Leeds?

With Martin’s help, we’ve already begun to experiment with some of the techniques and tools described here, with PEATMAP for example, uploaded to Wikimedia Commons and used on several different language Wikipedias. PEATMAP is by far the most popular dataset in Research Data Leeds which can perhaps in part be attributed to its presence on the global commons. Then there is Wilson Armistead, a British Quaker merchant, slavery abolitionist and author from Leeds, for whom we created a Wikipedia article with its primary reference a research paper by Professor Bridget Bennett from the School of English. This article was also featured on the front page of English Wikipedia meaning many more people know about this local 19th century hero than would otherwise. You can see the spike in views to the Wikipedia article here.

We’re working with Martin to upload metadata from Research Data Leeds to Wikidata so it can be utilised in Scholia and exploring specific research projects with several colleagues across the university. If you have an idea you would like to explore with us please get in touch!

For a lay audience, Wikipedia is the ‘lens’ through which they will see a given subject. In an age of disinformation, the academy has the expertise, and a responsibility, to contribute to the global commons to ensure the information is accurate, reliable and properly evidenced from primary, peer-reviewed sources.

Published papers

Poulter, Martin, and Nick Sheppard. 2020. “Wikimedia and Universities: Contributing to the Global Commons in the Age of Disinformation”. Insights 33 (1): 14. DOI: http://doi.org/10.1629/uksg.509

Tattersall, Andy, Nick Sheppard, Thom Blake, Kate O’Neill, and Christopher Carroll. 2022. “Exploring Open Access Coverage of Wikipedia-cited Research Across the White Rose Universities”. Insights 35: 3. DOI: http://doi.org/10.1629/uksg.559