REF2021: towards Open Research

With the funding bodies’ Initial decisions on the Research Excellence Framework 2021, published at the beginning of September, including a paragraph on ‘open research’, we consider what this might mean as the REF takes shape.

29. The revised template will also include a section on ‘open research’, detailing the submitting unit’s open access strategy, including where this goes above and beyond the REF open access policy requirements, and wider activity to encourage the effective sharing and management of research data. The panels will set out further guidance on this in the panel criteria. 

Initial decisions on the Research Excellence Framework 2021 (pg 9)

While still some way from full Open Access in the UK we are getting closer, largely thanks to HEFCE’s “Policy for open access in the post-2014 Research Excellence Framework” which came into effect in 2016, on April Fools’ day in fact. Nevertheless it has been taken very seriously. REF is no laughing matter!

The REF has sometimes been maligned as an expensive bureaucratic exercise ill-fitted for purpose, yet the goal of promoting the value and impact of publicly funded research is surely worthwhile and, for advocates of all things ‘open’, it at least provides a stick on which to dangle our carrots.

In lieu of the further guidance promised, can we pre-empt some of the activity and initiatives that might contribute to ‘open research’ above and beyond the REF open access policy requirements?

N.B. See the updated HEFCE FAQ, specifically:

7.1. What aspects of OA should submitting unit’s include in the environment statement section titled ‘open research’?

Research Data

It is good to see this referred to explicitly at this early stage, following on from the Concordat on Open Research Data, published in July 2016, which focuses on ensuring that research data is made openly available wherever possible.

In actual fact research data was already an eligible output for REF in 2014 and the exercise in 2021 will continue to assess “all types of research and forms of research output”. Nevertheless infrastructure and best practice around research data management (RDM) are still developing. At Leeds the Research Data Leeds (RDL) team based in the Library provides support and advice throughout the research lifecycle. We run an institutional data repository providing long term, secure storage and associating data with a Digital Object Identifier (DOI), a persistent identifier that will facilitate formal citation. Alternatively, use the Registry of Research Data Repositories (re3data) to identify a suitable discipline-specific repository.

Other useful organisations include Jisc and the Digital Curation Centre.

Potential questions for REF2021:

  • Is the data underpinning your submitted outputs safely stored according to best practice?
  • Is that data openly available (if appropriate) or is it clear how it can be accessed (i.e. does the paper include a suitable data statement)?
  • Has your data been reused by other researchers / initiated collaboration?
  • Do you have established protocols for data management planning that are followed for all research projects?


ORCID is an open, non-profit, community-based initiative that provides a unique identifier to reliably differentiate individual authors and enables connections between systems. Linking your ORCID to Symplectic, for example, will provide an additional method for the system to reliably identify your published work and add it to your Symplectic profile. Your ORCID will also be passed over to the White Rose Research Repository (WRRO) when you deposit a manuscript.

ORCID increasingly underpins an open scholarly infrastructure, nationally and internationally and is also supported by Jisc.

Related post: So you’ve got an ORCiD…what next?

Potential questions for REF2021:

  • Do all of your submitted authors have an ORCID?
  • Are they using their ORCID profile effectively?
  • Are you actively using ORCID to integrate systems and improve workflows?
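
As a small illustration of the kind of check that helps when ORCID iDs are passed between systems, the sketch below tests whether an iD is well formed. It is a minimal Python sketch, not the method used by Symplectic or WRRO: the function names are hypothetical, and it assumes the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers. It only confirms that an iD is structurally valid, not that it belongs to the right author.

```python
import re

# An ORCID iD is 16 characters in four hyphenated blocks; the last may be "X".
ORCID_PATTERN = re.compile(r"^(\d{4}-){3}\d{3}[\dX]$")


def orcid_check_digit(base_digits):
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the iD has the expected layout and a matching check digit."""
    orcid = orcid.strip().upper()
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this catches typing and copy-and-paste errors before an iD is stored against an author record.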


Collaboration is another area discussed in the document, which identifies “an explicit focus on the submitting unit’s approach to supporting collaboration with organisations beyond higher education” (pg 6, para 18).

The benefits of open research to collaboration opportunities with such organisations are obvious, whether the NHS or SMEs who may not otherwise be able to find or access the research and data they need to further their own mission. Perhaps there is also a question here of targeted dissemination, via social media for example – making research available online doesn’t mean the right people will simply stumble across it.

Potential questions for REF2021:

  • Have you adopted open research practices that are conducive to collaboration?
  • To what extent have these been successful?
  • Are you proactively building and monitoring a network around your research (e.g. by leveraging alternative metrics)?


The document acknowledges that work is required to align definitions of ‘academic impact’ and ‘wider impact’ which relate respectively to the assessment of outputs and the impact element of the REF. Notably the weighting for impact has increased from 20% to 25% – as was in fact originally proposed for the 2014 exercise.

There will be additional guidance on the criteria for both ‘reach and significance’ and impact arising from public engagement – it is not hard to anticipate how an open research agenda will feed into each of these. There is evidence that OA increases traditional citations, for example, while developments in alternative metrics or “altmetrics” are enabling online social activity around research to be recorded and measured.

Repository downloads also provide a valuable article level metric; indeed we might expect correlation with traditional citations, even causation. The IRUS-UK* service provides COUNTER compliant download statistics for the majority of UK based repositories, which means that download counts are standardised and that automated downloads, by search engine robots for example, are filtered out.

* With 3,766,192 downloads since October 2013, and as might be expected for a consortium of 3 research intensive Universities, IRUS-UK reveals that the White Rose Research Repository is one of the most highly downloaded in the UK. Leeds accounts for 1,773,744 of those downloads.

Potential questions for REF2021:

  • To what extent are you engaging with audiences beyond academia?
  • Do you produce plain language precis of your research?
  • Are you exploiting social media to engage with academic and lay audiences (e.g. Twitter, blogs, Wikipedia)?
  • Are you analysing quantitative data from these sources?

Related post: Wikipedia, information literacy and open access

The Research Support team based on Level 13 of the Edward Boyle Library will continue to review REF guidelines as they are released and associated developments across the sector. You can get in touch by email or on Twitter.

In the meantime, you must ensure your research outputs meet the new REF open access requirements by depositing your author accepted manuscript via Symplectic as soon as possible after acceptance.



Twitter and Scholarly Communication: do you pass the Turing test?

Robinson-Garcia, N., Costas, R., Isett, K., Melkers, J. and Hicks, D. (2017). The unbearable emptiness of tweeting—About journal articles. PLOS ONE, 12(8), p.e0183551.

Underlying data to the study

This recent paper from Robinson-Garcia et al, part of a project looking at dissemination channels for dentistry in the US, has (ironically enough) gained considerable traction on Twitter.

As a low-barrier platform to interact with a broad audience Twitter has proved popular with social-media savvy academics as a channel to disseminate their research outputs. It’s also infested with automated accounts, the dreaded Twitter bot, spewing links into the ether, everything from pornography to cutting edge research.

Robot image from Research Data Leeds dataset

In fact it’s so easy to tweet a link to an article, by clicking a button on a journal or repository page for example, that many real people are indistinguishable from robots. The paper finds that, at least in the field of dentistry, less than 10% of tweets exemplify “an ideal of curating and informing about the literature”.

“The bulk of tweets about dental papers were sent by accounts seemingly run by people but whose dental journal article tweeting could be easily automated”

It’s an interesting and valuable paper. However, the value of Twitter as a tool for disseminating research is not as badly undermined as the provocative title might suggest. No disrespect to the authors who clearly know a thing or two about promoting their work (as of 1pm on Friday 1st September it has a very healthy altmetric score of 448 – including 659 tweets from 606 users, with an upper bound of 1,316,619 followers).

To see the live score see

Metrics have a lot to answer for and the paper is about counting tweets as a potential indicator of reach and impact. What it’s NOT really about is tweeting about your research, which can be valuable if you do it properly: spending time developing your network and interacting with it, and with your research, in a meaningful way.

This is the type of interaction we hope to encourage via the Open Research Leeds Twitter account @OpenResLeeds. Rather than adding to that 90% of noise, we aim to be amongst the (nearly) 10% of valuable dissemination channels and a node in various academic networks across the University of Leeds and beyond.

One initiative is to leverage altmetrics to disseminate research when the ‘green’ self-archived version of the manuscript is released from embargo in the White Rose repository. The colour coded altmetric ‘score’ that is embedded in all WRRO and Symplectic records can be used to identify how and where journal articles have been disseminated. Twitter can then be used to amplify the impact of research outputs, by retweeting a Leeds based author, for example, or by linking to an open version of a paper from a mainstream news article discussing the research. The actual score doesn’t really matter; it’s simply a convenient method to visualise the network.

We are keen to develop synergies with other Leeds based accounts, through reciprocal retweets for example, and have curated a list of nearly 700 accounts associated with the University of Leeds. Lists are a feature of Twitter that offer a great way of limiting ‘noise’ by focusing on a specific subset of users such as a research community. ‘Hashtags’ such as #openaccess or #JiscRDM can also be employed to emphasise specific types of content, which is a powerful method of building community and attracting subscribers to your network.

(#JiscRDM is promulgated by Jisc to foster a community around Research Data Management and is used at community events such as the Research Data Network.)

So tweeting your research need not be unbearably empty, just don’t be a robot.

Further reading:

To Tweet or Not To Tweet –  an Academic Questions [blog] (by Dr Ben Britton)

What happens when you tweet an Open Access Paper [blog] (by Melissa Terras)

Network effects: on alternative metrics [blog] (by @ukcorr)

Social Media for Academics [book] (by Mark Carrigan)

Wikipedia, information literacy and open access

In the 1980s my parents invested in an edition of the Encyclopædia Britannica. It’s still there, taking up shelf space in its burgundy livery, unopened since 1993, the information frozen in time like Britpop and New Labour.

Encyclopaedia Britannica 15 with 2002

SEWilco (Own work) [GFDL, CC-BY-SA-3.0 or CC BY-SA 2.5-2.0-1.0], via Wikimedia Commons

Wikipedia was launched in 2001, some 12 years after Berners-Lee invented the Web, and is often maligned in academia. Yet it remains a default information source for many a denizen of the Web, whether layperson, undergraduate or PhD (citation needed).

Unlike a leather-bound Britannica, Wikipedia’s information is dynamic, updated by a small army of volunteers, and there can be little argument regarding the success of the project in terms of sheer scale and cultural impact. Wikipedia and its sister projects are edited at the rate of 10 edits per second (which would equate to more than three-quarters of a billion updates since 1993. That’s a lot of crossings out.)
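
For anyone who wants to check the sums, here is a back-of-envelope calculation in Python. It is only a sketch: the end year of 2017 is an assumption about when this post was written, and the rate of 10 edits per second is treated as constant across the whole period, which it certainly was not.

```python
# Rough check of the edit count, assuming a constant edit rate.
EDITS_PER_SECOND = 10   # rate quoted in the post
START_YEAR = 1993       # when the Britannica was last opened
END_YEAR = 2017         # assumption: year of writing, not stated in the post
SECONDS_PER_YEAR = int(365.25 * 24 * 60 * 60)


def edits_between(start_year, end_year, edits_per_second=EDITS_PER_SECOND):
    """Total edits over whole years at a constant rate."""
    return edits_per_second * SECONDS_PER_YEAR * (end_year - start_year)


total = edits_between(START_YEAR, END_YEAR)
print(f"{total:,} edits")  # 7,573,824,000 edits
```

On those assumptions “more than three-quarters of a billion” is a very conservative lower bound: at 10 edits per second that total would be passed in a little under two and a half years.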

Whether all those edits are accurate and properly referenced is a moot point of course, and like any informational resource, digital or otherwise, requires its readership to exercise its critical faculties.

Information and digital literacy

“Information Literacy is an umbrella term which encompasses concepts such as digital, visual and media literacies, academic literacy, information handling, information skills, data curation and data management”  SCONUL Working Group on Information Literacy 2011

Cognitive bias is universal, no less so in an information environment mediated by search engine algorithms, which is why peer review is essential in scholarship. Wikipedia’s model of collaborative authorship arguably provides a form of peer review and also supports formal academic citation, with many articles referencing peer reviewed sources, often by DOI. However, we are still a long way from full open access and many such references will inevitably be behind a paywall, inaccessible to those without access to a subscription through a university library, i.e. those laypeople who might benefit most.

Fortunately it is quick and easy to sign up for a Wikipedia account and add a link to an open access version, in the White Rose Research Repository for example. WRRO also displays a colour coded Altmetric score to help identify when an article has been cited in Wikipedia – look out for dark grey in the patented donut:

Altmetric detail
Wikipedia links for “Investment and risk appraisal in energy storage systems: A real options approach”

Above is the altmetric page for Investment and risk appraisal in energy storage systems: A real options approach (DOI: 10.1016/…), which is linked to from a Wikipedia page on Energy Storage. It took just a few moments to edit the linked title so that it points to the WRRO record rather than the published version (the link to the version of record is maintained via the DOI; click on the image to see the citation on Wikipedia):

Editing a Wikipedia page

To what extent the casual visitor will actually consult a citation and follow a link to a peer reviewed source is another question. Nevertheless, the very act of contributing to Wikipedia in this way will help to embed information literate practice into this culturally significant informational resource. It also extends the network of open scholarship which, in turn, will subtly influence those search engine algorithms.

A single link perhaps not so much, but three quarters of a billion…

For more information on contributing to Wikipedia and Open Access see Wikipedia:WikiProject Open Access

New video: Open Access Explained

A recent survey surfaced some common questions that academics have when depositing their papers in Symplectic. In response, the Library’s Research Support Team has created a 3 minute video to address some of the issues raised. Watch the video to find out more about the benefits of Open Access; how to deposit your paper via Symplectic; what is the Author Accepted Manuscript; what is Gold Open Access and Green Open Access; how the Library checks and applies embargoes, only making publications open access once embargoes have expired; what funding is available for article processing charges, and where to find additional help through your School’s Open Access contact and the Library Research Support Team.

Research Data Network – University of York – June 2017

If Jisc’s 4th Research Data Network earlier this week felt a bit rushed at times, it only reflects the sheer number of exciting projects happening across the sector.

There’s still a long way to go but it felt like the dots are really starting to join up and there was lots of energy in both real and virtual space – see Storify of tweets on the #JiscRDM tag during the event.

Delegates busy networking on Tuesday evening at RDN York (thanks to Paul Stokes for the photo, used with permission)

Two packed days in York were bookended by an inspiring opening keynote from Mark Humphries asking “Who will use the Open Data?” and by a panel session the following afternoon on the principles and practice of open research, informed by the open research pilot project at the University of Cambridge.

Mark emphasised that there is a clearer rationale for open data in some academic contexts than others. Clinical trials, for example, are time consuming and expensive and need to be safe and effective, which provides a clear motivation to share data and check conclusions.

Mark singled out his own discipline of neuroscience, however, as lagging behind, with no discipline specific open data repositories, and inclined to “data worship”. New data is hard to get, requires considerable skill (to implant electrodes in a rat’s cortex for instance) and will underpin high-impact papers, that universal currency of academia. It’s not for sharing!

Mark reassured us, nevertheless, that open data is the future. Inevitably. If only because of the sheer scale of data being generated, which simply has to be shared if it is to be analysed effectively; he cited an instance whereby a single dataset generated 9 high quality papers from several labs. RDM isn’t trivial though, which is one of the main reasons that funding bodies are mandating data sharing.

Some 28 hours later, we were back in the same lecture theatre for the final session chaired by Marta Teperek. Our four panellists fielding questions from the floor were David Carr (Wellcome Trust), Tim Fulton, Lauren Cadwallader (both University of Cambridge) and Jennifer Harris (Birkbeck, University of London).

There was a great deal of emphasis on the cost of open research and sustainability – by way of answer to the question above, Lauren Cadwallader referred to her recent blog post Open Resources: Who Should Pay? and shared her reservations about the ‘gold’ model of open access that is sustained by expensive Article Processing Charges to commercial publishers.

There are similarities and synergies between OA and open data initiatives, including increasing interest from publishers. There are also significant differences and it was pointed out from the floor that long term preservation is a cost that needs to be borne by someone.

Betwixt these bookends were far too many sessions to discuss in detail, covering everything from the European Open Science Cloud (EOSC) to an update on the work HESA is doing in relation to research data in the context of REF2021, Archivematica for preservation and some fantastic resources for business case development and costing for RDM (including a number of useful case studies). Then there’s the Research Data Alliance which *anyone* is able to join and which offers a window onto many different communities.

It was particularly interesting to learn about ongoing developments with Jisc’s shared service which is working with 13 pilot institutions on repository and preservation solutions and comprises a range of tools to capture, preserve, disseminate and allow reporting. The pilot offer also includes training, support and gathering of best practice. Pilot users will be testing these systems throughout the summer and providing feedback with a view to rolling out production between April and July 2018.

The UK research data discovery service (beta), part of the Jisc Research at Risk challenge to develop RDM infrastructure, enables the discovery of data from UK HEIs and national data centres.

Leeds contributed to the event by sharing lessons learned when setting up our RDM service and with a lightning talk.

All in all a valuable couple of days with lots of information still to synthesise and file away. Indeed to preserve in one’s cortex…now where’s that neuroscientist?

Slides from all sessions and extensive notes are available from

RDN Lightning talk – Open Research Leeds (@OpenResLeeds): networks, metrics and #openresearch

These are slides for a lightning talk next week at the Research Data Network in York:

N.B. Altmetric data (slide 9) – I ran all DOIs available from IRUSdata-UK against the API on 22/06/2017, available in this Google sheet.*

Note that not all repositories appear to expose DOIs in a manner that is currently available to IRUSdata. In addition, several repositories do not differentiate types of DOI (i.e. DataCite DOIs assigned to a dataset vs publisher DOIs pointing at an associated journal article.)

* Instructions how to do this available at
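
As a rough illustration of what running a DOI against the API involves, the Python sketch below looks up a single DOI. It is not the method referred to above. It assumes the public Altmetric endpoint at api.altmetric.com/v1/doi/ and the response fields `score` and `cited_by_tweeters_count`, all of which should be checked against Altmetric’s current documentation and access terms; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

# Assumed endpoint for per-DOI lookups; verify before relying on it.
ALTMETRIC_DOI_ENDPOINT = "https://api.altmetric.com/v1/doi/"


def altmetric_url(doi):
    """Build the lookup URL for one DOI, keeping the slash in the DOI."""
    return ALTMETRIC_DOI_ENDPOINT + quote(doi.strip(), safe="/")


def fetch_altmetric(doi, opener=urllib.request.urlopen):
    """Return the parsed JSON record, or None if the DOI is not tracked."""
    try:
        with opener(altmetric_url(doi)) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed response for a DOI with no attention data
            return None
        raise


def summarise(record):
    """Pick out the two figures of interest (field names are assumptions)."""
    if record is None:
        return {"score": None, "tweeters": 0}
    return {
        "score": record.get("score"),
        "tweeters": record.get("cited_by_tweeters_count", 0),
    }


# Example (makes a network request, so it is left commented out):
# print(summarise(fetch_altmetric("10.1234/example-doi")))
```

To cover a whole repository, loop over the list of DOIs with a pause between requests. Bear in mind the caveat above about DataCite and publisher DOIs: a dataset DOI and the DOI of its associated article are separate lookups.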

White Rose Libraries Digital Scholarship Event

Last week I attended an event in Sheffield that brought together colleagues from across the White Rose consortium (Universities of Leeds, Sheffield and York) to explore developments in Digital Scholarship. Whatever that might be…

Indeed, several speakers throughout the day drew attention to potential problems of terminology – the other common descriptor is Digital Humanities – with Ben Outhwaite in his keynote differentiating between the plain scroll and the later codex to illustrate that technology has always facilitated new methods of analysis and that digital technology isn’t qualitatively any different. Digital Humanities is simply humanities research driven by the opportunities offered by new media.

Anne Horn, University Librarian (Sheffield), conversational in her introduction, provided a preliminary definition and elicited perspectives from the audience. Anne emphasised interdisciplinary collaboration, with the Library as an active participant – a theme that recurred throughout the day – and suggested that communities coalesce around both technology and processes as well as content and datasets. She talked about the challenges of building and sustaining the broad range of knowledge and skills required, an area in which the Library has a clear role.

In one of several academic viewpoints throughout the day, Mike Pidd described how the Digital Humanities Institute at the University of Sheffield is self-funded through project collaboration and supports technology R & D in the humanities with services ranging from data acquisition, data modelling and data management to data visualisation, preservation and sustainability. We learned about just a few of the projects within the Institute, like The Digital Panopticon, which has brought together genealogical, biometric and criminal justice datasets held in the UK and Australia to explore the impact of different types of punishments on the lives of 90,000 people sentenced at the Old Bailey between 1780 and 1875. The scale of the project is impressive, having linked records across 45 separate datasets both public and commercial (e.g. Ancestry UK), illustrating a common challenge: negotiating with data providers.

Other projects are Locating London’s Past*, Old Bailey Online*, Linguistic DNA and Mark My Bird, all of which are capturing and reusing data in innovative ways, backing up Mike’s statement that “data is just as important for your career as publishing books and articles”.

* Raw XML data from London Lives and Old Bailey Online is available from Sheffield’s data repository ORDA

The Library Showcases, reprised in the afternoon, were an opportunity for us to learn about digitisation projects within archives and special collections across the consortium:

The presentation from York, for example, emphasised the complexity of these types of project, which require a broad range of skills from traditional document preservation to digitisation/ingest and development of an editorial interface (the editing tool for the Archbishops’ Registers is available from GitHub).

Digitised excerpt from Henry VIII’s divorce from Anne of Cleves (Archbishops’ Registers)


High quality digital images facilitate zoom-in with no loss of fidelity

A couple of academic viewpoints spanned lunchtime with Louise Hampson from the Centre for the Study of Christianity & Culture at the University of York and Brett Greatley-Hirsch from the University of Leeds.

Louise talked about the legacy issues of migrating CD-ROMs to internet based resources, both the practical difficulties for a small team and (re)negotiating usage rights, while Brett immediately won over the room by saying that libraries should be recognised as active collaborators and not mere support services.

Brett has come to Leeds via Australia and Canada and introduced us to Digital Renaissance Editions which publishes open-access electronic critical editions of non-Shakespearean early modern drama.

The second of my Library Showcases was Sheffield’s National Fairground & Circus Archive, a “living” archive actively “contributing to the organisation and promotion of shows and festivals”, which drove home yet again the broad range of skills required to curate digital material.

All of which brought us to an energetic keynote from Ben Outhwaite who described a somewhat fragmented landscape at Cambridge with various pockets of work that perhaps lack cohesion across a University where STEM subjects tend to prevail. The University is beginning to look at the area strategically however, to support their digital humanists who might be collaborating with scholars elsewhere through the Digital Humanities Network, a university funded, short-term, strategic initiative. Ben also talked us through the high profile Casebooks Project, making available the astrological records of Simon Forman (1552-1611) and Richard Napier (1559-1634) “unparalleled resources in the history of early modern medicine”.

The best projects are idea-led, not technology-led, according to Ben, and there needs to be a real scholarly need, a theme that came through strongly in presentations throughout the day with digital technology an integrated aspect of all projects. Digitisation, though, undoubtedly leads to more opportunities.

Crucially “You can’t do anything without data, collect and look after the data rigorously”.