There was a good turnout for our first Open Lunch on Thursday 25th February, with talks from two Leeds researchers in very different disciplines: Professor of Linguistics Cecile De Cat and astrophysicist Dr Christopher Wareing, who led an engaging discussion about preprints in their respective fields.

A preprint is an early version of a scholarly article, shared before formal peer review. Preprints have been around for a long time: the first preprint server, arXiv, covering physics, mathematics and related disciplines, launched in 1991. Today there are preprint servers for almost every discipline, from the biological and chemical sciences (bioRxiv, ChemRxiv) to the social sciences (SocArXiv) and the humanities. Preprints have become an established way of sharing early results, though the lack of formal peer review means they should be read with caution.

Open research in linguistics

Linguists are engaging more and more with open science practices

Professor Cecile De Cat

Professor Cecile De Cat is currently involved in three funded projects focussing on different areas of bilingualism. She uses R, a language and environment for statistical computing, to analyse experimental, survey and corpus data, and shares her scripts for transparency and reproducibility.

In Linguistics, it is common practice to cite preprints, and Cecile shares early versions of her work on her personal webpage as well as on PsyArXiv and the Open Science Framework (OSF), where they receive a high number of reads and downloads before formal publication. Because a preprint has its own DOI, it can be linked to the final publication, and citations can be linked to the different versions.

While Cecile has generally only posted preprints of articles that were subsequently published, she is considering sharing a preprint of an article not intended for formal publication, so that someone else can pick it up and perhaps take the work further.

A screenshot of Professor Cecile De Cat’s profile on the Open Science Framework. OSF is an integrated environment that supports project management throughout the research lifecycle, including study preregistration, posting papers or preprints, and sharing data, code and materials.

Survey on open science practices in linguistics

As part of her talk, Cecile presented the results of a brief survey advertised via contacts and word of mouth in early 2021.

There were 159 respondents in total, of whom 57.2% shared preprints, using platforms including ResearchGate and university repositories as well as OSF. (42% already had an OSF account and another 17% wanted one; 39% didn’t know what OSF was.)

The main reasons given for not sharing preprints (67 respondents; multiple responses allowed) were not wanting pre-publication versions to be available (33%) and fear of compromising anonymous peer review (31%). A significant number either didn’t know how (25%) or found it too much hassle (25%).

When it came to sharing data alongside publications (129 respondents; multiple responses allowed), the time needed to prepare and document the data was the main reason for not doing so (54%), while lack of ethical approval was also an issue (42%).

The vast majority of respondents believed research should be more reproducible, with 93% choosing 4 or 5 on a Likert scale (25.8% and 67.3% respectively). A large majority also wished, in principle, to adopt (more) open science practices, such as sharing data, protocols and data analysis scripts, with almost 90% choosing 4 or 5 on a Likert scale (28.3% and 61% respectively).
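For readers curious about how headline figures like these are derived: the share choosing "4 or 5" is a top-two-box score, the proportion of respondents selecting the two highest points on the scale. The sketch below is a minimal illustration in Python, assuming responses are recorded as integers from 1 to 5. The function name `top_two_box` and the sample responses are hypothetical; they are not the survey data or Cecile's own scripts (her analysis uses R).

```python
from collections import Counter


def top_two_box(responses, scale_max=5):
    """Return (total %, {point: %}) for the top two points of a Likert scale.

    Assumes `responses` is a non-empty list of integer scale points.
    """
    counts = Counter(responses)
    n = len(responses)
    # Percentage of respondents choosing each of the two highest points
    top = {point: 100 * counts[point] / n for point in (scale_max - 1, scale_max)}
    # The headline "top-two-box" figure is the sum of the two shares
    return sum(top.values()), top


# Hypothetical responses on a 1-5 scale (illustrative only, not the survey data)
responses = [5, 5, 4, 5, 3, 4, 5, 2, 5, 4]
total, by_point = top_two_box(responses)
print(f"{total:.1f}% chose 4 or 5", by_point)
```

Because the headline figure is just the sum of the two per-point shares, adding them up is also a quick way to check that a reported total matches its breakdown.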

See the embedded slides above for a fuller breakdown of the survey results.

Preprints in astrophysics

A note on terminology
In the Library, 'preprint' tends to refer specifically to a manuscript before it has been through formal peer review. However, Chris tends not to post papers until they have been peer reviewed, which in Library language would be called a 'postprint', so it's important to be aware of how terminology varies. The key point is rapid dissemination of research before formal publication, which, for example, enables more in-depth content to be included in grant submissions.

As a computational fluid dynamicist, Chris has worked across several disciplines including Mathematics, Chemical Engineering and Astrophysics. He has published widely in journals and conference proceedings, some more open and accessible than others.

Chris has posted preprints to arXiv since 2005, which is the ‘done thing’ in Astrophysics. It’s a simple process of uploading a PDF to an online portal, and within reason you can submit anything. Because there are no barriers to access, anyone can follow arXiv’s daily digest of new submissions. Reading it has become part of Chris’ daily routine: he follows up on submissions he finds interesting, sometimes reaching out to authors with informal feedback. In this way arXiv removes the gatekeepers and levels the playing field for discussing research, and Chris has corresponded with authors from all over the world who are interested in his work. As this is part of his disciplinary culture, Chris was encouraged to post preprints early on and now does the same with his own PhD students.

Different disciplines, practices, services…

Having heard from Cecile about preprints in Linguistics, it was interesting to learn that citing preprints is not usual practice in Astrophysics. Neither Chris nor Cecile has had any issues with publishers subsequently publishing a paper that had been posted as a preprint.

When Chris moved to Chemical Engineering in 2011, he found it lacked the preprint culture of Astrophysics, so he switched to the White Rose Research Online (WRRO) repository instead. WRRO is not discipline-specific, so it can reach a broader audience, and depositing there also means his papers are eligible for the REF.

Note: ChemRxiv, the preprint server for chemistry, was set up in 2017 and might perhaps be the most appropriate option for Chemical Engineering today.

Historically, preprints (in the Library sense of a manuscript before formal peer review) have not been in scope for WRRO. However, a dedicated 'preprint' item type will be added later in the year. Using the terms 'preprint' and 'postprint' in this way is arguably quite confusing, as discussed on Twitter.

Chris also referred to ResearchGate (which was also the most popular service for sharing preprints according to Cecile’s survey), citing the advantages associated with social media (dynamic, community-driven) and the disadvantages associated with…social media! ResearchGate also tends not to handle preprints very well: preprints are not static, and versions from multiple sources can easily become confused.

From a Library perspective, we would also emphasise that ResearchGate is perhaps comparable to Facebook in that ‘you are the product’, and it’s not clear how your data is being used and commodified.

Sharing data and code

Chris routinely shares data, typically via a link to a data repository (see Chris’ submissions to the Research Data Leeds repository). However, he rarely shares full code, which often raises commercial or copyright issues. Code is generally available on request and can lead to collaboration, but it is very specialised and can easily be misused.

Pros and cons

These are the pros and cons as outlined by Chris. They are also discussed in a previous blog post: How preprints provide rapid research dissemination in a crisis.

Future events

Open Lunches will be a regular monthly series presenting reflections on open research practice. The next event in the series will be on Wednesday 17th March, about Tools for reproducible research. All welcome!

If you have an idea for a future event, please get in touch.