I’ll benevolently assume it’s careless rather than malicious, but the number of issues we have with data access statements in published journal articles certainly betrays a considerable lack of care!

It’s not unusual for a statement to disappear entirely during the editing process; it may or may not be reinstated on request. DOIs are also frequently formatted incorrectly, so they don’t resolve (examples below). A common phenomenon is (presumably automatic) formatting that prepends something which breaks the link: dx.doi.org or doi.org, so you end up with something like dx.doi.org/doi.org/10.5518/180, or even http://www.doi.org (which would never work anyway). Again, sometimes these are fixed (when we spot them at all) and sometimes they aren’t, even after repeated correspondence. It can feel like being Andy Dufresne in The Shawshank Redemption, sending a letter a week.
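As a rough illustration of how these prefix errors might be caught before publication, here is a minimal Python sketch. It assumes that a well-formed link is one of a few resolver prefixes (doi.org, or the legacy dx.doi.org) followed directly by the DOI. The names `check_doi_link`, `LinkCheck` and `ACCEPTED_PREFIXES` are hypothetical and not part of any publisher's or repository's tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Resolver prefixes treated as acceptable in this sketch (an assumption):
# doi.org is the current resolver, dx.doi.org the legacy one.
ACCEPTED_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)

# A DOI starts with "10.", a numeric registrant code, then "/" and a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}(?:\.\d+)*/\S+")


@dataclass
class LinkCheck:
    link: str
    doi: Optional[str]  # the bare DOI found in the link, if any
    well_formed: bool   # True only if link == accepted prefix + DOI


def check_doi_link(link: str) -> LinkCheck:
    """Flag DOI links whose resolver prefix has been mangled or doubled."""
    link = link.strip()
    match = DOI_PATTERN.search(link)
    if match is None:
        return LinkCheck(link, None, False)
    # Everything before the DOI must be exactly one accepted prefix.
    prefix = link[: match.start()]
    return LinkCheck(link, match.group(0), prefix in ACCEPTED_PREFIXES)


for candidate in [
    "https://doi.org/10.5518/180",
    "dx.doi.org/doi.org/10.5518/180",
    "https://-doi.org/10.5518/191",
    "http://dx.doi.org/http://doi.org/10.5518/2",
]:
    result = check_doi_link(candidate)
    print(result.well_formed, result.doi, candidate)
```

Because DOI suffixes can legitimately contain letters, a purely syntactic check like this cannot spot characters wrongly appended to the end of a DOI; only resolving the link would reveal that.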

One explanation, perhaps, is that the practice still isn’t fully mainstream, despite the RCUK Common Principles on Data Policy stating that published results should always include information on how to access the supporting data (RCUK, 2015). Meanwhile, according to the RCUK Concordat on Open Research Data, “publishers should enable the formal citation of data in articles”; however, there is currently no standardised method for describing how supporting data can be accessed. Moreover, many journals (still) include data as “supplementary” information, which is unlikely to have a unique identifier, may sit behind a journal paywall and may not be readily discoverable (see this post from David Kernohan for further discussion: Research Data Management, journals and supplementary materials).

At Leeds we advise that data files should be deposited in a recognised repository, which provides long-term curation with appropriate metadata and enables proper citation. We also advise that a prominent data availability statement should be included in the body of the paper AND that the data should be cited as an entry in the reference list. In practice, we generally see one or the other. Or neither.

N.B. Where supporting data is cited in the reference list, there is an additional need to differentiate it from other cited data sources. It has been suggested (see Mietchen et al., 2015) that journals could use JATS (Journal Article Tag Suite) to differentiate between different types of content in a reference list.
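To make that suggestion concrete, here is a minimal Python sketch that builds a JATS reference-list entry explicitly tagged as a dataset (`publication-type="data"`), using the Naicker and Rees dataset from the example below. The element names follow JATS 1.1 or later, where `<data-title>` was introduced; the function name `jats_data_citation` is hypothetical, and the publisher value "University of Leeds" is an assumption based on the archive that hosts the data.

```python
import xml.etree.ElementTree as ET


def jats_data_citation(authors, data_title, publisher, doi):
    """Build a JATS <element-citation> labelled as a dataset.

    authors: list of (surname, given_names) tuples.
    """
    citation = ET.Element("element-citation", {"publication-type": "data"})

    group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
    for surname, given in authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given

    # <data-title> carries the dataset title (JATS 1.1+).
    ET.SubElement(citation, "data-title").text = data_title
    ET.SubElement(citation, "publisher-name").text = publisher
    # Store the bare DOI rather than a full URL.
    ET.SubElement(citation, "pub-id", {"pub-id-type": "doi"}).text = doi

    ET.indent(citation)  # pretty-print (Python 3.9+)
    return ET.tostring(citation, encoding="unicode")


print(jats_data_citation(
    [("Naicker", "S.S."), ("Rees", "S.J.")],
    "Geothermal Heat Pump System Operational Data: High Frequency "
    "Monitoring of a Large University Building",
    "University of Leeds",
    "10.5518/255",
))
```

Keeping the DOI as a bare `<pub-id>` means the resolver prefix only needs to be added once, when the reference is rendered, which seems to be the step where doubled prefixes creep in.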

A good example of a data access statement:

Naicker, SS and Rees, SJ (2018) Performance analysis of a large geothermal heating and cooling system. Renewable Energy, 122. pp. 429-442. ISSN 0960-1481 https://doi.org/10.1016/j.renene.2018.01.099

In this example, the data is properly referenced in the body of the article*, and the ‘Data statement’ is clearly labelled as a discrete section in the article outline, so it can be linked to directly with an anchor link.

The statement itself is very clear:

Data statement

The data collected in this work has been made publicly available at the University of Leeds Research Data Archive https://doi.org/10.5518/255 [23]. This archive includes the high frequency temperature and flow rate data for each loop. Data definitions and error protocols are documented with this data.

* Although this is a good example, even it is imperfect: the entry in the reference list lacks some of the appropriate elements (it isn’t labelled as a dataset, and no publisher is given):

S.S. Naicker, S.J. Rees Geothermal Heat Pump System Operational Data: High Frequency Monitoring of a Large University Building

Get in touch

We would be interested to hear from Research Data Management support at other universities. Have you had similar experiences or are we just unlucky? What advice do you give your researchers? Do you have any clever ways of checking data access statements?

Perhaps the problem isn’t peculiar to data DOIs but also affects cited journal articles, i.e. it simply reflects journal editing processes and software anomalies? If you’re a publisher who happens to be reading this, please let us know your policies and processes. How can we best avoid these problems in the future?

Example 1.

The article:

Dijkstra, AG and Prokhorenko, VI (2017) Simulation of photo-excited adenine in water with a hierarchy of equations of motion approach. Journal of Chemical Physics, 147 (6). 064102. ISSN 0021-9606 https://doi.org/10.1063/1.4997433

The problem:

The data access statement is in the Acknowledgements section and reads: “Simulation data are available at https://doi.org/10.5518/191”. The displayed text is correct, but the underlying link was formatted as https://-doi.org/10.5518/191, so it did not resolve.

Status: Fixed (at the first time of asking)

Example 2.

The article:

Mengoni, M, Luxmoore, BJ, Wijayathunga, VN et al. (3 more authors) (2015) Derivation of inter-lamellar behaviour of the intervertebral disc annulus. Journal of the Mechanical Behavior of Biomedical Materials, 48. pp. 164-172. ISSN 1751-6161 http://dx.doi.org/10.1016/j.jmbbm.2015.03.028

The problem:

The dataset cited in the reference list was displayed correctly, but the underlying link was http://dx.doi.org/http://doi.org/10.5518/2 (presumably the result of an automated process). We asked for it to be corrected and it was, eventually, updated, introducing an entirely new error: a couple of rogue characters were appended to the DOI, which consequently doesn’t resolve:

Mengoni, Wilcox Ovine annulus fibrosus interlamellar material model calibration data-set
University of Leeds, UK (2015)
[dataset] http://dx.doi.org/10.5518/2An

(should be http://dx.doi.org/10.5518/2)
Status: Attempted fix introduced a new error
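In both examples the DOI shown to the reader was originally right while the link behind it was not. One possible check, sketched here in Python under the assumption that you can get hold of the article's HTML, is to compare every link's visible DOI with its `href`. The names `LinkCollector`, `mismatched_doi_links` and `RESOLVERS` are hypothetical, and the accepted resolver prefixes are an assumption.

```python
import re
from html.parser import HTMLParser

DOI_PATTERN = re.compile(r"10\.\d{4,9}(?:\.\d+)*/\S+")
RESOLVERS = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/")


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_doi_links(html):
    """Return (text, href) pairs whose href doesn't resolve the displayed DOI."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    problems = []
    for href, text in collector.links:
        match = DOI_PATTERN.search(text)
        if match is None:
            continue  # link text shows no DOI, so there is nothing to compare
        expected = {prefix + match.group(0) for prefix in RESOLVERS}
        if href.strip() not in expected:
            problems.append((text, href))
    return problems


sample = (
    '<a href="https://-doi.org/10.5518/191">https://doi.org/10.5518/191</a> '
    '<a href="http://dx.doi.org/http://doi.org/10.5518/2">http://dx.doi.org/10.5518/2</a> '
    '<a href="https://doi.org/10.5518/255">https://doi.org/10.5518/255</a>'
)
for text, href in mismatched_doi_links(sample):
    print(f"displayed {text} but links to {href}")
```

This would have flagged the original problems in both examples, but not Example 2's later error if the rogue characters appear in both the displayed text and the link; catching that still requires actually resolving the DOI.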