EPrints and Research Data, 15th October

Presentations and discussions on the ways EPrints is being used or could be used to handle research data

Participants travelled to Leeds from as far as Southampton and Strathclyde to discuss how EPrints is being – or could be – used to manage research data. We were particularly pleased that Andrew Bell was able to come along and represent EPrints Services. There was a lot of discussion on the day and, as one participant put it, it was a good opportunity ‘to validate, clarify and challenge our thinking’.

The programme from the day, including presentations, outputs from our discussion groups and a list of attendees, is online in Google Docs.

Most participants are at an early stage of research data repository development – some have already made the decision to use EPrints, others are still considering their options.

A report back from the event will be made to the next EPrints User Group at Glasgow School of Art in December 2013.

Some key themes of the day

It’s early days for all of us. The EPrints User Community could/should be a useful group to discuss common issues and tackle them collaboratively, complementing the EPrints tech list.

We need a mechanism to list EPrints development areas so we can express interest and priorities, and also indicate whether we, as EPrints sites, are planning to work on a specific development.

Make academics’ lives easier: don’t make them re-key information that already exists elsewhere. For example, how do we import metadata from grants systems and data management plans? Can we work with the Digital Curation Centre to import metadata from DMPOnline? How about importing metadata from electronic lab notebooks – is there scope for work with LabTrove?

Wherever possible, we want drop-downs rather than manual keying (e.g. a drop-down for grants; publication lookup).

What does EPrints do best? We should think about the role(s) EPrints plays in managing data: is it primarily a discovery metadata store which points to data (wherever the data sits)? Should it be used to visualise and navigate data or is that better performed by other systems? We discussed incorporating a visualisation layer on top of EPrints using Multivalent or similar to look at complex objects like CAD files and 3D models.

It’s also useful to consider what’s out of scope for EPrints – for example, the discussions on the day suggested we’re not expecting to manage volatile, frequently changing data in EPrints. What about very large datasets?

Access control: we’d like more granular access control to content within EPrints. In particular, it would be useful to be able to limit access to specific content by individual user / group of users. A summary of access requirements and possible solutions discussed on the day is online.
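To make the access requirement concrete, here is a minimal sketch of a per-item check by individual user or group. It is illustrative only and is not how EPrints implements access control: the AccessRule class, its fields and the can_access function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRule:
    """Hypothetical per-item rule describing who may read a deposited file."""
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)
    public: bool = False


def can_access(rule, username, user_groups):
    """Return True if the user may read the item governed by this rule."""
    if rule.public:
        return True
    if username in rule.allowed_users:
        return True
    # Membership of any one permitted group is enough.
    return bool(rule.allowed_groups & set(user_groups))


# Example rule: one named user plus everyone in one project group.
rule = AccessRule(allowed_users={"alice"}, allowed_groups={"project-x"})
```

In this sketch a named user and a group member are both granted access, while anyone else is refused unless the item is flagged as public.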

Security and confidentiality: are we in a position to handle confidential data? Should there be a confidentiality flag on ingest? Do we even look at two different instances of EPrints for open and closed data?

Metadata: what are the key metadata fields for discovery, when should metadata be captured and what about subject-specific metadata? A spreadsheet of core metadata fields was produced by the Research Data @ Essex project and used for ReCollect, the EPrints plug-in for research data. The spreadsheet is online in Google Docs and is now owned by the community. The fields have been mapped against various metadata standards, with comments and suggested amendments from several institutions. The spreadsheet will be enhanced with scope notes so that fields are used consistently. Our work in this area will be shared with colleagues developing the national data catalogue.
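As a rough illustration of what mapping fields against a metadata standard involves, the sketch below crosswalks a record to Dublin Core elements and reports any fields with no mapping. The local field names on the left are assumptions made up for this example; they are not taken from the community spreadsheet or from ReCollect.

```python
# Hypothetical local field names mapped to Dublin Core elements.
CROSSWALK = {
    "title": "dc:title",
    "creators": "dc:creator",
    "abstract": "dc:description",
    "date_collected": "dc:date",
    "doi": "dc:identifier",
}


def to_dublin_core(record):
    """Map a record's fields to Dublin Core; also list fields with no mapping."""
    mapped = {}
    unmapped = []
    for name, value in record.items():
        target = CROSSWALK.get(name)
        if target is None:
            unmapped.append(name)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


mapped, unmapped = to_dublin_core({"title": "Soil samples", "instrument": "XRF"})
```

The list of unmapped fields is the useful part for our purposes: it shows where subject-specific metadata has no home in the target standard.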

NB – it’s difficult to strike the right balance between gathering sufficient metadata and putting researchers off with onerous metadata requirements.

User-generated metadata may well be important in aiding re-use of data – particularly across different academic disciplines; it’s a facility we may well want to develop more fully in the future.

Physical data: we’ll want EPrints records for non-digital data e.g. physical specimens. What protocols will we use for this?

Deposit routes: we tend to assume data will be uploaded directly to EPrints (rather like the workflows for research papers). There is already variation in workflow to EPrints (for example, via Current Research Information Systems like PURE and Symplectic); data is likely to make workflow even more diverse. We will certainly need ways to import metadata and associated files.

Roles, responsibilities and workflow will be different for data: the basic QA and editorial work done in Libraries for publications may need more specialist skills when applied to data.

Links between data and other scholarly outputs: ideally we’ll want to link to journal articles, conference papers etc. which analyse the data.

There was some enthusiasm for a hack/mash-up event to tackle some of the identified areas for development such as visualising/rendering particular types of data e.g. geospatial data. A multi-day event would be desirable to really get to grips with defining requirements and coding. Leeds will look into potential funding for such an event.