Only weeks into January and, like an inveterate smoker, I’m struggling to honour my New Year’s resolution and falling behind in my MOOC, largely because I’m already scrabbling up the learning curve of a new role. When I do get round to it, I find it very useful and naturally complementary to everything I am learning on the job.
We’re now approaching the end of week 3…but I only completed the assignment for week 2 yesterday, focused on the humble Data Management Plan, or DMP to its friends.
The practical assignment presented us with a scenario to use as the basis for a DMP and suggested a couple of tools, DMPTool from the University of California and DMPOnline from the Digital Curation Centre, as well as the framework provided by the DCC Checklist for a Data Management Plan, v4.0. From a UK perspective it might be more typical to use DMPOnline, but I’ve already had a play with that, so in honor (sic) of today’s Presidential inauguration and the Special Relationship I chose to try DMPTool, which also includes a template from NSF-SBE as cited in the scenario.
As a newcomer to RDM, it’s the sheer complexity of the myriad aspects and associated best practice that I find so daunting, which is where formal planning comes in, to impose some order on the primordial project chaos.
Using the DCC Checklist, a DMP breaks down as follows:
- What data will you collect or create?
- What documentation and metadata accompany the data?
- How will you manage any ethical issues?
- How will you manage copyright and intellectual property issues?
- How will the data be stored and backed up during research?
- How will you manage access and security?
- Which data should be retained, shared, and/or preserved?
- What is the long-term preservation plan for the dataset?
- How will you share the data?
- Are any restrictions on data sharing required?
- What resources do you require to implement your plan?
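For anyone who likes to tinker, the checklist above can be treated as a simple template. The following is only a rough sketch in Python, not anything DMPTool or DMPOnline actually provides: the names `DCC_CHECKLIST`, `new_plan` and `unanswered` are my own inventions, and the sample answer is a placeholder.

```python
# The questions are copied from the checklist above; everything else
# (names, structure, sample answer) is a hypothetical illustration.
DCC_CHECKLIST = [
    "What data will you collect or create?",
    "What documentation and metadata accompany the data?",
    "How will you manage any ethical issues?",
    "How will you manage copyright and intellectual property issues?",
    "How will the data be stored and backed up during research?",
    "How will you manage access and security?",
    "Which data should be retained, shared, and/or preserved?",
    "What is the long-term preservation plan for the dataset?",
    "How will you share the data?",
    "Are any restrictions on data sharing required?",
    "What resources do you require to implement your plan?",
]


def new_plan():
    """Return an empty plan: one blank answer per checklist question."""
    return {question: "" for question in DCC_CHECKLIST}


def unanswered(plan):
    """List the checklist questions that still have no answer."""
    return [q for q in DCC_CHECKLIST if not plan.get(q, "").strip()]


plan = new_plan()
# Placeholder answer, loosely based on the assignment scenario.
plan["What data will you collect or create?"] = (
    "Telephone survey responses collected with CATI software."
)

remaining = unanswered(plan)
print(f"{len(remaining)} of {len(DCC_CHECKLIST)} questions still to answer")
```

Nothing clever, but it makes the point that a DMP is essentially a structured set of answers, and that it is easy to see at a glance which sections are still blank.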
A DMP then serves two primary functions:
- to describe the data produced in the course of a research project
- to outline the data management strategies that will be implemented both during the active phase of the research project and after the project ends
Research funding bodies, including the Economic & Social Research Council (ESRC), the Engineering and Physical Sciences Research Council (EPSRC) and the Wellcome Trust in the UK, as well as the EU Horizon 2020 programme, increasingly require a DMP as part of the application process. Even if you don’t have formal funding for your research, though, writing a DMP can be an invaluable exercise, for PhD students and other postgraduate researchers (PGRs) for example. This was the focus of a recent training session delivered by my colleagues Rachel and Graham, ‘Research Data Management Essentials for your Research Degree’. It was delivered to a cohort of 24 PGRs, with more signed up than showed up, and there is a waiting list for future sessions, so there is clearly an appetite for RDM amongst PGRs (see here for details of future sessions).
Back to research funding bodies: while I have no doubt they wish to make life easier for researchers, they also require a DMP for somewhat more pragmatic purposes:
- Transparency and openness – since many funding bodies are allocating public money, they have a responsibility to ensure research outputs are preserved and made accessible to the public
- Return on investment – maximise potential reuse of data and if someone invents the wheel, ensure that money doesn’t need to be spent again to reinvent it
By now I do have some experience of real-life DMPs, mainly based on our application-stage template, which is primarily used to identify any potential costs but which we also use to promote best practice. For the practical exercise I tried to expand beyond the parameters of the supplied case study rather than just rewrite it in the form of a DMP.
For example, the scenario refers to computer-assisted telephone interviewing (CATI) software where “the final, cleaned data will consist of a single SPSS file”, but I googled CATI (which apparently outputs something called Blaise data files) and tried to think through the implications of SPSS as a preservation format:
Data format and dissemination
Blaise data files are a proprietary data format and therefore not suitable for preservation and data sharing. SPSS is also proprietary software but is nevertheless in widespread use within research institutions and unlikely to present access problems in the short to medium term. However, the data will also be exported as .csv where possible and (anonymised) interview transcripts will be retained as plain text (.txt) to ensure accessibility without specialist software.
I found DMPTool fairly user-friendly, though the NSF-SBE template arguably repeats requirements across some of the section guidelines, a point others have also commented upon, and could perhaps be clearer to avoid overlap.
The other aspect of data management planning that I am particularly interested in is the emphasis that it should be a ‘living’ document, revisited throughout a research project. I may be wrong, but I get the sense that this is rather paid lip service, and I would like to explore tools and strategies to prompt a PI to proactively revisit their DMP. I was at pains to emphasise this in my own plan, stating that “it will be proactively reviewed on a monthly basis and/or at suitable project milestones (TBC)”. It won’t of course, but I have the excuse that this is only pretend…
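As a thought experiment on what such a prompt might look like, here is a minimal Python sketch that works out a monthly review schedule and flags any reviews that have slipped. The function names and project dates are entirely hypothetical, and it assumes reviews fall on the same day each month (capped at the 28th so that every month has one).

```python
from datetime import date


def monthly_review_dates(start, end):
    """Return review dates one month apart, beginning a month after
    `start` and never going past `end`.

    The day of the month is capped at 28 so it exists in every month.
    """
    day = min(start.day, 28)
    year, month = start.year, start.month
    dates = []
    while True:
        # Step forward one calendar month, rolling over the year if needed.
        month += 1
        if month > 12:
            month = 1
            year += 1
        review = date(year, month, day)
        if review > end:
            break
        dates.append(review)
    return dates


def overdue_reviews(schedule, last_reviewed, today):
    """Return scheduled reviews that fell due after the last actual review."""
    return [d for d in schedule if last_reviewed < d <= today]


# Hypothetical project running from February to the end of July.
schedule = monthly_review_dates(date(2017, 2, 1), date(2017, 7, 31))
missed = overdue_reviews(
    schedule, last_reviewed=date(2017, 2, 1), today=date(2017, 5, 15)
)
for due in missed:
    print("DMP review was due on", due.isoformat())
```

In practice the nudge would need to arrive somewhere a PI actually looks, a calendar invite or an email perhaps, but even a list of missed dates makes the ‘living document’ claim testable.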