Summary of findings from RDM survey, 2012 including link to anonymised survey data
The Leeds Research Data Survey ran from July 2012 to 1st Nov 2012. The survey was planned in conjunction with the University’s Research Data Steering and Working Groups. We reviewed use of the Data Asset Framework at other institutions including the Universities of Bath, Edinburgh and Southampton.
Questions we wanted to answer..
It was agreed the Leeds priority was establishing a high level overview of data assets and current research data management practices – primarily to inform capacity planning. We aimed to maximise survey uptake by offering a relatively short set of questions. The survey was primarily targeted at PIs to avoid multiple submissions regarding the same data.
What we did..
The survey was created using the Bristol Online Surveys platform and publicised in conjunction with the new University Research Data Management Policy via an email from the PVC for Research and Innovation. Publicity was sent to researcher email networks and to all Faculty Research Managers with encouragement to forward to their networks. A link to the survey was included in the For Staff area of the University website and in the August Staff enewsletter. Members of the Research Data Steering and Working Groups encouraged completion. No prize incentive was offered.
What we found..
Full survey results are online as an Excel spreadsheet at: http://tinyurl.com/c28lk5n
242 completed responses were received and analysed. It was surprising to find the largest percentage of survey responses came from the Arts Faculty, which has far fewer grant holders than Engineering, Environment or Medicine and Health. We had responses from all 9 Faculties at the University. Unsurprisingly, as the survey was aimed at PIs, 80% of our responses came from academic staff, with the rest mainly coming from research assistants, clinical staff and around 7% from Post Graduate Research students.
A full list of data types is available from the survey spreadsheet. The top ten most common formats are:
|Documents (e.g. text, Microsoft Word, PDF), spreadsheets||
|Statistical data sets (e.g. SPSS, Stata, SAS)||
|Books, manuscripts (including musical scores)||31|
|Laboratory notebooks, field notebooks, diaries||30|
|Photographs / other images||28|
|Interviews (including transcripts)||28|
|Laboratory instrument data (e.g. from microscopes, chemical analysers, monitors etc.)||24|
|Computer software (e.g. modelling / simulation, schemas)||24|
|Models, algorithms, scripts||20|
The output from BOS also allows us to break down the types of data by Faculty.
The view across Faculties illustrates some differences in types of storage used, volume of data generated and how much data researchers believe would need to be kept for others to validate their research findings. Respondent numbers in some Faculties are low so findings may not be fully representative. Nonetheless, they may serve to illustrate broad Faculty differences and similarities – for example, respondents from all Faculties are making some use of Cloud storage, primarily Dropbox.
Respondents were asked to estimate how much their data volume was likely to change over time. The majority indicated an increase of 25% or less. There was a methodological issue with this question (some respondents were frustrated that ‘stay the same’ and ‘don’t know’ options were not offered). Nonetheless, the small number of researchers anticipating growth of more than 25% suggests we may not have a huge increase in data to manage in the immediate term.
40% of respondents do not create metadata for their research projects; 36% create metadata and 24% create metadata but did not realise this is what they were doing. Of the respondents who create metadata, 20% felt it was ‘full’, 46% ‘partial’ and 34% ‘extremely limited’. Some commentators recognised the importance of metadata and the need to be more systematic in creating it. Others noted that there may not be specific resource within a project to tackle metadata: “Attempt to do as full a job as possible, but on large projects this is a huge undertaking – not usually funded separately to research time.”
Attitudes on the appropriate level of metadata to enable re-use varied: “I create corpora for re-use by others, but cannot know what else they will use it for – so I cannot predict what metadata they would like, hence it is most practical to keep this to a minimum.” “We have developed protocols for gold standard metadata collection and its presentation.”
Data Management Planning
The headline finding that 44% of respondents had completed a data management plan (DMP) hides wide variation across Faculties, with a much lower percentage from MAPS, LUBS and ARTS and a higher percentage from ESSL and FMH. Respondents showed a wide variety of attitudes towards data management planning – it is clearly part of the culture in some subject areas whereas others see it as unnecessary bureaucracy. Although some respondents were completing detailed and diligent DMPs, other comments were along the lines of “it was not very detailed, and I would hesitate to call it a formal DMP.”
Has a DMP ever been completed for any of your research projects?
Choose 3 words that best express what you see as the challenges of research data management at the University of Leeds.
65% of respondents were willing to be followed up. Some may be interviewed in more depth.
There are other RDM areas which are of immediate interest but which were not addressed through the survey. These could be explored through interview and/or in future surveys. Some examples:
More detailed profiling of data practices at a Faculty level, including identifying datasets in scope for an institutional data repository.
The appropriate point in the research data lifecycle for repository deposit.
How long researchers anticipate keeping their data for.
Responsibility for research data over time e.g. if the PI leaves.
The amount and location of non-digital data.
Attitudes towards sharing research data.
Awareness of and reaction to the University Research Data Management Policy.
Training and guidance requirements.
- The progress of RoaDMaP through ethical review took slightly longer than anticipated, resulting in the survey being publicised later than planned; future surveys should anticipate and allow time for ethical clearance.
- The number of free text comments suggests many of those filling in the survey were very interested in the area.
- At 242, the number of responses was somewhat lower than hoped (approximately 10% of the target group). We could have been clearer about the target numbers – for example, what level of response would allow generalisations to be made about storage requirements at the Faculty level.
- An individual champion for the survey in a Faculty can boost the response rate – for example, a direct approach from one of our Faculty of Biological Sciences professors.
- Most researchers are using a variety of storage locations; there was some concern about how many researchers are using hard drives on desktop computers and portable storage devices (generally not as their only storage location).
- Most datasets are
- The percentage of data that researchers in different Faculties wish to retain varies significantly.
- Storage (access to and security of) was the main area of concern for researchers.
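
On the question of what level of response would allow generalisations, the following is a minimal sketch, not part of the original analysis. It applies the standard margin-of-error formula for a proportion with a finite population correction. The target population of 2,420 is an assumption derived from "242 responses, approximately 10% of the target group". The 95% confidence level, the worst-case proportion of 0.5 and the function names are also illustrative assumptions.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from n responses.

    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    If a population size is given, a finite population correction is applied.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


def required_sample_size(population, target_moe, p=0.5, z=1.96):
    """Responses needed to reach target_moe in a population of the given size."""
    n0 = (z ** 2) * p * (1 - p) / target_moe ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# Assumption: 242 responses at roughly 10% implies about 2,420 in the target group.
ASSUMED_TARGET_GROUP = 2420

if __name__ == "__main__":
    moe = margin_of_error(242, ASSUMED_TARGET_GROUP)
    print(f"Margin of error with 242 responses: {moe:.1%}")
    needed = required_sample_size(ASSUMED_TARGET_GROUP, 0.05)
    print(f"Responses needed for a 5% margin: {needed}")
```

The same functions could be applied to a single Faculty by passing that Faculty's number of responses and an estimate of its number of PIs. The calculation assumes respondents are a random sample, which a self-selecting survey does not guarantee.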