Timescapes: Archiving and sharing Qualitative Longitudinal data

Professor Bren Neale describes some of the challenges of managing qualitative longitudinal research data and reflects on the relationship between the RoaDMaP project and Timescapes, one of RoaDMaP’s case study projects.

Timescapes provides an interesting research data management case study, partly because Timescapes staff have a wealth of experience of data management practice, but also because Timescapes has created a subject-specific archive that is managed within an institutional data management structure. A recent leaflet outlining the new UK Data Service states that “the UK Data Service will be operating a more selective collections development policy which will include working with universities towards the goal of holding more research data within institutional repositories.” The relationship between different types of repository is something we will need to be aware of and plan for. The following post by the Director of Timescapes, Professor Bren Neale, outlines the Timescapes programme and its research data management challenges.

Timescapes – our Research

One of the major tasks of the ESRC funded Timescapes programme (www.timescapes.leeds.ac.uk, 2007-12) was to create a specialist archive of Qualitative Longitudinal (QL) social science research data for sharing and re-use. QL data, which are gathered over time through in-depth interviews and ethnographic methods, explore the lived experience of change and continuity in the social world and give insights into how and why micro-processes unfold.

The Timescapes Data Archive

Over the five years of the programme we set up the Timescapes archive as a collaborative venture with the University of Leeds Institutional Repository (LUDOS), using the DigiTool platform from ExLibris, and hosted by the University Library. We developed the resource in collaboration with the UK Data Archive and, in doing so, adhered to national level standards for data management and archiving.

By the close of our funding in May 2012, we had archived nine social science datasets, comprising nearly 3,000 files of multi-media data. Eight of these datasets were drawn from a network of projects that were funded through Timescapes to explore the dynamics of family lives and relationships. The ninth dataset (on the experiences of health and illness) was drawn from our network of affiliated projects.

The Timescapes Affiliation Scheme

The affiliation scheme was set up to encourage data sharing and re-use. Over the course of our funding we supported the development of over 50 QL projects, and found ourselves expanding into interdisciplinary areas of scholarship as QL methods became more widely established. This included a project on the dynamics of transport, funded by the Engineering and Physical Sciences Research Council. We currently have a queue of affiliated projects ready to deposit data in the archive, demonstrating that researchers see value in making their data available to share.

Facilitating re-use

To encourage secondary use, and develop a community of users for the resource, we set up a secondary analysis demonstrator project and a series of training workshops; to date we have over 200 users registered for the archive and requests for registration continue to grow steadily. These are significant achievements in a context where hardly any QL datasets were available for re-use at the outset of our programme. New projects continue to seek affiliation with Timescapes, benefitting from ongoing methodological and data management advice. This reflects a growing commitment among researchers to archive and share data as an integral part of the research process.

New models of archiving

The advances outlined above were achieved within Timescapes through a stakeholder model of data sharing and re-use. In this model, archiving does not occur in a vacuum; it is harnessed to particular research agendas and becomes embedded in the research process as a project unfolds. This is important in QL research because there is no clear point at which a primary project, which is addressing dynamic research questions and producing cumulative findings, comes to an end and secondary use can begin.

Archiving, in this context, serves a dual purpose. It becomes a useful tool for the safe storage and longitudinal use of data by the originating team, as well as creating archive-ready data for wider sharing and re-use. The archive is set up in such a way that primary researchers can store their data in secure areas of the resource, with the originating team controlling who can access the data. The sense of ‘giving data away’ is therefore avoided. Providing data security through such access controls is often preferable to anonymising data, since anonymisation can strip some qualitative data (especially audio and images) of their integrity and meaning.
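To make the access model above concrete, here is a minimal sketch under simplified assumptions: each archived file carries one of two hypothetical access levels (open to any registered archive user, or restricted to users approved file by file), and only members of the originating team can grant approval. The class and level names are illustrative inventions; they do not describe how DigiTool or EPrints actually implement access control.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    # Hypothetical levels: any registered archive user vs. users approved per file.
    REGISTERED = "registered"
    APPROVED = "approved"


@dataclass
class ArchiveFile:
    name: str
    owner_team: set[str]  # members of the originating research team
    level: AccessLevel
    approved_users: set[str] = field(default_factory=set)

    def can_access(self, user: str, registered_users: set[str]) -> bool:
        # The originating team always keeps access to its own data.
        if user in self.owner_team:
            return True
        # Everyone else must at least be a registered archive user.
        if user not in registered_users:
            return False
        if self.level is AccessLevel.REGISTERED:
            return True
        # APPROVED files: only users the originating team has approved for this file.
        return user in self.approved_users

    def approve(self, approver: str, user: str) -> None:
        # Only the originating team may grant file-level ("approved") access.
        if approver not in self.owner_team:
            raise PermissionError(f"{approver} cannot approve access to {self.name}")
        self.approved_users.add(user)
```

Keeping approval on the individual file, rather than on the whole dataset, mirrors the fine-grained control the post describes: a team can release transcripts while withholding audio, without having to anonymise the audio first.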

Timescapes – in the longer term

In the longer term, Timescapes aims to build further collections of QL datasets for sharing and re-use, bringing related datasets together through the archive, and providing refined means of thematic searching and data retrieval both within and across projects. This creates the opportunity for new forms of cross-project analysis and the potential to enhance the evidence from studies that are often small-scale, scattered and localised in their findings and impact. For example, we have recently set up a network of projects that are researching the voluntary sector using QL methods.

Funding permitting, we will scale up the evidence on the third sector through a programme of archiving, data sharing and knowledge exchange activities across the network. Of crucial importance, this new project will also promote the ethos of data sharing and re-use within policy and practice communities. Two further networks are under development, both of which address important themes for public policy (environmental sustainability and the lived experience of welfare reform).

Data Management Planning (DMP) & RoaDMaP

Data Management Planning (DMP) is integral to the developments outlined above, and has prompted us to produce guidelines for QL researchers who are facing the challenges of organising and presenting cumulative waves of data for their own and others’ use (see www.timescapes.leeds.ac.uk/about/timescapes-methods-guide-series for our methods guides on the archive, secondary analysis, the ethics of data sharing and re-use, and data management planning).

Our inclusion as a case study in the JISC funded RoaDMaP project here at Leeds has highlighted this important dimension of our work and enabled us to reflect on the processes involved and how we might have managed things better. We are also considering what we need to do in the longer term to sustain and improve the data resource that we have created, both in terms of technical development and its scientific value and use to the research community.

Research Data Management challenges

The challenges that we have faced straddle two domains: research and archiving. The research tasks are associated with the generation and safe storage of data, and preparation for archiving and sharing, including:

  • Identifying a lead researcher to take overall responsibility for DMP, sourcing suitable training for this role, and costing and allocating sufficient time and budget for this task from the outset as an integral part of a project.
  • Ensuring high technical and scientific standards for the generation of data in the field.
  • Developing and applying ethical templates to seek permission from research participants to archive and share data about their lives, including transferring copyright to researchers.
  • Specifying and applying mechanisms for the safe storage, formatting, ‘future proofing’ and labelling of data files that accumulate over time, to enable longitudinal as well as case based analysis by both primary and secondary teams.
  • Developing and applying ethical and technical protocols for the representation of a dataset for archiving, re-use and dissemination purposes, including templates for the layout of interview transcripts, and for anonymising data, including multi-media data where appropriate.
  • Developing and applying gold standards for the production of metadata (data about data) to document and contextualise a dataset to aid longitudinal and secondary use.
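The labelling task in the list above can be illustrated with a file-naming convention that encodes project, participant pseudonym, wave and data type, so that the same set of files supports both a longitudinal, case-based view (one participant across waves) and a cross-sectional view (one wave across participants). The pattern below is a hypothetical example chosen for this sketch, not the convention Timescapes used.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: <project>_<participant>_w<wave>_<kind>.<ext>
# e.g. "transport_P007_w02_transcript.rtf"
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<participant>P\d{3})_w(?P<wave>\d{2})"
    r"_(?P<kind>[a-z]+)\.(?P<ext>[a-z0-9]+)$"
)


@dataclass(frozen=True)
class FileLabel:
    project: str
    participant: str
    wave: int
    kind: str
    ext: str

    def filename(self) -> str:
        # The zero-padded wave number keeps files in wave order when names are sorted as text.
        return f"{self.project}_{self.participant}_w{self.wave:02d}_{self.kind}.{self.ext}"


def parse_label(name: str) -> FileLabel:
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return FileLabel(m["project"], m["participant"], int(m["wave"]), m["kind"], m["ext"])


def case_history(names: list[str], participant: str) -> list[FileLabel]:
    """All files for one participant, ordered by wave (longitudinal, case-based view)."""
    labels = [parse_label(n) for n in names]
    return sorted((lab for lab in labels if lab.participant == participant), key=lambda lab: lab.wave)


def wave_snapshot(names: list[str], wave: int) -> list[FileLabel]:
    """All files from one wave, ordered by participant (cross-sectional view)."""
    labels = [parse_label(n) for n in names]
    return sorted((lab for lab in labels if lab.wave == wave), key=lambda lab: lab.participant)
```

Because `parse_label` rejects names that do not match, a check like this can be run as each new wave is deposited, catching mislabelled files before they accumulate.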

Archiving challenges

The archiving challenges faced within Timescapes (set out below) may not currently apply in many research contexts, but they are likely to become more relevant in future as institutional repositories assume greater responsibility for the curation, preservation and sharing of research data, and as archiving and data sharing increasingly come to be seen as a collaborative venture between research and archiving teams. Particular challenges for Timescapes have included:

  • Building and sustaining the archive collections within an institutional repository, requiring archive and repository to advance in tandem, technically and scientifically, and with adequate institutional support.
  • Working with and applying national level standards for data curation and dissemination as part of our collaboration with the UK Data Archive.
  • Ensuring an appropriate software platform to maximise ease of use, and technical backup to maintain the platform. We are currently facing the challenge of migrating the archive to a new open source software platform (EPrints) in line with developments in the University of Leeds LUDOS system.
  • Creating and applying metadata (cataloguing) templates for the ingest of data into the resource.
  • Creating a useful interface and search and retrieval tools in line with the analytical needs of researchers. We have identified the need to improve the interface and search tools once we have migrated to EPrints.
  • Making provision for varied levels of access controls, including ‘approved’ access (fine grained, file level access) that enables secure deposit and controls on re-use for the benefit of primary teams.
  • Tagging files in the resource to enable thematic searching and retrieval of data within and across projects (e.g. through the assignment of key words to data files and free text searching).
  • Supporting QL researchers in data management planning and facilitating creative synergies between research and archiving.
  • Ongoing collaboration with ‘stakeholder’ data depositors and users e.g. seeking feedback on and refining the presentation of archived data and metadata.
  • Securing resources and skilled staff to manage, develop and promote the resource and ensure its medium and longer-term sustainability through external and institutional funding.
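The tagging challenge in the list above (assigning keywords to data files, plus free-text searching, within and across projects) can be sketched as a small in-memory index. This is an illustrative assumption about how such search could work, with hypothetical class and method names; it is not a description of the DigiTool or EPrints search functions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedFile:
    project: str
    name: str
    keywords: set[str]
    text: str = ""  # e.g. a transcript, used for free-text search


class ThematicIndex:
    def __init__(self) -> None:
        self.files: list[TaggedFile] = []
        # Inverted index: lower-cased keyword -> files tagged with it, in deposit order.
        self.by_keyword: dict[str, list[TaggedFile]] = defaultdict(list)

    def add(self, f: TaggedFile) -> None:
        self.files.append(f)
        for kw in f.keywords:
            self.by_keyword[kw.lower()].append(f)

    def by_theme(self, keyword: str, projects: Optional[set[str]] = None) -> list[TaggedFile]:
        """Files tagged with a keyword; pass projects to search within them, or None to search across all."""
        hits = self.by_keyword.get(keyword.lower(), [])
        return [f for f in hits if projects is None or f.project in projects]

    def free_text(self, phrase: str) -> list[TaggedFile]:
        """Case-insensitive substring search over each file's text."""
        p = phrase.lower()
        return [f for f in self.files if p in f.text.lower()]
```

The single `projects` filter is what lets the same keyword query serve both within-project and cross-project analysis; a production repository would use a proper search engine rather than substring matching.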

The challenges outlined above are substantial, but there are also significant rewards in working at the cutting edge of new archiving developments and supporting a new ethos of data sharing. We hope that our involvement in RoaDMaP will help us in future to hone our skills and refine our practices, as well as promoting new ways of sharing data that are in line with researcher needs.

Reflections on Developing a Pilot Training Course on Research Data Management

Dr Jim Baxter, Senior Staff Development Officer at the University of Leeds reflects on developing a pilot training course for RDM.

(RoaDMaP is working closely with the University of Leeds central staff training unit (SDDU – Staff and Departmental Development Unit) to develop RDM training for researchers at the institution. Working with SDDU increases the likelihood that training will be developed in an embedded and sustainable way, and brings invaluable prior experience of delivering researcher training. Dr Jim Baxter, Senior Staff Development Officer, is a member of the RoaDMaP training workgroup and will deliver our pilot training session. We are also working with the Digital Curation Centre, who will deliver the data management planning elements of the pilot. The rest of this blog post is by Jim Baxter and provides some reflections on developing a pilot course on research data management for researchers in the Faculty of Engineering at the University of Leeds.)
  1. The University has developed, and had approved, a research data management policy that identifies the University’s responsibility to train its staff in research data management. Experience of running courses for research students suggests that they have little awareness of the need to back up their research data. The EPSRC has asked universities to put processes in place to ensure that research data created with EPSRC funding are made publicly available. Thus, there is an implied need for research data management training. The RoaDMaP project involves three academic areas that are aware of the need to manage research data; otherwise, this training need has not been identified by researchers themselves.
  2. There are, broadly, two groups of staff who need training in research data management:
    (a) the researchers who use and create the research data;
    (b) the professionals who support research staff, including:

    • IT staff who provide hardware, software and guidance on its selection and use for managing research data,
    • library staff, who have expertise in information literacy and are well placed to advise researchers on research data management,
    • research and innovation support staff who administer grant applications and awards.
  3. Different types of training and awareness raising are needed:
    (a) creating awareness amongst research staff of the need for, and benefits of, managing research data. This is crucial in getting researchers to realise that they need to develop their research data management knowledge and skills;
    (b) developing researchers’ knowledge and skills in planning for, and carrying out, research data management;
    (c) developing the knowledge and skills of IT and library staff so that they can support researchers in managing research data.

    (d) All of the above needs to be done initially for everyone, and in the medium term for those who are new to academic research or to Leeds.

  4. The focus of the first pilot at Leeds will be raising awareness amongst research students and research staff.
  5. A number of courses already exist. Some are technical in nature and are probably most relevant to those who already realise they need research data management training. Two courses at the awareness-raising level that we found useful are the one developed at the University of Bath as part of Research360 and a course developed at the University of Durham (Managing Your Research Data).
  6. The model we are working on at Leeds is to develop a generic course into which discipline-specific examples can be slotted to make it relevant to a particular audience. The Durham and Bath slide sets provide generic bases. Discipline-specific examples that bring the issues alive for a particular group are not obviously available. This issue was raised at the JISC Research Data Management Training Workshop held in London on 26 October 2012.
  7. The Leeds pilot course follows an agenda of four questions: why manage research data, what are research data, what is research data management, and how is research data management done? This agenda is not the same as that of the Durham or Bath courses, although many of the themes overlap.
  8. At the heart of this agenda is the process of creating and using research data. Many lifecycle models have been developed. Some have a technical or detailed focus that becomes useful once the need for research data management has been recognised. Other, simpler models exist; however, none of these has given me a model that I feel helps me tell the story to researchers. I have therefore developed my own interpretation of the research data lifecycle, based on the existing models. I think the key here is for the person delivering the training to have a model that suits their style of delivery.