Open in order to…contribute to the global digital commons: University collections and Wikimedia

In today’s post for International Open Access Week we explore the value of Wikipedia and Wikimedia Commons for exposing University collections to a wider audience, whether research articles, data or Special Collections.

Thanks to Richard Nevell at Wikimedia UK for his input to this post.

Beyond compliance

So much of the discussion around open access is focussed on compliance that it’s easy to lose sight of the more noble ambitions of OA, to democratise knowledge by ensuring that primary research is freely available to all. To inform the global public no less!

Like an individual vote in a referendum, uploading a green version of your article or dataset to an institutional repository might not seem to make much difference, yet its scientific rigour, academic objectivity or sociological insight incontrovertibly contributes to the digital global commons.

Knowledge networks

Making your work available online is only the first step in helping the right audience to find it, an “audience” that might no longer even be (only) human, and it is increasingly important to ensure that both you and your work are networked online, using persistent digital identifiers for example.

According to data from Crossref, Wikipedia was the sixth-largest referrer of DOI clicks in 2015/2016 and, as of 18th October 2017, 1004 DOIs associated with the University of Leeds (978 articles / 26 books / 4 chapters) are cited across 1197 individual Wikipedia pages (thanks to Terry Bucknell of Digital Science for kindly supplying this data).

In a previous post we discussed adding repository links to cited sources in Wikipedia, in addition to the DOI (N.B. I have since been informed that I should use the archiveurl | archivedate protocol, though there seems to be some disagreement about how best to add repository links discussed here on WikiProject Open).

In any case, the University of Leeds has very nearly 1000 publications cited across 1157 pages* yet a search reveals a mere 96 links to records in WRRO – interestingly there are also 155 links to PhD theses in WREO. As both repositories are shared services across the Universities of Leeds, Sheffield and York, the numbers specifically for Leeds outputs will be lower still.

* I’ve discounted books as these aren’t necessarily directly associated with Leeds, i.e. the citation can be a reference to a chapter within a book to which a Leeds author has contributed a different chapter.

Despite these relatively low numbers, referrals from Wikipedia are significant for both WRRO and WREO, being the 17th and 19th top referrers respectively:

[Figure: Wikipedia referrals to WRRO and WREO]

Given the immense potential value of OA links to Wikipedia and the inevitable frustration of paywalled DOIs it seems clear that we should encourage contributors to include legitimate links to OA versions where possible (without replacing the DOI or other link to the version of record). However, given the scale of the issue might there be an opportunity to leverage the Jisc supported CORE aggregation service, for example, which will potentially provide OA links to documents from the global repository network?

Wikimedia Commons

Wikimedia Commons is a repository of openly licensed media files – images, video and audio – for use in education and, like Wikipedia itself, anyone can upload or edit material. It also makes it very easy to embed media files across Wikimedia projects.

Haua Fteah cave

According to Wikipedia, Haua Fteah is a large karstic cave located in the Cyrenaica in northeastern Libya. The page includes a section on stratigraphy and layout of the cave which cites a 2014 article archived in WRRO – the affiliation is York rather than Leeds but the global digital commons is obviously bigger than any one institution or repository! The article is published OA on the publisher’s own site so there is no need to add the WRRO link; the DOI provides full access to everyone. As the article is CC-BY, we can use an image of the cave from the paper to illustrate Wikipedia, and the easiest way to do that is from Wikimedia Commons – also embedded here.

https://commons.wikimedia.org/wiki/File:Haua_Fteah_cave.jpg

Research data – more than just spreadsheets

Using its DOI a dataset can of course be cited in Wikipedia in exactly the same way as a journal article or it might provide a unique source of additional material as in the case of Hugh Davies.

To quote Wikipedia once again, Hugh Seymour Davies (23 April 1943 – 1 January 2005) was a musicologist, composer, and inventor of experimental musical instruments. He has also been the subject of extensive research by Leeds academic Dr James Mooney.

On Saturday 17 October, 2015 a concert of music composed by, or in response to the work of, Hugh Davies, was staged at the Clothworkers’ Centenary Concert Hall, including a pre-concert talk by Dr Mooney which has been preserved as a dataset in the Research Data Leeds repository (https://doi.org/10.5518/57) and which is included on Wikipedia as an external link.

In truth research data often is spreadsheets or other forms of numeric or textual data; however, as in the case of this one-off concert footage, it can also comprise all sorts of rich media material that can be uploaded to Wikimedia Commons and embedded in Wikipedia with a suitable citation. Baxter, for example, is an industrial robot also used in robotics courses at universities including Leeds.

[Video: Baxter_full.webm]

Natural Language Acquisition and Grounding for Embodied Robotic Systems is a conference paper presented at the Thirty-First AAAI Conference on Artificial Intelligence in San Francisco. The associated dataset (https://doi.org/10.5518/110) includes videos of Baxter manipulating different objects which can be uploaded to Wikimedia Commons under the terms of CC-BY, with a full citation, and used to illustrate the Wikipedia page using a single line of embed code.
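To give a sense of what that single line of embed code looks like, here is a minimal sketch in Python that builds a standard wikitext file link. The function name and the caption text are made up for illustration; only the `[[File:...|thumb|...]]` syntax comes from MediaWiki itself, and the 220px width simply mirrors the thumbnail above.

```python
def commons_embed(filename: str, caption: str, width_px: int = 220) -> str:
    """Build a wikitext line that embeds a Wikimedia Commons file as a thumbnail.

    The caption must not contain a "|" character, because wikitext uses the
    pipe to separate parameters.
    """
    if "|" in caption:
        raise ValueError("caption must not contain '|'")
    return f"[[File:{filename}|thumb|{width_px}px|{caption}]]"


if __name__ == "__main__":
    # Hypothetical caption; the file name is the one shown in this post.
    print(commons_embed("Baxter_full.webm", "Baxter manipulating objects"))
```

The full citation for the dataset would still need to be added on the Commons file page and in the article's references; the embed line only places the media.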

Special Collections

Leeds University Library is the only library to have as many as 5 Designated collections. Designation status is a mark of distinction awarded by Arts Council England to outstanding collections of national and international importance held by non-national institutions. One of these is the Cookery Collection, currently on display in the Treasures of the Brotherton Gallery and the first to have its very own Wikipedia article created as part of a Wikimedia internship run by Special Collections in 2016-2017.

The future

At Leeds University Library, our exploration of the potential of Wikimedia projects is at an early stage. Nevertheless we recognise their immense potential to share information with the world.

In the future we would like to organise an Edit-a-thon, related to another designated collection perhaps, or around a particular discipline where there is established expertise at the University of Leeds.

REF2021: towards Open Research

With the funding bodies’ Initial decisions on the Research Excellence Framework 2021 published at the beginning of September including a paragraph on ‘open research’ we consider what this might mean as the REF takes shape.

29. The revised template will also include a section on ‘open research’, detailing the submitting unit’s open access strategy, including where this goes above and beyond the REF open access policy requirements, and wider activity to encourage the effective sharing and management of research data. The panels will set out further guidance on this in the panel criteria. 

Initial decisions on the Research Excellence Framework 2021 (pg 9)

While still some way from full Open Access in the UK we are getting closer, largely thanks to HEFCE’s “Policy for open access in the post-2014 Research Excellence Framework” which came into effect in 2016, on April Fools’ day in fact. Nevertheless it has been taken very seriously. REF is no laughing matter!

The REF has sometimes been maligned as an expensive bureaucratic exercise ill-fitted for purpose, yet the goal of promoting the value and impact of publicly funded research is surely worthwhile and, for advocates of all things ‘open’, it at least provides a stick on which to dangle our carrots.

In lieu of the further guidance promised, can we pre-empt some of the activity and initiatives that might contribute to ‘open research’ above and beyond the REF open access policy requirements?

N.B. See the updated HEFCE FAQ, specifically:

7.1. What aspects of OA should submitting unit’s include in the environment statement section titled ‘open research’?

Research Data

It is good to see this referred to explicitly at this early stage, following on from the Concordat on Open Research Data published in July 2016 focused on ensuring that research data is made openly available wherever possible.

In actual fact research data was already an eligible output for REF in 2014 and the exercise in 2021 will continue to assess “all types of research and forms of research output”. Nevertheless infrastructure and best practice around RDM are still developing. At Leeds the RDL team based in the Library provide support and advice throughout the research lifecycle. We run an institutional data repository providing long term, secure storage and associating data with a Digital Object Identifier (DOI), a persistent identifier that will facilitate formal citation. Alternatively use the Registry of Research Data Repositories (re3data) to identify a suitable discipline specific repository.

Other useful organisations include Jisc and the Digital Curation Centre.

Potential questions for REF2021:

  • Is the data underpinning your submitted outputs safely stored according to best practice?
  • Is that data openly available (if appropriate) or is it clear how it can be accessed (i.e. does the paper include a suitable data statement)?
  • Has your data been reused by other researchers / initiated collaboration?
  • Do you have established protocols for data management planning that are followed for all research projects?

ORCID

ORCID is an open, non-profit, community-based initiative that provides a unique identifier to reliably differentiate individual authors and enables connections between systems. Linking your ORCID to Symplectic, for example, will provide an additional method for the system to reliably identify your published work and add it to your Symplectic profile. Your ORCID will also be passed over to the White Rose Research Repository (WRRO) when you deposit a manuscript.

ORCID increasingly underpins an open scholarly infrastructure, nationally and internationally and is also supported by Jisc.
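Part of what makes an ORCID reliable for connecting systems is that the final character of the 16-character iD is a checksum (ISO 7064 MOD 11-2), so a mistyped identifier can usually be caught before it is stored. The sketch below is a minimal illustration of that check in Python; the function names are our own, and 0000-0002-1825-0097 is the fictitious sample iD used in ORCID's documentation, not a Leeds author.

```python
def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for an ORCID iD."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    body = compact[:15]
    if len(compact) != 16 or not (body.isascii() and body.isdigit()):
        return False
    return orcid_check_character(body) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from ORCID's docs
    print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered
```

A check like this only confirms that an iD is well formed; it says nothing about whether the iD is registered or belongs to the right person.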

Related post: So you’ve got an ORCiD…what next?

Potential questions for REF2021:

  • Do all of your submitted authors have an ORCID?
  • Are they using their ORCID profile effectively?
  • Are you actively using ORCID to integrate systems and improve workflows?

Collaboration

Collaboration is another area discussed in the document, which identifies “an explicit focus on the submitting unit’s approach to supporting collaboration with organisations beyond higher education” (pg 6, para 18).

The benefits of open research to collaboration opportunities with such organisations are obvious, whether the NHS or SMEs who may not otherwise be able to find or access the research and data they need to further their own mission. Perhaps there is also a question here of targeted dissemination, via social media for example – making research available online doesn’t mean the right people will simply stumble across it.

Potential questions for REF2021:

  • Have you adopted open research practices that are conducive to collaboration?
  • To what extent have these been successful?
  • Are you proactively building and monitoring a network around your research (e.g. by leveraging alternative metrics)?

Impact

The document acknowledges that work is required to align definitions of ‘academic impact’ and ‘wider impact’ which relate respectively to the assessment of outputs and the impact element of the REF. Notably the weighting for impact has increased from 20% to 25% – as was in fact originally proposed for the 2014 exercise.

There will be additional guidance on the criteria for both ‘reach and significance’ and impact arising from public engagement – it is not hard to anticipate how an open research agenda will feed into each of these. There is evidence that OA increases traditional citations, for example, while developments in alternative metrics or “altmetrics” are enabling online social activity around research to be recorded and measured.

Repository downloads also provide a valuable article level metric, indeed we might expect correlation with traditional citations, even causation. The IRUS-UK* service provides COUNTER compliant download statistics for the majority of UK based repositories, which means that download counts are standardised and exclude automated downloads, by search engine robots for example.

* With 3,766,192 downloads since October 2013, and as might be expected for a consortium of 3 research intensive Universities, IRUS-UK reveals that the White Rose Research Repository is one of the most highly downloaded in the UK. Leeds accounts for 1,773,744 of those downloads.
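To make the idea of standardised, robot-filtered download counts a little more concrete, here is a rough Python sketch of the general approach: drop requests from known robots and collapse rapid repeat requests from the same client into one download. This is not IRUS-UK's actual pipeline; the `Download` record, the list of robot markers and the 30 second window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed, illustrative markers; real services maintain much longer robot lists.
ROBOT_MARKERS = ("bot", "crawler", "spider")


@dataclass(frozen=True)
class Download:
    when: datetime
    client: str       # e.g. an IP address or session id
    item: str         # repository item identifier
    user_agent: str


def count_downloads(events, window=timedelta(seconds=30)):
    """Count downloads per item, skipping robots and rapid repeat requests."""
    last_seen = {}    # (client, item) -> time of the most recent request
    counts = {}
    for ev in sorted(events, key=lambda e: e.when):
        agent = ev.user_agent.lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # automated request, not counted
        key = (ev.client, ev.item)
        previous = last_seen.get(key)
        last_seen[key] = ev.when
        if previous is not None and ev.when - previous <= window:
            continue  # repeat click inside the window, not a new download
        counts[ev.item] = counts.get(ev.item, 0) + 1
    return counts
```

Applying the same rules to every repository is what makes the resulting figures comparable from one institution to another.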

Potential questions for REF2021:

  • To what extent are you engaging with audiences beyond academia?
  • Do you produce plain language precis of your research?
  • Are you exploiting social media to engage with academic and lay audiences (e.g. Twitter, blogs, Wikipedia)?
  • Are you analysing quantitative data from these sources?

Related post: Wikipedia, information literacy and open access

The Research Support team based on Level 13 of the Edward Boyle Library will continue to review REF guidelines as they are released and associated developments across the sector. You can get in touch by email or on Twitter.

In the meantime, you must ensure your research outputs meet the new REF open access requirements by depositing your author accepted manuscript via Symplectic as soon as possible after acceptance https://library.leeds.ac.uk/university-publications

 

 

Repository Fringe, 2017 – Beyond Borders

Posted by Rachel Proudfoot

http://rfringe17.blogs.edina.ac.uk/ – programme

https://www.era.lib.ed.ac.uk/handle/1842/2403 – presentations from the event

Repository Fringe is an annual event in Edinburgh where anyone interested in repositories and research outputs can share experience, expertise and learn about developments in the repository field. 2017 marks the 10th Repo Fringe and this year was, in part, a celebration of how we have shared content ‘beyond borders’ over the last decade. The Research Data Leeds team explored the theme in our ‘Galactic Interfaces’ poster about working with arts and humanities researchers and data. (The poster is currently on display in the Research Hub on Level 13 of the Edward Boyle Library).

The conference offered a mix of keynote talks, short presentations, ‘birds of a feather’ sessions, posters and of course lots of informal networking over tea and biscuits.

Repositories: problem or solution?

Shortly before the conference, Elsevier announced it had acquired BePress. Concern about the amount of control large commercial publishers have over research dissemination was a recurrent theme in the conference. This is nothing new, but we are seeing large publishers increasingly pushing into the ‘open access’ arena. Keynote Kathleen Shearer, Executive Director of COAR, suggested that, financially, Universities are as much over a barrel now with article processing charges for ‘gold’ open access articles as we were (still are) with journal ‘big deals’ and hikes in journal subscription costs. Shearer challenged the conference: are repositories helping to perpetuate a highly flawed scholarly communications system? Shearer is part of the Next Generation Repositories Working Group which will be publishing a set of recommendations in Sept 2017. She suggested we need to rethink repository design so we have ‘repositories of the web, not just on the web’. This may involve supporting peer review (another speaker pointed out Elsevier’s controversial patent of the online peer review system), improving discovery of research more than we currently do, making sure metadata is machine readable and taking a stronger lead on digital preservation. We should also develop a shared, international vision and common ways of working which reduce the risk of academic research being disproportionately shaped, controlled and charged-for by commercial interests. For Shearer, we need a more coherent alternative – and we’re certainly not there yet.

Active promotion of content

A few presentations suggested ways that repositories can promote content in addition to curating it. Gavin Willshaw from University of Edinburgh gave a great example of promotion as part of a project to digitize 17,000 PhD theses. Edinburgh have highlighted theses from notable alumni, such as Gordon Brown, Arthur Conan Doyle and Helen Pankhurst, have linked PhD theses to author pages in Wikipedia e.g. https://en.wikipedia.org/wiki/Ernest_Francis_Bashford and have uploaded older theses to Wikisource, Wikimedia’s online library of out of copyright works.

Other discussion looked at what role, if any, a repository can have for impact case studies / research impact more generally. Could the repository promote research and/or capture more usage and impact data? Is there a role for repositories to host lay summaries of research to make research more accessible to a non-specialist audience – be they the ‘general public’ or researchers from other academic disciplines?

Easier, embedded metadata creation which will make researchers’ lives easier

Well, we can dream! One of the keynote speakers, Andrew Millar, outlined a vision of specialist tools designed to support an academic ‘community of practice’ which make it easier to capture metadata and contextual information as a routine part of research practice. Millar is a systems biologist and suggested Fairdom is a widely used tool which helps to capture metadata in a standard experimental workflow. https://fair-dom.org/

Such domain specific tools could link painlessly to shared repositories if we adopted common standards of data exchange. Tools discussed in the context were:

  • http://www.researchobject.org/ – packages documents, code and data into a zip file with manifest. Designed to be flexible across different subject areas.
  • https://combinearchive.org/index/ – a way of packaging documents, models and data together using the Open Modelling EXchange format (OMEX).
  • BagIt – uses a file and directory naming convention (a payload directory plus checksum manifests) for structuring digital content
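As an illustration of how lightweight this kind of packaging can be, the sketch below builds a very small BagIt-style bag using only the Python standard library: a `data/` directory for the payload, a `bagit.txt` declaration and a `manifest-sha256.txt` listing a checksum for each payload file. The `make_bag` function is our own hypothetical helper, not part of any of the tools above, and it ignores optional parts of the specification such as `bag-info.txt` and tag manifests.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_files, bag_dir):
    """Copy files into bag_dir/data and write a BagIt declaration and manifest.

    Files are stored flat under data/, so two sources with the same file name
    would overwrite each other; a real tool would preserve directory structure.
    """
    bag = Path(bag_dir)
    payload = bag / "data"
    payload.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for src in map(Path, source_files):
        dest = payload / src.name
        shutil.copy2(src, dest)
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{src.name}")

    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

Because the manifest travels with the files, whoever receives the bag can recompute the checksums and confirm that nothing was lost or altered in transfer.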

A presentation by Rory Macneil and Megan Hardeman demonstrated an end to end workflow, capturing information via an electronic lab notebook in the RSpace digital research platform and depositing directly into the Figshare repository via an easy to use and embedded tool.

Hopefully repository uptake will increase – and we’ll get more enthusiastic engagement from researchers – if we can get closer to their everyday workflows and provide relatively pain free deposit options.

Copyright

Anthea Wallace’s absolutely excellent presentation showed examples of how public domain works – either deliberately or unintentionally – have had restrictions imposed on reuse. As Wallace put it, closed licences won’t stop bad people from doing bad stuff with your data but may well stop good people doing good stuff. Wallace promoted the Copyright Cortex https://copyrightcortex.org/ as a helpful resource for researchers in digital humanities. I partly mention this presentation as an excuse to use one of Wallace’s examples: the transcription of music from a human bottom in Hieronymus Bosch’s Garden of Earthly Delights which you can see below. You can also listen to an adaptation of the ‘Butt Music’ on YouTube.

Hieronymus Bosch’s Garden of Earthly delights

http://animalnewyork.com/2014/listen-hieronymus-boschs-butt-song-hell/

More from the Fringe

Incentivising open practices – Digital Curation Centre (Authored by Sarah Jones with corrections from Dr Paul Ayris)

Repository Fringe 2017

We are looking forward to the Repository Fringe next week, now in its 10th year, and coinciding as always with the far less entertaining Edinburgh Fringe. We will be presenting a poster (to follow, see below for a taster), perhaps telling a few jokes, and sharing expertise and experience with our fellow repository professionals.

The full programme is available at http://rfringe17.blogs.edina.ac.uk/programme/ and you can follow on Twitter @repofringe | #rfringe17

Galactic Interfaces: navigating the creative data universe

The poster takes its title from a piece of music in one of the datasets in the Research Data Leeds repository. ‘Galactic Interfaces’ is a semi-improvised piece about interactions and contrasts; rather like developing a research data service. The poster will use the galactic theme to show how working with arts and humanities researchers has launched us from planet ‘EPSRC data compliance’ to boldly go where the research data service has not gone before. We use ‘Galactic Interfaces’ in research data training sessions to encourage researchers to step outside their own world and think creatively about their data and metadata. Our galactic journey has taken us into the Special Collections galaxy where we have been working on developing a common language so we can understand each other. We have a landing party visiting the digital humanities nebula and we’re launching a rescue mission for project web sites currently being drawn into a giant black hole.

Black hole

Much valuable work has been done with creative data already in other repository services (VADS, UAL etc.); for a repository in a multi-disciplinary institution like University of Leeds, working with creative data has shifted thinking about our research data service and where its long term value may lie. It has prompted consideration of the variety of data contributors; who should be acknowledged for their creative input, and how? How do we licence data with third party content? How do we capture and package data from practice-based creative disciplines? Do we have a role in bringing together data and researchers from different spheres – virtually, but also in physical space for discussion and exploration? Borders are being crossed, redrawn and broken down and we are re-plotting our star charts! (We will also reach beyond the borders of the poster by making it interactive.)

Research Data Network – University of York – June 2017

If Jisc’s 4th Research Data Network earlier this week felt a bit rushed at times, it only reflects the sheer number of exciting projects happening across the sector.

There’s still a long way to go but it felt like the dots are really starting to join up and there was lots of energy in both real and virtual space – see Storify of tweets on the #JiscRDM tag during the event.

Delegates busy networking on Tuesday evening at RDN York (thanks to Paul Stokes for the photo, used with permission)

Two packed days in York were bookended by an inspiring opening keynote from Mark Humphries asking “Who will use the Open Data?” and by a panel session the following afternoon on the principles and practice of open research, informed by the open research pilot project at the University of Cambridge.

Mark emphasised that there is a clearer rationale in some academic contexts than others. Clinical trials, for example, are time consuming and expensive and need to be safe and effective which provides a clear motivation to share data and check conclusions.

Mark singled out his own discipline of neuroscience however as lagging behind, with no discipline specific open data repositories, and inclined to “data worship”. New data is hard to get and requires considerable skill (to implant electrodes in a rat’s cortex for instance) and will underpin high-impact papers, that universal currency of academia. It’s not for sharing!

Mark reassured us, nevertheless, that open data is the future. Inevitably. If only due to the sheer scale of data being generated which simply has to be shared if it is to be analysed effectively, citing an instance whereby a single dataset generated 9 high quality papers from several labs. RDM isn’t trivial though, which is one of the main reasons that funding bodies are mandating data sharing.

Some 28 hours later, we were back in the same lecture theatre for the final session chaired by Marta Teperek. Our four panelists fielding questions from the floor were David Carr (Wellcome Trust), Tim Fulton, Lauren Cadwallader (both University of Cambridge) and Jennifer Harris (Birkbeck University).

There was a great deal of emphasis on the cost of open research and sustainability – by way of answer to the question above, Lauren Cadwallader referred to her recent blog post Open Resources: Who Should Pay? and shared her reservations about the ‘gold’ model of open access that is sustained by expensive Article Processing Charges to commercial publishers.

There are similarities and synergies between OA and open data initiatives, including increasing interest from publishers. There are also significant differences and it was pointed out from the floor that long term preservation is a cost that needs to be borne by someone.

Betwixt these bookends were far too many sessions to discuss in detail, covering everything from the European Open Science Cloud (EOSC) to an update on the work HESA is doing in relation to research data in the context of REF2021, Archivematica for preservation and some fantastic resources for business case development and costing for RDM (including a number of useful case studies). Then there’s the Research Data Alliance which *anyone* is able to join and which offers a window onto many different communities.

It was particularly interesting to learn about ongoing developments with Jisc’s shared service which is working with 13 pilot institutions on repository and preservation solutions and comprises a range of tools to capture, preserve, disseminate and allow reporting. The pilot offer also includes training, support and gathering of best practice. Pilot users will be testing these systems throughout the summer and providing feedback with a view to rolling out production between April and July 2018.

The UK research data discovery service (beta), part of the Jisc Research at Risk challenge to develop RDM infrastructure, enables the discovery of data from UK HEIs and national data centres.

Leeds contributed to the event by sharing lessons learned when setting up our RDM service and with a lightning talk.

All in all a valuable couple of days with lots of information still to synthesise and file away. Indeed to preserve in one’s cortex…now where’s that neuroscientist?

Slides from all sessions and extensive notes are available from https://research-data-network.readme.io/v2.01/docs/4th-research-data-network-york-university

Research data: enabling peer review

We are starting to get requests to make data available for peer review prior to the journal paper being accepted. Some authors are happy for the data to go live in the repository with a note explaining the data is under review and may be subject to change. However, not all authors are happy with putting their pre-review data into the public domain and a better model would be restricted access. There are additional challenges associated with single and double blind peer review and any model based on the (institutional) repository will necessarily reveal the affiliation of an author due to the institutional URL.

Images from datasets in the Research Data Leeds Repository

Increasingly journals manage this themselves via a partnership with Data Dryad* or Figshare but not all have a suitable mechanism set up for access to data in addition to the draft of the paper. Moreover, such a journal-centric model will disadvantage institutionally based data repositories, potentially even render them obsolete (see pros and cons of journals handling data below).

* http://datadryad.org/pages/submissionIntegration

Might there be a role for Jisc here, to build a suitable mechanism into their shared service which, from a blind-review perspective, would have the advantage of obscuring author affiliation?

Potential solutions

1. Make the data available in the repository. Don’t mint a DOI. Send the URL to the reviewers. Include a prominent note on the eprint record ‘This data is associated with a paper which has been submitted for publication. The data may be subject to change [date]. Full details of the associated publication and the final dataset will be made available in due course.’

2. Make the data available in the repository with access control. Repository account enables access to the dataset only from specific user account(s). Problem: this is not available yet (for EPrints)?

3. Share the data via OneDrive. This may not be suitable for double blind peer review. However, if the journal can act as a liaison point i.e. the editor is given access to the data on OneDrive, the journal could then provide access details to the peer reviewers. This could be a good solution if the journal is willing.

4. Share the data in another repository – Figshare, Zenodo – which supports restricted access prior to publication of a dataset. This is a good way to share data with a restricted group, but may not be suitable for single or double blind peer review – unless the journal publisher can act as the access gateway as in the OneDrive model outlined in 3. One downside – why bother to deposit in RDL if the data is already in Figshare or similar?

5. Ask the journal if they can help – there may be a mechanism for providing access to the data, although this may not be in place. There is a risk that data will become supplementary information or be deposited in another repository (if we see this as a problem), which would reduce the role for RDL.

Option                                                Hide creator  Hide reviewer  Hidden to world
1  Data available in repository                       N             N              N
   access through publisher                           N             Y              N
2  Data available in repository with access control   N             N              Y
   access through publisher                           N             Y              N
3  Share data via OneDrive                            N             N              Y
   access through publisher                           N             Y              Y
4  Share data in another repository                   Y             N              Y
   access through publisher                           Y             Y              Y
5  Ask journal if they can help                       Y             Y              Y
   Jisc Shared Services?                              ?             ?              ?
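One way to read the comparison above is to ask which routes would suit a given review model. The short Python sketch below encodes the rows exactly as listed and filters them; the rule that single blind review needs the reviewer hidden and double blind review needs both creator and reviewer hidden is our own working assumption, and the Jisc row is left out because its values are still unknown.

```python
# (route, hides_creator, hides_reviewer, hidden_to_world), as in the comparison above.
ROUTES = [
    ("1 repository", False, False, False),
    ("1 repository, access through publisher", False, True, False),
    ("2 repository with access control", False, False, True),
    ("2 repository with access control, access through publisher", False, True, False),
    ("3 OneDrive", False, False, True),
    ("3 OneDrive, access through publisher", False, True, True),
    ("4 another repository", True, False, True),
    ("4 another repository, access through publisher", True, True, True),
    ("5 ask journal", True, True, True),
]


def routes_for(review_model: str):
    """List routes compatible with 'single' or 'double' blind review.

    Assumption: single blind needs the reviewer hidden; double blind needs
    both the data creator and the reviewer hidden.
    """
    if review_model not in ("single", "double"):
        raise ValueError("review_model must be 'single' or 'double'")
    matches = []
    for route, hides_creator, hides_reviewer, _hidden_to_world in ROUTES:
        if not hides_reviewer:
            continue
        if review_model == "double" and not hides_creator:
            continue
        matches.append(route)
    return matches


if __name__ == "__main__":
    print(routes_for("double"))
```

On these assumptions only the routes that go through the journal or publisher can support double blind review, which matches the observation that a repository URL reveals the author's affiliation.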

White Rose Libraries Digital Scholarship Event

Last week I attended an event in Sheffield that brought together colleagues from across the White Rose consortium (Universities of Leeds, Sheffield and York) to explore developments in Digital Scholarship. Whatever that might be…

Indeed, several speakers throughout the day drew attention to potential problems of terminology – the other common descriptor is Digital Humanities – with Ben Outhwaite in his keynote differentiating between the plain scroll and the later codex to illustrate that technology has always facilitated new methods of analysis and that digital technology isn’t qualitatively any different. Digital Humanities is simply humanities research driven by the opportunities offered by new media.

Anne Horn, University Librarian (Sheffield), conversational in her introduction, provided a preliminary definition and elicited perspectives from the audience. Anne emphasised interdisciplinary collaboration, with the Library as an active participant – a theme that recurred throughout the day – and suggested that communities coalesce around both technology and processes as well as content and datasets. She talked about the challenges of building and sustaining the broad range of knowledge and skills required, an area in which the Library has a clear role.

In one of several academic viewpoints throughout the day, Mike Pidd described how the Digital Humanities Institute at the University of Sheffield is self-funded through project collaboration and supports technology R & D in the humanities with services ranging from data acquisition, data modelling and data management to data visualisation and preservation and sustainability. We learned about just a few of the projects within the HRI, like The Digital Panopticon which has brought together genealogical, biometric and criminal justice datasets held in the UK and Australia to explore the impact of different types of punishments on the lives of 90,000 people sentenced at the Old Bailey between 1780 and 1875. The scale of the project is impressive having linked records across 45 separate datasets both public and commercial (e.g. Ancestry UK) illustrating a common challenge negotiating with data providers.

Other projects include Locating London’s Past*, Old Bailey Online*, Linguistic DNA and Mark My Bird, all of which are capturing and reusing data in innovative ways, backing up Mike’s statement that “data is just as important for your career as publishing books and articles”.

* Raw XML data from London Lives and Old Bailey Online is available from Sheffield’s data repository ORDA

The Library Showcases, reprised in the afternoon, were an opportunity for us to learn about digitisation projects within archives and special collections across the consortium:

The presentation from York, for example, emphasised the complexity of these types of project, which require a broad range of skills, from traditional document preservation to digitisation/ingest and development of an editorial interface (the editing tool for the Archbishops’ Registers is available from GitHub).

Digitised excerpt from Henry VIII’s divorce from Anne of Cleves (Archbishops’ Registers)

High-quality digital images facilitate zooming in with no loss of fidelity

A couple of academic viewpoints spanned lunchtime with Louise Hampson from the Centre for the Study of Christianity & Culture at the University of York and Brett Greatley-Hirsch from the University of Leeds.

Louise talked about the legacy issues of migrating CD-ROMs to internet-based resources, covering both the practical difficulties for a small team and the need to (re)negotiate usage rights, while Brett immediately won over the room by saying that libraries should be recognised as active collaborators and not mere support services.

Brett has come to Leeds via Australia and Canada and introduced us to Digital Renaissance Editions, which publishes open-access electronic critical editions of non-Shakespearean early modern drama.

The second of my Library Showcases was Sheffield’s National Fairground & Circus Archive, a “living” archive actively “contributing to the organisation and promotion of shows and festivals”, which drove home yet again the broad range of skills required to curate digital material.

All of which brought us to an energetic keynote from Ben Outhwaite, who described a somewhat fragmented landscape at Cambridge, with various pockets of work that perhaps lack cohesion across a University where STEM subjects tend to prevail. The University is, however, beginning to look at the area strategically, to support its digital humanists who might be collaborating with scholars elsewhere, through the Digital Humanities Network, a university-funded, short-term, strategic initiative. Ben also talked us through the high-profile Casebooks Project, which is making available the astrological records of Simon Forman (1552-1611) and Richard Napier (1559-1634), “unparalleled resources in the history of early modern medicine”.

The best projects are idea-led, not technology-led, according to Ben, and there needs to be a real scholarly need, a theme that came through strongly in presentations throughout the day, with digital technology an integrated aspect of all projects. Digitisation, though, undoubtedly leads to more opportunities.

Crucially, “You can’t do anything without data, collect and look after the data rigorously”.