In this fourth blog in the series, archivist Caroline Bolton looks at opportunities and challenges for enhancing catalogue data.
What does it involve?
Even structured catalogue data can be limited when descriptions (values) are inconsistent or buried in lengthy narrative. Enhancing them can improve consistency, make descriptions more meaningful and provide new access points (subject, person, organisation or place). This recent project showed that enhancement involves several stages:
- Extraction: Identifying access points in narrative descriptions and recording them in a structured way to improve filtering, browsing or search.
- Standardisation: Ensuring that data is consistent, e.g. adopting standards for formation of dates or names, or using controlled vocabularies to standardise terminology. This improves sorting of records.
- Disambiguation: Using recognisable (unique) identifiers (IDs) to describe who, where or what is in collections. This provides assurance that we are talking about the same person, place or subject, which is essential for entities with the same name/title (e.g. Steve McQueen the US actor or Steve McQueen the British film director). For places this might include geo-coordinates.
- Linked (Open) Data (LOD): Transforming identifiers into machine-readable links (URIs, Uniform Resource Identifiers) that web technologies can use. This disambiguates entities and enables common entities to be linked across collections over the (semantic) web. It can support discoverability and research by building knowledge and making connections between collections and those that feature in them. Interest in LOD is growing across the GLAM sector (Galleries, Libraries, Archives and Museums), with examples of initiatives to join up collections across historically siloed catalogues. The Library of Congress, Getty and OCLC FAST all now provide LOD services for their vocabularies.
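To make the last three stages more concrete, here is a minimal Python sketch. It is an illustration only, not the project's actual workflow: the authority entries, the identifiers (`person-0001`, `person-0002`) and the `https://example.org/authority/` base address are all made up for the example, and a real project would use an established authority instead.

```python
from datetime import datetime

# Hypothetical authority file: one standard name, two different people.
AUTHORITY = {
    "person-0001": {"name": "McQueen, Steve", "note": "US actor"},
    "person-0002": {"name": "McQueen, Steve", "note": "British film director"},
}

# Placeholder base address, not a real linked data service.
BASE_URI = "https://example.org/authority/"


def standardise_date(text):
    """Standardisation: turn '3 March 1921' into ISO 8601 '1921-03-03'."""
    return datetime.strptime(text.strip(), "%d %B %Y").date().isoformat()


def standardise_name(text):
    """Standardisation: turn 'Forename Surname' into 'Surname, Forename'."""
    parts = text.strip().split()
    if len(parts) < 2:
        return text.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def candidates(standard_name):
    """Disambiguation: list every identifier that shares this name."""
    return sorted(
        identifier
        for identifier, entry in AUTHORITY.items()
        if entry["name"] == standard_name
    )


def to_uri(identifier):
    """Linked data: express the chosen identifier as a link."""
    return BASE_URI + identifier


name = standardise_name("Steve McQueen")
for identifier in candidates(name):
    # A person still has to choose which candidate is meant.
    print(identifier, AUTHORITY[identifier]["note"], to_uri(identifier))
```

The name alone returns two candidates, which is exactly why the identifier, and not the name, has to be recorded in the catalogue.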
A note on selecting standard identifiers
There are many authorities to choose from for describing names, places and subjects. It is useful to consider:
- Who is responsible for maintenance? This can affect sustainability, reliability and development, and may reflect cultural/historic biases.
- Coverage: is it comprehensive?
- Support for linked data?
- Language? Will it support multilingual discovery?
- Targeted or general audiences: is it widely adopted? Is it only used by certain sectors or does it have broader application (e.g. Wikidata)? Is it a niche standard needed for engaging with specialist audiences? How widely an authority is used can affect its effectiveness. The Archives Hub Names Project is evaluating standards for names and looking at practical ways of encouraging adoption. Realistically it may be necessary to use legacy or niche authorities, but it is possible to use multiple standards or to use/create mappings between equivalent terms.
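A mapping between equivalent terms can be as simple as a table in which each row lists the identifiers that different authorities use for the same thing. The sketch below assumes two invented schemes, `legacy` and `general`, with invented identifiers; it only shows the shape of the idea.

```python
# Hypothetical crosswalk: each row holds equivalent identifiers
# from two made-up schemes, plus a human-readable label.
CROSSWALK = [
    {"legacy": "LOCAL-17", "general": "GEN-0452", "label": "Textile industry"},
    {"legacy": "LOCAL-23", "general": "GEN-1190", "label": "Trade unions"},
]


def translate(identifier, source, target):
    """Return the equivalent identifier in the target scheme, or None."""
    for row in CROSSWALK:
        if row.get(source) == identifier:
            return row.get(target)
    return None


print(translate("LOCAL-17", "legacy", "general"))
print(translate("GEN-1190", "general", "legacy"))
```

Keeping the legacy identifier alongside the mapped one means nothing is lost if the mapping later needs correcting.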
What approaches exist for enhancing catalogues?
Manual: Manually reviewing and updating catalogues is resource intensive but offers opportunities to crowdsource or engage remote volunteers or researchers.
(Semi-)automation: Accessible semantic tools such as OpenRefine can help enhance data at scale. These still require the user to verify the results but massively reduce the effort.
Automated/Artificial Intelligence: There are many semantic analysis tools and initiatives that are designed to identify people, places, subjects, and concepts in text (entity extraction) with increasing accuracy.
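As a rough illustration of what entity extraction means, the sketch below looks up known names in a description using a small word list (a gazetteer). This is a deliberately simple stand-in: the tools referred to above generally rely on trained language models rather than fixed lists, and the names, types and sample sentence here are invented for the example.

```python
import re

# Hypothetical gazetteer: known names and the type of access point.
GAZETTEER = {
    "Leeds": "place",
    "Yorkshire": "place",
    "University of Leeds": "organisation",
}


def extract_entities(text):
    """Return (name, type) pairs for gazetteer names found in the text."""
    # Try longer names first so 'University of Leeds' is preferred
    # over the shorter 'Leeds' contained within it.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return [(m.group(1), GAZETTEER[m.group(1)]) for m in pattern.finditer(text)]


description = "Letters sent from the University of Leeds to a mill in Yorkshire."
for name, kind in extract_entities(description):
    print(kind, name)
```

Whatever the tool, the output is a list of suggestions that still needs checking before it goes into the catalogue.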
Challenges and Opportunities
An enhanced catalogue offers increased opportunities for improving access, discoverability and understanding of collections and the stories within. The process of enhancement offers additional opportunities to:
- Engage local/specialist knowledge and skills to be involved in co-creation of enhanced catalogues.
- Grow digital skills and confidence in using technology for both archivists and researchers.
It also presents some new questions and challenges:
- Identifying priorities for enhancement. It may not be practical to enhance to all levels, or worthwhile for all collections (those containing recognised individuals, organisations and places may benefit).
- Settling on an approach for legacy catalogues needs planning and testing, but recent experience found that the resource required is reduced once enhancement becomes business as usual.
- How do we revise cataloguing practices to ensure an enhanced catalogue is created for new and born digital collections?
- How do we manage co-curation in cataloguing? Where does the remit of the archivist end and the researcher/volunteer begin in this process? How do we address IPR in collaborative catalogue descriptions and attribute the efforts of volunteers/researchers?
- How are cataloguing processes and systems equipped to integrate enhancements (such as holding URIs) and the enhancement process into the catalogue? Where tools have been used, how is this reflected, e.g. addressing confidence in accuracy for automated indexing?
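One possible way to think about the last question is to store, alongside each access point, how it was created and how confident the tool was. The sketch below is an assumption about what such a record might look like, not a description of any particular cataloguing system; the field names, the example links and the 0.8 threshold are all invented.

```python
from dataclasses import dataclass


@dataclass
class AccessPoint:
    label: str          # e.g. "McQueen, Steve"
    kind: str           # "person", "place", "organisation" or "subject"
    uri: str            # link to the authority entry
    method: str         # "manual", "semi-automated" or "automated"
    confidence: float   # 1.0 for manual entries, tool score otherwise
    verified: bool = False


def needs_review(points, threshold=0.8):
    """Return tool-generated access points that nobody has checked yet
    and whose confidence falls below the (arbitrary) threshold."""
    return [
        p for p in points
        if p.method != "manual" and not p.verified and p.confidence < threshold
    ]


points = [
    AccessPoint("McQueen, Steve", "person",
                "https://example.org/authority/person-0002", "manual", 1.0, True),
    AccessPoint("Yorkshire", "place",
                "https://example.org/authority/place-0007", "automated", 0.55),
]
for point in needs_review(points):
    print(point.label, point.method, point.confidence)
```

Recording the method also gives a natural place to attribute contributions from volunteers or researchers.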
This blog has focused on the enhancement of legacy catalogues for analogue collections, but the approaches will become increasingly relevant for cataloguing born digital collections.