The National Archives’ Bridging the Digital Gap trainee Marco Brunello discusses Special Collections’ latest audiovisual transcription project.
The aim of this project is to create transcriptions for audiovisual material held in Special Collections, making these collections more searchable and discoverable.*
The project started when digital archivist Chris Grygiel uploaded 844 videos from the South Bank Show Production Archive to MS Stream, a video sharing tool that auto-generates captions. A small team – myself included – then began the first round of transcription editing. This early testing was carried out to create a comprehensive guidance document before the task was extended to a wider team.
The MS Stream output could not, in fact, be used as-is: the software misinterpreted several spoken words, especially proper nouns, and punctuation had to be corrected to make the text readable. Additional issues arose because these videos are unedited production footage, which means that film crew members occasionally talk in the background, and even the interviewer’s speech is often unclear.
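Auto-generated captions of this kind are typically delivered as timestamped subtitle files (commonly WebVTT), so producing a standalone text involves stripping out the cue numbers and timings at some point. As a minimal illustrative sketch – assuming a WebVTT input, and not reflecting the team’s actual tooling – the flattening step could look like this:

```python
def vtt_to_text(vtt: str) -> str:
    """Collapse WebVTT caption cues into a single plain-text transcript.

    Drops the WEBVTT header, blank lines, numeric cue identifiers and
    timestamp lines, keeping only the caption text itself.
    """
    kept = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT" or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)

sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
This is unedited production

2
00:00:04.000 --> 00:00:07.000
footage from the archive."""

print(vtt_to_text(sample))
```

The real editing work – fixing misheard proper nouns and repairing punctuation – still has to be done by hand on the resulting text.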
Other questions which emerged during the early stage of this task included:
- How to mark each speaker in the transcriptions?
- What to do with swearing and offensive terms?
- Should we transcribe filler expressions (uhm, eh) or thought pauses (you know, yeah)?
Considering that these transcriptions are meant to be provided primarily as standalone texts, rather than in their original subtitle format, the team worked through these issues and agreed to:
- specify a speaker’s name in the first instance, then use their initials in the rest of the dialogue (with specific instructions on how to mark unidentified speakers);
- keep swearing and offensive terms, but add a sensitivity statement to the affected texts;
- follow the Intelligent Verbatim Transcription style guidelines when encountering filler expressions or thought pauses.
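The first of these conventions – full name on first appearance, initials thereafter – is mechanical enough to sketch in code. This is a hypothetical Python illustration (the names and the `label_turns` helper are invented for the example, and the project guidance’s specific rules for unidentified speakers are not reproduced here):

```python
def label_turns(turns):
    """Prefix each dialogue turn with a speaker label: the full name on the
    speaker's first appearance, their initials thereafter. Unidentified
    speakers keep their placeholder label throughout (a simplification;
    the real guidance has its own instructions for these).
    """
    seen = set()
    lines = []
    for speaker, text in turns:
        if speaker in seen and not speaker.startswith("Unidentified"):
            label = "".join(word[0] for word in speaker.split()).upper()
        else:
            seen.add(speaker)
            label = speaker
        lines.append(f"{label}: {text}")
    return "\n".join(lines)

turns = [
    ("Jane Smith", "Shall we start with your early work?"),
    ("Unidentified Speaker 1", "Sound is rolling."),
    ("Jane Smith", "Right, let's begin."),
]
print(label_turns(turns))
```

Here the second Jane Smith turn would be labelled "JS", while the unidentified crew member keeps the placeholder label each time.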
A first complete version of the guidance document was released to the wider transcription team in mid-April. It has since been updated as additional issues have arisen.
Editing a single transcription usually takes several hours, depending on the length of the video and the accuracy of the auto-generated captions. So far, almost 200 of the 844 videos have been processed, and the workflow created for this project may later be extended to other audiovisual collections (such as the Leeds Archive of Vernacular Culture).
*Transcripts are for University of Leeds staff and students only to make the content more accessible.