Digital scholarship blog

Enabling innovative research with British Library digital collections

18 December 2024

The challenges of AI for oral history: key questions

Oral History Archivist Charlie Morgan sets out some key questions for oral historians thinking about AI, and shares examples of automatic speech recognition (ASR) tools in practice, in the first of two posts...

Oral history has always been a technologically mediated discipline and so has not been immune to the current wave of AI hype. Some have felt under pressure to ‘do some AI’, while others have gone ahead and done it. In the British Library oral history department, we have been adamant that any use of AI must align practically, legally and ethically with the Library’s AI principles (currently in draft form). While the ongoing effects of the 2023 cyber-attack have also stymied any integration of new technologies into archival workflows, we have begun to experiment with some tools. In September, I was pleased to present on this topic with Digital Curator Mia Ridge at the 7th World Conference of the International Federation for Public History in Belval, Luxembourg. Below is a summary of what I spoke about in our presentation, ‘Listening with machines? The challenges of AI for oral history and digital public history in libraries’.

The ‘boom’ in AI and oral history has mostly focussed on speech recognition and transcription, driven by the release of Trint (2014) and Otter (2016), but especially Whisper (2022). There have also been investigations into indexing, summarising and visualisation, notably from the Congruence Engine project. Oral historians are interested in how AI tools could help with documentation and analysis, but many also have concerns. These include, but are not limited to, ownership, data protection/harvesting, labour conditions, environmental costs, loss of human involvement, unreliable outputs and inbuilt biases.

For those of us working with archived collections there are specific considerations: How do we manage AI generated metadata? Should we integrate new technologies into catalogue searching? What are the ethics of working at scale and do we have the experience to do so? How do we factor in interviewee consent, especially since speakers in older collections are now likely dead or uncontactable?

With speech recognition, we are now at a point where we can compare different automated transcripts created at different times. While our work on this topic at the British Library has been minimal, future trials might help us build up enough research data to address the above questions.

Robert Gladders was interviewed by Alan Dein for the National Life Stories oral history project ‘Lives in Steel’ in 1991, and the extract below featured on the CD ‘Lives in Steel’, published in 1993.

The full transcripts for this audio clip are at the end of this post.

Sign Language

We can compare the human transcript of the first line with three automatic speech recognition (ASR) transcripts:

  • Human: Sign language was for telling the sample to the first hand, what carbon the- when you took the sample up into the lab, you run with the sample to the lab
  • Otter 2020: Santa Lucia Chelan, the sound pachala fest and what cabin the when he took the sunlight into the lab, you know they run with a sample to the lab
  • Otter 2024: Sign languages for selling the sample, pass or the festa and what cabin the and he took the samples into the lab. Yet they run with a sample to the lab.
  • Whisper 2024: The sand was just for telling the sand that they were fed down. What cabin, when he took the sand up into the lab, you know, at the run with the sand up into the lab

Gladders speaks with a heavy Middlesbrough accent and in all cases the ASR models struggle, but the improvements between 2020 and 2024 are clear. In this case, Otter in 2024 seems to outperform Whisper (‘The sand’ is an improvement on ‘Santa Lucia Chelan’ but it isn’t ‘Sign languages’), but this was a ‘small’ version of Whisper and larger models might well perform better.
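
One rough way to put numbers on a comparison like this is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn a machine transcript into the human one, divided by the number of words in the human transcript. The Python sketch below is illustrative only and was not part of the work described in this post. It assumes the human transcript is the reference, and the lowercasing and punctuation stripping in `tokenize` are choices made for this example; different choices would change the scores.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation.

    This normalisation is an assumption of the example, not a standard.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript contains no words")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# First line of each transcript, copied from the list above.
human = ("Sign language was for telling the sample to the first hand, "
         "what carbon the- when you took the sample up into the lab, "
         "you run with the sample to the lab")
machine = {
    "Otter 2020": ("Santa Lucia Chelan, the sound pachala fest and what "
                   "cabin the when he took the sunlight into the lab, "
                   "you know they run with a sample to the lab"),
    "Otter 2024": ("Sign languages for selling the sample, pass or the "
                   "festa and what cabin the and he took the samples into "
                   "the lab. Yet they run with a sample to the lab."),
    "Whisper 2024": ("The sand was just for telling the sand that they "
                     "were fed down. What cabin, when he took the sand up "
                     "into the lab, you know, at the run with the sand up "
                     "into the lab"),
}

for name, text in machine.items():
    print(f"{name}: WER = {word_error_rate(human, text):.2f}")
```

WER treats every word as equally important, so mis-hearing ‘carbon’ counts the same as mis-hearing ‘the’. A single score like this is a starting point for comparison, not a full account of how usable a transcript is.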

One interesting point of comparison is how the models handle ‘sample passer’, mentioned twice in the short extract:

  • Otter 2020: Sentinel pastor / sound the password
  • Otter 2024: Salmon passer / Saturn passes
  • Whisper 2024: Santland pass / satin pass

While in all cases the models fail, this would be easy to fix. The aforementioned CD came with its own glossary, which we could feed into a large language model working on these transcriptions. Practically, this is not difficult, but it raises some larger questions. Do we need to produce tailored lexicons for every collection? This is time-consuming work, so who is going to do it? Would we label an automated transcript in 2024 that makes use of a human glossary written in 1993 as machine generated, human generated, or both? Moreover, what level of accuracy are we willing to accept, and how do we define accuracy itself?
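
As a much simpler stand-in for the language-model approach, the sketch below uses fuzzy string matching from Python's standard `difflib` module to flag phrases in an ASR transcript that look like near misses for a glossary term, so that a person can review them. It is an illustration under assumptions rather than a tested workflow: the function name `flag_glossary_candidates`, the 0.7 similarity cutoff, and the decision to write the glossary entry as two words (‘sample passer’, as in the human transcript) are all choices made for this example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def flag_glossary_candidates(transcript, glossary_terms, cutoff=0.7):
    """Return (phrase, term, score) for phrases that nearly match a term.

    The cutoff is a hypothetical value chosen for illustration; it is
    not a recommended setting.
    """
    words = tokenize(transcript)
    flagged = []
    for term in glossary_terms:
        term_words = tokenize(term)
        if not term_words:
            continue
        normalised_term = " ".join(term_words)
        size = len(term_words)
        # Slide a window the same length as the term across the transcript.
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase == normalised_term:
                continue  # already correct, nothing to review
            score = SequenceMatcher(None, phrase, normalised_term).ratio()
            if score >= cutoff:
                flagged.append((phrase, normalised_term, round(score, 2)))
    return flagged


# Extract from the Otter 2024 transcript at the end of this post.
otter_2024 = ("you had to give the sand to the salmon passer, "
              "or stood in the middle of the stage")

for phrase, term, score in flag_glossary_candidates(otter_2024, ["sample passer"]):
    print(f"Review: '{phrase}' may be '{term}' (similarity {score})")
```

A lower cutoff would catch more variants but would also flag more ordinary phrases, so even this simple approach needs a human decision about what level of accuracy is acceptable.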

 

Sample glossary terms:

  • Samplepasser: The top man on the melting shop with responsibility for the steel being refined.
  • Sampling: The act of taking a sample of steel from a steel furnace, using a long-handled spoon which is inserted into the furnace and withdrawn.
  • Sintering: The process of heating crushed iron-ore dust and particles (fines) with coke breeze in an oxidising atmosphere to reduce sulphur content and produce a more effective and consistent charge for the blast furnaces. This process superseded the earlier method of charging the furnaces with iron-ore and coke, and led to greatly increased tonnages of iron being produced.

Full transcripts for audio clip

Otter 2020

Santa Lucia Chelan, the sound pachala fest and what cabin the when he took the sunlight into the lab, you know they run with a sample to the lab, then you call it up, put it in water and real a sample itself for the analyzer steel and they will write the cabin and assault on the flush on a piece of paper and soldier came out the dough into the plant your own mega donor works in the Guild, the Santa the Sentinel pastor or stood in the middle of the stage and whatever the cabin was by, if you put your hand on the chin, that meant 20. So if you don't that when 20 Heaven, if it was five, you put your feet and on your nose. It was 10 you put your hand on top of your head. If it was true, you're touching between the legs, but if it was one, you put your finger up your backside and your key point eight that's your sound the password got your message then God on the back and shut the furnace.

Otter 2024

Sign languages for selling the sample, pass or the festa and what cabin the and he took the samples into the lab. Yet they run with a sample to the lab. Then you have to cool it off of the blower, put it in water, and drill the sample itself for the check in the lab for them to analyze the steel and they would write the carbon and the salt and the fuss on the piece of paper. And soon as you came out the door into the plant, you had to run like a donor wash, and you had to give the sand to the salmon passer, or stood in the middle of the stage. And whatever the carbon was by, if you put your hand under your chin, that meant 20. So if you done that, when 20 carbon, if it was five, you put your feet and on your nose. And if it was 10, you'd put your hand on top of your head. If it was two, you touch in between your legs. And if it was one, you put your finger up your backside, and you keep pointing at that till the Saturn passes. Got your message, then he would go on the back and trapped the furnace.

Whisper 2024

The sand was just for telling the sand that they were fed down. What cabin, when he took the sand up into the lab, you know, at the run with the sand up into the lab. Then you got the cool it off, put it in the water and drill the sand itself for the jet-checking the lab. Then the analyzer stale. Then they would write the cabin and the salt was on the piece of paper. And so when you came up the door into the plant, you got the rune like I don't know what, and you got the sand, or the santland pass, oh, it was through the middle of the station, whatever the carbon was. If you put your hand on your chin, that meant 20. So if you done that meant 20 carbon. If it was 5, you put your finger and on your nose. And if it was 10, you put your hand on top of your head. If it was toe you'd touch in between your legs. But if it was one you'd put your finger up your back side and you'd keep pointing at that so the satin pass would go to your message and then you'd go on the back and tap the fairness.

Human Transcript

Sign language was for telling the sample to the first hand, what carbon the- when you took the sample up into the lab, you run with the sample to the lab, then you got to cool it off with the blower, put it in water and drill the sample yourself for the, in the lab for them to analyse the steel. And they would write the carbon and the sulphur on a piece of paper and soon as you came out of the door into the plant, you had to run like I don’t know what and you had to give the sign to the sample passer who was stood in the middle of the station, whatever the carbon was. If you put your hand under your chin, that meant twenty, so if you done that, it meant twenty carbon. If it was five, you’d put your hand on your nose. And if it was ten you’d put your hand on top of your head. If it was two, you’d touch in between your legs, and if it was one, you’d put your finger up your backside. And you would keep pointing like that, so the sample passer got your message, then he would go round the back and tap the furnace.
