31 October 2010
Searching video, growing knowledge 2
Audio Search results page
OK, as promised, here's part two of the guide to the moving image applications available for testing at the British Library's Growing Knowledge exhibition. In the last post we covered Video Server, which is making BBC television news recordings from May 2010 available with a subtitle search facility. This involves some ingenious work underneath the bonnet, because TV subtitles come as graphics, not text. The results are - we think - very impressive and offer huge potential for integrating video content with other, text-based resources in a digital research environment. But what about video content that doesn't come with subtitles?
Many TV channels do not have a subtitles feature (generally in the UK it is only the terrestrial channels that do), so unless you have access to catalogue information, which can be scanty in any case, then you have to scroll through the video, or audio file, until you get to what you are looking for - if it is there at all. What is needed is software that converts speech to text. And that's what we've got.
MS Audio Search makes use of the Microsoft Research Audio Video Indexing System, or MAVIS. This is speech-recognition software which takes an audio file and converts the sounds into searchable text, by means of 'Large-vocabulary continuous speech recognition', which essentially means that it is underpinned by a dictionary and a language grammar controller. In practice you type in a search term, and up comes a range of videos with a column on the results page which gives those lines in the speech track where your search term was spoken. Click on any one of these and it takes you to a point roughly five seconds before the word is spoken and plays the video or audio.
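For the technically curious, the search-and-jump behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MAVIS itself is built: it assumes the speech recogniser has already produced a list of words with start times for each recording, and the names used here (Word, Hit, search, LEAD_IN_SECONDS, the sample recording ID and its timings) are all made up for the example.

```python
from dataclasses import dataclass

# Assumption: playback starts about five seconds before the spoken word,
# as described in the post. The real system's value is not documented here.
LEAD_IN_SECONDS = 5.0


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording


@dataclass
class Hit:
    recording_id: str
    snippet: str      # a few recognised words around the match
    play_from: float  # where playback should begin, in seconds


def search(index, term, context=3):
    """Return one Hit for every place the term was recognised."""
    term = term.lower()
    hits = []
    for recording_id, words in index.items():
        for i, word in enumerate(words):
            if word.text.lower() == term:
                lo = max(0, i - context)
                snippet = " ".join(w.text for w in words[lo:i + context + 1])
                # Never seek to a point before the start of the recording.
                play_from = max(0.0, word.start - LEAD_IN_SECONDS)
                hits.append(Hit(recording_id, snippet, play_from))
    return hits


# Hypothetical recogniser output for a single news recording.
sample_index = {
    "news-2010-05-07": [
        Word("the", 0.0), Word("election", 0.4), Word("result", 1.1),
        Word("was", 1.6), Word("announced", 12.0), Word("today", 12.9),
    ]
}

for hit in search(sample_index, "announced"):
    print(hit.recording_id, hit.play_from, hit.snippet)
```

Each hit carries the snippet shown in the results column and the point at which the player should start, which is all the results page needs.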
For the purposes of Growing Knowledge we have made available around 120 hours of BBC television news programmes dating from May-August 2010, plus around 500 hours of audio-only recordings from the British Library's collection, covering oral history interviews with Jewish Holocaust survivors, talks at London's Institute of Contemporary Arts, and interviews with British painters, sculptors, photographers and architects. The software isn't perfect - the accuracy rate is around 70% - and it is essential to cross-check with the actual audio track that you have selected, because the 'transcript' alone cannot be trusted as an entirely accurate record of what has been said. The same is true of subtitles, or indeed of any form of optical character recognition (OCR). The software is happiest with clearly-spoken English, and struggles a little when there are multiple speakers.
But even so, wow. The potential is huge. Audio or video recordings can become immediately discoverable instead of requiring a period of time to analyse their contents, so long as there is a speech track. Because a dictionary underpins the resource, you can build up a thesaurus and keywords, and develop rich linkages with other text-based resources. Although it is early days for such technology, it will in time open up audiovisual archives to an unimaginable degree.
You can try out MS Audio Search and Video Server at the Growing Knowledge exhibition, which runs until July 2011. We're unable to make the video content available remotely through the Growing Knowledge website, but you can test out the audio files from the British Library and other institutions experimenting with MAVIS at Microsoft's test site. Do have a go, and just imagine what the future for research might be.