On 23 September the British Library played host to the Semantic Media Network for a one-day workshop, snappily entitled Semantic Media @ British Library. The Network has been established by Queen Mary University of London to "address the challenge of time-based navigation in large collections of media documents". Digital and digitised media archives have grown vast, and finding what they actually contain has become a huge challenge for broadcasters, archivists, researchers and some bright developers who are interested in a challenge.
It was those developers who were the main target of the workshop, which was based around the sound and moving image collections of the British Library. After an opening address by Mark Sandler (Head of School of Electronic Engineering and Computer Science at Queen Mary) we had four short presentations from projects which have received funding support from the Network.
Michael Bell (Newcastle University) introduced the Tawny Overtone music synthesis project; Tim Crawford (Goldsmiths, University of London) spoke on semantic linking and early lute music, which made for a delightful combination of the ancient and modern; Ryan Stables (Birmingham City University) discussed 'Large-scale Capture of Producer-Defined Musical Semantics' (defining music recordings by subjective terms such as humans like to use but machines struggle to comprehend); and David Newman (University of Southampton) spoke on enriching news stories by semantic means, specifically enriching episodes of Question Time with contextual information taken from Twitter, Wikipedia etc. (so themes raised in the programme are connected to online resources).
Next up came three speakers from the British Library, describing our audiovisual collections, the potential for opening up their research value by extracting meaningful information (which is what semantic media is all about), and describing some of the challenges involved. Richard Ranft (Head of Sound and Vision) described the British Library Sound Archive collection, with its 8 million tracks requiring 66 years were you to listen to it all - by which time rather more than an additional 8 million tracks will have been acquired. How to manage and make available such information, let alone listen to it all? Of course there is a catalogue to guide you to the Library's sound holdings, but so much information that the audio files contain lies buried because so much of the media is not yet in digital form, or if it is then barriers such as copyright and limited catalogue records mean that too much of the collection remains largely undiscovered. Automated indexing and enrichment through such tools as melody matching, score matching, speaker identification and speech-to-text have the potential radically to transform how researchers engage with such archives. But demand needs to come before tools. Ranft was disarmingly frank about the need for users to demand more. From demand will come new services - people just need to raise their expectations and think not simply of what can be found now, but what ought to be found.
Paul Wilson (Radio Curator) described a collection of over 200,000 hours of radio, access to which would be radically transformed by the application of searching tools such as speaker identification and speech recognition. He described the national radio archiving picture overall, revealing the alarming fact that of the 3 million hours of radio broadcast in the UK each year, only 3% can be said to be archived properly in a form that will ensure its long-term preservation. There is so much in radio content that can benefit a huge range of research enquiries, yet before we devise ingenious means of discovering such archives, we have to ensure that we have the archives to discover in the first place.
I then spoke on the News collection at the British Library, by which is meant newspapers, television, radio and web. We are at different stages of development for each. We house the British Newspaper Library, with some 750 million pages from the 17th century to today. Our television and radio news service, Broadcast News, began recording programmes in May 2010 and has now passed the figure of 30,000 titles, with some 60 hours of new content added every day. Web news sites are to be a special focus of our UK web archiving activities, now that the non-print legal deposit legislation and regulations are in place, but we are still in the process of determining which sites to harvest on a daily or weekly basis. The great challenge for the British Library will be to start forging meaningful links between these different news media, because ultimately the news does not exist in any one medium, rather it is we who seek out the news from the multiplicity of news forms available who create what news actually is, in our heads. Thinking semantically will help bring the news media together to create a more meaningful and potentially very exciting future for researchers.
A panel session then followed, for which we were joined by Mahendra Mahey of the BL Labs initiative, a project similar to the Semantic Media Network in encouraging the development of new ideas with small amounts of project funding. The debate turned away from the practicalities of semantic linking to the angst of archivists. There is so much to be discovered, so much that can be done, but is the demand always there? Do you wait for demand, or hope to encourage it through new tools and services? Do we capture everything, even if we can? Where is the place of audiovisual in a Library which still - for the most part - puts print first and foremost?
The day finished with a lively 'speed-dating' session, in which we sat opposite another delegate, exchanged ideas for three minutes, then a bell rang and we all moved chairs to sit opposite someone else and started up the conversation once again. I came away with three business cards, so I can't have done too badly. The ultimate aim of the Network is just that - to be a network, because it is from casual meetings that ideas start to grow (Silicon Valley is built on that very principle).
The slides from most of the talks given on the day are available from the Semantic Media Network site. There will be a funding call from the Network before the end of this year, and hopefully some of the issues raised during the workshop will help inform the nature of that call or the responses that are made to it.
Many thanks are due to Sebastian Ewert and the team at Queen Mary for having put together such a productive and interesting event. If we put on another event like it, we'll want to bring in users and potential users to meet up with the developers. Combining need and opportunity so each feeds off the other - that's the way forward.