06 June 2014
The British Library Big Data Experiment
This week saw the launch of The British Library Big Data Experiment, a collaboration between the British Library Digital Research Team, UCL Computer Science, and the UCL Centre for Digital Humanities. The experiment will explore approaches to opening up the British Library's digital collections, particularly for the benefit of those undertaking research in the arts and humanities.
The experiment itself will see a team of UCL Computer Science and Software Systems Engineering students work between now and September on an MSc project. The project seeks to develop experimental platforms for accessing and interrogating the British Library's public domain digital collections using the Microsoft Azure cloud infrastructure. The students' brief is to design a research-oriented front end with adaptors and facades, and to build implementations of Azure APIs that are functionally scalable to the datasets provided. Features of this public front end might include recommender and similarity engines, machine learning interfaces, statistics integration, and the ability to bundle mixed subsets of digital resources for download.
The project team assembled through a process of self-selection, and each member has their own reasons for being involved.

For Stefan P. Alborzpour, MSc Computer Science, "the opportunity to plough through vast quantities of digitised historical content using intelligent systems presents an exciting challenge. This shall benefit not only academics but enrich the wider community and I am thrilled at the prospect of contributing to this project."

Stelios Georgiou, Testing Director and MSc Software Systems Engineering, is "eager to work for a globally renowned hub of information and knowledge, such as the British Library, because it will provide me with the opportunity to develop my skills in a challenging infrastructure setting."

Wendy Wong, MSc Computer Science, is all about the data. "Big Data is such a modern and growing field now," she said, "to integrate it with century old works just shows how far the technological age has come, and this in itself I find very exciting."

Finally, Nectaria Stavrou, Team Leader and MSc Software Systems Engineering, sees effecting change and the challenge of the project as the biggest draw: "Handling a mass amount of information and mining into that can be considered a big challenge, but yet powerful enough to advance humanity a step further. The British Library is definitely one of the greatest libraries globally and now is the time for that big move that will allow researchers and other people to find the gold, the information they seek faster. And I am excited that I will be a small part of this significant change."
The first stage of the project will involve bringing the team up to speed with research patterns in the arts and humanities and the demands these researchers place on the British Library as a digital library. Armed with a clear sense of user need, the students will then grapple with the data from our Public Domain Microsoft Books collection (for more details see 'A million first steps' and the British Library Public Domain wiki), before building a public-facing experimental interface to the collection that you'll be able to use in your research.
This project is intended as the first stage of a long-term collaboration that will see UCL Computer Science students using British Library open data and public domain digital collections to develop experimental services, tools, and infrastructures with support from the Digital Research Team.
Curator, Digital Research