The British Library Big Data Experiment project update
In this post, the British Library Big Data Experiment team reflect on their work in the first six weeks of the project. For more information on this collaboration between the British Library Digital Research team, University College London Computer Science, and University College London Centre for Digital Humanities see our kickoff post.
Since the project began in early June we have had an interesting time coming to terms with the typical workflow of a researcher from the arts and humanities. One of the key tasks towards this goal was conducting a focus group, where we learnt a variety of things. For instance, given the research conventions within the field, it was surprising to discover that researchers are willing to use modern computing tools such as text analysis. During the focus group researchers expressed views and ideas which had never occurred to us, such as “each instance of a book is a different object, it is unique because one specific copy is particular,” and “the person who composes the content can be different from the person who actually writes it.” Having the researchers’ perspective conveyed to us in such a way was invaluable. It was also useful to learn how they would improve existing search systems: one said “I would like intelligent suggestions,” and another felt that “feedback on which collection has been searched would be particularly useful.” Overall the focus group was an essential learning exercise for getting this project off the ground.
We have also spent some time interrogating the British Library’s data and gained an appreciation for its variety, volume, velocity and veracity. This presents an interesting challenge because it cannot be resolved using familiar database software systems. The data we have begun working with is quite diverse. It was created during the digitisation of approximately 40,000 titles (equating to approximately 65,000 volumes), which until recently had been challenging for researchers and the public to access. Now, all of the metadata, data and scans within the collection are dedicated to the public domain for unrestricted use.
The team have taken the opportunity to consult with key stakeholders and leading academics in the field. All of this has set us up very nicely to begin development work. In the coming weeks, we hope to build a powerful and intuitive service which will enable arts and humanities researchers to interact better with the British Library’s digitised collection of public domain books and to access the data in a more meaningful way.
Nektaria Stavrou (Team Leader and MSc Software Systems Engineering, University College London), Stelios Georgiou (Testing Director and MSc Software Systems Engineering, UCL), Wendy Wong (MSc Computer Science, UCL), Stefan P. Alborzpour (MSc Computer Science, UCL)