17 December 2024
Open cultural data - an open GLAM perspective at the British Library
Drawing on work at and prior to the British Library, Digital Curator Mia Ridge shares a personal perspective on open cultural data for galleries, libraries, archives and museums (GLAMs) based on a recent lecture for students in Archives and Records Management…
Cultural heritage institutions face both exciting opportunities and complex challenges when sharing their collections online. This post gives common reasons why GLAMs share collections as open cultural data, and explores some strategic considerations behind making collections accessible.
What is Open Cultural Data?
Open cultural data includes a wide range of digital materials, from individual digitised or born-digital items – images, text, audiovisual records, 3D objects, etc. – to datasets of catalogue metadata, images or text, machine learning models and data derived from collections.
Open data must be clearly licensed for reuse, available for commercial and non-commercial use, and ideally provided in non-proprietary formats and standards (e.g. CSV, XML, JSON, RDF, IIIF).
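As a rough illustration of what 'clearly licensed' and 'non-proprietary formats' can look like in practice, here is a minimal Python sketch that writes the same small set of catalogue records as both JSON and CSV, with the licence stated on every record. The record fields, identifiers and licence value are invented for illustration and do not describe any actual British Library dataset.

```python
import csv
import io
import json

# Hypothetical catalogue records; field names and values are illustrative only.
RECORDS = [
    {"id": "item-001", "title": "Example playbill", "licence": "CC0-1.0"},
    {"id": "item-002", "title": "Example book illustration", "licence": "CC0-1.0"},
]


def to_json(records):
    """Serialise records as a JSON array, keeping non-ASCII characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialise records as CSV with a header row taken from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(RECORDS))
    print(to_csv(RECORDS))
```

Carrying the licence inside each record, rather than only on a web page, means the reuse terms travel with the data when it is downloaded or passed on.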
Why Share Open Data?
The British Library shares open data for multiple compelling reasons.
Broadening Access and Engagement: by releasing over a million images on platforms like Flickr Commons, the Library has achieved an incredible 1.5 billion views. Open data allows people worldwide to experience wonder and delight with collections they might never physically access in the UK.
Deepening Access and Engagement: crowdsourcing and online volunteering provide opportunities for enthusiasts to spend time with individual items while helping enrich collections information. For instance, volunteers have helped transcribe complex materials like Victorian playbills, adding valuable contextual information.
Supporting Research and Scholarship: in addition to ‘traditional’ research, open collections support the development of reproducible computational methods including text and data mining, computer vision and image analysis. Institutions also learn more about their collections through formal and informal collaborations.
Creative Reuse: open data encourages artists to use collections, leading to remarkable creative projects including:
- A Malaysian band creating a music video using British Library images
- People experiencing an artistic commission with digitised book illustrations projected onto a massive building in central Leeds
- Experiments with image clustering and pose detection that provide new ways into collections
Some Lessons for Effective Data Sharing
Make it as easy as possible for people to find and use your open collections:
- Tell people about your open data
- Celebrate and highlight creative reuses
- Use existing licences for usage rights where possible
- Provide data in accessible, sustainable formats
- Offer multiple access methods (e.g. individual items, datasets, APIs)
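- Invest effort in meeting the FAIR (Findable, Accessible, Interoperable, Reusable) and, where appropriate, CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles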
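Some of these points can be checked mechanically before a dataset is published. The Python sketch below is one possible approach, not an established tool and not a FAIR assessment: it flags a dataset description that is missing a licence, a list of formats or a list of access methods. The field names and the example description are assumptions made up for illustration.

```python
# Hypothetical set of fields a dataset description should fill in;
# the names are illustrative assumptions, not a formal standard.
REQUIRED_FIELDS = ("title", "licence", "formats", "access_methods")


def missing_fields(description):
    """Return the required fields that are absent or empty in a dataset description."""
    return [field for field in REQUIRED_FIELDS if not description.get(field)]


# Invented example description for illustration only.
example = {
    "title": "Example digitised illustrations",
    "licence": "CC0-1.0",
    "formats": ["CSV", "JSON"],
    "access_methods": [],  # nothing listed yet, so this field is flagged
}

if __name__ == "__main__":
    print(missing_fields(example))
```

A check like this only confirms that the information is present; whether the licence is appropriate or the formats are genuinely usable still needs human judgement.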
Navigating Challenges
Open data isn't without tensions. Institutions must balance potential revenue, copyright restrictions, custodianship and ethical considerations with the benefits of publishing specific collections.
Managing expectations can also be a challenge. The number of digitised or born-digital items available may be tiny in comparison to the overall size of collections. The quality of digitised records – especially those digitised from microfiche or digitised decades ago – might be less than ideal. Automatic text transcription and layout detection errors can limit the re-usability of some collections.
Some collections might not be available for re-use because they are still in copyright (or are orphan works, where the rights holder is unknown or cannot be traced), were digitised by a commercial partner, or are culturally sensitive.
The increase in the number of AI companies scraping collections sites to train machine learning models has also given some institutions cause to re-consider their open data policies. Historical collections are more likely to be out of copyright and published for re-use, but they also contain structural prejudices and inequalities that could be embedded into machine learning models and generative AI outputs.
Conclusion
Open cultural data is more than just making collections available – it's about creating dynamic, collaborative spaces of knowledge exchange. By thoughtfully sharing our common intellectual heritage, we enable new forms of research, inspiration and enjoyment.
AI use transparency statement: I recorded my recent lecture on my phone, then generated a loooong transcription on my phone. I then supplied the transcription and my key points to Claude, with a request to turn it into a blog post, then manually edited the results.