Safeguarding the Digital Legacy: the UK Web Archive is a finalist for the 2020 Digital Preservation Awards
By Ian Cooke, Head of Contemporary British Published Collections at the British Library
Alongside the other finalists, we presented at the #WeMissiPRES conference on 23 September. We only had a few minutes, so our ‘lightning talk’ went by in a flash. Here is a slightly extended version of our presentation.
This year, the UK Web Archive celebrates its 15th anniversary. It is 15 years since we first made public an online interface to our newly created Web Archive. It’s important to us that we date from that point as, all through our 15 years, access has been a core part of what we do, and drives how we think about preservation.
Anniversaries are important, because they offer us a point to look back, to give us a longer-term perspective on our work, but also because they prompt us to think about our values as well as our legacy.
So, thinking about our values, preservation and legacy, we want to talk about three things that we are really proud of:
The content matters
This has led us in everything else. Communication on the web is primarily about us, about the people and communities that we share our lives with. Preservation of the web matters, because it is vital to how we understand ourselves now, and how we understand our recent past. From our beginnings, we have made the case that the web is not trivial – it should be valued – and we continue to make that case. We do this by creating thematic collections, which put the focus on the subject not the form; by talking publicly and online about our work; and by working with researchers to understand what the archived web can tell us.
Being led by the content can result in complex and innovative technological interventions, such as the continued monitoring and refinement of our domain crawls to ensure that we are as comprehensive as we can be.
It is also about policy and engagement. It’s about making sure we understand the content, and the people creating it. We reach out to communities and groups to help create collections, and this is something we have come to understand better as we have grown. We do this by partnering with specialist archives or community groups, or through public calls for co-operation. A current example is our LGBTQ+ lives collection, where we are working with the LGBTQ+ network of the Chartered Institute of Library and Information Professionals in the UK and have also been using social media to call for content.
We work collaboratively
This has been at the heart of the UK Web Archive, which has always existed as a collaborative venture between organisations – now linking the six Legal Deposit Libraries of the UK. We also engage with our peer institutions, to learn and share experience. Collaboration is vital to build and maintain the capacity that all institutions active in web archiving need to meet the preservation challenges presented by the live web. A key part of that has been the International Internet Preservation Consortium (IIPC), where we are proud to be the host for the Programme and Communications Officer. As well as participating in conferences, workshops and hackathons, we regularly take part in the ‘Online Hours: Supporting Open Source’ calls, which are dedicated to ensuring that the IIPC’s open source initiatives are truly open to members.
We also work collaboratively with researchers, both in collection-building and in research projects using the archived web. Working with researchers helps us to understand ‘real life’ challenges, and inspires the way we build our services and communicate about them. We are immensely proud of our role in the ‘Big UK Domain Data for the Arts and Humanities’ project, which helped us build our ‘Shine’ analytical tool for full-text indexes. More recently, we have been working on research in economic geography, using our postcode data set, and with researchers from the Alan Turing Institute to understand how our data can be used to analyse word value change over time.
Research use of the UK Web Archive has developed over time. An early, and enduring, use has been a ‘close reading’ of websites. This approach may look at one or a small number of websites and study the content, layout and functionality in detail. Sometimes these studies have a longitudinal aspect, looking at change over time. Our user interface helps researchers find individual websites, or groups of websites, that are relevant to their study. This approach has been supplemented by other research methods which attempt to understand a much larger body of content at scale. This research uses tools and data to understand communication and behaviour on the web. These methods can be mutually supportive, with the results of computational analysis of the web providing supporting context for a close reading of a small number of sites.
We work openly
From the start, we have seen access as a vital part of our preservation work. Access helps us to validate the preservation actions that we have taken, and supports wider advocacy for the preservation of born-digital content. We seek permissions to make selected web content more openly available, and look to use existing licences to make other content available. We currently do this with content released under Open Government Licences. We also work to make sure that the data we generate about our collections is available, whether that is the full-text indexes that can be searched in our User Interface, or datasets that we have generated from earlier crawls of the UK domain. Earlier this year, we worked with the National Library of Australia, National Library of New Zealand and the historian Tim Sherratt, to develop tools (using Jupyter Notebooks) that could be re-used to analyse our openly accessible data.
Looking ahead, we want to review and update our curatorial tools to support collaboration and collection building. We want to understand what the barriers are to using the archived web in research, and share more information to help researchers understand our collections. Linked to this, we are developing a research engagement plan, which will make sure that our collections and services continue to develop to meet identified needs.
So, as we look back over our 15-year history, these are three of the things that make us proud, and will continue to inspire us: understanding the value of our collections, working in partnerships, and connecting our users and the public with our collections. These are values that we know we share with the wider Digital Preservation community, so we are very grateful for this chance to join the celebration.
You can watch all of the presentations from this category on the #WeMissiPRES conference YouTube Channel.