23 February 2015
Preserving our digital heritage: how are we really doing?
Digital archiving has hit the headlines recently, as people begin to worry publicly about our society’s digital memory.
Are they right to worry? As Google boss Vint Cerf suggested last week, it’s true that if you do nothing to manage and preserve your digital content, it will inevitably fade away with time as the software and formats needed to read it become obsolete.
This is a particular concern for national institutions holding vast amounts of digital content, as we are charged with preserving digital assets in the very long term, for future generations of historians and researchers in centuries to come.
This endeavour, which we refer to as digital preservation, has been the subject of research in libraries and archives around the world for the past two decades and, reassuringly, there is much progress being made.
The British Library, like many national institutions, has a team dedicated to ensuring our collection of more than four million digital items is accessible indefinitely. The diversity of our digital collection makes this a huge challenge: we’ve got everything from web archives to eJournals and eBooks, digitised archives and manuscripts (including tens of thousands of emails), as well as datasets and huge collections of audio and video content. We don’t collect games, or mobile apps, and we leave archiving Twitter to the Library of Congress, but setting that aside, the collection is still incredibly diverse.
Our digital preservation team works closely with IT and curators on the development of end-to-end preservation workflows for all of our digital collections. Our work is currently led by the strategic priorities laid out in our 2013-2016 Digital Preservation Strategy, underpinned by our twelve principles of digital preservation. We have a systematic approach to preserving our digital collections, because planning and preparation are essential to avoid being caught out as formats and technology disappear over time.
The digital collections are preserved in our purpose-built Digital Library System, which replicates content across four storage nodes in different parts of the UK.
We have plans in place to make sure that these digital collections are not part of the ‘lost century’ that Vint is worried about; we’ll be monitoring format changes, assessing risks and establishing a technical registry and preservation watch system. This work is constant, because technology is changing all of the time.
At the same time, much of this content is being made available to researchers in the British Library’s Reading Rooms, and we’re investigating with academics how future historians will tackle this enormous resource, as well as how it might be curated.
We share expertise with other institutions in the UK through our work with the Digital Preservation Coalition, a non-profit membership organisation founded specifically to help institutions understand and address digital preservation challenges. We also take a leading role in the International Internet Preservation Consortium, which looks specifically at the challenges of web archiving, and which comprises over 50 members in 30 countries around the world.
Emulation and the use of ‘virtual machines’, which was one of the solutions proposed by Vint Cerf, may yet form part of our solution. Lots of work has been done in this area over the past decade, and our web archive technology already utilises an emulation solution within its Interject prototype for accessing resources in non-current formats.
At the end of the day, though, preservation is much more than just a technical problem.
Preservation is about planning, it’s about management, it’s about process, it’s about permanence, and it’s about people. You need all of these things for preservation to happen. And that’s what we’re working on.
You might be wondering how this large-scale digital preservation work applies to your own personal digital content, like those collections of emails and photos we hold on our desktops and hard drives. It’s early days, but it’s possible that in the future the expertise and solutions that we and others are developing will be adopted by commercial systems.
For now, just knowing which ones you want to keep, then keeping them accessible and backed up is a great start. More advice on this is available from the Library of Congress.
For further information about the work of the Library’s Digital Preservation team, visit the British Library website.
For more information about the Digital Preservation Coalition, visit their website at www.dpconline.org
Visit the British Library Living Knowledge webpage to learn more about making our intellectual heritage accessible to everyone, as we look ahead to 2023.
Maureen Pennock, Head of Digital Preservation