UK Web Archive blog

14 September 2016

Surveying the Domain: Three Days with the Web Archiving Team

I’m Sara Day Thomson, researcher for the Digital Preservation Coalition. We’re a membership organisation that supports institutions, like the BL, in ensuring long-term access to their digital content, no matter what that might be. To support my own professional development and general curiosity, the Web Archiving team at the BL let me spend three days with them learning the ins and outs of archiving the Internet.


Web Archiving vs Digital Preservation?
What, you might ask, does web archiving have to do with digital preservation? I would answer: everything. Web archiving operates at the frontier of capturing and preserving our contemporary cultural and historical record. From the Information Highway to social networking sites, the Internet represents not only our cultural record but the inscription of an evolving technology. While tinkering with the web archiving ‘machine’, I got a first-hand look at the challenge this creates for archivists, who must keep pace with the development of the Web and how people use it.

If you haven’t seen it, I’m the author of the recent Preserving Social Media Technology Watch report. Preserving Social Media presents these same issues as they are faced by organisations who want, or are required, to archive social media content. My three days with the BL team have widened my understanding of the role of social media and of what it actually looks like to archive the wider Web.


Three days spent harvesting the Web with the BL team has solidified my view that web archiving is fundamentally an act of digital preservation. Just as with many ‘traditional’ digital media, such as PDFs, emails or MP4s, further action must be taken on web content in its native form to ensure its long-term accessibility. For web content that need is urgent, even more so than for some other digital formats. During just my brief tenure, I came across more than one website that had disappeared since it was last harvested.

Challenges and rewards
Web content is complex—even discussing social media as a single category poses problems because different platforms function in different ways and are governed by varying Terms of Service. While social media has more recently become a dominant player, there’s a whole world of Web out there that isn’t ‘platformized’. Given this diversity—and the likelihood that technology will continue to dramatically alter how we dispense and consume information—web archivists are faced with the challenge of ensuring this content will be useable and comprehensible in the future. This challenge is at the centre of any digital preservation endeavour: it’s not just about saving the bits, or the code, but about preserving meaning.

The BL team are not alone in the effort to save the Web for future generations. While the team is relatively small (smaller than you’d think given the scale of the task), they work closely with their Legal Deposit partners, with curators both within and outside the BL, and with researchers and other users. The creation of a meaningful record of our lives online requires the input of all of these specialists and is likely to be more successful through open collaboration.

The challenges—and rewards—of digital preservation are best shared, whether it’s for the preservation of digitised manuscripts from the Middle Ages or the emails of the prime minister or a national record of the World Wide Web.

By Sara Day Thomson, Digital Preservation Coalition