Harnessing the Crowd: Coronavirus Topical Collection at the UK Web Archive
By Nicola Bingham, Lead Curator of Web Archiving, The British Library
Note: This post was originally published on the Digital Preservation Coalition (DPC) blog.
The UK Web Archive, a partnership of the 6 UK Legal Deposit Libraries* (LDLs), has been collecting UK websites since the early 2000s. As well as archiving snapshots of the whole UK Web Space, we have dozens of curated collections focussing on a wide range of topics, themes and events reflecting all aspects of UK life.
Collections are instigated by a broad range of curators (in this context, ‘curator’ is not necessarily synonymous with job title), including LDL staff, academic researchers, various UK GLAM organisations (e.g., Jersey Heritage, Hampshire Archives and Local Studies, Wimbledon Lawn Tennis Museum) and local community groups. Collections may focus on a researcher’s area of interest, align with an institution’s collection policy or reflect diverse political, sporting or topical events such as the London Olympic Games, Brexit or Climate Change.
We have a particularly strong time-series of collections focusing on UK General Elections, having archived every campaign since 2005. For each event we have used more or less the same categories: candidates’ web presence, national and local political party websites, online news and commentary, interest group manifestos, and comment and analysis by think tanks.
Structuring the collections with consistent sub-categories enables curators to distribute web archiving more efficiently, as does dividing selection broadly along the lines of the geographical interest of the 3 National Libraries that belong to the UKWA.
We hope that our General Election collections will preserve the voices and illustrate the concerns and priorities of a wide spectrum of UK society and help to show how political parties and candidates engaged and responded at pivotal moments in UK history.
It is interesting to note how use of the Internet for political campaigning and communication has evolved over time. In 2005 very little social media existed and politicians were just beginning to explore its capabilities, whereas by 2019 campaigners were making little or no use of websites, concentrating almost exclusively on using social media.
The (somewhat) scheduled nature of UK General Elections, especially since the Fixed-term Parliaments Act of 2011, allows us to plan election web archiving strategies ahead of time. Having said this, we have been tested in recent years with snap elections in June 2017 and December 2019! And of course candidates are only announced a couple of weeks before polling day, which means we have to react at that point to archive candidates’ websites or official, public-facing social media accounts.
Rapidly unfolding events such as natural disasters or terrorist attacks require a different approach. However, even here we have some experience, having archived collections about the London Terrorist Attack 2005, Grenfell Tower Fire, and Pandemic Outbreaks such as Avian Flu and Swine Flu over the years.
For the past few weeks we have been actively collecting the UK perspective of the Coronavirus (COVID-19) Pandemic. We are clearly facing one of the severest threats in our lifetimes, certainly one of the fastest and most clearly devastating, and while Librarians might not (yet) be members of the Emergency Services, we feel the act of recording the outbreak as it plays out online is a crucial one.
Websites are being selected by a cohort of curators across the LDLs and beyond. We have also been ably assisted by colleagues at the Royal College of Nursing Archives who are nominating health-related websites. However, due to the unpredictable, fast-paced nature of the outbreak and the consequent deluge of online information, it is all the more important for us to harness the crowd to elicit website nominations. For this reason, we will canvass for website nominations much more widely among our colleagues, the library and archive community and the general public when responding to rapidly unfolding events. We will also visit targeted websites much more often than we usually would, in order to capture frequently edited web content.
The collection is not public yet while we concentrate on acquiring the websites. Once we’re finished, it will take time to prepare the collection for publication by performing quality assurance and clearing permissions for open access. In due course, the Coronavirus collection will be available here under the Pandemic Outbreaks Collection. The top-level heading reflects the fact that we have previously collected around Avian Flu and Swine Flu and acknowledges that, sadly, we will be collecting about future outbreaks.
In terms of getting involved, we welcome submissions from colleagues in the DPC community - and in fact from any member of the public. Details of how to nominate websites for inclusion are here: www.webarchive.org.uk/nominate. Alternatively, please email nominations to firstname.lastname@example.org
We’re also working on an international collection with the International Internet Preservation Consortium (IIPC). Details of how to contribute to this collection are here: netpreserveblog.wordpress.com/2020/02/13/cdg-collection-novel-coronavirus/ (non-English language websites are particularly welcome here).
If your organisation has not previously done any web archiving and you would like to capture your own institution’s or community’s response to Coronavirus, plenty of tools exist that can be used remotely. Webrecorder is a good place to start as it can be used in a browser, free of charge up to a 5 GB data limit. Of course, web archives such as the UKWA and Internet Archive would also be very happy to preserve your websites free of charge (see details above).
*The UK Legal Deposit Libraries: Bodleian Libraries, Oxford University; British Library; Cambridge University Libraries; National Library of Scotland; National Library of Wales; Trinity College, Dublin.