UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

27 May 2020

Web Archiving the UK General Election 2019

By Jennie Grimshaw, Curator for Official Publications, The British Library

The 2019 general election was a turning point in British political history. It saw the resurgence of the Conservative vote from 8.8% at the May 2019 European Parliament election to 43.6% seven months later. It saw the collapse of the “Red Wall” in the North and the Midlands as seats such as Sedgefield, Labour since 1935, and Workington, Labour 1918-1976 and 1979-2019, turned Conservative. It saw the breaking of the Parliamentary deadlock over Brexit with the return to power of pro-Leave Conservative Prime Minister Boris Johnson, with a majority of 80, which then enabled him to “Get Brexit Done”.

Polling station sign

To help researchers trace how use of the Internet for political campaigning and communication has evolved over time, the British Library, the National Libraries of Scotland and Wales, and the Bodleian Library have collaborated to create a web archive collection for all UK general elections since 2005, using more or less the same categories – candidates’ web presence, national and local political party websites, online news and commentary, interest group manifestos, and comment and analysis by think tanks. This collection is the fifth in the time series and is complemented by the EU Referendum and Brexit collections. In 2019 Northern Ireland political party and candidate sites were selected by the Public Record Office of Northern Ireland (PRONI).

Sadly, our ability to harvest social media sites is very limited due to technical and legal issues. We can gather Twitter feeds, but not Facebook pages as the site deliberately blocks the crawls and it has proved impossible to negotiate access.  Due to the limitations of our crawl software, we cannot always gather dynamic content, Wix-based sites, documents stored in the cloud or videos. This may explain why some worthy candidate sites you might expect to see have not been archived.

However, the UK Web Archive General Election 2019 collection does offer researchers a respectable total of 2237 sites, including:

  • UK-based news and comment sites, such as Politics Home, Political UK, the Commentator, Unherd, Reaction, CAPX and CAPX 2019 Election Archive, as well as Twitter feeds of selected political journalists. We include satirical sites such as the spoof Conservative Manifesto created by a company called Concerned Citizens Ltd and the Daily Reckless, which offers satirical songs by Tommy Mackay.
  • Candidates’ campaign websites and Twitter feeds and local constituency party sites. The collection includes all candidates standing in Scottish and Welsh constituencies. Due to staff resource limitations, we can only capture a sample of the websites and Twitter feeds of candidates standing in English constituencies. We cover three inner London and three outer London boroughs, and one rural and one urban constituency from each of the English regions. We have used the same English constituencies for every election since 2005.
  • Websites and Twitter feeds of 100 national political parties, ranging from fringe groups such as the Animal Welfare Party, Arthur Horner of the Welsh Communist Party, Britain First and the anarchist Class War Party to the major national parties (Conservative, Labour, Liberal Democrats, and Greens) and political parties in the devolved administrations (Plaid Cymru, Scottish Nationalists and the Northern Ireland parties). We also capture major political party blogs such as Conservative Home, Conservative Woman, Labour List and Liberal Democrat Voice.
  • Social media sites of the main party leaders captured in depth using Web Recorder. We can use this software only very sparingly as operating it is very resource- and time-intensive, but it can capture sites our regular crawler cannot reach.
  • Interest groups seeking to influence party policies through engagement with candidates and publication of manifestos and lists of “asks”. We have selected about 340 sites, ranging from the manifestos of campaigning charities such as Age UK and trade associations such as Airlines UK and the Association of the British Pharmaceutical Industry to unions, religious groups such as the Evangelical Alliance and Muslim Council of Britain, and pressure groups such as the Campaign to Protect Rural England, Actionaid and Anti-Slavery International. Health charities and environmental groups are particularly prominent at this election and professional associations such as the medical royal colleges are also well represented. The voices of disabled people and minority groups are also heard through manifestos and comment from Leonard Cheshire Disability, SCOPE, Disability Rights UK, MENCAP, RNIB, Operation Black Vote and the Muslim Public Affairs Committee’s Operation Muslim Vote 2019.
  • Think tanks and academic research centres providing in-depth comment and analysis. We have sought to include both right- and left-wing views, and comment from political, legal and economic viewpoints. Targets include the Centre for Labour and Social Studies (CLASS), the London School of Economics British Politics and Policy blog, the Centre for Constitutional Change, Demos’ Manifesto for consensus politics, the Democratic Audit blog, the British Future think tank, Full Fact, the Institute for Fiscal Studies, the King’s Fund and the Institute for Government.

We hope that this collection will preserve the voices and illustrate the concerns and priorities of a wide spectrum of UK society and help to show how political parties and candidates engaged and responded at this pivotal moment in UK history.

General Election 2019 collection. Note that you can view what is in this collection but many of the actual websites can only be viewed in the reading room of a UK Legal Deposit Library.

27 April 2020

The Brexit Collection in the UK Web Archive

By Jennie Grimshaw, Curator of Official Publications, The British Library

The vote to leave the EU on June 23rd 2016 by 52% to 48% bitterly divided the nation. Remainers argued that the margin of victory was narrow, and that Leave voters had been misled. They saw Brexit as a self-inflicted wound and campaigned for a second referendum.  Leavers accused the Remain camp of seeking to undermine democracy and frustrate the clearly expressed will of the majority of voters. They saw Brexit as an opportunity to throw off what they viewed as the shackles of the EU and become a proud, independent sovereign nation once again.


The Brexit web archive collection follows on directly from the EU Referendum Collection and traces the course of the increasingly bitter battle to either overturn Brexit or “get it done” from June 24th 2016 to Brexit Day on January 31st 2020.  It was created by web archivists and curators at the British Library, the Bodleian Library, and the national libraries of Scotland and Wales working in collaboration over three years.  It seeks to achieve balance by including:

  • News reports, indexed at the level of the individual article, from pro-Brexit, anti-Brexit and neutral newspapers and broadcasters. Pro-Brexit sources include the Sun, the Daily Mail and the Express newspapers. Overtly anti-Brexit sources are represented by the Guardian and the Independent.  Coverage of broadcast news sources focuses on the BBC, ITV and Sky News. The aim is to document every twist and turn of the unbelievably complex and convoluted process of delivering the EU Referendum result in order to guide enquirers through the labyrinth.
  • A range of online-only news and comment services, indexed at the site level, including Politics Home, Politics.co.uk, Huffington Post, CAPX, The Conversation, Spiked and Unherd. Unfortunately Politico had to be excluded as it operates outside of the UK web domain.
  • Legislation and legal commentary, covering bills, Commons and Lords debates, briefings on them by the Commons and Lords Libraries, legal challenges and case law, and comment in blogs such as EU Law Analysis and by professional groups such as Lawyers for Britain.
  • Government and Parliamentary action, offering Commons and Lords Select Committee reports, analysis by the Commons and Lords Libraries, and speeches, press notices, letters and guidance from GOV.UK.
  • The EU view on developments as reported by UK-based news sources. The constraints of the UK Non-Print Legal Deposit regulations meant that we could not include EU institutional websites based in Brussels!
  • The voices of politicians of all parties, heard through their websites, Twitter feeds, and blogs such as Conservative Home, Conservative Woman, Labour List and Liberal Democrat Voice.
  • Comment by pro- and anti-Brexit pressure groups such as Best for Britain and the People’s Vote Campaign (anti-Brexit) and Brexit Central and Better Off Out (pro-Brexit).
  • The voices of a wide range of trade and professional associations, charities, business organisations and trade unions lobbying government and speaking up for and advising their members. Some voices are sadly silenced due to technical difficulties, such as the website of The3Million, which speaks for EU citizens in the UK, but which cannot be archived because it is Wix-based. Others which are heard range from business associations Make UK and the CBI to charities like Age UK and professional bodies such as the British Veterinary Association.
  • Research and analysis by pro- and anti-EU think tanks such as Centre for European Reform, the Federal Trust, Policy Exchange, the Institute for Fiscal Studies, the Red Cell, Briefings for Brexit, Demos, IPPR and Centre for Policy Studies, the Institute for Government, Global Britain, Politeia, etc.
  • Sub-collections on the impact of Brexit on Wales and Scotland built up energetically by colleagues at their respective National Libraries.
  • Sub-collections on the impact on Northern Ireland and the Republic of Ireland, especially the vexed question of the Irish land border.

The path leading to Brexit day on January 31st was long and winding. The debate polarised society and strained the UK constitution to breaking point as Remainers in a hung Parliament fought across party lines to delay or prevent Brexit  and Leavers went to equally extreme lengths to deliver it. The process saw the birth of Lawfare as Remainers used legal challenges to seek to block it.  Brexit also destabilised the devolution settlement and reignited calls for Scottish independence. We hope that by documenting the debate as it played out on the Internet and on social media we can help researchers to gain understanding in retrospect of this tumultuous period of our history.

24 April 2020

Harnessing the Crowd: Coronavirus Topical Collection at the UK Web Archive

By Nicola Bingham, Lead Curator of Web Archiving, The British Library

Note: This post was originally published on the Digital Preservation Coalition (DPC) blog.

The UK Web Archive, a partnership of the 6 UK Legal Deposit Libraries* (LDLs), has been collecting UK websites since the early 2000s. As well as archiving snapshots of the whole UK Web Space we have dozens of curated collections focussing on a wide range of topics, themes and events reflecting all aspects of UK life.

Collections are instigated by a broad range of curators – in this context, ‘curator’ is not necessarily synonymous with job title – including LDL staff, academic researchers, various UK GLAM organisations (e.g., Jersey Heritage, Hampshire Archives and Local Studies, Wimbledon Lawn Tennis Museum) and local community groups. Collections may focus on a researcher’s area of interest, align with an institution’s collection policy or reflect diverse political, sporting or topical events such as the London Olympic Games, Brexit or Climate Change. Below are the members of the Web Archiving team at the British Library.

UK Web Archive Team

We have a particularly strong time-series of collections focusing on UK General Elections, having archived every campaign since 2005. For each event we have used more or less the same categories – candidates’ web presence, national and local political party websites, online news and commentary, interest group manifestos, and comment and analysis by think tanks.

Structuring the collections with consistent sub-categories enables curators to distribute web archiving more efficiently, as does dividing selection broadly along the lines of the geographical interest of the 3 National Libraries that belong to the UKWA.

We hope that our General Election collections will preserve the voices and illustrate the concerns and priorities of a wide spectrum of UK society and help to show how political parties and candidates engaged and responded at pivotal moments in UK history.

It is interesting to note how use of the Internet for political campaigning and communication has evolved over time. In 2005 very little social media existed and politicians were just beginning to explore its capabilities, whereas by 2019 campaigners were making little or no use of websites, concentrating almost exclusively on using social media.

The (somewhat) scheduled nature of UK General Elections, especially since the Fixed-term Parliaments Act of 2011, allows us to plan election web archiving strategies ahead of time. Having said this, we have been tested in recent years with snap elections in June 2017 and December 2019! And of course candidates are only announced a couple of weeks before polling day, which means we have to react at that point to archive candidates’ websites or official, publicly facing social media accounts.

Rapidly unfolding events such as natural disasters or terrorist attacks require a different approach. However, even here we have some experience, having archived collections about the London Terrorist Attack 2005, Grenfell Tower Fire, and Pandemic Outbreaks such as Avian Flu and Swine Flu over the years.

For the past few weeks we have been actively collecting the UK perspective of the Coronavirus (COVID-19) Pandemic. We are clearly facing one of the severest threats in our lifetimes, certainly one of the fastest and most clearly devastating, and while Librarians might not (yet) be members of the Emergency Services, we feel the act of recording the outbreak as it plays out online is a crucial one.

Websites are being selected by a cohort of curators across the LDLs and beyond. We have also been ably assisted by colleagues at the Royal College of Nursing Archives who are nominating health-related websites. However, due to the unpredictable, fast-paced nature of the outbreak and the consequent deluge of online information, it is all the more important for us to harness the crowd to elicit website nominations. For this reason, we will canvass for website nominations much more widely among our colleagues, the library and archive community and the general public when responding to rapidly unfolding events. We will also visit targeted websites much more frequently than we usually would to capture frequently edited web content.

The collection is not public yet while we concentrate on acquiring the websites. Once we’re finished, it will take time to prepare the collection for publication by performing quality assurance and clearing permissions for open access. In due course, the Coronavirus collection will be available here under the Pandemic Outbreaks Collection. The top-level heading reflects the fact that we have previously collected around Avian Flu and Swine Flu and acknowledges that, sadly, we will be collecting about future outbreaks.


In terms of getting involved, we welcome submissions from colleagues in the DPC community - and in fact from any member of the public. Details of how to nominate websites for inclusion are here: www.webarchive.org.uk/nominate. Alternatively, please email nominations to web-archivist@bl.uk

We’re also working on an international collection with the International Internet Preservation Consortium (IIPC). Details of how to contribute to this collection are here: netpreserveblog.wordpress.com/2020/02/13/cdg-collection-novel-coronavirus/ (non-English language websites are particularly welcome here).

If your organisation has not previously done any web archiving and you would like to capture your own institution’s or community’s response to Coronavirus, plenty of tools exist that can be used remotely. Webrecorder is a good place to start as it can be used in a browser, free of charge up to a 5GB data limit. Of course web archives such as the UKWA and Internet Archive would also be very happy to preserve your websites free of charge (see details above).

*The UK Legal Deposit Libraries: Bodleian Libraries, Oxford University; British Library; Cambridge University Libraries; National Library of Scotland; National Library of Wales; Trinity College, Dublin.