UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

2 posts from September 2023

27 September 2023

What can you discover and access in the UK Web Archive collection?

UK Web Archiving team, British Library

The UK Web Archive collects and preserves websites from the UK. When we started collecting in 2005, we sought permission from owners to archive their websites. Since 2013, legal deposit regulations have allowed us to automatically collect all websites that we can identify as located in or originating from the UK. 

Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technological structure and under different legal regulations. As a result, what can be discovered and accessed is complicated and not always easy to explain or understand. In this post we explain the concepts and terms that describe what a user will be able to find.

The table below summarises the different search and access options available via our main website (www.webarchive.org.uk). The rest of this post goes into more detail about the terms used in the table.

Table of content available in the UK Web Archive

Year

In this table, ‘year’ refers to the year in which we archived a website or web resource. This might be different from the year in which it was published or made available online. Once you have found an archived website, you can use the calendar feature to view all the instances, or ‘snapshots’, of that page (which might span many years).

Legal deposit regulations came into effect in April 2013. Before this date, websites were collected selectively and with the owners’ permission. This means the amount of content we have from this earlier period is comparatively small, but (with some exceptions) it is all openly available online.

From 2013 onwards, we have collected all websites that we can identify as located in or originating from the UK. We do this once per year in a process that we call the ‘annual domain crawl.’

URL look-up

If you know the URL of a website you want to find in the UK Web Archive, you can use the search box at: https://www.webarchive.org.uk. The search box should recognise that you are looking for a URL, and you can also use a drop-down menu to switch between Full Text and URL search.

URL search covers the largest portion of the collection, and our index, which makes the websites searchable, is updated daily.

UKWA Search Bar September 2023
https://www.webarchive.org.uk/

Full text search

Much of the web archive collection has been indexed, which allows a free-text search of the content, i.e., any word, phrase, number etc. Note: Given the amount of data in the web archive, the number of results can be very large.

Currently, full text search is available for all our automatically collected content up to 2015, and our curator-selected websites up to 2017.

Access at legal deposit libraries

Unless the website owner gives explicit permission otherwise, legal deposit regulations restrict access to archived websites to the six UK Legal Deposit Libraries. Access is in reading rooms using a library-managed computer terminal.

Users will need a reader's pass to access a reading room: check each Library's website for how to get a reader’s pass.

Online access outside a legal deposit library

We frequently request permission from website owners to allow us to make their archived websites openly accessible through our website. Where permission has been granted, these archived websites can be accessed at https://www.webarchive.org.uk/ from anywhere with internet access.

We also make openly accessible any archived web content that we can identify as having an Open Government Licence.

Of all the requests we send for open access to websites, we receive permission from approximately 25% of website owners. However, these websites make up a significant proportion of the content available in the archive. This is because they tend to be larger websites and are captured more frequently (daily, weekly, monthly etc.) over many years.

Curator selected websites

Each year, UK Web Archive curators and the partners we work with identify thousands of resources on the web that are related to a particular topic or event, or that require more frequent collection than once per year.

Many of these archived websites form part of our Topics and Themes collections. We have more than 100 of these, covering general elections, sporting events, creative works, and communications between groups with shared interests or experiences. You can browse these collections to find archived web resources relating to these topics and themes. 

Annual Domain Crawl

Separate from selections made by curators, we conduct an annual ‘domain crawl’ to collect as much of the UK Web as possible. This is done under the Non-Print Legal Deposit regulations, with one ‘crawl’ completed each year. This domain crawl is largely automated and looks to archive all .uk, .scot, .wales, .cymru and .london top-level domain websites plus others that have been identified as being UK-based and in scope for collection.
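As a rough illustration of the domain-based part of this scoping, the sketch below checks whether a URL's host ends in one of the top-level domains listed above. It is a minimal example under stated assumptions, not a description of the crawler we actually run: the function name and the constant are hypothetical, and it cannot detect the "others that have been identified as being UK-based", which are found by other means.

```python
from urllib.parse import urlsplit

# Top-level domains named in this post as targets of the annual domain crawl.
# (Constant name is hypothetical, chosen for this illustration.)
UK_TLDS = ("uk", "scot", "wales", "cymru", "london")


def in_scope_by_tld(url: str) -> bool:
    """Return True if the URL's host ends in one of the listed TLDs.

    Assumption: scope is judged on the top-level domain alone. Sites on
    other domains that are UK-based are not recognised by this check.
    """
    # Accept bare hosts such as "example.co.uk" as well as full URLs.
    host = urlsplit(url if "://" in url else "//" + url).hostname
    if not host:
        return False
    # Take the last dot-separated label of the host as the TLD.
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in UK_TLDS
```

For example, `in_scope_by_tld("https://www.webarchive.org.uk/")` is true, while a `.com` address would need to be identified as UK-based in some other way.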

21 September 2023

How YouTube is helping to drive UK Web Archive nominations

By Carlos Lelkes-Rarugal, Assistant Web Archivist, British Library

Screenshot of the UK Web Archive website 'Save a UK website' page.
https://www.webarchive.org.uk/nominate

There is now a plethora of digital platforms for all manner of works published online. YouTube has become more than a platform for sharing videos: it has evolved into a way for individuals and organisations to reach a global audience and convey powerful messages. Recently, Tom Scott, a popular content creator on YouTube, produced a short video outlining the purpose of Legal Deposit and, by extension, the work being carried out by UKWA.

Watch the video here: https://www.youtube.com/watch?v=ZNVuIU6UUiM

Tom Scott’s video, titled "This library has every book ever published", is a concise and authentic glimpse into the work being done by the British Library, one of the six UK Legal Deposit Libraries. The video highlighted some of the technology that enables preservation at scale, as well as current efforts in web archiving. Dr Linda Arnold-Stratford (Head of Liaison and Governance for the Legal Deposit Libraries) stated, “The Library collection is around 170 million items. The vast majority of that is Legal Deposit”. Ian Cooke (Head of Contemporary British and Irish Publications) noted that, with the expansion of Legal Deposit to include born-digital content, “the UK Web Archive has actually become one of the largest parts of the collection. Billions of files, about one and a half petabytes of data”.

At the time of writing, the video has had over 1.4 million views. As the video gained momentum, something remarkable happened: UKWA started receiving an influx of email nominations from website owners and members of the public. This was unexpected, and the volume of nominations that have since come through has been impressive and unprecedented.

The video has led to increased engagement with the public, with nominations representing an eclectic mix of websites. The comments on the video have been truly positive. We are grateful to Tom for highlighting our work, and we are also thankful and humbled that so many commenters have left encouraging messages, which are a joy to read. The British Library has the largest web archive team of all the Legal Deposit Libraries, but this is still a small team of three curators and four technical experts, and we do everything in-house, from curation to the technical side. Web archiving is a difficult task, but we are hopeful that we can continue to develop the web archive by strengthening our ties to the community and bringing together our collective knowledge.

If you know of a UK website that should be included in the archive, please nominate it here: https://www.webarchive.org.uk/en/ukwa/info/nominate