What can you discover and access in the UK Web Archive collection?
UK Web Archiving team, British Library
The UK Web Archive collects and preserves websites from the UK. When we started collecting in 2005, we sought permission from owners to archive their websites. Since 2013, legal deposit regulations have allowed us to automatically collect all websites that we can identify as located in or originating from the UK.
Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technological structure and under different legal regulations. As a result, what can be discovered and accessed is complicated and not always easy to explain or understand. In this post we attempt to explain the concepts and terms that describe what a user will be able to find.
The table below summarises the different search and access options available via our main website (www.webarchive.org.uk). The rest of this post goes into more detail about the terms used in the table.
In this table, ‘year’ refers to the year in which we archived a website or web resource. This might be different to the year in which it was published or made available online. Once you have found an archived website, you can use the calendar feature to view all the instances, or ‘snapshots’, of that page (which might run over many years).
Legal deposit regulations came into effect in April 2013. Before this date, websites were collected selectively and with the owners’ permission. This means the amount of content we have from this earlier period is comparatively small, but (with some exceptions) it is all openly available online.
From 2013 onwards, we have collected all websites that we can identify as located in or originating from the UK. We do this once per year in a process that we call the ‘annual domain crawl.’
If you know the URL of a website you want to find in the UK Web Archive, you can use the search box at: https://www.webarchive.org.uk. The search box should recognise that you are looking for a URL, and you can also use a drop-down menu to switch between Full Text and URL search.
URL search covers the largest portion of the collection, and our index, which makes the websites searchable, is updated daily.
Full text search
Much of the web archive collection has been indexed for free-text search of its content, i.e. any word, phrase, number etc. Note: given the amount of data in the web archive, the number of results will be very large.
Currently, full text search is available for all our automatically collected content up to 2015, and for our curator-selected websites up to 2017.
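As a rough illustration, the cut-off years above can be written as a simple lookup. This is a minimal Python sketch, not a description of how our search index works. The names `FULL_TEXT_CUTOFF` and `full_text_searchable` are hypothetical, and the sketch assumes that ‘up to’ includes the stated year.

```python
# Cut-off years as stated in this post; they may change as indexing progresses.
# Assumption: "up to" is treated as inclusive of the stated year.
FULL_TEXT_CUTOFF = {
    "domain_crawl": 2015,      # automatically collected content
    "curator_selected": 2017,  # websites selected by curators
}


def full_text_searchable(collection: str, archived_year: int) -> bool:
    """Return True if content archived in that year should be full text searchable."""
    return archived_year <= FULL_TEXT_CUTOFF[collection]


print(full_text_searchable("domain_crawl", 2016))      # later than the 2015 cut-off
print(full_text_searchable("curator_selected", 2016))  # within the 2017 cut-off
```

Content that falls outside these years may still be found using URL search.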
Access at legal deposit libraries
Unless the website owner gives explicit permission otherwise, legal deposit regulations restrict access to archived websites to the six UK Legal Deposit Libraries. Access is in reading rooms using a library-managed computer terminal.
Users will need a reader’s pass to access a reading room: check the website of each Library for how to get one.
Online access outside a legal deposit library
We frequently request permission from website owners to allow us to make their archived websites openly accessible through our website. Where permission has been granted, these archived websites can be accessed from our website https://www.webarchive.org.uk/ from any location where you have internet access.
Additionally, we make archived web content that we can identify as having an Open Government Licence openly accessible.
From all the requests we send for open access to websites, we receive permission from approximately 25% of website owners. However, these websites make up a significant proportion of the content available in the archive, because they tend to be larger websites and are captured more frequently (daily, weekly, monthly etc.) over many years.
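The access rules described in this section and the previous one can be summarised in a few lines of Python. This is only a sketch of the rules as stated in this post: the `ArchivedSite` class, its fields and the `access_location` function are hypothetical names used for illustration, and the real situation includes exceptions that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ArchivedSite:
    """Hypothetical record of the two facts that decide where a site can be viewed."""
    owner_permission: bool = False         # owner has agreed to open access
    open_government_licence: bool = False  # content identified as OGL


def access_location(site: ArchivedSite) -> str:
    """Return where an archived site can be viewed, following the rules in this post."""
    if site.owner_permission or site.open_government_licence:
        return "open online"
    # Default under legal deposit regulations: reading room access only.
    return "legal deposit library reading room"


print(access_location(ArchivedSite(owner_permission=True)))
print(access_location(ArchivedSite()))
```

The default case is the restrictive one, which reflects the fact that open access depends on an explicit permission or licence.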
Curator selected websites
Each year, UK Web Archive curators, and the partners we work with, identify thousands of resources on the web that are related to a particular topic or event, or that require more frequent collection than once per year.
Many of these archived websites form part of our Topics and Themes collections. We have more than 100 of these, covering general elections, sporting events, creative works, and communications between groups with shared interests or experiences. You can browse these collections to find archived web resources relating to these topics and themes.
Annual Domain Crawl
Separate from selections made by curators, we conduct an annual ‘domain crawl’ to collect as much of the UK web as possible. This is done under the Non-Print Legal Deposit regulations, with one ‘crawl’ completed each year. The domain crawl is largely automated and aims to archive all websites on the .uk, .scot, .wales, .cymru and .london top-level domains, plus others that have been identified as being UK-based and in scope for collection.
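The top-level domain part of this scope rule is easy to illustrate. The following is a minimal Python sketch, not the crawler's actual logic: the function name `has_in_scope_tld` is hypothetical, the URL is assumed to include a scheme such as `https://`, and the check does not cover UK-based websites on other domains, which are identified by other means.

```python
from urllib.parse import urlsplit

# Top-level domains named in this post.
IN_SCOPE_TLDS = (".uk", ".scot", ".wales", ".cymru", ".london")


def has_in_scope_tld(url: str) -> bool:
    """Return True if the URL's hostname ends in one of the listed top-level domains."""
    host = urlsplit(url).hostname  # None when the URL has no scheme or host
    if host is None:
        return False
    # Drop a trailing dot (fully qualified form) before comparing.
    return host.rstrip(".").lower().endswith(IN_SCOPE_TLDS)


print(has_in_scope_tld("https://www.bl.uk/"))
print(has_in_scope_tld("https://example.com/"))
```

A website on a domain such as .com would fail this check but could still be in scope if it has been identified as UK-based.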