UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

43 posts categorized "Legal deposit"

18 October 2023

UK Web Archive Technical Update - Autumn 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2023 Q2 report.

Replication

The most important achievement over the last quarter has been establishing a replica of the UK Web Archive holdings at the National Library of Scotland (NLS). The five servers we’d filled with data were shipped, and our NLS colleagues kindly unpacked and installed them. We visited a few weeks later, finishing off the configuration of the servers so they can be monitored by the NLS staff and remotely managed by us.

This replica contains 1.160 PB of WARCs and logs, covering the period up until February 2023. But, of course, we’ve continued collecting since then, and, including the 2023 Domain Crawl, we already hold significantly more data at the British Library (about 160 TB more, ~1.3 PB in total). So the next stage of the project is to establish processes to monitor and update the remote replica. Hopefully, we can update it over the internet rather than having to ship hardware back and forth, but this is what we’ll be looking into over the coming weeks.
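The post leaves the update mechanism open. As a minimal sketch, assuming each site can produce a manifest mapping file paths to sizes in bytes (a hypothetical format, not a described UKWA tool), the files the replica still needs could be worked out like this, along with a check of the totals quoted above:

```python
"""Work out which files a remote replica is missing (illustrative sketch).

Assumes a hypothetical manifest format: a dict mapping each file's path
(relative to the archive root) to its size in bytes.
"""

TB = 10**12  # decimal terabyte
PB = 10**15  # decimal petabyte


def files_to_replicate(local: dict, remote: dict) -> dict:
    """Return local files that are absent from the replica or differ in size."""
    return {path: size for path, size in local.items() if remote.get(path) != size}


def total_bytes(manifest: dict) -> int:
    """Sum of the file sizes in a manifest."""
    return sum(manifest.values())


if __name__ == "__main__":
    local = {"crawl/a.warc.gz": 1_000, "crawl/b.warc.gz": 2_000, "crawl/c.warc.gz": 3_000}
    remote = {"crawl/a.warc.gz": 1_000, "crawl/b.warc.gz": 1_500}  # b is a partial copy
    todo = files_to_replicate(local, remote)
    print(sorted(todo), total_bytes(todo), "bytes to send")

    # Scale check with the post's figures: 1.160 PB replicated plus ~160 TB new.
    print(f"{(1.160 * PB + 160 * TB) / PB:.2f} PB held in total")  # 1.32 PB
```

Comparing sizes alone will not catch silent corruption; comparing checksums at both ends would be the stronger check, at the cost of reading every file.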

The 2023 Domain Crawl

As reported before, this year we are running the Domain Crawl on site. It has had some issues with link farms, which caused the number of domains to leap from around 30 million to around 175 million and crashed the crawl process.
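The post doesn't say how the link farms were tracked down. One quick way to spot them, sketched here under the assumption that the crawl's queues can be listed as hostnames, is to count how many distinct hosts share a registered domain: a link farm shows up as one domain with thousands of generated subdomains. The tiny suffix list is a hypothetical stand-in for the full Public Suffix List.

```python
from collections import Counter

# Hypothetical, deliberately tiny list of multi-label public suffixes.
# A real tool should use the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "gov.uk"}


def registered_domain(host: str) -> str:
    """Approximate the registrable domain, e.g. a.b.example.co.uk -> example.co.uk."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def top_parents(hosts, n=5):
    """Count distinct hosts per registered domain, largest groups first."""
    counts = Counter(registered_domain(h) for h in set(hosts))
    return counts.most_common(n)


if __name__ == "__main__":
    hosts = [f"shop{i}.farm-example.co.uk" for i in range(1000)]
    hosts += ["www.bl.uk", "www.webarchive.org.uk", "blog.example.com"]
    for domain, n_hosts in top_parents(hosts, n=3):
        print(domain, n_hosts)
```

A domain with an outsized host count is a candidate for a crawl-scope rule or a per-domain cap, though some legitimate hosting platforms will look similar and need checking by hand.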

2023 Domain Crawl queues over time, showing peak at 175 million queues.

However, we were able to clean up and restart it, and it’s been stable since then. As of the end of this quarter we’ve downloaded 2.8 billion URLs, corresponding to 183 TB of (uncompressed) data.

Legal Deposit Access Service

We’ve continued to work with Webrecorder, who have added citation, search and print functionality to the ePub reader part of the Legal Deposit Access Service. This has been deployed and is available for staff testing, but we are still resolving issues around making it available for realistic testing in reading rooms across the Legal Deposit Libraries.

Browsertrix Cloud Local Deployment

We have worked out most of the issues around deploying Browsertrix Cloud in a way that complies with Non-Print Legal Deposit legislation and with our local policies. We are awaiting the 1.7.0 release, which will include everything we need to have a functional prototype service.

Once it’s running, we can start trying out some test crawls and work out how best to integrate the outputs into our main collection. We need a metadata protocol for marking crawls as ready for ingest, and we need to update our tools to copy the results carefully into our archival store and to support using WACZ files for indexing and access.
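The "ready for ingest" protocol is still to be designed, but one check that could sit in the copy step is easy to sketch. The WACZ format lists its packaged files in a `datapackage.json` with each file's path, size and hash; this hypothetical helper (not Browsertrix or UKWA code) verifies those entries before a WACZ is accepted. It assumes hashes are written as `sha256:<hex>` and falls back to SHA-256 if no prefix is present.

```python
import hashlib
import io
import json
import zipfile


def _digest(wacz: zipfile.ZipFile, name: str, algo: str):
    """Stream one member of the archive, returning (hex digest, size)."""
    h = hashlib.new(algo)
    size = 0
    with wacz.open(name) as member:
        for chunk in iter(lambda: member.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest(), size


def verify_wacz(source) -> list:
    """Check each resource listed in a WACZ's datapackage.json.

    `source` is a path or binary file object. Returns a list of problems;
    an empty list means every listed file has the recorded size and hash.
    """
    problems = []
    with zipfile.ZipFile(source) as wacz:
        package = json.loads(wacz.read("datapackage.json"))
        for res in package.get("resources", []):
            name = res["path"]
            # Hashes are expected as "<algorithm>:<hex>"; assume sha256 if no prefix.
            algo, _, expected = res.get("hash", "").rpartition(":")
            try:
                actual, size = _digest(wacz, name, algo or "sha256")
            except KeyError:  # ZipFile.open raises KeyError for absent members
                problems.append(f"missing: {name}")
                continue
            if "bytes" in res and size != res["bytes"]:
                problems.append(f"size mismatch: {name}")
            if expected and actual != expected:
                problems.append(f"hash mismatch: {name}")
    return problems


if __name__ == "__main__":
    # Build a tiny in-memory WACZ with one good entry and one missing entry.
    warc = b"WARC/1.1\r\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("archive/data.warc", warc)
        z.writestr("datapackage.json", json.dumps({"resources": [
            {"path": "archive/data.warc", "bytes": len(warc),
             "hash": "sha256:" + hashlib.sha256(warc).hexdigest()},
            {"path": "archive/gone.warc.gz", "bytes": 1, "hash": "sha256:00"},
        ]}))
    buf.seek(0)
    print(verify_wacz(buf))  # ['missing: archive/gone.warc.gz']
```

Streaming each member in 1 MiB chunks keeps memory flat even for multi-gigabyte WARCs inside the package.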

27 September 2023

What can you discover and access in the UK Web Archive collection?

UK Web Archiving team, British Library

The UK Web Archive collects and preserves websites from the UK. When we started collecting in 2005, we sought permission from owners to archive their websites. Since 2013, legal deposit regulations have allowed us to automatically collect all websites that we can identify as located in or originating from the UK. 

Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technical infrastructure and under different legal regulations. As a result, what can be discovered and accessed is complicated, and not always easy to explain and understand. In this post we attempt to explain the concepts and terms that describe what a user will be able to find.

In the table below is a summary of the different search and access options which can be carried out via our main website (www.webarchive.org.uk). The rest of this post will go into more detail about the terms that we have used in this table.

Table of content available in the UK Web Archive

Year

In this table, ‘year’ refers to the year in which we archived a website or web resource. This might differ from the year in which it was published or made available online. Once you have found an archived website, you can use the calendar feature to view all the instances, or ‘snapshots’, of that page (which might span many years).

Legal deposit regulations came into effect in April 2013. Before this date, websites were collected selectively and with the owners’ permissions. This means the amount of content we have from this earlier period is comparatively smaller, but (with some exceptions) is all available openly online. 

From 2013 onwards, we have collected all websites that we can identify as located in or originating from the UK. We do this once per year in a process that we call the ‘annual domain crawl.’

URL look-up

If you know the URL of a website you want to find in the UK Web Archive, you can use the search box at: https://www.webarchive.org.uk. The search box should recognise that you are looking for a URL, and you can also use a drop-down menu to switch between Full Text and URL search.

URL search covers the widest amount of the collection, and our index, which makes the websites searchable, is updated daily.

UKWA Search Bar September 2023
https://www.webarchive.org.uk/

Full text search

Much of the web archive collection has been indexed to allow free-text search of the content (any word, phrase, number, etc.). Note: given the amount of data in the web archive, the number of results may be very large.

Currently, full text search is available for all our automatically collected content up to 2015, and our curator selected websites up to 2017. 

Access at legal deposit libraries

Unless the website owner gives explicit permission otherwise, legal deposit regulations restrict access to archived websites to the six UK Legal Deposit Libraries. Access is in reading rooms using a library managed computer terminal.

Users will need a reader’s pass to access a reading room: check each Library’s website for details of how to get a reader’s pass.

Online access outside a legal deposit library

We frequently request permission from website owners to allow us to make their archived websites openly accessible through our website. Where permission has been granted, these archived websites can be accessed from our website https://www.webarchive.org.uk/ from any location where you have internet access.

We also make archived web content that we can identify as having an Open Government Licence openly accessible.

Of all the requests we send for open access to websites, we receive permission from approximately 25% of website owners. However, these websites account for a significant share of the content available in the archive, because they tend to be larger websites that are captured more frequently (daily, weekly, monthly etc.) over many years.

Curator selected websites

Each year, UK Web Archive curators, and other partners who we work with, identify thousands of resources on the web that are related to a particular topic or event, or that require more frequent collection than once per year.

Many of these archived websites form part of our Topics and Themes collections. We have more than 100 of these, covering general elections, sporting events, creative works, and communications between groups with shared interests or experiences. You can browse these collections to find archived web resources relating to these topics and themes. 

Annual Domain Crawl

Separate from selections made by curators, we conduct an annual ‘domain crawl’ to collect as much of the UK Web as possible. This is done under the Non-Print Legal Deposit regulations, with one ‘crawl’ completed each year. This domain crawl is largely automated and looks to archive all .uk, .scot, .wales, .cymru and .london top-level domain websites plus others that have been identified as being UK-based and in scope for collection.
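The first part of that scope rule, the top-level domain check, can be sketched as a simple hostname test. This is an illustration only: it covers just the TLDs named above, and the other route into scope (sites identified as UK-based by other means) is not modelled.

```python
from urllib.parse import urlsplit

# Top-level domains named in the post as in scope for the domain crawl.
UK_TLDS = (".uk", ".scot", ".wales", ".cymru", ".london")


def in_tld_scope(url: str) -> bool:
    """True if the URL's host ends in one of the listed TLDs.

    Hosts under other TLDs can still be in scope if they have been
    identified as UK-based some other way; that is not modelled here.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host.endswith(UK_TLDS)


if __name__ == "__main__":
    for url in ["https://www.bl.uk/", "http://example.scot/page", "https://example.com/"]:
        print(url, in_tld_scope(url))
```

Matching on the leading dot (".uk" rather than "uk") avoids false positives for hosts that merely end in those letters.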

21 September 2023

How YouTube is helping to drive UK Web Archive nominations

By Carlos Lelkes-Rarugal, Assistant Web Archivist, British Library

Screenshot of the UK Web Archive website 'Save a UK website' page.
https://www.webarchive.org.uk/nominate

There currently exists a plethora of digital platforms for all manner of online published works; YouTube itself has become more than just a platform for sharing videos, it has evolved into a platform for individuals and organisations to reach a global audience and convey powerful messages. Recently, a popular content creator on YouTube, Tom Scott, produced a short video outlining the purpose of Legal Deposit and, by extension, the work being carried out by UKWA.

Watch the video here: https://www.youtube.com/watch?v=ZNVuIU6UUiM

Tom Scott’s video, titled "This library has every book ever published", is a concise and authentic glimpse into the work being done by the British Library, one of the six UK Legal Deposit Libraries. The video highlighted some of the technology that enables preservation at scale, including current efforts in web archiving. Dr Linda Arnold-Stratford (Head of Liaison and Governance for the Legal Deposit Libraries) stated, “The Library collection is around 170 million items. The vast majority of that is Legal Deposit”. Ian Cooke (Head of Contemporary British and Irish Publications) highlighted that, with the expansion of Legal Deposit to include born-digital content, “the UK Web Archive has actually become one of the largest parts of the collection. Billions of files, about one and a half terabytes of data”.

At the time of writing, the video has had over 1.4 million views. In addition, as the video continued to gain momentum, something remarkable happened. UKWA started receiving an influx of email nominations from website owners and members of the public. This was unexpected and the volume of nominations that have since come through has been impressive and unprecedented. 

The video has led to increased engagement with the public, with nominations representing an eclectic mix of websites. The comments on the video have been truly positive. We are grateful to Tom for highlighting our work, and we are also thankful and humbled that so many commentators have left encouraging messages, which are a joy to read. The British Library has the largest web archive team of all the Legal Deposit Libraries, but this is still a small team of three curators and four technical experts who do everything in-house, from curation to the technical side. Web archiving is a difficult task, but we are hopeful that we can continue to develop the web archive by strengthening our ties to the community and bringing together our collective knowledge.

If you know of a UK website that should be included in the archive, please nominate it here:  https://www.webarchive.org.uk/en/ukwa/info/nominate

12 July 2023

UK Web Archive Technical Update - Summer 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2023 Q1 report.

At the end of the last quarter, we launched the 2023 Domain Crawl. This started well (as described in the 2023 Q1 report) but a few days later it became clear the crawl was going a bit too well. We were collecting so quickly, we started to run out of space on the temporary store we use as a buffer for incoming content.

The full story of how we responded to this situation is quite complicated, so I wrote up the detailed analysis in a separate blog post. But in short, we took the opportunity to move to a faster transfer process and switch to a widely-used open source tool called Rclone. After about a week of downtime, the crawl was up and running again, and we were able to keep up and store and index all the new WARC files as they come in.

Since then, the crawl has been running pretty well, but there have been some problems…

2023 Domain Crawl Storage and Queues

The crawler uses disk space in two main ways: the database of queues of URLs to visit (a.k.a. the crawl frontier), and the results of the crawl (the WARCs and logs). The work with Rclone helped us get the latter under control, with the move from /mnt/gluster/dc2023 to sharing the main /opt drive and uploading directly to Hadoop. These uploads run daily, leading to a saw-tooth pattern as free space gets rapidly released before being slowly re-consumed.
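A daily upload job has to avoid copying WARCs the crawler is still writing. Our understanding is that Heritrix marks in-progress WARCs with an extra `.open` suffix and renames them when they are closed; assuming that convention, this sketch lists only finished files. The directory layout is hypothetical, and the actual transfer (done with Rclone in the post) is left out.

```python
import tempfile
from pathlib import Path


def ready_for_upload(output_dir: str) -> list:
    """List closed WARC files under `output_dir`, in name order.

    Assumes in-progress WARCs carry an extra '.open' suffix (for example
    'x.warc.gz.open') until finished, so a '*.warc.gz' pattern only
    matches files that are safe to copy.
    """
    return sorted(p for p in Path(output_dir).rglob("*.warc.gz") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "done.warc.gz").write_bytes(b"")
        Path(d, "busy.warc.gz.open").write_bytes(b"")
        print([p.name for p in ready_for_upload(d)])  # ['done.warc.gz']
```

Deleting the local copies only after each upload has been verified is what produces the saw-tooth: space is released in one step, then slowly re-consumed by new WARCs.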

But the frontier shares the same disk space, and can grow very large during a crawl. So it’s important we keep an eye on things to make sure we don’t run out of space. In the past, before we made some changes to Heritrix itself, a domain crawl could consume huge amounts of disk space. Once, we hit over 100TB for the frontier, which becomes very difficult to manage. In recent domain crawls, our configuration changes have brought this down to more like 10TB.
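Keeping an eye on things mostly means asking "at this rate, when do we run out?". A minimal sketch, using made-up numbers: fit a least-squares line to recent usage samples and extrapolate. Because the daily uploads make total usage saw-tooth, it is better applied to the frontier size alone, or to the daily low points. Prometheus's `predict_linear()` function does a similar regression inside the monitoring system.

```python
from typing import List, Optional, Tuple


def days_until_full(samples: List[Tuple[float, float]], capacity_tb: float) -> Optional[float]:
    """Estimate days until a disk fills from (day, used_tb) samples.

    Fits a least-squares line through the samples and extrapolates from the
    latest reading. Returns None if usage is flat or falling.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx  # TB per day
    if slope <= 0:
        return None
    _, latest_used = samples[-1]
    return (capacity_tb - latest_used) / slope


if __name__ == "__main__":
    # Made-up frontier sizes (TB) on consecutive days.
    frontier = [(0, 4.0), (1, 4.5), (2, 5.0), (3, 5.5)]
    print(days_until_full(frontier, capacity_tb=10.0))  # 9.0
```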

But, as you can see, around the 13th of June, we hit some kind of problem, where the apparent number of queues in the frontier started rapidly increasing, as did the rate at which we were consuming disk space. We deleted some crawler checkpoints to recover some space, as we very rarely need to restart the crawl from anything other than the most recent daily checkpoint, but this only freed up modest amounts of space. Fortunately, the aggressive frontier growth seemed to subside before we ran out of space, and the crawl is now stable again.
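Since only the most recent daily checkpoint is normally needed, choosing which ones to delete is a one-liner once the names sort in time order. This sketch assumes checkpoint names begin with a zero-padded sequence number (the `cp00041-...` names are hypothetical), and always keeps at least one.

```python
def checkpoints_to_delete(names, keep=1):
    """Return every checkpoint except the `keep` most recent ones.

    Assumes checkpoint names sort chronologically, e.g. zero-padded
    sequence numbers like 'cp00041-...'; this naming is an assumption.
    """
    if keep < 1:
        raise ValueError("always keep at least the latest checkpoint")
    return sorted(names)[:-keep]


if __name__ == "__main__":
    found = ["cp00042-20230614", "cp00040-20230612", "cp00041-20230613"]
    print(checkpoints_to_delete(found))  # ['cp00040-20230612', 'cp00041-20230613']
```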

Unfortunately, it’s not clear what happened. Based on previous crawls, it seems unlikely that the crawler suddenly discovered many millions more web hosts at this point in the crawl. In the past, the number of queues has consistently peaked at around 20 million, so this leap to over 30 million is surprising. It is possible we hit some weird web structures, but it’s difficult to tell as we don’t yet have reliable tools for quickly analysing what’s going on in this situation.

Suspiciously, just prior to this problem, we resolved a different issue with the system used to record which URLs had already been seen. This had been accidentally starved of resources, causing problems when the crawler tried to record what URLs had been seen. This led to the gaps in the crawl monitoring data just prior to the frontier growth, as the system stopped working and required some reconfiguration. It’s possible this problem left the crawler in a confused state, leading to mismanagement of the frontier database. Some analysis of the crawl will be needed to work out what happened.

In the last quarter, the new URL search feature was deployed on our BETA service. Following favourable feedback on the new feature, the main https://www.webarchive.org.uk/ service has been updated to match. We hope you find the direct URL search useful.

We’ve also updated the code that recognises whether a visitor is in a Legal Deposit reading room, as it wasn’t correctly identifying readers at Cambridge University Library. Finally, there was an issue with how the CAPTCHAs on the contact and nomination forms were being validated, which has also been resolved.

Our colleagues from Webrecorder delivered the initial set of changes to the ePub renderer, making it easier to cite a paragraph of one of our Legal Deposit eBooks. Given how long the ePub format has been around, it is perhaps surprising that support for ‘obvious’ features like citations and printing is still quite immature, inconsistent and poorly standardised. To make citation possible, we ended up adopting the same approach as Calibre’s Reference Mode and implemented a web-based version that integrates with our access system.
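The post doesn't describe how the references are computed. As a loose illustration only, and not Calibre's or the Library's actual algorithm, stable paragraph references can be produced by numbering the `<p>` elements in each content document in reading order and citing them as chapter.paragraph:

```python
from html.parser import HTMLParser


class _ParagraphCollector(HTMLParser):
    """Gather the text of each <p> element in one XHTML content document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def paragraph_references(chapters):
    """Map 'chapter.paragraph' references (both 1-based) to paragraph text.

    `chapters` is the list of content documents in spine (reading) order.
    """
    refs = {}
    for c, markup in enumerate(chapters, start=1):
        collector = _ParagraphCollector()
        collector.feed(markup)
        collector.close()
        for p, text in enumerate(collector.paragraphs, start=1):
            refs[f"{c}.{p}"] = " ".join(text.split())
    return refs


if __name__ == "__main__":
    book = ["<h1>One</h1><p>First para.</p><p>Second <em>para</em>.</p>", "<p>Next chapter.</p>"]
    for ref, text in paragraph_references(book).items():
        print(ref, text)
```

Because the numbering depends only on the book's own structure, not on screen size or font settings, the same reference points to the same paragraph on every reading-room terminal.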

We’ve also worked on updating the service documentation based on feedback from our Legal Deposit Library partners, resolved some problems with how the single-concurrent-use locks were being handled and managed, and implemented most of the translations for the Welsh language service. The translations should be complete shortly, and an updated service can then be rolled out, including the second set of changes from Webrecorder (focused on searching the text of ePub documents).
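Single-concurrent-use access means only one reader at a time may open a given item. A minimal in-memory sketch of such a lock, with hypothetical item and session identifiers; a real service would keep the locks in shared storage so every terminal sees the same state. The expiry is the important design choice: a reader who walks away without closing the item should not block it indefinitely.

```python
import time


class SingleUseLocks:
    """Allow only one session at a time to hold each item, with expiring locks."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._locks = {}  # item id -> (session id, expiry time)

    def acquire(self, item, session):
        """Take or renew the lock; False if another session holds a live lock."""
        now = self.clock()
        holder = self._locks.get(item)
        if holder and holder[0] != session and holder[1] > now:
            return False
        self._locks[item] = (session, now + self.ttl)
        return True

    def release(self, item, session):
        """Drop the lock, but only if this session holds it."""
        holder = self._locks.get(item)
        if holder and holder[0] == session:
            del self._locks[item]


if __name__ == "__main__":
    locks = SingleUseLocks(ttl_seconds=600)
    print(locks.acquire("ebook-123", "terminal-a"))  # True
    print(locks.acquire("ebook-123", "terminal-b"))  # False: already in use
    locks.release("ebook-123", "terminal-a")
    print(locks.acquire("ebook-123", "terminal-b"))  # True
```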

Replication to NLS

The long process of establishing a replica of our holdings at the National Library of Scotland (NLS) is finally nearing completion. We have an up-to-date replica, and have been attempting to arrange the transfer of the servers. This turned out to be a bit more complicated than we expected, so has been delayed, but should be completed in the next few weeks.

Minor Updates

For curators, one small but important fix was improving how the W3ACT curation tool validates URLs. This was thought to have been fixed already, but W3ACT was not using URL validation consistently, so it was still blocking the creation of crawl target records with top-level domains like .sport (rather than the more familiar .uk or .com etc.). On June 23rd, we released version 2.3.5 of W3ACT, which should finally resolve this issue.
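W3ACT's own fix isn't shown in the post. The general idea, checking that a URL is well-formed without comparing its top-level domain against a fixed list, can be sketched like this; the function name and rules are illustrative assumptions, and internationalised hosts are expected in their punycode (`xn--`) form.

```python
import re
from urllib.parse import urlsplit

# One DNS label: 1-63 ASCII letters, digits or hyphens, no leading/trailing hyphen.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_target_url(url: str) -> bool:
    """Accept http(s) URLs whose host is a syntactically valid domain name.

    The top-level domain is checked only for shape (letters, or a punycode
    'xn--' label), not against a fixed list, so newer TLDs such as .sport pass.
    """
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname  # lower-cased by urlsplit; None if absent
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    labels = host.rstrip(".").split(".")
    if len(labels) < 2 or not all(_LABEL.match(label) for label in labels):
        return False
    tld = labels[-1]
    return tld.isalpha() or tld.startswith("xn--")


if __name__ == "__main__":
    for url in ["https://www.example.sport/", "http://example.co.uk/page", "ftp://example.com/"]:
        print(url, is_valid_target_url(url))
```

Applying the same function everywhere a URL enters the system is what avoids the inconsistency described above.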

Apart from that, we also updated Apache Airflow to version 2.5.3, and leveraged our existing Prometheus monitoring system to send alerts if any of our SSL certificates are about to expire.
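The post doesn't say how the certificate data reaches Prometheus (the blackbox exporter's `probe_ssl_earliest_cert_expiry` metric is one common route). The underlying check is simple enough to sketch with Python's standard `ssl` module; the 14-day threshold is a hypothetical choice.

```python
import socket
import ssl
import time
from typing import Optional

ALERT_DAYS = 14  # hypothetical alerting threshold


def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the 'notAfter' field of a server's certificate.

    Uses the default verifying context, so an already-expired or otherwise
    invalid certificate raises ssl.SSLError instead of returning a date.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a 'notAfter' time such as 'Jun  1 12:00:00 2024 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def should_alert(not_after: str, now: Optional[float] = None) -> bool:
    """True if the certificate expires within ALERT_DAYS."""
    return days_until_expiry(not_after, now) < ALERT_DAYS


if __name__ == "__main__":
    # Offline example with a fixed "now"; against a live site you would pass
    # fetch_not_after("hostname") instead of a literal date.
    now = ssl.cert_time_to_seconds("May 25 12:00:00 2024 GMT")
    print(days_until_expiry("Jun  1 12:00:00 2024 GMT", now))  # 7.0
    print(should_alert("Jun  1 12:00:00 2024 GMT", now))  # True
```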

26 June 2023

LGBTQ+ Connections and Community

By Ash Green, CLIP LGBTQ+ Network, and Goldsmiths University

The Marlborough Pub and Theatre

I was browsing through the LGBTQ+ Lives Online collection recently, and reminded myself that I had added The Marlborough Pub and Theatre to it when I first began co-curating the collection. As far as I can remember, it was one of the first sites I added to the archive. I wanted it in there because it had been an important part of my coming out around 2017. I had a personal connection to it, and I wanted there to be a record of the impact it had on me. I know future explorers of the UK Web Archive won’t know why that site is archived, but maybe they will stumble across this blog post in connection to it and understand its importance to at least one BTQ person – me.

So, why did I specifically want this site in there? Well, in 2017, when I was working out what support there was for me as a trans/gender non-conforming person, I discovered The Clare Project, a trans support group in Brighton. I went along to it, and afterwards we went to The Marlborough Theatre and Pub, a venue with a long history of supporting the LGBTQ+ community. The pub was the sort of place where I didn’t know anyone, but just being there made me feel okay about who I was. It was the first time being in an LGBTQ+ venue had felt like that to me. And I realised that there were other people there who seemed to be on similar paths in their lives. It was a reassuring place, and it was a place where I learnt about how diverse the LGBTQ+ community was. I remember going to a queer cabaret there, and it was such an amazing, heart-warming, queer, eye-opening and fun night. The pub is still there – now called The Actors. I’m not a regular visitor, and if you mention my name in there, they won’t know who I am. But when I call in from time to time when I’m in Brighton, I still get that sense of belonging to a community even if I’m quietly sitting in a corner reading on my own. It is a place that re-energises me.

It got me wondering about other sites in the LGBTQ+ Lives Online collection focused on artistic communities that may have had a similar impact on others as The Marly did on me.

So, for example, what joy did members of South Wales Gay Men's Chorus, Songbirds Choir, or True Colours LGBT Choir feel when they first sang with these choirs?

How excited were listeners when they heard a new track on LGBT Underground that struck a strong emotional chord with them, and has stayed with them forever?

How did filmmakers feel when their first film appeared at the Scottish Queer International Film Festival, LezDiff, or the Iris Prize? And who in the audience saw something for the first time at these film festivals that resonated strongly with them?

And what sense of connection and belonging did those in queer / LGBTQ+ art groups such as The Queer Dot, Sanctuary Queer Arts, Wise Thoughts, and VFD find within their arts communities?

And maybe there are LGBTQ+ people who attended Queen Jesus, Teatro do Mundo, or even The Marlborough theatre performances, who realised the voice on stage was talking directly to them, and they clearly understood its message in relation to who they are as an LGBTQ+ person.

I know I can’t possibly be the only LGBTQ+ person who feels a strong connection with a place or community like these. Maybe you have a story to share about one of the sites in the collection? Or maybe you have a site like one of these that you would like us to add. You can nominate sites for inclusion here: https://www.webarchive.org.uk/nominate

We can’t curate the whole of the UK web on our own. We need your help to ensure that information, discussions, personal experiences and creative outputs related to the LGBTQ+ community are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in the above nominations form.

If you would like to explore any of the sites mentioned in this blog post, you can find them in the Arts, Literature, Music & Culture subsection of the LGBTQ+ Lives Online collection: https://www.webarchive.org.uk/en/ukwa/collection/3090

20 April 2023

UK Web Archive Technical Update - Spring 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2022 Q4 report.

Summarising Our Holdings

We regularly report on our holdings so other teams across the Legal Deposit Libraries have an understanding of how much data we hold and how we grow over time. Until recently, the reporting mechanism we used did not fully take into account the storage used across different clusters, and on Amazon Web Services.

In January the old reporting mechanism was replaced with a new implementation, better integrated with our other systems and covering all storage services. The Airflow scheduler (discussed in previous reports) generates updated lists of holdings from different systems, and a Jupyter notebook is then used as a dashboard. This is made accessible via the W3ACT curation service, unlike the old system, which was only available to British Library staff.

While it doesn’t get updated automatically, there’s also an older copy of the notebook on GitHub. See UK Web Archive Holdings Summary Report. As you can see there, the UK Web Archive now holds over 1.4 PB of WARCs and logs.
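The post doesn't describe the report's internals. As a minimal sketch of the kind of aggregation such a notebook might perform, assuming a hypothetical listing of (store, path, size, modified) rows exported by the scheduled tasks from each storage system:

```python
from collections import defaultdict
from datetime import datetime

TB = 10**12  # decimal terabyte


def summarise_holdings(rows):
    """Total bytes per (store, year) from (store, path, size, modified) rows."""
    totals = defaultdict(int)
    for store, _path, size, modified in rows:
        totals[(store, modified.year)] += size
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    rows = [  # made-up example rows
        ("hdfs", "2022/a.warc.gz", 2 * TB, datetime(2022, 5, 1)),
        ("hdfs", "2022/b.warc.gz", 3 * TB, datetime(2022, 6, 1)),
        ("aws", "2023/c.warc.gz", 1 * TB, datetime(2023, 1, 9)),
    ]
    for (store, year), size in summarise_holdings(rows).items():
        print(f"{store:5} {year}  {size / TB:6.1f} TB")
    print(f"total {sum(r[2] for r in rows) / TB:.1f} TB")
```

Keying totals by storage system as well as year is what lets the report account for data spread across different clusters and cloud storage, the gap the old mechanism had.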

The new system for Reading Room access to Non-Print Legal Deposit material has also made steady progress. An alpha version of the system has been rolled out across all LDLs so staff can access the service for testing, and a beta service is being rolled out to run alongside the current system in reading rooms. The deployment of the services themselves has also been automated, using GitLab CI/CD to update the systems rather than relying on updating them by hand.

Staff testing raised some additional requirements to be met before the service roll-out can proceed. Working with Webrecorder to meet these requirements will be the focus for the next quarter.

UKWA Website

Edited 28th April 2023 to include translation updates.

The main website has been updated to run version 2.6.9 of our PyWB playback engine, and version 1.4.5 of the main search interface. Version 1.4.5 does not change the site’s basic functionality, but does significantly improve the Scottish Gaelic version of the site.

However, we’ve also looked at more significant changes to the public interface to the archive.

Firstly, we’d like to update to a newer version of PyWB, which now features an updated timeline and calendar display. Secondly, some experimentation with letting search engines index selected websites showed that it may be necessary to include links to the archived sites somewhere in the main site so that the crawler finds and prioritises those URLs for indexing. To test this out, a page has been added to the site that lists any archived sites that require indexing, and that page has been included in the site map.
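Site maps follow the sitemaps.org protocol: a `<urlset>` of `<url><loc>` entries. A minimal sketch of generating one for a list of pages; the example.org URLs are placeholders, not the archive's real paths.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> bytes:
    """Return a sitemaps.org <urlset> document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    # ElementTree escapes characters such as '&' in query strings.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    pages = ["https://example.org/archived-sites", "https://example.org/page?a=1&b=2"]
    print(build_sitemap(pages).decode("utf-8"))
```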

Finally, we’ve found that many queries are better answered by direct URL search than keyword search, so we wanted to find ways to better integrate PyWB’s URL search functionality with the main site. To make URL search easier to use, we want to change the main search interface on the front page of the website to spot URL searches and direct the user to the right results.
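A minimal sketch of "spotting" a URL search, with a deliberately simple heuristic: a query with no spaces that starts with a scheme or looks like a hostname is treated as a URL and sent to pywb's `/<collection>/*/<url>` query form, which lists all captures of that URL. The base URL and the `/search?text=` path for full-text queries are hypothetical placeholders, not UKWA's actual layout.

```python
import re
from urllib.parse import urlencode

# Hypothetical base for the playback service and collection name.
PYWB_COLLECTION = "https://pywb.example.org/archive"

_SCHEME = re.compile(r"^https?://", re.IGNORECASE)
_BARE_HOST = re.compile(r"^([a-z0-9-]+\.)+[a-z]{2,}(/\S*)?$", re.IGNORECASE)


def looks_like_url(query: str) -> bool:
    """Heuristic: no spaces, and either a scheme or a dotted hostname."""
    q = query.strip()
    if not q or any(c.isspace() for c in q):
        return False
    return bool(_SCHEME.match(q) or _BARE_HOST.match(q))


def search_target(query: str) -> str:
    """Send URL-like queries to the pywb URL query, others to full-text search."""
    q = query.strip()
    if looks_like_url(q):
        return f"{PYWB_COLLECTION}/*/{q}"
    return "/search?" + urlencode({"text": q})


if __name__ == "__main__":
    for q in ["bbc.co.uk", "https://www.bl.uk/", "legal deposit"]:
        print(q, "->", search_target(q))
```

Some inputs are genuinely ambiguous (a dotted phrase like "hello.world" would be treated as a URL), which is presumably why the interface also keeps an explicit switch between URL and Full Text search.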

The BETA version of the website has been updated to include these changes, and is now available for review. If you have any feedback, please let us know.

Image: The BETA homepage for the UK Web Archive, offering URL or Full Text search

Web Archive Discovery tool updates

One long-standing issue we have is that our full-text search does not contain recent material, and over the next year we hope to revisit the scaling problems we’ve seen and try to improve the situation.

As an initial step towards this, we spent some time updating our search tools. The webarchive-discovery indexer has been updated to use version 2 of Apache Tika, along with upgrades to other dependencies like the Nanite wrapper that makes it possible for us to use The National Archives’ PRONOM/DROID format identification engine. These changes are quite significant, so the version number has been bumped from 3.3.x to 3.4.x.

We are also considering an alternative workflow, where we store the extracted metadata in an intermediate form, rather than going directly to Apache Solr or Elasticsearch. To enable us to experiment with this approach, the indexer has been modified to support writing the extracted metadata to JSON Lines output files so that we can use it to support multiple forms of indexing or analysis.
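JSON Lines is simply one JSON object per line, so a single extraction pass can feed several consumers later. A minimal sketch of writing and re-reading such a file; the field names are hypothetical and do not reflect webarchive-discovery's actual output schema.

```python
import json
import os
import tempfile
from collections import Counter


def write_jsonl(records, path):
    """Write one JSON object per line; returns the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Yield records back from a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical field names; not webarchive-discovery's real schema.
    extracted = [
        {"url": "https://example.org/", "crawl_year": 2023, "content_type": "text/html"},
        {"url": "https://example.org/logo.png", "crawl_year": 2023, "content_type": "image/png"},
        {"url": "https://example.org/about", "crawl_year": 2022, "content_type": "text/html"},
    ]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "extracted.jsonl")
        write_jsonl(extracted, path)
        # Consumer 1: a quick analysis pass over the intermediate file.
        print(Counter(r["content_type"] for r in read_jsonl(path)))
        # Consumer 2: documents for a search index, built from the same file.
        docs = [{"id": r["url"], "year": r["crawl_year"]} for r in read_jsonl(path)]
        print(len(docs), "documents ready for indexing")
```

Because each line is independent, large files can be split, streamed or re-processed in parallel without re-running the expensive extraction over the WARCs.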

2023 Domain Crawl Preparation

As discussed in the previous report, this year we are bringing the domain crawl back on-site rather than running on the cloud. The technical preparation for this was fairly straightforward, given the deployment of the crawl is largely automated. The main change from the last on-site crawl is that we switched to using a server with plenty of fast SSD disks. The cloud crawls had shown us how much the whole thing can benefit from faster disks, so we have attempted to match that when running on our own servers.

Add some updated seed lists from Nominet and from our curators, and we are ready to roll on the anniversary of the first Non-Print Legal Deposit domain crawl. That one started on the 12th of April 2013, and so we’ve chosen that for our start date this year. This will be part of the wider celebrations from across the legal deposit libraries.


Addendum - 13th April 2023

Due to staff holidays, we are only now publishing this quarterly report, so we can add some notes on the launch of the 2023 domain crawl.

The crawl was set up on the 11th, and loaded with the 11 million seed URLs from Nominet and the 27,059 domain crawl seeds from W3ACT (including 13,460 non-UK seeds). On the morning of the 12th, the crawl was launched, and it seems to be running well, at around 400 URLs per second. If the system can sustain this rate, which corresponds to around one billion URLs per month, the whole crawl should complete in 2-3 months’ time.
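A quick check of the arithmetic above, assuming a steady 400 URLs per second and 30-day months:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def urls_per_period(rate_per_second, days):
    """URLs fetched at a steady rate over a number of days."""
    return rate_per_second * SECONDS_PER_DAY * days


print(f"{urls_per_period(400, 30):,} URLs in 30 days")  # 1,036,800,000 URLs in 30 days
for days in (60, 90):
    print(f"{days} days: {urls_per_period(400, days):,} URLs")

# Seed counts from the addendum: W3ACT seeds minus the non-UK ones.
print(f"{27_059 - 13_460:,} UK seeds from W3ACT")  # 13,599 UK seeds from W3ACT
```

So a 2-3 month crawl at this rate corresponds to roughly 2.1 to 3.1 billion URLs.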

Image: Dashboard for the first 24 hours of the 2023 Domain Crawl

For more information on the anniversary of Non-Print Legal Deposit, see Celebrating ten years of collecting the UK Web Space.

04 April 2023

Celebrating ten years of collecting the UK Web Space

Nicola Bingham, Lead Curator, Web Archiving, British Library

This April, we are celebrating ten years of collecting and preserving digital publications in the UK such as websites, e-books, and online journals, under legal deposit regulations. The UK Web Archive forms an important part of our collecting activity, across all six legal deposit libraries. We aim to preserve a copy of every UK website that we can identify, reflecting the broad range of experience and expression across the UK.

Large upper case text in a dark colour that reads - Everything Forever. The subtitle is - 10 Years Electronic Legal Deposit. At the bottom of the image is the logo of the six UK Legal Deposit Libraries - British Library, Bodleian Libraries, Cambridge University Library, National Library Scotland, The Library of Trinity College Dublin and the National Library of Wales.

The UK Web Archive provides a detailed insight into the evolution of online public communication over the past two decades. Communication on the web is central to understanding the history, politics, culture and society of the 21st century. However, we know that information shared publicly on the web is rapidly changed, deleted and replaced. The UK Web Archive helps people to understand current events, and the recent past, by preserving that information before it is lost.

Here are a few examples of topics and themes that we have preserved in the archive:

  • General elections: We have archived websites related to every UK general election since 2005. These websites provide a fascinating insight into the political campaigns, issues, and debates of each election.
  • London Olympics and Paralympics 2012: These websites document the planning, organisation, and events of the games, as well as the cultural and social impact they had on the UK.
  • Brexit: This collection documents the political, social, and economic impacts of Brexit. It contains official sources as well as voices from all sides of the debate across the UK.
  • Online Enthusiast Communities: This collection provides insight into hobbyists in the UK. It covers a wide range of interests from more traditional areas, such as stamp collecting and cycling, to the more esoteric, such as the UK Roundabout Appreciation Society.

The UK Web Archive is used by researchers to answer significant questions on various topics. Recent examples include:

The UK Web Archive has been in existence since 2004. Legal deposit regulations came into effect on 6 April 2013 which increased our capacity to collect the UK’s online heritage and ensure it is available for future generations to research and study.

Prior to these regulations, we had to ‘hand pick’ websites to archive, and could only proceed with written permission from the website owner. From 6 April 2013, the six legal deposit libraries of the UK and Ireland (the British Library, the National Library of Scotland, the National Library of Wales, the Bodleian Libraries, Cambridge University Library and the Library of Trinity College Dublin) were empowered to collect and preserve all web content that could be identified as published in the UK. Since then, we have been archiving the UK Web at the “domain” level and hold many millions of websites, amounting to over a petabyte of digital content. The 11th annual “domain crawl” will be launched this week.

How can I access it?
Anyone can access the UK Web Archive, free of charge, at the six UK Legal Deposit Libraries.

You can search the archive, and view thousands of openly accessible archived websites at https://www.webarchive.org.uk/

Help us build the archive
Even though we aim to collect as much of the UK Web as possible, we miss many websites as we cannot automatically identify all of them as being published in the UK. If you know of a UK website that should be preserved, please suggest it here: https://www.webarchive.org.uk/en/ukwa/info/nominate

07 December 2022

Pride and Visibility in the LGBTQ+ Lives Online Collection

By Ash Green, CLIP LGBTQ+ Network, and Goldsmiths University

The LGBTQ+ Lives Online UK Web Archive collection currently holds over 600 sites, web pages, blogs, etc., focused on the LGBTQ+ experience of people in the UK. Community and the coming together of individuals is a key aspect of the LGBTQ+ experience, and this is particularly reflected in sites acting as networks, sites focused on Pride events, and sites for visibility and remembrance days such as Bi Visibility Day, Lesbian Visibility Week, Trans Day of Remembrance, and International Day Against Homophobia, Biphobia and Transphobia. These events, networks and days are there to support the community and to remind those outside it that we exist, that we celebrate who we are, and that the need to highlight and address inequalities remains important despite LGBTQ+ people having existed for millennia.

Pride march with rainbow flags
Gotta Be Worth It from Pexels

Examples of sites in the UK Web Archive under some of these banners include: LGBT Mummies (aiming to support LGBT+ women & people globally on the path to motherhood or parenthood); London Gaymers (a safe place for the LGBT gaming community in London and across the UK to connect with like-minded individuals); African Rainbow Family (a not-for-profit charitable organisation that supports lesbian, gay, bisexual, transgender, intersexual and queer (LGBTIQ) people of African heritage and the wider Black Asian Minority Ethnic groups); Pride Sports (a focus on increasing participation in sport by lesbians, gay men, bisexual and transgender people as well as the wider community). As these examples show, many of the informal networks focus on where other aspects of an individual’s life overlap with being an LGBTQ+ person.

We also have Pride sites archived within the collection, including both local (Pride In Surrey, Glasgow’s Mardi Gla, York Pride) and nationwide (LGBTQYMRU) events. Before the pandemic these were mainly face-to-face events, but between 2020 and 2022 there was an increase in online events as many sought to keep LGBTQ+ people connected in a safe way.

We would like to build the collection of UK sites focused around Pride and awareness/visibility days. We don’t limit our collection of sites to big organisations only – as we have said before, all LGBTQ+ content is welcome, including personal content if it is published in the UK. And even though we would like to develop the areas of the collection highlighted above, we are also still happy to receive submissions around any aspects of LGBTQ+ Lives Online. So, if you know of any online content you think we should be archiving within this collection please nominate it here.

Under the Non-Print Legal Deposit Regulations 2013, the UKWA can archive UK published websites, but can only make the archived versions available to people outside the Legal Deposit Libraries’ Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library. If you’re curious about what is in the LGBTQ+ collection, you can browse through it here.
