UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


45 posts categorized "Legal deposit"

14 July 2022

Web Archiving the UEFA Women’s Euro England 2022 tournament in Northern Ireland

By Rosita Murchan, Web Archivist, Public Record Office of Northern Ireland (PRONI)

Black and white photo of Female footballer in a black and white striped shirt in motion of keeping up the ball
Thanks to the Deputy Keeper of the Records, Public Record Office of Northern Ireland and the Northern Ireland Women’s Football Association for the photo

The Public Record Office of Northern Ireland (PRONI) is the official archive of Northern Ireland and is situated in the historic Titanic Quarter in Belfast. PRONI was established by the Public Records Act (Northern Ireland) in 1923, which means that in June next year we look forward to celebrating our centenary. PRONI has been collecting websites for over ten years, focusing on Government departments, local councils and websites deemed historically or culturally important to Northern Ireland. Over the years our collection has grown in both size and scope, and we now capture one terabyte of data per year. PRONI does not have legal deposit status, so working with the UK Web Archive enables us to widen the scope of our collections and ensure that other relevant content is captured.

PRONI has a rich history of celebrating women in sport, having previously curated ‘A Level Playing field – Women in sport’, an exhibition drawn from the archives held by PRONI. With images from the late nineteenth century onwards, this exhibition reminds us that women have a long history of participation in a wide range of sporting activities. PRONI also holds the papers of the Northern Ireland Women’s Football Association, which include official minutes and documents, as well as scrapbooks, programmes, newspaper clippings and other ephemera (PRONI Reference: D4633).

We are delighted to be working in partnership once again with the British Library and adding a Northern Irish perspective to their UEFA Women’s Euro England 2022 collection.

The Northern Ireland team has defied the odds to book their place in this summer’s tournament, and PRONI’s collaboration with the British Library will enable us to capture web content documenting the progress of the players who are set to make history for Northern Ireland this summer.

We plan to select as much of the news and media coverage as we can, capturing the local views, hype and excitement around Northern Ireland’s historic qualification for the Euros. We will also capture content from the Northern Ireland women’s official home page within the IFA (NI Women's Football), detailing all fixtures, news, team profiles and updates throughout the tournament. Finally, we will include social media content about the tournament: Twitter feeds of organisations and team members, and general social media coverage of the competition.

In recent years, PRONI has developed a number of creative and digital engagement projects that put the public at the heart of archives, making archives more welcoming and inclusive. We plan to use our social media channels to put out a call to PRONI followers for site nominations, but anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nominations form: www.webarchive.org.uk/nominate


13 July 2022

Web Archiving the UEFA Women’s Euros in Scotland and Wales

By Eilidh MacGlone (National Library of Scotland) and Aled Betts (National Library of Wales) 

a blue banner image with the UK Web Archive, British Library, Inspired by England 2022 and the National football Museum. A female football player kicking a ball and the text, Can you help us preserve football history? We are collecting websites about the UEFA Women’s EURO 2022. Nominate a website for us to archive: https://www.webarchive.org.uk/en/ukwa/info/nominate

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We are collecting websites about the 2022 UEFA Women’s EURO from around the UK.

You can view the UEFA Women’s Euro England 2022 collection here:  

https://www.webarchive.org.uk/en/ukwa/collection/4278 

Although Scotland and Wales didn’t qualify for this year's tournament, football fans in both countries will be getting involved in the celebrations. In this blog post we hear about what content the National Library of Scotland and the National Library of Wales have added to the UK Web Archive collection. 

National Library of Scotland 
As Scotland supporters are aware, we won’t be competing in this year’s Euros (ah, the 95th minute!). Yet, our collecting has captured a valiant qualifying effort, through news sites and the national team's social media. Caroline Weir is one player keeping an eye on the competition, writing an online column with some collegiate support for old Man City teammates. Also evident is that the writing we are collecting describes a sport reaching larger audiences.

Teams are competing in national stadiums, a departure from the smaller arenas we found when collecting for the last Women’s World Cup. The team is taking advantage of this to share a broad message with more football fans: Captain Rachel Corsie, for example, wore pride colours on her captain’s armband at Scotland’s game with Hungary. The national team is now looking to plans for its second World Cup next year, while players are looking to a new Scottish Women's Premier League, starting in August. We will continue to preserve Scottish women's football online, recording the growing interest in the sport.

National Library of Wales 
Wales came agonisingly close to qualifying for UEFA Women's Euro England 2022, which would have been a historic moment, as it would have meant Wales reaching their first ever major tournament. Northern Ireland narrowly secured the play-off place at the expense of Wales, as their head-to-head away goal count was superior! As the Euros are being held in England, the National Library of Wales will focus on archiving sites that look at the competition from a Welsh perspective.

Women’s football in Wales has never been stronger: the national team maintain their push for qualification for next year’s World Cup, and the huge rise in domestic clubs over the last 20 years is providing opportunities to so many. This is reflected in the many websites and Twitter feeds that have been archived by the National Library of Wales. For instance, we archive the FAW website and Twitter account, the Twitter feeds of our most famous players and Premier clubs' websites, as well as delving into grassroots football by archiving Domestic League websites. We will look at adding many more sites to the rich collection that we already have.

Get involved with preserving women’s football online with the UK Web Archive
The UK Web Archive works across the six UK Legal Deposit Libraries and with other external partners to try and bridge gaps in our subject expertise. But we can’t curate the whole of the UK web on our own; we need your help to ensure that information, discussions and creative output related to the UEFA Women’s Euro England 2022 are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form.

15 June 2022

Breaking the News - News collections in the Web Archive

By Jason Webber, Web Archive Engagement Manager, British Library

The British Library is currently running the wonderful ‘Breaking the News’ exhibition. If you’ve not seen it yet, make sure you check it out. It is open until Sun 21 Aug 2022. The exhibition explores how the News has impacted and influenced our society. This exploration includes modern digital forms of news, much of which is contained in the UK Web Archive (UKWA).

Breaking The News

The ‘News’ collection in UKWA contains over 2700 news sites that we archive. The scope ranges from major national news outlets (BBC, Guardian, Daily Mail etc.) to many local and even hyper-local news websites. The collection includes one newspaper, The Independent, that ceased being a print paper to become exclusively a digital one.

The majority of these archived news sites and Twitter accounts can only be viewed in the reading rooms of UK Legal Deposit Libraries. Many, however, are openly available to view from home. Let's see some examples:

Local news
In addition to major national news outlets we collect thousands of local and hyper-local news websites. Many towns, suburbs and villages maintain a local news website and we do our best to archive them.

Brixton blog

Bristol cable

Archived website - Bristol Cable

Cranfield and Marston Vale Chronicle 

International
Whilst the focus of our collection is UK-based news, we do also collect some international or overseas publications. Tristan da Cunha, one of the remotest places on earth, maintains a news website for its residents.
Irish news - TheJournal.ie

Tristan da Cunha News 


About journalism
As well as news outlets aimed at us, the public, we also collect websites for journalists themselves.

The Bureau of Investigative Journalism

Media helping media


You can discover everything we have collected in the News collection via our website.

If you know of a UK news website (this might be about your local area), nominate it to the UK Web Archive.

09 June 2021

Alternative Sports in the UK Web Archive - Part 1

By Jason Webber, Web Archive Engagement Manager, British Library

Welcome to the UK Web Archive 'Summer of Sport' season! Over the next few months we will show the many ways that sport is represented in the web archive.

Let's start with some of the more quirky and unusual 'sports' played in the UK:

Cheese Rolling Championship

Brave competitors chase a wheel of cheese down the terrifyingly steep Cooper's Hill (1 in 1 in places) in Gloucestershire. First one to the bottom is the winner! The prize is a 7-8lb wheel of Double Gloucester cheese!

Cheese Rolling Championship website

The official Cheese Rolling Championship website in 2008.

The Chap Olympiad

What sport can there be for the well dressed 'person-about-town'? The 'Chap Olympiad' of course.

"A series of challenges ensue ranging from the frantic and frenetic to the barely mobile. The Tea Pursuit and Umbrella Jousting (where participants clamber aboard a bike holding an umbrella and a briefcase) see what is possibly the first use of Boris bikes as part of a sporting contest. The Tug Of Hair pits two teams against each other, pulling on a twenty feet long moustache until one team topples over. In Well-Dressage, individuals mount hobbyhorses and prance around to music while Not Tennis is the epitome of anti-sport with two players invited to do anything but play tennis."

Chap olympiad - Londonist website

Photos of the Chap Olympiad from the Londonist website in 2016.

Bog Snorkeling

If an athlete is not afraid of a spot of mud, what better event than the Bog Snorkeling Championships! Competitors aim to complete two consecutive lengths of a 60-yard (55 m) water-filled trench cut through a peat bog in the shortest time possible, wearing traditional snorkel, diving mask and flippers.

"Event rules state that no recognised swimming strokes are allowed at the event so it all comes down to honing down the perfect technique to power through the murky water."

Bog snorkeling - Visit Wales Blog

Bog Snorkeling on the Visit Wales website from April 2013

World Conker Championships

Threading a piece of string through a horse chestnut seed and hitting another one has been a long-standing feature of school playgrounds. Conkers, however, is a serious business and over a thousand are used in each World Championship contest!

World conker championships

Photo of the World Conker Championship 2016.

BBC News article on the World Conker Championship in 2004.

Summary
We aim to capture all aspects of UK life including the sporting life. If you have a UK sport website that you would like to suggest for the web archive, nominate it here.

#WebArchiveSummerOfSport

 

25 November 2020

LGBTQ+ Lives Online Web Archive Collection

By Steven Dryden, British Library LGBTQ+ Staff Network & Ash Green CILIP LGBTQ+ Network

As you’ll have read on this blog, the collaboration between the UK Web Archive (UKWA), the British Library and CILIP LGBTQ+ Network to develop LGBTQ+ content within the UK Web Archive was launched during summer 2020.

Rainbow tapestry

LGBTQ+ content was already part of the UK Web Archive before the collaboration began, with many sites in other collections overlapping LGBTQ+ themes: for example, Black and Asian Britain (blackgayblog.com), Gender Equality (Beyond the Binary) and Sport (Graces Cricket Club). Some sites cut across many collections, highlighting the intersectional nature of the UK Web Archive. For example, Gal-Dem features in the News Sites; Zines and Fanzines; Black and Asian Britain; Gender Equality; Women's Issues; and Unfinished Business: The Fight for Women’s Rights collections, as well as LGBTQ+ Lives Online. LGBTQ+ Lives Online, much like the lived experience of LGBTQ+ people, does not sit in isolation, disconnected from other aspects of UK offline and online life. LGBTQ+ people play a part in all aspects of the UK community, and are not solely defined by their gender or sexual orientation.

This UK Web Archive collection doesn’t stand in isolation either; it enriches the scope of work already begun at The British Library. LGBTQ Histories aims to explore the experiences and stories encountered in the collections, posing questions about the lived experience of LGBTQ+ people throughout history. The LGBTQ+ Lives Online collection of the UK Web Archive plays a part in CILIP LGBTQ+ Network’s ambition to raise the profile of LGBTQ+ people, and to support the development of LGBTQ+ information resources and the work of LGBTQ+ library, information and knowledge workers.

LGBTQ+ Lives Online Collection

UKWA 'ACT' tool

The collection currently contains over 400 sites and web pages in the main collection, with more of these being added to sub-collections every week. Many of the sites were already in the UKWA before the collaboration began, but were not linked to sub-collections. We are still at the stage where we are developing the structure of sub-collections but our initial indexes cover:

Since the launch of this collaborative project, we have been focused on a number of areas to both develop the project and to preserve sites within the collection. These include:

  • Identifying sites already in the UK Web Archive to be added to the LGBTQ+ Lives Online sub-collections.
  • Identifying new sites not already in the UKWA to be included in the collection.
  • Spreading the word about the project as widely as possible via blog posts and articles such as this; social media; emails targeting specific LGBTQ+, library, and broader diversity organisations and networks.

You can browse through the collection here, and nominate a UK published site or webpage with a focus on LGBTQ+ lives to be included in the collection via: https://www.webarchive.org.uk/en/ukwa/info/nominate. We would especially like to see more nominations that reflect the multicultural nature of UK LGBTQ+ communities and the many diaspora communities based here, including UK sites written in languages other than English.

Though it can often be challenging for us to archive social media accounts, we are able to collect LGBTQ+ Twitter accounts. We have experimented with other methods of archiving social media, but only on a selective basis. We would welcome nominations and projects that might address these challenges and how they might impact on archiving LGBTQ+ experience in the UK.

How can you access these archived websites?

UKWA search results page

Under the Non-Print Legal Deposit Regulations 2013, the UKWA can archive UK published websites, but is only able to make the archived version available to people outside the Legal Deposit Libraries' Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

Some of the websites in UKWA have already had permission granted; these include Out Stories Bristol, Trans Ageing and Care, Bi Cymru/Wales and Queer Zine Library. As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view content. If there is no message underneath then the archived version of the website should be available on your personal device.
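The access rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ArchivedSite` class and its `owner_permission_granted` field are hypothetical names invented for this example and do not describe how UKWA actually stores or checks permissions.

```python
from dataclasses import dataclass

# Message quoted in the post for sites that can only be viewed on site.
PREMISES_ONLY_MESSAGE = "Viewable only on Library premises"


@dataclass
class ArchivedSite:
    """Hypothetical record for an archived website (not a real UKWA data model)."""
    title: str
    owner_permission_granted: bool  # assumed flag: has the owner allowed open access?


def access_message(site: ArchivedSite) -> str:
    """Return the note shown under a site's title, or an empty string if openly viewable."""
    return "" if site.owner_permission_granted else PREMISES_ONLY_MESSAGE


# Out Stories Bristol is named in the post as having permission granted.
open_site = ArchivedSite("Out Stories Bristol", owner_permission_granted=True)
# A made-up site without permission, for contrast.
restricted_site = ArchivedSite("Example restricted site", owner_permission_granted=False)

print(repr(access_message(open_site)))
print(repr(access_message(restricted_site)))
```

An empty message here stands for the "no message underneath" case, where the archived version should be viewable on a personal device.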

Due to the coronavirus pandemic, the reading rooms were closed for a number of weeks but are starting to reopen. This blog post gives an overview of opening hours and how to book a visit at the six UK Legal Deposit Libraries:

https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html 

Previous blog posts about the project can be viewed via the following links.

LGBTQ+ Lives Online project introduction

LGBTQ+ Lives Online: Introducing the Lead Curators

 

18 November 2020

2020 Domain Crawl Update

By Andy Jackson, Web Archiving Technical Lead at the British Library

 

On the 10th of September the 2020 Domain Crawl got underway. The annual Domain Crawl usually takes about three months to complete. It visits UK published websites on a UK Top Level Domain (TLD) like .uk, .cymru, .scot, .london etc., any web content hosted on a server registered in the UK, as well as all the records manually created by the UK Web Archive teams across the UK Legal Deposit Libraries.
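To make the TLD part of that scope concrete, here is a minimal Python sketch of a TLD check. It is an illustration only, not the crawler's actual scoping logic: the function name and the `EXAMPLE_UK_TLDS` set are assumptions, the set contains just the four TLDs named above (the "etc." means the real list is longer), and the other two routes into scope (UK-hosted servers and manually created records) are not modelled at all.

```python
from urllib.parse import urlsplit

# Only the TLDs named in the post; the real list is longer ("etc.").
EXAMPLE_UK_TLDS = {"uk", "cymru", "scot", "london"}


def on_example_uk_tld(url: str) -> bool:
    """Return True if the URL's hostname ends in one of the example UK TLDs."""
    host = urlsplit(url).hostname  # lower-cased by urlsplit; None if there is no host
    if not host:
        return False
    # Take the last dot-separated label, ignoring a trailing root dot.
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return tld in EXAMPLE_UK_TLDS


print(on_example_uk_tld("https://www.bl.uk/"))        # a .uk hostname
print(on_example_uk_tld("https://example.scot/news"))  # made-up .scot hostname
print(on_example_uk_tld("https://example.com/"))       # not on a listed TLD
```

A site on a .com address can still be in scope through the other two routes, so a `False` result from this check alone does not mean a site is excluded.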

 

Update on crawl management

Due to the billions of URLs involved, the Domain Crawl is the most technically difficult crawl we run. As the crawl frontier grows and grows, the strain starts to show, particularly on the disk space required to store all of the status information about the URLs that have been crawled or are awaiting crawling. Worst of all, we found some mysterious problems with how Heritrix3 manages this information, which meant that we could not safely stop and restart long crawls. We could usually restart once, but if we restarted again strange errors would appear, and sometimes these would be serious enough to cause the whole crawl to fail. Fortunately, in the last year, we finally tracked this down and updated the Heritrix3 crawler so that it can be safely stopped and restarted multiple times.

This has made managing the crawler much easier, as we can stop and restart the crawl with confidence if we need to change the software or hardware setup. This makes managing things like disk space much less stressful.

 

Update on the crawl performance 

In the initial phase of the crawl, we threw in the roughly 11 million web hostnames that we have seen in past crawls, which then got whittled down to about 7 million active hosts. After this bumpy start and some system tuning, the crawl settled down and has been pretty consistently processing 250-300 URLs per second.  This is acceptable, but isn’t quite as fast as we would like, so we are analysing the crawl while it runs to try and work out where the bottlenecks are.

 

What we have collected so far

The figure below shows the URLs collected over time.

 

Graph illustrating the number of URLs downloaded in the 2020 Domain Crawl

 

The rather jagged start shows where we were able to stop and start the crawl in order to tune the initial hardware setup, and the flatter ‘pauses’ later on are from other maintenance activities like growing the available disk space. The advantage of being able to re-tune the crawler as we go is shown by the way the line gets steeper over time, corresponding to the increased crawl rate.

 

In terms of bytes downloaded, we see a similar result:

Graph illustrating the number of TBs downloaded in the 2020 Domain Crawl

 

As you can see, we are rapidly approaching 90TB of downloaded data, which corresponds to roughly 50TB of compressed WARC.gz data.

Despite starting the crawl relatively late in the year (due to issues around the COVID-19 outbreak), we are making good and stable progress and are on track to download over two billion URLs by the end of the year.
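As a rough sanity check on the figures quoted in this post, the arithmetic can be sketched in Python. This is a back-of-the-envelope illustration only: it assumes a constant crawl rate, which the graphs above show was not the case early on, and the function names are invented for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def urls_per_day(urls_per_second: float) -> float:
    """Convert a steady crawl rate in URLs per second to URLs per day."""
    return urls_per_second * SECONDS_PER_DAY


def days_to_reach(target_urls: float, urls_per_second: float) -> float:
    """Days needed to download target_urls at a constant rate (a simplification)."""
    return target_urls / urls_per_day(urls_per_second)


low_rate, high_rate = 250, 300   # URLs per second, as quoted in the post
target = 2_000_000_000           # "over two billion URLs"

print(f"{urls_per_day(low_rate):,.0f} to {urls_per_day(high_rate):,.0f} URLs per day")
print(f"{days_to_reach(target, high_rate):.0f} to {days_to_reach(target, low_rate):.0f} days")

# Share of the downloaded volume that remains after compression (50TB of 90TB).
compressed_fraction = 50 / 90
print(f"compressed size is about {compressed_fraction:.0%} of the downloaded bytes")
```

At a steady 250-300 URLs per second this works out to roughly 21.6 to 25.9 million URLs per day, so two billion URLs would take somewhere between 77 and 93 days, which is in line with the "about three months" a Domain Crawl usually takes.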

 

Follow the UK Web Archive on Twitter for the latest updates on the Domain Crawl and other web archiving activities! 

 

04 November 2020

Curating culturally themed collections online: The Russia in the UK Collection, UK Web Archive

By Hannah Connell, Collaborative PhD Student, King’s College London; British Library

Title slide from Hannah's presentation with a London Underground map in Russian

 

I spoke about my position as a curator for the Russia in the UK curated collection as part of the recent Engaging with Web Archives conference (EWA), which was held online from the 21st-22nd of September 2020. This conference reflected the breadth of the web archiving community, bringing together speakers from researchers to librarians, as well as curators and web archiving teams from many different countries.

As always, it was inspiring to participate in such a welcoming event. Even online, the conference retained the collaborative atmosphere which has marked my experience of research in web archiving, allowing new researchers to interact with more experienced practitioners and encouraging questions and conversations between researchers, users and archivists.

The researcher-curated collection, Russia in the UK, is part of the UK Web Archive (UKWA). I was particularly pleased to have had the opportunity to present this curated collection, a resource on the Russian-speaking community in the UK, which was first started in November 2017. Such collections play an important role in making the wide range of material preserved in the UKWA more visible to researchers.

Curators are important to the preservation work of the UKWA. Curated collections are collected manually by curators and researchers with specialist knowledge in their field. The role of a curator in creating a UKWA collection involves identifying relevant websites to be included in a collection, and recording the metadata for these websites, including the translation and transliteration of titles and descriptions in other languages.

This collection is valuable both as a resource for further research, and as a means of questioning research practices. It is not possible to capture everything on the web, and collection curators ensure that a representative sample of websites is selected for each thematic collection. The practice of creating and maintaining a collection such as Russia in the UK ultimately influences the shape of the collection and the online representation of the diasporic community it will come to reflect. As such, it is important for researchers and users to understand the decisions taken by curators in selecting and capturing websites.

My paper for EWA focused on the creation of a curation guide for curators of new curated collections. This draws on the ongoing process of curating the Russia in the UK collection, documenting both the provenance of this special collection and reflecting on this process as a model for future collections.

In documenting the creation of this collection, I hope to enable future researchers to explore and contribute to this record of the online activity of the Russian diaspora in the UK, and to question and develop the curatorial and research practices behind the curation of collections.

You can watch Hannah Connell’s presentation on the EWA YouTube channel.

 

03 November 2020

LGBTQ+ Lives Online: Introducing the Lead Curators

By Steven Dryden, British Library LGBTQ+ Staff Network & Ash Green CILIP LGBTQ+ Network

In July 2020 the British Library, the UK Web Archive and CILIP LGBTQ+ Network relaunched the LGBTQ+ Lives Online web archive collection. We have received many nominations for new sites to be collected by the UK Web Archive and work has begun to re-tag many of the websites that have been collected since the UK Web Archive began collecting the UK web in 2005.

To mark two months since the project began, LGBTQ+ Lives Online leads Steven Dryden, of the British Library, and Ash Green, of CILIP LGBTQ+ Network write about the relevance of the World Wide Web to them as members of the LGBTQ+ community, and some of their collection highlights:

 

Steven he/him/his

Steven Dryden

I first encountered the internet in Las Vegas. It was the summer of 1998, I was 17 and my family had migrated from Newcastle Upon Tyne to the western world’s party play pit in the Nevada desert. My friend, Lilian, was talking to someone in New York City about the band Depeche Mode through America Online (AOL).

Chat rooms were online spaces that allowed groups of people to join anonymously, with the option to talk and interact within a group or in private. Chat rooms quickly became pivotal for my small cohort of friends and me, the oddballs who didn’t quite fit, as we were forming our identities in those formative late teen years and trying to find our place in the world.

Later the same year, on October 12, 1998, Matthew Shepard would die. A gay student at the University of Wyoming, Shepard was beaten, tortured, and left to die near Laramie on the night of October 6, 1998. AOL chatrooms formed the major part of how I found out about Shepard and worked through my feelings about his murder, which was the first news story that I followed online.

The protections for, and general understanding of, the lesbian, gay, bisexual and transgender community have undergone radical change in the 22 years since I first encountered the internet. I’m interested to see what survives online of the change in language relating to the community, and what evidence remains in the UK Web Archive of the online discussion. Some websites that interest me in these first months of the project include:

  • The Campaign for Homosexual Equality: an organisation which led the way to legal reform in the UK, following the passing of the Sexual Offences Act 1967, which partially decriminalised homosexuality in England and Wales.

https://www.webarchive.org.uk/wayback/en/archive/20130505124828/http://www.c-h-e.org.uk/

  • Around the Toilet: a community engaged art project exploring the accessibility and culture of toilets for the LGBTQ+ community

https://www.webarchive.org.uk/wayback/en/archive/20180606164959/https://aroundthetoilet.wordpress.com/

  • Asexual Visibility and Education Network: founded in 2001 with two distinct goals: creating public acceptance and discussion of asexuality and facilitating the growth of an asexual community

https://www.webarchive.org.uk/wayback/en/archive/20150226230020/http://www.asexuality.org/home/

 

Ash (they/them)

Ash Green

When I was studying for my BA Information and Library Management degree in the early 1990s, the internet and World Wide Web weren’t as high profile as they are now. I loved tech back then, and was into programming and creating databases as part of the degree. But I didn’t really understand what the lecturers were talking about when they mentioned the internet. At the time I had no idea how important it would be to my coming out just over 20 years later, and what a positive impact it would have.

Thinking about the lead up to my coming out in 2017, without access to sites and forums related to trans/gender non-conforming lives in particular, I doubt I would have come out at all. But when I decided to look for guidance online, I found a huge amount of information that was overwhelming at first, but eventually this helped me understand where I fitted into the world. They included medical sites; statements from WHO and other health organisations highlighting that being trans wasn’t a mental health issue; personal blogs and forums, talking about experiences and a variety of perspectives on what it means to be trans; finding out about non-binary, genderfluid, and genderqueer people's experiences (I had no idea what these words meant); LGBTQ+ events; makeup and style tips; sites for face-to-face support groups and meetups, and sites for exhibitions such as the Museum of Transology and the Transworkers photography exhibition, which helped me understand that being trans is much broader than mainstream media would have the world believe.

Many sites were useful, but at the same time I came across quite a few that were more "Yes, this miracle herbal treatment really does change your hormones", and "You're only valid if you fit into trans box X or Y" that put my critical, digital literacy and research experience into practice. I also found supportive friends and allies, and I was able to share useful sites and sources of information I’d discovered to give them a better understanding of my experience. It’s important that these sites should be a part of the UK Web Archive LGBTQ+ Lives Online collection. Not only because they have a relevance to the UK Web Archive in general, but from a personal perspective I feel that if they had such an impact on helping me find where I fit into the world, how many other people have they also had a similar positive impact upon?

The sites I’ve chosen below from the UK Web Archive have all had a personal impact on me.

  • Museum of Transology: The UK’s most significant collection of objects representing trans, non-binary and intersex people’s lives. 

https://www.webarchive.org.uk/wayback/en/archive/20201003091027/https://www.museumoftransology.com/

  • OutStories Bristol: Collecting and preserving the social history and recollections of LGBT+ people living in or associated with Bristol, England.

https://www.webarchive.org.uk/wayback/en/archive/10000101000000/https://outstoriesbristol.org.uk/

  • Outline Surrey: Outline provides support to people with their sexuality and gender identity, including but not limited to the lesbian, gay, bi-sexual and trans community of Surrey, primarily through a helpline, website and support groups.

https://www.webarchive.org.uk/wayback/en/archive/20160107134238/http://www.outlinesurrey.org/
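The archived links in this post share a visible pattern: a UK Web Archive prefix, a 14-digit capture timestamp, then the original address. A minimal Python sketch for pulling those two parts out of a link is shown below. It is an illustration only: the function name and the regular expression are assumptions based on the links in this post, not an official UKWA API, and the timestamp is read with the usual Wayback-style YYYYMMDDhhmmss layout.

```python
import re
from datetime import datetime

# Pattern inferred from the archived links quoted in this post (an assumption).
ARCHIVED_LINK = re.compile(
    r"^https://www\.webarchive\.org\.uk/wayback/en/archive/(\d{14})/(.+)$"
)


def parse_archived_link(link: str) -> tuple[datetime, str]:
    """Split an archived link into its capture time and the original URL."""
    match = ARCHIVED_LINK.match(link)
    if match is None:
        raise ValueError(f"not a recognised archived link: {link}")
    timestamp, original_url = match.groups()
    # 14 digits: year, month, day, hour, minute, second (returned as a naive datetime).
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


captured, original = parse_archived_link(
    "https://www.webarchive.org.uk/wayback/en/archive/"
    "20130505124828/http://www.c-h-e.org.uk/"
)
print(captured.isoformat(), original)
```

For the Campaign for Homosexual Equality link quoted earlier, this gives a capture time of 5 May 2013 at 12:48:28 and the original address http://www.c-h-e.org.uk/.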

 

Get involved with preserving UK LGBTQ+ Lives Online with the UK Web Archive

We can’t curate the whole of the UK web on our own; we need your help to ensure that information, discussions, personal experiences and creative outputs related to the LGBTQ+ community are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nominations form:

https://www.webarchive.org.uk/en/ukwa/nominate

 
