UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

03 November 2022

Calling All Digital Preservers!

By Andy Jackson, Web Archive Digital Lead, British Library

The digital preservation community is small and under-resourced. This means we must work together if we want to make the biggest impact. To this end, a small group of us have been attempting to help the members of the digital preservation community better support each other. As it is World Digital Preservation Day (https://www.dpconline.org/events/world-digital-preservation-day), we'd like to encourage you all to (re)discover what we've built so far:

If you'd like to help, we'd love to hear from you...

  • What have we missed from the Awesome List?
  • Can you answer any of the unanswered DigiPres questions? Do you need to ask questions of your own? Are there old questions and answers on mailing lists that need a more visible home, so others can find them again?
  • Can you contribute to the COPTR Tool Registry?
  • Are these resources useful? Should we change our approach?

The last one is really important. We've been in digital preservation long enough to see a lot of portals and projects come and go, and we recognise that making it possible to build on past work sometimes requires changing what we've built so far.

Please get in touch if you have any questions. You could talk to us directly via Twitter or Mastodon (e.g. https://digipres.club/), or use the digipres.org discussion forums. We're happy to hear any and all ideas!

In particular, in the last few weeks, the digipres.org homepage has been modified and the Awesome List has been set up, based on community feedback (https://github.com/orgs/digipres/discussions/34). Now would be a great time to get some feedback on what we've been doing!

Thanks for reading, and thanks to everyone who has contributed so far.

Andy Jackson (@anjacks0n/@anj@digipres.club) & Paul Wheatley (@prwheatley), on behalf of all the digipres.org contributors.

With thanks to the Open Preservation Foundation for hosting many of these resources, and to the Digital Preservation Coalition for their support.

18 October 2022

UK Web Archive Technical Update - Autumn 2022

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the update at the start of the summer.

Website Refresh
On 16 August 2022 we relaunched the UK Web Archive website, although you might not have noticed!

The previous version of the website treated page content like it was software, so updating what the pages said was far too difficult. This quarter, we finally got to release some changes we’d made so that most of the website pages are statically generated from Markdown source held on GitHub, using Hugo. This means we could add in a content management system called NetlifyCMS, which should make editing and translating the pages of our site much easier.

We’ve taken care to match the old website presentation and carefully overlay the new system while falling back on the old system for more complex dynamic pages. You might notice some minor differences to the styling between the two, if you look closely…

An important part of this was our automated accessibility testing. While accessibility evaluation cannot be fully automated, these tools help us manage the process of making changes to our website and minimise the risks of making things worse in time periods between full accessibility evaluations.
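As a rough illustration of the kind of check that can be automated, here is a minimal Python sketch that flags images with no `alt` attribute. It is not the tooling used for the UK Web Archive website (the post does not name it); the class and function names are hypothetical, and real accessibility tools test far more than this one rule.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attributes = dict(attrs)
        # An empty alt="" is allowed: it marks an image as decorative.
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src", "<no src>"))


def find_images_missing_alt(html):
    """Return the src values of images that lack an alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.missing
```

Running a check like this against every generated page on each change is one way to catch regressions between full, manual accessibility evaluations.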

2022 Domain Crawl Launch
As the British Library networks are in the final stages of being upgraded, 2022 is the last year we expect to run the domain crawl on Amazon Web Services.

We launched the 2022 crawl on 17 August 2022, and since the British Library is now a member of Nominet, we were able to use an up-to-date list of UK domains as our starting point.

So far, we’ve processed roughly 500 million URLs, totalling over 20 TiB of data (uncompressed).

However, we’ve noticed what seems to be an uptick in systems like fail2ban automatically mis-reporting our crawler activity as abusive behaviour. This means we have to put more work into managing our relationship with AWS, and has slowed things down a bit. Nevertheless, we expect the crawl to run successfully until the end of the year, as in previous years.

Hadoop Replication
After many weeks of steady progress, our replica Hadoop storage service is now pretty much at capacity. Filling the thing up with about one petabyte of content took a while, but it’s been taking us a bit longer to be sure we’ve double-checked the transfer worked.
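To give a sense of what double-checking a transfer can involve, here is a minimal Python sketch that compares checksum manifests for two copies of a directory tree. It is an assumption-laden illustration, not the procedure used for the Hadoop replica: it works on ordinary local files rather than HDFS, and the function names and the choice of SHA-512 are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-512 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha512_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(source, replica):
    """Return (missing, mismatched) file lists, from the source's point of view."""
    missing = sorted(set(source) - set(replica))
    mismatched = sorted(
        name for name in source
        if name in replica and source[name] != replica[name]
    )
    return missing, mismatched
```

A transfer would count as verified only when both lists come back empty. At petabyte scale, reading every byte twice is itself slow, which is one plausible reason the checking takes longer than the copy.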

We are now awaiting a decision on whether we can purchase another server for this cluster, so we can make sure there’s room for the most recent crawls, and for content we expect to get in the near future. Either way, we’ll then start to plan shifting the hardware up to the National Library of Scotland.

Exporting Collection Metadata
Working with the Archives of Tomorrow project, we’ve been developing a way to export our collection metadata so it’s more suitable for reuse.

Having real use cases drive the work has been useful, and over the next weeks we’re hoping to integrate the outputs into the UKWA API so anyone can use that data.

Legal Deposit Access & NPLD Player
Working with Webrecorder, we’ve seen some good progress on a new version of PyWB that supports direct rendering of PDFs and ePubs, and on the secure player application that will be used to provide access in some reading rooms.

Much of the work has focussed on the challenges around testing and preparation for a new version of a service that works across multiple independent institutions. But it’s been good to start to get some user feedback on how the system works in practice, which has already flushed out some additional requirements for the first release.

iPres 2022
As covered in this dedicated blog post, iPres 2022 included a presentation partly based on lessons learned from managing the technical aspects of the UK Web Archive. The plan is to publish a longer version of that work later in the year.

Major Outage
After the successes of the iPres conference, we were quickly brought back down to earth by a severe hardware failure on the 25th of September. One of the network switches failed, and the whole UKWA dedicated network locked up in a way that made it difficult to understand and route around the failure.

This took a while to diagnose and resolve, so we moved some critical components onto other machines so our curators and users could use our services. While this was relatively successful, it also showed that some of our automated tasks need breaking down so that different functions can be managed independently. For example, we need crawl launches to be able to proceed even if nothing else is running. These problems meant that our daily crawling activity was delayed and patchy for most of last week.

These complications mean it’s taken a bit longer than expected to undo all the interim changes that were made during the hardware outage. However, as of last week, everything is back to normal.

07 October 2022

The UEFA Women’s EURO 2022 Arts and Heritage Programme

By Caterina Loriggio, UEFA Women’s EURO Arts and Heritage Lead

Jan Lyons (Manchester Corinthians) and Gail Redston (Manchester City) looking at the 1921 Ban. Part of Trafford's heritage programme. Photo by Rachel Adams for UEFA WEURO 2022 heritage programme

The UK Web Archive has been collaborating with the UEFA Women’s EURO 2022 Arts and Heritage Programme to develop the UEFA Women's Euro England 2022 web archive collection. In this guest blog post, we hear about the wider arts and heritage programme around the tournament from Caterina Loriggio.

The UEFA Women’s EURO 2022 arts and heritage programme was designed to promote community engagement, develop cultural leadership, support health and wellbeing, reinforce civic pride and support local economies post-pandemic. Host City partners (Rotherham, Sheffield, Trafford, Wigan, Manchester, Milton Keynes, Brent, Hounslow, Brighton, and Southampton) were all keen to amplify the opportunity the tournament provided to engage and inspire their residents and visitors.

The £3m programme was supported by National Lottery players through Arts Council England and National Lottery Heritage Fund grants and through funding from the Host Cities. It included four arts commissions, eight museum/archive exhibitions, eight outdoor exhibitions, heritage outreach and education programmes, 45 memory films and new online content covering the history of the women’s game. The project also researched for the first time the full line-up of all the women who have played for England over the past 50 years. Many of those women will be honoured at Wembley Stadium on 7 October in front of a sell-out crowd, when they will take a lap of honour during half time in the England v USA match.

It was the first time The FA had ever delivered a cultural programme. A key priority for The FA is to establish female role models for both girls and boys. When Host City partners requested a cultural programme to support the tournament, the Association saw that this could be a great opportunity to further fulfil this objective. It was also clear that partnering with cultural organisations in Host Cities, and national institutions such as the UK Web Archive and British Library, would be a great way to promote the UK’s cultural sector and a very effective tool to capture, for the first time on a national scale, the hidden history of women’s football.

Prior to writing funding applications, I led, with the support of the Football Supporters’ Association, four online fan consultations to ensure the programme spoke to the wants of women’s football fans. We also commissioned the organisation ‘64 Million Artists’ to lead half-term virtual workshops for young people aged 12–18 in Host Cities (many of whom played football). The fans’ and young people’s feedback was shared with artists, archivists and curators and was clearly reflected in all elements of the programme. The fans were clear that they could ‘never get enough history’.

Archives and contemporary collecting played an important part in the heritage programme. It was apparent many stories of women’s football (fans as well as players) had been lost already and that women who had played during the ban (1921-1970) were of an age that if we did not collect their stories now, then there was a real risk that they might never be captured. As well as collecting physical objects for museums and archives like caps, pennants, and programmes, there was a significant degree of online archiving. Many of the Host Cities created online exhibitions, hosted films and imagery on digital archive platforms, and digitally captured objects which retired footballers were happy to loan but not donate. Nationally, we made 36 memory films live on The FA website. These will be moved to EnglandFootball.com in time for the 50th Anniversary of the Lionesses in November, plus there will be some new content made especially for the anniversary. We were greatly supported in our programme by The National Football Museum and Getty Images, who gave us access to their photography archives, which greatly enriched all our work. We also sought to create content for the future by commissioning Getty photographers and by running fan and young people’s photography campaigns to capture the atmosphere of match day and the fan experience beyond the pitch. Some of these images will be shared in an online Getty Images Gallery to be launched in November.

It is hoped that the learnings from this programme will help to secure cultural content in future UK bids for major sporting events. I hope that archiving and collecting will remain important components in all these future projects.

Related Links
This is the ninth blog post published so far about the women’s Euros; the others can be found on the UK Web Archive blog under the 'sports' tag.

There is still an active call for nominations for the UEFA Women's Euro England 2022 web archive collection. Anyone can suggest UK-published websites to be included in the archive by filling in our nomination form.