UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

26 October 2020

The 1916 Easter Rising Web Archive

By Brendan Power, Digital Preservation Librarian, Library of Trinity College Dublin

Logos of the three Legal Deposit Libraries involved in the collaboration: Bodleian Libraries, Trinity College Dublin and the British Library

At the recent conference, ‘Engaging with Web Archives: Opportunities, Challenges and Potentialities’, I presented a paper on a collaborative project between three institutions: The Library of Trinity College Dublin, the University of Dublin; the Bodleian Libraries, the University of Oxford; and the British Library. The project was carried out in 2015/16 and aimed to identify, collect, and preserve online resources related to the 1916 Easter Rising and the diverse ways it was commemorated and engaged with throughout its centenary in 2016. The Bodleian Libraries primarily collected UK websites under the provisions of the 2013 Non-Print Legal Deposit Regulations (NPLD), while The Library of Trinity College Dublin focused on websites in the .ie domain. Since no legislation exists in the Republic of Ireland to ensure that the .ie domain is preserved, websites within the .ie domain were collected on a voluntary basis, that is, with the express formal permission of the website owners through the signing of a license agreement.

 

We aimed to reflect the variety of ways that the Irish and British states, cultural and educational institutions, as well as communities and individuals, approached the centenary events. The collection includes official commemorative websites, the websites of museums, archives, heritage, cultural, and education institutions, along with traditional and alternative news media websites, blogs, and community websites. These resources will be invaluable primary sources for analysing how people interpreted and engaged with the Easter Rising in its centenary year. Researchers have reflected on the events organised on the fiftieth anniversary of the Easter Rising in 1966 and how these events were framed, the aspects that were championed, and the critical viewpoints that were denied expression. In a similar way, the records created throughout the centenary will be an essential resource for researchers in analysing how the generations of 2016 engaged with the legacy of the Easter Rising and the approaches, themes, and tone adopted.

 

The resulting web archive collection contains 318 seeds, i.e. websites or sub-sections of these. Of these 318 websites, 112 (35%) were selected by The Library of Trinity College Dublin, 190 (60%) by the Bodleian Libraries, and 16 (5%) by curators at the British Library. 118 (37%) of the websites were from the .ie domain, 172 (54%) were from the .uk domain and 28 (9%) were associated with other areas, predominantly the USA. For all websites outside the UK (146), formal permission was sought from the website owners, resulting in 61 licenses to archive and make the archived copies publicly available. We received no response from 83 website owners. A further two organisations agreed in principle to inclusion in the web archive but were not in a position to sign the license agreement required to allow us to archive the website, as they could not affirm that they controlled the copyright of all the content that was to be archived. This meant an overall permissions rate of 42%, with the rate for websites in the .ie domain being even higher, at 51%.
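For readers who want to check the arithmetic, the short Python sketch below recomputes the rounded percentages from the counts quoted above. The variable and function names are illustrative only. The 51% rate for the .ie domain is not recomputed, because the number of .ie licences is not given in this post.

```python
# Counts taken from the paragraph above; names are illustrative only.
total_seeds = 318
selected_by = {"Trinity College Dublin": 112, "Bodleian Libraries": 190, "British Library": 16}
by_domain = {".ie": 118, ".uk": 172, "other": 28}


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


selection_shares = {name: share(n, total_seeds) for name, n in selected_by.items()}
domain_shares = {name: share(n, total_seeds) for name, n in by_domain.items()}

# Websites outside the UK are the .ie sites plus those from other areas.
non_uk = by_domain[".ie"] + by_domain["other"]
licences, no_response, unable_to_sign = 61, 83, 2
permissions_rate = share(licences, non_uk)

print(selection_shares)
print(domain_shares)
print(non_uk, permissions_rate)
```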

 

Since the project was completed there have been many helpful reminders of the impact that such work has. One came from an organisation that had created a website dedicated to an Easter Rising project, which was no longer live on the web. The person who had been responsible for the website had left the organisation and their replacement had no access to the materials that had been on it. They discovered an e-mail I had sent in 2016 inviting the organisation to participate in the web archive. Once they contacted me, I was able to direct them to the UK Web Archive and, as the organisation had signed the license agreement, they were able to access the archived website immediately from their office. This access saved them the time and staff resources that would otherwise have been spent recreating some of the resources available on the archived website. It serves as an example of what embedding sustainability into a project can save in terms of time and staff resources, and demonstrates the positive economic impact that organisations can derive from participating in cultural heritage initiatives such as web archives.

 

The co-curators of this collection have also published a paper on the collection in the academic journal Internet Histories, titled ‘Capturing commemoration: the 1916 Easter Rising web archive project’.

You can watch Brendan Power’s presentation on the EWA YouTube Channel.

 

21 October 2020

The UK Web Archive and Wimbledon: A Winning Combination

By Robert McNicol, Kenneth Ritchie Wimbledon Library, Wimbledon Lawn Tennis Museum

 

Wimbledon Lawn Tennis Museum Logo

 

Opened in 1977, the Kenneth Ritchie Wimbledon Library, part of the Wimbledon Lawn Tennis Museum, is the most comprehensive collection of tennis publications in the world. We hold books, periodicals, programmes and other publications from more than 90 different countries.

As with everything at Wimbledon, we are always looking for ways to evolve and improve how we do things. That’s why we were delighted to team up with the UK Web Archive to put together a curated collection of tennis websites. The Tennis collection sits within the ‘Ball Sports Excluding Football’ section of the UK Web Archive Sports Collection.

So far, we have added over 70 sites to the Tennis collection but ultimately the aim is to archive all UK-based tennis websites. This includes websites of tennis clubs, governing bodies and media, as well as the websites and social media feeds of individual players. We have already added the Twitter feeds of all world-ranked British players to the collection.

Social media archiving is an area we are particularly interested in and we have been experimenting with using Webrecorder to archive social media feeds to a level not possible on the UK Web Archive. We have recently conducted several trials, using both the manual and auto-pilot functions of Webrecorder to archive the Wimbledon Twitter and Instagram feeds. We have had mixed results from these pilot projects and would be interested in comparing notes with any other organisations that have used Webrecorder to perform social media archiving.

As well as social media feeds, we have been using Webrecorder to archive our own website, Wimbledon.com, which, as a particularly dynamic website, the UK Web Archive struggles to capture fully. Wimbledon.com is this year celebrating its 25th anniversary and by archiving it regularly we will be able to save the information contained in it for researchers of future generations. In the same way, we have also been trialling the archiving of our AELTC Intranet site, Wimbledon Insider.

We’ve greatly enjoyed our collaboration with the UK Web Archive so far and are very grateful for the web archiving advice that they have provided. We hope that our tennis expertise has also been of benefit to the UK Web Archive and the British Library. We look forward to working together for many years to come.

If you would like to nominate a tennis website to be archived, please fill in the public nomination form on the UK Web Archive website or get in touch with me at rmcn@aeltc.com. We’d love to hear from you.

You can watch Robert McNicol’s presentation on the EWA YouTube Channel.

 

19 October 2020

Exploring media events with Shine

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

Computer screen with some HTML code on the screen

This blog post is a summary of the presentation I delivered with my colleague Daniela Major at the conference ‘Engaging with Web Archives: Opportunities, Challenges and Potentialities’ in September 2020. The presentation was entitled ‘Tracking and analysing media events through web archives’.

My research explores the media coverage of the Olympic Games from a cross-cultural, cross-lingual and temporal perspective. I am especially interested in comparing how the concept of 'Olympic legacy' has been approached by the Brazilian and British media, considering different locations, languages and socio-political contexts. I wrote a bit about this on the UK Web Archive blog in December 2019 and March 2020.

Because of its controversial nature, the term Olympic legacy is used in a variety of contexts and has multiple meanings. Given its narrative importance in legitimising the multi-billion investment that cities make to host these events, the main objective of this study is to explore and define the concept of Olympic legacy and how it changes over time.

Here, however, I will focus on my experience of a secondment at the British Library with the UK Web Archive team, during which I explored the potential of using the platform Shine to track news articles on Olympic legacy.

Why Shine?

Shine is a tool to explore .uk websites archived by the Internet Archive between 1996 and April 2013. While a large part of the content of the UK Web Archive can only be accessed from inside the British Library, Shine is open access and provides us with search results and URL data that are easier to manage.

We have developed a pipeline based on five steps: searching, extraction, cleaning, filtering and visualisation. To extract information, we scraped the data using Python notebooks, looking at specific newspapers (like The Guardian) and broadcast websites (like the BBC) and using the keyword “Olympic legacy”. Once we had searched for URLs in Shine and extracted the results, the main challenge was cleaning. Cleaning consists of removing all the information around the main text, such as images, adverts, menus and links. After extracting just the body text of the articles, we saw that many of them did not mention Olympic legacy: Shine's results frequently include pages where the searched words appear in peripheral locations of the webpage rather than in the article itself.

With the documents we needed in hand, we had to verify whether their content was relevant to our analysis. Sometimes the term Olympic legacy appears but is not related to the Rio and London Olympics, or is not the main topic of the article. This filtering demanded a huge effort of close reading to identify contexts. Finally, we produced some charts to visualise word trends and the topics that appear around legacy. Although the Shine search results are limited in time, extending only to 2013, Shine has been very useful as an exploratory tool for conducting small-scale preliminary analysis, and for building web archive and web scraping methods before applying them to much larger amounts of text elsewhere.
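To make the cleaning and filtering steps more concrete, here is a minimal Python sketch using only the standard library. It is not the code from our notebooks: the class and function names, the list of tags treated as peripheral, and the two sample pages are all assumptions made for illustration. It also assumes that paragraphs are marked up with properly closed `<p>` tags, which is not always true of archived pages.

```python
from html.parser import HTMLParser

# Tags whose content is treated as peripheral (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class BodyTextExtractor(HTMLParser):
    """Collect text inside <p> elements, ignoring peripheral page regions."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a peripheral element
        self.p_depth = 0     # greater than 0 while inside a paragraph
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p":
            self.p_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.p_depth > 0:
            self.p_depth -= 1

    def handle_data(self, data):
        if self.p_depth > 0 and self.skip_depth == 0:
            self.chunks.append(data)


def clean(html):
    """Cleaning step: keep only paragraph text and normalise whitespace."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def is_relevant(text, keyword="olympic legacy"):
    """Filtering step (first pass only): does the body text mention the keyword?"""
    return keyword in text.lower()


# Hypothetical pages standing in for extracted search results.
sample_pages = {
    "article-a": "<html><body><nav><a href='/'>Olympic legacy</a></nav>"
                 "<p>Match report from the weekend.</p></body></html>",
    "article-b": "<html><body><nav>Home</nav>"
                 "<p>The Olympic legacy of London 2012 is debated.</p>"
                 "<footer>Contact</footer></body></html>",
}

cleaned = {name: clean(html) for name, html in sample_pages.items()}
relevant = [name for name, text in cleaned.items() if is_relevant(text)]
print(relevant)
```

In this sketch the first page matches the search term only in its navigation menu, so it is dropped once the body text has been isolated. A keyword check like this can only narrow the set of candidates; deciding whether Olympic legacy is really the main topic of an article still requires close reading.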

You can watch Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major’s presentation on the EWA YouTube Channel.

*This project has received funding from the European Union’s Horizon 2020 research and innovation programme. For more information: cleopatra-project.eu.