THE BRITISH LIBRARY

UK Web Archive blog

10 posts categorized "Humanities"

04 November 2020

Curating culturally themed collections online: The Russia in the UK Collection, UK Web Archive

By Hannah Connell, Collaborative PhD Student, King’s College London; British Library

Title slide from Hannah's presentation with a London Underground map in Russian

 

I spoke about my position as a curator for the Russia in the UK curated collection as part of the recent Engaging with Web Archives conference (EWA), which was held online on the 21st and 22nd of September 2020. This conference reflected the breadth of the web archiving community, bringing together speakers ranging from researchers and librarians to curators and web archiving teams from many different countries.

As always, it was inspiring to participate in such a welcoming event. Even online, the conference retained the collaborative atmosphere which has marked my experience of research in web archiving, allowing new researchers to interact with more experienced practitioners and encouraging questions and conversations between researchers, users and archivists.

The researcher-curated collection, Russia in the UK, is part of the UK Web Archive (UKWA). I was particularly pleased to have had the opportunity to present this curated collection, a resource on the Russian-speaking community in the UK, which was first started in November 2017. Such collections play an important role in making the wide range of material preserved in the UKWA more visible to researchers.

Curators are important to the preservation work of the UKWA. Curated collections are collected manually by curators and researchers with specialist knowledge in their field. The role of a curator in creating a UKWA collection involves identifying relevant websites to be included in a collection, and recording the metadata for these websites, including the translation and transliteration of titles and descriptions in other languages.

This collection is valuable both as a resource for further research and as a means of questioning research practices. It is not possible to capture everything on the web, and collection curators ensure that a representative sample of websites is selected for each thematic collection. The practice of creating and maintaining a collection such as Russia in the UK ultimately influences the shape of the collection and the online representation of the diasporic community it will come to reflect. As such, it is important for researchers and users to understand the decisions taken by curators in selecting and capturing websites.

My paper for EWA focused on the creation of a curation guide for curators of new curated collections. This guide draws on the ongoing process of curating the Russia in the UK collection, documenting the provenance of this special collection and reflecting on the process as a model for future collections.

In documenting the creation of this collection, I hope to enable future researchers to explore and contribute to this record of the online activity of the Russian diaspora in the UK, and to question and develop the curatorial and research practices behind the curation of collections.

You can watch Hannah Connell’s presentation on the EWA YouTube channel.

 

02 November 2020

Digital archaeology in the web of links: reconstructing a late-90s web sphere

By Dr. Peter Webster, Independent Scholar, Historian and Consultant

Fiber cables for the internet

 

The historian of the late 1990s has a problem. The vast bulk of content from the period is no longer on the live web; there are few, if any, indications of what has been lost – no inventory of the 1990s web against which to check. Of the content that was captured by the Internet Archive (more or less the only archive of the Anglophone web of the period), only a superficial layer is exposed to full-text search, and the bulk may only be retrieved by a search for the URL. We do not know what was never archived, and in the archive it is difficult to find what we might want, since there is no means of knowing the URL of a lost resource. Sometimes we need, then, to understand the archived web using only the technical data about itself that it can be made to disclose.

Niels Brügger has defined a web sphere as ‘web material … related to a topic, a theme, an event or a geographic area’.  My paper at the EWA conference presents a method of reconstructing a web sphere, much of which is lost from the live web and exists only in the Internet Archive: the web estate of the many conservative Christian campaign groups in the UK in the 1990s and early 2000s.

This method of web sphere reconstruction is based not on page content but on the relationships between sites, i.e., the web of hyperlinks. The method is iterative, and tracks back and forth between big data and small. Individual archived pages and directories, printed sources, the scholarly record itself, and even traces of previous unsuccessful attempts at web archiving come into play, as does a large dataset held by the British Library. From the more than 2 billion lines in the UK Host Link Graph dataset it is possible to extract the outlines of this particular web sphere.
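As a rough illustration of the link-based step, the sketch below grows a set of hosts outward from a few known seed sites by following links in either direction. It is only a minimal sketch under stated assumptions: the pipe-separated line layout (year, source host, target host, link count), the function names and the sample host names are all hypothetical, and this is not the code used for the paper.

```python
def parse_line(line):
    """Split one record. The 'year|source|target|count' layout is an assumption."""
    year, source, target, count = line.strip().split("|")
    return int(year), source, target, int(count)


def expand_sphere(lines, seeds, rounds=1):
    """Grow a set of hosts by following links to and from hosts already known."""
    records = [parse_line(line) for line in lines if line.strip()]
    sphere = set(seeds)
    for _ in range(rounds):
        found = set()
        for _year, source, target, _count in records:
            if source in sphere:
                found.add(target)   # a known host links out to this one
            if target in sphere:
                found.add(source)   # this host links in to a known one
        if found <= sphere:
            break                   # nothing new was discovered
        sphere |= found
    return sphere


# Invented sample records, for illustration only.
sample = [
    "1998|campaign-a.org.uk|campaign-b.org.uk|3",
    "1999|campaign-b.org.uk|church-c.org.uk|1",
    "1999|news-site.co.uk|weather.co.uk|7",
]
candidates = expand_sphere(sample, {"campaign-a.org.uk"}, rounds=2)
```

The output of each round is only a list of candidates. In keeping with the iterative method described above, each candidate host would still need checking by hand against archived pages and other sources before being accepted into the web sphere.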

You can watch Peter Webster’s presentation on his website, peterwebster.me.

 

Previous studies using a similar method are: 

Webster, Peter. 2019. Lessons from cross-border religion in the Northern Irish web sphere: understanding the limitations of the ccTLD as a proxy for the national web. In The Historical Web and Digital Humanities: the Case of National Web domains, eds Niels Brügger & Ditte Laursen, 110-23. London: Routledge.  http://dx.doi.org/10.17613/yms5-9v95     

Webster, Peter. 2017. Religious discourse in the archived web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008. In: The Web as History, eds Niels Brügger & Ralph Schroeder, 190-203. London: UCL Press. (Available Open Access at:  https://www.uclpress.co.uk/products/84010)

 

30 October 2020

The UK Web Archive creeps and crawls into the domain of Halloween with byte sized steps

By Helena Byrne, Curator of Web Archives at the British Library

Spider web with one spider some small flies stuck in the web and a dragon fly hovering just above the web.
Creepy crawlers - British Library digitised image from page 79 of "The Child's Book of Poetry. A selection of poems, ballads and hymns"

 

Halloween and the UK Web Archive

From the start of October the shops and supermarkets have been filling up with Halloween costumes, decorations and lots of fun-sized confectionery that is easy to share with the trick-or-treaters who might be knocking on your door. It is not clear yet how the coronavirus pandemic will impact any of the informal celebrations that take place every year. No doubt the UK Web Archive crawlers are picking up lots of Halloween and 5th of November themed webpages as part of the 2020 Domain Crawl.

Halloween in the UK is often perceived to be a cultural import from North America. A YouGov poll in 2019 showed that only 30% of people surveyed were planning to celebrate the occasion. This Shine graph shows how use of the term on the archived .uk web has increased over time.

Click on a point in the graph to see a sample of how the phrase is used.

 

Screenshot of the search for Halloween on the UK Web Archive Shine trends search

 

Halloween History

The tradition of Halloween actually goes back centuries and was widely celebrated by people in Ireland, Britain and northern France. In pagan times, the 1st of November was officially the start of winter. This season was associated with death, as crops, wildlife and many people died due to the cold and lack of sunlight during this period. Because of this day’s association with death, it was believed that the ghosts of the dead returned to earth on the night of the 31st of October. It was during the Reformation that the tradition of celebrating Halloween died out in Britain, especially in England.

A recent YouGov poll has shown that Guy Fawkes Night is more popular in Great Britain than Halloween

 

The 5th of November

The commemoration on the 5th of November goes by many names: traditionally it was Guy Fawkes Night, but it is sometimes referred to as Bonfire Night or Fireworks Night. There seem to be some regional differences in which term is used and how the night is celebrated.

What do you call this commemoration?

This is a question we visited back in 2017 and, as you can see in the Shine graph, in more recent years the term Bonfire Night was used more widely on the archived .uk web.

 

Screenshot of the search results for Bonfire Night, Guy Fawkes, Gun Powder Plot and Fireworks Night on the UK Web Archive Shine interface

 

Get creative with Halloween at the British Library

Our Assistant Web Archivist, Carlos Lelkes-Rarugal, has designed some short animated videos using recordings from the British Library Sound Archive and images from the Ghosts & Ghoulish Scenes, British Library Flickr. See these on the UK Web Archive, Digital Scholarship and the Sound Archive’s Wildlife Department Twitter accounts.

This and other sounds can be experienced in the Sound Archive at the British Library, which has over 260,000 wildlife sound recordings from all over the world. You can hear a selection of these recordings on the British Library Sound & Vision blog. The latest blog post, Going Batty for Halloween, gives an overview of the history of bats and Halloween.

The Digital Scholarship blog’s latest post, Mind Your Paws and Claws, encourages you to use these images and sounds for various creative projects. The Ghosts & Ghoulish Scenes Flickr Album was previously used in the Gothic Off the Map competition.

 

Get involved with preserving the UK web

The UK Web Archive aims to archive, preserve and give access to the historic UK web space. We endeavour to include important aspects of British culture and events that shape society. 

Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: www.webarchive.org.uk/nominate 

We have a Festivals collection, but are there any local Halloween or 5th of November events near you that haven’t been added yet? Equally, if these events have now been cancelled, we would like to add some of these online cancellation notifications to our collection Coronavirus (COVID-19) UK. Browse through what we have so far and please nominate more content!

 

28 October 2020

PRONI Web Archive: A Collaborative Approach

By Rosita Murchan, Web Archivist, Public Record Office of Northern Ireland (PRONI)

PRONI Web Archive homepage
Screenshot of the PRONI Web Archive homepage

 

The Public Record Office of Northern Ireland (PRONI) Web Archive has been building its collection of websites for almost ten years, focusing initially on capturing the websites of our Government departments and local councils but also websites deemed historically or culturally important to Northern Ireland.

Our collection has grown in both size and scope, and we now have over 240 captured sites that range from Instagram and Twitter feeds to local history group pages and significant inquest sites, with one terabyte of data being captured each year.

Unlike the rest of the UK, where legal deposit libraries are entitled to copy published material from the internet under the 2013 Non-Print Legal Deposit Regulations (NPLD), PRONI has no legal deposit status. Instead we work on a permissions-based approach: we write to website owners informing them of our intention to archive their website and then operate on a ‘silence is consent’ basis, crawling the site if we hear nothing back and taking down the archived copy should the owner request it.

In an attempt to continue to expand our collection we have been very lucky to have been invited to collaborate with the UK Web Archive team based at the British Library on some of their projects in the last year and have added a Northern Ireland perspective on the topics of Brexit and the General Election 2019 and more recently a NI Covid-19 viewpoint.

Part and parcel of this collaboration was getting access to the UK Web Archive’s ACT web archiving tool, which we have been able to use and add to. For me this was a great opportunity to see and use another web archiving tool, especially a custom piece of software from an institution as reputable as the British Library. It has also been a fantastic opportunity for us to archive sites that would normally be outside our remit; as a result we have been able to contribute to further research on Northern Ireland.

We really hope to continue this partnership with the British Library going forward, not only as a method of increasing the number of NI archived sites but also as a way to continue to improve and learn from their expertise.

You can watch Rosita Murchan’s presentation on the EWA YouTube Channel.

 

26 October 2020

The 1916 Easter Rising Web Archive

By Brendan Power, Digital Preservation Librarian, Library of Trinity College Dublin

Logos of the three Legal Deposit Libraries involved in the collaboration: Bodleian Libraries, Trinity College Dublin and the British Library

At the recent conference, ‘Engaging with Web Archives: Opportunities, Challenges and Potentialities’, I presented a paper on a collaborative project between The Library of Trinity College Dublin, the University of Dublin, the Bodleian Libraries, the University of Oxford, and the British Library. The project was carried out in 2015/16 and aimed to identify, collect, and preserve online resources related to the 1916 Easter Rising and the diverse ways it was commemorated and engaged with throughout its centenary in 2016. The Bodleian Libraries primarily collected UK websites under the provisions of the 2013 Non-Print Legal Deposit Regulations (NPLD), while The Library of Trinity College Dublin focused on websites in the .ie domain. Since no legislation exists in the Republic of Ireland to ensure that the .ie domain is preserved, websites within the .ie domain were collected on a voluntary basis, that is, with the express formal permission of the website owners through the signing of a license agreement.

 

We aimed to reflect the variety of ways that the Irish and British states, cultural and educational institutions, as well as communities and individuals, approached the centenary events. These included official commemorative websites, the websites of museums, archives, heritage, cultural, and education institutions, along with traditional and alternative news media websites, blogs, and community websites. These resources will be invaluable primary resources to analyse how people interpreted and engaged with the Easter Rising in its centenary year. Researchers have reflected on the events organised on the fiftieth anniversary of the Easter Rising in 1966 and how these events were framed, the aspects that were championed, and the critical viewpoints denied expression. In a similar way, the records created throughout the centenary will be an essential resource for researchers in analysing how the generations of 2016 engaged with the legacy of the Easter Rising and the approaches, themes, and tone adopted.

 

The resulting web archive collection contains 318 seeds, i.e. websites or sub-sections of these. Of these 318 websites, 112 (35%) were selected by The Library of Trinity College Dublin, 190 (60%) by the Bodleian Libraries, and 16 (5%) by curators at the British Library. 118 (37%) of the websites were from the .ie domain, 172 (54%) were from the .uk domain and 28 (9%) were associated with other areas, predominantly the USA. For all websites outside the UK (146), formal permission was sought from the website owners, resulting in 61 licenses to archive and make the archived copies publicly available. We received no response from 83 website owners, and 2 organisations agreed in principle to inclusion in the web archive but were not in a position to sign the license agreement required to allow us to archive the website, as they could not affirm that they controlled the copyright of all the content that was to be archived. This meant an overall permissions rate of 42%, with the rate for websites in the .ie domain being even higher, at 51%.
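The figures in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch for readers who want to reproduce the percentages; the variable and function names are illustrative and the counts are taken directly from the text.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


total = 318
by_selector = {"Trinity College Dublin": 112, "Bodleian Libraries": 190, "British Library": 16}
by_domain = {".ie": 118, ".uk": 172, "other": 28}

# Both breakdowns should account for every seed in the collection.
assert sum(by_selector.values()) == total
assert sum(by_domain.values()) == total

# Websites outside the UK, for which formal permission was sought.
outside_uk = by_domain[".ie"] + by_domain["other"]
licences, no_response, unable_to_sign = 61, 83, 2
assert licences + no_response + unable_to_sign == outside_uk

# Overall permissions rate: licences received out of requests sent.
permission_rate = share(licences, outside_uk)
```

The 51% rate for the .ie domain cannot be recomputed in the same way, because the number of licences received from .ie website owners is not given here.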

 

Since the project was completed there have been many helpful reminders of the impact that such work has. This included one organisation that had created a website dedicated to an Easter Rising project which was no longer live on the web. The person that was responsible for the website had left the organisation and their replacement had no access to the materials that had been on the website. They had discovered an e-mail from me back in 2016 inviting them to participate in the web archive. Once they contacted me, I was able to direct them to the UK web archive and, as the organisation had signed the license agreement, they were able to access the archived website immediately from their office. This access had saved them both the time and staff resources that would have been expended in order to recreate some of the resources that were available on the archived website. It serves as an example of what embedding sustainability into a project can save in terms of time and staff resources and demonstrated the positive economic impact that organisations can derive by participation in cultural heritage initiatives such as web archives.

 

The co-curators of this collection have also published a paper on it, ‘Capturing commemoration: the 1916 Easter Rising web archive project’, in the academic journal Internet Histories.

You can watch Brendan Power’s presentation on the EWA YouTube Channel.

 

19 October 2020

Exploring media events with Shine

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

Computer screen with some HTML code on the screen

This blog post is a summary of the presentation I delivered with my colleague Daniela Major at the conference ‘Engaging with Web Archives: Opportunities, Challenges and Potentialities’ in September 2020. The presentation was entitled ‘Tracking and analysing media events through web archives’.

My research explores the media coverage of the Olympic Games in a cross-cultural, cross-lingual and temporal perspective. I am especially interested in comparing how the concept of 'Olympic legacy' has been approached by the Brazilian and British media considering different locations, languages and social-political contexts. I have written a bit about this before on the UK Web Archive blog in December 2019 and March 2020.

Because of its controversial nature, the term Olympic legacy is used in a variety of contexts and has multiple meanings. Given its narrative importance in legitimising the billions that cities invest to host these events, the main objective of this study is to explore and define the concept of Olympic legacy and how it changes over time.

Here, however, I will be focusing on my experience doing a secondment at the British Library with the UK Web Archive team. I have explored the potential of using the platform Shine to track news articles on Olympic legacy.

Why Shine?

Shine is a tool to explore .uk websites archived by the Internet Archive between 1996 and April 2013. While a big part of the content of the UK Web Archive can only be accessed from inside the British Library, Shine is open access and provides us with search results and URL data that can be easier to manage.

We have developed a pipeline with five steps: searching, extraction, cleaning, filtering and visualisation. To extract information, we scraped the data using Python notebooks, looking at specific newspaper websites (such as The Guardian) and broadcasters’ websites (such as the BBC) with the keyword “Olympic legacy”.

Once the URLs have been found in Shine and the results extracted, the main challenge is cleaning. Cleaning consists of removing everything around the main text, such as images, adverts, menus and links. After extracting just the body text of the articles, we saw that many of them did not mention Olympic legacy at all: Shine often returns results where the search terms appear only in peripheral parts of the webpage.

With the documents we needed in hand, we then had to verify whether their content was relevant to our analysis. Sometimes the term Olympic legacy appears but is not related to the Rio or London Olympics, or is not the main topic of the article. This filtering demanded a huge effort of close reading to identify contexts.

Finally, we produced some charts to visualise word trends and the topics that appear around legacy. Although the Shine search results are limited in time (they run only up to 2013), Shine has been very useful as an exploratory tool for conducting preliminary analysis on a small scale, and for building web archive and web scraping methods before applying them to huge amounts of text elsewhere.
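To make the cleaning and filtering steps more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions rather than the notebooks used in the project: the class and function names, the list of tags treated as peripheral, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect page text while skipping elements assumed to be peripheral."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean(html):
    """Return the text outside menus, scripts and similar page furniture."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def mentions(text, phrase="olympic legacy"):
    """First-pass filter: does the text contain the phrase at all?"""
    return phrase in text.lower()


# Invented page: the phrase appears only in the navigation menu.
page = """<html><body>
<nav><a href="/legacy">Olympic legacy</a></nav>
<p>Ticket sales for the opening ceremony begin on Monday.</p>
</body></html>"""

body = clean(page)
```

In this example the raw page matches the search phrase but the cleaned body text does not, which is the situation described above. A keyword check like this can only discard obvious false hits; deciding whether an article is really about Olympic legacy still needs close reading.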

You can watch Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major’s presentation on the EWA YouTube Channel.

*This project has received funding from the European Union’s Horizon 2020 research and innovation programme. For more information: cleopatra-project.eu.

 

25 September 2020

The World of Food and the UK Web Archive

 

By Helena Byrne, Curator of Web Archives at the British Library

 

Assorted sliced fruits in white ceramic bowl surrounded by more sliced fruits and some small muffins
A variety of food

 

Food is a subject that transcends culture, politics and leisure practices. Thus, food has always been a key part of the UK Web Archive (UKWA) since it was established in 2005. 

 

Recipes, restaurant menus, food blogs and online reviews are just the start of the food-related online material that UKWA collects. Even protest and campaigning can be food related: this summer, for instance, footballer Marcus Rashford highlighted the issue of child poverty and the lack of access to food, especially during the school holidays.

 

For the last three years the British Library has been running a series of events around food. Due to the coronavirus pandemic, this year's Food Season moved online with a series of talks over the autumn period. 

 

The Food Season celebrates the British Library’s extensive food-related collections and explores the politics, pleasures and history of food. UKWA, which is a partnership of the six UK Legal Deposit Libraries, including the British Library, also has an extensive collection of food related websites. 

 

Food collections

In 2017, the Food Archive collection was established. This collection covers the following topics:

There are currently 333 websites or web pages in this collection. Some of the websites selected include Eat Like a Girl, the Good Grub Club and the Veggies Catering Campaign. Why not have a browse through the collection and nominate your favourite UK published food sites or restaurant websites to be included in the collection? Anyone can nominate a website by following this link: https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

Even though there is a dedicated collection about food, it also features as a subsection in a number of other collections. ‘Food and Drink’ is a subsection in both the Festivals and Online Enthusiast Communities in the UK collections. In addition, individual food websites appear in several other collections. Websites related to food activism appear in both the Political Action and Communication collection as well as the (soccer) fan subsection of the Sport: Football Collection, as numerous supporters clubs have organised to support their local food banks. 

 

Social media is a very popular way to share food and micro-reviews of eateries; however, it is often challenging for us to archive. At present, Twitter is the only social media platform that we archive on a regular basis, but these captures are by no means comprehensive. We have experimented with other methods of archiving social media, but only on a selective basis.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

 

Some of the websites in UKWA for which permission has already been granted include Cake Fest Edinburgh, the Lancashire Pork Pie Appreciation Society and the Food Research Collaboration. Examples of websites with onsite-only access include the Biscuit Appreciation Society, the UK Menu Archive and Fans Supporting Food Banks.

 

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

Due to the coronavirus pandemic, the reading rooms were closed for a number of weeks but are starting to reopen. This blog post gives an overview of opening hours and how to book a visit at the six UK Legal Deposit Libraries:

https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html 

 

We would especially like to see more food and drink nominations that reflect the multicultural nature of the UK and the many diaspora communities based here. Browse through what we have so far and please nominate more content here:

https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

17 September 2020

Arnhem75 - a special collection of websites added to the UK Web Archive

 

By Marja Kingma, Curator of Germanic Collections, the British Library.

 

Arnhem75 blog image
Book cover of 75 Years Battle of Arnhem by Laurens van Aggelen

 

Introduction

The idea to create a collection of websites about the commemoration of Arnhem75 came to RAF Museum historian Harry Raffal and myself whilst attending the seminar ‘The Arnhem Spirit - 75 years of Brits in Arnhem’, on 15 May 2019, organised by the Dutch Embassy in London. The event was part of a programme in which the Netherlands, Britain and other former Allied countries commemorated Operation Market Garden, the code name for the battle for the bridge across the Rhine at Arnhem that took place in September 1944. Allied forces consisted of British, American and Polish troops, with help from Dutch resistance.

The Battle of Arnhem 1944 is of great significance to the UK and interest in it remains strong on both sides of the North Sea.

We wanted to create a lasting memory of these events and a special collection in the UK Web Archive on the subject seemed like a good idea.

 

What is included?

We kept the scope of the project quite narrow; only websites with a focus on the commemorations that took place in Britain and the Netherlands in 2019 are included, with the exception of some websites that deal with the historic facts regarding the Battle to give it some context.

So far over 150 individual websites within the UK web domain have been identified, of which 64 were selected to go into the collection. These sites are limited to the UK web domain: they either have .uk in their domain name or, if they don’t, must be hosted in the UK or owned by UK organisations or individuals with a postal address in the UK.
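The scope rule just described can be written down as a simple check. The sketch below is purely illustrative: the function name and its two flags are hypothetical, and in practice whether a site is hosted in the UK or owned from a UK postal address has to be established by the curator rather than read from the URL.

```python
from urllib.parse import urlsplit


def in_scope(url, hosted_in_uk=False, uk_postal_address=False):
    """Apply the scope rule: a .uk host name, or UK hosting, or a UK-based owner."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(".uk") or hosted_in_uk or uk_postal_address


# Invented URLs, for illustration only.
examples = [
    in_scope("https://www.example.co.uk/arnhem75"),
    in_scope("https://example.com/", hosted_in_uk=True),
    in_scope("https://example.nl/"),
]
```

Here the first two URLs pass the check and the third does not, although a site like the third could still be considered separately if it has a strong link to the UK and its owner gives permission.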

Some of the websites selected for this collection include the 23 Parachute Field Ambulance, Airborne at the Bridge and Arnhem Oosterbeek War Cemetery.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but we are only able to make them available to people outside the UK Legal Deposit Libraries reading rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

For this collection you can view what has been selected through the UK Web Archive website but will need to visit a UK Legal Deposit Library reading room to view the archived content. The reading rooms across the Legal Deposit Libraries are starting to reopen now, with some restrictions, as you can read in this blog: https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html

 

How Can I Get Involved?

You can help expand this collection by sending us a URL you think may be eligible for inclusion in the collection Arnhem75. Please go to https://www.webarchive.org.uk/en/ukwa/info/nominate to nominate a website and we’ll take it from there.

Occasionally websites from non UK domains can be included, if they have a strong link to the UK and the website owners have given their permission to be included in the collection. Dutch organisations that were involved in the Arnhem75 commemorations are encouraged to get in touch.

We look forward to your suggestions!