UK Web Archive blog

13 posts categorized "Humanities"

19 October 2020

Exploring media events with Shine

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

Computer screen with some HTML code on the screen

This blog post is a summary of the presentation I delivered with my colleague Daniela Major at the conference Engaging with Web Archives: ‘Opportunities, Challenges and Potentialities’ in September 2020. The presentation was entitled ‘Tracking and analysing media events through web archives’.

My research explores the media coverage of the Olympic Games in a cross-cultural, cross-lingual and temporal perspective. I am especially interested in comparing how the concept of 'Olympic legacy' has been approached by the Brazilian and British media considering different locations, languages and social-political contexts. I have written a bit about this before on the UK Web Archive blog in December 2019 and March 2020.

Because of its controversial nature, the term Olympic legacy is used in a variety of contexts and has multiple meanings. Given its narrative importance in legitimising the multi-billion investments cities make to host these events, the main objective of this study is to explore and define the concept of Olympic legacy and how it changes over time.

Here, however, I will focus on my experience of doing a secondment at the British Library with the UK Web Archive team, where I explored the potential of using the platform Shine to track news articles on Olympic legacy.

Why Shine?

Shine is a tool to explore .uk websites archived by the Internet Archive between 1996 and April 2013. While a big part of the content of the UK Web Archive can only be accessed from inside the British Library, Shine is open access and provides us with search results and URL data that can be easier to manage.

We developed a pipeline of five steps: searching, extraction, cleaning, filtering and visualisation. To extract information, we scraped the data using Python notebooks, looking at specific newspapers (like The Guardian) and broadcasters' websites (like the BBC) with the keyword “Olympic legacy”.

Having searched for URLs in Shine and extracted the results, the main challenge is cleaning. Cleaning consists of removing everything around the main text, such as images, adverts, menus and links. After extracting just the body text of the articles, we saw that many of them did not mention Olympic legacy at all, because Shine often returns results where the search terms appear only in peripheral parts of the webpage. With the documents we needed in hand, we then had to verify whether their content was relevant to our analysis. Sometimes the term Olympic legacy appears but is not related to the Rio or London Olympics, or is not the main topic of the article. Filtering therefore demanded a great deal of close reading to identify contexts.

Finally, we produced some charts to visualise word trends and the topics that emerge around legacy. Although the Shine search results are limited in time (they stop in 2013), Shine has been very useful as an exploratory tool for conducting preliminary, small-scale analysis and for developing web archive and web scraping methods before applying them to much larger amounts of text elsewhere.
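The cleaning and filtering steps described above could be sketched roughly as follows. This is only an illustration using Python's standard library, not the notebooks used in the project: the set of tags treated as peripheral, the function names and the sample pages are all assumptions, and real news pages would need more careful handling.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as peripheral (an assumption; real pages vary).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class BodyTextExtractor(HTMLParser):
    """Collects text that is not nested inside any of the SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many SKIP_TAGS we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html):
    """Cleaning step: return the page text with peripheral sections removed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def mentions_keyword(text, keyword="olympic legacy"):
    """Case-insensitive check that also tolerates irregular whitespace."""
    normalised = " ".join(text.lower().split())
    return keyword in normalised


def filter_pages(pages, keyword="olympic legacy"):
    """Filtering step: keep pages whose cleaned text mentions the keyword.

    `pages` is a dict mapping URL -> raw HTML (hypothetical input format).
    """
    kept = {}
    for url, html in pages.items():
        text = extract_body_text(html)
        if mentions_keyword(text, keyword):
            kept[url] = text
    return kept


# Hypothetical sample pages: the first mentions the term only in a menu.
sample_pages = {
    "page-a": "<html><body><nav>Olympic legacy</nav>"
              "<p>Football results today.</p></body></html>",
    "page-b": "<html><body><p>The Olympic legacy of London 2012.</p></body></html>",
}
relevant = filter_pages(sample_pages)
```

A keyword check like this only removes pages where the term appears outside the main text; deciding whether an article is really about the Rio or London legacy still requires the close reading described above.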

You can watch Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major’s presentation on the EWA YouTube Channel.

*This project has received funding from the European Union’s Horizon 2020 research and innovation programme. For more information: cleopatra-project.eu.

 

25 September 2020

The World of Food and the UK Web Archive

 

By Helena Byrne, Curator of Web Archives at the British Library

 

Assorted sliced fruits in white ceramic bowl surrounded by more sliced fruits and some small muffins
A variety of food

 

Food is a subject that transcends culture, politics and leisure practices. It has therefore been a key part of the UK Web Archive (UKWA) since it was established in 2005.

 

Recipes, restaurant menus, food blogs and online reviews are just the start of the food-related online material that UKWA collects. Even protest and campaigning can be food-related. This summer, for instance, footballer Marcus Rashford highlighted the issue of child poverty and the lack of access to food, especially during the school holidays.

 

For the last three years the British Library has been running a series of events around food. Due to the coronavirus pandemic, this year's Food Season moved online with a series of talks over the autumn period. 

 

The Food Season celebrates the British Library’s extensive food-related collections and explores the politics, pleasures and history of food. UKWA, which is a partnership of the six UK Legal Deposit Libraries, including the British Library, also has an extensive collection of food-related websites.

 

Food collections

In 2017, the Food Archive collection was established. This collection covers the following topics:

There are currently 333 websites or web pages in this collection. Some of the websites selected include Eat Like a Girl, the Good Grub Club and the Veggies Catering Campaign. Why not have a browse through the collection and nominate your favourite UK-published food sites or restaurant websites to be included in the collection? Anyone can nominate a website by following this link: https://www.webarchive.org.uk/en/ukwa/info/nominate

 

Even though there is a dedicated collection about food, it also features as a subsection in a number of other collections. ‘Food and Drink’ is a subsection in both the Festivals and the Online Enthusiast Communities in the UK collections. In addition, individual food websites appear in several other collections. Websites related to food activism appear in both the Political Action and Communication collection and the (soccer) fan subsection of the Sport: Football Collection, as numerous supporters' clubs have organised to support their local food banks.

 

Social media is a very popular way to share food and micro-reviews of eateries; however, it is often challenging for us to archive. At present, Twitter is the only social media platform that we archive on a regular basis, but these captures are by no means comprehensive. We have experimented with other methods of archiving social media, but only on a selective basis.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK-published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries' Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

 

Websites in UKWA for which permission has already been granted include the Cake Fest Edinburgh, the Lancashire Pork Pie Appreciation Society and the Food Research Collaboration. Examples of websites with onsite-only access include the Biscuit Appreciation Society, the UK Menu Archive and Fans Supporting Food Banks.

 

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

Due to the coronavirus pandemic, the reading rooms were closed for a number of weeks but are starting to reopen. This blog post gives an overview of opening hours and how to book a visit at the six UK Legal Deposit Libraries:

https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html 

 

We would especially like to see more food and drink nominations that reflect the multicultural nature of the UK and the many diaspora communities based here. Browse through what we have so far and please nominate more content here:

https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

17 September 2020

Arnhem75 - a special collection of websites added to the UK Web Archive

 

By Marja Kingma, Curator of Germanic Collections, the British Library.

 

Arnhem75 blog image
Book cover of 75 Years Battle of Arnhem by Laurens van Aggelen

 

Introduction

The idea of creating a collection of websites about the commemoration of Arnhem75 came to RAF Museum historian Harry Raffal and me while we were attending the seminar ‘The Arnhem Spirit - 75 years of Brits in Arnhem’, organised by the Dutch Embassy in London on 15 May 2019. The event was part of a programme in which the Netherlands, Britain and other former Allied countries commemorated Operation Market Garden, the code name for the battle for the bridge across the Rhine at Arnhem that took place in September 1944. Allied forces consisted of British, American and Polish troops, with help from the Dutch resistance.

The Battle of Arnhem 1944 is of great significance to the UK and interest in it remains strong on both sides of the North Sea.

We wanted to create a lasting memory of these events and a special collection in the UK Web Archive on the subject seemed like a good idea.

 

What is included?

We kept the scope of the project quite narrow; only websites with a focus on the commemorations that took place in Britain and the Netherlands in 2019 are included, with the exception of some websites that deal with the historic facts regarding the Battle to give it some context.

So far over 150 individual websites within the UK web domain have been identified, of which 64 were selected to go into the collection. These sites are limited to the UK web domain: they either have .uk in their domain name or, if they don’t, must be hosted in the UK or owned by UK organisations or individuals with a postal address in the UK.

Some of the websites selected for this collection include the 23 Parachute Field Ambulance, Airborne at the Bridge and Arnhem Oosterbeek War Cemetery.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but are only able to make them available to people outside the UK Legal Deposit Libraries' reading rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

For this collection you can view what has been selected through the UK Web Archive website but will need to visit a UK Legal Deposit Library reading room to view the archived content. The reading rooms across the Legal Deposit Libraries are starting to reopen now, with some restrictions, as you can read in this blog: https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html

 

How Can I Get Involved?

You can help expand this collection by sending us a URL you think may be eligible for inclusion in the collection Arnhem75. Please go to https://www.webarchive.org.uk/en/ukwa/info/nominate to nominate a website and we’ll take it from there.

Occasionally, websites from non-UK domains can be included if they have a strong link to the UK and the website owners have given their permission to be included in the collection. Dutch organisations that were involved in the Arnhem75 commemorations are encouraged to get in touch.

We look forward to your suggestions!

 

23 June 2020

WARCnet and the UK Web Archive

By Jason Webber, Web Archiving Engagement Manager

 

We at the UK Web Archive (UKWA) have recently taken part in a new initiative called WARCnet led by the University of Aarhus in Denmark (and funded by Independent Research Fund Denmark).

“The aim of the WARCnet network is to promote high-quality national and transnational research that will help us to understand the history of (trans)national web domains and of transnational events on the web, drawing on the increasingly important digital cultural heritage held in national web archives.”

 

WARCnet logo

 

The majority of participants are researchers currently using web archives as part of their studies, many with extensive experience and others new to the field. This makes it an exciting project to be part of, as it is an excellent way for content holders such as UKWA to work closely with a group of researchers and try to understand their needs and challenges. The project had a kick-off meeting in May 2020 that was originally intended to be in person but took place virtually. All the speakers pre-recorded their talks, which means that these are now all available (including one by myself). I’d particularly recommend viewing the two keynote speakers, Matthew S. Weber and Ian Milligan.

 

Title slide for Jason Webber's WARCnet presentation

 

Working Groups
It is intended for any outcomes from WARCnet to be driven by the participants themselves and to this end four working groups have been formed:

 

  • Working Group 1 - Comparing entire web domains
  • Working Group 2 - Analysing transnational events
  • Working Group 3 - Digital research methods and tools
  • Working Group 4 - Research data management across borders

 

The UKWA team is involved with each of the first three working groups, all of which have met in recent weeks to see how we can take this project forward. You can read more about each group here.

There are at least three more small conferences planned (currently as in person), one later this year in Luxembourg and two next year in London and Aarhus.

Look out for updates on our involvement with this initiative on this blog and through the Twitter accounts @UKWebArchive and @WARC_net.

09 August 2017

The Proper Serious Work of Preserving Digital Comics

Jen Aggleton is a PhD candidate in Education at the University of Cambridge, and is completing a work placement at the British Library on the subject of digital comics. 

If you are a digital comics creator, publisher, or reader, we would love to hear from you. We’d like to know more about the digital comics that you create, find out links to add to our Web Archive collection, and find examples of comic apps that we could collect. Please email suggestions to [email protected]. For this initial stage of the project, we will be accepting suggestions until the end of August 2017.

I definitely didn’t apply for a three-month placement at the British Library just to have an excuse to read comics every day. Having a number of research interests outside of my PhD topic of illustrated novels (including comics and library studies), I am always excited when I find opportunities which allow me to explore these strands a little more. So when I saw that the British Library were looking for PhD placement students to work in the area of 21st century British comics, I jumped at the chance.

Having convinced my supervisor that I wouldn’t just be reading comics all day but would actually be doing proper serious work, I temporarily put aside my PhD and came to London to read lots and lots of digital comics (for the purpose of proper serious work). And that’s when I quickly realised that I was already reading comics every day.

The reason I hadn’t noticed was because I hadn’t specifically picked up a printed comic or gone to a dedicated webcomic site every day (many days, sure, but not every day). I was however reading comics every day on Facebook, slipped in alongside dubiously targeted ads and cat videos. It occurred to me that lots of other people, even those who may not think of themselves as comics readers, were probably doing the same.

(McGovern, E. My Life As A Background Slytherin, https://www.facebook.com/backgroundslytherin/photos/a.287354904946325.1073741827.287347468280402/338452443169904/?type=3&theater Reproduced with kind permission of Emily McGovern.)

This is because the ways in which we interact with comics have been vastly expanded by digital technology. Comics are now produced and circulated through a number of different platforms, including apps, websites and social media, allowing them to reach further than their traditional audience. These platforms have made digital comics simultaneously both more and less accessible than their print equivalents; many webcomics are available for free online, which means readers no longer have to pay between £8 and £25 for a graphic novel, but does require them to have already paid for a computer/tablet/smartphone and internet connection (or have access to one at their local library, provided their local library wasn’t a victim of austerity measures).

Alongside access to reading comics, access to publishing has also changed. Anyone with access to a computer and internet connection can now publish a comic online. This has opened up comics production to many whose voices may not have often been heard in mainstream print comics, including writers and characters of colour, women, members of the LGBTQ+ community, those with disabilities, and creators who simply cannot give up the stability of full-time employment to commit the time needed to chase their dream of being a comics creator. The result is a vibrant array of digital comics, enormously varying in form and having a significant social and cultural impact.

But digital comics are also far more fragile than their print companions, and this is where the proper serious work part of my placement comes in. Comics apps are frequently removed from app stores as new platform updates come in. Digital files become corrupted, or become obsolete as the technology used to host them is updated and replaced. Websites are taken down, leaving no trace (all those dire warnings that the internet is forever are not exactly true. For more details about the need for digital preservation, see an earlier post to this blog). So in order to make sure that all the fantastic work happening in digital comics now is still available for future generations (which in British Library terms could mean ten years down the line, or five hundred years down the line), we need to find ways to preserve what is being created.

One method of doing this is to establish a dedicated webcomics archive. The British Library already has a UK Web Archive, due to the extension of legal deposit in 2013 to include the collection of non-print items. I am currently working on setting up a special collection of UK webcomics within that archive. This has involved writing collections guidelines covering what will (and won’t) be included in the collection, which had me wrestling with the thorny problem of what exactly a digital comic is (comics scholars will know that nobody can agree on what a print comic is, so you can imagine the fun involved in trying to incorporate digital elements such as audio and video into the mix as well). It has also involved building the collection through web harvesting, tracking down webcomics for inclusion in the collection, and providing metadata (information about the collection item) for cataloguing purposes (this last task may happen to require reading lots of comics).

Alongside this, I am looking into ways that digital comics apps might be preserved, which is very proper serious work indeed. Not only are there many different versions of the same app, depending on what operating system you are using, but many apps rely not only on the software of the platform they are running on but sometimes on the hardware as well, with some apps integrating functions such as the camera of a tablet into their design. Simply downloading apps will provide you with lots of digital files that you won’t be able to open in a few years’ time (or possibly even a few months’ time, with the current pace of technology). This is not a problem that can be solved in the duration of a three-month placement (or, frankly, given my total lack of technical knowledge, by me at all). What I can do, however, is find people who do have technical knowledge and ask them what they think. Preserving digital comics is a complicated and ongoing process, and it is a great experience to be in at the early stages of exploration.

And you can be involved in this fun experience too! If you are a digital comics creator, publisher, or reader, we would love to hear from you. We’d like to know more about the digital comics that you create, find out links to add to our Web Archive collection, and find examples of comic apps that we could collect. Please email suggestions to [email protected]. For this initial stage of the project, we will be accepting suggestions until the end of August 2017. In that time, we are particularly keen to receive web addresses for UK published webcomics, so that I can continue to build the web archive, and do the proper serious work of reading lots and lots of comics.
