UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

The UK Web Archive, the Library's premier resource of archived UK websites

14 posts categorized "Digital scholarship"

19 October 2020

Exploring media events with Shine

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

Computer screen with some HTML code on the screen

This blogpost is a summary of the presentation I delivered with my colleague Daniela Major in the conference Engaging with Web Archives: ‘Opportunities, Challenges and Potentialities’ in September 2020. This presentation is entitled ‘Tracking and analysing media events through web archives’.

My research explores the media coverage of the Olympic Games in a cross-cultural, cross-lingual and temporal perspective. I am especially interested in comparing how the concept of 'Olympic legacy' has been approached by the Brazilian and British media considering different locations, languages and social-political contexts. I have written a bit about this before on the UK Web Archive blog in December 2019 and March 2020.

Because of its controversial nature, the term Olympic legacy is used in a variety of contexts and it has multiple meanings. Considering its narrative importance to legitimize the billionaire investment of cities to host these events, this study has as the main objective to explore and define the concept of Olympic Legacy and how it changes over time.

Here however, I will be focusing on my experience doing a secondment at the British Library with the UK Web Archive team. I have explored the potential of using the platform Shine to track news articles on Olympic legacy.

Why Shine?

Shine is a tool to explore .uk websites archived by the Internet Archive between 1996 and April 2013. While a big part of the content of the UK Web Archive can only be accessed from inside the British Library, Shine is open access and provides us with search results and URL data that can be easier to manage.

We have developed a pipeline based on 5 steps: searching, extraction, cleaning, filtering and visualisation. To extract information, we have conducted web scraping of the data using Python notebooks looking at specific newspapers (like The Guardian) and broadcast websites (like BBC) using the keyword “Olympic legacy”. Having searched for URL’s in Shine and extracted the results, the main challenge is cleaning. After extracting just the body text of the articles, we saw that many of them did not mention Olympic legacy. Usually, Shine provides results where the words searched appear in peripheral locations of the webpage. Cleaning consists of removing all the information around the main text, such as images, adverts, menus and links. With the documents we needed in hand, we had to verify if their content is relevant or not to our analysis. Sometimes, the term Olympic legacy appears but it is not necessarily related to Rio and London Olympics or it is not the main topic of the article. The process of filtering demanded a huge effort of close reading to identify contexts. At the end, we have produced some charts to visualise word-trends and topics that pop up around legacy. Although the Shine search results are limited in terms of time - it searched up until 2013 - it has been very useful as an exploratory tool to conduct preliminary analysis in a small-scale, and to build web archive and web scraping methods before applying my methods to huge amounts of texts elsewhere. 

You can watch Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major’s presentation on the EWA YouTube Channel.

*This project has received funding from the European Union’s Horizon 2020 research and innovation programme. For more information: cleopatra-project.eu.

 

14 October 2020

Engaging with Web Archives - Conference Report

By Jason Webber, Web Archive Engagement Manager, The British Library

 

Engaging with Web Archives conference banner

 

Is it possible to have a successful conference when you can no longer meet in person? Going exclusively online doesn’t seem to have stopped the ‘Engaging with Web Archives’ (EWA) Conference from being a superb experience. Co-Chairs of the event are Sharon Healy and Michael Kurzmeier, PhD students at Maynooth University.

Originally planned as a more traditional, in person, conference in April 2020 the EWA team re-planned for a completely online event on 21and 22 September 2020. It is notable that this was the first web archiving conference in Ireland. Most talks were pre-recorded which meant that questions could be posed in the chat box and were often answered live by the presenter during the talk. This can be a significant advantage of pre-recorded talks.

The programme was packed with high quality presentations from many areas of web archiving but here I’ll highlight a few that were UK Web Archive (UKWA) projects or used UKWA data. 

 

Highlights

 

A Keynote talk was delivered by Professor Jane Winters, School of Advanced Study, University of London. Web archives as sites of collaboration. Jane has worked with the UK Web Archive extensively over many years and is one of only a few Professors in the UK training and promoting web archives to students. Jane's talk (link to YouTube).

 

Sara Day Thomson (University of Edinburgh) Developing a Web Archiving Strategy for the Covid-19 Collecting Initiative at the University of Edinburgh. Sara formerly worked for the Digital Preservation Coalition (DPC) led a ‘Web Archiving Task Force’ and more recently has been building important collections on Covid-19 with the University of Edinburgh in partnership with UKWA. Sara's talk (link to YouTube).

 

Dr. Brendan Power (The Library of Trinity College Dublin): Leveraging the UK Web Archive in an Irish context: Challenges and Opportunities. With Trinity College Dublin being a UK Legal Deposit Library we try and work together as much as possible and this talk highlights what is possible with specific mention of the Easter Rising collection. Brendan's talk (link to YouTube).

 

Robert McNicol (Kenneth Ritchie Wimbledon Library): The UK Web Archive and Wimbledon: A Winning Combination. We try to represent as many aspects of UK life as possible including sport. This also highlights our cooperation with other libraries and archives. See the Tennis collection. Robert's talk (link to YouTube).

 

Dr. Peter Webster (Independent Scholar, Historian and Consultant): Digital archaeology in the web of links: reconstructing a late-90s web sphere. Peter has conducted several pieces of research utilising the UKWA secondary datasets. These are free and available for download. Peter's talk (link to YouTube).

 

Helena Byrne (Curator of web Archiving, British Library): From the sidelines to the archived web: What are the most annoying football phrases in the UK? Helena is a curator in the UK Web Archive but also has a keen interest in sport and women’s football in particular. Here, Helena shows how the Trends feature (graphs) in our SHINE service can help guide research in an easy and accessible way. Helena's talk (link to YouTube).

 

Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major (School of Advanced Study, University of London): Tracking and Analysing Media Events through Web Archives. Caio was a Phd student placement with UKWA as part of the Cleopatra project. Read about some of his work on this blog on Olympic legacy. Caio and Daniella's talk (link to YouTube).

 

Hannah Connell (King’s College London; British Library): Curating culturally themed collections online: The Russia in the UK Special Collection, UK Web Archive. Hannah has worked extensively collecting one of the several diaspora community collections. In addition to Russia in the UK, there is London French and Latin America UK. Hannah's talk (link to YouTube).

 

Dr. Jessica Ogden (University of Southampton) & Emily Maemura (University of Toronto): A tale of two web archives: Challenges of engaging web archival infrastructures for research. Jessica has also worked previously with UKWA as a Phd placement on the challenges of researchers using web archives. This vital work helps guide our planning for the future. Jessica and Emily's talk (link to YouTube).

 

Dr. Olga Holownia (International Internet Preservation Consortium): IIPC: training, research, and outreach activities. Olga works full time for the IIPC but has been based within the UK Web Archive team at the British Library. We have been delighted to have worked with and been supported by the IIPC since it began (The British Library is a founding member).

 

Rosita Murchan (Public Record Office of Northern Ireland): PRONI Web Archive: A Collaborative Approach. PRONI maintains their own web archive but also collaborates with the UK Web Archive in collecting material specific to Northern Ireland. This is important as there currently is no Legal Deposit partner in Northern Ireland. Rosita’s talk (link to YouTube).

 

Summary

Whilst it is a shame not to meet people in person this conference has shown me how online conferences can be a viable way forward. I’m very much looking forward to the next one.

 

See all of the pre-recorded talks on the EWA conference Youtube Channel. You can find the Engaging with Web Archives on Twitter and catch up on the conference discussion with the hashtag #EWAvirtual

 

Look out for more in-depth blog posts from EWA conference speakers over the coming weeks on the UK Web Archive blog.

 

30 September 2020

National Sporting Heritage Day 2020

By Helena Byrne, Curator of Web Archives at the British Library

women playing soccer with a linesman in the foreground
Women playing soccer

 

The 30th September is National Sporting Heritage Day in the UK and to celebrate we will give you a quick overview of the UK Web Archive (UKWA) sporting activities in 2020. UKWA is made up of the six UK Legal Deposit Libraries, these are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.  

Sport is a subject that shapes and reflects society. As more publications about sport move to online only, preserving this cultural record through web archiving becomes paramount. To mark the occasion back in 2018 we published a blog post outlining the UKWA sports collection policies. 

We have three collections that focus on sport that are actively curated throughout the year:

  1. Sports Collection
  2. Sport: Football 
  3. Sports: International Events

 

International Internet Preservation Consortium (IIPC)

As individual institutions the British Library and the National Library of Scotland are members of the International Internet Preservation Consortium (IIPC) and worked on building collaborative collections covering international events such as the Summer and Winter Olympic/Paralympic Games. 2020 marks ten years of building IIPC Olympic/Paralympic web archive collections.  Since the formation of the IIPC Content Development Group (CDG) in 2015, there has been a consolidated effort to build collections both on and off the playing field. All of the IIPC collections are open access. The CDG planned to build a collection on the Tokyo 2020 Games. However, due to the coronavirus pandemic the Games were rescheduled for 2021 and so was CDG dedicated collection. However, some content around the 2020 event was included in the Novel Coronavirus (COVID-19) collection and there will be updates made to the National Olympic and Paralympic Committees collection this year.  

 

Documenting the Olympics and Paralympics

Even though Tokyo 2020 was postponed until 2021, the symposium Documenting the Olympics & Paralympics, which was supposed to be a full day face-to-face event, went online. This was a collaboration between the web archive team based at the British Library, the International Centre for Sports History and Culture (ICSHC) at De Montfort University, and the British Society of Sports History (BSSH).

A broad mix of physical, digitised and born digital resources were covered in the presentations. You can listen back to an audio recording of this symposium on the Sport in History Podcast. The full abstracts and some of the PowerPoint slides are available on the British Library Research Repository.

 

Engaging with Web Archives Conference

The Engaging with Web Archives conference brought together practitioners and web archive researchers from around the world. There were three presentations on the programme that focused on UK Web Archive sports collections. 

  1. Robert McNicol (Librarian, Kenneth Ritchie Wimbledon Library) discussed the collaboration on developing the Tennis section of the UK Web Archive Sports Collection. 
  2. Helena Byrne (Curator of Web Archives, British Library) looked at tracing the popularity of annoying football phrases on the archived .uk web space from 1996-2013. 
  3. Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major (PhD students, School of Advanced Study, University of London) used the London 2012 and Rio 2016 Olympic Games as a case study to analyse media events through the UK Web Archive. 

A series of blog posts about the Engaging with Web Archives conference will be coming out in the next few weeks on the UK Web Archive blog.

 

Accessing the UK Web Archive

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries Reading Rooms, if the website owner has given permission. 

 

Some of the websites  in UKWA that have already had permission granted, include Heritage Quay, Pride Sports UK and WheelPower. Some examples of websites that are onsite-only access include the Fans Supporting Food Banks, Barnsley Yorkshire: Tour de France and The Women's Open.

 

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

 

Get involved with preserving sports online with the UK Web Archive

We can’t curate the whole of the UK web on our own, we need your help to ensure that information, discussion and creative output related to sport are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate 

 

04 August 2020

Twit twoo: International Owl Awareness Day 2020

By Helena Byrne, Curator of Web Archives, The British Library
 
 
 
An illustration of four owls perched on a branch with the moonlight behind them
British Library digitised image from page 271 of "Madeline Power [A novel] https://www.flickr.com/photos/britishlibrary/11121066504

 

The 4th of August is International Owl Awareness Day. This is the perfect time to reflect on owl related content in the UK Web Archive. 

There are five native species of owls’ resident year-round in the UK, namely the Tawny Owl, Barn Owl, Long-eared Owl, Short-eared Owl and Little Owl. Also, the Snowy Owl is an is an occasional winter visitor to the Outer Hebrides, Shetland and the Cairngorms in Scotland.

Owls online

We were wondering, out of these six owl species, which one is the most popular on the archived .uk domain?

 

UK Owl Species Shine Trends
A graph showing how many mentions the six owl species have on the archived .uk web

 

In order to answer this question, the Shine graph may prove useful. Shine was developed as part of the Big UK Data Arts and Humanities project funded by the AHRC. The data was acquired by JISC from the Internet Archive and includes all .uk websites in the Internet Archive web collection crawled between 1996 and April 2013. The collection comprises over 3.5 billion items (URLs, images and other documents) and has been full-text indexed by the UK Web Archive. Every word of every website in the collection can be searched for and analysed.

The most popular owl species referenced in the Shine dataset is the Barn Owl. Despite the curve in the graph being at its peak in 2011, the most popular year for the Barn Owl was 2012. This is because the graph shows the percentage of resources archived for each year and some years have more resources than others. In 2011 there were 66,034 of 288,809,412 archived resources that mention Barn Owl, while in 2012 there were 94,990 of 463,367,189 resources. These numbers are too big to review manually but by clicking at a single point on the graph, Shine will generate a random sample of up to 100 references to the search term. The sample displays a sentence were the term appears, as well as a link out to the Internet Archive so that you can review the archived website.

 

Get creative with owls at the British Library

Video created by Carlos Lelkes-Rarugal, using Tawny Owl hoots recorded by Richard Margoschis in Gloucestershire, England (BL ref 09647). British Library digitised image from page 272 of "The Works of Alfred Tennyson, etc" 

 

Curious about what some of these owls’ sound like? Our Assistant Web Archivist, Carlos Lelkes-Rarugal, designed some short animated videos using recordings from the British Library Sound Archive and images from the British Library Flickr account. You can view these on the UK Web Archive, Digital Scholarship and the Sound Archive’s Wildlife Department Twitter accounts.

The title for this blog post was inspired by the sound made by the Tawny Owl. This and other sounds can be experienced in the Sound Archive at the British Library which has over 2,500 recordings of owls from all over the world. You can hear a selection of some these recordings on the British Library, Sound & Vision blog.

The Digital Scholarship team have also put together a useful album of digitised illustrations of owls on the British Library Flickr account. Their latest blog post encourages you to use these images for various creative projects.

 

Get involved with preserving owls online with the UK Web Archive

The UK Web Archive aims to archive, preserve and give access to the UK web space. We endeavour to include important aspects of British culture and events that shape society. The biodiversity of the UK is an important aspect of our collective national culture and is represented in several British Library collections including the UK Web Archive.

We can’t however, curate the whole of the UK Web on our own, we need your help to ensure that information, discussion and creative output on this subject are preserved for future generations.

Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

We already have an Online Enthusiast Communities in the UK curated collection that features some owl related websites in the Animal related hobbies subsection. Browse through what we have so far and please nominate more content!

 

23 June 2020

WARCnet and the UK Web Archive

By Jason Webber, Web Archiving Engagement Manager

 

We at the UK Web Archive (UKWA) have recently taken part in a new initiative called WARCnet led by the University of Aarhus in Denmark (and funded by Independent Research Fund Denmark).

“The aim of the WARCnet network is to promote high-quality national and transnational research that will help us to understand the history of (trans)national web domains and of transnational events on the web, drawing on the increasingly important digital cultural heritage held in national web archives.”

 

Warcnetblog-01
WARCnet logo

 

The majority of participants are researchers currently using web archives as part of their studies, many with extensive experience and others new to the field. This makes this an exciting project to be part of as it is an excellent way for content holders such as UKWA to be able to work closely with a group of researchers and try and understand their needs and challenges. The project had a kick-off meeting in May 2020 that was originally intended to be in person but took place virtually. All the speakers pre-recorded their talks which does now mean that these are now all available (including one by myself). I’d particularly recommend viewing the two keynote speakers Matthew S. Weber and Ian Milligan.

 

Warcnetblog-02
Title slide for Jason Webber's WARCnet presentation

 

Working Groups
It is intended for any outcomes from WARCnet to be driven by the participants themselves and to this end four working groups have been formed:

 

  • Working Group 1 - Comparing entire web domains
  • Working Group 2 - Analysing transnational events
  • Working group 3 - Digital research methods and tools
  • Working group 4 - Research data management across borders

 

The UKWA team is involved with each of the first three working groups, all of which have met in the last weeks to see how we can take this project forward. You can read more about each group here.

There are at least three more small conferences planned (currently as in person), one later this year in Luxembourg and two next year in London and Aarhus.

Look out for updates on our involvement with this initiative on this blog and through our twitter account @UKWebArchive and @WARC_net.

09 August 2017

The Proper Serious Work of Preserving Digital Comics

Jen Aggleton is a PhD candidate in Education at the University of Cambridge, and is completing a work placement at the British Library on the subject of digital comics. 

If you are a digital comics creator, publisher, or reader, we would love to hear from you. We’d like to know more about the digital comics that you create, find out links to add to our Web Archive collection, and find examples of comic apps that we could collect. Please email suggestions to [email protected]. For this initial stage of the project, we will be accepting suggestions until the end of August 2017.

I definitely didn’t apply for a three month placement at the British Library just to have an excuse to read comics every day. Having a number of research interests outside of my PhD topic of illustrated novels (including comics and library studies), I am always excited when I find opportunities which allow me to explore these strands a little more. So when I saw that the British Library were looking for PhD placement students to work in the area of 21st century British comics, I jumped at the chance.

Having convinced my supervisor that I wouldn’t just be reading comics all day but would actually be doing proper serious work, I temporarily put aside my PhD and came to London to read lots and lots of digital comics (for the purpose of proper serious work). And that’s when I quickly realised that I was already reading comics every day.

The reason I hadn’t noticed was because I hadn’t specifically picked up a printed comic or gone to a dedicated webcomic site every day (many days, sure, but not every day). I was however reading comics every day on Facebook, slipped in alongside dubiously targeted ads and cat videos. It occurred to me that lots of other people, even those who may not think of themselves as comics readers, were probably doing the same.

Forweb2-slytherinpic
(McGovern, E. My Life As A Background Slytherin, https://www.facebook.com/backgroundslytherin/photos/a.287354904946325.1073741827.287347468280402/338452443169904/?type=3&theater Reproduced with kind permission of Emily McGovern.)

This is because the ways in which we interact with comics have been vastly expanded by digital technology. Comics are now produced and circulated through a number of different platforms, including apps, websites and social media, allowing them to reach further than their traditional audience. These platforms have made digital comics simultaneously both more and less accessible than their print equivalents; many webcomics are available for free online, which means readers no longer have to pay between £8 and £25 for a graphic novel, but does require them to have already paid for a computer/tablet/smartphone and internet connection (or have access to one at their local library, provided their local library wasn’t a victim of austerity measures).

Alongside access to reading comics, access to publishing has also changed. Anyone with access to a computer and internet connection can now publish a comic online. This has opened up comics production to many whose voices may not have often been heard in mainstream print comics, including writers and characters of colour, women, members of the LGBTQ+ community, those with disabilities, and creators who simply cannot give up the stability of full-time employment to commit the time needed to chase their dream of being a comics creator. The result is a vibrant array of digital comics, enormously varying in form and having a significant social and cultural impact.

But digital comics are also far more fragile than their print companions, and this is where the proper serious work part of my placement comes in. Comics apps are frequently removed from app stores as new platform updates come in. Digital files become corrupted, or become obsolete as the technology used to host them is updated and replaced. Websites are taken down, leaving no trace (all those dire warnings that the internet is forever are not exactly true. For more details about the need for digital preservation, see an earlier post to this blog). So in order to make sure that all the fantastic work happening in digital comics now is still available for future generations (which in British Library terms could mean ten years down the line, or five hundred years down the line), we need to find ways to preserve what is being created.

One method of doing this is to establish a dedicated webcomics archive. The British Library already has a UK Web Archive, due to the extension of legal deposit in 2013 to include the collection of non-print items. I am currently working on setting up a special collection of UK webcomics within that archive. This has involved writing collections guidelines covering what will (and won’t) be included in the collection, which had me wrestling with the thorny problem of what exactly a digital comic is (comics scholars will know that nobody can agree on what a print comic is, so you can imagine the fun involved in trying to incorporate digital elements such as audio and video into the mix as well). It has also involved building the collection through web harvesting, tracking down webcomics for inclusion in the collection, and providing metadata (information about the collection item) for cataloguing purposes (this last task may happen to require reading lots of comics).

Alongside this, I am looking into ways that digital comics apps might be preserved, which is very proper serious work indeed. Not only are there many different versions of the same app, depending on what operating system you are using, but many apps are reliant not only on the software of the platform they are running on, but sometimes the hardware as well, with some apps integrating functions such as the camera of a tablet into their design. Simply downloading apps will provide you with lots of digital files that you won’t be able to open in a few years’ time (or possibly even a few months’ time, with the current pace of technology). This is not a problem that can be solved in the duration of a three month placement (or, frankly, given my total lack of technical knowledge, by me at all). What I can do, however, is find people who do have technical knowledge and ask them what they think. Preserving digital comics is a complicated and ongoing process, and it is a great experience to be in at the early stages of exploration.

And you can be involved in this fun experience too! If you are a digital comics creator, publisher, or reader, we would love to hear from you. We’d like to know more about the digital comics that you create, find out links to add to our Web Archive collection, and find examples of comic apps that we could collect. Please email suggestions to [email protected]. For this initial stage of the project, we will be accepting suggestions until the end of August 2017. In that time, we are particularly keen to receive web addresses for UK published webcomics, so that I can continue to build the web archive, and do the proper serious work of reading lots and lots of comics.

UK Web Archive blog recent posts

Archives

Tags

Other British Library blogs