THE BRITISH LIBRARY

UK Web Archive blog

7 posts categorized "Research collaboration"

19 October 2020

Exploring media events with Shine

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

Computer screen with some HTML code on the screen

This blog post summarises the presentation I delivered with my colleague Daniela Major at the conference Engaging with Web Archives: ‘Opportunities, Challenges and Potentialities’ in September 2020. The presentation was entitled ‘Tracking and analysing media events through web archives’.

My research explores the media coverage of the Olympic Games in a cross-cultural, cross-lingual and temporal perspective. I am especially interested in comparing how the concept of 'Olympic legacy' has been approached by the Brazilian and British media considering different locations, languages and social-political contexts. I have written a bit about this before on the UK Web Archive blog in December 2019 and March 2020.

Because of its controversial nature, the term Olympic legacy is used in a variety of contexts and has multiple meanings. Given its narrative importance in legitimising the multi-billion investment that cities make to host these events, the main objective of this study is to explore and define the concept of Olympic legacy and how it changes over time.

Here, however, I will focus on my experience of a secondment at the British Library with the UK Web Archive team, where I explored the potential of using the platform Shine to track news articles on Olympic legacy.

Why Shine?

Shine is a tool to explore .uk websites archived by the Internet Archive between 1996 and April 2013. While a big part of the content of the UK Web Archive can only be accessed from inside the British Library, Shine is open access and provides us with search results and URL data that can be easier to manage.

We have developed a pipeline based on five steps: searching, extraction, cleaning, filtering and visualisation. To extract information, we scraped the data using Python notebooks, looking at specific newspapers (like The Guardian) and broadcasters' websites (like the BBC) and using the keyword “Olympic legacy”.

Having searched for URLs in Shine and extracted the results, the main challenge is cleaning. Cleaning consists of removing all the information around the main text, such as images, adverts, menus and links. After extracting just the body text of the articles, we saw that many of them did not mention Olympic legacy, because Shine often returns results where the search terms appear only in peripheral locations of the webpage.

With the documents we needed in hand, we had to verify whether their content was relevant to our analysis. Sometimes the term Olympic legacy appears but is not related to the Rio and London Olympics, or it is not the main topic of the article. Filtering therefore demanded a huge effort of close reading to identify contexts.

Finally, we produced some charts to visualise word trends and topics that emerge around legacy. Although the Shine search results are limited in time (coverage stops in 2013), Shine has been very useful as an exploratory tool for conducting small-scale preliminary analysis, and for building web archive and web scraping methods before applying them to much larger amounts of text elsewhere.
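As an illustration only, the cleaning step and a first keyword check could be sketched in Python along the following lines. This is not the notebook code used in the project: the list of tags treated as peripheral, the function names and the two sample pages are all assumptions made for the example, and real news pages would need site-specific rules.

```python
from html.parser import HTMLParser

# Tags whose content is treated as peripheral (menus, adverts, scripts).
# This list is an assumption for illustration, not the project's rule set.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class BodyTextExtractor(HTMLParser):
    """Collect text found inside <p> elements, ignoring peripheral containers."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # greater than 0 while inside a peripheral container
        self.in_paragraph = False  # True while inside a <p> outside those containers
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.chunks.append(data)


def extract_body_text(html):
    """Cleaning: return the paragraph text with whitespace normalised."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def mentions_keyword(html, keyword="olympic legacy"):
    """Keep a page only if the keyword occurs in the cleaned body text."""
    return keyword.lower() in extract_body_text(html).lower()


# Two made-up pages: in the first the keyword appears only in a menu link.
pages = {
    "article-a": "<html><body><nav><a>Olympic legacy</a></nav>"
                 "<p>Transport news for today.</p></body></html>",
    "article-b": "<html><body><p>The Olympic legacy of London 2012 is debated.</p>"
                 "<aside>Adverts</aside></body></html>",
}
relevant = [name for name, html in pages.items() if mentions_keyword(html)]
print(relevant)
```

A check like this only removes pages where the term sits in the periphery; deciding whether an article is really about the legacy of Rio or London still requires the close reading described above.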

You can watch Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major’s presentation on the EWA YouTube Channel.

*This project has received funding from the European Union’s Horizon 2020 research and innovation programme. For more information: cleopatra-project.eu.

 

14 October 2020

Engaging with Web Archives - Conference Report

By Jason Webber, Web Archive Engagement Manager, The British Library

 

Engaging with Web Archives conference banner

 

Is it possible to have a successful conference when you can no longer meet in person? Going exclusively online doesn’t seem to have stopped the ‘Engaging with Web Archives’ (EWA) Conference from being a superb experience. The event was co-chaired by Sharon Healy and Michael Kurzmeier, PhD students at Maynooth University.

Originally planned as a more traditional, in-person conference in April 2020, the event was re-planned by the EWA team as a completely online conference on 21 and 22 September 2020. It is notable that this was the first web archiving conference in Ireland. Most talks were pre-recorded, which meant that questions could be posed in the chat box and were often answered live by the presenter during the talk. This can be a significant advantage of pre-recorded talks.

The programme was packed with high quality presentations from many areas of web archiving but here I’ll highlight a few that were UK Web Archive (UKWA) projects or used UKWA data. 

 

Highlights

 

Professor Jane Winters (School of Advanced Study, University of London) delivered a keynote talk: Web archives as sites of collaboration. Jane has worked with the UK Web Archive extensively over many years and is one of only a few Professors in the UK training and promoting web archives to students. Jane's talk (link to YouTube).

 

Sara Day Thomson (University of Edinburgh): Developing a Web Archiving Strategy for the Covid-19 Collecting Initiative at the University of Edinburgh. Sara formerly worked for the Digital Preservation Coalition (DPC), led a ‘Web Archiving Task Force’ and more recently has been building important collections on Covid-19 with the University of Edinburgh in partnership with UKWA. Sara's talk (link to YouTube).

 

Dr. Brendan Power (The Library of Trinity College Dublin): Leveraging the UK Web Archive in an Irish context: Challenges and Opportunities. As Trinity College Dublin is a UK Legal Deposit Library, we try to work together as much as possible, and this talk highlights what is possible, with specific mention of the Easter Rising collection. Brendan's talk (link to YouTube).

 

Robert McNicol (Kenneth Ritchie Wimbledon Library): The UK Web Archive and Wimbledon: A Winning Combination. We try to represent as many aspects of UK life as possible including sport. This also highlights our cooperation with other libraries and archives. See the Tennis collection. Robert's talk (link to YouTube).

 

Dr. Peter Webster (Independent Scholar, Historian and Consultant): Digital archaeology in the web of links: reconstructing a late-90s web sphere. Peter has conducted several pieces of research utilising the UKWA secondary datasets. These are free and available for download. Peter's talk (link to YouTube).

 

Helena Byrne (Curator of Web Archiving, British Library): From the sidelines to the archived web: What are the most annoying football phrases in the UK? Helena is a curator in the UK Web Archive but also has a keen interest in sport and women’s football in particular. Here, Helena shows how the Trends feature (graphs) in our SHINE service can help guide research in an easy and accessible way. Helena's talk (link to YouTube).

 

Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major (School of Advanced Study, University of London): Tracking and Analysing Media Events through Web Archives. Caio was a PhD placement student with UKWA as part of the Cleopatra project. Read about some of his work on Olympic legacy on this blog. Caio and Daniela's talk (link to YouTube).

 

Hannah Connell (King’s College London; British Library): Curating culturally themed collections online: The Russia in the UK Special Collection, UK Web Archive. Hannah has worked extensively on one of the several diaspora community collections. In addition to Russia in the UK, there are London French and Latin America UK. Hannah's talk (link to YouTube).

 

Dr. Jessica Ogden (University of Southampton) & Emily Maemura (University of Toronto): A tale of two web archives: Challenges of engaging web archival infrastructures for research. Jessica has also worked previously with UKWA as a PhD placement student on the challenges researchers face when using web archives. This vital work helps guide our planning for the future. Jessica and Emily's talk (link to YouTube).

 

Dr. Olga Holownia (International Internet Preservation Consortium): IIPC: training, research, and outreach activities. Olga works full time for the IIPC but has been based within the UK Web Archive team at the British Library. We have been delighted to have worked with and been supported by the IIPC since it began (The British Library is a founding member).

 

Rosita Murchan (Public Record Office of Northern Ireland): PRONI Web Archive: A Collaborative Approach. PRONI maintains their own web archive but also collaborates with the UK Web Archive in collecting material specific to Northern Ireland. This is important as there currently is no Legal Deposit partner in Northern Ireland. Rosita’s talk (link to YouTube).

 

Summary

Whilst it is a shame not to meet people in person, this conference has shown me that online conferences can be a viable way forward. I’m very much looking forward to the next one.

 

See all of the pre-recorded talks on the EWA conference YouTube channel. You can find Engaging with Web Archives on Twitter and catch up on the conference discussion with the hashtag #EWAvirtual.

 

Look out for more in-depth blog posts from EWA conference speakers over the coming weeks on the UK Web Archive blog.

 

30 September 2020

National Sporting Heritage Day 2020

By Helena Byrne, Curator of Web Archives at the British Library

women playing soccer with a linesman in the foreground
Women playing soccer

 

The 30th September is National Sporting Heritage Day in the UK and, to celebrate, we will give you a quick overview of the UK Web Archive (UKWA) sporting activities in 2020. UKWA is made up of the six UK Legal Deposit Libraries: the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

Sport is a subject that shapes and reflects society. As more publications about sport move to online only, preserving this cultural record through web archiving becomes paramount. To mark the occasion back in 2018 we published a blog post outlining the UKWA sports collection policies. 

We have three collections that focus on sport that are actively curated throughout the year:

  1. Sports Collection
  2. Sport: Football 
  3. Sports: International Events

 

International Internet Preservation Consortium (IIPC)

As individual institutions, the British Library and the National Library of Scotland are members of the International Internet Preservation Consortium (IIPC) and have worked on building collaborative collections covering international events such as the Summer and Winter Olympic/Paralympic Games. 2020 marks ten years of building IIPC Olympic/Paralympic web archive collections. Since the formation of the IIPC Content Development Group (CDG) in 2015, there has been a consolidated effort to build collections both on and off the playing field. All of the IIPC collections are open access. The CDG planned to build a collection on the Tokyo 2020 Games. However, due to the coronavirus pandemic, the Games were rescheduled for 2021 and so was the CDG's dedicated collection. Some content around the 2020 event was nevertheless included in the Novel Coronavirus (COVID-19) collection, and there will be updates made to the National Olympic and Paralympic Committees collection this year.

 

Documenting the Olympics and Paralympics

Even though Tokyo 2020 was postponed until 2021, the symposium Documenting the Olympics & Paralympics, which was supposed to be a full day face-to-face event, went online. This was a collaboration between the web archive team based at the British Library, the International Centre for Sports History and Culture (ICSHC) at De Montfort University, and the British Society of Sports History (BSSH).

A broad mix of physical, digitised and born digital resources were covered in the presentations. You can listen back to an audio recording of this symposium on the Sport in History Podcast. The full abstracts and some of the PowerPoint slides are available on the British Library Research Repository.

 

Engaging with Web Archives Conference

The Engaging with Web Archives conference brought together practitioners and web archive researchers from around the world. There were three presentations on the programme that focused on UK Web Archive sports collections. 

  1. Robert McNicol (Librarian, Kenneth Ritchie Wimbledon Library) discussed the collaboration on developing the Tennis section of the UK Web Archive Sports Collection. 
  2. Helena Byrne (Curator of Web Archives, British Library) looked at tracing the popularity of annoying football phrases on the archived .uk web space from 1996-2013. 
  3. Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major (PhD students, School of Advanced Study, University of London) used the London 2012 and Rio 2016 Olympic Games as a case study to analyse media events through the UK Web Archive. 

A series of blog posts about the Engaging with Web Archives conference will be coming out in the next few weeks on the UK Web Archive blog.

 

Accessing the UK Web Archive

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission.

 

Some of the websites in UKWA for which permission has already been granted include Heritage Quay, Pride Sports UK and WheelPower. Some examples of websites with onsite-only access include Fans Supporting Food Banks, Barnsley Yorkshire: Tour de France and The Women's Open.

 

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

 

Get involved with preserving sports online with the UK Web Archive

We can’t curate the whole of the UK web on our own; we need your help to ensure that information, discussion and creative output related to sport are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

 

17 September 2020

Arnhem75 - a special collection of websites added to the UK Web Archive

 

By Marja Kingma, Curator of Germanic Collections, the British Library.

 

Arnhem75 blog image
Book cover of 75 Years Battle of Arnhem by Laurens van Aggelen

 

Introduction

The idea to create a collection of websites about the commemoration of Arnhem75 came to RAF Museum historian Harry Raffal and me whilst attending the seminar ‘The Arnhem Spirit - 75 years of Brits in Arnhem’, on 15 May 2019, organised by the Dutch Embassy in London. The event was part of a programme in which the Netherlands, Britain and other former Allied countries commemorated Operation Market Garden, the code name for the battle for the bridge across the Rhine at Arnhem that took place in September 1944. Allied forces consisted of British, American and Polish troops, with help from Dutch resistance.

The Battle of Arnhem 1944 is of great significance to the UK and interest in it remains strong on both sides of the North Sea.

We wanted to create a lasting memory of these events and a special collection in the UK Web Archive on the subject seemed like a good idea.

 

What is included?

We kept the scope of the project quite narrow; only websites with a focus on the commemorations that took place in Britain and the Netherlands in 2019 are included, with the exception of some websites that deal with the historic facts regarding the Battle to give it some context.

So far over 150 individual websites within the UK web domain have been identified, of which 64 were selected to go into the collection. These sites are limited to the UK web domain: they either have .uk in their domain name or, if they don’t, must be hosted in the UK or owned by UK organisations or individuals with a postal address in the UK.
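The scope rule above can be written down as a simple check. The sketch below is only an illustration of the criteria as described in this post, not the UK Web Archive's actual tooling; the function names, the two boolean flags and the example addresses are assumptions, and in practice the hosting location and ownership would have to be established by a curator.

```python
from urllib.parse import urlparse


def in_uk_domain(url):
    """True if the host name of the URL ends in .uk."""
    host = urlparse(url).hostname or ""
    return host.endswith(".uk")


def in_scope(url, hosted_in_uk=False, uk_owner=False):
    """A site qualifies through its .uk domain, UK hosting, or a UK-based owner.

    hosted_in_uk and uk_owner are hypothetical flags that a curator would
    set after checking the site by hand.
    """
    return in_uk_domain(url) or hosted_in_uk or uk_owner


# Hypothetical example addresses
print(in_scope("https://www.example.co.uk/arnhem75"))
print(in_scope("https://example.nl/herdenking"))
print(in_scope("https://example.com/veterans", uk_owner=True))
```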

Some of the websites selected for this collection include the 23 Parachute Field Ambulance, Airborne at the Bridge and Arnhem Oosterbeek War Cemetery.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but are only able to make them available to people outside the UK Legal Deposit Libraries reading rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

For this collection you can view what has been selected through the UK Web Archive website but will need to visit a UK Legal Deposit Library reading room to view the archived content. The reading rooms across the Legal Deposit Libraries are starting to reopen now, with some restrictions, as you can read in this blog: https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html

 

How Can I Get Involved?

You can help expand this collection by sending us a URL you think may be eligible for inclusion in the collection Arnhem75. Please go to https://www.webarchive.org.uk/en/ukwa/info/nominate to nominate a website and we’ll take it from there.

Occasionally websites from non-UK domains can be included, if they have a strong link to the UK and the website owners have given their permission to be included in the collection. Dutch organisations that were involved in the Arnhem75 commemorations are encouraged to get in touch.

We look forward to your suggestions!

 

10 August 2020

Going for gold: exploring Olympic & Paralympic resources

By Helena Byrne, Curator of Web Archives, The British Library

 

BL Olympics website 2012
Screenshot of the British Library website related to social science research and the Olympics/Paralympics during London 2012 https://www.webarchive.org.uk/wayback/en/archive/20120724080955/http://www.bl.uk/sportandsociety/index.html

 

Originally, Sunday 9th August 2020 would have been the closing ceremony of the Tokyo 2020 Summer Olympics and we would have been waiting for the start of the Paralympics. However, due to the coronavirus pandemic, most events big and small were cancelled, went online or were postponed until 2021. Even though Tokyo 2020 was postponed, the symposium Documenting the Olympics & Paralympics, which was supposed to be a full day face-to-face event, went online as a much shorter panel session, held via Zoom on 19th June 2020.

This was a collaboration between the British Library, the International Centre for Sports History and Culture (ICSHC) at De Montfort University, and the British Society of Sports History (BSSH).

The event was organised not only because 2020 was supposed to be an Olympic and Paralympic year, but also because the UK Web Archive team at the British Library were celebrating two significant anniversaries. It is 15 years since the UK Web Archive was founded. It is also 10 years since the International Internet Preservation Consortium (IIPC) started Olympic and Paralympic collaborative web archive collections.

 

Presentations:

Laura Alexandra Brown, Northumbria University - The heritage of the Games: Interpreting urban change in Olympic host cities

Heather Dichter, De Montfort University - Finding Olympic history in non-sport archives

Robert McNicol, Librarian, Wimbledon Lawn Tennis Museum - Researching the Olympics/Paralympics at Wimbledon

Helena Byrne, Curator of Web Archives, British Library - Preserving the Olympics/Paralympics online

 

Summary:

A broad mix of physical, digitised and born digital resources were covered in the presentations. You can listen back to an audio recording of this symposium on the Sport in History Podcast. The full abstracts and some of the PowerPoint slides are available on the British Library Research Repository. The official hashtag for the event on Twitter was #ResearchingTheGames, where you can catch up with the online discussions.

Laura Alexandra Brown from Northumbria University discussed her experience of using archives in her research, which primarily relates to architectural design and reuse from the perspective of the Olympic Games.

Heather Dichter from De Montfort University discussed her experience of using non-sporting archives to research international sport and diplomacy. The aim of this presentation was to highlight to researchers that valuable resources can also be found in non-sporting archives, and to archivists how they can help researchers.

Robert McNicol, the Librarian at Wimbledon Lawn Tennis Museum, reviewed the history of Wimbledon and the Olympics and discussed the museum's collection policy around past and future Olympic and Paralympic Games.

Helena Byrne, the Curator of Web Archives at the British Library, discussed the UK Web Archive collections related to the Olympics/Paralympics and their general sports collection policy, along with the ongoing collaboration with the International Internet Preservation Consortium (IIPC).

 

Next event:

We are still planning to hold a face-to-face event at the British Library in July 2021. This will be a full day symposium with a social event planned after the presentations. This event is sponsored by the British Library, ICSHC at De Montfort University, BSSH and the School of Advanced Study.

We will closely monitor the guidance on coronavirus and social gatherings. Nevertheless, we are hopeful that by next summer planned events can go ahead.

For more details follow the BSSH website, social media, the International Centre for Sports History and Culture (ICSHC) Twitter, the UK Web Archive Twitter as well as the #ResearchingTheGames hashtag on Twitter. Joining details will be posted online in spring 2021.

 

23 June 2020

WARCnet and the UK Web Archive

By Jason Webber, Web Archiving Engagement Manager

 

We at the UK Web Archive (UKWA) have recently taken part in a new initiative called WARCnet led by the University of Aarhus in Denmark (and funded by Independent Research Fund Denmark).

“The aim of the WARCnet network is to promote high-quality national and transnational research that will help us to understand the history of (trans)national web domains and of transnational events on the web, drawing on the increasingly important digital cultural heritage held in national web archives.”

 

Warcnetblog-01
WARCnet logo

 

The majority of participants are researchers currently using web archives as part of their studies, many with extensive experience and others new to the field. This makes it an exciting project to be part of, as it is an excellent way for content holders such as UKWA to work closely with a group of researchers and try to understand their needs and challenges. The project had a kick-off meeting in May 2020 that was originally intended to be in person but took place virtually. All the speakers pre-recorded their talks, which means that these are now all available (including one by myself). I’d particularly recommend viewing the two keynote speakers, Matthew S. Weber and Ian Milligan.

 

Warcnetblog-02
Title slide for Jason Webber's WARCnet presentation

 

Working Groups
It is intended for any outcomes from WARCnet to be driven by the participants themselves and to this end four working groups have been formed:

 

  • Working Group 1 - Comparing entire web domains
  • Working Group 2 - Analysing transnational events
  • Working Group 3 - Digital research methods and tools
  • Working Group 4 - Research data management across borders

 

The UKWA team is involved with each of the first three working groups, all of which have met in the last weeks to see how we can take this project forward. You can read more about each group here.

There are at least three more small conferences planned (currently as in person), one later this year in Luxembourg and two next year in London and Aarhus.

Look out for updates on our involvement with this initiative on this blog and through the Twitter accounts @UKWebArchive and @WARC_net.

09 August 2017

The Proper Serious Work of Preserving Digital Comics

Jen Aggleton is a PhD candidate in Education at the University of Cambridge, and is completing a work placement at the British Library on the subject of digital comics. 

If you are a digital comics creator, publisher, or reader, we would love to hear from you. We’d like to know more about the digital comics that you create, find out links to add to our Web Archive collection, and find examples of comic apps that we could collect. Please email suggestions to Jennifer.Aggleton@BL.uk. For this initial stage of the project, we will be accepting suggestions until the end of August 2017.

I definitely didn’t apply for a three month placement at the British Library just to have an excuse to read comics every day. Having a number of research interests outside of my PhD topic of illustrated novels (including comics and library studies), I am always excited when I find opportunities which allow me to explore these strands a little more. So when I saw that the British Library were looking for PhD placement students to work in the area of 21st century British comics, I jumped at the chance.

Having convinced my supervisor that I wouldn’t just be reading comics all day but would actually be doing proper serious work, I temporarily put aside my PhD and came to London to read lots and lots of digital comics (for the purpose of proper serious work). And that’s when I quickly realised that I was already reading comics every day.

The reason I hadn’t noticed was that I hadn’t specifically picked up a printed comic or gone to a dedicated webcomic site every day (many days, sure, but not every day). I was however reading comics every day on Facebook, slipped in alongside dubiously targeted ads and cat videos. It occurred to me that lots of other people, even those who may not think of themselves as comics readers, were probably doing the same.

Forweb2-slytherinpic
(McGovern, E. My Life As A Background Slytherin, https://www.facebook.com/backgroundslytherin/photos/a.287354904946325.1073741827.287347468280402/338452443169904/?type=3&theater Reproduced with kind permission of Emily McGovern.)

This is because the ways in which we interact with comics have been vastly expanded by digital technology. Comics are now produced and circulated through a number of different platforms, including apps, websites and social media, allowing them to reach further than their traditional audience. These platforms have made digital comics simultaneously both more and less accessible than their print equivalents; many webcomics are available for free online, which means readers no longer have to pay between £8 and £25 for a graphic novel, but does require them to have already paid for a computer/tablet/smartphone and internet connection (or have access to one at their local library, provided their local library wasn’t a victim of austerity measures).

Alongside access to reading comics, access to publishing has also changed. Anyone with access to a computer and internet connection can now publish a comic online. This has opened up comics production to many whose voices may not have often been heard in mainstream print comics, including writers and characters of colour, women, members of the LGBTQ+ community, those with disabilities, and creators who simply cannot give up the stability of full-time employment to commit the time needed to chase their dream of being a comics creator. The result is a vibrant array of digital comics, enormously varying in form and having a significant social and cultural impact.

But digital comics are also far more fragile than their print companions, and this is where the proper serious work part of my placement comes in. Comics apps are frequently removed from app stores as new platform updates come in. Digital files become corrupted, or become obsolete as the technology used to host them is updated and replaced. Websites are taken down, leaving no trace (all those dire warnings that the internet is forever are not exactly true. For more details about the need for digital preservation, see an earlier post to this blog). So in order to make sure that all the fantastic work happening in digital comics now is still available for future generations (which in British Library terms could mean ten years down the line, or five hundred years down the line), we need to find ways to preserve what is being created.

One method of doing this is to establish a dedicated webcomics archive. The British Library already has a UK Web Archive, due to the extension of legal deposit in 2013 to include the collection of non-print items. I am currently working on setting up a special collection of UK webcomics within that archive. This has involved writing collections guidelines covering what will (and won’t) be included in the collection, which had me wrestling with the thorny problem of what exactly a digital comic is (comics scholars will know that nobody can agree on what a print comic is, so you can imagine the fun involved in trying to incorporate digital elements such as audio and video into the mix as well). It has also involved building the collection through web harvesting, tracking down webcomics for inclusion in the collection, and providing metadata (information about the collection item) for cataloguing purposes (this last task may happen to require reading lots of comics).

Alongside this, I am looking into ways that digital comics apps might be preserved, which is very proper serious work indeed. Not only are there many different versions of the same app, depending on what operating system you are using, but many apps are reliant not only on the software of the platform they are running on, but sometimes the hardware as well, with some apps integrating functions such as the camera of a tablet into their design. Simply downloading apps will provide you with lots of digital files that you won’t be able to open in a few years’ time (or possibly even a few months’ time, with the current pace of technology). This is not a problem that can be solved in the duration of a three month placement (or, frankly, given my total lack of technical knowledge, by me at all). What I can do, however, is find people who do have technical knowledge and ask them what they think. Preserving digital comics is a complicated and ongoing process, and it is a great experience to be in at the early stages of exploration.

And you can be involved in this fun experience too! If you are a digital comics creator, publisher, or reader, we would love to hear from you. We’d like to know more about the digital comics that you create, find out links to add to our Web Archive collection, and find examples of comic apps that we could collect. Please email suggestions to Jennifer.Aggleton@BL.uk. For this initial stage of the project, we will be accepting suggestions until the end of August 2017. In that time, we are particularly keen to receive web addresses for UK published webcomics, so that I can continue to build the web archive, and do the proper serious work of reading lots and lots of comics.