UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

17 March 2021

Shakespeare in the UK Web

By Jason Webber, Web Archive Engagement Manager, The British Library

It's Shakespeare week (15-21 March). William Shakespeare is, almost certainly, the most quoted literary figure (in English) and the popularity of his plays and poems endures into the digital age. His work is continually being taught, examined, analysed and, most of all, quoted on the internet. He is often quoted in unlikely places: 'Now is the winter of our discontent', for example, appears on the Butterfly Conservation website.

Shakespeare-butterfly

Most Popular?
What are the most popular Shakespeare quotes? Perhaps unsurprisingly, "To be or not to be" has far and away the most mentions in our SHINE service - all .uk websites collected 1996-2012 (JISC dataset obtained from the Internet Archive):

Shakespeare quotes 01

Shakespeare quotes from SHINE

If we take away "to be or not to be" this graph looks even more interesting:

Shakespeare quotes 02

Shakespeare quotes from SHINE

Want to try your own Shakespeare quotes in our SHINE service?

  1. Go to the trends page of SHINE: www.webarchive.org.uk/shine/graph
  2. Add a word or phrase into the input box, NOTE: phrases should go in quotes e.g. "all that glisters"
  3. To compare multiple words or phrases, separate by a comma e.g. "william shakespeare", "christopher marlowe", "ben johnson"
  4. Click on any point in the graph to see examples of the context the word or phrase was used
  5. Enjoy!
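
If you are comparing a long list of quotes, the formatting rules in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the function name `build_shine_query` is made up for this example, and the output is simply text to paste into the input box on the trends page, not a call to any SHINE interface.

```python
def build_shine_query(terms):
    """Format search terms the way the steps above describe (hypothetical helper)."""
    formatted = []
    for term in terms:
        # Drop stray spaces and any quotes the user already typed.
        term = term.strip().strip('"')
        # Phrases of more than one word go in double quotes; single words do not.
        formatted.append(f'"{term}"' if len(term.split()) > 1 else term)
    # Multiple words or phrases are separated by a comma.
    return ", ".join(formatted)


print(build_shine_query(["all that glisters", "william shakespeare", "hamlet"]))
```

The printed line can then be pasted into the input box as it is.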

Do let us know your own favourite quotes on Twitter: @UKWebArchive

12 March 2021

University of Edinburgh’s Collecting Covid-19 Initiative: Collaborative Collection Building with the UKWA

By Sara Day Thomson (Digital Archivist), Lorraine McLoughlin (Appraisal Archivist), and Aline Brodin (Cataloguing Archivist), University of Edinburgh  

With thanks to Eilidh MacGlone, Web Archivist, National Library of Scotland and UK Web Archive 

Introduction
The University of Edinburgh’s Centre for Research Collections (CRC) – which includes collections housed in libraries, archives, galleries, and museums – launched the Collecting Covid-19 Initiative in late April 2020. The Initiative invites staff, students, and anyone affiliated with the University to donate any materials that document their experiences of the pandemic and lockdown. Websites, photographs, videos, artwork, and all other materials are welcomed. In preserving a range of materials and formats for the long term, the CRC aims to prevent gaps in memory and to preserve a record of the University’s response to the pandemic.  

CRC Montage

As submissions began to come in via our online form, it became evident that online communications and platforms have played a critical role in how the University community has responded to the pandemic and lockdown. Even some submissions in more ‘traditional’ formats, like images or narratives, have been published online and submitted as a URL. In addition, web-based submissions range from ‘flat’ websites to social media posts to content shared on third party platforms. However, with no web archiving programme in place, the collecting team reached out to the UK Web Archive via the National Library of Scotland (NLS) for support in collecting these valuable records of life during the pandemic.

Covid-19 CallOut

In this post, we discuss this collaboration and how Covid-19-related web resources are integrated into the wider collection at the University. We also discuss how the Initiative aligns with existing collecting policies but also provides us with an opportunity to establish approaches for more active collecting. These new approaches are not temporary but will provide lasting innovations that will support more responsive (and therefore representative) collecting of the University’s diverse communities and activities beyond the pandemic.

Selection of Web Resources: Donations and Active Collecting
The team of archivists looking after the Initiative has taken a two-fold approach for considering what to include. Primarily, a range of web-based works have been submitted by members of the community, including student publications and tweets. In addition to these submissions, the team has been actively identifying relevant web resources, such as official University communications and research activities, to capture a meaningful sample. Identified materials include: 

  • University communications such as emails to staff and students, news feeds, and information webpages 
  • Remote learning resources and websites for projects and initiatives created by staff members and research centres  
  • Resources created by and for the University's students and alumni such as networking groups on social media, blogs, and webpages offering advice and guidance 

Edinburgh Uni C19 response

This approach to actively selecting contemporary content for the Archive is relatively unusual (though not unprecedented). Typically, the archivist intervenes at the ‘end of life’ of a collection. The traditional archival process of collecting materials at the end of a project, or even at the end of a researcher’s career - involving multiple conversations and usually in-person donation - does not support active, contemporaneous collecting. 

Websites can change rapidly or disappear altogether. The files or links embedded in websites may break or move location within months (or sooner!). Therefore, archivists don’t have time to wait for web resources to amass over time and don’t have a crystal ball to predict what content will grow into cohesive collections. Web archiving provides a method for capturing contemporary, born digital resources like those surrounding the pandemic in a rapid, proactive way. 

Collaborative Processes for Collection-Building  
Working with the UKWA has allowed us to get started with capturing these web resources through access to their technical infrastructure and, very importantly, their valuable expertise. The UKWA uses a tool with a web interface for selecting and managing web resources – Annotation and Curation Tool – which has made collaborative collection-building much easier. The tool is well-documented (so great for newbies!) and staff possess wide knowledge of methods for capturing and curating web resources. It’s not a surprise that the UKWA has a well-established history of collaborating with external specialists to build topical collections around different subjects. This experience has made it relatively straightforward for us to develop a set of procedures.

W3ACT

Capturing and Contextualising Web Resources 
With the help of Eilidh MacGlone, the Web Archivist at NLS, we have begun to add relevant web resources (either submitted or actively selected) using ACT. We assign these captured resources to a dedicated collection: Collecting Covid-19 Initiative of the University of Edinburgh. This University of Edinburgh collection sits alongside other collections within UKWA related to the coronavirus pandemic. In fact, many of the web resources selected for collection have already been added to the UKWA by other curators like Eilidh. Therefore, the dedicated University of Edinburgh collection both provides a home for the web resources in the CRC’s Initiative and also contributes to the growing collections of web resources documenting this momentous event in the wider UKWA.

By including these web resources in our dedicated collection, we provide important context, often linking them to wider activities at the University or to other related, non-web materials. We can also provide descriptive information supplied at the point of submission by a member of the University community or based on organisational knowledge of the resource and how it relates to our other holdings.  

In addition to adding richer metadata, we enjoy a closer relationship to the creators of these web resources – either through direct consultation or through our existing collecting remit. These relationships enhance the meaning and significance of these archived resources, giving them an anchor to a place and to a community. Our collecting policies also inform the process of review for open access and, where needed, facilitate permission gathering to make as many resources in the collection as possible openly available online.

Integrating Web Resources into the Wider Collection 
As mentioned, the web content selected for the Initiative will sit in a dedicated collection amongst other UKWA topical collections. However, we want to ensure the web resources remain integrated with other materials in the CRC Initiative’s collection in different formats. Though we don’t have anything to share yet, we plan to create catalogue entries for web resources with a link to the UKWA access portal. This way the end user will have a single point of entry to all the materials in the collection, with web resources just one click away. One caveat: without an open access licence, these links will only be accessible via terminals on-site at the Legal Deposit Libraries. We anticipate most of our users at the CRC will expect to be able to view web resources on the web. Therefore, we are highly motivated to ensure as many of the web resources are granted open access licences as possible.

Open access for archived web pages that clearly form part of the University’s web estate and fulfil the criteria of the University Archives’ collecting policy is relatively straightforward. However, many submissions to the Initiative have been created on third party platforms, outside the University’s web estate. Others have been developed collaboratively, with significant contributors from outside the University community. In these instances, it may prove more complicated to grant open access and therefore more complicated to make available remotely, online.

In addition to links to the archived web resources themselves, we aim also to create some basic guidance about web archives and how they can be accessed and used. Though plans are still in the works, this guidance would likely sit on our public interface or possibly on individual catalogue records. We hope this informational metadata will help facilitate wider use of archived web resources in research but also prompt users to ensure their own web content is being archived and looked after. First things first, however, we’re busy building our own internal knowledge about web archiving. (So much to do! So many possibilities!) 

A Learning Experience 
As we have begun adding web resources to our collection, we have learned a great deal about web archiving, ACT, and procedures at the UKWA (largely informed by Legal Deposit legislation and restrictions). We’ve found that many types of web resources evade the crawlers, requiring adjustments to records on ACT … and many emails to Eilidh at NLS. More complex pages or content on third party platforms, as opposed to ‘flat’ web pages, pose real challenges to collecting a complete, authentic copy. Ultimately, finding the time to sit down and add web resources to the collection has been the greatest challenge of all. The team of archivists looking after the Collecting Covid-19 Initiative – including all formats of content not just web – have other core responsibilities (and, like most, the added complication of trying to translate our jobs to home working).  

The Collecting Covid-19 Initiative is still live and actively receiving submissions. Our Cataloguing Archivist Aline Brodin regularly surveys University outputs to identify relevant resources. We have begun reaching out to different groups and communities across the University to request input into the direction of our collecting and improve diversity and representation. We expect the nature of submissions and identified materials to evolve as the situation evolves and, as we gain experience in web archiving, we expect our procedures and approaches to evolve as well.  

Though we are at the very beginning of our journey, we hope our own little corner of web resources related to the pandemic will enhance wider collections about Covid-19 and how different communities have responded in real time.   

Multiple Approaches  
While the collaboration with the UKWA to build our own collection of web resources related to the pandemic is beyond valuable, there are some limitations to this approach (as discussed above). One is technical – the infrastructure used by the UKWA (the Heritrix-based crawlers) is built for scale, not detail. As a result, there are a few web resources we have struggled to capture. The other limitation is practical – the archives team at Edinburgh only has minimal permissions in the UKWA system (to ensure the integrity of the archived content it holds). Therefore, many basic functions – such as quality assurance and granting open access licences – must go through Eilidh at NLS. The UKWA team are incredibly busy and their capacity to support individual queries is limited (they are after all archiving the UK web…).

Therefore, we have pursued an alternative approach for a small portion of selected content using Webrecorder Desktop. This approach comes with its own limitations. Webrecorder is a tool built to capture complex, often interactive web resources. However, to enable this ‘high-fidelity’ approach, the tool requires a curator to click every link and every button to trigger a capture. This makes Webrecorder a time-consuming approach to capturing web resources – especially large ones. Furthermore, the output of Webrecorder is a WARC file. While WARC files are the gold standard for preservation, they pose a barrier for access. The typical user of CRC collections is unlikely to know what a WARC file is and even less likely to know how to access and view one.  
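
For readers curious about what actually sits inside a WARC file, the sketch below may help. It is a minimal illustration rather than part of our workflow: it assumes an uncompressed, well-formed WARC (many real files are gzip-compressed), the function name `list_warc_targets` is invented for this example, and it only lists what each record is and which URL it relates to. It does not replay the archived page, which is the part that needs specialist tools.

```python
import io


def list_warc_targets(stream):
    """Yield (record type, target URL) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            break  # end of the file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            header_line = stream.readline()
            if header_line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = header_line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the captured content itself, using the length the record declares.
        stream.read(int(headers.get("content-length", "0")))
        yield headers.get("warc-type"), headers.get("warc-target-uri")


# A tiny hand-made record, standing in for a real capture (assumption for illustration).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://example.org/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

for record_type, url in list_warc_targets(io.BytesIO(sample)):
    print(record_type, url)
```

Even this simplified view shows why WARC files suit preservation better than casual browsing: each capture is a sequence of records with technical headers, not a page that opens in a browser.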

Conifer CAHSS blog

Despite these limitations, the team has devised a workflow that uses Webrecorder for selected web resources that cannot be captured through UKWA. The University blog ‘Covid-19 Perspectives’, for example, was captured using Webrecorder Desktop and the similar web-based service Conifer. The WARC files exported from Webrecorder will be ingested into our preservation system and possibly made available to users on request. We’re currently exploring the possibility of an institutional account with Conifer, which provides a web-based service for capturing and sharing archived web resources. This way, we could provide access via a link embedded in our catalogue, exactly the same way as for UKWA resources. This approach would create a more seamless user experience, though it also relies on a third party platform for continued access.

Conclusion
Though our collaboration with the UKWA and experiments with other web archiving tools focus on the Collecting Covid-19 Initiative, we hope to apply these lessons learned to different areas of collecting. The archives team has started conversations with the University’s web team to discuss plans for archiving the web estate as a vital record of the institution's history. I’ve delivered a few tutorials on the basics of web archiving for different staff across the Library, including how-to sessions for Webrecorder Desktop and submitting URLs to the UKWA. I’ve also started discussions with a research data management colleague about building services for researchers to capture and deposit web and social media content as part of their research outputs.

If this experience has taught us anything, it’s that none of these undertakings will be possible without close collaboration and willingness to test out new methods and tools. While the scale of resources that need to be archived can seem daunting, I’m confident the incremental progress we make will ensure a much richer, more authentic record makes it to the future.  

More Information
To learn more about the approach to collecting materials (in all formats) for the University of Edinburgh’s Collecting Covid-19 Initiative, see 'Collecting Covid-19: an initiative to document the University’s community response to the pandemic', by Lorraine McLoughlin and Sara Day Thomson, COVID-19 Perspectives blog, College of Arts, Humanities, and Social Sciences at The University of Edinburgh, https://blogs.ed.ac.uk/covid19perspectives/2021/03/08/collecting-covid-19-an-initiative-to-document-the-universitys-community-response-to-the-pandemic-by-lorraine-mcloughlin-and-sara-day-thomson/

10 March 2021

British Science Week and the UK Web Archive

It is British Science Week!

Britain has a fantastic record of pioneering science that has continued into the digital age. The UK Web Archive aims to archive as much of this online scientific output as we can. Here are just some of the many science websites in the archive. Don't forget that anyone can suggest UK science websites for the archive here: www.webarchive.org.uk/nominate

Science Sparks website
https://www.webarchive.org.uk/wayback/en/archive/20200408080530/https://www.science-sparks.com/

Science collection

The Science collection was started in 2020 and now contains over 200 different websites. The collection is wonderfully diverse - from the British Bryological Society (Mosses and Liverworts) to the online identification site British Bugs. The collection aims to cover all areas of British science, including science communication.

Cambridge Science and Stephen Hawking

We have worked closely with our partner Cambridge University Libraries to capture the amazing science undertaken at Cambridge University. This, of course, includes the work of the late Stephen Hawking, a Professor at the Centre for Theoretical Cosmology, famous for groundbreaking research into black holes, amongst other things.

Centre theoretical cosmology website

Darwin 200

To celebrate the 200th anniversary of the birth of Charles Darwin in 2009, staff at the British Library put together the Darwin 200 collection. Would you like to read the complete works of Darwin? Try Darwin Online.

Darwin online website

Other collections that include science elements are:

Do get in touch with suggestions for inclusion of more UK science websites: www.webarchive.org.uk/nominate

#BritishScienceWeek