UK Web Archive blog

100 posts categorized "Web/Tech"

08 July 2021

London’s Olympic Legacy: Local, National and International Aspirations

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

For two years, I have been studying the media coverage of London and Rio’s Olympic legacies. See the previous posts, where I explained the project’s main objective of understanding and conceptualizing the meaning of the word legacy based on the news coverage of the Games. I have also written about how controversial the word 'legacy' can be, since it is a term disputed by several actors in the political arena. In the most recent post, I introduced the use of SHINE as a platform for exploratory analysis of news events and briefly described how it was a useful tool for my research project.

Olympic Aquatic centre, London

The Approach
In this post, I aim to discuss the different approaches taken by news organisations, government websites and activist blogs to the legacy of the London Olympics. Although my initial interest was mainly focused on understanding the journalistic framing of legacy, looking at other sources has proved to be beneficial in a comparative perspective. For this purpose, I searched for articles on ‘Olympic legacy London’ via SHINE and selected, among the 10 domains provided by the platform, three news websites, one official government website and one activist blog.

The Research
Texts were collected, processed, cleaned and filtered using Python scripts and combined with articles extracted from the live web. Bigrams (co-occurrences of two words) were then extracted and ranked using the Natural Language Toolkit (NLTK), a suite of Python libraries for linguistic analysis, and the top 50 were transferred to a spreadsheet. The list of trends was then used in a first distant reading to give a sense of the most discussed topics and later combined with a more qualitative close reading for a deeper understanding of context.
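As a rough illustration of the bigram ranking described above, here is a minimal sketch. It is not the project's actual script: it uses only the Python standard library as a stand-in for NLTK, and the stop-word list, the function names (`tokenize`, `top_bigrams`, `bigrams_to_csv`) and the sample sentences are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "on", "is", "was"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_bigrams(texts, n=50):
    """Count pairs of neighbouring tokens (after stop-word removal) in each text
    and return the n most frequent pairs with their counts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


def bigrams_to_csv(ranked):
    """Serialise ranked bigrams as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["bigram", "count"])
    for (first, second), count in ranked:
        writer.writerow([f"{first} {second}", count])
    return out.getvalue()


# Hypothetical sample texts, for illustration only.
sample_texts = [
    "West Ham move into the Olympic Stadium",
    "Young people and West Ham fans",
    "School sport for young people",
]
ranked = top_bigrams(sample_texts, n=2)
csv_text = bigrams_to_csv(ranked)
```

Because stop words are dropped before pairing, words separated only by a stop word are counted as a bigram; whether that is desirable depends on the research question.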


These bigrams have revealed a significant difference in the way the Olympic legacy of London was approached by different sources from 2004 to 2020. Among the most cited bigrams by news publishers are ‘young people’ and ‘school sport’, both referring to the promises included in the legacy plan of London published in 2008 by the Department for Culture Media and Sport (DCMS). Promises number 1 and 3, entitled ‘making the UK a world-leading sporting nation’ and ‘inspiring a generation of young people’, included the engagement of young people in physical activities by increasing the offer of high-quality sports. The drop in the number of 16 to 25-year-olds playing sport after the games was one of the main topics highlighted by the media.

While both ‘young people’ and ‘school sport’ are a response to the legacy plan published by the DCMS, the most mentioned bigram in the list of texts analysed did not receive much attention in the document: ‘west ham’.

The destiny of the Olympic Stadium became one of the most controversial issues around the Olympic legacy of London. Initially, the disagreement over whether it should remain an athletics venue or be handed over to West Ham United drew the attention of the media, with important voices like the Olympics Minister Tessa Jowell and ex-London mayor Ken Livingstone backing the opposition to the football club. The dispute between West Ham and Tottenham for the Olympic Stadium and the threat of it becoming a ‘white elephant’, a recurrent fear in recent Olympic history, shed light on the place as a symbol of London’s Olympic legacy.

The media coverage of London’s legacy contrasts with the much more abstract and broader bigram found in the texts published by the British government: ‘international inspiration’. Articles published on the government website focused mainly on the International Inspiration programme, a project to promote sports in ‘some of the most disadvantaged communities in the world’. While the media seemed to be looking at internal issues, the government was targeting international audiences. The choice of the word ‘inspiration’ references a much more immaterial and abstract idea of legacy that contrasts with the very concrete discussion around the Olympic Stadium hosted by the media.

Looking at the bigrams obtained from activist blogs, the concerns are shown to have been more local, targeting primarily challenges faced by citizens of East London. Among the main bigrams are ‘Stratford City’, ‘new jobs’ and ‘public housing’. The community-focused approach highlights a significant discrepancy between the different framings of the event. These are preliminary steps to understand the multiple ways in which London’s legacy has been understood and narrated. The different perspectives indicate a distance between immediate public interest and official government communication regarding the most important sporting event in the world.

The Summer Olympic Games are hosted every four years by a different global city, bringing together the promise of being a catalyst for urban development and the frustrations of past events. Understanding the communication processes around the Olympics is fundamental for the future planning of effective legacies that correspond to the interests of the host nations’ citizens.

*This post summarizes the preliminary results presented in my talk at ‘Documenting the Olympics and the Paralympics’, an event organised and hosted by the British Library in collaboration with the British Society of Sports History (BSSH), the International Centre for Sports History and Culture at De Montfort University (ICSHC) and the School of Advanced Study (SAS).

**This research is part of the CLEOPATRA Innovative Training Network, funded by the European Union’s Horizon 2020 research and innovation programme. It has been conducted under a PhD developed at the School of Advanced Study, University of London. For more information:

30 June 2021

Alternative Sports in the UK Web Archive - Part 2

By Jason Webber, Web Archive Engagement Manager, British Library

This is the latest post in the #WebArchiveSummerOfSport season, which highlights the amazing UK sporting activities that have been captured in the UK Web Archive. This time we look at some popular recreational sports and pastimes.

Participation in outdoor and 'wild' swimming has increased over the last few years. Organisations such as Wild Swimming and the Outdoor Swimming Society give great information and guidance on this popular activity. One of the oldest and best-known outdoor swimming clubs is the Serpentine Swimming Club, most famous for its Christmas Day race, the Peter Pan Cup. Bracing!

Peter pan Cup - Serpentine Swimming Club - Londonist Website, (archived 2017)

Hiking, Walking and Rambling
Walking in the countryside of the UK has a long tradition, helped by excellent maps by the Ordnance Survey and the wonderful web of public footpaths. These rights of way sometimes had to be fought for, the 1932 mass trespass of Kinder Scout in the Peak District being one example.

Peakland Heritage website, Kinder Scout 1932, archived 2013

There are many walking and hiking clubs in the UK, one of the most famous being the Ramblers Association. See this local branch in Swansea, based near the beautiful Gower Peninsula.


Swansea Ramblers website, archived 2014

Kayaking, Canoeing and Stand Up Paddle boarding
Messing about in boats is, of course, historically a widespread pastime in the UK, especially given the long coastline and many navigable waterways. Kayaking/canoeing takes many forms, from the adrenaline ride of white water to the majesty of sea kayaking or just (literally) going with the flow. Much newer is 'Stand Up Paddle boarding' (SUP), where you stand on what looks like a surfboard and paddle along.

British Canoeing website, archived 2015.

Canoe Wales website, archived 2017


24 June 2021

Scottish Sport in the UK Web Archive

By Trevor Thomson, Curator, National Library of Scotland

This is the latest in the #WebArchiveSummerOfSport series on the amazing range of UK sport that has been collected in the UK Web Archive.

It’s difficult to ignore; millions participate and tens of millions watch - sport is one of Britain’s national passions and to many is an obsession. This love for playing and watching sport is reflected in the proliferation of websites dedicated to everything from humble clubs to massive governing bodies running sports worth billions.

High profile to Grass roots
The big organisations, of course, already have a high profile but websites for minority sports or clubs at grass roots level are less likely to get much publicity. Their online presence becomes the record of their existence, what they do, and how they have performed. While once upon a time clubs might have produced a small book, perhaps to mark a centenary or significant success, many more are now continually updating their histories creating an ongoing narrative of their activities.

Scottish Sport
In modern times Scots participate in all sorts of sport. However, there are some games that are particularly associated with Scotland, and therefore have attracted concentrated activity in terms of website collection – golf, shinty, curling, Highland games and football.

Golf in Scotland is so old it seems that there are ongoing disagreements about exactly how and where it started. St. Andrews in Fife, Musselburgh in East Lothian, and Leith Links now in Edinburgh, all make claims of great age, first formalised courses or original set of rules. What these places all have in common is that they are on the East Coast of Scotland, and it is these links courses that are almost synonymous with Scottish golf. However, just about every town and lots of smaller places in Scotland have at least one golf course, as the exercise to identify the relevant websites proved – 683 separate websites mainly for clubs, but also for national and local associations, courses, the history of golf as well as golf news and marketing. However, a site like Forgotten Greens shows that an established game like golf was once even more widespread in the country of its foundation.

If early pictures of golf are anything to go by, a caman (a stick for playing shinty) looks remarkably like a primitive golf club – there the similarity ends, for shinty is a team sport a bit like hockey, but much more like its Irish cousin hurling. The game is played with the caman and a hard ball akin to a baseball, and camans fly because the ball can be controlled in the air, so, to an outsider, the game can look incredibly dangerous. Mainly, but not exclusively, played in the Highlands and West Coast of Scotland, shinty was formalised by its governing body The Camanachd Association in 1893 – and its location in the less populated areas of Scotland means that it is represented in the Web Archive by a relatively small number of sites, a small but elegant fifty-nine, including the Camanachd Association’s own site which has much about the development of the game, as well as sections for women’s and schools’ shinty.

Major Investment for Shinty from the Sport Scotland website, archived 2014.

While shinty is very Scottish, curling is both international and very Scottish – teams from around the world compete in the Olympic and World Championship events, with Canada being fairly dominant and notable successes by Sweden and Switzerland. However, it must be one of the few sports in which Scotland can claim to have recently been World Champions – and it is a game that has been formalised in Scotland for centuries. For example, the claim for the oldest curling club goes to Kilsyth Curling Club (who, in keeping with their foundation in 1716, don’t actually have a website), and there are many others older than the governing body, the Royal Caledonian Curling Club (RCCC), a relative arriviste having been formed in 1838. The great age of these clubs suggests that Scottish winters must have delivered frozen ponds and lochs on a regular basis for the sport to flourish; and while outdoor curling is not a thing of the past, the sport is now mainly played in purpose-built rinks and ice arenas throughout Scotland.

The Royal Caledonian Curling Club, archived in 2015.

Given the relatively small number of these rinks, there are an incredible number of curling clubs in Scotland – 584 listed on the RCCC website (although today the RCCC is less romantically known as Scottish Curling). While not every club has a website, at least 300 do, and those that do not are represented on provincial sites. Looking at all this material shows that one of Scotland’s older sports is still thriving and that its players are ranked among the best in the world.

Highland Games
Perhaps the most traditional of the Scottish sports covered by these websites is the Highland Games. Part summer community event, part cultural festival, ‘games’ covers more than just the stereotyped heavy events like caber toss or stone put, as featured on the box of a famous brand of porridge. The events tend to be bespoke for the particular games, but will usually include athletics, bagpiping competitions and Highland dance - indeed, in Scotland, Highland dance is affiliated to the central sports agency Sport Scotland. The non-sport element is perhaps why such events are often known as ‘Highland Gatherings’, which also has connotations of the clan.

Highland Game Traditions at the website, archived 2015

Healthy at home (at least until a pandemic came along), Highland games are also one of Scotland’s cultural exports, delivered by the Scots diaspora particularly in North America and Australia/New Zealand, but also making an appearance in places like Brazil and the Czech Republic – of course, websites for overseas games elude us, unless we ask for owners’ permission to make copies. However, when collating a list of Highland Games in Scotland, it is notable, as mentioned, that they are not always games nor are they always Highland, with events staged in places like North Berwick (East Lothian) or Ardrossan (North Ayrshire) – around 80 websites have been collected specifically about Highland games/gatherings in Scotland.

As noted, organised sport in Scotland has incredibly deep roots thanks in part to our English neighbours. Indeed, during the current festival of international football, it is worth noting that the first international match was staged at the West of Scotland Cricket Club in Glasgow in 1872 – like the recent clash, an inconclusive 0-0 draw. Despite Scotland’s relatively humble status in the professional game, football is incredibly popular all over the country, with hundreds of clubs playing a bewildering number of local leagues and tournaments, from Shetland to the Rhins of Galloway.

The senior clubs’ websites were easy to identify as were reasonably large teams playing in amateur, junior (‘junior’ in this context means ‘not senior’ as opposed to young people) and grass roots levels. However, just about everywhere in Scotland has a football club, often changing names and rebranding over time - the consequence is that there are hundreds of sites for obscure clubs all over the country.

‘Scottish Women’s Premier League’ from Tartan Kicks, on the UK Web Archive.

The effort also helps capture the underrepresented area of women’s football - most sizeable clubs have a women’s section, but there are numerous clubs that have been established specifically for women and girls, while the work of the Scottish football authorities is captured in their own sites and those of the women’s leagues in Scotland.

So, thousands of sites have been found and included in the UK Web Archive relating to Scottish sport, an ongoing record of sport and its vagaries over time. While this has a direct benefit of creating a sports archive, looking in detail for this kind of material uncovers other organisational sites related to places in Scotland that would otherwise be difficult to find. And of course the sports team might be the only representation of a place, worth recording to put it on the map. The popularity of Scottish sports all over the country draws out the ongoing life of communities and their histories.

16 June 2021

Football Associations in the UK Web Archive

By Jason Webber, Web Archive Engagement Manager, British Library

This post is part of the #WebArchiveSummerOfSport series highlighting the many sporting websites that have been collected in the UK Web Archive.

Euro 2020
The European Championships 2020 (taking place in 2021) are under way at venues across Europe. In this tournament Scotland, England and Wales are all competing, with England and Scotland in the same group. Eight games will take place in UK venues such as Wembley and Hampden Park.

Football Associations
The national football teams are organised and managed by their respective Associations. You can map many of the messages and developments at all levels of football through their websites in the archive.

Welsh FA
Here is the Welsh FA archived website from May 2010 that shows the senior men's squad. At the time Gareth Bale was a Defender at Tottenham Hotspur. 

The National Library of Wales has recently written a blog on the Welsh Team at Euro 2016.

The English FA
This archived website from May 2016 shows The FA just before the previous European Championships. Use the calendar view to see the messages from after the tournament.

Scottish FA
This is a screenshot of the live website of the Scottish FA. We collect the Scottish FA website, which is available to view in the reading rooms of UK Legal Deposit Libraries.

Scottish FA website

Good luck to all the home nations in the Euros! 

09 June 2021

Alternative Sports in the UK Web Archive - Part 1

By Jason Webber, Web Archive Engagement Manager, British Library

Welcome to the UK Web Archive 'Summer of Sport' season! Over the next few months we will show the many ways that sport is represented in the web archive.

Let's start with some of the more quirky and unusual 'sports' played in the UK:

Cheese Rolling Championship

Brave competitors chase a wheel of cheese down the terrifyingly steep Cooper's Hill (1 in 1 in places) in Gloucestershire. First one to the bottom is the winner! The prize is a 7-8lb wheel of Double Gloucester cheese!

The official Cheese Rolling Championship website in 2008.

The Chap Olympiad

What sport can there be for the well dressed 'person-about-town'? The 'Chap Olympiad' of course.

"A series of challenges ensue ranging from the frantic and frenetic to the barely mobile. The Tea Pursuit and Umbrella Jousting (where participants clamber aboard a bike holding an umbrella and a briefcase) see what is possibly the first use of Boris bikes as part of a sporting contest. The Tug Of Hair pits two teams against each other, pulling on a twenty feet long moustache until one team topples over. In Well-Dressage, individuals mount hobbyhorses and prance around to music while Not Tennis is the epitome of anti-sport with two players invited to do anything but play tennis."

Photos of the Chap Olympiad from the Londonist website in 2016.

Bog Snorkeling

If an athlete is not afraid of a spot of mud, what better event than the Bog Snorkeling Championships! Competitors aim to complete two consecutive lengths of a 60-yard (55 m) water-filled trench cut through a peat bog in the shortest time possible, wearing traditional snorkel, diving mask and flippers.

"Event rules state that no recognised swimming strokes are allowed at the event so it all comes down to honing down the perfect technique to power through the murky water."

Bog Snorkeling on the Visit Wales website from April 2013

World Conker Championships

Threading a piece of string through a horse chestnut seed and hitting another one with it has been a long-standing feature of school playgrounds. Conkers, however, is a serious business and over a thousand are used in each World Championship contest!

Photo of the World Conker Championship 2016.

BBC News article on the World Conker Championship in 2004.

We aim to capture all aspects of UK life including the sporting life. If you have a UK sport website that you would like to suggest for the web archive, nominate it here.



07 June 2021

Curating the UKWA LGBTQ+ Lives Online collection

By Ash Green and Steven Dryden - LGBTQ+ Lives Online lead curators

The LGBTQ+ Lives Online collection has been live now for almost a year. This has given us the time to gain a better understanding of the content in it, and also how people are interacting with the collection. Based on this, we wanted to consider some of the challenges around structuring, tagging and representing sites within it.

Sub-Collections & Subject tagging

When the collection was set up, one of the tasks that needed to be undertaken was defining the structure of the sub-collections. At this stage they are organised as follows:

  • Activism/Pride
  • Arts, Literature, Music & Culture
  • Business/Commerce
  • Education
  • History
  • Medicine and Community
  • Policy and Legislative Change
  • Religion
  • Social Organisations
  • Sport

We defined the sub-collections based on what we thought would be added to the collection, without actually knowing what the majority of that content might be. As more sites have been added we can see that some sub-collections work well and others not so well. We are getting a sense of which sub-collections might need to be revisited.

One sub-collection that at this stage requires more consideration, in terms of whether it should be changed or split, is Medicine & Community. When we set this up, it felt like a logical pairing – both aspects of the sub-collection are about well-being, with one indicating it’s about medical support, and the other about wellness achieved through peer support. But now, as we add more sites to this sub-collection, the terminology doesn’t feel quite right. This is especially true when sites focused more on well-being, emotional support and guidance, such as Spectra and Outline Surrey, are included in the sub-collection. Possibly a more appropriate sub-collection name would be Health & Community, which would still allow the inclusion of medical and community wellness, but under a clearer umbrella.

No homophobia, no violence t-shirt

When the collection was set up, Retirement also featured as a sub-collection. We eventually removed this before go-live. Not because it wasn’t relevant, but because there was insufficient online content within the collection to justify including it at this stage. That said, that may change over time and an increase in sites focusing on both retirement and older LGBTQ+ people’s lives may result in us re-instating it or a similar sub-collection. Similarly, other themes might rise out of existing sites in the collection that would require new sub-collections to be added, or even new subjects to be included. Part of the key purpose of the collection is to not only archive appropriate web sites, but to also make them findable via the sub-collections.

As well as adding sites to sub-collections within the LGBTQ+ Lives Online collection, they can also be assigned to other collections and sub-collections. For example, Graces Cricket Club (a gay cricket club) appears in both the LGBTQ+ Lives Online / Sport sub-collection and the separate Sport: Football collection. In cases like this, there’s no question that it’s perfectly appropriate to include this site in both subject collections. However, in some instances, LGBTQ+ sites have also been previously included in inappropriate sub-collections. For example, one site in the collection had previously been assigned to the Medicine and Health / Conditions & Diseases sub-collection before the LGBTQ+ Lives Online project began. This incorrectly implied that being an LGBTQ+ person was either a “condition” or a “disease”. This has been corrected, but it highlights that we also need to be aware that choosing which collection or sub-collection we add a site to has implications about how a curator perceives that site, and the negative bias we may in turn present to collection users by including a site in an inappropriate sub-collection.

Content Warnings
Another area we are considering is content warnings. When we recently ran an online session about the collection, we were asked if any content warnings were included in the descriptions of sites tagged within the collection. Another person also expressed concern about the inclusion of sites within the LGBTQ+ Lives Online collection that were negative or hostile towards members of the LGBTQ+ community. Though these sites are included, they do not provide content warnings about their harmful and negative perspectives or context about their inclusion. Again, this is a valid comment, and content warnings would help identify that users were about to enter a site whose perspectives might be problematic or triggering.

Keyboard - caution

You may also be wondering why sites such as the ones that are negative or hostile towards members of the LGBTQ+ community are included in the collection. It goes back to the purpose of this project, which is to archive UK sites that reflect UK LGBTQ+ lives and experiences. This includes positive, neutral and negative sites if relevant. For example, we include at least one site in the collection that questions the validity of trans and gender non-conforming people as a part of the LGBTQ+ community. If we didn’t include this site, the collection would not give a balanced picture of trans people’s experience, as it would miss out on a key factor that has had a huge impact on many trans lives over the past few years. As such, even though we do not agree with questioning the validity of trans and gender non-conforming people, those sites are valid to LGBTQ+ research and discussion. But it’s not just sites like these that we would consider including a content warning against. Any sites highlighting LGBTQ+ phobic or hate content may also be included.

Content warnings are not something we’ve considered before, and at present, the cataloguing rules for the UK Web Archive collection don’t have capacity for the inclusion of content warnings. However, following on from these conversations, it is something we need to address, along with highlighting that including content within the collection does not necessarily mean that the curators agree with the opinions in those sites.

The structure of the sub-collections and content warnings are areas that we want to address as soon as we can, and it is something we would like to discuss with the wider LGBTQ+ community. How we achieve that is yet to be decided, but we are always open to suggestions.

In the meantime, don’t forget that you can explore the LGBTQ+ Lives Online UK Web Archive collection.

You can also nominate sites for inclusion in the collection.


17 March 2021

Shakespeare in the UK Web

By Jason Webber, Web Archive Engagement Manager, The British Library

It's Shakespeare week (15-21 March). William Shakespeare is, almost certainly, the most quoted literary figure (in English) and the popularity of his plays and poems endures into the digital age. His work is continually being taught, examined, analysed and, most of all, quoted on the internet. It is often quoted in unlikely places, such as 'Now is the winter of our discontent' on the Butterfly Conservation website.


Most Popular?
What are the most popular Shakespeare quotes? Perhaps unsurprisingly, "To be or not to be" has far and away the most mentions in our SHINE service - all .uk websites collected 1996-2012 (JISC dataset obtained from the Internet Archive):

Shakespeare quotes from SHINE

If we take away "to be or not to be" this graph looks even more interesting:

Shakespeare quotes from SHINE

Want to try your own Shakespeare quotes in our SHINE service?

  1. Go to the trends page of SHINE:
  2. Add a word or phrase into the input box, NOTE: phrases should go in quotes e.g. "all that glisters"
  3. To compare multiple words or phrases, separate by a comma e.g. "william shakespeare", "christopher marlowe", "ben jonson"
  4. Click on any point in the graph to see examples of the context the word or phrase was used
  5. Enjoy!

Do let us know your own favourite quotes on Twitter: @UKWebArchive

12 March 2021

University of Edinburgh’s Collecting Covid-19 Initiative: Collaborative Collection Building with the UKWA

By Sara Day Thomson (Digital Archivist), Lorraine McLoughlin (Appraisal Archivist), and Aline Brodin (Cataloguing Archivist), University of Edinburgh  

With thanks to Eilidh MacGlone, Web Archivist, National Library of Scotland and UK Web Archive 

The University of Edinburgh’s Centre for Research Collections (CRC) – which includes collections housed in libraries, archives, galleries, and museums – launched the Collecting Covid-19 Initiative in late April 2020. The Initiative invites staff, students, and anyone affiliated with the University to donate any materials that document their experiences of the pandemic and lockdown. Websites, photographs, videos, artwork, and all other materials are welcomed. In preserving a range of materials and formats for the long term, the CRC aims to prevent gaps in memory and to preserve a record of the University’s response to the pandemic.  

CRC Montage

As submissions began to come in via our online form, it became evident that online communications and platforms have played a critical role in how the University community has responded to the pandemic and lockdown. Even some submissions in more ‘traditional’ formats, like images or narratives, have been published online and submitted as a URL. In addition, web-based submissions range from ‘flat’ websites to social media posts to content shared on third party platforms. However, with no web archiving programme in place, the collecting team reached out to the UK Web Archive via the National Library of Scotland (NLS) for support in collecting these valuable records of life during the pandemic.

Covid-19 CallOut

In this post, we discuss this collaboration and how Covid-19-related web resources are integrated into the wider collection at the University. We also discuss how the Initiative aligns with existing collecting policies but also provides us with an opportunity to establish approaches for more active collecting. These new approaches are not temporary but will provide lasting innovations that will support more responsive (and therefore representative) collecting of the University’s diverse communities and activities beyond the pandemic.

Selection of Web Resources: Donations and Active Collecting
The team of archivists looking after the Initiative has taken a two-fold approach for considering what to include. Primarily, a range of web-based works have been submitted by members of the community, including student publications and tweets. In addition to these submissions, the team has been actively identifying relevant web resources, such as official University communications and research activities, to capture a meaningful sample. Identified materials include: 

  • University communications such as emails to staff and students, news feeds, and information webpages 
  • Remote learning resources and websites for projects and initiatives created by staff members and research centres  
  • Resources created by and for the University's students and alumni such as networking groups on social media, blogs, and webpages offering advice and guidance 

Edinburgh Uni C19 response

This approach to actively selecting contemporary content for the Archive is relatively unusual (though not unprecedented). Typically, the archivist intervenes at the ‘end of life’ of a collection. The traditional archival process of collecting materials at the end of a project, or even at the end of a researcher’s career - involving multiple conversations and usually in-person donation - does not support active, contemporaneous collecting. 

Websites can change rapidly or disappear altogether. The files or links embedded in websites may break or move location within months (or sooner!). Therefore, archivists don’t have time to wait for web resources to amass over time and don’t have a crystal ball to predict what content will grow into cohesive collections. Web archiving provides a method for capturing contemporary, born digital resources like those surrounding the pandemic in a rapid, proactive way. 

Collaborative Processes for Collection-Building  
Working with the UKWA has allowed us to get started with capturing these web resources through access to their technical infrastructure and, very importantly, their valuable expertise. The UKWA uses a tool with a web interface for selecting and managing web resources, the Annotation and Curation Tool (ACT), which has made collaborative collection-building much easier. The tool is well-documented (so great for newbies!) and staff possess wide knowledge of methods for capturing and curating web resources. It’s not a surprise that the UKWA has a well-established history of collaborating with external specialists to build topical collections around different subjects. This experience has made it relatively straightforward for us to develop a set of procedures.


Capturing and Contextualising Web Resources 
With the help of Eilidh MacGlone, the Web Archivist at NLS, we have begun to add relevant web resources (either submitted or actively selected) using ACT. We assign these captured resources to a dedicated collection: Collecting Covid-19 Initiative of the University of Edinburgh. This University of Edinburgh collection sits alongside other collections within UKWA related to the coronavirus pandemic. In fact, many of the web resources selected for collection have already been added to the UKWA by other curators like Eilidh. Therefore, the dedicated University of Edinburgh collection both provides a home for the web resources in the CRC’s Initiative and also contributes to the growing collections of web resources documenting this momentous event in the wider UKWA.

By including these web resources in our dedicated collection, we provide important context, often linking them to wider activities at the University or to other related, non-web materials. We can also provide descriptive information supplied at the point of submission by a member of the University community or based on organisational knowledge of the resource and how it relates to our other holdings.  

In addition to adding richer metadata, we enjoy a closer relationship to the creators of these web resources – either through direct consultation or through our existing collecting remit. These relationships enhance the meaning and significance of these archived resources, giving them an anchor to a place and to a community. Our collecting policies also inform the process of review for open access and, where needed, facilitate permission gathering to make as many resources in the collection as possible openly available online.

Integrating Web Resources into the Wider Collection 
As mentioned, the web content selected for the Initiative will sit in a dedicated collection amongst other UKWA topical collections. However, we want to ensure the web resources remain integrated with the materials in other formats in the CRC Initiative’s collection. Though we don’t have anything to share yet, we plan to create catalogue entries for web resources with a link to the UKWA access portal. This way the end user will have a single point of entry to all the materials in the collection, with web resources just one click away. One caveat: without an open access licence, these links will only be accessible via terminals on-site at the Legal Deposit Libraries. We anticipate most of our users at the CRC will expect to be able to view web resources on the web. Therefore, we are highly motivated to ensure that as many of the web resources as possible are granted open access licences.

Open access for archived web pages that clearly form part of the University’s web estate and fulfil the criteria of the University Archives’ collecting policy is relatively straightforward. However, many submissions to the Initiative have been created on third-party platforms, outside the University’s web estate. Others have been developed collaboratively, with significant contributors from outside the University community. In these instances, it may prove more complicated to grant open access and therefore more complicated to make the resources available remotely, online.

In addition to links to the archived web resources themselves, we aim also to create some basic guidance about web archives and how they can be accessed and used. Though plans are still in the works, this guidance would likely sit on our public interface or possibly on individual catalogue records. We hope this informational metadata will help facilitate wider use of archived web resources in research but also prompt users to ensure their own web content is being archived and looked after. First things first, however, we’re busy building our own internal knowledge about web archiving. (So much to do! So many possibilities!) 

A Learning Experience 
As we have begun adding web resources to our collection, we have learned a great deal about web archiving, ACT, and procedures at the UKWA (largely informed by Legal Deposit legislation and restrictions). We’ve found that many types of web resources evade the crawlers, requiring adjustments to records on ACT … and many emails to Eilidh at NLS. More complex pages or content on third-party platforms, as opposed to ‘flat’ web pages, pose real challenges to collecting a complete, authentic copy. Ultimately, finding the time to sit down and add web resources to the collection has been the greatest challenge of all. The team of archivists looking after the Collecting Covid-19 Initiative – including all formats of content, not just web – have other core responsibilities (and, like most, the added complication of trying to translate our jobs to home working).

The Collecting Covid-19 Initiative is still live and actively receiving submissions. Our Cataloguing Archivist Aline Brodin regularly surveys University outputs to identify relevant resources. We have begun reaching out to different groups and communities across the University to request input into the direction of our collecting and improve diversity and representation. We expect the nature of submissions and identified materials to evolve as the situation evolves and, as we gain experience in web archiving, we expect our procedures and approaches to evolve as well.  

Though we are at the very beginning of our journey, we hope our own little corner of web resources related to the pandemic will enhance wider collections about Covid-19 and how different communities have responded in real time.   

Multiple Approaches  
While the collaboration with the UKWA to build our own collection of web resources related to the pandemic is beyond valuable, there are some limitations to this approach (as discussed above). One is technical – the infrastructure used by the UKWA (the Heritrix-based crawlers) is built for scale, not detail. As a result, there are a few web resources we have struggled to capture. The other limitation is practical – the archives team at Edinburgh only has minimal permissions in the UKWA system (to ensure the integrity of the archived content it holds). Therefore, many basic functions – such as quality assurance and granting open access licences – must go through Eilidh at NLS. The UKWA team are incredibly busy and their capacity to support individual queries is limited (they are after all archiving the UK web…).

Therefore, we have pursued an alternative approach for a small portion of selected content using Webrecorder Desktop. This approach comes with its own limitations. Webrecorder is a tool built to capture complex, often interactive web resources. However, to enable this ‘high-fidelity’ approach, the tool requires a curator to click every link and every button to trigger a capture. This makes Webrecorder a time-consuming approach to capturing web resources – especially large ones. Furthermore, the output of Webrecorder is a WARC file. While WARC files are the gold standard for preservation, they pose a barrier for access. The typical user of CRC collections is unlikely to know what a WARC file is and even less likely to know how to access and view one.  
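
For readers wondering what a WARC file actually contains, here is a minimal sketch in Python. It is an illustration only, not part of our workflow: the function name `iter_warc_records` and the sample record are made up for this example, and the code assumes an uncompressed WARC 1.0 stream in which every record declares its Content-Length. Each record is a block of named headers (what was captured, and from which address) followed by the captured bytes themselves.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration; assumes well-formed records.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of the stream
        if not line.strip():
            continue  # blank lines separate one record from the next
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"unexpected line: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers.get("Content-Length", 0)))
        yield headers, payload


# A made-up record standing in for a real capture.
body = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hello</p>"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"\r\n" + body + b"\r\n\r\n"
)

records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```

In practice, WARC files are usually gzip-compressed and are better handled with an established library or a replay tool than with hand-written parsing; the sketch is only meant to show that a WARC is a sequence of labelled captures rather than something a browser can open directly.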

Conifer CAHSS blog

Despite these limitations, the team has devised a workflow that uses Webrecorder for selected web resources that cannot be captured through UKWA. The University blog ‘Covid-19 Perspectives’, for example, was captured using Webrecorder Desktop and the similar web-based service Conifer. The WARC files exported from Webrecorder will be ingested into our preservation system and possibly made available to users on request. We’re currently exploring the possibility of an institutional account with Conifer, which provides a web-based service for capturing and sharing archived web resources. This way, we could provide access via a link embedded in our catalogue, exactly the same way as for UKWA resources. This approach would create a more seamless user experience, though it would also rely on a third-party platform for continued access.

Though our collaboration with the UKWA and experiments with other web archiving tools focus on the Collecting Covid-19 Initiative, we hope to apply the lessons learned to different areas of collecting. The archives team has started conversations with the University’s web team to discuss plans for archiving the web estate as a vital record of the institution's history. I’ve delivered a few tutorials on the basics of web archiving for different staff across the Library, including how-to sessions for Webrecorder Desktop and submitting URLs to the UKWA. I’ve also started discussions with a research data management colleague about building services for researchers to capture and deposit web and social media content as part of their research outputs.

If this experience has taught us anything, it’s that none of these undertakings will be possible without close collaboration and willingness to test out new methods and tools. While the scale of resources that need to be archived can seem daunting, I’m confident the incremental progress we make will ensure a much richer, more authentic record makes it to the future.  

More Information
To learn more about the approach to collecting materials (in all formats) for the University of Edinburgh’s Collecting Covid-19 Initiative, see 'Collecting Covid-19: an initiative to document the University’s community response to the pandemic', by Lorraine McLoughlin and Sara Day Thomson, COVID-19 Perspectives blog, College of Arts, Humanities, and Social Sciences at The University of Edinburgh,