UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

152 posts categorized "Web/Tech"

22 May 2024

Reflections on the IIPC Early Scholars Spring School on Web Archives 2024

By Cameron Huggett, PhD Student (CDP), British Library/Teesside University

IIPC Early Scholars Spring School on Web Archives banner

My name is Cameron, and I am currently undertaking an AHRC funded Collaborative Doctoral Partnership (CDP) project, between the British Library and Teesside University. My research centres on racial discourses within association football fanzines and e-zines from c.1975 to the present, and aims to examine the broader connections between football fandom, race and identity. 

I attended the Early Scholars Spring School on Web Archives, held before the conference began, which allowed me to share knowledge with colleagues from a number of different countries, institutions and disciplines, offering new perspectives on my own research. Within this school, I was fortunate enough to be able to deliver a short lightning talk, outlining my own use of web archiving within my research into the history of racial discourses within football fanzines. This generated an engaging discussion around my methodologies and led me to reflect upon how quantitative techniques can be better adopted within historical research practices.

I also particularly enjoyed discovering more about the collections of the Bibliothèque Nationale de France (BNF) and Institut National de L'audiovisuel (INA). The scope of the collections and innovative user interfaces were particularly impressive. For example, INA had created a programme that allowed the user to view a collection item, such as an election debate broadcast, alongside archived tweets relating to the event in real time.

 My primary takeaway was how web archives can be innovatively employed to record the breadth and depth of online communities and discourses, as well as supplement more traditional sources within a historian’s research framework.  

24 January 2024

Exploring Alternative Access: Making the Most of Web Archives During UK Web Archive Downtime

Nicola Bingham, Lead Curator of Web Archiving, British Library

The British Library is continuing to experience disruption following a cyber-attack and is working hard to restore services. Disruption to some services is, however, expected to persist for several months. In the meantime, our buildings are open and we’ve released a searchable online version of our main catalogue, which contains records of the majority of our printed collections as well as some freely available online resources. Our reference team are on hand to answer queries, advise on collection item availability and help with other ways to complete your work. Please email [email protected] or find out more. The disruption is affecting our website, online systems and services. Please see our temporary website for up-to-date information.

Despite the disruption to access to the UK Web Archive, we continue to crawl or acquire copies of websites, as well as add new websites to our acquisition process which is being undertaken with Amazon Web Services in the Cloud, ensuring that the UK Web Archive collection is updated and preserved as usual.

We appreciate that for regular users of the UK Web Archive, the temporary unavailability of this valuable resource is inconvenient and disruptive. There exist several alternative openly accessible web archives that can serve as sources of information while the UK Web Archive is offline.

Other Openly Accessible Web Archives

Internet Archive: Known as the largest and most comprehensive web archive globally, it includes the famous Wayback Machine and boasts an extensive collection of archived web pages.

Understanding the Differences

While the Internet Archive captures a broad spectrum of global content, the UK Web Archive focuses specifically on the UK web. The UK Web Archive offers comprehensive crawls, curated collections, and secondary datasets for research. However, access is primarily restricted to legal deposit libraries, with some resources available openly.

The Internet Archive allows remote access to archived websites, but its search functionalities and scope differ from the UK Web Archive.

Memento Time Travel: This innovative platform operates under the Memento protocol, allowing users to view archived websites across various openly accessible web archives. It acts as a bridge, enabling access to past versions of web resources stored in archives such as the Internet Archive, Archive-It, UK Web Archive, archive.today, GitHub, and more. While it displays links to Mementos, it doesn’t retain the content itself.

Portuguese Web Archive (Arquivo.pt): Developed by the Portuguese Foundation for Science and Technology, this archive aims to preserve and grant access to the Portuguese web domain and its contents. It also archives a significant amount of European Union and transnational content. It's a valuable resource for preserving the digital heritage of Portugal and contributing to the preservation of European and Portuguese-language online information.

UK Government Web Archive: An openly accessible archive preserving UK central government information, encompassing videos, tweets, images, and websites dating from 1996 to the present day.

UK Parliament Web Archive: This openly accessible archive covers parliamentary websites and social media content from 2009 to the present day.

National Records of Scotland Web Archive: Offering open access, this archive allows browsing and searching of websites related to Scotland’s people and history.
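For readers who want to query these archives programmatically, the following is a minimal sketch of the two conventions mentioned above: the `Accept-Datetime` request header that the Memento protocol (RFC 7089) uses to ask for a past version of a page, and the timestamped URL pattern used by the Internet Archive's Wayback Machine. The function names are our own for illustration, and the sketch only builds the header and URL; it does not contact any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(moment: datetime) -> dict[str, str]:
    """Build the Accept-Datetime header used for Memento datetime negotiation.

    `moment` should be a timezone-aware datetime; it is converted to UTC
    because the header value is expressed in GMT.
    """
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


def wayback_url(target: str, moment: datetime) -> str:
    """Build a Wayback Machine style URL with a 14-digit UTC timestamp."""
    stamp = moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{target}"


wanted = datetime(2023, 10, 18, 12, 0, 0, tzinfo=timezone.utc)
print(accept_datetime_header(wanted))
print(wayback_url("https://www.webarchive.org.uk", wanted))
```

An archive that supports Memento would normally answer such a request with the capture closest to the requested date, so the exact timestamp rarely needs to match a real capture.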

Seeking Information and Resources

While the UK Web Archive is offline, the UK Web Archive blog remains accessible and serves as a useful source of information about the archive.

Additionally, although the UK Web Archive itself might be temporarily inaccessible, its information pages have been preserved by the Internet Archive, accessible [here](https://web.archive.org/web/20240000000000*/https://www.webarchive.org.uk).

For those keen on delving deeper, the British Library Research Repository houses supporting documents related to the UK Web Archive, such as collection scoping documents, annual reports, statistics, and research publications. The repository can be accessed [here](https://doi.org/10.23636/hj5v-3c07).

While the UK Web Archive takes a brief hiatus, we hope these alternative resources help. And perhaps embracing these other openly accessible archives might even unveil new avenues and perspectives for exploration.

While we work hard to recover all our online services you can find regular updates on progress published on our Knowledge Matters blog.

18 October 2023

UK Web Archive Technical Update - Autumn 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2023 Q2 report.

Replication

The most important achievement over the last quarter has been establishing a replica of the UK Web Archive holdings at the National Library of Scotland (NLS). The five servers we’d filled with data were shipped, and our NLS colleagues kindly unpacked and installed them. We visited a few weeks later, finishing off the configuration of the servers so they can be monitored by the NLS staff and remotely managed by us.

This replica contains 1.160 PB of WARCs and logs, covering the period up until February 2023. But, of course, we’ve continued collection since then, and including the 2023 Domain Crawl, we already have significantly more data held at the British Library (about 160 TB more, ~1.3 PB in total). So, the next stage of the project is to establish processes to monitor and update the remote replica. Hopefully, we can update it over the internet rather than having to ship hardware back and forth, but this is what we’ll be looking into over the next weeks.

The 2023 Domain Crawl

As reported before, this year we are running the Domain Crawl on site. It’s had some issues with link farms, which caused the number of domains to leap from around 30 million to around 175 million, which crashed the crawl process.


2023 Domain Crawl queues over time, showing peak at 175 million queues.

However, we were able to clean up and restart it, and it’s been stable since then. As of the end of this quarter we’ve downloaded 2.8 billion URLs, corresponding to 183 TB of (uncompressed) data.
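To put those two figures in proportion, here is a quick back-of-the-envelope calculation of the average amount of data per URL. It assumes decimal units (1 TB = 10**12 bytes), which the report does not specify, so treat the result as a rough guide only.

```python
# Figures quoted in the report; decimal units are an assumption
urls_downloaded = 2_800_000_000
bytes_downloaded = 183 * 10**12


def average_bytes_per_url(total_bytes: int, url_count: int) -> float:
    """Return the mean uncompressed payload per downloaded URL, in bytes."""
    return total_bytes / url_count


avg = average_bytes_per_url(bytes_downloaded, urls_downloaded)
print(f"{avg / 1000:.1f} kB per URL on average")  # prints "65.4 kB per URL on average"
```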

Legal Deposit Access Service

We’ve continued to work with Webrecorder, who have added citation, search and print functionality to the ePub reader part of the Legal Deposit Access Service. This has been deployed and is available for staff testing, but we are still resolving issues around making it available for realistic testing in reading rooms across the Legal Deposit Libraries.

Browsertrix Cloud Local Deployment

We have worked out most of the issues around getting Browsertrix Cloud deployed in a way that complies with Non-Print Legal Deposit legislation and with our local policies. We are awaiting the 1.7.0 release which will include everything we need to have a functional prototype service.

Once it’s running, we can start trying out some test crawls, and work on how best to integrate the outputs into our main collection. We need some metadata protocol for marking crawls as ready for ingest, and we need to update our tools to carefully copy the results into our archival store, and support using WACZ files for indexing and access.

27 September 2023

What can you discover and access in the UK Web Archive collection?

UK Web Archiving team, British Library

The UK Web Archive collects and preserves websites from the UK. When we started collecting in 2005, we sought permission from owners to archive their websites. Since 2013, legal deposit regulations have allowed us to automatically collect all websites that we can identify as located in or originating from the UK. 

Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technological structure and under different legal regulations. As a result, what can be discovered and accessed is complicated and not always easy to explain or understand. In this post we attempt to explain the concepts and terms that describe what a user will be able to find.

In the table below is a summary of the different search and access options which can be carried out via our main website (www.webarchive.org.uk). The rest of this post will go into more detail about the terms that we have used in this table.

Table of content available in the UK Web Archive

Year

In this table, ‘year’ refers to the year in which we archived a website, or web resource. This might be different to the year in which it was published or made available online. Once you have found an archived website, you can use the calendar feature to view all the instances, or ‘snapshots’ of that page (which might run over many years).  

Legal deposit regulations came into effect in April 2013. Before this date, websites were collected selectively and with the owners’ permissions. This means the amount of content we have from this earlier period is comparatively smaller, but (with some exceptions) is all available openly online. 

From 2013 onwards, we have collected all websites that we can identify as located in or originating from the UK. We do this once per year in a process that we call the ‘annual domain crawl.’

URL look-up

If you know the URL of a website you want to find in the UK Web Archive, you can use the search box at: https://www.webarchive.org.uk. The search box should recognise that you are looking for a URL, and you can also use a drop-down menu to switch between Full Text and URL search.

URL search covers the largest portion of the collection, and our index, which makes the websites searchable, is updated daily.

UKWA Search Bar September 2023
https://www.webarchive.org.uk/

Full text search

Much of the web archive collection has been indexed and allows a free-text search of the content, i.e., any word, phrase, number etc. Note: Given the amount of data in the web archive, the number of results will be very large.

Currently, full text search is available for all our automatically collected content up to 2015, and our curator selected websites up to 2017. 

Access at legal deposit libraries

Unless the website owner gives explicit permission otherwise, legal deposit regulations restrict access to archived websites to the six UK Legal Deposit Libraries. Access is in reading rooms using a library managed computer terminal.

Users will need a reader's pass to access a reading room: check the website of each Library on how to get a reader’s pass.

Online access outside a legal deposit library

We frequently request permission from website owners to allow us to make their archived websites openly accessible through our website. Where permission has been granted, these archived websites can be accessed from our website https://www.webarchive.org.uk/ from any location where you have internet access.

Additionally, we also make archived web content we can identify as having an Open Government Licence openly accessible.

From all the requests we send for open access to websites, we receive permission from approximately 25% of website owners.  However, these websites form a significant overall amount of content available in the archive. This is because they tend to be larger websites and are captured more frequently (daily, weekly, monthly etc.) over many years.

Curator selected websites

Each year, UK Web Archive curators, and other partners who we work with, identify thousands of resources on the web that are related to a particular topic or event, or that require more frequent collection than once per year.

Many of these archived websites form part of our Topics and Themes collections. We have more than 100 of these, covering general elections, sporting events, creative works, and communications between groups with shared interests or experiences. You can browse these collections to find archived web resources relating to these topics and themes. 

Annual Domain Crawl

Separate from selections made by curators, we conduct an annual ‘domain crawl’ to collect as much of the UK Web as possible. This is done under the Non-Print Legal Deposit regulations, with one ‘crawl’ completed each year. This domain crawl is largely automated and looks to archive all .uk, .scot, .wales, .cymru and .london top-level domain websites plus others that have been identified as being UK-based and in scope for collection.
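As a rough illustration of the scoping rule described above, the sketch below checks whether a URL's host ends in one of the listed top-level domains or appears in a list of additional hosts identified as UK-based. The function name and the `extra_hosts` parameter are hypothetical, and this is not the crawler's actual scoping code.

```python
from urllib.parse import urlsplit

# Top-level domains named in this post
IN_SCOPE_TLDS = {"uk", "scot", "wales", "cymru", "london"}


def in_domain_crawl_scope(url: str, extra_hosts: frozenset[str] = frozenset()) -> bool:
    """Return True if the URL's host looks in scope for the domain crawl.

    `extra_hosts` stands in for hosts outside the listed TLDs that have
    been identified as UK-based (a hypothetical list for this sketch).
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if "." not in host:
        # No host at all (for example a URL without a scheme) or a bare label
        return False
    if host in extra_hosts:
        return True
    # Compare the last label of the host with the in-scope TLDs
    return host.rsplit(".", 1)[-1] in IN_SCOPE_TLDS


print(in_domain_crawl_scope("https://www.bl.uk/"))  # True
print(in_domain_crawl_scope("https://example.com/"))  # False
```

Note that the URL needs a scheme such as `https://` for the host to be recognised.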

21 September 2023

How YouTube is helping to drive UK Web Archive nominations

By Carlos Lelkes-Rarugal, Assistant Web Archivist, British Library

Screenshot of the UK Web Archive website 'Save a UK website' page.
https://www.webarchive.org.uk/nominate

A plethora of digital platforms now exists for all manner of online published works. YouTube itself has become more than just a platform for sharing videos: it has evolved into a platform for individuals and organisations to reach a global audience and convey powerful messages. Recently, a popular content creator on YouTube, Tom Scott, produced a short video outlining the purpose of Legal Deposit and, by extension, the work being carried out by UKWA.

Watch the video here: https://www.youtube.com/watch?v=ZNVuIU6UUiM

Tom Scott’s video, titled "This library has every book ever published", is a concise and authentic glimpse into the work being done by the British Library, one of the six UK Legal Deposit Libraries. The video highlighted some of the technology that enables preservation at scale, as well as current efforts in web archiving. Dr Linda Arnold-Stratford (Head of Liaison and Governance for the Legal Deposit Libraries) stated, “The Library collection is around 170 million items. The vast majority of that is Legal Deposit”. Ian Cooke (Head of Contemporary British and Irish Publications) noted that, with the expansion of Legal Deposit to include born-digital content, “the UK Web Archive has actually become one of the largest parts of the collection. Billions of files, about one and a half terabytes of data”.

At the time of writing, the video has had over 1.4 million views. In addition, as the video continued to gain momentum, something remarkable happened. UKWA started receiving an influx of email nominations from website owners and members of the public. This was unexpected and the volume of nominations that have since come through has been impressive and unprecedented. 

The video has led to increased engagement with the public, with nominations representing an eclectic mix of websites. The comments on the video have been truly positive. We are grateful to Tom for highlighting our work, but we are also thankful and humbled that so many commentators have left encouraging messages, which are a joy to read. The British Library has the largest web archive team of all the Legal Deposit Libraries, but this is still a small team of three curators and four technical experts, and we do everything in-house, from curation to the technical side. Web archiving is a difficult task, but we are hopeful that we can continue to develop the web archive by strengthening our ties to the community and bringing together our collective knowledge.

If you know of a UK website that should be included in the archive, please nominate it here:  https://www.webarchive.org.uk/en/ukwa/info/nominate

28 July 2023

UK-Ireland Digital Humanities Association Launch Event Report from the British Library

By Helena Byrne, Curator of Web Archives, Frankie Perry, Music Manuscripts and Archives Cataloguer and Stella Wisdom, Digital Curator for Contemporary British Collections

UK-Ireland Digital Humanities Association Launch Event Banner with event details
UK-Ireland Digital Humanities Association Launch Event Banner

The First Annual Event for the UK-Ireland Digital Humanities Association took place on 29th and 30th June 2023 at Senate House, University of London as well as online. The Association “aims to build a collaborative vision for the field, and create new and sustainable long-term partnerships in alignment with the international community”. The programme, set across one and a half days, covered a wide variety of topics and included an opportunity for the Community Interest Groups to meet up.

The British Library was involved in four presentations either as an individual presentation or as part of a collaborative project. In this blog post we hear back from the British Library colleagues who attended.

Helena Byrne, Curator of Web Archives

I was involved in two collaborative presentations with Sharon Healy (Maynooth University) and Juan-José Boté-Vericad (Universitat de Barcelona). Our first presentation was a lightning talk on day one called 'Finding Web Archives under the ‘Big Tent’ of DH: A Case Study of Ireland and the UK'. This presented one element of a forthcoming chapter in a WARCnet edited collection on web archiving. This presentation reviewed postgraduate courses for the provision of web archiving in information management and digital humanities courses in Britain and Ireland. Our second presentation was part of Panel #2 on day two called 'The Potential of a Reborn Digital Archival Edition for Collating a Corpus of Archived Web Materials'. This presentation outlined a methodology for researchers without coding skills to select, collate and analyse a corpus of archived websites. 

The highlight for me was Panel #3, especially the presentation 'Towards a Critical Black Digital Humanities: A Critical Librarian’s Response' by Naomi L.A Smith (University of West London). This presentation and the discussion that followed highlighted some of the challenges as well as some of the positive action steps that can be taken to ensure digital humanities research is more inclusive. 

Frankie Perry, Postdoctoral Research Assistant, InterMusE project, University of York / Music Manuscripts and Archives Cataloguer, British Library

I gave a paper with Prof. Rachel Cowgill (University of York) who is Principal Investigator on the InterMusE project – a collaborative venture between musicologists, computer scientists, and archive and library specialists funded by the AHRC’s UK-US New Directions for Digital Scholarship in Cultural Institutions programme. The British Library is an institutional partner, with Dr Rupert Ridgewell (Lead Curator, Printed Music) as Co-Investigator; the universities of Swansea and Illinois at Urbana-Champaign are further partners, and we’re also working with the University of Waikato. In our paper, we introduced the complexities of sourcing, digitising, and piecing together ephemera relating to historical musical events (e.g. concert programmes, flyers, newspaper reviews), using as our case study materials relating to the British Music Society (1918-1933) and its regional centres and branches. We showed the interface of the digital archive built for the project, which uses a combination of the Greenstone Digital Library system, the Mirador Annotation Viewer, and the SimpleAnnotationServer to make materials browsable, searchable, and interactive for musicologists and community users alike.

I really enjoyed the event and the snapshot it provided into current digital humanities research and techniques. I especially enjoyed a paper by Orla Delaney (Cambridge) on 'Database ethnography and the museum object record', and one by Lisa Griffith (Digital Repository of Ireland) and Laura Molloy (CODATA) titled 'Pathways to collaboration – creating and sharing GLAM image collections as data'.

Stella Wisdom, Digital Curator for Contemporary British Collections

My lightning talk 'Collaborating to Curate and Exhibit Complex Digital Literature' reflected on the cooperation between curators, researchers, experimental writers and creative practitioners to plan and produce the British Library’s Digital Storytelling exhibition (2 June 2023 - 15 October 2023). This hands-on display explores the ways that digital innovations have transformed and enhanced our narrative experiences. It showcases eleven examples of electronic literature that invite readers to become a part of the story themselves, through interactive narratives that respond to user input, reading experiences influenced and personalised by data feeds, and works that draw from multiple platforms and audience participation to create immersive story worlds. Preparing and in some cases modifying these interactive works to display them in a public gallery has only been possible through practical collaborations between Library staff and the writers and games studios who created these digital stories. I shared some insights from my experience of this co-curation work and encouraged attendees to visit the exhibition.

It was a pleasure to meet a number of people in real life who I had only previously spoken with online. A personal highlight was hearing Reham Hosny from the University of Cambridge and Minia University speak about 'DH and E-Lit Communities: Intersectional Perspectives'. In the refreshment breaks at this event I chatted with Reham about her novel, Al-Barrah (The Announcer) and she demonstrated to me how both augmented reality and hologram technologies work with the printed book to immerse readers in this thought provoking narrative.

12 July 2023

UK Web Archive Technical Update - Summer 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2023 Q1 report.

At the end of the last quarter, we launched the 2023 Domain Crawl. This started well (as described in the 2023 Q1 report) but a few days later it became clear the crawl was going a bit too well. We were collecting so quickly, we started to run out of space on the temporary store we use as a buffer for incoming content.

The full story of how we responded to this situation is quite complicated, so I wrote up the detailed analysis in a separate blog post. But in short, we took the opportunity to move to a faster transfer process and switch to a widely-used open source tool called Rclone. After about a week of downtime, the crawl was up and running again, and we were able to keep up, storing and indexing all the new WARC files as they came in.

Since then, the crawl has been running pretty well, but there have been some problems…

2023 Domain Crawl Storage and Queues

The crawler uses disk space in two main ways: the database of queues of URLs to visit (a.k.a. the crawl frontier), and the results of the crawl (the WARCs and logs). The work with Rclone helped us get the latter under control, with the move from /mnt/gluster/dc2023 to sharing the main /opt drive and uploading directly to Hadoop. These uploads run daily, leading to a saw-tooth pattern as free space gets rapidly released before being slowly re-consumed.

But the frontier shares the same disk space, and can grow very large during a crawl. So it’s important we keep an eye on things to make sure we don’t run out of space. In the past, before we made some changes to Heritrix itself, it was possible for a domain crawl to consume huge amounts of disk space. Once, we hit over 100TB for the frontier, which becomes very difficult to manage. In recent domain crawls, thanks to our configuration changes, we’ve managed to get this down to more like 10TB.

But, as you can see, around the 13th of June, we hit some kind of problem, where the apparent number of queues in the frontier started rapidly increasing, as did the rate at which we were consuming disk space. We deleted some crawler checkpoints to recover some space, as we very rarely need to restart the crawl from anything other than the most recent daily checkpoint, but this only freed-up modest amounts of space. Fortunately, the aggressive frontier growth seemed to subside before we ran out of space, and the crawl is now stable again.

Unfortunately, it’s not clear what happened. Based on previous crawls, it seems unlikely that the crawler suddenly discovered many more millions of web hosts at this point in the crawl. In the past, the number of queues has been consistently up to around 20 million at most, so this leap to over 30 million is surprising. It is possible we hit some weird web structures, but it’s difficult to tell as we don’t yet have reliable tools for quickly analysing what’s going on in this situation.

Suspiciously, just prior to this problem, we resolved a different issue with the system used to record what URLs had been seen already. This had been accidentally starved of resources, causing problems when the crawler was trying to record what URLs had been seen. This led to the gaps in the crawl monitoring data just prior to the frontier growth, as the system stopped working and required some reconfiguration. It’s possible this problem left the crawler in a bit of a confused state, leading to mis-management of the frontier database. Some analysis of the crawl will be needed to work out what happened.

In the last quarter, the new URL search feature was deployed on our BETA service. Following favourable feedback on the new feature, the main https://www.webarchive.org.uk/ service has been updated to match. We hope you find the direct URL search useful.

We’ve also updated the code that recognises whether a visitor is in a Legal Deposit reading room, as it wasn’t correctly identifying readers at Cambridge University Library. Finally, there was an issue with how the CAPTCHAs on the contact and nomination forms were being validated, which has also been resolved.

Our colleagues from Webrecorder delivered the initial set of changes to the ePub renderer, making it easier to cite a paragraph of one of our Legal Deposit eBooks. Given how long the ePub format has been around, it is perhaps surprising that support for ‘obvious’ features like citations and printing is still quite immature, inconsistent and poorly-standardised. To make citation possible, we have ended up adopting the same approach as Calibre’s Reference Mode and implemented a web-based version that integrates with our access system.

We’ve also worked on updating the service documentation based on feedback from our Legal Deposit Library partners, resolved some problems with how the single-concurrent-use locks were being handled and managed, and implemented most of the translations for the Welsh language service. The translations should be complete shortly, and an updated service can then be rolled out, including the second set of changes from Webrecorder (focused on searching the text of ePub documents).

Replication to NLS

The long process of establishing a replica of our holdings at the National Library of Scotland (NLS) is finally nearing completion. We have an up-to-date replica, and have been attempting to arrange the transfer of the servers. This turned out to be a bit more complicated than we expected, so has been delayed, but should be completed in the next few weeks.

Minor Updates

For curators, one small but important fix was improving how the W3ACT curation tool validates URLs. This was thought to have been fixed already, but the W3ACT software was not using URL validation consistently and this meant it was still blocking the creation of crawl target records with top-level domains like .sport (rather than the more familiar .uk or .com etc.). On June 23rd, we released version 2.3.5 of W3ACT, which should finally resolve this issue.
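To illustrate the kind of problem involved, here is a minimal sketch of a URL check that validates the structure of the hostname rather than comparing the top-level domain against a fixed list, so newer domains like .sport pass. It is not W3ACT's actual validation code, and the function name is our own.

```python
import re
from urllib.parse import urlsplit

# One DNS label: letters, digits and hyphens, 1 to 63 characters,
# with no leading or trailing hyphen
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_valid_target_url(url: str) -> bool:
    """Accept http(s) URLs whose host is a well-formed multi-label name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed URLs outright
        return False
    if parts.scheme not in ("http", "https"):
        return False
    labels = (parts.hostname or "").lower().split(".")
    if len(labels) < 2:
        return False
    if not all(_LABEL.fullmatch(label) for label in labels):
        return False
    # A purely numeric final label is not a top-level domain
    return not labels[-1].isdigit()


print(is_valid_target_url("https://www.example.sport/"))  # True
print(is_valid_target_url("ftp://example.com/"))  # False
```

This sketch only accepts ASCII hostnames, so internationalised names would need to be converted to their punycode form first.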

Apart from that, we also updated Apache Airflow to version 2.5.3, and leveraged our existing Prometheus monitoring system to send alerts if any of our SSL certificates are about to expire.
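The check behind such an alert is simple to sketch. The example below works out how many days remain before a certificate's expiry date and flags it once that drops below a threshold. It is an illustration of the logic only, not our Prometheus configuration; the function names and the 14-day threshold are assumptions. The date string uses the format that Python's `ssl` module reports for a certificate's `notAfter` field.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and the certificate expiry.

    `not_after` is a string such as "Jan  1 00:00:00 2024 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def should_alert(not_after: str, now: datetime, threshold_days: float = 14) -> bool:
    """Flag certificates that expire within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days


now = datetime(2023, 12, 25, tzinfo=timezone.utc)
print(days_until_expiry("Jan  1 00:00:00 2024 GMT", now))  # 7.0
print(should_alert("Jan  1 00:00:00 2024 GMT", now))  # True
```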

03 July 2023

RESAW 2023 Conference Report from the UK Web Archive

By Cui Cui, Bodleian Libraries/University of Sheffield Information School; Nicola Bingham and Helena Byrne, British Library; Alice Austin, Edinburgh University.

RESAW 2023 Exploring the Archived Web During a Highly Transformative Age

2023 saw the fifth RESAW conference. RESAW stands for Research Infrastructure for the Study of Archived Web Materials. Established in 2012, it aims to promote a collaborative European research infrastructure for the study of archived web materials and holds a conference every two years. The 2023 conference was held in Marseille on June 5-6 under the theme ‘Exploring the Archived Web During a Highly Transformative Age’. There was a packed programme with a number of UK-based presentations, especially from the UK Web Archive teams based at the Bodleian Libraries, the British Library and the Archive of Tomorrow project partner, the University of Edinburgh.

The keynote presentations from the conference were streamed live and the recording of the day two keynote ‘Saving Ukrainian Cultural Heritage Online' by Sebastian Majstorovic (European University Institute) is available on the Inspé Aix-Marseille YouTube channel.

In this blog post participants from the UK Web Archive teams have reported back on their conference experience.

Bodleian Libraries/University of Sheffield Information School 

Cui Cui, Web Archivist / PhD researcher

The experience of presenting two papers in the fifth RESAW conference turned out to be a highly emotional one for me. The first presentation alongside my fellow web archivist, Alice Austin from University of Edinburgh, marked the end of the Archive of Tomorrow project. The opportunity provided me with a chance to reflect on the work we carried out for the project. The second presentation concluded the initial phase of my PhD research project on participatory web archiving. Presenting at the conference compelled me to summarise the findings from a survey I delivered last year, aiming to gain insights into the current practices of participatory web archiving. This experience not only marked a significant milestone, but also served as a starting point to bring theories and practices together to develop better web archives. 

During a panel discussion titled “Interrogating the logics of web archiving in the era of platformization”, Jessica Ogden, Katie Mackinnon and Emily Maemura posed some critical questions about web archiving practices. Who are we collecting for, what shall we collect and how can we approach this process ethically? They put content creators at the centre of considerations and challenged web archivists to reflect critically on our practices and ethical considerations. It is reassuring that we are not alone in grappling with these complex issues as web archivists. These questions echo the constant dilemmas we face. In particular, the Archive of Tomorrow project highlighted the double-bind situations we encountered when dealing with ethical considerations and piloted engagement work with content creators. From both researchers’ and archivists’ perspectives, it is evident that these concerns call for more evidence-based studies and a deeper understanding of the views held by content creators and a wide range of other stakeholders.

Overall, the RESAW conference provided a thought-provoking experience. It allowed me to reflect on our work, consolidate my understanding, and recognise the need for continued efforts to address these complex issues.

British Library

Nicola Bingham, Lead Curator of Web Archives

I felt very privileged to attend this conference at the Mucemlab in Marseille, set in the courtyard of Fort Saint-Jean, with a stunning mix of old and new architecture and amazing sea views. During the conference, I found numerous presentations informative, engaging, thought-provoking and humorous. Two in particular, however, sparked profound reflections on curatorial praxis within the context of my own work.

Henrik Smith-Sivertsen took the audience on a captivating journey into the world of digital music archiving. With a focus on three distinct songs, he illustrated how the mediascapes in which they were published have a significant impact on the archiving process. Through his exploration, he highlighted the challenges of capturing and preserving complex digital objects from social media platforms and streaming services. The question of which version(s) to capture became a pivotal point of discussion, raising awareness of the dynamic nature of digital music and the evolving digital landscape it resides in. A thought-provoking video presentation showcased the different online iterations of Lukas Graham's "7 Years" from 2015. The variations in platforms, remixes, and user-generated content surrounding this song demonstrated the diverse ways in which music proliferates and evolves online. The presentation served as a powerful reminder of the challenges faced by archivists when attempting to capture and preserve such dynamic and multi-faceted digital musical artefacts.

An intriguing paper by Tiancheng Leo Cao (University of Texas at Austin) focused on the changing meanings of openness within the museum context. He shed light on the gradual shift from an institution-oriented understanding to an access-oriented interpretation, prioritising the needs and participation of the public. I was struck by how this ideology parallels our thinking in the UK Web Archive, where efforts are being made to embed more participation in the curatorial process. By involving communities, ensuring diverse perspectives, and including multiple voices, heritage organisations can create a more inclusive and representative platform for preserving our digital heritage.

Helena Byrne, Curator of Web Archives 

This was my second time attending a RESAW conference. The first I attended was in 2017 as part of the Web Archiving Week event held in London, when the IIPC Web Archiving Conference and RESAW collaborated on organising a full week of web archiving activities. At RESAW 2023 I co-presented two papers, both on day two of the conference. These were both collaborations that came out of the WARCnet network. The first was a joint presentation with Emily Maemura (University of Illinois) where we fed back some initial findings from the series of workshops we facilitated on ‘Describing Collections with Datasheets for Datasets’. The second was a joint presentation with Sharon Healy (Maynooth University) on ‘Assessing the Scholarly Use of Web Archives in Ireland’. In this presentation we highlighted a section from a much larger report that will be published as part of the WARCnet Papers and Special Reports.

A key highlight for me in the programme was the session 'Building the Next Generation of Web Archive Analysis Service'. This panel gave an overview of the development of the Archives Unleashed project from 2017. The project is now winding up and will be supported by the Internet Archive, which will be releasing a subscription service to Archives Research Compute Hub (ARCH) this summer. I've been lucky enough to attend Archives Unleashed events in 2017 and 2019, so it was really great to see how the project has changed over time. I wish the Archives Unleashed team all the best.

University of Edinburgh

Alice Austin, Web Archivist

The Archive of Tomorrow project team took two papers to RESAW this year. The first was a deep-dive into the Trans Health sub-category within the Talking About Health collection. The second, presented jointly with my fellow web archivist Cui Cui of the Bodleian Libraries, delivered a condensed version of the project’s Final Report, and reflected on the challenges, wins and losses of the project as a whole.

A few related themes emerged from this year’s papers. A number of speakers reflected on the value of the archived web as a source for ‘bottom-up’ perspectives on the impact of online spaces in the development of narratives at a personal and social level. Arguing that the events of 9/11 galvanised emerging web archiving efforts, Ian Milligan’s paper explored how the resultant archived pages provide a rich source for future historians wanting to understand how that day evolved; Dana Diminescu’s paper on the archive of the ‘Comme a la maison’ platform examined how changes in the language of hospitality used online can reflect changes in societal understanding of the migrant experience; and Anya Shchetvina’s paper discussed how web-based communication objects can become recontextualised as memory objects.

Another theme concerned how to do web archiving in an age of ‘platformisation’. A trio of papers by Emily Maemura, Jess Ogden, and Katie Mackinnon explored this in detail, raising important questions about how web archiving practices might better serve the communities that they draw from. Camille Riou considered the vulnerability of data in a capitalist world in the context of the withdrawal of Twitter’s API for academic research, and Cade Diehm and Benjamin Royer of the New Design Congress presented an excellent overview of the sector’s readiness to grapple with issues of the polycrisis such as colonialism, privatisation and datafication.

The sixth RESAW Conference will be held in 2025 at the University of Siegen in Germany. The theme for the conference is ‘Histories of the Datafied Web: Infrastructures, metrics, aesthetics’. More details about the conference and the call for papers will be announced in due course.
