Web Archives Collections as Data at the Digital Humanities in the Nordic and Baltic Countries (DHNB) Workshop Report
By Helena Byrne, Curator of Web Archives
The UK Web Archive was one of five web archive organisations represented at the Web Archive Collections as Data workshop at the Digital Humanities in the Nordic and Baltic Countries (DHNB) 2025 conference, held at the National Museum of Estonia in Tartu. The UK Web Archive has participated in the 2023, 2024 and 2025 DHNB conferences. The workshop was organised by Olga Holownia, Senior Programme Officer at the International Internet Preservation Consortium (IIPC). It served as an introduction to web archives and web archive collections as data, with a focus on use cases but also on the challenges of producing, sharing and publishing collections as data.
The first stage of the workshop gave a brief overview of the collections as data movement within the GLAM sector and introduced the Collections as Data Checklist developed by members of the GLAM Labs community. It also covered what web archives are and where they can be accessed, how a selection of web archives are making their collections available as data, and the potential research opportunities these collections offer. The panel included Olga Holownia (IIPC), Gustavo Candela (University of Alicante), Helena Byrne (British Library), Jon Carlstedt Tønnessen (National Library of Norway), Anders Klindt Myrvoll (Royal Danish Library), Sophie Ham and Steven Claeyssens (KB, National Library of the Netherlands).
The UK Web Archive presentation promoted the recently published Datasheets for Web Archives Toolkit and the new metadata data sets that are available through the British Library Research Repository. The presentation gave an overview of how the project started, the background to how the Toolkit was prepared and how it was implemented.
The activity stage of the workshop focused on how we could adapt the Collections as Data Checklist for web archives. The participants were split into three groups. They reviewed the checklist by considering whether each item is applicable to web archives, how it could be adapted where it does not fit, and what solutions could be developed to overcome the more challenging sections. There was a rich discussion amongst the groups, which benefited from having both researchers and library professionals involved in reviewing the checklist.
The general consensus from the groups was that more detail may be needed to accompany the Checklist so that it can be applied to web archive collections. Some of the points on the Checklist are particularly difficult to apply to web archive collections. There was a lot of discussion on the first two points, which cover licensing and citation. These are particularly difficult for web archives due to national legislation. Most web archives operate on a dark or grey access model, and most onsite terminals used to access web archives have copy and paste functions disabled, so citation can become problematic. However, the participants were positive about the potential to apply an annotated or adapted Collections as Data Checklist specifically for web archives. The brainstorming session at this workshop was the first step in starting a discussion about what resources are needed to improve the process of publishing web archive collections as data. The second of these discussions took place at the IIPC Web Archiving Conference in April 2025.
For a more general report from the DHNB conference, see the Digital Scholarship blog: https://blogs.bl.uk/digital-scholarship/2025/04/dhnb-2025-digital-humanities-in-the-nordic-and-baltic-countries-conference-report.html