UK Web Archive blog

2 posts from June 2025

30 June 2025

UK Web Archive Report on Digital Methodologies for the Study of Religion Symposium

By Helena Byrne, Curator of Web Archives

Digital Methodologies for the Study of Religion event details

The UK Web Archive participated in the one-day symposium Digital Methodologies for the Study of Religion on 25th June 2025. This knowledge exchange symposium was organised as part of the ESRC-funded Digital British Islam Project. It was a hybrid event with a mix of online presentations and in-person presentations at Coventry University.

The fourteen presentations were divided into four thematic panels: Panel 1 – Innovative Methods and Platforms, Panel 2 – Digital Archives and Cataloguing, Panel 3 – Mixed Methods and Online-Offline Dynamics and Panel 4 – Emerging Ethical Challenges.

The UK Web Archive participated in Panel 2 – Digital Archives and Cataloguing. The first speaker, Emily Cottrell from Université de Strasbourg, outlined a project that produced an online database to study digitised religious texts. The final two presentations in the panel were from Gary R Bunt from Digital British Islam at University of Wales Trinity Saint David and Anna Grasso from Digital Islam Across Europe at University of Edinburgh. Professor Bunt outlined the scope of the Digital British Islam web archive collection as well as the lessons learnt from developing the curation skills needed to build a web archive collection. Dr Grasso then gave an overview of the Digital Islam Across Europe web archive collection and how they were able to use the ARCH platform through their Archive-It subscription. It was really interesting to hear curatorial insights from these web archive collections and how the data collected can be used to further understand the lived experience of Islamic communities in Britain and across Europe.

The British Library presentation was “Using the UK Web Archive to understand religion on the web”. This presentation gave a general introduction to the UK Web Archive, explaining who is involved in curating its collections, and an overview of Non-Print Legal Deposit and how this shapes curation practices. It also described how religion is represented within the UK Web Archive. Religions are broadly represented across many of the over one hundred curated collections and there are currently nine individual collections that focus on a topic related to religion. The presentation also covered our recent work to publish metadata from the UK Web Archive as data by co-developing the Datasheets for Web Archives Toolkit. So far, the Scottish Churches - Collection Seed List is the only data set related to religion that has been published, but keep an eye on the UK Web Archive for updates on when the next phase of data sets will be published.

Potential research example with the Scottish Churches - Collection Seed List data set

All the presentations gave methodological insights that could be reused by researchers studying a different subject and I would highly recommend checking out the recordings when they are made available through the project website: https://digitalbritishislam.com/

One highlight for anyone who manages a GLAM sector catalogue was the presentation by Dr. Nur Efeoglu, “Curating Islam Online: Religious Heritage in UK Museum Digital Catalogues”. This presentation reviewed three UK museum catalogues for content related to the Selçuk and Ottoman period. The lessons learnt from this review are valuable for running any effective catalogue. My favourite quote from this presentation was "curation should be a collaboration not a monologue". This is something we try to encourage in the UK Web Archive by collaborating with subject experts to curate collections on various topics and by gathering nominations for the archive from the public.

20 June 2025

RESAW 2025: Report from UK Web Archive Colleagues

RESAW 2025 Conference Banner

Introduction

The RESAW (Research Infrastructure for the Study of Archived Web) 2025 conference took place at the University of Siegen in Germany. It was organized by the Collaborative Research Centre 1187 “Media of Cooperation” at the University of Siegen in cooperation with the Centre for Contemporary and Digital History (C²DH) at the University of Luxembourg.

This was a special conference as the organisers of this, past and future conferences gave a special presentation (it included cake and balloons) to mark ten years since the first RESAW conference was held in Aarhus, Denmark. They all paid tribute to Niels Brügger from Aarhus University, who founded RESAW and helped develop the RESAW community.

The conference explored its theme, “The Datafied Web”, from a historical perspective. The call for papers stated that “we would like to explore the historical roots, trends, and trajectories that shaped the data-driven paradigm in web development and to examine the genealogies of the datafied and metrified web”. The opening panel discussion aimed to define what is meant by “the datafied web”.

UK Web Archive colleagues from Bodleian Libraries, the British Library and National Library of Scotland attended the conference. There was a packed programme with a variety of presentation formats and workshops that shared best practices and innovative projects in the world of web archiving. In this blog post they report highlights of their conference experience.

Reflections

Helena Byrne - Curator of Web Archives - British Library 

I was part of the panel called “Web archives practices” along with colleagues from the Portuguese and Belgian web archives. My presentation, “Lessons learnt from preparing collections as data: the UK Web Archive experience”, gave an overview of the project that ran from October 2022 to November 2024 to develop a framework for publishing UK Web Archive curated collections as data.

There were so many great presentations and panels at this conference that it is hard to pick just one highlight. The opening panel discussion defining “the datafied web” raised lots of interesting points. In this panel Anne Helmond made the important point that “while the front-end of the web has changed dramatically, the back-end has undergone a deeper transformation” and that the study of the web requires a mix of methodologies and resources. Another session that stood out was the panel on Past Metrics. We were reminded in this session about the visitor counters that used to be popular on early versions of websites. This was especially poignant as, just a few days before this presentation, I received an enquiry about a website. When I used the Memento Time Travel search function to check whether any other web archives held a copy of it, I found one copy from its earlier years. This version had a prominent visitor counter and evoked a nostalgic response, as I realised I hadn’t seen one for many years and had forgotten about this feature.

Beatrice Cannelli - Curatorial and Policy Research Officer (Algorithmic Archive Project) - Bodleian Libraries

At this year’s RESAW conference, my colleague Pierre Marshall and I organised a workshop titled “Towards an ‘Algorithmic Archive’: Developing Collaborative Approaches to Persistent Social and Algorithmic Data Services for Researchers”. The workshop brought together diverse perspectives from practitioners and researchers working with social media data, fostering discussions regarding the development of sustainable strategies to collect social media platforms. The workshop was a valuable opportunity to gather insights for the Algorithmic Archive project, particularly regarding issues and expectations related to short- and long-term access to social media data. 

Among the many engaging sessions, I found the one on “the challenges of archival practices” particularly interesting. Using the case of the web archive at Aix-Marseille University, the panellists underscored the importance of encouraging critical engagement with issues researchers face, such as data ethics, data surveillance and archival responsibility, especially when dealing with potentially sensitive web archived data. Similarly, the panel on “Data Regimes” reflected on the complexity of data stewardship, where open data policies often clash with ethical concerns, especially when dealing with sensitive content like social media data. This often leaves researchers and librarians to navigate these grey areas without clear guidance, raising questions about reuse and long-term preservation.

Pierre Marshall - Technical Research Officer (Algorithmic Archive Project) - Bodleian Libraries

Vasco Rato gave an overview of arquivo.pt’s API. Arquivo.pt runs a CDX(J) server, and about half of the traffic to the archive comes from the API. Rato mentioned that sometimes people ask for WARCs, but what they really want is just the text or media content of a page. It would be a better user experience to provide text or image search directly through the API. The CDX(J) server also helps anyone wanting to page through the archive without downloading the whole thing. Most researchers don't have the capacity to store and process 1.5 PB of WARC files.
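To give a feel for why an index is lighter to work with than the WARCs themselves, here is a minimal Python sketch that reads CDXJ-style index lines and filters them by media type. It assumes the common CDXJ layout of a sort key, a 14-digit timestamp and a JSON block on each line. The sample lines, the `Capture` type and the `parse_cdxj` helper are made up for illustration and are not output from, or part of, arquivo.pt's API; the fields a real server returns may differ.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class Capture(NamedTuple):
    urlkey: str      # SURT-style sort key, e.g. "pt,example)/"
    timestamp: str   # 14-digit capture time, YYYYMMDDhhmmss
    fields: dict     # remaining JSON metadata for the capture


def parse_cdxj(lines: Iterable[str]) -> Iterator[Capture]:
    """Yield one Capture per non-empty CDXJ line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two spaces only; the JSON block may contain spaces.
        urlkey, timestamp, payload = line.split(" ", 2)
        yield Capture(urlkey, timestamp, json.loads(payload))


# Invented sample lines (assumption), standing in for an index server response.
SAMPLE = [
    'pt,example)/ 20100101120000 {"url": "http://example.pt/", "mime": "text/html", "status": "200"}',
    'pt,example)/logo.png 20100101120005 {"url": "http://example.pt/logo.png", "mime": "image/png", "status": "200"}',
]

# Keep only the HTML captures, without touching any WARC file.
html_captures = [c for c in parse_cdxj(SAMPLE) if c.fields.get("mime") == "text/html"]
```

Each index line is a few hundred bytes, so a researcher can narrow down to the captures they care about before requesting any page content.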

Helge Holzmann of the Internet Archive ran a workshop on the Archives Research Compute Hub (ARCH) service. Holzmann talked us through a series of recipes for the ArchiveSpark library, intended to make it easier for researchers to run data-centric queries against items in the Internet Archive. Besides the content of the workshop, I appreciated Holzmann's use of 2000s-era retro web graphics to illustrate his presentation. We are all here for the datafied web, but beyond the data I'm happy to celebrate the art of the early web.

The BnF also presented their Skyblogs collection, including work on parsing the page markup (back) into a data model for analysis across the corpus.

The common theme I took from these sessions is that there's a lot to learn from making large web datasets usefully available to academics. Hopefully next year Beatrice and I will be back with some examples of what internet researchers could do with our planned social media archive.

Andrea Kocsis - Chancellor’s Fellow in Humanities Informatics, University of Edinburgh / The National Librarian’s Fellow in Digital Scholarship 2024-25, The National Library of Scotland

I was glad to present our work on web archive engagement with Leontien Talboom, where we discussed how to support not only traditional readers and computational users, but also the digitally curious who often fall between categories. I also shared a glimpse into the creative process behind Digital Ghosts, the web archive exhibition I’m currently developing with artist Dorsey Kaufmann and the National Library of Scotland, which will take place in November at Inspace in Edinburgh.

One of the talks that stayed with me was Ian Milligan’s reflection on the ethical challenges of crowdsourced digital archives in the context of 9/11. I plan to bring this ethical dilemma of accessibility, metadata, and data protection into my teaching next year in Future Libraries and Archives at the Edinburgh Futures Institute. The most inspiring talk for me, though, was Nanna Bonde Thylstrup’s keynote on data loss. Her interdisciplinary framing - drawing equally from humanities, sociology, and STEM - challenged the usual discourse of data loss as an evolutionary narrative and instead reframed it as a question of digital politics and infrastructure. Overall, RESAW was inspiring both intellectually and as a generous, thoughtful community of dedicated netpreservers.

Conclusion

Attending the RESAW conference is a great opportunity to exchange ideas, learn about innovative research projects, and foster collaborations in the field of web archive studies. The UK Web Archive colleagues contributed significantly through presentations and active participation in other sessions. Participation at conferences in this manner supports the recognition and reuse of the UK Web Archive collections as a significant resource in the wider academic discourse on web archiving. We look forward to participating in the next edition of the conference, which will take place in June 2027 at the University of Groningen, the Centre for Media and Journalism Studies & Centre for Digital Humanities. The theme for 2027 is “Engaging Public Internet Histories: New Ways of Telling the Story of & with the Web”. So keep an eye out in 2026 for the call for papers for the seventh RESAW conference.