UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


68 posts categorized "Contemporary Britain"

27 September 2023

What can you discover and access in the UK Web Archive collection?

UK Web Archiving team, British Library

The UK Web Archive collects and preserves websites from the UK. When we started collecting in 2005, we sought permission from owners to archive their websites. Since 2013, legal deposit regulations have allowed us to automatically collect all websites that we can identify as located in or originating from the UK. 

Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technological structure and under different legal regulations. As a result, what can be discovered and accessed is complicated and not always easy to explain or understand. In this post we attempt to explain the concepts and terms that describe what a user will be able to find.

In the table below is a summary of the different search and access options which can be carried out via our main website (www.webarchive.org.uk). The rest of this post will go into more detail about the terms that we have used in this table.

Table of content available in the UK Web Archive

Year

In this table, ‘year’ refers to the year in which we archived a website, or web resource. This might be different to the year in which it was published or made available online. Once you have found an archived website, you can use the calendar feature to view all the instances, or ‘snapshots’ of that page (which might run over many years).  

Legal deposit regulations came into effect in April 2013. Before this date, websites were collected selectively and with the owners’ permissions. This means the amount of content we have from this earlier period is comparatively smaller, but (with some exceptions) is all available openly online. 

From 2013 onwards, we have collected all websites that we can identify as located in or originating from the UK. We do this once per year in a process that we call the ‘annual domain crawl.’

URL look-up

If you know the URL of a website you want to find in the UK Web Archive, you can use the search box at: https://www.webarchive.org.uk. The search box should recognise that you are looking for a URL, and you can also use a drop-down menu to switch between Full Text and URL search.

URL search covers the largest share of the collection, and our index, which makes the websites searchable, is updated daily.

UKWA Search Bar September 2023
https://www.webarchive.org.uk/

Full text search

Much of the web archive collection has been indexed and allows a free-text search of the content, i.e., any word, phrase, number etc. Note: Given the amount of data in the web archive, the number of results will be very large.

Currently, full text search is available for all our automatically collected content up to 2015, and our curator selected websites up to 2017. 

Access at legal deposit libraries

Unless the website owner gives explicit permission otherwise, legal deposit regulations restrict access to archived websites to the six UK Legal Deposit Libraries. Access is in reading rooms using a library managed computer terminal.

Users will need a reader's pass to access a reading room: check the website of each Library on how to get a reader’s pass.

Online access outside a legal deposit library

We frequently request permission from website owners to allow us to make their archived websites openly accessible through our website. Where permission has been granted, these archived websites can be accessed from our website https://www.webarchive.org.uk/ from any location where you have internet access.

We also make archived web content that we can identify as having an Open Government Licence openly accessible.

From all the requests we send for open access to websites, we receive permission from approximately 25% of website owners. However, these websites form a significant proportion of the content available in the archive. This is because they tend to be larger websites and are captured more frequently (daily, weekly, monthly etc.) over many years.

Curator selected websites

Each year, UK Web Archive curators, and other partners who we work with, identify thousands of resources on the web that are related to a particular topic or event, or that require more frequent collection than once per year.

Many of these archived websites form part of our Topics and Themes collections. We have more than 100 of these, covering general elections, sporting events, creative works, and communications between groups with shared interests or experiences. You can browse these collections to find archived web resources relating to these topics and themes. 

Annual Domain Crawl

Separate from selections made by curators, we conduct an annual ‘domain crawl’ to collect as much of the UK Web as possible. This is done under the Non-Print Legal Deposit regulations, with one ‘crawl’ completed each year. This domain crawl is largely automated and looks to archive all .uk, .scot, .wales, .cymru and .london top-level domain websites plus others that have been identified as being UK-based and in scope for collection.
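As a rough illustration of the scoping rule described above, here is a minimal sketch in Python. It is not the crawler's actual configuration: the function name and the `extra_hosts` parameter are hypothetical, and the only facts taken from this post are the five top-level domains and the idea that other hosts can be identified as UK-based.

```python
from urllib.parse import urlsplit

# Top-level domains named in this post as in scope for the annual domain crawl.
UK_TLDS = (".uk", ".scot", ".wales", ".cymru", ".london")


def in_domain_crawl_scope(url, extra_hosts=frozenset()):
    """Return True if the URL's host falls under a UK TLD or is explicitly listed.

    `extra_hosts` is a hypothetical stand-in for hosts that have been
    identified as UK-based by some other means.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False
    if host.endswith(UK_TLDS):
        return True
    return host in extra_hosts
```

For example, `in_domain_crawl_scope("https://www.bl.uk/")` is true under this sketch, while a `.com` host is only in scope if it appears in `extra_hosts`.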

21 September 2023

How YouTube is helping to drive UK Web Archive nominations

By Carlos Lelkes-Rarugal, Assistant Web Archivist, British Library

Screenshot of the UK Web Archive website 'Save a UK website' page.
https://www.webarchive.org.uk/nominate

A plethora of digital platforms now exists for all manner of online published works. YouTube has become more than just a platform for sharing videos: it has evolved into a way for individuals and organisations to reach a global audience and convey powerful messages. Recently, a popular content creator on YouTube, Tom Scott, produced a short video outlining the purpose of Legal Deposit and, by extension, the work being carried out by UKWA.

Watch the video here: https://www.youtube.com/watch?v=ZNVuIU6UUiM

Tom Scott’s video, titled "This library has every book ever published", is a concise and authentic glimpse into the work being done by the British Library, one of the six UK Legal Deposit Libraries. The video highlighted some of the technology that enables preservation at scale, as well as current efforts in web archiving. Dr Linda Arnold-Stratford (Head of Liaison and Governance for the Legal Deposit Libraries) stated, “The Library collection is around 170 million items. The vast majority of that is Legal Deposit”. Ian Cooke (Head of Contemporary British and Irish Publications) highlighted that, with the expansion of Legal Deposit to include born-digital content, “the UK Web Archive has actually become one of the largest parts of the collection. Billions of files, about one and a half terabytes of data”.

At the time of writing, the video has had over 1.4 million views. In addition, as the video continued to gain momentum, something remarkable happened. UKWA started receiving an influx of email nominations from website owners and members of the public. This was unexpected and the volume of nominations that have since come through has been impressive and unprecedented. 

The video has led to increased engagement with the public, with nominations representing an eclectic mix of websites. The comments on the video have been truly positive. We are grateful to Tom for highlighting our work, and we are also thankful and humbled that so many commenters have left encouraging messages, which are a joy to read. The British Library has the largest web archive team of all the Legal Deposit Libraries, but this is still a small team of three curators and four technical experts, and we do everything in-house, from curation to the technical side. Web archiving is a difficult task, but we are hopeful that we can continue to develop the web archive by strengthening our ties to the community and bringing together our collective knowledge.

If you know of a UK website that should be included in the archive, please nominate it here:  https://www.webarchive.org.uk/en/ukwa/info/nominate

12 July 2023

UK Web Archive Technical Update - Summer 2023

By Andy Jackson, Web Archive Technical Lead, British Library

This is a summary of what’s been going on since the 2023 Q1 report.

At the end of the last quarter, we launched the 2023 Domain Crawl. This started well (as described in the 2023 Q1 report) but a few days later it became clear the crawl was going a bit too well. We were collecting so quickly, we started to run out of space on the temporary store we use as a buffer for incoming content.

The full story of how we responded to this situation is quite complicated, so I wrote up the detailed analysis in a separate blog post. But in short, we took the opportunity to move to a faster transfer process and switch to a widely used open source tool called Rclone. After about a week of downtime, the crawl was up and running again, and we were able to keep up, storing and indexing all the new WARC files as they came in.

Since then, the crawl has been running pretty well, but there have been some problems…

2023 Domain Crawl Storage and Queues

The crawler uses disk space in two main ways: the database of queues of URLs to visit (a.k.a. the crawl frontier), and the results of the crawl (the WARCs and logs). The work with Rclone helped us get the latter under control, with the move from /mnt/gluster/dc2023 to sharing the main /opt drive and uploading directly to Hadoop. These uploads run daily, leading to a saw-tooth pattern as free space gets rapidly released before being slowly re-consumed.

But the frontier shares the same disk space, and can grow very large during a crawl. So it’s important we keep an eye on things to make sure we don’t run out of space. In the past, before we made some changes to Heritrix itself, it was possible for a domain crawl to consume huge amounts of disk space. Once, we hit over 100TB for the frontier, which becomes very difficult to manage. In recent domain crawls, with our configuration changes, we’ve managed to get this down to more like 10TB.
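To give a flavour of what "keeping an eye on things" can mean in practice, here is a minimal sketch of a free-space check in Python. It is not our monitoring setup: the function names and the 10% threshold are assumptions chosen purely for illustration.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_attention(fraction_free, threshold=0.10):
    """Flag the disk when free space drops below an (assumed) threshold."""
    return fraction_free < threshold
```

With the saw-tooth pattern described above, a check like this is most informative just before the daily upload, when free space is at its lowest.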

But, as you can see, around the 13th of June, we hit some kind of problem, where the apparent number of queues in the frontier started rapidly increasing, as did the rate at which we were consuming disk space. We deleted some crawler checkpoints to recover some space, as we very rarely need to restart the crawl from anything other than the most recent daily checkpoint, but this only freed up modest amounts of space. Fortunately, the aggressive frontier growth seemed to subside before we ran out of space, and the crawl is now stable again.

Unfortunately, it’s not clear what happened. Based on previous crawls, it seems unlikely that the crawler suddenly discovered many more millions of web hosts at this point in the crawl. In the past, the number of queues has been consistently up to around 20 million at most, so this leap to over 30 million is surprising. It is possible we hit some weird web structures, but it’s difficult to tell as we don’t yet have reliable tools for quickly analysing what’s going on in this situation.

Suspiciously, just prior to this problem, we resolved a different issue with the system used to record what URLs had been seen already. This had been accidentally starved of resources, causing problems when the crawler was trying to record what URLs had been seen. This led to the gaps in the crawl monitoring data just prior to the frontier growth, as the system stopped working and required some reconfiguration. It’s possible this problem left the crawler in a bit of a confused state, leading to mismanagement of the frontier database. Some analysis of the crawl will be needed to work out what happened.

In the last quarter, the new URL search feature was deployed on our BETA service. Following favourable feedback on the new feature, the main https://www.webarchive.org.uk/ service has been updated to match. We hope you find the direct URL search useful.

We’ve also updated the code that recognises whether a visitor is in a Legal Deposit reading room, as it wasn’t correctly identifying readers at Cambridge University Library. Finally, there was an issue with how the CAPTCHAs on the contact and nomination forms were being validated, which has also been resolved.

Our colleagues from Webrecorder delivered the initial set of changes to the ePub renderer, making it easier to cite a paragraph of one of our Legal Deposit eBooks. Given how long the ePub format has been around, it is perhaps surprising that support for ‘obvious’ features like citations and printing is still quite immature, inconsistent and poorly standardised. To make citation possible, we have ended up adopting the same approach as Calibre’s Reference Mode and implemented a web-based version that integrates with our access system.

We’ve also worked on updating the service documentation based on feedback from our Legal Deposit Library partners, resolved some problems with how the single-concurrent-use locks were being handled and managed, and implemented most of the translations for the Welsh language service. The translations should be complete shortly, and an updated service can then be rolled out, including the second set of changes from Webrecorder (focused on searching the text of ePub documents).

Replication to NLS

The long process of establishing a replica of our holdings at the National Library of Scotland (NLS) is finally nearing completion. We have an up-to-date replica, and have been attempting to arrange the transfer of the servers. This turned out to be a bit more complicated than we expected, so has been delayed, but should be completed in the next few weeks.

Minor Updates

For curators, one small but important fix was improving how the W3ACT curation tool validates URLs. This was thought to have been fixed already, but the W3ACT software was not using URL validation consistently, and this meant it was still blocking the creation of crawl target records with top-level domains like .sport (rather than the more familiar .uk or .com etc.). On June 23rd, we released version 2.3.5 of W3ACT, which should finally resolve this issue.
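To illustrate the kind of validation problem involved, here is a minimal sketch in Python of a URL check that accepts any syntactically valid top-level domain instead of relying on a fixed list of familiar ones. This is not the W3ACT code: the function name and the specific rules are assumptions for illustration only, and the sketch handles ASCII (including punycode) hostnames but not Unicode ones.

```python
import re
from urllib.parse import urlsplit

# One DNS label: letters, digits and hyphens, neither starting nor ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_target_url(url):
    """Accept http(s) URLs whose host is a dotted name with well-formed labels.

    No list of known TLDs is consulted, so newer ones such as .sport pass.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    if parts.scheme not in ("http", "https"):
        return False
    labels = (parts.hostname or "").lower().split(".")
    if len(labels) < 2:
        return False
    if not all(_LABEL.match(label) for label in labels):
        return False
    # A purely numeric final label looks like an IP address, not a TLD.
    return not labels[-1].isdigit()
```

The design choice here is to validate the shape of the hostname and leave the question of whether a TLD really exists to DNS, which avoids the need to keep a hard-coded list up to date.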

Apart from that, we also updated Apache Airflow to version 2.5.3, and leveraged our existing Prometheus monitoring system to send alerts if any of our SSL certificates are about to expire.
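The details of our Prometheus alerting rules are beyond the scope of this update, but the underlying check is simple. As a standalone illustration, not our actual configuration, here is a minimal Python sketch that works out how many days remain before a certificate expires. The function names are hypothetical; the `notAfter` string format is the one returned by Python's standard `ssl` module.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days between `now` (epoch seconds, default: current time) and a
    certificate's notAfter timestamp, e.g. "Jan  5 09:34:43 2018 GMT"."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect to a host over TLS and return its certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

An alert could then fire when `days_until_expiry(fetch_not_after(host))` drops below some chosen number of days; the right threshold depends on how long renewals take.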

03 July 2023

RESAW 2023 Conference Report from the UK Web Archive

By Cui Cui (Bodleian Libraries/University of Sheffield Information School), Nicola Bingham and Helena Byrne (British Library), and Alice Austin (Edinburgh University).

RESAW 2023 Exploring the Archived Web During a Highly Transformative Age

The 2023 event was the fifth RESAW conference. RESAW stands for Research Infrastructure for the Study of Archived Web Materials. Established in 2012, it aims to promote a collaborative European research infrastructure for the study of archived web materials and holds a conference every two years. The 2023 conference was held in Marseille on June 5-6 under the theme ‘Exploring the Archived Web During a Highly Transformative Age’. There was a packed programme with a number of UK-based presentations, especially from the UK Web Archive teams based at the Bodleian Libraries and the British Library, and from Archive of Tomorrow project partner the University of Edinburgh.

The keynote presentations from the conference were streamed live and the recording of the day two keynote ‘Saving Ukrainian Cultural Heritage Online’ by Sebastian Majstorovic (European University Institute) is available on the Inspé Aix-Marseille YouTube channel.

In this blog post participants from the UK Web Archive teams have reported back on their conference experience.

Bodleian Libraries/University of Sheffield Information School 

Cui Cui, Web Archivist / PhD researcher

The experience of presenting two papers in the fifth RESAW conference turned out to be a highly emotional one for me. The first presentation alongside my fellow web archivist, Alice Austin from University of Edinburgh, marked the end of the Archive of Tomorrow project. The opportunity provided me with a chance to reflect on the work we carried out for the project. The second presentation concluded the initial phase of my PhD research project on participatory web archiving. Presenting at the conference compelled me to summarise the findings from a survey I delivered last year, aiming to gain insights into the current practices of participatory web archiving. This experience not only marked a significant milestone, but also served as a starting point to bring theories and practices together to develop better web archives. 

During a panel discussion titled “Interrogating the logics of web archiving in the era of platformization”, Jessica Ogden, Katie Mackinnon and Emily Maemura posed some critical questions about web archiving practices. Who are we collecting for, what shall we collect and how can we approach this process ethically? They put content creators at the centre of these considerations and challenged web archivists to reflect critically on our practices. It is reassuring that we are not alone in grappling with these complex issues, which echo the constant dilemmas we face as web archivists. In particular, the Archive of Tomorrow project highlighted the double-bind situations we encountered when dealing with ethical considerations and piloted engagement work with content creators. From both researchers’ and archivists’ perspectives, it is clear that these concerns call for more evidence-based studies and a deeper understanding of the views held by content creators and a wide range of other stakeholders.

Overall, the RESAW conference provided a thought-provoking experience. It allowed me to reflect on our work, consolidate my understanding, and recognise the need for continued efforts to address these complex issues.

British Library

Nicola Bingham, Lead Curator of Web Archives

I felt very privileged to attend this conference at the Mucemlab in Marseille, set in the courtyard of Fort Saint-Jean, with a stunning mix of old and new architecture and amazing sea views. I found numerous presentations informative, engaging, thought-provoking and humorous, but two in particular sparked profound reflections on curatorial praxis within the context of my own work.

Henrik Smith-Sivertsen took the audience on a captivating journey into the world of digital music archiving. With a focus on three distinct songs, he illustrated how the mediascapes in which they were published have a significant impact on the archiving process. Through his exploration, he highlighted the challenges of capturing and preserving complex digital objects from social media platforms and streaming services. The question of which version(s) to capture became a pivotal point of discussion, raising awareness of the dynamic nature of digital music and the evolving digital landscape it resides in. A thought-provoking video presentation showcased the different online iterations of Lukas Graham's "7 Years" from 2015. The variations in platforms, remixes, and user-generated content surrounding this song demonstrated the diverse ways in which music proliferates and evolves online. The presentation served as a powerful reminder of the challenges faced by archivists when attempting to capture and preserve such dynamic and multi-faceted digital musical artefacts.

Tiancheng Leo Cao from the University of Texas at Austin's intriguing paper focused on the changing meanings of openness within the museum context. He shed light on the gradual shift from an institution-oriented understanding to an access-oriented interpretation, prioritising the needs and participation of the public. I was struck by how this ideology parallels our thinking in the UK Web Archive where efforts are being made to embed more participation in the curatorial process. By involving communities, ensuring diverse perspectives, and including multiple voices, heritage organisations can create a more inclusive and representative platform for preserving our digital heritage.

Helena Byrne, Curator of Web Archives 

This was my second time attending a RESAW conference. The first I attended was in 2017, as part of the Web Archiving Week event held in London, when the IIPC Web Archiving Conference and RESAW collaborated on organising a full week of web archiving activities. At RESAW 2023 I co-presented two presentations, both on day two of the conference and both collaborations that came out of the WARCnet network. The first was a joint presentation with Emily Maemura (University of Illinois), where we fed back some initial findings from the series of workshops we facilitated on ‘Describing Collections with Datasheets for Datasets’. The second was a joint presentation with Sharon Healy (Maynooth University) on ‘Assessing the Scholarly Use of Web Archives in Ireland’. In this presentation we highlighted a section from a much larger report that will be published as part of the WARCnet Papers and Special Reports.

A key highlight for me in the programme was the session 'Building the Next Generation of Web Archive Analysis Service'. This panel gave an overview of the development of the Archives Unleashed project from 2017. The project is now winding up and will be supported by the Internet Archive who will be releasing a subscription service to Archives Research Compute Hub (ARCH) this summer. I've been lucky enough to attend Archives Unleashed events in 2017 and 2019 so it was really great to see how the project has changed over time. I wish the Archives Unleashed team all the best.

University of Edinburgh

Alice Austin, Web Archivist

The Archive of Tomorrow project team took two papers to RESAW this year. The first was a deep-dive into the Trans Health sub-category within the Talking About Health collection. The second, presented jointly with my fellow web archivist Cui Cui of the Bodleian Libraries, delivered a condensed version of the project’s Final Report, and reflected on the challenges, wins and losses of the project as a whole.

A few related themes emerged from this year’s papers. A number of speakers reflected on the value of the archived web as a source for ‘bottom-up’ perspectives on the impact of online spaces in the development of narratives at a personal and social level. Arguing that the events of 9/11 galvanised emerging web archiving efforts, Ian Milligan’s paper explored how the resultant archived pages provide a rich source for future historians wanting to understand how that day evolved; Dana Diminescu’s paper on the archive of the ‘Comme a la maison’ platform examined how changes in the language of hospitality used online can reflect changes in societal understanding of the migrant experience; and Anya Shchetvina’s paper discussed how web-based communication objects can become recontextualised as memory objects.

Another theme concerned how to do web archiving in an age of ‘platformisation’. A trio of papers by Emily Maemura, Jess Ogden, and Kate MacKinnon explored this in detail, raising important questions about how web archiving practices might better serve the communities that they draw from. Camille Riou considered the vulnerability of data in a capitalist world in the context of the withdrawal of Twitter’s API for academic research, and Cade Diehm and Benjamin Royer of the New Design Congress presented an excellent overview of the sector’s readiness to grapple with issues of the polycrisis such as colonialism, privatisation and datafication. 

The sixth RESAW Conference will be held in 2025 at University of Siegen in Germany. The theme for the conference is ‘Histories of the Datafied Web: Infrastructures, metrics, aesthetics’. More details about the conference and the call for papers will be announced in due course. 

28 June 2023

IIPC Web Archiving Conference 2023 Report from the UK Web Archive

By Nicola Bingham, Helena Byrne, Ian Cooke, Carlos Lelkes-Rarugal, Andrew Jackson and Richard Price (British Library), Leontien Talboom (Cambridge University Library), and Mark Simon Haydn (National Library of Scotland).

IIPC WAC2023 Conference Banner

The IIPC 2023 Web Archiving Conference was hosted by the Netherlands Institute of Sound and Vision in Hilversum and co-organised by KB, National Library of the Netherlands. There was an online session held on May 3rd and the main in-person event took place on May 11th and 12th. There was a packed programme that included Q&A sessions for pre-recorded presentations on the online day, and presentations, workshops, lightning talks and posters at the in-person event. This was the first in-person IIPC conference since 2019, when the event was hosted by the National and University Library in Zagreb (NSK), Croatia.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the conference both as delegates and presenters. In this blog post they have reported back on their conference experience.

British Library

Nicola Bingham, Lead Curator of Web Archiving

Attending the IIPC conference in person for the first time since 2019 was a great experience. The combination of reconnecting with colleagues after four long years and the (literally) colourful ambience of the Beeld & Geluid (Institute for Sound & Vision), created an atmosphere brimming with renewed energy and optimism. I will highlight just a few of the presentations and conversations that were interesting from my point of view.

I enjoyed hearing about De Digitale Stad Herleeft (the Digital City Revived) from Marleen Stikker, founder and ‘mayor’ of DDS, Marieke Brugman of UNESCO and Tjarda de Haan of Bits and Bytes United. Presentations focused on the “webarchaeological excavations” which took place to reconstruct, preserve, store and make accessible this unique digital heritage based on KB’s XS4ALL web collection, which was listed as UNESCO Memory of the World Heritage for the Dutch list and is now under review for the worldwide list.

I enjoyed insights into diversity and co-curation from Jesper Verhoef, a Researcher-in-Residence at KB working on "Mapping the Dutch LGBT+ Web Archive". Jesper's work utilises KB's collections to explore the unique web sphere formed by LGBTQ+ (or queer) people and how this has evolved over time. It sparked intriguing insights and perspectives which could be applied to our own LGBTQ+ collection.

Collaboration and innovation in web archiving were recurring themes at the conference. Valuable insights were shared by the team from the Library of Congress, emphasising their investment in and education of curators to effectively participate in the web archiving process. 

Finally, I had the privilege of presenting the research by WG2 of the WARCnet project, ‘Surveying the Landscape of COVID-19 Web Collections in European GLAM Institutions’ in a session dedicated to Covid-19 collections. Our findings shed light on the scope of these collections, how they were defined, and the common challenges institutions face in making them accessible for research purposes. 

Helena Byrne, Curator of Web Archives 

I participated in both the online and in-person events, collaborating on a presentation for the online day and co-facilitating a workshop at the in-person event. I was involved in the ‘Developing a Reborn Digital Archival Edition as an Approach for the Collection, Organisation, and Analysis of Web Archive Sources’ project with Sharon Healy (Maynooth University) and Juan-José Boté-Vericad (Universitat de Barcelona). Along with Emily Maemura (University of Illinois) we facilitated Workshop-01 ‘Describing Collections with Datasheets for Datasets’. This was part of a series of workshops we hosted to see if the Datasheets for Datasets framework could be applied to UK Web Archive collections published as data.

As a participant, I found so many great takeaways from this conference. One of the sessions that stands out most for me is ‘Renewal in Web Archiving: Towards More Inclusive Representation and Practices’, held on day two of the conference. The conversations in this session were really useful in helping me ensure that we continue to develop more inclusive collections and opportunities to engage in the curation process. In this session we heard about the next steps for the Archiving the Black Web (ATBW) project. Although this is a USA-based project, its impact will be global, as they are now developing a training programme to improve the curation and research use of the archived black web.

Andrew Jackson, Web Archive Technical Lead

I was involved in a couple of tool workshops during the conference, where it was great to see the interest in shared tooling, and the collaborative commitments this implies. I was also interested in how many of the presentations related to issues around information literacy. For more, see my blog post Reflections on the IIPC Web Archiving Conference 2023.

Ian Cooke, Head of Contemporary British & Irish Publications

This year’s conference was a strong reminder that web archiving is about people - the people whose lives and experiences are expressed in the collections we build; the people whose imaginations shaped the way we use, and have used, the web over time; and the people who are working across collecting, preserving and researching the archived web.

There was a great mix of presentations, blending new developments in technologies, evolving research methods, and approaches to creating and understanding collections, in ways that were accessible to all attendees. Giulia Carla Rossi and I were both pleased to talk about the development of our practice at the British Library, and legal deposit libraries, in collecting ‘emerging formats’.  

The IIPC itself is celebrating its 20th year, and the conference reflected that sense of celebration. It also demonstrated the maturing of practice, and reflection on web archiving methods and goals, at many of the organisations represented. A highlight of the conference was the presentations by Makiba Foster and Zakiya Collier on the Archiving the Black Web project, and the potential of web archiving to contribute to ‘black self-education practices, collective study and librarianship’. Foster and Collier argued for well-resourced institutions to take responsibility for providing support to community heritage organisations in building inclusive collections, and also stressed the need for ethical considerations, in particular regarding the rights of people represented within collections, when building collections.        

Overall, it was a privilege to take part in the conference and to have the time to connect in person with a community of web archive practitioners and researchers, being able to share knowledge and experience and reminding ourselves of what we have in common.

Carlos Lelkes-Rarugal, Assistant Web Archivist

I very much enjoyed my second attendance at an IIPC annual web archiving conference. 2019 was my first one, so I didn’t quite know what to expect. Suffice to say, the 2023 WAC was just as successful and another enjoyable, unique experience.

There’s such a diverse background of people. I think this is because web archiving is approached very differently, as each organisation has its particular way of going about it, which is why there is such an emphasis on sharing knowledge and information. I attended many talks and learnt about new methods of quality assurance, the infrastructure set-up of institutions and policies on collecting; whichever presentation it is, you can be sure there’s something innovative going on that could be applied to your domain.

The UK Web Archive itself represents the six UK Legal Deposit Libraries, and as such, we’re inherently maintaining relationships but, more importantly, trying to build new relationships for new opportunities, collaborations, and potential partnerships. We’re a small team (larger than others) but still relatively small when considering the scope of our work, and I think this is exactly what the IIPC can help with. Like many organisations, the UK Web Archive does at times find web archiving to be a challenge, and the IIPC helps foster a network of people who are willing to share their knowledge and expertise so that we can connect with them to tackle these emerging and ever-evolving challenges. There’s a collective effort to further web archiving; we’re trying to advance a field that has a lot of potential, so if you’re interested, please join this invaluable community.

Richard Price, Head of Contemporary British Collections

I attended this conference to reacquaint myself with web archiving in a little more detail than I have for some years. It was a privilege to attend, seeing so many different kinds of response from the international community and, if I may say so, I felt especially proud of my colleagues at the British Library for their presentations and workshops. If there was a common thread through the papers it was that the problem-solving and information-sharing intrinsic to the web archiving community are values translated from the early days of the web itself – that substantial part of the early Internet that was altruistic and public-minded – and, in today’s archiving world, underpinned by layers of technical, social, and curatorial expertise. Thank you to IIPC and to Sound and Vision at Hilversum, and to all those presenting and attending!

Cambridge University Library

Leontien Talboom, Technical Analyst

This was my first time attending IIPC apart from a very brief appearance on a panel in 2022. I was fortunate enough to be a co-presenter on two talks during the conference. One was with my colleague Mark Haydn where we presented on the datasets that we were able to create during the Archive of Tomorrow project and the other was with my colleague Caylin Smith where we explored the difficulties and opportunities of capturing the University of Cambridge domain. 

Both presentations were really enjoyable and it was great to get feedback and questions from colleagues across our field. As this was my first time attending IIPC I wasn’t sure what to expect. However, I was pleasantly surprised by the wide range of topics and formats discussed. One that really stood out to me was the work of Emily Escamilla, who talked about reference rot and what would happen if GitHub were to disappear. This really showcased how much we, as an academic sector, rely on these types of sources to be around when referencing them, but this is not necessarily a given.

National Library of Scotland

Mark Haydn, Metadata Analyst

It has been a few years since I've been at an in-person conference, and I had forgotten how nice it can be to visit another city and spend a few days immersed in presentations and conversations with people working in the same area. Sometimes this meant hearing about something immediately relevant to my own metadata work at the National Library of Scotland, like hearing Tom Storrar of the UK Government Web Archive assess how effective their work ramping up collecting early in the pandemic to capture frequent website updates had been, or listening to members of the ResPaDon Project detail their experiences extending regional access to web archives collections across France. Other presentations served as an opportunity to better understand topics being explored further afield: there were many demonstrations of potential uses of AI, not all of them ominous, ranging from automatically producing descriptive summaries of technical metadata, for use in Library of Congress catalogue records, to generating a generic Stirring Plenary Speech at short notice.

As well as listening in, my colleague Leontien Talboom and I presented some of our work on the Archive of Tomorrow project, summarising the progress that's been possible since the development of the British Library's web archive metadata export. We heard about other institutional and international approaches and platforms for looking at web archives at scale, like Archive-It's ARCH tools, and caught fellow Archive of Tomorrow web archivist Cui Cui's discussion of knowledge sharing before heading back to the UK.

The 2024 IIPC Web Archive Conference will be hosted by the Bibliothèque nationale de France (BnF) on 24-26 April. Follow the IIPC Twitter account for updates and the call for papers due out in early autumn.

26 June 2023

LGBTQ+ Connections and Community

By Ash Green, CILIP LGBTQ+ Network, and Goldsmiths, University of London

The Marlborough Pub and Theatre

I was browsing through the LGBTQ+ Lives Online collection recently, and reminded myself that I had added The Marlborough Pub and Theatre to it when I first began co-curating the collection. As far as I can remember, it was one of the first sites I added to the archive. I wanted it in there because it had been an important part of my coming out around 2017. I had a personal connection to it, and I wanted there to be a record of the impact it had on me. I know future explorers of the UK Web Archive won’t know why that site is archived, but maybe they will stumble across this blog post in connection to it and understand its importance to at least one BTQ person – me.

So, why did I specifically want this site in there? Well, in 2017, when I was working out what support there was for me as a trans/gender non-conforming person, I discovered The Clare Project, which is a trans support group in Brighton. I went along to it, and afterwards we went to The Marlborough Theatre and Pub, which was a venue with a long history of support for the LGBTQ+ community. The pub was the sort of place where I didn’t know anyone, but just being there made me feel okay about who I was. It was the first time being in an LGBTQ+ venue had felt like that to me. And I realised that there were other people there who seemed to be on similar paths in their lives. It was a reassuring place, and it was a place where I learnt about how diverse the LGBTQ+ community was. I remember going to a queer cabaret there, and it was such an amazing, heart-warming, queer, eye-opening and fun night. The pub is still there – now called The Actors. I’m not a regular visitor, and if you mention my name in there, they won’t know who I am. But when I call in from time to time when I’m in Brighton, I still get that sense of belonging to a community even if I’m quietly sitting in a corner reading on my own. It is a place that re-energises me.

It got me wondering about other sites in the LGBTQ+ Lives Online collection focused on artistic communities that may have had an impact on others similar to the one The Marly had on me.

So, for example, what joy did members of South Wales Gay Men's Chorus, Songbirds Choir, or True Colours LGBT Choir feel when they first sang with these choirs?

How excited were listeners when they heard a new track on LGBT Underground that struck a strong emotional chord with them, and has stayed with them forever?

How did filmmakers feel when their first film appeared at the Scottish Queer International Film Festival, LezDiff, or the Iris Prize? And who in the audience saw something for the first time at these film festivals that resonated strongly with them?

And what sense of connection and belonging did those in queer / LGBTQ+ art groups such as The Queer Dot, Sanctuary Queer Arts, Wise Thoughts, and VFD find within their arts communities?

And maybe there are LGBTQ+ people who attended Queen Jesus, Teatro do Mundo, or even The Marlborough theatre performances, who realised the voice on stage was talking directly to them, and they clearly understood its message in relation to who they are as an LGBTQ+ person.

I know I can’t possibly be the only LGBTQ+ person who feels a strong connection with a place or community like these. Maybe you have a story to share about one of the sites in the collection? Or maybe you have a site like one of these that you would like us to add. You can nominate sites for inclusion here: https://www.webarchive.org.uk/nominate

We can’t curate the whole of the UK web on our own. We need your help to ensure that information, discussions, personal experiences and creative outputs related to the LGBTQ+ community are preserved for future generations. Anyone can suggest UK published websites to be included in the UK Web Archive by filling in the above nominations form.

If you would like to explore any of the sites mentioned in this blog post, you can find them in the Arts, Literature, Music & Culture subsection of the LGBTQ+ Lives Online collection: https://www.webarchive.org.uk/en/ukwa/collection/3090

04 May 2023

Regal Reflections: Exploring a New UK Web Archive Collection on King Charles III

Nicola Bingham, Lead Curator of Web Archiving, British Library

It has been 70 years since a new monarch was crowned in the UK. As we bear witness to a new era of the British monarchy and reflect on its role within the UK, the UK Web Archive is recording and preserving this momentous occasion by capturing websites in a special collection about King Charles III. Work started in earnest on this collection on 8th September 2022, when the late Queen, Elizabeth II, passed away and Charles became King. It also forms part of a larger series of collections about the British monarchy in the early 21st century, curated by staff in the UK Legal Deposit Libraries.

Through this series of special collections, we can trace how the Royal Family has adopted the internet to communicate more efficiently with their supporters, members of the public, and other stakeholders as well as to promote their charitable causes and connect with younger generations who are more likely to engage with social media. As well as ‘official’ information, the UK Web Archive is also capturing user-generated content from a wide range of publishers including the general public, as recorded in websites, blogs, and social media posts, much of which is not available through traditional historical records.

In building this collection we have several priorities. As with all our collecting activity, our mission is to save ephemeral digital content ensuring it is preserved for the historical record. A good illustration of this is that the official website of Charles, Prince of Wales, published in his former position as heir apparent, no longer exists on the internet and is only available in the web archive.

Screenshot of the archived website of the Prince of Wales. Image of the Prince walking in a garden

Archived copy of www.princeofwales.gov.uk/ in the UK Web Archive (21/06/2019) https://www.webarchive.org.uk/wayback/archive/20190621085304/https://www.princeofwales.gov.uk/

We hope that the collection can help to provide a more comprehensive understanding of King Charles III and his impact on society, by preserving a diverse range of viewpoints and perspectives. There is a huge groundswell of affection for the new King, and the Royal Family in general, and a great sense of celebration and optimism in the lead-up to the Coronation on 6th May; however, there is, of course, also opposition, scepticism, and criticism, all of which is reflected online. It is important to capture all sides of the conversation to provide a balanced view of the Royal Family and create a digital legacy that will be of interest to researchers to study, and future generations to appreciate.

Another of our aims is to represent different communities across the UK and Commonwealth in the UK Web Archive. The collection will reflect how towns, cities, and villages celebrate the Coronation. Many people will be holding street parties, such as those in Calderdale, West Yorkshire, where residents are encouraged to get together and make the Coronation Weekend a community celebration to remember.

Seal of King Charles III - red background and white seal

In Glasgow, organisations and communities are encouraged to engage in various Coronation initiatives and events in order to create a positive lasting legacy. The Big Help Out, for example, is an opportunity to highlight the positive impact of volunteering. It is hoped that the extra bank holiday for the Coronation will be remembered as a day of donating time and skills to help charities, causes, and the vulnerable.

Along with street parties, other traditions surrounding significant royal events include the manufacture and purchase of souvenirs. This article on the V&A’s website, preserved in the UK Web Archive, shows a few examples of souvenirs from past events, such as the 'Jubilee' biscuit tin made in 1887 for the Carlisle-based biscuit manufacturer Carr & Co. to commemorate Queen Victoria's Golden Jubilee, and the 'Coronation Coach' biscuit tin resembling the ornate coach used by King George VI and Queen Elizabeth on their Coronation Day on 12 May 1937. Of course, now that online shopping is ubiquitous, any type of royal-themed memorabilia or amenity can be purchased, from the more traditional, such as this mug from the National Archives shop, to the more esoteric, such as hiring a King Charles lookalike.

One of the more peculiar aspects of the British monarchy is that special occasions are often associated with an official dish. Queen Elizabeth had curried chicken for her Coronation, which was a relatively exotic choice in the Britain of the 1950s, while King Charles has a ceremonial quiche (disappointingly not named Quiche l’Reign) which is intended for people to cook at home as part of the Coronation Big Lunch.

Tweet from the Prime Minister's Twitter account discussing the upcoming Coronation.

Image from UK Government Twitter showing the Queen’s Coronation banquet: UK Prime Minister (@10DowningStreet) / Twitter (webarchive.org.uk)

In conclusion, the UK Web Archive is a collection affording a unique opportunity to witness and record unfolding historical events. As a historical figure, Charles III and the events that occur during his reign will be of significant interest to researchers, scholars, and the general public. Please do visit the King Charles III collection in the UK Web Archive, and if you know of a website that should be included in this collection, please nominate it here: https://www.webarchive.org.uk/en/ukwa/info/nominate

 

12 January 2023

Changes in Nature’s Calendar – Early Bloomers

The Importance of Citizen Science in Monitoring and Adapting to Climatic Change

By Andrea Deri, Cataloguer and UKWA Climate Change Collection’s lead curator

On 1 January 2023, I had my usual walk from Folkestone Gardens via Sue Godfrey Nature Park, Deptford, London Borough of Lewisham to Greenwich Park, Royal Borough of Greenwich. Overcast, temperature in single digits, humid but calm. Trees and shrubs mostly leafless: an accentuating background to patches of bright green mosses.

I was hoping to see some flowers on winter blossoming plants, for example the bell-shaped flowers of clematis ‘Jingle Bell’ in St Alfege Church’s yard, and the spidery flowers of witch hazels in the Royal Observatory Garden in Greenwich. I was also curious to see what other flowers I would find blooming earlier than usual, triggered by the warming climate. A month earlier (1 December 2022) I had joined the annual wildflower ‘hunt’ on the first day of winter, a survey of species in flower in my locality, Deptford’s urban area, organised since 2009 by the Creekside Education Trust and the London Natural History Society, so I expected several early bloomers. Here is Creekside’s blog post of the 2021 wildflower survey.

While the witch hazels (Fig. 1) did not disappoint, I was in for a surprise with clematis “Jingle Bell”: only the silky, fluffy seedheads were left, as it had finished flowering earlier this year. I was lucky to see its last flowers on Christmas Eve 2022 (Fig. 2). Other early flowers greeted me on a hazelnut shrub in Sue Godfrey Nature Park (Fig. 4). But I was truly astonished to see daffodils fully opened in a park by Creekside, just across from the Creekside Discovery Centre (Fig. 3).

Witch hazel flower

Figure 1 Witch hazel (Hamamelis sp.) in flower. Photo: Andrea Deri, Royal Observatory Garden, Greenwich, London, 1 January 2023

I started searching for phenology calendars, almanacs, and any information on the blooming time of these species in my local and other areas in order to compare my observations with the “expected” (based on previous years) flowering periods. The online findings supported my assumption: I did observe earlier than expected flowerings, with the most specific data for the hazelnut.

Clematis ‘Jingle Bell’ 
According to the Royal Horticultural Society (RHS) clematis “Jingle Bell” flowers in winter and early spring. Compared to this broad-brush period, my observation this year suggests this individual specimen finished flowering much earlier than expected and earlier than I had observed this specimen in previous years. 

Clematis flower

Figure 2 Clematis cirrhosa “Jingle Bells”: one bell-shaped flower and fluffy seedheads. Photo: Andrea Deri, St Alfege Church, Greenwich, London, 24 December 2022

Daffodil 
A post on the Daffodil Society website prompted me to search RHS’s website for daffodils, where February-March was quoted as the usual flowering period, which is more precise than for the clematis. Early flowering horticultural varieties of daffodil, however, can bloom as early as January, according to one of the Gardeners’ World blog posts. I may have encountered an early flowering garden variety. In addition to its literary associations, this iconic flower may now also have become a conversation starter about the climate crisis. Would its freshness and brightness frame a difficult dialogue in hope?

Daffodil flowers

Figure 3 Daffodils (Narcissus sp.) in flower. Photo: Andrea Deri, near Creekside Discovery Centre, Deptford, London, 1 January 2023

Hazelnut 
The Woodland Trust Nature’s Calendar offered me the tool I had really been looking for: a peer-reviewed database linked to a live map that allowed me to compare my observation with those of fellow observers in the UK at day-level precision.

Hazelnut flower

Figure 4 Hazelnut (Corylus avellana) in flower: crimson female flowers, yellow catkin male flowers. Photo: Andrea Deri, Sue Godfrey Nature Park, Deptford, London, 1 January 2023

Before I signed up to add my hazelnut observation, I took a screenshot of the “Add a Record” webpage on 5 January 2023 that showed the first hazelnut flower sighting on 4 January 2023. (Fig.5.)

Screenshot of Woodland Trust 'Nature's Calendar' website

Figure 5 Screenshot of Nature's Calendar, Woodland Trust. Photo: Andrea Deri, 20:34 GMT, 5 January 2023

Hazelnut first flowering was among the recently recorded data on Nature’s Calendar (Fig. 5). My observation of hazelnut flowers on 1 January 2023 was not extraordinary, but it was earlier than the one featured online. Hazelnut is expected to be in flower in early January according to Nature’s Calendar (downloadable pdf). But as early as 1 January? To answer this question, I had to register to enter my data. When I entered my observation date, I received an automatic note, all in red:

“This date falls outside of the expected range

The date you have entered is unusually early or late for this species and event; please double check the record. If it’s correct we’d like to know more about your observation, so please add a comment before clicking ‘next’ to continue. If possible, a photo is very useful too. Please note that your record will not appear on the live map until it has been checked by the Nature’s Calendar team.”

For evidence, I uploaded one of my photos of the hazelnut flowers (Fig. 4) and a description of the place and circumstances. My hazelnut flowering observations may turn out to be some of the earliest this year. To prove or refute this statement I rely on the Woodland Trust’s online database, the Nature’s Calendar team’s peer review and keen monitoring by fellow citizen scientists. This type of on-land and online live collaboration in monitoring the slightest phenological changes is gaining increasing importance in addressing local impacts of climatic changes.

Will hazelnut flower earlier and earlier in the future? Only regular visitors can answer this question by carefully monitoring the same hazelnut shrub, recording the date of the first flowers, and uploading the data to Nature’s Calendar.

Nature’s Calendar invites citizen scientists to monitor a carefully selected list of species of shrubs, trees, flowers, grasses, fungi, birds, insects and amphibians throughout the year. Their changes over time will give us information on how these species (plants, animals and fungi) adapt to the unfolding climatic changes. Phenological change data contributes to better decisions in wildlife conservation, among other areas.

While I was browsing, I came across several websites and webpages on various other decisions and local actions related to climate change adaptation. For example: What can I do about climate change in my garden? I also found out what local residents are doing about the climate crisis in the boroughs of Lewisham and Greenwich: Climate Action Lewisham; Climate Home – a home of creativity, imagination and community activism by young people; Lewisham Climate Action Bond as an example of Local Climate Bonds; Lewisham Climate Emergency Declaration and Action Plan; CAPE Informing Local Action on Climate Change / London Borough of Lewisham; The Climate Emergency website of Royal Borough of Greenwich; Carbon Neutral Greenwich; and Greenwich Climate Network.

Some of the activities and organisations were familiar to me, but I was taken aback by others: “How could I miss them? I live here!” It is a fast-changing landscape of actions and online information. Having saved these sites for further action, I also realised some of this online content could be highly ephemeral. Uploading my list of URLs to the UKWA Climate Change collection saved local digital content for future research on climatic changes.

Sauntering through streets, gardens and parks has turned into an archival journey, connecting past, present and future. Fit for the first day of the year. Fit for any day, anywhere your interests, experience, and local knowledge cross paths with climatic changes.

The Natural History Museum’s community science webpage lists a broad range of UK wildlife monitoring activities related to climatic changes, including the New Year Plant Hunt of the Botanical Society of Britain and Ireland and the upcoming annual Big Garden Birdwatch (27-28 January 2023) organised by the Royal Society for the Protection of Birds since 1979. 

Contribute to the web archive
Your next walk or online stroll may inspire you to nominate some of your local climate initiatives (civil society, governmental, business, media, arts and academia) to the UK Web Archive Climate Change Collection. Many thanks for your consideration.
