UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites



18 September 2024

Creating and Sharing Collection Datasets from the UK Web Archive

By Carlos Lelkes-Rarugal, Assistant Web Archivist

We have data, lots and lots of data, which is of unique importance to researchers but presents significant challenges for those wanting to interact with it. Our holdings grow by terabytes each month, which creates hurdles both for the UK Web Archive team, who are tasked with organising the data, and for researchers who wish to access it. With data of this scale and complexity, how can one begin to comprehend what one is dealing with and understand how the collection came into being?

This challenge is not unique to digital humanities. It is a common issue in any field dealing with vast amounts of data. A recent special report on the skills required by researchers working with web archives was produced by the Web ARChive studies network (WARCnet). This report, based on the Web Archive Research Skills and Tools Survey (WARST), provides valuable insights and can be accessed here: WARCnet Special Report - An overview of Skills, Tools & Knowledge Ecologies in Web Archive Research.

At the UK Web Archive, legal and technical restrictions dictate how we can collect, store and provide access to the data. To enhance researcher engagement, Helena Byrne, Curator of Web Archives at the British Library, and Emily Maemura, Assistant Professor at the School of Information Sciences at the University of Illinois Urbana-Champaign, have been collaborating to explore how and which types of datasets can be published. Their efforts include developing options that would enable users to programmatically examine the metadata of the UK Web Archive collections.

Thematic collections and our metadata

To understand this rich metadata, we first have to examine how it is created and where it is held.

Since 2005 we have used a number of applications, systems, and tools to enable us to curate websites. The most recent is the Annotation and Curation Tool (ACT), which enables authenticated users, mainly curators and archivists, to create metadata that define and describe targeted websites. ACT also helps users build collections around topics and themes, such as the UEFA Women's Euro England 2022. To build collections, ACT users first input basic metadata to build a record around a website, including information such as website URLs, descriptions, titles, and crawl frequency. With this basic ACT record describing a website, additional metadata can be added, for example metadata that is used to assign a website record to a collection. One of the great features of ACT is its extensibility, allowing us, for instance, to create new collections.

These collections, which are based around a theme or an event, give us the ability to highlight archived content. The UK Web Archive holds millions of archived websites, many of which may be unknown or rarely viewed, and so to help showcase a fraction of our holdings, we build these collections which draw on the expertise of both internal and external partners.

Exporting metadata as CSV and JSON files

That’s how we create the metadata, but how is it stored? ACT is a web application, and the metadata created through it is stored in a Postgres relational database, allowing authenticated users to input metadata in accordance with the fields within ACT. As the Assistant Web Archivist, I was given the task of extracting the metadata from the database and exporting each selected collection as a CSV and a JSON file. To get to that stage, the Curatorial team first had to decide which fields were to be exported.

The ACT database is quite complex, in that there are 50+ tables which need to be considered. To enable local analysis of the database, a static copy is loaded into a database administration application, in this case, DBeaver. Using the free-to-use tool, I was able to create entity relationship diagrams of the tables and provide an extensive list of fields to the curators so that they could determine which fields are the most appropriate to export.

I then worked on a refined version of the list of fields, running a script for the designated Collection and pulling out specific metadata to be exported. To extract the fields and the metadata into an exportable format, I created an SQL (Structured Query Language) script whose results can be exported as JSON, CSV, or both:

Select
    taxonomy.parent_id as "Higher Level Collection",
    collection_target.collection_id as "Collection ID",
    taxonomy.name as "Collection or Subsection Name",
    CASE
        WHEN collection_target.collection_id = 4278 THEN 'Main Collection'
        ELSE 'Subsection'
    END AS "Main Collection or Subsection",
    target.created_at as "Date Created",
    target.id as "Record ID",
    field_url.url as "Primary Seed",
    target.title as "Title of Target",
    target.description as "Description",
    target.language as "Language",
    target.license_status as "Licence Status",
    target.no_ld_criteria_met as "LD Criteria",
    target.organisation_id as "Institution ID",
    target.updated_at as "Updated",
    target.depth as "Depth",
    target.scope as "Scope",
    target.ignore_robots_txt as "Robots.txt",
    target.crawl_frequency as "Crawl Frequency",
    target.crawl_start_date as "Crawl Start Date",
    target.crawl_end_date as "Crawl End Date"
From
    collection_target
    Inner Join target On collection_target.target_id = target.id
    Left Join taxonomy On collection_target.collection_id = taxonomy.id
    Left Join organisation On target.organisation_id = organisation.id
    Inner Join field_url On field_url.target_id = target.id
Where
    collection_target.collection_id in (4278, 4279, 4280, 4281, 4282, 4283, 4284) And
    (field_url.position Is Null Or field_url.position In (0))

JSON output example for the Women’s Euro Collection
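
The post does not say which tool performed the final export, so the following is only a minimal sketch of one way a query result could be written out as both CSV and JSON. It uses Python with an in-memory SQLite database standing in for the Postgres ACT database. The `seeds` table, its sample rows and the `export_rows` helper are hypothetical; only the column aliases and the collection ID 4278 are taken from the script above.

```python
import csv
import io
import json
import sqlite3

# Hypothetical stand-in for the ACT database: one simplified table in an
# in-memory SQLite database (the real system is Postgres with 50+ tables).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute("CREATE TABLE seeds (collection_id INTEGER, record_id INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO seeds VALUES (?, ?, ?)",
    [
        (4278, 1, "https://example.org/"),
        (4279, 2, "https://example.com/news"),
    ],
)

# A cut-down query reusing a few of the column aliases from the script above.
QUERY = """
SELECT
    collection_id AS "Collection ID",
    CASE WHEN collection_id = 4278 THEN 'Main Collection' ELSE 'Subsection' END
        AS "Main Collection or Subsection",
    record_id AS "Record ID",
    url AS "Primary Seed"
FROM seeds
ORDER BY record_id
"""


def export_rows(connection, query):
    """Run the query once and return the result as (csv_text, json_text)."""
    rows = [dict(row) for row in connection.execute(query)]
    headers = list(rows[0].keys()) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=headers, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), json.dumps(rows, indent=2, ensure_ascii=False)


csv_text, json_text = export_rows(conn, QUERY)
print(csv_text)
print(json_text)
```

Running the query once and serialising the same rows twice keeps the CSV and JSON files consistent with each other, which matters when both are published side by side.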

Accessing and using the data

The published metadata is available from the BL Research Repository within the UK Web Archive section, in the folder “UK Web Archive: Data”. Each dataset includes the metadata seed list in both CSV and JSON formats, a datasheet which gives provenance information about how the dataset was created, and a data dictionary that defines each of the data fields. The first collections selected for publication were:

  1. Indian Ocean Tsunami December 2004 (January-March 2005) [https://doi.org/10.23636/sgkz-g054]
  2. Blogs (2005 onwards) [https://doi.org/10.23636/ec9m-nj89] 
  3. UEFA Women's Euro England 2022 (June-October 2022) [https://doi.org/10.23636/amm7-4y46] 
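
As a rough illustration of what examining this metadata programmatically might look like, here is a short Python sketch that loads a seed list in JSON form and counts records by field. The field names follow the column aliases in the SQL script above; the sample records, the crawl frequency values and the helper names are invented for illustration and are not taken from the published datasets.

```python
import json
from collections import Counter

# Hypothetical sample in the shape of the JSON export. The keys mirror the
# column aliases of the SQL script; the values are invented for illustration.
SAMPLE_JSON = """
[
  {"Record ID": 1, "Primary Seed": "https://example.org/",
   "Main Collection or Subsection": "Main Collection", "Crawl Frequency": "MONTHLY"},
  {"Record ID": 2, "Primary Seed": "https://example.com/news",
   "Main Collection or Subsection": "Subsection", "Crawl Frequency": "DAILY"},
  {"Record ID": 3, "Primary Seed": "https://example.net/blog",
   "Main Collection or Subsection": "Subsection", "Crawl Frequency": null}
]
"""


def load_records(text):
    """Parse a JSON seed list into a list of dictionaries."""
    return json.loads(text)


def count_by(records, field):
    """Count records per value of one field, grouping empty values as 'Unknown'."""
    return Counter(record.get(field) or "Unknown" for record in records)


records = load_records(SAMPLE_JSON)
print(count_by(records, "Main Collection or Subsection"))
print(count_by(records, "Crawl Frequency"))
```

With a downloaded dataset, the same helpers could be pointed at the contents of the JSON file instead of the inline sample; the data dictionary that accompanies each dataset is the authority on what each field actually contains.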

31 July 2024

If websites could talk (part 6)

By Ely Nott, Library, Information and Archives Services Apprentice

After another extended break, we return to a conversation between UK domain websites as they try to parse out who among them should be crowned the most extraordinary…

“Where should we start this time?” asked Following the Lights. “Any suggestions?”

“If we’re talking weird and wonderful, clearly we should be considered first,” urged Temporary Temples, cutting off Concorde Memorabilia before they could make a sound.

“We should choose a website with a real grounding in reality,” countered the UK Association of Fossil Hunters.

“So, us, then,” shrugged the Grampian Speleological Group. “Or if not, perhaps the Geocaching Association of Great Britain?”

“We’ve got a bright idea!” said Lightbulb Languages, “Why not pick us?”

“There is no hurry,” soothed the World Poohsticks Championships. “We have plenty of time to think, think, think it over.”

“This is all a bit too exciting for us,” sighed the Dull Men’s Club, who was drowned out by the others.

“The title would be right at gnome with us,” said The Home of Gnome, with a little wink and a nudge to the Clown Egg Gallery, who cracked a smile.

“Don’t be so corny,” chided the Corn Exchange Benevolent Society. “Surely the title should go to the website that does the most social good?”

“Then what about Froglife?” piped up the Society of Recorder Players.

“If we’re talking ecology, we’d like to be considered!” the Mushroom enthused, egged on by Moth Dissection UK. “We have both aesthetic and environmental value.”

“Surely, any discussion of aesthetics should prioritise us,” preened Visit Stained Glass, as Old so Kool rolled their eyes.

The back and forth continued, with time ticking on until they eventually concluded that the most extraordinary site of all had to be… Saving Old Seagulls.

Check out previous episodes in this series by Hedley Sutton: Part 1, Part 2, Part 3, Part 4 and Part 5

 

27 September 2023

What can you discover and access in the UK Web Archive collection?

UK Web Archiving team, British Library

The UK Web Archive collects and preserves websites from the UK. When we started collecting in 2005, we sought permission from owners to archive their websites. Since 2013, legal deposit regulations have allowed us to automatically collect all websites that we can identify as located in or originating from the UK. 

Since its inception, the UK Web Archive has collected websites using a number of different methods, with an evolving technological structure and under different legal regulations. As a result, what can be discovered and accessed is complicated and, therefore, not always easy to explain and understand. In this post we attempt to explain the concepts and terms that describe what a user will be able to find.

In the table below is a summary of the different search and access options which can be carried out via our main website (www.webarchive.org.uk). The rest of this post will go into more detail about the terms that we have used in this table.

Table of content available in the UK Web Archive

Year

In this table, ‘year’ refers to the year in which we archived a website, or web resource. This might be different to the year in which it was published or made available online. Once you have found an archived website, you can use the calendar feature to view all the instances, or ‘snapshots’ of that page (which might run over many years).  

Legal deposit regulations came into effect in April 2013. Before this date, websites were collected selectively and with the owners’ permissions. This means the amount of content we have from this earlier period is comparatively smaller, but (with some exceptions) is all available openly online. 

From 2013 onwards, we have collected all websites that we can identify as located in or originating from the UK. We do this once per year in a process that we call the ‘annual domain crawl.’

URL look-up

If you know the URL of a website you want to find in the UK Web Archive, you can use the search box at: https://www.webarchive.org.uk. The search box should recognise that you are looking for a URL, and you can also use a drop-down menu to switch between Full Text and URL search.

URL search covers the largest share of the collection, and our index, which makes the websites searchable, is updated daily.

UKWA Search Bar September 2023
https://www.webarchive.org.uk/

Full text search

Much of the web archive collection has been indexed and allows a free-text search of the content, i.e., any word, phrase, number etc. Note: Given the amount of data in the web archive, the number of results will be very large.

Currently, full text search is available for all our automatically collected content up to 2015, and our curator selected websites up to 2017. 

Access at legal deposit libraries

Unless the website owner gives explicit permission otherwise, legal deposit regulations restrict access to archived websites to the six UK Legal Deposit Libraries. Access is in reading rooms using a library managed computer terminal.

Users will need a reader's pass to access a reading room: check the website of each Library on how to get a reader’s pass.

Online access outside a legal deposit library

We frequently request permission from website owners to allow us to make their archived websites openly accessible through our website. Where permission has been granted, these archived websites can be accessed from our website https://www.webarchive.org.uk/ from any location where you have internet access.

We also make archived web content that we can identify as having an Open Government Licence openly accessible.

From all the requests we send for open access to websites, we receive permission from approximately 25% of website owners. However, these websites account for a significant share of the content available in the archive. This is because they tend to be larger websites and are captured more frequently (daily, weekly, monthly etc.) over many years.

Curator selected websites

Each year, UK Web Archive curators, and other partners who we work with, identify thousands of resources on the web that are related to a particular topic or event, or that require more frequent collection than once per year.

Many of these archived websites form part of our Topics and Themes collections. We have more than 100 of these, covering general elections, sporting events, creative works, and communications between groups with shared interests or experiences. You can browse these collections to find archived web resources relating to these topics and themes. 

Annual Domain Crawl

Separate from selections made by curators, we conduct an annual ‘domain crawl’ to collect as much of the UK Web as possible. This is done under the Non-Print Legal Deposit regulations, with one ‘crawl’ completed each year. This domain crawl is largely automated and looks to archive all .uk, .scot, .wales, .cymru and .london top-level domain websites plus others that have been identified as being UK-based and in scope for collection.

21 September 2023

How YouTube is helping to drive UK Web Archive nominations

By Carlos Lelkes-Rarugal, Assistant Web Archivist, British Library

Screenshot of the UK Web Archive website 'Save a UK website' page.
https://www.webarchive.org.uk/nominate

There currently exists a plethora of digital platforms for all manner of online published works. YouTube itself has become more than just a platform for sharing videos; it has evolved into a platform for individuals and organisations to reach a global audience and convey powerful messages. Recently, a popular content creator on YouTube, Tom Scott, produced a short video outlining the purpose of Legal Deposit and, by extension, the work being carried out by UKWA.

Watch the video here: https://www.youtube.com/watch?v=ZNVuIU6UUiM

Tom Scott’s video, titled "This library has every book ever published", is a concise and authentic glimpse into the work being done by the British Library, one of the six UK Legal Deposit Libraries. The video highlighted some of the technology that enables preservation at scale, as well as the current efforts in web archiving. Dr Linda Arnold-Stratford (Head of Liaison and Governance for the Legal Deposit Libraries) stated, “The Library collection is around 170 million items. The vast majority of that is Legal Deposit”. Ian Cooke (Head of Contemporary British and Irish Publications) highlighted that, with the expansion of Legal Deposit to include born-digital content, “the UK Web Archive has actually become one of the largest parts of the collection. Billions of files, about one and a half petabytes of data”.

At the time of writing, the video has had over 1.4 million views. In addition, as the video continued to gain momentum, something remarkable happened. UKWA started receiving an influx of email nominations from website owners and members of the public. This was unexpected and the volume of nominations that have since come through has been impressive and unprecedented. 

The video has led to increased engagement with the public, with nominations representing an eclectic mix of websites. The comments on the video have been truly positive. We are grateful to Tom for highlighting our work, and we are thankful and humbled that so many commenters have left encouraging messages, which are a joy to read. The British Library has the largest web archive team of all the Legal Deposit Libraries, but this is still a small team of three curators and four technical experts, and we do everything in-house, from curation to the technical side. Web archiving is a difficult task, but we are hopeful that we can continue to develop the web archive by strengthening our ties to the community and bringing together our collective knowledge.

If you know of a UK website that should be included in the archive, please nominate it here:  https://www.webarchive.org.uk/en/ukwa/info/nominate

28 July 2023

UK-Ireland Digital Humanities Association Launch Event Report from the British Library

By Helena Byrne, Curator of Web Archives, Frankie Perry, Music Manuscripts and Archives Cataloguer and Stella Wisdom, Digital Curator for Contemporary British Collections

UK-Ireland Digital Humanities Association Launch Event Banner with event details

The First Annual Event for the UK-Ireland Digital Humanities Association took place on 29th and 30th June 2023 at Senate House, University of London, as well as online. The Association “aims to build a collaborative vision for the field, and create new and sustainable long-term partnerships in alignment with the international community”. The programme, set across one and a half days, covered a wide variety of topics and included an opportunity for the Community Interest Groups to meet up.

The British Library was involved in four presentations either as an individual presentation or as part of a collaborative project. In this blog post we hear back from the British Library colleagues who attended.

Helena Byrne, Curator of Web Archives

I was involved in two collaborative presentations with Sharon Healy (Maynooth University) and Juan-José Boté-Vericad (Universitat de Barcelona). Our first presentation was a lightning talk on day one called 'Finding Web Archives under the ‘Big Tent’ of DH: A Case Study of Ireland and the UK'. This presented one element of a forthcoming chapter in a WARCnet edited collection on web archiving. It reviewed the provision of web archiving teaching in postgraduate information management and digital humanities courses in Britain and Ireland. Our second presentation was part of Panel #2 on day two called 'The Potential of a Reborn Digital Archival Edition for Collating a Corpus of Archived Web Materials'. This presentation outlined a methodology for researchers without coding skills to select, collate and analyse a corpus of archived websites.

The highlight for me was Panel #3, especially the presentation 'Towards a Critical Black Digital Humanities: A Critical Librarian’s Response' by Naomi L.A Smith (University of West London). This presentation and the discussion that followed highlighted some of the challenges as well as some of the positive action steps that can be taken to ensure digital humanities research is more inclusive. 

Frankie Perry, Postdoctoral Research Assistant, InterMusE project, University of York / Music Manuscripts and Archives Cataloguer, British Library

I gave a paper with Prof. Rachel Cowgill (University of York), who is Principal Investigator on the InterMusE project, a collaborative venture between musicologists, computer scientists, and archive and library specialists funded by the AHRC’s UK-US New Directions for Digital Scholarship in Cultural Institutions programme. The British Library is an institutional partner, with Dr Rupert Ridgewell (Lead Curator, Printed Music) as Co-Investigator; the universities of Swansea and Illinois at Urbana-Champaign are further partners, and we’re also working with the University of Waikato. In our paper, we introduced the complexities of sourcing, digitising, and piecing together ephemera relating to historical musical events (e.g. concert programmes, flyers, newspaper reviews), using as our case study materials relating to the British Music Society (1918-1933) and its regional centres and branches. We showed the interface of the digital archive built for the project, which uses a combination of the Greenstone Digital Library system, the Mirador Annotation Viewer, and the SimpleAnnotationServer to make materials browsable, searchable, and interactive for musicologists and community users alike.

I really enjoyed the event and the snapshot it provided into current digital humanities research and techniques. I especially enjoyed a paper by Orla Delaney (Cambridge) on 'Database ethnography and the museum object record', and one by Lisa Griffith (Digital Repository of Ireland) and Laura Molloy (CODATA) titled 'Pathways to collaboration – creating and sharing GLAM image collections as data'.

Stella Wisdom, Digital Curator for Contemporary British Collections

My lightning talk 'Collaborating to Curate and Exhibit Complex Digital Literature' reflected on the cooperation between curators, researchers, experimental writers and creative practitioners to plan and produce the British Library’s Digital Storytelling exhibition (2 June 2023 - 15 October 2023). This hands-on display explores the ways that digital innovations have transformed and enhanced our narrative experiences. It showcases eleven examples of electronic literature that invite readers to become a part of the story themselves, through interactive narratives that respond to user input, reading experiences influenced and personalised by data feeds, and works that draw from multiple platforms and audience participation to create immersive story worlds. Preparing and in some cases modifying these interactive works to display them in a public gallery has only been possible through practical collaborations between Library staff and the writers and games studios who created these digital stories. I shared some insights from my experience of this co-curation work and encouraged attendees to visit the exhibition.

It was a pleasure to meet a number of people in real life who I had only previously spoken with online. A personal highlight was hearing Reham Hosny from the University of Cambridge and Minia University speak about 'DH and E-Lit Communities: Intersectional Perspectives'. In the refreshment breaks at this event I chatted with Reham about her novel, Al-Barrah (The Announcer), and she demonstrated to me how both augmented reality and hologram technologies work with the printed book to immerse readers in this thought-provoking narrative.

07 October 2022

The UEFA Women’s EURO 2022 Arts and Heritage Programme

by Caterina Loriggio, UEFA Women’s EURO Arts and Heritage Lead

Jan Lyons (Manchester Corinthians) and Gail Redston (Manchester City) looking at the 1921 Ban. Part of Trafford's heritage programme. Photo by Rachel Adams for UEFA WEURO 2022 heritage programme

The UK Web Archive has been collaborating with the UEFA Women’s EURO 2022 Arts and Heritage Programme to develop the UEFA Women's Euro England 2022 web archive collection. In this guest blog post, we hear about the wider arts and heritage programme around the tournament from Caterina Loriggio.

The UEFA Women’s EURO 2022 arts and heritage programme was designed to promote community engagement, develop cultural leadership, support health and wellbeing, reinforce civic pride and to support local economies post-pandemic. Host City partners (Rotherham, Sheffield, Trafford, Wigan, Manchester, Milton Keynes, Brent, Hounslow, Brighton, and Southampton) were all keen to amplify the opportunity the tournament provided to engage and inspire their residents and visitors.

The £3m programme was supported by National Lottery players through Arts Council England and National Lottery Heritage Fund grants and through funding from the Host Cities. It included four arts commissions, eight museum/archive exhibitions, eight outdoor exhibitions, heritage outreach and education programmes, 45 memory films and new online content covering the history of the women’s game. The project also researched for the first time the full line-up of all the women who have played for England over the past 50 years. Many of those women will be honoured at Wembley Stadium on October 7th in front of a sell-out crowd when they will take a lap of honour during half time in the England USA match.

It was the first time The FA had ever delivered a cultural programme. A key priority for The FA is to establish female role models for both girls and boys. When Host City partners requested a cultural programme to support the tournament, the Association saw that this could be a great opportunity to further fulfil this objective. It was also clear that partnering with cultural organisations in Host Cities, and with national institutions such as the UK Web Archive and British Library, would be a great way to promote the UK’s cultural sector and a very effective tool to capture, for the first time on a national scale, the hidden history of women’s football.

Prior to writing funding applications, I led, with the support of the Football Supporters’ Association, four online fan consultations to ensure the programme spoke to the wants of women’s football fans. We also commissioned the organisation ‘64 Million Artists’ to lead half-term virtual workshops for young people aged 12 – 18 in Host Cities (many of whom played football). The fans and young people’s feedback was shared with artists, archivists and curators and was clearly reflected in all elements of the programme. The fans were clear that they could ‘never get enough history’.

Archives and contemporary collecting played an important part in the heritage programme. It was apparent many stories of women’s football (fans as well as players) had been lost already and that women who had played during the ban (1921-1970) were of an age that if we did not collect their stories now, then there was a real risk that they might never be captured. As well as collecting physical objects for museums and archives, like caps, pennants, and programmes, there was a significant degree of online archiving. Many of the Host Cities created online exhibitions, hosted films and imagery on digital archive platforms, and digitally captured objects which retired footballers were happy to loan but not donate. Nationally we made 36 memory films live on The FA website. These will be moved to EnglandFootball.com in time for the 50th Anniversary of the Lionesses in November, plus there will be some new content made especially for the anniversary. We were greatly supported in our programme by The National Football Museum and Getty Images, who gave us access to their photography archives, which greatly enriched all our work. We also sought to create content for the future by commissioning Getty photographers and by running fan and young people’s photography campaigns to capture the atmosphere of match day and the fan experience beyond the pitch. Some of these images will be shared in an online Getty Images Gallery to be launched in November.

It is hoped that the learnings from this programme will help to secure cultural content in future UK bids for major sporting events. I hope that archiving and collecting will remain important components in all these future projects.

Related Links
This is the ninth blog post published so far about the women’s Euros; the others can be found on the UK Web Archive blog under the 'sports' tag.

There is still an active call for nominations for the UEFA Women's Euro England 2022 web archive collection. Anyone can suggest UK published websites to be included in the archive by filling in our nomination form.

05 October 2022

iPres 2022 Conference Report from the UK Web Archive

By Helena Byrne, Nicola Bingham, Dr Andrew Jackson, British Library, Eilidh MacGlone, National Library of Scotland and Caylin Smith, Cambridge University Libraries


iPres is the largest international conference on digital preservation. The conference has been held every year since 2004. The 2022 edition was hosted by the DPC in Glasgow. This meant that the official conference website ipres2022.scot was within scope for the UK Web Archive to preserve. You can view the archived version of the website here: 

https://www.webarchive.org.uk/wayback/archive/20220914105705/https://ipres2022.scot/ 

Screenshot of the iPres 2022 conference website
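
The archived address above appears to combine a fixed prefix, a 14-digit capture timestamp and the original URL. Reading the timestamp as year, month, day, hour, minute and second is an assumption based on that one address, and the names `WAYBACK_PREFIX`, `Snapshot`, `parse_snapshot_url` and `build_snapshot_url` are hypothetical helpers rather than part of any UK Web Archive tooling. Under those assumptions, a small Python sketch for taking such an address apart and putting it back together could look like this:

```python
from datetime import datetime
from typing import NamedTuple

# Prefix copied from the archived address quoted above.
WAYBACK_PREFIX = "https://www.webarchive.org.uk/wayback/archive/"


class Snapshot(NamedTuple):
    captured_at: datetime
    original_url: str


def parse_snapshot_url(archived_url: str) -> Snapshot:
    """Split an archived address into its capture time and original URL."""
    if not archived_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a UK Web Archive wayback address")
    remainder = archived_url[len(WAYBACK_PREFIX):]
    # Everything before the first slash is the timestamp; the rest is the URL.
    timestamp, _, original = remainder.partition("/")
    return Snapshot(datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original)


def build_snapshot_url(captured_at: datetime, original_url: str) -> str:
    """Rebuild an archived address from a capture time and an original URL."""
    return f"{WAYBACK_PREFIX}{captured_at:%Y%m%d%H%M%S}/{original_url}"


example = (
    "https://www.webarchive.org.uk/wayback/archive/"
    "20220914105705/https://ipres2022.scot/"
)
snapshot = parse_snapshot_url(example)
print(snapshot.captured_at.isoformat(), snapshot.original_url)
```

Keeping the capture time as a `datetime` rather than a string makes it straightforward to sort or filter a list of snapshots by date.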

iPres 2022 was held from Monday 12 to Friday 16 September. There was a mix of presentations over the week, with workshops, long papers, short papers, poster presentations and lightning talks, as well as show and tell sessions in the form of a ‘Bake Off’. On the final day of the conference, there were a number of site visits to organisations that are running a digital preservation programme.

This year’s conference also coincided with the 20th anniversary celebrations of the DPC, as well as the DPC Preservation Awards that are held every two years. In 2020, the UK Web Archive won The National Archives (UK) Award for Safeguarding the Digital Legacy at the virtual Digital Preservation Awards 2020 ceremony.

There are also a number of awards given at iPres in various categories. This year’s winner of the Angela Dappert Memorial Award, established in 2021, was Dr Andrew Jackson, Technical Lead for the UK Web Archive, for his presentation ‘Design Patterns in Digital Preservation: Understanding Information Flows’.

Many UK Web Archive colleagues from the British Library, National Library of Scotland and Cambridge University Library attended the conference both as delegates and presenters. In this blog post they have reported back on their conference experience.

British Library

Dr Andrew Jackson
As well as presenting my Design Patterns paper, I was also involved in a workshop on format registries in digital preservation. Both sessions were well-attended and seemed to go well, and I’m planning to post about both in more detail in the future. 

I particularly enjoyed the session on DNA storage, especially because of Euan Cochrane’s approach: working with a DNA lab at Yale University to independently verify the work being done by Twist Bioscience.  It’s still a long way from being a storage option we can depend on, but it’s starting to look like it might actually happen!

There were a lot of good quality papers but I particularly enjoyed “Monitoring Bodleian Libraries' Repositories with Micro Services” presented by James Mooney. The overall approach was very similar to how I like to work, from the design of the overall architecture (federated monitoring of resources in situ rather than centralised and ingest-driven) to the style of implementation (microservices combined with best-in-class open source service components).

Nicola Bingham
This was the first iPres conference I have attended. I wish I could have been there in person but, due to practicalities, I attended online. One of my highlights was the presentation from William Kilbride, in which he stated that one of the aims of the DPC was to build “the social infrastructure of digital preservation” (as opposed to focussing on technical aspects). I think this has always been true but is now more so than ever, especially when it comes to diversifying our archives and enabling communities to have agency in telling their own stories, as articulated by Tamar Evangelista-Dougherty in her keynote.

Another highlight was hearing from Garth Stewart, Head of Digital Records at National Records of Scotland. Garth presented on NRS’s two-year project to ingest and make available Scottish Government Cabinet Records and had practical advice for negotiating the transfer of good quality metadata from the depositors: it’s all about gaining trust and explaining to depositors that the quality of metadata provided impacts the experience of the end users. I was also intrigued that they had the challenge of building and maintaining two access solutions, one for journalist access and one for the public.

A final highlight for me was the long paper, “A Digital Preservation Wikibase” by Kenneth Seals-Nutt of Yale University. Kenneth’s presentation set out the practical steps taken by Yale University Library’s department of digital preservation to implement a Wikibase instance, and how this was used to transform a data set related to software into a knowledge base using Semantic Web technologies. This is particularly useful to us at the UK Web Archive as we consider the next steps in our web archiving roadmap.

Helena Byrne
This was my first time attending iPres. I wasn’t able to make it in person, so I was delighted that there was an option to join the conference remotely. I was also involved in a collaborative poster presentation with Katharina Schmid (Bayerische Staatsbibliothek) and Sharon Healy (Maynooth University). Our poster, ‘Exploring Software, Tools and Methods used in Web Archive Research’, was part of a bigger study that will be published through WARCnet in the coming weeks.

There were so many great talks, especially around inclusion and diversity in the wider digital preservation field. These topics, along with activism, were also common themes in the three keynotes. The keynotes were all very different in scope, so it is hard to pick one over the others, but I will definitely be watching them again in the coming weeks and will share them with colleagues when they are published online.

National Library of Scotland

Eilidh MacGlone
I was grateful to have the opportunity to attend iPres this year. This was my first experience of the conference, and it was a happy one. There were lots of opportunities to meet new people and catch up with those I knew from the preservation world. And it was useful! The continuous improvement models are a very handy way to set achievable targets for professionals who are often the only preservationists in their organisation. I know this will be useful to me, even though I am not on my own. I was fascinated to hear about DNA data storage, which, although not yet operating at scale, has interesting properties of robustness at room temperature.

You can read more about one of Eilidh’s takeaways from iPres in her blog post - iPres report: a simple workshop exercise using Robust Links.

Cambridge University Library

Caylin Smith
Glasgow 2022 was the second in-person iPres I’ve attended; I previously attended in 2019, when the conference was held in Amsterdam. I was grateful to attend again this year to present ongoing research, catch up with friends and colleagues in the field, and meet some new faces.

Along with Sara Day-Thomson (Edinburgh University Library) and Patricia Falcao (TATE), I led a workshop on the first day of the conference. Titled “Preserving Complex Digital Objects: Revisited”, it followed up on the workshop we gave at iPres in 2019 and focused on supporting the collection management of digital materials for which few or no solutions currently exist.

There were many great submissions to iPres this year. One paper on the topic of web archiving that stood out to me was “These Crawls Can Talk. Context Information for Web Collections” by Susanne van den Eijkel and Daniel Steinmeier from the KB (National Library of the Netherlands). I’m looking forward to thinking further about their research in the context of web archiving activities at Cambridge University Libraries. 

The next iPres conference will be held in Champaign-Urbana, Illinois in the U.S.A. from September 19-22, 2023.

07 September 2022

GLAM Workbench update

By Andy Jackson, Web Archive Technical Lead, British Library

In 2020, we led a project funded by the International Internet Preservation Consortium (IIPC) called Asking questions with web archives – introductory notebooks for historians, developing a set of Jupyter notebooks to introduce researchers to the potential and possibilities of web archives. In collaboration with the National Library of Australia and National Library of New Zealand, this funding enabled Tim Sherratt to create the Web Archives section of the GLAM Workbench.

Screenshot of GLAM workbench website

We were very happy with how this project worked out, and we think collaborating with someone like Tim opens up new ways of supporting researchers working with web archives. If you’d like to know more about the results of the project, check out Tim’s 2020 blog post and his conference presentation from 2021.

While the investment in project funding got the ball rolling, the GLAM Workbench needs ongoing management and maintenance to keep it running. This should not be taken for granted, so we’re proud to announce that the Web Archives section of the GLAM Workbench is now supported by the British Library.

We hope this will help ensure this critical resource remains available in the future, and we would like to encourage other web archives to look at whether they could pursue project or supporting funding to help maintain and grow the GLAM Workbench.
