UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


13 posts categorized "vanished site"

29 May 2024

IIPC Web Archiving Spring/Summer School and Conference 2024: Report from UK Web Archive Colleagues

Nicola Bingham, Helena Byrne, Ian Cooke, Gil Hoggarth, Cameron Huggett (British Library), Caylin Smith (Cambridge University Library) and Eilidh MacGlone (National Library of Scotland).


This year’s IIPC General Assembly and Web Archiving Conference took place at the Bibliothèque nationale de France (BnF) in Paris. Before this year's conference there was an Early Scholars Spring School on Web Archives aimed at early career researchers interested in working with web archive materials.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the Spring/Summer School and the Web Archiving Conference both as delegates and presenters. In this blog post they report highlights of their conference experience.

Nicola Bingham, Lead Curator of Web Archives, British Library

The IIPC conference lived up to its reputation for being incredibly informative, inspiring, and intense! It was wonderful to reconnect with ‘old’ friends and to meet many new colleagues who are bringing diverse skills and perspectives to the field of web archiving.

As Co-Chair of the IIPC’s Content Development Group, alongside Alex Thurman of Columbia University Libraries, I delivered the keynote speech at the Early Scholars Spring School on Web Archives, which preceded the conference. Our presentation reflected on the history, importance, and legacy of the collaborative transnational web archive collections initiated by IIPC members over the past 14 years.

It was fascinating and gratifying to hear from web archive scholars about their diverse approaches and the variety of research questions they are exploring using web archives. Having worked in web archiving for 20 years, I find the increasing use of collections by researchers, particularly through data-mining approaches, especially interesting and rewarding.

Another interesting and informative highlight was the conference opening keynote speech by Pierre Bellanger, Pauline Ferrari, Jérôme Thièvre, and Sara Aubry. Pierre Bellanger, the founder and CEO of Skyrock and Skyrock.com, emphasised that "there is no freedom without memory," setting the tone for a discussion on the archiving of Skyblogs. Sara Aubry, web archiving technical lead at BnF, detailed the challenges they faced, including working with the Skyblog technical team at short notice to archive the blogs and altering web pages to display more articles and comments before the platform went offline. They managed to collect a substantial amount of content before the closure, amassing 5 million media files and providing API access for metadata extraction. This initiative highlights the importance of preserving the vernacular web, capturing personal pages rather than corporate content. The Skybox project further explores data-oriented methods of access and structural metadata to enhance discovery, with potential future projects aiming to build large language models to analyse and identify regional content within the blogs.

Helena Byrne, Curator of Web Archives, British Library

At this year's conference I presented in the Lightning Talk and Poster sessions. The abstracts are available to read on the IIPC website. IIPC WAC 2024 was a really great conference and there were so many takeaways to help improve my practice. One session I’d like to focus on for this blog post is Session #10: Digital Preservation. This session focused on citation practices for researchers using web archives in their research, an area that is not fully understood in the academic publishing world. I particularly liked the Citation Saver tool from Arquivo.pt, as it is a simple but effective tool for bulk uploading online citations from an academic publication. At the British Library we support a variety of researchers, and the tools and methods discussed in this session will be useful in supporting them to use web archives in their work.

Gil Hoggarth, Web Archive Technical Lead, British Library

I personally had not been able to attend the last few IIPC annual conferences, so it was fabulous to meet up and connect with old faces, and new, and learn about all the exciting projects going on. As I take a technical view (of most things), I found it particularly interesting that so many institutions were trying to establish, and expand, their web archiving services. Plus, the number of people involved in joint projects, with a combined aim but also with a community benefit in mind, was quite striking. Now, having returned to challenges ahead for The British Library and the UK Web Archive, I feel far more informed and aware of these community efforts - and have been in contact with many conference attendees to follow up!

Caylin Smith, Head of Digital Preservation, Cambridge University Libraries 

This was my second time attending the IIPC conference; I attended last year in Hilversum. I enjoy attending this conference for its presentations about solving operational challenges relating to web archiving and ones about how web archiving supports an institution’s strategic mission. 

I chaired a panel titled “Striking the Balance: Empowering Web Archivists and Researchers In Accessible Web Archives” whose presenters included Leontien Talboom (Technical Analyst on the CUL Digital Preservation team), Alice Austin (Web Archivist at Edinburgh University Library), Tom Storrar (Head of Web Archiving at The National Archives, UK), and Andrea Kocsis (Heritage and Digital Humanities researcher formerly at Northeastern University London; now Chancellor’s Fellow at the University of Edinburgh). 

This panel focused on different perspectives on using web archives, including those of a leader of a web archiving service, a web archivist, and a researcher. It highlighted evolving user expectations for web archives as well as the challenges around communicating what users can and cannot do because of technical and/or legislative requirements.

Cameron Huggett, PhD Student (CDP), British Library/Teesside University

I attended the IIPC Early Scholars Spring School on Web Archives. You can read more about my reflections on this event in this blog post: https://blogs.bl.uk/webarchive/2024/05/reflections-on-the-iipc-early-scholars-spring-school-on-web-archives-2024.html

Eilidh MacGlone, Web Archivist, National Library of Scotland

This was my second IIPC conference in Paris; the last was in 2014. I was a nervous first-timer then, so I was happy to take part in the new mentorship programme. It was a good way to share experience across different points in our professional arcs.

Planning my conference agenda, presentations on machine learning were at the top of my list. These outlined services to classify and retrieve items from large, complex stores of resources. I knew these would be interesting, as attempts to solve a problem with no complete answer.

Ben Charles Germain Lee spoke about working with born-digital government publications, introducing his ideas through a published experiment. Its combination of text and visual analysis provides at least one way to organise retrieval from a very large collection: in the presented case, born-digital government publications derived from the End of Term web archive. In future, these techniques could provide information retrieval to readers for collections that are too big to catalogue.

The IIPC’s Training Working Group session, led by Claire Newing (TNA) and Ricardo Basílio (Arquivo.pt) was another highlight. It gave me a chance to speak briefly on the most important thing in training colleagues (practice!) and the group shared a lot of really good ideas for training. I had the opportunity to use the information almost immediately on my return, training a colleague to self-archive. All in all, this IIPC was a conference with many good lessons.

Ian Cooke, Head of Contemporary British & Irish Publications, British Library

This year, I was struck by how big, and how varied, web archiving has become. The conference covered a huge array of topics and approaches. Many thanks to the Programme Committee, and especially to the team at BnF for being such excellent hosts. For me, the conference got off to a great start a day early, as I attended the pre-conference workshop on appraisal strategies for web archive curated collections, led by Melissa Wertheimer (Library of Congress). The hands-on session was a very clear reminder of the importance of professional librarians and archivists in creating focused and meaningful collections. The conference was also an opportunity for me to dive into some of the more technical sessions. Kristi Mukk and Matteo Cargnelutti’s (Harvard University Library) presentation on using AI to support search in web archives was both very clear and inspiring. I particularly liked Kristi’s assertion that ‘AI literacy is information literacy’ and the importance of thinking like a librarian. Katherine Boss’ (New York University Library) paper on an experimental project to preserve dynamic and database-driven websites using server-side web archiving (not something to be done at scale!) was also brilliant. Both also emphasised the importance of working collaboratively in teams, bringing principles from librarianship to work alongside software engineering in developing and testing new responses to preservation and discovery challenges.          

Conclusion

The IIPC Web Archiving Spring/Summer School and Conference 2024 at the Bibliothèque nationale de France provided a dynamic platform for exchanging ideas, learning about innovative projects, and fostering collaborations in the field of web archiving. UK Web Archive colleagues contributed significantly through presentations and active participation. This conference highlighted the evolving landscape of web archiving, emphasising the importance of preserving the vernacular web, improving researcher access, and leveraging new technologies like AI for better archival practices. As we return to our respective roles, we carry forward new insights and strengthened connections, ready to tackle the challenges ahead with renewed vigour and informed strategies.




02 November 2020

Digital archaeology in the web of links: reconstructing a late-90s web sphere

By Dr. Peter Webster, Independent Scholar, Historian and Consultant


The historian of the late 1990s has a problem. The vast bulk of content from the period is no longer on the live web; there are few, if any, indications of what has been lost – no inventory of the 1990s web against which to check. Of the content that was captured by the Internet Archive (more or less the only archive of the Anglophone web of the period), only a superficial layer is exposed to full-text search, and the bulk may only be retrieved by a search for the URL. We do not know what was never archived, and in the archive it is difficult to find what we might want, since there is no means of knowing the URL of a lost resource. Sometimes we need, then, to understand the archived web using only the technical data about itself that it can be made to disclose.

Niels Brügger has defined a web sphere as ‘web material … related to a topic, a theme, an event or a geographic area’.  My paper at the EWA conference presents a method of reconstructing a web sphere, much of which is lost from the live web and exists only in the Internet Archive: the web estate of the many conservative Christian campaign groups in the UK in the 1990s and early 2000s.

This method of web sphere reconstruction is based not on page content but on the relationships between sites, i.e., the web of hyperlinks. The method is iterative, and tracks back and forth between big data and small. Individual archived pages and directories, printed sources, the scholarly record itself, and even traces of previous unsuccessful attempts at web archiving come into play, as does a large dataset held by the British Library. From the more than 2 billion lines in the UK Host Link Graph dataset it is possible to extract the outlines of this particular web sphere.
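To make the idea of working from links rather than page content concrete, here is a minimal sketch of one expansion step: starting from a handful of known hosts, it lists the other hosts that link to or from them in a chosen period. It is not the procedure used in the paper. The pipe-delimited `year|source|target|count` record layout, the function names and the `.example` host names are all assumptions made for illustration, and the real dataset should be checked against its own documentation.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def parse_line(line: str) -> Tuple[int, str, str, int]:
    """Split one record, assuming a 'year|source|target|count' layout."""
    year, source, target, count = line.rstrip("\n").split("|")
    return int(year), source, target, int(count)


def expand_sphere(lines: Iterable[str], seeds: Set[str],
                  first_year: int, last_year: int) -> Counter:
    """Total the links between seed hosts and hosts outside the seed set."""
    candidates: Counter = Counter()
    for line in lines:
        year, source, target, count = parse_line(line)
        if not first_year <= year <= last_year:
            continue  # outside the period of interest
        if source in seeds and target not in seeds:
            candidates[target] += count  # a seed links out to this host
        elif target in seeds and source not in seeds:
            candidates[source] += count  # this host links in to a seed
    return candidates


# Invented sample records; the hosts are placeholders, not real sites.
sample = [
    "1998|seed-a.example|campaign-b.example|12",
    "1999|campaign-c.example|seed-a.example|3",
    "1999|seed-a.example|campaign-b.example|5",
    "2005|seed-a.example|late.example|9",
    "1998|unrelated.example|other.example|40",
]
found = expand_sphere(sample, {"seed-a.example"}, 1996, 2001)
for host, links in found.most_common():
    print(host, links)
```

In an iterative workflow, the candidate hosts would be checked by hand against archived pages and printed sources, and those that belong would be added to the seed set before the next pass.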

You can watch Peter Webster’s presentation on his website peterwebster.me

 

Previous studies using a similar method are: 

Webster, Peter. 2019. Lessons from cross-border religion in the Northern Irish web sphere: understanding the limitations of the ccTLD as a proxy for the national web. In: The Historical Web and Digital Humanities: the Case of National Web Domains, eds Niels Brügger & Ditte Laursen, 110-23. London: Routledge. http://dx.doi.org/10.17613/yms5-9v95

Webster, Peter. 2017. Religious discourse in the archived web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008. In: The Web as History, eds Niels Brügger & Ralph Schroeder, 190-203. London: UCL Press. (Available Open Access at: https://www.uclpress.co.uk/products/84010)

 

12 February 2013

What’s in a name? Domain names and website longevity

I wrote about how to make websites more archivable in a previous post. Having websites archived and making an effort to make websites “archive-friendly” are all good steps which can help increase their longevity. This blog post is about domain names, the name you use to call your website and the address which identifies it on the Web.

To obtain a domain name, you need to pay an annual fee to a registrar for the right to use it. The rented nature of domain names means that they are not permanent, and the same domain name could host completely different content at different times if it changes hands.

When planning the take-down or replacement of a website, the question of what to do with the domain name requires some thought. As well as being relevant to record-keeping, it is an important part of (business) continuity.


In most cases the existing domain name is used to host the new version of the website. This is usually the right thing to do – users expect it and (if you chose the right one) a domain name often becomes a part of the identity of the website and/or the brand. Unless there are good reasons to switch to a new one, most domain names are kept when changing websites. Many websites also provide users with the option to view historical versions of the website by linking to a web archive or putting in place a landing page which points to old versions as well as new.

When a website is taken out of service, keeping the domain name and redirecting it to the archival version is also an option. This incurs a small charge for retaining the domain name, but it is much less than paying for hosting and technical support to keep a website live. The advantage of this approach is seamless continuity: users are automatically referred to an archival version of the website without having to be aware of the existence of the web archive. For example, www.oneandother.co.uk, the domain name of the One and Other Project, featuring artist Antony Gormley’s commission for Trafalgar Square’s ‘empty’ fourth plinth in July 2009, points directly to the archival version in the UK Web Archive. Users can type the same web address or click on a link as they used to do and get to the website, despite the fact that it disappeared from the live web years ago.
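As a rough illustration of the redirect approach, the sketch below answers every request for a retired domain with a permanent redirect to the matching page in a web archive. Redirects like this are more commonly set up in a web server's or registrar's configuration, so treat this only as a demonstration of the idea. The archive prefix, the retired site address and the function names are hypothetical placeholders, not the UK Web Archive's actual URL scheme.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: substitute the real archive address and retired domain.
ARCHIVE_PREFIX = "https://archive.example.org/wayback/"
RETIRED_SITE = "http://www.retired-site.example"


def archive_url(path: str) -> str:
    """Build the archived address for a path requested on the retired site."""
    return f"{ARCHIVE_PREFIX}{RETIRED_SITE}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    """Send every GET or HEAD request to the archived copy."""

    def do_GET(self):
        self.send_response(301)  # permanent redirect
        self.send_header("Location", archive_url(self.path))
        self.end_headers()

    do_HEAD = do_GET  # HEAD requests get the same redirect headers


def serve(port: int = 8080) -> None:
    """Run the redirect server until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling `serve()` starts the server; a request for `/about.html` would then be redirected to the archived copy of that page, so old links and bookmarks keep working.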

Keeping the domain name may not be the right solution for everyone but it’s a possibility well worth considering.

Helen Hockx-Yu

[Image courtesy of Roberto Zingales, Creative Commons CC-BY 2.0, via Flickr]

04 October 2012

Exploring the lost web

There has been some attention paid recently to the rate at which the web decays. A very interesting recent article by SalahEldeen and Nelson looked at the rate at which online sources shared via social media subsequently disappear. The authors concluded that 11% would disappear in the first year, and after that there would be a loss of 0.02% per day (that's another 7.24% per year); a startling rate of loss.
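The arithmetic behind those figures can be checked with a few lines of code. This sketch assumes a simple linear reading of the quoted rates: 11% lost by the end of the first year, then a constant 0.02% per day. Multiplying the rounded daily rate by 365 gives roughly 7.3% a year, close to the annual figure quoted above; the small difference may reflect rounding of the daily rate, though that is a guess. The function names are invented for illustration.

```python
FIRST_YEAR_LOSS = 11.0  # percent lost by the end of year one (from the text)
DAILY_LOSS = 0.02       # percent lost per day after that (from the text)


def annual_loss_after_first_year() -> float:
    """Yearly loss implied by the rounded daily rate."""
    return DAILY_LOSS * 365


def cumulative_loss(years: int) -> float:
    """Percent of shared resources lost after a whole number of years,
    assuming the linear reading described above."""
    if years < 1:
        raise ValueError("the quoted figures start at the end of year one")
    extra_days = (years - 1) * 365
    return min(100.0, FIRST_YEAR_LOSS + DAILY_LOSS * extra_days)


print(f"implied yearly loss: about {annual_loss_after_first_year():.1f}%")
for years in (1, 2, 5):
    print(f"after {years} year(s): about {cumulative_loss(years):.1f}% lost")
```

On this reading, close to a fifth of shared links would be gone after two years.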

There are ways and means of doing something about it, not least through national and international web archives like ourselves. And we preserve many extremely interesting sites that are already lost from the live UK web domain.

Some of them relate to prominent public figures who have either passed away, or are no longer in that public role. One example of the former is the site of the former Labour MP and foreign secretary Robin Cook, who died in 2005. One of the latter is that of his colleague Clare Short, who left parliamentary politics in 2010 after serving as secretary of state for international development.

Organisations often have limited lives as well, of course, and amongst our collections is the site of the Welsh Language Board, set up by act of Parliament in 1993 and abolished by later legislation in 2012. Perhaps more familiar is one of the major corporate casualties of recent years, Woolworths, which went into administration in 2009.

Some others relate to events that have happened or campaigns that have ended. In the case of some of the more 'official' sites, we in the web archiving team can anticipate when sites are likely to be at risk, and can take steps to capture them. In other cases, we need members of the public to let us know. If you know of a site which you think is important, and that may be at risk, please let us know using our nomination form.

One such site is One and Other, Antony Gormley's live artwork on the vacant fourth plinth in Trafalgar Square. Also in the archive is David Cameron's campaign site from when he was a candidate for the constituency of Witney in the 2005 general election. Finally, there is What a difference a day makes, a remarkable blog post from someone who experienced the London terrorist attacks of 2005. All three now exist only in the web archive.

15 December 2011

Advent Calendar: December 15th

Consumers for Ethics in Research: CERES

'An independent charity set up 1989 to promote informed debate about research and help users of health services to develop and publicise their views on health research and on new treatments.'

Website archived on: 15 December 2006

Still available on live web? No


Archived by: The Wellcome Library

Subject classifications: Medicine & Health > Health Organisations and Services

Special collection? No

Other instances? Yes, 9 in total (archived from 2004 - 2007)

12 December 2011

Advent Calendar: December 12th

SpecialSchool.org

'Site created to help teachers, parents and professionals understand the workings of a Special School for Children with Severe and Profound Learning Difficulties'

Archived on: 12th December 2005

Still available on live web? No

Archived by: The British Library

Subject Classifications: Society & Culture; Education & Research > Special Needs Education

Special collection: No

Other instances? Yes - 12 Dec 2006 (though as a parked domain)

09 December 2011

Advent Calendar: December 9th

AFFIRM Project website

'A Flexible Framework for Institutional Records Management'

Archived on: December 9th 2006

Still available on the live web? No

Archived by: The JISC (Joint Information Systems Committee)

Subject Classification: Education & Research > Higher Education

Special collection? No

Other instances available? Yes - also March, June, Sept (2006); March 2007. 

08 December 2011

Advent Calendar: December 8th

Martin Blyth (poet)

Website archived on: 8th December 2005

Still available on live web? No. Martin passed away in 2006. 

Archived by: The British Library

Subject Classification: Arts & Humanities > Literature 

Special collection? No

Other instances available? No