THE BRITISH LIBRARY

UK Web Archive blog

12 posts categorized "vanished site"

02 November 2020

Digital archaeology in the web of links: reconstructing a late-90s web sphere

By Dr. Peter Webster, Independent Scholar, Historian and Consultant

The historian of the late 1990s has a problem. The vast bulk of content from the period is no longer on the live web; there are few, if any, indications of what has been lost – no inventory of the 1990s web against which to check. Of the content that was captured by the Internet Archive (more or less the only archive of the Anglophone web of the period), only a superficial layer is exposed to full-text search, and the bulk may only be retrieved by a search for the URL. We do not know what was never archived, and in the archive it is difficult to find what we might want, since there is no means of knowing the URL of a lost resource. Sometimes we need, then, to understand the archived web using only the technical data about itself that it can be made to disclose.

Niels Brügger has defined a web sphere as ‘web material … related to a topic, a theme, an event or a geographic area’. My paper at the EWA conference presents a method of reconstructing a web sphere, much of which is lost from the live web and exists only in the Internet Archive: the web estate of the many conservative Christian campaign groups in the UK in the 1990s and early 2000s.

This method of web sphere reconstruction is based not on page content but on the relationships between sites, i.e., the web of hyperlinks. The method is iterative, and tracks back and forth between big data and small. Individual archived pages and directories, printed sources, the scholarly record itself, and even traces of previous unsuccessful attempts at web archiving come into play, as does a large dataset held by the British Library. From the more than 2 billion lines in the UK Host Link Graph dataset it is possible to extract the outlines of this particular web sphere.
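The iterative expansion described above can be sketched in a few lines of Python. This is an illustration only, not the tooling used for the paper. It assumes each line of the link graph has the pipe-delimited form `year|source_host|target_host|link_count`; the host names, the `min_links` threshold and the function names are all hypothetical.

```python
from collections import defaultdict

def parse_line(line):
    """Split one assumed 'year|source|target|count' record into typed fields."""
    year, source, target, count = line.strip().split("|")
    return int(year), source, target, int(count)

def neighbours(lines, seeds, min_links=1):
    """Return hosts that link to, or are linked from, any seed host,
    mapped to the total number of links observed."""
    found = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        _year, source, target, count = parse_line(line)
        if count < min_links:
            continue
        if source in seeds and target not in seeds:
            found[target] += count
        elif target in seeds and source not in seeds:
            found[source] += count
    return dict(found)

def expand_sphere(lines, seeds, rounds=2, min_links=1):
    """Grow the set of hosts outwards from the seeds, one link-hop per round.
    `lines` must be re-iterable (for example a list), since it is scanned each round."""
    sphere = set(seeds)
    for _ in range(rounds):
        candidates = neighbours(lines, sphere, min_links)
        if not candidates:
            break
        sphere |= set(candidates)
    return sphere

# Invented sample data in the assumed format.
sample = [
    "1998|seed-campaign.example.org.uk|ally.example.org.uk|12",
    "1999|ally.example.org.uk|third.example.co.uk|3",
    "1999|unrelated.example.co.uk|elsewhere.example.co.uk|40",
]
print(sorted(expand_sphere(sample, {"seed-campaign.example.org.uk"})))
```

In a real study the candidates returned by each round would be checked by hand against archived pages and printed sources before being admitted as seeds for the next round; the sketch accepts them automatically only to keep the example short.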

You can watch Peter Webster’s presentation on his website peterwebster.me

 

Previous studies using a similar method are: 

Webster, Peter. 2019. Lessons from cross-border religion in the Northern Irish web sphere: understanding the limitations of the ccTLD as a proxy for the national web. In The Historical Web and Digital Humanities: the Case of National Web domains, eds Niels Brügger & Ditte Laursen, 110-23. London: Routledge. http://dx.doi.org/10.17613/yms5-9v95

Webster, Peter. 2017. Religious discourse in the archived web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008. In The Web as History, eds Niels Brügger & Ralph Schroeder, 190-203. London: UCL Press. (Available Open Access at: https://www.uclpress.co.uk/products/84010)

 

12 February 2013

What’s in a name? Domain names and website longevity

I wrote about how to make websites more archivable in a previous post. Having websites archived and making an effort to make them “archive-friendly” are both good steps which can help increase their longevity. This blog post is about domain names: the name by which your website is known and the address which identifies it on the Web.

To obtain a domain name, you need to pay an annual fee to a registrar for the right to use it. The rented nature of domain names means that they are not permanent, and the same domain name could host completely different content at different times if it changes hands.

When planning the take-down or replacement of a website, the question of what to do with the domain name requires some thought. As well as being relevant to record-keeping, it is an important part of (business) continuity.

[Image: CyboRoz 404]

In most cases the existing domain name is used to host the new version of the website. This is usually the right thing to do – users expect it and (if you chose the right one) a domain name often becomes a part of the identity of the website and/or the brand. Unless there are good reasons to switch to a new one, most domain names are kept when changing websites. Many websites also provide users with the option to view historical versions of the website by linking to a web archive or putting in place a landing page which points to old versions as well as new.

When a website is taken out of service, keeping the domain name and redirecting it to the archival version is also an option. Retaining the domain name incurs a small charge, but this is much less than the cost of the hosting and technical support needed to keep a website live. The advantage of this approach is seamless continuity: users are automatically referred to an archival version of the website without having to be aware of the existence of the web archive. For example, www.oneandother.co.uk, the domain name of the One and Other Project, featuring artist Antony Gormley’s commission for Trafalgar Square’s ‘empty’ fourth plinth in July 2009, points directly to the archival version in the UK Web Archive. Users can type the same web address or click on a link as they used to do and get to the website, despite the fact that it disappeared from the live web years ago.
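As a rough illustration of how such a redirect might work, the sketch below answers every request to a retired domain with a permanent redirect to the archived copy of the same path. It is not how any particular site or the UK Web Archive implements this: `ARCHIVE_PREFIX` and `RETIRED_SITE` are hypothetical placeholders, and in practice the same effect is usually achieved with a single rule in the web server or DNS provider's configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: substitute the prefix your web archive actually uses
# and the address of the site that has been taken down.
ARCHIVE_PREFIX = "https://archive.example.org/wayback/"
RETIRED_SITE = "http://www.retired-site.example"

def archive_url(path, prefix=ARCHIVE_PREFIX, site=RETIRED_SITE):
    """Map a path requested on the retired domain to its archived copy."""
    if not path.startswith("/"):
        path = "/" + path
    return prefix + site + path

class ArchiveRedirect(BaseHTTPRequestHandler):
    """Send a 301 (moved permanently) pointing at the archived page."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", archive_url(self.path))
        self.end_headers()

    # HEAD requests get the same redirect, without a body.
    do_HEAD = do_GET

def serve(port=8080):
    """Run the redirect service until interrupted."""
    HTTPServer(("", port), ArchiveRedirect).serve_forever()

print(archive_url("/about.html"))
```

Calling `serve()` starts the service on the chosen port. Because the original path is preserved in the redirect target, old deep links and bookmarks continue to resolve as well as the home page.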

Keeping the domain name may not be the right solution for everyone but it’s a possibility well worth considering.

Helen Hockx-Yu

[Image courtesy of Roberto Zingales, Creative Commons CC-BY 2.0, via Flickr]

04 October 2012

Exploring the lost web

There has been some attention paid recently to the rate at which the web decays. A very interesting recent article by SalahEldeen and Nelson looked at the rate at which online sources shared via social media subsequently disappear. The authors concluded that 11% would disappear in the first year, and that after that there would be a loss of 0.02% per day (roughly another 7.3% per year): a startling rate of loss.
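The annual figure depends on how the daily rate is accumulated. As a rough check, and not a reproduction of the authors' own calculation, the sketch below converts a daily loss rate into a yearly one in two ways: by simple multiplication, and by compounding (each day's loss applied to what remained the day before). The function names are illustrative.

```python
def annual_loss_simple(daily_rate, days=365):
    """Yearly loss if the daily rate is simply added up."""
    return daily_rate * days

def annual_loss_compound(daily_rate, days=365):
    """Yearly loss if each day's loss applies only to what still survives."""
    return 1 - (1 - daily_rate) ** days

daily = 0.0002  # 0.02% per day, expressed as a fraction
print(f"simple:   {annual_loss_simple(daily):.2%}")
print(f"compound: {annual_loss_compound(daily):.2%}")
```

Simple multiplication gives 7.3% per year; compounding gives a slightly lower figure, because each day's loss is taken from a shrinking base.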

There are ways and means of doing something about it, not least through national and international web archives such as ours. We preserve many extremely interesting sites that are already lost from the live UK web domain.

Some of them relate to prominent public figures who have either passed away, or are no longer in that public role. One example of the former is the site of the former Labour MP and foreign secretary Robin Cook, who died in 2005. One of the latter is that of his colleague Clare Short, who left parliamentary politics in 2010 after serving as secretary of state for international development.

Organisations often have limited lives too, of course, and amongst our collections is the site of the Welsh Language Board, set up by Act of Parliament in 1993 and abolished by later legislation in 2012. Perhaps more familiar is one of the major corporate casualties of recent years, Woolworths, which went into administration in late 2008 and closed its last stores in early 2009.

Some others relate to events that have happened or campaigns that have ended. In the case of some of the more 'official' sites, we in the web archiving team can anticipate when sites are likely to be at risk, and can take steps to capture them. In other cases, we need members of the public to let us know. If you know of a site which you think is important, and that may be at risk, please let us know using our nomination form.

One such site is One and Other, Antony Gormley's live artwork on the vacant fourth plinth in Trafalgar Square. Also in the archive is David Cameron's campaign site from when he was a candidate for the constituency of Witney in the 2005 general election. Finally, there is What a difference a day makes, a remarkable blog post from one who experienced the London terrorist attacks of 2005. All three now exist only in the web archive.

15 December 2011

Advent Calendar: December 15th

Consumers for Ethics in Research: CERES

'An independent charity set up 1989 to promote informed debate about research and help users of health services to develop and publicise their views on health research and on new treatments.'

Website archived on: 15 December 2006

Still available on live web? No

Archived by: The Wellcome Library

Subject classifications: Medicine & Health > Health Organisations and Services

Special collection? No

Other instances? Yes, 9 in total (archived from 2004 - 2007)

12 December 2011

Advent Calendar: December 12th

SpecialSchool.org

'Site created to help teachers, parents and professionals understand the workings of a Special School for Children with Severe and Profound Learning Difficulties'

Archived on: 12th December 2005

Still available on live web? No

Archived by: The British Library

Subject Classifications: Society & Culture; Education & Research > Special Needs Education

Special collection: No

Other instances? Yes - 12 Dec 2006 (though as a parked domain)

09 December 2011

Advent Calendar: December 9th

AFFIRM Project website

'A Flexible Framework for Institutional Records Management'

Archived on: December 9th 2006

Still available on the live web? No

Archived by: The JISC (Joint Information Systems Committee)

Subject Classification: Education & Research > Higher Education

Special collection? No

Other instances available? Yes - also March, June, Sept (2006); March 2007. 

06 December 2011

Advent Calendar: December 6th

The Hutton Inquiry

Investigation into the Circumstances surrounding the Death of Dr David Kelly in 2003.

Website archived on: 6th December 2004

Still available on the live web? Yes

Archived by: The National Archives

Subject classifications: Government, Law & Politics > Public Inquiries

Special Collection? No

Other instances? Yes: 17 others, collected between Oct 2004 & Feb 2005

(Editor's note: updated 10.14am with correct reference to live site)