UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

4 posts from February 2013

26 February 2013

Governing the Police: a special collection

In an earlier post, I wrote about our efforts to capture the c.41 police authority websites, due to go offline with the abolition of the authorities themselves following the new Police Reform and Social Responsibility Act 2011. Saving web-based content from disappearing forever like this is a key part of the mission of the Web Archive.

However, our objectives also include collecting comprehensively around current issues, in order to capture the full range of debate. It was therefore important to capture the sites of the newly appointed Police and Crime Commissioners as well, to set alongside the sites of the defunct police authorities, so that researchers will be able to track changes in the way the police are governed over time. Other websites in this collection include some relating to the first elections of Police and Crime Commissioners in November 2012 and a sample of news coverage.

For websites in the selective archive, we use a permissions-based approach, which is resource-intensive. Each police authority was contacted on average between four and seven times to secure permission and to answer questions. Nevertheless we achieved a 100% success rate with the police authority sites, as the publishers had the added impetus of their sites going offline. With the Commissioners' websites the process was a little easier, as the staff were (by and large) the same people we had contacted at the police authorities. It helped that we were able to articulate, from the very beginning, the benefits of having a corporate archive that would be accessible to the commissioners, their staff and the public, capturing content that may be taken off the live website but may be needed in future.

At the time of writing there were 80 titles in the Governing the Police collection although we are still adding titles as and when we receive permission to archive them. We will be taking regular snapshots every six months to capture developments over time, and so the collection promises to be a fundamental resource to scholars of policing in Britain in the years ahead.

Nicola Johnson and Ravish Mistry

19 February 2013

Nineteenth century English literature: a new special collection

[A guest post from Andrea Lloyd, Curator of Printed Literary Sources, 1801-1914 at the British Library]

After almost a year of gathering I’m pleased to announce that my ‘Curator’s Choice’ collection of websites relating to 19th century English literature has now been published on the UK Web Archive.

As a curator of printed literary sources for the period 1801-1914 it doesn’t require a great leap of imagination to discover why I chose this particular topic. The collection is intended to reflect the diverse interests in the genre that are represented on the web. Opinions about, and interpretations of, 19th century literature and its authors are constantly evolving and I hope that this resource contextualises these important scholarly and cultural changes.

The sites included so far cover a broad and eclectic array of subject matter – ranging from author societies to museums; from literary adaptations to academic syllabi. 19th century literature is still hugely popular and attracts a wide audience. Given the massive interest in the likes of Jane Austen and Charles Dickens, I initially thought I would concentrate on lesser-known authors, and on literature that has grown somewhat obscure in the intervening years. This ultimately isn’t how the collection has evolved – sometimes because the more niche sites are published without any administrator contact details (so permission cannot be sought to archive the site). In other cases, the owners have not responded to permission requests – often because they have cast the sites off into the vast ‘webosphere’ to fend for themselves.

As someone who works with 19th century printed ephemera on a regular basis I found this exercise particularly fascinating. Pertinent comparisons can be drawn between the ephemeral items that are published on the web and those that were printed in the 19th century. A great deal of the ephemeral literature produced in the 19th century has survived to this day (albeit in a fragile state) – either through luck or thanks to collectors with foresight. Given its transient and contributory nature there is a great danger that similar items produced in electronic formats may not be so lucky – hence the reason the Web Archive is so vital. Hopefully my 22nd century counterpart will thank me for choosing to preserve for posterity some of the more marginal, fleeting and subjective sites available relating to the genre!

Now it’s available for all to see, I hope that others will recommend sites that they think would complement the theme and help to create a lasting snapshot of 19th century literary scholarship in the 21st century. Do get in touch via this blog, or @UKWebArchive on Twitter.

[Image by anna_t, Creative Commons BY-NC-SA]

12 February 2013

What’s in a name? Domain names and website longevity

I wrote about how to make websites more archivable in a previous post. Having websites archived and making them “archive-friendly” are both good steps which can help increase their longevity. This blog post is about domain names: the name you use to call your website and the address which identifies it on the Web.

To obtain a domain name, you need to pay an annual fee with a registrar for the right to use it. The rented nature of domain names means that they are not permanent and the same domain name could host completely different content at different times if it changes hands.

When planning the take-down or replacement of a website, the question of what to do with the domain name requires some thought. As well as being relevant to record-keeping, it is an important part of (business) continuity.

In most cases the existing domain name is used to host the new version of the website. This is usually the right thing to do – users expect it and (if you chose the right one) a domain name often becomes a part of the identity of the website and/or the brand. Unless there are good reasons to switch to a new one, most domain names are kept when changing websites. Many websites also provide users with the option to view historical versions of the website by linking to a web archive or putting in place a landing page which points to old versions as well as new.

When a website is taken out of service, keeping the domain name and redirecting it to the archival version is also an option. This will incur a small charge for retaining the domain name, but this is much less than paying for the hosting and technical support needed to keep a website live. The advantage of this approach is seamless continuity: users are automatically referred to an archival version of the website without having to be aware of the existence of the web archive. For example, the domain name of the One and Other Project, featuring artist Antony Gormley’s commission for Trafalgar Square’s ‘empty’ fourth plinth in July 2009, points directly to the archival version in the UK Web Archive. Users can type the same web address or click on a link as they used to do and get to the website, despite the fact that it disappeared from the live web years ago.
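For readers who manage their own domain, here is a rough sketch of what such a redirect could look like. It is only an illustration, not a description of how the One and Other domain is actually configured. The archive prefix, the original site address and the names archived_location, RedirectToArchive and serve are all hypothetical, and in practice the same effect is often achieved with a single redirect rule in the web server or registrar settings.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values for illustration only; a real archive will
# publish its own URL scheme for archived copies.
ARCHIVE_PREFIX = "https://archive.example.org/wayback/"
ORIGINAL_SITE = "http://www.example-project.org.uk"


def archived_location(request_path: str) -> str:
    """Map a path requested on the retired domain to its archived copy."""
    if not request_path.startswith("/"):
        request_path = "/" + request_path
    return ARCHIVE_PREFIX + ORIGINAL_SITE + request_path


class RedirectToArchive(BaseHTTPRequestHandler):
    """Answer every request with a permanent redirect to the archive."""

    def do_GET(self):
        # 301 signals to browsers and crawlers that the move is permanent.
        self.send_response(301)
        self.send_header("Location", archived_location(self.path))
        self.end_headers()

    # HEAD requests get the same redirect headers, with no body.
    do_HEAD = do_GET


def serve(port: int = 8080) -> None:
    """Run the redirect service until interrupted (assumed port)."""
    HTTPServer(("", port), RedirectToArchive).serve_forever()
```

Calling serve() on the machine the retired domain points to would send a visitor asking for /about to the archived copy of that same page, so old links and bookmarks keep working.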

Keeping the domain name may not be the right solution for everyone but it’s a possibility well worth considering.

Helen Hockx-Yu

[Image courtesy of Roberto Zingales, Creative Commons CC-BY 2.0, via Flickr]

07 February 2013

Archiving social media: a workshop report

I was very pleased to be invited to a recent workshop on social media archiving. It was organised by Laura Lannin and colleagues at the Museum of London, to whom many thanks for a wide-ranging and stimulating afternoon.

The day saw a cluster of diverse and useful presentations. Among them was our very own Helen Hockx-Yu, on the potential and problems relating to social media archiving on a national scale, as we experience them at the UK Web Archive. Web archiving is always a technological arms race, with the archiving technologies having to adapt constantly as the way the web works continues to change.

The other presentations between them showed the wide variety of perspectives from which the whole issue needs to be approached. Two projects examined the way in which Twitter can be used as a means of identifying content on the wider web that should be preserved, as well as an archive resource in itself. Both projects came from within specialist museums, and both were concerned with the Olympics. The Victoria and Albert Museum (represented by Catherine Flood) had monitored Twitter to identify graphically significant visual resources, shared on Flickr as the Collect London 2012 collection. The Museum of London (in partnership with Peter Ride of the University of Westminster) had gone a step further, bringing together a team of Citizen Curators to keep eyes and ears open during the Games for important resources, and to identify them by means of the Twitter hashtag #citizencurators for later harvesting.

In contrast, Ruth Page (University of Leicester, or @ruthtweetpage) gave us the perspective of a linguist interested in the analysis of large corpora of tweets, for the patterns of language usage within them. And although there was not a presentation from this perspective, several of those present were responsible for social media engagement between museums and their users, and are faced with working out how best to archive their own social media output.

In a previous post, Nicola Johnson reported on the difficulties of implementing web archiving activity in national libraries charged with archiving the web outside their own walls. This workshop neatly showed the different concerns of a wider group of interested parties. Whether it is national libraries, museums or users; whether it is social media content itself or the other resources it links to, there is much to think about when it comes to social media archiving.

Peter Webster (@pj_webster)