UK Web Archive blog

6 posts from January 2013

30 January 2013

Surfing the web in time: Mementos

Have you ever needed to see a copy of a now-lost website, and not known where to start? Help is at hand, with Mementos.

The Memento protocol has been around for a while (since 2009). It's a way of adding a time dimension to our common HTTP-based way of browsing the web, and has been available as a plug-in for Firefox. (See for details.)
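As a rough illustration of that time dimension: a Memento-aware client asks for a page as it was at a given moment by sending an `Accept-Datetime` header with an ordinary HTTP request. The sketch below only builds that header. The function name is a hypothetical one chosen for this example, and no particular archive's address is assumed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(when: datetime) -> dict:
    """Build request headers asking for a resource as it was at `when`."""
    # Memento datetimes use the standard HTTP date format, expressed in GMT.
    # A naive datetime would be treated as local time by astimezone().
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# Example: ask for the page as it stood at the start of the year 2000.
headers = memento_headers(datetime(2000, 1, 1, 0, 0, tzinfo=timezone.utc))
print(headers)
```

These headers would then be sent to a server that supports Memento datetime negotiation, which redirects to the archived copy closest to the requested moment.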

On the UKWA site, we have launched an alternative, web-based way of delivering Memento that needs no changes to your browser. Mementos lets you search across multiple web archives around the world at once, which is particularly helpful if you don't know which territorial web archive is most likely to hold a site. It gives a breakdown of how many versions each archive holds, and from when, and leads users through to the archived versions themselves.
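For the curious, a breakdown of this kind can be sketched from a Memento TimeMap, the list of archived versions that the protocol defines. The example below is not the code behind the Mementos service. The sample TimeMap, its hosts and its dates are invented for illustration, and the simple pattern assumes each entry lists `rel` before `datetime`.

```python
import re
from collections import defaultdict
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit

# Hypothetical TimeMap excerpt in link format; hosts and dates are made up.
SAMPLE_TIMEMAP = '''
<http://archive-a.example/2001/http://example.com/>; rel="first memento"; datetime="Mon, 01 Jan 2001 00:00:00 GMT",
<http://archive-b.example/2003/http://example.com/>; rel="memento"; datetime="Wed, 01 Jan 2003 00:00:00 GMT",
<http://archive-a.example/2004/http://example.com/>; rel="last memento"; datetime="Thu, 01 Jan 2004 00:00:00 GMT"
'''

# Matches one entry: <url>; rel="..."; datetime="..."
LINK = re.compile(r'<([^>]+)>;\s*rel="([^"]*)";\s*datetime="([^"]+)"')


def breakdown(timemap: str) -> dict:
    """Return {archive host: (snapshot count, earliest snapshot datetime)}."""
    snapshots = defaultdict(list)
    for url, rel, stamp in LINK.findall(timemap):
        # Only entries whose relation includes "memento" are archived versions.
        if "memento" in rel.split():
            host = urlsplit(url).netloc
            snapshots[host].append(parsedate_to_datetime(stamp))
    return {host: (len(dates), min(dates)) for host, dates in snapshots.items()}


summary = breakdown(SAMPLE_TIMEMAP)
for host, (count, earliest) in sorted(summary.items()):
    print(f"{host}: {count} snapshot(s), earliest {earliest:%d %B %Y}")
```

Grouping by host is only one possible way of telling archives apart; a real aggregator would more likely know which archive each TimeMap came from.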

Get started with the search page; or, see it in action for the Google homepage (over 4,000 snapshots in four archives since 1999) and the BBC homepage (more than 5,000, in five archives, since 1996).

For those interested in the detailed workings and in reusing the web client, the source code is hosted on GitHub.

Peter Webster


24 January 2013

Web archiving: how to fit it in? A workshop report

[A report by Nicola Johnson, Web Archivist at the British Library]

I attended a workshop, “How to fit in – integrating a web archiving program in your organization”, at the Bibliothèque nationale de France (BnF) in Paris, 26th – 30th November 2012. It was sponsored by the International Internet Preservation Consortium (IIPC).

The workshop was intended for curators, archivists and managers involved in (or about to embark on) web archiving at their institutions. The BnF has been archiving websites since late 1999 and has a vast amount of expertise. France was an early adopter of legal deposit for websites, with legislation in August 2006 meaning that websites from the French national domain can be collected by the BnF for preservation and public use. I was particularly interested in the transition that they have made to this large-scale operation, as legal deposit legislation is expected in the UK this April and we will have the task of integrating large-scale archiving with our current selective undertaking.


Several IIPC member organisations attended the workshop, hosted in one of the four ‘towers of open books’ at the BnF’s main site. The François Mitterrand building was one of the grands projets of the former president and is one of the largest and most modern libraries in the world. Participants included the British Library and the national libraries of Germany, Slovenia, Estonia, Spain and the Netherlands. Also represented were the Bavarian State Library, the California Digital Library, the National Library and Archives of Quebec, the Bibliotheca Alexandrina and the Library of Congress. Participants represented a range of experience in web archiving and were at different stages of national legal deposit legislation.

A wide range of topics was covered, including the integration of web archiving in acquisition practices; the role of subject librarians in selecting websites; and how web collections should align with general collection development policies. As the business of web archiving involves several parts of a library, we also heard representatives of various departments at the BnF speak of their role, including IT, conservation, legal deposit, collections co-ordination and digital and bibliographic information. There were subject specialists from the music, literature and art departments, who spoke about their collection development policies and how to incentivise staff to select websites when they have a multitude of other duties to perform. Given my role as Web Archivist, I was particularly interested in the role of the 70 or so curators or “recommending officers” who select websites for the focussed crawls undertaken by the BnF.

A presentation was also made by the Internet Memory Foundation, a non-profit institution based in Amsterdam and Paris. The foundation provides a shared platform for institutions to collect websites and is archiving dozens of terabytes of data every month. They are also involved in various research projects with institutions and are developing a new crawler and architecture for web-scale crawling. Later in the week we also had the opportunity to visit the National Audiovisual Institute (INA), a repository containing 70 years of French radio programmes and 60 years of TV. The INA shares responsibility for collecting legal deposit online content with BnF and began collecting broadcast-related websites in February 2009. It holds approximately 10,000 websites, employing multiple crawlers for different types of content. Access is available at six sites in France, but some material under open licence is available online.

Our hosts succeeded in creating an atmosphere that was relaxed and stimulating (see the pictures); a great many ideas were exchanged and the commonality of purpose among the participants was encouraging. I have returned to work with a renewed vigour and positivity towards web archiving and, judging by their messages since the event, so have the other participants. Positive changes are being made in our respective institutions as a result of the workshop.

[Image of the BnF (Creative Commons BY-NC-SA) from Images et Voyages]

21 January 2013

What could you do with an archive of the UK web, 1996-2010?

The Analytical Access to the Domain Dark Archive (AADDA) project has brought together a group of scholars to help us formulate which analytical tools users will need to make the most of the JISC UK Web Domain Dataset, a dataset of all the holdings of the Internet Archive for the UK from 1996 to 2010.

A (very large) geo-index of the data is already available for download, and the dataset can also be visualised using the Ngram. But this group of scholars of the humanities and social sciences are beginning to imagine the projects they would like to pursue using the data. I myself began to sketch an answer in a previous post on the AADDA blog.

Since then, summaries of those projects have been appearing on the project blog. Here are some of them.

(i) Dr Richard Deswarte will be exploring and uncovering Euroscepticism in the Dark Archive.

(ii) Saskia Huc-Hepher (University of Westminster) will be exploring the spatial dimensions of the French community in London.

(iii) Professor Gemma Moss (Institute of Education) will be examining the use of statistical data in setting agendas for education change, and the PISA rankings in particular.

(iv) Carole Taylor is investigating the decline of parliamentary political engagement and its implications.

(v) Helen Taylor (Royal Holloway, University of London) will be examining the reception of the Liverpool poets.

Watch out for more posts here on this project as it unfolds. It is a collaboration between ourselves at the British Library, the Institute of Historical Research (University of London) and the University of Cambridge, and is funded by the JISC.

Creative Commons image courtesy of Wikimedia Commons.

14 January 2013

Religion, politics and the law: a new special collection

It has been over two years in the making, but I am delighted to be able to say that my own special collection in the UK Web Archive is now online.

A couple of years ago, long before coming to the BL, I joined the Researchers and the UK Web Archive project at the Library, which brought together a group of scholars to guest-curate special collections on our own particular research interests. As an historian, I was interested in the marked sharpening of the terms of discourse about the place of religion in British public life, particularly since 9/11 and the London bombings in 2005. It struck me that a good deal of this debate had already shifted online, and so new ways and means of capturing and preserving it were going to be needed. And so, the ‘politics of religion collection’ (as it was then known) was born.

As has been noted many times in this blog, the problem for web archiving is that we’re dealing with other people’s copyright work, and so an individual permission is needed for each site. I have a long list of sites which I would dearly love to add to the collection, but for which (for various reasons) we’ve had no response. So, if you are the owner of Protest the Pope, or Holy Redundant, or Christians in Politics, please get in touch. For now, even if the collection cannot be anything like comprehensive, I do hope that it is at least coherent.

There are particular strengths, and some gaps. It includes many campaigning organisations, both secularist and religious, and is heavy on the conservative Christian organisations about which I myself know most. It is relatively light on non-Christian faiths, since I know the field much less well. It is still very much open, however, and so suggestions of sites that ought to be included are very welcome, via this blog or via the UK Web Archive site.

See a previous post about my progress in 2012.

Peter Webster

07 January 2013

Oral history in the UK: a new special collection

[A guest post from Elspeth Millar, Oral History Archive Assistant in the National Lifestory Collection at the British Library.]

I have been involved in the pilot Curators' Choice project, led by the Digital Curator team. The Curators' Choice project is helping curators within the British Library to establish collections in the UK Web Archive, based on the subject expertise of their curatorial department. As Archive Assistant for Oral History and National Life Stories at the British Library, my natural topic of choice was going to be websites relating to 'Oral History in the UK'. I have nominated organisational or individual project websites which give information about a project (project background, participants, funding information); websites which provide access to finding aids for oral history interviews; and, ideally, sites which provide direct online access to oral history archive material (either clips or full interviews).

I was lucky to have existing resources at my disposal to discover relevant websites, in particular our own Oral History section resources page, the Oral History Society website and the Oral History Journal; the journal includes a 'Current British Work' section which helpfully lists current oral history projects around the UK.

Oral History in the UK was traditionally concerned with community history and uncovering 'history from below', although it is now widely used within many academic disciplines. I hope that the websites so far included in the 'Oral History in the UK' collection demonstrate the variety of ways in which oral history is now used: by community and local history groups and charities, but also by universities. The range of websites in the collection includes those which document local history (Durham in Time, St. Helier Memories); the experiences of people who have emigrated to the UK (such as the Birmingham Black Oral History Project); disability history (Speaking Up For Disability); health (Testimony - inside stories of mental health care); industry (Songs of Steel); and memories of war (The Workers' War, Captive Memories).

The websites vary widely in the way they present oral history. Many websites, although not all, provide access to extracts from oral history audio or video archive material; and most sites also provide information on the project background, participants and funding arrangements.

There are many more websites I would love to include in the collection. Indeed, many more have been nominated, but the Web Archive team is still awaiting permission from their owners to include them. We'll carry on nominating sites, and we welcome nominations from the public as well: if you think there is an important UK oral history website that is not yet included in the UK Web Archive, please contact the Web Archive team.

02 January 2013

Slavery and Abolition in the Caribbean: a new special collection

[A guest post by Dr Philip Hatfield, Curator for Canadian and Caribbean Studies at the British Library.]

Back in July I added a short post to this blog about the first stages of selecting material for the UK Web Archive Special Collection, ‘Slavery and Abolition in the Caribbean’. Now, after much trawling of the web and selection of sites, and brilliant work from my colleagues from the UK Web Archive (whose determination and technical wizardry know no bounds) I’m delighted to say that the first iteration is now live for public use. You can access it here, and I hope you find it of use.

Before I go though, some further thoughts about web archiving in the context of this collection. The first thing to note is how important this kind of work is for maintaining a record of not just the Web but writing, publishing and commemoration more generally in the early 21st century. There are many websites and pages produced during the 2007 bicentenary of the abolition of the slave trade that have either disappeared or no longer have a contactable administrator who can grant the Web Archive rights to collect and display the site. And so, valuable resources for understanding the UK’s engagement with the history of slavery and the politics of remembrance are lost to us.

Following on from this, it is impossible to overstate the importance of permissions to the construction of viable collections within the UK Web Archive. Permissions allow sites to be archived and made available to the public and are key to providing a comprehensive research resource. Without them, a collection may not fully reflect the selections of the curator or the material that is live on the Web, which is partly the case with the ‘Slavery and Abolition in the Caribbean’ collection. We’ll keep trying with those sites for which we have not yet got permission, but I am very grateful to those institutions and individuals who have taken the time to consider our request and grant permission.

Highlighting these problems brings me to my main point: this is an evolving collection driven by the need to continue to archive what already exists on the Web and also relevant sites created in future. This is where, hopefully, readers of this post and users of the collection come in. I hope the process of building and maintaining this collection can become a dialogue between users, myself and the UKWA. If you know or moderate a site you think should be part of the collection please do get in touch with me, at [email protected].

[The image is part of a plan of the slave ship Brookes, found in various archived sites, including that of Brycchan Carey. Originally from Thomas Clarkson’s ‘The history of the rise, progress and accomplishment of the abolition of the African slave trade by the British Parliament’ (BL Shelfmark: 522.f.23).]