UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

2 posts from December 2012

19 December 2012

Digital Humanities and the Study of the Web and Web Archives

In early December 2012, I attended a PhD seminar on Digital Humanities and the Study of the Web and Web Archives. It was organised by netLab, a research project for the study of Internet materials affiliated with the Centre for Internet Studies at Aarhus University, Denmark.

Eighteen PhD candidates from different parts of the world attended the seminar. They are at different stages of their research, but together they represent a new generation of researchers who have embraced the Internet to study society and culture as “it holds the most multifaceted material documenting contemporary social, cultural and political life”, in the words of the organisers. The workshop drew specific attention to web archives. This is not surprising, as Niels Ole Finnemann and Niels Brügger, the organisers of the seminar, were not only closely involved in the conception and development of the Danish National Web Archive, but also use web archives as a key source in their own research into the history of the Internet. The purpose of the workshop was twofold: to explore relevant digital research tools and methods, and to introduce the students to web archives as a corpus for research, including their characteristics and their analytical and methodological consequences.

Presentations from the students painted a diverse picture of research topics and disciplines. Things that struck me included the already creative use of various digital research methods as well as the (almost indispensable) role of social networks such as Twitter and Facebook. Adrian Bertoli of the University of Copenhagen, for example, who studies the online diabetes community, is also investigating how that community relates online to medical professionals, pharmaceutical companies and governments. He produced a hyperlink map to illustrate the interactions between the various actors. Another example is Jacob Ørmen, also of the University of Copenhagen, who investigates the interplay between established media and social media in the coverage of worldwide “media events”, such as the Diamond Jubilee or the 2012 London Olympics, where social media data about the events would be fundamental to the research.

Over time, users of web archives such as those at the seminar are likely to need, more and more, the means to collect or assemble their own research corpora. From our point of view, as a web archiving service provider whose main users are academic researchers, broad national web archive collections, which often have only limited accessibility for legal and technical reasons, may not meet the varied needs of individual researchers and are in danger of providing a “one-size-fits-nobody” solution. Archiving and providing access to individual historical web resources is the basic “must-have” of a web archive. To add value beyond that, we should think about collecting and storing those web resources in a way that allows individual researchers to organise and then continually reassemble their own research corpora. We also need to provide the tools for processing and manipulating those corpora using various digital methods.

One of the difficulties in studying web archives highlighted by Niels Brügger is the poor interoperability between web archives with different scopes and geographical coverage. What we need is a research infrastructure capable of supporting the study of the history of the Internet across web archives in different countries, collected using different principles and with content in different languages. There is a funding bid under consideration by the EU to develop this.

Helen Hockx-Yu, December 2012  


04 December 2012

Capturing the police authorities

For almost half a century, Police Authorities in England and Wales fulfilled their role of ensuring that the public had an efficient and effective local police force. Following the Police Reform and Social Responsibility Act 2011, however, each Authority was replaced by a single elected individual (a Police & Crime Commissioner).

Thursday 15th November saw elections for the new Police and Crime Commissioners in the 41 police force areas in England and Wales outside London (the Mayor of London, Boris Johnson, has since January held the equivalent role over the Metropolitan Police Force).

We in the British Library Web Archiving Team were concerned that, with the abolition of the Police Authorities and the disappearance of their websites, significant documentary material would be lost. Information on the Authority websites typically includes annual reports, statements of accounts, policing plans, public consultations, strategy and delivery plans and newsletters, all of which serve to inform the public of the work of the Authorities and to enable Authority members to scrutinise the constabulary and hold the Chief Constable accountable.

In light of this, we contacted the Police Authorities to ask for permission to archive their current websites before the Authorities were replaced by the PCCs on 20 November. Some Authorities responded immediately, whereas others required further information, and (after a little bit of chasing) we received a 100% positive response rate. This is certainly something to be pleased about, as the usual response rate is between 25% and 30%. As a result, for the first time we have been able to capture a nationwide administrative change comprehensively.

Between two and four snapshots of each website were taken and reviewed individually for quality and completeness before being submitted to the archive. Typical issues included the need to add supplementary seeds to capture linked documents and style sheets external to the host server, to apply filters to prevent crawler traps, and to probe crawl logs to identify the reasons for missing content. The final snapshots were taken on 20th November in case of any last-minute changes. See the whole collection.
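
For readers unfamiliar with crawler traps, the short Python sketch below illustrates the general idea of a URL filter. It is not the configuration used for these crawls: the function name, the thresholds and the list of calendar-style parameters are all assumptions made up for illustration, and the heuristics are deliberately crude. A real filter would need tuning against each site.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical thresholds and parameter names, chosen for illustration only.
MAX_PATH_DEPTH = 12
MAX_SEGMENT_REPEATS = 3
CALENDAR_PARAMS = {"year", "month", "day", "date"}


def looks_like_crawler_trap(url: str) -> bool:
    """Return True if the URL matches one of three simple trap heuristics."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    # Heuristic 1: very deep paths often come from endlessly nested links.
    if len(segments) > MAX_PATH_DEPTH:
        return True

    # Heuristic 2: the same path segment repeated several times,
    # for example /a/b/a/b/a/b/.
    if segments and max(Counter(segments).values()) >= MAX_SEGMENT_REPEATS:
        return True

    # Heuristic 3: calendar-style query parameters, which can generate
    # an unbounded number of pages. This also rejects some legitimate pages.
    query_keys = {key.lower() for key in parse_qs(parts.query)}
    if query_keys & CALENDAR_PARAMS:
        return True

    return False


# Example use: keep only the candidate URLs that pass the filter.
candidates = [
    "http://example.org/reports/annual-report-2011.pdf",
    "http://example.org/events/calendar?year=2099&month=1",
]
to_crawl = [url for url in candidates if not looks_like_crawler_trap(url)]
```

Any URL rejected by a filter like this would still be worth checking against the crawl logs, since an over-eager rule is one of the possible reasons for missing content.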