UK Web Archive blog

15 January 2014

RESAW: Research infrastructure for the Study of Archived Web materials

[Helen Hockx-Yu, Head of Web Archiving at the British Library, writes:]

Two scholars at Aarhus University, Denmark, Niels Brügger and Niels Ole Finnemann, organised a workshop in December for potential partners of RESAW, an initiative aimed at building a pan-European research infrastructure for the study of web archives. An important element of the infrastructure is the existing national web archives, which are often underpinned by legal frameworks such as legal deposit or copyright law but are not fully available to the public. To make use of such archives, researchers have to be physically present on the archiving institutions’ premises.

A research infrastructure, however, is more than a set of isolated national web archives with restricted access, often referred to as “dark archives”. The goal is to find ways to link these together and offer seamless access to distributed web archives. The Mementos Service developed by the UK Web Archive, which allows discovery and delivery of archived web pages from multiple web archives, is a good example of how this could be done. Anat Ben David of the University of Amsterdam, associated with the WebArt project, presented impressive and promising search and visualisation approaches, which significantly improve access to large-scale, closed national web archives.

Awareness and understanding of the characteristics of archived web material, and the development of appropriate research methods to study it, are equally indispensable elements of RESAW. It is not surprising that, in addition to a number of national web archives, researchers from universities and research institutions across Europe were strongly represented at the workshop. In his keynote, Niels Ole Finnemann analysed the particularities of archived web material, comparing it with the live web and with other digital sources. He argued that the archived web is “re-born” digital content, and differs from the live web in many ways. RESAW does not have a particular disciplinary focus but aims to allow for all kinds of epistemological and methodological approaches, whether rooted in the sciences, the social sciences or the humanities.

I was honoured to be able to present the perspectives of web archiving institutions, and was given a brief to focus on our interactions with scholars. I reported on our earlier work on scholarly feedback and highlighted the increasing number of interactions with scholars in recent years, with a number of research groups emerging that devote effort and attention to web archives. UK institutions among these include the Institute of Historical Research, based at the University of London, and the Oxford Internet Institute. Both have recently been funded by the Joint Information Systems Committee (JISC) to carry out research projects using web archives, in partnership with the British Library. A general trend with three phases can be observed in scholarly interaction with web archives:

Phase 1: Building collections
Scholars are involved in scoping collections, and in selecting and describing websites relevant to their research interests. This effort often results in the creation of specific, if sometimes narrow, topical collections.

Phase 2: Formulating research questions
This often takes the form of brainstorming sessions, workshops and projects, where researchers are made aware of web archives and asked: which research questions might web archives help you answer? This is a much more bilateral interaction and represents a shift of focus to web archives in their entirety. It suffers, however, from requiring participants to define the unknown, and it is also time- and resource-consuming.

Phase 3: Independent use of web archives
This type of interaction has just begun to emerge. It is the desired “go-to” state, where interfaces to web archives already meet the most common scholarly requirements. Scholars are able to use web archives without having to depend on (personal) interactions with providers. This requires user interfaces to be self-explanatory and jargon-free, and to contain baseline information about the archive: its scope, its coverage and lacunae, how it was collected, and how a particular website was crawled.

RESAW is aiming to apply for funding from the European Commission under the Horizon 2020 Framework. The workshop was an opportunity to identify issues and discuss a plan. It produced a list of tasks for RESAW to address, as well as the steps towards a funding application.

As one of the providers of the UK’s national web archive, we are pleased to be involved, as we see RESAW as an important initiative that will help connect scholars with web archives and with each other in new ways.