Scholars and web archives: a report on the IIPC General Assembly, Slovenia April 2013
[Nicola Johnson, Web Archivist at the British Library, reports on the General Assembly of the International Internet Preservation Consortium (IIPC) held in Ljubljana, Slovenia in April 2013.]
The IIPC is a membership organization dedicated to improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. The British Library was a founder member of the IIPC, which this year saw its tenth anniversary, providing an opportunity for members to reflect on their achievements and discuss future directions of the Consortium.
The General Assembly, hosted by the National and University Library of Slovenia, comprised three days of member meetings and a two-day public conference on the theme ‘Scholarly Access to Web Archives’. The conference was the part I attended; it was held at the Hotel Mons, just outside Ljubljana, surrounded by pine forests with views to the mountains beyond.
The overall vibe of the General Assembly was very positive. Members share a strong commonality of purpose and the workshop allowed people to share their experiences through open and honest dialogue.
I was particularly interested in the perspective of scholars using web archives, several of whom presented at the conference. Niels Brugger from NetLab at Aarhus University discussed the various programmes NetLab are currently running in digital humanities and internet studies, such as Danish Internet Favourites 2009; Digital Footprints; Network Analysis of the Danish Parliamentary Elections 2007-2011; Cross Media Production and Communication; and ‘Fundamental Tools for Web Archive Research (FUTARC)’.
Niels stressed the importance for researchers of making informed decisions about the completeness of archived websites and determining if there are inconsistencies between versions. Information about what is missing from the archived object enables scholars to assess resources and is something web archivists can help with by documenting the gaps.
Sophie Gebeil (University of Aix-Marseille) provided a different perspective on the researcher’s experience of using web archives when she described her doctoral studies on North African immigrant communities. During her research the Web materialised as the medium of expression for immigrants, and web archives were therefore of the greatest significance for her studies. Sophie stressed that archived websites are not original documents but are, to a greater or lesser extent, artificially reconstructed, and that historians therefore need to understand the limitations of the material they are working with.
Meghan Dougherty of Loyola University Chicago expanded further on the difficulties scholars encounter when using web archives. As a web historian and web methodologist, Meghan investigates the idea of the Web as a co-authored medium and is interested in the data that users share, expose or trade when communicating through the internet. She put forward the interesting notion that methods in web history are analogous to those of anthropology or archaeology, as researchers in this field seek to reconstruct the user’s journey through a website. To this end, the ‘share this’ or ‘like’ buttons on a website ought to be preserved with as much consideration as the content of the website.
In the panel discussion that followed, the consensus was that scholars researching web archives require as much contextual information as possible about the archived objects, including curatorial data, the legal framework in which archiving took place, and which content is missing. This information is extremely helpful to the web archivist when performing quality assurance checks on harvested material.
Obstacles to collaboration between researchers and archivists were discussed. In the early years of web archiving the immediate concern was to acquire content, and the question of what to preserve was to some extent secondary. Now that there are a good number of existing web archives, scholars can start to articulate how exactly they will use them and what cultural and heritage institutions should focus on collecting.
From the web archivist’s point of view, one of the big questions is how and where to select content, given the enormous size of the web. So far, the selection of curated web resources has been a largely manual, resource-intensive process, which is not only expensive but also represents only the ‘expert view’. To address this, web archiving institutions have begun to explore the benefits of crowdsourcing in selecting web content to archive.
Helen Hockx-Yu, Head of Web Archiving at the British Library, demonstrated the Twittervane tool, a prototype application designed to collect and analyse the URLs published in tweets (see previous post). Workshop participants had the opportunity to set up and run collections and to submit feedback, which will be used in the further development of the tool.
Delegates were also impressed by the National and University Library of Slovenia’s WayBack Annotator, another prototype tool which enables members of the public to collaborate with other users on a common platform by selecting URLs of interest, forming groups or collections relevant to them, tagging individual URLs and/or whole websites, supplying additional metadata, marking important parts of individual pages and adding notes and annotations to selected pages.
The IIPC General Assembly provided an excellent forum for members to discuss the simultaneous challenges faced by web archiving institutions, from the technical challenges of harvesting, preservation and replay to the challenge of defining the future use cases for web archives and the requirements of scholars.
[Other accounts of the conference include those by Ahmed Alsum (Web Science and Digital Libraries Research Group at Old Dominion University), Rosalie Lack of the California Digital Library and Abbey Potter, Program Officer with NDIIPP at the Library of Congress and outgoing Communications Officer at the IIPC.]