I recently made the trip to the Digital Research 2012 gathering in Oxford with my colleagues in the web archiving team, Helen Hockx-Yu and Andy Jackson. We were taking part in a day of presentations and workshops on the theme of digital research using web archives. (See the programmes of our session and of the whole conference.)
It was an excellent opportunity to showcase a cluster of current projects, both here at the BL and in association with us, and to make connections between them. Andy demonstrated some of the forthcoming visualisation tools for the archive, some of which are already available on the UK Web Archive site (see earlier post). Helen presented some summary results from a recent survey of our users, about which she wrote in an earlier post.
Recently, the JISC very generously funded two projects to explore the use of the UK Web Domain Dataset, and there were presentations from both. Helen Margetts from the Oxford Internet Institute presented the Big Data project, which is conducting a link analysis of the whole dataset, showing its usefulness for political scientists and other social science researchers by analysing the place of government in information networks in the UK.
I myself then presented some early findings from the Analytical Access to the Domain Dark Archive project, led by the Institute of Historical Research (University of London). I reported on a series of workshops with potential users of the dataset, who raised important questions about research of this type. How far should researchers trust analytical tools inside a 'black box', presenting results generated by algorithms that are not (and often cannot be) transparent? Also, how far does research on datasets of this scale present new questions of research ethics, and who should be looking for the answers to them?
In the afternoon we discussed some of the themes raised in the morning, to do with potential users and their needs. Some of these were:
(i) that large datasets present amazing opportunities for analysis at a macro level, but at the same time many scholars will still want to use web archives as simply another resource discovery option, to find and consult individual sites. Both approaches need to be catered for.
(ii) possible interaction with Wikipedia. As more and more sites disappear from the live web over time, and UKWA increasingly becomes the repository for the only copy, we might expect UKWA to be cited more often as a source in Wikipedia. However, there may be ways to aid and encourage this process.
(iii) how do we identify potential user groups? We can't safely say that scholars in Discipline A are more likely to use the archive than those in Discipline B. It may be that sub-groups within each discipline find their own uses. For instance, one wouldn't find much data about the Higgs Boson in the archive, but a physicist interested in public engagement with the issue might find a great deal. One wouldn't look in UKWA for the texts of the Man Booker prize shortlist, but a literature specialist could find a wealth of reviews and other public engagement with those texts.
Overall, it was a most successful day, which gave us much food for thought.