UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

16 June 2015

Beginner’s Guide to Web Archives Part 2

Having begun to collect websites for a special collection on climate change, in part two of our beginner’s guide, science policy intern Peter Spooner continues his journey into web archives, considering how they can be used by researchers.

Digging Deeper

Although this is still a young field, researchers are giving more and more thought to how web archives can be used as a research tool. Ways of using the archive include detailed evaluation of website content (language, imagery, context), evaluation of the links between websites, and establishing trends across the material. I will give some brief examples of these approaches and highlight some of the difficulties involved.

In perhaps the simplest example, the UK Web Archive contains an interactive visualisation tool that can track the number of mentions of a word or phrase over time. This was explored in some detail in the blog post ‘Towards a Macroscope for UK Web History’. For example, since I am interested in climate change I could type “climate” into the search box and see what I get. However, the single word “climate” could also refer to anything from holidays in Spain to climate control in cars. Instead I could enter the more specific phrase “climate change” (Fig 1). The search “climate change” is likely a more reliable indicator of how often climate change is discussed on the web, although phrases such as “warming climate” or “changes in climate” would not be found by this search.
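To make the idea concrete, here is a minimal sketch of how mentions of an exact phrase could be counted per year. It is not the code behind the archive's visualisation tool; the function name, the (year, text) input format and the sample pages are all assumptions made up for illustration.

```python
import re
from collections import Counter


def phrase_mentions_by_year(documents, phrase):
    """Count case-insensitive occurrences of an exact phrase per year.

    `documents` is an iterable of (year, text) pairs. This shape is an
    assumption for illustration, not the archive's real data format.
    """
    # Allow any run of whitespace between the words of the phrase.
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, text in documents:
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))


# Made-up sample pages, purely for illustration.
sample_pages = [
    (1998, "The Spanish climate is ideal for holidays."),
    (2004, "Climate change is a growing concern. Climate control comes as standard."),
    (2004, "Scientists warn of a warming climate."),
    (2009, "Climate change policy and climate  change science."),
]

climate_counts = phrase_mentions_by_year(sample_pages, "climate")
climate_change_counts = phrase_mentions_by_year(sample_pages, "climate change")
```

In this toy data the single word “climate” also picks up the holiday and car pages, while the phrase “climate change” misses the “warming climate” page, which mirrors the trade-off described above.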


The data, while imperfect, suggest that climate change grew as an online topic between 1996 and 2010. But what other quick climate tests can we do? A claim among some who are sceptical of climate science is that certain organisations and groups of people have dropped the term “global warming” in favour of the term “climate change”, because of an apparent lack of surface warming since 1998. The data from the UK Web Archive suggest that any such ‘lexicon shift’ has not made its way onto the UK web, with “climate change” mentioned more often than “global warming” throughout 1996 to 2010 (Fig 2). Data from published books and science papers also indicate that this shift has not happened on a large scale. However, what this tool cannot tell us in detail is who was using the terms online. In order to get any kind of detailed understanding, digging further into the data is essential.


One of a variety of other tools that could be used is sentiment analysis. In our case such analysis could be used to determine the attitude of a website to climate-related policies (essential or ridiculous), climate scientists (heroes or demons), climate science (true or misleading), etc. Again there are limitations. Websites usually contain sections and quotes written by different people, which may not share the ‘sentiment’ of the website overall. Once again, the key to the research would be to dig deeper into the website. Digging deeper was a common theme amongst speakers at a recent British Library workshop on web archives, where most of the researchers involved had to revise their initial research questions and make them more specific. For example, I could start out by asking “How has climate change been portrayed online since 1996?”, but I would likely find so much data that I would not know what to do with it.
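As a rough illustration of that limitation, the sketch below scores each section of a page separately using a tiny word list. This is a deliberately naive approach, not a real sentiment analysis method; the word lists, function names and sample page are all hypothetical.

```python
import re

# Tiny hand-made word lists, assumed for illustration only.
POSITIVE = {"essential", "robust", "trusted", "true"}
NEGATIVE = {"ridiculous", "misleading", "alarmist", "flawed"}


def sentiment_score(text):
    """Return the positive minus negative cue-word count (0 means neutral or no cues)."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative


def score_sections(sections):
    """Score each named section on its own, so quoted material stays visible."""
    return {name: sentiment_score(text) for name, text in sections.items()}


# A made-up page with an editorial voice and a quoted reader comment.
page = {
    "editorial": "Cutting emissions is essential and the science is robust.",
    "quoted_comment": "These targets are ridiculous and the models are flawed.",
}
section_scores = score_sections(page)
overall = sum(section_scores.values())
```

Here the page as a whole scores zero even though its two sections point in opposite directions, which is why a single score per website can mislead.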

Uses of Special Collections

Sub-collections of websites from the archive can be useful in trimming down the data set, helping to focus the research question. However, the researcher must be aware that the set of websites has been subjectively chosen, so collections require a slightly different approach from analysing the entire archive. In one early example, a set of websites (a corpus) was collected in order to analyse political action online using the links between websites. In terms of my collection, rather than the very broad question I asked above, we could ask questions like: “How do energy companies’ portrayals of climate change shift over time?”, which could involve detailed analyses of the websites, link analysis, language analysis, etc. The hope is that all the necessary sites would already be in the collection, ready for the researcher to use.
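A very simple form of link analysis can be sketched as follows: given pairs of (linking page, linked page), count how many different sites in a collection link to each site. The function name, the input format and the example.com-style addresses are assumptions for illustration, not data or tools from the archive.

```python
from collections import Counter
from urllib.parse import urlparse


def inbound_link_counts(links):
    """Count how many distinct other sites link to each host.

    `links` is an iterable of (source_url, target_url) pairs, an assumed
    format for illustration. Links within a single site are ignored.
    """
    site_pairs = set()
    for source_url, target_url in links:
        source = urlparse(source_url).netloc.lower()
        target = urlparse(target_url).netloc.lower()
        if source and target and source != target:
            site_pairs.add((source, target))
    return Counter(target for _, target in site_pairs)


# Made-up links between hypothetical sites in a small collection.
links = [
    ("http://campaign-a.example/home", "http://ngo.example/report"),
    ("http://campaign-a.example/news", "http://ngo.example/about"),
    ("http://campaign-b.example/", "http://ngo.example/report"),
    ("http://ngo.example/links", "http://campaign-a.example/home"),
    ("http://ngo.example/links", "http://ngo.example/contact"),
]
inbound = inbound_link_counts(links)
```

Counting distinct linking sites rather than raw links stops one heavily cross-linked site from dominating the result. Because a collection is hand-picked, any such counts describe the collection rather than the web as a whole.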

Research Collaboration

An obvious way to make sure this hope becomes a reality would be to engage researchers in the creation of special collections. This approach would mean that more relevant sites are included, and that researchers can learn first-hand about the web archive before using it, helping them to develop the questions needed to use the archive as a tool alongside their existing research methods. Many papers study human attitudes to climate change, and how those attitudes have changed over time (e.g. here). We hope to involve some of these researchers in the web archives project. Stay tuned for updates.

Peter Spooner, Science Policy Intern

