UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


2 posts categorized "Visual arts"

05 November 2020

On World Digital Preservation Day, the UK Web Archive and the ‘Children of Lockdown’ capture the moment for future generations

By Charlotte McMillan, Founder, Storychest

Introduction from Nicola Bingham, Lead Curator, Web Archiving, British Library

Today is World Digital Preservation Day (WDPD), a celebration of all things digital preservation which is organised by the Digital Preservation Coalition and held annually on the first Thursday of November.

To mark today’s WDPD, the UK Web Archive is highlighting the ‘Children of Lockdown’ project, a recent addition to the Archive which exemplifies the diversity of voices captured in the Archive and highlights the importance of preserving our digital legacy.

Charlotte McMillan, founder of Storychest and curator of the ‘Children of Lockdown’ project, writes:

“Lockdown has been like a question mark in the middle of a sentence. Unexpected, confusing and stressful”, observes Yilu, 14, from London, in one of 200 thoughtful and powerful responses to lockdown and Covid19 from children aged 3 to 17, spanning the whole of the United Kingdom and from as far afield as Australia.

In July this year, children were asked to contribute to a ‘digital time capsule’ to be included by the British Library in the UK Web Archive’s Covid19 collection. The resulting group of stories, poems and artwork is insightful and poignant, and whilst it reflects anxiety and, in some cases, profound sadness, it is also imbued with humour, imagination and enduring hope. Disruption to education, distancing from friends and family, and uncertainty about the future are themes captured through the lenses, and with the clarity of the voices, of the children themselves, a generation whose lives have been affected by the pandemic in so many ways.

The ‘Children of Lockdown’ collection was initiated by Charlotte McMillan, mother of 3 boys and founder of Storychest, the private digital memory box app. Witnessing the impact of the upheaval of lockdown restrictions, she encouraged her own children to record their impressions of this unprecedented period of time.

Charlotte studied history at university and remembers the amazement she felt at exploring, on microfiche, century-old newspaper articles held by the British Library, which helped her to understand how events were perceived at the time. She wanted to enable today’s children to record their thoughts and impressions in a lasting way, to help future generations to understand what they were going through. So Charlotte, together with a group of 5 British children’s authors, put the word out to schools and other groups for children’s submissions.

The children have captured enduring and iconic snapshots like the ‘clap for carers’, PE with Joe Wicks, the run on loo paper, empty streets and the emergence of nature, including goats taking over Llandudno.

Maddy mask

Maddy, 15, from London drew herself in monochrome, with the now all too familiar accessory of a mask, eyes peering knowingly at the viewer.

Tiggy

Tiggy, 10, from Kent felt trapped and isolated away from her friends, so drew herself behind prison bars in her own home, but added hope to her work by overlaying a rainbow in pastels.

Sholto

Charlotte’s son Sholto, 14, pictured himself caught inside his phone, referencing the use of devices in lockdown as both a window to the outside world and a trap.

There are stories of imagination and escapism: Flora, 10, from Wallingford likened the virus to a wandering wolf ‘pacing the fence line’ outside her home; Joseph, 8, from Nottingham, missing playing football, invented games using Lego, “Harry Potter, racing through on his broomstick to smash it in the goal. Godzilla in goal, nobody could beat him”.

Incredibly brave Emma, 11, from Derbyshire, was being treated for a brain tumour during lockdown. She reflects on happy moments sitting on a best friend’s drive for a chat, and also describes her sadness that her mum was not able to be with her, due to the restrictions, when she rang the hospital bell to mark the end of her treatment.

Siomha, 11, from London grieves for the death of a beloved great uncle, who had been known as the ‘baby whisperer’ for his calming effect on her as a baby: “Where is my baby whisperer? This time he cannot stop the tears, because this time I weep for him.”

Maddy mural

And yet, through it all, there is hope. Maddy, 15, from Exeter, creates a lockdown mural in her bedroom to show hope for her family and friends. Saanvi, 11, from Leamington Spa, remembering Dumbledore’s words “Happiness can be found even in the darkest of times, if one only remembers to turn on the light”, reflects that “mankind will get through any crisis and discover positivity even where it seems impossible to find”.

Kenzi, 12, from Chesterfield, who is autistic and found lockdown to be a particularly anxious time, sums it up beautifully in his poem ‘Life in Lockdown’:

When normality returns

2020 will be remembered

In our broken battered hearts

As the time the world finally united

By staying apart

The Children of Lockdown collection can be viewed in full at https://childrenoflockdown.storychest.com/

The UK Web Archive Coronavirus Collection can be viewed here: https://www.webarchive.org.uk/en/ukwa/collection/2975

27 September 2018

Web Archives: A Tool for Geographical Research?

By Emmanouil Tranos and Christoph Stich, University of Birmingham

Introduction
If you are a quantitative social scientist, there are few things more fascinating than free, under-utilised, quirky and easy-to-download data that also fit the narrative of 'big data' well.

Combine the above characteristics with data that have the potential to support researchers answering interesting research questions and then you will make a researcher happy! And this is exactly what the JISC UK Web Domain Dataset held by the UK Web Archive is all about.

A detailed description of the data can be found here, but briefly this is a subset of the Internet Archive, covering the period January 1996 to March 2013, that includes all the archived webpages under the .uk top-level domain (TLD) together with their archival timestamps. The UK Web Archive partnered with the Internet Archive and JISC to create this unique data set, which gives researchers easy access to what is probably the largest national archive of webpages.

The UK web space has several unique characteristics
Apart from the fact that the UK was an early adopter of internet technologies and applications, its web space also includes some widely recognisable second-level domain names such as .co.uk and .ac.uk. While the former (mainly) denotes commercial activities based in the UK, similar to the .com top-level domain, the latter is used by UK universities. Moreover, the English language makes the UK web space more accessible to the rest of the world.

How is this dataset useful?
The JISC UK Web Domain Dataset is an easy way to access the Internet Archive data. It is, in essence, a long list of strings (i.e. groups of characters) that include the archival timestamp and the original URL of the archived webpages.

For instance, the first numerical part of the line below indicates when the contact page of the uk.eurogate.co.uk website was archived (9 May 2008 at 16:21:38).

20080509162138/http://uk.eurogate.co.uk/contact_us IG8 8HD

With the use of these strings a researcher can retrieve the HTML documents of the archived webpages from the Internet Archive API. The UK Web Archive further processed this data and created a subset of the archived UK webpages that includes all the .uk webpages that contain a UK postcode.

In the above example, the last element indicates that this specific webpage contains the postcode IG8 8HD.
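As a rough illustration, the example line above can be split into its three parts with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the layout shown in the example (a 14-digit timestamp, a slash, the URL, then whitespace and the postcode), and the names `GeoindexRecord`, `parse_geoindex_line` and `wayback_replay_url` are invented for illustration. The second helper builds a Wayback Machine replay address from the timestamp and URL, which is one common way to reach an archived page; the post does not say which Internet Archive interface was actually used.

```python
from datetime import datetime
from typing import NamedTuple


class GeoindexRecord(NamedTuple):
    """One line of the postcode subset (hypothetical container type)."""
    archived_at: datetime
    url: str
    postcode: str


def parse_geoindex_line(line: str) -> GeoindexRecord:
    """Split '<timestamp>/<url> <postcode>' into its three parts.

    Assumes the layout shown in the blog post's example line.
    """
    # Everything before the first slash is the 14-digit archival timestamp.
    timestamp, rest = line.strip().split("/", 1)
    # The URL runs up to the first whitespace; the remainder is the postcode.
    url, postcode = rest.split(None, 1)
    archived_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return GeoindexRecord(archived_at, url, postcode.strip())


def wayback_replay_url(record: GeoindexRecord) -> str:
    """Build a Wayback Machine replay address for the archived page."""
    return (
        "https://web.archive.org/web/"
        f"{record.archived_at:%Y%m%d%H%M%S}/{record.url}"
    )
```

Calling `parse_geoindex_line` on the example line gives a record whose `archived_at` is 9 May 2008 at 16:21:38 and whose `postcode` is `IG8 8HD`.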

This dataset, which is known as the Geoindex and can be downloaded from here, is probably one of the largest open data sets of georeferenced digital content.

Challenges
There are, however, a number of technical and conceptual challenges attached to the usage of these data. For instance, there is a debate in the literature regarding how much of the web is currently archived (e.g. Hale et al., 2017). Although there is some critique regarding the depth of the archival process (i.e. how many webpages from each website are archived), the Internet Archive is the most extensive digital archive (Holzmann et al., 2016; Ainsworth et al., 2011).

Moreover, the volume of the data requires some upfront investment in data analysis skills, but working with it is still feasible using standard off-the-shelf tools and libraries (e.g. in Python or R).

Results
After filtering out invalid postcodes, we are left with a dataset that contains about 5.8 million pairs of British postcodes and domain names.
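The post does not describe how invalid postcodes were filtered, so the following is only a sketch of one plausible approach. It checks each postcode against a simplified format pattern and then collects unique postcode and host pairs. The pattern is an assumption: it tests the general shape of a UK postcode, not whether the postcode really exists, and the URL's hostname stands in for the "domain name". The function names are invented for illustration.

```python
import re
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit

# Simplified shape check (assumption): outward code, space, inward code.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def normalise_postcode(raw: str) -> Optional[str]:
    """Return the postcode in 'IG8 8HD' form, or None if the shape is wrong."""
    compact = "".join(raw.upper().split())  # drop all whitespace
    if len(compact) < 5:
        return None
    # The inward code is always the last three characters.
    candidate = f"{compact[:-3]} {compact[-3:]}"
    return candidate if POSTCODE_RE.match(candidate) else None


def postcode_domain_pairs(
    records: Iterable[Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Collect unique (postcode, hostname) pairs from (url, postcode) records."""
    pairs = set()
    for url, raw_postcode in records:
        postcode = normalise_postcode(raw_postcode)
        host = urlsplit(url).hostname
        if postcode and host:
            pairs.add((postcode, host))
    return pairs
```

A stricter filter would check candidates against an authoritative postcode list; the pattern above only removes strings that cannot be postcodes at all.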

As one can see in the plot, the number of domains that reference a postcode grows relatively rapidly in the decade between 1995 and 2005 before growth levels off. The distribution of domains also more or less aligns with the population density of the UK. This is a good indicator that the collected data captures actual activity in the UK.

Domains_2012
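A yearly count of this kind can be sketched directly from the raw lines. The snippet below is an illustration under the same assumed line layout as the example earlier in the post (timestamp, slash, URL, whitespace, postcode); `domains_per_year` is a hypothetical helper, and it counts distinct hostnames per archival year, which may differ from how the plot was actually produced.

```python
from collections import defaultdict
from typing import Dict, Iterable
from urllib.parse import urlsplit


def domains_per_year(lines: Iterable[str]) -> Dict[int, int]:
    """Count distinct hostnames per archival year."""
    hosts_by_year = defaultdict(set)
    for line in lines:
        # Timestamp is everything before the first slash; its first
        # four digits are the year.
        timestamp, rest = line.strip().split("/", 1)
        url = rest.split(None, 1)[0]
        host = urlsplit(url).hostname
        if host:
            hosts_by_year[int(timestamp[:4])].add(host)
    return {year: len(hosts) for year, hosts in sorted(hosts_by_year.items())}
```

Dividing these counts by population figures for each area would give the per-inhabitant rates discussed for London, but the population data would have to come from a separate source.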

Unsurprisingly the data also reveal a difference between London and the rest of the country. The number of domains that reference a postcode per inhabitant grew faster in London than in other places, but eventually the rest of the country caught up with London. There are, however, quite significant differences in how the domains are distributed within London as well.

London_dpt


So, what research questions can these data help us answer? Utilising funding from the ESRC and the Consumer Data Research Centre (CDRC) we employed this data to explore the evolution of the digital economy in the UK. Firstly, we are utilising this data in order to understand whether the availability of online content attracts individuals online. We do that by employing unique survey data available from CDRC.

Hypothesis
Our underlying hypothesis is that the availability of internet content of local interest can attract people online in order to access and take advantage of potential online opportunities, such as accessing local products and services. The first results seem to support our hypothesis.

Secondly, we are using this data to explore the economic activities (e.g. products and services offered by firms) that take place in some of the UK digital clusters. By filtering the data to focus only on archived web pages from specific clusters in the UK and by utilising the textual data available from the archived HTML documents, we are building topic models to reveal what types of economic activity exist in these clusters and how these activities have evolved over time.
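Before any topic model can be fitted, the archived HTML has to be reduced to plain words. The sketch below shows one way this preprocessing step might look using only Python's standard library. It is an assumption about the workflow rather than the authors' pipeline: the helper `html_to_tokens` is invented for illustration, it ignores script and style content, and it keeps lowercase alphabetic words of three or more letters.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_tokens(html: str) -> List[str]:
    """Turn an HTML document into a list of lowercase word tokens."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Keep alphabetic words of at least three letters (a crude filter).
    return re.findall(r"[a-z]{3,}", text)
```

The resulting token lists, one per archived page, are the kind of input that a topic modelling library would then turn into a document-term matrix.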

We are testing how this archived web data can help us learn more about economic activities and how they have evolved over time. We are also comparing the outputs of this analysis with official industrial classifications from various sources, including data of this kind that is freely available from CDRC.

Lastly, together with colleagues from City-REDI, we are using the archived web data as a proxy to understand the early adoption of web technologies in the UK. According to arguments developed in evolutionary economics, the early adoption of web technologies may signify innovative regions which developed 'digital capacity' early enough, something which may affect their future growth trajectories. The first results indicate that the early adoption of web technologies is indeed related to positive future growth trajectories.

To close, we believe that our ongoing research, apart from answering substantive geographical research questions, will also illustrate the value of archived web data for geographical research. Archived web data is one of the few available data sources that can provide longitudinal georeferenced data, which also includes a wealth of unstructured textual data.

The latter can also reveal patterns and activities that other more 'conventional' data sources would not have been able to uncover.

References
Ainsworth, S. G., Alsum, A., SalahEldeen, H., Weigle, M. C., & Nelson, M. L. (2011). How much of the web is archived? Paper presented at the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries.

Hale, S. A., Blank, G., & Alexander, V. D. (2017). Live versus archive: Comparing a web archive to a population of web pages. In N. Brügger & R. Schroeder (Eds.), Web as History: Using Web Archives to Understand the Past and the Present (pp. 45-61). London: UCL Press.

Holzmann, H., Nejdl, W., & Anand, A. (2016). The dawn of today's popular domains: A study of the archived German Web over 18 years. Paper presented at the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL).