UK Web Archive blog

19 October 2021

Clouds and blackberries: how web archives can help us to track the changing meaning of words

By Dr Barbara McGillivray (Turing Fellow), Pierpaolo Basile (Assistant Professor in Computer Science, University of Bari), Dr Marya Bazzi (Turing Fellow) and Dr Jenny Basford, Jason Webber (British Library)

NOTE: This is a re-blog from the Alan Turing Institute, with permission.

The meaning of words changes all the time. Think of the word ‘blackberry’, for example, which has been used for centuries to refer to a fruit. In 1999, a new brand of mobile devices was launched with the name BlackBerry. Suddenly, there was a new way of using this old word. ‘Cloud’ is another example of a well-established word whose association with ‘cloud computing’ only emerged in the past couple of decades. Linguists call this phenomenon ‘semantic change’ and have studied its complex mechanisms for a long time. What has changed in recent years is that we now have access to huge collections of data which can be mined to find these changes automatically. Web archives are a great example of such collections, because they contain a record of the changing content of web pages.

But how can we automatically detect in a huge web archive when a word has changed its meaning? A common strategy is to build geometric representations of words called word embeddings. Word embeddings use lots of data about the context in which words are used so that similar words can be clustered together. We can then do operations on these embeddings, for example to find the words that are closest (and most similar in meaning) to a given word. It’s a useful technique, but building embeddings takes a lot of computing power. Having access to pre-trained embeddings can therefore make a big difference, enabling those in the scientific community without sufficient computational resources to participate in this research.
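To make the idea of "closest words" concrete, here is a minimal sketch of a nearest-neighbour lookup using cosine similarity. The three-dimensional vectors are invented purely for illustration; they are not taken from DUKweb, and real embeddings have hundreds of dimensions learned from text. The function names are our own.

```python
import math

# Toy vectors, made up for illustration only (NOT real embeddings).
embeddings = {
    "blackberry": [0.9, 0.1, 0.2],
    "raspberry": [0.8, 0.2, 0.1],
    "phone": [0.1, 0.9, 0.3],
    "cloud": [0.2, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(nearest_neighbours("blackberry", embeddings))
```

With these toy numbers, 'raspberry' comes out as the closest word to 'blackberry', simply because the two vectors point in nearly the same direction.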

A team of researchers from The Alan Turing Institute and the Universities of Bari, Oxford and Warwick, in collaboration with the UK Web Archive team based at the British Library, has now released DUKweb, a set of large-scale resources that make pre-trained word embeddings freely available. Described in this article, DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a collection of all .uk websites archived by the Internet Archive between 1996 and 2013. (This dataset is held and maintained by the UK Web Archive, which has been collecting websites since 2005, initially on a selective basis and since 2013 at a whole domain level.) DUKweb contains 1.3 billion word occurrences and two types of word embeddings for each year of the JISC UK Web Domain Dataset. The size of DUKweb is 330GB.

Researchers can use DUKweb to study semantic change in English between 1996 and 2013, looking at, for instance, the effects of the growth of the internet and social media on word meanings. For example, if the word ‘blackberry’ is used mostly to refer to fruits in 1996 and to mobile phones in 2000, the 1996 embedding for this word will be quite different from its 2000 embedding. In this way, we can find words that may have changed meaning in this time period. The figure below (from Tsakalidis et al., 2019) shows four words whose contexts of use have changed in the last couple of decades: ‘blackberry’, ‘cloud’, ‘eta’ and ‘follow’. The bars indicate words most similar to these four words in 2000 (red bars) and in 2013 (blue bars). The scale along the bottom gives a measure of the change.

figure 02 - analysis - clouds, blackberries
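One simple way to turn "the neighbours of a word look different in two years" into a number is to compare the two neighbour lists directly. The sketch below uses Jaccard distance between neighbour sets. This is an illustrative stand-in, not the method used in the DUKweb paper or in Tsakalidis et al., and the neighbour lists are hypothetical examples rather than values read from the figure.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; 0.0 means identical sets, 1.0 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical nearest-neighbour lists, invented for illustration only.
neighbours_by_year = {
    "blackberry": {
        2000: ["fruit", "raspberry", "jam", "bramble"],
        2013: ["phone", "android", "iphone", "fruit"],
    },
    "garden": {
        2000: ["lawn", "flowers", "patio", "shed"],
        2013: ["lawn", "flowers", "patio", "plants"],
    },
}


def rank_by_change(neighbours, year_a, year_b):
    """Sort words by how much their neighbour sets differ between two years."""
    scores = {
        word: jaccard_distance(years[year_a], years[year_b])
        for word, years in neighbours.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for word, score in rank_by_change(neighbours_by_year, 2000, 2013):
    print(f"{word}: {score:.2f}")
```

Comparing neighbour sets has the practical advantage that it does not require the embeddings from different years to live in the same coordinate space; comparing the vectors themselves would need that extra alignment step.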

The resources that underpin DUKweb are hosted on the British Library’s research repository, and are available for anyone in the world to download, reuse and repurpose for their own projects. This repository is part of the BL’s Shared Research Repository for cultural heritage organisations, which brings together the research outputs produced by participating institutions, and makes them discoverable to anybody with an internet connection. Providing a stable, dedicated location to hold heritage datasets in order to share them with a wider research community has been one of the key drivers in the implementation and development of this repository service. We are grateful to the British Library’s Repository Services team for supporting this collaboration between the UK Web Archive team and the Turing by making the content for DUKweb available.

Read the paper: DUKweb: diachronic word representations from the UK Web Archive corpus

 

04 October 2021

UK Web Archive Climate Change Collection

By Andrea Deri, Cataloguer, Lead Curator of UK Web Archive Climate Change Collection; Nicola Bingham, Lead Curator, Web Archives; Eilidh MacGlone, Web Archivist; Trevor Thomson, General Collections Assistant (Collection Development) National Library of Scotland


What public climate and sustainability related UK websites would you preserve for future research?

What public UK websites tell the story of climate change actions in your areas of living, travelling, working, study and passions?

Nominate these websites to the UK Web Archive Climate Change Collection. You can nominate as many websites or webpages as you feel are relevant.

Desert landscape - Photo by '_Marion'

About the Climate Change Collection
The UK Web Archive Climate Change Collection is not only an archive of past digital content preserved for future research. It is also a live, dynamic, growing resource for decisions, research and learning today.  

Much of the debate around climate change is taking place on the Web and is, therefore, highly ephemeral, meaning it is important to capture it now, in real time. The UK Web Archive Climate Change collection does just that: it captures climate-related public UK websites and archives them regularly, according to how frequently each website is updated.

What is the UK Web Archive?
The UK Web Archive (UKWA) is a collaboration of the six UK legal deposit libraries working together to preserve websites for future generations. The Climate Change collection is one of over a hundred curated collections of the UK Web Archive. Given the multi-, inter- and transdisciplinary nature of the climate crisis, researchers may also find several other UKWA collections relevant for studying climate change, for example, News Sites, Science Collection, British Countryside, Energy, Local History Societies, District Councils, Political Action and Communication, and Brexit, among others.

While all the UK legal deposit libraries contribute subject expertise to the Climate Change collection’s development, to make it more representative we solicit nominations as widely as possible. To this end we have developed a simple form, which allows anyone to nominate public websites or web pages published in the UK. If you would like to nominate a website for the UK Web Archive Climate Change collection, add the title, URL and a brief description of the website or webpage.

UKWA Climate change nomination-form

If you would like us to acknowledge your nomination, enter your name and email address.

What can UKWA archive?
Before you nominate, you might want to check your nomination for scope and duplication. The UK Web Archive cannot archive sound and video platforms in which the audio and video content dominate. Nor can it archive websites that require personal log-in details (for example Facebook sites), private intranets, emails, personal data on social networking sites, or websites accessible only to restricted groups.

What happens to my nomination?
All nominations are checked manually by a curator. If the website meets the requirements of non-print legal deposit, it is added to the collection by library staff without any prejudice regarding content. We want to make the climate change collection representative of diverse perspectives. The annotation process includes assigning broad subject labels, crawl frequency (the frequency of archiving), and a licensing request for making historical pages public. While all UKWA Climate Change collection titles are listed online, archived versions of the websites can be accessed only in legal deposit libraries’ reading rooms unless licensed.

Why is this collection important?
The UKWA Climate Change collection serves several functions, three being particularly important: 

  1. Supports research - Supports research related to climate change issues
  2. Raises awareness & curiosity - Makes readers aware of and curious about the diversity of climate change impacts, mitigation and adaptation activities across scales
  3. Engages in action - Inspires readers to take action including nominating websites for future preservation and by doing so contributing to the knowledge base of climate change

By inviting nominations, the UKWA Climate Change collection draws on a citizen science approach; in other words, it engages members of the public in academic research and in developing the collection. The integration of library science and citizen science acknowledges the complementary values of diverse forms of knowledge, including diverse forms of local knowledge. With their nominations, contributors can diversify existing sub-collections and initiate the creation of new sub-collections. For example, a new sub-collection dedicated to the climate change & sustainability strategies of UK galleries, libraries, archives and museums (GLAM sector) has recently been suggested.

History of the Collection
The collection was established when The Paris Agreement was negotiated at the UNFCCC COP21, in 2015. The acceleration of the climate crises, the exponential growth of digital climate content publishing and the demand for innovations that can be inspired by a diversity of knowledge, local, practical, technical and academic, called for an upgrade. The Climate Change collection is an important source of knowledge in preparation for the UNFCCC COP26 conference in Glasgow.

Websites and webpages archived over time tell the stories of how individuals and organisations have been making sense of and responding to the climate crises. We encourage you to nominate the public websites that tell the stories of your engagement with the changing climate and websites you want to preserve for future generations.

09 September 2021

UK Communities online - Enthusiasts and Hobbyists

By Jason Webber, Web Archive Engagement Manager, The British Library

Over the next few months we will be looking at some of the communities in the UK and how they have used the web. Look out for #UKCommunitiesOnline on Twitter.

Whatever your hobby or interest, however obscure or niche, the web has allowed people to share their passions and meet (virtually or in real life) likeminded folk. Let's look at just a few items from the fantastic 'Online Enthusiast' collection.

The English Tiddlywinks Association

Blitz, gromp and squidger - just a few of the fabulous terms used in the very serious (but also very fun) game of Tiddlywinks. 

English tiddlywinks assoc

Archived website from 2014.

British Trams Online

There may not be as many tram services in the UK as there once were, but here is THE definitive guide to trams currently in service across the UK.

British trams online archived website

Archived website from 2015.

Morris Dancing - 'Open Morris'

The long-held tradition of 'Morris Dancing': a form of this folk dancing has been around since the late Middle Ages and is still a popular pastime.

Open morris website in the UK web Archive

Archived page from 2014.

Synth DIY Wiki

Music is a huge part of many people's lives but not many make their own instruments. From the website: "This is a budding wiki for learning and sharing knowledge about making, modifying, or repairing electronic musical instruments and related equipment yourself."

Synth diy website

Archived page from 2014.

Telegraph Pole Society

"These much ignored pieces of rural and urban furniture finally have a website of their own."

Telegraph pole society in the UK Web Archive

Archived page from 2020.

Call out!

Is your hobby, interest or pastime represented online? Have you made a website about your spare time passion? You can nominate ANY UK website here: www.webarchive.org.uk/nominate

01 September 2021

Web Archive Summer of Sport Roundup

By Jason Webber, Web Archive Engagement Manager, British Library

If you like sport then this summer will have been a fantastic time for you. There has been the Football European Cup, Olympics and Paralympics, Tour de France, Wimbledon and many others.

Over the last few months, we at the UK Web Archive have attempted to show off some of the sport that we collect and have archived for future generations. Let's look back at the wonderful #WebArchiveSummerOfSport

Alternative Sports in the UK Web Archive - Part 1

Cheese Rolling, Bog Snorkelling, Conkers and The Chap Olympiad are just some of the more unusual 'sports' we have collected.

Cheese rolling champs website

Football Associations in the UK Web Archive

The organisations that run football and are in charge of the respective home nation teams. Scotland, Wales and England all competed in this year's European Cup.

Welsh FA website

Scottish Sport in the UK Web Archive

We teamed up with colleagues from the National Library of Scotland to look at particularly Scottish sports that have made their way into the archive, such as Shinty, Highland Games and Curling.

Highlandgametraditions

Alternative Sports in the UK Web Archive - Part 2

Grass roots sport such as open water swimming, hiking, kayaking and stand-up paddle boarding have made it into the web archive too.

Peter pan cup londonist website

London’s Olympic Legacy: Local, National and International Aspirations

Legacy is a much-talked-about aspect of hosting an Olympic Games. Here, PhD researcher Caio Mello looks back at the London 2012 Games.

Aquatics centre

London 2012 Paralympics in the UK Web Archive

The London 2012 Olympic and Paralympic games were an incredible time in the national consciousness. The Paralympic games in particular felt special and here we highlight just some of the websites that were collected.

GB Wheelchair rugby website

We hope you enjoyed this series on sport. Do let us know through Twitter if there are future topics you would like us to cover.

26 August 2021

Important information for our email subscribers

Unfortunately, the third-party platform that the British Library uses for email notifications for our blogs is making changes to its infrastructure. This means that, from August 2021, we anticipate that email notifications will no longer be sent to subscribers (although the provider has been unable to specify when exactly these will cease).

To find out when new blog posts are published, we recommend following us on Twitter - @UKWebArchive or checking the British Library blogs page where they are all listed.

We want to assure you that we are actively looking into this issue and working to implement a solution which will continue your email notifications. However, we do not know whether you will continue to receive notifications about new posts before we are able to implement it. We promise to update the blog with further information as soon as we have it. Thank you for your patience and understanding while we resolve this matter.

We appreciate this is inconvenient and know many people are not on social media and have no intention of being so. Many rely on email notifications and may miss out without them. As soon as we have been able to implement a new solution we will post about it here. Thanks for bearing with us.

24 August 2021

London 2012 Paralympics in the UK Web Archive

By Jason Webber, Web Archive Engagement Manager, British Library

"Our greatest of days witnessed through disbelieving eyes

AND that was that, a summer like no other now consigned to the pages of history. A cherished memory of endless days and golden glory witnessed through often disbelieving eyes."

Paralympics-01

More than the Games website - 2012

The London 2012 Paralympic Games was, then as now, seen as a great success and a new milestone for how disability is viewed by the wider public. Channel Four made the iconic advert 'Meet the Superhumans' that arguably created a distinct and positive tone leading into the games. They also broadcast an unprecedented (at that time) 150 hours of Paralympic sport.

The term 'superhuman' is seen by some as controversial and problematic, and it is notable that the slogan for the 2016 games was changed to 'Yes I can'. Language around disability can often be complicated, as words and phrases used historically are now considered offensive. Additionally, words or phrases intended to convey a positive message can sometimes be misguided and have a negative impact, particularly if thought of by someone without the lived experience of a disability.

Paralympics-02

Official website of the Paralympic Movement - 2012

What is undeniable, however, is the success of the sporting event itself. 'Team GB' won 34 Gold medals and 120 in total. Athletes Sarah Storey and David Weir each won 4 Gold, in Cycling and Athletics respectively. Great Britain ended third in the medal table, a fantastic achievement.

The UK Web Archive extensively collected websites for the London 2012 Olympic and Paralympic games collection that now represent a superb resource of this key time in modern history. The collection holds nearly 500 target websites, the vast majority of which can be viewed anywhere online.

Paralympics-03

Great Britain Wheelchair Rugby - 2012

Paralympics-04

British Disabled Fencing Association - 2012

Do you know of UK Paralympic athletes and sports for the Tokyo 2020 games? Nominate here.

18 August 2021

If Websites Could Talk - Part 4

By Hedley Sutton, Asian & African Studies Reference Services Team Leader

Check out previous episodes in this series - Part 1, Part 2 and Part 3.

Raspberry pi website

Once again we are privileged to be able to eavesdrop on a diverse group of UK domain websites, as they attempt to identify the most extraordinary site of all.

“Shall we start?” said the Happy Museum Project. “Surely no-one will object to our being a candidate?”

“Indeed. No-one will object,” said Fat Llama. “Then again, nobody is going to be wildly enthusiastic.”

“Couldn’t agree more,” said Crazy Coffins. “We want to be more ambitious, and choose a site with some wit and humour.”

“Like us!” cried Raspberry Pi.

“Or us!” added the Use Less Group. “Geddit?”

“Of course we get it,” sighed Intelligent Lifts & Escalators. “We’re not stupid. However, we really need a site with an aura of mystery … something that draws the outsider in …”

“Then you can only mean us,” suggested the Edible Bus Stop. “We are surely a stronger candidate than Rubber Cheese, or VocalEyes.”

“Don’t be ridiculous,” said the Eton Fives Association with a sneer. “You might as well nominate a site like Mutts With Friends.”

“Oooh, hark at you,” said Dog Daddies, in canine solidarity. “As a site championing another of our four-legged friends, are you with us too, HopeThruHorses?”

“Good grief!” exclaimed the Good Grief Trust, somewhat predictably. “If the discussion continues in this vein, we’re probably going to need to bring in Spread A Smile.”

“Or possibly even the Wellbeing Supervisor,” mused Magic Bus UK.

A brief silence descended on the gathering, soon broken by the UK Corrupt Police. “Hello hello hello, what about nominating us?”  

This seemed to concentrate minds. It was decided that the wisest course of action was to let Verifiable Credentials make the final choice. This meant that the site eventually put forward was … the Fab Foundation.

If you know of a UK website that should be included in the archive, nominate it here.

08 July 2021

London’s Olympic Legacy: Local, National and International Aspirations

By Caio Mello, Doctoral Researcher at the School of Advanced Study, University of London

For two years, I have been studying the media coverage of London and Rio’s Olympic legacies. See the previous posts, where I explained the project’s main objective of understanding and conceptualizing the meaning of the word legacy based on the news coverage of the Games. I have also written about how controversial the word 'legacy' can be, since it is a term under dispute by several actors in the political arena. In the most recent post, I introduced the use of SHINE as a platform for exploratory analysis of news events and I briefly described how it was a useful tool for my research project.

Olympic Aquatic centre, London

The Approach
In this post, I aim to discuss the different approaches taken by news organisations, government websites and activist blogs to the legacy of the London Olympics. Although my initial interest was mainly focused on understanding the journalistic framing of legacy, looking at other sources has proved to be beneficial in a comparative perspective. For this purpose, I searched for articles on ‘Olympic legacy London’ via SHINE and selected, among the 10 domains provided by the platform, three news websites (bbc.co.uk, guardian.co.uk and independent.co.uk), one official government website (uksport.gov.uk) and one activist blog (gamesmonitor.org.uk).

The Research
Texts were collected, processed, cleaned and filtered using Python scripts and combined with articles extracted from the live web. The data was ranked and the top 50 bigrams (co-occurrence of two words) mentioned in the texts were transferred to a spreadsheet using the Natural Language Toolkit (NLTK) - a suite of Python libraries for linguistic analysis. The list of trends was then used in a first distant reading to give a sense of the most discussed topics and then combined later on with a more qualitative approach of close reading for a deep understanding of context.

Bigram
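For readers unfamiliar with bigram counting, the step described above can be sketched roughly as follows. The original analysis used NLTK; to keep this example self-contained it uses only the Python standard library, and the two sample sentences are invented for illustration rather than taken from the archived articles. The tokenizer is deliberately crude, and the stop-word filtering and cleaning mentioned in the post are omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_bigrams(documents, n=50):
    """Count adjacent word pairs across all documents and return the n most frequent."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Invented sample texts, for illustration only.
sample_documents = [
    "Young people were promised more school sport after the Games.",
    "School sport funding for young people fell after the Games.",
]

for bigram, count in top_bigrams(sample_documents, n=5):
    print(" ".join(bigram), count)
```

On a real corpus, pairs made up of common function words (such as 'after the') would dominate the ranking, which is why a filtering step before counting matters.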

Findings
These bigrams have revealed a significant difference in the way the Olympic legacy of London was approached by different sources from 2004 to 2020. Among the most cited bigrams by news publishers are ‘young people’ and ‘school sport’, both referring to the promises included in the legacy plan of London published in 2008 by the Department for Culture Media and Sport (DCMS). Promises number 1 and 3, entitled ‘making the UK a world-leading sporting nation’ and ‘inspiring a generation of young people’, included the engagement of young people in physical activities by increasing the offer of high-quality sports. The drop in the number of 16 to 25-year-olds playing sport after the games was one of the main topics highlighted by the media.

While both ‘young people’ and ‘school sport’ are a response to the legacy plan published by the DCMS, the most mentioned bigram in the list of texts analysed did not receive much attention in the document: ‘west ham’.

The destiny of the Olympic Stadium became one of the most controversial issues around the Olympic legacy of London. Initially, the disagreement on whether it should remain as an athletics venue or be handed over to West Ham United drew the attention of the media, with important voices like the Olympics Minister Tessa Jowell and ex-London mayor Ken Livingstone supporting the opposition against the football club. The dispute between West Ham and Tottenham for the Olympic Stadium and the threat of its becoming a ‘white elephant’ - a recurrent fear in recent Olympic history - shed light on the place as a symbol of London’s Olympic legacy.

The media coverage of London’s legacy contrasts with the much more abstract and broader bigram found in the texts published by the British government: ‘international inspiration’. Articles published by uksport.gov.uk turned out to focus mainly on The International Inspiration programme, a project to promote sports in ‘some of the most disadvantaged communities in the world’. While the media seemed to be looking at internal issues, the government was targeting international audiences. The choice of the word ‘inspiration’ references a much more immaterial and abstract idea of legacy that contrasts with the very concrete discussion around the Olympic Stadium hosted by the media.

Looking at the bigrams obtained from activist blogs, the concerns are shown to have been more local, targeting primarily challenges faced by citizens of East London. Among the main bigrams are ‘Stratford City’, ‘new jobs’ and ‘public housing’. The community-focused approach highlights a significant discrepancy between the different framings of the event. These are preliminary steps to understand the multiple ways in which London’s legacy has been understood and narrated. The different perspectives indicate a distance between immediate public interest and official government communication regarding the most important sporting event in the world.

The Summer Olympic Games are hosted every four years by a different global city, bringing together the promise of being a catalyst for urban development and the frustrations of past events. Understanding the communication processes around the Olympics is fundamental for the future planning of effective legacies that correspond to the interests of the nations’ citizens.

*This post summarizes the preliminary results presented in my talk at ‘Documenting the Olympics and the Paralympics’, an event organised and hosted by the British Library in collaboration with the British Society of Sports History (BSSH), the International Centre for Sports History and Culture at De Montfort University (ICSHC) and the School of Advanced Study (SAS).

**This research is part of the CLEOPATRA Innovative Training Network, funded by the European Union’s Horizon 2020 research and innovation programme. It has been conducted under a PhD developed at the School of Advanced Study, University of London. For more information: cleopatra-project.eu.
