UK Web Archive blog

57 posts categorized "Collections"

05 July 2022

What to expect on the UK Web Archive blog during UEFA Women’s Euro England 2022

By Helena Byrne, Curator of Web Archives, British Library

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We are collecting websites about the UEFA Women’s Euro 2022 from around the UK.

You can view the UEFA Women’s Euro England 2022 collection here: https://www.webarchive.org.uk/en/ukwa/collection/4278

a blue banner image with the British Library, Inspired by England 2022, the National Football Museum and the UK Web Archive. A female football player kicking a ball and the text, Can you help us preserve football history? We are collecting websites about the UEFA Women’s EURO 2022. Nominate a website for us to archive QR code and link to the nomination form: https://www.webarchive.org.uk/en/ukwa/info/nominate

Over the next few weeks there will be a number of guest blog posts from the UK Web Archive and collaborators from around the UK. 

First up, we will have a blog post from the National Library of Scotland and the National Library of Wales. Neither Scotland nor Wales qualified for this edition of the tournament, but as part of the UK Web Archive, both national libraries will be contributing to the collection and ensuring that any fan events taking place are preserved. 

From 18 July, several blog posts will be published each week for the rest of the month. These will include a guest blog post from the Public Record Office of Northern Ireland (PRONI), which will be contributing a range of content from Northern Ireland. The Northern Ireland team made history by qualifying for their first UEFA Women’s Euro tournament. 

There will be a series of blog posts from the tournament’s Arts and Heritage partners in the host cities. Three projects were specially commissioned to celebrate the rich history of women’s football and its players, and to encourage more people to be inspired by the tournament. These posts will include updates from across the UEFA Women’s Euro England 2022 host cities, summarising each city's local cultural programme and giving an overview of the websites the partners nominated to the collection as important for telling the story of the tournament in their area.

The final blog post in the series will be published in late September. It will reflect on the collection activities and highlight some personal favourites chosen by the curator of the web archive collection, Helena Byrne. 

Get involved 
Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form: https://www.webarchive.org.uk/en/ukwa/info/nominate 

29 June 2022

What content should I nominate on the UEFA Women’s Euro to the UK Web Archive?

By Helena Byrne, Curator of Web Archives, British Library

a blue banner image with the UK Web Archive, British Library, Inspired by England 2022 and the National Football Museum. A female football player kicking a ball and the text, Can you help us preserve football history? We are collecting websites about the UEFA Women’s EURO 2022. Nominate a website for us to archive:

The UEFA Women's Euro 2022 competition is taking place across England from July 6 to July 31, 2022. We are collecting websites about the 2022 UEFA Women’s EURO from around the UK. You can view the collection here:  

https://www.webarchive.org.uk/en/ukwa/collection/4278 

This blog post runs through some examples of the type of content you might like to nominate to the collection. 

We archive websites:

  1. That are on a .uk or other UK geographic top-level domain such as .scot or .cymru.
  2. That are published in the UK.

We do not archive:

  1. Online sound or video platforms, in which audio-visual material is the predominant content.
  2. Private intranets and emails.
  3. Personal data in social networking sites or websites only available to restricted groups.

We archive as much openly available online content as we can identify as being published in the UK. Archiving is carried out through a mix of automated processes, such as an annual domain crawl, and manual selection by the UK Web Archive teams, as well as through the public nomination form.

UEFA Women’s Euro England 2022
For the UEFA Women’s Euro England 2022 we want content that specifically refers to the tournament. Some websites might only have a subsection or even just one page dedicated to the tournament so you can nominate that specific URL. 

We add the following type of web content to the collection:

  1. Full website
  2. Subsection of a website
  3. Individual page from a website
  4. Event page
  5. Twitter accounts

Unfortunately, due to technical challenges, Twitter is the only social media platform whose content we can successfully archive. If you know of any high-profile Twitter accounts (not the personal accounts of ordinary people), please nominate them. 

Examples of some website content we have added so far include:

Full website
Have you seen any new websites set up just for the UEFA Women’s Euro 2022 tournament? Most websites will, at most, have a dedicated subsection or page for the tournament. Some websites, such as that of the official sponsor Visa, highlight the tournament on their home page in the run-up to and during the tournament. This is why we have added the whole website to the collection, as it is easy for the user to navigate from the home page of the archived website during the tournament to the dedicated section for the tournament. 

Subsection of a website
The FA website has a subsection dedicated to UEFA Women’s Euro 2022. The earliest captures of this subsection are from July 2020 which you can view here:

https://www.webarchive.org.uk/wayback/archive/20200726095218/http://www.thefa.com/competitions/uefa-womens-euro-2022 

A screenshot of the UEFA Women’s Euro 2022 subsection of the FA website from July 26, 2020. The text reads: "Women’s Euro set for 2022. The UEFA Women’s Euro 2021 in England is postponed until the summer of 2022."

Link to archived website: https://www.webarchive.org.uk/wayback/archive/20200726095218/http://www.thefa.com/competitions/uefa-womens-euro-2022 
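Archived links such as the one above carry the capture date in their path: the 14 digits after /wayback/archive/ read as year, month, day, hour, minute and second, so 20200726095218 is 26 July 2020 at 09:52:18. The short Python sketch below splits such a link into its capture time and the original address. It assumes every link follows the same pattern as the ones in this post, and parse_archive_url is a hypothetical helper, not part of any UK Web Archive tool.

```python
from datetime import datetime


def parse_archive_url(url: str):
    """Split a UKWA-style archived link into (capture datetime, original URL).

    Hypothetical helper. Assumes the pattern seen in this post:
    https://www.webarchive.org.uk/wayback/archive/<14-digit timestamp>/<original URL>
    """
    marker = "/wayback/archive/"
    _, _, rest = url.partition(marker)
    if not rest:
        raise ValueError("not a /wayback/archive/ link")
    # Everything before the first "/" is the timestamp; the rest is the original URL.
    timestamp, _, original = rest.partition("/")
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return captured, original


captured, original = parse_archive_url(
    "https://www.webarchive.org.uk/wayback/archive/20200726095218/"
    "http://www.thefa.com/competitions/uefa-womens-euro-2022"
)
print(captured.isoformat())  # 2020-07-26T09:52:18
print(original)  # http://www.thefa.com/competitions/uefa-womens-euro-2022
```

Splitting on the first "/" after the timestamp keeps the original address intact, including its own "http://" prefix.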

Individual page from a website
In some cases there is just one page on a website relevant to the collection subject. When thinking about women’s football, the Royal Philharmonic Orchestra (RPO) doesn’t always come top of the list of potential websites. However, they have partnered with the FA to ‘engage fans in a range of musical opportunities and public events celebrating the history, ethos and future of women’s football’. What other websites have you seen that have posted an article about the UEFA Women’s Euro 2022 tournament? 

You can listen back to the archived versions of the anthems on the RPO website here: https://www.webarchive.org.uk/wayback/archive/20220621111257/https://www.rpo.co.uk/rpo-resound/womens-euro-anthem 

Event pages:
There are lots of events taking place around the UEFA Women’s Euro 2022. These range from official events and fan-led events to venues organising their own talks, book launches or watch parties for the matches. Eventbrite is one of the most popular platforms for ticketing these events, but have you seen any other platforms or websites?

A search on Eventbrite for Euro 2022 in the United Kingdom on the day of writing returned 500 pages of results.

Twitter accounts:
Archived copies of Twitter accounts are only accessible through a reading room, but you can view what we have selected here: https://www.webarchive.org.uk/en/ukwa/collection/4284

We have already added the Twitter accounts of the players for England, Northern Ireland and other players based in the UK. However, we may have missed some, so please let us know through the nomination form.

Get involved 
Anyone can suggest UK published websites to be included in the UK Web Archive by filling in our nomination form.

30 May 2022

What UKWA did at the IIPC Web Archive Conference 2022

By Jason Webber, Engagement Manager, The British Library

Between 18 and 25 May 2022, the biggest annual event in the world of web archiving took place: the IIPC General Assembly and Web Archive Conference. Some of the sessions were for members only, but many were free and open to anyone.

IIPC conference banner

Here are the UKWA staff and research partners who gave presentations at the conference with links to their pre-recorded talks that have been uploaded to our YouTube channel.

 

 

23 May 2022

Building Event Collections from Web Archives

By Sara Abdollahi, PhD student, L3S Research Center

The world frequently experiences events such as terrorist attacks, Brexit and the migrant crisis, which have resulted in a vast amount of event-centric information on the web. Researchers, particularly digital humanities researchers and social scientists who analyse the significant events that influence and shape our societies, can benefit from web archives that reflect how events were perceived at the time they happened.

The Research challenge
Web archiving services provide a preserved state of the web that facilitates its study in the future. The ever-growing size of web archives is one of the main challenges in accessing information for specific research: it is often difficult or even impossible for researchers to find the documents they need. Typically, web archives offer keyword-search interfaces through which users can access information. Researchers can type the name of the event they are interested in and retrieve a list of web documents containing that keyword. The results are often overwhelming because of their quantity, redundancy and irrelevance, and require an intensive additional cleaning phase to isolate the relevant documents.

The UK Web Archive (UKWA), as well as some other web archives, offers manually collected event-centric collections to address this issue, but these can be very time-consuming to create. More importantly, they might not cover all the information relevant to a specific event.

A Potential Solution
To address this challenge, I propose automatically building event collections from web archives using knowledge graphs. Knowledge graphs such as Wikidata and DBpedia are collections of interlinked real-world entities and concepts. 

In this research, I utilise the EventKG knowledge graph which provides structured information about events, their characteristics, and relationships (e.g., sub-events) and can thus be used as a resource for extending and diversifying the search space when building event collections.

Take the Arab Spring as an example: the Tunisian Revolution, the Bahraini protests of 2011 and the 2011 Yemeni revolution are three of its sub-events. The figure below demonstrates an example of using EventKG to create an event collection for the Arab Spring. 

Building Event collections diagram

By using sub-events to expand the initial user query, a more diverse initial set of documents can be retrieved. This leads to increased precision and coverage in the final event collection. Traditional methods might miss documents related to sub-events if those documents do not mention the main event. To advance such methods, I demonstrate the impact of event-centric features and relations from a knowledge graph on building event collections.
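To make the query-expansion idea concrete, here is a minimal Python sketch. It assumes the sub-events have already been looked up and hard-codes the three Arab Spring sub-events named above in a hypothetical SUB_EVENTS table; it does not call EventKG, and the naive substring matching and invented toy documents stand in for a real web archive search. It illustrates the principle only and is not the method used in this research.

```python
# Hypothetical stand-in for a knowledge-graph lookup (event -> sub-events).
# The entries are copied from the Arab Spring example in this post; this is not EventKG's API.
SUB_EVENTS = {
    "Arab Spring": [
        "Tunisian Revolution",
        "Bahraini protests of 2011",
        "2011 Yemeni revolution",
    ],
}


def expand_query(event: str, max_depth: int = 1) -> list[str]:
    """Return the event name plus its sub-events, breadth-first up to max_depth levels."""
    queries = [event]
    frontier = [event]
    for _ in range(max_depth):
        next_frontier = []
        for name in frontier:
            for sub in SUB_EVENTS.get(name, []):
                if sub not in queries:
                    queries.append(sub)
                    next_frontier.append(sub)
        frontier = next_frontier
    return queries


def retrieve(documents: dict[str, str], queries: list[str]) -> set[str]:
    """Naive keyword retrieval: keep a document if it mentions any query string."""
    hits = set()
    for doc_id, text in documents.items():
        lowered = text.lower()
        if any(q.lower() in lowered for q in queries):
            hits.add(doc_id)
    return hits


# Toy documents, invented for illustration only.
docs = {
    "d1": "Crowds gathered in Tunis as the Tunisian Revolution gained momentum.",
    "d2": "Analysts reflect on the Arab Spring a decade on.",
    "d3": "Local football results from the weekend.",
}

print(sorted(retrieve(docs, ["Arab Spring"])))  # ['d2']
print(sorted(retrieve(docs, expand_query("Arab Spring"))))  # ['d1', 'd2']
```

Document d1 only mentions the Tunisian Revolution, so a search for "Arab Spring" alone misses it, while the expanded query finds it. This is the gap in traditional methods described above.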

Sara is giving a presentation of this project at IIPC Web Archive Conference 2022 (session 15) - Register for free.

12 November 2021

Welsh language websites within the UK Web Archive

By Aled Betts, Acquisitions Librarian and Web Archivist, National Library of Wales

The National Library of Wales has been collecting Welsh language websites for the UK Web Archive since 2004. In 2018, we decided to collate these websites into a dedicated Collection to make them more accessible to researchers.

2018 was an important milestone for the Welsh language: it was 25 years since the passing of the Welsh Language Act 1993, which gives effect to the principle that in the conduct of public business in Wales, the English and Welsh languages should be treated ‘on the basis of equality’. It was also seven years since the passing of the Welsh Language (Wales) Measure 2011, which gave the Welsh language official status in Wales. For government and public bodies, this brought in the principle that the Welsh language should not be treated less favourably than English. As a result, the Welsh language is clearly visible and widespread on the web, as many websites are now bilingual by law.

However, the aim of the Welsh Language Collection was not simply to list websites published through the medium of Welsh. The focus was more on websites and organisations whose aim is to promote and facilitate the use of the Welsh language in all walks of life. The Collection also covers websites relating to Welsh language communities, online and physical, where Welsh is the medium of communication. It also covers bodies that promote Welsh, umbrella organisations, and groups that campaign and lobby for the language. Furthermore, because we have been collecting Welsh language websites since 2004, we were able to showcase many of these websites and show how much they have changed over the last 17 years!

Here is just a small sample of the type of websites covered in the Welsh Language Collection.

Advocacy, campaigning and lobbying
Much of the work promoting the Welsh language across Wales is done by the Mentrau Iaith (English: Language Initiatives). These are community-based organisations that work to raise the profile of the Welsh language in a specific area. The percentage of Welsh speakers varies considerably from area to area: the highest is in Gwynedd (64%) and the lowest in Blaenau Gwent (8%), so the challenges in each area differ. To capture this important work, we also archived their Twitter feeds, which show how these initiatives promote the Welsh language in their respective areas. Furthermore, the Menter Iaith (English: Language Initiative) umbrella body website is one of the earliest sites we captured; we first archived it in 2006.

Mentrau Iaith (English: Language Initiatives) website in 2021

Mentrau Iaith website

Mentrau Iaith (English: Language Initiatives) website in 2006 

Over the last two decades, we have seen bodies and organisations evolve, grow and, in some cases, disappear. One statutory body set up under the Welsh Language Act 1993 was Bwrdd yr Iaith Gymraeg (English: Welsh Language Board), which was responsible for administering the Act and for ensuring that public bodies in Wales kept to its terms. Following the passing of the Welsh Language (Wales) Measure 2011, the Welsh Language Board was abolished in 2012 and its powers were transferred to the Welsh Government and the Welsh Language Commissioner, a new body promoting and facilitating the use of the Welsh language. Fortunately, we have captured this transfer of power, as we have been archiving the Welsh Language Board website since 2008 and the Welsh Language Commissioner website since 2012. In both cases, open access has been granted.

Bwrdd yr Iaith Gymraeg (English: Welsh Language Board) website in 2008

Comisiynydd y Gymraeg (English: Welsh Language Commissioner) website in 2021

Arts and Culture
The Welsh language has a lively and vibrant arts, music and literature scene. This is nowhere better exemplified than by the Eisteddfod Genedlaethol (English: National Eisteddfod) and by Urdd Gobaith Cymru, the Welsh language national voluntary youth organisation, which runs the Urdd Eisteddfod, arguably Europe's largest youth festival. Both sites have been archived since the early 2000s. The National Eisteddfod is held in a different location each year, alternating between north and south Wales, so its content naturally changes every year. The first National Eisteddfod we archived was Eisteddfod Genedlaethol Cymru Casnewydd a’r Cylch (English: National Eisteddfod of Wales Newport and surrounding area) in 2004, and our first Urdd National Eisteddfod was Eisteddfod yr Urdd Sir Ddinbych (English: Urdd Eisteddfod Denbighshire) in 2006! Again, open access has been granted, so these sites are available to view anywhere.

The Eisteddfod Genedlaethol Cymru Casnewydd a’r Cylch (English: National Eisteddfod of Wales Newport and surrounding area) 2004

Urdd Eisteddfod Denbighshire 2006

Alongside these all-important bodies, we archive a plethora of arts and culture websites, from record labels to folk groups, theatrical bodies, local eisteddfodau and Welsh language festivals. The same goes for the buoyant Welsh literature and publishing scene, with close to a hundred websites listed in our ‘Literature and Publishing’ sub-section.

Education and Learning
An all-important sub-section is Education and Learning, where two types of website dominate. The first is education and learning through the medium of Welsh. Here we archive Welsh-medium education websites, ranging from Mudiad Meithrin (English: Nursery Movement), formed in 1971 to nurture early-years Welsh speakers, to Coleg Cymraeg Cenedlaethol (English: Welsh National College), formed in 2011 to develop Welsh-language courses and resources for Higher Education students.

Coleg Cymraeg Cenedlaethol (English: Welsh National College) website in 2011

Secondly, the web has seen a global explosion of language learning websites. This is also apparent for Welsh, allowing those wishing to learn it as a second language to do so online.

SaySomethinginWelsh website in 2011

As of 2021, the collection holds between 500 and 600 websites and is still growing. It is nonetheless a significant collection, as many of its websites have been collected since the early days of web archiving in 2004. The principle of equality has been an underlying theme in Welsh language discourse, and legislation was passed to meet this demand. The Collection explores how promoting and supporting the Welsh language has changed over the past 20 years, and also shows how legislation has helped shape this change.

19 October 2021

Clouds and blackberries: how web archives can help us to track the changing meaning of words

By Dr Barbara McGillivray (Turing Fellow), Pierpaolo Basile (Assistant Professor in Computer Science, University of Bari), Dr Marya Bazzi (Turing Fellow) and Dr Jenny Basford, Jason Webber (British Library)

NOTE: This is a re-blog from the Alan Turing Institute, with permission.

The meaning of words changes all the time. Think of the word ‘blackberry’, for example, which has been used for centuries to refer to a fruit. In 1999, a new brand of mobile devices was launched with the name BlackBerry. Suddenly, there was a new way of using this old word. ‘Cloud’ is another example of a well-established word whose association with ‘cloud computing’ only emerged in the past couple of decades. Linguists call this phenomenon ‘semantic change’ and have studied its complex mechanisms for a long time. What has changed in recent years is that we now have access to huge collections of data which can be mined to find these changes automatically. Web archives are a great example of such collections, because they contain a record of the changing content of web pages.

But how can we automatically detect in a huge web archive when a word has changed its meaning? A common strategy is to build geometric representations of words called word embeddings. Word embeddings use lots of data about the context in which words are used so that similar words can be clustered together. We can then do operations on these embeddings, for example to find the words that are closest (and most similar in meaning) to a given word. It’s a useful technique, but building embeddings takes a lot of computing power. Having access to pre-trained embeddings can therefore make a big difference, enabling those in the scientific community without sufficient computational resources to participate in this research.
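As a rough illustration of finding the words closest to a given word, the Python sketch below ranks words by cosine similarity, a common measure of closeness between embeddings. The three-dimensional vectors are invented for the example; real embeddings have many more dimensions and are learned from large amounts of text, and the helper names are hypothetical.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
EMBEDDINGS = {
    "blackberry": [0.9, 0.1, 0.0],
    "raspberry": [0.8, 0.2, 0.1],
    "fruit": [0.7, 0.1, 0.2],
    "phone": [0.1, 0.9, 0.1],
    "cloud": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to the vector for `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


print(nearest("blackberry", EMBEDDINGS))  # ['raspberry', 'fruit']
```

Cosine similarity compares the direction of vectors rather than their length, which is why it is a common choice for comparing word embeddings.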

A team of researchers from The Alan Turing Institute and the Universities of Bari, Oxford and Warwick, in collaboration with the UK Web Archive team based at the British Library, has now released DUKweb, a set of large-scale resources that make pre-trained word embeddings freely available. Described in this article, DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a collection of all .uk websites archived by the Internet Archive between 1996 and 2013. (This dataset is held and maintained by the UK Web Archive, which has been collecting websites since 2005, initially on a selective basis and since 2013 at a whole domain level.) DUKweb contains 1.3 billion word occurrences and two types of word embeddings for each year of the JISC UK Web Domain Dataset. The size of DUKweb is 330GB.

Researchers can use DUKweb to study semantic change in English between 1996 and 2013, looking at, for instance, the effects of the growth of the internet and social media on word meanings. For example, if the word ‘blackberry’ is used mostly to refer to fruits in 1996 and to mobile phones in 2000, the 1996 embedding for this word will be quite different from its 2000 embedding. In this way, we can find words that may have changed meaning in this time period. The figure below (from Tsakalidis et al., 2019) shows four words whose contexts of use have changed in the last couple of decades: ‘blackberry’, ‘cloud’, ‘eta’ and ‘follow’. The bars indicate words most similar to these four words in 2000 (red bars) and in 2013 (blue bars). The scale along the bottom gives a measure of the change.

figure 02 - analysis - clouds, blackberries
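Embeddings trained separately for each year live in different coordinate spaces, so they usually need to be aligned before their vectors can be compared directly. A simpler, alignment-free way to illustrate the idea is to compare a word's nearest neighbours in two years: the fewer neighbours the two years share, the more the word's usage has probably shifted. The Python sketch below does this with Jaccard overlap. The neighbour lists are invented for illustration and are not DUKweb data. This is a swapped-in technique, not the method used by Tsakalidis et al. or by the DUKweb authors.

```python
# Hypothetical top-5 nearest neighbours per word per year, invented for illustration.
# Real lists would be computed from per-year word embeddings.
NEIGHBOURS = {
    2000: {
        "blackberry": {"raspberry", "strawberry", "fruit", "jam", "bramble"},
        "cloud": {"rain", "sky", "fog", "mist", "storm"},
        "tree": {"oak", "leaf", "branch", "forest", "wood"},
    },
    2013: {
        "blackberry": {"iphone", "android", "phone", "smartphone", "fruit"},
        "cloud": {"server", "storage", "hosting", "sky", "data"},
        "tree": {"oak", "leaf", "branch", "forest", "root"},
    },
}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def change_score(word: str, year_a: int, year_b: int) -> float:
    """1 minus the Jaccard overlap of the word's neighbour sets in two years.

    0.0 means the same neighbours in both years; 1.0 means no neighbours shared.
    """
    return 1.0 - jaccard(NEIGHBOURS[year_a][word], NEIGHBOURS[year_b][word])


for w in ("blackberry", "cloud", "tree"):
    print(w, round(change_score(w, 2000, 2013), 3))
# blackberry 0.889
# cloud 0.889
# tree 0.333
```

In this toy data, 'blackberry' and 'cloud' share only one neighbour across the two years and score high, while 'tree' keeps most of its neighbours and scores low.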

The resources that underpin DUKweb are hosted on the British Library’s research repository, and are available for anyone in the world to download, reuse and repurpose for their own projects. This repository is part of the BL’s Shared Research Repository for cultural heritage organisations, which brings together the research outputs produced by participating institutions, and makes them discoverable to anybody with an internet connection. Providing a stable, dedicated location to hold heritage datasets in order to share them with a wider research community has been one of the key drivers in the implementation and development of this repository service. We are grateful to the British Library’s Repository Services team for supporting this collaboration between the UK Web Archive team and the Turing by making the content for DUKweb available.

Read the paper: DUKweb: diachronic word representations from the UK Web Archive corpus

 

04 October 2021

UK Web Archive Climate Change Collection

By Andrea Deri, Cataloguer, Lead Curator of UK Web Archive Climate Change Collection; Nicola Bingham, Lead Curator, Web Archives; Eilidh MacGlone, Web Archivist; Trevor Thomson, General Collections Assistant (Collection Development) National Library of Scotland


What public climate and sustainability related UK websites would you preserve for future research?

What public UK websites tell the story of climate change actions in your areas of living, travelling, working, study and passions?

Nominate these websites to the UK Web Archive Climate Change Collection. You can nominate as many websites or webpages as you feel are relevant.

Desert landscape. Photo by '_Marion'

About the Climate Change Collection
The UK Web Archive Climate Change Collection is not only an archive of past digital content preserved for future research. It is also a live, dynamic, growing resource for decisions, research and learning today.  

Much of the debate around climate change is taking place on the Web and is, therefore, highly ephemeral, meaning it is important to capture it now, in real time. The UK Web Archive Climate Change collection does just that: it captures climate-related public UK websites and archives them regularly according to how frequently each website is updated. 

What is the UK Web Archive?
The UK Web Archive (UKWA) is a collaboration of the six UK legal deposit libraries working together to preserve websites for future generations. The Climate Change collection is one of over a hundred curated collections of the UK Web Archive. Given the multi-, inter- and transdisciplinary nature of the climate crisis, researchers may also find several other UKWA collections relevant for studying climate change, for example, the News Sites, Science Collection, British Countryside, Energy, Local History Societies, District Councils, Political Action and Communication, and Brexit collections, among others.  

While all the UK legal deposit libraries contribute subject expertise to the Climate Change collection’s development, to make it more representative we solicit nominations as widely as possible. To this end we have developed a simple form, which allows anyone to nominate public websites or web pages published in the UK. If you would like to nominate a website for the UK Web Archive Climate Change collection add the title, URL and brief description of the website or webpage. 

UKWA Climate change nomination-form

If you would like us to acknowledge your nomination, enter your name and email address.

What can UKWA archive?
Before you nominate, you might want to check your nomination for scope and duplication. The UK Web Archive cannot archive sound and video platforms in which audio and video content dominate. Nor can it archive websites that require personal log-in details (for example, Facebook pages), private intranets, emails, personal data on social networking sites, or websites available only to restricted groups. 

What happens to my nomination?
All nominations are checked manually by a curator. If the website meets the requirements of non-print legal deposit, it is added to the collection by library staff without any prejudice regarding content. We want to make the climate change collection representative of diverse perspectives. The annotation process includes assigning broad subject labels, crawl frequency (the frequency of archiving), and a licencing request for making historical pages public. While all UKWA Climate Change collection titles are listed online, archived versions of the websites can be accessed only in legal deposit libraries’ reading rooms unless licenced.  

Why is this collection important?
The UKWA Climate Change collection serves several functions, three being particularly important: 

  1. Supports research - Supports research related to climate change issues
  2. Raises awareness & curiosity - Makes readers aware of and curious about the diversity of climate change impacts, mitigation and adaptation activities across scales
  3. Engages in action - Inspires readers to take action including nominating websites for future preservation and by doing so contributing to the knowledge base of climate change

By inviting nominations, the UKWA Climate Change collection draws on a citizen science approach; that is, it engages members of the public in academic research and in developing the collection. The integration of library science and citizen science acknowledges the complementary values of diverse forms of knowledge, including diverse forms of local knowledge. Through their nominations, contributors can diversify existing sub-collections and initiate the creation of new ones. For example, a new sub-collection dedicated to the climate change and sustainability strategies of UK galleries, libraries, archives and museums (GLAM sector) has recently been suggested.  

History of the Collection
The collection was established when The Paris Agreement was negotiated at the UNFCCC COP21, in 2015. The acceleration of the climate crises, the exponential growth of digital climate content publishing and the demand for innovations that can be inspired by a diversity of knowledge, local, practical, technical and academic, called for an upgrade. The Climate Change collection is an important source of knowledge both in preparation for the UNFCCC COP26 conference in Glasgow

Websites and webpages archived over time tell the stories of how individuals and organisations have been making sense of and responding to the climate crises. We encourage you to nominate the public websites that tell the stories of your engagement with the changing climate and websites you want to preserve for future generations. 

Further recommended sources 

09 September 2021

UK Communities online - Enthusiasts and Hobbyists

By Jason Webber, Web Archive Engagement Manager, The British Library

Over the next few months we will be looking at some of the communities in the UK and how they have used the web. Look out for #UKCommunitiesOnline on Twitter.

Whatever your hobby or interest, however obscure or niche, the web has allowed people to share their passions and meet (virtually or in real life) likeminded folk. Let's look at just a few items from the fantastic 'Online Enthusiast' collection.

The English Tiddlywinks Association

Blitz, gromp and squidger - just a few of the fabulous terms used in the very serious (but also very fun) game of Tiddlywinks. 

English tiddlywinks assoc

Archived website from 2014.

British Trams Online

There may not be as many tram services in the UK as there once were, but here is THE definitive guide to trams currently in service across the UK.

British trams online archived website

Archived website from 2015.

Morris Dancing - 'Open Morris'

The long-held tradition of 'Morris Dancing': a form of this folk dance has been around since the late Middle Ages and is still a popular pastime.

Open morris website in the UK web Archive

Archived page from 2014.

Synth DIY Wiki

Music is a huge part of many people's lives but not many make their own instruments. From the website: "This is a budding wiki for learning and sharing knowledge about making, modifying, or repairing electronic musical instruments and related equipment yourself."

Synth diy website

Archived page from 2014.

Telegraph Pole Society

"These much ignored pieces of rural and urban furniture finally have a website of their own."

Telegraph pole society in the UK Web Archive

Archived page from 2020.

Call out!

Is your hobby, interest or pastime represented online? Have you made a website about your spare time passion? You can nominate ANY UK website here: www.webarchive.org.uk/nominate
