THE BRITISH LIBRARY

UK Web Archive blog

14 posts categorized "Legal deposit"

25 September 2020

The World of Food and the UK Web Archive


 

By Helena Byrne, Curator of Web Archives at the British Library

 

Assorted sliced fruits in white ceramic bowl surrounded by more sliced fruits and some small muffins
A variety of food

 

Food is a subject that transcends culture, politics and leisure practices. It has therefore been a key part of the UK Web Archive (UKWA) since the archive was established in 2005.

 

Recipes, restaurant menus, food blogs and online reviews are just the start of the food-related online material that UKWA collects. Even protest and campaigning can be food related: this summer, for instance, footballer Marcus Rashford highlighted the issue of child poverty and the lack of access to food, especially during the school holidays.

 

For the last three years the British Library has been running a series of events around food. Due to the coronavirus pandemic, this year's Food Season moved online with a series of talks over the autumn period. 

 

The Food Season celebrates the British Library’s extensive food-related collections and explores the politics, pleasures and history of food. UKWA, which is a partnership of the six UK Legal Deposit Libraries, including the British Library, also has an extensive collection of food related websites. 

 

Food collections

In 2017, the Food Archive collection was established. This collection covers the following topics:

There are currently 333 websites or web pages in this collection. Some of the websites selected include Eat Like a Girl, the Good Grub Club and the Veggies Catering Campaign. Why not have a browse through the collection and nominate your favourite UK published food sites or restaurant websites to be included in the collection? Anyone can nominate a website by following this link: https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

Even though there is a dedicated collection about food, it also features as a subsection in a number of other collections. ‘Food and Drink’ is a subsection in both the Festivals and Online Enthusiast Communities in the UK collections. In addition, individual food websites appear in several other collections. Websites related to food activism appear in both the Political Action and Communication collection and the (soccer) fan subsection of the Sport: Football Collection, as numerous supporters’ clubs have organised to support their local food banks.

 

Social media is a very popular way to share food and micro-reviews of eateries; however, it is often challenging for us to archive. At present, Twitter is the only social media platform that we archive on a regular basis, but these captures are by no means comprehensive. We have experimented with other methods of archiving social media, but only on a selective basis.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

 

Some of the websites in UKWA have already had permission granted; these include the Cake Fest Edinburgh, the Lancashire Pork Pie Appreciation Society and the Food Research Collaboration. Examples of websites with onsite-only access include the Biscuit Appreciation Society, the UK Menu Archive and Fans Supporting Food Banks.

 

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

Due to the coronavirus pandemic, the reading rooms were closed for a number of weeks but are starting to reopen. This blog post gives an overview of opening hours and how to book a visit at the six UK Legal Deposit Libraries:

https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html 

 

We would especially like to see more food and drink nominations that reflect the multicultural nature of the UK and the many diaspora communities based here. Browse through what we have so far and please nominate more content here:

https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

17 September 2020

Arnhem75 - a special collection of websites added to the UK Web Archive


 

By Marja Kingma, Curator of Germanic Collections, the British Library.

 

Arnhem75 blog image
Book cover of 75 Years Battle of Arnhem by Laurens van Aggelen

 

Introduction

The idea to create a collection of websites about the commemoration of Arnhem75 came to RAF Museum historian Harry Raffal and me whilst attending the seminar ‘The Arnhem Spirit - 75 years of Brits in Arnhem’, on 15 May 2019, organised by the Dutch Embassy in London. The event was part of a programme in which the Netherlands, Britain and other former Allied countries commemorated Operation Market Garden, the code name for the battle for the bridge across the Rhine at Arnhem that took place in September 1944. Allied forces consisted of British, American and Polish troops, with help from Dutch resistance.

The Battle of Arnhem 1944 is of great significance to the UK and interest in it remains strong on both sides of the North Sea.

We wanted to create a lasting memory of these events and a special collection in the UK Web Archive on the subject seemed like a good idea.

 

What is included?

We kept the scope of the project quite narrow; only websites with a focus on the commemorations that took place in Britain and the Netherlands in 2019 are included, with the exception of some websites that deal with the historic facts regarding the Battle to give it some context.

So far over 150 individual websites within the UK web domain have been identified, of which 64 were selected to go into the collection. These sites are limited to the UK web domain, so they either have .uk in their domain name or, if they don’t, must be hosted in the UK or owned by UK organisations or individuals with a postal address in the UK.

Some of the websites selected for this collection include the 23 Parachute Field Ambulance, Airborne at the Bridge and Arnhem Oosterbeek War Cemetery.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but we are only able to make them available to people outside the UK Legal Deposit Libraries reading rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

For this collection you can view what has been selected through the UK Web Archive website but will need to visit a UK Legal Deposit Library reading room to view the archived content. The reading rooms across the Legal Deposit Libraries are starting to reopen now, with some restrictions, as you can read in this blog: https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html

 

How Can I Get Involved?

You can help expand this collection by sending us a URL you think may be eligible for inclusion in the Arnhem75 collection. Please go to https://www.webarchive.org.uk/en/ukwa/info/nominate to nominate a website and we’ll take it from there.

Occasionally, websites from non-UK domains can be included if they have a strong link to the UK and the website owners have given their permission to be included in the collection. Dutch organisations that were involved in the Arnhem75 commemorations are encouraged to get in touch.

We look forward to your suggestions!

 

10 September 2020

Launching the UK Web Archive 2020 Annual Domain Crawl


By Helena Byrne, Curator of Web Archives at the British Library

Today (10th September 2020) the UK Web Archive team will be pushing the big red button to kickstart the annual Domain Crawl of the UK webspace. The current coronavirus pandemic will no doubt feature strongly in this year’s crawl. This will complement the curated collection that the web archive teams across the UK Legal Deposit Libraries are contributing to. The British Library, along with the National Library of Scotland, is also selecting websites for the International Internet Preservation Consortium (IIPC) Content Development Group (CDG) Novel Coronavirus (COVID-19) collection.

What we collect

The UK Web Archive has been archiving UK published websites on a selective basis since 2005 and in 2020 is celebrating #15YearsOfUKWA. Domain Crawl 2020 is the seventh that has taken place. It wasn’t until after the implementation of the Non-Print Legal Deposit Regulations (NPLD) in April 2013 that we were able to run a broad crawl over the UK webspace. This includes anything with a .uk or other UK geographic Top Level Domain (TLD) such as .scot, .cymru or .london. It also includes websites on other TLDs that have been registered in the UK or that have been manually selected.
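
The scope rule described above can be sketched in a few lines of Python. This is only an illustration and not the crawler's actual logic: the TLD list contains just the examples named in this post, and the function names and the `registered_in_uk` and `manually_selected` flags are hypothetical stand-ins for checks that would be made elsewhere.

```python
from urllib.parse import urlparse

# Illustrative list only: the UK geographic TLDs named in this post.
UK_TLDS = (".uk", ".scot", ".cymru", ".london")


def in_uk_tld(url):
    """Return True if the URL's host ends with one of the listed UK TLDs."""
    host = urlparse(url).hostname or ""
    return host.lower().endswith(UK_TLDS)


def in_scope(url, registered_in_uk=False, manually_selected=False):
    """Hypothetical scope check: UK TLD, or UK-registered, or hand-picked.

    The two boolean flags are assumptions standing in for a registration
    lookup and a curator's decision respectively.
    """
    return in_uk_tld(url) or registered_in_uk or manually_selected


if __name__ == "__main__":
    print(in_scope("https://www.webarchive.org.uk/en/ukwa/info/nominate"))
    print(in_scope("https://example.com/"))
    print(in_scope("https://example.com/", registered_in_uk=True))
```

Matching on the end of the host name keeps the sketch simple; a real scope check would also need to handle the hosting and ownership criteria that cannot be read from a URL alone.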

NPLD came into effect on 6th April 2013 and the British Library hosted a special event to launch the first Domain Crawl. This was widely covered in the national press and you can still watch a short video from the event on The Guardian website.

How much data is collected in the Domain Crawl?

The Domain Crawl usually runs for three months and starts at a different time each year to avoid seasonal biases. Roughly 5-10 million hosts (websites) are archived every year. However, the amount of data collected each year varies. The way the data is collected and stored also changes over time. We compress the data we store, and as technology develops the amount of data that can be compressed into one terabyte changes. Last year 63.7 TB of compressed data was collected, bringing the total collected during Domain Crawls from 2013 to 2019 to 477.62 TB.

UKWA Domain Crawl 2013-2019

When can I view this content?

Due to the enormous amount of data that is collected each year from the annual Domain Crawl and our Frequent Crawls, there is a significant lag between when content is archived and when it is made available through the UK Web Archive website. The Frequent Crawl data collected from 2013-2019 was 250.34 TB, bringing the combined total to 727.96 TB of compressed data. To make searching content easier, the website allows you to search across all the Selectively Crawled content from 2005 to 2013 as well as the Frequent Crawl content from 2013 to 2017 and the Domain Crawl content from 2013 to 2015.
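
As a quick check of the figures quoted in this post, the combined total can be reproduced by adding the Domain Crawl and Frequent Crawl totals. The sketch below uses only the numbers given above; the variable names are just for illustration.

```python
from decimal import Decimal

# Compressed data collected 2013-2019, in terabytes, as quoted in this post.
domain_crawl_tb = Decimal("477.62")
frequent_crawl_tb = Decimal("250.34")

# Decimal avoids binary floating-point rounding when adding the two figures.
combined_tb = domain_crawl_tb + frequent_crawl_tb

# Share of the combined total that came from the annual Domain Crawls.
domain_share = domain_crawl_tb / combined_tb

if __name__ == "__main__":
    print(f"Combined total: {combined_tb} TB")
    print(f"Domain Crawl share: {domain_share:.1%}")
```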

Under the Non-Print Legal Deposit (NPLD) Regulations 2013, we can archive all UK published websites but we are only able to make them available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission.

Due to the NPLD Regulations, access to the archived content is a mix of open and onsite access. The ‘Viewable only on Library premises’ message on individual records indicates that you have to visit one of the six UK Legal Deposit Libraries.  The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

Follow the UK Web Archive on Twitter for the latest updates on the domain crawl and other web archiving activities! 



04 August 2020

Twit twoo: International Owl Awareness Day 2020


By Helena Byrne, Curator of Web Archives, The British Library
 
 
 
An illustration of four owls perched on a branch with the moonlight behind them
British Library digitised image from page 271 of "Madeline Power [A novel]" https://www.flickr.com/photos/britishlibrary/11121066504

 

The 4th of August is International Owl Awareness Day. This is the perfect time to reflect on owl related content in the UK Web Archive. 

There are five native species of owl resident year-round in the UK, namely the Tawny Owl, Barn Owl, Long-eared Owl, Short-eared Owl and Little Owl. Also, the Snowy Owl is an occasional winter visitor to the Outer Hebrides, Shetland and the Cairngorms in Scotland.

Owls online

We were wondering, out of these six owl species, which one is the most popular on the archived .uk domain?

 

UK Owl Species Shine Trends
A graph showing how many mentions the six owl species have on the archived .uk web

 

In order to answer this question, the Shine graph may prove useful. Shine was developed as part of the Big UK Data Arts and Humanities project funded by the AHRC. The data was acquired by JISC from the Internet Archive and includes all .uk websites in the Internet Archive web collection crawled between 1996 and April 2013. The collection comprises over 3.5 billion items (URLs, images and other documents) and has been full-text indexed by the UK Web Archive. Every word of every website in the collection can be searched for and analysed.

The most popular owl species referenced in the Shine dataset is the Barn Owl. Although the curve in the graph peaks in 2011, the year with the most Barn Owl mentions in absolute terms was 2012. This is because the graph shows the percentage of resources archived for each year that mention the term, and some years have more resources than others. In 2011, 66,034 of 288,809,412 archived resources mentioned Barn Owl, while in 2012 the figure was 94,990 of 463,367,189 resources. These numbers are too big to review manually, but if you click on a single point on the graph, Shine will generate a random sample of up to 100 references to the search term. The sample displays a sentence where the term appears, as well as a link out to the Internet Archive so that you can review the archived website.
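
The difference between the raw counts and the percentages plotted on the graph can be worked through with the figures quoted above. This is a small sketch using only those numbers; the dictionary layout and function name are illustrative and not part of Shine.

```python
# Barn Owl figures quoted in this post: (resources mentioning the term,
# total archived resources for that year).
barn_owl = {
    2011: (66_034, 288_809_412),
    2012: (94_990, 463_367_189),
}


def percentage_share(mentions, total):
    """Mentions as a percentage of all resources archived that year."""
    return 100 * mentions / total


shares = {year: percentage_share(m, t) for year, (m, t) in barn_owl.items()}

# Year with the highest raw count versus year with the highest share.
peak_by_count = max(barn_owl, key=lambda year: barn_owl[year][0])
peak_by_share = max(shares, key=shares.get)

if __name__ == "__main__":
    for year, share in shares.items():
        print(f"{year}: {barn_owl[year][0]:,} mentions, {share:.4f}% of resources")
    print("Most mentions:", peak_by_count)
    print("Highest share:", peak_by_share)
```

The share works out at roughly 0.023% for 2011 and roughly 0.020% for 2012, which is why the curve peaks in 2011 even though 2012 has more mentions.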

 

Get creative with owls at the British Library

Video created by Carlos Lelkes-Rarugal, using Tawny Owl hoots recorded by Richard Margoschis in Gloucestershire, England (BL ref 09647). British Library digitised image from page 272 of "The Works of Alfred Tennyson, etc" 

 

Curious about what some of these owls sound like? Our Assistant Web Archivist, Carlos Lelkes-Rarugal, designed some short animated videos using recordings from the British Library Sound Archive and images from the British Library Flickr account. You can view these on the UK Web Archive, Digital Scholarship and the Sound Archive’s Wildlife Department Twitter accounts.

The title for this blog post was inspired by the sound made by the Tawny Owl. This and other sounds can be experienced in the Sound Archive at the British Library, which has over 2,500 recordings of owls from all over the world. You can hear a selection of these recordings on the British Library Sound & Vision blog.

The Digital Scholarship team have also put together a useful album of digitised illustrations of owls on the British Library Flickr account. Their latest blog post encourages you to use these images for various creative projects.

 

Get involved with preserving owls online with the UK Web Archive

The UK Web Archive aims to archive, preserve and give access to the UK web space. We endeavour to include important aspects of British culture and events that shape society. The biodiversity of the UK is an important aspect of our collective national culture and is represented in several British Library collections including the UK Web Archive.

We can’t, however, curate the whole of the UK web on our own; we need your help to ensure that information, discussion and creative output on this subject are preserved for future generations.

Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

We already have an Online Enthusiast Communities in the UK curated collection that features some owl related websites in the Animal related hobbies subsection. Browse through what we have so far and please nominate more content!

 

31 July 2020

LGBTQ+ Lives Online


 
 A white banner with the LGBTQ+ flag colours painted on with the text - love is love
Photo by 42 North from Pexels

By Steven Dryden, British Library LGBTQ+ Staff Network & Ash Green, CILIP LGBTQ+ Network

 

When the internet first rose to prominence in the late 1990s, one of the primary modes of communicating with others was through internet chat rooms and forums. Suddenly, isolated people all over the world with a personal computer and internet access could communicate with others ‘like them’.

By using the term ‘like them’ we acknowledge that there is some form of social oppression which makes a person, perhaps alone in a rural community, feel unable to be themselves - to know anything about themselves at all. It is perhaps partly because of the need to feel more connected with other people ‘like them’ that LGBTQ+ people adapted to online community-building quickly. Now, as we have been living online for over 25 years, it seems pertinent to consider what traces of early digital lives survive, and how we can begin to make sense of them. What survives of digital campaigns to equalise the age of consent for all sexualities in the UK (2001), gain recognition and protections for members of the trans community (Gender Recognition Act 2004) or the battle for marriage equality in the UK (England and Wales, 2013, Scotland 2014, Northern Ireland 2019)? As well as historical content such as this, we must also ensure we are ready and able to curate current and future online discussions and websites surrounding LGBTQ+ lives.

Part of this process has already begun. Through the UK Web Archive, the British Library, along with the other five UK Legal Deposit Libraries, has been able to run an annual domain crawl of the UK web since April 2013, after the implementation of Non-Print Legal Deposit Regulations. Prior to this, websites had been archived on a permissions basis since January 2005. Through the Shine interface you can search the JISC UK Web Domain Dataset (1996-2013), which holds all the .uk websites archived by the Internet Archive from 1996 to April 2013. As a next step, the British Library and Chartered Institute of Library and Information Professionals (CILIP) LGBTQ+ Network are pleased to work collaboratively and develop LGBTQ+ Lives Online. This project will tag and subject categorise relevant websites in the UK Web Archive, and expand the scope of websites we collect for future generations. We look forward to sharing with you over the coming months the work that is being undertaken and how you can contribute.

CILIP LGBTQ+ Network members are pleased to be working collaboratively with the British Library and the UK Web Archive on this project, and recognise the historical value and importance of developing the LGBTQ+ Lives Online web archive.

The aim of the UK Web Archive is to collect content published on the UK web that reflects all aspects of life in the UK. This includes important aspects of British culture and events that shape society. The LGBTQ+ Lives Online collection reflects the important role this community plays in British society. The UK Web Archive is delighted to collaborate with the British Library LGBTQ+ Staff Network and the CILIP LGBTQ+ Network to build on the existing LGBTQ+ collection. Although there is a dedicated collection about the LGBTQ+ community, many of the websites tagged in this collection also intersect with other collections in the archive such as our various sports collections, Political Action and Communication and Oral History in the UK.

 

Get Involved:

CILIP LGBTQ+ Network, the British Library and the UK Web Archive welcome nominations for UK websites which should be included in the LGBTQ+ Lives Online collection.

Nominations can be made via this form: https://www.webarchive.org.uk/en/ukwa/nominate

 

Keep an eye on the CILIP LGBTQ+ Network Twitter as well as the UK Web Archive blog and Twitter account for more updates on the LGBTQ+ Lives Online collection.

 

29 July 2020

15 Years of UKWA - Looking back at our first collections


By Jason Webber, Web Archive Engagement Manager, The British Library

 

This blog follows on from ‘15 Years of the UK Web Archive - The Early Years’.

2020 marks fifteen years since the UK Web Archive (UKWA) started archiving UK published websites. In this blog I’ll be looking at the first curated collections that were made and some of the differences in web archiving from then until now.

In 2005, when the British Library (as part of the UK Web Archive Consortium (UKWAC)) started collecting websites, the techniques and procedures were still being pioneered. It was identified early on that grouping captured websites into collections would be useful for future researchers. Read about a few of our first collections below.

 

Indian Ocean Tsunami 

On Boxing Day 2004, a huge earthquake and subsequent tsunami caused severe destruction and loss of life in many areas around the Indian Ocean. Almost immediately afterwards a huge international relief effort was underway that included several UK based efforts. This catastrophic event happened just at the point that UKWAC started archiving websites and curators quickly decided that this deserved to be reflected in the archive. Selection and archiving took place between January and March 2005. It resulted in a small collection of websites representing news articles, charities and the response from travel companies.

This first collection demonstrated the ability of web archives to collect digital material around key events as they happened. See the Indian Ocean Tsunami collection.

 

Indian Ocean

 

UK General Election 2005
In addition to ‘rapid response’ events, UKWA aims to collect important national events such as elections. 2005 was a period before fixed-term elections and the curation team had only a matter of weeks to organise a plan between the government calling the election and it taking place. The way that candidates promoted themselves in 2005 was different from how they do so now. Only some had their own websites, Facebook was not yet widespread and Twitter didn’t yet exist. There is a fascinating contrast between the 2005 UK General Election collection and the most recent one in 2019, both in number (148 v 2,234) and in the range and breadth of the collection.

 

View of Westminster Bridge and the Palace of Westminster from the opposite side of the River Thames

 

Blogs
We all now know what a blog is, right? In 2005, though, it was a relatively new way for people to self-publish on the web. It was so new that when the collection was first made we felt the need to explain what one was and that it was a shortening of ‘web log’.

Since then, of course, blogs have become a widespread form of self-expression and creativity. They cover every imaginable subject from politics to satire, local history to personal history and many more. This collection contains over 1,000 blogs, many of which are no longer available. See what you can find in the Blogs collection.

 

Image of word tiles spelling the word blog

 

Selective curation

Since 2013, thanks to the Non-Print Legal Deposit Regulations, the UK Web Archive is able to archive any UK published website. Prior to 2013, however, curators had to obtain permission from the website owner before any archiving could take place. UKWA has always tried to collect a representative sample of the UK web which can include a very wide range of topics and opinions. We have always tried to be clear that selection is not endorsement, either of views or of quality. Each item in the collection is rich in its own way.

 

100+ curated collections and counting

Since these first collections in 2005, the number of collections has grown to over 100.  See all of our curated collections here.

We have continued to respond to important events with ‘rapid response’ collections such as the Zika Virus outbreak of 2016-2017 and the death of Margaret Thatcher in 2013. We have also continued to collect political events such as general elections, Scottish and Welsh Parliamentary elections and several key referendums such as the EU referendum. We also try to represent all parts of the UK, from the FTSE100 to the lives and hobbies of the nation in ‘Online enthusiasts’.

 

24 June 2020

Our new Science web archive collection


 
By Philip Eagle, Subject Librarian - Science, Technology and Medicine at The British Library
 
 
Air pump CC0
A Philosopher Shewing an Experiment on the Air Pump, 1769 by Valentine Green

 

Introduction

We have just activated our new web archive collection on science in the UK. One of the British Library's objectives as an institution is to increase our profile and level of service to the science community. In pursuit of this aim we are curating a web archive collection in collaboration with the UK legal deposit libraries. We have some collections already on science related subjects such as the late Stephen Hawking and science at Cambridge University, but not science as a whole.

 

Collection scope

We have interpreted "science" widely to include engineering and communications, but not IT, as that already has a collection. Our collection is arranged according to the standard disciplines such as biology, chemistry, engineering, earth sciences and physics, and then subdivided according to their common divisions, based on the treatment of science in the Universal Decimal Classification.

The collection has a wide range of types of site. We have tried to be fairly exhaustive on active UK science-related blogs, learned societies, charities, pressure groups, and museums. Because of the sheer number of university departments in the UK, we have not been able to cover them all. Instead we have selected the departments that did best in the 2014 Research Excellence Framework, and then taken a random sample to make sure that our collection properly reflects the whole world of academic science in the UK. We are also adding science-related Twitter accounts. Social media is generally difficult to archive due to its proprietary nature, but Twitter content is more openly accessible, so we can archive it more easily.

 

Access

Under the Non-Print Legal Deposit Regulations 2013 we can archive UK websites but we are only able to make them available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission. Some of the sites in the collection have already had permission granted, such as the Hunterian Society, Dame Athene Donald’s blog, and the Royal College of Anaesthetists. Others for which permission has not been given include Science Sparks, the Wellcome Collection, and the British Pregnancy Advisory Service. The Web Archive page will tell you whether any archived site is only viewable from a library; anything with no statement can be viewed on the public web.


Get involved

As ever, if you have a site to nominate that has been left out, you can tell us by filling in our public nomination form: https://www.webarchive.org.uk/ukwa/info/nominate

08 June 2020

Documenting the Olympics & Paralympics


 
 
Olympic Stamps
Stamps issued by Greece in 1896, the Universal Postal Union Collection, Philatelic Collections, The British Library.

 

Join our panel discussion to discover more about researchers' experiences when navigating archives, as well as the collection policies related to Olympics/Paralympics of GLAM organisations. This event is a collaboration between the British Society of Sports History (BSSH) and the British Library Web Archive team.

 

Register here to receive the joining details:

https://forms.gle/Tjzikxgjvr3FofSr8 

Date:           19 June 2020

Time:          3-4:30pm (BST) / 10-11:30am (EST)

Location:    Zoom

Twitter hashtag: #ResearchingtheGames

 

Presentations

Heather Dichter, De Montfort University - Finding Olympic history in non-sport archives

Laura Alexandra Brown, Northumbria University - The heritage of the Games: Interpreting urban change in Olympic host cities

Robert McNicol, Librarian, Wimbledon Lawn Tennis Museum - Researching the Olympics/Paralympics at Wimbledon

Helena Byrne, Curator of Web Archives, British Library - Preserving the Olympics/Paralympics online

 

What to expect

A broad mix of physical, digitised and born-digital resources will be covered in the presentations. The Curator of Web Archives, Helena Byrne, will be discussing the UK Web Archive collections related to the Olympics/Paralympics as well as the collaboration with the International Internet Preservation Consortium (IIPC).

The year 2020 was originally due to be an Olympic/Paralympic year before the outbreak of the coronavirus pandemic. It is also a significant milestone for the UK Web Archive and the IIPC. It marks 15 years since the first UK Web Archive collections were published and also 10 years since the IIPC first started archiving the Olympics.

 

UKWA Sports
https://www.webarchive.org.uk/en/ukwa/collection

 

The UK Web Archive and sports

The UK Web Archive has been archiving sports related websites since it was established in 2005. However, it wasn’t until 2017 that dedicated sports collections were established. There are three broad collection groups: Sports Collection, Sports: Football and Sports: International Events. The subsections of Sports: International Events include two summer and two winter Olympic/Paralympic collections from 2010, 2012, 2014 and 2016. The largest of these collections is the Olympic & Paralympic Games 2012 collection as the Games were hosted in the UK.

 

Access and reuse

Under the Non-Print Legal Deposit Regulations 2013 (NPLD) access to archived content is restricted to a UK legal deposit library reading room. However, if we have permission from the website owner, we can make the archived version of their content open access along with government publications under the Open Government Licence. This is why, if you browse through the collections on our website, most of the links to archived content will direct you to one of the UK legal deposit libraries for access, but some of the content can be viewed from your personal device.

 

IIPC and the Olympic/Paralympics

The UK Web Archive is made up of the six UK legal deposit libraries. Two of those libraries, the British Library and the National Library of Scotland, are also members of the International Internet Preservation Consortium (IIPC), which was founded in 2003. In 2010 the IIPC started its first collaborative collection on the Winter Olympics 2010 and has covered every Olympic/Paralympic Games since. Since the formation of the IIPC Content Development Group (CDG) the collections have started to include a broader range of subjects on and off the playing field.

 

Get Involved

The UK Web Archive aims to archive, preserve and give access to the UK web space.

If you see content that should be included in one of our sports collections then please fill in our online nomination form.