THE BRITISH LIBRARY

UK Web Archive blog

5 posts categorized "Crowdsourcing"

25 September 2020

The World of Food and the UK Web Archive


 

By Helena Byrne, Curator of Web Archives at the British Library

 

Assorted sliced fruits in white ceramic bowl surrounded by more sliced fruits and some small muffins
A variety of food

 

Food is a subject that transcends culture, politics and leisure practices. It has therefore been a key part of the UK Web Archive (UKWA) since the archive was established in 2005.

 

Recipes, restaurant menus, food blogs and online reviews are just the start of the food-related online material that UKWA collects. Even protest and campaigning can be food-related. For instance, this summer footballer Marcus Rashford highlighted the issue of child poverty and the lack of access to food, especially during the school holidays.

 

For the last three years the British Library has been running a series of events around food. Due to the coronavirus pandemic, this year's Food Season moved online with a series of talks over the autumn period. 

 

The Food Season celebrates the British Library’s extensive food-related collections and explores the politics, pleasures and history of food. UKWA, which is a partnership of the six UK Legal Deposit Libraries, including the British Library, also has an extensive collection of food-related websites.

 

Food collections

In 2017, the Food Archive collection was established. This collection covers the following topics:

There are currently 333 websites or web pages in this collection. Some of the websites selected include Eat Like a Girl, the Good Grub Club and the Veggies Catering Campaign. Why not have a browse through the collection and nominate your favourite UK published food sites or restaurant websites to be included in the collection? Anyone can nominate a website by following this link: https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

Even though there is a dedicated collection about food, it also features as a subsection in a number of other collections. ‘Food and Drink’ is a subsection in both the Festivals and Online Enthusiast Communities in the UK collections. In addition, individual food websites appear in several other collections. Websites related to food activism appear in both the Political Action and Communication collection and the (soccer) fan subsection of the Sport: Football Collection, as numerous supporters’ clubs have organised to support their local food banks.

 

Social media is a very popular way to share food and micro-reviews of eateries. However, it is often challenging for us to archive. At present, Twitter is the only social media platform that we archive on a regular basis, but these captures are by no means comprehensive. We have experimented with other methods of archiving social media, but only on a selective basis.

 

How can you access these archived websites?

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK-published websites but are only able to make the archived version available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library.

 

Some of the websites in UKWA have already had permission granted; these include the Cake Fest Edinburgh, the Lancashire Pork Pie Appreciation Society and the Food Research Collaboration. Some examples of websites that are onsite-only access include the Biscuit Appreciation Society, the UK Menu Archive and Fans Supporting Food Banks.

 

As the content of UKWA has mixed access, the message ‘Viewable only on Library premises’ will appear under the title of the website if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

Due to the coronavirus pandemic, the reading rooms were closed for a number of weeks but are starting to reopen. This blog post gives an overview of opening hours and how to book a visit at the six UK Legal Deposit Libraries:

https://blogs.bl.uk/webarchive/2020/09/ukwa-available-in-reading-rooms-again.html 

 

We would especially like to see more food and drink nominations that reflect the multicultural nature of the UK and the many diaspora communities based here. Browse through what we have so far and please nominate more content here:

https://www.webarchive.org.uk/en/ukwa/info/nominate 

 

25 August 2020

Cats vs Dogs on the Archived Web


 By Helena Byrne, Curator of Web Archives at the British Library

 

Cats and dogs, two of the most popular pets in the world, have international days of celebration in August. The 8th August is International Cat Day and the 26th August is International Dog Day. 

 

How popular are cats and dogs on the archived web?

 

Cats vs Dogs
Screenshot of the search results on Shine for Cat and Dog

 

One way to answer this question is to use the Shine Trends feature. Shine was developed as part of the Big UK Data Arts and Humanities project funded by the AHRC. The data was acquired by JISC from the Internet Archive and includes all .uk websites in the Internet Archive web collection crawled between 1996 and April 2013. The collection comprises over 3.5 billion items (URLs, images and other documents) and has been full-text indexed by the UK Web Archive. Every word of every website in the collection can be searched for and analysed.

 

Taking the Shine graph at face value, overall it would seem that cats are more popular on the archived .uk domain than dogs.

 

The graph shows the percentage of resources archived for each year. In some cases the largest peak on the graph doesn’t necessarily mean the most mentions for your search; this could be attributed to a larger amount of data archived for that particular year. However, when it comes to ‘Cats vs Dogs’, the largest peak for ‘Cat’ is the most popular year while the most popular year for ‘Dog’ is slightly below the peak in the graph. In 2005, there were almost 14.2 million mentions of ‘Cat’ out of 331 million resources archived, while in 2012 there were almost 13 million mentions of ‘Dog’ out of 464 million resources archived that year.

It is not possible to view every archived resource attributed to the generated stats, but you can click on markers along the plotted graph and you will be supplied with a random sample of matching records for that year. The sample displays a sentence where the term appears, as well as a link out to the Internet Archive so that you can review the archived website.

When we review the random sample for ‘Cat’ generated for 2005, we can see that very few of the references are to our furry friends; instead, the word ‘Cat’ is mostly an abbreviation for catalogue (for shopping online). This reflects changes in how the web was used, as online shopping became more popular during this period. By looking through some of the other samples we can see the term ‘CAT’ used as an acronym for various different systems.

On the other hand, when we look at the sample results for ‘Dog’ in 2012, most of the results are about the animal or related products such as dog food and dog accessories.

 

Possible big data project

 

After reviewing the use of the terms ‘Cat’ and ‘Dog’, can we really say that the animal-related variation is the most popular on the archived .uk domain?

A possible way to truly determine which family pet is the most popular would be through an in-depth analysis of the .uk domain. Something similar to the project ‘Mining the UK Web Archive for Semantic Change Detection’, run by the Alan Turing Institute, would provide more insight into which animal is more popular in this dataset.

This project identified words whose meaning has changed over time on the archived web. For example, the word ‘tweet’ stopped being commonly used for the sound a bird makes and came to be used more often for a message sent through the social media platform Twitter.

Pierpaolo Basile, a visiting researcher at the Alan Turing Institute, used the same data that is behind Shine in his research project ‘Detecting semantic shift in large corpora by exploiting temporal random indexing’. You can watch a recording of a presentation about this research on the Alan Turing Institute YouTube channel.

 

What cats and dogs websites are in the UK Web Archive?

 

The general UK Web Archive and a number of curated collections on the Topics and Themes page of the website feature many animal-related websites, and a lot of these focus on cats and dogs. Although archiving social media is very challenging, we do have a wide selection of Twitter accounts in the archive. These include many cat persona profiles, from library cats to political cats. Some of the political cats included in the archive are Larry the Cat from 10 Downing Street and Palmerston from the Foreign Office. We haven’t come across any similar UK dog persona profiles, so if you know of any please nominate them to be included in the UK Web Archive. However, there are other Twitter profiles that collect images of dogs, such as Non-League Dogs. This profile is included in both the soccer section of our Sport: Football collection and our Online Enthusiast Communities in the UK collection.

Animal welfare websites are also well represented in our UK General Election series of collections dating from 2005 to 2019, as many publish political manifestos during the election period.

As mentioned in the International Owl Awareness Day blog post, the Online Enthusiast Communities in the UK curated collection has an Animal Related Hobbies subsection. Here you can find a number of cat and dog-related sites but we know there are many more out there. Why not nominate your favourite websites and forums?

 

How can you access these archived websites?

 

Under the Non-Print Legal Deposit Regulations 2013, we can archive UK websites but we are only able to make them available to people outside the Legal Deposit Libraries Reading Rooms if the website owner has given permission. The UK Legal Deposit Libraries are the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin Library. Some of the sites in the collection have already had permission granted, such as the Battersea Dogs & Cats Home, Cats Protection and Library Cat. Some examples of websites that are onsite-only access include Dogs Trust, Dog Forum and Purrs In Our Hearts Forum.

 

As the content of the UK Web Archive has mixed access, the message ‘Viewable only on Library premises’ will appear under the title if you need to visit a Legal Deposit Library to view the content. If there is no message underneath then the archived version of the website should be available on your personal device.

 

Get involved with preserving cats and dogs online with the UK Web Archive

 

The UK Web Archive aims to archive, preserve and give access to the UK web space. We endeavour to include important aspects of British culture and events that shape society. Animals and especially pets in the UK are an important aspect of our collective national culture and are represented in several collections across the UK Legal Deposit Libraries, including the UK Web Archive.

 

We can’t, however, curate the whole of the UK Web on our own; we need your help to ensure that information, discussion and creative output on this subject are preserved for future generations. Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

 

Browse through what we have so far and please nominate more content!

 

04 August 2020

Twit twoo: International Owl Awareness Day 2020


By Helena Byrne, Curator of Web Archives, The British Library
 
 
 
An illustration of four owls perched on a branch with the moonlight behind them
British Library digitised image from page 271 of "Madeline Power [A novel]" https://www.flickr.com/photos/britishlibrary/11121066504

 

The 4th of August is International Owl Awareness Day. This is the perfect time to reflect on owl-related content in the UK Web Archive.

There are five native species of owl resident year-round in the UK, namely the Tawny Owl, Barn Owl, Long-eared Owl, Short-eared Owl and Little Owl. In addition, the Snowy Owl is an occasional winter visitor to the Outer Hebrides, Shetland and the Cairngorms in Scotland.

Owls online

We were wondering, out of these six owl species, which one is the most popular on the archived .uk domain?

 

UK Owl Species Shine Trends
A graph showing how many mentions the six owl species have on the archived .uk web

 

In order to answer this question, the Shine graph may prove useful. Shine was developed as part of the Big UK Data Arts and Humanities project funded by the AHRC. The data was acquired by JISC from the Internet Archive and includes all .uk websites in the Internet Archive web collection crawled between 1996 and April 2013. The collection comprises over 3.5 billion items (URLs, images and other documents) and has been full-text indexed by the UK Web Archive. Every word of every website in the collection can be searched for and analysed.

The most popular owl species referenced in the Shine dataset is the Barn Owl. Despite the curve in the graph being at its peak in 2011, the most popular year for the Barn Owl was 2012. This is because the graph shows the percentage of resources archived for each year and some years have more resources than others. In 2011 there were 66,034 of 288,809,412 archived resources that mention Barn Owl, while in 2012 there were 94,990 of 463,367,189 resources. These numbers are too big to review manually, but by clicking on a single point on the graph, Shine will generate a random sample of up to 100 references to the search term. The sample displays a sentence where the term appears, as well as a link out to the Internet Archive so that you can review the archived website.
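To see why the peak of the curve and the year with the most mentions can differ, the Barn Owl figures quoted above can be worked through in a few lines of Python. This is only an illustrative sketch using the counts given in this post; the function and variable names are our own and are not part of Shine.

```python
def share_of_resources(mentions: int, total_resources: int) -> float:
    """Return mentions as a percentage of all resources archived that year."""
    return 100 * mentions / total_resources


# Counts for "Barn Owl" quoted in this post: year -> (mentions, total resources).
barn_owl = {
    2011: (66_034, 288_809_412),
    2012: (94_990, 463_367_189),
}

# Percentage of each year's archived resources that mention the term.
shares = {
    year: share_of_resources(mentions, total)
    for year, (mentions, total) in barn_owl.items()
}

for year, pct in shares.items():
    print(f"{year}: {barn_owl[year][0]:,} mentions, {pct:.4f}% of archived resources")

# The year with the highest share is where the curve peaks;
# the year with the highest raw count is the "most popular" year.
peak_by_share = max(shares, key=shares.get)
peak_by_count = max(barn_owl, key=lambda year: barn_owl[year][0])
print("Peak of the curve:", peak_by_share)
print("Most mentions:", peak_by_count)
```

The share works out at roughly 0.023% of resources in 2011 against roughly 0.020% in 2012, so the curve is higher in 2011 even though 2012 has more mentions in absolute terms.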

 

Get creative with owls at the British Library

Video created by Carlos Lelkes-Rarugal, using Tawny Owl hoots recorded by Richard Margoschis in Gloucestershire, England (BL ref 09647). British Library digitised image from page 272 of "The Works of Alfred Tennyson, etc" 

 

Curious about what some of these owls sound like? Our Assistant Web Archivist, Carlos Lelkes-Rarugal, designed some short animated videos using recordings from the British Library Sound Archive and images from the British Library Flickr account. You can view these on the UK Web Archive, Digital Scholarship and the Sound Archive’s Wildlife Department Twitter accounts.

The title for this blog post was inspired by the sound made by the Tawny Owl. This and other sounds can be experienced in the Sound Archive at the British Library, which has over 2,500 recordings of owls from all over the world. You can hear a selection of these recordings on the British Library Sound & Vision blog.

The Digital Scholarship team have also put together a useful album of digitised illustrations of owls on the British Library Flickr account. Their latest blog post encourages you to use these images for various creative projects.

 

Get involved with preserving owls online with the UK Web Archive

The UK Web Archive aims to archive, preserve and give access to the UK web space. We endeavour to include important aspects of British culture and events that shape society. The biodiversity of the UK is an important aspect of our collective national culture and is represented in several British Library collections including the UK Web Archive.

We can’t, however, curate the whole of the UK Web on our own; we need your help to ensure that information, discussion and creative output on this subject are preserved for future generations.

Anyone can suggest UK websites to be included in the UK Web Archive by filling in our nominations form: https://www.webarchive.org.uk/en/ukwa/nominate

We already have an Online Enthusiast Communities in the UK curated collection that features some owl-related websites in the Animal Related Hobbies subsection. Browse through what we have so far and please nominate more content!

 

01 March 2012

A Note on Nominations


Did you know that anyone can nominate websites for the UK Web Archive? We're exploring different ways to make it easier for people to nominate websites.

For several years now we've had a public nominations form on our website. However, we know that filling in a form can be a little daunting sometimes, even when it's only small and especially if you've not much time. So for the past few weeks we've been looking into additional options for accepting or submitting nominations.

Yesterday, we ran a small experiment using Twitter and invited followers to simply tweet the details of their nomination to @ukwebarchive. Our reasoning was simple: 

  1. It's very, very easy to share a link on Twitter
  2. So many people and organisations are already on Twitter and regularly share links with their followers
  3. It's fairly easy for us to monitor nominations coming in this way

We tweeted several times throughout the day about this and are pleased with the response. We had a small number of nominations on the day and several retweets, reaching a wider audience than our followers alone. It will be interesting to see if nominations continue to be tweeted when we aren't actively encouraging them. We need to evaluate the day in more detail, particularly with regard to how (and when) we respond, the types of nominations we receive, and how we can factor this into our current workflow. At the moment though, it's certainly worth more investigation.

We're also thinking about producing a browser plug-in that would automatically populate a small number of fields with details of the site people are visiting, and submit them directly to us as a packaged nomination. This needs further thought, but we'd be interested to hear from people who'd like to use a plug-in like this.

Finally, we're planning to overhaul the nominations form on the UK Web Archive website. This will make sure we're only asking people for information we really need, and which will help us to better assess their nomination.

So why not drop us a line, or a tweet, with your nomination? Alternatively, if you have any other ideas on how you'd like to nominate sites, why not leave a comment below?  We're always happy to hear new suggestions. 

02 December 2011

Twittervane: Crowdsourcing selection



We’re excited to announce development of a new tool to automate the selection of websites for archiving: the Twittervane.

At the moment, our selection process is manual, dependent upon internal subject specialists or external experts to contact us and nominate websites for archiving in the UK Web Archive. We benefit from their expertise and wouldn’t be without it, but we recognise that this manual selection process can sometimes be time consuming for frequent selectors. It’s also inevitably subjective, reflecting the interests of a relatively small number of selectors. 

Automated selection is an efficient and under-utilised alternative, but up until now it has been difficult to see how an automated approach could clearly identify the most popular and widely relevant websites. Our answer?  Twittervane. 

The Twittervane project will investigate how the power and wisdom of the crowd can be leveraged to automatically select websites for archiving. In essence, it's a crowdsourcing approach to selection that will complement the manual selections provided by subject specialists and other experts.

The project will:

  • Deliver a prototype tool for analysing Twitter content that will:
    • determine which websites are shared most frequently around a given theme over a given time period;
    • link to our existing web archiving infrastructure to support harvesting of sites that fall within the UK domain
  • Generate at least one pilot special collection comprising websites most frequently shared across the crowd that address or are relevant to a unifying theme
  • Assess the viability of the approach from a curatorial perspective and investigate the ‘wisdom of crowds’ in this context. 
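As a rough illustration of the first aim in the list above, counting which websites are shared most often around a theme in a given period might look something like the Python sketch below. This is not the Twittervane prototype: the function name, the `(timestamp, text)` shape of the tweets, the simple keyword match on the theme, the `.uk` hostname test and the sample data are all assumptions made for the example, and it presumes that any shortened links have already been expanded.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Very simple pattern for links inside tweet text (an assumption for this sketch).
URL_PATTERN = re.compile(r"https?://\S+")


def most_shared_uk_sites(tweets, theme, start, end, top_n=5):
    """Count how often each .uk host is linked in tweets about a theme.

    `tweets` is an iterable of (timestamp, text) pairs. This shape is
    hypothetical and is not the format returned by the Twitter API.
    """
    counts = Counter()
    for timestamp, text in tweets:
        if not (start <= timestamp <= end):
            continue  # outside the time period of interest
        if theme.lower() not in text.lower():
            continue  # tweet does not mention the theme
        for url in URL_PATTERN.findall(text):
            # Strip trailing punctuation that is not part of the link.
            host = urlparse(url.rstrip(".,;:!?)")).hostname
            if host and host.endswith(".uk"):
                counts[host] += 1
    return counts.most_common(top_n)


# Invented sample data, for illustration only.
sample_tweets = [
    (datetime(2011, 12, 1, 9, 0), "Great #food blog http://example.co.uk/blog"),
    (datetime(2011, 12, 1, 10, 0), "More #food http://example.co.uk/news and http://example.com/x"),
    (datetime(2011, 12, 2, 8, 0), "#food recipes http://recipes.example.org.uk/"),
    (datetime(2011, 12, 2, 9, 0), "Unrelated link http://other.example.co.uk/"),
    (datetime(2012, 1, 5, 9, 0), "#food late link http://example.co.uk/late"),
]

top = most_shared_uk_sites(
    sample_tweets, "#food", datetime(2011, 12, 1), datetime(2011, 12, 31)
)
print(top)
```

A ranked list like this would only be a starting point; as the final aim in the list notes, curators would still need to assess whether the most frequently shared sites are actually relevant to the theme.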

It’s important to get curatorial input to this approach, so we’ll be asking curators from the Library to assess the quality and relevance of resulting selections. The project will start in December and the prototype will be completed in time for next year’s IIPC General Assembly in Washington in May, which is particularly important as the IIPC are contributing funding for the project.

We aim to provide regular progress updates as development takes place, so watch this space - and Twitter, of course - for more details.