UK Web Archive blog

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites

3 posts categorized "Twitter"

24 June 2020

Our new Science web archive collection

By Philip Eagle, Subject Librarian - Science, Technology and Medicine at The British Library
Image: A Philosopher Shewing an Experiment on the Air Pump, 1769, by Valentine Green (CC0)



We have just activated our new web archive collection on science in the UK. One of the British Library's objectives as an institution is to raise our profile with, and our level of service to, the science community. In pursuit of this aim we are curating a web archive collection in collaboration with the UK legal deposit libraries. We already have some collections on science-related subjects, such as the late Stephen Hawking and science at Cambridge University, but none covering science as a whole.


Collection scope

We have interpreted "science" widely to include engineering and communications, but not IT, as that already has a collection. Our collection is arranged according to the standard disciplines such as biology, chemistry, engineering, earth sciences and physics, and then subdivided according to their common divisions, based on the treatment of science in the Universal Decimal Classification.

The collection includes a wide range of types of site. We have tried to be fairly exhaustive on active UK science-related blogs, learned societies, charities, pressure groups, and museums. Because of the sheer number of university departments in the UK, we have not been able to cover them all. Instead we have selected the departments that did best in the 2014 Research Excellence Framework, and then added a random sample to make sure that our collection properly reflects the whole world of academic science in the UK. We are also adding science-related Twitter accounts. Social media is generally difficult to archive because most platforms are proprietary and closed, but public Twitter content is more openly accessible, so we can archive it more easily.



Under the Non-Print Legal Deposit Regulations 2013 we can archive UK websites, but we are only able to make them available to people outside the Legal Deposit Libraries' Reading Rooms if the website owner has given permission. Some of the sites in the collection have already had permission granted, such as the Hunterian Society, Dame Athene Donald’s blog, and the Royal College of Anaesthetists. Others that have not given permission include Science Sparks, the Wellcome Collection, and the British Pregnancy Advisory Service. The Web Archive page will tell you whether an archived site is only viewable from a library; anything with no such statement can be viewed on the public web.

Get involved

As ever, if you have a site to nominate that has been left out, you can tell us by filling in our public nomination form.

01 March 2012

A Note on Nominations

Did you know that anyone can nominate websites for the UK Web Archive? We're exploring different ways to make it easier for people to nominate websites.

For several years now we've had a public nominations form on our website. However, we know that filling in a form can be a little daunting sometimes, even when it's only small and especially if you've not much time. So for the past few weeks we've been looking into additional options for accepting or submitting nominations.

Yesterday, we ran a small experiment using Twitter and invited followers to simply tweet the details of their nomination to @ukwebarchive. Our reasoning was simple: 

  1. It's very, very easy to share a link on Twitter
  2. So many people and organisations are already on Twitter and regularly share links with their followers
  3. It's fairly easy for us to monitor nominations coming in this way

We tweeted several times throughout the day about this and are pleased with the response. We had a small number of nominations on the day and several retweets, reaching a wider audience than our followers alone. It will be interesting to see if nominations continue to be tweeted when we aren't actively encouraging them. We need to evaluate the day in more detail, particularly with regard to how (and when) we respond, the types of nominations we receive, and how we can factor this into our current workflow. At the moment though, it's certainly worth more investigation.

We're also thinking about producing a browser plug-in that would automatically populate a small number of fields with details of the site people are visiting, and submit them directly to us as a packaged nomination. This needs further thought, but we'd be interested to hear from people who'd like to use a plug-in like this.

Finally, we're planning to overhaul the nominations form on the UK Web Archive website. This will make sure we're only asking people for information that we really need and that will help us to better assess their nomination.

So why not drop us a line, or a tweet, with your nomination? Alternatively, if you have any other ideas on how you'd like to nominate sites, why not leave a comment below?  We're always happy to hear new suggestions. 

02 December 2011

Twittervane: Crowdsourcing selection

We’re excited to announce development of a new tool to automate the selection of websites for archiving: the Twittervane.

At the moment, our selection process is manual: it depends on internal subject specialists or external experts contacting us to nominate websites for archiving in the UK Web Archive. We benefit from their expertise and wouldn’t be without it, but we recognise that this manual selection process can sometimes be time-consuming for frequent selectors. It’s also inevitably subjective, reflecting the interests of a relatively small number of selectors.

Automated selection is an efficient and under-utilised alternative, but up until now it has been difficult to see how an automated approach could clearly identify the most popular and widely relevant websites. Our answer?  Twittervane. 

The Twittervane project will investigate how the power and wisdom of the crowd can be leveraged to automatically select websites for archiving. In essence, it's a crowdsourcing approach to selection that will complement the manual selections provided by subject specialists and other experts.

The project will:

  • Deliver a prototype tool for analysing Twitter content that will:
    • determine which websites are shared most frequently around a given theme over a given time period;
    • link to our existing web archiving infrastructure to support harvesting of sites that fall within the UK domain.
  • Generate at least one pilot special collection comprising websites most frequently shared across the crowd that address or are relevant to a unifying theme
  • Assess the viability of the approach from a curatorial perspective and investigate the ‘wisdom of crowds’ in this context. 
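
To make the first of these aims concrete, here is a minimal sketch in Python of how "most frequently shared around a given theme over a given time period" might be computed. It is an illustration only, not the Twittervane implementation: the tweet format (a dict with `posted` and `text`), the function names `top_shared_sites` and `in_uk_domain`, the sample theme and the example.* addresses are all assumptions made for this example. It also assumes that links have already been expanded from any shortened form, and it treats "within the UK domain" simply as a host name ending in .uk.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical, simplified URL matcher; real tweets would need more care.
URL_PATTERN = re.compile(r"https?://\S+")


def top_shared_sites(tweets, theme, start, end, limit=10):
    """Rank website hosts by how many matching tweets share them.

    A tweet matches if it was posted between `start` and `end` (inclusive)
    and its text mentions `theme` (case-insensitive).
    """
    counts = Counter()
    for tweet in tweets:
        if not (start <= tweet["posted"] <= end):
            continue
        if theme.lower() not in tweet["text"].lower():
            continue
        hosts = set()  # count each host at most once per tweet
        for url in URL_PATTERN.findall(tweet["text"]):
            # Drop trailing punctuation that is not part of the link.
            host = urlparse(url.rstrip(".,;:!?)")).hostname
            if host:
                if host.startswith("www."):
                    host = host[4:]
                hosts.add(host)
        counts.update(hosts)
    return counts.most_common(limit)


def in_uk_domain(host):
    """Assumed rule of thumb: the host name ends in the .uk top-level domain."""
    return host.endswith(".uk")


# Invented sample data, for illustration only.
SAMPLE_TWEETS = [
    {"posted": datetime(2011, 12, 5), "text": "Great #olympics guide http://www.example.org.uk/guide"},
    {"posted": datetime(2011, 12, 6), "text": "More #olympics news: http://example.org.uk/news."},
    {"posted": datetime(2011, 12, 7), "text": "#olympics tickets http://example.com/tickets"},
    {"posted": datetime(2011, 12, 8), "text": "Unrelated link http://example.org.uk/other"},
    {"posted": datetime(2012, 2, 1), "text": "Late #olympics post http://example.net/late"},
]

ranked = top_shared_sites(
    SAMPLE_TWEETS, "#olympics", datetime(2011, 12, 1), datetime(2011, 12, 31)
)
uk_candidates = [(host, n) for host, n in ranked if in_uk_domain(host)]
print(ranked)
print(uk_candidates)
```

In this sketch the ranked list would feed the curatorial review described below, with only the hosts that pass the .uk check being passed on as candidates for harvesting.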

It’s important to get curatorial input to this approach, so we’ll be asking curators from the Library to assess the quality and relevance of resulting selections. The project will start in December, and the prototype will be completed in time for next year’s IIPC General Assembly in Washington in May. This is particularly important as the IIPC are contributing funding for the project.

We aim to provide regular progress updates as development takes place, so watch this space - and Twitter, of course - for more details.