At the moment, our selection process is manual: it depends on internal subject specialists or external experts contacting us to nominate websites for archiving in the UK Web Archive. We benefit from their expertise and wouldn’t be without it, but we recognise that manual selection can be time-consuming for frequent selectors. It’s also inevitably subjective, reflecting the interests of a relatively small number of selectors.
Automated selection is an efficient and under-utilised alternative, but until now it has been difficult to see how an automated approach could clearly identify the most popular and widely relevant websites. Our answer? Twittervane.
The Twittervane project will investigate how the power and wisdom of the crowd can be leveraged to automatically select websites for archiving. In essence, it's a crowdsourcing approach to selection that will complement the manual selections provided by subject specialists and other experts.
The project will:
- Deliver a prototype tool for analysing Twitter content that will:
  - determine which websites are shared most frequently around a given theme over a given time period;
  - link to our existing web archiving infrastructure to support harvesting of sites that fall within the UK domain.
- Generate at least one pilot special collection comprising the websites most frequently shared across the crowd that address or are relevant to a unifying theme.
- Assess the viability of the approach from a curatorial perspective and investigate the ‘wisdom of crowds’ in this context.
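To make the first aim concrete, here is a minimal sketch of how shared websites might be counted for a theme and time period. It is not the Twittervane implementation: the tweet format (a timestamp paired with the tweet text), the function names `top_shared_hosts` and `in_uk_domain`, the simple keyword match on the theme, and the use of a `.uk` hostname suffix as a stand-in for "within the UK domain" are all assumptions made for illustration. It also ignores shortened links, which would need expanding before counting.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Rough pattern for links in tweet text (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://\S+")


def top_shared_hosts(tweets, theme, start, end, limit=10):
    """Count how often each host is shared in tweets that mention
    `theme` and were posted in the window start <= time < end.

    `tweets` is a hypothetical iterable of (created_at, text) pairs.
    """
    counts = Counter()
    for created_at, text in tweets:
        if not (start <= created_at < end):
            continue  # outside the time period
        if theme.lower() not in text.lower():
            continue  # tweet does not mention the theme
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that often trails a link in running text.
            host = urlparse(url.rstrip(".,;:!?)")).hostname
            if host:
                counts[host] += 1
    return counts.most_common(limit)


def in_uk_domain(host):
    """Assumed proxy for UK scope: the hostname ends in .uk."""
    return host.endswith(".uk")
```

Calling `top_shared_hosts(tweets, "olympics", datetime(2012, 1, 1), datetime(2012, 2, 1))` would return the hosts ranked by how often they were shared, and filtering that list with `in_uk_domain` would give candidates to pass on for harvesting.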
Curatorial input is important to this approach, so we’ll be asking curators from the Library to assess the quality and relevance of the resulting selections. The project will start in December, and the prototype will be completed in time for next year’s IIPC General Assembly in Washington in May. That deadline is particularly important as the IIPC are contributing funding for the project.
We aim to provide regular progress updates as development takes place, so watch this space - and Twitter, of course - for more details.