Digital scholarship blog

Enabling innovative research with British Library digital collections

29 November 2017

Crowdsourcing using IIIF and Web Annotations

Alex Mendes from the Digital Scholarship team explains how the LibCrowds platform uses emerging standards for digitised images and annotations.

Our new crowdsourcing project, In the Spotlight, was officially launched at the start of November 2017. The project asks volunteers to identify and transcribe key data held in digitised playbills. Here we explore two of the key technologies we adopted to enable this: IIIF and Web Annotations.

Configuring a selection task using JSON

Commonly, when an institution begins digitising a new type of content, or a project finds that the current infrastructure doesn’t fit its needs, it builds or commissions a new image viewer, one that is probably tightly coupled to its custom metadata structures. This leads to an ever-growing collection of isolated data silos that, among other issues, make it hard to reuse the information they contain.

The International Image Interoperability Framework (IIIF) is a set of APIs (protocols for requests between computers) that aims to tackle this issue by allowing images and metadata to be requested in a standardised way. Via these APIs, particular regions of images can be requested in a specified quality, size and format. The associated metadata includes information about how the images should be displayed and in what order. As this metadata is standardised, different image viewers can be built that are all able to understand and display the same sets of images. The one increasingly used by the library for catalogue items is called the 'Universal Viewer'.
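To make the idea of requesting a region at a given size concrete, here is a minimal sketch of how an IIIF Image API URL is put together. The server address and image identifier are made up for illustration, and the helper function is ours rather than part of any library. It uses the `max` size keyword from version 2.1 of the Image API; older servers may expect `full` instead.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    The path follows the pattern
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be URL-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"

# The whole image at the largest size the server offers.
full_image = iiif_image_url(BASE, "playbill-001")

# A region 800 x 300 pixels starting at (100, 200), scaled to 400 pixels wide.
title_region = iiif_image_url(BASE, "playbill-001",
                              region="100,200,800,300", size="400,")

print(full_image)
print(title_region)
```

Because every compliant server understands the same URL pattern, a viewer only needs to know the base address and the identifier to request exactly the part of the image it wants to show.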

Another IIIF-compliant viewer, called LibCrowds Viewer, has been developed for In the Spotlight. The viewer takes advantage of the flexibility enabled by the APIs described above. Images and metadata already held by the British Library can be requested, combined with some additional configuration details, and used to generate sets of crowdsourcing tasks. This means that we don’t need to host any additional image data, nor are we tied to any institution-specific metadata structures. In fact, the system could be used to generate crowdsourced annotations for any IIIF-compliant content.
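As a rough sketch of how tasks might be generated, the example below walks the canvases of a IIIF Presentation API (version 2) manifest and combines each one with a small JSON configuration. The manifest is a cut-down sample with made-up addresses, and the configuration fields (`mode`, `tag`, `objective`) are hypothetical names chosen for illustration; they are not a statement of the schema that LibCrowds itself uses.

```python
import json

# A heavily trimmed, hypothetical IIIF Presentation API 2 manifest.
SAMPLE_MANIFEST = {
    "@id": "https://example.org/iiif/volume-1/manifest",
    "sequences": [{
        "canvases": [
            {"@id": "https://example.org/iiif/volume-1/canvas/1",
             "images": [{"resource": {"service": {
                 "@id": "https://example.org/iiif/image/playbill-001"}}}]},
            {"@id": "https://example.org/iiif/volume-1/canvas/2",
             "images": [{"resource": {"service": {
                 "@id": "https://example.org/iiif/image/playbill-002"}}}]},
        ]
    }],
}

# Hypothetical task configuration; field names are assumptions.
SAMPLE_CONFIG = """
{
  "mode": "select",
  "tag": "title",
  "objective": "Mark the title of each play"
}
"""


def generate_tasks(manifest, config):
    """Create one task per canvas by merging image details with the config."""
    tasks = []
    for canvas in manifest["sequences"][0]["canvases"]:
        # Each canvas points at an image service that speaks the Image API.
        service_id = canvas["images"][0]["resource"]["service"]["@id"]
        tasks.append({
            "target": canvas["@id"],
            "image_service": service_id,
            # A small preview, requested from the institution's own server.
            "thumbnail": f"{service_id}/full/256,/0/default.jpg",
            "mode": config["mode"],
            "tag": config["tag"],
            "objective": config["objective"],
        })
    return tasks


tasks = generate_tasks(SAMPLE_MANIFEST, json.loads(SAMPLE_CONFIG))
for task in tasks:
    print(task["target"], task["mode"], task["thumbnail"])
```

Nothing in the sketch copies or stores image data: each task simply points back to the images and canvases that the source institution already publishes.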

Transcriptions are collected in the form of Web Annotations, a W3C standard that was published in February 2017. This is another step towards future interoperability and reuse. By adopting this standard we can share our transcriptions more easily across the Web and incorporate them back into our core discovery systems.
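For readers who have not seen one, a Web Annotation is a small JSON document that links a body (here, the transcribed text) to a target (here, a region of an image). The sketch below builds one along the lines of the W3C data model. The canvas address, the transcribed text and the helper function are invented for illustration, and the exact shape of the annotations produced by LibCrowds may differ.

```python
import json


def make_transcription_annotation(canvas_id, x, y, w, h, text, tag):
    """Build a Web Annotation for text transcribed from an image region."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "describing",
        "body": [
            # The transcription itself.
            {"type": "TextualBody", "purpose": "describing",
             "value": text, "format": "text/plain"},
            # A tag recording what kind of information was transcribed.
            {"type": "TextualBody", "purpose": "tagging", "value": tag},
        ],
        "target": {
            "source": canvas_id,
            # The rectangle on the image, as a media fragment.
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Hypothetical canvas and transcription, for illustration only.
annotation = make_transcription_annotation(
    "https://example.org/iiif/volume-1/canvas/1",
    100, 200, 800, 300,
    text="The Merchant of Venice",
    tag="title",
)
print(json.dumps(annotation, indent=2))
```

Because the target is expressed as a canvas address plus a region, any tool that understands the standard can work out where on the page the text came from.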

As well as being searchable via the library’s catalogue viewer, the crowdsourced transcriptions will be made available via the IIIF Content Search API, further increasing the ways in which the data could be reused. For example, we could develop programmatic ways to search the collection for a particular person who performed in a certain play in a given location.
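As an indication of what such a programmatic search could look like, the sketch below builds a query for a IIIF Content Search API (version 1) service and pulls the matching text and image regions out of a response. The service address and the sample response are invented for illustration; no such endpoint is implied to exist yet, and the two helper functions are our own.

```python
from urllib.parse import urlencode


def build_search_url(service_url, query, motivation=None):
    """Build a Content Search request using the `q` and `motivation` parameters."""
    params = {"q": query}
    if motivation:
        params["motivation"] = motivation
    return f"{service_url}?{urlencode(params)}"


def extract_hits(response):
    """Return (text, canvas, region) for each annotation in a search response."""
    hits = []
    for anno in response.get("resources", []):
        # `on` holds the canvas address, optionally followed by #xywh=...
        canvas, _, region = anno["on"].partition("#")
        hits.append((anno["resource"]["chars"], canvas, region))
    return hits


# Hypothetical service address and response, for illustration only.
url = build_search_url("https://example.org/search/volume-1", "Kemble")

SAMPLE_RESPONSE = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "sc:AnnotationList",
    "resources": [{
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {"@type": "cnt:ContentAsText", "chars": "Mr. Kemble"},
        "on": "https://example.org/iiif/volume-1/canvas/1#xywh=120,640,300,40",
    }],
}

print(url)
for text, canvas, region in extract_hits(SAMPLE_RESPONSE):
    print(text, canvas, region)
```

Each hit comes back with the canvas and region it was found on, so a viewer can highlight the match directly on the playbill.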

To enable such exciting functionality we first need to collect the data. Since we launched, volunteers have completed over 14,000 tasks, which is a fantastic start. Visit In the Spotlight to get involved.
