Alex Mendes, Research Software Engineer with the British Library's Digital Scholarship team, provides some insight into our adaptation of an existing crowdsourcing platform to meet our varied needs.
Earlier this month, we announced a preview of a new crowdsourcing project we're working on. In the Spotlight aims to make the library's collection of historic playbills easier to find. This post will explore some of the factors involved in our initial project design and the technologies used within the core application.
The In the Spotlight homepage
During the early stages of development we talked to people working on various projects that deal with similar material, such as Ensemble @ Yale, which is an experiment in crowdsourcing transcriptions of digitised programs for Yale dramatic productions. While these conversations were incredibly useful, and the projects inspiring, after some deliberation we decided that the overhead of modifying such an application to fit our particular needs was too large.
Such projects have often been built for, and become increasingly tightly coupled with, a particular institutional purpose. By starting with such an application and modifying it heavily with our own institution-specific code we would likely be assuming sole responsibility for future maintenance of that application. Being unable to merge our code back into the original, we would be left managing our own modified version, one with limited usefulness outside an increasingly specific purpose. We wanted to avoid creating a significant maintenance issue, and sought a more generic, yet customisable platform.
Accordingly, we turned to our existing crowdsourcing platform, LibCrowds, which was launched in June 2015 to host the Convert-a-Card projects and help turn printed card catalogues into a searchable online database. The platform is based on PyBossa, a Python library for building crowdsourcing projects that is still very much in active development.
We hoped that it would be relatively quick to generate a new set of projects for collecting the crowdsourced playbills data. In fact, our first prototypes were ready back in April. However, as more detailed requirements were established we soon began to come up against some of the limits of the platform's existing architecture.
The projects page from the old LibCrowds theme
For instance, we needed to present the appearance of a self-contained website designed around the playbills, with additional pages and features not present in the core PyBossa model. We previously navigated some of these issues by developing custom plugins, but as the need for these grew the approach was becoming unwieldy.
Not long before we encountered these issues, PyBossa had released an update allowing it to be run as a headless backend server. 'Headless' means that it can be run as a stand-alone piece of software, separate from any graphical user interface, and be interacted with purely via an API. This differs from a 'traditional' website, in which the frontend and backend communicate directly, causing the functionality and architecture of one to be heavily dependent on the other.
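To make the idea of talking to a headless server "purely via an API" a little more concrete, here is a minimal sketch of a client that asks such a backend for its list of projects. It is illustrative only and not taken from the LibCrowds codebase: the server address `https://backend.example.org` is hypothetical, the helper names are our own, and it assumes the backend exposes a `/api/project` endpoint that accepts a `limit` parameter and returns JSON records with a `name` field.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address of a headless backend; replace with a real server.
BASE_URL = "https://backend.example.org"


def build_api_url(resource, base_url=BASE_URL, **params):
    """Return the URL for an API resource, e.g. .../api/project?limit=5."""
    url = f"{base_url.rstrip('/')}/api/{resource}"
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


def fetch_json(url, opener=urlopen):
    """GET a URL and decode the JSON body.

    `opener` is injectable so the function can be tried without a network.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def project_names(projects):
    """Pull the display names out of a list of project records.

    Assumes each record is a dict with a "name" key.
    """
    return [project["name"] for project in projects]


if __name__ == "__main__":
    # Requires a reachable server at BASE_URL.
    projects = fetch_json(build_api_url("project", limit=5))
    print(project_names(projects))
```

Nothing here knows or cares how the results will be displayed. Any frontend, whether a script like this or a full website, can work from the same JSON.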
We took the plunge and decided to drop some of the work that had gone into the redesign up to that point, opting to run a headless PyBossa instance as our backend and rewriting our frontend as a separate single-page application (SPA), using the Vue.js framework. This approach gives us the freedom to structure the website as required, without having to modify large amounts of backend code. Backend plugins still have a place but the majority of custom functionality can be handled within the browser.
The new LibCrowds homepage.
This new frontend application comprises a set of core LibCrowds pages, including a homepage and an administration interface where staff can manage the projects. Sitting beneath these, each project has its own set of themed pages, giving the appearance of bespoke websites for each project. Crucially, the new architecture achieves this without requiring us to maintain multiple application instances or to handle user authentication between those instances.
In hindsight, we should have spent more time on requirements gathering at the start of the process, as we iterated through a number of possible system designs before settling on our current architecture. However, we seem to be moving towards quite a clean solution and one that will hopefully provide a satisfying user experience.