Digital scholarship blog

16 December 2016

Re-imagining a catalogue of illuminated manuscripts - from search to browse

In this guest post, Thomas Evans discusses his work with Digital Curator Dr Mia Ridge to re-imagine the interface to the British Library's popular Online Catalogue of Illuminated Manuscripts.

The original Catalogue was built using an Access 2003 database, and allows users to create detailed searches from amongst 20 fields (such as date, title, origin, and decoration) or follow 'virtual exhibitions' to view manuscripts. Search-based interfaces can be ideal for specialists who already know what they're looking for, but the need to think of a search term likely to yield interesting results can be an issue for people unfamiliar with a catalogue. 'Generous interfaces' are designed as rich, browsable experiences that highlight the scope and composition of a particular collection by loading the page with images linked to specific items or further categories. Mia asked Thomas to apply faceted browsing and 'generous' styles to help first-time visitors discover digitised illuminated manuscripts. In this post Thomas explains the steps he took to turn the catalogue data supplied into a more 'generous' browsing interface. An archived version of his interface is available on the Internet Archive.

With over 4,300 manuscripts, written in a variety of languages and created in countries across Europe over a period of about a thousand years, the British Library's collection of illuminated manuscripts contains a diverse treasure trove of information and imagery for both the keen enthusiast and the total novice.

As the final project for my Masters in Computer Science at UCL, I worked with the British Library to design and start to implement alternative ways of exploring the collection. This project had some constraints in time, knowledge and resources. The final deadline for submission was only four months after receiving the project outline and the success of the project rested on the knowledge, experience and research of a fresh-faced rookie (me) using whatever tools I had the wherewithal to cobble together (open source software running on a virtual machine server hosted by UCL).

Rather than showing visitors an empty search box when they first arrive, a generous interface will show them everything available. However, taken literally, displaying 'everything' means details for over 4,300 manuscripts and around 40,000 images would have to be displayed on one page. While this approach would offer visitors a way to explore the entire catalogue, it could be quite unwieldy.

One way to reduce the number of manuscripts loaded onto the screen is to allow visitors to filter out some items, for example limiting the 'date' field to between 519 and 927 or the 'region' field to England. This is 'faceted' browsing, and it makes exploration more manageable. Presenting the list of available values for region or language, etc., also gives you a sense of the collection's diversity. It also means that 'quirky' members of the collection are less likely to be overlooked.
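The filtering idea can be illustrated with a short Python sketch. This is not the code behind the interface described in this post; the `Manuscript` fields, the sample records and the `apply_facets` function are all hypothetical, and each manuscript's date is simplified to a single year.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Manuscript:
    # Hypothetical, simplified record; the real catalogue has about 20 fields.
    shelfmark: str
    region: str
    language: str
    year: int  # assumption: one representative year per manuscript


# Invented sample data, for illustration only.
CATALOGUE = [
    Manuscript("MS A", "England", "Latin", 800),
    Manuscript("MS B", "France", "French", 1250),
    Manuscript("MS C", "England", "English", 1400),
    Manuscript("MS D", "Italy", "Latin", 900),
]


def apply_facets(
    records: Iterable[Manuscript],
    region: Optional[str] = None,
    language: Optional[str] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
) -> List[Manuscript]:
    """Keep records that match every facet that is set; None means 'no filter'."""
    selected = []
    for record in records:
        if region is not None and record.region != region:
            continue
        if language is not None and record.language != language:
            continue
        if year_from is not None and record.year < year_from:
            continue
        if year_to is not None and record.year > year_to:
            continue
        selected.append(record)
    return selected


early_english = apply_facets(CATALOGUE, region="England", year_from=519, year_to=927)
print([m.shelfmark for m in early_english])  # prints ['MS A']
```

Calling `apply_facets(CATALOGUE)` with no facets set returns every record, which corresponds to the unfiltered 'generous' starting point.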

Screenshot of the filters in Thomas's interface: an example of 'date' facets providing an instant overview of the temporal range of the Catalogue

For example, if you were to examine 30 random manuscripts from the British Library's collection, you might find 20 written in Latin, three each in French and English, and perhaps one each in Greek, Hebrew or Italian. You would almost certainly miss that the Catalogue contains a manuscript written in Cornish, another in Portuguese and another in Icelandic. These languages might be of interest precisely because they are hard to come by in the British Library's catalogue. Listing all the available languages (as well as their frequencies) exposes the exceptional parts of the collection where an unfaceted generous interface would hide them in plain sight.
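Listing each value alongside its frequency is a simple counting exercise. The sketch below is only an illustration: the `facet_counts` helper and the sample list of languages are invented, not taken from the catalogue.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def facet_counts(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


# Invented sample: one language per manuscript.
languages = (
    ["Latin"] * 20
    + ["French"] * 3
    + ["English"] * 3
    + ["Greek", "Cornish", "Icelandic"]
)

for language, count in facet_counts(languages):
    print(f"{language} ({count})")
```

Because every distinct value gets its own entry, a language that appears only once still shows up in the list, rather than depending on a visitor happening to scroll past the right manuscript.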

Once I understood the project's goals and completed some high-level planning and design sketches, it was time to get to grips with implementation. Being fairly inexperienced, I found some tasks took much longer than expected. A few examples which stick in the mind are properly configuring the web server, debugging errant server-side scripts (which have a habit of failing either silently or with an unhelpful error message) and transforming the Library's database into a form which I could use.

Being the work of many hands over the years, the database inevitably contained some tiny differences in the way entries were recorded, which Mia informs me is not uncommon for a long-standing database in a collecting institution. These small inconsistencies - for example, the use of an en-dash in some cases and a hyphen in others - look fine to us, but confuse a computer. I worked around these where I could, 'cleaning' the records only when I was certain of my correction.
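As an illustration of this kind of inconsistency, the Python sketch below treats several dash characters as equivalent when reading a date range, and declines to guess when the text does not match the expected pattern. The function names and the 'start-end' format are assumptions made for the example, not a description of the actual database or of the cleaning code used in the project.

```python
import re
from typing import Optional, Tuple

# Hyphen, non-breaking hyphen, figure dash, en dash, em dash and minus sign.
DASHES = re.compile("[\u2010\u2011\u2012\u2013\u2014\u2212]")


def normalise_dashes(text: str) -> str:
    """Replace dash look-alikes with a plain ASCII hyphen."""
    return DASHES.sub("-", text)


def parse_date_range(text: str) -> Optional[Tuple[int, int]]:
    """Parse 'start-end' into two integers, or return None if unsure."""
    match = re.fullmatch(r"\s*(\d{1,4})\s*-\s*(\d{1,4})\s*", normalise_dashes(text))
    if match is None:
        return None  # leave the record untouched rather than guess
    return int(match.group(1)), int(match.group(2))


print(parse_date_range("1150\u20131175"))  # en dash: prints (1150, 1175)
print(parse_date_range("1150-1175"))       # hyphen: prints (1150, 1175)
print(parse_date_range("c. 1150"))         # prints None
```

Returning `None` for anything unexpected mirrors the cautious approach of correcting a record only when the correction is certain.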

Being new to web design, I built the interface iteratively, component by component, consulting periodically with Mia for feedback. Thankfully, frameworks exist for responsive web design and page templating. Nevertheless, there was a small learning curve and some thought was required to properly separate application logic from presentation logic.

There were some ambitions for the project which were ultimately not pursued due to time (and knowledge!) constraints, but this iterative process made other improvements possible over the course of my project. To make exploration of the catalogue easier, the page listing a manuscript's details also contained links to related manuscripts. For instance, Ioannes Rhosos is attributed as the scribe of Harley 5699, so, on that manuscript's page, users could click on his name to see a list of all manuscripts by him. They could then apply further filters if desired. This made links between manuscripts much clearer than in the old interface, but it is limited to direct links which were explicitly recorded in the database.

An example of a relevant feature not explicitly recorded in the database is genre - only by reading a manuscript's description can you determine whether its subject matter is religious, historical, medical etc. Two possible techniques for revealing such features were considered: applying natural language processing to manuscript descriptions in order to classify them, or analysing data about which manuscripts were viewed by which users to build a recommendation system. Both of these turned out to require more in-depth knowledge than I was able to acquire within the time limit of the project.

I enjoyed working out how to transform all the possible inputs to the webpage into queries which could be run against the database, dealing with missing/invalid inputs by providing appropriate defaults etc. There was a quiet satisfaction to be had when tests of the interface went well - seeing something work and thinking 'I made that!'. It was also a pleasure to work with data about such an engaging topic.
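The general pattern of turning page inputs into a database query, with defaults for anything missing or invalid, might look something like the following Python sketch. It is not the project's code: the table and column names, the parameter names, the page size and the use of SQLite are all assumptions chosen to keep the example self-contained.

```python
import sqlite3
from typing import Dict, List, Tuple

DEFAULT_PAGE_SIZE = 20              # hypothetical default
ALLOWED_SORT = {"shelfmark", "year"}  # only these column names may be sorted on


def build_query(params: Dict[str, str]) -> Tuple[str, List[object]]:
    """Turn raw request parameters into SQL text plus bound values."""
    clauses: List[str] = []
    values: List[object] = []

    region = params.get("region", "").strip()
    if region:
        clauses.append("region = ?")
        values.append(region)

    for key, op in (("year_from", ">="), ("year_to", "<=")):
        try:
            year = int(params.get(key, ""))
        except ValueError:
            continue  # missing or invalid number: skip this filter
        clauses.append(f"year {op} ?")
        values.append(year)

    sort = params.get("sort", "shelfmark")
    if sort not in ALLOWED_SORT:
        sort = "shelfmark"  # fall back to the default ordering

    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT shelfmark FROM manuscripts{where} ORDER BY {sort} LIMIT ?"
    values.append(DEFAULT_PAGE_SIZE)
    return sql, values


# Invented in-memory data to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manuscripts (shelfmark TEXT, region TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO manuscripts VALUES (?, ?, ?)",
    [("MS A", "England", 800), ("MS B", "France", 1250), ("MS C", "England", 1400)],
)

sql, values = build_query({"region": "England", "year_from": "abc", "sort": "nonsense"})
rows = conn.execute(sql, values).fetchall()
print(rows)  # prints [('MS A',), ('MS C',)]
```

User-supplied values are passed as bound parameters rather than pasted into the SQL text, and the sort column is checked against a fixed list, so an unexpected input falls back to a default instead of producing an error.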

Hopefully, this project will have proved that exploration of the British Library's Catalogue of Illuminated Manuscripts has the potential to become a richer experience. Relationships between manuscripts which are currently not widely known could be revealed to more visitors and, if the machine learning techniques were to be implemented, perhaps new relationships would be revealed and related manuscripts could be recommended. My project showed the potential for applying new computational methods to better reveal the character of collections and connections between their elements. Although the interface I delivered has some way to go before it can achieve this goal, I earnestly hope that it is a first step in that direction.

Thomas' Catalogue interface