Student project report: Scribal Handwriting: An automated manuscript analysis tool
In 2017-18, Dr Mia Ridge worked with three groups of second year students on UCL's Computer Science course to apply their skills to collections and digital scholarship-based projects. In this post, Francesco Benintende (firstname.lastname@example.org), Kamil Zajac (email@example.com) and Andrei Maxim (firstname.lastname@example.org) explain how they worked with curator Alison Hudson on 'Scribal Handwriting: An automated manuscript analysis tool'. A video of their final presentation is available online and their project page contains more technical information.
The team was challenged to create a tool for palaeographers (researchers who analyse handwriting) that can determine the date of a manuscript and sometimes even its scribe and place of production. To help with this task, we designed a tool to quickly find occurrences of similar handwritten characters across a collection of documents. This would be a lengthy and repetitive task if done manually by researchers. Typically, researchers compare features of the characters in different manuscripts, such as script, size and ink, to establish possible similarities between manuscripts and scribes.
Our mission was to create a fast and reliable tool that could be used by palaeographers. Our aim was to speed up their research process by automating the comparisons between characters.
To create a solution for this particular challenge, we began with problem research and user needs analysis. During this phase we identified the main features the application would need. We wanted to improve on current methods and to understand the needs of future users. This phase was characterised by interviews, questionnaires and surveys aimed at people with a technical level and background similar to those of the future users. This helped us tailor an appropriate user interface to the researchers. In addition to this, we tried to understand what the current limitations of research in the palaeography field were. Our initial research can be found at http://students.cs.ucl.ac.uk/2017/group33/initial_research.html.
After acquiring this initial information, creating the first prototype of the web application and testing the user response to the graphic components, we shifted our focus to building a system that would recognise characters written in similar scripts. This was the main phase of the development of the project. It consisted mainly of testing and evaluating different methods to find and compare characters’ features.
In our final phase, we were concerned with testing and evaluating our web application overall.
Our solution is a web app that allows researchers to create an account and to upload and maintain a collection of manuscripts. They can then search for characters within their personal collection and analyse these documents from anywhere.
To power our web app, we created our own algorithm to perform analysis between two characters or two ligatures. (A ligature is the name for two characters written as one shape, as in the example of ‘NT’ below.) The algorithm finds characters in the selected pages and compares each of them with the character being searched for. This analysis relies on converting the images of characters into ‘functions’ and then comparing them. This enables us to identify similar patterns in the characters, such as recurrent shapes and angles, and use this information to treat two characters as being similar.
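The idea of turning a character image into a ‘function’ and comparing functions can be illustrated with a minimal sketch. This is not the team's algorithm, whose details are not given in this post: the radial signature used here (for each direction from the centre of the ink, how far the ink extends) is a technique we have swapped in for illustration, and the names `radial_signature` and `dissimilarity` are hypothetical. It assumes each character has already been cropped and binarised into a NumPy array in which True marks ink.

```python
import numpy as np

def radial_signature(glyph, bins=36):
    """Describe a binary glyph (True = ink) as a 1-D function of angle.

    For each angular bin around the ink centroid, record the distance to
    the farthest ink pixel, then divide by the largest distance so that
    the signature does not depend on the size of the glyph.
    """
    ys, xs = np.nonzero(glyph)
    if ys.size == 0:
        raise ValueError("glyph contains no ink pixels")
    dy = ys - ys.mean()
    dx = xs - xs.mean()
    radius = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)  # values in [-pi, pi]
    idx = ((angle + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    signature = np.zeros(bins)
    np.maximum.at(signature, idx, radius)  # per-bin maximum radius
    peak = signature.max()
    return signature / peak if peak > 0 else signature

def dissimilarity(a, b, bins=36):
    """Mean absolute difference between two signatures (0 = identical)."""
    return float(np.mean(np.abs(radial_signature(a, bins) - radial_signature(b, bins))))

# Hypothetical glyphs: a vertical stroke, a larger copy of it, and a filled square.
bar = np.zeros((9, 9), dtype=bool)
bar[1:8, 4] = True
big_bar = np.zeros((17, 17), dtype=bool)
big_bar[1:16, 8] = True
square = np.zeros((9, 9), dtype=bool)
square[1:8, 1:8] = True

print(dissimilarity(bar, big_bar), dissimilarity(bar, square))
```

In this sketch the two strokes score as closer to each other than the stroke and the square, even though they differ in size. A real tool would also need to locate the characters on the page and cope with noise, rotation and faded ink, none of which is attempted here.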
Overall, our solution offers a consistent improvement over current manual methods as it enables researchers to work on important documents without having to physically consult them. It also offers useful data about the results, by indicating which results are most similar in shape and size to the original character. This can help scholars think about how scribes’ work or even a pen’s sharpness might change over the course of many pages. This might also offer new ways of arguing which parts of a manuscript were copied out by different scribes. Such arguments are often based largely or entirely on more subjective appraisals. While these are still necessary, this app is a useful addition to palaeographers’ toolkits. The app also usefully places the results in context: their location is shown on the full page, and the excerpts include a few letters to either side. Our testing showed that performance is mainly limited by larger images (in pixels) and damaged manuscripts.
Regardless of size and condition, our web app is able to consistently find occurrences of characters in different manuscripts.
This project allowed our team to experience real-world applications of image processing and to gain a unique insight into the world of palaeography and its research procedures. The team also had to consider how to develop a web application for an audience that might not have used this sort of program before, and how to make a website that could work on a wide range of devices, from smartphones to relatively old desktop computers. Moreover, we outlined future improvements that would make the web app a consumer-grade tool for researchers: the use of machine learning technologies to improve performance, a mobile version to allow researchers to work from their smartphones too, a version of the app to analyse shapes and decoration as well as text, and an improved version of the algorithm to analyse damaged documents where there is less contrast between the colour of the ink and the colour of the parchment.