Digital scholarship blog

29 November 2022

My AHRC-RLUK Professional Practice Fellowship: Four months on

In August 2022 I started work on a project to investigate the legacies of curatorial voice in the descriptions of incunabula collections at the British Library and their future reuse. My research is funded by the collaborative AHRC-RLUK Professional Practice Fellowship Scheme for academic and research libraries, which launched in 2021. As part of the first cohort of ten Fellows, I embraced this opportunity to engage in practitioner research that benefits my institution and the wider sector, and to promote the role of library professionals as important research partners.

The overall aim of my Fellowship is to demonstrate new ways of working with digitised catalogues that would also improve the discoverability and usability of the collections they describe. The focus of my research is the Catalogue of books printed in the 15th century now in the British Museum (or BMC), published between 1908 and 2007, which describes over 12,700 volumes from the British Library incunabula collection. By applying computational approaches and tools to the data derived from the catalogue, I will gain new insights into and interpretations of this valuable resource and enable its reuse in contemporary online resources.

Titlepage to volume 2 of the Catalogue of books printed in the fifteenth century now in the British Museum, part 2, Germany, Eltvil-Trier
BMC volume 2 titlepage


This research idea was inspired by a recent collaboration with Dr James Baker, who is also my mentor for this Fellowship, and was further developed in conversations with Dr Karen Limper-Herz, Lead Curator for Incunabula, Adrian Edwards, Head of Printed Heritage Collections, and Alan Danskin, Collections Metadata Standards Manager, who support my research at the Library.

My Fellowship runs until July 2023, with Fridays being my main research days. I began by studying the history of the catalogue, its arrangement, the structure of the item descriptions and their relationship with different online resources. The main focus of this first phase has been on generating the text data required for the computational analysis and for investigations into curatorial and cataloguing practice. This work involved new digitisation of the catalogue and a lot of experimentation with Transkribus, the AI-powered platform that proved best suited to improving layout and text recognition for the digitised images. Over the last two months I have benefited hugely from the expertise of my colleague Tom Derrick, as we worked together on creating the training data and building structure models for the incunabula catalogue images.

An image from Transkribus Lite showing a page from the catalogue with separate regions drawn around columns 1 and 2, and the text baselines highlighted in purple
Layout recognition output for pages with only two columns, including text baselines, viewed on Transkribus Lite

An image from Transkribus Lite showing a page from the catalogue alongside the text lines
Text recognition output after applying the model trained with annotations for 2 columns on the page, viewed on Transkribus Lite

An image from Transkribus Lite showing a page from the catalogue with separate regions drawn around 4 columns of text separated by a single text block
Layout recognition output for pages with mixed layout of single text block and text in columns, viewed on Transkribus Lite

Whilst the data preparation phase has taken longer than I had planned due to the varied layout of the catalogue, this has been an important part of the process, as the project outcomes depend on having the best-quality text data for the incunabula descriptions. The next phase of the research will involve the segmentation of the records and extraction of relevant information to use with a range of computational tools. I will report on the progress with this work and the next steps early next year. Watch this space and do get in touch if you would like to learn more about my research.

This blogpost is by Dr Rossitza Atanassova, Digital Curator for Digitisation, British Library. She is on Twitter @RossiAtanassova and Mastodon @ratanass@glammr.us.