THE BRITISH LIBRARY

Digital scholarship blog

13 posts categorized "Manuscripts"

13 February 2018

BL Labs 2017 Symposium: Samtla, Research Award Runner Up


Samtla (Search And Mining Tools for Labelling Archives) was developed to address a need in the humanities for research tools that help to search, browse, compare, and annotate documents stored in digital archives. The system was designed in collaboration with researchers at Southampton University, whose research involved locating shared vocabulary and phrases across an archive of Aramaic Magic Texts from Late Antiquity. The archive contained texts written in Aramaic, Mandaic, Syriac, and Hebrew languages. Due to the morphological complexity of these languages, where morphemes are attached to a root morpheme to mark gender and number, standard approaches and off-the-shelf software were not flexible enough for the task, as they tended to be designed to work with a specific archive or user group. 

Figure 1: Samtla supports tolerant search allowing queries to be matched exactly and approximately.

Samtla is designed to extract the same or similar information that may be expressed by authors in different ways, whether it is in the choice of vocabulary or the grammar. Traditionally, search and text mining tools have been based on words, which limits their use to corpora containing languages where 'words' can be easily identified and extracted from text, e.g. languages with a whitespace character like English, French, German, etc. Word models tend to fail when the language is morphologically complex, like Aramaic and Hebrew. Samtla addresses these issues by adopting a character-level approach stored in a statistical language model. This means that rather than extracting words, we extract character-sequences representing the morphology of the language, which we then use to match the search terms of the query and rank the documents according to the statistics of the language. Character-based models are language independent as there is no need to preprocess the document, and we can locate words and phrases with a lot of flexibility. As a result Samtla compensates for the variability in language use, spelling errors made by users when they search, and errors in the document as a result of the digitisation process (e.g. OCR errors).
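To give a feel for why character sequences tolerate spelling variation, here is a minimal sketch in Python. It is not Samtla's implementation: Samtla ranks with a statistical language model, whereas this sketch swaps in a much simpler score, the fraction of the query's character trigrams that also occur in a document. The function names and the sample texts are made up for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams; padding marks the text boundaries."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def match_score(query, document, n=3):
    """Fraction of the query's n-grams found in the document (0.0 to 1.0).

    A simple overlap measure, used here in place of a language model.
    """
    q = char_ngrams(query, n)
    d = char_ngrams(document, n)
    total = sum(q.values())
    overlap = sum((q & d).values())  # multiset intersection
    return overlap / total if total else 0.0


def search(query, documents, n=3):
    """Return (score, document) pairs, best match first."""
    scored = [(match_score(query, doc, n), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample texts: the query uses a different spelling
# ("digitisation") from the best-matching document ("digitization").
documents = [
    "a flitch of bacon",
    "errors from the digitization process",
]
for score, doc in search("digitisation", documents):
    print(f"{score:.2f}  {doc}")
```

No tokenisation into words is needed, so the same code runs unchanged on text in any script. A word-based index would treat the two spellings as unrelated terms, while most of their trigrams are shared.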

Figure 2: Samtla's document comparison tool displaying a semantically similar passage between two Bibles from different periods.

 The British Library have been very supportive of the work by openly providing access to their digital archives. The archives ranged in domain, topic, language, and scale, which enabled us to test Samtla’s flexibility to its limits. One of the biggest challenges we faced was indexing larger-scale archives of several gigabytes. Some archives also contained a scan of the original document together with metadata about the structure of the text. This provided a basis for developing new tools that brought researchers closer to the original object, which included highlighting the named entities over both the raw text, and the scanned image.

Currently we are focusing on developing approaches for leveraging the semantics underlying text data in order to help researchers find semantically related information. Semantic annotation is also useful for labelling text data with named entities, and sentiments. Our current aim is to develop approaches for annotating text data in any language or domain, which is challenging due to the fact that languages encode the semantics of a text in different ways.

As a first step we are offering labelled data to researchers, as part of a trial service, in order to help speed up the research process, or provide tagged data for machine learning approaches. If you are interested in participating in this trial, then more information can be found at www.samtla.com.

Figure 3: Samtla's annotation tools label the texts with named entities to provide faceted browsing and data layers over the original image.

 If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.


Posted by BL Labs on behalf of Dr Martyn Harris, Prof Dan Levene, Prof Mark Levene and Dr Dell Zhang.

05 February 2018

8th Century Arabic science meets today's computer science


Or, Announcing a Competition for the Automatic Transcription of Historical Arabic Scientific Manuscripts 

“An impartial view of Digital Humanities (DH) scholarship in the present day reveals a stark divide between ‘the West and the rest’…Far fewer large-scale DH initiatives have focused on Asia and the non-Western world than on Western Europe and the Americas…Digital databases and text corpora – the ‘raw material’ of text mining and computational text analysis – are far more abundant for English and other Latin alphabetic scripts than they are for Chinese, Japanese, Korean, Sanskrit, Hindi, Arabic and other non-Latin orthographies…Troves of unread primary sources lie dormant because no text mining technology exists to parse them.”

-Dr. Thomas Mullaney, Associate Professor of Chinese History at Stanford University

Supporting the use of Asian & African Collections in digital scholarship means shining a light on this stark divide and seeking ways to close the gap. In this spirit, we are excited to announce the ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts.

Add MS 7474_0043.script

The Competition

Drawing together experts from the British Library, The Alan Turing Institute, Qatar Digital Library and PRImA Research Lab, our aim in launching this competition is to play an active role in advancing the state-of-the-art in handwritten text recognition technologies for Arabic. For our first challenge we are focussing on finding an optimal solution for accurately and automatically transcribing historical Arabic scientific handwritten manuscripts.

Though such technologies are still in their infancy, unlocking historical handwritten Arabic manuscripts for large-scale text analysis has the potential to truly transform research. In conjunction with the competition we hope to build and make freely open and available a substantial image and ground truth dataset to support continued efforts in this area. 

Enter the Competition

Organisers

• Apostolos Antonacopoulos, Professor of Pattern Recognition, University of Salford, and Head of the PRImA research lab
• Christian Clausner, Research Fellow at the Pattern Recognition and Image Analysis (PRImA) research lab
• Nora McGregor, Digital Curator at British Library, Asian & African Collections
• Daniel Lowe, Curator at British Library, Arabic Collections
• Daniel Wilson-Nunn, PhD student at University of Warwick & Turing PhD Student based at Alan Turing Institute
• Bink Hallum, Arabic Scientific Manuscripts Curator at British Library/Qatar Foundation Partnership

Further reading

For more on recent Digital Research Team text recognition and transcription projects see:

 

This post is by Nora McGregor, Digital Curator, British Library. She is on Twitter as @ndalyrose.

23 January 2018

Using Transkribus for handwritten text recognition with the India Office Records


In this post, Alex Hailey, Curator, Modern Archives and Manuscripts, describes the Library's work with handwritten text recognition.

National Handwriting Day seems like a good time to introduce the Library’s initial work with the Transkribus platform to produce automatic Handwritten Text Recognition models for use with the India Office Records.

Transkribus is produced and supported as part of the READ project, and provides a platform 'for the automated recognition, transcription and searching of historical documents'. Users upload images and then identify areas of writing (text regions) and lines within those regions. Once a page has been segmented in this way, users transcribe the text to produce a 'ground truth' transcription – an accurate representation of the text on the page. The ground truth texts and images are then used to train a recurrent neural network to produce a tool to transcribe texts from images: a Handwritten Text Recognition (HTR) model.

Page segmented using the automated line identification tool. The document structure tree can be seen in the left panel.

After hearing about the project at the Linnean Society’s From Cabinet to Internet conference in 2015, we decided to run a small pilot project using material digitised as part of the Botany in British India project.

Producing ground truth text and Handwritten Text Recognition (HTR) models

We created an initial set of ground truth training data for 200 images, produced by India Office curators and with the help of a PhD student. This data was sent to the Transkribus team to produce our first HTR model. We also supplied material for the construction of a dictionary to be used alongside the HTR, based on the text from the botany chapter of Science and the Changing Environment in India 1780-1920 and contemporary botanical texts.

The accuracy of an HTR model can be determined by generating an automated transcription, correcting any errors, and then comparing the two versions. The Transkribus comparison tool calculates a Character Error Rate (CER) and a Word Error Rate (WER), and also provides a handy visualisation. With our first HTR model we saw an average CER of 30% and WER of 50%, which reflected the small size of the training set and the number of different hands across the collections.
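For readers who have not met these measures before, the sketch below shows one common way of computing them: count the minimum number of insertions, deletions and substitutions needed to turn the automated transcription into the corrected one (the edit distance), then divide by the length of the corrected text, in characters for CER and in words for WER. This is a general illustration rather than a description of how the Transkribus tool works internally, which may normalise text differently; the function names and the sample strings are invented for the example.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / words in the reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Invented example: the corrected text and a machine output with two wrong letters.
corrected = "the flora of india"
automated = "the flova of indja"
print(f"CER: {cer(corrected, automated):.1%}")
print(f"WER: {wer(corrected, automated):.1%}")
```

In the invented example two wrong letters out of 18 characters spoil two of the four words, which is why WER figures tend to sit well above CER figures for the same page.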

(Transkribus recommends using collections with one or two consistent hands, but we thought we would push on regardless to get an idea of the challenges when using complex, multi-authored archives).

WER and CER are quite unforgiving measures of accuracy. The image above has 18.5% WER and 9.5% CER

For our second model we created an additional 500 pages of ground truth text, resulting in a training set of 83,358 words over 14,599 lines. We saw a marked improvement in results with this second HTR model – an average WER of 30%, and CER of 15%.

Graph showing the learning curve for our second HTR model, measured in CER

Improvements in the automatic layout detection and the ability to run the HTR over images in batch mean that we can now generate ground truth more quickly by correcting computer-produced transcriptions than we could through a fully-manual process. We have since generated and corrected an additional 200 pages of transcriptions, and have expanded the training dataset for our next HTR model.

Lessons learned and next steps

We have now produced over 800 pages of corrected transcriptions using Transkribus, and have a much better idea of the challenges that the India Office material poses for current HTR technologies. Pages with margins and inconsistent paragraph widths prove challenging for the automatic layout detection, although the line identification has improved significantly, and tends to require only minor corrections (if any). Faint text, numerals, and tabulated text appeared to pose problems for our HTR models, as did particularly elaborate or lengthy ascenders and descenders.

More positively, we have signed a Memorandum of Understanding with the READ project, and are now able to take part in the exciting conversations around the transcription and searching of digitised manuscript materials, which we can hopefully start to feed into developments at the Library. The presentations from the recent Transkribus Conference are a good place to start if you want to learn more.

The transcriptions will be made available to researchers via data.bl.uk, and we are also planning to use them to test the ingest and delivery of transcriptions for manuscript material via the Universal Viewer.

By Alex Hailey, Curator, Modern Archives and Manuscripts

If you liked this post, you might also be interested in The good, the bad, and the cross-hatched on the Untold Lives blog.

30 December 2017

The Flitch of Bacon: An Unexpected Journey Through the Collections of the British Library


Digital Curator Dr. Mia Ridge writes: we're excited to feature this guest post from an In the Spotlight participant. Edward Mills is a PhD student at the University of Exeter working on Anglo-Norman didactic literature. He also runs his own (somewhat sporadic) blog, ‘Anglo-Normantics’, and can be found Tweeting, rather more frequently, at @edward_mills.

Many readers of [Edward's] blog will doubtless be familiar with the work being done by the Digital Scholarship team, of which one particularly remarkable example is the ‘In the Spotlight’ project. The idea behind the project, for anyone who may have missed it, is absolutely fascinating: to create crowd-sourced transcriptions of part of the Library’s enormous collection of playbills. The part of the project that I’ve been most involved with so far is concerned with titles, and it’s a two-part process: first, the title is identified out of the (numerous) lines of text on the page, and once this has been verified by multiple volunteers, it is then fed back into the database as an item for transcription.

In the Spotlight interface

Often, though, the titles alone are more than sufficient to pique my interest. One such intriguing morsel came to light during a recent transcribing stint, when I found myself faced with a title that raised even more questions than Love, Law, & Physic:

Playbill for a performance of The Flitch of Bacon

In my day-job, I’m actually a medievalist, which meant that any play entitled The Flitch of Bacon was bound to pique my interest. The ‘flitch’ refers to an ancient – and certainly medieval –  custom in Dunmow, Essex, wherein couples who could prove that they had never once regretted their marriage in a year and a day would be awarded a ‘flitch’ (side) of bacon in recognition of their fidelity. I first came across the custom of these ‘flitch trials’ while watching an episode of the excellent Citation Needed podcast, and was intrigued to learn from there that references to the trials existed as far back as Chaucer (more on which later). The trials have an unbroken tradition stretching back centuries, and videos from 1925, 1952 and 2012 go some way towards demonstrating their continuing popularity. What the British Library project revealed, however, was that the flitch also served as the driver for artistic creation in its own right. A little bit of digging revealed that the libretto to the 1776 Flitch of Bacon farce has been digitised as part of the British Library’s own collections, and the lyrics are every bit as spectacular as one might expect them to be.

Rev. Henry Bate, The Flitch of Bacon: A Comic Opera in Two Acts (London: T. Evans, 1779), p. 24.

So far, so … unique. But, of course, the medievalist that dwells deep within me couldn’t resist digging into the history of the tradition, and once again the British Library’s collections came up trumps. The official website for the Dunmow Flitch Trials (because of course such a thing exists) proudly asserts that ‘a reference … can even be found within Chaucer’s 14th-century Canterbury Tales’, which of course can easily be checked with a quick skim through the Library’s wonderful catalogue of digitised manuscripts. The Wife of Bath’s Prologue opens with the titular wife describing her attitude towards her first three husbands, whom she ‘hadde […] hoolly in myn honde’. She keeps them so busy that they soon come to regret their marriage to her, forfeiting their right to ‘the bacoun …that som men fecche in Essex an Donmowe’ in the process:

‘The bacoun was nought fet for hem I trowe / That som men fecche in Essex an Donmowe’. From the Wife of Bath’s Tale (British Library, MS Harley 7334, fol. 89r).

Chaucer’s reference to the flitch custom is frequently taken, along with William Langland’s allusion in Piers Plowman to couples who ‘do hem to Donemowe […] To folwe for the fliche’, to be the earliest reference to the tradition that can be found in English literature. Once again, though, the British Library’s collections can help us to put this particular statement to the test; as you’ve probably guessed by now, they show that there is indeed an earlier reference to the custom waiting to be found.


Our source for this precocious French-language reference is MS Harley 4657. Like many surviving medieval manuscripts, this codex is often described as a ‘miscellany’: that is, a collection of shorter works brought together into a single volume. In the case of Harley 4657, the book appears to have been designed as a coherent whole, with the texts copied together at around the same time and sharing quires with each other; this is perhaps explained by the fact that the texts contained within it are all devotional and didactic in nature. (Miscellanies that were, by contrast, put together at a later date are known as recueils factices – another useful term, along with the ‘flitch of bacon’, to slip into conversation with friends and family members.) The bulk of the book is taken up by the Manuel des pechez, a guide to confession that was later translated into English by Robert Manning as Handling Synne. It’s in this text that the flitch custom makes an appearance, as part of a description of how many couples do not deserve any recompense for loyalty on account of their mutual mistrust (fol. 21):

image from https://s3.amazonaws.com/feather-client-files-aviary-prod-us-east-1/2017-12-21/7f90c385-77be-4cdb-9160-94c0aa7ce873.png

21 July 2017

Through the British Library Looking Glass - A Continuation of Nadya Miryanova's Work Experience


Posted by Nadya Miryanova, BL Labs School Work Placement Student, currently studying at Lady Eleanor Holles, working with Mahendra Mahey, Manager of BL Labs.

Day 6

Despite the fact that a week of my work experience here has already elapsed, I still can’t quite believe that I am lucky enough to find myself in this magnificent institution, let alone have access to ‘staff-only’ areas and actually be able to work here. One thing I particularly love is that I can enter the library in the early morning, before official opening hours, and see it evolve from a certain peaceful stillness to its usual excited buzz of activity as the day progresses and watch as the library is brought to life once more by the people that visit it.

A photograph of me by the book tower in the British Library

Previously, in a very serious and sophisticated catch-up session (including, of course, only work-related matters), Mahendra had discovered that I was a huge fan of the Harry Potter series. Although this subject may seem quite unexpected and completely out of context in this blog, it is actually very relevant, since on the next day Mahendra informed me that I would be able to meet the Harry Potter curator. This was something that caught me completely by surprise, but it also shamelessly sparked a child-like excitement within me, as I have loved the franchise ever since I was seven. A meeting was set for Monday morning, and I waited, with some impatience, to meet Julian Harrison, the curator of medieval manuscripts and also the man who was involved in the organisation of the Harry Potter exhibition.

People looking at an exhibition in the British Library

During the meeting, I was able to gain an insight into the working life of a curator. Julian explained the sorts of things involved in this role, and also talked more about the exhibitions themselves, where inspiration comes from, as well as previous exhibitions and their structure. 

In addition to this, I was able to find out lots of details about the Harry Potter exhibition (it’s fascinating and definitely worth a visit, trust me!). Furthermore, we had an in-depth discussion about the Harry Potter series itself, and we talked about some of the key themes as well as key characters in the books. You’ll soon be able to find out more about the exhibition too; be sure to book your tickets early and visit the British Library to be part of what will truly be a magical experience!

A preview of the "Harry Potter: A History of Magic" exhibition, opening on 27th October 2017

In the afternoon, I went to a classical music concert at the British Museum. As I stepped into the light interior of the museum, I felt a hundred memories instantly come to mind, dating back to various visits with my family and numerous school projects over the years. The British Library and British Museum singers presented a concert performance of ‘Trial by Jury’, an opera in one act, with music by Arthur Sullivan and libretto by W. S. Gilbert. ‘Trial by Jury’ is set at a Court of Justice in 1876. The defendant, Edwin, has recently promised to marry a beautiful woman, Angelina, but has since changed his mind, for which reason Angelina is now suing him for Breach of Promise. After a multitude of entertaining events, involving the Jury, the public, the Usher, and many comic disagreements over the issue, a decision is finally reached. The Judge decides the only real logical solution to the problem is to marry Angelina himself, resulting in happiness for all parties. The choir then performed Te Deum, op 103, by Dvorak, a true choral masterpiece, and the performance itself was very moving.

Although the choir was relatively small in number, their bright and beautiful voices resonated across the room, creating a light-hearted and friendly atmosphere, upheld by the choir’s energy and enthusiasm. I always love seeing how music can unite people to interpret a piece together, and each member was fully involved in this collaborative effort to create stunning music, making the performance an unquestionable success.  

The British Museum and British Library Singers

When I returned to the office, I checked my e-mails and saw that Laurence Roger, Project Support Officer in the Collections Division, had very kindly offered to help me examine a book about Catullus’ poetry. The book that I eventually saw dated back to the 18th century, and I spent the last section of my day looking at it with Laurence, who is very nice, and I felt extremely lucky to be able to have access to it.

One of the books that Laurence herself had lent me to look at.

Day 7

My seventh day of work experience arrived, and almost as soon as I got into the office, I set up my desk and eagerly launched straight into my working day. My morning consisted of independent work, where I further developed my research project and carried on with the interview storyboard for Hannah-Rose Murray, a finalist of the BL Labs competition in 2016. Her project was centred on black American activists in the 19th century, particularly their speeches and lectures from the 1830s to the 1890s. This was a period of history that I previously knew little about, and so I enjoyed learning about the influence that black Americans had on British society and seeing the way Hannah went about creating her project, bringing history to life. Read more about her project here.

Map displaying the locations of Frederick Douglass’ lectures in the United Kingdom and Ireland, a small section of Hannah-Rose Murray's project

At 12:30, I attended a Welcome Day at the British Library, and this presented me with an excellent opportunity to not only find out more about the different departments of the library, but also to tell some new members of staff about some of the work the Digital Scholarship Department does (I was also provided with a free lunch, always a bonus!). I talked to a variety of departments, ranging from Human Resources to Publishing and Retail, and everyone was extremely friendly, helpful and accommodating.

In the afternoon, I worked independently once again, more specifically on a YouTube transcription of an interview with Melodee Beals, a 2016 research award winner, who created an amazing project entitled ‘Scissors and Paste’. This project utilises the 1800-1900 British Library Newspapers collection to explore the possibilities of mining large-scale newspaper databases for reprinted and repurposed news content.

Melodee Beals presenting her project, 'Scissors and Paste'

After finishing my working day, I decided to wander around and explore the British Library. The amazing thing about this place is that it really does resemble a maze: I constantly find myself discovering new places and rooms, with each day presenting something new and different to the previous one.

Day 8

As I entered the lift, I looked at the hard copy of my schedule, and I noticed that a meeting with a fashion company and members of the British Fashion Council was fixed for that very morning. Feeling suddenly a little more self-conscious than usual about my appearance, I glanced cautiously in the mirror that was in the lift and my reflection stared back, wondering if anything could be done to cover the consequences that a malfunctioning alarm clock and getting ready in five minutes that morning could bring. After a few fruitless attempts to tame my hair, I finally accepted defeat and entered the meeting room.

The meeting at 9 o’clock was with a luxury womenswear brand. During the meeting, Mahendra introduced BL Labs, showing a presentation that informed the company about Digital Scholarship and detailed previous projects that the department had worked on, including ‘Burning Man’. A project with the fashion company was then initiated, which would involve the Library's collections, and some possible ideas for the project were also brainstormed. The fashion company talked more about their collections and how ideas for projects generally come about. It is inspiring to think how each individual collection, whether an assortment of garments or a literary exhibition of novels, tells its own unique story, and I found out that in many ways the research for the project is itself a sensational journey.

After this meeting, I returned to my desk and had a quick catch-up with Mahendra, where we evaluated the YouTube transcription work, and the general progress made over the first half of this week. To finish off, I was whisked off to another meeting, this time with Wayne Boucher, a photographer who has a keen interest in beautiful stained-glass windows, and will be keeping in contact with the British Library to promote this stunning artwork.

A Tiffany stained-glass window

Day 9

In the morning, I hurriedly entered the British Library through the staff entrance, as usual, but instead of walking over to the doors of the lift, I took a sharp right turn, and walked over to the Post Room. Mahendra had previously organised for me to visit the Post Room with Peter Clarke, Service Delivery Manager, Messenger/Post Service, and today I would be having a tour of certain sections of the building that are out of bounds to not only the general public, but also to many members of staff. I was able to see the process of delivery take place, and even help with this crucial procedure, without which many of the library books that researchers and readers need would not be available. I was shown the delivery room by Keiran Duncan-Johnson, Late Team Leader LMS, Messenger/Post Service, Finance Division, and this was a huge, open space, which once more reminded me of the sheer scale of the place.

Keiran also kindly showed me round other areas of the library I was previously unfamiliar with, such as the modern languages sector and the Alan Turing Institute, both of which are incredible departments that work tirelessly to make great leaps in their corresponding fields of study to change the world for the better.

The Alan Turing Institute

The afternoon commenced with a meeting with the music curator, Chris Scobie. For the second time that day, I was lucky enough to visit a new area of the library that is of limited access, and Chris showed me the music reading room, and most notably, the basement. The basement is where all the music scores and manuscripts lie, and needless to say, I was incredibly excited. As we browsed through the shelves of the collections, I saw multiple familiar names of composers, such as Bach, Beethoven and Brahms, and I even got to read and touch some of Elgar’s letters to Vaughan Williams and look at his original manuscript for his Enigma Variations!  

A digitised version of the original Elgar manuscript for the theme of the Enigma Variations

Day 10

As I walked down the second floor corridor, I soon came to face the wooden door of the office for what seemed to be the last time. I sighed and a miserable thought came into my head, as I began to contemplate what on earth I was going to do with myself on Monday, when I was no longer going to work here. However, I soon brushed it off, and decided to make the most of my final day at the British Library.

The door to the office of the Digital Scholarship Department

My final day consisted of making concluding touches to my numerous projects, including refining and making last minute edits to some of the transcriptions I had done. I then met Christin Hoene from the University of Kent, who was working on a project that was based on the concept of sound within novels. I was able to show her some of the work that I did on Excel with my independent research project, which can be accessed here.

At lunchtime, rather than eating in the staff canteen as usual, I decided to eat my lunch in a free reading space in the centre of the library, whilst reading my book, ‘Mother Tongue’ by Bill Bryson. What I love most about libraries is that there are so many untold stories hiding in the shelves, and I feel like I could sit comfortably in here for hours. In fact, in the space of an hour, you could travel to as many as 10 countries, should you only have the will to open a few different books and immerse yourself in their stories. As Lloyd Alexander once said “Books can truly change our lives: the lives of those who read them, the lives of those who write them. Readers and writers alike discover things they never knew about the world and about themselves”.

Another great Lloyd Alexander quotation

Lastly, and most importantly, I would like to say a huge thank you to everyone who has made this experience a possibility for me, especially Mahendra, who has not only been very kind and patient, but has also provided me with so many wonderful opportunities and has helped me hugely with a multitude of different things. I have always loved books since a young age, and to be surrounded by so many was in itself very special, but to be able to work in the library and help the Digital Scholarship Department was just incredible. My experience here has taught me multiple valuable things, which is something I am eternally grateful for.

The same way I would never judge a book by its front cover, I will not judge a building by its name, for the British Library is infinitely more than just a residence for books. It is a museum in which there are many exhibitions, it is a research centre, and most importantly, it is an institution that stores the world’s knowledge behind its brick walls.

The British Library

Inspiration can really come from absolutely anywhere, and from something small you can make something incredibly vast. It makes you think about what you could do and what a difference it could make, if only you chose to try. Inevitably, in life, you have to take risks, but more often than not these are worth taking in an attempt to bring artistic colour and creativity to the world. In the words of Stephen King, “books are a uniquely portable magic”, something which certainly rings true within the walls of this institution, where so many items are kept and so many new ones are constantly being acquired and discovered.

So, I send a big thank you to the British Library and all who work here, for turning what was essentially a childhood dream into a reality. This will truly be a chapter of my life that I will always remember.

Nadya Miryanova

03 May 2017

How can a turtle and the BBC connect learners with literature?

Illustration of a youth on a turtle
Image from 'When Life is Young: a collection of verse for boys and girls'. This turtle is ace but we used a different kind of turtle for our project.

Digital Curator Mia Ridge explains how and why we used linked open data to help more people find British Library content.

Despite the picture, it's not a real turtle (sorry to disappoint you). We've used a file format called 'Turtle' (.ttl) to help make articles and collections in the Library's Discovering Literature: Romantics and Victorians easier for teachers to find.

We did this to make content available to the BBC's Research and Education Space (RES) Project. RES helps make public archives easier to find and use in education and teaching. It collects and organises the digital collections of libraries, museums, broadcasters and galleries so that developers can create educational products to connect learners to information and collections.

We were keen to join the RES project and help learners discover our collections and knowledge, but first we had to find the right content and figure out some technical issues. This post gives an overview of how we did it.

Finding the right content

Our collections are vast. Knowing where to start can be daunting. Which section of our website would be most immediately useful for the RES project's goals and audiences?

After looking over our online material, the Discovering Literature: Romantics and Victorians site seemed like a perfect match. Discovering Literature is a free educational resource that puts manuscript and printed collection items in historical, cultural and political context. The Romantics and Victorians site includes thousands of collection items, hundreds of articles, films, teachers’ notes and more to help make collection items more accessible, so it was a great place to start.

Using linked open data to make information easier to find

Created with support from Jisc and Learning on Screen, the RES platform collects data published as linked open data, which at its simplest means data that is structured and linked to vocabularies that help define the meaning of terms used.

For example, we might include a bit of technical information to unambiguously identify Elizabeth Barrett Browning as the author of the published volumes of poetry or as the writer of a letter. Applying a shared identifier helps connect our resources to information about Barrett Browning in other collections. A teacher preparing a lesson plan can be sure that the RES resources they include are accurate and authoritative articles that'll help their students understand Barrett Browning and other writers.

How did we do it?

There were three main stages in creating linked open data for the RES project, involving staff across the Library, at an external agency and at the BBC. Short, weekly conference calls kept things moving by making us accountable for progress between calls.

First, we had to work out which vocabularies to apply to describe people, the works they created, the collection items used to illustrate articles, the articles themselves, etc. Some terms, like the names of published authors, already exist in other vocabularies so we could just link to them. Others, like the 'genre' or 'literary period' used to describe a work, were particular to the Library. We posted work in progress online so that other people could review and comment on our work.

Once the mappings were agreed, the code used in the content management system was updated so that special pages containing the data could be published as 'Turtle'-formatted files. Licence information was included to meet the RES Project requirements.
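To give a flavour of what a 'Turtle'-formatted file contains, here is a minimal sketch in Python. It is not the code used in our content management system: the function names, the example.org URIs, the licence placeholder and the choice of Dublin Core and FOAF terms are all assumptions made for illustration.

```python
# Illustrative only: every URI passed in below is a placeholder, not a real
# identifier used by the British Library, the BBC or any authority file.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def article_to_turtle(article_uri, title, person_uri, person_name, licence_uri):
    """Describe one article, the person it is about, and its licence as Turtle."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"<{article_uri}>",
        f'    dcterms:title "{escape_literal(title)}"@en ;',
        f"    dcterms:subject <{person_uri}> ;",
        f"    dcterms:license <{licence_uri}> .",
        "",
        f"<{person_uri}>",
        "    a foaf:Person ;",
        f'    foaf:name "{escape_literal(person_name)}" .',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        article_to_turtle(
            article_uri="http://example.org/articles/placeholder-article",
            title="Example article about Elizabeth Barrett Browning",
            person_uri="http://example.org/people/placeholder-person",
            person_name="Elizabeth Barrett Browning",
            licence_uri="http://example.org/licences/placeholder",
        )
    )
```

In a real file the person URI would be the shared identifier taken from an existing vocabulary, which is what lets other collections recognise that they are describing the same writer.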

Finally, the work was tested on a staging server, then checked again by the RES team once the changes had gone live on our website. If you're curious about the underlying linked data technologies, the BBC's guide to the Research & Education Space for contributors and developers has all the details.

Looking to the future

We learned a lot of practical and technical lessons that we hope to apply to future projects. For a start, there are more Discovering Literature sites, and others using a similar web architecture. If you're interested in other perspectives, the RES Project have collected different experiences on their platform, process and progress on their blog. I'm looking forward to seeing how the linked open data we created is used to connect learners to our collections and knowledge.

16 December 2016

Re-imagining a catalogue of illuminated manuscripts - from search to browse

In this guest post, Thomas Evans discusses his work with Digital Curator Dr Mia Ridge to re-imagine the interface to the British Library's popular online Catalogue of Illuminated Manuscripts.

The original Catalogue was built using an Access 2003 database, and allows users to create detailed searches from amongst 20 fields (such as date, title, origin, and decoration) or follow 'virtual exhibitions' to view manuscripts. Search-based interfaces can be ideal for specialists who already know what they're looking for, but the need to think of a search term likely to yield interesting results can be an issue for people unfamiliar with a catalogue. 'Generous interfaces' are designed as rich, browsable experiences that highlight the scope and composition of a particular collection by loading the page with images linked to specific items or further categories. Mia asked Thomas to apply faceted browsing and 'generous' styles to help first-time visitors discover digitised illuminated manuscripts. In this post Thomas explains the steps he took to turn the catalogue data supplied into a more 'generous' browsing interface. An archived version of his interface is available on the Internet Archive.

With over 4,300 manuscripts, written in a variety of languages and created in countries across Europe over a period of about a thousand years, the British Library's collection of illuminated manuscripts contains a diverse treasure trove of information and imagery for both the keen enthusiast and the total novice.

As the final project for my Masters in Computer Science at UCL, I worked with the British Library to design and start to implement alternative ways of exploring the collection. This project had some constraints in time, knowledge and resources. The final deadline for submission was only four months after receiving the project outline and the success of the project rested on the knowledge, experience and research of a fresh-faced rookie (me) using whatever tools I had the wherewithal to cobble together (open source software running on a virtual machine server hosted by UCL).

Rather than showing visitors an empty search box when they first arrive, a generous interface will show them everything available. However, taken literally, displaying 'everything' means details for over 4,300 manuscripts and around 40,000 images would have to be displayed on one page. While this approach would offer visitors a way to explore the entire catalogue, it could be quite unwieldy.

One way to reduce the number of manuscripts loaded onto the screen is to allow visitors to filter out some items, for example limiting the 'date' field to between 519 and 927 or the 'region' field to England. This is 'faceted' browsing, and it makes exploration more manageable. Presenting the list of available values for region or language, etc., also gives you a sense of the collection's diversity. It also means that 'quirky' members of the collection are less likely to be overlooked.

Screenshot of filters in Thomas CIM interface II
An example of 'date' facets providing an instant overview of the temporal range of the Catalogue

For example, if you were to examine 30 random manuscripts from the British Library's collection, you might find 20 written in Latin, three each in French and English, and perhaps one each in Greek, Hebrew or Italian. You would almost certainly miss that the Catalogue contains a manuscript written in Cornish, another in Portuguese and another in Icelandic. These languages might be of interest precisely because they are hard to come by in the British Library's catalogue. Listing all the available languages (as well as their frequencies) exposes the exceptional parts of the collection where an unfaceted generous interface would hide them in plain sight.
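The idea can be sketched in a few lines of Python. This is an illustration rather than the code behind my interface: the sample records, shelfmarks and field names are invented, and the real Catalogue has many more fields.

```python
from collections import Counter

# Invented sample records; the field names are assumptions, not the Catalogue's schema.
MANUSCRIPTS = [
    {"shelfmark": "MS A", "language": "Latin", "region": "England", "year": 850},
    {"shelfmark": "MS B", "language": "Latin", "region": "France", "year": 1200},
    {"shelfmark": "MS C", "language": "French", "region": "France", "year": 1350},
    {"shelfmark": "MS D", "language": "Latin", "region": "England", "year": 1400},
    {"shelfmark": "MS E", "language": "Cornish", "region": "England", "year": 1450},
]


def apply_filters(records, region=None, year_from=None, year_to=None):
    """Keep only the records that match every facet the visitor has selected."""
    selected = []
    for record in records:
        if region is not None and record["region"] != region:
            continue
        if year_from is not None and record["year"] < year_from:
            continue
        if year_to is not None and record["year"] > year_to:
            continue
        selected.append(record)
    return selected


def facet_counts(records, field):
    """List each value of a field with its frequency, most common first."""
    return Counter(record[field] for record in records).most_common()


if __name__ == "__main__":
    # Every language appears in the list, including those with a single manuscript.
    print(facet_counts(MANUSCRIPTS, "language"))
    # Facets are recomputed for whatever subset the visitor has filtered down to.
    english = apply_filters(MANUSCRIPTS, region="England", year_from=1300)
    print(facet_counts(english, "language"))
```

Because the counts are computed over all values rather than a sample, a language with only one manuscript still gets its own entry in the list.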

Once I understood the project's goals and completed some high-level planning and design sketches, it was time to get to grips with implementation. Being fairly inexperienced, I found some tasks took much longer than expected. A few examples which stick in the mind are properly configuring the web server, debugging errant server-side scripts (which have a habit of failing either silently or with an unhelpful error message) and transforming the Library's database into a form which I could use.

Being the work of many hands over the years, the database inevitably contained some tiny differences in the way entries were recorded, which Mia informs me is not uncommon for a long-standing database in a collecting institution. These small inconsistencies - for example, the use of an en-dash in some cases and a hyphen in others - look fine to us, but confuse a computer. I worked around these where I could, 'cleaning' the records only when I was certain of my correction.
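As a rough illustration of this kind of cautious cleaning, the Python sketch below treats look-alike dashes as the same character and only returns a date range when the value is unambiguous. The function names and the assumption that ranges look like '1400-1450' are mine, chosen for the example; the real records were more varied.

```python
import re

# Unicode dash variants that look alike on screen but are different characters.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"
_DASH_TABLE = str.maketrans({ch: "-" for ch in DASHES})


def normalise_range(value):
    """Replace look-alike dashes with a plain hyphen and trim spaces around it."""
    cleaned = value.translate(_DASH_TABLE).strip()
    return re.sub(r"\s*-\s*", "-", cleaned)


def parse_year_range(value):
    """Return (start, end) as integers, or None when the value is not clearly a range."""
    match = re.fullmatch(r"(\d{3,4})-(\d{3,4})", normalise_range(value))
    if match is None:
        return None  # ambiguous record: leave it alone rather than guess
    start, end = int(match.group(1)), int(match.group(2))
    return (start, end) if start <= end else None


if __name__ == "__main__":
    for raw in ["1400\u20131450", "1400 - 1450", "c. 1400"]:
        print(repr(raw), "->", parse_year_range(raw))
```

Returning None for anything doubtful keeps the original record intact, so a person can review it later.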

Being new to web design, I built the interface iteratively, component by component, consulting periodically with Mia for feedback. Thankfully, frameworks exist for responsive web design and page templating. Nevertheless, there was a small learning curve and some thought was required to properly separate application logic from presentation logic.

There were some ambitions for the project which were ultimately not pursued due to time (and knowledge!) constraints, but this iterative process made other improvements possible over the course of my project. To make exploration of the catalogue easier, the page listing a manuscript's details also contained links to related manuscripts. For instance, Ioannes Rhosos is attributed as the scribe of Harley 5699, so, on that manuscript's page, users could click on his name to see a list of all manuscripts by him. They could then apply further filters if desired. This made links between manuscripts much clearer than in the old interface, but it is limited to direct links which were explicitly recorded in the database.

An example of a relevant feature not explicitly recorded in the database is genre - only by reading a manuscript's description can you determine whether its subject matter is religious, historical, medical etc. Two possible techniques for revealing such features were considered: applying natural language processing to manuscript descriptions in order to classify them, or analysing data about which manuscripts were viewed by which users to build a recommendation system. Both of these turned out to require more in-depth knowledge than I was able to acquire within the time limit of the project.

I enjoyed working out how to transform all the possible inputs to the webpage into queries which could be run against the database, dealing with missing/invalid inputs by providing appropriate defaults etc. There was a quiet satisfaction to be had when tests of the interface went well - seeing something work and thinking 'I made that!'. It was also a pleasure to work with data about such an engaging topic.
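For readers curious about what this involves, here is a minimal sketch in Python using SQLite. It is an assumption-laden illustration, not my project code: the table and column names, the parameter names and the default page size are all hypothetical.

```python
import sqlite3

DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100
# Column names cannot be bound as query parameters, so sorting uses a whitelist.
SORTABLE = {"shelfmark", "year"}


def build_query(params):
    """Turn raw request parameters (a dict of strings) into SQL plus bound values."""
    clauses, values = [], []
    region = (params.get("region") or "").strip()
    if region:
        clauses.append("region = ?")
        values.append(region)
    for key, op in (("year_from", ">="), ("year_to", "<=")):
        try:
            year = int(params.get(key, ""))
        except (TypeError, ValueError):
            continue  # missing or invalid year: skip this filter
        clauses.append(f"year {op} ?")
        values.append(year)
    sort = params.get("sort")
    if sort not in SORTABLE:
        sort = "shelfmark"  # default ordering for missing or unknown input
    try:
        limit = int(params.get("limit", DEFAULT_PAGE_SIZE))
    except (TypeError, ValueError):
        limit = DEFAULT_PAGE_SIZE
    limit = max(1, min(limit, MAX_PAGE_SIZE))
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT shelfmark, region, year FROM manuscripts"
        f"{where} ORDER BY {sort} LIMIT ?"
    )
    return sql, values + [limit]


def run_query(conn, params):
    """Run the query built from the request parameters and return all rows."""
    sql, values = build_query(params)
    return conn.execute(sql, values).fetchall()
```

User-supplied values are only ever passed as bound parameters, so an unexpected input falls back to a default instead of breaking the page or altering the query.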

Hopefully, this project will have proved that exploration of the British Library's Catalogue of Illuminated Manuscripts has the potential to become a richer experience. Relationships between manuscripts which are currently not widely known could be revealed to more visitors and, if the machine learning techniques were to be implemented, perhaps new relationships would be revealed and related manuscripts could be recommended. My project showed the potential for applying new computational methods to better reveal the character of collections and connections between their elements. Although the interface I delivered has some way to go before it can achieve this goal, I earnestly hope that it is a first step in that direction.

Thomas' Catalogue interface

28 January 2016

Book Now! Nottingham @BL_Labs Roadshow event - Wed 3 Feb (12.30pm-4pm)

Do you live in or near Nottingham, and are you available on Wednesday 3 Feb between 1230 and 1600? Come along to the FREE UK @BL_Labs Roadshow event at GameCity and The National Video Game Arcade, Nottingham (we have some places left and booking is essential for anyone interested) and:

 

BL Labs Roadshow at GameCity and The National Video Game Arcade, Nottingham, hosted by the Digital Humanities and Arts (DHA) Praxis project based at the University of Nottingham, Wed 3 Feb (1230 - 1600)
  • Discover the digital collections the British Library has, understand some of the challenges of using them and even take some away with you.
  • Learn how researchers found and revived forgotten Victorian jokes and political meetings from our digital archives.
  • Understand how special games and computer code have been developed to help tag un-described images and make new art.
  • Find out about a tool that links digitised handwritten manuscripts to transcribed texts and one that creates statistically representative samples from the British Library’s book collections.
  • Consider how the intuitions of a DJ could be used to mix and perform the Library's digital collections.
  • Talk to Library staff about how you might use some of the Library's digital content innovatively.
  • Get advice, tips and feedback on your ideas and projects for the 2016 BL Labs Competition (deadline 11 April) and Awards (deadline 5 September).

Our hosts are the Digital Humanities and Arts (DHA) Praxis project at the University of Nottingham who are kindly providing food and refreshments and will be talking about two amazing projects they have been involved in:

ArtMaps: Putting the Tate Collection on the map

Dr Laura Carletti will be talking about the ArtMaps project which is getting the public to accurately tag the locations of the Tate's 70,000 artworks.

The 'Wander Anywhere' free mobile app developed by Dr Benjamin Bedwell.

Dr Benjamin Bedwell, Research Fellow at the University of Nottingham, will talk about the free mobile app he developed called 'Wander Anywhere'. The app offers users new ways to experience art, culture and history by guiding them to locations, where it downloads to their mobile device stories relevant to where they are, intersecting art, local history, architecture and anecdotes.

For more information, a detailed programme and to book your place, visit the Labs and Digital Humanities and Arts Praxis Workshop event page.

Posted by Mahendra Mahey, Manager of BL Labs.

The BL Labs project is funded by the Andrew W. Mellon Foundation.