THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

19 November 2018

The British Library / Qatar Foundation Partnership Imaging Hack Day

The BL/QFP is digitising archive material related to Persian Gulf history as well as Arabic scientific manuscripts. In the past four years we have added in excess of 1.5 million images to the Qatar Digital Library. Our team of ~45 staff includes a group of eight dedicated imaging professionals, who between them produce 30,000 digitised images each month to exacting standards that focus on presenting the information on the page in a visually clear and consistent manner.

 

Our Imaging team are a highly skilled group with a variety of backgrounds, experiences and talents, and we wished to harness these. We therefore decided to set aside a day for the team to use their creative and technical skills to ‘hack’ the material in our collection.

By dedicating a whole day to experimenting with different ways of capturing the material we are digitising, we hoped to reveal interesting aspects of the collection that are not seen through our standardised capture process. It also gave the Imaging team a chance to show off and share their skills amongst themselves and with the wider BL/QFP team.

This was how we conceived of our first Imaging Hack Day, and the rest of this blog post outlines how we promoted and organised it.

From its conception the Imaging team were keen for the wider team to be involved, so we asked them to nominate material from the collections we are digitising that they thought could be ‘hacked’ and to state their reasons why.

To begin with, it was mostly members of the Imaging team who nominated items, so we decided to wage a PR campaign. Firstly, the Imaging team delivered a presentation on the 9th of October at one of BL/QFP’s all-staff meetings, outlining some of the techniques and ideas they had for the hack day in order to appeal to the rest of the team for nominations. Additionally, on the morning of the 9th, members of the Imaging team snuck into the office and planted some not-so-subtle propaganda:

Posters

The impact of the posters and presentation was really pronounced. Before the 9th of October we had received only a handful of nominations from people outside the Imaging team; within days that number had increased more than fivefold (see graph below). The posters also became highly sought after amongst the team.

Nominations
Graph showing how many shelfmarks were nominated each day, with cumulative totals for members of the imaging team vs non-imaging teams.

 

The day before the Hack Day, anyone who had nominated an item was invited to a prep session with the Imaging team. Here the nominated items were presented, as well as the ideas for hacks. Extra judicious use of Post-Its and Sharpies facilitated feedback, and by the end of the session the Imaging team were armed with lots of ideas and encouragement, and knew they had curatorial expertise from the rest of the BL/QFP team to call upon if necessary.

Postits

As a final surprise and a sign of appreciation, Hack Sacks filled with goodies were secreted into the imaging studio late on the eve of the Hack Day:

Hacksacks

The images and hacks produced on the Hack Day will be covered in an upcoming post by our studio manager Renata Kaminska. The non-material results, however, were manifold. Throughout the lead-up and on the actual day there was a palpable buzz amongst the Imaging team, evidence of the positive impact on their morale. It also led to a greater exchange of knowledge between the Imaging team and their colleagues throughout the BL/QFP. The day allowed different areas of the team to come together, combine their expertise and find new ways of working and innovative ways of capturing our collections. Finally, it demonstrated the fantastic experience and skills of our imaging technicians, many of which had not previously been exposed to the rest of the team. It was a real celebration of both the material that we are digitising and our talented imaging studio.

This is a guest post by Sotirios Alpanis, Head of Digital Operations for the British Library's Qatar Project, on Twitter as @SotiriosAlpanis

02 November 2018

Digital Conversation: History and Games

It is very nearly International Games Week, an initiative run by volunteers from around the world to reconnect communities through their libraries around the educational, recreational, and social value of all types of games. Here at the British Library we are excited to be hosting the narrative games convention AdventureX on Saturday 10th and Sunday 11th November. To get the party started, on Thursday 8th November we are delighted to run, in partnership with The National Archives and Wellcome, a Digital Conversation event on the topic of History and Games.


Our star Digital Conversation panel features:
  • Toni Brasting, Creative Partnerships Manager at Wellcome Trust, who collaborates with games studios, designers and scientific researchers to create games that inspire conversations about health.
  • Andrew Burn, Professor of Media Education at the UCL Institute of Education, who will launch MissionMaker Beowulf, a digital platform which empowers students to make 3-D adventure games.

A video showing the process of making a game in Missionmaker Beowulf, followed by a video capture of the game
  • James Delaney, founder and Managing Director of BlockWorks, who built Minecraft maps for Great Fire 1666 at the Museum of London, to mark the 350th anniversary of London's Great Fire. Furthermore, this summer they teamed up with English Heritage on a castle building project.

Kenilworth Castle in Minecraft

Trailer of Winter Hall by Lost Forest Games

  • Nick Webber, Associate Professor at Birmingham City University, whose research explores the impact of virtual worlds and online games on the practice of history.
  • Stella Wisdom, Digital Curator for Contemporary British Collections at the British Library, who has collaborated on multiple games initiatives.

The Digital Conversation event takes place in The Knowledge Centre at the British Library on Thursday 8th November, 18.30-20.30; for more details including booking, visit: https://www.bl.uk/events/digital-conversation-history-and-games. Hope to see you there.

This post is by Digital Curator Stella Wisdom, on twitter as @miss_wisdom

29 October 2018

Using Transkribus for automated text recognition of historical Bengali Books

In this post Tom Derrick, Digital Curator, Two Centuries of Indian Print, explains the Library's recent use of Transkribus for automated text recognition of Bengali printed books.

Are you working with digitised printed collections that you want to 'unlock' for keyword search and text mining? Maybe you have already heard about Transkribus but thought it could only be used for automated recognition of handwritten texts. If so you might be surprised to hear it also does a pretty good job with printed texts too. You might be even more surprised to hear it does an impressive job with printed texts in Indian scripts! At least that is what we have found from recent testing with a batch of 19th century printed books written in Bengali script that have been digitised through the British Library’s Two Centuries of Indian Print project.

Transkribus is a READ project and is available as a free tool for users who want to automate recognition of historical documents. The British Library has already had some success using Transkribus on manuscripts from our India Office collection, and it was that which inspired me to see how it would perform on the Bengali texts, which provide an altogether different type of challenge.

For a start, most text recognition solutions either do not support Indian scripts, or do not reach close to the same level of recognition as they do with documents written in English or other Latin scripts. In part this is down to supply and demand. Mainstream providers of tools have prioritised Western customers, yet there is also the relative lack of digitised Indian texts that can be used to train text recognition engines.

These text recognition engines have also been well trained on modern dictionaries, and a collection of historical texts like the Bengali books will often contain words which are no longer in use. Their aged physicality also brings with it the delights of faded print, blotchy paper and other paper-based gremlins that keep conservationists in work yet disrupt automated text recognition. Throw in an extensive alphabet that contains more diverse and complicated character forms than English and you can start to piece together how difficult it can be to train recognition engines to achieve comparable results with Bengali texts.

So it was more with hope than expectation that I approached Transkribus. We began by selecting 50 pages from the Bengali books that represented, as far as possible, the variety of typographical and layout styles within the wider collection of c. 500,000 pages. Not an easy task! We uploaded these to Transkribus, manually segmenting paragraphs into text regions and automating line recognition. We then manually transcribed the texts to create a ground truth which, together with the scanned page images, was used to train the recurrent neural network within Transkribus to create a model from the 5,700 transcribed words.

View of a segmented page from one of the British Library's Bengali books along with its transcription, within the Transkribus viewer.

The model was tested on a few pages from the wider collection, and the results are shown in the graph below. The model achieved an average character error rate (CER) of 21.9%, which is comparable to the best results we have seen from other text recognition services. Word accuracy of 61% was based on the number of words that were misspelled in the automated transcription compared to the ground truth. Eventually we would like to use automated transcriptions to support keyword searching of the Bengali books online, and the higher the word accuracy, the greater the chances of users pulling back all relevant hits from their keyword search. We noticed the results often missed the upper zone of certain Bengali characters, i.e. the part of the character or glyph which resides above the matra line that connects characters in Bengali words. Further training focused on recognition of these characters may improve the results.

Graph showing the learning curve of the Bengali model using the Transkribus HTR tool.
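For readers who would like to compare their own transcriptions against a ground truth, the two measures mentioned above can be approximated in a few lines of Python. This is only an illustrative sketch and not how Transkribus itself calculates its figures: it assumes CER is the edit (Levenshtein) distance between the two texts divided by the length of the ground truth, and that word accuracy is one minus the same ratio computed over whitespace-separated words. The function names are our own.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # item missing from the hypothesis
                            curr[j - 1] + 1,      # extra item in the hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth, transcription):
    """Assumed definition: character-level edits / ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, transcription) / len(ground_truth)


def word_accuracy(ground_truth, transcription):
    """Assumed definition: 1 - (word-level edits / ground-truth word count)."""
    ref_words = ground_truth.split()
    if not ref_words:
        raise ValueError("ground truth must contain at least one word")
    errors = edit_distance(ref_words, transcription.split())
    return max(0.0, 1 - errors / len(ref_words))


# Example: the automated transcription has dropped one vowel sign.
print(character_error_rate("বাংলা", "বংলা"))
```

Note that Python compares strings one Unicode code point at a time, so a Bengali consonant and its attached vowel sign count as separate characters here; a tool that counts whole glyphs would report different numbers.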

Our training set of 50 pages is very small compared to other projects using Transkribus and so we think the accuracy could be vastly improved by creating more transcriptions and re-training the model. However, we're happy with these initial results and would encourage others in a similar position to give Transkribus a try.