THE BRITISH LIBRARY

Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

02 November 2018

Digital Conversation: History and Games

It is very nearly International Games Week; this is an initiative run by volunteers from around the world to reconnect communities through their libraries around the educational, recreational, and social value of all types of games. Here at the British Library we are excited to be hosting the narrative games convention AdventureX on Saturday 10th and Sunday 11th November, and to get the party started on Thursday 8th November we are delighted to run, in partnership with The National Archives and Wellcome, a Digital Conversation event on the topic of History and Games.

Our star Digital Conversation panel features:
  • Toni Brasting, Creative Partnerships Manager at Wellcome Trust, who collaborates with games studios, designers and scientific researchers to create games that inspire conversations about health.
  • Andrew Burn, Professor of Media Education at the UCL Institute of Education, who will launch MissionMaker Beowulf, a digital platform which empowers students to make 3-D adventure games.

A video showing the process of making a game in MissionMaker Beowulf, followed by a video capture of the game
  • James Delaney, founder and Managing Director of BlockWorks, who built Minecraft maps for Great Fire 1666 at the Museum of London to mark the 350th anniversary of London's Great Fire. This summer BlockWorks also teamed up with English Heritage on a castle-building project.

Kenilworth Castle in Minecraft

Trailer of Winter Hall by Lost Forest Games

  • Nick Webber, Associate Professor at Birmingham City University, whose research explores the impact of virtual worlds and online games on the practice of history.
  • Stella Wisdom, Digital Curator for Contemporary British Collections at the British Library, who has collaborated on multiple games initiatives.

The Digital Conversation event takes place in The Knowledge Centre at the British Library on Thursday 8th November, 18.30-20.30; for more details, including booking, visit: https://www.bl.uk/events/digital-conversation-history-and-games. Hope to see you there.

This post is by Digital Curator Stella Wisdom, on Twitter as @miss_wisdom

29 October 2018

Using Transkribus for automated text recognition of historical Bengali Books

In this post Tom Derrick, Digital Curator, Two Centuries of Indian Print, explains the Library's recent use of Transkribus for automated text recognition of Bengali printed books.

Are you working with digitised printed collections that you want to 'unlock' for keyword search and text mining? Maybe you have already heard about Transkribus but thought it could only be used for automated recognition of handwritten texts. If so you might be surprised to hear it also does a pretty good job with printed texts too. You might be even more surprised to hear it does an impressive job with printed texts in Indian scripts! At least that is what we have found from recent testing with a batch of 19th century printed books written in Bengali script that have been digitised through the British Library’s Two Centuries of Indian Print project.

Transkribus is a READ project and is available as a free tool for users who want to automate recognition of historical documents. The British Library has already had some success using Transkribus on manuscripts from our India Office collection, and it was that which inspired me to see how it would perform on the Bengali texts, which provide an altogether different type of challenge.

For a start, most text recognition solutions either do not support Indian scripts, or do not come close to the level of recognition they achieve with documents written in English or other Latin scripts. In part this is down to supply and demand: mainstream providers of tools have prioritised Western customers. There is also a relative lack of digitised Indian texts that can be used to train text recognition engines.

These text recognition engines have also been trained mostly on modern dictionaries, whereas a collection of historical texts like the Bengali books will often contain words which are no longer in use. Their aged physicality also brings with it the delights of faded print, blotchy paper and other paper-based gremlins that keep conservationists in work yet disrupt automated text recognition. Throw in an extensive alphabet that contains more diverse and complicated character forms than English and you can start to piece together how difficult it can be to train recognition engines to achieve comparable results with Bengali texts.

So it was more with hope than expectation that I approached Transkribus. We began by selecting 50 pages from the Bengali books that represent, as far as possible, the variety of typographical and layout styles within the wider collection of c. 500,000 pages. Not an easy task! We uploaded these to Transkribus, manually segmenting paragraphs into text regions and automating line recognition. We then manually transcribed the texts to create a ground truth which, together with the scanned page images, was used to train the recurrent neural network within Transkribus to create a model based on the 5,700 transcribed words.

View of a segmented page from one of the British Library's Bengali books along with its transcription, within the Transkribus viewer.

The model was tested on a few pages from the wider collection and the results are shown in the graph below. The model achieved an average character error rate (CER) of 21.9%, which is comparable to the best results we have seen from other text recognition services. Word accuracy of 61% was based on the number of words that were misspelled in the automated transcription compared to the ground truth. Eventually we would like to use automated transcriptions to support keyword searching of the Bengali books online, and the higher the word accuracy, the greater the chance of users pulling back all relevant hits from their keyword search. We noticed the results often missed the upper zone of certain Bengali characters, i.e. the part of the character or glyph which resides above the matra line that connects characters in Bengali words. Further training focused on recognition of these characters may improve the results.
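
For readers who would like to see how figures like these are commonly calculated, here is a minimal Python sketch. It is not the Transkribus implementation, which this post does not describe. It assumes the widely used definitions: CER is the character-level edit (Levenshtein) distance between the automated transcription and the ground truth, divided by the length of the ground truth, and word accuracy is one minus the equivalent word-level error rate. The function names are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth, transcription):
    """Character edits needed, as a fraction of the ground truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, transcription) / len(ground_truth)


def word_accuracy(ground_truth, transcription):
    """One minus the word-level error rate (an assumed definition)."""
    ref_words = ground_truth.split()
    hyp_words = transcription.split()
    if not ref_words:
        raise ValueError("ground truth must contain at least one word")
    errors = edit_distance(ref_words, hyp_words)
    return max(0.0, 1 - errors / len(ref_words))


if __name__ == "__main__":
    truth = "the quick brown fox"
    output = "the quack brown fox"
    print(character_error_rate(truth, output))
    print(word_accuracy(truth, output))
```

Note that Python compares strings one Unicode code point at a time, so a Bengali vowel sign or conjunct counts as more than one "character" here. A different tool may count glyphs differently and so report a somewhat different CER for the same page.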

Graph showing the learning curve of the Bengali model using the Transkribus HTR tool.

Our training set of 50 pages is very small compared to other projects using Transkribus and so we think the accuracy could be vastly improved by creating more transcriptions and re-training the model. However, we're happy with these initial results and would encourage others in a similar position to give Transkribus a try.

03 October 2018

The submission deadline for BL Labs Awards 2018 is next week!

The British Library has a vast, and continuously expanding, collection of material in digital form. You can dig into our datasets with text and data mining tools, conduct image analysis while listening to wildlife recordings, browse thousands of digitised manuscripts, and get lost in the million public domain images from BL publications available on Flickr.

To celebrate the variety of ways in which people have engaged with these amazing resources, the British Library Labs team run an Awards competition every autumn. Awards are given for completed projects in four categories:

  • Research - A project or activity which shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour which inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

The competition is open to applicants from anywhere in the world, provided they have based their work on the British Library’s data or digital collections. There is also a Staff Award for a project by a current member of staff (or a team) at the British Library. In each category, winners receive £500 and runners-up £100, as well as fame, glory and prestige, of course. The awards will be presented at the annual BL Labs Symposium on Monday 12th November 2018.

The deadline for submitting your project for one of this year’s external awards is midnight (BST) on Thursday 11th October – just over a week from now! You can read the small print (Terms & Conditions etc) here, and submit your entry using this online form.

Lucky British Library staff get an extra 12 hours to submit a project for an award – deadline midday (BST) on Friday 12th October.

BL Labs Awards 2017 Winners. Top-Left, Research – A large-scale comparison of world music corpora with computational tools; Top-Right, Commercial – Movable Type: The Card Game; Bottom-Left, Artistic – Imaginary Cities; Bottom-Right, Teaching/Learning – Vittoria’s World of Stories.

We encourage applications from all fields: digital humanities researchers, artists, musicians, entrepreneurs, game designers, writers, poets, statisticians, library scientists … the list really is endless and every year we are surprised and delighted by the new ways in which the digital collections have been used. If you would like to read about some of the previous projects, click on the links below which take you to blogs about last year’s star entrants. You can also browse previous submissions in any of the categories using this guide to the digital projects archive.

Read about some of the fantastic projects that won awards in 2017:

So hurry - get your applications in, and join the party on the 12th November!!

For any further information about BL Labs or our Awards, please contact us at labs@bl.uk.