THE BRITISH LIBRARY

Digital scholarship blog

13 posts categorized "Middle East"

19 March 2019

BL Labs 2018 Commercial Award Runner Up: 'The Seder Oneg Shabbos Bentsher'

This guest blog was written by David Zvi Kalman on behalf of the team that received the runner up award in the 2018 BL Labs Commercial category.

The bentsher is a strange book, both invisible and highly visible. It is not among the more well known Jewish books, like the prayerbook, Hebrew Bible, or haggadah. You would be hard pressed to find a general-interest bookstore selling a copy. Still, enter the house of a traditional Jew and you’d likely find at least a few, possibly a few dozen. In Orthodox communities, the bentsher is arguably the most visible book of all.

Bentshers are handbooks containing the songs and blessings, including the Grace after Meals, that are most useful for Sabbath and holiday meals, as well as larger gatherings. They are, as a rule, quite small. These days, bentshers are commonly given out as party favors at Jewish weddings and bar/bat mitzvahs, since meals at those events require them anyway. Many bentshers today have personalized covers relating the events at which they were given.

Bentshers have never gone out of print. By this I mean that printing began with the invention of the printing press and has never stopped. They are small, but they have always been useful. Seder Oneg Shabbos, the version which I designed, was released 500 years after the first bentsher was published. It is, in a sense, a Half Millennium Anniversary Special Edition.

Bentshers, like other Jewish books, could be quite ornate; some were written and illustrated by hand. Over the years, however, bentshers have become less and less interesting, largely in order to lower the unit cost. In order to make it feasible for wedding planners to order hundreds at a time, all images were stripped from the books, the books themselves became very small, and any interest in elegant typography was quickly eliminated. My grandfather, who designed custom covers for wedding bentshers, simply called the book, “the insert.” Custom prayerbooks were no different from custom matchbooks.

This particular bentsher was created with the goal of bucking this trend; I attempted to give the book the feel of some of the Jewish books and manuscripts of the past, using the research I was able to gather as a graduate student in the field of Jewish history. Doing this required a great deal of image research; for this, the British Library’s online resources were incredibly valuable. Of the more than one hundred images in the book, a plurality are from the British Library’s collections.

https://data.bl.uk/hebrewmanuscripts/

https://www.bl.uk/hebrew-manuscripts

In addition to its visual element, this bentsher differs from others in two important ways. First, it contains ritual language that is inclusive of those in the LGBTQ community, and especially of those conducting same-sex weddings. Second, the book contains songs not just in Hebrew but in Yiddish as well; this was a homage to two Yiddishists who aided in creating the bentsher’s content. The bentsher was first used at their wedding.

More here: https://shabb.es/sederonegshabbos/

Watch David accepting the runner up award and talking about the Seder Oneg Shabbos Bentsher on our YouTube channel (clip runs from 5.33 to 7.26): 

David Zvi Kalman was responsible for the book’s design, including the choice of images. He is a doctoral candidate at the University of Pennsylvania, where he focuses on the relationship between Jewish history and the history of technology. Sarah Wolf is a specialist in rabbinics and is an assistant professor at the Jewish Theological Seminary of America. Joshua Schwartz is a doctoral student at New York University, where he studies Jewish mysticism. Sarah and Joshua were responsible for most of the book’s translations and transliterations. Yocheved and Yudis Retig are Yiddishists and were responsible for the book’s Yiddish content and translations.

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

01 February 2019

The British Library / Qatar Foundation Partnership Imaging Hack Day – Part 2

On 25th October the BL/QFP team held their first Hack Day. The original concept, as well as the planning and run-up to the day, was previously discussed in a blog piece by Sotirios Alpanis. Here, I am excited to present details of the day itself and its results.

On the morning of the Hack Day, there was a creative buzz in the air as the digitisation studio transformed into an artist’s workshop. Free from the strict guidelines for imaging and quality assurance, there were no targets to meet, and the only brief was to engage creatively with the collection. Throughout the day, there was a noticeable stream of movement back and forth between the sixth floor office and the Imaging studio. Elizabeth Hunter and Carl Norman from Imaging Services South were in and out all day to check on the three cameras they had set up in different corners of the studio to record a time-lapse video of all the activities, and many members of the wider team dropped in for conversation.

At the end of the day, the team gave short informal presentations on their individual projects. The projects revealed a variety of skills and a wide range of ideas. Some linked with each other contextually, whilst others resulted from collaboration with members of the wider team - hence the frequent visits by staff from the main office!

Matt and Melanie working on their hacks


The hacks

Daniel Loveday was interested in colourisation, setting out to add historically accurate colours to black and white photographic images. With assistance from Louis Allday, Gulf History Content Specialist, he worked on a portrait of the Sharif of Mecca. Louis speculated that the ‘turban would have been white, a colour of turban that was often worn by Sayyids (descendants of the Prophet Muhammad, which he was) and as worn by a later Sharif of Mecca, Husayn’. He also suggested that the gown ‘would be similar to European-style naval dress (dark/navy blue with gold detailing) as the dress of late Ottoman era officials such as this was heavily influenced by European military forms at the time.’ The last piece of clothing to colour was the sash across the body which Louis reasoned ‘would possibly be green, a colour also closely associated with the Prophet, his family and his descendants.’

Sharif of Mecca


Coming from a different angle, Melanie Taylor explored the potential to form new relationships and striking visual narratives from digitised content. Reflecting on how imaging technicians engage with content, Melanie’s project directed focus to the purely visual qualities of digitised items. Drawing on hundreds of images, her results were poetic and illuminating. The wider BL/QFP team, who usually look at the material through the eyes of a reader or researcher, appreciated this fresh perspective.

Visual Narratives 1
Visual Narratives 2
Visual Narratives 3

Matt Lee’s work also evolved from acts of repetition. Placing images of parallel or similar things side by side, he was able to create interesting patterns, comparisons and meanings. These visual typologies included lists of dates, places, signatures, and logos. Louis suggested a typology made from ‘the names of the numerous states of the area that was to become Saudi Arabia in 1932, including both earlier (and smaller) incarnations of the Saudi state and other rival kingdoms and emirates that it conquered.’ This idea remains a work in progress.

Visual typology (stamps, crests and logos)

Visual typology (Muscat typography)

Darran Murray and Jordi Clopes Masjuan were both inspired by an astrolabe quadrant from ‘Two treatises on Astronomy by Sibṭ al-Māridīnī’, albeit in very different ways. Jordi sought to capture images highlighting the quadrant’s shape and texture, emphasising its beauty as an artefact. Owing to pressures of time, copy stand photography in digitisation projects doesn’t usually allow for experimentation with lighting techniques; and, in the interests of consistency, the colour of the backing paper remains the same throughout. Here, however, Jordi specifically matched the paper to the quadrant, and thus effectively revealed more details of its edges.

Astrolabe quadrant before and after

Astrolabe quadrant and the ‘Four treatises on Astronomy’

Darran investigated the quadrant’s practical function and educational qualities by creating a guide on ‘How to build and use your very own Astrolabe.’ He went on to make a paper version of the wooden device himself.

Astrolabe quadrant model

Darran’s guide would be beautifully illustrated by both of these projects!


The next project also had an educational purpose and aimed to make the resources of the QDL more accessible to visitors who may be less familiar with Gulf and Middle Eastern history. Combining storytelling, maps and images, the project produced an interactive version of a tour by the Political Resident, Geoffrey Prior, from Bahrain to Muscat in February 1940, transporting the viewer to the different locations where particular events took place. It was built with a JavaScript tool called Story Map, by Knight Lab at Northwestern University, which is user friendly and can be embedded in a website. Unsurprisingly, the idea generated a lot of interest and further ideas for telling other stories in similar ways.

 

Rebecca Harris wanted to digitise landscape and aerial photographs using a macro lens in order to ‘bring the viewers’ attention to details and fragments which would otherwise go unnoticed.’ Using her computer screen like a magnifying glass, she examined topographical features found in shelfmarks IOR/R/15/1/606 and IOR/L/PS/12/1956, and was able to identify potential coordinates.

Macro experiments

Like Daniel, Hannah Nagle focused on colourisation. She drew inspiration from early autochrome images of the Middle East found in Albert Kahn’s collection ‘The Archives of the Planet’. She coloured a black and white photograph depicting a group of women in Linga, Persia, skilfully conveying the atmosphere of the region at the beginning of the twentieth century.

Colourised black and white image


At the end of the day, I hacked my own shelfmark as well, 'Four treatises on Astronomy'. I picked a manuscript damaged by insects and, with assistance from our Conservation Team Manager Salvador Alcantra Pelaez, photographed it on contrasting paper. Later the pages were digitally removed in order to draw attention to the damage. All the images were combined in an animation showing the insects’ impact on the manuscript.

Capturing ‘Four treatises on Astronomy’

Marks of insect damage

The first Hack Day was an all-round success. It was interesting for the wider team to see how imaging technicians engaged creatively with the material we are digitising, and to observe what collaborations formed and how knowledge was shared amongst different areas of the project.

A week after the Hack Day, we ran a retrospective meeting, where we unanimously decided to repeat the activity on a quarterly basis. We gathered a lot of feedback and scheduled the next Hack Day for the 7th February. I can reveal that this time the Imaging team will be responding to a theme. Watch this space for more hacks!

Hack Day poster

This is a guest post by Renata Kaminska, Digitisation Studio Manager for the British Library's Qatar Project

 

19 November 2018

The British Library / Qatar Foundation Partnership Imaging Hack Day

The BL/QFP is digitising archive material related to Persian Gulf History as well as Arabic scientific manuscripts. In the past four years we have added in excess of 1.5 million images to the Qatar Digital Library. Our team of ~45 staff includes a group of eight dedicated imaging professionals, who between them produce 30,000 digitised images each month, to exacting standards that focus on presenting the information on the page in a visually clear and consistent manner.

 

Our imaging team are a highly-skilled group, with a variety of backgrounds, experiences and talents, and we wished to harness these. Therefore, we decided to set aside a day for our Imaging team to use their creative and technical skills to ‘hack’ the material in our collection.

By dedicating a whole day for our imaging team to experiment with different ways of capturing the material we are digitising we hoped it would reveal some interesting aspects of the collection, which were not seen through our standardised capture process. It also gave the Imaging team a chance to show off and share their skills amongst themselves and the wider BL/QFP team.

This was how we conceived of our first Imaging Hack Day, and the rest of this blog post outlines how we promoted and organised it.

From its conception the Imaging team were keen for the wider team to be involved, so we asked them to nominate material from the collections we are digitising that they thought could be ‘hacked’ and to state their reasons why.

To begin with it was mostly members of the Imaging team that nominated items. So we decided to wage a PR campaign: firstly the Imaging team delivered a presentation on the 9th of October at one of BL/QFP’s all-staff meetings. The presentation outlined some of the techniques and ideas they had for the hack day, in order to appeal to the rest of the team for nominations. Additionally, on the morning of the 9th members of the Imaging team snuck into the office and planted some not-so-subtle propaganda:

Posters

The impact of the posters and presentation was really pronounced. After having a handful of nominations from people outside of the Imaging team before 9th Oct, within days the number had increased by a factor in excess of five (see graph below). The posters also became highly sought after amongst the team.

Nominations
Graph showing how many shelfmarks were nominated each day, with cumulative totals for members of the imaging team vs non-imaging teams.

 

The day before the Hack Day, anyone who had nominated an item was invited to a prep session with the Imaging team. Here the nominated items were presented, as well as the ideas for hacks. Extra judicious use of Post-Its and Sharpies facilitated feedback, and by the end of the session the Imaging team were armed with lots of ideas, encouragement, and knew they had curatorial expertise from the rest of the BL/QFP team to call upon if necessary.

Postits

As a final surprise, and a sign of appreciation, Hack Sacks filled with goodies were secreted into the imaging studio late on the eve of the Hack Day:

Hacksacks

The resulting images and hacks from the Hack Day will be covered in an upcoming post by our studio manager Renata Kaminska. The non-material results, however, were manifold. Throughout the lead-up and on the actual day there was a palpable buzz amongst the Imaging team, evidence of the positive impact on their morale. It also led to a greater exchange of knowledge between the Imaging team and their colleagues throughout the BL/QFP. The day allowed for different areas of the team to come together, combine their expertise and find new ways of working and innovative ways of capturing our collections. Finally, it also demonstrated the fantastic experience and skills of our imaging technicians, many of which had not previously been exposed to the rest of the team. It was a real celebration of both the material that we are digitising and our talented imaging studio.

This is a guest post by Sotirios Alpanis, Head of Digital Operations for the British Library's Qatar Project, on Twitter as @SotiriosAlpanis

23 August 2018

BL Labs Symposium (2018): Book your place for Mon 12-Nov-2018

The BL Labs team are pleased to announce that the sixth annual British Library Labs Symposium will be held on Monday 12 November 2018, from 9:30 - 17:30 in the British Library Knowledge Centre, St Pancras. The event is free, and you must book a ticket in advance. Last year's event was a sell out, so don't miss out!

The Symposium showcases innovative and inspiring projects which use the British Library’s digital content, providing a platform for development, networking and debate in the Digital Scholarship field as well as being a focus on the creative reuse of digital collections and data in the cultural heritage sector.

We are very proud to announce that this year's keynote will be delivered by Daniel Pett, Head of Digital and IT at the Fitzwilliam Museum, University of Cambridge.

Daniel Pett
Daniel Pett will be giving the keynote at this year's BL Labs Symposium. Photograph Copyright Chiara Bonacchi (University of Stirling).

Dan read archaeology at UCL and Cambridge (but played too much rugby) and then worked in IT on the trading floor of Dresdner Kleinwort Benson. Until February this year, he was Digital Humanities lead at the British Museum, where he designed and implemented digital practices connecting humanities research, museum practice, and the creative industries. He is an advocate of open access, open source and reproducible research. He designed and built the award-winning Portable Antiquities Scheme database (which holds records of over 1.3 million objects) and enabled collaboration through projects working on linked and open data (LOD) with the Institute for the Study of the Ancient World (New York University) (ISAWNYU) and the American Numismatic Society. He has worked with crowdsourcing and crowdfunding (MicroPasts), and developed the British Museum's 3D capture reputation. He holds Honorary posts at UCL Institute of Archaeology and the Centre for Digital Humanities and publishes regularly in the fields of museum studies, archaeology and digital humanities.

Dan's keynote will reflect on his years of experience in assessing the value, impact and importance of experimenting with, re-imagining and re-mixing cultural heritage digital collections in Galleries, Libraries, Archives and Museums. Dan will follow in the footsteps of previous prestigious BL Labs keynote speakers: Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); and Bill Thompson and Andrew Prescott in 2013.

Stella Wisdom (Digital Curator for Contemporary British Collections at the British Library) will give an update on some exciting and innovative projects she and other colleagues have been working on within Digital Scholarship. Mia Ridge (Digital Curator for Western Heritage Collections at the British Library) will talk about 'Living with Machines', a major and ambitious data science/digital humanities project that the British Library is about to embark upon in collaboration with the Alan Turing Institute for data science and artificial intelligence.

Throughout the day, there will be several announcements and presentations from nominated and winning projects for the BL Labs Awards 2018, which recognise work that has used the British Library’s digital content in four areas: Research, Artistic, Commercial, and Educational. The closing date for the BL Labs Awards is 11 October 2018, so it's not too late to nominate someone or a team, or to enter your own project! There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2018, which showcases the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close 12 October 2018).

Adam Farquhar (Head of Digital Scholarship at the British Library) will give an update about the future of BL Labs and report on a special event held in September 2018 for invited attendees from National, State, University and Public Libraries and Institutions around the world, where they were able to share best practices in building 'labs style environments' for their institutions' digital collections and data.

There will be a 'sneak peek' of an art exhibition in development entitled 'Imaginary Cities' by the visual artist and researcher Michael Takeo Magruder. His practice draws upon information systems such as live and algorithmically generated data, 3D printing and virtual reality, combined with traditional techniques such as gold and silver gilding and etching. Michael's exhibition will build on the work he has been doing with BL Labs over the last few years using digitised 18th and 19th century urban maps, bringing analogue and digital outputs together. The exhibition will be staged in the British Library's entrance hall in April and May 2019 and will be free to visit.

Finally, we have an inspiring talk lined up to round the day off (more information about this will be announced soon), and - as is our tradition - the symposium will conclude with a reception at which delegates and staff can mingle and network over a drink and nibbles.

So book your place for the Symposium today and we look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

Posted by Mahendra Mahey and Eleanor Cooper (BL Labs Team)

01 May 2018

New Digital Curator in the Digital Scholarship Team

Adi Keinan-Schoonbaert

Hello all! My name is Adi Keinan-Schoonbaert, and I’m the new Digital Curator for Asian and African collections at the British Library. One of the core remits of the Digital Scholarship team is to enable and encourage the reuse of the Library’s digital collections. When it comes to Asian and African collections, there are always interesting projects and initiatives going on. One is the Two Centuries of Indian Print project, which started a second phase in March 2018 and has a strong Digital Humanities strand led by Digital Curator Tom Derrick. Another example is a collaborative transcription project, supporting the transcription of handwritten historical Arabic scientific works for Handwritten Text Recognition (HTR) research with the help of volunteers.

To give a bit of a background about myself and how I got to the Library: I’m an archaeologist and heritage professional by education and practice, with a PhD in Heritage Studies from University College London (2013). As a field archaeologist I used to record large quantities of excavation-related data – all manually, on paper. This was probably the first time I saw the potential of applying digital tools and technologies to record, manage and share archaeological data.

My first meaningful engagement with archaeological data and digital technologies started in 2005, when I joined the Israeli-Palestinian Archaeology Working Group (IPAWG) to create a database of all archaeological sites surveyed or excavated by Israel in the West Bank since its occupation in 1967, and to link it with a Geographic Information System (GIS), enabling the spatial visualisation and querying of this data for the first time. The research potential of this GIS-linked database proved so great that I decided to explore it further in a PhD dissertation. My dissertation focused on archaeological databases covering the occupied West Bank, and I was especially interested in the nature of archaeological records and the way they reflect particular research interests and heritage management priorities, as well as variability in data quality, coverage, accuracy and reliability.

Following my PhD I stayed at UCL Institute of Archaeology as a post-doctoral research associate, and participated in a project called MicroPasts, a UCL-British Museum collaboration. This project used web-based, crowdsourcing methods to allow traditional academics and other communities in archaeology to co-produce innovative open datasets. The MicroPasts crowdsourcing platform provided a great variety of projects through which people could contribute – from transcribing British Museum card catalogues, through tagging videos on the Roman Empire, to photomasking images in preparation for 3D modelling of museum objects.

With the main phase of the MicroPasts project coming to an end, I joined the British Library as Digital Curator (Polonsky Fellow) for the Hebrew Manuscripts Digitisation Project. This role allowed me to create and implement a digital strategy for engaging, accessing and promoting a specific digitised collection, working closely with curators and the Digital Scholarship team. My work included making the collection digitally accessible (on data.bl.uk, working with British Library Labs) and encouraging open licensing, creating a website, promoting the collection in different ways, researching available digital methods to explore and exploit collections in novel ways, and implementing tools such as an online catalogue records viewer (TEI XML), OpenRefine, and 3D modelling.

A six-month backpacking trip to Asia unexpectedly prepared me for my new role at the Library. I was delighted to join, or rather re-join, the Library’s Digital Research team, this time as Digital Curator for Asian and African Collections. I find these collections especially intriguing due to their diversity, richness and uniqueness. They include mostly manuscripts, printed books, periodicals, newspapers, photographs and e-resources from Africa, the Middle East (including Qatar Digital Library), Central Asia, East Asia (including the International Dunhuang Project), South Asia and SE Asia, as well as the Visual Arts materials.

I’m very excited to join the Library’s Digital Research team and to work alongside Neil Fitzgerald, Nora McGregor, Mia Ridge and Stella Wisdom and learn from their rich experience. Feel free to get in touch with us via digitalresearch@bl.uk or Twitter - @BL_AdiKS for me, or @BL_DigiSchol for the Digital Scholarship team.

12 March 2018

The Ground Truth: Transcribing historical Arabic Scientific Manuscripts for OCR research

Announcing a collaborative transcription project to support state-of-the-art research in automatic handwritten text recognition for historical Arabic texts

Cultural heritage institutions around the world are digitising hundreds of thousands of pages of historical Arabic manuscript and archive collections. Making these fully text searchable has the potential to truly transform scholarship, opening up this rich content for discovery and enabling large-scale analysis.

Computer scientists and scholars are working on this challenge, building systems which can automatically transcribe images of handwritten text, but for historical Arabic script a solution remains just out of reach.

Our aim is to contribute to continued research in this area by building an open image and ground truth dataset of historical handwritten Arabic texts, ensuring historical Arabic collections benefit from state-of-the-art developments in handwritten text recognition.

What is Ground Truth?

Optical Character Recognition (OCR) systems essentially turn a picture of text into text itself—in other words, producing something like a .TXT or .DOC file from a scanned .JPG of a printed or handwritten page. Most OCR systems require ground truth, a set of files which represent the truthful record of elements of an image, for training and evaluation purposes.

The ground truth of an image’s text content, for instance, is the complete and accurate record of every character and word in the image.

By knowing what the system is supposed to recognise on a page of handwritten text, researchers can both train their system to recognise the characters as well as test how well the system does once trained.
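To make the evaluation side of this concrete, here is a minimal Python sketch of one common way to score a recognition system against ground truth: the character error rate, i.e. the number of single-character edits needed to turn the system's output into the ground truth, divided by the length of the ground truth. The function names and sample strings are illustrative assumptions, not part of this project or of any particular OCR toolkit.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edits needed to fix the OCR output, as a fraction of ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, ocr_output) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical example: the system dropped one of four letters.
    print(character_error_rate("كتاب", "كتب"))
```

Note that this sketch counts Unicode code points, which is a simplification for scripts where diacritics or ligatures matter; a real evaluation would need to agree on a normalisation first.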

Transcription
 

  
View more transcriptions in progress from this manuscript (Or 3366) on the platform 

A collaborative approach

This project is a proof of concept exploring whether the creation of such a dataset can be done collaboratively at scale, using the collective expertise of volunteers around the world. At the heart of this approach is the Library’s enduring commitment to creating new and interesting ways to connect diverse communities of interest and expertise around our collections, be they scholars, the general public, computer scientists, students or curators. For this we are utilising a free and open-source platform, From the Page, which allows anyone with an interest in historical Arabic manuscripts to experience them up close, many for the first time, and to discuss, learn and share expertise in their transcription.

Helping transform research

The Digital Scholarship Department was able to fund the development of this open source platform to support Right-to-Left transcription, a feature which will benefit any scholar wishing to use the software for their own transcription needs. Any transcriptions produced in this pilot will be transformed into ground truth resources, hosted by the British Library and made freely available, without rights restriction, for anyone wishing to advance the state-of-the-art in optical character recognition technology. Specifically, resources created will be contributed to ground-breaking projects already underway such as Transkribus, the Open Islamic Texts Initiative, the IMPACT Centre of Competence Image and Ground Truth Resources and more!

Visit the new Arabic Scientific Manuscripts of the British Library transcription platform and download our Getting Started Guide for more detail (an Arabic version will be available shortly). 

  

Posted by Nora McGregor, Digital Curator, British Library

 

13 February 2018

BL Labs 2017 Symposium: Samtla, Research Award Runner Up

Samtla (Search And Mining Tools for Labelling Archives) was developed to address a need in the humanities for research tools that help to search, browse, compare, and annotate documents stored in digital archives. The system was designed in collaboration with researchers at Southampton University, whose research involved locating shared vocabulary and phrases across an archive of Aramaic Magic Texts from Late Antiquity. The archive contained texts written in Aramaic, Mandaic, Syriac, and Hebrew languages. Due to the morphological complexity of these languages, where morphemes are attached to a root morpheme to mark gender and number, standard approaches and off-the-shelf software were not flexible enough for the task, as they tended to be designed to work with a specific archive or user group. 

Figure1
Figure 1: Samtla supports tolerant search allowing queries to be matched exactly and approximately. (Click to enlarge image)

Samtla is designed to extract the same or similar information that may be expressed by authors in different ways, whether it is in the choice of vocabulary or the grammar. Traditionally, search and text mining tools have been based on words, which limits their use to corpora containing languages where 'words' can be easily identified and extracted from text, e.g. languages with a whitespace character like English, French, German, etc. Word models tend to fail when the language is morphologically complex, like Aramaic and Hebrew. Samtla addresses these issues by adopting a character-level approach stored in a statistical language model. This means that rather than extracting words, we extract character sequences representing the morphology of the language, which we then use to match the search terms of the query and rank the documents according to the statistics of the language. Character-based models are language independent, as there is no need to preprocess the document, and we can locate words and phrases with a lot of flexibility. As a result, Samtla compensates for the variability in language use, spelling errors made by users when they search, and errors in the document as a result of the digitisation process (e.g. OCR errors).
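As a rough illustration of why character sequences tolerate spelling and OCR errors better than whole words, the Python sketch below ranks documents by the share of a query's character trigrams that they contain. This is a deliberately simplified stand-in, not Samtla's statistical language model; the function names, the trigram length and the sample texts are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding with spaces at both ends."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_score(query: str, document: str, n: int = 3) -> float:
    """Fraction of the query's character n-grams that also occur in the document."""
    query_grams = char_ngrams(query, n)
    doc_grams = char_ngrams(document, n)
    total = sum(query_grams.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, doc_grams[gram]) for gram, count in query_grams.items())
    return matched / total


def rank(query: str, documents: list[str], n: int = 3) -> list[tuple[float, str]]:
    """Return (score, document) pairs, best match first."""
    scored = [(ngram_score(query, doc, n), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["a manuscript page", "astrolabe quadrant"]
    # The misspelled query still shares most of its trigrams with the first text.
    for score, doc in rank("manuscirpt", docs):
        print(f"{score:.2f}  {doc}")
```

An exact word match would return nothing for the misspelled query, whereas the shared trigrams still place the intended document first.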

Figure 2: Samtla's document comparison tool displaying a semantically similar passage between two Bibles from different periods.

The British Library have been very supportive of the work by openly providing access to their digital archives. The archives ranged in domain, topic, language, and scale, which enabled us to test Samtla's flexibility to its limits. One of the biggest challenges we faced was indexing larger-scale archives of several gigabytes. Some archives also contained a scan of the original document together with metadata about the structure of the text. This provided a basis for developing new tools that bring researchers closer to the original object, including one that highlights named entities over both the raw text and the scanned image.

Currently we are focusing on developing approaches for leveraging the semantics underlying text data in order to help researchers find semantically related information. Semantic annotation is also useful for labelling text data with named entities and sentiments. Our current aim is to develop approaches for annotating text data in any language or domain, which is challenging because languages encode the semantics of a text in different ways.

As a first step we are offering labelled data to researchers, as part of a trial service, in order to help speed up the research process, or provide tagged data for machine learning approaches. If you are interested in participating in this trial, then more information can be found at www.samtla.com.

Figure 3: Samtla's annotation tools label the texts with named entities to provide faceted browsing and data layers over the original image.

 If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.


Posted by BL Labs on behalf of Dr Martyn Harris, Prof Dan Levene, Prof Mark Levene and Dr Dell Zhang.

05 February 2018

8th Century Arabic science meets today's computer science


Or, Announcing a Competition for the Automatic Transcription of Historical Arabic Scientific Manuscripts 

“An impartial view of Digital Humanities (DH) scholarship in the present day reveals a stark divide between ‘the West and the rest’…Far fewer large-scale DH initiatives have focused on Asia and the non-Western world than on Western Europe and the Americas…Digital databases and text corpora – the ‘raw material’ of text mining and computational text analysis – are far more abundant for English and other Latin alphabetic scripts than they are for Chinese, Japanese, Korean, Sanskrit, Hindi, Arabic and other non-Latin orthographies…Troves of unread primary sources lie dormant because no text mining technology exists to parse them.”

-Dr. Thomas Mullaney, Associate Professor of Chinese History at Stanford University

Supporting the use of Asian & African Collections in digital scholarship means shining a light on this stark divide and seeking ways to close the gap. In this spirit, we are excited to announce the ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts.

Add MS 7474_0043.script

The Competition

Drawing together experts from the British Library, The Alan Turing Institute, Qatar Digital Library and PRImA Research Lab, our aim in launching this competition is to play an active role in advancing the state-of-the-art in handwritten text recognition technologies for Arabic. For our first challenge we are focussing on finding an optimal solution for accurately and automatically transcribing historical Arabic scientific handwritten manuscripts.

Though such technologies are still in their infancy, unlocking historical handwritten Arabic manuscripts for large-scale text analysis has the potential to truly transform research. In conjunction with the competition we hope to build, and make freely and openly available, a substantial image and ground truth dataset to support continued efforts in this area.

Enter the Competition

Organisers

Apostolos Antonacopoulos, Professor of Pattern Recognition, University of Salford and Head of the PRImA research lab
Christian Clausner, Research Fellow at the Pattern Recognition and Image Analysis (PRImA) research lab
Nora McGregor, Digital Curator at British Library, Asian & African Collections
Daniel Lowe, Curator at British Library, Arabic Collections
Daniel Wilson-Nunn, PhD student at University of Warwick & Turing PhD Student based at Alan Turing Institute
Bink Hallum, Arabic Scientific Manuscripts Curator at British Library/Qatar Foundation Partnership

Further reading

For more on recent Digital Research Team text recognition and transcription projects see:

 

This post is by Nora McGregor, Digital Curator, British Library. She is on twitter as @ndalyrose