THE BRITISH LIBRARY

Digital scholarship blog

25 posts categorized "Manuscripts"

20 January 2020

Using Transkribus for Arabic Handwritten Text Recognition


This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

In the last couple of years we’ve teamed up with PRImA Research Lab in Salford to run competitions for automating the transcription of Arabic manuscripts (RASM2018 and RASM2019), in an ongoing effort to identify good solutions for Arabic Handwritten Text Recognition (HTR).

I’ve been curious to test our Arabic materials with Transkribus – one of the leading tools for automating the recognition of historical documents. We’ve already tried it out on items from the Library’s India Office collection as well as early Bengali printed books, and we were pleased with the results. Several months ago the British Library joined the READ-COOP – the cooperative taking up the development of Transkribus – as a founding member.

As with other HTR tools, Transkribus’ HTR+ engine cannot start automatic transcription straight away, but first needs to be trained on a specific type of script and handwriting. This is achieved by creating a training dataset – a transcription of the text on each page, as accurate as possible, and a segmentation of the page into text areas and lines, demarcating the exact location of the text. Training sets therefore consist of a set of images and an equivalent set of XML files, containing the location and transcription of the text.

A screenshot from Transkribus, showing the segmentation and transcription of a page from Add MS 7474.
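To make the pairing of images and XML files more concrete, here is a minimal Python sketch that reads line polygons and transcriptions from a PAGE-style XML document. It assumes the ground truth follows PRImA's PAGE XML layout (text lines with a Coords element and a TextEquiv/Unicode element); the sample document, the file name page_0001.tif, the ids and the helper names are invented for illustration, and real files may use a different schema version and contain more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of PAGE XML.
# It is NOT taken from the RASM dataset; the text is a placeholder.
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.tif" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="120,120 1880,120 1880,240 120,240"/>
        <TextEquiv><Unicode>نص تجريبي</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="120,260 1880,260 1880,380 120,380"/>
        <TextEquiv><Unicode>سطر ثان</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across schema versions."""
    return tag.split("}")[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return one dict per text line: its id, outline polygon and transcription."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        polygon, text = [], ""
        for child in elem:  # direct children only: line-level data
            name = local_name(child.tag)
            if name == "Coords":
                polygon = parse_points(child.get("points", ""))
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "polygon": polygon, "text": text})
    return lines


for line in read_lines(SAMPLE_PAGE_XML):
    print(line["id"], line["polygon"][0], line["text"])
```

Matching on local tag names rather than a fixed namespace is a design choice that keeps the sketch usable across PAGE schema versions, at the cost of being less strict about what it accepts.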

 

This process can be done in Transkribus, but in this case I already had a training set created using PRImA’s software Aletheia. I used the dataset created for the competitions mentioned above: 120 transcribed and ground-truthed pages from eight manuscripts digitised and made available through QDL. This dataset is now freely accessible through the British Library’s Research Repository.

Transkribus recommends creating a training set of at least 75 pages (between 5,000 and 15,000 words). However, I was interested to find out how it would cope with less data. The methods submitted for the RASM2019 competition worked on a training set of 20 pages, with an evaluation set of 100 pages, so I wanted to see how Transkribus’ HTR+ engine dealt with the same scenario. It should be noted that the RASM2019 methods were evaluated using PRImA’s evaluation methods, whereas the results shown here come from Transkribus’ own evaluation method. The two sets of results are therefore not directly comparable, but they give some idea of how Transkribus performed on the same training set.

I created four different models to see how Transkribus’ recognition algorithms deal with a growing training set. The models were created as follows:

  • Training set of 20 pages, and evaluation set of 100 pages
  • Training set of 50 pages, and evaluation set of 70 pages
  • Training set of 75 pages, and evaluation set of 45 pages
  • Training set of 100 pages, and evaluation set of 20 pages
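The four training/evaluation splits listed above can be sketched in a few lines of Python. The post does not say which pages went into which set, so taking the first N pages for training is purely an assumption here, as are the page identifiers and the function name.

```python
TOTAL_PAGES = 120                   # size of the ground-truthed dataset
TRAINING_SIZES = [20, 50, 75, 100]  # the four iterations described above


def make_splits(page_ids, training_sizes):
    """Return (training, evaluation) page lists for each training size.

    Assumption: the first `size` pages are used for training and the rest
    for evaluation. The actual page selection is not documented in the post.
    """
    splits = []
    for size in training_sizes:
        if not 0 < size < len(page_ids):
            raise ValueError(f"training size {size} leaves no evaluation pages")
        splits.append((page_ids[:size], page_ids[size:]))
    return splits


# Hypothetical page identifiers, for illustration only.
page_ids = [f"page_{i:03d}" for i in range(1, TOTAL_PAGES + 1)]

for training, evaluation in make_splits(page_ids, TRAINING_SIZES):
    print(f"train on {len(training)} pages, evaluate on {len(evaluation)} pages")
```

Because every page that is not used for training ends up in the evaluation set, the evaluation set shrinks as the training set grows, which is worth keeping in mind when comparing the four error rates.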

The graphs below show each of the four iterations, from top to bottom:

CER of 26.80% for a training set of 20 pages

CER of 19.27% for a training set of 50 pages

CER of 15.10% for a training set of 75 pages

CER of 13.57% for a training set of 100 pages

The results can be summed up in a table:

Training Set (pp.) | Evaluation Set (pp.) | Character Error Rate (CER) | Character Accuracy
20 | 100 | 26.80% | 73.20%
50 | 70 | 19.27% | 80.73%
75 | 45 | 15.10% | 84.90%
100 | 20 | 13.57% | 86.43%

 

Indeed the accuracy improved with each iteration of training – the more training data the neural networks in Transkribus’ HTR+ engine have, the better the results. With a training set of 100 pages, Transkribus managed to automatically transcribe the remaining 20 pages with an accuracy rate of 86.43% – which is pretty good for historical handwritten Arabic script.
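For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the reference transcription and the automatic one, divided by the length of the reference; character accuracy, as reported above, is 100% minus CER. The Python sketch below illustrates that common definition. It is not Transkribus’ own evaluation code, and the function names are chosen for illustration.

```python
def levenshtein(a, b):
    """Minimum number of character insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character Error Rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def character_accuracy(cer_percent):
    """Character accuracy in percent, given CER in percent."""
    return 100.0 - cer_percent


# One wrong character out of four gives a CER of 25%.
print(f"{cer('abcd', 'abxd'):.2%}")
# Reproduces the accuracy column from the reported CER values.
for value in (26.80, 19.27, 15.10, 13.57):
    print(f"CER {value:.2f}% -> accuracy {character_accuracy(value):.2f}%")
```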

As a next step, we could consider (1) adding more ground-truthed pages from our manuscripts to increase the size of the training set, and by that improve HTR accuracy; (2) adding other open ground truth datasets of handwritten Arabic to the existing training set, and checking whether this improves HTR accuracy; and (3) running a few manuscripts from QDL through Transkribus to see how its HTR+ engine transcribes them. If accuracy is satisfactory, we could see how to scale this up and make those transcriptions openly available and easily accessible.

In the meantime, I’m looking forward to participating in the OpenITI AOCP workshop entitled “OCR and Digital Text Production: Learning from the Past, Fostering Collaboration and Coordination for the Future,” taking place at the University of Maryland next week, and catching up with colleagues on all things Arabic OCR/HTR!

 

03 October 2019

BL Labs Symposium (2019): Book your place for Mon 11-Nov-2019


Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the seventh annual British Library Labs Symposium will be held on Monday 11 November 2019, from 9:30 - 17:00* (see note below) in the British Library Knowledge Centre, St Pancras. The event is FREE, and you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early!

*Please note, that directly after the Symposium, we have teamed up with an interactive/immersive theatre company called 'Uninvited Guests' for a specially organised early evening event for Symposium attendees (the full cost is £13 with some concessions available). Read more at the bottom of this posting!

The Symposium showcases innovative and inspiring projects which have used the British Library’s digital content. Last year's Award winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

The annual event provides a platform for the development of ideas and projects, facilitating collaboration, networking and debate in the Digital Scholarship field as well as being a focus on the creative reuse of the British Library's and other organisations' digital collections and data in many other sectors. Read what groups of Master's Library and Information Science students from City University London (#CityLIS) said about the Symposium last year.

We are very proud to announce that this year's keynote will be delivered by scientist Armand Leroi, Professor of Evolutionary Biology at Imperial College, London.

Professor Armand Leroi from Imperial College will be giving the keynote at this year's BL Labs Symposium (2019)

Professor Armand Leroi is an author, broadcaster and evolutionary biologist.

He has written and presented several documentary series on Channel 4 and BBC Four. His latest documentary was The Secret Science of Pop for BBC Four (2017), presenting the results of the analysis of over 17,000 western pop songs from 1960 to 2010 from the US Billboard top 100 charts, carried out together with colleagues from Queen Mary University, with further work published through the Royal Society. Armand has a special interest in how we can apply techniques from evolutionary biology to ask important questions about culture, humanities and what is unique about us as humans.

Previously, Armand presented Human Mutants, a three-part documentary series about human deformity for Channel 4, which also appeared as an award-winning book, Mutants: On Genetic Variety and the Human Body. He also wrote and presented a two-part series, What Makes Us Human, also for Channel 4. On BBC Four, Armand presented the documentaries What Darwin Didn't Know and Aristotle's Lagoon, also releasing the book The Lagoon: How Aristotle Invented Science, which looks at Aristotle's impact on science as we know it today.

Armand's keynote will reflect on his interest and experience in applying techniques from evolutionary biology that he has used over many years, such as bioinformatics, data-mining and machine learning, to ask meaningful 'big' questions about culture, humanities and what makes us human.

The title of his talk will be 'The New Science of Culture'. Armand will follow in the footsteps of previous prestigious BL Labs keynote speakers: Dan Pett (2018); Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); Bill Thompson and Andrew Prescott (2013).

The symposium will be introduced by the British Library's new Chief Librarian Liz Jolly. The day will include an update and exciting news from Mahendra Mahey (BL Labs Manager at the British Library) about the work of BL Labs, highlighting innovative collaborations BL Labs has been working on, including how it is working with Labs around the world to share experiences, knowledge and lessons learned. There will be news from the Digital Scholarship team about the exciting projects they have been working on, such as Living with Machines and other initiatives, together with a special insight from the British Library’s Digital Preservation team into how they attempt to preserve our digital collections and data for future generations.

Throughout the day, there will be several announcements and presentations showcasing work from nominated projects for the BL Labs Awards 2019, which were recognised last year for work that used the British Library’s digital content in Artistic, Research, Educational and commercial activities.

There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2019 which highlights the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close midday 5 November 2019).

As is our tradition, the Symposium will have plenty of opportunities for networking throughout the day, culminating in a reception for delegates and British Library staff to mingle and chat over a drink and nibbles.

Finally, we have teamed up with the interactive/immersive theatre company 'Uninvited Guests', who will give a specially organised performance for BL Labs Symposium attendees directly after the symposium. This participatory performance will take the audience on a journey through a world that is on the cusp of a technological disaster. Our period of history could vanish forever from human memory because digital information will be wiped out for good. How can we leave a trace of our existence to those born later? Don't miss out on a chance to book on this unique event at 5pm, specially organised to coincide with the end of the BL Labs Symposium. For more information, and for booking (spaces are limited), please visit here (the full cost is £13 with some concessions available). Please note, if you are unable to join the 5pm show, there will be another performance at 19:45 the same evening (book here for that one).

So don't forget to book your place for the Symposium today, as we predict it will be another full house and we don't want you to miss out.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

13 September 2019

Results of the RASM2019 Competition on Recognition of Historical Arabic Scientific Manuscripts


This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

 

Earlier this year, the British Library in collaboration with PRImA Research Lab and the Alan Turing Institute launched a competition on the Recognition of Historical Arabic Scientific Manuscripts, or in short, RASM2019. This competition was held in the context of the 15th International Conference on Document Analysis and Recognition (ICDAR2019). It was the second competition of this type, following RASM2018 which took place in 2018.

The Library has an extensive collection of Arabic manuscripts, comprising almost 15,000 works. We have been digitising several hundred manuscripts as part of the British Library/Qatar Foundation Partnership, making them available on Qatar Digital Library. A natural next step would be the creation of machine-readable content from scanned images, for enhanced search and whole new avenues of research.

Running a competition helps us identify software providers and tool developers, as well as introduce us to the specific challenges that pattern recognition systems face when dealing with historic, handwritten materials. For this year’s competition we provided a ground truth set of 120 images and associated XML files: 20 pages to be used to train text recognition systems to automatically identify Arabic script, and 100 pages to evaluate the training.

Aside from providing larger training and evaluation sets, for this year’s competition we’ve added an extra challenge – marginalia. Notes written in the margins are often less consistent and less coherent than main blocks of text, and can go in different directions. The competition set out three different challenges: page segmentation, text line detection and Optical Character Recognition (OCR). Tackling marginalia was a bonus challenge!

We had just one submission for this year’s competition – RDI Company, Cairo University, who previously participated in 2018 and did very well. RDI submitted three different methods, and participated in two challenges: text line segmentation and OCR. When evaluating the results, PRImA compared established systems used in industry and academia – Tesseract 4.0, ABBYY FineReader Engine 12 (FRE12), and Google Cloud Vision API – to RDI’s submitted methods. The evaluation approach was the same as last year’s, with PRImA evaluating page analysis and recognition methods using different evaluation metrics, in order to gain an insight into the algorithms.

 

Results

Challenge 1 - Page Layout Analysis

The first challenge set out to identify regions in a page, and find out where blocks of text are located on the page. RDI did not participate in this challenge, so the analysis covered only the common industry software mentioned above. The results can be seen in the chart below:

Chart showing RASM2019 page segmentation results

 

Google did relatively well here, and the results are quite similar to last year’s. Despite dealing with the more challenging marginalia text, Google’s previous accuracy score (70.6%) has gone down only very slightly to a still impressive 69.3%.

Example image showing Google’s page segmentation

 

Tesseract 4 and FRE12 scored very similarly, with Tesseract decreasing from last year’s 54.5%. Interestingly, FRE12’s performance on text blocks including marginalia (42.5%) was better than last year’s FRE11 performance without marginalia, scoring at 40.9%. Analysis showed that Tesseract and FRE often misclassified text areas as illustrations, with FRE doing better than Tesseract in this regard.

 

Challenge 2 - Text Line Segmentation

The second challenge looked into segmenting text into distinct text lines. RDI submitted three methods for this challenge, all of which returned the text lines of the main text block (as they did not wish to participate in the marginalia challenge). Results were then compared with Tesseract and FineReader, and are reflected below:

Chart showing RASM2019 text line segmentation results

 

RDI did very well with its three methods, with an accuracy level ranging between 76.6% and 77.6%. However, despite not attempting to segment marginalia text lines, their methods did not perform as well as last year’s method (with 81.6% accuracy). Their methods did seem to detect some marginalia, though very little overall, as seen in the screenshot below.

Example image showing RDI’s text line segmentation results

 

Tesseract and FineReader again scored lower than RDI, both with decreasing accuracy compared to RASM2018’s results (Tesseract 4 with 44.2%, FRE11 with 43.2%). This is due to the additional marginalia challenge. The Google method does not detect text lines, therefore the Text Line chart above does not include their results.

 

Challenge 3 - OCR Accuracy

The third and last challenge was all about text recognition, tackling the correct identification of characters and words in the text. Evaluation for this challenge was conducted four times: 1) on the whole page, including marginalia, 2) only on main blocks of text, excluding marginalia, 3) using the original texts, and 4) using normalised texts. Text normalisation was performed for both ground truth and OCR results, due to the historic nature of the material, occasional unusual spelling, and use/lack of diacritics. All methods performed slightly better when not tested on marginalia; accuracy rates are demonstrated in the charts below:

Chart showing OCR accuracy results, for main text body only (normalised, no marginalia)
 
Chart showing OCR accuracy results for all text regions (normalised, with marginalia)
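The text normalisation step mentioned for this challenge can be illustrated with a short Python sketch. The exact rules PRImA applied are not described in this post, so the choices below (removing combining marks such as Arabic vowel signs, removing the tatweel elongation character, and collapsing whitespace) are assumptions about what such a normalisation might include, and the function name is hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character; purely typographic


def normalise(text):
    """Strip combining marks (e.g. Arabic vowel signs) and tatweel,
    then collapse runs of whitespace into single spaces."""
    kept = [
        ch for ch in text
        if unicodedata.category(ch) != "Mn"  # Mn = nonspacing (combining) mark
        and ch != TATWEEL
    ]
    return " ".join("".join(kept).split())


# The same word written with and without vowel signs.
ground_truth = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"  # vocalised form
ocr_output = "\u0643\u062A\u0627\u0628"                      # bare consonants

print(ground_truth == ocr_output)                        # False: raw strings differ
print(normalise(ground_truth) == normalise(ocr_output))  # True after normalisation
```

Applying the same function to both the ground truth and the recognition output, as the evaluation did, means a method is not penalised for omitting or adding diacritics that the scribe used inconsistently.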

 

It is evident that there are minor differences in the character accuracies for the three RDI methods, with RDI2 performing slightly better than the others. When comparing the OCR accuracy between texts with and without marginalia, there are slightly higher success rates for the latter, though the difference is not significant. This means that tested methods performed on the marginalia almost as well as they did on the main text, which is encouraging.

Compared with RASM2018’s results, RDI’s results are good but not as good as last year’s (with 85.44% accuracy), which is likely to be a result of adding marginalia to the recognition challenge. Google performed very well too, considering they did not specifically train or optimise for this competition. Tesseract’s results went down from 30.45% to 25.13%, and FineReader Engine 12 performed better than its previous version FRE11, going up from 12.23% to 17.53% accuracy. However, it is still very low, as handwritten texts are not part of their target material.

 

Further Thoughts

RDI-Corporation has its own historical Arabic handwritten and typewritten OCR system, which has been built using different historical manuscripts. Its methods have done well, given the very challenging nature of the documents. Neither Tesseract nor ABBYY FineReader produces usable results, but that’s not surprising since they are both optimised for printed texts, and target contemporary material and not historical manuscripts.

As next steps, we would like to test these materials with Transkribus, which produced promising results for early printed Indian texts (see e.g. Tom Derrick’s blog post – stay tuned for some even more impressive results!), and potentially Kraken as well. All ground truth will be released through the Library’s future Open Access repository (now in testing phase), as well as through the website of IMPACT Centre for Competence. Watch this space for any developments!

 

21 August 2019

Chevening British Library Fellowship working with Chinese historical texts


Chevening is the UK government’s international awards programme aimed at developing global leaders. Since 2015, the Foreign and Commonwealth Office (FCO) has partnered with the British Library to offer professionals two new fellowships every year. These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with different themes such as Nationalism, Independence, and Partition in South Asia, 1900-1950 and Big Data and Libraries.

We are thrilled to announce that one of the two placements available for the 2020/2021 academic year will focus on automating the recognition of historical Chinese handwritten texts. This is a special opportunity to work in the Library’s Digital Scholarship Department, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this Fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.

Chinese Lotus Sutra scroll with Tibetan divination texts on the back (Shelfmark: Or.8210/S.155). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project. © The British Library

 

The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to diversely rich collections. However, it’s important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western language print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, a project called “Living with Machines,” is underway to apply Optical Character Recognition (OCR) to UK newspapers, design and implement new methods in data science and artificial intelligence, and analyse these materials at scale.

Taking a broader perspective on Library collections, we have started to explore opportunities with non-Latin collections too. Members of the Digital Scholarship team are engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert have teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. Another initiative which Tom is engaged with is exploring Transkribus for Bengali printed texts. He trained Transkribus’ HTR+ recognition engine, which ended up transcribing this material at 94% character accuracy! Tom and Adi’s recent blog post in EuropeanaTech Insight (issue on OCR) summarises these initiatives.

Regions and text lines demarcated as ground truth for RASM2019 ICDAR2019 Competition on Recognition of Historical Arabic Scientific Manuscripts (Shelfmark: Add MS 7474). Digitised and available on Qatar Digital Library.

 

The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.

This fellowship programme will start in September 2020 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.

For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” Fellow page: https://www.chevening.org/fellowship/british-library-chinese-handwritten-texts/.

Applications close at 12pm (GMT), 5 November 2019. Good luck!

 

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Twitter as @BL_AdiKS.

16 April 2019

BL Labs 2018 Commercial Award Winner: 'The Library Collection'


This guest blog post is by the team led by fashion designer, Nabil Nayal - winner of the BL Labs Commercial Award for 2018 - for his Spring/Summer 2019 collection, presented at the 2018 London Fashion Week.

Fashion models posing in room set
Nabil Nayal's SS19 Collection: fashion shoot at the British Library

The Nabil Nayal SS19 collection (The Library Collection) made history by becoming the first fashion show, on the official London Fashion Week schedule, to be hosted at the iconic British Library. The British Library’s digital archives deeply informed the collection. The Tilbury Speech, delivered by Queen Elizabeth I ahead of the attempted invasion of England by the Spanish Armada in 1588, was central to the use of print, as were other manuscripts, digitised images, maps and hymn sheets from the era. The collection encapsulates Nabil’s obsession with Elizabethan craftsmanship, whilst symbolising the power and strength of a woman who succeeded in bringing England into its Golden Age.

Nabil undertook historical research in the British Library for his PhD on Elizabethan dress, so the opportunity to collaborate with the Library in order to emphasise the importance of research in fashion education and practice was something he felt passionately about doing. Paying particular attention to the Library’s Elizabethan and Medieval Manuscripts archives, Nabil conducted his research with guidance from expert curators and with support from the Reading Room staff. Using key word search terms and date limitations to search through the digitised archives was particularly useful to find historically accurate documents to incorporate into the collection.

fashion model posing in manuscript inspired design
Nabil's design takes inspiration from the British Library's digitised 1588 manuscript of Queen Elizabeth I's 'Tilbury Speech'  © Nabil Nayal 2018

Elizabethan silhouettes were modernised in this collection by printing these manuscripts onto Nabil’s designs, including a three-metre-long cloak featuring the Tilbury Speech. A UK-based supplier, Silk Bureau, digitally printed the archival material on to a range of fine silks and cottons, which were then used to make garments within the collection. Nabil’s love of the classic white shirt was further explored too, offering a puritan backdrop that ‘whitewashes’ the complex hand-cut embellishments made of bonded poplins and marcella.

The designs in the SS19 collection have been sold to prestigious international stores such as Dover Street Market and Joyce and the collection will be launching exclusively in Selfridges this May (2019). The presentation also generated a huge response in key press and social media, including coverage in Vogue.

5 models posing on the catwalk
Nabil's Elizabethan-inspired designs at the BL Fashion Shoot © Nabil Nayal 2018

Nabil’s interest in promoting historical research within fashion was not limited to this collection. Currently, the brand is working with Collette Taylor of Vega Associates to continue to raise awareness of the potential of the Library’s collections to inspire the next generation of fashion researchers. Nabil held a Research Masterclass at the British Library in November 2018 to work with emerging designers as part of a fashion research competition to develop a capsule collection inspired by the Library’s collections.

This collaboration between Nabil Nayal and the British Library highlights the importance of design education and research for the future-proofing and continued success of UK creative industries, which is a pressing issue. Since 2010, there has been a 34% drop in GCSE entries across the arts, despite the fact that the UK fashion industry supports over 880,000 jobs and delivered a direct contribution of £28 billion to the UK economy in 2015. The wealth of free resources at the British Library provides ample opportunity for design students to explore how education and research can enrich their creativity and allow them to succeed within the fashion industry.

Nabil’s work has received praise from the late Karl Lagerfeld and celebrities such as Rihanna, Lorde and Florence Welch. His SS19 collection epitomises the way that the use of archival research within fashion can generate commercial success, suggesting that the ever-changing fashion industry can benefit from becoming more historically informed and that modernity can be evoked through an interest in the past.

Watch Jennifer Davies receiving the Commercial award on behalf of Nabil's team, and talking about the collection on our YouTube channel (clip runs from 7.26): 

You can read other blogs about Nabil Nayal at London Fashion Week and the fashion show at the British Library, and if you're feeling inspired, use the British Library's online Fashion resources.

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

19 March 2019

BL Labs 2018 Commercial Award Runner Up: 'The Seder Oneg Shabbos Bentsher'


This guest blog was written by David Zvi Kalman on behalf of the team that received the runner up award in the 2018 BL Labs Commercial category.


The bentsher is a strange book, both invisible and highly visible. It is not among the more well known Jewish books, like the prayerbook, Hebrew Bible, or haggadah. You would be hard pressed to find a general-interest bookstore selling a copy. Still, enter the house of a traditional Jew and you’d likely find at least a few, possibly a few dozen. In Orthodox communities, the bentsher is arguably the most visible book of all.

Bentshers are handbooks containing the songs and blessings, including the Grace after Meals, that are most useful for Sabbath and holiday meals, as well as larger gatherings. They are, as a rule, quite small. These days, bentshers are commonly given out as party favors at Jewish weddings and bar/bat mitzvahs, since meals at those events require them anyway. Many bentshers today have personalized covers relating the events at which they were given.

Bentshers have never gone out of print. By this I mean that printing began with the invention of the printing press and has never stopped. They are small, but they have always been useful. Seder Oneg Shabbos, the version which I designed, was released 500 years after the first bentsher was published. It is, in a sense, a Half Millennium Anniversary Special Edition.


Bentshers, like other Jewish books, could be quite ornate; some were written and illustrated by hand. Over the years, however, bentshers have become less and less interesting, largely in order to lower the unit cost. In order to make it feasible for wedding planners to order hundreds at a time, all images were stripped from the books, the books themselves became very small, and any interest in elegant typography was quickly eliminated. My grandfather, who designed custom covers for wedding bentshers, simply called the book, “the insert.” Custom prayerbooks were no different from custom matchbooks.

This particular bentsher was created with the goal of bucking this trend; I attempted to give the book the feel of some of the Jewish books and manuscripts of the past, using the research I was able to gather as a graduate student in the field of Jewish history. Doing this required a great deal of image research; for this, the British Library’s online resources were incredibly valuable. Of the more than one hundred images in the book, a plurality are from the British Library’s collections.

https://data.bl.uk/hebrewmanuscripts/

https://www.bl.uk/hebrew-manuscripts

In addition to its visual element, this bentsher differs from others in two important ways. First, it contains ritual language that is inclusive of those in the LGBTQ community, and especially of those conducting same-sex weddings. In addition, the book contains songs not just in Hebrew but in Yiddish as well; this was a homage to two Yiddishists who aided in creating the bentsher’s content. The bentsher was first used at their wedding.

More here: https://shabb.es/sederonegshabbos/

Watch David accepting the runner-up award and talking about the Seder Oneg Shabbos Bentsher on our YouTube channel (clip runs from 5.33 to 7.26):

David Zvi Kalman was responsible for the book’s design, including the choice of images. He is a doctoral candidate at the University of Pennsylvania, where he focuses on the relationship between Jewish history and the history of technology. Sarah Wolf is a specialist in rabbinics and is an assistant professor at the Jewish Theological Seminary of America. Joshua Schwartz is a doctoral student at New York University, where he studies Jewish mysticism. Sarah and Joshua were responsible for most of the book’s translations and transliterations. Yocheved and Yudis Retig are Yiddishists and were responsible for the book’s Yiddish content and translations.

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

31 January 2019

BL Labs 2018 Staff Award Winner: 'The Polonsky Foundation England and France Project: Manuscripts from the British Library and the Bibliothèque nationale de France, 700–1200'

A guest blog by our colleague Tuija Ainonen, describing the project which won the 2018 BL Labs staff award.

The collections of medieval manuscripts in the British Library and the Bibliothèque nationale de France (BnF) rank amongst the finest and most important in the world. Together they have particularly strong holdings of manuscripts made in France and England before 1200.

Scenes from Genesis from the Canterbury or Anglo-Catalan Psalter, Canterbury, 4th quarter of the 12th century: BnF MS Latin 8846, f. 1v.

In the summer of 2016, the two Libraries joined forces to digitise and promote access to 800 medieval manuscripts. Four hundred manuscripts from each Library were fully digitised in a project made possible by generous funding from The Polonsky Foundation. The successful completion of this ground-breaking international project in November 2018 required the collaboration of a large number of specialists from various fields: curators, cataloguers, conservators, and imaging and information technology specialists from both libraries worked together closely through a programme of knowledge exchange and collaborative workshops.

An angel leads St Peter out of prison, from the Caligula Troper, England, 2nd half of the 11th century: British Library Cotton MS Caligula A XIV, f. 22r.

The result of this collaboration now allows anyone around the world to explore and compare these beautiful and historically important manuscripts, which were previously available principally to scholars using the reading rooms of the British Library in London and the Bibliothèque nationale in Paris and in occasional exhibitions.

Two web resources

All 800 manuscripts are now available on an innovative website hosted by the BnF, France et Angleterre: manuscrits médiévaux entre 700 et 1200. The website allows users to search manuscripts in English, French and Italian, and to view and compare manuscripts side-by-side using International Image Interoperability Framework (IIIF) technology. Images can be annotated and downloaded either as an individual image or as a PDF of an entire manuscript. Annotations can also be downloaded in a data-interchange JSON (JavaScript Object Notation) format and shared.

The website France et Angleterre: manuscrits médiévaux entre 700 et 1200 provides full access to all 800 project manuscripts. It is searchable in English, French and Italian.

The team at the British Library tackled the processes required to transform the images and catalogue records into an IIIF format. For example, cataloguing in the Library’s Integrated Archives and Manuscripts System (IAMS) required expansion of the use of the authority files, including a systematic application of International Standard Name Identifier (ISNI) and Getty Thesaurus of Geographic Names (TGN) numbers, all of which facilitated the work toward multilingual search functionality for author and place information. The incorporation of authority files also allowed the author and place information to be present in the IIIF manifests, and thus to be displayed in the IIIF viewer. Collaboration with the Heritage Made Digital team and the Technology department’s Architecture and Design team allowed us to ingest 400 medieval manuscripts in the new IIIF format. In some ways, the project served as a pilot, paving the way for the continued transformation of the thousands of manuscripts that are currently available through the Digitised Manuscripts website in a pre-IIIF format. The project’s contribution in this respect was both pioneering and transformative.
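To illustrate how author and place information can travel inside a IIIF manifest, here is a minimal Python sketch. It assumes a manifest in the style of the IIIF Presentation API 2.x, where descriptive information sits in a "metadata" list of label/value pairs and a value may be either a plain string or a list of language-tagged entries. The manifest fragment, its URL, and the labels "Author" and "Place" are invented for this example and are not taken from the project's actual manifests; the helper name metadata_values is likewise hypothetical.

```python
import json

# Hypothetical manifest fragment in the style of IIIF Presentation API 2.x.
# The identifier, labels and values are invented for illustration only.
MANIFEST_JSON = """
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/example-ms/manifest.json",
  "@type": "sc:Manifest",
  "label": "Example MS 1",
  "metadata": [
    {"label": "Author", "value": "Example Author"},
    {"label": "Place",
     "value": [
       {"@value": "London", "@language": "en"},
       {"@value": "Londres", "@language": "fr"}
     ]}
  ]
}
"""


def metadata_values(manifest, label, language="en"):
    """Return the values stored under `label` in the manifest's metadata list.

    Plain-string values are always returned; language-tagged values are
    returned only when they match `language` or carry no language tag.
    Simplification: labels are assumed to be plain strings.
    """
    results = []
    for entry in manifest.get("metadata", []):
        if entry.get("label") != label:
            continue
        value = entry.get("value")
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                if item.get("@language") in (None, language):
                    results.append(item.get("@value"))
            else:
                results.append(item)
    return results


manifest = json.loads(MANIFEST_JSON)
print(metadata_values(manifest, "Author"))
print(metadata_values(manifest, "Place", language="fr"))
```

A viewer can use the same label/value pairs to show a manuscript's author and place alongside its images, and language-tagged values are one way a manifest can carry the same place name in more than one language.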

Side-by-side display of three manuscripts from Winchester: British Library Arundel MS 155, f. 12r, BnF Latin 987, f. 31r, and British Library Arundel MS 60, f. 13r, all in the same IIIF-compatible viewer.

The other innovative resource, the British Library-hosted Medieval England and France, 700–1200 website, presents a curated selection of these manuscripts, highlighting different topics and manuscripts in both English and French. Curious readers with a more general interest can explore themes such as medieval art, history and science on this website by reading articles and immersing themselves in the beautiful images that showcase some of the most spectacular highlights of the collections. Further, a number of short videos reveal how manuscripts were made and what they can tell us about the cultural exchange between England and France during the early Middle Ages.

The curated website Medieval England and France, 700–1200 is an online exhibition presenting medieval manuscripts and their significance through videos, articles, and short manuscript descriptions.

How did we do it? Preservation, cataloguing and digitisation

Here at the British Library the digitisation and cataloguing workflows proceeded in tandem for almost two years, and were joined by the web curation workflow for the final year of the project. A conservator checked each manuscript before it was photographed, and any necessary preservation work was performed to ensure that all manuscripts could be digitised safely. Two photographers, expert in handling rare manuscript material, worked for a year and a half to produce over 125,000 images. An imaging officer checked each image and processed it for display in two image viewers (Digitised Manuscripts and the IIIF-compatible project viewer).

One stage of the project team (left to right), front row: Emilia Henderson, Alison Ray, Tuija Ainonen, Jessica Pollard; back row: Clarck Drieshen, Carl Norman, Neil McCowlen; furthest back: Cristian Ispir.

All of the manuscripts were newly catalogued to include an up-to-date bibliography, the identification of texts and provenance, and descriptions of the artwork. Soon after the project began, the web curation joined the workflow, involving collaboration with a large number of external contractors, including article authors, a filmmaker, and a translator. Two project interns helped at various stages and gained valuable insight into the workings of a large international digitisation and curation project.

Team at the British Library

The Core team at the British Library (in alphabetical order): Tuija Ainonen (Project Curator and Manager), Calum Cockburn (Intern), Clarck Drieshen (Cataloguer), Andy Irving (Solutions Architect), Cristian Ispir (Cataloguer), Amy Jeffs (Intern), Neil McCowlen (Senior Imaging Technician), Laure Miolo (Cataloguer), Carl Norman (Senior Imaging Technician), Jessica Pollard (Conservator), Alison Ray (Imaging Officer and Curatorial Web Officer), and Kate Thomas (Imaging Officer).

From left: David Sparling congratulating Tuija Ainonen, Kate Thomas, Cristian Ispir, Calum Cockburn, and Clarck Drieshen, who accepted the BL Labs Staff Award 2018 on behalf of The Polonsky Foundation England and France Project team at the British Library.

In addition to these, many British Library staff members were involved at various stages of the delivery, and we wish to extend our gratitude to Jo Harrop, Lulu Paul, Mia Ridge, Nicolas Moretto and Sandra Tuppen, who helped us to actualise and improve the technical workflows along the way. We also wish to thank Alison Hudson, Chantry Westwell, and Emilia Henderson, who provided valuable content and comments at various stages of the project. Special thanks are due to the many staff members at the British Library who had the vision, gave their support and guided the team all along the way, and especially to the members of the internal and joint project boards (in alphabetical order): Claire Breay, Michele Burton, Paul Clements, Kathleen Doyle, Hannah Gabrielle, Karl Harris, Kristian Jensen, Scot McKendrick, Cordelia Rogerson, and Ben Sanderson.

The project was made possible thanks to the vision and support of The Polonsky Foundation as part of its mission to provide and improve access to our shared cultural heritage.

Tuija Ainonen

Follow us on Twitter @BLMedieval

#PolonskyPre1200

Part of the Polonsky Digitisation Project

In collaboration with the Bibliothèque nationale de France

Supported by The Polonsky Foundation

Watch the Polonsky England and France Project team receiving their award and talking about their project on our YouTube channel (clip runs from 15:05):

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

28 January 2019

BL Labs 2018 Teaching & Learning Award Winner: 'Pocket Miscellanies'

This guest blog is by the 2018 BL Labs Teaching & Learning Award winner, Jonah Coman.

Pocket Miscellanies were born as a response to a cluster of problems posed by digitisation and access to medieval content. Medieval images are rarely seen by non-medievalists and members of the general public outside of meme-based content. Offline and analog, the medievalist has no freely-available tools to educate or illustrate to a non-specialist what their research is about. The digital and physical zines showcase close-reading snippets of the digitised medieval manuscripts held by the British Library, as well as over 70 other institutions.

Figure 1. Leather binder with the first ten issues of Pocket Miscellanies. Photo © Eleanor May Baker.

Teaching and learning resource

The topics of the Pocket Miscellanies were selected to showcase the diversity of human representation in medieval manuscripts. This project is as political as it is educational. The first ten little volumes (#1 Adam, #2 Eve, #3 Temptation by the Snake, #4 Sex, #5 Sodom, #6 Trans bodies, #7 People of colour, #8 Racism, #9 Disability and #10 Mobility aids) set up the political project of this ongoing collection, concentrating on disenfranchised communities, such as people of colour, LGBTQ people and disabled people, in medieval visual culture. To date, there are ten published zines, but the project will expand to include over 80 topics, to be released gradually in the future.

The Pocket Miscellanies are distributed both online and offline as pocket-sized concertina books (usually distributed as collections), so that learners from different communities outwith the most obvious user groups (researchers, teachers, educators) gain access to digital content provided by national, regional and university libraries with comprehensive medieval digital content.

Publication DIY: online and offline

From a feminist medievalist position, the format of the zine was the obvious choice for distributable political scholarship. Zines (short for magazines) are DIY radical publications that elide the strictures of book publishing. Zine distribution models rely on sharing via social interaction: a zine can be a reminder of a discussion or political statement. Zines democratise knowledge that mainstream works might be afraid to tackle, or that might be suppressed by mainstream publication systems concerned with sales rather than radical ideas. The small, folded formats native to zines are also reminiscent of the materiality and physical formats of medieval and early modern books created for English readers, such as the Sarum books of hours and the folding almanac.

The Pocket Miscellanies have two pathways to impact. The digital version has been shared with medievalist and historian teachers and educators via the Issuu publication platform, garnering nearly a thousand unique readers in the months it has been online. The paper copy, of very small size, can be, and has been, distributed at conferences (Bodies Ignored in Leeds, Permeable Bodies in London), other public events (Edinburgh Pride, Glasgow and Dundee Zine Fest, Edinburgh book art and comic book conventions) and to non-specialists in casual conversation. Over 3000 paper copies have been printed and distributed for free since August. Both of these impact pathways have the advantage of accessibility – they are quick-and-dirty guides for non-specialists to learn about the most common depictions of a specific motif – as well as a history within the DIY teaching community.

Figure 2. Poster and zine display at the BL Labs Symposium, 11 Nov 2018. Photo © Ash.

The online version of the zines links to the digitised source hosted on the library’s own website, and is easily editable/correctable. Thanks to their digital form, each individual zine has been undergoing continual improvement since its initial online publication, via the open loop of online feedback and consumption facilitated by Twitter and Issuu. I use crowd-sourced information about the specific themes and amend the content to reflect spearheading scholarship in the field – information that has not been published yet and, in some cases, may never be published. This way, state-of-the-art research can be integrated in a quick publication and distribution circuit.

Figure 3. Screenshot from the Issuu.com/MxComan online library.

The paper copies are easily distributable in offline, analog spaces and provide a physical token of the learning experience. I use an independent publishing method historically widespread in queer communities, the zine, to create an analog version of 'viral content'. Zines are bricolage-fuelled, cheaply printed, freely distributed and easily discarded methods of teaching and information. Using the independent publishing medium of the zine, I created small chapbooks that can be printed at home, mixed and shared, carried in a pocket and left in community spaces and flier racks.

Figure 4. A bundle of the ten original zines. Photo © Ana Hine.

Rip-and-mix: how copyright can be the enemy of knowledge

Working with digitised content from tens of libraries across the world has proved frustrating because of the diversity of copyright policies. Modern libraries and research centres have a lot of power as gatekeepers of historical material. Texts and images that would be long out of copyright (virtually anything produced in the Middle Ages) are protected by many institutions under copyright, prohibiting (esp. commercial) reproduction. This affects what images researchers choose to present to a wider public; most academic publications will never be able to include the number of colour illustrations that the self-published zine format allows. The collaborative and radical DIY ethics of zine-making allow Pocket Miscellanies to be a disruptive alternative to the mainstream publishing industry, bringing cutting-edge research into print (and full-colour illustration) right now, at very small cost and an extremely agile pace.

The whole issue of copyright is where zines have been historically, and still are, so radical. Reproduction rights are different from publication rights; strict reproduction and redistribution rights are essentially violated by any dissemination of an image anywhere else but on its origin website. Attaching a ‘medieval reaction’ image to a tweet or Facebook post, as well as pinning it on a Pinterest board, is essentially in violation of most museums’ and auction houses’ extremely strict CC-BY-NC-ND+ rules. On the other side, 'publication' rights are eschewed by zines since, technically, zines are not publications. Unlike magazines, journals or books, zines do not have ISBNs, cannot count towards the REF, etc., so are essentially outlaws in terms of publication rights. Unlike mainstream publications, zines are predicated on an anarchist, bootleg, rip-and-mix aesthetic.

The Pocket Miscellany zines posed hard choices: do I follow the anarchic, disruptive and historically radical tradition of the zine, and use any digitised image that I can find, disregarding the copyright statements and challenging the hegemonic hold institutions have over historical images via aberrant legalities, or do I create a series of zines only with images obtained through legitimate avenues, choosing academic strictures for the advantage of being able to share them far and wide without breaking copyright terms? In the end, the content of the zines, showing collections of the same visual motif in a context of continuity, dictated my choice: having as varied examples of one image as possible was more important than being able to sell these zines in bookshops and gift-shops. At the same time, I chose to only use images that are OK to use in a non-commercial capacity, so none from libraries with ‘non-derivatives’ policies. These choices (half-punk, half-tame) made selling these zines in any form and at any price point impossible, so their production relies on donations.

The Pocket Miscellanies are an ongoing project. As I mentioned, I have over 80 topics planned, and half a dozen collaborations in the works. If you would like to share your expertise on a specific topic, please get in touch via Twitter @MxComan; if you want to support the project, as well as get your hands on some paper goodies, you can do so on Patreon. If you are organising a conference and you want to distribute any of the zines related to the conference, or even better, have me deliver an impact, public engagement and zine-making workshop at your conference, get in touch and we can discuss it further.

Watch Jonah receiving the winning award for Teaching and Learning, and talking about Pocket Miscellanies on our YouTube channel (clip runs from 10.32):

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.