THE BRITISH LIBRARY

Digital scholarship blog


21 February 2018

BL Labs 2017 Symposium: Opening up the British Library’s Early Indian Printed Books Collection (Staff Award Winner)


Making the British Library's valuable collection of early Bengali books more accessible to researchers and the general public around the world rests heavily on collaborative work across different teams of the Library and partners in the UK and abroad. The project team's commitment and passion have been supported by the contributions and expertise of collaborators, as well as by the forward-thinking vision of the Library, our partners and funders.

Receiving the BL Labs Staff Award 2017 is a great opportunity to thank everyone involved. 

Members of the Two Centuries of Indian Print team receiving the British Library Labs award at the Symposium on 30 October 2017
 
Tom Derrick (Digital Curator) was in India at the time the team received their award

The Two Centuries of Indian Print project is a partnership between the British Library, the School of Cultural Texts and Records (SCTR) at Jadavpur University, Srishti Institute of Art, Design and Technology, and the Library at SOAS University of London, among others. It has also involved collaborations with the National Library of India, and other institutions in India.

The AHRC Newton-Bhabha Fund and the Department for Business, Energy and Industrial Strategy have generously funded the work undertaken so far by the project, which focuses on early printed Bengali books. Many are unavailable in other library collections or are extremely difficult to locate and access. The project has undertaken a variety of initiatives, from digitising the books and enhancing the catalogue records in English and Bengali, to stimulating the use of digital humanities tools and techniques, running a programme of digital skills sharing and capacity-building workshops, and hosting the South Asia Series seminars. All of these initiatives greatly contribute to the discovery and study of the collection. The project is also conducting groundbreaking work on Optical Character Recognition (OCR) for Bangla script. OCR is not currently available for South Asian languages, and a viable OCR solution would enable full-text search of the books, paving the way for researchers to use natural language processing techniques to perform large-scale analysis across a large corpus covering a diverse range of topics relating to Indian society, religion and politics, to name but a few. Doing so will increase the possibilities for new discoveries in this field.

However, despite Bangla being one of the most widely spoken languages in the world, its script has been greatly underserved by providers of OCR solutions. This is due in part to the orthographical and typographical variations of recent centuries, which make building a dictionary and character 'classifier' more challenging. Because of the wide date range of the books we are digitising, these issues affect the quality of OCR. The physical condition of our historical books, including faded text, presents additional difficulties for creating machine-readable versions of the books.

To overcome these obstacles, the project team has been advancing the development of OCR for Bangla by organising an international competition that reviewed the state of the art in commercial and open-source text recognition tools. The results of the competition will be announced at the ICDAR 2017 conference in Kyoto. Watch this space! The competition dataset has been made openly available for download and reuse by any researchers or institutions who would like to experiment with OCR for Bengali.

A page from the Animal Biographies, VT 1712, showing its transcription produced for the ICDAR 2017 competition

The project has organised two Skills Exchange Programmes, hosting mid-career library professionals from the National Library of India at the British Library for a week and providing a packed programme of tours and talks from all areas of the Library. The project has also conducted digital skills sharing and capacity-building workshops for library professionals and archivists from cultural heritage institutions in India. The first workshop took place at Jadavpur University, Kolkata, in December 2016. Library and information professionals from cultural heritage institutions in Bengal took part in a one-day event to learn more about how information technology is transforming humanities research, and in turn library services, as well as methods for interrogating humanities-related datasets.

After the success of this first workshop, another event was held in July 2017, at which more than 30 library professionals discussed OCR developments for Bangla, tried out different tools, and discussed digital scholarship techniques and projects. Most recently, the project's digital curator facilitated a workshop on digitisation standards at the International Conference of Asian Libraries in Delhi. The workshops continue in earnest in the new year, with another digital humanities skills workshop planned for January 2018 in partnership with the Srishti Institute of Art, Design and Technology.

Attendees of the workshop held at Jadavpur University in December 2016 taking part in a group activity to discuss the application of digital humanities methods to library collections

The project team also held a two-day academic symposium on South Asian book history at Jadavpur University in the summer, with 17 speakers from India, wider South Asia, and the UK. Attendance was between 50 and 70 people a day and feedback was very good. We plan to produce a publication arising from this symposium, and to upload a video to our project webspace. The project also hosts a popular series of talks based around the Two Centuries of Indian Print project and the British Library's South Asia collections. The seminars take place fortnightly at the British Library. So far we have hosted a range of academics and researchers, from PhD students to senior academics from the UK and abroad, who share cutting-edge research in discussions chaired by curators and specialists in the field. The seminars have been a great success, attracting large audiences and speakers from around the world. We also host a number of show-and-tell sessions to raise awareness of our collection and to engage in community outreach.

Everyone on the project is thrilled to have won this award and we will be working hard in 2018 to continue bringing the Two Centuries of Indian Print project to the attention and use of researchers and the general public.

Submit a project for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.

Posted by BL Labs on behalf of The Two Centuries of Indian Print team.

15 February 2018

BL Labs 2017 Symposium: Git Lit, Learning & Teaching Award Runner Up


Applications of Distributed Version Control Technologies Toward the Creation of 50,000 Digital Scholarly Editions

The British Library maintains a collection of roughly 50,000 digital texts, scanned from public-domain books, most of which were originally published in the 19th century. As scanned books, their text format is Analyzed Layout and Text Object (ALTO) Extensible Markup Language (XML), a verbose markup format created by Optical Character Recognition (OCR) software, and one which is only marginally human-readable. Our project, Git-Lit, converts each text to the plain text format Markdown, creates version-controlled repositories for each using the distributed version control system Git, and posts the repositories to the project management platform GitHub, where they can be edited by anyone. Along the way, websites for each text, optimized for readability, are automatically generated via GitHub Pages. These websites integrate with the annotation platform Hypothes.is, enabling them to be annotated. In this way, Git-Lit aims to make this collection of British Library electronic texts discoverable, readable, editable, annotatable, and downloadable.
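By way of illustration, here is a minimal Python sketch of that conversion step. It assumes ALTO files in which each word is a <String> element with a CONTENT attribute, grouped into <TextLine> elements; the project's own conversion scripts handle far more structure and metadata than this.

```python
# Minimal sketch: extract plain text from an ALTO XML page.
# Assumes words are stored as <String CONTENT="..."> elements grouped into
# <TextLine> elements, as in standard ALTO; namespace URIs vary by version.
import xml.etree.ElementTree as ET

def alto_to_text(alto_path):
    root = ET.parse(alto_path).getroot()
    # Detect the namespace from the root tag so the code works across versions.
    ns = root.tag.split('}')[0].strip('{') if '}' in root.tag else ''
    q = (lambda tag: f'{{{ns}}}{tag}') if ns else (lambda tag: tag)
    lines = []
    for text_line in root.iter(q('TextLine')):
        words = [s.get('CONTENT', '') for s in text_line.iter(q('String'))]
        lines.append(' '.join(w for w in words if w))
    return '\n'.join(lines)

if __name__ == '__main__':
    # 'page_000001.xml' is a hypothetical file name used for the example.
    print(alto_to_text('page_000001.xml'))
```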

A screenshot of the website automatically generated from a British Library electronic text


The biggest advantage of using a distributed version control system like Git is that it leverages the kinds of decentralized collaboration workflows that have long been used in software development. Open-source software and web development, for which Git and GitHub were originally designed, is a much-studied methodology that has proven highly effective. Rather than maintaining a central silo for serving code and electronic texts, the decentralized approach ensures a plurality of textual versions. Since anyone may copy ("fork") a project, modify it, and create their own version, there is no single, canonical text, but many. Each version may freely borrow ("pull") from others, request that others integrate their changes ("pull request"), and discuss potential changes ("issues") using the project management subsystems of GitHub. This workflow streamlines collaboration and encourages external contributions. Furthermore, since each change ("commit") requires a description of, and reasons for, the change, the Git platform enforces the kind of editorial documentation necessary for scholarly editing. We like to think of Git-based editing, therefore, as scholarly editing, and of GitHub-based collaboration as a democratization of scholarly editing.
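To make the repository-per-text idea concrete, here is a small sketch (an illustration only, not the project's actual scripts) that initialises a Git repository for one converted text and records a documented first commit; the repository name, file name and commit message are invented for the example.

```python
# Illustrative sketch: one repository per converted text, with a descriptive
# first commit. Assumes the `git` command-line tool is installed and configured.
import subprocess
from pathlib import Path

def publish_text(repo_dir, markdown_text, filename, message):
    """Create a repository for one text and record a documented first commit."""
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    (repo / filename).write_text(markdown_text, encoding='utf-8')
    subprocess.run(['git', 'init'], cwd=repo, check=True)
    subprocess.run(['git', 'add', filename], cwd=repo, check=True)
    subprocess.run(['git', 'commit', '-m', message], cwd=repo, check=True)

publish_text('vdc_000000001',                      # hypothetical repository name
             '# A Sample Title\n\nBody of the converted text...\n',
             'text.md',
             'Initial import: OCR text converted from ALTO XML')
```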

Furthermore, since GitHub allows instant editing of texts in the web browser, it is a simple and intuitive method of crowdsourcing the text cleanup process. Since OCRd texts are often full of errors, GitHub allows any reader to correct an obvious OCR error she or he finds. The analogous process of reporting errors to centralized text repositories like Project Gutenberg has been known to take several years. On GitHub, however, it is instantaneous.

Not the least advantage of this setup is the automated creation of websites from the plain-text sources. Not only does this transform the Markdown into a clean, readable edition of the text, but it also provides integration with the annotation platform Hypothes.is. Hypothes.is allows for social annotation of a text, making it ideal for classroom use. Professors may assign a British Library text as a course reading and require their students to annotate it, an activity which can generate discussions in the limitless virtual margins of this electronic textual space.

The Git-Lit project has so far posted around 50 texts to GitHub as prototypes, with the full corpus of roughly 50,000 texts soon to come. After the full corpus is processed in this way, we'll begin enhancing some of the metadata. So far, we have developed techniques for probabilistically inferring the language of each text, and using Ben Schmidt's document vectorization method, Stable Random Projection, we have been able to probabilistically infer Library of Congress classifications as well. This enables the automatic generation of sub-corpora such as class PR (English literature) or PZ (fiction and juvenile literature).
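The post does not spell out the language-inference method, but as a rough illustration, an off-the-shelf detector such as the langdetect package returns per-language probabilities in a similar spirit; the sample sentence is invented.

```python
# Illustration only: probabilistic language identification with langdetect
# (pip install langdetect). Git-Lit's actual inference method may differ.
from langdetect import detect_langs, DetectorFactory

DetectorFactory.seed = 0  # make the detector deterministic across runs

sample = "It was a dark and stormy night; the rain fell in torrents."
for candidate in detect_langs(sample):
    print(candidate.lang, round(candidate.prob, 3))  # e.g. "en 1.0"
```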

In the coming year, we hope to integrate the Git-Lit-transformed British Library texts into a structured database, further enhancing the discoverability of the texts. We have just received a micro-grant from NYC-DH to help launch Corpus-DB, a project also aiming to produce textual corpora; through Corpus-DB, we will soon create a SQL database containing the metadata, our enhanced and inferred metadata, and other aggregated book data gleaned from public APIs. This will soon allow readers and computational text analysts to download groups of British Library electronic texts. Users interested in, say, all novels set in London will be able to get a complete full-text dump of all public-domain novels in this category by visiting a URL such as api.corpus-db.org/novels/setting/London. We expect that this will greatly streamline the corpus-creation process that takes up so much of the time in computational text analysis.
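Once such an endpoint exists, fetching a sub-corpus could be as simple as the sketch below. The URL is the example given above, and the JSON response format is an assumption rather than a published specification.

```python
# Sketch of the planned corpus download described above; endpoint and response
# format are assumptions based on the example URL in the post.
import requests

url = "http://api.corpus-db.org/novels/setting/London"  # example from the post
response = requests.get(url, timeout=30)
response.raise_for_status()
novels = response.json()   # assumed: a list of full-text records
print(f"Downloaded {len(novels)} texts")
```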

Both Git-Lit and Corpus-DB are open-source projects, open to contributions from anyone, regardless of skill. If you'd like to contribute to our project in some way, get in contact with us, and we'll tell you how you can help.

Jonathan Reeve

Jonathan Reeve is a third-year graduate student in the Department of English and Comparative Literature at Columbia University, where he specializes in computational literary analysis. Find his recent experiments at jonreeve.com.

If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library to find out who wins.

Posted by BL Labs on behalf of Jonathan Reeve

13 February 2018

BL Labs 2017 Symposium: Samtla, Research Award Runner Up


Samtla (Search And Mining Tools for Labelling Archives) was developed to address a need in the humanities for research tools that help to search, browse, compare, and annotate documents stored in digital archives. The system was designed in collaboration with researchers at Southampton University, whose research involved locating shared vocabulary and phrases across an archive of Aramaic Magic Texts from Late Antiquity. The archive contained texts written in Aramaic, Mandaic, Syriac, and Hebrew languages. Due to the morphological complexity of these languages, where morphemes are attached to a root morpheme to mark gender and number, standard approaches and off-the-shelf software were not flexible enough for the task, as they tended to be designed to work with a specific archive or user group. 

Figure 1: Samtla supports tolerant search, allowing queries to be matched exactly and approximately.

Samtla is designed to extract the same or similar information that may be expressed by authors in different ways, whether in the choice of vocabulary or the grammar. Traditionally, search and text mining tools have been based on words, which limits their use to corpora in languages where 'words' can be easily identified and extracted from text, e.g. languages with a whitespace character like English, French, and German. Word-based models tend to fail when the language is morphologically complex, like Aramaic and Hebrew. Samtla addresses these issues by adopting a character-level approach stored in a statistical language model. This means that rather than extracting words, we extract character sequences representing the morphology of the language, which we then use to match the search terms of the query and rank the documents according to the statistics of the language. Character-based models are language independent, as there is no need to preprocess the document, and we can locate words and phrases with a lot of flexibility. As a result, Samtla compensates for variability in language use, spelling errors made by users when they search, and errors in the document resulting from the digitisation process (e.g. OCR errors).
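As a toy illustration of the character-level idea (not Samtla's actual statistical language model), the sketch below indexes text by overlapping character n-grams, so that morphological variants of a query still share most of their n-grams with the documents that contain them.

```python
# Toy sketch: character n-gram overlap as a crude, language-independent
# similarity score. Samtla's statistical language model is far more sophisticated.
from collections import Counter

def char_ngrams(text, n=4):
    text = text.replace(' ', '_')  # keep word boundaries as a character
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(query, document, n=4):
    q, d = char_ngrams(query, n), char_ngrams(document, n)
    overlap = sum((q & d).values())        # shared n-grams (with multiplicity)
    return overlap / max(sum(q.values()), 1)

docs = {"doc1": "an aramaic incantation bowl text",   # invented examples
        "doc2": "completely unrelated content"}
for name, text in docs.items():
    print(name, round(similarity("aramaic incantations", text), 3))
```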

Figure 2: Samtla's document comparison tool displaying a semantically similar passage between two Bibles from different periods.

The British Library have been very supportive of the work, openly providing access to their digital archives. The archives ranged in domain, topic, language, and scale, which enabled us to test Samtla's flexibility to its limits. One of the biggest challenges we faced was indexing larger-scale archives of several gigabytes. Some archives also contained a scan of the original document together with metadata about the structure of the text. This provided a basis for developing new tools that bring researchers closer to the original object, including highlighting named entities over both the raw text and the scanned image.

Currently we are focusing on developing approaches for leveraging the semantics underlying text data in order to help researchers find semantically related information. Semantic annotation is also useful for labelling text data with named entities, and sentiments. Our current aim is to develop approaches for annotating text data in any language or domain, which is challenging due to the fact that languages encode the semantics of a text in different ways.

As a first step we are offering labelled data to researchers, as part of a trial service, in order to help speed up the research process, or provide tagged data for machine learning approaches. If you are interested in participating in this trial, then more information can be found at www.samtla.com.

Figure 3: Samtla's annotation tools label the texts with named entities to provide faceted browsing and data layers over the original image.

If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.


Posted by BL Labs on behalf of Dr Martyn Harris, Prof Dan Levene, Prof Mark Levene and Dr Dell Zhang.

05 February 2018

8th Century Arabic science meets today's computer science


Or, Announcing a Competition for the Automatic Transcription of Historical Arabic Scientific Manuscripts 

“An impartial view of Digital Humanities (DH) scholarship in the present day reveals a stark divide between ‘the West and the rest’…Far fewer large-scale DH initiatives have focused on Asia and the non-Western world than on Western Europe and the Americas…Digital databases and text corpora – the ‘raw material’ of text mining and computational text analysis – are far more abundant for English and other Latin alphabetic scripts than they are for Chinese, Japanese, Korean, Sanskrit, Hindi, Arabic and other non-Latin orthographies…Troves of unread primary sources lie dormant because no text mining technology exists to parse them.”

-Dr. Thomas Mullaney, Associate Professor of Chinese History at Stanford University

Supporting the use of Asian & African Collections in digital scholarship means shining a light on this stark divide and seeking ways to close the gap. In this spirit, we are excited to announce the ICFHR2018 Competition on Recognition of Historical Arabic Scientific Manuscripts.


The Competition

Drawing together experts from the British Library, The Alan Turing Institute, the Qatar Digital Library and PRImA Research Lab, our aim in launching this competition is to play an active role in advancing the state of the art in handwritten text recognition technologies for Arabic. For our first challenge we are focusing on finding an optimal solution for accurately and automatically transcribing historical Arabic scientific handwritten manuscripts.

Though such technologies are still in their infancy, unlocking historical handwritten Arabic manuscripts for large-scale text analysis has the potential to truly transform research. In conjunction with the competition, we hope to build a substantial image and ground-truth dataset and make it freely and openly available to support continued efforts in this area.

Enter the Competition

Organisers

Apostolos Antonacopoulos, Professor of Pattern Recognition, University of Salford, and Head of the Pattern Recognition and Image Analysis (PRImA) research lab
Christian Clausner, Research Fellow at the PRImA research lab
Nora McGregor, Digital Curator, Asian & African Collections, British Library
Daniel Lowe, Curator, Arabic Collections, British Library
Daniel Wilson-Nunn, PhD student at the University of Warwick and Turing PhD student based at The Alan Turing Institute
Bink Hallum, Arabic Scientific Manuscripts Curator, British Library/Qatar Foundation Partnership

Further reading

For more on recent Digital Research Team text recognition and transcription projects see:

 

This post is by Nora McGregor, Digital Curator, British Library. She is on Twitter as @ndalyrose.

01 February 2018

BL Labs 2017 Symposium: A large-scale comparison of world music corpora with computational tools, Research Award Winner


A large-scale comparison of world music corpora with computational tools.

By Maria Panteli, Emmanouil Benetos, and Simon Dixon from the Centre for Digital Music, Queen Mary University of London

The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We combine music recordings from two archives, the Smithsonian Folkways Recordings and the British Library Sound Archive, to create one of the largest world music corpora studied so far (8,200 geographically balanced recordings sampled from a total of 70,000 recordings). This work was submitted for the 2017 British Library Labs Awards in the Research category.
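As a rough illustration of what geographically balanced sampling can mean in practice (the study's own sampling procedure may differ, and the field names and cap below are invented), one can cap the number of recordings drawn from each country:

```python
# Illustrative sketch: draw at most `per_country_cap` recordings per country
# so that no single country dominates the corpus. Not the study's actual code.
import random

def balanced_sample(records, per_country_cap, seed=0):
    random.seed(seed)
    by_country = {}
    for rec in records:
        by_country.setdefault(rec['country'], []).append(rec)
    sample = []
    for recs in by_country.values():
        random.shuffle(recs)
        sample.extend(recs[:per_country_cap])
    return sample

catalogue = [{'id': 1, 'country': 'Botswana'},
             {'id': 2, 'country': 'China'},
             {'id': 3, 'country': 'China'}]
print(len(balanced_sample(catalogue, per_country_cap=1)))  # -> 2
```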

Our aim is to explore relationships of music similarity between different parts of the world. The history of cultural exchange goes back many years and music, an essential cultural identifier, has travelled beyond country borders. But is this true for all countries? What if a country is geographically isolated or its society resisted external musical influence? Can we find such music examples whose characteristics stand out from other musics in the world? By comparing folk and traditional music from 137 countries we aim to identify geographical areas that have developed a unique musical character.


Methodology: Signal processing and machine learning methods are combined to extract meaningful music representations from the sound recordings. Data mining methods are applied to explore music similarity and identify outlier recordings.

We use digital signal processing tools to extract music descriptors from the sound recordings capturing aspects of rhythm, timbre, melody, and harmony. Machine learning methods are applied to learn high-level representations of the music and the outcome is a projection of world music recordings to a space respecting music similarity relations. We use data mining methods to explore this space and identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’ and study their geographical patterns. More details on the methodology are provided here.
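For readers curious what such a pipeline looks like in code, here is a heavily simplified sketch using librosa for a basic timbre descriptor and scikit-learn's IsolationForest for outlier detection. The study itself uses much richer rhythm, timbre, melody and harmony descriptors and learned representations, so this shows only the general shape of the approach; the file names are illustrative.

```python
# Simplified sketch: one timbre descriptor per recording, then outlier detection.
# Requires librosa and scikit-learn; not the features or detector used in the study.
import numpy as np
import librosa
from sklearn.ensemble import IsolationForest

def timbre_descriptor(path, sr=22050):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)                      # one vector per recording

paths = ["recording_001.wav", "recording_002.wav"]   # illustrative file names
features = np.vstack([timbre_descriptor(p) for p in paths])

detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(features)           # -1 marks an outlier
print([p for p, label in zip(paths, labels) if label == -1])
```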

 


 

Distribution of outliers per country: The colour scale corresponds to the normalised number of outliers per country, where 0% indicates that none of the recordings of the country were identified as outliers and 100% indicates that all of the recordings of the country are outliers.

We observed that, out of 137 countries, Botswana had the most outlier recordings compared to the rest of the corpus. Music from China, characterised by bright timbres, was also found to be relatively distinct compared to music from its neighbouring countries. Analysis with respect to different features revealed that African countries such as Benin and Botswana showed the largest number of rhythmic outliers, with recordings often featuring polyrhythms. Harmonic outliers originated mostly from South and Southeast Asian countries such as Pakistan and Indonesia, and African countries such as Benin and Gambia, with recordings often featuring inharmonic instruments such as the gong and bell. You can explore and listen to music outliers in this interactive visualisation. The datasets and code used in this project are included in this link.


Interactive visualisation to explore and listen to music outliers.

This line of research makes a large-scale comparison of recorded music possible, a significant contribution to ethnomusicology, and one we believe will help us better understand the music cultures of the world.

Posted by British Library Labs.

 

23 January 2018

Using Transkribus for handwritten text recognition with the India Office Records


In this post, Alex Hailey, Curator, Modern Archives and Manuscripts, describes the Library's work with handwritten text recognition.

National Handwriting Day seems like a good time to introduce the Library’s initial work with the Transkribus platform to produce automatic Handwritten Text Recognition models for use with the India Office Records.

Transkribus is produced and supported as part of the READ project, and provides a platform 'for the automated recognition, transcription and searching of historical documents'. Users upload images and then identify areas of writing (text regions) and lines within those regions. Once a page has been segmented in this way, users transcribe the text to produce a 'ground truth' transcription – an accurate representation of the text on the page. The ground truth texts and images are then used to train a recurrent neural network to produce a tool to transcribe texts from images: a Handwritten Text Recognition (HTR) model.

Page segmented using the automated line identification tool. The document structure tree can be seen in the left panel.

After hearing about the project at the Linnean Society’s From Cabinet to Internet conference in 2015, we decided to run a small pilot project using material digitised as part of the Botany in British India project.

Producing ground truth text and Handwritten Text Recognition (HTR) models

We created an initial set of ground truth training data for 200 images, produced by India Office curators and with the help of a PhD student. This data was sent to the Transkribus team to produce our first HTR model. We also supplied material for the construction of a dictionary to be used alongside the HTR, based on the text from the botany chapter of Science and the Changing Environment in India 1780-1920 and contemporary botanical texts.

The accuracy of an HTR model can be determined by generating an automated transcription, correcting any errors, and then comparing the two versions. The Transkribus comparison tool calculates a Character Error Rate (CER) and a Word Error Rate (WER), and also provides a handy visualisation. With our first HTR model we saw an average CER of 30% and WER of 50%, which reflected the small size of the training set and the number of different hands across the collections.

(Transkribus recommends using collections with one or two consistent hands, but we thought we would push on regardless to get an idea of the challenges when using complex, multi-authored archives).

WER and CER are quite unforgiving measures of accuracy. The image above has 18.5% WER and 9.5% CER
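For the curious, CER is typically computed as the character-level edit distance between the HTR output and the corrected ground truth, divided by the length of the ground truth; WER is the same calculation over word tokens. The sketch below shows that calculation (Transkribus's own comparison tool may differ in detail, and the example sentences are invented).

```python
# Sketch: character and word error rates via Levenshtein edit distance.
def edit_distance(ref, hyp):
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

ground_truth = "The collector forwarded the specimens to Calcutta"
htr_output = "The collecter forworded the specimens to Calcutta"
print(round(cer(ground_truth, htr_output), 3), round(wer(ground_truth, htr_output), 3))
```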

For our second model we created an additional 500 pages of ground truth text, resulting in a training set of 83,358 words over 14,599 lines. We saw a marked improvement in results with this second HTR model – an average WER of 30%, and CER of 15%.

Graph showing the learning curve for our second HTR model, measured in CER

Improvements in the automatic layout detection and the ability to run the HTR over images in batches mean that we can now generate ground truth more quickly by correcting computer-produced transcriptions than we could through a fully manual process. We have since generated and corrected an additional 200 pages of transcriptions, and have expanded the training dataset for our next HTR model.

Lessons learned and next steps

We have now produced over 800 pages of corrected transcriptions using Transkribus, and have a much better idea of the challenges that the India Office material poses for current HTR technologies. Pages with margins and inconsistent paragraph widths prove challenging for the automatic layout detection, although the line identification has improved significantly, and tends to require only minor corrections (if any). Faint text, numerals, and tabulated text appeared to pose problems for our HTR models, as did particularly elaborate or lengthy ascenders and descenders.

More positively, we have signed a Memorandum of Understanding with the READ project, and are now able to take part in the exciting conversations around the transcription and searching of digitised manuscript materials, which we can hopefully start to feed into developments at the Library. The presentations from the recent Transkribus Conference are a good place to start if you want to learn more.

The transcriptions will be made available to researchers via data.bl.uk, and we are also planning to use them to test the ingest and delivery of transcriptions for manuscript material via the Universal Viewer.

By Alex Hailey, Curator, Modern Archives and Manuscripts

If you liked this post, you might also be interested in The good, the bad, and the cross-hatched on the Untold Lives blog.

17 January 2018

BL Labs 2017 Symposium: Keynote Talk by Josie Fraser


The fifth annual British Library Labs Symposium kicked off with an inspiring keynote speech by Josie Fraser, entitled ‘Open, Digital, Inclusive: Unleashing Knowledge’.

As well as working as senior technology adviser within the National Technology Team at the UK Government's Department for Digital, Culture, Media and Sport, Josie is currently the Chair of Wikimedia UK.

Josie discussed the impact of the open knowledge movement on education and learning. She looked at the powerful role that Wikimedia UK and Wikimedians have played in bringing UK cultural institutions and their digital collections to new and wider audiences. Her talk also explored how open knowledge partnerships are driving diversity and better representation for all online. At the end, she took questions from the audience and invited them to join her in exploring ideas and opportunities for the future.

You can see a video of the full talk, with an introduction by Dr Adam Farquhar, Head of Digital Scholarship at the British Library, here:

You can follow this link to see her slides:

https://www.slideshare.net/labsbl/open-digital-inclusive-unleashing-knowledge

The sixth BL Labs Symposium will be held on 12 November 2018.

Posted by Eleanor Cooper, Project Officer BL Labs.

22 December 2017

All I want for Christmas is... playbills!


Digital Curator Mia Ridge with an update on our playbills crowdsourcing project (with apologies to Mariah Carey for the dodgy headline)...

What do you do once you've eaten all the chocolates and cheese and watched all the Christmas movies? If you haven't had a go at transcribing historic playbills yet, the holidays are a great time to start.

Home, Sweet Home from: A collection of playbills from miscellaneous theatres: Nottingham - Oswestry 1755-1848 ([British Isles]: [s.n.], 1755-1848) <http://access.bl.uk/item/viewer/ark:/81055/vdc_100022589132.0x000002>

As 2017 turns into 2018, we thought it was time for an update on our progress with In the Spotlight. We've had over 20,000 contributions from over 2,000 visitors from 61 countries. Together, they've completed 21 sets of tasks on individual volumes - a wonderful result. We're still analysing it but the transcribed data looks good so far. Our next step is agreeing the details of including the results in the Library's catalogue - once that's done, information from individual playbills will be searchable for the first time.

Since the project launched in early November we've had some fantastic feedback, questions and comments on our forum and on social media. For example, Sylvia Morris (@sylvmorris1) has written two blog posts, 'International Migrants Day: Ira Aldridge and theatre' and 'British Library project enlists public to transcribe historical playbills'. Twitter users like @e_stanf shared fantastic images they'd discovered, and we even made The Stage and the Russian media! Look out for more updates and blog posts from project participants in the new year.

Questions from our participants include a request from a PhD student to collect references to plays set at fairs. A question about plays being 'for the benefit of' led to the Wikipedia entry for 'benefit performances' being updated with one of our images. Share your curiosities and questions on our forum or on Twitter - we love hearing from you!

We haven't forgotten about Convert-a-Card in the excitement of launching In the Spotlight. Since launch, this project for digitising information from old card catalogues has had over 33,000 contributions. Early in the new year, we'll be adding a thousand new records to the Library's catalogue. Our thanks to everyone who's made a contribution.

So if you're looking for entertainment these holidays, we invite you to step Into the Spotlight at http://playbills.libcrowds.com and discover how people entertained themselves before Netflix!