THE BRITISH LIBRARY

Digital scholarship blog

35 posts categorized "Printed books"

03 October 2019

BL Labs Symposium (2019): Book your place for Mon 11-Nov-2019


Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the seventh annual British Library Labs Symposium will be held on Monday 11 November 2019, from 9:30 - 17:00* (see note below) in the British Library Knowledge Centre, St Pancras. The event is FREE, and you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early!

*Please note that, directly after the Symposium, we have teamed up with an interactive/immersive theatre company called 'Uninvited Guests' for a specially organised early evening event for Symposium attendees (the full cost is £13 with some concessions available). Read more at the bottom of this posting!

The Symposium showcases innovative and inspiring projects which have used the British Library’s digital content. Last year's Award winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

The annual event provides a platform for the development of ideas and projects, facilitating collaboration, networking and debate in the Digital Scholarship field as well as being a focus on the creative reuse of the British Library's and other organisations' digital collections and data in many other sectors. Read what groups of Master's Library and Information Science students from City University London (#CityLIS) said about the Symposium last year.

We are very proud to announce that this year's keynote will be delivered by scientist Armand Leroi, Professor of Evolutionary Biology at Imperial College, London.

Armand Leroi
Professor Armand Leroi from Imperial College
will be giving the keynote at this year's BL Labs Symposium (2019)

Professor Armand Leroi is an author, broadcaster and evolutionary biologist.

He has written and presented several documentary series on Channel 4 and BBC Four. His latest documentary was The Secret Science of Pop for BBC Four (2017), which presented the results of an analysis of over 17,000 western pop songs from the US Billboard Hot 100 charts from 1960 to 2010, carried out together with colleagues from Queen Mary University, with further work published through the Royal Society. Armand has a special interest in how we can apply techniques from evolutionary biology to ask important questions about culture, humanities and what is unique about us as humans.

Previously, Armand presented Human Mutants, a three-part documentary series about human deformity for Channel 4, which also appeared as an award-winning book, Mutants: On Genetic Variety and the Human Body. He also wrote and presented a two-part series, What Makes Us Human, also for Channel 4. On BBC Four, Armand presented the documentaries What Darwin Didn't Know and Aristotle's Lagoon, and released the book The Lagoon: How Aristotle Invented Science, which looks at Aristotle's impact on science as we know it today.

Armand's keynote will reflect on his interest and experience in applying techniques from evolutionary biology that he has used over many years, such as bioinformatics, data-mining and machine learning, to ask meaningful 'big' questions about culture, humanities and what makes us human.

The title of his talk will be 'The New Science of Culture'. Armand will follow in the footsteps of previous prestigious BL Labs keynote speakers: Dan Pett (2018); Josie Fraser (2017); Melissa Terras (2016); David De Roure and George Oates (2015); Tim Hitchcock (2014); Bill Thompson and Andrew Prescott (2013).

The symposium will be introduced by the British Library's new Chief Librarian, Liz Jolly. The day will include an update and exciting news from Mahendra Mahey (BL Labs Manager at the British Library) about the work of BL Labs, highlighting the innovative collaborations it has been working on, including how it is working with Labs around the world to share experiences, knowledge and lessons learned. There will be news from the Digital Scholarship team about the exciting projects they have been working on, such as Living with Machines and other initiatives, together with a special insight from the British Library’s Digital Preservation team into how they attempt to preserve our digital collections and data for future generations.

Throughout the day, there will be several announcements and presentations showcasing work from projects nominated for the BL Labs Awards 2019, which recognise work that has used the British Library’s digital content in artistic, research, educational and commercial activities.

There will also be a chance to find out who has been nominated and recognised for the British Library Staff Award 2019 which highlights the work of an outstanding individual (or team) at the British Library who has worked creatively and originally with the British Library's digital collections and data (nominations close midday 5 November 2019).

As is our tradition, the Symposium will have plenty of opportunities for networking throughout the day, culminating in a reception for delegates and British Library staff to mingle and chat over a drink and nibbles.

Finally, we have teamed up with the interactive/immersive theatre company 'Uninvited Guests', who will give a specially organised performance for BL Labs Symposium attendees directly after the symposium. This participatory performance will take the audience on a journey through a world that is on the cusp of a technological disaster. Our period of history could vanish forever from human memory because digital information will be wiped out for good. How can we leave a trace of our existence to those born later? Don't miss out on the chance to book this unique event at 5pm, specially organised to coincide with the end of the BL Labs Symposium. For more information, and for booking (spaces are limited), please visit here (the full cost is £13 with some concessions available). Please note, if you are unable to join the 5pm show, there will be another performance at 19:45 the same evening (book here for that one).

So don't forget to book your place for the Symposium today, as we predict it will be another full house and we don't want you to miss out.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

02 October 2019

The 2019 British Library Labs Staff Award - Nominations Open!


Looking for entries now!

A set of four light bulbs presented next to each other; the third light bulb is switched on. The image is a metaphor for an 'idea'.
Nominate a British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2019 British Library Labs Staff Award, now in its fourth year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself (if you are a member of staff), for the Staff Award using this form.

The deadline for submission is 12:00 (GMT), Tuesday 5 November 2019.

Nominees will be highlighted on Monday 11 November 2019 at the British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects.

You can see the projects submitted by members of staff for the last two years' awards in our online archive, as well as blogs for last year's winners and runners-up.

The Staff Award complements the British Library Labs Awards, introduced in 2015, which recognise outstanding work that has been done in the broader community. Last year's winner focused on the brilliant work of the 'Polonsky Foundation England and France Project: Digitising and Presenting Manuscripts from the British Library and the Bibliothèque nationale de France, 700–1200'.

The runner up for the BL Labs Staff Award last year was the 'Digital Documents Harvesting and Processing Tool (DDHAPT)' which was designed to overcome the problem of finding individual known documents in the United Kingdom's Legal Deposit Web Archive.

In the public competition, last year's winners drew attention to artistic, research, teaching & learning, and commercial activities that used our digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It was previously funded by the Andrew W. Mellon Foundation and is now solely funded by the British Library.

If you have any questions, please contact us at labs@bl.uk.

 

30 August 2019

Using Transkribus for automated text recognition of historical Bengali Books


In this post Tom Derrick, Digital Curator, Two Centuries of Indian Print, explains the Library's recent use of Transkribus for automated text recognition of Bengali printed books.

Are you working with digitised printed collections that you want to 'unlock' for keyword search and text mining? Maybe you have already heard about Transkribus but thought it could only be used for automated recognition of handwritten texts. If so you might be surprised to hear it also does a pretty good job with printed texts too. You might be even more surprised to hear it does an impressive job with printed texts in Indian scripts! At least that is what we have found from recent testing with a batch of 19th century printed books written in Bengali script that have been digitised through the British Library’s Two Centuries of Indian Print project.

Transkribus is a READ project and is available as a free tool for users who want to automate recognition of historical documents. The British Library has already had some success using Transkribus on manuscripts from our India Office collection, and it was that which inspired me to see how it would perform on the Bengali texts, which provide an altogether different type of challenge.

For a start, most text recognition solutions either do not support Indian scripts, or do not reach close to the same level of recognition as they do with documents written in English or other Latin scripts. In part this is down to supply and demand. Mainstream providers of tools have prioritised Western customers, yet there is also the relative lack of digitised Indian texts that can be used to train text recognition engines.

These text recognition engines have also been well trained on modern dictionaries, whereas a collection of historical texts like the Bengali books will often contain words which are no longer in use. Their aged physicality also brings with it the delights of faded print, blotchy paper and other paper-based gremlins that keep conservationists in work yet disrupt automated text recognition. Throw in an extensive alphabet that contains more diverse and complicated character forms than English and you can start to piece together how difficult it can be to train recognition engines to achieve comparable results with Bengali texts.

So it was with more hope than expectation that I approached Transkribus. We began by selecting 50 pages from the Bengali books that represented, as far as possible, the variety of typographical and layout styles within the wider collection of c. 500,000 pages. Not an easy task! We uploaded these to Transkribus, manually segmenting paragraphs into text regions and automating line recognition. We then manually transcribed the texts to create a ground truth which, together with the scanned page images, was used to train the recurrent neural network within Transkribus to create a model from the 5,700 transcribed words.

Screenshot of a page from one of the British Library's Bengali books within the Transkribus viewer showing segmentation of the page by green bounding boxes around paragraphs and underlined text lines. Typed transcriptions of the text are shown below the page image.

The model was tested on a few pages from the wider collection, and the results are clearly communicated via the graph below. The model achieved an average character error rate (CER) of 21.9%, which is comparable to the best results we have seen from other text recognition services. Word accuracy of 61% was based on the number of words that were misspelled in the automated transcription compared to the ground truth. Eventually we would like to use automated transcriptions to support keyword searching of the Bengali books online, and the higher the word accuracy, the greater the chance of users pulling back all relevant hits from their keyword search. We noticed the results often missed the upper zone of certain Bengali characters, i.e. the part of the character or glyph which resides above the matra line that connects characters in Bengali words. Further training focused on recognition of these characters may improve the results.
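As a rough illustration of the two figures quoted above, the sketch below computes a character error rate and a word accuracy from a ground-truth line and an automated transcription. It assumes the common edit-distance definitions of both measures; the post does not say exactly how Transkribus calculates them, and the function names here are hypothetical. Note also that Python compares strings one Unicode code point at a time, so for Bengali a conjunct or a vowel sign counts as several "characters", which may differ from how a given tool counts them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete from the reference
                            curr[j - 1] + 1,      # insert into the reference
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth, transcription):
    """Character edits needed, divided by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, transcription) / len(ground_truth)


def word_accuracy(ground_truth, transcription):
    """One minus the word-level error rate (assumed definition), floored at 0."""
    ref_words = ground_truth.split()
    if not ref_words:
        raise ValueError("ground truth must contain at least one word")
    errors = edit_distance(ref_words, transcription.split())
    return max(0.0, 1 - errors / len(ref_words))


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the quick brwn fox"
    print(f"CER: {character_error_rate(truth, ocr):.3f}")
    print(f"Word accuracy: {word_accuracy(truth, ocr):.2f}")
```

A single wrong character makes the whole word wrong, which is why word accuracy usually falls much further than the character error rate alone would suggest.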

Screenshot of a graph showing the learning curve of the Bengali model using the Transkribus HTR tool which achieved 21.91% character error rate

Our training set of 50 pages is very small compared to other projects using Transkribus and so we think the accuracy could be vastly improved by creating more transcriptions and re-training the model. However, we're happy with these initial results and would encourage others in a similar position to give Transkribus a try.

 

 

19 March 2019

BL Labs 2018 Commercial Award Runner Up: 'The Seder Oneg Shabbos Bentsher'


This guest blog was written by David Zvi Kalman on behalf of the team that received the runner up award in the 2018 BL Labs Commercial category.


The bentsher is a strange book, both invisible and highly visible. It is not among the more well known Jewish books, like the prayerbook, Hebrew Bible, or haggadah. You would be hard pressed to find a general-interest bookstore selling a copy. Still, enter the house of a traditional Jew and you’d likely find at least a few, possibly a few dozen. In Orthodox communities, the bentsher is arguably the most visible book of all.

Bentshers are handbooks containing the songs and blessings, including the Grace after Meals, that are most useful for Sabbath and holiday meals, as well as larger gatherings. They are, as a rule, quite small. These days, bentshers are commonly given out as party favors at Jewish weddings and bar/bat mitzvahs, since meals at those events require them anyway. Many bentshers today have personalized covers relating the events at which they were given.

Bentshers have never gone out of print. By this I mean that printing began with the invention of the printing press and has never stopped. They are small, but they have always been useful. Seder Oneg Shabbos, the version which I designed, was released 500 years after the first bentsher was published. It is, in a sense, a Half Millennium Anniversary Special Edition.


Bentshers, like other Jewish books, could be quite ornate; some were written and illustrated by hand. Over the years, however, bentshers have become less and less interesting, largely in order to lower the unit cost. In order to make it feasible for wedding planners to order hundreds at a time, all images were stripped from the books, the books themselves became very small, and any interest in elegant typography was quickly eliminated. My grandfather, who designed custom covers for wedding bentshers, simply called the book, “the insert.” Custom prayerbooks were no different from custom matchbooks.

This particular bentsher was created with the goal of bucking this trend; I attempted to give the book the feel of some of the Jewish books and manuscripts of the past, using the research I was able to gather as a graduate student in the field of Jewish history. Doing this required a great deal of image research; for this, the British Library’s online resources were incredibly valuable. Of the more than one hundred images in the book, a plurality are from the British Library’s collections.

https://data.bl.uk/hebrewmanuscripts/

https://www.bl.uk/hebrew-manuscripts


In addition to its visual element, this bentsher differs from others in two important ways. First, it contains ritual language that is inclusive of those in the LGBTQ community, and especially of those conducting same-sex weddings. Second, the book contains songs not just in Hebrew but in Yiddish as well; this was a homage to two Yiddishists who aided in creating the bentsher’s content. The bentsher was first used at their wedding.


More here: https://shabb.es/sederonegshabbos/

Watch David accepting the runner up award and talking about the Seder Oneg Shabbos Bentsher on our YouTube channel (clip runs from 5.33 to 7.26): 

David Zvi Kalman was responsible for the book’s design, including the choice of images. He is a doctoral candidate at the University of Pennsylvania, where he focuses on the relationship between Jewish history and the history of technology. Sarah Wolf is a specialist in rabbinics and is an assistant professor at the Jewish Theological Seminary of America. Joshua Schwartz is a doctoral student at New York University, where he studies Jewish mysticism. Sarah and Joshua were responsible for most of the book’s translations and transliterations. Yocheved and Yudis Retig are Yiddishists and were responsible for the book’s Yiddish content and translations.

Find out more about Digital Scholarship and BL Labs. If you have a project which uses British Library digital content in innovative and interesting ways, consider applying for an award this year! The 2019 BL Labs Symposium will take place on Monday 11 November at the British Library.

28 February 2019

The World Wide Lab: Building Library Labs - Part II


Abstract illustration featuring ropes and ships from 19th Century book

We're setting sail for Denmark! Along with colleagues from the UK, Austria, Belgium, Egypt, Finland, Germany, Ireland, Latvia, Luxembourg, the Netherlands, Qatar, Spain, Sweden and the USA, we will be mooring at Copenhagen's Black Diamond, waterfront home to Denmark's Royal Library, for the second International Building Library Labs event: 4-5 March 2019.

Royal Danish Library logo and British Library logo

For some time now, leading national, state, university and public libraries around the world have been creating 'digital lab type environments'. The purpose of these 'laboratories' is to afford access to their institutions' digital content - the digitised and 'born digital' collections as well as data - and to provide a space where users can experiment and work with that content in creative, innovative and inspiring ways. Our shared ethos is to open up our collections for everyone: digital researchers, artists, entrepreneurs, educators, and everyone in between.

BL Labs has been running in such a capacity for six years. In September 2018, we hosted a 2-day workshop at the British Library in London for invited participants from national, state and university libraries - the first event of its kind in the world. It was a resounding success, and it was decided that we should organise a second event, this time in collaboration with our colleagues in Copenhagen.

19th century book illustration featuring three ship steering wheels with city names written on them

Next week's participants, from over 30 institutions, will be sharing lessons learned, talking about innovative projects and services that have used their digital collections and data in clever ways, and continuing to establish the foundations for an international network of Library Labs. We aim to work together in the spirit of collaboration so that we can continue to build even better Library Labs for our users in the future.

Our packed programme is available to view on Eventbrite or as a Google Doc. We still have a few spaces left, so if you are interested in coming along, you can still book here. As well as presentations and plenary debates, we will have eight lightning talks with topics ranging from how to handle big data to how to run a data visualisation lab. To accommodate our many delegates, with their own interests and specialisms, we will break out into 12 parallel discussion groups focusing on subjects such as how to set up a lab; how to get access to data; moving from 'project' lab to 'business as usual'; data curation; how to deal with large datasets; and using Labs & Makerspaces for data-driven research and innovation in creative industries.

We will blog again after the event, and provide links to some of the presentations and outputs. Watch this space! 

Abstract 19th century book illustration featuring seagulls and ship carpentry

Danish-themed images trawled from our British Library Flickr Images set: pages 37, 126, and 15 of Copenhagen, the Capital of Denmark, published by the Danish Tourist Society, 1898. Find the original book here.

Posted by Eleanor Cooper on behalf of BL Labs

26 February 2019

Competition to automate text recognition for printed Bangla books


You may have seen the exciting news last week that the British Library has launched a competition on recognition of historical Arabic scientific manuscripts that will run as part of ICDAR2019. We thought it only fair to cover printed material too! So we’re running another competition, also at ICDAR, for automated text recognition of rare and unique printed books written in Bangla that have been digitised through the Library's Two Centuries of Indian Print project.

Some of you may remember the Bangla printed books competition that took place at ICDAR2017, which generated significant interest among academic institutions and technology providers both in India and across the world. The 2017 competition set the challenge of finding an optimal solution for automating recognition of Bangla printed text and resulted in Google’s method performing best for both text detection and layout analysis.

Fast forward to 2019 and, thanks to Jadavpur University in Kolkata, we have added more ground truth transcriptions for competition entrants to train their OCR systems with. We hope that the competition encourages submissions again from cutting-edge OCR methods leading to a solution that can truly open up these historic books, dating between 1713 and 1914, for text mining, enabling scholars of South Asian studies to explore hundreds of thousands of pages on a scale that has not been possible until now.

Image showing a transcribed page from one of the Bengali books featured in the ICDAR2019 competition

As with the Arabic competition, we are collaborating with PRImA (Pattern Recognition & Image Analysis Research Lab) who will provide expert and objective evaluation of OCR results produced through the competition. The final results will be revealed at the ICDAR2019 conference in Sydney in September.

So if you missed out last time but are interested in testing your OCR systems on our books, the competition is now open! For instructions on how to apply and more about the competition, please visit https://www.primaresearch.org/REID2019/

 

This post is by Tom Derrick, Digital Curator for Two Centuries of Indian Print, British Library. He is on Twitter as @TommyID83 and Two Centuries of Indian Print tweet from @BL_IndianPrint

 

18 February 2019

Updated Eighteenth-Century Collections Online


The traditional, somewhat stereotypical image of the researcher of things past has not changed much in recent times. There is nothing easier than to imagine a scholar sitting at a scarcely illuminated wooden desk, surrounded by piles of old hardbound volumes, spending hours on end rummaging through the sheets in search of a clue.

In the field of eighteenth-century studies, this is certainly still the case. Scholars often go on a pilgrimage to prestigious repositories such as the British Library. However, in the last fifteen years or so, technology has started to offer attractive alternatives to the pleasure of travelling to London. Powered by Gale-Cengage, the Eighteenth-Century Collections Online (commonly referred to as ECCO) is a well-known resource that provides access to English-language and foreign-language publications printed in Britain, Ireland and the American colonies during the eighteenth century. This extensive collection contains over 180,000 titles (200,000 volumes) and allows full-text searching of some 32 million pages. These are digital editions based on the Eighteenth Century microfilming that started in 1981 and the English Short Title Catalogue.

New ECCO main screen
New ECCO home page

Moving away from its classic web-1.0 design, the Gale-Cengage team recently decided to revamp the layout of ECCO – indeed, of their entire portfolio of archive products, which include among others the Seventeenth- and Eighteenth-Century Burney Newspapers Collection. The aim is to make the Gale Primary Sources experience more consistent and intuitive for the user. At the head of this delicate operation are product managers Doran Steele and Megan Sullivan, who lead a nine-person team of software developers, content engineers, researchers and designers. Not quite the IT-only type of personnel, Doran and Megan are scholars themselves, respectively holding degrees in History and Information Science and a remarkable passion for all things past. They are responsible for the maintenance of the existing ECCO interface, as well as the development of the upcoming design refresh.

During a recent interview they gave to the authors of this post, Doran and Megan declared their objective of evolving ECCO in line ‘with user expectations of modern online research experiences’. Their driving force was stated very clearly as a bottom-up process. ‘This redesign’, they explained, ‘is informed by user feedback and market research’. A beta version of the new site has been available since the second half of 2018 to enable the Gale-Cengage team to gather feedback about the new design. The product managers specified that the final transition to the ‘new’ ECCO will only be completed once they feel confident that the new experience ‘successfully meets the needs of our users’. The final goal is a better user experience, ‘one that is faster and more intuitive’. To achieve this, a range of new features have been included, such as more filters on search results; results more relevant to the search queries; data visualization tools; improved subject indexing; more options for adjusting the image; and the ability to download in a text format the OCR (optical character recognition) version of a volume. The latter feature will be a particularly welcome innovation for scholars that often need to look up the occurrence of a single word or cut and paste long chunks of text.

ECCO search results
New ECCO search results screen

The options for adjusting the page view are another significant novelty. The beta version boasts new settings to quickly select the preferred zoom level, as well as sliders to increase or decrease the brightness and contrast of the page. These improvements are particularly welcome considering that the quality of the scans remains unchanged. The page quality is not directly related to ECCO. The portal simply allows the consultation of the digitised microfilms included in the first collection (also known as ECCO 1, comprising over 154,000 texts) and the digitisation of a second, smaller collection of books (ECCO 2, over 52,000 titles). This raises an important issue. A plethora of relatively unknown, yet precious eighteenth-century material remains difficult to consult because, on top of the uneven quality in the texts that came out of eighteenth-century printing presses, the original microfilming technology that was employed for the first collection yielded relatively low-resolution results. This causes some hiccups with OCR, thus discouraging the use of quantitative methodologies. But the issue is all the more salient when the category of eighteenth-century visuals is taken into account. At a time when British engravers multiplied in numbers to illustrate the newly-discovered wonders of the natural world or the archaeological remains of Roman cities in England, illustrations became an essential aspect of the eighteenth-century book market and reading experience. While for essential texts such as William Stukeley’s Itinerarium curiosum (1724) or Eleazar Albin and William Derham’s A Natural History of Birds (1734) more refined scans can be found elsewhere, a large number of texts are digitally available only through ECCO 1.
Scholars interested in images must either focus on well-known texts that have been digitised by other providers – with serious consequences in terms of canonicity – or plan a visit to major libraries to consult the relevant volumes in person, somewhat defeating the very idea of digital reading. Either way, the study of visual culture is somewhat inhibited. Nevertheless, the ‘new’ ECCO promises to enhance the user experience and to offer even more opportunities to engage with outstanding repositories of primary material. If you have already had a chance to use the new version, we encourage you to get in touch with Doran and Megan, as your feedback and suggestions can improve ECCO even further.

New ECCO text screen
New ECCO image viewer screen

This post is by Alessio Mattana, Teaching Assistant in Eighteenth-Century Literature at the University of Leeds (on Twitter as @mattanaless), and Dr Giacomo Savani, Teaching Assistant in Ancient History at the University of Leeds (on Twitter as @GiacomoSavani).

30 January 2019

Reading 35,000 Books: The UCD Contagion Project and the British Library Digital Corpus - Workshop & Roundtable


A guest post by Gerardine Meaney, Professor of Cultural Theory in the School of English, Drama and Film, and Derek Greene, Assistant Professor at the School of Computer Science, both at University College Dublin, who are organising a FREE workshop and roundtable together with the BL Labs team on Wednesday 20 February 2019 at the British Library in London.

How do you set about finding specific references and thematic associations in the massive digital resource represented by the British Library Nineteenth Century Book Corpus, originally digitised through a collaboration with Microsoft?

The Contagion, Biopolitics and Cultural Memory project at UCD Dublin set out to illuminate culturally and historically specific understandings of disease and contagion that appear within the fiction in the corpus. In order to do so, the project team extracted over 35,000 unique volumes out of a total of 65,000 in English and built a searchable interface of 12.3 million individual pages of text, which can be filtered and sorted using the corpus metadata (e.g. author, title, year, etc). The interface incorporates an index of the topical catalogue of volumes used by the British Library from 1823-1985 (the Alston Index). Using a combination of OCR text recognition and manual annotation, we have extracted data for the two top levels of the index, covering over 98% of the English language texts in the corpus. So for the first time it is possible to reliably identify and extract fiction, drama, history, topography, etc, from the corpus.

Extracting data from 35,000 digitised books

To allow researchers to further filter the corpus to identify texts from niche topic areas, the interface supports the semi-automatic creation of word lexicons, built upon modern “word embedding” natural language processing methods. By combining the resulting lexicons with existing corpus metadata and the data extracted from the digitised version of the Alston Index, researchers can efficiently create and export small topical sub-corpora for subsequent close reading.
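The workflow described above can be sketched in miniature. The example below is only an illustration under stated assumptions, not the project's actual implementation: the three-dimensional vectors are invented toy data standing in for a trained word-embedding model, the volume records are made up, and the function names are hypothetical. It shows the two steps in turn: proposing candidate lexicon terms by cosine similarity to seed words (for a researcher to accept or reject, which is the "semi-automatic" part), then using the lexicon together with a topical category to pull out a small sub-corpus.

```python
import math

# Toy embeddings, invented for illustration; a real system would load vectors
# trained on the corpus itself.
TOY_EMBEDDINGS = {
    "contagion": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "fever": [0.7, 0.3, 0.0],
    "railway": [0.1, 0.9, 0.2],
    "garden": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, embeddings, top_n=3):
    """Suggest the top_n words closest to any seed word, best match first."""
    known = [s for s in seeds if s in embeddings]
    if not known:
        return []
    scores = {
        word: max(cosine(vec, embeddings[s]) for s in known)
        for word, vec in embeddings.items()
        if word not in seeds
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def filter_volumes(volumes, lexicon, category=None):
    """Keep volumes in the given category whose text contains a lexicon term."""
    return [
        v for v in volumes
        if (category is None or v["category"] == category)
        and any(term in v["text"].lower().split() for term in lexicon)
    ]


if __name__ == "__main__":
    suggestions = expand_lexicon({"contagion"}, TOY_EMBEDDINGS, top_n=2)
    lexicon = {"contagion", *suggestions}  # in practice, reviewed by a researcher
    volumes = [
        {"title": "A", "category": "fiction", "text": "The fever spread"},
        {"title": "B", "category": "history", "text": "fever in the town"},
        {"title": "C", "category": "fiction", "text": "a walk in the garden"},
    ]
    for v in filter_volumes(volumes, lexicon, category="fiction"):
        print(v["title"])
```

Matching on whole lower-cased tokens keeps the sketch short; it ignores punctuation, inflected forms and OCR misspellings, all of which a real interface would need to handle.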

The Contagion project team is currently using information retrieval and word embeddings to identify texts for close reading. This combination allows us to track key trends pertaining to illness and contagion in the corpus, and interpret these findings with particular reference to current and historical debates surrounding biopolitics, medical culture and migration. Clusters of associations between contagion, poverty and morality are identifiable within the corpus. However, to date our research indicates that Victorians were more worried about religious contamination from migrants and minorities than they were about contagious diseases.

A key feature of the project is the intersection of methodologies and concepts from English literature, automated text mining, and medical humanities. This involves using data analytics as a mode of interpretation, not a substitute for it: a way of engaging with the extent and complexity of cultural production in the nineteenth century. Cultural data resists giving definitive yes or no answers to the questions put to it by researchers, but the more cultural data we analyse, the better we can map the processes of cultural change and continuity, in all their complexity. The process of tracking themes, topics, and associations enabled by the new interface offers an opportunity to work with and far beyond the existing canon of nineteenth century fiction, itself radically expanded by the last 20 years of scholarship. The identification within the corpus of a very large collection of three-volume novels indicates that the popular novel is very well represented, for example, while the ability to identify and extract ‘Collected Works’ indicates which writers their contemporaries expected to remain central to the tradition of fiction.

On February 20th 2019, the FREE ‘Reading 35,000 Books’ workshop and roundtable will present the project’s work to date, and will also include discussion by scholars of nineteenth century literature and the British Library Labs of the future development and use of the new searchable interface, including exporting topical sub-corpora for further research.

The event is supported by the Irish Research Council.