THE BRITISH LIBRARY

Digital scholarship blog

53 posts categorized "Digital scholarship"

22 March 2017

British Library Launches OCR Competition for Rare Indian Books


Calling all transcription enthusiasts! We’ve launched a competition to find an accurate and automated transcription solution for our rare Indian books and printed catalogue records, currently being digitised through the Two Centuries of Indian Print project. 

The competition, in partnership with the University of Salford’s PRIMA Research Lab, is part of the International Conference on Document Analysis and Recognition, taking place in Kyoto, Japan this November. The winners will be announced at a special event during the conference.

Digitised images of the books will be made openly available through the library’s website and we hope this competition will produce transcriptions that enable full text search and discovery of this rich material. Sharing XML transcriptions will also give researchers the foundation to apply computational tools and methods such as text mining that may lead to new insights into book and publishing history in India.   

The competition is split into two challenges, and participants can enter either or both.

The first challenge is to find an automated transcription solution for 19th-century printed books in Bengali script. Optical Character Recognition (OCR) of many non-Latin scripts is still a developing area, and it remains a considerable barrier for libraries and other cultural institutions hoping to open up their material for scholarly research.


Above: A page from 'Animal Biography', one of the Bengali books being digitised as part of Two Centuries of Indian Print (VT 1712)

 

Challenge number two involves our printed catalogue records, known as ‘Quarterly Lists’. These describe books published in India between 1867 and 1967. The lists are arranged in tables, so accurately representing the layout of the data is essential if researchers are to use computational methods to identify chunks of information such as the place of publication and the cost of the book.


 Above: A typical double page from the Quarterly Lists (SV 412/8)

 

With the competition now open, we’ve already gone some way towards helping participants by manually transcribing a few pages to create ‘ground truth’ using PRIMA's editing tool, Aletheia. So if you, or anyone you know, would like to enter, please do register: you could be contributing to this landmark project, and picking up an award for your troubles!
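The post doesn't say how entries will be scored. As a minimal sketch, assuming a character-level comparison against ground-truth pages like those transcribed in Aletheia, one common accuracy measure is the character error rate (CER): the edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The function names below are our own, not part of any competition toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance between OCR output and ground truth, divided by the
    ground-truth length. 0.0 means a perfect transcription."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


print(character_error_rate("Anima1 Biography", "Animal Biography"))  # 0.0625
```

Python compares code points, so for Bengali, where a single visible character can combine several code points, a real evaluation might prefer to count grapheme clusters instead.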

21 March 2017

Poetic Places and World Poetry Day 2017


This post is by Digital Curator Stella Wisdom, on Twitter as @miss_wisdom.

Happy World Poetry Day!

The Digital Scholarship team are marking the day with an event exploring how poetry, history and literature can be discovered and experienced via digital technologies. Creative Entrepreneur-in-Residence Sarah Cole is talking about the development of Poetic Places, a free app for iOS and Android devices that creates digital encounters with poems and literature in the locations they describe, accompanied by sounds and illustrations from cultural heritage collections, including the British Library's images on Flickr.
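The post doesn't describe how Poetic Places decides when to show a poem. A common approach for location-triggered content, sketched here purely as an assumption, is to compute the haversine great-circle distance between the user and each tagged place and surface anything within a chosen radius. The `Place` records and the 200-metre radius are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class Place:
    title: str  # hypothetical: the poem linked to this spot
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def places_in_range(places, lat, lon, radius_m=200.0):
    """Places whose coordinates lie within radius_m of the user's position."""
    return [p for p in places
            if haversine_m(lat, lon, p.lat, p.lon) <= radius_m]


places = [Place("Poem A", 51.5299, -0.1277), Place("Poem B", 55.9533, -3.1883)]
print([p.title for p in places_in_range(places, 51.5300, -0.1276)])  # ['Poem A']
```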

Being a creative type, Sarah has also been using the Flickr collection in her new enterprise Badgical Kingdom, which takes images from galleries, libraries, archives and museums released under Creative Commons licences and turns them into badges. Sarah hopes to bring forgotten works out into the everyday world, where they can be admired anew. Every piece is also sent with a card detailing a little of the design’s history and naming the institution that made the work available, such as the Rijksmuseum, whose collections inspired these flower brooches, which in my opinion would make perfect Mother's Day presents.


Images of Billycock Cat Pin, copyright Sarah Cole.

Also speaking at the event are:

  • Dr Jennifer Batt, a lecturer in English at the University of Bristol, who has been working with British Library Labs on an innovative project to data-mine 18th-century newspapers for verse.
  • Dr Duncan Hay, from the Bartlett Centre for Advanced Spatial Analysis, who works on the Survey of London (do check out their map). Duncan is also a colleague of Martin Zaltz Austwick, who used GPS to map a walk based on the first section of William Gull's coach ride in Alan Moore's From Hell; there is a short video of the walk here.

For those of you unable to join us this evening, and for those of you who can, please check out the British Library's drama and literature recordings on SoundCloud. These include excellent poems from winning and shortlisted entries for The Michael Marks Awards for Poetry Pamphlets, and readings from other British Library events. Enjoy!

 Recording of Richard Scott reading from his pamphlet ‘Wound’, published by The Rialto

09 March 2017

Archaeologies of reading: guest post from Matthew Symonds, Centre for Editing Lives and Letters


Digital Curator Mia Ridge: today we have a guest post by Matthew Symonds from the Centre for Editing Lives and Letters on the Archaeologies of reading project, based on a talk he gave for our internal '21st century curatorship' seminar series. Over to Matt...

Some people get really itchy about the idea of making notes in books, and dare not defile the pristine printed page. Others leave their books a riot of exclamation marks, sarcastic incredulity and highlighter pen.

Historians – even historians disciplined by spending years in the BL’s Rare Books and Manuscripts rooms – would much prefer it if people did mark books, preferably in sentences like “I, Famous Historical Personage, have read this book and think the following having read it…”. It makes it that much easier to investigate how people engaged with the ideas and information they read.

Brilliantly for us historians, rare books collections are filled with this sort of material. The problem is that it’s also difficult to catalogue and make discoverable (nota bene: that's because no institution could afford to employ and train enough cataloguers, not because librarians don’t realise this is an issue).

The Archaeology of Reading in Early Modern Europe (AOR) takes digital images of books owned and annotated by two Renaissance readers, the professional reader Gabriel Harvey and the extraordinary polymath John Dee. It transcribes and translates every comment in the margins, marks up all traces of the reader’s intervention in the printed book, and puts the whole thing online in a form designed to be useful and accessible to researchers and the general public alike.

Screenshot, The Archaeology of Reading in Early Modern Europe

AOR is a digital humanities collaboration between the Centre for Editing Lives and Letters (CELL) at University College London, Johns Hopkins University and Princeton University, and is generously funded by the Andrew W. Mellon Foundation.

More importantly, it’s also a collaboration between academic researchers, librarians and software engineers. An absolutely vital consideration in how we planned AOR, how we work on it and how we’re planning to expand it was to identify a project that offered common ground shared between these three interests, where each party would have something to gain.

As one of the researchers, it was really important to me to avoid forming some sort of “client-provider” relationship with the librarians who curate and know so much about my sources, and the software engineers who build the digital infrastructure.

But we do use an academic problem to give our project a focus. In 1990, Anthony Grafton and the late Lisa Jardine published their seminal article ‘“Studied for Action”: How Gabriel Harvey Read his Livy’ in the journal Past & Present.

One major insight of the article is that people read books in conjunction with one another, often for specific, pragmatic purposes. People didn’t pick up a book from their shelves, open at page one and proceed through to the finis, marking up as they went. They put other books next to them, books that explained, clarified, argued with one another.

By studying the marginalia, it’s possible to reconstruct these pathways across a library, recreating the strategies people used to manage the vast quantities of information they had at their disposal.

In order to produce this archaeology of reading, we’ve built a “digital bookwheel”, an attempt to recreate the revolving reading desk of the Renaissance, which allowed its lucky owner to move back and forth between their books. From here, the user can call up the books we’ve digitised, read the transcriptions, and search for particular words and concepts.

Screenshot, The Archaeology of Reading in Early Modern Europe


It’s built from open-source software, leveraging the International Image Interoperability Framework (IIIF) and the IIIF-compliant Mirador 2 Viewer. Interested parties can download the XML files of our transcriptions, as well as the data produced in the process.
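IIIF is what lets a viewer like Mirador assemble page images served by different institutions. As a rough sketch, assuming a manifest that follows version 2.x of the IIIF Presentation API (the generation Mirador 2 reads), the page images can be listed by walking its sequences, canvases and images, and an Image API URL built from each image service. The function name is ours, and the example.org URLs are placeholders; real manifests can nest services differently.

```python
def canvas_images(manifest: dict, size: str = "full"):
    """Yield (label, image URL) for each canvas in a IIIF Presentation 2.x manifest.

    If an image has an Image API service, build a URL of the form
    {service}/{region}/{size}/{rotation}/{quality}.{format}; otherwise fall
    back to the static resource URL.
    """
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image["resource"]
                service = resource.get("service")
                if isinstance(service, list):  # some manifests list several services
                    service = service[0] if service else None
                if service and "@id" in service:
                    url = f"{service['@id'].rstrip('/')}/full/{size}/0/default.jpg"
                else:
                    url = resource["@id"]
                yield canvas.get("label", ""), url


manifest = {"sequences": [{"canvases": [
    {"label": "f. 1r", "images": [{"resource": {
        "@id": "https://example.org/img/1.jpg",
        "service": {"@id": "https://example.org/iiif/1"}}}]},
    {"label": "f. 1v", "images": [{"resource": {
        "@id": "https://example.org/img/2.jpg"}}]},
]}]}
for label, url in canvas_images(manifest):
    print(label, url)
```

In practice the manifest would be fetched and parsed with `json.load`; passing a size such as `"!400,400"` asks the image server for a thumbnail that fits within 400 by 400 pixels.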

The exciting thing for us is that all the work on creating this digital infrastructure (which is very much a work in progress) has provided us with the raw materials for asking new research questions, questions that can only be asked by getting away from our computers and returning to the rare books room.

27 February 2017

British Library resources on digital scholarship for PhD students


Finding your way around the vast collections of the British Library can be daunting at first, but there are lots of resources, and staff keen to help. In this post, Digital Curator Mia Ridge (@mia_out) rounds up resources to help doctoral students get started.

These resources were compiled for the digital scholarship sessions at the British Library's doctoral open days. We'd love to hear from you with questions or comments at digitalresearch@bl.uk.

Learning about our collections

Help for researchers - a great place to start with general collections queries

Collection guides

Subject pages

Discovering digitised content

Catalogues: http://explore.bl.uk for printed materials ('I want this' will list digitised items); http://searcharchives.bl.uk for archives and manuscripts

Digitised manuscripts, Illuminated manuscripts and Hebrew manuscripts

British Library sounds for music, drama and literature, oral history, wildlife and environmental sounds

Flickr - particularly rich in images from 19th century books

Wikimedia Commons

International Dunhuang Project (IDP) - manuscripts, paintings, textiles and artefacts from Dunhuang and archaeological sites of the Eastern Silk Road

Endangered Archives Programme (EAP) - international digitisation projects

data.bl.uk - text, images and catalogue 'metadata' datasets available for research and creative re-use

British National Bibliography metadata

Learning about digital scholarship

The British Library's Digital Scholarship pages list digital datasets, staff, case studies and projects

BL Labs Awards and Competitions are a great source of inspiration

The British Library's Digital Scholarship blog (you're reading it right now!) and twitter account @Bl_DigiSchol

Humanist mailing list

Events with online or in-person sessions include the IHR Digital History Seminar and the Digital Classicist

The Institute of Historical Research offers training courses, and the Programming Historian offers online tutorials

Finally, your university may be a member of a training consortium (CHASE, White Rose, etc) that offers specialist digital scholarship courses

24 February 2017

Library Carpentry: software skills workshops for librarians


Guest post by James Baker, Lecturer in Digital History and Archives, University of Sussex.

Librarians play a crucial role in cultivating world-class research, and in most disciplinary areas today world-class research relies on the use of software. Established non-profit organisations such as Software Carpentry and Data Carpentry offer introductory software skills training with a focus on the needs and requirements of research scientists. Library Carpentry is a comparable introductory software skills training programme with a focus on the needs and requirements of library professionals: and by software skills, I mean coding and data manipulation that go beyond the use of familiar office suites. As librarians have substantial expertise working with data, we believe that adding software skills to their armoury is an effective and important use of professional development resources that benefits both library professionals and their colleagues and collaborators across higher education and beyond.

In November 2015 the first Library Carpentry workshop programme took place at City University London Centre for Information, generously supported by the Software Sustainability Institute as part of my 2015 Fellowship. Since then 21 workshops have run in 7 countries across 4 continents, and the Library Carpentry training materials have been developed by an international team of librarians, information scientists and information technologists. Our half-day lessons, which double as self-guided learning materials, now cover the basics of data and computing, using the command line to manipulate data, version control with Git, normalising data in OpenRefine, working with databases in SQL, and programming with Python.

What distinguishes these lessons from other learning materials is that the exercises and use cases that frame Library Carpentry are drawn from library practice and are based on data familiar to librarians: in most cases, open datasets of publication metadata released under an open licence by the British Library. Library Carpentry, then, is as much about daily practice as it is about novelty, about dealing with what is in front of us today as much as about preparing us for what is coming.
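To give a flavour of the kind of task those lessons cover (this is our own sketch, not an excerpt from the Library Carpentry materials), here is Python's built-in `csv` and `sqlite3` modules used to tidy an inconsistently recorded place-of-publication column and then count books per place in SQL. The column names and rows are hypothetical stand-ins for a publication-metadata export.

```python
import csv
import io
import sqlite3

# Hypothetical rows in the shape of a publication-metadata export;
# note the inconsistent spelling of the same place.
SAMPLE_CSV = """\
title,place,year
Title A,London,1850
Title B, london ,1851
Title C,Edinburgh,1851
Title D,LONDON,1852
"""


def load_books(csv_text: str) -> sqlite3.Connection:
    """Normalise the place column and load the rows into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, place TEXT, year INTEGER)")
    rows = [
        (r["title"].strip(), r["place"].strip().title(), int(r["year"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", rows)
    return conn


def books_per_place(conn: sqlite3.Connection):
    """Number of books per place of publication, most frequent first."""
    return conn.execute(
        "SELECT place, COUNT(*) FROM books GROUP BY place "
        "ORDER BY COUNT(*) DESC, place"
    ).fetchall()


print(books_per_place(load_books(SAMPLE_CSV)))  # [('London', 3), ('Edinburgh', 1)]
```

`str.title()` is a blunt instrument (it turns "king's lynn" into "King'S Lynn"), which is one reason the lessons teach OpenRefine's more careful clustering for this kind of clean-up.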

These lessons, and everything we do, are in the commons and for the commons, and are not tied to any institution or person. We are a community effort, built and maintained by the community. For more on Library Carpentry and our future plans, see our recent article in LIBER Quarterly (Baker et al. Library Carpentry: software skills training for library professionals. 2016. DOI: http://doi.org/10.18352/lq.10176) and our website librarycarpentry.github.io.

James Baker, receiving the BL Labs Award for Teaching and Learning 2016 on behalf of the Library Carpentry community 

The Teaching and Learning Award given to Library Carpentry at the 2016 British Library Labs Awards has enabled us to extend this community. In November we launched a call for Library Carpentry workshops seeking financial support. We were humbled by the volume and diversity of the responses received and are delighted to be able to fund two very different workshops that will reach very different communities of librarians. The first is a collaboration between Somerset Libraries Glass Box Project, {Libraries:Hacked}, and Plymouth Libraries for a Library Carpentry workshop that will target public, academic, and specialist librarians. The second workshop will take place at the University of Sheffield and will be coordinated by the White Rose Consortium for the benefit of university librarians across the region. Details of these events will be advertised at librarycarpentry.github.io in due course, along with four or five Library Carpentry workshops that we were unable to fund but that will still enjoy logistical support from members of the Library Carpentry community.

Library Carpentry has taken great strides in a short period of time. We continue to maintain and update our lesson materials to ensure that they fit with library practice, and we are working closely with Software Carpentry and Data Carpentry to map out a future direction for Library Carpentry that meets the needs of this valuable community. We are always looking for people to bring their expertise and perspective to this work. So if you want to get involved in any capacity, please post something in our Gitter discussion forum, raise an issue on, or suggest an edit to, one of our lessons, contact us via Twitter, or request support with a workshop. We'd love to hear from you.

 

24 January 2017

Publication of Quarterly Lists: Catalogues of Indian Books


The Two Centuries of Indian Print project is pleased to announce the online availability of some wonderful catalogues held by the library, generally known as the Quarterly Lists. They record, quarter by quarter and province by province, the books published in British India between 1867 and 1947.

Digitised for the first time, the Quarterly Lists can now be accessed as searchable PDFs via the British Library's datasets portal, data.bl.uk. Researchers will be able to examine rich bibliographic data about books published throughout India, including the names and addresses of printers and publishers, publication price and how many copies were sold.

 


 

Our next steps will be to OCR the Quarterly Lists to create ALTO XML for every page, a format designed to record both the text and an accurate representation of its layout on the page. This will allow researchers to apply computational tools and methods to look across all of the lists to answer their questions about book history. So if a researcher is interested in what the history of book publishing reveals about a particular time period and place, we would like to make that possible by giving them full access to this dataset.
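ALTO stores each recognised word as a `String` element with a `CONTENT` attribute and position attributes (`HPOS`, `VPOS`, plus `WIDTH` and `HEIGHT`), nested inside `TextLine` and `TextBlock` elements. A minimal sketch of reading those words back with Python's standard library, assuming namespace-agnostic matching because different ALTO versions use different namespace URIs; the sample page is hypothetical and heavily trimmed.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix; ALTO versions use different namespace URIs."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str):
    """Return each TextLine as a list of (word, HPOS, VPOS) tuples."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            lines.append([
                (s.get("CONTENT"), float(s.get("HPOS")), float(s.get("VPOS")))
                for s in element.iter()
                if local_name(s.tag) == "String"
            ])
    return lines


# A hypothetical, heavily trimmed ALTO v3 page.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Calcutta" HPOS="120" VPOS="40"/>
      <String CONTENT="1875" HPOS="400" VPOS="41"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_lines(SAMPLE))  # [[('Calcutta', 120.0, 40.0), ('1875', 400.0, 41.0)]]
```

Keeping the coordinates alongside the words is what makes layout-aware analysis, such as rebuilding table columns, possible later.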

To get to this point however, we will have to overcome the layout challenge that the Quarterly Lists present. Across all of the lists we have found a few different layout styles which are rather tricky for OCR solutions to handle meaningfully. Note for instance how the list below compares to the one from the Calcutta Gazette above. Through the Digital Research strand of the project we will be seeking out innovative research groups willing to take a crack at improving the OCR quality and accuracy of tabular text extraction from the Quarterly Lists. 
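One simple, admittedly brittle, way to turn positioned words into table cells is to bucket each word by its horizontal position between column boundaries. This sketch assumes the boundaries are already known for a given layout (for example, measured by hand); the variety of layouts across the Quarterly Lists is exactly why a fixed list like this won't generalise, and a competitive solution would need to detect columns automatically. The row below is hypothetical.

```python
from bisect import bisect_right


def assign_columns(words, boundaries):
    """Group (text, x) words into table cells by horizontal position.

    `boundaries` lists, in ascending order, the x-coordinate at which each
    column after the first begins.
    """
    cells = [[] for _ in range(len(boundaries) + 1)]
    for text, x in sorted(words, key=lambda w: w[1]):  # left-to-right order
        cells[bisect_right(boundaries, x)].append(text)
    return [" ".join(cell) for cell in cells]


# One hypothetical table row; words arrive in arbitrary order.
row = [("Rs.", 610), ("Calcutta", 120), ("1", 650), ("Animal", 300), ("Biography", 360)]
print(assign_columns(row, boundaries=[250, 600]))
# ['Calcutta', 'Animal Biography', 'Rs. 1']
```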

The Quarterly Lists available on data.bl.uk are out of copyright and openly licensed for reuse. If you, or anyone you know, are interested in using the Quarterly Lists in your research, or simply want to find out more about them, feel free to email me at Tom.Derrick@bl.uk or follow the project @BL_IndianPrint.

You can read more about the history of the Quarterly Lists in a blog post I wrote last year.

21 December 2016

Mobius programme – on the beach of learning


This guest post is by Virve Miettinen, who spent four months with various teams at the British Library.

Every morning there’s a 100-metre queue in front of the British Library. It seems to say a lot about this city's unashamed nerdiness and love of learning. Usually the queuers have already put the things they might need in the Reading Room in a clear plastic bag, so they can head straight down to the lockers, stow away their coats, handbags and laptop cases, and secure a place on the beach of learning.

Virve Miettinen

The Mobius fellowship programme, organised by the Finnish Institute in London, enables mobility for visual arts, museum, library and archives professionals, and customised working periods as part of the host organisation’s staff, in my case the British Library. The programme is a great opportunity to break away from daily routines, to think about one’s professional identity, find fresh ideas, compare the practices and methods between two countries, share knowledge and build meaningful networks.

Learn, relearn and unlearn from each other

Learning isn’t a destination, it’s a never-ending road of discovery, challenge, inspiration and wonder. Each learning moment builds character, shapes thoughts, guides futures. But what makes us learn? For me the answer is other people, and during the Mobius Fellowship I’ve been blessed with the chance to work with talented people willing to share their knowledge at the British Library.

I’ve familiarised myself with the British Library’s Learning Team, which is responsible for the library’s engagement with all kinds of learners. The Learning Team offers workshops, activities and resources for schools, teachers and learners of all ages.

I’ve been following the work of the Digital Scholarship team and BL Labs project to learn more about the incredible digital collections the library has to offer, and how to open them up for the public through various activities such as competitions, events and projects.

I’ve worked with the Knowledge Quarter, a network of now 76 partners within a one-mile radius of King's Cross that actively create and disseminate knowledge. Partners include over 49 academic, cultural, research, scientific and media organisations large and small: from the British Library and University of the Arts London to the School of Life, Connected Digital Economy Catapult, Francis Crick Institute and Google.

I’ve assisted the Library’s Community Engagement Manager Emma Morgan. She has been working as a community engagement manager for six months now and the aim of her work is to create meaningful, long-lasting, mutually beneficial relationships with the surrounding community, i.e. residents, networks and organisations.

Inside the British Library

I’ve observed the library’s marketing and communications unit in action, and learned for example how they measure and research the customer experience, i.e. who visits and uses the BL, what they think of their experience and how the BL might improve it.

 

I’ve got many 'mental souvenirs' to take back home with me; if they interest you, read more on my Mobius blog: http://itssupercalafragilistic.tumblr.com/.

100 digital stories about Finnish-British relations

As part of the Mobius programme I’ve been working on a co-operative project between the British Library, the National Library of Finland, the Finnish National Archives, the Finnish Institute in London and the Finnish Embassy. In the last three decades, contacts between Finland and the UK, two relatively distant nations, have multiplied. At the same time, the network of cultural relationships has tightened into a seamless 'love story', something that would not have been easy to predict just 50 years ago. In 2017 the Finnish Institute celebrates the centenary of Finland’s independence by telling the story of two nations; the aim is to make the history, the interaction and the links between these two countries tangible and visible.

We are collaborating to create a digital gallery open to all, which offers its visitors carefully curated pieces of the shared history of the two countries and their political, cultural and economic relations. It will offer new information on the relations and influences between the two countries. It consists of digitised historical materials, like letters, news, cards, photographs, tickets and maps. The British Library and other partners will select 100 digitised items to create the basis of the gallery.

The gallery will be expanded further through co-creation. In the spirit of the theme of Finland’s centenary 'together', the gallery is open to all and easily accessible. With the call 'Wanted – make your own heritage' we invite people to share their own stories and interpretations, and record history through them. The gallery feeds curiosity, creates interaction and engages users to share their own memories relating to Finnish-British experiences. The users are invited to interpret recent history from a personal point of view.

The work continues after my Mobius-period and the gallery will open in September 2017. Join us and share your memories. Be frank, withdrawn, furious, imaginative, witty or sad. Through your story you create history.

P.S. The British Library Reading Room is actually far from The Beach of Learning; it’s more like The Coolest Place To Be. I found myself freezing in the air-conditioned Rare Books Reading Room despite wearing my leather jacket and an extra pair of leggings.

Virve Miettinen works at Helsinki City Library/Central Library as a participation planner. Her job is to engage citizens and partners in designing the library of the future. For Helsinki City Library, co-operative planning and service design mean designing the premises and services together with library users, using user-centred methods. Her interests include co-design, service design, community engagement and community-led city development. She is also currently working on her PhD, titled 'Co-creative practices in library services'.

16 December 2016

Re-imagining a catalogue of illuminated manuscripts - from search to browse


In this guest post, Thomas Evans discusses his work with Digital Curator Dr Mia Ridge to re-imagine the interface to the British Library's popular Online Catalogue of Illuminated manuscripts.

The original Catalogue was built using an Access 2003 database, and allows users to create detailed searches from amongst 20 fields (such as date, title, origin, and decoration) or follow 'virtual exhibitions' to view manuscripts. Search-based interfaces can be ideal for specialists who already know what they're looking for, but the need to think of a search term likely to yield interesting results can be an issue for people unfamiliar with a catalogue. 'Generous interfaces' are designed as rich, browsable experiences that highlight the scope and composition of a particular collection by loading the page with images linked to specific items or further categories. Mia asked Thomas to apply faceted browsing and 'generous' styles to help first-time visitors discover digitised illuminated manuscripts. In this post Thomas explains the steps he took to turn the catalogue data supplied into a more 'generous' browsing interface. An archived version of his interface is available on the Internet Archive.

With over 4,300 manuscripts, written in a variety of languages and created in countries across Europe over a period of about a thousand years, the British Library's collection of illuminated manuscripts contains a diverse treasure trove of information and imagery for both the keen enthusiast and the total novice.

As the final project for my Masters in Computer Science at UCL, I worked with the British Library to design and start to implement alternative ways of exploring the collection. This project had some constraints in time, knowledge and resources. The final deadline for submission was only four months after receiving the project outline and the success of the project rested on the knowledge, experience and research of a fresh-faced rookie (me) using whatever tools I had the wherewithal to cobble together (open source software running on a virtual machine server hosted by UCL).

Rather than showing visitors an empty search box when they first arrive, a generous interface will show them everything available. However, taken literally, displaying 'everything' means details for over 4,300 manuscripts and around 40,000 images would have to be displayed on one page. While this approach would offer visitors a way to explore the entire catalogue, it could be quite unwieldy.

One way to reduce the number of manuscripts loaded onto the screen is to allow visitors to filter out some items, for example limiting the 'date' field to between 519 and 927 or the 'region' field to England. This is 'faceted' browsing, and it makes exploration more manageable. Presenting the list of available values for region or language, etc., also gives you a sense of the collection's diversity. It also means that 'quirky' members of the collection are less likely to be overlooked.

An example of 'date' facets providing an instant overview of the temporal range of the Catalogue

For example, if you were to examine 30 random manuscripts from the British Library's collection, you might find 20 written in Latin, three each in French and English, and perhaps one each in Greek, Hebrew or Italian. You would almost certainly miss that the Catalogue contains a manuscript written in Cornish, another in Portuguese and another in Icelandic. These languages might be of interest precisely because they are hard to come by in the British Library's catalogue. Listing all the available languages (as well as their frequencies) exposes the exceptional parts of the collection where an unfaceted generous interface would hide them in plain sight.
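Listing every value of a field together with its frequency is a simple counting job. A minimal sketch, assuming records held as Python dictionaries (the records and field names are hypothetical, not the Catalogue's actual schema): rare values such as Cornish appear in the facet list even though a random sample would probably miss them.

```python
from collections import Counter

# Hypothetical catalogue records; only the 'language' field is used here.
records = [{"shelfmark": f"MS {i}", "language": "Latin"} for i in range(5)]
records += [
    {"shelfmark": "MS 5", "language": "French"},
    {"shelfmark": "MS 6", "language": "French"},
    {"shelfmark": "MS 7", "language": "Cornish"},
    {"shelfmark": "MS 8"},  # no language recorded
]


def facet_counts(records, field):
    """Every recorded value of `field` with its frequency, most common first."""
    return Counter(r[field] for r in records if r.get(field)).most_common()


for value, count in facet_counts(records, "language"):
    print(f"{value} ({count})")
# Latin (5)
# French (2)
# Cornish (1)
```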

Once I understood the project's goals and had completed some high-level planning and design sketches, it was time to get to grips with implementation. Being fairly inexperienced, I found some tasks took much longer than expected. A few examples that stick in the mind are properly configuring the web server, debugging errant server-side scripts (which have a habit of failing either silently or with an unhelpful error message) and transforming the Library's database into a form I could use.

Being the work of many hands over the years, the database inevitably contained some tiny differences in the way entries were recorded, which Mia informs me is not uncommon for a long-standing database in a collecting institution. These small inconsistencies - for example, the use of an en-dash in some cases and a hyphen in others - look fine to us, but confuse a computer. I worked around these where I could, 'cleaning' the records only when I was certain of my correction.
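The post doesn't show how the records were cleaned. One cautious option, in keeping with changing data only when certain, is to leave stored values alone and derive a normalised key for comparing and grouping them. A sketch under that assumption; `match_key` is a hypothetical name, and the set of dash characters mapped to a hyphen is our choice.

```python
import unicodedata

# Characters that look like a hyphen on screen but compare unequal to "-".
DASH_TABLE = str.maketrans({
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2212": "-",  # MINUS SIGN
})


def match_key(value: str) -> str:
    """A normalised key for comparing or grouping values.

    The stored record is left untouched; only the comparison key changes.
    """
    value = unicodedata.normalize("NFC", value)
    value = value.translate(DASH_TABLE)
    return " ".join(value.split())  # trim and collapse runs of whitespace


print(match_key("c. 1450\u20131475") == match_key("c. 1450-1475"))  # True
```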

Being new to web design, I built the interface iteratively, component by component, consulting periodically with Mia for feedback. Thankfully, frameworks exist for responsive web design and page templating. Nevertheless, there was a small learning curve and some thought was required to properly separate application logic from presentation logic.

There were some ambitions for the project which were ultimately not pursued due to time (and knowledge!) constraints, but this iterative process made other improvements possible over the course of my project. To make exploration of the catalogue easier, the page listing a manuscript's details also contained links to related manuscripts. For instance, Ioannes Rhosos is attributed as the scribe of Harley 5699, so, on that manuscript's page, users could click on his name to see a list of all manuscripts by him. They could then apply further filters if desired. This made links between manuscripts much clearer than in the old interface, but it is limited to direct links that were explicitly recorded in the database.
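A link like "all manuscripts by this scribe" amounts to a lookup from a field value to the manuscripts that share it. As an illustration (not Thomas's actual code), such an index can be built once as a dictionary of lists. Only Harley 5699 and its scribe come from the post; the other records are hypothetical.

```python
from collections import defaultdict


def build_index(manuscripts, field):
    """Map each value of a multi-valued field to the shelfmarks that have it."""
    index = defaultdict(list)
    for ms in manuscripts:
        for value in ms.get(field, []):
            index[value].append(ms["shelfmark"])
    return dict(index)


manuscripts = [
    {"shelfmark": "Harley 5699", "scribe": ["Ioannes Rhosos"]},
    {"shelfmark": "MS A", "scribe": ["Ioannes Rhosos"]},  # hypothetical
    {"shelfmark": "MS B", "scribe": ["Another scribe"]},  # hypothetical
]
scribes = build_index(manuscripts, "scribe")
print(scribes["Ioannes Rhosos"])  # ['Harley 5699', 'MS A']
```

As the post notes, an index like this can only surface links that were explicitly recorded in the data.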

An example of a relevant feature not explicitly recorded in the database is genre - only by reading manuscript descriptions can you determine whether it is religious, historical, medical etc. in its subject matter. Two possible techniques for revealing such features were considered: applying natural language processing to manuscript descriptions in order to classify them, or analysing data about which manuscripts were viewed by which users to build a recommendation system. Both of these turned out to require more in-depth knowledge than I was able to acquire within the time limit of the project.
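The recommendation idea was not implemented in the project. As an illustration only, the simplest form of "which manuscripts were viewed by which users" is item-to-item co-occurrence counting: manuscripts often viewed in the same session are suggested together. The session data and function names are hypothetical, and a production recommender would need to normalise for very popular items.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count how often each pair of manuscripts is viewed in the same session."""
    pairs = Counter()
    for viewed in sessions:
        # sorted(set(...)) de-duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(viewed)), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(pairs, shelfmark, k=3):
    """The k manuscripts most often viewed alongside `shelfmark`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == shelfmark:
            scores[b] += n
        elif b == shelfmark:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]


# Hypothetical browsing sessions (lists of shelfmarks viewed by one user).
sessions = [["MS A", "MS B", "MS C"], ["MS A", "MS B"], ["MS B", "MS D"]]
print(recommend(co_view_counts(sessions), "MS A"))  # ['MS B', 'MS C']
```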

I enjoyed working out how to transform all the possible inputs to the webpage into queries which could be run against the database, dealing with missing/invalid inputs by providing appropriate defaults etc. There was a quiet satisfaction to be had when tests of the interface went well - seeing something work and thinking 'I made that!'. It was also a pleasure to work with data about such an engaging topic.
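The post doesn't name the web stack or the final database. A minimal sketch of turning form inputs into a query, assuming SQLite (from Python's standard library) as a stand-in and a hypothetical `manuscripts(shelfmark, date, region)` table: missing or invalid values fall back to defaults, and user input is passed as bound parameters rather than pasted into the SQL. The 519 to 927 date range and the England region echo the faceting example earlier in the post; the default range is our assumption.

```python
import sqlite3

DATE_MIN, DATE_MAX = 0, 2000  # hypothetical defaults when no date is given
PAGE_SIZE = 20


def to_int(value, default: int) -> int:
    """Parse a form value as an integer, falling back to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def build_query(params: dict):
    """Turn raw web-form parameters into a parameterised SQL query."""
    date_from = to_int(params.get("date_from"), DATE_MIN)
    date_to = to_int(params.get("date_to"), DATE_MAX)
    if date_from > date_to:  # tolerate a reversed range
        date_from, date_to = date_to, date_from
    page = max(1, to_int(params.get("page"), 1))

    sql = "SELECT shelfmark FROM manuscripts WHERE date BETWEEN ? AND ?"
    args = [date_from, date_to]
    region = (params.get("region") or "").strip()
    if region:  # an empty region means "any region"
        sql += " AND region = ?"
        args.append(region)
    sql += " ORDER BY date, shelfmark LIMIT ? OFFSET ?"
    args += [PAGE_SIZE, (page - 1) * PAGE_SIZE]
    return sql, args


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manuscripts (shelfmark TEXT, date INTEGER, region TEXT)")
conn.executemany("INSERT INTO manuscripts VALUES (?, ?, ?)", [
    ("MS A", 700, "England"), ("MS B", 900, "France"), ("MS C", 1200, "England"),
])
sql, args = build_query({"date_from": "519", "date_to": "927", "region": "England"})
print(conn.execute(sql, args).fetchall())  # [('MS A',)]
```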

I hope this project has shown that exploring the British Library's Catalogue of Illuminated Manuscripts has the potential to become a richer experience. Relationships between manuscripts which are currently not widely known could be revealed to more visitors and, if the machine learning techniques were to be implemented, perhaps new relationships would be revealed and related manuscripts could be recommended. My project showed the potential for applying new computational methods to better reveal the character of collections and the connections between their elements. Although the interface I delivered has some way to go before it can achieve this goal, I earnestly hope that it is a first step in that direction.

Thomas' Catalogue interface