THE BRITISH LIBRARY

Digital scholarship blog

21 March 2017

Poetic Places and World Poetry Day 2017

This post is by Digital Curator Stella Wisdom, on twitter as @miss_wisdom.

Happy World Poetry Day!

The Digital Scholarship team are marking the day with an event exploring how poetry, history and literature can be discovered and experienced via digital technologies. Creative Entrepreneur-in-Residence Sarah Cole is talking about the development of Poetic Places, a free app for iOS and Android devices that creates digital encounters with poems and literature in the locations described, accompanied by sounds and illustrations from cultural heritage collections, including the British Library's images on Flickr.

Being a creative type, Sarah has also been using the Flickr collection in her new enterprise Badgical Kingdom, which takes images from galleries, libraries, archives, and museums released under Creative Commons licenses and turns them into badges. Sarah hopes to bring forgotten works out into the everyday world where they can be re-admired. Furthermore, every piece is sent with a card detailing a little of the design’s history and naming the institution which has made the work available. These include the Rijksmuseum, whose collections have inspired these flower brooches, which in my opinion could make perfect Mother's Day presents.

Images of Billycock Cat Pin, copyright Sarah Cole.

Also speaking at the event are:

  • Dr Jennifer Batt, a lecturer in English, University of Bristol, who has been working with British Library Labs on an innovative project to data mine 18th-century newspapers for verse.
  • Dr Duncan Hay, from the Bartlett Centre for Advanced Spatial Analysis, who works on the Survey of London; check out their map. It is also worth noting that Duncan is a colleague of Martin Zaltz Austwick, who did GPS mapping of a walk based around the first section of William Gull's coach ride in Alan Moore's From Hell. There is a short video of this here.

For those of you unable to join us this evening, and also those of you who are, please check out the British Library's drama and literature recordings on SoundCloud. These include excellent poems from The Michael Marks Awards for Poetry Pamphlets winners and shortlisted entries, and readings from other British Library events. Enjoy ...

 Recording of Richard Scott reading from his pamphlet ‘Wound’, published by The Rialto

14 November 2016

British Library Labs Symposium 2016 - Competition and Award runners up

The 4th annual British Library Labs Symposium was held on 7th November 2016, and the event was a great success in celebrating and showcasing Digital Scholarship and highlighting the work of BL Labs and their collaborators. The exciting day included the announcement of the winners of the BL Labs Competition and BL Labs Awards, as well as of the runners up who are presented in this blog post. Posts written by all of the winners and runners up about their work are also scheduled for the next few weeks - watch this space!

BL Labs Competition finalist for 2016
Roly Keating, Chief Executive of the British Library announced that the runner up of the two finalists of the BL Labs Competition for 2016 was...

Black Abolitionist Performances and their Presence in Britain
By Hannah-Rose Murray (PhD student at the University of Nottingham)

Roly Keating, Chief Executive of the British Library, welcoming Hannah-Rose Murray on to the stage.

The project focuses on African American lives, experiences and lectures in Britain between 1830–1895. By assessing black abolitionist speeches in the British Library’s nineteenth-century newspaper collection and using the British Library’s Flickr Commons 1 million collection to illustrate them, the project has illuminated their performances and how their lectures reached nearly every corner of Britain. For the first time, the location of these meetings has been mapped and the number and scale of the lectures given by black abolitionists in Britain has been evaluated, allowing their hidden voices to be heard and building a more complete picture of Victorian London for us. Hannah-Rose has recently posted an update about her work and the project findings can also be found on her website: www.frederickdouglassinbritain.com.

Hannah-Rose Murray is a second year PhD student with the Department of American and Canadian Studies, University of Nottingham. Her AHRC/M3C-funded PhD focuses on the legacy of formerly enslaved African Americans on British society and the different ways they fought British racism. Hannah-Rose received a first class Master's degree in Public History from Royal Holloway University and has a BA History degree from University College London (UCL). In Nottingham, Hannah-Rose works closely with the Centre for Research in Race and Rights and is one of the postgraduate directors of the Rights and Justice Research Priority Area, which includes the largest number of scholars (700) in the world working on rights and justice.

BL Labs Awards runners up for 2016

Research Award runner up
Allan Sudlow, Head of Research Development at the British Library announced that the runner up of the Research Award was...

Nineteenth-century Newspaper Analytics
By Paul Fyfe (Associate Professor of English, North Carolina State University) and Qian Ge (PhD Candidate in Electrical and Computer Engineering, North Carolina State University)

Nineteenth-Century Newspaper Analytics

The project represents an innovative partnership between researchers in English literature, Electrical & Computer Engineering, and data analytics in pursuit of a seemingly simple research question: How can computer vision and image processing techniques be adapted for large-scale interpretation of historical illustrations? The project is developing methods in image analytics to study a corpus of illustrated nineteenth-century British newspapers from the British Library’s collection, including The Graphic, The Illustrated Police News, and the Penny Illustrated Paper. 

Paul Fyfe and Qian Ge gave a recorded acceptance speech at the Symposium as they were unable to attend in person.

It aims to suggest ways of adapting image processing techniques to other historical media while also pursuing scholarship on nineteenth-century visual culture and the illustrated press. The project also exposes the formidable technical challenges presented by historical illustrations and suggests ways to refine computer vision algorithms and analytics workflows for such difficult data. The website includes sample workflows as well as speculations about how large-scale image analytics might yield insights into the cultural past, plus much more: http://ncna.dh.chass.ncsu.edu/imageanalytics 

Commercial Award runner up
Isabel Oswell, Head of Business Audiences at the British Library announced that the runner up of the Commercial Award was...

Poetic Places
By Sarah Cole (TIME/IMAGE organisation and Creative Entrepreneur-in-Residence at the British Library)

Sarah Cole presenting Poetic Places

Poetic Places is a free app for iOS and Android devices which was launched in March 2016. It brings poetic depictions of places into the everyday world, helping users to encounter poems in the locations described by the literature, accompanied by contextualising historical narratives and relevant audiovisual materials. These materials are primarily drawn from open archive collections, including the British Library Flickr collection. Utilising geolocation services and push notifications, Poetic Places can (whilst running in the background on the device) let users know when they stumble across a place depicted in verse and art, encouraging serendipitous discovery. Alternatively, they can browse the poems and places via map and list interfaces as a source of inspiration without travelling. Poetic Places aspires to give a renewed sense of place, to bring together writings and paintings and sounds to mean more than they do alone, and to bring literature into people’s everyday life in unexpected moments.

Artistic Award runner up
Jamie Andrews, Head of Culture and Learning at the British Library announced that the runner up of the Artistic Award was... 

Kristina Hofmann and Claudia Rosa Lukas

Fashion Utopia
By Kris Hofmann (Animation Director) and Claudia Rosa Lukas (Curator)

 
Fashion Utopia

The project involved the creation of an 80 second animation and five vines which accompanied the Austrian contribution to the International Fashion Showcase London, organised annually by the British Council and the British Fashion Council. Fashion Utopia garnered creative inspiration from the treasure trove of images from the British Library Flickr Commons collection and more than 500 images were used to create a moving collage that was, in a second step, juxtaposed with stop-frame animated items of fashion and accessories.

Teaching / Learning Award runner up
Ria Bartlett, Lead Producer: Onsite Learning at the British Library announced that the runner up of the Teaching / Learning Award was...

The PhD Abstracts Collections in FLAX: Academic English with the Open Access Electronic Theses Online Service (EThOS) at the British Library

By Shaoqun Wu (FLAX Research & Development and Lecturer in Computer Science), Alannah Fitzgerald (FLAX Open Education Research and PhD Candidate), Ian H. Witten (FLAX Project Lead and Professor of Computer Science) and Chris Mansfield (English Language and Academic Writing Tutor)

The PhD Abstracts Collections in FLAX

The project presents an educational research study into the development and evaluation of domain-specific language corpora derived from PhD abstracts with the Electronic Theses Online Service (EThOS) at the British Library. The collections, which are openly available from this study, were built using the interactive FLAX (Flexible Language Acquisition flax.nzdl.org) open-source software for uptake in English for Specific Academic Purposes programmes (ESAP) at Queen Mary University of London. The project involved the harvesting of metadata, including the abstracts of 400,000 doctoral theses from UK universities, from the EThOS Toolkit at the British Library. These digital PhD abstract text collections were then automatically analysed, enriched, and transformed into a resource that second-language and novice research writers can browse and query in order to extend their ability to understand the language used in specific domains, and to help them develop their abstract writing. It is anticipated that the practical contribution of the FLAX tools and the EThOS PhD Abstract collections will benefit second-language and novice research writers in understanding the language used to achieve the persuasive and promotional aspects of the written research abstract genre. It is also anticipated that users of the collections will be able to develop their arguments more fluently and precisely through the practice of research abstract writing to project a persuasive voice as is used in specific research disciplines.

Alannah Fitzgerald and Chris Mansfield receiving the Runner Up Teaching and Learning Award on behalf of the FLAX team.

British Library Labs Staff Award runner up
Phil Spence, Chief Operating Officer at the British Library announced that the runner up of the British Library Labs Staff Award was...

SHINE 2.0 - A Historical Search Engine

Led by Andy Jackson (Web Archiving Technical Lead at the British Library) and Gil Hoggarth (Senior Web Archiving Engineer at the British Library)

SHINE

SHINE is a state-of-the-art demonstrator for the potential of Web Archives to transform research. The current implementation of SHINE exposes metadata from the Internet Archive's UK domain web archives for the years 1996-2013. This data was licensed for use by the British Library by agreement with JISC. SHINE represents a high level of innovation in access and analysis of web archives, allowing sophisticated searching of a very large and loosely-structured dataset and showing many of the characteristics of "Big Social Data". Users can fine-tune results to look for file-types, results from specific domains, languages used and geo-location data (post-code look-up). The interface was developed by Web Archive technical development alongside the AHRC-funded Big UK Domain Data for the Arts and Humanities project. An important concept in its design and development was that it would be researcher-led, and SHINE was developed iteratively with research case studies relating to use of UK web archives.

Andy Jackson receiving the runner up Staff Award on behalf of the SHINE team

The lead institution for SHINE was the University of London, with Professor Jane Winters as principal investigator, and former British Library staff members Peter Webster and Helen Hockx were also instrumental in developing the project and maintaining researcher engagement throughout.

10 November 2016

British Library Labs Symposium 2016 - Competition and Award Winners

The 4th annual British Library Labs Symposium took place on 7th November 2016 and was a resounding success! 

More than 220 people attended and the event was a fantastic experience, showcasing and celebrating the Digital Scholarship field and highlighting the work of BL Labs and their collaborators. The Symposium included a number of exciting announcements about the winners of the BL Labs Competition and BL Labs Awards, who are presented in this blog post. Separate posts will be published about the runners up of the Competition and Awards and posts written by all of the winners and runners up about their work are also scheduled for the next few weeks - watch this space!

BL Labs Competition winner for 2016

Roly Keating, Chief Executive of the British Library announced that the overall winner of the BL Labs Competition for 2016 was...

SherlockNet: Using Convolutional Neural Networks to automatically tag and caption the British Library Flickr collection
By Karen Wang and Luda Zhao, Masters students at Stanford University, and Brian Do, Harvard Medicine MD student

Machine learning can extract information and insights from data on a massive scale. The project developed and optimised Convolutional Neural Networks (CNN), inspired by biological neural networks in the brain, in order to tag and caption the British Library’s Flickr Commons 1 million collection. In the first step of the project, images were classified with general categorical tags (e.g. “people”, “maps”). This served as the basis for the development of new ways to facilitate rapid online tagging with user-defined sets of tags. In the second stage, automatically generated descriptive natural-language captions were provided for images (e.g. “A man in a meadow on a horse”). This computationally guided approach has produced automatic pattern recognition which provides a more intuitive way for researchers to discover and use images. The tags and captions will be made accessible and searchable by the public through the web-based interface and text annotations will be used to globally analyse trends in the Flickr collection over time.

SherlockNet team presenting at the Symposium

Karen Wang is currently a senior studying Computer Science at Stanford University, California. She also has an Art Practice minor. Karen is very interested in the intersection of computer science and humanities research, so this project is near and dear to her heart! She will be continuing her studies next year at Stanford in CS, Artificial Intelligence track.

Luda Zhao is currently a Masters student studying Computer Science at Stanford University, living in Palo Alto, California. He is interested in using machine learning and data mining to tackle tough problems in a variety of real-life contexts, and he's excited to work with the British Library to make art more discoverable for people everywhere.

Brian Do grew up in sunny California and is a first-year MD/PhD student at Harvard Medical School. Previously he studied Computer Science and biology at Stanford. Brian loves using data visualisation and cutting edge tools to reveal unexpected things about sports, finance and even his own text message history.

SherlockNet recently posted an update of their work and you can try out their SherlockNet interface and tell us what you think.

BL Labs Awards winners for 2016

Research Award winner

Allan Sudlow, Head of Research Development at the British Library announced that the winner of the Research Award was...

Scissors and Paste

By Melodee Beals, Lecturer in Digital History at Loughborough University and historian of migration and media

Melodee Beals presenting Scissors & Paste

Scissors and Paste utilises the 1800-1900 digitised British Library Newspapers collection to explore the possibilities of mining large-scale newspaper databases for reprinted and repurposed news content. The project has involved the development of a suite of tools and methodologies, created using both out-of-the-box and custom-made project-specific software, to efficiently identify reprint families of journalistic texts and then suggest both directionality and branching within these subsets. From these case-studies, detailed analyses of additions, omissions and wholesale changes offer insights into the mechanics of reprinting that left behind few if any other traces in the historical record.

Melodee Beals joined the Department of Politics, History and International Relations at Loughborough University in September 2015. Previously, Melodee has worked as a pedagogical researcher for the History Subject Centre, a teaching fellow for the School of Comparative American Studies at the University of Warwick and a Principal Lecturer for Sheffield Hallam University, where she acted as Subject Group Leader for History. Melodee completed her PhD at the University of Glasgow.

Commercial Award winner

Isabel Oswell, Head of Business Audiences at the British Library announced that the winner of the Commercial Award was...

Curating Digital Collections to Go Mobile

By Mitchell Davis, publishing and media entrepreneur

Mitchell Davis presenting Curating Digital Collections to Go Mobile

As a direct result of its collaborative work with the British Library, BiblioLabs has developed BiblioBoard, an award-winning e-Content delivery platform, and online curatorial and multimedia publishing tools to support it. These tools make it simple for subject area experts to create visually stunning multi-media exhibits for the web and mobile devices without any technical expertise. The curatorial output is almost instantly available via a fully responsive web site as well as through native apps for mobile devices. This unified digital library interface incorporates viewers for PDF, ePub, images, documents, video and audio files allowing users to immerse themselves in the content without having to link out to other sites to view disparate media formats.

Mitchell Davis founded BookSurge in 2000, the world’s first integrated global print-on-demand and publishing services company (sold to Amazon.com in 2005 and re-branded as CreateSpace). Since 2008, he has been founder and chief business officer of BiblioLabs, the creators of BiblioBoard. Mitchell is also an indie producer and publisher who has created several award-winning indie books and documentary films over the past decade through Organic Process Productions, a small philanthropic media company he founded with his wife Farrah Hoffmire in 2005.

Artistic Award winner

Jamie Andrews, Head of Culture and Learning at the British Library announced that the winner of the Artistic Award was... 

Hey There, Young Sailor

Written and directed by writer and filmmaker Ling Low and visual art by Lyn Ong

Hey There, Young Sailor combines live action with animation, hand-drawn artwork and found archive images to tell a love story set at sea. Inspired by the works of early cinema pioneer Georges Méliès, the video draws on late 19th century and early 20th century images from the British Library's Flickr collection for its collages and tableaux. The video was commissioned by Malaysian indie folk band The Impatient Sisters and independently produced by a Malaysian and Indonesian team.

Ling Low receives her Award from Jamie Andrews

Ling Low is based between Malaysia and the UK and she has written and directed various short films and music videos. In her fiction and films, Ling is drawn to the complexities of human relationships and missed connections. By day, she works as a journalist and media consultant. Ling has edited a non-fiction anthology of human interest journalism, entitled Stories From The City: Rediscovering Kuala Lumpur, published in 2016. Her journalism has also been published widely, including in the Guardian, the Telegraph and Esquire Malaysia.

Teaching / Learning Award winner

Ria Bartlett, Lead Producer: Onsite Learning at the British Library announced that the winner of the Teaching / Learning Award was...

Library Carpentry

Founded by James Baker, Lecturer at the Sussex Humanities Lab, who represented the global Library Carpentry Team (see below) at the Symposium

James Baker presenting Library Carpentry

Library Carpentry is software skills training aimed at the needs and requirements of library professionals. It takes the form of a series of modules that are available online for self-directed study or for adaption and reuse by library professionals in face-to-face workshops. Library Carpentry is in the commons and for the commons: it is not tied to any institution or person. For more information on Library Carpentry see http://librarycarpentry.github.io/

James Baker is a Lecturer in Digital History and Archives at the School of History, Art History and Philosophy and at the Sussex Humanities Lab. He is a historian of the long eighteenth century and contemporary Britain. James is a Software Sustainability Institute Fellow and holds degrees from the University of Southampton and latterly the University of Kent. Prior to joining Sussex, James has held positions of Digital Curator at the British Library and Postdoctoral Fellow with the Paul Mellon Centre for Studies of British Art. James is a convenor of the Institute of Historical Research Digital History seminar and a member of the History Lab Plus Advisory Board.

 The Library Carpentry Team is regularly accepting new members and currently also includes: 

The Library Carpentry Team

British Library Labs Staff Award winner

Phil Spence, Chief Operating Officer at the British Library announced that the winner of the British Library Labs Staff Award was...

LibCrowds

Led by Alex Mendes, Software Developer at the British Library

LibCrowds is a crowdsourcing platform built by Alexander Mendes. It aims to create searchable catalogue records for some of the hundreds of thousands of items that can currently only be found in printed and card catalogues. By participating in the crowdsourcing projects, users will help researchers everywhere to access the British Library’s collections more easily in the future.

Nora McGregor presenting LibCrowds on behalf of Alex Mendes

The first project series, Convert-a-Card, experimented with a new method for transforming printed card catalogues into electronic records for inclusion in our online catalogue Explore, by asking volunteers to link scanned images of the cards with records retrieved from the WorldCat database. Additional projects have recently been launched that invite volunteers to transcribe cards that may require more specific language skills, such as the South Asian minor languages. Records matched, located, transcribed or translated as part of the crowdsourcing projects were uploaded to the British Library's Explore catalogue for anyone to search online. By participating, users can have a direct impact on the availability of research material to anyone interested in the diverse collections available at the British Library.

Alex Mendes has worked at the British Library for several years and recently completed a Bachelor’s degree in Computer Science with the Open University. Alex enjoys the consistent challenges encountered when attempting to find innovative new solutions to unusual problems in software development.

Alex Mendes

If you would like to find out more about BL Labs, our Competition or Awards please contact us at labs@bl.uk   

03 November 2016

SherlockNet update - 10s of millions more tags and thousands of captions added to the BL Flickr Images!

SherlockNet are Brian Do, Karen Wang and Luda Zhao, finalists for the Labs Competition 2016.

We have some exciting updates regarding SherlockNet, our ongoing effort to use machine learning techniques to radically improve the discoverability of the British Library Flickr Commons image dataset.

Tagging

Over the past two months we’ve been working on expanding and refining the set of tags assigned to each image. Initially, we set out simply to assign the images to one of 11 categories, which worked surprisingly well with less than a 20% error rate. But we realised that people usually search from a much larger set of words, and we spent a lot of time thinking about how we would assign more descriptive tags to each image.

Eventually, we settled on a Google Images style approach, where we parse the text surrounding each image and use it to get a relevant set of tags. Luckily, the British Library digitised the text around all 1 million images back in 2007-8 using Optical Character Recognition (OCR), so we were able to grab this data. We explored computational tools such as Term Frequency – Inverse Document Frequency (Tf-idf) and Latent Dirichlet allocation (LDA), which try to assign the most “informative” words to each image, but found that images aren’t always associated with the words on the page.
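
To make the Tf-idf idea concrete, here is a minimal sketch that scores the words on each page by how frequent they are on that page and how rare they are across pages. It is an illustration only, not the code used in the project: the function name `tf_idf` and the toy page texts are invented for this example, and real OCR text would need cleaning first.

```python
import math
from collections import Counter

def tf_idf(pages):
    """Score each word on each page by term frequency times inverse document frequency.

    `pages` maps a page identifier to the list of words found on that page.
    """
    n_pages = len(pages)
    # Document frequency: on how many pages does each word appear at least once?
    doc_freq = Counter()
    for words in pages.values():
        doc_freq.update(set(words))

    scores = {}
    for page_id, words in pages.items():
        counts = Counter(words)
        total = len(words)
        scores[page_id] = {
            word: (count / total) * math.log(n_pages / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

# Toy OCR text for three pages (invented for illustration).
pages = {
    "p1": ["the", "ship", "sailed", "the", "sea", "sea"],
    "p2": ["the", "bird", "sat", "on", "the", "tree"],
    "p3": ["the", "ship", "and", "the", "bird"],
}
scores = tf_idf(pages)
# The highest-scoring word on page p1 is the candidate "informative" word.
top_word = max(scores["p1"], key=scores["p1"].get)
```

A word such as "the" appears on every page, so its inverse document frequency is zero and it never becomes a tag. The limitation noted above still applies: a high-scoring word describes the page, which is not necessarily what the image shows.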

To solve this problem, we decided to use a 'voting' system where we find the 20 images most similar to our image of interest, and have all images vote on the nouns that appear most commonly in their surrounding text. The most commonly appearing words will be the tags we assign to the image. Despite some computational hurdles selecting the 20 most similar images from a set of 1 million, we were able to achieve this goal. Along the way, we encountered several interesting problems.

For all images, similar images are displayed
  1. Spelling was a particularly difficult issue. The OCR algorithms that were state of the art back in 2007-2008 are now obsolete, so a sizable portion of our digitised text was misspelled / transcribed incorrectly. We used a pretty complicated decision tree to fix misspelled words. In a nutshell, it amounted to finding the word that a) is most common across British English literature and b) has the smallest edit distance relative to our misspelled word. Edit distance is the fewest number of edits (additions, deletions, substitutions) needed to transform one word into another.
  2. Words come in various forms (e.g. ‘interest’, ‘interested’, ‘interestingly’) and these forms have to be resolved into one “stem” (in this case, ‘interest’). Luckily, natural language toolkits have stemmers that do this for us. It doesn’t work all the time (e.g. ‘United States’ becomes ‘United St’ because ‘ates’ is a common suffix) but we can use various modes of spell-check trickery to fix these induced misspellings.
  3. About 5% of our books are in French, German, or Spanish. In this first iteration of the project we wanted to stick to English tags, so how do we detect if a word is English or not? We found that checking each misspelled (in English) word against all 3 foreign dictionaries would be extremely computationally intensive, so we decided to throw out all misspelled words for which the edit distance to the closest English word was greater than three. In other words, foreign words are very different from real English words, unlike misspelled words which are much closer.
  4. Several words appear very frequently in all 11 categories of images. These words were ‘great’, ‘time’, ‘large’, ‘part’, ‘good’, ‘small’, ‘long’, and ‘present’. We removed these words as they would be uninformative tags.
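
The spelling, foreign-word and uninformative-word steps above can be sketched roughly as follows. This is a simplified illustration rather than the decision tree described in the post: the tiny `WORD_FREQUENCY` table is a hypothetical stand-in for a British English frequency list, and preferring the smallest edit distance with frequency as a tie-breaker is an assumption about how the two criteria might be combined. Stemming is left out.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

# Hypothetical word frequencies standing in for a British English corpus.
WORD_FREQUENCY = {"castle": 900, "cattle": 700, "candle": 400, "river": 1200}
# The uninformative words listed in the post.
UNINFORMATIVE = {"great", "time", "large", "part", "good", "small", "long", "present"}
MAX_DISTANCE = 3  # words further than this from every English word are discarded

def correct(word):
    """Return the closest dictionary word, or None if the word looks foreign."""
    if word in WORD_FREQUENCY:
        return word
    # Assumption: smallest edit distance first, ties go to the more frequent word.
    best = min(WORD_FREQUENCY, key=lambda w: (edit_distance(word, w), -WORD_FREQUENCY[w]))
    return best if edit_distance(word, best) <= MAX_DISTANCE else None

def clean_tags(words):
    """Drop uninformative and foreign-looking words and fix the remaining spellings."""
    cleaned = []
    for word in words:
        word = word.lower()
        if word in UNINFORMATIVE:
            continue
        fixed = correct(word)
        if fixed is not None:
            cleaned.append(fixed)
    return cleaned
```

For example, `clean_tags(["Castel", "great", "river", "Geschwindigkeit"])` keeps a corrected "castle" and "river", drops "great" as uninformative, and drops the German word because it is more than three edits away from every dictionary entry.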

In the end, we had between 10 and 20 tags for each image. We estimate that between 30% and 50% of the tags convey some information about the image, and the others are circumstantial. Even at this stage, it has been immensely helpful in some of the searches we’ve done already (check out “bird”, “dog”, “mine”, “circle”, and “arch” as examples). We are actively looking for suggestions to improve our tagging accuracy. Nevertheless, we’re extremely excited that images now have useful annotations attached to them!
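
As a rough illustration of the voting scheme described earlier, the sketch below finds the nearest images by brute force and lets them vote on nouns. Several details are assumptions made for this example and are not stated in the post: images are represented by plain feature vectors, similarity is squared Euclidean distance, and each similar image gets one vote per distinct noun. The function names are hypothetical.

```python
import heapq
from collections import Counter

def most_similar(query, features, k=20):
    """Brute-force nearest neighbours by squared Euclidean distance (an assumed metric)."""
    def distance(image_id):
        return sum((q - x) ** 2 for q, x in zip(query, features[image_id]))
    return heapq.nsmallest(k, features, key=distance)

def vote_on_tags(neighbour_nouns, n_tags=10):
    """Each similar image votes once for every distinct noun in its surrounding text."""
    votes = Counter()
    for nouns in neighbour_nouns:
        votes.update(set(nouns))
    # Most votes first; alphabetical order breaks ties so the result is deterministic.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [noun for noun, _ in ranked[:n_tags]]

# Toy data (invented): three images with two-dimensional features and nearby nouns.
features = {"a": [0, 0], "b": [1, 0], "c": [5, 5]}
nouns = {"a": ["bird", "tree", "nest"], "b": ["bird", "sky"], "c": ["bird", "tree"]}

neighbours = most_similar([0, 0], features, k=2)
tags = vote_on_tags([nouns[image_id] for image_id in neighbours], n_tags=2)
```

A linear scan like this is only practical for small collections; across 1 million images an index or an approximate nearest-neighbour method would be needed, which is presumably part of the computational hurdle mentioned above.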

SherlockNet Interface

SherlockNet Interface

For the past few weeks we’ve been working on the incorporation of ~20 million tags and related images and uploading them onto our website. Luckily, Amazon Web Services provides comprehensive computing resources to take care of storing and transferring our data into databases to be queried by the front-end.

In order to make searching easier we’ve also added functionality to automatically include synonyms in your search. For example, you can type in “lady”, click on Synonym Search, and it adds “gentlewoman”, “ma'am”, “madam”, “noblewoman”, and “peeress” to your search as well. This is particularly useful in a tag-based indexing approach such as the one we are using.
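
A synonym search of this kind can be sketched as a query-expansion step in front of a tag lookup. The example below is hypothetical: the hand-written `SYNONYMS` table stands in for whatever thesaurus the interface uses, and `tag_index` is an invented in-memory mapping from tag to image ids.

```python
# Hypothetical synonym table; the entry for "lady" mirrors the example in the post.
SYNONYMS = {"lady": ["gentlewoman", "ma'am", "madam", "noblewoman", "peeress"]}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the search terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def search(tag_index, terms):
    """Return the ids of images carrying at least one of the expanded terms."""
    matches = set()
    for term in expand_query(terms):
        matches |= tag_index.get(term, set())
    return matches

# Toy index (invented): tag -> set of image ids.
tag_index = {"madam": {"img1"}, "lady": {"img2"}, "dog": {"img3"}}
results = search(tag_index, ["lady"])
```

Because tags are matched as exact strings, an image tagged only “madam” would be missed by a plain search for “lady”; expanding the query first is what lets it be found.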

As our data gets uploaded over the coming days, you should begin to see our generated tags and related images show up on the Flickr website. You can click on each image to view it in more detail, or on each tag to re-query the website for that particular tag. This way users can easily browse relevant images or tags to find what they are interested in.

Each image is currently captioned with a default description containing information on which source the image came from. As Luda finishes up his captioning, we will begin uploading his captions as well.

We will also be working on adding more advanced search capabilities via wrapper calls to the Flickr API. Proposed functionality will include logical AND and NOT operators, as well as better filtering by machine tags.

Captioning

As mentioned in our previous post, we have been experimenting with techniques to automatically caption images with relevant natural language captions. Since an Artificial Intelligence (AI) is responsible for recognising, understanding, and learning proper language models for captions, we expected the task to be far harder than that of tagging, and although the final results we obtained may not be ready for production-level archival purposes, we hope our work can help spark further research in this field.

Our last post left off with our usage of a pre-trained Convolutional Neural Network - Recurrent Neural Network (CNN-RNN) architecture to caption images. We showed that we were able to produce some interesting captions, albeit at low accuracy. The problem we pinpointed was in the training set of the model, which was derived from the Microsoft COCO dataset, consisting of photographs of modern day scenes that differ significantly from the BL Flickr dataset.

Through collaboration with BL Labs, we were able to locate a dataset that was potentially better for our purposes: the British Museum prints and drawings online collection, consisting of over 200,000 prints, drawings, and illustrations, along with handwritten captions describing the images, which the British Museum has generously given us permission to use in this context. However, since the dataset is obtained directly from the public SPARQL endpoints, we needed to run some pre-processing to make it usable. For the images, we cropped them to a standard 225 x 225 size and converted them to grayscale. For the captions, pre-processing ranged from simple exclusion of dates and author information to more sophisticated “normalization” procedures, aimed at reducing the size of the total vocabulary of the captions. Words that are exceedingly rare (<8 occurrences) were replaced with <UNK> (unknown) symbols denoting their rarity. We used the same neuraltalk architecture, using the features from a Very Deep Convolutional Network for Large-Scale Visual Recognition (VGGNet) as intermediate input into the language model. As it turns out, even with aggressive filtering of words, the distribution of vocabulary in this dataset was still too diverse for the model. Despite our best efforts to tune hyperparameters, the model we trained was consistently over-sensitive to key phrases in the dataset, which resulted in the model converging on local minima where the captions would stay the same and not show any variation. This seems to be a hard barrier to learning from this dataset. We will be publishing our code in the future, and we welcome anyone with any insight to continue this research.
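
The rare-word replacement step can be illustrated with a short sketch. This is not the project's code, which had not been published at the time of writing: the function name `replace_rare_words` is invented, captions are split on whitespace for simplicity, and only the threshold of 8 occurrences and the <UNK> symbol come from the description above.

```python
from collections import Counter

MIN_COUNT = 8   # words seen fewer than 8 times are treated as rare
UNK = "<UNK>"   # placeholder symbol for rare words

def replace_rare_words(captions, min_count=MIN_COUNT):
    """Replace every word that occurs fewer than `min_count` times overall with UNK."""
    counts = Counter(word for caption in captions for word in caption.split())
    return [
        " ".join(word if counts[word] >= min_count else UNK for word in caption.split())
        for caption in captions
    ]

# Toy captions (invented); a threshold of 2 is used so the effect is visible.
captions = ["a man on a horse", "a man in a meadow"]
normalised = replace_rare_words(captions, min_count=2)
```

Collapsing rare words into a single symbol shrinks the vocabulary the language model has to learn, at the cost of losing exactly the specific words that make a caption distinctive.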

Captions
Although there were occasional images with delightfully detailed captions (left), our models couldn’t quite capture useful information for the vast majority of the images (right). More work is definitely needed in this area!

The British Museum dataset (Prints and Drawings from the 19th Century) does, however, contain valuable contextual data, and because of our difficulty in using it to caption images directly, we decided to use it in other ways. By parsing the captions and performing Part-Of-Speech (POS) tagging, we were able to extract nouns and proper nouns from each caption. We then compiled common nouns from all the images and kept the most common ones (those appearing in at least 500 images) as tags, resulting in over 1100 different tags. This essentially converts the British Museum dataset into a rich dataset of diverse tags, which we can apply to our earlier work on tag classification. We trained a few models with some “fun” tags, such as “Napoleon”, “parrots” and “angels”, and we were able to get decent testing accuracies of over 75% on binary labels. We will be uploading a subset of these tags under the “sherlocknet:tags” prefix to the Flickr image set, as well as the previous COCO captions for a small subset of images (~100K).
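The tag-selection step can be sketched as follows. This is only an illustration under stated assumptions: the POS tagging is taken to have happened upstream, so the input is simply a list of nouns per image, and the function name and lowercasing are hypothetical choices rather than the project's own.

```python
from collections import Counter


def select_tags(nouns_per_image, min_images=500):
    """Keep nouns that appear in at least min_images distinct images.

    nouns_per_image: one list of nouns per image, assumed to come from an
    upstream POS-tagging step that is not shown here.
    """
    image_counts = Counter()
    for nouns in nouns_per_image:
        # Use a set so each noun is counted once per image, however often it repeats.
        image_counts.update({noun.lower() for noun in nouns})
    return sorted(tag for tag, n in image_counts.items() if n >= min_images)
```

Counting images rather than raw occurrences matches the ">=500 images" criterion in the text: a noun repeated many times in one caption still counts only once.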

You can access our interface here: bit.ly/sherlocknet, or look for 'sherlocknet:tag=' and 'sherlocknet:category=' tags on the British Library Flickr Commons site. Here is an example; see also the image below:

Sherlocknet tags
Example Tags on a Flickr Image generated by SherlockNet

Please check it out and let us know if you have any feedback!

We are really excited that we will be in London in a few days' time to present our findings. Why don't you come and join us at the British Library Labs Symposium, between 0930 and 1730 on Monday 7th of November, 2016?

Black Abolitionist Performances and their Presence in Britain - An update!

Posted by Hannah-Rose Murray, finalist in the BL Labs Competition 2016.

Reflecting on an incredible and interesting journey, it is remarkable how quickly the last five months have flown by! In May, I was chosen as one of the finalists for the British Library Labs Competition 2016, and my project has focused on black abolitionist performances and their presence in Britain during the nineteenth century. Black men and women had an impact in nearly every part of Great Britain, and it is no surprise to learn that their lectures were held in famous meeting halls, taverns, the houses of wealthy patrons, theatres, and churches across the country: we inevitably and unknowingly walk past sites with a rich history of Black Britain every day.

I was inspired to apply for this competition by last year’s winner, Katrina Navickas. Her project focused on the Chartist movement, and in particular using the nineteenth century digitised newspaper database to find locations of Chartist meetings around the country. Katrina and the Labs team wrote code to identify these meetings in the Chartist newspaper, and churned out hundreds of results that would have taken her years to search manually.

I wanted to do the same thing, but with black abolitionist speeches. However, there was an inherent problem: these abolitionists travelled to Britain between 1830 and 1900 and gave lectures in large cities and small towns. In other words, their lectures were covered in numerous city and provincial newspapers. The scale of the project was perhaps one of the most difficult things we have had to deal with.

When searching the newspapers, one of the first things we found was that the OCR (Optical Character Recognition) is patchy at best. OCR text is produced by turning scanned images into machine-readable text, and the quality of the OCR depends on many factors – from the quality of the scan itself, to the quality of the paper the newspaper was printed on, to whether it has been damaged or ‘muddied.’ If the OCR is unintelligible, the data will not be ‘read’ properly – hence there could be hundreds of references to Frederick Douglass that are not accessible or ‘readable’ to us through an electronic search (see the image below).

American-slavery
An excerpt from a newspaper article about a public meeting about slavery, from the Leamington Spa Courier, 20 February 1847

In order to ‘clean’ and sort through the ‘muddied’ OCR and the ‘clean’ OCR, we need to teach the computer what is ‘positive text’ (i.e., language that uses words such as ‘abolitionist’, ‘black’, ‘fugitive’, ‘negro’) and ‘negative text’ (language that does not relate to abolition). For example, the image above shows an advert for one of Frederick Douglass’s lectures (Leamington Spa Courier, 20 February 1847). The key words in this particular advert that are likely to appear in other adverts, reports and commentaries are ‘Frederick Douglass’, ‘fugitive’, ‘slave’, ‘American’, and ‘slavery.’ I can search for this advert through the digitised database, but there are perhaps hundreds more waiting to be uncovered.

We found examples where the name ‘Frederick’ had been ‘read’ as F!e83hrick or something similar. The image below shows some OCR from the Aberdeen Journal, 5 February 1851, and an article about “three fugitive slaves.” The term ‘Fugitive Slaves’ as a heading is completely illegible, as is William’s name before ‘Crafts.’ If I used a search engine to search for William Craft, it is unlikely this result would be highlighted because of the poor OCR.

Ocr-text
OCR from the Aberdeen Journal, 5 February 1851, and an article about “three fugitive slaves.”
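To illustrate why an exact search misses a garbled name like the ‘F!e83hrick’ example above, and how approximate matching could still catch it, here is a small Python sketch using the standard library's difflib. This is not the method used in the project; the function names and the 0.6 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_near_matches(text, target, threshold=0.6):
    """Return the whitespace-separated tokens in text that resemble target.

    The threshold is an illustrative assumption: lower values catch more
    garbled spellings but also produce more false matches.
    """
    return [token for token in text.split() if similarity(token, target) >= threshold]
```

For example, find_near_matches("Mr. F!e83hrick Douglass spoke", "Frederick") picks out the garbled token, whereas a plain search for ‘Frederick’ finds nothing in that string.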

I have spent several years transcribing black abolitionist speeches and most of this will act as the ‘positive’ text. ‘Negative’ text can refer to other lectures that have a similar structure but do not relate to abolition specifically, for example prison reform meetings or meetings about church finances. This will ensure the abolitionist language becomes easily readable. We can then test the performance of this against some of the data we already have, and once the probabilities suggest we are on the right track, we can apply it to a larger data set.

All of this data is built into what is called a classifier, created by Ben O’Steen, Technical Lead of BL Labs. This classifier will read the OCR and collect newspaper references, but works differently to a search engine because it measures words by weight and frequency. It also relies on probability, so for example, if there is an article that mentions ‘fugitive’ and ‘slave’ in the same section, it assigns a higher probability that the article discusses someone like Frederick Douglass or William Craft. A search engine, on the other hand, might match the words ‘fugitive’ and ‘slave’ in different articles on the same page of a newspaper.
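The idea of a classifier that weighs word frequencies and returns a probability can be sketched with a very small naive Bayes model. This is an assumption made for illustration only: the post does not say which algorithm the BL Labs classifier uses, and the class and method names below are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Toy two-class text classifier with labels 'positive' and 'negative'."""

    def __init__(self):
        self.word_counts = {"positive": Counter(), "negative": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Add one training text under the given label."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Prior: the share of training texts carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in words:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    def probability_positive(self, text):
        """Return the estimated probability that text is 'positive'.

        Needs at least one training text for each label.
        """
        words = text.lower().split()
        vocab = set(self.word_counts["positive"]) | set(self.word_counts["negative"])
        pos = self._log_score(words, "positive", len(vocab))
        neg = self._log_score(words, "negative", len(vocab))
        return 1 / (1 + math.exp(neg - pos))
```

In use, transcribed abolitionist speeches would be passed to train(..., "positive") and unrelated meeting reports to train(..., "negative"); probability_positive then gives a score for each unseen article, which can be checked by hand before applying the model to a larger data set.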

We’re currently processing the results of the classifier, and adjusting accordingly to try and reach a higher accuracy. This involves some degree of human effort while I double check the references to see whether the results actually contain an abolitionist speech. So far, we have had a few references to abolitionist speeches, but the classifier’s biggest difficulty is language. For example, there were hundreds of results from the 1830s and the 1860s – I instantly knew that these would be references around the Chartist movement because the language the Chartists used would include words like ‘slavery’ when describing labour conditions, and frequently compared these conditions to ‘negro slavery’ in the US. The large number of references from the 1860s highlights the renewed interest in American slavery because of the American Civil War, and there are thousands of articles discussing the Union, Confederacy, slavery and the position of black people as fugitives or soldiers. Several times, the results focused on fugitive slaves in America and not in Britain.

Another result we had referred to a West Indian lion tamer in London! This is a fascinating story and part of the hidden history we see as a central part of the project, but is obviously not an abolitionist speech. We are currently working on restricting our date range to 1845 to 1860 to start with, to avoid numerous mentions of Chartists and the War. This is one way in which we have had to be flexible with the initial proposal of the project.

Aside from the work on the classifier, we have also been working on numerous ways to improve the OCR – is it better to apply OCR correction software or is it more beneficial to completely re-OCR the collection, or perhaps a combination of both? We have sent some small samples to a company based in Canberra, Australia called Overproof, who specialise in OCR correction and have provided promising results. Obviously the results are on a small scale but it’s been really interesting so far to see the improvements in today’s software compared to when some of these newspapers were originally scanned ten years ago. We have also sent the same sample to the IMPACT Centre of Competence in Digitisation, whose mission is to make the digitisation of historical printed text “better, faster, cheaper” and which provides tools, services and facilities to further advance the state-of-the-art in the field of document imaging, language technology and the processing of historical text. Preliminary results will be presented at the Labs Symposium.

Updated website

Before I started working with the Library, I had designed a website at http://www.frederickdouglassinbritain.com. The structure was rudimentary and slightly awkward, dwarfed by the numerous pages I kept adding to it. As the project progressed, I wanted to improve the website at the same time, and with the invaluable help of Dr Mike Gardner from the University of Nottingham, I re-launched my website at the end of October. Initially, I had two maps, one showing the speaking locations of Frederick Douglass, and another map showing speaking locations by other black abolitionists such as William and Ellen Craft, William Wells Brown and Moses Roper (shown below).

Website-update-maps
Left map showing the speaking locations of Frederick Douglass. Right map showing speaking locations by other black abolitionists such as William and Ellen Craft, William Wells Brown and Moses Roper.

Working with Mike, we not only improved the aesthetics of the website and the maps (making them more professional) but also used clustering to highlight the areas where these men and women spoke the most. The old maps had one pin per location, so clustering avoids their ‘busy’ appearance and allows visitors to explore individual places and lectures more efficiently. Furthermore, on the black abolitionist speaking locations map (below right), a user can choose an individual and see only their lectures, or choose two or three in order to correlate patterns between who gave these lectures and where they travelled.

Website-update-maps-v2
The new map interface for my website.
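For readers curious about what clustering map pins involves, here is a minimal Python sketch that groups nearby locations into grid cells. It is a deliberately simple stand-in, not necessarily the technique used on the website: the function name, the cell size and the (name, latitude, longitude) input format are all assumptions for illustration.

```python
from collections import defaultdict


def cluster_by_grid(points, cell_size_deg=0.5):
    """Group (name, lat, lon) points into square grid cells.

    Points falling in the same cell form one cluster, so a map can show a
    single marker with a count instead of many overlapping pins.
    """
    clusters = defaultdict(list)
    for name, lat, lon in points:
        # Floor division assigns each coordinate to the cell that contains it.
        cell = (int(lat // cell_size_deg), int(lon // cell_size_deg))
        clusters[cell].append(name)
    return dict(clusters)
```

A smaller cell size gives more, finer clusters; interactive maps typically recompute this grouping as the user zooms in.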

Events

I am very passionate about public engagement and regard it as an essential part of being an academic, since it is so important to engage and share with, and learn from, the public. We have created two events: as part of Black History Month on the 6th October, we had a performance here at the Library celebrating the lives of two formerly enslaved individuals named William and Ellen Craft. Joe Williams of Heritage Corner in Leeds – an actor and researcher who has performed as numerous people such as Frederick Douglass and the black circus entertainer Pablo Fanque – had been writing a play about the Crafts, and because it fitted so well with the project, we invited Joe and actress Martelle Edinborough, who played Ellen, to London for a performance. Both Joe and Martelle were incredible and it really brought the Crafts’ story and the project to life. We had a Q&A afterwards where everyone was very responsive and positive about the performance and the Crafts’ story of heroism and bravery.

Hannah-murray-actors
(Left to Right) Martelle Edinborough, Hannah-Rose Murray and Joe Williams

The next event is a walking tour, taking place on Saturday 26 November. I’ve devised this tour around central London, highlighting six sites where black activists made an indelible mark on British society during the nineteenth century. It is a way of showing how we walk past these sites on a daily basis, and how we need to recognise the contributions of these individuals to British history.

Hopefully this project will inspire others to research and use digital scholarship to find more ‘hidden voices’ in the archive. In terms of black history specifically, people of colour were actors, sailors, boxers, students, authors as well as lecturers, and there is so much more to uncover about their contribution to British history. My personal journey with the Library and the Labs team has also been a rewarding experience. It has further convinced me of the value of digital humanities in general, and that we need stronger networks of collaboration between scholars and computer scientists. Academics could harness the power of technology to bring their research to life, an important and necessary tool for public engagement. I hope to continue working with the Labs team fine-tuning some of the results, as well as writing some pages about black abolitionists for the new website. I’m very grateful to the Library and the Labs team for their support, patience, and this amazing opportunity. I’ve learned so much about digital humanities, and I see this project – with its combination of manual and technological methods – as a larger model for how we should move forward in the future. The project will shape my career in new and exciting ways, and the opportunity to work with one of the best libraries in the world is a really gratifying experience.

I am really excited that I will be in London in a few days' time to present my findings. Why don't you come and join us at the British Library Labs Symposium, between 0930 and 1730 on Monday 7th of November, 2016?

28 October 2016

2016 Shakespeare Off the Map Competition Winners Announced at GameCity11 Festival

Last night was the awards event at The National Videogame Arcade in Nottingham for the 2016 Off the Map competition, which had a Shakespeare theme. Now in its fourth year, Off the Map challenges full-time UK students in higher or further education to make videogames, digital explorable environments, or interactive fiction based on digitised British Library collection items.

For 2016 the competition has been part of the British Library's commemoration of 400 years since the death of Shakespeare and has been running in conjunction with the Library’s recent exhibition “Shakespeare in Ten Acts”. Curators selected text, images, maps and sounds based on three sub themes:

  • Castles: Scene of Ghosts and Murder
  • The Tempest
  • Forests, Woodlands and A Midsummer Night’s Dream

This year's fantastic first place winning entry used The Tempest sub theme and was created by Team Quattro from De Montfort University in Leicester. The team consisted of six students: Chris Anka, Perrie Green, Tara Naz, Jade Silver, Jasdev Singh and Joel Wilkins.

The Tempest game logo

Team Quattro

 

Flythrough of Team Quattro’s ‘The Tempest’ game

Dr Erin Sullivan described the winning game as ‘an evocative, immersive world that powerfully channels the drama of The Tempest. It introduces players to the story of the play in a deep, thoughtful way.’

Dr Abigail Parry said ‘I was head-over-heels for the metatextual element of this submission – you had me at the stage door. It was good, too, to see source text daubed on the caves walls – for me, the greatest strength of the submission was that it succeeded in synthesising text, assets and game environment in a way that was both engaging and beautiful.  Also to be commended was the attention to detail – the prop storm clouds were a delight.  The individual domains were characterful, and the story welcome without being obtrusive.  Most of all, it displayed a real interest in – and affection for - the play. I would want to play this game, and would be equally proud to teach with it.’ 

In second place came Tom Battey from the London College of Communication with a game called ‘Midsummer’ based on the characters in the play A Midsummer Night’s Dream.

Dr Erin Sullivan, describing 'Midsummer', said that ‘the visual world and the engagement with the play were extremely impressive. I loved the historical flourishes and the imaginative exploration of the characters’ emotions.’

Midsummer1-825x510
Midsummer by Tom Battey

In third place was an interactive fiction story, again using The Tempest sub theme, called This Most Desolate Isle by Alan Stewart from Brunel University, who effectively used illustrations by Arthur Rackham to accompany his creative writing.

This Most Desolate Isle by Alan Stewart

Huge congratulations to this year's winning entries, and I'd also like to offer sincere thanks to the 2016 Off the Map jury members:

  • Sarah Ellis, Head of Digital Development at the Royal Shakespeare Company
  • Dr Abigail Parry, Poet in Residence at the National Videogame Arcade
  • Dr Erin Sullivan, Shakespeare Institute Senior Lecturer at the University of Birmingham
  • Cheryl Tipp, Wildlife and Environmental Sounds Curator at the British Library
  • Zoë Wilcox, Lead Curator of the Shakespeare Exhibition at the British Library

The 2017 competition is called There Will be Fun Off The Map, and it is associated with the British Library’s current exhibition Victorian Entertainments: There Will Be Fun, which is open until Sunday 12 February 2017. Keep your eyes peeled for further information about this; I will be blogging here over the next few weeks, when the new There Will be Fun Off The Map competition website is available.

Stella Wisdom, Digital Curator, @miss_wisdom

19 October 2016

Maurice Nicholson - British Library Flickr Commons Map Tagger and Top Georeferencer

I am a retired pharmacist who has been involved in the British Library's (BL) georeferencing work from its inception in February 2012. I have always had an interest in maps and mapping, and was alerted to this project through social media.

Maurice Nicholson
Maurice Nicholson, Volunteer BL Flickr Maps Tagger & Georeferencer

I made my first attempt at georeferencing one of the maps from the BL collection of Ordnance Survey original manuscripts. This map of my local area of Bedfordshire from 1815 had me 'hooked' on georeferencing and by the end of that week I had georeferenced more than a hundred maps, meaning that I was the main contributor to that first batch of maps.

Subsequent batches of maps were released at intervals from November 2012 through to July 2014. Each set of up to 3200 maps was georeferenced by volunteers in less than a month, and I made a major contribution to each.

Bedford OS 1815
Bedford OS 1815, Screenshot of a detail of the first map I georeferenced (Bedfordshire Ordnance Survey manuscript map 1815).

The July 2014 release consisted of images that had been identified as maps and plans from the BL Flickr commons collection. This collection has just over a million digital images, and it was recognised that there was probably a considerable number of maps and plans suitable for georeferencing within this digital archive. Starting with a map Tag-a-thon held at the BL (through British Library Labs) on Hallowe’en 2014, other volunteers and I went systematically through the Flickr collection, tagging all these suitable images.

Just over 50,000 maps and plans were found and these images were released as the latest batch needing to be georeferenced in March 2015.

As of October 2016, just over 18,000 of these have been successfully georeferenced, meaning that there are still around 32,000 waiting to be done. Considering how quickly previous releases were completed, progress has been comparatively disappointing, but there are several reasons for this: firstly, the sheer number of images needing to be processed, and secondly, the variation in the mapping quality. Previous batches had comprised specific map collections or specially chosen maps, whereas the Flickr collection contained a much wider range of images with many that are going to prove very tricky to georeference.

My own personal contribution to this current batch is 47,000 reference points (76,000 in total since the project started), which at around 10 to 20 points per map equates to several thousand maps georeferenced. This places me a considerable way ahead of any other contributor.
http://www.bl.uk/maps/georeferencingdata.html

No man's land 1916
Screenshot of a detail of one of my favourite maps that I've georeferenced (no man's land south of Ypres 1916)

This year I have been promoting georeferencing in my local area by giving presentations to local history groups, highlighting the uses that georeferenced maps can be put to for research in their area.

In November, the British Library (through its Learning Team and Emma Bull, Schools Programme Manager) is holding a half day conference (details below) aimed at geography teachers, exploring digital resources and their uses in an educational setting. Working with Mahendra Mahey, Manager of British Library Labs, and Philip Hatfield, Digital Mapping Curator at the British Library, I will contribute by running workshops that draw on my experience and expertise to demonstrate the art of georeferencing and allow the participants to try georeferencing themselves.

The Way Ahead? Map Making and Digital Skills for Geography Teaching.

Sat 12 Nov, 9:45 – 13:30

British Library Conference Centre, 96 Euston Road, NW1 2DB

Cost: £12 - £24

This half-day conference for Geography teachers at Key Stages 2–5 uncovers the British Library’s forthcoming major exhibition Maps and the 20th Century: Drawing the Line and explores a range of approaches to interpreting and creating maps, with a focus on digital resources, to support and enrich Geography in the Primary and Secondary classroom. 

Link: https://goo.gl/f014YR

I will of course be attending the British Library's Maps Exhibition, which starts on the 4th of November 2016, and you can also meet me on Monday 7th of November 2016 at the British Library Labs Symposium.

20 September 2016

Black Abolitionists: Performance and Discussion for Black History Month by Hannah-Rose Murray

Posted by Mahendra Mahey on behalf of Hannah-Rose Murray, finalist in the BL Labs Competition 2016.

To celebrate Black History Month in October 2016, you are welcome to attend an evening of performance on the 6th October, 7pm, hosted by the British Library Labs project and the Eccles Centre for American Studies in the Auditorium, Conference Centre, British Library, St Pancras, London, UK.

I am very lucky to be one of the finalists for the Labs Competition for 2016, and together we have organized an event that celebrates our project. Through my work with the Labs team, we are attempting to use machine learning to search through the digitized newspaper collections to access black abolitionist speeches and performances that have never been discovered before (read more here). This stems from my PhD project, which focuses on African Americans in Britain during the nineteenth century and the myriad ways they resisted British racism.

Two of the individuals I study are William and Ellen Craft, and we are really pleased to be working with two performers who will bring this incredible history to light on the evening of the 6th.

Ellen_craft
Ellen Craft dressed as a man to escape from slavery. Image from "The Underground Railroad from Slavery to Freedom" 2nd ed.,

William and Ellen Craft were born enslaved in Georgia. Ellen worked as a house servant, and when she was 20, married William (although by law in the South slave marriages were not legal). They were determined to escape as they were fearful their master would sell them separately further South and they did not want to raise children in slavery. In 1848, they devised an ingenious escape plan: Ellen would pose as a gentleman with William as her manservant, and they would catch a series of trains and steamboats to the North. Ellen was fair-skinned, which was a result of her mother’s rape by her master, the plantation owner. Ellen could thus pass for a white person, but she could not read or write. To overcome this, Ellen strapped a bandage to her right hand to give her a reason not to be able to write just in case she was asked. This was an incredibly dangerous mission to accomplish - if caught, both William and Ellen would have been tortured and most certainly separated to different parts of the South, never to see each other again. It is a testimony to their bravery that they managed to succeed.

 

For a short time, the Crafts settled in Boston but legally they were still enslaved in the eyes of the American government. When slave catchers threatened to steal them back into slavery, they set sail for England where they remained for over a decade. The Crafts soon became part of an abolitionist network in which hundreds of African Americans travelled to Britain to lecture against slavery, raise money to purchase enslaved family members or to live in Britain relatively safely from the violence they experienced in America. British audiences were fascinated by their incredible escape attempt, and were shocked that a ‘white’ person like Ellen could ever have been enslaved. Both William and Ellen travelled around Britain to educate Britons about the true nature of slavery and demanded their support in helping Americans abolish it.

During the evening, performer and writer Joe Williams will play William Craft. Joe has an MA from Leeds University’s School of Performance and Cultural industries and is the founder of Heritage Corner, which focuses on African narratives in British history. He has written performed works on leading abolitionists as well as on Victorian circus genius Pablo Fanque.

Martelle Edinborough will play Ellen Craft. Martelle has stage, film and television credits that include commercials and short films. Martelle has recently worked with the Leeds based Geraldine Connor Foundation on Forrest Dreaming and Chicken Shop Shakespeare’s contribution to this year’s Ilkley Literature Festival.

There will be a short welcome and introduction to the Crafts, after which the performance will run for an hour, with time for a Q&A afterwards.

Tickets are £8 (with some concessions available), and available here.

Please note a small number of free seats are available for community residents in Camden (London, England). If you think you are eligible, please contact Emma Morgan, Community Engagement Manager at the British Library at emma.morgan@bl.uk.