THE BRITISH LIBRARY

Digital scholarship blog

15 posts categorized "Visual arts"

17 October 2017

Imaginary Cities – Collaborations with Technologists


Posted by Mahendra Mahey (Manager of BL Labs) on behalf of Michael Takeo Magruder (BL Labs Artist/Researcher in Residence).

In developing the Imaginary Cities project, I enlisted two long-standing colleagues to help collaboratively design the creative-technical infrastructures required to realise my artistic vision.

The first area of work sought to address my desire to create an automated system that could take a single map image from the British Library’s 1 Million Images from Scanned Books Flickr Commons collection and from it generate an endless series of ever-changing aesthetic iterations. This initiative was undertaken by the software architect and engineer David Steele, who developed a server-side program to realise the concept.

David’s server application links to a curated set of British Library maps through their unique Flickr URLs. The high-resolution maps are captured and stored by the server, and through a pre-defined algorithmic process are transformed into ultra-high-resolution images that appear as mandala-esque ‘city plans’. This process of aesthetic transformation is executed once per day and is affected by two variables. The first is simply the passage of time, while the second is based on external human or network interaction with the original source maps in the digital collection (such as changes to metadata tags, view counts, etc.).
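
To make this mechanism concrete, here is a minimal, hypothetical sketch of how such a server-side job might work: it is not David's actual code (whose internals are not described here), and every function and parameter name is illustrative. The idea is simply that the output is a deterministic function of the source map, the current date and the image's Flickr metadata, so it changes daily and also responds to outside interaction.

```python
# Hypothetical sketch only; not the project's actual implementation.
import datetime
import hashlib
import urllib.request

from PIL import Image, ImageOps  # Pillow

def fetch_map(image_url, out_path="source_map.jpg"):
    """Download the high-resolution source map from its Flickr URL."""
    urllib.request.urlretrieve(image_url, out_path)
    return Image.open(out_path)

def daily_seed(image_url, view_count, tags):
    """Fold the date and the image's metadata (views, tags) into one number,
    so the transformation changes with time and with interaction on Flickr."""
    today = datetime.date.today().isoformat()
    payload = f"{image_url}|{today}|{view_count}|{','.join(sorted(tags))}"
    return int(hashlib.sha256(payload.encode()).hexdigest(), 16)

def render_city_plan(source, seed, tiles=8):
    """Toy 'mandala' transform: rotate and mirror tiles of the map,
    with the seed deciding how each tile is treated."""
    w, h = source.size
    tw, th = w // tiles, h // tiles
    canvas = Image.new("RGB", (w, h))
    for i in range(tiles):
        for j in range(tiles):
            tile = source.crop((i * tw, j * th, (i + 1) * tw, (j + 1) * th))
            turns = (seed >> (i * tiles + j)) & 3   # 0-3 quarter turns per tile
            tile = tile.rotate(90 * turns)
            if (i + j) % 2:
                tile = ImageOps.mirror(tile)
            canvas.paste(tile, (i * tw, j * th))
    return canvas
```

Run once a day from a scheduled job, the same curated map would then yield a slowly evolving sequence of renderings, which is the behaviour described above.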


Time-lapse of algorithmically generated images (showing days 1, 7, 32 and 152) constructed from a 19th-century map of Paris

The second challenge involved transforming the algorithmically created 2D assets into real-time 3D environments that could be experienced through leading-edge visualisation systems, including VR headsets. This work was led by the researcher and visualisation expert Drew Baker, and was done using the 3D game development platform Unity. Drew produced a working prototype application that accessed the static image ‘city plans’ generated by David’s server-side infrastructure, and translated them into immersive virtual ‘cityscapes’.

The process begins with the application analysing an image bitmap and converting each pixel into a 3D geometry that is reminiscent of a building. These structures are then textured and aligned in a square grid that matches the original bitmap. Afterwards, the camera viewpoint descends into the newly rezzed city and can be controlled by the user.
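
Drew's prototype is built in Unity, so the real implementation is C#; purely to illustrate the per-pixel step described above, the Python sketch below shows one plausible way of turning each pixel of the generated 'city plan' into a building-like block on a grid that mirrors the bitmap. The brightness-to-height rule is an assumption for the sake of the example, not the project's actual mapping.

```python
# Illustrative sketch only; the actual prototype is a Unity (C#) application.
from dataclasses import dataclass
from PIL import Image

@dataclass
class Building:
    x: int          # grid column, matching the pixel's x coordinate
    z: int          # grid row, matching the pixel's y coordinate
    height: float   # assumed here to be derived from pixel brightness
    colour: tuple   # RGB value used to texture the block

def bitmap_to_city(path, max_height=50.0):
    """Convert every pixel of the 'city plan' bitmap into a building placed on
    a square grid that matches the original image."""
    img = Image.open(path).convert("RGB")
    grey = img.convert("L")
    buildings = []
    for y in range(img.height):
        for x in range(img.width):
            brightness = grey.getpixel((x, y)) / 255.0
            buildings.append(Building(x=x, z=y,
                                      height=brightness * max_height,
                                      colour=img.getpixel((x, y))))
    return buildings
```

In the actual Unity application, each such record would correspond to a textured structure placed in the scene before the camera descends into the grid.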

Analysis and transformation of the source image bitmap
View of the procedurally created 3D cityscape

At present I am still working with David and Drew to refine and expand these amazing systems that they have created. Moving forward, our next major task will be to successfully use the infrastructures as the foundation for a new body of artwork.

I will be presenting this work at the British Library Labs Symposium 2017, in the British Library Conference Centre Auditorium in London, on Monday 30th of October 2017. For more information and to book (registration is FREE), please visit the event page.

About the collaborators:

David Steele

David Steele is a computer scientist based in Arlington, Virginia, USA, specialising in progressive web programming and database architecture. He has been working with a wide range of web technologies since the mid-nineties and was a pioneer in pairing cutting-edge clients with existing corporate infrastructures. His work has enabled a variety of advanced applications, from global text messaging frameworks to re-entry systems for the space shuttle. He is currently Principal Architect at Crunchy Data Solutions, Inc., and is involved in developing massively parallel backup solutions to protect the world's ever-growing data stores.

Drew Baker

Drew Baker is an independent researcher based in Melbourne, Australia. Over the past 20 years he has worked in the visualisation of archaeology and cultural history. His explorations in 3D digital representation of spaces and artefacts as a research tool for both virtual archaeology and broader humanities applications laid the foundations for the London Charter, establishing internationally recognised principles for the use of computer-based visualisation by researchers, educators and cultural heritage organisations. He is currently working with a remote community of Indigenous Australian elders from the Warlpiri nation in the Northern Territory’s Tanami Desert, digitising their intangible cultural heritage assets for use within the Kurdiji project – an initiative that seeks to improve mental health and resilience in the nation’s young people through the use of mobile technologies.

26 September 2017

BL Labs Symposium (2017), Mon 30 Oct: book your place now!



Posted by Mahendra Mahey, BL Labs Manager

The BL Labs team are pleased to announce that the fifth annual British Library Labs Symposium will be held on Monday 30 October, from 9:30 to 17:30, in the British Library Conference Centre, St Pancras. The event is FREE, although you must book a ticket in advance. Don't miss out!

The Symposium showcases innovative projects which use the British Library’s digital content, and provides a platform for development, networking and debate in the Digital Scholarship field.

Josie Fraser will be giving the keynote at this year's Symposium

This year, Dr Adam Farquhar, Head of Digital Scholarship at the British Library, will launch the Symposium and Josie Fraser, Senior Technology Adviser on the National Technology Team, based in the Department for Digital, Culture, Media and Sport in the UK Government, will be presenting the keynote. 

There will be presentations from the BL Labs Competition (2016) runners-up: artist/researcher Michael Takeo Magruder on his 'Imaginary Cities' project, and lecturer/researcher Jennifer Batt on her 'Datamining verse in Eighteenth Century Newspapers' project.

After lunch, the winners of the BL Labs Awards (2017) will be announced, followed by presentations of their work. The Awards celebrate researchers, artists, educators and entrepreneurs from around the world who have made use of the British Library's digital content and data, in each of the Awards’ categories:

  • BL Labs Research Award. Recognising a project or activity which shows the development of new knowledge, research methods or tools.
  • BL Labs Artistic Award. Celebrating a creative or artistic endeavour which inspires, stimulates, amazes and provokes.
  • BL Labs Commercial Award. Recognising work that delivers or develops commercial value in the context of new products, tools or services that build on, incorporate or enhance the British Library's digital content.
  • BL Labs Teaching / Learning Award. Celebrating quality learning experiences created for learners of any age and ability that use the British Library's digital content.
  • BL Labs Staff Award. Recognising an outstanding individual or team who have played a key role in innovative work with the British Library's digital collections.  

The Symposium's endnote will be followed by a networking reception that concludes the event, at which delegates and staff can mingle over a drink.

Tickets are going fast, so book your place for the Symposium today!

For any further information please contact labs@bl.uk

04 August 2017

BL Labs Awards (2017): enter before midnight Wednesday 11th October!


Posted by Mahendra Mahey, Manager of British Library Labs.

The BL Labs Awards formally recognise outstanding and innovative work that has been created using the British Library’s digital collections and data.

The closing date for entering the BL Labs Awards (2017) is midnight BST on Wednesday 11th October. So please submit your entry and/or help us spread the word to all interested and relevant parties over the next few months or so. This will ensure we have another year of fantastic digital-based projects highlighted by the Awards!

This year, the BL Labs Awards are commending work in four key areas:

  • Research - A project or activity which shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour which inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

After the submission deadline of midnight BST on Wednesday 11th October has passed, the entries will be shortlisted. Shortlisted entrants will be notified via email by midnight BST on Friday 20th October 2017.

A prize of £500 will be awarded to the winner and £100 to the runner-up of each Awards category at the BL Labs Symposium on 30th October 2017 at the British Library, St Pancras, London.

The talent of the BL Labs Awards winners and runners-up of 2016 and 2015 has led to the production of a remarkable and varied collection of innovative projects. In 2016, the Awards commended work in four main categories – Research, Artistic, Commercial and Teaching/Learning:

  • Research category Award (2016) winner: 'Scissors and Paste', by M. H. Beals. Scissors and Paste utilises the 1800-1900 digitised British Library Newspapers collection to explore the possibilities of mining large-scale newspaper databases for reprinted and repurposed news content.
  • Artistic Award (2016) winner: 'Hey There, Young Sailor', written and directed by Ling Low with visual art by Lyn Ong. Hey There, Young Sailor combines live action with animation, hand-drawn artwork and found archive images to tell a love story set at sea. The video draws on late 19th century and early 20th century images from the British Library's Flickr collection for its collages and tableaux and was commissioned by Malaysian indie folk band The Impatient Sisters and independently produced by a Malaysian and Indonesian team.
BL Labs Award Winners 2016
Image: 'Scissors and Paste', by M. H. Beals (top left); 'Curating Digital Collections to Go Mobile', by Mitchell Davis (top right); 'Hey There, Young Sailor', written and directed by Ling Low with visual art by Lyn Ong (bottom left); 'Library Carpentry', founded by James Baker and involving the international Library Carpentry team (bottom right)
  • Commercial Award (2016) winner: 'Curating Digital Collections to Go Mobile', by Mitchell Davis. BiblioBoard is an award-winning e-content delivery platform with online curatorial and multimedia publishing tools that make it simple for subject-area experts to create visually stunning multimedia exhibits for the web and mobile devices without any technical expertise. The winning example used a collection of digitised 19th-century books.
  • Teaching and Learning (2016) winner: 'Library Carpentry', founded by James Baker and involving the international Library Carpentry team. Library Carpentry is software-skills training aimed at the needs and requirements of library professionals. It takes the form of a series of modules that are available online for self-directed study, or for adaptation and reuse by library professionals in face-to-face workshops using British Library data and collections. Library Carpentry is in the commons and for the commons: it is not tied to any institution or person. For more information, see http://librarycarpentry.github.io/.
  • Jury’s Special Mention Award (2016): 'Top Geo-referencer' – Maurice Nicholson. Maurice leads the effort to georeference over 50,000 maps that were identified through Flickr Commons; read more about his work here.

For any further information about BL Labs or our Awards, please contact us at labs@bl.uk.

16 May 2017

Michael Takeo Magruder @ Gazelli Art House


Posted by Mahendra Mahey (Manager of BL Labs) on behalf of Michael Takeo Magruder (BL Labs Artist/Researcher in Residence).

Michael Takeo Magruder's Gazell.io works

Earlier this year I was invited by Gazelli Art House to be a digital artist-in-residence on their online platform Gazell.io. After a series of conversations with Gazelli’s director, Mila Askarova, we decided it would be a perfect opportunity to broker a partnership with British Library Labs and use the occasion to publish some of the work-in-progress ideas from my Imaginary Cities project at the British Library.

Given Gazelli’s growing interest in and reputation for exhibiting virtual reality (VR) art, we chose to launch my March showcase with A New Jerusalem since it was in many ways the inspiration for the Imaginary Cities concept.

A New Jerusalem by Michael Takeo Magruder

During the second half of my Gazell.io residency I began publishing various aesthetic-code studies that had been created for the Imaginary Cities project. I was also invited by Gazelli to hold a private sharing event at their London gallery in Mayfair to showcase some of the project’s physical experiments and outcomes. The evening was organised by Gazelli’s Artist Liaison, Victoria Al-Din, and brought together colleagues from the British Library, art curators from leading cultural institutions and academics connected to media art practice. It was a wonderful event, and it was incredibly useful to be able to present my ideas and the resulting artistic-technical prototypes to a group with such a deep and broad range of expertise. 


Sharing works in progress for the Imaginary Cities project at Gazelli Art House, London. 30th March 2017

03 November 2016

SherlockNet update - 10s of millions more tags and thousands of captions added to the BL Flickr Images!


SherlockNet are Brian Do, Karen Wang and Luda Zhao, finalists for the Labs Competition 2016.

We have some exciting updates regarding SherlockNet, our ongoing effort to use machine learning techniques to radically improve the discoverability of the British Library Flickr Commons image dataset.

Tagging

Over the past two months we’ve been working on expanding and refining the set of tags assigned to each image. Initially, we set out simply to assign the images to one of 11 categories, which worked surprisingly well with less than a 20% error rate. But we realised that people usually search from a much larger set of words, and we spent a lot of time thinking about how we would assign more descriptive tags to each image.

Eventually, we settled on a Google Images style approach, where we parse the text surrounding each image and use it to get a relevant set of tags. Luckily, the British Library digitised the text around all 1 million images back in 2007-8 using Optical Character Recognition (OCR), so we were able to grab this data. We explored computational tools such as Term Frequency – Inverse Document Frequency (Tf-idf) and Latent Dirichlet allocation (LDA), which try to assign the most “informative” words to each image, but found that images aren’t always associated with the words on the page.
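
As a rough illustration of what the TF-IDF side of that experiment looks like, a sketch along these lines scores the OCR'd text and keeps the highest-weighted words for each page; `pages` is a hypothetical list of OCR text strings, one per image, and as noted above this alone was not enough because the words on a page are not always about the image.

```python
# Illustrative sketch of TF-IDF keyword extraction over the OCR'd page text.
from sklearn.feature_extraction.text import TfidfVectorizer

def most_informative_words(pages, top_n=10):
    vec = TfidfVectorizer(stop_words="english", max_features=50000)
    tfidf = vec.fit_transform(pages)          # rows: images, columns: words
    vocab = vec.get_feature_names_out()       # get_feature_names() on older scikit-learn
    results = []
    for row in tfidf:                         # each row is one page's sparse vector
        scores = row.toarray().ravel()
        top = scores.argsort()[::-1][:top_n]
        results.append([vocab[i] for i in top if scores[i] > 0])
    return results
```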

To solve this problem, we decided to use a 'voting' system where we find the 20 images most similar to our image of interest, and have all images vote on the nouns that appear most commonly in their surrounding text. The most commonly appearing words will be the tags we assign to the image. Despite some computational hurdles selecting the 20 most similar images from a set of 1 million, we were able to achieve this goal. Along the way, we encountered several interesting problems.

For all images, similar images are displayed
  1. Spelling was a particularly difficult issue. The OCR algorithms that were state of the art back in 2007-2008 are now obsolete, so a sizable portion of our digitised text was misspelled / transcribed incorrectly. We used a pretty complicated decision tree to fix misspelled words. In a nutshell, it amounted to finding the word that a) is most common across British English literature and b) has the smallest edit distance relative to our misspelled word. Edit distance is the fewest number of edits (additions, deletions, substitutions) needed to transform one word into another. (A simplified sketch of this correction step appears after this list.)
  2. Words come in various forms (e.g. ‘interest’, ‘interested’, ‘interestingly’) and these forms have to be resolved into one “stem” (in this case, ‘interest’). Luckily, natural language toolkits have stemmers that do this for us. It doesn’t work all the time (e.g. ‘United States’ becomes ‘United St’ because ‘ates’ is a common suffix) but we can use various modes of spell-check trickery to fix these induced misspellings.
  3. About 5% of our books are in French, German, or Spanish. In this first iteration of the project we wanted to stick to English tags, so how do we detect if a word is English or not? We found that checking each misspelled (in English) word against all 3 foreign dictionaries would be extremely computationally intensive, so we decided to throw out all misspelled words for which the edit distance to the closest English word was greater than three. In other words, foreign words are very different from real English words, unlike misspelled words which are much closer.
  4. Several words appear very frequently in all 11 categories of images. These words were ‘great’, ‘time’, ‘large’, ‘part’, ‘good’, ‘small’, ‘long’, and ‘present’. We removed these words as they would be uninformative tags.
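
As mentioned in point 1, here is a simplified sketch of the edit-distance correction; the real decision tree was more involved, and `word_freq` is a hypothetical word-frequency dictionary built from British English text. The `max_distance` cut-off of three doubles as the foreign-word filter from point 3.

```python
# Simplified sketch of the spell-correction idea; illustrative only.
def edit_distance(a, b):
    """Levenshtein distance: fewest additions, deletions or substitutions
    needed to transform one word into another."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # addition
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct(word, word_freq, max_distance=3):
    """Return the closest, most frequent dictionary word, or None if nothing is
    within `max_distance` edits (treated as a non-English word, as in point 3)."""
    dist, neg_freq, best = min((edit_distance(word, w), -f, w)
                               for w, f in word_freq.items())
    return best if dist <= max_distance else None
```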

In the end, we ended up with between 10 and 20 tags for each image. We estimate that between 30% and 50% of the tags convey some information about the image, while the others are circumstantial. Even at this stage, this tagging has been immensely helpful in some of the searches we’ve done already (check out “bird”, “dog”, “mine”, “circle”, and “arch” as examples). We are actively looking for suggestions to improve our tagging accuracy. Nevertheless, we’re extremely excited that images now have useful annotations attached to them!

SherlockNet Interface


For the past few weeks we’ve been working on incorporating the ~20 million tags and related images and uploading them to our website. Luckily, Amazon Web Services provides comprehensive computing resources to take care of storing and transferring our data into databases to be queried by the front end.

In order to make searching easier we’ve also added functionality to automatically include synonyms in your search. For example, you can type in “lady”, click on Synonym Search, and it adds “gentlewoman”, “ma'am”, “madam”, “noblewoman”, and “peeress” to your search as well. This is particularly useful in a tag-based indexing approach such as the one we are using.
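
Our implementation differs in its details, but the idea behind Synonym Search can be sketched with WordNet via NLTK: look up the term's synsets and add their lemmas to the query.

```python
# Hedged sketch of synonym expansion using WordNet (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def expand_query(term):
    """Return the search term plus its WordNet synonyms."""
    synonyms = {term}
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
    return sorted(synonyms)

print(expand_query("lady"))   # includes e.g. 'madam', 'noblewoman', 'peeress'
```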

As our data gets uploaded over the coming days, you should begin to see our generated tags and related images show up on the Flickr website. You can click on each image to view it in more detail, or on each tag to re-query the website for that particular tag. This way users can easily browse relevant images or tags to find what they are interested in.

Each image is currently captioned with a default description containing information on which source the image came from. As Luda finishes up his captioning, we will begin uploading his captions as well.

We will also be working on adding more advanced search capabilities via wrapper calls to the Flickr API. Proposed functionality will include logical AND and NOT operators, as well as better filtering by machine tags.
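
For anyone who wants to experiment with similar queries directly, the public Flickr API already supports filtering by machine tags; the generic sketch below calls flickr.photos.search with one of our machine tags. YOUR_API_KEY is a placeholder, and restricting results to the British Library account would additionally require passing its user_id.

```python
# Generic sketch of a Flickr API machine-tag search; not the SherlockNet site's code.
import json
import urllib.parse
import urllib.request

FLICKR_REST = "https://api.flickr.com/services/rest/"

def search_machine_tags(machine_tags, api_key, mode="all", per_page=20):
    """Search Flickr photos by machine tag, e.g. ['sherlocknet:tag=dog']."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "machine_tags": ",".join(machine_tags),
        "machine_tag_mode": mode,        # 'all' behaves like a logical AND
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    url = FLICKR_REST + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["photos"]["photo"]

# results = search_machine_tags(["sherlocknet:tag=dog"], api_key="YOUR_API_KEY")
```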

Captioning

As mentioned in our previous post, we have been experimenting with techniques to automatically caption images with relevant natural language captions. Since an artificial intelligence (AI) system is responsible for recognising, understanding, and learning proper language models for captions, we expected the task to be far harder than tagging. Although the final results we obtained may not be ready for production-level archival purposes, we hope our work can help spark further research in this field.

Our last post left off with our use of a pre-trained Convolutional Neural Network-Recurrent Neural Network (CNN-RNN) architecture to caption images. We showed that we were able to produce some interesting captions, albeit at low accuracy. The problem we pinpointed was the model's training set, which was derived from the Microsoft COCO dataset of photographs of modern-day scenes and so differs significantly from the BL Flickr dataset.

Through collaboration with BL Labs, we were able to locate a dataset that was potentially better for our purposes: the British Museum prints and drawings online collection, consisting of over 200,000 prints, drawings and illustrations, along with handwritten captions describing each image, which the British Museum has generously given us permission to use in this context. However, since the dataset is obtained directly from the public SPARQL endpoints, we needed to run some pre-processing to make it usable. For the images, we cropped them to a standard 225 x 225 size and converted them to grayscale. For the captions, pre-processing ranged from simple exclusion of dates and author information to more sophisticated “normalization” procedures aimed at reducing the total vocabulary of the captions. For words that are exceedingly rare (<8 occurrences), we replaced them with <UNK> (unknown) symbols denoting their rarity.

We used the same NeuralTalk architecture, using the features from a Very Deep Convolutional Network for Large-Scale Visual Recognition (VGGNet) as intermediate input into the language model. As it turns out, even with aggressive filtering of words, the distribution of vocabulary in this dataset was still too diverse for the model. Despite our best efforts to tune hyperparameters, the model we trained was consistently over-sensitive to key phrases in the dataset, which resulted in it converging on local minima where the captions would stay the same and not show any variation. This seems to be a hard barrier to learning from this dataset. We will be publishing our code in the future, and we welcome anyone with any insight to continue this research.
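
The pre-processing itself is straightforward; a rough sketch of the two steps described above, resizing and greyscaling the images and replacing exceedingly rare caption words with <UNK>, might look like this (file paths and caption lists are placeholders).

```python
# Approximate sketch of the pre-processing; parameter choices follow the text.
from collections import Counter
from PIL import Image

RARE_THRESHOLD = 8   # words with fewer than 8 occurrences become <UNK>

def preprocess_image(path):
    """Reduce each print/drawing to a 225 x 225 grayscale image."""
    return Image.open(path).convert("L").resize((225, 225))

def normalise_captions(captions):
    """Replace exceedingly rare words with <UNK> to shrink the vocabulary."""
    counts = Counter(w for c in captions for w in c.lower().split())
    def norm(caption):
        return " ".join(w if counts[w] >= RARE_THRESHOLD else "<UNK>"
                        for w in caption.lower().split())
    return [norm(c) for c in captions]
```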

Although there were occasional images with delightfully detailed captions (left), our models couldn’t quite capture useful information for the vast majority of the images (right). More work is definitely needed in this area!

The British Museum dataset (Prints and Drawings from the 19th Century), however, does contain valuable contextual data, and given our difficulty in using it to caption images directly, we decided to use it in other ways. By parsing the captions and performing Part-Of-Speech (POS) tagging, we were able to extract nouns and proper nouns from each caption. We then compiled common nouns from all the images and filtered for the most common (>=500 images) to use as tags, resulting in over 1,100 different tags. This essentially converts the British Museum dataset into a rich dataset of diverse tags, which we were able to apply to our earlier work on tag classification. We trained a few models with some “fun” tags, such as “Napoleon”, “parrots” and “angels”, and were able to get decent testing accuracies of over 75% on binary labels. We will be uploading a subset of these tags under the “sherlocknet:tags” prefix to the Flickr image set, as well as the previous COCO captions for a small subset of images (~100K).
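
A minimal sketch of that noun-extraction step, using NLTK's POS tagger and the threshold of at least 500 images mentioned above; `captions` stands in for the British Museum caption strings and is a placeholder.

```python
# Illustrative sketch of building a tag vocabulary from caption nouns.
from collections import Counter
import nltk   # requires the 'punkt' and 'averaged_perceptron_tagger' models

def caption_nouns(caption):
    tokens = nltk.word_tokenize(caption)
    return [word.lower() for word, pos in nltk.pos_tag(tokens)
            if pos.startswith("NN")]              # NN, NNS, NNP, NNPS

def build_tag_vocabulary(captions, min_images=500):
    counts = Counter()
    for caption in captions:
        counts.update(set(caption_nouns(caption)))   # count each noun once per image
    return {noun for noun, n in counts.items() if n >= min_images}
```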

You can access our interface here: bit.ly/sherlocknet, or look for 'sherlocknet:tag=' and 'sherlocknet:category=' tags on the British Library Flickr Commons site (here is an example, and see the image below):

Example Tags on a Flickr Image generated by SherlockNet

Please check it out and let us know if you have any feedback!

We are really excited that we will be in London in a few days’ time to present our findings. Why don't you come and join us at the British Library Labs Symposium, between 09:30 and 17:30 on Monday 7th of November, 2016?

02 November 2016

An Overview of 6 Second History Animated British Library Photographs & Illustrations


Posted by Mahendra Mahey on behalf of Nick Cave, entrant in the BL Labs Awards 2016 under the 'Artistic Category'. 

Nick Cave - Animator

Today’s blockbuster films see long-forgotten dinosaurs, futuristic warping spaceships and metallic-looking beings the size of our tallest buildings transforming from a car to a giant robot in the blink of an eye. There are even whole planets of giant alien creatures walking amongst people and trees, and all of these incredible visual showcases are invading our cinema screens week in, week out.

However, back before the advent of sophisticated computer-generated graphics technology, artists were using simpler photographic techniques to create short animations with zany characters, new landscapes and everyday objects to bring laughter to the masses on the small screen. One such artist was Terry Gilliam of Monty Python fame, who used a technique known as stop motion animation to bring a variety of household magazine pictures to life. By cutting out different pictures, then photographing and filming them moving frame by frame over time, he turned them into very funny animated, cartoon-like sketches.


Terry Gilliam - Monty Python animations

I’ve always been fascinated by this amazing technique, even if it is a very time-consuming one, though modern computer animation software makes the process easier. Stop motion animations simply have a quirky charm. More often than not they’re animations which are not polished, or even necessarily slick looking. It’s a style of animation which creates imaginary worlds and characters, often moving in a jagged, staccato fashion, but still somehow one that looks and feels as engaging and interesting as modern visual effects which have cost millions to create. So, with the Terry Gilliam magazine-picture idea in mind, where to start? Social media, of course!

Stop-motion-1
An example of working on stop motion animation

Social media is dominated by celebrity gossip, tittle-tattle and breaking news, but major events also continue to play a key part in posts: celebrating sporting achievements, raving about a new film or TV show, marking an anniversary and, significantly, indulging in nostalgia.

People always like to look back and remember, which is where my 6SecondHistory idea sprang from. I chose Instagram as my social media delivery platform, partly because mini web episodes, such as the crime thriller Shield5, had been very successful on it, and partly because it’s a social media platform created specifically to showcase photographs and short videos.

As copyright can be a contentious legal minefield, where to source the modern equivalent of historical magazine photos from? Well, easy: the British Library has a massive collection of freely available, copyright-free Flickr archive photographs and illustrations to choose from. Animals, places, people, fancy illustrations from old manuscripts, basically a wealth of interesting material. The interesting and sometimes vexing challenges in bringing these pictures to life are many, because they’re often hand drawn with no clear differentiation between foreground and background objects, plus searching for specific pictures can sometimes bring up an interesting results set. Six-second animations seemed a good starting point because of the success of internet Vines, as well as six-second GIFs and videos.

Images taken from the British Library Flickr Commons Site

Left - British Library Flickr Shakespeare Character (https://flic.kr/p/i2ba1M): Image taken from page 252 of 'The Oxford Thackeray. With illustrations. [Edited with introductions by George Saintsbury.]’

Top Right - British Library Flickr Skull (https://flic.kr/p/hVgqkH): Image taken from page 246 of 'Modern Science in Bible Lands ... With maps and illustrations’ - 1888

Bottom Right - British Library Flickr Shakespeare Theatre (https://flic.kr/p/i6ymLe): Image taken from page 76 of 'The Works of Shakspeare; from the text of Johnson, Steevens, and Reed. With a biographical memoir, and a variety of interesting matter, illustrative of his life and writings. By W. Harvey’ - 1825

As an example, 2016 saw the 400th anniversary of Shakespeare’s death. Whenever the focus is on Shakespeare, famous speeches are cited, and one such speech is Hamlet’s Act 5 Scene 1 lament to a disembodied skull. Perfect material for a funny 6SecondHistory animation, and one that could truly show off the merging of a variety of British Library archive pics, repurposed and coloured to create a comical short Hamlet animation with an element of 3D perspective in it. This was a labour of love, but I hope you agree that my short animation has brought the speech to life.

Here is a link to my animation: https://www.instagram.com/p/BDA8BhWju0D/

Enjoy!

See more of my work at: https://www.instagram.com/6secondhistory and http://www.anitatoes.com/

You can meet me, Nick Cave, at the British Library Labs Symposium on Monday 7th of November 2016 at the British Library in London (we still have a few tickets left so book now).

 

20 October 2016

Imaginary Cities - British Library Labs Project


Posted by Mahendra Mahey on behalf of Michael Takeo Magruder, first runner-up in the British Library Labs Competition 2016.

Michael will be working with the BL Labs team between November 2016 and March 2017 on his project 'Imaginary Cities' which is described below: 

Imaginary Cities
by Michael Takeo Magruder, visual artist and researcher

Exploring the British Library’s digital collection of historic urban maps to create provocative fictional cityscapes for the Information Age

About the project:

Imaginary Cities (study i), Michael Takeo Magruder, 2016 – aesthetic rendering that has been procedurally generated from a 19th century map of London

Imaginary Cities is an arts-humanities research project that considers how large digital repositories of historic cultural materials can be used to create new born-digital artworks and real-time experiences which are relevant and exciting to 21st century audiences. The project will take images and associated metadata of pre-20th century urban maps drawn from the British Library’s online “1 Million Images from Scanned Books” digital collection on Flickr Commons and transform this material into provocative fictional cityscapes for the Information Age.

Imaginary Cities (study i), Michael Takeo Magruder, 2016 – source digitised map and associated metadata parsed from British Library Flickr Commons

The project will exemplify collaborative and interdisciplinary research as it will bring together contemporary arts practice, digital humanities scholarship and advanced visualisation technology. The project’s outcomes will encompass both artistic and scholarly outputs, most important of which will be a set of prototype digital artworks that will exist as physical installations constructed with leading-edge processes including generative systems, real-time virtual environments and 3D printing. Blending the historical and the contemporary, the informative and the aesthetic, these artworks will not only draw from and feed into the British Library’s digital scholarship and curatorial programmes, but more significantly, will engender new ways for members of the general public to discover and access the Library’s important digital collections and research initiatives.

Imaginary Cities (study i), Michael Takeo Magruder, 2016 – detail of the aesthetic rendering

If you would like to meet Michael, he will be at the British Library Labs Symposium on Monday 7th of November 2016, at the British Library in London to talk about his work.

About the artist:

Michael Takeo Magruder

Michael Takeo Magruder (b.1974, US/UK, www.takeo.org) is a visual artist and researcher who works with new media including real-time data, digital archives, immersive environments, mobile devices and virtual worlds. His practice explores concepts ranging from media criticism and aesthetic journalism to digital formalism and computational aesthetics, deploying Information Age technologies and systems to examine our networked, media-rich world.

In the last 15 years, Michael’s projects have been showcased in over 250 exhibitions in 34 countries, and his art has been supported by numerous funding bodies and public galleries within the UK, US and EU. In 2010, Michael represented the UK at Manifesta 8: the European Biennial of Contemporary Art and several of his most well-known digital artworks were added to the Rose Goldsen Archive of New Media Art at Cornell University. More recently, he was a Leverhulme Trust artist-in-residence (2013-14) collaborating with Prof Ben Quash (Theology, King’s College London) and Alfredo Cramerotti (Director, Mostyn) to create a new solo exhibition - entitled De/coding the Apocalypse - exploring contemporary creative visions inspired by and based on the Book of Revelation. In 2014, Michael was commissioned by the UK-based theatre company Headlong to create two new artworks - PRISM (a new media installation reflecting on Headlong’s production of George Orwell’s 1984) and The Nether Realm (a living virtual world inspired by Jennifer Haley’s play The Nether). Last year, he was awarded the 2015 Immersive Environments Lumen Prize for his virtual reality installation A New Jerusalem.

22 August 2016

SherlockNet: tagging and captioning the British Library’s Flickr images


Finalists of the BL Labs Competition 2016, Karen Wang, Luda Zhao and Brian Do, update us on the progress of their SherlockNet project:


This is an update on SherlockNet, our project to use machine learning and other computational techniques to dramatically increase the discoverability of the British Library’s Flickr images dataset. Below is some of our progress on tagging, captioning, and the web interface.

Tags

When we started this project, our goal was to classify every single image in the British Library's Flickr collection into one of 12 tags -- animals, people, decorations, miniatures, landscapes, nature, architecture, objects, diagrams, text, seals, and maps (a minimal sketch of this kind of classification appears after the list below). Over the course of our work, we realised the following:

  1. We were achieving incredible accuracy (>80%) in our classification using our computational methods.
  2. If our algorithm assigned two tags to an image with approximately equal probability, there was a high chance the image had elements associated with both tags.
  3. However, these tags were in no way enough to expose all the information in the images.
  4. Luckily, each image is associated with text on the corresponding page.
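
As mentioned before the list, here is a generic sketch of this kind of classification; our actual pipeline may differ. A pretrained network supplies image features and a simple classifier maps them onto the 12 categories, with `labelled` standing in for a hypothetical list of (image_path, category) training pairs.

```python
# Generic sketch of CNN-feature classification; not the team's exact code.
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from torchvision import models, transforms

CATEGORIES = ["animals", "people", "decorations", "miniatures", "landscapes", "nature",
              "architecture", "objects", "diagrams", "text", "seals", "maps"]

cnn = models.resnet18(pretrained=True)   # newer torchvision versions use weights=...
cnn.fc = torch.nn.Identity()             # drop the final layer, keep the feature vector
cnn.eval()

prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def features(path):
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return cnn(prep(img).unsqueeze(0)).squeeze(0).numpy()

def train_classifier(labelled):
    X = [features(p) for p, _ in labelled]
    y = [CATEGORIES.index(c) for _, c in labelled]
    return LogisticRegression(max_iter=1000).fit(X, y)
```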

We thus wondered whether we could use the surrounding text of each image to help expand the “universe” of possible tags. While the text around an image may or may not be directly related to the image, this strategy isn’t without precedent: Google Images uses text as its main method of annotating images! So we decided to dig in and see how this would go.

As a first step, we took all digitised text from the three pages surrounding each image (the page before, the page of, and the page after) and extracted all noun phrases. We figured that although important information may be captured in verbs and adjectives, the main things people will be searching for are nouns. Besides, at this point this is a proof of principle that we can easily extend later to a larger set of words. We then constructed a composite set of all words from all images, and only kept words present in between 5% and 80% of documents. This was to get rid of words that were too rare (often misspellings) or too common (words like ‘the’, ‘a’, ‘me’ -- called “stop words” in the natural language processing field).
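
A small sketch of that vocabulary-building step, keeping only words whose document frequency falls between 5% and 80%; `docs` is a hypothetical list holding, for each image, the noun phrases gathered from its surrounding pages.

```python
# Illustrative sketch of the document-frequency filter described above.
from collections import Counter

def filter_vocabulary(docs, low=0.05, high=0.80):
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs:
        doc_freq.update(set(words))       # count each word once per document
    return {w for w, df in doc_freq.items()
            if low * n_docs <= df <= high * n_docs}
```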

With this data we were able to use a tool called Latent Dirichlet Allocation (LDA) to find “clusters” of images in an automatic way. We chose the original 12 tags after manually going through 1,000 images on our own and deciding which categories made the most sense based on what we saw; but what if there are categories we overlooked or were unable to discern by hand? LDA solves this by trying to find a minimal set of tags where each document is represented by a set of tags, and each tag is represented by a set of words. Obviously the algorithm can’t provide meaning to each tag, so we provide meaning to the tag by looking at the words that are present or absent in each tag. We ran LDA on a sample of 10,000 images and found tag clusters for men, women, nature, and animals. Not coincidentally, these are similar to our original tags and represent a majority of our images.
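
Our exact tooling is not important here, but the LDA step can be sketched with scikit-learn: build word counts per image and let the model group them into a chosen number of topics, each described by its top words. `documents` is a hypothetical list of space-joined noun phrases, one string per image, and the number of topics is a free choice.

```python
# Minimal sketch of topic discovery with LDA; illustrative only.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def lda_topics(documents, n_topics=12, top_words=10):
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(documents)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = vec.get_feature_names_out()
    # Describe each discovered topic by its highest-weighted words
    return [[vocab[i] for i in topic.argsort()[::-1][:top_words]]
            for topic in lda.components_]
```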

This doesn’t solve our quest for expanding our tag universe though. One strategy we thought about was to just use the set of words from each page as the tags for each image. We quickly found, however, that most of the words around each image are irrelevant to the image, and in fact sometimes there was no relation at all. To solve this problem, we used a voting system [1]. From our computational algorithm, we found the 20 images most similar to the image in question. We then looked for the words that were found most often in the pages around these 20 images. We then use these words to describe the image in question. This actually works quite well in practice! We’re now trying to combine this strategy (finding generalised tags for images) with the simpler strategy (unique words that describe images) to come up with tags that describe images at different “levels”.
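
A stripped-down sketch of that voting scheme (see [1]) looks like the following, where `neighbours` would come from the image-similarity search and `page_nouns` maps each image to the nouns from its surrounding pages; both are placeholders.

```python
# Illustrative sketch of nearest-neighbour tag voting.
from collections import Counter

def vote_tags(image_id, neighbours, page_nouns, k=20, n_tags=15):
    votes = Counter()
    for neighbour_id in neighbours[image_id][:k]:
        votes.update(set(page_nouns[neighbour_id]))   # one vote per word per neighbour
    return [word for word, _ in votes.most_common(n_tags)]
```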

Image Captioning

We started with a very ambitious goal: given only the image input, can we give a machine-generated, natural-language description of the image with a reasonably high degree of accuracy and usefulness? Given the difficulty of the task and our timeframe, we didn’t expect to get perfect results, but we hoped to come up with a first prototype to demonstrate some of the recent advances and techniques that we hope will be promising for research and application in the future.

We planned to look at two approaches to this problem:

  • Similarity-based captioning. Images that are very similar to each other under a distance metric often share common objects, traits, and attributes that show up in the distribution of words in their captions. By pooling words together from a bag of captions of similar images, one can come up with a reasonable caption for the target image.
  • Learning-based captioning. By utilising a CNN similar to what we used for tagging, we can capture higher-level features in images. We then attempt to learn the mappings between the higher-level features and their representations in words, using either another neural network or other methods.

We have made some promising forays into the second technique. As a first step, we used a pre-trained CNN-RNN architecture called NeuralTalk to caption our images. As the models are trained on the Microsoft COCO dataset, which consists of pictures and photographs that differ significantly from the British Library's Flickr dataset, we expected the transfer of knowledge to be difficult. Indeed, the resulting captions of some ~1000 test images show that weakness, with the black-and-white exclusivity of the British Library illustrations and the more abstract nature of some illustrations being major roadblocks to the quality of the captioning. Many of the captions would comment on the “black and white” quality of the photo or “hallucinate” objects that did not exist in the images. However, there were some promising results that came back from the model. Below are some hand-picked examples. Note that these were generated with no other metadata; only the raw image was given.

From a rough manual pass, we estimate that around 1 in 4 captions are of usable quality: accurate, containing interesting and useful data that would aid in search, discovery, cataloguing etc., with occasional gems (like the elephant caption!). More work will be directed to help us increase this metric.

Web Interface

We have been working on building the web interface to expose this novel tag data to users around the world.

One thing that’s awesome about making the British Library dataset available via Flickr is that Flickr provides an amazing API for developers. The API exposes, among other functions, the site's search logic via tags as well as free-text search using the image title and description, and the capability to sort by a number of factors including relevance and “interestingness”. We’ve been working on using the Flickr API, along with AngularJS and Node.js, to build a wireframe site. You can check it out here.

If you look at the demo or the British Library's Flickr album, you’ll see that each image has a relatively sparse set of tags to query from. Thus, our next step will be adding our own tags and captions to each image on Flickr. We will prepend these with a custom namespace to distinguish them from existing user-contributed and machine tags, and utilise them in queries to find better results.

Finally, we are interested in what users will use the site for. For example, we could track users’ queries and which images they click on or save; these images are presumably more relevant to those queries, so we can rank them higher in future searches. We also want to be able to track general analytics, like the most popular queries over time. Incorporating user analytics will thus be the final step in building the web interface.

We welcome any feedback and questions you may have! Contact us at teamsherlocknet@gmail.com

References

[1] Johnson J, Ballan L, Fei-Fei L. Love Thy Neighbors: Image Annotation by Exploiting Image Metadata. arXiv (2016)