THE BRITISH LIBRARY

Digital scholarship blog

86 posts categorized "Projects"

27 September 2017

In the Spotlight: Application design

Alex Mendes, Research Software Engineer with the British Library's Digital Scholarship team, provides some insight into our adaptation of an existing crowdsourcing platform to meet our varied needs.

Earlier this month, we announced a preview of a new crowdsourcing project we're working on. In the Spotlight aims to make the library’s collection of historic playbills easier to find. This post will explore some of the factors involved in our initial project design and the technologies used within the core application.

The In the Spotlight homepage

During the early stages of development we talked to people working on various projects that deal with similar material, such as Ensemble @ Yale, which is an experiment into crowdsourcing transcriptions of digitised programs for Yale dramatic productions. While these conversations were incredibly useful, and the projects inspiring, after some deliberation we decided that the overhead of modifying such an application to fit our particular needs was too large.

Such projects have often been built for, and become increasingly tightly coupled with, a particular institutional purpose. By starting with such an application and modifying it heavily with our own institution-specific code we would likely be assuming sole responsibility for future maintenance of that application. Being unable to merge our code back into the original, we would be left managing our own modified version; one with limited usefulness outside an increasingly specific purpose. We wanted to avoid creating a significant maintenance issue, and sought a more generic, yet customisable platform.

Accordingly, we turned to our existing crowdsourcing platform, LibCrowds, which was launched in June 2015 to host the Convert-a-Card projects and help turn printed card catalogues into a searchable online database. The platform is based on PyBossa, a Python library for building crowdsourcing projects that is still very much in active development.

We hoped that it would be relatively quick to generate a new set of projects for collecting the crowdsourced playbills data. In fact, our first prototypes were ready back in April. However, as more detailed requirements were established we soon began to come up against some of the limits of the platform’s existing architecture.

The projects page from the old LibCrowds theme

For instance, we needed to present the appearance of a self-contained website designed around the playbills, with additional pages and features not present in the core PyBossa model. We previously navigated some of these issues by developing custom plugins, but as the need for these grew the approach was becoming unwieldy.

Not long before we encountered these issues, PyBossa had released an update allowing for it to be run as a headless backend server. 'Headless' means that it can be run as a stand-alone piece of software, separate from any graphical user interface, and be interacted with purely via an API. This differs from the ‘traditional’ website, in which the front and backend communicate directly, causing the functionality and architecture of one to be heavily dependent on the other.

We took the plunge and decided to drop some of the work that had gone into the redesign up to that point, opting to run a headless PyBossa instance as our backend and rewriting our frontend as a separate single-page application (SPA), using the Vue.js framework. This approach gives us the freedom to structure the website as required, without having to modify large amounts of backend code. Backend plugins still have a place but the majority of custom functionality can be handled within the browser.
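As a rough illustration of what interacting with a backend 'purely via an API' can look like, here is a minimal Python sketch. The hostname, the `api/project` path, the `limit` parameter and the response fields are assumptions made for the example rather than details of our deployment, and a canned JSON string stands in for a real HTTP response.

```python
import json
from urllib.parse import urlencode, urljoin


def build_api_url(base_url, resource, **params):
    """Return the URL a frontend might request from a headless backend."""
    # The "api/<resource>" layout is an assumption made for this sketch.
    url = urljoin(base_url.rstrip("/") + "/", f"api/{resource}")
    return f"{url}?{urlencode(params)}" if params else url


def project_names(response_body):
    """Extract project names from a JSON list returned by the backend."""
    return [project["name"] for project in json.loads(response_body)]


# Hypothetical response body; no network call is made in this sketch.
sample_body = (
    '[{"id": 1, "name": "Mark the titles"},'
    ' {"id": 2, "name": "Transcribe the titles"}]'
)

print(build_api_url("https://backend.example.org", "project", limit=2))
print(project_names(sample_body))
```

The point of the sketch is that the frontend only needs to know the shape of the requests and responses, so the pages built on top of them can be restructured without touching backend code.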

The new LibCrowds homepage.

This new frontend application comprises a set of core LibCrowds pages, including a homepage and an administration interface where staff can manage the projects. Sitting beneath these, each project has its own set of themed pages, giving the appearance of a bespoke website for each project. Crucially, the new architecture manages this without requiring us to maintain multiple application instances or to handle user authentication between those instances.

In hindsight, we should have spent more time on requirements gathering at the start of the process, as we iterated through a number of possible system designs before settling on our current architecture. However, we seem to be moving towards quite a clean solution and one that will hopefully provide a satisfying user experience.

The application is still in the beta phase and all suggestions are welcome via the GitHub issues page or the project forum.

07 September 2017

Introducing... Playbills In the Spotlight

Mia Ridge, Alex Mendes and Christian Algar from the Library's Digital Scholarship and Printed Heritage teams introduce a new project...

Playbills were sheets of paper handed out or posted up (as in the picture of a Portsmouth theatre, below) to advertise entertainments at theatres, fairs, pleasure gardens and other such venues. The British Library has a fantastic collection of playbills dating back to the 1730s. Looking through them is a lovely way to get a glimpse at how Britons entertained themselves over the past 300 years.

Passers-by read playbills outside a theatre in Portsmouth. From: A collection of portraits of celebrated actors and actresses, views of theatres and playbills ([1750?-1821?]) <http://access.bl.uk/item/viewer/ark:/81055/vdc_100022589190.0x000002#?c=0&m=0&s=0&cv=164&z=-53.6544%2C795.6187%2C2422.3453%2C1335.8411>

 

Why do playbills matter?

The playbills are a great resource for academic and community researchers interested in theatre and cultural history or seeking to understand their local or family history. They're full of personal names, including actors, playwrights, composers, theatre managers and ticket sellers. The playbills list performances of plays we know and love now alongside less well-known, even forgotten plays and songs. But individual playbills are hard to find in the British Library's catalogues, because they are only listed as a group (in the past they were bound into volumes of frequently miscellaneous sheets) with a brief summary of dates and location/theatre names. The rich details captured on each historical page - from personal names to popular songs and plays to lost moments in theatrical history - aren't yet available to search online.

What is In the Spotlight?

We're launching a project called In the Spotlight soon to make these late 18th - late 19th century digitised playbills more findable online, and to give people a chance to see past entertainments as represented in this collection. In this new crowdsourcing project, members of the public can help transcribe titles, names and locations to make the playbills easier to find.

Detail from a playbill


We're starting with a very simple but fun task: mark out the titles of plays by drawing around them. The screenshot shows how varied the text on playbills can be - it's easy enough for people to spot the title of upcoming plays on the page, but it's not the kind of task we can automate (yet). You'll notice the playbills used different typefaces, sizes and weights with apparent abandon, which makes it tricky for a computer to work out what's a title and what's not. That's why we need your help! 

How you can help

We've chosen two volumes from the Theatre-Royal, Plymouth and one from the Theatre Royal, Margate to begin with. You can find out more about the project and the playbills, or you can just dive in and play a role: https://playbills.libcrowds.com

This project is an 'alpha', work-in-progress that we think is almost but not quite ready for its moment in the spotlight. In theatrical terms, we’re still in rehearsal. Behind-the-scenes, we're preparing the transcription tasks for you, but in the meantime we're excited about giving people a chance to explore the playbills while marking up titles.

Your efforts will help uncover the level of detail important to researchers: titles; names of actors, dramatis personae; dates of performance, and the details of songs performed. Who knows what researchers will discover when the collection is more easily searchable? Key information from individual playbills will be added to the Library's main catalogue to permanently enhance the way these playbills can be found and reviewed for the benefit of all. The website also automatically makes the raw data available for re-use as tasks are completed.

What happens next?

We're taking an iterative approach and releasing a few volumes to test the approach and make sure the tasks we're asking for help with are sufficiently entertaining. Once we have sets of marked up titles for each volume of playbills, they're ready for the transcription task. Your comments and feedback now will make a big difference in making sure the version we formally launch is as entertaining as possible.

Please have a go and do let us know what you think: do the instructions make sense? Do the tasks work as you expected? Is there too much to mark and transcribe, or too little? Are you comfortable using the project forum to discuss the playbills? Are there other types of tasks you'd like to suggest for the pages you've seen? You can help by posting feedback on the project forum, emailing us at digitalresearch@bl.uk or tweeting @LibCrowds.

Please consider this your official invitation to our dress rehearsal - we hope you'll find it entertaining! Join us and help us put playbills back in the spotlight at https://playbills.libcrowds.com.


21 July 2017

Through the British Library Looking Glass - A Continuation of Nadya Miryanova's Work Experience

Posted by Nadya Miryanova BL Labs School Work Placement Student, currently studying at Lady Eleanor Holles, working with Mahendra Mahey, Manager of BL Labs.

Day 6

Despite the fact that a week of my work experience here has already elapsed, I still can’t quite believe that I am lucky enough to find myself in this magnificent institution, let alone have access to ‘staff-only’ areas and actually be able to work here. One thing I particularly love is that I can enter the library in the early morning, before official opening hours, and see it evolve from a certain peaceful stillness to its usual excited buzz of activity as the day progresses and watch as the library is brought to life once more by the people that visit it.

A photograph of me by the book tower in the British Library

Previously, in a very serious and sophisticated catch-up session (including, of course, only work-related matters), Mahendra had discovered that I was a huge fan of the Harry Potter series. Although this subject may seem quite unexpected and completely out of context in this blog, it is actually very relevant, since the next day Mahendra informed me that I would be able to meet the Harry Potter curator. This caught me completely by surprise, but it also shamelessly sparked a child-like excitement within me, having loved the franchise ever since I was seven. A meeting was set for Monday morning, and I waited, with some impatience, to meet Julian Harrison, the curator of medieval manuscripts and also the man who was involved in the organisation of the Harry Potter exhibition.

People looking at an exhibition in the British Library

During the meeting, I was able to gain an insight into the working life of a curator. Julian explained the sorts of things involved in this role, and also talked more about the exhibitions themselves, where inspiration comes from, as well as previous exhibitions and their structure. 

In addition to this, I was able to find out lots of details about the Harry Potter exhibition (it’s fascinating and definitely worth a visit, trust me!). Furthermore, we had an in-depth discussion about the Harry Potter series itself, and we talked about some of the key themes as well as key characters in the books. You’ll soon be able to find out more about the exhibition too; be sure to book your tickets early and visit the British Library to be part of what will truly be a magical experience!

A preview of the "Harry Potter: A History of Magic" exhibition, opening on 27th October 2017

In the afternoon, I went to a classical music concert at the British Museum. As I stepped into the light interior of the museum, I felt a hundred memories instantly come to mind, dating back to various visits with my family and numerous school projects over the years. The British Library and British Museum singers presented a concert performance of ‘Trial by Jury’, an opera in one act, with music by Arthur Sullivan and libretto by W. S. Gilbert. ‘Trial by Jury’ is set at a Court of Justice in 1876. The defendant, Edwin, has recently promised to marry a beautiful woman, Angelina, but has since changed his mind, for which reason Angelina is now suing him for Breach of Promise. After a multitude of entertaining events, involving the Jury, the public, the Usher, and many comic disagreements over the issue, a decision is finally reached. The Judge decides the only real logical solution to the problem is to marry Angelina himself, resulting in happiness for all parties. The choir then performed Te Deum, op 103, by Dvorak, a true choral masterpiece, and the performance itself was very moving.

Although the choir was relatively small in number, their bright and beautiful voices resonated across the room, creating a light-hearted and friendly atmosphere, upheld by the choir’s energy and enthusiasm. I always love seeing how music can unite people to interpret a piece together, and each member was fully involved in this collaborative effort to create stunning music, making the performance an unquestionable success.  

The British Museum and British Library Singers

When I returned to the office, I checked my e-mails and saw that Laurence Roger, Project Support Officer in the Collections Division, had very kindly offered to help me examine a book about Catullus’ poetry. The book that I eventually saw dated back to the 18th century, and I spent the last part of my day looking at it with Laurence, who is very nice. I felt extremely lucky to have access to it.

One of the books that Laurence herself had lent me to look at.

Day 7

My seventh day of work experience arrived, and almost as soon as I got into the office, I set up my desk and eagerly launched straight into my working day. My morning consisted of independent work, where I further developed my research project and carried on with the interview storyboard for Hannah-Rose Murray, a finalist of the BL Labs competition in 2016. Her project was centred on black American activists in the 19th century, particularly their speeches and lectures from the 1830s to the 1890s. This was a period of history that I previously knew little about, and so I enjoyed learning about the influence that black Americans had on British society and seeing the way Hannah went about creating her project, bringing history to life. Read more about her project here.

Map displaying the locations of Frederick Douglass’ lectures in the United Kingdom and Ireland, a small section of Hannah-Rose Murray's project

At 12:30, I attended a Welcome Day at the British Library, and this presented me with an excellent opportunity to not only find out more about the different departments of the library, but also to tell some new members of staff about some of the work the Digital Scholarship Department does (I was also provided with a free lunch, always a bonus!). I talked to a variety of departments, ranging from Human Resources to Publishing and Retail, and everyone was extremely friendly, helpful and accommodating.

In the afternoon, I worked independently once again, more specifically on a YouTube transcription of an interview with Melodee Beals, a 2016 research award winner, who created an amazing project entitled ‘Scissors and Paste’. This project utilises the 1800-1900 British Library Newspapers collection to explore the possibilities of mining large-scale newspaper databases for reprinted and repurposed news content.

Melodee Beals presenting her project, 'Scissors and Paste'

After finishing my working day, I decided to wander around and explore the British Library. The amazing thing about this place is that it really does resemble a maze: I constantly find myself discovering new places and rooms, with each day presenting something new and different to the previous one.

Day 8

As I entered the lift, I looked at the hard copy of my schedule, and I noticed that a meeting with a fashion company and members of the British Fashion Council was fixed for that very morning. Feeling suddenly a little more self-conscious than usual about my appearance, I glanced cautiously in the mirror that was in the lift and my reflection stared back, wondering if anything could be done to cover the consequences that a malfunctioning alarm clock and getting ready in five minutes that morning could bring. After a few fruitless attempts to tame my hair, I finally accepted defeat and entered the meeting room.

The meeting at 9 o’clock was with a luxury womenswear brand. During the meeting, Mahendra introduced BL Labs, showing a presentation that informed the company about Digital Scholarship and detailed previous projects that the department had worked on, including ‘Burning Man’. A project with the fashion company was then initiated, which would involve the Library's collections, and some possible ideas for the project were also brainstormed. The fashion company talked more about their collections and how ideas for projects generally come about. It is inspiring to think how each individual collection, whether an assortment of garments or a literary exhibition of novels, tells its own unique story, and I found out that in many ways the research for the project is itself a sensational journey.

After this meeting, I returned to my desk and had a quick catch-up with Mahendra, where we evaluated the YouTube transcription work and the general progress made over the first half of this week. To finish off, I was whisked off to another meeting, this time with Wayne Boucher, a photographer who has a very big interest in beautiful stained-glass windows, and will be keeping in contact with the British Library to promote this stunning artwork.

A Tiffany stained-glass window

Day 9

In the morning, I hurriedly entered the British Library through the staff entrance, as usual, but instead of walking over to the doors of the lift, I took a sharp right turn, and walked over to the Post Room. Mahendra had previously organised for me to visit the Post Room with Peter Clarke, Service Delivery Manager, Messenger/Post Service, and today I would be having a tour of certain sections of the building that are off limits not only to the general public, but also to many members of staff. I was able to see the process of delivery take place, and even help with this crucial procedure, without which many of the library books that researchers and readers need would not be available. I was shown the delivery room by Keiran Duncan-Johnson, Late Team Leader LMS, Messenger/Post Service, Finance Division, and this was a huge, open space, which once more reminded me of the sheer scale of the place.

Keiran also kindly showed me round other areas of the library I was previously unfamiliar with, such as the modern languages sector and the Alan Turing Institute, both of which are incredible departments that work tirelessly to make great leaps in their corresponding fields of study to change the world for the better.

The Alan Turing Institute

The afternoon commenced with a meeting with the music curator, Chris Scobie. For the second time that day, I was lucky enough to visit a new area of the library that is of limited access, and Chris showed me the music reading room, and most notably, the basement. The basement is where all the music scores and manuscripts lie, and needless to say, I was incredibly excited. As we browsed through the shelves of the collections, I saw multiple familiar names of composers, such as Bach, Beethoven and Brahms, and I even got to read and touch some of Elgar’s letters to Vaughan Williams and look at his original manuscript for his Enigma Variations!  

A digitised version of the original Elgar manuscript for the theme of the Enigma Variations

Day 10

As I walked down the second floor corridor, I soon came to face the wooden door of the office for what it seemed was the last time. I sighed and a miserable thought came into my head, as I began to contemplate what on earth I was going to do with myself on Monday, when I was no longer going to work here. However, I soon brushed it off, and decided to make the most of my final day at the British Library.

The door to the office of the Digital Scholarship Department

My final day consisted of making concluding touches to my numerous projects, including refining and making last minute edits to some of the transcriptions I had done. I then met Christin Hoene from the University of Kent, who was working on a project that was based on the concept of sound within novels. I was able to show her some of the work that I did on Excel with my independent research project, which can be accessed here.

At lunchtime, rather than eating in the staff canteen as usual, I decided to eat my lunch in a free reading space in the centre of the library, whilst reading my book, ‘Mother Tongue’ by Bill Bryson. What I love most about libraries is that there are so many untold stories hiding in the shelves, and I feel like I could sit comfortably in here for hours. In fact, in the space of an hour, you could travel to as many as 10 countries, should you only have the will to open a few different books and immerse yourself in their stories. As Lloyd Alexander once said “Books can truly change our lives: the lives of those who read them, the lives of those who write them. Readers and writers alike discover things they never knew about the world and about themselves”.

Another great Lloyd Alexander quotation

Lastly, and most importantly, I would like to say a huge thank you to everyone who has made this experience a possibility for me, especially Mahendra, who has not only been very kind and patient, but has also provided me with so many wonderful opportunities and has helped me hugely with a multitude of different things. I have always loved books since a young age, and to be surrounded by so many was in itself very special, but to be able to work in the library and help the Digital Scholarship Department was just incredible. My experience here has taught me multiple valuable things, which is something I am eternally grateful for.

The same way I would never judge a book by its front cover, I will not judge a building by its name, for the British Library is infinitely more than just a residence for books. It is a museum in which there are many exhibitions, it is a research centre, and most importantly, it is an institution that stores the world’s knowledge behind its brick walls.

The British Library

Inspiration can really come from absolutely anywhere, and from something small you can make something incredibly vast. It makes you think what you could do and what a difference it could make, if only you just choose to try. Inevitably, in life, you have to take risks, but more often than not, lots of these are worth taking in an attempt to brighten and bring artistic colour as well as creativity to the world. In the words of Stephen King, “books are a uniquely portable magic”, something which certainly rings true within the walls of this institution, where so many items are kept and so many new ones are constantly being acquired and discovered.

So, I send a big thank you to the British Library and all who work here, for making what was essentially a childhood dream into a reality and this will truly be a chapter of my life that I will always remember.

Nadya Miryanova

21 December 2016

Mobius programme – on the beach of learning

This guest post is by Virve Miettinen, who spent four months with various teams at the British Library.

Every morning there’s a 100-metre queue in front of the British Library. It seems to say a lot about an unashamed nerdiness and love for learning in this city. Usually all the queuers have already put the things they might need in the Reading Room in a clear plastic bag, so they can head straight down to the lockers, stow away their coats, handbags and laptop cases and secure a place on the beach of learning.

Virve Miettinen

The Mobius fellowship programme, organised by the Finnish Institute in London, enables mobility for visual arts, museum, library and archives professionals, and customised working periods as part of the host organisation’s staff, in my case the British Library. The programme is a great opportunity to break away from daily routines, to think about one’s professional identity, find fresh ideas, compare the practices and methods between two countries, share knowledge and build meaningful networks.

Learn, relearn and unlearn from each other

Learning isn’t a destination, it’s a never-ending road of discovery, challenge, inspiration and wonder. Each learning moment builds character, shapes thoughts, guides futures. But what makes us learn? For me the answer is other people, and during the Mobius Fellowship I’ve been blessed with the chance to work with talented people willing to share their knowledge at the British Library.

I’ve familiarised myself with the British Library Learning Team, which is responsible for the library’s engagement with all kinds of learners. The Learning Team offers workshops, activities and resources for schools, teachers and learners of all ages.

I’ve been following the work of the Digital Scholarship team and BL Labs project to learn more about the incredible digital collections the library has to offer, and how to open them up for the public through various activities such as competitions, events and projects.

I’ve worked with the Knowledge Quarter, which is a network of now 76 partners within a one mile radius of Kings Cross and who actively create and disseminate knowledge. Partners include over 49 academic, cultural, research, scientific and media organisations large and small: from the British Library and University of the Arts London to the School of Life, Connected Digital Economy Catapult, Francis Crick Institute and Google.

I’ve assisted the Library’s Community Engagement Manager Emma Morgan. She has been working as a community engagement manager for six months now and the aim of her work is to create meaningful, long-lasting, mutually beneficial relationships with the surrounding community, i.e. residents, networks and organisations.

Inside the British Library

I’ve observed the library’s marketing and communications unit in action, and learned for example how they measure and research the customer experience, i.e. who visits and uses the BL, what they think of their experience and how the BL might improve it.

 

I’ve got many 'mental souvenirs' to take back home with me - if they interest you, read more from my Mobius blog: http://itssupercalafragilistic.tumblr.com/. 

100 digital stories about Finnish-British relations

As part of the Mobius programme I’ve been working on a co-operative project between the British Library, the National Library in Finland, the Finnish National Archives, The Finnish Institute in London and the Finnish Embassy. In the last three decades, contacts between Finland and the UK, two relatively distant nations, have multiplied. At the same time, the network of cultural relationships has tightened into a seamless 'love-story', something that would not have been easy to predict just 50 years ago. In the coming year of 2017 the Finnish Institute celebrates the centennial anniversary of Finland’s independence by telling the story of two nations. The aim is to make the history, the interaction and the links between these two countries tangible and visible.

We are collaborating to create a digital gallery open to all, which offers its visitors carefully curated pieces of the shared history of the two countries and their political, cultural and economic relations. It will offer new information on the relations and influences between the two countries. It consists of digitised historical materials, like letters, news, cards, photographs, tickets and maps. The British Library and other partners will select 100 digitised items to create the basis of the gallery.

The gallery will be expanded further through co-creation. In the spirit of the theme of Finland’s centenary 'together', the gallery is open to all and easily accessible. With the call 'Wanted – make your own heritage' we invite people to share their own stories and interpretations, and record history through them. The gallery feeds curiosity, creates interaction and engages users to share their own memories relating to Finnish-British experiences. The users are invited to interpret recent history from a personal point of view.

The work continues after my Mobius-period and the gallery will open in September 2017. Join us and share your memories. Be frank, withdrawn, furious, imaginative, witty or sad. Through your story you create history.

P.S. The British Library Reading Room is actually far from The Beach of Learning; it’s more like The Coolest Place To Be. I found myself freezing in the air-conditioned Rare Books Reading Room despite wearing my leather jacket and an extra pair of leggings.

Virve Miettinen works at Helsinki City Library/Central Library as a participation planner. Her job is to engage citizens and partners to design the library of the future. For Helsinki City Library, co-operative planning and service design means designing the premises and services together with the library users while taking advantage of user-centric methods. Her interests involve co-design, service design, community engagement and community-led city development. At the moment she is also working on her PhD under the title 'Co-creative practices in library services'.

16 December 2016

Re-imagining a catalogue of illuminated manuscripts - from search to browse

In this guest post, Thomas Evans discusses his work with Digital Curator Dr Mia Ridge to re-imagine the interface to the British Library's popular online Catalogue of Illuminated Manuscripts.

The original Catalogue was built using an Access 2003 database, and allows users to create detailed searches from amongst 20 fields (such as date, title, origin, and decoration) or follow 'virtual exhibitions' to view manuscripts. Search-based interfaces can be ideal for specialists who already know what they're looking for, but the need to think of a search term likely to yield interesting results can be an issue for people unfamiliar with a catalogue. 'Generous interfaces' are designed as rich, browsable experiences that highlight the scope and composition of a particular collection by loading the page with images linked to specific items or further categories. Mia asked Thomas to apply faceted browsing and 'generous' styles to help first-time visitors discover digitised illuminated manuscripts. In this post Thomas explains the steps he took to turn the catalogue data supplied into a more 'generous' browsing interface. An archived version of his interface is available on the Internet Archive.

With over 4,300 manuscripts, written in a variety of languages and created in countries across Europe over a period of about a thousand years, the British Library's collection of illuminated manuscripts contains a diverse treasure trove of information and imagery for both the keen enthusiast and the total novice.

As the final project for my Masters in Computer Science at UCL, I worked with the British Library to design and start to implement alternative ways of exploring the collection. This project had some constraints in time, knowledge and resources. The final deadline for submission was only four months after receiving the project outline and the success of the project rested on the knowledge, experience and research of a fresh-faced rookie (me) using whatever tools I had the wherewithal to cobble together (open source software running on a virtual machine server hosted by UCL).

Rather than showing visitors an empty search box when they first arrive, a generous interface will show them everything available. However, taken literally, displaying 'everything' means details for over 4,300 manuscripts and around 40,000 images would have to be displayed on one page. While this approach would offer visitors a way to explore the entire catalogue, it could be quite unwieldy.

One way to reduce the number of manuscripts loaded onto the screen is to allow visitors to filter out some items, for example limiting the 'date' field to between 519 and 927 or the 'region' field to England. This is 'faceted' browsing, and it makes exploration more manageable. Presenting the list of available values for region or language, etc., also gives you a sense of the collection's diversity. It also means that 'quirky' members of the collection are less likely to be overlooked.

Screenshot of filters in Thomas CIM interface II
An example of 'date' facets providing an instant overview of the temporal range of the Catalogue

For example, if you were to examine 30 random manuscripts from the British Library's collection, you might find 20 written in Latin, three each in French and English, and perhaps one each in Greek, Hebrew or Italian. You would almost certainly miss that the Catalogue contains a manuscript written in Cornish, another in Portuguese and another in Icelandic. These languages might be of interest precisely because they are hard to come by in the British Library's catalogue. Listing all the available languages (as well as their frequencies) exposes the exceptional parts of the collection where an unfaceted generous interface would hide them in plain sight.
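As a rough illustration of the two ideas above (filtering by facet and listing every value with its frequency), here is a minimal Python sketch. The records, field names and function names are invented for the example and are not taken from the Catalogue or from the interface built for this project.

```python
from collections import Counter

# Hypothetical records: shelfmarks, fields and values are illustrative only.
MANUSCRIPTS = [
    {"shelfmark": "MS A", "language": "Latin", "region": "England", "year": 850},
    {"shelfmark": "MS B", "language": "Latin", "region": "France", "year": 1250},
    {"shelfmark": "MS C", "language": "Cornish", "region": "England", "year": 1400},
    {"shelfmark": "MS D", "language": "French", "region": "France", "year": 1320},
]


def apply_facets(records, region=None, year_from=None, year_to=None):
    """Keep only the records that match every facet that has been set."""
    selected = []
    for record in records:
        if region is not None and record["region"] != region:
            continue
        if year_from is not None and record["year"] < year_from:
            continue
        if year_to is not None and record["year"] > year_to:
            continue
        selected.append(record)
    return selected


def facet_counts(records, field):
    """List each value of a field with its frequency, most common first."""
    return Counter(record[field] for record in records).most_common()


english = apply_facets(MANUSCRIPTS, region="England")
languages = facet_counts(MANUSCRIPTS, "language")
```

Because `facet_counts` lists every value, a language that occurs only once (Cornish, in this toy data) still appears in the list of facets rather than being lost among the more common ones.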

Once I understood the project's goals and completed some high-level planning and design sketches, it was time to get to grips with implementation. Being fairly inexperienced, I found some tasks took much longer than expected. A few examples which stick in the mind are properly configuring the web server, debugging errant server-side scripts (which have a habit of failing either silently or with an unhelpful error message) and transforming the Library's database into a form which I could use.

Being the work of many hands over the years, the database inevitably contained some tiny differences in the way entries were recorded, which Mia informs me is not uncommon for a long-standing database in a collecting institution. These small inconsistencies - for example, the use of an en-dash in some cases and a hyphen in others - look fine to us, but confuse a computer. I worked around these where I could, 'cleaning' the records only when I was certain of my correction.
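To give a flavour of this kind of workaround, the sketch below maps the various Unicode dashes onto a plain hyphen so that two visually identical date ranges compare as equal. It is an illustration only: the function name and the sample values are assumptions, not the cleaning code used in the project.

```python
import re

# Dash-like characters that often stand in for a plain hyphen:
# hyphen (U+2010), non-breaking hyphen (U+2011), figure dash (U+2012),
# en dash (U+2013), em dash (U+2014) and minus sign (U+2212).
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"
DASH_TABLE = str.maketrans({ch: "-" for ch in DASHES})


def normalise_range(value):
    """Return a range such as '1400-1425' in one consistent form."""
    cleaned = value.translate(DASH_TABLE)
    # Drop any spaces around the hyphen, then collapse other whitespace.
    cleaned = re.sub(r"\s*-\s*", "-", cleaned)
    return " ".join(cleaned.split())


same = normalise_range("1400\u20131425") == normalise_range("1400 - 1425")
```

A rule like this is only safe for fields where a dash can mean nothing but a range; free-text descriptions would need more care, which is one reason to clean only where the correction is certain.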

Being new to web design, I built the interface iteratively, component by component, consulting periodically with Mia for feedback. Thankfully, frameworks exist for responsive web design and page templating. Nevertheless, there was a small learning curve and some thought was required to properly separate application logic from presentation logic.

There were some ambitions for the project which were ultimately not pursued due to time (and knowledge!) constraints, but this iterative process made other improvements possible over the course of my project. To make exploration of the catalogue easier, the page listing a manuscript's details also contained links to related manuscripts. For instance, Ioannes Rhosos is attributed as the scribe of Harley 5699, so, on that manuscript's page, users could click on his name to see a list of all manuscripts by him. They could then apply further filters if desired. This made links between manuscripts much more clear than the old interface, but it is limited to direct links which were explicitly recorded in the database.

An example of a relevant feature not explicitly recorded in the database is genre - only by reading manuscript descriptions can you determine whether it is religious, historical, medical etc. in its subject matter. Two possible techniques for revealing such features were considered: applying natural language processing to manuscript descriptions in order to classify them, or analysing data about which manuscripts were viewed by which users to build a recommendation system. Both of these turned out to require more in-depth knowledge than I was able to acquire within the time limit of the project.

I enjoyed working out how to transform all the possible inputs to the webpage into queries which could be run against the database, dealing with missing/invalid inputs by providing appropriate defaults etc. There was a quiet satisfaction to be had when tests of the interface went well - seeing something work and thinking 'I made that!'. It was also a pleasure to work with data about such an engaging topic.
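The general pattern can be sketched as follows. This is a minimal example under stated assumptions: the table and column names, the parameter names and the page-size limits are all hypothetical, and SQLite stands in for whatever database the interface actually used.

```python
import sqlite3

DEFAULT_LIMIT = 20   # assumed page size
MAX_LIMIT = 100      # assumed upper bound on results per page


def parse_int(value, default=None):
    """Return value as an int, or the default if it is missing or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def build_query(params):
    """Turn raw request parameters into SQL text plus bound values."""
    clauses, values = [], []
    region = (params.get("region") or "").strip()
    if region:
        clauses.append("region = ?")
        values.append(region)
    year_from = parse_int(params.get("year_from"))
    if year_from is not None:
        clauses.append("year >= ?")
        values.append(year_from)
    year_to = parse_int(params.get("year_to"))
    if year_to is not None:
        clauses.append("year <= ?")
        values.append(year_to)
    # Fall back to the default page size, then clamp to a sensible range.
    limit = parse_int(params.get("limit"), DEFAULT_LIMIT)
    limit = max(1, min(limit, MAX_LIMIT))
    sql = "SELECT shelfmark FROM manuscripts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY shelfmark LIMIT ?"
    values.append(limit)
    return sql, values


# Small in-memory database with made-up rows, to show the query running.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manuscripts (shelfmark TEXT, region TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO manuscripts VALUES (?, ?, ?)",
    [("MS A", "England", 850), ("MS B", "France", 1250), ("MS C", "England", 1400)],
)
sql, values = build_query({"region": "England", "year_from": "not a number"})
rows = conn.execute(sql, values).fetchall()
```

User input only ever reaches the database as a bound value (the `?` placeholders), never as part of the SQL text, and an invalid filter is simply ignored rather than causing an error.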

Hopefully, this project will have proved that exploration of the British Library's Catalogue of Illuminated Manuscripts has the potential to become a richer experience. Relationships between manuscripts which are currently not widely known could be revealed to more visitors and, if the machine learning techniques were to be implemented, perhaps new relationships would be revealed and related manuscripts could be recommended. My project showed the potential for applying new computational methods to better reveal the character of collections and connections between their elements. Although the interface I delivered has some way to go before it can achieve this goal, I earnestly hope that it is a first step in that direction.

Thomas' Catalogue interface

14 November 2016

British Library Labs Symposium 2016 - Competition and Award runners up

Add comment

The 4th annual British Library Labs Symposium was held on 7th November 2016, and the event was a great success in celebrating and showcasing Digital Scholarship and highlighting the work of BL Labs and their collaborators. The exciting day included the announcement of the winners of the BL Labs Competition and BL Labs Awards, as well as of the runners up who are presented in this blog post. Posts written by all of the winners and runners up about their work are also scheduled for the next few weeks - watch this space!

BL Labs Competition finalist for 2016
Roly Keating, Chief Executive of the British Library announced that the runner up of the two finalists of the BL Labs Competition for 2016 was...

Black Abolitionist Performances and their Presence in Britain
By Hannah-Rose Murray (PhD student at the University of Nottingham)

Bl_labs_symposium_2016_027
Roly Keating, Chief Executive of the British Library, welcoming Hannah-Rose Murray on to the stage.

The project focuses on African American lives, experiences and lectures in Britain between 1830 and 1895. By assessing black abolitionist speeches in the British Library’s nineteenth-century newspaper collection and using the British Library’s Flickr Commons 1 million collection to illustrate, the project has illuminated their performances and how their lectures reached nearly every corner of Britain. For the first time, the location of these meetings has been mapped and the number and scale of the lectures given by black abolitionists in Britain have been evaluated, allowing their hidden voices to be heard and building a more complete picture of Victorian London for us. Hannah-Rose has recently posted an update about her work and the project findings can also be found on her website: www.frederickdouglassinbritain.com.

Hannah-Rose Murray is a second year PhD student with the Department of American and Canadian Studies, University of Nottingham. Her AHRC/M3C-funded PhD focuses on the legacy of formerly enslaved African Americans on British society and the different ways they fought British racism. Hannah-Rose received a first class Masters degree in Public History from Royal Holloway University and has a BA History degree from University College London (UCL). In Nottingham, Hannah-Rose works closely with the Centre for Research in Race and Rights and is one of the postgraduate directors of the Rights and Justice Research Priority Area, which includes the largest number of scholars (700) in the world working on rights and justice.

BL Labs Awards runners up for 2016

Research Award runner up
Allan Sudlow, Head of Research Development at the British Library announced that the runner up of the Research Award was...

Nineteenth-century Newspaper Analytics
By Paul Fyfe (Associate Professor of English, North Carolina State University) and Qian Ge (PhD Candidate in Electrical and Computer Engineering, North Carolina State University)

News
Nineteenth-Century Newspaper Analytics

The project represents an innovative partnership between researchers in English literature, Electrical & Computer Engineering, and data analytics in pursuit of a seemingly simple research question: How can computer vision and image processing techniques be adapted for large-scale interpretation of historical illustrations? The project is developing methods in image analytics to study a corpus of illustrated nineteenth-century British newspapers from the British Library’s collection, including The Graphic, The Illustrated Police News, and the Penny Illustrated Paper. 

Paul_fyfe_qian_ge
Paul Fyfe and Qian Ge gave a recorded acceptance speech at the Symposium as they were unable to attend in person.

It aims to suggest ways of adapting image processing techniques to other historical media while also pursuing scholarship on nineteenth-century visual culture and the illustrated press. The project also exposes the formidable technical challenges presented by historical illustrations and suggests ways to refine computer vision algorithms and analytics workflows for such difficult data. The website includes sample workflows as well as speculations about how large-scale image analytics might yield insights into the cultural past, plus much more: http://ncna.dh.chass.ncsu.edu/imageanalytics 

Commercial Award runner up
Isabel Oswell, Head of Business Audiences at the British Library announced that the runner up of the Commercial Award was...

Poetic Places
By Sarah Cole (TIME/IMAGE organisation and Creative Entrepreneur-in-Residence at the British Library)

Bl_labs_symposium_2016_172
Sarah Cole presenting Poetic Places

Poetic Places is a free app for iOS and Android devices which was launched in March 2016. It brings poetic depictions of places into the everyday world, helping users to encounter poems in the locations described by the literature, accompanied by contextualising historical narratives and relevant audiovisual materials. These materials are primarily drawn from open archive collections, including the British Library Flickr collection. Utilising geolocation services and push notifications, Poetic Places can (whilst running in the background on the device) let users know when they stumble across a place depicted in verse and art, encouraging serendipitous discovery. Alternatively, they can browse the poems and places via map and list interfaces as a source of inspiration without travelling. Poetic Places aspires to give a renewed sense of place, to bring together writings and paintings and sounds to mean more than they do alone, and to bring literature into people’s everyday life in unexpected moments.

Artistic Award runner up
Jamie Andrews, Head of Culture and Learning at the British Library announced that the runner up of the Artistic Award was... 

Bl_labs_symposium_2016_190
Kristina Hofmann and Claudia Rosa Lukas

Fashion Utopia
By Kris Hofmann (Animation Director) and Claudia Rosa Lukas (Curator)

 
Fashion Utopia

The project involved the creation of an 80 second animation and five vines which accompanied the Austrian contribution to the International Fashion Showcase London, organised annually by the British Council and the British Fashion Council. Fashion Utopia garnered creative inspiration from the treasure trove of images from the British Library Flickr Commons collection and more than 500 images were used to create a moving collage that was, in a second step, juxtaposed with stop-frame animated items of fashion and accessories.

Teaching / Learning Award runner up
Ria Bartlett, Lead Producer: Onsite Learning at the British Library announced that the runner up of the Teaching / Learning Award was...

The PhD Abstracts Collections in FLAX: Academic English with the Open Access Electronic Theses Online Service (EThOS) at the British Library

By Shaoqun Wu (FLAX Research & Development and Lecturer in Computer Science), Alannah Fitzgerald (FLAX Open Education Research and PhD Candidate), Ian H. Witten (FLAX Project Lead and Professor of Computer Science) and Chris Mansfield (English Language and Academic Writing Tutor)

Flax
The PhD Abstracts Collections in FLAX

The project presents an educational research study into the development and evaluation of domain-specific language corpora derived from PhD abstracts with the Electronic Theses Online Service (EThOS) at the British Library. The collections, which are openly available from this study, were built using the interactive FLAX (Flexible Language Acquisition flax.nzdl.org) open-source software for uptake in English for Specific Academic Purposes programmes (ESAP) at Queen Mary University of London. The project involved the harvesting of metadata, including the abstracts of 400,000 doctoral theses from UK universities, from the EThOS Toolkit at the British Library. These digital PhD abstract text collections were then automatically analysed, enriched, and transformed into a resource that second-language and novice research writers can browse and query in order to extend their ability to understand the language used in specific domains, and to help them develop their abstract writing. It is anticipated that the practical contribution of the FLAX tools and the EThOS PhD Abstract collections will benefit second-language and novice research writers in understanding the language used to achieve the persuasive and promotional aspects of the written research abstract genre. It is also anticipated that users of the collections will be able to develop their arguments more fluently and precisely through the practice of research abstract writing to project a persuasive voice as is used in specific research disciplines.

Bl_labs_symposium_2016_209
Alannah Fitzgerald and Chris Mansfield receiving the Runner Up Teaching and Learning Award on behalf of the FLAX team.

British Library Labs Staff Award runner up
Phil Spence, Chief Operating Officer at the British Library announced that the runner up of the British Library Labs Staff Award was...

SHINE 2.0 - A Historical Search Engine

Led by Andy Jackson (Web Archiving Technical Lead at the British Library) and Gil Hoggarth (Senior Web Archiving Engineer at the British Library)

Shine
SHINE

SHINE is a state-of-the-art demonstrator for the potential of Web Archives to transform research. The current implementation of SHINE exposes metadata from the Internet Archive's UK domain web archives for the years 1996-2013. This data was licensed for use by the British Library by agreement with JISC. SHINE represents a high level of innovation in access and analysis of web archives, allowing sophisticated searching of a very large and loosely-structured dataset and showing many of the characteristics of "Big Social Data". Users can fine-tune results to look for file-types, results from specific domains, languages used and geo-location data (post-code look-up). The interface was developed by Web Archive technical development alongside the AHRC-funded Big UK Domain Data for the Arts and Humanities project. An important concept in its design and development was that it would be researcher-led, and SHINE was developed iteratively with research case studies relating to use of UK web archives.

Bl_labs_symposium_2016_298
Andy Jackson receiving the runner up Staff Award on behalf of the SHINE team

The lead institution for SHINE was the University of London, with Professor Jane Winters as principal investigator. Former British Library staff members Peter Webster and Helen Hockx were also instrumental in developing the project and maintaining researcher engagement throughout.

10 November 2016

British Library Labs Symposium 2016 - Competition and Award Winners

Add comment

The 4th annual British Library Labs Symposium took place on 7th November 2016 and was a resounding success! 

More than 220 people attended and the event was a fantastic experience, showcasing and celebrating the Digital Scholarship field and highlighting the work of BL Labs and their collaborators. The Symposium included a number of exciting announcements about the winners of the BL Labs Competition and BL Labs Awards, who are presented in this blog post. Separate posts will be published about the runners up of the Competition and Awards and posts written by all of the winners and runners up about their work are also scheduled for the next few weeks - watch this space!

BL Labs Competition winner for 2016

Roly Keating, Chief Executive of the British Library announced that the overall winner of the BL Labs Competition for 2016 was...

SherlockNet: Using Convolutional Neural Networks to automatically tag and caption the British Library Flickr collection
By Karen Wang and Luda Zhao, Masters students at Stanford University, and Brian Do, Harvard Medicine MD student

Machine learning can extract information and insights from data on a massive scale. The project developed and optimised Convolutional Neural Networks (CNN), inspired by biological neural networks in the brain, in order to tag and caption the British Library’s Flickr Commons 1 million collection. In the first step of the project, images were classified with general categorical tags (e.g. “people”, “maps”). This served as the basis for the development of new ways to facilitate rapid online tagging with user-defined sets of tags. In the second stage, automatically generated descriptive natural-language captions were provided for images (e.g. “A man in a meadow on a horse”). This computationally guided approach has produced automatic pattern recognition which provides a more intuitive way for researchers to discover and use images. The tags and captions will be made accessible and searchable by the public through the web-based interface and text annotations will be used to globally analyse trends in the Flickr collection over time.

Bl_labs_symposium_2016_131
SherlockNet team presenting at the Symposium

Karen Wang is currently a senior studying Computer Science at Stanford University, California. She also has an Art Practice minor. Karen is very interested in the intersection of computer science and humanities research, so this project is near and dear to her heart! She will be continuing her studies next year at Stanford in CS, Artificial Intelligence track.

Luda Zhao is currently a Masters student studying Computer Science at Stanford University, living in Palo Alto, California. He is interested in using machine learning and data mining to tackle tough problems in a variety of real-life contexts, and he's excited to work with the British Library to make art more discoverable for people everywhere.

Brian Do grew up in sunny California and is a first-year MD/PhD student at Harvard Medical School. Previously he studied Computer Science and biology at Stanford. Brian loves using data visualisation and cutting edge tools to reveal unexpected things about sports, finance and even his own text message history.

SherlockNet recently posted an update of their work and you can try out their SherlockNet interface and tell us what you think.

BL Labs Awards winners for 2016

Research Award winner

Allan Sudlow, Head of Research Development at the British Library announced that the winner of the Research Award was...

Scissors and Paste

By Melodee Beals, Lecturer in Digital History at Loughborough University and historian of migration and media

Bl_labs_symposium_2016_162
Melodee Beals presenting Scissors & Paste

Scissors and Paste utilises the 1800-1900 digitised British Library Newspapers collection to explore the possibilities of mining large-scale newspaper databases for reprinted and repurposed news content. The project has involved the development of a suite of tools and methodologies, created using both out-of-the-box and custom-made project-specific software, to efficiently identify reprint families of journalistic texts and then suggest both directionality and branching within these subsets. From these case-studies, detailed analyses of additions, omissions and wholesale changes offer insights into the mechanics of reprinting that left behind few if any other traces in the historical record.

Melodee Beals joined the Department of Politics, History and International Relations at Loughborough University in September 2015. Previously, Melodee has worked as a pedagogical researcher for the History Subject Centre, a teaching fellow for the School of Comparative American Studies at the University of Warwick and a Principal Lecturer for Sheffield Hallam University, where she acted as Subject Group Leader for History. Melodee completed her PhD at the University of Glasgow.

Commercial Award winner

Isabel Oswell, Head of Business Audiences at the British Library announced that the winner of the Commercial Award was...

Curating Digital Collections to Go Mobile

By Mitchell Davis, publishing and media entrepreneur

Bl_labs_symposium_2016_178
Mitchell Davis presenting Curating Digital Collections to Go Mobile

As a direct result of its collaborative work with the British Library, BiblioLabs has developed BiblioBoard, an award-winning e-Content delivery platform, and online curatorial and multimedia publishing tools to support it. These tools make it simple for subject area experts to create visually stunning multi-media exhibits for the web and mobile devices without any technical expertise. The curatorial output is almost instantly available via a fully responsive web site as well as through native apps for mobile devices. This unified digital library interface incorporates viewers for PDF, ePub, images, documents, video and audio files allowing users to immerse themselves in the content without having to link out to other sites to view disparate media formats.

Mitchell Davis founded BookSurge in 2000, the world’s first integrated global print-on-demand and publishing services company (sold to Amazon.com in 2005 and re-branded as CreateSpace). Since 2008, he has been founder and chief business officer of BiblioLabs- the creators of BiblioBoard. Mitchell is also an indie producer and publisher who has created several award winning indie books and documentary films over the past decade through Organic Process Productions, a small philanthropic media company he founded with his wife Farrah Hoffmire in 2005.

Artistic Award winner

Jamie Andrews, Head of Culture and Learning at the British Library announced that the winner of the Artistic Award was... 

Hey There, Young Sailor

Written and directed by writer and filmmaker Ling Low and visual art by Lyn Ong

Hey There, Young Sailor combines live action with animation, hand-drawn artwork and found archive images to tell a love story set at sea. Inspired by the works of early cinema pioneer Georges Méliès, the video draws on late 19th century and early 20th century images from the British Library's Flickr collection for its collages and tableaux. The video was commissioned by Malaysian indie folk band The Impatient Sisters and independently produced by a Malaysian and Indonesian team.

Bl_labs_symposium_2016_192
Ling Low receives her Award from Jamie Andrews

Ling Low is based between Malaysia and the UK and she has written and directed various short films and music videos. In her fiction and films, Ling is drawn to the complexities of human relationships and missed connections. By day, she works as a journalist and media consultant. Ling has edited a non-fiction anthology of human interest journalism, entitled Stories From The City: Rediscovering Kuala Lumpur, published in 2016. Her journalism has also been published widely, including in the Guardian, the Telegraph and Esquire Malaysia.

Teaching / Learning Award winner

Ria Bartlett, Lead Producer: Onsite Learning at the British Library announced that the winner of the Teaching / Learning Award was...

Library Carpentry

Founded by James Baker, Lecturer at the Sussex Humanities Lab, who represented the global Library Carpentry Team (see below) at the Symposium

Bl_labs_symposium_2016_212
James Baker presenting Library Carpentry

Library Carpentry is software skills training aimed at the needs and requirements of library professionals. It takes the form of a series of modules that are available online for self-directed study or for adaption and reuse by library professionals in face-to-face workshops. Library Carpentry is in the commons and for the commons: it is not tied to any institution or person. For more information on Library Carpentry see http://librarycarpentry.github.io/

James Baker is a Lecturer in Digital History and Archives at the School of History, Art History and Philosophy and at the Sussex Humanities Lab. He is a historian of the long eighteenth century and contemporary Britain. James is a Software Sustainability Institute Fellow and holds degrees from the University of Southampton and latterly the University of Kent. Prior to joining Sussex, James has held positions of Digital Curator at the British Library and Postdoctoral Fellow with the Paul Mellon Centre for Studies of British Art. James is a convenor of the Institute of Historical Research Digital History seminar and a member of the History Lab Plus Advisory Board.

 The Library Carpentry Team is regularly accepting new members and currently also includes: 

Carpentry
The Library Carpentry Team

British Library Labs Staff Award winner

Phil Spence, Chief Operating Officer at the British Library announced that the winner of the British Library Labs Staff Award was...

LibCrowds

Led by Alex Mendes, Software Developer at the British Library

LibCrowds is a crowdsourcing platform built by Alexander Mendes. It aims to create searchable catalogue records for some of the hundreds of thousands of items that can currently only be found in printed and card catalogues. By participating in the crowdsourcing projects, users will help researchers everywhere to access the British Library’s collections more easily in the future.

Bl_labs_symposium_2016_247
Nora McGregor presenting LibCrowds on behalf of Alex Mendes

The first project series, Convert-a-Card, experimented with a new method for transforming printed card catalogues into electronic records for inclusion in our online catalogue Explore, by asking volunteers to link scanned images of the cards with records retrieved from the WorldCat database. Additional projects have recently been launched that invite volunteers to transcribe cards that may require more specific language skills, such as the South Asian minor languages. Records matched, located, transcribed or translated as part of the crowdsourcing projects were uploaded to the British Library's Explore catalogue for anyone to search online. By participating users can have a direct impact on the availability of research material to anyone interested in the diverse collections available at the British Library.

Alex Mendes has worked at the British Library for several years and recently completed a Bachelor’s degree in Computer Science with the Open University. Alex enjoys the consistent challenges encountered when attempting to find innovative new solutions to unusual problems in software development.

AlexMendes
Alex Mendes

If you would like to find out more about BL Labs, our Competition or Awards please contact us at labs@bl.uk   

03 November 2016

SherlockNet update - 10s of millions more tags and thousands of captions added to the BL Flickr Images!

Add comment

SherlockNet are Brian Do, Karen Wang and Luda Zhao, finalists for the Labs Competition 2016.

We have some exciting updates regarding SherlockNet, our ongoing effort to use machine learning techniques to radically improve the discoverability of the British Library Flickr Commons image dataset.

Tagging

Over the past two months we’ve been working on expanding and refining the set of tags assigned to each image. Initially, we set out simply to assign the images to one of 11 categories, which worked surprisingly well with less than a 20% error rate. But we realised that people usually search from a much larger set of words, and we spent a lot of time thinking about how we would assign more descriptive tags to each image.

Eventually, we settled on a Google Images style approach, where we parse the text surrounding each image and use it to get a relevant set of tags. Luckily, the British Library digitised the text around all 1 million images back in 2007-8 using Optical Character Recognition (OCR), so we were able to grab this data. We explored computational tools such as Term Frequency – Inverse Document Frequency (Tf-idf) and Latent Dirichlet allocation (LDA), which try to assign the most “informative” words to each image, but found that images aren’t always associated with the words on the page.
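For readers unfamiliar with Tf-idf, here is a small self-contained sketch of the idea: a word scores highly for a page if it is frequent on that page but rare across the other pages. This is a textbook formulation written for illustration, not the SherlockNet code, and the token lists are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document (each a list of tokens)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical OCR text from the pages around two images.
pages = [["ship", "sea", "the"], ["horse", "the"]]
page_scores = tf_idf(pages)
```

In this toy example "the" appears on every page, so it scores zero everywhere, while "ship" and "horse" are singled out as informative for their own pages.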

To solve this problem, we decided to use a 'voting' system where we find the 20 images most similar to our image of interest, and have all images vote on the nouns that appear most commonly in their surrounding text. The most commonly appearing words will be the tags we assign to the image. Despite some computational hurdles selecting the 20 most similar images from a set of 1 million, we were able to achieve this goal. Along the way, we encountered several interesting problems.

Similar images
For all images, similar images are displayed
  1. Spelling was a particularly difficult issue. The OCR algorithms that were state of the art back in 2007-2008 are now obsolete, so a sizable portion of our digitised text was misspelled / transcribed incorrectly. We used a pretty complicated decision tree to fix misspelled words. In a nutshell, it amounted to finding the word that a) is most common across British English literature and b) has the smallest edit distance relative to our misspelled word. Edit distance is the smallest number of edits (additions, deletions, substitutions) needed to transform one word into another.
  2. Words come in various forms (e.g. ‘interest’, ‘interested’, ‘interestingly’) and these forms have to be resolved into one “stem” (in this case, ‘interest’). Luckily, natural language toolkits have stemmers that do this for us. It doesn’t work all the time (e.g. ‘United States’ becomes ‘United St’ because ‘ates’ is a common suffix) but we can use various modes of spell-check trickery to fix these induced misspellings.
  3. About 5% of our books are in French, German, or Spanish. In this first iteration of the project we wanted to stick to English tags, so how do we detect if a word is English or not? We found that checking each misspelled (in English) word against all 3 foreign dictionaries would be extremely computationally intensive, so we decided to throw out all misspelled words for which the edit distance to the closest English word was greater than three. In other words, foreign words are very different from real English words, unlike misspelled words which are much closer.
  4. Several words appear very frequently in all 11 categories of images. These words were ‘great’, ‘time’, ‘large’, ‘part’, ‘good’, ‘small’, ‘long’, and ‘present’. We removed these words as they would be uninformative tags.
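
The spelling and language steps above (items 1 and 3) can be sketched roughly as follows. This is a simplified illustration, not our actual decision tree: the word counts in `WORD_FREQ` are made up, the function names are hypothetical, and the choice to rank by edit distance first and frequency second is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Hypothetical counts standing in for a British English frequency list.
WORD_FREQ = {"interest": 900, "internet": 50, "mountain": 400, "fountain": 120}

MAX_DISTANCE = 3  # anything further from every English word is discarded


def correct(word, word_freq=WORD_FREQ, max_distance=MAX_DISTANCE):
    """Return the closest known word, or None if the word is probably foreign."""
    if word in word_freq:
        return word
    # Assumed ranking: smallest edit distance first, ties go to the commoner word.
    best = min(word_freq, key=lambda w: (edit_distance(word, w), -word_freq[w]))
    if edit_distance(word, best) > max_distance:
        return None
    return best
```

For instance, `correct("intercst")` resolves to "interest", while a long German word such as "geschwindigkeit" is more than three edits from every entry and is thrown out.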

In the end, we had between 10 and 20 tags for each image. We estimate that between 30% and 50% of the tags convey some information about the image, and the rest are circumstantial. Even at this stage, the tags have been immensely helpful in some of the searches we’ve done already (check out “bird”, “dog”, “mine”, “circle”, and “arch” as examples). We are actively looking for suggestions to improve our tagging accuracy. Nevertheless, we’re extremely excited that images now have useful annotations attached to them!
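
The voting scheme described at the start of this section can be illustrated with a short sketch. It assumes the nouns around each similar image have already been extracted and cleaned, and that each image gets one vote per distinct noun; the function name and the `n_tags` default are hypothetical. The set of uninformative words is the one listed in item 4 above.

```python
from collections import Counter

# Words found to be frequent in every category, so removed as tags.
UNINFORMATIVE = {"great", "time", "large", "part",
                 "good", "small", "long", "present"}


def vote_on_tags(neighbour_nouns, n_tags=20, stop_words=UNINFORMATIVE):
    """Pick the nouns backed by the most similar images.

    neighbour_nouns: one list of nouns per similar image (e.g. 20 lists).
    """
    votes = Counter()
    for nouns in neighbour_nouns:
        # Assumption: one vote per image per distinct noun.
        votes.update(set(nouns) - stop_words)
    return [noun for noun, _ in votes.most_common(n_tags)]
```

With three neighbours whose nouns are `["bird", "tree", "bird"]`, `["bird", "nest", "time"]` and `["tree", "bird"]`, asking for two tags gives "bird" (three votes) followed by "tree" (two votes).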

SherlockNet Interface

Sherlocknet-interface
SherlockNet Interface

For the past few weeks we’ve been working on incorporating ~20 million tags and related images and uploading them to our website. Luckily, Amazon Web Services provides comprehensive computing resources to take care of storing and transferring our data into databases to be queried by the front-end.

To make searching easier we’ve also added functionality to automatically include synonyms in your search. For example, you can type in “lady”, click on Synonym Search, and it adds “gentlewoman”, “ma'am”, “madam”, “noblewoman”, and “peeress” to your search as well. This is particularly useful in a tag-based indexing approach such as the one we are using.
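
As a rough idea of what synonym expansion involves, here is a minimal sketch. The `SYNONYMS` table is a hypothetical stand-in holding only the example above (a real system would draw on a full thesaurus), and `expand_query` is an illustrative name, not the function behind the interface.

```python
# Hypothetical synonym table containing only the example from the text.
SYNONYMS = {
    "lady": ["gentlewoman", "ma'am", "madam", "noblewoman", "peeress"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the search terms plus any known synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *synonyms.get(term.lower(), [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

An image then matches the expanded search if it carries any of the returned tags, so a search for "lady" also finds images tagged only with "madam".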

As our data gets uploaded over the coming days, you should begin to see our generated tags and related images show up on the Flickr website. You can click on each image to view it in more detail, or on each tag to re-query the website for that particular tag. This way users can easily browse relevant images or tags to find what they are interested in.

Each image is currently captioned with a default description containing information on which source the image came from. As Luda finishes up his captioning, we will begin uploading his captions as well.

We will also be working on adding more advanced search capabilities via wrapper calls to the Flickr API. Proposed functionality will include logical AND and NOT operators, as well as better filtering by machine tags.

Captioning

As mentioned in our previous post, we have been experimenting with techniques to automatically caption images with relevant natural language captions. Since an Artificial Intelligence (AI) is responsible for recognising, understanding, and learning proper language models for captions, we expected the task to be far harder than that of tagging. Although the final results we obtained may not be ready for production-level archival purposes, we hope our work can help spark further research in this field.

Our last post left off with our usage of a pre-trained Convolutional Neural Network - Recurrent Neural Network (CNN-RNN) architecture to caption images. We showed that we were able to produce some interesting captions, albeit at low accuracy. The problem we pinpointed was in the training set of the model, which was derived from the Microsoft COCO dataset. That dataset consists of photographs of modern day scenes, which differ significantly from the images in the BL Flickr dataset.

Through collaboration with BL Labs, we were able to locate a dataset that was potentially better for our purposes: the British Museum prints and drawings online collection, consisting of over 200,000 prints, drawings, and illustrations, along with handwritten captions describing each image, which the British Museum has generously given us permission to use in this context. However, since the dataset was obtained directly from the public SPARQL endpoints, we needed to run some pre-processing to make it usable. For the images, we cropped them to a standard 225 x 225 size and converted them to grayscale. For the captions, pre-processing ranged from simple exclusion of dates and author information to more sophisticated “normalization” procedures aimed at reducing the size of the total vocabulary of the captions. Words that are exceedingly rare (<8 occurrences) were replaced with <UNK> (unknown) symbols denoting their rarity.

We used the same neuraltalk architecture, using the features from a VGGNet (Very Deep Convolutional Networks for Large-Scale Visual Recognition) as intermediate input into the language model. As it turns out, even with aggressive filtering of words, the distribution of vocabulary in this dataset was still too diverse for the model. Despite our best efforts to tune hyperparameters, the model we trained was consistently over-sensitive to key phrases in the dataset, which resulted in the model converging on local minima where the captions would stay the same and not show any variation. This seems to be a hard barrier to learning from this dataset. We will be publishing our code in the future, and we welcome anyone with any insight to continue this research.
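
The rare-word replacement mentioned above is simple to illustrate. The sketch below is not our pre-processing code: it assumes captions are plain strings split on whitespace, and the function name is hypothetical. Only the threshold of 8 occurrences and the <UNK> symbol come from the description above.

```python
from collections import Counter

MIN_COUNT = 8   # words seen fewer than 8 times count as rare
UNK = "<UNK>"   # placeholder symbol for rare words


def replace_rare_words(captions, min_count=MIN_COUNT):
    """Replace every word occurring fewer than min_count times with <UNK>."""
    # Count occurrences over the whole caption set (whitespace tokens assumed).
    counts = Counter(word for caption in captions for word in caption.split())
    return [
        " ".join(w if counts[w] >= min_count else UNK for w in caption.split())
        for caption in captions
    ]
```

This shrinks the vocabulary the language model has to learn, at the cost of losing exactly the unusual words that often make a caption specific.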

Captions
Although there were occasional images with delightfully detailed captions (left), our models couldn’t quite capture useful information for the vast majority of the images (right). More work is definitely needed in this area!

However, the British Museum dataset (Prints and Drawings from the 19th Century) does contain valuable contextual data. Because of our difficulty in using it to caption the images directly, we decided to use it in other ways. By parsing each caption and performing Part-Of-Speech (POS) tagging, we were able to extract nouns and proper nouns from each caption. We then compiled common nouns from all the images and kept the most common (appearing in >=500 images) as tags, resulting in over 1100 different tags. This essentially converts the British Museum dataset into a rich dataset of diverse tags, which we would be able to apply to our earlier work with tag classification. We trained a few models with some “fun” tags, such as “Napoleon”, “parrots” and “angels”, and we were able to get decent testing accuracies of over 75% on binary labels. We will be uploading a subset of these tags under the “sherlocknet:tags” prefix to the Flickr image set, as well as the previous COCO captions for a small subset of images (~100K).
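
The tag selection step can be sketched as below. This is an illustration under assumptions: the captions are taken to be already POS-tagged as (word, tag) pairs using Penn Treebank noun tags, which may not match the tagger we used, and the function names are hypothetical. Only the 500-image threshold comes from the description above.

```python
from collections import Counter

# Penn Treebank tags for common and proper nouns (assumed tagset).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nouns_in(tagged_caption):
    """Distinct nouns in one POS-tagged caption, given as (word, tag) pairs."""
    return {word for word, tag in tagged_caption if tag in NOUN_TAGS}


def select_tags(tagged_captions, min_images=500):
    """Keep the nouns that appear in the captions of at least min_images images."""
    image_counts = Counter()
    for tagged in tagged_captions:
        # A noun counts once per image, however often the caption repeats it.
        image_counts.update(nouns_in(tagged))
    return {noun for noun, n in image_counts.items() if n >= min_images}
```

Each selected noun then becomes a binary label (present or absent for an image), which is the form needed to train per-tag classifiers.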

You can access our interface here: bit.ly/sherlocknet, or look for 'sherlocknet:tag=' and 'sherlocknet:category=' tags on the British Library Flickr Commons site. Here is an example; see the image below:

Sherlocknet tags
Example Tags on a Flickr Image generated by SherlockNet

Please check it out and let us know if you have any feedback!

We are really excited that we will be in London in a few days' time to present our findings. Why don't you come and join us at the British Library Labs Symposium, between 0930 and 1730 on Monday 7th of November, 2016?