THE BRITISH LIBRARY

Digital scholarship blog

15 posts categorized "Modern history"

29 November 2017

Crowdsourcing using IIIF and Web Annotations

Alex Mendes from the Digital Scholarship team explains how the LibCrowds platform uses emerging standards for digitised images and annotations.

Our new crowdsourcing project, In the Spotlight, was officially launched at the start of November 2017. The project asks volunteers to identify and transcribe key data held in digitised playbills. Here we explore two of the key technologies we adopted to enable this: IIIF and Web Annotations.

Configuring a selection task using JSON

Commonly, when an institution begins digitising a new type of content, or a particular project finds that the current infrastructure doesn’t fit its needs, it builds or commissions a new image viewer, one that is probably tightly coupled with its custom metadata structures. This leads to an ever-growing collection of isolated data silos that, among other issues, do not allow the information they contain to be easily reused.

The International Image Interoperability Framework (IIIF) is a set of APIs (protocols for requests between computers) that aims to tackle this issue by allowing images and metadata to be requested in a standardised way. Via these APIs, particular regions of images can be requested in a specified quality, size and format. The associated metadata includes information about how the images should be displayed and in what order. As this metadata is standardised, different image viewers can be built that are all able to understand and display the same sets of images. The one increasingly used by the library for catalogue items is called the 'Universal Viewer'.
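To make the idea of requesting a region in a given size, quality and format concrete, here is a minimal sketch of how an IIIF Image API request URL is assembled from its path segments (identifier, region, size, rotation, quality and format). The server address and image identifier are hypothetical and chosen only for illustration; they are not the Library's real endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path segments."""
    # Percent-encode the identifier so characters such as "/" stay in one segment.
    safe_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{safe_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, used only for illustration.
url = iiif_image_url(
    "https://iiif.example.org/images",
    "playbill-001",
    region="100,200,800,150",  # x,y,width,height in pixels
    size="400,",               # scale to 400 pixels wide, keep the aspect ratio
)
print(url)
```

Because every compliant server understands the same URL pattern, a viewer only needs the base address and identifier to ask for exactly the part of the image it wants to show.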

Another IIIF-compliant viewer, called LibCrowds Viewer, has been developed for In the Spotlight. The viewer takes advantage of the flexibility enabled by the APIs described above. Images and metadata already held by the British Library can be requested, combined with some additional configuration details, and used to generate sets of crowdsourcing tasks. This means that we don’t need to host any additional image data, nor are we tied to any institution-specific metadata structures. In fact, the system could be used to generate crowdsourced annotations for any IIIF-compliant content.

Transcriptions are collected in the form of Web Annotations, a W3C standard that was published at the start of 2017. This is another step towards future interoperability and reuse. By adopting this standard we can share our transcriptions more easily across the Web and incorporate them back into our core discovery systems.
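As a rough illustration of what such a record might look like, the sketch below builds a Web Annotation that ties a transcribed title to a rectangular region of an image. The property names follow the W3C Web Annotation Data Model, but the identifiers, coordinates, motivation and text are invented for the example and are not taken from the LibCrowds data.

```python
import json


def transcription_annotation(anno_id, canvas_uri, xywh, text):
    """Return a Web Annotation (as a dict) linking transcribed text to an image region."""
    x, y, w, h = xywh
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "describing",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
        },
        "target": {
            "source": canvas_uri,
            # A media fragment selects the rectangle that was marked up.
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Hypothetical identifiers and text, for illustration only.
annotation = transcription_annotation(
    "https://annotations.example.org/anno/1",
    "https://iiif.example.org/manifest/canvas/1",
    (100, 200, 800, 150),
    "The School for Scandal",
)
print(json.dumps(annotation, indent=2))
```

Keeping the text in the body and the image region in the target is what lets other systems reuse the transcription without knowing anything about the platform that collected it.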

As well as being searchable via the Library’s catalogue viewer, the crowdsourced transcriptions will be made available via the IIIF Content Search API, further increasing the ways in which the data could be reused. For example, we could develop programmatic ways to search the collection for a particular person who performed in a certain play in a given location.

To enable such exciting functionality we first need to collect the data. Since we launched, volunteers have completed over 14,000 tasks, which is a fantastic start. Visit In the Spotlight to get involved.

09 November 2017

You're invited to come and play - In the Spotlight

Mia Ridge, Alex Mendes and Christian Algar from the Library's Digital Scholarship and Printed Heritage teams invite you to take part in a new crowdsourcing project...

It’s hard for most of us to remember life before entertainment on demand through our personal devices, but a new project at the British Library provides a glimpse into life before electronic entertainment. We're excited to launch In the Spotlight, a crowdsourcing site where the public can help transcribe information about performance from the last 300 years. We're inviting online volunteers to help make the British Library's historic playbills easier to find while uncovering curiosities about past entertainments. You can step Into the Spotlight at http://playbills.libcrowds.com

The original playbills were handed out or posted outside theatres, and like modern nightclub flyers, they weren't designed to last. They're so delicate they can't be handled, so providing better access to digitised versions will help academic, local and family history researchers.

Playbills compiled into a volume
The Library’s collection has over a thousand volumes holding thousands of fragile playbills

 

What is In the Spotlight?

Individual playbills in the historical collection are currently hard to find, as the Library's catalogue contains only brief information about the place and dates for each volume of playbills. By marking up and transcribing titles, dates and genres, volunteers will make each playbill - and individual performances - findable online.

We’ve started with playbills from theatres in Margate, Plymouth, Bristol, Hull, Perth and Edinburgh. We think this provides wider opportunities for people across the country to connect with nationally held collections.

Crowdsourcing interface screenshot
Take a close look at the playbills whilst marking up or transcribing the titles of plays

 

But it's not all work - it's important to us that volunteers on In the Spotlight can indulge their curiosity. The playbills provide fascinating glimpses into past entertainments, and we're excited to see what people discover.

The playbills people can see on In the Spotlight provide a fabulous source for looking at British and Irish social history from the late 18th century through to the Victorian period. More than this, their visual richness is an experience in itself, and should stimulate interest in historical printing’s use of typography and illustrations. Over time, playbills included more detailed information, and these song titles, plot synopses, descriptions of stage sets and choreographed action from the plays help bring these past performances to life.

Creating an open stage 

You can download individual playbills, share them on social media or follow a link back to the Library's main catalogue. You can also download the transcribed data to explore or visualise as a dataset.

We also hope that people will share their discoveries with us and with other participants, either on our discussion forum, or social media. Jumping In the Spotlight is a chance for anyone anywhere to engage with the historical printed collections held at the British Library. We’ve created our very own stage for dialogue where people can share and discuss interesting or curious finds - the forum is a great place to post about a particular typeface that takes your fancy, an impressive or clever use of illustration, or an obscure unheard-of or little known play. It's also a great place to ask questions, like 'why do so many playbills announce an evening’s entertainment, ‘For the Benefit’ of someone or other?'. In the Spotlight’s open stage means anyone can add details or links to further good reads: share your growing knowledge with others!

We're also keen to promote the discoveries of project volunteers, and encourage you to get in touch if you'd like to write a short post for the Library’s Untold Lives blog, the English & Drama blog or here on our Digital Scholarship blog. If forums and Twitter aren't your thing, you can email us at digitalresearch@bl.uk.

Playbill from Devonport, 1836
In the Spotlight is an ‘Open House’ – share your findings with others on the Forum, contribute articles to British Library blogs!

 

What's been discovered so far?

We quietly launched an alpha version of the interface back in September to test the waters and invite comments from the public. We’ve received some incredibly helpful feedback (thank you to all!) that has helped us fine-tune the interface design. We also received some encouraging comments from colleagues at other libraries who work with similar collections. We’ll take someone saying they are 'insanely jealous' of the crowdsourcing work we are doing with our historical printed collections as a good sign!

We've been contacted about some very touching human-interest stories too - follow @LibCrowds or sign up to our crowdsourcing newsletter to be notified when blog posts about discoveries go live. We're looking forward to the first post written by the In the Spotlight participant who uncovered a sad tale behind a Benefit performance for several actors in Plymouth in 1827.

What can you do?

Take on a part! Take a step Into the Spotlight at http://playbills.libcrowds.com and help record titles, dates and genres.

If you are interested in theatre and drama, in musical performance, in the way people were entertained, come and explore this collection and help researchers while you’re doing it. All you need is a little free time and it’s LOTS OF FUN! Help us make In the Spotlight the best show in town.

Join in, it'll be lots of fun!

07 September 2017

Introducing... Playbills In the Spotlight

Mia Ridge, Alex Mendes and Christian Algar from the Library's Digital Scholarship and Printed Heritage teams introduce a new project...

Playbills were sheets of paper handed out or posted up (as in the picture of a Portsmouth theatre, below) to advertise entertainments at theatres, fairs, pleasure gardens and other such venues. The British Library has a fantastic collection of playbills dating back to the 1730s. Looking through them is a lovely way to get a glimpse at how Britons entertained themselves over the past 300 years.

Passers-by read playbills outside a theatre in Portsmouth. From: A collection of portraits of celebrated actors and actresses, views of theatres and playbills ([1750?-1821?]) <http://access.bl.uk/item/viewer/ark:/81055/vdc_100022589190.0x000002#?c=0&m=0&s=0&cv=164&z=-53.6544%2C795.6187%2C2422.3453%2C1335.8411>

Why do playbills matter?

The playbills are a great resource for academic and community researchers interested in theatre and cultural history or seeking to understand their local or family history. They're full of personal names, including actors, playwrights, composers, theatre managers and ticket sellers. The playbills list performances of plays we know and love now alongside less well-known, even forgotten plays and songs. But individual playbills are hard to find in the British Library's catalogues, because they are only listed as a group (in the past they were bound into volumes of frequently miscellaneous sheets) with a brief summary of dates and location/theatre names. The rich details captured on each historical page - from personal names to popular songs and plays to lost moments in theatrical history - aren't yet available to search online.

What is In the Spotlight?

We're launching a project called In the Spotlight soon to make these late 18th - late 19th century digitised playbills more findable online, and to give people a chance to see past entertainments as represented in this collection. In this new crowdsourcing project, members of the public can help transcribe titles, names and locations to make the playbills easier to find.

Detail from a playbill


We're starting with a very simple but fun task: mark out the titles of plays by drawing around them. The screenshot shows how varied the text on playbills can be - it's easy enough for people to spot the title of upcoming plays on the page, but it's not the kind of task we can automate (yet). You'll notice the playbills used different typefaces, sizes and weights with apparent abandon, which makes it tricky for a computer to work out what's a title and what's not. That's why we need your help! 

How you can help

We've chosen two volumes from the Theatre-Royal, Plymouth and one from the Theatre Royal, Margate to begin with. You can find out more about the project and the playbills, or you can just dive in and play a role: https://playbills.libcrowds.com

This project is an 'alpha', a work in progress that we think is almost but not quite ready for its moment in the spotlight. In theatrical terms, we’re still in rehearsal. Behind the scenes, we're preparing the transcription tasks for you, but in the meantime we're excited about giving people a chance to explore the playbills while marking up titles.

Your efforts will help uncover the level of detail important to researchers: titles; names of actors, dramatis personae; dates of performance, and the details of songs performed. Who knows what researchers will discover when the collection is more easily searchable? Key information from individual playbills will be added to the Library's main catalogue to permanently enhance the way these playbills can be found and reviewed for the benefit of all. The website also automatically makes the raw data available for re-use as tasks are completed.

What happens next?

We're taking an iterative approach and releasing a few volumes to test the approach and make sure the tasks we're asking for help with are sufficiently entertaining. Once we have sets of marked up titles for each volume of playbills, they're ready for the transcription task. Your comments and feedback now will make a big difference in making sure the version we formally launch is as entertaining as possible.

Please have a go and do let us know what you think: do the instructions make sense? Do the tasks work as you expected? Is there too much to mark and transcribe, or too little? Are you comfortable using the project forum to discuss the playbills? Are there other types of tasks you'd like to suggest for the pages you've seen? You can help by posting feedback on the project forum, emailing us at digitalresearch@bl.uk or tweeting @LibCrowds.

Please consider this your official invitation to our dress rehearsal - we hope you'll find it entertaining! Join us and help us put playbills back in the spotlight at https://playbills.libcrowds.com.


21 July 2017

Russian Language Books Research Project by Nadya Miryanova

Finding digitised books in the Russian language in a collection of 65,000 books

Posted by Nadya Miryanova, BL Labs School Work Placement Student, currently studying at Lady Eleanor Holles, working with Mahendra Mahey, Manager of BL Labs.

Background

Although there are 200 million items in the British Library, contrary to popular belief, only 1-2% of these items are digitised. The ‘Microsoft’ books are 65,000 digitised volumes (about 22.5 million pages), published between 1789 and 1914 and digitised in partnership with Microsoft. They cover a wide range of subject areas, including philosophy, poetry and history, and they include Optical Character Recognition (OCR) text from the millions of pages.

In discussion with Mahendra Mahey, Project Manager of BL Labs, we explored making a ‘sub collection’ from this larger set which will hopefully be of use to the library in the future. At first, I simply brainstormed possible ideas and looked at different possibilities for this project, and I thought that since 2017 celebrates a century since the Russian Revolution, I would do some research into the concept of ‘revolution’.

Revolution

Definition - A forcible overthrow of a government or social order, in favour of a new system.

Etymology - Late Latin ‘revolvere’, meaning to roll back, which turned into the Old French or Late Latin ‘revolutio’, from which came about our contemporary English word ‘revolution’.

Revolutions date back to as early as 2730 BC, when there was a Set rebellion against the reign of the pharaoh Seth-Peribsen of the Second Dynasty of Egypt. The most recent revolution actually happened only last year, in 2016, when there was a Turkish coup d'état attempt.

About the Russian Revolution

The British Library has recently opened an exhibition that perfectly captures not only the events that took place in this particularly intense period in history, but also the atmosphere that was omnipresent at the time. On my very first day here at the British Library, I got the chance to explore and study this fascinating exhibition in great depth.

The Russian Revolution was initiated by Lenin and the Bolsheviks, who hoped to create a socialist government, and in 1917, they successfully dismantled Tsarist autocracy in the hope of making society less stratified. The revolution resulted in the rise of the USSR and in the words of Karl Liebknecht, “The Russian revolution was to an unprecedented degree the cause of the proletariat of the whole world becoming more revolutionary”. However, this revolution also led to months of social and political turmoil and provoked the tragedy of the Russian Civil War on an unforeseeable scale, in which 10 million lives were lost. The revolution also produced myths that entered the artistic and intellectual fabric of the modern world, which the exhibition uncovers and investigates. Learn more about the Russian Revolution by booking your tickets for the Russian Revolution Exhibition at the British Library on the website http://goo.gl/FL9FFt.

Russian Revolution Poster
Russian Revolution Exhibition Poster at the British Library

As part of my research project, I also wanted to incorporate some of the other subjects that I had studied at GCSE, and so I thought this would be a brilliant opportunity to compare the Russian Revolution to the French Revolution, both French and Russian being subjects that I wish to study at A-level. The French Revolution was a period of far-reaching social and political upheaval in France that lasted from 1789 until 1799, and was partially carried forward by Napoleon during the later expansion of the French Empire.

Below is a mind-map I made detailing the differences and similarities between the French and the Russian Revolution.

Russian and French Revolution Research
French and Russian Revolution Comparison

Although my initial focus for the project was revolution, we soon established that it was too specific a topic and it would be more beneficial to focus on something broader, that would be useful to a larger group of researchers.

I soon discovered that the Russian titles within the digitised collection had never previously been separated and categorised, and being a native Russian speaker, I thought that this would be a better avenue to go down and explore. This would be a project in commemoration of the 100th anniversary of the Russian Revolution, which would hopefully help researchers looking at books in the Russian language in the future.

Facts about the Russian Language

  • Largest European native language.
    • 7th most spoken language in the world.
  • There are only 200,000 words in the Russian language in comparison to 1,000,000 in English.
  • The stress pattern in a word can drastically change its meaning, e.g.:
    • я плачу (emphasis on second syllable) - I pay.
    • я пла́чу (emphasis on first syllable) - I cry.

Approach

My first task was to examine a huge spreadsheet containing information about the 65,000 books in the collection.

  • In order to make this task a little less daunting, I first used the ‘Filter’ function in the language column of my Excel spreadsheet, and selected the Russian language. As a result, I found 583 books in total that were written in the Russian Language.
  • I now had to think of a way to organise these books. The possibilities seemed endless: should I sort them into history books? Science books? Books about Russia?
  • In the end, I decided to establish two broad categories as a starting point, fiction vs non-fiction, as this seemed like a logical place to start.
  • In order to access the Russian keyboard, I went onto the site translit.net, which turns normal Latin letters into Cyrillic.
  • I typed in a Russian word, using the English keyboard, that related to one of my two categories, e.g. for non-fiction, I wanted to find history related books, so used the simple word ‘history’, which translates as история.
  • I then copied this word, and pasted it into my spreadsheet.
  • I used the filter function on the 'Titles' section, and this would hopefully produce a number of books that included the word history in their title.
Spread Sheet Screenshot
Screenshot of my spreadsheet.
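The same two filtering steps could also be scripted. The sketch below is only an illustration of the idea: the column names "Title" and "Language" and the three sample rows are invented for the example and do not come from the real spreadsheet.

```python
import csv
import io

# Invented sample rows standing in for the real spreadsheet export.
SAMPLE = """Title,Language
Исторія Россіи,Russian
A History of England,English
Сочиненія,Russian
"""


def filter_books(rows, language, title_keyword=None):
    """Keep rows in the given language, optionally requiring a keyword in the title."""
    matches = []
    for row in rows:
        if row["Language"] != language:
            continue
        if title_keyword and title_keyword.lower() not in row["Title"].lower():
            continue
        matches.append(row)
    return matches


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
russian = filter_books(rows, "Russian")                # step 1: filter by language
modern_hits = filter_books(rows, "Russian", "история")  # step 2: filter by title word
print(len(russian), len(modern_hits))
```

In this made-up sample the language filter keeps two rows, while the keyword filter keeps none, because the sample title is spelled with a letter that the modern word does not use.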


Challenges

In this project, I found that I had to overcome a number of difficulties.

  • In Russian, nouns can have up to 12 inflections and adjectives can have as many as 16, so I had to look up several different forms of the same word.
  • As I said previously, I first experimented with simple words, such as history. You would think that there would definitely be books relating to history lurking somewhere in a collection of nearly 600 Russian titles. However, when I conducted my search, the spreadsheet returned no results. Confused, I tried another simple word, and once again had no definitive results.

Scanning more closely through the list of books, I soon noticed that there were certain spellings and letters that I did not recognise. I decided to research this matter more closely, looking at the history of the Russian language, and found out that the Russian of the 19th century does not directly resemble the Russian language used today. Why? Because of the Russian Revolution, of course.

1918 Spelling Reform Research
Bolshevik Spelling Reform of 1918 Research, detailing the causes for the reform and the changes made to the Russian language

Suddenly, everything made a lot more sense.

This discovery meant that I had to change my approach a little bit. Rather than typing in Russian words in the spelling that I know today, I would have to go on a sort of hunt through the spreadsheet, looking for words in the titles that were shared by a number of books. In a way, this made the process of my project even more interesting, despite the fact that it took longer.
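One way to make modern-spelling searches work on older titles would be to convert the titles to modern spelling before searching. The sketch below is an assumption-laden illustration, not the method used in this project: it handles only the best-known changes of the 1918 reform (the letters ѣ, і, ѳ and ѵ, and the hard sign at the end of words) and ignores others, such as changed adjective endings.

```python
import re

# Letters removed by the 1918 reform, mapped to their modern replacements.
OLD_TO_NEW = str.maketrans({
    "ѣ": "е", "Ѣ": "Е",
    "і": "и", "І": "И",
    "ѳ": "ф", "Ѳ": "Ф",
    "ѵ": "и", "Ѵ": "И",
})


def modernise(text):
    """Convert a pre-1918 spelling to an approximate modern spelling."""
    text = text.translate(OLD_TO_NEW)
    # Drop the hard sign only at the end of a word; it is kept inside words.
    return re.sub(r"[ъЪ]\b", "", text)


print(modernise("По Сѣверо-Западу Россіи"))
print("история" in modernise("Исторія Россіи").lower())
```

With the titles normalised in this way, a search for the modern word история would match a title printed as Исторія.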

As I mentioned in my previous blog, the majority of the Russian language books were actually non-fiction. As a result, I decided to create sub-categories for the non-fiction set, which can be seen in the speech-bubble I created below.

Non-fiction categories
Speech bubble containing non-fiction categories

To help me in this task, I decided to create a colour-coding system for classification, so that I could keep track of my progress.

  • Yellow = classified
  • Purple = латиница (Latin letters): quite often I found titles which were written in Russian but using Latin letters. Purple was also used for titles written in another language
  • Blue = unknown classification
  • Orange = near classification
Colour coding system
Screenshot of my spreadsheet showing the colour coding system that I used.

Evaluation

In conclusion, I managed to categorise the Russian language books into two broad categories, fiction and non-fiction, and I created 25 sub-collections within the non-fiction category. This project has been extremely enjoyable to work on, and although there were many challenges involved in the process, I have learnt lots during my research journey. In order to improve this project, I would definitely say that more work needs to be done on splitting up the 'history' sub-collection of my non-fiction title, since it is very broad and covers political accounts, as well as books about Russian History. Additionally, I think that this project would also considerably benefit from undergoing a thorough check with curators, in order to help classify some of the books I have not organised into separate collections yet. 

Picture from Russian Book
An illustration from one of the Russian books, По Сѣверо-Западу Россіи, available in the digitised collections. Image can be accessed on British Library Flickr Commons.

 

 

21 December 2016

Mobius programme – on the beach of learning

This guest post is by Virve Miettinen, who spent four months with various teams at the British Library.

Every morning there’s a 100-metre queue in front of the British Library. It seems to say a lot about an unashamed nerdiness and love for learning in this city. Usually all the queuers have already put the things they might need in the Reading Room in a clear plastic bag, so they can head straight down to the lockers, stow away their coats, handbags and laptop cases and secure a place on the beach of learning.

Virve
Virve Miettinen

The Mobius fellowship programme, organised by the Finnish Institute in London, enables mobility for visual arts, museum, library and archives professionals, and customised working periods as part of the host organisation’s staff, in my case the British Library. The programme is a great opportunity to break away from daily routines, to think about one’s professional identity, find fresh ideas, compare the practices and methods between two countries, share knowledge and build meaningful networks.

Learn, relearn and unlearn from each other

Learning isn’t a destination, it’s a never-ending road of discovery, challenge, inspiration and wonder. Each learning moment builds character, shapes thoughts, guides futures. But what makes us learn? For me the answer is other people, and during the Mobius Fellowship I’ve been blessed with the chance to work with talented people willing to share their knowledge at the British Library.

I’ve familiarised myself with the British Library Learning Team, which is responsible for the library’s engagement with all kinds of learners. The Learning Team offers workshops, activities and resources for schools, teachers and learners of all ages.

I’ve been following the work of the Digital Scholarship team and the BL Labs project to learn more about the incredible digital collections the library has to offer, and how to open them up for the public through various activities such as competitions, events and projects.

I’ve worked with the Knowledge Quarter, a network of now 76 partners within a one-mile radius of King’s Cross who actively create and disseminate knowledge. Partners include over 49 academic, cultural, research, scientific and media organisations large and small: from the British Library and University of the Arts London to the School of Life, Connected Digital Economy Catapult, Francis Crick Institute and Google.

I’ve assisted the Library’s Community Engagement Manager Emma Morgan. She has been working as a community engagement manager for six months now and the aim of her work is to create meaningful, long-lasting, mutually beneficial relationships with the surrounding community, i.e. residents, networks and organisations.

Inside the British Library

I’ve observed the library’s marketing and communications unit in action, and learned for example how they measure and research the customer experience, i.e. who visits and uses the BL, what they think of their experience and how the BL might improve it.

 

I’ve got many 'mental souvenirs' to take back home with me - if they interest you, read more from my Mobius blog: http://itssupercalafragilistic.tumblr.com/. 

100 digital stories about Finnish-British relations

As part of the Mobius programme I’ve been working on a co-operative project between the British Library, the National Library in Finland, the Finnish National Archives, The Finnish Institute in London and the Finnish Embassy. In the last three decades, contacts between Finland and the UK, two relatively distant nations, have multiplied. At the same time, the network of cultural relationships has tightened into a seamless 'love-story' – something that would not have been easy to predict just 50 years ago. In the coming year of 2017 the Finnish Institute celebrates the centennial anniversary of Finland’s independence by telling the story of two nations – the aim is to make the history, the interaction and the links between these two countries tangible and visible.

We are collaborating to create a digital gallery open to all, which offers its visitors carefully curated pieces of the shared history of the two countries and their political, cultural and economic relations. It will offer new information on the relations and influences between the two countries. It consists of digitised historical materials, like letters, news, cards, photographs, tickets and maps. The British Library and other partners will select 100 digitised items to create the basis of the gallery.

The gallery will be expanded further through co-creation. In the spirit of the theme of Finland’s centenary 'together', the gallery is open to all and easily accessible. With the call 'Wanted – make your own heritage' we invite people to share their own stories and interpretations, and record history through them. The gallery feeds curiosity, creates interaction and engages users to share their own memories relating to Finnish-British experiences. The users are invited to interpret recent history from a personal point of view.

The work continues after my Mobius-period and the gallery will open in September 2017. Join us and share your memories. Be frank, withdrawn, furious, imaginative, witty or sad. Through your story you create history.

P.S. The British Library Reading Room is actually far from The Beach of Learning; it’s more like The Coolest Place To Be. I found myself freezing in the air-conditioned Rare Books Reading Room despite wearing my leather jacket and an extra pair of leggings.

Virve Miettinen works at Helsinki City Library/Central Library as a participation planner. Her job is to engage citizens and partners to design the library of the future. For Helsinki City Library, co-operative planning and service design mean designing the premises and services together with the library users while taking advantage of user-centric methods. Her interests involve co-design, service design, community engagement and community-led city development. At the moment she is also working on her PhD under the title 'Co-creative practices in library services'.

12 August 2016

Black Abolitionist Performances and their Presence in Britain

Posted by Mahendra Mahey on behalf of Hannah-Rose Murray, finalist of the BL Labs Competition 2016

Overview of the project

The Black Abolitionist project focuses on African American lives, experiences and lectures in Britain between 1830 and 1895. It builds on my PhD project, which I am currently studying for at the Department of American and Canadian Studies, University of Nottingham. Working with the British Library has already proved a fortunate and enriching opportunity, and by harnessing the power of technology, we want to work together to search through thousands of newspapers to find abolitionist speeches, a process that would take years by hand. By reading black abolitionist speeches in the Nineteenth Century Newspaper Collection (and using the Flickr collection to illustrate), we can get a sense of their performances and how their lectures reached nearly every corner of Britain. Newspapers can also provide us with the locations of these meetings, and for the first time, I have mapped these locations to gather an estimate of how many lectures black abolitionists gave in Britain and to allow their hidden voices to be heard. I am updating my website to reflect this project, which can be found at www.frederickdouglassinbritain.com.

These are the maps I have so far: the map (below left) chronicles the lectures of Frederick Douglass, and the second one (on the right) represents the lectures given by other black abolitionists such as Josiah Henson, Sarah Remond, Moses Roper, William Wells Brown, Henry ‘Box’ Brown, Ida B. Wells, James Watkins and William and Ellen Craft (to name a few): Abolitionist_maps

African Americans visited Britain for a variety of reasons. Many came to publish slave narratives, teach Britons about slavery and look for their support in the abolitionist cause. Others came to live in Britain safely, away from the ever-watchful eyes of slave-catchers, while several wanted to raise money to purchase family members from the jaws of slavery. 

Black abolitionists made their mark in nearly every part of Great Britain, and it is no surprise to learn they had a strong impact on London too. Lectures were held in famous meeting halls, taverns, the houses of wealthy patrons, theatres, and churches across London: we inevitably and unknowingly walk past sites with a rich history of Black Britain every day.

When searching the newspapers, what we have found so far is that the OCR (Optical Character Recognition) is patchy at best. OCR refers to scanned images that have been turned into machine-readable text, and the quality of the OCR can depend on many factors – from the quality of the scan itself, to the quality of the paper the newspaper was printed on, to whether it has been damaged or ‘muddied.’ If the OCR is unintelligible, the data will not be ‘read’ properly – hence there could be hundreds of references to Frederick Douglass that are not accessible or ‘readable’ to us through an electronic search (see the image below).

American_slavery_f_douglass

In order to clean and sort through the ‘muddied’ OCR and the ‘clean’ OCR, we need to teach the computer what is ‘positive text’ (i.e., language that uses words such as ‘abolitionist’, ‘black’, ‘fugitive’, ‘negro’) and ‘negative text’ (language that does not relate to abolition). For example, the image to the left shows an advert for one of Frederick Douglass’s lectures (Leamington Spa Courier, 20 February 1847). The key words in this particular advert that are likely to appear in other adverts, reports and commentaries are ‘Frederick Douglass’, ‘fugitive’, ‘slave’, ‘American’, and ‘slavery.’ I can search for this advert through the digitized database, but there are perhaps hundreds more waiting to be uncovered.

I have spent several years transcribing many of Frederick Douglass’ speeches and most of these transcriptions will act as the ‘positive’ text. ‘Negative’ text can refer to other lectures of a similar structure that do not relate to abolition specifically, for example prison reform meetings or meetings about church finances. This will help the computer to pick out abolitionist language reliably. We can then test its performance against some of the data we already have, and once the results suggest we are on the right track, we can apply it to a larger data set.
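To make the idea of ‘positive’ and ‘negative’ text more concrete, here is a minimal sketch of one common way to train a computer on labelled examples: a small Naive Bayes classifier. This is an assumption for illustration only, as the post does not say which method the project uses, and the four training sentences are invented stand-ins for real transcriptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny two-class multinomial Naive Bayes with add-one smoothing."""

    LABELS = ("positive", "negative")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record the words of one labelled example."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Log prior plus the summed log likelihood of each word."""
        counts = self.word_counts[label]
        vocabulary = set()
        for label_counts in self.word_counts.values():
            vocabulary.update(label_counts)
        total_words = sum(counts.values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        return score

    def classify(self, text):
        """Return the label with the higher score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Invented training examples (hypothetical, not taken from the newspapers).
model = NaiveBayes()
model.train("Frederick Douglass the fugitive slave will lecture on American slavery", "positive")
model.train("An address on slavery by a fugitive slave from America", "positive")
model.train("A meeting was held to discuss prison reform", "negative")
model.train("The vestry met to consider church finances and repairs", "negative")

print(model.classify("Lecture on American slavery by Frederick Douglass"))
print(model.classify("The committee met to discuss church finances"))
```

In practice the training text would be far larger, and the classifier would need to cope with misspellings introduced by poor OCR, which this sketch does not attempt.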

The prospect of uncovering hidden speeches by African Americans is incredibly exciting, and hopefully this will add to our knowledge of the black presence in Britain: we can use these extensive sources to build a more complete picture of Victorian London in particular.

 

11 July 2016

Finding digitised books and images about Finland in a collection of 65,000 books


Posted by Ruby Dixon, currently a student at Graveney School and on work-experience at BL Labs.

Background

The ‘Microsoft’ books are 65,000 digitised volumes - about 22.5 million pages - which were published between 1789 and 1914; they were digitised in partnership with Microsoft. They cover a wide range of subject areas including philosophy, poetry, history and literature and they include Optically Character Recognised (OCR) text from the millions of pages.

In discussion with Mahendra Mahey, Project Manager of BL Labs, we explored making a ‘sub collection’ from this larger set which will hopefully help researchers in the future. After thinking about making a collection of ‘works of fiction’, ‘bibles’ or titles about ‘slavery’ I decided that identifying a collection of books about Finland would be the most interesting and realistic thing to do as part of my mini-project at the Library.

The collection I am creating will hopefully help a project that the Library might be working on which celebrates the 100th year of independence of Finland in 2017.

Facts about Finland

When starting this mini-project, I thought it would be wise to do some background research about Finland. I thought this would be a great way to put my GCSEs in Geography and History to use. Knowing more about the history and geography of Finland would help me in my ‘detective’ hunt through the collection of books. I would learn about important keywords I might need to use to help me identify relevant books in the digitised collection.

Here are some useful facts that you may not know about Finland:

  • Finland became an autonomous Grand Duchy within the Russian Empire on 29 March 1809.
  • Finland declared independence on 6 December 1917.
  • Finland joined the European Union on 1 January 1995.

These and more facts can be accessed online: https://en.wikipedia.org/wiki/Finland

A map showing Finland, taken from Wikipedia: https://en.wikipedia.org/wiki/Finland

This gave me a clue that there may in fact be several books in the collection in the Russian language that cover Finland, given that Finland was an autonomous part of the Russian Empire from 1809. Looking at the map of Finland, I also realised that bordering countries would most likely have books about Finland as well.

Approach

Analysing the collection spreadsheet 

A screen shot of a section of the spreadsheet containing 65,000 records of digitised books in the ‘Microsoft Books’ collection.

My first task was to examine the huge spreadsheet containing information about the 65,000 books in the collection.

There were several lines of ‘attack’ we could take in finding information about Finland in this collection, some of which involve using the ‘Filter’ function in Excel.

Screen shot from Microsoft Books Spreadsheet: 1. The 'Filter' function in Excel. 2. Filter has been applied on the language code for Finnish, ‘fin’.

We came up with the following strategy:

  1. Find words relating to 'Finland' in the Title field in the spreadsheet for the books.
  2. This task would have to be done in several languages as there are 28 languages listed in the language code field (column C). I decided I would prioritise English and the languages of the nations bordering Finland and, if I had time, would look at the other languages too.
  3. I knew I would have to use Google Translate (https://translate.google.co.uk/) to find equivalent words in each language relating to Finland to help me with filtering.
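The same strategy could also be scripted rather than carried out by hand in Excel. The sketch below is only an illustration: the column names, the sample titles, the language codes other than ‘fin’ and the translated keywords are all assumptions, not details taken from the real spreadsheet.

```python
import csv
import io

# Illustrative keywords per language code. Only 'fin' is mentioned in the
# post; the other codes and the translated keywords are assumptions.
KEYWORDS = {
    "eng": ["finland", "finnish", "helsinki", "finn"],
    "swe": ["finland", "finsk", "helsingfors"],
    "fin": ["suomi", "suomen", "helsinki"],
}


def find_matching_books(rows, keywords=KEYWORDS):
    """Return rows whose title contains a keyword for that row's language."""
    matches = []
    for row in rows:
        language = row["Language"].strip().lower()
        title = row["Title"].lower()
        if any(word in title for word in keywords.get(language, [])):
            matches.append(row)
    return matches


# Invented sample records standing in for the 65,000-row spreadsheet.
SAMPLE_CSV = """Identifier,Title,Language
001,A Winter in Finland,eng
002,Poems of the Lake District,eng
003,Suomen historia,fin
004,Resa genom Sverige,swe
005,Helsingfors och dess omgifningar,swe
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = find_matching_books(rows)
for book in matches:
    print(book["Identifier"], book["Language"], book["Title"])
```

Because this matches substrings, a short keyword such as ‘finn’ would also match unrelated words that happen to contain it, so the results would still need checking by eye.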

In terms of thinking of what words I might use for the filtering, Mahendra suggested that it might be useful to create a word cloud about all things 'Finnish'; this might help me decide which words were the most important and to use first in filtering.

I used https://tagul.com/ and here is the word cloud I made using the Wikipedia page about Finland:

Word cloud created using Tagul, based on the Wikipedia page in English about Finland.

From this, we decided to use the following words (the number of words was limited due to time): Finland, Finnish, Helsinki and Finn.
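A word cloud is essentially a picture of word frequencies, so a similar ranking can be produced with a simple count. This is a rough sketch, not how Tagul works internally: the stop-word list is a small assumption and the sample text is a stand-in for the Wikipedia page.

```python
import re
from collections import Counter

# A very small, assumed stop-word list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "in", "a", "is", "to", "by", "its", "on", "as", "was"}


def top_words(text, limit=5):
    """Count words (ignoring stop words) and return the most frequent ones."""
    words = re.findall(r"[a-zäöå]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(limit)


# Stand-in text; the real input was the English Wikipedia page about Finland.
sample = (
    "Finland is a Nordic country. The capital of Finland is Helsinki. "
    "Finnish and Swedish are the official languages of Finland."
)

for word, count in top_words(sample):
    print(word, count)
```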

We also filtered using Danish, Swedish, German, English, Finnish and Russian languages and using related words about Finland in those languages.

Below is a summary table showing the number of books we found by applying a filter to the 'Title' field in the spreadsheet about words related to 'Finland'.

Table 1: The table above shows the number of books I found using various filters in the digitised collection.

Please note that I didn’t have time to look further into the books we found in some of the non-English languages, as I am not a native speaker of any of them. More time would be needed to filter this collection. The spreadsheet is available here.

What is interesting, however, is that we know there are 582 books in the collection in the Russian language, details of which I sent to Katya Rogatchevskaia, Lead Curator of East European Collections. 

Images in the books about Finland

I learned how the images from the 'Microsoft' books were extracted and placed on The British Library’s Flickr page. This slide from a BL Labs presentation nicely summarises how it all happened: 

Flickr process pic

Taken from the BL Labs Slideshare account, http://www.slideshare.net/labsbl

More information is available from a blog post written by Ben O’Steen, Technical Lead of BL Labs, which explains this process in much more detail.

What I realised was that there must be images identified in these books which relate to Finland. Mahendra suggested that I first look at some work done by the Wikimedia community on trying to find maps within these images.

Wikimedia commons synoptic index

The Wikimedia Commons Synoptic Index for the Mechanical Curator images contains a really handy breakdown of the images by geographical place.

Wikimedia pic

Image taken from British Library/Mechanical Curator collection/Synoptic index, Europe.

From this, I was able to find that there were 12 books that had been identified as having images which had something to do with Finland in them.

Image taken from Wikimedia Commons page.

This was a great way to start, but now I thought I would try the British Library’s Flickr Commons site to see if there were more images about Finland that had been tagged with Finland-related words.

British Library Flickr Commons

As of 07/07/16 there are 1,023,705 images on the British Library’s Flickr Commons page; a large proportion of these come from images snipped out of the digitised books that I have been working on.

The site has had an incredible 400,000,000 plus views and users have tagged over 100,000 images with around 500,000 tags. I am really looking forward to seeing what the winners of the Labs Competition 2016 will do on their SherlockNet project, as they are hoping to tag all the images using computer code!

For now, I wanted to use the tags already there to see if I could find images relating to Finland.

Here is an example image which has several tags added, some of which relate to Finland:

  Image from Flickr 1 Flickr tags pic
Tags added to an example image on the British Library Flickr Commons page.

Here you can see tags such as ‘Finland’, ‘Suomi’ (Finnish for ‘Finland’), ‘Helsinki’, ‘Helsingfors’ (Swedish for ‘Helsinki’) etc. which have been added by Flickr users (grey tags). Please note that tags in white are those added automatically by Flickr itself.

I have summarised the images I have found on the British Library’s Flickr Commons collection below:

 Keyword(s) used and link to BL Flickr Commons   Number of images found
 Finland                                         917
 Helsinki                                        18
 Suomi                                           3
 Suomen                                          418
 Suomalaiset                                     15
 Finns                                           42
 Finnish                                         352
 Gulf of Finland                                 43
 Kulturbilder ur Finlands historie               1
 Turku                                           3
 Pori                                            4
 Tampere                                         1
 Kuopio                                          2
 Hanko                                           177
 Lapland                                         148
 Suomenlinna                                     2
 Kemi                                            1
 Total                                           1997

 Table showing links and number of British Library Flickr Commons images about Finland

What is clear from this initial research is that there are definitely more books with images about Finland than the 12 identified through Wikimedia Commons. Much more work will be needed on this. Also, I would recommend that all the images that I have found be downloaded so that they may be used for the Finnish 100 year independence project.

In conclusion, I have enjoyed taking part in this project. Although it has been relatively challenging, the experience has been very interesting. However, more time is certainly needed to find more books in the 65,000-book collection, as I have only had a limited amount of time to spend on it. I would also recommend that more words relating to Finland be found and used in several languages to filter the master spreadsheet, in order to add more books to the Finnish collection. Lastly, the project could be developed further by working with the curators of other languages to help identify Finland-related books.

If you would like to find more sub collections in the Microsoft books collection, please email labs@bl.uk; they would love to hear from you!

Tomorrow I will blog about my work experience at the library.

 

 

15 April 2016

The Georgian Pingbacks Project


Posted by Mahendra Mahey, Manager of BL Labs on behalf of Dr. Melodee Beals, Lecturer in Digital History, Department of Politics, History and International Relations, Loughborough University.

Georgian Pingbacks

In the wild west of the World Wide Web, if you compose a hilarious joke, provide a simple solution to a complex problem or break a major new story, it is almost certain that your work will be copied. Although intellectual property laws exist, they are inconsistently enforced because of the sheer number of sites where reposting occurs - a number that increases with each passing second. If you are lucky, and your re-poster is honest, you may discover how far your ideas have spread through a pingback, an automatically generated comment on your original blog post with a link to its reprint.

In the nineteenth century, reprinting—especially unauthorised reprinting—was the backbone of Atlantic journalism but, unlike modern bloggers, these authors had no effective means of discovering the fate of their quips or queries, except through chance encounters with competing papers or their readers. Although concerns of commercial losses are long past, this lack of attribution continues to plague researchers working with newspapers. Without a precise date of composition or of original publication, and without a specific or even a corporate author, the provenance of these texts remains frustratingly uncertain. One solution to this problem is to track reprinting through text-matching. Using plagiarism detection software, we can carefully reconnect different versions appearing in a wide range of publications. Yet, however efficient our text-matching processes become, two major problems remain. First, text-matching requires machine-readable versions of the articles—electronic texts rather than images. While the sheer number of historical newspapers that have been digitised is impressive, the number that have high-quality, searchable text is deceptively limited. Many community sites have uploaded images of their physical or microfilm archives but do not have the resources to create fully searchable transcriptions. Others, created by state or commercial providers, have relied upon optical-character recognition, the accuracy of which is subject to wild variations. Even when OCR texts are excellent, these represent a considerable investment to providers and often remain locked behind subscription fees.
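To give a sense of what text-matching involves, the sketch below compares two texts by the runs of consecutive words they share. It is a simplified illustration under stated assumptions, not the algorithm used by Copyfind or any other plagiarism detection package, and the three sample sentences are invented.

```python
import re


def shingles(text, size=4):
    """Return the set of all runs of `size` consecutive words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(text_a, text_b, size=4):
    """Share of word runs in the shorter text that also occur in the other."""
    runs_a, runs_b = shingles(text_a, size), shingles(text_b, size)
    if not runs_a or not runs_b:
        return 0.0
    return len(runs_a & runs_b) / min(len(runs_a), len(runs_b))


# Invented examples (hypothetical, not real newspaper text).
original = "The ship arrived at Liverpool on Tuesday with a cargo of cotton and sugar"
reprint = "We learn that the ship arrived at Liverpool on Tuesday with a cargo of cotton"
unrelated = "The annual meeting of the society was held in the town hall last evening"

print(round(overlap(original, reprint), 2))
print(overlap(original, unrelated))
```

A high score suggests that one text reuses the wording of the other, but exact matching of this kind only works when the transcription is accurate, which is why OCR quality matters so much.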

Reprints within the British Library's 19th Century Newspaper Database, 1818-1819, based on analysis with Copyfind

Thanks to the efforts of public institutions—including the British Library, National Library of Wales, National Library of Australia and the Library of Congress—machine-readable transcriptions for a large number of nineteenth-century newspapers are now available to researchers. But within these collections, a second, more sinister problem arises. No matter how diligently archivists have worked to provide a representative or diverse selection, these digital holdings remain only a slice of the sprawling news network that once existed. Even if we find every single digital copy of a text, how can we know for sure that the original is among them? It is here that the humble pingback returns to the fore. Whether prompted by the innate honesty of editors or by their desire to establish the authenticity of their materials, a significant minority of newspaper articles contained an attribution. Whether appearing as an introductory dateline or a concluding tagline, these Georgian pingbacks offer tantalising clues as to the true origins of these anonymised texts. Yet, because only a minority of articles contain these attributions, because they can appear in many different forms or locations within the article text and because OCR is frustratingly inconsistent in transcribing italic and gothic typefaces, searching for datelines algorithmically is exceedingly difficult.

A Snippet from the Ipswich Journal, 13 January 1821. Courtesy of the British Library.

That is where the crowd comes in. Although computers can process data very quickly, the human brain is still more adept at finding patterns when the parameters for those patterns are particularly fuzzy. Because of this, it was easier for astronomers to train volunteers to identify dusty debris disks in nebulae than to train computers to do the same thing. And what is true for nebulae is equally true of these Georgian pingbacks. Using thousands of images from the British Library's 19th-Century Newspapers collection, we have created a new site where you can help spot these attributions and provide researchers with what Georgian authors could only dream of, an in-depth understanding of just who was stealing from whom! The site includes an in-depth tutorial on the structure of nineteenth-century newspaper articles as well as three different ways you can help us tag the database. So, whether you have a smart phone and 5 minutes waiting for your train or want to explore the collection in more depth at your home PC, please visit Georgian Pingbacks and try your hand at uncovering a 200-year-old case of plagiarism.

Dr M. H. Beals is a historian of migration and media at Loughborough University. She would like to thank the following undergraduate students at Loughborough University's Department of Politics, History and International Relations for their work on this project: Will Dickinson, Alice Gilbert, Ollie Luhrs, Alex Mackinder, Pooja Makwana, Matthew McCulloch, Jonny Ord, Emily Stanyard and Rebecca Thompson.