THE BRITISH LIBRARY

Digital scholarship blog

129 posts categorized "Data"

14 May 2018

Seeing British Library collections through a digital lens


Digital Curator Mia Ridge writes: in this guest post, Dr Giles Bergel describes some experiments with the Library's digitised images...

The University of Oxford’s Visual Geometry Group has been working with a number of British Library curators to apply computer vision technology to their collections. On April 5 of this year I was invited by BL Digital Curator Dr. Mia Ridge to St. Pancras to showcase some of this work and to give curators the opportunity to try the tools out for themselves.  

Visual Geometry’s VISE tool matching two identical images from separate books digitised for the British Library’s Two Centuries of Indian Print project.

Computer vision - the extraction of meaning from images - has made considerable strides in recent years, particularly through the application of so-called ‘deep learning’ to large datasets. Cultural collections provide some of the most interesting test-cases for computer vision researchers, due to their complexity, the intensity of interest that researchers bring to them, and their importance for human well-being. Can computers see collections as humans do? Computer vision is perhaps better regarded as a powerful lens than as a substitute for human curation. A computer can search a large collection of images far more quickly than a single picture researcher can: while it will not bring the same contextual understanding to bear on an image, it has the advantage of speed and comprehensiveness. Sometimes, a computer vision system can surprise the researcher by suggesting similarities that weren’t readily apparent.

As a relatively new technology, computer vision attracts legitimate concerns about privacy, ethics and fairness. By making its state of the art tools freely available, Visual Geometry hope to encourage experimentation and responsible use, and to enlist users to help determine what they can and cannot do. Cultural collections provide a searching test-case for the state of the art, due to their diversity as media (prints, paintings, stamped images, photographs, film and more), each of which invites different responses. One BL curator made a telling point by searching the BBC News collection with the term 'football': the system was presented with images previously tagged with that word that related to American, Gaelic, Rugby and Association football. Although inconclusive due to lack of sufficiently specific training data, the test asked whether a computer could (or should) pick the most popular instances; attempt to generalise across multiple meanings; or discern separate usages. Despite advances in processing power and in software methods, computers' ability to generalise, to extract semantic meaning from images or texts, and to cope with overlapping or ambiguous concepts remains very basic.

Other tests with BL images have been more immediately successful. Visual Geometry's Traherne tool, developed originally to detect differences in typesetting in early printed books, worked well with many materials that exhibit small differences, such as postage stamps or doctored photographs. Visual Geometry's Image Search Engine (VISE) has shown itself capable of retrieving matching illustrations in books digitised for the Library's Indian Print project, as well as certain bookbinding features, or popular printed ballads. Some years ago Visual Geometry produced a search interface for the Library's 1 Million Images release. A collaboration between the Library's Endangered Archives programme and Oxford researcher David Zeitlyn on the archive of Cameroonian studio photographer Jacques Toussele employed facial recognition as well as pattern detection. VGG's facial recognition software works on video (BBC News, for example) as well as still photographs and art, and is soon to be freely released to join other tools under the banner of the Seebibyte Project.    

I'll be returning to the Library in June to help curators explore using the tools with their own images. For more information on the work of Visual Geometry on cultural collections, subscribe to the project's Google Group or contact Giles Bergel.      

Dr. Giles Bergel is a digital humanist based in the Visual Geometry Group in the Department of Engineering Science at the University of Oxford.  

The event was supported by the Seebibyte project under an EPSRC Programme Grant EP/M013774/1

 

08 May 2018

The Italian Academies database – now available in XML


Dr Mia Ridge writes: in 2017, we made XML and image files from a four-year, AHRC-funded project: The Italian Academies 1525-1700 available through the Library's open data portal. The original data structure was quite complex, so we would be curious to hear feedback from anyone reusing the converted form for research or visualisations.

In this post, Dr Lisa Sampson, Reader in Early Modern Italian Studies at UCL, and Dr Jane Everson, Emeritus Professor of Italian literature, RHUL, provide further information about the project...

New research opportunities for students of Renaissance and Baroque culture! The Italian Academies database is now available for download. It's in a format called XML which represents the original structure of the database.
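For anyone opening the download for the first time, one way to get a feel for an unfamiliar XML file is simply to count which elements it contains. The following is a minimal sketch using Python's standard library; the sample data and its element names (`academy`, `person`, `city`) are invented for illustration and are not taken from the actual Italian Academies files.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Count how often each element tag occurs in an XML file or file-like object."""
    counts = Counter()
    # iterparse streams the document, so large files need not be loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's children and text once it is counted
    return counts


# Hypothetical sample: these element names are assumptions made for this
# example, not the schema of the real download.
SAMPLE = b"""<database>
  <academy><name>Intronati</name><city>Siena</city></academy>
  <academy><name>Incogniti</name><city>Naples</city></academy>
  <person><name>Laura Terracina</name></person>
</database>"""

if __name__ == "__main__":
    for tag, n in count_tags(io.BytesIO(SAMPLE)).most_common():
        print(f"{tag}: {n}")
```

Passing a file path instead of the in-memory sample should work the same way, and the resulting tag counts give a rough map of the structure before any more targeted queries are written.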

This dedicated database results from an eight-year project, funded by the Arts and Humanities Research Council UK, and provides a wealth of information on the Italian learned academies. Around 800 such institutions flourished across the peninsula over the sixteenth and seventeenth centuries, making major contributions to the cultural and scientific debates and innovations of the period, as well as forming intellectual networks across Europe. This database lists a total of 587 Academies from Venice, Padua, Ferrara, Bologna, Siena, Rome, Naples, and towns and cities in southern Italy and Sicily active in the period 1525-1700. Also listed are more than 7,000 members of one or more academies (including major figures like Galileo, as well as women and artists), and almost 1,000 printed works connected with academies held in the British Library. The database therefore provides an essential starting point for research into early modern culture in Italy and beyond. It is also an invitation to further scholarship and data collection, as these totals constitute only a fraction of the data relating to the Academies.

Laura Terracina, nicknamed Febea, of the Accademia degli Incogniti, Naples

The database is designed to permit searches from many different perspectives and to allow easy searching across categories. In addition to the three principal fields – Academies, People, Books – searches can be conducted by title keyword, printer, illustrator, dedicatee, censor, language, gender and nationality, among others. The database also lists and illustrates the mottoes and emblems of the Academies (where known) and similarly of individual academy members. Illustrations from the books entered in the database include frontispieces, colophons, and images from within texts.

Emblem of the Accademia degli Intronati, Siena


The database thus aims to promote research on the Italian Academies in disciplines ranging from literature and history, through art, science, astronomy, mathematics, printing and publishing, censorship, politics, religion and philosophy.

The Italian Academies project which created this database began in 2006 as a collaboration between the British Library and Royal Holloway University of London, funded by the Arts and Humanities Research Council and led by Jane Everson. The objective was the creation of a dedicated resource on the publications and membership of the Italian learned Academies active in the period between 1525 and 1700. The software for the database was designed in-house by the British Library and the first tranche of data was completed in 2009, listing information for academies in four cities (Naples, Siena, Bologna and Padua). A second phase, listing information for many more cities, including in southern Italy and Sicily, developed the database further between 2010 and 2014, with a major research grant from the AHRC and collaboration with the University of Reading.

The exciting possibilities now opened up by the British Library’s digital data strategy look set to stimulate new research and collaborations by making the records even more widely available, and easily downloadable, in line with Open Access goals. The Italian Academies team is now working to develop the project further with the addition of new data, and its incorporation into a hub of similar resources.

The Italian Academies project team members welcome feedback on the records and on the adoption of the database for new research (contact: www.italianacademies.org).

The original database remains accessible at http://www.bl.uk/catalogues/ItalianAcademies/Default.aspx 

An Introduction to the database, its aims, contents and objectives is available both at this site and at the new digital data site: https://data.bl.uk/iad/

Jane E. Everson, Royal Holloway University of London

Lisa Sampson, University College, London

25 April 2018

Some challenges and opportunities for digital scholarship in 2018


In this post, Digital Curator Dr Mia Ridge shares her presentation notes for a talk on 'challenges and opportunities for digital scholarship' at the British Library's first Research Collaboration 'Open House'.

I'm part of a team that supports the creation and innovative use of the British Library's digital collections. Our working definition of digital scholarship is 'using computational methods to answer existing research questions or challenge existing theoretical paradigms'. In this post/talk, my perspective is informed by my knowledge of the internal processes necessary to support digital scholarship and of the issues that some scholars face when using digital/digitised collections, so I'm not by any means claiming this is a complete list.

Opportunities in digital scholarship

  • Scale: you can explore a bigger body of material computationally - 'reading' thousands, or hundreds of thousands, of volumes of text, images or media files - while retaining the ability to examine individual items as research questions arise from that distant reading
  • Perspective: you can see trends, patterns and relationships not apparent from close reading individual items, or gain a broad overview of a topic
  • Speed: you can test an idea or hypothesis on a large dataset; prototype new interfaces; generate classification data about people, places, concepts; transcribe content

Together, these opportunities enable new research questions.

Sample digital scholarship tools and methods

Some of these processes help get data ready for analysis (e.g. turning images of items into transcribed and annotated texts), while others support the analysis of large collections at scale, improve discoverability or enable public engagement.

  • OCR, HTR - optical character recognition, handwritten text recognition
  • Data visualisation for analysis or publication
  • Text and data mining - applying classifications to or analysing texts, images or media. Key terms include natural language processing, corpus linguistics, sentiment analysis, applied machine learning. Examples include: Voyant tools, Clarifai image classification.
  • Mapping and GIS - assigning coordinates to quantitative or qualitative data
  • Public participation and learning including crowdsourcing, citizen science/history. Examples include In the Spotlight, transcribing information from historical playbills.
  • Creative and emerging formats including games
An experiment with image classification with Clarifai

Putting it all together, we have case studies like the Political Meetings Mapper by Dr. Katrina Navickas, a BL Labs winner in 2015. This project, based on digitised 19th century newspapers, used Python scripts to calculate meeting dates, and to extract and geocode meeting locations, to create a map of Chartist meetings.
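To give a flavour of the date-calculation step, here is a minimal sketch. It assumes, purely for illustration, that a newspaper notice names a weekday (for example 'on Monday next') and that the meeting falls on the first such day after the issue date. The function name and approach are hypothetical and are not the project's actual code; geocoding is left out because it depends on an external gazetteer or service.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(issue_date, weekday_name):
    """Return the first date strictly after issue_date that falls on weekday_name."""
    target = WEEKDAYS.index(weekday_name.lower())
    # Number of days to move forward, always between 1 and 7.
    days_ahead = (target - issue_date.weekday() - 1) % 7 + 1
    return issue_date + timedelta(days=days_ahead)


if __name__ == "__main__":
    # A notice in an issue dated Saturday 8 April 1848 announcing a meeting
    # "on Monday next" (the issue date is chosen only as an example).
    print(next_weekday(date(1848, 4, 8), "Monday"))
```

A notice naming the same weekday as the issue date is treated here as meaning a week later; a real pipeline would need to decide that case (and phrases such as 'this evening') by checking against the source text.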

The Library has created a data portal, data.bl.uk, containing openly licensed datasets. We aim to describe collections in terms of their data format (images, full text, metadata, etc.), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions) and collection, and related subjects or themes. Other datasets may be available by request, or digitised via funded partnerships.

We're aware that, currently, it can be hard to use the datasets from data.bl.uk as they can be too large to easily download, store and manipulate. This leads me neatly onto...

Challenges in digital scholarship

  • Digitisation and cataloguing backlog - the material you want mightn't be available without a special digitisation project
  • Providing access to assets for individual items - between copyright and technology, scholars don't always have the ability to download OCR/HTR text, or download all digitised media about an item
  • Providing access to collections as datasets - moving more material into the 'sweet spot' of material that's nicely digitised in suitable formats, usable sizes, with open licences allowing for re-use is an on-going (and expensive, time-consuming) process
  • 'Cleaning' historical data and dealing with gaps in both tools provision and source collections - none of these processes are straightforward
  • Providing access to platforms or suites of tools - how much should the Library take on for researchers, and how much should other institutions or individuals provide?
  • Skills - where will researchers learn digital scholarship methods?
  • Peer review - what if your discipline lacks DS-skilled peers? How can peers judge a website or database if they've only had experience with monographs or articles? How can scholars overcome prejudice about the 'digital'?
  • Versioning datasets as annotations or classifications change, software tools improve over time, transcriptions are corrected, etc - some of these changes may affect the argument you're making

Overall, I hope the opportunities outweigh the challenges, and it's certainly possible to start with small projects with existing tools and digital sources to explore the potential of a larger project.

If you've used BL data, you can enter the BL Labs awards - they don't close until October so you have time to start an experimental project now! You can also ask the Labs team to reality check your digital scholarship idea based on Library collections and data.

Digital scholarship is constantly shifting so on another date I might have come up with different opportunities and challenges. Let me know if you have challenges or opportunities that you think could be included in this very brief overview!

21 April 2018

On the Road (Again)


Image from the British Library’s Million Images on Flickr, found on p 198 of 'The Cruise of the Land Yacht “Wanderer”; or, thirteen hundred miles in my caravan, etc' by William Gordon Stables, 1886.

Now that British Summer Time has officially arrived, and with it some warmer weather, British Library Labs are hitting the road again with a series of events in Universities around the UK. The aim of these half-day roadshows is to inspire people to think about using the library's digitised collections and datasets in their research, art works, sound installations, apps, businesses... you name it!

A digitised copy of a manuscript is a very convenient medium to work on, especially if you are unable to visit the library in person and order an original item up to a reading room. But there are so many other uses for digitised items! Come along to one of the BL Labs Roadshows at a University department near you and find out more about the methods used by researchers in Digital Scholarship, from data-mining and crowd sourcing to optical character recognition for transcribing the words from an imaged page into searchable text. 

At each of the roadshow events, there will be speakers from the host institution describing some of the research projects they have already completed using digitised materials, as well as members of the British Library who will be able to talk with you about proposed research plans involving digitised resources. 

The locations of this year's roadshows are: 

Mon 26th March - BL Labs Roadshow 2018 (CityLIS) - internal event

Mon 9th April - BL Labs Roadshow 2018 (Open University) - internal event

Thu 12th April - BL Labs Roadshow 2018 (University of Bristol & Cardiff Digital Cultures Network)

Tue 24th April - BL Labs Roadshow 2018 (UCL)

Wed 25th April - BL Labs Roadshow 2018 (University of Kent)

Wed 2nd May - BL Labs Roadshow 2018 (University of Edinburgh)

Tue 15th May - BL Labs Roadshow 2018 (University of Wolverhampton)

Wed 16th May - BL Labs Roadshow 2018 (University of Lincoln)

Tue 5th June - BL Labs Roadshow 2018 (University of Leeds)

See a full programme and book your place using the Eventbrite page for each event.

If you want to discover more about the Digital Collections, and Digital Scholarship at the British Library, follow us on Twitter @BL_Labs, read our Blog Posts, and get in touch with BL Labs if you have some burning research questions!

12 April 2018

British Library Labs application for Digital Research support


BL Labs supports researchers, artists, entrepreneurs and educators who want to use the British Library's digital collections and data

We are proud to announce the launch of a new service where we will be able to provide up to 5 days' support to help you develop a project idea that uses our digital collections and data. In that time, we will help you understand the collection(s) you want to work with and will provide technical, curatorial and legal advice about your project. We can also help you with scope, costs, time-frames, risks and any other relevant issues.

Get support to develop an idea using the British Library's Digital Collections & Data

We will review and select applications at the beginning of each month. If your application is selected, we will work with you to provide targeted support and help you develop your project further.

We strongly recommend that before you submit your idea you explore the digital collections and data you are interested in and contact us at labs@bl.uk for some initial guidance.

You can also visit our previous ideas and projects pages for inspiration.

Once you're ready to go, send in your application using this form.

The 2018 BL Labs Awards: enter before midnight Thursday 11th October!


With six months to go before the submission deadline, we would like to announce the 2018 British Library Labs Awards!

The BL Labs Awards are a way of formally recognising outstanding and innovative work that has been created using the British Library’s digital collections and data.

Have you been working on a project that uses digitised material from the British Library's collections? If so, we'd like to encourage you to enter that project for an award in one of our categories.

This year, the BL Labs Awards is commending work in four key areas:

  • Research - A project or activity which shows the development of new knowledge, research methods, or tools.
  • Commercial - An activity that delivers or develops commercial value in the context of new products, tools, or services that build on, incorporate, or enhance the Library's digital content.
  • Artistic - An artistic or creative endeavour which inspires, stimulates, amazes and provokes.
  • Teaching / Learning - Quality learning experiences created for learners of any age and ability that use the Library's digital content.

BL Labs Awards 2017 Winners: Top-Left (Research Award Winner – A large-scale comparison of world music corpora with computational tools), Top-Right (Commercial Award Winner – Movable Type: The Card Game), Bottom-Left (Artistic Award Winner – Imaginary Cities) and Bottom-Right (Teaching / Learning Award Winner – Vittoria’s World of Stories)

There is also a Staff award which recognises a project completed by a staff member or team, with the winner and runner up being announced at the Symposium along with the other award winners.

The closing date for entering your work for the 2018 round of BL Labs Awards is midnight BST on Thursday 11th October 2018. Please submit your entry and/or help us spread the word to all interested and relevant parties over the next few months. This will ensure we have another year of fantastic digital-based projects highlighted by the Awards!

The entries will be shortlisted after the submission deadline (11/10/2018) has passed, and selected shortlisted entrants will be notified via email by midnight BST on Friday 26th October 2018. 

A prize of £500 will be awarded to the winner and £100 to the runner up in each of the Awards categories at the BL Labs Symposium on 12th November 2018 at the British Library, St Pancras, London.

The talent of the BL Labs Awards winners and runners up from 2017, 2016 and 2015 has resulted in a remarkable and varied collection of innovative projects. You can read about some of the 2017 Awards winners and runners up in our other blogs, links below:

British Library Labs Staff Award Winner – Two Centuries of Indian Print


  • Research category Award (2017) winner: 'A large-scale comparison of world music corpora with computational tools', by Maria Panteli, Emmanouil Benetos and Simon Dixon, Centre for Digital Music, Queen Mary University of London
  • Research category Award (2017) runner up: 'Samtla' by Dr Martyn Harris, Prof Dan Levene, Prof Mark Levene and Dr Dell Zhang
  • Commercial Award (2017) winner: 'Movable Type: The Card Game' by Robin O'Keeffe
  • Artistic Award (2017) winner: 'Imaginary Cities' by Michael Takeo Magruder
  • Artistic Award (2017) runner up: 'Face Swap', by Tristan Roddis and Cogapp
  • Teaching and Learning (2017) winner: 'Vittoria's World of Stories' by the pupils and staff of Vittoria Primary School, Islington
  • Teaching and Learning (2017) runner up: 'Git Lit' by Jonathan Reeve
  • Staff Award (2017) winner: 'Two Centuries of Indian Print' by Layli Uddin, Priyanka Basu, Tom Derrick, Megan O’Looney, Alia Carter, Nur Sobers-Khan, Laurence Roger and Nora McGregor
  • Staff Award (2017) runner up: 'Putting Collection metadata on the map: Picturing Canada', by Philip Hatfield and Joan Francis

For any further information about BL Labs or our Awards, please contact us at labs@bl.uk.

23 March 2018

Shine a light on past entertainments with In the Spotlight


In this post, Dr Mia Ridge and Alex Mendes provide an update on the Library's latest crowdsourcing project...

People who've explored In the Spotlight, our project helping make historic playbills more findable, might have noticed a line of text just above the 'Save and Continue' button: 'Seen something interesting? Add a note'.

Insights from your comments

Since the project began, we've received almost 700 comments [update - it's actually over 1900, across all projects]. Some of them simply tell us that an image is blank or upside-down, but many others share interesting findings. We love hearing from you, and we've been highlighting individual comments on Twitter (@LibCrowds) and on our forum.

Comments have pointed out spectacles including 'a Terrific Eruption of Mount Vesuvius, accompanied by TORRENTS OF BURNING LAVA' and a 'Serpent vomiting Fire'. New amenities mentioned include lighting ('600 wax lights and a new set of gold chandeliers' or new gas lighting) and the addition of backs to seats. Famous actors spotted include Sarah Siddons, Jenny Lind and Ira Aldridge, while Mr Kean has caused all kinds of trouble.

Lots of comments are about performances that aren't plays, from hornpipes to tableaux to ballets, songs, speeches, fireworks, scientific demonstrations, performing animals, panoramas, conjuring and juggling tricks, lists of scenery, gun tricks, pantomimes, acrobatics, excerpts from plays, and even the 'reenactment of the Coronation'! We're thinking hard about the best way to deal with them (and with playbills that don't include a year), and will post to the forum and twitter to ask for your ideas soon.

General updates

Since we first shared the link, there have been over 4,700 visitors from 91 countries. About 80% are primarily English-speakers, with Russian, German and French the next most popular languages.

We've had over 42,000 contributions from over 630 participants (with 1499 participants registered on the platform overall). Together, they've helped complete 34 projects by undertaking countless marking and transcription tasks to make genres, dates and play titles searchable.

Each project is based on a specific volume of playbills from a regional theatre or theatres. The fastest projects were 'Theatre Royal, Bristol 1819-1823 (Vol. 2)', completed in 8 days, 31 minutes, with 'Miscellaneous Plymouth theatres 1796-1882 (Vol. 1)' a close second at 8 days, 5 hours, 30 mins. We currently have playbills from theatres in Dublin, Hull, Nottingham - Oswestry or Plymouth - which will be completed first?

Recent blog posts include a wonderful story from PhD student and In the Spotlight participant Edward Mills tracing an ancient custom through the Library's digitised collections in The Flitch of Bacon: An Unexpected Journey Through the Collections of the British Library, and Christian Algar on the 'rich pageant' of historical playbills.

You might have noticed some small changes to the navigation and data pages as we updated the software this week. Most of the changes were behind the scenes, providing additional admin and analysis functions to ensure that data sent off to the catalogue is as accurate as possible.

Visitors have come from all over the world, but we'd love to reach more

 

Thank you!

We're grateful to everyone who's made a large or small contribution, but particular thanks to Barbara G, David Y, Dina S, Ervins S, Jo B, John L, Katharine S, Kathryn P-S, Lisa G, Maria Antonia V-S, Martin B, mistrec, Olga K, Raphael H, Rosie C, Sharon E, sylvmorris1, Tabitha M, thtrisdead, Tif D, Vijay V and various anonymous posters for your comments. Your comments are also helping us work out how to tweak some of the interfaces so people can let us know about a problem with a task by clicking a button, so expect more improvements in the future!

Step into the Spotlight

It's easy to try out In the Spotlight - you don't need to register, so you can start marking out the titles of plays or transcribing the titles, dates or genres of plays straight away. Give it a go and let us know what you find!

There are wonders galore waiting for the spotlight

21 March 2018

BL Labs 2017 Symposium: Vittoria's World of Stories, Learning & Teaching Award Winner


Vittoria’s 'World of Stories' - the BL Labs Learning and Teaching Award Winner 2017 - is a project led by parents at Vittoria Primary School through the PTA, with the support of school staff. The aim of the project is to collect and share traditional tales from around the world and creative work by current pupils through workshops, the production of a book, school assemblies, readings and performances, and via the creation of audio, text and images for the school website during the current academic year. The illustrations for the project, drawn from the British Library’s Flickr collection, are displayed alongside pupils’ artwork.

The front cover of Vittoria primary school's 'World of Stories'

Our school is a diverse community of learners: pupils’ families come from a wide range of ethnic and cultural backgrounds. Languages spoken by pupils at home include Arabic, Bengali, Vietnamese, Russian, Chechen, Turkish and Somali. One of the pedagogical goals of the project was to make visible the similarities between well-loved traditional tales and explore how different cultures use the same cast of characters - heroes and heroines, tricksters and magicians, villains and monsters – in order to speak across generations about what it means to be human. We wanted to promote and celebrate the diversity of the multi-cultural community which makes up our school, and show parents and children that the characters and stories they love are shared by others from different cultures.   

 The stories in the book include original works by pupils, gathered through a story-writing competition with winning entries selected by the PTA committee. We also asked parents to nominate traditional tales for inclusion in the collection, and held a bi-lingual (English and Arabic) story-sharing workshop for parents organised by the PTA. During the workshop, parents spoke about well-known traditional tales which they remembered from childhood and discussed the contrasts and similarities between characters and narratives from different cultures. For example, the section of the book which presents ‘bogeyman’ type monsters was developed from discussions in the workshop. We discovered that the Beast from Beauty and the Beast is called ‘Al-Ba’ati’ in Sudan, where the story is known as ‘Jamila wal Ba’ati’. Sudanese parents discussed how ‘Al-Ba’ati’ is used to encourage good behaviour in children, which prompted another parent to share her family’s stories of ‘The Boogerman’ who plays a similar role in persuading children to stay in bed at bedtime.

One of the British Library's Flickr images, used as an illustration

The project also links with our work within the classroom to develop children’s reading skills, through promoting a love of reading and books at home. By showing that we value and celebrate the oral culture of storytelling between parents and children, and by collecting and translating tales from languages other than English, we aim to encourage parents to read with their children and support their learning at home.

The project has had a positive effect within the school community, by promoting dialogue and interaction between parents from different cultures through the parents’ workshop, and by providing a vehicle to celebrate pupils’ achievements to the school community. Parents have also bought copies of the book to share with family and friends. One of our parent contributors took copies of the book to share with older generations of her family in Sudan during a recent visit, and we hope that other parents will do the same.

During the next phase of the project we will be organising a series of readings and performances using the book with different year groups and making audio recordings which we will publish on the school website for parents to download and listen to with their children at home.

If this blog post has stimulated your interest in working with the British Library's digital collections, start a project and enter it for one of the BL Labs 2018 Awards! Join us on 12 November 2018 for the BL Labs annual Symposium at the British Library.

Posted by BL Labs on behalf of Vittoria Primary school