THE BRITISH LIBRARY

Digital scholarship blog

100 posts categorized "Humanities"

25 November 2020

Early Circus in London: Astley's Amphitheatre by Professor Leith Davis

Posted on behalf of Professor Leith Davis at Simon Fraser University, British Columbia, Canada by Mahendra Mahey, Manager of BL Labs.

Picture of cutting taken from the Astley's newspaper clippings archive Th.Cts.35 (held at the British Library)

What do you think of when you hear the word “circus”? Lions, tigers, elephants? Ringmasters in coat-tails? Trapeze artists? In fact, most of the images that we commonly associate with circus derive from nineteenth-century examples of the genre. Circus when it first started out in the late eighteenth century was a different kind of entertainment altogether. Yes, there were animal acts, including equestrian riding stunts, and there were also acrobatics. But early circus also included automatons and air balloons, pantomime and fireworks, musical acts and re-enactments of events like the storming of the Bastille. In short, it was a microcosm of the Georgian world which served to re-present important political and cultural activities by re-mixing them with varieties of astonishing physical entertainments.

Astley's Amphitheatre from Microcosm of London
Image taken from the British Library Archive

Unfortunately, partially as a result of the overpowering influence of the lions and tigers and ringmasters, and partially as a result of its having fallen through the cracks between academic disciplinary divisions, early circus has been largely forgotten.

The database that I created, “Reconstructing Early Circus: Entertainments at Astley’s Amphitheatre, 1768-1833” (https://dhil.lib.sfu.ca/circus/), based on materials held by the British Library, aims to bring early circus back from offstage and to connect the ephemeral traces of this eighteenth-century entertainment with the concerns of our contemporary age.

Philip Astley - Image Copyright National Portrait Gallery

The man credited with “inventing” the form of entertainment known now as circus was Philip Astley. Astley was certainly not the first person to perform popular equestrian entertainments for money, but he is acknowledged to have been the first person to have had the idea of using an enclosed space where he could present his equestrian shows to a paying audience. Over the years, Astley’s Amphitheatre and Riding School evolved to include both a ring and a stage. Astley was an astute businessman and was able to expand his enterprise to include circuses in Dublin and Paris. His success also encouraged other entertainment entrepreneurs to try their hand at the circus business. Sites of entertainment similar to Astley’s sprang up within London and other locations in the British archipelago as well as in Europe and North America, including Jones’s Equestrian Amphitheatre in Whitechapel (1786), Swan’s Amphitheatre in Birmingham (1787), the Edinburgh Equestrian Circus (1790), Ricketts's Equestrian Pantheon in Boston (1794) and Montreal (1797), and the Royal Circus, Equestrian and Philharmonic Academy in London (1782). Circus was not just a type of entertainment in the metropolis; it was also a transnational phenomenon.

Pony race
Poney Race at Astley's Amphitheatre, image from V&A Museum

I drew the data for “Reconstructing Early Circus” from the British Library’s “Astley’s Cuttings From Newspapers” (Th. Cts. 35-37). This source consists of three volumes of close to 3,000 newspaper advertisements of entertainments featured at Astley’s from 1768 to 1833, along with a few manuscript materials and a lock of Astley’s daughter’s hair. The clippings were collected by the theatre manager, James Winston, for a history of theatre which he never published. Working with my research assistant, Emma Pink, I photographed each of the clippings from the BL volumes in the reading room and asked four undergraduate students to transcribe them. Then I worked with the personnel at Simon Fraser University’s Digital Humanities Research Lab to create the website. Users can browse through the sixty-year history of Astley’s or, using the search function, they can identify the frequency of particular acts or performers, for example. The materials represent a rich treasure trove for scholars of: Romantic-era cultural and media studies; British history; economic and business history; performance studies; fine arts; and cultural memory studies.

As I continue to expand and improve the site, I hope to use my database to explore connections between early circus and other popular entertainments of the day, and to extend the site to cover circus venues in transatlantic locations.

Examining the Astley archives allows us to learn more about leisure in the long eighteenth century as well as about the connections between popular entertainment and political and social concerns in Georgian times, and, by extension, in our own era. Lions and tigers and ringmasters you won’t find here, but check out the “little Learned Military Horse,” the trained bees, and, of course, the equestrian feats of Astley himself for more insight into this neglected popular entertainment from 200 years ago. 

(See also Leith Davis. "Between Archive and Repertoire: Astley's Amphitheatre, Early Circus, and Romantic-Era Song Culture." Studies in Romanticism 58, no. 4 (2019): 451-79).

Leith Davis, Professor of English at Simon Fraser University in British Columbia, Canada

Leith Davis is Professor of English at Simon Fraser University in British Columbia, Canada where she researches and teaches eighteenth-century literature and media history. She is the author of Acts of Union: Scotland and the Negotiation of the British Nation (Stanford UP, 1998) and Music, Postcolonialism and Gender: The Construction of Irish Identity, 1724-1874 (Notre Dame UP, 2005) as well as co-editor of Scotland and the Borders of Romanticism (Cambridge: Cambridge UP, 2004) and Robert Burns and Transatlantic Culture (Ashgate, 2012). She is currently completing a monograph entitled Mediating Cultural Memory in Britain and Ireland, 1688-1745 which explores sites of cultural memory in the British archipelago within the context of the shifting media ecology of the eighteenth century.

BL Labs Public Awards 2020 - REMINDER - Entries close NOON (GMT) 30 November 2020

Inspired by this work that uses the British Library's digital archived cuttings? Have you done something innovative using the British Library's digital collections and data? Why not consider entering your work for a BL Labs Public Award 2020 and win fame, glory and even a bit of money?

This year's public awards 2020 are open for submission, the deadline for entry is NOON (GMT) Monday 30 November 2020

Whilst we welcome projects on any use of our digital collections and data (especially in research, artistic, educational and community categories), we are particularly interested in entries in our public awards that have focused on anti-racist work, about the pandemic or that are using computational methods such as the use of Jupyter Notebooks.

Work will be showcased at the online BL Labs Annual Symposium between 1400 - 1700 on Tuesday 15 December, for more information and a booking form please visit the BL Labs Symposium 2020 webpage.

11 November 2020

BL Labs Online Symposium 2020 : Book your place for Tuesday 15-Dec-2020

Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the eighth annual British Library Labs Symposium 2020 will be held on Tuesday 15 December 2020, from 13:45 - 16:55* (see note below) online. The event is FREE, but you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out and book early, see more information here!

*Please note, that directly after the Symposium, we are organising an experimental online mingling networking session between 16:55 and 17:30!

The British Library Labs (BL Labs) Symposium is an annual event and awards ceremony showcasing innovative projects that use the British Library's digital collections and data. It provides a platform for highlighting and discussing the use of the Library’s digital collections for research, inspiration and enjoyment. The awards this year will recognise outstanding use of British Library's digital content in the categories of Research, Artistic, Educational, Community and British Library staff contributions.

This is our eighth annual symposium and you can see previous Symposia videos from 2019, 2018, 2017, 2016, 2015, 2014 and our launch event in 2013.

Dr Ruth Ahnert, Professor of Literary History and Digital Humanities at Queen Mary University of London, and Principal Investigator on 'Living With Machines' at The Alan Turing Institute
Ruth Ahnert will be giving the BL Labs Symposium 2020 keynote this year.

We are very proud to announce that this year's keynote will be delivered by Ruth Ahnert, Professor of Literary History and Digital Humanities at Queen Mary University of London, and Principal Investigator on 'Living With Machines' at The Alan Turing Institute.

Her work focuses on Tudor culture, book history, and digital humanities. She is author of The Rise of Prison Literature in the Sixteenth Century (Cambridge University Press, 2013), editor of Re-forming the Psalms in Tudor England, as a special issue of Renaissance Studies (2015), and co-author of two further books: The Network Turn: Changing Perspectives in the Humanities (Cambridge University Press, 2020) and Tudor Networks of Power (forthcoming with Oxford University Press). Recent collaborative work has taken place through AHRC-funded projects ‘Living with Machines’ and 'Networking the Archives: Assembling and analysing a meta-archive of correspondence, 1509-1714’. With Elaine Treharne she is series editor of the Stanford University Press’s Text Technologies series.

Ruth's keynote is entitled: Humanists Living with Machines: reflections on collaboration and computational history during a global pandemic

You can follow Ruth on Twitter.

There will be awards announcements throughout the event for the Research, Artistic, Community, Teaching & Learning and Staff categories. This year we are also going to ask the audience to vote for their favourite among the shortlisted projects: a people's BL Labs Award!

There will be a final talk near the end of the conference and we will announce the speaker for that session very soon.

So don't forget to book your place for the Symposium today. We predict another full house, our first online, and we don't want you to miss out. See more detailed information here.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

23 October 2020

BL Labs Public Award Runner Up (Research) 2019 - Automated Labelling of People in Video Archives

People 'automatically' identified in digital TV news related programme clips.

Guest blog post by Andrew Brown (PhD researcher), Ernesto Coto (Research Software Engineer) and Andrew Zisserman (Professor) of the Visual Geometry Group, Department of Engineering Science, University of Oxford, and BL Labs Public Award Runner-up for Research, 2019. Posted on their behalf by Mahendra Mahey, Manager of BL Labs.

In this work, we automatically identify and label (tag) people in large video archives without the need for any manual annotation or supervision. The project was carried out with the British Library on a sample of 106 videos from their “Television and radio news” archive; a large collection of news programs from the last 10 years. This archive serves as an important and fascinating resource for researchers and the general public alike. However, the sheer scale of the data, coupled with a lack of relevant metadata, makes indexing, analysing and navigating this content an increasingly difficult task. Relying on human annotation is no longer feasible, and without an effective way to navigate these videos, this bank of knowledge is largely inaccessible.

As users, we are typically interested in human-centric queries such as:

  • “When did Jeremy Corbyn first appear in a Newsnight episode?” or
  • “Show me all of the times when Hugh Grant and Shirley Williams appeared together.”

Currently this is nigh on impossible without trawling through hundreds of hours of content. 

We posed the following research question:

Is it possible to enable automatic person-search capabilities such as this in the archive, without the need for any manual supervision or labelling?

The answer is “yes”, and the method is described next.

Video Pre-Processing

The basic unit which enables person labelling in videos is the face-track; a group of consecutive face detections within a shot that correspond to the same identity. Face-tracks are extracted from all of the videos in the archive. The task of labelling the people in the videos is then to assign a label to each one of these extracted face-tracks. The video below gives an example of two face-tracks found in a scene.


Two face-tracks found in British Library digital news footage by Visual Geometry Group - University of Oxford.

Techniques at Our Disposal

The base technology used for this work is a state-of-the-art convolutional neural network (CNN), trained for facial recognition [1]. The CNN extracts feature-vectors (a list of numbers) from face images, which indicate the identity of the depicted person. To label a face-track, the distance between the feature-vector for the face-track, and the feature-vector for a face-image with known identity is computed. The face-track is labelled as depicting that identity if the distance is smaller than a certain threshold (i.e. they match). We also use a speaker recognition CNN [2] that works in the same way, except it labels speech segments from unknown identities using speech segments from known identities within the video.
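The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pooling of per-frame vectors into one face-track vector by averaging, the use of Euclidean distance, the threshold value of 0.8, and the function names are all assumptions made for illustration, and the tiny two-dimensional vectors stand in for the much longer vectors a real CNN would produce.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def track_descriptor(frame_vectors):
    """Pool the per-frame vectors of one face-track into a single vector.

    Averaging is an assumed pooling choice, not taken from the post.
    """
    dims = len(frame_vectors[0])
    mean = [sum(v[i] for v in frame_vectors) / len(frame_vectors)
            for i in range(dims)]
    return l2_normalise(mean)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(track_vec, known_vec, threshold=0.8):
    """A face-track matches a known identity if the distance is below the threshold.

    The threshold value here is hypothetical.
    """
    return euclidean_distance(track_vec, known_vec) < threshold


# Toy example: two frames of one face-track, and two known identities.
track = track_descriptor([[1.0, 0.0], [0.9, 0.1]])
known_same = l2_normalise([1.0, 0.0])
known_other = l2_normalise([0.0, 1.0])

print(matches(track, known_same))   # True
print(matches(track, known_other))  # False
```

In practice the threshold would need to be tuned on data, since it trades missed matches against incorrect labels.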

Labelling the Face-Tracks

Our method for automatically labelling the people in the video archive is divided into three main stages:

(1) Our first labelling method uses what we term a “celebrity feature-vector bank”, which consists of names of people that are likely to appear in the videos, and their corresponding feature-vectors. The names are automatically sourced from IMDB cast lists for the programmes (the titles of the programmes are freely available in the meta-data). Face-images for each of the names are automatically downloaded from image-search engines. Incorrect face-images and people with no images of themselves on search engines are automatically removed at this stage. We compute the feature-vectors for each identity and add them to the bank alongside the names. The face-tracks from the video archives are then simply labelled by finding matches in the feature-vector bank.

Face-tracks from the video archives are labelled by finding matches in the feature-vector bank. 

(2) Our second labelling method uses the idea that if a name is spoken, or found displayed in a scene, then that person is likely to be found within that scene. The task is then to automatically determine whether there is a correspondence or not. Text is automatically read from the news videos using Optical Character Recognition (OCR), and speech is automatically transcribed using Automatic Speech Recognition (ASR). Names are identified and they are searched for on image search engines. The top ranked images are downloaded and the feature-vectors are computed from the faces. If any are close enough to the feature-vectors from the face-tracks present in the scene, then that face-track is labelled with that name. The video below details this process for a written name.


Using text or spoken word and face recognition to identify a person in a news clip.

(3) For our third labelling method, we use speaker recognition to identify any non-labelled speaking people. We use the labels from the previous two stages to automatically acquire labelled speech segments from the corresponding labelled face-tracks. For each remaining non-labelled speaking person, we extract the speech feature-vector and compute its distance to the feature-vectors of the labelled speech segments. If one is close enough, then the non-labelled speech segment and corresponding face-track are assigned that name. This process manages to label speaking face-tracks with visually challenging faces, e.g. deep in shadow or at an extremely non-frontal pose.
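To make the flow between the stages concrete, here is a simplified sketch of how the first and third stages could fit together. It is an illustration under stated assumptions rather than the project's code: the dictionary layout of a face-track, the function names, the thresholds, and the toy two-dimensional vectors are all hypothetical, and stage two (names read by OCR or ASR) is left out for brevity.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query, labelled, threshold):
    """Return the name of the closest labelled vector, or None if none is within the threshold."""
    best_name, best_dist = None, threshold
    for name, vec in labelled:
        d = distance(query, vec)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name


def label_tracks(tracks, face_bank, face_threshold=0.5, voice_threshold=0.5):
    """Label face-tracks by face first, then fall back to voice.

    Both threshold values are hypothetical.
    """
    # Stage 1: match each track's face vector against the feature-vector bank.
    labels = {t["id"]: nearest_label(t["face"], face_bank, face_threshold)
              for t in tracks}

    # Stage 3: collect speech vectors from the tracks labelled so far...
    voice_bank = [(labels[t["id"]], t["voice"]) for t in tracks
                  if labels[t["id"]] is not None and t["voice"] is not None]

    # ...and use them to label speaking tracks whose face did not match.
    for t in tracks:
        if labels[t["id"]] is None and t["voice"] is not None:
            labels[t["id"]] = nearest_label(t["voice"], voice_bank, voice_threshold)
    return labels


# Placeholder identities and toy vectors, for illustration only.
face_bank = [("Person A", [1.0, 0.0]), ("Person B", [0.0, 1.0])]
tracks = [
    {"id": "t1", "face": [0.9, 0.1], "voice": [0.2, 0.8]},    # clear face
    {"id": "t2", "face": [0.5, 0.5], "voice": [0.25, 0.75]},  # ambiguous face, speaking
    {"id": "t3", "face": [0.5, 0.5], "voice": None},          # ambiguous face, silent
]
labels = label_tracks(tracks, face_bank)
print(labels)  # {'t1': 'Person A', 't2': 'Person A', 't3': None}
```

Track t2 cannot be labelled from its face alone, but its voice is close to the speech already attributed to Person A through t1, which mirrors the case of a speaker shown in shadow or in profile.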

Indexing and Searching Identities

The results of our work can be browsed via a web search engine of our own design. A search bar allows users to specify the person or group of people that they would like to search for. People’s names are efficiently indexed so that the complete list of names can be filtered as the user types in the search bar. The search results are returned instantly with their associated metadata (programme name, date and time) and can be displayed in multiple ways. The video associated with each search result can be played, visualising the location and the name of all identified people in the video. See the video below for more details. This allows for the archive videos to be easily navigated using person-search, thus opening them up for use by the general public.


Archive videos easily navigated using person-search.
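As a rough illustration of the two search behaviours described in this section (filtering names as the user types, and finding videos where several people appear together), the sketch below uses a sorted name list with binary search and a set intersection. The post does not say how the real search engine is built, so the class, its method names and the placeholder data are all hypothetical.

```python
from bisect import bisect_left, insort


class PersonIndex:
    """Toy index from person names to the videos they were labelled in."""

    def __init__(self):
        self._appearances = {}   # name -> set of video ids
        self._sorted_names = []  # sorted (lower-cased name, name) pairs

    def add(self, name, video_id):
        if name not in self._appearances:
            self._appearances[name] = set()
            insort(self._sorted_names, (name.lower(), name))
        self._appearances[name].add(video_id)

    def suggest(self, prefix):
        """Names starting with the typed prefix, case-insensitively."""
        key = prefix.lower()
        start = bisect_left(self._sorted_names, (key, ""))
        results = []
        for lower, name in self._sorted_names[start:]:
            if not lower.startswith(key):
                break  # matching names are contiguous in sorted order
            results.append(name)
        return results

    def appearing_together(self, *names):
        """Video ids in which every one of the given people was labelled."""
        sets = [self._appearances.get(n, set()) for n in names]
        if not sets:
            return set()
        return set.intersection(*sets)


# Placeholder names and video ids, for illustration only.
index = PersonIndex()
index.add("Person A", "v1")
index.add("Person B", "v1")
index.add("Person A", "v2")
index.add("Someone Else", "v2")

print(index.suggest("per"))                               # ['Person A', 'Person B']
print(index.appearing_together("Person A", "Person B"))   # {'v1'}
```

A production system would also store the programme name, date and time for each appearance so that results can be displayed with their metadata.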

For examples of more of our Computer Vision research and open-source software, visit the Visual Geometry Group website.

This work was supported by the EPSRC Programme Grant Seebibyte EP/M013774/1

[1] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. VGGFace2: A dataset for recognising faces across pose and age. In Proc. International Conference on Automatic Face & Gesture Recognition, 2018.

[2] Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. VoxCeleb2: Deep Speaker Recognition. In Proc. INTERSPEECH, 2018.

BL Labs Public Awards 2020

Inspired by this work that uses the British Library's digital archived news footage? Have you done something innovative using the British Library's digital collections and data? Why not consider entering your work for a BL Labs Public Award 2020 and win fame, glory and even a bit of money?

This year's public and staff awards 2020 are open for submission, the deadline for entry for both is Monday 30 November 2020.

Whilst we welcome projects on any use of our digital collections and data (especially in research, artistic, educational and community categories), we are particularly interested in entries in our public awards that have focused on anti-racist work, about the pandemic or that are using computational methods such as the use of Jupyter Notebooks.

19 October 2020

The 2020 British Library Labs Staff Award - Nominations Open!

Looking for entries now!

A set of four light bulbs side by side, with the third switched on. The image is a metaphor for an 'idea'.
Nominate an existing British Library staff member or a team that has done something exciting, innovative and cool with the British Library’s digital collections or data.

The 2020 British Library Labs Staff Award, now in its fifth year, gives recognition to current British Library staff who have created something brilliant using the Library’s digital collections or data.

Perhaps you know of a project that developed new forms of knowledge, or an activity that delivered commercial value to the library. Did the person or team create an artistic work that inspired, stimulated, amazed and provoked? Do you know of a project developed by the Library where quality learning experiences were generated using the Library’s digital content? 

You may nominate a current member of British Library staff, a team, or yourself (if you are a member of staff), for the Staff Award using this form.

The deadline for submission is NOON (GMT), Monday 30 November 2020.

Nominees will be highlighted on Tuesday 15 December 2020 at the online British Library Labs Annual Symposium where some (winners and runners-up) will also be asked to talk about their projects (everyone is welcome to attend, you just need to register).

You can see the projects submitted by members of staff and public for the awards in our online archive.

In 2019, last year's winner focused on the brilliant work of the Imaging Team for the 'Qatar Foundation Partnership Project Hack Days', which were sessions organised for the team to experiment with the Library's digital collections. 

The runner-up for the BL Labs Staff Award in 2019 was the Heritage Made Digital team and their social media campaign to promote the British Library's digital collections one language a week from letters 'A' to 'U' (#AToUnknown).

In the public Awards, last year's winners (2019) drew attention to artistic, research, teaching & learning, and community activities that used our data and / or digital collections.

British Library Labs is a project within the Digital Scholarship department at the British Library that supports and inspires the use of the Library's digital collections and data in exciting and innovative ways. It was previously funded by the Andrew W. Mellon Foundation and is now solely funded by the British Library.

If you have any questions, please contact us at labs@bl.uk.

24 July 2020

Ira Aldridge In the Spotlight

In this post, Dr Mia Ridge gives a sense of why sightings of Ira Aldridge in our historical playbills collection resonate...

Ira Aldridge is one of the most popular 'celebrity spottings' shared by volunteers working with historical playbills on our In the Spotlight project. Born on this day in New York in 1807, Aldridge was the first Black actor to play a Shakespearean role in Britain.

Portrait of Aldridge by James Northcote

Educated at the African Free School and with some experience at the African Grove Theatre in New York, the teenaged Aldridge emigrated to Britain from the US in 1824 and quickly had an impact. In 1826 painter James Northcote captured him as Othello, a portrait which became the first acquisition by the Manchester Art Gallery. (If you're reading this before August 15th, you can attend an online tour exploring his work.)

While his initial reviews were mixed, he took The Times' mocking reference to him as the 'African Roscius' and used both the references to the famous Roman actor and his African ancestry in promotional playbills. Caught up in debates about the abolition of slavery and facing racism in reviews from critics about his performances in London's theatres, Aldridge toured the regions, particularly British cities with anti-slavery sympathies. He performed a range of roles, and his Shakespearean roles eventually included Othello, Shylock, Macbeth, King Lear and Richard III.

From 1852, he toured Europe, particularly Germany, Poland and Russia. This 'List showing the theatres and plays in various European cities where Ira Aldridge, the African Roscius, acted during the years 1827-1867', compiled by Arturo Alfonso Schomburg, shows how widely he travelled and the roles he performed.

As the 1841 playbill from Doncaster's Theatre Royal (below) shows, the tale of his African ancestry grew more creative over time. The playbill also advertises a lecture and memoirs from Aldridge on various topics. In the years around the abolition of slavery in the British Empire, he spoke powerfully and directly to audiences about the injustices of slavery and racism. Playbills like this demonstrate how Aldridge managed to both pander to and play with perceptions of 'the African'.

This is necessarily a very brief overview of Aldridge's life and impact but I hope it's given you a sense of why it's so exciting to catch a glimpse of Aldridge in our collections.

Screenshot of historical playbill

Sources used and further reading include:

My thanks to everyone who suggested references for this post, in particular: Christian Algar, Naomi Billingsley, Nora McGregor, Susan Reed from the British Library; Dorothy Berry from the Houghton Library at Harvard and In the Spotlight participants including beccabooks10, Nosnibor3, Elizabeth Danskin (who shared a link to this video about his daughter, Amanda Aldridge), Nicola Hayes, and Sylvia Morris (who has written extensively about Aldridge on her blog).

Post by Mia Ridge, Digital Curator, Western Heritage Collections.

24 April 2020

BL Labs Learning & Teaching Award Winners - 2019 - The Other Voice - RCA

Innovations in sound and art

Dr Matt Lewis, Tutor of Digital Direction, and Dr Eleanor Dare, Reader of Digital Media, both at the School of Communication at the Royal College of Art, and Mary Stewart, Curator of Oral History and Deputy Director of National Life Stories at the British Library, reflect on an ongoing and award-winning collaboration (posted on their behalf by Mahendra Mahey, BL Labs Manager).

In spring 2019, based in both the British Library and the Royal College of Art School of Communication, seven students from the MA Digital Direction course participated in an elective module entitled The Other Voice. After listening in-depth to a selection of oral history interviews, the students learnt how to edit and creatively interpret oral histories, gaining insight into the complex and nuanced ethical and practical implications of working with other people’s life stories. The culmination of this collaboration was a two-day student-curated showcase at the British Library, where the students displayed their own creative and very personal responses to the oral history testimonies.

The module was led by Eleanor Dare (Head of Programme for MA Digital Direction, RCA), Matt Lewis (Sound Artist and Musician and RCA Tutor) and Mary Stewart (British Library Oral History Curator). We were really pleased that over 100 British Library staff took the time to come to the showcase, engage with the artwork and discuss their responses with the students.

Eleanor reflects:

“The students have benefited enormously from this collaboration, gaining a deeper understanding of the ethics of editing, the particular power of oral history and of course, the feedback and stimulation of having a show in the British Library.”

We were all absolutely delighted that the Other Voice group were the winners of the BL Labs Teaching and Learning Award 2019, presented in November 2019 at a ceremony at the British Library Knowledge Centre.  Two students, Karthika Sakthivel and Giulia Brancati, also showcased their work at the 2019 annual Oral History Society Regional Network Event at the British Library - and contributed to a wide ranging discussion reflecting on their practice and the power of oral history with a group of 35 oral historians from all over the UK.  The collaboration has continued as Mary and Matt ran ‘The Other Voice’ elective in spring 2020, where the students adapted to the Covid-19 Pandemic, producing work under lockdown, from different locations around the world. 

Here is just a taster of the amazing works the students created in 2019, which made them worthy winners of the BL Labs Teaching and Learning Award 2019.

Karthika Sakthivel and Giulia Brancati were both inspired by the testimony of Irene Elliot, who was interviewed by Dvora Liberman in 2014 for an innovative project on Crown Court Clerks. They were both moved by Irene’s rich description of her mother’s hard work bringing up five children in 1950s Preston.

On the way back by Giulia Brancati

Giulia created On the way back, an installation featuring two audio points – one with excerpts of Irene’s testimony and another an audio collage inspired by Irene’s description. Two old-fashioned telephones played the audio, which the listener absorbed while curled up in an armchair in a fictional front room. It was a wonderfully immersive experience.

Irene Elliot's testimony interwoven with the audio collage (C1674/05)
Audio collage and photography © Giulia Brancati.
Listen here

Giulia commented:

“In a world full of noise and overwhelming information, to sit and really pay attention to someone’s personal story is an act of mindful presence. This module has been a continuous learning experience in which ‘the other voice’ became a trigger for creativity and personal reflection.”

Memory Foam by Karthika Sakthivel

Inspired by Irene’s testimony Karthika created a wonderful sonic quilt, entitled Memory Foam.

Karthika explains,

“There was power in Irene’s voice, enough to make me want to sew - something I’d never really done on my own before. But in her story there was comfort, there was warmth and that kept me going.”

Illustrated with objects drawn from Irene's memories, each square of the patchwork quilt encased conductive fabric that triggered audio clips. Upon touching each square, the corresponding story would play.

Karthika further commented,

“The initial visitor interactions with the piece gave me useful insights that enabled me to improve the experience in real time by testing alternate ways of hanging and displaying the quilt. After engaging with the quilt guests walked up to me with recollections of their own mothers and grandmothers – and these emotional connections were deeply rewarding.”

Karthika, Giulia and the whole group were honoured that Irene and her daughter Jayne travelled from Preston to come to the exhibition. Karthika said:

“It was the greatest honour to have her experience my patchwork of her memories. This project for me unfurled yards of possibilities, the common thread being - the power of a voice.”

Irene and her daughter Jayne experiencing Memory Foam © Karthika Sakthivel.
Irene's words activated by touching the lime green patch with lace and a zip (top left of the quilt) (C1674/05)
Listen here

Meditations in Clay by James Roadnight and David Sappa

Listening to ceramicist Walter Keeler's memories of making a pot inspired James Roadnight and David Sappa to travel to Cornwall and record new oral histories to create Meditations in Clay. This was an immersive documentary that explores what we, as members of this modern society, can learn from the craft of pottery - a technology as old as time itself. The film combines interviews conducted at the Bernard Leach pottery with audio-visual documentation of the St Ives studio and its rugged Cornish surroundings.


Meditations in Clay, video montage © James Roadnight and David Sappa.

Those attending the showcase were bewitched as they watched the landscape documentary on the large screen and engaged with the selection of listening pots, which when held to the ear played excerpts of the oral history interviews.

James and David commented,

“This project has taught us a great deal about the deep interview techniques involved in Oral History. Seeing visitors at the showcase engage deeply with our work, watching the film and listening to our guided meditation for 15, 20 minutes at a time was more than we could have ever imagined.”

Beyond Form

Raf Martins responded innovatively to Jonathan Blake’s interview describing his experiences as one of the first people in the UK to be diagnosed with HIV. In Beyond Form Raf created an audio soundscape of environmental sounds and excerpts from the interview which played alongside a projected 3D hologram based on the cellular structure of HIV. The hologram changed form and shape when activated by the audio – an intriguing visual artefact that translated the vibrant individual story into a futuristic medium.

Beyond-form
Jonathan Blake's testimony interwoven with environmental soundscape (C456/104) Soundscape and image © Raf Martins.
Listen here

Stiff Upper Lip

Also inspired by Jonathan Blake’s interview was Stiff Upper Lip by Kingsley Tao, a short film which used clips of the interview to explore sexuality, identity and reactions to health and sickness.

Donald in Wonderland

Donald Palmer’s interview with Paul Merchant contained a wonderful and warm description of the front room that his Jamaican-born parents ‘kept for best’ in 1970s London. Alex Remoleux created a virtual reality tour of the reimagined space, entitled Donald in Wonderland, where the viewer could point to various objects in the virtual space and launch the corresponding snippet of audio.

Alex commented,

“I am really happy that I provided a Virtual Reality experience, and that Donald Palmer himself came to see my work. In the picture below you can see Donald using the remote in order to point and touch the objects represented in the virtual world.”

Donald-wonderland
Donald Palmer describes his parents' front room (C1379/102)
Interviewee Donald Palmer wearing the virtual reality headset, exploring the virtual reality space (pictured) created by Alex Remoleux.
Listen here

Showcase at the British Library

The reaction to the showcase from the visitors and British Library staff was overwhelmingly positive, as shown by this small selection of comments. We were incredibly grateful to interviewees Irene and Donald for attending the showcase too. This was an excellent collaboration: RCA students and staff alike gained new insights into the significance and breadth of the British Library Oral History collection and the British Library staff were bowled over by the creative responses to the archival collection.

Feedback
Examples of feedback from British Library showcase of 'The Other Voice' by Royal College of Art

With thanks to the MA Other Voice cohort Giulia Brancati, Raf Martins, Alexia Remoleux, James Roadnight, Karthika Sakthivel, David Sappa and Kingsley Tao, RCA staff Eleanor Dare and Matt Lewis & BL Oral History Curator Mary Stewart, plus all the interviewees who recorded their stories and the visitors who took the time to attend the showcase.

21 April 2020

Clean. Migrate. Validate. Enhance. Processing Archival Metadata with Open Refine

This blogpost is by Graham Jevon, Cataloguer, Endangered Archives Programme 

Creating detailed and consistent metadata is a challenge common to most archives. Many rely on an army of volunteers with varying degrees of cataloguing experience. And no matter how diligent any team of cataloguers are, human error and individual idiosyncrasies are inevitable.

This challenge is particularly pertinent to the Endangered Archives Programme (EAP), which has hitherto funded in excess of 400 projects in more than 90 countries. Each project is unique and employs its own team of one or more cataloguers based in the particular country where the archival content is digitised. But all this disparately created metadata must be uniform when ingested into the British Library’s cataloguing system and uploaded to eap.bl.uk.

Finding an efficient, low-cost method to process large volumes of metadata generated by hundreds of unique teams is a challenge; one that, in 2019, EAP sought to alleviate using the freely available open source software Open Refine – a power tool for processing data.

This blog highlights some of the ways that we are using Open Refine. It is not an instructional how-to guide (though we are happy to follow-up with more detailed blogs if there is interest), but an introductory overview of some of the Open Refine methods we use to process large volumes of metadata.

Initial metadata capture

Our metadata is initially created by project teams using an Excel spreadsheet template provided by EAP. In the past year we have completely redesigned this template in order to make it as user friendly and controlled as possible.

Screenshot of spreadsheet

But while Excel is perfect for metadata creation, it is not best suited for checking and editing large volumes of data. This is where Open Refine excels (pardon the pun!), so when the final completed spreadsheet is delivered to EAP, we use Open Refine to clean, migrate, validate, and enhance this data.

WorkflowDiagram

Replicating repetitive tasks

Open Refine came to the forefront of our attention after a one-day introductory training session led by Owen Stephens where the key takeaway for EAP was that a sequence of functions performed in Open Refine can be copied and re-used on subsequent datasets.

ScreenshotofOpenRefineSoftware1

This encouraged us to design and create a sequence of processes that can be re-applied every time we receive a new batch of metadata, thus automating large parts of our workflow.

No computer programming skills required

Building this sequence required no computer programming experience (though this can help); just logical thinking, a generous online community willing to share their knowledge and experience, and a willingness to learn Open Refine’s GREL language and generic regular expressions. Some functions can be performed simply by using Open Refine’s built-in menu options. But Open Refine’s capabilities are almost limitless; the more you explore and experiment, the further you can push the boundaries.

Initially, it was hoped that our whole Open Refine sequence could be repeated in one single large batch of operations. However, the complexity of the data and the need for archivist intervention meant that it was more appropriate to divide the process into several steps. Our workflow is divided into 7 stages:

  1. Migration
  2. Dates
  3. Languages and Scripts
  4. Related subjects
  5. Related places and other authorities
  6. Uniform Titles
  7. Digital content validation

Each of these stages performs one or more of four tasks: clean, migrate, validate, and enhance.

Task 1: Clean

The first part of our workflow provides basic data cleaning. Across all columns it trims any white space at the beginning or end of a cell, removes any double spaces, and capitalises the first letter of every cell. In just a few seconds, this tidies the entire dataset.
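
For readers who think more readily in code, the three cleaning steps described above can be sketched in Python. This is only an illustration of the logic, not the Open Refine workflow itself; the function names `clean_cell` and `clean_rows` are made up for this example, and a dataset is assumed to be a simple list of rows keyed by column name.

```python
import re


def clean_cell(value):
    """Trim a cell, collapse repeated spaces, and capitalise its first letter."""
    if value is None:
        return None
    text = value.strip()                # trim white space at the start and end
    text = re.sub(r" {2,}", " ", text)  # replace runs of spaces with a single space
    if text:
        # Capitalise only the first character; leave the rest untouched.
        text = text[0].upper() + text[1:]
    return text


def clean_rows(rows):
    """Apply clean_cell to every string cell; other cell types pass through."""
    return [
        {col: clean_cell(val) if isinstance(val, str) else val
         for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    print(clean_rows([{"Title": "  letters  from   the  archive ", "Pages": 3}]))
```

As in the workflow described here, the point is that the steps are written once and then re-applied to every column of every new dataset.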

Task 1 Example: Trimming white space (menu option)

Trimming whitespace on an individual column is an easy function to perform as Open Refine has a built-in “Common transform” that performs this function.

ScreenshotofOpenRefineSoftware2

Although this is a simple function to perform, we no longer need to repeatedly select this menu option for each column of each dataset we process because this task is now part of the workflow that we simply copy and paste.

Task 1 Example: Capitalising the first letter (using GREL)

Capitalising the first letter of each cell is less straightforward for a new user, as Open Refine has no built-in function for this that can be selected from a menu. Instead it requires a custom “Transform” using Open Refine’s own expression language (GREL).

ScreenshotofOpenRefineSoftware3


Having to write an expression like this should not put off any Open Refine novices. This is an example of Open Refine’s flexibility and many expressions can be found and copied from the Open Refine wiki pages or from blogs like this. The more you copy others, the more you learn, and the easier you will find it to adapt expressions to your own unique requirements.

Moreover, we do not have to repeat this expression again. Just like the trim whitespace transformation, this is also now part of our copy and paste workflow. One click performs both these tasks and more.

Task 2: Migrate

As previously mentioned, the listing template used by the project teams is not the same as the spreadsheet template required for ingest into the British Library’s cataloguing system. But Open Refine helps us convert the listing template to the ingest template. In just one click, it renames, reorders, and restructures the data from the human friendly listing template to the computer friendly ingest template.

Task 2 example: Variant Titles

The ingest spreadsheet has a “Title” column and a single “Additional Titles” column where all other title variations are compiled. It is not practical to expect temporary cataloguers to understand how to use the “Title” and “Additional Titles” columns on the ingest spreadsheet. It is much more effective to provide cataloguers with a listing template that has three prescriptive title columns. This helps them clearly understand what type of titles are required and where they should be put.

SpreadsheetSnapshot

The EAP team then uses Open Refine to move these titles into the appropriate columns (illustrated above). It places one in the main “Title” field and concatenates the other two titles (if they exist) into the “Additional Titles” field. It also creates two new title type columns, which the ingest process requires so that it knows which title is which.

This is just one part of the migration stage of the workflow, which performs several renaming, re-ordering, and concatenation tasks like this to prepare the data for ingest into the British Library’s cataloguing system.
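
The title migration can be pictured with a short Python sketch. Everything specific in it is an assumption made for illustration: the three listing-template column names, the names of the two title type columns, the `|` separator, and the rule that the first filled-in title becomes the main title. The post does not give these details, and the real work is done in Open Refine rather than Python.

```python
# Hypothetical listing-template columns, in order of preference for the main title.
TITLE_COLUMNS = ["Original Title", "Translated Title", "Transliterated Title"]
SEPARATOR = "|"  # hypothetical delimiter for concatenated values


def migrate_titles(row):
    """Move three prescriptive title columns into ingest-style columns."""
    # Keep only the title columns that are actually filled in, in template order.
    present = [(col, row[col].strip()) for col in TITLE_COLUMNS
               if (row.get(col) or "").strip()]
    # Carry every non-title column across unchanged.
    migrated = {col: val for col, val in row.items() if col not in TITLE_COLUMNS}
    main, extra = present[:1], present[1:]
    migrated["Title"] = main[0][1] if main else ""
    migrated["Title Type"] = main[0][0] if main else ""
    # Concatenate any remaining titles, and record which type each one is.
    migrated["Additional Titles"] = SEPARATOR.join(title for _, title in extra)
    migrated["Additional Title Types"] = SEPARATOR.join(col for col, _ in extra)
    return migrated


if __name__ == "__main__":
    sample = {  # invented sample row
        "Reference": "EAP001/1/1",
        "Original Title": "Cartas",
        "Translated Title": "Letters",
        "Transliterated Title": "",
    }
    print(migrate_titles(sample))
```

Keeping the type labels in step with the concatenated titles is what lets a later process tell which title is which.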

Task 3: Validate

While cleaning and preparing the data for migration is important, it is also vital that we check that the data is accurate and reliable. But who has the time, inclination, or eye stamina to read thousands of rows of data in an Excel spreadsheet? What we require is a computational method to validate data. Perhaps the best way of doing this is to write a bespoke computer program. This indeed is something that I am now working on while learning to write computer code using the Python language (look out for a further blog on this later).

In the meantime, though, Open Refine has helped us to validate large volumes of metadata with no programming experience required.

Task 3 Example: Validating metadata-content connections

When we receive the final output from a digitisation project, one of our most important tasks is to ensure that all of the digital content (images, audio and video recordings) correlates with the metadata on the spreadsheet and vice versa.

We begin by running a command line report on the folders containing the digital content. This provides us with a csv file which we can read in Excel. However, the data is not presented in a neat format for comparison purposes.

SpreadsheetSnapshot2

Restructuring data ready for validation comparisons

For this particular task what we want is a simple list of all the digital folder names (not the full directory) and the number of TIFF images each folder contains. Open Refine enables just that, as the next image illustrates.

ScreenshotofOpenRefineSoftware4

Constructing the sequence that restructures this data required careful planning and good familiarity with Open Refine and the GREL expression language. But after the data had been successfully restructured once, we never have to think about how to do this again. As with other parts of the workflow, we now just have to copy and paste the sequence to repeat this transformation on new datasets in the same format.
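
To make the goal of this restructuring concrete, here is a minimal Python sketch of the same idea: turning a list of full file paths into a table of folder names and TIFF counts. It assumes the report can be reduced to one file path per line, which may not match the real report format, and the sample paths are invented.

```python
from collections import Counter
from pathlib import PurePosixPath

TIFF_SUFFIXES = {".tif", ".tiff"}


def count_tiffs_per_folder(file_paths):
    """Return {folder name: number of TIFF files directly inside it}."""
    counts = Counter()
    for raw in file_paths:
        # Normalise Windows-style separators so one path class handles both styles.
        path = PurePosixPath(raw.replace("\\", "/"))
        if path.suffix.lower() in TIFF_SUFFIXES:
            # Keep only the immediate folder name, not the full directory.
            counts[path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    report = [  # invented sample paths
        "D:\\EAP999\\EAP999_1_1\\EAP999_1_1_001.tif",
        "D:\\EAP999\\EAP999_1_1\\EAP999_1_1_002.TIF",
        "D:/EAP999/EAP999_1_2/EAP999_1_2_001.tiff",
        "D:/EAP999/EAP999_1_2/Thumbs.db",
    ]
    print(count_tiffs_per_folder(report))
```

Non-TIFF files are simply ignored, so stray system files do not inflate the counts.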

Cross referencing data for validation

With the data in this neat format, we can now do a number of simple cross referencing checks. We can check that:

  1. Each digital folder has a corresponding row of metadata – if not, this indicates that the metadata is incomplete
  2. Each row of metadata has a corresponding digital folder – if not, this indicates that some digital folders containing images are missing
  3. The actual number of TIFF images in each folder exactly matches the number of images recorded by the cataloguer – if not this may indicate that some images are missing.

For each of these checks we use Open Refine’s cell.cross expression to cross reference the digital folder report with the metadata listing.

In the screenshot below we can see the results of the first validation check. Each digital folder name should match the reference number of a record in the metadata listing. If we find a match it returns that reference number in the “CrossRef” column. If no match is found, that column is left blank. By filtering that column by blanks, we can very quickly identify all of the digital folders that do not have a corresponding row of metadata. In this example, before applying the filter, we can already see that at least one digital folder is missing metadata. An archivist can then investigate why that is and fix the problem.

ScreenshotofOpenRefineSoftware5
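
The three cross-referencing checks listed above amount to comparing two sets of identifiers and then comparing counts. The Python sketch below shows that logic under stated assumptions: the metadata column names `Reference` and `Image Count` are hypothetical, and folder names are assumed to be directly comparable with reference numbers. It is an analogy for what the Open Refine cross-reference achieves, not a description of how Open Refine implements it.

```python
def cross_reference(folder_counts, metadata_rows):
    """Compare a {folder: TIFF count} report against metadata rows.

    metadata_rows is a list of dicts with the hypothetical keys
    'Reference' and 'Image Count'.
    """
    recorded = {row["Reference"]: int(row["Image Count"]) for row in metadata_rows}
    folders = set(folder_counts)
    references = set(recorded)
    return {
        # Check 1: folders with no metadata row (metadata is incomplete).
        "folders_without_metadata": sorted(folders - references),
        # Check 2: metadata rows with no folder (digital content is missing).
        "metadata_without_folder": sorted(references - folders),
        # Check 3: image counts that disagree (some images may be missing).
        "count_mismatches": {
            ref: {"recorded": recorded[ref], "actual": folder_counts[ref]}
            for ref in sorted(folders & references)
            if recorded[ref] != folder_counts[ref]
        },
    }


if __name__ == "__main__":
    report = {"EAP999_1_1": 2, "EAP999_1_2": 1, "EAP999_1_4": 7}  # invented
    listing = [
        {"Reference": "EAP999_1_1", "Image Count": "2"},
        {"Reference": "EAP999_1_2", "Image Count": "3"},
        {"Reference": "EAP999_1_3", "Image Count": "5"},
    ]
    print(cross_reference(report, listing))
```

An empty result for all three checks means the digital content and the metadata agree; anything else is a prompt for an archivist to investigate.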

Task 4: Enhance

We enhance our metadata in a number of ways. For example, we import authority codes for languages and scripts, and we assign subject headings and authority records based on keywords and phrases found in the titles and description columns.

Named Entity Extraction

One of Open Refine’s most dynamic features is its ability to connect to other online databases. Thanks to the generous support of Dandelion API, we are able to use its service to identify entities such as people, places, organisations, and titles of works.

In just a few simple steps, Dandelion API reads our metadata and returns new linked data, which we can filter by category. For example, we can list all of the entities it has extracted and categorised as a place or all the entities categorised as people.

ScreenshotofOpenRefineSoftware6

Not every named entity it finds will be accurate. In the above example “Baptism” is clearly not a place. But it is much easier for an archivist to manually validate a list of 29 phrases identified as places than to read 10,000 scope and content descriptions looking for named entities.

Clustering inconsistencies

If there is inconsistency in the metadata, the returned entities might contain multiple variants. This can be overcome using Open Refine’s clustering feature. This identifies and collates similar phrases and offers the opportunity to merge them into one consistent spelling.

ScreenshotofOpenRefineSoftware7
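
One way to understand clustering is as "key collision": each value is reduced to a normalised key, and values that share a key are offered as candidates for merging. The Python sketch below is a simplified version of that idea, in the spirit of a fingerprint key. It is not Open Refine's own code, the function names are invented, and the sample values are made up.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a phrase to a normalised key that ignores case, accents,
    punctuation, word order and repeated words."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks so that accented and unaccented spellings collide.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values that share a fingerprint; singletons are skipped."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    places = ["São Paulo", "Sao Paulo", "sao paulo.", "Paulo, São", "Lima"]
    print(cluster(places))
```

The code only proposes groups; deciding which spelling should win, and whether the grouped values really are the same thing, remains a human judgement.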

Linked data reconciliation

Having identified and validated a list of entities, we then use other linked data services to help create authority records. For this particular task, we use the Wikidata reconciliation service. Wikidata is a structured data sister project to Wikipedia. And the Open Refine reconciliation service enables us to link an entity in our dataset to its corresponding item in Wikidata, which in turn allows us to pull in additional information from Wikidata relating to that item.

For a South American photograph project we recently catalogued, Dandelion API helped identify 335 people (including actors and performers). By subsequently reconciling these people with their corresponding records in Wikidata, we were able to pull in their job title, date of birth, date of death, unique persistent identifiers, and other details required to create a full authority record for that person.

ScreenshotofOpenRefineSoftware8

Creating individual authority records for 335 people would otherwise take days of work. It is a task that previously we might have deemed infeasible. But Open Refine and Wikidata drastically reduce the human effort required.

Summary

In many ways, that is the key benefit. By placing Open Refine at the heart of our workflow for processing metadata, it now takes us less time to do more. Our workflow is not perfect. We are constantly finding new ways to improve it. But we now have a semi-automated method for processing large volumes of metadata.

This blog puts just some of those methods in the spotlight. In the interest of brevity, we refrained from providing step-by-step detail. But if there is interest, we will be happy to write further blogs to help others use this as a starting point for their own metadata processing workflows.

20 April 2020

BL Labs Research Award Winner 2019 - Tim Crawford - F-Tempo

Posted on behalf of Tim Crawford, Professorial Research Fellow in Computational Musicology at Goldsmiths, University of London and BL Labs Research Award winner for 2019 by Mahendra Mahey, Manager of BL Labs.

Introducing F-TEMPO

Early music printing

Music printing, introduced in the later 15th century, enabled the dissemination of the greatest music of the age, which until that time had been the exclusive preserve of royal and aristocratic courts or the Church. A vast repertory of all kinds of music is preserved in these prints, and they became the main conduit for the spread of the reputation and influence of the great composers of the Renaissance and early Baroque periods, such as Josquin, Lassus, Palestrina, Marenzio and Monteverdi. As this music became accessible to the increasingly well-heeled merchant classes, entirely new cultural networks of taste and transmission became established and can be traced in the patterns of survival of these printed sources.

Music historians have tended to neglect the analysis of these patterns in favour of a focus on a canon of ‘great works’ by ‘great composers’, with the consequence that there is a large sub-repertory of music that has not been seriously investigated or published in modern editions. By including this ‘hidden’ musical corpus, we could explore for the first time, for example, the networks of influence, distribution and fashion, and the effects on these of political, religious and social change over time.

Online resources of music and how to read them

Vast amounts of music, mostly audio tracks, are now available using services such as Spotify, iTunes or YouTube. Music is also available online in great quantity in the form of PDF files rendering page-images of either original musical documents or modern, computer-generated music notation. These are a surrogate for paper-based books used in traditional musicology, but offer few advantages beyond convenience. What they don’t allow is full-text search, unlike the text-based online materials which are increasingly the subject of ‘distant reading’ in the digital humanities.

With good score images, Optical Music Recognition (OMR) programs can sometimes produce useful scores from printed music of simple texture; however, in general, OMR output contains errors due to misrecognised symbols. The results often amount to musical gibberish, severely limiting the usefulness of OMR for creating large digital score collections. Our OMR program is Aruspix, which is highly reliable on good images, even when they have been digitised from microfilm.

Here is a screen-shot from Aruspix, showing part of the original page-image at the top, and the program’s best effort at recognising the 16th-century music notation below. It is not hard to see that, although the program does a pretty good job on the whole, there are not a few recognition errors. The program includes a graphical interface for correcting these, but we don’t make use of that for F-TEMPO for reasons of time – even a few seconds of correction per image would slow the whole process catastrophically.

The Aruspix user-interface

Finding what we want – error-tolerant encoding

Although OMR is far from perfect, online users are generally happy to use computer methods on large collections containing noise; this is the principle behind the searches in Google Books, which are based on Optical Character Recognition (OCR).

For F-TEMPO, from the output of the Aruspix OMR program, for each page of music, we extract a ‘string’ representing the pitch-name and octave for the sequence of notes. Since certain errors (especially wrong or missing clefs or accidentals) affect all subsequent notes, we encode the intervals between notes rather than the notes themselves, so that we can match transposed versions of the sequences or parts of them. We then use a simple alphabetic code to represent the intervals in the computer.

Here is an example of a few notes from a popular French chanson, showing our encoding method.

A few notes from a Crequillon chanson, and our encoding of the intervals
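
The idea of encoding intervals rather than notes can be sketched in a few lines of Python. The alphabet used here is hypothetical ("-" for a repeated note, upper-case letters for steps up, lower-case letters for steps down) and is not necessarily the code F-TEMPO uses; note names are assumed to be a letter followed by an octave number, and accidentals are not modelled.

```python
STEP_NAMES = "CDEFGAB"


def diatonic_index(note):
    """Convert a note such as 'G4' into a diatonic step number."""
    name, octave = note[0].upper(), int(note[1:])
    return octave * 7 + STEP_NAMES.index(name)


def encode_interval(steps):
    """Map a signed diatonic interval to one character (hypothetical alphabet)."""
    if steps == 0:
        return "-"
    if abs(steps) > 26:
        raise ValueError("interval too large for this simple alphabet")
    letter = chr(ord("a") + abs(steps) - 1)
    return letter.upper() if steps > 0 else letter


def encode_melody(notes):
    """Encode the intervals between consecutive notes as a string."""
    indices = [diatonic_index(note) for note in notes]
    return "".join(encode_interval(b - a) for a, b in zip(indices, indices[1:]))


if __name__ == "__main__":
    print(encode_melody(["G4", "G4", "A4", "B4", "G4"]))
    print(encode_melody(["C5", "C5", "D5", "E5", "C5"]))  # same tune, transposed
```

Both calls produce the same string, which is the property that makes the encoding tolerant of a misread clef: the absolute pitches shift, but the sequence of intervals does not.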

F-TEMPO in action

F-TEMPO uses state-of-the-art, scalable retrieval methods to search almost 60,000 page-images for those similar to a query page in less than a second. It successfully recovers matches when the query page is not complete, e.g. when page-breaks are different. Also, close non-identical matches, as between voice-parts of a polyphonic work in imitative style, are highly ranked in results; similarly, different works based on the same musical content are usually well-matched.
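
This post does not say which retrieval method F-TEMPO uses, so the Python sketch below is only an illustration of why partial and near-identical pages can still be matched. It compares pages by the overlap (Jaccard similarity) of short substrings, or n-grams, of their interval codes. The technique, the function names and the sample codes are all assumptions, and a simple linear scan like this would not scale the way the real system does.

```python
def ngrams(code, n=4):
    """Return the set of all length-n substrings of an interval code."""
    return {code[i:i + n] for i in range(len(code) - n + 1)}


def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_pages(query_code, page_codes, n=4):
    """Rank pages by n-gram overlap with the query, best match first."""
    query = ngrams(query_code, n)
    scored = [(jaccard(query, ngrams(code, n)), page_id)
              for page_id, code in page_codes.items()]
    # Sort by descending score, then by page id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(page_id, score) for score, page_id in scored]


if __name__ == "__main__":
    pages = {  # invented interval codes
        "page_a": "-AAbCa-AAb",
        "page_b": "AAbCa-A",    # a fragment of page_a
        "page_c": "ddDDccCC",   # unrelated music
    }
    for page_id, score in rank_pages("-AAbCa-AAb", pages):
        print(page_id, round(score, 3))
```

Because the comparison is made on many small fragments rather than on the page as a whole, an incomplete page still shares most of its fragments with the full version and is ranked well above unrelated material.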

Here is a screen-shot from the demo interface to F-TEMPO. The ‘query’ image is on the left, and searches are done by hitting the ‘Enter’ or ‘Return’ key in the normal way. The list of results appears in the middle column, with the best match (usually the query page itself) highlighted and displayed on the right. As other results are selected, their images are displayed on the right. Users can upload their own images of 16th-century music that might be in the collection to serve as queries; we have found that even photos taken with a mobile phone work well. However, don’t expect coherent results if you upload other kinds of image!

F-Tempo-User Interface

The F-TEMPO web-site can be found at: http://f-tempo.org

Click on the ‘Demo’ button to try out the program for yourself.

What more can we do with F-TEMPO?

Using the full-text search methods enabled by F-TEMPO’s API we might begin to ask intriguing questions, such as:

  • ‘How did certain pieces of music spread and become established favourites throughout Europe during the 16th century?’
  • ‘How well is the relative popularity of such early-modern favourites reflected in modern recordings since the 1950s?’
  • ‘How many unrecognised arrangements are there in the 16th-century repertory?’

In early testing we identified an instrumental ricercar as a wordless transcription of a Latin motet, hitherto unknown to musicology. As the collection grows, we are finding more such unexpected concordances, and can sometimes identify the composers of works labelled in some printed sources as by ‘Incertus’ (Uncertain). We have also uncovered some conflicting attributions which could provoke interesting scholarly discussion.

Early Music Online and F-TEMPO

From the outset, this project has been based on the Early Music Online (EMO) collection, the result of a 2011 JISC-funded Rapid Digitisation project between the British Library and Royal Holloway, University of London. This digitised about 300 books of early printed music at the BL from archival microfilms, producing black-and-white images which have served as an excellent proof of concept for the development of F-TEMPO. The c.200 books judged suitable for our early methods in EMO contain about 32,000 pages of music, and form the basis for our resource.

The current version of F-TEMPO includes just under 30,000 more pages of early printed music from the Polish National Library, Warsaw, as well as a few thousand from the Bibliothèque nationale, Paris. We will be incorporating no fewer than a further half-a-million pages from the Bavarian State Library collection in Munich as soon as we have run them through our automatic indexing system.

 (This work was funded for the past year by the JISC / British Academy Digital Humanities Research in the Humanities scheme. Thanks are due to David Lewis, Golnaz Badkobeh and Ryaan Ahmed for technical help and their many suggestions.)