BL Labs Research Award Winner 2019 - Tim Crawford - F-Tempo
Posted on behalf of Tim Crawford, Professorial Research Fellow in Computational Musicology at Goldsmiths, University of London and BL Labs Research Award winner for 2019 by Mahendra Mahey, Manager of BL Labs.
Early music printing
Music printing, introduced in the later 15th century, enabled the dissemination of the greatest music of the age, which until that time was the exclusive preserve of royal and aristocratic courts or the Church. A vast repertory of all kinds of music is preserved in these prints, and they became the main conduit for the spread of the reputation and influence of the great composers of the Renaissance and early Baroque periods, such as Josquin, Lassus, Palestrina, Marenzio and Monteverdi. As this music became accessible to the increasingly well-heeled merchant classes, entirely new cultural networks of taste and transmission became established and can be traced in the patterns of survival of these printed sources.
Music historians have tended to neglect the analysis of these patterns in favour of a focus on a canon of ‘great works’ by ‘great composers’, with the consequence that there is a large sub-repertory of music that has not been seriously investigated or published in modern editions. By including this ‘hidden’ musical corpus, we could explore for the first time, for example, the networks of influence, distribution and fashion, and the effects on these of political, religious and social change over time.
Online resources of music and how to read them
Vast amounts of music, mostly audio tracks, are now available using services such as Spotify, iTunes or YouTube. Music is also available online in great quantity in the form of PDF files rendering page-images of either original musical documents or modern, computer-generated music notation. These are a surrogate for paper-based books used in traditional musicology, but offer few advantages beyond convenience. What they don’t allow is full-text search, unlike the text-based online materials which are increasingly the subject of ‘distant reading’ in the digital humanities.
With good score images, Optical Music Recognition (OMR) programs can sometimes produce useful scores from printed music of simple texture; however, in general, OMR output contains errors due to misrecognised symbols. The results often amount to musical gibberish, severely limiting the usefulness of OMR for creating large digital score collections. Our OMR program is Aruspix, which is highly reliable on good images, even when they have been digitised from microfilm.
Here is a screen-shot from Aruspix, showing part of the original page-image at the top, and the program’s best effort at recognising the 16th-century music notation below. It is not hard to see that, although the program does a pretty good job on the whole, there are not a few recognition errors. The program includes a graphical interface for correcting these, but we don’t make use of that for F-TEMPO for reasons of time – even a few seconds of correction per image would slow the whole process catastrophically.
Finding what we want – error-tolerant encoding
Although OMR is far from perfect, online users are generally happy to use computer methods on large collections containing noise; this is the principle behind the searches in Google Books, which are based on Optical Character Recognition (OCR).
For F-TEMPO, from the output of the Aruspix OMR program, for each page of music, we extract a ‘string’ representing the pitch-name and octave for the sequence of notes. Since certain errors (especially wrong or missing clefs or accidentals) affect all subsequent notes, we encode the intervals between notes rather than the notes themselves, so that we can match transposed versions of the sequences or parts of them. We then use a simple alphabetic code to represent the intervals in the computer.
Here is an example of a few notes from a popular French chanson, showing our encoding method.
F-TEMPO in action
F-TEMPO uses state-of-the-art, scalable retrieval methods, providing rapid searches of almost 60,000 page-images for those similar to a query-page in less than a second. It successfully recovers matches when the query page is not complete, e.g. when page-breaks are different. Also, close non-identical matches, as between voice-parts of a polyphonic work in imitative style, are highly ranked in results; similarly, different works based on the same musical content are usually well-matched.
Here is a screen-shot from the demo interface to F-TEMPO. The ‘query’ image is on the left, and searches are done by hitting the ‘Enter’ or ‘Return’ key in the normal way. The list of results appears in the middle column, with the best match (usually the query page itself) highlighted and displayed on the right. As other results are selected, their images are displayed on the right. Users can upload their own images of 16th-century music that might be in the collection to serve as queries; we have found that even photos taken with a mobile phone work well. However, don’t expect coherent results if you upload other kinds of image!
The F-TEMPO web-site can be found at: http://f-tempo.org
Click on the ‘Demo’ button to try out the program for yourself.
What more can we do with F-TEMPO?
Using the full-text search methods enabled by F-TEMPO’s API we might begin to ask intriguing questions, such as:
- ‘How did certain pieces of music spread and become established favourites throughout Europe during the 16th century?’
- ‘How well is the relative popularity of such early-modern favourites reflected in modern recordings since the 1950s?’
- ‘How many unrecognised arrangements are there in the 16th-century repertory?’
In early testing we identified an instrumental ricercar as a wordless transcription of a Latin motet, hitherto unknown to musicology. As the collection grows, we are finding more such unexpected concordances, and can sometimes identify the composers of works labelled in some printed sources as by ‘Incertus’ (Uncertain). We have also uncovered some interesting conflicting attributions which could provoke interesting scholarly discussion.
Early Music Online and F-TEMPO
From the outset, this project has been based on the Early Music Online (EMO) collection, the result of a 2011 JISC-funded Rapid Digitisation project between the British Library and Royal Holloway, University of London. This digitised about 300 books of early printed music at the BL from archival microfilms, producing black-and-white images which have served as an excellent proof of concept for the development of F-TEMPO. The c.200 books judged suitable for our early methods in EMO contain about 32,000 pages of music, and form the basis for our resource.
The current version of F-TEMPO includes just under 30,000 more pages of early printed music from the Polish National Library, Warsaw, as well as a few thousand from the Bibliothèque nationale, Paris. We will soon be incorporating no fewer than a further half-a-million pages from the Bavarian State Library collection in Munich, as soon as we have run them through our automatic indexing system.
(This work was funded for the past year by the JISC / British Academy Digital Humanities Research in the Humanities scheme. Thanks are due to David Lewis, Golnaz Badkobeh and Ryaan Ahmed for technical help and their many suggestions.)