THE BRITISH LIBRARY

Digital scholarship blog


14 July 2020

Legacies of Catalogue Descriptions and Curatorial Voice: Training Sessions


This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the University of Sussex.

This month the team behind "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" ran two training sessions as part of our Arts and Humanities Research Council funded project. Each standalone session provided instruction in using the software tool AntConc and approaches from computational linguistics for the purposes of examining catalogue data. The objectives of the sessions were twofold: to test our in-development training materials, and to seek feedback from the community in order to better understand their needs and to develop our training offer.

Rather than host open public training, we decided to foster existing partnerships by inviting a small number of individuals drawn from attendees at events hosted as part of our previous Curatorial Voice project (funded by the British Academy). In total thirteen individuals from the UK and US took part across the two sessions, with representatives from libraries, archives, museums, and galleries.

Screenshot of the website for the lesson entitled Computational Analysis of Catalogue Data

Screenshot of the content page and timetable for the lesson
Carpentries-style lesson about analysing catalogue data in AntConc


The training was delivered in the style of a Software Carpentry workshop, drawing on their wonderful lesson template, pedagogical principles, and rapid response to moving coding and data science instruction online in light of the Covid-19 crisis (see ‘Recommendations for Teaching Carpentries Workshops Online’ and ‘Tips for Teaching Online from The Carpentries Community’). In terms of content, we started with the basics: how to get data into AntConc, the layout of AntConc, and settings in AntConc. After that we worked through two substantial modules. The first focused on how to generate, interact with, and interpret a word list, and this was followed by a module on searching, adapting, and reading concordances. The tasks and content of both modules avoided generic software instruction and instead focused on the analysis of free text catalogue fields, with attendees asked to consider what they might infer about a catalogue from its use of tense, what a high volume of capitalised words might tell us about cataloguing style, and how adverb use might be a useful proxy for the presence of controlled vocabulary.
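AntConc itself is a GUI tool, but the kinds of measures the modules discuss can be sketched in a few lines of Python. This is a rough illustration only (not part of the training materials), and the sample descriptions are invented:

```python
import re
from collections import Counter

def word_list(text):
    """Build a case-folded word frequency list, of the kind AntConc generates."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return Counter(t.lower() for t in tokens)

def capitalised_ratio(text):
    """Proportion of tokens beginning with a capital letter -
    a rough proxy for the cataloguing style question discussed above."""
    tokens = re.findall(r"[A-Za-z']+", text)
    caps = sum(1 for t in tokens if t[0].isupper())
    return caps / len(tokens) if tokens else 0.0

# Invented free-text catalogue descriptions, for illustration only
descriptions = [
    "A satire on the Prime Minister. Engraving.",
    "View of the Thames at Westminster. Etching.",
]
corpus = " ".join(descriptions)
print(word_list(corpus).most_common(3))
print(capitalised_ratio(corpus))
```

A high `capitalised_ratio` on a real field might, as the session tasks suggest, point to heavy use of proper names or title-case conventions in a particular cataloguer's style.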

Screenshot of three tasks and solutions in the Searching Concordances section
Tasks in the Searching Concordances section

Running Carpentries-style training over Zoom was new to me, and was - frankly - very odd. During live coding I missed hearing the clack of keyboards as people followed along in response. I missed seeing the sticky notes go up as people completed the task at hand. During exercises I missed hearing the hubbub that accompanies pair programming. And more generally, without seeing the micro-gestures of concentration, relief, frustration, and joy on the faces of learners, I felt somehow isolated as an instructor from the process of learning.

But from the feedback we received the attendees appear to have been happy. It seems we got the pace right (we assumed teaching online would be slower than face-to-face, and it was). The attendees enjoyed using AntConc and were surprised, to quote one attendee, "to see just how quickly you could draw some conclusions". The breakout rooms we used for exercises were a hit. And importantly we have a clear steer on next steps: that we should pivot to a dataset that better reflects the diversity of catalogue data (for this exercise we used a catalogue of printed images that I know very well), that learners would benefit from having a list of suggested readings and resources on corpus linguistics, and that we might - to quote one attendee - provide "more examples up front of the kinds of finished research that has leveraged this style of analysis".

These comments and more will feed into the development of our training materials, which we hope to complete by the end of 2020 and which - in line with the open values of the project - is happening in public. In the meantime, the materials are there for the community to use, adapt and build on (more or less) as they wish. Should you take a look and have any thoughts on what we might change or include for the final version, we always appreciate an email or a note on our issue tracker.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library that is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

06 July 2020

Archivists, Stop Wasting Your Ref-ing Time!


“I didn’t get where I am today by manually creating individual catalogue references for thousands of archival records!”

One of the most laborious yet necessary tasks of an archivist is the generation of catalogue references. This was once the bane of my life. But I now have a technological solution, which anyone can download and use for free.

Animated image showing Reference Generator being abbreviated to ReG

Meet ReG: the newest team member of the Endangered Archives Programme (EAP). He’s not as entertaining as Reginald D Hunter. She’s not as lyrical as Regina Spektor. But like 1970s sitcom character Reggie Perrin, ReG provides a logical solution to the daily grind of office life - though less extreme and hopefully more successful.

 

Two pictures of musicians, Reginald Hunter and Regina Spektor
Reginald D Hunter (left),  [Image originally posted by Pete Ashton at https://flickr.com/photos/51035602859@N01/187673692]; Regina Spektor (right), [Image originally posted by Beny Shlevich at https://www.flickr.com/photos/17088109@N00/417238523]

 

Reggie Perrin’s boss CJ was famed for his “I didn’t get where I am today” catchphrase, and as EAP’s resident GJ, I decided to employ my own ReG, without whom I wouldn’t be where I am today. Rather than writing this blog, my eyes would be drowning in metadata, my mind gathering dust, and my ears fleeing from the sound of colleagues and collaborators banging on my door, demanding to know why I’m so far behind in my work.

 

Image of two men at their offices from British sitcom The Rise and Fall of Reginald Perrin
CJ (left) [http://www.leonardrossiter.com/reginaldperrin/12044.jpg] and Reginald Perrin (right) [https://www.imdb.com/title/tt0073990/mediaviewer/rm1649999872] from The Rise and Fall of Reginald Perrin.

 

The problem

EAP metadata is created in spreadsheets by digitisation teams all over the world. It is then processed by the EAP team in London and ingested into the British Library’s cataloguing system.

When I joined EAP in 2018 one of the first projects to process was the Barbados Mercury and Bridgetown Gazette. It took days to create all of the catalogue references for this large newspaper collection, which spans more than 60 years.

Microsoft Excel’s fill down feature helped automate part of this task, but repeating this for thousands of rows is time-consuming and error-prone.

Animated image displaying the autofill procedure being carried out

I needed to find a solution to this.

During 2019 I established new workflows to semi-automate several aspects of the cataloguing process using OpenRefine - but OpenRefine is primarily a data cleaning tool, and its difficulty in understanding hierarchical relationships meant that it was not suitable for this task.

 

Learning to code

For some time I toyed with the idea of learning to write computer code using the Python programming language. I dabbled with free online tutorials. But it was tough to make practical sense of these generic tutorials, hard to find time, and my motivation dwindled.

When the British Library teamed up with The National Archives and Birkbeck University of London to launch a PG Cert in Computing for Information Professionals, I jumped at the chance to take part in the trial run.

It was a leap certainly worth taking because I now have the skills to write code for the purpose of transforming and analysing large volumes of data. And the first product of this new skillset is a computer program that accurately generates catalogue references for thousands of rows of data in mere seconds.

 

The solution - ReG in action

By coincidence, one of the first projects I needed to catalogue after creating this program was another Caribbean newspaper digitised by the same team at the Barbados Archives Department: The Barbadian.

This collection was a similar size and structure to the Barbados Mercury, but the generation of all the catalogue references took just a few seconds. All I needed to do was:

  • Open ReG
  • Enter the project ID for the collection (reference prefix)
  • Enter the filename of the spreadsheet containing the metadata

Animated image showing ReG working to file references

And Bingo! All my references were generated in a new file.

Before and After image explaining 'In just a few seconds, the following transformation took place in the 'Reference' column' showing the new reference names

 

How it works in a nutshell

The basic principle of the program is that it reads a single column in the dataset, which contains the hierarchical information. In the example above, it read the “Level” column.

It then uses this information to calculate the structured numbering of the catalogue references, which it populates in the “Reference” column.

 

Reference format

The generated references conform to the following format:

  • Each reference begins with a prefix that is common to the whole dataset. This is the prefix that the user enters at the start of the program. In the example above, that is “EAP1251”.
  • Forward slashes ( / ) are used to indicate a new hierarchical level.
  • Each record is assigned its own number relative to its sibling records, and that number is shared with all of the children of that record.

 

In the example above, the reference for the first collection is formatted:

Image showing how the reference works: 'EAP1251/1' is the first series

The reference for the first series of the first collection is formatted:

Image showing how the reference works: 'EAP1251/1/1' is the first series of the first collection

The reference for the second series of the first collection is:

Image showing how the reference works: 'EAP1251/1/2' is the second series of the first collection

No matter how complex the hierarchical structure of the dataset, the program will quickly and accurately generate references for every record in accordance with this format.
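The numbering logic described above can be sketched as follows. This is a minimal illustration of the principle, not ReG's actual source code; the function and variable names are my own:

```python
def generate_references(levels, hierarchy, prefix):
    """Generate hierarchical catalogue references from a level column.

    levels    -- the 'Level' column, one term per row
    hierarchy -- the ordered hierarchical terms, broadest first
    prefix    -- the common project prefix, e.g. 'EAP1251'
    """
    counters = [0] * len(hierarchy)
    refs = []
    for level in levels:
        depth = hierarchy.index(level)        # raises ValueError on unexpected terms
        counters[depth] += 1                  # next sibling number at this depth
        for d in range(depth + 1, len(hierarchy)):
            counters[d] = 0                   # reset counters for deeper levels
        refs.append("/".join([prefix] + [str(c) for c in counters[: depth + 1]]))
    return refs

levels = ["Collection", "Series", "File", "File", "Series", "File"]
print(generate_references(levels, ["Collection", "Series", "File"], "EAP1251"))
# ['EAP1251/1', 'EAP1251/1/1', 'EAP1251/1/1/1', 'EAP1251/1/1/2',
#  'EAP1251/1/2', 'EAP1251/1/2/1']
```

Each record's number is shared with its children because the counters for all shallower levels are carried forward into every deeper reference.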

 

Download for wider re-use

While ReG was designed primarily for use by EAP, it should work for anyone who generates reference numbers using the same format.

For users of the Calm cataloguing software, ReG could be used to complete the “RefNo” column, which determines the tree structure of a collection when a spreadsheet is ingested into Calm.

With wider re-use in mind, some settings can be configured to suit individual requirements.

For example, you can configure the names of the columns that ReG reads and generates references in. For EAP, the reference generation column is named “Reference”, but for Calm users, it could be configured as “RefNo”.

Users can also configure their own hierarchy. You have complete freedom to set the hierarchical terms applicable to your institution and complete freedom to set the hierarchical order of those terms.

It is possible that some minor EAP idiosyncrasies might preclude reuse of this program for some users. If this is the case, by all means get in touch; perhaps I can tweak the code to make it more applicable to users beyond EAP - though some tweaks may be more feasible than others.

 

Additional validation features

While generating references is ReG's core function, it also includes several validation features to help you spot and correct problems with your data.

Unexpected item in the hierarchy area

For catalogue references to be calculated, all the data in the level column must match a term within the configured hierarchy. The program therefore checks this; if a discrepancy is found, the user is notified and given two options to proceed.

Option 1: Rename unexpected terms

First, users have the option to rename any unexpected terms. This is useful for correcting typographical errors, such as this example - where “Files” should be “File”.

Animated image showing option 1: renaming unexpected 'files' to 'file'

Before and after image showing the change of 'files' to 'file'

Option 2: Build a one-off hierarchy

Alternatively, users can create a one-off hierarchy that matches the terms in the dataset. In the following example, the unexpected hierarchical term “Specimen” is a bona fide term. It is just not part of the configured hierarchy.

Rather than force the user to quit the program and amend the configuration file, they can simply establish a new, one-off hierarchy within the program.

Animated image showing option 2: adding 'specimen' to the hierarchy under 'file'

This hierarchy will not be saved for future instances. It is just used for this one-off occasion. If the user wants “Specimen” to be recognised in the future, the configuration file will also need to be updated.
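Both options boil down to simple operations on the level column, as the sketch below illustrates. This is hypothetical code written for this post, not ReG's implementation:

```python
def unexpected_terms(levels, hierarchy):
    """Find terms in the level column that are not in the configured hierarchy."""
    return sorted(set(levels) - set(hierarchy))

def rename_terms(levels, renames):
    """Option 1: correct typographical errors, e.g. {'Files': 'File'}."""
    return [renames.get(term, term) for term in levels]

levels = ["Collection", "Series", "Files", "File"]
print(unexpected_terms(levels, ["Collection", "Series", "File"]))  # ['Files']
print(rename_terms(levels, {"Files": "File"}))
# ['Collection', 'Series', 'File', 'File']

# Option 2 amounts to generating references against a one-off hierarchy list,
# e.g. ["Collection", "Series", "File", "Specimen"], without touching the
# saved configuration file.
```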

 

Single child records

To avoid redundant information, it is sometimes advisable for an archivist to eliminate single child records from a collection. ReG will identify any such records, notify the user, and give them three options to proceed:

  1. Delete single child records
  2. Delete the parents of single child records
  3. Keep the single child records and/or their parents

Depending on how the user chooses to proceed, ReG will produce one of three results, which affects the rows that remain and the structure of the generated references.

In this example, the third series in the original dataset contains a single child - a single file.

Image showing the three possible outcomes to a single child record: A. delete child so it appears just as a series, B. delete parent so it appears just as a file, and C. keep the child record and their parents so it appears as a series followed by a single file

The most notable result is option B, where the parent was deleted. Looking at the “Level” column, the single child now appears to be a sibling of the files from the second series. But the reference number indicates that this file is part of a different branch within the tree structure.

This is more clearly illustrated by the following tree diagrams.

Image showing a tree hierarchy of the three possible outcomes for a single child record: A. a childless series, B. a file at the same level as other series, C. a series with a single child file

This functionality means that ReG will help you spot any single child records that you may otherwise have been unaware of.

But it also gives you a means of creating an appropriate hierarchical structure when cataloguing in a spreadsheet. If you intentionally insert dummy parents for single child records, ReG can generate references that map the appropriate tree structure and then remove the dummy parent records in one seamless process.
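Detecting single children can be sketched from the generated references alone, since a parent with exactly one child produces exactly one reference one level below it. Again, this is an illustrative fragment rather than ReG's own code:

```python
from collections import Counter

def single_children(refs):
    """Return references whose parent record has exactly one child."""
    parents = [r.rsplit("/", 1)[0] for r in refs]
    counts = Counter(parents)
    return [r for r, p in zip(refs, parents) if p in refs and counts[p] == 1]

# The third series contains a single child - a single file
refs = ["EAP1251/1", "EAP1251/1/1", "EAP1251/1/2", "EAP1251/1/3", "EAP1251/1/3/1"]
print(single_children(refs))  # ['EAP1251/1/3/1']
```

Once flagged, the record (or its parent) can be dropped or kept, corresponding to the three options above.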

 

And finally ...

If you’ve got this far, you probably recognise the problem and have at least a passing interest in finding a solution. If so, please feel free to download the software, give it a go, and get in touch.

If you spot any problems, or have any suggested enhancements, I would welcome your input. You certainly won’t be wasting my time - and you might just save some of yours.

 

Download links

For making this possible, I am particularly thankful to Jody Butterworth, Sam van Schaik, Nora McGregor, Stelios Sotiriadis, and Peter Wood.

This blog post is by Dr Graham Jevon, Endangered Archives Programme cataloguer. He is on twitter as @GJHistory.

15 June 2020

Marginal Voices in UK Digital Comics


I am an AHRC Collaborative Doctoral Partnership student based at the British Library and Central Saint Martins, University of the Arts London (UAL). The studentship is funded by the Arts and Humanities Research Council’s Collaborative Doctoral Partnership Programme.

Supervised jointly by Stella Wisdom from the British Library, Roger Sabin and Ian Hague from UAL, my research looks to explore the potential for digital comics to take advantage of digital technologies and the digital environment to foster inclusivity and diversity. I aim to examine the status of marginal voices within UK digital comics, while addressing the opportunities and challenges these comics present for the British Library’s collection and preservation policies.

A cartoon strip of three vertical panel images, in the first a caravan is on the edge of a cliff, in the second a dog asleep in a bed, in the third the dog wakes up and sits up in bed
The opening panels from G Bear and Jammo by Jaime Huxtable, showing their caravan on The Gower Peninsula in South Wales, copyright © Jaime Huxtable

Digital comics have been identified as complex digital publications, meaning this research project is connected to the work of the broader Emerging Formats Project. On top of embracing technological change, digital comics have the potential to reflect, embrace and contribute to social and cultural change in the UK. Digital comics open up not only new ways of telling stories, but new possibilities for whose stories are told.

One of the comic creators whose work I have recently been examining is Jaime Huxtable, a Welsh cartoonist/illustrator based in Worthing, West Sussex. He has worked on a variety of digital comics projects, from webcomics to interactive comics, and also runs various comics-related workshops.

Samir's Christmas by Jaime Huxtable, this promotional comic strip was created for Freedom From Torture’s 2019 Christmas Care Box Appeal. This comic was  made into a short animated video by Hands Up, copyright © Jaime Huxtable

My thesis will explore whether the ways UK digital comics are published and consumed mean that they can foreground marginal, alternative voices in a similar way to underground comix and zine culture. Comics scholarship has focused on the technological aspects of digital comics, meaning their potentially significant contribution to reflecting and embracing social and cultural change in the UK has not been explored. I want to establish whether the fact that digital comics can circumvent traditional gatekeepers means they provide space to foreground marginal voices. I will also explore the challenges and opportunities digital comics might present for legal deposit collection development policy.

As well as being a member of the Comics Research Hub (CoRH) at UAL, I have already begun working with colleagues from the UK Web Archive, and hope to be able to make a significant contribution to the Web Comic Archive. Issues around collection development and management are central to my research, and I feel very fortunate to be based at the British Library, with the chance to learn from and hopefully contribute to practice here.

If anyone would like to know more about my research, or recommend any digital comics for me to look at, please do contact me at Tom.Gebhart@bl.uk or @thmsgbhrt on Twitter. UK digital comic creators and publishers can use the ComicHaus app to send their digital comics directly to The British Library digital archive. More details about this process are here.

This post is by British Library collaborative doctoral student Thomas Gebhart (@thmsgbhrt).

20 April 2020

BL Labs Research Award Winner 2019 - Tim Crawford - F-Tempo


Posted on behalf of Tim Crawford, Professorial Research Fellow in Computational Musicology at Goldsmiths, University of London and BL Labs Research Award winner for 2019 by Mahendra Mahey, Manager of BL Labs.

Introducing F-TEMPO

Early music printing

Music printing, introduced in the later 15th century, enabled the dissemination of the greatest music of the age, which until that time was the exclusive preserve of royal and aristocratic courts or the Church. A vast repertory of all kinds of music is preserved in these prints, and they became the main conduit for the spread of the reputation and influence of the great composers of the Renaissance and early Baroque periods, such as Josquin, Lassus, Palestrina, Marenzio and Monteverdi. As this music became accessible to the increasingly well-heeled merchant classes, entirely new cultural networks of taste and transmission became established and can be traced in the patterns of survival of these printed sources.

Music historians have tended to neglect the analysis of these patterns in favour of a focus on a canon of ‘great works’ by ‘great composers’, with the consequence that there is a large sub-repertory of music that has not been seriously investigated or published in modern editions. By including this ‘hidden’ musical corpus, we could explore for the first time, for example, the networks of influence, distribution and fashion, and the effects on these of political, religious and social change over time.

Online resources of music and how to read them

Vast amounts of music, mostly audio tracks, are now available using services such as Spotify, iTunes or YouTube. Music is also available online in great quantity in the form of PDF files rendering page-images of either original musical documents or modern, computer-generated music notation. These are a surrogate for paper-based books used in traditional musicology, but offer few advantages beyond convenience. What they don’t allow is full-text search, unlike the text-based online materials which are increasingly the subject of ‘distant reading’ in the digital humanities.

With good score images, Optical Music Recognition (OMR) programs can sometimes produce useful scores from printed music of simple texture; however, in general, OMR output contains errors due to misrecognised symbols. The results often amount to musical gibberish, severely limiting the usefulness of OMR for creating large digital score collections. Our OMR program is Aruspix, which is highly reliable on good images, even when they have been digitised from microfilm.

Here is a screen-shot from Aruspix, showing part of the original page-image at the top, and the program’s best effort at recognising the 16th-century music notation below. It is not hard to see that, although the program does a pretty good job on the whole, there are not a few recognition errors. The program includes a graphical interface for correcting these, but we don’t make use of that for F-TEMPO for reasons of time – even a few seconds of correction per image would slow the whole process catastrophically.

The Aruspix user-interface

 

 

Finding what we want – error-tolerant encoding

Although OMR is far from perfect, online users are generally happy to use computer methods on large collections containing noise; this is the principle behind the searches in Google Books, which are based on Optical Character Recognition (OCR).

For F-TEMPO, from the output of the Aruspix OMR program, for each page of music, we extract a ‘string’ representing the pitch-name and octave for the sequence of notes. Since certain errors (especially wrong or missing clefs or accidentals) affect all subsequent notes, we encode the intervals between notes rather than the notes themselves, so that we can match transposed versions of the sequences or parts of them. We then use a simple alphabetic code to represent the intervals in the computer.
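In outline, the interval encoding can be sketched like this. The interval-to-letter alphabet below is invented for illustration; it is not F-TEMPO's actual code:

```python
def encode_intervals(midi_pitches):
    """Encode a melody as one letter per interval between successive notes,
    so the resulting string is transposition-invariant.
    Illustrative scheme: 'M' = repeated note, letters above 'M' ascend,
    letters below 'M' descend, one letter per semitone.
    """
    base = ord("M")  # a repeated note maps to 'M'
    return "".join(chr(base + (cur - prev))
                   for prev, cur in zip(midi_pitches, midi_pitches[1:]))

# The same tune at two transpositions yields an identical code string:
print(encode_intervals([60, 62, 64, 62, 60]))  # C D E D C -> 'OOKK'
print(encode_intervals([65, 67, 69, 67, 65]))  # F G A G F -> 'OOKK'
```

Because the code depends only on the distances between notes, a wrong clef (which shifts every subsequent pitch by the same amount) corrupts at most one character of the string rather than the whole sequence.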

Here is an example of a few notes from a popular French chanson, showing our encoding method.

A few notes from a Crequillon chanson, and our encoding of the intervals

F-TEMPO in action

F-TEMPO uses state-of-the-art, scalable retrieval methods, providing rapid searches of almost 60,000 page-images for those similar to a query-page in less than a second. It successfully recovers matches when the query page is not complete, e.g. when page-breaks are different. Also, close non-identical matches, as between voice-parts of a polyphonic work in imitative style, are highly ranked in results; similarly, different works based on the same musical content are usually well-matched.

Here is a screen-shot from the demo interface to F-TEMPO. The ‘query’ image is on the left, and searches are done by hitting the ‘Enter’ or ‘Return’ key in the normal way. The list of results appears in the middle column, with the best match (usually the query page itself) highlighted and displayed on the right. As other results are selected, their images are displayed on the right. Users can upload their own images of 16th-century music that might be in the collection to serve as queries; we have found that even photos taken with a mobile phone work well. However, don’t expect coherent results if you upload other kinds of image!

F-Tempo-User Interface

The F-TEMPO web-site can be found at: http://f-tempo.org

Click on the ‘Demo’ button to try out the program for yourself.

What more can we do with F-TEMPO?

Using the full-text search methods enabled by F-TEMPO’s API we might begin to ask intriguing questions, such as:

  • ‘How did certain pieces of music spread and become established favourites throughout Europe during the 16th century?’
  • ‘How well is the relative popularity of such early-modern favourites reflected in modern recordings since the 1950s?’
  • ‘How many unrecognised arrangements are there in the 16th-century repertory?’

In early testing we identified an instrumental ricercar as a wordless transcription of a Latin motet, hitherto unknown to musicology. As the collection grows, we are finding more such unexpected concordances, and can sometimes identify the composers of works labelled in some printed sources as by ‘Incertus’ (Uncertain). We have also uncovered some interesting conflicting attributions which could provoke interesting scholarly discussion.

Early Music Online and F-TEMPO

From the outset, this project has been based on the Early Music Online (EMO) collection, the result of a 2011 JISC-funded Rapid Digitisation project between the British Library and Royal Holloway, University of London. This digitised about 300 books of early printed music at the BL from archival microfilms, producing black-and-white images which have served as an excellent proof of concept for the development of F-TEMPO. The c.200 books judged suitable for our early methods in EMO contain about 32,000 pages of music, and form the basis for our resource.

The current version of F-TEMPO includes just under 30,000 more pages of early printed music from the Polish National Library, Warsaw, as well as a few thousand from the Bibliothèque nationale, Paris. We will soon be incorporating no fewer than a further half-a-million pages from the Bavarian State Library collection in Munich, as soon as we have run them through our automatic indexing system.

 (This work was funded for the past year by the JISC / British Academy Digital Humanities Research in the Humanities scheme. Thanks are due to David Lewis, Golnaz Badkobeh and Ryaan Ahmed for technical help and their many suggestions.)

08 April 2020

Legacies of Catalogue Descriptions and Curatorial Voice: a new AHRC project


This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the School of History, Art History and Philosophy, University of Sussex. James has a background in the history of the printed image, archival theory, art history, and computational analysis. He is author of The Business of Satirical Prints in Late-Georgian England (2017), the first monograph on the infrastructure of the satirical print trade circa 1770-1830, and a member of the Programming Historian team.

I love a good catalogue. Whether describing historic books, personal papers, scientific objects, or works of art, catalogue entries are the stuff of historical research, brief insights into many possible avenues of discovery. As a historian, I am trained to think critically about catalogues and the entries they contain, to remember that they are always crafted by people, institutions, and temporally specific ways of working, and to consider what that reality might do to my understanding of the past those catalogues and entries represent. Recently, I've started to make these catalogues my objects of historical study, to research what they contain, the labour that produced them, and the socio-cultural forces that shaped that labour, with a particular focus on the anglophone printed catalogue circa 1930-1990. One motivation for this is purely historical, to elucidate what I see as an important historical phenomenon. But another is about now, about how those catalogues are used and reused in the digital age. Browse the shelves of a university library and you'll quickly see that circumstances of production are encoded into the architecture of the printed catalogue: title pages, prefaces, fonts, spines, and the quality of paper are all signals of their historical nature. But when their entries - as many have been over the last 30 years - are moved into a database and online, these cues become detached, and their replacement – a bibliographic citation – is insufficient to evoke their historical specificity and does little to help alert the user to the myriad of texts they are navigating each time they search an online catalogue.

It is these interests and concerns that underpin "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship", a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library. This 12-month project funded by the Arts and Humanities Research Council aims to open up new and important directions for computational, critical, and curatorial analysis of collection catalogues. Our pilot research will investigate the temporal and spatial legacy of a catalogue I know well - the landmark ‘Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum’, produced by Mary Dorothy George between 1930 and 1954, 1.1 million words of text to which all scholars of the long-eighteenth century printed image are indebted, and which forms the basis of many catalogue entries at other institutions, not least those of our partners at the Lewis Walpole Library. We are particularly interested in tracing the temporal and spatial legacies of this catalogue, and plan to repurpose corpus linguistic methods developed in our "Curatorial Voice" project (generously funded by the British Academy) to examine the enduring legacies of Dorothy George's "voice" beyond her printed volumes.

Participants at the Curatorial Voices workshop, working in small groups and drawing images on paper.
Some things we got up to at our February 2019 Curatorial Voice workshop. What a difference a year makes!

But we also want to demonstrate the value of these methods to cultural institutions. Alongside their collections, catalogues are central to the identities and legacies of these institutions. And so we posit that being better able to examine their catalogue data can help cultural institutions get on with important catalogue related work: to target precious cataloguing and curatorial labour towards the records that need the most attention, to produce empirically-grounded guides to best practice, and to enable more critical user engagement with 'legacy' catalogue records (for more info, see our paper ‘Investigating Curatorial Voice with Corpus Linguistic Techniques: the case of Dorothy George and applications in museological practice’, Museum & Society, 2020).

A table with boxes of black and red lines which visualise the representation of spatial and non-spatial sentence parts in the descriptions of the satirical prints.
An analysis of our BM Satire Descriptions corpus (see doi.org/10.5281/zenodo.3245037 for how we made it and doi.org/10.5281/zenodo.3245017 for our methods). In this visualization - a snapshot of a bigger interactive - one box represents a single description, red lines are sentence parts marked ‘spatial’, and black lines are sentence parts marked as ‘non-spatial’. This output was based on iterative machine learning analysis with Method52. The data used is published by ResearchSpace under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
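The spatial/non-spatial labels in the visualisation above were produced by iterative machine learning with Method52, which we can't reproduce here. But as a rough illustration of the underlying idea, here is a minimal keyword-rule baseline in Python: it splits a description into clause-like parts and labels each part by looking for spatial cue words. The cue list and the example description are entirely hypothetical, and the real project learned these distinctions from annotated data rather than from a fixed word list.

```python
import re

# Toy list of spatial cue words. Purely illustrative: the project's
# Method52 classifier was trained iteratively, not rule-based.
SPATIAL_CUES = {
    "left", "right", "behind", "above", "below", "beside",
    "foreground", "background", "centre", "corner", "beneath",
}

def tag_sentence_parts(description):
    """Split a catalogue description into clause-like parts and give
    each a rough 'spatial' / 'non-spatial' label from keyword cues."""
    parts = [p.strip() for p in re.split(r"[;,.]", description) if p.strip()]
    tagged = []
    for part in parts:
        words = {w.lower().strip("'\"") for w in part.split()}
        label = "spatial" if words & SPATIAL_CUES else "non-spatial"
        tagged.append((part, label))
    return tagged

# A made-up description in the style of a satirical-print catalogue entry
example = ("A crowd gathers outside a tavern, on the left a fiddler plays, "
           "the satire mocks the Prime Minister")
for part, label in tag_sentence_parts(example):
    print(f"{label:12} | {part}")
```

Even a crude baseline like this makes clear why one box per description, with red and black lines for its parts, is a natural way to visualise the output.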

Over the course of the "Legacies" project, we had hoped to run two capability building workshops aimed at library, archives, and museum professionals. The first of these was due to take place at the British Library this May, and the aim of the workshop was to test our still very much work-in-progress training module on the computational analysis of catalogue data. Then Covid-19 hit and, like most things in life, the plan had to be dropped.

The new plan is still in development, but the project team know that we need input from the community to make the training module of greatest benefit to that community. The current plan is to run some ad hoc virtual training sessions on the computational analysis of catalogue data in late summer. And so we are looking for library, archives, and museum professionals who produce or work with catalogue data to be our crash test dummies: to run through parts of the module and tell us what works, what doesn't, and what is missing. If you'd be interested in taking part in one of these training sessions, please email me, James Baker, and tell me why. We look forward to hearing from you.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

06 April 2020

Poetry Mobile Apps

Add comment

This is a guest post by Pete Hebden, a PhD student at Newcastle University who is undertaking a practice-led PhD, researching and creating a poetry app. Pete has recently completed a three-month placement in Contemporary British Published Collections at the British Library, where he assisted curators working with the UK Web Archive, artists' books and emerging formats collections. You can follow him on Twitter as @Pete_Hebden.

As part of my PhD research, I have been investigating how writers and publishers have used smartphone and tablet devices to present poetry in new ways through mobile apps. In particular, I’m interested in how these new ways of presenting poetry compare to the more familiar format of the printed book. The mobile device allows poets and publishers to create new experiences for readers, incorporating location-based features, interactivity, and multimedia into the encounter with the poem.

Since the introduction of smartphones and tablet computers in the early 2010s, a huge range of digital books, e-literature, and literary games have been developed to explore the possibilities of this technology for literature. Projects like Ambient Literature and the work of Editions at Play have explored how mobile technology can transform story-telling and narrative, and similarly my project looks at how this technology can create new experiences of poetic texts.

Below are a few examples of poetry apps released over the past decade. For accessibility reasons, this selection has been limited to apps that can be used anywhere and are free to download. Some of them present work written with the mobile device in mind, while others take existing print work and re-mediate it for the mobile touchscreen.

Puzzling Poetry (iOS and Android, 2016)

Dutch developers Studio Louter worked with multiple poets to create this gamified approach to reading poetry. Existing poems are turned into puzzles to be unlocked word by word, as readers use patterns and themes within each text to figure out where each word should go. In the process, readers often notice new meanings and possibilities that might have been missed in a traditional linear reading.

Screen capture image of the Puzzling Poetry app

This video explains and demonstrates how the Puzzling Poetry app works:


Translatory (iOS, 2016)

This app, created by Arc Publications, guides readers in creating their own English translations of contemporary foreign-language poems. Using the digital display to see multiple possible translations of each phrase, the reader gains a fresh understanding of the complex work that goes into literary translation, as well as the rich layers of meaning included within the poem. Readers are able to save their finished translations and share them through social media using the app.

Screen capture image of the Translatory app


Poetry: The Poetry Foundation app (iOS and Android, 2011)

At nearly a decade old, the Poetry Foundation’s Poetry app was one of the first mobile apps dedicated to poetry, and it has been steadily updated by the editors of Poetry magazine ever since. It contains a huge array of both public-domain work and poems published in the magazine over the past century. To help users find their way through this, Poetry’s developers created an entertaining and useful interface for finding poems with unique combinations of themes through a roulette-wheel-style ‘spinner’. The app also responds to users shaking their phone by serving up a random poem.

Screen capture image of The Poetry Foundation app


ABRA: A Living Text (iOS, 2014)

A collaboration between the poets Amaranth Borsuk and Kate Durbin and the developer Ian Hatcher, the ABRA app presents readers with a range of digital tools to use (or spells to cast) that transform the text and create a unique experience for each reader. It's a fun and unusual way to encounter a collection of poems, giving the reader the opportunity to contribute to an ever-shifting, crowd-edited digital poem.

Screen capture image of the ABRA app

The artistic video below demonstrates how the ABRA app works. Painting your finger and thumb gold is not required!

I hope you feel inspired to check out these poetry apps, or maybe even to create your own.

24 March 2020

Learning in Lockdown: Digital Research Team online

Add comment Comments (0)

This blog post is by Nora McGregor, Digital Curator, Digital Research Team/European and Americas Collections, British Library. She's on Twitter as @ndalyrose.

With British Library public spaces now closed, the Digital Research Team are focussing our energies on transforming our internal staff Digital Scholarship Training Programme into an online resource for colleagues working from home. Using a mixture of tools at our disposal (Zoom conferencing and our dedicated course Slack channels for text-based chat), we are experimenting with delivering some of our staff workshops online, such as Library Carpentry and OpenRefine with Owen Stephens, as well as our reading group and staff lectures. Last week our colleague in Research Services, Jez Cope, trialled the delivery of a Library Carpentry workshop on Tidy Data at the last minute to a virtual room of 12 colleagues. For some it was the first time ever working from home or using remote conferencing tools, so the digital skills learning is happening on many levels, which for us is incredibly exciting! We'll share more in-depth results of these experiments with you via this blog and, as we gain more experience in this area, we may well be able to offer some sessions to the public!

Homeschooling for the Digital Research Team

And just like parents around the world creating hopeful, colourful schedules for maintaining children's daily learning (full disclosure: I'm one of 'em!), so too are we planning to keep up with our schooling whilst stuck at home. Below is just a handful of the online training and resources we in the Digital Research Team are keeping up with over the coming months. We'll add to this as we go along and would of course welcome any other suggestions from our librarian and digital scholarship networks in the comments!

  • Archivists at Home and Free Webinars and Trainings for Academic Library Workers (COVID-19) We're keeping an eye on these two particularly useful resources for archivists and academic librarians looking for continuing education opportunities while working from home.
  • Digital Skills for the Workplace These (free!) online courses were created by the Institute of Coding (who funded our Computing for Cultural Heritage course) to address the digital skills gap in a meaningful way, going much further than your classic “Beginner Excel” courses. Created through a partnership with different industries, they aim to reflect the practical baseline skills that employers need.
  • Elements of AI is a (free!) course, provided by Finland as ‘a present for the European Union’, that offers a gentle introduction to artificial intelligence. What a great present!
  • Gateway to Coding: Python Essentials Another (free!) course developed by the Institute of Coding, this one is designed particularly for folks like us at the British Library who would like a gentle introduction to programming languages like Python but can't install anything on our work machines.
  • Library Juice Academy has some great courses starting in April. Another great thing about these is that you can take them 'live', meaning the instructor is around and available and you get a certificate at the end, or 'asynchronously' at your own pace (no certificate).
  • Programming Historian Tutorials Tried and true, our team relies on these tutorials to understand the latest and greatest in using technology to manage and analyse data for humanities research. 

Time for Play

Of course, if Stephen King’s The Shining has taught us anything, we’d all do well to ensure we make time for some play during these times of isolation!

We’ll be highlighting more opportunities for fun distractions in future posts, but these are just a few ideas to help keep your mind occupied at the moment.

Stay safe, healthy and sane out there guys!

Sincerely,

The Digital Research Team

11 February 2020

Call for participants: April 2020 book sprint on the state of the art in crowdsourcing in cultural heritage

Add comment

[Update, March 2020: like so much else, our plans for the 'Collective Wisdom' project have been thrown out by the COVID-19 pandemic. We have an extension from our funders and will look to confirm dates when the global situation (especially around international flights) becomes clearer. In the meantime, the JISCMail Crowdsourcing list has some discussion on starting and managing projects in the current context.]

One of the key outcomes of our AHRC UK-US Partnership Development Grant, 'From crowdsourcing to digitally-enabled participation: the state of the art in collaboration, access, and inclusion for cultural heritage institutions', is the publication of an open access book written through a collaborative 'book sprint'. We'll work with up to 12 other collaborators to write a high-quality book that provides a comprehensive, practical and authoritative guide to crowdsourcing and digitally-enabled participation projects in the cultural heritage sector. Could you be one of our collaborators? Read on!

The book sprint will be held at the Peale Center for Baltimore History and Architecture from 19 to 24 April 2020. We've added a half-day debriefing session to the usual five-day sprint, so that we can capture all the ideas that didn't make it into the book and start to shape the agenda for a follow-up workshop to be held at the British Library in October. Due to the pace of writing and facilitation, participants must be able to commit to five and a half days in order to attend.

We have some confirmed participants already - including representatives from FromThePage, King’s College London Department of Digital Humanities, the Virginia Tech Department of Computer Science, and the Colored Conventions Project, plus the project investigators Mia Ridge (British Library), Meghan Ferriter (Library of Congress) and Sam Blickhan (Zooniverse) - with additional places to be filled by this open call for participation. 

An open call enables us to include folk from a range of backgrounds and experiences. This matches the ethos of the book sprint model, which states that 'diversity in participants—perspectives, experience, job roles, ethnicity, gender—creates a better work dynamic and a better book'. Participants will have the opportunity to not only create this authoritative text, but to facilitate the formation of an online community of practice which will serve as a resource and support system for those engaging with crowdsourcing and digitally-enabled participation projects.

We're looking for participants who are enthusiastic, experienced and engaged, with expertise at any point in the life cycle of crowdsourcing and digital participation. Your expertise might have been gained through hands-on experience on projects or by conducting research in areas from co-creation with heritage organisations or community archives to HCI, human computation and CSCW. We have a generous definition of 'digitally-enabled participation', including not-entirely-digital volunteering projects around cultural heritage collections, and activities that go beyond typical collection-centric 'crowdsourcing' tasks like transcription, classification and description. Got questions? Please email digitalresearch@bl.uk!

How to apply

  1. Read the Book Sprint FAQs to make sure you're aware of the process and commitment required
  2. Fill in this short Google Form by midnight GMT February 26th

What happens next?

We'll review applications and let people know by the end of February 2020.

We're planning to book travel and accommodation for participants as soon as dates and attendance are confirmed - this helps keep costs down and also means that individuals aren't out of pocket while waiting for reimbursement. The AHRC fund will pay for travel and accommodation for all book sprint participants. We will also host a follow-up workshop at the British Library in October and hope to provide travel and accommodation for book sprint participants.

We'll be holding a pre-sprint video call (on March 18, 19 or 20) to put faces to names and think about topics that people might want to research in advance and collect as an annotated bibliography for use during the sprint. 

If you can't make the book sprint but would still like to contribute, we've got you covered! We'll publish the first version of the book online for comment and feedback. Book sprints don't allow for remote participation, so this is our best way of including the vast amounts of expertise not in the room.

You can sign up to the British Library's crowdsourcing newsletters for updates, or join our Crowdsourcing group on Humanities Commons set up to share progress and engage in discussion with the wider community.