THE BRITISH LIBRARY

Digital scholarship blog


06 July 2020

Archivists, Stop Wasting Your Ref-ing Time!


“I didn’t get where I am today by manually creating individual catalogue references for thousands of archival records!”

One of the most laborious yet necessary tasks of an archivist is the generation of catalogue references. This was once the bane of my life. But I now have a technological solution, which anyone can download and use for free.

Animated image showing Reference Generator being abbreviated to ReG

Meet ReG: the newest team member of the Endangered Archives Programme (EAP). He’s not as entertaining as Reginald D Hunter. She’s not as lyrical as Regina Spektor. But like 1970s sitcom character Reggie Perrin, ReG provides a logical solution to the daily grind of office life - though less extreme and hopefully more successful.

 

Two pictures of musicians, Reginald Hunter and Regina Spektor
Reginald D Hunter (left),  [Image originally posted by Pete Ashton at https://flickr.com/photos/51035602859@N01/187673692]; Regina Spektor (right), [Image originally posted by Beny Shlevich at https://www.flickr.com/photos/17088109@N00/417238523]

 

Reggie Perrin’s boss CJ was famed for his “I didn’t get where I am today” catchphrase, and as EAP’s resident GJ, I decided to employ my own ReG, without whom I wouldn’t be where I am today. Rather than writing this blog, I would be drowning in metadata, my mind gathering dust, and my ears fleeing from the sound of colleagues and collaborators banging on my door, demanding to know why I’m so far behind in my work.

 

Image of two men at their offices from British sitcom The Rise and Fall of Reginald Perrin
CJ (left) [http://www.leonardrossiter.com/reginaldperrin/12044.jpg] and Reginald Perrin (right) [https://www.imdb.com/title/tt0073990/mediaviewer/rm1649999872] from The Rise and Fall of Reginald Perrin.

 

The problem

EAP metadata is created in spreadsheets by digitisation teams all over the world. It is then processed by the EAP team in London and ingested into the British Library’s cataloguing system.

When I joined EAP in 2018, one of the first projects I had to process was the Barbados Mercury and Bridgetown Gazette. It took days to create all of the catalogue references for this large newspaper collection, which spans more than 60 years.

Microsoft Excel’s fill down feature helped automate part of this task, but repeating this for thousands of rows is time-consuming and error-prone.

Animated image displaying the autofill procedure being carried out

I needed to find a solution to this.

During 2019 I established new workflows to semi-automate several aspects of the cataloguing process using OpenRefine - but OpenRefine is primarily a data cleaning tool, and its difficulty in understanding hierarchical relationships meant that it was not suitable for this task.

 

Learning to code

For some time I toyed with the idea of learning to write computer code using the Python programming language. I dabbled with free online tutorials. But it was tough to make practical sense of these generic tutorials, hard to find time, and my motivation dwindled.

When the British Library teamed up with The National Archives and Birkbeck University of London to launch a PG Cert in Computing for Information Professionals, I jumped at the chance to take part in the trial run.

It was a leap certainly worth taking because I now have the skills to write code for the purpose of transforming and analysing large volumes of data. And the first product of this new skillset is a computer program that accurately generates catalogue references for thousands of rows of data in mere seconds.

 

The solution - ReG in action

By coincidence, one of the first projects I needed to catalogue after creating this program was another Caribbean newspaper digitised by the same team at the Barbados Archives Department: The Barbadian.

This collection was similar in size and structure to the Barbados Mercury, but the generation of all the catalogue references took just a few seconds. All I needed to do was:

  • Open ReG
  • Enter the project ID for the collection (reference prefix)
  • Enter the filename of the spreadsheet containing the metadata

Animated image showing ReG working to file references

And Bingo! All my references were generated in a new file.

Before and After image explaining 'In just a few seconds, the following transformation took place in the 'Reference' column' showing the new reference names

 

How it works in a nutshell

The basic principle of the program is that it reads a single column in the dataset, which contains the hierarchical information. In the example above, it read the “Level” column.

It then uses this information to calculate the structured numbering of the catalogue references, which it populates in the “Reference” column.
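
To make the logic concrete, here is a minimal Python sketch of that calculation. It is not ReG's actual code - the column names, hierarchy, and prefix are assumptions for illustration - but it produces references in the format described in the next section.

```python
import csv

# Illustrative hierarchy; in ReG the terms and their order are configurable
HIERARCHY = ["Collection", "Series", "File"]

def generate_references(rows, prefix):
    counters = [0] * len(HIERARCHY)
    for row in rows:
        depth = HIERARCHY.index(row["Level"])
        counters[depth] += 1                                       # next sibling at this level
        counters[depth + 1:] = [0] * (len(HIERARCHY) - depth - 1)  # reset deeper levels
        row["Reference"] = "/".join([prefix] + [str(n) for n in counters[:depth + 1]])
    return rows

# Hypothetical usage: metadata.csv contains a "Level" column
with open("metadata.csv", newline="", encoding="utf-8") as f:
    records = generate_references(list(csv.DictReader(f)), "EAP1251")
```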

 

Reference format

The generated references conform to the following format:

  • Each reference begins with a prefix that is common to the whole dataset. This is the prefix that the user enters at the start of the program. In the example above, that is “EAP1251”.
  • Forward slashes ( / ) are used to indicate a new hierarchical level.
  • Each record is assigned its own number relative to its sibling records, and that number is shared with all of the children of that record.

 

In the example above, the reference for the first collection is formatted:

Image showing how the reference works: 'EAP1251/1' is the first collection

The reference for the first series of the first collection is formatted:

Image showing how the reference works: 'EAP1251/1/1' is the first series of the first collection

The reference for the second series of the first collection is:

Image showing how the reference works: 'EAP1251/1/2' is the second series of the first collection

No matter how complex the hierarchical structure of the dataset, the program will quickly and accurately generate references for every record in accordance with this format.

 

Download for wider re-use

While ReG was designed primarily for use by EAP, it should work for anyone who generates reference numbers using the same format.

For users of the Calm cataloguing software, ReG could be used to complete the “RefNo” column, which determines the tree structure of a collection when a spreadsheet is ingested into Calm.

With wider re-use in mind, some settings can be configured to suit individual requirements.

For example, you can configure the names of the columns that ReG reads and generates references in. For EAP, the reference generation column is named “Reference”, but for Calm users, it could be configured as “RefNo”.

Users can also configure their own hierarchy. You have complete freedom to set both the hierarchical terms applicable to your institution and the order of those terms.
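
As a purely hypothetical illustration, the kind of settings involved might look something like this (the option names are invented; ReG's actual configuration file may differ):

```python
# Invented names for illustration only - not ReG's real configuration keys
CONFIG = {
    "level_column": "Level",      # the column ReG reads hierarchy terms from
    "reference_column": "RefNo",  # the column ReG writes references to (e.g. for Calm)
    "hierarchy": ["Fonds", "Series", "Sub-series", "File", "Item"],  # terms in order
}
```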

It is possible that some minor EAP idiosyncrasies might preclude reuse of this program for some users. If this is the case, by all means get in touch; perhaps I can tweak the code to make it more applicable to users beyond EAP - though some tweaks may be more feasible than others.

 

Additional validation features

While generating references is ReG’s core function, it also includes several validation features to help you spot and correct problems with your data.

Unexpected item in the hierarchy area

For catalogue references to be calculated, all the data in the level column must match a term within the configured hierarchy. The program therefore checks this; if a discrepancy is found, the user is notified and given two options to proceed.
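
At its core, this check is a simple set comparison, as this illustrative Python fragment shows (the terms and values are invented for the example):

```python
HIERARCHY = ["Collection", "Series", "File"]        # the configured hierarchy
levels = ["Collection", "Series", "File", "Files"]  # terms found in the "Level" column
unexpected = sorted(set(levels) - set(HIERARCHY))
# -> ['Files']: flagged for the user to rename, or to handle via a one-off hierarchy
```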

Option 1: Rename unexpected terms

First, users have the option to rename any unexpected terms. This is useful for correcting typographical errors, such as this example - where “Files” should be “File”.

Animated image showing option 1: renaming unexpected 'files' to 'file'

Before and after image showing the change of 'files' to 'file'

Option 2: Build a one-off hierarchy

Alternatively, users can create a one-off hierarchy that matches the terms in the dataset. In the following example, the unexpected hierarchical term “Specimen” is a bona fide term. It is just not part of the configured hierarchy.

Rather than force the user to quit the program and amend the configuration file, they can simply establish a new, one-off hierarchy within the program.

Animated image showing option 2: adding 'specimen' to the hierarchy under 'file'

This hierarchy will not be saved for future instances. It is just used for this one-off occasion. If the user wants “Specimen” to be recognised in the future, the configuration file will also need to be updated.

 

Single child records

To avoid redundant information, it is sometimes advisable for an archivist to eliminate single child records from a collection. ReG will identify any such records, notify the user, and give them three options to proceed:

  1. Delete single child records
  2. Delete the parents of single child records
  3. Keep the single child records and/or their parents

Depending on how the user chooses to proceed, ReG will produce one of three results, which affects the rows that remain and the structure of the generated references.

In this example, the third series in the original dataset contains a single child - a single file.

Image showing the three possible outcomes to a single child record: A. delete child so it appears just as a series, B. delete parent so it appears just as a file, and C. keep the child record and their parents so it appears as a series followed by a single file

The most notable result is option B, where the parent was deleted. Looking at the “Level” column, the single child now appears to be a sibling of the files from the second series. But the reference number indicates that this file is part of a different branch within the tree structure.

This is more clearly illustrated by the following tree diagrams.

Image showing a tree hierarchy of the three possible outcomes for a single child record: A. a childless series, B. a file at the same level as other series, C. a series with a single child file

This functionality means that ReG will help you spot any single child records that you may otherwise have been unaware of.

But it also gives you a means of creating an appropriate hierarchical structure when cataloguing in a spreadsheet. If you intentionally insert dummy parents for single child records, ReG can generate references that map the appropriate tree structure and then remove the dummy parent records in one seamless process.
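
For the curious, detecting single children in a flat “Level” column can be sketched in a few lines of Python. This illustrates the idea rather than ReG's implementation; the hierarchy and sample data are invented, and the sketch assumes a well-formed listing in which each row's parent precedes it:

```python
from collections import defaultdict

HIERARCHY = ["Collection", "Series", "File"]
levels = ["Collection", "Series", "File", "File", "Series", "File"]  # sample "Level" column

children = defaultdict(list)  # parent row index -> indexes of its child rows
stack = []                    # row index of the current ancestor at each depth
for i, level in enumerate(levels):
    depth = HIERARCHY.index(level)
    stack[depth:] = [i]       # this row becomes the current ancestor at its depth
    if depth > 0:
        children[stack[depth - 1]].append(i)

single_children = {parent: kids[0] for parent, kids in children.items() if len(kids) == 1}
# -> {4: 5}: the second series (row 4) has exactly one child (row 5)
```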

 

And finally ...

If you’ve got this far, you probably recognise the problem and have at least a passing interest in finding a solution. If so, please feel free to download the software, give it a go, and get in touch.

If you spot any problems, or have any suggested enhancements, I would welcome your input. You certainly won’t be wasting my time - and you might just save some of yours.

 

Download links

For making this possible, I am particularly thankful to Jody Butterworth, Sam van Schaik, Nora McGregor, Stelios Sotiriadis, and Peter Wood.

This blog post is by Dr Graham Jevon, Endangered Archives Programme cataloguer. He is on Twitter as @GJHistory.

15 June 2020

Marginal Voices in UK Digital Comics


I am an AHRC Collaborative Doctoral Partnership student based at the British Library and Central Saint Martins, University of the Arts London (UAL). The studentship is funded by the Arts and Humanities Research Council’s Collaborative Doctoral Partnership Programme.

Supervised jointly by Stella Wisdom from the British Library, and Roger Sabin and Ian Hague from UAL, my research explores the potential for digital comics to take advantage of digital technologies and the digital environment to foster inclusivity and diversity. I aim to examine the status of marginal voices within UK digital comics, while addressing the opportunities and challenges these comics present for the British Library’s collection and preservation policies.

A cartoon strip of three vertical panel images, in the first a caravan is on the edge of a cliff, in the second a dog asleep in a bed, in the third the dog wakes up and sits up in bed
The opening panels from G Bear and Jammo by Jaime Huxtable, showing their caravan on The Gower Peninsula in South Wales, copyright © Jaime Huxtable

Digital comics have been identified as complex digital publications, meaning this research project is connected to the work of the broader Emerging Formats Project. On top of embracing technological change, digital comics have the potential to reflect, embrace and contribute to social and cultural change in the UK. Digital comics not only present new ways of telling stories, but also change whose story is told.

One of the comic creators whose work I have recently been examining is Jaime Huxtable, a Welsh cartoonist and illustrator based in Worthing, West Sussex. He has worked on a variety of digital comics projects, from webcomics to interactive comics, and also runs various comics-related workshops.

Samir's Christmas by Jaime Huxtable. This promotional comic strip was created for Freedom From Torture’s 2019 Christmas Care Box Appeal and was made into a short animated video by Hands Up, copyright © Jaime Huxtable

My thesis will explore whether the ways UK digital comics are published and consumed mean that they can foreground marginal, alternative voices, similar to the way underground comix and zine culture has. Comics scholarship has focused on the technological aspects of digital comics, meaning their potentially significant contribution to reflecting and embracing social and cultural change in the UK has not been explored. I want to establish whether the fact that digital comics can circumvent traditional gatekeepers means they provide space to foreground marginal voices. I will also explore the challenges and opportunities digital comics might present for legal deposit collection development policy.

As well as being a member of the Comics Research Hub (CoRH) at UAL, I have already begun working with colleagues from the UK Web Archive, and hope to be able to make a significant contribution to the Web Comic Archive. Issues around collection development and management are central to my research, so I feel very fortunate to be based at the British Library and to have the chance to learn from and hopefully contribute to practice here.

If anyone would like to know more about my research, or recommend any digital comics for me to look at, please do contact me at Tom.Gebhart@bl.uk or @thmsgbhrt on Twitter. UK digital comic creators and publishers can use the ComicHaus app to send their digital comics directly to the British Library digital archive. More details about this process are here.

This post is by British Library collaborative doctoral student Thomas Gebhart (@thmsgbhrt).

12 June 2020

Making Watermarks Visible: A Collaborative Project between Conservation and Imaging


Some of the earliest documents being digitised by the British Library Qatar Foundation Partnership are a series of ship’s journals dating from 1605-1705, relating to the East India Company’s voyages. Whilst working with these documents, conservators Heather Murphy and Camille Dekeyser-Thuet noticed within the papers a series of interesting examples of early watermark design. Curious about the potential information these could give regarding the journals, Camille and Heather began undertaking research, hoping to learn more about the date and provenance of the papers, trade and production patterns involved in the paper industry of the time, and the practice of watermarking paper. There is a wealth of valuable and interesting information to be gained from the study of watermarks, especially within a project such as the BLQFP, which provides the opportunity for study within both IOR and Arabic manuscript material. We hope to publish more information relating to this online with the Qatar Digital Library in the form of Expert articles and visual content.

The first step within this project involved tracing the watermark designs with the help of a light sheet in order to begin gathering a collection of images to form the basis of further research. It was clear that in order to make the best possible use of the visual information contained within these watermarks, they would need to be imaged in a way which would make them available to audiences in both a visually appealing and academically beneficial form, beyond the capabilities of simply hand tracing the designs.

Hand tracings of the watermark designs

 

This began a collaboration with two members of the BLQFP imaging team, Senior Imaging Technician Jordi Clopes-Masjuan and Senior Imaging Support Technician Matt Lee, who, together with Heather and Camille, were able to devise and facilitate a method of imaging and subsequent editing which enabled new access to the designs. The next step involved the construction of a bespoke support made from Vivak (commonly used for exhibition mounts and stands). This inert plastic is both pliable and transparent, which allowed the simultaneous backlighting and support of the journal pages required to successfully capture the watermarks.

Creation of the Vivak support
Imaging of pages using backlighting
Studio setup for capturing the watermarks

 

Before capturing, Jordi suggested we create two comparison images of the watermarks. This involved capturing the watermarks as they normally appear on the digitised image (almost or completely invisible), and how they appear illuminated when the page is backlit. The theory behind this was quite simple: “to obtain two consecutive images from the same folio, in the exact same position, but using a specific light set-up for each image”.

The idea was for the first image to appear in the same way as the standard, searchable images on the QDL portal. To create these standard image captures, the studio lights were placed near the camera, with incident light towards the document.

The second image was taken immediately after, but this time only backlight was used (light behind the document). In using these two different lighting techniques, the first image allowed us to see the content of the document, but the second image revealed the texture and character of the paper, including conservation marks, possible corrections to the writing, as well as the watermarks.

One unexpected occurrence during imaging was that, due to the varying texture and thickness of the papers, the power of the backlight had to be re-adjusted for each watermark.

First image taken under normal lighting conditions
Second image of the same page taken using backlighting

https://www.qdl.qa/en/archive/81055/vdc_100000001273.0x000342

 

Before settling on this approach, we also investigated other imaging techniques:

  • Multispectral photography: by capturing the same folio under different lights (from UV to IR) the watermarks, along with other types of hidden content such as faded ink, would appear. However, it was decided that this process would take too long for the number of watermarks we were aiming to capture.
  • Light sheet: Although these types of light sheets are extremely slim and slightly flexible, we experienced some issues when trying the double capture, as on many occasions the light sheet was not flexible enough, and was “moving” the page when trying to reach the gutter (for successful final presentation of the images it was mandatory that the folio on both captures was still).

Once we had successfully captured the images, Photoshop proved vital in allowing us to increase the contrast of the watermarks and make them more visible. Because every captured image was different, the editing approach also differed, requiring varying adjustments of levels, curves, saturation, or brightness, combined with different fusion modes to attain the best result. In the end, the tools used were not as important as the final image. The last stage within Photoshop was to crop and export both images of the same folio with exactly the same settings, allowing the comparative images to match as precisely as possible.
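
For anyone wanting to script a similar enhancement outside Photoshop, a crude equivalent can be sketched with the Pillow imaging library (purely illustrative - the team's adjustments were made by eye, and the filenames here are made up):

```python
from PIL import Image, ImageEnhance

img = Image.open("backlit_capture.tif").convert("L")  # greyscale backlit capture
img = ImageEnhance.Contrast(img).enhance(1.8)         # lift the watermark out of the paper tone
img = ImageEnhance.Brightness(img).enhance(1.1)       # slight exposure boost
img.save("watermark_enhanced.tif")
```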

The next step involved creating a digital line drawing of each watermark. Matt Lee imported the high-resolution image captures onto an iPad and used the Procreate drawing app to trace the watermarks with a stylus pen. To develop an approach that provided accurate and consistent results, he first tested brushes and experimented with line qualities and thicknesses. Selecting the Dry Ink brush, he traced the light outlines of each watermark on a separate transparent layer. The tracings were initially drawn in white to highlight the designs on paper, and these were later inverted to create black line drawings that were edited and refined.

Tracing the watermarks directly from the screen of an iPad provided a level of accuracy and efficiency that would be difficult to achieve on a computer with a graphics tablet, trackpad or computer mouse. There were several challenges in tracing the watermarks from the image captures. For example, the technique employed by Jordi was very effective in highlighting the watermarks, but it also made the laid and chain lines in the paper more prominent and these would merge or overlap with the light outline of the design.

Some of the watermarks also appeared distorted or incomplete, or had handwritten text on the paper which obscured the details of the design. It was important that the tracings were accurate, so some gaps had to be left. However, through the drawing process the eye began to pick out more detail, and the most exciting moment was when a vague outline of a horse revealed itself to be a unicorn with inset lettering.

Vector image of unicorn watermark

 

In total 78 drawings of varying complexity and design were made for this project. To preserve the transparent backgrounds of the drawings, they were exported first as PNG files. These were then imported into Adobe Illustrator and converted to vector drawings that can be viewed at a larger size without loss of image quality.

Vector image of watermark featuring heraldic designs

 

Once the drawings were complete, we had three images - the ‘traditional view’ (the page as it would normally appear), the ‘translucid view’ (the same page backlit and showing the watermark) and the ‘translucid + white view’ (the translucid view plus an additional overlay of the digitally traced watermark in place on the page).

Traditional view
Translucid view
Translucid view with watermark highlighted by digital tracing (the ‘translucid + white’ view)

 

Jordi then took these images and, using a multiple slider tool, displayed them on an offline website. This enabled us to demonstrate the tool to our team and present the watermarks in the way we had envisaged from the beginning, allowing people to both study and appreciate the designs.

Watermarks Project Animated GIF

 

This is a guest post by Heather Murphy, Conservator, Jordi Clopes-Masjuan, Senior Imaging Technician and Matt Lee, Senior Imaging Support Technician from the British Library Qatar Foundation Partnership. You can follow the British Library Qatar Foundation Partnership on Twitter at @BLQatar.

 

24 April 2020

BL Labs Learning & Teaching Award Winners - 2019 - The Other Voice - RCA


Innovations in sound and art

Dr Matt Lewis, Tutor of Digital Direction, and Dr Eleanor Dare, Reader of Digital Media, both at the School of Communication at the Royal College of Art, and Mary Stewart, Curator of Oral History and Deputy Director of National Life Stories at the British Library, reflect on an ongoing and award-winning collaboration (posted on their behalf by Mahendra Mahey, BL Labs Manager).

In spring 2019, based in both the British Library and the Royal College of Art School of Communication, seven students from the MA Digital Direction course participated in an elective module entitled The Other Voice. After listening in-depth to a selection of oral history interviews, the students learnt how to edit and creatively interpret oral histories, gaining insight into the complex and nuanced ethical and practical implications of working with other people’s life stories. The culmination of this collaboration was a two-day student-curated showcase at the British Library, where the students displayed their own creative and very personal responses to the oral history testimonies.

The module was led by Eleanor Dare (Head of Programme for MA Digital Direction, RCA), Matt Lewis (Sound Artist and Musician and RCA Tutor) and Mary Stewart (British Library Oral History Curator). We were really pleased that over 100 British Library staff took the time to come to the showcase, engage with the artwork and discuss their responses with the students.

Eleanor reflects:

“The students have benefited enormously from this collaboration, gaining a deeper understanding of the ethics of editing, the particular power of oral history and of course, the feedback and stimulation of having a show in the British Library.”

We were all absolutely delighted that the Other Voice group were the winners of the BL Labs Teaching and Learning Award 2019, presented in November 2019 at a ceremony at the British Library Knowledge Centre. Two students, Karthika Sakthivel and Giulia Brancati, also showcased their work at the 2019 annual Oral History Society Regional Network Event at the British Library, and contributed to a wide-ranging discussion reflecting on their practice and the power of oral history with a group of 35 oral historians from all over the UK. The collaboration has continued as Mary and Matt ran ‘The Other Voice’ elective in spring 2020, where the students adapted to the Covid-19 pandemic, producing work under lockdown from different locations around the world.

Here is just a taster of the amazing works the students created in 2019, which made them worthy winners of the BL Labs Teaching and Learning Award 2019.

Karthika Sakthivel and Giulia Brancati were both inspired by the testimony of Irene Elliot, who was interviewed by Dvora Liberman in 2014 for an innovative project on Crown Court Clerks. They were both moved by Irene’s rich description of her mother’s hard work bringing up five children in 1950s Preston.

On the way back by Giulia Brancati

Giulia created On the way back, an installation featuring two audio points – one with excerpts of Irene’s testimony and the other an audio collage inspired by Irene’s description. Two old-fashioned telephones played the audio, which the listener absorbed while curled up in an armchair in a fictional front room. It was a wonderfully immersive experience.

Irene Elliot's testimony interwoven with the audio collage (C1674/05)
Audio collage and photography © Giulia Brancati.
Listen here

Giulia commented:

“In a world full of noise and overwhelming information, to sit and really pay attention to someone’s personal story is an act of mindful presence. This module has been a continuous learning experience in which ‘the other voice’ became a trigger for creativity and personal reflection.”

Memory Foam by Karthika Sakthivel

Inspired by Irene’s testimony Karthika created a wonderful sonic quilt, entitled Memory Foam.

Karthika explains,

“There was power in Irene’s voice, enough to make me want to sew - something I’d never really done on my own before. But in her story there was comfort, there was warmth and that kept me going.”

Illustrated with objects drawn from Irene's memories, each square of the patchwork quilt encased conductive fabric that triggered audio clips. Upon touching each square, the corresponding story would play.

Karthika further commented,

“The initial visitor interactions with the piece gave me useful insights that enabled me to improve the experience in real time by testing alternate ways of hanging and displaying the quilt. After engaging with the quilt, guests walked up to me with recollections of their own mothers and grandmothers – and these emotional connections were deeply rewarding.”

Karthika, Giulia and the whole group were honoured that Irene and her daughter Jayne travelled from Preston to come to the exhibition. Karthika said:

"It was the greatest honour to have her experience my patchwork of her memories. This project for me unfurled yards of possibilities, the common thread being - the power of a voice.”

Irene and her daughter Jayne experiencing Memory Foam © Karthika Sakthivel.
Irene's words activated by touching the lime green patch with lace and a zip (top left of the quilt) (C1674/05)
Listen here

Meditations in Clay by James Roadnight and David Sappa

Listening to ceramicist Walter Keeler's memories of making a pot inspired James Roadnight and David Sappa to travel to Cornwall and record new oral histories to create Meditations in Clay, an immersive documentary exploring what we, as members of this modern society, can learn from the craft of pottery - a technology as old as time itself. The film combines interviews conducted at the Bernard Leach pottery with audio-visual documentation of the St Ives studio and its rugged Cornish surroundings.


Meditations in Clay, video montage © James Roadnight and David Sappa.

Those attending the showcase were bewitched as they watched the landscape documentary on the large screen and engaged with the selection of listening pots, which when held to the ear played excerpts of the oral history interviews.

James and David commented,

“This project has taught us a great deal about the deep interview techniques involved in Oral History. Seeing visitors at the showcase engage deeply with our work, watching the film and listening to our guided meditation for 15, 20 minutes at a time was more than we could have ever imagined.”

Beyond Form

Raf Martins responded innovatively to Jonathan Blake’s interview describing his experiences as one of the first people in the UK to be diagnosed with HIV. In Beyond Form, Raf created an audio soundscape of environmental sounds and excerpts from the interview, which played alongside a projected 3D hologram based on the cellular structure of the HIV virus. The hologram changed form and shape when activated by the audio – an intriguing visual artefact that translated the vibrant individual story into a futuristic medium.

Jonathan Blake's testimony interwoven with environmental soundscape (C456/104) Soundscape and image © Raf Martins.
Listen here

Stiff Upper Lip

Also inspired by Jonathan Blake’s interview was Stiff Upper Lip by Kingsley Tao, a short film that used clips of the interview to explore sexuality, identity and reactions to health and sickness.

Donald in Wonderland

Donald Palmer’s interview with Paul Merchant contained a wonderful and warm description of the front room that his Jamaican-born parents ‘kept for best’ in 1970s London. Alex Remoleux created a virtual reality tour of the reimagined space, entitled Donald in Wonderland, where the viewer could point to various objects in the virtual space and launch the corresponding snippet of audio.

Alex commented,

“I am really happy that I provided a Virtual Reality experience, and that Donald Palmer himself came to see my work. In the picture below you can see Donald using the remote in order to point and touch the objects represented in the virtual world.”

Donald Palmer describes his parents' front room (C1379/102)
Interviewee Donald Palmer wearing the virtual reality headset, exploring the virtual reality space (pictured) created by Alex Remoleux.
Listen here

Showcase at the British Library

The reaction to the showcase from the visitors and British Library staff was overwhelmingly positive, as shown by this small selection of comments. We were incredibly grateful to interviewees Irene and Donald for attending the showcase too. This was an excellent collaboration: RCA students and staff alike gained new insights into the significance and breadth of the British Library Oral History collection and the British Library staff were bowled over by the creative responses to the archival collection.

Examples of feedback from British Library showcase of 'The Other Voice' by Royal College of Art

With thanks to the MA Other Voice cohort Giulia Brancati, Raf Martins, Alexia Remoleux, James Roadnight, Karthika Sakthivel, David Sappa and Kingsley Tao, RCA staff Eleanor Dare and Matt Lewis & BL Oral History Curator Mary Stewart, plus all the interviewees who recorded their stories and the visitors who took the time to attend the showcase.

21 April 2020

Clean. Migrate. Validate. Enhance. Processing Archival Metadata with Open Refine


This blogpost is by Graham Jevon, Cataloguer, Endangered Archives Programme 

Creating detailed and consistent metadata is a challenge common to most archives. Many rely on an army of volunteers with varying degrees of cataloguing experience. And no matter how diligent any team of cataloguers are, human error and individual idiosyncrasies are inevitable.

This challenge is particularly pertinent to the Endangered Archives Programme (EAP), which has hitherto funded in excess of 400 projects in more than 90 countries. Each project is unique and employs its own team of one or more cataloguers based in the particular country where the archival content is digitised. But all this disparately created metadata must be uniform when ingested into the British Library’s cataloguing system and uploaded to eap.bl.uk.

Finding an efficient, low-cost method to process large volumes of metadata generated by hundreds of unique teams is a challenge; one that in 2019, EAP sought to alleviate using freely available open source software Open Refine – a power tool for processing data.

This blog highlights some of the ways that we are using Open Refine. It is not an instructional how-to guide (though we are happy to follow up with more detailed blogs if there is interest), but an introductory overview of some of the Open Refine methods we use to process large volumes of metadata.

Initial metadata capture

Our metadata is initially created by project teams using an Excel spreadsheet template provided by EAP. In the past year we have completely redesigned this template in order to make it as user friendly and controlled as possible.

Screenshot of spreadsheet

But while Excel is perfect for metadata creation, it is not best suited for checking and editing large volumes of data. This is where Open Refine excels (pardon the pun!), so when the final completed spreadsheet is delivered to EAP, we use Open Refine to clean, validate, migrate, and enhance this data.

Workflow diagram

Replicating repetitive tasks

Open Refine came to the forefront of our attention after a one-day introductory training session led by Owen Stephens where the key takeaway for EAP was that a sequence of functions performed in Open Refine can be copied and re-used on subsequent datasets.

Screenshot of Open Refine software

This encouraged us to design and create a sequence of processes that can be re-applied every time we receive a new batch of metadata, thus automating large parts of our workflow.

No computer programming skills required

Building this sequence required no computer programming experience (though this can help); just logical thinking, a generous online community willing to share their knowledge and experience, and a willingness to learn Open Refine’s GREL language and generic regular expressions. Some functions can be performed simply by using Open Refine’s built-in menu options. But the limits of Open Refine’s capabilities are almost infinite; the more you explore and experiment, the further you can push the boundaries.

Initially, it was hoped that our whole Open Refine sequence could be repeated in one single large batch of operations. However, the complexity of the data and the need for archivist intervention meant that it was more appropriate to divide the process into several steps. Our workflow is divided into seven stages:

  1. Migration
  2. Dates
  3. Languages and Scripts
  4. Related subjects
  5. Related places and other authorities
  6. Uniform Titles
  7. Digital content validation

Each of these stages performs one or more of four tasks: clean, migrate, validate, and enhance.

Task 1: Clean

The first part of our workflow provides basic data cleaning. Across all columns it trims any white space at the beginning or end of a cell, removes any double spaces, and capitalises the first letter of every cell. In just a few seconds, this tidies the entire dataset.
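
For readers who think in code, the whole cleaning step boils down to something like this Python sketch (EAP performs it inside Open Refine; this just illustrates the logic, with invented sample data):

```python
import re

def clean(value: str) -> str:
    value = value.strip()                 # trim leading/trailing white space
    value = re.sub(r" {2,}", " ", value)  # remove double spaces
    return value[:1].upper() + value[1:]  # capitalise the first letter

row = {"Title": "  barbados  mercury", "Description": "newspaper  issues "}  # sample data
cleaned = {column: clean(value) for column, value in row.items()}
# -> {'Title': 'Barbados mercury', 'Description': 'Newspaper issues'}
```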

Task 1 Example: Trimming white space (menu option)

Trimming whitespace on an individual column is an easy function to perform, as Open Refine has a built-in “Common transform” that does this.

Screenshot of Open Refine software

Although this is a simple function to perform, we no longer need to repeatedly select this menu option for each column of each dataset we process because this task is now part of the workflow that we simply copy and paste.

Task 1 Example: Capitalising the first letter (using GREL)

Capitalising the first letter of each cell is less straightforward for a new user as it does not have a built-in function that can be selected from a menu. Instead it requires a custom “Transform” using Open Refine’s own expression language (GREL).

Screenshot of Open Refine software


Having to write an expression like this should not put off any Open Refine novices. This is an example of Open Refine’s flexibility and many expressions can be found and copied from the Open Refine wiki pages or from blogs like this. The more you copy others, the more you learn, and the easier you will find it to adapt expressions to your own unique requirements.

Moreover, we do not have to repeat this expression again. Just like the trim whitespace transformation, this is also now part of our copy and paste workflow. One click performs both these tasks and more.

Task 2: Migrate

As previously mentioned, the listing template used by the project teams is not the same as the spreadsheet template required for ingest into the British Library’s cataloguing system. But Open Refine helps us convert the listing template to the ingest template. In just one click, it renames, reorders, and restructures the data from the human-friendly listing template to the computer-friendly ingest template.

Task 2 example: Variant Titles

The ingest spreadsheet has a “Title” column and a single “Additional Titles” column where all other title variations are compiled. It is not practical to expect temporary cataloguers to understand how to use the “Title” and “Additional Titles” columns on the ingest spreadsheet. It is much more effective to provide cataloguers with a listing template that has three prescriptive title columns. This helps them clearly understand what type of titles are required and where they should be put.

Spreadsheet snapshot

The EAP team then uses Open Refine to move these titles into the appropriate columns (illustrated above). It places one in the main “Title” field and concatenates the other two titles (if they exist) into the “Additional Titles” field. It also creates two new title type columns, which the ingest process requires so that it knows which title is which.

This is just one part of the migration stage of the workflow, which performs several renaming, re-ordering, and concatenation tasks like this to prepare the data for ingest into the British Library’s cataloguing system.

Task 3: Validate

While cleaning and preparing the data for migration is important, it is also vital that we check that the data is accurate and reliable. But who has the time, inclination, or eye stamina to read thousands of rows of data in an Excel spreadsheet? What we require is a computational method to validate data. Perhaps the best way of doing this is to write a bespoke computer program. This indeed is something that I am now working on while learning to write computer code using the Python language (look out for a further blog on this later).

In the meantime, though, Open Refine has helped us to validate large volumes of metadata with no programming experience required.

Task 3 Example: Validating metadata-content connections

When we receive the final output from a digitisation project, one of our most important tasks is to ensure that all of the digital content (images, audio and video recordings) correlates with the metadata on the spreadsheet and vice versa.

We begin by running a command line report on the folders containing the digital content. This provides us with a CSV file, which we can read in Excel. However, the data is not presented in a neat format for comparison purposes.

Spreadsheet snapshot

Restructuring data ready for validation comparisons

For this particular task what we want is a simple list of all the digital folder names (not the full directory) and the number of TIFF images each folder contains. Open Refine enables just that, as the next image illustrates.

Screenshot of Open Refine software

Constructing the sequence that restructures this data required careful planning and good familiarity with Open Refine and the GREL expression language. But after the data had been successfully restructured once, we never have to think about how to do this again. As with other parts of the workflow, we now just have to copy and paste the sequence to repeat this transformation on new datasets in the same format.
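
The same restructuring could also be done outside Open Refine by walking the content folders directly. A hedged Python sketch follows (the directory layout and the .tif extension are assumptions for the example):

```python
from pathlib import Path

# Folder name (not the full directory path) -> number of TIFF images it contains
tiff_counts = {
    folder.name: len(list(folder.rglob("*.tif")))
    for folder in Path("digital_content").iterdir()
    if folder.is_dir()
}
```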

Cross referencing data for validation

With the data in this neat format, we can now do a number of simple cross referencing checks. We can check that:

  1. Each digital folder has a corresponding row of metadata – if not, this indicates that the metadata is incomplete
  2. Each row of metadata has a corresponding digital folder – if not, this indicates that some digital folders containing images are missing
  3. The actual number of TIFF images in each folder exactly matches the number of images recorded by the cataloguer – if not, this may indicate that some images are missing.

For each of these checks we use Open Refine’s cell.cross expression to cross reference the digital folder report with the metadata listing.

In the screenshot below we can see the results of the first validation check. Each digital folder name should match the reference number of a record in the metadata listing. If we find a match it returns that reference number in the “CrossRef” column. If no match is found, that column is left blank. By filtering that column by blanks, we can very quickly identify all of the digital folders that do not contain a corresponding row of metadata. In this example, before applying the filter, we can already see that at least one digital folder is missing metadata. An archivist can then investigate why that is and fix the problem.

Screenshot of Open Refine software
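
Outside Open Refine, the same three checks reduce to simple set operations. A minimal Python sketch, with invented reference numbers and image counts:

```python
folder_counts = {"EAP1251/1/1": 120, "EAP1251/1/2": 98}    # from the folder report
metadata_counts = {"EAP1251/1/1": 119, "EAP1251/1/3": 98}  # from the metadata listing

missing_metadata = folder_counts.keys() - metadata_counts.keys()  # check 1 -> {'EAP1251/1/2'}
missing_folders = metadata_counts.keys() - folder_counts.keys()   # check 2 -> {'EAP1251/1/3'}
mismatched = {ref for ref in folder_counts.keys() & metadata_counts.keys()
              if folder_counts[ref] != metadata_counts[ref]}      # check 3 -> {'EAP1251/1/1'}
```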

Task 4: Enhance

We enhance our metadata in a number of ways. For example, we import authority codes for languages and scripts, and we assign subject headings and authority records based on keywords and phrases found in the titles and description columns.

Named Entity Extraction

One of Open Refine’s most dynamic features is its ability to connect to other online databases, and thanks to the generous support of Dandelion API, we are able to use its service to identify entities such as people, places, organisations, and titles of works.

In just a few simple steps, Dandelion API reads our metadata and returns new linked data, which we can filter by category. For example, we can list all of the entities it has extracted and categorised as a place or all the entities categorised as people.

Screenshot of Open Refine software

Not every named entity it finds will be accurate. In the above example “Baptism” is clearly not a place. But it is much easier for an archivist to manually validate a list of 29 phrases identified as places, than to read 10,000 scope and content descriptions looking for named entities.
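
Dandelion is normally called from within Open Refine, but the underlying entity extraction API can also be queried directly. Here is a hedged sketch - the endpoint, parameters, and response fields are based on Dandelion's public documentation and should be verified, and you will need your own API token:

```python
import requests

resp = requests.get(
    "https://api.dandelion.eu/datatxt/nex/v1/",
    params={
        "text": "Baptisms recorded at St Mary's Church, Bridgetown",  # sample description
        "include": "types",
        "token": "YOUR_API_TOKEN",  # placeholder
    },
)
for annotation in resp.json().get("annotations", []):
    print(annotation["spot"], annotation.get("types", []), annotation["uri"])
```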

Clustering inconsistencies

If there is inconsistency in the metadata, the returned entities might contain multiple variants. This can be overcome using Open Refine’s clustering feature. This identifies and collates similar phrases and offers the opportunity to merge them into one consistent spelling.

Screenshot of Open Refine software
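
The simplest of Open Refine's clustering methods, key collision with a fingerprint function, can be illustrated in Python (a simplified version of the idea, not Open Refine's exact algorithm):

```python
import re
from collections import defaultdict

def fingerprint(value: str) -> str:
    # Lower-case, strip punctuation, then join the sorted unique tokens
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

clusters = defaultdict(list)
for place in ["Bridgetown, Barbados", "barbados bridgetown", "Speightstown"]:
    clusters[fingerprint(place)].append(place)

merge_candidates = [group for group in clusters.values() if len(group) > 1]
# -> [['Bridgetown, Barbados', 'barbados bridgetown']]
```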

Linked data reconciliation

Having identified and validated a list of entities, we then use other linked data services to help create authority records. For this particular task, we use the Wikidata reconciliation service. Wikidata is a structured data sister project to Wikipedia. And the Open Refine reconciliation service enables us to link an entity in our dataset to its corresponding item in Wikidata, which in turn allows us to pull in additional information from Wikidata relating to that item.

For a South American photograph project we recently catalogued, Dandelion API helped identify 335 people (including actors and performers). By subsequently reconciling these people with their corresponding records in Wikidata, we were able to pull in their job title, date of birth, date of death, unique persistent identifiers, and other details required to create a full authority record for that person.

Screenshot of Open Refine software

Creating individual authority records for 335 people would otherwise take days of work. It is a task that previously we might have deemed infeasible. But Open Refine and Wikidata drastically reduce the human effort required.
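
The reconciliation step can likewise be scripted. A hedged sketch against a public Wikidata reconciliation endpoint (the URL and payload shape follow the reconciliation API that Open Refine uses, but verify them against the service you are connecting to):

```python
import json
import requests

queries = {"q0": {"query": "Carlos Gardel", "type": "Q5"}}  # Q5 = human
resp = requests.get(
    "https://wikidata.reconci.link/en/api",
    params={"queries": json.dumps(queries)},
)
candidates = resp.json()["q0"]["result"]  # candidate matches, ranked by score
best = candidates[0]
print(best["id"], best["name"], best.get("score"))  # a QID from which to pull details
```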

Summary

In many ways, that is the key benefit. By placing Open Refine at the heart of our workflow for processing metadata, it now takes us less time to do more. Our workflow is not perfect. We are constantly finding new ways to improve it. But we now have a semi-automated method for processing large volumes of metadata.

This blog puts just some of those methods in the spotlight. In the interest of brevity, we refrained from providing step-by-step detail. But if there is interest, we will be happy to write further blogs to help others use this as a starting point for their own metadata processing workflows.

16 April 2020

BL Labs Community Commendation Award 2019 - Lesley Phillips - Theatre History


EXPLORING THEATRE HISTORY WITH BRITISH LIBRARY PLAYBILLS AND NEWSPAPERS

Posted on behalf of Lesley Phillips, a former Derbyshire local studies librarian in the UK and BL Labs Community Commendation Award winner for 2019 by Mahendra Mahey, Manager of BL Labs.

Lesley explains how the British Library's digital collections of playbills and digitised newspapers enabled her to compile a detailed account of the career of the actor-manager John Faucit Saville in the East Midlands, 1843-1855.

John Faucit Saville was born in Norwich in 1807, the son of two actors then performing with the Norwich Company as Mr and Mrs Faucit. His parents separated when he was 14 years old and just entering on his stage career. His mother, then a leading actress at Drury Lane, moved in with the celebrated actor William Farren, and continued to perform as Mrs Faucit, while his father became a manager and changed his surname to Saville (his real name).

Oxberry's Dramatic Biography (1825) records his father's grief:

On the evening that the fatal news [of his wife's desertion] reached him [Mr John Faucit] left the theatre and walked over the beach. His lips trembled and he was severely agitated. Many persons addressed him, but he broke from them and went to the house of a particular friend. The facts were then known only to himself. Though a man of temperate habits, he drank upwards of two bottles of wine without being visibly affected. He paced the room and seemed unconscious of the presence of anyone. To his friend's inquiries he made no reply. He once said “My heart is almost broke, but you will soon know why”.

(C.E. Oxberry (ed.) Oxberry's Dramatic Biography and Histrionic Anecdotes. Vol. III (1825) pp. 33-34, Memoir of William Farren)

Despite the rift between his parents, John Faucit Saville had all the advantages that famous friends and relatives could bring in the theatrical world, but during his time as an aspiring actor it soon became clear that he would never be a great star. In 1841 he began to put his energies into becoming a manager, like his father before him. He took a lease of Brighton Theatre in his wife's home town, but struggled to make a success of it.

Like the other managers of his day he was faced with a decline in the fashion for rational amusements and the rise of 'beer and circuses'. This did not deter him from making a further attempt at establishing a theatrical circuit. For this he came to the East Midlands and South Yorkshire, where the decline of the old circuit and the retirement of Thomas Manly had laid the field bare for a new man. Saville must surely have had great confidence in his own ability to be successful here, given that the old, experienced manager had begun to struggle.

Saville took on the ailing circuit, and soon discovered that he was forced to make compromises. He was careful to please the local authorities as to the respectability of his productions, and yet managed to provide more lowbrow entertainments to bring in the audiences. Even so, after a few years he was forced to rein in his ambitions and eventually reduce his circuit, and he even went back on tour as an itinerant actor from time to time to supplement his income. Saville's career had significant implications for the survival of some of the theatres of the East Midlands, as he lived through the final disintegration of the circuit.

Over the years, John Faucit Saville's acting career had taken him to Paris, Edinburgh, and Dublin, as well as many parts of England. Without the use of digital online resources it would be almost impossible to trace a career such as his, to explore his background, and bring together the details of his life and work.

Theatre-royal-brghton
Newspaper article from 29 January 1829 detailing the benefit performance for Mr Faucit entitled 'Clandestine Marriage' at the Theatre Royal Brighton

The digitised newspapers of the British Newspaper Archive https://www.britishnewspaperarchive.co.uk enabled me to uncover the Saville family origins in Bedford, and to follow John Faucit Saville's career from the heights of the London stage, to management at Brighton and then to the Midlands.

Saville-benefit
Newspaper article detailing benefit performance for Mr JF Saville at Theatre Royal Derby on Friday May 23, 1845, play entitled 'Don Caesar de Bazan' or 'Martina the Gypsy'

The dataset of playbills available to download from the British Library web site https://data.bl.uk/playbills/pb1.html enabled me to build up a detailed picture of Saville's work, the performers and plays he used, and the way he used them. It was still necessary to visit some libraries and archives for additional information, but I could never have put together such a rich collection of information without these digital resources.

My research has been put into a self-published book, filled with newspaper reviews of Saville's productions, and stories about his company. This is not just a narrow look at regional theatre; there are also some references to figures of national importance in theatre history. John Faucit Saville's sister, Helen Faucit, was a great star of her day, and his half-brother Henry Farren made his stage debut in Derbyshire with Saville's company. John Faucit Saville's wife Marianne performed with Macready on his farewell tour and also played at Windsor for Queen Victoria. The main interest for me, however, was the way theatre history reveals how national and local events impacted on society and public behaviour, and how the theatre connected with the life of the ordinary working man and woman.

Lesley-phillips-book
Front cover of my self-published book about John Faucit Saville

If you are interested in playbills generally, you might want to help the British Library provide more information about individual ones through a crowdsourcing project entitled 'In the Spotlight'.

 

08 April 2020

Legacies of Catalogue Descriptions and Curatorial Voice: a new AHRC project


This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the School of History, Art History and Philosophy, University of Sussex. James has a background in the history of the printed image, archival theory, art history, and computational analysis. He is author of The Business of Satirical Prints in Late-Georgian England (2017), the first monograph on the infrastructure of the satirical print trade circa 1770-1830, and a member of the Programming Historian team.

I love a good catalogue. Whether describing historic books, personal papers, scientific objects, or works of art, catalogue entries are the stuff of historical research, brief insights into many possible avenues of discovery. As a historian, I am trained to think critically about catalogues and the entries they contain, to remember that they are always crafted by people, institutions, and temporally specific ways of working, and to consider what that reality might do to my understanding of the past those catalogues and entries represent. Recently, I've started to make these catalogues my objects of historical study, to research what they contain, the labour that produced them, and the socio-cultural forces that shaped that labour, with a particular focus on the anglophone printed catalogue circa 1930-1990. One motivation for this is purely historical, to elucidate what I see as an important historical phenomenon. But another is about now, about how those catalogues are used and reused in the digital age. Browse the shelves of a university library and you'll quickly see that circumstances of production are encoded into the architecture of the printed catalogue: title pages, prefaces, fonts, spines, and the quality of paper are all signals of their historical nature. But when their entries - as many have been over the last 30 years - are moved into a database and online, these cues become detached, and their replacement – a bibliographic citation – is insufficient to evoke their historical specificity, and does little to help alert the user to the myriad of texts they are navigating each time they search an online catalogue.

It is these interests and concerns that underpin "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship", a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library. This 12-month project, funded by the Arts and Humanities Research Council, aims to open up new and important directions for computational, critical, and curatorial analysis of collection catalogues. Our pilot research will investigate the temporal and spatial legacy of a catalogue I know well - the landmark ‘Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum’, produced by Mary Dorothy George between 1930 and 1954, 1.1 million words of text to which all scholars of the long-eighteenth-century printed image are indebted, and which forms the basis of many catalogue entries at other institutions, not least those of our partners at the Lewis Walpole Library. We are particularly interested in tracing the temporal and spatial legacies of this catalogue, and plan to repurpose corpus linguistic methods developed in our "Curatorial Voice" project (generously funded by the British Academy) to examine the enduring legacies of Dorothy George's "voice" beyond her printed volumes.

Participants at the Curatorial Voices workshop, working in small groups and drawing images on paper.
Some things we got up to at our February 2019 Curatorial Voice workshop. What a difference a year makes!

But we also want to demonstrate the value of these methods to cultural institutions. Alongside their collections, catalogues are central to the identities and legacies of these institutions. And so we posit that being better able to examine their catalogue data can help cultural institutions get on with important catalogue related work: to target precious cataloguing and curatorial labour towards the records that need the most attention, to produce empirically-grounded guides to best practice, and to enable more critical user engagement with 'legacy' catalogue records (for more info, see our paper ‘Investigating Curatorial Voice with Corpus Linguistic Techniques: the case of Dorothy George and applications in museological practice’, Museum & Society, 2020).

A table with boxes of black and red lines which visualise spatial and non-spatial sentence parts in the descriptions of the satirical prints.
An analysis of our BM Satire Descriptions corpus (see doi.org/10.5281/zenodo.3245037 for how we made it and doi.org/10.5281/zenodo.3245017 for our methods). In this visualization - a snapshot of a bigger interactive - one box represents a single description, red lines are sentence parts marked ‘spatial’, and black lines are sentence parts marked as ‘non-spatial’. This output was based on iterative machine learning analysis with Method52. The data used is published by ResearchSpace under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Over the course of the "Legacies" project, we had hoped to run two capability-building workshops aimed at library, archives, and museum professionals. The first was due to take place at the British Library this May, to test our still very much work-in-progress training module on the computational analysis of catalogue data. Then Covid-19 hit and, like most things in life, the plan had to be dropped.

The new plan is still in development, but the project team know we need input from the community to make the training module as useful as possible to that community. The current plan is to run some ad hoc virtual training sessions on the computational analysis of catalogue data in late summer. And so we are looking for library, archives, and museum professionals who produce or work with catalogue data to be our crash test dummies: to run through parts of the module and tell us what works, what doesn't, and what is missing. If you'd be interested in taking part in one of these training sessions, please email James Baker and tell us why. We look forward to hearing from you.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

06 April 2020

Poetry Mobile Apps


This is a guest post by Pete Hebden, a PhD student at Newcastle University undertaking a practice-led PhD researching and creating a poetry app. Pete recently completed a three-month placement in Contemporary British Published Collections at the British Library, where he assisted curators working with the UK Web Archive, artists' books, and emerging formats collections. You can follow him on Twitter as @Pete_Hebden.

As part of my PhD research, I have been investigating how writers and publishers have used smartphone and tablet devices to present poetry in new ways through mobile apps. In particular, I’m interested in how these new ways of presenting poetry compare to the more familiar format of the printed book. The mobile device allows poets and publishers to create new experiences for readers, incorporating location-based features, interactivity, and multimedia into the encounter with the poem.

Since smartphones and tablet computers became ubiquitous in the early 2010s, a huge range of digital books, e-literature, and literary games have been developed to explore the possibilities of this technology for literature. Projects like Ambient Literature and the work of Editions at Play have explored how mobile technology can transform storytelling and narrative, and my project similarly looks at how this technology can create new experiences of poetic texts.

Below are a few examples of poetry apps released over the past decade. For accessibility reasons, this selection has been limited to apps that can be used anywhere and are free to download. Some of them present work written with the mobile device in mind, while others take existing print work and re-mediate it for the mobile touchscreen.

Puzzling Poetry (iOS and Android, 2016)

Dutch developers Studio Louter worked with multiple poets to create this gamified approach to reading poetry. Existing poems are turned into puzzles to be unlocked word by word, as readers use patterns and themes within each text to figure out where each word should go. Readers often notice new meanings and possibilities that might have been missed in a traditional linear reading.

Screen capture image of the Puzzling Poetry app

This video explains and demonstrates how the Puzzling Poetry app works:

 

Translatory (iOS, 2016)

This app, created by Arc Publications, guides readers in creating their own English translations of contemporary foreign-language poems. By using the digital display to see multiple possible translations of each phrase, the reader gains a fresh understanding of the complex work that goes into literary translation, as well as of the rich layers of meaning within the poem. Readers can save their finished translations and share them through social media from within the app.

Screen capture image of the Translatory app

 

Poetry: The Poetry Foundation app (iOS and Android, 2011)

At nearly a decade old, the Poetry Foundation's Poetry app was one of the first mobile apps dedicated to poetry, and it has been steadily updated by the editors of Poetry magazine ever since. It contains a huge array of both public-domain work and poems published in the magazine over the past century. To help users find their way through this, Poetry's developers created an entertaining and useful interface for discovering poems through unique combinations of themes, selected with a roulette-wheel-style 'spinner'. The app also responds to users shaking their phone with a random selection of poems.

Screen capture image of The Poetry Foundation app

 

ABRA: A Living Text (iOS, 2014)

A collaboration between the poets Amaranth Borsuk and Kate Durbin and the developer Ian Hatcher, the ABRA app presents readers with a range of digital tools (or spells to cast) that transform the text, creating a unique experience for each reader. It is a fun and unusual way to encounter a collection of poems, giving the reader the opportunity to contribute to an ever-shifting, crowd-edited digital poem.

Screen capture image of the ABRA app

The artistic video below demonstrates how the ABRA app works. Painting your finger and thumb gold is not required!

I hope you feel inspired to check out these poetry apps, or maybe even to create your own.