THE BRITISH LIBRARY

Digital scholarship blog

180 posts categorized "Digital scholarship"

04 August 2020

Having a Hoot for International Owl Awareness Day

Add comment

Who doesn’t love owls? Here at the British Library we certainly do.

Often used as a symbol of knowledge, they are the perfect library bird. A little owl is associated and frequently depicted with the Greek goddess of wisdom Athena. The University of Bath even awarded Professor Yoda the European eagle owl a library card in recognition of his valuable service deterring seagulls from nesting on their campus.

The British Library may not have issued a reader pass to an owl (as far as I am aware!), but we do have a wealth of owl sound recordings in our wildlife and environmental sounds collection, you can read about and listen to some of these here.

Little Owl calls recorded by Nigel Tucker in Somerset, England (BL ref 124857)

Owls can also be discovered in our UK Web Archive. Our UK Web Archivists recently examined the Shine dataset to explore which UK owl species is the most popular on the archived .uk domain. Read here to find out which owl is the winner.

They also curate an Online Enthusiast Communities in the UK collection, which features bird watching and some owl related websites in the Animal related hobbies subsection. If you know of websites that you think should be included in this collection, then please fill in their online nomination form.

Here in Digital Scholarship I recently found many fabulous illustrations of owls in our Mechanical Curator Flickr image collection of over a million Public Domain images. So to honour owls on International Owl Awareness Day, I put together an owl album.

These owl illustrations are freely available, without copyright restrictions, for all types of creative projects, including digital collages. My colleague Hannah Nagle blogged about making collages recently and provided this handy guide. For finding more general images of nature for your collages, you may find it useful to browse other Mechanical Curator themed albums, such as Flora & Fauna, as these are rich resources for finding illustrations of trees, plants, animals and birds.

If you creatively use our Mechanical Curator Flickr images, please do share them with us on twitter, using the hashtag #BLdigital, we always love to see what people have done with them. Plus if you use any of our owls today, remember to include the #InternationalOwlAwarenessDay hashtag too!

We also urge you to be eagle-eyed (sorry wrong bird!) and look out for some special animated owls during the 4th August, like this one below, which uses both sounds and images taken from our collections. These have been created by Carlos Rarugal, our arty Assistant Web Archivist and will shared from the WildlifeWeb Archive and Digital Scholarship Twitter accounts. 


Video created by Carlos Rarugal,  using Tawny Owl hoots recorded by Richard Margoschis in Gloucestershire, England (BL ref 09647) and British Library digitised image from page 79 of "Woodland Wild: a selection of descriptive poetry. From various authors. With ... illustrations on steel and wood, after R. Bonheur, J. Bonheur, C. Jacque, Veyrassat, Yan Dargent, and other artists"

One of the benefits of making digital art, is that there is no risks of spilling paint or glue on your furniture! As noted in this tweet from Damyanti Patel "Thanks for the instructions, my kids were entertained & I had no mess to clean up after their art so a clear win win, they really enjoyed looking through the albums". I honestly did not ask them to do this, but it is really cool that her children included this fantastic owl in the centre of one of their digital collages:

I quite enjoy it when my library life and goth life connect! During the covid-19 lockdown I have attended several online club nights. A few months ago I was delighted to see that one of these; How Did I Get Here? Alternative 80s Night! regularly uses the British Library Flickr images to create their event flyers, using illustrations of people in strange predicaments to complement the name of their club; like this sad lady sitting inside a bird cage, in the flyer below.

Their next online event is Saturday 22nd August and you can tune in here. If you are a night owl, you could even make some digital collages, while listening to some great tunes. Sounds like a great night in to me!

Illustration of a woman sitting in a bird cage with a book on the floor just outside the cage
Flyer image for How Did I Get Here? Alternative 80s Night!

This post is by Digital Curator Stella Wisdom (@miss_wisdom

24 July 2020

Ira Aldridge In the Spotlight

Add comment

In this post, Dr Mia Ridge gives a sense of why sightings of Ira Aldridge in our historical playbills collection resonate...

Ira Aldridge is one of the most popular 'celebrity spottings' shared by volunteers working with historical playbills on our In the Spotlight project. Born on this day in New York in 1807, Aldridge was the first Black actor to play a Shakespearean role in Britain.

Portrait of Aldridge by James Northcote
Portrait of Aldridge by James Northcote

Educated at the African Free School and with some experience at the African Grove Theatre in New York, the teenaged Aldridge emigrated to Britain from the US in 1824 and quickly had an impact. In 1826 painter James Northcote captured him as Othello, a portrait which became the first acquisition by the Manchester Art Gallery. (If you're reading this before August 15th, you can attend an online tour exploring his work.)

While his initial reviews were mixed, he took The Times' mocking reference to him as the 'African Roscius' and used both the references to the famous Roman actor and his African ancestry in promotional playbills. Caught up in debates about the abolition of slavery and facing racism in reviews from critics about his performances in London's theatres, Aldridge toured the regions, particularly British cities with anti-slavery sympathies. He performed a range of roles, and his Shakespearean roles eventually including Othello, Shylock, Macbeth, King Lear and Richard III.

From 1852, he toured Europe, particularly Germany, Poland and Russia. This 'List showing the theatres and plays in various European cities where Ira Aldridge, the African Roscius, acted during the years 1827-1867, compiled by Arturo Alfonso Schomburg, shows how widely he travelled and the roles he performed.

As the 1841 playbill from Doncaster's Theatre Royal (below) shows, the tale of his African ancestry grew more creative over time. The playbill also advertises a lecture and memoirs from Aldridge on various topics. In the years around the abolition of slavery in the British Empire, he spoke powerfully and directly to audiences about the injustices of slavery and racism. Playbills like this demonstrate how Aldridge managed to both pander to and play with perceptions of 'the African'.

This is necessarily a very brief overview of Aldridge's life and impact but I hope it's given you a sense of why it's so exciting to catch a glimpse of Aldridge in our collections.

Screenshot of historical playbill

Sources used and further reading include:

My thanks to everyone who suggested references for this post, in particular: Christian Algar, Naomi Billingsley, Nora McGregor, Susan Reed from the British Library; Dorothy Berry from the Houghton Library at Harvard and In the Spotlight participants including beccabooks10, Nosnibor3, Elizabeth Danskin (who shared a link to this video about his daughter, Amanda Aldridge), Nicola Hayes, and Sylvia Morris (who has written extensively about Aldridge on her blog).

Post by Mia Ridge, Digital Curator, Western Heritage Collections.

22 July 2020

World of Wikimedia

Add comment

During recent months of working from home, the Wikimedia family of platforms, including Wikidata and Wikisource, have enabled many librarians and archivists to do meaningful work, to enhance and amplify access to the collections that they curate.

I’ve been very encouraged to learn from other institutions and initiatives who have been working with these platforms. So I recently invited some wonderful speakers to give a “World of Wikimedia” series of remote guest lectures for staff, to inspire my colleagues in the British Library.

Circle of logos from the Wikimedia family of platforms
Logos of the Wikimedia Family of platforms

Stuart Prior from Wikimedia UK kicked off this season with an introduction to Wikimedia and the projects within it, and how it works with galleries, libraries, archives and museums. He was followed by Dr Martin Poulter, who had been the Bodleian Library’s Wikimedian In Residence. Martin shared his knowledge of how books, authors and topics are represented in Wikidata, how Wikidata is used to drive other sites, including Wikipedia, and how Wikipedia combines data and narrative to tell the world about notable books and authors.

Continuing with the theme of books, Gavin Willshaw spoke about the benefits of using Wikisource for optical character recognition (OCR) correction and staff engagement. Giving an overview of the National Library of Scotland’s fantastic project to upload 3,000 digitised Scottish Chapbooks to Wikisource during the Covid-19 lockdown. Focusing on how the project came about, its impact, and how the Library plans to take activity in this area forward in the future.

Illustration of two 18th century men fighting with swords
Tippet is the dandy---o. The toper's advice. Picking lilies. The dying swan, shelfmark L.C.2835(14), from the National Library of Scotland's Scottish Chapbooks collection

Closing the World of Wikimedia season, Adele Vrana and Anasuya Sengupta gave an extremely thought provoking talk about Whose Knowledge? This is a global multilingual campaign, which they co-founded, to centre the knowledges of marginalised communities (the majority of the world) online. Their work includes the annual #VisibleWikiWomen campaign to make women more visible on Wikipedia, which I blogged about recently.

One of the silver linings of the covid-19 lockdown has been that I’ve been able to attend a number of virtual events, which I would not have been able to travel to, if they had been physical events. These have included LD4 Wikidata Affinity Group online meetings; which is a biweekly zoom call on Tuesdays at 9am PDT (5pm BST).

I’ve also remotely attended some excellent online training sessions: “Teaching with Wikipedia: a practical 'how to' workshop” ran by Ewan McAndrew, Wikimedian in Residence at The University of Edinburgh. Also “Wikimedia and Libraries - Running Online Workshops” organised by the Chartered Institute of Library and Information Professionals in Scotland (CILIPS), presented by Dr Sara Thomas, Scotland Programme Coordinator for Wikimedia UK, and previously the Wikimedian in Residence at the Scottish Library and Information Council. From attending the latter, I learned of an online “How to Add Suffragettes & Women Activists to Wikipedia” half day edit-a-thon event taking place on the 4th July organised by Sara, Dr t s Beall and Clare Thompson from the Protests and Suffragettes project, this is a wonderful project, which recovers and celebrates the histories of women activists in Govan, Glasgow.

We have previously held a number of in person Wikipedia edit-a-thon events at the British Library, but this was the first time that I had attended one remotely, via Zoom, so this was a new experience for me. I was very impressed with how it had been organised, using break out rooms for newbies and more experienced editors, including multiple short comfort breaks into the schedule and having very do-able bite size tasks, which were achievable in the time available. They used a comprehensive, but easy to understand, shared spreadsheet for managing the tasks that attendees were working on. This is definitely an approach and a template that I plan to adopt and adapt for any future edit-a-thons I am involved in planning.

Furthermore, it was a very fun and friendly event, the organisers had created We Can [edit]! Zoom background template images for attendees to use, and I learned how to use twinkles on videocalls! This is when attendees raise both hands and wiggle their fingers pointing upwards, to indicate agreement with what is being said, without causing a soundclash. This hand signal has been borrowed it from the American Sign Language word for applause, it is also used by the Green Party and the Occupy Movement.

With enthusiasm fired up from my recent edit-a-thon attending experience, last Saturday I joined the online Wikimedia UK 2020 AGM. Lucy Crompton-Reid, Chief Executive of Wikimedia UK, gave updates on changes in the global Wikimedia movement, such as implementing the 2030 strategy, rebranding Wikimedia, the Universal Code of Conduct and plans for Wikipedia’s 20th birthday. Lucy also announced that three trustees Kelly Foster, Nick Poole and Doug Taylor, who stood for the board were all elected. Nick and Doug have both been on the board since July 2015 and were re-elected. I was delighted to learn that Kelly is a new trustee joining the board for the first time. As Kelly has previously been a trainer at BL Wikipedia edit-a-thon events, and she coached me to create my first Wikipedia article on Coventry godcakes at a Wiki-Food and (mostly) Women edit-a-thon in 2017.

In addition to these updates, Gavin Willshaw, gave a keynote presentation about the NLS Scottish chapbooks Wikisource project that I mentioned earlier, and there were three lightning talks: Andy Mabbett; 'Wiki Hates Newbies', Clare Thompson, Lesley Mitchell and Dr t s Beall; 'Protests and Suffragettes: Highlighting 100 years of women’s activism in Govan, Glasgow, Scotland' and Jason Evans; 'An update from Wales'.

Before the event ended, there was a 2020 Wikimedia UK annual awards announcement, where libraries and librarians did very well indeed:

  • UK Wikimedian of the Year was awarded to librarian Caroline Ball for education work and advocacy at the University of Derby (do admire her amazing Wikipedia dress in the embedded tweet below!)
  • Honourable Mention to Ian Watt for outreach work, training, and efforts around Scotland's COVID-19 data
  • Partnership of the Year was given to National Library of Scotland for the WikiSource chapbooks project led by Gavin Willshaw
  • Honourable Mention to University of Edinburgh for work in education and Wikidata
  • Up and Coming Wikimedian was a joint win to Emma Carroll for work on the Scottish Witch data project and Laura Wood Rose for work at University of Edinburgh and on the Women in Red initiative
  • Michael Maggs was given an Honorary Membership, in recognition of his very significant contribution to the charity over a number of years.

Big congratulations to all the winners. Their fantastic work, and also in Caroline's case, her fashion sense, is inspirational!

For anyone interested, the next online event that I’m planning to attend is a #WCCWiki Colloquium organised by The Women’s Classical Committee, which aims to increase the representation of women classicists on Wikipedia. Maybe I’ll virtually see you there…

This post is by Digital Curator Stella Wisdom (@miss_wisdom

14 July 2020

Legacies of Catalogue Descriptions and Curatorial Voice: Training Sessions

Add comment

This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the University of Sussex.

This month the team behind "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" ran two training sessions as part of our Arts and Humanities Research Council funded project. Each standalone session provided instruction in using the software tool AntConc and approaches from computational linguistics for the purposes of examining catalogue data. The objectives of the sessions were twofold: to test our in-development training materials, and to seek feedback from the community in order to better understand their needs and to develop our training offer.

Rather than host open public training, we decided to foster existing partnerships by inviting a small number of individuals drawn from attendees at events hosted as part of our previous Curatorial Voice project (funded by the British Academy). In total thirteen individuals from the UK and US took part across the two sessions, with representatives from libraries, archives, museums, and galleries.

Screenshot of the website for the lesson entitled Computational Analysis of Catalogue Data

Screenshot of the content page and timetable for the lesson
Carpentries-style lesson about analysing catalogue data in Antconc


The training was delivered in the style of a Software Carpentry workshop, drawing on their wonderful lesson templatepedagogical principles, and rapid response to moving coding and data science instruction online in light of the Covid-19 crisis (see ‘Recommendations for Teaching Carpentries Workshops Online’ and ‘Tips for Teaching Online from The Carpentries Community’). In terms of content, we started with the basics: how to get data into AntConc, the layout of AntConc, and settings in AntConc. After that we worked through two substantial modules. The first focused on how to generate, interact with, and interpret a word list, and this was followed by a module on searching, adapting, and reading concordances. The tasks and content of both modules avoided generic software instruction and instead focused on the analysis of free text catalogue fields, with attendees asked to consider what they might infer about a catalogue from its use of tense, what a high volume of capitalised words might tell us about cataloguing style, and how adverb use might be a useful proxy for the presence of controlled vocabulary.

Screenshot of three tasks and solutions in the Searching Concordances section
Tasks in the Searching Concordances section

Running Carpentries-style training over Zoom was new to me, and was - frankly - very odd. During live coding I missed hearing the clack of keyboards as people followed along in response. I missed seeing the sticky notes go up as people completed the task at hand. During exercises I missed hearing the hubbub that accompanies pair programming. And more generally, without seeing the micro-gestures of concentration, relief, frustration, and joy on the faces of learners, I felt somehow isolated as an instructor from the process of learning.

But from the feedback we received the attendees appear to have been happy. It seems we got the pace right (we assumed teaching online would be slower than face-to-face, and it was). The attendees enjoyed using AntConc and were surprised, to quote one attendees, "to see just how quickly you could draw some conclusions". The breakout rooms we used for exercises were a hit. And importantly we have a clear steer on next steps: that we should pivot to a dataset that better reflects the diversity of catalogue data (for this exercise we used a catalogue of printed images that I know very well), that learners would benefit having a list of suggested readings and resources on corpus linguistics, and that we might - to quote one attendee - provide "more examples up front of the kinds of finished research that has leveraged this style of analysis".

These comments and more will feed into the development of our training materials, which we hope to complete by the end of 2020 and - in line with the open values of the project - is happening in public. In the meantime, the materials are there for the community to use, adapt and build on (more or less) as they wish. Should you take a look and have any thoughts on what we might change or include for the final version, we always appreciate an email or a note on our issue tracker.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library that is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

07 July 2020

Readings at the intersection of digital scholarship and anti-racism

Add comment

Digital Curator Mia Ridge writes, 'It seems a good moment to share some of the articles we've discussed as a primer on how and why technologies and working practices in libraries and digital scholarship are not neutral'.

'Do the best you can until you know better. Then when you know better, do better.'

― Attributed to Maya Angelou 

The Digital Scholarship Reading Group is one of the ways the Digital Research team help British Library staff grapple with emerging technologies and methods that could be used in research and scholarship with collections. Understanding the impact of the biases that new technologies such as AI and machine learning can introduce through algorithmic or data sourcing decisions has been an important aspect of these discussions since the group was founded in 2016. As we began work on what would eventually become the Living with Machines project, our readings became particularly focused on AI and data science, aiming to ensure that we didn't do more harm than good.

Reading is only the start of the anti-racism work we need to do. However, reading and discussing together, and bringing the resulting ideas and questions into discussions about procuring, implementing and prioritising digital platforms in cultural and research institutions is a relatively easy next step.

I've listed the topics under the dates we discussed them, and sometimes added a brief note on how it is relevant to intersectional issues of gender, racism and digital scholarship or commercial digital methods and tools. We always have more to learn about these issues, so we'd love to hear your recommendations for articles or topics (contact details here).


Digitizing and Enhancing Description Across Collections to Make African American Materials More Discoverable on Umbra Search African American History by Dorothy Berry

Abstract: This case study describes a project undertaken at the University of Minnesota Libraries to digitize materials related to African American materials across the Universities holdings, and to highlight materials that are otherwise undiscoverable in existing archival collections. It explores how historical and current archival practices marginalize material relevant to African American history and culture, and how a mass digitization process can attempt to highlight and re-aggregate those materials. The details of the aggregation process — e.g. the need to use standardized vocabularies to increase aggregation even when those standardized vocabularies privilege majority representation — also reveal important issues in mass digitization and aggregation projects involving the history of marginalized groups.

Discussed June 2020.

The Nightmare of Surveillance Capitalism, Shoshana Zuboff

For this Reading Group Session, we will be doing something a little different and discussing a podcast on The Nightmare of Surveillance Capitalism. This podcast is hosted by Talking Politics, and is a discussion with Shoshana Zuboff who has recently published The Age of Surveillance Capitalism (January, 2019). 

For those of you who would also like to bring some reading to the table, we can also consult the reviews of this book  as a way of engaging with reactions to the topic. Listed below are a few examples, but please bring along any reviews that you find to be especially thought provoking:

Discussed November 2019. Computational or algorithmic 'surveillance' and capitalism have clear links to structural inequalities. 

You and AI – Just An Engineer: The Politics of AI (video), Kate Crawford

Kate Crawford, Distinguished Research Professor at New York University, a Principal Researcher at Microsoft Research New York, and the co-founder and co-director the AI Now Institute, discusses the biases built into machine learning, and what that means for the social implications of AI. The talk is the fourth event in the Royal Society’s 2018 series: You and AI. 

Discussed October 2018.

'Facial Recognition Is Accurate, if You’re a White Guy'

Read or watch any one of:

'Facial Recognition Is Accurate, if You’re a White Guy' By Steve Lohr

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification by Joy Buolamwini, Timnit Gebru

Abstract: Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.

How I'm fighting bias in algorithms (TED Talk) by Joy Buolamwini

Abstract: MIT grad student Joy Buolamwini was working with facial analysis software when she noticed a problem: the software didn't detect her face -- because the people who coded the algorithm hadn't taught it to identify a broad range of skin tones and facial structures. Now she's on a mission to fight bias in machine learning, a phenomenon she calls the "coded gaze." It's an eye-opening talk about the need for accountability in coding ... as algorithms take over more and more aspects of our lives.

Discussed April 2018, topic suggested by Adam Farquhar.

Feminist Research Practices and Digital Archives, Michelle Moravec

Abstract: In this article I reflect on the process of conducting historical research in digital archives from a feminist perspective. After reviewing issues that arose in conjunction with the British Library’s digitisation of the feminist magazine Spare Rib in 2013, I offer three questions researchers should consider before consulting materials in a digital archive. Have the individuals whose work appears in these materials consented to this? Whose labour was used and how is it acknowledged? What absences must be attended to among an abundance of materials? Finally, I suggest that researchers should draw on the existing body of scholarship about these issues by librarians and archivists.

Discussed October 2017.

Pedagogies of Race: Digital Humanities in the Age of Ferguson by Amy E. Earhart, Toniesha L. Taylor.

From their introduction: 'we are also invested in the development of a practice-based digital humanities that attends to the crucial issues of race, class, gender, and sexuality in the undergraduate classroom and beyond. Our White Violence, Black Resistance project merges foundational digital humanities approaches with issues of social justice by engaging students and the community in digitizing and interpreting historical moments of racial conflict. The project exemplifies an activist model of grassroots recovery that brings to light timely historical documents at the same time that it exposes power differentials in our own institutional settings and reveals the continued racial violence spanning 1868 Millican, Texas, to 2014 Ferguson, Missouri.'

Discussed August 2017.

Recovering Women’s History with Network Analysis: A Case Study of the Fabian News, Jana Smith Elford

Abstract: Literary study in the digital humanities is not exempt from reproducing historical hierarchies by focusing on major or canonical figures who have already been recognized as important historical or literary figures. However, network analysis of periodical publications may offer an alternative to the biases of human memory, where one has the tendency to pay attention to a recognizable name, rather than one that has had no historical significance. It thus enables researchers to see connections and a wealth of data that has been obscured by traditional recovery methodologies. Machine reading with network analysis can therefore contribute to an alternate understanding of women’s history, one that reinterprets cultural and literary histories that tend to reconstruct gender-based biases. This paper uses network analysis to explore the Fabian News, a late nineteenth-century periodical newsletter produced by the socialist Fabian Society, to recover women activists committed to social and political equality.

Discussed July 2017.

Do Artifacts Have Politics? by Langdon Winner

From the introduction: At issue is the claim that the machines, structures, and systems of modern material culture can be accurately judged not only for their contributions of efficiency and productivity, not merely for their positive and negative environmental side effects, but also for the ways in which they can embody specific forms of power and authority.

Discussed April 2017. A classic text from 1980 that describes how seemingly simple design factors can contribute to structural inequalities.

Critical Questions for Big Data by Danah Boyd & Kate Crawford

Abstract: Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article,we offer six provocations to spark conversations about the issues of Big Data: a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology that provokes extensive utopian and dystopian rhetoric.

Discussed August 2016, suggested by Aquiles Alencar Brayner.

 

This blog post is by Mia Ridge, Digital Curator for Western Heritage Collections and Co-Investigator for Living with Machines. She's on twitter at @mia_out.

06 July 2020

Archivists, Stop Wasting Your Ref-ing Time!

Add comment

“I didn’t get where I am today by manually creating individual catalogue references for thousands of archival records!”

One of the most laborious yet necessary tasks of an archivist is the generation of catalogue references. This was once the bane of my life. But I now have a technological solution, which anyone can download and use for free.

Animated image showing Reference Generator being abbreviated to ReG

Meet ReG: the newest team member of the Endangered Archives Programme (EAP). He’s not as entertaining as Reginald D Hunter. She’s not as lyrical as Regina Spektor. But like 1970s sitcom character Reggie Perrin, ReG provides a logical solution to the daily grind of office life - though less extreme and hopefully more successful.

 

Two pictures of musicians, Reginald Hunter and Regina Spektor
Reginald D Hunter (left),  [Image originally posted by Pete Ashton at https://flickr.com/photos/51035602859@N01/187673692]; Regina Spektor (right), [Image originally posted by Beny Shlevich at https://www.flickr.com/photos/17088109@N00/417238523]

 

Reggie Perrin’s boss CJ was famed for his “I didn’t get where I am today” catchphrase, and as EAP’s resident GJ, I decided to employ my own ReG, without whom I wouldn’t be where I am today. Rather than writing this blog, my eyes would be drowning in metadata, my mind gathering dust, and my ears fleeing from the sound of colleagues and collaborators banging on my door, demanding to know why I’m so far behind in my work.

 

Image of two men at their offices from British sitcom The Rise and Fall of Reginald Perrin
CJ (left) [http://www.leonardrossiter.com/reginaldperrin/12044.jpg] and Reginald Perrin (right) [https://www.imdb.com/title/tt0073990/mediaviewer/rm1649999872] from The Rise and Fall of Reginald Perrin.

 

The problem

EAP metadata is created in spreadsheets by digitisation teams all over the world. It is then processed by the EAP team in London and ingested into the British Library’s cataloguing system.

When I joined EAP in 2018 one of the first projects to process was the Barbados Mercury and Bridgetown Gazette. It took days to create all of the catalogue references for this large newspaper collection, which spans more than 60 years.

Microsoft Excel’s fill down feature helped automate part of this task, but repeating this for thousands of rows is time-consuming and error-prone.

Animated image displaying the autofill procedure being carried out

I needed to find a solution to this.

During 2019 I established new workflows to semi-automate several aspects of the cataloguing process using OpenRefine - but OpenRefine is primarily a data cleaning tool, and its difficulty in understanding hierarchical relationships meant that it was not suitable for this task.

 

Learning to code

For some time I toyed with the idea of learning to write computer code using the Python programming language. I dabbled with free online tutorials. But it was tough to make practical sense of these generic tutorials, hard to find time, and my motivation dwindled.

When the British Library teamed up with The National Archives and Birkbeck University of London to launch a PG Cert in Computing for Information Professionals, I jumped at the chance to take part in the trial run.

It was a leap certainly worth taking because I now have the skills to write code for the purpose of transforming and analysing large volumes of data. And the first product of this new skillset is a computer program that accurately generates catalogue references for thousands of rows of data in mere seconds.

 

The solution - ReG in action

By coincidence, one of the first projects I needed to catalogue after creating this program was another Caribbean newspaper digitised by the same team at the Barbados Archives Department: The Barbadian.

This collection was a similar size and structure to the Barbados Mercury, but the generation of all the catalogue references took just a few seconds. All I needed to do was:

  • Open ReG
  • Enter the project ID for the collection (reference prefix)
  • Enter the filename of the spreadsheet containing the metadata

Animated image showing ReG working to file references

And Bingo! All my references were generated in a new file..

Before and After image explaining 'In just a few seconds, the following transformation took place in the 'Reference' column' showing the new reference names

 

How it works in a nutshell

The basic principle of the program is that it reads a single column in the dataset, which contains the hierarchical information. In the example above, it read the “Level” column.

It then uses this information to calculate the structured numbering of the catalogue references, which it populates in the “Reference” column.

 

Reference format

The generated references conform to the following format:

  • Each reference begins with a prefix that is common to the whole dataset. This is the prefix that the user enters at the start of the program. In the example above, that is “EAP1251”.
  • Forward slashes ( / ) are used to indicate a new hierarchical level.
  • Each record is assigned its own number relative to its sibling records, and that number is shared with all of the children of that record.

 

In the example above, the reference for the first collection is formatted:

Image showing how the reference works: 'EAP1251/1' is the first series

The reference for the first series of the first collection is formatted:

Image showing how the reference works: 'EAP1251/1/1' is the first series of the first collection

The reference for the second series of the first collection is:

Image showing how the reference works: 'EAP1251/1/2' is the second series of the first collection

No matter how complex the hierarchical structure of the dataset, the program will quickly and accurately generate references for every record in accordance with this format.

 

Download for wider re-use

While ReG was designed primarily for use by EAP, it should work for anyone that generates reference numbers using the same format.

For users of the Calm cataloguing software, ReG could be used to complete the “RefNo” column, which determines the tree structure of a collection when a spreadsheet is ingested into Calm.

With wider re-use in mind, some settings can be configured to suit individual requirements

For example, you can configure the names of the columns that ReG reads and generates references in. For EAP, the reference generation column is named “Reference”, but for Calm users, it could be configured as “RefNo”.

Users can also configure their own hierarchy. You have complete freedom to set the hierarchical terms applicable to your institution and complete freedom to set the hierarchical order of those terms.

It is possible that some minor EAP idiosyncrasies might preclude reuse of this program for some users. If this is the case, by all means get in touch; perhaps I can tweak the code to make it more applicable to users beyond EAP - though some tweaks may be more feasible than others.

 

Additional validation features

While generating references is the core function, to that end it includes several validation features to help you spot and correct problems with your data.

Unexpected item in the hierarchy area

For catalogue references to be calculated, all the data in the level column must match a term within the configured hierarchy. The program therefore checks this and if a discrepancy is found, users will be notified and they have two options to proceed.

Option 1: Rename unexpected terms

First, users have the option to rename any unexpected terms. This is useful for correcting typographical errors, such as this example - where “Files” should be “File”.

Animated image showing option 1: renaming unexpected 'files' to 'file'

Before and after image showing the change of 'files' to 'file'

Option 2: Build a one-off hierarchy

Alternatively, users can create a one-off hierarchy that matches the terms in the dataset. In the following example, the unexpected hierarchical term “Specimen” is a bona fide term. It is just not part of the configured hierarchy.

Rather than force the user to quit the program and amend the configuration file, they can simply establish a new, one-off hierarchy within the program.

Animated image showing option 2: adding 'specimen' to the hierarchy under 'file'

This hierarchy will not be saved for future instances. It is just used for this one-off occasion. If the user wants “Specimen” to be recognised in the future, the configuration file will also need to be updated.

 

Single child records

To avoid redundant information, it is sometimes advisable for an archivist to eliminate single child records from a collection. ReG will identify any such records, notify the user, and give them three options to proceed:

  1. Delete single child records
  2. Delete the parents of single child records
  3. Keep the single child records and/or their parents

Depending on how the user chooses to proceed, ReG will produce one of three results, which affects the rows that remain and the structure of the generated references.

In this example, the third series in the original dataset contains a single child - a single file.

Image showing the three possible outcomes to a single child record: A. delete child so it appears just as a series, B. delete parent so it appears just as a file, and C. keep the child record and their parents so it appears as a series followed by a single file

The most notable result is option B, where the parent was deleted. Looking at the “Level” column, the single child now appears to be a sibling of the files from the second series. But the reference number indicates that this file is part of a different branch within the tree structure.

This is more clearly illustrated by the following tree diagrams.

Image showing a tree hierarchy of the three possible outcomes for a single child record: A. a childless series, B. a file at the same level as other series, C. a series with a single child file

This functionality means that ReG will help you spot any single child records that you may otherwise have been unaware of.

But it also gives you a means of creating an appropriate hierarchical structure when cataloguing in a spreadsheet. If you intentionally insert dummy parents for single child records, ReG can generate references that map the appropriate tree structure and then remove the dummy parent records in one seamless process.

 

And finally ...

If you’ve got this far, you probably recognise the problem and have at least a passing interest in finding a solution. If so, please feel free to download the software, give it a go, and get in touch.

If you spot any problems, or have any suggested enhancements, I would welcome your input. You certainly won’t be wasting my time - and you might just save some of yours.

 

Download links

For making this possible, I am particularly thankful to Jody Butterworth, Sam van Schaik, Nora McGregor, Stelios Sotiriadis, and Peter Wood.

This blog post is by Dr Graham Jevon, Endangered Archives Programme cataloguer. He is on twitter as @GJHistory.

12 June 2020

Making Watermarks Visible: A Collaborative Project between Conservation and Imaging

Add comment

Some of the earliest documents being digitised by the British Library Qatar Foundation Partnership are a series of ship’s journals dating from 1605 - 1705, relating to the East India Company’s voyages. Whilst working with these documents, conservators Heather Murphy and Camille Dekeyser-Thuet noticed within the papers a series of interesting examples of early watermark design. Curious about the potential information these could give regarding the journals, Camille and Heather began undertaking research, hoping to learn more about the date and provenance of the papers, trade and production patterns involved in the paper industry of the time, and the practice of watermarking paper. There is a wealth of valuable and interesting information to be gained from the study of watermarks, especially within a project such as the BLQFP which provides the opportunity for study within both IOR and Arabic manuscript material. We hope to publish more information relating to this online with the Qatar Digital Library in the form of Expert articles and visual content.

The first step within this project involved tracing the watermark designs with the help of a light sheet in order to begin gathering a collection of images to form the basis of further research. It was clear that in order to make the best possible use of the visual information contained within these watermarks, they would need to be imaged in a way which would make them available to audiences in both a visually appealing and academically beneficial form, beyond the capabilities of simply hand tracing the designs.

Hand tracings of the watermark designs
Hand tracings of the watermark designs

 

This began a collaboration with two members of the BLQFP imaging team, Senior Imaging Technician Jordi Clopes-Masjuan and Senior Imaging Support Technician Matt Lee, who, together with Heather and Camille, were able to devise and facilitate a method of imaging and subsequent editing which enabled new access to the designs. The next step involved the construction of a bespoke support made from Vivak (commonly used for exhibition mounts and stands). This inert plastic is both pliable and transparent, which allowed the simultaneous backlighting and support of the journal pages required to successfully capture the watermarks.

Creation of the Vivak support
Creation of the Vivak support
Imaging of pages using backlighting
Imaging of pages using backlighting
Studio setup for capturing the watermarks
Studio setup for capturing the watermarks

 

Before capturing, Jordi suggested we create two comparison images of the watermarks. This involved capturing the watermarks as they normally appear on the digitised image (almost or completely invisible), and how they appear illuminated when the page is backlit. The theory behind this was quite simple: “to obtain two consecutive images from the same folio, in the exact same position, but using a specific light set-up for each image”.

By doing so, the idea was for the first image to appear in the same way as the standard, searchable images on the QDL portal. To create these standard image captures, the studio lights were placed near the camera with incident light towards the document.

The second image was taken immediately after, but this time only backlight was used (light behind the document). In using these two different lighting techniques, the first image allowed us to see the content of the document, but the second image revealed the texture and character of the paper, including conservation marks, possible corrections to the writing, as well as the watermarks.

One unexpected occurrence during imaging was, due to the varying texture and thickness of the papers, the power of the backlight had to be re-adjusted for each watermark.

First image taken under normal lighting conditions
First image taken under normal lighting conditions 
Second image of the same page taken using backlighting
Second image of the same page taken using backlighting 

https://www.qdl.qa/en/archive/81055/vdc_100000001273.0x000342

 

Previous to our adopted approach, other imaging techniques were also investigated: 

  • Multispectral photography: by capturing the same folio under different lights (from UV to IR) the watermarks, along with other types of hidden content such as faded ink, would appear. However, it was decided that this process would take too long for the number of watermarks we were aiming to capture.
  • Light sheet: Although these types of light sheets are extremely slim and slightly flexible, we experienced some issues when trying the double capture, as on many occasions the light sheet was not flexible enough, and was “moving” the page when trying to reach the gutter (for successful final presentation of the images it was mandatory that the folio on both captures was still).

Once we had successfully captured the images, Photoshop proved vital in allowing us to increase the contrast of the watermark and make it more visible. Because every image captured was different, the approach to edit the images was also different. This required varying adjustments of levels, curves, saturation or brightness, and combining these with different fusion modes to attain the best result. In the end, the tools used were not as important as the final image. The last stage within Photoshop was for both images of the same folio to be cropped and exported with the exact same settings, allowing the comparative images to match as precisely as possible.

The next step involved creating a digital line drawing of each watermark. Matt Lee, a Senior Imaging Support Technician, imported the high-resolution image captures onto an iPad and used the Procreate drawing app to trace the watermarks with a stylus pen. To develop an approach that provided accurate and consistent results, Matt first tested brushes and experimented with line qualities and thicknesses. Selecting the Dry Ink brush, he traced the light outlines of each watermark on a separate transparent layer. The tracings were initially drawn in white to highlight the designs on paper and these were later inverted to create black line drawings that were edited and refined.

Tracing the watermarks directly from the screen of an iPad provided a level of accuracy and efficiency that would be difficult to achieve on a computer with a graphics tablet, trackpad or computer mouse. There were several challenges in tracing the watermarks from the image captures. For example, the technique employed by Jordi was very effective in highlighting the watermarks, but it also made the laid and chain lines in the paper more prominent and these would merge or overlap with the light outline of the design.

Some of the watermarks also appeared distorted, incomplete or had handwritten text on the paper which obscured the details of the design. It was important that the tracings were accurate and some gaps had to be left. However, through the drawing process, the eye began to pick out more detail and the most exciting moment was when a vague outline of a horse revealed itself to be a unicorn with inset lettering.

Vector image of unicorn watermark
Vector image of unicorn watermark

 

In total 78 drawings of varying complexity and design were made for this project. To preserve the transparent backgrounds of the drawings, they were exported first as PNG files. These were then imported into Adobe Illustrator and converted to vector drawings that can be viewed at a larger size without loss of image quality.

Vector image of watermark featuring heraldic designs(Drawing)
Vector image of watermark featuring heraldic designs

 

Once the drawings were complete, we now had three images - the ‘traditional view’ (the page as it would normally appear), the ‘translucid view’ (the same page backlit and showing the watermark) and the ‘translucid + white view’ (the translucid view plus additional overlay of the digitally traced watermark in place on the page).

Traditional view
Traditional view
Translucid view
Translucid view
Translucid view with watermark highlighted by digital tracingtranslucid+white view
Translucid view with watermark highlighted by digital tracing

 

Jordi was able to take these images and, by using a multiple slider tool, was able to display them on an offline website. This enabled us to demonstrate this tool to our team and present the watermarks in the way we had been wishing from the beginning, allowing people to both study and appreciate the designs.

Watermarks Project Animated GIF

 

This is a guest post by Heather Murphy, Conservator, Jordi Clopes-Masjuan, Senior Imaging Technician and Matt Lee, Senior Imaging Support Technician from the British Library Qatar Foundation Partnership. You can follow the British Library Qatar Foundation Partnership on Twitter at @BLQatar.

 

10 June 2020

International Conference on Interactive Digital Storytelling 2020: Call for Papers, Posters and Interactive Creative Works

Add comment

It has been heartening to see many joyful responses to our recent post featuring The British Library Simulator; an explorable, miniature, virtual version of the British Library’s building in St Pancras.

If you would like to learn more about our Emerging Formats research, which is informing our work in collecting examples of complex digital publications, including works made with Bitsy, then my colleague Giulia Carla Rossi (who built the Bitsy Library) is giving a Leeds Libraries Tech Talk on Digital Literature and Interactive Storytelling this Thursday, 11th June at 12 noon, via Zoom.

Giulia will be joined by Leeds Libraries Central Collections Manager, Rhian Isaac, who will showcase some of Leeds Libraries exciting collections, and also Izzy Bartley, Digital Learning Officer from Leeds Museums and Galleries, who will talk about her role in making collections interactive and accessible. Places are free, but please book here.

If you are a researcher, or writer/artist/maker, of experimental interactive digital stories, then you may want to check out the current call for submissions for The International Conference on Interactive Digital Storytelling (ICIDS), organised by the Association for Research in Digital Interactive Narratives, a community of academics and practitioners concerned with the advancement of all forms of interactive narrative. The deadline for proposing Research Papers, Exhibition Submissions, Posters and Demos, has been extended to the 26th June 2020, submissions can be made via the ICIDS 2020 EasyChair Site.

The ICIDS 2020 dates, 3-6 November, on a photograph of Bournemouth beach

ICIDS showcases and shares research and practice in game narrative and interactive storytelling, including the theoretical, technological, and applied design practices. It is an interdisciplinary gathering that combines computational narratology, narrative systems, storytelling technology, humanities-inspired theoretical inquiry, empirical research and artistic expression.

For 2020, the special theme is Interactive Digital Narrative Scholarship, and ICIDS will be hosted by the Department of Creative Technology of Bournemouth University (also hosts of the New Media Writing Prize, which I have blogged about previously). Their current intention is to host a mixed virtual and physical conference. They are hoping that the physical meeting will still take place, but all talks and works will also be made available virtually for those who are unable to attend physically due to the COVID-19 situation. This means that if you submit work, you will still need to register and present your ideas, but for those who are unable to travel to Bournemouth, the conference organisers will be making allowances for participants to contribute virtually.

ICIDS also includes a creative exhibition, showcasing interactive digital artworks, which for 2020 will explore the curatorial theme “Texts of Discomfort”. The exhibition call is currently seeking Interactive digital art works that generate discomfort through their form and/or their content, which may also inspire radical changes in the way we perceive the world.

Creatives are encouraged to mix technologies, narratives, points of view, to create interactive digital artworks that unsettle interactors’ assumptions by tackling the world’s global issues; and/or to create artworks that bring to a crisis interactors’ relation with language, that innovate in their way to intertwine narrative and technology. Artworks can include, but are not limited to:

  • Augmented, mixed and virtual reality works
  • Computer games
  • Interactive installations
  • Mobile and location-based works
  • Screen-based computational works
  • Web-based works
  • Webdocs and interactive films
  • Transmedia works

Submissions to the ICIDS art exhibition should be made using this form by 26th June. Any questions should be sent to icids2020arts@gmail.com. Good luck!

This post is by Digital Curator Stella Wisdom (@miss_wisdom