Digital scholarship blog

24 July 2020

Ira Aldridge In the Spotlight

In this post, Dr Mia Ridge gives a sense of why sightings of Ira Aldridge in our historical playbills collection resonate...

Ira Aldridge is one of the most popular 'celebrity spottings' shared by volunteers working with historical playbills on our In the Spotlight project. Born on this day in New York in 1807, Aldridge was the first Black actor to play a Shakespearean role in Britain.

Portrait of Aldridge by James Northcote

Educated at the African Free School and with some experience at the African Grove Theatre in New York, the teenaged Aldridge emigrated to Britain from the US in 1824 and quickly had an impact. In 1826 painter James Northcote captured him as Othello, a portrait which became the first acquisition by the Manchester Art Gallery. (If you're reading this before August 15th, you can attend an online tour exploring his work.)

While his initial reviews were mixed, he took The Times' mocking reference to him as the 'African Roscius' and used both the allusion to the famous Roman actor and his African ancestry in promotional playbills. Caught up in debates about the abolition of slavery, and facing racism in critics' reviews of his performances in London's theatres, Aldridge toured the regions, particularly British cities with anti-slavery sympathies. He performed a range of roles, and his Shakespearean parts eventually included Othello, Shylock, Macbeth, King Lear and Richard III.

From 1852, he toured Europe, particularly Germany, Poland and Russia. This 'List showing the theatres and plays in various European cities where Ira Aldridge, the African Roscius, acted during the years 1827-1867', compiled by Arturo Alfonso Schomburg, shows how widely he travelled and the roles he performed.

As the 1841 playbill from Doncaster's Theatre Royal (below) shows, the tale of his African ancestry grew more creative over time. The playbill also advertises a lecture and memoirs from Aldridge on various topics. In the years around the abolition of slavery in the British Empire, he spoke powerfully and directly to audiences about the injustices of slavery and racism. Playbills like this demonstrate how Aldridge managed to both pander to and play with perceptions of 'the African'.

This is necessarily a very brief overview of Aldridge's life and impact, but I hope it's given you a sense of why it's so exciting to catch a glimpse of him in our collections.

Screenshot of historical playbill


My thanks to everyone who suggested references for this post, in particular: Christian Algar, Naomi Billingsley, Nora McGregor and Susan Reed from the British Library; Dorothy Berry from the Houghton Library at Harvard; and In the Spotlight participants including beccabooks10, Nosnibor3, Elizabeth Danskin (who shared a link to this video about his daughter, Amanda Aldridge), Nicola Hayes, and Sylvia Morris (who has written extensively about Aldridge on her blog).

Post by Mia Ridge, Digital Curator, Western Heritage Collections.

22 July 2020

World of Wikimedia

During recent months of working from home, the Wikimedia family of platforms, including Wikidata and Wikisource, has enabled many librarians and archivists to do meaningful work, enhancing and amplifying access to the collections that they curate.

I’ve been very encouraged to learn from other institutions and initiatives who have been working with these platforms. So I recently invited some wonderful speakers to give a “World of Wikimedia” series of remote guest lectures for staff, to inspire my colleagues in the British Library.

Logos of the Wikimedia family of platforms

Stuart Prior from Wikimedia UK kicked off this season with an introduction to Wikimedia and the projects within it, and how it works with galleries, libraries, archives and museums. He was followed by Dr Martin Poulter, who had been the Bodleian Library’s Wikimedian In Residence. Martin shared his knowledge of how books, authors and topics are represented in Wikidata, how Wikidata is used to drive other sites, including Wikipedia, and how Wikipedia combines data and narrative to tell the world about notable books and authors.

Continuing with the theme of books, Gavin Willshaw spoke about the benefits of using Wikisource for optical character recognition (OCR) correction and staff engagement. He gave an overview of the National Library of Scotland’s fantastic project to upload 3,000 digitised Scottish chapbooks to Wikisource during the Covid-19 lockdown, focusing on how the project came about, its impact, and how the Library plans to take this activity forward in the future.

Illustration of two 18th century men fighting with swords
Tippet is the dandy---o. The toper's advice. Picking lilies. The dying swan, shelfmark L.C.2835(14), from the National Library of Scotland's Scottish Chapbooks collection

Closing the World of Wikimedia season, Adele Vrana and Anasuya Sengupta gave an extremely thought-provoking talk about Whose Knowledge?, a global multilingual campaign, which they co-founded, to centre the knowledges of marginalised communities (the majority of the world) online. Their work includes the annual #VisibleWikiWomen campaign to make women more visible on Wikipedia, which I blogged about recently.

One of the silver linings of the Covid-19 lockdown has been that I’ve been able to attend a number of virtual events which I would not have been able to travel to had they been physical events. These have included the LD4 Wikidata Affinity Group online meetings, a biweekly Zoom call on Tuesdays at 9am PDT (5pm BST).

I’ve also remotely attended some excellent online training sessions: “Teaching with Wikipedia: a practical 'how to' workshop”, run by Ewan McAndrew, Wikimedian in Residence at The University of Edinburgh, and “Wikimedia and Libraries - Running Online Workshops”, organised by the Chartered Institute of Library and Information Professionals in Scotland (CILIPS) and presented by Dr Sara Thomas, Scotland Programme Coordinator for Wikimedia UK and previously the Wikimedian in Residence at the Scottish Library and Information Council. From attending the latter, I learned of an online “How to Add Suffragettes & Women Activists to Wikipedia” half-day edit-a-thon taking place on 4th July, organised by Sara, Dr t s Beall and Clare Thompson from the Protests and Suffragettes project, a wonderful project which recovers and celebrates the histories of women activists in Govan, Glasgow.

We have previously held a number of in-person Wikipedia edit-a-thon events at the British Library, but this was the first time that I had attended one remotely, via Zoom, so this was a new experience for me. I was very impressed with how it was organised: using breakout rooms for newbies and more experienced editors, building multiple short comfort breaks into the schedule, and setting very doable, bite-size tasks which were achievable in the time available. They used a comprehensive but easy-to-understand shared spreadsheet for managing the tasks that attendees were working on. This is definitely an approach and a template that I plan to adopt and adapt for any future edit-a-thons I am involved in planning.

Furthermore, it was a very fun and friendly event: the organisers had created We Can [edit]! Zoom background template images for attendees to use, and I learned how to use twinkles on video calls! This is when attendees raise both hands and wiggle their fingers pointing upwards, to indicate agreement with what is being said without causing a soundclash. This hand signal is borrowed from the American Sign Language sign for applause, and it is also used by the Green Party and the Occupy movement.

With enthusiasm fired up from my recent edit-a-thon experience, last Saturday I joined the online Wikimedia UK 2020 AGM. Lucy Crompton-Reid, Chief Executive of Wikimedia UK, gave updates on changes in the global Wikimedia movement, such as implementing the 2030 strategy, rebranding Wikimedia, the Universal Code of Conduct and plans for Wikipedia’s 20th birthday. Lucy also announced that the three trustees who stood for the board, Kelly Foster, Nick Poole and Doug Taylor, were all elected. Nick and Doug have both been on the board since July 2015 and were re-elected, and I was delighted to learn that Kelly is a new trustee joining the board for the first time. Kelly has previously been a trainer at BL Wikipedia edit-a-thon events, and she coached me to create my first Wikipedia article, on Coventry godcakes, at a Wiki-Food and (mostly) Women edit-a-thon in 2017.

In addition to these updates, Gavin Willshaw gave a keynote presentation about the NLS Scottish chapbooks Wikisource project that I mentioned earlier, and there were three lightning talks: Andy Mabbett, 'Wiki Hates Newbies'; Clare Thompson, Lesley Mitchell and Dr t s Beall, 'Protests and Suffragettes: Highlighting 100 years of women’s activism in Govan, Glasgow, Scotland'; and Jason Evans, 'An update from Wales'.

Before the event ended, there was a 2020 Wikimedia UK annual awards announcement, where libraries and librarians did very well indeed:

  • UK Wikimedian of the Year was awarded to librarian Caroline Ball for education work and advocacy at the University of Derby (do admire her amazing Wikipedia dress in the embedded tweet below!)
  • Honourable Mention to Ian Watt for outreach work, training, and efforts around Scotland's COVID-19 data
  • Partnership of the Year was given to the National Library of Scotland for the Wikisource chapbooks project led by Gavin Willshaw
  • Honourable Mention to University of Edinburgh for work in education and Wikidata
  • Up and Coming Wikimedian was a joint win to Emma Carroll for work on the Scottish Witch data project and Laura Wood Rose for work at University of Edinburgh and on the Women in Red initiative
  • Michael Maggs was given an Honorary Membership, in recognition of his very significant contribution to the charity over a number of years.

Big congratulations to all the winners. Their fantastic work, and in Caroline's case also her fashion sense, is inspirational!

For anyone interested, the next online event that I’m planning to attend is a #WCCWiki Colloquium organised by The Women’s Classical Committee, which aims to increase the representation of women classicists on Wikipedia. Maybe I’ll virtually see you there…

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

14 July 2020

Legacies of Catalogue Descriptions and Curatorial Voice: Training Sessions

This guest post is by James Baker, Senior Lecturer in Digital History and Archives at the University of Sussex.

This month the team behind "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" ran two training sessions as part of our Arts and Humanities Research Council funded project. Each standalone session provided instruction in using the software tool AntConc and approaches from computational linguistics for the purposes of examining catalogue data. The objectives of the sessions were twofold: to test our in-development training materials, and to seek feedback from the community in order to better understand their needs and to develop our training offer.

Rather than host open public training, we decided to foster existing partnerships by inviting a small number of individuals drawn from attendees at events hosted as part of our previous Curatorial Voice project (funded by the British Academy). In total thirteen individuals from the UK and US took part across the two sessions, with representatives from libraries, archives, museums, and galleries.

Screenshot of the website for the lesson entitled Computational Analysis of Catalogue Data

Screenshot of the content page and timetable for the lesson
Carpentries-style lesson about analysing catalogue data in AntConc


The training was delivered in the style of a Software Carpentry workshop, drawing on their wonderful lesson template, pedagogical principles, and rapid response to moving coding and data science instruction online in light of the Covid-19 crisis (see ‘Recommendations for Teaching Carpentries Workshops Online’ and ‘Tips for Teaching Online from The Carpentries Community’). In terms of content, we started with the basics: how to get data into AntConc, the layout of AntConc, and settings in AntConc. After that we worked through two substantial modules. The first focused on how to generate, interact with, and interpret a word list, and this was followed by a module on searching, adapting, and reading concordances. The tasks and content of both modules avoided generic software instruction and instead focused on the analysis of free-text catalogue fields, with attendees asked to consider what they might infer about a catalogue from its use of tense, what a high volume of capitalised words might tell us about cataloguing style, and how adverb use might be a useful proxy for the presence of controlled vocabulary.
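For readers who want to poke at the same questions outside AntConc's GUI, here is a rough Python equivalent of the word-list and concordance ideas. This is purely illustrative and not part of the lesson materials: the file name is hypothetical, and the capitalisation and '-ly' counts are crude stand-ins for the styles of analysis described above.

    import re
    from collections import Counter

    # A plain-text export of free-text catalogue fields (hypothetical file)
    with open("catalogue_descriptions.txt", encoding="utf-8") as f:
        text = f.read()

    tokens = re.findall(r"[A-Za-z']+", text)

    # Word list: token frequencies, as in AntConc's Word List tool
    word_list = Counter(t.lower() for t in tokens)
    print(word_list.most_common(20))

    # Crude proxies for the signals discussed in the modules
    capitalised = sum(t[0].isupper() for t in tokens)
    adverbs = sum(t.lower().endswith("ly") for t in tokens)
    print(f"{capitalised / len(tokens):.1%} capitalised, "
          f"{adverbs / len(tokens):.1%} '-ly' words")

    # Concordance: keyword in context, as in AntConc's Concordance tool
    def concordance(tokens, keyword, window=5):
        for i, t in enumerate(tokens):
            if t.lower() == keyword:
                yield " ".join(tokens[max(0, i - window):i + window + 1])

    for line in concordance(tokens, "engraving"):
        print(line)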

Screenshot of three tasks and solutions in the Searching Concordances section
Tasks in the Searching Concordances section

Running Carpentries-style training over Zoom was new to me, and was - frankly - very odd. During live coding I missed hearing the clack of keyboards as people followed along in response. I missed seeing the sticky notes go up as people completed the task at hand. During exercises I missed hearing the hubbub that accompanies pair programming. And more generally, without seeing the micro-gestures of concentration, relief, frustration, and joy on the faces of learners, I felt somehow isolated as an instructor from the process of learning.

But from the feedback we received, the attendees appear to have been happy. It seems we got the pace right (we assumed teaching online would be slower than face-to-face, and it was). The attendees enjoyed using AntConc and were surprised, to quote one attendee, "to see just how quickly you could draw some conclusions". The breakout rooms we used for exercises were a hit. And importantly, we have a clear steer on next steps: that we should pivot to a dataset that better reflects the diversity of catalogue data (for this exercise we used a catalogue of printed images that I know very well), that learners would benefit from having a list of suggested readings and resources on corpus linguistics, and that we might - to quote one attendee - provide "more examples up front of the kinds of finished research that has leveraged this style of analysis".

These comments and more will feed into the development of our training materials, which we hope to complete by the end of 2020 and - in line with the open values of the project - is happening in public. In the meantime, the materials are there for the community to use, adapt and build on (more or less) as they wish. Should you take a look and have any thoughts on what we might change or include for the final version, we always appreciate an email or a note on our issue tracker.

"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" is a collaboration between the Sussex Humanities Lab, the British Library, and Yale University Library that is funded under the Arts and Humanities Research Council (UK) “UK-US Collaboration for Digital Scholarship in Cultural Institutions: Partnership Development Grants” scheme. Project Reference AH/T013036/1.

07 July 2020

Readings at the intersection of digital scholarship and anti-racism

Digital Curator Mia Ridge writes, 'It seems a good moment to share some of the articles we've discussed as a primer on how and why technologies and working practices in libraries and digital scholarship are not neutral'.

'Do the best you can until you know better. Then when you know better, do better.'

― Attributed to Maya Angelou 

The Digital Scholarship Reading Group is one of the ways the Digital Research team help British Library staff grapple with emerging technologies and methods that could be used in research and scholarship with collections. Understanding the impact of the biases that new technologies such as AI and machine learning can introduce through algorithmic or data sourcing decisions has been an important aspect of these discussions since the group was founded in 2016. As we began work on what would eventually become the Living with Machines project, our readings became particularly focused on AI and data science, aiming to ensure that we didn't do more harm than good.

Reading is only the start of the anti-racism work we need to do. However, reading and discussing together, and bringing the resulting ideas and questions into discussions about procuring, implementing and prioritising digital platforms in cultural and research institutions is a relatively easy next step.

I've listed the topics under the dates we discussed them, and sometimes added a brief note on how it is relevant to intersectional issues of gender, racism and digital scholarship or commercial digital methods and tools. We always have more to learn about these issues, so we'd love to hear your recommendations for articles or topics (contact details here).


Digitizing and Enhancing Description Across Collections to Make African American Materials More Discoverable on Umbra Search African American History by Dorothy Berry

Abstract: This case study describes a project undertaken at the University of Minnesota Libraries to digitize materials related to African American history across the University's holdings, and to highlight materials that are otherwise undiscoverable in existing archival collections. It explores how historical and current archival practices marginalize material relevant to African American history and culture, and how a mass digitization process can attempt to highlight and re-aggregate those materials. The details of the aggregation process — e.g. the need to use standardized vocabularies to increase aggregation even when those standardized vocabularies privilege majority representation — also reveal important issues in mass digitization and aggregation projects involving the history of marginalized groups.

Discussed June 2020.

The Nightmare of Surveillance Capitalism, Shoshana Zuboff

For this Reading Group session, we will be doing something a little different and discussing a podcast on The Nightmare of Surveillance Capitalism. This podcast is hosted by Talking Politics, and is a discussion with Shoshana Zuboff, who recently published The Age of Surveillance Capitalism (January 2019).

For those of you who would also like to bring some reading to the table, we can also consult reviews of the book as a way of engaging with reactions to the topic. Listed below are a few examples, but please bring along any reviews that you find to be especially thought-provoking:

Discussed November 2019. Computational or algorithmic 'surveillance' and capitalism have clear links to structural inequalities. 

You and AI – Just An Engineer: The Politics of AI (video), Kate Crawford

Kate Crawford, Distinguished Research Professor at New York University, Principal Researcher at Microsoft Research New York, and co-founder and co-director of the AI Now Institute, discusses the biases built into machine learning, and what that means for the social implications of AI. The talk is the fourth event in the Royal Society’s 2018 series: You and AI.

Discussed October 2018.

'Facial Recognition Is Accurate, if You’re a White Guy'

Read or watch any one of:

'Facial Recognition Is Accurate, if You’re a White Guy' By Steve Lohr

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification by Joy Buolamwini, Timnit Gebru

Abstract: Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.

How I'm fighting bias in algorithms (TED Talk) by Joy Buolamwini

Abstract: MIT grad student Joy Buolamwini was working with facial analysis software when she noticed a problem: the software didn't detect her face -- because the people who coded the algorithm hadn't taught it to identify a broad range of skin tones and facial structures. Now she's on a mission to fight bias in machine learning, a phenomenon she calls the "coded gaze." It's an eye-opening talk about the need for accountability in coding ... as algorithms take over more and more aspects of our lives.

Discussed April 2018, topic suggested by Adam Farquhar.

Feminist Research Practices and Digital Archives, Michelle Moravec

Abstract: In this article I reflect on the process of conducting historical research in digital archives from a feminist perspective. After reviewing issues that arose in conjunction with the British Library’s digitisation of the feminist magazine Spare Rib in 2013, I offer three questions researchers should consider before consulting materials in a digital archive. Have the individuals whose work appears in these materials consented to this? Whose labour was used and how is it acknowledged? What absences must be attended to among an abundance of materials? Finally, I suggest that researchers should draw on the existing body of scholarship about these issues by librarians and archivists.

Discussed October 2017.

Pedagogies of Race: Digital Humanities in the Age of Ferguson by Amy E. Earhart, Toniesha L. Taylor.

From their introduction: 'we are also invested in the development of a practice-based digital humanities that attends to the crucial issues of race, class, gender, and sexuality in the undergraduate classroom and beyond. Our White Violence, Black Resistance project merges foundational digital humanities approaches with issues of social justice by engaging students and the community in digitizing and interpreting historical moments of racial conflict. The project exemplifies an activist model of grassroots recovery that brings to light timely historical documents at the same time that it exposes power differentials in our own institutional settings and reveals the continued racial violence spanning 1868 Millican, Texas, to 2014 Ferguson, Missouri.'

Discussed August 2017.

Recovering Women’s History with Network Analysis: A Case Study of the Fabian News, Jana Smith Elford

Abstract: Literary study in the digital humanities is not exempt from reproducing historical hierarchies by focusing on major or canonical figures who have already been recognized as important historical or literary figures. However, network analysis of periodical publications may offer an alternative to the biases of human memory, where one has the tendency to pay attention to a recognizable name, rather than one that has had no historical significance. It thus enables researchers to see connections and a wealth of data that has been obscured by traditional recovery methodologies. Machine reading with network analysis can therefore contribute to an alternate understanding of women’s history, one that reinterprets cultural and literary histories that tend to reconstruct gender-based biases. This paper uses network analysis to explore the Fabian News, a late nineteenth-century periodical newsletter produced by the socialist Fabian Society, to recover women activists committed to social and political equality.

Discussed July 2017.

Do Artifacts Have Politics? by Langdon Winner

From the introduction: At issue is the claim that the machines, structures, and systems of modern material culture can be accurately judged not only for their contributions of efficiency and productivity, not merely for their positive and negative environmental side effects, but also for the ways in which they can embody specific forms of power and authority.

Discussed April 2017. A classic text from 1980 that describes how seemingly simple design factors can contribute to structural inequalities.

Critical Questions for Big Data by Danah Boyd & Kate Crawford

Abstract: Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article, we offer six provocations to spark conversations about the issues of Big Data: a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology that provokes extensive utopian and dystopian rhetoric.

Discussed August 2016, suggested by Aquiles Alencar Brayner.

 

This blog post is by Mia Ridge, Digital Curator for Western Heritage Collections and Co-Investigator for Living with Machines. She's on twitter at @mia_out.

06 July 2020

Archivists, Stop Wasting Your Ref-ing Time!

“I didn’t get where I am today by manually creating individual catalogue references for thousands of archival records!”

One of the most laborious yet necessary tasks of an archivist is the generation of catalogue references. This was once the bane of my life. But I now have a technological solution, which anyone can download and use for free.

Animated image showing Reference Generator being abbreviated to ReG

Meet ReG: the newest team member of the Endangered Archives Programme (EAP). He’s not as entertaining as Reginald D Hunter. She’s not as lyrical as Regina Spektor. But like 1970s sitcom character Reggie Perrin, ReG provides a logical solution to the daily grind of office life - though less extreme and hopefully more successful.

 

Two pictures of musicians, Reginald Hunter and Regina Spektor
Reginald D Hunter (left),  [Image originally posted by Pete Ashton at https://flickr.com/photos/51035602859@N01/187673692]; Regina Spektor (right), [Image originally posted by Beny Shlevich at https://www.flickr.com/photos/17088109@N00/417238523]

 

Reggie Perrin’s boss CJ was famed for his “I didn’t get where I am today” catchphrase, and as EAP’s resident GJ, I decided to employ my own ReG, without whom I wouldn’t be where I am today. Rather than writing this blog, my eyes would be drowning in metadata, my mind gathering dust, and my ears fleeing from the sound of colleagues and collaborators banging on my door, demanding to know why I’m so far behind in my work.

 

Image of two men at their offices from British sitcom The Rise and Fall of Reginald Perrin
CJ (left) [http://www.leonardrossiter.com/reginaldperrin/12044.jpg] and Reginald Perrin (right) [https://www.imdb.com/title/tt0073990/mediaviewer/rm1649999872] from The Rise and Fall of Reginald Perrin.

 

The problem

EAP metadata is created in spreadsheets by digitisation teams all over the world. It is then processed by the EAP team in London and ingested into the British Library’s cataloguing system.

When I joined EAP in 2018 one of the first projects to process was the Barbados Mercury and Bridgetown Gazette. It took days to create all of the catalogue references for this large newspaper collection, which spans more than 60 years.

Microsoft Excel’s fill down feature helped automate part of this task, but repeating this for thousands of rows is time-consuming and error-prone.

Animated image displaying the autofill procedure being carried out

I needed to find a solution to this.

During 2019 I established new workflows to semi-automate several aspects of the cataloguing process using OpenRefine - but OpenRefine is primarily a data cleaning tool, and its difficulty in understanding hierarchical relationships meant that it was not suitable for this task.

 

Learning to code

For some time I toyed with the idea of learning to write computer code using the Python programming language. I dabbled with free online tutorials. But it was tough to make practical sense of these generic tutorials, hard to find time, and my motivation dwindled.

When the British Library teamed up with The National Archives and Birkbeck University of London to launch a PG Cert in Computing for Information Professionals, I jumped at the chance to take part in the trial run.

It was a leap certainly worth taking because I now have the skills to write code for the purpose of transforming and analysing large volumes of data. And the first product of this new skillset is a computer program that accurately generates catalogue references for thousands of rows of data in mere seconds.

 

The solution - ReG in action

By coincidence, one of the first projects I needed to catalogue after creating this program was another Caribbean newspaper digitised by the same team at the Barbados Archives Department: The Barbadian.

This collection was similar in size and structure to the Barbados Mercury, but generating all the catalogue references took just a few seconds. All I needed to do was:

  • Open ReG
  • Enter the project ID for the collection (reference prefix)
  • Enter the filename of the spreadsheet containing the metadata

Animated image showing ReG working to file references

And Bingo! All my references were generated in a new file.

Before and After image explaining 'In just a few seconds, the following transformation took place in the 'Reference' column' showing the new reference names

 

How it works in a nutshell

The basic principle of the program is that it reads a single column in the dataset, which contains the hierarchical information. In the example above, it read the “Level” column.

It then uses this information to calculate the structured numbering of the catalogue references, which it populates in the “Reference” column.

 

Reference format

The generated references conform to the following format:

  • Each reference begins with a prefix that is common to the whole dataset. This is the prefix that the user enters at the start of the program. In the example above, that is “EAP1251”.
  • Forward slashes ( / ) are used to indicate a new hierarchical level.
  • Each record is assigned its own number relative to its sibling records, and that number is shared with all of the children of that record.

 

In the example above, the reference for the first collection is formatted:

Image showing how the reference works: 'EAP1251/1' is the first series

The reference for the first series of the first collection is formatted:

Image showing how the reference works: 'EAP1251/1/1' is the first series of the first collection

The reference for the second series of the first collection is:

Image showing how the reference works: 'EAP1251/1/2' is the second series of the first collection

No matter how complex the hierarchical structure of the dataset, the program will quickly and accurately generate references for every record in accordance with this format.
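For the technically curious, the core of such a calculation fits in a few lines of Python. The sketch below is not ReG's actual source code; the hierarchy terms, column names and toy data are assumptions made purely for illustration.

    HIERARCHY = ["Collection", "Series", "File"]  # assumed terms, in order

    def generate_references(rows, prefix):
        """Fill each row's 'Reference' from its 'Level' column (sketch only)."""
        counters = []                        # one counter per active depth
        for row in rows:
            depth = HIERARCHY.index(row["Level"])
            while len(counters) <= depth:    # entering a deeper level
                counters.append(0)
            del counters[depth + 1:]         # returning from a deeper level
            counters[depth] += 1             # next sibling at this depth
            row["Reference"] = "/".join([prefix] + [str(n) for n in counters])
        return rows

    # Toy rows standing in for a metadata spreadsheet
    rows = [{"Level": l} for l in
            ["Collection", "Series", "File", "File", "Series", "File"]]
    for row in generate_references(rows, "EAP1251"):
        print(row["Reference"], row["Level"])
    # EAP1251/1 Collection
    # EAP1251/1/1 Series
    # EAP1251/1/1/1 File
    # EAP1251/1/1/2 File
    # EAP1251/1/2 Series
    # EAP1251/1/2/1 File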

 

Download for wider re-use

While ReG was designed primarily for use by EAP, it should work for anyone who generates reference numbers using the same format.

For users of the Calm cataloguing software, ReG could be used to complete the “RefNo” column, which determines the tree structure of a collection when a spreadsheet is ingested into Calm.

With wider re-use in mind, some settings can be configured to suit individual requirements.

For example, you can configure the names of the columns that ReG reads and generates references in. For EAP, the reference generation column is named “Reference”, but for Calm users, it could be configured as “RefNo”.

Users can also configure their own hierarchy. You have complete freedom to set the hierarchical terms applicable to your institution and complete freedom to set the hierarchical order of those terms.
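To picture what this might look like, here is a hypothetical configuration in the spirit of the settings just described; ReG's real configuration file may be structured quite differently.

    # Hypothetical settings, named for illustration only
    CONFIG = {
        "level_column": "Level",      # column ReG reads hierarchy terms from
        "reference_column": "RefNo",  # column references are written to (e.g. for Calm)
        "hierarchy": ["Fonds", "Series", "Sub-series", "File", "Item"],
    }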

It is possible that some minor EAP idiosyncrasies might preclude reuse of this program for some users. If this is the case, by all means get in touch; perhaps I can tweak the code to make it more applicable to users beyond EAP - though some tweaks may be more feasible than others.

 

Additional validation features

While generating references is the core function, ReG also includes several validation features to help you spot and correct problems with your data.

Unexpected item in the hierarchy area

For catalogue references to be calculated, all the data in the level column must match a term within the configured hierarchy. The program therefore checks this and, if a discrepancy is found, notifies the user, who has two options to proceed.
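Under the hood, the check itself amounts to a set comparison. Here is a sketch with toy data and hypothetical names, showing the idea rather than ReG's own code:

    def check_hierarchy(rows, hierarchy):
        """Return level terms found in the data but absent from the
        configured hierarchy (sketch of the check, not ReG's code)."""
        return {row["Level"] for row in rows} - set(hierarchy)

    unexpected = check_hierarchy(
        [{"Level": "Collection"}, {"Level": "Files"}],  # toy data with a typo
        ["Collection", "Series", "File", "Item"],
    )
    print(unexpected)  # {'Files'} -> rename it, or build a one-off hierarchy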

Option 1: Rename unexpected terms

First, users have the option to rename any unexpected terms. This is useful for correcting typographical errors, such as this example - where “Files” should be “File”.

Animated image showing option 1: renaming unexpected 'files' to 'file'

Before and after image showing the change of 'files' to 'file'

Option 2: Build a one-off hierarchy

Alternatively, users can create a one-off hierarchy that matches the terms in the dataset. In the following example, the unexpected hierarchical term “Specimen” is a bona fide term. It is just not part of the configured hierarchy.

Rather than force the user to quit the program and amend the configuration file, they can simply establish a new, one-off hierarchy within the program.

Animated image showing option 2: adding 'specimen' to the hierarchy under 'file'

This hierarchy will not be saved for future instances. It is just used for this one-off occasion. If the user wants “Specimen” to be recognised in the future, the configuration file will also need to be updated.

 

Single child records

To avoid redundant information, it is sometimes advisable for an archivist to eliminate single child records from a collection. ReG will identify any such records, notify the user, and give them three options to proceed:

  1. Delete single child records
  2. Delete the parents of single child records
  3. Keep the single child records and/or their parents

Depending on how the user chooses to proceed, ReG will produce one of three results, which affects the rows that remain and the structure of the generated references.

In this example, the third series in the original dataset contains a single child - a single file.

Image showing the three possible outcomes to a single child record: A. delete child so it appears just as a series, B. delete parent so it appears just as a file, and C. keep the child record and their parents so it appears as a series followed by a single file

The most notable result is option B, where the parent was deleted. Looking at the “Level” column, the single child now appears to be a sibling of the files from the second series. But the reference number indicates that this file is part of a different branch within the tree structure.

This is more clearly illustrated by the following tree diagrams.

Image showing a tree hierarchy of the three possible outcomes for a single child record: A. a childless series, B. a file at the same level as other series, C. a series with a single child file

This functionality means that ReG will help you spot any single child records that you may otherwise have been unaware of.

But it also gives you a means of creating an appropriate hierarchical structure when cataloguing in a spreadsheet. If you intentionally insert dummy parents for single child records, ReG can generate references that map the appropriate tree structure and then remove the dummy parent records in one seamless process.
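Detecting single children needs only one pass over the level column. As a hedged sketch (again, not ReG's own code, with column and level names assumed):

    def find_single_children(rows, hierarchy):
        """Return (parent_index, child_index) pairs where a record's
        immediate sub-level contains exactly one record."""
        stack = []   # indices of the current ancestor rows
        kids = {}    # parent index -> indices of its direct children
        for i, row in enumerate(rows):
            depth = hierarchy.index(row["Level"])
            del stack[depth:]   # climb back up to this record's parent
            if stack:
                kids.setdefault(stack[-1], []).append(i)
            stack.append(i)
        return [(p, c[0]) for p, c in kids.items() if len(c) == 1]

    levels = ["Collection", "Series", "File", "File", "Series", "File"]
    print(find_single_children([{"Level": l} for l in levels],
                               ["Collection", "Series", "File"]))
    # [(4, 5)] -> the second series contains a single file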

 

And finally ...

If you’ve got this far, you probably recognise the problem and have at least a passing interest in finding a solution. If so, please feel free to download the software, give it a go, and get in touch.

If you spot any problems, or have any suggested enhancements, I would welcome your input. You certainly won’t be wasting my time - and you might just save some of yours.

 

Download links

For making this possible, I am particularly thankful to Jody Butterworth, Sam van Schaik, Nora McGregor, Stelios Sotiriadis, and Peter Wood.

This blog post is by Dr Graham Jevon, Endangered Archives Programme cataloguer. He is on twitter as @GJHistory.