Digital scholarship blog

26 November 2020

Using British Library Cultural Heritage Data for a Digital Humanities Research Course at the Australian National University

Posted on behalf of Terhi Nurmikko-Fuller, Senior Lecturer, Centre for Digital Humanities Research, Australian National University by Mahendra Mahey, Manager of BL Labs.

The teaching philosophy and pedagogy of the Centre for Digital Humanities Research (CDHR) at the Australian National University (ANU) focus on research-fuelled, practice-led, object-orientated learning. We value collaboration, experimentation, and individual growth, rather than adhering to a standardised evaluation matrix of exams or essays. Students enrolled in jointly-taught undergraduate and postgraduate courses are instead given a task: to innovate at the intersection of digital technologies and cultural heritage sector institutions. They are given a great degree of autonomy, and are trusted to deliver. Their aim is to create digital prototypes which open up GLAM sector material to a new audience.

HUMN2001: Digital Humanities Theories and Projects, and its postgraduate equivalent HUMN6001, are core courses for the programs delivered from the CDHR. HUMN2001 is a compulsory course for both the Minor and the Major in Digital Humanities for the Bachelor of Arts; HUMN6001 is a core, compulsory course in the Masters of Digital Humanities and Public Culture. Initially the course structure was quite different: experts would be invited to guest lecture on their Digital Humanities projects, and the students were tasked with carrying out critical evaluations of digital resources of various kinds. What quickly became apparent was that without experience of digital projects, the students struggled to meaningfully and thoughtfully evaluate the projects they encountered. Many focused exclusively on the user interface; too often critical factors like funding sources were ignored; and the critical evaluative context in which the students operated was greatly skewed by their experiences of tools such as Google and platforms such as Facebook.

The solution to the problem became clear - students would have to experience the process of developing digital projects themselves before they could reasonably be expected to evaluate those of others. This revelation brought on a paradigm shift in the way in which the CDHR engages with students, projects, and their cultural heritage sector collaborators.

In 2018, we reached out to colleagues at the ANU for small-scale projects for the students to complete. The chosen project was the digitisation and the creation of metadata records for a collection of glass slides that form part of the Heritage in the Limelight project. The enthusiasm, diligence, and care that the students applied to working with this external dataset (external only to the course, since this was an ANU-internal project) gave us confidence to pursue collaborations outside of our own institution. In Semester 1 of 2019, Dr Katrina Grant’s course HUMN3001/6003: Digital Humanities Methods and Practices ran in collaboration with the National Museum of Australia (NMA) to almost unforeseeable success: the NMA granted five of the top students a one-off stipend of $1,000 each, and continued working with the students on their projects, which were then added to the NMA’s Defining Moments Digital Classroom, launched in November 2020. This collaboration was featured in a piece in the ANU Reporter, the University’s internal circular. 

Encouraged by the success of Dr Grant’s course, and presented with a serendipitous opportunity to meet Mahendra Mahey at the Australasian Association for Digital Humanities (aaDH) conference in 2018, where he was giving the keynote, I reached out to him to propose a similar collaboration. In Semester 2, 2019 (July to November), HUMN2001/6001 ran in collaboration with the British Library.

Our experiences of working with students and cultural heritage institutions in the earlier semester had highlighted some important heuristics. As a result, the delivery of HUMN2001/6001 in 2019 was much more structured than that of HUMN3001/6003 (which had offered the students more freedom and opportunity for independent research). Rather than focus on a theoretical framework per se, HUMN2001/6001 focused on the provision of transferable skills that improved the delivery and reporting of the projects, and could be cited directly in future employment opportunities as a skills-base. These included project planning and time management (such as Gantt charts and SCRUM as a form of agile project management), and each project was to be completed in groups.

The demographic set-up of each group had to follow three immutable rules:

  • First, each team had to be interdisciplinary, with students from more than one degree program.
  • Second, each group had to be multilingual: its members could not all share the same first language, or be monolingual in the same language.
  • Third, each group had to represent more than one gender.

Although not all groups strictly followed these rules, the ones that did benefitted from the diversity and the critical lens afforded by this richness of perspective, and produced the top projects.

Three examples that best showcase the diversity (and the creative genius!) of these groups and their approach to the British Library’s collection are a virtual reality (VR) concert hall, a Choose-Your-Own-Adventure game travelling through Medieval manuscripts, and an interactive treasure hunt mobile app.

Examples of student projects

(VR)²: Virtuoso Rachmaninoff in Virtual Reality

Research Team: Angus Harden, Noppakao (Angel) Leelasorn, Mandy McLean, Jeremy Platt, and Rachel Watson

Figure 1: Angel Leelasorn testing out (VR)²
Figure 2: Snapshots documenting the construction of (VR)²

This project is a VR experience of the grand auditorium of the Bolshoi Theatre in Moscow. It has an audio accompaniment of Sergei Rachmaninoff’s Prelude in C# Minor, Op.3, No.2, the score for which forms part of the British Library’s collection. Reflective of the personal experiences of some of the group members, the project was designed to increase awareness of mental health, and throughout the experience the user can encounter notes written by Rachmaninoff during bouts of depression. The sense of isolation is achieved by the melody playing in an empty auditorium. 

The VR experience was built using Autodesk Maya and Unreal Engine 4. The music was produced using MIDI data, with each note individually entered into Logic Pro X, and finally played through the Addictive Keys Studio Grand virtual instrument.

The project is available through a website with a disclosure, and links to various mental health helplines, accessible at: https://virtuosorachmaninoff.wixsite.com/vrsquared

Fantastic Bestiary

Research Team: Jared Auer, Victoria (Vick) Gwyn, Thomas Larkin, Mary (May) Poole, Wen (Raven) Ren, Ruixue (Rachel) Wu, Qian (Ariel) Zhang

Figure 3: Homepage of A Fantastic Bestiary

This project is a bilingual Choose-Your-Own-Adventure hypertext game that engages with the British Library’s collection of Medieval manuscripts (such as Royal MS 12 C. xix. Folios 12v-13, based on the Greek Physiologus and the Etymologiae of St. Isidore of Seville), first discovered through the Turning the Pages digital feature. The project workflow included design and background research, resource development, narrative writing, animation, translation, audio recording, and web development. The game opens up the Medieval manuscripts to the public in an engaging and innovative way through five fully developed narratives (~2,000-3,000 words each), and all the content is also available in Mandarin Chinese.

The team used a plethora of different tools, including Adobe Animate, Photoshop, Illustrator, and Audition, as well as Audacity. The website was developed using HTML, CSS, and JavaScript in the Microsoft Visual Studio Integrated Development Environment.

The project is accessible at: https://thomaslarkin7.github.io/hypertextStory/

ActionBound

Research Team: Adriano Carvalho-Mora, Conor Francis Flannery, Dion Tan, Emily Swan

Figure 4: (Left) Testing the app at the Australian National Botanic Gardens; (Middle) an example of one of the tasks to complete in ActionBound; (Right) an example of a sound file from the British Library (a dingo)

This project is a mobile application, designed using a location-based authoring tool and inspired by the Pokémon GO augmented reality mobile game. This scavenger hunt aims to educate players about endangered animals. Using sounds of endangered or extinct animals from the British Library’s collection, but geo-locating the app at the Australian National Botanic Gardens, this project is a perfect manifestation of truly global information sharing and enrichment.

The team used a range of available tools and technologies to build this Serious Game or Game-With-A-Purpose. These include GPS and other geo-locating (and geo-caching) technologies; the team also created QR codes to be scanned during the hunt, and locations are mapped using OpenStreetMap.

The app can be downloaded from: https://en.actionbound.com/bound/BotanicGardensExtinctionHunt

Course Assessment

Such a diverse and dynamic learning environment presents some pedagogical challenges and requires a new approach to student evaluation and assessment. The obvious question is how to grade such vastly different projects fairly, objectively, and comprehensively, especially since they differ not only in methodology and data, but also in the existing level of skills within each group. The approach I took for the grading of these assignments is one that I believe will have longevity and, to some extent, scalability. Indeed, I have successfully applied the same rubric in the evaluation of similarly diverse projects created for the course in 2020, when it ran in collaboration with the National Film and Sound Archive of Australia.

The assessment rubric for this course rewards students on two axes: ambition and completeness. This means that projects that were not quite completed due to their scale or complexity are rewarded for their vision, and for the willingness of the students to push boundaries, do new things, and take on a challenge. The grading system allows for four possible outcomes: a High Distinction (80% or higher), Distinction (70-79%), Credit (60-69%), and Pass (50-59%). Projects which are ambitious and completed to a significant extent land in the 80s; projects that are either ambitious but not fully developed, or relatively simple but completed, receive marks in the 70s; those that engaged very literally with the material and implemented a technologically straightforward solution (such as building a website using WordPress or Wix, or using one of the suite of tools from Northwestern University’s Knightlab) were awarded marks in the 60s. Students were also rewarded for engaging with tools and technologies they had no prior knowledge of. Furthermore, in week 10 of the 12-week course, we ran a Digital Humanities Expo! event, in which the students showcased their projects and received user feedback from staff and students at the ANU. Students able to factor these evaluations into their final project exegeses were also rewarded by the marking scheme.
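
The grade bands and the two rubric axes described above can be sketched as a small lookup. This is a minimal illustration in Python, not the marking scheme actually used in the course: the names `GRADE_BANDS`, `grade_band`, and `indicative_decade` are hypothetical, treating ambition and completeness as yes/no values is a simplification, and marks below 50 and projects that are neither ambitious nor completed are left undefined because the post does not describe them.

```python
from typing import Optional

# Grade bands as listed in the post: (lower bound in %, label), highest first.
GRADE_BANDS = [
    (80, "High Distinction"),
    (70, "Distinction"),
    (60, "Credit"),
    (50, "Pass"),
]


def grade_band(mark: float) -> Optional[str]:
    """Return the grade label for a percentage mark.

    Returns None for marks below 50, which fall outside the four
    outcomes the post describes.
    """
    for lower_bound, label in GRADE_BANDS:
        if mark >= lower_bound:
            return label
    return None


def indicative_decade(ambitious: bool, completed: bool) -> Optional[int]:
    """Hypothetical reading of the two-axis rubric as a yes/no lookup.

    Ambitious and completed -> marks in the 80s.
    Ambitious but unfinished, or simple but completed -> marks in the 70s.
    Neither -> None (an assumption: the post does not cover this case).
    """
    if ambitious and completed:
        return 80
    if ambitious or completed:
        return 70
    return None
```

For example, `grade_band(74)` returns `"Distinction"`, and `indicative_decade(ambitious=True, completed=False)` returns `70`. The 60s band for literal, technologically straightforward projects, and the rewards for learning new tools or responding to Expo feedback, are judgement calls that this sketch does not try to model.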

Notably, the vast majority of the students completed the course with marks of 70 or higher (in the top two grade brackets). Undoubtedly, the unconventional nature of the course is one of its greatest assets. Engaging with a genuine cultural heritage institution acted as motivation for the students. The autonomy and trust placed in them were empowering. The freedom to pursue the projects that they felt best reflected their passions and interests, in response to a national collection of international fame, resulted, almost invariably, in the students rising to the challenge and even exceeding expectations.

This was a learning experience beyond the rubric. To succeed, students had to develop the transferable skills of project planning, time management, and client interaction that would support a future employment portfolio. The most successful groups were also the most diverse groups. Combining voices from different degree programs, languages, cultures, genders, and interests helped promote internal critical evaluations throughout the design process, and helped the students engage with the materials, the projects, and each other in a more thoughtful way.

Figure 5: Two groups discussing their projects with Mahendra Mahey
Figure 6: National Museum of Australia curator Dr Lily Withycombe user-testing a digital project built using British Library data, 2019.
Figure 7: User-testing feedback! Staff and students came to see the projects and support our students in the Digital Humanities Expo in 2019.

Terhi Nurmikko-Fuller Biography

Dr. Terhi Nurmikko-Fuller

Terhi Nurmikko-Fuller is a Senior Lecturer in Digital Humanities at the Australian National University. She examines the potential of computational tools and digital technologies to support and diversify scholarship in the Humanities. Her publications cover the use of Linked Open Data with musicological information, library metadata, the narrative in ancient Mesopotamian literary compositions, and the role of gamification and informal online environments in education. She has created 3D digital models of cuneiform tablets, carved boab nuts, animal skulls, and the Black Rod of the Australian Senate. She is a British Library Labs Researcher in Residence and a Fellow of the Software Sustainability Institute, UK; an eResearch South Australia (eRSA) HASS DEVL (Humanities Arts and Social Sciences Data Enhanced Virtual Laboratory) Champion; an iSchool Research Fellow at the University of Illinois at Urbana-Champaign, USA (2019 - 2021); a member of the Australian Government Linked Data Working Group; and, since September 2020, a member of the Territory Records Advisory Council for the Australian Capital Territory Government.

BL Labs Public Awards 2020 - REMINDER - Entries close NOON (GMT) 30 November 2020

Inspired by this work that uses the British Library's digitised collections? Have you done something innovative using the British Library's digital collections and data? Why not consider entering your work for a BL Labs Public Award 2020 and win fame, glory and even a bit of money?

This year's public awards are open for submission. The deadline for entry is NOON (GMT) on Monday 30 November 2020.

Whilst we welcome projects on any use of our digital collections and data (especially in research, artistic, educational and community categories), we are particularly interested in entries in our public awards that focus on anti-racist work, that are about the pandemic, or that use computational methods such as Jupyter Notebooks.

Work will be showcased at the online BL Labs Annual Symposium between 1400 - 1700 on Tuesday 15 December. For more information and a booking form, please visit the BL Labs Symposium 2020 webpage.

25 November 2020

Early Circus in London: Astley's Amphitheatre by Professor Leith Davis

Posted on behalf of Professor Leith Davis at Simon Fraser University, British Columbia, Canada by Mahendra Mahey, Manager of BL Labs.

Picture of cutting taken from the Astley's newspaper clippings archive Th.Cts.35 (held at the British Library)

What do you think of when you hear the word “circus”? Lions, tigers, elephants? Ringmasters in coat-tails? Trapeze artists? In fact, most of the images that we commonly associate with circus derive from nineteenth-century examples of the genre. Circus, when it first started out in the late eighteenth century, was a different kind of entertainment altogether. Yes, there were animal acts, including equestrian riding stunts, and there were also acrobatics. But early circus also included automatons and air balloons, pantomime and fireworks, musical acts and re-enactments of events like the storming of the Bastille. In short, it was a microcosm of the Georgian world which served to re-present important political and cultural activities by re-mixing them with varieties of astonishing physical entertainments.

Astley's Amphitheatre from Microcosm of London
Image taken from the British Library Archive

Unfortunately, partially as a result of the overpowering influence of the lions and tigers and ringmasters, and partially as a result of its having fallen through the cracks between academic disciplinary divisions, early circus has been largely forgotten.

The database that I created, “Reconstructing Early Circus: Entertainments at Astley’s Amphitheatre, 1768-1833” (https://dhil.lib.sfu.ca/circus/), based on materials held by the British Library, aims to bring early circus back from offstage and to connect the ephemeral traces of this eighteenth-century entertainment with the concerns of our contemporary age.

Philip Astley. Image copyright National Portrait Gallery

The man credited with “inventing” the form of entertainment known now as circus was Philip Astley. Astley was certainly not the first person to perform popular equestrian entertainments for money, but he is acknowledged to have been the first person to have had the idea of using an enclosed space where he could present his equestrian shows to a paying audience. Over the years, Astley’s Amphitheatre and Riding School evolved to include both a ring and a stage. Astley was an astute businessman and was able to expand his enterprise to include circuses in Dublin and Paris. His success also encouraged other entertainment entrepreneurs to try their hand at the circus business. Sites of entertainment similar to Astley’s sprang up within London and other locations in the British archipelago as well as in Europe and North America, including Jones’s Equestrian Amphitheatre in Whitechapel (1786), Swan’s Amphitheatre in Birmingham (1787), the Edinburgh Equestrian Circus (1790), Ricketts's Equestrian Pantheon in Boston (1794) and Montreal (1797), and the Royal Circus, Equestrian and Philharmonic Academy in London (1782). Circus was not just a type of entertainment in the metropolis; it was also a transnational phenomenon.

Poney Race at Astley's Amphitheatre, image from V&A Museum

I drew the data for “Reconstructing Early Circus” from the British Library’s “Astley’s Cuttings From Newspapers” (Th. Cts. 35-37). This source consists of three volumes of close to 3,000 newspaper advertisements of entertainments featured at Astley’s from 1768 to 1833, along with a few manuscript materials and a lock of Astley’s daughter’s hair. The clippings were collected by the theatre manager, James Winston, for a history of theatre which he never published. Working with my research assistant, Emma Pink, I photographed each of the clippings from the BL volumes in the reading room, and four undergraduate students transcribed them. I then worked with the personnel at Simon Fraser University’s Digital Humanities Research Lab to create the website. Users can browse through the sixty-year history of Astley’s or use the search function to identify, for example, the frequency of particular acts or performers. The materials represent a rich treasure trove for scholars of Romantic-era cultural and media studies, British history, economic and business history, performance studies, fine arts, and cultural memory studies.

As I continue to expand and improve the site, I hope to use my database to explore connections between early circus and other popular entertainments of the day, and to extend the site to examine circuses in transatlantic locations.

Examining the Astley archives allows us to learn more about leisure in the long eighteenth century as well as about the connections between popular entertainment and political and social concerns in Georgian times, and, by extension, in our own era. Lions and tigers and ringmasters you won’t find here, but check out the “little Learned Military Horse,” the trained bees, and, of course, the equestrian feats of Astley himself for more insight into this neglected popular entertainment from 200 years ago. 

(See also Leith Davis. "Between Archive and Repertoire: Astley's Amphitheatre, Early Circus, and Romantic-Era Song Culture." Studies in Romanticism 58, no. 4 (2019): 451-79).

Leith Davis, Professor of English at Simon Fraser University in British Columbia, Canada

Leith Davis is Professor of English at Simon Fraser University in British Columbia, Canada where she researches and teaches eighteenth-century literature and media history. She is the author of Acts of Union: Scotland and the Negotiation of the British Nation (Stanford UP, 1998) and Music, Postcolonialism and Gender: The Construction of Irish Identity, 1724-1874 (Notre Dame UP, 2005) as well as co-editor of Scotland and the Borders of Romanticism (Cambridge: Cambridge UP, 2004) and Robert Burns and Transatlantic Culture (Ashgate, 2012). She is currently completing a monograph entitled Mediating Cultural Memory in Britain and Ireland, 1688-1745 which explores sites of cultural memory in the British archipelago within the context of the shifting media ecology of the eighteenth century.

13 November 2020

Reflections during International Games Week and Transgender Awareness Week

This week is International Games Week in libraries - “an initiative run by volunteers from around the world to reconnect communities through their libraries around the educational, recreational, and social value of all types of games.”

As a volunteer, participant and collaborator on game events organised by Stella Wisdom in the British Library's Digital Scholarship Team, I’ve particularly enjoyed the International Games Week events held at the Library during previous years, including Adventure X and WordPlay. It’s fitting that a national library acknowledges the value of narratives in games and interactive fiction, as well as those held in books and other formats.

International Games Week logo with a games controller, 2 dice and a meeple

In this post, I wanted to highlight some things that cut across projects I’ve been involved in with the British Library. These include curating UK websites and running online game jams, in addition to the game events mentioned above.

Back in 2018, I co-organised the online Gothic Novel Jam with Stella. In terms of the gothic and supernatural, it’s appropriate that this blog post is published today on Friday the 13th! We’ve blogged about this jam previously, but in summary, the intention was to encourage participants to create games, interactive fiction and other creative outputs using the theme of the gothic novel and British Library Flickr images as inspiration. The response was fantastic, and resulted in a large number of great narrative games being created. I particularly liked As a Glow Brings Out a Haze for the creative reuse of British Library images.

In addition to co-running game jams, I'm a volunteer curator for the UK Web Archive, and as a representative for CILIP’s LGBTQ+ Network, I’ve been co-lead on the LGBTQ+ Lives Online project with Steven Dryden from the British Library. This project has focused on identifying UK LGBTQ+ websites, blogs and similar content for inclusion in the collection, as a way to preserve them for future generations. To a lesser extent, I’ve also been supporting the curation of the Video Games collection and of Interactive Narratives, which is part of the broader E-publishing trends/Emerging formats collection.

I find it interesting to see where different seemingly unrelated projects overlap, and in this instance, the overlap is an online game called The Tower created by Freya Campbell, which she originally created for Gothic Novel Jam. The game itself is a piece of interactive fiction combining both text and images. For me it was a great example of a narrative that is clearly gothic and dark, but takes a new focus to frame that genre. 

This week is Transgender Awareness Week, and as more UK content is published online about transgender issues and experiences, these sites will be added to the UKWA LGBTQ+ Lives collection. The Tower includes subject matter that is particularly high profile in UK media discussions surrounding LGBTQ+ lives at the moment - transgender identities. As the creator of The Tower is based in the UK, this game is now part of the Interactive Narratives and LGBTQ+ Lives collections in the UK Web Archive.

Anyone can suggest UK published websites to be included in the UK Web Archive by filling in this online nominations form: https://www.webarchive.org.uk/en/ukwa/nominate. As part of both International Games Week and Transgender Awareness Week, why not nominate UK websites for inclusion in the Video Games, Interactive Narratives, and LGBTQ+ Lives Online collections?

Another overlap connected to The Tower is that Freya exhibited two other games (Perseids, and Super Lunary ep.1) at AdventureX, when it was held at the British Library during International Games Week in 2018 and 2019. Sadly AdventureX is cancelled in 2020 due to Covid-19, but if you make games and interactive fiction, why not consider taking part in AdvXJam, which starts tomorrow?

This post is by Ash Green (@ggnewed) from the CILIP LGBTQ+ Network.

11 November 2020

BL Labs Online Symposium 2020 : Book your place for Tuesday 15-Dec-2020

Posted by Mahendra Mahey, Manager of BL Labs

The BL Labs team are pleased to announce that the eighth annual British Library Labs Symposium 2020 will be held online on Tuesday 15 December 2020, from 13:45 - 16:55* (see note below). The event is FREE, but you must book a ticket in advance to reserve your place. Last year's event was the largest we have ever held, so please don't miss out: book early and see more information here!

*Please note that directly after the Symposium, we are organising an experimental online mingling networking session between 16:55 and 17:30!

The British Library Labs (BL Labs) Symposium is an annual event and awards ceremony showcasing innovative projects that use the British Library's digital collections and data. It provides a platform for highlighting and discussing the use of the Library’s digital collections for research, inspiration and enjoyment. The awards this year will recognise outstanding use of British Library's digital content in the categories of Research, Artistic, Educational, Community and British Library staff contributions.

This is our eighth annual symposium and you can see previous Symposia videos from 2019, 2018, 2017, 2016, 2015, 2014 and our launch event in 2013.

Ruth Ahnert will be giving the BL Labs Symposium 2020 keynote this year.

We are very proud to announce that this year's keynote will be delivered by Ruth Ahnert, Professor of Literary History and Digital Humanities at Queen Mary University of London, and Principal Investigator on 'Living With Machines' at The Alan Turing Institute.

Her work focuses on Tudor culture, book history, and digital humanities. She is author of The Rise of Prison Literature in the Sixteenth Century (Cambridge University Press, 2013), editor of Re-forming the Psalms in Tudor England, as a special issue of Renaissance Studies (2015), and co-author of two further books: The Network Turn: Changing Perspectives in the Humanities (Cambridge University Press, 2020) and Tudor Networks of Power (forthcoming with Oxford University Press). Recent collaborative work has taken place through AHRC-funded projects ‘Living with Machines’ and ‘Networking the Archives: Assembling and analysing a meta-archive of correspondence, 1509-1714’. With Elaine Treharne she is series editor of the Stanford University Press’s Text Technologies series.

Ruth's keynote is entitled: Humanists Living with Machines: reflections on collaboration and computational history during a global pandemic

You can follow Ruth on Twitter.

There will be Awards announcements throughout the event for the Research, Artistic, Community, Teaching & Learning and Staff categories. This year we are also going to ask the audience to vote for their favourite among the shortlisted projects: a people's BL Labs Award!

There will be a final talk near the end of the conference and we will announce the speaker for that session very soon.

So don't forget to book your place for the Symposium today. We predict another full house for this, our first Symposium online, and we don't want you to miss out. See more detailed information here.

We look forward to seeing new faces and meeting old friends again!

For any further information, please contact labs@bl.uk

05 November 2020

World Digital Preservation Day 2020

World Digital Preservation Day (WDPD) is held on the first Thursday of every November, providing an opportunity for the international digital preservation community to connect and celebrate the positive impact that digital preservation has. Follow #WDPD2020 for discussion throughout the day. Our colleagues in the UK Web Archive (UKWA) have already blogged earlier for WDPD about their Coronavirus Collection, which includes preservation of the ‘Children of Lockdown’ project website.

A number of WDPD online events are taking place, including a book launch party for Electronic Legal Deposit: Shaping the Library Collections of the Future, for which our collaborative doctoral research student Linda Berube co-wrote chapter 9, ‘Follow the Users: Assessing UK Non-Print Legal Deposit Within the Academic Discovery Environment’.

World Digital Preservation Day logo

WDPD is also when the annual Digital Preservation Awards are announced, #DPA2020, and we wish to offer our warmest congratulations to all today's winners, including our wonderful UKWA colleagues who have won The National Archives Award for Safeguarding the Digital Legacy, recognising 15 years of web archiving work. You can read more about the UKWA's 15-year anniversary in 2020 here and watch a recording of the online Digital Preservation Awards ceremony in the video below.

Here in Digital Scholarship we enjoy collaborating with the British Library's Digital Preservation and UKWA teams. Last year we hosted a six-month post-doctoral placement, ‘Emerging Formats: Discovering and Collecting Contemporary British Interactive Fiction’, during which Lynda Clark created an Interactive Narratives UKWA collection and evaluated how crawlers captured web-hosted works of interactive fiction.

This research project was part of the Library’s ongoing Emerging Formats work, which acknowledges that without intervention, many culturally valuable digital artefacts are at risk of being lost. Interactive narratives are particularly endangered due to the ‘hobbyist’ nature of many creators, meaning they do not necessarily subscribe to standardised practices. However, this also means that digital interactive fiction is created by and for a wide variety of creators and audiences, including various marginalised groups.

Two reports written by Lynda during her innovation placement are publicly available on the BL Research Repository: https://doi.org/10.23636/1192 and https://doi.org/10.23636/1193. Furthermore, a long paper about the Interactive Narratives collection is part of the proceedings of this week's International Conference on Interactive Digital Storytelling (ICIDS).[1] This event is a great opportunity to meet both scholars and creative practitioners who make digital stories. I was delighted to be a reviewer for the ICIDS 2020 online art exhibition, which has the theme "Texts of Discomfort" and presents some very thought-provoking work.

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

1. Clark L., Rossi G.C., Wisdom S. (2020) Archiving Interactive Narratives at the British Library. In: Bosser AG., Millard D.E., Hargood C. (eds) Interactive Storytelling. ICIDS 2020. Lecture Notes in Computer Science, vol 12497. Springer, Cham. https://doi.org/10.1007/978-3-030-62516-0_27

04 November 2020

Transforming Legacy Indexes into Catalogue Entries

This guest post is by Alex Hailey, Curator of Modern Archives and Manuscripts. He's on Twitter as @ajrhailey.

In late 2019 I was lucky enough to join BL and National Archives staff to trial a PG Certificate in Computing for Cultural Heritage at Birkbeck. The course provided an introduction to programming with Python, the basics of SQL, and using the two to work with data. Fellow attendees Graham, Nick, Chris and Giulia have written about their work previously, and I am going to briefly introduce one of my project tasks addressing issues with legacy metadata within the India Office Records.

 

The original data

The IOR/E/4 Correspondence with India series consists of 1,112 volumes dating from 1703 to 1858: four series of letters received by the East India Company (EIC) Court of Directors from the administration in India, and four series of dispatches sent to India. Catalogue entries for these volumes contain only basic information – title, dates, language, reference and former references – and subject, name and place access to the dispatches is provided through 72 index volumes (reference IOR/Z/E/4), which contain around 430,000 entries.

Sample catalogue record of an index entry, IOR/Z/E/4/42/P133, titled ‘Pensions, Carnatic, Proceedings respecting’

The original indexes were produced from 1901-1929 by staff of the Secretarial Bureau, led by indexing pioneer Mary Petherbridge; my colleague Antonia Moon has written about Petherbridge’s work in a previous post. When these indexes were converted to the catalogue in the early 2010s, entries within the index volumes were entered as child or sub-items of the index volumes themselves, with information on the related correspondence volumes entered into the free-text Related material field, as shown in the image above.

 

Problem and solution

This approach has caused two problems. Firstly, users attempting to order the related correspondence regularly end up trying to place an order for an index volume instead, which is frustrating. Secondly, there is no quick and easy way to determine the whole contents of a particular volume, which hinders access and use.

Manually working through 430,000 entries to group the entries by volume would be an impossible task, but I was able to use Python and a library called Pandas, which has a number of useful features for examining and manipulating catalogue data: methods for reading and writing data from multiple sources, flexible reshaping of datasets, and methods for aggregation, indexing, splitting and replacing strings, including regular expressions.

Using Pandas I was able to separate information in the Related material field, restructure the data so that each instance of an index entry formed an individual record, and then group these by volume and further arrange them alphabetically or by page order.
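As a rough illustration of this kind of restructuring (not the code used in the project), a minimal Pandas sketch might look like the following. The column names, the second sample row, the volume numbers and the assumed layout of the Related material field (volume reference, then a page number, with multiple references separated by semicolons) are all hypothetical; the real catalogue export may be structured differently.

```python
import pandas as pd

# Hypothetical sample data: column names, volume numbers and the layout of
# the related_material field are assumptions made for illustration only.
records = pd.DataFrame({
    "index_ref": ["IOR/Z/E/4/42/P133", "IOR/Z/E/4/42/C20"],
    "title": [
        "Pensions, Carnatic, Proceedings respecting",
        "Charts, Harbours, Plans of",
    ],
    "related_material": [
        "IOR/E/4/880 p 101; IOR/E/4/881 p 45",
        "IOR/E/4/880 p 87",
    ],
})

# Split the free-text field on ";" and give each reference its own row.
split = (
    records
    .assign(related_material=records["related_material"].str.split(";"))
    .explode("related_material")
    .reset_index(drop=True)
)
split["related_material"] = split["related_material"].str.strip()

# Pull the volume reference and page number out with a regular expression.
parts = split["related_material"].str.extract(
    r"^(?P<volume>IOR/E/4/\d+)\s+pp?\s*(?P<page>\d+)$"
)
split = pd.concat([split.drop(columns="related_material"), parts], axis=1)
split["page"] = split["page"].astype(int)

# Two arrangements of the same records: by page order and alphabetically.
by_page = split.sort_values(["volume", "page"]).reset_index(drop=True)
by_title = split.sort_values(["volume", "title"]).reset_index(drop=True)

# Grouping by volume gives the index entries that point at each volume.
entries_per_volume = split.groupby("volume")["title"].apply(list)
```

Each row of `by_page` or `by_title` is one instance of an index entry, so the contents of a single correspondence volume can be read off directly.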

 

Index entries for reference IOR/Z/E/4/42/P133 split into separate records

Outputs and analysis

Examining these outputs gave us new insights into the data. We now know that the indexes cover only 230 volumes of the dispatches. We were also able to identify incomplete references originally recorded in the Related material field, as well as what appear to be keying errors (references which fall outside the range of the dispatches series). We can now follow these up and correct errors in the catalogue which were previously unknown.
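Checks of this kind can be sketched in a few lines of Pandas. The example below is an illustration only: the reference layout, the sample values and the volume range used for the dispatches are hypothetical placeholders, not figures taken from the catalogue.

```python
import pandas as pd

# Hypothetical bounds for the dispatches volumes; the real range would come
# from the catalogue and is not stated in this post.
FIRST_VOLUME, LAST_VOLUME = 600, 900

# Hypothetical sample references, including deliberately faulty ones.
refs = pd.DataFrame({
    "related_material": [
        "IOR/E/4/650 p 12",   # well formed and in range
        "IOR/E/4/65",         # page number missing
        "IOR/E/4/ p 40",      # volume number missing
        "IOR/E/4/9650 p 3",   # volume outside the assumed range
    ]
})

# A reference that does not match the expected pattern is treated as incomplete.
parsed = refs["related_material"].str.extract(
    r"^IOR/E/4/(?P<volume>\d+)\s+pp?\s*(?P<page>\d+)$"
)
refs["incomplete"] = parsed.isna().any(axis=1)

# Complete references whose volume number falls outside the range are
# flagged as possible keying errors.
volume = pd.to_numeric(parsed["volume"])
refs["out_of_range"] = ~refs["incomplete"] & ~volume.between(FIRST_VOLUME, LAST_VOLUME)

to_review = refs[refs["incomplete"] | refs["out_of_range"]]
```

The `to_review` subset is the list a cataloguer would then check by hand against the original volumes.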

Comparing the data at volume level arranged alphabetically and by page order, we could appreciate just how much depth there was to the index. Traditional indexes are written with a lot of information redundancy, which isn’t immediately apparent until you group the entries according to their location within a particular volume:

Example of index entries arranged by page order: 'Chart, Maps & Surveys, Harbours, Dalrymples' plans of, sent to India, pp87, 377', followed by 'East Indian Ports, Plans of Dalrymple publishing, pp87, 377', etc.

After discussion with the IOR team we have decided to take the alphabetically arranged data and import it to the archives catalogue, so that users selecting a dispatches volume are presented with the relevant index entries immediately.

The original dataset and derived datasets have been uploaded to the Library’s research repository where they are available for download and reuse under a CC0 licence.

To enable further analysis of the index data I have also tried my hand at creating a Jupyter Notebook to use with the derived data. This is intended to introduce colleagues to using Notebooks, Python and the Pandas library to examine catalogue metadata: conducting basic queries, producing a visualisation and exporting subsets for further investigation.
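The kinds of Notebook cells described here might look something like the sketch below. It is not taken from the Notebook itself: the column names and sample rows are hypothetical, and the term counts stand in for the input a word cloud or bar chart would be drawn from.

```python
import io
import pandas as pd

# Hypothetical sample of the derived data; column names are assumptions.
entries = pd.DataFrame({
    "volume": ["IOR/E/4/880", "IOR/E/4/880", "IOR/E/4/881"],
    "title": [
        "Pensions, Carnatic, Proceedings respecting",
        "Army, Madras, Allowances respecting",
        "Madras, Harbour, Repair of",
    ],
})

# Basic query: every entry whose title mentions Madras.
madras = entries[entries["title"].str.contains("Madras", case=False, regex=False)]

# Term frequencies across all titles, the raw material for a visualisation.
terms = entries["title"].str.lower().str.findall(r"[a-z]+").explode()
counts = terms.value_counts()

# Export the subset as CSV (written to an in-memory buffer here; a file
# path could be passed to to_csv instead).
buffer = io.StringIO()
madras.to_csv(buffer, index=False)
```

In a Notebook, `counts.head(20).plot.bar()` would give a quick chart of the most common terms, assuming matplotlib is installed.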

Wordcloud based on terms contained in the IOR/Z/E/4 data, generated within the Jupyter Notebook. Some of the larger, highlighted words are 'respecting', 'Army', 'India', 'Administration', 'Department', 'Madras', etc. Some small words include 'late', 'allowances', 'paid', 'appointment', 'repair', etc.

My Birkbeck project also included work to create place and institution authority files for the Proceedings of the Governments of India series using keyword extraction with existing catalogue metadata, and this will be discussed in a future post.

Huge thanks must go to Nora McGregor, Jo Pugh and the folks at Birkbeck Department of Computer Science for developing the course and providing us with this opportunity; Antonia Moon and the IOR team for helpful discussions about the IOR data; and the rest of the cohort for moral support when the computer just wouldn’t behave.

Alex Hailey

Curator of Modern Archives and Manuscripts