Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

23 September 2021

Computing for Cultural Heritage: Trial Outcomes and Final Report

Six months ago, twenty members of staff from the British Library and The National Archives UK completed Computing for Cultural Heritage, a project that trialled Birkbeck University and the Institute of Coding’s new PGCert in Applied Data Science. In this blog post we explore why this new course was needed, the final report of the trial, and the lasting impact the PGCert has had on some of the participants. 

 

 

Background 

Information professionals have been experiencing a massive shift to digital in the way collections are donated, held and accessed. The British Library’s digital collections include e-books, maps, digitised newspapers, journal titles, sound recordings and over 500 terabytes of preserved data from the UK Web Archive. Each year, the Library sees 6 million catalogue searches by web users, with almost 4 million items consulted online. This is a vast amount of potential cultural heritage data for researchers, and it requires complex digital workflows to collect, curate, manage and provide access to it, and to help researchers make sense of it all computationally. 

Staff at collecting institutions like the British Library and The National Archives UK are engaging in computationally driven projects like never before, but often without the benefit of data skills and computational thinking to support them. That is where a programme like Computing for Cultural Heritage can help information professionals, allowing them to upskill and tackle issues – like building new digital systems and services, supporting collaborative, computational and data-driven research using digital collections and data, or deploying simple scripts to make everyday tasks easier – with confidence.  

Image of a laptop with the screen showing a bookshelf

 

Learning Aims 

The trial course was broken into two modules: a taught module, ‘Demystifying Computing with Python’, and a written ‘Industry Project’ developing a software solution to a work-based problem. A third module, Analytic Tools for Information Professionals, would be offered to participants outside of the trial, as part of the full live course, so that they could earn their PGCert.

By the end of the trial, participants were able to: 

  • Demonstrate satisfactory knowledge of programming with Python. 
  • Understand techniques for Python data structures and algorithms. 
  • Work on case studies to apply data analytics using Python. 
  • Understand the programming paradigm of object-oriented programming. 
  • Use Python to apply the techniques learned on the module to real-world problems. 
  • Demonstrate the ability to develop an algorithm to carry out a specified task and to convert this into an executable program. 
  • Demonstrate the ability to debug a program. 
  • Understand the concepts of data security and general data protection regulations and standards. 
  • Develop a systematic understanding and critical awareness of a problem in the area of computing agreed between the work environment and the academic supervisor. 
  • Develop a software solution for a work-based problem using the skills developed from the taught modules, for example develop software using the programming languages and software tools/libraries taught. 
  • Present a critical discussion on existing approaches in the particular problem area and position their own approach within that area and evaluate their contribution. 

  • Gain experience in communicating complex ideas/concepts and approaches/techniques to others by writing a comprehensive, self-contained report. 

The learning objectives were designed and delivered with the cultural heritage context in mind, and as such incorporated, for instance, examples and datasets from the British Library Music collections in the Python programming elements of the taught module. Additionally, there was a lecture focused on a British Library use case involving the design and implementation of a Database Management System. 

Following the completion of the trial, participants had the opportunity to complete their PGCert in Applied Data Science by attending the final module, Analytic Tools for Information Professionals, which was part of the official course launched last autumn. 

 

The Lasting Impact of Computing for Cultural Heritage 

Now that we’re six months on from the end of the trial, and the participants who opted in have earned their full PGCert, we followed up with some of the learners to hear about their experiences and the lasting effects of the course: 

“The third and final module of the computing for cultural heritage course was not only fascinating and enjoyable, it was also really pertinent to my job and I was immediately able to put the skills I learned into practice.  

The majority of the third module focussed on machine learning. We studied a number of different methods and one of these proved invaluable to the Agents of Enslavement research project I am currently leading. This project included a crowdsourcing task which asked the public to draw rectangles around four different types of newspaper advertisement. The purpose of the task was to use the coordinates of these rectangles to crop the images and create a dataset of adverts that can then be analysed for research purposes. To help ensure that no adverts were missed and to account for individual errors, each image was classified by five different people.  

One of my biggest technical challenges was to find a way of aggregating the rectangles drawn by five different people on a single page in order to calculate the rectangles of best fit. If each person only drew one rectangle, it was relatively easy for me to aggregate the results using the coding skills I had developed in the first two modules. I could simply find the average (or mean) of the five different classification attempts. But what if people identified several adverts and therefore drew multiple rectangles on a single page? For example, what if person one drew a rectangle around only one advert in the top left corner of the page; people two and three drew two rectangles on the same page, one in the top left and one in the top right; and people four and five drew rectangles around four adverts on the same page (one in each corner). How would I be able to create a piece of code that knew how to aggregate the coordinates of all the rectangles drawn in the top left and to separately aggregate the coordinates of all the rectangles drawn in the bottom right, and so on?  

One solution to this problem was to use an unsupervised machine learning method to cluster the coordinates before running the aggregation method. Much to my amazement, this worked perfectly and enabled me to successfully process the total of 92,218 rectangles that were drawn and create an aggregated dataset of more than 25,000 unique newspaper adverts.” 

-Graham Jevon, EAP Cataloguer; BL Endangered Archives Programme 
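
The post does not name the specific clustering algorithm Graham used, but as a rough illustration of the approach he describes, the sketch below clusters rectangle centres with DBSCAN (one common choice for this kind of unsupervised grouping) and then averages each cluster to produce a “best fit” rectangle. The sample coordinates are invented for the example.

```python
# A minimal sketch (not the project's actual code): cluster rectangles drawn
# by several volunteers, then average each cluster to get a "best fit" box.
import numpy as np
from sklearn.cluster import DBSCAN

# Each rectangle is (x, y, width, height); these values are invented examples
# of five volunteers marking adverts on the same newspaper page.
rectangles = np.array([
    [102,  98, 200, 150],   # three volunteers marking a top-left advert
    [105, 101, 195, 148],
    [ 99,  97, 204, 152],
    [612,  95, 180, 140],   # two volunteers marking a top-right advert
    [608,  99, 185, 143],
])

# Cluster on the rectangle centres so drawings of the same advert group together.
centres = rectangles[:, :2] + rectangles[:, 2:] / 2
labels = DBSCAN(eps=50, min_samples=2).fit_predict(centres)

# Aggregate each cluster by taking the mean rectangle (label -1 would be noise).
for label in sorted(set(labels) - {-1}):
    x, y, w, h = rectangles[labels == label].mean(axis=0)
    print(f"Advert cluster {label}: x={x:.0f}, y={y:.0f}, w={w:.0f}, h={h:.0f}")
```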

 

“The final module of the course was in some ways the most challenging — requiring a lot of us to dust off the statistics and algebra parts of our brain. However, I think, it was also the most powerful; revealing how machine learning approaches can help us to uncover hidden knowledge and patterns in a huge variety of different areas.  

Completing the course during COVID meant that collection access was limited, so I ended up completing a case study examining how generic tropes have evolved in science fiction across time using a dataset extracted from GoodReads. This work proved to be exceptionally useful in helping me to think about how computers understand language differently; and how we can leverage their ability to make statistical inferences in order to support our own, qualitative analyses. 

In my own collection area, working with born digital archives in Contemporary Archives and Manuscripts, we treat draft material — of novels, poems or anything else — as very important to understanding the creative process. I am excited to apply some of these techniques — particularly Unsupervised Machine Learning — to examine the hidden relationships between draft material in some of our creative archives. 

The course has provided many, many avenues of potential enquiry like this and I’m excited to see the projects that its graduates undertake across the Library.” 

-Callum McKean, Lead Curator, Digital; Contemporary British Collection

 

"I really enjoyed the Analytics Tools for Data Science module. As a data science novice, I came to the course with limited theoretical knowledge of how data science tools could be applied to answer research questions. The choice of using real-life data to solve queries specific to professionals in the cultural heritage sector was really appreciated as it made everyday applications of the tools and code more tangible. I can see now how curators’ expertise and specialised knowledge could be combined with tools for data analysis to further understanding of and meaningful research in their own collection area."

-Giulia Carla Rossi, Curator, Digital Publications; Contemporary British Collection

 

Final Report 

The Computing for Cultural Heritage project concluded in February 2021 with a virtual panel session that highlighted the learners’ projects and allowed discussion of the course and feedback to the key project coordinators and contributors. Case studies of the participants’ projects, as well as links to other blog posts and project pages can be found on our Computing for Cultural Heritage Student Projects page. 

The final report highlights these projects, along with demographic statistics on the participants and feedback gathered through an anonymous survey at the end of the trial. To evaluate the experience of the students on the PGCert, we composed a list of questions designed to provide insight into various aspects of the course, including how learners fitted the coursework around their work commitments and how well they met the learning objectives. 

 

Why Computing for Cultural Heritage? 

Bar graph showing the results of the question 'Why did you choose to do this course' with the results discussed in the text below
Figure 1: Why did you choose to do this course? Results breakdown by topic and gender

When asked why the participants chose to take part in the course, we found that one of the most common answers was to develop methods for automating repetitive, manual tasks – such as generating unique identifiers for digital records and copying data between Excel spreadsheets – to free up more curatorial time for their digital collections. One participant said:  

“I wanted to learn more about coding and how to use it to analyse data, particularly data that I knew was rich and had value but had been stuck in multiple spreadsheets for quite some time.” 

There was also a desire to learn new skills, either for personal or professional development: 

“I believe in continuous professional development and knew that this would be an invaluable course to undertake for my career.”  

“I felt I was lagging behind and my job was getting static, and the feeling that I was behind [in digital] and I wanted to kind of catch up.” 

Bar graph showing the results to the question 'Did the course help you meet your aims?' with 14 answering yes, 1 answering no and 1 answering 'mixed'
Figure 2: 'Did the course help you meet your aims?' Results broken down by answer and gender.

A follow-up question asked whether these goals and aims were met by the course. Happily, most participants indicated that they had been, citing increased confidence, help in developing new computational skills, and a deeper knowledge of information technology. 

 

What was the most enjoyed aspect of the course? 

Bar graph showing the results of the question 'What did you enjoy most about the course' with the results discussed in the text below
Figure 3: 'What did you enjoy most about the course?' Results breakdown by topic and gender

When broken down, the responses to ‘What did you enjoy most’ largely reflect the student experience, whether that was being in taught modules (4), getting hands-on experience (4), or being in a learning environment again (6). Participants also indicated that networking with peers was an enjoyable part of the experience: 

“Day out of work with like minded people made it really easy to stick with rather than just doing it online.”  

“Spending a day away from work and meeting the people I had never met at the NA, and also speaking to people from the BL about what they did.”  

“I enjoyed being a student again, learning a new skill amongst my peers, which week after week is a really valuable experience…” 

“Learning with colleagues and people working in similar fields was also a plus, as our interests often overlapped...” 

While only two responses named the project module as one of the most enjoyable components, it was useful to see how the course afforded participants the opportunity to apply their learning to a work-based problem that benefits their role, department or digital collection: 

“I really enjoyed being able to apply my learning to a real-world work-based project and to finally analyze some of the data that has been lying around the department for over a decade without any further analysis.”  

“The design and create aspect of the project. Applying what I learned to solving a genuine problem was the most enjoyable part - using Python and solving problems to achieve something tangible. This is where I really consolidated my learning.” 

 

What was the most challenging aspect of the course? 

Bar graph showing the results of the question 'What did you find the most challenging and why?' with the results discussed in the text below
Figure 4: 'What did you find the most challenging and why?' Results breakdown by topic and gender.

When discussing the most challenging aspect of the course, most of the learners focused on the practical Python lab sessions and the work-based project module. Interestingly, participants also stated that they were able to overcome the challenges through personal perseverance and the learning provided by the course itself: 

“I found the initial hurdle of learning how [to] code very challenging, but after the basics it became possible to become more creative and experimental.”  

“The work-based project was a huge challenge. We'd only really done 5 weeks of classes and, having never done anything like this before, it was hard to envisage an end product let alone how to put it together. But got there in the end!” 

While the majority of the cohort found the practical components of the PGCert trial most challenging, the feedback also suggested that the inclusion of the second module – which will be available as part of the full programme – will provide more opportunity to practise practical programming skills and to work with software tools and APIs. 

 

The Effectiveness of Computing for Cultural Heritage 

Bar graph showing the results of the question 'Have you applied anything you have learnt?' with 2 results for 'Data analysis concepts', 12 results for 'Python coding' and 2 results for 'Nothing'
Figure 5: 'Have you applied anything you have learnt?' Results breakdown by topic and gender.

Participants were asked whether they had used any of the knowledge or skills acquired in the PGCert trial. Even after sitting just two of the three modules, participants responded that they were able to apply their learning to their current role in some form.  

“I now regularly use the software program I built as part of my day-to-day job. This program performs a task in a few seconds, which otherwise could take hours or days, and which is otherwise subject to human error. I have since adapted this so that it can also be used by a colleague in another department.”  

“Python helps me perform tasks that I previously did not know how to achieve. I have also led a couple of training sessions within the library, introducing Python to beginners (using the software I built in the project as a cultural heritage use case to frame the introduction).” 

“I changed [job] role at the end of the course so I think that helped me also in getting this promotion. And in this new role I have many more data analysis tasks to perform [quickly] for actions that would take months so yeah I managed to write that with a few scripts in my new role.” 

It was great to hear that the impacts of the trial were being felt so immediately by the participants, and that they were able to not only retain but also apply the new skills that they had gained.  

 This blog post was written by Deirdre Sullivan, Business Support Officer for Digital Scholarship Training Initiatives, part of the Digital Research and Curators Team. Special thanks to Nora McGregor, Digital Curator for the European and American Collection for support on the blog post and Martyn Harris, Institute of Coding Manager, for his work on the final report, as well as Giulia Rossi, Callum McKean and Graham Jevon for sharing their experiences.

National Libraries Now: Wikimedians Unite!

On Friday 17th September 2021, I was delighted to participate in a conference panel for the National Libraries Now Conference. I had worked to assemble a veritable dream team of Wikimedia and library talent to talk about Wikimedia Residencies from a four-nation perspective. 

Joining me on the panel were Stella Wisdom (British Library), Jason Evans (National Library of Wales), Rebecca O’Neill (Wikimedia Community Ireland) and Ruth Small (Digital Productions Operator, National Library of Scotland). Stuart Prior (Programme Coordinator, Wikimedia UK) kindly agreed to be our chair. We pre-recorded presentations that were circulated to participants, so that our time on the 17th could be devoted to questions and discussion.

Going over my notes now, the best way to try to reflect the discussion is to look at some of the questions asked and the responses garnered. Please bear in mind that some remarks may be out of chronological order!

  • How do you think working with Wikimedia helps your institution’s strategic goals?

We reflected as a group on the move from WikiPedians in Residence to WikiMedians in Residence [emphasis my own] and how this shows a shift in institutional thinking towards the potential of larger Wikimedia projects, and the use of platforms such as Commons, Wikisource and Wikibase.

Jason spoke about the way that lower onsite footfall at NLW, a result of its physical location, enhances the importance of digital work and online outreach. He also spoke about training, promotion and contribution through Wikimedia platforms as being just as valuable as, if not more valuable than, the total number of views gained.

Image of National Library of Wales, Aberystwyth
It might not be digital, but it is a beauty! Ian Capper, via Wikimedia Commons.

 

The National Library of Scotland is in the heart of Edinburgh, so it does not face the same issues with footfall. However, as Ruth pointed out, a key strategic goal of the Library is to reach people, and digitising is not the end of the road. Engagement with collections like the NLS Data Foundry is crucial, and the groundbreaking Scottish Chapbooks project run by the NLS was born out of the pandemic, showing a new imagining of institutional goals.

  • How do you incorporate Wikimedia work into your ‘normal’ work?

It was agreed that the inclusion of Wiki work in job descriptions could help drive change at an institutional level, while Rebecca pointed out that including Wiki activity as an outreach activity in funding applications is often a good way to make this work part of major research projects. Again, advocacy and emphasis on the ease with which Wiki work can be undertaken was a key focal point, showing colleagues that their interests and our tools can align well.

  • How do you implement elements of quality control to what is ultimately crowdsourced work?

Jason suggested that we start to think about ‘context’ control: we can upload content and edit and amend details from the beginning; however, how we contextualise this material and the activity of Wiki engagement is crucial. There is a high level of quality in curation already, and often Wiki datasets will link back to other repositories such as Flickr or institutional catalogues.

The classic counterpoint of ‘anyone can edit’ and ‘everyone can edit’ came to the fore here: as was rightly pointed out, the early 00s impression of Wikipedia as a free-for-all is largely outdated. In fact, expectations are often inverted, as the enthusiastic and diligent Wiki community are quick to act upon misinformation or inaccuracies. We spoke about the beauty of the process in Wikimedia whereby information picks up value and enriched data along the way, an active evolution of resources.

Image of the Wikipedia welcome page stating 'the free encyclopedia that anyone can edit'
The Wikipedia landing page: anyone can edit!

 

  • What about decolonisation and Wikimedia?

Decolonisation is a huge question for Wikimedia: movements around the world are examining what we can do to better serve the larger cause of anti-racist practice. For the British Library, I spoke about the work we have done on the India Office Records in offering a template for content warnings, and working with the input of our colleagues to make this as robust a model as we can.

Rebecca’s experience of working in Ireland was incredibly insightful: she shared with us the experience of working with Irish material that is shaped by colonial ideas of what Ireland is and how its culture has formed. Despite Ireland being a white, European, primarily English-speaking nation, the influence of colonialism is still felt.

The use of Wikimedia as a tool for breaking down barriers is vital, as each of our speakers illustrated. Jason spoke about the digital repatriation of items, and gave an example of the Red Book of Hergest, held by Jesus College Oxford (MS 111) and now available through Wikimedia Commons. Though this kind of action cannot always stand in place of physical repatriation, the move towards collaboration is notable and important.

 

An image of anti-Irish propaganda, featuring an Irish Frankenstein figure
'The Irish Frankenstein', a piece of anti-Irish propaganda from 1882. John Tenniel, Public domain, via Wikimedia Commons.

 

An hour was simply not enough! National Libraries Now was an incredibly important experience for me, at this point in my residency. I was particularly delighted with the dedication and enthusiasm of my co-panelists, and hope that we were able to shed some light on the Wikimedian-in-Residence role for those attending.

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian).

25 August 2021

Dabbling in DCMI

One of the best bits of working in digital scholarship is the variety of learning, training and knowledge exchange we can participate in. I have come to my post as a Wikimedian with a background in digital humanities and voluntary experience, and the opportunity to solidify my skills through training courses is really exciting.

Shortly after I started at the library, I had the chance to participate in the Library Juice Academy’s course ‘Introduction to Metadata’. Metadata has always fascinated me: as someone who can still remember when the internet was installed in their house, by means of numerous AOL compact discs, the way digital information has developed is something I have had direct experience of, even if I didn’t realise it.

Green and yellow CD with 1990s AOL branding.
Image of AOL CD, courtesy of archive.org.

Metadata, simply put, is data about data. It tells us information about a resource you might find in a library or museum: the author of a book, the composer of a song, the artist behind a painting. In analogue terms, this is like the title page of a novel. In digital terms, it sits alongside the content of the resource, in attached records or headers. In the Dublin Core Metadata Initiative format, one of the most common ways of expressing metadata, there are fifteen separate ‘elements’ you can apply to describe a resource, such as title, date, format and publisher.
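
As a rough illustration (the record below is invented rather than taken from a real catalogue), a Dublin Core description of a digitised book can be expressed as simple element/value pairs, using a handful of the fifteen elements:

```python
# A hypothetical Dublin Core record for a digitised book, expressed as
# element/value pairs; the values and identifier are invented placeholders.
record = {
    "title": "A Christmas Carol",
    "creator": "Dickens, Charles",
    "date": "1843",
    "publisher": "Chapman & Hall",
    "format": "application/pdf",     # the digitised surrogate, not the physical book
    "language": "en",
    "identifier": "https://example.org/items/12345",   # placeholder identifier
    "subject": "Christmas stories, English",
}

# Print the record using the conventional "dc:" element prefix.
for element, value in record.items():
    print(f"dc:{element} = {value}")
```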

Wikidata houses an amazing amount of data, which is unusual as it is not bounded by a set number of ‘elements’. There are many different ways of describing the items on Wikidata, and many properties and statements can be added to each item. There have been initiatives to integrate Wikidata and metadata in a meaningful way, such as the WikiProject Source Metadata and WikiCite. I have certainly found it very useful to have a sound understanding of metadata and its function, in order to utilise Wikidata effectively.
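
As a minimal sketch of what that open-endedness looks like in practice (using the public Wikidata wbgetentities API; Q42, Douglas Adams, is just a convenient example item), the snippet below fetches an item and counts how many different properties carry statements on it:

```python
# A minimal sketch: fetch one Wikidata item via the public wbgetentities API
# and show that its statements are not limited to a fixed set of elements.
import requests

response = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "wbgetentities", "ids": "Q42",
            "props": "claims", "format": "json"},
)
claims = response.json()["entities"]["Q42"]["claims"]

# Each key is a property ID (e.g. P31, 'instance of'); each value is a list of statements.
print(f"Q42 has statements for {len(claims)} different properties")
for prop, statements in list(claims.items())[:5]:
    print(prop, "->", len(statements), "statement(s)")
```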

Image of Wikicite logo, with birthday branding.
Wikicite 8th Birthday Logo by bleeptrack.

The Library Juice Academy course was asynchronous and highly useful. Over four weeks, we completed modules involving self-selected readings, discussion forum posts and video seminars. I particularly enjoyed the varied selection of readings: the group of participants came from a breadth of backgrounds and experiences, and the readings reflected this. The balance between theoretical reading and practical application was excellent, and I enjoyed getting to work with MARCEdit for the first time.

I completed the course in May 2021, and was delighted to receive my certificate by email. I have a much stronger handle on the professional standard of metadata in the GLAM sector and how this intersects with the potential of the vast array of data descriptors available in Wikidata. It was also a great opportunity to think about the room for nuance, subjectivity and bias in data. During Week One, we considered ‘Misinformation and Bias in Data Processing’ by Thornburg and Oskins. I said the following in our forum discussion:

“What I have taken from this piece is a real sense of the hard work that goes into the preparation of resources, and the many different forms bias can take, often inadvertently. It has made me think about and appreciate the difficult decisions that have to be made, and the processes that underlie these practices.”

Overall, participating in this course and expanding my skills into more traditional librarianship fields was fascinating, and left me eager to learn more about metadata and start working more closely with our collections and Wikidata.

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian).

24 August 2021

Important information for email subscribers of the British Library's Digital Scholarship blog

Unfortunately, the third-party platform that the British Library uses for email notifications for our blogs is making changes to its infrastructure. This means that, from August 2021, we anticipate that email notifications will no longer be sent to subscribers (although the provider has been unable to specify when exactly these will cease).

To find out when new blog posts are published, we recommend following us on Twitter @BL_DigiSchol or checking this page on the British Library website where all our blogs are listed.

We want to assure you that we are actively looking into this issue and working to implement a solution that will continue your email notifications; however, we do not know whether you will continue to receive notifications about new posts before we are able to implement this. We promise to update the blog with further information as soon as we have it. Thank you for your patience and understanding while we resolve this.

We appreciate this is inconvenient and know many people are not on social media and have no intention of being so. Many rely on email notifications and may miss out without them. As soon as we have been able to implement a new solution we will post about it here. Thanks for bearing with us.

12 August 2021

Dates to discuss Wikidata at Wikimania 2021

Wikimania is often the highlight of any Wikimedian’s calendar. Hosted by the Wikimedia Foundation, Wikimania is a conference like no other, with a large number of participants taking part in this annual celebration of open knowledge and Wikimedia projects. Previous events have taken place in Stockholm (2019), Cape Town (2018), Montreal (2017) and Italy (2016). Due to the ongoing global pandemic, this year’s conference, held 13-17 August 2021, is taking place entirely online, something Wikimania is ideally suited for!

  Logo for Wikimania 2021, 4 squares, 1 with a drawing of 12 peoples faces as if they are in a videocall, the 2nd of 2 jigsaw puzzle pieces, the 3rd of paper confetti and the 4th square showing 2 people sitting at a table talking

In addition to more traditional conference sessions, Wikimania will be running an Unconference, a Community Village, and a community Hackathon. Communication is encouraged through a variety of channels including Telegram, IRC and Wiki talk pages.

Telegram machine
A photograph of an old telegraph key by Sandra Tan on Unsplash

Looking at the programme, so many interesting topics are on the table for presentation and discussion: from copyright reform, to innovation and community development, there’s a wide spectrum of material to interest all Wikimedians of every level. Handily, events are rated in terms of their suitability for beginners, to make things as welcoming as possible. There is a whole strand of presentations devoted to Wikidata, which you can view here.

I am very excited to be presenting remotely at this conference on behalf of the British Library. I will be introducing the work of Tom Derrick on the Bengali Books Wikisource Competition, and Dominic Kane (UCL) on the India Office Records project. We have shaped our panel to show what GLAM institutions can do to promote and effectively utilise Wiki platforms for public engagement with library and archive collections. Our panel will run on Sunday 15th of August at 8.15pm (7.15pm UTC).

Wikimania is free to attend online, 13-17 August 2021; registration is open until midnight on Thursday 12th August. We hope to see you there!

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian)

03 August 2021

Automating the Recognition of Chinese Manuscripts: New Chevening British Library Fellowship

 

The Chevening Fellowship Programme is the UK government’s international awards scheme aimed at fostering knowledge exchange and collaboration, and developing global leaders. In 2015, the Foreign, Commonwealth & Development Office (FCDO) partnered with the British Library to offer professionals two new fellowships every year, and recently the two organisations announced the renewal of their partnership until 2024/25.

Chevening logo and the British Library logo

These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. The Library has hosted international fellows through this scheme since 2016, with each fellowship framing a distinct project inspired by Library collections. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with different themes such as archival material from Latin America and the Caribbean; African-language printed books; Nationalism, Independence, and Partition in South Asia; and Big Data and Libraries.

We are thrilled to (re-)announce that one of the two placements available for the 2022/2023 academic year will focus on automating the recognition of historical Chinese handwritten texts. This fellowship, originally announced two years ago, had to be postponed due to the pandemic – and we are excited to be able to offer it again. This is a special opportunity to work in the Library’s Digital Research Team, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.

End piece of a Chinese Lotus Sutra Scroll (shelfmark: Or.8210/S.1606). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project.

 

The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to diversely rich collections. However, it is important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western-language print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, the Living with Machines project, has been applying Optical Character Recognition (OCR) technology to UK newspapers, designing and implementing new methods in data science and artificial intelligence, and analysing these materials at scale.

Taking a broader perspective on Library collections, we have been exploring opportunities with non-Western collections too. Library staff have been engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for English, Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. We have been working with Transkribus as well – for example, Alex Hailey, Curator for Modern Archives and Manuscripts, used the software to automatically transcribe 19th century botanical records from the India Office Records. Ongoing work led by Tom Derrick aims to OCR our digitised collection of Bengali printed texts, digitised as part of the Two Centuries of Indian Print project.

 

Regions, text lines and illustrations demarcated as ground truth, as shown in Transkribus (Shelfmark: Or 3366). Digitised and available on Qatar Digital Library.

Another screenshot from Transkribus, showing automatically transcribed Bengali printed text (Shelfmark: VT 1914 d). Digitised as part of the Two Centuries of Indian Print project.

 

The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.

This fellowship programme will start in September 2022 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.

For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” fellowship page: https://www.chevening.org/fellowship/british-library-historical-chinese-texts/.

Applications open on 3 August, 12:00 (midday) BST and close on 2 November, 12:00 (midday) GMT.

Good Luck!

This post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She is on twitter as @BL_AdiKS

 

22 July 2021

Building the New Media Writing Prize Special Collection

The New Media Writing Prize is awarded annually to interactive works that use technology and digital tools in exciting and innovative ways. Organised by Bournemouth University, the prize is now in its 12th year and open for entries until 26th November 2021.

Banner saying "Innovative, Immersive, Interactive. The 2021 New Media Writing Prize is open for entries. Find out more.
The homepage banner on the New Media Writing Prize website

The British Library hosted a Digital Conversations event to celebrate the 10th anniversary of the prize in 2019, and as part of our work on collecting and preserving emerging formats, last year we started building a special collection to archive all shortlisted and winning entries to the prize in the UK Web Archive. Thanks to Joan Francis for her valued support adding targets and metadata into the Annotation and Curation Tool. At the time of writing, the collection stands at 226 websites, including not only all the works that were web-based and live at the moment of collection, but also blog posts, press kits, online reviews and authors’ websites. This kind of contextual information (like the data recorded on the ELMCIP Knowledge Base website) is especially valuable in those instances where the work itself couldn’t be captured, due to the limitations of web archiving tools, or because it had already disappeared from the Internet. More information on how the collection was conceived and developed is available in the Collection Scoping Document on the British Library Research Repository. 

In order to improve access to the collection and assure the quality of the websites we captured, a PhD placement project started at the beginning of this June. Tegan Pyke, from Cardiff Metropolitan University, is working on the collection to identify the best captures for each of these works and is also developing a creative response to the collection.

Tegan writes:

From the New Media Writing Prize shortlists, a total of 78 works have been captured, with each work averaging 13 instances to compare and contrast. Each instance represents a web crawl undertaken by the team from the Emerging Formats project.

Screen capture of UKWA search results
A screenshot showing the instances collected for Serge Bouchardon’s 2011 Main Prize winning piece, "Loss of Grasp".

One of the most difficult aspects of this work has been deciding what, exactly, constitutes an ‘acceptable’ capture. By nature digital works are highly complex—featuring audio, visual, and kinetic assets—and using bespoke platforms, formats, and code. These attributes are heightened by the speed at which technology changes; what was acceptable a decade ago may be entirely defunct today, as is the case with Adobe removing their Flash Player support.

After an initial overview of the collection, I came to the conclusion that a strict set of criteria wouldn’t be appropriate. Nor would the capture of all aspects of a work, as many—such as Amira Hanafi’s What I’m Wearing and J R Carpenter’s The Gathering Cloud—make use of external links or externally hosted image and video files. If these lie outside the UK Legal Deposit’s scope, capturing them in their entirety becomes more difficult and sometimes impossible.

Instead, I decided to focus on narrative, asking three questions as I approached each instance: 

  • Can viewers complete the narrative? 
  • Does the theme remain understandable?
  • Is the atmosphere (the overall mood of the piece) intact?

If an instance fulfils these questions, it’s acceptable, with the most complete of those captures being identified as suitable for display in the archive.

At this point, I’m half-way through comparing instances for the collection. Of the pieces captured, just less than half meet the criteria above. Out of these, most can be improved by additional crawls that capture the missing assets. Those that cannot be improved have, for the most part, been affected by software deprecation or EOL (end-of-life), where support has been completely removed.

I’m aiming to finish my review of the collection over the next couple of months, at which point I hope to provide further insight into the process. I’ve also started a collaboration with the BL's Wikimedian-in-Residence, Lucy Hinnie, to plan a Wikidata project related to the collection aiming to make use of contextual data points collected during its creation—I’m sure you’ll read about this work here soon!

This post is by Giulia Carla Rossi, Curator of Digital Publications (on Twitter as @giugimonogatari), and Tegan Pyke, a PhD student at Cardiff Metropolitan University currently undertaking a placement in Contemporary British Published Collections at the British Library.

09 July 2021

Subjects Wanted for Soothing Sounds Psychology Studies

Can you help University of Reading researchers with their studies examining the potential therapeutic effects of looking at ‘soothing’ images and listening to natural sounds on mental health and wellbeing?

Sound recordings for this research have been provided by Cheryl Tipp, Curator of Wildlife & Environmental Sounds, from the British Library Sound Archive.

One study focuses on young people; 13-17 year-olds are wanted for an easy online survey. Psychology Masters student Jasmiina Ryyanen from the University of Reading is asking young people to view and listen to 25 images and sounds, rating their moods before and after. Access the survey for 13-17 year-olds here: https://henley.eu.qualtrics.com/jfe/form/SV_eKaQjEf2H3Vqw9U.

Poster with details of Soothing Sounds student study for young people

There is also an online survey managed by Emily Witten, which is aimed at adults, so if you are over 18 please participate in this study: https://henley.eu.qualtrics.com/jfe/form/SV_cBa6tNtkN3fgkCO.  

Poster about Soothing Sounds student study for adults

Both surveys are completely randomised; some participants will be asked to look at images only, others to listen to sounds only, and the final group to look at images while listening to the sounds at the same time. These research projects have been fully approved by the University of Reading’s ethical standards board. If you have any questions about these surveys, please email Jasmiina Ryyanen (j.ryynanen(at)student.reading.ac.uk) and Emily Witten (e.i.c.witten(at)student.reading.ac.uk).

We hope you enjoy participating in these surveys and feel suitably soothed from the experience! 

This post is by Digital Curator Stella Wisdom (@miss_wisdom)