23 September 2021
Computing for Cultural Heritage: Trial Outcomes and Final Report
Six months ago, twenty members of staff from the British Library and The National Archives UK completed Computing for Cultural Heritage, a project that trialled Birkbeck University and Institute of Coding’s new PGCert, Applied Data Science. In this blog post we explore the necessity of this new course, the final report of this trial, and the lasting impact that this PGCert has made on some of the participants.
Information professionals have been experiencing a massive shift to digital in the way collections are being donated, held and accessed. In the British Library’s digital collections there are e-books, maps, digitised newspapers, journal titles, sound recordings and over 500 terabytes of preserved data from the UK Web Archive. Yearly, the library sees 6 million catalogue searches by web users with almost 4 million items consulted online. This adds up to a vast quantity of potential cultural heritage data available to researchers, and it requires complex digital workflows to collect, curate, manage and provide access to it, and to help researchers make sense of it all computationally.
Staff at collecting institutions like the British Library and The National Archives UK are engaging in computationally driven projects like never before, but often without the benefit of data skills and computational thinking to support them. That is where a programme like Computing for Cultural Heritage can help information professionals, allowing them to upskill and tackle issues – like building new digital systems and services, supporting collaborative, computational and data-driven research using digital collections and data, or deploying simple scripts to make everyday tasks easier – with confidence.
The trial course was broken into two modules, a taught lesson on ‘Demystifying Computing with Python’ and a written ‘Industry Project’ on a software solution to a work-based problem. A third module, Analytic Tools for Information Professionals, would be offered to participants outside of the trial as part of the full live course in order to earn their PGCert.
By the end of the trial, participants were able to:
- Demonstrate satisfactory knowledge of programming with Python.
- Understand techniques for Python data structures and algorithms.
- Work on case studies to apply data analytics using Python.
- Understand the programming paradigm of object-oriented programming.
- Use Python to apply the techniques learned on the module to real-world problems.
- Demonstrate the ability to develop an algorithm to carry out a specified task and to convert this into an executable program.
- Demonstrate the ability to debug a program.
- Understand the concepts of data security and general data protection regulations and standards.
- Develop a systematic understanding and critical awareness of a commonly agreed problem between the work environment and the academic supervisor in the area of computing.
- Develop a software solution for a work-based problem using the skills developed from the taught modules, for example develop software using the programming languages and software tools/libraries taught.
- Present a critical discussion on existing approaches in the particular problem area and position their own approach within that area and evaluate their contribution.
- Gain experience in communicating complex ideas/concepts and approaches/techniques to others by writing a comprehensive, self-contained report.
The learning objectives were designed and delivered with the cultural heritage context in mind, and as such the Python programming elements of the taught module incorporated, for instance, examples and datasets from the British Library Music collections. Additionally, there was a lecture focused on a British Library use case involving the design and implementation of a Database Management System.
Following the completion of the trial, participants had the opportunity to complete their PGCert in Applied Data Science by attending the final module, Analytic Tools for Information Professionals, which was part of the official course launched last autumn.
The Lasting Impact of Computing for Cultural Heritage
Now that we’re six months on from the end of the trial, and the participants who opted in have earned their full PGCert, we followed up with some of the learners to hear about their experiences and the lasting effects of the course:
“The third and final module of the computing for cultural heritage course was not only fascinating and enjoyable, it was also really pertinent to my job and I was immediately able to put the skills I learned into practice.
The majority of the third module focussed on machine learning. We studied a number of different methods and one of these proved invaluable to the Agents of Enslavement research project I am currently leading. This project included a crowdsourcing task which asked the public to draw rectangles around four different types of newspaper advertisement. The purpose of the task was to use the coordinates of these rectangles to crop the images and create a dataset of adverts that can then be analysed for research purposes. To help ensure that no adverts were missed and to account for individual errors, each image was classified by five different people.
One of my biggest technical challenges was to find a way of aggregating the rectangles drawn by five different people on a single page in order to calculate the rectangles of best fit. If each person only drew one rectangle, it was relatively easy for me to aggregate the results using the coding skills I had developed in the first two modules. I could simply find the average (or mean) of the five different classification attempts. But what if people identified several adverts and therefore drew multiple rectangles on a single page? For example, what if person one drew a rectangle around only one advert in the top left corner of the page; people two and three drew two rectangles on the same page, one in the top left and one in the top right; and people four and five drew rectangles around four adverts on the same page (one in each corner). How would I be able to create a piece of code that knew how to aggregate the coordinates of all the rectangles drawn in the top left and to separately aggregate the coordinates of all the rectangles drawn in the bottom right, and so on?
One solution to this problem was to use an unsupervised machine learning method to cluster the coordinates before running the aggregation method. Much to my amazement, this worked perfectly and enabled me to successfully process the total of 92,218 rectangles that were drawn and create an aggregated dataset of more than 25,000 unique newspaper adverts.”
-Graham Jevon, EAP Cataloguer; BL Endangered Archives Programme
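The cluster-then-average approach Graham describes can be sketched in Python roughly as follows. This is an illustration only, not the code used in the Agents of Enslavement project. The quote does not name the unsupervised method, so the simple distance-threshold clustering used here is an assumption, as are the function names, the (x, y, width, height) rectangle layout, the `max_distance` value and the sample coordinates.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical rectangle layout: (x, y, width, height) in pixels.
Rect = Tuple[float, float, float, float]


def centre(rect: Rect) -> Tuple[float, float]:
    """Return the centre point of a rectangle."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def cluster_rectangles(rects: List[Rect], max_distance: float) -> List[List[Rect]]:
    """Group rectangles whose centres lie close together.

    A rectangle joins the first cluster whose mean centre is within
    max_distance of its own centre; otherwise it starts a new cluster.
    """
    clusters: List[List[Rect]] = []
    for rect in rects:
        cx, cy = centre(rect)
        for cluster in clusters:
            mx = mean(centre(r)[0] for r in cluster)
            my = mean(centre(r)[1] for r in cluster)
            if ((cx - mx) ** 2 + (cy - my) ** 2) ** 0.5 <= max_distance:
                cluster.append(rect)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([rect])
    return clusters


def aggregate(cluster: List[Rect]) -> Rect:
    """Average each coordinate across a cluster to get one best-fit rectangle."""
    return tuple(mean(r[i] for r in cluster) for i in range(4))


# Made-up drawings for one page, following the scenario in the quote:
# person 1 marks one advert, persons 2 and 3 mark two, persons 4 and 5 mark four.
drawn: List[Rect] = [
    (50, 40, 300, 200),                                          # person 1
    (55, 45, 290, 195), (650, 40, 300, 200),                     # person 2
    (48, 38, 305, 205), (655, 44, 295, 204),                     # person 3
    (52, 42, 298, 198), (648, 42, 302, 198),
    (50, 700, 300, 200), (650, 700, 300, 200),                   # person 4
    (45, 35, 310, 202), (647, 38, 303, 198),
    (54, 704, 296, 196), (646, 696, 304, 204),                   # person 5
]

clusters = cluster_rectangles(drawn, max_distance=100)
adverts = [aggregate(cluster) for cluster in clusters]

for advert in adverts:
    print(advert)
```

Clustering first means the averaging step only ever sees rectangles that refer to the same advert, however many adverts each person marked. On real data an established clustering algorithm from a machine learning library would likely be a better choice than this order-dependent sketch.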
“The final module of the course was in some ways the most challenging — requiring a lot of us to dust off the statistics and algebra parts of our brain. However, I think, it was also the most powerful; revealing how machine learning approaches can help us to uncover hidden knowledge and patterns in a huge variety of different areas.
Completing the course during COVID meant that collection access was limited, so I ended up completing a case study examining how generic tropes have evolved in science fiction across time using a dataset extracted from GoodReads. This work proved to be exceptionally useful in helping me to think about how computers understand language differently; and how we can leverage their ability to make statistical inferences in order to support our own, qualitative analyses.
In my own collection area, working with born digital archives in Contemporary Archives and Manuscripts, we treat draft material — of novels, poems or anything else — as very important to understanding the creative process. I am excited to apply some of these techniques — particularly Unsupervised Machine Learning — to examine the hidden relationships between draft material in some of our creative archives.
The course has provided many, many avenues of potential enquiry like this and I’m excited to see the projects that its graduates undertake across the Library.”
-Callum McKean, Lead Curator, Digital; Contemporary British Collection
"I really enjoyed the Analytics Tools for Data Science module. As a data science novice, I came to the course with limited theoretical knowledge of how data science tools could be applied to answer research questions. The choice of using real-life data to solve queries specific to professionals in the cultural heritage sector was really appreciated as it made everyday applications of the tools and code more tangible. I can see now how curators’ expertise and specialised knowledge could be combined with tools for data analysis to further understanding of and meaningful research in their own collection area."
-Giulia Carla Rossi, Curator, Digital Publications; Contemporary British Collection
The Computing for Cultural Heritage project concluded in February 2021 with a virtual panel session that highlighted the learners’ projects and allowed discussion of the course and feedback to the key project coordinators and contributors. Case studies of the participants’ projects, as well as links to other blog posts and project pages, can be found on our Computing for Cultural Heritage Student Projects page.
The final report highlights these projects, as well as demographic statistics on the participants and feedback gathered through an anonymous survey at the end of the trial. To evaluate the students’ experience of the PGCert, we composed a list of questions designed to provide insight into various aspects of the course, including how learners fitted the coursework around their work commitments and how well they met the learning objectives.
Why Computing for Cultural Heritage?
When asked why the participants chose to take part in the course, we found that one of the most common answers was to develop methods for automating repetitive, manual tasks – such as generating unique identifiers for digital records and copying data between Excel spreadsheets – to free up more curatorial time for their digital collections. One participant said:
“I wanted to learn more about coding and how to use it to analyse data, particularly data that I knew was rich and had value but had been stuck in multiple spreadsheets for quite some time.”
There was also a desire to learn new skills, either for personal or professional development:
“I believe in continuous professional development and knew that this would be an invaluable course to undertake for my career.”
“I felt I was lagging behind and my job was getting static, and the feeling that I was behind [in digital] and I wanted to kind of catch up.”
A follow-up question asked whether these goals and aims were met by the course. Happily, most participants indicated that they had been met, citing increased confidence, help in developing new computational skills, and a deeper knowledge of information technology.
What was the most enjoyed aspect of the course?
When broken down, the responses to ‘What did you enjoy most’ largely reflect the student experience, whether it was being in taught modules (4), getting hands-on experience (4), or being in a learning environment again (6). Participants also indicated that networking with peers was an enjoyable part of the experience:
“Day out of work with like minded people made it really easy to stick with rather than just doing it online.”
“Spending a day away from work and meeting the people I had never met at the NA, and also speaking to people from the BL about what they did.”
“I enjoyed being a student again, learning a new skill amongst my peers, which week after week is a really valuable experience…”
“Learning with colleagues and people working in similar fields was also a plus, as our interests often overlapped...”
While only two responses named the project module as one of the most enjoyable components, it was useful to see how the course gave learners the opportunity to apply what they had learned to a work-based problem, with some benefit to their role, department or digital collection:
“I really enjoyed being able to apply my learning to a real-world work-based project and to finally analyze some of the data that has been lying around the department for over a decade without any further analysis.”
“The design and create aspect of the project. Applying what I learned to solving a genuine problem was the most enjoyable part - using Python and solving problems to achieve something tangible. This is where I really consolidated my learning.”
What was the most challenging aspect of the course?
When discussing the most challenging aspect of the course, most of the learners focused on the practical Python lab sessions and the work-based project module. Interestingly, participants also stated that they were able to overcome the challenges through personal perseverance and the learning provided by the course itself:
“I found the initial hurdle of learning how [to] code very challenging, but after the basics it became possible to become more creative and experimental.”
“The work-based project was a huge challenge. We'd only really done 5 weeks of classes and, having never done anything like this before, it was hard to envisage an end product let alone how to put it together. But got there in the end!”
While the majority of the cohort found the practical components of the PGCert trial most challenging, the feedback also suggested that the inclusion of the second module – which will be available as part of the full programme – will provide more opportunity to practise programming skills, such as working with software tools and APIs.
The Effectiveness of Computing for Cultural Heritage
Participants were asked whether they had used any of the knowledge or skills acquired in the PGCert trial. Even after sitting just the first and third modules, participants responded that they were able to apply their learning to their current role in some form.
“I now regularly use the software program I built as part of my day-to-day job. This program performs a task in a few seconds, which otherwise could take hours or days, and which is otherwise subject to human error. I have since adapted this so that it can also be used by a colleague in another department.”
“Python helps me perform tasks that I previously did not know how to achieve. I have also led a couple of training sessions within the library, introducing Python to beginners (using the software I built in the project as a cultural heritage use case to frame the introduction).”
“I changed [job] role at the end of the course so I think that helped me also in getting this promotion. And in this new role I have many more data analysis tasks to perform [quickly] for actions that would take months so yeah I managed to write that with a few scripts in my new role.”
It was great to hear that the impacts of the trial were being felt so immediately by the participants, and that they were able to not only retain but also apply the new skills that they had gained.
This blog post was written by Deirdre Sullivan, Business Support Officer for Digital Scholarship Training Initiatives, part of the Digital Research and Curators Team. Special thanks to Nora McGregor, Digital Curator for the European and American Collection for support on the blog post and Martyn Harris, Institute of Coding Manager, for his work on the final report, as well as Giulia Rossi, Callum McKean and Graham Jevon for sharing their experiences.