Digital Curator Dr Mia Ridge writes, In case you need a break from whatever combination of weather, people and news is around you, here are some ways you can entertain yourself (or the kids!) while helping make collections of the British Library more findable, or help researchers understand our past. You might even learn something or make new discoveries along the way!
Mia Ridge writes: Living with Machines is a collaboration between the British Library and the Alan Turing Institute with partner universities. Help us understand the 'machine age' through the eyes of ordinary people who lived through it. Our refreshed task builds on our previous work, and includes fresh newspaper titles, such as the Cotton Factory Times.
Launched in July this year, Agents of Enslavement? is a research project which explores the ways in which colonial newspapers in the Caribbean facilitated and challenged the practice of slavery. One goal is to create a database of enslaved people identified within these newspapers. This benefits people researching their family history as well as those who simply want to understand more about the lives of enslaved people and their acts of resistance.
Some of the remaining maps are quite tricky to georeference, so if there is a perplexing map that you would like some guidance with, do get in contact with me and our curator for modern mapping by emailing email@example.com and we will try to help. Please do look forward to some exciting new maps being released on the platform in 2022!
Engagement with cultural heritage collections and the research impact beyond mainstream metrics in arts and humanities
Open and Engaged, the British Library’s annual event in Open Access Week, took place virtually on 25 October. The theme of the conference was Understanding the Impact of Open in the Arts and Humanities beyond the University as you may see in a previous blog post.
The slides and the video recordings together with their transcripts are now available through the British Library’s Research Repository. This blog post will give you a flavour of the talks and the sessions in a nutshell.
Two main sessions formed the programme of the conference; one was on increasing the engagement with cultural heritage collections and the other one was on measuring and evaluating impact of open resources beyond journal articles.
British Library and Piazza by Paul Grundy
Session One: Increasing Engagement with Cultural Heritage Collections
The first session was opened with a talk from Brigitte Vézina from Creative Commons (CC). It was about how CC supports GLAM (Galleries, Libraries, Archives and Museums) in embracing open access and unlocking universal access to knowledge and culture. Brigitte introduced CC’s Open GLAM programme which is a coordinated global effort to help GLAMs make the content they steward openly available and reusable for the public good.
The British Library’s Sam van Schaik presented the Endangered Archives Programme (EAP), which provides funding for projects to digitise and preserve archival materials at risk of destruction. The resulting digital images and sound files are made available via the British Library’s website. Sam drew attention to the challenges around ethical issues with the CC licenses used for these digital materials and the practical considerations with working globally.
Merete Sanderhoff from the National Gallery of Denmark (SMK) raised a concern about how the GLAM sector at the institutional level is lagging behind in embracing the full potential of open cultural heritage. Merete explained that GLAM users increasingly benefit from arts and knowledge beyond institutional walls by using data from GLAM collections and by spurring on developments in digital literacy, citizen science and democratic citizenship.
The last talk of this session, presented by Programme Director Rebecca Bailey, was on Towards a National Collection (TaNC), the research development programme funded by AHRC. The programme sponsors projects that are working to link collections and encourage cross-searching of multiple collection types, to enable research and enhance public engagement. Rebecca outlined the achievements and ambitions of the projects, as they start to look ahead to a national collections research infrastructure.
This session highlighted that the GLAM sector should embrace their full potential in making cultural heritage open for public good beyond their physical premises. The use of more open and public domain licences will make it easier to use digital heritage content and resources in the research and creative spheres. The challenge comes with the unethical use of digital collections in some cases, but licensing mechanisms are not the tools with which to police research ethics.
Session Two: Measuring and Evaluating Impact of Open Resources Beyond Journal Articles
The second half of the conference started with a metrics project, Cobaltmetrics, which works towards making altmetrics genuinely alternative by using URIs. Luc Boruta from Thunken talked about bringing algorithmic fairness to impact measurement, from web-scale attention tracking to computer-assisted data storytelling.
Gemma Derrick from University of Lancaster presented on the hidden REF experience and highlighted assessing the broader value of research culture. Gemma noted that doubt about whether impact can be measured doesn’t come from a lack of tools; rather, what is considered impact differs between individuals, institutions, and disciplines. As she stated, “the nature of impact and the nature of evaluation is inherently better when humans are involved, mainly because mitigating factors and mitigating aspects of our research, and what makes our research culture really important, are less likely to be overlooked by an automated system.” This is what they addressed in the hidden REF, celebrating all research outputs and every role that makes research possible.
Anne Boddington from Kingston University reflected on research impact in three parts; looking at its definition, partnering and collaboration between GLAMs and higher education institutions, and the reflections on future benefits. Anne talked about the challenges of impact, the kinds of evidence it demands and the opportunities it presents. She concluded her talk noting that impact is here to stay and there are significant areas for growth, opportunities for innovation and leadership in the context of impact.
Helen Adams from Oxford University Gardens, Libraries & Museums (GLAM) presented the Online Active Community Engagement (O-ACE) project where they combined arts and science to measure the benefits of online culture for mental health in young people. She highlighted how GLAM organizations can actively involve audiences in medical research and how cultural interventions may positively impact individual wellbeing, prior to diagnosis, treatment, or social prescribing pathways. The conference ended with this great case study on impact assessment.
In her closing remarks, Rachael Kotarski of the British Library underlined that opening up GLAM organizations not only allows us to break down the walls of our buildings to get content out there but also to cross geographic boundaries and get content in front of communities who might not have had a chance to experience it before. It also allows us to work with communities who originated content to understand their concerns and not just the concerns of our organizations. Rachael echoed that licensing restrictions are not the solution to all our questions, or to the ethical issues. It is important that we can reflect on what we have learned to adjust and rethink our approach and identify what really allows us to balance access, engagement, and creativity.
In the context of research impact, we need to centre the human in our assessment and the processes. The other factor in impact assessments is the relatively short period of time available to assess it. Examples like the O-ACE project also showed us that the creation of impact can take much longer than we think, and that the impacts that can be seen will vary through that time. So, assessing those interventions also needs a longer-term view.
Those who didn’t attend the conference or would like to re-visit the talks can find the recordings in the British Library’s Research Repository. The social media interactions can be followed with the #OpenEngaged hashtag.
We are looking forward to hosting Open and Engaged 2022, hopefully in person, at the British Library.
This blog post was written by Ilkay Holt, Scholarly Communications Lead, part of the Research Infrastructure Services team.
Since 29 September, to support and guide the management of its collection, the Library has adopted a new persistent identifier policy. A persistent identifier or PID is a long lasting digital reference to an entity whether it is physical or digital. PIDs are a core component in providing reliable, long-term access to collections and improve their discoverability. They also make it easier to track when and how collections are used. The Library has been using PIDs in various forms for almost a decade but following the creation of a case study as part of the AHRC’s Towards a National Collection funded project, PIDs as IRO Infrastructure, the Library recognised the need to document its rationale and approach to PIDs and lay down principles and requirements for their use.
The Library encourages the use of PIDs across its collections and collection metadata. It recognises the role PIDs have as a component in sustainable, open infrastructure and in enabling interoperability and the use of Library resources. PIDs also support the Library’s content strategy and its goal of connecting rather than collecting as they enable long term and reliable access to resources.
Many different types of PIDs are used across the Library, some of which it creates for itself, e.g. ARKs, and others which it harvests from elsewhere, e.g. DOIs that are used to identify journal articles. While not all existing Library services may meet the requirements described in this policy, it provides a benchmark against which they can be measured and aspire to develop.
To make sure staff at the Library are supported in implementing the policy, a working group has been convened to run until the end of December 2022. This group will raise awareness of the policy and ensure that guidance is made available to any project or service which is under review to consider the use of PIDs.
A public version of the policy is available on this page and an extract with the key points is provided below. The group would like to acknowledge the Bibliothèque nationale de France’s policy which was influential in the creation of this policy.
In its use of identifiers, the British Library adheres to the following principles, which describe the qualities PIDs created, contributed or consumed by the Library must have.
A PID must never be deleted but may be marked as deprecated if required
A PID must be usable in perpetuity to identify its associated entry
A PID must only describe one entity and must never be reused for different entities
A PID must have established versioning processes and procedures in place; these may be defined locally by the Library as a creator or by the PID provider
A PID must have established governance mechanisms, such as contracts, in place to ensure the standards of use of the PID are met and continue to be met
A PID must resolve to metadata about the entity available in both a human and machine readable format
A publicly accessible PID must be resolvable via a global resolver
A PID must have an operating model that is sustainable for long-term persistent use
Established user community
A PID must have an established user community, which has adopted it as a standard, either through an organisation such as the International Organization for Standardization (ISO) or as a de facto standard through widespread adoption; the Library will support and develop the use of new types of PIDs where there is a defined and recognised use case which they would address
A PID must be able to link with the other identifiers in use at the Library through open metadata standards and the capability to cross-reference resources
New PID types or new use
New types of PIDs should only be considered for use in the Library where there is a defined need which cannot reasonably be met by a combination of PIDs already in use
Any new PID type used by the Library should meet the requirements described in this policy
Where a PID type is emerging and does not have an established community, the Library can seek to influence its development in line with principles for open and sustainable infrastructures
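The lifecycle principles above (a PID is never deleted, only deprecated, and is never reused for a different entity) can be sketched in a few lines of Python. This is an illustration only: the PidRegistry class, its method names and the example identifier are hypothetical and do not describe any system in use at the Library.

```python
from dataclasses import dataclass


@dataclass
class PidRecord:
    """One PID and the single entity it identifies (hypothetical model)."""
    pid: str
    entity: str
    deprecated: bool = False


class PidRegistry:
    """Toy in-memory registry illustrating the lifecycle principles."""

    def __init__(self):
        # Maps each PID string to its one PidRecord.
        self._records = {}

    def register(self, pid, entity):
        # A PID describes exactly one entity and is never reused,
        # even after it has been deprecated.
        if pid in self._records:
            raise ValueError(f"PID {pid!r} is already assigned")
        record = PidRecord(pid, entity)
        self._records[pid] = record
        return record

    def deprecate(self, pid):
        # Records are never deleted; they are only flagged as deprecated.
        self._records[pid].deprecated = True

    def resolve(self, pid):
        # Deprecated PIDs still resolve, so existing citations keep working.
        return self._records[pid]
```

In this sketch, deprecating "pid-001" leaves it resolvable, while a second call to register("pid-001", ...) raises a ValueError rather than silently pointing the identifier at something new.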
These requirements outline the Library’s responsibilities in using PID services and creating PIDs. While the Library uses identifiers which do not meet all of these requirements, they are included for future work and developments.
The Library aspires to assign PIDs to all resources within its collections, both physical and digital, and associated entities, in alignment with the guiding principles of the Library’s content strategy 2020-2023.
The Library has varying levels of involvement in different PID schemes, but all PIDs created by the Library must meet the requirements described in this section and the Library prefers the use of PIDs which meet the principles
Identifiers created by the Library must have an opaque format, i.e. not contain any semantic information within them, to ensure their longevity
A PID must resolve to information about the entity to which it refers
The Library must have a process to specify the granularity at which PIDs are assigned and how relationships between PIDs for component and overarching entities are managed
The Library must have a process to manage versioning including changes, merges and retirement of entities
Standard descriptive information about an entity, e.g. creator, should have a PID
All metadata associated with a PID should comply with Collection Metadata Licensing Guidelines
Where a PID referring to a citable resource resolves to a webpage, that webpage should display a suggested citation including the hyperlink to the PID to encourage ongoing use of the PID outside the Library
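As an illustration of the opacity and citation requirements above, the sketch below mints an identifier that carries no semantic information and builds a suggested citation that ends with the hyperlink to the PID. The function names, the alphabet, the default length, the citation layout and the example.org resolver address are all assumptions made for this example; none of them comes from the policy.

```python
import secrets

# Assumed alphabet: digits and consonants only (no vowels), so that
# randomly generated identifiers are unlikely to spell words.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def mint_opaque_id(length=10):
    """Return a random identifier with no meaning embedded in it."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def suggested_citation(creator, title, year, pid_url):
    """Build a citation string that includes the hyperlink to the PID."""
    return f"{creator} ({year}). {title}. {pid_url}"
```

A page for a citable resource could then display something like suggested_citation("British Library", "Example dataset", 2021, "https://example.org/" + mint_opaque_id()). Because the identifier encodes nothing about title, date or location, it does not need to change when any of those details do.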
If you would like to hear more about this policy and the Library’s approach to persistent identifiers, feel free to contact the Heritage PIDs project on Twitter or email firstname.lastname@example.org.
Posted by Mahendra Mahey, former Manager of British Library Labs or "BL Labs" for short
[estimated reading time of around 15 minutes]
This is my last day working as manager of BL Labs, and also my final posting on the Digital Scholarship blog. I thought I would take this chance to reflect on my journey of almost 9 years in helping to set up, maintain and enable BL Labs to become a permanent fixture at the British Library (BL).
BL Labs was the first digital Lab in a national library, anywhere in the world, that gets people to experiment with its cultural heritage digital collections and data. There are now several Gallery, Library, Archive and Museum Labs or 'GLAM Labs' for short around the world, with an active community which I helped build, from 2018.
I am really proud I was there from the beginning to implement the original proposal which was written by several colleagues, but especially Adam Farquhar, former head of Digital Scholarship at the British Library (BL). The project was at first generously funded by the Andrew W. Mellon foundation through four rounds of funding as well as support from the BL. In April 2021, the project became a permanently funded fixture, helped very much by my new manager Maja Maricevic, Head of Higher Education and Science.
The great news is that BL Labs is going to stay after I have left. The position of leading the Lab will soon be advertised. Hopefully, someone will get a chance to work with my helpful and supportive colleague Technical Lead of Labs, Dr Filipe Bento, bright, talented and very hard working Maja and other great colleagues in Digital Research and wider at the BL.
The beginnings, the BL and me!
I met Adam Farquhar and Aly Conteh (Former Head of Digital Research at the BL) in December 2012. They must have liked something about me because I started working on the project in January 2013, though I officially started in March 2013 to launch BL Labs.
I must admit, I had always felt a bit intimidated by the BL. My first visit was as a Psychology student in the early 1980s, before the St Pancras site was opened (in 1997). I remember coming up from Wolverhampton on the train to get a research paper about "Serotonin Pathways in Rats when sleeping" by Lidov, feeling nervous and excited at the same time. It felt like a place for 'really intelligent educated people' and for those who were one of the intellectual elites in society. It also felt for me a bit like it represented the British empire and its troubled history of colonialism, especially some of the collections which made me feel uncomfortable as to why they were there in the first place.
I remember thinking that the BL probably wasn't a place for someone like me, a child of Indian Punjabi immigrants from humble beginnings who came to England in the 1960s. Actually, I felt like an imposter and not worthy of being there.
Nearly 9 years later, I can say I learned to respect and even cherish what was inside it, especially the incredible collections, though I also became more confident about expressing stronger views about the decolonisation of some of these. I became very fond of some of the people who work in or use it; there are some really good kind-hearted souls at the BL. However, I never completely lost that 'imposter and being an outsider' feeling.
What I remember from that time, going for my interview, was thinking about what would happen if I got the position, and asking myself 'What would be the one thing I would try and change?'. It came easily to me, namely that I would try and get more new people through the doors literally or virtually by connecting them to the BL's collections (especially the digital). New people like me, who may never have set foot in, or been motivated to step into, the building before. This has been one of the most important reasons for me to get up in the morning and go to work at BL Labs.
So what have been my highlights? Let's have a very quick pass through!
BL Labs Launch and Advisory Board
I launched BL Labs in March 2013, one week after I had started. It was at the launch event organised by my wonderfully supportive and innovative colleague, Digital Curator Stella Wisdom. I distinctly remember in the afternoon session (which I did alone), I had to present my 'ideas' of how I might launch the first BL Labs competition where we would be trying to get pioneering researchers to work with the BL's digital collections.
God it was a tough crowd! They asked pretty difficult questions, questions I was asking myself too and to which I still didn't know the answers.
My first gut feeling overall after the event was, this is going to be hard work. This feeling and reality remained a constant throughout my time at BL Labs.
In early May 2013, we launched the competition, which was a really quick and stressful turnaround as I had only officially started in mid March (one and a half months). I remember worrying as to whether anyone would even enter! All the final entries were pretty much submitted a few minutes before the deadline. I remember being alone that evening on deadline day near to midnight waiting by my laptop, thinking what happens if no one enters, it's going to be a disaster and I will lose my job. Luckily that didn't happen; in the end, we received 26 entries.
I am a firm believer that we can help make our own luck, but sometimes luck can be quite random! Perhaps BL Labs had a bit of both!
After that, I never really looked back! BL Labs developed its own kind of pattern and momentum each year:
hunting around the BL for digital collections to make into datasets and make available
helping to make more digital collections openly licensed
having hundreds of conversations with people interested in connecting with the BL's digital collections in the BL and outside
working with some people more intensively to carry out experiments
developing ideas further into prototype projects
telling the world of successes and failures in person, meetings, events and social media
launching a competition and awards in April or May
roadshows before and after with invitations to speak at events around the world
the summer working with competition winners
late October/November the international symposium showcased things from the year
'Nothing interesting happens in the office' - Roadshows, Presentations, Workshops and Symposia!
One of the highlights of BL Labs was to go out to universities and other places to explain what the BL is and what BL Labs does. This ended up with me pretty much seeing the world (North America, Europe, Asia, Australia, and giving virtual talks in South America and Africa).
My greatest challenge in BL Labs was always to get people to truly and passionately 'connect' with the BL's digital collections and data in order to come up with cool ideas of what to actually do with them. What I learned from my very first trip was that telling people what you have is great, they definitely need to know what you have! However, once you do that, the hard work really begins as you often need to guide and inspire many of them, help and support them to use the collections creatively and meaningfully. It was also important to understand the back story of the digital collection and learn about the institutional culture of the BL if people also wanted to work with BL colleagues. For me and the researchers involved, inspirational engagement with digital collections required a lot of intellectual effort and emotional intelligence. Often this means asking the uncomfortable questions about research such as 'Why are we doing this?', 'What is the benefit to society in doing this?', 'Who cares?', 'How can computation help?' and 'Why is it necessary to even use computation?'.
Making those connections between people and data does feel like magic when it really works. It's incredibly exciting, suddenly everyone has goose bumps and is energised. This feeling, I will take away with me, it's the essence of my work at BL Labs!
A full list of over 200 presentations, roadshows, events and 9 annual symposia can be found here.
Competitions, Awards and Projects
Another significant way BL Labs has tried to connect people with data has been through Competitions (tell us what you would like to do, and we will choose an idea and work collaboratively with you on it to make it a reality), Awards (show us what you have already done) and Projects (collaborative working).
At the last count, we have supported and / or highlighted over 450 projects in research, artistic, entrepreneurial, educational, community based, activist and public categories, mostly through competitions, awards and project collaborations.
We also set up awards for British Library Staff which has been a wonderful way to highlight the fantastic work our staff do with digital collections and give them the recognition they deserve. I have noticed over the years that the number of staff who have been working on digital projects has increased significantly. Sometimes this was with the help of BL Labs but often because of the significant Digital Scholarship Training Programme, run by my Digital Curator colleagues in Digital Research for staff to understand that the BL isn't just about physical things but digital items too.
Browse through our project archive to get inspiration of the various projects BL Labs has been involved in or highlighted.
Putting the digital collections 'where the light is' - British Library platforms and others
When I started at BL Labs it was clear that we needed to make a fundamental decision about how we saw digital collections. Quite early on, we decided we should treat collections as data to harness the power of computational tools to work with each collection, especially for research purposes. Each collection should have a unique Digital Object Identifier (DOI) so researchers can cite them in publications. Any new datasets generated from them will also have DOIs, allowing us to understand the ecosystem through DOIs of what happens to data when you get it out there for people to use.
However, BL Labs has not stopped there! We always believed that it's important to put our digital collections where others are likely to discover them (we can't assume that researchers will want to come to BL platforms), 'where the light is' so to speak. We were very open and able to put them on other platforms such as Flickr and Wikimedia Commons, not forgetting that we still needed to do the hard work to connect data to people after they have discovered them, if they needed that support.
Our greatest success by far was placing 1 million largely undescribed images that were digitally snipped from 65,000 digitised public domain books from the 19th Century on Flickr Commons in 2013. The number of images on the platform has grown since then by another 50 to 60 thousand from collections elsewhere in the BL. There has been significant interaction from the public to generate crowdsourced tags that make it easier to find specific images. The number of views has reached a staggering total of over 2 billion in this time. There has also been an incredible array of projects which have used the images, from artistic use to using machine learning and artificial intelligence to identify them. It's my favourite collection, probably because there are no restrictions in using it.
Read the most popular blog post the BL has ever published by my former BL Labs colleague, the brilliant and inspirational Ben O'Steen, a million first steps and the 'Mechanical Curator' which describes how we told the world why and how we had put 1 million images online for anyone to use freely.
It is wonderful to know that George Oates, the founder of Flickr Commons and still a BL Labs Advisory Board member, has been involved in the creation of the Flickr Foundation which was announced a few days ago! Long live Flickr Commons! We loved it because it also offered a computational way to access the collections, critical for powerful and efficient computational experiments, through its Application Programming Interface (API).
I loved working with artists, it's my passion! They are so creative and often not restricted by academic thinking, see the work of Mario Klingemann for example! You can browse through our archives for various artistic projects that used the BL's digital collections, it's inspiring.
I am really proud of helping to create the international GLAM Labs community with over 250 members, established in 2018 and still active today. I affectionately call them the GLAM Labbers, and I often ask people to explore their inner 'Labber' when I give presentations. What is a Labber? It's the experimental and playful part of us we all had as children and unfortunately many have lost when becoming an adult. It's the ability to be fearless, having the audacity and perhaps even naivety to try crazy things even if they are likely to fail! Unfortunately society values success more than it does failure. In my opinion, we need to recognise, respect and revere those that have the courage to try but failed. That courage to experiment should be honoured and embraced and should become the bedrock of our educational systems from the very outset.
Two years ago, many of us Labbers 'ate our own dog food' or 'practised what we preached' when 15 other colleagues and I came together for 5 days to produce a book through a booksprint, probably the most rewarding professional experience of my life. The book is about how to set up, maintain, sustain and even close a GLAM Lab and is called 'Open a GLAM Lab'. It is available as public domain content and I encourage you to read it.
Online drop-in goodbye - today!
I organised a 30 minute ‘online farewell drop-in’ on Wednesday 29 September 2021, 1330 BST (London), 1430 (Paris, Amsterdam), 2200 (Adelaide), 0830 (New York) on my very last day at the British Library. It was heart-warming that the session was 'maxed out' at one point with participants from all over the world. I honestly didn't expect over 100 colleagues to show up. I guess when you leave an organisation you get to find out who you actually made an impact on, who shows up, and who tells you, otherwise you may never know.
Those that know me well know that I would have much rather had a farewell do ‘in person’, over a pint and praying for the ‘chip god’ to deliver a huge portion of chips with salt/vinegar and tomato sauce magically and mysteriously to the table. The pub would have been Mc'Glynns (http://www.mcglynnsfreehouse.com/) near the British Library in London. I wonder who the chip god was? I never found out ;)
The answer to who the chip god was is in the text following this sentence, in white-on-white text... you will be very shocked to know who it was!
Spoiler alert it was me after all, my alter ego
Mahendra's online farewell to BL Labs, Wednesday 29 September, 1330 BST, 2021. Left: Flowers and wine from the GLAM Labbers arrived in Tallinn, 20 mins before the meeting! Right: Some of the participants of the online farewell
Leave a message of good will to see me off on my voyage!
It would be wonderful if you would like to leave me your good wishes, comments, memories, thoughts, scans of handwritten messages, pictures, photographs etc. on the following Google doc:
I will leave it open for a week or so after I have left. Reading positive, sincere, heartfelt messages from colleagues and collaborators over the years has already lifted my spirits. For me it provides evidence that you perhaps did actually make a difference to someone's life. I will definitely be re-reading them during the cold dark Baltic nights in Tallinn.
I would love to hear from you and find out what you are doing, or if you prefer, you can email me, the details are at the end of this post.
BL Labs Sailor and Captain Signing Off!
It's been a blast and lots of fun! Of course there is a tinge of sadness in leaving! For me, it's also been intellectually and emotionally challenging as well as exhausting, with many ‘highs’ and a few ‘lows’ or choppy waters, some professional and others personal.
I have learned so much about myself and there are so many things I am really really proud of. There are other things of course I wish I had done better. Most of all, I learned to embrace failure, my best teacher!
I think I did meet my original wish of wanting to help to open up the BL to as many new people as possible who perhaps would never have engaged with the Library before. That was either by using digital collections and data for cool projects and/or simply walking through the doors of the BL in London or Boston Spa and having a look around and being inspired to do something because of it.
I wish the person who takes over my position lots of success! My only piece of advice is if you care, you will be fine!
Anyhow, what a time this has been for us all on this planet! I have definitely struggled at times. I, like many others, have lost loved ones and thought deeply about life and its true meaning. I have also managed to find the courage to know what’s important and act accordingly, even if that has been a bit terrifying and difficult at times. Leaving the BL for example was not an easy decision for me, and I wish perhaps things had turned out differently, but I know I am doing the right thing for me, my future and my loved ones.
Though there have been a few dark times for me both professionally and personally, I hope you will be happy to know that I have also found peace and happiness too. I am in a really good place.
I would like to thank the alumni of BL Labs: Ben O'Steen, Technical Lead for BL Labs from 2013 to 2018, and Hana Lewis (2016-2018) and Eleanor Cooper (2018-2019), both BL Labs Project Officers, as well as the many other people I worked with through BL Labs, more widely in the Library and outside it on my journey.
Where am I off to and what am I doing?
My professional plans are 'evolving', but one thing is certain, I will be moving country!
To Estonia to be precise!
I plan to live, settle down with my family and work there. I was never a fan of Brexit, and this way I get to stay a European.
I would like to finish with this final sweet video created by writer and filmmaker Ling Low and her team in 2016, entitled 'Hey there Young Sailor', which they all made as volunteers for the Malaysian band, the 'Impatient Sisters'. It won the BL Labs Artistic Award in 2016. I had the pleasure and honour of meeting Ling over a lovely lunch in Kuala Lumpur, Malaysia, where I had also given a talk at the National Library about my work and looked for remnants of my grandfather, who had settled there many years ago.
I wish all of you well, and if you are interested in keeping in touch with me, working with me or just saying hello, you can contact me via my personal email address: email@example.com or follow my progress on my personal website.
Happy journeys through this short life to all of you!
Six months ago, twenty members of staff from the British Library and The National Archives UK completed Computing for Cultural Heritage, a project that trialled Birkbeck University and Institute of Coding’s new PGCert, Applied Data Science. In this blog post we explore the necessity of this new course, the final report of this trial, and the lasting impact that this PGCert has made on some of the participants.
Information professionals have been experiencing a massive shift to digital in the way collections are being donated, held and accessed. In the British Library’s digital collections there are e-books, maps, digitised newspapers, journal titles, sound recordings and over 500 terabytes of preserved data from the UK Web Archive. Yearly, the library sees 6 million catalogue searches by web users with almost 4 million items consulted online. This amounts to a vast amount of potential cultural heritage data available to researchers, and it requires complex digital workflows to curate, collect, manage, provide access, and help researchers computationally make sense of it all.
Staff at collecting institutions like the British Library and the National Archives, UK are engaging in computationally driven projects like never before, but often without the benefit of data skills and computational thinking to support them. That is where a program like Computing for Cultural Heritage can help information professionals, allowing them to upskill and tackle issues – like building new digital systems and services, supporting collaborative, computational and data-driven research using digital collections and data, or deploying simple scripts to make everyday tasks easier – with confidence.
The trial course was broken into two modules, a taught lesson on ‘Demystifying Computing with Python’ and a written ‘Industry Project’ on a software solution to a work-based problem. A third module, Analytic Tools for Information Professionals, would be offered to participants outside of the trial as part of the full live course in order to earn their PGCert.
By the end of the trial, participants were able to:
Demonstrate satisfactory knowledge of programming with Python.
Understand techniques for Python data structures and algorithms.
Work on case studies to apply data analytics using Python.
Understand the programming paradigm of object-oriented programming.
Use Python to apply the techniques learned on the module to real-world problems.
Demonstrate the ability to develop an algorithm to carry out a specified task and to convert this into an executable program.
Demonstrate the ability to debug a program.
Understand the concepts of data security and general data protection regulations and standards.
Develop a systematic understanding and critical awareness of a commonly agreed problem between the work environment and the academic supervisor in the area of computing.
Develop a software solution for a work-based problem using the skills developed from the taught modules, for example develop software using the programming languages and software tools/libraries taught.
Present a critical discussion on existing approaches in the particular problem area and position their own approach within that area and evaluate their contribution.
Gain experience in communicating complex ideas/concepts and approaches/techniques to others by writing a comprehensive, self-contained report.
The learning objectives were designed and delivered with the cultural heritage context in mind, and as such incorporated, for instance, examples and datasets from the British Library Music collections in the Python programming elements of the taught module. Additionally, there was a lecture focused on a British Library use case involving the design and implementation of a Database Management System.
Following the completion of the trial, participants had the opportunity to complete their PGCert in Applied Data Science by attending the final module, Analytic Tools for Information Professionals, which was part of the official course launched last autumn.
The Lasting Impact of Computing for Cultural Heritage
Now that we’re six months on from the end of the trial, and the participants who opted in have earned their full PGCert, we followed up with some of the learners to hear about their experiences and the lasting effects of the course:
“The third and final module of the computing for cultural heritage course was not only fascinating and enjoyable, it was also really pertinent to my job and I was immediately able to put the skills I learned into practice.
The majority of the third module focussed on machine learning. We studied a number of different methods and one of these proved invaluable to the Agents of Enslavement research project I am currently leading. This project included a crowdsourcing task which asked the public to draw rectangles around four different types of newspaper advertisement. The purpose of the task was to use the coordinates of these rectangles to crop the images and create a dataset of adverts that can then be analysed for research purposes. To help ensure that no adverts were missed and to account for individual errors, each image was classified by five different people.
One of my biggest technical challenges was to find a way of aggregating the rectangles drawn by five different people on a single page in order to calculate the rectangles of best fit. If each person only drew one rectangle, it was relatively easy for me to aggregate the results using the coding skills I had developed in the first two modules. I could simply find the average (or mean) of the five different classification attempts. But what if people identified several adverts and therefore drew multiple rectangles on a single page? For example, what if person one drew a rectangle around only one advert in the top left corner of the page; people two and three drew two rectangles on the same page, one in the top left and one in the top right; and people four and five drew rectangles around four adverts on the same page (one in each corner). How would I be able to create a piece of code that knew how to aggregate the coordinates of all the rectangles drawn in the top left and to separately aggregate the coordinates of all the rectangles drawn in the bottom right, and so on?
One solution to this problem was to use an unsupervised machine learning method to cluster the coordinates before running the aggregation method. Much to my amazement, this worked perfectly and enabled me to successfully process the total of 92,218 rectangles that were drawn and create an aggregated dataset of more than 25,000 unique newspaper adverts.”
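The clustering-then-averaging idea described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the account does not say which unsupervised method was used, so the sketch swaps in a simple distance-threshold grouping of rectangle centres. The `(x, y, width, height)` rectangle format, the `max_distance` threshold and the sample coordinates are all assumptions made for the example.

```python
from statistics import mean


def centre(rect):
    """Centre point of a rectangle given as (x, y, width, height)."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def cluster_rectangles(rects, max_distance):
    """Group rectangles whose centres lie close to an existing cluster.

    A simple stand-in for an unsupervised clustering method: each
    rectangle joins the first cluster whose mean centre is within
    max_distance, otherwise it starts a new cluster.
    """
    clusters = []
    for rect in rects:
        cx, cy = centre(rect)
        for cluster in clusters:
            mx = mean(centre(r)[0] for r in cluster)
            my = mean(centre(r)[1] for r in cluster)
            if ((cx - mx) ** 2 + (cy - my) ** 2) ** 0.5 <= max_distance:
                cluster.append(rect)
                break
        else:
            clusters.append([rect])
    return clusters


def aggregate(cluster):
    """Rectangle of best fit: the mean of each coordinate in the cluster."""
    return tuple(mean(r[i] for r in cluster) for i in range(4))


# Hypothetical drawings for one page, following the scenario above:
# one person marks one advert, two people mark two, two people mark four.
drawn = [
    (50, 50, 400, 300),                                            # person 1
    (54, 48, 396, 304), (560, 52, 400, 300),                       # person 2
    (46, 52, 404, 296), (556, 48, 404, 304),                       # person 3
    (52, 50, 400, 300), (558, 50, 400, 300),
    (50, 900, 400, 300), (560, 900, 400, 300),                     # person 4
    (48, 50, 400, 300), (562, 50, 396, 296),
    (54, 904, 396, 300), (556, 896, 404, 304),                     # person 5
]

clusters = cluster_rectangles(drawn, max_distance=100)
best_fit = [aggregate(c) for c in clusters]
```

Each cluster then corresponds to one advert on the page, and `best_fit` holds one averaged rectangle per advert regardless of how many people drew it. A threshold this crude would struggle with adverts that sit close together, which is where a proper clustering library would earn its keep.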
“The final module of the course was in some ways the most challenging — requiring a lot of us to dust off the statistics and algebra parts of our brain. However, I think, it was also the most powerful; revealing how machine learning approaches can help us to uncover hidden knowledge and patterns in a huge variety of different areas.
Completing the course during COVID meant that collection access was limited, so I ended up completing a case study examining how generic tropes have evolved in science fiction across time using a dataset extracted from GoodReads. This work proved to be exceptionally useful in helping me to think about how computers understand language differently; and how we can leverage their ability to make statistical inferences in order to support our own, qualitative analyses.
In my own collection area, working with born digital archives in Contemporary Archives and Manuscripts, we treat draft material — of novels, poems or anything else — as very important to understanding the creative process. I am excited to apply some of these techniques — particularly Unsupervised Machine Learning — to examine the hidden relationships between draft material in some of our creative archives.
The course has provided many, many avenues of potential enquiry like this and I’m excited to see the projects that its graduates undertake across the Library.”
-Callum McKean, Lead Curator, Digital; Contemporary British Collection
"I really enjoyed the Analytics Tools for Data Science module. As a data science novice, I came to the course with limited theoretical knowledge of how data science tools could be applied to answer research questions. The choice of using real-life data to solve queries specific to professionals in the cultural heritage sector was really appreciated as it made everyday applications of the tools and code more tangible. I can see now how curators’ expertise and specialised knowledge could be combined with tools for data analysis to further understanding of and meaningful research in their own collection area."
The Computing for Cultural Heritage project concluded in February 2021 with a virtual panel session that highlighted the learners’ projects and allowed discussion of the course and feedback to the key project coordinators and contributors. Case studies of the participants’ projects, as well as links to other blog posts and project pages can be found on our Computing for Cultural Heritage Student Projects page.
The final report highlights these projects as well as demographic statistics on the participants and feedback that was gained through an anonymous survey at the end of the trial. In order to evaluate the experience of the students on the PGCert, we composed a list of questions that would provide insight into various aspects of the course, including how learners fitted the coursework around their work commitments and how well they met the learning objectives.
Why Computing for Cultural Heritage?
Figure 1: 'Why did you choose to do this course?' Results breakdown by topic and gender
When asked why the participants chose to take part in the course, we found that one of the most common answers was to develop methods for automating repetitive, manual tasks – such as generating unique identifiers for digital records and copying data between Excel spreadsheets – to free up more curatorial time for their digital collections. One participant said:
“I wanted to learn more about coding and how to use it to analyse data, particularly data that I knew was rich and had value but had been stuck in multiple spreadsheets for quite some time.”
There was also a desire to learn new skills, either for personal or professional development:
“I believe in continuous professional development and knew that this would be an invaluable course to undertake for my career.”
“I felt I was lagging behind and my job was getting static, and the feeling that I was behind [in digital] and I wanted to kind of catch up.”
Figure 2: 'Did the course help you meet your aims?' Results broken down by answer and gender.
A follow-up question asked whether these goals and aims were met by the course. Happily, most participants indicated that they had been met, for reasons of increased confidence, help in developing new computational skills, and a deeper knowledge of information technology.
What was the most enjoyed aspect of the course?
Figure 3: 'What did you enjoy most about the course?' Results breakdown by topic and gender
When broken down, the responses to ‘What did you enjoy most’ largely reflect the student experience, whether it was being in taught modules (4), getting hands-on experience (4), or being in a learning environment again (6). Participants also indicated that networking with peers was an enjoyable part of the experience:
“Day out of work with like minded people made it really easy to stick with rather than just doing it online.”
“Spending a day away from work and meeting the people I had never met at the NA, and also speaking to people from the BL about what they did.”
“I enjoyed being a student again, learning a new skill amongst my peers, which week after week is a really valuable experience…”
“Learning with colleagues and people working in similar fields was also a plus, as our interests often overlapped...”
While only two responses named the project module as one of the most enjoyable components, it was useful to see how the course gave participants the opportunity to apply their learning to a work-based problem in a way that benefits their role, department or digital collection:
“I really enjoyed being able to apply my learning to a real-world work-based project and to finally analyze some of the data that has been lying around the department for over a decade without any further analysis.”
“The design and create aspect of the project. Applying what I learned to solving a genuine problem was the most enjoyable part - using Python and solving problems to achieve something tangible. This is where I really consolidated my learning.”
What was the most challenging aspect of the course?
Figure 4: 'What did you find the most challenging and why?' Results breakdown by topic and gender.
When discussing the most challenging aspect of the course, most of the learners focused on the practical Python lab sessions and the work-based project module. Interestingly, participants also stated that they were able to overcome the challenges through personal perseverance and the learning provided by the course itself:
“I found the initial hurdle of learning how [to] code very challenging, but after the basics it became possible to become more creative and experimental.”
“The work-based project was a huge challenge. We'd only really done 5 weeks of classes and, having never done anything like this before, it was hard to envisage an end product let alone how to put it together. But got there in the end!”
While the majority of the cohort found the practical components of the PGCert trial most challenging, the feedback also suggested that the inclusion of the second module – which will be available as part of the full programme – will provide more opportunity to practice the practical programming skills like software tools and APIs.
The Effectiveness of Computing for Cultural Heritage
Figure 5: 'Have you applied anything you have learnt?' Results breakdown by topic and gender.
Participants were asked whether they had used any of the knowledge or skills acquired in the PGCert trial. Even after sitting just the first and third modules, participants responded that they were able to apply their learning to their current role in some form.
“I now regularly use the software program I built as part of my day-to-day job. This program performs a task in a few seconds, which otherwise could take hours or days, and which is otherwise subject to human error. I have since adapted this so that it can also be used by a colleague in another department.”
“Python helps me perform tasks that I previously did not know how to achieve. I have also led a couple of training sessions within the library, introducing Python to beginners (using the software I built in the project as a cultural heritage use case to frame the introduction).”
“I changed [job] role at the end of the course so I think that helped me also in getting this promotion. And in this new role I have many more data analysis tasks to perform [quickly] for actions that would take months so yeah I managed to write that with a few scripts in my new role.”
It was great to hear that the impacts of the trial were being felt so immediately by the participants, and that they were able to not only retain but also apply the new skills that they had gained.
This blog post was written by Deirdre Sullivan, Business Support Officer for Digital Scholarship Training Initiatives, part of the Digital Research and Curators Team. Special thanks to Nora McGregor, Digital Curator for the European and American Collection for support on the blog post and Martyn Harris, Institute of Coding Manager, for his work on the final report, as well as Giulia Rossi, Callum McKean and Graham Jevon for sharing their experiences.
Can you help University of Reading researchers with their studies examining the potential therapeutic effects of looking at ‘soothing’ images and listening to natural sounds on mental health and wellbeing?
Both surveys are completely randomised; some participants will be asked to look at images only, others to listen to sounds only, and the final group to look at images while listening to the sounds at the same time. These research projects have been fully approved by the University of Reading’s ethical standards board. If you have any questions about these surveys, please email Jasmiina Ryynanen (j.ryynanen(at)student.reading.ac.uk) and Emily Witten (e.i.c.witten(at)student.reading.ac.uk).
We hope you enjoy participating in these surveys and feel suitably soothed from the experience!
I began a work placement with the Two Centuries of Indian Print project at the British Library, working with my supervisor, Digital Curator Tom Derrick, to automatically transcribe the Library’s Bengali books digitised and catalogued as part of the project. The OCR application we use for transcription is Transkribus, a leading text recognition application for historical documents. We also use a Google Sheet to instantly update each book’s basic information and job status.
In the first two days, I received training in how to use the Transkribus application through a (virtual) face-to-face demonstration from my supervisor, since I didn't know how to use OCR. He also provided a manual for me to refer to in my practice. There are three main steps to complete a book transcription: uploading books, running layout analysis, and running text recognition. We upload books from the British Library’s IIIF image viewer to Transkribus. I needed to first confirm the name and digital system number of a book from our team’s shared Google Sheet so that I could find the digital content of this book within the BL online catalogue. I would record the number of pages the book has in the Google Sheet at the same time. Then I copied the URL of the IIIF manifest and imported the book into our project's collection in Transkribus. After that, I would run layout analysis in Transkribus. It usually takes several minutes to run, and the more pages there are, the more time it takes. A perfect layout analysis yields one baseline for each line of text on a page.
Although Transkribus is trained on 100+ pages, it still makes mistakes for several reasons. Titles or chapter headers whose font size differs significantly from the other text are sometimes missed; patterned dividers and borders on the title page are easily misidentified as text; and sometimes the color of the paper is too dark, making it difficult to recognize the text. In these cases, the user needs to manually revise the recognition result. After checking the quality of the layout analysis, I could then run text recognition. The final step is to check the results of the text recognition and update the Google Sheet.
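The page count recorded in the Google Sheet can in principle be read from the IIIF manifest itself, since a manifest lists one canvas per page image. The sketch below is a hedged illustration rather than part of the project's workflow: `count_pages` is a hypothetical helper, the two manifests are made-up minimal examples, and fetching a real manifest from its URL is left out. It handles both the IIIF Presentation API 2.x layout (`sequences` containing `canvases`) and the 3.0 layout (canvases directly under `items`), because the post does not say which version the Library's manifests use.

```python
import json


def count_pages(manifest):
    """Return the number of canvases (page images) in a IIIF manifest dict."""
    if "sequences" in manifest:
        # Presentation API 2.x: canvases live inside the first sequence.
        sequences = manifest["sequences"]
        return len(sequences[0].get("canvases", [])) if sequences else 0
    # Presentation API 3.0: canvases are listed directly under "items".
    return sum(1 for item in manifest.get("items", [])
               if item.get("type") == "Canvas")


# Made-up minimal manifests, standing in for a downloaded manifest file.
v2_manifest = json.loads("""
{"@type": "sc:Manifest",
 "sequences": [{"canvases": [{"@id": "c1"}, {"@id": "c2"}, {"@id": "c3"}]}]}
""")
v3_manifest = json.loads("""
{"type": "Manifest",
 "items": [{"id": "c1", "type": "Canvas"}, {"id": "c2", "type": "Canvas"}]}
""")

pages_v2 = count_pages(v2_manifest)
pages_v3 = count_pages(v3_manifest)
```

Checking this number against the sheet before running layout analysis is one way to spot an incomplete import early.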
Above: A view of a book in the Transkribus application, showing the page images and transcription underneath
During the three weeks of the placement, I handled a total of twelve books. In addition to the regular progression patterns described earlier, I was fortunate to come across several books that required special handling and used them to learn how to handle various situations. For example, the image above shows the result of text recognition for a page of the first book I dealt with in Transkribus, Dhārāpāta: prathama bhāg. Pāṭhaśālastha śiśu digera śikshārtha/ Cintāmani Pāl. Every word in this book is very short and widely spaced, making it very difficult for Transkribus to identify the layout. Because the book is only 28 pages long, I manually labeled all the layouts.
In addition to my work, I have had the pleasure of interacting with many British Library curators and investigators who are engaged in digitization. I attended a regular meeting of our project and learnt how the work is divided among the digital project members. My supervisor Tom also contacted some colleagues who work on the digitization of Chinese collections and gave me the opportunity to meet them, which has benefited me a lot.
The Principal Investigator for our 2CIP project, Adi, who also has been involved with research and development of Chinese OCR/HTR at the British Library, shared with me the challenges of Chinese OCR/HTR and the progress of current research at the British Library.
Curator for the International Dunhuang Project, Melodie, and a project manager, Tan, presented the research content and outcomes of the project. This project has many partner institutions in different countries that have collections related to the Silk Road. It is a very meaningful digitization project and I admire the development of this project.
The lead Curator for the British Library’s Chinese collections, Sara, introduced different types of Chinese collections and some representative collections in the British Library to me. She also shared with me the objective problems they would encounter when digitizing collections.
Three weeks passed quickly and I gained a lot from my experience at the British Library. In addition to the specifics of how to use Transkribus for text recognition, I have learned about the achievements and problems faced in digitizing Chinese collections from a variety of perspectives.
If the page doesn’t already exist, then creating it is also very simple: just select ‘create a new item’ from the menu on the left-hand side of the page.
When using Wikidata, there are some powerful tools which make adding data quicker and easier. One of these is QuickStatements. Unfortunately, using QuickStatements requires that you have made 50 edits on Wikidata before you run your first batch. Fortunately, making those edits is rather quicker than Citation Hunt (for which, see Triangulating Bermuda, Detroit and William Wallace).
Creating a new item in Wikidata
I made those 50 edits very quickly, by setting up Wikidata item pages for each of the sample items from the India Office Records that we are working with (at the moment we are prioritising adding information about the records; further work will take place before any digitised items are uploaded to Wikimedia platforms). Basic information was added to each of the item pages.
Q107074810 (1888-9 Report on the Administration of Bengal)
Q107074801 (1889-90 Report on the Administration of Bengal)
Once I had done this, it became clear that I needed to create more general pages, which could contain the DOIs that link back to the digitised records which are currently only accessible via batch download through the British Library research repository.
Q107134086 Page for administrative reports (V/10/60-1) in general.
Q107136752 Page for India lists (V/13/173-6) in general.
The final preparatory step was to create a WikiProject page, which will facilitate collaboration on the project. This page contains links to all the pages involved in the project and will soon also contain useful resources such as templates for creating new pages as part of the project and queries for using the data.
After this, I began to experiment with Quick Statements, making heavy use of the useful guide to it available on Wikidata.
I decided to upload information on members of a particular regiment in Bengal, since this was information I could easily copy into a spreadsheet because the versions of the reports in the British Library research repository support Optical Character Recognition (OCR).
Section of the original India Office List containing information on members of the 14th Infantry Regiment (IOR/V/6/175, page 258)
Finally, once I had done all of this, I met with the curators of the India Office Records for feedback and suggestions. It became clear from this that there was in fact some confusion about the exact identification of the regiment the officers belonged to. Fortunately, it turned out we had identified the correct regiment, but had we made a mistake, a simple batch of Quick Statements edits would have put it right quickly.
Section of my spreadsheet of members of the 14th Infantry Regiment
All in all, I can recommend using Wikidata, and I hope I have shown not only that it can be a useful tool but also that it is easy to use. The next step for our Wikidata project will be to upload templates and case studies to help and support future volunteer editors to develop it further. We will also add resources to support research on the uploaded data.
Screenshot of Quick Statements for adding gender to each of the pages for the officers
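A batch like the one in the screenshot can be generated from a spreadsheet rather than typed by hand. The following is a minimal sketch under stated assumptions, not the script used in the project: the column names `item` and `name` and the two rows are hypothetical, and the item IDs are placeholders rather than the officers' real pages. It relies on the tab-separated QuickStatements version 1 format (item, property, value) and on the Wikidata property P21 (sex or gender) with the value Q6581097 (male).

```python
import csv
import io

SEX_OR_GENDER = "P21"   # Wikidata property: sex or gender
MALE = "Q6581097"       # Wikidata item: male


def gender_statements(rows, gender_qid=MALE):
    """Build one tab-separated QuickStatements (V1) line per spreadsheet row."""
    return ["\t".join([row["item"], SEX_OR_GENDER, gender_qid])
            for row in rows if row["item"]]


# Hypothetical spreadsheet export; the item IDs are placeholders only.
sheet = io.StringIO(
    "item,name\n"
    "Q4115189,Officer A\n"
    "Q13406268,Officer B\n"
)
rows = list(csv.DictReader(sheet))
batch = "\n".join(gender_statements(rows))
print(batch)
```

The printed lines can be pasted into the QuickStatements batch window. Keeping the spreadsheet as the source of truth also makes a correction batch straightforward if an identification later turns out to be wrong.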