Digital scholarship blog

225 posts categorized "Data"

30 November 2021

BL Labs Online Symposium 2021, Special Climate Change Edition: Speakers Announced!

BL Labs 9th Symposium – Special Climate Change Edition is taking place on Tuesday 7 December 2021. This special event is devoted to looking at computational research and climate change.

A polar bear jumping off an iceberg with the rear of a ship showing. Image captioned: 'A Bear Plunging Into The Sea'
British Library digitised image from page 303 of "A Voyage of Discovery, made under the orders of the Admiralty, in his Majesty's ships Isabella and Alexander for the purpose of exploring Baffin's Bay, and enquiring into the possibility of a North-West Passage".

To help us explore a range of complex issues at the intersection of computational research and climate change we are delighted to announce our expert panel:

  • Schuyler Esprit – Founding Director of Create Caribbean Research Institute & Research Officer at the School of Graduate Studies and Research at the University of the West Indies
  • Helen Hardy – Science Digital Programme Manager at the Natural History Museum, London, responsible for mass digitisation of the Museum’s collections of 80 million items
  • Joycelyn Longdon – Founder of ClimateInColour, a platform at the intersection of climate science and social justice, and PhD Student on the Artificial Intelligence for Environmental Risk programme at University of Cambridge
  • Gavin Shaddick – Chair of Data Science and Statistics, University of Exeter, Director of the UKRI funded Centre for Doctoral Training in Environmental Intelligence: Data Science and AI for Sustainable Futures, co-Director of the University of Exeter-Met Office Joint Centre for Excellence in Environmental Intelligence and an Alan Turing Fellow
  • Richard Sandford – Professor of Heritage Evidence, Foresight and Policy at the Institute of Sustainable Heritage at University College London
  • Joseph Walton – Research Fellow in Digital Humanities and Critical and Cultural Theory at the University of Sussex

Join us for this exciting discussion, which will address issues such as how digitisation can improve research efficiency, the pros and cons of AI and machine learning in relation to climate change, and the links between new technologies, climate and social justice.

You can see more details about our panel and book your place here.

11 November 2021

The British Library Adopts a New Persistent Identifier Policy

Since 29 September, to support and guide the management of its collection, the Library has adopted a new persistent identifier policy. A persistent identifier, or PID, is a long-lasting digital reference to an entity, whether it is physical or digital. PIDs are a core component in providing reliable, long-term access to collections and improving their discoverability. They also make it easier to track when and how collections are used. The Library has been using PIDs in various forms for almost a decade but, following the creation of a case study as part of the AHRC’s Towards a National Collection funded project, PIDs as IRO Infrastructure, the Library recognised the need to document its rationale and approach to PIDs and lay down principles and requirements for their use.

An image of the world at night from space, showing the bright lights of cities and towns
Photo by NASA on Unsplash

The Library encourages the use of PIDs across its collections and collection metadata. It recognises the role PIDs have as a component in sustainable, open infrastructure and in enabling interoperability and the use of Library resources. PIDs also support the Library’s content strategy and its goal of connecting rather than collecting as they enable long term and reliable access to resources.  

Many different types of PIDs are used across the Library, some of which it creates for itself, e.g. ARKs, and others which it harvests from elsewhere, e.g. DOIs that are used to identify journal articles. While not all existing Library services may meet the requirements described in this policy, it provides a benchmark against which they can be measured and towards which they can aspire to develop.

To make sure staff at the Library are supported in implementing the policy, a working group has been convened to run until the end of December 2022. This group will raise awareness of the policy and ensure that guidance is made available to any project or service which is under review to consider the use of PIDs.

A public version of the policy is available on this page and an extract with the key points is provided below. The group would like to acknowledge the Bibliothèque nationale de France’s policy, which was influential in the creation of this policy.

Principles

In its use of identifiers, the British Library adheres to the following principles, which describe the qualities PIDs created, contributed or consumed by the Library must have.  

  • A PID must never be deleted but may be marked as deprecated if required
  • A PID must be usable in perpetuity to identify its associated entity
  • A PID must only describe one entity and must never be reused for different entities 
  • A PID must have established versioning processes and procedures in place; these may be defined locally by the Library as a creator or by the PID provider  
  • A PID must have established governance mechanisms, such as contracts, in place to ensure the standards of use of the PID are met and continue to be met  
  • A PID must resolve to metadata about the entity available in both a human and machine readable format 
  • A publicly accessible PID must be resolvable via a global resolver
  • A PID must have an operating model that is sustainable for long-term persistent use 
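As a rough illustration of the first three principles above (never delete, usable in perpetuity, never reuse), here is a minimal Python sketch of an in-memory registry. The class names, fields and the example identifier are hypothetical and are not taken from the Library's policy or systems; a real service would need durable storage and governance around it.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """One identifier and the single entity it describes (hypothetical model)."""
    pid: str
    entity: str
    deprecated: bool = False
    notes: list = field(default_factory=list)


class PidRegistry:
    """Toy registry: it offers no delete operation, by design."""

    def __init__(self):
        self._records = {}

    def register(self, pid, entity):
        # A PID must describe only one entity and must never be reused.
        if pid in self._records:
            raise ValueError(f"{pid} is already assigned and cannot be reused")
        self._records[pid] = PidRecord(pid, entity)
        return self._records[pid]

    def deprecate(self, pid, reason):
        # Records are never removed, only flagged as deprecated.
        record = self._records[pid]
        record.deprecated = True
        record.notes.append(reason)
        return record

    def resolve(self, pid):
        # Deprecated PIDs still resolve, so existing references keep working.
        return self._records[pid]
```

For example, after `registry.register("ark:/99999/fk4example", "A digitised map")` and a later `registry.deprecate(...)` call with the same identifier, `registry.resolve(...)` still returns the record, and a second `register` call with that identifier raises an error. The identifier shown is a made-up placeholder.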

Established user community 

  • A PID must have an established user community, which has adopted it as a standard, either through an organisation such as the International Organization for Standardization (ISO) or as a de facto standard through widespread adoption; the Library will support and develop the use of new types of PIDs where there is a defined and recognised use case which they would address 

Interoperable 

  • A PID must be able to link with the other identifiers in use at the Library through open metadata standards and the capability to cross-reference resources 

New PID types or new use 

  • New types of PIDs should only be considered for use in the Library where there is a defined need which cannot reasonably be met by a combination of PIDs already in use 
  • Any new PID type used by the Library should meet the requirements described in this policy 
  • Where a PID type is emerging and does not have an established community, the Library can seek to influence its development in line with principles for open and sustainable infrastructures 

Requirements

These requirements outline the Library’s responsibilities in using PID services and creating PIDs. While the Library uses identifiers which do not meet all of these requirements, they are included for future work and developments.  

  • The Library aspires to assign PIDs to all resources within its collections, both physical and digital, and associated entities, in alignment with the guiding principles of the Library’s content strategy 2020-2023
  • The Library has varying levels of involvement in different PID schemes, but all PIDs created by the Library must meet the requirements described in this section and the Library prefers the use of PIDs which meet the principles
  • Identifiers created by the Library must have an opaque format, i.e. not contain any semantic information within them, to ensure their longevity 
  • A PID must resolve to information about the entity to which it refers 
  • The Library must have a process to specify the granularity at which PIDs are assigned and how relationships between PIDs for component and overarching entities are managed 
  • The Library must have a process to manage versioning including changes, merges and retirement of entities 
  • Standard descriptive information about an entity, e.g. creator, should have a PID 
  • All metadata associated with a PID should comply with Collection Metadata Licensing Guidelines 
  • Where a PID referring to a citable resource resolves to a webpage, that webpage should display a suggested citation including the hyperlink to the PID to encourage ongoing use of the PID outside the Library
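Two of the requirements above lend themselves to a short sketch: minting identifiers in an opaque format, and displaying a suggested citation that includes the hyperlink to the PID. The Python below is illustrative only. The alphabet, the identifier length, the citation layout and the `example.org` address are all assumptions and do not describe how the Library actually mints or cites its identifiers.

```python
import secrets

# Consonants and digits only, leaving out vowels and look-alike characters so
# that random strings cannot spell words (an assumption, not Library policy).
ALPHABET = "bcdfghjkmnpqrstvwxz23456789"


def mint_opaque_id(length=10):
    """Return a random string that carries no semantic information."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def suggested_citation(creator, title, year, pid_url):
    """Build a citation line that ends with the hyperlink to the PID."""
    return f"{creator} ({year}). {title}. {pid_url}"


if __name__ == "__main__":
    pid = mint_opaque_id()
    # example.org stands in for a resolver address; it is a placeholder.
    print(suggested_citation("A. Creator", "An Example Resource", 2021,
                             f"https://example.org/{pid}"))
```

Because the identifier is random rather than derived from a shelfmark, a date or a collection name, it does not become misleading if the resource is later moved or re-catalogued, which is the longevity argument the policy makes for opaque formats.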

If you would like to hear more about this policy and the Library’s approach to persistent identifiers, feel free to contact the Heritage PIDs project on Twitter or email openaccess@bl.uk.

This post is by Frances Madden (@maddenfc, orcid.org/0000-0002-5432-6116), Research Associate (PIDs as IRO Infrastructure) in the Research Infrastructure Services team.

10 November 2021

BL Labs Online Symposium 2021, Special Climate Change Edition: Book your place for webinar on Tuesday 7 December 2021

In response to the Climate Emergency and issues raised by COP26, the 9th British Library Labs Symposium is devoted to looking at computational research and climate change. Registration Now Open.

Futuristic, hologram looking version of the globe overlaid with images like wind turbines, water drops, trees and graphs.

The British Library Labs is the British Library programme dedicated to enabling people to experiment with our digital collections, including deploying computational research methods and using our collections as data. This inevitably means that we, and the communities we work with, are increasingly applying computational tools and methods that have environmental impact on our planet.

As our millions of pages of digitised content are becoming an exciting new research frontier, and we are increasingly using machine learning methods and tools on large-scale projects, such as the Living with Machines project, it is also inevitable that this exciting new work comes with increased use of computational resources and energy. In view of the climate emergency, we are hoping to ensure that climate and sustainability considerations inform everything we do – meaning that we need a much better understanding of digital environmental impacts and how this should inform our practice in all things related to computational research.

We know that this is not a simple issue - digitisation and digital preservation are often a lifeline for cultural heritage in communities where museums, libraries and archives are already endangered due to climate change - for example, the British Library’s Endangered Archives Programme is dedicated to digitising and saving archives in danger of destruction, including due to climate change. New digital resources, such as the UK Web Archive’s collections, the Climate Change collection in particular, as well as the International Internet Preservation Consortium’s Climate Change collection, are essential resources for climate researchers, especially as we are increasingly working with researchers who wish to text and data mine our collections for insights that can broaden our understanding of changing climate and biodiversity, and the impact of these changes on different communities.

Equally, as in all other areas related to the impacts of climate change, we are aware that digital research is strongly interdependent with issues of equality and social justice. Digital advancements are enablers of new research, helping us to better understand different communities and to broaden access and opportunities, but we also need to consider how the complexities of computational research and access, as well as the expensive set-up and energy requirements of state-of-the-art infrastructures, might disadvantage researchers and communities that do not have access to the relevant technologies, or to the prohibitively expensive and energy-demanding resources required to run them.

For this year’s BL Labs Symposium, we are bringing together a group of speakers who will consider these issues from different angles - from large-scale digitisation, to digital humanities, climate and biodiversity research, as well as the impact of AI. We will look into how our digital strategies and projects can help us fight climate change and be more inclusive, but also how we can improve our sustainability and reduce our impact on the planet.

As well as the views from our panel, there will be an opportunity for extended audience input, helping us to bring forward the views of the broader Labs community and learn together how our practice can be improved.

The 9th BL Labs Symposium takes place on Zoom on Tuesday 7th December from 16.30 until 18.00. Book your place now.

29 October 2021

Thought Bubble 2021 Wikithon Preparation

Comics fans, are you getting geared up for Thought Bubble? If you enjoy editing Wikipedia and Wikidata entries about comics, or want to learn how, please do join us and our collaborators at Leeds Libraries for our first in-person Wikithon since this residency started, on Thursday 11th November, from 1.30pm to 4.30pm, in the Sanderson Room of Leeds Central Library.

Drawing of a person reading a comic and drinking a mug of tea

Joining us in person?

Remember the first step is to book your place here, via Eventbrite

If you’d like to get a head start, you can download and read our handy guide to setting up your Wikipedia account. There is advice on creating your account, Wikipedia's username policy and how to create your user page.

Once you have done that, or if you already have a Wikipedia account, please join our Thought Bubble Wikithon dashboard (the enrollment passcode is ltspmyfa) and go through the introductory exercises, which cover:

  • Wikipedia Essentials
  • Editing Basics
  • Evaluating Articles and Sources
  • Contributing Images and Media Files
  • Sandboxes and Mainspace
  • Sources and Citations
  • Plagiarism
  • Introduction to Wikidata (for those interested in this)

These are all short exercises that will help familiarise you with Wikipedia and its processes. Don’t have time to do them? We get it, and that’s totally fine - we’ll cover the basics on the day too!

You may want to verify your Wikipedia account - this function exists to make sure that people are contributing responsibly to Wikipedia. The easiest and swiftest way to verify your account is to do 10 small edits. You could do this by correcting typos or adding in missing dates. However, another way to do this is to find articles where citations are needed, and add them via Citation Hunt. For further information on adding citations, watching this video may be useful.

When it comes to Wikidata, we are very inspired by the excellent work of the Graphic Possibilities project at the Michigan State University Department of English and we have been learning from them. For those interested in editing Wikidata, we will be on hand to support this during our Thought Bubble Wikithon event.

Happier with a hybrid approach?

If you cannot join the event in person, but would like to contribute, please do check out and sign up to our dashboard. Although we cannot run the training as a hybrid presentation on this occasion, the online dashboard training exercises will be an excellent starting point. From there, all of your edits and contributions will be registered, and you can pat yourself firmly on the back for making the world of comics a better place from a distance.

However, if you can attend in person, please register for the Wikithon at Leeds Central Library here and check out the Thought Bubble festival programme here. Hope to see you there!

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian) and Digital Curator Stella Wisdom (@miss_wisdom).

26 October 2021

On Digital Technologies, Our Cultural Heritage and Global Warming. How do they come together in Venice?

Global warming does not affect only the environment; it affects the entire system we live in. We can’t think of it as detached from gender, social and racial inequalities, nor as something separate from our cultural heritage. For this reason, when we think about actions we shouldn’t focus only on emissions reductions, but also think about how to preserve our cultural and artistic production and learn how this, with the aid of new technologies, can help us find new ways to shape our future.

Last year, in my spare time, with the help of Marco Magini (writer and environmental policy adviser), Paolo Nelli (writer) and Maddalena Vatti (producer), I started investigating what role digital technologies play in a city like Venice, which is notoriously under threat from rising waters, and even more so as global warming increases.

On the 13th of November 2019 an exceptional acqua alta (a high tide) hit the city, causing some of the worst devastation of the last century. Various archives, buildings, commercial activities, homes and cultural venues were damaged. This prompted a question: what can we understand from an event like this? Is the case of Venice an isolated one or is it a cautionary tale for humanity? After all, Venice is not the only city which is sinking and where rising tides threaten to unravel the urban fabric. We should not simply mourn the devastation and start to repair the damage; we should consider the event as an opportunity to think about the direct impact of global warming on our cultural heritage and what we can do to reduce it.

While conducting interviews with scholars, experts, professionals and citizens with the aim of producing a podcast, we slowly came to understand the role and potential of digital technologies in the study of the evolution of a city with respect to changing climate and urban conditions, as well as the role these play in its preservation.

Digital preservation, 3D rendering and water sensors

A fantastic example of digital preservation is the one carried out between the 6th and 17th of July 2020 by a team from the Factum Foundation for Digital Technology in Conservation in collaboration with the Cini Foundation, EPFL and Iconem (https://www.factumfoundation.org/pag/1640/recording-the-island-of-san-giorgio-maggiore). They spent twelve days in Venice recording the Island of San Giorgio Maggiore in its entirety. The result was a virtual rendering of the island made using a mix of long-range LIDAR scanning, to capture the overall shape of the buildings with external and internal views, and high resolution photogrammetry, to add the surface detail. The island was recorded from more than 600 different recording spots, from which a massive 60,000 million-point cloud was generated. The data acquired through photogrammetry is currently being merged with the point clouds with the aim of creating a 3D model of the whole island.

two images of the same statue side by side, the one on the right uses high resolution photogrammetry
First (left) and final (right) data processing of the render of one of the statues on the façade © Factum Foundation for ARCHiVe

This massive work enabled researchers not only to study the sculptures and the inscriptions that are high up on the façade of San Giorgio but also to analyse the way that the plaster covering the walls was being affected by salt and peeling off.

Thanks to these data it is now possible to record the breakdown of a surface in real detail and to monitor the speed of decay, that is, how quickly the cobalt coverings are being blown off by the salt. This creates the data needed to discuss how best to preserve the material heritage on the island.

Camera obscura, painting and digital image analysis: what can the past tell us about the present and the future

It is also possible to use paintings and buildings to look at the past in order to learn about our present. In fact, these artifacts can unconsciously record events and phenomena that postdate their own creation, carrying them into the future.

The researcher in atmospheric physics and cultural heritage Dario Camuffo has conducted a scientific analysis of the works of Venetian painters, Canaletto in particular, depicting buildings and compared them with the state of the very same buildings today in an attempt to calculate the impact of land subsidence in Venice.

Painting of The Grand Canal in Venice
Canaletto (Venice 1697-Venice 1768) - The Grand Canal looking East from the Carità towards the Bacino

As Professor Camuffo has written, “in general paintings provide a qualitative image, but in Venice’s case, a quantitative evaluation of the apparent sea level rise is possible, thanks to accurate paintings by Canaletto and Bellotto, drawn with the aid of the camera obscura. The paintings accurately reproduce all of the details with a high degree of precision, including the algae belt. […] By analysing these paintings, and comparing them with the algae level we see today, we can extend our knowledge of Venice’s submersion, reaching back in time almost as far back as three centuries.”
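To make the idea of a "quantitative evaluation" concrete, here is a minimal Python sketch of the arithmetic such a comparison might involve: a vertical shift of the algae belt measured on an image is converted to centimetres using a feature of known real-world size, then divided by the years elapsed. This is not Professor Camuffo's actual method, and the function names and every number in the usage example are hypothetical placeholders rather than measurements.

```python
def apparent_rise_cm(shift_px, reference_px, reference_cm):
    """Convert a vertical shift measured on an image into centimetres.

    reference_px and reference_cm describe one feature (for example a
    doorway) whose size is known both on the image and in reality.
    """
    cm_per_px = reference_cm / reference_px
    return shift_px * cm_per_px


def mean_rate_mm_per_year(rise_cm, year_painted, year_observed):
    """Average apparent rise per year between the painting and today."""
    return rise_cm * 10 / (year_observed - year_painted)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the calculation.
    rise = apparent_rise_cm(shift_px=120, reference_px=400, reference_cm=200)
    rate = mean_rate_mm_per_year(rise, year_painted=1730, year_observed=2020)
    print(f"apparent rise: {rise:.1f} cm, mean rate: {rate:.2f} mm/year")
```

The result is an apparent rise relative to the building, so it combines changes in sea level with land subsidence, which is why the quotation speaks of "apparent sea level rise".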

How many stories, and how much information, are buried in the archives? Deep learning image analysis can help to reveal them; we just need to think creatively.

Maps and algorithms, space syntax, literature and architecture

Maps and literature can also reveal more stories about a city than we think.

UCL/Bartlett Institute Professor Sophia Psarra, drawing inspiration from Italo Calvino’s Invisible Cities and Le Corbusier’s discarded project for the Venice Hospital, has studied the urban evolution of Venice by computing the distribution of, and distances between, bridges, calli (tiny alleys), squares and wells over time. The analysis, which is based on the approaches developed within the world of space syntax, has shown that Venice has evolved, and still evolves, as a system that resembles a highly probabilistic ‘algorithm’.

What seems a chaotic evolution is in fact the result of the interaction between space and social activity. Maps and data analysis can reveal the modularity of a city and the traces of how social activities have interacted and forged the space. These can help see new connections between literary imagination and the evolution of our society but also help us understand how we can imagine a future which is affected by growing uncertainties.

Digital technologies applied to our cultural heritage, as these three examples have shown, are an aid to studying the past and imagining the future. They can help us understand how we as a society can evolve, but also how all our cultural productions are sources of incredible information if we know how to look at them. We can measure the impact of global warming on our cultural artifacts and try to imagine a better future.

To find out more about the role of Venice as a vantage point from which to look at the growing emergencies surrounding us (environmental, cultural, social and technological), you can listen to the podcast The Fifth Siren (thefifthsiren.com) and join us for a free British Library online event on Monday 8th November with Professor Sophia Psarra and architectural artists Ila Bêka and Louise Lemoine. More info here: https://www.bl.uk/events/venice-tales-of-a-sinking-city.

This post is by Dr Giorgia Tolfo (@giorgiatolfo), Data and Content Manager for the Living with Machines project.

22 October 2021

Thought Bubble 2021 Wikithon

We are so excited to be working with Thought Bubble and our friends at Leeds Libraries to run our first in-person Wikithon since this residency started. Thought Bubble is an amazing comics festival spread across Yorkshire, culminating in a two-day convention in Harrogate, where the British Library will be running a stall and curating a panel discussion. More details about these can be found here.

Thought Bubble Comic Convention Banner

The Thought Bubble website sums it up best when it says: ‘[w]e use our festival week to promote the power of comics! We believe they can inspire, educate and bring people together like no other medium [...]’. We at the Library quite agree.

On Thursday November 11, from 1.30pm to 4.30pm, we’ll be taking up residence in the Sanderson Room of Leeds Central Library to demonstrate how to update, create and improve Wikipedia articles, and we'll even dabble in a bit of basic Wikidata editing for those who are interested. The Comics Wikithon event is free, but please book here.

Photograph of Leeds Central Library on a clear sunny day
Leeds Central Library by Lad 2011, CC BY-SA 4.0 via Wikimedia Commons

We’ll be focusing on underrepresented and marginalised voices in graphic novels and comics. We’re particularly interested in exploring the way Black, Asian and minority ethnic, disabled and LGBTQ+ creators and characters are represented, and want to amplify representation at all levels!

As with all our Wikithons, no previous experience of editing Wikipedia is required. If you can write an email, you can edit Wikipedia! Whether it’s Widdershins, The Walking Dead or Wolverine that you like best, come along and learn some new skills and expand your comic horizons.

For those of you keen to get started, we’ll be following up next week with a blog post on how to get set up for the event. In the meantime you can freely register for the Comics Wikithon event here. 

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian).

29 September 2021

Sailing Away To A Distant Land - Mahendra Mahey, Manager of BL Labs - final post

Posted by Mahendra Mahey, former Manager of British Library Labs or "BL Labs" for short

[estimated reading time of around 15 minutes]

This is my last day working as manager of BL Labs, and also my final posting on the Digital Scholarship blog. I thought I would take this chance to reflect on my journey of almost 9 years in helping to set up and maintain BL Labs, and enabling it to become a permanent fixture at the British Library (BL).

BL Labs was the first digital Lab in a national library, anywhere in the world, that gets people to experiment with its cultural heritage digital collections and data. There are now several Gallery, Library, Archive and Museum Labs or 'GLAM Labs' for short around the world, with an active community which I helped build, from 2018.

I am really proud I was there from the beginning to implement the original proposal, which was written by several colleagues, but especially Adam Farquhar, former head of Digital Scholarship at the British Library (BL). The project was at first generously funded by the Andrew W. Mellon Foundation through four rounds of funding, as well as support from the BL. In April 2021, the project became a permanently funded fixture, helped very much by my new manager Maja Maricevic, Head of Higher Education and Science.

The great news is that BL Labs is going to stay after I have left. The position of leading the Lab will soon be advertised. Hopefully, someone will get a chance to work with my helpful and supportive colleague Technical Lead of Labs, Dr Filipe Bento, bright, talented and very hard working Maja and other great colleagues in Digital Research and wider at the BL.

The beginnings, the BL and me!

I met Adam Farquhar and Aly Conteh (Former Head of Digital Research at the BL) in December 2012. They must have liked something about me because I started working on the project in January 2013, though I officially started in March 2013 to launch BL Labs.

I must admit, I had always felt a bit intimidated by the BL. My first visit, as a Psychology student, was in the early 1980s, before the St Pancras site was opened (in 1997). I remember coming up from Wolverhampton on the train to get a research paper about "Serotonin Pathways in Rats when sleeping" by Lidov, feeling nervous and excited at the same time. It felt like a place for 'really intelligent educated people' and for those who were one of the intellectual elites in society. It also felt to me a bit like it represented the British empire and its troubled history of colonialism, especially some of the collections, which made me feel uncomfortable as to why they were there in the first place.

I remember thinking that the BL probably wasn't a place for someone like me, a child of Indian Punjabi immigrants from humble beginnings who came to England in the 1960s. Actually, I felt like an imposter and not worthy of being there.

Nearly 9 years later, I can say I learned to respect and even cherish what was inside it, especially the incredible collections, though I also became more confident about expressing stronger views about the decolonisation of some of these. I became very fond of some of the people who work at or use it; there are some really good kind-hearted souls at the BL. However, I never completely lost that 'imposter and being an outsider' feeling.

What I remember at that time, going for my interview, was thinking about what would happen if I got the position, and asking myself 'What would be the one thing I would try and change?'. It came easily to me, namely that I would try and get more new people through the doors, literally or virtually, by connecting them to the BL's collections (especially the digital). New people like me, who may never have set foot in the building, or been motivated to step into it, before. This has been one of the most important reasons for me to get up in the morning and go to work at BL Labs.

So what have been my highlights? Let's have a very quick pass through!

BL Labs Launch and Advisory Board

I launched BL Labs in March 2013, one week after I had started. It was at the launch event organised by my wonderfully supportive and innovative colleague, Digital Curator Stella Wisdom. I distinctly remember in the afternoon session (which I did alone), I had to present my 'ideas' of how I might launch the first BL Labs competition where we would be trying to get pioneering researchers to work with the BL's digital collections.

God, it was a tough crowd! They asked pretty difficult questions, questions I myself was asking too and to which I still didn't know the answers.

I remember Professors Tim Hitchcock (now at Sussex University and who eventually sat (and is still sitting) on the BL Labs Advisory Board) and Laurel Brake (now Professor Emerita of Literature and Print Culture, Birkbeck, University of London) being in the audience together with staff from the Royal Library of the Netherlands, who 6 months later launched their own brilliant KB Lab. Subsequently, I became good colleagues with Lotte Wilms, who led their Lab for many years and is now Head of Research Support at Tilburg University.

My first gut feeling overall after the event was, this is going to be hard work. This feeling and reality remained a constant throughout my time at BL Labs.

In early May 2013, we launched the competition, which was a really quick and stressful turnaround as I had only officially started in mid March (one and a half months earlier). I remember worrying as to whether anyone would even enter! All the final entries were pretty much submitted a few minutes before the deadline. I remember being alone that evening on deadline day, near to midnight, waiting by my laptop, thinking: what happens if no one enters? It's going to be a disaster and I will lose my job. Luckily that didn't happen; in the end, we received 26 entries.

I am a firm believer that we can help make our own luck, but sometimes luck can be quite random! Perhaps BL Labs had a bit of both!

After that, I never really looked back! BL Labs developed its own kind of pattern and momentum each year:

  • hunting around the BL for digital collections to make into datasets and make available
  • helping to make more digital collections openly licensed
  • having hundreds of conversations with people interested in connecting with the BL's digital collections in the BL and outside
  • working with some people more intensively to carry out experiments
  • developing ideas further into prototype projects
  • telling the world of successes and failures in person, meetings, events and social media
  • launching a competition and awards in April or May
  • roadshows before and after with invitations to speak at events around the world
  • the summer working with competition winners
  • late October/November the international symposium showcased things from the year
  • working on special projects
  • repeat!

The winners were announced in July 2013, and then we worked with them on their entries showcasing them at our annual BL Labs Symposium in November, around 4 months later.

'Nothing interesting happens in the office' - Roadshows, Presentations, Workshops and Symposia!

One of the highlights of BL Labs was to go out to universities and other places to explain what the BL is and what BL Labs does.  This ended up with me pretty much seeing the world (North America, Europe, Asia, Australia, and giving virtual talks in South America and Africa).

My greatest challenge in BL Labs was always to get people to truly and passionately 'connect' with the BL's digital collections and data in order to come up with cool ideas of what to actually do with them. What I learned from my very first trip was that telling people what you have is great, they definitely need to know what you have! However, once you do that, the hard work really begins as you often need to guide and inspire many of them, help and support them to use the collections creatively and meaningfully. It was also important to understand the back story of the digital collection and learn about the institutional culture of the BL if people also wanted to work with BL colleagues.  For me and the researchers involved, inspirational engagement with digital collections required a lot of intellectual effort and emotional intelligence. Often this means asking the uncomfortable questions about research such as 'Why are we doing this?', 'What is the benefit to society in doing this?', 'Who cares?', 'How can computation help?' and 'Why is it necessary to even use computation?'.

Making those connections between people and data does feel like magic when it really works. It's incredibly exciting, suddenly everyone has goose bumps and is energised. This feeling, I will take away with me, it's the essence of my work at BL Labs!

A full list of over 200 presentations, roadshows, events and 9 annual symposia can be found here.

Competitions, Awards and Projects

Another significant way BL Labs has tried to connect people with data has been through Competitions (tell us what you would like to do, and we will choose an idea and work collaboratively with you on it to make it a reality), Awards (show us what you have already done) and Projects (collaborative working).

At the last count, we have supported and / or highlighted over 450 projects in research, artistic, entrepreneurial, educational, community based, activist and public categories, most of them through competitions, awards and project collaborations.

We also set up awards for British Library Staff which has been a wonderful way to highlight the fantastic work our staff do with digital collections and give them the recognition they deserve. I have noticed over the years that the number of staff who have been working on digital projects has increased significantly. Sometimes this was with the help of BL Labs but often because of the significant Digital Scholarship Training Programme, run by my Digital Curator colleagues in Digital Research for staff to understand that the BL isn't just about physical things but digital items too.

Browse through our project archive to get inspiration from the various projects BL Labs has been involved in or highlighted.

Putting the digital collections 'where the light is' - British Library platforms and others

When I started at BL Labs it was clear that we needed to make a fundamental decision about how we saw digital collections. Quite early on, we decided we should treat collections as data to harness the power of computational tools to work with each collection, especially for research purposes. Each collection should have a unique Digital Object Identifier (DOI) so researchers can cite them in publications.  Any new datasets generated from them will also have DOIs, allowing us to understand the ecosystem through DOIs of what happens to data when you get it out there for people to use.

In 2014, https://data.bl.uk was born and today, all our 153 datasets (as of 29/09/2021) are available through the British Library's research repository.

However, BL Labs has not stopped there! We always believed that it's important to put our digital collections where others are likely to discover them (we can't assume that researchers will want to come to BL platforms), 'where the light is' so to speak.  We were very open and able to put them on other platforms such as Flickr and Wikimedia Commons, not forgetting that we still needed to do the hard work to connect data to people after they have discovered them, if they needed that support.

Our greatest success by far was placing 1 million largely undescribed images that were digitally snipped from 65,000 digitised public domain books from the 19th Century on Flickr Commons in 2013. The number of images on the platform has grown since then by another 50 to 60 thousand from collections elsewhere in the BL. There has been significant interaction from the public to generate crowdsourced tags that make it easier to find specific images. The number of views has reached a staggering 2 billion plus over this time. There has also been an incredible array of projects which have used the images, from artistic use to using machine learning and artificial intelligence to identify them. It's my favourite collection, probably because there are no restrictions in using it.

Read the most popular blog post the BL has ever published by my former BL Labs colleague, the brilliant and inspirational Ben O'Steen, a million first steps and the 'Mechanical Curator' which describes how we told the world why and how we had put 1 million images online for anyone to use freely.

It is wonderful to know that George Oates, the founder of Flickr Commons and still a BL Labs Advisory Board member, has been involved in the creation of the Flickr Foundation which was announced a few days ago! Long live Flickr Commons! We loved it because it also offered a computational way to access the collections, critical for powerful and efficient computational experiments, through its Application Programming Interface (API).

More recently, we have experimented with browser based programming / computational environments - Jupyter Notebooks. We are huge fans of Tim Sherratt who was a pioneer and brilliant advocate of OPEN GLAM in using them, especially through his GLAM Workbench. He is a one person Lab in his own right, and it was an honour to recognise his monumental efforts by giving him the BL Labs Research Award 2020 last year. You can also explore the fantastic work of Gustavo Candela and colleagues on Jupyter Notebooks and the ones my colleague Filipe Bento created.

Art Exhibitions, Creativity and Education

I am extremely proud to have been involved in enabling two major art exhibitions to happen at the BL, namely:

Crossroads of Curiosity by David Normal

Imaginary Cities by Michael Takeo Magruder

I loved working with artists, it's my passion! They are so creative and often not restricted by academic thinking, see the work of Mario Klingemann for example! You can browse through our archives for various artistic projects that used the BL's digital collections, it's inspiring.

I was also involved in the first British Library Fashion Student Competition won by Alanna Hilton, held at the BL which used the BL's Flickr Commons collection as inspiration for the students to design new fashion ranges. It was organised by my colleague Maja Maricevic, the British Fashion Colleges Council and Teatum Jones who were great fun to work with. I am really pleased to say that Maja has gone on from strength to strength working with the fashion industry and continues to run the competition to this day.

We also had some interesting projects working with younger people, such as Vittoria's world of stories and the fantastic work of Terhi Nurmikko-Fuller at the Australian National University. This is something I am very much interested in exploring further in the future, especially around ideas of computational thinking and have been trying out a few things.

GLAM Labs community and Booksprint

I am really proud of helping to create the international GLAM Labs community with over 250 members, established in 2018 and still active today. I affectionately call them the GLAM Labbers, and I often ask people to explore their inner 'Labber' when I give presentations. What is a Labber? It's the experimental and playful part of us we all had as children and unfortunately many have lost when becoming an adult. It's the ability to be fearless, having the audacity and perhaps even naivety to try crazy things even if they are likely to fail! Unfortunately society values success more than it does failure. In my opinion, we need to recognise, respect and revere those that have the courage to try but failed. That courage to experiment should be honoured and embraced and should become the bedrock of our educational systems from the very outset.

Two years ago, many of us Labbers 'ate our own dog food' or 'practised what we preached' when 15 other colleagues and I came together for 5 days to produce a book through a booksprint, probably the most rewarding professional experience of my life. The book is about how to set up, maintain, sustain and even close a GLAM Lab and is called 'Open a GLAM Lab'. It is available as public domain content and I encourage you to read it.

Online drop-in goodbye - today!

I organised a 30 minute ‘online farewell drop-in’ on Wednesday 29 September 2021, 1330 BST (London), 1430 (Paris, Amsterdam), 2200 (Adelaide), 0830 (New York) on my very last day at the British Library. It was heart-warming that the session was 'maxed out' at one point with participants from all over the world. I honestly didn't expect over 100 colleagues to show up. I guess when you leave an organisation you get to find out who you actually made an impact on, who shows up, and who tells you, otherwise you may never know.

Those that know me well know that I would have much rather had a farewell do ‘in person’, over a pint and praying for the ‘chip god’ to deliver a huge portion of chips with salt/vinegar and tomato sauce magically and mysteriously to the table. The pub would have been Mc'Glynns (http://www.mcglynnsfreehouse.com/) near the British Library in London. I wonder who the chip god was?  I never found out ;)

The answer to who the chip god was is in the text following this sentence, in white-on-white text... you will be very shocked to know who it was!

Spoiler alert it was me after all, my alter ego

Mahendra's online farewell to BL Labs, Wednesday 29 September, 1330 BST, 2021.
Left: Flowers and wine from the GLAM Labbers arrived in Tallinn, 20 mins before the meeting!
Right: Some of the participants of the online farewell

Leave a message of good will to see me off on my voyage!

It would be wonderful if you would like to leave me your good wishes, comments, memories, thoughts, scans of handwritten messages, pictures, photographs etc. on the following Google doc:

http://tiny.cc/mahendramahey

I will leave it open for a week or so after I have left. Reading positive, sincere, heartfelt messages from colleagues and collaborators over the years has already lifted my spirits. For me it provides evidence that you perhaps did actually make a difference to someone's life.  I will definitely be re-reading them during the cold dark Baltic nights in Tallinn.

I would love to hear from you and find out what you are doing, or if you prefer, you can email me, the details are at the end of this post.

BL Labs Sailor and Captain Signing Off!

It's been a blast and lots of fun! Of course there is a tinge of sadness in leaving! For me, it's also been intellectually and emotionally challenging as well as exhausting, with many ‘highs’ and a few ‘lows’ or choppy waters, some professional and others personal.

I have learned so much about myself and there are so many things I am really really proud of. There are other things of course I wish I had done better. Most of all, I learned to embrace failure, my best teacher!

I think I did meet my original wish of wanting to help to open up the BL to as many new people who perhaps would have never engaged in the Library before. That was either by using digital collections and data for cool projects and/or simply walking through the doors of the BL in London or Boston Spa and having a look around and being inspired to do something because of it.

I wish the person who takes over my position lots of success! My only piece of advice is if you care, you will be fine!

Anyhow, what a time this has been for us all on this planet! I have definitely struggled at times. I, like many others, have lost loved ones and thought deeply about life and its true meaning. I have also managed to find the courage to know what’s important and act accordingly, even if that has been a bit terrifying and difficult at times. Leaving the BL for example was not an easy decision for me, and I wish perhaps things had turned out differently, but I know I am doing the right thing for me, my future and my loved ones.

Though there have been a few dark times for me both professionally and personally, I hope you will be happy to know that I have also found peace and happiness too. I am in a really good place.

I would like to thank the alumni of BL Labs: Ben O'Steen, Technical Lead for BL Labs from 2013 to 2018, and Hana Lewis (2016 - 2018) and Eleanor Cooper (2018-2019), both BL Labs Project Officers, as well as the many other people I worked with through BL Labs, in the wider Library and outside it on my journey.

Where I am off to and what am I doing?

My professional plans are 'evolving', but one thing is certain, I will be moving country!

To Estonia to be precise!

I plan to live, settle down with my family and work there. I was never a fan of Brexit, and this way I get to stay a European.

I would like to finish with this final sweet video created by writer and filmmaker Ling Low and her team in 2016, entitled 'Hey there Young Sailor' which they all made as volunteers for the Malaysian band, the 'Impatient Sisters'. It won the BL Labs Artistic Award in 2016. I had the pleasure and honour of meeting Ling over a lovely lunch in Kuala Lumpur, Malaysia, where I had also given a talk at the National Library about my work and looked for remnants of my grandfather who had settled there many years ago.

I wish all of you well, and if you are interested in keeping in touch with me, working with me or just saying hello, you can contact me via my personal email address: mr.mahendra.mahey@gmail.com or follow my progress on my personal website.

Happy journeys through this short life to all of you!

Mahendra Mahey, former BL Labs Manager / Captain / Sailor signing off!

23 September 2021

Computing for Cultural Heritage: Trial Outcomes and Final Report

Six months ago, twenty members of staff from the British Library and The National Archives UK completed Computing for Cultural Heritage, a project that trialled Birkbeck University and Institute of Coding’s new PGCert, Applied Data Science. In this blog post we explore the necessity of this new course, the final report of this trial, and the lasting impact that this PGCert has made on some of the participants. 

 

 

Background 

Information professionals have been experiencing a massive shift to digital in the way collections are being donated, held and accessed. In the British Library’s digital collections there are e-books, maps, digitised newspapers, journal titles, sound recordings and over 500 terabytes of preserved data from the UK Web Archive. Yearly, the library sees 6 million catalogue searches by web users with almost 4 million items consulted online. This amounts to a vast amount of potential cultural heritage data available to researchers, and it requires complex digital workflows to curate, collect, manage, provide access, and help researchers computationally make sense of it all. 

Staff at collecting institutions like the British Library and the National Archives, UK are engaging in computationally driven projects like never before, but often without the benefit of data skills and computational thinking to support them. That is where a program like Computing for Cultural Heritage can help information professionals, allowing them to upskill and tackle issues – like building new digital systems and services, supporting collaborative, computational and data-driven research using digital collections and data, or deploying simple scripts to make everyday tasks easier – with confidence.  

Image of a laptop with the screen showing a bookshelf

 

Learning Aims 

The trial course was broken into two modules, a taught lesson on ‘Demystifying Computing with Python’ and a written ‘Industry Project’ on a software solution to a work-based problem.  A third module, Analytic Tools for Information Professionals, would be offered to participants outside of the trial as part of the full live course in order to earn their PGCert.

By the end of the trial, participants were able to: 

  • Demonstrate satisfactory knowledge of programming with Python. 
  • Understand techniques for Python data structures and algorithms. 
  • Work on case studies to apply data analytics using Python. 
  • Understand the programming paradigm of object-oriented programming. 
  • Use Python to apply the techniques learned on the module to real-world problems. 
  • Demonstrate the ability to develop an algorithm to carry out a specified task and to convert this into an executable program. 
  • Demonstrate the ability to debug a program. 
  • Understand the concepts of data security and general data protection regulations and standards. 
  • Develop a systematic understanding and critical awareness of a commonly agreed problem between the work environment and the academic supervisor in the area of computing. 
  • Develop a software solution for a work-based problem using the skills developed from the taught modules, for example develop software using the programming languages and software tools/libraries taught. 
  • Present a critical discussion on existing approaches in the particular problem area and position their own approach within that area and evaluate their contribution.
  • Gain experience in communicating complex ideas/concepts and approaches/techniques to others by writing a comprehensive, self-contained report.

The learning objectives were designed and delivered with the cultural heritage context in mind, and as such incorporated, for instance, examples and datasets from the British Library Music collections in the Python programming elements of the taught module. Additionally, there was a lecture focused on a British Library use case involving the design and implementation of a Database Management System.

Following the completion of the trial, participants had the opportunity to complete their PGCert in Applied Data Science by attending the final module, Analytic Tools for Information Professionals, which was part of the official course launched last autumn. 

 

The Lasting Impact of Computing for Cultural Heritage 

Now that we’re six months on from the end of the trial, and the participants who opted in have earned their full PGCert, we followed up with some of the learners to hear about their experiences and the lasting effects of the course: 

“The third and final module of the computing for cultural heritage course was not only fascinating and enjoyable, it was also really pertinent to my job and I was immediately able to put the skills I learned into practice.  

The majority of the third module focussed on machine learning. We studied a number of different methods and one of these proved invaluable to the Agents of Enslavement research project I am currently leading. This project included a crowdsourcing task which asked the public to draw rectangles around four different types of newspaper advertisement. The purpose of the task was to use the coordinates of these rectangles to crop the images and create a dataset of adverts that can then be analysed for research purposes. To help ensure that no adverts were missed and to account for individual errors, each image was classified by five different people.  

One of my biggest technical challenges was to find a way of aggregating the rectangles drawn by five different people on a single page in order to calculate the rectangles of best fit. If each person only drew one rectangle, it was relatively easy for me to aggregate the results using the coding skills I had developed in the first two modules. I could simply find the average (or mean) of the five different classification attempts. But what if people identified several adverts and therefore drew multiple rectangles on a single page? For example, what if person one drew a rectangle around only one advert in the top left corner of the page; people two and three drew two rectangles on the same page, one in the top left and one in the top right; and people four and five drew rectangles around four adverts on the same page (one in each corner). How would I be able to create a piece of code that knew how to aggregate the coordinates of all the rectangles drawn in the top left and to separately aggregate the coordinates of all the rectangles drawn in the bottom right, and so on?  

One solution to this problem was to use an unsupervised machine learning method to cluster the coordinates before running the aggregation method. Much to my amazement, this worked perfectly and enabled me to successfully process the total of 92,218 rectangles that were drawn and create an aggregated dataset of more than 25,000 unique newspaper adverts.” 

-Graham Jevon, EAP Cataloguer; BL Endangered Archives Programme 
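To make the idea in Graham's account a little more concrete, here is a minimal sketch of "cluster first, then aggregate". It is not the project's actual code: the clustering shown is a simple single-linkage grouping of rectangle centres, and the `(x, y, width, height)` layout, the function names and the `max_distance` threshold are all assumptions made purely for illustration.

```python
from statistics import mean

# Assumption: each rectangle is (x, y, width, height) in pixel units.


def centre(rect):
    """Return the centre point of a rectangle."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def cluster_rectangles(rects, max_distance):
    """Group rectangles whose centres lie within max_distance of each other.

    This is single-linkage clustering, implemented with a small union-find.
    """
    parent = list(range(len(rects)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    centres = [centre(r) for r in rects]
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            dx = centres[i][0] - centres[j][0]
            dy = centres[i][1] - centres[j][1]
            if (dx * dx + dy * dy) ** 0.5 <= max_distance:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, rect in enumerate(rects):
        groups.setdefault(find(i), []).append(rect)
    return list(groups.values())


def aggregate(cluster):
    """Average each coordinate across the rectangles in one cluster."""
    return tuple(mean(values) for values in zip(*cluster))


def best_fit_rectangles(rects, max_distance=50):
    """Cluster the drawn rectangles, then return one averaged rectangle per cluster."""
    return [aggregate(c) for c in cluster_rectangles(rects, max_distance)]


# Hypothetical data: three people mark a top-left advert, two mark a top-right one.
drawn = [
    (10, 10, 100, 50), (12, 8, 98, 52), (8, 12, 102, 48),
    (400, 10, 100, 50), (404, 14, 96, 46),
]
for rect in best_fit_rectangles(drawn):
    print(rect)
```

The threshold has to be chosen by hand here, which is the main weakness of such a simple approach. Adverts that sit very close together on a page would probably need a more careful clustering method.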

 

“The final module of the course was in some ways the most challenging — requiring a lot of us to dust off the statistics and algebra parts of our brain. However, I think, it was also the most powerful; revealing how machine learning approaches can help us to uncover hidden knowledge and patterns in a huge variety of different areas.  

Completing the course during COVID meant that collection access was limited, so I ended up completing a case study examining how generic tropes have evolved in science fiction across time using a dataset extracted from GoodReads. This work proved to be exceptionally useful in helping me to think about how computers understand language differently; and how we can leverage their ability to make statistical inferences in order to support our own, qualitative analyses. 

In my own collection area, working with born digital archives in Contemporary Archives and Manuscripts, we treat draft material — of novels, poems or anything else — as very important to understanding the creative process. I am excited to apply some of these techniques — particularly Unsupervised Machine Learning — to examine the hidden relationships between draft material in some of our creative archives. 

The course has provided many, many avenues of potential enquiry like this and I’m excited to see the projects that its graduates undertake across the Library.” 

-Callum McKean, Lead Curator, Digital; Contemporary British Collection

 

"I really enjoyed the Analytics Tools for Data Science module. As a data science novice, I came to the course with limited theoretical knowledge of how data science tools could be applied to answer research questions. The choice of using real-life data to solve queries specific to professionals in the cultural heritage sector was really appreciated as it made everyday applications of the tools and code more tangible. I can see now how curators’ expertise and specialised knowledge could be combined with tools for data analysis to further understanding of and meaningful research in their own collection area."

-Giulia Carla Rossi, Curator, Digital Publications; Contemporary British Collection

 

Final Report 

The Computing for Cultural Heritage project concluded in February 2021 with a virtual panel session that highlighted the learners’ projects and allowed discussion of the course and feedback to the key project coordinators and contributors. Case studies of the participants’ projects, as well as links to other blog posts and project pages can be found on our Computing for Cultural Heritage Student Projects page. 

The final report highlights these projects as well as demographic statistics on the participants and feedback that was gained through an anonymous survey at the end of the trial. In order to evaluate the experience of the students on the PGCert we composed a list of questions that would provide insight into various aspects of the course, including how the learners fitted the coursework around their work commitments and how well they met the learning objectives.

 

Why Computing for Cultural Heritage? 

Bar graph showing the results of the question 'Why did you choose to do this course' with the results discussed in the text below
Figure 1: Why did you choose to do this course? Results breakdown by topic and gender

When asked why the participants chose to take part in the course, we found that one of the most common answers was to develop methods for automating repetitive, manual tasks – such as generating unique identifiers for digital records and copying data between Excel spreadsheets – to free up more curatorial time for their digital collections. One participant said:  

“I wanted to learn more about coding and how to use it to analyse data, particularly data that I knew was rich and had value but had been stuck in multiple spreadsheets for quite some time.” 
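As a rough illustration of the kind of task participants had in mind, the sketch below copies records from one spreadsheet export to another and gives each record a unique identifier. It is only an example under stated assumptions: it works on CSV exports rather than Excel files directly, uses random UUIDs as identifiers, and the function name and the `record_id` column are hypothetical.

```python
import csv
import io
import uuid


def add_identifiers(source, target, id_column="record_id"):
    """Copy rows from one CSV stream to another, adding a unique ID to each row.

    Returns the number of rows copied.
    """
    reader = csv.DictReader(source)
    fieldnames = [id_column] + list(reader.fieldnames or [])
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        row[id_column] = str(uuid.uuid4())  # random, practically unique identifier
        writer.writerow(row)
        count += 1
    return count


# Hypothetical usage with in-memory data instead of real files.
source = io.StringIO("title,year\nA Voyage of Discovery,1819\nSecond Voyage,1835\n")
target = io.StringIO()
copied = add_identifiers(source, target)
print(copied, "records copied")
```

With real files, the same function could be called with two file objects opened with `newline=""`, as the `csv` module documentation recommends.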

There was also a desire to learn new skills, either for personal or professional development: 

“I believe in continuous professional development and knew that this would be an invaluable course to undertake for my career.”  

“I felt I was lagging behind and my job was getting static, and the feeling that I was behind [in digital] and I wanted to kind of catch up.” 

Bar graph showing the results to the question 'Did the course help you meet your aims?' with 14 answering yes, 1 answering no and 1 answering 'mixed'
Figure 2: 'Did the course help you meet your aims?' Results broken down by answer and gender.

A follow up question asked whether these goals and aims were met by the course. Happily, most participants indicated that they had been met, for reasons of increased confidence, help in developing new computational skills, and a deeper knowledge of information technology.

 

What was the most enjoyed aspect of the course? 

Bar graph showing the results of the question 'What did you enjoy most about the course' with the results discussed in the text below
Figure 3: 'What did you enjoy most about the course?' Results breakdown by topic and gender

When broken down, the responses to ‘What did you enjoy most’ largely reflect the student experience, whether it was being in taught modules (4), getting hands on experience (4), or being in a learning environment again (6). Participants also indicated that networking with peers was an enjoyable part of the experience: 

“Day out of work with like minded people made it really easy to stick with rather than just doing it online.”  

“Spending a day away from work and meeting the people I had never met at the NA, and also speaking to people from the BL about what they did.”  

“I enjoyed being a student again, learning a new skill amongst my peers, which week after week is a really valuable experience…” 

“Learning with colleagues and people working in similar fields was also a plus, as our interests often overlapped...” 

While only two responses were made where the project module was considered as one of the most enjoyable components, it was useful to see how the course really afforded the opportunity to apply their learning to solving a work-based problem that provides some benefit to their role, department or digital collection: 

“I really enjoyed being able to apply my learning to a real-world work-based project and to finally analyze some of the data that has been lying around the department for over a decade without any further analysis.”  

“The design and create aspect of the project. Applying what I learned to solving a genuine problem was the most enjoyable part - using Python and solving problems to achieve something tangible. This is where I really consolidated my learning.” 

 

What was the most challenging aspect of the course? 

Bar graph showing the results of the question 'What did you find the most challenging and why?' with the results discussed in the text below
Figure 4: 'What did you find the most challenging and why?' Results breakdown by topic and gender.

When discussing the most challenging aspect of the course, most of the learners focused on the practical Python lab sessions and the work-based project module. Interestingly, participants also stated that they were able to overcome the challenges through personal perseverance and the learning provided by the course itself: 

“I found the initial hurdle of learning how [to] code very challenging, but after the basics it became possible to become more creative and experimental.”  

“The work-based project was a huge challenge. We'd only really done 5 weeks of classes and, having never done anything like this before, it was hard to envisage an end product let alone how to put it together. But got there in the end!” 

While the majority of the cohort found the practical components of the PGCert trial most challenging, the feedback also suggested that the inclusion of the second module – which will be available as part of the full programme – will provide more opportunity to practice the practical programming skills like software tools and APIs. 

 

The Effectiveness of Computing for Cultural Heritage

Bar graph showing the results of the question 'Have you applied anything you have learnt?' with 2 results for 'Data analysis concepts', 12 results for 'Python coding' and 2 results for 'Nothing'
Figure 5: 'Have you applied anything you have learnt?' Results breakdown by topic and gender.

Participants were asked whether they had used any of the knowledge or skills acquired in the PGCert trial. Even after sitting just the first and third modules, participants responded that they were able to apply their learning to their current role in some form.  

“I now regularly use the software program I built as part of my day-to-day job. This program performs a task in a few seconds, which otherwise could take hours or days, and which is otherwise subject to human error. I have since adapted this so that it can also be used by a colleague in another department.”  

“Python helps me perform tasks that I previously did not know how to achieve. I have also led a couple of training sessions within the library, introducing Python to beginners (using the software I built in the project as a cultural heritage use case to frame the introduction).” 

“I changed [job] role at the end of the course so I think that helped me also in getting this promotion. And in this new role I have many more data analysis tasks to perform [quickly] for actions that would take months so yeah I managed to write that with a few scripts in my new role.” 

It was great to hear that the impacts of the trial were being felt so immediately by the participants, and that they were able to not only retain but also apply the new skills that they had gained.  

 This blog post was written by Deirdre Sullivan, Business Support Officer for Digital Scholarship Training Initiatives, part of the Digital Research and Curators Team. Special thanks to Nora McGregor, Digital Curator for the European and American Collection for support on the blog post and Martyn Harris, Institute of Coding Manager, for his work on the final report, as well as Giulia Rossi, Callum McKean and Graham Jevon for sharing their experiences.
