Digital scholarship blog

29 September 2021

Sailing Away To A Distant Land - Mahendra Mahey, Manager of BL Labs - final post

Posted by Mahendra Mahey, former Manager of British Library Labs or "BL Labs" for short

[estimated reading time of around 15 minutes]

This is my last day working as manager of BL Labs, and also my final posting on the Digital Scholarship blog. I thought I would take this chance to reflect on my journey of almost 9 years in helping to set up, maintain and enable BL Labs to become a permanent fixture at the British Library (BL).

BL Labs was the first digital Lab in a national library, anywhere in the world, that gets people to experiment with its cultural heritage digital collections and data. There are now several Gallery, Library, Archive and Museum Labs or 'GLAM Labs' for short around the world, with an active community which I helped build, from 2018.

I am really proud I was there from the beginning to implement the original proposal, which was written by several colleagues, but especially Adam Farquhar, former head of Digital Scholarship at the British Library (BL). The project was at first generously funded by the Andrew W. Mellon Foundation through four rounds of funding, as well as support from the BL. In April 2021, the project became a permanently funded fixture, helped very much by my new manager Maja Maricevic, Head of Higher Education and Science.

The great news is that BL Labs is going to stay after I have left. The position of leading the Lab will soon be advertised. Hopefully, someone will get a chance to work with my helpful and supportive colleague Dr Filipe Bento, Technical Lead of Labs, with the bright, talented and very hard working Maja, and with other great colleagues in Digital Research and more widely at the BL.

The beginnings, the BL and me!

I met Adam Farquhar and Aly Conteh (Former Head of Digital Research at the BL) in December 2012. They must have liked something about me because I started working on the project in January 2013, though I officially started in March 2013 to launch BL Labs.

I must admit, I had always felt a bit intimidated by the BL. My first visit was as a Psychology student in the early 1980s, before the St Pancras site opened in 1997. I remember coming up from Wolverhampton on the train to get a research paper about "Serotonin Pathways in Rats when sleeping" by Lidov, feeling nervous and excited at the same time. It felt like a place for 'really intelligent educated people' and for those who were among the intellectual elites in society. It also felt to me a bit like it represented the British empire and its troubled history of colonialism, especially some of the collections, which made me feel uncomfortable as to why they were there in the first place.

I remember thinking that the BL probably wasn't a place for someone like me, a child of Indian Punjabi immigrants from humble beginnings who came to England in the 1960s. Actually, I felt like an imposter and not worthy of being there.

Nearly 9 years later, I can say I learned to respect and even cherish what was inside it, especially the incredible collections, though I also became more confident about expressing stronger views about the decolonisation of some of these. I became very fond of some of the people who work in it or use it; there are some really good kind-hearted souls at the BL. However, I never completely lost that 'imposter and being an outsider' feeling.

What I remember from that time, going for my interview, was asking myself, 'If I get the position, what would be the one thing I would try and change?'. It came easily to me, namely that I would try and get more new people through the doors, literally or virtually, by connecting them to the BL's collections (especially the digital). New people like me, who may never have set foot in, or been motivated to step into, the building before. This has been one of the most important reasons for me to get up in the morning and go to work at BL Labs.

So what have been my highlights? Let's have a very quick pass through!

BL Labs Launch and Advisory Board

I launched BL Labs in March 2013, one week after I had started. It was at the launch event organised by my wonderfully supportive and innovative colleague, Digital Curator Stella Wisdom. I distinctly remember in the afternoon session (which I did alone), I had to present my 'ideas' of how I might launch the first BL Labs competition where we would be trying to get pioneering researchers to work with the BL's digital collections.

God it was a tough crowd! They asked pretty difficult questions, questions I was asking myself too and to which I still didn't know the answers either.

I remember Professors Tim Hitchcock (now at Sussex University, who eventually joined and still sits on the BL Labs Advisory Board) and Laurel Brake (now Professor Emerita of Literature and Print Culture, Birkbeck, University of London) being in the audience, together with staff from the Royal Library of the Netherlands, who 6 months later launched their own brilliant KB Lab. Subsequently, I became good colleagues with Lotte Wilms, who led their Lab for many years and is now Head of Research support at Tilburg University.

My first gut feeling overall after the event was, this is going to be hard work. This feeling and reality remained a constant throughout my time at BL Labs.

In early May 2013, we launched the competition, which was a really quick and stressful turnaround, as I had only officially started in mid March (one and a half months earlier). I remember worrying as to whether anyone would even enter! All the final entries were pretty much submitted a few minutes before the deadline. I remember being alone that evening on deadline day, near to midnight, waiting by my laptop, thinking what happens if no one enters, it's going to be a disaster and I will lose my job. Luckily that didn't happen; in the end, we received 26 entries.

I am a firm believer that we can help make our own luck, but sometimes luck can be quite random! Perhaps BL Labs had a bit of both!

After that, I never really looked back! BL Labs developed its own kind of pattern and momentum each year:

  • hunting around the BL for digital collections to make into datasets and make available
  • helping to make more digital collections openly licensed
  • having hundreds of conversations with people interested in connecting with the BL's digital collections in the BL and outside
  • working with some people more intensively to carry out experiments
  • developing ideas further into prototype projects
  • telling the world of successes and failures in person, meetings, events and social media
  • launching a competition and awards in April or May
  • roadshows before and after with invitations to speak at events around the world
  • the summer working with competition winners
  • late October/November the international symposium showcased things from the year
  • working on special projects
  • repeat!

The winners were announced in July 2013, and then we worked with them on their entries showcasing them at our annual BL Labs Symposium in November, around 4 months later.

'Nothing interesting happens in the office' - Roadshows, Presentations, Workshops and Symposia!

One of the highlights of BL Labs was to go out to universities and other places to explain what the BL is and what BL Labs does.  This ended up with me pretty much seeing the world (North America, Europe, Asia, Australia, and giving virtual talks in South America and Africa).

My greatest challenge in BL Labs was always to get people to truly and passionately 'connect' with the BL's digital collections and data in order to come up with cool ideas of what to actually do with them. What I learned from my very first trip was that telling people what you have is great, they definitely need to know what you have! However, once you do that, the hard work really begins as you often need to guide and inspire many of them, help and support them to use the collections creatively and meaningfully. It was also important to understand the back story of the digital collection and learn about the institutional culture of the BL if people also wanted to work with BL colleagues.  For me and the researchers involved, inspirational engagement with digital collections required a lot of intellectual effort and emotional intelligence. Often this means asking the uncomfortable questions about research such as 'Why are we doing this?', 'What is the benefit to society in doing this?', 'Who cares?', 'How can computation help?' and 'Why is it necessary to even use computation?'.

Making those connections between people and data does feel like magic when it really works. It's incredibly exciting, suddenly everyone has goose bumps and is energised. This feeling, I will take away with me, it's the essence of my work at BL Labs!

A full list of over 200 presentations, roadshows, events and 9 annual symposia can be found here.

Competitions, Awards and Projects

Another significant way BL Labs has tried to connect people with data has been through Competitions (tell us what you would like to do, and we will choose an idea and work collaboratively with you on it to make it a reality), Awards (show us what you have already done) and Projects (collaborative working).

At the last count, we have supported and / or highlighted over 450 projects in research, artistic, entrepreneurial, educational, community based, activist and public categories, most of them through competitions, awards and project collaborations.

We also set up awards for British Library Staff, which have been a wonderful way to highlight the fantastic work our staff do with digital collections and give them the recognition they deserve. I have noticed over the years that the number of staff who have been working on digital projects has increased significantly. Sometimes this was with the help of BL Labs, but often it was because of the significant Digital Scholarship Training Programme, run by my Digital Curator colleagues in Digital Research, which helps staff to understand that the BL isn't just about physical things but digital items too.

Browse through our project archive to get inspiration of the various projects BL Labs has been involved in or highlighted.

Putting the digital collections 'where the light is' - British Library platforms and others

When I started at BL Labs it was clear that we needed to make a fundamental decision about how we saw digital collections. Quite early on, we decided we should treat collections as data to harness the power of computational tools to work with each collection, especially for research purposes. Each collection should have a unique Digital Object Identifier (DOI) so researchers can cite them in publications.  Any new datasets generated from them will also have DOIs, allowing us to understand the ecosystem through DOIs of what happens to data when you get it out there for people to use.

In 2014, https://data.bl.uk was born and today, all our 153 datasets (as of 29/09/2021) are available through the British Library's research repository.

However, BL Labs has not stopped there! We always believed that it's important to put our digital collections where others are likely to discover them (we can't assume that researchers will want to come to BL platforms), 'where the light is' so to speak.  We were very open and able to put them on other platforms such as Flickr and Wikimedia Commons, not forgetting that we still needed to do the hard work to connect data to people after they have discovered them, if they needed that support.

Our greatest success by far was placing 1 million largely undescribed images that were digitally snipped from 65,000 digitised public domain books from the 19th Century on Flickr Commons in 2013. The number of images on the platform has grown since then by another 50 to 60 thousand from collections elsewhere in the BL. There has been significant interaction from the public to generate crowdsourced tags that make it easier to find specific images. The number of views has reached a staggering figure of over 2 billion over this time. There has also been an incredible array of projects which have used the images, from artistic use to using machine learning and artificial intelligence to identify them. It's my favourite collection, probably because there are no restrictions in using it.

Read the most popular blog post the BL has ever published by my former BL Labs colleague, the brilliant and inspirational Ben O'Steen, a million first steps and the 'Mechanical Curator' which describes how we told the world why and how we had put 1 million images online for anyone to use freely.

It is wonderful to know that George Oates, the founder of Flickr Commons and still a BL Labs Advisory Board member, has been involved in the creation of the Flickr Foundation which was announced a few days ago! Long live Flickr Commons! We loved it because it also offered a computational way to access the collections, critical for powerful and efficient computational experiments, through its Application Programming Interface (API).

More recently, we have experimented with browser based programming / computational environments - Jupyter Notebooks. We are huge fans of Tim Sherratt, who was a pioneer and brilliant advocate of Open GLAM in using them, especially through his GLAM Workbench. He is a one person Lab in his own right, and it was an honour to recognise his monumental efforts by giving him the BL Labs Research Award 2020 last year. You can also explore the fantastic work of Gustavo Candela and colleagues on Jupyter Notebooks and the ones my colleague Filipe Bento created.

Art Exhibitions, Creativity and Education

I am extremely proud to have been involved in enabling two major art exhibitions to happen at the BL, namely:

Crossroads of Curiosity by David Normal

Imaginary Cities by Michael Takeo Magruder

I loved working with artists, it's my passion! They are so creative and often not restricted by academic thinking, see the work of Mario Klingemann for example! You can browse through our archives for various artistic projects that used the BL's digital collections, it's inspiring.

I was also involved in the first British Library Fashion Student Competition, held at the BL and won by Alanna Hilton, which used the BL's Flickr Commons collection as inspiration for the students to design new fashion ranges. It was organised by my colleague Maja Maricevic, the British Fashion Colleges Council and Teatum Jones, who were great fun to work with. I am really pleased to say that Maja has gone from strength to strength working with the fashion industry and continues to run the competition to this day.

We also had some interesting projects working with younger people, such as Vittoria's world of stories and the fantastic work of Terhi Nurmikko-Fuller at the Australian National University. This is something I am very much interested in exploring further in the future, especially around ideas of computational thinking, and I have been trying out a few things.

GLAM Labs community and Booksprint

I am really proud of helping to create the international GLAM Labs community with over 250 members, established in 2018 and still active today. I affectionately call them the GLAM Labbers, and I often ask people to explore their inner 'Labber' when I give presentations. What is a Labber? It's the experimental and playful part of us we all had as children and unfortunately many have lost when becoming an adult. It's the ability to be fearless, having the audacity and perhaps even naivety to try crazy things even if they are likely to fail! Unfortunately society values success more than it does failure. In my opinion, we need to recognise, respect and revere those that have the courage to try but failed. That courage to experiment should be honoured and embraced and should become the bedrock of our educational systems from the very outset.

Two years ago, many of us Labbers 'ate our own dog food' or 'practised what we preached' when 15 other colleagues and I came together for 5 days to produce a book through a booksprint, probably the most rewarding professional experience of my life. The book is about how to set up, maintain, sustain and even close a GLAM Lab and is called 'Open a GLAM Lab'. It is available as public domain content and I encourage you to read it.

Online drop-in goodbye - today!

I organised a 30 minute ‘online farewell drop-in’ on Wednesday 29 September 2021, 1330 BST (London), 1430 (Paris, Amsterdam), 2200 (Adelaide), 0830 (New York) on my very last day at the British Library. It was heart-warming that the session was 'maxed out' at one point with participants from all over the world. I honestly didn't expect over 100 colleagues to show up. I guess when you leave an organisation you get to find out who you actually made an impact on, who shows up, and who tells you, otherwise you may never know.

Those that know me well know that I would have much rather had a farewell do ‘in person’, over a pint and praying for the ‘chip god’ to deliver a huge portion of chips with salt/vinegar and tomato sauce magically and mysteriously to the table. The pub would have been McGlynn's (http://www.mcglynnsfreehouse.com/) near the British Library in London. I wonder who the chip god was? I never found out ;)

The answer to who the chip god was is in the text following this sentence, written in white on white... you will be very shocked to know who it was!

Spoiler alert: it was me after all, my alter ego!

Mahendra's online farewell to BL Labs, Wednesday 29 September 2021, 1330 BST.
Left: Flowers and wine from the GLAM Labbers arrived in Tallinn, 20 mins before the meeting!
Right: Some of the participants of the online farewell

Leave a message of good will to see me off on my voyage!

It would be wonderful if you would like to leave me your good wishes, comments, memories, thoughts, scans of handwritten messages, pictures, photographs etc. on the following Google doc:

http://tiny.cc/mahendramahey

I will leave it open for a week or so after I have left. Reading positive, sincere, heartfelt messages from colleagues and collaborators over the years has already lifted my spirits. For me it provides evidence that you perhaps did actually make a difference to someone's life. I will definitely be re-reading them during the cold dark Baltic nights in Tallinn.

I would love to hear from you and find out what you are doing, or if you prefer, you can email me, the details are at the end of this post.

BL Labs Sailor and Captain Signing Off!

It's been a blast and lots of fun! Of course there is a tinge of sadness in leaving! For me, it's also been intellectually and emotionally challenging as well as exhausting, with many ‘highs’ and a few ‘lows’ or choppy waters, some professional and others personal.

I have learned so much about myself and there are so many things I am really really proud of. There are other things of course I wish I had done better. Most of all, I learned to embrace failure, my best teacher!

I think I did meet my original wish of wanting to help to open up the BL to as many new people as possible who perhaps would have never engaged with the Library before. That was either by using digital collections and data for cool projects and/or simply walking through the doors of the BL in London or Boston Spa and having a look around and being inspired to do something because of it.

I wish the person who takes over my position lots of success! My only piece of advice is if you care, you will be fine!

Anyhow, what a time this has been for us all on this planet! I have definitely struggled at times. I, like many others, have lost loved ones and thought deeply about life and its true meaning. I have also managed to find the courage to know what’s important and act accordingly, even if that has been a bit terrifying and difficult at times. Leaving the BL for example was not an easy decision for me, and I wish perhaps things had turned out differently, but I know I am doing the right thing for me, my future and my loved ones.

Though there have been a few dark times for me both professionally and personally, I hope you will be happy to know that I have also found peace and happiness too. I am in a really good place.

I would like to thank alumni of BL Labs: Ben O'Steen, Technical Lead for BL Labs from 2013 to 2018, and Hana Lewis (2016 - 2018) and Eleanor Cooper (2018 - 2019), both BL Labs Project Officers. I would also like to thank the many other people I worked with through BL Labs, more widely in the Library and outside it during my journey.

Where am I off to and what am I doing?

My professional plans are 'evolving', but one thing is certain, I will be moving country!

To Estonia to be precise!

I plan to live, settle down with my family and work there. I was never a fan of Brexit, and this way I get to stay a European.

I would like to finish with this final sweet video created by writer and filmmaker Ling Low and her team in 2016, entitled 'Hey there Young Sailor', which they all made as volunteers for the Malaysian band, the 'Impatient Sisters'. It won the BL Labs Artistic Award in 2016. I had the pleasure and honour of meeting Ling over a lovely lunch in Kuala Lumpur, Malaysia, where I had also given a talk at the National Library about my work and looked for remnants of my grandfather, who had settled there many years ago.

I wish all of you well, and if you are interested in keeping in touch with me, working with me or just saying hello, you can contact me via my personal email address: mr.mahendra.mahey@gmail.com or follow my progress on my personal website.

Happy journeys through this short life to all of you!

Mahendra Mahey, former BL Labs Manager / Captain / Sailor signing off!

23 September 2021

Computing for Cultural Heritage: Trial Outcomes and Final Report

Six months ago, twenty members of staff from the British Library and The National Archives UK completed Computing for Cultural Heritage, a project that trialled Birkbeck University and Institute of Coding’s new PGCert, Applied Data Science. In this blog post we explore the necessity of this new course, the final report of this trial, and the lasting impact that this PGCert has made on some of the participants. 

 

 

Background 

Information professionals have been experiencing a massive shift to digital in the way collections are being donated, held and accessed. In the British Library’s digital collections there are e-books, maps, digitised newspapers, journal titles, sound recordings and over 500 terabytes of preserved data from the UK Web Archive. Yearly, the library sees 6 million catalogue searches by web users with almost 4 million items consulted online. This amounts to a vast amount of potential cultural heritage data available to researchers, and it requires complex digital workflows to curate, collect, manage, provide access, and help researchers computationally make sense of it all. 

Staff at collecting institutions like the British Library and the National Archives, UK are engaging in computationally driven projects like never before, but often without the benefit of data skills and computational thinking to support them. That is where a program like Computing for Cultural Heritage can help information professionals, allowing them to upskill and tackle issues – like building new digital systems and services, supporting collaborative, computational and data-driven research using digital collections and data, or deploying simple scripts to make everyday tasks easier – with confidence.  

Image of a laptop with the screen showing a bookshelf

 

Learning Aims 

The trial course was broken into two modules, a taught lesson on ‘Demystifying Computing with Python’ and a written ‘Industry Project’ on a software solution to a work-based problem.  A third module, Analytic Tools for Information Professionals, would be offered to participants outside of the trial as part of the full live course in order to earn their PGCert.

By the end of the trial, participants were able to: 

  • Demonstrate satisfactory knowledge of programming with Python. 
  • Understand techniques for Python data structures and algorithms. 
  • Work on case studies to apply data analytics using Python. 
  • Understand the programming paradigm of object-oriented programming. 
  • Use Python to apply the techniques learned on the module to real-world problems. 
  • Demonstrate the ability to develop an algorithm to carry out a specified task and to convert this into an executable program. 
  • Demonstrate the ability to debug a program. 
  • Understand the concepts of data security and general data protection regulations and standards. 
  • Develop a systematic understanding and critical awareness of a commonly agreed problem between the work environment and the academic supervisor in the area of computing. 
  • Develop a software solution for a work-based problem using the skills developed from the taught modules, for example develop software using the programming languages and software tools/libraries taught. 
  • Present a critical discussion on existing approaches in the particular problem area and position their own approach within that area and evaluate their contribution. 
  • Gain experience in communicating complex ideas/concepts and approaches/techniques to others by writing a comprehensive, self-contained report. 

The learning objectives were designed and delivered with the cultural heritage context in mind, and as such incorporated, for instance, examples and datasets from the British Library Music collections in the Python programming elements of the taught module. Additionally, there was a lecture focused on a British Library use case involving the design and implementation of a Database Management System. 

Following the completion of the trial, participants had the opportunity to complete their PGCert in Applied Data Science by attending the final module, Analytic Tools for Information Professionals, which was part of the official course launched last autumn. 

 

The Lasting Impact of Computing for Cultural Heritage 

Now that we’re six months on from the end of the trial, and the participants who opted in have earned their full PGCert, we followed up with some of the learners to hear about their experiences and the lasting effects of the course: 

“The third and final module of the computing for cultural heritage course was not only fascinating and enjoyable, it was also really pertinent to my job and I was immediately able to put the skills I learned into practice.  

The majority of the third module focussed on machine learning. We studied a number of different methods and one of these proved invaluable to the Agents of Enslavement research project I am currently leading. This project included a crowdsourcing task which asked the public to draw rectangles around four different types of newspaper advertisement. The purpose of the task was to use the coordinates of these rectangles to crop the images and create a dataset of adverts that can then be analysed for research purposes. To help ensure that no adverts were missed and to account for individual errors, each image was classified by five different people.  

One of my biggest technical challenges was to find a way of aggregating the rectangles drawn by five different people on a single page in order to calculate the rectangles of best fit. If each person only drew one rectangle, it was relatively easy for me to aggregate the results using the coding skills I had developed in the first two modules. I could simply find the average (or mean) of the five different classification attempts. But what if people identified several adverts and therefore drew multiple rectangles on a single page? For example, what if person one drew a rectangle around only one advert in the top left corner of the page; people two and three drew two rectangles on the same page, one in the top left and one in the top right; and people four and five drew rectangles around four adverts on the same page (one in each corner). How would I be able to create a piece of code that knew how to aggregate the coordinates of all the rectangles drawn in the top left and to separately aggregate the coordinates of all the rectangles drawn in the bottom right, and so on?  

One solution to this problem was to use an unsupervised machine learning method to cluster the coordinates before running the aggregation method. Much to my amazement, this worked perfectly and enabled me to successfully process the total of 92,218 rectangles that were drawn and create an aggregated dataset of more than 25,000 unique newspaper adverts.” 

-Graham Jevon, EAP Cataloguer; BL Endangered Archives Programme 
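
To make the idea in Graham's account more concrete, here is a minimal sketch of the "cluster first, then average" approach. It is not the project's actual code: the rectangle format (x, y, width, height), the page coordinates, the `max_distance` threshold and the function names are all assumptions made for illustration, and a simple distance-threshold grouping stands in for whichever unsupervised machine learning method was really used.

```python
from statistics import mean


def centre(rect):
    """Return the centre point of an (x, y, width, height) rectangle."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def cluster_rectangles(rects, max_distance):
    """Group rectangles whose centres lie close together.

    Each rectangle joins the first cluster whose mean centre is within
    max_distance of its own centre; otherwise it starts a new cluster.
    """
    clusters = []
    for rect in rects:
        cx, cy = centre(rect)
        for cluster in clusters:
            centres = [centre(r) for r in cluster]
            mx = mean(c[0] for c in centres)
            my = mean(c[1] for c in centres)
            if ((cx - mx) ** 2 + (cy - my) ** 2) ** 0.5 <= max_distance:
                cluster.append(rect)
                break
        else:
            clusters.append([rect])
    return clusters


def aggregate(cluster):
    """Average each coordinate across a cluster to get a best-fit rectangle."""
    return tuple(mean(values) for values in zip(*cluster))


# Hypothetical page with one advert in each corner (made-up coordinates).
corners = {
    "TL": (50, 60, 300, 200),
    "TR": (650, 60, 300, 200),
    "BL": (50, 1100, 300, 200),
    "BR": (650, 1100, 300, 200),
}

# Five volunteers: one marks a single advert, two mark two, two mark all four.
# The second value shifts each drawing slightly to mimic individual variation.
drawings = [
    (["TL"], -4),
    (["TL", "TR"], -2),
    (["TL", "TR"], 0),
    (["TL", "TR", "BL", "BR"], 2),
    (["TL", "TR", "BL", "BR"], 4),
]

rects = [
    (corners[name][0] + shift, corners[name][1] + shift, *corners[name][2:])
    for names, shift in drawings
    for name in names
]

clusters = cluster_rectangles(rects, max_distance=100)
best_fit = [aggregate(cluster) for cluster in clusters]

for rect, cluster in zip(best_fit, clusters):
    print(f"{len(cluster)} drawings -> best-fit rectangle {rect}")
```

In this toy example the 13 drawn rectangles fall into four groups, one per advert, and each group is averaged separately. On real crowdsourced data the threshold would need tuning, and an established clustering method would cope better with overlapping or badly drawn rectangles.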

 

“The final module of the course was in some ways the most challenging — requiring a lot of us to dust off the statistics and algebra parts of our brain. However, I think, it was also the most powerful; revealing how machine learning approaches can help us to uncover hidden knowledge and patterns in a huge variety of different areas.  

Completing the course during COVID meant that collection access was limited, so I ended up completing a case study examining how generic tropes have evolved in science fiction across time using a dataset extracted from GoodReads. This work proved to be exceptionally useful in helping me to think about how computers understand language differently; and how we can leverage their ability to make statistical inferences in order to support our own, qualitative analyses. 

In my own collection area, working with born digital archives in Contemporary Archives and Manuscripts, we treat draft material — of novels, poems or anything else — as very important to understanding the creative process. I am excited to apply some of these techniques — particularly Unsupervised Machine Learning — to examine the hidden relationships between draft material in some of our creative archives. 

The course has provided many, many avenues of potential enquiry like this and I’m excited to see the projects that its graduates undertake across the Library.” 

-Callum McKean, Lead Curator, Digital; Contemporary British Collection

 

"I really enjoyed the Analytics Tools for Data Science module. As a data science novice, I came to the course with limited theoretical knowledge of how data science tools could be applied to answer research questions. The choice of using real-life data to solve queries specific to professionals in the cultural heritage sector was really appreciated as it made everyday applications of the tools and code more tangible. I can see now how curators’ expertise and specialised knowledge could be combined with tools for data analysis to further understanding of and meaningful research in their own collection area."

-Giulia Carla Rossi, Curator, Digital Publications; Contemporary British Collection

 

Final Report 

The Computing for Cultural Heritage project concluded in February 2021 with a virtual panel session that highlighted the learners’ projects and allowed discussion of the course and feedback to the key project coordinators and contributors. Case studies of the participants’ projects, as well as links to other blog posts and project pages can be found on our Computing for Cultural Heritage Student Projects page. 

The final report highlights these projects as well as demographic statistics on the participants and feedback that was gained through an anonymous survey at the end of the trial. In order to evaluate the experience of the students on the PGCert, we composed a list of questions that would provide insight into various aspects of the course, including how learners fitted the coursework around their work commitments and how well they met the learning objectives. 

 

Why Computing for Cultural Heritage? 

Bar graph showing the results of the question 'Why did you choose to do this course' with the results discussed in the text below
Figure 1: Why did you choose to do this course? Results breakdown by topic and gender

When asked why the participants chose to take part in the course, we found that one of the most common answers was to develop methods for automating repetitive, manual tasks – such as generating unique identifiers for digital records and copying data between Excel spreadsheets – to free up more curatorial time for their digital collections. One participant said:  

“I wanted to learn more about coding and how to use it to analyse data, particularly data that I knew was rich and had value but had been stuck in multiple spreadsheets for quite some time.” 

There was also a desire to learn new skills, either for personal or professional development: 

“I believe in continuous professional development and knew that this would be an invaluable course to undertake for my career.”  

“I felt I was lagging behind and my job was getting static, and the feeling that I was behind [in digital] and I wanted to kind of catch up.” 

Bar graph showing the results to the question 'Did the course help you meet your aims?' with 14 answering yes, 1 answering no and 1 answering 'mixed'
Figure 2: 'Did the course help you meet your aims?' Results broken down by answer and gender.

A follow-up question asked whether these goals and aims were met by the course. Happily, most participants indicated that they had been, citing increased confidence, help in developing new computational skills, and a deeper knowledge of information technology.

 

What was the most enjoyed aspect of the course? 

Bar graph showing the results of the question 'What did you enjoy most about the course' with the results discussed in the text below
Figure 3: 'What did you enjoy most about the course?' Results breakdown by topic and gender

When broken down, the responses to ‘What did you enjoy most’ largely reflect the student experience, whether it was being in taught modules (4), getting hands on experience (4), or being in a learning environment again (6). Participants also indicated that networking with peers was an enjoyable part of the experience: 

“Day out of work with like minded people made it really easy to stick with rather than just doing it online.”  

“Spending a day away from work and meeting the people I had never met at the NA, and also speaking to people from the BL about what they did.”  

“I enjoyed being a student again, learning a new skill amongst my peers, which week after week is a really valuable experience…” 

“Learning with colleagues and people working in similar fields was also a plus, as our interests often overlapped...” 

Although only two responses named the project module as one of the most enjoyable components, it was useful to see how the course gave learners the opportunity to apply their learning to a work-based problem, with some benefit to their role, department or digital collection:

“I really enjoyed being able to apply my learning to a real-world work-based project and to finally analyze some of the data that has been lying around the department for over a decade without any further analysis.”  

“The design and create aspect of the project. Applying what I learned to solving a genuine problem was the most enjoyable part - using Python and solving problems to achieve something tangible. This is where I really consolidated my learning.” 

 

What was the most challenging aspect of the course? 

Bar graph showing the results of the question 'What did you find the most challenging and why?' with the results discussed in the text below
Figure 4: 'What did you find the most challenging and why?' Results breakdown by topic and gender.

When discussing the most challenging aspect of the course, most of the learners focused on the practical Python lab sessions and the work-based project module. Interestingly, participants also stated that they were able to overcome the challenges through personal perseverance and the learning provided by the course itself: 

“I found the initial hurdle of learning how [to] code very challenging, but after the basics it became possible to become more creative and experimental.”  

“The work-based project was a huge challenge. We'd only really done 5 weeks of classes and, having never done anything like this before, it was hard to envisage an end product let alone how to put it together. But got there in the end!” 

While the majority of the cohort found the practical components of the PGCert trial most challenging, the feedback also suggested that the inclusion of the second module – which will be available as part of the full programme – will provide more opportunity to practise practical programming skills, such as working with software tools and APIs.

 

The Effectiveness of Computing for Cultural Heritage

Bar graph showing the results of the question 'Have you applied anything you have learnt?' with 2 results for 'Data analysis concepts', 12 results for 'Python coding' and 2 results for 'Nothing'
Figure 5: 'Have you applied anything you have learnt?' Results breakdown by topic and gender.

Participants were asked whether they had used any of the knowledge or skills acquired in the PGCert trial. Even after sitting just the first and third modules, participants responded that they were able to apply their learning to their current role in some form.  

“I now regularly use the software program I built as part of my day-to-day job. This program performs a task in a few seconds, which otherwise could take hours or days, and which is otherwise subject to human error. I have since adapted this so that it can also be used by a colleague in another department.”  

“Python helps me perform tasks that I previously did not know how to achieve. I have also led a couple of training sessions within the library, introducing Python to beginners (using the software I built in the project as a cultural heritage use case to frame the introduction).” 

“I changed [job] role at the end of the course so I think that helped me also in getting this promotion. And in this new role I have many more data analysis tasks to perform [quickly] for actions that would take months so yeah I managed to write that with a few scripts in my new role.” 

It was great to hear that the impacts of the trial were being felt so immediately by the participants, and that they were able to not only retain but also apply the new skills that they had gained.  

 This blog post was written by Deirdre Sullivan, Business Support Officer for Digital Scholarship Training Initiatives, part of the Digital Research and Curators Team. Special thanks to Nora McGregor, Digital Curator for the European and American Collection for support on the blog post and Martyn Harris, Institute of Coding Manager, for his work on the final report, as well as Giulia Rossi, Callum McKean and Graham Jevon for sharing their experiences.

National Libraries Now: Wikimedians Unite!

On Friday 17th September 2021, I was delighted to participate in a conference panel for the National Libraries Now Conference. I had worked to assemble a veritable dream team of Wikimedia and library talent, to talk about Wikimedia Residencies from a four-nation perspective.

Joining me on the panel were Stella Wisdom (British Library), Jason Evans (National Library of Wales), Rebecca O’Neill (Wikimedia Community Ireland) and Ruth Small (Digital Productions Operator, National Library of Scotland). Stuart Prior (Programme Coordinator, Wikimedia UK) kindly agreed to be our chair. We pre-recorded presentations that were circulated to participants, so that our time on the 17th could be devoted to questions and discussion.

Going over my notes now, the best way to try to reflect the discussion is to look at some of the questions asked and the responses garnered. Please bear in mind that some remarks may be out of chronological order!

  • How do you think working with Wikimedia helps your institution’s strategic goals?

We reflected as a group on the move from WikiPedians in Residence to WikiMedians in residence [emphasis my own] and how this shows a shift in institutional thinking towards the potential of larger Wikimedia projects, and the use of platforms such as Commons, Wikisource and WikiBase.

Jason spoke about the way that lower onsite footfall at NLW, a result of its physical location, enhances the importance of digital work and online outreach. He also spoke about training, promotion and contribution through Wikimedia platforms as being just as valuable as, if not more valuable than, the total number of views gained.

Image of National Library of Wales, Aberystwyth
It might not be digital, but it is a beauty! Ian Capper, via Wikimedia Commons.

 

The National Library of Scotland is in the heart of Edinburgh, so it does not face the same issues with footfall. However, as Ruth pointed out, a key strategic goal of the Library is to reach people, and digitising is not the end of the road. Engagement with collections like the NLS Data Foundry is crucial, and the groundbreaking Scottish Chapbooks project run by the NLS was born out of the pandemic, showing a new imagining of institutional goals.

  • How do you incorporate Wikimedia work into your ‘normal’ work?

It was agreed that the inclusion of Wiki in job descriptions could help change at an institutional level, while Rebecca pointed out that the inclusion of Wiki activity as an outreach activity in funding applications is often a good way forward for inclusion of this work as part of major research projects. Again, advocacy and emphasis on the ease with which Wiki work can be undertaken was a key focal point, showing colleagues that their interests and our tools can align well.

  • How do you implement elements of quality control to what is ultimately crowdsourced work?

Jason suggested that we start to think about ‘context’ control: we can upload content and edit and amend details from the beginning; however, how we contextualise this material and the activity of Wiki engagement is crucial. There is a high level of quality in curation already, and often Wiki datasets will link back to other repositories such as Flickr or institutional catalogues.

The classic counterpoint of ‘anyone can edit’ and ‘everyone can edit’ came to the fore here: as was rightly pointed out, the early 00s impression of Wikipedia as a free-for-all is largely outdated. In fact, expectations are often inverted, as the enthusiastic and diligent Wiki community are quick to act upon misinformation or inaccuracies. We spoke about the beauty of the process in Wikimedia whereby information picks up value and enriched data along the way, an active evolution of resources.

Image of Wikipedia welcome page stating 'the free encyclopedia that anyone can edit'
The Wikipedia landing page: anyone can edit!

 

  • What about decolonisation and Wikimedia?

Decolonisation is a huge question for Wikimedia: movements around the world are examining what we can do to better serve the larger cause of anti-racist practice. For the British Library, I spoke about the work we have done on the India Office Records in offering a template for content warnings and working with the input of our colleagues to make this as robust a model as we can.

Rebecca’s experience of working in Ireland was incredibly insightful: she shared with us the experience of working with Irish material that is shaped by colonial ideas of what Ireland is, and how the culture has formed. Despite being a white, European, primarily English-speaking nation, the influence of colonialism is still felt.

The use of Wikimedia as a tool for breaking down barriers is vital, as each of our speakers illustrated. Jason spoke about the digital repatriation of items, and gave an example of the Red Book of Hergest, held by Jesus College Oxford (MS 111) and now available through Wikimedia Commons. Though this kind of action cannot always stand in place of physical repatriation, the move towards collaboration is notable and important.

 

An image of anti-Irish propaganda, featuring an Irish Frankenstein figure
'The Irish Frankenstein', a piece of anti-Irish propaganda from 1882. John Tenniel, Public domain, via Wikimedia Commons.

 

An hour was simply not enough! National Libraries Now was an incredibly important experience for me, at this point in my residency. I was particularly delighted with the dedication and enthusiasm of my co-panelists, and hope that we were able to shed some light on the Wikimedian-in-Residence role for those attending.

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian).

25 August 2021

Dabbling in DCMI

One of the best bits of working in digital scholarship is the variety of learning, training and knowledge exchange we can participate in. I have come to my post as a Wikimedian with a background in digital humanities and voluntary experience, and the opportunity to solidify my skills through training courses is really exciting.

Shortly after I started at the library, I had the chance to participate in the Library Juice Academy’s course ‘Introduction to Metadata’. Metadata has always fascinated me: as someone who can still remember when the internet was installed in their house, by means of numerous AOL compact discs, the way digital information has developed is something I have had direct experience of, even if I didn’t realise it.

Green and yellow CD with 1990s AOL branding.
Image of AOL CD, courtesy of archive.org.

Metadata, simply put, is data about data. It tells us information about a resource you might find in a library or museum: the author of a book, the composer of a song, the artist behind a painting. In analogue terms, this is like the title page in a novel. In digital terms, it sits alongside the content of the resource, in attached records or headers. In the Dublin Core Metadata Initiative format, one of the most common ways of expressing metadata, there are fifteen separate ‘elements’ you can apply to describe a resource, such as title, date, format and publisher.
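To make the idea of a fixed set of elements a little more concrete, here is a minimal Python sketch. It lists the fifteen Dublin Core element names and checks a record against them. The record itself is invented purely for illustration, and the helper name `unknown_elements` is my own, not part of any standard library or tool.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def unknown_elements(record):
    """Return, in sorted order, any keys in the record that are not Dublin Core elements."""
    return sorted(set(record) - DC_ELEMENTS)


# Hypothetical record, invented for illustration only.
example_record = {
    "title": "An Example Chapbook",
    "creator": "Unknown",
    "date": "1820",
    "format": "image/tiff",
    "publisher": "Example Press",
}

print(unknown_elements(example_record))
print(unknown_elements({"title": "An Example Chapbook", "shelfmark": "ABC 123"}))
```

The first call prints an empty list because every key is a Dublin Core element; the second flags `shelfmark`, which would need mapping to an element such as `identifier` before the record could be expressed in Dublin Core.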

Wikidata houses an amazing amount of data, and is unusual in that it is not bounded by a set number of ‘elements’. There are many different ways of describing the items on Wikidata, and many properties and statements can be added to each item. There have been initiatives to integrate Wikidata and metadata in a meaningful way, such as the WikiProject Source Metadata and WikiCite. I have certainly found it very useful to have a sound understanding of metadata and its function, in order to utilise Wikidata effectively.

Image of Wikicite logo, with birthday branding.
Wikicite 8th Birthday Logo by bleeptrack.

The Library Juice Academy course was asynchronous and highly useful. Over four weeks, we completed modules involving self-selected readings, discussion forum posts and video seminars. I particularly enjoyed the varied selection of readings: the group of participants came from a breadth of backgrounds and experiences, and the readings reflected this. The balance between theoretical reading and practical application was excellent, and I enjoyed getting to work with MarcEdit for the first time.

I completed the course in May 2021, and was delighted to receive my certificate by email. I have a much stronger handle on the professional standard of metadata in the GLAM sector and how this intersects with the potential of the vast array of data descriptors available in Wikidata. It was also a great opportunity to think about the room for nuance, subjectivity and bias in data. During Week One, we considered ‘Misinformation and Bias in Data Processing’ by Thornburg and Oskins. I said the following in our forum discussion:

“What I have taken from this piece is a real sense of the hard work that goes into the preparation of resources, and the many different forms bias can take, often inadvertently. It has made me think about and appreciate the difficult decisions that have to be made, and the processes that underlie these practices.”

Overall, participating in this course and expanding my skills into more traditional librarianship fields was fascinating, and left me eager to learn more about metadata and start working more closely with our collections and Wikidata.

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian).

12 August 2021

Dates to discuss Wikidata at Wikimania 2021

Wikimania is often the highlight of any Wikimedian’s calendar. Hosted by the Wikimedia Foundation, Wikimania is a conference like no other. A large number of participants take part in the annual celebration of open knowledge and Wikimedia projects. Previous events have taken place in Stockholm (2019), Cape Town (2018), Montreal (2017) and Italy (2016). Due to the ongoing global pandemic, this year's conference, held 13-17 August 2021, is taking place entirely online, something Wikimania is ideally suited for!

  Logo for Wikimania 2021, 4 squares, 1 with a drawing of 12 peoples faces as if they are in a videocall, the 2nd of 2 jigsaw puzzle pieces, the 3rd of paper confetti and the 4th square showing 2 people sitting at a table talking

In addition to more traditional conference sessions, Wikimania will be running an Unconference, a Community Village, and a community Hackathon. Communication is encouraged through a variety of channels including Telegram, IRC and Wiki talk pages.

Telegram machine
A photograph of an old telegraph key by Sandra Tan on Unsplash

Looking at the programme, so many interesting topics are on the table for presentation and discussion: from copyright reform, to innovation and community development, there’s a wide spectrum of material to interest all Wikimedians of every level. Handily, events are rated in terms of their suitability for beginners, to make things as welcoming as possible. There is a whole strand of presentations devoted to Wikidata, which you can view here.

I am very excited to be presenting remotely at this conference on behalf of the British Library. I will be introducing the work of Tom Derrick on the Bengali Books Wikisource Competition, and Dominic Kane (UCL) on the India Office Records project. We have shaped our panel to show what GLAM institutions can do to promote and effectively utilise Wiki platforms for public engagement with library and archive collections. Our panel will run on Sunday 15th of August at 8.15pm (7.15pm UTC).

Wikimania is free to attend online, 13-17 August 2021. Registration is open until midnight on Thursday 12th August. We hope to see you there!

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian)

14 June 2021

Adding Data to Wikidata is Efficient with QuickStatements

Once I was set up on Wikipedia (see Triangulating Bermuda, Detroit and William Wallace), I got started with Wikidata. Wikidata is the part of the Wikimedia universe which deals with structured data, like dates of birth, shelf marks and more.

Adding data to Wikidata is really simple: it just requires logging into Wikidata (or creating an account if you don’t already have one) and then pressing edit on any page you want to edit.

Image of a Wikidata entry about Earth
Editing Wikidata

If the page doesn’t already exist, then creating it is also very simple: just select ‘create a new item’ from the menu on the left-hand side of the page.

When using Wikidata, there are some powerful tools which make adding data quicker and easier. One of these is QuickStatements. Unfortunately, using QuickStatements requires that you have made 50 edits on Wikidata before you run your first batch. Fortunately, reaching that total is rather quicker than Citation Hunt (for which, see Triangulating Bermuda, Detroit and William Wallace).

Image of Wikidata menu with 'Create a new item' highlighted
Creating a new item in Wikidata

I made those 50 edits very quickly, by setting up Wikidata item pages for each of the sample items from the India Office Records that we are working with (at the moment we are prioritising adding information about the records; further work will take place before any digitised items are uploaded to Wikimedia platforms). Basic information was added to each of the item pages.

Q107074264 (India Office List January 1885)

Q107074434 (India Office List July 1885)

Q107074463 (India Office List January 1886)

Q107074676 (India Office List July 1886)

Q107074754 (India Office List 1886 Supplement)

Q107074810 (1888-9 Report on the Administration of Bengal)

Q107074801 (1889-90 Report on the Administration of Bengal)

Once I had done this, it became clear that I needed to create more general pages, which could contain the DOIs that link back to the digitised records which are currently only accessible via batch download through the British Library research repository.

Q107134086 Page for administrative reports (V/10/60-1) in general.

Q107136752 Page for India lists (V/13/173-6) in general.

Image of the WikiProject page for the India Office Records
The WikiProject page for the India Office Records

The final preparatory step was to create a WikiProject page, which will facilitate collaboration on the project. This page contains links to all the pages involved in the project and will soon also contain useful resources such as templates for creating new pages as part of the project and queries for using the data.

After this, I began to experiment with Quick Statements, making heavy use of the useful guide to it available on Wikidata.

I decided to upload information on members of a particular regiment in Bengal, since this was information I could easily copy into a spreadsheet because the versions of the reports in the British Library research repository support Optical Character Recognition (OCR).

Image of the original India Office List containing information on members of the 14th Infantry Regiment
Section of the original India Office List containing information on members of the 14th Infantry Regiment (IOR/V/6/175, page 258)

Finally, once I had done all of this, I met with the curators of the India Office Records for feedback and suggestions. It became clear from this that there was in fact some confusion about exactly which regiment these officers belonged to. Fortunately, it turned out we had identified the correct regiment, but had we made a mistake, a simple batch of QuickStatements edits would have quickly put it right.

Image of a section of a spreadsheet of members of the 14th Infantry Regiment
Section of my spreadsheet of members of the 14th Infantry Regiment

All in all, I can recommend using Wikidata and I hope I have shown that it can be a useful tool, but also that it is easy to use. The next step for our Wikidata project will be to upload templates and case studies to help and support future volunteer editors to develop it further. We will also add resources to support research on the uploaded data.

Image of Quick Statements for adding gender to each of the pages for the officers
Screenshot of Quick Statements for adding gender to each of the pages for the officers
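For anyone curious what a batch like the one in the screenshot might look like, here is a minimal Python sketch that builds the tab-separated commands QuickStatements accepts in its version 1 format (item, property, value). It is an illustration rather than the exact batch used in the project: P21 is the Wikidata property for 'sex or gender' and Q6581097 is the item for 'male', but the item ID passed in is the Wikidata Sandbox (Q4115189), standing in for the officers' pages, and the function name `gender_statements` is my own.

```python
# P21 is the Wikidata property "sex or gender"; Q6581097 is the item "male".
SEX_OR_GENDER = "P21"
MALE = "Q6581097"


def gender_statements(item_ids, gender=MALE):
    """Build one tab-separated QuickStatements (V1) command per item."""
    return "\n".join(f"{qid}\t{SEX_OR_GENDER}\t{gender}" for qid in item_ids)


# Q4115189 is the Wikidata Sandbox item, used here as a stand-in
# for the officers' item pages (an assumption for illustration).
batch = gender_statements(["Q4115189"])
print(batch)
```

The printed text can be pasted into the QuickStatements batch window. Generating the lines from a spreadsheet column of item IDs means that a correction is simply a matter of regenerating and re-running the batch.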

This is a guest post by UCL Digital Humanities MA student Dominic Kane.

02 June 2021

Triangulating Bermuda, Detroit and William Wallace

Last Monday I began a work placement with the British Library working with its Wikimedian-in-Residence, Dr Lucy Hinnie, to add information and text from the India Office Records to Wikisource and Wikidata.

My first day mainly consisted of several different meetings. I was introduced to the team dealing with the India Office Records, which really helped me to get a better sense of the importance of the project and its key objectives. I then attended a metadata workshop (metadata is, generally speaking, data about data, e.g. the author of a book, the time a photo was taken etc). This introduced me to the British Library’s current metadata practices and will be very useful when I begin to upload data to Wikidata in ensuring it is as useful as possible. Finally, I attended a meeting with the curators of the Contemporary British collections, which gave me an overview of the range of the Library’s activities online, its current and future exhibitions and its holdings.

On my second day, I finished my basic Wikipedia training and moved on to getting fully registered, which is needed if you want to add new pages to Wikipedia. This requires 10 edits to existing Wikipedia pages. The fastest way to do this was by completing Citation Hunt, according to Dr Hinnie. What she did not mention was that Citation Hunt is roughly what would happen if the British Library catalogue and the Easter Bunny came together to plan an Easter egg hunt in St Pancras.

Screen grab showing the interface for Citation Hunt
Screenshot of Citation Hunt

Citation Hunt gives a random passage of Wikipedia in need of citation and you can either add a citation or skip to another. As you might imagine, these pages are completely unrelated to one another. As such, Citation Hunt had me trawling the internet for such delights as:

• Proof that William Wallace appeared in Age of Empires II. Unfortunately, ‘I remember that bit from when I played’ does not meet Wikipedia’s reliable source guidelines. (William Wallace - Wikipedia)

• A discussion of the OECD ‘Acquis Communautaire.’ (Acquis communautaire - Wikipedia)

• The amount of RAM in an Atari 1040ST, even though that computer is well and truly before my time. (Atari ST - Wikipedia)

• Evidence that Bill Gates invested in a particular company. (Bill Gates - Wikipedia)

I also found myself lost in the Bermudan Economy (Economy of Bermuda - Wikipedia) and growing into researching commercial agriculture (Ethylene - Wikipedia). Most surreal of all was adding directions from Google Maps for the relative locations of two places in Detroit. (Detroit - Wikipedia) I have never been to Detroit…

Ending my first week, I attended a meeting of the British Library’s Digital Scholarship team. It was really interesting to hear about all the different digital initiatives going on, both within the BL and in partnership with other organisations.

This week, I'm having further training on the tools I will need to use for this project and then, for the remaining four weeks of the placement, I will be uploading and enriching data from the India Office Records.

I look forward to updating you soon on the progress I make!

This is a guest post by UCL Digital Humanities MA student Dominic Kane.

26 May 2021

Endangered Archives and Notable Women

At the beginning of this month, I began a work placement with the British Library's Endangered Archives Programme (EAP). The EAP hosted a group of University College London students for several projects, and I was working to further connect EAP collections with Wikimedia. We were able to tailor the project to our interests, which meant that I was able to spend my placement researching and writing about two pioneering women photographers, Marie-Lydie Bonfils (EAP644) and Annemarie Heinrich (EAP755).

Creating a Wikipedia article

I began with Marie-Lydie Bonfils (1837–1918), an early woman photographer and co-owner of the Maison Bonfils studio in Beirut. The Bonfils family archive was digitised in a 2013 project between the EAP and the Jafet Memorial Library, American University of Beirut, and the physical archive is currently preserved at the Sursock Museum.

Perhaps unsurprisingly to those interested in women’s history, while her husband, Félix Bonfils, already had his own Wikipedia article, Marie-Lydie did not. So, I created a new article for her, adding to Félix’s along the way as well. I worked from as many biographical sources as I could possibly access online, including the excellent EAP blog post on Marie-Lydie.

Image of Marie-Lydie Cabanis Bonfils Wikipedia entry
Marie-Lydie Cabanis Bonfils' Wikipedia entry

Wikipedia’s notability criteria were a concern for me when publishing. Topics on Wikipedia must be considered “notable” to avoid needless and self-promotional content. This can have the unintended consequence of noteworthy articles being removed if they are not able to demonstrate their significance to other users. Balancing the objective language of Wikipedia with the need to persuade others of Marie-Lydie’s importance was something I had to be careful of when writing the text.

Once published, the article was given a C rating, which shows room for improvement and expansion. As I was waiting in suspense to see if the article would be removed entirely, a C was really quite exciting! Wikipedia articles are ongoing, collaborative projects rather than the completed essays that I am more used to in my studies. This has encouraged me to have a different and more productive mindset about my work more broadly.

Editing a Wikipedia article

Next, I began to look into Annemarie Heinrich (1912–2005). A German photographer who lived most of her life in Argentina, Heinrich was particularly famous for her celebrity portraits, such as those of Carmen Miranda, Pablo Neruda and Eva Perón. Her archive was added to the EAP collections in 2016, in a project with the Institute for Research in Art and Culture, Universidad Nacional de Tres de Febrero, Argentina. I expanded upon Heinrich’s short existing Wikipedia article.

On beginning my research, I discovered that her article on Spanish Wikipedia was much more extensive. This provided a useful starting point for biographical information and tracking down additional citations (thank you GCSE Spanish!). Heinrich’s lack of recognition on the English-speaking web made research difficult, but also highlighted the importance of adding more information about her onto English Wikipedia.

Black and white image of Annemarie Heinrich
A portrait of Annemarie Heinrich, date unknown. Public Domain.

Wrapping my head around Wikidata

I was also introduced to Wikidata on my placement, another of Wikimedia’s projects consisting of open linked data and a completely unknown field to me. On the placement, we were able to attend the IFLA Wikidata and Wikibase Working Group office hour. The thought-provoking whistle-stop tour of the platform that we were given in this meeting had me creating an account immediately after closing the Zoom call tab.

Image of the Wikidata logo
Wikidata logo, Public Domain.

As expected due to their Wikipedia articles, Félix Bonfils and Annemarie Heinrich had Wikidata item entries already, but so did Marie-Lydie, their son, Adrien, and Maison Bonfils. This is likely because of the generally less intensive notability criteria on Wikidata.

I did have a few challenges with Wikidata over my second week. One arose when I tried to add the EAP to the Bonfils’ items. Adrien Bonfils had an existing property for “has works in the collection”, with museums and galleries listed, so I added the EAP to this section. However, on looking at a similar artist’s item entry, I found that there is also a property for “archives at” that might better apply.

Image of a Wikidata entry about the Bonfils Collection
Wikidata entry for the Bonfils Collection

Seeing this, I not only realised that I might have used the wrong category, but also that there might be others that were more relevant that I just hadn’t seen yet! Being able to search for each qualifier allows for a flexible and tailored user experience but, for a newbie, the amount of choice can be a bit overwhelming! The upside is that Wikidata is quite forgiving, with changes easily made and explanatory symbols popping up when the system recognises a mistake (as can be seen in the image below).

Image of amended Wikidata entry for the Bonfils Collection
Amended Wikidata entry for the Bonfils Collection

To sum up, researching the lives and careers of these women photographers from the EAP collections has been fascinating. It has been so rewarding to help to increase their online discoverability, and that of the EAP.

Working remotely, this placement was bound to be unusual in some ways, but the BL team was really welcoming and encouraged us all to ask lots of questions (which I absolutely did!). I have learnt a lot about Wikimedia in these few weeks and I will definitely continue exploring and making edits in the future.

This is a guest post by UCL Archives and Records Management MA student and recent Wiki convert, Hope Lowther (@hopelowther)
