Digital scholarship blog


16 June 2022

Working With Wikidata and Wikimedia Commons: Poetry Pamphlets and Lotus Sutra Manuscripts

Greetings! I’m Xiaoyan Yang, from Beijing, China, an MSc student at University College London. It was a great pleasure to have the opportunity to do a four-week placement at the British Library and Wikimedia UK under the supervision of Lucy Hinnie, Wikimedian in Residence, and Stella Wisdom, Digital Curator, Contemporary British Collections. I mainly focused on the Michael Marks Awards for Poetry Pamphlets Project and Lotus Sutra Project, and the collaboration between the Library and Wikimedia.

What interested you in applying for a placement at the Library?

This kind of placement, in world-famous cultural institutions such as the Library and Wikimedia, is a brand-new experience for me. Because my undergraduate major was economic statistics, most of my past internships were in commercial and Internet technology companies. What drives my interest in digital humanities research, especially in the related data, knowledge graphs, and visualization, is the wish to better combine information technologies with cultural resources, in order to reach a wider audience and promote the transmission of cultural and historical memory in a more accessible way.

Libraries are institutions for the preservation and dissemination of knowledge for the public, and the British Library is without doubt one of the largest and best libraries in the world. It has long been a leader and innovator in resource protection and digitization. The International Dunhuang Project (IDP) initiated by the British Library is now one of the most representative transnational collaborative projects on digital humanities resources in the field. I applied for a placement opportunity hoping to learn more about the usage of digital resources in real projects and the process of collaboration, from the initial design through to the subsequent arrangements. I also wanted to have the chance to get involved in the practice of linked data, to accumulate experience, and to find the direction of future improvements.

I would like to thank Dr Adi Keinan-Schoonbaert for her kind introduction to the British Library's Asian and African Digitization projects, especially the IDP, which has enabled me to learn more about the librarian-led practices in this area. At the same time, I was very happy to sit in on the weekly meetings of the Digital Scholarship Team during this placement, which allowed me to observe how collaboration between different departments is carried out and managed in a large cultural resource organization like the British Library.

Excerpt from Lotus Sutra Or.8210 S.155. An old scroll of parchment showing vertical lines of older Chinese script.
Excerpt from Lotus Sutra Or.8210 S.155. Kumārajīva, CC BY 4.0, via Wikimedia Commons

What is the most surprising thing you have learned?

In short, it is so easy to contribute knowledge at Wikimedia. In this placement, one of my very first tasks was to upload information about winning and shortlisted poems of the Michael Marks Awards for Poetry Pamphlets for each year from 2009 to the latest, 2021, to Wikidata. The first step was to check whether each poem and its author and publisher already existed in Wikidata. If not, I created an item page for it. Before I started, I thought the process would be very complicated, but after I started following the manual, I found it was actually really easy. I just needed to click "Create a new Item".

I will always remember that the first item for a person that I created was for Sarah Jackson, one of the poets shortlisted for this award in 2009. The unique QID was automatically generated as Q111940266. With such a simple operation, anyone can contribute to the vast knowledge world of Wiki. Many people whom I have never met may read this item page in the future, a page created and perfected by me at this moment. This feeling is magical and gives me a real sense of achievement. Also, there are many useful guides, examples and batch loading tools such as QuickStatements that help users to start editing with joy. Useful guides include the Wikidata help pages for QuickStatements and material from the University of Edinburgh.
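As a rough illustration of what a batch upload can look like, the sketch below builds QuickStatements commands (in the tab-separated "V1" text format) that would create one new item for a person. It is not the batch used in the project: the function name is hypothetical, the description text "poet" is a placeholder, and the only statement added is "instance of" (P31) "human" (Q5).

```python
def quickstatements_for_person(label, description, lang="en"):
    """Return QuickStatements V1 commands that create one new person item.

    Hypothetical helper for illustration only; a real batch would add more
    statements and should be checked in the QuickStatements preview first.
    """
    def quote(text):
        # V1 wraps string values in double quotes. Swapping inner double
        # quotes for single quotes is a simplifying assumption of this sketch.
        return '"' + text.replace('"', "'") + '"'

    rows = [
        ["CREATE"],                              # start a new item
        ["LAST", "L" + lang, quote(label)],      # label in the given language
        ["LAST", "D" + lang, quote(description)],  # short description
        ["LAST", "P31", "Q5"],                   # instance of: human
    ]
    # One command per line, fields separated by tabs
    return "\n".join("\t".join(row) for row in rows)


# "poet" is a placeholder description, not taken from the actual item
commands = quickstatements_for_person("Sarah Jackson", "poet")
print(commands)
```

The printed text can be pasted into the QuickStatements tool, where `LAST` refers to the item created by the preceding `CREATE` line. Checking first that the person does not already exist remains a manual step in this sketch.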

Image of a Wikimedia SPARQL query to determine a list of information about the Michael Marks Poetry Pamphlet uploads.
An example of one of Xiaoyan’s queries - you can try it here!
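The query in the image is not reproduced here. As a minimal sketch of the same idea, the code below builds a simple SPARQL query that lists the direct statements on one Wikidata item (using the QID mentioned above) and turns it into a URL for the Wikidata Query Service. The function names are hypothetical, and the query is deliberately simpler than a real project query would be.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def item_statements_query(qid, limit=50):
    """Build a SPARQL query listing direct statements about one item."""
    # Basic sanity check so that only a plain QID ends up in the query text
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected a QID such as Q111940266")
    return (
        "SELECT ?property ?value WHERE {\n"
        "  wd:" + qid + " ?property ?value .\n"
        "}\n"
        "LIMIT " + str(int(limit))
    )


def query_url(sparql):
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


query = item_statements_query("Q111940266")
url = query_url(query)
print(query)
print(url)
```

The query text can also be pasted directly into the query service's web editor, which is where the timeline and map views mentioned later in this post are available.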

How do you hope to use your skills going forward?

My current dissertation research focuses on the regional classical Chinese poetry of the Hexi Corridor. This particular geographical area is deeply bound up with the Silk Road in history and has inspired and attracted many poets to visit and write. My project aims to build a proper ontology and knowledge map, and then to combine these with GIS visualization and text analysis to explore the historical, geographic, political and cultural changes in this area from the perspective of time and space. Wikidata provides a standard way to undertake this work.

Thanks to Dr Martin Poulter’s wonderful training and Stuart Prior’s kind instructions, I quickly picked up some practical skills in constructing Wiki queries. The layout design of the timeline and geographical visualization tools offered by Wiki query inspired me to improve my skills in this field further in the future. What’s more, although I haven’t had a chance to experience Wikibase yet, I am now very interested in it thanks to Dr Lucy Hinnie and Dr Graham Jevon’s introduction, and I will definitely try it in future.

Would you like to share some Wiki advice with us?

Wiki is very self-learning friendly: on the Help page various manuals and examples are presented, all of which are very good learning resources. I will keep learning and exploring in the future.

I do want to share my feelings and a little experience with Wikidata. In the Michael Marks Awards for Poetry Pamphlets Project, all the properties used to describe poets, poems and publishers can be easily found in the existing Wikidata property list. However, in the second project, the Lotus Sutra Project, I encountered more difficulties. For example, it is difficult to find suitable items and properties on Wikidata to represent paragraphs of the scrolls’ text content and their binding design, and at present this information is better represented on Wikimedia Commons.

However, as I studied more Wikidata examples, I came to understand more about Wikidata and the purpose of these restrictions. Maintaining concise structured data and accurate correlation is one of the main purposes of Wikidata. It encourages reuse of existing properties and places more restrictions on long text descriptions. Therefore, this feature of Wikidata needs to be taken into account from the outset when designing metadata frameworks for data uploading.

In the end, I would like to sincerely thank my direct supervisor Lucy for her kind guidance, help, encouragement and affirmation, as well as the British Library and Wikimedia platform. I have received so much warm help and gained so much valuable practical experience, and I am also very happy and honored that by using my knowledge and technology I can make a small contribution to linked data. I will always cherish the wonderful memories here and continue to explore the potential of digital humanities in the future.

This post is by Xiaoyan Yang, an MSc student at University College London, and was edited by Wikimedian in Residence Dr Lucy Hinnie (@BL_Wikimedian) and Digital Curator Stella Wisdom (@miss_wisdom).

14 March 2022

The Lotus Sutra Manuscripts Digitisation Project: the collaborative work between the Heritage Made Digital team and the International Dunhuang Project team

Digitisation has become one of the key tasks for the curatorial roles within the British Library. This is supported by two main pillars: the accessibility of the collection items to everybody around the world and the preservation of unique and sometimes very fragile items. Digitisation involves many different teams and workflow stages including retrieval, conservation, curatorial management, copyright assessment, imaging, workflow management, quality control, and the final publication to online platforms.

The Heritage Made Digital (HMD) team works across the Library to assist with digitisation projects. An excellent example of the collaborative nature of the relationship between the HMD and International Dunhuang Project (IDP) teams is the quality control (QC) of the Lotus Sutra Project’s digital files. It is crucial that images meet the quality standards of the digital process. As a Digitisation Officer in HMD, I am in charge of QC for the Lotus Sutra Manuscripts Digitisation Project, which is currently conserving and digitising nearly 800 Chinese Lotus Sutra manuscripts to make them freely available on the IDP website. The manuscripts were acquired by Sir Aurel Stein after they were discovered in a hidden cave in Dunhuang, China, in 1900. They are thought to have been sealed there at the beginning of the 11th century. They are now part of the Stein Collection at the British Library and, together with the international partners of the IDP, we are working to make them available digitally.

The majority of the Lotus Sutra manuscripts are scrolls and, after they have been treated by our dedicated Digitisation Conservators, our expert Senior Imaging Technician Isabelle does an outstanding job of imaging the fragile manuscripts. My job is then to prepare the images for publication online. This includes checking that they have the correct technical metadata such as image resolution and colour profile, that they are an accurate visual representation of the physical object, and that the text can be clearly read and interpreted by researchers. After nearly 1000 years in a cave, it would be a shame to make the manuscripts accessible to the public for the first time only for them to be obscured by a blurry image or a wayward piece of fluff!

With the scrolls measuring up to 13 metres long, most are too long to be imaged in one go. They are instead shot in individual panels, which our Senior Imaging Technicians digitally “stitch” together to form one big image. This gives online viewers a sense of the physical scroll as a whole, in a way that would not be possible in real life for those scrolls that are more than two panels in length unless you have a really big table and a lot of specially trained people to help you roll it out. 

Photo showing the three individual panels of Or.8210S/1530R with breaks in between
Or.8210/S.1530: individual panels
Photo showing the three panels of Or.8210S/1530R as one continuous image
Or.8210/S.1530: stitched image

 

This post-processing can create issues, however. Sometimes an error in the stitching process can cause a scroll to appear warped or wonky. In the stitched image for Or.8210/S.6711, the ruled lines across the top of the scroll appeared wavy and misaligned. But when I compared this with the images of the individual panels, I could see that the lines on the scroll itself were straight and unbroken. It is important that the digital images faithfully represent the physical object as far as possible; we don’t want anyone thinking these flaws are in the physical item and writing a research paper about ‘Wonky lines on Buddhist Lotus Sutra scrolls in the British Library’. Therefore, I asked the Senior Imaging Technician to restitch the images together: no more wonky lines. However, we accept that the stitched images cannot be completely accurate digital surrogates, as they are created by the Imaging Technician to represent the item as it would be seen if it were to be unrolled fully.

 

Or.8210/S.6711: distortion from stitching. The ruled line across the top of the scroll is bowed and misaligned

Similarly, our Senior Imaging Technician applies ‘digital black’ to make the image background a uniform colour. This is to hide any dust or uneven background and ensure the object is clear. If this is accidentally overused, it can make it appear that a chunk has been cut out of the scroll. Luckily this is easy to spot and correct, since we retain the unedited TIFFs and RAW files to work from.

 

Or.8210/S.3661, panel 8: overuse of digital black when filling in tear in scroll. It appears to have a large black line down the centre of the image.

 

Sometimes the scrolls are wonky, or dirty or incomplete. They are hundreds of years old, and this is where it can become tricky to work out whether there is an issue with the images or the scroll itself. The stains, tears and dirt shown in the images below are part of the scrolls and their material history. They give clues to how the manuscripts were made, stored, and used. This is all of interest to researchers and we want to make sure to preserve and display these features in the digital versions. The best part of my job is finding interesting things like this. The fourth image below shows a fossilised insect covering the text of the scroll!

 

Black stains: Or.8210/S.2814, panel 9
Torn and fragmentary panel: Or.8210/S.1669, panel 1
Insect droppings obscuring the text: Or.8210/S.2043, panel 1
Fossilised insect covering text: Or.8210/S.6457, panel 5

We want to minimise the handling of the scrolls as much as possible, so we will only reshoot an image if it is absolutely necessary. For example, I would ask a Senior Imaging Technician to reshoot an image if debris is covering the text and makes it unreadable - but only after inspecting the scroll to ensure it can be safely removed and is not stuck to the surface. However, if some debris such as a small piece of fluff, paper or hair, appears on the scroll’s surface but is not obscuring any text, then I would not ask for a reshoot. If it does not affect the readability of the text, or any potential future OCR (Optical Character Recognition) or handwriting analysis, it is not worth the risk of damage that could be caused by extra handling. 

Reshoot: Or.8210/S.6501: debris over text  /  No reshoot: Or.8210/S.4599: debris not covering text.
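The reshoot rule described above can be sketched as a small decision function. This is only an illustration of the reasoning, not a tool used by the team: the class and field names are hypothetical, and treating debris that is stuck to the surface as "no reshoot" is an assumption, since the post does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class DebrisFinding:
    """What a QC check and a physical inspection found (hypothetical fields)."""
    obscures_text: bool     # debris covers text and makes it unreadable
    safely_removable: bool  # inspection shows it is loose, not stuck to the surface


def should_reshoot(finding):
    """Return True only when a reshoot is worth the extra handling."""
    if not finding.obscures_text:
        # Fluff, paper or hair away from the text: leave the scroll alone
        return False
    # Text is obscured: reshoot only if the debris can be safely removed.
    # Assumption: debris stuck to the surface is left in place.
    return finding.safely_removable


print(should_reshoot(DebrisFinding(obscures_text=True, safely_removable=True)))
print(should_reshoot(DebrisFinding(obscures_text=False, safely_removable=True)))
```

Writing the rule this way makes the priority explicit: handling risk is only accepted when readability is actually at stake.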

 

These are a few examples of the things to which the HMD Digitisation Officers pay close attention during QC. Only through this careful process can we ensure that the digital images accurately reflect the physicality of the scrolls and represent their original features. By developing a QC process that applies the best techniques and procedures, working to defined standards and guidelines, we succeed in making these incredible items accessible to the world.

Read more about the Lotus Sutra Project here: IDP Blog

IDP website: IDP.BL.UK

And IDP twitter: @IDP_UK

Dr Francisco Perez-Garcia

Digitisation Officer, Heritage Made Digital: Asian and African Collections

Follow us @BL_MadeDigital

29 September 2021

Sailing Away To A Distant Land - Mahendra Mahey, Manager of BL Labs - final post

Posted by Mahendra Mahey, former Manager of British Library Labs or "BL Labs" for short

[estimated reading time of around 15 minutes]

This is my last day working as manager of BL Labs, and also my final posting on the Digital Scholarship blog. I thought I would take this chance to reflect on my journey of almost 9 years in helping to set up and maintain BL Labs, and in enabling it to become a permanent fixture at the British Library (BL).

BL Labs was the first digital Lab in a national library, anywhere in the world, to get people to experiment with its cultural heritage digital collections and data. There are now several Gallery, Library, Archive and Museum Labs, or 'GLAM Labs' for short, around the world, with an active community which I helped build from 2018.

I am really proud I was there from the beginning to implement the original proposal which was written by several colleagues, but especially Adam Farquhar, former head of Digital Scholarship at the British Library (BL). The project was at first generously funded by the Andrew W. Mellon Foundation through four rounds of funding as well as support from the BL. In April 2021, the project became a permanently funded fixture, helped very much by my new manager Maja Maricevic, Head of Higher Education and Science.

The great news is that BL Labs is going to stay after I have left. The position of leading the Lab will soon be advertised. Hopefully, someone will get a chance to work with my helpful and supportive colleague Dr Filipe Bento, Technical Lead of Labs; the bright, talented and very hard-working Maja; and other great colleagues in Digital Research and more widely at the BL.

The beginnings, the BL and me!

I met Adam Farquhar and Aly Conteh (Former Head of Digital Research at the BL) in December 2012. They must have liked something about me because I started working on the project in January 2013, though I officially started in March 2013 to launch BL Labs.

I must admit, I had always felt a bit intimidated by the BL. My first visit, as a Psychology student, was in the early 1980s, before the St Pancras site was opened (in 1997). I remember coming up from Wolverhampton on the train to get a research paper about "Serotonin Pathways in Rats when sleeping" by Lidov, feeling nervous and excited at the same time. It felt like a place for 'really intelligent educated people' and for those who were among the intellectual elites in society. For me, it also felt a bit like it represented the British empire and its troubled history of colonialism, especially some of the collections, which made me feel uncomfortable as to why they were there in the first place.

I remember thinking that the BL probably wasn't a place for someone like me, a child of Indian Punjabi immigrants from humble beginnings who came to England in the 1960s. Actually, I felt like an imposter and not worthy of being there.

Nearly 9 years later, I can say I learned to respect and even cherish what was inside it, especially the incredible collections, though I also became more confident about expressing stronger views about the decolonisation of some of these. I became very fond of some of the people who work in or use it; there are some really good kind-hearted souls at the BL. However, I never completely lost that feeling of being an imposter and an outsider.

What I remember at that time, going for my interview, was thinking about what would happen if I got the position and asking myself, 'What would be the one thing I would try and change?'. It came easily to me, namely that I would try and get more new people through the doors, literally or virtually, by connecting them to the BL's collections (especially the digital). New people like me, who may never have set foot in, or been motivated to step into, the building before. This has been one of the most important reasons for me to get up in the morning and go to work at BL Labs.

So what have been my highlights? Let's have a very quick pass through!

BL Labs Launch and Advisory Board

I launched BL Labs in March 2013, one week after I had started. It was at the launch event organised by my wonderfully supportive and innovative colleague, Digital Curator Stella Wisdom. I distinctly remember in the afternoon session (which I did alone), I had to present my 'ideas' of how I might launch the first BL Labs competition where we would be trying to get pioneering researchers to work with the BL's digital collections.

God, it was a tough crowd! They asked pretty difficult questions, questions I was asking myself too and to which I still didn't know the answers either.

I remember Professors Tim Hitchcock (now at Sussex University, who eventually joined and still sits on the BL Labs Advisory Board) and Laurel Brake (now Professor Emerita of Literature and Print Culture, Birkbeck, University of London) being in the audience together with staff from the Royal Library of the Netherlands, who 6 months later launched their own brilliant KB Lab. Subsequently, I became good colleagues with Lotte Wilms, who led their Lab for many years and is now Head of Research Support at Tilburg University.

My first gut feeling overall after the event was, this is going to be hard work. This feeling and reality remained a constant throughout my time at BL Labs.

In early May 2013, we launched the competition, which was a really quick and stressful turnaround as I had only officially started in mid March (one and a half months earlier). I remember worrying as to whether anyone would even enter! All the final entries were pretty much submitted a few minutes before the deadline. I remember being alone that evening on deadline day, near to midnight, waiting by my laptop and thinking: what happens if no one enters? It's going to be a disaster and I will lose my job. Luckily that didn't happen; in the end, we received 26 entries.

I am a firm believer that we can help make our own luck, but sometimes luck can be quite random! Perhaps BL Labs had a bit of both!

After that, I never really looked back! BL Labs developed its own kind of pattern and momentum each year:

  • hunting around the BL for digital collections to make into datasets and make available
  • helping to make more digital collections openly licensed
  • having hundreds of conversations with people interested in connecting with the BL's digital collections in the BL and outside
  • working with some people more intensively to carry out experiments
  • developing ideas further into prototype projects
  • telling the world of successes and failures in person, meetings, events and social media
  • launching a competition and awards in April or May
  • roadshows before and after with invitations to speak at events around the world
  • the summer working with competition winners
  • late October/November the international symposium showcased things from the year
  • working on special projects
  • repeat!

The winners were announced in July 2013, and then we worked with them on their entries showcasing them at our annual BL Labs Symposium in November, around 4 months later.

'Nothing interesting happens in the office' - Roadshows, Presentations, Workshops and Symposia!

One of the highlights of BL Labs was to go out to universities and other places to explain what the BL is and what BL Labs does.  This ended up with me pretty much seeing the world (North America, Europe, Asia, Australia, and giving virtual talks in South America and Africa).

My greatest challenge in BL Labs was always to get people to truly and passionately 'connect' with the BL's digital collections and data in order to come up with cool ideas of what to actually do with them. What I learned from my very first trip was that telling people what you have is great; they definitely need to know what you have! However, once you do that, the hard work really begins, as you often need to guide and inspire many of them, and help and support them to use the collections creatively and meaningfully. It was also important to understand the back story of the digital collection and learn about the institutional culture of the BL if people also wanted to work with BL colleagues. For me and the researchers involved, inspirational engagement with digital collections required a lot of intellectual effort and emotional intelligence. Often this meant asking the uncomfortable questions about research such as 'Why are we doing this?', 'What is the benefit to society in doing this?', 'Who cares?', 'How can computation help?' and 'Why is it necessary to even use computation?'.

Making those connections between people and data does feel like magic when it really works. It's incredibly exciting: suddenly everyone has goose bumps and is energised. This feeling I will take away with me; it's the essence of my work at BL Labs!

A full list of over 200 presentations, roadshows, events and 9 annual symposia can be found here.

Competitions, Awards and Projects

Another significant way BL Labs has tried to connect people with data has been through Competitions (tell us what you would like to do, and we will choose an idea and work collaboratively with you on it to make it a reality), Awards (show us what you have already done) and Projects (collaborative working).

At the last count, we have supported and / or highlighted over 450 projects in research, artistic, entrepreneurial, educational, community based, activist and public categories, mostly through competitions, awards and project collaborations.

We also set up awards for British Library Staff, which have been a wonderful way to highlight the fantastic work our staff do with digital collections and give them the recognition they deserve. I have noticed over the years that the number of staff who have been working on digital projects has increased significantly. Sometimes this was with the help of BL Labs but often because of the significant Digital Scholarship Training Programme, run by my Digital Curator colleagues in Digital Research to help staff understand that the BL isn't just about physical things but digital items too.

Browse through our project archive to get inspiration from the various projects BL Labs has been involved in or highlighted.

Putting the digital collections 'where the light is' - British Library platforms and others

When I started at BL Labs it was clear that we needed to make a fundamental decision about how we saw digital collections. Quite early on, we decided we should treat collections as data to harness the power of computational tools to work with each collection, especially for research purposes. Each collection should have a unique Digital Object Identifier (DOI) so researchers can cite them in publications. Any new datasets generated from them will also have DOIs, allowing us to trace, through DOIs, what happens to data once it is out there for people to use.

In 2014, https://data.bl.uk was born and today, all our 153 datasets (as of 29/09/2021) are available through the British Library's research repository.

However, BL Labs has not stopped there! We always believed that it's important to put our digital collections where others are likely to discover them (we can't assume that researchers will want to come to BL platforms), 'where the light is' so to speak.  We were very open and able to put them on other platforms such as Flickr and Wikimedia Commons, not forgetting that we still needed to do the hard work to connect data to people after they have discovered them, if they needed that support.

Our greatest success by far was placing 1 million largely undescribed images that were digitally snipped from 65,000 digitised public domain books from the 19th Century on Flickr Commons in 2013. The number of images on the platform has grown since then by another 50 to 60 thousand from collections elsewhere in the BL. There has been significant interaction from the public to generate crowdsourced tags to help make it easier to find specific images. The number of views has reached a staggering figure of over 2 billion over this time. There has also been an incredible array of projects which have used the images, from artistic use to using machine learning and artificial intelligence to identify them. It's my favourite collection, probably because there are no restrictions on using it.

Read the most popular blog post the BL has ever published by my former BL Labs colleague, the brilliant and inspirational Ben O'Steen, 'A million first steps', and the 'Mechanical Curator', which describes how we told the world why and how we had put 1 million images online for anyone to use freely.

It is wonderful to know that George Oates, the founder of Flickr Commons and still a BL Labs Advisory Board member, has been involved in the creation of the Flickr Foundation which was announced a few days ago! Long live Flickr Commons! We loved it because it also offered a computational way to access the collections, critical for powerful and efficient computational experiments, through its Application Programming Interface (API).

More recently, we have experimented with browser based programming / computational environments - Jupyter Notebooks. We are huge fans of Tim Sherratt, who was a pioneer and brilliant advocate of OPEN GLAM in using them, especially through his GLAM Workbench. He is a one-person Lab in his own right, and it was an honour to recognise his monumental efforts by giving him the BL Labs Research Award 2020 last year. You can also explore the fantastic work of Gustavo Candela and colleagues on Jupyter Notebooks and the ones my colleague Filipe Bento created.

Art Exhibitions, Creativity and Education

I am extremely proud to have been involved in enabling two major art exhibitions to happen at the BL, namely:

Crossroads of Curiosity by David Normal

Imaginary Cities by Michael Takeo Magruder

I loved working with artists, it's my passion! They are so creative and often not restricted by academic thinking; see the work of Mario Klingemann, for example! You can browse through our archives for various artistic projects that used the BL's digital collections. It's inspiring.

I was also involved in the first British Library Fashion Student Competition, held at the BL and won by Alanna Hilton, which used the BL's Flickr Commons collection as inspiration for the students to design new fashion ranges. It was organised by my colleague Maja Maricevic, the British Fashion Colleges Council and Teatum Jones, who were great fun to work with. I am really pleased to say that Maja has gone from strength to strength working with the fashion industry and continues to run the competition to this day.

We also had some interesting projects working with younger people, such as Vittoria's world of stories and the fantastic work of Terhi Nurmikko-Fuller at the Australian National University. This is something I am very much interested in exploring further in the future, especially around ideas of computational thinking, and I have been trying out a few things.

GLAM Labs community and Booksprint

I am really proud of helping to create the international GLAM Labs community with over 250 members, established in 2018 and still active today. I affectionately call them the GLAM Labbers, and I often ask people to explore their inner 'Labber' when I give presentations. What is a Labber? It's the experimental and playful part of us that we all had as children and that unfortunately many have lost on becoming adults. It's the ability to be fearless, having the audacity and perhaps even naivety to try crazy things even if they are likely to fail! Unfortunately society values success more than it does failure. In my opinion, we need to recognise, respect and revere those who have the courage to try but fail. That courage to experiment should be honoured and embraced and should become the bedrock of our educational systems from the very outset.

Two years ago, many of us Labbers 'ate our own dog food' or 'practised what we preached' when 15 other colleagues and I came together for 5 days to produce a book through a booksprint, probably the most rewarding professional experience of my life. The book is about how to set up, maintain, sustain and even close a GLAM Lab and is called 'Open a GLAM Lab'. It is available as public domain content and I encourage you to read it.

Online drop-in goodbye - today!

I organised a 30 minute ‘online farewell drop-in’ on Wednesday 29 September 2021, 1330 BST (London), 1430 (Paris, Amsterdam), 2200 (Adelaide), 0830 (New York) on my very last day at the British Library. It was heart-warming that the session was 'maxed out' at one point with participants from all over the world. I honestly didn't expect over 100 colleagues to show up. I guess when you leave an organisation you get to find out who you actually made an impact on, who shows up, and who tells you, otherwise you may never know.

Those that know me well know that I would have much rather had a farewell do ‘in person’, over a pint and praying for the ‘chip god’ to deliver a huge portion of chips with salt/vinegar and tomato sauce magically and mysteriously to the table. The pub would have been Mc'Glynns (http://www.mcglynnsfreehouse.com/) near the British Library in London. I wonder who the chip god was? I never found out ;)

The answer to who the chip god was is in white-on-white text following this sentence... you will be very shocked to know who it was!

Spoiler alert: it was me after all, my alter ego.

Mahendra's online farewell to BL Labs, Wednesday 29 September, 1330 BST, 2021.
Left: Flowers and wine from the GLAM Labbers arrived in Tallinn, 20 mins before the meeting!
Right: Some of the participants of the online farewell

Leave a message of good will to see me off on my voyage!

It would be wonderful if you would like to leave me your good wishes, comments, memories, thoughts, scans of handwritten messages, pictures, photographs etc. on the following Google doc:

http://tiny.cc/mahendramahey

I will leave it open for a week or so after I have left. Reading positive, sincere, heartfelt messages from colleagues and collaborators over the years has already lifted my spirits. For me it provides evidence that you perhaps did actually make a difference to someone's life. I will definitely be re-reading them during the cold dark Baltic nights in Tallinn.

I would love to hear from you and find out what you are doing, or if you prefer, you can email me, the details are at the end of this post.

BL Labs Sailor and Captain Signing Off!

It's been a blast and lots of fun! Of course there is a tinge of sadness in leaving! For me, it's also been intellectually and emotionally challenging as well as exhausting, with many ‘highs’ and a few ‘lows’ or choppy waters, some professional and others personal.

I have learned so much about myself and there are so many things I am really really proud of. There are other things of course I wish I had done better. Most of all, I learned to embrace failure, my best teacher!

I think I did meet my original wish of wanting to help to open up the BL to as many new people as possible, people who perhaps would never have engaged with the Library before. That was either by using digital collections and data for cool projects and/or simply walking through the doors of the BL in London or Boston Spa and having a look around and being inspired to do something because of it.

I wish the person who takes over my position lots of success! My only piece of advice is if you care, you will be fine!

Anyhow, what a time this has been for us all on this planet! I have definitely struggled at times. I, like many others, have lost loved ones and thought deeply about life and its true meaning. I have also managed to find the courage to know what’s important and act accordingly, even if that has been a bit terrifying and difficult at times. Leaving the BL for example was not an easy decision for me, and I wish perhaps things had turned out differently, but I know I am doing the right thing for me, my future and my loved ones.

Though there have been a few dark times for me both professionally and personally, I hope you will be happy to know that I have also found peace and happiness too. I am in a really good place.

I would like to thank BL Labs alumni Ben O'Steen (Technical Lead for BL Labs from 2013 to 2018), Hana Lewis (2016-2018) and Eleanor Cooper (2018-2019), both BL Labs Project Officers, and the many other people I worked with through BL Labs, in the wider Library and outside it on my journey.

Where am I off to and what am I doing?

My professional plans are 'evolving', but one thing is certain, I will be moving country!

To Estonia to be precise!

I plan to live, settle down with my family and work there. I was never a fan of Brexit, and this way I get to stay a European.

I would like to finish with this final sweet video created by writer and filmmaker Ling Low and her team in 2016, entitled 'Hey there Young Sailor', which they all made as volunteers for the Malaysian band, the 'Impatient Sisters'. It won the BL Labs Artistic Award in 2016. I had the pleasure and honour of meeting Ling over a lovely lunch in Kuala Lumpur, Malaysia, where I had also given a talk at the National Library about my work and looked for remnants of my grandfather who had settled there many years ago.

I wish all of you well, and if you are interested in keeping in touch with me, working with me or just saying hello, you can contact me via my personal email address: mr.mahendra.mahey@gmail.com or follow my progress on my personal website.

Happy journeys through this short life to all of you!

Mahendra Mahey, former BL Labs Manager / Captain / Sailor signing off!

03 August 2021

Automating the Recognition of Chinese Manuscripts: New Chevening British Library Fellowship

 

The Chevening Fellowship Programme is the UK government’s international awards scheme aimed at fostering knowledge exchange and collaboration, and developing global leaders. Since 2015, the Foreign, Commonwealth & Development Office (FCDO) has partnered with the British Library to offer professionals two new fellowships every year, and the two organisations have recently announced the renewal of their partnership until 2024/25.

Chevening logo and the British Library logo

These fellowships are unique opportunities for one-year placements at the Library, working with exceptional collections under the Library’s custodianship. The Library has hosted international fellows through this scheme since 2016, with each fellowship framing a distinct project inspired by Library collections. Past and present Chevening Fellows at the Library have focused on geographically diverse collections, from Latin America through Africa to South Asia, with different themes such as archival material from Latin America and the Caribbean, African-language printed books, Nationalism, Independence, and Partition in South Asia and Big Data and Libraries.

We are thrilled to (re-)announce that one of the two placements available for the 2022/2023 academic year will focus on automating the recognition of historical Chinese handwritten texts. This fellowship, originally announced two years ago, had to be postponed due to the pandemic – and we are excited to be able to offer it again. This is a special opportunity to work in the Library’s Digital Research Team, and engage with unique historical collections digitised as part of the International Dunhuang Project and the Lotus Sutra Manuscripts Digitisation Project. Focusing on material from Dunhuang (China), part of the Stein collection, this fellowship will engage with new digital tools and techniques in order to explore possible solutions to automate the transcription of these handwritten texts.

End piece of a Chinese Lotus Sutra Scroll (shelfmark: Or.8210/S.1606). Digitised as part of the Lotus Sutra Manuscripts Digitisation Project.

 

The context for this fellowship is the Library’s efforts towards making its collection items available in machine-readable format, to enable full-text search and analysis. The Library has been digitising its collections at scale for over two decades, with digitisation opening up access to rich and diverse collections. However, it is important for us to further support discovery and digital research by unlocking the huge potential in automatically transcribing our collections. Until recently, Western-language print collections have been the main focus, especially newspaper collections. A flagship collaboration with the Alan Turing Institute, the Living with Machines project, has been applying Optical Character Recognition (OCR) technology to UK newspapers, designing and implementing new methods in data science and artificial intelligence, and analysing these materials at scale.

Taking a broader perspective on Library collections, we have been exploring opportunities with non-Western collections too. Library staff have been engaging closely with the exploration of OCR and Handwritten Text Recognition (HTR) systems for English, Bangla and Arabic. Digital Curators Tom Derrick, Nora McGregor and Adi Keinan-Schoonbaert have teamed up with PRImA Research Lab and the Alan Turing Institute to run four competitions in 2017-2019, inviting providers of text recognition methods to try them out on our historical material. We have been working with Transkribus as well – for example, Alex Hailey, Curator for Modern Archives and Manuscripts, used the software to automatically transcribe 19th century botanical records from the India Office Records. Ongoing work led by Tom Derrick aims to OCR our collection of Bengali printed texts, digitised as part of the Two Centuries of Indian Print project.

 

Regions, text lines and illustrations demarcated as ground truth, as shown in Transkribus (Shelfmark: Or 3366). Digitised and available on Qatar Digital Library.
 
 
Another screenshot from Transkribus, showing automatically transcribed Bengali printed text (Shelfmark: VT 1914 d). Digitised as part of the Two Centuries of Indian Print project.

 

The Chevening Fellow will contribute to our efforts to identify OCR/HTR systems that can tackle digitised historical collections. They will explore the current landscape of Chinese handwritten text recognition, look into methods, challenges, tools and software, use them to test our material, and demonstrate digital research opportunities arising from the availability of these texts in machine-readable format.

This fellowship programme will start in September 2022 for a 12-month period of project-based activity at the British Library. The successful candidate will receive support and supervision from Library staff, and will benefit from professional development opportunities, networking and stakeholder engagement, gaining access to a range of organisational training and development opportunities (such as the Digital Scholarship Training Programme), as well as staff-level access to unique British Library collections and research resources.

For more information and to apply, please visit the Chevening British Library Fellowship page: https://www.chevening.org/fellowship/british-library/, and the “Automating the recognition of historical Chinese handwritten texts” fellowship page: https://www.chevening.org/fellowship/british-library-historical-chinese-texts/.

Applications open on 3 August, 12:00 (midday) BST and close on 2 November, 12:00 (midday) GMT.

Good Luck!

This post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She is on Twitter as @BL_AdiKS.

 

24 May 2021

Two Million Images Inspire Creativity, Innovation, and Collaboration

BL/QFP Project celebrates two million images on the Qatar Digital Library and the creative ways we have used them.

This week we are celebrating a milestone achievement of two million images digitised and uploaded to the Qatar Digital Library (QDL). In addition to this bilingual, digital archive, the British Library Qatar Foundation Partnership Project (BL/QFP Project) has also inspired creative and innovative pursuits. The material on the QDL is available to use and reuse, which allows for a wide variety of responses. Over the last few years, our Project’s diverse team has explored and demonstrated a multitude of ways to engage with these digital materials, including events, artwork, coding, and analysis.

The BL/QFP Project’s staff are skilled, experienced, and dedicated. They include cataloguers, historians, archivists, imaging specialists, conservators, translators, editors, and administrative support. This means that in one team (ordinarily housed in one office) we have a diverse pool of people, which has inspired some amazing interactions and ideas. Our skills range from photography, graphic design, and technology, to linguistics, history, and data analysis. By sharing and combining these talents, we have been able to engage with the digital material and resources in remarkable ways. We have all enjoyed learning about new areas, sharing skills and knowledge, engaging with fascinating materials, finding new ways of doing things, and collaborating with a range of people, such as the BL BAME Network and other partners.

Some of the work produced outside of our core deliverables is displayed below.

 

Hack Days

Hack Days are an opportunity to use innovative techniques to explore and respond to BL collections. The first BL/QFP Imaging Hack Day was held in October 2018, and led to an array of varied responses from our Imaging Team who used their skills to "hack" the QDL. Subsequent Hack Days have incorporated diverse topics, formats, collections, and participants. They are also award winning: the concept led by the Imaging Team won the British Library Labs Staff Award in 2019.

Poster for first Hack Day, created using images from manuscripts on the QDL, showing an orange tree with heads instead of fruit, saying 'Put Our Heads Together'
Figure 1: Poster for Hack Day created using images from manuscripts on the QDL

 

Figure 2: Astrolabe created by Darran Murray (Digitisation Studio Manager) using Or 2411

 

Example of images created to respond to the weaponry on the walls by Hannah Nagle (Senior Imaging Support Technician), showing flowers blooming from the muzzles of shotguns
Figure 3: Example of images created to respond to the weaponry on the walls by Hannah Nagle (Senior Imaging Support Technician)

 

Social media banner created by Rebecca Harris (Senior Imaging Technician) for International Women’s Day, showing seven different women from the collection
Figure 4: Social media banner created by Rebecca Harris (Senior Imaging Technician) for International Women’s Day

 

Imaging contrast showing insect damage to manuscript, ‘Four treatises on Astronomy’ (Or 8415), with one image of the manuscript page and the other showing just the pinpricks on a black background, created by Renata Kaminska (Digitisation Studio Manager)
Figure 5: Imaging contrast showing insect damage to manuscript, ‘Four treatises on Astronomy’ (Or 8415), created by Renata Kaminska (Digitisation Studio Manager)

 

Figure 6: Behind the scenes visualisations including conservation treatment, created by Sotirios Alpanis (former Head of Digital Operations) and Jordi Clopes-Masjuan (Senior Imaging Technician)

 

Visual narratives made by combining digital images of desert by Melanie Taylor (Senior Imaging Support Technician)
Figure 7: Visual narratives made by combining digital images by Melanie Taylor (Senior Imaging Support Technician)

 

Colourisation of portrait of the Sharif of Mecca, from 1781.b.6/7, using historically accurate colours like gold and dark blue by Daniel Loveday (Senior Imaging Technician)
Figure 8: Colourisation of the portrait of the Sharif of Mecca, from 1781.b.6/7, using historically accurate colours by Daniel Loveday (Senior Imaging Technician)

 

A photo collage showing a creature with one foot, two leafy legs, a maze for a body, and seven heads comprised of flowers, two animal heads and two human heads. By Morgane Lirette (Conservator (Books), Conservation), Tan Wang-Ward (Project Manager, Lotus Sutra Manuscripts Digitisation), Matthew Lee (Imaging Support Technician), Darran Murray (Digitisation Studio Manager), Noemi Ortega-Raventos (Content Specialist, Archivist)
Figure 9: Exquisite Corpse image created by collaging material from different images, including manuscripts from the QDL as well as BL Flickr and Instagram. By Morgane Lirette (Conservator (Books), Conservation), Tan Wang-Ward (Project Manager, Lotus Sutra Manuscripts Digitisation), Matthew Lee (Imaging Support Technician), Darran Murray (Digitisation Studio Manager), Noemi Ortega-Raventos (Content Specialist, Archivist). Exquisite Corpse: Head part 1 (QDL), Head part 2 (QDL), Head part 3 (QDL), Head part 4 (QDL) Head part 5 (QDL), torso (Flickr), legs (Flickr), feet (Instagram)

 

Cyanotype Workshops

Matt Lee (Senior Imaging Support Technician), Daniel Loveday (Senior Imaging Technician) and the Imaging Team

Members of the Imaging team have since gone on to develop cyanotype workshops. The photographic printing process of cyanotype uses chemicals and ultraviolet light to create a copy of an image. The team led experiments on the process at one of the Project’s Staff Away Days. After its success, the concept was developed further and workshops were delivered to students at the Camberwell College of Arts. Images from manuscripts on the QDL were used to create cyanotype collages like those displayed below.

Cyanotype created using collage of images of a bird wearing a crown, a man holding two arms, and two fish in a bowl from the QDL, by Matt Lee (Senior Imaging Support Technician)
Figure 10: Cyanotype created using collage of images from the QDL, by Matt Lee (Senior Imaging Support Technician)

 

Cyanotype created using collage of images including women, text, buildings and animals from the QDL, by Louis Allday (Gulf History Cataloguing Manager)
Figure 11: Cyanotype created using collage of images from the QDL, by Louis Allday (Gulf History Cataloguing Manager)

 

Watermarks Project

Jordi Clopes-Masjuan (Senior Imaging Technician), Camille Dekeyser (Conservator), Matt Lee (Senior Imaging Support Technician), Heather Murphy (Conservation Team Leader)

The Watermarks Project is an ongoing collaboration between the Conservation and Imaging Teams. Together they have sought to examine and display watermarks found in our collection items. Starting with the physical items, and figuring out how best to capture them, they have experimented with ways to display the watermarks digitally. The process requires many forms of expertise, but the results facilitate the study and appreciation of the designs.

Two women standing by a book with cameras and tools
Figure 12: Studio setup for capturing the watermarks

 

Animated image showing traditional and translucid view of a manuscript with a watermark highlighted by digital tracing.
Figure 13: GIF image showing traditional and translucid view with watermark highlighted by digital tracing.

 

Addressing Problematic Terms in our Catalogues and Translations Project

Serim Abboushi (Arabic & English Web Content Editor), Mariam Aboelezz (Translation Support Officer), Louis Allday (Gulf History Cataloguing Manager), Sotirios Alpanis (former Head of Digital Operations), John Casey (Cataloguer, Gulf History), David Fitzpatrick (Content Specialist, Archivist), Susannah Gillard (Content Specialist, Archivist), John Hayhurst (Content Specialist, Gulf History), Julia Ihnatowicz (Translation Specialist), William Monk (Cataloguer, Gulf History), Hannah Nagle (Senior Imaging Support Technician), Noemi Ortega-Raventos (Content Specialist, Archivist), Francis Owtram (Content Specialist, Gulf History), Curstaidh Reid (Cataloguer, Gulf History), George Samaan (Translation Support Officer), Tahani Shaban (Translation Specialist), David Woodbridge (Cataloguer, Gulf History), Nariman Youssef (Arabic Translation Manager) and special thanks to the BL BAME Staff Network.

The Addressing Problematic Terms in our Catalogues and Translations Project was joint winner of the 2020 BL Labs Staff Award. It is an ongoing, highly collaborative project inspired by a talk given by Dr Melissa Bennett about decolonising the archive and how to deal with problematic terms found in archive items. Using existing translation tools and a custom-built Python script, the group has been analysing terms that appear in the original language of the documents, and assessing how best to address them in both English and Arabic. This work enables the project to treat problematic terms sensitively and to contextualise them in our catalogue descriptions and translations.
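As an illustration only, the kind of term analysis described above could be sketched in Python along the following lines. This is not the Project's actual script, which is not reproduced in this post; the function name, record identifiers, sample descriptions and placeholder terms are all hypothetical, and whole-word, case-insensitive matching is an assumption that may not suit every language or script.

```python
import re
from collections import Counter


def find_flagged_terms(descriptions, terms):
    """Count flagged terms across catalogue descriptions.

    descriptions: dict mapping a record id to its description text (hypothetical data).
    terms: list of lowercase terms to look for (hypothetical placeholders).
    Returns (overall counts per term, dict of record id -> terms found in that record).
    """
    # Longest terms first, so a multi-word term is preferred over a shorter one inside it.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter()
    hits = {}
    for record_id, text in descriptions.items():
        found = [match.group(1).lower() for match in pattern.finditer(text)]
        if found:
            hits[record_id] = found
            counts.update(found)
    return counts, hits


# Hypothetical sample records; neutral placeholders stand in for real terms.
descriptions = {
    "rec-001": "Letter concerning the Term-A and trade at the port.",
    "rec-002": "Report mentioning term-a twice: Term-A again, and term-b.",
    "rec-003": "Memorandum with no flagged vocabulary.",
}
terms = ["term-a", "term-b"]

counts, hits = find_flagged_terms(descriptions, terms)
for term, n in counts.most_common():
    print(term, n)
```

A list of hits per record, rather than a bare total, makes it easier for cataloguers and translators to go back to each description and decide how a term should be contextualised.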

 

More projects

The work continues with projects that explore how to use and share different methods and technologies. For example, Hannah Nagle has taught us how to collage using digital images (How to make art when we’re working apart), Ellis Meade has created a Bitsy game based in the Qatar National Library that draws you inside a manuscript (‘Hidden world of the Qatar National Library’), and Dr Mariam Aboelezz has used the BL/QFP Translation Memory to analyse how we were using the Arabic Verb Form X (istafʿal) in our translations of catalogue descriptions (‘Investigating Instances of Arabic Verb Form X in the BL/QFP Translation Memory’).

Pixelated image of a stick person in front of the Qatar National Library using Bitsy from ‘Hidden world of the Qatar National Library’  blog post by Ellis Meade (Senior Imaging Technician)
Figure 14: Image of the Qatar National Library using Bitsy from ‘Hidden world of the Qatar National Library’ by Ellis Meade (Senior Imaging Technician)

 

We have also made the most of the Covid-19 restrictions and working from home, to share and learn skills such as coding, Arabic language, and photography. For example, through the Project’s ‘Code Club’, many of us have learnt about Python and have written scripts to streamline our tasks. Furthermore, code written to explore the collections has also produced creative outputs, such as Anne Courtney’s project “Making data into sound” (Runner-up, BL Labs Staff Awards, 2020).

The Project’s extraordinary collaborative work demonstrates some of the exciting and innovative ways to engage with library and archival collections. It also makes clear the wider benefits of digitisation and providing free online access to fully bilingual catalogued resources.

You can read about some of our projects in more detail in the blog posts below:

You can read about previous BL/QFP Hack Days in the blog posts below:

This is a guest post by the British Library Qatar Foundation Partnership Project, compiled by Laura Parsons. You can follow the British Library Qatar Foundation Partnership on Twitter at @BLQatar.

29 April 2021

The Butcher, the Baker, but not the Candlestick Maker

It’s hard to believe, but it’s almost a year since we took a look at some of the weird and wonderful epithets that have been used to distinguish individuals in the Library’s archives and manuscripts catalogue. Twelve months on, the Western Manuscripts cataloguing team is still working its way through the personal name records – correcting errors, enhancing records, and merging duplicate names.

In doing so, yet more items of epithetical interest have emerged. Who amongst us would not have their curiosity piqued by a man described as a pastry-maker and impersonator of King Ferdinand of Portugal? I’m sure we would all wish to take our hats off to the person labelled advocate for world peace (could there be a more noble calling?). We might be impressed at the range of skills held by the builder and composer and be in awe of the derring-do associated with the British flying ace.

But it’s in the area we today call nominative determinism that I’ve started to see some patterns. You know the kind of thing: the farmer whose surname is Farmer, the miller called Miller, and so on. Those are the obvious ones but with a bit of lateral thinking one can find some slightly less obvious examples in Explore Archives and Manuscripts. Nominative determinism once removed, if you like.

The world of religion is a rich seam. We have clergy of various types called Parsons, Bishop, Deacon, Vicars, and Dean, although I’m not sure being called Demons is the most appropriate name for the former owner of a collection of religious treatises.

Then there are the trades and professions. In the catalogue we have a master mason called Stone and a joiner called Turner. And if there’s one thing a bricklayer needs it’s physical strength so being called Backbone is a good start. A schoolmaster called Read makes sense, and when you think of the materials a jeweller works with then so does being called Dargent. A baker called Assh seems ironic (perhaps he was a graduate of the King Alfred School of Baking).

I don’t think there could be a more appropriate name for a soldier than Danger (although Bullitt comes close), and Haddock and Waters seem apt for seafarers too. Ditto, an explorer called Walker.

But of course there are always those who refuse to play along, those who didn’t get the memo. So we have the carpenter called Butcher, the butcher called Baker, the draper called Cooper, the groom called Chandler, the tailor called Fisher, and the mason called Mercer.

And finally, I am disappointed to report that the individual named Le Cat was not, in fact, a burglar.

Burglar coming in through the window with light illuminating a cat
British Library digitised image from page 47 of "The Wild Boys of London; or, the Children of Night. A story of the present day. With numerous illustrations" available on our Flickr collection

This guest blog post is by Michael St John-McAlister, Western Manuscripts Cataloguing Manager at the British Library.

19 February 2021

AURA Research Network Second Workshop Write-up

Keen followers of this blog may remember a post from last December, which shared details of a virtual workshop about AI and Archives: Current Challenges and Prospects of Digital and Born-digital archives. This topic was one of three workshop themes identified by the Archives in the UK/Republic of Ireland & AI (AURA) network, which is a forum promoting discussion of how Artificial Intelligence (AI) can be applied to cultural heritage archives, and exploring issues with providing access to born-digital and hybrid digital/physical collections.

The first AURA workshop, on Open Data versus Privacy, organised by Annalina Caputo from Dublin City University, took place on 16-17 November 2020. Rachel MacGregor provides a great write-up of this event here.

Here at the British Library, we teamed up with our friends at The National Archives to curate the second AURA workshop, exploring the current challenges and prospects of born-digital archives; this took place online on 28-29 January 2021. The first day of the workshop, held on 28 January, was organised by The National Archives, and you can read more about this day here. The following day, 29 January, was organised by the BL; videos and slides for this can be found on the AURA blog and I've included them in this post.

AURA

The format for both days of the second AURA workshop comprised four short presentations, two interactive breakout room sessions and a wider round-table discussion. The aim was for the event to generate dialogue around key challenges that professionals across all sectors are grappling with, with a view to identifying possible solutions.

The first day covered issues of access from both infrastructural and users’ perspectives, plus the ethical implications of the use of AI and advanced computational approaches to archival practices and research. The second day discussed challenges of access to email archives, and also issues relating to web archives and emerging format collections, including web-based interactive narratives. A round-up of the second day is below, including recorded videos of the presentations for anyone unable to attend on the day.

Kicking off day two, a warm welcome to the workshop attendees was given by Rachel Foss, Head of Contemporary Archives and Manuscripts at the British Library, Larry Stapleton, Senior academic and international consultant from the Waterford Institute of Technology and Mathieu d’Aquin, Professor of Informatics at the National University of Ireland Galway.

The morning session on Email Archives: challenges of access and collaborative initiatives was chaired by David Kirsch, Associate Professor, Robert H. Smith School of Business, University of Maryland. This featured two presentations:

The first of these was about Working with ePADD: processes, challenges and collaborative solutions in working with email archives, by Callum McKean, Curator for Contemporary Literary and Creative Archives, British Library and Jessica Smith, Creative Arts Archivist, John Rylands Library, University of Manchester. Their slides can be viewed here and here. Apologies that the recording of Callum's talk is clipped; this was due to connectivity issues on the day.

The second presentation was Finding Light in Dark Archives: Using AI to connect context and content in email collections by Stephanie Decker, Professor of History and Strategy, University of Bristol and Santhilata Venkata, Digital Preservation Specialist & Researcher at The National Archives in the UK.

After their talks, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the morning session were:

  1. Are there any other appraisal or collaborative considerations that might improve our practices and offer ways forward?
  2. What do we lose by emphasizing usability for researchers?
  3. Should we start with how researchers want to use email archives now and in the future, rather than just on preservation?
  4. Potentialities of email archives as organizational, not just individual?

These questions led to discussions about file formats, collection sizes, metadata standards and ways to interpret large data sets. There was interest in how email archives might allow researchers to reconstruct corporate archives, e.g. to understand the social dynamics of the office and decision-making processes. It was felt that there is a need to understand the extent to which email represents organisation-level context. More questions were raised, including:

  • To what extent is it part of the organisational records and how should it be treated?
  • How do you manage the relationship between constant organisational functions and structure (a CEO) and changing individuals?
  • Who will be looking at organisational email in the future and how?

It was mentioned that there is a need to distinguish between email as data and email as an artifact, as the use-cases and preservation needs may be markedly different.

The duties of care that exist between depositors, tool designers, archivists and researchers were discussed, and a question was asked about how we balance these:

  • Managing human burden
  • Differing levels of embargo
  • Institutional frameworks

There was discussion of the research potential for comparing email and social media collections, e.g. tweet archives, and also of the difficulties researchers face in getting access to data sets. The monetary value of email archives was also raised, and it was mentioned that perceived value hasn’t been translated into monetary value.

Researcher needs and metadata were another topic brought up by attendees. It was suggested that the information about collections in online catalogues needs to be descriptive enough for researchers to decide whether they wish to visit an institution to view digital collections on a dedicated terminal. It was also suggested that archives and libraries need to make access restrictions, and the reasoning for these, very clear to users. This would help to manage expectations, so that researchers will know when to visit on-site because remote access is not possible. It was mentioned that it is challenging to identify use cases, but it was noted that without deeper understanding of researcher needs, it can be hard to make decisions about access provision.

It was acknowledged that the demands on human processing are still high for born-digital archives, and that the relationship between tools and professionals is still emergent. So there was a question about whether researchers could be more involved in collaborations, and to what extent there will be an onus on them in terms of responsibilities and liabilities in relation to the usage of born-digital archives.

Lots of food for thought before the break for lunch!

The afternoon session, chaired by Nicole Basaraba, Postdoctoral Researcher, Studio Europa, Maastricht University, discussed Emerging Formats, Interactive Narratives and Socio-Cultural Questions in AI.

The first afternoon presentation Collecting Emerging Formats: Capturing Interactive Narratives in the UK Web Archive was given by Lynda Clark, Post-doctoral research fellow in Narrative and Play at InGAME: Innovation for Games and Media Enterprise, University of Dundee, and Giulia Carla Rossi, Curator for Digital Publications, British Library. Their slides can be viewed here.  

The second afternoon presentation was Women Reclaiming AI: a collectively designed AI Voice Assistant by Coral Manton, Lecturer in Creative Computing, Bath Spa University. Her slides can be seen here.

Following the same format as in the morning, after these presentations, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the afternoon session were:

  1. Should we be collecting examples of AIs, as well as using AI to preserve collections? What are the implications of this?
  2. How do we get more people to feel that they can ask questions about AI?
  3. How do we use AI to think about the complexity of what identity is and how do we engineer it so that technologies work for the benefit of everyone?

There was a general consensus that AI is becoming a significant and pervasive part of our lives. However, it was felt that there are many aspects we don't fully understand. In the breakout groups, workshop participants raised more questions, including:

  • Where would AI-based items sit in collections?
  • Why do we want it?
  • How to collect?
  • What do we want to collect? User interactions? The underlying technology? Many are patented technologies owned by corporations, so this makes it challenging. 
  • What would make AI more accessible?
  • Some research outputs may be AI-based - do we need to collect all the code, or just the end experience produced? If the latter, could this be similar to documenting evidence, e.g. video/sound recordings or transcripts?
  • Could or should we use AI to collect? Who’s behind the AI? Who gets to decide what to archive and how? Who’s responsible for mistakes/misrepresentations made by the AI?

There was debate about how to define AI in terms of a publication/collection item. It was felt that an understanding of this would help to decide what archives and libraries should be collecting, and to understand what is not being collected currently. It was mentioned that user input is a critical factor in answering questions like this. A number of challenges of collecting using AI were raised in the group discussions, including:

  • Lack of standardisation in formats and metadata
  • Questions of authorship and copyright
  • Ethical considerations
  • Engagement with creators/developers

It was suggested that full scale automation is not completely desirable and some kind of human element is required for specialist collections. However, AI might be useful for speeding up manual human work.

There was discussion of the problem of bias in data: existing prejudices are baked into datasets and algorithms. This led to more questions:

  • Is there a role for curators in defining and designing unbiased and more representative data sets to more fairly reflect society?
  • Should archives collect training data, to understand underlying biases?
  • Who is the author of AI-created text and dialogue? Who is the legally responsible person/organisation?
  • What opportunities are there for libraries and archives to teach people about digital safety through understanding datasets and how they are used?

Participants also questioned:

  • Why do we humanise AI?
  • Why do we give AI a gender?
  • Is society ready for a genderless AI?
  • Could the next progress in AI be a combination of human/AI? A biological advancement? Human with AI “components” - would that make us think of AIs as fallible?

With so many questions and a lack of answers, it was felt that fiction may also help us to better understand some of these issues, and Rachel Foss ended the roundtable discussion by saying that she is looking forward to reading Kazuo Ishiguro’s new novel Klara and the Sun, about an artificial being called Klara who longs to find a human owner, which is due to be published next month by Faber.

Thanks to everyone who spoke at and participated in this AURA workshop for making it a lively and productive event. Extra special thanks to Deirdre Sullivan for helping to run the online event smoothly. Looking ahead, the third workshop on Artificial Intelligence and Archives: What comes next? is being organised by the University of Edinburgh in partnership with the AURA project team, and is scheduled to take place on Tuesday 16 March 2021. Please do join the AURA mailing list and follow #AURA_network on social media to be part of the network's ongoing discussions.

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

29 January 2021

Hacking the BL from home

BL/QFP Project and BL BAME Network Hack Day: 13th January, 2021

This is a guest post by the British Library Qatar Foundation Partnership, compiled by Laura Parsons. You can follow the British Library Qatar Foundation Partnership on Twitter at @BLQatar.

We may be unable to visit the British Library in person, or see our colleagues except for on our computer screens, but on Wednesday 13th January we proved that lockdown is no barrier to a Hack Day. For the first time our Hack Day was opened up to British Library staff from outside the BL/QFP Project, as we invited members of the BL BAME Network to join us. It was exciting to have a wide variety of people with different roles and Hack Day experience, which was reflected in the diverse ideas and results displayed on the day. There was no particular subject or theme for this Hack Day. The only objectives were to try or learn something new, meet some people from around the Library and have a bit of fun along the way.

It felt slightly weird holding our Hack Day online via Microsoft Teams, rather than gathered in the BL/QFP Project’s office on the 6th floor of the Library. However, with various types of technology and online platforms, including the Teams breakout function and a shared Google doc, we still managed to work collaboratively whilst working from home. Throughout the Teams rooms, it was great to see and hear amazing ideas, helpful team work, interesting discussions, valuable sharing of skills and knowledge, and laughter.

We hope you enjoy reading about our hacks as much as we enjoyed the process of making them together.

 

Exquisite Corpses

Contributors: Morgane Lirette (Conservator (Books), Conservation), Tan Wang-Ward (Project Manager, Lotus Sutra Manuscripts Digitisation), Matthew Lee (Imaging Support Technician, BL/QFP Project), Darran Murray (Digitisation Studio Manager, BL/QFP Project), Noemi Ortega-Raventos (Content Specialist, Archivist, BL/QFP Project)

Our project for this Hack Day collaboration was centered on the idea of the Exquisite Corpse – a fun and creative game popularised by the Surrealists as a tool to create bizarre and wonderful compositions.

The result was a cross-collaborative effort, involving staff from the International Dunhuang Project, Conservation and the BL/QFP Project, that created a series of visual collages using material from the Library's digital collections, Flickr and Instagram accounts as well as the Qatar Digital Library (QDL). We created five exquisite corpses in total.

The biggest takeaway from the day was how easy, fun and creative this process was, both in facilitating cross-Library networking and collaboration and as a tool for invention and exploration of the Library’s diverse collections.

 

Figure 1: Exquisite Corpse 1: Head part 1 (QDL), Head part 2 (QDL), Head part 3 (QDL), Head part 4 (QDL), Head part 5 (QDL), torso (Flickr), legs (Flickr), feet (Instagram)

 

Figure 2: Exquisite Corpse 2: Head (Flickr), torso (BL Catalogue), legs (Instagram), feet (QDL)

 

Figure 3: Exquisite Corpse 3: Head (BL Catalogue), torso (Flickr), legs (BL Catalogue), feet (BL Catalogue)

 

Figure 4: Exquisite Corpse 4: Head (Flickr), torso (Instagram), legs (QDL), foot 1 (Flickr), foot 2 (Flickr)

 

Figure 5: Exquisite Corpse 5: Head (BL Catalogue), torso (QDL), arm (QDL), legs (Flickr), foot 1 (BL Catalogue), foot 2 (BL Catalogue)

 

OCR Text Analysis

Contributors: David Woodbridge (Cataloguer, Gulf History, BL/QFP Project) & Sotirios Alpanis (Head of Digital Operations, BL/QFP Project)

This hack aimed to extend work undertaken as part of the Addressing Problematic Terms Project to explore the BL/QFP’s Optical Character Recognition (OCR) data.

Inspiration for the Hack was drawn from Olivia Vane’s excellent OCR visualisation tool, Steptext. OCR is an automated process employed during the BL/QFP’s digitisation process that ‘reads’ the images captured and turns them into searchable text.

Initially the team came up with a list of terms to search the OCR text for. Then we wrote a Python script to search the OCR files for each term, and output three graphs, built using Bokeh.
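The search step can be illustrated with a minimal sketch. This is not the team's actual script: the record layout (`shelfmark`, `year`, `text`), the function names and the sample data are all assumptions made for illustration, and the terms are deliberately neutral placeholders. It counts whole-word, case-insensitive matches for each term and totals them by year.

```python
import re
from collections import Counter


def count_term(text, term):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def matches_per_year(records, terms):
    """Map each term to a Counter of {year: number of matches}."""
    results = {term: Counter() for term in terms}
    for record in records:
        for term in terms:
            n = count_term(record["text"], term)
            if n:
                results[term][record["year"]] += n
    return results


# Invented sample data, for illustration only (not real OCR output).
records = [
    {"shelfmark": "EXAMPLE/1", "year": 1900, "text": "Bread and wine. More bread."},
    {"shelfmark": "EXAMPLE/2", "year": 1900, "text": "A blue flag."},
    {"shelfmark": "EXAMPLE/3", "year": 1933, "text": "Bread, blue cloth and breadfruit."},
]

per_year = matches_per_year(records, ["bread", "blue"])
for term, counts in per_year.items():
    print(term, sorted(counts.items()))
```

Matching on word boundaries means that "breadfruit" is not counted as a match for "bread". Whether the original script matched whole words or substrings is not stated, so this is a design choice of the sketch.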

Figure 6: This graph displays the number of matches for the term against the year the archive material was created. Click on the image to open an interactive version in a new window.

 

Figure 7: Using the year with the most occurrences of the term, this bar chart displays the breakdown of the frequency per shelfmark. Click on the image to open an interactive version in a new window.

 

Figure 8: Using the shelfmark with the most matches, this graph displays how often the term occurs in each image capture. Using Bokeh’s inbuilt Hover tool, the graph displays a snippet of the term in context with the rest of the OCR data. Click on the image to open an interactive version in a new window.

 

The results show how it is possible both to identify where specific terms are used in the records and to analyse how they are used over time. This will be of great help as we seek to take the project to the next stage.
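For readers curious about how a hover snippet like the one in Figure 8 might be produced, here is a hedged sketch. The function names, the `(image label, OCR text)` input layout and the 30-character context window are assumptions, not details from the Hack Day code. The Bokeh import sits inside the plotting function so that the counting part runs even where Bokeh is not installed.

```python
import re


def snippets_per_image(pages, term, context=30):
    """For each (image_label, ocr_text) pair containing the term, return the
    match count and a snippet of text around the first match."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    rows = []
    for image, text in pages:
        found = list(pattern.finditer(text))
        if not found:
            continue
        first = found[0]
        start = max(0, first.start() - context)
        end = min(len(text), first.end() + context)
        rows.append({"image": image, "matches": len(found), "snippet": text[start:end]})
    return rows


def plot_rows(rows, term):
    """Build a Bokeh bar chart with a hover tooltip showing each snippet."""
    # Imported here so the counting code above works without Bokeh installed.
    from bokeh.models import ColumnDataSource, HoverTool
    from bokeh.plotting import figure

    source = ColumnDataSource(data={
        "image": [row["image"] for row in rows],
        "matches": [row["matches"] for row in rows],
        "snippet": [row["snippet"] for row in rows],
    })
    plot = figure(x_range=source.data["image"], title=f"Matches for '{term}' per image")
    plot.vbar(x="image", top="matches", width=0.8, source=source)
    plot.add_tools(HoverTool(tooltips=[("matches", "@matches"), ("context", "@snippet")]))
    return plot


# Invented sample pages, for illustration only.
pages = [("1r", "The bread was sent. More bread arrived."), ("1v", "No match here.")]
rows = snippets_per_image(pages, "bread", context=4)
```

With Bokeh installed, `from bokeh.plotting import show` followed by `show(plot_rows(rows, "bread"))` would open the interactive chart in a browser.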

 

OCR Exquisite Corpses

Contributor: Sotirios Alpanis (Head of Digital Operations, BL/QFP Project)

Taking inspiration from the Exquisite Corpse Hack project, the code for the OCR text analysis was refactored to produce OCR Exquisite Corpses. Here is the process:

  1. Taking an initial search term, a shelfmark was picked at random and the term was searched for; this process was repeated until a match was found.
  2. Once a match was made, the subsequent four words were selected, completing the first sentence of an exquisite corpse.
  3. The final word of the sentence was then used to begin the process again, creating a link between the two sentences.
  4. This was repeated four times to create a surreal nonsense poem.
  5. Finally, using Google Translate’s text to speech service, an mp3 file was created for each poem.
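Steps 1 to 4 above can be sketched roughly as follows. This is an illustrative reconstruction, not the refactored Hack Day code: the dictionary of OCR text keyed by shelfmark, the function names, the `max_tries` cap and the punctuation handling are all assumptions. The text-to-speech step is left out.

```python
import random
import re


def following_words(text, term, n_following=4):
    """Return the n words after the first usable occurrence of term, else None."""
    words = text.split()
    for i, word in enumerate(words):
        # Strip punctuation before comparing, so "bread," matches "bread".
        if re.sub(r"\W", "", word).lower() == term.lower():
            following = words[i + 1 : i + 1 + n_following]
            if len(following) == n_following:
                return following
    return None


def exquisite_corpse(ocr_by_shelfmark, first_term, n_sentences=5, max_tries=1000, rng=None):
    """Chain sentences: the last word of each sentence seeds the next search."""
    rng = rng or random.Random()
    shelfmarks = list(ocr_by_shelfmark)
    term = first_term
    sentences = []
    for _sentence in range(n_sentences):
        following = None
        for _attempt in range(max_tries):
            # Step 1: pick a shelfmark at random until the term is found.
            shelfmark = rng.choice(shelfmarks)
            following = following_words(ocr_by_shelfmark[shelfmark], term)
            if following is not None:
                break
        if following is None:
            break  # gave up: no match found within max_tries
        # Step 2: the term plus the next four words form one sentence.
        sentences.append([term.upper()] + following)
        # Step 3: the final word links to the next sentence.
        term = re.sub(r"\W", "", following[-1])
        if not term:
            break  # the linking word was pure punctuation
    return sentences


# Invented sample text, for illustration only.
sample = {
    "EXAMPLE/A": "bread and wine for he",
    "EXAMPLE/B": "he went to the market today",
}
poem = exquisite_corpse(sample, "bread", n_sentences=2, rng=random.Random(0))
print(", ".join(" ".join(sentence) for sentence in poem))
```

The default of five sentences reflects one reading of "repeated four times" (a first sentence plus four repeats), which matches the five capitalised link words in the highlights below. How the original code formatted the joins between sentences is not stated.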

The Hack team nominated some everyday words to generate OCR Exquisite Corpses. Here are some highlights:

  • BREAD and wine: he THEN he in his, POSSESSION of the enemy's ENTRENCHED camp at Brasjoon, ABOUT 80 per cent

Bread OCR Exquisite Corpse

  • BLUE and gold lackered, WORK fur r North & THE 15th November, 1933, WITH ENCLOSURES FOREIGN: Immediate

Blue OCR Exquisite Corpse

  • MUTINY had been prevented BY wandering tribes, small TRIBUTARY to Persia; AND has the honour TO deal with the

Mutiny OCR Exquisite Corpse

 

Investigating Instances of Arabic Verb Form X in the BLQFP Translation Memory

Contributor: Mariam Aboelezz

I investigated uses of Arabic Verb Form X (istafʿala) in the BLQFP Translation Memory using our translation software, memoQ. I chose this verb form because it conveys the meaning of seeking or acquiring something for oneself, possibly by force, and could therefore elicit unconscious bias in our translations. I identified 55 unique verbs that take this form, six of which were potentially problematic. A closer look at the most frequent verb (istawlá; to take forcefully or wrongfully) suggests that some unconscious bias may have travelled from the primary sources to the catalogue descriptions or been introduced during translation. The results provide a prompt for further discussions about problematic language among translators and cataloguers.

Figure 9: Search results from the BLQFP Translation Memory in memoQ for Arabic Verb Form X (istafʿala)

 

Figure 10: Bar chart displaying the 55 unique verbs identified and their frequency.

 

Figure 11: Bar chart displaying the six potentially problematic verbs.

 

Birds of the QDL team

Contributors: Anne Courtney (Cataloguer, Gulf History, BL/QFP Project), Sara Hale (Digitisation Officer, Heritage Made Digital/Asian and African Collections), Francis Owtram (Content Specialist, Gulf History, BL/QFP Project), Annie Ward (Digitisation Workflow Administrator, BL/QFP Project)

The Birds of the QDL team set out to explore how birds appear in the digital records. Sara and Annie used manuscript paintings of bird species as inspiration, creating an animated GIF of a hoopoe and data visualisations of the search results for different birds. Anne tracked bird sightings in one of the IOR ship’s logs by combining quotes from the log with sound recordings and images to help bring the record to life. Francis investigated the Socotra cormorant, British guano extraction and the resistance of the islanders. We enjoyed experimenting with different formats to highlight some of the regional birds and the contexts in which they appear.

Figure 12: Animated gif using an image of a hoopoe bird. Image from: Tarjumah-ʼi ʻAjā’ib al-makhlūqāt ترجمۀ عجائب المخلوقات Anonymous translator [‎397r] (812/958), British Library: Oriental Manuscripts, Or 1621, in Qatar Digital Library <https://www.qdl.qa/archive/81055/vdc_100069559270.0x00000d> and quote from: 'IRAQ AND THE PERSIAN GULF' [‎144v] (293/862), British Library: India Office Records and Private Papers, IOR/L/MIL/17/15/64, in Qatar Digital Library <https://www.qdl.qa/archive/81055/vdc_100037366479.0x00005e>

 

Figure 13: Bar chart displaying the number of search results by bird name on the Qatar Digital Library and decorated with bird images from a manuscript (Tarjumah-ʼi ʻAjā’ib al-makhlūqāt ترجمۀ عجائب المخلوقات Anonymous translator, British Library: Oriental Manuscripts, Or 1621, in Qatar Digital Library <https://www.qdl.qa/archive/81055/vdc_100035587342.0x000001>).

 

Figure 14: Image of the ocean with text reading: “This day we see no birds”. Image from: ‘Sea Song and River Rhyme from Chaucer to Tennyson’ (1887), ed. E D Adams and quote from: Blenheim : Journal [‎16v] (38/209), British Library: India Office Records and Private Papers, IOR/L/MAR/B/697A, in Qatar Digital Library <https://www.qdl.qa/archive/81055/vdc_100085281813.0x000027>

 

Figure 15: Map of the island of Socotra from: ‘A Trigonometrical Survey of Socotra by Lieut.ts S.B. Haines and I.R. Wellsted assisted by Lieut. I.P. Sanders and Mess.rs Rennie Cruttenden & Fleming Mids.n, Indian Navy. Engraved by R. Bateman, 72 Long Acre’ [‎8r] (1/2), British Library: Map Collections, IOR/X/3630/13, in Qatar Digital Library <https://www.qdl.qa/archive/81055/vdc_100023868004.0x000010>

 

Story-Mapping: The Shater’s Journey

Contributors: Jenny Norton-Wright (Arabic Scientific Manuscripts Curator, BL/QFP Project) & Ula Zeir (Content Specialist, Arabic Language, BL/QFP Project)

Our Hack project aimed to create an interactive map tracing the footsteps of a shater [shāṭir, foot-courier] who made a 700-mile return journey between Gombroon and Shiraz in 1761 bearing an important letter, as recounted in one of the Gombroon Diaries (IOR/G/29/13).

First, we collected background information on the journey and on the term shater, and transcribed the relevant diary entries. We then used the Esri ArcGIS StoryMap Tour platform to visualise and map the events. The Tour function integrates text boxes, captions, and associated images with a background map tracking the points of the journey, and supports hyperlinking to the IOR materials on the QDL.

Figure 16: Image from the start of the story map introducing the Shater journey.

 

Figure 17: Image from the story map continuing the Shater journey.

 

Figure 18: Image from the story map continuing the Shater journey: a reply is received.

 

For more information about the Gombroon Diaries:

Diary and Consultations of Mr Alexander Douglas, Agent of the East India Company at Gombroon [Bandar-e ʻAbbās] in the Persian Gulf, commencing 2 October 1760 and ending 30 December 1761, British Library: India Office Records and Private Papers, IOR/G/29/13, in Qatar Digital Library <https://www.qdl.qa/archive/81055/vdc_100000001251.0x00036a>

 

British Library mosaic

Contributor: Laura Parsons (Digitisation Workflow Administrator, BL/QFP Project)

This project involved learning how to create mosaics using images from the Library and QDL collections. This was inspired by a presentation by Pardaad Chamsaz (Curator Germanic Collections, BL European Studies) about the Decolonising the BL working group of the BL BAME Network. He said that we should remember that the Library is made up of many different people. I decided to try using Mosaically to combine multiple images into a single image of the British Library, to show that it takes many parts to make a whole. This also highlights the Library’s vast collections. I then repeated this with images from the QDL to create an image of the QDL homepage.

Figure 19: Mosaic of the British Library using images from the British Library Flickr account.

 

Figure 20: Mosaic of the Qatar Digital Library homepage using images from the Qatar Digital Library (https://www.qdl.qa/en).

 

You can also read about the previous Hack Days in the blog posts below:
