THE BRITISH LIBRARY

Digital scholarship blog

189 posts categorized "Projects"

09 July 2021

Subjects Wanted for Soothing Sounds Psychology Studies


Can you help University of Reading researchers with their studies of the potential therapeutic effects on mental health and wellbeing of looking at ‘soothing’ images and listening to natural sounds?

Sound recordings for this research have been provided by Cheryl Tipp, Curator of Wildlife & Environmental Sounds, from the British Library Sound Archive.

One study focuses on young people; 13-17 year-olds are wanted for an easy online survey. Psychology Masters student Jasmiina Ryyanen from the University of Reading is asking young people to view and listen to 25 images and sounds, rating their moods before and after. Access the survey for 13-17 year-olds here: https://henley.eu.qualtrics.com/jfe/form/SV_eKaQjEf2H3Vqw9U.

Poster with details of Soothing Sounds student study for young people

There is also an online survey for adults, managed by Emily Witten, so if you are over 18 please take part in this study: https://henley.eu.qualtrics.com/jfe/form/SV_cBa6tNtkN3fgkCO.

Poster about Soothing Sounds student study for adults

Both surveys are randomised: some participants will be asked to look at images only, others to listen to sounds only, and the final group to look at images while listening to the sounds. These research projects have been fully approved by the University of Reading’s ethical standards board. If you have any questions about these surveys, please email Jasmiina Ryyanen (j.ryynanen(at)student.reading.ac.uk) or Emily Witten (e.i.c.witten(at)student.reading.ac.uk).

We hope you enjoy participating in these surveys and feel suitably soothed from the experience! 

This post is by Digital Curator Stella Wisdom (@miss_wisdom).

24 June 2021

My placement: Using Transkribus to OCR Two Centuries of Indian Print


I began a work placement with the British Library’s Two Centuries of Indian Print project, working with my supervisor, Digital Curator Tom Derrick, to automatically transcribe the Library’s Bengali books that have been digitised and catalogued as part of the project. The OCR application we use for transcription is Transkribus, a leading text recognition application for historical documents. We also use a Google Sheet to keep each book’s basic information and job status up to date.

During the first two days, my supervisor trained me to use Transkribus through a face-to-face (virtual) demonstration, since I had not used OCR before. He also provided a manual for me to refer to as I practised. There are three main steps in transcribing a book: uploading it, running layout analysis, and running text recognition. We upload books to Transkribus from the British Library’s IIIF image viewer. First, I confirmed the title and digital system number of a book in our team’s shared Google Sheet so that I could find its digital content in the BL online catalogue, and at the same time I recorded the book’s number of pages in the Google Sheet. Then I copied the URL of the IIIF manifest and imported the book into our project’s collection in Transkribus. After that, I ran layout analysis in Transkribus. This usually takes several minutes, and the more pages a book has, the longer it takes. A perfect layout analysis produces exactly one baseline for each line of text on a page.
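The page count recorded in the Google Sheet can also be read from the IIIF manifest itself. The following is a minimal sketch, not part of the project’s actual workflow or of Transkribus: a hypothetical helper that counts the canvases (page images) in a manifest that has already been downloaded and parsed from JSON (for example with `json.load`). It assumes the manifest follows either version 2 of the IIIF Presentation API, where canvases sit under `sequences[0].canvases`, or version 3, where they are listed under `items`. The sample manifests are invented for illustration.

```python
def count_pages(manifest: dict) -> int:
    """Return the number of canvases (page images) in a IIIF manifest.

    Presentation API 2.x nests canvases in manifest["sequences"][0]["canvases"];
    Presentation API 3.0 lists them directly in manifest["items"].
    """
    sequences = manifest.get("sequences")
    if sequences:
        return len(sequences[0].get("canvases", []))
    return len(manifest.get("items", []))


# Invented sample manifests, trimmed to the fields this helper reads
sample_v2 = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "sc:Manifest",
    "sequences": [
        {"canvases": [{"@id": f"https://example.org/canvas/{n}"} for n in range(1, 11)]}
    ],
}
sample_v3 = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "type": "Manifest",
    "items": [{"id": f"https://example.org/canvas/{n}", "type": "Canvas"} for n in range(1, 4)],
}

print(count_pages(sample_v2))  # 10
print(count_pages(sample_v3))  # 3
```

Treating one canvas as one page is itself an assumption; a manifest can include covers or colour charts as extra canvases, so the number may still need a quick check against the book.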

Although the Transkribus model has been trained on more than 100 pages, it still makes mistakes, for several reasons. Title or chapter headings whose font size differs significantly from the rest of the text are sometimes missed; patterned dividers and borders on the title page are easily misidentified as text; and sometimes the paper is so dark that the text is difficult to recognise. In these cases, the user needs to correct the layout manually. After checking the quality of the layout analysis, I could then run text recognition. The final step is to check the results of the text recognition and update the Google Sheet.
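Part of the layout check can be automated before a person looks at each page. The sketch below is a hypothetical helper, not a Transkribus feature: it assumes the layout has been exported as PAGE XML, where each `TextLine` element normally carries one `Baseline` child, and flags pages with no detected lines (as can happen with dark paper) or with lines that lack a baseline. It cannot tell whether a decorative border was mistaken for text, so visual checking is still needed. The sample page is invented.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the check works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def layout_report(page_xml: str) -> dict:
    """Summarise detected text lines and missing baselines for one PAGE XML page."""
    root = ET.fromstring(page_xml)
    lines = [el for el in root.iter() if local_name(el.tag) == "TextLine"]
    missing = [
        line.get("id")
        for line in lines
        if not any(local_name(child.tag) == "Baseline" for child in line)
    ]
    return {
        "lines": len(lines),
        "missing_baseline": missing,
        # Pages with nothing detected, or lines without baselines, go for manual review
        "needs_review": len(lines) == 0 or bool(missing),
    }


# Invented single-page example with one line lacking a baseline
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,10 500,10 500,60 10,60"/>
        <Baseline points="10,55 500,55"/>
      </TextLine>
      <TextLine id="l2">
        <Coords points="10,80 500,80 500,130 10,130"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(layout_report(SAMPLE))
# {'lines': 2, 'missing_baseline': ['l2'], 'needs_review': True}
```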


Above: A view of a book in the Transkribus application, showing the page images and transcription underneath

During the three weeks of the placement, I handled a total of twelve books. In addition to the standard workflow described above, I was fortunate to come across several books that required special handling, which taught me how to deal with various situations. For example, the image above shows the text recognition results for a page of the first book I worked on in Transkribus, Dhārāpāta: prathama bhāg. Pāṭhaśālastha śiśu digera śikshārtha/ Cintāmani Pāl. Every word in this book is very short and widely spaced, which made it very difficult for Transkribus to identify the layout. Because the book is only 28 pages long, I marked up the layout of every page manually.

In addition to my work, I have had the pleasure of meeting many British Library curators and researchers who are engaged in digitization. I attended a regular project meeting and learnt how work is divided among the members of the digital project. My supervisor Tom also contacted some colleagues who work on the digitization of Chinese collections and gave me the opportunity to meet them, which benefited me a lot.

The Principal Investigator for our 2CIP project, Adi, who has also been involved in research and development of Chinese OCR/HTR at the British Library, shared with me the challenges of Chinese OCR/HTR and the progress of current research at the British Library.

Melodie, Curator for the International Dunhuang Project, and Tan, a project manager, presented the project’s research and outcomes. The project has many partner institutions in different countries with collections related to the Silk Road. It is a very meaningful digitization project and I admire how it has developed.

Sara, the lead Curator for the British Library’s Chinese collections, introduced me to the different types of Chinese collection held by the Library and some representative items. She also shared with me the practical problems they encounter when digitizing collections.

Three weeks passed quickly and I gained a lot from my experience at the British Library. In addition to the specifics of how to use Transkribus for text recognition, I have learned about the achievements and problems faced in digitizing Chinese collections from a variety of perspectives.

This is a guest post by UCL Digital Humanities MSc student Xinran Gu.

14 June 2021

Adding Data to Wikidata is Efficient with QuickStatements


Once I was set up on Wikipedia (see Triangulating Bermuda, Detroit and William Wallace), I got started with Wikidata. Wikidata is the part of the Wikimedia universe which deals with structured data, like dates of birth, shelf marks and more.

Adding data to Wikidata is really simple: it just requires logging into Wikidata (or creating an account if you don’t already have one) and then pressing edit on any page you want to change.

Image of a Wikidata entry about Earth
Editing Wikidata

If the page doesn’t already exist, then creating it is also very simple: just select ‘create a new item’ from the menu on the left-hand side of the page.

When using Wikidata, there are some powerful tools which make adding data quicker and easier. One of these is QuickStatements. Unfortunately, using QuickStatements requires you to have made 50 edits on Wikidata before you run your first batch. Fortunately, it is rather quicker than Citation Hunt (for which, see Triangulating Bermuda, Detroit and William Wallace).

Image of Wikidata menu with 'Create a new item' highlighted
Creating a new item in Wikidata

I made those 50 edits very quickly, by setting up Wikidata item pages for each of the sample items from the India Office Records that we are working with (at the moment we are prioritising adding information about the records; further work will take place before any digitised items are uploaded to Wikimedia platforms). Basic information was added to each of the item pages.

Q107074264 (India Office List January 1885)

Q107074434 (India Office List July 1885)

Q107074463 (India Office List January 1886)

Q107074676 (India Office List July 1886)

Q107074754 (India Office List 1886 Supplement)

Q107074810 (1888-9 Report on the Administration of Bengal)

Q107074801 (1889-90 Report on the Administration of Bengal)

Once I had done this, it became clear that I needed to create more general pages, which could contain the DOIs that link back to the digitised records which are currently only accessible via batch download through the British Library research repository.

Q107134086 Page for administrative reports (V/10/60-1) in general.

Q107136752 Page for India lists (V/13/173-6) in general.

Image of the WikiProject page for the India Office Records
The WikiProject page for the India Office Records

The final preparatory step was to create a WikiProject page, which will facilitate collaboration on the project. This page contains links to all the pages involved in the project and will soon also contain useful resources such as templates for creating new pages as part of the project and queries for using the data.

After this, I began to experiment with QuickStatements, making heavy use of the guide to it available on Wikidata.

I decided to upload information on members of a particular regiment in Bengal, because the versions of the reports in the British Library research repository have been processed with Optical Character Recognition (OCR), which meant I could easily copy this information into a spreadsheet.
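To show what the step from spreadsheet to batch can look like, here is a minimal Python sketch, not the script or spreadsheet actually used for this project. It turns CSV rows into tab-separated QuickStatements V1 commands: `CREATE` makes a new item, and `LAST` then adds an English label (`Len`), an English description (`Den`), P31 (instance of) Q5 (human) and P21 (sex or gender). Setting every officer’s P21 to Q6581097 (male) is an assumption made for the example, and the column names and rows are hypothetical.

```python
import csv
import io

HUMAN = "Q5"       # Wikidata item "human", used with P31 (instance of)
MALE = "Q6581097"  # Wikidata item "male", used with P21 (sex or gender)


def quickstatements_batch(rows) -> str:
    """Build tab-separated QuickStatements V1 commands, one new item per row."""
    commands = []
    for row in rows:
        name = row["name"].strip()
        description = row["description"].strip()
        commands += [
            "CREATE",                         # create a new, empty item
            f'LAST\tLen\t"{name}"',           # English label
            f'LAST\tDen\t"{description}"',    # English description
            f"LAST\tP31\t{HUMAN}",            # instance of: human
            f"LAST\tP21\t{MALE}",             # sex or gender (assumed for every row)
        ]
    return "\n".join(commands)


# Hypothetical spreadsheet rows, exported as CSV
SAMPLE_CSV = """name,description
A. Example,officer of the 14th Infantry Regiment (hypothetical)
B. Sample,officer of the 14th Infantry Regiment (hypothetical)
"""

print(quickstatements_batch(csv.DictReader(io.StringIO(SAMPLE_CSV))))
```

The printed text could then be pasted into QuickStatements as a new batch. Names containing double quotes would need extra handling, which this sketch leaves out.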

Image of the original India Office List containing information on members of the 14th Infantry Regiment
Section of the original India Office List containing information on members of the 14th Infantry Regiment (IOR/V/6/175, page 258)

Finally, once I had done all of this, I met with the curators of the India Office Records for feedback and suggestions. This revealed that there had in fact been some confusion about exactly which regiment these officers belonged to. Fortunately, it turned out that we had identified the correct regiment, but had we made a mistake, a simple batch of QuickStatements edits would have put it right quickly.
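A correction like this could itself be written as a batch. In QuickStatements V1 syntax, a line beginning with a minus sign removes a statement, so swapping a wrongly identified regiment means removing the old claim and adding the new one on each affected item. The sketch below assumes the regiment is recorded with property P7779 (military unit); the item IDs are placeholders, not real Wikidata items, and would have to be replaced before use.

```python
def correction_batch(items, prop, wrong_value, right_value) -> str:
    """Build QuickStatements V1 lines that swap one statement value on each item.

    A line starting with '-' removes the matching statement; the next line
    adds the corrected value.
    """
    lines = []
    for qid in items:
        lines.append(f"-{qid}\t{prop}\t{wrong_value}")  # remove the wrong claim
        lines.append(f"{qid}\t{prop}\t{right_value}")   # add the corrected claim
    return "\n".join(lines)


# Placeholder IDs only: replace with the real officer and regiment items.
officers = ["Q_OFFICER_1", "Q_OFFICER_2"]
print(correction_batch(officers, "P7779", "Q_WRONG_REGIMENT", "Q_RIGHT_REGIMENT"))
```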

Image of a section of a spreadsheet of members of the 14th Infantry Regiment
Section of my spreadsheet of members of the 14th Infantry Regiment

All in all, I can recommend using Wikidata, and I hope I have shown that it can be a useful tool and also that it is easy to use. The next step for our Wikidata project will be to upload templates and case studies to help and support future volunteer editors to develop it further. We will also add resources to support research on the uploaded data.

Image of Quick Statements for adding gender to each of the pages for the officers
Screenshot of Quick Statements for adding gender to each of the pages for the officers

This is a guest post by UCL Digital Humanities MA student Dominic Kane.

24 May 2021

Two Million Images Inspire Creativity, Innovation, and Collaboration


BL/QFP Project celebrates two million images on the Qatar Digital Library and the creative ways we have used them.

This week we are celebrating a milestone achievement of two million images digitised and uploaded to the Qatar Digital Library (QDL). In addition to this bilingual digital archive, the British Library Qatar Foundation Partnership Project (BL/QFP Project) has also inspired creative and innovative pursuits. The material on the QDL is available to use and reuse, which allows for a wide variety of responses. Over the last few years, our Project’s diverse team has explored and demonstrated a multitude of ways to engage with these digital materials, including events, artwork, coding, and analysis.

The BL/QFP Project’s staff are skilled, experienced, and dedicated. They include cataloguers, historians, archivists, imaging specialists, conservators, translators, editors, and administrative support. This means that in one team (ordinarily housed in one office) we have a diverse pool of people, which has inspired some amazing interactions and ideas. Our skills range from photography, graphic design, and technology, to linguistics, history, and data analysis. By sharing and combining these talents, we have been able to engage with the digital material and resources in remarkable ways. We have all enjoyed learning about new areas, sharing skills and knowledge, engaging with fascinating materials, finding new ways of doing things, and collaborating with a range of people, such as the BL BAME Network and other partners.

Some of the work produced outside of our core deliverables is displayed below.

 

Hack Days

Hack Days are an opportunity to use innovative techniques to explore and respond to BL collections. The first BL/QFP Imaging Hack Day was held in October 2018, and led to an array of varied responses from our Imaging Team who used their skills to "hack" the QDL. Subsequent Hack Days have incorporated diverse topics, formats, collections, and participants. They are also award winning: the concept led by the Imaging Team won the British Library Labs Staff Award in 2019.

Poster for first Hack Day, created using images from manuscripts on the QDL, showing an orange tree with heads instead of fruit, saying 'Put Our Heads Together'
Figure 1: Poster for Hack Day created using images from manuscripts on the QDL

 

Figure 2: Astrolabe created by Darran Murray (Digitisation Studio Manager) using Or 2411

 

Example of images created to respond to the weaponry on the walls by Hannah Nagle (Senior Imaging Support Technician), showing flowers blooming from the muzzles of shotguns
Figure 3: Example of images created to respond to the weaponry on the walls by Hannah Nagle (Senior Imaging Support Technician)

 

Social media banner created by Rebecca Harris (Senior Imaging Technician) for International Women’s Day, showing seven different women from the collection
Figure 4: Social media banner created by Rebecca Harris (Senior Imaging Technician) for International Women’s Day

 

Imaging contrast showing insect damage to manuscript, ‘Four treatises on Astronomy’ (Or 8415), with one image of the manuscript page and the other showing just the pinpricks on a black background, created by Renata Kaminska (Digitisation Studio Manager)
Figure 5: Imaging contrast showing insect damage to manuscript, ‘Four treatises on Astronomy’ (Or 8415), created by Renata Kaminska (Digitisation Studio Manager)

 

Figure 6: Behind the scenes visualisations including conservation treatment, created by Sotirios Alpanis (former Head of Digital Operations) and Jordi Clopes-Masjuan (Senior Imaging Technician)

 

Visual narratives made by combining digital images of desert by Melanie Taylor (Senior Imaging Support Technician)
Figure 7: Visual narratives made by combining digital images by Melanie Taylor (Senior Imaging Support Technician)

 

Colourisation of portrait of the Sharif of Mecca, from 1781.b.6/7, using historically accurate colours like gold and dark blue by Daniel Loveday (Senior Imaging Technician)
Figure 8: Colourisation of the portrait of the Sharif of Mecca, from 1781.b.6/7, using historically accurate colours by Daniel Loveday (Senior Imaging Technician)

 

A photo collage showing a creature with one foot, two leafy legs, a maze for a body, and seven heads comprised of flowers, two animal heads and two human heads. By Morgane Lirette (Conservator (Books), Conservation), Tan Wang-Ward (Project Manager, Lotus Sutra Manuscripts Digitisation), Matthew Lee (Imaging Support Technician), Darran Murray (Digitisation Studio Manager), Noemi Ortega-Raventos (Content Specialist, Archivist)
Figure 9: Exquisite Corpse image created by collaging material from different images, including manuscripts from the QDL as well as images from BL Flickr and Instagram. By Morgane Lirette (Conservator (Books), Conservation), Tan Wang-Ward (Project Manager, Lotus Sutra Manuscripts Digitisation), Matthew Lee (Imaging Support Technician), Darran Murray (Digitisation Studio Manager), Noemi Ortega-Raventos (Content Specialist, Archivist). Exquisite Corpse: Head part 1 (QDL), Head part 2 (QDL), Head part 3 (QDL), Head part 4 (QDL), Head part 5 (QDL), torso (Flickr), legs (Flickr), feet (Instagram)

 

Cyanotype Workshops

Matt Lee (Senior Imaging Support Technician), Daniel Loveday (Senior Imaging Technician) and the Imaging Team

Members of the Imaging team have since gone on to develop cyanotype workshops. The photographic printing process of cyanotype uses chemicals and ultraviolet light to create a copy of an image. The team led experiments on the process at one of the Project’s Staff Away Days. After its success, the concept was developed further and workshops were delivered to students at the Camberwell College of Arts. Images from manuscripts on the QDL were used to create cyanotype collages like those displayed below.

Cyanotype created using collage of images of a bird wearing a crown, a man holding two arms, and two fish in a bowl from the QDL, by Matt Lee (Senior Imaging Support Technician)
Figure 10: Cyanotype created using collage of images from the QDL, by Matt Lee (Senior Imaging Support Technician)

 

Cyanotype created using collage of images including women, text, buildings and animals from the QDL, by Louis Allday (Gulf History Cataloguing Manager)
Figure 11: Cyanotype created using collage of images from the QDL, by Louis Allday (Gulf History Cataloguing Manager)

 

Watermarks Project

Jordi Clopes-Masjuan (Senior Imaging Technician), Camille Dekeyser (Conservator), Matt Lee (Senior Imaging Support Technician), Heather Murphy (Conservation Team Leader)

The Watermarks Project is an ongoing collaboration between the Conservation and Imaging Teams. Together they have sought to examine and display watermarks found in our collection items. Starting with the physical items, and figuring out how best to capture them, they have experimented with ways to display the watermarks digitally. The process requires many forms of expertise, but the results facilitate the study and appreciation of the designs.

Two women standing by a book with cameras and tools
Figure 12: Studio setup for capturing the watermarks

 

Animated image showing traditional and translucid view of a manuscript with a watermark highlighted by digital tracing.
Figure 13: Gif image showing traditional and translucid view with watermark highlighted by digital tracing.

 

Addressing Problematic Terms in our Catalogues and Translations Project

Serim Abboushi (Arabic & English Web Content Editor), Mariam Aboelezz (Translation Support Officer), Louis Allday (Gulf History Cataloguing Manager), Sotirios Alpanis (former Head of Digital Operations), John Casey (Cataloguer, Gulf History), David Fitzpatrick (Content Specialist, Archivist), Susannah Gillard (Content Specialist, Archivist), John Hayhurst (Content Specialist, Gulf History), Julia Ihnatowicz (Translation Specialist), William Monk (Cataloguer, Gulf History), Hannah Nagle (Senior Imaging Support Technician), Noemi Ortega-Raventos (Content Specialist, Archivist), Francis Owtram (Content Specialist, Gulf History), Curstaidh Reid (Cataloguer, Gulf History), George Samaan (Translation Support Officer), Tahani Shaban (Translation Specialist), David Woodbridge (Cataloguer, Gulf History), Nariman Youssef (Arabic Translation Manager) and special thanks to the BL BAME Staff Network.

The Addressing Problematic Terms in our Catalogues and Translations Project was joint winner of the 2020 BL Labs Staff Award. It is an ongoing, highly collaborative project inspired by a talk given by Dr Melissa Bennett about decolonising the archive and how to deal with problematic terms found in archive items. Using existing translation tools and a custom-built Python script, the group has been analysing terms that appear in the original language of the documents, and assessing how best to address them in both English and Arabic. This work enables the project to treat problematic terms sensitively and to contextualise them in our catalogue descriptions and translations.
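The project’s own script is not described in detail here, so the following is only a hypothetical illustration of the general idea: a short Python function that finds where terms from a reviewed list occur in a catalogue description, plus a helper that shows each occurrence in context so a person can decide how to treat it. The term list and description are invented placeholders; a real list would be agreed by the team, and Arabic text would need its own handling of prefixes and spelling variants.

```python
import re
from collections import defaultdict


def find_terms(text: str, terms) -> dict:
    """Map each listed term to the start offsets of its whole-word, case-insensitive matches."""
    hits = defaultdict(list)
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits[term].append(match.start())
    return dict(hits)


def in_context(text: str, start: int, width: int = 25) -> str:
    """Return the text around an occurrence so a reviewer can judge how it is used."""
    return text[max(0, start - width): start + width]


# Placeholder term list and description, for illustration only
TERMS = ["placeholder term"]
DESCRIPTION = "A letter quoting a Placeholder Term, followed by a second placeholder term."

for term, offsets in find_terms(DESCRIPTION, TERMS).items():
    for start in offsets:
        print(term, "->", in_context(DESCRIPTION, start))
```

Flagging only locates candidates; deciding whether and how to contextualise a term remains an editorial judgement.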

 

More projects

The work continues with projects that explore how to use and share different methods and technologies. For example, Hannah Nagle has taught us how to collage using digital images (How to make art when we’re working apart), Ellis Meade has created a Bitsy game based in the Qatar National Library that draws you inside a manuscript (‘Hidden world of the Qatar National Library’), and Dr Mariam Aboelezz has used the BL/QFP Translation Memory to analyse how we were using the Arabic Verb Form X (istafʿal) in our translations of catalogue descriptions (‘Investigating Instances of Arabic Verb Form X in the BL/QFP Translation Memory’).
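As a rough idea of how a translation memory can be searched for a verb pattern, the sketch below reads a TMX file (the standard exchange format for translation memories), collects the Arabic segments, and lists words that begin with an optional و or ف followed by است, the letters that open Form X (istafʿal) perfect verbs. This is an assumed, crude heuristic rather than the method used in the analysis: it also catches verbal nouns such as استخدام, misses imperfect forms such as يستخدم and forms with the article, and can pick up unrelated words that happen to start with these letters. The sample TMX is invented.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Crude heuristic (an assumption): optional wa-/fa- prefix, then alif-sin-ta at word start
FORM_X = re.compile(r"\b[وف]?است\w+")


def arabic_segments(tmx: str):
    """Yield the text of every Arabic <seg> in a TMX document."""
    root = ET.fromstring(tmx)
    for tuv in root.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith("ar"):
            seg = tuv.find("seg")
            if seg is not None:
                yield "".join(seg.itertext())


def form_x_candidates(tmx: str) -> list:
    """Return every word in the Arabic segments that matches the heuristic."""
    return [m.group() for text in arabic_segments(tmx) for m in FORM_X.finditer(text)]


# Invented two-segment translation memory
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="example" creationtoolversion="1" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The writer used this term.</seg></tuv>
      <tuv xml:lang="ar"><seg>استخدم الكاتب هذا المصطلح.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>And the work continued.</seg></tuv>
      <tuv xml:lang="ar"><seg>واستمر العمل.</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(form_x_candidates(SAMPLE_TMX))  # ['استخدم', 'واستمر']
```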

Pixelated image of a stick person in front of the Qatar National Library using Bitsy from ‘Hidden world of the Qatar National Library’ blog post by Ellis Meade (Senior Imaging Technician)
Figure 14: Image of the Qatar National Library using Bitsy from ‘Hidden world of the Qatar National Library’ by Ellis Meade (Senior Imaging Technician)

 

We have also made the most of Covid-19 restrictions and working from home to share and learn skills such as coding, Arabic language, and photography. For example, through the Project’s ‘Code Club’, many of us have learnt Python and have written scripts to streamline our tasks. Code written to explore the collections has also had creative outputs, such as Anne Courtney’s project “Making data into sound” (Runner-up, BL Labs Staff Awards, 2020).

The Project’s extraordinary collaborative work demonstrates some of the exciting and innovative ways to engage with library and archival collections. It also makes clear the wider benefits of digitisation and providing free online access to fully bilingual catalogued resources.

You can read about some of our projects in more detail in the blog posts below:

You can read about previous BL/QFP Hack Days in the blog posts below:

This is a guest post by the British Library Qatar Foundation Partnership Project, compiled by Laura Parsons. You can follow the British Library Qatar Foundation Partnership on Twitter at @BLQatar.

29 March 2021

British Library x British Fashion Council Student Fashion Awards


The British Library and the British Fashion Council held a Virtual Awards Ceremony on 22nd March to celebrate the results of the 2021 research-inspired fashion design competition, part of an innovative collaboration between the two organisations. The event showcased examples of what has inspired a new generation of fashion designers, who explored the richness and potential of the Library’s extensive digital collections for a design project based on the themes of ‘Identity’ or ‘Disruption’.

Earlier in the day a winner was selected by a panel of judges composed of respected industry figures: Anna Orsini, Strategic Consultant, British Fashion Council; Dal Chodha, Editor of non-seasonal publication Archivist, Writer & Consultant; Halina Edwards, Researcher, Lead Designer at The Black Curriculum; and the Library’s Daniel Lowe, Curator, Arabic Collections. Nabil El Nayal, Designer and Course Leader MA Fashion Design Technology Womenswear, London College of Fashion, undertook an advisory role for the judging day, and the panel was chaired by Judith Rosser-Davies, Head of Government Relations & Education, Chair Colleges Council, British Fashion Council.

In addition to the Judges' Award, a second award, the Public Award, was voted for by the online audience at the Awards event.

 

The Judges' Award

The winner of the Judges' Award was Adela Babinska, MA Womenswear student at the London College of Fashion. Her submission, ‘NOT’, reflected on the absence of knowledge and on the need for persistent curiosity in our challenging times.

Model wearing Adela Babinska's design, a clear dress with red accents.
Winning entry by Adela Babinska

Adela arrived in London from Slovakia as an international student during lockdown and found herself in the strange new reality of being a London student in Covid times: studying online, meeting her colleagues and tutors online, and responding to our fashion competition without ever visiting the Library. Like many other students, she discovered that access to online resources, including those of the British Library, was limited, and that she could not complete the Library’s registration process during lockdown. In her research Adela would normally search for the answers, but unable to visit the Library and with only limited access to online content, she found herself in a disruptive position, so she decided to explore how she could benefit from this.

In her submission, Adela referred to the feeling she had after reading Waiting for Godot, which prompts many questions but does not provide the answers. Rather than searching for the answers, she reflected that she needed to search for the questions, so she decided to generate as many questions as possible about the Library, including its intangibility during lockdown, and to use these to guide her designs. Her research for this project led her to see that the mere act of not knowing can also be powerful.

Her thinking and questioning included a conceptualisation of the icon image on the Customer Services section of the BL website in relation to impersonality and distance. She also reflected on what accessing the BL Sound archives could bring to her research and referenced a recording of the foyer of the Library in which she could hear voices which made the Library seem more real to her.

As well as producing a fantastic fashion presentation, Adela’s experience and search for the Library amidst the current disruptions provides both a challenging and inspirational view of the Library from a student perspective.

 

The Public Award

The winner of the Public Award is Chiara Lamon, Fashion and Textile Student from Gray’s School of Art, based at the Robert Gordon University in Aberdeen, with her submission ‘Morphe’.

A series of nine sketches showing Chiara Lamon's designs in black, gold and bronze.
Designs from the public award winner Chiara Lamon

 

Chiara analysed the idea of a malleable and dynamic identity through a study of body movement. Her research approach focused on disrupting common ideas of the stillness and singularity of our identity. Using the Library’s online collections, she explored visual imagery, blogs, thematic pages, books and ideas, which she felt had a high impact on the quality and depth of her research. This experience helped Chiara reflect on how she now conducts research: researching opposites, making mistakes and looking for the unexpected, letting the research lead her rather than trying to control it.

Chiara’s research involved exploring the catalogues on futurism and cubism and discovering connections and metaphors in unexpected places, such as using images of geological strata from the Library’s Flickr collection to represent layers of the self, building on the concept of the body multiple.

Through practice-led research, Chiara was able to develop ideas both digitally and physically, working towards a more sustainable practice. Images that Chiara used from the Library’s digital collections included the work of photographer Ethienne Jules, images of geology and dance, and a comparison of a multiple-collar garment from Viktor and Rolf’s Autumn/Winter 2003 collection with an early portrait of a man in a collared shirt.

Chiara abstracted forms from images in the BL collections into shapes applied to the body, creating a multifaceted and morphing representation of the self and our dynamic nature.

 

The Finalists

Eight finalists were shortlisted from a strong field of 111 submissions for this year’s competition.

The six other finalists were:

Jordan Fergusson, Manchester Metropolitan University

Jordan's submission, ‘The Tearlachs’, is based on the theme of disrupting the codes of highland dress and exploring gender. Jordan reflected on Jacobite poetry and song using multiple images, manuscripts and letters from the BL Collection and extracts from a Jacobite Songbook from 1863. The depictions in this book gave Jordan a strong sense of the character of a Tearlach (instigator). Jordan presents the Tearlachs as gender non-conforming trailblazers of the highlands, embodying the Jacobite cause and the romanticism of Scottish history. In designing his collection, Jordan used himself as a blank canvas, building up inspiration and transferring it to 2D and then back to 3D. His research led Jordan to design a ‘warped tartan’ as his own interpretation of a ‘Tearlach Tartan’, inspired by graph-like images of the Aurora Borealis from the BL Collection.

Maria Fernanda Nava Melgar, Royal College of Art

Maria’s submission, ‘The Invisibles’, explored identity through fundamental questions such as what we are made of, how we are perceived and how we are listened to. Her work took inspiration from 18th-century chiaroscuro scientific paintings, medical journals and distorted sound recordings (Touch Radio) from the BL Collection. Images of decay and cells were used to inform the design of her collection. Maria reflected on the relationship between light, space, sound and the human body, and on how the body is information made of layers that are constantly pressed and crushed against each other whilst working together inside a system.

Emma Fraser, Gray’s School of Art, (Robert Gordon University Aberdeen)

Emma’s submission, ‘Repair Yourself’, was based on following a powerful, moving journey as a survivor of sexual assault, and on her message to other survivors that they are not alone. Emma used visual metaphors to explore trauma and recovery, starting with the concept of damage and repair, reflected in her textile work, and then moving on to ideas of comfort and exposure. Another aspect of her work explored the over-sexualisation of women and the idea of femininity through research in the BL archives. Her submission included images from the Library’s Flickr collection Women of the World. Emma intended her work as a representation of the courage and strength that it takes to go on such a journey of recovery, confirming that it was ‘not about the attackers, but about the survivors’.

Kelsey Ann Kasom, Royal College of Art        

Kelsey’s submission, ‘Identity’, is based on the concept of the left side of her brain being her inner child, the ‘past’, and the right side being her creative genius, the ‘present’. Using research and images from the BL Collection, her work focused on what can be created by harmonising these two aspects. Kelsey explored abstract feelings and surreal thought, working to make sense of them three-dimensionally ‘as an extension of the soul outside the body’. Kelsey experimented with the technical aspects of working with organza, creating shapes for the body.

Louise Korner, London College of Fashion

Louise’s submission, ‘The Becoming’, focused on disruption as a design approach, using a destructive force such as fire (by watching objects burning) to reach a greater understanding of fire as a transformative process and also to highlight current issues in the fashion industry. Using 18th-century images and artwork from the BL Collections, including an image of a forest fire, Louise reflected on what is left after the disruption of the landscape and what is regenerated after the fire. Louise took a disruptive approach to searching the BL catalogues, misspelling words and entering words backwards in order ‘to stumble across an interesting recording or sound or art work.’ Her research led her to design a ‘hidden garment within a garment’ that the wearer decides when to reveal, as well as sustainable garments that can be returned to the designer for repurposing.

Cameron Lyall, Gray’s School of Art, (Robert Gordon University Aberdeen)

Cameron’s submission ‘No Place’ is inspired by the theme of identity. It tells the story of the ‘pilgrim of no place’ and their journey, both physical and mental, to understand their own identity. On this journey the pilgrim reflects on history and what they have learnt, applying this to ‘a future forward-thinking attitude.’ Cameron’s concept was a response to his own journey in 2020. His work included images of ancient philosophies, celestial esoterica and astrology from the BL Collection. For his designs, Cameron disassembled and reconstructed garments, some of which took on ‘symbiotic shapes reminiscent of beetles’, repurposing and rebranding them into new concepts.

For any further information, email highereducation@bl.uk.

15 March 2021

Competition to Proofread Bengali Books on Wikisource

Can you read and write in Bangla? Or should I say আপনি কি বাংলা পড়তে এবং লিখতে পারেন? If you were able to read that, congratulations, you are the perfect candidate!

You might be interested in a competition we have launched today asking for help to proofread text that has been automatically transcribed from our historical Bengali books. The competition, in partnership with the West Bengal Wikimedians User Group and the Bengali Wikisource community, will run until 14th April and invites contributors to create perfect transcriptions of the books.

More information is available on the Wikisource competition page, including how to get started and prizes on offer.

The books have been digitised through our Two Centuries of Indian Print project, and more than 25 of them have been uploaded to Wikisource, a free-content online digital library where the digitised books and their transcriptions can be viewed side by side. We were inspired by a talk given by the National Library of Scotland, which has uploaded some of its collections to Wikisource, and thought it could be a useful platform for increasing online access to the textual content of our books too.

 

Above: A view of a Bengali book within the Wikisource platform showing digitised page [R] and transcription [L]

 

Luckily, much of the transcription work has already been done using Google’s Optical Character Recognition (OCR) technology to read the Bengali text. However, the results are not perfect: words from the original books are often misspelled in the OCR output. That’s where we need human intervention to proofread the OCR and fix the mistakes.

We also want to export proofread transcriptions from Wikisource and make them available as a dataset that could prove interesting to researchers who want to mine thousands of pages of text.

The books we would like proofread cover a multitude of topics and include an adaptation of the Iliad, a collection of 19th-century proverbs and sayings, and a work describing the Bratas, fasting ceremonies observed by Hindu women in what is now Bangladesh. So, if you are looking for a literary indulgence whilst at the same time helping to improve access for others to valuable historical material, this could be an ideal opportunity.

 

This post is by Digital Curator Tom Derrick (@TommyID83)

19 February 2021

AURA Research Network Second Workshop Write-up

Keen followers of this blog may remember a post from last December, which shared details of a virtual workshop about AI and Archives: Current Challenges and Prospects of Digital and Born-digital archives. This topic was one of three workshop themes identified by the Archives in the UK/Republic of Ireland & AI (AURA) network, a forum that promotes discussion of how Artificial Intelligence (AI) can be applied to cultural heritage archives and explores issues with providing access to born-digital and hybrid digital/physical collections.

The first AURA workshop, on Open Data versus Privacy, was organised by Annalina Caputo from Dublin City University and took place on 16-17 November 2020. Rachel MacGregor provides a great write-up of this event here.

Here at the British Library, we teamed up with our friends at The National Archives to curate the second AURA workshop, exploring the current challenges and prospects of born-digital archives, which took place online on 28-29 January 2021. The first day, 28 January, was organised by The National Archives, and you can read more about it here. The second day, 29 January, was organised by the BL; videos and slides for it can be found on the AURA blog, and I've included them in this post.

The format for both days of the second AURA workshop comprised four short presentations, two interactive breakout room sessions and a wider round-table discussion. The aim was for the event to generate dialogue around key challenges that professionals across all sectors are grappling with, with a view to identifying possible solutions.

The first day covered issues of access from both infrastructural and user perspectives, plus the ethical implications of applying AI and advanced computational approaches to archival practice and research. The second day discussed challenges of access to email archives, as well as issues relating to web archives and emerging format collections, including web-based interactive narratives. A round-up of the second day is below, including recorded videos of the presentations for anyone unable to attend on the day.

Kicking off day two, a warm welcome was given to workshop attendees by Rachel Foss, Head of Contemporary Archives and Manuscripts at the British Library; Larry Stapleton, senior academic and international consultant from the Waterford Institute of Technology; and Mathieu d’Aquin, Professor of Informatics at the National University of Ireland Galway.

The morning session on Email Archives: challenges of access and collaborative initiatives was chaired by David Kirsch, Associate Professor, Robert H. Smith School of Business, University of Maryland. This featured two presentations:

The first of these was about Working with ePADD: processes, challenges and collaborative solutions in working with email archives, by Callum McKean, Curator for Contemporary Literary and Creative Archives, British Library, and Jessica Smith, Creative Arts Archivist, John Rylands Library, University of Manchester. Their slides can be viewed here and here. Apologies that the recording of Callum's talk is clipped; this was due to connectivity issues on the day.

The second presentation was Finding Light in Dark Archives: Using AI to connect context and content in email collections by Stephanie Decker, Professor of History and Strategy, University of Bristol and Santhilata Venkata, Digital Preservation Specialist & Researcher at The National Archives in the UK.

After their talks, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the morning session were:

  1. Are there any other appraisal or collaborative considerations that might improve our practices and offer ways forward?
  2. What do we lose by emphasizing usability for researchers?
  3. Should we start with how researchers want to use email archives now and in the future, rather than focusing only on preservation?
  4. What potential do email archives have as organizational, not just individual, records?

These questions led to discussions about file formats, collection sizes, metadata standards and ways to interpret large data sets. There was interest in how email archives might allow researchers to reconstruct corporate archives, e.g. to understand the social dynamics of the office and decision-making processes. It was felt that there is a need to understand the extent to which email represents organisation-level context. More questions were raised, including:

  • To what extent is it part of the organisational records and how should it be treated?
  • How do you manage the relationship between constant organisational functions and structures (e.g. a CEO) and changing individuals?
  • Who will be looking at organisational email in the future and how?

It was mentioned that there is a need to distinguish between email as data and email as an artifact, as the use-cases and preservation needs may be markedly different.

The duties of care that exist between depositors, tool designers, archivists and researchers were discussed, along with how we balance these, taking into account:

  • Managing human burden
  • Differing levels of embargo
  • Institutional frameworks

There was discussion of the research potential of comparing email and social media collections (e.g. tweet archives), and of the difficulties researchers face in getting access to data sets. The monetary value of email archives was also raised, and it was mentioned that perceived value hasn’t been translated into monetary value.

Researcher needs and metadata were another topic brought up by attendees. It was suggested that the information about collections in online catalogues needs to be descriptive enough for researchers to decide whether to visit an institution to view digital collections on a dedicated terminal. It was also suggested that archives and libraries need to make access restrictions, and the reasoning for them, very clear to users. This would help to manage expectations, so that researchers know when they need to visit on-site because remote access is not possible. Identifying use cases was acknowledged to be challenging, but it was noted that without a deeper understanding of researcher needs, it can be hard to make decisions about access provision.

It was acknowledged that the demands on human processing are still high for born-digital archives, and that the relationship between tools and professionals is still emerging. This raised the question of whether researchers could be involved more in collaborations, and to what extent they would then take on responsibilities and liabilities in relation to their use of born-digital archives.

Lots of food for thought before the break for lunch!

The afternoon session, chaired by Nicole Basaraba, Postdoctoral Researcher, Studio Europa, Maastricht University, discussed Emerging Formats, Interactive Narratives and Socio-Cultural Questions in AI.

The first afternoon presentation Collecting Emerging Formats: Capturing Interactive Narratives in the UK Web Archive was given by Lynda Clark, Post-doctoral research fellow in Narrative and Play at InGAME: Innovation for Games and Media Enterprise, University of Dundee, and Giulia Carla Rossi, Curator for Digital Publications, British Library. Their slides can be viewed here.  

The second afternoon presentation was Women Reclaiming AI: a collectively designed AI Voice Assistant, by Coral Manton, Lecturer in Creative Computing, Bath Spa University; her slides can be seen here.

Following the same format as in the morning, after these presentations, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the afternoon session were:

  1. Should we be collecting examples of AIs, as well as using AI to preserve collections? What are the implications of this?
  2. How do we get more people to feel that they can ask questions about AI?
  3. How do we use AI to think about the complexity of what identity is and how do we engineer it so that technologies work for the benefit of everyone?

There was general consensus that AI is becoming a significant and pervasive part of our lives. However, it was felt that there are many aspects of it we don't fully understand. In the breakout groups, workshop participants raised more questions, including:

  • Where would AI-based items sit in collections?
  • Why do we want it?
  • How to collect?
  • What do we want to collect? User interactions? The underlying technology? Many are patented technologies owned by corporations, so this makes it challenging. 
  • What would make AI more accessible?
  • Some research outputs may be AI-based - do we need to collect all the code, or just the end experience produced? If the latter, could this be similar to documenting evidence e.g. video/sound recordings or transcripts.
  • Could or should we use AI to collect? Who’s behind the AI? Who gets to decide what to archive and how? Who’s responsible for mistakes/misrepresentations made by the AI?

There was debate about how to define AI as a publication or collection item; it was felt that an understanding of this would help to decide what archives and libraries should be collecting, and to understand what is not currently being collected. User input was mentioned as a critical factor in answering questions like this. A number of challenges of collecting using AI were raised in the group discussions, including:

  • Lack of standardisation in formats and metadata
  • Questions of authorship and copyright
  • Ethical considerations
  • Engagement with creators/developers

It was suggested that full-scale automation is not entirely desirable, and that some kind of human element is required for specialist collections. However, AI might be useful for speeding up manual human work.

There was discussion of bias in data: existing prejudices are baked into datasets and algorithms. This led to further questions:

  • Is there a role for curators in defining and designing unbiased and more representative data sets to more fairly reflect society?
  • Should archives collect training data, to understand underlying biases?
  • Who is the author of AI-created text and dialogue? Who is the legally responsible person/organisation?
  • What opportunities are there for libraries and archives to teach people about digital safety through understanding datasets and how they are used?

Participants also questioned:

  • Why do we humanise AI?
  • Why do we give AI a gender?
  • Is society ready for a genderless AI?
  • Could the next progress in AI be a combination of human/AI? A biological advancement? Human with AI “components” - would that make us think of AIs as fallible?

With so many questions and a lack of answers, it was felt that fiction may also help us to better understand some of these issues. Rachel Foss ended the roundtable discussion by saying that she is looking forward to reading Kazuo Ishiguro’s new novel Klara and the Sun, due to be published next month by Faber, about an artificial being called Klara who longs to find a human owner.

Thanks to everyone who spoke at and participated in this AURA workshop for making it a lively and productive event. Extra special thanks to Deirdre Sullivan for helping to run the online event smoothly. Looking ahead, the third workshop on Artificial Intelligence and Archives: What comes next? is being organised by the University of Edinburgh in partnership with the AURA project team, and is scheduled to take place on Tuesday 16 March 2021. Please do join the AURA mailing list and follow #AURA_network on social media to be part of the network's ongoing discussions.

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

11 February 2021

Investigating Instances of Arabic Verb Form X in the BL/QFP Translation Memory

The Arabic language has a root+pattern morphology where words are formed by casting a (usually 3-letter) root into a morphological template of affixed letters at the beginning, middle and/or end of the word. While most of the meaning comes from the root, the template itself adds a layer of meaning. For our latest Hack Day, I investigated uses of Arabic Verb Form X (istafʿal) in the BL/QFP Translation Memory.
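To make the root+pattern idea concrete, here is a minimal Python sketch. It assumes the conventional placeholder letters ف, ع and ل for the three root consonants and a sound (regular) 3-letter root; the function name `cast_root` is hypothetical, and real Arabic morphology (weak roots, assimilation, vowelling) is considerably messier than simple substitution.

```python
# Placeholder letters conventionally used for the three root consonants
# in Arabic morphological templates: ف (f), ع (ʿ) and ل (l), as in فعل.
PLACEHOLDERS = ("ف", "ع", "ل")


def cast_root(root: str, template: str) -> str:
    """Fill a template such as 'استفعل' with the consonants of a 3-letter root.

    Characters are mapped one at a time, so a root letter that is itself a
    placeholder (e.g. the ع in عمر) is never substituted twice, as it would
    be with successive str.replace calls.
    Assumes a sound triliteral root; weak roots (with و or ي) are not handled.
    """
    if len(root) != 3:
        raise ValueError("expected a 3-letter root")
    mapping = dict(zip(PLACEHOLDERS, root))
    # Affix letters (ا, س, ت, ي ...) are not in the mapping and pass through unchanged.
    return "".join(mapping.get(ch, ch) for ch in template)


if __name__ == "__main__":
    for template in ("استفعل", "يستفعل", "استفعال"):
        print(template, "->", cast_root("عمر", template))
```

With the root ع-م-ر, the three Form X templates yield استعمر, يستعمر and استعمار, the written forms behind istaʿmar, yastaʿmir and istiʿmār ('colonisation').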

I chose this verb form because it conveys the meaning of seeking or acquiring something for oneself, possibly by force. It is a transitive verb form where the subject may be imposing something on the object and can therefore convey subtle power dynamics. For example, it is the form used to translate words such as ‘colonise’ (yastaʿmir) and ‘enslave’ (yastaʿbid). I wanted to get a sense of whether this form could reflect unconscious biases in our translations – an extension of our work in the BLQFP team to address problematic language in cataloguing and translation.

The other reason I chose this verb form is that it is achieved by affixing three consonants to the beginning of the word, which made it possible to search for in our Translation Memory (TM). The TM is a bilingual corpus, stretching back to 2014, of the catalogue descriptions we translate for the digitised India Office Records and Arabic scientific manuscripts on the QDL. We access the TM through our translation management system (memoQ), which offers some basic search functionalities. This includes a ‘wild card’ option where the results list all the words that begin with the three Form X consonants under investigation (است* and يست*).

Figure 1: Snippet of results in memoQ using the wildcard search function.
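The wildcard search itself can be approximated outside memoQ. The Python sketch below is an assumption-laden stand-in, not memoQ's actual matching logic: it treats any run of Arabic letters and diacritics as a word and keeps those starting with است or يست. The function name `wildcard_matches` and the sample sentence are invented for illustration.

```python
import re

# A "word" here is any run of characters from U+0621 (hamza) to U+0652 (sukun):
# the basic Arabic letters plus tatweel and the short-vowel diacritics.
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")

# The two prefixes searched for: است (perfective / verbal noun) and يست (imperfective).
FORM_X_PREFIXES = ("است", "يست")


def wildcard_matches(text: str) -> list[str]:
    """Return every word that begins with است or يست, in text order.

    Roughly equivalent to the wildcard queries است* and يست*. Like any plain
    prefix match, it misses words with an attached prefix such as و ('and')
    or ال ('the'), and it cannot tell Form X verbs from other words that
    happen to start with these letters.
    """
    return [w for w in ARABIC_WORD.findall(text) if w.startswith(FORM_X_PREFIXES)]


if __name__ == "__main__":
    # Invented sentence: "The agent received the letter, then used the strategy."
    sample = "استلم الوكيل الرسالة ثم استعمل الاستراتيجية"
    print(wildcard_matches(sample))
```

On the sample sentence this returns استلم and استعمل: the first is exactly the kind of false positive discussed next, while الاستراتيجية slips through only because the article ال precedes it.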

 

My initial search using these two 3-letter combinations returned 2,140 results. I noticed that there were some recurring false positives, such as certain place names and the Arabic loanword for ‘strategy’ (istrātījiyyah). The most recurring false positive (699 counts), however, was the Arabic verb for ‘receive’ (istalam), which is unsurprising given the frequent references to correspondence being sent and received in catalogue descriptions of IOR files. What makes this verb a false positive is that the ‘s’ is in fact a root consonant, so the verb actually belongs to Form VIII (iftaʿal).
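Removing the 699 instances of istalam alone leaves 1,441 of the 2,140 results, before place names and loanwords are also taken out. A minimal sketch of that filtering step, assuming a hand-maintained list of excluded prefixes (both the list and the function `split_false_positives` are hypothetical), might look like this. A prefix-based exclusion is blunt, so each excluded group would still deserve a quick manual check.

```python
from collections import Counter

# Hypothetical exclusion list for illustration. استلم/يستلم ('receive') and its
# verbal noun استلام are Form VIII of the root س-ل-م: the س belongs to the root,
# so they only look like Form X. استراتيج covers the loanword for 'strategy'.
FALSE_POSITIVE_PREFIXES = ("استلم", "يستلم", "استلام", "استراتيج")


def split_false_positives(matches: list[str]) -> tuple[list[str], Counter]:
    """Separate wildcard hits into kept words and a count of excluded ones."""
    kept: list[str] = []
    excluded: Counter = Counter()
    for word in matches:
        # First exclusion prefix the word starts with, or None if it looks genuine.
        hit = next((p for p in FALSE_POSITIVE_PREFIXES if word.startswith(p)), None)
        if hit is None:
            kept.append(word)
        else:
            excluded[hit] += 1
    return kept, excluded
```

Keeping a per-prefix count, rather than silently dropping words, makes it easy to report figures such as the 699 instances of istalam.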

After eliminating these false positives, I ended up with 1,349 matches. From these, I identified 55 unique verbs used in relation to IOR files. I then conducted a more targeted search for three cases of each verb: the perfective (past) istafʿal, the imperfective (present) yastafʿil, and the verbal noun istifʿāl. I used the wildcard function again to capture variations of these cases with suffixes attached (e.g. pronoun or plural suffixes). Although they would have been useful too, I did not look for the active (mustafʿil) and passive (mustafʿal) participles, because the single short vowel that differentiates them is rarely represented in Arabic writing. Close scrutiny of the context of each result would have been needed to assign them correctly, and I did not have enough time for that in a single day.

Figure 2: List of the Form X verbs found in the TM and their frequency (excluding six verbs that only occur once)
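The targeted search for the three cases can be sketched with regular expressions, one per case, each ending in a wildcard for suffixes. This assumes a sound triliteral root; weak roots such as the w-l-y root of istawlá (استولى, verbal noun استيلاء) change shape and would need hand-written patterns. The functions `form_x_patterns` and `count_cases` are hypothetical helpers, not part of memoQ.

```python
import re

# Any trailing Arabic letters/diacritics, standing in for a trailing * wildcard.
SUFFIX = r"[\u0621-\u0652]*"


def form_x_patterns(root: str) -> dict[str, re.Pattern]:
    """Regexes for three Form X cases of a sound (regular) triliteral root.

    Weak roots (containing و or ي) change shape in Form X and are not covered.
    """
    c1, c2, c3 = root  # the three root consonants; raises ValueError if len(root) != 3
    return {
        "perfective": re.compile(f"است{c1}{c2}{c3}{SUFFIX}"),    # istafʿal
        "imperfective": re.compile(f"يست{c1}{c2}{c3}{SUFFIX}"),  # yastafʿil
        "verbal_noun": re.compile(f"است{c1}{c2}ا{c3}{SUFFIX}"),  # istifʿāl
    }


def count_cases(words: list[str], root: str) -> dict[str, int]:
    """Count whole-word matches per case; words with a prefix such as ال are not counted."""
    patterns = form_x_patterns(root)
    return {case: sum(1 for w in words if p.fullmatch(w)) for case, p in patterns.items()}


if __name__ == "__main__":
    words = ["استعمر", "استعمروا", "يستعمرون", "استعمار", "الاستعمار", "مستعمرة"]
    print(count_cases(words, "عمر"))
```

For this invented word list the counts are 2 perfective, 1 imperfective and 1 verbal noun. الاستعمار is missed because of the article ال, and مستعمرة ('colony') starts with the participle prefix م, which the three patterns deliberately leave out.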

 

I made a note of the original English term(s) that each Form X verb was used to translate. I then identified seven potentially problematic verbs that required further investigation. These seven verbs typically convey an action that is being either forcefully or wrongfully imposed.

Figure 3: Seven potentially problematic verbs that take Form X in the TM

 

My next step was to investigate the use of these verbs in context more closely. I looked at the most frequent of these verbs (istawlá/yastawlī/istīlaʾ) in our TM, first using the source + target view, and then the three-column concordance view of the target text. The first view allowed me to scrutinise how we have been employing this verb vis-à-vis the original term used in the English catalogue description. It struck me that, in some cases, more neutral verbs such as ‘take’ and ‘take possession of’ were used on the English side, meaning that the bias was introduced during translation.

Figure 4: Source + target view of concordance results for the verb istawlá

 

The second view makes it possible to see the text immediately preceding and following the verb, typically displaying the assigned subject and object of the verb. It therefore shows who is doing what to whom more clearly, even though the script direction goes a bit awry for Arabic. Here, I noticed that the subjects were disproportionately non-British: it is overwhelmingly native rulers and populations, ‘pirates’, and rival countries who were doing the forceful or wrongful taking in the results. This may indicate an unconscious bias that has travelled from the primary sources to the catalogue descriptions and is something that requires further investigation.

Figure 5: Three-column view of concordance results for the verb istawlá
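A three-column concordance of this kind is usually called a keyword-in-context (KWIC) view. The sketch below is a simplified illustration, assuming whitespace-separated words and an exact-match set of verb forms; it is not how memoQ builds its concordance, and the function `kwic` and the sample sentence are invented.

```python
def kwic(text: str, keywords: set[str], width: int = 3) -> list[tuple[str, str, str]]:
    """Three-column keyword-in-context rows: (before, keyword, after).

    'before' and 'after' follow the logical (typing) order of the text.
    Arabic is displayed right to left, so on screen the 'before' column sits
    to the right of the keyword, which may help explain why mixed-direction
    concordance tables can look scrambled.
    """
    words = text.split()
    rows = []
    for i, word in enumerate(words):
        if word in keywords:
            before = " ".join(words[max(0, i - width):i])
            after = " ".join(words[i + 1:i + 1 + width])
            rows.append((before, word, after))
    return rows


if __name__ == "__main__":
    # Invented sentence: "The pirates seized the ship, then the ruler seized the city."
    sample = "استولى القراصنة على السفينة ثم استولى الحاكم على المدينة"
    for row in kwic(sample, {"استولى", "يستولي", "استيلاء"}, width=2):
        print(row)
```

Naming the columns before/after rather than left/right keeps the logic independent of display direction; in this sample the subject of each استولى (القراصنة, الحاكم) lands in the 'after' column, mirroring how the three-column view surfaces who is doing the taking.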

 

My hack day investigation was conducted in the spirit of continuous reflection on and improvement of our translation process. Using a verb form rather than specific words as a starting point provided an aggregate view of our practices, which is useful in trying to tease out how the descriptions on the QDL may collectively convey an overall stance or attitude. My investigation also demonstrates the value of our TM, not only for facilitating and maintaining consistency in translation, but as a research tool with countless possibilities. My findings from the hack day are naturally rough-and-ready, but they provide the seed for further conversations about problematic language and unconscious bias among translators and cataloguers.

This is a guest post by linguist and translator Dr Mariam Aboelezz (@MariamAboelezz), Translation Support Officer, BL/QFP Project