THE BRITISH LIBRARY

Digital scholarship blog

219 posts categorized "Digital scholarship"

14 April 2021

Wrangling Wikidata With #1lib1ref 2021

Since starting at the Library at the beginning of March, one of the highlights of my working week has been meeting with the IFLA Wikidata and Wikibase Working Group. IFLA is the International Federation of Library Associations, a global body representing the interests of libraries worldwide.

This working group ‘aims to coordinate actions, events and preparation of documents to leverage Wikidata and Wikibase in support of documenting collections and support capacity building in linked data, structured data, and cataloguing work’. For the last six weeks, Digital Curator Stella Wisdom and I have been working alongside collaborators from locations as disparate as Jerusalem, New York and Toronto, amongst others, to prepare materials, events and opportunities for the upcoming #1lib1ref campaign.

The next #1Lib1Ref runs from 15th May to 5th June 2021. This campaign, run by the Wikimedia Foundation, invites library staff and patrons to improve the reliability of sources in Wikipedia. Using the philosophy of ‘1 librarian, 1 reference’ the campaign focusses on filling in the gaps of missing references – if just one person adds just one reference, think of what we could do collectively! Full information can be found at the #1Lib1Ref Wikimedia page.

Ahead of this upcoming #1Lib1Ref, IFLA’s Wikidata and Wikibase Working Group are offering a Train-The-Trainers workshop, for up to 50 participants, on Wednesday 21st April at 16:00 CET. This training session, run by Meg Wacha of City University, New York, will show participants how to set up an event to contribute to Wikidata during the #1Lib1Ref campaign period. More details and registration for this online event can be found on the IFLA website.

The group will provide resources that can help you learn how to edit Wikidata, and demonstrate the advantages that Wikidata provides for library collections. They will also be holding weekly informal online drop-in office hours throughout May and early June, in which participants can seek advice and guidance from experienced Wikimedians. The British Library will be hosting one of these virtual drop-in office hour sessions on Wednesday 5th of May at 3pm BST. Do follow this blog and @BL_Wikimedian on Twitter for details, when they are available, of how to join these office hours.

We hope to see you there!

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian)

29 March 2021

British Library x British Fashion Council Student Fashion Awards

The British Library and the British Fashion Council held a Virtual Awards Ceremony on 22nd March to celebrate the results of the 2021 research-inspired fashion design competition, part of an innovative collaboration between the two organisations. The event showcased examples of what has inspired a new generation of fashion designers, who have explored the richness and potential of the Library’s extensive digital collections for a design project based on themes of ‘Identity’ or ‘Disruption’.

Earlier in the day a winner was selected by a panel of judges comprising respected industry individuals, including Anna Orsini, Strategic Consultant, British Fashion Council; Dal Chodha, Editor of non-seasonal publication Archivist, Writer & Consultant; Halina Edwards, Researcher, Lead Designer at The Black Curriculum; and the Library’s Daniel Lowe, Curator, Arabic Collections. Nabil El Nayal, Designer and Course Leader MA Fashion Design Technology Womenswear, London College of Fashion, undertook an advisory role for the judging day, and the panel was chaired by Judith Rosser-Davies, Head of Government Relations & Education, Chair Colleges Council, British Fashion Council.

In addition to the Judges' Award, a second award, the Public Award, was voted for by the online audience at the Awards event.

The Judges' Award

The winner of the Judges' Award was Adela Babinska, MA Womenswear Student at the London College of Fashion. Her submission ‘NOT’ reflected on the absence of knowledge and on the need for persisting curiosity in our challenging times.

Model wearing Adela Babinska's design, a clear dress with red accents.
Winning entry by Adela Babinska

Adela arrived in London from Slovakia during lockdown as an international student and found herself in the strange new reality of being a London student in Covid times: studying online, meeting her colleagues and tutors online, and responding to our fashion competition without ever visiting the Library. Like many other students, she discovered that access to online resources, including at the British Library, was limited and that she could not complete the Library’s registration process during the lockdown. Normally in her research Adela would search for answers, but unable to visit the Library and with only limited access to online content, she found herself in a disruptive position, so she decided to explore how she could benefit from this.

In her submission, Adela referenced the same emotion that she felt after reading Waiting for Godot – which prompts many questions, but does not provide the answers. Rather than search for the answers she reflected that she needed to search for the questions and decided to generate as many queries as possible about the Library and to use these to guide her designs, including the intangibility of the Library during lockdown. Her research for this project led her to see that the mere act of not knowing can also be powerful.

Her thinking and questioning included a conceptualisation of the icon image on the Customer Services section of the BL website in relation to impersonality and distance. She also reflected on what accessing the BL Sound archives could bring to her research and referenced a recording of the foyer of the Library in which she could hear voices which made the Library seem more real to her.

Adela produced a fantastic fashion presentation, and her experience and search for the Library amidst the current disruptions provide a view of the Library from a student perspective that is both challenging and inspirational.

The Public Award

The winner of the Public Award is Chiara Lamon, Fashion and Textile Student from Gray’s School of Art, based at the Robert Gordon University in Aberdeen, with her submission ‘Morphe’.

A series of nine sketches showing Chiara Lamon's designs in black, gold and bronze.
Designs by Public Award winner Chiara Lamon

Chiara analysed the idea of a malleable and dynamic identity through a study of body movement. Her research approach focused on disrupting common ideas of the stillness and singularity of our identity. Using the Library’s online collections, she explored visual imagery, blogs, thematic pages, books and ideas which she considered had a high impact on the quality and depth of her research. This experience helped Chiara reflect on how she now conducts research: researching opposites, making mistakes and looking for the unexpected, letting the research lead her rather than trying to control it.

Chiara’s research involved exploring the catalogues on futurism and cubism and discovering connections and metaphors in unexpected places, such as using images of geological strata from the Library’s Flickr collection to represent layers of the self, building on the concept of the body multiple.

Engaging with practice-led research, Chiara was able to explore ideas through both digital and physical development for a more sustainable practice. Images that Chiara used from the Library’s digital collections included the work of photographer Ethienne Jules, images from geology and dance, and a comparison of a multiple-collar garment from Viktor and Rolf’s Autumn/Winter 2003 collection paired with an early portrait of a man in a collared shirt.

Chiara abstracted images from BL collections into shapes to apply to the body, creating a multifaceted and morphing representation of the self and our dynamic nature.

The Finalists

Eight finalists were shortlisted from a strong field of 111 submissions for this year’s competition.

The six other finalists were:

Jordan Fergusson, Manchester Metropolitan University

Jordan's submission ‘The Tearlachs’ is based on a theme of disrupting the codes of highland dress and exploring gender. Jordan reflected on Jacobite poetry and song using multiple images, manuscripts and letters from the BL Collection and extracts from a Jacobite Songbook from 1863. The depictions in this book gave Jordan a strong sense of the character of a Tearlach (instigator). Jordan presents the Tearlachs as gender non-conforming trailblazers of the highlands, embodying the Jacobite cause and romanticism of Scottish History. In designing his collection, Jordan used himself as a blank canvas to build up inspiration, transferring it to 2D then back to 3D. His research led Jordan to design a ‘warped tartan’ as his own interpretation of a ‘Tearlach Tartan’, inspired by graph-like images of the Aurora Borealis from the BL Collection.

Maria Fernanda Nava Melgar, Royal College of Art

Maria’s submission ‘The Invisibles’ explored identity through fundamental questions such as what we are made of, how we are perceived and how we are listened to. Her work took inspiration from chiaroscuro scientific paintings from the 18th Century, medical journals and distorted sound recordings (Touch Radio) from the BL Collection. Images of decay and cells were used to inform the design of her collection. Maria reflected on the relationship between light, space and sound and the human body, and on how the body is information made out of layers that are constantly being pressed and crushed against each other whilst working together inside a system.

Emma Fraser, Gray’s School of Art (Robert Gordon University Aberdeen)

Emma’s submission ‘Repair Yourself’ was based on a powerful, moving journey as a survivor of sexual assault and her message to other survivors that they are not alone. Emma used visual metaphors to explore the idea of trauma and recovery, initially starting with the concept of damage and repair, reflected through her textile work, and then through ideas of comfort and exposure. Another aspect of her work was exploring the over-sexualising of women and the idea of femininity by researching the BL archives. Her submission included images from the Library’s Flickr collection Women of the World. Emma intended her work as a representation of the courage and strength that it takes to go on such a journey of recovery, confirming that it was ‘not about the attackers, but about the survivors’.

Kelsey Ann Kasom, Royal College of Art        

Kelsey’s submission ‘Identity’ is based on the concept of the left side of her brain being her inner child (the ‘past’) and the right side of her brain being her creative genius (the ‘present’). Using research and images from the BL Collection, her work focused on what can be created from harmonising these two aspects. Kelsey explored abstract feelings and surreal thought, working to make sense of them three-dimensionally ‘as an extension of the soul outside the body’. Kelsey experimented with the technical aspects of working with organza, creating shapes for the body.

Louise Korner, London College of Fashion

Louise’s submission ‘The Becoming’ focused on the theme of disruption as a design approach, using a destructive force such as fire (by watching objects burning) to lead to a greater understanding of fire as a transformative process and also to highlight current issues surrounding the fashion industry. Using images and artwork from the 18th century from BL Collections, including an image of a forest fire, Louise reflected on what is left after the disruption of the landscape and what is regenerated after the fire. Louise used a disruptive approach to her research of the BL catalogues by misspelling words and entering words backwards in order ‘to stumble across an interesting recording or sound or art work.’ Her research led to her designing a ‘hidden garment within a garment’ that the wearer decides when to reveal, and sustainable garments that can be returned to the designer for repurposing.

Cameron Lyall, Gray’s School of Art (Robert Gordon University Aberdeen)

Cameron’s submission ‘No Place’ is inspired by a theme of identity. It tells a story of the ‘pilgrim of no place’ and their journey, both physically and mentally, to understand their own identity. This journey involves the pilgrim reflecting on history and what they have learnt and applying this to ‘a future forward-thinking attitude.’ Cameron’s concept was in response to his own journey in 2020. His work included images from ancient philosophies, celestial esoterica and astrology from the BL Collection. For his designs, Cameron disassembled and reconstructed garments, some of which took on ‘symbiotic shapes reminiscent of beetles’, repurposing and rebranding these garments into new concepts.

For any further information, email highereducation@bl.uk.

24 March 2021

Welcome to the British Library’s new Wikimedian in Residence

Hello, I’m Dr Lucy Hinnie and I’ve just joined the Digital Scholarship team as the new Wikimedian-in-Residence, in conjunction with Wikimedia UK and the Eccles Centre. My role is to work with the Library to develop and support colleagues with projects using Wikidata, Wikibase and Wikisource.

Bringing underrepresented people and marginalised communities to the fore is a huge part of this remit, and I am looking to be as innovative as possible in our partnerships, with a view to furthering the movement towards decolonisation. I’m going to be working with curators and members of staff throughout the Library to identify and progress opportunities to accelerate this work.

I have recently returned from a two-year stay in Canada, where I lived and worked on Treaty Six territory and the homeland of the Métis. Working and living in Saskatchewan was a hugely formative experience for me, and highlighted the absolute necessity of forward-thinking, reconciliatory work in decolonisation.

Picture of two black bear sculptures in the snow at Wanuskewin Heritage Park
Wanuskewin Heritage Park, Saskatoon, December 2020

2020 was my year of immersion in Wikimedia – I participated in a number of events, including outreach work by Dr Erin O’Neil at the University of Alberta, Women in Red edit-a-thons with Ewan McAndrew at the University of Edinburgh and the Unfinished Business edit-a-thon run by Leeds Libraries and the British Library. In December 2020 I coordinated and ran my own Wikithon in conjunction with the National Library of Scotland, as part of my postdoctoral project ‘Digitising the Bannatyne MS’.

Page from the Bannatyne Manuscript, stating 'heir begynnys ane ballat buik [writtin] in the yeir of god 1568'
Front page of the Bannatyne MS, National Library of Scotland, Adv MS 1.1.6. (CC BY 4.0)

Since coming into post at the start of this March I have worked hard to make connections with organisations such as IFLA, Code the City and Art+Feminism. I’ve also been creating introductory materials to engage audiences with Wikidata, and thinking about how best to utilise the coming months.

Andrew Gray took up post as the first British Library Wikipedian in Residence nearly ten years ago; you can read more about this earlier residency here and here. Much has changed since then, but reflection on the legacy of Wikimedia activity is a crucial part of ensuring that the work we do is useful, engaging, vibrant and important. I want to use creative thinking to produce output that opens up BL digital collections in relevant, culturally sensitive and engaging ways.

I am excited to get started! I’ll be blogging here regularly about my residency, so please do subscribe to this blog to follow my progress.

This post is by Wikimedian in Residence Lucy Hinnie (@BL_Wikimedian)

19 March 2021

The game was ne'er so fair

The works and worlds created by Shakespeare have an enduring appeal: his writing resonates emotionally with audiences today, despite being written over four hundred years ago. This week is Shakespeare Week, and today is also the first day of the London Games Festival, so it is a perfect time to reflect on some interactive digital adaptations of the bard's plays.

Back in 2016 here in the British Library we ran a Shakespeare-themed Off the Map competition, which set students the task of creating video games and virtual interactive environments using digitised British Library items, including maps, views, texts, book illustrations and recorded sounds, as creative inspiration. The first-place entry, by Team Quattro from De Montfort University in Leicester, was an adaptation of The Tempest; you can see a flythrough video clip of their stunning work here.

In this competition, Tom Battey, who was then a student at the London College of Communication, was awarded second place with a game called Midsummer, based on the characters in the play A Midsummer Night’s Dream, which used digitised engravings from John Boydell’s Shakespeare Gallery. This is a clever work, set in a magical woodland, where trees and bushes generate as the player wanders through the game. The player has the power to enchant and disenchant characters they meet to fall in love with each other, or not! The dialogue between these characters then changes depending on whether they are lovestruck. You can watch a demo of this game here.

cake decorated with image from opening screen of MissionMaker Macbeth
Cake from MissionMaker Macbeth launch event at the British Library in 2019

Another digital Shakespeare project, which the Library has been involved in, is MissionMaker Macbeth, a game-authoring tool, developed by the MAGiCAL team, from D.A.R.E. enterprise at the UCL Knowledge Lab, which was launched at the British Library for the London Games Festival in 2019. Built using Unity, this software incorporates characters, landscapes, objects and even cauldron ingredients for children to make digital games based on Shakespeare’s Macbeth. For anyone wanting to read more about this project, Andrew Burn has written a book Literature, Videogames and Learning, which is due to be published on 20th July 2021.

If virtual woodland walks are your thing, although not Shakespeare related, you may want to explore Faint Signals, by Invisible Flock. This is an interactive website where you can wander through the woodland as it changes through all four seasons and evolves from day to night. Also, if you are reading this in time, you may be able to catch a live online performance of Dream, by the Royal Shakespeare Company. This 50-minute online event, inspired by A Midsummer Night’s Dream, is set in a virtual midsummer forest and offers participants a unique opportunity to directly influence the live performance. Read more about this here.

Literature-loving games makers may want to take part in Leeds Libraries’ upcoming online Novels That Shaped Our World Games Jam, which is running on Saturday 24th and Sunday 25th April 2021. This jam invites people to create games inspired by the BBC’s 100 Novels That Shaped Our World. They have planned an inspiring programme of online events connected to this jam; tickets will be available from Monday 22nd March 2021 from their Eventbrite page: leeds-libraries.eventbrite.com.

Leeds Libraries games jam events programme

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

15 March 2021

Competition to Proofread Bengali Books on Wikisource

Can you read and write in Bangla? Or should I say আপনি কি বাংলা পড়তে এবং লিখতে পারেন? If you were able to read that, congratulations, you are the perfect candidate!

You might be interested in a competition we have launched today asking for help to proofread text that has been automatically transcribed from our historical Bengali books. The competition, in partnership with the West Bengal Wikimedians User Group, and the Bengali Wikisource community, will run until 14th April and invites contributors to create perfect transcriptions of the books.  

More information is available on the Wikisource competition page, including how to get started and prizes on offer.

The books have been digitised through our Two Centuries of Indian Print project, with more than 25 uploaded to Wikisource, an online and free-content digital library where it is possible to view the digitised books and corresponding transcriptions side-by-side. We were inspired by a talk given by the National Library of Scotland who uploaded some of their collections to Wikisource, and thought it could be a useful platform to increase online access to the textual content in our books too.

 

Above: A view of a Bengali book within the Wikisource platform showing digitised page [R] and transcription [L]

Luckily, a lot of the transcription work has already been done by using Google’s Optical Character Recognition (OCR) technology to read the Bengali text. However, the results are not perfect, with words in the original books often misspelled in the OCR. That’s where we need human intervention to proofread the OCR and fix the mistakes.

We also want to export proofread transcriptions from Wikisource and make them available as a dataset that could prove interesting to researchers who want to mine thousands of pages of text.
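As a rough illustration of how a dataset of proofread text might be used, the Python sketch below lists the words that differ between a raw OCR line and its proofread version. It is a minimal sketch under stated assumptions rather than a description of the project's workflow: the function name `ocr_corrections` and the sample strings are hypothetical, and it relies only on `difflib` from the Python standard library.

```python
from difflib import SequenceMatcher


def ocr_corrections(ocr_text: str, proofread_text: str) -> list[tuple[str, str]]:
    """Return (ocr_words, corrected_words) pairs where the two texts differ.

    Hypothetical helper for illustration only. Both texts are split on
    whitespace, so it works for any script that separates words with spaces,
    including Bengali.
    """
    ocr_words = ocr_text.split()
    good_words = proofread_text.split()
    # autojunk=False stops difflib from ignoring very frequent words
    # when comparing long pages.
    matcher = SequenceMatcher(a=ocr_words, b=good_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            changes.append(
                (" ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    # Invented sample line: the OCR has misread two words.
    raw = "the qnick brown fox jnmps"
    fixed = "the quick brown fox jumps"
    print(ocr_corrections(raw, fixed))
```

Comparing at word level keeps the output readable for proofreaders; a character-level comparison would be an alternative if the aim were to measure OCR accuracy instead.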

The books we would like proofread cover a multitude of topics and include an adaptation of the Iliad, a book containing a collection of 19th century proverbs and sayings, and a work describing the Bratas fasting ceremonies observed by the Hindu women of what is now Bangladesh. So, if you are looking for a literary indulgence whilst at the same time helping to improve access for others to valuable historical material, this could be an ideal opportunity.

This post is by Digital Curator Tom Derrick (@TommyID83)

19 February 2021

AURA Research Network Second Workshop Write-up

Keen followers of this blog may remember a post from last December, which shared details of a virtual workshop about AI and Archives: Current Challenges and Prospects of Digital and Born-digital archives. This topic was one of three workshop themes identified by the Archives in the UK/Republic of Ireland & AI (AURA) network, which is a forum promoting discussions on how Artificial Intelligence (AI) can be applied to cultural heritage archives and exploring issues with providing access to born digital and hybrid digital/physical collections.

The first AURA workshop, on Open Data versus Privacy, organised by Annalina Caputo from Dublin City University, took place on 16-17 November 2020. Rachel MacGregor provides a great write-up of this event here.

Here at the British Library, we teamed up with our friends at The National Archives to curate the second AURA workshop, exploring the current challenges and prospects of born-digital archives; this took place online on 28-29 January 2021. The first day of the workshop, 28 January, was organised by The National Archives (you can read more about this day here), and the following day, 29 January, was organised by the BL. Videos and slides for the second day can be found on the AURA blog, and I've included them in this post.

The format for both days of the second AURA workshop comprised four short presentations, two interactive breakout room sessions and a wider round-table discussion. The aim was for the event to generate dialogue around key challenges that professionals across all sectors are grappling with, with a view to identifying possible solutions.

The first day covered issues of access from both infrastructural and user perspectives, plus the ethical implications of the use of AI and advanced computational approaches to archival practices and research. The second day discussed challenges of access to email archives, and also issues relating to web archives and emerging format collections, including web-based interactive narratives. A round-up of the second day is below, including recorded videos of the presentations for anyone unable to attend on the day.

Kicking off day two, a warm welcome to the workshop attendees was given by Rachel Foss, Head of Contemporary Archives and Manuscripts at the British Library, Larry Stapleton, Senior academic and international consultant from the Waterford Institute of Technology and Mathieu d’Aquin, Professor of Informatics at the National University of Ireland Galway.

The morning session on Email Archives: challenges of access and collaborative initiatives was chaired by David Kirsch, Associate Professor, Robert H. Smith School of Business, University of Maryland. This featured two presentations:

The first of these was about Working with ePADD: processes, challenges and collaborative solutions in working with email archives, by Callum McKean, Curator for Contemporary Literary and Creative Archives, British Library and Jessica Smith, Creative Arts Archivist, John Rylands Library, University of Manchester. Their slides can be viewed here and here. Apologies that the recording of Callum's talk is clipped; this was due to connectivity issues on the day.

The second presentation was Finding Light in Dark Archives: Using AI to connect context and content in email collections by Stephanie Decker, Professor of History and Strategy, University of Bristol and Santhilata Venkata, Digital Preservation Specialist & Researcher at The National Archives in the UK.

After their talks, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the morning session were:

  1. Are there any other appraisal or collaborative considerations that might improve our practices and offer ways forward?
  2. What do we lose by emphasizing usability for researchers?
  3. Should we start with how researchers want to use email archives now and in the future, rather than just on preservation?
  4. Potentialities of email archives as organizational, not just individual?

These questions led to discussions about file formats, collection sizes, metadata standards and ways to interpret large data sets. There was interest in how email archives might allow researchers to reconstruct corporate archives, e.g. to understand the social dynamics of the office and decision-making processes. It was felt that there is a need to understand the extent to which email represents organisation-level context. More questions were raised, including:

  • To what extent is it part of the organisational records and how should it be treated?
  • How do you manage the relationship between constant organisational functions and structure (a CEO) and changing individuals?
  • Who will be looking at organisational email in the future and how?

It was mentioned that there is a need to distinguish between email as data and email as an artifact, as the use-cases and preservation needs may be markedly different.

The duties of care that exist between depositors, tool designers, archivists and researchers were discussed, and a question was asked about how we balance these:

  • Managing human burden
  • Differing levels of embargo
  • Institutional frameworks

There was discussion of the research potential for comparing email and social media collections, e.g. tweet archives, and also of the difficulties researchers face in getting access to data sets. The monetary value of email archives was also raised, and it was mentioned that perceived value hasn’t been translated into monetary value.

Researcher needs and metadata were another topic brought up by attendees. It was suggested that the information about collections in online catalogues needs to be descriptive enough for researchers to decide if they wish to visit an institution to view digital collections on a dedicated terminal. It was also suggested that archives and libraries need to make access restrictions, and the reasoning for these, very clear to users. This would help to manage expectations, so that researchers will know when to visit on-site because remote access is not possible. It was mentioned that it is challenging to identify use cases, but it was noted that without a deeper understanding of researcher needs, it can be hard to make decisions about access provision.

It was acknowledged that the demands on human processing are still high for born digital archives, and that the relationship between tools and professionals is still emergent. There was therefore a question about whether researchers could be involved in collaborations more, and to what extent there will be an onus on them in terms of responsibilities and liabilities in relation to usage of born digital archives.

Lots of food for thought before the break for lunch!

The afternoon session, chaired by Nicole Basaraba, Postdoctoral Researcher, Studio Europa, Maastricht University, discussed Emerging Formats, Interactive Narratives and Socio-Cultural Questions in AI.

The first afternoon presentation, Collecting Emerging Formats: Capturing Interactive Narratives in the UK Web Archive, was given by Lynda Clark, Post-doctoral research fellow in Narrative and Play at InGAME: Innovation for Games and Media Enterprise, University of Dundee, and Giulia Carla Rossi, Curator for Digital Publications, British Library. Their slides can be viewed here.

The second afternoon presentation was Women Reclaiming AI: a collectively designed AI Voice Assistant by Coral Manton, Lecturer in Creative Computing, Bath Spa University. Her slides can be seen here.

Following the same format as in the morning, after these presentations, the speakers proposed questions and challenges that attendees could discuss in smaller break-out rooms. Questions given by speakers of the afternoon session were:

  1. Should we be collecting examples of AIs, as well as using AI to preserve collections? What are the implications of this?
  2. How do we get more people to feel that they can ask questions about AI?
  3. How do we use AI to think about the complexity of what identity is and how do we engineer it so that technologies work for the benefit of everyone?

There was a general consensus that AI is becoming a significant and pervasive part of our life. However, it was felt that there are many aspects we don't fully understand. In the breakout groups, workshop participants raised more questions, including:

  • Where would AI-based items sit in collections?
  • Why do we want it?
  • How to collect?
  • What do we want to collect? User interactions? The underlying technology? Many are patented technologies owned by corporations, so this makes it challenging. 
  • What would make AI more accessible?
  • Some research outputs may be AI-based - do we need to collect all the code, or just the end experience produced? If the latter, could this be similar to documenting evidence, e.g. video/sound recordings or transcripts?
  • Could or should we use AI to collect? Who’s behind the AI? Who gets to decide what to archive and how? Who’s responsible for mistakes/misrepresentations made by the AI?

There was debate about how to define AI in terms of a publication/collection item. It was felt that an understanding of this would help to decide what archives and libraries should be collecting, and to understand what is not being collected currently. It was mentioned that user input is a critical factor in answering questions like this. A number of challenges of collecting using AI were raised in the group discussions, including:

  • Lack of standardisation in formats and metadata
  • Questions of authorship and copyright
  • Ethical considerations
  • Engagement with creators/developers

It was suggested that full-scale automation is not completely desirable and that some kind of human element is required for specialist collections. However, AI might be useful for speeding up manual human work.

There was discussion of the problem of bias in data: existing prejudices are baked into datasets and algorithms. This led to more questions:

  • Is there a role for curators in defining and designing unbiased and more representative data sets to more fairly reflect society?
  • Should archives collect training data, to understand underlying biases?
  • Who is the author of AI created text and dialogue? Who is the legally responsible person/organisation?
  • What opportunities are there for libraries and archives to teach people about digital safety through understanding datasets and how they are used?

Participants also questioned:

  • Why do we humanise AI?
  • Why do we give AI a gender?
  • Is society ready for a genderless AI?
  • Could the next progress in AI be a combination of human/AI? A biological advancement? Human with AI “components” - would that make us think of AIs as fallible?

With so many questions and a lack of answers, it was felt that fiction may also help us to better understand some of these issues. Rachel Foss ended the roundtable discussion by saying that she is looking forward to reading Kazuo Ishiguro’s new novel Klara and the Sun, which is due to be published next month by Faber and is about an artificial being called Klara who longs to find a human owner.

Thanks to everyone who spoke at and participated in this AURA workshop, to make it a lively and productive event. Extra special thanks to Deirdre Sullivan for helping to run the online event smoothly. Looking ahead, the third workshop on Artificial Intelligence and Archives: What comes next? is being organised by the University of Edinburgh in partnership with the AURA project team, and is scheduled to take place on Tuesday 16 March 2021. Please do join the AURA mailing list and follow #AURA_network on social media to be part of the network's ongoing discussions.

This post is by Digital Curator Stella Wisdom (@miss_wisdom)

11 February 2021

Investigating Instances of Arabic Verb Form X in the BL/QFP Translation Memory

Add comment

The Arabic language has a root+pattern morphology where words are formed by casting a (usually 3-letter) root into a morphological template of affixed letters in the beginning, middle and/or end of the word. While most of the meaning comes from the root, the template itself adds a layer of meaning. For our latest Hack Day, I investigated uses of Arabic Verb Form X (istafʿal) in the BL/QFP Translation Memory.

I chose this verb form because it conveys the meaning of seeking or acquiring something for oneself, possibly by force. It is a transitive verb form where the subject may be imposing something on the object and can therefore convey subtle power dynamics. For example, it is the form used to translate words such as ‘colonise’ (yastaʿmir) and ‘enslave’ (yastaʿbid). I wanted to get a sense of whether this form could reflect unconscious biases in our translations – an extension of our work in the BLQFP team to address problematic language in cataloguing and translation.

The other reason I chose this verb form is that it is achieved by affixing three consonants to the beginning of the word, which made it possible to search for in our Translation Memory (TM). The TM is a bilingual corpus, stretching back to 2014, of the catalogue descriptions we translate for the digitised India Office Records and Arabic scientific manuscripts on the QDL. We access the TM through our translation management system (memoQ), which offers some basic search functionalities. This includes a ‘wild card’ option where the results list all the words that begin with the three Form X consonants under investigation (است* and يست*).

Figure 1: Snippet of results in memoQ using the wildcard search function.

 

My initial search using these two 3-letter combinations returned 2,140 results. I noticed that there were some recurring false positives such as certain place names and the Arabic calque of ‘strategy’ (istrātījiyyah). The most frequent false positive (699 occurrences), however, was the Arabic verb for ‘receive’ (istalam) – which is unsurprising given frequent references to correspondence being sent and received in catalogue descriptions of IOR files. What makes this verb a false positive is that the ‘s’ is in fact a root consonant, and therefore the verb actually belongs to Form VIII (iftaʿal).
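
For readers who want to try something similar outside memoQ, the prefix search and the removal of known false positives can be approximated in a few lines of Python. This is only a minimal sketch and not the workflow used on the Hack Day: the function name, the short exclusion list and the assumption that the text is unvocalised (no short-vowel marks) are all illustrative.

```python
import re

# The two three-letter prefixes searched for in the post.
FORM_X_PREFIXES = ("است", "يست")

# Hypothetical exclusion list (an assumption for illustration): stems of
# false positives named in the post, i.e. 'receive' (Form VIII) and the
# calque of 'strategy'. A real list would be longer, e.g. place names.
FALSE_POSITIVE_STEMS = ("استلم", "يستلم", "استراتيجي")

# Runs of Arabic letters; assumes unvocalised text without diacritics.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def form_x_candidates(segment):
    """Return words that start with a Form X prefix, minus known false positives."""
    hits = []
    for word in ARABIC_WORD.findall(segment):
        if word.startswith(FORM_X_PREFIXES) and not word.startswith(FALSE_POSITIVE_STEMS):
            hits.append(word)
    return hits
```

Like the wildcard search, this only catches words that begin with the prefix, so forms preceded by a conjunction or the definite article would be missed, and excluding by stem is a crude filter that still needs a human check.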

After eliminating these false positives, I ended up with 1,349 matches. From these, I was able to identify 55 unique verbs used in relation to IOR files. I then conducted a more targeted search of three cases of each verb: the perfective (past) istafʿal, the imperfective (present) yastafʿil, and the verbal noun (istifʿāl). I used the wildcard function again to capture variations of these cases with suffixes attached (e.g. pronoun or plural suffixes). Although these would have been useful too, I did not look for the active (mustafʿil) and passive (mustafʿal) participles because the single short vowel that differentiates them is rarely represented in Arabic writing. Close scrutiny of the context of each result would have been needed in order to assign them correctly, and I did not have enough time for that in a single day.

Figure 2: List of the Form X verbs found in the TM and their frequency (excluding six verbs that only occur once)
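
The targeted search of three cases per verb, with a trailing wildcard to pick up suffixed forms, could be sketched as below. This is an illustration under stated assumptions rather than the memoQ search itself: the stems chosen for the verb istawlá and the function name are hypothetical, and Python's standard `fnmatch` module stands in for memoQ's wildcard option.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical wildcard patterns for one verb (istawlá / yastawlī / verbal noun).
# The stems are assumptions chosen so that suffixed forms still match.
PATTERNS = {
    "perfective": "استول*",
    "imperfective": "يستول*",
    "verbal noun": "استيلا*",
}


def count_cases(words, patterns=PATTERNS):
    """Count how many words match each case pattern (first match wins)."""
    counts = Counter()
    for word in words:
        for case, pattern in patterns.items():
            if fnmatchcase(word, pattern):
                counts[case] += 1
                break
    return counts
```

Running this over the tokenised target text for each of the verbs would give a rough frequency table, though participles and forms with prefixed particles would still need separate handling.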

 

I made a note of the original English term(s) that each Form X verb was used to translate. I then identified seven potentially problematic verbs that required further investigation. These seven verbs typically convey an action that is being either forcefully or wrongfully imposed.

Figure 3: Seven potentially problematic verbs that take Form X in the TM

 

My next step was to investigate the use of these verbs in context more closely. I looked at the most frequent of these verbs (istawlá/yastawlī/istīlaʾ) in our TM, first using the source + target view, and then the three-column concordance view of the target text. The first view allowed me to scrutinise how we have been employing this verb vis-à-vis the original term used in the English catalogue description. It struck me that, in some cases, more neutral verbs such as ‘take’ and ‘take possession of’ were used on the English side, meaning that bias was introduced during translation.

Figure 4: Source + target view of concordance results for the verb istawlá

 

The second view makes it possible to see the text immediately preceding and succeeding the verb, typically displaying the assigned subject and object of the verb. It therefore shows who is doing what to whom more clearly, even though the script direction goes a bit awry for Arabic. Here, I noticed that the subjects were disproportionately non-British: it is overwhelmingly native rulers and populations, ‘pirates’, and rival countries who were doing the forceful or wrongful taking in the results. This may indicate an unconscious bias that has travelled from the primary sources to the catalogue descriptions and is something that requires further investigation.

Figure 5: Three-column view of concordance results for the verb istawlá
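
A three-column concordance of this kind (often called keyword-in-context) is straightforward to reproduce for anyone without access to memoQ. The following is a minimal sketch only: the function name, the `window` parameter and the whitespace tokenisation are assumptions for illustration, not a description of how memoQ builds its view.

```python
def kwic(tokens, is_hit, window=4):
    """Return (preceding text, keyword, following text) for each matching token."""
    rows = []
    for i, token in enumerate(tokens):
        if is_hit(token):
            # Context is kept in logical (reading) order; how it is displayed
            # for a right-to-left script is left to the viewer.
            before = " ".join(tokens[max(0, i - window):i])
            after = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((before, token, after))
    return rows
```

For example, `kwic(text.split(), lambda w: w.startswith("استول"))` would list each occurrence of the perfective with up to four words on either side, which is usually enough to see the subject and object of the verb.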

 

My hack day investigation was conducted in the spirit of continuous reflection on and improvement of our translation process. Using a verb form rather than specific words as a starting point provided an aggregate view of our practices, which is useful in trying to tease out how the descriptions on the QDL may collectively convey an overall stance or attitude. My investigation also demonstrates the value of our TM, not only for facilitating and maintaining consistency in translation, but as a research tool with countless possibilities. My findings from the hack day are naturally rough-and-ready, but they provide the seed for further conversations about problematic language and unconscious bias among translators and cataloguers.

This is a guest post by linguist and translator Dr Mariam Aboelezz (@MariamAboelezz), Translation Support Officer, BL/QFP Project

02 February 2021

Legacies of Catalogue Descriptions and Curatorial Voice: training materials and next steps

Add comment

Over the past year British Library staff have contributed to the AHRC-funded project "Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship". Led by James Baker, University of Sussex, the project set out to demonstrate the value of corpus linguistic methods and computational tools to the study of legacies of catalogues and cultural institutions’ identities. In a previous blogpost James explained the historical research that shaped this project and outlined the original work plan which was somewhat disrupted by the pandemic.

As we approach the end of the first phase of this AHRC project, we want to share the news that the training module on Computational Analysis of Catalogue Data, developed as part of the project, is now complete. The materials take into account the interests and needs of the community for which they are intended. In July James and I delivered a couple of training sessions over Zoom for a group of GLAM professionals, some of whom had previously shown interest in our approach to catalogue data by taking part in Baker’s British Academy-funded “Curatorial Voice” project.

Screenshot from the December session on Zoom showing a task within the training module.

 

In response to feedback from these sessions we updated the lessons to query data derived from descriptions of photographs held at the British Library. This dataset better reflects the diversity of catalogue records created by different cataloguers and curators over time. British Library staff then took part in a Hack-and-Yack session which demonstrated the use of AntConc and approaches from computational linguistics for the purposes of examining the Library’s catalogue data and how this could enable catalogue-related work. This was welcomed by curators, cataloguers and other collections management staff who saw value in trying this out with their own catalogue data for the purpose of improving its quality, identifying patterns and ultimately making it more accessible to users. In December, the near-finished module was presented over Zoom to a wider group of GLAM professionals from the UK, US and Turkey.

Screenshot from the December session demonstrating how to use the concordance tool in AntConc.




We hope that the training module will be widely used and further developed by the community and are delighted that it has already been referenced in a resource for researchers in the Humanities and Social Sciences at the University of Edinburgh. In terms of next steps, the AHRC has granted an extension for holding some partnership development activities with our partners at Yale University and delivering the end-of-project workshop which will hopefully lead to future collaborations in this space.

Screenshot showing James Baker delivering the December training session on Zoom with participants' appreciative comments in the chat
James Baker delivering the December training session on Zoom which participants found really useful.




Personally, I gained a lot from this fruitful collaboration with James and Andrew Salway as it gave me first-hand experience of developing a “Carpentries-style” lesson, understanding how AntConc works, and applying corpus linguistic methods. I want to thank British Library staff who took part in the training sessions and in particular those colleagues who supplied catalogue data and shared curatorial expertise: Alan Danskin, Victoria Morris, Nicolas Moretto, Malini Roy, Cam Sharp Jones and Adrian Edwards.

This post is by Digital Curator Rossitza Atanassova (@RossiAtanassova)