Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

14 October 2024

Research and Development activities in the Qatar Programme Imaging Team

This blog post is by members of the Imaging Team at the British Library/Qatar Foundation Partnership (BLQFP) Programme: Eugenio Falcioni (Imaging and Digital Product Manager), Dominique Russell, Armando Ribeiro and Alexander Nguyen (Senior Imaging Technicians), Selene Marotta (Quality Management Officer), Matthew Lee and Virginia Mazzocato (Senior Imaging Support Technicians).

The Imaging Team has played a pivotal role in the British Library/Qatar Foundation Partnership (BLQFP) Programme since its launch in 2012. However, the journey has not been without hurdles. In October 2023, the infamous cyber-attack on the British Library severely disrupted operations across the organisation, impacting the Imaging Team profoundly. Inspired by the Library's Rebuild & Renew Programme, we used this challenging period to focus on research and development, refining our processes and deepening our understanding of the studio’s work practices. 

At the time of the attack, we were in the process of recruiting new team members, who brought fresh energy, expertise, and enthusiasm. This also coincided with the appointment of a new Studio Manager. The formation of this almost entirely new team presented challenges as we adapted to the Library's disrupted environment. Yet our synergy and commitment led us to find innovative ways of working. Although the absence of IT infrastructure, and therefore of imaging hardware and software, posed significant difficulties for day-to-day photography and digitisation, we had the time to focus on continuous improvement without the usual pressure of deadlines. We enhanced our digitisation processes and expertise through a combination of quality improvements, strategic collaborations, and the development of innovative tools. Through teamwork and perseverance, we transformed adversity into an opportunity for growth.

As an Imaging Team, we aim to create the optimal digital surrogate of each item we capture. The BLQFP has defined imaging parameters that specify criteria such as colour and resolution accuracy, ensuring compliance with International Imaging Standards such as FADGI and ISO 19264.

During this unusual time, we focused on research and development into imaging standards and updated our guidelines, resulting in a 150-page document detailing our workflow. This has improved consistency between setups and photographers and has been fundamental in training new staff. We also took part in skills-sharing workshops with Imaging Services, the Library's core imaging department, and Heritage Made Digital (HMD), the Library's department that manages digitisation workflows.

Over the months, we have tested our images and setups, cameras, lighting, and colour targets, all while shooting directly to camera cards and using a laser measuring device to check resolution (PPI). As a result of this work, we feel more confident in producing images that conform to International Imaging Standards and truly represent the collection items.
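To illustrate the arithmetic behind a resolution check, here is a minimal Python sketch. It assumes resolution is verified by measuring, in pixels, the span of a reference of known physical length (such as the ruler on a colour target) in a captured image. The function names, the 400 PPI target and the 2% tolerance are hypothetical values chosen for illustration, not the BLQFP's actual parameters or procedure.

```python
MM_PER_INCH = 25.4


def pixels_per_inch(pixel_span: float, length_mm: float) -> float:
    """Sampling resolution derived from a reference of known length.

    pixel_span: distance in pixels between two marks on the reference
    length_mm:  true distance between those same marks, in millimetres
    """
    if length_mm <= 0:
        raise ValueError("length_mm must be positive")
    # pixels divided by the length expressed in inches
    return pixel_span * MM_PER_INCH / length_mm


def within_tolerance(measured: float, target: float, tolerance_pct: float) -> bool:
    """True if the measured PPI is within +/- tolerance_pct of the target."""
    return abs(measured - target) <= target * tolerance_pct / 100


# Hypothetical measurement: 100 mm on the ruler spans 1,575 pixels.
# Target (400 PPI) and tolerance (2%) are illustrative assumptions only.
ppi = pixels_per_inch(1575, 100)
print(f"{ppi:.2f} PPI, within tolerance: {within_tolerance(ppi, target=400, tolerance_pct=2)}")
```

Measuring over a long span (here 100 mm rather than a few millimetres) keeps a one-pixel error in locating the marks from noticeably skewing the result.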

A camera stand with a bound volume with a colour target ruler on top and a laser device next to it.
Colour target on a bound volume

Alongside our testing, we arranged visits to imaging studios at other institutions, where we shared our knowledge and learnt from the working processes of those digitising comparable collection material. During these visits, we gained a better understanding of different imaging set-ups, the international quality standards each institution follows, and how the images produced are analysed. We also shared our approaches to capturing and stitching oversized items such as maps and foldouts. Lastly, we discussed quality assurance and workflow management tools. Overall, these visits across the sector have been a valuable exercise in making new connections, sharing ideas, and understanding that other institutions face similar problems when digitising collection items.

Without dedicated digitisation software, capturing items such as manuscripts and large bound volumes was challenging, as we were unable to check the images we were producing. For this reason, we prioritised less demanding items in the collection and postponed quality assurance checks to a later date. We chose to capture 78 rpm records, as they required only two shots (front and back), minimising the risk of mistakes. The imaging of audio collection items was our first achievement as a team since the cyber-attack: we digitised over 1,100 shellac discs in collaboration with the BLQFP Audio Team, who had previously catalogued and digitised the sound recordings.

A record with a green label reading Columbia
Image of a shellac disc (9CS0024993_ColumbiaGA3) digitised by the BLQFP

Through this capture task, we gained the optimism and confidence to start capturing more material, beginning with the bindings of all available bound collection items. The binding capture process is time-consuming and requires a specific setup and positioning of the item to photograph the front, back, spine, edge, head, and tail of each volume. By capturing bindings now, we will be able to streamline the process when we resume the digitisation of entire volumes.

A camera stand with a red-bound volume supported by a frame over cardboard
Capturing the spine of a bound volume, using L-shaped card on a support frame

During this time, we were also involved in scoping work to locate and assess the most challenging items and plan a digitisation strategy accordingly. We focused particularly on identifying oversized maps and foldouts, which will be captured in sections and subsequently digitally stitched. This task required frequent visits to the Library’s basement storage areas and collaboration with the BLQFP Workflow Team to optimise and migrate data from the scoping process into existing workflow management systems. By gathering this data, we could determine the physical characteristics of each collection series and select the most suitable capture device. It was also crucial to collaborate with the BLQFP Conservation Team to develop new digitisation tools for capturing oversized foldouts more quickly and securely.

A volume with an insert, folded and unfolded, over two black foam supports
Using C-shaped Plastazote created by the BLQFP Conservation Team to support an oversized fold-out

The past nine months have presented many challenges for our team. Nevertheless, in the spirit of Rebuild & Renew, we have been able to solve problems and develop creative ways of working, pulling together all our individual skills and experiences. As we expand, we have used this time productively to understand the intricacies of digitising fragile, complex, and oversized material while working to rigorous colour and quality standards. With the imminent return of imaging software, the next step for the BLQFP Imaging Team will be to apply our knowledge and understanding to a mass digitisation environment, with the expectation of targets and monthly deliverables.

Team members standing around a stand on which a volume with a large foldout is prepared for photography, with lighting on both sides of the stand
Capturing a large foldout

16 September 2024

memoQfest 2024: A Journey of Innovation and Connection

Attending memoQfest 2024 as a translator was an enriching and insightful experience. Held from 13 to 14 June in Budapest, Hungary, the event stood out as a hub for language professionals and translation technology enthusiasts. 

Streetviews of Budapest, near the venue for memoQfest 2024. Captured by the author

A Well-Structured Agenda 

The conference had a well-structured agenda with over 50 speakers, including two keynote speakers, who brought valuable insights into the world of translation.

Jay Marciano, President of the Association for Machine Translation in the Americas (AMTA), delivered his highly anticipated presentation on understanding generative AI and large language models (LLMs). While he acknowledged their significant potential, Marciano expressed only cautious optimism about their future in the industry, stressing the need for a deeper understanding of their limitations. As he laid out, machines can translate faster, but the quality of their output depends greatly on the quality of the training data, especially in certain domains or for specific clients. He believes that translators' role will evolve so that they become more involved in data curation than in translation itself, in order to improve the quality of machine output.

Dr Mike Dillinger, the former Technical Lead for Knowledge Graphs in the AI Division at LinkedIn, and now a technical advisor and consultant, also delved into the challenges and opportunities presented by AI-generated content in his keynote speech, The Next 'New Normal' for Language Services.  Dillinger holds a nuanced perspective on the intersection of AI, machine translation (MT), and knowledge graphs. As he explained, knowledge graphs can be designed to integrate, organize, and provide context for large volumes of data. They are particularly valuable because they go beyond simple data storage, embedding rich relationships and context. They can therefore make it easier for AI systems to process complex information, enhancing tasks like natural language processing, recommendation engines, and semantic search.  

Dillinger therefore advocated for the integration of knowledge graphs with AI, arguing that high-quality, context-rich data is crucial for improving the reliability and effectiveness of AI systems. Knowledge graphs can significantly enhance the capabilities of LLMs by grounding language in concrete concepts and real-world knowledge, thereby addressing some of the current limitations of AI and LLMs. He concluded that, while LLMs have made significant strides, they often lack true understanding of the text and context.

Enhancing Translation Technology for BLQFP 

The event also offered hands-on demonstrations of memoQ's latest features and updates such as significant improvements to the In-country Review tool (ICR), a new filter for Markdown files, and enhanced spellcheck.  

Interior of the Pesti Vigadó, Budapest's second-largest concert hall, and venue for the memoQfest Gala dinner

As a participant, I was keen to explore how some of these features could be used to enhance translation processes at the British Library. For example, could machine translation (MT) be used to translate catalogue records? Over the last twelve years, the translation team of the British Library/Qatar Foundation Partnership project has built up a massive translation memory (TM): a bilingual repository of all our previous translations. An MT engine could be trained on our terminology and style using this TM and our other bilingual resources, such as our vast and growing term base (TB). With appropriate data curation, MT could be a cost-effective and efficient way to scale up our translation operations.

There are challenges, however. Before it could be used to train an MT engine, for example, our TM would need to be edited and cleaned to remove repetitive and inappropriate content. We would need to choose the most appropriate translations while maintaining proper alignment between segments. The same applies to our TB, which would also need to be curated. Some of these data curation tasks cannot be pursued at this time, as we remain without access to much of our data following the cyber-attack. Moreover, even these careful preparatory steps would not suffice, as any machine output would still need to be post-edited by skilled human translators. As both of the conference's keynote speakers agreed, it is not yet a simple matter of letting the machines do the work.
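A first pass at the TM cleaning described above might look something like the following Python sketch. It assumes the TM has already been exported (for example as TMX) and parsed into (source, target) segment pairs; the function name `clean_tm`, its filtering rules and the sample segments are hypothetical illustrations, not an existing BLQFP or memoQ tool. Each pair is kept intact so that segment alignment is preserved; exact duplicates and empty segments are removed, and source segments with competing translations are flagged for a translator rather than resolved automatically.

```python
from collections import defaultdict


def clean_tm(pairs: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], dict[str, list[str]]]:
    """Return (cleaned_pairs, conflicts) from (source, target) segment pairs.

    - strips surrounding whitespace from both sides of each pair
    - drops pairs where either side is empty
    - removes exact duplicate pairs, keeping the first occurrence
    - collects sources with more than one distinct translation for human review
    """
    seen: set[tuple[str, str]] = set()
    cleaned: list[tuple[str, str]] = []
    translations: dict[str, list[str]] = defaultdict(list)
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # incomplete segment: nothing to learn from
        if (source, target) in seen:
            continue  # exact repeat adds no information
        seen.add((source, target))
        cleaned.append((source, target))  # pair kept whole, so alignment holds
        translations[source].append(target)
    conflicts = {s: ts for s, ts in translations.items() if len(ts) > 1}
    return cleaned, conflicts


# Hypothetical Arabic-English sample segments, for illustration only
sample = [
    ("رسالة", "Letter"),
    ("رسالة", "Letter"),   # exact duplicate
    ("رسالة", "Epistle"),  # competing translation of the same source
    ("خريطة", ""),         # empty target
    ("خريطة", " Map "),    # stray whitespace
]
cleaned, conflicts = clean_tm(sample)
print(cleaned)
print(conflicts)
```

Flagging conflicts instead of picking a winner reflects the point above: choosing the most appropriate translation is an editorial judgement for skilled translators, not something a script should decide.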

 This blog post is by Musa Alkhalifa Alsulaiman, Arabic Translator, British Library/Qatar Foundation Partnership. 

28 August 2024

Open and Engaged 2024: Empowering Communities to Thrive in Open Scholarship

The British Library is delighted to host its annual Open and Engaged Conference on Monday 21 October, in person and online, as part of International Open Access Week. The Conference is supported by the Arts and Humanities Research Council (AHRC) and Research Libraries UK (RLUK).

Save the Date flyer for Open & Engaged 2024 on 21 October, in person and online, with logos for sponsors UKRI, Arts and Humanities Research Council and RLUK

Open and Engaged 2024: Empowering Communities to Thrive in Open Scholarship will centre on harnessing the power of communities across open scholarship, open infrastructure, emerging technologies, collections as data, equity and integrity, skills development and sustainable models, to elevate research of all kinds for the public good. We take a cross-sectoral approach to the conference programme, unifying around shared values of openness, by reflecting on practices within research libraries in both higher education and the GLAM (Galleries, Libraries, Archives, Museums) sector, as well as in national and public libraries.

Open and Engaged 2024 is supported by the Arts and Humanities Research Council (AHRC) and Research Libraries UK (RLUK). Everyone interested in the conference topics is welcome to join us on Monday, 21 October! 

This will be a hybrid event taking place at the British Library’s Knowledge Centre in St. Pancras, London, and streamed online for those unable to attend in-person. 

The event will be recorded, and the recordings will be made available in the British Library's Research Repository.

Registration

Please register here for online attendance by Thursday 17 October at 18:00 BST. Registration for in-person attendance is closed.

Registrants will be contacted with details for either in-person attendance or a link to access the online stream closer to the event. 

Provisional Programme 

Please note that the conference programme is subject to updates as we finalise the lineup of speakers.

09:30  Registration

10:00  Welcome remarks

10:10  Opening keynote panel: Cross-disciplinary approach to open scholarship

10:50  Equity and inclusivity in the age of Artificial Intelligence

11:40  Break

12:10  Deepening partnership through shared values

13:00  Lunch

13:45  Open repositories for research of all kinds

14:45  Break

15:15  Enabling collections as data: from policy to practice 

16:15  Closing keynote speech: Future role of libraries as open and inclusive digital (and physical) spaces

16:45 Closing remarks

17:00 Networking session

19:00  End

The hashtag for the event is #OpenEngaged on the social media platform of your choice. If you have any questions, please contact us at [email protected].