Digital scholarship blog

Enabling innovative research with British Library digital collections

Introduction

Tracking exciting developments at the intersection of libraries, scholarship and technology.

16 September 2024

memoQfest 2024: A Journey of Innovation and Connection

Attending memoQfest 2024 as a translator was an enriching and insightful experience. Held from 13 to 14 June in Budapest, Hungary, the event stood out as a hub for language professionals and translation technology enthusiasts. 

Streetviews of Budapest, near the venue for memoQfest 2024. Captured by the author

 

A Well-Structured Agenda 

The conference had a well-structured agenda with over 50 speakers, including two keynotes, who brought valuable insights into the world of translation.  

Jay Marciano, President of the Association for Machine Translation in the Americas (AMTA), delivered his highly anticipated presentation on understanding generative AI and large language models (LLMs). While he acknowledged their significant potential, Marciano expressed only cautious optimism about their future in the industry, stressing the need for a deeper understanding of their limitations. As he laid out, machines can translate faster, but the quality of their output depends greatly on the quality of the training data, especially in certain domains or for specific clients. He believes that the translator’s role will evolve to involve more data curation than translation itself, with the aim of improving the quality of machine output.

Dr Mike Dillinger, the former Technical Lead for Knowledge Graphs in the AI Division at LinkedIn, and now a technical advisor and consultant, also delved into the challenges and opportunities presented by AI-generated content in his keynote speech, The Next 'New Normal' for Language Services.  Dillinger holds a nuanced perspective on the intersection of AI, machine translation (MT), and knowledge graphs. As he explained, knowledge graphs can be designed to integrate, organize, and provide context for large volumes of data. They are particularly valuable because they go beyond simple data storage, embedding rich relationships and context. They can therefore make it easier for AI systems to process complex information, enhancing tasks like natural language processing, recommendation engines, and semantic search.  

Dillinger therefore advocated for the integration of knowledge graphs with AI, arguing that high-quality, context-rich data is crucial for improving the reliability and effectiveness of AI systems. Knowledge graphs can significantly enhance the capabilities of LLMs by grounding language in concrete concepts and real-world knowledge, thereby addressing some of the current limitations of AI and LLMs. He concluded that, while LLMs have made significant strides, they often lack true understanding of the text and context. 
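To make the idea of a knowledge graph a little more concrete, here is a minimal sketch in Python. It is not drawn from Dillinger's talk: the triples, the predicate names and the helper functions `build_index` and `context_for` are all hypothetical, chosen only to show how facts stored as relationships can be gathered into context for a downstream system.

```python
from collections import defaultdict

# Hypothetical example facts stored as (subject, predicate, object) triples.
TRIPLES = [
    ("memoQfest 2024", "held_in", "Budapest"),
    ("Budapest", "located_in", "Hungary"),
    ("memoQfest 2024", "focuses_on", "translation technology"),
]


def build_index(triples):
    """Index triples by subject so related facts can be looked up quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def context_for(entity, index, depth=2):
    """Collect readable facts reachable from `entity` within `depth` hops."""
    facts = []
    frontier = [entity]
    seen = {entity}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for predicate, obj in index.get(node, []):
                facts.append(f"{node} {predicate.replace('_', ' ')} {obj}")
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts


if __name__ == "__main__":
    for fact in context_for("memoQfest 2024", build_index(TRIPLES)):
        print(fact)
```

Following the links for two hops surfaces not only where the event took place but also which country that city is in. This is the kind of explicit, connected context that a plain list of records would not provide.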

 

Enhancing Translation Technology for BLQFP 

The event also offered hands-on demonstrations of memoQ's latest features and updates such as significant improvements to the In-country Review tool (ICR), a new filter for Markdown files, and enhanced spellcheck.  

Interior of the Pesti Vigado, Budapest's second largest concert hall, and venue for the memoQfest Gala dinner

 

 

As a participant, I was keen to explore how some of these features could be used to enhance translation processes at the British Library. For example, could machine translation (MT) be used to translate catalogue records? Over the last twelve years, the translation team of the British Library/Qatar Foundation Partnership project has built up a massive translation memory (TM) – a bilingual repository of all our previous translations. A machine could be trained on our terminology and style, using this TM and our other bilingual resources, such as our vast and growing term base (TB). With appropriate data curation, MT could be a cost-effective and efficient way to increase the output of our translation operations.

There are challenges, however. For example, before it can be used to train a machine, our TM would need to be edited and cleaned, removing repetitive and inappropriate content. We would need to choose the most appropriate translations, while maintaining proper alignment between segments. The same applies to our TB, which would need to be curated. Some of these data curation tasks cannot be pursued at this time, as we remain without access to much of our data following the cyberattack. Moreover, even these careful preparatory steps would not suffice on their own, as any machine output would still need to be post-edited by skilled human translators. As both of the conference’s keynote speakers agreed, it is not yet a simple matter of letting the machines do the work.
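As a rough illustration of the kind of clean-up described above, the Python sketch below removes repeated and unusable segments from a translation memory. It is a sketch under stated assumptions, not a description of our actual workflow or of memoQ's functionality. The function name `clean_tm` and the sample data are hypothetical, and "most frequent translation" is only a crude stand-in for "most appropriate", a choice that in practice needs a human translator's judgement.

```python
from collections import Counter, defaultdict


def clean_tm(segments):
    """Return a de-duplicated list of (source, target) pairs.

    `segments` is a list of (source, target) string pairs. Assumptions made
    for this sketch: whitespace differences are not meaningful, segments with
    an empty or untranslated target are unusable, and the most frequent
    target for a source is the one to keep.
    """
    targets_by_source = defaultdict(Counter)
    source_order = []  # remember first-seen order so output is stable

    for source, target in segments:
        src = " ".join(source.split())  # normalise whitespace
        tgt = " ".join(target.split())
        if not src or not tgt or src == tgt:
            continue  # drop empty or untranslated segments
        if src not in targets_by_source:
            source_order.append(src)
        targets_by_source[src][tgt] += 1

    # Keep each source paired with its most frequent target, so every
    # surviving source segment stays aligned with exactly one translation.
    return [
        (src, targets_by_source[src].most_common(1)[0][0])
        for src in source_order
    ]


if __name__ == "__main__":
    sample = [
        ("File  1", "T1"),   # placeholder strings, not real translations
        ("File 1", "T1"),
        ("File 1", "T2"),
        ("Map", ""),
        ("Note", "N1"),
    ]
    print(clean_tm(sample))
```

Because each source segment ends up with a single target, the output remains aligned pair by pair, which is the property a training pipeline would rely on.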

 This blog post is by Musa Alkhalifa Alsulaiman, Arabic Translator, British Library/Qatar Foundation Partnership. 

28 August 2024

Open and Engaged 2024: Empowering Communities to Thrive in Open Scholarship

The British Library is delighted to host its annual Open and Engaged Conference on Monday 21 October, in person and online, as part of International Open Access Week. The Conference is supported by the Arts and Humanities Research Council (AHRC) and Research Libraries UK (RLUK).

Save the Date flyer for Open & Engaged 2024 on 21 October, in person and online, with logos for sponsors UKRI, Arts and Humanities Research Council and RLUK

 

Open and Engaged 2024: Empowering Communities to Thrive in Open Scholarship will centre on leveraging the power of communities across open scholarship, open infrastructure, emerging technologies, collections as data, equity and integrity, skills development and sustainable models to elevate research of all kinds for the public good. We take a cross-sectoral approach to the conference programme – unifying around shared values for openness – by reflecting on practices within research libraries in both the higher education and GLAM (Galleries, Libraries, Archives, Museums) sectors, as well as in national and public libraries.

Everyone interested in the conference topics is welcome to join us on Monday 21 October!

This will be a hybrid event taking place at the British Library’s Knowledge Centre in St. Pancras, London, and streamed online for those unable to attend in-person. 

The event will be recorded, and the recordings will be made available in the British Library’s Research Repository.

 

Registration

Please register for Open and Engaged 2024 by filling out this form. Registration will close on Friday 4 October for in-person attendance and Thursday 17 October for online attendance at 18:00 BST.  

Registrants will be contacted with details for either in-person attendance or a link to access the online stream closer to the event. 

 

Provisional Programme 

Please note that the conference programme is subject to updates as we finalise the line-up of speakers.

09:30  Registration

10:00  Welcome remarks

10:10  Opening keynote panel: Cross disciplinary approach to open scholarship

10:50  Equity and inclusivity in the age of Artificial Intelligence

11:40  Break

12:10  Deepening partnership through shared values

13:00  Lunch

13:45  Open repositories for research of all kinds

14:45  Break

15:15  Enabling collections as data: from policy to practice 

16:15  Closing keynote speech: Future role of libraries as open and inclusive digital (and physical) spaces

16:45 Closing remarks

17:00 Networking session

19:00  End

The hashtag for the event is #OpenEngaged on the social media platform of your choice. If you have any questions, please contact us at [email protected].

26 July 2024

Charting the European D-SEA Conference at the Stabi

This blog post is by Dr Adi Keinan-Schoonbaert, Digital Curator for Asian and African Collections, British Library. She's on Mastodon as @[email protected]. 

 

Earlier this month, I had the pleasure of attending the “Charting the European D-SEA: Digital Scholarship in East Asian Studies” conference held at the Berlin State Library (Staatsbibliothek zu Berlin), also known as the Stabi. The conference, held on 11-12 July 2024, aimed to fill a gap in the European digital scholarship landscape by creating a research community and a space for knowledge exchange on digital scholarship issues across humanities disciplines concerned with East Asian regions and languages.

The event was a dynamic fusion of workshops, presentations and panel discussions. Over three days of workshops (8-10 July), participants were introduced to key digital methods, resources, and databases. These sessions aimed to transmit practical knowledge in digital scholarship, focusing on East Asian collections and data. The subsequent two days were dedicated to the conference proper, featuring a broad range of presentations on various themes.

The reading room in the Berlin State Library, Haus Potsdamer Straße

 

DH and East Asian Studies in Europe and Beyond

Conference organisers Jing Hu and Brent Ho from the Stabi, and Shih-Pei Chen and Dagmar Schäfer from the Max Planck Institute for the History of Science (MPIWG), set the stage for an enriching exchange of ideas and knowledge. The diversity of topics covered was impressive: ranging from the more established digital resources and research tools to AI applications in historical research, the sessions provided a comprehensive overview of the current state and future directions of the field.

There were so many excellent presentations – and I often wished I could clone myself to attend parallel sessions! As expected, there was much focus on working with AI – machine learning and generative AI – and their potential in historical and humanities research. AI technologies offer powerful tools for data analysis and pattern recognition, and can significantly enhance research capabilities.

Damian Mandzunowski (Heidelberg University) talked about using AI to extract and analyse information from Chinese Comics
 
Shaojian Li (Renmin University of China) looked into automating the classification of pattern images using deep learning

One notable session was "Reflections on Deep Learning & Generative AI," chaired by Brent Ho and discussed by Clemens Neudecker. The roundtable highlighted the evolving role of AI in humanities research. Calvin Yeh from MPIWG discussed AI's potential to augment, rather than just automate, research processes. He shared intriguing examples of using AI tools like ChatGPT to simulate group discussions and suggest research actions. Hongsu Wang from Harvard University presented on the use of Large Language Models and traditional Transformers in the China Biographical Database (CBDB) project, demonstrating the effectiveness of these models in data extraction and standardisation.

Calvin Yeh (MPIWG) discussed AI for “Augmentation, not only Automation” and experimented with ChatGPT discussing a research approach, designing a research process and simulating a group discussion
 
Hongsu Wang (Harvard University) talked about extracting and standardising data using LLMs and traditional Transformers in the CBDB project – here showcasing Jeffrey Tharsen’s research to create a network graph using a prompt in ChatGPT

 

Exploring the Stabi

Our group tour of the Stabi was a personal highlight for me. This historic library, part of the Prussian Cultural Heritage Foundation, is renowned for its extensive collections and commitment to making digitised materials publicly accessible. The library operates from two major public sites – Haus Unter den Linden and Haus Potsdamer Straße. Tours of both locations were available, but I chose to explore the more recent building, designed by Hans Scharoun and located in the Kulturforum on Potsdamer Straße in West Berlin – the history and architecture of which are fascinating.

A group of the conference delegates enjoying the tour of SBB’s Haus Potsdamer Straße

I really enjoyed catching up with old colleagues and making new connections with fellow scholars passionate about East Asian digital humanities!

To conclude

The Charting the European D-SEA conference at the Stabi was an enriching experience, providing valuable insights into the integration of digital methods in East Asian studies and into wider advances in digital scholarship. It also allowed me to connect with a global community of scholars. The combination of traditional and more recent digital practices, coupled with the forward-looking discussions on AI and deep learning, made this conference a significant milestone in the field. I look forward to seeing how these conversations evolve and contribute to the broader landscape of digital humanities.