Digital scholarship blog

Enabling innovative research with British Library digital collections


16 September 2024

memoQfest 2024: A Journey of Innovation and Connection

Attending memoQfest 2024 as a translator was an enriching and insightful experience. Held from 13 to 14 June in Budapest, Hungary, the event stood out as a hub for language professionals and translation technology enthusiasts. 

Streetviews of Budapest, near the venue for memoQfest 2024. Captured by the author

 

A Well-Structured Agenda 

The conference had a well-structured agenda with over 50 speakers, including two keynote speakers, who brought valuable insights into the world of translation.

Jay Marciano, President of the Association for Machine Translation in the Americas (AMTA), delivered his highly anticipated presentation on understanding generative AI and large language models (LLMs). While he acknowledged their significant potential, Marciano expressed only cautious optimism about their future in the industry, stressing the need for a deeper understanding of their limitations. As he laid out, machines can translate faster, but the quality of their output depends greatly on the quality of the training data, especially in certain domains or for specific clients. He believes that translators' role will evolve: they will become more involved with data curation than with translation itself, in order to improve the quality of machine output.

Dr Mike Dillinger, the former Technical Lead for Knowledge Graphs in the AI Division at LinkedIn, and now a technical advisor and consultant, also delved into the challenges and opportunities presented by AI-generated content in his keynote speech, The Next 'New Normal' for Language Services. Dillinger holds a nuanced perspective on the intersection of AI, machine translation (MT), and knowledge graphs. As he explained, knowledge graphs can be designed to integrate, organise, and provide context for large volumes of data. They are particularly valuable because they go beyond simple data storage, embedding rich relationships and context. They can therefore make it easier for AI systems to process complex information, enhancing tasks like natural language processing, recommendation engines, and semantic search.

Dillinger therefore advocated for the integration of knowledge graphs with AI, arguing that high-quality, context-rich data is crucial for improving the reliability and effectiveness of AI systems. Knowledge graphs can significantly enhance the capabilities of LLMs by grounding language in concrete concepts and real-world knowledge, thereby addressing some of the current limitations of AI and LLMs. He concluded that, while LLMs have made significant strides, they often lack true understanding of the text and context. 
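To make the idea Dillinger described a little more concrete, a knowledge graph can be pictured as a set of subject–relation–object triples. The sketch below is entirely my own invention (the entities, relations, and the `neighbours` helper are illustrative, not anything from LinkedIn or memoQ), but it shows how even a tiny graph gives software explicit relationships to ground a term in, rather than a bare string:

```python
# A toy knowledge graph stored as (subject, relation, object) triples.
# Every entity and relation here is invented for illustration; production
# systems use dedicated graph databases rather than an in-memory list.
triples = [
    ("Vigado", "is_a", "concert hall"),
    ("Vigado", "located_in", "Budapest"),
    ("Budapest", "capital_of", "Hungary"),
    ("concert hall", "is_a", "venue"),
]

def neighbours(entity):
    """Return the (relation, object) pairs attached to an entity,
    i.e. the explicit context a downstream AI system could draw on."""
    return [(r, o) for s, r, o in triples if s == entity]

# Asking about "Vigado" now yields concrete, structured context
# instead of leaving a model to guess from the string alone.
print(neighbours("Vigado"))
```

Grounding of this kind is what allows a system to know that "Vigado" names a concert hall in Budapest, which is exactly the sort of real-world anchoring Dillinger argued LLMs currently lack.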

 

Enhancing Translation Technology for BLQFP 

The event also offered hands-on demonstrations of memoQ's latest features and updates, such as significant improvements to the In-country Review (ICR) tool, a new filter for Markdown files, and enhanced spellcheck.

Interior of the Pesti Vigado, Budapest's second largest concert hall, and venue for the memoQfest Gala dinner

 

 

As a participant, I was keen to explore how some of these features could be used to enhance translation processes at the British Library. For example, could machine translation (MT) be used to translate catalogue records? Over the last twelve years, the translation team of the British Library/Qatar Foundation Partnership project has built up a massive translation memory (TM) – a bilingual repository of all our previous translations. A machine could be trained on our terminology and style, using this TM and our other bilingual resources, such as our vast and growing term base (TB). With appropriate data curation, MT could be a cost-effective and efficient way to scale up our translation operations.

There are challenges, however. For example, before it can be used to train a machine, our TM would need to be edited and cleaned, removing repetitive and inappropriate content. We would need to choose the most appropriate translations, while maintaining proper alignment between segments. The same applies to our TB, which would need to be curated. Some of these data curation tasks cannot be pursued at this time, as we remain without access to much of our data following the cyberattack incident. Moreover, these careful preparatory steps would not suffice, as any machine output would still need to be post-edited by skilled human translators. As both the conference’s keynote speakers agreed, it is not yet a simple matter of letting the machines do the work. 
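To give a simplified picture of what this kind of TM cleaning involves, the sketch below works through invented (source, target) pairs. The segments, their Arabic renderings, and the `clean_tm` function are all hypothetical illustrations, not our actual data or tooling:

```python
# Hypothetical translation-memory (TM) entries as (source, target) pairs.
# The segments and their Arabic renderings are invented examples, not
# actual BLQFP data; real TMs are stored in formats such as TMX.
tm = [
    ("Letter from the Political Agent", "رسالة من المعتمد السياسي"),
    ("Letter from the Political Agent", "رسالة من المعتمد السياسي"),  # exact duplicate
    ("Letter from the Political Agent", "خطاب من الوكيل السياسي"),    # conflicting translation
    ("", "نص بلا مقابل"),                                             # misaligned segment
]

def clean_tm(pairs):
    """Drop empty or misaligned segments and keep only the first target
    seen for each source. In practice a human translator would choose
    the preferred rendering rather than simply keeping the first."""
    kept = {}
    for src, tgt in pairs:
        if not src or not tgt:
            continue  # one side is empty, so the alignment is broken
        if src not in kept:
            kept[src] = tgt
    return list(kept.items())

cleaned = clean_tm(tm)  # one clean, aligned pair remains
```

Even this toy example shows why curation needs human judgement: an automatic rule can remove duplicates and broken alignments, but deciding which of two conflicting translations to keep is exactly the kind of choice a skilled translator must make.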

 This blog post is by Musa Alkhalifa Alsulaiman, Arabic Translator, British Library/Qatar Foundation Partnership.