05 August 2022
Burmese Script Conversion using Aksharamukha
Curious about Myanmar (Burma)? Did you know that the British Library has a large collection of Burmese materials, including manuscripts dating back to the 17th century, early printed books, newspapers, periodicals, as well as current material?
You can search our main online catalogue Explore the British Library for printed material, or the Explore Archives and Manuscripts catalogue for manuscripts. But to increase your chances of discovering printed resources, you will need to search the Explore catalogue by typing in the transliteration of the Burmese title and/or author using the Library of Congress romanisation rules. This means that searching for an item using the original Burmese script, or using what you would intuitively consider to be the romanised version of Burmese script, is not going to get you very far (not yet, anyway).
This is because we catalogue Burmese collection items at the Library following a policy of transliterating Burmese using the Library of Congress (LoC) rules. In theory, the benefit of this system specifically for Burmese is that it enables two-way transliteration, i.e. the romanisation can be precisely reversed to give the Burmese script. However, a major issue arises from this romanisation system: romanised versions of Burmese script are so far removed from their phonetic renderings that most Burmese speakers are completely unable to recognise any Burmese words in them.
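To make the idea of two-way transliteration concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the LoC scheme and not how Aksharamukha works internally: the table covers only five consonant letters with simplified romanisations, and the function names `romanise` and `deromanise` are made up for this example. The point is simply that when every letter maps to exactly one romanised token and no two letters share a token, the romanisation can be reversed without loss.

```python
# Toy table: five Burmese consonant letters with simplified romanisations.
# Illustrative only; this is NOT the Library of Congress table.
TO_ROMAN = {
    "က": "ka",
    "ခ": "kha",
    "ဂ": "ga",
    "န": "na",
    "မ": "ma",
}

# Reverse table. The mapping must be one-to-one for the round trip to work.
TO_BURMESE = {roman: letter for letter, roman in TO_ROMAN.items()}
assert len(TO_BURMESE) == len(TO_ROMAN), "two letters share a romanisation"


def romanise(text: str) -> str:
    """Replace each letter with its romanised token (KeyError if unknown)."""
    return "".join(TO_ROMAN[ch] for ch in text)


def deromanise(text: str) -> str:
    """Reverse the romanisation by matching the longest token first."""
    tokens = sorted(TO_BURMESE, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for token in tokens:
            if text.startswith(token, i):
                out.append(TO_BURMESE[token])
                i += len(token)
                break
        else:
            # No token matched at this position, so the input is not reversible.
            raise ValueError(f"cannot reverse romanisation at position {i}")
    return "".join(out)


if __name__ == "__main__":
    original = "ခနဂ"
    roman = romanise(original)
    print(roman, deromanise(roman) == original)
```

A real scheme also has to handle vowel signs, medials, tone marks and stacked consonants, which is where the romanised form starts to drift away from how the words are actually pronounced.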
Because the LoC scheme is unintuitive for Burmese speakers and does not reflect the spoken language, British Library catalogue records for Burmese printed materials end up virtually inaccessible to users. And we’re not alone in facing this problem – other libraries worldwide that hold Burmese collections and use the LoC romanisation scheme face the same issues.
One useful solution to this could be to find or develop a tool that converts the LoC romanisation output into Burmese script, and vice versa – similar to how you would use Google Translate. Maria Kekki, our Curator for Burmese collections, discovered the online tool Aksharamukha, which aims to facilitate conversion between various scripts – also referred to as transliteration (transliteration into the Roman alphabet is specifically referred to as romanisation). It supports 120 scripts and 21 romanisation methods, and luckily, Burmese is one of the supported scripts.
Using Aksharamukha has already been of great help to Maria. Instead of painstakingly converting Burmese script manually into its romanised version, she could now copy-paste the conversion and make any necessary adjustments. She also noticed that she made fewer errors this way! However, it was missing one important thing – the ability to directly transliterate Burmese script specifically using the LoC romanisation system.
Such functionality would not only save our curatorial and acquisitions staff a significant amount of time, but also help any other libraries holding Burmese collections and following the LoC guidelines. It would also allow Burmese speakers to find material in the library catalogue much more easily: readers could use this platform to find items in our collection, as well as in other collections around the world.
To this end, Maria got in touch with the developer of Aksharamukha, Vinodh Rajan – a computer scientist who is also an expert in writing systems, languages and digital humanities. Vinodh was happy to implement two things: (1) add the LoC romanisation scheme as one of the transliteration options, and (2) add spaces in between words (when it comes to spacing, according to the LoC romanisation system, there are different rules for words of Pali and English origin, which are written together).
Last month (July 2022) Vinodh implemented the new system, and what can we say, the result is just fantastic! Readers are now able to copy-paste transliterated text into the Library’s catalogue search box, to see if we hold items of interest. It is also a significant improvement for cataloguing and acquisition processes, as it helps staff create acquisitions records and minimal records. As a next step, we will look into updating all of our Burmese catalogue records to include Burmese script (alongside transliteration), and consider a similar course of action for other South or Southeast Asian scripts.
I should mention that as a bonus, Aksharamukha’s codebase is fully open source, is available on GitHub and is well documented. If you have feedback or notice any bugs, please feel free to raise an issue on GitHub. Thank you, Vinodh, for making this happen!