15 March 2021
Competition to Proofread Bengali Books on Wikisource
Can you read and write in Bangla? Or should I say আপনি কি বাংলা পড়তে এবং লিখতে পারেন? If you were able to read that, congratulations, you are the perfect candidate!
You might be interested in a competition we have launched today, asking for help proofreading text that has been automatically transcribed from our historical Bengali books. The competition, run in partnership with the West Bengal Wikimedians User Group and the Bengali Wikisource community, will run until 14th April and invites contributors to create perfect transcriptions of the books.
More information is available on the Wikisource competition page, including how to get started and prizes on offer.
The books have been digitised through our Two Centuries of Indian Print project, with more than 25 uploaded to Wikisource, an online and free-content digital library where it is possible to view the digitised books and corresponding transcriptions side-by-side. We were inspired by a talk given by the National Library of Scotland who uploaded some of their collections to Wikisource, and thought it could be a useful platform to increase online access to the textual content in our books too.
Above: A view of a Bengali book within the Wikisource platform showing digitised page [R] and transcription [L]
Luckily, a lot of the transcription work has already been done by using Google’s Optical Character Recognition (OCR) technology to read the Bengali text. However, the results are not perfect: words in the original books are often misspelled in the OCR output. That’s where we need human intervention to proofread the OCR and fix the mistakes.
We also want to export proofread transcriptions from Wikisource and make them available as a dataset that could prove interesting to researchers who want to mine thousands of pages of text.
The books we would like proofread cover a multitude of topics and include an adaptation of the Iliad, a book containing a collection of 19th-century proverbs and sayings, and a work describing the Bratas fasting ceremonies observed by the Hindu women of what is now Bangladesh. So, if you are looking for a literary indulgence whilst at the same time helping to improve access for others to valuable historical material, this could be an ideal opportunity.
This post is by Digital Curator Tom Derrick (@TommyID83)