15 July 2011
Earlier this week Lynda and I attended a demonstration day for the IMPACT project (Improving Access to Text). Funded by the European Commission, the project has investigated the issues facing mass digitisation projects, specifically those working with historical documents.
The demonstration day provided an overview of the project, and gave participants the opportunity to see some of the project-developed tools in action.
In the morning, two tools were demonstrated from a suite intended to prepare digitised images for Optical Character Recognition (OCR): one for border removal, and one for correcting geometric distortions such as page curl.
The afternoon session involved an in-depth discussion of the development of lexical resources to improve OCR accuracy and search capability. Beginning with an introductory talk for non-experts, the session highlighted tools developed to address potential obstacles to accurate OCR, such as the need to identify and accommodate historical spelling variations.
The IMPACT project will host a conference in October to formally launch the project resources and a Centre for Competence, which will provide advice and guidance and share best practice. Slides from the demo day presentations can be found on the IMPACT project blog.