Last week I came back from my very first ICDAR conference, inspired and energised for things to come! The International Conference on Document Analysis and Recognition (ICDAR) is the main international event for scientists and practitioners involved in document analysis and recognition. Its 17th edition was held in San José, California, 21-26 August 2023.
ICDAR 2023 featured a three-day conference, including several competitions to challenge the field, as well as post-conference workshops and tutorials. All conference papers were made available as conference proceedings with Springer. 155 submissions were selected for inclusion in the scientific programme of ICDAR 2023, of which 55 were delivered as oral presentations and 100 as posters. The conference also teamed up with the International Journal of Document Analysis and Recognition (IJDAR) for a special journal track: 13 papers were accepted and published in a special issue entitled “Advanced Topics of Document Analysis and Recognition,” and were included as oral presentations in the conference programme. Do have a look at the programme booklet for more information!
Each conference day included a thought-provoking keynote talk. The first one, by Marti Hearst, Professor and Interim Dean of the UC Berkeley School of Information, was entitled “A First Look at LLMs Applied to Scientific Documents.” I learned about three platforms using Natural Language Processing (NLP) methods on PDF documents: ScholarPhi, Paper Plain, and SCIM. These projects help people read academic scientific publications, for example by providing definitions for mathematical notation and generating glossaries for nonce words (e.g. acronyms, symbols, jargon terms); by making medical research more accessible through simplified summaries and Q&A; and by classifying key passages in papers to enable quick and intelligent paper skimming.
The second keynote talk, “Enabling the Document Experiences of the Future,” was by Vlad Morariu, Senior Research Scientist at Adobe Research. Vlad addressed the need for human-document interaction, and took us through some future document experiences: PDFs that re-flow for mobile devices, documents that read themselves, and conversational functionalities such as asking questions and receiving answers. Enabling this type of ultra-responsive document relies on methods such as structural element detection, page layout understanding, and semantic connections.
The third and final keynote talk was by Seiichi Uchida, Distinguished Professor and Senior Vice President, Kyushu University, Japan. In his talk, “What Are Letters?,” Seiichi took us through the four main functions of letters and text: message (transmission of verbalised information), label (disambiguation of objects and environments), design (conveying nonverbal information, such as impressions), and code (readability under various kinds of noise and deformation). He provoked us to contemplate how our lives are affected by the texts around us, and how we could analyse the correlation between our behaviour and the texts that we read.
Among the papers submitted for review by the conference committee, the most prominent topic was handwriting recognition, with a growing number of papers specifically tackling historical documents. Other submission topics included Graphics Recognition, Natural Language Processing for Documents (D-NLP), Applications (including for medical, legal, and business documents), and other Document Analysis and Recognition (DAR) topics.
Some of the paper presentations I attended tackled Named Entity Recognition (NER) evaluation methods and genealogical information extraction; papers dealing with Document Understanding, e.g. identifying the internal structure of documents and understanding the relations between different entities; papers on Text and Document Recognition, such as a model for multilingual OCR; and papers on Graphics, especially the recognition of table structure and content, as well as extracting data from structured diagrams, for example in financial documents, or flowchart recognition. Papers on Handwritten Text Recognition (HTR) dealt with methods for Writer Retrieval, i.e. identifying documents likely written by specific authors, the creation of generic models, text line detection, and more.
The conference included two poster sessions, featuring an incredibly rich array of poster presentations, as well as doctoral consortia. One of my favourite posters was presented by Mirjam Cuper, Data Scientist at the National Library of the Netherlands (KB), entitled “Unraveling confidence: examining confidence scores as proxy for OCR quality.” Together with colleagues Corine van Dongen and Tineke Koster, she looked into the confidence scores provided by OCR engines, which indicate the level of certainty with which a word or character was accurately recognised. However, other factors are at play when measuring OCR quality – you can watch a ‘teaser’ video for this poster.
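To make the idea concrete, here is a minimal sketch of how per-word confidence scores might be aggregated into a page-level quality proxy. The data structure and function names are my own illustration, not the poster’s method or any particular OCR engine’s output format:

```python
# Hypothetical example: averaging per-word OCR confidence scores as a
# proxy for page quality. The (text, confidence) tuples stand in for
# real engine output; all names here are illustrative.

def mean_confidence(words):
    """Average the per-word confidence scores for a page."""
    if not words:
        return 0.0
    return sum(conf for _, conf in words) / len(words)

page = [("Amsterdam", 0.97), ("1d23", 0.91), ("kanaal", 0.64)]
proxy = mean_confidence(page)  # 0.84
```

The caveat the poster raises is visible even in this toy example: the misrecognised word "1d23" carries a high confidence of 0.91, so the average can look healthy while the actual transcription quality is poor.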
As mentioned, the conference was followed by three days of tutorials and workshops. I enjoyed the tutorial on Computational Analysis of Historical Documents, co-led by Dr Isabelle Marthot-Santaniello (University of Basel, Switzerland) and Dr Hussein Adnan Mohammed (University of Hamburg, Germany). Presentations focused on the unique challenges, difficulties, and opportunities inherent in working with different types of historical documents. The distinct difficulties posed by historical handwritten manuscripts and ancient artifacts necessitate an interdisciplinary strategy and the utilisation of state-of-the-art technologies – and this fusion leads to the emergence of exciting and novel advancements in this area. The presentations were interwoven with great questions and a rich discussion, indicative of the audience’s enthusiasm. This tutorial was appropriately followed by a workshop dedicated to Computational Palaeography (IWCP).
I especially looked forward to the next day’s workshop, the 7th edition of Historical Document Imaging and Processing (HIP’23). It was all about making documents accessible in digital libraries, looking at methods addressing OCR/HTR of historical documents, information extraction, writer identification, script transliteration, virtual reconstruction, and so much more. This day-long workshop featured papers in four sessions: HTR and Multi-Modal Methods, Classics, Segmentation & Layout Analysis, and Language Technologies & Classification. One of my favourite presentations was by Prof Apostolos Antonacopoulos, talking about his work with Christian Clausner and Stefan Pletschacher on “NAME – A Rich XML Format for Named Entity and Relation Tagging.” Their NAME XML tackles the need to represent named entities in rich and complex scenarios. Tags can be overlapping and nested, character-precise, multi-part, and may span non-consecutive words or tokens. This flexible and extensible format addresses the relationships between entities, makes them interoperable and usable alongside other information (images and other formats), and possible to validate.
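To illustrate why ordinary inline tagging falls short here, the sketch below models character-precise, stand-off annotation of the kind NAME XML is designed to express – nested entities and entity relations referenced by offsets rather than embedded tags. This data model is my own simplification for illustration, not the actual NAME schema:

```python
# Illustrative stand-off annotation: entities are lists of character
# spans over the raw text, so they can nest, overlap, or be multi-part
# (non-consecutive spans). All structures here are hypothetical.

text = "Johann Sebastian Bach was born in Eisenach."

entities = [
    {"id": "e1", "type": "person",  "spans": [(0, 21)]},   # full name
    {"id": "e2", "type": "surname", "spans": [(17, 21)]},  # nested inside e1
    {"id": "e3", "type": "place",   "spans": [(34, 42)]},
]
relations = [{"type": "born_in", "from": "e1", "to": "e3"}]

def extract(text, entity):
    """Join an entity's (possibly non-consecutive) spans into one string."""
    return " ".join(text[start:end] for start, end in entity["spans"])
```

Because spans are just offsets into the untouched source text, "surname" can sit inside "person" without any tag-nesting conflicts, and relations simply point at entity ids – which is roughly the flexibility the presentation argued for.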
I greatly enjoyed the conference and its wonderful community, meeting old colleagues and making new friends. Until next time!