Data Augmentation and Text Recognition on Khmer Historical Manuscripts

Valy, Dona; Verleysen, Michel; Chhun, Sophia

(2020) 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund (Germany), 8 September 2020


Authors

Valy, Dona (Department of Information and Communication Engineering, Institute of Technology of Cambodia, Cambodia)
Verleysen, Michel (UCLouvain)
Chhun, Sophia (Department of Information and Communication Engineering, Institute of Technology of Cambodia, Cambodia)

Abstract

Analysis and recognition of historical documents face many challenges, one of which is the scarcity of the ground-truth data needed by most machine learning techniques, and by deep learning in particular. In this paper, we present a novel approach that significantly augments the set of word image samples generated from an existing dataset of ancient Khmer palm leaf manuscripts. Instead of segmenting real Khmer words, we combine the annotated glyphs into groups called sub-syllables. A new text recognition method is also proposed to take into account the spatially complex structure of Khmer writing. The proposed method is composed of two main modules: a feature generator and a decoder. The generator uses convolutional blocks, inception blocks, and a bidirectional LSTM to encode the information extracted from the input image so that it can be decoded by the attention-based decoder, which predicts the final text transcription. The experiments are conducted on a new dataset of sub-syllables constructed from the annotated glyphs of the SleukRith Set.
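The abstract does not spell out how a sub-syllable sample is cut from a manuscript page. As a minimal sketch, assuming each annotated glyph carries an axis-aligned bounding box in page coordinates and the group of glyphs forming a sub-syllable is already known, a sample image can be cropped from the union of the members' boxes and labeled by concatenating their transcriptions. `Glyph`, `union_box`, `make_subsyllable_sample`, and `build_dataset` are hypothetical names; the real SleukRith Set annotations (which may use polygons rather than boxes) and the paper's grouping rules may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

import numpy as np


@dataclass(frozen=True)
class Glyph:
    """Hypothetical glyph annotation: a transcription plus a bounding box.

    Coordinates are page pixels; x1 and y1 are exclusive, so the glyph
    covers page[y0:y1, x0:x1]. Real annotations may use polygons instead.
    """
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


def union_box(glyphs: Sequence[Glyph]) -> Tuple[int, int, int, int]:
    """Smallest box (x0, y0, x1, y1) enclosing every glyph box."""
    if not glyphs:
        raise ValueError("a sub-syllable needs at least one glyph")
    return (
        min(g.x0 for g in glyphs),
        min(g.y0 for g in glyphs),
        max(g.x1 for g in glyphs),
        max(g.y1 for g in glyphs),
    )


def make_subsyllable_sample(
    page: np.ndarray, glyphs: Sequence[Glyph], group: Sequence[int]
) -> Tuple[np.ndarray, str]:
    """Crop one sub-syllable image and build its text label.

    Assumption: `group` lists indices into `glyphs` in reading (Unicode)
    order, so concatenating the glyph labels yields the transcription.
    """
    members = [glyphs[i] for i in group]
    x0, y0, x1, y1 = union_box(members)
    image = page[y0:y1, x0:x1].copy()  # copy so later edits don't touch the page
    label = "".join(g.label for g in members)
    return image, label


def build_dataset(
    page: np.ndarray, glyphs: Sequence[Glyph], groups: Sequence[Sequence[int]]
) -> List[Tuple[np.ndarray, str]]:
    """One (image, label) pair per glyph group on the page."""
    return [make_subsyllable_sample(page, glyphs, g) for g in groups]


if __name__ == "__main__":
    page = np.zeros((10, 10), dtype=np.uint8)
    # A base consonant KA above a subscript KA (coeng + KA), as toy boxes.
    glyphs = [Glyph("\u1780", 1, 2, 4, 5), Glyph("\u17d2\u1780", 2, 5, 5, 8)]
    image, label = make_subsyllable_sample(page, glyphs, [0, 1])
    print(image.shape, len(label))  # prints: (6, 4) 3
```

Because Khmer stacks subscript consonants and vowel signs above, below, and beside the base, the union box can also capture ink from neighboring glyphs outside the group; a fuller pipeline would likely mask those pixels using the annotation shapes.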

Affiliations

UCLouvain, SST/ICTM/ELEN - Pôle en ingénierie électrique

Citations

Valy, D., Verleysen, M., & Chhun, S. (2020). Data Augmentation and Text Recognition on Khmer Historical Manuscripts. 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund (Germany). https://hdl.handle.net/2078.5/254147