Text Recognition on Khmer Historical Documents using Glyph Class Map Generation with Encoder-Decoder Model

Valy, Dona;Verleysen, Michel;Chhun, Sophea
(2019) 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2019), Prague, Czech Republic, 19 February 2019

Files

TextRecognitiononKhmerHistoricalDocumentsusingGlyphClassMapGenerationwithEncoder-DecoderModel.pdf
  • Open Access
  • Adobe PDF
  • 390.37 KB

Details

Authors
  • Valy, Dona (UCLouvain)
    Author
  • Verleysen, Michel
    Author
  • Chhun, Sophea (Department of Information and Communication Engineering, Institute of Technology of Cambodia, Cambodia)
    Author
Abstract
In this paper, we propose a handwritten text recognition approach for word image patches extracted from Khmer historical documents. The network consists of two main modules composed of deep convolutional and multi-dimensional recurrent blocks. We use the annotated glyph components of each word image to build a glyph class map, which is predicted by the first module of the network, called the glyph class map generator. The second module encodes the generated glyph class map and transforms it into a context vector, which is then decoded to produce the final word transcription. We also add an attention mechanism to the decoder so that it can take advantage of the local contexts provided by the encoder. Experiments are conducted on the SleukRith set, a publicly available dataset of digitized Khmer palm leaf manuscripts.
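
The attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how attention scores are computed, so plain dot-product scoring is assumed here, and the names `attend`, `decoder_state`, and `encoder_outputs` are hypothetical. The sketch only shows how a decoder state might weight the local contexts produced by an encoder to form a context vector.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(decoder_state, encoder_outputs):
    """One attention step (assumed dot-product scoring).

    decoder_state:   shape (d,), current decoder hidden state
    encoder_outputs: shape (T, d), one local context vector per position
    Returns the attended context vector (d,) and the weights (T,).
    """
    scores = encoder_outputs @ decoder_state   # similarity per position, (T,)
    weights = softmax(scores)                  # non-negative, sums to 1
    context = weights @ encoder_outputs        # weighted sum of contexts, (d,)
    return context, weights

# Toy data standing in for encoder outputs over 5 positions of a glyph class map.
rng = np.random.default_rng(0)
encoder_outputs = rng.normal(size=(5, 8))
decoder_state = rng.normal(size=8)
context, weights = attend(decoder_state, encoder_outputs)
```

In a full decoder, a step like this would be repeated for every output character, with `decoder_state` updated between steps.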

Citations

Valy, D., Verleysen, M., & Chhun, S. (2019). Text Recognition on Khmer Historical Documents using Glyph Class Map Generation with Encoder-Decoder Model. Proceedings of ICPRAM 2019, 8. https://hdl.handle.net/2078.5/126739 (Original work published 2018)