Thai OCR Spelling Correction: Case Study in Thai Historical Document

dc.contributor.author: Nuttakit Intchot
dc.contributor.author: Ponrudee Netisopakul
dc.date.accessioned: 2026-05-08T19:26:05Z
dc.date.issued: 2025-11-12
dc.description.abstract: This research focuses on correcting Optical Character Recognition (OCR) errors in Thai historical documents, using the Bangkok Recorder, one of the earliest printed publications in Thailand, as a case study. The main problems are high error rates caused by the deterioration of the original documents, archaic Thai spelling, and the complexity of the Thai script. The research compares three approaches: (1) prompt-based correction with large language models (LLMs), (2) fine-tuning LLMs on OCR datasets and reference texts, and (3) generating correction rules with LLMs and applying them in practice. The experimental results indicate that fine-tuning LLMs improves accuracy and handles old spellings and Thai numerals better than prompting alone, while the rule-generation method improves the consistency of correction when error patterns are repetitive. Overall, GPT-4o outperformed LLaMA-3 Typhoon in both contextual understanding and accuracy, achieving the best precision, recall, and F1 score of 99.16, 97.62, and 98.38 percent, respectively. These results show promise for using LLMs for Thai OCR spelling correction in the Thai archival document domain.
dc.identifier.doi: 10.1109/isai-nlp66160.2025.11320465
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/20406
dc.subject: Handwritten Text Recognition Techniques
dc.subject: Natural Language Processing Techniques
dc.subject: Mathematics, Computing, and Information Processing
dc.title: Thai OCR Spelling Correction: Case Study in Thai Historical Document
dc.type: Article
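
The precision, recall, and F1 figures reported in the abstract can be cross-checked with a minimal sketch, assuming F1 is the standard harmonic mean of precision and recall. The record does not say how the paper computed its scores (for example, character level versus word level, or which averaging was used), so the function name `f1_score` and the variable names below are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 99.16
reported_recall = 97.62
reported_f1 = 98.38

# Assumption: the reported F1 follows the standard definition above.
computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # prints 98.38
```

Under that assumption, the three reported figures are mutually consistent to two decimal places.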
