Thai OCR Spelling Correction: Case Study in Thai Historical Document

Nuttakit Intchot; Ponrudee Netisopakul

doi:10.1109/isai-nlp66160.2025.11320465

dc.contributor.author	Nuttakit Intchot
dc.contributor.author	Ponrudee Netisopakul
dc.date.accessioned	2026-05-08T19:26:05Z
dc.date.issued	2025-11-12
dc.description.abstract	This research focuses on correcting errors produced by Optical Character Recognition (OCR) in Thai historical documents, using as a case study the Bangkok Recorder, one of the earliest printed media in Thailand. The main problems are a high error rate caused by the deterioration of the original documents, archaic Thai spelling, and the complexity of the Thai script. The research compares three approaches: (1) prompting large language models (LLMs) to correct the OCR output, (2) fine-tuning LLMs on OCR datasets and their reference texts, and (3) using LLMs to generate correction rules and then applying those rules in practice. The experimental results indicate that fine-tuned LLMs achieve higher accuracy and handle archaic spellings and Thai numerals better than prompting alone, while the rule-generation method improves the consistency of corrections where error patterns repeat. Overall, GPT-4o outperformed LLaMA-3 Typhoon in both contextual understanding and accuracy, achieving the best precision, recall, and F1 score of 99.16%, 97.62%, and 98.38%, respectively. These results show that LLMs are promising for Thai OCR spelling correction in the domain of Thai archival documents.
dc.identifier.doi	10.1109/isai-nlp66160.2025.11320465
dc.identifier.uri	https://dspace.kmitl.ac.th/handle/123456789/20406
dc.subject	Handwritten Text Recognition Techniques
dc.subject	Natural Language Processing Techniques
dc.subject	Mathematics, Computing, and Information Processing
dc.title	Thai OCR Spelling Correction: Case Study in Thai Historical Document
dc.type	Article
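The abstract reports precision, recall, and F1 together. Assuming F1 here is the standard harmonic mean of precision and recall (the record does not state the formula), the reported figures are mutually consistent: 2 × 99.16 × 97.62 / (99.16 + 97.62) ≈ 98.38. The sketch below checks this. The helper `correction_prf` is hypothetical and encodes one common convention for spelling-correction evaluation (precision = correct fixes / fixes proposed, recall = correct fixes / gold errors); the paper may define its counts differently.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs in the same units, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def correction_prf(correct_fixes: int, proposed_fixes: int, gold_errors: int):
    """Hypothetical helper: precision, recall and F1 (as fractions) for a corrector.

    Assumed convention, not taken from the paper:
      precision = correct_fixes / proposed_fixes
      recall    = correct_fixes / gold_errors
    """
    precision = correct_fixes / proposed_fixes if proposed_fixes else 0.0
    recall = correct_fixes / gold_errors if gold_errors else 0.0
    return precision, recall, f1_score(precision, recall)


# Best precision and recall reported in the abstract (percent).
reported_precision = 99.16
reported_recall = 97.62

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%")  # → computed F1 = 98.38%
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1 score sits closer to the recall (97.62) than to the precision (99.16).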
