An OCR-Based Framework for Automated Verification of Thai and English Academic Transcripts

Bhattarabhorn Wattanacheep; Pannathorn Komkris; Issariyapon Heymun; Wongsawan Srimontrisanga

doi:10.1109/icsec67360.2025.11298018

Date

2025-11-02

Abstract

Academic transcripts are essential documents for employment and higher education but remain susceptible to forgery and manipulation, creating a need for efficient and reliable verification methods. Traditional verification is often time-consuming due to the structural complexity of transcripts and the challenges of extracting information from PDF and image formats. This study presents an OCR-based framework for automated transcript verification that integrates preprocessing and postprocessing techniques to enhance data extraction quality. English transcripts were evaluated using OCR models PaddleOCR, Tesseract, and EasyOCR, while Thai transcripts were assessed with Tesseract and EasyOCR. Experimental results demonstrate that preprocessing substantially improves extraction accuracy and that postprocessing further refines the outputs. Among the evaluated models, PaddleOCR achieved the highest performance on English transcripts with an overall accuracy of 81.87%, whereas Tesseract yielded the best accuracy for Thai transcripts at 81.11%. These findings underscore the effectiveness of combining OCR with tailored preprocessing and postprocessing strategies to support reliable and efficient transcript verification in academic settings.
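To illustrate the kind of postprocessing and scoring step the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes that postprocessing includes whitespace normalization and repair of digit look-alike characters in numeric fields, and that accuracy can be approximated by a character-level similarity against a ground-truth string. The abstract does not specify either detail, so the confusion table, the function names `postprocess` and `char_accuracy`, and the sample strings are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical confusion table: letters that OCR output often contains in
# place of digits. This mapping is an assumption, not taken from the paper.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# A token is treated as numeric-looking only if it consists entirely of
# digits, dots, and the look-alike letters above.
NUMERIC_LIKE = re.compile(r"[0-9OolISB.]+")


def postprocess(raw: str) -> str:
    """Collapse whitespace and repair digit look-alikes in numeric-looking tokens."""
    text = re.sub(r"\s+", " ", raw).strip()

    def fix_token(match: re.Match) -> str:
        token = match.group(0)
        # Require at least one real digit so that words and letter grades
        # such as "B" are left untouched.
        if NUMERIC_LIKE.fullmatch(token) and any(c.isdigit() for c in token):
            return token.translate(DIGIT_FIXES)
        return token

    return re.sub(r"\S+", fix_token, text)


def char_accuracy(predicted: str, truth: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the paper's metric."""
    return SequenceMatcher(None, predicted, truth).ratio()


if __name__ == "__main__":
    # Made-up OCR output and ground truth, for illustration only.
    ocr_output = "GPA  3.O0   Credits l2"
    ground_truth = "GPA 3.00 Credits 12"
    cleaned = postprocess(ocr_output)
    print(cleaned)
    print(char_accuracy(ocr_output, ground_truth))
    print(char_accuracy(cleaned, ground_truth))
```

In a full pipeline, the input to `postprocess` would come from an OCR engine such as PaddleOCR, Tesseract, or EasyOCR applied to a preprocessed page image. That step is omitted here so the sketch runs with the standard library alone.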