An OCR-Based Framework for Automated Verification of Thai and English Academic Transcripts

Abstract

Academic transcripts are essential documents for employment and higher education but remain susceptible to forgery and manipulation, creating a need for efficient and reliable verification methods. Traditional verification is often time-consuming because transcripts are structurally complex and information is difficult to extract from PDF and image formats. This study presents an OCR-based framework for automated transcript verification that integrates preprocessing and postprocessing techniques to improve the quality of the extracted data. English transcripts were evaluated with three OCR engines (PaddleOCR, Tesseract, and EasyOCR), while Thai transcripts were evaluated with Tesseract and EasyOCR. Experimental results show that preprocessing substantially improves extraction accuracy and that postprocessing further refines the outputs. Among the evaluated engines, PaddleOCR achieved the highest performance on English transcripts, with an overall accuracy of 81.87%, whereas Tesseract yielded the best accuracy on Thai transcripts, at 81.11%. These findings indicate that combining OCR with tailored preprocessing and postprocessing strategies can support reliable and efficient transcript verification in academic settings.
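
The abstract does not specify how postprocessing or accuracy scoring is implemented. As an illustration only, the following minimal Python sketch assumes that postprocessing amounts to simple text normalization and that accuracy is the fraction of transcript fields whose extracted value matches a reference value. The function names (normalize_field, field_accuracy) and the field names are hypothetical and are not taken from the study.

```python
import unicodedata


def normalize_field(text: str) -> str:
    """Hypothetical postprocessing step (an assumption, not the study's method):
    apply Unicode NFC normalization, collapse runs of whitespace, and uppercase."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split()).upper()


def field_accuracy(extracted: dict, expected: dict) -> float:
    """Fraction of expected fields whose extracted value matches after normalization.
    A field missing from `extracted` counts as a mismatch."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    matches = sum(
        1
        for key, value in expected.items()
        if normalize_field(extracted.get(key, "")) == normalize_field(value)
    )
    return matches / len(expected)


# Illustrative values only; these are not data from the study.
expected = {"student_id": "6401234", "course": "CS101 Intro to Computing", "grade": "A"}
extracted = {"student_id": "6401234", "course": "cs101  Intro to  Computing", "grade": "4"}

print(f"{field_accuracy(extracted, expected):.2%}")
```

In this sketch the course field matches once spacing and letter case are normalized, while the misread grade still counts as an error, which is the kind of distinction a field-level accuracy measure is meant to capture.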
