Comprehensive Benchmarking and Analysis of Open Pretrained Thai Speech Recognition Models
Abstract
This paper presents a comprehensive benchmarking and analysis of open pretrained Thai Automatic Speech Recognition (ASR) models, addressing a critical gap in low-resource language ASR development. Our benchmark covers one foundation speech model, three open Thai speech recognition models, and open speech APIs. We evaluate these models, including our fine-tuned Whisper model, across diverse speech types and acoustic environments. The study reveals significant performance gaps between read and spontaneous speech: models perform well in controlled settings but struggle in real-world scenarios. We introduce evaluation datasets for distant speech and noisy podcasts, exposing limitations in the robustness of current models. Our fine-tuned Whisper model performs strongly across various Thai regional dialects, reflecting its targeted training on dialectal data, while the other models show resilience in spontaneous speech scenarios. However, all models degrade substantially in challenging acoustic conditions, indicating the need for more diverse training corpora that capture real-world complexity, including spontaneous speech and far-field acoustic scenarios, to further enhance Thai ASR. This work provides valuable insights for improving ASR in low-resource languages.
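As a point of reference for the benchmarking described above, the sketch below shows how an ASR error metric can be computed. Character error rate (CER) is a common choice for Thai, whose script lacks explicit word boundaries; the transcripts in the example are illustrative placeholders, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences (single-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                            # deletion
                        dp[j - 1] + 1,                        # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))    # substitution
            prev = cur
    return dp[n]

def cer(reference, hypothesis):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

# Illustrative check: one deleted character out of an 11-character reference.
print(round(cer("hello world", "helo world"), 3))  # → 0.091
```

The same distance over whitespace-tokenized word lists would yield word error rate (WER), which is more typical for languages with overt word boundaries.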