Comprehensive Benchmarking and Analysis of Open Pretrained Thai Speech Recognition Models


Abstract

This paper presents a comprehensive benchmarking and analysis of open pretrained Thai Automatic Speech Recognition (ASR) models, addressing a critical gap in low-resource language ASR development. Our benchmarking covers one foundation speech model, three open Thai speech recognition models, and open speech APIs. We evaluate these models, including our fine-tuned Whisper model, across diverse speech types and acoustic environments. The study reveals significant performance gaps between read and spontaneous speech, with models performing well in controlled settings but struggling in real-world scenarios. We introduce evaluation datasets for distant speech and noisy podcasts, exposing limitations in current models' robustness. Our fine-tuned Whisper model demonstrates strong performance across various Thai regional dialects, reflecting its targeted training on dialectal data, while the other models demonstrate resilience in spontaneous speech scenarios. However, all models show substantial degradation in challenging acoustic conditions, indicating a need for more diverse training corpora that better capture real-world complexity, including spontaneous speech and far-field acoustic scenarios, to further enhance Thai ASR. This work provides valuable insights for improving ASR models in low-resource languages.
