Mathematical Benchmarking of Convolutional Neural Networks for Thai Dialect Recognition: A Spectrogram Texture Classification Approach

dc.contributor.author: Porawat Visutsak
dc.contributor.author: Duongduen Ongrungruaeng
dc.contributor.author: Surapong Wiriya
dc.contributor.author: Keun Ho Ryu
dc.date.accessioned: 2026-05-08T19:26:35Z
dc.date.issued: 2026-3-18
dc.description.abstract: This study rigorously evaluates 13 Convolutional Neural Network (CNN) architectures for Thai dialect recognition. By treating Automatic Speech Recognition (ASR) as a computer vision texture classification task, we processed an extensive 840-hour dataset from the Spoken Language Systems, Chulalongkorn University (SLSCU) corpus. Raw audio from four major dialects (Central, Northern (Khummuang), Northeastern (Korat), and Southern (Pattani)) was transformed into 2D Mel-spectrograms using the Short-Time Fourier Transform (STFT). We analyzed a diverse range of architectures, including the VGG, Inception, ResNet, DenseNet, and MobileNet families, to establish the optimal trade-off between mathematical complexity and spectral feature extraction. Our experimental results identify NASNet-Mobile as the most effective model, achieving a macro-average F1-score of 0.9425. The analysis suggests that NASNet's search-optimized cell structure is uniquely capable of capturing the multiscale texture of phonetic formants. In contrast, we observed a catastrophic mode collapse in VGG16 (32.97% accuracy), likely due to excessive parameter bloat, while Xception and MobileNetV2 maintained robust generalization. Confusion matrix analysis reveals high acoustic distinctiveness for Southern Thai (96.7% recall), whereas Northern Thai exhibits significant spectral overlap with Central Thai. These results support the hypothesis that CNNs interpret spectrograms as textures rather than discrete objects, positioning NASNet-Mobile as a high-performance, low-latency baseline for edge-device deployment in resource-constrained environments.
dc.identifier.doi: 10.3390/electronics15061271
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/20695
dc.publisher: Electronics
dc.subject: Speech Recognition and Synthesis
dc.subject: Phonetics and Phonology Research
dc.subject: Authorship Attribution and Profiling
dc.title: Mathematical Benchmarking of Convolutional Neural Networks for Thai Dialect Recognition: A Spectrogram Texture Classification Approach
dc.type: Article
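
The abstract reports a macro-average F1-score and a per-dialect recall taken from a confusion matrix. As a reading aid, the sketch below shows how those two quantities are conventionally derived from a four-class confusion matrix. It is not the authors' evaluation code: the function names, the row/column convention (rows are true dialects, columns are predictions), and the counts in toy_cm are all assumptions made up for illustration and do not reproduce the paper's results.

```python
from typing import List, Tuple

# Dialect order is an assumption for this sketch; the record does not specify one.
DIALECTS = ["Central", "Northern", "Northeastern", "Southern"]


def per_class_scores(cm: List[List[int]]) -> List[Tuple[float, float, float]]:
    """Return (precision, recall, F1) for each class.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp     # another class, predicted as k
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1, so every dialect counts equally."""
    scores = per_class_scores(cm)
    return sum(f1 for _, _, f1 in scores) / len(scores)


# Hypothetical counts (NOT from the paper): Northern is often confused with
# Central, while Southern is rarely confused with anything.
toy_cm = [
    [80, 15, 3, 2],
    [20, 70, 7, 3],
    [4, 6, 88, 2],
    [1, 1, 1, 97],
]

if __name__ == "__main__":
    for name, (p, r, f1) in zip(DIALECTS, per_class_scores(toy_cm)):
        print(f"{name:<13} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    print(f"macro-F1 = {macro_f1(toy_cm):.4f}")
```

Because the macro average weights each dialect equally regardless of how many test samples it has, a model that collapses onto a single class scores poorly on it even if that class dominates the data, which is one reason it is a common headline metric for multi-class benchmarks of this kind.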
