Improving OpenAI’s Whisper Model for Transcribing Homophones in Legal News

Lattapon Siriket; Kulsawasd Jitkajornwanich; Saichon Jaiyen; Sarun Intakosum

doi:10.1109/iceast61342.2024.10554018

Date

2024-5-1

Abstract

The Whisper model is a tool for transcribing human speech. It is open source and offers diverse functionality, and it can transcribe speech in multiple languages, including Thai. This paper focuses on improving Whisper’s transcription of Thai homophones in order to reduce the word error rate (WER). We concentrate on words in the legal news category and identify factors that lead to Whisper’s incorrect predictions. We collected homophones from snippets of legal news video clips and compiled them into a homophone dictionary. We then compared the words transcribed by Whisper in terms of word error rate and spelling. Based on initial results obtained with the original Whisper model and the homophone dictionary, 48% of a total of 94 words were transcribed incorrectly. We then propose a methodology that improves Whisper’s performance, so that Thai automatic speech recognition with Whisper can be used more fully in other applications.
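As an illustration of the metric named in the abstract, the following is a minimal sketch of the standard word error rate, computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the placeholder tokens are assumptions made for this example; the abstract does not state how the authors tokenized Thai text or implemented the metric.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of word tokens. Tokenization is assumed to have
    been done beforehand (hypothetical preprocessing, not from the paper).
    """
    if not reference:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, row by row).
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Placeholder tokens: one of four reference words is replaced by another word.
reference = ["w1", "w2", "w3", "w4"]
hypothesis = ["w1", "x2", "w3", "w4"]
print(word_error_rate(reference, hypothesis))  # 0.25
```

Because written Thai does not separate words with spaces, a transcript would need to be segmented into words before a word-level score like this can be computed, and the choice of segmenter affects the resulting WER.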