Word-level Text Generation from Language Models
Abstract
This research constructs and evaluates text generation models built from three language models, an n-gram model, a Continuous Bag of Words (CBOW) model, and a gated recurrent unit (GRU) model, trained on two corpora, Berkeley Restaurant (Berkeley) and Alice's Adventures in Wonderland (Alice), and evaluated with two metrics: perplexity and count of grammatical errors. The mean perplexities of the three models are comparable on each corpus, although the n-gram model produces slightly lower perplexity values. On the Alice corpus, all three models produce slightly more grammatical errors than the original corpus. On the Berkeley corpus, the n-gram model produces the fewest errors, fewer even than the original corpus, while the CBOW and GRU models produce the highest numbers of errors.
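To illustrate the perplexity metric used in the evaluation, the following is a minimal sketch of how perplexity can be computed for a simple add-one-smoothed bigram language model. This is an illustrative example only; the thesis does not specify its smoothing scheme or tokenization, and the toy corpus below is hypothetical.

```python
import math
from collections import Counter

def bigram_perplexity(train_tokens, test_tokens):
    """Perplexity of an add-one-smoothed bigram model on test_tokens.

    Perplexity is exp of the average negative log-probability the
    model assigns to each test bigram; lower values indicate a
    better fit, as in the thesis's comparison of the three models.
    """
    vocab = set(train_tokens) | set(test_tokens)
    V = len(vocab)
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    log_prob = 0.0
    n = 0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        # Add-one (Laplace) smoothing avoids zero probability
        # for bigrams unseen in training.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

# Toy example (hypothetical sentences, not from the thesis corpora):
train = "i want to eat chinese food".split()
test = "i want chinese food".split()
print(bigram_perplexity(train, test))
```

With add-one smoothing, a test bigram seen in training receives a higher probability than an unseen one, so perplexity reflects how well the training corpus predicts the test text.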