Improving a text classifier using text augmentation: road traffic content from Twitter


Abstract

The purpose of this study is to develop a more effective method for classifying Thai-language tweets about road traffic into five categories. Previous studies have used CNN and BERT for this classification, but these models need balanced training data to perform well. To address this, we propose using BPEmb to augment the data and compute cosine similarity. The augmented data are then used to build a balanced dataset for training a combined CNN and bi-LSTM model for tweet classification. Our experiments show a 14.3% increase in F1-score over the baseline method.
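The abstract does not say exactly how the BPEmb vectors and cosine similarity are combined. One common pattern is to swap tokens for their most similar neighbours in embedding space and to use those synthetic tweets to oversample the minority classes. The sketch below shows that pattern under stated assumptions. It is not the authors' implementation. The toy three-dimensional vectors are hypothetical stand-ins for real BPEmb subword vectors, the English tokens stand in for Thai subwords, and the names `nearest`, `augment`, `balance`, and the replacement probability `p` are all illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(token, emb, k=1):
    """Return up to k vocabulary tokens most similar to `token`, excluding the token itself."""
    if token not in emb:
        return []
    scored = [(cosine(emb[token], vec), other) for other, vec in emb.items() if other != token]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


def augment(tokens, emb, rng, p=0.3):
    """Hypothetical augmentation: replace each token by its nearest neighbour with probability p."""
    out = []
    for tok in tokens:
        candidates = nearest(tok, emb)
        if candidates and rng.random() < p:
            out.append(candidates[0])
        else:
            out.append(tok)
    return out


def balance(dataset, emb, seed=0):
    """Oversample every class with augmented copies until it matches the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for tokens, label in dataset:
        by_label.setdefault(label, []).append(tokens)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        balanced.extend((tokens, label) for tokens in items)
        # Cycle through the original examples of this class to create synthetic ones.
        for i in range(target - len(items)):
            balanced.append((augment(items[i % len(items)], emb, rng), label))
    return balanced


if __name__ == "__main__":
    # Hypothetical toy vectors; a real pipeline would look these up in BPEmb.
    emb = {
        "jam": [0.9, 0.1, 0.0],
        "congestion": [0.85, 0.15, 0.05],
        "accident": [0.1, 0.9, 0.0],
        "crash": [0.12, 0.88, 0.05],
        "road": [0.3, 0.3, 0.9],
    }
    data = [
        (["road", "jam"], "traffic"),
        (["road", "congestion"], "traffic"),
        (["road", "jam"], "traffic"),
        (["crash", "road"], "accident"),
    ]
    for tokens, label in balance(data, emb):
        print(label, tokens)
```

Oversampling with perturbed copies, rather than duplicating tweets verbatim, adds some lexical variety to the minority classes. With substitution alone, however, a synthetic tweet keeps the length and word order of its source.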
