Learning Extended Term Frequency-Inverse Document Frequency (TF-IDF++) for Depression Screening From Sentences in Thai Blog Post
Abstract
This paper proposes a method for depression screening from sentences in Thai blog posts. Three classifiers, based on a decision tree, linear SVC, and logistic regression, were used to create classification models. Each model learned from the extended term frequency-inverse document frequency (TF-IDF++), a feature vector built from term frequency-inverse document frequency (TF-IDF), part-of-speech information, and sentence statistics such as word counts of selected terms. Our experiments showed that the model based on logistic regression achieved the top average score, with a precision of 78.32%, a recall of 78.26%, and an F1-score of 78.27%. The proposed method outperforms the Thai BERT model by 0.75%, 0.77%, and 0.76%, respectively. Our investigation also showed that the Thai BERT model tends to be overconfident, classifying samples with high probability even when the prediction is incorrect. In such cases, its error is noticeably higher than that of the incorrect predictions made by our proposed logistic regression-based model.
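As a rough illustration of how a TF-IDF++ style feature vector could be assembled, the sketch below concatenates a TF-IDF vector with part-of-speech counts and two sentence statistics. It is not the authors' implementation. The smoothed IDF formula, the POS inventory, the selected-term list, the choice of statistics, and the English placeholder tokens are all assumptions made for the example, and the input is assumed to be already tokenized and POS-tagged.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) for a list of tokenized sentences.

    Uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, which is one common
    variant; the abstract does not state which weighting was used.
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * idf[t] if d else 0.0 for t in vocab])
    return vocab, vectors


def extended_features(tokens, pos_tags, tfidf_vec, pos_inventory, selected_terms):
    """Concatenate TF-IDF, POS counts, and sentence statistics (hypothetical layout)."""
    pos_counts = Counter(pos_tags)
    pos_part = [pos_counts[p] for p in pos_inventory]
    # Sentence statistics: total word count and count of selected terms.
    stats_part = [len(tokens), sum(1 for t in tokens if t in selected_terms)]
    return list(tfidf_vec) + pos_part + stats_part


# Placeholder data: English tokens stand in for tokenized Thai sentences.
docs = [["i", "feel", "sad", "today"], ["today", "is", "sunny"]]
pos = [["PRON", "VERB", "ADJ", "NOUN"], ["NOUN", "VERB", "ADJ"]]
POS_INVENTORY = ["ADJ", "NOUN", "PRON", "VERB"]  # assumed tag set
SELECTED_TERMS = {"sad", "tired"}                # assumed term list

vocab, tfidf = tfidf_vectors(docs)
features = [
    extended_features(d, p, v, POS_INVENTORY, SELECTED_TERMS)
    for d, p, v in zip(docs, pos, tfidf)
]
```

Each row of `features` could then be passed to any of the three classifiers named in the abstract. In practice the TF-IDF part would typically be kept sparse and the count-based parts scaled before training.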