Learning Extended Term Frequency-Inverse Document Frequency (TF-IDF++) for Depression Screening From Sentences in Thai Blog Post

dc.contributor.authorSahussawud Khunruksa
dc.contributor.authorSomkiat Wangsiripitak
dc.date.accessioned2026-05-08T19:20:20Z
dc.date.issued2023-05-18
dc.description.abstractThis paper proposes a method for depression screening from individual sentences in Thai blog posts. Three classifiers (a decision tree, a linear SVC, and logistic regression) were used to create classification models; each learned from extended term frequency-inverse document frequency (TF-IDF++) features, a feature vector that combines term frequency-inverse document frequency (TF-IDF) with part-of-speech information and sentence statistics such as word counts of selected terms. Our experiments showed that the logistic regression model achieved the best average scores, with a precision of 78.32%, a recall of 78.26%, and an F1-score of 78.27%. The proposed method outperforms the Thai BERT model by 0.75%, 0.77%, and 0.76%, respectively. Our investigation also showed that the Thai BERT model is overconfident: it tends to assign a high probability to its predicted class even when the prediction is incorrect, so its error on misclassified samples is noticeably higher than that of the proposed logistic regression-based model.
dc.identifier.doi10.1109/icbir57571.2023.10147692
dc.identifier.urihttps://dspace.kmitl.ac.th/handle/123456789/17484
dc.subjectTopic Modeling
dc.subjectSentiment Analysis and Opinion Mining
dc.subjectMental Health via Writing
dc.titleLearning Extended Term Frequency-Inverse Document Frequency (TF-IDF++) for Depression Screening From Sentences in Thai Blog Post
dc.typeArticle
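
The abstract describes TF-IDF++ as a TF-IDF vector extended with part-of-speech information and sentence statistics. A minimal sketch of that idea in plain Python follows. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy English tokens, the `pos_of` tag lookup, the smoothed IDF formula, and the choice of statistics (sentence length and a count of selected terms) are all hypothetical. Real Thai text would first need word segmentation and a part-of-speech tagger, which are not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): raw term counts times a smoothed IDF.

    The smoothing ln((1 + n) / (1 + df)) + 1 is an assumption made for
    this sketch; the record does not state which IDF variant was used.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def extended_features(doc, tfidf_vec, pos_of, pos_tags, selected_terms):
    """Append POS-tag counts and simple sentence statistics to a TF-IDF vector."""
    # Hypothetical POS lookup: unknown tokens fall into an "OTHER" bucket.
    pos_counts = Counter(pos_of.get(tok, "OTHER") for tok in doc)
    pos_part = [pos_counts[tag] for tag in pos_tags]
    # Assumed statistics: sentence length and number of selected terms.
    stats_part = [len(doc), sum(1 for tok in doc if tok in selected_terms)]
    return tfidf_vec + pos_part + stats_part


# Toy, pre-tokenized sentences (placeholders, not data from the paper).
docs = [["i", "feel", "sad", "today"], ["i", "feel", "fine"]]
pos_of = {"i": "PRON", "feel": "VERB", "sad": "ADJ", "fine": "ADJ", "today": "NOUN"}
pos_tags = ["PRON", "VERB", "ADJ", "NOUN"]
selected_terms = {"sad"}

vocab, tfidf = tfidf_vectors(docs)
features = [
    extended_features(doc, vec, pos_of, pos_tags, selected_terms)
    for doc, vec in zip(docs, tfidf)
]
```

Each row of `features` could then be passed to a classifier such as a decision tree, a linear SVC, or logistic regression, the three model families named in the abstract.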
