Improving Naive Bayes by Reducing the Importance of Low-Frequency Words Based on Entropy of Words for Spam Email Classification

dc.contributor.author: Phaiboon Trikanjananun
dc.contributor.author: Arjin Numsomran
dc.contributor.author: Vittaya Tipsuwannaporn
dc.date.accessioned: 2026-05-08T19:19:55Z
dc.date.issued: 2022-11-27
dc.description.abstract: The Naive Bayes (NB) algorithm is popular for spam email classification because it trains quickly, uses simple techniques, and achieves high accuracy. One of the many studies that improve the NB algorithm is what we refer to in this paper, for convenience, as the AWF-NB algorithm. AWF-NB addresses the NB assumption that a word is equally important in every class, which is not always the case. To solve this problem, AWF-NB sharply reduces the importance of a word in the class where that word is less important. However, this reduction lowers accuracy when the importance of a word differs only slightly between classes. Therefore, the goal of this research is to improve the AWF-NB algorithm by reducing the importance of words based on their entropy: we compute the entropy of each word to decide whether its importance should be reduced. Experimental results on ten spam email datasets from the Kaggle website indicate that the proposed RIWE-NB algorithm can remarkably increase classification accuracy over both the NB algorithm and the AWF-NB algorithm on the majority of the datasets, while execution time is preserved.
dc.identifier.doi: 10.23919/iccas55662.2022.10003787
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/17269
dc.publisher: 2022 22nd International Conference on Control, Automation and Systems (ICCAS)
dc.subject: Spam and Phishing Detection
dc.subject: Text and Document Classification Technologies
dc.subject: Sentiment Analysis and Opinion Mining
dc.title: Improving Naive Bayes by Reducing the Importance of Low-Frequency Words Based on Entropy of Words for Spam Email Classification
dc.type: Article
