Comparison of Machine Learning Methods for Binary Classification of Multicollinearity Data
Abstract
This study examines binary classification performance in the presence of multicollinearity. Four machine learning methods (backpropagation neural network, Naïve Bayes, support vector machine, and random forest) are compared on how well they handle multicollinear data. Multicollinearity among the independent variables is modeled in two ways: a constant correlation structure and a Toeplitz correlation structure, each with correlation coefficients of 0.1 and 0.9. The independent variables are simulated from a multivariate normal distribution with 10, 20, 30, and 40 variables. The dependent variable is constructed using the logit function, with sample sizes of 100 and 200. The simulation and data analysis are performed in RStudio and repeated 1,000 times for each scenario. The results show that the backpropagation neural network and Naïve Bayes achieve the highest mean accuracy percentage under constant correlation, whereas the backpropagation neural network and the support vector machine achieve the highest mean accuracy percentage under Toeplitz correlation.
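The data-generating design described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the R environment the study used, and it rests on several assumptions not stated in the abstract: the Toeplitz structure is taken to be the common form in which entry (i, j) equals ρ^|i−j|, the regression coefficients `beta` are hypothetical values, the intercept is zero, and the function names and random seed are chosen only for illustration.

```python
import numpy as np

def constant_corr(p, rho):
    """Correlation matrix with 1 on the diagonal and rho everywhere else."""
    return np.full((p, p), rho) + (1.0 - rho) * np.eye(p)

def toeplitz_corr(p, rho):
    """Assumed Toeplitz form: entry (i, j) equals rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def simulate(n, corr, beta, rng):
    """Draw n rows of correlated normal predictors and a binary response.

    The response is Bernoulli with success probability given by the
    logistic (inverse logit) function of the linear predictor X @ beta.
    """
    p = corr.shape[0]
    X = rng.multivariate_normal(np.zeros(p), corr, size=n)
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, prob)
    return X, y

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
p, n, rho = 10, 100, 0.9         # one scenario from the abstract
beta = np.full(p, 0.5)           # hypothetical coefficients (assumption)

X_const, y_const = simulate(n, constant_corr(p, rho), beta, rng)
X_toep, y_toep = simulate(n, toeplitz_corr(p, rho), beta, rng)
```

Repeating such draws 1,000 times per scenario, fitting each classifier, and averaging the accuracy would mirror the comparison described above; the classifier fitting itself is omitted here because the abstract does not give the model settings.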