Performance Evaluation of Imputation Techniques for Telecommunications Customer Clustering
Publisher
ECTI Transactions on Computer and Information Technology (ECTI-CIT)
Abstract
Missing data significantly degrades machine learning model performance in telecommunications customer analytics, leading to unreliable customer segmentation and suboptimal business decisions. This research systematically compares seven imputation techniques across three missingness mechanisms (MCAR, MAR, MNAR) and four missing rates (5%, 10%, 20%, 30%) using the Telco Customer Churn Dataset (7,043 records). The methods evaluated include traditional approaches (mean/mode, forward fill, regression), machine learning techniques (KNN, Random Forest, MICE), and a deep learning approach (Autoencoder). We assessed imputation accuracy using normalized MAE and RMSE, and evaluated downstream effects through clustering algorithms. Random Forest imputation performed best, with an MAE of 0.1568 and an RMSE of 0.2123, a 53.7% lower error than mean/mode imputation. Statistical analysis confirmed significant performance differences among the methods (Friedman test: χ² = 55.85, p < 0.001). Interestingly, clustering performance did not directly correlate with imputation accuracy: the Autoencoder achieved the highest silhouette score (0.1510) despite only moderate reconstruction accuracy. Machine learning approaches maintained robust performance across all missing data mechanisms, whereas traditional methods degraded under MNAR conditions. These findings provide evidence-based guidelines for selecting appropriate imputation techniques in telecommunications analytics, enabling improved customer segmentation and business outcomes.
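The evaluation protocol described in the abstract (hide known values, impute them, then score the imputed values against the hidden truth) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the column is synthetic rather than drawn from the Telco Customer Churn Dataset, "normalized" is taken to mean min-max scaling to [0, 1], only the MCAR mechanism and the mean-imputation baseline are shown, and the helper names (`min_max_normalize`, `mask_mcar`, `impute_mean`, `mae_rmse`) are hypothetical.

```python
import math
import random


def min_max_normalize(values):
    """Scale values to [0, 1] (assumed meaning of 'normalized' here)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def mask_mcar(values, rate, rng):
    """Hide a fraction `rate` of entries uniformly at random (MCAR).

    Returns the masked copy (hidden entries set to None) and the
    sorted list of hidden indices.
    """
    k = round(rate * len(values))
    hidden = sorted(rng.sample(range(len(values)), k))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def impute_mean(masked):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def mae_rmse(truth, imputed, hidden):
    """Compute MAE and RMSE over the hidden positions only."""
    errors = [imputed[i] - truth[i] for i in hidden]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


rng = random.Random(0)
# Synthetic stand-in for one numeric feature; NOT data from the paper.
column = [rng.uniform(20.0, 120.0) for _ in range(1000)]
truth = min_max_normalize(column)

results = {}
for rate in (0.05, 0.10, 0.20, 0.30):
    masked, hidden = mask_mcar(truth, rate, rng)
    imputed = impute_mean(masked)
    results[rate] = mae_rmse(truth, imputed, hidden)
    mae, rmse = results[rate]
    print(f"rate={rate:.2f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

Under the same assumptions, another imputer could be compared by swapping it in for `impute_mean`, and a MAR or MNAR mechanism by making the choice of hidden indices in `mask_mcar` depend on another feature or on the hidden value itself.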