Performance Evaluation of Imputation Techniques for Telecommunications Customer Clustering
| dc.contributor.author | Patthama Sukthong | |
| dc.contributor.author | Pattama Charoenporn | |
| dc.date.accessioned | 2026-05-08T19:26:19Z | |
| dc.date.issued | 2026-01-31 | |
| dc.description.abstract | Missing data significantly degrades machine learning model performance in telecommunications customer analytics, leading to unreliable customer segmentation and suboptimal business decision-making. This research systematically compares seven imputation techniques across three missingness mechanisms (MCAR, MAR, MNAR) and four missing-data rates (5%, 10%, 20%, 30%) using the Telco Customer Churn Dataset (7,043 records). Methods evaluated include traditional approaches (mean/mode, forward fill, regression), machine learning techniques (KNN, Random Forest, MICE), and deep learning (Autoencoder). We assessed imputation accuracy using normalized MAE and RMSE, and evaluated downstream effects through clustering algorithms. Results demonstrate Random Forest imputation's superior performance with MAE of 0.1568 and RMSE of 0.2123, achieving 53.7% lower error rates compared to mean/mode imputation. Statistical analysis confirmed significant performance differences (Friedman test: χ² = 55.85, p < 0.001). Interestingly, clustering performance did not directly correlate with imputation accuracy; the Autoencoder achieved the highest silhouette score (0.1510) despite moderate reconstruction accuracy. Machine learning approaches maintained robust performance across all missing data mechanisms, whereas traditional methods degraded under MNAR conditions. These findings provide evidence-based guidelines for selecting appropriate imputation techniques in telecommunications analytics, enabling improved customer segmentation and business outcomes. | |
| dc.identifier.doi | 10.37936/ecti-cit.2026201.264267 | |
| dc.identifier.uri | https://dspace.kmitl.ac.th/handle/123456789/20551 | |
| dc.publisher | ECTI Transactions on Computer and Information Technology (ECTI-CIT) | |
| dc.subject | Customer churn and segmentation | |
| dc.subject | Customer Service Quality and Loyalty | |
| dc.subject | Imbalanced Data Classification Techniques | |
| dc.title | Performance Evaluation of Imputation Techniques for Telecommunications Customer Clustering | |
| dc.type | Article |
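
The evaluation protocol summarized in the abstract (mask known values, impute them, then score the imputed cells with normalized MAE and RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic stand-ins for the Telco dataset, only the MCAR mechanism and the mean-imputation baseline are shown, and all function names (`mcar_mask`, `min_max_normalize`, `mean_impute`, `masked_errors`) are hypothetical. Min-max scaling is assumed as the normalization; the abstract does not specify which one was used.

```python
import numpy as np


def mcar_mask(shape, rate, rng):
    """Boolean mask, True where a value is removed completely at random (MCAR)."""
    return rng.random(shape) < rate


def min_max_normalize(x):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)
    return (x - lo) / span


def mean_impute(x_missing):
    """Replace NaN entries with the column mean of the observed values."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)


def masked_errors(x_true, x_imputed, mask):
    """MAE and RMSE computed only over the cells that were masked out."""
    diff = x_imputed[mask] - x_true[mask]
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return mae, rmse


rng = np.random.default_rng(0)
# Synthetic numeric features (assumption: stands in for the real dataset).
x_true = min_max_normalize(rng.normal(size=(1000, 3)))

results = {}
for rate in (0.05, 0.10, 0.20, 0.30):  # missing rates listed in the abstract
    mask = mcar_mask(x_true.shape, rate, rng)
    x_missing = np.where(mask, np.nan, x_true)
    results[rate] = masked_errors(x_true, mean_impute(x_missing), mask)
    mae, rmse = results[rate]
    print(f"rate={rate:.2f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

Swapping `mean_impute` for another imputer (for example a KNN or iterative imputer) while keeping the same mask and scoring function gives a like-for-like comparison. MAR and MNAR would need masks that depend on observed or unobserved values respectively, which this sketch does not attempt.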