Code Smell Classification Using Graph Convolutional Network with Imbalanced Data and Model Integration

Phawinee Suphawimon; Tuchsanai Ploysuwan

doi:10.1109/isai-nlp66160.2025.11320518

Date

2025-11-12

Authors

Phawinee Suphawimon

Tuchsanai Ploysuwan

Abstract

Code smells are design anomalies that significantly degrade software maintainability and quality. This paper presents a framework for code smell classification that integrates Graph Convolutional Networks (GCNs) with traditional machine learning techniques. We systematically evaluate graph construction approaches, model integration methods, and data balancing strategies on nine real-world Python repositories labeled with PyExamine. Our methodology combines BERT embeddings with graph structural representations through two integration schemes: layer integration (Method I) and feature concatenation (Method II). Results show that per-line graph construction outperforms global approaches, and that balancing with SMOTE achieves 96.14% accuracy compared to 86.70% on imbalanced data. Including non-smelly code improves performance from 71% to 95%, demonstrating the importance of negative examples. Our ablation study shows that explicit feature engineering achieves only 67% accuracy, compared to 95% for end-to-end learning. The integrated GCN with Transformer using Method II achieves 95% accuracy and an 89% F1-score, close to CodeT5 in accuracy (97%) and above it in F1-score (85%), while providing better interpretability.
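The abstract does not give implementation details for feature concatenation (Method II), so the following is only a minimal sketch of the general idea under stated assumptions: a single GCN propagation step with the standard symmetric normalization, mean pooling over nodes, and concatenation of the pooled graph vector with a text embedding. The function names (`matmul`, `gcn_layer`, `concat_features`), the toy graph, the identity weight matrix, and the three-dimensional stand-in for a BERT embedding are all hypothetical and are not taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in out]


def concat_features(text_embedding, adj, node_feats, weight):
    """Assumed reading of feature concatenation: [text embedding ; pooled graph vector]."""
    hidden = gcn_layer(adj, node_feats, weight)
    # Mean-pool node representations into one graph-level vector (an assumption).
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    return list(text_embedding) + pooled


# Toy example: a 3-node path graph standing in for the graph of one code line.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]      # identity weights, for illustration only
text_embedding = [0.1, 0.2, 0.3]       # placeholder for a BERT embedding

combined = concat_features(text_embedding, adj, node_feats, weight)
print(len(combined))
```

The combined vector would then be passed to a downstream classifier. In this sketch the graph branch has no trained parameters; a real model would learn `weight` jointly with the classifier.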