Code Smell Classification Using Graph Convolutional Network with Imbalanced Data and Model Integration

Abstract

Code smells are design anomalies that significantly degrade software maintainability and quality. This paper presents a framework for code smell classification that integrates Graph Convolutional Networks (GCNs) with traditional machine learning techniques. We systematically evaluate graph construction approaches, model integration methods, and data balancing strategies on nine real-world Python repositories labeled with PyExamine. Our methodology combines BERT embeddings with graph structural representations through two integration schemes: layer integration (Method I) and feature concatenation (Method II). Results show that per-line graph construction outperforms global approaches, and that balancing with SMOTE reaches 96.14% accuracy compared to 86.70% on imbalanced data. Including non-smelly code improves performance from 71% to 95%, demonstrating the importance of negative examples. Our ablation study shows that explicit feature engineering achieves only 67% accuracy compared to 95% for end-to-end learning. The integrated GCN with Transformer using Method II achieves 95% accuracy and an 89% F1-score, close to CodeT5 in accuracy (97%) and above it in F1-score (85%), while providing better interpretability.
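The abstract names SMOTE as the balancing strategy but does not say which implementation or settings were used. As an illustration only, the following is a minimal pure-Python sketch of the standard SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `balance`, the parameter defaults, and the toy two-dimensional data are all hypothetical; in the paper's setting the feature vectors would presumably be learned embeddings, which is an assumption here.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch of SMOTE, not the authors' implementation.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance(features, labels, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(tuple(x))
    target = max(len(points) for points in by_class.values())
    out_x, out_y = [], []
    for y, points in by_class.items():
        extra = smote(points, target - len(points), k=k, seed=seed)
        for p in points + extra:
            out_x.append(p)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: six "non-smelly" (0) and two "smelly" (1) vectors.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (2.0, 2.0), (2.2, 2.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

bal_x, bal_y = balance(X, y)
print(len(bal_x), bal_y.count(0), bal_y.count(1))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into evaluation data; whether the reported 96.14% figure was obtained that way is not stated in the abstract.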
