Analyzing explainability of YOLO-based breast cancer detection using heat map visualizations


Publisher

Quantitative Imaging in Medicine and Surgery

Abstract

Background: Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related mortality among women worldwide. The disease is especially dangerous because it is often asymptomatic in its early stages, which underscores the importance of early detection. Mammography, a specialized X-ray imaging technique for breast examination, has been pivotal in enabling early detection and reducing mortality rates. In recent years, artificial intelligence (AI) has gained substantial traction across many fields, including medicine, and numerous studies have applied AI techniques, particularly convolutional neural networks (CNNs) and You Only Look Once (YOLO)-based models, to medical image detection and classification. However, the predictions of such AI models often lack transparency and explainability, resulting in low trustworthiness. This study addresses this gap by investigating three state-of-the-art versions of the YOLO algorithm, YOLO version 9 (YOLOv9), YOLO version 10 (YOLOv10), and YOLO version 11 (YOLO11), trained on breast cancer imaging datasets, specifically the INbreast and Mammographic Image Analysis Society (MIAS) databases. To address the challenges posed by the lack of explainability and transparency, we integrate seven explainable artificial intelligence (XAI) methods: Grad-CAM, Grad-CAM++, Eigen-CAM, EigenGrad-CAM, XGrad-CAM, LayerCAM, and HiResCAM.

Methods: This study used two publicly available breast cancer image databases: INbreast (toward a full-field digital mammographic database) and the MIAS dataset. Preprocessing steps were applied to standardize all images to the input requirements of the YOLO architecture before the datasets were used to train the three most recent YOLO versions. The YOLO model with the highest performance, measured by mean average precision (mAP), precision, and recall, was selected for integration with the seven XAI methods.
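The abstract does not specify the exact preprocessing pipeline beyond standardizing images to YOLO's input requirements. YOLO implementations conventionally use letterbox resizing (an aspect-preserving resize followed by gray padding to a fixed square input). A minimal NumPy sketch of that convention, assuming a 640x640 target size and nearest-neighbor interpolation (real pipelines typically use bilinear):

```python
import numpy as np

def letterbox(img, size=640, pad_value=114):
    """Resize an image to fit within a size x size square, preserving aspect
    ratio, and pad the remainder with a constant gray value (YOLO convention)."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index lookup (illustrative only).
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Center the resized image on a padded canvas.
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out

# A wide 100 x 200 "mammogram" is scaled to 320 x 640 and padded vertically.
padded = letterbox(np.zeros((100, 200)))
print(padded.shape)  # (640, 640)
```

This keeps lesion geometry undistorted, which matters when heat maps are later compared against ground-truth lesion annotations.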
Each XAI technique was evaluated both qualitatively, through visual inspection, and quantitatively, using several metrics: matching ground truth (mGT), Pearson correlation coefficient (PCC), precision, recall, and root mean square error (RMSE). These methods were used to interpret and visualize the "black box" decision-making processes of the top-performing YOLO model.

Results: YOLO11 outperformed YOLOv9 (mAP 0.868) and YOLOv10 (mAP 0.926), achieving the highest mAP of 0.935, with classification accuracies of 95% for benign and 80% for malignant cases. Among the evaluated XAI techniques, HiResCAM provided the most effective visual explanations, attaining the highest mGT score of 0.49 and surpassing EigenGrad-CAM (0.45) and LayerCAM (0.42) in both visual and quantitative evaluations.

Conclusions: The integration of YOLO11 with HiResCAM offers a robust solution that combines high detection accuracy with improved model interpretability. This approach not only enhances user trust by revealing decision-making patterns and limitations but also provides insights into the model's weaknesses, enabling developers to refine and improve AI performance further.
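The quantitative heat-map evaluation described in the abstract can be sketched in a few lines. The mGT metric is specific to this study, so only PCC and RMSE between a saliency map and a binary ground-truth lesion mask are shown here; this is a minimal NumPy illustration, not the authors' implementation:

```python
import numpy as np

def pcc(heatmap, mask):
    """Pearson correlation coefficient between a saliency map and a
    ground-truth lesion mask (both flattened; assumes non-constant inputs)."""
    h = heatmap.ravel().astype(float)
    m = mask.ravel().astype(float)
    h -= h.mean()
    m -= m.mean()
    return float((h @ m) / (np.linalg.norm(h) * np.linalg.norm(m)))

def rmse(heatmap, mask):
    """Root mean square error between the saliency map and the mask,
    assuming both are scaled to [0, 1]."""
    diff = heatmap.astype(float) - mask.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example: a 4x4 heat map whose activation peaks inside the lesion region.
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0
heat = np.zeros((4, 4)); heat[1:3, 1:3] = 0.9; heat[0, 0] = 0.2
print(pcc(heat, mask), rmse(heat, mask))
```

A heat map that concentrates on the annotated lesion yields a PCC near 1 and a low RMSE, which is the behavior the study reports for HiResCAM relative to the other CAM variants.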
