Open Access
ARTICLE
Modeling and Predictive Analytics of Breast Cancer Using Ensemble Learning Techniques: An Explainable Artificial Intelligence Approach
1 Computer Science and Engineering Discipline, Khulna University, Khulna, 9208, Bangladesh
2 Information and Communication Engineering, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
3 School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39401, USA
4 Computer Skills, Department of Self-Development Skill, Common First Year Deanship, King Saud University, Riyadh, 11362, Saudi Arabia
5 Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, 12372, Saudi Arabia
* Corresponding Author: Anupam Kumar Bairagi. Email:
(This article belongs to the Special Issue: Emerging Trends and Applications of Deep Learning for Biomedical Signal and Image Processing)
Computers, Materials & Continua 2024, 81(3), 4033-4048. https://doi.org/10.32604/cmc.2024.057415
Received 16 August 2024; Accepted 30 October 2024; Issue published 19 December 2024
Abstract
Breast cancer is one of the world's most dangerous diseases and has recently surpassed lung cancer as the most prevalent cancer type. It arises when cells in the breast proliferate uncontrollably and form a tumor that can invade surrounding tissue. The disease is not confined to one sex: both men and women can be diagnosed with breast cancer, although it is more frequently observed in women. Early detection is the key to reducing its mortality rate. However, the black-box machine learning models used for detection must also be explained if they are to gain the trust of medical professionals and patients. In this study, we experimented with several machine learning models to predict breast cancer using the Wisconsin Breast Cancer Dataset (WBCD). We applied Random Forest, XGBoost, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Gradient Boosting classifiers and compared them after tuning the hyperparameters of each. The Random Forest model outperformed the others, achieving the highest accuracy of 99.46%. After the performance evaluation, two Explainable Artificial Intelligence (XAI) methods, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), were used to explain the Random Forest model.
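SHAP, one of the two XAI methods named above, attributes a single prediction to individual features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a hypothetical three-feature scoring function (`toy_score`, which stands in for a trained classifier and is not the authors' Random Forest). It assumes that a feature absent from a coalition is set to a single baseline value. The SHAP library itself estimates these quantities with explainers such as TreeExplainer and KernelExplainer and uses a background dataset rather than one baseline. Exact enumeration like this grows as 2^n in the number of features, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    A feature outside the coalition takes its baseline value
    (a simplifying assumption; SHAP averages over background data).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[k] if (k in subset or k == i) else baseline[k] for k in range(n)]
                without_i = [x[k] if k in subset else baseline[k] for k in range(n)]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_score(v):
    """Hypothetical malignancy score over three illustrative features
    (radius-like, texture-like, concavity-like); not a trained model."""
    return 0.5 * v[0] + 0.2 * v[1] + 0.3 * v[0] * v[2]


x = [2.0, 1.0, 1.0]         # hypothetical patient features
baseline = [0.0, 0.0, 0.0]  # hypothetical reference input
phi = shapley_values(toy_score, x, baseline)
print([round(v, 6) for v in phi])  # → [1.3, 0.2, 0.3]
```

The attributions sum to `toy_score(x) - toy_score(baseline)` (the efficiency property that SHAP relies on), and the interaction term `0.3 * v[0] * v[2]` is split evenly between features 0 and 2.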

This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.