Open Access
ARTICLE
Explainable Ensemble Learning Approach for Ovarian Cancer Diagnosis Using Clinical Data
1 Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
2 Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
3 Department of Mathematics and Sciences, College of Sciences and Humanities, Prince Sultan University, Riyadh, Saudi Arabia
4 Department of Mathematics, COMSATS University Islamabad, Park Road, Islamabad, Pakistan
* Corresponding Author: Daniyal Asif. Email:
(This article belongs to the Special Issue: Artificial Intelligence Models in Healthcare: Challenges, Methods, and Applications)
Computer Modeling in Engineering & Sciences 2026, 146(3), 38 https://doi.org/10.32604/cmes.2026.077334
Received 07 December 2025; Accepted 02 March 2026; Issue published 30 March 2026
Abstract
Ovarian cancer (OC) is one of the leading causes of death from gynecological cancer; early diagnosis is difficult, and tumor biomarkers are heterogeneous. Machine learning (ML) can process complex datasets and support decision-making in OC diagnosis. Nevertheless, traditional ML models tend to be biased, prone to overfitting, sensitive to noise, and poor at generalizing. Moreover, their black-box nature reduces interpretability and limits their clinical applicability. In this study, we introduce an explainable ensemble learning (EL) model, TreeX-Stack, based on a stacking architecture that employs the tree-based learners Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost) as base learners and Logistic Regression (LR) as the meta-learner to improve OC diagnosis. Local Interpretable Model-Agnostic Explanations (LIME) are used to explain individual predictions, making the model outputs more clinically interpretable and applicable. The model is trained on a dataset that includes demographic information, blood tests, general chemistry, and tumor markers. Preprocessing handles missing data through iterative imputation with Bayesian Ridge and addresses multicollinearity by removing features with correlation coefficients above 0.7. Relevant features are then selected using the Boruta feature selection method. To obtain robust and unbiased performance estimates during hyperparameter tuning, nested cross-validation (CV) with grid search is employed, and all experiments are repeated five times to improve statistical reliability. TreeX-Stack achieves an accuracy of 0.9027, a precision of 0.8673, a recall of 0.9391, and an F1-score of 0.9012. Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4 (HE4) as the most significant biomarker for OC.
The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.
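The workflow summarized in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: it runs on synthetic data, and the variable names, fold counts, estimator sizes, and the one-parameter grid are all assumptions made for illustration. XGBoost, Boruta, and LIME are left out because they require third-party packages, so only three of the four base learners appear, and the correlation filter is applied once up front as a simplification.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede IterativeImputer import)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score


def drop_correlated(df, threshold=0.7):
    """Drop every column whose |Pearson r| with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic stand-in for the clinical table (assumption: not the study data).
X, y = make_classification(
    n_samples=150, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject about 5% missing values
X_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Correlation filter with the 0.7 cutoff quoted in the abstract.
X_df = drop_correlated(X_df, threshold=0.7)

# Stacking ensemble: tree-based base learners, logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=25, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Imputation sits inside the pipeline so it is refit on each training fold.
model = Pipeline(
    [
        ("impute", IterativeImputer(estimator=BayesianRidge(), random_state=0)),
        ("stack", stack),
    ]
)

# Nested CV: the inner loop tunes hyperparameters, the outer loop scores the tuned model.
param_grid = {"stack__rf__max_depth": [3, None]}  # hypothetical grid
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(model, param_grid, cv=inner_cv, scoring="accuracy")
scores = cross_val_score(search, X_df, y, cv=outer_cv, scoring="accuracy")

print("outer-fold accuracy:", scores)
```

Wrapping `GridSearchCV` inside `cross_val_score` is what makes the procedure nested: the outer test folds are never seen during tuning, which is the property the abstract relies on for unbiased estimates. Repeating the outer loop with different `random_state` values would mirror the repeated runs described above.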
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

