Open Access
ARTICLE
Explainable Segmentation-Guided Mamba-Transformer Framework for Automated Cardiovascular Disease Detection
1 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
2 Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
3 Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
4 Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
5 Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
6 Department of Computer Science and Engineering, Soonchunhyang University, Asan, Republic of Korea
* Corresponding Authors: Muhammad Umer; Yongwon Cho
(This article belongs to the Special Issue: Exploring the Impact of Artificial Intelligence on Healthcare: Insights into Data Management, Integration, and Ethical Considerations)
Computer Modeling in Engineering & Sciences 2026, 147(1), 43 https://doi.org/10.32604/cmes.2026.078510
Received 01 January 2026; Accepted 03 April 2026; Issue published 27 April 2026
Abstract
Cardiovascular diseases (CVD) remain the leading cause of global mortality, making early and accurate diagnosis essential for improving patient outcomes. However, most existing deep learning approaches address cardiac image segmentation and disease classification independently, which limits their effectiveness in complex clinical decision-making scenarios. In this study, we propose an explainable spatio-temporal deep learning framework that integrates segmentation-guided representation learning with efficient temporal modeling for automated CVD detection. The proposed architecture uses the Segment Anything Model for Medical Imaging in 2D (SAM-Med2D) to segment cardiac structures, followed by Mamba-based temporal feature extraction and Transformer-driven spatial representation learning to capture both dynamic motion patterns and anatomical dependencies in cardiac imaging sequences. To enhance transparency and clinical trust, Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP) are employed to provide interpretable diagnostic insights. The framework is evaluated on three benchmark cardiovascular datasets: EchoNet-Dynamic, CAMUS echocardiography, and UK Biobank cine cardiac magnetic resonance imaging (CMR). It achieves a Dice score of 91.20% for segmentation, an area under the curve (AUC) of 95.50%, a classification accuracy of 92.10%, and a Matthews correlation coefficient (MCC) of 0.84. It consistently outperforms baseline and existing methods, with improvements of approximately 3%–6% in segmentation performance and 3%–4% in classification accuracy. The proposed approach offers a robust and explainable solution for automated cardiovascular disease detection, with significant potential to support reliable clinical deployment and improve diagnostic workflows in medical imaging practice.
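For readers less familiar with two of the metrics reported in the abstract, the following is a minimal sketch of how a Dice score and an MCC are computed from their standard definitions. It is not the authors' evaluation code; the function names `dice_score` and `mcc` and the toy inputs are assumptions made for illustration only.

```python
from math import sqrt


def dice_score(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for flat binary masks (sequences of 0/1)."""
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p == 1 and t == 1)
    total = sum(pred_mask) + sum(true_mask)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example (hypothetical data, not taken from the paper).
predicted = [1, 0, 0, 0]
reference = [1, 1, 0, 0]
print(f"Dice: {dice_score(predicted, reference):.4f}")
print(f"MCC:  {mcc(reference, predicted):.4f}")
```

Dice rewards overlap between the predicted and reference masks, while MCC summarizes all four cells of the confusion matrix, which makes it informative when classes are imbalanced.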
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.