Open Access

ARTICLE


A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis

Sadia Fatima1, Fadl Dahan2,*, Jamal Hussain Shah1, Refan Almohamedh2, Mohammed Aloqaily2, Samia Riaz1

1 Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt, 47040, Pakistan
2 Department of Management Information Systems, College of Business Administration—Hawtat Bani Tamim, Prince Sattam bin Abdulaziz University, Al-Kharj, 11942, Saudi Arabia

* Corresponding Author: Fadl Dahan

Computer Modeling in Engineering & Sciences 2025, 145(1), 971-994. https://doi.org/10.32604/cmes.2025.070272

Abstract

The human gastrointestinal (GI) tract is affected by numerous disorders. If not detected early, these disorders may lead to severe consequences such as organ failure or the development of cancer and, in extreme cases, become life-threatening. Endoscopy is a specialized imaging technique used to examine the GI tract; however, physicians may overlook irregular morphologies during prolonged, continuous review of the video recording. Recent advances in artificial intelligence have produced high-performance AI-based systems well suited to computer-assisted diagnosis. Endoscopic image analysis nonetheless faces several limitations, including visual similarity between infected and healthy areas, retrieval of irrelevant features, and imbalanced training and testing datasets, all of which reduce classification accuracy. To address these challenges, we propose a framework for analyzing GI tract images that yields a more robust and reliable model, thereby reducing the chance of misclassification. Compared with single-model solutions, the proposed methodology improves performance by integrating diverse models and optimizing feature fusion through a dual-branch CNN-Transformer architecture. The approach employs a dual-branch feature extraction mechanism: the first branch extracts features with an Extended BEiT transformer, while the second uses EfficientNet-B5. Cross-entropy loss measures the prediction error at both branches, followed by model stacking. This multimodal framework outperforms existing approaches across multiple metrics, achieving 94.12% accuracy, recall, and F1-score, as well as 94.15% precision, on the Kvasir dataset. Furthermore, the model reduces the false negative rate to 5.88%, strengthening its ability to minimize misdiagnosis. These results highlight the suitability of the proposed work for clinical practice, where it can provide the fast and accurate diagnostic assistance crucial for early diagnosis of GI tract diseases.
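To make the dual-branch design described above concrete, the following is a minimal PyTorch sketch of two headless backbones with per-branch cross-entropy losses and simple late fusion. The timm model names (beit_base_patch16_224 as a stand-in for the paper's Extended BEiT, and efficientnet_b5), the concatenation-based fusion head, and the equal loss weighting are illustrative assumptions; the authors' exact Extended BEiT variant and stacking procedure are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F
import timm  # pip install timm

class DualBranchGIClassifier(nn.Module):
    """Illustrative dual-branch CNN-Transformer classifier."""

    def __init__(self, num_classes=8):  # the Kvasir dataset has 8 classes
        super().__init__()
        # Branch 1: BEiT transformer backbone used as a pure feature
        # extractor (num_classes=0 drops the built-in classification head).
        self.beit = timm.create_model("beit_base_patch16_224",
                                      pretrained=True, num_classes=0)
        # Branch 2: EfficientNet-B5 CNN backbone, also headless.
        self.effnet = timm.create_model("efficientnet_b5",
                                        pretrained=True, num_classes=0)
        # Per-branch heads, so prediction error can be measured with
        # cross-entropy at each branch, as the abstract describes.
        self.head_beit = nn.Linear(self.beit.num_features, num_classes)
        self.head_effnet = nn.Linear(self.effnet.num_features, num_classes)
        # Fused head over the concatenated branch features (late fusion).
        self.head_fused = nn.Linear(
            self.beit.num_features + self.effnet.num_features, num_classes)

    def forward(self, x):
        # Both backbones receive the same 224x224 input here; resizing to a
        # shared resolution is an assumption of this sketch.
        f1 = self.beit(x)    # pooled transformer features, shape (B, 768)
        f2 = self.effnet(x)  # pooled CNN features, shape (B, 2048)
        fused = torch.cat([f1, f2], dim=1)
        return self.head_beit(f1), self.head_effnet(f2), self.head_fused(fused)

def dual_branch_loss(logits1, logits2, logits_fused, targets):
    # Cross-entropy at both branches plus the fused output; weighting
    # the three terms equally is an assumption.
    return (F.cross_entropy(logits1, targets)
            + F.cross_entropy(logits2, targets)
            + F.cross_entropy(logits_fused, targets))

# Example: one forward/backward step on a dummy batch.
model = DualBranchGIClassifier(num_classes=8)
x = torch.randn(2, 3, 224, 224)
y = torch.randint(0, 8, (2,))
loss = dual_branch_loss(*model(x), y)
loss.backward()

In this sketch the fused head plays the role of the combined prediction; the paper instead stacks the branch models, so the fusion step shown here is only one plausible reading of the abstract.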

Keywords

Multimodal; gastrointestinal (GI) tract; disease diagnosis; misclassification; transformer; deep learning

Cite This Article

APA Style
Fatima, S., Dahan, F., Shah, J.H., Almohamedh, R., Aloqaily, M. et al. (2025). A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis. Computer Modeling in Engineering & Sciences, 145(1), 971–994. https://doi.org/10.32604/cmes.2025.070272
Vancouver Style
Fatima S, Dahan F, Shah JH, Almohamedh R, Aloqaily M, Riaz S. A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis. Comput Model Eng Sci. 2025;145(1):971–994. https://doi.org/10.32604/cmes.2025.070272
IEEE Style
S. Fatima, F. Dahan, J. H. Shah, R. Almohamedh, M. Aloqaily, and S. Riaz, “A Multimodal Learning Framework to Reduce Misclassification in GI Tract Disease Diagnosis,” Comput. Model. Eng. Sci., vol. 145, no. 1, pp. 971–994, 2025. https://doi.org/10.32604/cmes.2025.070272



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.