A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics

Aya Al-Zoghby; Ahmed Ebada; Aya Saleh; Mohammed Abdelhay; Wael Awad

doi:10.32604/cmc.2025.065571

Open Access

REVIEW

Aya M. Al-Zoghby¹,², Ahmed Ismail Ebada¹,*, Aya S. Saleh¹, Mohammed Abdelhay³, Wael A. Awad¹

1 Computer Science Department, Faculty of Computers and Artificial Intelligence, Damietta University, New Damietta, 34517, Egypt
2 Faculty of Computer Science and Engineering, New Mansoura University, Dakhlia, 35516, Egypt
3 Computer Science Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, 12613, Egypt

* Corresponding Author: Ahmed Ismail Ebada.

(This article belongs to the Special Issue: Multi-Modal Deep Learning for Advanced Medical Diagnostics)

Computers, Materials & Continua 2025, 84(3), 4155-4193. https://doi.org/10.32604/cmc.2025.065571

Received 17 March 2025; Accepted 18 June 2025; Issue published 30 July 2025

Abstract

Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics, advancing precision medicine by integrating and learning from diverse data sources. The exponential growth of high-dimensional healthcare data, encompassing genomic, transcriptomic, and other omics profiles as well as radiological imaging and histopathological slides, makes this approach increasingly important: examined separately, these data sources offer only a fragmented picture of intricate disease processes. Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling, more robust disease characterization, and improved treatment decision-making. This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis. We classify and examine important application domains: (1) radiology, where image-text integration facilitates automated report generation and lesion detection; (2) histopathology, where fusion models improve tumor classification and grading; and (3) multi-omics, where cross-modal learning reveals molecular subtypes and latent biomarkers. For each domain, we provide an overview of representative research, methodological advancements, and clinical implications. Additionally, we critically analyze the fundamental issues preventing wider adoption, including computational complexity (particularly in training scalable, multi-branch networks), data heterogeneity (resulting from modality-specific noise, resolution variations, and inconsistent annotations), and the challenge of preserving meaningful cross-modal correlations during fusion. These problems impede not only performance and generalizability but also interpretability, which is crucial for clinical trust and use.
Lastly, we outline important areas for future research, including the development of standardized protocols for data harmonization, the creation of lightweight and interpretable fusion architectures, the integration of real-time clinical decision support systems, and the promotion of collaboration on federated multimodal learning. Through this review, our goal is to provide researchers and clinicians with a concise overview of the field’s present state, enduring constraints, and promising directions for further research.

Keywords

Multimodal deep learning; medical diagnostics; multimodal healthcare fusion; healthcare data integration

Cite This Article

APA Style

Al-Zoghby, A.M., Ismail Ebada, A., Saleh, A.S., Abdelhay, M., Awad, W.A. (2025). A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics. Computers, Materials & Continua, 84(3), 4155–4193. https://doi.org/10.32604/cmc.2025.065571

Vancouver Style

Al-Zoghby AM, Ismail Ebada A, Saleh AS, Abdelhay M, Awad WA. A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics. Comput Mater Contin. 2025;84(3):4155–4193. https://doi.org/10.32604/cmc.2025.065571

IEEE Style

A. M. Al-Zoghby, A. Ismail Ebada, A. S. Saleh, M. Abdelhay, and W. A. Awad, “A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics,” Comput. Mater. Contin., vol. 84, no. 3, pp. 4155–4193, 2025. https://doi.org/10.32604/cmc.2025.065571

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
