Open Access
REVIEW
A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics
1 Computer Science Department, Faculty of Computers and Artificial Intelligence, Damietta University, New Damietta, 34517, Egypt
2 Faculty of Computer Science and Engineering, New Mansoura University, Dakhlia, 35516, Egypt
3 Computer Science Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, 12613, Egypt
* Corresponding Author: Ahmed Ismail Ebada. Email:
(This article belongs to the Special Issue: Multi-Modal Deep Learning for Advanced Medical Diagnostics)
Computers, Materials & Continua 2025, 84(3), 4155-4193. https://doi.org/10.32604/cmc.2025.065571
Received 17 March 2025; Accepted 18 June 2025; Issue published 30 July 2025
Abstract
Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics, advancing precision medicine by enabling integration and learning from diverse data sources. The exponential growth of high-dimensional healthcare data, encompassing genomic, transcriptomic, and other omics profiles, as well as radiological imaging and histopathological slides, makes this approach increasingly important because, when examined separately, these data sources offer only a fragmented picture of intricate disease processes. Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling, more robust disease characterization, and improved treatment decision-making. This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis. We classify and examine important application domains, including (1) radiology, where image-text integration facilitates automated report generation and lesion detection; (2) histopathology, where fusion models improve tumor classification and grading; and (3) multi-omics, where cross-modal learning reveals molecular subtypes and latent biomarkers. For each domain, we provide an overview of representative research, methodological advancements, and clinical consequences. Additionally, we critically analyze the fundamental issues preventing wider adoption, including computational complexity (particularly in training scalable, multi-branch networks), data heterogeneity (resulting from modality-specific noise, resolution variations, and inconsistent annotations), and the challenge of maintaining significant cross-modal correlations during fusion. These problems impede not only performance and generalizability but also interpretability, which is crucial for clinical trust and use.
Lastly, we outline important areas for future research, including the development of standardized protocols for harmonizing data, the creation of lightweight and interpretable fusion architectures, the integration of real-time clinical decision support systems, and the promotion of cooperation for federated multimodal learning. Our goal is to provide researchers and clinicians with a concise overview of the field's present state, enduring constraints, and promising directions for further research.
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.