Open Access

REVIEW


Transformers for Multi-Modal Image Analysis in Healthcare

Sameera V Mohd Sagheer1,*, Meghana K H2, P M Ameer3, Muneer Parayangat4, Mohamed Abbas4

1 Department of Biomedical Engineering, KMCT College of Engineering for Women, Kerala, 683104, India
2 MCA Department, Federal Institute of Science and Technology, Kerala, 683104, India
3 ECE Department, National Institute of Technology Calicut, Kerala, 683104, India
4 Electrical Engineering Department, College of Engineering, King Khalid University, Abha, 61421, Saudi Arabia

* Corresponding Author: Sameera V Mohd Sagheer. Email: email

Computers, Materials & Continua 2025, 84(3), 4259-4297. https://doi.org/10.32604/cmc.2025.063726

Abstract

Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of a patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models such as Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle with multi-modal complexity because they lack the capacity to model global relationships. This research presents a novel approach to analyzing multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to align and integrate features across modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles associated with transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques were integrated to boost scalability and efficiency. First, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Model pruning, quantization, and knowledge distillation were applied to reduce the parameter count and enhance inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of standard self-attention.
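The cross-attention fusion described above can be illustrated with scaled dot-product attention, where tokens from one modality query keys and values derived from another. The sketch below is a minimal single-head NumPy illustration; the random projections, token counts, and dimensions are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, d_k=64, seed=0):
    """Single-head cross-attention: tokens from one modality (q_feats)
    attend over tokens from another modality (kv_feats)."""
    rng = np.random.default_rng(seed)
    # Illustrative random projections; in practice these are learned.
    W_q = rng.standard_normal((q_feats.shape[-1], d_k)) / np.sqrt(q_feats.shape[-1])
    W_k = rng.standard_normal((kv_feats.shape[-1], d_k)) / np.sqrt(kv_feats.shape[-1])
    W_v = rng.standard_normal((kv_feats.shape[-1], d_k)) / np.sqrt(kv_feats.shape[-1])
    Q, K, V = q_feats @ W_q, kv_feats @ W_k, kv_feats @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_q, n_kv) attention weights
    return attn @ V                          # fused (n_q, d_k) features

# Example: 16 MRI patch tokens attending over 24 PET patch tokens.
mri_tokens = np.random.default_rng(1).standard_normal((16, 32))
pet_tokens = np.random.default_rng(2).standard_normal((24, 32))
fused = cross_attention(mri_tokens, pet_tokens)
print(fused.shape)  # (16, 64)
```

Self-attention is the special case where `q_feats` and `kv_feats` are the same token sequence; efficient variants (linear or sparse attention) replace the full `(n_q, n_kv)` weight matrix with a cheaper approximation.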
For deployment optimization, hardware-aware acceleration strategies, including TensorRT and ONNX-based model compression, were implemented to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and further enhancing computational efficiency for deployment in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
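As one concrete instance of the compression techniques mentioned above, post-training quantization maps float32 weights to int8 with a per-tensor scale. The snippet below is a generic, self-contained sketch of symmetric int8 quantization; it is not the authors' pipeline, which relies on TensorRT/ONNX tooling:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller; round-off error is bounded by scale / 2.
print(q.nbytes / w.nbytes)  # 0.25
```

Per-channel scales and calibration on representative data (as done by TensorRT and ONNX Runtime) reduce the accuracy loss further, at the cost of a slightly more involved export step.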

Keywords

Multi-modal image analysis; medical imaging; deep learning; image segmentation; disease detection; multi-modal fusion; Vision Transformers (ViTs); precision medicine; clinical decision support

Cite This Article

APA Style
Sagheer, S. V. M., K H, M., Ameer, P. M., Parayangat, M., & Abbas, M. (2025). Transformers for Multi-Modal Image Analysis in Healthcare. Computers, Materials & Continua, 84(3), 4259–4297. https://doi.org/10.32604/cmc.2025.063726
Vancouver Style
Sagheer SVM, K H M, Ameer PM, Parayangat M, Abbas M. Transformers for Multi-Modal Image Analysis in Healthcare. Comput Mater Contin. 2025;84(3):4259–4297. https://doi.org/10.32604/cmc.2025.063726
IEEE Style
S. V. M. Sagheer, M. K H, P. M. Ameer, M. Parayangat, and M. Abbas, “Transformers for Multi-Modal Image Analysis in Healthcare,” Comput. Mater. Contin., vol. 84, no. 3, pp. 4259–4297, 2025. https://doi.org/10.32604/cmc.2025.063726



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.