Open Access

ARTICLE

Multi-Scale Vision Transformer with Dynamic Multi-Loss Function for Medical Image Retrieval and Classification

Omar Alqahtani, Mohamed Ghouse*, Asfia Sabahath, Omer Bin Hussain, Arshiya Begum

Department of Computer Science, College of Computer Science, King Khalid University, Abha, 61421, Saudi Arabia

* Corresponding Author: Mohamed Ghouse.

(This article belongs to the Special Issue: Emerging Trends and Applications of Deep Learning for Biomedical Signal and Image Processing)

Computers, Materials & Continua 2025, 83(2), 2221-2244. https://doi.org/10.32604/cmc.2025.061977

Abstract

This paper introduces a novel method for medical image retrieval and classification that integrates a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding enhances the model’s ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. The approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets. On ISIC-2018, it achieves an F1-Score improvement of +4.84% over the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On ChestX-ray14, it delivers an F1-Score improvement of +5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results show that the approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and improving diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through better classification and case comparison.

Keywords

Medical image retrieval; vision transformer; multi-scale encoding; multi-loss function; ISIC-2018; ChestX-ray14

Cite This Article

APA Style
Alqahtani, O., Ghouse, M., Sabahath, A., Hussain, O.B., Begum, A. (2025). Multi-Scale Vision Transformer with Dynamic Multi-Loss Function for Medical Image Retrieval and Classification. Computers, Materials & Continua, 83(2), 2221–2244. https://doi.org/10.32604/cmc.2025.061977
Vancouver Style
Alqahtani O, Ghouse M, Sabahath A, Hussain OB, Begum A. Multi-Scale Vision Transformer with Dynamic Multi-Loss Function for Medical Image Retrieval and Classification. Comput Mater Contin. 2025;83(2):2221–2244. https://doi.org/10.32604/cmc.2025.061977
IEEE Style
O. Alqahtani, M. Ghouse, A. Sabahath, O. B. Hussain, and A. Begum, “Multi-Scale Vision Transformer with Dynamic Multi-Loss Function for Medical Image Retrieval and Classification,” Comput. Mater. Contin., vol. 83, no. 2, pp. 2221–2244, 2025. https://doi.org/10.32604/cmc.2025.061977



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.