Open Access
ARTICLE
An Overlapped Multihead Self-Attention-Based Feature Enhancement Approach for Ocular Disease Image Recognition
1 School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing, 400067, China
2 Computer Science Department, Community College, King Saud University, Riyadh, 11437, Saudi Arabia
* Corresponding Authors: Zhiwei Guo. Email: ; Amr Tolba. Email:
Computers, Materials & Continua 2025, 85(2), 2999-3022. https://doi.org/10.32604/cmc.2025.066937
Received 21 April 2025; Accepted 07 July 2025; Issue published 23 September 2025
Abstract
Deep learning-based medical image analysis has become an important technical requirement in smart healthcare. Multimodal image analysis in ophthalmology faces two difficulties: local details and global features are hard to model jointly, and cross-modal data fusion introduces redundant information. To address them, this paper proposes a multimodal fusion framework based on cross-modal collaboration and a weighted attention mechanism. For feature extraction, the framework uses a parallel dual-branch architecture to jointly extract local fine-grained features and global structural dependencies, overcoming the limitation of traditional single-modality models, which capture either local or global information but not both. For fusion, the framework designs a cross-modal dynamic fusion strategy that combines overlapping multi-head self-attention modules with a bidirectional feature alignment mechanism, addressing the low feature-interaction efficiency and excessive attention computation of traditional parallel fusion. It further introduces a cross-domain local integration technique that strengthens the representation of lesion areas through pixel-level feature recalibration and improves diagnostic robustness on complex cases. Experiments show that the framework achieves strong feature representation and generalization in cross-domain settings covering ophthalmic medical images and natural images. It provides a high-precision, low-redundancy fusion paradigm for multimodal medical image analysis and supports the move of intelligent diagnosis and treatment from single-modal static analysis to dynamic decision-making.
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

