TY  - EJOU
AU  - Huang, Yan-Hao 
AU  - Huang, Yu-Tse 

TI  - Position-Wise Attention-Enhanced Vision Transformer for Diabetic Retinopathy Grading
T2  - Computers, Materials \& Continua

PY  - 2026
VL  - 87
IS  - 3
SN  - 1546-2226

AB  - Diabetic Retinopathy (DR) is a common microvascular complication of diabetes that progressively damages the retinal blood vessels and, without timely treatment, can lead to irreversible vision loss. In clinical practice, DR is typically diagnosed by ophthalmologists through visual inspection of fundus images, a process that is time-consuming and prone to inter- and intra-observer variability. Recent advances in artificial intelligence, particularly Convolutional Neural Networks (CNNs) and Transformer-based models, have shown strong potential for automated medical image classification and decision support. In this study, we propose a Position-Wise Attention-Enhanced Vision Transformer (PWAE-ViT), which integrates a positional attention module into the standard ViT architecture to strengthen spatial positional information and feature representation across image patches. The proposed module encourages the network to better model local and global contextual relationships, thereby improving DR grading performance. To evaluate the robustness of our model, experiments were conducted on two public retinal fundus image datasets: APTOS-2019 and Indian Diabetic Retinopathy Image Dataset (IDRiD). The proposed PWAE-ViT consistently outperforms the baseline ViT model, achieving classification accuracies of 84% and 62% on the APTOS-2019 and IDRiD datasets, respectively. These results demonstrate more accurate and reliable DR severity classification, offering a promising tool to assist clinicians in screening and diagnosis.
KW  - Transformer; attention mechanisms; medical imaging; diabetic retinopathy

DO  - 10.32604/cmc.2026.076800