Open Access

ARTICLE


Hybrid HRNet-Swin Transformer: Multi-Scale Feature Fusion for Aerial Segmentation and Classification

Asaad Algarni1, Aysha Naseer2, Mohammed Alshehri3, Yahya AlQahtani4, Abdulmonem Alshahrani4, Jeongmin Park5,*

1 Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, 91911, Saudi Arabia
2 Department of Computer Science, Air University, Islamabad, 44000, Pakistan
3 Department of Computer Science, King Khalid University, Abha, 61421, Saudi Arabia
4 Department of Informatics and Computer Systems, King Khalid University, Abha, 61421, Saudi Arabia
5 Department of Computer Engineering, Tech University of Korea, 237 Sangidaehak-ro, Siheung-si, 15073, Gyeonggi-do, Republic of Korea

* Corresponding Author: Jeongmin Park. Email: email

Computers, Materials & Continua 2025, 85(1), 1981-1998. https://doi.org/10.32604/cmc.2025.064268

Abstract

Remote sensing plays a pivotal role in environmental monitoring, disaster relief, and urban planning, where accurate scene classification of aerial images is essential. However, conventional convolutional neural networks (CNNs) struggle with long-range dependencies and preserving high-resolution features, limiting their effectiveness in complex aerial image analysis. To address these challenges, we propose a Hybrid HRNet-Swin Transformer model that synergizes the strengths of HRNet-W48 for high-resolution segmentation and the Swin Transformer for global feature extraction. This hybrid architecture ensures robust multi-scale feature fusion, capturing fine-grained details and broader contextual relationships in aerial imagery. Our methodology begins with preprocessing steps, including normalization, histogram equalization, and noise reduction, to enhance input data quality. The HRNet-W48 backbone maintains high-resolution feature maps throughout the network, enabling precise segmentation, while the Swin Transformer leverages hierarchical self-attention to model long-range dependencies efficiently. By integrating these components, our model achieves superior performance in segmentation and classification tasks compared to traditional CNNs and standalone transformer models. We evaluate our approach on two benchmark datasets: UC Merced and WHU-RS19. Experimental results demonstrate that the proposed hybrid model outperforms existing methods, achieving state-of-the-art accuracy while maintaining computational efficiency. Specifically, it excels in preserving fine spatial details and contextual understanding, critical for applications like land-use classification and disaster assessment.
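The abstract lists normalization, histogram equalization, and noise reduction as the preprocessing steps applied before feature extraction. The paper's own implementation is not reproduced here; the following is a minimal NumPy sketch of what such a pipeline could look like for an 8-bit grayscale aerial image, with a 3x3 mean filter standing in as an assumed, illustrative choice of noise-reduction step.

```python
import numpy as np

def preprocess_aerial_image(img: np.ndarray) -> np.ndarray:
    """Illustrative preprocessing for an 8-bit grayscale image:
    histogram equalization, 3x3 mean-filter smoothing, and
    normalization to [0, 1] as network input."""
    # Histogram equalization: remap intensities via the cumulative
    # distribution function so the histogram is roughly uniform.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    equalized = lut[img]

    # Simple 3x3 mean filter as a stand-in for noise reduction.
    padded = np.pad(equalized.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    smoothed = sum(
        padded[dy:dy + h, dx:dx + w]
        for dy in range(3) for dx in range(3)
    ) / 9.0

    # Normalize intensities to [0, 1].
    return smoothed / 255.0
```

In practice a library routine (e.g. OpenCV's equalization and denoising functions) would replace these hand-rolled steps, but the sketch shows the order of operations the abstract describes.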

Keywords

Remote sensing; computer vision; aerial imagery; scene classification; feature extraction; transformer

Cite This Article

APA Style
Algarni, A., Naseer, A., Alshehri, M., AlQahtani, Y., Alshahrani, A. et al. (2025). Hybrid HRNet-Swin Transformer: Multi-Scale Feature Fusion for Aerial Segmentation and Classification. Computers, Materials & Continua, 85(1), 1981–1998. https://doi.org/10.32604/cmc.2025.064268
Vancouver Style
Algarni A, Naseer A, Alshehri M, AlQahtani Y, Alshahrani A, Park J. Hybrid HRNet-Swin Transformer: Multi-Scale Feature Fusion for Aerial Segmentation and Classification. Comput Mater Contin. 2025;85(1):1981–1998. https://doi.org/10.32604/cmc.2025.064268
IEEE Style
A. Algarni, A. Naseer, M. Alshehri, Y. AlQahtani, A. Alshahrani, and J. Park, “Hybrid HRNet-Swin Transformer: Multi-Scale Feature Fusion for Aerial Segmentation and Classification,” Comput. Mater. Contin., vol. 85, no. 1, pp. 1981–1998, 2025. https://doi.org/10.32604/cmc.2025.064268



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.