Open Access

ARTICLE


A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation

Cyreneo Dofitas1, Yong-Woon Kim2, Yung-Cheol Byun3,*

1 Institute of Information Science and Technology, Jeju National University, Jeju-si, 63243, Republic of Korea
2 Department of Computer Engineering, Jeju National University, Jeju, 63243, Republic of Korea
3 Department of Computer Engineering, Major of Electronic Engineering, Jeju National University, Food Tech Center (FTC), Jeju National University, Jeju, 63243, Republic of Korea

* Corresponding Author: Yung-Cheol Byun. Email: email

Computers, Materials & Continua 2026, 86(2), 1-19. https://doi.org/10.32604/cmc.2025.069374

Abstract

Recent advances in deep learning have significantly improved flood detection and segmentation from aerial and satellite imagery. However, conventional convolutional neural networks (CNNs) often struggle in complex flood scenarios involving reflections, occlusions, or indistinct boundaries due to limited contextual modeling. To address these challenges, we propose a hybrid flood segmentation framework that integrates a Vision Transformer (ViT) encoder with a U-Net decoder, enhanced by a novel Flood-Aware Refinement Block (FARB). The FARB module improves boundary delineation and suppresses noise by combining residual smoothing with spatial-channel attention mechanisms. We evaluate our model on a UAV-acquired flood imagery dataset, demonstrating that the proposed ViT-UNet+FARB architecture outperforms existing CNN and Transformer-based models in terms of accuracy and mean Intersection over Union (mIoU). Detailed ablation studies further validate the contribution of each component, confirming that the FARB design significantly enhances segmentation quality. Owing to its superior performance and computational efficiency, the proposed framework is well-suited for flood monitoring and disaster response applications, particularly in resource-constrained environments.
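
As a rough illustration of the architecture described above, the following PyTorch sketch wires a ViT encoder to a U-Net-style upsampling decoder and inserts a FARB-like refinement block at each decoder stage. All module names, hyperparameters, and the FARB internals shown here (a residual convolutional smoothing path followed by squeeze-and-excitation-style channel attention and a convolutional spatial gate) are assumptions inferred from the abstract, not the authors' published implementation; encoder-decoder skip connections and training details are omitted for brevity.

# Minimal sketch of a ViT-encoder / U-Net-decoder segmenter with a FARB-style
# refinement block. Sizes, layer counts, and FARB internals are illustrative
# assumptions based on the abstract, not the paper's implementation.
import torch
import torch.nn as nn

class FARB(nn.Module):
    """Hypothetical Flood-Aware Refinement Block:
    residual smoothing followed by channel and spatial attention."""
    def __init__(self, ch):
        super().__init__()
        self.smooth = nn.Sequential(                  # residual smoothing path
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.channel_att = nn.Sequential(             # squeeze-and-excitation-style gate
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        y = torch.relu(x + self.smooth(x))            # residual smoothing
        y = y * self.channel_att(y)                   # channel attention
        return y * self.spatial_att(y)                # spatial attention

class ViTUNet(nn.Module):
    """ViT encoder (patch tokens + self-attention) with a U-Net-style decoder."""
    def __init__(self, img_size=256, patch=16, dim=256, n_classes=1):
        super().__init__()
        self.grid = img_size // patch
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        chans = [dim, 128, 64, 32, 16]                # 4 upsampling stages: 16x16 -> 256x256
        self.decoder = nn.ModuleList(
            nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 2, stride=2),
                          nn.ReLU(inplace=True), FARB(c_out))
            for c_in, c_out in zip(chans[:-1], chans[1:]))
        self.head = nn.Conv2d(chans[-1], n_classes, 1)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        tokens = self.encoder(tokens)                 # global context via self-attention
        feat = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        for stage in self.decoder:                    # upsample and refine with FARB
            feat = stage(feat)
        return self.head(feat)                        # per-pixel flood logits

if __name__ == "__main__":
    model = ViTUNet()
    mask_logits = model(torch.randn(1, 3, 256, 256))
    print(mask_logits.shape)                          # torch.Size([1, 1, 256, 256])

In this sketch, a sigmoid over the single-channel output would give a binary flood mask; a multi-class setting would instead set n_classes accordingly and apply a softmax.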

Keywords

Flood detection; vision transformer (ViT); U-Net segmentation; image processing; deep learning; artificial intelligence

Cite This Article

APA Style
Dofitas, C., Kim, Y., & Byun, Y. (2026). A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation. Computers, Materials & Continua, 86(2), 1–19. https://doi.org/10.32604/cmc.2025.069374
Vancouver Style
Dofitas C, Kim Y, Byun Y. A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation. Comput Mater Contin. 2026;86(2):1–19. https://doi.org/10.32604/cmc.2025.069374
IEEE Style
C. Dofitas, Y. Kim, and Y. Byun, “A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation,” Comput. Mater. Contin., vol. 86, no. 2, pp. 1–19, 2026. https://doi.org/10.32604/cmc.2025.069374



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.