A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation

Cyreneo Dofitas; Yong-Woon Kim; Yung-Cheol Byun

doi:10.32604/cmc.2025.069374

Open Access

ARTICLE

Cyreneo Dofitas¹, Yong-Woon Kim², Yung-Cheol Byun³,*

1 Department of Institute of Information Science and Technology, Jeju National University, Jeju-si, 63243, Republic of Korea
2 Department of Computer Engineering, Jeju National University, Jeju, 63243, Republic of Korea
3 Department of Computer Engineering, Major of Electronic Engineering, Jeju National University, Food Tech Center (FTC), Jeju National University, Jeju, 63243, Republic of Korea

* Corresponding Author: Yung-Cheol Byun. Email: email

Computers, Materials & Continua 2026, 86(2), 1-19. https://doi.org/10.32604/cmc.2025.069374

Received 21 June 2025; Accepted 27 August 2025; Issue published 09 December 2025

Abstract

Recent advances in deep learning have significantly improved flood detection and segmentation from aerial and satellite imagery. However, conventional convolutional neural networks (CNNs) often struggle in complex flood scenarios involving reflections, occlusions, or indistinct boundaries due to limited contextual modeling. To address these challenges, we propose a hybrid flood segmentation framework that integrates a Vision Transformer (ViT) encoder with a U-Net decoder, enhanced by a novel Flood-Aware Refinement Block (FARB). The FARB module improves boundary delineation and suppresses noise by combining residual smoothing with spatial-channel attention mechanisms. We evaluate our model on a UAV-acquired flood imagery dataset, demonstrating that the proposed ViT-UNet+FARB architecture outperforms existing CNN and Transformer-based models in terms of accuracy and mean Intersection over Union (mIoU). Detailed ablation studies further validate the contribution of each component, confirming that the FARB design significantly enhances segmentation quality. Owing to its improved performance and computational efficiency, the proposed framework is well-suited for flood monitoring and disaster response applications, particularly in resource-constrained environments.
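
The abstract reports results in terms of mean Intersection over Union (mIoU). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition for label masks, written in plain Python. It is not the authors' evaluation code; the function name `mean_iou`, the nested-list mask format, and the label convention (0 = background, 1 = flood) are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes=2):
    """Average per-class IoU over the classes that appear in pred or target.

    pred and target are equally sized 2D lists of integer class labels.
    Assumed convention for this sketch: 0 = background, 1 = flood.
    """
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == cls
                in_target = t == cls
                if in_pred and in_target:
                    intersection += 1
                if in_pred or in_target:
                    union += 1
        # Skip classes absent from both masks to avoid dividing by zero.
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks: one background pixel is wrongly predicted as flood.
predicted = [[1, 1, 0],
             [0, 0, 0]]
reference = [[1, 0, 0],
             [0, 0, 0]]
score = mean_iou(predicted, reference)
print(round(score, 2))  # background IoU 4/5, flood IoU 1/2
```

In this toy case a single misclassified pixel halves the flood-class IoU while barely changing pixel accuracy, which is why mIoU is commonly reported alongside accuracy for segmentation tasks with an under-represented class.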

Keywords

Flood detection; vision transformer (ViT); U-Net segmentation; image processing; deep learning; artificial intelligence

Cite This Article

APA Style

Dofitas, C., Kim, Y., Byun, Y. (2026). A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation. Computers, Materials & Continua, 86(2), 1–19. https://doi.org/10.32604/cmc.2025.069374

Vancouver Style

Dofitas C, Kim Y, Byun Y. A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation. Comput Mater Contin. 2026;86(2):1–19. https://doi.org/10.32604/cmc.2025.069374

IEEE Style

C. Dofitas, Y. Kim, and Y. Byun, “A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation,” Comput. Mater. Contin., vol. 86, no. 2, pp. 1–19, 2026. https://doi.org/10.32604/cmc.2025.069374

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
