Open Access

ARTICLE

VIF-YOLO: A Visible-Infrared Fusion YOLO Model for Real-Time Human Detection in Dense Smoke Environments

Wenhe Chen1, Yue Wang1, Shuonan Shen1, Leer Hua1, Caixia Zheng2, Qi Pu1,*, Xundiao Ma3,*

1 School of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, China
2 College of Information Sciences and Technology, Northeast Normal University, Changchun, 130117, China
3 School of Teacher Education, Qujing Normal University, Qujing, 655011, China

* Corresponding Authors: Qi Pu; Xundiao Ma

Computers, Materials & Continua 2026, 87(1), 60. https://doi.org/10.32604/cmc.2025.074682

Abstract

In fire rescue scenarios, traditional manual operations are highly dangerous, as dense smoke, low visibility, extreme heat, and toxic gases not only hinder rescue efficiency but also endanger firefighters’ safety. Although intelligent rescue robots can enter hazardous environments in place of humans, smoke poses major challenges for human detection algorithms. These challenges include the attenuation of visible and infrared signals, complex thermal fields, and interference from background objects, all of which make it difficult to accurately identify trapped individuals. To address this problem, we propose VIF-YOLO, a visible–infrared fusion model for real-time human detection in dense smoke environments. The framework introduces a lightweight multimodal fusion (LMF) module, built from learnable low-rank representation blocks, that integrates visible and infrared images in an end-to-end manner, preserving fine details while enhancing salient features. In addition, an efficient multiscale attention (EMA) mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions. Extensive experiments on our newly constructed multimodal smoke human detection (MSHD) dataset demonstrate that VIF-YOLO achieves an mAP50 of 99.5%, a precision of 99.2%, and a recall of 99.3%, outperforming YOLOv10n by a clear margin. Furthermore, when deployed on the NVIDIA Jetson Xavier NX, VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms, validating its real-time capability on edge-computing platforms. These results confirm that VIF-YOLO provides accurate, robust, and fast detection across complex backgrounds and diverse smoke conditions, ensuring reliable and rapid localization of individuals in need of rescue.
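The abstract names two architectural ideas: a lightweight low-rank fusion of visible and infrared features (LMF) and an EMA attention mechanism in the YOLOv10n backbone. The PyTorch sketch below is a minimal illustration of those two ideas only; the class names (`LowRankFusion`, `EMABlock`), the rank and reduction parameters, and the channel sizes are assumptions for illustration, not the authors' implementation, and the attention block is a simplified gate standing in for the published EMA module.

```python
# Illustrative sketch only -- NOT the authors' VIF-YOLO implementation.
# Assumes PyTorch; module names, ranks, and channel sizes are hypothetical.
import torch
import torch.nn as nn


class LowRankFusion(nn.Module):
    """Fuse visible and infrared feature maps via learnable low-rank
    1x1 projections (rank r << channels) and a pointwise mixing conv."""

    def __init__(self, channels: int, rank: int = 8):
        super().__init__()
        # Low-rank factorization C -> r -> C for each modality.
        self.vis_down = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.vis_up = nn.Conv2d(rank, channels, kernel_size=1, bias=False)
        self.ir_down = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.ir_up = nn.Conv2d(rank, channels, kernel_size=1, bias=False)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.SiLU()

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        v = self.vis_up(self.act(self.vis_down(vis)))
        t = self.ir_up(self.act(self.ir_down(ir)))
        # Residual connections preserve fine detail from each modality.
        fused = torch.cat([vis + v, ir + t], dim=1)
        return self.act(self.mix(fused))


class EMABlock(nn.Module):
    """Simplified EMA-style attention: a channel gate from global pooling
    plus a depthwise 3x3 spatial branch, applied as a sigmoid gating map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )
        self.spatial_gate = nn.Conv2d(channels, channels, kernel_size=3,
                                      padding=1, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.channel_gate(x) + self.spatial_gate(x))
        return x * gate


if __name__ == "__main__":
    vis = torch.randn(1, 64, 80, 80)   # visible-band feature map
    ir = torch.randn(1, 64, 80, 80)    # infrared feature map
    fused = LowRankFusion(64)(vis, ir)
    out = EMABlock(64)(fused)          # would feed the detector backbone/neck
    print(out.shape)                   # torch.Size([1, 64, 80, 80])
```

In this sketch, the low-rank projections keep the fusion step lightweight (parameters scale with the rank rather than the full channel count), which is consistent with the edge-deployment emphasis in the abstract, though the exact design used in the paper may differ.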

Keywords

Fire rescue; dense smoke environments; human detection; multimodal fusion; YOLO

Cite This Article

APA Style
Chen, W., Wang, Y., Shen, S., Hua, L., Zheng, C. et al. (2026). VIF-YOLO: A Visible-Infrared Fusion YOLO Model for Real-Time Human Detection in Dense Smoke Environments. Computers, Materials & Continua, 87(1), 60. https://doi.org/10.32604/cmc.2025.074682
Vancouver Style
Chen W, Wang Y, Shen S, Hua L, Zheng C, Pu Q, et al. VIF-YOLO: A Visible-Infrared Fusion YOLO Model for Real-Time Human Detection in Dense Smoke Environments. Comput Mater Contin. 2026;87(1):60. https://doi.org/10.32604/cmc.2025.074682
IEEE Style
W. Chen et al., “VIF-YOLO: A Visible-Infrared Fusion YOLO Model for Real-Time Human Detection in Dense Smoke Environments,” Comput. Mater. Contin., vol. 87, no. 1, pp. 60, 2026. https://doi.org/10.32604/cmc.2025.074682



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.