Open Access

ARTICLE

3D Enhanced Residual CNN for Video Super-Resolution Network

Weiqiang Xin1,2,3,#, Zheng Wang4,#, Xi Chen1,5, Yufeng Tang1, Bing Li1, Chunwei Tian2,5,*

1 School of Software, Northwestern Polytechnical University, Xi’an, 710072, China
2 Shenzhen Research Institute of Northwestern Polytechnical University, Northwestern Polytechnical University, Shenzhen, 518057, China
3 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
4 School of Interdisciplinary Studies, Lingnan University, Hong Kong, 999077, China
5 Yangtze River Delta Research Institute, Northwestern Polytechnical University, Taicang, 215400, China

* Corresponding Author: Chunwei Tian. Email: email
# These authors contributed equally to this work

(This article belongs to the Special Issue: Advancements in Pattern Recognition through Machine Learning: Bridging Innovation and Application)

Computers, Materials & Continua 2025, 85(2), 2837-2849. https://doi.org/10.32604/cmc.2025.069784

Abstract

Deep convolutional neural networks (CNNs) have demonstrated remarkable performance in video super-resolution (VSR). However, the ability of most existing methods to recover fine details in complex scenes is often hindered by the loss of shallow texture information during feature extraction. To address this limitation, we propose a 3D Convolutional Enhanced Residual Video Super-Resolution Network (3D-ERVSNet). This network employs a forward and backward bidirectional propagation module (FBBPM) that aligns features across frames using explicit optical flow estimated by a lightweight SPyNet. By incorporating an enhanced residual structure (ERS) with skip connections, shallow and deep features are effectively integrated, enhancing texture-restoration capability. Furthermore, a 3D convolution module (3DCM) is applied after the backward propagation module to implicitly capture spatio-temporal dependencies. The architecture synergizes these components: the FBBPM extracts aligned features, the ERS fuses hierarchical representations, and the 3DCM refines temporal coherence. Finally, a deep feature aggregation module (DFAM) fuses the processed features, and a pixel-upsampling module (PUM) reconstructs the high-resolution (HR) video frames. Comprehensive evaluations on the REDS, Vid4, UDM10, and Vim4 benchmarks demonstrate strong performance, including 30.95 dB PSNR/0.8822 SSIM on REDS and 32.78 dB/0.8987 on Vim4. 3D-ERVSNet achieves significant gains over baselines while maintaining high efficiency, with only 6.3M parameters and a runtime of 77 ms/frame (i.e., 20× faster than RBPN). The network's effectiveness stems from its task-specific asymmetric design, which balances explicit alignment and implicit fusion.
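The abstract's pixel-upsampling module (PUM) presumably performs the standard sub-pixel (pixel-shuffle) rearrangement used by most SR reconstruction heads: a feature map with C·r² channels is rearranged into C channels at r× the spatial resolution. The sketch below is a hypothetical, dependency-free illustration of that rearrangement on nested lists, not the paper's implementation; the function name and tensor layout (channels-first, C·r²×H×W) are assumptions.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) nested-list tensor into (C, H*r, W*r).

    Each group of r^2 channels supplies the r x r sub-pixel grid of one
    output channel, so channel index ch decomposes as
    ch = base*r^2 + dy*r + dx.
    """
    cr2, h, w = len(x), len(x[0]), len(x[0][0])
    c = cr2 // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(cr2):
        base = ch // (r * r)          # which output channel
        offset = ch % (r * r)
        dy, dx = offset // r, offset % r  # position inside the r x r block
        for i in range(h):
            for j in range(w):
                out[base][i * r + dy][j * r + dx] = x[ch][i][j]
    return out

# Four 1x1 channels become one 2x2 channel at 2x resolution.
hr = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
# hr == [[[1, 2], [3, 4]]]
```

In the paper's pipeline this step would follow the DFAM, converting aggregated LR-resolution features into the HR frame; learned convolutions produce the C·r² channels beforehand.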

Keywords

Video super-resolution; 3D convolution; enhanced residual CNN; spatio-temporal feature extraction

Cite This Article

APA Style
Xin, W., Wang, Z., Chen, X., Tang, Y., Li, B., & Tian, C. (2025). 3D Enhanced Residual CNN for Video Super-Resolution Network. Computers, Materials & Continua, 85(2), 2837–2849. https://doi.org/10.32604/cmc.2025.069784
Vancouver Style
Xin W, Wang Z, Chen X, Tang Y, Li B, Tian C. 3D Enhanced Residual CNN for Video Super-Resolution Network. Comput Mater Contin. 2025;85(2):2837–2849. https://doi.org/10.32604/cmc.2025.069784
IEEE Style
W. Xin, Z. Wang, X. Chen, Y. Tang, B. Li, and C. Tian, “3D Enhanced Residual CNN for Video Super-Resolution Network,” Comput. Mater. Contin., vol. 85, no. 2, pp. 2837–2849, 2025. https://doi.org/10.32604/cmc.2025.069784



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.