Open Access
ARTICLE
3D Enhanced Residual CNN for Video Super-Resolution Network
1 School of Software, Northwestern Polytechnical University, Xi’an, 710072, China
2 Shenzhen Research Institute of Northwestern Polytechnical University, Northwestern Polytechnical University, Shenzhen, 518057, China
3 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
4 School of Interdisciplinary Studies, Lingnan University, Hong Kong, 999077, China
5 Yangtze River Delta Research Institute, Northwestern Polytechnical University, Taicang, 215400, China
* Corresponding Author: Chunwei Tian. Email:
# These authors contributed equally to this work
(This article belongs to the Special Issue: Advancements in Pattern Recognition through Machine Learning: Bridging Innovation and Application)
Computers, Materials & Continua 2025, 85(2), 2837-2849. https://doi.org/10.32604/cmc.2025.069784
Received 30 June 2025; Accepted 15 August 2025; Issue published 23 September 2025
Abstract
Deep convolutional neural networks (CNNs) have demonstrated remarkable performance in video super-resolution (VSR). However, the ability of most existing methods to recover fine details in complex scenes is often hindered by the loss of shallow texture information during feature extraction. To address this limitation, we propose a 3D Convolutional Enhanced Residual Video Super-Resolution Network (3D-ERVSNet). This network employs a forward and backward bidirectional propagation module (FBBPM) that aligns features across frames using explicit optical flow estimated by a lightweight SPyNet. An enhanced residual structure (ERS) with skip connections effectively integrates shallow and deep features, improving texture restoration. Furthermore, a 3D convolution module (3DCM) is applied after the backward propagation module to implicitly capture spatio-temporal dependencies. The architecture synergizes these components: FBBPM extracts aligned features, ERS fuses hierarchical representations, and 3DCM refines temporal coherence. Finally, a deep feature aggregation module (DFAM) fuses the processed features, and a pixel-upsampling module (PUM) reconstructs the high-resolution (HR) video frames. Comprehensive evaluations on the REDS, Vid4, UDM10, and Vim4 benchmarks demonstrate strong performance, including 30.95 dB PSNR/0.8822 SSIM on REDS and 32.78 dB PSNR/0.8987 SSIM on Vim4. 3D-ERVSNet achieves significant gains over baselines while remaining efficient, with only 6.3M parameters and a runtime of 77 ms per frame (about 20× faster than RBPN). The network's effectiveness stems from its task-specific asymmetric design, which balances explicit alignment and implicit fusion.
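To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture as outlined in this abstract. The module names (ERS, 3DCM, DFAM, PUM) and the bidirectional propagation loop (FBBPM) follow the abstract, but all internals are illustrative assumptions rather than the authors' implementation: channel widths, block counts, the 1×1 aggregation convolution, and the `flow_warp` helper that stands in for SPyNet-based alignment.

```python
# Minimal sketch of the 3D-ERVSNet pipeline described in the abstract.
# All hyperparameters and internal layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def flow_warp(feat, flow):
    """Backward-warp features (B, C, H, W) with a dense flow field (B, 2, H, W).
    A placeholder for explicit SPyNet-based alignment."""
    b, _, h, w = feat.shape
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((gx, gy), dim=0).float().to(feat.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    cx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    cy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((cx, cy), dim=-1), align_corners=True)


class ERS(nn.Module):
    """Enhanced residual structure: residual blocks plus a global skip
    connection that reinjects shallow features into the deep path."""
    def __init__(self, ch=64, n_blocks=5):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True),
                          nn.Conv2d(ch, ch, 3, 1, 1))
            for _ in range(n_blocks))

    def forward(self, x):
        shallow = x
        for blk in self.blocks:
            x = x + blk(x)      # local residual learning
        return x + shallow      # skip connection fusing shallow and deep features


class ThreeDCM(nn.Module):
    """3DCM: implicit spatio-temporal fusion over stacked per-frame features."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv3d = nn.Conv3d(ch, ch, kernel_size=3, padding=1)

    def forward(self, feats):   # feats: (B, C, T, H, W)
        return F.relu(self.conv3d(feats))


class PUM(nn.Module):
    """Pixel-upsampling module: sub-pixel convolution to the HR resolution."""
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.up = nn.Sequential(nn.Conv2d(ch, ch * scale * scale, 3, 1, 1),
                                nn.PixelShuffle(scale),
                                nn.Conv2d(ch, 3, 3, 1, 1))

    def forward(self, x):
        return self.up(x)


class VSRSketch(nn.Module):
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.feat = nn.Conv2d(3, ch, 3, 1, 1)
        self.backward_ers = ERS(ch)
        self.forward_ers = ERS(ch)
        self.tdcm = ThreeDCM(ch)
        self.dfam = nn.Conv2d(2 * ch, ch, 1)   # deep feature aggregation (assumed 1x1)
        self.pum = PUM(ch, scale)

    def forward(self, lr_frames, flows_fwd, flows_bwd):
        # lr_frames: list of T tensors (B, 3, H, W);
        # flows_fwd / flows_bwd: T-1 flow fields (B, 2, H, W) per direction.
        feats = [self.feat(f) for f in lr_frames]
        t = len(feats)
        # FBBPM, backward pass (future -> past) with explicit flow alignment.
        bwd, prop = [None] * t, torch.zeros_like(feats[0])
        for i in range(t - 1, -1, -1):
            if i < t - 1:
                prop = flow_warp(prop, flows_bwd[i])
            prop = self.backward_ers(feats[i] + prop)
            bwd[i] = prop
        # 3DCM after backward propagation refines temporal coherence.
        bwd = list(self.tdcm(torch.stack(bwd, dim=2)).unbind(dim=2))
        # FBBPM, forward pass (past -> future), then DFAM fusion and PUM upsampling.
        out, prop = [], torch.zeros_like(feats[0])
        for i in range(t):
            if i > 0:
                prop = flow_warp(prop, flows_fwd[i - 1])
            prop = self.forward_ers(feats[i] + prop)
            out.append(self.pum(self.dfam(torch.cat((bwd[i], prop), dim=1))))
        return out
```

A full implementation would supply SPyNet-estimated flows to the propagation loops and tune the depths and widths toward the reported 6.3M-parameter budget; the sketch only fixes the data flow among the named modules.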
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

