Open Access
ARTICLE
RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition
Institute of Automation and Electronic Information, Xiangtan University, Xiangtan, 411105, China
* Corresponding Author: Xin Du. Email:
Computers, Materials & Continua 2026, 86(3), 55 https://doi.org/10.32604/cmc.2025.072145
Received 20 August 2025; Accepted 27 October 2025; Issue published 12 January 2026
Abstract
Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the time and memory complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module that incorporates an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy. This approach applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively. These results demonstrate state-of-the-art performance on AVSR tasks.
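The abstract does not specify how the progressively increasing loss weights in RIC-CTC are scheduled. As an illustration only, the following minimal sketch assumes a linear schedule in which deeper intermediate layers receive larger weights, and blends the weighted intermediate losses with the final-layer loss. The function names `progressive_weights` and `combined_ctc_loss`, the linear schedule, and the mixing coefficient `mix` are all hypothetical and are not taken from the article.

```python
def progressive_weights(num_layers):
    """Return linearly increasing weights that sum to 1.

    Assumption: a linear schedule. The article only states that the
    weights increase progressively, not which schedule is used.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    total = num_layers * (num_layers + 1) / 2
    return [(k + 1) / total for k in range(num_layers)]


def combined_ctc_loss(final_loss, intermediate_losses, mix=0.3):
    """Blend the final-layer loss with weighted intermediate losses.

    `intermediate_losses` is ordered from the shallowest to the deepest
    supervised layer, so deeper layers get larger weights.
    `mix` is a hypothetical coefficient for the intermediate term.
    """
    weights = progressive_weights(len(intermediate_losses))
    inter = sum(w * loss for w, loss in zip(weights, intermediate_losses))
    return (1.0 - mix) * final_loss + mix * inter
```

With three supervised intermediate layers, this schedule gives the deepest of them three times the weight of the shallowest, which is one simple way to emphasize layers closer to the output.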
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

