TY  - JOUR
AU  - Deng, Ziyang
AU  - Min, Weidong
AU  - Han, Qing
AU  - Liu, Mengxue
AU  - Li, Longfei
TI  - VTAN: A Novel Video Transformer Attention-Based Network for Dynamic Sign Language Recognition
T2  - Computers, Materials & Continua
PY  - 2025
VL  - 82
IS  - 2
SN  - 1546-2226
AB  - Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. Firstly, recognizing dynamic sign language requires identifying keyframes that best represent the signs, and missing these keyframes reduces accuracy. Secondly, some methods do not focus enough on hand regions, which are small within the overall frame, leading to information loss. To address these challenges, we propose a novel Video Transformer Attention-based Network (VTAN) for dynamic sign language recognition. Our approach prioritizes informative frames and hand regions effectively. To tackle the first issue, we designed a keyframe extraction module enhanced by a convolutional autoencoder, which focuses on selecting information-rich frames and eliminating redundant ones from the video sequences. For the second issue, we developed a soft attention-based transformer module that emphasizes extracting features from hand regions, ensuring that the network pays more attention to hand information within sequences. This dual-focus approach improves dynamic sign language recognition by addressing the key challenges of identifying critical frames and emphasizing hand regions. Experimental results on two public benchmark datasets demonstrate the effectiveness of our network, which outperforms most typical methods in sign language recognition tasks.
KW  - Dynamic sign language recognition
KW  - transformer
KW  - soft attention
KW  - attention-based
KW  - visual feature aggregation
DO  - 10.32604/cmc.2024.057456
ER  - 