A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition

Sameena Javaid; Safdar Rizvi

doi:10.32604/cmc.2023.031924

Open Access

ARTICLE

Sameena Javaid^*, Safdar Rizvi

Department of Computer Sciences, School of Engineering and Applied Sciences, Bahria University, Karachi Campus, Karachi, Pakistan

* Corresponding Author: Sameena Javaid. Email: email

Computers, Materials & Continua 2023, 74(1), 523-537. https://doi.org/10.32604/cmc.2023.031924

Received 30 April 2022; Accepted 22 June 2022; Issue published 22 September 2022

Abstract

Sign language bridges the communication gap for people with hearing and speech impairments. It comprises two visual modalities: manual gestures, which consist of hand movements, and non-manual gestures, which involve body movements such as those of the head, facial expressions, eyes, and shoulder shrugging. Previous work has detected both kinds of gesture; recognizing them separately may yield better accuracy, but much communicative information is lost. A proper sign language recognition mechanism must detect manual and non-manual gestures together to convey the full, detailed message to others. We propose the Sign Language Action Transformer Network (SLATN), which localizes hand, body, and facial gestures in video sequences. It uses a Transformer-style architecture as a “base network” to extract features from the spatiotemporal domain. The model spontaneously learns to track individual persons and their action context across multiple frames. Furthermore, a “head network” attends to hand movement and facial expression simultaneously, which is often crucial to understanding sign language, and uses its attention mechanism to create tight bounding boxes around classified gestures. The model is then compared with traditional activity recognition methods; it is both faster and more accurate. The model achieves an overall testing accuracy of 82.66% with a computational performance of 94.13 Giga-Floating Point Operations per Second (G-FLOPS). A further contribution is a newly created dataset of Pakistan Sign Language for Manual and Non-Manual (PkSLMNM) gestures.
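The attention mechanism mentioned in the abstract can be illustrated with a minimal sketch of scaled dot-product attention, the standard building block of Transformer-style networks. This is not the authors' SLATN implementation: the function names `softmax` and `attend`, the toy feature vectors, and the idea of treating the query as a feature of one signer and the keys/values as features of spatiotemporal regions are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query  : list of d floats (hypothetical: feature of one signer)
    keys   : list of n vectors of length d (hypothetical: region features)
    values : list of n vectors (content carried by each region)
    Returns the attention weights and the weighted sum of the values.
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: values averaged with the attention weights.
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, context

# Toy example: three regions, two-dimensional features (made-up numbers).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, context = attend(query, keys, values)
```

In this toy setting the region whose key is most similar to the query receives the largest weight, which is the sense in which an attention-based head can emphasize, say, the hands and face over the rest of the frame.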

Keywords

Sign language; gesture recognition; manual signs; non-manual signs; action transformer network

Cite This Article

APA Style

Javaid, S., & Rizvi, S. (2023). A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition. Computers, Materials & Continua, 74(1), 523–537. https://doi.org/10.32604/cmc.2023.031924

Vancouver Style

Javaid S, Rizvi S. A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition. Comput Mater Contin. 2023;74(1):523–537. https://doi.org/10.32604/cmc.2023.031924

IEEE Style

S. Javaid and S. Rizvi, “A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition,” Comput. Mater. Contin., vol. 74, no. 1, pp. 523–537, 2023. https://doi.org/10.32604/cmc.2023.031924

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
