HLR-Net: A Hybrid Lip-Reading Model Based on Deep Convolutional Neural Networks

Amany Sarhan; Nada Elshennawy; Dina Ibrahim

doi:10.32604/cmc.2021.016509

Open Access

ARTICLE

Amany M. Sarhan¹, Nada M. Elshennawy¹, Dina M. Ibrahim¹,²,*

1 Department of Computers and Control Engineering, Faculty of Engineering, Tanta University, Tanta, 37133, Egypt
2 Department of Information Technology, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia

* Corresponding Author: Dina M. Ibrahim.

Computers, Materials & Continua 2021, 68(2), 1531-1549. https://doi.org/10.32604/cmc.2021.016509

Received 04 January 2021; Accepted 17 February 2021; Issue published 13 April 2021

Abstract

Lip reading is the task of visually interpreting a speaker’s lip movements, that is, decoding text from the movement of the speaker’s mouth. This paper proposes a lip-reading model that helps deaf people and people with hearing impairments understand a speaker: a video of the speaker is captured and fed into the proposed model, which outputs the corresponding subtitles. Deep learning makes it possible to extract a large number of different features, which can then be converted into letter probabilities to obtain accurate results. Recently proposed lip-reading methods are based on sequence-to-sequence architectures originally designed for machine translation and audio speech recognition. In this paper, by contrast, a deep convolutional neural network model called the hybrid lip-reading (HLR-Net) model is developed for lip reading from video. The proposed model comprises three stages, namely pre-processing, encoder, and decoder, and produces the output subtitle. The encoder is built from inception, gradient, and bidirectional GRU layers; the decoder is built from attention, fully-connected, and activation function layers and performs connectionist temporal classification (CTC). Compared on the GRID corpus dataset with three recent models, namely LipNet, the lip-reading model with cascaded attention (LCANet), and the attention-CTC (A-ACA) model, the proposed HLR-Net model achieves significant improvements: a character error rate (CER) of 4.9%, a word error rate (WER) of 9.7%, and a BLEU score of 92% for unseen speakers, and a CER of 1.4%, a WER of 3.3%, and a BLEU score of 99% for overlapped speakers.
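The abstract reports results as CER and WER. As a minimal sketch of how such error rates are commonly computed (the standard edit-distance definition, which is assumed here and not taken from the paper), the Python below divides the Levenshtein distance between the reference and the predicted transcript by the reference length. The function names and the example sentence are hypothetical and are not drawn from the authors' code or results.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) or lists (words).
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical sentence pair in the style of a command sentence;
# one letter word is misread.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

A single wrong word counts as one full word error but only as many character errors as it has wrong letters, which is why WER is usually higher than CER for the same output, as in the figures reported above.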

Keywords

lip-reading; visual speech recognition; deep neural network; connectionist temporal classification

Cite This Article

APA Style

Sarhan, A.M., Elshennawy, N.M., Ibrahim, D.M. (2021). HLR-Net: A hybrid lip-reading model based on deep convolutional neural networks. Computers, Materials & Continua, 68(2), 1531-1549. https://doi.org/10.32604/cmc.2021.016509

Vancouver Style

Sarhan AM, Elshennawy NM, Ibrahim DM. HLR-Net: A hybrid lip-reading model based on deep convolutional neural networks. Comput Mater Contin. 2021;68(2):1531-1549. https://doi.org/10.32604/cmc.2021.016509

IEEE Style

A.M. Sarhan, N.M. Elshennawy, and D.M. Ibrahim, "HLR-Net: A Hybrid Lip-Reading Model Based on Deep Convolutional Neural Networks," Comput. Mater. Contin., vol. 68, no. 2, pp. 1531-1549, 2021. https://doi.org/10.32604/cmc.2021.016509

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
