Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation

Chiara Innocente; Matteo Boemio; Gianmarco Lorenzetti; Ilaria Pulito; Diego Romagnoli; Valeria Saponaro; Giorgia Marullo; Luca Ulrich; Enrico Vezzetti

doi:10.32604/cmes.2025.063186

Open Access

ARTICLE

Chiara Innocente¹,*, Matteo Boemio², Gianmarco Lorenzetti², Ilaria Pulito², Diego Romagnoli², Valeria Saponaro², Giorgia Marullo¹, Luca Ulrich¹, Enrico Vezzetti¹

1 Management and Production Engineering, Polytechnic University of Turin, C.so Duca degli Abruzzi 24, Torino, 10129, Italy
2 Biomedical Engineering, Polytechnic University of Turin, C.so Duca degli Abruzzi 24, Torino, 10129, Italy

* Corresponding Author: Chiara Innocente. Email: email

Computer Modeling in Engineering & Sciences 2025, 143(2), 1355-1379. https://doi.org/10.32604/cmes.2025.063186

Received 07 January 2025; Accepted 28 March 2025; Issue published 30 May 2025

Abstract

Lip-reading technology, based on visual speech decoding and automatic speech recognition, offers a promising way to overcome communication barriers, particularly for individuals with temporary or permanent speech impairments. However, Visual Speech Recognition (VSR) research has focused mainly on the English language and general-purpose applications, limiting its practical applicability in medical and rehabilitative settings. This study introduces the first Deep Learning (DL)-based lip-reading system for the Italian language designed to assist individuals with vocal cord pathologies in daily interactions, facilitating communication for patients recovering from vocal cord surgery, whether their impairment is temporary or permanent. To ensure relevance and effectiveness in real-world scenarios, a carefully curated vocabulary of twenty-five Italian words was selected, covering critical semantic fields such as Needs, Questions, Answers, Emergencies, Greetings, Requests, and Body Parts. These words were chosen to address both essential daily communication and urgent requests for medical assistance. Our approach combines a spatiotemporal Convolutional Neural Network (CNN) with a bidirectional Long Short-Term Memory (BiLSTM) recurrent network, trained with a Connectionist Temporal Classification (CTC) loss function, to recognize individual words without requiring predefined word boundaries. The experimental results demonstrate robust performance, with an average accuracy of 96.4% in individual word recognition, suggesting that the system is particularly well suited to constrained clinical and caregiving environments, where quick and reliable communication is critical. In conclusion, the study highlights the importance of developing language-specific, application-driven VSR solutions, particularly for non-English languages with limited linguistic resources.
By bridging the gap between deep learning-based lip-reading and real-world clinical needs, this research advances assistive communication technologies, paving the way for more inclusive and medically relevant applications of VSR in rehabilitation and healthcare.
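The abstract credits the CTC loss with letting the network recognize words without predefined word boundaries. The pure-Python sketch below illustrates how that mechanism works in general; it is not the authors' implementation. `ctc_loss` scores a target label sequence by summing over every frame-level alignment with the standard forward recursion, and `ctc_greedy_decode` turns per-frame outputs into labels by merging repeats and dropping blanks. It assumes the network emits one probability vector per video frame with the blank symbol at index 0; the function names and toy inputs are hypothetical, and the abstract does not state whether the output labels are characters or whole words.

```python
import math

BLANK = 0  # assumption: index 0 of each frame's distribution is the CTC blank


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Best-path CTC decoding.

    Takes the most probable label in each frame, merges consecutive
    repeats, then drops blanks, so no word or character boundaries
    need to be known in advance.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def ctc_loss(frame_probs, target, blank=BLANK):
    """CTC negative log-likelihood of `target` via the forward recursion.

    The probability of the target is the sum over every frame-level
    alignment that collapses to it. Probability space is used here for
    readability; practical implementations work in log space.
    """
    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n = len(ext)

    # alpha[s]: probability of all partial alignments of the frames seen
    # so far that end at position s of the extended sequence
    alpha = [0.0] * n
    alpha[0] = frame_probs[0][blank]
    if n > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for probs in frame_probs[1:]:
        new_alpha = [0.0] * n
        for s in range(n):
            total = alpha[s]  # stay on the same extended symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one symbol
            # Skip over a blank, allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if n > 1 else 0.0)
    if likelihood <= 0.0:
        raise ValueError("target cannot be aligned to this many frames")
    return -math.log(likelihood)


if __name__ == "__main__":
    # Two frames over a toy alphabet {blank=0, label 1}
    probs = [[0.4, 0.6], [0.3, 0.7]]
    # Alignments collapsing to [1]: (1,1), (0,1), (1,0) -> 0.42 + 0.28 + 0.18
    print(round(math.exp(-ctc_loss(probs, [1])), 4))  # 0.88
    print(ctc_greedy_decode(probs))  # [1]
```

In a full system of the kind the abstract describes, `frame_probs` would come from the per-frame softmax outputs of the CNN-BiLSTM network; deep learning frameworks provide optimized, log-space CTC losses (for example, PyTorch's `torch.nn.CTCLoss`) rather than a loop like this one.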

Keywords

Lip-reading; deep learning; automatic speech recognition; visual speech decoding; 3D convolutional neural network

Cite This Article

APA Style

Innocente, C., Boemio, M., Lorenzetti, G., Pulito, I., Romagnoli, D. et al. (2025). Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation. Computer Modeling in Engineering & Sciences, 143(2), 1355–1379. https://doi.org/10.32604/cmes.2025.063186

Vancouver Style

Innocente C, Boemio M, Lorenzetti G, Pulito I, Romagnoli D, Saponaro V, et al. Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation. Comput Model Eng Sci. 2025;143(2):1355–1379. https://doi.org/10.32604/cmes.2025.063186

IEEE Style

C. Innocente et al., “Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation,” Comput. Model. Eng. Sci., vol. 143, no. 2, pp. 1355–1379, 2025. https://doi.org/10.32604/cmes.2025.063186


Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
