Speech Recognition via CTC-CNN Model

Wen-Tsai Sung; Hao-Wei Kang; Sung-Jung Hsiao

doi:10.32604/cmc.2023.040024

Open Access

Wen-Tsai Sung¹, Hao-Wei Kang¹, Sung-Jung Hsiao²,*

1 Department of Electrical Engineering, National Chin-Yi University of Technology, Taichung, 411030, Taiwan
2 Department of Information Technology, Takming University of Science and Technology, Taipei, 11451, Taiwan

* Corresponding Author: Sung-Jung Hsiao. Email: email

Computers, Materials & Continua 2023, 76(3), 3833-3858. https://doi.org/10.32604/cmc.2023.040024

Received 01 March 2023; Accepted 27 July 2023; Issue published 08 October 2023

Abstract

In a speech recognition system, the acoustic model is a fundamental component, and its accuracy directly affects the performance of the entire system. This paper describes the construction and training of the acoustic model in detail and studies the Connectionist Temporal Classification (CTC) algorithm, which plays an important role in end-to-end frameworks. It then builds an acoustic model that combines a convolutional neural network (CNN) with CTC to improve speech recognition accuracy. The study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to collect speech signals, which are converted into text or corresponding speech signals to improve communication while reducing noise and hardware interference. The baseline acoustic model faces challenges such as long training time, a high error rate, and a degree of overfitting. The model was trained by repeatedly designing and tuning the relevant acoustic-model parameters, and the model that performed best on the evaluation index was then selected; it reduces the error rate to about 18%, thereby improving accuracy. Finally, comparative experiments on the choice of acoustic feature parameters, the choice of modeling units, and the speaker's speech rate further verified the strong performance of the CTCCNN_5 + BN + Residual model structure. To train and verify the CTC-CNN baseline acoustic model, this study uses the THCHS-30 and ST-CMDS speech data sets as training data; after 54 epochs of training, the baseline model's word error rate is 31% on the training set and stable at about 43% on the test set. The experiments also consider ambient noise: at a noise level of 80–90 dB the accuracy is 88.18%, the worst among all levels, whereas at 40–60 dB, with less noise pollution, the accuracy reaches 97.33%.
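The abstract relies on two mechanics without spelling them out: how a CTC-trained acoustic model turns per-frame label probabilities into an output label sequence, and how error-rate figures such as 18%, 31%, and 43% are typically computed. The Python sketch below is illustrative only and is not the authors' implementation. It assumes greedy (best-path) CTC decoding with the blank symbol at index 0, and it assumes the error rate is the Levenshtein edit distance between hypothesis and reference divided by the reference length, the usual definition of word or character error rate. The function names `ctc_greedy_decode` and `error_rate`, the label indices, and the toy probabilities are all hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the most likely label at each frame,
    merge consecutive repeats, then drop blank symbols."""
    # Step 1: argmax label at each time step.
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    decoded: List[int] = []
    prev = None
    for label in best_path:
        # Step 2 and 3: keep a label only if it differs from the previous
        # frame's label and is not blank. Because prev is updated on blanks
        # too, a blank between two identical labels keeps both of them.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """(substitutions + deletions + insertions) / len(reference),
    computed with a Levenshtein edit-distance table."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must be non-empty")
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[n][m] / n


# Toy example: 6 frames over 4 labels (0 = blank, 1..3 = hypothetical tokens).
probs = [
    [0.1, 0.7, 0.1, 0.1],    # label 1
    [0.1, 0.6, 0.2, 0.1],    # label 1 again (merged with the previous frame)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.1, 0.7],    # label 3
    [0.6, 0.2, 0.1, 0.1],    # blank
    [0.1, 0.1, 0.1, 0.7],    # label 3 (separated by a blank, so kept)
]
hyp = ctc_greedy_decode(probs)
print(hyp)                          # [1, 3, 3]
print(error_rate([1, 2, 3], hyp))  # 1 substitution / 3 reference tokens
```

The same `error_rate` function applies to words, characters, or syllable units, whichever modeling unit is being scored. Best-path decoding is the simplest CTC decoder; beam search, optionally with a language model, is a common alternative, and the abstract does not state which decoder the authors used.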

Keywords

Artificial intelligence; speech recognition; speech to text; convolutional neural network; automatic speech recognition

Cite This Article

APA Style

Sung, W., Kang, H., & Hsiao, S. (2023). Speech Recognition via CTC-CNN Model. Computers, Materials & Continua, 76(3), 3833–3858. https://doi.org/10.32604/cmc.2023.040024

Vancouver Style

Sung W, Kang H, Hsiao S. Speech Recognition via CTC-CNN Model. Comput Mater Contin. 2023;76(3):3833–3858. https://doi.org/10.32604/cmc.2023.040024

IEEE Style

W. Sung, H. Kang, and S. Hsiao, “Speech Recognition via CTC-CNN Model,” Comput. Mater. Contin., vol. 76, no. 3, pp. 3833–3858, 2023. https://doi.org/10.32604/cmc.2023.040024

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.