Muhammad Babar Kamal1, Arfat Ahmad Khan2, Faizan Ahmed Khan3, Malik Muhammad Ali Shahid4, Chitapong Wechtaisong2,*, Muhammad Daud Kamal5, Muhammad Junaid Ali6, Peerapong Uthansakul2
CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 5547-5562, 2022, DOI:10.32604/cmc.2022.024590
- 21 April 2022
Abstract Advances in deep learning have greatly improved the performance of speech recognition systems, and most recent systems are based on the Recurrent Neural Network (RNN). The RNN works well on short sequences but suffers from the vanishing gradient problem on long sequences. Transformer networks address this issue and have shown state-of-the-art results on sequential or speech-related data. In speech recognition, the input audio is generally converted into an image using a Mel-spectrogram, which represents its frequencies and intensities. The image is classified by the machine learning mechanism to generate a…