Open Access


The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition

Mohammad Amaz Uddin1, Mohammad Salah Uddin Chowdury1, Mayeen Uddin Khandaker2,*, Nissren Tamam3, Abdelmoneim Sulieman4

1 Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong, 4381, Bangladesh
2 Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, Bandar Sunway, Selangor, 47500, Malaysia
3 Department of Physics, College of Sciences, Princess Nourah bint Abdulrahman University, P.O Box 84428, Riyadh, 11671, Saudi Arabia
4 Department of Radiology and Medical Imaging, Prince Sattam bin Abdulaziz University, Alkharj, Saudi Arabia

* Corresponding Author: Mayeen Uddin Khandaker. Email: email

Computers, Materials & Continua 2023, 74(1), 1709-1722.


Human speech indirectly conveys the speaker's mental state or emotion. Artificial Intelligence (AI)-based techniques that recognize emotion from speech could therefore have a substantial impact. In this study, we introduce a robust method for emotion recognition from human speech that combines an effective preprocessing technique with a deep learning-based mixed model consisting of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) components. About 2800 audio files were taken from the Toronto Emotional Speech Set (TESS) database for this study. A high-pass filter and a Savitzky-Golay filter were used to obtain noise-free, smooth audio data. Seven emotion classes were used: Angry, Disgust, Fear, Happy, Neutral, Pleasant-surprise, and Sad. Energy, fundamental frequency, and Mel Frequency Cepstral Coefficients (MFCC) were extracted as emotion features, and these features yielded 97.5% accuracy with the mixed LSTM + CNN model. The mixed model was found to perform better than the usual state-of-the-art models for emotion recognition from speech. This result also indicates that the mixed model could be used effectively in advanced research dealing with sound processing.
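The preprocessing and energy-feature steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cutoff frequency, filter order, smoothing window, frame length, and hop size are all assumed values, the helper names `preprocess` and `frame_energy` are hypothetical, and a synthetic tone stands in for a TESS recording. Fundamental frequency, MFCC extraction, and the LSTM + CNN classifier are not shown.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, savgol_filter


def preprocess(signal, sample_rate, cutoff_hz=80.0, window=11, polyorder=3):
    """High-pass filter, then Savitzky-Golay smoothing.

    All parameter defaults are assumptions chosen for illustration.
    """
    # 4th-order Butterworth high-pass, applied forward and backward (zero phase)
    sos = butter(4, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    filtered = sosfiltfilt(sos, signal)
    # Local polynomial smoothing over a sliding window
    return savgol_filter(filtered, window_length=window, polyorder=polyorder)


def frame_energy(signal, frame_len=400, hop=160):
    """Short-time energy: sum of squared samples in each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop: i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


# Synthetic stand-in for a speech clip: 200 Hz tone, DC offset, light noise
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
clip = 0.5 * np.sin(2 * np.pi * 200 * t) + 0.3 + 0.01 * rng.standard_normal(sr)

clean = preprocess(clip, sr)   # same length as the input, DC offset removed
energy = frame_energy(clean)   # one energy value per frame
```

In a fuller pipeline, per-frame energy would be combined with fundamental frequency and MFCC features before being passed to the classifier.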


Cite This Article

APA Style
Uddin, M.A., Chowdury, M.S.U., Khandaker, M.U., Tamam, N., Sulieman, A. (2023). The efficacy of deep learning-based mixed model for speech emotion recognition. Computers, Materials & Continua, 74(1), 1709-1722.
Vancouver Style
Uddin MA, Chowdury MSU, Khandaker MU, Tamam N, Sulieman A. The efficacy of deep learning-based mixed model for speech emotion recognition. Comput Mater Contin. 2023;74(1):1709-1722.
IEEE Style
M. A. Uddin, M. S. U. Chowdury, M. U. Khandaker, N. Tamam, and A. Sulieman, "The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition," Comput. Mater. Contin., vol. 74, no. 1, pp. 1709-1722, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.