

Performance Analysis of a Chunk-Based Speech Emotion Recognition Model Using RNN

Hyun-Sam Shin1, Jun-Ki Hong2,*

1 Division of Software Convergence, Hanshin University, Osan-si, 18101, Korea
2 Division of AI Software Engineering, Pai Chai University, Daejeon, 35345, Korea

* Corresponding Author: Jun-Ki Hong.

Intelligent Automation & Soft Computing 2023, 36(1), 235-248.


Recently, artificial-intelligence-based automatic customer response systems have been widely used in place of customer service representatives. It is therefore important for an automatic customer service system to promptly recognize the emotion in a customer’s voice so that it can provide the appropriate service. Accordingly, we analyzed emotion recognition (ER) accuracy as a function of simulation time using the proposed chunk-based speech ER (CSER) model. The proposed CSER model divides voice signals into 3-s chunks to efficiently recognize the emotions inherent in the customer’s voice. We evaluated ER performance on the voice-signal chunks by applying four recurrent neural network (RNN) techniques to the proposed CSER model individually: long short-term memory (LSTM), bidirectional LSTM, gated recurrent units (GRU), and bidirectional GRU, assessing the ER accuracy and time efficiency of each. The results reveal that GRU shows the best time efficiency in recognizing emotions from speech signals in terms of accuracy as a function of simulation time.
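The abstract describes splitting voice signals into fixed 3-s chunks before feeding them to an RNN classifier. The paper's exact preprocessing is not given here, but the chunking step can be sketched as follows; the function name, the 16 kHz sample rate, and the choice to drop any trailing partial chunk are illustrative assumptions, not details from the article.

```python
import numpy as np

def split_into_chunks(signal, sample_rate, chunk_seconds=3):
    """Split a 1-D audio signal into fixed-length chunks.

    Trailing samples that do not fill a whole chunk are dropped
    (an assumption; the paper may pad or overlap instead).
    """
    chunk_len = int(sample_rate * chunk_seconds)  # samples per chunk
    n_chunks = len(signal) // chunk_len           # complete chunks only
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# Example: a 10-second signal at 16 kHz yields three 3-second chunks,
# each 48,000 samples long; the final 1 s of audio is discarded.
signal = np.zeros(10 * 16000)
chunks = split_into_chunks(signal, sample_rate=16000)
print(len(chunks), len(chunks[0]))  # 3 48000
```

Each resulting chunk would then be converted to acoustic features and passed to one of the four RNN variants (LSTM, bidirectional LSTM, GRU, or bidirectional GRU) for per-chunk emotion classification.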


Cite This Article

H. Shin and J. Hong, "Performance analysis of a chunk-based speech emotion recognition model using RNN," Intelligent Automation & Soft Computing, vol. 36, no. 1, pp. 235–248, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.