Open AccessOpen Access


Mental Illness Disorder Diagnosis Using Emotion Variation Detection from Continuous English Speech

S. Lalitha1, Deepa Gupta2,*, Mohammed Zakariah3, Yousef Ajami Alotaibi3

1 Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India
2 Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India
3 Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Saudi Arabia

* Corresponding Author: Deepa Gupta. Email:

(This article belongs to this Special Issue: Recent Trends in Machine Intelligence respected to Medical Field Applications)

Computers, Materials & Continua 2021, 69(3), 3217-3238.


Automatic recognition of human emotions in a continuous dialog model remains challenging where a speaker’s utterance includes several sentences that may not always carry a single emotion. Limited work with standalone speech emotion recognition (SER) systems proposed for continuous speech only has been reported. In the recent decade, various effective SER systems have been proposed for discrete speech, i.e., short speech phrases. It would be more helpful if these systems could also recognize emotions from continuous speech. However, if these systems are applied directly to test emotions from continuous speech, emotion recognition performance would not be similar to that achieved for discrete speech due to the mismatch between training data (from training speech) and testing data (from continuous speech). The problem may possibly be resolved if an existing SER system for discrete speech is enhanced. Thus, in this work the author’s existing effective SER system for multilingual and mixed-lingual discrete speech is enhanced by enriching the cepstral speech feature set with bi-spectral speech features and a unique functional set of Mel frequency cepstral coefficient features derived from a sine filter bank. Data augmentation is applied to combat skewness of the SER system toward certain emotions. Classification using random forest is performed. This enhanced SER system is used to predict emotions from continuous speech with a uniform segmentation method. Due to data scarcity, several audio samples of discrete speech from the SAVEE database that has recordings in a universal language, i.e., English, are concatenated resulting in multi-emotional speech samples. Anger, fear, sad, and neutral emotions, which are vital during the initial investigation of mentally disordered individuals, are selected to build six categories of multi-emotional samples. Experimental results demonstrate the suitability of the proposed method for recognizing emotions from continuous speech as well as from discrete speech.


Cite This Article

S. Lalitha, D. Gupta, M. Zakariah and Y. Ajami Alotaibi, "Mental illness disorder diagnosis using emotion variation detection from continuous english speech," Computers, Materials & Continua, vol. 69, no.3, pp. 3217–3238, 2021.

This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 1664


  • 820


  • 0


Share Link