CNN-Based Voice Emotion Classification Model for Risk Detection

Hyun Yoo¹, Ji-Won Baek², Kyungyong Chung³,*

doi:10.32604/iasc.2021.018115

1 Contents Convergence Software Research Institute, Kyonggi University, Suwon-si, 16227, Korea
2 Department of Computer Science, Kyonggi University, Suwon-si, 16227, Korea
3 Division of AI Computer Science and Engineering, Kyonggi University, Suwon-si, 16227, Korea

* Corresponding Author: Kyungyong Chung. Email: email

Intelligent Automation & Soft Computing 2021, 29(2), 319-334. https://doi.org/10.32604/iasc.2021.018115

Received 25 February 2021; Accepted 06 April 2021; Issue published 16 June 2021

Abstract

As the Internet of Things (IoT) and artificial intelligence converge and mature, closed-circuit television, wearable devices, and artificial neural networks have been combined and applied to crime prevention and to follow-up measures after crimes. However, these IoT devices are limited in various ways by their physical environment and face the fundamental problem of privacy violations. To overcome these limitations, this study collects voice data and classifies emotions using an acoustic sensor, which is free of privacy violations and is not sensitive to changes in the external environment. To classify the emotions in a voice, the data generated by the acoustic sensor are combined with a convolutional neural network (CNN), a type of artificial neural network. The short-time Fourier transform and the wavelet transform, both frequency spectrum representation methods, are used as preprocessing techniques for analyzing patterns in the acoustic data. The preprocessed spectrum data are represented as a 2D image of the emotional pattern perceived through hearing, and this image is fed to an image classification learning model based on an artificial neural network. The image classification learning model uses ResNet. Internally, the artificial neural network uses various forms of gradient descent to compare the learning of each node, and it analyzes the pattern through a feature map. The classification model classifies voice data into three emotion types: angry, fearful, and surprised. Thus, a system that can detect situations around the sensors and predict danger can be established. Despite the different emotional intensities of the base data and the sentence-based learning data, the established voice classification model demonstrated an accuracy of more than 77.2%. The model is applicable to various areas, including the prediction of crime situations and the management of work environments for emotional labor.
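The preprocessing step described in the abstract, turning an audio signal into a 2D time-frequency image via the short-time Fourier transform, can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hann window, decibel scaling, and the synthetic 1 kHz test tone are all assumptions chosen for illustration, and a naive DFT is used only to keep the example free of external libraries.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a 2D list [frame][frequency bin] of STFT magnitudes.

    A naive O(N^2) DFT keeps the sketch dependency-free; a practical
    pipeline would use an FFT library instead.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


def to_log_image(spec, floor=1e-10):
    """Convert magnitudes to decibels and scale them to the range [0, 1]."""
    db = [[20 * math.log10(max(v, floor)) for v in row] for row in spec]
    lo = min(min(row) for row in db)
    hi = max(max(row) for row in db)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in db]


# Hypothetical test tone: 1 kHz sine sampled at 8 kHz (not data from the paper).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
image = to_log_image(stft_magnitude(tone))

# Strongest frequency bin in the first frame; each bin spans 8000 / 64 = 125 Hz.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

Each row of `image` is one time frame and each column one frequency bin, so the nested list can be treated as a grayscale picture and passed to an image classifier. The abstract also names the wavelet transform as a second spectrum representation; it is not shown here.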

Keywords

Convolutional neural networks; machine learning; deep learning; voice emotion; crime prediction; crime prevention; IoT

Cite This Article

APA Style

Yoo, H., Baek, J., & Chung, K. (2021). CNN-Based Voice Emotion Classification Model for Risk Detection. Intelligent Automation & Soft Computing, 29(2), 319–334. https://doi.org/10.32604/iasc.2021.018115

Vancouver Style

Yoo H, Baek J, Chung K. CNN-Based Voice Emotion Classification Model for Risk Detection. Intell Automat Soft Comput. 2021;29(2):319–334. https://doi.org/10.32604/iasc.2021.018115

IEEE Style

H. Yoo, J. Baek, and K. Chung, “CNN-Based Voice Emotion Classification Model for Risk Detection,” Intell. Automat. Soft Comput., vol. 29, no. 2, pp. 319–334, 2021. https://doi.org/10.32604/iasc.2021.018115


Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
