Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning

Uğur Ayvaz; Hüseyin Gürüler; Faheem Khan; Naveed Ahmed; Taegkeun Whangbo; Abdusalomov Bobomirzaevich

doi:10.32604/cmc.2022.023278

Open Access

ARTICLE

Uğur Ayvaz¹, Hüseyin Gürüler², Faheem Khan³, Naveed Ahmed⁴, Taegkeun Whangbo³,*, Abdusalomov Akmalbek Bobomirzaevich³

1 Department of Computer Engineering, Istanbul Technical University, Istanbul, 34485, Turkey
2 Department of Information Systems Engineering, Mugla Sitki Kocman University, Mugla, 48000, Turkey
3 Artificial Intelligence Lab, Department of Computer Engineering, Gachon University, Seongnam, 13557, Korea
4 Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, 27272, UAE

* Corresponding Author: Taegkeun Whangbo. Email: email

(This article belongs to the Special Issue: Machine Learning Empowered Secure Computing for Intelligent Systems)

Computers, Materials & Continua 2022, 71(3), 5511-5521. https://doi.org/10.32604/cmc.2022.023278

Received 01 September 2021; Accepted 01 November 2021; Issue published 14 January 2022

Abstract

Automatic speaker recognition (ASR) is a field of human-machine interaction in which researchers use feature extraction and feature matching methods to analyze and synthesize voice signals. One of the most commonly used feature extraction methods is Mel-Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs process voice signals with high accuracy. MFCCs represent a sequence of features specific to the voice signal. This experimental analysis aims to distinguish Turkish speakers by extracting MFCCs from speech recordings. Because human perception of sound is not linear, after the filterbank step of the MFCC method we converted the resulting log filterbank energies into decibel (dB) spectrograms without applying the Discrete Cosine Transform (DCT). A new dataset was created by converting each spectrogram into a 2-D array. Several learning algorithms were evaluated with 10-fold cross-validation to identify the speaker. The highest accuracy, 90.2%, was achieved using a Multi-layer Perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of the human voice as a new feature set.
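
The feature step described in the abstract (mel filterbank energies expressed in dB, with the DCT skipped) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function names, the 16 kHz sample rate, the 25 ms frame and 10 ms hop, the 512-point FFT, and the 26 filters are all assumptions chosen for the example and are not stated in the abstract.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula; models the non-linear perception of pitch.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel_db(signal, sample_rate=16000, frame_len=400, hop=160,
               n_fft=512, n_filters=26):
    """Return a 2-D array (n_frames, n_filters) of filterbank energies in dB.

    All default parameter values are assumptions made for this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies; floor them to avoid log(0).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    energies = np.maximum(energies, 1e-10)
    # Convert to decibels and stop here: no DCT is applied.
    return 10.0 * np.log10(energies)
```

Each recording then yields a frames-by-filters 2-D array, which is the kind of spectrogram-derived input that could be flattened or pooled before being passed to a classifier such as an MLP.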

Keywords

Automatic speaker recognition; human voice recognition; spatial pattern recognition; MFCCs; spectrogram; machine learning; artificial intelligence

Cite This Article

APA Style

Ayvaz, U., Gürüler, H., Khan, F., Ahmed, N., Whangbo, T. et al. (2022). Automatic speaker recognition using mel-frequency cepstral coefficients through machine learning. Computers, Materials & Continua, 71(3), 5511-5521. https://doi.org/10.32604/cmc.2022.023278

Vancouver Style

Ayvaz U, Gürüler H, Khan F, Ahmed N, Whangbo T, Bobomirzaevich AA. Automatic speaker recognition using mel-frequency cepstral coefficients through machine learning. Comput Mater Contin. 2022;71(3):5511-5521. https://doi.org/10.32604/cmc.2022.023278

IEEE Style

U. Ayvaz, H. Gürüler, F. Khan, N. Ahmed, T. Whangbo, and A. A. Bobomirzaevich, "Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning," Comput. Mater. Contin., vol. 71, no. 3, pp. 5511-5521, 2022. https://doi.org/10.32604/cmc.2022.023278

This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.