Machine Learning Model Development for Classification of Audio Commands

Kaveh Heidary

doi:10.32604/jai.2026.072857

Open Access

ARTICLE

Department of Electrical Engineering and Computer Science, Alabama A&M University, Huntsville, AL, USA

Corresponding Author: Kaveh Heidary

Journal on Artificial Intelligence 2026, 8, 65–87. https://doi.org/10.32604/jai.2026.072857

Received 05 September 2025; Accepted 18 November 2025; Issue published 13 February 2026

Abstract

This paper presents a comprehensive investigation into the development and evaluation of Convolutional Neural Network (CNN) models for limited-vocabulary spoken word classification, a fundamental component of many voice-controlled systems. Two CNN architectures are examined: a time-series 1D CNN that operates directly on the raw waveform samples of the audio signal, and a 2D CNN that uses the richer time–frequency representation provided by spectrograms. The study systematically analyzes how key architectural and training parameters, including the number of CNN layers, the convolution kernel sizes, and the dimensionality of the fully connected layers, influence classification accuracy. Particular attention is given to how speaker diversity within the training dataset and the number of word recitations per speaker affect model performance. In addition, the classification accuracy of the proposed CNN-based models is compared against that of Whisper-AI, a state-of-the-art large language model (LLM) for speech processing. All experiments are conducted on an open-source dataset, which supports reproducibility and enables fair comparison across architectures and parameter configurations. The experimental results show that the 2D CNN achieved an overall classification accuracy of 98.5%, highlighting its superior capability in capturing discriminative time–frequency features. These findings offer practical insights into optimizing CNN-based systems for robust and efficient limited-vocabulary spoken word recognition.
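To make the difference between the two model inputs concrete, the following is a minimal sketch (not the author's implementation) of how a single mono clip could be presented to each model: the raw samples as a 1D sequence for the time-series CNN, or a log-magnitude spectrogram as a 2D "image" for the spectrogram CNN. The 16 kHz sampling rate, one-second clip length, 256-sample Hann frames with a 128-sample hop, and the function name `log_spectrogram` are illustrative assumptions; the abstract does not state the paper's actual preprocessing settings.

```python
import numpy as np


def log_spectrogram(waveform: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Log-magnitude short-time Fourier transform, shape (n_frames, frame_len // 2 + 1).

    Frame length, hop size, and Hann window are assumed, illustrative choices.
    """
    if waveform.ndim != 1:
        raise ValueError("expected a mono 1D waveform")
    # Zero-pad very short clips so at least one full frame exists.
    if len(waveform) < frame_len:
        waveform = np.pad(waveform, (0, frame_len - len(waveform)))
    n_frames = 1 + (len(waveform) - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix that gathers overlapping frames: row i starts at sample i * hop.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = waveform[idx] * window
    # One-sided FFT magnitude per frame, then log compression (small offset avoids log(0)).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-6)


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate for one-second command clips
    clip = np.random.default_rng(0).standard_normal(sample_rate).astype(np.float32)
    # 1D CNN input: raw samples with a single channel axis.
    print("1D CNN input:", clip[:, None].shape)  # (16000, 1)
    # 2D CNN input: time-frequency grid with a single channel axis.
    print("2D CNN input:", log_spectrogram(clip)[..., None].shape)  # (124, 129, 1)
```

Under these assumed settings each frequency bin spans 16000 / 256 = 62.5 Hz; shorter frames would improve time resolution at the cost of frequency resolution, which is one of the trade-offs a spectrogram-based front end introduces.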

Keywords

Machine learning; artificial intelligence; audio classification; convolutional neural networks; time series; spectrogram

Cite This Article

APA Style

Heidary, K. (2026). Machine Learning Model Development for Classification of Audio Commands. Journal on Artificial Intelligence, 8(1), 65–87. https://doi.org/10.32604/jai.2026.072857

Vancouver Style

Heidary K. Machine Learning Model Development for Classification of Audio Commands. J Artif Intell. 2026;8(1):65–87. https://doi.org/10.32604/jai.2026.072857

IEEE Style

K. Heidary, “Machine Learning Model Development for Classification of Audio Commands,” J. Artif. Intell., vol. 8, no. 1, pp. 65–87, 2026. https://doi.org/10.32604/jai.2026.072857


Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.