Open Access

ARTICLE

Machine Learning Model Development for Classification of Audio Commands

Kaveh Heidary*

Department of Electrical Engineering and Computer Science, Alabama A&M University, Huntsville, AL, USA

* Corresponding Author: Kaveh Heidary.

Journal on Artificial Intelligence 2026, 8, 65-87. https://doi.org/10.32604/jai.2026.072857

Abstract

This paper presents a comprehensive investigation into the development and evaluation of Convolutional Neural Network (CNN) models for limited-vocabulary spoken word classification, a fundamental component of many voice-controlled systems. Two distinct CNN architectures are examined: a timeseries 1D CNN that operates directly on the temporal waveform samples of the audio signal, and a 2D CNN that leverages the richer time–frequency representation provided by spectrograms. The study systematically analyzes the influence of key architectural and training parameters, including the number of CNN layers, convolution kernel sizes, and the dimensionality of fully connected layers, on classification accuracy. Particular attention is given to how speaker diversity within the training dataset and the number of word recitations per speaker affect model performance. In addition, the classification accuracy of the proposed CNN-based models is compared against that of Whisper-AI, a state-of-the-art large language model (LLM) for speech processing. All experiments are conducted using an open-source dataset, ensuring reproducibility and enabling fair comparison across different architectures and parameter configurations. The experimental results demonstrate that the 2D CNN achieves an overall classification accuracy of 98.5%, highlighting its superior capability in capturing discriminative time–frequency features. These findings offer practical insights into optimizing CNN-based systems for robust and efficient limited-vocabulary spoken word recognition.

Keywords

Machine learning; artificial intelligence; audio classification; convolutional neural networks; timeseries; spectrogram

Cite This Article

APA Style
Heidary, K. (2026). Machine Learning Model Development for Classification of Audio Commands. Journal on Artificial Intelligence, 8(1), 65–87. https://doi.org/10.32604/jai.2026.072857
Vancouver Style
Heidary K. Machine Learning Model Development for Classification of Audio Commands. J Artif Intell. 2026;8(1):65–87. https://doi.org/10.32604/jai.2026.072857
IEEE Style
K. Heidary, “Machine Learning Model Development for Classification of Audio Commands,” J. Artif. Intell., vol. 8, no. 1, pp. 65–87, 2026. https://doi.org/10.32604/jai.2026.072857



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.