Open Access

ARTICLE

Action Recognition via Shallow CNNs on Intelligently Selected Motion Data

Jalees Ur Rahman1, Muhammad Hanif1, Usman Haider2,*, Saeed Mian Qaisar3,*, Sarra Ayouni4
1 Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, 23460, Pakistan
2 Department of AI and DS, FAST School of Computing, National University of Computer and Emerging Sciences, Islamabad, 44000, Pakistan
3 College of Engineering and Technology, American University of the Middle East, Egaila, 54200, Kuwait
4 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
* Corresponding Author: Usman Haider. Email: email; Saeed Mian Qaisar. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071251

Received 03 August 2025; Accepted 13 November 2025; Published online 17 December 2025

Abstract

Deep neural networks have achieved excellent classification results on several computer vision benchmarks. This has led to the popularity of machine learning as a service, where trained models are hosted in the cloud and inference is performed on real-world data. In most applications, the vision data must be compressed because of its enormous bandwidth and memory requirements. Video codecs exploit spatial and temporal correlations to achieve high compression ratios, but they are computationally expensive. This work computes the motion fields between consecutive frames to enable efficient video classification. However, contrary to the usual practice of reconstructing full-resolution frames through motion compensation, this work proposes to infer the class label directly from the block-based motion fields. A motion field is a rich representation of motion in which each motion vector carries both magnitude and direction information. This approach has two advantages: the cost of motion compensation and video decoding is avoided, and the dimensionality of the input signal is greatly reduced, which allows a shallower classification network. The neural network can be trained on motion vectors in two forms: as complex representations or as magnitude-direction pairs. The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields. Our experimental results show 20× faster convergence during training, reduced overfitting, and accelerated inference on a hand gesture recognition dataset compared with full-resolution and downsampled frames. We validate the proposed methodology on the HGds dataset (99.21% test accuracy), the HMDB51 dataset (82.54%), and the UCF101 dataset (97.13%), outperforming state-of-the-art methods in computational efficiency.
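
For illustration, the following sketch (not the authors' implementation) shows how a block-based motion field between two grayscale frames might be estimated with exhaustive block matching and then converted into the magnitude and direction tensors that a shallow CNN would consume. The block size, search range, and function names are assumptions made for this example.

import numpy as np

def block_motion_field(prev, curr, block=16, search=8):
    # Exhaustive block-matching motion estimation (illustrative only).
    # prev, curr: 2-D grayscale frames of equal shape.
    # Returns per-block vertical/horizontal displacements (dy, dx).
    H, W = curr.shape
    rows, cols = H // block, W // block
    dy = np.zeros((rows, cols), dtype=np.float32)
    dx = np.zeros((rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * block, c * block
            ref = curr[y0:y0 + block, x0:x0 + block].astype(np.float32)
            best_sad, best_v = np.inf, (0.0, 0.0)
            for vy in range(-search, search + 1):
                for vx in range(-search, search + 1):
                    yy, xx = y0 + vy, x0 + vx
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue
                    cand = prev[yy:yy + block, xx:xx + block].astype(np.float32)
                    sad = np.abs(ref - cand).sum()  # sum of absolute differences
                    if sad < best_sad:
                        best_sad, best_v = sad, (vy, vx)
            dy[r, c], dx[r, c] = best_v
    return dy, dx

def magnitude_direction(dy, dx):
    # Convert per-block motion vectors into the two-channel
    # magnitude/direction tensor used as the classifier input.
    mag = np.sqrt(dy ** 2 + dx ** 2)
    ang = np.arctan2(dy, dx)  # direction in radians, range (-pi, pi]
    return np.stack([mag, ang], axis=0)  # shape: (2, rows, cols)

In this setting, the magnitude/direction pair for each frame transition would be stacked over time and fed to a small convolutional network; because the per-block field is far smaller than the full-resolution frames, the classifier can remain shallow.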

Keywords

Action recognition; block matching algorithm; convolutional neural network; deep learning; data compression; motion fields; optimization; video classification