Open Access

ARTICLE


ARNet: Integrating Spatial and Temporal Deep Learning for Robust Action Recognition in Videos

Hussain Dawood1, Marriam Nawaz2, Tahira Nazir3, Ali Javed2, Abdul Khader Jilani Saudagar4,*, Hatoon S. AlSagri4

1 School of Computing, Skyline University College, Sharjah, 1797, United Arab Emirates
2 Department of Software Engineering, University of Engineering and Technology-Taxila, Punjab, 47050, Pakistan
3 Department of Software Engineering and Computer Science, Riphah International University-Gulberg Green Campus, Islamabad, 46000, Pakistan
4 Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11432, Saudi Arabia

* Corresponding Author: Abdul Khader Jilani Saudagar

Computer Modeling in Engineering & Sciences 2025, 144(1), 429-459. https://doi.org/10.32604/cmes.2025.066415

Abstract

Reliable human action recognition (HAR) in video sequences is critical for a wide range of applications, such as security surveillance, healthcare monitoring, and human-computer interaction. Several automated systems have been designed for this purpose; however, existing methods, such as two-stream networks and 3D convolutional neural networks (CNNs), often struggle to effectively integrate spatial and temporal information from input samples, which limits their accuracy in discriminating among numerous human actions. Therefore, this study introduces a novel deep-learning framework, ARNet, designed for robust HAR. ARNet consists of two main modules: a refined InceptionResNet-V2-based CNN and a bidirectional long short-term memory (Bi-LSTM) network. The refined InceptionResNet-V2 employs the parametric rectified linear unit (PReLU) activation within its convolutional layers to enhance spatial feature extraction from individual video frames. PReLU improves the spatial information-capturing ability of the approach because its learnable parameters adaptively control the slope of the negative part of the activation function, allowing richer gradient flow during backpropagation and thus more robust feature capture and stable model training. These spatial features, which hold essential pixel characteristics, are then processed by the Bi-LSTM module for temporal analysis, which helps ARNet understand the dynamic behavior of actions over time. ARNet adds three dense layers after the Bi-LSTM module to ensure a comprehensive computation of both spatial and temporal patterns and to further strengthen the feature representation. The model is experimentally validated on three benchmark datasets, HMDB51, KTH, and UCF Sports, achieving accuracies of 93.82%, 99%, and 99.16%, respectively. The precision values for the HMDB51, KTH, and UCF Sports datasets are 97.41%, 99.54%, and 99.01%; the recall values are 98.87%, 98.60%, and 99.08%; and the F1-scores are 98.13%, 99.07%, and 99.04%, respectively. These results highlight the robustness of ARNet and its potential as a versatile tool for accurate HAR across various real-world applications.
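For readers who want a concrete picture of the spatial-temporal pipeline described above, the following is a minimal TensorFlow/Keras sketch, not the authors' implementation. The clip length, image size, LSTM units, dense-layer widths, and 51-class output are illustrative assumptions; note also that PReLU activations (PReLU(x) = max(0, x) + a·min(0, x), with learnable slope a) are applied here only in the dense head, whereas the paper refines the convolutional layers of InceptionResNet-V2 themselves.

    # Illustrative ARNet-style pipeline (assumptions noted above; not the authors' code).
    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_FRAMES, H, W, NUM_CLASSES = 16, 299, 299, 51  # assumed clip/label configuration

    # Spatial module: per-frame feature extraction with InceptionResNet-V2.
    backbone = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights="imagenet", pooling="avg")

    clip = layers.Input(shape=(NUM_FRAMES, H, W, 3))
    frame_feats = layers.TimeDistributed(backbone)(clip)  # (batch, frames, 1536)

    # Temporal module: Bi-LSTM over the sequence of frame-level features.
    temporal = layers.Bidirectional(layers.LSTM(256))(frame_feats)

    # Three dense layers (with PReLU) refine the joint spatial-temporal representation.
    x = layers.Dense(512)(temporal)
    x = layers.PReLU()(x)
    x = layers.Dense(256)(x)
    x = layers.PReLU()(x)
    x = layers.Dense(128)(x)
    x = layers.PReLU()(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(clip, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.summary()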

Keywords

Action recognition; Bi-LSTM; computer vision; deep learning; InceptionResNet-V2; PReLU

Cite This Article

APA Style
Dawood, H., Nawaz, M., Nazir, T., Javed, A., Saudagar, A.K.J. et al. (2025). ARNet: Integrating Spatial and Temporal Deep Learning for Robust Action Recognition in Videos. Computer Modeling in Engineering & Sciences, 144(1), 429–459. https://doi.org/10.32604/cmes.2025.066415
Vancouver Style
Dawood H, Nawaz M, Nazir T, Javed A, Saudagar AKJ, AlSagri HS. ARNet: Integrating Spatial and Temporal Deep Learning for Robust Action Recognition in Videos. Comput Model Eng Sci. 2025;144(1):429–459. https://doi.org/10.32604/cmes.2025.066415
IEEE Style
H. Dawood, M. Nawaz, T. Nazir, A. Javed, A. K. J. Saudagar, and H. S. AlSagri, “ARNet: Integrating Spatial and Temporal Deep Learning for Robust Action Recognition in Videos,” Comput. Model. Eng. Sci., vol. 144, no. 1, pp. 429–459, 2025. https://doi.org/10.32604/cmes.2025.066415



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.