Open Access



HybridHR-Net: Action Recognition in Video Sequences Using Optimal Deep Learning Fusion Assisted Framework

Muhammad Naeem Akbar1,*, Seemab Khan2, Muhammad Umar Farooq1, Majed Alhaisoni3, Usman Tariq4, Muhammad Usman Akram1

1 Department of Computer Engineering, National University of Sciences and Technology (NUST), Islamabad, 46000, Pakistan
2 Department of Robotics, SMME NUST, Islamabad, 45600, Pakistan
3 Computer Sciences Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, 11671, Saudi Arabia
4 Management Information System Department, College of Business Administration, Prince Sattam bin Abdulaziz University, Al-Kharj, 16278, Saudi Arabia

* Corresponding Author: Muhammad Naeem Akbar. Email: email

(This article belongs to the Special Issue: Recent Advances in Hyper Parameters Optimization, Features Optimization, and Deep Learning for Video Surveillance and Biometric Applications)

Computers, Materials & Continua 2023, 76(3), 3275-3295.


Combining spatiotemporal video information with essential features can improve the performance of human action recognition (HAR); however, relying on a single type of feature usually degrades performance because of similar actions and complex backgrounds. Deep convolutional neural networks have improved performance in several computer vision applications in recent years owing to their use of spatial information. This article proposes a new framework for human action recognition in video surveillance, dubbed HybridHR-Net. A pre-trained EfficientNet-b0 deep learning model is fine-tuned on selected datasets using deep transfer learning, and Bayesian optimization is employed to tune the hyperparameters of the fine-tuned model. Instead of fully connected layer features, we extract features from the average pooling layer and apply two feature selection techniques: an improved artificial bee colony and an entropy-based approach. The selected features are combined into a single vector using a serial-based technique, and the fused vector is then classified by machine learning classifiers. Experiments on five publicly accessible datasets yielded accuracies of 97%, 98.7%, 100%, 99.7%, and 96.8%, respectively. Additionally, the proposed framework is compared with contemporary methods to demonstrate the improvement in accuracy.
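The selection and serial fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the histogram-based entropy estimate, the bin count, and the top-k rule are all assumptions made for illustration. The improved artificial bee colony selector is not reproduced here; any second selector that returns a feature subset could be plugged in at that point.

```python
import numpy as np


def entropy_scores(features, bins=16):
    """Estimate the Shannon entropy (in bits) of each feature column.

    Assumption: entropy is approximated from a fixed-width histogram;
    the paper's exact entropy criterion may differ.
    """
    n_features = features.shape[1]
    scores = np.empty(n_features)
    for j in range(n_features):
        counts, _ = np.histogram(features[:, j], bins=bins)
        p = counts[counts > 0] / counts.sum()  # drop empty bins
        scores[j] = -np.sum(p * np.log2(p))
    return scores


def select_top_k(features, scores, k):
    """Keep the k columns with the highest scores (hypothetical rule)."""
    idx = np.argsort(scores)[::-1][:k]
    return features[:, idx], idx


def serial_fusion(*feature_sets):
    """Serial fusion: concatenate feature sets along the feature axis."""
    return np.concatenate(feature_sets, axis=1)
```

In use, `entropy_scores` would be applied to the matrix of average pooling features (one row per video sample), `select_top_k` would produce the entropy-selected subset, and `serial_fusion` would join it with the subset chosen by the second selector before the fused matrix is passed to a classifier.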


Cite This Article

M. N. Akbar, S. Khan, M. U. Farooq, M. Alhaisoni, U. Tariq et al., "Hybridhr-net: action recognition in video sequences using optimal deep learning fusion assisted framework," Computers, Materials & Continua, vol. 76, no. 3, pp. 3275–3295, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.