Leveraging Transfer Learning for Spatio-Temporal Human Activity Recognition from Video Sequences

Umair Butt; Hadiqa Ullah; Sukumar Letchmunan; Iqra Tariq; Fadratul Hassan; Tieng Koh

doi:10.32604/cmc.2023.035512

Open Access

ARTICLE

Umair Muneer Butt¹,²,*, Hadiqa Aman Ullah², Sukumar Letchmunan¹, Iqra Tariq², Fadratul Hafinaz Hassan¹, Tieng Wei Koh³

1 School of Computer Sciences, Universiti Sains Malaysia, Penang, 11800, Malaysia
2 Department of Computer Science, The University of Chenab, Gujrat, 50700, Pakistan
3 Department of Software Engineering and Information System, Universiti Putra Malaysia, Selangor, 43400, Malaysia

* Corresponding Author: Umair Muneer Butt. Email: email

Computers, Materials & Continua 2023, 74(3), 5017-5033. https://doi.org/10.32604/cmc.2023.035512

Received 24 August 2022; Accepted 26 October 2022; Issue published 28 December 2022

Abstract

Human Activity Recognition (HAR) is an active research area due to its applications in pervasive computing, human-computer interaction, artificial intelligence, health care, and social sciences. However, dynamic environments and anthropometric differences between individuals make actions harder to recognize. This study focuses on human activity in video sequences acquired with an RGB camera because of its vast range of real-world applications. It uses a two-stream ConvNet to extract spatial and temporal information and proposes a fine-tuned deep neural network. The transfer learning paradigm is adopted to extract varied and fixed frames while reusing object identification information. Six state-of-the-art pre-trained models are compared to find the best model for spatial feature extraction. For temporal sequences, this study uses dense optical flow, following the two-stream ConvNet, and Bidirectional Long Short-Term Memory (BiLSTM) to capture long-term dependencies. Two benchmark datasets, UCF101 and HMDB51, are used for evaluation. In addition, seven state-of-the-art optimizers are used to fine-tune the proposed network parameters. Furthermore, this study utilizes an ensemble mechanism to aggregate spatial-temporal features using a four-stream Convolutional Neural Network (CNN), in which two streams use RGB data and the others use optical flow images. Finally, the proposed ensemble approach using max hard voting outperforms state-of-the-art methods, with accuracies of 96.30% and 90.07% on the UCF101 and HMDB51 datasets, respectively.
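
The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how votes are counted or how ties are resolved, so the following is only an assumption-based illustration of plain majority voting over the class labels predicted by four streams. The function names, the example labels, and the tie-breaking rule (the earliest stream among the tied labels wins) are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(stream_labels: Sequence[int]) -> int:
    """Return the most frequent label among the streams' votes for one video.

    Assumes a non-empty sequence. Ties are broken in favour of the label
    that appears first in stream order (an assumption for illustration).
    """
    counts = Counter(stream_labels)
    best = max(counts.values())
    for label in stream_labels:
        if counts[label] == best:
            return label
    raise ValueError("stream_labels must not be empty")


def ensemble_predict(per_stream_preds: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-stream label predictions video by video.

    per_stream_preds[s][v] is the label predicted by stream s for video v.
    """
    return [hard_vote(votes) for votes in zip(*per_stream_preds)]


# Hypothetical predictions for three videos from four streams
# (two RGB streams and two optical-flow streams).
rgb_a = [0, 2, 1]
rgb_b = [0, 2, 2]
flow_a = [1, 2, 1]
flow_b = [0, 3, 2]

final_labels = ensemble_predict([rgb_a, rgb_b, flow_a, flow_b])
print(final_labels)  # [0, 2, 1]
```

For the third video the vote is split two against two, so the result depends entirely on the assumed tie-breaking rule. A real ensemble would need an explicit policy here, for example falling back to the stream with the highest validation accuracy.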

Keywords

Human activity recognition; deep learning; transfer learning; neural network; ensemble learning; spatio-temporal

Cite This Article

APA Style

Butt, U.M., Ullah, H.A., Letchmunan, S., Tariq, I., Hassan, F.H. et al. (2023). Leveraging transfer learning for spatio-temporal human activity recognition from video sequences. Computers, Materials & Continua, 74(3), 5017-5033. https://doi.org/10.32604/cmc.2023.035512

Vancouver Style

Butt UM, Ullah HA, Letchmunan S, Tariq I, Hassan FH, Koh TW. Leveraging transfer learning for spatio-temporal human activity recognition from video sequences. Comput Mater Contin. 2023;74(3):5017-5033. https://doi.org/10.32604/cmc.2023.035512

IEEE Style

U.M. Butt, H.A. Ullah, S. Letchmunan, I. Tariq, F.H. Hassan, and T.W. Koh, "Leveraging Transfer Learning for Spatio-Temporal Human Activity Recognition from Video Sequences," Comput. Mater. Contin., vol. 74, no. 3, pp. 5017-5033, 2023. https://doi.org/10.32604/cmc.2023.035512

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
