
Open Access

ARTICLE

Intelligent Human Interaction Recognition with Multi-Modal Feature Extraction and Bidirectional LSTM

Muhammad Hamdan Azhar1,2,#, Yanfeng Wu1,#, Nouf Abdullah Almujally3, Shuaa S. Alharbi4, Asaad Algarni5, Ahmad Jalal2,6, Hui Liu1,7,8,*
1 Guodian Nanjing Automation Co., Ltd., Nanjing, 600268, China
2 Faculty of Computing and AI, Air University, Islamabad, 44000, Pakistan
3 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia
4 Department of Information Technology, College of Computer, Qassim University, Buraydah, 52571, Saudi Arabia
5 Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha, 91911, Saudi Arabia
6 Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, 02841, Republic of Korea
7 Jiangsu Key Laboratory of Intelligent Medical Image Computing, School of Future Technology, Nanjing University of Information Science and Technology, Nanjing, 210044, China
8 Cognitive Systems Lab, University of Bremen, Bremen, 28359, Germany
* Corresponding Author: Hui Liu. Email: email
# These authors contributed equally to this work
(This article belongs to the Special Issue: Advances in Image Recognition: Innovations, Applications, and Future Directions)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071988

Received 17 August 2025; Accepted 22 October 2025; Published online 29 December 2025

Abstract

Recognizing human interactions in RGB videos is a critical task in computer vision, with applications in video surveillance. Existing deep-learning architectures achieve strong results but are computationally intensive, sensitive to changes in video resolution, and prone to failure in crowded scenes. We propose a novel hybrid system that is computationally efficient, robust to degraded video quality, and able to filter out irrelevant individuals, making it suitable for real-world use. The system leverages multi-modal handcrafted features to represent interactions and a deep learning classifier to capture complex dependencies. Using Mask R-CNN and YOLO11-Pose, we extract grayscale silhouettes and keypoint coordinates of the interacting individuals, filtering out irrelevant individuals with a proposed algorithm. From these, we extract silhouette-based features (local ternary patterns and histograms of optical flow) and keypoint-based features (distances, angles, and velocities) that capture distinct spatial and temporal information. A bidirectional long short-term memory (BiLSTM) network then classifies the interactions. Extensive experiments on the UT-Interaction, SBU Kinect Interaction, and ISR-UOL 3D social activity datasets show that our system achieves competitive accuracy, and they further validate the effectiveness of the chosen features and classifier, as well as the system's computational efficiency and robustness to occlusion.
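To make the described pipeline concrete, the following is a minimal sketch of the keypoint branch and the BiLSTM classifier. It is illustrative only: the paper's exact feature definitions, dimensions, and network hyperparameters (hidden size, layer count, number of classes) are not given in the abstract and are assumed here; `keypoint_features` and `BiLSTMClassifier` are hypothetical names, and the angle and velocity formulations are one plausible reading of "distances, angles and velocities".

```python
import numpy as np
import torch
import torch.nn as nn

def keypoint_features(kp_a, kp_b):
    """Per-frame keypoint features for a pair of interacting people.

    kp_a, kp_b: (T, K, 2) arrays of 2D keypoints over T frames
    (e.g., K = 17 COCO joints from a pose estimator such as YOLO11-Pose).
    Returns a (T, D) matrix of inter-person distances, joint angles,
    and frame-to-frame velocities. Assumed formulation, not the paper's.
    """
    # Inter-person distances: joint-wise Euclidean distance between the two people
    dists = np.linalg.norm(kp_a - kp_b, axis=-1)                      # (T, K)

    # Joint angles: orientation of each joint relative to the person's centroid
    def angles(kp):
        centroid = kp.mean(axis=1, keepdims=True)                     # (T, 1, 2)
        v = kp - centroid
        return np.arctan2(v[..., 1], v[..., 0])                       # (T, K)
    ang = np.concatenate([angles(kp_a), angles(kp_b)], axis=1)        # (T, 2K)

    # Velocities: frame-to-frame displacement magnitude of each joint
    def velocity(kp):
        v = np.zeros(kp.shape[:2])
        v[1:] = np.linalg.norm(np.diff(kp, axis=0), axis=-1)
        return v                                                      # (T, K)
    vel = np.concatenate([velocity(kp_a), velocity(kp_b)], axis=1)    # (T, 2K)

    return np.concatenate([dists, ang, vel], axis=1).astype(np.float32)

class BiLSTMClassifier(nn.Module):
    """Bidirectional LSTM over per-frame feature vectors (hyperparameters assumed)."""
    def __init__(self, feat_dim, hidden=128, n_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (B, T, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # logits from the final time step

# Toy usage: 30 frames, 17 joints per person, 6 interaction classes
kp_a = np.random.rand(30, 17, 2).astype(np.float32)
kp_b = np.random.rand(30, 17, 2).astype(np.float32)
feats = torch.from_numpy(keypoint_features(kp_a, kp_b)).unsqueeze(0)  # (1, 30, 85)
model = BiLSTMClassifier(feat_dim=feats.shape[-1])
logits = model(feats)                                                 # (1, 6)
```

The bidirectional recurrence reads the frame sequence in both temporal directions, so each frame's representation is informed by both earlier and later motion, which is what lets the classifier capture the temporal dependencies the abstract refers to.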

Keywords

Human interaction recognition; keypoint coordinates; grayscale silhouettes; bidirectional long short-term memory network