Open Access

ARTICLE

Group Activity Recognition in Crowded Scenes Using Multi-Stage Feature Optimization and ST-GCN-LSTM Networks

Mohammed Alnusayri1, Tingting Xue2,3, Saleha Kamal4, Nouf Abdullah Almujally5, Khaled Alnowaiser6, Ahmad Jalal4,7,*, Hui Liu3,8,9,*
1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
2 School of Environmental Science & Engineering, Nanjing University of Information Science and Technology, Nanjing, China
3 Cognitive Systems Lab, University of Bremen, Bremen, Germany
4 Department of Computer Science, Air University, Islamabad, Pakistan
5 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
6 Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
7 Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
8 Jiangsu Key Laboratory of Intelligent Medical Image Computing, School of Future Technology, Nanjing University of Information Science and Technology, Nanjing, China
9 Guodian Nanjing Automation Co., Ltd., Nanjing, China
* Corresponding Author: Ahmad Jalal. Email: email; Hui Liu. Email: email
(This article belongs to the Special Issue: Deep Learning: Emerging Trends, Applications and Research Challenges for Image Recognition)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.074115

Received 02 October 2025; Accepted 23 December 2025; Published online 21 April 2026

Abstract

Group activity recognition in public environments is challenging due to dynamic group formations, complex inter-person interactions, and frequent occlusions. Existing methods often emphasize individual actions and overlook collective behavioral patterns. This work introduces a multi-modal framework that integrates silhouette-based appearance and skeleton-based pose information for robust recognition in surveillance scenarios. You Only Look Once v11 (YOLOv11) detects persons, Segmenting Objects by LOcations version 2 (SOLOv2) segments instances, and AlphaPose extracts skeletons; hierarchical grouping then forms spatially coherent clusters. A hybrid feature extraction strategy combines handcrafted descriptors (Extended GIST (ExGIST), Distance Transform, Binary Robust Independent Elementary Features (BRIEF), and Ridge features) with deep representations, fused via multi-head attention. Feature selection is refined through a three-stage pipeline of Kernel Principal Component Analysis (K-PCA), mutual information ranking, and genetic-algorithm-based optimization. A Spatio-Temporal Graph Convolutional Network (ST-GCN) models spatio-temporal dependencies, while a Long Short-Term Memory (LSTM) network captures long-term dynamics for activity classification. On the Collective Activity Dataset (CAD), the framework achieves 96.80% accuracy, surpassing state-of-the-art approaches. Its modular design ensures scalability and adaptability for intelligent surveillance and smart-city applications.
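The three-stage feature-selection pipeline summarized in the abstract (K-PCA projection, mutual-information ranking, then genetic-algorithm-based subset optimization) can be sketched with scikit-learn on synthetic data. This is a minimal illustration only: the feature dimensions, population size, mutation rate, and the simple MI-sum fitness proxy below are assumptions for the sketch, not the paper's actual settings or fitness function.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 64))    # 120 fused feature vectors, 64 dims (synthetic)
y = rng.integers(0, 5, size=120)  # 5 hypothetical collective-activity labels

# Stage 1: Kernel PCA projects the fused features into a nonlinear subspace.
X_kpca = KernelPCA(n_components=32, kernel="rbf", random_state=0).fit_transform(X)

# Stage 2: rank the components by mutual information with the activity label
# and keep the top half.
mi = mutual_info_classif(X_kpca, y, random_state=0)
top = np.argsort(mi)[::-1][:16]
X_ranked = X_kpca[:, top]

# Stage 3: a tiny genetic search over binary feature masks.
# Fitness here is a cheap proxy (total MI of the selected components,
# penalized by subset size); the paper's fitness function is not specified here.
def fitness(mask):
    if mask.sum() == 0:
        return -np.inf
    return mi[top][mask.astype(bool)].sum() - 0.01 * mask.sum()

pop = rng.integers(0, 2, size=(20, 16))  # population of candidate masks
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # keep the 10 fittest
    cut = rng.integers(1, 15)
    children = parents.copy()
    children[:, cut:] = parents[::-1, cut:]        # one-point crossover
    flip = rng.random(children.shape) < 0.05       # bit-flip mutation
    children = np.where(flip, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected components:", int(best.sum()), "of 16")
```

In the full framework this selection stage sits between the attention-based feature fusion and the ST-GCN-LSTM classifier; the sketch only shows how the three selection stages chain together.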

Keywords

BRIEF features; YOLOv11; ST-GCN; group activity recognition; LSTM