Open Access

ARTICLE

Group Activity Recognition in Crowded Scenes Using Multi-Stage Feature Optimization and ST-GCN-LSTM Networks

Mohammed Alnusayri1, Tingting Xue2,3, Saleha Kamal4, Nouf Abdullah Almujally5, Khaled Alnowaiser6, Ahmad Jalal4,7,*, Hui Liu3,8,9,*
1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
2 School of Environmental Science & Engineering, Nanjing University of Information Science and Technology, Nanjing, China
3 Cognitive Systems Lab, University of Bremen, Bremen, Germany
4 Department of Computer Science, Air University, Islamabad, Pakistan
5 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
6 Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
7 Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
8 Jiangsu Key Laboratory of Intelligent Medical Image Computing, School of Future Technology, Nanjing University of Information Science and Technology, Nanjing, China
9 Guodian Nanjing Automation Co., Ltd., Nanjing, China
* Corresponding Author: Ahmad Jalal. Email: email; Hui Liu. Email: email
(This article belongs to the Special Issue: Deep Learning: Emerging Trends, Applications and Research Challenges for Image Recognition)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.074115

Received 02 October 2025; Accepted 23 December 2025; Published online 21 April 2026

Abstract

Group activity recognition in public environments is challenging due to dynamic group formations, complex inter-person interactions, and frequent occlusions. Existing methods often emphasize individual actions and overlook collective behavioral patterns. This work introduces a multi-modal framework that integrates silhouette-based appearance and skeleton-based pose information for robust recognition in surveillance scenarios. You Only Look Once v11 (YOLOv11) detects persons, Segmenting Objects by LOcations version 2 (SOLOv2) segments instances, and AlphaPose extracts skeletons; hierarchical grouping then forms spatially coherent clusters. A hybrid feature extraction strategy combines handcrafted descriptors (Extended GIST (ExGIST), Distance Transform, Binary Robust Independent Elementary Features (BRIEF), and Ridge features) with deep representations, fused via multi-head attention. Feature selection is refined through a three-stage pipeline of Kernel Principal Component Analysis (K-PCA), mutual information ranking, and genetic-algorithm-based optimization. A Spatio-Temporal Graph Convolutional Network (ST-GCN) models spatio-temporal dependencies, while a Long Short-Term Memory (LSTM) network captures long-term dynamics for activity classification. On the Collective Activity Dataset (CAD), the framework achieves 96.80% accuracy, surpassing state-of-the-art approaches. Its modular design ensures scalability and adaptability for intelligent surveillance and smart-city applications.
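The three-stage feature-selection pipeline summarized in the abstract (K-PCA projection, mutual-information ranking, then genetic-algorithm-based subset optimization) can be sketched with scikit-learn on synthetic data. This is a minimal illustration only: the feature dimensions, population size, mutation rate, and the simple MI-sum fitness proxy below are assumptions for the sketch, not the paper's actual settings or fitness function.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 64))    # 120 fused feature vectors, 64 dims (synthetic)
y = rng.integers(0, 5, size=120)  # 5 hypothetical collective-activity labels

# Stage 1: Kernel PCA projects the fused features into a nonlinear subspace.
X_kpca = KernelPCA(n_components=32, kernel="rbf", random_state=0).fit_transform(X)

# Stage 2: rank the components by mutual information with the activity label
# and keep the top half.
mi = mutual_info_classif(X_kpca, y, random_state=0)
top = np.argsort(mi)[::-1][:16]
X_ranked = X_kpca[:, top]

# Stage 3: a tiny genetic search over binary feature masks.
# Fitness here is a cheap proxy (total MI of the selected components,
# penalized by subset size); the paper's fitness function is not specified here.
def fitness(mask):
    if mask.sum() == 0:
        return -np.inf
    return mi[top][mask.astype(bool)].sum() - 0.01 * mask.sum()

pop = rng.integers(0, 2, size=(20, 16))  # population of candidate masks
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # keep the 10 fittest
    cut = rng.integers(1, 15)
    children = parents.copy()
    children[:, cut:] = parents[::-1, cut:]        # one-point crossover
    flip = rng.random(children.shape) < 0.05       # bit-flip mutation
    children = np.where(flip, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected components:", int(best.sum()), "of 16")
```

In the full framework this selection stage sits between the attention-based feature fusion and the ST-GCN-LSTM classifier; the sketch only shows how the three selection stages chain together.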

Keywords

BRIEF features; YOLOv11; ST-GCN; group activity recognition; LSTM