Open Access



Visual Motion Segmentation in Crowd Videos Based on Spatial-Angular Stacked Sparse Autoencoders

Adel Hafeezallah1, Ahlam Al-Dhamari2,3,*, Syed Abd Rahman Abu-Bakar2

1 Department of Electrical Engineering, Taibah University, Madinah, Saudi Arabia
2 Department of Electronic and Computer Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor Bahru, 81310, Malaysia
3 Department of Computer Engineering, Hodeidah University, Hodeidah, Yemen

* Corresponding Author: Ahlam Al-Dhamari.

Computer Systems Science and Engineering 2023, 47(1), 593-611.


Visual motion segmentation (VMS) is a key component of many intelligent crowd systems. It can be used to analyze flow behavior in a crowd and to detect unusual, life-threatening incidents such as crowd stampedes and crashes, which pose a serious risk to public safety and have caused numerous fatalities over the past few decades. Trajectory clustering has become one of the most popular methods in VMS. However, complex data, such as a large number of samples and parameters, make it difficult for trajectory clustering to produce accurate motion segmentation results. This study introduces a spatial-angular stacked sparse autoencoder model (SA-SSAE) with l2-regularization and softmax, a deep learning method for visual motion segmentation that groups similar motion patterns into the same cluster. The proposed model extracts meaningful high-level features using only spatial-angular features obtained from refined tracklets (a.k.a. 'trajectories'). We adopt l2-regularization and sparsity regularization, which can learn sparse representations of features, to guarantee the sparsity of the autoencoders. We employ a softmax layer to map the data points into accurate cluster representations. A key advantage of the SA-SSAE framework is that it can handle VMS even when individuals move around randomly, so motion patterns are clustered effectively and with higher accuracy. We also introduce a new dataset of 21 crowd videos with manual ground truth. Experiments conducted on two crowd benchmarks demonstrate that the proposed model groups trajectories more accurately than the traditional clustering approaches used in previous studies. On the CUHK dataset, the proposed SA-SSAE framework achieved a 0.11 improvement in accuracy and a 0.13 improvement in F-measure over the best existing method.
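The abstract names three ingredients: a sparse autoencoder, l2-regularization, and a softmax layer. The sketch below illustrates how these pieces commonly fit together in a single autoencoder layer. It is not the authors' implementation: the KL-divergence sparsity penalty, the sigmoid activations, the layer sizes, the hyperparameters (`lam`, `beta`, `rho`), and the random input data are all assumptions chosen for illustration, and no training loop is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(rho, rho_hat):
    """KL divergence between a target activation rho and mean activations rho_hat."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)  # avoid log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_ae_loss(X, W1, b1, W2, b2, lam=1e-3, beta=3.0, rho=0.05):
    """Reconstruction error + l2 weight penalty + sparsity penalty (assumed form)."""
    H = sigmoid(X @ W1 + b1)        # hidden code, shape (n_samples, n_hidden)
    X_hat = sigmoid(H @ W2 + b2)    # reconstruction of the input
    mse = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    l2 = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    sparsity = beta * kl_sparsity(rho, H.mean(axis=0))
    return mse + l2 + sparsity, H

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Hypothetical data: 8 tracklets, each described by 4 spatial-angular features in [0, 1).
rng = np.random.default_rng(0)
X = rng.random((8, 4))

# Untrained weights for one autoencoder layer (4 -> 3 -> 4) and a 2-cluster softmax layer.
W1 = rng.normal(scale=0.1, size=(4, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.1, size=(3, 4)); b2 = np.zeros(4)
Wc = rng.normal(scale=0.1, size=(3, 2))

loss, H = sparse_ae_loss(X, W1, b1, W2, b2)
probs = softmax(H @ Wc)           # per-tracklet cluster probabilities
labels = probs.argmax(axis=1)     # hard cluster assignment
```

In a stacked arrangement, the hidden code `H` of one trained layer would serve as the input to the next layer, with the softmax layer applied to the final code.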


Cite This Article

A. Hafeezallah, A. Al-Dhamari and S. A. R. Abu-Bakar, "Visual motion segmentation in crowd videos based on spatial-angular stacked sparse autoencoders," Computer Systems Science and Engineering, vol. 47, no. 1, pp. 593–611, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.