Open Access

ARTICLE

Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning

Jiayi Geng1, Yuxuan Wu1, Wenbo Lu2, Pengxiang Su1,*, Amel Ksibi3, Wei Li1, Zaffar Ahmed Shaikh4,5, Di Gai6

1 School of Software, Nanchang University, Nanchang, 330000, China
2 School of Queen Mary, Nanchang University, Nanchang, 330000, China
3 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
4 Department of Computer Science and Information Technology, Benazir Bhutto Shaheed University Lyari, Karachi, 75660, Pakistan
5 School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, 1015, Switzerland
6 School of Mathematics and Computer Science, Nanchang University, Nanchang, 330000, China

* Corresponding Author: Pengxiang Su.

(This article belongs to the Special Issue: Advances in Action Recognition: Algorithms, Applications, and Emerging Trends)

Computers, Materials & Continua 2025, 85(2), 3689-3707. https://doi.org/10.32604/cmc.2025.066944

Abstract

Predicting human motion from historical motion sequences is a fundamental problem in computer vision and lies at the core of many applications. Existing approaches primarily focus on encoding spatial dependencies among human joints while ignoring temporal cues and the complex relationships across non-consecutive frames. These limitations hinder a model's ability to generate accurate predictions over longer time horizons and in scenarios with complex motion patterns. To address these problems, we propose a novel multi-level spatial and temporal learning model that consists of a Cross Spatial Dependencies Encoding Module (CSM) and a Dynamic Temporal Connection Encoding Module (DTM). Specifically, the CSM is designed to capture complementary local and global spatial dependency information at both the joint level and the joint-pair level. We further present the DTM to encode diverse temporal evolution contexts and compress motion features to a deep level, enabling the model to capture both short-term and long-term dependencies efficiently. Extensive experiments on the Human3.6M and CMU Mocap datasets demonstrate that our model achieves state-of-the-art performance in both short-term and long-term prediction, outperforming existing methods by up to 20.3% in accuracy. Furthermore, ablation studies confirm the significant contributions of the CSM and DTM to prediction accuracy.
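
The abstract describes a two-stage design: a graph-convolutional spatial encoder over joints and joint pairs (CSM), followed by a temporal module that lets non-consecutive frames interact directly (DTM). As a rough illustration of that pattern only, here is a minimal PyTorch sketch, assuming a learnable joint adjacency for the spatial stage and frame-wise self-attention for the temporal stage; all class names, layer sizes, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage pattern the abstract describes:
# a learnable-adjacency graph convolution over joints (cf. CSM),
# then self-attention over frames (cf. DTM). Illustrative only;
# names and sizes are assumptions, not the paper's code.
import torch
import torch.nn as nn


class SpatialGraphConv(nn.Module):
    """Graph convolution over the joint dimension with a learnable adjacency."""

    def __init__(self, num_joints: int, in_dim: int, out_dim: int):
        super().__init__()
        # Learnable joint-to-joint adjacency: can capture dependencies
        # beyond the fixed kinematic tree.
        self.adj = nn.Parameter(
            torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints)
        )
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, in_dim) -> mix information across joints
        x = torch.einsum("jk,btkc->btjc", self.adj, x)
        return torch.relu(self.proj(x))


class TemporalAttention(nn.Module):
    """Self-attention over frames, so non-consecutive frames interact directly."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, dim) -> fold joints into the batch axis
        b, t, j, c = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * j, t, c)
        out, _ = self.attn(x, x, x)
        x = self.norm(x + out)
        return x.reshape(b, j, t, c).permute(0, 2, 1, 3)


class MotionPredictor(nn.Module):
    """Spatial encoding, temporal encoding, then a per-joint regression head."""

    def __init__(self, num_joints: int = 22, coord_dim: int = 3,
                 hidden: int = 64, horizon: int = 10):
        super().__init__()
        self.spatial = SpatialGraphConv(num_joints, coord_dim, hidden)
        self.temporal = TemporalAttention(hidden)
        self.head = nn.Linear(hidden, coord_dim * horizon)
        self.horizon = horizon
        self.coord_dim = coord_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, 3) past poses
        # returns (batch, horizon, joints, 3) future poses
        h = self.temporal(self.spatial(x))
        out = self.head(h[:, -1])  # regress from the last frame's encoding
        return out.reshape(x.size(0), -1, self.horizon,
                           self.coord_dim).permute(0, 2, 1, 3)


if __name__ == "__main__":
    model = MotionPredictor()
    past = torch.randn(2, 25, 22, 3)  # 2 sequences, 25 observed frames, 22 joints
    print(model(past).shape)  # torch.Size([2, 10, 22, 3])
```

The learnable adjacency lets the spatial stage discover dependencies beyond the skeleton's fixed connectivity, while attention over frames gives the temporal stage direct access to long-range, non-consecutive context; these are the two gaps the abstract says existing methods leave open.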

Keywords

Human motion prediction; spatial dependencies learning; temporal context learning; graph convolutional networks; transformer

Cite This Article

APA Style
Geng, J., Wu, Y., Lu, W., Su, P., Ksibi, A., et al. (2025). Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning. Computers, Materials & Continua, 85(2), 3689–3707. https://doi.org/10.32604/cmc.2025.066944
Vancouver Style
Geng J, Wu Y, Lu W, Su P, Ksibi A, Li W, et al. Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning. Comput Mater Contin. 2025;85(2):3689–3707. https://doi.org/10.32604/cmc.2025.066944
IEEE Style
J. Geng et al., “Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning,” Comput. Mater. Contin., vol. 85, no. 2, pp. 3689–3707, 2025. https://doi.org/10.32604/cmc.2025.066944



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.