Open Access
ARTICLE
Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning
1 School of Software, Nanchang University, Nanchang, 330000, China
2 School of Queen Mary, Nanchang University, Nanchang, 330000, China
3 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O.Box 84428, Riyadh, 11671, Saudi Arabia
4 Department of Computer Science and Information Technology, Benazir Bhutto Shaheed University Lyari, Karachi, 75660, Pakistan
5 School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, 1015, Switzerland
6 School of Mathematics and Computer Science, Nanchang University, Nanchang, 330000, China
* Corresponding Author: Pengxiang Su. Email:
(This article belongs to the Special Issue: Advances in Action Recognition: Algorithms, Applications, and Emerging Trends)
Computers, Materials & Continua 2025, 85(2), 3689-3707. https://doi.org/10.32604/cmc.2025.066944
Received 21 April 2025; Accepted 24 July 2025; Issue published 23 September 2025
Abstract
Predicting human motion from historical motion sequences is a fundamental problem in computer vision and lies at the core of many applications. Existing approaches primarily focus on encoding spatial dependencies among human joints while ignoring temporal cues and the complex relationships across non-consecutive frames. These limitations hinder a model's ability to generate accurate predictions over longer time horizons and in scenarios with complex motion patterns. To address these problems, we propose a novel multi-level spatial and temporal learning model, which consists of a Cross Spatial Dependencies Encoding Module (CSM) and a Dynamic Temporal Connection Encoding Module (DTM). Specifically, the CSM is designed to capture complementary local and global spatial dependency information at both the joint level and the joint-pair level. We further present the DTM to encode diverse temporal evolution contexts and compress motion features to a deep level, enabling the model to capture both short-term and long-term dependencies efficiently. Extensive experiments conducted on the Human3.6M and CMU Mocap datasets demonstrate that our model achieves state-of-the-art performance in both short-term and long-term prediction, outperforming existing methods by up to 20.3% in accuracy. Furthermore, ablation studies confirm the significant contributions of the CSM and DTM to prediction accuracy.
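The abstract's two-module design can be illustrated with a minimal sketch. This is not the paper's implementation; all shapes, the adjacency matrix, and the weight matrices below are hypothetical stand-ins: a joint-level plus joint-pair spatial mixing step stands in for the CSM, and a learned frame-to-frame mixing matrix (which connects non-consecutive frames) stands in for the DTM.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, D = 10, 17, 3  # frames, joints, coordinates (assumed sizes)

def csm(x, adj, w_joint, w_pair):
    """Toy spatial encoding: a per-joint linear map ("joint level")
    plus adjacency-weighted aggregation over joint pairs ("joint-pair level")."""
    joint_feat = x @ w_joint                               # local, per-joint features
    pair_feat = np.einsum('jk,tkd->tjd', adj, x) @ w_pair  # global, joint-pair features
    return joint_feat + pair_feat

def dtm(x, w_time):
    """Toy temporal encoding: a learned T x T matrix mixes every pair of
    frames, including non-consecutive ones, in a single step."""
    return np.einsum('st,tjd->sjd', w_time, x)

x = rng.normal(size=(T, J, D))        # a historical motion sequence
adj = rng.normal(size=(J, J))         # hypothetical joint-pair weights
w_joint = rng.normal(size=(D, D))
w_pair = rng.normal(size=(D, D))
w_time = rng.normal(size=(T, T))

y = dtm(csm(x, adj, w_joint, w_pair), w_time)
print(y.shape)  # (10, 17, 3): one output pose per frame
```

Stacking such spatial and temporal mixing steps, with learned weights, is the general pattern the abstract describes; the actual CSM and DTM are more elaborate.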
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.