TY  - EJOU
AU  - Geng, Jiayi
AU  - Wu, Yuxuan
AU  - Lu, Wenbo
AU  - Su, Pengxiang
AU  - Ksibi, Amel
AU  - Li, Wei
AU  - Shaikh, Zaffar Ahmed
AU  - Gai, Di
TI  - Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning
T2  - Computers, Materials & Continua
PY  - 2025
VL  - 85
IS  - 2
SN  - 1546-2226
AB  - Predicting human motion from historical motion sequences is a fundamental problem in computer vision and lies at the core of many applications. Existing approaches primarily focus on encoding spatial dependencies among human joints while ignoring temporal cues and the complex relationships across non-consecutive frames. These limitations hinder a model's ability to generate accurate predictions over longer time horizons and in scenarios with complex motion patterns. To address these problems, we propose a novel multi-level spatial and temporal learning model, which consists of a Cross Spatial Dependencies Encoding Module (CSM) and a Dynamic Temporal Connection Encoding Module (DTM). Specifically, the CSM is designed to capture complementary local and global spatial dependency information at both the joint level and the joint-pair level. We further present the DTM to encode diverse temporal evolution contexts and compress motion features to a deep level, enabling the model to capture both short-term and long-term dependencies efficiently. Extensive experiments on the Human3.6M and CMU Mocap datasets demonstrate that our model achieves state-of-the-art performance in both short-term and long-term prediction, outperforming existing methods by up to 20.3% in accuracy. Ablation studies further confirm the significant contributions of the CSM and DTM to prediction accuracy.
KW  - Human motion prediction
KW  - spatial dependencies learning
KW  - temporal context learning
KW  - graph convolutional networks
KW  - transformer
DO  - 10.32604/cmc.2025.066944
ER  -