TY  - EJOU
AU  - Geng, Jiayi
AU  - Wu, Yuxuan
AU  - Lu, Wenbo
AU  - Su, Pengxiang
AU  - Ksibi, Amel
AU  - Li, Wei
AU  - Shaikh, Zaffar Ahmed
AU  - Gai, Di
TI  - Human Motion Prediction Based on Multi-Level Spatial and Temporal Cues Learning
T2  - Computers, Materials & Continua
PY  - 2025
VL  - 85
IS  - 2
SN  - 1546-2226
AB  - Predicting human motion from historical motion sequences is a fundamental problem in computer vision and lies at the core of many applications. Existing approaches primarily focus on encoding spatial dependencies among human joints while ignoring temporal cues and the complex relationships across non-consecutive frames. These limitations hinder a model's ability to generate accurate predictions over longer time horizons and in scenarios with complex motion patterns. To address these problems, we propose a novel multi-level spatial and temporal learning model, which consists of a Cross Spatial Dependencies Encoding Module (CSM) and a Dynamic Temporal Connection Encoding Module (DTM). Specifically, the CSM is designed to capture complementary local and global spatial dependency information at both the joint level and the joint-pair level. We further present the DTM to encode diverse temporal evolution contexts and compress motion features to a deep level, enabling the model to capture both short-term and long-term dependencies efficiently. Extensive experiments on the Human3.6M and CMU Mocap datasets demonstrate that our model achieves state-of-the-art performance in both short-term and long-term prediction, outperforming existing methods by up to 20.3% in accuracy. Ablation studies further confirm the significant contributions of the CSM and DTM to prediction accuracy.
KW  - Human motion prediction
KW  - spatial dependencies learning
KW  - temporal context learning
KW  - graph convolutional networks
KW  - transformer
DO  - 10.32604/cmc.2025.066944
ER  -