Open Access
ARTICLE
Evaluating Method of Lower Limb Coordination Based on Spatial-Temporal Dependency Networks
1 Sport Science Research Institute, Nanjing Sport Institute, Nanjing, 210014, China
2 School of Computer Science, School of Cyber Science and Engineering, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, 210044, China
* Corresponding Author: Yongjun Ren. Email:
Computers, Materials & Continua 2025, 85(1), 1959-1980. https://doi.org/10.32604/cmc.2025.066266
Received 03 April 2025; Accepted 06 June 2025; Issue published 29 August 2025
Abstract
As an essential tool for the quantitative analysis of lower limb coordination, optical motion capture systems with marker-based encoding still suffer from inefficiency, high costs, spatial constraints, and the need for multiple markers. While 3D pose estimation algorithms combined with ordinary cameras offer an alternative, their accuracy often deteriorates under significant body occlusion. To address the insufficient precision of 3D pose estimation in occluded scenarios, which hinders the quantitative analysis of athletes' lower-limb coordination, this paper proposes a multimodal training framework that integrates spatiotemporal dependency networks with text-semantic guidance. Compared with traditional optical motion capture systems, this work achieves low-cost, high-precision motion parameter acquisition through the following innovations: (1) a spatiotemporal dependency attention module establishes dynamic spatiotemporal correlation graphs via cross-frame joint semantic matching, effectively resolving the feature fragmentation issue in existing methods; (2) a noise-suppressed multi-scale temporal module applies KL divergence-based information gain analysis for progressive feature filtering over long-range dependencies, reducing errors by 1.91 mm compared with conventional temporal convolutions; (3) a text-pose contrastive learning paradigm is introduced for the first time, in which BERT-encoded action descriptions are aligned with geometric pose features, significantly enhancing robustness under severe occlusion (50% joint invisibility). On the Human3.6M dataset, the proposed method achieves an MPJPE of 56.21 mm under Protocol 1, outperforming the state-of-the-art baseline MHFormer by 3.3%.
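The MPJPE figures reported here follow the standard definition: the mean Euclidean distance between predicted and ground-truth 3D joint positions. A minimal sketch of that metric (the function name `mpjpe` and the toy joint data are illustrative, not taken from the paper):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth 3D joints (same units as input)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # per-joint Euclidean distances, then mean over all joints (and frames)
    return np.linalg.norm(pred - gt, axis=-1).mean()

# toy example: 2 joints, one predicted exactly, one offset by 3 mm along x
gt = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [13.0, 0.0, 0.0]])
print(mpjpe(pred, gt))  # 1.5
```

Protocol 1 on Human3.6M applies this metric after root-joint alignment of prediction and ground truth.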
Extensive ablation studies on Human3.6M demonstrate the individual contributions of the core modules: the spatiotemporal dependency module and the noise-suppressed multi-scale temporal module reduce MPJPE by 0.30 mm and 0.34 mm, respectively, while the multimodal training strategy further decreases MPJPE by 0.6 mm through text-skeleton contrastive learning. Comparative experiments involving 16 athletes show that sagittal-plane hip-ankle coupling angle measurements differ by less than 1.2° from those obtained with a traditional optical system (two one-sided t-tests, p < 0.05), validating real-world reliability. This study provides an AI-powered analytical solution for competitive sports training and a viable alternative to specialized equipment.
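Inter-joint coupling angles of the kind compared above are commonly obtained by vector coding, which takes the frame-to-frame changes of two joint-angle time series and computes the orientation of their change vector. The sketch below illustrates that general technique under our own assumptions; the paper does not specify its exact procedure, and the function name and sample data are hypothetical:

```python
import numpy as np

def coupling_angles(hip, ankle):
    """Vector-coding coupling angle (degrees, 0-360) between two
    joint-angle time series, one value per consecutive frame pair."""
    dh = np.diff(np.asarray(hip, dtype=float))    # hip angle change per frame
    da = np.diff(np.asarray(ankle, dtype=float))  # ankle angle change per frame
    gamma = np.degrees(np.arctan2(da, dh))        # orientation of change vector
    return np.mod(gamma, 360.0)                   # wrap into [0, 360)

# toy series: hip flexes steadily, ankle flexes then extends
hip = [0.0, 1.0, 2.0]
ankle = [0.0, 1.0, 0.0]
print(coupling_angles(hip, ankle))  # [ 45. 315.]
```

Values near 45° indicate in-phase motion of the two joints; values near 135° or 315° indicate anti-phase motion.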
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

