Sheng Luo1, Rashid Abbasi1,*, Hao Wang2, Jinghua Xu3, Dongyang Lyu4, Aaron Zhang1, Farhan Amin5,*, Isabel de la Torre6, Gerardo Mendez Mezquita7, Henry Fabian Gongora7
CMES-Computer Modeling in Engineering & Sciences, Vol.146, No.3, 2026, DOI:10.32604/cmes.2026.075080
- 30 March 2026
Abstract Human pose estimation is crucial across diverse applications, from healthcare to human–computer interaction. Integrating inertial measurement units (IMUs) with monocular vision methods holds great potential for leveraging complementary modalities; however, existing approaches are often limited by IMU drift, noise, and underutilization of visual information. To address these limitations, we propose a novel dual-stream feature extraction framework that effectively combines temporal IMU data and single-view image features for improved pose estimation. Short-term dependencies in IMU sequences are captured with convolutional layers, while a Transformer-based architecture models long-range temporal dynamics. To mitigate IMU drift and inter-sensor inconsistencies, …