TY  - EJOU
AU  - Luo, Sheng
AU  - Abbasi, Rashid
AU  - Wang, Hao
AU  - Xu, Jinghua
AU  - Lyu, Dongyang
AU  - Zhang, Aaron
AU  - Amin, Farhan
AU  - Torre, Isabel de la
AU  - Mezquita, Gerardo Mendez
AU  - Gongora, Henry Fabian
TI  - Robust Human Pose Estimation and Action Recognition Utilizing Feature Extraction
T2  - Computer Modeling in Engineering & Sciences
PY  - 2026
VL  - 146
IS  - 3
SN  - 1526-1506
AB  - Human pose estimation is crucial across diverse applications, from healthcare to human–computer interaction. Integrating inertial measurement units (IMUs) with monocular vision methods holds great potential for leveraging complementary modalities; however, existing approaches are often limited by IMU drift, noise, and underutilization of visual information. To address these limitations, we propose a novel dual-stream feature extraction framework that effectively combines temporal IMU data and single-view image features for improved pose estimation. Short-term dependencies in IMU sequences are captured with convolutional layers, while a Transformer-based architecture models long-range temporal dynamics. To mitigate IMU drift and inter-sensor inconsistencies, a complementary filtering module is introduced alongside a cross-channel interaction mechanism. Features from the IMU and image streams are then fused via a dedicated fusion module and further refined using a high-precision regression head for accurate pose prediction. Experimental results on benchmark datasets demonstrate that our method significantly outperforms existing techniques in terms of estimation accuracy and robustness, validating the effectiveness of our dual-stream architecture.
KW  - Human pose estimation
KW  - dual-stream network
KW  - inertial measurement units (IMU)
DO  - 10.32604/cmes.2026.075080