Open Access
ARTICLE
Robust Human Pose Estimation and Action Recognition Utilizing Feature Extraction
1 College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, China
2 Wenzhou Shifeng Technology Co., Ltd., Wenzhou Innovation Center, Wenzhou, China
3 The State Key Laboratory of Fluid Power Transmission and Control, Zhejiang University, Hangzhou, China
4 China Electronics Digital Innovation, Hengkuan International Building, Beijing, China
5 School of Computer Science and Engineering, Yeungnam University, Gyeongsan, Republic of Korea
6 Department of Signal Theory and Communications, University of Valladolid, Valladolid, Spain
7 Department of Project Management, Universidad Internacional Iberoamericana, Campeche, Mexico
* Corresponding Authors: Rashid Abbasi; Farhan Amin
(This article belongs to the Special Issue: Advanced Image Segmentation and Object Detection: Innovations, Challenges, and Applications)
Computer Modeling in Engineering & Sciences 2026, 146(3), 31 https://doi.org/10.32604/cmes.2026.075080
Received 24 October 2025; Accepted 22 January 2026; Issue published 30 March 2026
Abstract
Human pose estimation is crucial across diverse applications, from healthcare to human–computer interaction. Integrating inertial measurement units (IMUs) with monocular vision methods holds great potential because the two modalities are complementary; however, existing approaches are often limited by IMU drift, noise, and underutilization of visual information. To address these limitations, we propose a novel dual-stream feature extraction framework that effectively combines temporal IMU data and single-view image features for improved pose estimation. Short-term dependencies in IMU sequences are captured with convolutional layers, while a Transformer-based architecture models long-range temporal dynamics. To mitigate IMU drift and inter-sensor inconsistencies, a complementary filtering module is introduced alongside a cross-channel interaction mechanism. Features from the IMU and image streams are then fused via a dedicated fusion module and further refined by a high-precision regression head for accurate pose prediction. Experimental results on benchmark datasets demonstrate that our method significantly outperforms existing techniques in estimation accuracy and robustness, validating the effectiveness of our dual-stream architecture.
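The abstract does not specify how the complementary filtering module is built, so the following is only a minimal sketch of the classical single-axis complementary filter, which illustrates the general idea of limiting gyroscope drift with a drift-free reference. The function names, the blending weight `alpha`, and the synthetic signals are illustrative assumptions and are not taken from the article.

```python
def integrate_gyro(gyro_rates, dt, initial_angle=0.0):
    """Integrate gyroscope rates (rad/s) alone; any bias accumulates as drift."""
    angle = initial_angle
    angles = []
    for rate in gyro_rates:
        angle += rate * dt
        angles.append(angle)
    return angles


def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, initial_angle=0.0):
    """Blend the integrated gyro angle with an accelerometer-derived angle.

    alpha close to 1 trusts the gyro over short horizons, while the
    (1 - alpha) share pulls the estimate back toward the drift-free
    accelerometer angle. All parameter names are hypothetical.
    """
    if len(gyro_rates) != len(accel_angles):
        raise ValueError("gyro_rates and accel_angles must have equal length")
    angle = initial_angle
    fused = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        predicted = angle + rate * dt  # short-term gyro prediction
        angle = alpha * predicted + (1.0 - alpha) * accel_angle
        fused.append(angle)
    return fused


# Synthetic example (assumed values): the limb is held still at 0.5 rad,
# but the gyroscope reports a constant bias of 0.05 rad/s.
true_angle = 0.5
dt = 0.01
steps = 1000
gyro_rates = [0.05] * steps          # pure bias, the true rate is zero
accel_angles = [true_angle] * steps  # noise-free reference for clarity

gyro_only = integrate_gyro(gyro_rates, dt, initial_angle=true_angle)
fused = complementary_filter(gyro_rates, accel_angles, dt, initial_angle=true_angle)

print(f"gyro-only error after 10 s: {abs(gyro_only[-1] - true_angle):.3f} rad")
print(f"fused error after 10 s:     {abs(fused[-1] - true_angle):.3f} rad")
```

In this toy setting the gyro-only estimate keeps drifting, whereas the fused estimate settles at a small bounded error. A learned module operating on multi-sensor features, as the abstract suggests, would differ substantially from this scalar filter.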
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

