Open Access

ARTICLE

Robust Human Pose Estimation and Action Recognition Utilizing Feature Extraction

Sheng Luo1, Rashid Abbasi1,*, Hao Wang2, Jinghua Xu3, Dongyang Lyu4, Aaron Zhang1, Farhan Amin5,*, Isabel de la Torre6, Gerardo Mendez Mezquita7, Henry Fabian Gongora7

1 College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, China
2 Wenzhou Shifeng Technology Co., Ltd., Wenzhou Innovation Center, Wenzhou, China
3 The State Key Laboratory of Fluid Power Transmission and Control, Zhejiang University, Hangzhou, China
4 China Electronics Digital Innovation, Hengkuan International Building, Beijing, China
5 School of Computer Science and Engineering, Yeungnam University, Gyeongsan, Republic of Korea
6 Department of Signal Theory and Communications, University of Valladolid, Valladolid, Spain
7 Department of Project Management, Universidad Internacional Iberoamericana, Campeche, Mexico

* Corresponding Authors: Rashid Abbasi. Email: email; Farhan Amin. Email: email

(This article belongs to the Special Issue: Advanced Image Segmentation and Object Detection: Innovations, Challenges, and Applications)

Computer Modeling in Engineering & Sciences 2026, 146(3), 31 https://doi.org/10.32604/cmes.2026.075080

Abstract

Human pose estimation is crucial across diverse applications, from healthcare to human–computer interaction. Integrating inertial measurement units (IMUs) with monocular vision methods holds great potential because the two modalities are complementary; however, existing approaches are often limited by IMU drift, sensor noise, and underutilization of visual information. To address these limitations, we propose a novel dual-stream feature extraction framework that combines temporal IMU data with single-view image features for improved pose estimation. Convolutional layers capture short-term dependencies in the IMU sequences, while a Transformer-based architecture models long-range temporal dynamics. To mitigate IMU drift and inter-sensor inconsistencies, a complementary filtering module is introduced alongside a cross-channel interaction mechanism. Features from the IMU and image streams are then combined in a dedicated fusion module and refined by a high-precision regression head for accurate pose prediction. Experimental results on benchmark datasets demonstrate that our method significantly outperforms existing techniques in terms of estimation accuracy and robustness, validating the effectiveness of our dual-stream architecture.
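The abstract does not specify how the complementary filtering module is built. As a rough illustration of the general idea only, the sketch below shows the classic single-axis complementary filter, which blends an integrated gyroscope rate (smooth but prone to drift) with an accelerometer-derived angle (noisy but drift-free). The function names, the blending weight `alpha`, and the simulated bias are assumptions chosen for illustration; they are not taken from the paper.

```python
def integrate_gyro(gyro_rates, dt, initial_angle=0.0):
    """Integrate gyroscope rates (rad/s) alone; any bias accumulates as drift."""
    angle = initial_angle
    angles = []
    for rate in gyro_rates:
        angle += rate * dt
        angles.append(angle)
    return angles


def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, initial_angle=0.0):
    """Blend gyro integration with accelerometer-derived angles (rad).

    alpha (hypothetical default 0.98) weights the gyro prediction;
    (1 - alpha) pulls the estimate toward the accelerometer angle,
    which bounds the drift caused by gyroscope bias.
    """
    if len(gyro_rates) != len(accel_angles):
        raise ValueError("gyro_rates and accel_angles must have the same length")
    angle = initial_angle
    fused = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        predicted = angle + rate * dt          # short-term gyro prediction
        angle = alpha * predicted + (1.0 - alpha) * accel_angle
        fused.append(angle)
    return fused


if __name__ == "__main__":
    # Simulated stationary sensor: true angle is 0 rad, but the gyro
    # reports a constant bias of 0.01 rad/s (an assumed value).
    steps, dt = 1000, 0.01
    gyro = [0.01] * steps
    accel = [0.0] * steps
    print("gyro only:", integrate_gyro(gyro, dt)[-1])
    print("fused    :", complementary_filter(gyro, accel, dt)[-1])
```

In this toy setting the gyro-only estimate keeps growing with time, while the fused estimate settles at a small bounded offset. A learned module such as the one the abstract describes would presumably operate on feature channels rather than a single angle, so this should be read as background on the underlying principle rather than as the authors' implementation.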

Keywords

Human pose estimation; dual-stream network; inertial measurement units (IMU)

Cite This Article

APA Style
Luo, S., Abbasi, R., Wang, H., Xu, J., Lyu, D. et al. (2026). Robust Human Pose Estimation and Action Recognition Utilizing Feature Extraction. Computer Modeling in Engineering & Sciences, 146(3), 31. https://doi.org/10.32604/cmes.2026.075080
Vancouver Style
Luo S, Abbasi R, Wang H, Xu J, Lyu D, Zhang A, et al. Robust Human Pose Estimation and Action Recognition Utilizing Feature Extraction. Comput Model Eng Sci. 2026;146(3):31. https://doi.org/10.32604/cmes.2026.075080
IEEE Style
S. Luo et al., “Robust Human Pose Estimation and Action Recognition Utilizing Feature Extraction,” Comput. Model. Eng. Sci., vol. 146, no. 3, pp. 31, 2026. https://doi.org/10.32604/cmes.2026.075080



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.