TY  - EJOU
AU  - Dong, Chengang
AU  - Ding, Yongkang
AU  - Hu, Jianwei

TI  - Towards Real-Time Multi-Person Pose Estimation via Feature Selection and Sharpening Mechanisms
T2  - Computer Modeling in Engineering & Sciences

PY  - 2026
VL  - 146
IS  - 3
SN  - 1526-1506

AB  - Real-time multi-person pose estimation (MPE) built upon neural network architectures aims to simultaneously detect multiple human instances and regress joint coordinates in dynamic scenes. However, due to factors such as high model complexity and limited expression of keypoint information, both the efficiency and accuracy of real-time MPE remain to be improved. To mitigate the adverse impacts caused by the aforementioned issues, this work develops FSEM-Pose, a real-time MPE model rooted in the YOLOv10 framework. In detail, first, FSEM-Pose upgrades the backbone module of the baseline network by introducing the Feature Shuffling-Convolution (FS-Conv), which effectively reduces the backbone size while maximizing the retention of spatial information from the input image. Second, FSEM-Pose incorporates a Feature Saliency Enhancement Module (FSEM) to strengthen the feature encoding of human keypoints, thereby improving the accuracy of pose estimation. Finally, FSEM-Pose further enhances inference efficiency via a lightweight optimization of the head using shared convolutional layers. Our method achieves competitive results across multiple accuracy and efficiency metrics on the MS COCO 2017 and CrowdPose datasets. While being lightweight in design, it improves average precision (AP) by 2.1% and 2.5%, respectively.
KW  - Pose estimation; feature sharpening; lightweight; YOLOv10

DO  - 10.32604/cmes.2026.079062