Open Access


High-Movement Human Segmentation in Video Using Adaptive N-Frames Ensemble

Yong-Woon Kim1, Yung-Cheol Byun2,*, Dong Seog Han3, Dalia Dominic1, Sibu Cyriac1

1 Centre for Digital Innovation, CHRIST (Deemed to be University), Bangalore, 560029, India
2 Department of Computer Engineering, Jeju National University, Jeju, 63243, Korea
3 School of Electronics Engineering, Kyungpook National University, Daegu, 41566, Korea

* Corresponding Author: Yung-Cheol Byun. Email:

Computers, Materials & Continua 2022, 73(3), 4743-4762.


A wide range of camera apps and online video conferencing services support changing the background in real time for aesthetic, privacy, and security reasons. Numerous studies show that Deep Learning (DL) is a suitable option for human segmentation, and that an ensemble of multiple DL-based segmentation models can improve the segmentation result. However, these approaches are less effective when applied directly to image segmentation in a video. This paper proposes an Adaptive N-Frames Ensemble (AFE) approach for high-movement human segmentation in a video using an ensemble of multiple DL models. In contrast to a conventional ensemble, which executes multiple DL models simultaneously for every video frame, the proposed AFE approach executes only a single DL model on the current video frame. When the frame difference is less than a particular threshold, it combines the segmentation outputs of previous frames to produce the final segmentation output. Our method builds on the N-Frames Ensemble (NFE) method, which ensembles the image segmentation of the current video frame with those of previous video frames. However, NFE is not suitable for segmenting fast-moving objects in a video or for videos with low frame rates. The proposed AFE approach addresses these limitations of the NFE method. Our experiment uses three human segmentation models, namely Fully Convolutional Network (FCN), DeepLabv3, and Mediapipe. We evaluated our approach using 1711 videos of the TikTok50f dataset with a single-person view. The TikTok50f dataset is a reconstructed version of the publicly available TikTok dataset, obtained by cropping, resizing, and dividing it into videos of 50 frames each. This paper compares the proposed AFE with single models, the Two-Models Ensemble, and the NFE models. The experimental results show that the proposed AFE is suitable for low-movement as well as high-movement human segmentation in a video.
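The adaptive idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the frame-difference metric, the combination rule, or how models are assigned to frames, so the mean absolute pixel difference, the averaging of probability maps, the round-robin model order, and the names `frame_difference`, `adaptive_n_frames_ensemble`, `n`, and `threshold` are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def frame_difference(prev_frame, cur_frame):
    """Mean absolute pixel difference scaled to [0, 1] (assumed metric)."""
    diff = np.abs(cur_frame.astype(np.float32) - prev_frame.astype(np.float32))
    return float(diff.mean()) / 255.0


def adaptive_n_frames_ensemble(frames, models, n=3, threshold=0.05):
    """Return one boolean mask per frame.

    frames: iterable of uint8 arrays of identical shape (H, W, C).
    models: list of callables mapping a frame to an (H, W) probability map.
    n: maximum number of frames combined (current frame plus n - 1 previous).
    threshold: hypothetical cut-off on the frame difference.
    """
    history = deque(maxlen=n - 1)  # outputs of previous frames
    prev_frame = None
    masks = []
    for i, frame in enumerate(frames):
        # Only one model runs per frame; the round-robin order is assumed.
        prob = models[i % len(models)](frame)
        if prev_frame is not None and frame_difference(prev_frame, frame) >= threshold:
            # High movement: earlier outputs no longer match the scene.
            history.clear()
        # Average the current output with the retained previous outputs.
        combined = np.mean([prob, *history], axis=0)
        masks.append(combined >= 0.5)
        history.append(prob)
        prev_frame = frame
    return masks
```

When consecutive frames are similar, the sketch behaves like an N-frames ensemble; when the difference reaches the threshold, it falls back to the single-model output for the current frame, which is the behaviour the abstract attributes to AFE for high-movement video.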


Cite This Article

Y. Kim, Y. Byun, D. S. Han, D. Dominic and S. Cyriac, "High-movement human segmentation in video using adaptive n-frames ensemble," Computers, Materials & Continua, vol. 73, no. 3, pp. 4743–4762, 2022.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.