TY  - EJOU
AU  - Wang, Shiqi 
AU  - Yang, Yimin 
AU  - Wei, Ruizhong 
AU  - Wu, Qingming Jonathan 

TI  - 3-Dimensional Bag of Visual Words Framework on Action Recognition
T2  - Computers, Materials & Continua

PY  - 2020
VL  - 63
IS  - 3
SN  - 1546-2226

AB  - Human motion recognition plays a crucial role in video analysis.
However, a given video may contain many kinds of noise, such as an unstable
background and redundant actions, that are completely different from the key actions.
This noise poses a great challenge to human motion recognition. To address this problem,
we propose a new method based on the 3-Dimensional (3D) Bag of Visual Words
(BoVW) framework. Our method has two parts. The first is the video action
feature extractor, which identifies key actions by analyzing action features. The
second is the video action encoder, which analyzes the action characteristics of a
given video and uses a pre-trained deep 3D CNN model to obtain expressive coding
information. A classifier
with subnetwork nodes is used for the final classification. Extensive experiments
demonstrate that our method is effective for complex video analysis.
Our approach achieves state-of-the-art performance on the UCF101 (85.3%)
and HMDB51 (54.5%) datasets.
KW  - Action recognition
KW  - 3D CNNs
KW  - recurrent neural networks
KW  - residual networks
KW  - subnetwork nodes

DO  - 10.32604/cmc.2020.09648