Suzhen Wang¹, Zhanfeng Zhang¹,*, Shanshan Geng¹, Chaoyi Pang²
CMC-Computers, Materials & Continua, Vol.71, No.2, pp. 3721-3731, 2022, DOI:10.32604/cmc.2022.015378
Abstract As society has developed, industries of all kinds have generated ever-increasing amounts of data. The random forest algorithm is widely used for classification because of its superior performance. However, when generating feature subspaces, it selects features by simple random sampling, which cannot distinguish redundant features and therefore lowers classification accuracy; in addition, the algorithm computes inefficiently in stand-alone mode. To address these problems, this paper carries out optimization research based on Spark. The improved random forest algorithm performs feature extraction according to the calculated feature importance…
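The abstract contrasts simple random sampling of features with selection guided by feature importance. Because the paper's exact procedure is cut off here, the Python sketch below is only an illustration under stated assumptions: importance scores are already computed and non-negative, and a subspace is drawn without replacement with each feature's chance proportional to its importance. The function names and this weighting scheme are hypothetical; they are not the authors' Spark implementation.

```python
import random


def uniform_subspace(features, k, rng):
    # Standard random forest behaviour: k distinct features, each equally likely,
    # so redundant or uninformative features are as likely to be picked as useful ones.
    return rng.sample(features, k)


def importance_weighted_subspace(importance, k, rng):
    # Hypothetical variant (an assumption, not the paper's method): draw up to k
    # distinct features without replacement, each draw proportional to the
    # feature's (assumed non-negative) importance score.
    remaining = dict(importance)  # copy so the caller's dict is not modified
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        weights = [remaining[n] for n in names]
        if sum(weights) > 0:
            pick = rng.choices(names, weights=weights, k=1)[0]
        else:
            # All remaining scores are zero: fall back to a uniform draw.
            pick = rng.choice(names)
        chosen.append(pick)
        del remaining[pick]
    return chosen


if __name__ == "__main__":
    # Illustrative, made-up importance scores.
    importance = {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.05, "f5": 0.05}
    rng = random.Random(42)
    print(uniform_subspace(list(importance), 3, rng))
    print(importance_weighted_subspace(importance, 3, rng))

    # Count how often each feature lands in a 2-feature subspace over many trials;
    # high-importance features should appear far more often than f4 and f5.
    counts = {name: 0 for name in importance}
    for _ in range(10_000):
        for name in importance_weighted_subspace(importance, 2, rng):
            counts[name] += 1
    print(counts)
```

The weighted draw keeps some randomness, so trees in the forest still differ, while making low-importance (possibly redundant) features less likely to fill the subspace. Other designs, such as splitting features into high- and low-importance groups before sampling, are equally plausible readings of the truncated abstract.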