Open Access



Chi-Square and PCA Based Feature Selection for Diabetes Detection with Ensemble Classifier

Vaibhav Rupapara1, Furqan Rustam2, Abid Ishaq2, Ernesto Lee3, Imran Ashraf4,*

1 School of Computing and Information Sciences, Florida International University, USA
2 Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, 64200, Pakistan
3 Department of Computer Science, Broward College, Broward County, Florida, USA
4 Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, 38541, Korea

* Corresponding Author: Imran Ashraf.

Intelligent Automation & Soft Computing 2023, 36(2), 1931-1949.


Diabetes mellitus is a metabolic disease ranked among the top 10 causes of death by the World Health Organization. Over the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male deaths. If untreated, it leads to complications in many vital organs of the human body, which may prove fatal. Early detection of diabetes is therefore important for starting timely treatment. This study introduces a methodology for classifying diabetic and normal people using an ensemble machine learning model and feature fusion of Chi-square (Chi-2) and principal component analysis (PCA). An ensemble model, the logistic tree classifier (LTC), is proposed, which combines logistic regression and the extra tree classifier through a soft voting mechanism. Experiments are also performed with several well-known machine learning algorithms to compare their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k-nearest neighbor. In addition, several experiments are carried out using PCA and Chi-2 features to analyze the influence of feature selection on the performance of the machine learning classifiers. Results indicate that Chi-2 features yield higher performance than both PCA features and the original features. However, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework, which achieves an accuracy score of 0.85, the highest among the available approaches for diabetes prediction. In addition, a statistical T-test confirms the statistical significance of the proposed approach over other approaches.
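
As an illustration of the soft voting mechanism mentioned in the abstract, the following minimal Python sketch averages the class probabilities reported by two models and selects the class with the highest mean probability. It is not the authors' implementation: the function name `soft_vote` and the probability values are hypothetical, chosen only to show the mechanism, and equal weighting of the two models is an assumption.

```python
from typing import List, Sequence, Tuple


def soft_vote(model_probs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average per-class probabilities across models.

    Returns (predicted_label, averaged_probabilities).
    Equal model weights are an assumption of this sketch.
    """
    if not model_probs:
        raise ValueError("need probabilities from at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must report the same number of classes")

    n_models = len(model_probs)
    averaged = [
        sum(p[c] for p in model_probs) / n_models for c in range(n_classes)
    ]
    # The predicted label is the class with the highest averaged probability.
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


# Hypothetical outputs for one sample, ordered as [P(normal), P(diabetic)].
lr_probs = [0.25, 0.75]   # stand-in for logistic regression
etc_probs = [0.50, 0.50]  # stand-in for the extra tree classifier

label, averaged = soft_vote([lr_probs, etc_probs])
print(label, averaged)  # prints: 1 [0.375, 0.625]
```

In this example the second model is undecided, so the first model's confidence determines the outcome; a majority (hard) vote would have no way to use that information. In practice, scikit-learn's `VotingClassifier` with `voting="soft"` offers the same behavior for fitted estimators.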



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.