TY - EJOU
AU - Kukkala, Varada Rajkumar
AU - Praveen, Surapaneni Phani
AU - Tirumanadham, Naga Satya Koti Mani Kumar
AU - Srinivasu, Parvathaneni Naga
TI - A Study on Outlier Detection and Feature Engineering Strategies in Machine Learning for Heart Disease Prediction
T2 - Computer Systems Science and Engineering
PY - 2024
VL - 48
IS - 5
SN -
AB - This paper investigates the application of machine learning to develop a predictive model for cardiovascular problems, using AdaBoost together with two outlier detection methodologies: Z-Score combined with Grey Wolf Optimization (GWO), and Interquartile Range (IQR) combined with Ant Colony Optimization (ACO). On the reported performance metrics, Z-Score and GWO with AdaBoost is more accurate than IQR and ACO with AdaBoost (89.0% vs. 86.0%) and more discriminative (Area Under the Curve (AUC) score of 93.0% vs. 91.0%). The Z-Score and GWO method also outperformed the alternative in precision, scoring 89.0%, and its recall was satisfactory at 90.0%. The paper thus reveals specific benefits and drawbacks of different outlier detection and feature selection techniques, which are important to consider when improving cardiovascular diagnostics. Collectively, these findings can enhance knowledge of heart disease prediction and patient treatment through advanced machine learning (ML) techniques. This work lays the groundwork for more precise diagnosis models by highlighting the benefits of combining multiple optimization methodologies. Future studies should focus on maximizing patient outcomes and model efficacy through research on these combinations.
KW - Grey wolf optimization; ant colony optimization; Z-Score; interquartile range (IQR); AdaBoost; outlier
DO - 10.32604/csse.2024.053603