Open Access

ARTICLE

Harmonization of Heart Disease Dataset for Accurate Diagnosis: A Machine Learning Approach Enhanced by Feature Engineering

Ruhul Amin1, Md. Jamil Khan1, Tonway Deb Nath1, Md. Shamim Reza2, Jungpil Shin3,*

1 Department of Computer Science and Engineering, Metropolitan University, Sylhet, 3104, Bangladesh
2 Department of Statistics, Pabna University of Science and Technology, Pabna, 6600, Bangladesh
3 Department of Computer Science & Engineering, University of Aizu, Aizu-Wakamatsu, Fukushima, 956-8580, Japan

* Corresponding Author: Jungpil Shin.

(This article belongs to the Special Issue: Advanced Medical Imaging Techniques Using Generative Artificial Intelligence)

Computers, Materials & Continua 2025, 82(3), 3907-3919. https://doi.org/10.32604/cmc.2025.061645

Abstract

Heart disease encompasses a range of medical conditions that affect the structure, blood vessels, and overall function of the heart. Many researchers have made progress in the early detection and prediction of heart disease, but more remains to be done. The diagnostic accuracy of many existing studies is inadequate because they rely on traditional approaches to predict heart disease. By fusing data from several regions of the country, we aim to improve the accuracy of heart disease prediction. We adopt a statistical approach that draws on feature interactions to reveal intricate patterns in the data that no single feature can adequately capture. To improve data quality, we preprocessed the data with techniques including feature scaling, outlier detection and replacement, and imputation of null and missing values. The proposed feature engineering method then uses the correlation test for numerical features and the chi-square test for categorical features to construct feature interactions. To reduce dimensionality, we subsequently applied principal component analysis (PCA), retaining 95% of the variance. To identify patients with heart disease, we used hyperparameter-tuned machine learning algorithms, namely Random Forest (RF), XGBoost, Gradient Boosting, LightGBM, CatBoost, SVM, and MLP, along with ensemble models. The overall prediction performance of these models ranges from 88% to 92%. To attain state-of-the-art results, we then used a 1D CNN model, which significantly improved prediction, with an accuracy of 96.36%, precision of 96.45%, recall of 96.36%, specificity of 99.51%, and F1 score of 96.34%. Without feature interaction, the RF model produces the best results among all classifiers across the evaluation metrics, with an accuracy of 90.21%, precision of 90.40%, recall of 90.86%, specificity of 90.91%, and F1 score of 90.63%. With the suggested approach, our proposed 1D CNN model outperforms the model without feature engineering by 7%. This illustrates how interaction-focused feature analysis can produce precise and useful insights for heart disease diagnosis.
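For readers less familiar with the five metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, specificity, and F1 score are conventionally derived from binary confusion-matrix counts. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical and chosen only for illustration; they are not taken from the article, and the authors may have used a different averaging scheme (for example, averaging across classes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute five standard metrics from paired true and predicted labels.

    Illustrative helper only (hypothetical name); any label other than
    `positive` is treated as the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # Count the four confusion-matrix cells.
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t != positive and p != positive:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # also called sensitivity
    specificity = ratio(tn, tn + fp)   # true-negative rate
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return BinaryMetrics(accuracy, precision, recall, specificity, f1)


# Toy labels (made up for illustration): 1 = disease, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(f"accuracy={m.accuracy:.4f} precision={m.precision:.4f} "
      f"recall={m.recall:.4f} specificity={m.specificity:.4f} f1={m.f1:.4f}")
```

In this toy example there are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive. Recall measures how many diseased patients are detected, while specificity measures how many healthy patients are correctly cleared, which is why the two can differ noticeably in the results reported above.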

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.