Open Access
ARTICLE
Machine Learning Techniques Applied to Electronic Healthcare Records to Predict Cancer Patient Survivability
Ornela Bardhi1,2,*, Begonya Garcia Zapirain1
1 eVIDA Lab, University of Deusto, Bilbao, 48007, Spain
2 Success Clinic Oy, Helsinki, 00180, Finland
* Corresponding Author: Ornela Bardhi.
(This article belongs to this Special Issue: AI, IoT, Blockchain Assisted Intelligent Solutions to Medical and Healthcare Systems)
Computers, Materials & Continua 2021, 68(2), 1595-1613. https://doi.org/10.32604/cmc.2021.015326
Received 15 November 2020; Accepted 06 February 2021; Issue published 13 April 2021
Abstract
Breast cancer (BCa) and prostate cancer (PCa) are the two most
common types of cancer. Various factors play a role in these cancers, and
discovering the most important ones might help patients live longer, better
lives. This study aims to determine the variables that most affect patient
survivability, and how the use of different machine learning algorithms can
assist in such predictions. The AURIA database was used, which contains
electronic healthcare records (EHRs) of 20,006 individual patients diagnosed
with either breast or prostate cancer in a particular region in Finland. In
total, there were 178 features for BCa and 143 for PCa. Six feature selection
algorithms were used to obtain the 21 most important variables for BCa, and
19 for PCa. These features were then used to predict patient survivability by
employing nine different machine learning algorithms. Seventy-five percent of
the dataset was used to train the models and 25% for testing. Cross-validation
was carried out using the StratifiedKFold technique to test the effectiveness of
the machine learning models. For the BCa dataset, the support vector machine
classifier yielded the best ROC curve, with an area under the curve (AUC) of 0.83,
followed by the KNeighborsClassifier with AUC = 0.82. For the PCa dataset, the
two best-performing algorithms were the random forest classifier
and KNeighborsClassifier, both with AUC = 0.82. This study shows that not
all variables are decisive when predicting breast or prostate cancer patient
survivability. By narrowing down the input variables, healthcare professionals
were able to focus on the issues that most impact patients, and hence devise
better, more individualized care plans.
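
The evaluation protocol summarized in the abstract (a 75%/25% train/test split, stratified k-fold cross-validation, and ROC AUC as the metric) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the data are synthetic (the AURIA records are not reproduced here), and the number of folds, the feature scaling step, the RBF kernel, the stratified split, and the random seeds are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a dataset with 21 selected features (NOT the AURIA data).
X, y = make_classification(
    n_samples=400, n_features=21, n_informative=8, random_state=0
)

# 75% of the data for training, 25% for testing; stratifying the split is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Support vector machine classifier; scaling and the RBF kernel are assumptions.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Stratified k-fold cross-validation on the training portion (5 folds assumed).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# Fit on the full training portion and score the held-out 25% with ROC AUC.
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.decision_function(X_test))

print(f"Cross-validated AUC: {cv_auc.mean():.2f} (+/- {cv_auc.std():.2f})")
print(f"Held-out test AUC:   {test_auc:.2f}")
```

Swapping `SVC` for `KNeighborsClassifier` or `RandomForestClassifier` would give the analogous sketch for the other classifiers named above; those two expose `predict_proba` rather than `decision_function`, so the held-out score would use the positive-class probability instead.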
Cite This Article
O. Bardhi and B. Garcia Zapirain, "Machine learning techniques applied to electronic healthcare records to predict cancer patient survivability,"
Computers, Materials & Continua, vol. 68, no. 2, pp. 1595–1613, 2021. https://doi.org/10.32604/cmc.2021.015326