Heart disease prognosis (HDP) is a difficult undertaking that requires knowledge and expertise for early prediction. Heart failure is on the rise as a result of today’s lifestyle. The healthcare industry generates a vast volume of patient records, which are challenging to manage manually. For data mining and machine learning, a large volume of data is crucial for extracting meaningful information. Researchers have applied several methods for predicting HD over the last few decades, but the fundamental concerns remain the uncertainty in the output, the need to decrease the error rate, and the need to enhance the accuracy of HDP assessment measures. To discover the optimal HDP solution, this study compares multiple classification algorithms on two separate heart disease datasets, one from the Kaggle repository and one from the University of California, Irvine (UCI) machine learning repository. In a comparative analysis, Mean Absolute Error (MAE), Relative Absolute Error (RAE), precision, recall, f-measure, and accuracy are used to evaluate Linear Regression (LR), Decision Tree (J48), Naive Bayes (NB), Artificial Neural Network (ANN), Simple Cart (SC), Bagging, Decision Stump (DS), AdaBoost, Rep Tree (REPT), and Support Vector Machine (SVM). Overall, the SVM classifier surpasses the other classifiers in increasing accuracy and decreasing the error rate on the dataset gathered from UCI, with an RAE of 33.26%, an MAE of 0.165, a precision of 0.841, a recall of 0.835, an f-measure of 0.833, and an accuracy of 83.49%. SC improves accuracy and reduces the error rate on the Kaggle dataset, with an RAE of 3.30%, an MAE of 0.016, a precision of 0.984, a recall of 0.984, an f-measure of 0.984, and an accuracy of 98.44%.
The heart is the major organ of the body; it pumps blood and supplies it to the whole body. Life depends on the efficient working of the heart. If the heart cannot circulate blood to the body parts, severe pain and mortality may follow within minutes. Such disease needs to be treated on time [
However, the main focus of the study is the empirical analysis of different ML techniques and finding the best technique among the prevailing ones for predicting HD with higher accuracy and a lower error rate. For evaluating the existing techniques, this research uses Mean Absolute Error (MAE), Relative Absolute Error (RAE), accuracy, precision, recall, and f-measure as assessment metrics. The reason for using the error rate is to find how wrong our predictions are, i.e., the difference between actual and predicted outcomes. This is also an important factor to keep in consideration.
Hereinafter, Section 2 addresses the literature review, whereas Section 3 illustrates the study methodology. Section 4 presents and discusses the outcomes. Finally, Section 5 summarizes the entire study’s findings.
The basic ML process comprises data collection, pre-processing, and applying a classifier on a dataset to diagnose diseases. The first step involves preprocessing the raw data to form a clean dataset that can be passed to the training phase; the second step involves applying the classifier to the preprocessed dataset to evaluate its prediction accuracy. Supervised learning involves the development of a model where labels are known. In contrast, unsupervised learning works on data that is not pre-labeled.
According to the literature, various data mining techniques are used for HDP with higher accuracy and fewer error rates [
Experimental analysis performed by Chaurasia et al. [
Dai et al. [
An intelligent technique for HDP proposed by Dbritto et al. [
An operational framework for HDP, named “Identification of heart failure by using unstructured data of cardiac patients”, was proposed by Saqlain et al. [
Kumar et al. [
A model supporting decision-making in HD prognosis based on data mining techniques was proposed by Makumba et al. [
Ware et al. [
The detailed methodology begins with the collection of two different HDDs: one is the UCI dataset and the other is the Kaggle dataset. After collection of the datasets, classification techniques are applied to achieve better accuracy and a lower error rate. For this, techniques including J48, NB, LR, SC, Bagging, DS, AdaBoost, ANN, REPT, and SVM are first trained using 10-fold cross-validation (CV) on the dataset, and then prediction is performed by each technique. After prediction, a comparative analysis was performed among all mentioned techniques to check which has higher accuracy and a lower error rate. The overall methodology for HDP is shown in
The ML classification techniques are employed on datasets taken from the UCI and Kaggle repositories. The selection of these datasets is based on their wide use in previous research studies. These datasets are recommended by various researchers as standards for research analysis [
SN | Variables | Description | Measurement scale |
---|---|---|---|
1 | Age | Age of patient | Interval |
2 | Sex | Sex of patient | Nominal |
3 | Cp | Chest pain type | Nominal |
4 | Trestbps | Resting blood pressure | Interval |
5 | Chol | Cholesterol level | Interval |
6 | Fbs | Fasting blood sugar | Nominal |
7 | restecg | Resting electrocardiographic results | Ratio |
8 | thalach | Maximum exercise heart rate achieved (bpm) | Interval |
9 | exang | Exercise-induced angina | Nominal |
10 | Oldpeak | ST depression induced by exercise relative to rest | Interval |
11 | Slope | The slope of the peak exercise ST segment | Nominal |
12 | ca | Number of major vessels colored by fluoroscopy | Interval |
13 | Thal | Normal, fixed defect, reversible defect | Nominal |
14 | Target | Absence, Presence | Nominal |
SN | Variables | Range |
---|---|---|
1 | Age | In Years |
2 | Sex | 1 is used for Male and 0 is used for Female |
3 | Cp | value 1: typical angina; value 2: atypical angina; value 3: non-anginal pain; value 4: asymptomatic |
4 | Trestbps | mm Hg |
5 | Chol | 200–250 or higher mg/dL |
6 | Fbs | (value 1: > 120 mg/dl; value 0: < 120 mg/dl) |
7 | restecg | Value 0: normal; Value 1: ST-T wave abnormality; Value 2: probable left ventricular hypertrophy |
8 | thalach | Beats per minute (bpm) |
9 | exang | value 1: yes ; value 0: no |
10 | Oldpeak | 1–3 |
11 | Slope | Value 1: upsloping; Value 2: flat; Value 3: downsloping |
12 | ca | value 0–3 |
13 | Thal | value 3: normal; value 6: fixed defect; value 7: reversible defect |
14 | Target | 0 or 1 |
To assess the performance of a classifier, 10-fold cross-validation is applied. The 10-fold cross-validation technique divides the data records into 10 portions of equivalent size; one portion is used as the validation set while the others are used for training. This process continues until each portion has been utilized for validation. It is a standard technique used for assessment [
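The fold construction described above can be sketched in plain Python. This is a minimal illustration (not the exact procedure of the Weka-style tooling used in the study); the interleaved fold assignment is an assumption for demonstration.

```python
def k_fold_indices(n_records, k=10):
    """Split record indices into k roughly equal folds; each fold serves
    once as the validation set while the remaining folds form the
    training set."""
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, val_idx

# Example: 303 records (the size of the UCI heart dataset) in 10 folds.
splits = list(k_fold_indices(303, k=10))
```

Each of the 303 indices appears in exactly one validation fold, so every record is used for validation exactly once.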
To determine techniques with higher accuracy and lower error rates, ten classification techniques including J48, NB, LR, SC, Bagging, DS, AdaBoost, ANN, REPT, and SVM have been used for comparisons. The subsection contains a brief detail of each employed technique.
The J48 algorithm grows the initial tree using a divide-and-conquer technique. The root node is the attribute with the highest gain ratio. To enhance accuracy, this technique uses pessimistic pruning to eliminate unnecessary branches of the tree. To handle continuous attributes, the algorithm splits their values into two ranges. The tree is pruned to avoid overfitting and can be seen in
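The gain-ratio criterion that J48 uses to select the root node can be sketched as follows; the chest-pain values and labels below are hypothetical toy data, not records from either dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute: information gain divided by
    split information (which penalises many-valued attributes). J48
    picks the attribute with the highest gain ratio as the root."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    info = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - info
    split_info = entropy(values)
    return gain / split_info if split_info else 0.0

# Toy example: chest-pain type (cp) against the target label.
cp = [1, 1, 2, 2, 3, 3, 4, 4]
target = [0, 0, 0, 1, 1, 1, 1, 1]
gr = gain_ratio(cp, target)
```

An attribute that perfectly predicts the label attains a gain ratio of 1; uninformative attributes score near 0.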
A basic yet very powerful approach is to combine several weak classifiers into a strong classifier. “Weak” (or basic) means a classifier whose performance and accuracy are relatively low. The approach is used especially for classification problems. A training set is selected for each new classifier: a random subset of the overall training set is used to train every weak classifier. Once each classifier has been trained, its weight is determined based on its accuracy. Mathematically, this can be calculated as in
The final classifier is composed of T weak classifiers; H_t(x) is the output of weak classifier t, and alpha_t is the weight AdaBoost assigns to classifier t. The final output is therefore a linear combination of all the weak classifiers, and the final judgment is made simply by looking at the sign of this sum [
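The weighted vote described above can be sketched as follows. The stumps, their training errors, and the test point are hypothetical; the alpha formula is the standard AdaBoost weight, 0.5·ln((1−e)/e).

```python
import math

def adaboost_alpha(error):
    """AdaBoost weight for a weak classifier with training error `error`:
    more accurate classifiers receive larger weights."""
    return 0.5 * math.log((1 - error) / error)

def final_classifier(x, weak_classifiers, alphas):
    """Sign of the alpha-weighted sum of weak-classifier outputs (+1/-1)."""
    s = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if s >= 0 else -1

# Three hypothetical decision stumps over a single numeric feature.
stumps = [lambda x: 1 if x > 2 else -1,
          lambda x: 1 if x > 5 else -1,
          lambda x: -1 if x > 8 else 1]
alphas = [adaboost_alpha(e) for e in (0.2, 0.3, 0.4)]
pred = final_classifier(6, stumps, alphas)
```

Note how the classifier with the lowest error (0.2) receives the largest weight in the vote.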
REPT is a quick decision tree learner. It follows the rationale of the regression tree and produces several trees in different iterations; afterwards, it selects the best one from all trees produced. The metric used in pruning the tree is the mean square error of the tree’s predictions. It constructs a decision/regression tree using information gain as the separation criterion, prunes it using reduced-error pruning, and thereby helps reduce variance. It sorts values only once for numerical attributes. Missing values are addressed using the C4.5 approach of fractional instances [
A neural network is an ML model built on a model of the human neuron. The algorithm was designed to simulate the neurons of the human brain. It involves several connected processing units working together to process information. It is composed of several associated nodes or neurons, where one node’s output is another’s input. Every node receives several inputs but generates only one value. A commonly used form of ANN, the Multi-Layer Perceptron (MLP), consists of an input layer that reflects the raw input allowed to flow through the network, hidden layers that determine the operation of each hidden unit, and an output layer whose result depends on the operation of the hidden units and the weights between the hidden units and the output units. The important parts include the synapses, defined by their weight values, the summing junction (integrator), and the activation mechanism [
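A single forward pass through such an MLP can be sketched as below; the weights, biases, and input are hypothetical, and the sigmoid activation is one common choice (the study does not specify its activation function).

```python
import math

def sigmoid(z):
    """Sigmoid activation applied after each summing junction."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> one hidden layer -> one output
    unit. Each node sums its weighted inputs (the summing junction)
    and applies the activation function, producing a single value."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
out = mlp_forward([0.5, -1.0],
                  w_hidden=[[0.4, 0.1], [-0.2, 0.3]], b_hidden=[0.0, 0.1],
                  w_out=[0.7, -0.5], b_out=0.2)
```

The sigmoid output lies in (0, 1) and can be read as the predicted probability of the positive class.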
SVM is a supervised ML algorithm that can be used for classification or regression problems. The goal of the support vector machine algorithm is to find a hyperplane in an N-dimensional space, where N is the number of features, that distinctly classifies the data points. Hyperplanes are decision boundaries that help distinguish data points. Support vectors are the data points closest to the hyperplane, which influence the hyperplane’s position and orientation, and can be calculated as
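As a rough illustration of the decision rule, a linear SVM classifies a point by the sign of w·x + b; the 2-D hyperplane below is hypothetical, not one fitted to either dataset.

```python
def svm_decision(x, w, b):
    """Value of w.x + b for point x; the hyperplane is the set where
    this is zero, and the sign gives the predicted side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(x, w, b):
    """Classify a point by which side of the hyperplane it falls on."""
    return 1 if svm_decision(x, w, b) >= 0 else -1

# Hypothetical separating hyperplane w.x + b = 0 in two dimensions.
w, b = [1.0, -1.0], -0.5
preds = [svm_predict(p, w, b) for p in ([3.0, 1.0], [0.0, 2.0])]
```

Training an SVM amounts to choosing w and b so that the margin between the two classes around this hyperplane is maximized.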
Bagging is used to enhance the accuracy and stability of ML techniques used in statistical regression and classification. It also helps reduce variance and avoid overfitting. Applying bagging to classifiers, particularly decision trees and neural networks, improves classification precision. Bagging plays a crucial role in the field of HD diagnosis [
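The two ingredients of bagging, bootstrap sampling and majority voting, can be sketched as follows; the data and the five base-classifier votes are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw a sample of the same size as the data, with replacement;
    each base classifier is trained on a different such sample."""
    return [rng.choice(data) for _ in data]

def bagged_predict(votes):
    """Majority vote over the base classifiers' predictions."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
data = list(range(10))
sample = bootstrap_sample(data, rng)       # one bootstrap replicate
pred = bagged_predict([1, 0, 1, 1, 0])     # e.g. five base-tree votes
```

Because each replicate omits some records and repeats others, the base classifiers differ, and averaging their votes reduces variance.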
The LR algorithm is a regression and classification method for examining a dataset that contains one or more independent variables determining an outcome [
In the training phase, the coefficients for instances x1, x2, x3, …, xn will be b0, b1, …, bn. The coefficients are updated and estimated by stochastic gradient descent.
All coefficients are 0 initially, where l is the learning rate and x is the bias input for b0, which is always 1. The updating process continues until correct predictions are made at the training stage [
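One such coefficient update can be sketched as below, assuming the textbook logistic update b = b + l·(y − p)·p·(1 − p)·x (the exact update rule used by the study’s implementation is not stated); the training row and learning rate are hypothetical.

```python
import math

def predict(row, coef):
    """Logistic prediction; coef[0] is b0 (the bias, whose input is 1)."""
    z = coef[0] + sum(c * x for c, x in zip(coef[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(row, label, coef, lr):
    """One stochastic-gradient step on a single training row."""
    p = predict(row, coef)
    step = lr * (label - p) * p * (1 - p)
    coef[0] += step * 1.0                  # bias input is always 1
    for i, x in enumerate(row):
        coef[i + 1] += step * x
    return coef

coef = [0.0, 0.0, 0.0]                     # all coefficients start at 0
coef = sgd_update([2.0, 1.0], 1, coef, lr=0.3)
```

After one step on a positive example, every coefficient has moved in the direction that raises the predicted probability for that row.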
Bayes’ theorem provides the foundation of NB, in which individual features contribute independently to the probability, as shown in
For example, a fruit classified as an apple has each feature contribute individually to the likelihood of “apple”, even though there are conceivable correlations among the roundness, color, and diameter features used for classification. The NB algorithm is desirable for classifying spatial datasets. The method assumes conditional independence: an attribute’s value is treated as independent of the other attributes. This proves fruitful for investigating and extracting information, as in
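The independence assumption can be sketched with the fruit example; the priors and per-feature likelihood tables below are hypothetical numbers chosen for illustration.

```python
def naive_bayes_score(features, prior, likelihoods):
    """Unnormalised class score: P(class) times the product of the
    per-feature likelihoods, each attribute contributing
    independently (the naive assumption)."""
    score = prior
    for f, table in zip(features, likelihoods):
        score *= table[f]
    return score

# Hypothetical likelihood tables for shape and color given each class.
apple_like = [{'round': 0.9, 'long': 0.1}, {'red': 0.7, 'yellow': 0.3}]
banana_like = [{'round': 0.1, 'long': 0.9}, {'red': 0.1, 'yellow': 0.9}]
x = ['round', 'red']
s_apple = naive_bayes_score(x, 0.5, apple_like)    # 0.5 * 0.9 * 0.7
s_banana = naive_bayes_score(x, 0.5, banana_like)  # 0.5 * 0.1 * 0.1
```

The class with the larger score wins; here a round, red fruit scores far higher under the apple model.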
SC is a classification technique that generates a binary decision tree. Because the output is a binary tree, each node generates only two children. Attribute splitting is performed on the attribute with the highest entropy reduction. It uses CV or a large test sample to choose the best tree from a series of trees, which constitutes the pruning process. The rationale behind the Simple Cart algorithm is a greedy algorithm in which the locally best feature is selected at each stage; the full process is computationally costly. During implementation, the dataset is divided into two groups that are distinct with respect to the outcome. This process continues until a small-sized subgroup is reached [
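One greedy binary-split step of this kind can be sketched as follows, using weighted entropy as the split score (CART implementations often use Gini impurity instead; entropy follows the description above). The values and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Greedy CART-style step: try each candidate threshold on a
    numeric attribute and keep the binary split (two children only)
    with the lowest weighted entropy."""
    n = len(labels)
    best = (None, float('inf'))
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) / n * entropy(left)
                 + len(right) / n * entropy(right))
        if score < best[1]:
            best = (t, score)
    return best

threshold, score = best_binary_split([1, 2, 3, 4], [0, 0, 1, 1])
```

On this toy attribute the split at 2 separates the classes perfectly, so its weighted entropy is 0.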
A DS is a learning model that consists of a one-level decision tree: a single internal node connected immediately to its leaves. For prediction, it uses a single input attribute. It is also known as 1-rules. Variations are possible depending on the type of the input attribute: for binary and nominal attributes, two leaves are possible; for numeric attributes, a threshold value is selected and the stump contains two leaves, one below and one above the threshold [
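A numeric-attribute stump is small enough to write out directly; the 140 mm Hg threshold on resting blood pressure is a hypothetical example, not a value learned from the data.

```python
def decision_stump(x, threshold, below_label=0, above_label=1):
    """A one-node tree: a numeric attribute is compared against a
    single threshold and one of the two leaves is returned."""
    return above_label if x > threshold else below_label

# Hypothetical stump on resting blood pressure (trestbps, mm Hg).
preds = [decision_stump(v, threshold=140) for v in (120, 150, 140)]
```

Despite its simplicity, such a stump is exactly the kind of weak classifier that AdaBoost combines into a strong one.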
Model assessment is an essential goal of any research work, and it is important to evaluate with standard evaluation measures. To evaluate the algorithms for higher accuracy and lower error, the assessment metrics involved are MAE [
MAE is determined by taking the difference between continuous variables, for example, predicted and observed values, or final time against initial time. It can be calculated as
RAE relies on two quantities: the predicted value and the observed (experimental) value; to measure RAE, both must be known. RAE is obtained as the ratio of the absolute error to the experimental value. A percentage or fraction is used to express RAE because it has no units.
For a perfect fit, the numerator equals 0, i.e., Ei = 0.
Accuracy is one criterion for assessing classification models. Informally, accuracy is the proportion of observations our model predicted correctly. Accuracy can be found using
Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. It can be calculated as
Recall is the ratio of correctly predicted positive observations to all observations of the actual positive class. It can be calculated as
The F-measure is the weighted average of precision and recall; it therefore takes both false positives and false negatives into account. Intuitively, it is not as straightforward as accuracy; however, the F1-measure is usually more useful than accuracy, particularly with an imbalanced class distribution. The F-measure can be calculated using
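The evaluation metrics above can be sketched in a few lines; the actual/predicted vectors and confusion-matrix counts are hypothetical toy values. Note the RAE baseline here is the mean-predictor error, a common convention; the study’s tool may normalise slightly differently.

```python
def mae(actual, predicted):
    """Mean absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rae(actual, predicted):
    """Relative absolute error: total absolute error divided by the
    error of always predicting the mean of the actual values."""
    mean = sum(actual) / len(actual)
    num = sum(abs(a - p) for a, p in zip(actual, predicted))
    den = sum(abs(a - mean) for a in actual)
    return num / den

def prf(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

actual = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]
m, r = mae(actual, predicted), rae(actual, predicted)
p, rec, f1, acc = prf(tp=2, fp=0, fn=1, tn=2)
```

Being unitless ratios, RAE is usually reported as a percentage and the other scores as fractions, matching the tables below.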
This section presents the experimental results of SC and SVM in comparison with the employed techniques. First, the results of the employed models are discussed, and then the results of SC and SVM are presented. The results are obtained using two different HDDs and are evaluated using MAE, RAE, accuracy, precision, recall, and f-measure as evaluation metrics.
This section presents the outcomes obtained through the analysis of the classification algorithms. The models are evaluated on two different datasets using six evaluation metrics. Ten classifiers, including J48, NB, LR, SC, Bagging, DS, AdaBoost, ANN, REPT, and SVM, were tested on the HDDs (the UCI dataset and the Kaggle dataset) with 10-fold CV, using RAE, MAE, correctly and incorrectly classified instances, accuracy, precision, recall, and f-measure to analyze which algorithm works best in predicting HD.
S.No | Technique | CCI | CCI % | ICI | ICI %
---|---|---|---|---|---
1 | SC | 248 | 81.50% | 55 | 18.15% |
2 | J48 | 238 | 78.55% | 65 | 21.45% |
3 | ANN | 236 | 77.89% | 67 | 22.11% |
4 | Bagging | 249 | 82.18% | 54 | 17.82% |
5 | REPTree | 240 | 79.21% | 63 | 20.79%
6 | LR | 249 | 82.10% | 54 | 17.82%
7 | AdaBoost | 247 | 81.51% | 56 | 18.48% |
8 | NB | 251 | 82.80% | 52 | 17.16% |
9 | DS | 225 | 74.26% | 78 | 25.74% |
10 | SVM | 253 | 83.49% | 50 | 16.51%
S.No | Techniques | RAE | MAE |
---|---|---|---|
1 | SC | 53.70% | 0.26 |
2 | J48 | 50.28% | 0.24 |
3 | ANN | 43.79% | 0.21 |
4 | Bagging | 56.30% | 0.27 |
5 | REPTree | 57% | 0.28 |
6 | LR | 47% | 0.23 |
7 | AdaBoost | 46.38% | 0.23 |
8 | NB | 41.60% | 0.2 |
9 | DS | 75.24% | 0.37 |
10 | SVM | 33.26% | 0.165
As the SVM algorithm performed better on the UCI dataset than the other techniques, with lower error rates,
S No | Techniques | Diff in RAE | Diff in MAE |
---|---|---|---|
1 | SVM with SC | 20.44 | 0.1 |
2 | SVM with J48 | 17.02 | 0.08 |
3 | SVM with ANN | 10.53 | 0.05 |
4 | SVM with Bagging | 23.04 | 0.11 |
5 | SVM with REPTree | 23.74 | 0.12 |
6 | SVM with LR | 13.74 | 0.07 |
7 | SVM with AdaBoost | 13.12 | 0.07 |
8 | SVM with NB | 8.34 | 0.04 |
9 | SVM with DS | 41.98 | 0.21
For evaluating algorithms, there ought to be a metric that reflects how correct an algorithm is; for this, accuracy is highly important to check how correctly it performs.
Technique | Precision | Recall | F-Measure | Accuracy |
---|---|---|---|---|
SC | 0.823 | 0.818 | 0.816 | 81.5% |
J48 | 0.785 | 0.785 | 0.785 | 78.55% |
ANN | 0.77 | 0.77 | 0.77 | 77.89% |
Bagging | 0.82 | 0.82 | 0.82 | 82.18% |
REPTree | 0.79 | 0.79 | 0.78 | 79% |
LR | 0.82 | 0.82 | 0.82 | 82.1% |
AdaBoost | 0.81 | 0.81 | 0.81 | 81.51% |
NB | 0.83 | 0.82 | 0.82 | 82.8% |
DS | 0.74 | 0.74 | 0.74 | 74.26% |
SVM | 0.841 | 0.835 | 0.833 | 83.49%
When comparing SVM with the employed techniques on the UCI dataset, the difference in accuracy between SVM and the employed techniques is given in
Here, v1 represents the value of SVM while v2 represents the value of the other technique compared with SVM. Meanwhile, in
S. No | Technique | CCI | CCI % | ICI | ICI %
---|---|---|---|---|---
1 | SC | 1009 | 98.44% | 16 | 1.56%
2 | J48 | 1005 | 98.05% | 20 | 1.9% |
3 | ANN | 979 | 95.51% | 46 | 4.4% |
4 | Bagging | 969 | 94.54% | 56 | 5.4% |
5 | REPTree | 952 | 92.88% | 73 | 7.1% |
6 | LR | 866 | 84.49% | 159 | 15.5% |
7 | AdaBoost | 864 | 84.29% | 161 | 15.70% |
8 | NB | 852 | 83.12% | 173 | 16.8% |
9 | DS | 779 | 76.00% | 246 | 24% |
10 | SVM | 863 | 84.20% | 162 | 15.8% |
S. No | Techniques | RAE | MAE |
---|---|---|---|
1 | SC | 3.30% | 0.016
2 | J48 | 4.11% | 0.02 |
3 | ANN | 11.16% | 0.05 |
4 | Bagging | 24.64% | 0.12 |
5 | REPTree | 19.02% | 0.09 |
6 | LR | 45.00% | 0.224 |
7 | AdaBoost | 41.88% | 0.2 |
8 | NB | 39.21% | 0.195 |
9 | DS | 73.05% | 0.365 |
10 | SVM | 31.63% | 0.158 |
As the SC algorithm performed better on the Kaggle dataset than the other techniques, with lower error rates,
S No. | Technique | Diff in RAE | Diff in MAE |
---|---|---|---|
1 | SC with J48 | 0.81 | 0.01 |
2 | SC with ANN | 7.86 | 0.04 |
3 | SC with Bagging | 21.34 | 0.11 |
4 | SC with REPTree | 15.72 | 0.08 |
5 | SC with LR | 41.7 | 0.21 |
6 | SC with AdaBoost | 38.58 | 0.18 |
7 | SC with NB | 35.91 | 0.18 |
8 | SC with DS | 69.75 | 0.35
9 | SC with SVM | 28.33 | 0.14 |
Similarly, accuracy is used to check how correctly each algorithm performs on the Kaggle dataset.
Techniques | Precision | Recall | F-Measure | Accuracy |
---|---|---|---|---|
SC | 0.984 | 0.984 | 0.984 | 98.44%
J48 | 0.98 | 0.98 | 0.98 | 98.05% |
ANN | 0.95 | 0.95 | 0.95 | 95.51% |
Bagging | 0.945 | 0.945 | 0.945 | 94.54% |
REPTree | 0.929 | 0.929 | 0.929 | 92.88% |
LR | 0.84 | 0.84 | 0.84 | 84.49% |
AdaBoost | 0.84 | 0.84 | 0.84 | 84.29% |
NB | 0.835 | 0.83 | 0.83 | 83.12% |
DS | 0.76 | 0.76 | 0.76 | 76.00% |
SVM | 0.849 | 0.842 | 0.841 | 84.20% |
When comparing SC with the employed techniques on the Kaggle dataset, the difference in accuracy between SC and the employed techniques is given in
This study performs an empirical analysis of ten different ML classification algorithms on two different HDDs taken from the Kaggle and UCI repositories. The results on the two datasets are heterogeneous because each dataset contains a different number of instances and attributes and, most importantly, different proportions of affected and non-affected patient records.
S. No. | Assessment measures | Dataset from UCI repository | Dataset from kaggle repository |
---|---|---|---|
1 | RAE | SVM | SC |
2 | MAE | SVM | SC |
3 | Recall | SVM | SC |
4 | F-Measure | SVM | SC |
5 | Precision | SVM | SC |
6 | Accuracy | SVM | SC |
This work may have certain limitations, which are addressed as threats to validity. These risks include changing the dataset, or increasing or decreasing the number of instances in the dataset, which may alter the results of this work. Using new techniques or assessment criteria could also disrupt the present analysis. Furthermore, changing the testing and training criteria would change the existing outcomes.
The performance of SVM is better on the UCI dataset partly because the dataset applied to the algorithms, taken from the UCI repository, contains 303 instances with 14 attributes. The dataset is pre-processed, which means SVM can linearly separate the data, maximizing the margin on the UCI dataset. To get the maximum margin that best fits the data, we used a polynomial kernel function, which can map the data into a high-dimensional space. Moreover, the parameters were tuned, due to which SVM performs better on the UCI dataset. From our data, it is also known that SVM performs rather well when there is a clear margin of separation between classes. SVM is more effective in high-dimensional spaces, and remains effective when the number of dimensions exceeds the number of samples. It also performs and generalizes effectively on out-of-sample data. Another reason to use SVM is that making minor changes to the derived feature data does not influence previously predicted results. It converges rapidly, and, as previously indicated regarding kernel functionality, the polynomial kernel in general appears to be a better choice for SVM [
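The polynomial kernel referred to above can be sketched as K(x, y) = (x·y + c)^d; the coefficient c, degree d, and input vectors below are hypothetical, since the study does not report its tuned kernel parameters.

```python
def polynomial_kernel(x, y, c=1.0, degree=2):
    """Polynomial kernel (x.y + c)^degree: computes the inner product
    in an implicit higher-dimensional feature space, letting the SVM
    fit a non-linear boundary in the original attribute space."""
    return (sum(xi * yi for xi, yi in zip(x, y)) + c) ** degree

# Hypothetical pair of 2-D feature vectors.
k = polynomial_kernel([1.0, 2.0], [0.5, 1.0], c=1.0, degree=2)
```

Because the kernel replaces explicit feature mapping, the margin can be maximized in the high-dimensional space without ever constructing it.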
It is observed from the literature that several studies have developed techniques for HDP, but it remains a challenging task in terms of increasing accuracy and decreasing the error rate. The focus of this research is to improve the accuracy of HDP and reduce the error rate of the evaluation metrics using two algorithms, i.e., SVM for the UCI dataset and SC for the Kaggle dataset. The datasets from the UCI and Kaggle repositories were selected, and MAE, RAE, accuracy, precision, recall, and f-measure were used as evaluation metrics. The results of the proposed models are compared with the results of the techniques employed for comparative analysis. The eventual goal of this research is to reduce the error rate and maximize accuracy for HDP techniques. Results can be further improved by applying the latest algorithms to the latest datasets, and by merging the strengths of SVM and SC to enhance proficiency and performance, which is known as hybridization. In the future, we can combine SVM with SC to design an ensemble model that may produce better performance on any relevant dataset. Moreover, SVM or SC can also be hybridized with other search or optimization techniques to find a better solution to the aforementioned problem.
The authors would like to acknowledge the support of the Deputyship for Research and Innovation, Ministry of Education, Kingdom of Saudi Arabia, for this research through grant (NU/IFC/ENT/01/014) under the Institutional Funding Committee at Najran University, Kingdom of Saudi Arabia.