Intelligent Automation & Soft Computing
Heart Failure Patient Survival Analysis with Multi Kernel Support Vector Machine
1School of Information Technology & Engineering, Vellore Institute of Technology, Vellore, 632014, India
2Lord Buddha Education Foundation & Scientific Research Group in Egypt (SRGE), Kathmandu, 44600, Nepal
3School of Computer Science and Engineering SCE, Taylor's University, Selangor, 47500, Malaysia
4Material Science Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, 6086, Kingdom of Saudi Arabia
5General Administration of Research and Development Laboratories, King Abdulaziz City for Science and Technology (KACST), Riyadh, 6086, Kingdom of Saudi Arabia
*Corresponding Author: Zahrah A. Almusaylim. Email: email@example.com
Received: 03 April 2021; Accepted: 05 May 2021
Abstract: Heart failure (HF) is a global pandemic affecting at least 26 million people worldwide, and its prevalence is increasing. HF healthcare expenditures are considerable and will rise substantially as the population ages. According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death globally, taking an estimated 17.9 million lives each year. CVDs are a group of disorders of the heart and blood vessels and include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and various other conditions. The healthcare industry generates large amounts of data, but these data are often not used effectively. The generated images, audio, text, and records contain hidden patterns and relationships, yet tools for extracting knowledge from these datasets for clinical diagnosis or other purposes are uncommon. Four out of five CVD deaths are due to heart attacks and strokes, and one third of these deaths occur in people under 70 years of age. In the current work, we predict the survival chances of HF patients using attribute selection (scoring methods) and machine learning classifiers. The scoring methods (SM) used are the Gini Index, Information Gain, and Gain Ratio; in addition, correlation-based feature selection (CFS) with a best first search (BFS) strategy is used for attribute selection (AS). We use multi-kernel support vector machine (MK-SVM) classifiers with Linear, Polynomial, radial basis function (RBF), and Sigmoid kernels. The classification accuracies (CA) obtained using SM are: SVM (Linear 80.3%, Polynomial 86.6%, RBF 83.6%, Sigmoid 82.3%), and those obtained using CFS-BFS are: SVM (Linear 79.9%, Polynomial 83.3%, RBF and Sigmoid 83.6%).
Keywords: Heart Failure (HF); Cardiovascular diseases (CVD); Scoring method (SM); Correlation-Based Feature Selection (CFS); Best First Search (BFS); MK-SVM; Classification accuracy (CA); Classifiers
1 Introduction
Healthcare monitoring frameworks involve processing and analyzing data retrieved from smartphones, smartwatches, wristbands, and various other sensors and wearable devices. Such frameworks enable continuous monitoring of a patient's mental and physical condition by sensing and transmitting measurements such as heart rate, electrocardiogram, body temperature, respiratory rate, chest sounds, or blood pressure. Heart failure (HF) is a medically complex disease. HF, also known as congestive HF or congestive cardiac failure, occurs when the heart cannot pump sufficiently to maintain the blood circulation needed by the body's tissues. It is characterized by symptoms such as shortness of breath that may be accompanied by signs such as pulmonary crackles and peripheral edema, caused by structural and/or functional cardiac or non-cardiac abnormalities. HF is a serious concern associated with high mortality rates. According to the European Society of Cardiology (ESC), 26 million adults worldwide are diagnosed with HF, while 3.6 million are newly diagnosed every year. Between 17% and 45% of patients with HF die within the first year, and the rest die within 5 years. Management costs related to HF are approximately 1%–2% of all healthcare expenditure, with a large share of them associated with recurrent hospital admissions. Increasing prevalence, rising healthcare costs, repeated hospitalizations, reduced quality of life (QoL), and premature death have turned HF into a global scourge. This underlines the need for early diagnosis (detecting the presence of HF and estimating its severity) and appropriate treatment. A clinical examination, including physical assessment, is supported by additional tests such as blood tests, chest radiography, electrocardiography (ECG), and echocardiography (echo).
Several risk models for HF patients already exist. Each was derived from a single patient cohort, so its generalizability to other populations is uncertain. Each model was also developed on a cohort of limited size, underscoring the need for truly robust risk estimation. Likewise, most models are restricted to patients with reduced left-ventricular ejection fraction (EF), thus excluding the many HF patients with preserved EF. The Meta-Analysis Global Group in Chronic Heart Failure (MAGGIC) offers a major opportunity to create a prognostic model for HF patients with both reduced and preserved EF. With more than 10 million ECGs per year in Medicare patients alone, ECG is central to diagnostic cardiology. From these images, cardiologists typically derive a number of measurements and arrive at a final clinical interpretation, taking into account the patient's demographic characteristics and clinical history. This interpretation supports diagnosis, the prediction of patient outcomes, and ultimately treatment decisions. With the growing volume of acquired image data and derived measurements, clinicians may no longer be able to interpret all of this information, particularly alongside the rapidly growing volume of electronic health record (EHR) data.
For example, at our institution more than 500 measurements are obtained from the echo, each echo consists of millions of images, and there are 76 high-level International Classification of Diseases, Tenth Revision (ICD-10) diagnostic codes within the category of “circulatory system disorders”. Given this volume of information and the limited time available to physicians for interpretation, the full potential of echo data is not realized in current clinical practice.
Machine learning (ML) can help overcome these hurdles by providing automated analysis of large clinical datasets. In fact, ML has gradually been entering cardiovascular research, with recent studies demonstrating its ability to 1) assist with challenging differential diagnoses, for instance restrictive cardiomyopathy vs. constrictive pericarditis and physiological vs. pathological cardiac hypertrophy; 2) analyze coronary computed tomography data; 3) predict in-hospital mortality in patients undergoing treatment for abdominal aortic aneurysm; and 4) identify distinct groups or ‘phenotypes’ of patients with heart failure with preserved ejection fraction. These studies demonstrated the promise of ML in cardiovascular medicine. However, sample sizes in these studies are typically too small for ML to learn from many features, especially given its tendency to overfit small datasets. Likewise, most of these studies probably used comprehensive, fully curated datasets, which may not reflect the practical clinical applications that would be possible with routine EHR data.
To date, there has been no effort to use ML to exploit routine clinical and echo data to help clinicians predict outcomes in the large populations of patients undergoing echo during routine clinical care. We hypothesized that nonlinear ML techniques would be more accurate than conventional linear mortality models, and that using all measurements obtained from routine echo would give better predictive accuracy than standard clinical data (including established clinical risk scores) and physician-reported left ventricular ejection fraction (LVEF).
The main contributions of this paper are:
• In the proposed work, we use two feature selection methods, namely a correlation-based method and a ranking-based method.
• In the correlation-based method, we apply correlation-based feature selection (CFS) to select the best features, and in the ranking-based method we use Information Gain, Gain Ratio, and Gini Index to select the best features.
• In both cases, after feature selection, we apply MK-SVM classifiers to predict patient survival and rank the features corresponding to the most critical risk factors.
• Treatment decisions based on the patient's condition could be supported by this system's guidance.
The rest of the paper is structured as follows: Section 2 reviews existing work in this area. Section 3 details the proposed approach, including the feature selection methods and the dataset (Section 3.4). Section 4 presents the MK-SVM in detail. Section 5 presents the performance analysis, and Section 6 concludes the work and discusses its future scope.
2 Literature Review
We have reviewed some of the existing works in this area. One study evaluated the performance of the Seattle Heart Failure Model (SHFM) using EHRs at Mayo Clinic and sought to develop a risk prediction model using ML techniques applied to routine clinical care data. Tripoliti et al. aimed to present the state of the art in ML methodologies applied to the assessment of heart failure. Hsich et al. discussed random survival forests (RSF), a robust approach to variable selection, in a large clinical cohort. Austin et al. compared modern classification and regression methods with classification and regression trees for classifying HF patients into one of two mutually exclusive subtypes (HFPEF vs. HFREF) and for predicting the probability of HFPEF. Samad et al. presented a predictive study showing how ML could help identify the patients at highest risk of mortality and the most important variables associated with this elevated risk. Kalscheur et al. used an ML algorithm to develop a model for predicting outcomes after cardiac resynchronization therapy (CRT). Taslimitehrani et al. proposed a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)), with a probabilistic loss function, to develop and validate prognostic risk models that predict 1-, 2-, and multi-year survival in HF using EHR data from Mayo Clinic. Venkatesh et al. tested the ability of RSF, an ML method, to predict six cardiovascular outcomes in comparison with standard cardiovascular risk scores. Chicco et al. analyzed a dataset of 299 patients with HF collected in 2015 and applied several ML methods both to predict patient survival and to rank the features corresponding to the most important risk factors. Kwon et al. aimed to develop and validate a deep-learning-based algorithm for predicting mortality in acute HF (DAHF). Miao et al. developed a comprehensive risk model for predicting HF mortality with a high level of accuracy using an improved random survival forest (iRSF).
Guidi et al. introduced a clinical decision support system (CDSS) for the analysis of HF patients that provides several outputs, such as an HF severity assessment, HF-type prediction, and a management interface that compares patients' successive follow-up visits. Alashban et al. aimed to classify blood glucose levels for coronary heart disease (CHD) risk factors using the highly imbalanced Framingham Heart Study dataset and to identify a suitable dietary plan that lowers the risk of developing CHD. Vankara et al. described a novel ensemble learning technique that achieves high heart disease prediction accuracy with minimal misclassification. Jiang et al. performed a three-tiered analysis integrating transcriptional data and pathway information to investigate the shared characteristics of HF arising from various aetiologies. Wang et al. showed that LV diastolic dyssynchrony (diastolic LVMD) parameters have important prognostic value for patients with dilated cardiomyopathy (DCM). Nascimento et al. assessed how four non-morphological feature selection techniques support ECG classification and proposed an improvement to the structural co-occurrence matrix (SCM) by combining it with the Fourier transform to extract the principal frequencies of the signal. CVDs are reported to cause more deaths globally than any other disease. Detecting the risk of CVD at a very early stage is therefore vital and increases patient survival rates. ML can play an outstanding role here, so in the current work we predict the survival chances of HF patients using attribute selection and ML. Ishaq et al. proposed an effective and efficient ML-based procedure for predicting the survival of heart patients.
The ML techniques used include logistic regression (LR), AdaBoost, random forest (RF), gradient boosting machine (GBM), Gaussian naive Bayes (G-NB), and SVM, and SMOTE is applied to handle the class imbalance problem. Another study aims to identify the best predictors of mortality among clinical, biochemical, and advanced echocardiographic parameters in acute heart failure (AHF) patients admitted to the coronary care unit (CCU). The purpose and novelty of a further study is to define a new personalized monitoring tool that exploits the time-varying nature of drug adherence within a joint modeling approach.
3 Proposed Methodology
To build a practical system, the choice of attributes and algorithms plays a vital role. Our work focuses on two attribute selection approaches combined with a set of ML algorithms. In the initial phase of both approaches, the entire dataset is considered. A subset of attributes is then selected according to its importance using the two different approaches, and the ML algorithms are trained on it. Fig. 1 illustrates the working of the HF prediction system.
3.1 Feature Selection (FS)
FS is one of the core concepts in ML, and it strongly influences the performance of any model. The features used to train an ML model largely determine the performance that can be achieved, and irrelevant or only partially relevant features can harm model performance, so FS should be the first and most important step of model design. FS is the process of automatically or manually selecting the features that contribute most to the prediction variable of interest. Including irrelevant features in the data can reduce model accuracy and cause the model to learn from irrelevant features. The main benefits of FS are:
• Reduces overfitting: less redundant data means fewer opportunities to make decisions based on noise.
• Improves accuracy: less misleading data means modeling accuracy improves.
• Reduces training time: fewer data points reduce algorithmic complexity, and algorithms train faster.
3.2 Scoring Method (SM)
Multiple SMs are available for selecting the best features with respect to the dependent (class) variable. The score assesses the worth of each feature for the classification task. Because class labels usually depend on feature values, the feature selection process is vital to the performance of the entire system.
3.2.1 Information Gain (IG)
IG, also known as mutual information, quantifies the reduction in entropy obtained by splitting a dataset according to the values of a given random variable. It works well in most cases, except when some variables have a large number of distinct values: IG is biased toward choosing attributes with many distinct values as root nodes [35,36].
Here $p_i$ denotes the probability of class i, and the entropy function is given in Eq. (1):
$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i \qquad (1)$$
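To make the entropy function concrete, the following sketch computes entropy and information gain for a discrete feature in plain Python. The authors used Orange; this is an independent illustration, and the toy lists are hypothetical rather than taken from the UCI data. Continuous attributes such as age or serum creatinine would first have to be discretized, a preprocessing step assumed here rather than described in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the weighted entropy of the subsets obtained by
    splitting on each distinct value of a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: a binary feature that perfectly separates the classes.
death_event = [1, 1, 0, 0]
binary_feature = [1, 1, 0, 0]
print(information_gain(binary_feature, death_event))
```

A feature that splits the classes perfectly recovers the full label entropy (1 bit for a balanced binary label), while a feature independent of the label scores 0.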
3.2.2 Gain Ratio (GR)
GR is a modification of IG that reduces its bias and is generally the preferred alternative. GR improves on IG by taking into account the number of branches a split would produce before performing it; that is, it normalizes IG by the intrinsic information of the split.
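A minimal sketch of gain ratio, assuming the usual definition of information gain divided by the split information (the entropy of the feature's own value distribution). The helper names are hypothetical, and the example contrasts an ID-like feature with a binary one to show how GR counteracts the many-values bias of IG.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of any sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    ig = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)  # intrinsic information of the split itself
    return ig / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]
id_like = [0, 1, 2, 3]   # a distinct value per row, like a record ID
binary = [1, 1, 0, 0]
print(gain_ratio(id_like, labels), gain_ratio(binary, labels))
```

Both features have an IG of 1 bit here, but the ID-like feature's four distinct values double the split information, halving its GR to 0.5.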
3.2.3 Gini Index (GI)
Also known as Gini impurity, it measures the probability that a randomly selected element would be classified incorrectly if it were labeled at random according to the class distribution. If all elements belong to a single class, the distribution is called pure.
Here $P_i$ represents the probability of an element being classified into class i, as in Eq. (2):
$$Gini = 1 - \sum_{i=1}^{c} P_i^2 \qquad (2)$$
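The sketch below computes Gini impurity and, as an assumption about how it is turned into a feature score, the decrease in impurity after splitting on a discrete feature. Function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(feature, labels):
    """Decrease in Gini impurity obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups.values())


print(gini([1, 1, 0, 0]), gini([1, 1, 1]), gini_gain([1, 1, 0, 0], [1, 1, 0, 0]))
```

A balanced binary label has the maximum two-class impurity of 0.5, a pure set has impurity 0, and a perfectly separating feature removes all of the impurity.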
By applying the scoring methods, the two best attributes are selected to optimize the CA and model training in the HF prediction system. Tab. 1 lists the selected attributes together with the scores that show their importance.
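Selecting the top attributes amounts to sorting features by score. The sketch assumes each scoring method yields a dictionary from feature name to score and applies the selection per method; the feature names and numbers in the example are hypothetical placeholders, not the values in Tab. 1.

```python
def top_k(scores, k=2):
    """Return the k feature names with the highest score; ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def select_per_method(method_scores, k=2):
    """Apply top_k separately to each scoring method (e.g. IG, GR, Gini)."""
    return {method: top_k(scores, k) for method, scores in method_scores.items()}


# Hypothetical scores for illustration only.
example = {
    "IG": {"feature_a": 0.30, "feature_b": 0.10, "feature_c": 0.20},
    "Gini": {"feature_a": 0.05, "feature_b": 0.07, "feature_c": 0.01},
}
print(select_per_method(example))
```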
3.3 Attribute Selection (AS)
AS typically consists of two parts: an attribute evaluator and a search method. A central problem in ML is identifying a representative set of features from which to construct a classification model for a particular task. CFS ranks attribute subsets according to a correlation-based heuristic evaluation function. The function favors subsets whose features are highly correlated with the class label but weakly correlated with each other. Irrelevant features should be ignored because they have low correlation with the class, and redundant features should be screened out because they are highly correlated with one or more of the remaining features. The merit of a subset S containing k features is
$$Merit_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$
where $\overline{r_{cf}}$ is the mean feature-class correlation and $\overline{r_{ff}}$ is the mean feature-feature intercorrelation. Best first search works on the principle of a greedy algorithm augmented with backtracking. In this work, the search stopped as stale after five non-improving node expansions; using the full training set option with death event as the target, the merit of the best subset found is 0.579. The attributes selected by this process are given in Tab. 2.
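The following plain-Python sketch shows the CFS merit function and a forward best first search that stops after five consecutive non-improving expansions, mirroring the stopping rule described above. It assumes the feature-class and feature-feature correlations have already been computed by whatever correlation measure is used; the three features and their correlations in the example are hypothetical.

```python
import math
from itertools import combinations


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a feature subset: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = list(combinations(sorted(subset), 2))
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def best_first_forward(features, r_cf, r_ff, max_stale=5):
    """Forward best first search; stops after `max_stale` consecutive expansions
    that do not improve on the best merit found so far."""
    start = frozenset()
    open_list = [(0.0, start)]        # (merit, subset) pairs awaiting expansion
    visited = {start}
    best, best_merit = start, 0.0
    stale = 0
    while open_list and stale < max_stale:
        open_list.sort(key=lambda item: item[0])
        _, subset = open_list.pop()   # expand the highest-merit subset
        improved = False
        for f in features:
            child = subset | {f}
            if child in visited:      # also skips features already in the subset
                continue
            visited.add(child)
            merit = cfs_merit(child, r_cf, r_ff)
            open_list.append((merit, child))
            if merit > best_merit:
                best, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return best, best_merit


# Hypothetical correlations: f1 and f2 are both predictive but highly redundant.
r_cf = {"f1": 0.6, "f2": 0.5, "f3": 0.1}
r_ff = {frozenset({"f1", "f2"}): 0.9,
        frozenset({"f1", "f3"}): 0.0,
        frozenset({"f2", "f3"}): 0.0}
best, merit = best_first_forward(["f1", "f2", "f3"], r_cf, r_ff)
print(sorted(best), round(merit, 3))
```

With these numbers, adding f2 to f1 lowers the merit from 0.6 to about 0.56 because the two are highly intercorrelated, so the search returns {f1} alone. This illustrates how CFS penalizes redundancy rather than simply keeping every individually predictive feature.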
3.4 Dataset Description
The HF clinical records dataset used in this work is a benchmark dataset available in the UCI ML repository. The original dataset was published in 2017 by Tanvir Ahmad et al. of Government College University, Faisalabad, Pakistan, and in January 2020 it was updated by Davide Chicco of the Krembil Research Institute, Toronto, Canada. It contains 13 clinical features, of which 12 are used to predict the class label, death event. The dataset has 299 instances and no missing values. The twelve feature values act as the deciding factors for whether a patient with HF died.
• Age – Mentioned in Years
• Anemia – Hemoglobin deficiency indicated in Boolean
• High Blood Pressure – Hypertension indicated in Boolean
• Creatinine Phosphokinase (CPK) – Enzyme level in blood given in mcg/L
• Diabetes – Whether the patient has diabetes, indicated in Boolean
• Ejection Fraction – Percentage of blood leaving the heart at each contraction
• Platelets – Count of the blood cells that help in clotting, expressed in kiloplatelets/mL
• Sex – Woman/Man indicated in Binary
• Serum Creatinine – Level of creatinine in the blood, an indicator of renal function, in mg/dL
• Serum Sodium – Level of sodium in the blood in mEq/L
• Smoking – Whether the patient smokes, indicated in Boolean
• Time – Follow-up period in days
• Death Event – Whether the patient died during the follow-up period, indicated in Boolean; used as the class label.
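A minimal standard-library loader for the record structure listed above. The column names (for example `anaemia`, `ejection_fraction`, `DEATH_EVENT`) are an assumption about the header of the UCI CSV file rather than details stated in the text.

```python
import csv

# Assumed CSV column names for the 12 predictors and the class label.
FEATURES = ["age", "anaemia", "creatinine_phosphokinase", "diabetes",
            "ejection_fraction", "high_blood_pressure", "platelets",
            "serum_creatinine", "serum_sodium", "sex", "smoking", "time"]
LABEL = "DEATH_EVENT"


def load_records(lines):
    """Parse CSV text (a file object or any iterable of lines) into a list of
    12-element float feature vectors and a list of 0/1 death-event labels."""
    X, y = [], []
    for row in csv.DictReader(lines):
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(float(row[LABEL])))
    return X, y
```

Called on an open file handle for the CSV (a file name such as `heart_failure_clinical_records_dataset.csv` is assumed), it should return 299 feature vectors of length 12, according to the description above.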
The values of each feature help identify whether a patient is prone to death. These data can greatly aid in taking the necessary action during the subsequent diagnosis of other patients with similar symptoms. Data analysis applied to medicine and treatment has enormous potential. Handling the vast existing data and applying suitable algorithms together with a feature selection approach can make the system reliable and yield good performance measures.
4 The Multi-Kernel Support Vector Machine (MK-SVM)
The different sets of attributes obtained from the two approaches (Tabs. 1 and 2) are used to train the SVM. Sampling with 10-fold stratified cross-validation is used in the training phase. An SVM is a supervised ML model that uses classification algorithms for two-group classification problems. After an SVM model is given labeled training data for each class, it can classify new instances.
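The 10-fold stratified cross-validation mentioned above can be sketched without any library: the indices of each class are shuffled and dealt round-robin into folds, so every test fold keeps roughly the overall class ratio. The function name and seed are illustrative; the authors' tool handles this splitting internally.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds that preserve class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal this class's indices round-robin, continuing where the previous
        # class stopped so that fold sizes differ by at most one.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test
```

With 299 patients and k = 10, each test fold holds 29 or 30 records, and every record is used for testing exactly once.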
The main goal of the SVM algorithm is to find a hyperplane in an N-dimensional space (N is the number of features) that distinctly classifies the data points. SVM algorithms use a set of mathematical functions called kernels. A kernel takes data as input and transforms it into the required form, and different SVM variants use different kernel functions. Here we consider four SVM kernels: linear, polynomial, RBF, and sigmoid.
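To show where the kernel enters, the sketch below evaluates a trained kernel SVM's decision function f(x) = Σ αᵢyᵢK(xᵢ, x) + b. The support vectors, dual coefficients, and bias in the example are hypothetical values chosen by hand rather than the output of training; swapping the `kernel` argument is all that distinguishes the four SVM variants.

```python
def linear(u, v):
    """Linear kernel: the dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_function(x, support_vectors, dual_coefs, bias, kernel=linear):
    """f(x) = sum_i c_i * K(sv_i, x) + b, where c_i = alpha_i * y_i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def predict(x, support_vectors, dual_coefs, bias, kernel=linear):
    """Predicted class: +1 if f(x) >= 0, otherwise -1."""
    return 1 if decision_function(x, support_vectors, dual_coefs, bias, kernel) >= 0 else -1


# Hypothetical model with two support vectors, one per class.
svs = [(1.0, 1.0), (-1.0, -1.0)]
coefs = [0.5, -0.5]
print(decision_function((2.0, 2.0), svs, coefs, 0.0), predict((-3.0, 0.5), svs, coefs, 0.0))
```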
4.1 Linear Kernel SVM
It is used for linearly separable data, that is, data that can be divided into two groups by a single straight line (a hyperplane in higher dimensions). Among all separating hyperplanes, it chooses the one with the largest margin to the nearest training points of each class, which improves its generalization ability.
The linear kernel is defined as the dot product of two vectors z1 and z2, as in Eq. (3):
$$K(z_1, z_2) = z_1^\top z_2 \qquad (3)$$
4.2 Polynomial Kernel SVM
It is a generalization of the linear kernel.
The polynomial kernel is defined as
$$K(z_1, z_2) = (z_1^\top z_2 + 1)^e \qquad (4)$$
where e is the degree of the polynomial and z1 and z2 are the input vectors.
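The linear and polynomial kernels can be written directly in Python. The offset c = 1 in the polynomial kernel is an assumption (a common default), since the text only names the degree e; setting e = 1 and c = 0 recovers the linear kernel, which is the sense in which the polynomial kernel generalizes it.

```python
def linear_kernel(z1, z2):
    """Linear kernel: dot product of z1 and z2."""
    return sum(a * b for a, b in zip(z1, z2))


def polynomial_kernel(z1, z2, e=2, c=1.0):
    """Polynomial kernel (z1 . z2 + c) ** e; the offset c is an assumed parameter."""
    return (linear_kernel(z1, z2) + c) ** e


z1, z2 = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(z1, z2), polynomial_kernel(z1, z2), polynomial_kernel(z1, z2, e=1, c=0.0))
```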
4.3 RBF SVM
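The Gaussian RBF kernel implicitly maps the data into a higher-dimensional space in which many nonlinear problems become linearly separable:
$$K(z_1, z_2) = \exp\left(-\alpha \lVert z_1 - z_2 \rVert^2\right) \qquad (5)$$
where $\alpha = 1/(2\sigma^2) > 0$.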
The adjustable parameter σ has a significant impact on the kernel's performance and should be tuned to the specific problem at hand. If σ is overestimated, the exponential behaves almost linearly and the higher-dimensional projection loses its nonlinear power. Conversely, if σ is underestimated, the model lacks regularization and the decision boundary becomes highly sensitive to noise in the training data. As a radial basis function, its value depends only on the distance from the origin or from a specific point (here, the distance between z1 and z2).
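To illustrate the role of σ discussed above, this sketch implements the Gaussian kernel with α = 1/(2σ²) and evaluates it for two fixed points at three hypothetical bandwidths.

```python
import math


def rbf_kernel(z1, z2, sigma=1.0):
    """Gaussian kernel exp(-alpha * ||z1 - z2||^2) with alpha = 1 / (2 * sigma^2)."""
    alpha = 1.0 / (2.0 * sigma ** 2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-alpha * sq_dist)


for sigma in (0.1, 1.0, 10.0):
    print(sigma, rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma))
```

For the points (0, 0) and (1, 1), the kernel value is roughly 4e-44 at σ = 0.1, 0.37 at σ = 1, and 0.99 at σ = 10. A very small σ makes every pair of distinct points look dissimilar (risking overfitting to noise), while a very large σ makes all points look alike, so the model behaves almost linearly.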
4.4 Sigmoid SVM
This kernel uses the tanh function and can be viewed as a proxy for a neural network, as shown in Eq. (6):
$$K(z_1, z_2) = \tanh\left(\alpha\, z_1^\top z_2 + d\right) \qquad (6)$$
where α is the slope and d is the intercept constant. A common value for α is 1/N, where N is the data dimension. The kernel is analogous to a two-layer perceptron neural network model, in which tanh serves as the activation function of the artificial neurons.
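A sketch of the sigmoid kernel with the default α = 1/N suggested in the text; the default intercept d = 0 is an assumption. Unlike the linear, polynomial, and RBF kernels, the sigmoid kernel is not positive semidefinite for every choice of α and d, so it is not a valid Mercer kernel for all parameter settings.

```python
import math


def sigmoid_kernel(z1, z2, alpha=None, d=0.0):
    """tanh(alpha * <z1, z2> + d); alpha defaults to 1/N, N being the data dimension."""
    if alpha is None:
        alpha = 1.0 / len(z1)
    return math.tanh(alpha * sum(a * b for a, b in zip(z1, z2)) + d)


print(sigmoid_kernel([1.0, 1.0], [1.0, 1.0]))
```

Here N = 2, so α = 0.5 and the dot product 2 yields tanh(1.0); the output always lies strictly between -1 and 1.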
5 Performance Analysis
The classification performance of an ML model is characterized using the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). These counts form the basis for measuring the model's classification quality, and all four are taken from the confusion matrix on the evaluation data. The experiments were run on a PC with 8 GB RAM using the Orange data mining software. Eqs. (7–10) provide ways of computing the metrics that illustrate classifier performance.
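The text does not reproduce Eqs. (7–10), so the sketch below uses the standard definitions of common metrics derived from the four counts; which of them correspond to Eqs. (7–10) is not stated here. The counts in the usage line are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts; empty denominators give 0.0."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}


print(classification_metrics(tp=8, tn=10, fp=2, fn=0))
```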
5.1 Evaluation Results in SM
5.1.1 Confusion Matrix (CM)
A confusion matrix is a tabular summary that compares actual and predicted class labels. In Fig. 2, the Linear SVM model classifies 240 instances correctly and misclassifies 59. The Polynomial SVM (Fig. 3) has 259 correctly classified and 40 misclassified instances. Similarly, the RBF SVM (Fig. 4) classifies 250 instances correctly and misclassifies 49. The Sigmoid SVM (Fig. 5) classifies 246 instances correctly and 53 wrongly.
5.1.2 ROC Plot
ROC curves, which plot sensitivity against 1 − specificity over all possible decision thresholds, show a classifier's trade-off between true positives and false positives. For a given specificity, higher sensitivity implies better performance. The area under the ROC curve (AUC) is a widely used metric of classifier quality. Sensitivity is the proportion of actual positive cases that the model predicts correctly; it is also called recall or true positive rate.
Specificity, on the other hand, is the proportion of actual negative cases that the model predicts correctly. With SM feature selection, SVM-RBF has the highest AUC among the methods, according to the ROC curves. Fig. 6 shows the ROC curves for the SM with the various classification approaches.
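The AUC used to compare the kernels can be computed without plotting, because it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half). The sketch below uses that rank formulation; the scores in the example are hypothetical classifier outputs.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of cases, which is negligible for 299 patients; it gives the same value as the trapezoidal area under the empirical ROC curve.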
5.2 Evaluation Results in CFS-BFS
5.2.1 Confusion Matrix (CM)
In the Linear SVM model (Fig. 7), 239 instances are correctly classified and 60 are misclassified. The Polynomial SVM model (Fig. 8) has 249 correctly classified instances and 50 misclassifications. Similarly, the RBF SVM (Fig. 9) has 250 correctly classified and 49 misclassified instances, and the Sigmoid SVM (Fig. 10) likewise classifies 250 instances correctly with 49 misclassified cases.
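As an arithmetic check, the correct and misclassified counts reported for Figs. 2–5 and Figs. 7–10 reproduce the classification accuracies given in the abstract, and every model accounts for all 299 instances.

```python
# Correct / misclassified counts as reported in the text (Figs. 2-5 and 7-10).
REPORTED = {
    "SM": {"Linear": (240, 59), "Polynomial": (259, 40),
           "RBF": (250, 49), "Sigmoid": (246, 53)},
    "CFS-BFS": {"Linear": (239, 60), "Polynomial": (249, 50),
                "RBF": (250, 49), "Sigmoid": (250, 49)},
}


def accuracy_table(reported):
    """Accuracy in percent, rounded to one decimal, for each (method, kernel)."""
    return {(method, kernel): round(100 * correct / (correct + wrong), 1)
            for method, models in reported.items()
            for kernel, (correct, wrong) in models.items()}


for (method, kernel), acc in accuracy_table(REPORTED).items():
    print(f"{method:8s} {kernel:11s} {acc:.1f}%")
```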
5.2.2 ROC Curve
Fig. 11 shows the ROC curves for the CFS-BFS with the various classification approaches. With CFS-BFS feature selection, SVM-RBF has the highest AUC among the methods, according to the ROC curves.
5.3 Comparison of SM and CFS-BFS with MK-SVM
Comparing the two AS approaches across the classifier models (Tab. 3) shows that SM yields the best accuracy. With SM, the SVM with a polynomial kernel achieves the highest accuracy (86.6%), whereas with CFS-BFS the RBF and sigmoid kernels perform best (83.6%); overall, the SVM method is advisable. The results vary with the dataset and approaches used. Fig. 12 presents the CA comparison between the SM and CFS-BFS methods.
6 Conclusion and Future Work
ML applied to health care can provide an early indication of a patient's criticality. Various parameters contribute to illness, and present-day lifestyles and eating habits greatly affect people's health. Selecting the attributes that support accurate decision-making is the need of the hour. Predicting HF by combining various related parameters with expert knowledge is a common approach in the healthcare industry. Making accurate decisions from a few selected parameters and then applying ML is the hybrid method on which the system is built. The mortality rate due to HF is high, and accurate systems are needed to detect it at an earlier stage. In this work, we combined ML with two types of AS approaches. In the first approach, an SM combination of IG, GR, and GI is used. In the second approach, CFS with BFS is used. The attributes selected by each process are applied to MK-SVM (linear, polynomial, RBF, sigmoid). Based on our experimental results, the SM approach performs better than the CFS-BFS approach for predicting HF. As future work, we plan to use bio-inspired feature selection and deep learning to obtain optimized results. Concentrating more on the parameters and consulting medical experts will provide more insight into the structure of the collected data. Basic awareness about diet and lifestyle will help reduce deaths due to HF. Avoiding fast food, smoking, and alcohol and maintaining a calm state of mind could be promoted through public health messages to reduce casualties.
Acknowledgement: We would like to thank Prince Sattam Bin Abdulaziz University, Alkharj, Saudi Arabia.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding this research work.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.