According to the World Health Organization, cardiovascular diseases such as heart arrhythmia and heart failure account for 31% of deaths worldwide. Early diagnosis of heart problems may help in the timely treatment of patients and hence reduce the death rate. Heart sounds are good indicators of heart health when examined by an expert. Moreover, heart sounds can be analyzed with inexpensive and portable medical devices, so automatic heart sound classification can be very useful in diagnosing heart problems. The major focus of this research is to study the existing techniques for heart sound classification and to develop a more sophisticated method. A signal processing technique is proposed for heart sound classification. Five classifiers, Naive Bayes, Sequential Minimal Optimization (SMO), J48, REPTree and Random Forest (RF), are used in the experiments. Detailed experimentation is performed to fine-tune the method, and the results are compared with existing systems. The best-performing classifier achieves an overall accuracy of 91.33%.
Cardiac disease is one of the main causes of death around the world [
Heart disease is a major health problem and a leading cause of death throughout the world. Treatment of cardiac disease can be easier, more efficient and more cost-effective if the disease is diagnosed early and accurately. If the disease is detected early, it is easier to take suitable measures [
When dealing with complex heart disease data such as phonocardiogram (PCG) signals, it seems reasonable to use machine learning for both feature extraction and classification. Detection of cardiac disease is necessary for patients to survive. Owing to the lack of proper health facilities and experienced physicians in underdeveloped countries and areas, heart diagnosis is often not possible there. When a diagnosis does take place, the condition is often already critical. The proposed system will be able to categorize heartbeat recordings as normal or abnormal. This would help physicians and untrained people to perform an initial screening for heart disease [
In recent years, growing concern about health management and medical welfare has driven the rapid development of home medical instruments for health care and diagnosis in everyday life [
The objective of this paper is to develop a system that classifies heart sounds as normal or abnormal, so that the condition of the heart can be checked periodically at home and people do not have to wait for symptoms of a heart disorder to appear before approaching a cardiologist. A further aim of this work is to verify the feasibility of using power spectral density to calculate additional features from cardiac sound recordings that can be used for normal and abnormal cardiac sound classification.
Another aim is to evaluate different classification methods for the diagnosis of cardiovascular disease.
In the past, a great deal of work has been done to help cardiac patients survive, and different methodologies have been implemented for cardiac disease identification. The following work was done by researchers for the PhysioNet Challenge 2016.
Reference [
Reference [
Reference [
Reference [
Reference [
Reference [
Reference [
Reference [
Reference [
The phonocardiogram (PCG) and the electrocardiogram (ECG) are both measurements of heart activity that are used to distinguish normal from abnormal functioning of the heart. Although they measure different physical quantities, a long short-term memory network trained for the PhysioNet challenge using only the ECG recordings available in the MIT heart sounds database still yields a score of 74, compared with a score of 0.82 for a comparable network trained on the PCG data. This suggests that it may be valuable to build a transformational neural network that creates a synthetic ECG from a PCG. Such a transformational network would make it possible to harness the knowledge from many years of research on ECG classification to improve PCG classification [
Reference [
The author uses spectral information to derive a variety of features based on means, variance, and activity in different frequency bands. It is found that a major part of the information relating to abnormalities is captured in these features, with particularly good performance on murmurs. Finally, a model, specifically a random forest regressor, is built to classify new examples based on the previously described features. The final performance on the test data reached an accuracy of 81% [
Reference [
The proposed system has two stages of cardiac sound analysis: training and decoding. In the training phase, the input signal is split into overlapping frames and the short-time power spectral density (ST-PSD) of the heart sound signal is calculated. Because signals have different lengths, the number of frames also differs from signal to signal. The index-wise average over these frames is therefore calculated as a statistical feature that collapses the time axis, and classification is then applied. In the decoding phase, features are extracted from the short-time PSD of the heart sound, and the classification rules learnt in the training phase are applied to classify the signal.
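The framing step can be illustrated with a minimal sketch. The function name `frame_signal`, the window size of 256 samples and the hop of 128 samples (50% overlap) are assumptions made for illustration, not settings stated in the text; trailing samples that do not fill a whole frame are simply dropped.

```python
import numpy as np

def frame_signal(signal, window_size=256, hop=128):
    """Split a 1-D signal into overlapping frames (one frame per row).

    window_size and hop are illustrative values (assumptions); samples
    at the end that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < window_size:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (signal.size - window_size) // hop
    starts = hop * np.arange(n_frames)
    return np.stack([signal[s:s + window_size] for s in starts])

# Two recordings of different length give different numbers of frames.
short_frames = frame_signal(np.zeros(1000))
long_frames = frame_signal(np.zeros(5000))
print(short_frames.shape, long_frames.shape)  # (6, 256) (38, 256)
```

Because the number of rows depends on the recording length, a later step has to reduce the frame axis before a fixed-length feature vector can be passed to a classifier.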
The frequency domain shows how the energy of the signal is distributed over a range of frequencies. To convert a signal to the frequency domain, the discrete Fourier transform (DFT) is used. The DFT provides the spectrum of the signal and can be calculated using
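A standard form of the DFT for a length-$N$ signal is given here for reference. The symbols $x[n]$, $X[k]$ and $N$ are assumed for illustration and may differ from the notation used elsewhere in the paper.

```latex
X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2 \pi k n / N}, \qquad k = 0, 1, \dots, N-1
```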
A fast Fourier transform (FFT) is used to implement the DFT, which significantly reduces the computational complexity. If
The DFT calculated in
Power spectral density (PSD) shows the power distribution of a signal in the frequency domain. It reveals strong variations across different frequency ranges, which can be valuable for further analysis. We divide the signal into overlapping windows and calculate the PSD for each window. This short-time PSD (ST-PSD) provides significant information about different parts of the signal, and features are extracted from it to gain insight into the whole signal.
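A short-time PSD of this kind might be sketched as follows. The estimator used here is a plain periodogram (squared FFT magnitude divided by the window length), which is an assumption, since the text does not say which PSD estimator was used. The function name, the hop of 128 samples and the 2000 Hz sampling rate of the toy signal are likewise illustrative.

```python
import numpy as np

def short_time_psd(signal, window_size=256, hop=128):
    """Periodogram PSD estimate of each overlapping window (one row per window)."""
    signal = np.asarray(signal, dtype=float)
    # Build every length-window_size window, then keep every hop-th one.
    windows = np.lib.stride_tricks.sliding_window_view(signal, window_size)[::hop]
    spectrum = np.fft.fft(windows, axis=1)
    return np.abs(spectrum) ** 2 / window_size

# Toy input: a 50 Hz tone sampled at an assumed 2000 Hz for one second.
fs = 2000
t = np.arange(fs) / fs
st_psd = short_time_psd(np.sin(2 * np.pi * 50 * t))

# The strongest bin of the first window should lie close to 50 Hz.
freqs = np.fft.fftfreq(256, d=1 / fs)
peak_hz = abs(freqs[np.argmax(st_psd[0])])
print(st_psd.shape, peak_hz)
```

The full two-sided FFT is kept so that each row has as many entries as the window has samples, matching the statement that the feature vector length equals the window size.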
Let
The PSD of the signal is given by the Fourier transform of the autocorrelation of the signal and is denoted as:
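In standard discrete-time notation this relation (the Wiener-Khinchin theorem) reads as below. The symbols $x[n]$, $R_{xx}[k]$ and $S_{xx}(f)$, with $f$ in cycles per sample, are assumed for illustration and may differ from the paper's own notation.

```latex
R_{xx}[k] = \mathrm{E}\left\{ x[n] \, x^{*}[n-k] \right\}, \qquad
S_{xx}(f) = \sum_{k=-\infty}^{\infty} R_{xx}[k] \, e^{-j 2 \pi f k}
```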
To calculate the feature vector of a heart sound signal, the short-time PSD of each window is used. The element-wise mean of the PSDs of all windows is taken and used as the feature vector. This feature vector has a length equal to the window size.
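The effect of the element-wise mean can be seen in a small sketch. The random matrices below are stand-ins for real ST-PSD output, and the function name `mean_psd_feature` is hypothetical.

```python
import numpy as np

def mean_psd_feature(st_psd):
    """Element-wise (index-wise) mean over windows: one value per PSD bin."""
    return np.asarray(st_psd, dtype=float).mean(axis=0)

rng = np.random.default_rng(0)
# Stand-ins for the ST-PSD of two recordings: 6 and 38 windows of size 256.
psd_short = rng.random((6, 256))
psd_long = rng.random((38, 256))

# Both recordings map to a vector of length 256, so they can be stacked
# into one training matrix regardless of their duration.
features = np.stack([mean_psd_feature(psd_short), mean_psd_feature(psd_long)])
print(features.shape)  # (2, 256)
```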
Feature vector for the signal based on short-time Power Spectral Density
where
A decision tree algorithm is used to classify the heart sound signals into normal and abnormal categories. The decision tree generates rules by splitting the values of each attribute into two parts, choosing the split with the highest information gain (the largest reduction in entropy). Feature vectors of all training signals from the normal and abnormal classes are calculated and labelled accordingly.
Han and Kamber (2006) characterize a decision tree as a flowchart-like tree structure in which each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Once trained, a decision tree yields rules in if-then-else form. These rules serve as the trained model and can be applied to the test signals to measure the accuracy of the system.
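How a single if-then-else rule can be derived from one attribute is sketched below. This is a simplified illustration of an entropy-based binary split, not the exact procedure of any particular decision tree implementation; the function names and the toy data are assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_threshold(values, labels):
    """Binary split of one attribute that maximises information gain."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    uniq = np.unique(values)
    for t in (uniq[:-1] + uniq[1:]) / 2:  # midpoints between sorted values
        left, right = labels[values <= t], labels[values > t]
        remainder = (left.size * entropy(left)
                     + right.size * entropy(right)) / labels.size
        if base - remainder > best_gain:
            best_gain, best_t = base - remainder, float(t)
    return best_t, best_gain

# Toy attribute: one PSD-bin value per recording, with made-up labels.
x = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
threshold, gain = best_threshold(x, y)
print(f"if x <= {threshold:.2f} then normal else abnormal (gain {gain:.2f} bits)")
```

A full tree repeats this search over all attributes at each node and recurses on the two resulting subsets.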
A 10-fold cross-validation strategy is used to measure the performance of the technique. The dataset is randomly split into 10 parts; 9 parts are used for training while 1 part is used for testing. According to [
To measure the performance of the algorithm, sensitivity, specificity, and accuracy are used. Accuracy is a good performance measure when the number of examples in the different classes is balanced. Because our classes are imbalanced, the other performance measures are also used. Sensitivity is the true positive rate, i.e., the proportion of positive rows that are correctly identified.
where TP is true positive and FN is false negative.
Specificity is the true negative rate i.e., the proportion of negative rows that are correctly identified.
where FP is false positive and TN is true negative.
Accuracy is measured as follows:
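In terms of the confusion-matrix counts TP, TN, FP and FN defined above, the standard definitions of the three measures are:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```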
To measure the performance of the proposed technique, 10-fold cross-validation is applied to the data set. Experiments show that the algorithm gives good results in terms of the above-mentioned performance measures. The following table provides the quantitative results of the algorithm.
Sensitivity | Specificity | Overall accuracy |
---|---|---|
60% | 93% | 84.94% |
It can be seen from the
A number of experiments are performed in this study to fine-tune the algorithm and obtain more robust results. The experiments were conducted on the full training dataset containing 3242 files, and 10-fold cross-validation was used to randomly sample the training and test sets.
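A random 10-fold split over 3242 recordings can be sketched as follows. This is a plain, non-stratified split written for illustration; the function name `k_fold_indices` and the seed are assumptions, and the experiments themselves may have relied on a toolkit's built-in cross-validation.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a random k-fold split."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

splits = list(k_fold_indices(3242, k=10))
test_sizes = [len(test) for _, test in splits]
print(test_sizes)  # two folds of 325 files and eight folds of 324
```

Each file appears in exactly one test fold, so every recording is used for testing once and for training nine times.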
In our first experiment, the effect of different window sizes is studied, and the window size with the best results is selected for further experiments. In Section 3.2, overlapping windows are used to calculate the ST-PSD. This experiment tries to find the optimal window size for this problem. Four different window sizes are used to measure and evaluate the performance of the proposed system.
It can be seen that the accuracy of the technique increases as the window size increases, but at a window size of 256 the improvement in results becomes negligible. Hence a window size of 256 is selected.
In Section 3.2, the mean is used to calculate the features from the PSD. In this experiment, other well-known statistical parameters are used instead of the mean to calculate the feature vector.
In the previous step, a window size of 256 gave the highest accuracy, and the accuracy is stable at that size. With the window size fixed at 256, we ran experiments with five statistical parameters: mean, median, variance, standard deviation, and MAD (median of absolute differences from the median).
It can be seen from the figure that calculating features using MAD gives the best results in terms of accuracy.
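The five statistics can be computed per PSD bin as in the sketch below. The function name `aggregate_psd` and the toy matrix are illustrative, and the use of the population variance and standard deviation (divisor $n$ rather than $n-1$) is an assumption, since the text does not specify which form was used.

```python
import numpy as np

def mad(a, axis=0):
    """Median of absolute differences from the median along an axis."""
    med = np.median(a, axis=axis, keepdims=True)
    return np.median(np.abs(a - med), axis=axis)

# Candidate statistics for collapsing the window axis of an ST-PSD matrix.
STATISTICS = {
    "mean": lambda a: a.mean(axis=0),
    "median": lambda a: np.median(a, axis=0),
    "variance": lambda a: a.var(axis=0),   # population variance (assumption)
    "std": lambda a: a.std(axis=0),        # population std (assumption)
    "mad": mad,
}

def aggregate_psd(st_psd, statistic="mean"):
    """Reduce a (windows x bins) ST-PSD matrix to one feature per bin."""
    return STATISTICS[statistic](np.asarray(st_psd, dtype=float))

# Toy ST-PSD with 3 windows and 2 bins.
toy = [[1, 10], [2, 20], [6, 60]]
for name in STATISTICS:
    print(name, aggregate_psd(toy, name))
```

Every statistic returns a vector of the same length, so the feature vectors remain interchangeable inputs for the classifier.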
Five classifiers, Naive Bayes, Sequential Minimal Optimization (SMO), J48, REPTree and Random Forest (RF), are used for this experiment. In this stage, the selected features are used to classify the signals into their predefined (normal/abnormal) classes. Several classifiers are used in this study in order to find the one that best suits the challenge. The classifiers were trained and tested separately on both Physionet2016 datasets using 10-fold stratified cross-validation.
In this section the evaluation and assessment of the proposed system is carried out with the help of five different classifiers.
It can be seen from
In this section, we present a comparison of our proposed methodology with the existing state of the art. The experimental results of the proposed framework and relevant approaches are presented in
Classification approach | Sensitivity | Specificity |
---|---|---|
[ | 86.91% | 84.90% |
[ | 88.48% | 80.48% |
[ | 76.5% | 93.1% |
[ | 84.8% | 77.6% |
[ | 94% | 77% |
Proposed approach | 69% | 96% |
The evaluation and assessment of the proposed system is carried out with the help of three different experiments on the available data set. The experiments were conducted to determine the effectiveness, robustness and performance of the overall classification system, and their goal is to demonstrate the classification accuracy of the proposed framework. A methodology for extracting noteworthy features for cardiac disease detection has been presented, in which random forest achieves a better accuracy than the other classifiers. With random forest, 2961 of the 3242 instances are correctly classified, which corresponds to the reported accuracy of 91.33%. The proposed system is evaluated on the delivered results as given in