Automated Deep Learning Based Cardiovascular Disease Diagnosis Using ECG Signals

Automated biomedical signal processing has become essential for identifying indicators of diseased states. At the same time, recent developments in artificial intelligence (AI) make it possible to manage and analyze massive biomedical datasets, supporting clinical decisions and real-time applications. Such techniques are well established for medical imaging; however, recognition of 1D biomedical signals still needs improvement. The electrocardiogram (ECG) is one of the most widely used 1D biomedical signals and is used to diagnose cardiovascular diseases. Computer-assisted diagnostic models find it difficult to classify 1D ECG signals automatically owing to their time-varying dynamics and diverse profiles. To resolve these issues, this study designs an automated deep learning based 1D biomedical ECG signal recognition model for cardiovascular disease diagnosis (DLECG-CVD). The DLECG-CVD model involves several stages: pre-processing, feature extraction, hyperparameter tuning, and classification. In the initial stage, data pre-processing converts the ECG report into usable data and transforms it into a format compatible with further processing. Next, a deep belief network (DBN) model is applied to derive a set of feature vectors, and an improved swallow swarm optimization (ISSO) algorithm is used to tune the hyperparameters of the DBN model. Lastly, an extreme gradient boosting (XGBoost) classifier assigns class labels to the test ECG signals. To verify the diagnostic performance of the DLECG-CVD model, a set of simulations is carried out on the benchmark PTB-XL dataset. A detailed comparative study shows that the DLECG-CVD model outperforms existing techniques in terms of accuracy, sensitivity, specificity, kappa, Matthews correlation coefficient, and Hamming loss.


Introduction
Cardiovascular disease (CVD) is the leading cause of human mortality; it accounted for 31% of global deaths in 2016 [1], of which 85% were due to heart attack. The yearly burden of CVD on the American and European economies is estimated at $555 billion and €210 billion, respectively. Conventional CVD diagnosis depends on the individual's medical history and clinical investigations. The results are interpreted against a set of quantitative medical variables to classify the person according to the taxonomy of medical diseases. Clinically, cardiovascular diseases are frequently accompanied by arrhythmia. Severe arrhythmia can result in heart failure or sudden death [2]. Thus, accurate and timely recognition of arrhythmia is both necessary and urgent. The electrocardiogram (ECG), a 1-dimensional physiological signal that describes the state of the heart, is very important for detecting and diagnosing arrhythmia.
ECG analysis has become the fundamental tool for diagnosing cardiovascular pathology in the present century. The ECG signal reflects the electrical activity of the heart; therefore, a heart rhythm disorder or an alteration in the ECG waveform is evidence of underlying cardiovascular problems such as arrhythmias. Non-invasive arrhythmia diagnosis depends on the standard 12-lead ECG, which measures electrical potentials from ten electrodes placed at distinct locations on the body surface: six on the chest and four on the limbs. Early diagnosis is important for treating arrhythmias effectively. Globally, millions of ECG records are gathered every year, and most are analyzed and interpreted automatically by computers [3,4]. This creates a need for ECG interpretation approaches that are accurate and fast yet independent of the person and the device. The extensive digitization of ECG data, combined with the growth of deep learning (DL) techniques that can process large amounts of raw data, has opened new opportunities to improve automatic ECG interpretation. In fact, a deep neural network (DNN) has attained cardiologist-level classification performance when trained on a large (n = 91,232) dataset of raw ECG records. However, available ECG datasets are frequently much smaller, which makes it difficult to reach the required performance level.
Early recognition of certain kinds of transient, short-term, or infrequent arrhythmias requires long-term monitoring (over 24 h) of the electrical activity of the heart. The rapid growth of the digital industry has enabled improvements in data acquisition, devices, and computer-aided diagnosis (CAD) approaches. Open access to ECG databases has led to the development of many approaches and techniques for CAD-based ECG arrhythmia classification in the past few years, stimulating a productive cross-disciplinary effort in which physicists, engineers, and nonlinear dynamics scientists are no strangers [5]. Nearly every CAD ECG classification method includes four major phases: pre-processing of the ECG signal, heartbeat detection, feature extraction and selection (FS), and classifier construction. Recent years have seen significant developments in automated ECG interpretation methods. In particular, DL based techniques have attained or exceeded cardiologist-level performance for certain sub-tasks [6,7], or have enabled inferences that are highly difficult for cardiologists to make, for instance, accurately inferring age and gender from the ECG. Because of its apparent simplicity and lower dimensionality compared with imaging data, ECG classification has attracted considerable attention from the broader machine learning (ML) community, as documented in several studies.
This study designs an automated deep learning based 1D biomedical ECG signal recognition model for cardiovascular disease diagnosis (DLECG-CVD). The DLECG-CVD model involves several stages, namely pre-processing, feature extraction, hyperparameter tuning, and classification. A deep belief network (DBN) model, tuned by an improved swallow swarm optimization (ISSO) algorithm, is applied to derive a set of feature vectors. In addition, an extreme gradient boosting (XGBoost) classifier is employed to assign class labels to the test ECG signals. To examine the diagnostic performance of the DLECG-CVD model, a series of experiments is carried out on the benchmark PTB-XL dataset.
The key contributions of the paper are as follows.
▪ An efficient 1D biomedical ECG signal recognition model, DLECG-CVD, is presented for cardiovascular disease diagnosis. To the best of our knowledge, the DLECG-CVD model has not previously been presented in the literature.
▪ A novel ISSO based feature selection technique is introduced by incorporating the concept of Lévy flight into the SSO algorithm in order to avoid the local optima problem. The design of the ISSO algorithm constitutes the novelty of the work.
▪ The inclusion of the ISSO algorithm as a hyperparameter optimizer helps to improve the classification performance of the DLECG-CVD model on unseen data.
▪ A detailed experimental validation is carried out on the PTB-XL dataset, and the outcomes are examined under several dimensions.
The remainder of the paper is organized as follows. Section 2 reviews recent state-of-the-art ECG recognition techniques. Section 3 discusses the materials and methods involved in the proposed model. Section 4 presents the detailed experimental analysis, and Section 5 draws the conclusions.

Literature Review
This section surveys recently developed ECG recognition and classification models. In Li et al. [8], the morphology and rhythm of heartbeats are merged into a 2D data vector that is then processed by a convolutional neural network (CNN) incorporating biased dropout and adaptive learning rate approaches. The results demonstrate that the proposed CNN model detects abnormal heartbeats and arrhythmias effectively through automated feature extraction. Weimann and Conrad [9] used a deep CNN (DCNN) to classify raw ECG records. Training a CNN for ECG classification, however, frequently needs a large number of annotated instances, which are costly to obtain. They address this issue with transfer learning (TL): they first pretrain the CNN on a large public dataset of continuous raw ECG signals and then fine-tune the network on a smaller dataset for the classification of atrial fibrillation, one of the most common heart arrhythmias.
In Pandey et al. [10], an 11-layer DCNN model is presented to classify the MIT-BIH arrhythmia database into five classes according to the ANSI/AAMI standards. The CNN model implements a complete end-to-end classification structure and is applied without any separate denoising of the database. The main benefit of the method is that it avoids the need to detect and segment the QRS complex. To handle class imbalance, the minority classes of the MIT-BIH database were artificially over-sampled using the SMOTE method. Jeon et al. [11] presented a baseline model based on a recurrent neural network (RNN) for ECG classification. Moreover, they proposed a lightweight model with a fused RNN to speed up prediction on a CPU.
Shaker et al. [12] presented a new data augmentation model that uses a generative adversarial network (GAN) to restore the balance of the dataset. Two DL approaches, an end-to-end method and a two-stage hierarchical method, both based on DCNN, are used to eliminate hand-engineered features by combining feature extraction, feature reduction, and classification into a single learning method. In Nurmaini et al. [13], DL is applied in pre-training and fine-tuning stages to produce an automatic feature representation for multi-class classification of arrhythmia conditions. In the pre-training stage, stacked denoising autoencoders (DAE) and autoencoders (AE) are used for feature learning; in the fine-tuning stage, a DNN is designed as the classifier. Huang et al. [14] presented an accurate intelligent ECG classification model based on FCResNet. In this system, the MOWPT, which provides a comprehensive time-scale paving pattern and possesses time-invariance properties, is used to decompose the original ECG signal into sub-signal samples at various scales. Then, samples of five arrhythmia types are used as input to the FCResNet, by which the ECG arrhythmia types are recognized and categorized. Han et al. [15] developed a technique for constructing smoothed adversarial examples for ECG tracings that are imperceptible to human experts, and showed that a DL technique for detecting arrhythmia from single-lead ECG is vulnerable to adversarial attacks. Furthermore, they offer a general method to collate and perturb known adversarial examples to create many new ones.
Wang et al. [16] presented an automated ECG classification technique based on the continuous wavelet transform (CWT) and CNN. The CWT decomposes ECG signals to obtain different time-frequency components, and the CNN extracts features from the two-dimensional scalogram composed of these components. Since the interval between neighboring R peaks (the RR interval) is useful for diagnosing arrhythmia, four RR interval features are extracted and combined with the CNN features as input to a fully connected (FC) layer for ECG classification. Peimankar et al. [17] proposed a DL model for real-time segmentation of heartbeats. The presented DENS-ECG technique integrates long short-term memory (LSTM) and CNN models to detect the onset, peak, and offset of distinct heartbeat waveforms such as P-waves, QRS complexes, T-waves, and no wave (NW). Taking the ECG as input, the model learns to extract higher-level features through training, which, in contrast to traditional ML based techniques, removes the feature engineering stage. In Wang et al. [18], a new CNN with NCBAM is presented for automatic classification of ECG heartbeats. The technique consists of a 33-layer CNN framework followed by an NCBAM module. First, pre-processed ECG signals are fed to the CNN framework to extract spatial and channel features. Next, long-range dependencies of the representative features along the spatial and channel axes are captured using non-local attention. Lastly, the temporal, spatial, and channel information of the ECG is fused using a learned matrix.

Dataset Used
This study uses the PTB-XL dataset [19], which comprises 21837 ECG signals of 10 s duration from 18885 persons, of whom 52% are male and the remaining 48% are female. The annotations follow the SCP-ECG standard and are assigned to three non-mutually exclusive categories: diagnostic, form, and rhythm. In total, there are 71 distinct statements, which break down into 44 diagnostic, 12 rhythm, and 19 form statements; the counts sum to more than 71 because the categories overlap. Moreover, the PTB-XL data encompass five classes: normal ECG (NORM), conduction disturbance (CD), myocardial infarction (MI), hypertrophy (HYP), and ST/T changes (STTC). Furthermore, a total of 24 subclass labels are also provided.

Overall System Architecture
The overall system architecture of the presented model is illustrated in Fig. 1. As the figure shows, the input ECG signals are first pre-processed to convert them into a compatible format. Then, the DBN model is applied to extract a useful set of feature vectors, while the ISSO algorithm is employed to optimize the hyperparameters of the DBN model. Finally, the XGBoost classifier assigns class labels to the input ECG signals. These processes are briefly discussed in the following subsections.

Data Pre-Processing
For the experimental analysis, a collection of 3000 ECG signals is used. Because 35 of these ECG signals have NULL as their class label, they are removed from the dataset, and the remaining 2965 ECG records are used for simulation. In addition, of the two sampling rates available in the dataset (100 Hz and 500 Hz), the 100 Hz version is selected.
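The filtering step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout (dictionaries with `id`, `rate_hz`, and `label` keys) and both helper names are assumptions made for the example, and the toy records merely reproduce the counts given in the text.

```python
from typing import Dict, List


def drop_unlabelled(records: List[Dict]) -> List[Dict]:
    """Keep only records whose class label is present (not None)."""
    return [rec for rec in records if rec.get("label") is not None]


def select_sampling_rate(records: List[Dict], rate_hz: int = 100) -> List[Dict]:
    """Keep only the signal versions stored at the chosen sampling rate."""
    return [rec for rec in records if rec.get("rate_hz") == rate_hz]


# Toy stand-in for the 3000-record subset: 35 records carry no label.
# The field names are hypothetical, not the PTB-XL schema.
records = [
    {"id": i, "rate_hz": 100, "label": None if i < 35 else "NORM"}
    for i in range(3000)
]

usable = select_sampling_rate(drop_unlabelled(records), rate_hz=100)
print(len(usable))  # 2965
```

Filtering before any feature extraction keeps the later stages free of missing-label handling.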

Structure of DBN Model
Once the 1-D ECG signals are pre-processed, they are fed to the DBN model to extract the required feature vectors. The DBN presented by Hinton et al. [20] is a typical DNN method composed of several restricted Boltzmann machines (RBMs) and a classifier layer. Training a DBN consists of unsupervised pre-training of the stacked RBMs followed by supervised fine-tuning of the classifier layer. The DBN offers strong feature extraction performance and is well suited to learning features from data. It is a representative FC network that is simpler than other classic DNNs, which also makes rule extraction from, and rule insertion into, the DBN feasible for ECG signal recognition. Each RBM consists of a visible layer containing the visible units $v = \{v_1, v_2, \ldots, v_I\}$ and a hidden layer containing the hidden units $h = \{h_1, h_2, \ldots, h_J\}$. Given the model parameters of the DBN, $\theta = [W, b, a]$, the energy function is defined as follows.

$$E(v, h; \theta) = -\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij} v_i h_j - \sum_{i=1}^{I} b_i v_i - \sum_{j=1}^{J} a_j h_j$$

where $w_{ij}$ is the connection weight between visible unit $v_i$ and hidden unit $h_j$, $I$ and $J$ are the total numbers of visible and hidden units, and $b_i$ and $a_j$ are the bias terms of the visible and hidden units, respectively [21].
The conditional activation probabilities of the units are expressed through the logistic function δ, where δ(x) = 1/(1 + exp(−x)). The RBMs are trained to maximize the joint probability. A DBN is formed by stacking multiple RBMs, in which the output of the l-th layer (hidden units) is used as the input of the (l + 1)-th layer (visible units). The training process of the DBN comprises two stages: pre-training and fine-tuning. During the pre-training stage, the data are provided to the visible layer of the first RBM and transformed into its hidden layer, whose activations are in turn passed to the next RBM. Once the layer-by-layer unsupervised training is complete, the features learned automatically by the DBN from the provided data are fed to the classifier layer of the DBN. Lastly, fine-tuning is carried out on the classifier layer to enhance the DBN. Fig. 2 shows the architecture of the DBN model.
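The RBM quantities and the layer stacking described in this subsection can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the conditional probabilities use the standard RBM forms $P(h_j = 1 \mid v) = \delta(a_j + \sum_i w_{ij} v_i)$ and $P(v_i = 1 \mid h) = \delta(b_i + \sum_j w_{ij} h_j)$, the layer sizes are hypothetical, the weights are random rather than trained, and training itself is omitted.

```python
import numpy as np


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def rbm_energy(v, h, W, b, a):
    """E(v, h) = -v^T W h - b^T v - a^T h (b: visible bias, a: hidden bias)."""
    return float(-(v @ W @ h) - b @ v - a @ h)


def hidden_probs(v, W, a):
    """P(h_j = 1 | v) for every hidden unit j (standard RBM form)."""
    return sigmoid(v @ W + a)


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit i (standard RBM form)."""
    return sigmoid(W @ h + b)


def dbn_forward(v, layers):
    """Pass an input through stacked RBMs: the hidden activations of
    layer l serve as the visible input of layer l + 1."""
    x = v
    for W, a in layers:
        x = hidden_probs(x, W, a)
    return x


# Hypothetical sizes: 1000 input samples (10 s at 100 Hz, single lead),
# followed by two hidden layers. Weights are random, not trained.
rng = np.random.default_rng(0)
layer_sizes = [1000, 64, 16]
layers = [
    (rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
signal = rng.random(1000)
features = dbn_forward(signal, layers)
print(features.shape)  # (16,)
```

In a full pipeline the final activations would be the feature vector handed to the classifier stage.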

Design of ISSO Algorithm for Hyperparameter Optimization
To improve the performance of the DBN model, its hyperparameters are tuned using the ISSO algorithm. SSO is a population-based metaheuristic technique presented by Neshat et al. [22]. In every iteration, the population is first sorted according to the value of the objective function. The roles are then assigned as follows: 1) the head leader (HL) is the particle with the best objective function value; 2) the local leaders (LL) are the l particles that follow the HL in objective function value; 3) the random particles are the k particles with the worst objective function values; and 4) the explorers are all remaining particles.
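The role assignment above amounts to a sort followed by slicing. The sketch below assumes a minimization problem (lower objective value is better) and uses a hypothetical helper name, `assign_roles`; neither detail is specified in the text.

```python
from typing import Dict, List, Sequence


def assign_roles(fitness: Sequence[float], n_local: int, n_random: int) -> Dict[str, List[int]]:
    """Split particle indices into head leader, local leaders, explorers,
    and random particles. Assumes lower fitness is better."""
    n = len(fitness)
    if 1 + n_local + n_random > n:
        raise ValueError("population too small for the requested roles")
    # Sort particle indices from best to worst objective value.
    order = sorted(range(n), key=lambda i: fitness[i])
    split = n - n_random
    return {
        "head_leader": [order[0]],
        "local_leaders": order[1:1 + n_local],
        "explorers": order[1 + n_local:split],
        "random": order[split:],
    }


roles = assign_roles([0.9, 0.1, 0.5, 0.3, 0.7, 0.2], n_local=2, n_random=1)
print(roles)
```

Because the roles are recomputed from the sorted population in every iteration, a particle can change role as the search progresses.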
In the current iteration, the HL does not move; it acts as a beacon for the explorer particles, which in turn explore the search space between the nearest LL and the HL. The explorer particles update their locations using the following equations:

$$V_{HL}(z+1) = V_{HL}(z) + \mathrm{rand}(0,1)\,\big(\theta_e^{best}(z) - \theta_e(z)\big) + \mathrm{rand}(0,1)\,\big(\theta_{HL}(z) - \theta_e(z)\big) \quad (7)$$

$$V_{LL}(z+1) = V_{LL}(z) + \mathrm{rand}(0,1)\,\big(\theta_e^{best}(z) - \theta_e(z)\big) + \mathrm{rand}(0,1)\,\big(\theta_{LL}(z) - \theta_e(z)\big) \quad (8)$$

where $\theta_e$ is the location of the explorer, $\theta_{HL}$ is the location of the HL, $\theta_{LL}$ is the location of the LL nearest to the explorer, $\theta_e^{best}$ is the best location found by the explorer, $V$ denotes the velocity vector of a particle, $V_{HL}$ is the velocity vector of the particle moving toward the HL, and $V_{LL}$ is the velocity vector of the particle moving toward the nearest LL [23]. This formulation removes the need to select the parameters $\alpha_{HL}$, $\beta_{HL}$, $\alpha_{LL}$, and $\beta_{LL}$ that are otherwise used to compute the velocity vectors with respect to the HL and LL. The equation for changing the locations of the random particles is also modified, because the original equation tends to drive particles toward the boundary of the search space or even beyond it. The modified equation reduces the probability of this behavior and also allows the explorer particles to influence the behavior of the random particles to some extent. In the update rule for a random particle, $\theta_O$ denotes the location of the random particle, $\theta_j$ the location of the j-th particle, $N$ the total number of particles in the population, and $k$ the number of random particles. When the termination criterion is met, the technique returns the location of the HL as the final solution.
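Eqs. (7) and (8) translate directly into code. In the sketch below, the two velocity updates follow the equations as written; the final step, which sums both velocities and adds the result to the explorer's position, is an assumption borrowed from the usual SSO formulation, since the position update is not stated in the text. The function name `explorer_step` is hypothetical.

```python
import numpy as np


def explorer_step(theta_e, theta_best, theta_hl, theta_ll, v_hl, v_ll, rng):
    """One explorer update following Eqs. (7) and (8).

    theta_e: current explorer position; theta_best: best position found by
    this explorer; theta_hl / theta_ll: head leader and nearest local leader.
    """
    # Eq. (7): velocity component directed toward the head leader.
    new_v_hl = (v_hl
                + rng.random() * (theta_best - theta_e)
                + rng.random() * (theta_hl - theta_e))
    # Eq. (8): velocity component directed toward the nearest local leader.
    new_v_ll = (v_ll
                + rng.random() * (theta_best - theta_e)
                + rng.random() * (theta_ll - theta_e))
    # Assumption (not given in the text): the total velocity is the sum of
    # both components and the position advances by that velocity.
    new_theta = theta_e + new_v_hl + new_v_ll
    return new_theta, new_v_hl, new_v_ll


rng = np.random.default_rng(42)
theta = np.array([0.5, -1.0])
new_theta, v_hl, v_ll = explorer_step(
    theta_e=theta,
    theta_best=np.array([0.4, -0.8]),
    theta_hl=np.array([0.0, 0.0]),
    theta_ll=np.array([0.2, -0.5]),
    v_hl=np.zeros(2),
    v_ll=np.zeros(2),
    rng=rng,
)
print(new_theta.shape)  # (2,)
```

Each `rng.random()` call draws an independent scalar in [0, 1), matching the separate rand(0, 1) factors in the equations.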
The presented technique yields high-quality solutions. The SSO technique is popular for its simplicity and its ability to exploit the search space for global or near-global solutions. It also provides an enhanced local search with good initial estimates for solving the target problem. To avoid local minima and reach a global or near-global solution, additional younger swallows are used to search for food at random, so that the search does not become stuck in a local minimum. These birds serve exploitation and, at times, also supply useful global information [24]. The product ⊕ denotes entrywise multiplication. In this sense, an improved exploitation feature is incorporated in the presented ISSO technique, as explained below. Eqs. (11)-(14) are the same as the regular SSO mathematical formulation and produce the improved solutions. These solutions are further refined by Eq. (15) using the Lévy distribution function.

In Eq. (15), the Lévy term is added to the head-leader velocity $V_{HL_i}^{k+1}$, where α is the step size newly introduced in this technique and Lévy(λ) is drawn from the Lévy distribution.
Here, Mantegna's technique is used to generate the Lévy-distributed steps. The step size α is selected to optimize exploitation of the local search space. In each equation, the coefficients are computed using the corresponding mathematical expressions.
Adding the Lévy term as in Eq. (15) updates the location sooner by exploiting the best solution in the solution space. The properties of the Lévy flight make the step size adaptive, which results in faster selection of the optimal solution.
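Mantegna's technique, mentioned above, generates Lévy-like steps from two Gaussian draws. The following sketch shows the commonly used form of the algorithm; the stability index `beta`, its default of 1.5, the step size value, and the additive form of `levy_perturb` are assumptions for illustration, since the exact right-hand side of Eq. (15) and the relation between `beta` and the λ of the text are not given here.

```python
import math
import random
from typing import List, Optional


def mantegna_sigma(beta: float) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(dim: int, beta: float = 1.5, rng: Optional[random.Random] = None) -> List[float]:
    """Draw one Levy-like step per dimension: u / |v|^(1/beta)."""
    rng = rng or random.Random()
    sigma_u = mantegna_sigma(beta)
    steps = []
    for _ in range(dim):
        u = rng.gauss(0.0, sigma_u)
        v = rng.gauss(0.0, 1.0)
        steps.append(u / abs(v) ** (1.0 / beta))
    return steps


def levy_perturb(velocity: List[float], alpha: float = 0.01, beta: float = 1.5,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Entrywise perturbation velocity + alpha * Levy step (assumed form)."""
    steps = levy_step(len(velocity), beta, rng)
    return [vel + alpha * s for vel, s in zip(velocity, steps)]


perturbed = levy_perturb([0.1, -0.2, 0.05], alpha=0.01, rng=random.Random(0))
print(len(perturbed))  # 3
```

Most draws are small, but the heavy tail occasionally produces a large jump, which is the property that helps the search leave local optima.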

XGBoost Based Classification
At the final stage, the features extracted from the 1-D ECG signals are fed to the XGBoost model to determine the class labels of the input ECG signals. XGBoost (extreme gradient boosting) is a recent advance in ML research. Beyond the high accuracy of standard boosting methods, it handles sparse data efficiently and supports distributed and parallel computation flexibly. To predict the target variable, the XGBoost technique builds a series of decision trees (DTs) and assigns each leaf node a quantized weight [25]. For a given n × m feature matrix of training data, the predictor uses K additive functions to produce the ensemble output:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in \mathcal{F}$$

where $X_i$ denotes the i-th training instance (i = 1, 2, …, n), $\mathcal{F} = \{ f(x) = w_{s(x)} \}\ (s: \mathbb{R}^m \to T,\ w \in \mathbb{R}^T)$ is the ensemble of trees, each tree $f(x)$ has its own structure parameter $s$ and leaf weights $w$, $w_i$ is the weight of the i-th leaf, $T$ is the number of leaves in the tree, $K$ is the number of trees used in the ensemble, and $\hat{y}_i$ is the predicted label.
To minimize the loss function, a greedy search adds one tree at a time. The loss is approximated by a second-order expansion in which $g_i$ and $h_i$ are the first- and second-order gradient statistics of the loss function $l(\cdot)$. The last term, $\Omega(f_t)$, is the penalty, in which γ and λ are the parameters that control the complexity of the tree; this regularization term helps avoid over-fitting by smoothing the final learned weights. The loss reduction obtained by a split is computed over $I = I_L \cup I_R$, where $I_L$ and $I_R$ are the sample sets of the left and right nodes after splitting. The importance of each variable in the XGBoost method is obtained from the importance of the split nodes in the trees. The importance of the split nodes is computed with an indicator function applied to the squared influence, where $v_t$ denotes the split variable associated with node t and $\hat{i}_t^2$ is the empirical improvement in squared error produced by the split. The quantity $\hat{i}_t^2$ is determined from $\bar{y}_l$ and $\bar{y}_r$, the means of the left and right child nodes of t, and from $w_l$ and $w_r$, the corresponding sums of weights. For a set of DTs $\{Tr_m\}_1^M$, boosting obtains the overall importance by averaging over every tree in the sequence, and Eq. (24) is redefined accordingly.
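The split criterion described above can be made concrete with a small sketch. It assumes the standard formulation from the XGBoost literature, which may differ in detail from the equations of this paper: with $G = \sum g_i$ and $H = \sum h_i$ over a node, the optimal leaf weight is $-G/(H + \lambda)$ and the loss reduction of a split is $\tfrac{1}{2}\big[G_L^2/(H_L+\lambda) + G_R^2/(H_R+\lambda) - G^2/(H+\lambda)\big] - \gamma$. The logistic loss and all function names are chosen for illustration only.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def logistic_grad_hess(y_true: Sequence[int], raw_score: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First- and second-order gradients (g_i, h_i) of the logistic loss."""
    p = [1.0 / (1.0 + math.exp(-s)) for s in raw_score]
    g = [pi - yi for pi, yi in zip(p, y_true)]
    h = [pi * (1.0 - pi) for pi in p]
    return g, h


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float = 1.0) -> float:
    """Optimal leaf weight -G / (H + lambda) (standard XGBoost form)."""
    return -sum(g) / (sum(h) + lam)


def _score(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Structure score G^2 / (H + lambda) of one node."""
    return sum(g) ** 2 / (sum(h) + lam)


def split_gain(g: Sequence[float], h: Sequence[float], left_idx: Iterable[int],
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Loss reduction from splitting a node into I_L (left_idx) and I_R."""
    left = set(left_idx)
    g_l = [g[i] for i in range(len(g)) if i in left]
    h_l = [h[i] for i in range(len(h)) if i in left]
    g_r = [g[i] for i in range(len(g)) if i not in left]
    h_r = [h[i] for i in range(len(h)) if i not in left]
    return 0.5 * (_score(g_l, h_l, lam) + _score(g_r, h_r, lam) - _score(g, h, lam)) - gamma


# Four samples, all starting from a raw score of 0 (probability 0.5).
g, h = logistic_grad_hess([0, 0, 1, 1], [0.0, 0.0, 0.0, 0.0])
good = split_gain(g, h, left_idx=[0, 1])   # separates the two classes
bad = split_gain(g, h, left_idx=[0, 2])    # mixes the two classes
print(good > bad)  # True
```

A larger γ makes every split more expensive, and a larger λ shrinks the leaf weights, which is how the two parameters control tree complexity.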

Performance Validation
This section validates the performance of the proposed model. The simulations are carried out using Python 3.6.5, and the results are examined. A detailed comparative analysis is performed to highlight the performance of the proposed model.

A comparative study of the DLECG-CVD model with existing techniques is given in Tab. 2 and Fig. 8 [5]. The results show that the DT model performs poorly, with an accuracy of 0.279. The LR model attains a slightly higher accuracy of 0.3738, and the KNC technique achieves moderate performance with an accuracy of 0.6689. The 1-DCNN and RF models follow with accuracies of 0.73 and 0.7983, respectively. The DL-ECGA and GBT models exhibit competitive accuracy values of 0.847 and 0.8498, respectively. Finally, the DLECG-CVD model outperforms the other methods with the highest accuracy of 0.8824. The inclusion of the ISSO algorithm as a hyperparameter optimizer helps to enhance the classification performance of the DLECG-CVD model on unseen data. These results indicate that the DLECG-CVD model is an appropriate tool for recognizing 1D ECG signals.
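The evaluation measures named in the abstract (accuracy, sensitivity, specificity, kappa, Matthews correlation coefficient, and Hamming loss) can all be derived from a confusion matrix. The sketch below uses the standard multi-class definitions and assumes one label per record, in which case the Hamming loss equals one minus the accuracy; how the paper averages sensitivity and specificity across classes is not stated, so they are reported per class here. The labels in the example are made up for illustration and are not results from the paper.

```python
import math
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: List[List[int]]) -> float:
    total = sum(map(sum, cm))
    return sum(cm[k][k] for k in range(len(cm))) / total


def hamming_loss(cm: List[List[int]]) -> float:
    """Fraction of wrong labels; equals 1 - accuracy for single-label data."""
    return 1.0 - accuracy(cm)


def sensitivity_specificity(cm: List[List[int]], k: int) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for class k."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def cohen_kappa(cm: List[List[int]]) -> float:
    """(p_o - p_e) / (1 - p_e): agreement corrected for chance."""
    total = sum(map(sum, cm))
    rows = [sum(r) for r in cm]
    cols = [sum(r[k] for r in cm) for k in range(len(cm))]
    p_o = accuracy(cm)
    p_e = sum(r * c for r, c in zip(rows, cols)) / total ** 2
    return (p_o - p_e) / (1.0 - p_e) if p_e != 1.0 else 0.0


def matthews_corrcoef(cm: List[List[int]]) -> float:
    """Multi-class MCC computed from the confusion matrix."""
    s = sum(map(sum, cm))
    c = sum(cm[k][k] for k in range(len(cm)))
    t = [sum(r) for r in cm]                                # true counts per class
    p = [sum(r[k] for r in cm) for k in range(len(cm))]     # predicted counts
    num = c * s - sum(tk * pk for tk, pk in zip(t, p))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


# Made-up labels for three classes, purely for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)  # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
```

Kappa and MCC are useful alongside accuracy when class frequencies are uneven, because both discount agreement that would occur by chance.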

Conclusion
This paper has designed a 1-D ECG signal recognition model named DLECG-CVD. The presented model involves pre-processing, DBN based feature extraction, ISSO based parameter tuning, and XGBoost based classification. A novel ISSO based feature selection technique is introduced by incorporating the concept of Lévy flight into the SSO algorithm in order to avoid the local optima problem. In addition, the inclusion of the ISSO algorithm as a hyperparameter optimizer helps to enhance the classification performance of the DLECG-CVD model on unseen data. A detailed experimental validation was carried out on the PTB-XL dataset, and the outcomes were examined under several dimensions. The comparative study showed that the DLECG-CVD model performs better than the existing techniques, with a maximum accuracy of 0.8824. In future work, the performance of the DLECG-CVD model could be further improved by using DL based classification models instead of the XGBoost technique.
Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.