Handling High Dimensionality in Ensemble Learning for Arrhythmia Prediction

Computer-aided arrhythmia prediction from electrocardiograms (ECG) is essential in clinical practice and promises to reduce the mortality attributable to inexperienced clinical practitioners. Computer-aided methods also often succeed in detecting the arrhythmia scope early from electrocardiogram reports. Machine learning is now central to computer-aided clinical practice, and arrhythmia prediction methods in particular have widely adopted it. However, the high dimensionality of the feature values used in the training phase of machine learning models often causes false alarms. This manuscript addresses high dimensionality in the learning phase and proposes ELAP (ensemble learning-based arrhythmia prediction). The proposed method is a classification approach that incorporates both supervised and unsupervised learning. The experimental study shows that the proposed method improves the prediction accuracy for both labels. To scale the prediction accuracy, the scope of false alarms, and the robustness of the label prediction, the assessment metrics obtained from 10-fold leave-pair-out cross-validation of ELAP have been compared to the corresponding metrics obtained from contemporary methods.


Introduction
Worldwide, cardiovascular diseases (CVD) affect 17.9 million people every year. In medical analysis, the ECG signal is used to detect abnormalities of the heart by measuring the electrical activity of the cardiac muscle. The heart generates small electrical impulses that spread across the entire cardiac muscle, and the ECG device detects these impulses. The device records the cardiac electrical activity and plots it on an ECG graph sheet, which the healthcare provider then interprets. The ECG aims to reveal the symptoms and signs of heart conditions and helps detect cardiac abnormalities.
Skilled clinical experts often recommend electrocardiogram tests to diagnose the arrhythmia scope in a patient's heartbeats. The cardiac disorders identified by ECG include an enlarged heart and abnormal cardiac rhythm, which makes the ECG one of the most significant diagnostic tools for cardiac illness. Because of the high death rate of cardiac illness, accurate interpretation of ECG signals and early recognition are significant for patient treatment. Cardiogram signal analysis supports doctors in recognizing and diagnosing heart illness and in categorizing patient abnormalities. Classifying cardiograms into types of heart disease provides adequate information to identify the cardiac illness. Cardiogram signal classification is nevertheless an intricate problem. Its critical constraints are feature normalization, variability, missing values, the non-existence of an optimal classification, unstable graphs, and originality. Moreover, the main application of an ECG signal classifier is the diagnosis of cardiac illness, so developing an appropriate classifier is essential.

Related Research
Over recent decades, numerous schemes have been proposed to classify heartbeats and detect arrhythmia automatically from ECG. Faezipour et al. [1] explored wavelet-based classification of ECG beats. Chazal et al. [2] introduced an algorithm based on linear discriminant analysis. Kumar et al. [3] presented a model based on a neural network (NN) to classify five different ECG classes automatically. Models based on SVM have also been implemented for ECG signal classification [4][5][6]. Melgani et al. [5] proposed an SVM-based model for classifying ECG beats automatically and compared it with two other classifiers, RBF-NN (radial basis function NN classifier) and KNN (k-nearest neighbors). Mondejar-Guerra et al. [4] proposed a novel model for classifying ECG based on the SVM classifier.
Ensemble learning is among the most prominent ML techniques and can be applied to diverse problems such as regression and classification [7]. Peimankar et al. [8,9] applied it in biomedical and electrical engineering. Each ensemble learning approach comprises three significant parts: (a) forming the training set from the dataset, (b) training a group of diverse classification algorithms, and (c) integrating the predictions of the classification algorithms. Polikar [7] notes that the main benefit of ensemble learning is a more precise classification, obtained by mitigating the selection of a single weak classifier.
A set of 26 features has been extracted and used as inputs to three classifiers to differentiate between regular heartbeats and four different arrhythmia classes. The three classification algorithms used in this contribution are Artificial NN (ANN) [10], AdaBoost [11], and RF (Random Forest) [12]. Each of these three classification algorithms has been trained using a 5-fold cross-validation scheme. Following Glenn [13], the outcomes of these algorithms have been combined using DST (Dempster-Shafer theory) to enhance the classification performance.
Ensemble classifiers integrate the decisions of the individual classifiers that compose them to enhance the final estimation. Dietterich [35] presents several schemes from the literature for forming ensemble classifiers. In some of them, each classifier is trained on a different subset of training instances, as in AdaBoost [36] or Bagging [37].
Dietterich et al. [38] address problems that involve many classes by segmenting the set of outputs into different subsets and producing an ensemble classifier. Other contributions train every classifier on a different subset of the input features. Robert et al. [39] concluded from experiments that integrating classifiers trained on different feature sets is very effective, mainly when the single classifiers deliver optimal performance. Waske et al. [40] developed ensemble SVM classifiers for multi-source land cover classification using a balanced dataset. Training every SVM on a different data source markedly improved the outcomes compared with a single SVM trained on all data sources.
Automatic detection of cardiac arrhythmia (ADCA) using ensemble learning [41] endeavored to address the constraints of contemporary machine learning-based arrhythmia prediction methods. Although ADCA is an ensemble classification model, it does not address the false alarms caused by the high dimensionality of the feature values used in the training phase. Our earlier contribution, the classification technique Electrocardiogram Stream Level Correlated Patterns as Features (ESCPF) [42], introduced novel feature selection and feature optimization methods for heartbeat classification to identify the arrhythmia scope in a given electrocardiogram. However, ESCPF does not address the false alarms caused by the dimensionality of the feature values either.
To address the false alarms in arrhythmia prediction caused by the dimensionality of the feature values, this manuscript presents a novel ensemble classification process that uses signal flow features.

Methods and Materials
This section explores the methods and materials used in the proposed ensemble learning-based arrhythmia prediction from electrocardiograms. It describes the data corpus used in the classification process, feature extraction, feature optimization using the Dice Similarity Assessment Scale, the handling of dimensionality through clustering and cluster optimization by Differential Evolution, the incremental binary classifier, and the classification process itself.

The Data
The dataset was prepared by integrating records from the EHCD (ECG Heartbeat Categorization Dataset) [43] and MIT-BIH [18] datasets. Each record considered for the experimental study is labeled either positive or negative, as stated in [44]. Of these records, 15000 are labeled positive, whereas the remaining 12100 are labeled negative (benign).

The Features
Let the dataset ECG represent the set of electrocardiogram reports of the subjects in digital format (as x, y coordinates), each labeled either negative (no evidence of arrhythmia) or positive (prone to arrhythmia). The input corpus of ECG reports is partitioned into two sets pT and nT, which contain the records labeled positive (prone to arrhythmia) and negative (normal heartbeat), respectively. The considered features are the sequences of cardiogram elements (y-coordinate values projected for a sequence of x-coordinates) of size one and above, referred to further as the sets fP and fN for the positive and negative labels, respectively. Each feature f is a sequence of elements whose size ranges from 1 to |r_i|, the size of the record r_i that has more cardiogram elements than any other record of the corresponding set. Each record {r | r ∈ ECG} yields |r| − s + 1 features of size s, that is, the difference between the record size |r| and the feature size s, incremented by 1 [45].
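The feature count |r| − s + 1 corresponds to a sliding window over the record. The following is a minimal sketch of that enumeration, assuming a record is a plain list of y-coordinate values; the function names `features_of_size` and `all_features` are hypothetical and do not come from the manuscript.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, ...]


def features_of_size(record: Sequence[float], s: int) -> List[Feature]:
    """Return every contiguous subsequence of length s in one record.

    A record of size |r| yields |r| - s + 1 such subsequences.
    """
    if s < 1 or s > len(record):
        return []
    return [tuple(record[i:i + s]) for i in range(len(record) - s + 1)]


def all_features(record: Sequence[float], max_size: int) -> List[Feature]:
    """Collect the features of every size from 1 up to max_size (assumed cap)."""
    collected: List[Feature] = []
    for s in range(1, min(max_size, len(record)) + 1):
        collected.extend(features_of_size(record, s))
    return collected
```

For a record of five elements, `features_of_size(record, 2)` returns four features, which matches 5 − 2 + 1.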

Features Optimization Using Dice Similarity Assessment Scale
The Dice similarity coefficient (DSC) has been used to select the optimal attributes of the records labeled positive and negative [44]. The use of DSC to choose optimal features is explored in the following sections.
Diversity assessment using a distance scale is as follows. The diversity is the variance observed between the values projected for a feature, which is a column of both matrices E and F. The main scheme for evaluating this variance adapts coding theory. Such a scheme handles the distance among the unique values observed for an attribute in the record sets tagged as true and false.
Let the i-th columns of the set E and the set F be the vectors E_c and F_c, respectively, which may differ in size. The diversity by distance scale is assessed as given in Eq. (1).
Here, d(E_c ↔ F_c) denotes the distance between the i-th columns E_c and F_c of the matrix E and the matrix F.
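Eq. (1) is not reproduced in this text, so the sketch below does not claim to be the manuscript's distance. It illustrates the standard Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), applied to the sets of unique values of two feature columns of possibly different sizes. Treating the distance as 1 − DSC is an assumption, and the function names are hypothetical.

```python
from typing import Hashable, Iterable


def dice_similarity(col_e: Iterable[Hashable], col_f: Iterable[Hashable]) -> float:
    """Standard Dice coefficient over the unique values of two columns."""
    a, b = set(col_e), set(col_f)
    if not a and not b:
        return 1.0  # two empty columns are treated as identical (assumption)
    return 2 * len(a & b) / (len(a) + len(b))


def dice_distance(col_e: Iterable[Hashable], col_f: Iterable[Hashable]) -> float:
    """Distance taken as 1 - DSC; an assumed stand-in for Eq. (1)."""
    return 1.0 - dice_similarity(col_e, col_f)
```

Under this reading, a feature whose positive and negative columns share few values has a distance close to 1 and discriminates well between the labels.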

Forming the Initial Clusters
Clusters are formed separately for the positive and the negative label. For each label, the values that appear in the records of that label are treated as a unique set. The support of every value is assessed as the ratio of the records containing that value to the overall record count. Every record is then weighted by the accumulated support values of the values it contains.
The records are organized in descending order of weight, which is assessed as follows. First, the average avgS of the support values in a record and the respective deviation deS are determined. The absolute difference between the average support avgS and the deviation deS is then taken as the record weight rew. The average record weight is used to determine the centroid threshold, as explored in the following description.
The average record weight is denoted as 〈rew〉, and the deviation detw of the record weights is given by the RMSE of the record weights over the specified training corpus. The centroid threshold cet is the sum of the average record weight 〈rew〉 and the deviation detw.
The initial cluster count is the number of records whose weight is greater than cet, and clustering is performed accordingly. A record of one cluster can also exist in other clusters if the distance between the record and the corresponding cluster centroid is greater than the threshold dit.
Further, the distance between each pair of clusters is assessed; if the distance is less than the distance threshold, both clusters are replaced with a new cluster that is the union of the corresponding pair.

Optimizing Clusters
Differential Evolution (DE) [46] is a reliable evolutionary method for optimization. The DE concept is approximately identical to that of a genetic algorithm (GA), as stated in [47]. It differs from GA in that it considers unique genotypes; moreover, among the input (parent) and resultant (child) chromosomes, only the fittest pair of chromosomes survives, and the rest are discarded.
The initial clusters are taken as the set of input chromosomes. DE is then performed on every pair of chromosomes, which leads to a new pair of chromosomes. The following subsection explores the fitness function used in the DE process.

Fitness Function
The given input cluster is treated as a dataset, and the average record weight of the cluster is determined. In addition, the cluster-level utility scale exhibited for the multiple attribute values is identified.

Cluster Optimization:
Once cluster formation is complete, the records are organized in descending order of transaction-level utility, and DE is performed to attain high fitness as follows:
Let CLS signify the set of all possible clusters.
Let TCLS signify the set that holds the newly formed clusters.
CHRS = null // the empty set that preserves the new chromosomes generated by the crossover procedure
CHRS ← cl_i
CHRS ← cl_j // move the parent chromosomes into the set CHRS
For each crossover {crs_k | crs_k ∈ CRS}
Begin
    Let lcl_i signify the subset of cl_i containing all transactions that precede the crossover crs_k.
    Let rcl_i signify the subset of cl_i containing all transactions that succeed the crossover crs_k.
    Let lcl_j signify the subset of cl_j containing all transactions that precede the crossover crs_k.
    Let rcl_j signify the subset of cl_j containing all transactions that succeed the crossover crs_k.
    CHRS ← {lcl_i, crs_k, rcl_j}
    CHRS ← {lcl_j, crs_k, rcl_i}
End
Identify the fitness of each entry in CHRS, as stated in Section 3.5.1. Organize the set CHRS in descending order of cluster-level utility and of the count of optimal records for the multiple attribute values. These clusters are then used to train the classifier: each cluster is treated individually, and a binary classifier is trained on it.
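The crossover step can be illustrated with a short sketch. It assumes that a cluster is an ordered list of record identifiers and that the crossover points CRS are the records shared by both parent clusters; the manuscript does not state how CRS is chosen, so this is an assumption. The fitness function is passed in by the caller because its exact form is not given here, and the names `crossover` and `select_fittest` are hypothetical.

```python
from typing import Callable, List, Sequence

Cluster = List[str]


def crossover(cl_i: Sequence[str], cl_j: Sequence[str]) -> List[Cluster]:
    """Build CHRS: both parents plus two children per shared crossover record."""
    cl_i, cl_j = list(cl_i), list(cl_j)
    chrs: List[Cluster] = [cl_i, cl_j]  # parents enter CHRS first
    for crs in (r for r in cl_i if r in cl_j):  # assumed crossover set CRS
        pi, pj = cl_i.index(crs), cl_j.index(crs)
        # {lcl_i, crs_k, rcl_j}: left part of cl_i, pivot, right part of cl_j
        chrs.append(cl_i[:pi] + [crs] + cl_j[pj + 1:])
        # {lcl_j, crs_k, rcl_i}: left part of cl_j, pivot, right part of cl_i
        chrs.append(cl_j[:pj] + [crs] + cl_i[pi + 1:])
    return chrs


def select_fittest(chrs: Sequence[Cluster],
                   fitness: Callable[[Cluster], float],
                   keep: int = 2) -> List[Cluster]:
    """Keep the fittest chromosomes; the rest are discarded."""
    return sorted(chrs, key=fitness, reverse=True)[:keep]
```

With parents `["a", "b", "c", "d"]` and `["x", "c", "y"]`, the shared record `"c"` produces the children `["a", "b", "c", "y"]` and `["x", "c", "d"]`. Calling `select_fittest` with the default `keep=2` mirrors the survival of the fittest pair.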

Ensemble Classification
The work [48] presents a cuckoo search-based incremental binary classifier, an enhanced binary classifier that was modified for performing sentiment analysis. This classifier is well suited to binary classification compared with other, more intricate classification approaches. The training and label prognosis phases of the classifier are explained in the following description. In the work [49], diabetes is predicted using significant attributes, and the relationship between the differing attributes is also characterized. Various tools are used there for significant attribute selection and for clustering, prediction, and association rule mining for diabetes.
Individual classifiers sometimes deliver excellent outcomes in the classification of ECG heartbeats, but their results can range from very good to very bad, and markedly different outcomes are obtained when such classifiers are extended to other kinds of datasets. An ensemble classifier generalizes better, and its error rate is usually lower than that of an individual classifier. Thus, an ensemble classifier offers more balanced outcomes across all categories. The ensemble classifier comprises numerous machine learners that are integrated to construct it.

Learning Phase
The binary classifier designed on cuckoo search (CS) has two stages. The first stage, training, builds a hierarchy of nests in which every level contains more perches than the preceding level, if any. The training stage builds two hierarchies, one for the negative label and one for the positive label. The branches at each level of both hierarchies are formed as follows. For each label (a sentiment polarity label in the original classifier), the n-gram patterns discovered as optimal features from the training corpus of that label are organized in descending order of size. The n-gram features of the maximum size n are segmented into clusters so that n-grams of the same size and the same frequency fall into one cluster. Each cluster of n-grams of size n is kept as a branch in the first level of the respective hierarchy, where the levels are indexed {l | l = 1, 2, 3, ..., n}. Likewise, the n-gram features of size {(n − i) | i = 1, 2, 3, ..., (n − 1)} are segmented into clusters so that each cluster comprises distinct n-grams of similar frequency, and these clusters are kept as branches in level {l = (i + 1) | i = 1, 2, 3, ..., (n − 1)}. This procedure is repeated until the last level of the hierarchy has been framed: the n-grams of the smallest size are segmented into clusters so that every cluster comprises a unique set of n-grams of similar frequency, and all these clusters are kept as branches of the n-th level, which is the last level.
In this way, the branch hierarchies of all clusters are built for both the negative and the positive label. Each hierarchy represents the clusters of either the negative or the positive label.
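The level structure described above can be sketched as a list of levels, where level 1 holds the largest n-grams and each branch groups the n-grams that share a frequency. This is a minimal illustration, assuming that each training record is a sequence of discrete symbols and that "similar frequency" means equal frequency. The names `ngrams` and `build_hierarchy` are hypothetical, and the sketch does not model the cuckoo search itself.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

NGram = Tuple[Hashable, ...]
Level = Dict[int, Set[NGram]]  # frequency -> branch (set of n-grams)


def ngrams(seq: Sequence[Hashable], n: int) -> List[NGram]:
    """All contiguous n-grams of one record."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_hierarchy(corpus: Sequence[Sequence[Hashable]], max_n: int) -> List[Level]:
    """Level l (1-based) holds the n-grams of size max_n - l + 1."""
    hierarchy: List[Level] = []
    for size in range(max_n, 0, -1):  # descending order of n-gram size
        freq = Counter(g for rec in corpus for g in ngrams(rec, size))
        branches: Dict[int, Set[NGram]] = defaultdict(set)
        for gram, count in freq.items():
            branches[count].add(gram)  # same size and frequency -> same branch
        hierarchy.append(dict(branches))
    return hierarchy
```

One hierarchy would be built from the positive training corpus and another from the negative one, for example `build_hierarchy(positive_records, max_n)`, where `positive_records` is a placeholder name.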

Prognosis of Arrhythmia
The record R is labeled as either prone to arrhythmia or not by measuring the average fitness of R with respect to the record clusters labeled positive, the average fitness of R with respect to the record clusters labeled negative, and their corresponding RMSD. The conditions determined below decide whether the record is labeled as prone to arrhythmia. This section depicts the algorithmic flow of the label prediction strategy, which includes the estimation of positive fitness.
Let the notation tr represent the test record given to identify the arrhythmia scope.
Let the notation wv represent the word vector that results from applying the preprocessing phase to the test record tr.
Let the notation ng(tr) represent all possible n-grams discovered from the electrocardiogram signal.
Perform a perch search on all hierarchies to find the competent perches with respect to the n-gram features ng(tr) of the test record tr as follows:
#Estimating positive fitness#
pf = 1 // fitness is initialized to the maximum, 1; fitness is always greater than 0 and less than or equal to 1
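The label decision compares the two fitness values against the threshold dτ: the record is positive when pF(tr) − nF(tr) > dτ and negative when nF(tr) − pF(tr) > dτ. The sketch below encodes only that comparison. The manuscript does not say what happens when neither condition holds, so the "undecided" outcome is an assumption, as is the function name `predict_label`.

```python
def predict_label(p_f: float, n_f: float, d_tau: float) -> str:
    """Compare positive and negative fitness against the threshold d_tau."""
    if p_f - n_f > d_tau:
        return "positive"  # arrhythmia scope is positive
    if n_f - p_f > d_tau:
        return "negative"  # arrhythmia scope is negative
    return "undecided"     # assumed fallback; not specified in the source
```

For example, `predict_label(0.8, 0.3, 0.2)` falls in the first branch because the fitness gap of 0.5 exceeds the threshold.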

The estimated fitness values pF(tr) and nF(tr) are then used to predict whether the arrhythmia scope is positive or negative as follows:
if ((pF(tr) − nF(tr)) > dτ) confirms that the given test record reflects the arrhythmia scope as positive
else if ((nF(tr) − pF(tr)) > dτ) confirms that the given test record reflects the arrhythmia scope as negative

Experimental Study
The total number of labeled records considered for the experimental study is 81614, comprising 46103 records labeled as positive and 35511 records labeled as negative. The k-fold Leave-Pair-Out Cross-Validation (LPOCV) [50] has been used to scale the performance of the proposed method, and the cross-validation metrics have been used to assess ELAP (ensemble learning-based arrhythmia prediction). To scale the prediction accuracy, the scope of false alarms, and the robustness of the label prediction, the assessment metrics obtained from 10-fold leave-pair-out cross-validation of the proposed ELAP have been compared to the corresponding metrics obtained from the contemporary methods. The work in [51] investigates the recovery and death factors that contribute to schistosomiasis on a preprocessed dataset collected from Hubei, China, and uses a computerized learning method, association rule mining (Apriori), to spot the factors. The contemporary methods considered here are Automatic Detection of Cardiac Arrhythmias (ADCA) Using Ensemble Learning [41] and Electrocardiogram Stream Level Correlated Patterns as Features (ESCPF) [42]. Fig. 1 shows the precision observed from ELAP compared to the precision observed from ADCA and ESCPF. Although ESCPF is the most similar to ELAP in feature extraction and optimization, it ranked last in precision because of the curse of dimensionality in the training corpus. Although ADCA also performs an ensemble learning process, it shows lower precision than ELAP.
Fig. 2 plots the metric specificity, also called selectivity, against the ten folds of leave-pair-out cross-validation performed on the ELAP, ADCA, and ESCPF models. The figure shows that the specificity of ELAP is better than that of the ADCA and ESCPF models. Fig. 3 shows the metric sensitivity, also called recall, observed from the ELAP, ADCA, and ESCPF models. ELAP outperforms ADCA and ESCPF in sensitivity. The three methods rank in the order ELAP, ADCA, and ESCPF, since the contemporary methods lag in handling the curse of dimensionality in the training corpus. Fig. 4 plots the metric accuracy against the ten PCV IDs for the proposed ELAP model and the ADCA and ESCPF models. The figure shows that the accuracy of ELAP is better than that of the ADCA and ESCPF models.
Fig. 5 shows the metric F-measure, also called the F1-score, observed from the ten-fold leave-pair-out cross-validation performed on ELAP and the contemporary methods ADCA and ESCPF. These statistics show that ELAP outperforms ADCA and ESCPF. Fig. 6 plots the metric MCC against the ten PCV IDs for the proposed ELAP model and the ADCA and ESCPF models. MCC is used as a measure of binary classification performance. The figure shows that the MCC observed from the proposed model is better than that of the ADCA and ESCPF models.
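The metrics reported in Figs. 1 to 6 follow from the confusion matrix of each fold. The sketch below uses the standard definitions of precision, specificity, sensitivity, accuracy, F-measure, and MCC. The function name `binary_metrics` and returning 0.0 for an undefined ratio are assumptions, and the counts in the usage note are invented for illustration only, not results from the study.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the ratio is undefined (assumption)."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from one confusion matrix."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)   # also called recall
    specificity = _ratio(tn, tn + fp)   # also called selectivity
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

As a purely illustrative call, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` gives a precision of 0.8 and an accuracy of 0.85; averaging such per-fold values over the ten folds yields one summary figure per metric.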

Conclusion
This manuscript has addressed arrhythmia prediction by ensemble classification using sequence patterns of electrocardiogram signals. Unlike contemporary models, which train different classifiers on the same feature values, the proposed method partitions the training corpus into multiple clusters, and the entries of one cluster may occur in one or more other clusters. It treats each cluster as a separate corpus and discovers the sequence patterns of that cluster's electrocardiogram signals as features. The features discovered for each cluster are then used to train the classifier, and the training phase uses a different instance of the same classifier for each cluster. The experimental study performed on the proposed and contemporary methods shows the significance and performance of ELAP in identifying the arrhythmia scope compared to the contemporary methods ADCA and ESCPF. Future research can introduce the fusion of feature optimization methods and classification methods to improve arrhythmia prediction accuracy.