Heart Disease Diagnosis Using Electrocardiography (ECG) Signals

Electrocardiogram (ECG) monitoring models are commonly employed for diagnosing heart diseases. Since ECG signals are normally acquired over long durations at high resolution, they must be compressed for transmission and storage. A suitable compression technique is therefore essential for transmitting the signals to a telemedicine centre for monitoring and analysis. In addition, the protection of ECG signals poses a challenging issue, which encryption techniques can resolve. Existing Encryption-Then-Compression (ETC) models for multimedia data fail to maintain a proper trade-off between compression performance and signal quality. In this view, this study presents a new ETC with a diagnosis model for ECG data, called the ETC-ECG model. The proposed model involves four major processes, namely, pre-processing, encryption, compression, and classification. Once the ECG data of the patient are gathered, the Discrete Wavelet Transform (DWT) with a thresholding mechanism is used for noise removal. In addition, a chaotic map-based encryption technique is applied to encrypt the data. Moreover, the Burrows-Wheeler Transform (BWT) approach is employed for the compression of the encrypted data. Finally, a Deep Neural Network (DNN) is applied to the decrypted data to diagnose heart disease. A detailed experimental analysis is carried out to verify the data security, compression, and classification performance of the presented model on ECG data.


Introduction
Globally, heart-based disorders are progressively increasing because of factors such as stress, the physical state of the body, and present-day lifestyle. Electrocardiogram (ECG) signal processing is a well-known technique applied to determine heart condition [1]. ECG monitoring devices are employed extensively in clinical medicine as well as telemedicine. However, transmitting ECG information through public networks is demanding: ECG signals are gathered over long durations at high resolution, so a massive amount of information is collected. Hence, ECG signals are compressed to enable effective transmission and storage. Over the last decades, developers have investigated ECG compression, with many works carried out on lossy compression; the objective is to achieve high compression while keeping distortion tolerable [2,3]. Typically, compression models are divided into three classes: direct, parameter-extraction, and transform techniques. Direct approaches operate on the signal samples themselves; common instances are codebook-based coding, Artificial Neural Network (ANN) methods, and Vector Quantization. Transform schemes decompose the actual signal by means of a linear orthogonal transformation, and coding the resulting coefficients accomplishes better compression.
Due to the demands of the e-healthcare sector, the trust and integrity of clinical data are significant problems. Generally, biomedical data are transmitted through open networks and must be protected from external intrusions [4]. Moreover, with the ever-increasing demand for computing on biomedical signals, the signals are encrypted to secure the patients' details without affecting usability [5]. Whereas the ECG information here undergoes ETC, traditional works have applied compression first and encryption afterwards. The ETC scheme for multimedia systems has received considerable attention from developers [6]. However, earlier ETC techniques for multimedia relied on conventional methods that unavoidably led to reduced compression performance [7].
A computer-aided system can be deployed for the automatic prediction of Myocardial Infarction (MI), helping cardiologists make effective decisions, and diverse works have been performed on automated MI prediction. Owing to the nonlinearity of heart-abnormality classification, models relying on Neural Networks (NN) have been applied in recent times. Conventionally, developers introduced a training model based on the Radial Basis Probabilistic Neural Network (RBPNN) to provide a remarkable solution. Newly deployed techniques for ECG analysis and prediction of irregular heartbeats have addressed various pathologies. Presently, researchers are experimenting with new NN methods [8], especially Machine Learning (ML) and Deep Learning (DL) frameworks such as the Convolutional Neural Network (CNN) [9], which has been applied to arrhythmia prediction, coronary artery disease prediction, and beat categorization. A Deep Belief Network (DBN) was employed for classifying the quality of ECG signals [10]. An 11-layer CNN model was developed for predicting MI [11]. Afterwards, researchers advocated shallow CNNs for MI detection; using different filter sizes within the same convolution layer is highly beneficial, as it enables learning features from signal sites of diverse lengths. In [12], researchers classified heart diseases by applying a Multi-Layer Perceptron (MLP) network and a CNN system; the results were obtained on similar data sets but with various classes, with two classes employed in the MLP system. These works produced limited performance with the MLP and CNN methodologies. This study presents a new ETC with a diagnosis model for ECG data, called ETC-ECG. The proposed model involves four major processes, namely, preprocessing, encryption, compression, and classification.
Once the ECG data of the patient are gathered, DWT with a Thresholding mechanism is used for noise removal. Next, the chaotic map-based encryption technique is used to encrypt the data. In addition, the BWT approach is employed for the compression of the encrypted data. When the encrypted data get compressed, they are transmitted for further analysis. Finally, DNN is applied to the decrypted data for diagnosing heart disease. The detailed experimental analysis takes place to ensure the effective performance of the presented model to assure data security, compression, and classification outcome of the ECG data.

Related Works
Numerous ETC techniques for ECG signals have been recommended in the past decades. Compression models are categorized into two groups, namely, lossy and lossless. Lossy compression achieves a maximum Compression Rate (CR) but discards some essential data, whereas lossless compression shows a lower CR while conserving all applicable and essential data. Mostly, developers have gained better efficiency by integrating encryption and compression.
Compression removes data repetition so that memory space is saved and transmission duration is reduced, which also contributes stability and resistance to encryption modules [13]. In [14], the nearest-neighbouring coupled-map lattice and a non-uniform Discrete Cosine Transform (DCT) that uses Huffman coding were applied to compress and encrypt images, implementing a maximum compression rate and security level compared with alternative technologies. Alternatively, in [15], an Adaptive Fourier Decomposition (AFD) based transform compression approach combined with a Symbol Substitution (SS) algorithm was proposed; in this approach, SS serves as a built-in encryption model. Compression technologies are essential in medical applications as they occupy less memory space and accomplish a suitable transmission rate [16].
One significant limitation of such techniques is that the decompressed, reformed data might lose significant information. Hence, the presented approach compresses data while retaining significant information and uses a cryptographic model to conserve confidentiality. Compressed Sensing (CS) [17] produces a higher CR than the wavelet method and consumes less energy. Based on [18], the ECG shows the electrical movements of the heart and guides the observation and examination of heart-based diseases. Remote monitoring modules like telemedicine require massive memory space to assess and diagnose data, and wireless communication applies maximum power when sending uncompressed data. Therefore, data compression is essential to limit memory use and enhance the transmission rate and bandwidth utilization. Various lossless compression methods have been used for ECG applications and their effectiveness compared. Finally, low-variance Huffman coding, an optimal algorithm for compressing ECG signals, was proposed. The study used the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia dataset and the Matrix Laboratory (MATLAB) tool for result analysis. Based on the obtained simulations, most of the memory could be conserved with low variance by applying a Huffman code with a computational complexity of N log2 N. Thus, the proposed method achieves optimal consumption of bandwidth with an elegant buffer design.
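To make the Huffman coding step concrete, the following Python sketch builds a prefix-free code table with a binary min-heap; the repeated merge of the two lightest subtrees is what gives the N log2 N behaviour noted above. The function names are illustrative, not taken from the cited work.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table for the symbols in `data`.

    Heap entries are (weight, tiebreak, tree); a tree is either a leaf
    symbol or a (left, right) pair of subtrees.
    """
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # lightest subtree
        w2, _, t2 = heapq.heappop(heap)  # second lightest
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):      # internal node: descend both sides
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: record the code word
            codes[tree] = prefix
    walk(heap[0][2])
    return codes

def encode(data, codes):
    return "".join(codes[s] for s in data)
```

For the sample string "aaaabbc", the most frequent symbol 'a' receives a one-bit code, so the encoded bitstream is shorter than a fixed-length encoding.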

The Proposed Encryption-Then-Compression-Electrocardiogram Model
The schematic representation of the presented ETC-ECG is given in Fig. 1. First, the ECG biomedical signal is fed into the developed approach. Then, denoising is performed using DWT with successive thresholding, and the ECG signal is divided into blocks. Subsequently, the encryption model based on a chaotic map approach is deployed. Afterwards, the compression mechanism is applied using BWT for every block of the ECG signal. Eventually, once the ECG signal block is received, decryption and decompression are carried out. Finally, the DNN-based classification process takes place.

Pre-Processing
Here, DWT with a thresholding concept is applied to remove the noise that exists in the ECG signal. Wavelets are mathematical functions that operate on signal data at a chosen resolution. The DWT is a wavelet transform whose wavelets are discretely sampled. The primary benefit of DWT over the Fourier Transform (FT) is that it examines frequency and time simultaneously. Soft or hard threshold models describe the shrinkage rule [19]. Thresholding is applied to the signal vector according to its features.
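The denoising idea above can be sketched in a few lines: split the signal into low-pass (approximation) and high-pass (detail) coefficients, soft-threshold the detail coefficients, and invert the transform. This minimal sketch uses a one-level Haar wavelet and an even-length signal for simplicity; it is an illustration of the principle, not the paper's exact filter bank or threshold rule.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """One-level Haar DWT denoising with soft thresholding."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass coefficients
    # soft threshold: shrink each detail magnitude by `threshold`, clamp at 0
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # inverse Haar transform rebuilds the denoised signal
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

With a zero threshold the reconstruction is exact; with a large threshold the detail band vanishes and each sample pair is replaced by its average, which is the smoothing effect exploited for noise removal.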

Chaotic Map-Based Encryption
A chaos-based clinical image security system involves two working stages, namely confusion and diffusion. For the allocated input images, the image is encrypted and decrypted using these two phases, and the behaviour of the projected approach is managed using fundamental keys and control attributes. In order to enhance the security level of the Chaos-Function (C-F), dual keys are produced to encrypt and decrypt images, where the double keys are produced from 16-character byte keys [20]. The performance of chaos-based image encryption depends on the randomness features of the chaotic function. In the encryption process, the newly developed C-F is coupled with XOR to enhance the randomness in the cipher image, and a maximum key space is used for attack resistance. For the input image, the C-F is estimated under the application of the map function (Fig. 1).

Figure 1: Block diagram of ETC-ECG
The chaotic map is applied to create random sequences during the encryption process. The principle behind chaos theory is mixing and sensitivity to initial conditions and parameters, which map naturally onto cryptography. The two fundamental features of the C-F are sensitivity to the initial state and the mixing property. Accordingly, C-F streams have been applied in diverse chaotic maps.

Logistic Map
It is defined as a direct, non-linear polynomial map of degree 2 with one state variable and one control parameter, as given in Eq. (3):

a_(n+1) = a · a_n · (1 − a_n),  a_n ∈ (0, 1), n = 0, 1, ...,  a ∈ (0, 4)   (3)

In the logistic map, the collection of iterated functions forms a semi-group, and for a ∈ (0, 4) the map undergoes a period-doubling bifurcation.
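A minimal sketch of iterating Eq. (3) to produce a chaotic sequence follows; the function name is illustrative. For a = 4 and a seed in (0, 1) the orbit stays in [0, 1] and is highly sensitive to the seed, which is what makes it usable as a keystream source.

```python
def logistic_sequence(a, x0, n):
    """Iterate the logistic map a_(n+1) = a * a_n * (1 - a_n), n times."""
    seq, x = [], x0
    for _ in range(n):
        x = a * x * (1 - x)
        seq.append(x)
    return seq
```

For example, the seed x0 = 0.5 maps to 1.0 and then to 0.0, the map's fixed points at a = 4; any seed not on such a special orbit wanders chaotically.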

Tent Map
It is defined as a repeated tent-shaped operation, making a discrete-time dynamical system. The map sends a point a_n on the real line to another point, with b retained as a constant:

a_(n+1) = b · a_n        for a_n < 1/2
          b · (1 − a_n)  for 1/2 ≤ a_n   (4)

Based on the constant value b, the tent map produces the chaotic function. The mapping in the C-F converges for b < 1 in the range b ∈ (0, 1), and the initial value a_0 is applied as part of the secret key.
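The piecewise map in Eq. (4) can be sketched directly; again the function name is illustrative. Note the branch condition: values below 1/2 are scaled up by b, values at or above 1/2 are folded back, producing the tent shape.

```python
def tent_sequence(b, x0, n):
    """Iterate the tent map: x -> b*x for x < 1/2, else b*(1 - x)."""
    seq, x = [], x0
    for _ in range(n):
        x = b * x if x < 0.5 else b * (1 - x)
        seq.append(x)
    return seq
```

With b = 2 the map is fully chaotic on [0, 1]; e.g., the seed 0.25 maps to 0.5 and is then folded to 1.0.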

Confusion Stage
Confusion is the well-known phase in which the locations of pixels are permuted across the entire image with no change in pixel values, so that the image becomes unpredictable. Hence, the strategy behind this phase is to limit the high correlation among adjoining pixels in the original image. Here, the security method produces arbitrary keys M = {a_1, a_2, ..., a_(r×c)}. These keys match the plain-image size, with values ranging from 0 to 255. The 1D vector is given in Eq. (5), where 'D' implies the permuted image and 'O' means the permutation key. As a result, the sensitivity to minimal changes in the initial condition is increased.
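The confusion stage above amounts to a key-driven, invertible shuffle of pixel positions. In the paper the permutation key 'O' is derived from the chaotic sequence; as a stand-in, this sketch seeds Python's PRNG with a key so the receiver can regenerate the same permutation and invert it. The function names are hypothetical.

```python
import random

def permute_pixels(image, key):
    """Confusion stage: shuffle pixel positions with a key-seeded permutation.

    Pixel values are untouched; only their locations change, breaking the
    correlation between neighbouring pixels.
    """
    rng = random.Random(key)          # key-seeded, hence reproducible
    perm = list(range(len(image)))
    rng.shuffle(perm)
    return [image[p] for p in perm], perm

def unpermute_pixels(shuffled, perm):
    """Invert the permutation to recover the original pixel order."""
    out = [None] * len(shuffled)
    for i, p in enumerate(perm):
        out[p] = shuffled[i]
    return out
```

Because the values themselves are unchanged, the histogram of the image is preserved; that is why a diffusion stage must follow.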

Diffusion Stage
In the diffusion process, the overall image is encoded using different chaotic values derived from the tent map and arbitrary values. Furthermore, diffusion ensures sensitivity for images: a change of even one pixel must be distributed over all pixels of the entire image. It is computed according to Eq. (6), where U_i and U_(i−1) are the masked pixel measures and ⊕ implies the XOR operation; the random code measure is as depicted in Eq. (7):

r = mod(floor(l_n × 2^20), 255)   (7)

This is further expanded in Eq. (8). For security, the keystream 'r' is updated for every pixel, and the encrypted pixel measure 'D_i' depends on the previously encrypted pixels and the keystream. Therefore, this method assures resistance to various attacks such as plaintext attacks, chosen-plaintext attacks, and known-cipher-image attacks.
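The chaining described above (each cipher pixel depends on the keystream and on the previous cipher pixel) can be sketched as follows. The keystream bytes are assumed to come from the quantized chaotic sequence, as in Eq. (7); here they are simply passed in as a list, and the function names are hypothetical.

```python
def diffuse(pixels, keystream, iv=0):
    """Diffusion stage: C_i = P_i XOR K_i XOR C_(i-1).

    Chaining through the previous cipher pixel spreads a one-pixel change
    in the plaintext to every later cipher pixel.
    """
    cipher, prev = [], iv
    for p, k in zip(pixels, keystream):
        c = p ^ k ^ prev
        cipher.append(c)
        prev = c
    return cipher

def undiffuse(cipher, keystream, iv=0):
    """Invert the chain: P_i = C_i XOR K_i XOR C_(i-1)."""
    plain, prev = [], iv
    for c, k in zip(cipher, keystream):
        plain.append(c ^ k ^ prev)
        prev = c
    return plain
```

Flipping one plaintext pixel changes its cipher pixel and, through the chained `prev` term, every cipher pixel after it, which is the avalanche property the text requires.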

BWT Compression
The BWT is a data transformation mechanism used to rearrange data so that the transformed messages are more compressible. Assume that Σ is a finite ordered alphabet and Σ* the set of words over Σ. For a finite word w = w_1 w_2 ··· w_n ∈ Σ* with w_i ∈ Σ, the length of w, denoted |w|, equals n. Words x, y ∈ Σ* are conjugate when x = uv and y = vu for some u, v ∈ Σ*. Conjugacy among words is an equivalence relation over Σ*. The conjugacy class [w] of w ∈ Σ^n is the set of words w_i w_(i+1) ··· w_n w_1 ··· w_(i−1), for 1 ≤ i ≤ n.
The BWT [21] is defined as follows: for a word w ∈ Σ*, the output of the BWT is a pair (BWT(w), I) accomplished by the lexicographical sorting of the conjugates of w. Specifically, BWT(w) is the word obtained by concatenating the last symbol of each conjugate in the sorted list, and I is the position of w in that list. For example, when w = mathematics, then BWT(w) = mmihttsecaa and I = 7. In the matrix obtained by the lexicographical organization of the conjugates of w, the first column, referred to as F, is accomplished by the lexicographical sorting of the symbols of w, and the last column L is composed of the word BWT(w).
A significant feature of the BWT is that it tends to group together characters coming from the same contexts of the input text, yielding output with a highly compressible structure; various developers refer to this as the clustering effect of the BWT. Equal characters are thus gathered in the column L. An additional feature of the BWT is reversibility. A permutation s: {1, ..., n} → {1, ..., n} is defined by the correspondence between the positions of characters in L and F. The permutation s gives the order in which to read the units of L to rebuild the actual word w; thus, starting from position I, the word w is recovered as given in Eq. (9). The permutation s corresponding to the example is given in Eq. (10):

s = ( 1 2 3 4 5  6  7 8 9 10 11
      7 8 6 5 10 11 9 4 3 1  2 )   (10)

In the permutation s, the set {i, s(i), s^2(i), ...} for i ∈ {1, ..., n} is named the orbit of s, and the orbits of s form a partition of the set {1, ..., n}. Through the BWT, it is feasible to obtain such a characterization of conjugate words.
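The definition above translates directly into a short (naive, O(n^2 log n)) sketch: sort all rotations, keep the last column and the 1-based index of the original word; the inverse repeatedly prepends the last column and re-sorts. This is an illustration of the transform, not an efficient implementation.

```python
def bwt(word):
    """Burrows-Wheeler Transform: (last column L, 1-based index I of w)."""
    n = len(word)
    rotations = sorted(word[i:] + word[:i] for i in range(n))
    last_column = "".join(r[-1] for r in rotations)   # column L
    index = rotations.index(word) + 1                 # position I of w
    return last_column, index

def ibwt(last_column, index):
    """Invert the BWT by repeatedly prepending L and sorting the rows."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    return table[index - 1]
```

Running it on the worked example w = "mathematics" reproduces the pair (mmihttsecaa, 7), and the inverse recovers the original word, demonstrating reversibility.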

DNN Classification
Among deep models, the Deep Convolutional Neural Network (DCNN) has captured the maximum attention of developers in recent times. A common DCNN has convolution, pooling, and Fully Connected (FC) layers, and feature learning is attained by effectively alternating and stacking convolution and pooling layers. The FC layers then map the two-dimensional feature maps produced by the convolutional and pooling layers into one-dimensional feature vectors. In the convolution layers, when M input feature maps are processed by N filters, the output feature map x_j^l of the l-th layer is estimated according to Eq. (11):

x_j^l = f( Σ_(i=1..M) x_i^(l−1) * k_ij^l + b_j^l )   (11)

Fig. 2 shows the structure of the DNN.
Here x_i^(l−1) means the i-th input map; k_ij^l denotes the kernel of the j-th filter linked with the i-th input map; b_j^l shows the bias corresponding to the j-th filter; f(·) refers to the activation function; '*' denotes the convolution operation. Consequently, N feature maps are obtained as outcomes. The parameter count of the convolutional layer is therefore estimated as Eq. (12), where the kernel size is s×s:

N × (s × s × M + 1)   (12)

In the convolution example, the size of the input map is 7×7, the kernel size is 3×3, and the stride is 2. The pooling layer follows a convolutional layer, consumes its activations, and applies an operator (e.g., the average) to extract a measure for each spatial area, so that similar local features are combined into one. One major benefit of pooling layers is their consistency in joining the same features in a local position; in addition, the processing time as well as the number of attributes of the complete network is limited effectively. Pooling tasks can be classified into two classes, namely, max pooling and average pooling: max pooling computes the maximum of a local patch of the feature map, whereas average pooling estimates the average. In the max-pooling and average-pooling examples, the input map size is 4×4, the kernel size is 3×3, and the stride is 2. In the pooling layer, the resultant feature map of the l-th layer is estimated analogously as Eq. (13):

x_j^l = f( β_j^l · down(x_j^(l−1)) + b_j^l )   (13)

where β_j^l implies the multiplicative bias equivalent to the j-th filter and down(·) signifies the sub-sampling function.
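The size and parameter bookkeeping above can be checked with two one-line helpers, assuming the standard no-padding convolution arithmetic and the per-filter count N × (s·s·M + 1) of Eq. (12); the function names are illustrative.

```python
def conv_output_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a convolution/pooling window."""
    return (in_size + 2 * padding - kernel) // stride + 1

def conv_param_count(in_maps, out_maps, kernel):
    """Parameters of a conv layer: one s x s kernel per (input map, filter)
    pair plus one bias per filter, i.e. N * (s*s*M + 1)."""
    return out_maps * (kernel * kernel * in_maps + 1)
```

For the text's example (7×7 input, 3×3 kernel, stride 2) the output map is 3×3; and, for instance, a layer with M = 3 input maps, N = 16 filters and a 3×3 kernel holds 16 × (9·3 + 1) = 448 parameters.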
Then, the convolutional, pooling, and FC layers are applied to classify the features obtained from the actual data. The learned feature maps are flattened into a one-dimensional vector, which the FC layers consume as input. In an FC layer, every value of the input vector is linked to every value of the output vector through a neuron. When the lengths of the input and output vectors are M and N, the output vector of the l-th layer is estimated as Eq. (14):

x_j^l = f( Σ_(i=1..M) w_ij^l x_i^(l−1) + b_j^l )   (14)

where w_ij^l implies the weight of the j-th output value linked with the i-th input value. The parameter count of the FC layer is defined as Eq. (15).

For experimentation, the MIT-BIH Arrhythmia Database is used [22]. Several cardiologists annotated each record independently, and disagreements were resolved to obtain a machine-readable reference annotation for every beat in the database. The results are determined in terms of different dimensions. Tab. 1 presents the analysis of the results obtained by the proposed Chaotic Map Encryption (CME) against the existing Particle Swarm Optimization (PSO) and Elliptic Curve Cryptography (ECC) approaches in terms of Mean-Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR). Fig. 3 investigates the MSE analysis of the proposed CME and the existing methods applied on different records. Fig. 3 shows that the CME has performed better than the PSO and ECC models in terms of MSE on the applied records. For instance, on record 100, the CME has attained a minimum MSE of 0.132, whereas the PSO and ECC models have obtained higher MSEs of 0.277 and 0.564, respectively. Likewise, on record 112, the CME method has gained a lower MSE of 0.161, while the PSO and ECC methodologies have obtained higher MSEs of 0.306 and 0.619, respectively. Later on, for record 213, the CME has achieved a minimum MSE of 0.154, while the PSO and ECC have obtained higher MSEs of 0.372 and 0.549, respectively.

Fig. 4 shows that the CME model has performed better than the PSO and ECC techniques by reaching higher PSNR values on the applied records. For instance, on record 100, the CME model has attained a maximum PSNR of 56.93 dB, whereas the PSO and ECC models have obtained lower PSNRs of 53.71 dB and 50.62 dB, respectively. At the same time, on record 112, the CME model has achieved the highest PSNR of 56.06 dB, whereas the PSO and ECC models have attained lower PSNRs of 53.27 dB and 50.21 dB, respectively. Similarly, on record 213, the CME model has obtained a maximum PSNR of 56.26 dB, while the PSO and ECC models have obtained lower PSNRs of 52.43 dB and 50.74 dB, respectively.
Tab. 2 presents the analysis of the results of the BWT against existing methods in terms of CR and Compression Time (CT). Fig. 5 shows the CR result analysis of the projected BWT and previous methodologies on the applied records. Fig. 5 portrays the better performance of the BWT scheme over the Lempel-Ziv-Welch (LZW) and Arithmetic methods, the BWT attaining lower (better) CR values on the given records. For instance, on record 100, the BWT technology has accomplished a lower CR of 0.279, while the LZW and Arithmetic schemes have reached higher CRs of 0.451 and 0.567, respectively. At the same time, on record 112, the BWT approach has obtained the least CR of 0.618, whereas the LZW and Arithmetic models have produced higher CRs of 0.652 and 0.714, respectively. Along with that, on record 213, the BWT framework has reached a minimal CR of 0.267, whereas the LZW and Arithmetic models have attained higher CRs of 0.389 and 0.479, respectively. Fig. 6 examines the CT analysis of the developed BWT and previous methods on different records. Fig. 6 exhibits that the BWT approach showcased better performance over the LZW and Arithmetic models by gaining the minimum CT on the applied records.
For instance, on record 100, the BWT framework has gained a low CT of 2.46 s, while the LZW and Arithmetic approaches have reached higher CTs of 3.14 and 4.56 s, respectively. In the same line, on record 112, the BWT model has obtained a lower CT of 2.91 s, while the LZW and Arithmetic technologies have incurred higher CTs of 4.68 and 6.79 s, respectively. Then, on record 213, the BWT scheme has attained a minimum CT of 2.09 s, while the LZW and Arithmetic models have obtained higher CTs of 6.53 and 7.98 s, respectively.

Fig. 7 examines the sensitivity and specificity analysis of the DNN against the compared methods. Fig. 7 portrays the inferior classification outcome of the MLP model, with a lower sensitivity of 89.87% and specificity of 87.42%. In addition, the FNV model has shown slightly better results, with a sensitivity of 93.89% and specificity of 88.94%. Moreover, the Support Vector Machine (SVM) has obtained a moderate performance, with a sensitivity of 94.63% and specificity of 89.41%. Furthermore, the k-Nearest-Neighbour (KNN) has achieved a greater sensitivity of 95.31% and specificity of 89.48%. However, the proposed DNN has proficiently classified the disease, with a sensitivity of 96.06% and specificity of 91.49%. Fig. 8 investigates the accuracy and F-score analysis of the DNN against traditional models. Fig. 8 represents the inferior classification results achieved by the MLP approach, with a minimum accuracy of 83.92% and F-score of 93.38%. Also, the SVM approach has managed intermediate results, with an accuracy of 83.93% and an F-score of 94.36%. In addition, the FNC model has gained considerable performance, with an accuracy of 84.41% and an F-score of 94.35%. Moreover, the KNN has achieved a competing accuracy of 84.47% and an F-score of 96.85%.
Thus, the proposed DNN has classified the disease with an accuracy of 84.41% and an F-score of 94.35%.

Conclusion
This study has presented a new ETC-ECG with a classification model for the examination of ECG data. The proposed model involves four major processes, namely, preprocessing, encryption, compression, and classification. Once the gathered ECG signal data are preprocessed, the encryption process uses the chaotic map technique. When the encrypted data get compressed, they are transmitted for further analysis. At last, the DNN is applied to determine the existence of heart diseases. An experimental analysis has been carried out to verify the data security, compression, and classification performance of the presented model on ECG data. The obtained experimental values show that the ETC-ECG attains maximum compression efficiency, security, and detection rate. In future, the proposed ETC-ECG can be deployed on an Internet of Things based cloud platform for remote monitoring of patients.