Automated Learning of ECG Streaming Data Through Machine Learning Internet of Things

Applying machine learning techniques to Internet of Things (IoT) data streams helps achieve better understanding, predict future behavior, and make crucial decisions based on those analytics. The collaboration between IoT, big data, and machine learning can be found in different domains such as health care, smart cities, and telecommunications. The aim of this paper is to develop a method for automated learning of electrocardiogram (ECG) streaming data to detect heartbeat anomalies. A promising solution is to use medical sensors that transfer vital signs to medical care computer systems, combined with machine learning, so that clinicians can be alerted about a patient's critical condition and act accordingly. Since false alarms seriously impair the accuracy of cardiac arrhythmia detection, keeping them to the lowest possible level is the most important factor. The method proposed in this paper demonstrates how machine learning can contribute to health technologies in detecting heart disease by minimizing false alarms. The stages of the heartbeat learning model are proposed and explained, along with the stages of heartbeat anomaly detection.


Introduction
The enormous growth of IoT sensors has produced a vast amount of sensed data over time across a wide range of application fields. Depending on the nature of those applications, the result is big data streams. Applying machine learning techniques to IoT data streams helps achieve better understanding, predict future behavior, and make crucial decisions based on those analytics. That makes IoT a worthy prototype of life-improving technology [1].
IoT applications span many domains. Smart cities: monitoring the effect of seismic forces on bridges, traffic patterns and congestion management on highways, foundation subsidence, and the seismic activity of buildings. Production: IoT sensors can automatically monitor development cycles and manage warehouses and records, enabling optimization and predictive maintenance. Transportation: traffic management such as smart parking and smart accident assistance. Cars: vehicle-to-vehicle (V2V) communication that exchanges information about the speed and presence of surrounding vehicles to avoid crashes and ease traffic congestion. Telecommunications: telecommunications companies use IoT in asset management, remote systems monitoring, and anomaly detection. Retail: IoT is used to enhance supply chain management and smart inventory management.
Recall that in classical computing, algorithms solve problems using predefined sets of instructions, organizing enormous amounts of data into information and services; in machine learning, by contrast, algorithms build a model from training data and then make predictions or decisions accordingly, as illustrated in Fig. 1 [7,8].
Anomaly detection is the process of finding the outliers of a dataset and is considered an example of an unsupervised approach to machine learning [9]. Unsupervised algorithms find correspondences or patterns in input data that have no label or target result known in advance, as shown in Fig. 2C [10]. The first phase of anomaly detection is the recognition or definition of the expected normal behavior. The second phase is the comparison of that normal behavior with the collected or measured behavior, and the final phase is the generation of alerts when the deviation from the mean is significant. In the case of heart arrhythmia, for example, the process starts by looking for deviations from typical readings and then applies this estimate in near real time.
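The three phases can be sketched as a minimal deviation-from-the-mean detector (an illustrative stand-in only; the pipeline proposed later in the paper uses k-means and t-digest, and the function names and sample values here are hypothetical):

```python
from statistics import mean, stdev

def detect_anomalies(baseline, observed, n_sigma=3.0):
    """Flag observed readings deviating significantly from learned
    normal behavior (more than n_sigma standard deviations)."""
    mu = mean(baseline)       # phase 1: learn expected normal behavior
    sigma = stdev(baseline)
    alerts = []
    for t, x in enumerate(observed):       # phase 2: compare measurements
        if abs(x - mu) > n_sigma * sigma:  # phase 3: alert on large deviation
            alerts.append((t, x))
    return alerts

# Example: resting heart rates (bpm) with one arrhythmic spike
normal = [72, 75, 70, 74, 73, 71, 76, 72]
stream = [73, 74, 140, 72]
print(detect_anomalies(normal, stream))  # -> [(2, 140)]
```

The spike at index 2 is flagged while ordinary fluctuations pass silently, which is exactly the false-alarm trade-off the threshold choice controls.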
In this paper, we examine the importance of combining IoT data, data streaming, and machine learning with the intention of lending a helping hand in medical care.
Chronic diseases, such as cardiovascular disease, are among the main causes of death, and transferring such patients to medical care centers at the appropriate time is a major concern. A promising solution is to use medical sensors that transfer vital signs to medical care computer systems, combined with machine learning, so that clinicians can be alerted about a patient's critical condition and act accordingly [11].
Stanford University researchers developed a deep-learning model that can detect cardiac arrhythmia on an electrocardiogram (ECG) with cardiologist-level accuracy. Fig. 3 part (a) shows a portable sensor used to collect 30,000 30-second clips from patients with arrhythmia, while part (b) illustrates the output of the deep-learning algorithm identifying different types of arrhythmia from the ECG data [12].
Heart arrhythmia and other chronic diseases lead to unplanned acute hospital visits and utilization. Current evidence suggests that Remote Patient Monitoring (RPM) can help reduce expensive acute hospital usage and cost by a considerable amount [13]. Data collected from sensors can be analyzed in real time, and notifications are sent to physicians, who are kept updated on any critical changes in the patient's health.
However, since false alarms seriously impair the accuracy of cardiac arrhythmia detection, keeping them to the lowest possible level is the most important factor. A patient in the UCSF medical center barely survived a 39-fold overdose because physicians ignored the alarms, having received too many of them [14].

Related Work
Plentiful research using data mining techniques in the diagnosis of heart disease has inspired this work; a brief survey is presented here.
Abderrahmane, in his research paper, proposed a real-time heart disease prediction system based on Apache Spark. The proposed system consists of two parts: streaming processing, and data storage and visualization. The streaming processing part uses Spark MLlib with Spark Streaming and a classification model to predict heart disease. The data storage and visualization part uses Apache Cassandra for storing the large bulk of generated data [15].
Hlaudi conducted an experiment on predicting heart diseases using various data mining algorithms. The experiment showed that there is no clear difference in prediction between the different classification algorithms. The analytical accuracies determined by the J48, REPTree, and SimpleCart algorithms are reliable indicators of heart disease [16].
Shadman and his team proposed a machine learning technique derived from several machine learning algorithms in a Java-based open-access data mining platform, the Waikato Environment for Knowledge Analysis (WEKA). They validated the proposed algorithm using 10-fold cross-validation and achieved accuracy, sensitivity, and specificity levels of 97.53%, 97.50%, and 94.94%, respectively [17].
Aditi's team developed an application based on a neural network machine learning algorithm that can predict exposure to heart disease given basic parameters such as pulse rate, age, and sex [18].
Experiments were performed by Amin and his team to compare the performance of various feature selection algorithms. The experiments showed that the Support Vector Machine classifier performed best among the classifiers, achieving 86% classification accuracy [19].
Ricardo and his team proposed a heart disease detection method using the Random Forests algorithm, based on clinical data and patient test results. The algorithm showed an overall accuracy of 84.448% when using 10-fold cross-validation, while the achieved accuracy was 82.895% without cross-validation [20].
Sonam and his colleagues introduced a heart disease prediction system with naïve Bayes and decision tree classifiers. The analysis showed that the decision tree classifier has better accuracy than the naïve Bayes classifier [21].
Jaymin and his team conducted a thorough comparison of the different decision tree classification algorithms, looking for the best performance using WEKA. They found the J48 tree technique to be the best classifier [22].
The aim of this paper is to develop a method for automated learning of ECG streaming data to detect heartbeat anomalies. The stages of the heartbeat learning model are proposed and explained, along with the stages of heartbeat anomaly detection. The proposed method demonstrates an example of how machine learning can contribute to health technologies in heart disease detection.

Building a Model for Typical Heart Rate Activity
The electrocardiogram (ECG) is a plot of the electrical impulses of the heart versus time. Heartbeats follow a steady pattern (frequency) of about 60 to 100 beats per minute. Heartbeat rate anomalies result from atrial fibrillation, leading to rapid increases and irregularities (impulses) in the heartbeat rate.
In the proposed method, repetitive heartbeat readings are used to train a model of the normal heart rate. These readings are then used to compare successive observations against the model to judge abnormal behavior. Fig. 6 illustrates the complete flowchart for building the representative heart rate activity model. In the first stage, we collect ECG data from the patient(s), either from an ECG repository or from free ECG data sets such as PTB-XL (a freely accessible clinical 12-lead ECG-waveform dataset comprising 21,837 records from 18,885 patients) [23]. The ECG readings need to be sampled (encoded) into fragments. The sampling rate can be obtained by using the running mean algorithm to compute the running average of the ECG data stream (a running mean analyzes data points by creating a series of means over different partitions of the full data set). The goal of this stage is to choose a sampling rate that is higher than the running mean of the data. A good starting point is Welford's algorithm for the running mean [24]. In the case of ECG, the running mean of the normal ECG is about 3 samples per second, as normal ECG readings range from 60 to 150 pulses per minute (a maximum running mean of 2.5 beats per second). Based on this running mean, the ECG is sampled at one-third of a second intervals, as shown in Fig. 4.
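Welford's update rule can be sketched in a few lines; the beats-per-second values below are illustrative, not taken from a real ECG stream:

```python
class WelfordMean:
    """Numerically stable running mean (Welford's algorithm): the mean
    is updated incrementally per sample without storing the stream."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean

# Illustrative beats-per-second values from a 60-150 bpm range
rates_bps = [1.0, 1.2, 2.5, 1.8, 2.0]
w = WelfordMean()
for r in rates_bps:
    m = w.update(r)
print(round(m, 2))  # -> 1.7, well below a 3 Hz sampling rate
```

Because the running mean stays below 3 beats per second, a 3-samples-per-second rate (one sample every third of a second) satisfies the rule of sampling faster than the running mean.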
In the k-means clustering stage, the algorithm classifies the fragments obtained from the encoding (sampling) stage into categories based on similarities between them. The algorithm groups the observations into k clusters, with each observation assigned to the cluster whose mean it is nearest to [25]. Fig. 5 illustrates the working principle of the k-means clustering algorithm.
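A minimal k-means over fixed-length fragments can be sketched as follows (the fragment values are toy data standing in for encoded ECG samples):

```python
import random

def kmeans(fragments, k, iters=20, seed=0):
    """Minimal k-means for fixed-length fragments: assign each fragment
    to its nearest centroid, then recompute centroids as the
    per-dimension mean of the members."""
    rng = random.Random(seed)
    centroids = rng.sample(fragments, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in fragments:
            # squared Euclidean distance to every centroid
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                    for a, b in zip(f, centroids[c])))
            clusters[j].append(f)
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return centroids, clusters

# Two obviously different beat shapes: near-flat vs. spiked fragments
frags = [[0, 0, 0], [0, 1, 0], [5, 9, 5], [5, 8, 5]]
cents, cls = kmeans(frags, k=2)  # separates flat from spiked fragments
```

After convergence, the two spiked fragments share one cluster and the near-flat ones the other, mirroring how similar heartbeat fragments are grouped before training the model.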
After classifying the fragments into clusters, they are used to train the system, i.e., to create the model. The created model is then used to reconstruct the ECG of any incoming data. Fig. 6 illustrates a flowchart of building the normal ECG model. Heartbeat rate anomalies lead to irregularities (impulses) in the heartbeat rate. Hence, the next stage is detecting anomalies, i.e., irregularities in upcoming ECG readings. Starting from the patient's ECG data, the running mean algorithm chooses the sampling rate, and the k-means clustering algorithm classifies the fragments obtained from the encoding (sampling) stage into categories based on similarities between them. The ECG readings are consequently reconstructed.
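Reconstruction can be sketched as replacing each incoming fragment with its nearest learned centroid; the distance left over is the residual fed to the next stage. The centroid values here are hypothetical, standing in for a trained model:

```python
# Hypothetical centroids learned by k-means from normal ECG fragments
centroids = [[0, 0, 0], [5, 9, 5]]

def reconstruct(fragment, centroids):
    """Replace an incoming fragment with its nearest centroid; the
    Euclidean distance to that centroid is the reconstruction residual."""
    best = min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(fragment, c)))
    residual = sum((a - b) ** 2 for a, b in zip(fragment, best)) ** 0.5
    return best, residual

print(reconstruct([0, 1, 0], centroids))   # normal beat: small residual
print(reconstruct([2, 20, 2], centroids))  # irregular beat: large residual
```

A fragment resembling a learned shape reconstructs with a small residual, while an irregular beat leaves a large residual that the threshold stage can flag.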
To find residues in the reconstructed ECG, the t-digest algorithm is used to compare it with the normal behavior in the model. The t-digest is an algorithm that estimates quantiles from a compact sketch [26]. In this anomaly detection, the threshold of the t-digest is set to the 99.9% quantile; this means that the lowest 99.9% of deviations in the ECG data are considered normal, which helps minimize false alarms. Any deviation above this quantile is regarded as a symptom of heart arrhythmia and triggers an alarm. The anomaly detection procedure is shown in Fig. 7.
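The thresholding logic can be sketched with an exact empirical quantile as a simplified stand-in for the t-digest sketch (a real t-digest approximates the quantile in bounded memory over an unbounded stream; the residual values here are illustrative):

```python
def quantile_threshold(residuals, q=0.999):
    """Exact empirical q-quantile of residuals observed on normal data
    (a t-digest would approximate this over a stream)."""
    s = sorted(residuals)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[idx]

# Illustrative residuals observed while training on normal ECG
normal_residuals = [0.1 * i for i in range(1000)]
threshold = quantile_threshold(normal_residuals)  # ~99.9 for this data

def is_arrhythmia(residual):
    # Only deviations above the 99.9% quantile trigger an alarm
    return residual > threshold

print(is_arrhythmia(1.0))    # -> False: well within normal behavior
print(is_arrhythmia(150.0))  # -> True: triggers an alarm
```

Setting the cut at the 99.9% quantile means ordinary reconstruction error almost never fires, which is how the design keeps false alarms low while still catching large deviations.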

Conclusions
This paper proposed an integrated machine learning and IoT streaming system to detect heart disease by monitoring heartbeat data. Our proposed model aims at automating the ECG data learning process in order to detect heartbeat anomalies. The proposed method takes repetitive readings of normal heartbeats, typically from an ECG repository or from free ECG data sets such as PTB-XL. Those readings are later used to create the heart rate model representing the normal heartbeat rate.
The ECG is sampled at one-third of a second intervals to achieve 3 samples per second, as normal ECG readings range from 60 to 150 pulses per minute. Using the k-means clustering algorithm, the fragments obtained from the encoding (sampling) stage are classified into categories based on similarities between them, forming the basis of the training process for our proposed model. These readings are then used to compare successive observations against the model to detect and judge abnormal behavior, and the ECG readings are consequently reconstructed. The system compares the patient's signals against the stored repository to detect asymmetrical heartbeats in real time.
To find residues in the reconstructed ECG, the threshold of the t-digest algorithm is set to the 99.9% quantile; this means that the lowest 99.9% of deviations in the ECG data are considered normal, which helps minimize false alarms. With such a threshold, only deviations in the top 0.1% are considered a symptom of heart arrhythmia and trigger an alarm.
Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.