An Online Chronic Disease Prediction System Based on Incremental Deep Neural Network

Many chronic disease prediction methods have been proposed to predict or evaluate diabetes with artificial neural networks. However, due to the complexity of the human body, many challenges remain. One of them is how to make the neural network prediction model continuously adapt to and learn from the disease data of different patients online. This paper presents a novel chronic disease prediction system based on an incremental deep neural network. The propensity of users to suffer from chronic diseases can be evaluated continuously in an incremental manner. Over time, the system predicts diabetes more and more accurately by processing feedback information. Many diabetes prediction studies are based on a common dataset, the Pima Indians diabetes dataset, which has only eight input attributes. In order to determine the correlation between the pathological characteristics of diabetic patients and their daily life data, we established an in-depth cooperation with a hospital and created a Chinese diabetes dataset with 575 diabetics. Users' data collected by different sensors were used to train the network model. We evaluated our system using a real-world diabetes dataset to confirm its effectiveness. The experimental results show that the proposed system can not only continuously monitor users, but also give early warning of physiological data that may indicate future diabetic ailments.

or delay 80% of type 2 diabetes complications. A medical decision-making process for the early detection of diabetes is urgently needed. Because traditional manual data analysis is inefficient, computer-based analysis has become essential [1]. To this end, many computer-based data analysis methods have been considered and examined. Data mining is a major advance among these analysis tools. It has been shown that introducing data mining into disease prediction can improve the accuracy of diagnosis, reduce costs and save human resources [2].
A number of disease prediction systems using various classification techniques (such as Naive Bayes (NB) [3], multi-domain learning [4], dual-chaining [5] and nonlinear activation [6]) have been proposed. However, one of the main challenges of neural networks is that when users define the network topology, the number of hidden layers, the number of neurons, the number of training iterations and the learning rate must all be optimized. These parameters have to be fixed before training. AutoMLP [7] was proposed to adjust the network parameters automatically. However, the accuracy of this method is still unsatisfactory in real environments.
Deep learning [8][9][10][11] has developed rapidly in recent years. Deep learning methods are widely used in the medical field, including medical prognosis and cell event detection. Phan et al. [12] found that, due to the lack of annotated data, supervised deep learning methods have inherent limitations. They proposed a novel unsupervised two-path input neural network architecture to capture the irregular changes in cell appearance and motion. Unsupervised deep learning methods show an advantage on general (non-cell) video because they can learn the visual appearance and motion of events that occur periodically. One cause of poor performance in the prognosis of diabetes is overfitting. Ashiquzzaman et al. [13] developed a prediction system for diabetes in which overfitting was reduced by using the dropout method. They tested their system on the Pima Indians Diabetes Dataset (PIDD) and obtained a remarkable performance. However, the small number of samples and training attributes in PIDD limits its application.
The parameters of the network model need to be constantly updated due to the complexity and unpredictability of the human body and of disease, as well as the irregularity of biological information [14]. In this paper, a novel network model is proposed that adapts to changing human physiological data through incremental learning. The longer a user uses the system, the more accurately it can predict diabetes from feedback information. The overall architecture is shown in Fig. 1.
Compared with the current state-of-the-art diabetes prediction methods, the contributions of our work can be summarized as follows:
• The proposed network model is able to predict a patient's risk of diabetes in real time.
• The proposed system can adapt to the constantly changing physiological data of users by using an incremental learning technique.
• When a misjudgment occurs, the error samples are fed back to the server for network model adjustment. Thus, after a period of incremental learning, the network model becomes more and more adapted to the user.
The rest of the paper is organized as follows. Section 2 presents related work on incremental learning algorithms. The proposed method is presented in Section 3. Section 4 presents the experimental results and discussion. Finally, concluding remarks are given in Section 5.

Related Work
Machine learning can be divided into supervised learning, unsupervised learning [15] and semi-supervised learning [16]. In supervised learning, training samples need to be labeled, while unsupervised learning can train without labeled samples. Most samples in the real world have no labels, so unsupervised learning is easier to apply than supervised learning. As an important type of unsupervised learning, clustering has been widely used for data classification when labeled training data are not available. Some clustering algorithms, such as k-means [17], Expectation-Maximization (EM) [18], and their variations [19][20][21][22][23][24], have been exploited for classification applications. The BP network model has been widely used in diabetes prediction systems. However, it has the disadvantage that the neural network model must be retrained whenever new detection data are generated. In addition, when the number of online users increases, the remote server may not be able to complete the training task in time. Reference [25] found that BP neural network models are not suitable for online prediction systems. Incremental learning [26,27] was proposed to overcome this weakness. In this section, a brief background on classical incremental learning algorithms is presented.

Self-Organizing Map
The Self-organizing Map (SOM) [28] is one of the best-known incremental learning models. It provides a topological mapping from a high-dimensional space onto the map neurons. A SOM usually contains a two-layer neural network based on competitive learning, which can be used for online clustering and topological representation without prior knowledge. Moreover, it is robust to noisy data. There are many improved versions of SOM that have similar learning mechanisms.
In SOM models, a certain number of neurons are randomly distributed in a certain space. Their connections are initialized to determine the initial topology of the network. During training, all neurons compete with each other for the right to respond to the current input. The winning neurons update their parameters to adapt to the new input. This competition mechanism eventually makes neurons in different regions more sensitive to different input patterns. Therefore, a competitive neural network is a model for pattern recognition. However, it is difficult to obtain stable learning results while maintaining plasticity. We need an adaptive learning system that can adapt to a changing environment in real time. If the system is too stable, it cannot adapt to a fast-changing environment. Conversely, if the system is too sensitive to external stimuli, it is difficult to retain previously learned knowledge or even to converge to a stable state [29]. When there is no prior knowledge and the external input pattern changes with time, the self-organization of the network and the incremental nature of the algorithm are the key points of the learning system. One of the most representative self-organization models is the Growing Neural Gas (GNG) [30] network. Its number of neurons can grow dynamically with the input data. The GNG network has no parameters that change with time and can learn continuously. It keeps updating the network model by adding nodes and connections until it reaches the performance standard. Thus, GNG can adapt to changes of the input pattern and is more dynamic than the SOM network.
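The competition mechanism described above can be sketched in a few lines of Python. This is a minimal winner-take-all update under stated assumptions, not the full SOM of [28]: the neighborhood function is omitted, and the names `competitive_step` and `learning_rate` are illustrative only.

```python
import math

def competitive_step(weights, x, learning_rate=0.5):
    """Move the neuron closest to x toward x and return the winner's index."""
    # Competition: the neuron with the smallest Euclidean distance to x wins.
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    # Adaptation: only the winner updates its weight vector.
    weights[winner] = [w + learning_rate * (xi - w)
                       for w, xi in zip(weights[winner], x)]
    return winner

neurons = [[0.0, 0.0], [10.0, 10.0]]
winner = competitive_step(neurons, [2.0, 0.0])
```

Here the first neuron wins and moves halfway toward the input, while the second neuron is left unchanged, which is how neurons in different regions become sensitive to different input patterns.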

Self-Organizing Incremental Neural Network
In order to further improve the plasticity of the network, the Self-organizing Incremental Neural Network (SOINN), based on SOM and GNG, was proposed in [31]. SOINN uses a two-layer neural network to represent the topological structure of unsupervised online data. By using a similarity-threshold-based and a local-error-based insertion criterion, the network is able to grow incrementally. Its working process is shown in Fig. 2. SOINN finds the closest node (winner) and the second closest node (second winner) of the input vector. A similarity threshold is used to determine whether the input vector belongs to the same cluster as the winner or the second winner. If node i has neighbor nodes, the maximum distance between node i and its neighboring nodes is used as the similarity threshold $T_i$:

$$T_i = \max_{j \in N_i} \lVert W_i - W_j \rVert$$

where $N_i$ is the set of neighbor nodes of node i and $W_i$ is the weight vector of node i. If node i has no neighbor node, the similarity threshold $T_i$ is defined as the minimum distance between node i and the other nodes in the network:

$$T_i = \min_{j \in N \setminus \{i\}} \lVert W_i - W_j \rVert$$

where N is the set of all nodes. If the input vector V is judged to belong to the cluster of the first winner $S_1$ or the second winner $S_2$, an edge is created between $S_1$ and $S_2$, and the 'age' of the edge is set to '0'.
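The two cases of the similarity threshold described above can be written as one small function. The following is a minimal sketch; the dictionary-based data layout and the name `similarity_threshold` are assumptions made for illustration and are not taken from [31].

```python
import math

def similarity_threshold(i, weights, neighbors):
    """Similarity threshold T_i of node i.

    weights:   dict mapping node id -> weight vector
    neighbors: dict mapping node id -> set of connected node ids
    """
    if neighbors.get(i):
        # Node has neighbors: maximum distance to any neighbor.
        return max(math.dist(weights[i], weights[j]) for j in neighbors[i])
    # Isolated node: minimum distance to any other node in the network.
    return min(math.dist(weights[i], weights[j]) for j in weights if j != i)

weights = {0: [0.0, 0.0], 1: [3.0, 4.0], 2: [0.0, 1.0]}
neighbors = {0: {1}, 1: {0}, 2: set()}
t0 = similarity_threshold(0, weights, neighbors)  # node 0 has one neighbor
t2 = similarity_threshold(2, weights, neighbors)  # node 2 is isolated
```

With these toy values, node 0 takes the distance to its only neighbor (5.0), while the isolated node 2 takes the distance to its nearest node (1.0).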
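The age of all edges linked to the winner is then increased by '1'. The weight vectors of the winner and its neighboring nodes are updated as follows:

$$\Delta W_i = \varepsilon_1(t)\,(V - W_i), \qquad \Delta W_j = \varepsilon_2(t)\,(V - W_j)$$

where $\Delta W_i$ is the change to the weight of the winner i, $\Delta W_j$ is the change to the weight of a neighbor j, V is the input vector, and $\varepsilon_1(t)$ and $\varepsilon_2(t)$ are the learning rates of the winner and of its neighbors. The winner $S_1$ and the second winner $S_2$ are found by searching the node set A as follows:

$$S_1 = \arg\min_{c \in A} \lVert V - W_c \rVert, \qquad S_2 = \arg\min_{c \in A \setminus \{S_1\}} \lVert V - W_c \rVert$$

where the connection set $C \subset A \times A$ is initialized to the empty set. If the input vector V lies outside the similarity threshold of $S_1$ or $S_2$, it is defined as a new node and added into A. If $S_1$ and $S_2$ are not connected, these closest nodes are connected by an edge with an 'age' of '0'. If an edge is older than the predefined parameter $age_{max}$, the edge is removed.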
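The edge bookkeeping described here (aging the edges of the winner, then removing edges that are too old) can be sketched as follows. This is a minimal illustration; storing edges as a dictionary keyed by `frozenset` pairs and the name `age_and_prune` are assumptions, not part of SOINN as published.

```python
def age_and_prune(edges, winner, age_max):
    """Age every edge linked to the winner, then drop edges older than age_max.

    edges: dict mapping frozenset({a, b}) -> age of the edge between a and b
    """
    for edge in edges:
        if winner in edge:
            edges[edge] += 1  # edges linked to the winner grow older
    # Keep only edges whose age does not exceed the predefined maximum.
    return {edge: age for edge, age in edges.items() if age <= age_max}

edges = {frozenset({0, 1}): 2, frozenset({1, 2}): 0}
edges = age_and_prune(edges, winner=0, age_max=2)
```

In this toy run the edge between nodes 0 and 1 reaches age 3 and is removed, while the edge between nodes 1 and 2 is untouched because it is not linked to the winner.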
After every λ learning iterations (λ is a timer), SOINN inserts a new node at the position where the accumulated error is largest. The insertion is canceled if it does not decrease the error. Because it learns incrementally, SOINN can find new clusters in a data stream without affecting the structure learned from previous data. SOINN can adjust itself accordingly, which makes it suitable for robot intelligence, computer vision, expert systems, anomaly detection and other fields.
Although SOINN has shown excellent classification ability in some applications, it still has some disadvantages. The user must determine when first-level learning stops and when second-level learning starts. In addition, if there is a high-density overlap between clusters, the clusters in the network will be linked together to form a single new cluster. If the learning results of the first level change, the second level must be completely retrained. Therefore, the second layer of SOINN is not suitable for online incremental learning. If a prototype x i is the nearest neighbor or the second nearest neighbor of the given sample n, the threshold will change. This means that new information can be learned without destroying previous knowledge. Thus, the second layer does not need to be fully retrained. Furao et al. [32] proposed an Enhanced Self-organizing Incremental Neural Network (ESOINN) to accomplish online unsupervised learning tasks. ESOINN has been shown to be superior to SOINN in the following respects: (1) it adopts a single-layer network in place of the two-layer network structure of SOINN; (2) it separates clusters with high-density overlap; (3) it uses fewer parameters than SOINN; and (4) it is more stable than SOINN [32]. Fig. 3 presents a comparison of SOINN and ESOINN on the same dataset.

Chronic Disease Prediction
First, we cooperated with a hospital to obtain pathological diagnosis data of diabetes on a certain scale. Before the pathological data of these diabetic patients were provided, names, IDs and other identifying information were deleted to protect the privacy of the patients. Second, we fed these hospital patients' pathological data into the proposed neural network for training. The pathology of diabetes is closely related to factors such as changes in daily life habits, clinical symptoms, standard values, high-risk groups and so on. Finally, an intelligent prediction system based on incremental deep learning was established to predict the incidence of diabetes. The proposed system collects the user's daily life data through sensors on smart home appliances. The periodicity and regularity of the data are then analyzed. People at high risk of diabetes are warned to see a doctor.

Identifying the Characteristics of Diabetic Patients
Many diabetes prediction studies have been based on a common dataset, the Pima Indians Diabetes Dataset [2,13]. There are 768 instances in this dataset, and every instance has 8 input attributes (X 1 to X 8) and 1 output attribute (Y), which are listed in Tab. 1. The number of samples in this dataset is insufficient for machine learning. Moreover, due to the differences in physical characteristics between Asians and Europeans, the dataset cannot be used to accurately predict diabetes in Asians (especially Chinese). In order to determine the correlation between diabetic patients' pathological characteristics and their daily life data, we carried out in-depth cooperation with a hospital. Patients with diabetes and impaired glucose tolerance usually monitor their blood glucose through lifestyle and physical changes. These changes include Body Mass Index (BMI), Waist to Hip Ratio (WHR), blood pressure, blood lipid, fasting blood glucose, OGTT, 2-hour Postprandial Blood Glucose (2hPBG), etc. Finally, we created a Chinese Diabetes Dataset (CDD) containing 575 patients. Some samples are shown in Tab. 2. An application was developed to collect data from intelligent devices.
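Two of the indicators listed above, BMI and WHR, are simple ratios that can be derived from raw measurements. The sketch below uses the standard definitions (weight in kilograms divided by squared height in meters, and waist circumference divided by hip circumference); the function names and the sample values are illustrative assumptions and are not taken from CDD.

```python
def body_mass_index(weight_kg, height_m):
    """BMI: body weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2

def waist_to_hip_ratio(waist_cm, hip_cm):
    """WHR: waist circumference divided by hip circumference (same unit)."""
    return waist_cm / hip_cm

# Hypothetical measurements, used only to show the arithmetic.
bmi = body_mass_index(70.0, 1.75)
whr = waist_to_hip_ratio(85.0, 100.0)
```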

The Proposed Incremental Learning Network Model
This paper presents a method to adjust the network parameters online by learning from user feedback data, which improves the accuracy and personalization of the network model. The goal of the proposed deep learning model is to iteratively identify user feedback data that carry a large amount of information. Expanding the training set with a large number of informative unlabeled samples can improve the prediction accuracy. Thus, unlabeled samples are selected according to their potential contribution to training.
Both between-class and within-class insertions are important to the second layer of SOINN [31]. One drawback is that if the results of the first level change, all the learning results of the second level are destroyed. The second layer must then be retrained, which means that the second layer of SOINN is not suitable for online incremental learning. Since many nodes are generated in high-density regions, the distance between adjacent nodes in these regions is shorter. Here, we define the range thresholds T s1 and T s2 as the Euclidean distance between two neurons. S1 and S2 are the two neurons in the neuron set that are closest to the new input data. First, the winner and second winner of the input data are located. Based on the range thresholds T s1 and T s2 , a connection between these two winners is then added or removed. We update the winner's density and weight after each learning step. Nodes caused by noise are also deleted in the process. After learning, all nodes are classified into different classes. The process of learning a new unlabeled sample is summarized in Algorithm 1.

Algorithm 1: Proposed incremental learning algorithm
Input: ξ: new input data; A: neuron set; C: connection matrix
Begin
1: Initialize the neuron set A = {C1, C2}, where the weights of neurons C1 and C2 are $W_1, W_2 \in \mathbb{R}^n$.
2: Initialize the edge set C ⊆ A × A as the empty set, that is, there is no initial connection between neurons.
3: Establish a two-dimensional matrix age(m, n) = −1, where m and n are maximum integers that can be increased automatically according to the actual situation. '−1' means that there is no connection between neurons m and n.
4: Define the range thresholds $T_{S1}$ and $T_{S2}$ as the Euclidean distance between two neurons, that is, $T_{S1} = T_{S2} = \lVert W_1 - W_2 \rVert$.
5: Find the two neurons S1 and S2 in A that are closest to ξ:
   $S1 = \arg\min_{c \in A} \lVert \xi - W_c \rVert$, $S2 = \arg\min_{c \in A \setminus \{S1\}} \lVert \xi - W_c \rVert$
6: If $\lVert \xi - W_{S1} \rVert > T_{S1}$ or $\lVert \xi - W_{S2} \rVert > T_{S2}$, then create a new node R for ξ, add R into A, and set $W_R = \xi$. Go to step 5.
7: If there is no connection between S1 and S2, create a connection between them:
   C = C ∪ {(S1, S2)}, age(S1, S2) = 0
8: Increase by '1' the age of all edges connected to the winner node:
   age(S1, i) = age(S1, i) + 1, where i is a node connected with S1.
9: Update the weights of the two winner nodes:
   $W_{S1} \leftarrow W_{S1} + \varepsilon(t)\,(\xi - W_{S1})$, $W_{S2} \leftarrow W_{S2} + \varepsilon(t)\,(\xi - W_{S2})$
   where the parameter ε(t) is the learning rate, which differs for each neuron, and t is the number of times that the neuron has become the winner.
10: For each connection e(i, j) in edge set C do
      If age(i, j) > agemax then remove e(i, j) from C
    End For
    where the value of agemax is set to 1/3 of the dimension of the input data vector according to the experimental results.
11: Update the range thresholds $T_{S1}$ and $T_{S2}$ to the maximum distance from the adjacent data points of S1 and S2:
    $T_{S1} = \arg\min_{\xi, S1} \lVert \xi - S1 \rVert$, $T_{S2} = \arg\min_{\xi, S2} \lVert \xi - S2 \rVert$
12: If the total number of input data samples is an integral multiple of λ, then
      Check the neuron set A of the whole neural network.
      If Count({si, sj}) = 1, {si, sj} ∈ C then
        Set j as the neuron connected with si, and delete si from neuron set A.
      End If
    End If
End
Output: The neuron set A and connection matrix C.
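The steps of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the learning rate is assumed to be ε(t) = 1/(t + 1), where t counts earlier updates of the neuron; the range thresholds are recomputed on demand (maximum distance to connected neurons, or minimum distance to any other neuron if there are none) instead of being stored; after a new neuron is created, the sketch waits for the next input; and at least two neurons are always kept. The class and attribute names are hypothetical.

```python
import math

class IncrementalNet:
    """Sketch of Algorithm 1; see the assumptions stated in the text above."""

    def __init__(self, w1, w2, age_max, lam):
        self.w = {0: list(w1), 1: list(w2)}   # neuron set A: id -> weight vector
        self.wins = {0: 0, 1: 0}              # how often each neuron was updated
        self.age = {}                         # edge set C: frozenset({a, b}) -> age
        self.age_max, self.lam = age_max, lam
        self.seen, self.next_id = 0, 2

    def neighbors(self, n):
        return [m for e in self.age if n in e for m in e if m != n]

    def threshold(self, n):
        """Range threshold: max distance to neighbors, else min distance to others."""
        nb = self.neighbors(n)
        if nb:
            return max(math.dist(self.w[n], self.w[m]) for m in nb)
        return min(math.dist(self.w[n], self.w[m]) for m in self.w if m != n)

    def learn(self, xi):
        self.seen += 1
        # Step 5: winner s1 and second winner s2.
        s1, s2 = sorted(self.w, key=lambda n: math.dist(self.w[n], xi))[:2]
        # Step 6: xi lies outside a range threshold, so it becomes a new neuron.
        if (math.dist(xi, self.w[s1]) > self.threshold(s1)
                or math.dist(xi, self.w[s2]) > self.threshold(s2)):
            self.w[self.next_id], self.wins[self.next_id] = list(xi), 0
            self.next_id += 1
        else:
            self.age.setdefault(frozenset((s1, s2)), 0)   # step 7: connect winners
            for e in self.age:                            # step 8: age s1's edges
                if s1 in e:
                    self.age[e] += 1
            for s in (s1, s2):                            # step 9: move toward xi
                self.wins[s] += 1
                eps = 1.0 / (self.wins[s] + 1)            # assumed schedule
                self.w[s] = [w + eps * (x - w) for w, x in zip(self.w[s], xi)]
            # Step 10: drop edges older than age_max.
            self.age = {e: a for e, a in self.age.items() if a <= self.age_max}
        # Step 12: every lam inputs, delete neurons with exactly one connection.
        if self.seen % self.lam == 0:
            for n in [n for n in self.w if len(self.neighbors(n)) == 1]:
                if len(self.w) > 2 and len(self.neighbors(n)) == 1:
                    del self.w[n], self.wins[n]
                    self.age = {e: a for e, a in self.age.items() if n not in e}

net = IncrementalNet([0.0, 0.0], [4.0, 0.0], age_max=2, lam=100)
net.learn([2.0, 0.0])    # inside both thresholds: winners are linked and moved
net.learn([10.0, 0.0])   # outside the thresholds: stored as a new neuron
```

In the usage lines, the first sample falls within the range thresholds of both initial neurons, so they are connected and pulled toward it. The second sample is far from both, so it is inserted as a third neuron.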

Experiments Results and Analysis
Experiments were carried out on a PC server with a GPU (Nvidia GeForce GTX 1080Ti with 11 GB of RAM) to evaluate the performance of the proposed method. Previous diabetes prediction studies used a common dataset, the Pima Indians Diabetes Dataset (PIDD), to test their performance [2,13]. PIDD was obtained from the UCI machine learning repository [33]. It is a subset of a larger dataset held by the National Institute of Diabetes and Digestive and Kidney Diseases. All patients in this dataset are women of Pima Indian heritage over the age of 20. The output variable is 0 or 1, where 0 indicates a negative test and 1 indicates a positive test. All diabetes data have been normalized. Of the 768 samples, 268 (34.9%) are positive and 500 (65.1%) are negative. However, as described in Section 3.1, PIDD is too small and contains only non-Asian samples. We therefore created a Chinese Diabetes Dataset (CDD) with 5222 patients for the experiments. 70% of the samples were used for training and the rest for testing.
In order to evaluate the effectiveness of the proposed diabetes prediction method, four types of methods (Bayes network [2], automatic multilayer perceptron [34], DNN [35] and DNN with Dropout [13]) were adopted for comparison. We define accuracy as the percentage of patients that are correctly diagnosed by a prediction method. The first experiment tested the accuracy of diabetes prediction on the PIDD dataset. The results of the previous methods discussed in the paper, as well as the result of the proposed method, are listed in Tab. 3. In this experiment, we used only the 8 attributes provided in the PIDD dataset to train the proposed neural network.

Table 3:
No.  Reference                     Method                                         Accuracy (%)
6    Polat et al. [37]             Generalized discriminant analysis and LS-SVM   82.1
7    Purushottam et al. [38]       C4.5 rules and partial tree                    81.3
8    Nnamoko et al. [39]           Meta model of 5 classifier                     77.0
9    Kalaiselvi and Nasira [40]    Neuro-Fuzzy inference system                   80.0
10   Daho et al. [3]               Neuro-fuzzy Classifier                         82.3
11   Proposed method               Incremental learning network                   89.1
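The accuracy measure and the 70%/30% split used in this section are straightforward to compute. The sketch below illustrates them with the PIDD class counts quoted above; shuffling with a fixed seed and the function names are assumptions for illustration, since the splitting procedure is not described in detail.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Percentage of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

labels = [1] * 268 + [0] * 500                    # PIDD: 268 positive, 500 negative
train, test = train_test_split(labels)
positive_share = 100.0 * 268 / len(labels)        # share of positive samples in %
demo_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # toy labels: 3 of 4 correct
```

Note that with 65.1% negative samples, a classifier that always predicts "negative" already reaches 65.1% accuracy on PIDD, which is a useful baseline when reading the accuracy figures.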

Figure 4: The prediction results after 1 to 9 rounds of incremental training. In the first row of figures, from left to right, are the 1st, 2nd and 3rd training rounds; in the second row, the 4th, 5th and 6th; and so on.

It can be seen that in the first test on PIDD, our proposed method outperformed existing state-of-the-art methods. However, the experimental results of the proposed method were not much better than those of the others. This is because the neural network model used for prediction was not deep enough, which created a bottleneck in the prediction accuracy of the proposed method. Moreover, the small number of training attributes provided by PIDD also limited the performance of artificial neural networks. Therefore, in the second experiment, we evaluated four well-known methods [2,13,34,35] on the CDD dataset, as shown in Tab. 4. The experimental results demonstrate that the richer attributes provided by the CDD dataset can significantly enhance the accuracy of diabetes prediction. Meanwhile, the larger amount of data makes it easier for the deep neural network model to learn the characteristics of diabetes. Note that the accuracy of [35], which uses a DNN for prediction, was quite low. This is because the deep neural network used there had too many hidden layers and lacked a means of weight reduction. During training, the neural network fitted the noise in the training data and the unrepresentative features of the training samples, so the network was overfitted. The accuracy of that method can be improved by using a two-dimensional data projection technique [41]. In [13], the prediction was also based on a DNN, and the performance was improved by using the Dropout technique. On the CDD dataset, the proposed method obtained a notably high score.
In order to further demonstrate the incremental learning characteristics of our proposed method, we used another dataset, the diabetes daily dataset (DDD), which contains daily life records (e.g., walking distance and sleeping time) of patients. The third experiment was conducted on DDD. In this experiment, the samples were divided into 9 groups to test the efficiency of incremental learning. The prediction results after 1 to 9 rounds of incremental training are presented in Fig. 4.
The prediction errors corresponding to Fig. 4 were also calculated and are shown in Fig. 5. The experimental results demonstrate that the proposed incremental learning model is able to update its parameters continuously according to the feedback information.
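The protocol of the third experiment (divide the samples into 9 groups, train incrementally on one group at a time, and measure the error after each round) can be sketched as follows. The `RunningMeanClassifier` is only a toy stand-in for the proposed network, and the synthetic one-dimensional data, the held-out test set and all names are assumptions made for illustration.

```python
def split_into_groups(samples, k=9):
    """Split samples into k consecutive groups of near-equal size."""
    size, extra = divmod(len(samples), k)
    groups, start = [], 0
    for g in range(k):
        end = start + size + (1 if g < extra else 0)
        groups.append(samples[start:end])
        start = end
    return groups

class RunningMeanClassifier:
    """Toy incremental model: keeps one running mean per class."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def partial_fit(self, batch):
        for x, y in batch:
            self.sums[y] = self.sums.get(y, 0.0) + x
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        # Predict the class whose running mean is closest to x.
        return min(self.sums, key=lambda y: abs(x - self.sums[y] / self.counts[y]))

def incremental_errors(model, groups, test_set):
    """Train on one group per round and record the test error after each round."""
    errors = []
    for batch in groups:
        model.partial_fit(batch)
        wrong = sum(model.predict(x) != y for x, y in test_set)
        errors.append(wrong / len(test_set))
    return errors

# Synthetic data: the label is 1 when the value is at least 50.
data = [(float(v), int(v >= 50)) for v in range(100)]
groups = split_into_groups(data[::2], k=9)      # even values: training stream
errors = incremental_errors(RunningMeanClassifier(), groups, data[1::2])
```

In this toy run the first group contains only negative samples, so half of the test set is misclassified after round one, and the error drops to zero once both classes have been seen. This only illustrates the bookkeeping and is not a reproduction of Fig. 5.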

Conclusions
Artificial intelligence and big data analysis are receiving more and more attention in medical application research. In this paper, a novel incremental learning network model was proposed. Using this network model, a chronic disease prediction system can evaluate users' likelihood of chronic disease in real time. We have also developed a data collection process to continuously collect physiological information from users. Errors are fed back to the server system, and the neural network model is adjusted by processing the misjudgment information. When a user shows similar feature information again, the prediction system gives a warning. In this case, the parameters of the neural network are not updated. After a period of incremental learning and feedback adjustment, the neural network becomes more and more adapted to each user. A series of experiments verifies the effectiveness of our proposed system.