Prediction of Parkinson’s Disease Using Improved Radial Basis Function Neural Network

Abstract: Parkinson’s disease is a neurodegenerative disorder that is difficult to diagnose, and no therapy is known to slow its progression. This paper contributes a novel analytic system for Parkinson’s disease prediction using an Improved Radial Basis Function Neural Network (IRBFNN). The performance of an RBFNN depends strongly on the centers of its hidden neurons, which are conventionally found using K-means. However, the sensitivity of K-means to the random initial centroids degrades the performance of the RBFNN. Thus, a metaheuristic algorithm called particle swarm optimization (PSO) is integrated with K-means to reduce the dependence on the random initial centroids and to compute optimal centers for the hidden neurons in IRBFNN. PSO K-means is designed to evaluate fitness measures such as intra-cluster distance and inter-cluster distance. Experiments were performed on three Parkinson’s datasets obtained from the UCI repository. The proposed IRBFNN is compared with other variations of RBFNN, conventional machine learning algorithms, and other Parkinson’s disease prediction algorithms. The proposed IRBFNN achieves an accuracy of 98.73%, 98.47% and 99.03% on the three Parkinson’s datasets. The experimental results show that IRBFNN maximizes the accuracy in predicting Parkinson’s disease with minimum root mean square error.

The motivation behind using IRBFNN is to predict PD with maximum accuracy, positive predictive value, and negative predictive value, and with minimum error. The problem with the conventional RBFNN is that the performance of the classifier depends on the centers of the hidden neurons [13][14][15]. Conventionally, K-means clustering was used to find the centers of the hidden neurons in RBFNN. K-means is sensitive to the initial centroids, which heavily affects the performance of RBFNN [16]. In order to build an efficient RBFNN, an optimal radial basis function has to be constructed for the hidden neurons. Some traditional methods used to find RBF centers are K-means, K-means with density parameter [16], and the original density method [17]. All of these approaches originate from K-means, so this paper focuses on using K-means for finding the centers of hidden neurons. A metaheuristic algorithm, Particle Swarm Optimization (PSO) based K-means, is therefore used to find the centers of the hidden neurons of IRBFNN to maximize the accuracy. PSO K-means [18] finds the centers through the exploration and exploitation of the particles and the movement of each particle towards the global best.
The main contributions of the paper are:
- PSO K-means is used to find the centers, with a fitness that maximizes the inter-cluster distance and minimizes the intra-cluster distance.
- The centers given by PSO K-means are used in the hidden neurons of IRBFNN.
- Experimentation of IRBFNN is done on 3 Parkinson's datasets.
- IRBFNN is compared with other variations of RBFNN in terms of accuracy, positive predictive value, negative predictive value, root mean square error, and F-score. These variations are RBFNN-3, where centers are found using Whale Optimization Algorithm (WOA) K-means; RBFNN-2, where centers are determined using Sine Cosine Algorithm (SCA) K-means; RBFNN-1, in which centers are calculated using Genetic Algorithm (GA) based K-means; and RBFNN, where centers are found using K-means.
- IRBFNN is also compared with conventional machine learning algorithms such as K-means, Random Forest, Decision Tree, and Support Vector Machine.
- Mean, best, and worst fitness values are compared for the proposed PSO K-means, WOA K-means, SCA K-means, and GA K-means, which are used in IRBFNN, RBFNN-3, RBFNN-2, and RBFNN-1, respectively.
- IRBFNN is also compared with other machine learning algorithms used for PD prediction.
- Experimentation is carried out 30 times, and the mean value is taken for performance analysis.
The rest of the paper is organized as follows: Section 2 describes the related study on applying different machine learning algorithms for predicting Parkinson's disease. Section 3 details the proposed system for the prediction of Parkinson's disease. Section 4 details IRBFNN along with the centers determined using PSO K-means. Section 5 details the experimental results obtained by comparing the proposed Improved Radial Basis Function Neural Network (IRBFNN, i.e., RBFNN + PSO + K-means) with other machine learning algorithms using the Parkinson's datasets taken from the UCI repository [19]. Section 6 concludes the work along with the future scope.

Background and Related Works
Freezing of Gait (FoG) in Parkinson's disease was predicted using a FoG prediction algorithm, which considers various metrics such as sensor positions, sensor axis, and sampling window length [20]. Multisource ensemble learning together with a Convolutional Neural Network (CNN) was used to detect Parkinson's disease [21]. A Cascaded Multi-Column Random Vector Functional Link (RVFL) network was used for diagnosing PD. The dataset used was taken from PPMI, and the model produced an accuracy of 81.93% [22].
A FoG prediction model using AdaBoost was designed using impaired gait features. In order to correctly identify gait, a pre-FoG phase was used based on the slope of the impaired gait pattern [23]. Positive Transfer Learning (PTL) was used to detect PD. An At-Home Testing Device (AHTD) measures the symptoms of PD, which are then converted into UPDRS measurements [24].
FoG prediction was made using Electroencephalography (EEG) features, which were determined using Fourier and wavelet analysis of data gathered from 16 patients [25]. A conventional RBFNN was used to predict PD using the data generated by electrodes implanted in the deep brain of a patient [26]. Random forest together with minimum redundancy and maximum relevance was used to predict PD using a dataset of voice measurements from 31 people [27]. A dataset of 263 samples from the National Centre for Voice and Speech (NCVS) was used to create a model using support vector machine and random forest to maximize accuracy while classifying PD [28]. A Joint Regression and Classification Framework was designed for diagnosing PD using the Parkinson's Progression Marker Initiative (PPMI) dataset [29].
From the literature, it is observed that several approaches exist for the prediction of Parkinson's disease. Particle swarm optimization algorithms have also been widely used to find the number of neurons, their centers, and the weights of RBFNN, and these methods have been applied to various real-world problems. With the goal of further maximizing the accuracy, in this paper PSO with K-means is designed to find the optimal centers for the RBFNN structure, and the proposed approach is used for the prediction of Parkinson's disease.

Proposed Parkinson's Disease Prediction System
The system design for the proposed prediction of Parkinson's disease is shown in Fig. 1. The sensors embedded in the elderly patient gather the patient's health data, and the data are stored in the data store. The data in the data store are split into a training dataset and a test dataset. The training dataset is given as input to the preprocessor, where normalization takes place. The normalized data are provided as input to the predictor, which is designed using IRBFNN. The IRBFNN is trained with the training dataset, and the model is tested with the test dataset. The improved radial basis function neural network transforms each input vector $X_i \in X$, represented in Eq. (1), into a form that can be fed into the network to achieve linear separability.

$$X_i = \{A_{i1}, A_{i2}, \ldots, A_{i|d|}\} \tag{1}$$

where $A_{ij}$ represents the $j$th feature of the $i$th instance and $|d|$ represents the number of features.

3104 CMC, 2021, vol.68, no.3

Preprocessor
The preprocessor normalizes the data to the range [0, 1]. Normalization of the attributes, represented in Eq. (2), is essential for efficient training of the predictor.
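Eq. (2) is not reproduced in this text. As an illustration only, the sketch below assumes min-max scaling, which is a common way of mapping each attribute to [0, 1]; the function name `min_max_normalize` and the handling of constant features are assumptions, not details taken from the paper.

```python
def min_max_normalize(rows):
    """Scale every feature (column) of `rows` to the range [0, 1].

    Assumption: min-max scaling is used; the paper's Eq. (2) may differ.
    """
    n_features = len(rows[0])
    col_min = [min(row[j] for row in rows) for j in range(n_features)]
    col_max = [max(row[j] for row in rows) for j in range(n_features)]
    normalized = []
    for row in rows:
        scaled = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # A constant feature has zero span; map it to 0.0 instead of
            # dividing by zero (an illustrative choice).
            scaled.append((value - col_min[j]) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Toy data: the second feature is constant.
data = [[2.0, 10.0], [4.0, 10.0], [6.0, 10.0]]
normalized = min_max_normalize(data)
```

In practice the minimum and maximum would typically be computed on the training split and reused for the test split, so that no information from the test data enters the preprocessing.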

IRBFNN
IRBFNN classifies the input sample by sending each input vector $X_i$ to each RBF neuron in the hidden layer. Each RBF neuron in the hidden layer is a prototype that compares the input instance $X_i$ with the mean centroid vector $\mu_u$ of the hidden neuron. The radial basis function plays a crucial role in classifying the instances accurately.
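The paper does not state the form of the radial basis function in this section. The following minimal sketch assumes the widely used Gaussian kernel, in which the response of hidden neuron $u$ decays with the squared distance between $X_i$ and its center $\mu_u$; the names `rbf_activation` and `hidden_layer` are hypothetical.

```python
import math


def rbf_activation(x, center, variance):
    """Gaussian response of one hidden neuron (assumed kernel form)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * variance))


def hidden_layer(x, centers, variances):
    """Responses of all hidden neurons to a single input vector."""
    return [rbf_activation(x, c, v) for c, v in zip(centers, variances)]


# Two hypothetical hidden neurons in a 2-feature space.
centers = [[0.0, 0.0], [1.0, 1.0]]
variances = [0.5, 0.5]
phi = hidden_layer([0.0, 0.0], centers, variances)
```

An input that coincides with a center produces the maximum response of 1, and the response falls off as the input moves away, which is why poorly placed centers degrade the classifier.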

Initialization of Neuron
The number of neurons in the input layer is initialized to the number of dimensions of the dataset. Let $I \leftarrow \{in_1, in_2, \ldots, in_{|d|}\}$ represent the set of neurons in the input layer. The number of neurons in the hidden layer is determined using Eq. (3), as specified in [16],
where $H$ represents the set of hidden neurons, specified as $H \leftarrow \{h_1, h_2, \ldots, h_L\}$, and $S$ represents the set of neurons in the summation layer. Typically, the number of neurons in the summation layer is equal to the number of target class labels $|CL|$ in the dataset. The set of neurons in the summation layer is given as $S \leftarrow \{s_1, s_2, \ldots, s_{|CL|}\}$. Finally, the number of neurons in the output layer is 1, and it is represented as $O_1$.

Construction of Radial Basis Function Using PSO K-means
Each RBF neuron is designed using PSO K-means. The RBF neuron prototype plays a prominent role in the optimal allocation of a class label to each instance, which in turn maximizes accuracy. Thus, it is necessary to choose a good prototype. The metaheuristic clustering is used as an RBF neuron prototype, where each instance is trained for the optimal class assignment. PSO is evaluated against metrics such as intra-cluster distance and inter-cluster distance.

Computing Variance
Once the centers of the hidden neurons have been computed using PSO K-means, the next step is to compute the variance of each hidden neuron using Eq. (4).
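Eq. (4) is not reproduced in this text. One common choice, assumed here purely for illustration, is to set the variance of a hidden neuron to the mean squared distance between its center and the instances assigned to it. The helper names and the small `floor` value that guards against zero variance are hypothetical.

```python
def assign_to_nearest(rows, centers):
    """Index of the nearest center for every instance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(range(len(centers)), key=lambda u: sq_dist(row, centers[u]))
            for row in rows]


def neuron_variances(rows, centers, floor=1e-6):
    """Mean squared distance of assigned instances to each center.

    Assumption: this definition stands in for the paper's Eq. (4).
    """
    labels = assign_to_nearest(rows, centers)
    variances = []
    for u, center in enumerate(centers):
        members = [row for row, lab in zip(rows, labels) if lab == u]
        if not members:
            variances.append(floor)  # empty cluster: fall back to the floor
            continue
        mean_sq = sum(
            sum((a - c) ** 2 for a, c in zip(row, center)) for row in members
        ) / len(members)
        variances.append(max(mean_sq, floor))
    return variances


rows = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
centers = [[0.0, 1.0], [10.0, 10.0]]
variances = neuron_variances(rows, centers)
```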

Computing Weight
The initial weights between the hidden neurons and the summation-layer neurons are assigned by the pseudo-inverse method represented in Eq. (5). The weight between the $j$th hidden neuron and the $k$th summation neuron is denoted $w_{jk}$. If the error rate specified in Eq. (7) has not converged at a given iteration, the weight vector is updated using the gradient descent method as defined in Eq. (9).
The error for the $i$th instance is specified using Eq. (6),
where $y_i$ is the actual output, $f(X_i)$ is the predicted output of the $i$th instance, and $|X|$ represents the number of instances. Eq. (8) gives the change in the weight vector.
When the error has converged, the IRBFNN maximizes the accuracy in predicting Parkinson's disease for the test dataset.
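The weight-refinement loop described above can be sketched as follows. Eqs. (5) to (9) are not reproduced in this text, so the sketch makes several labeled assumptions: the weights start from zeros rather than from the pseudo-inverse solution, the error is the root mean square of $y_i - f(X_i)$, and the update is the standard delta rule with a hypothetical learning rate `lr`.

```python
def predict(phi, weights):
    """Summation-layer outputs: f_k = sum_j phi_j * w_jk."""
    n_out = len(weights[0])
    return [sum(phi[j] * weights[j][k] for j in range(len(phi)))
            for k in range(n_out)]


def rmse(phis, targets, weights):
    """Root mean square error over all instances and outputs."""
    total, count = 0.0, 0
    for phi, y in zip(phis, targets):
        out = predict(phi, weights)
        total += sum((yk - fk) ** 2 for yk, fk in zip(y, out))
        count += len(y)
    return (total / count) ** 0.5


def train(phis, targets, weights, lr=0.1, tol=1e-3, max_epochs=5000):
    """Gradient descent on w_jk until the error falls below `tol`."""
    for _ in range(max_epochs):
        if rmse(phis, targets, weights) < tol:
            break
        for phi, y in zip(phis, targets):
            out = predict(phi, weights)
            for j in range(len(phi)):
                for k in range(len(y)):
                    # Delta rule (assumed form of the weight change).
                    weights[j][k] += lr * (y[k] - out[k]) * phi[j]
    return weights


# Hidden-layer responses for two instances, with one-hot class targets.
phis = [[1.0, 0.1], [0.1, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 0.0], [0.0, 0.0]]  # zeros replace the pseudo-inverse init
before = rmse(phis, targets, weights)
train(phis, targets, weights)
after = rmse(phis, targets, weights)
```

Starting from the pseudo-inverse solution, as the paper does, would normally leave far less work for the gradient descent stage than the zero initialization used in this sketch.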

Implementation
The radial basis function is designed using particle swarm optimization-based K-means. Algorithm 1 illustrates the working procedure of IRBFNN. Section 4 details the computation of the radial basis function for IRBFNN. The combined fitness function for the particle swarm optimization is represented in Eq. (10). The fitness function is maximized by maximizing the inter-cluster distance and minimizing the intra-cluster distance.
In Eq. (10), the fitness combines the intra-cluster distance with the inter-cluster distance δ. In every iteration, each particle evaluates its fitness using Eq. (10). The intra-cluster distance of cluster $i$ is the distance between any two instances within the same cluster $i$, as represented in Eq. (11). A low intra-cluster distance means that the cluster is compact.
The inter-cluster distance $\delta_{i,j}$ is measured as the distance between the centroids of clusters $i$ and $j$, as represented in Eq. (12). A higher inter-cluster distance means that the clusters are well separated.
Here $C_i$ and $C_j$ represent the centroids of the $i$th and $j$th clusters, respectively, computed as shown in Eq. (13).
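The two distance measures can be illustrated with a short sketch. Eqs. (10) to (13) are not reproduced in this text, so every concrete choice below is an assumption: Euclidean distance, the centroid as the mean of the cluster members, the intra-cluster distance as the mean pairwise distance within a cluster, and a combined fitness formed as the ratio of average inter-cluster to average intra-cluster distance.

```python
import math


def centroid(cluster):
    """Mean vector of the instances in one cluster (assumed Eq. (13))."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def intra_cluster_distance(cluster):
    """Mean pairwise Euclidean distance inside one cluster."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def inter_cluster_distance(cluster_i, cluster_j):
    """Euclidean distance between the centroids of two clusters."""
    return math.dist(centroid(cluster_i), centroid(cluster_j))


def fitness(clusters, eps=1e-12):
    """Assumed combined fitness: larger when clusters are compact and far apart."""
    intra = sum(intra_cluster_distance(c) for c in clusters) / len(clusters)
    inters = [inter_cluster_distance(a, b)
              for i, a in enumerate(clusters) for b in clusters[i + 1:]]
    inter = sum(inters) / len(inters)
    return inter / (intra + eps)


# A compact, well-separated partition versus a poor one of the same points.
tight = [[[0.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [10.0, 1.0]]]
loose = [[[0.0, 0.0], [10.0, 1.0]], [[10.0, 0.0], [0.0, 1.0]]]
```

Under these assumptions the compact partition `tight` scores far higher than `loose`, which matches the stated objective of maximizing the inter-cluster distance while minimizing the intra-cluster distance.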
The PSO K-means procedure for finding the optimal centers of the hidden neurons is presented in Algorithm 2. The GBestPos of the swarm contains the cluster centers that are used by the hidden neurons in IRBFNN.
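Algorithm 2 is not reproduced in this text. The sketch below is one plausible reading of a PSO K-means hybrid and should not be taken as the authors' exact procedure. Assumptions: each particle encodes $k \geq 2$ candidate centers, the velocity update is the standard PSO rule with hypothetical parameters `w`, `c1`, `c2`, one K-means update is applied after each move, and the fitness is a ratio of the smallest centroid separation to the mean point-to-center distance.

```python
import math
import random


def assign(rows, centers):
    """Label each instance with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda u: math.dist(r, centers[u]))
            for r in rows]


def fitness(rows, centers, eps=1e-12):
    """Assumed fitness (to maximize): separation divided by compactness."""
    labels = assign(rows, centers)
    if len(set(labels)) < len(centers):
        return 0.0  # a center without members is treated as invalid
    intra = sum(math.dist(r, centers[l]) for r, l in zip(rows, labels)) / len(rows)
    inter = min(math.dist(a, b)
                for i, a in enumerate(centers) for b in centers[i + 1:])
    return inter / (intra + eps)


def kmeans_step(rows, centers):
    """One K-means update: move each center to the mean of its members."""
    labels = assign(rows, centers)
    updated = []
    for u, center in enumerate(centers):
        members = [r for r, l in zip(rows, labels) if l == u]
        updated.append([sum(col) / len(members) for col in zip(*members)]
                       if members else list(center))
    return updated


def pso_kmeans(rows, k, n_particles=10, t_max=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (gbest_pos, gbest_fit); gbest_pos holds the k hidden-neuron centers."""
    rng = random.Random(seed)
    dim = len(rows[0])
    # Each particle starts from k distinct instances drawn at random.
    swarm = [[list(r) for r in rng.sample(rows, k)] for _ in range(n_particles)]
    velocity = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in swarm]
    pbest_fit = [fitness(rows, p) for p in swarm]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest_pos, gbest_fit = [list(c) for c in pbest[best]], pbest_fit[best]
    for _ in range(t_max):
        for i in range(n_particles):
            for u in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    velocity[i][u][d] = (
                        w * velocity[i][u][d]
                        + c1 * r1 * (pbest[i][u][d] - swarm[i][u][d])
                        + c2 * r2 * (gbest_pos[u][d] - swarm[i][u][d]))
                    swarm[i][u][d] += velocity[i][u][d]
            swarm[i] = kmeans_step(rows, swarm[i])  # local K-means refinement
            fit = fitness(rows, swarm[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in swarm[i]], fit
            if fit > gbest_fit:
                gbest_pos, gbest_fit = [list(c) for c in swarm[i]], fit
    return gbest_pos, gbest_fit


# Toy data: two well-separated groups of four points each.
rows = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
gbest_pos, gbest_fit = pso_kmeans(rows, k=2)
```

Because several particles start from different random instances and share the global best, the result depends less on any single initial centroid than plain K-means does, which is the motivation the paper gives for the hybrid.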

Experimental Results
The proposed IRBFNN was implemented in Python, and its performance was measured using three Parkinson's disease datasets taken from the UCI repository [19]. PSO K-means is used to find the centers of the hidden neurons. The experiments were performed on an Intel® Core™ i5-4210U CPU @ 1.70 GHz with 4 GB RAM.

Dataset Description
In order to evaluate the efficiency of the proposed IRBFNN, several investigations were performed. The analysis was conducted on 3 benchmark Parkinson's datasets taken from the UCI repository: Dataset 1, the Parkinson's dataset; Dataset 2, the Parkinson's disease classification dataset; and Dataset 3, the Parkinson's speech dataset with multiple types of sound recordings. These datasets are widely used by researchers for classifying Parkinson's disease. Tab. 1 gives a detailed description of the datasets, including the number of instances, features, and classes.

Algorithms Used for Comparison
The metaheuristic algorithm PSO integrated with K-means, with the fitness defined in Eq. (10), is used to find the centers of the hidden-layer neurons for IRBFNN. For the experiments, each dataset is divided in an 80:20 ratio, i.e., 80% of the data are used for training and 20% for testing. The experimentation is repeated 30 times, and the average value is taken for analyzing the efficiency of IRBFNN. The variations of the radial basis function used for comparison with the proposed PSO K-means are Whale Optimization Algorithm (WOA) K-means [30], Sine Cosine Algorithm (SCA) K-means [31], Genetic Algorithm K-means [32], and K-means [33].
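The evaluation protocol above can be sketched as a small harness. The paper does not say whether a fresh random split is drawn in each of the 30 runs; the sketch assumes that it is. The function names are hypothetical, and `majority_class` is only a placeholder standing in for the actual predictor.

```python
import random
import statistics


def split_80_20(rows, labels, rng):
    """Shuffle the indices and split them 80:20 into train and test sets."""
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def repeated_evaluation(rows, labels, fit_predict, runs=30, seed=0):
    """Mean test accuracy over `runs` random splits (assumed protocol)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        x_tr, y_tr, x_te, y_te = split_80_20(rows, labels, rng)
        predicted = fit_predict(x_tr, y_tr, x_te)
        correct = sum(p == y for p, y in zip(predicted, y_te))
        scores.append(correct / len(y_te))
    return statistics.mean(scores), scores


def majority_class(x_train, y_train, x_test):
    """Placeholder predictor: always outputs the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(x_test)


# Synthetic data for illustration only: 40 positive and 10 negative instances.
rows = [[float(i)] for i in range(50)]
labels = [1] * 40 + [0] * 10
mean_acc, scores = repeated_evaluation(rows, labels, majority_class)
```

Any classifier with the same `fit_predict(x_train, y_train, x_test)` signature could be substituted for the placeholder.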

Results
The results acquired for IRBFNN are elaborated in this section. The proposed IRBFNN is compared to assess the outcome of using PSO K-means as the radial basis function instead of K-means in RBFNN, GA K-means in RBFNN-1, SCA K-means in RBFNN-2, and WOA K-means in RBFNN-3. The metrics used to evaluate the proposed mechanism include:
Accuracy: Accuracy is defined as the ratio of the correct predictions made by the classifier to the total number of instances, as represented in Eq. (14).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
F-score: The F-score is the harmonic mean of precision and recall, and it accounts for the instances incorrectly classified by the classifier, as specified in Eq. (15).

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$

Recall: Recall is the ratio of correctly identified positive instances to the total number of positive instances, as specified in Eq. (16).

$$\text{Recall} = \frac{TP}{TP + FN} \tag{16}$$
Fitness: The mean, best and worst fitness values of the radial basis functions are evaluated.
Execution Time: It is defined as the time taken to execute the algorithm and produce the desired outcome of 0 (the person is healthy) or 1 (the person suffers from Parkinson's disease). This also includes the time taken by the radial basis function to find the centers of the hidden neurons.
Fig. 2 shows that the average accuracy of the proposed IRBFNN is higher than that of the other RBFNN variants for all three Parkinson's datasets. The reason is that the radial basis function of WOA K-means in RBFNN-3 does not explore the search space efficiently [30]. Also, SCA K-means in RBFNN-2 uses many random parameters, which degrades its searching ability [31]. GA K-means in RBFNN-1 suffers from premature convergence, and thus the centers of the hidden neurons are not optimal enough to increase the classifier accuracy. In RBFNN, K-means is used to find the centers of the hidden neurons; as K-means is sensitive to the initial centroids, the centers are not optimal for increasing the classifier's accuracy. For dataset 1, IRBFNN achieves 4.9%, 12.04%, 14.71% and 17.3% greater accuracy than RBFNN-3, RBFNN-2, RBFNN-1 and RBFNN, respectively. Similarly, for dataset 2, IRBFNN achieves 1.6%, 24.67%, 34.09% and 38.12% greater accuracy than RBFNN-3, RBFNN-2, RBFNN-1 and RBFNN, respectively. For dataset 3, IRBFNN improves the accuracy by 4.85%, 7.8%, 9.99%, and 20.00% over RBFNN-3, RBFNN-2, RBFNN-1, and RBFNN, respectively.
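The classification metrics used in this section (accuracy, positive predictive value, negative predictive value, recall, F-score, and RMSE) can all be derived from the confusion counts of a binary classifier. The sketch below uses the standard definitions with labels 1 for Parkinson's disease and 0 for healthy; the function name `evaluate` and the toy label vectors are illustrative and are not results from the paper.

```python
import math


def evaluate(y_true, y_pred):
    """Standard binary metrics from true and predicted labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when the denominator is empty (illustrative convention).
        return num / den if den else 0.0

    ppv = ratio(tp, tp + fp)        # positive predictive value (precision)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ppv,
        "npv": ratio(tn, tn + fn),  # negative predictive value
        "recall": recall,
        "f_score": ratio(2 * ppv * recall, ppv + recall),
        "rmse": math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs)),
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
```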

Figure 2: Comparison of accuracy of IRBFNN with variants of RBFNN
The next experiment measures the positive predictive value (PPV), which is shown in Tab. 4. IRBFNN is superior to all other variants of RBFNN for dataset 1 and dataset 3. RBFNN-3 has a 6.9% higher PPV than IRBFNN for dataset 2. IRBFNN thus ranks first in PPV for all datasets except dataset 2, for which it ranks second. These experimental results indicate that the proposed activation function PSO K-means is good at finding the centers of hidden neurons and justify integrating PSO K-means into the original RBFNN. Tab. 5 compares IRBFNN, RBFNN-3, RBFNN-2, RBFNN-1 and RBFNN based on the correct prediction of negative instances as negative, using the metric NPV. Tab. 5 also shows that IRBFNN surpasses the other algorithms for all datasets. IRBFNN improves NPV by 0.11% and 12.06% over RBFNN-3 for dataset 3 and dataset 2, respectively. The reason behind the success of IRBFNN is that the activation function PSO K-means carefully searches the solution space without becoming trapped in local optima when producing the centers of hidden neurons. Tab. 6 shows the recall obtained by applying IRBFNN and the other variants of RBFNN to all three Parkinson's datasets. It also gives clear evidence that IRBFNN performs better than the other algorithms. A good classifier should have high recall, and IRBFNN does. IRBFNN improves recall by 3.34%, 8.6%, 9.6% and 11.68% over RBFNN-3, RBFNN-2, RBFNN-1 and RBFNN, respectively, for dataset 1. For the large-scale dataset 2, the recall of IRBFNN is 8.09% higher than that of RBFNN-3. Fig. 3 presents the root mean square error (RMSE) of IRBFNN and the other variants of RBFNN, and gives evidence that IRBFNN achieves the minimum RMSE for all datasets. For dataset 3, the RMSE of IRBFNN is 0.09829, which is less than that of RBFNN, whose RMSE is 0.56.
In other words, the RMSE of IRBFNN is 33.195%, 45.52%, 50.17%, and 58.91% less than that of RBFNN-3, RBFNN-2, RBFNN-1, and RBFNN, respectively, for dataset 3. Fig. 4 shows that IRBFNN predicts the instances more accurately than the other variants of RBFNN. As shown in Fig. 5, the radial basis function PSO K-means has better fitness than the other RBFs, namely WOA K-means, SCA K-means, and GA K-means. IRBFNN, RBFNN-3, and RBFNN-2 have nearly the same worst fitness value, but IRBFNN achieves 3.34% and 8.69% lower RMSE than RBFNN-3 and RBFNN-2, respectively. For dataset 3, IRBFNN has a poorer worst fitness than RBFNN-1, but the mean fitness of IRBFNN is superior, which gives evidence that IRBFNN is better than all the other classifiers mentioned. Fig. 5 represents the fitness value specified in Eq. (10) for the datasets described in Tab. 1.

Comparison of IRBFNN with other Parkinson's Disease Prediction Methods
The accuracy of IRBFNN is compared with that of other existing Parkinson's disease prediction algorithms. For dataset 1, the accuracies are: neural network [34] 0.9290, SVM with recursive feature selection [35] 0.9384, Fuzzy K-NN [36] 0.9579, PSO FKNN [37] 0.9747, and IRBFNN 0.9874. For dataset 2, the accuracies are: CNN [38] 0.8690, MAMASVD + K-NN [39] 0.9200, OPS + K-NN [40] 0.9841, SLGS [41] 0.8871 for males and 0.8715 for females, and IRBFNN 0.9847. For dataset 3, the accuracies are: LOSO + K-NN [42] 0.8250, FLR [43] 1.0000, and IRBFNN 0.9903. For dataset 3, IRBFNN ranks second while FLR ranks first. In other words, FLR improves accuracy by 0.0962% over IRBFNN. The experimental results show that IRBFNN is superior to the other methods for dataset 1 and dataset 2.

Analyzing Computational Time
The computational complexity of the proposed IRBFNN is expressed using Big-O notation. The time taken for each process is specified in Tab. 7. Tab. 8 shows the average execution time taken by the algorithms. From Tab. 8 it can be deduced that, for all the datasets, the time taken by IRBFNN is lower than that of the other algorithms. Tab. 8 also shows that the algorithm spends most of its time computing the centers of the hidden neurons. Thus, the computational complexity of IRBFNN is $O(T_{max}\,|P|\,|d|\,|X|)$.

Comparison of IRBFNN with other Machine Learning Algorithms
From Tab. 9, it is evident that for all the datasets, IRBFNN achieves higher accuracy than the traditional algorithms. For dataset 1, the accuracy of IRBFNN is 9.382%, 14.621%, 10.266%, and 18.875% higher than that of SVM, Random Forest, Decision Tree, and K-means, respectively. For dataset 2, IRBFNN improves accuracy by 17.387% over Decision Tree. Similarly, for dataset 3, IRBFNN improves accuracy by 17.08% over SVM. However, the time taken by IRBFNN is greater than that of the traditional machine learning algorithms, which shows that there is a trade-off between accuracy and time taken. Tab. 10 shows the time taken by the various algorithms for predicting Parkinson's disease.
The inferences drawn from the experimental results are:
• The improved radial basis function neural network maximizes accuracy while minimizing root mean square error.
• The use of PSO K-means, with a fitness that maximizes the inter-cluster distance and minimizes the intra-cluster distance, finds optimal cluster centers, which are used in the hidden neurons of IRBFNN.
• The experiments measuring positive predictive value and negative predictive value also indicate that introducing the PSO K-means radial basis function improves the identification of positive and negative instances.
• The execution time of the proposed IRBFNN is higher than that of conventional machine learning algorithms, but this comes with an increase in accuracy.
• The introduction of PSO K-means improves the accuracy of IRBFNN by 3.83%, 14.85%, 19.57% and 25.15% over RBFNN-3, RBFNN-2, RBFNN-1 and RBFNN, respectively.

Conclusion
Through rigorous analysis, it has been shown that IRBFNN was designed and tested successfully for predicting Parkinson's disease. The proposed network also shows that finding an efficient radial basis function is essential for accurate prediction. The presented IRBFNN addresses the problem of predicting Parkinson's disease by efficiently finding the centers of hidden neurons for designing its radial basis function. To obtain good performance, metaheuristic algorithms are used to find optimal values of these parameters, which minimizes error and maximizes accuracy. The performance of PSO K-means is compared with other metaheuristic ways of finding centers when designing a radial basis function neural network. The proposed IRBFNN shows that PSO K-means chooses optimal centers through a good level of exploration and exploitation and avoids getting stuck in local optima when predicting Parkinson's disease.
The key findings of the paper are:
• The problem of finding the centers of the hidden neurons is solved by using PSO with K-means, which maximizes the accuracy of the presented IRBFNN.
• The integration of PSO with K-means diminishes the problems caused by the random initial centroids of conventional K-means through a good level of exploration and exploitation.
• The fitness value of PSO takes into account the intra-cluster distance and the inter-cluster distance, which produces optimal cluster centers.
The use of PSO K-means in finding the centers of the hidden neurons maximizes the accuracy, F-score, positive predictive value, negative predictive value, and recall, and minimizes the root mean square error. In future work, a novel feature selection algorithm will be integrated before the analytics process to further enhance the accuracy of prediction.