|Computers, Materials & Continua |
Enhancing Parkinson's Disease Prediction Using Machine Learning and Feature Selection Methods
1College of Computer Science and Engineering, Taibah University, Medina, 41477, Saudi Arabia
2School of Computing and Digital Technology, Birmingham City University, Birmingham, B47XG, United Kingdom
3Information System Department, Saba'a Region University, Mareeb, Yemen
4Computers and Systems Engineering Department, Al-Azhar University, Cairo, 11884, Egypt
5RIADI Laboratory, National School of Computer Sciences, University of Manouba, Manouba, 2010, Tunisia
*Corresponding Author: Faisal Saeed. Email: email@example.com
Received: 29 August 2021; Accepted: 19 November 2021
Abstract: Several millions of people suffer from Parkinson's disease globally. Parkinson's affects about 1% of people over 60 and its symptoms increase with age. The voice may be affected and patients experience abnormalities in speech that might not be noticed by listeners, but which could be analyzed using recorded speech signals. With the huge advancements of technology, the medical data has increased dramatically, and therefore, there is a need to apply data mining and machine learning methods to extract new knowledge from this data. Several classification methods were used to analyze medical data sets and diagnostic problems, such as Parkinson's Disease (PD). In addition, to improve the performance of classification, feature selection methods have been extensively used in many fields. This paper aims to propose a comprehensive approach to enhance the prediction of PD using several machine learning methods with different feature selection methods such as filter-based and wrapper-based. The dataset includes 240 recodes with 46 acoustic features extracted from 3 voice recording replications for 80 patients. The experimental results showed improvements when wrapper-based features selection method was used with K-NN classifier with accuracy of 88.33%. The best obtained results were compared with other studies and it was found that this study provides comparable and superior results.
Keywords: Filter-based feature selection methods; machine learning; parkinson's disease; wrapper-based feature selection methods
Parkinson's disease (PD) is a long term degenerative disorder of the central nervous system which causes both motor and non-motor symptoms . The exact causes of PD are unknown and unclear, but it is supposed to include risk factors which are both genetic and environmental. More than 10% of patients with PD have a first-degree relative with PD disease. In addition, PD is more prevalent between people who are disclosed to some pesticides and the people with past history of head injury, while PD risk is lower for patients who smoke . PD mainly affects neurons in a certain region of the mid brain that is known as substantia nigra, dopamine-producing brain cells, which leads to inadequate dopamine secretion in this region .
In the early stage of the PD, the main symptoms are shaking, difficulty with walking and slowness of movement. The common symptoms with late phase of PD are anxiety, dementia and depression. Moreover, emotional problems, sleep and sensory symptoms may also occur [4,5], in addition to Parkinsonian syndrome . These symptoms are mainly used to diagnose typical PD, in addition to examinations such as neuroimaging. There is no total recovery for PD, however treatment aims to improve the symptoms [7,8]. The medical decision support systems (MDSS) have increasingly used a significant diagnosis and treatment method that uses artificial intelligence (AI) methods on a clinical dataset to assist clinicians to make better decisions [9,10]. Recent improvements in machine learning, AI and statistical learning have improved decision support system (DSS), which has helped to introduce intelligent decision systems [10,11]. Some studies reported that the artificial intelligence cannot be effective without learning . There are many types of machine learning methods such as Support Vector Machine (SVM), Naïve Bayes (NB), K-nearest Neighbor (KNN), Multilayer Perceptron, Decision Tree (DT) and Random Forests (RF) that have been used to solve medical decision problems.
There is a significant overlapping between ML and data mining which often use the same procedures, but whereas ML concentrates on prediction, based on previously definite properties learned from the training data, data mining concentrates on the detection of unknown properties in the clinical data. The machine learning (ML) techniques have a significant role to play in the medical disease diagnosis field and are widely used in bioinformatics [13,14].
Recently, the variety of medical data is continuously increasing, therefore, effective classification and prediction algorithms are required. The previous studies on machine learning research reported that the accuracy of a classification algorithm can be influenced by many agents . ML algorithms are used to analyze medical data sets and diagnostic problems . Subsequently, improvement of medical decisions, treatments, and decrease financial costs will occur [14,16].
In addition, feature selection plays an important role in the explanation of medical data. Feature selection technique constitutes a significant issue of global combinatorial optimization in machine learning, which is used to decrease the number of features from the original features, removes irrelevant or redundant features without incurring much loss of information, as well as simplification of models to make them easier to interpret and shortening training times . Therefore, a good feature selection method is required to accelerate processing time and predictive accuracy. There are three types of feature selection algorithms, which are: filter (extract features from the data set without any learning), wrapper (use learning techniques to estimate useful features) and hybrid (gather the feature selection step and the classifier construction) [18,19].
Recently, the medical field is the most favorable field to use machine learning methods. Therefore, Naïve Bayes (NB), Support Vector Machine (SVM), K-nearest Neighbor, Multilayer Perceptron and Random Forests as well as feature selection methods have been suggested to solve medical decision problems, such as the prediction of Parkinson's disease. In this paper, the main contributions in the domain of prediction of Parkinson's Disease can be summarized as follows.
1. A comprehensive approach was used to investigate the performance of several feature selection methods and machine learning methods in order to enhance the prediction of PD.
2. These feature selection methods include both filter-based methods such as (Information gain IG, Principle Component Analysis PCA) and wrapper methods that include different search methods such as First Best Greedy Stepwise PSO Method.
3. A comparative analysis was conducted to examine the performances of all methods/combinations used and the best prediction results were reported.
This paper is organized as follows: Section 2: the related works. Section 3: discussion of the methods. Section 4: experimental results and discussion. Section 5: conclusions and future works.
2 Related Studies
Several works have investigated the diagnosis of PD, in which many machine learning methods were applied such as Support Vector Machine, neural network, Naïve Bayes, K-nearest neighbor and Random Forests. In this paper, several datasets were used to search for related studies on Parkinson's disease, including Scopus, IEEE Xplore, Science Direct and Google Scholar.
In  a supervised ML method was proposed that combined the Principal Components Analysis (PCA) to extract features and SVM as classification method to identify PD patients. The main goal of this method was to determine patients that will be diagnosed with PD or with Progressive Supranuclear Palsy (PSP). The experiments were conducted on data of several patients with clinical and demographic features. The results depicted good accuracy of the proposed method in identifying the PD patients compared to existing related works.
In addition, the authors in  proposed an expert system of PD using features extracted from recordings of patients’ voice. They developed a Bayesian classification approach to deal with the dependence to match the replication-based experimental design. The experiments were performed on voice recordings involving 80 subjects, 50% of them had PD. The aim was to identify which subjects had no PD and which did have the disease. Naranjo et al. addressed the problem of identifying PD patients using the extracted acoustic features from repeated voice recordings. The proposed method was based on two steps, namely variable selection, and classification. The first step aims to reduce the number of features, while the next step uses a regularization method named LASSO (Least Absolute Shrinkage and Selection Operator) as a classifier. The proposed method was tested on the previously described database and showed a good capacity for PD discrimination.
In addition, the authors in  addressed the problem of PD diagnosis by developing an approach that investigated gait and tremor features that were extracted from the voice reordering data. They started by filtering data to remove noises, then, using this data to extract gait features they detected the peak and measured the pulse duration. The average accuracy obtained for the identifying PD patients by the proposed approach was satisfactory.
The authors in  proposed a method to automatically detect PD by using the convolutional neural network (CNN). The authors suggested considering electroencephalogram (EEG) signals to build a thirteen-layer CNN model. The proposed approach experimented with EEG signals of 20 Parkinson's disease patients (50% men and 50% women). The CNN method obtained interesting results to identify PD patients; however, its performance should be evaluated using a large population.
Recently, Mostafa et al.  tried to enhance the diagnoses of PD by using several methods of feature evaluation and classification. They used a multi-agent system to evaluate multiple features by using five classification methods, namely DT, NB, NN, RF, and SVM. To evaluate the proposed method, they conducted several experiments using original and filtered datasets. The results depicted that this method enhanced the performance of ML methods used by finding the best set of features.
In addition, several methods were applied by [25–27] in order to predict Parkinson's disease. These methods applied several machine learning and feature selection methods to enhance the prediction of Parkinson's disease and other studies utilized machine learning and deep learning to improving prediction of diseases [28–38]. This paper extends these efforts by applying a comprehensive approach to investigate the performance of several machine learning with feature selection methods.
There are many feature selection techniques available, and we have considered the utilization of the following feature selection techniques: Filter-based technique, Correlation-based Feature Subset Selection (CfsSubsetEval), Principle Component Analysis (PCA), and Wrapper technique. The aforementioned techniques use different strategies or search algorithms to generate subsets and progress the search processes including (i) Best First (ii) Greedy Stepwise, (iii) Particle Swarm Optimization (PSO), and (vi) Ranker (see Fig. 1).
The dataset used in this paper is available online at UCI Machine Learning Repository . The dataset contains acoustic features of 80 patients, 50% of them suffering from Parkinson's disease. The data set has 240 recordings with 46 acoustic features extracted from 3 voice recording replications per patient. The data set is well-balanced by gender and class label (whether the patients have Parkinson's disease or not).
The experimental protocol was designed for evaluating the combination of the above techniques and search algorithms when they were used with the following classification models: (i) Naïve Bayes, (ii) Support Vector Machine (SVM)1, (iii) K-Nearest Neighbor (K-NN), (vi) Multi-Layer Perceptron (MLP) and (v) Random Forest (RF). The experiments were carried out on WEKA tool version 3.8 and MacBook Pro with OS X Yosemite version 10.10.5 as an operating system. To evaluate the performance of each classifier, we first ran feature selection in order to find the representative features and then we applied the classification models. Additionally, 10-fold cross validation was applied and the results have been reported in terms of Accuracy, Recall, Precision and F-score. Finally, we analyzed the results achieved from the experimentations. As stated earlier, the main goal of the research is to enhance the prediction of Parkinson's disease. However, this work also provides a useful guide to selecting the best feature selection technique for different classification models.
3.1 Feature Selection Techniques
Several feature selection techniques were applied before feeding the data into the classifier. The filter-based techniques consider the relevance between the features. Thus, they have low complexity, acceptable stability and scalability . A disadvantage of this type of technique is that it might ignore some informative features, especially when the data is coming in stream . The filter-based approaches can be either univariate or multivariate . The univariate methods examine features according to the statistically-based criterion such as Information Gain (IG) [42–44]. Multivariate methods compute feature dependency before ranking the feature. In addition, Principle Component Analysis (PCA) is a common statistical method that is used for data analysis. PCA reduces the size of the data sets by selecting a set of features that represents the whole data set. Since PCA is a conversion technique, the principal components of the first variables is the component with the highest variance value. Then, other principal components are ordered with descending variance values . In addition, the wrapper-based techniques evaluate the quality of the selected features using the performance of the learning classifier.
Regarding the search strategies, the search algorithms follows either sequential forward search (SFS), or sequential backward search (SBS). The SFS starts with a single feature and then iteratively adds or removes features until some terminating criterion is met whereas SBS starts with the whole feature set and then continues with adding and deleting operations. Since the SBS method attempts to find solutions ranged between suboptimal and near optimal regions , it is worth fully employing optimization techniques to figure out the subset that leads to maximizing the learner's performance, in particular, with the wrapper approach. At this end, the wrapper-based method can take advantage of various optimization methods such genetic algorithm [46,47] and ant colony optimization algorithm (ACO) .
3.2 Machine Learning Classifiers
In machine learning, the data classification is still an attractive domain. Lately, there are many proposed algorithms that have been examined in several domains such as NB, SVM, K-NN, MLP and RF, which are presented briefly in the next subsections.
3.2.1 Support Vector Machine
The basic idea behind SVM algorithm is to construct a hyperplane between groups of data. The quality of the hyperplane is evaluated by measuring to which degree it can maintain the largest distance from the points in either class . Therefore, as it is presented in Fig. 2, the higher the separation ability of the hyperplane, the lower the error in the value . The computational complexity of SVM is [50,51].
3.2.2 Naïve Bayes
Naïve Bayes (NB) is a probabilistic classifier that is based on Bayesian theorem. It is called Naïve because the classifier works on a strong features independence assumption. In literature, there are several variants of NB: simple Naïve Bayes, Gaussian Naïve Bayes, Multinomial Naïve Bayes, Bernoulli Naïve Bayes and Multi-variant Poisson Naïve Bayes in which the main different among them is the way the probability of the target class is computed. The time complexity of Naïve Bayes is where d is the query vector's dimension, and c is the total classes.
3.2.3 K-Nearest Neighbor
K-NN is a type of lazy learning, in which there is no explicit training phase and all computations are deferred until classification. It is a method of classifying data based on the nearest training data points in the feature space. The K-NN classifier uses the Euclidean distance measure, or another measure such as Euclidean squared, Manhattan, and Chebyshev, to estimate the target class. The performance of the classifier depends upon the parameter k, while the best value of k depends upon the dataset. In general, the greater the value of k, the lower the noises in the classification, but the boundaries between the classes become less distinct as shown in Fig. 3. The time complexity of K-NN is , where n is the number of training examples and m is the number of dimensions in the training set .
3.2.4 Multilayer Perceptron Model
The MLP is a classical feedforward neural network classifier in which the errors of the output are used to train the network . MLP consists of three layers of nodes: (i) input layer, (ii) at least one or more hidden layer(s), and (iii) output layer. The input layer is connected to the hidden layers which are connected to the output layer. All the layers are processed by weighted values. Fig. 4 represents a MLP with a single hidden layer. MLP is one-way error propagation where back-propagation techniques have been utilized to train and test these weight values. The time complexity of MLP is .
3.2.5 Random Forests
The Random Forests (RF) classifier is a type of ensemble method that combines multiple decision tree predictions. In RF, the trees are generated randomly by selecting attributes at each node. The output of the ensemble is tree votes with the most popular class. The pseudo-code of the Random Forest ensemble is presented in Tab. 1. The time complexity of Random Forest of size T and maximum depth D (excluding the root) is .
The random forest method is more robust to errors and outliers. Therefore, the problem of over-fitting is not faced. The accuracy of the model depends mainly on the strength of the base classifiers and measure of the dependence between them .
4 Experimental Results
The experiments were conducted such that 10-fold cross validation was applied for each classifier. The performance of each classifier was measured by the accuracy, precision, recall and F-score. Tabs. 2–12 show the experimental results of several machine learning methods both with and without different feature selection methods.
Tab. 2, shows the performance of all classifiers used before applying features selecting methods. The results showed Naïve Bayes obtained the best performance using all evaluation measures compared to the other classifiers. It obtained 82.92%, 83.30%, 82.90% and 82.90% for accuracy, precision, recall and F-score respectively.
The number of features was reduced using correlation based feature selection (CfsSubsetEval) method to 23, 17, 18 for the search methods of First Best, Greedy Stepwise and POS respectively, as shown in Fig. 5. The performance of with CfsSubsetEval combinations for each classifier is shown in Tab. 3. The results showed that no improvements were obtained by most of the combinations, except for RF with Greedy Stepwise and POS methods.
Tab. 4 showed the performance of classifiers used when features selection method based on information gain was applied. As shown in Fig. 5, the number of features was reduced to 10. The results showed that no improvements were reported on the performance of all classifiers after applying this feature selection method.
In addition, Tab. 5 shows the performance of all classifiers when features selection method based on PCA was applied. The results showed that only SVM methods obtained better performance after applying this features selection method. The number of features was reduced to 20 as shown in Fig. 5.
Tab. 6 summarizes the performance of filter based features selection methods. The results showed that feature selections with PCA obtained the best performance when SVM classifier was applied.
Tabs. 7–12 show the performance of wrapper-based features selection methods using different base classifiers. In each table, First Best, Greedy Stepwise and PSO search methods were applied.
Tab. 7 showed that, when Naïve Bayes was used as the base classifier for wrapper-based feature selection method, the performance of NB using PSO search method was enhanced to 0.854, 0.855, 0.854 and 0.854 for accuracy, precision, recall and F-score respectively. The performance of the other classifiers using this method was reduced.
Tab. 8 shows the performance of classifiers when the wrapper-based features selection method with c-SVM as the base classifier was applied. The results showed the enhancements obtained by all classifiers using all search methods. However, the best performance was obtained by SVM using First Best and Greedy Stepwise search methods.
However, Tab. 9 shows the performance of classifiers when wrapper-based features selection method with nu-SVM as the base classifier was applied. The results showed that the enhancements were obtained by applying c-SVM, K-NN and RF, especially when the POS search method was used.
In addition, Tab. 10 shows the performance of classifiers when wrapper-based features selection method with MLP as base classifier was applied. The results showed that the enhancements were obtained by applying MLP and RF for the three search methods. The best results were obtained using MLP classifier.
Moreover, Tab. 11 shows the performance of classifiers when wrapper-based features selection method with K-NN as base classifier was applied. The results showed that the enhancements were obtained by applying K-NN and RF for the First Best and POS search methods. The best results were obtained using K-NN classifier with accuracy, precision, recall and F-scores of 0.883, 0.884, 0.883 and 0.883 respectively.
Tab. 12 shows the performance of classifiers when wrapper-based features selection method with RF as base classifier was applied. The results showed that the enhancements were obtained by applying MLP and RF for the three search methods. The best results were obtained using RF classifier.
Tab. 13 shows a comparison of different wrapper-based features selection methods (using different base classifiers). The results showed that the best performing classifier was K-NN associated with the wrapper-based feature selection with KNN as base classifier, obtaining 88.33% accuracy. The number of features was reduced (with the best performance obtained) to 20, 5 and 22 using First Best, Greedy Stepwise and PSO search methods.
Finally, Tab. 14 shows a comparison of using different features selection methods (filter and wrapper base methods). It shows that the best performance was obtained by K-NN classifier associated with wrapper-based feature selection method with K-NN as base classifier and using Best First and PSO search method.
For this paper a comparison has been conducted between the best performing methods and the previous studies on predicting Parkinson's disease using the same dataset, and other datasets, as shown in Tab. 15. The comparison results showed that the best performing method (K-NN classifier associated with wrapper-based feature selection method with K-NN as base classifier and using Best First and PSO search method) obtained comparable and superior results.
6 Conclusions and Future Works
This paper examined the performance of several classifiers with filter-based and wrapper-based features selections methods to enhance the diagnosis of Parkinson's disease. Different evaluation metrics were used including accuracy, precision, recall and F-score. The experiments compared the performance of machine learning on original and filtered datasets. The results showed that wrapper-based features selection method with K-NN enhanced the performance of predicting Parkinson's disease, with the accuracy reached to 88.33%. In future work, more machine learning and deep learning methods could be applied with these combinations of features selection methods. In addition, other features selection methods could be investigated to improve the performance of predicting Parkinson's disease.
Acknowledgement: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work; project number (77/442). Also, the authors would like to extend their appreciation to Taibah University for its supervision support.
Funding Statement: This research was funded by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia under the Project Number (77/442).
Conflicts of Interest: The authors declare that they have no conflicts of interest.
1Both, the c-SVM and nu-SVM are examined.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|