Evolutionary Algorithm with Machine Learning Based Epileptic Seizure Detection Model

Abstract: Machine learning (ML) has become a familiar topic among decision makers in several domains, particularly healthcare. Effective design of ML models helps to detect and classify diseases using healthcare data. In addition, parameter tuning of the ML models is essential to achieve effective classification results. This article develops a novel red colobuses monkey optimization with kernel extreme learning machine (RCMO-KELM) technique for epileptic seizure detection and classification. The proposed RCMO-KELM technique first extracts chaotic, time, and frequency domain features from the raw EEG signals. Min-max normalization is then employed to pre-process the EEG signals. Next, the KELM model is used to detect and classify epileptic seizures from the EEG signals. Furthermore, the RCMO technique is used for optimal parameter tuning of the KELM model so that the overall detection outcomes are considerably enhanced. The RCMO-KELM technique is evaluated on a benchmark dataset and the results are inspected under several aspects. The comparative analysis reported better outcomes for the RCMO-KELM technique than for recent approaches, with an accuracy of 0.956.


Introduction
Machine learning (ML) is a subarea of artificial intelligence (AI) that denotes the capacity of an information technology (IT) system to find solutions to problems independently by recognizing patterns in databases. ML allows IT systems to identify patterns on the basis of existing datasets and algorithms and to develop satisfactory solution concepts. Thus, in ML, artificial knowledge is created on the basis of experience [1]. Mathematical and statistical models are used to learn from datasets. There are two major schemes, namely symbolic and sub-symbolic models. In a symbolic system, e.g., a propositional system, the knowledge content (that is, the induced rules) is explicitly represented, whereas a sub-symbolic system is an artificial neural network [2], which works on the principles of the human brain and represents the knowledge content implicitly. The key challenges of ML for big data are the high speed of streaming data, the large scale of the data, the different types of data, and incomplete and uncertain data. The three major kinds of ML are supervised, unsupervised, and reinforcement learning.
Epilepsy is a common brain disorder characterized by recurring seizures [3]. Around 50 million people worldwide suffer from epilepsy, and 80% of them live in developing nations. Every year, over 2 million new cases of epilepsy are detected across the world. The electroencephalogram (EEG) signal is extensively employed for detecting epilepsy because it records the brain's electrical activity directly [4]. Because seizures generally occur unpredictably and infrequently, an automatic diagnosis scheme capable of distinguishing epileptic EEG signals from normal ones is extremely useful for diagnosis. In such a scheme, the recorded EEG signal is the input and the class of the EEG signal is the output. In general, an automatic diagnosis method includes two stages: (i) feature extraction from the input EEG signal and (ii) classification of the extracted features for seizure diagnosis [5]. Various classification techniques have been employed for the automatic diagnosis of seizures. Generally, experimental results show that EEG signals contain useful features for the diagnosis of seizure events and that most automatic seizure diagnosis systems are very efficient [6].
The major problem in the automated diagnosis of epileptic seizures is selecting the distinguishing features that differentiate among distinct phases (including ictal and pre-ictal) [7]. In earlier studies, several time, frequency, time-frequency, and statistical features are extracted first; the optimal discriminative features are then chosen manually or with traditional feature selection (FS) methods. This process is time-consuming, computationally intensive because of the high dimensionality, and typically not robust [8]. Moreover, the optimal features for one case/subject may not be optimal for others. Thus, a generalized model that learns the features appropriate to all cases/subjects is important.
Kaur et al. [9] proposed a secure and smart medical data scheme with advanced security and ML mechanisms for handling big data in the healthcare field. The novelty lies in the integration of a data security layer and optimal storage used to maintain privacy and security. Distinct methods such as activity monitoring, masking encryption, dynamic data encryption, end point validation, and granular access control were integrated. Abdelaziz et al. [10] presented an approach for healthcare services (HCS) in cloud environments that uses parallel particle swarm optimization (PPSO) to enhance the selection of virtual machines (VMs). Additionally, a new method for chronic kidney disease (CKD) prediction and detection is presented to measure the efficiency of these VM models. The CKD prediction is performed by two successive methods, namely logistic regression (LR) and a neural network (NN).
Nilashi et al. [11] developed a prediction model for the diagnosis of heart disease with ML methods. The presented model is designed using supervised and unsupervised learning. In particular, the study is based on principal component analysis (PCA), the Self-Organizing Map, the fuzzy support vector machine (FSVM), and two imputation methods for missing values. Moreover, incremental FSVM and PCA are employed for incremental learning of the data to reduce the computational time of disease prediction.
Dinh et al. [12] evaluated the capability of ML methods to diagnose at-risk patients from laboratory results and to find the key parameters within the data that contributed to these diseases among the patients. Various ML methods were evaluated on classification accuracy with distinct feature sets and time frames for the laboratory data. Elhoseny et al. [13] presented an automatic heart disease (HD) diagnosis (AHDD) system that incorporates a binary convolutional neural network (CNN) with a multiagent feature wrapper (MAFW) method. The agent instructs the genetic algorithm (GA) to perform a global search on the HD features and adjusts the weights during early classification.
This article develops a novel red colobuses monkey optimization with kernel extreme learning machine (RCMO-KELM) technique for epileptic seizure detection and classification. The proposed RCMO-KELM technique first extracts features from the raw EEG signals, and min-max normalization is employed for pre-processing. The KELM model is then used to detect and classify epileptic seizures from the EEG signals. In addition, the RCMO technique is employed for optimal parameter tuning of the KELM model so that the overall detection outcomes are considerably enhanced. The RCMO-KELM technique is evaluated on a benchmark dataset and the results are inspected under several aspects.

The Proposed Model
In this article, a novel RCMO-KELM technique has been developed for epileptic seizure detection and classification. The proposed RCMO-KELM technique initially extracts the chaotic, time, and frequency domain features in the actual EEG signal. Besides, the RCMO-KELM technique involves several stages of operations namely feature extraction, min-max normalization based preprocessing, KELM based classification, and RCMO based parameter tuning. Fig. 1 demonstrates the overall process of RCMO-KELM technique.

Feature Extraction
The proposed RCMO-KELM technique first extracts chaotic, time, and frequency domain features from the raw EEG signal. Each raw EEG signal contains 178 points. To capture the important information in these EEG signals, 31 distinct features are extracted from the signals of every class. These features are skewness, maximal value, clearance factor, minimal value, sample entropy, average value, shape factor, kurtosis, median, mode, fast Fourier transform (FFT) coefficients (first 15 values), approximate entropy, and auto-regressive (AR) coefficients (first 5 values).
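As an illustration of the statistical part of this feature set, the sketch below computes a handful of the listed time-domain features for one segment using only the Python standard library. It is a minimal sketch, not the authors' implementation: the function name `time_domain_features` is hypothetical, and the definitions of shape factor (RMS divided by mean absolute value) and clearance factor (peak absolute value divided by the squared mean of square-rooted absolute values) are common signal-processing conventions assumed here, since the text does not define them. Entropy, FFT, and AR features are omitted.

```python
import math
import statistics


def time_domain_features(signal):
    """Return a few time-domain features for one EEG segment (list of floats).

    Assumes the segment is not constant (non-zero standard deviation).
    """
    n = len(signal)
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)  # population standard deviation
    abs_vals = [abs(v) for v in signal]
    rms = math.sqrt(sum(v * v for v in signal) / n)
    # Standardised third and fourth central moments.
    skewness = sum((v - mean) ** 3 for v in signal) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in signal) / (n * std ** 4)
    return {
        "mean": mean,
        "median": statistics.median(signal),
        "min": min(signal),
        "max": max(signal),
        "skewness": skewness,
        "kurtosis": kurtosis,
        # Assumed definitions (not given in the text):
        "shape_factor": rms / (sum(abs_vals) / n),
        "clearance_factor": max(abs_vals)
        / (sum(math.sqrt(a) for a in abs_vals) / n) ** 2,
    }


# Toy segment; a real segment in the dataset has 178 points.
segment = [1.0, 2.0, 3.0, 4.0, 5.0]
features = time_domain_features(segment)
```

Applying such a function to every 178-point segment and concatenating the results with the entropy, FFT, and AR features would give one 31-dimensional vector per segment.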

Pre-processing
Normalizing the raw inputs makes the data suitable for training. Min-max normalization rescales a feature from one range of values to a new range of values. Most frequently, the feature is rescaled to lie in the range from 0 to 1 or from -1 to 1. The rescaling is commonly accomplished with the linear interpolation equation:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \left( new_{\max} - new_{\min} \right) + new_{\min} \tag{1}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature in the data, and $[new_{\min}, new_{\max}]$ is the target range. When a feature takes a constant value in the data, it is disregarded, as it provides no information to the NN. After min-max normalization, every feature lies within the same new range of values. Min-max normalization has the advantage of preserving all relationships in the data exactly.
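The rescaling described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `min_max_normalize` and the choice to map a constant feature to the lower bound are hypothetical and not taken from the paper.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Linearly rescale one feature column to the range [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; here it is mapped to
        # the lower bound (an assumption, the text only says it is ignored).
        return [new_min for _ in column]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in column]


feature = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_normalize(feature)                # range [0, 1]
symmetric = min_max_normalize(feature, -1.0, 1.0)  # range [-1, 1]
```

Because the mapping is linear, the ordering and the relative spacing of the values are preserved, which is the property the text refers to as maintaining the relationships in the data.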

KELM Based Classification
The ELM utilized by the IPE-ELM technique is described here. The ELM uses a single hidden layer feedforward neural network (SLFN) and learns faster than typical feedforward network learning techniques such as backpropagation (BP). Because of its simplicity, remarkable performance, and impressive generalization ability, the ELM has been applied in a variety of domains, namely data classification, computer vision, control and robotics, bioinformatics, and system identification [14].
The output of an SLFN with $L$ hidden nodes is given in Eq. (2):

$$f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x) \tag{2}$$

where $a_i$ and $b_i$ are the learning parameters of the hidden nodes, $\beta_i$ is the weight linking the $i$-th hidden node to the output node, and $G(a_i, b_i, x)$ is the output of the $i$-th hidden node for input $x$. Usually, the additive hidden node with activation function (AF) $g(x)$ has the form:

$$G(a_i, b_i, x) = g(a_i \cdot x + b_i) \tag{3}$$

For $N$ training samples $(x_j, t_j)$, the SLFN fits the targets when

$$\sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = t_j, \quad j = 1, \ldots, N \tag{4}$$

Eq. (4) is expressed compactly as:

$$H\beta = T$$

where $H$ is named the hidden layer output matrix of the SLFN. The $i$-th column of $H$ is the output of the $i$-th hidden node with respect to the inputs $x_1, \ldots, x_N$, and $h(x) = [G(a_1, b_1, x), \ldots, G(a_L, b_L, x)]$ is termed the hidden layer feature map. The $i$-th row of $H$ is the hidden layer feature map with respect to the $i$-th input $x_i$. From the interpolation capability viewpoint, it has been shown that when the AF $g$ is infinitely differentiable on any interval, the hidden layer parameters can be generated randomly [15].

The AF, also named the transfer function (TF), defines the output of a node for a given input or group of inputs. Specifically, AFs are used to restrict the output value to a particular finite range, so the AF plays a vital role in the computation. Here, the performance of several AFs is examined and they are implemented experimentally. The technique uses 4 distinct AFs. Each processor in the parallel computation environment chooses one of these AFs at random during the optimization procedure. The AFs used by the IPE-ELM technique are as follows.

Sigmoid function: a mathematical function with a characteristic "S"-shaped curve, or sigmoid curve. Here, the sigmoid function refers to the special case of the logistic function defined by:

$$g(n) = \frac{1}{1 + e^{-n}}$$

where $n$ is the weighted sum of the inputs. Its range is between zero and one. It is simple to understand and apply, but it has two main drawbacks. First, it suffers from the vanishing gradient problem: in certain cases the gradient becomes vanishingly small, effectively preventing the weight from changing its value. Second, its output is not zero centered, which makes the gradient updates go too far in different directions.
Hyperbolic tangent function: its mathematical expression is

$$g(n) = \tanh(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$$

Its output is zero centered because its range is between −1 and 1, that is, −1 < output < 1. Optimization is therefore easier with this function in practice, and it is sometimes preferred over the sigmoid function. However, it also suffers from the vanishing gradient problem.
Sine function: although most of the AFs used in SLFNs or deep neural networks (DNNs) are non-periodic, periodic functions such as sine and cosine are also used. In a NN model with sine activation, the solutions repeat periodically and the system can be trained to certain output classes. A NN with one hidden layer can approximate any function provided the AF is increasing and bounded (with a minimum and a maximum). The sine function is not an increasing function, and inputs that are extremely low or extremely high can produce the same output.
Cosine function: it is used for comparison with the sine function, although it is not a commonly used AF. Its outcomes are examined experimentally because, like the sine function, it is periodic.
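The four AFs discussed above can be written down directly. The sketch below is only an illustration: the dictionary `ACTIVATIONS` and the helper `sigmoid_grad` are hypothetical names, and the code simply makes two of the points in the text concrete, namely the vanishing gradient of the sigmoid for large inputs and the periodicity of the sine function.

```python
import math


def sigmoid(n):
    """Logistic sigmoid 1 / (1 + exp(-n)), written to avoid overflow."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def sigmoid_grad(n):
    """Derivative of the sigmoid: s(n) * (1 - s(n))."""
    s = sigmoid(n)
    return s * (1.0 - s)


# The four candidate activation functions, keyed by a hypothetical name.
ACTIVATIONS = {
    "sigmoid": sigmoid,
    "tanh": math.tanh,
    "sine": math.sin,
    "cosine": math.cos,
}

# Vanishing gradient: the slope is largest at 0 and tiny for large |n|.
grad_at_zero = sigmoid_grad(0.0)
grad_at_ten = sigmoid_grad(10.0)

# Periodicity: inputs that differ by 2*pi give (numerically) the same output.
sine_gap = abs(math.sin(0.3) - math.sin(0.3 + 2.0 * math.pi))
```

A processor that picks its AF at random, as described above, could draw a key from `ACTIVATIONS` with `random.choice(list(ACTIVATIONS))`.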
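For $N$ training instances $(x_i, t_i) \in R^d \times R^m$, where $t_i$ denotes the $i$-th class vector coded in $\{-1, 1\}^m$, the ELM with activation function $G(w, b, x)$ can be expressed as

$$H\beta = T, \qquad H = \begin{bmatrix} G(w_1, b_1, x_1) & \cdots & G(w_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(w_1, b_1, x_N) & \cdots & G(w_L, b_L, x_N) \end{bmatrix}$$

The output function of the ELM in kernel form is then formulated by [16]:

$$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^{T} \left( \frac{I}{C} + \Omega_{ELM} \right)^{-1} T \tag{13}$$

in which $\Omega_{ELM} = HH^{T}$ has entries $\Omega_{i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j)$, $K$ is the kernel function, and $C$ is the regularization coefficient. For an unknown sample $x$, its class is obtained as follows:

$$label(x) = \arg\max_{k \in \{1, \ldots, m\}} f_k(x) \tag{14}$$

It is notable that, owing to the kernel conversion, the computations of Eqs. (13) and (14) do not need to evaluate the hidden layer function $h(x)$ directly, which brings several benefits: (1) the fluctuation of the hidden neuron output matrix of the ELM caused by random initialization, a major problem in chemometrics, is resolved; (2) the number of hidden nodes does not need to be specified. Thus, computing the kernel matrix for KELM takes much less time than searching for the number of hidden nodes in ELM when the number of training samples is small, which is the normal situation in spectroscopy-related chemometrics.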
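To make the kernel formulation concrete, the sketch below trains and applies a small KELM classifier in plain Python. It is a minimal sketch under assumptions, not the authors' implementation: it assumes the standard KELM closed form, in which the coefficients are obtained by solving (I/C + Omega) alpha = T, it assumes a Gaussian (RBF) kernel, and the names `rbf_kernel`, `solve`, `kelm_fit`, `kelm_predict`, `C`, and `gamma` are hypothetical. The linear solver is written out by hand only to keep the example dependency-free.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_val = M[col][col]
        M[col] = [v / pivot_val for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def kelm_fit(X, labels, n_classes, C=10.0, gamma=1.0):
    """Return alpha = (I/C + Omega)^-1 T with targets coded in {-1, +1}."""
    n = len(X)
    omega = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        omega[i][i] += 1.0 / C  # ridge term I/C on the diagonal
    T = [[1.0 if labels[i] == k else -1.0 for k in range(n_classes)]
         for i in range(n)]
    return solve(omega, T)


def kelm_predict(x, X, alpha, gamma=1.0):
    """Class index with the largest output f_k(x) = sum_i K(x, x_i) alpha_ik."""
    k_row = [rbf_kernel(x, xi, gamma) for xi in X]
    n_classes = len(alpha[0])
    scores = [sum(k * a[c] for k, a in zip(k_row, alpha)) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


# Toy data: two well-separated clusters standing in for two EEG classes.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_train = [0, 0, 1, 1]
alpha = kelm_fit(X_train, y_train, n_classes=2)
pred_a = kelm_predict([0.2, 0.5], X_train, alpha)
pred_b = kelm_predict([5.1, 5.4], X_train, alpha)
```

Note that no hidden layer size appears anywhere; the only free parameters in this sketch are `C` and `gamma`, which is the kind of setting a parameter tuner would search over.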

RCMO Based Parameter Tuning
At the final stage, the RCMO algorithm is used for the optimal parameter tuning of the KELM model so that the overall detection outcomes are considerably enhanced. The RCMO algorithm is inspired by the characteristics of red monkeys and simulates their behavior. To model these interactions, all groups of monkeys move over the search region. The monkeys are separated into teams; each team contains one male, and this male need not be the leader even when it is the stronger monkey. Besides, there is little interaction between male Cercopithecus mitis and the young ones [17]. Young males have to leave early because of the territorial behavior of Cercopithecus mitis, and the best-performing ones challenge the dominant male of another family. When a young male defeats that male, it becomes the leader of the family, which provides a place to live, a food supply, and socialization for the young males.
The position of each red monkey in the group is updated according to the position of the best red monkey of the group. The update equation involves the following quantities:
• $PB_Z$ is the monkey body power (a random number between -5 and 5);
• $PA_Z$ is the monkey battle power (a randomly selected number between zero and one);
• $W^{leader}_Z$ is the leader weight;
• $W^{i}_Z$ is the monkey weight (random numbers in the range of four to six);
• $X_Z$ is the position of the red monkey;
• $X^{best}_Z$ is the position of the leader.
In addition, rand is a random number between zero and one. The positions of the children of the red monkeys are updated with analogous formulas, where:
• $PB_{ch}$ is the body power rate of the child;
• $PA_{ch}$ is the fighting power rate of the child;
• $Wch^{leader}_Z$ is the weight of the leader child;
• $Wch^{i}_Z$ is the child weight, where every weight is a random number in the range of four to six;
• $Xch_Z$ is the position of the child;
• $Xch^{best}_Z$ is the position of the leader child;
• "rand" is a random number in the range of zero to one.
Moreover, the positions are updated in every iteration.
It is worth noting that every parameter of RCMO is fixed, either experimentally or according to the nature of the problem being solved [18]. The RCMO has few parameters, which makes it simple to implement; it also balances the exploitation and exploration stages, making it suitable for solving a wide range of optimization problems.
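The loop structure of such a population-based tuner can be sketched as below. The update rule used here is an assumption, since the paper's own update equations are not reproduced in this text: the body power is updated as rand * PB + (W_leader - W_i) * PA * (X_best - X) and the position as X + PB * rand, with the parameter ranges taken from the definitions above. The function name `rcmo_minimize` and all argument names are hypothetical, the child population is omitted, and the power is kept per dimension for simplicity.

```python
import random


def rcmo_minimize(objective, dim, bounds, n_monkeys=20, n_iter=100, seed=0):
    """Minimise `objective` over a box; returns (best_position, history).

    `history` holds the best objective value found so far after each iteration.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_monkeys)]
    # Body power in [-5, 5] and weights in [4, 6], as in the definitions above.
    power = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_monkeys)]
    weight = [rng.uniform(4.0, 6.0) for _ in range(n_monkeys)]

    leader = min(range(n_monkeys), key=lambda i: objective(pos[i]))
    best, best_val = pos[leader][:], objective(pos[leader])
    history = [best_val]

    for _ in range(n_iter):
        for i in range(n_monkeys):
            battle = rng.random()  # battle power PA in [0, 1]
            for d in range(dim):
                # Assumed update rule (not taken from the paper).
                power[i][d] = (rng.random() * power[i][d]
                               + (weight[leader] - weight[i]) * battle
                               * (best[d] - pos[i][d]))
                step = pos[i][d] + power[i][d] * rng.random()
                pos[i][d] = min(hi, max(lo, step))  # keep inside the box
            val = objective(pos[i])
            if val < best_val:
                best, best_val, leader = pos[i][:], val, i
        history.append(best_val)
    return best, history


def sphere(x):
    """Toy objective standing in for a classifier's validation error."""
    return sum(v * v for v in x)


best, history = rcmo_minimize(sphere, dim=2, bounds=(-10.0, 10.0))
```

For tuning KELM, the objective would instead map a candidate parameter vector to a validation error, so that the position of the best monkey gives the selected parameter values.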

Experimental Validation
The performance of the RCMO-KELM technique is validated using the Epileptic Seizure Recognition Data Set from the UCI repository [19]. The dataset comprises 5 class labels, namely eyes open, eyes closed, tumor region, healthy brain, and epileptic seizure, and holds 11500 instances. Tab. 1 offers the classification results of the RCMO-KELM technique under distinct numbers of hidden layers (NHL) and runs. Fig. 3 portrays the classification results obtained by the RCMO-KELM technique under run-1 with distinct NHLs. With an NHL of 10, the RCMO-KELM technique obtained a precision of 0.9462, recall of 0.9509, accuracy of 0.9472, and F-score of 0.9469. With an NHL of 30, the RCMO-KELM technique reached a precision of 0.9353, recall of 0.9223, accuracy of 0.9301, and F-score of 0.9375.

Conclusion
In this article, a novel RCMO-KELM technique has been developed for epileptic seizure detection and classification. The proposed RCMO-KELM technique first extracts chaotic, time, and frequency domain features from the raw EEG signal. The RCMO-KELM technique involves several stages of operation, namely feature extraction, min-max normalization based preprocessing, KELM based classification, and RCMO based parameter tuning. The RCMO technique is used for optimal parameter tuning of the KELM model so that the overall detection outcomes are considerably enhanced. The RCMO-KELM technique was evaluated on a benchmark dataset and the results were inspected under several aspects. The comparative analysis reported better outcomes for the RCMO-KELM technique than for recent approaches, with an accuracy of 0.956. In future, hybrid DL models can be included to enhance the overall performance.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.