Integrated CWT-CNN for Epilepsy Detection Using Multiclass EEG Dataset

Abstract: Electroencephalography (EEG) is a common clinical procedure for recording the brain signals generated by human activity. EEGs are useful in brain-controlled interfaces and other intelligent neuroscience applications, but manual analysis of these brainwaves is complicated and time-consuming, even for neuroscience experts. Various EEG analysis and classification techniques have been proposed to address this problem; however, conventional classification methods require specific EEG characteristics to be identified and learned beforehand. Deep learning models can learn features from data without in-depth knowledge of the data or prior feature identification. One prominent deep learning architecture is the Convolutional Neural Network (CNN), which has outperformed traditional neural networks in pattern recognition and image classification. The Continuous Wavelet Transform (CWT) is an efficient signal analysis technique that presents the magnitude of EEG signals as time-related frequency components. Existing deep learning architectures suffer from poor performance when classifying EEG signals in the time-frequency domain. To improve classification accuracy, we propose an integrated CWT and CNN technique that classifies five types of EEG signals. We compared the results of the proposed integrated CWT and CNN method with existing deep learning models, e.g., GoogleNet, VGG16, and AlexNet. Furthermore, the accuracy and loss of the proposed integrated CWT and CNN method have been cross-validated using K-fold cross-validation. The average accuracy and loss of K-fold cross-validation for the proposed integrated CWT and CNN method are 76.12% and 56.02%, respectively. The model is evaluated on a publicly available dataset: the Epilepsy dataset from the UCI Machine Learning Repository.


Introduction
Electroencephalography is an electrophysiological method for recording the neural activity generated by brain neurons. Electrodes are used to perform this method and record all brainwave patterns. These brain signals are helpful for analyzing brain performance under different conditions, since EEGs are formed of brainwaves caused by emotional changes, motor movements and motor movement imagery, tumors, epileptic seizures, and many other systems of the human body [1,2]. Electroencephalography has been an important and well-researched topic over the past few years, since it plays a significant role in the diagnosis of neural abnormalities. Moreover, these brain signals have been of great interest for brain-computer interfaces, which are used to assist people suffering from paraplegia, quadriplegia, and locked-in syndrome [3]. Because EEG has a huge impact on human life, it is imperative to design reliable classification algorithms that are cost-efficient and have better diagnostic accuracy.
To perform common EEG inspection, brainwaves are compared with standard EEGs and the experts identify if there are any abnormalities by manual examination. Since it is difficult and time consuming to analyze EEG signals manually, various Source Localization techniques [4,5] have been proposed and widely used to analyze EEG signals. A wide range of statistical techniques has been developed to analyze EEG signals in temporal and spatial resolution [6]. There are many signal transformation techniques such as Fourier Transform, Short-Time Fourier Transform, Hilbert Huang Transform, and Wavelet-based Transform [7,8] which have been used for interpretation of brain signals and to detect anomalies.
Machine Learning (ML) has been of great interest to researchers [9,10] and data science practitioners due to its significance in the industrial sector and in classification systems [11,12]. In recent years, many machine learning algorithms have been proposed and implemented for EEG classification, such as Support Vector Machines [13], K-Nearest Neighbors [14], and neural networks [15]. Conventional EEG classification methods require prior knowledge of the data and accurate feature selection to achieve good classification accuracy [16,17]. Lately, with the success of Deep Learning (DL) models, researchers have overcome many of these classification challenges, since deep learning models do not require prior feature derivation and selection techniques [18,19]. The Convolutional Neural Network (CNN), a subtype of neural network, is often preferred over other machine learning techniques because of its end-to-end learning ability on raw data, with advantages in information extraction, online applications, usability, and classification accuracy [20].
Epilepsy is a recurring neural disease that can cause brain dysfunction through the unexpected occurrence of seizures. These seizures can result in loss of consciousness, limb tremors, behavioral disorders, and transient sensory disorders due to the mental shocks they cause [21]. Brain tumors are also a major cause of seizures, limb numbness, and behavioral disorders [22]. EEG can be a reliable source for the early detection of seizures due to epilepsy or brain tumors, but it tends to be contaminated by various physiological activities. Traditional EEG classification methods require prior feature engineering, and their classification accuracy depends on correct feature selection. Therefore, it is imperative to design an efficient method that can separate the useful information from the noisy characteristics of EEG. The application of deep learning in neuroscience has been of great help in the detection of neural abnormalities because DL models can recognize complex EEG patterns without predefined feature engineering.

Related Work
The classification of EEG signals using machine learning algorithms is solely dependent on the correct detection of distinctive EEG features. The categorization of EEG signals in time domain has been performed using a variety of classification methods [23]. The author of [24] implemented EEG classification in the frequency domain using CNN. Image-based EEG classification has also been proposed by the author of [25]. In [26], the author performed EEG classification using Continuous Wavelet transform and machine learning algorithms e.g., SVM and KNN. CNN was initially designed to classify images, but it has been successfully used to classify EEG data in the frequency domain. The binary classification of EEGs has been implemented in [27] using Radial Basis Function for feature extraction and a One Against One binary classifier.
In [24], the author proposed a 1D-CNN for epilepsy detection, using a Butterworth filter to denoise the raw data and then creating a spectrogram matrix that was used to train the CNN for classification. Similarly, a fixed-size overlapping window is used in [28] to generate a collection of EEG sub-signals in the time domain, whose plot images are used for binary and ternary classification with a CNN. A 3D image reconstruction and classification method has been proposed in [29], which uses a sliding window to divide time series data into 2D segments and then performs 3D image reconstruction to create images suitable for a 3D CNN. The author of [30] proposed binary classification of the Epilepsy dataset using KNN, logistic regression, decision tree, and random forest. Various machine learning methods are implemented and compared in [31] for binary classification of the Epilepsy dataset.
In the recent literature [26,28,30], multiclass EEG datasets have been used to perform the binary classification via K nearest neighbor, decision tree, random forest, and logistic regression. However, these techniques are dependent on the correct feature selection. Accuracy may also get affected by the inconsistency and non-abruptness of the EEG signals, along with other features extracted from the patient history. Many of the researchers have not included the spatial resolution feature. It is also challenging to select the appropriate window length when using time series data.
In this paper, a multiclass classification of the Epilepsy dataset [32] is carried out using the integrated CWT and CNN method, which classifies the data into five different classes. The proposed integrated CWT and CNN method aims to improve on the accuracy and loss results achieved by the existing method. A comparison of the results produced by the proposed integrated CWT and CNN method and the existing DL models is carried out to evaluate the performance of our model. The performance of the proposed integrated CWT and CNN method is also evaluated using K-fold cross-validation.
The proposed integrated CWT and CNN method uses CWT to create time-frequency images of the Epilepsy dataset. The images created by CWT are then used as input to the CNN model for feature learning and classification. The rest of the paper is organized as follows: Section 3 defines the techniques used in this paper; Section 4 explains the proposed integrated CWT and CNN method; Section 5 describes the Epilepsy dataset used to generate the results; in Section 6, the performance of the proposed integrated CWT and CNN method is compared against existing deep learning models and cross-validated using K-fold cross-validation with K = 10 folds. Lastly, Section 7 presents the conclusion derived from this research.

Methods/Techniques
In this section, the workings of the Convolutional Neural Network and the Continuous Wavelet Transform are explained.

Continuous Wavelet Transform
In recent years, many signal transformation techniques have proven useful for analyzing signals, such as the Fast Fourier Transform (FFT), which provides the frequency components of a signal, and the Short-Time Fourier Transform (STFT), which uses a sliding window to extract the time-frequency components of a signal. However, STFT is limited by its fixed window size and is best suited to signals whose frequency content does not change over time. Empirical Mode Decomposition (EMD) also provides a time-frequency analysis of signals, but its decomposition modes are complex and difficult to interpret. The Wavelet Transform (WT) decomposes a signal into a set of frequency components and presents their distribution in the temporal and spectral domains by compressing, scaling, and shifting the signal. Since all frequencies become apparent in WT, it is advantageous over other transformation techniques [33].
A wavelet ψ ∈ L²(ℝ) is a function with zero average (i.e., ∫ℝ ψ(t) dt = 0) that is centered around t = 0 and normalized (i.e., ‖ψ‖ = 1). Eq. (1) represents the mathematical form of a wavelet transform:

$$\psi_{u,v}(t) = \frac{1}{\sqrt{v}}\,\psi\!\left(\frac{t-u}{v}\right) \tag{1}$$

where ψ is the basis function used to compute frequency components, u is a shifting parameter along the x-axis in the concerned region, and v is a positive scaling parameter along the y-axis, because negative scaling is not defined. The scaling and shifting parameters vary continuously to convolve the mother wavelet over different portions of the signal and analyze it at different scales. Given the original signal f ∈ L²(ℝ), the Continuous Wavelet Transform (CWT) [34] of f using the scaling and shifting parameters is presented in Eq. (2):

$$Wf(u, v) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{v}}\,\psi^{*}\!\left(\frac{t-u}{v}\right) dt \tag{2}$$

where ψ* denotes the complex conjugate of ψ. Through this transformation, a one-dimensional signal f(t) can be converted into a two-dimensional form Wf(u, v), which is known as a scalogram. These scalograms are used to detect and present the most prominent frequencies (scales) of a signal in a time-scale representation. From Eq. (3), the scalogram of a signal f(t) can be calculated as:

$$S(u, v) = \left| Wf(u, v) \right|^{2} \tag{3}$$

where the function S denotes a scalogram, which presents the energy of Wf at time location u and scale v. There exist many wavelet families that differ from each other in their compactness and smoothness, such as the Gaussian, Mexican hat, Shannon, and Morlet wavelets. In the proposed integrated CWT and CNN method, we used the Morlet wavelet as the mother wavelet because it can extract features with equal variance in frequency and time. Eq. (4) presents a general mathematical form of Morlet wavelets. Morlet wavelets are sinusoids weighted by a Gaussian kernel, which enables this function to capture local oscillatory components in the time domain.
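The transform described above can be sketched directly in NumPy. This is only an illustrative sketch, not the implementation used in this work: it assumes a real-valued Morlet wavelet of the form exp(-t²/2)·cos(ω₀t) with ω₀ = 5, samples it on a support of about four envelope widths, and computes the coefficients by direct correlation. The function names `morlet` and `cwt_scalogram` are hypothetical.

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real-valued Morlet wavelet (assumed form): a cosine under a Gaussian envelope."""
    return np.exp(-0.5 * t**2) * np.cos(w0 * t)

def cwt_scalogram(signal, scales, w0=5.0):
    """Return |Wf(u, v)|^2 with one row per scale v and one column per shift u."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, v in enumerate(scales):
        # Sample the dilated wavelet psi((t - u) / v) / sqrt(v) on +/- 4 envelope widths.
        half = int(np.ceil(4 * v))
        t = np.arange(-half, half + 1) / v
        psi = morlet(t, w0) / np.sqrt(v)
        # Correlation with psi equals convolution with its time-reversed copy;
        # the slice keeps the n coefficients aligned with the input samples.
        coeffs = np.convolve(signal, psi[::-1], mode="full")[half:half + n]
        out[i] = coeffs**2
    return out

# Example: a pure tone concentrates its energy in a narrow band of scales.
samples = np.arange(256)
tone = np.cos(2 * np.pi * 0.05 * samples)
scalogram = cwt_scalogram(tone, np.arange(1, 33))
```

For a 178-sample EEG segment and scales 1 to 128, the same call would return a 128 × 178 array, which would still need to be resized before being used as a 128 × 128 image.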
Morlet wavelets extract temporal features from a variety of signals and can adapt to their time-frequency resolution. A scale can be selected using a criterion based on the entropy of the signal:

$$E(f) = -\sum_{i} p_i \log p_i \tag{5}$$

where E is the entropy of signal f and p_i is the probability of the i-th class in f.

Convolutional Neural Network
Deep learning has attracted great interest from researchers in recent years and has proven advantageous across many application areas. Denoising, pattern recognition, fault and motion detection, image segmentation, high-resolution reconstruction, and classification [35] are some advanced deep learning applications [36,37]. The most prominent deep learning model is the Convolutional Neural Network (CNN), which does not require prior selection of input features. A CNN takes raw data as input and learns patterns in the time and scale dimensions (i.e., scalograms) without any handcrafted filters. CNNs can adapt to any kind of transformation, e.g., linear and non-linear transformations.
As shown in Fig. 1, a CNN architecture has two main layers, an input layer and an output layer, with a variable number of hidden layers between them. The hidden layers comprise combinations of convolution layers, pooling layers, batch normalization layers, Rectified Linear Unit (ReLU) layers, and one or more fully connected layers. Like a feed-forward neural network, a CNN is composed of one or more layers with a variable number of neurons. The input passes through the network as linear combinations computed by each neuron in each layer, followed by non-linear activations, so that the network can learn highly non-linear features.
The convolution layer is the essence of CNN architectures because it is composed of feature maps, which are generated by computing the cross-correlation between the previous layer's output and the kernels of the receptive neurons. Each neuron in the current layer is associated with a different region of the previous layer's output to extract distinct elements from it [38]. Typically, due to internal covariate shift in the training data, the distribution of the feature maps changes as the parameters are updated. This phenomenon requires selecting a small learning rate and initializing the parameters carefully, which slows down the learning process and makes it harder to learn features with saturating nonlinearities. Therefore, each convolution layer is followed by a batch normalization (BN) layer to avoid overfitting and slow convergence during training.
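To make the two operations above concrete, the following NumPy sketch computes a strided cross-correlation for a single channel and a training-time batch normalization over a batch of feature maps. It is a simplified illustration under assumed conventions (no padding, one kernel, a channels-last layout, scalar gamma and beta), and the function names are hypothetical rather than taken from any framework.

```python
import numpy as np

def cross_correlate2d(image, kernel, stride=1):
    """Valid (unpadded) 2D cross-correlation of one channel with one kernel."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            top, left = r * stride, c * stride
            patch = image[top:top + kh, left:left + kw]
            out[r, c] = np.sum(patch * kernel)  # no kernel flip: cross-correlation
    return out

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize feature maps of shape (batch, height, width, channels) per channel."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def relu(x):
    """Rectified linear unit applied element-wise."""
    return np.maximum(x, 0.0)

# Example: a 2x2 summing kernel moved over a 4x4 image with stride 2.
image = np.arange(16, dtype=float).reshape(4, 4)
feature_map = cross_correlate2d(image, np.ones((2, 2)), stride=2)
```

At inference time, frameworks typically replace the batch statistics with running averages collected during training; that bookkeeping is omitted here.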
Fully connected (FC) layer takes the output of previous layers and combines them to generate a vector of probability scores. The output layer of CNN architecture assigns data to the respective classes based on computed probability scores. The classification accuracy and loss can be determined by using Eqs. (6) and (7).

Proposed Integrated CWT and CNN Method
Since the classification of the Epilepsy dataset can lead to early seizure detection, it is crucial to design a method that achieves maximum classification accuracy to avoid the adverse consequences of seizures. The maximum accuracy achieved by the author of [39] on the Epilepsy dataset using a CNN is 72.49%. In this paper, we incorporated the Continuous Wavelet Transform with a new Convolutional Neural Network model to improve on the classification accuracy and performance of [39]. Initially, the dataset is shuffled randomly so that samples from the different classes appear evenly in the training, validation, and testing datasets. A standard scaler is then used to normalize the data to mean = 0 and standard deviation = 1. After reading and normalizing the Epilepsy dataset, we divided it using a splitting factor of 0.3 such that (a) the training dataset contains 8050 images and (b) the validation and testing datasets contain 1725 images each. The Continuous Wavelet Transform is then applied to generate 2D images of shape 128 × 128, and the images are reshaped to 128 × 128 × 1. Lastly, label encoding is performed on the label vectors, i.e., Training_labels, Validation_labels, and Testing_labels.
The continuous wavelet transformation is used to create two-dimensional images using scale values 0 to 128. These time-frequency coefficients are then rescaled by 128 to create images of dimension 128 × 128. The collection of these images is then divided into training, validation, and testing datasets with a splitting factor of 0.3. The training dataset contains 8050 images, while the validation and testing datasets contain 1725 images each. The proposed integrated CWT and CNN method takes these images as input to learn the features of the attributes of the Epilepsy dataset, see Tab. 2. A flowchart of the proposed integrated CWT and CNN method is presented in Fig. 2.
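The preprocessing steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the code used in this work: the helper names are hypothetical, the scaler is written out by hand instead of using a library class, and the 0.3 hold-out is assumed to be halved evenly into validation and testing data, which reproduces the 8050/1725/1725 split for 11500 samples.

```python
import numpy as np

def standardize(x):
    """Scale every feature column to zero mean and unit standard deviation."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on constant columns
    return (x - mean) / std

def encode_labels(y):
    """Map arbitrary class labels onto the integers 0..n_classes-1."""
    classes, encoded = np.unique(y, return_inverse=True)
    return encoded, classes

def shuffle_and_split(x, y, holdout=0.3, seed=0):
    """Shuffle, keep (1 - holdout) for training, and halve the rest into val/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    n_train = round(len(x) * (1 - holdout))
    n_val = (len(x) - n_train) // 2
    train = (x[:n_train], y[:n_train])
    val = (x[n_train:n_train + n_val], y[n_train:n_train + n_val])
    test = (x[n_train + n_val:], y[n_train + n_val:])
    return train, val, test

# Example with synthetic stand-in data: 11500 samples, five classes labelled 1..5.
features = np.random.default_rng(1).normal(5.0, 3.0, size=(11500, 8))
raw_labels = np.repeat(np.arange(1, 6), 2300)
labels, classes = encode_labels(raw_labels)
train, val, test = shuffle_and_split(standardize(features), labels)
```

A stricter variant would fit the mean and standard deviation on the training portion only and reuse them for the validation and testing portions.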
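One way to turn a matrix of wavelet coefficients into a fixed-size image is sketched below. The exact rescaling used in this work is not specified in detail, so the approach here is an assumption: the time axis is linearly interpolated down to 128 columns, magnitudes are min-max normalized to [0, 1], and a trailing channel axis is added. The names `resize_time_axis` and `to_image` are hypothetical.

```python
import numpy as np

def resize_time_axis(coeffs, width=128):
    """Linearly interpolate every row of a (scales, time) array to `width` columns."""
    old_grid = np.linspace(0.0, 1.0, coeffs.shape[1])
    new_grid = np.linspace(0.0, 1.0, width)
    return np.stack([np.interp(new_grid, old_grid, row) for row in coeffs])

def to_image(coeffs, width=128):
    """Convert CWT coefficients to a normalized single-channel image."""
    img = resize_time_axis(np.abs(coeffs), width)
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    return img[..., np.newaxis]  # shape (scales, width, 1)

# Example: 128 scales by 178 time samples, as for one EEG segment.
demo_coeffs = np.random.default_rng(0).normal(size=(128, 178))
demo_image = to_image(demo_coeffs)
```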

Figure 2: Proposed integrated CWT and CNN method flowchart
The proposed integrated CWT and CNN method is broken down into feature detection and classification. The feature detection part of the proposed architecture has three convolutional (Conv) layers followed by a batch normalization (BN) layer. The convolution layers use the ReLU function to activate neurons, which contain linear combinations of the data patterns learned by the network, whereas batch normalization is performed right after the convolution layers.
For the classification part, the network consists of a flatten layer, which converts all features into one-dimensional data so that the data can be forwarded to the fully connected layers. There are three fully connected layers in this part: the ReLU activation function is applied in the first two dense (FC) layers, and the Softmax activation function is used in the classification layer. The output layer has five nodes, as it must classify five-class data. The architecture of the proposed CWT and CNN method is presented in Fig. 3.
The proposed integrated CWT and CNN method has an input size of 128 × 128 pixels. The first convolutional layer has 128 filters of size 7-by-7, the second layer has 256 filters of size 5-by-5, and the third layer has 128 filters of size 5-by-5. The first convolution layer uses a stride of 4, whereas the second and third convolution layers use a stride of 1. For the classification part, the three FC layers contain 128, 50, and 5 nodes, respectively, to classify the data into five classes. The network is trained using the 18,430,257 parameters, however, the total number of parameters learned in this architecture are 18
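The layer sizes above can be checked with a short parameter-count calculation. The sketch below rests on several assumptions that are not stated in the text: a single-channel 128 × 128 input, 'same' padding, no pooling layers, and a batch normalization layer after every convolution that is counted with four values per channel (scale, shift, moving mean, moving variance). All function names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2D convolution."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

def batch_norm_params(channels):
    """Scale, shift, moving mean and moving variance for each channel."""
    return 4 * channels

def same_padding_size(size, stride):
    """Output side length of a 'same'-padded convolution (ceiling division)."""
    return -(-size // stride)

side, channels, total = 128, 1, 0
for filters, kernel, stride in [(128, 7, 4), (256, 5, 1), (128, 5, 1)]:
    total += conv_params(kernel, channels, filters) + batch_norm_params(filters)
    side = same_padding_size(side, stride)
    channels = filters

flattened = side * side * channels  # 32 * 32 * 128 = 131072 features
inputs = flattened
for units in (128, 50, 5):
    total += dense_params(inputs, units)
    inputs = units

print(total)  # 18431281
```

Under these assumptions the count is 18,431,281, which is within 1,024 of the 18,430,257 reported above. The small gap suggests that some detail, such as the number or placement of batch normalization layers, differs from what this sketch guesses.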

Activation Function
ReLU activation helps deep learning models train faster on complex features than standard saturating units (e.g., sigmoid or tanh). ReLU is also less prone to the issue of vanishing gradients. The proposed integrated CWT and CNN method uses ReLU activation for each convolution layer and for the hidden FC layers.
The output layer uses the Softmax activation and contains the final probabilities for all classes (see Eq. (8)):

$$\mathrm{Softmax}(I_j) = \frac{e^{I_j}}{\sum_{k=1}^{C} e^{I_k}} \tag{8}$$

where C is the number of classes in an input vector I.
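As an illustration of the Softmax activation, the following NumPy sketch converts a vector of raw output scores into class probabilities. Subtracting the maximum score before exponentiating is a common numerical-stability measure and does not change the result; the function name and the example scores are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)  # avoids overflow
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Example: scores for one sample over five classes.
example_scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
example_probs = softmax(example_scores)
predicted_class = int(np.argmax(example_probs))
```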

Loss Function
Loss functions evaluate the performance of a deep learning model on the given data. Loss functions in deep learning fall into two broad types: regression and classification. Since our work is based on classification, we used the cross-entropy loss (CEL) function. CEL is a log loss, which estimates the performance of a classification model whose output is a probability that always lies between 0 and 1. Eq. (9) defines the cross-entropy loss function for multi-class data:

$$\mathrm{CEL} = -\sum_{c=1}^{M} y_{o,c}\,\log\left(P_{o,c}\right) \tag{9}$$

where M is the number of classes in the data, P_{o,c} is the predicted probability of sample o belonging to class c, and y_{o,c} is a binary indicator equal to 1 if c is the correct class for sample o and 0 otherwise.
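The cross-entropy loss can be sketched in NumPy as follows. The sketch assumes integer class labels, so the one-hot sum reduces to picking the probability of the true class, and it averages the loss over a batch; the clipping constant `eps` and the function name are illustrative choices.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy for predicted probabilities and integer class labels."""
    probs = np.clip(probs, eps, 1.0)  # guard against log(0)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Example: uniform predictions over five classes give a loss of log(5).
uniform_probs = np.full((4, 5), 0.2)
uniform_loss = cross_entropy(uniform_probs, np.array([0, 1, 2, 3]))
```

A model that assigns probability 1 to the correct class of every sample would obtain a loss of 0, the minimum of this function.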

Optimization
The Adam optimizer is used in the training process to update the weight parameters. It is based on adaptive moment estimation and mitigates the problem of vanishing learning rates. It is computationally efficient and requires very little memory.
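A single Adam update can be sketched as below. The hyperparameter defaults are the commonly used ones and are assumptions here, since the settings used in this work are not listed; the function name `adam_step` and the toy objective are hypothetical.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad**2    # second moment (running mean of squares)
    m_hat = m / (1 - beta1**t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Example: minimize (x - 3)^2, whose gradient is 2 * (x - 3), starting from x = 0.
x, m, v = 0.0, 0.0, 0.0
for step in range(1, 1001):
    x, m, v = adam_step(x, 2.0 * (x - 3.0), m, v, step, lr=0.1)
```

Only two extra values per parameter (m and v) are stored, which is why the memory overhead stays small.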

Dataset Description
Electroencephalography non-invasively records the brain signals generated by neurons during neural activity. These signals can be used to track brain functions, but EEGs tend to be noisy due to epilepsy, brain tumors, muscle movements, movement imagery, Alzheimer's disease, etc. These noisy characteristics usually make it challenging to separate the useful information from attributes of other classes with similar time-frequency patterns. For precise identification and classification of multiclass EEG signals, machine learning models require a large amount of data, since the data will be divided, pooled, and normalized during the feature learning process.
The data used in the proposed integrated CWT and CNN method is an open-access dataset known as the Epilepsy dataset, which is available at the UCI Machine Learning Repository and was published by [32]. This data was recorded from 500 individuals with different health conditions, such as healthy, epilepsy, and tumor. The dataset contains a collection of attributes of data from five different classes, see Tab. 2.
The original Epilepsy dataset contains five folders, where each folder contains EEG data recorded from 100 subjects. Each recording contains 4097 data points covering a duration of 23.6 s. To create a significant number of training images, each EEG recording is divided into 23 small one-second instances, where each instance contains 178 data points. The resulting dataset comprises 11500 one-second samples. Additionally, the dataset is shuffled and reorganized to avoid biasing the classification of EEG data towards any class and to make sure that data points from each class are part of the CNN training. Samples from each class have different characteristics from each other, which are illustrated in Fig. 4.
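The segmentation arithmetic above can be reproduced with a few lines of NumPy. The sketch assumes that the leftover samples at the end of each recording (4097 is not a multiple of 178) are simply dropped, and it uses random numbers as stand-in recordings; the function name is hypothetical.

```python
import numpy as np

def segment_recording(recording, segment_len=178):
    """Split one recording into consecutive non-overlapping segments."""
    n_segments = len(recording) // segment_len
    usable = recording[: n_segments * segment_len]  # drop the incomplete tail
    return usable.reshape(n_segments, segment_len)

# Example: 500 synthetic recordings of 4097 points each.
recordings = np.random.default_rng(0).normal(size=(500, 4097))
segments = np.concatenate([segment_recording(r) for r in recordings])
print(segments.shape)  # (11500, 178)
```

Each recording yields 23 segments (23 × 178 = 4094 points), so 500 recordings produce the 11500 samples mentioned above.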

Results and Discussion
We trained the proposed integrated CWT and CNN method using Google Colaboratory, a cloud-based service for training deep learning models in a Python environment. We implemented this method in Keras with TensorFlow as the backend. Multiple Python libraries, such as Pandas, NumPy, and sklearn, have been used for data processing and simulations. Well-known deep learning (DL) models such as GoogleNet, VGG16, and AlexNet are also implemented so that the proposed integrated CWT and CNN method can be compared with existing DL models. The AlexNet architecture won ILSVRC-2012. AlexNet is trained for 150 epochs, and its resulting accuracy and loss for EEG dataset classification can be seen in Fig. 5.
GoogleNet won ILSVRC-2014. This model is trained for 300 epochs, and the resulting accuracy and loss graphs of GoogleNet on the EEG dataset are shown in Fig. 6. VGG16 is also a well-known deep learning architecture, named after the Visual Geometry Group at Oxford. This model outperformed many previous-generation models from the ILSVRC-2012 and ILSVRC-2013 competitions. Fig. 7 presents the resulting accuracy and loss of VGG16 for 150 epochs. The GoogleNet, VGG16, and AlexNet architectures seem to perform well on the Epilepsy dataset during the training and validation steps; however, these architectures suffer from overfitting when classifying the test data, which results in poor accuracy scores. The proposed integrated CWT and CNN method seems to perform well on the same Epilepsy dataset, and its results do not show a large discrepancy between the accuracy scores of the training, validation, and testing phases. The results generated by all DL models and the proposed integrated CWT and CNN method for their performance assessment are presented in Tab. 3.
The results of the proposed integrated CWT and CNN method are also cross-validated using K-fold cross-validation with 10 folds of the Epilepsy dataset. K-fold cross-validation is useful for evaluating the performance of a trained model on unseen data from the original dataset.
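The fold construction behind K-fold cross-validation can be sketched as follows. This is a generic NumPy illustration with a hypothetical function name, not the routine used in this work; it only produces index sets, and model training for each fold is left out.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Example: count how often each of the 11500 samples is used for testing and training.
test_counts = np.zeros(11500, dtype=int)
train_counts = np.zeros(11500, dtype=int)
for train_idx, test_idx in k_fold_indices(11500, k=10):
    test_counts[test_idx] += 1
    train_counts[train_idx] += 1
```

With k = 10, every sample is held out for testing in exactly one fold and appears in the training set of the other nine.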
This cross-validation process makes sure that the model is not biased towards a specific class. In this process, each data point is part of the testing set exactly once, and the model is trained on it multiple times depending on the number of folds, i.e., if we use k = 10 folds, then each data point is used as part of the training set k − 1 times. The average accuracy achieved by K-fold cross-validation for the proposed integrated CWT and CNN method is 76.02%. The overall accuracy lies in the range of 74% to 79%, and the loss lies in the range of 50% to 57%. The proposed integrated CWT and CNN method has improved classification accuracy by 6.35% and reduced loss by 6.02%, and its performance time is also efficient. Lastly, the performance of the proposed integrated CWT and CNN method can be observed in Fig. 8.

Conclusion
In the proposed work, we demonstrated that deep learning models are advantageous for EEG classification and timely prediction of epileptic seizures to avoid the damage caused by recurrent seizure occurrences. We proposed an integrated CWT and CNN method to classify EEG data and detect seizures caused by epilepsy and brain tumors. The configurations of three existing deep learning models were tested on the Epilepsy dataset [32], and their results were compared to those of the proposed integrated CWT and CNN method.

Figure 7: VGG16 performance
The proposed CWT and CNN method generated better accuracy and loss results in a timely manner. As shown in Tab. 3, our method generated better loss and accuracy results than [39]. Specifically, the proposed integrated CWT and CNN method achieved better test accuracy than GoogleNet, VGG16, and AlexNet. In addition, the proposed integrated CWT and CNN method has better loss results than VGG16, AlexNet, and [39]. Moreover, the proposed integrated CWT and CNN method has a better learning time than GoogleNet, VGG16, AlexNet, and [39]. The proposed integrated CWT and CNN method performs better for EEG classification than other classification techniques due to its end-to-end learning ability. We combined CWT and CNN successfully to classify EEG data without losing low or high frequencies.
In the proposed integrated CWT and CNN method, we observed that, for certain portions of the dataset, the loss score increases even as the classification accuracy improves, which may require further research. We hope to improve the performance time and accuracy of the proposed integrated CWT and CNN method by adding more data, using multiple feature selection, and refining the layered architecture of the CNN. Additionally, we aim to reduce the number of false positives (see Fig. 9) when performing EEG classification on the Epilepsy dataset. This research intends to apply the proposed method in medical applications for early seizure and brain tumor detection in the future. Further study is required to refine the performance of the proposed integrated CWT and CNN method and achieve maximum classification accuracy with minimum loss score.