Hybridized Wrapper Filter Using Deep Neural Network for Intrusion Detection

Huge volumes of data in cloud computing and big-data systems are processed over networks. Data may be stored, sent, altered, and communicated over the network between a source and a destination, and before reaching the destination it may be attacked by intruders. A network contains numerous routers and devices that connect it to the internet, and intruders may attack anywhere in the network to corrupt the original data or steal secrets. Detecting such attacks has therefore become an active research topic. Many feature selection algorithms for intrusion detection have been suggested, but they lag in performance and accuracy. In this article we propose a new IDS feature selection algorithm that detects intruders with higher accuracy and performance. A hybrid wrapper-filter method that combines Pearson correlation with recursive feature elimination is used to remove unwanted features, so that the retained features capture the attacked data. A deep neural network is then used to detect intruder attacks on the data in the network. The performance of the proposed method is compared with existing solutions: traditional IDS feature selection methods such as differential evolution (DE), gain ratio (GR), symmetrical uncertainty (SU), and artificial bee colony (ABC) achieve lower accuracy than the proposed PCRFE. The experimental results show that the proposed PCRFE-CDNN achieves 99% accuracy and 98% sensitivity.


Introduction
Nowadays, computer networks and wireless networks are widely used by a variety of applications and are exposed to a myriad of security threats and attacks. The security challenges that must be solved originate from the open nature, flexibility, and mobility of the wireless communication medium [1,2]. To secure these networks, various preventive and protective mechanisms such as intrusion detection systems (IDS) have been developed [3]. Primarily, IDSs are classified as host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) [4]. Both HIDS and NIDS can be further categorized into signature-based, anomaly-based, and hybrid IDSs [5,6]. An anomaly-based IDS models the network under normal conditions and flags any deviation as an intrusion. A signature-based IDS relies on a predefined database of known intrusions to pinpoint an intrusion; this database is updated manually by system administrators. In terms of performance, an IDS is considered effective when it simultaneously achieves a low false alarm rate and high classification accuracy [7]; therefore, decreasing the false alarm rate while increasing detection accuracy is one of the crucial tasks in designing an IDS. In this paper, the terms wireless intrusion detection system (WIDS) and intrusion detection system (IDS) are used interchangeably. To obtain better network security, various studies have been conducted on IDS, including early approaches such as bagged boosting with C5 decision trees [8] and Kernel Miner [9]. The papers [10,11] applied machine learning techniques such as the support vector machine (SVM) to IDS.
The associate editor coordinating the review of this manuscript and approving it for publication was Shagufta Henna.
Various ML techniques such as artificial neural networks (ANN), SVM, and multivariate adaptive regression splines (MARS) [12][13][14] have been used in IDS to distinguish normal traffic from attacks.
Because current networks process large amounts of traffic, security remains an issue for IDS [15]. Such huge amounts of data lead to high computational complexity in the classification process. These large datasets may also contain noise, redundant features, and irrelevant features, which make classification harder, and processing a large volume of data with all of its features degrades classification accuracy. To address these issues, this paper proposes a hybrid feature selection method called Pearson correlation based recursive feature elimination, which selects the relevant features and thereby increases classification accuracy. To process the large volume of data, well-known deep learning (DL) concepts are used to classify the data. DL has been applied in various fields such as language identification, image processing, and pharmaceutical research [16][17][18]. Building on this, a DL technique called the convolutional deep neural network (CDNN) is applied for classification in this work. The contributions of this paper are as follows. This work proposes a hybrid wrapper feature selection method called Pearson Correlation based Recursive Feature Elimination (PCRFE) to remove redundant and irrelevant features from the dataset. It evaluates the correlation between features and generates a subset of relevant features using the recursive feature elimination technique. Because it works on a subset of features, the proposed PCRFE-FS technique improves the detection rate and classification accuracy with low computational complexity. A convolutional deep neural network (CDNN), a deep learning technique, is used as the classifier for IDS. The efficiency of the proposed technique is evaluated using standard performance metrics.
The experimental results are compared with previous IDS algorithms, both in terms of feature selection and as complete IDS systems. The evaluation uses the NSL-KDD dataset.
The rest of the paper is organized as follows: Section 2 reviews the literature related to IDS; Section 3 introduces the proposed feature selection and classification algorithm, PCRFE-CDNN-IDS; Section 4 presents the experimental results and a comparative analysis; and Section 5 concludes the paper and outlines future work.

Related Work
This section reviews the literature related to IDS. To detect anomalies in network traffic, one study analyzed the NSL-KDD IDS dataset. It also analyzed the protocols associated with the attacks that intruders use to generate malicious network traffic, using classification algorithms and the WEKA tool. A least squares support vector machine based IDS called LSSVM-IDS was proposed for optimal feature selection and evaluated on the KDD Cup 99, NSL-KDD, and Kyoto 2006+ datasets. Various feature selection algorithms such as information gain, PCA, correlation feature selection (CFS), genetic algorithm, artificial bee colony (ABC), and PSO have also been analyzed to improve network IDS.
That analysis concluded that ABC-NIDS performs better than the other algorithms. Deep belief network based dimensionality reduction was proposed in [19], using SVM as the classifier and the NSL-KDD dataset for analysis. Bi-layer behavioral-based feature selection was proposed in [20]. It consists of two layers: the first uses information gain to rank the features based on the global maxima and reduces the feature set from 41 to 34 features, and the second reduces the selected features from 34 to 20 by again finding the global maximum. It was evaluated on the NSL-KDD dataset. A CNN-based IDS was proposed in [21]. To balance the network traffic before training the CNN, the synthetic minority oversampling technique with edited nearest neighbours (SMOTE-ENN) was applied to the NSL-KDD dataset; this SMOTE-ENN based CNN achieves 83.31% accuracy.
An IDS using deep learning with feed-forward deep neural networks (FFDNN), combined with a filter-based feature selection method, was proposed in [22] and evaluated on the NSL-KDD dataset. Feature selection based on ant colony optimization with two-level pheromones was applied to KDD Cup 99 for IDS in [23]. A wrapper-based feature selection method using a genetic algorithm (GA) was applied to IDS in [24], with logistic regression used to evaluate the algorithm.
The study in [25] analyzed various IDSs on benchmark datasets to understand the different attacks and the relevant issues and problems of IDS. It also evaluated IDS performance with machine learning classification algorithms and recommended feature selection and classification algorithms for IDS, suggesting that autoencoders and recurrent neural networks from deep learning perform better, as does the combination of SVM with RBMs. A feature selection algorithm combining an autoencoder with damped incremental statistics was proposed in [26], with HELAD used as a classifier combined with LSTM; it was evaluated on the MAWILab dataset.
Principal component analysis and an autoencoder were used for feature selection in [27], with a CNN as the classifier on the KDD Cup 99 dataset, achieving 94% accuracy and a 93% detection rate. The study in [28] evaluated ten ML algorithms on the NSL-KDD dataset for IDS in order to choose the best classifier based on performance metrics. Convolutional neural network based classification was proposed in [29], which achieves high accuracy and a low false alarm rate (FAR). Deep stacked autoencoder based feature extraction was proposed in [30], with softmax used for classification, achieving 98.6% accuracy on the NSL-KDD dataset and 92.4% on the UNSW-NB15 dataset. The study in [31] proposed an information gain based feature selection approach on the NSL-KDD dataset that finds the best feature set for each attack based on a threshold; the Random Forest and PART classifiers achieved high precision and accuracy. Based on the reviewed literature, an IDS with better performance is still needed, and building on the reviewed techniques, we propose optimized feature selection with deep learning for IDS.

Proposed Pearson Correlation Recursive Feature Elimination Methodology
Deep learning is a class of machine learning techniques based on artificial neural networks, which are inspired by the way the human brain thinks, whereas conventional machine learning uses simpler predictive models. Deep learning requires large datasets and is not well suited to small volumes of data. Because DL requires large volumes of data, training the ANN parameters takes time, and methods for training such models with improved accuracy are still being researched. Repeated and irrelevant features in the dataset cause problems in network traffic classification: they reduce classification accuracy and slow down the classification system. This work therefore combines a hybrid filter-wrapper feature selection algorithm with deep learning techniques and evaluates it on network intrusion detection. An overview of the proposed intrusion detection system is shown in Fig. 1.
Initially, the network dataset is divided into training, testing, and validation sets in the ratio 6:2:2. The raw data are preprocessed to remove missing values and redundant features. The proposed Pearson correlation based recursive feature elimination filter is then applied for feature selection, and feature subsets are selected until the stopping condition is met. The selected features are then used to train a deep neural network and classify the data. The proposed PCRFE based DNN classification approach is evaluated on an intrusion detection dataset to demonstrate its efficiency.
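The 6:2:2 split described above can be sketched in NumPy as follows. The random shuffle, the fixed seed, and the helper name `split_622` are assumptions for illustration; the text does not say whether the split is shuffled or stratified. Calling scikit-learn's `train_test_split` twice would be an equivalent alternative.

```python
import numpy as np


def split_622(X, y, seed=0):
    """Shuffle the rows, then split them 60/20/20 into training, testing and validation sets.

    Hypothetical helper: shuffling and the fixed seed are assumptions.
    """
    n = len(X)
    idx = np.random.default_rng(seed).permutation(n)
    n_train = n * 6 // 10          # integer arithmetic avoids float rounding surprises
    n_test = n * 2 // 10
    train, test, val = np.split(idx, [n_train, n_train + n_test])
    return (X[train], y[train]), (X[test], y[test]), (X[val], y[val])


# Toy data: 50 records with 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2
train, test, val = split_622(X, y)
print(len(train[0]), len(test[0]), len(val[0]))  # 30 10 10
```

Because the split is done on row indices, features and labels stay aligned, and every record lands in exactly one of the three sets.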

Network Data Preprocessing
Preprocessing is an important step before classification. The raw data combine numeric and non-numeric values, but the deep neural network can process only numeric data, so all non-numeric symbols are transformed into numeric values using scikit-learn in Python. Normalization then scales the data values into the range [0, 1]: for each feature, the minimum value is subtracted and the result is divided by the range (maximum - minimum), as in Eq. (1):
$$X_i' = \frac{X_i - \min(X_i)}{\max(X_i) - \min(X_i)} \qquad (1)$$
where X_i is the ith feature, i = 1, …, n, and n is the total number of features.
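A minimal NumPy sketch of the two preprocessing steps described above: integer-coding a symbolic field (for example the protocol type) and the min-max scaling of Eq. (1). The function names and the handling of constant features are assumptions; scikit-learn's `LabelEncoder` or `OrdinalEncoder` and `MinMaxScaler` provide comparable transformations, and the exact encoder the authors used is not stated.

```python
import numpy as np


def encode_categorical(values):
    """Map each distinct string to an integer code; codes follow the sorted order of the categories."""
    classes, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes, classes


def min_max_normalize(X):
    """Eq. (1): subtract each feature's minimum and divide by its range (maximum - minimum)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    # Assumption: a constant feature has zero range; use 1 so it maps to 0 instead of NaN.
    x_range[x_range == 0] = 1.0
    return (X - x_min) / x_range


# Toy records: a symbolic protocol column and a numeric byte-count column.
protocol = ["tcp", "udp", "icmp", "tcp"]
codes, classes = encode_categorical(protocol)
print(classes.tolist(), codes.tolist())  # ['icmp', 'tcp', 'udp'] [1, 2, 0, 1]

X = np.column_stack([codes, [0, 500, 1000, 250]])
X_norm = min_max_normalize(X)
```

In a real pipeline the minimum and range would be computed on the training split only and reused for the testing and validation splits.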

PCRFE Based Feature Selection
The normalized data are then passed to the feature engineering stage, a filter-based feature selection step. In this work, a hybrid FS approach called Pearson correlation based RFE is used. The Pearson correlation coefficient measures the linear relationship between variables and ranges over [−1, 1]: 1 means perfect positive correlation, 0 means no correlation, and −1 means perfect negative correlation. This method removes features from the machine learning model at once rather than one at a time at each step, which makes it faster than wrapper-based and embedded feature selection methods. The method also uses a threshold value to rank the features, and this threshold controls how many features are removed, so choosing its value is the next important decision. The threshold is chosen based on the corresponding value of the correlation coefficient and is also tuned by testing multiple hypotheses.
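As a worked illustration of the [−1, 1] range described above, the Pearson coefficient (Eq. (2) in the algorithm) can be computed directly in NumPy. The function name and the convention of returning 0 for a constant input are assumptions; `numpy.corrcoef` gives the same value for non-constant inputs.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors, as in Eq. (2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    # Assumption: a constant vector has no defined correlation; report 0 instead of NaN.
    return float((dx * dy).sum() / denom) if denom > 0 else 0.0


x = [1, 2, 3, 4]
print(pearson(x, [2, 4, 6, 8]))    # 1.0  (perfect positive correlation)
print(pearson(x, [8, 6, 4, 2]))    # -1.0 (perfect negative correlation)
print(pearson(x, [1, -1, -1, 1]))  # 0.0  (no linear correlation)
```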

Algorithm:
Input: normalized feature set
Output: selected feature subset
Step 1: for all features i = 1, …, n
Step 2: compute the correlation coefficient of the feature using Eq. (2):
$$PC_{x_i y_i} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (2)$$
Step 3: eliminate features using the PCRFE Eq. (3)
Step 4: if PCRFE(x_i) ≥ threshold then
Step 5: add the feature to the subset
Step 6: end if
Step 7: repeat Steps 2 to 6 until all features have been evaluated
Step 8: end for
The normalized and selected features are then fed into the convolutional deep neural network (CDNN) for classification.
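A minimal NumPy sketch of the single pass over features in Steps 1 to 8. Eq. (3) is not reproduced in the text, so this sketch assumes that each feature is scored by the absolute Pearson correlation between that feature and the class label; the scoring rule, the skipping of constant features, and the name `pcrfe_select` are all assumptions.

```python
import numpy as np


def pcrfe_select(X, y, threshold):
    """Single pass over the features mirroring Steps 1-8 of the algorithm.

    Assumption: Eq. (3) is not given, so each feature is scored by the absolute
    Pearson correlation (Eq. (2)) between that feature and the class label y.
    Features scoring at least `threshold` are kept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    subset = []
    for i in range(X.shape[1]):                     # Step 1: for all features
        feature = X[:, i]
        if feature.std() == 0:                      # a constant feature cannot correlate with y
            continue
        score = abs(np.corrcoef(feature, y)[0, 1])  # Step 2: Eq. (2)
        if score >= threshold:                      # Step 4: compare with the threshold
            subset.append(i)                        # Step 5: add the feature to the subset
    return subset                                   # Steps 7-8: every feature has been visited


# Toy data: 4 records, 4 features, binary label (0 = normal, 1 = attack).
y = np.array([0, 0, 1, 1])
X = np.array([
    [0, 1, 5, 1.0],
    [0, 0, 5, 1.0],
    [1, 1, 5, 0.0],
    [1, 0, 5, 0.1],
])
print(pcrfe_select(X, y, threshold=0.5))  # [0, 3]
```

Feature 3 is kept even though it is negatively correlated with the label, because the sketch scores the magnitude of the correlation. With this scoring rule, raising the threshold keeps fewer features.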

Convolutional Deep Neural Network
In this work, the CDNN is used as the deep learning method for IDS to classify normal and abnormal data in the network traffic. CNNs usually take images as input, represented as 2D arrays for greyscale images and 3D arrays for color images. The proposed work is evaluated on the NSL-KDD dataset: from the 121 features of the dataset, the selected features are transformed into an 11 × 11 array. The proposed DNN involves five types of layers: convolutional, pooling, input, hidden, and output layers, the last three of which are fully connected. The convolution and pooling layers apply activation functions. Data flow from the input layer through the hidden layers to the class (output) layer. The network uses the sigmoid activation function for binary classification and the softmax activation function for multi-class classification. The proposed CDNN-IDS is shown in Fig. 2. The 11 × 11 input array is given to the input layer. The convolution layer contains multiple kernels, each with associated weights and biases. Convolution is performed using Eq. (4).
where k = p × q is the kernel size, w_i is the weight, and v_i is the luminance value of the image at position (x_i, y_i). After convolution, the dimension is reduced in a 2 × 2 pooling stage. Between the input and hidden layers, the bias value b is added, and h denotes the activation function.
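The reshaping of the selected features into an 11 × 11 array, the convolution of Eq. (4) (read here as a weighted sum of the kernel weights w_i and input values v_i plus the bias b), and the 2 × 2 pooling stage can be sketched in NumPy. Zero-padding to 121 values, the 3 × 3 kernel, max (rather than average) pooling, and all function names are assumptions made for illustration. As in most deep learning libraries, the convolution is implemented as cross-correlation, without flipping the kernel.

```python
import numpy as np


def to_grid(features, side=11):
    """Zero-pad a 1-D feature vector to side*side values and reshape it into a square grid."""
    v = np.asarray(features, dtype=float).ravel()
    if v.size > side * side:
        raise ValueError(f"{v.size} values do not fit in a {side}x{side} grid")
    padded = np.zeros(side * side)
    padded[: v.size] = v
    return padded.reshape(side, side)


def conv2d_valid(image, kernel, bias=0.0):
    """Slide a p x q kernel over the image; each output is sum(w_i * v_i) + b (no padding)."""
    p, q = kernel.shape
    rows, cols = image.shape[0] - p + 1, image.shape[1] - q + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = np.sum(image[r:r + p, c:c + q] * kernel) + bias
    return out


def max_pool_2x2(x):
    """Non-overlapping 2 x 2 max pooling; an odd trailing row or column is dropped."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


grid = to_grid(np.linspace(0.0, 1.0, 121))             # 121 features -> 11 x 11
feature_map = conv2d_valid(grid, np.ones((3, 3)) / 9)  # 3 x 3 averaging kernel -> 9 x 9
pooled = max_pool_2x2(feature_map)                     # 2 x 2 pooling -> 4 x 4
print(grid.shape, feature_map.shape, pooled.shape)     # (11, 11) (9, 9) (4, 4)
```

A Keras model would receive a batch of such grids with an extra channel axis, i.e. an array of shape (records, 11, 11, 1).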
For binary classification, the sigmoid function is used as the activation function; for multi-class classification, the softmax activation function in Eq. (5) is used. Four hidden layers are used in this work.
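Eq. (5) is not reproduced in the text. For reference, the standard definitions of the two output activations mentioned above are sketched in NumPy below; the function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid: maps a score to a probability in (0, 1) for binary classification."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def softmax(z):
    """Softmax over the last axis: turns class scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtracting the max avoids overflow
    return e / e.sum(axis=-1, keepdims=True)


print(sigmoid(0.0))                # 0.5
probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores, e.g. for normal / DoS / Probe
print(probs.sum())
```

Subtracting the row maximum does not change the softmax result but keeps `np.exp` from overflowing on large scores.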
The hidden layers take inputs from the input layer, apply the activation operation, and produce outputs based on the weight values. The hidden-layer computation with a non-linear function is given in Eq. (6), where h is the activation function. The loss between the actual and predicted values is calculated using Eq. (7); minimizing this loss function yields better results from the deep neural network.
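Eqs. (6) and (7) are not reproduced in the text. As a hedged illustration, the sketch below assumes common forms: a hidden layer computes h(Wx + b), and the loss is binary cross-entropy, a usual choice with a sigmoid output. ReLU as h, the weights, and the function names are hypothetical.

```python
import numpy as np


def dense(x, W, b, h):
    """One fully connected layer: apply the activation h to W x + b."""
    return h(W @ x + b)


def relu(z):
    """Assumed hidden-layer activation; the text only calls it h."""
    return np.maximum(z, 0.0)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean loss between actual labels (0/1) and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)))


x = np.array([0.2, 0.8])                  # two input features
W = np.array([[1.0, -1.0], [0.5, 0.5]])   # hypothetical weights
b = np.array([0.0, 0.1])                  # hypothetical biases
hidden = dense(x, W, b, relu)             # W x + b is about [-0.6, 0.6]; ReLU gives [0.0, 0.6]
loss_good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and correct
loss_bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong
```

The loss is small when the predicted probabilities agree with the labels and grows sharply when they disagree, which is what training minimizes.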
The proposed network intrusion detection using filter-based deep learning is shown in Fig. 3. The input data pass through several processing stages: normalization, feature elimination, and classification using the proposed approaches.

Results and Discussions
The proposed work was evaluated as a binary classification task on the NSL-KDD dataset. The model is implemented in Python using the Keras deep learning library. The CDNN-IDS consists of two convolution layers, two pooling layers, and three fully connected layers (input, hidden, and output). The pool size is 2 × 2, two neurons are used in the three fully connected layers to train the model, and the dropout rate is 0.3. The proposed work is evaluated using standard performance metrics.
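Because the input is only 11 × 11, the spatial size shrinks quickly through the two convolution and two 2 × 2 pooling layers. The arithmetic below assumes 3 × 3 kernels with stride 1 and no padding; the kernel size and padding are not stated in the text, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling (any remainder is dropped)."""
    return size // pool


side = 11  # 11 x 11 input array
for stage in ("conv", "pool", "conv", "pool"):
    side = conv_out(side, kernel=3) if stage == "conv" else pool_out(side)
    print(stage, f"{side} x {side}")
# conv 9 x 9 / pool 4 x 4 / conv 2 x 2 / pool 1 x 1
```

Under these assumptions the feature maps reach 1 × 1 before the fully connected layers. With "same" padding (padding=1 for a 3 × 3 kernel) the maps would instead be 5 × 5 after the first pooling and 2 × 2 after the second, keeping a little more spatial structure.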

Data Set
To analyze the performance of the proposed PCRFE-CDNN-IDS system, the benchmark network traffic dataset NSL-KDD is used; it is widely used for testing IDSs. Its 41 features are divided into basic, content-based, and time-based attributes. The training set contains 22 attack types, and 16 attack types are considered in the testing set. The attacks are categorized as 1) denial of service (DoS) attacks, 2) probe attacks (PA), 3) remote to local (R2L) attacks, and 4) user to root (U2R) attacks. Tab. 1 describes the IDS attacks in detail, together with the training and testing data for the binary class setting.
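The four-class grouping above follows the standard KDD Cup 99 categorization of attack names, which NSL-KDD inherits. The sketch below maps the 22 attack types of the training set to their categories and derives the binary label used in this work (0 = normal, 1 = attack). Reporting attack names that appear only in the test set as "unknown" is an assumption of this sketch, as are the function names.

```python
# Category of each of the 22 attack types that appear in the NSL-KDD training set.
ATTACK_CATEGORY = {
    **dict.fromkeys(["back", "land", "neptune", "pod", "smurf", "teardrop"], "DoS"),
    **dict.fromkeys(["ipsweep", "nmap", "portsweep", "satan"], "Probe"),
    **dict.fromkeys(["ftp_write", "guess_passwd", "imap", "multihop",
                     "phf", "spy", "warezclient", "warezmaster"], "R2L"),
    **dict.fromkeys(["buffer_overflow", "loadmodule", "perl", "rootkit"], "U2R"),
}


def attack_category(label):
    """Four-class grouping; names not in the training set are reported as 'unknown'."""
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")


def binary_label(label):
    """Binary setting: 0 for normal traffic, 1 for any attack."""
    return 0 if label == "normal" else 1


records = ["normal", "neptune", "satan", "guess_passwd", "rootkit"]
print([attack_category(r) for r in records])  # ['normal', 'DoS', 'Probe', 'R2L', 'U2R']
print([binary_label(r) for r in records])     # [0, 1, 1, 1, 1]
```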

Features Selection
Among the 41 features of the NSL-KDD dataset, the proposed Pearson Correlation based Recursive Feature Elimination recursively eliminates irrelevant features from the feature set and adds the selected features to the feature subset. This filter-based feature selection selects 4 relevant attributes for further processing. To evaluate the proposed feature selection algorithm, the numbers of features selected by different FS algorithms are compared with the proposed approach in Tab. 2. The names of the features selected by the proposed model are listed in Tab. 3.

Tab. 1: NSL-KDD attack classes
Attack class | Description | Training | Testing
Probe | Obtains detailed specifications of the network configuration; intruders try to collect information about the target machine. Violates system confidentiality and integrity. | 11656 | 2422
R2L | Illegal access: intruders generate traffic flows to gain unauthorized access. Violates system integrity. | | 2887
U2R | Obtains root access to the PC. Also violates system integrity. | |

Evaluation Using Performance Metrics
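The proposed PCRFE-CDNN-IDS system is compared with existing approaches using the performance metrics accuracy, false positive rate (FPR), false negative rate (FNR), sensitivity/true positive rate (TPR), specificity/true negative rate (TNR), and recall/attack detection rate (ADR) [3]. In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the evaluation metrics are defined as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad FPR = \frac{FP}{FP + TN}, \quad FNR = \frac{FN}{FN + TP},$$
$$TPR = \text{Recall} = \frac{TP}{TP + FN}, \quad TNR = \frac{TN}{TN + FP}$$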
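As a quick illustration of these metrics, they can be computed from confusion-matrix counts with attack as the positive class. The counts below are hypothetical and are not results from this paper; the function name is illustrative.

```python
def ids_metrics(tp, tn, fp, fn):
    """Detection metrics from confusion-matrix counts (attack = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fpr": fp / (fp + tn),   # false positive rate (false alarms)
        "fnr": fn / (fn + tp),   # false negative rate (missed attacks)
        "tpr": tp / (tp + fn),   # sensitivity = recall = attack detection rate
        "tnr": tn / (tn + fp),   # specificity
    }


# Hypothetical counts for 100 attack and 100 normal test records.
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
print(m["accuracy"], m["fpr"], m["tpr"])  # 0.925 0.05 0.9
```

FPR and TNR always sum to 1, as do FNR and TPR, so lowering the false alarm rate is the same as raising specificity.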

Proposed System Evaluation in Terms of Feature Selection
The proposed work is evaluated with the full feature set and with the features selected by the proposed PCRFE feature selection. The results are shown in Tab. 4.
As the table shows, an accuracy of 99% is obtained with the reduced feature set. The proposed CDNN-IDS, evaluated first with all features, achieves 91% accuracy. With the proposed PCRFE feature selection scheme, classification accuracy improves by 8 percentage points, reaching 99% on the IDS data with the proposed feature selection and deep learning model. This evaluation is illustrated in Fig. 4.
The feature selection performance of the proposed work is compared with existing IDS feature selection methods: discrete differential evolution [4], gain ratio [5], symmetrical uncertainty [6], and ABC [3]. The experimental results are shown in Tab. 5 and illustrated in Fig. 5. The proposed Pearson correlation based recursive feature elimination reduces the feature set to 6 features, the most relevant ones for classification, and obtains a higher accuracy of 99% than the existing IDS feature selection schemes. Hence, the proposed feature selection reduces the feature set while selecting relevant features that maintain the accuracy of the IDS.

Performance Comparison of Proposed with Existing IDS Systems
To demonstrate the effectiveness of deep neural network based IDSs, the proposed convolutional deep neural network based IDS is compared with existing IDSs, namely DMNB [7], DBN-SVM [19], the bi-layer behavioral-based approach [20], TUIDS [32], FVBRM [33], PSOM [34], and LSSVM-IDS + FMIFS [2]. The experimental results are given in Tab. 6, and the accuracy and FPR are illustrated in Figs. 6 and 7. The results show that the proposed filter-based feature selection with deep learning IDS achieves a high classification accuracy of 99.96% with a minimum false positive rate of 0.23, i.e., higher accuracy and lower FPR than the other systems. On the NSL-KDD dataset, the proposed Pearson correlation based recursive feature elimination removes irrelevant features, which increases accuracy, and the proposed convolutional deep neural network classifies all four network attack types (DoS, Probe, R2L, and U2R) with high accuracy. These results indicate that the proposed deep learning approach performs better for intrusion detection.

Conclusion
In this paper, we proposed Pearson correlation based recursive feature elimination (PCRFE), which reduces redundancy among features using recursive feature elimination and creates a relevant subset of correlated features. The selected feature subset is then classified using a DL method, CDNN, for better intrusion detection. The evaluation was performed on the NSL-KDD dataset. The experimental results show that the proposed PCRFE-CDNN-IDS performs better at detecting intrusions in the network, and the comparative analysis with various IDSs further indicates its efficiency. In future work, the proposed IDS will be applied to multi-class classification to improve the detection rate with optimized feature selection strategies, and it will be evaluated on IDS datasets other than NSL-KDD to assess the efficiency of the proposed scheme.
Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.