Intrusion detection systems (IDS) are among the most promising approaches for securing data and networks. In recent decades, IDSs have employed a variety of classification algorithms. These classifiers, however, do not work effectively unless they are combined with additional algorithms that can tune the classifier's parameters or select the optimal subset of features for the problem. Optimizers are therefore used in tandem with classifiers to increase their stability and efficiency in detecting intrusions. Such algorithms nevertheless have a number of limitations, particularly when used to detect new types of threats. In this paper, the NSL-KDD and KDD Cup 99 datasets are used to evaluate and compare the performance of the proposed classifier model. The two IDS datasets are preprocessed, then Auto Cryptographic Denoising (ACD) is applied to remove noise from the features; the K-Means and neural network classifiers then classify the dataset using the Adam optimizer. The IDS classifier is evaluated with performance measures such as F-measure, recall, precision, detection rate, and accuracy. The neural network with the drop-out function obtained the highest classification accuracy, 91.12% on the KDD Cup 99 dataset, demonstrating the efficiency of the classifier model with drop-out. The power and limitations of the proposed methodology, which could inform future work in the IDS area, are also discussed.
The recent growth in the usage of computer systems and the Internet has resulted in major protection, confidentiality, and privacy difficulties due to the procedures involved in the electronic transformation of data. Considerable work has gone into enhancing the privacy of such systems, yet these issues persist. The aim of an IDS is to monitor a site or server and identify any sort of irregular activity inside the network Barbara et al. (2002).
Hussein et al. (2020).
Machine learning provides a strong collection of neural-network learning algorithms. A neural network is a computational framework, inspired by biological neurons, that allows computers to learn from observations. A deep neural network contains multiple hidden layers, each of which itself acts as a learned feature transformation. The DNN's basic structure consists of an input layer, a number of hidden layers, and an output layer. When preprocessed input is fed into the network, the output values are calculated sequentially through the hidden layers. K-Means classifies data under unsupervised machine learning. In this section three steps are presented: first, preprocessing; second, Auto Cryptographic Denoising with and without dropout; and third, classification.
Preprocessing converts the attribute types of the dataset. It is carried out by two processes: numericalization, which converts the dataset into a structured numeric format, and normalization, which scales feature values to a common range. Numericalization:
Features 2, 3, and 4 (i.e., Protocol_type, Service, and Flag) of NSL-KDD are non-numeric. The ACD technique accepts only a numeric matrix for denoising, so these feature values are converted to numeric form in both the test and training datasets. The one-hot encoded binary vectors of tcp, udp, and icmp are (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. Similarly, the Service feature has 70 attribute types and the Flag feature has 11, so after numericalization the 41 original features are mapped to a dimensionality of 122 features. Normalization:
Logarithmic scaling is used to bring features with very large values, such as duration, src_bytes, and dst_bytes, into the same range. Finally, min–max scaling maps every feature value to [0, 1].
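The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the category order, helper names, and the sample `src_bytes` values are assumptions chosen so the one-hot vectors match those given above.

```python
import numpy as np

# One-hot encoding for the symbolic Protocol_type feature; the category
# order is chosen so tcp -> (1, 0, 0), udp -> (0, 1, 0), icmp -> (0, 0, 1).
PROTOCOLS = ["tcp", "udp", "icmp"]

def one_hot(value, categories):
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

# Logarithmic scaling tames wide-range features such as src_bytes,
# then min-max scaling maps every column to [0, 1].
def log_scale(x):
    return np.log1p(x)  # log(1 + x), safe at x = 0

def min_max(column):
    lo, hi = column.min(), column.max()
    return (column - lo) / (hi - lo) if hi > lo else np.zeros_like(column)

src_bytes = np.array([0.0, 181.0, 5450.0, 239.0])  # illustrative values
scaled = min_max(log_scale(src_bytes))
```

Applying `one_hot` to all three symbolic features and scaling the numeric ones yields the 122-dimensional representation described above.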
Denoising generally reduces the feature pool of the dataset. For IDS datasets in a cloud environment, data security is especially important, so a cryptographic technique called Auto Cryptographic Denoising (ACD) is used to reduce the feature pool efficiently while preserving security.
Auto-encryption is a kind of feed-forward neural network in which the input and the output are the same. Its goal is to learn a compressed representation of the data with the least possible loss. It has three components: the encrypter, the code, and the decrypter. The encrypter compresses the input to produce the code, and the decrypter reconstructs the input from the code produced by the encrypter. This code layer is learned by the network to determine useful features ElAdel et al. (2017).
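The compress-and-reconstruct idea can be illustrated with a plain single-hidden-layer autoencoder trained by gradient descent on toy data. This is only a sketch: the data, layer sizes, learning rate, and iteration count are assumptions, and the cryptographic element of ACD is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for preprocessed traffic records: 64 samples, 8 features in [0, 1].
X = rng.random((64, 8))

# One hidden (code) layer of 4 units: W1 plays the encrypter (encoder)
# role, W2 the decrypter (decoder) role.
W1 = rng.normal(0.0, 0.1, (8, 4))
W2 = rng.normal(0.0, 0.1, (4, 8))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 0.5, []
for _ in range(200):
    code = sigmoid(X @ W1)            # compress 8 features into a 4-unit code
    out = sigmoid(code @ W2)          # reconstruct the input from the code
    err = out - X
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean squared reconstruction error.
    d_out = err * out * (1.0 - out)
    d_code = (d_out @ W2.T) * code * (1.0 - code)
    W2 -= lr * code.T @ d_out / len(X)
    W1 -= lr * X.T @ d_code / len(X)
```

Training drives the reconstruction error down, so the 4-unit code comes to carry the useful features of the input.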
Dropout regularizes the hidden layers of the neural network by adding noise to their activations Dahl et al. (2013).
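A minimal sketch of the drop-out function, using the standard "inverted dropout" formulation (the rate and batch shape below are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, rate, training=True):
    # Inverted dropout: zero each unit with probability `rate` and rescale
    # the survivors by 1/(1 - rate) so the expected activation is unchanged.
    if not training or rate == 0.0:
        return activations
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep

h = np.ones((4, 10))              # a batch of hidden-layer activations
h_drop = dropout(h, rate=0.5)     # surviving units become 1 / 0.5 = 2.0
```

At test time (`training=False`) the activations pass through unchanged, which is why the rescaling is done during training.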
The features of the extracted dataset are classified, and the performance measures are assessed. The neural network classification model is built with the ACD drop-out function and its performance measurements are examined, while the K-Means classifier model is tested without the drop-out function. The classifier model classifies the dataset to identify intrusions in the cloud environment.
The architecture of the first-order fuzzy model with two inputs and two rules is shown in
Assume the inputs X and Y and output Z.
Rule 1: if X is A1 and Y is B1, then f1 = p1X + q1Y + r1
Rule 2: if X is A2 and Y is B2, then f2 = p2X + q2Y + r2
Here every node i in this layer is an adaptive node with a node function,
This layer contains the nodes that represent the antecedent part of a fuzzy rule. The output of each node is given as
The weights obtained in layer 2 are normalized by the fixed nodes in this layer. The output of the normalization layer is given as
The adaptive nodes in this layer compute the product of the normalized firing strength and a first-order polynomial.
The overall output is given as,
Layer 1 has eight modifiable premise parameters, ci and σi for i = 1, 2, 3, 4; layer 4 has six modifiable consequent parameters, ri, si, and ti for i = 1, 2.
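The five layers above can be traced end to end in a small sketch of the two-rule first-order Sugeno model. The Gaussian membership functions and all parameter values below are illustrative assumptions, not the fitted parameters of the paper; the consequents follow Rules 1 and 2 (f_i = p_i·X + q_i·Y + r_i).

```python
import numpy as np

def gauss(x, c, sigma):
    # Gaussian membership function with premise parameters c (centre) and sigma.
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_two_rule(X, Y):
    # Layer 1: membership grades of X in A1, A2 and of Y in B1, B2.
    a1, a2 = gauss(X, 0.0, 1.0), gauss(X, 1.0, 1.0)
    b1, b2 = gauss(Y, 0.0, 1.0), gauss(Y, 1.0, 1.0)
    # Layer 2: firing strength of each rule (product t-norm).
    w1, w2 = a1 * b1, a2 * b2
    # Layer 3: normalised firing strengths.
    w1n, w2n = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layer 4: first-order consequents f_i = p_i*X + q_i*Y + r_i
    # (illustrative consequent parameters).
    f1 = 1.0 * X + 1.0 * Y + 0.0
    f2 = 2.0 * X + 0.5 * Y + 1.0
    # Layer 5: overall output Z as the weighted sum of the rule outputs.
    return w1n * f1 + w2n * f2

z = sugeno_two_rule(0.0, 0.0)
```

At (X, Y) = (0, 0) rule 1 fires fully (f1 = 0) while rule 2 fires partially (f2 = 1), so the output lands between the two consequent values.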
In ACD with drop-out, the drop-out is applied to the inputs of a 122-neuron layer, one neuron per feature of the numericalized dataset Shone et al. (2018).
The K-Means algorithm, widely used for internet traffic analysis, is applied without drop-out to find the intrusions in NSL-KDD.
The goal of K-Means clustering is to divide the data into k groups so that points in the same cluster are similar and points in different clusters are far apart. Similarity is determined by the distance between points, which can be measured in a variety of ways; the Euclidean distance is one of the most commonly used metrics. In two-dimensional space it is computed from the squared differences of the x and y coordinates of the two points. Each cluster is associated with a prototype: samples are assigned to the nearest prototype, and each prototype is then updated to the centroid of its current samples. Finally, the error function E is calculated.
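The assign-and-update loop can be sketched as below. The two-blob data, the initialization from one sample per blob, and the iteration count are illustrative assumptions; E is the sum of squared Euclidean distances to the assigned prototypes.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, init_idx, iters=20):
    # Prototypes (centroids) start at the chosen samples.
    centroids = X[init_idx].astype(float).copy()
    k = len(init_idx)
    for _ in range(iters):
        # Euclidean distance of every sample to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)              # assign to the nearest prototype
        for j in range(k):                     # move each prototype to its centroid
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Error function E: sum of squared distances to the assigned centroids.
    E = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, E

# Two well-separated blobs stand in for "normal" and "attack" traffic.
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centroids, E = kmeans(X, init_idx=[0, 39])
```

With well-separated groups the prototypes settle on the blob centres and E is small; on real traffic the cluster count k and the initialization matter much more.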
The network traffic records in the NSL-KDD dataset are traces of traffic observed by a real intrusion detection system, leaving only evidence of its presence. The dataset comprises 43 features per record: 41 relate to the traffic input, and the last two are the label (normal or attack) and the score (the severity of the attack) Chen et al. (2020).
Although these attacks are present in the data, their distribution is substantially skewed.
Classes | DoS | Probe | U2R | R2L
---|---|---|---|---
Sub-Classes | Apache 2 | Ipsweep | Buffer_overflow | ftp_write
The KDD Cup 99 dataset comprises around 4 GB of compressed data gathered over roughly 7 weeks of network traffic. It has 41 traffic-flow characteristics and is divided into two classes, normal and malicious. The dataset comprises several attacks, including Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L), and Probing attacks.
K-Means clustering is an unsupervised data mining technique for intrusion detection; it is used to evaluate intrusion detection without drop-out. The training and testing datasets are shown in
Dataset (number of records) | Normal | DoS | Probe | U2R | R2L
---|---|---|---|---|---
KDD Train | 13449 | 9234 | 2289 | 11 | 209
KDD Test | 2152 | 4344 | 2402 | 2885 | 67
Testing dataset | Attack type
---|---
Denial of service | Back, Neptune, Smurf, Mailbomb, Udpstorm, Worm, Land, Pod, Teardrop, Processtable, Apache2
Probe | Satan, Nmap, Mscan, Saint, Ipsweep, Portsweep
R2L | GuessPassword, Imap, Multihop, Xlock, Snmpguess, Httptunnel, Sendmail, Xsnoop
U2R | Bufferoverflow, Rootkit, Sqlattack, Ps, Loadmodule, Perl, Xterm
Training dataset | Attack type
---|---
Denial of service | Back, Land, Neptune, Pod, Smurf, Teardrop
Probe | Satan, Ipsweep, Nmap, Portsweep
R2L | GuessPassword, Ftpwrite, Imap, Phf, Multihop, Warezmaster, Warezclient, Spy
U2R | Bufferoverflow, Loadmodule, Rootkit
The KDD Cup 99 training and testing datasets are shown in
KDD Cup 99 training set | Original records | Distinct records
---|---|---
Attacks | 3,925,650 | 262,178
Normal | 972,781 | 812,814
KDD Cup 99 testing set | Original records | Distinct records
---|---|---
Attacks | 250,436 | 29,378
Normal | 60,591 | 47,911
Class of attack | Attack name
---|---
Normal | Normal
DoS | Neptune, Smurf, Pod, Teardrop, Land, Back
Probe | Ipsweep, nmap, satan, portsweep
R2L | ftp_write, guess_passwd, imap, multihop, phf, spy
U2R | Perl, buffer_overflow, rootkit, loadmodule
Cluster | % (NSL KDD) | % (KDD Cup99) | Normal (NSL KDD) | Normal (KDD Cup99) | DoS (NSL KDD) | DoS (KDD Cup99) | Probe (NSL KDD) | Probe (KDD Cup99) | R2L (NSL KDD) | R2L (KDD Cup99) | U2R (NSL KDD) | U2R (KDD Cup99)
---|---|---|---|---|---|---|---|---|---|---|---|---
Cluster 1 | 20.35 | 20.46 | 3220 | 11020 | 736 | 8454 | 1162 | 1256 | 10 | 15 | 2 | 12
Cluster 2 | 39.79 | 38.42 | 9622 | 10150 | 197 | 584 | 9 | 145 | 186 | 654 | 10 | 25
Cluster 3 | 27.89 | 27.14 | 36 | 5550 | 6900 | 8125 | 90 | 421 | 1 | 24 | 1 | 10
Cluster 4 | 11.79 | 12.32 | 551 | 3300 | 1381 | 4821 | 1029 | 12 | 8 | 9 | 1 | 9
The ACD was trained using only samples labelled "Normal", which reflect acceptable behaviour; training minimizes the error function between the network's outputs and its inputs. Every neuron uses the same weights, and because they share weight layers the neurons are considered locally connected Urban et al. (2017).
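Training on normal traffic only means attacks are detected by their high reconstruction error. A minimal sketch of that idea follows, with a rank-1 PCA reconstruction standing in for the trained ACD network's compress/reconstruct pair; the synthetic data, the subspace direction, and the 99th-percentile threshold are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# "Normal" traffic lies close to a one-dimensional subspace; attacks do not.
direction = np.array([[1.0, 2.0, 0.5]])
normal = rng.normal(0.0, 1.0, (200, 1)) @ direction
normal += rng.normal(0.0, 0.05, normal.shape)

# Learn the reconstruction from normal data only (PCA here stands in for
# the trained autoencoder).
mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
basis = Vt[:1]                                   # principal direction

def recon_error(x):
    centered = x - mean
    recon = centered @ basis.T @ basis           # project onto the subspace
    return np.linalg.norm(centered - recon, axis=-1)

# Flag records whose reconstruction error exceeds the normal 99th percentile.
tau = float(np.percentile(recon_error(normal), 99))
attack = np.array([[3.0, -1.0, 4.0]])            # a record far from the subspace
is_attack = bool(recon_error(attack)[0] > tau)
```

Because the model has only ever seen normal behaviour, anything it cannot reconstruct well is treated as anomalous, which is what lets this scheme flag previously unseen attack types.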
Single hidden layer (no. of neurons) | Loss (Epoch 1/20) | Value loss (Epoch 1/20) | Loss (Epoch 2/20) | Value loss (Epoch 2/20) | Loss (Epoch 3/20) | Value loss (Epoch 3/20)
---|---|---|---|---|---|---
32 | | | | | |
24 | 0.03 | 0.015 | 0.0112 | 0.0084 | 0.0112 | 0.0087
16 | 0.032 | 0.034 | 0.0121 | 0.0086 | 0.0100 | 0.0069
8 | 0.0335 | 0.016 | 0.0311 | 0.011 | 0.0089 | 0.0081
Average | 0.033 | 0.0181 | 0.017 | 0.013 | 0.0127 | 0.0105
The following measures are used; the efficiency of ACD-based anomaly detection is evaluated as in Aldweesh et al. (2020).
Recall (R) is defined as the number of true positive (TP) records divided by the total number of true positives and false negatives (FN).
Precision (P) is calculated by dividing the number of true positive (TP) records by the total number of true positives and false positives (FP).
The F-measure is the harmonic mean of precision and recall.
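The three definitions above can be stated directly in code; the counts in the example call are illustrative and not taken from the paper's tables.

```python
def precision_recall_f1(tp, fp, fn):
    # P = TP / (TP + FP); R = TP / (TP + FN); F = 2PR / (P + R).
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2.0 * p * r / (p + r)
    return p, r, f

# Illustrative counts: 8 true positives, 2 false positives, 2 false negatives.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
```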
The performance metrics measured with the drop-out function are shown in
Single hidden layer (no. of neurons) | Accuracy (NSL KDD) | Accuracy (KDD Cup99) | Recall (NSL KDD) | Recall (KDD Cup99) | Precision (NSL KDD) | Precision (KDD Cup99) | F-measure (NSL KDD) | F-measure (KDD Cup99)
---|---|---|---|---|---|---|---|---
32 | 89.53% | 91.21% | 0.9407 | 0.9145 | 0.8772 | 0.954 | 0.9078 | 0.9125
24 | 65% | 86% | 0.9566 | 0.9124 | 0.8736 | 0.8458 | 0.9132 | 0.9421
16 | 89.90% | 84% | 0.9661 | 0.9542 | 0.8707 | 0.8456 | 0.9159 | 0.9632
8 | 90.32% | 86% | 0.9504 | 0.9471 | 0.8812 | 0.9410 | 0.9185 | 0.9147
Single hidden layer (no. of neurons) | Normal (NSL KDD) | Normal (KDD Cup99) | DoS (NSL KDD) | DoS (KDD Cup99) | R2L (NSL KDD) | R2L (KDD Cup99) | U2R (NSL KDD) | U2R (KDD Cup99) | Probe (NSL KDD) | Probe (KDD Cup99)
---|---|---|---|---|---|---|---|---|---|---
32 | 17.39% | 21.12% | 91.98% | 93.14% | 92.62% | 92.12% | 78.31% | 75.15% | 98.97% | 98.41%
24 | 18.09% | 15.17% | 92.97% | 92.57% | 99.01% | 94.15% | 100% | 98.15% | 98.97% | 97.18%
16 | 18.95% | 19.12% | 95.80% | 90.14% | 98.96% | 96.12% | 97.50% | 98.98% | 98.91% | 99.15%
8 | 22.77% | 25.17% | 96.40% | 95.14% | 98.65% | 9.15% | 100% | 100% | 98.91% | 97.18%
 | Accuracy (NSL KDD) | Accuracy (KDD Cup99) | Recall (NSL KDD) | Recall (KDD Cup99) | Precision (NSL KDD) | Precision (KDD Cup99) | F1 (NSL KDD) | F1 (KDD Cup99)
---|---|---|---|---|---|---|---|---
With dropout | 88.36% | 91.12% | 0.9622 | 0.9489 | 0.8523 | 0.9152 | 0.9039 | 0.9425
Without dropout | 87.09% | 90.78% | 0.9283 | 0.9012 | 0.8568 | 0.8951 | 0.8911 | 0.941
 | Normal (NSL KDD) | Normal (KDD Cup99) | DoS (NSL KDD) | DoS (KDD Cup99) | R2L (NSL KDD) | R2L (KDD Cup99) | U2R (NSL KDD) | U2R (KDD Cup99) | Probe (NSL KDD) | Probe (KDD Cup99)
---|---|---|---|---|---|---|---|---|---|---
With dropout | 0.2203 | 0.3241 | 0.9372 | 0.9781 | 0.9981 | 0.9454 | 1 | 1 | 1 | 1
Without dropout | 0.2049 | 0.245 | 0.9603 | 0.9254 | 0.777 | 0.8752 | 0.8358 | 0.931 | 0.9995 | 0.9874
The confusion matrix describes the performance of the classification model for the normal and attack predicted classes; the threshold classifies values between 0 and 1. Here the confusion matrix has 7570 true positives, 2140 false negatives, 484 false positives, and 12349 true negatives at a threshold of 0.01. The threshold value can be changed according to the classification process. Thus it can be seen that the proposed work attains performance superior to the existing work.
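Accuracy follows directly from these confusion-matrix counts; computing it is a useful sanity check against the reported figures (the link to the 88.36% drop-out result on NSL-KDD is an inference from the tables above).

```python
# Counts from the confusion matrix reported above (with drop-out):
tp, fn, fp, tn = 7570, 2140, 484, 12349

# Accuracy = (TP + TN) / (TP + FN + FP + TN).
accuracy = (tp + tn) / (tp + fn + fp + tn)
# This evaluates to about 0.8836, consistent with the 88.36% reported
# for the drop-out model on NSL-KDD.
```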
This confusion matrix has 7720 true positives, 1990 false negatives, 919 false positives, and 11914 true negatives, again at a threshold of 0.01. If the threshold value changes, the predicted values change as well; the threshold is set at the point of highest sensitivity with low specificity.
IDS model | Dataset | Accuracy
---|---|---
Proposed model (with drop-out) | KDD Cup99 | 91.12
Proposed model (with drop-out) | NSL KDD | 88.36
Proposed model (without drop-out) | KDD Cup99 | 90.78
Proposed model (without drop-out) | NSL KDD | 87.09
Chen et al. (2020) | | 88.28
Shone et al. (2018) | | 85.42
In this paper, Auto Cryptographic Denoising with drop-out-based anomaly detection is proposed. The model is trained using normal traffic only. The main contribution of this paper is the use of the drop-out function to increase classification accuracy. The performance metrics and threshold values are measured and compared with and without the drop-out function on two datasets, NSL-KDD and KDD Cup 99. The classification model with the drop-out function obtains better accuracy on KDD Cup 99 than the model without it. The single hidden layer in the neural network is very effective and easy to train; this simplicity is the strength of the approach. K-Means forms four clusters, which is efficient for handling intrusions without drop-out, and the data imbalance problem is handled by the drop-out function. The single-hidden-layer denoising classifier, with and without drop-out, is evaluated by measuring the classifiers' performance metrics, addressing the data imbalance problem with a low detection rate. The neural network classification model with the Adam optimizer obtains the highest accuracy, 91.12%, on the KDD Cup 99 dataset with the drop-out function and a low error rate. In future work, multiple hidden layers can be built with various optimizers.