Computers, Materials & Continua
Adversarial Training Against Adversarial Attacks for Machine Learning-Based Intrusion Detection Systems
Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Karachi, 75600, Pakistan
*Corresponding Author: Muhammad Shahzad Haroon. Email: firstname.lastname@example.org
Received: 13 March 2022; Accepted: 26 April 2022
Abstract: Intrusion detection systems play an important role in defending networks from security breaches. End-to-end machine learning-based intrusion detection systems are being used to achieve high detection accuracy. However, adversarial attacks, which cause misclassification by introducing imperceptible perturbations into input samples, greatly degrade the performance of machine learning-based intrusion detection systems. Though such problems have been widely discussed in the image processing domain, very few studies have investigated network intrusion detection systems and proposed corresponding defences. In this paper, we attempt to fill this gap by applying adversarial attacks to standard intrusion detection datasets and then using the adversarial samples to train various machine learning algorithms (adversarial training) to test their defence performance. This is achieved by first creating adversarial samples based on the Jacobian-based Saliency Map Attack (JSMA) and the Fast Gradient Sign Attack (FGSM) using the NSLKDD, UNSW-NB15 and CICIDS17 datasets. The study then trains and tests on JSMA and FGSM based adversarial examples in seen (where the model has been trained on adversarial samples) and unseen (where the model is unaware of adversarial packets) attacks. The experiments include multiple machine learning classifiers to evaluate their performance against adversarial attacks. The performance parameters include Accuracy, F1-Score and Area under the receiver operating characteristic curve (AUC) Score.
Keywords: Intrusion detection system; adversarial attacks; adversarial training; adversarial machine learning
Machine learning models are currently being deployed in many domains. Researchers are focusing on the robustness of machine learning algorithms to maintain performance. One of the major threats to this robustness is adversarial attacks, which aim to fool a machine learning model into misclassifying its input. Models are often trained under simplifying assumptions made for ease of computation, such as feature independence and linear separability of the data, but these conveniences can open possibilities for adversarial attacks and make models vulnerable.
Adversarial attacks can be classified into two types: white-box attacks and black-box attacks. In white-box attacks, an adversary has knowledge of the trained model, training data, network architecture, hyperparameters, etc. In a black-box attack, by contrast, an adversary has no access to the training data or the trained model. The adversary thus acts as a normal user and only knows the output of the model (label or confidence score).
Security in enterprise networks remains a major concern as cyber threats increase day by day. An intrusion detection system (IDS) is considered the main defence against these cyber threats. Attackers frequently invent new techniques that can bypass the IDS. IDSs fall into two major categories: signature-based and anomaly-based. Signature-based IDSs are built from information extracted from earlier attacks, called signatures. Every time a new attack appears, its signature must be added to the system. Anomaly-based IDSs, in contrast, inspect traffic based on the behaviour of activities. Anomaly-based models are trained to classify normal and malicious traffic, which allows them to detect new attacks as well.
Researchers have used Machine Learning (ML) in anomaly-based IDS with the hope of improving intrusion detection. The limitation of ML models concerning the security of the model itself has been explored in the literature. Researchers have focused on the image processing domain and investigated it thoroughly [6–9]. Similarly, machine learning models have also been found to be vulnerable in the intrusion detection domain. There is a limited number of studies that have investigated the adversarial attack on machine learning-based intrusion detection systems like [2,5,10,11] while some papers have also studied their defence [12–14]. Some of the relevant papers related to current research are discussed in Section 3.
In this paper, the focus is on adversarial defence. Multiple datasets are used for the generation of adversarial attacks. Models trained using different ML algorithms are then compared for performance. Models are also trained on adversarial samples and their performance is then analyzed. The paper is organized as follows: Section 2 discusses the generation of adversarial attacks. Related literature is reviewed in Section 3. The experimental setup is discussed in Section 4, whereas results are discussed in Section 5. We conclude in Section 6.
2 Generation of Adversarial Attacks
Multi-layer Perceptron (MLP) is used for adversarial attack generation. It is a feed-forward neural network consisting of three fully connected layers. The input layer receives the input to be processed. The hidden layer is the computation engine where all the inputs are processed. The output layer provides the predictions and classification of the received input. An MLP is made of neurons called perceptrons. The structure of a perceptron is given in Fig. 1.
In the MLP network, each perceptron receives n features as input $(x_1, x_2, \dots, x_n)$ and each feature is associated with a weight $(w_1, w_2, \dots, w_n)$. The input features are passed on to an input function u, which computes the weighted sum of the input features as given in Eq. (1):
$$u(x) = \sum_{i=1}^{n} w_i x_i \tag{1}$$
The result of this computation is then passed on to an activation function f, which produces the output of the perceptron. For example, a step function can act as an activation function as given in Eq. (2):
$$f(u) = \begin{cases} 1, & u \ge \theta \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
where θ is the threshold parameter.
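The perceptron computation described by Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names, the example weights, and the choice to fire when u is greater than or equal to θ are assumptions, not details taken from the implementation used in this study.

```python
from typing import Sequence


def weighted_sum(x: Sequence[float], w: Sequence[float]) -> float:
    """Input function u: weighted sum of the input features (Eq. (1))."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))


def step(u: float, theta: float) -> int:
    """Step activation f (Eq. (2)); the boundary case u == theta is assumed to fire."""
    return 1 if u >= theta else 0


def perceptron(x: Sequence[float], w: Sequence[float], theta: float) -> int:
    """Output of a single perceptron: activation applied to the weighted sum."""
    return step(weighted_sum(x, w), theta)


# Hypothetical example with three features and hand-picked weights.
x = [1.0, 0.5, -2.0]
w = [0.5, 1.0, 0.25]
u = weighted_sum(x, w)  # 0.5 + 0.5 - 0.5 = 0.5
print(u, perceptron(x, w, theta=0.25), perceptron(x, w, theta=1.0))
```

With these values the perceptron fires for θ = 0.25 and stays silent for θ = 1.0, since the weighted sum is 0.5.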
2.1 Jacobian-based Saliency Map Attack
Jacobian-based Saliency Map Attack (JSMA) was proposed in 2016. Its aim is to cause misclassification while minimizing the number of features modified to generate an adversarial example. In this method, a saliency map is created for the input sample, holding a saliency value for each feature. The saliency value indicates how strongly modifying that feature influences the classification. Features are selected in decreasing order of saliency value and perturbed, and the process continues until the modified-feature threshold is reached or misclassification occurs. This process creates adversarial examples that are close to the original sample.
For the white-box attack category, JSMA is more suitable for an adversary  but requires high computational power. In , Euclidean distance is used to measure the closeness of the original sample and adversarial sample which confirms 99% similarity. Only 6% of features are modified for the generation of adversarial samples in .
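The saliency map at the core of JSMA can be illustrated as follows. This is a minimal sketch under stated assumptions: it uses the feature-increase rule from the original JSMA formulation, the Jacobian is a hard-coded hypothetical matrix (in practice it would be obtained from the trained MLP by automatic differentiation), and the class indices and function names are illustrative only.

```python
from typing import List


def saliency_map(jacobian: List[List[float]], target: int) -> List[float]:
    """Saliency of each feature for pushing the sample towards `target`.

    jacobian[c][i] is the derivative of the class-c output with respect to
    feature i. A feature is salient only if increasing it raises the target
    output while lowering the combined output of all other classes.
    """
    n_features = len(jacobian[0])
    scores = []
    for i in range(n_features):
        d_target = jacobian[target][i]
        d_others = sum(row[i] for c, row in enumerate(jacobian) if c != target)
        if d_target < 0 or d_others > 0:
            scores.append(0.0)
        else:
            scores.append(d_target * abs(d_others))
    return scores


def rank_features(scores: List[float]) -> List[int]:
    """Indices of features with non-zero saliency, most salient first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in order if scores[i] > 0]


# Hypothetical Jacobian: 2 classes (rows) x 4 features (columns).
jacobian = [
    [0.5, -0.25, 0.25, 0.0],    # class 0, assumed here to be "normal"
    [-0.5, 0.25, -0.125, 0.0],  # class 1, assumed here to be "attack"
]
scores = saliency_map(jacobian, target=0)
ranking = rank_features(scores)
print(scores, ranking)
```

An attack loop would perturb the features in the order given by `ranking`, recomputing the map after each change, until the sample is misclassified or the budget of modified features is exhausted.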
2.2 Fast Gradient Sign Attack
The Fast Gradient Sign Attack, generally known as the Fast Gradient Sign Method (FGSM), was first proposed in 2014. The FGSM attack on neural networks is formulated using gradients. During training, a neural network minimizes the loss by adjusting its weights based on back-propagated gradients. To attack the neural network, FGSM uses the same back-propagated gradients to adjust the input so that the loss is maximized.
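The FGSM based adversarial attack is formulated as given in Eq. (3):
$$x_{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right) \tag{3}$$
where $x$ is the input to the model, $y$ is the true label, $\theta$ denotes the model parameters, $\epsilon$ is the magnitude of the perturbation and $\nabla_x J(\theta, x, y)$ is the gradient of the adversarial loss with respect to the input.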
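The FGSM update of Eq. (3) can be sketched on a model small enough for the input gradient to be written by hand. The example below assumes a logistic regression classifier with binary cross-entropy loss as a stand-in for the MLP; the weights, the sample and the value of ε are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability of the positive class (assumed here to be "attack")."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def input_gradient(x, y: int, w, b: float) -> List[float]:
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this is (p - y) * w.
    """
    p = predict_proba(x, w, b)
    return [(p - y) * wi for wi in w]


def fgsm(x, y: int, w, b: float, eps: float) -> List[float]:
    """One FGSM step: move every feature by eps in the loss-increasing direction."""
    grad = input_gradient(x, y, w, b)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and sample; y = 1 marks the sample as attack traffic.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [1.0, 0.5, 0.0], 1
x_adv = fgsm(x, y, w, b, eps=0.5)
print(predict_proba(x, w, b), predict_proba(x_adv, w, b))
```

In this toy case the predicted attack probability falls from above 0.5 to below 0.5, so the perturbed sample would be classified as normal. Every feature with a non-zero gradient component is shifted by ±ε.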
FGSM was initially evaluated on image datasets such as ImageNet, MNIST and CIFAR, where a very small amount of carefully constructed noise was added to an image so that the object was misclassified. It was later used in the intrusion detection domain, as in [2,20]. FGSM is a white-box attack that modifies 100% of the features to generate adversarial samples. The authors in  also summarized FGSM as less effective but efficient with respect to computational time. Fig. 2 gives an example of how JSMA and FGSM change attack traffic so that it appears as normal traffic to the classifiers.
3 Related Work
This section reviews related work on adversarial attacks against machine learning models and on adversarial training.
The authors in  used FGSM and JSMA to generate adversarial attacks in the white-box category using the NSLKDD dataset. They made an important observation on the percentage of features modified by the attacks and preferred JSMA over FGSM. FGSM modifies 100% of the features of every sample, whereas JSMA modifies only 6%. This makes JSMA a more practical attack, because domain-related limitations restrict which features an attacker can actually modify. The authors used MLP to generate JSMA based adversarial attacks. Testing was done by using the adversarial examples against Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF) and an ensemble of these three classifiers called a voting classifier. The performance parameters included accuracy, F1-Score and AUC. It was found that the performance of all the baseline models decreased. The most affected classifier was SVM, and the most resilient was RF. The authors also presented the top 20 features contributing to the adversarial attacks. No solution or defence against the attacks was provided. Also, there were no results based on FGSM; only JSMA results were provided, on a single dataset.
The authors in  used JSMA, FGSM, Deepfool and Carlini and Wagner (C&W) attacks for the generation of adversarial examples using the NSLKDD dataset in a white box setting. The authors used an MLP with the ReLU activation function. The MLP model was tested against adversarial samples. The main contribution is to highlight the features that participate most in generating each attack. The performance parameters include accuracy, precision, recall, false alarm and F1-score. The accuracy under the JSMA attack is 52.41%, under untargeted FGSM 40.78%, under targeted FGSM 50.66%, under Deepfool 41.03% and under C&W 64.8%. The authors further conclude that C&W attacks seem to be less devastating than the other three attacks.
In , the authors proposed IDSGAN to generate adversarial attacks to defeat intrusion detection systems. The NSLKDD dataset is used in a black box scenario in which only malicious traffic is used to generate an adversarial attack. For the black box IDS training, a few algorithms are used like SVM, Naïve Bayes (NB), MLP, Logistic Regression (LR), DT, RF, k-nearest neighbors (K-NN). The author analyzes the algorithms on the detection rate and evasion increase rate which shows that IDSGAN successfully evaded the algorithms and DoS, U2R and R2L attacks are undetected by them.
The authors in  proposed DOS-WGAN, which uses a Wasserstein generative adversarial network (WGAN) with a gradient penalty method to evade the classifier. The authors use standardized Euclidean distance, which maps the adversarial samples to the original data distribution. Standardized Euclidean distance and information entropy are used to assess generative adversarial network (GAN) training. The KDDCUP 99 dataset is used. Three types of experiments are included: DoS-GAN with an accuracy of 69.7%, DoS-WGAN with clipping (WGAN-CLIP) with an accuracy of 57.1%, and DoS-WGAN with gradient penalty (WGAN-GP) with an accuracy of 47.6% and the most stable training. In the WGAN-CLIP model, the weights of the generator are limited to [−0.01, 0.01], and in the WGAN-GP model, DoS-WGAN uses a gradient penalty.
In , the authors evaluated adversarial attacks in a black box scenario on the NSLKDD dataset. Three different types of black-box attacks were launched. In the first attack, the adversary trained a substitute model with white box limitations, and the attacks on the substitute model were created using C&W attacks. The second attack is based on zeroth-order optimization (ZOO). The third attack is generated using a GAN. The following parameters are used to measure the performance of classifiers: accuracy, precision, recall, false alarm and F1-score. The first approach, which uses a substitute model, has less impact than the second and third approaches. The second approach performed best among the three black-box attacks but required a large number of queries and considerable computation power to calculate the gradients.
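The query cost of zeroth-order attacks follows from how the gradient is estimated when only model outputs are available. The sketch below assumes a symmetric difference quotient per feature, which is the usual estimator in ZOO-style attacks; the scoring function and the `CountingOracle` wrapper are hypothetical stand-ins for a black-box IDS.

```python
from typing import Callable, List, Sequence


class CountingOracle:
    """Wraps a black-box scoring function and counts how often it is queried."""

    def __init__(self, fn: Callable[[Sequence[float]], float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, x: Sequence[float]) -> float:
        self.calls += 1
        return self.fn(x)


def estimate_gradient(query: Callable[[Sequence[float]], float],
                      x: Sequence[float], h: float = 1e-4) -> List[float]:
    """Estimate the gradient of `query` at x using two queries per feature."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((query(plus) - query(minus)) / (2.0 * h))
    return grad


# Hypothetical black-box score whose true gradient at (1, 2) is (2, 3).
oracle = CountingOracle(lambda v: v[0] ** 2 + 3.0 * v[1])
grad = estimate_gradient(oracle, [1.0, 2.0])
print(grad, oracle.calls)
```

Each gradient estimate costs two queries per feature, so a single attack step on a sample with n features needs 2n queries, which is why such attacks become expensive on high-dimensional traffic records.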
The authors in  used a neural network for the implementation of the intrusion detection system. The NSLKDD dataset has been used to train the model. FGSM was used for the generation of adversarial attacks. The author showed the results with various performance parameters. The overall results deteriorated after the adversarial attack.
In , the authors used four adversarial example generation methods: Projected Gradient Descent (PGD) attack, Momentum Iterative Fast Gradient Sign Method, Limited-Memory Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS) attack, and Simultaneous Perturbation Stochastic Approximation (SPSA). For the experiment, they used the NSLKDD dataset. For the testing of adversarial attacks, multiple algorithms were selected, namely Deep Neural Network (DNN), SVM, RF and LR. The performance parameters include accuracy, precision rate, recall rate, F1-Score and success rate. All the performance parameters of all the targeted models decreased after the attack.
The authors in  chose three methods to generate adversarial attacks: particle swarm optimization (PSO), a genetic algorithm (GA), and a GAN. The adversarial attacks were tested on two datasets, NSLKDD and UNSW-NB15. Multiple baseline classifiers were tested against the adversarial attacks, including SVM, DT, NB, K-NN, RF, MLP, Gradient Boosting (GB), LR, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Bagging (BAG). All the trained classifiers demonstrated a decrease in evasion rate.
In , the authors used adversarial training methods to defend against adversarial attacks. They divided the CICIDS17 dataset into four parts for training the IDS, testing the IDS, training the adversarial detector and testing the adversarial detector. The adversarial examples are generated using four white-box attack methods: FGSM, Basic Iterative Method (BIM), C&W and PGD. Performance parameters include Precision, Recall, F1-Score and Accuracy. RF and K-NN performed the same on all the parameters, with an F1-score of 95%. The AdaBoost algorithm, with 87.66% accuracy, did not exceed the results of RF, while SVM was unsuccessful in picking up the adversarial attack, with a recall of 79%.
The authors in  demonstrated an adversarial attack and two defence methods. They simulated realistic attacks by manually modifying three features: exchanged_bytes, duration and total_packets. RF, MLP and K-NN were evaluated before and after the adversarial attack. The performance of all the models deteriorated under attack. As the first defence, the manually crafted samples were added to the training set to retrain the models, after which performance improved. Feature removal was used as the second method to defend against adversarial attacks. Compared to the non-adversarial setting, accuracy drops for each classifier: RF from 99% to 97%, and MLP and K-NN from 99% to 98%, which affects the NIDS.
In , the authors experimented on NSLKDD and CICIDS17 datasets using only the DOS attack. For feature selection, Recursive Feature Elimination with Linear SVMs technique was used which provided the highest AUC on the original dataset. The selected features were 41 for NSLKDD and 77 for CICIDS17. Four adversarial attack methods were used for each distance metric FGSM, JSMA, Deepfool and C&W. The aim was to misclassify the attack record as a normal record. The author tested adversarial examples on DT, RF, NB, SVM, Neural Network (NN) and Denoising Autoencoder (DA). Evaluation on original datasets for baseline performance shows Decision Tree and RF are among the best while NB and DA underperformed. AUC decreased by 13% on the NSL-KDD dataset while on the CICIDS17 dataset it decreased by 40%. The model was then trained on three adversarial generation methods while one was left for testing purposes. The performance of the classifiers decreased by 4% on the NSLKDD and 18% on the CICIDS17 which was much better than the previous AUC score. In these conditions, RF was the most resilient which only suffered a 0.1% of AUC decrease on both datasets.
The authors in  used GAN to generate an adversarial attack. To validate the GAN based attack, black-box IDS have been trained on the baseline machine learning classifier such as DNN, RF, LR, NB, DT, K-NN, SVM, GB. The KDDCUP dataset is used which contains four types of attack classes: Denial-of-Service (DOS), Remote to User (R2L), PROBE and User to Root (U2R). The study targeted the classification of normal and probe class. All the classifiers were affected by adversarial examples generated by adding small perturbations using GAN. To demonstrate defence, the authors used the adversarial training methods in which adversarial examples were added to the training data to ensure the model learns about the possible perturbations. The author evaluated the performance of black box IDS on the accuracy, precision, recall and F1-Score. After GAN based adversarial training, LR performed better among all the classifiers with an accuracy of 86.64%.
In , the authors used a deep neural network and static features from the DREBIN dataset. The adversarial examples were manually crafted without impacting the malware functionality. The performance parameters were false-negative rate, misclassification rate and average distortion. The authors found misclassification rates of up to 69% against the models. They also demonstrated two defence methods: adversarial training and defensive distillation. The adversarial training defence is non-adaptive and depends on the training data. Defensive distillation did not perform as well as it usually does in computer vision.
Tab. 1 summarizes the above literature review. As can be seen from Tab. 1, none of the other studies has worked on all three datasets, i.e., NSLKDD, UNSW-NB15 and CICIDS17, to provide a comparison. Also, very few have worked on defence against adversarial attacks. The contributions of this work are as follows:
• A detailed comparison of adversarial attacks on three benchmark datasets, i.e., NSLKDD, UNSW-NB15 and CICIDS17.
• As a defence against adversarial attacks, adversarial training is used by including adversarial datasets generated through FGSM and JSMA in the training process.
• Various machine learning algorithms are tested against adversarial attacks in seen (where the model is aware of adversarial samples) and unseen (where the model is unaware of adversarial samples) attacks.
4 Experimental Setup
In this study, the NSLKDD, UNSW-NB15 and CICIDS17 datasets are utilized. For NSLKDD, there are 39 types of attacks and one normal class. All the attacks have been converted into one of the four classes [‘dos’, ‘r2l’, ‘probe’ and ‘u2r’]. For UNSW-NB15, there are 9 attack types and one normal class. For CICIDS17, there are 14 attack types and one normal class. All the datasets are evaluated as multi-class classification problems. Categorical features were one-hot encoded, with 1 for the matching category and 0 for all others. StandardScaler is used to rescale the distribution of each feature so that its mean is 0 and its standard deviation is 1; that is, it subtracts the feature's mean and then divides by its standard deviation to reach unit variance.
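The two preprocessing steps can be sketched in plain Python to show what is being computed. The study itself relies on scikit-learn's StandardScaler; the helper functions below are illustrative re-implementations, and the column values are made up.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode a categorical column: 1 for the matching category, else 0."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def standardize(column: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column, mu)
    if sigma == 0:
        sigma = 1.0  # constant column: leave values centred, avoid division by zero
    return [(v - mu) / sigma for v in column]


# Hypothetical values for a categorical and a numeric traffic feature.
protocol = ["tcp", "udp", "tcp", "icmp"]
duration = [2.0, 4.0, 4.0, 6.0]

categories, encoded = one_hot(protocol)
scaled = standardize(duration)
print(categories, encoded)
print(scaled)
```

After scaling, the numeric column has zero mean and unit standard deviation, which keeps features with large raw ranges from dominating distance-based classifiers such as K-NN and SVM.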
The Sklearn library is used for classification and the Cleverhans library is used for the generation of adversarial attacks. For the generation of adversarial examples, we have used an MLP as shown in Fig. 3. With the help of the MLP, we have created an adversarial test dataset using FGSM and JSMA for this study. Initially, the MLP model is trained on the original dataset. The test set of the dataset is then applied to the MLP model together with one of the adversarial attack methods in order to generate adversarial samples. The attack methods add a small amount of perturbation to the test samples, which can cause a model to make a wrong decision.
The adversarial examples were tested against multiple machine learning classifiers like Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, Logistic Regression and Naïve Bayes.
4.1 Evaluation Parameters
To evaluate the performance of machine learning classifiers, Accuracy, F1 Score and AUC Score are used.
AUC is the area under the Receiver Operating Characteristic (ROC) curve which is drawn using the false positive rate (FPR) and true positive rate (TPR) metrics.
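The three evaluation parameters can be computed as in the following sketch. It is written for the binary case for brevity, whereas the study evaluates multi-class problems, and how the per-class scores are averaged there is not assumed here. AUC is computed through its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The labels and scores are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth, classifier scores and thresholded predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F1 depend on the decision threshold, while AUC summarizes the ranking quality over all thresholds, which is why the three are reported together.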
4.2 Types of Experiments
The experiments are divided into the following five types:
i. The baseline performance of each classifier
ii. Classifiers tested against JSMA and FGSM based adversarial attacks (without adversarial training)
iii. Classifiers trained with adversarial samples and tested with the original dataset
iv. Performance of classifiers tested against JSMA and FGSM after adversarial training with JSMA
v. Performance of classifiers tested against JSMA and FGSM after adversarial training with FGSM
For the evaluation of experiments (i) and (ii), the model in Fig. 4 is implemented. Similarly, for the evaluation of experiments (iii) and (iv), the model implemented is shown in Fig. 5. The results of these models are reported in Tab. 2 for NSLKDD, in Tab. 3 for UNSW-NB15 and in Tab. 4 for CICIDS17.
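The experiment types can also be written down as a small train/test matrix, which makes the seen and unseen cases explicit. The sketch below is one reading of the list above: the sub-labels (a) and (b) follow their use in the results discussion, the assignment of JSMA to (ii-a) and (iii-a) and of FGSM to (ii-b) and (iii-b) is an assumption, and the `Experiment` type is purely illustrative.

```python
from typing import List, NamedTuple, Optional


class Experiment(NamedTuple):
    label: str
    train_on: Optional[str]  # attack whose samples join the training set (None = clean only)
    test_on: Optional[str]   # attack used to build the test set (None = original test set)

    @property
    def setting(self) -> str:
        if self.test_on is None:
            return "clean test"
        if self.train_on is None:
            return "no adversarial training"
        return "seen" if self.train_on == self.test_on else "unseen"


EXPERIMENTS: List[Experiment] = [
    Experiment("i", None, None),
    Experiment("ii-a", None, "JSMA"),
    Experiment("ii-b", None, "FGSM"),
    Experiment("iii-a", "JSMA", None),
    Experiment("iii-b", "FGSM", None),
    Experiment("iv-a", "JSMA", "JSMA"),
    Experiment("iv-b", "JSMA", "FGSM"),
    Experiment("v-a", "FGSM", "FGSM"),
    Experiment("v-b", "FGSM", "JSMA"),
]

settings = {e.label: e.setting for e in EXPERIMENTS}
for e in EXPERIMENTS:
    print(f"({e.label}) train: {e.train_on or 'original'}, "
          f"test: {e.test_on or 'original'} -> {e.setting}")
```

Each row would be run once per classifier and per dataset, so the same matrix applies to NSLKDD, UNSW-NB15 and CICIDS17.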
5 Experiment Results
Tabs. 2–4 summarize the complete results for NSLKDD, UNSW-NB15, CICIDS17 respectively. The top and the lowest performer among classifiers for each category are in bold fonts. To explain the salient points of obtained results, a single row of NSLKDD (results of K-NN) is shown in Fig. 6.
The baseline performance obtained in (i) drops when the classifiers, trained on the original datasets without adversarial training, are tested against JSMA and FGSM based adversarial attacks in (ii-a) and (ii-b). The classifiers trained with adversarial samples and tested with the original dataset in (iii-a) and (iii-b) show a trend similar to that observed in (i). Performance of classifiers after adversarial training in (iv-a) and (v-a) shows improved results for the seen attack, whereas in (iv-b) and (v-b) a drop in accuracy can be observed for the unseen attack even after adversarial training.
Tabs. 2–4 show the performance of baseline classifiers in column Original Dataset (i) for NSLKDD, UNSW-NB15, CICIDS17 respectively. The results obtained for baseline classifiers for each dataset are similar to those that have been obtained in other literature. These performance parameters are then used to compare with adversarial attack results.
In experiment (ii), adversarial attacks with either JSMA or FGSM, without any adversarial training, cause an average drop in accuracy of around 25% to 30% across all the classifiers and datasets. K-NN shows better accuracy, F1-Score and AUC than the other classifiers when tested against FGSM. Similarly, Random Forest performs better on the CICIDS17 dataset when tested against JSMA.
Experiment types (iv-a) and (iv-b) are those in which models are trained on JSMA and tested on the JSMA and FGSM attacks respectively. The results in this type of experiment are better than in type (ii) for all the classifiers, as these classifiers are trained on the adversarial examples on which they have been tested. In experiment (iv-a), the Decision Tree shows better results for all the datasets. Similarly, in experiment (iv-b), K-NN performs better against FGSM than the other classifiers trained on JSMA based adversarial examples, for all datasets. In experiment (v-b), where classifiers are trained on FGSM and tested against JSMA based adversarial examples, Logistic Regression performs well in accuracy for the NSLKDD and UNSW-NB15 datasets. For CICIDS17, Random Forest gives the best accuracy among all classifiers. Considering accuracy across all the datasets, Naïve Bayes performs worst among all classifiers, with the exception of a few results.
Analyzing the AUC Score for experiment types (iv-a) and (v-a), the observed results lie above 90% except for the CICIDS17 dataset in type (iv-a). Another notable aspect of experiment type (iv-b) for NSLKDD is that the accuracy results are poor, and the F1 and AUC scores reflect a similar pattern with low values. In contrast, the results of (v-a) are good for each classifier and are also supported by the F1 and AUC scores. Such consistent patterns across the performance parameters included in our study validate our work.
6 Conclusion
We have tested FGSM and JSMA based adversarial examples against multiple classifiers. The experiments have been conducted in five different scenarios. Initially, classifiers were tested on clean data to provide a baseline for comparison with the other experiments. The classifiers were then tested against adversarial examples with and without adversarial training. The behaviour of the classifiers on multiple datasets has been observed using the performance parameters. The performance of NB is observed to be the worst overall, whereas K-NN performs better on the NSLKDD and UNSW-NB15 datasets. For CICIDS17, the Random Forest classifier gives better results.
Future work includes adversarial training of the classifiers with multiple adversarial datasets to increase their robustness. An ensemble of classifiers can also be created to increase overall performance against adversarial attacks.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.