The Internet of Things (IoT) integrates billions of self-organized, heterogeneous smart nodes that communicate with each other without human intervention. In recent years, IoT-based systems have improved the user experience in many applications, including healthcare, agriculture, supply chain, education, transportation and traffic monitoring, and utility services. However, node heterogeneity raises security concerns, which are among the most complicated issues in the IoT, and conventional security measures such as encryption, access control, and authentication are insufficient on their own to secure IoT devices. In this paper, we identify various types of IoT threats and discuss intrusion detection systems (IDS) for the IoT environment based on shallow machine learning (decision tree (DT), random forest (RF), support vector machine (SVM)) as well as deep machine learning (deep neural network (DNN), deep belief network (DBN), long short-term memory (LSTM), stacked LSTM, and bidirectional LSTM (Bi-LSTM)). The performance of these models has been evaluated using five benchmark datasets: NSL-KDD, IoTDevNet, DS2OS, IoTID20, and the IoT Botnet dataset. Performance metrics such as Accuracy, Precision, Recall, and F1-score were used to evaluate the shallow/deep machine learning based IDS. It has been found that deep machine learning IDS outperform shallow machine learning IDS in detecting IoT attacks.
The Internet of Things (IoT) is growing exponentially and playing a vital role in our everyday life. IoT nodes can use Internet Protocol addresses to connect to the Internet. These self-configured smart nodes drive many cutting-edge applications such as process automation, home automation, smart cars, decision analytics, smart grids, healthcare systems, educational development, industrial development, and so on [
In IoT systems, heterogeneous nodes are connected through a complex network architecture and pose security concerns. The key challenge is to ensure security in resource-constrained IoT nodes [
In this research, data-analysis-based techniques have been used because they work faster than alternatives and generalize better to the unspecified cases raised by unknown attacks. The essential objective of the framework is to build an intelligent, secured, and trusted IoT-based system that can identify its own vulnerabilities, provide a protective firewall against cyberattacks, and recover itself accordingly. Thus, a learning-based methodology is proposed here that can recognize anomalous conditions and protect the infrastructure's security. Despite the prevalence of conventional shallow ML strategies for classification problems, they still have numerous shortcomings that need to be addressed, for example, incomplete feature representation, problem complexity, and static classification limitations. Deep learning (DL), with its hierarchical architectures, is a class of ML techniques comprising numerous layers of data processing for classification and pattern recognition. By overcoming shortcomings of conventional machine learning techniques, it has made a significant mark on much recent research. Owing to its success and stability, DL has been effectively utilized in a wide scope of applications, for example, natural language processing, computer vision, and cybersecurity systems. For this task, three shallow ML classifiers and five DL models have been exploited. Another vital contribution of this paper is its comparison of simple models such as DT and RF with complex networks such as the deep belief network (DBN), long short-term memory (LSTM), and bidirectional long short-term memory (Bi-LSTM) for anomaly detection.
The main contributions of this research can be summarized as follows:
A comprehensive workflow is proposed to identify the optimal attack detection model in IoT systems. In addition to the two most commonly used network attack datasets (NSL-KDD and DS2OS), three new datasets (IoTDevNet, IoTID20, and IoT Botnet) have been used. The performance of several shallow and deep machine learning algorithms has been evaluated on these five datasets through extensive experimentation.
The rest of the paper is organized as follows: Section 2 describes the state of the art of this field and IoT threats; Section 3 presents the proposed methodology, detailed dataset descriptions, and a summary of the learning models; Section 4 explains the experimental setup, performance analysis, and a comparative study with other existing works. Finally, Section 5 presents concluding remarks and future scope.
This section reviews research on various ML algorithms and classifier-integrated IDSs for detecting intrusions in IoT networks. Roy et al. [
A novel intrusion detection system proposed by Xu et al. [
A hybrid sampling-based intrusion detection by Jiang et al. [
Hasan et al. [
Latif et al. [
The surveyed studies on IoT security are summarized by dataset, model, and best reported accuracy in
Ref. | Method | Dataset | Contribution | Limitation | |
---|---|---|---|---|---|
[ |
Bi-LSTM | UNSW-NB15 | IDS classifier detected normal or attack types with 95% accuracy. | It failed to identify various types of attack. Besides, parameters were not optimized. | |
[ |
CNN | System Call Graph | It used CNN based model to detect Botnet using system call graphs with an accuracy of 97% and an F-measure of 98.33%. | No experiment was done for other malicious lines on IoT devices. | |
[ |
RNN | NSL-KDD | Fog computing-based IDS having multi-layered deep RNN with higher detection rate (DoS: 98.27%, Probe: 97.35%, U2R: 64.93%, R2L: 77.25%). | It explored a single dataset only without explaining the hyper-parameters tuning. | |
[ |
BGRU+MLP, GRU+MLP, BLSTM+MLP, LSTM+MLP, GRU, LSTM, MLP | KDD99, NSL-KDD | Four types of attacks, namely DoS, Probe, U2R, and R2L, were detected using multiple models with high accuracy, where BGRU+MLP performed best, achieving 99.24% accuracy. | More generic attacks were detected exploring a single dataset only. | |
[ |
LSTM, GRU |
NSL-KDD | Three types of RNN and BLS models were applied to detect intrusions where BLS outperforms with 84.14% accuracy and 84.68% F-measures. | A single simple network dataset was considered only. No hyper-parameter tuning was done. | |
[ |
DF, DJ, DNN, DBN, LSTM, GRU | NSL-KDD, KDD99, CICIDS | An extensive empirical study on NIDS as multiclass classification using four DL models and two shallow ML models with multiple evaluation metrics, where DBN outperformed the rest with an accuracy of 96.9%. | No IoT attack datasets were studied. | |
[ |
J48, SVM, NB, NB Tree, MLP, RF, RF Tree, ANN and RNN-IDS | NSL-KDD | NIDS using the feed-forward nature of Random Neural Networks (RNN-IDS) was proposed and compared with multiple ML algorithms where RNN-IDS accuracy reached up to 95.2%. | Only ML models and a single dataset are considered for performance comparison. No hyper-parameter tuning. | |
[ |
RF, CNN, |
NSL-KDD, UNSW-15 | Hybrid sampling and a deep hierarchical network constructed on CNN and BiLSTM were proposed with an accuracy of 83.58%. | No performance comparison with related studies. Multiple models were tested with two similar types of datasets. | |
[ |
Bi-RNN, RNN, GRNN | 10% KDD | IDS was designed and evaluated using the Bi-RNN model which outperformed RNN and GRNN with a 99.04% detection rate. | Only 10% of the full KDD dataset was used, failed to analyze the performance using a large amount of data, and to classify multi-class attack data. | |
[ |
LR, SVM, DT, RF, ANN | DS2OS | Data-analysis based IDS model was explored where RF model achieved higher accuracy (99.4%) than other ML models, which could detect multi-class attacks more accurately. | With a single dataset, only ML models were implemented. The efficiency of the RF model for a large volume of data was not examined. | |
[ |
TCN, |
DS2OS | A semi-supervised HS-TCN model outperformed the other two models with 98.15% accuracy. Besides, a balanced version of the dataset has also been evaluated along with the original one. | Lack of efficiency optimization of semi-supervised models and failed to identify multi-class attack types. | |
[ |
LR, ANN | DS2OS | The dataset was experimented with a data-analysis-based technique in a two-fold way, using LR and ANN models. LR and ANN detected attacks with accuracies of 99.4% and 99.99%, respectively. | Unclear analysis of the achieved results and no parameter optimization. | |
[ |
RaNN | DS2OS | Achieved 99.20% attack detection accuracy with less prediction time using an advanced and lightweight scheme of ANN, the RaNN model. | Only a single dataset was used. No statistical comparison of attack data for experimented SVM and DT models. | |
[ |
DNN | DS2OS | Seven types of attacks were classified with 98.26% accuracy using an optimized DNN model. | Complex DL models were not analyzed.
In
Owing to its distributed nature, an IoT network is a layered architecture in which every layer sequentially maintains individual tasks to run the whole platform efficiently. Intrusions can occur at every layer to breach security, and researchers have found multiple attacks across the whole network, including the protocols and gateways [
The strategic framework to detect attacks on an IoT network follows the combination of some elementary steps.
In this research work, five datasets have been used to train and evaluate three shallow ML and five DL models. To analyze intrusion detection methods, two widely used datasets (NSL-KDD, DS2OS) and three newer datasets (IoT Device Network Logs, IoT Intrusion Dataset 2020, IoT Botnet Dataset 2020) have been chosen based on their IoT attack variation. The first dataset is NSL-KDD, extensively used as a benchmark dataset and derived from the original KDD’99 dataset by eliminating 78% and 75% redundant records from the train and test sets, respectively [
Class | frequency count | % of total records | % of attack records |
---|---|---|---|
79,035 | 16.55% | - | |
82,285 | 17.24% | 20.65% | |
79,020 | 16.55% | 19.83% | |
79,002 | 16.55% | 19.83% | |
79,052 | 16.56% | 19.84% | |
79,032 | 16.55% | 19.84% |
SI. | Features | Data Types | SI. | Features | Data Types |
---|---|---|---|---|---|
1 | frame.number | numeric | 8 | ip.proto | numeric |
2 | frame.time | numeric | 9 | ip.len | numeric |
3 | frame.len | numeric | 10 | tcp.len | numeric |
4 | eth.src | numeric | 11 | tcp.srcport | numeric |
5 | eth.dst | numeric | 12 | tcp.dstport | numeric |
6 | ip.src | numeric | 13 | Value | numeric |
7 | ip.dst | numeric | 14 | normality | numeric |
The fourth dataset is the ‘IoT Intrusion Dataset 2020’, abbreviated IoTID20, adopted from I. Ullah et al. [
Class | frequency count | % of total records | % of attack records |
---|---|---|---|
415,677 | 6.43% | 70.97% |
75,265 | 12.03% | 12.85% | |
59,391 | 9.49% | 10.14% | |
40,073 | 6.40% | - | |
35,377 | 5.65% | 6.04% |
The fifth dataset is the ‘IoT Botnet Dataset 2020’, developed from a comprehensive IoT network by I. Ullah et al. [
IoT Botnet Dataset 2020 | |||
---|---|---|---|
Class | frequency count | % of total records | % of attack records |
651,122 | 33.56% | 35.33% |
636,539 | 32.80% | 34.53% |
555,011 | 28.60% | 30.11% |
97,197 | 5.00% | - |
520 | 0.027% | 0.028% |
SI. | Features | Data types | SI. | Features | Data types | SI. | Features | Data types |
---|---|---|---|---|---|---|---|---|
1 | Flow_ID | object | 30 | Fwd_IAT_Max | float64 | 59 | Pkt_Size_Avg | float64 |
2 | Src_IP | object | 31 | Fwd_IAT_Min | float64 | 60 | Fwd_Seg_Size_Avg | float64 |
3 | Src_Port | int64 | 32 | Bwd_IAT_Tot | float64 | 61 | Bwd_Se_Size_Avg | float64 |
4 | Dst_IP | object | 33 | Bwd_IAT_Mean | float64 | 62 | Fwd_Byts/b_Avg | int64 |
5 | Dst_Port | int64 | 34 | Bwd_IAT_Std | float64 | 63 | Fwd_Pkts/b_Avg | int64 |
6 | Protocol | int64 | 35 | Bwd_IAT_Max | float64 | 64 | Fwd_Blk_Rate_Avg | int64 |
7 | Timestamp | ‘0’ | 36 | Bwd_IAT_Min | float64 | 65 | Bwd_Byts/b_Avg | int64 |
8 | Flow_Duration | int64 | 37 | Fwd_PSH_Flags | int64 | 66 | Bwd_Pkts/b_Avg | int64 |
9 | Tot_Fwd_Pkts | int64 | 38 | Bwd_PSH_Flags | int64 | 67 | Bwd_Blk_Rate_Avg | int64 |
10 | Tot_Bwd_Pkts | int64 | 39 | Fwd_URG_Flags | int64 | 68 | Subflow_Fwd_Pkts | int64 |
11 | TotLen_Fwd_Pkts | float64 | 40 | Bwd_URG_Flags | int64 | 69 | Subflow_Fwd_Byts | int64 |
12 | TotLen_Bwd_Pkts | float64 | 41 | Fwd_Header_Len | int64 | 70 | Subflow_Bwd_Pkts | int64 |
13 | Fwd_Pkt_Len_Max | float64 | 42 | Bwd_Header_Len | int64 | 71 | Subflow_Bwd_Byts | int64 |
14 | Fwd_Pkt_Len_Min | float64 | 43 | Fwd_Pkts/s | float64 | 72 | Init_Fwd_Win_Byts | int64 |
15 | Fwd_Pkt_Len_Mean | float64 | 44 | Bwd_Pkts/s | float64 | 73 | Init_Bwd_Win_Byts | int64 |
16 | Fwd_Pkt_Len_Std | float64 | 45 | Pkt_Len_Min | float64 | 74 | Fwd_Act_Data_Pkts | int64 |
17 | Bwd_Pkt_Len_Max | float64 | 46 | Pkt_Len_Max | float64 | 75 | Fwd_Seg_Size_Min | int64 |
18 | Bwd_Pkt_Len_Min | float64 | 47 | Pkt_Len_Mean | float64 | 76 | Active_Mean | float64 |
19 | Bwd_Pkt_Len_Mean | float64 | 48 | Pkt_Len_Std | float64 | 77 | Active_Std | float64 |
20 | Bwd_Pkt_Len_Std | float64 | 49 | Pkt_Len_Var | float64 | 78 | Active_Max | float64 |
21 | Flow_Byts/s | float64 | 50 | FIN_Flag_Cnt | int64 | 79 | Active_Min | float64 |
22 | Flow_Pkts/s | float64 | 51 | SYN_Flag_Cnt | int64 | 80 | Idle_Mean | float64 |
23 | Flow_IAT_Mean | float64 | 52 | RST_Flag_Cnt | int64 | 81 | Idle_Std | float64 |
24 | Flow_IAT_Std | float64 | 53 | PSH_Flag_Cnt | int64 | 82 | Idle_Max | float64 |
25 | Flow_IAT_Max | float64 | 54 | ACK_Flag_Cnt | int64 | 83 | Idle_Min | float64 |
26 | Flow_IAT_Min | float64 | 55 | URG_Flag_Cnt | int64 | 84 | Label | int64 |
27 | Fwd_IAT_Tot | float64 | 56 | CWE_Flag_Count | int64 | 85 | Cat | Object |
28 | Fwd_IAT_Mean | float64 | 57 | ECE_Flag_Cnt | int64 | 86 | Sub_Cat | Object |
29 | Bwd_IAT_Mean | float64 | 58 | Down/Up_Ratio | float64 | - | - | - |
Any ML/DL research requires thorough data analysis so that the data are compatible with the learning algorithms and accurate performance can be achieved. Hence, pre-processing is necessary to convert the data from categorical values to numeric. There are two fundamental pre-processing steps: numericalization and normalization. Before performing these, a few steps need to be executed: cleaning the dataset, finding missing values, and replacing ‘NaN’ (Not a Number) values.
Both the KDDTrain+_20 and KDDTest-21 datasets contain 42 identical features, of which only one, ‘num_outbound_cmds’, has missing values. The IoT Network Device Logs dataset is abbreviated ‘IoTDevNet’ throughout the rest of this work. As it contains no missing or ‘NaN’ values, no further data filtering is needed. As for DS2OS, two columns, ‘Accessed Node Type’ and ‘Value’, contain missing values; their data types are categorical and continuous, respectively. The ‘Accessed Node Type’ column has a total of 148 rows containing ‘NaN’ values that need to be replaced or removed. Removing these 148 rows might result in a loss of valuable data and, eventually, substandard performance. Hence, the ‘NaN’ values are replaced with the value ‘Malicious’. The ‘Value’ column also contains some unassigned data, ‘False’, ‘True’, ‘Twenty’, and ‘None’, which are transformed into ‘0.0’, ‘1.0’, ‘20.0’, and ‘0.0’, respectively.
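The DS2OS replacements described above can be sketched with Pandas; the three-row frame below is an illustrative stand-in for the real dataset, not its actual contents:

```python
import numpy as np
import pandas as pd

# Illustrative rows mimicking the two DS2OS columns discussed above
df = pd.DataFrame({
    "Accessed Node Type": ["/light", np.nan, "/sensor"],
    "Value": ["True", "Twenty", "12.5"],
})

# 'NaN' entries in 'Accessed Node Type' are replaced with 'Malicious'
df["Accessed Node Type"] = df["Accessed Node Type"].fillna("Malicious")

# Unassigned entries in 'Value' are mapped to numeric strings, then cast
df["Value"] = (df["Value"]
               .replace({"False": "0.0", "True": "1.0",
                         "Twenty": "20.0", "None": "0.0"})
               .astype(float))
```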
The IoT Botnet Dataset 2020 and the IoTID20 dataset contain a similar type and number of features with slightly different attack classes. Before normalizing these two datasets, the empty spaces are dropped, useless indices are removed, and the data types are converted into an appropriate type (float) to avoid errors. In these two datasets, 16 features contain missing values, which are removed to compute accurate results.
For each dataset, the data types of the features need to be determined beforehand for numericalization. In NSL-KDD, the ‘Service’, ‘Flag’, and ‘Protocol_type’ columns hold categorical nominal variables, while in DS2OS all columns except ‘Value’ and ‘Timestamp’ contain categorical nominal variables [
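Numericalization of such a nominal column can be sketched with Pandas' `factorize`, which assigns an integer code to each distinct category in order of first appearance; the toy values below are illustrative:

```python
import pandas as pd

# Toy NSL-KDD-style nominal column; the real data has many more values
df = pd.DataFrame({"Protocol_type": ["tcp", "udp", "icmp", "tcp"]})

# Map each distinct category to an integer code (numericalization)
codes, uniques = pd.factorize(df["Protocol_type"])
df["Protocol_type"] = codes
```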
After numericalization, the StandardScaler method is applied to the continuous numerical data (particularly the wide-range features) of the five datasets for normalization. StandardScaler scales the data by subtracting the mean and dividing by the standard deviation, so that the transformed data have mean 0 and standard deviation 1.
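The transformation can be sketched directly in NumPy; z = (x - mean) / std yields data with zero mean and unit standard deviation:

```python
import numpy as np

# A toy wide-range feature column
x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardization: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()
```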
For feature
ML is a branch of Artificial Intelligence (AI), closely related to computational statistics, that uses mathematical optimization to emphasize prediction making. It can learn baseline behavioral profiles for different entities and then identify meaningful abnormalities. Shallow learning is a form of ML in which models learn from predefined features represented by the data. This study considers three shallow ML techniques, namely DT, RF, and SVM. DT is a flowchart-like classification or regression model that works by separating a dataset into several smaller subsets while gradually evolving a corresponding decision tree with decision and leaf nodes [
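As a sketch, the three shallow classifiers can be instantiated with scikit-learn; the synthetic data here is an illustrative stand-in for a preprocessed intrusion dataset, while the parameter values (max_depth = 3, n_estimators = 100, C = 1000) follow the settings reported later in this paper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a numericalized, normalized intrusion dataset
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, n_classes=3, random_state=0)

models = {
    "DT": DecisionTreeClassifier(max_depth=3, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0),
    "SVM": SVC(C=1000),
}
for name, clf in models.items():
    clf.fit(X, y)  # each classifier is trained on the same data
```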
DL is a subfield of ML that is well suited to processing big data and ensuring IoT network security. This subsection introduces an overview of five DL models, namely DNN, DBN, LSTM, Stacked LSTM, and Bi-LSTM. A DNN is a sort of neural network, exemplified by the multilayer perceptron (MLP), trained with algorithms that learn representations from datasets without any manual design of feature extractors [
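To make the layered structure concrete, here is a minimal NumPy sketch of a DNN forward pass with two ReLU hidden layers and a softmax output; the weights and layer sizes are random placeholders rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    return np.maximum(0.0, a)

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

# One 10-feature input sample; weight shapes are illustrative only
x = rng.normal(size=(1, 10))
W1, b1 = rng.normal(size=(10, 8)), np.zeros(8)  # hidden layer 1
W2, b2 = rng.normal(size=(8, 4)), np.zeros(4)   # hidden layer 2
W3, b3 = rng.normal(size=(4, 3)), np.zeros(3)   # output layer (3 classes)

h1 = relu(x @ W1 + b1)
h2 = relu(h1 @ W2 + b2)
probs = softmax(h2 @ W3 + b3)  # class probabilities sum to 1
```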
The experiment is conducted on a personal HP laptop with a 2.5 GHz Intel Core i7-6500U microprocessor, Intel® HD Graphics 520, 8 GB RAM, and the Windows 10 operating system. The models are executed on the open-source Google Colab Notebook platform. The shallow ML and DL models are implemented with Keras layers using TensorFlow 1.14.0. Pandas and NumPy are used for loading and cleaning the data, Matplotlib and Seaborn for data visualization, and scikit-learn for analyzing the performance of the experiments.
Shallow ML models are learning algorithms that work best on moderately sized data, whereas DL algorithms perform more efficiently on larger, more intensive data that requires the consideration of complex hidden patterns. All shallow ML/DL techniques are trained and evaluated for multi-class classification on the five mentioned datasets. Five-fold cross-validation is performed on each dataset for all of these techniques to estimate the skill of the learning algorithms on unseen data.
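The five-fold procedure can be sketched as follows; this is a minimal index-splitting illustration, not the exact implementation used in the experiments:

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 5-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)       # shuffle sample indices once
    folds = np.array_split(idx, 5)         # five nearly equal folds
    for k in range(5):
        test_idx = folds[k]                # each fold is held out once
        train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train_idx, test_idx
```

Each model is fitted on `train_idx` and scored on `test_idx`; the five scores are then averaged.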
During the simulation of the proposed models, the parameters listed in
Parameters | Shallow ML Algorithms | ||
---|---|---|---|
DT | RF | SVM | |
max_depth | 3 | 3 | ‒ |
n_estimators | ‒ | 100 | ‒ |
regularization parameter (C) | ‒ | ‒ | 1000 |
Parameters | DL Algorithms | ||||
---|---|---|---|---|---|
DNN | DBN | LSTM | Stacked LSTM | Bi-LSTM | |
Input neuron | 800 | 256 | 4 | 16 | 80 |
Hidden neuron | 400 | − | − | 16 | 40, 128 |
Epochs | 1000/10 | 10 (rbm) | 10 | 10 | 15 |
Batch size | 64 | 32 | 64 | 64 | 64 |
Optimizer | adam | − | adam | adam | adam |
Dropout rate | 0.9 | 0.2 | 0.6 | 0.4 | 0.1 |
Activation | relu, softmax | relu | softmax | softmax | relu, softmax |
Loss | categorical | − | categorical | categorical | categorical |
DL models are complex neural networks with input, hidden, and output layers, each having multiple neurons. Parameter optimization for the five mentioned algorithms mostly concerns the number of hidden neurons. Dropout regularization is a strategy for minimizing overfitting and enhancing the generalization of deep neural networks. A fixed dropout rate is used here for each individual model for better performance, measured in terms of accuracy and evaluation time. So, to acquire optimized results, the number of hidden neurons and the number of epochs in each fold are tuned for each model. First, the DNN for the DS2OS dataset is built as a sequential model with three layers of hidden neurons with dropouts. The model is experimented with (400, 200), (800, 400), (1200, 800), and (100, 50) neuron sets, which achieve 98.84%, 99.11%, 99.12%, and 97.17% accuracy with training times of 892 s, 1661 s, 3620 s, and 437 s, respectively. So, considering test and train accuracy along with training time, the combination of (800, 400) hidden neurons with a dropout of 0.9, optimizer = ‘adam’, and loss = ‘categorical_crossentropy’ is taken as optimal. The other four datasets also perform better under this arrangement. For the DBN model, the parameter combination (‘n_epochs_rbm’, ‘n_iter_backprop’) is tuned over (10, 100), (10, 50), (5, 100), and (5, 50) for the NSL-KDD dataset. Per the performance analysis, the (10, 100) combination yields the highest testing accuracy of 91.97% with an evaluation time of 10,514 s. The same parameter tuning applies to the other four datasets. The LSTM, likewise a deep sequential model, is constructed with the above parameters. Since three variations of the LSTM model are explored in this study, various combinations of the hidden neuron set and the number of epochs have been examined for each model.
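Assuming a Keras/TensorFlow setup, the tuned DNN configuration above can be sketched as follows; the input feature count and the number of output classes are hypothetical placeholders, while the (800, 400) hidden neurons, dropout of 0.9, ‘adam’ optimizer, and ‘categorical_crossentropy’ loss follow the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Feature and class counts below are placeholders, not dataset values
model = keras.Sequential([
    layers.Dense(800, activation="relu", input_shape=(115,)),
    layers.Dropout(0.9),   # dropout rate as reported in the text
    layers.Dense(400, activation="relu"),
    layers.Dropout(0.9),
    layers.Dense(8, activation="softmax"),  # one unit per attack class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```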
The LSTM model is tested with two tuned parameters [(‘hidden neurons’), ‘epochs’] for the IoTID20 dataset, and the combination sets [(4), 10], [(4), 15], [(4), 5], and [(2), 10] obtain 99.91%, 99.25%, 94.35%, and 95.33% test accuracy with times of 1128 s, 1807 s, 650 s, and 1141 s, respectively. These statistics indicate that the first combination, [(4), 10] with a dropout of 0.6, beats the others, so the remaining datasets are analyzed with it and performed as anticipated. Next, the Stacked LSTM model is trialed with the neuron-epoch combinations [(8), 10], [(16), 10], [(16), 5], and [(16), 15] for the IoTDevNet dataset, yielding test accuracies of 82.94%, 99.63%, 98.28%, and 99.53% with time variations of 1535 s, 1641 s, 887 s, and 2549 s. The [(16), 10] combination with a dropout of 0.4 thus provides the optimal result, which is tested on the other datasets as well. Finally, for the NSL-KDD dataset, the Bi-LSTM model with the tuned parameter sets [(80, 40, 128), 15], [(80, 40, 128), 5], [(80, 40, 128), 10], and [(40, 20, 64), 15] achieves 99.5%, 98.78%, 99.06%, and 99.26% test accuracy with processing times of 55 s, 32 s, 47 s, and 63 s, respectively. The results confirm that the set [(80, 40, 128), 15] with a dropout of 0.1 gives the most favorable outcome. The other datasets likewise justify these sets of tuned parameters through their performance.
To analyze the performance of the executed models, the widely used multi-class performance metric ‘Accuracy’ is evaluated in this experiment. Besides, Precision, Recall, and F1-score have also been computed. These metrics depend on four basic qualitative model-quality indicators, namely, true positives, true negatives, false positives, and false negatives [
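For reference, the four indicators combine into the reported metrics as follows; the counts in the usage line are purely illustrative:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-score from the four indicators."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts, not experimental results
acc, prec, rec, f1 = metrics(tp=90, tn=85, fp=10, fn=15)
```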
The use of the DS2OS dataset is quite recent in the study of IDSs; therefore, few directed research efforts related to network security use this dataset. Among them, [
For the NSL-KDD dataset, Elmasry et al. acquired 73.35% accuracy using decision forest and 73.38% using decision jungle in [
IoT has been utilized extensively due to its potential for connecting the physical devices of different application domains to clients through the Web. However, the interconnected structure of IoT and the ability of devices to interact with one another raise security issues in IoT networks, so a proper security system for IoT networks and devices should be created. In this paper, we have presented a data-analysis technique for intrusion detection in the IoT environment. We began with the state of the art of different intrusion detection systems and a general introduction to possible IoT threats. Thereafter, the paper described the nuts and bolts of five datasets, among them two well-known ones, NSL-KDD and DS2OS, and three comparatively new ones, IoTDevNet, IoTID20, and IoT Botnet. The study then discussed three ML and five DL techniques for distinguishing IoT attacks in known or even unknown environments. The framework overcomes the problems of implementing heavy DL techniques directly on low-capacity IoT devices, recognizes several threats with high accuracy and detection rates, and maintains the detection system by updating it accordingly for better attack identification. Based on the experimental investigation, it can be concluded that Bi-LSTM performs best among the studied DL techniques for this particular study of multiple datasets. However, this does not guarantee that Bi-LSTM will behave the same under big data and other unknown conditions. Hence, further investigation of the problem is required with real-time data and power-time optimization. In the IoT network, micro-services behave distinctively at different events, triggering deviations from the ordinary conduct of IoT services and subsequently creating inconsistencies. So, further analysis is required to interpret these issues in greater depth, which may culminate in designing a hybrid algorithm combining multiple techniques.