Enhancements in wireless networking have given users the ability to use the Internet without a physical connection to the router. Almost every Internet of Things (IoT) device, such as smartphones, drones, and cameras, uses wireless technology (Infrared, Bluetooth, IrDA, IEEE 802.11, etc.) to establish multiple inter-device connections simultaneously. With the flexibility of the wireless network, one can set up numerous ad-hoc networks on demand, connecting hundreds to thousands of users and significantly increasing productivity and profitability. However, the number of attacks on wireless networks that exploit this flexibility in setting up and tearing down networks has become alarming. Perpetrators can launch attacks since there is no first line of defense in an
Humayun et al. [
Two types of IDS are commonly deployed for intrusion detection, namely (1) Wired IDS and (2) Wireless IDS.
A wired IDS is the standard IDS connected with all the network components (e.g. router, switch, IDS manager server, end devices) to perform intrusion detection processes [
IDS Technique | Detection time | Data source | Weakness |
---|---|---|---|
Signature based | Real-time | Network traffic | Unable to identify unknown attacks. |
Anomaly based | Real-time | Network traffic, User behaviours | Consumes more resources for high level users. |
Hybrid based | Real-time | Network packet, prior events | Complexity increases due to the integration of both signatures and anomalies. |
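The three techniques in the table can be contrasted with a minimal sketch. It is illustrative only: the signature patterns, the z-score threshold of 3.0, and all function names are assumptions introduced here, not part of any surveyed system.

```python
from statistics import mean, stdev

# Hypothetical signature set: byte patterns assumed to mark known attacks.
SIGNATURES = {
    "sql_injection": b"' OR '1'='1",
    "path_traversal": b"../../",
}


def signature_match(payload):
    """Signature-based check: names of all known patterns found in the payload.

    An attack whose pattern is not in SIGNATURES goes undetected, which is
    the weakness listed for this technique.
    """
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_anomalous(baseline, observed, threshold=3.0):
    """Anomaly-based check: flag a value more than `threshold` standard
    deviations away from the mean of the baseline (a simple z-score test).

    `baseline` needs at least two samples of normal behaviour.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


def hybrid_detect(payload, baseline, observed):
    """Hybrid check: raise an alert if either technique fires."""
    return bool(signature_match(payload)) or is_anomalous(baseline, observed)
```

The hybrid function shows why complexity grows: both the signature set and the baseline must be maintained.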
The wired, or standard, IDS architecture connects all devices by cable. The IDS console monitors and analyzes the network traffic. When traffic arrives from the Internet, the router passes it to the IDS server, which collects the traffic and performs the analysis. The IDS does not drop any packets, since its job is only to collect and analyze the data. A wired IDS requires more components and devices for the network setup, such as routers, switches, IDS consoles, IDS servers, and other end devices, as shown in
One weakness of a traditional wired IDS when applied to wireless deployments such as IoT is that it generally does not detect network intrusions originating from internal hosts. Although it is possible to protect an organization’s internal network from wireless attackers, when there is only one link between the wireless network and the main network, such an intrusion detection system will not cover all of the traffic on the wireless network [
Compared with wired IDS, wireless IDS is more suitable for monitoring IoT networks. A wireless IDS extends the wired IDS with characteristics specific to wireless networks [
The wireless IDS architecture resembles the wired IDS architecture, but it uses a wireless access point for network connectivity. It is more suitable for IoT networks because of the wireless sensor deployment. The typical components of a wireless IDS are the console, database server, and sensors, as shown in
Network traffic datasets can be sniffed from both wired and wireless networks. There are considerable differences in the attacks that target wired and wireless networks. Fadlullah et al. [
Understanding the type of network traffic data is fundamental before investigating a network traffic dataset and proceeding to traffic analysis. Humayun et al. [
Network applications generate traffic (packets) containing headers from multiple protocols through the encapsulation process. Some examples of these protocols are Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP). See
IP Header (IPV4) | |
---|---|
Internet Header Length | The number of 32-bit words in the header |
Total Length | The entire packet size, including header and data in bytes |
Time To Live | This field limits a datagram’s lifetime in hops (or time) |
Protocol | The protocol used in the data portion of the IP datagram |
Source address | This field is the IPV4 address of the sender of the datagram |
Destination address | This field is the IPV4 address of the receiver of the datagram |
TCP Header | |
Source port | Identifies the sending port |
Destination port | Identifies the receiving port |
Sequence number | Initial or accumulated sequence number |
Acknowledgement number | The next sequence number that the receiver is expecting |
Data offset | Specifies the size of the TCP header in 32-bit words |
Flags (control bits) | NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN |
UDP Header | |
Source port | Identifies the sending port |
Destination port | Identifies the receiving port |
Length | The length in bytes of the UDP header and UDP data |
ICMP Packet | |
Type | Control (e.g. ping, destination unreachable, trace route) |
Code | Details with the type |
Rest of Header | More details |
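As an illustration of how the header fields listed above can be read from raw bytes, the following is a minimal sketch using Python's standard `struct` module. The function names and the returned dictionary keys are assumptions made for this example, and IPv4 options and TCP options are ignored.

```python
import socket
import struct

# TCP control bits in the flags byte (the NS bit lives in the preceding byte).
TCP_FLAGS = [
    ("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04), ("PSH", 0x08),
    ("ACK", 0x10), ("URG", 0x20), ("ECE", 0x40), ("CWR", 0x80),
]


def parse_ipv4_header(raw):
    """Decode the fixed 20-byte part of an IPv4 header."""
    if len(raw) < 20:
        raise ValueError("IPv4 header needs at least 20 bytes")
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ver_ihl & 0x0F,      # header length in 32-bit words
        "total_length": total_length,      # header plus data, in bytes
        "ttl": ttl,
        "protocol": protocol,              # 1 = ICMP, 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def parse_tcp_header(raw):
    """Decode ports, sequence numbers, data offset and flags of a TCP header."""
    if len(raw) < 14:
        raise ValueError("need at least 14 bytes of TCP header")
    src_port, dst_port, seq, ack, offset_byte, flag_byte = struct.unpack(
        "!HHIIBB", raw[:14])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "data_offset_words": offset_byte >> 4,
        "flags": [name for name, bit in TCP_FLAGS if flag_byte & bit],
    }
```

In a capture pipeline, the `protocol` value from the IPv4 header decides which transport parser to apply to the bytes that follow the first `ihl_words * 4` bytes.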
On the other hand, port mirroring is another method that can be used for data collection. It is a hardware-based approach in which packet-forwarding devices copy packets from one port to another port where a packet capture device is connected. With this method, all incoming and outgoing packets across the whole network can be viewed. However, this approach requires sufficient bandwidth, as mirroring can cause packet loss and delay [
For example, deep inspection of a packet's application header and payload can indicate whether the packet carries malware. One weakness of this method is that protocols such as SSL and TLS encrypt the payload, which makes it hard to detect malicious content carried in it. The packet size can indicate whether the traffic was initiated by a bot, because attack packets coming from the bots tend to have almost identical sizes. A DDoS attack can be detected by analyzing the inbound and outbound packet counts. Examples include checking the ICMP request/reply count, the traffic distribution among different network protocols, or the ratio of TCP packets with different flag values [
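The count-based indicators mentioned above can be sketched as follows. This is a simplified illustration, not a method taken from the cited work: the packet dictionary layout, the 0.8 SYN-fraction threshold, and the minimum of 100 TCP packets are all assumptions chosen for the example.

```python
from collections import Counter


def count_indicators(packets):
    """Count packets per protocol, bare SYN packets, and ICMP echo traffic.

    Each packet is assumed to be a dict with a "proto" key ("TCP", "UDP",
    "ICMP"), an optional "flags" collection for TCP, and an optional
    "icmp_type" for ICMP (8 = echo request, 0 = echo reply).
    """
    counts = Counter()
    for pkt in packets:
        proto = pkt["proto"]
        counts[proto] += 1
        if proto == "TCP":
            flags = pkt.get("flags", ())
            if "SYN" in flags and "ACK" not in flags:
                counts["tcp_syn_only"] += 1
        elif proto == "ICMP":
            if pkt.get("icmp_type") == 8:
                counts["icmp_echo_request"] += 1
            elif pkt.get("icmp_type") == 0:
                counts["icmp_echo_reply"] += 1
    return counts


def looks_like_syn_flood(counts, max_syn_fraction=0.8, min_tcp_packets=100):
    """Heuristic: a window dominated by bare SYN packets is suspicious.

    Both thresholds are illustrative and would need tuning per network.
    """
    tcp_total = counts["TCP"]
    if tcp_total < min_tcp_packets:
        return False  # too little traffic to judge
    return counts["tcp_syn_only"] / tcp_total > max_syn_fraction
```

A large gap between `icmp_echo_request` and `icmp_echo_reply` in the same counter could be examined in a similar way for ICMP floods.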
Flow-level data is usually organised by a flow key and aggregated with the relevant data depending on the application. The applications can be summarized as network/application/host monitoring, intrusion detection, security awareness, and network application classification. A collection of flow-based datasets for intrusion detection is outlined in Umer et al. [
A flow is defined as a unidirectional sequence of packets that belong to the same TCP session. The purposes of a flow are to provide an overview of network traffic and to cope with encrypted packets. Flow-level data comprises (as shown in
NetFlow Data — Simple Network Management Protocol (SNMP) | |
---|---|
Ingress Interface | Router Information |
Source IP address | na |
Destination IP address | na |
IP protocol | IP protocol number |
Source Port | UDP or TCP ports, 0 for other protocols |
Destination Port | UDP or TCP ports, 0 for other protocols |
IP type of service | Priority level of the flow |
NetFlow Data — Flow Statistics | |
IP protocol | IP protocol number |
Source IP address | na |
Destination IP address | na |
Source Port | na |
Destination Port | na |
Bytes per packet | The flow analyzer captures their statistics |
Packets per flow | Number of packets in the flow |
TCP flags | SYN, FIN |
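To make the idea of a flow key concrete, the sketch below groups packets into unidirectional flows keyed by source and destination address, source and destination port, and protocol, and accumulates the statistics listed in the table. The packet dictionary layout and all names are assumptions for illustration; real NetFlow exporters also apply timeouts, which are omitted here.

```python
def aggregate_flows(packets):
    """Aggregate packets into unidirectional flows keyed by a 5-tuple.

    Each packet is assumed to be a dict with keys src_ip, dst_ip, src_port,
    dst_port, proto, size (bytes), ts (seconds) and an optional flags
    collection. Packets in the reverse direction form a separate flow.
    """
    flows = {}
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flow = flows.setdefault(key, {
            "packets": 0,
            "bytes": 0,
            "first_seen": pkt["ts"],
            "last_seen": pkt["ts"],
            "tcp_flags": set(),
        })
        flow["packets"] += 1
        flow["bytes"] += pkt["size"]
        flow["first_seen"] = min(flow["first_seen"], pkt["ts"])
        flow["last_seen"] = max(flow["last_seen"], pkt["ts"])
        flow["tcp_flags"].update(pkt.get("flags", ()))
    return flows


def flow_statistics(flow):
    """Derive per-flow statistics from the accumulated counters."""
    return {
        "packets_per_flow": flow["packets"],
        "bytes_per_packet": flow["bytes"] / flow["packets"],
        "duration": flow["last_seen"] - flow["first_seen"],
    }
```

Because only counters and header fields are kept, the aggregation works the same way whether or not the payload is encrypted.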
Connection-level data records global information between two IP addresses from the viewpoint of a particular network, providing a finer level of network traffic granularity than flow-level data. Using connection-level data, packet-level data, or flow-level data, we may obtain detailed information about network activities. Connection-level data can be summarised by connection size (size of packets and length of flow), connection period (time from connection establishment to connection termination), connection count (number of connections per unit time), and connection form (TCP, UDP, ICMP, etc.).
Any internal change in a host can be seen in host-level data, and most attacks have a direct effect on the host's reliability. Internal attacks such as unauthorised login or entry, file system alteration, and privilege escalation can be detected using host-level data. Such data is commonly used in HIDS to monitor internal abnormalities. Host-level data can be collected using open-source tools like Collect on Linux machines [
Data category | Classification type | Explanation | Advantages | Disadvantages |
---|---|---|---|---|
Packet-level data | Source address | The IP address of the source network interface | Access to raw packet data, which makes real-time detection possible | Collection methods are not suitable for high-speed networks |
Packet-level data | Destination address | The IP address of the destination network interface | | |
Packet-level data | Source port | The end-point of the source network interface | | |
Packet-level data | Destination port | The end-point of the destination network interface | | |
Packet-level data | TTL | The maximum hop count | | |
Packet-level data | Timestamp | The point in time of sending or receiving packets | | |
Packet-level data | Packet payload | The content that the packet carries | | |
Packet-level data | Packet size | The size of a packet in bytes | | |
Packet-level data | Number of transmitted bytes | The number of bytes accumulated in a certain time period | | |
Packet-level data | Number of packets | The packet count accumulated in a certain time period | | |
Packet-level data | Packet rate | The number of packets transmitted per unit time | | |
Flow-level data | Flow count | The number of flows | Independent of encrypted payload | Collection methods cause some inevitable delay |
Flow-level data | Flow type | The protocol information of a flow | | |
Flow-level data | Flow size | The number of packets in a flow | | |
Flow-level data | Flow direction | The transmission direction of a flow | | |
Flow-level data | Flow duration | The time duration of a flow | | |
Flow-level data | Flow rate | The number of packets transmitted per unit time in a flow | | |
Connection-level data | Connection size | The number of packets or flows in a connection | Provides global information on traffic exchanged between two IP addresses in a given time | Collection methods need to keep track of each connection's status |
Connection-level data | Connection duration | The time duration of a connection | | |
Connection-level data | Connection count | The number of connections | | |
Connection-level data | Connection type | The protocol information of a connection | | |
Host-level data | CPU usage | The load information about running software programs | Has full information about system performance and behaviors | Has a high false alarm rate when used separately from other categories of data |
Host-level data | Memory usage | The information about data exchange | | |
Host-level data | Equipment operation logs | The running records of equipment that connects with a host | | |
Host-level data | Application operation logs | The running records of an application | | |
In summary, the analysis of each type of data can be used to detect specific attacks, taking into account the detection method and the network environment. Application-specific attack detection may require packet-level and connection/flow-level data, whereas inspecting malware within a host might require more host-level data and, to some extent, connection data. Many network-based attacks, such as DDoS and botnets, may need packet-level and connection-level data. Such attacks can be analysed at a finer granularity by integrating host-level data to see the effect of a DDoS attack on host performance. Higher detection accuracy can be achieved through a thorough analysis of the nature of the attacks and their impact on the host or the network.
Live data collected from the network can be used for intrusion detection, but many compiled datasets are also available on the Internet for network intrusion detection. One of the most widely used is the KDD Cup dataset [
Basic Features | ||
---|---|---|
Duration | Integer | Duration of the connection |
Protocol_type | Nominal | Protocol type of the connection; TCP; UDP and ICMP |
Service | Nominal | http, ftp, smtp, telnet … and other |
Flag | Nominal | Connection status |
Src_bytes | Integer | Bytes sent in one connection |
Dst_bytes | Integer | Bytes received in one connection |
Land | Binary | If src/dst IP address and port numbers are same, then 1 |
Wrong_fragment | Integer | Sum of bad checksum packets in a connection |
Urgent | Integer | Sum of urgent packets in a connection |
Hot | Integer | Sum of hot actions in a connection such as; entering a system directory, creating programs and executing programs |
Num_failed_logins | Integer | Number of incorrect logins in a connection |
Logged_in | Binary | If the login is correct, then 1, else 0 |
Num_compromised | Integer | Number of times a “not found” error appears in a connection |
Root_shell | Binary | If the root gets the shell, then 1, else 0 |
Su_attempted | Binary | If the su command has been used, then 1 else 0 |
Num_root | Integer | Sum of operations performed as root in a connection |
Num_file_creations | Integer | Number of file creation operations in a connection |
Num_access_files | Integer | Sum of operations in control files in a connection |
Num_outbound_cmds | Integer | Sum of outbound commands in an ftp session |
Is_hot_login | Binary | If the user is accessing as root or admin |
Is_guest_login | Binary | If the user is accessing as guest, anonymous, or visitor |
Traffic Features – Same Host – 2-second Window | ||
Duration | Integer | Duration of the connection |
Protocol_type | Nominal | Protocol type of the connection; TCP, UDP, and ICMP |
Service | Nominal | http, ftp, smtp, telnet … and other |
Flag | Nominal | Connection status |
Src_bytes | Integer | Bytes sent in one connection |
Dst_bytes | Integer | Bytes received in one connection |
Land | Binary | If src/dst IP address and port numbers are same, then 1 |
Wrong_fragment | Integer | Sum of bad checksum packets in a connection |
Urgent | Integer | Sum of urgent packets in a connection |
Sobh [
Therefore [
Type | Features |
---|---|
Nominal | protocol_type(2), service(3), flag(4) |
Binary | land(7), logged_in(12), root_shell(14), su_attempted(15), is_host_login(21), is_guest_login(22) |
Numeric | duration(1), src_bytes(5), dst_bytes(6), wrong_fragment(8), urgent(9), hot(10), num_failed_logins(11), num_compromised(13), num_root(16), num_file_creations(17), num_shells(18), num_access_files(19), num_outbound_cmds(20), count(23), srv_count(24), serror_rate(25), srv_serror_rate(26), rerror_rate(27), srv_rerror_rate(28), same_srv_rate(29), diff_srv_rate(30), srv_diff_host_rate(31), dst_host_count(32), dst_host_srv_count(33), dst_host_same_srv_rate(34), dst_host_diff_srv_rate(35), dst_host_same_src_port_rate(36), dst_host_srv_diff_host_rate(37), dst_host_serror_rate(38), dst_host_srv_serror_rate(39), dst_host_rerror_rate(40), dst_host_srv_rerror_rate(41) |
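The split into nominal, binary, and numeric features matters because each type is usually prepared differently before training a classifier. The sketch below shows one common convention (one-hot encoding for nominal values, pass-through for binary values, min-max scaling for numeric values). It is not a procedure taken from the studies discussed here, and the category lists and value ranges passed in are assumptions that would normally be derived from the training data.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector over the known categories."""
    return [1.0 if value == category else 0.0 for category in categories]


def min_max(value, low, high):
    """Scale a numeric value into [0, 1] given the observed range."""
    if high == low:
        return 0.0  # constant feature carries no information
    return (value - low) / (high - low)


def encode_record(record, nominal_categories, binary_fields, numeric_ranges):
    """Turn one connection record into a flat numeric feature vector.

    nominal_categories: {feature name: list of allowed values}
    binary_fields:      list of feature names holding 0 or 1
    numeric_ranges:     {feature name: (min, max)}
    """
    vector = []
    for name, categories in nominal_categories.items():
        vector.extend(one_hot(record[name], categories))
    for name in binary_fields:
        vector.append(float(record[name]))
    for name, (low, high) in numeric_ranges.items():
        vector.append(min_max(float(record[name]), low, high))
    return vector
```

For instance, `protocol_type` with the categories `tcp`, `udp`, and `icmp` expands into three columns, so the encoded vector is wider than the original 41 features.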
In an earlier study, McHugh (2010) [
Traffic in IEEE 802.11-based WLANs consists of several types of frames that carry the packet information. The expanded packets in the dataset provide a clear comparison between the types of structures. For example, the number of fields in each packet differs according to the type of frame the packet is in. Feature selection therefore relies heavily on the fields that all frames share, so that the same fields can be compared across different frames. The frame type is an important feature in the study of wireless networks, as it determines the number and types of fields present in the frame.
Info | Frame | 802.11 Radio Information | Layers in TCP/IP Stack |
---|---|---|---|
Beacon Frame | Management | PHY type = 802.11(b) | S-Band DSSS Physical Layer |
Null Function | Data | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Acknowledgement | Control | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Data | Data | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Action | Management | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Probe Response | Management | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Clear-to-send | Control | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
QoS Data | Data | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Probe Request | Management | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Block Ack | Control | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Request-to-send | Control | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Block Ack Request | Control | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
QoS Null Function | Data | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Key | Data | PHY type = 802.11(b) | (b) = DSSS Physical Layer |
Deauthentication | Management | – | No layer |
CF-End | Control | PHY type = 802.11(g) | S-Band ISM OFDM Physical Layer |
Authentication | Management | PHY type = 802.11(b) | DSSS Physical Layer |
Association Response | Management | PHY type = 802.11(b) | DSSS Physical Layer |
Association Request | Management | PHY type = 802.11(b) | DSSS Physical Layer |
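The Frame column in the table can be derived from the Frame Control field at the start of every 802.11 MAC header, where bits 2 to 3 of the first byte give the frame type and bits 4 to 7 give the subtype. The following sketch decodes these bits for the frames listed above. It assumes the input starts at the MAC header (any radiotap or capture header already stripped), and the function name is an assumption for illustration. The "Key" row is not covered, since EAPOL key messages travel inside ordinary data frames.

```python
FRAME_TYPES = {0: "Management", 1: "Control", 2: "Data"}

# (type, subtype) pairs for the frames listed in the table above.
SUBTYPES = {
    (0, 0): "Association Request",
    (0, 1): "Association Response",
    (0, 4): "Probe Request",
    (0, 5): "Probe Response",
    (0, 8): "Beacon",
    (0, 11): "Authentication",
    (0, 12): "Deauthentication",
    (0, 13): "Action",
    (1, 8): "Block Ack Request",
    (1, 9): "Block Ack",
    (1, 11): "Request-to-send",
    (1, 12): "Clear-to-send",
    (1, 13): "Acknowledgement",
    (1, 14): "CF-End",
    (2, 0): "Data",
    (2, 4): "Null Function",
    (2, 8): "QoS Data",
    (2, 12): "QoS Null Function",
}


def classify_frame(frame):
    """Return (frame type, subtype name) from the first Frame Control byte."""
    if len(frame) < 1:
        raise ValueError("empty frame")
    first_byte = frame[0]
    frame_type = (first_byte >> 2) & 0x03   # bits 2-3
    subtype = (first_byte >> 4) & 0x0F      # bits 4-7
    type_name = FRAME_TYPES.get(frame_type, "Reserved/Extension")
    subtype_name = SUBTYPES.get((frame_type, subtype), "Other")
    return type_name, subtype_name
```

The resulting type and subtype can serve as nominal features when comparing fields across management, control, and data frames.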
To the best of our knowledge, there is no fundamental research in IoT intrusion detection (ID) that focuses mainly on wireless networks. Most IDS research focuses on traditional wired networks. Applying wired-network IDS research to wireless networks may not be feasible due to the architectural differences of IoT. Traditional security countermeasures and privacy enforcement cannot be directly applied to IoT technologies because of three fundamental aspects: (1) the limited computing power of IoT components, (2) the high number of interconnected devices, and (3) the sharing of data among objects and users.
Moreover, intrusion response in wireless networks depends on the type of intrusion, the network protocols and applications in use, and the confidence in the evidence, which differs from wired networks. Only a few works have used IDS to counter wireless network attacks in IoT security. The main challenge is the nature of the wireless network: unlike in a wired network, centralized access control is hard to implement because of its distributed nature. IoT devices and networks generate massive amounts of unstructured data. Until now, researchers have usually not had access to complete IoT network data that can be used for intrusion detection research. A wireless intrusion detection system will need to collect as much protocol data from the wireless network as needed.
Moreover, there are specific vulnerabilities in the physical and data link (MAC) layers of wireless networks that were not considered in the design of wired IDS. Therefore, simply deploying a wired IDS as a wireless IDS offers false hope, as it may not detect some wireless-specific attacks, especially at the data link layer. No reliable research work has been conducted to create a standard benchmark dataset in a wireless IoT environment.
Careful selection of datasets is important when training ML-based wireless intrusion detection systems. As discussed, the KDD Cup and NSL-KDD datasets contain traffic features that are detrimental to model accuracy when they are used to train models to detect IoT variants of network intrusions. In IoT networks, wireless traffic carries more critical information at the data link layer. A detailed comparison between wired and wireless data showed that most features relevant to wireless IDS are found in the physical and data link layers. The findings indicate that adjusting feature weights for wireless-specific header information can potentially improve intrusion classification. Currently, to the best of our knowledge, no reliable research has been conducted to create a standard benchmark dataset in a wireless IoT environment. This paper identified a set of high gain features (in