Real-Time Network Intrusion Prevention System Using Incremental Feature Generation

Security measures are urgently required to mitigate the recent rapid increase in network security attacks. Although methods employing machine learning have been researched and developed to detect various network attacks effectively, these are passive approaches that cannot protect the network from attacks; they detect attacks only after the session has ended. Since such passive approaches cannot provide fundamental security solutions, we propose an active approach that can prevent further damage by detecting and blocking attacks in real time before the session ends. The proposed technology uses a two-level classifier structure: the first-stage classifier supports real-time classification, and the second-stage classifier supports accurate classification. Thus, the proposed approach can be used to determine whether an attack has occurred with high accuracy, even under heavy traffic. Through extensive evaluation, we confirm that our approach can provide a high detection rate in real time. Furthermore, because the proposed approach is fast, light, and easy to implement, it can be adopted in most existing network security equipment. Finally, we hope to mitigate the limitations of existing security systems, and expect to keep networks faster and safer from the increasing number of cyber-attacks.

However, signature-based methods are considerably vulnerable to variants of existing attacks and to newly emerging attacks, especially zero-day attacks [5][6][7]. Therefore, machine learning-based technology for detecting abnormal behaviors (instead of a pattern-dependent method) has recently been developed to overcome these vulnerabilities. Although a number of studies are underway into the proposed machine learning-based technologies, most of them focus on improving detection accuracy; research on improving detection speed to achieve real-time detection is lacking [8]. This is evident from the fact that machine learning-based technologies are applied only to intrusion detection systems (IDSs). Thus far, intrusion prevention, which blocks intrusions in real time, does not have applicable systems using machine learning.
There might be several reasons why a machine learning-based intrusion prevention system (IPS) has not yet been developed; however, the most important reason is the complexity of the machine learning algorithm itself. Most machine learning algorithms are trained on large amounts of data, and classification is then performed by the generated models [9][10][11][12][13][14]. Training machine learning models on large amounts of data takes a considerably long time and requires huge amounts of memory and computing power. To solve these problems, various partitioning-based machine learning techniques have been proposed, and some of the problems can be solved by adopting external cloud systems to mitigate the lack of memory and computational power required by training procedures.
However, classification by a learning model requires considerable computing power and fast speeds. As a solution, high classification speed can be obtained by massively parallel processing using expensive multiple GPUs. In this case, the CPU-GPU latency from transferring and processing a large amount of data can be detrimental to a high-capacity network that needs to transmit packets at high speed without delay [15].
Furthermore, the biggest reason why a machine learning-based IPS is difficult to implement is that it takes too long to generate the features used for machine learning from network traffic. There are several approaches to generating features; however, most studies generate features from each session, rather than from single packets. In this case, features cannot be generated before the session ends, and any attack is detected only after the session ends [5][6][7][16][17]. Thus, attacks cannot be detected in real time. Moreover, with the current approaches, it is difficult to detect an attack even soon after the session ends because of poor classification performance.
Thus, in this study, we propose a method of generating features and detecting attacks in real time before the session ends. In particular, this study makes the following contributions.
(1) A structure for generating features in real time. By presenting the structure for generating features in real time, the proposed method enables early attack detection by determining whether an attack has occurred before the session ends.
(2) High accuracy and real-time attack detection through a two-level classifier. To detect an attack in real time, we propose a unique two-level detection method. We designed a first-stage classifier that can detect attacks at high speed; the second-stage classifier improves detection accuracy. Thus, we implement a classifier with high accuracy while detecting attacks in real time.
(3) Low implementation costs. Although the proposed method uses a two-level classifier, it has the advantage of being applicable to existing equipment because it uses the same classifier (or two similar classifiers) to simplify implementation, compared to other hybrid methods or ensemble-based classifier methods.
The remainder of this manuscript is organized as follows. Section 2 identifies and compares the features in existing work. Section 3 describes the proposed method in detail. Section 4 analyzes the results of the performance evaluation. Finally, Section 5 concludes this study with a brief summary.

Existing Work
Early network intrusion detection systems (NIDSs) use pattern-matching or threshold-based approaches. Such NIDSs can support fast detection but reveal crucial limitations in detecting zero-day attacks. Thus, a lot of research is focusing on machine learning-based approaches. The early machine learning-based NIDS employed a single machine learning algorithm, so it showed weakness in accurately detecting various network attacks. NIDS research using multiple machine learning algorithms has been actively going on. Generally, machine learning-based IDSs are classified into packet-based methods and session-based methods, where the former use packet data for learning, and the latter use session data. The packet-based methods obtain features from raw packet data without a feature extraction technique. The session-based methods require all the session data in order to build features after the session is finished or has expired. Since the information from one entire session is reduced to a small number of statistical values called session features, it can support very high processing speeds.
In this section, we describe in detail the existing work, from early non-machine learning-based approaches to various recent machine learning approaches. We also compare the pros and cons of each approach.

The Non-Machine-Learning Algorithm
The signature-based approaches can be classified into two groups according to whether they support real-time detection or not. One of the most well-known non-real-time detection methods using the signature-based approach is the earliest IDS for monitoring multi-user systems. It can detect some specific types of attack: intrusion attempts, unauthorized intrusions, data breaches, DDoS, and suspicious use. The security policy is converted into rules and stored in a database, and each flow is analyzed to determine whether it was an attack or not based on the data registered in the database. After the flow closes, features are extracted from transmitted and received packet data and are used for detection. Thus, this approach cannot support real-time detection [2].
One of the NIDSs belonging to this category can provide real-time intrusion detection and prevention using the Boyer-Moore pattern matching algorithm in a signature-based manner [2,18]. It compares the header, payload, and size of an incoming packet to pre-registered signatures to identify malicious traffic. However, the system has some issues, such as processing overhead and reliability. It needs to analyze every packet to create a new signature. Nonetheless, it cannot guarantee the reliability of a signature.

The Packet-Based Single-Machine-Learning Algorithm
This approach uses a single-machine-learning algorithm with features obtained from packet data [2]. From the packet-based features, it can detect malicious code in packet payload data similar to an early pattern-matching approach. However, it inherently cannot detect zero-day attacks and attack variants, and the NIDS using this approach can be bypassed via packet fragmentation to avoid detection. By collecting multiple packets of a session rather than a single packet, such a weakness can be mitigated.

The Packet-Based Multiple-Machine-Learning Algorithm
This approach adopts multiple machine-learning algorithms to detect attacks [3]. Multiple algorithms can greatly help increase classification performance but the classification speed can deteriorate. Thus, the main disadvantage of this approach is that it is very difficult to use in large networks because of the slow training and classification speeds [3].

The Session-Based Single-Machine-Learning Algorithm
This approach extracts features from each session and classifies each session to detect abnormal traffic [9][10][11][12][13][14]. Early machine learning-based studies belong to this category. Since it does not use packet data to generate features, but uses a fixed number of features (regardless of the session length or packet size of each session), it can reduce memory usage and simplify the classification algorithm, resulting in high training and classification speeds. Owing to such benefits, we can apply this approach to large-scale networks. However, features can only be generated after the session ends, so when it detects an attack, it has most likely already been completed.

The Session-Based Multiple-Machine-Learning Algorithm
This approach performs training and classification on features extracted from a session, using several classification algorithms. Ensemble and multi-layered methods are well-known types in this category [17,19]. The ensemble method applies several algorithms and combines their results. By doing so, it can significantly improve the detection performance, compared to a single-machine-learning approach. The multi-layered method runs the algorithms serially, with each algorithm operating on the results of the previous one. Generally, this approach combines unsupervised learning and supervised learning. One example first applies k-nearest neighbors (kNN) to obtain multiple partitions, and then applies a decision tree (DT) algorithm to each partition. A multiple-classification algorithm compensates for the weakness of each individual algorithm, reaching very high classification accuracy. However, the classification speed becomes too slow to support real-time attack detection because of the very high computational cost. For some algorithms in this category, it is even impossible to apply them to a real network security system, because the overall implementation cost is too high.
As of now, little research has been done to increase detection accuracy and speed simultaneously. Various approaches have been proposed for overcoming existing technical issues, but real-time detection is still an open problem.

Proposed Algorithm
We propose a method for implementing an NIDS that can process packets received in real time and determine whether an attack has occurred. The proposed algorithm generates the latest features by updating the feature table for each session whenever a packet is received, and it determines whether an attack has occurred using the features. As shown in Fig. 1, the proposed system is configured to simultaneously increase both classification speed and accuracy by utilizing two classifiers. The proposed method has the following features.
• Early attack detection: The proposed method performs intrusion detection whenever a packet is received. Therefore, it can detect intrusions without waiting until the session ends.
• Easy implementation: Although the proposed method is equipped with two classifiers, it is implemented using the simple DT and its variants. Hence, it is considerably easy and simple to implement, and therefore, it is possible to apply the proposed method, without high cost, to an existing system.
The proposed method consists of a classifier to apply whenever a packet is received, and another classifier to apply when a session has ended. The classification executed whenever a packet is received is done by the cumulative packet-based classifier (CPC), and the classification executed after the session ends is done by the terminated flow-based classifier (TFC). A session is composed of a series of two-way packets. Therefore, session $f$ is denoted as $f = \{p_1, p_2, \ldots, p_n\}$ based on the sequence of two-way packets received by the IDS. Here, $f$ is a session consisting of $n$ packets. The session is defined based on a five-tuple, <sip, dip, sport, dport, protocol>, in which sip, dip, sport, and dport denote the source IP, destination IP, source port, and destination port, respectively. Thus, <$ip_1$, $ip_2$, $port_1$, $port_2$, protocol> and <$ip_2$, $ip_1$, $port_2$, $port_1$, protocol> are regarded as the same session if the lifetimes overlap.
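The session definition above can be sketched as a direction-independent lookup key. This is a minimal illustration, not the authors' implementation: the `Packet` type and the `session_key` helper are hypothetical names, and the lifetime-overlap check mentioned in the text is left out for brevity.

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet header fields making up the five-tuple."""
    sip: str
    dip: str
    sport: int
    dport: int
    protocol: str


SessionKey = Tuple[Tuple[str, int], Tuple[str, int], str]


def session_key(pkt: Packet) -> SessionKey:
    """Return the same key for both directions of a two-way session."""
    src = (pkt.sip, pkt.sport)
    dst = (pkt.dip, pkt.dport)
    # Sorting the two endpoints removes the dependence on packet direction.
    first, second = sorted((src, dst))
    return (first, second, pkt.protocol)


fwd = Packet("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
bwd = Packet("10.0.0.2", "10.0.0.1", 443, 40000, "tcp")
other = Packet("10.0.0.1", "10.0.0.2", 40000, 443, "udp")
```

With this key, `fwd` and `bwd` map to the same feature-table entry, whereas `other` (a different protocol) maps to a separate one. A real system would also have to separate sessions that reuse the same five-tuple at non-overlapping times.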
Whenever an IDS or IPS receives a packet, it creates and updates session statistics to generate features for the relevant session. Now, suppose $F_k$ is the feature vector generated using the first $k$ packets received. Assuming the total number of packets of the session is $n$, a total of $n$ feature vectors are created for the session (i.e., $F_1, F_2, \ldots, F_n$). Here, the CPC uses $F_1, F_2, \ldots, F_n$ to classify whether the session is under attack, whereas the TFC uses only $F_n$ to estimate abnormality. It is common to remove sip and dip from the features used to train the CPC and TFC. This is to prevent creation of a specific session-dependent model. Furthermore, in the CPC, dport is also excluded from the features. Now, we describe in detail how features are updated and generated whenever a new packet is received. We also show how the CPC and TFC work.

Incremental Feature Generation
Whenever a packet is received, the proposed algorithm updates information on the session to which the packet belongs, and creates the features required for classification. As shown in Fig. 1, the session information is stored in the feature table, which consists of internal session states and session stateful features. Internal session states are not features, but rather, the information necessary to create features. For example, Last Flow Timestamp (a field included in internal session states) stores the time at which every packet is received. This value is then used to update other values of internal session states or to create other features.
The internal session state is composed of bi-directional flow information and uni-directional flow information, i.e., forward and backward flow information. Whenever a packet is received, the corresponding fields for bi-directional flow information are always updated. Subsequently, fields for forward or backward flow information are updated according to the direction of the packet. Tab. 1 shows some selected fields for bi-directional flow information and shows how they are updated whenever a packet is received. Similarly, Tab. 2 shows a partial set of the fields for forward information, and how to update them. We omit fields for backward flow information since they are almost identical to the forward ones.
Fields that are maintained like the internal session state but serve directly as features for classification are called session stateful features. Session stateful features also contain bi-directional, forward flow, and backward flow information, and the corresponding fields are updated according to the direction of the received packets. Tab. 3 provides some selected fields of bi-directional flow information in session stateful features and shows how we update the fields.
As mentioned earlier, internal session states and session stateful features are updated every time a packet is received. Here, we should note that session stateful features do not include all features required for machine learning and classification. This means that the remaining features must be created from internal session states and session stateful features. Such features are called derived session features. They are not stored or maintained in the feature tables shown in Fig. 1; instead, they are generated temporarily from internal session states and session stateful features whenever required. Derived session features contain fields for bi-directional, forward, and backward flows.
Tab. 4 shows some typical bi-directional derived session features, and they are created by using internal session states and session stateful features. This feature-generation approach allows the system to progressively build session features. Whenever a packet is received, internal session states and session stateful features are updated. When the entire feature set is needed, derived session features are easily created without a high cost. Generally, we incur high overhead to create the entire feature set after the session is terminated. However, incremental feature generation distributes the overhead over time.
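The three groups of fields described in this section can be illustrated with a small sketch. The field names here (`first_ts`, `max_iat`, `mean_pkt_len`, and so on) are assumptions chosen for illustration and are not taken from Tab. 1 to Tab. 4; only the split into internal states, stateful features, and on-demand derived features follows the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SessionEntry:
    # Internal session states: needed to compute features, not features themselves.
    first_ts: float = 0.0
    last_ts: float = 0.0          # plays the role of "Last Flow Timestamp"
    # Session stateful features: stored and used directly for classification.
    pkt_count: int = 0
    total_bytes: int = 0
    fwd_pkt_count: int = 0
    bwd_pkt_count: int = 0
    max_iat: float = 0.0          # largest inter-arrival time seen so far


def update(entry: SessionEntry, ts: float, length: int, forward: bool) -> None:
    """Constant-time update performed for every received packet."""
    if entry.pkt_count == 0:
        entry.first_ts = ts
    else:
        entry.max_iat = max(entry.max_iat, ts - entry.last_ts)
    entry.last_ts = ts
    entry.pkt_count += 1
    entry.total_bytes += length
    if forward:
        entry.fwd_pkt_count += 1
    else:
        entry.bwd_pkt_count += 1


def derived_features(entry: SessionEntry) -> Dict[str, float]:
    """Derived session features: computed on demand, never stored."""
    duration = entry.last_ts - entry.first_ts
    mean_len = entry.total_bytes / entry.pkt_count if entry.pkt_count else 0.0
    rate = entry.total_bytes / duration if duration > 0 else 0.0
    return {"duration": duration, "mean_pkt_len": mean_len, "bytes_per_sec": rate}


entry = SessionEntry()
for ts, length, forward in [(0.0, 100, True), (0.5, 200, False), (2.0, 300, True)]:
    update(entry, ts, length, forward)
features = derived_features(entry)
```

Because each update touches only a fixed number of fields, the per-packet cost does not grow with session length, which is the sense in which the overhead is distributed over time.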

The Cumulative Packet-Based Classifier
If $F_k$ ($k < n$) for a specific attack session partially reflects the characteristics of an attack, it is possible to detect the attack using $F_k$. Here, the smaller the value of $k$, the faster the attack can be classified; however, the probability of incorrect classification may also increase. The overall characteristics of a session can be identified more accurately with an increase in $k$, but more time is spent detecting attacks. Ultimately, it is necessary to decide when to perform classification for the session. If the CPC classifies a session as an attack, the session is no longer processed, and the relevant packet and all subsequent packets of that session received by the IDS are discarded. Therefore, it is necessary to be cautious when classifying an attack in the CPC. In general, when machine learning is employed to detect a network intrusion, the relationship between the sip and dip address values should be used to create a feature (for example, by determining if they are the same); however, the sip and dip address values themselves should be removed from the feature set. To make the CPC more reliable in detecting attacks, all features that can make the model dependent on a particular session should be removed. Hence, sip, dip, and dport are all removed in the proposed method, whereas only sip and dip are removed in the conventional methods.
In general, the CPC returns a class type and a score as the result of classification. The closer the score is to 1, the more reliable the result; the closer the score is to 0, the less reliable it is. Therefore, a minimum CPC score (MCS) should be determined. The higher the MCS, the lower the rate of misclassification by the CPC; however, more packets are then needed to generate features that classify the session with sufficient confidence, so it takes longer to detect an attack. The lower the MCS, the quicker the detection in the CPC, but the higher the probability of error. Therefore, in the proposed method, it is crucial to maintain high classification accuracy and to improve speed at the same time by setting the MCS to an optimal value.
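The trade-off controlled by the MCS can be made concrete with a short sketch. The function names and the toy scores are hypothetical; the comparison uses "greater than or equal to" as an assumption, so that an MCS of 1 remains reachable.

```python
from typing import List, Optional, Tuple

CpcResult = Tuple[str, float]  # (class type, score in [0, 1])


def cpc_decision(label: str, score: float, mcs: float) -> str:
    """Act on a CPC result only if its score reaches the MCS (assumed >=)."""
    if score < mcs:
        return "undecided"
    return "block" if label == "attack" else "allow"


def packets_until_decision(results: List[CpcResult], mcs: float) -> Tuple[Optional[int], str]:
    """Return (k, decision) for the first packet index k whose result is acted on."""
    for k, (label, score) in enumerate(results, start=1):
        decision = cpc_decision(label, score, mcs)
        if decision != "undecided":
            return k, decision
    return None, "undecided"


# Toy CPC outputs for F_1, F_2, F_3 of one attack session (illustrative values).
results = [("benign", 0.60), ("attack", 0.90), ("attack", 0.99)]
low = packets_until_decision(results, mcs=0.50)    # decides early, but wrongly
mid = packets_until_decision(results, mcs=0.85)    # decides on the second packet
high = packets_until_decision(results, mcs=0.95)   # waits for the third packet
```

In this toy session, the lowest threshold acts on the first packet and misclassifies the session as benign, while the higher thresholds wait longer and reach the correct decision.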

The Terminated Flow-Based Classifier
The TFC and CPC use basically the same feature structure; however, unlike the CPC, the TFC performs classification after the session ends. Hence, there is no need to process the session in real time. Therefore, unlike the CPC, it is more advantageous for the TFC to use a classification algorithm with high accuracy rather than considering speed or computational complexity. Furthermore, while the CPC performs learning and classification using all of $F_k$ ($k < n$), the TFC classifies only finished sessions. Therefore, in the TFC, learning and classification are performed using only the $F_n$ features generated based on all packets of the finished session. This method uses the same features as those used in the CPC, but uses one more: dport.
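The difference between the two training sets can be sketched as follows. This is an illustration under assumptions: the helper names and the dictionary-based feature representation are hypothetical, and the CPC is given every available prefix vector F_k passed in by the caller.

```python
from typing import Dict, List, Tuple

Features = Dict[str, object]
Row = Tuple[Features, str]

CPC_EXCLUDED = {"sip", "dip", "dport"}   # CPC drops dport as well
TFC_EXCLUDED = {"sip", "dip"}            # TFC keeps dport


def strip(features: Features, excluded: set) -> Features:
    """Remove session-identifying fields before training or classification."""
    return {name: value for name, value in features.items() if name not in excluded}


def cpc_rows(prefixes: List[Features], label: str) -> List[Row]:
    """One CPC training row per prefix feature vector F_k."""
    return [(strip(f, CPC_EXCLUDED), label) for f in prefixes]


def tfc_row(prefixes: List[Features], label: str) -> Row:
    """A single TFC training row built from the final vector F_n."""
    return (strip(prefixes[-1], TFC_EXCLUDED), label)


prefixes = [
    {"sip": "10.0.0.1", "dip": "10.0.0.2", "dport": 80, "pkt_count": 1, "total_bytes": 60},
    {"sip": "10.0.0.1", "dip": "10.0.0.2", "dport": 80, "pkt_count": 2, "total_bytes": 1500},
]
cpc_training = cpc_rows(prefixes, "attack")
tfc_training = tfc_row(prefixes, "attack")
```

A session with n packets therefore contributes up to n rows to the CPC training set but only one row to the TFC training set.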

Parameter Setting
As described above, the performance of the proposed method varies depending on the MCS. Therefore, after training, the optimal MCS value is set based on the results from classifying the training data. The proposed method uses decision tree algorithms for machine learning. In general, the decision tree algorithm is suitable for an IDS that processes large amounts of data owing to its fast training time, high classification speed, and low memory usage. Of the several decision tree algorithms, the most appropriate should be selected. Therefore, we considered three algorithms, namely DT, random forest (RF), and boosted DT (BDT), and measured the F1-score for each combination of algorithms while increasing the MCS value. Using these results, the optimal MCS for each combination was selected. The ISCXIDS2012 and CICIDS2017 datasets were used for the experiment. For reference, the measurement results using the CICIDS2017 dataset are shown in Fig. 2.
As shown in Fig. 2, the F1-score was the highest when RF and BDT were used among the combinations of first- and second-level classifiers. Here, the F1-score was consistently maintained when the MCS was 0.977 or higher; however, the average detection time increased significantly when the MCS was 0.998 or higher. Therefore, in the proposed method, we conducted experiments by setting the MCS to 0.998 when using RF for the CPC and BDT for the TFC, and by setting it to 1 when using DT and BDT. Similarly, the same method was used to select the best combination of classifiers and the relevant MCS for the ISCXIDS2012 dataset. Thus, the MCS was set to 0.985 when using RF and BDT, and it was set to 1 when using DT and BDT.
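A threshold sweep of this kind could look like the following sketch. It is not the authors' procedure: the data layout, the fallback to the TFC label when no CPC score reaches the MCS, and the use of packet count as a stand-in for detection time are all assumptions made for illustration.

```python
from typing import List, Tuple

CpcResult = Tuple[str, float]
Session = Tuple[str, List[CpcResult], str]  # (true label, CPC results per packet, TFC label)


def f1_score(y_true: List[str], y_pred: List[str], positive: str = "attack") -> float:
    """F1-score with the attack class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate_mcs(sessions: List[Session], mcs: float) -> Tuple[float, float]:
    """Return (F1-score, mean packets used before a decision) for one MCS value."""
    y_true, y_pred, packets = [], [], []
    for true_label, cpc_results, tfc_label in sessions:
        pred, used = tfc_label, len(cpc_results)  # assumed fallback: TFC at session end
        for k, (label, score) in enumerate(cpc_results, start=1):
            if score >= mcs:
                pred, used = label, k
                break
        y_true.append(true_label)
        y_pred.append(pred)
        packets.append(used)
    return f1_score(y_true, y_pred), sum(packets) / len(packets)


# Toy training sessions (illustrative values only).
sessions = [
    ("attack", [("benign", 0.70), ("attack", 0.99)], "attack"),
    ("benign", [("attack", 0.80), ("benign", 0.95), ("benign", 0.99)], "benign"),
    ("attack", [("attack", 0.90), ("attack", 0.99)], "attack"),
]
sweep = {mcs: evaluate_mcs(sessions, mcs) for mcs in (0.60, 0.98)}
```

On the toy data, raising the MCS from 0.60 to 0.98 improves the F1-score while increasing the mean number of packets needed, which mirrors the qualitative trade-off reported for Fig. 2.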

Overall System Operation
The overall operation of the proposed IPS is as follows. When a packet arrives, the IPS first determines whether to receive it or not according to the firewall policy. If the matched policy returns deny, it is discarded. Otherwise, it is accepted, and the internal session states and session stateful features are created or existing features are updated. After that, the system builds entire features for the packet after creating derived session features. It determines if the session for the packet is benign or not through the CPC. If the classification score is higher than MCS, the session is added to the firewall policy blacklist or whitelist based on the class type. Conversely, if it is lower than MCS, the packet is forwarded regardless of the classification result. When the session terminates, the internal session state and session stateful feature data for the session expire and are removed after building the final features. The final determination about the session is done by the TFC; the results are logged and the administrator is notified, if necessary. The overall operation is in Algorithm 1.
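The per-packet flow described in this section can be sketched as follows. Since Algorithm 1 is not reproduced in the text, this is only an approximation under assumptions: the function names, the use of plain sets for the firewall blacklist and whitelist, and the stub classifier are hypothetical, and feature generation is assumed to have happened before `handle_packet` is called.

```python
from typing import Callable, Dict, List, Set, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], Tuple[str, float]]  # returns (class type, score)


def handle_packet(key: str, features: Features, cpc: Classifier, mcs: float,
                  blacklist: Set[str], whitelist: Set[str]) -> str:
    """Return 'drop' or 'forward' for one packet of session `key`."""
    if key in blacklist:          # firewall policy says deny
        return "drop"
    if key in whitelist:          # session already judged benign by the CPC
        return "forward"
    label, score = cpc(features)
    if score >= mcs:              # confident result: update the firewall policy
        if label == "attack":
            blacklist.add(key)
            return "drop"
        whitelist.add(key)
    return "forward"              # low score: forward regardless of the label


def on_session_end(key: str, final_features: Features, tfc: Classifier,
                   log: List[Tuple[str, str]]) -> str:
    """Final determination by the TFC once the session has terminated."""
    label, _score = tfc(final_features)
    log.append((key, label))
    return label


def toy_cpc(features: Features) -> Tuple[str, float]:
    """Stub classifier: becomes confident from the third packet onward."""
    return ("attack", 0.99) if features["pkt_count"] >= 3 else ("attack", 0.50)


blacklist: Set[str] = set()
whitelist: Set[str] = set()
actions = [handle_packet("s1", {"pkt_count": k}, toy_cpc, 0.98, blacklist, whitelist)
           for k in range(1, 5)]
log: List[Tuple[str, str]] = []
final = on_session_end("s1", {"pkt_count": 4}, toy_cpc, log)
```

In this toy run, the first two packets are forwarded, the third triggers blacklisting, and the fourth is dropped by the firewall check without invoking the classifier again.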

The Environment
To evaluate the performance of the proposed method, we compared its performance using various algorithms and two datasets: CICIDS2017 and ISCXIDS2012 [17,20]. For training and testing, the datasets were split in a 6:4 ratio. We chose these datasets because packet and labeling data are available, and therefore, features can be generated using CICFlowMeter. We used 80 features proposed in ISCXIDS2012. However, as described in Section 3, sip, dip, and dport were excluded from the first-level classifier, but only sip and dip were excluded from the second-level classifier. The size and characteristics of each dataset are summarized in Tab. 5. For the performance comparison, we employed a 1D-CNN [21], LSTM [22], and TCN [23], which are deep learning algorithms [24], along with DT and Naïve Bayes (DTNB) as a clustering-based method [25], BDT as a boosted algorithm [26], and DT and RF [27], which are DT categories [18]. The parameter settings for each algorithm are listed in Tab. 6.

Comparison of Detection Rates
Of the various performance indicators in the classifiers used in the NIDS, the most crucial factor is detection rate. If normal and attack sessions cannot be accurately classified, such an algorithm is impractical for an NIDS, regardless of its high classification speed. In this experiment, we measured accuracy, precision, recall, and F1-score to compare the detection performance of each algorithm. The experimental results are shown in Figs. 3 and 4, which indicate that the results are similar, regardless of the dataset type. As seen in the figures, the proposed method using a combination of RF and BDT showed higher performance than the conventional competing methods for all metrics. Furthermore, the method using DT and BDT achieved slightly lower performance than the 1D-CNN and LSTM. In all cases, the proposed method using RF and BDT showed the highest accuracy and F1-score. This clearly demonstrates that the proposed two-level classifier structure is effective in improving accuracy.
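For reference, the four metrics follow their standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Treating the attack class as the positive class is an assumption here, since the text does not state it explicitly.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{align}
```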

Comparison of Detection Times
To detect an attack in real time, it should be possible to detect the attack before the session ends. To evaluate this capability, we measured and compared the time taken from the start of the session to detection of an attack. The shorter the time, the more effective the method is at detecting and defending against an attack in real time. Tab. 7 shows the results from comparing detection times for the proposed and comparison methods. Because all the competing methods are session-based IDSs, classification and detection were performed after the sessions ended. Accordingly, in Tab. 7, the detection times of the session-based methods are expressed as the session duration, assuming no additional processing time existed. In an actual implementation, the comparison methods may take more time than the results shown in Tab. 7. For the proposed method, the table reports the time taken from the start of the session until accurate detection.
As shown in Tab. 7, the proposed method makes use of the CPC to detect attacks even before the session ends. Hence, the speed at which an attack is detected by the CPC is a valid metric for comparison with the other methods. Tab. 7 indicates that the proposed method can detect attacks significantly faster than the conventional methods. In particular, the (DT, BDT) method was at least five times faster than the (RF, BDT) method. The proposed method based on RF and BDT was almost three times faster than the conventional session-based methods. This clearly shows that the proposed method can provide detection speed that is not achievable with conventional methods. Moreover, most session-based methods detect the session end using a timeout value. For TCP, the session end time can be determined by detecting the FIN packet; however, in other protocols, such as UDP, the IDS cannot accurately detect the session end time. Therefore, it needs to estimate the end time after no packets have been transmitted for a predefined duration, which is generally set at 30 s to 120 s. For session-based IDSs in a real environment, the sum of the session duration and the timeout value is defined as the total detection time. Thus, the gap in the actual detection speed between the proposed method and a conventional session-based method becomes larger than that shown in Tab. 7. To compare detection speeds more accurately, it is necessary to compare the speed for each class. Tab. 7 also shows those detection speeds, and the proposed method detects each class much faster than the conventional methods. In particular, the proposed method using (DT, BDT) can significantly reduce the detection time, compared to the proposed method using (RF, BDT). As seen in the previous experiment, the performance from (DT, BDT) is slightly lower than from (RF, BDT) in terms of detection rate.
Therefore, it is advantageous to use (RF, BDT) when detection rate is more important than speed. Conversely, it is better to use (DT, BDT) when speed is more critical than detection rate.
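The effect of the idle timeout on session-based detection time can be illustrated with a small calculation. The function name and the example durations are hypothetical; the 30 s default is the lower end of the timeout range quoted in the text.

```python
def session_based_detection_time(duration_s: float, fin_seen: bool,
                                 timeout_s: float = 30.0) -> float:
    """Earliest time (seconds from session start) a session-based IDS can classify.

    If the session end is signalled explicitly (e.g., a TCP FIN), the session
    duration is the lower bound; otherwise the idle timeout must also elapse.
    """
    return duration_s if fin_seen else duration_s + timeout_s


tcp_time = session_based_detection_time(2.0, fin_seen=True)                     # 2.0
udp_time = session_based_detection_time(2.0, fin_seen=False)                    # 32.0
udp_slow = session_based_detection_time(2.0, fin_seen=False, timeout_s=120.0)   # 122.0
```

For a short UDP session, the timeout dominates the total detection time, which is why the practical gap to a per-packet classifier is larger than the session durations alone suggest.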
Tab. 8 shows the average detection time for each class in the ISCXIDS2012 dataset. As with CICIDS2017, the detection time can be significantly reduced compared to the conventional session-based methods, so it is more suitable to use (DT, BDT) instead of (RF, BDT) if high detection speed is needed.
The detection time is affected by the inter-packet time in a session. That is, with an increase in the inter-packet time in the same session, the detection time also increases. Therefore, instead of measuring the relative detection time, we can compare the performance more accurately by determining how many packets within each session are received before an attack is detected. Tab. 9 summarizes the average number of packets required to detect each class type in CICIDS2017. Tab. 9 indicates that the proposed method requires considerably fewer packets to detect an attack than the conventional session-based methods. Moreover, we can see that the number of packets required for (RF, BDT) is not significantly different from that for (DT, BDT). Tab. 10 summarizes the results using ISCXIDS2012, which are similar to those for CICIDS2017.

System Load
The proposed NIDS should repeatedly classify the session whenever a new packet is received until the class type of a specific session is detected. That is, unlike the conventional session-based NIDS that requires a one-time classification for each session, the proposed NIDS performs more classifications. This can result in significantly higher overhead in the system, compared to the conventional method. Thus, such increased overhead can be an obstacle to real-time processing. The number of packets needed to classify each session becomes the most crucial factor, and should be minimized.
For a more accurate analysis of system loads, the total number of packets included in a session, and the number of packets required before detection, are displayed for each session in Fig. 5. The figure shows that the average number of packets required for detection is less than five in most cases. In particular, for the sessions in which the total number of packets is very high (>1000), the number of packets required for detection tends to stay consistently small, without a significant increase. For example, when using (DT, BDT) for the CICIDS2017 dataset, we observed that, even for a session when the total number of packets was more than 100,000, it is possible to determine whether an attack has occurred with only the first two packets of the session. Thus, the proposed method can classify normal and attacked sessions while maintaining low system loads regardless of the session length. This is a significant characteristic for improving the performance of the NIDS. This characteristic demonstrates that real-time IPS development is a real possibility.

Conclusion
We proposed a new approach that can detect cyberattacks in real time. It is composed of two classifiers, one for processing packets in real time and the other for processing sessions in non-real time, so it can simultaneously increase detection performance in terms of speed and accuracy. In this research, we showed a promising solution enabling a machine learning-based real-time IPS rather than a machine learning-based non-real-time IDS by providing incomparable detection speed and accuracy. Of course, the proposed approach cannot process all the traffic and detect any kind of attack in real time. The hardware platform costs are higher than conventional IDSs since it requires almost twice the processing power, compared to the existing session-based approaches. However, despite these limitations, it is of great significance, showing that it is possible to implement real-time IPS-based rather than IDS-based machine learning algorithms. Future research will find solutions to the shortcomings revealed by this research. In doing so, we believe the proposed approach will improve so it is able to detect and defend against attacks in real time, even on 100-gigabit networks. We also expect that it can protect networks and users from malicious users and various network attacks.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.