The Internet Control Message Protocol (ICMP) covert tunnel is a network attack that encapsulates malicious data in the data part of ICMP packets for transmission. It is well concealed and hard to discover. Most detection methods only detect the existence of a channel and do not clarify the specific attack intention. In this paper, we propose ICMPTend, an ICMP covert tunnel attack intent detection framework that includes five steps: data collection, feature dictionary construction, data preprocessing, model construction, and attack intent prediction. ICMPTend can detect a variety of attack intentions, such as shell attacks, sensitive directory access, communication protocol traffic theft, filling tunnel reserved words, and other common network attacks. We extract features from these five types of attack intent found in ICMP channels and build a multi-dimensional dictionary of malicious feature keywords covering all of them. Because the features of ICMP traffic are high-dimensional and independent, we use a support vector machine (SVM) as a multi-class classifier. The experimental results show that the average accuracy of ICMPTend is 92%, training takes only 55 s, and prediction takes only 2 s, so ICMPTend can effectively identify the attack intention of ICMP covert tunnel traffic.
An Internet Control Message Protocol (ICMP) covert tunnel is used to transmit special information to processes or users that are prevented from accessing it. It is more hidden and more difficult to detect than malware traffic. The purpose of a covert channel is to send data over the network without being noticed by a third party and without alerting any firewall or Intrusion Detection System (IDS) on the network. Studies have shown that a large website may have 26 gigabytes (GB) of information illegally stolen through covert tunnels in a year, even assuming that an ICMP packet carries only 1 bit of data [
Several researchers have focused on detecting covert channel attacks using multiple methods and techniques. Currently, covert tunnel detection is studied mainly from two angles: traffic behavior and signatures.
The detection method based on traffic behavior uses behavior characteristics such as the maximum, minimum, average time interval, message size, and the ratio of the number of request and response messages within a specified time window as the detection basis. This method takes all traffic within a specified time window as a detection object, and can only determine whether a covert tunnel has been established at both ends of the communication within a certain time window, and cannot locate specific malicious traffic [
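As a rough illustration of the behavior-based approach described above, the following Python sketch computes the window-level statistics named in the text (interval extremes and mean, message size, request/response ratio). The `IcmpPacket` record and the `window_features` function are hypothetical names introduced here and are not part of any cited method. ICMP type 8 (echo request) and type 0 (echo reply) are the standard type codes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class IcmpPacket:
    """Hypothetical minimal record for one captured ICMP message."""
    timestamp: float   # capture time in seconds
    icmp_type: int     # 8 = echo request, 0 = echo reply
    size: int          # message size in bytes


def window_features(packets: List[IcmpPacket]) -> Dict[str, float]:
    """Compute behavior features for all packets inside one time window."""
    ordered = sorted(packets, key=lambda p: p.timestamp)
    # Inter-arrival gaps between consecutive packets in the window.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    requests = sum(1 for p in ordered if p.icmp_type == 8)
    replies = sum(1 for p in ordered if p.icmp_type == 0)
    return {
        "max_interval": max(gaps) if gaps else 0.0,
        "min_interval": min(gaps) if gaps else 0.0,
        "avg_interval": mean(gaps) if gaps else 0.0,
        "avg_size": mean(p.size for p in ordered) if ordered else 0.0,
        # Request/reply ratio; infinity when no reply was seen.
        "req_reply_ratio": requests / replies if replies else float("inf"),
    }


window = [
    IcmpPacket(0.0, 8, 64),
    IcmpPacket(0.5, 0, 64),
    IcmpPacket(2.5, 8, 1024),
]
features = window_features(window)
```

Because every value is aggregated over the whole window, such features can flag a suspicious window but cannot point to the individual malicious packet, which is the limitation noted above.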
Signature-based detection [
By analyzing a large volume of ICMP covert tunnel traffic, we found that the data part of such traffic carries obvious and specific attack intentions, such as shell attacks, access to sensitive directories and other illegal behaviors. Corresponding shell commands, sensitive directories, communication protocol keywords, tunnel reserved words, and common network attack keywords often appear in the data part of malicious ICMP covert tunnel traffic. Examples include the Hypertext Transfer Protocol (HTTP) keyword “www”, the sensitive directory “User” in the Windows operating system, the reserved word “TUNL” in the tunnel tool ptunnel, and the shell command “docker pull nginx; /bin/sh shell.sh”. With these types of keywords as features, the attack intention of the covert tunnel can be effectively detected, and targeted defensive measures can be taken.
A large number of studies have proved that machine learning methods have good generalization in traffic detection. Among them, SVM [
In this paper, we propose ICMPTend, an ICMP covert tunnel attack detector, by extracting the corresponding keyword features for common ICMP covert tunnel attack intent and using SVM as a classifier algorithm.
In summary, we make the following contributions in this paper:
We propose a systematic ICMP covert tunnel attack intent detection framework, ICMPTend, which consists of five steps: data collection, feature lexicon construction, data preprocessing, model construction, and attack intent prediction. It can detect a variety of attack intentions, such as shell attacks, sensitive directory access, communication protocol traffic stealing, filling tunnel reserved words, and other common network attacks.
We build a multi-dimensional malicious feature lexicon containing keywords for shell attacks, sensitive directory access, communication protocol traffic theft, filling tunnel reserved words, and other common network attacks.
The experimental results show that the average accuracy of ICMPTend reaches 92%, the training time is only 55 s, and the prediction time is only 2 s, so it can effectively identify the attack intention of ICMP covert tunnel traffic.
With the rapid development and progress of network technology, our daily work is increasingly dependent on the network. While network technology brings us convenience, it also brings hidden security threats. Many researchers have begun to study the application of artificial intelligence technology in network attack detection [
In the detection method based on statistical behavior, [
Signature-based detection methods focus on matching the data part against specific signatures. Some covert tunnel tools generate traffic with distinct signatures, e.g., icmptunnel generates tunnel traffic with the signature string “TUNL”. Some ICMP covert channels are also used to transmit the content of other protocols, such as HTTP and the Domain Name System (DNS). Keywords such as “TUNL”, “http://” and “DNS” can be used as typical signature features. Detection based on data signatures has two problems. First, it needs to accumulate signatures continuously, cannot detect unknown attacks, and generalizes poorly. Second, its detection unit is a single traffic flow, so it cannot detect a context-sensitive covert tunnel that splits its payload across multiple flows for delivery.
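A minimal sketch of signature matching, assuming the data field is available as raw bytes, may make both limitations concrete. The signature list contains only the three strings quoted above, and `match_signatures` is a hypothetical helper name.

```python
from typing import List

# Signature strings quoted in the text; a real deployment would need a much
# larger, continuously maintained list.
SIGNATURES = [b"TUNL", b"http://", b"DNS"]


def match_signatures(payload: bytes) -> List[bytes]:
    """Return every known signature that occurs in one ICMP data field."""
    return [sig for sig in SIGNATURES if sig in payload]


hits = match_signatures(b"TUNL\x00\x01GET http://example.com/")
misses = match_signatures(b"abcdefghijklmnopqrstuvw")
split_part = match_signatures(b"TU")  # one fragment of a split payload
```

The last call returns no match: when a tool spreads “TUNL” over two packets, neither fragment contains the full signature, so per-flow matching misses it.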
Symbols used in this paper and their meanings are shown in the following
Symbols | Meaning |
---|---|
FD | The feature word numbered |
|∗| | Number of ∗ elements |
 | Feature vectors |
 | The |
 | Term Frequency-Inverse Document Frequency (TF-IDF) value of the corresponding feature |
 | Dimension of the feature vector |
 | The data portion of the |
 | ICMP traffic labels, |
The detection framework of ICMPTend is shown in
Extracting features from the perspective of the specific attack intentions of ICMP covert tunnels turns covert tunnel identification into a multi-class classification task based on attack intent. In this paper, we consider five types of attack intent that occur in large numbers in covert tunnels: shell attacks, access to sensitive directories, stealing communication protocol traffic, filling tunnel reserved words, and common network attacks. The benign samples come from normal ICMP traffic in the campus network of Beijing University of Posts and Telecommunications (BUPT), with a total of 1,000; the malicious samples come from the following sources:
Sample ICMP tunnel traffic collected from sites such as GitHub, 442 entries in total.
Traffic in the campus network that rules and other ICMP covert tunnel detection models judged as malicious and that was then manually sampled and labeled, 659 entries in total.
Malicious traffic constructed by communicating through ICMP covert tunnel tools such as icmptunnel, ptunnel, and icmpsh and then captured using Wireshark, 3,361 entries in total.
A total of 4,462 malicious samples with malicious attack intent were obtained, and the number of samples with specific attack intent of 5 types is shown in
Label | Training set | Testing set | Total |
---|---|---|---|
 | 700 | 300 | 1000 |
 | 660 | 282 | 942 |
 | 520 | 222 | 742 |
 | 686 | 294 | 980 |
 | 474 | 204 | 678 |
 | 784 | 336 | 1120 |
Total | 3824 | 1638 | 5462 |
Feature words are mainly composed of letters, numbers and special symbols, and different feature databases are constructed in different ways. Some commercial software constructs feature databases by directly querying the feature signature of malware [
SHELL_ATTACKS keywords: Shell attacks are essentially composed of various shell commands, which are divided into built-in commands and external commands. Therefore, this paper uses the malicious samples in the training set to collect 78 built-in shell command keywords and 33 common external command keywords, 111 keywords in total. Examples are the built-in command keyword “kill”, which forcibly terminates a started process, and the external command keyword “sh”, which starts a shell script. Together they constitute the keyword set for shell attacks.
ACCESS_SENSITIVE_DIRS keywords: When hackers enter sensitive directories, they may add, delete, change, check, copy, upload and download files there. Therefore, this paper uses the malicious samples in the training set to collect 241 common sensitive directories, sensitive file names, and keywords for sensitive file operations in Linux and Windows operating systems. For example, the sensitive directories “etc” and “bin” in Linux and the keywords “read( )” and “write” of the Python functions used to read and write file contents are used to construct the keyword set for sensitive directory access.
STEAL_PROTOCOLS keywords: After some ICMP covert tunnels are established, traffic from the controlled side using any communication protocol is sent to the control side through the covert tunnel. In this paper, we use the malicious samples in the training set to collect the names of common communication protocols and the keywords related to each protocol, 86 keywords in total, such as “http://”, “www.”, “.com” and “.cn” for the HTTP protocol. They are used to construct the keyword set for the theft of communication protocol traffic.
FILL_RESERVED_WORDS keywords: Some ICMP covert tunnel tools [
COMMON_CYBER_ATTACKS keywords: After an ICMP covert tunnel is established, some attackers send common attack scripts, such as SQL injection, command execution, and cross-site scripting, to the controlled end through the covert tunnel. The controlled end then launches the corresponding attacks on the target server, so the attacker evades tracking by security personnel by using it as an intermediate bridge. In this paper, we use the malicious samples in the training set to collect a total of 150 common network attack keywords, such as “select”, “union” and “from”, frequently used in SQL injection, and “<script>”, “alert” and “<img>”, frequently used in cross-site scripting attacks.
Finally, the five keyword sets are combined into a feature database (FD) containing 637 unique feature words. The composition and description of the feature database are shown in
Keyword | FDi | Feature word instances |
---|---|---|
SHELL_ATTACKS | 111 | rm, cd, mkdir, wget, cat, echo, kill, sh |
ACCESS_SENSITIVE_DIRS | 241 | etc, admin, bin, read, write, db, C: |
STEAL_PROTOCOLS | 86 | http, www, .com, .cn, .gov, https, ssh, scp, irp, if |
FILL_RESERVED_WORDS | 76 | TUNL, tun0, signature, tunnel, DataBuffer |
COMMON_CYBER_ATTACKS | 150 | select, from, union, insert, alert, <script>, <img> |
TOTAL (unique words) | 637 |
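The merge into a single FD can be sketched as follows. The per-category counts in the table sum to 664, while the FD holds 637 unique words, which is consistent with words shared between categories being stored once. The word lists below are only a handful of the instance words shown in the table, and `build_feature_database` is a hypothetical helper name.

```python
from typing import Dict, List

# A few instance words per category taken from the table; the full keyword
# sets described in the text are far larger.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "SHELL_ATTACKS": ["rm", "cd", "mkdir", "wget", "cat", "echo", "kill", "sh"],
    "ACCESS_SENSITIVE_DIRS": ["admin", "bin", "read", "write", "db"],
    "STEAL_PROTOCOLS": ["http", "www", ".com", ".cn", ".gov", "https", "ssh"],
    "FILL_RESERVED_WORDS": ["TUNL", "tun0", "signature", "tunnel"],
    "COMMON_CYBER_ATTACKS": ["select", "from", "union", "insert", "alert"],
}


def build_feature_database(categories: Dict[str, List[str]]) -> Dict[str, int]:
    """Merge the category keyword sets into one word -> index mapping.

    A word that appears in several categories is kept only once, so the
    result can be smaller than the sum of the category sizes.
    """
    fd: Dict[str, int] = {}
    for words in categories.values():
        for word in words:
            if word not in fd:
                fd[word] = len(fd)   # next free index
    return fd


fd = build_feature_database(CATEGORY_KEYWORDS)
```

The index assigned to each word is what later serves as that word's position in the feature vector.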
Data preprocessing is the process of converting the hexadecimal data of ICMP data part into tensors that can be recognized by the machine learning model after hexadecimal decoding, string decoding, and text feature representation. The specific process is as follows:
Step 1: Hexadecimal Decode
The data field of the original ICMP traffic stores data in the form of a hexadecimal stream. In order to extract the text features of the transmitted content, the hexadecimal data needs to be decoded. The decoding function is shown in
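For orientation, hexadecimal decoding of a data field can be sketched in Python as below. This is not the decoding function referred to in the text; `hex_decode` is a hypothetical name, and treating the bytes as UTF-8 with replacement of undecodable bytes is an assumption made here.

```python
def hex_decode(hex_stream: str) -> str:
    """Decode a hexadecimal ICMP data field into text.

    Bytes that are not valid UTF-8 are replaced rather than raising, because
    tunnel payloads may mix text with binary padding (an assumption).
    """
    raw = bytes.fromhex(hex_stream)
    return raw.decode("utf-8", errors="replace")


decoded = hex_decode("636174202f6574632f706173737764")
```

Here the hexadecimal stream decodes to the shell command `cat /etc/passwd`, which exposes keywords that can be looked up in the FD.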
As shown in
Step 2: String Decode
With the continuous development of various encryption technologies, attackers use Uniform Resource Locator (URL) encoding [
As shown in
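A minimal sketch of the string decoding step, covering only the URL (percent) encoding named above, could look like this. The function name `string_decode` and the `max_rounds` limit are hypothetical; any other encoding an attacker might use is outside this sketch.

```python
from urllib.parse import unquote


def string_decode(text: str, max_rounds: int = 3) -> str:
    """Undo URL (percent) encoding, repeating for nested encodings.

    max_rounds is a hypothetical safety limit so that decoding always stops.
    """
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:   # nothing left to decode
            break
        text = decoded
    return text


once = string_decode("cat%20%2Fetc%2Fpasswd")
twice = string_decode("select%2520*%2520from%2520users")
```

The second call shows why the loop is useful: a doubly encoded payload only reveals the keywords `select` and `from` after two rounds.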
Step 3: Text Feature Representation
In this paper, we use term frequency-inverse document frequency (TF-IDF) [
Each word in the FD is numbered, and this number corresponds to the word's index in the feature vector, which has a dimension of
Initialize a vector for each ICMP traffic
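The text feature representation can be sketched as follows. The exact TF-IDF weighting used in the paper is not recoverable from this section, so the sketch assumes the plain variant tf × log(N / df). The tokenizer, the function names, and the four-word FD are hypothetical stand-ins.

```python
import math
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str], fd: Dict[str, int]) -> List[List[float]]:
    """Map each decoded data field to a len(fd)-dimensional TF-IDF vector."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of data fields containing each feature word.
    df = {w: sum(1 for toks in token_lists if w in toks) for w in fd}
    vectors = []
    for toks in token_lists:
        vec = [0.0] * len(fd)              # initialise to all zeros
        for word, idx in fd.items():
            if toks and df[word]:
                tf = toks.count(word) / len(toks)
                idf = math.log(n_docs / df[word])
                vec[idx] = tf * idf
        vectors.append(vec)
    return vectors


fd = {"cat": 0, "etc": 1, "select": 2, "from": 3}
docs = ["cat /etc/passwd", "select name from users", "cat notes"]
vectors = tfidf_vectors(docs, fd)
```

Words outside the FD (such as `passwd` or `users`) never get a dimension of their own, which is how a fixed FD keeps the vector length bounded regardless of what appears in the traffic.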
In order to clarify the specific attack intentions of ICMP covert tunnel, ICMP traffic flow needs to be classified into multiple categories. There are six categories of multi-classification, namely, normal traffic, shell attack, sensitive directory access, communication protocol traffic stealing, filling tunnel reserved words, and common network attacks as shown in the aforementioned
ICMPTend uses SVM as the classification algorithm. SVM retains a distinct advantage on classification problems with small and medium sample sizes, high dimensionality, and linearly inseparable data. The ICMP covert tunnel dataset constructed in this paper is small-sample, high-dimensional, and linearly inseparable, so in theory SVM suits the situation in this paper.
The ICMPTend receives the data part of ICMP traffic as input, and outputs the label of the category to which the traffic belongs. The label corresponds to the specific attack intent. Suppose the training set contains the data part of ICMP traffic.
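A minimal multi-class SVM sketch with Scikit-Learn, the library named in the experimental setup, is given below. The kernel and the value of `C` are assumptions, since this section does not state the hyperparameters, and the three-dimensional toy vectors merely stand in for the 637-dimensional TF-IDF vectors.

```python
from sklearn.svm import SVC

# Toy feature vectors: [shell weight, protocol weight, reserved-word weight].
# They stand in for real TF-IDF vectors; labels reuse names from the text.
X_train = [
    [1.0, 0.0, 0.0], [0.8, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.8],
]
y_train = [
    "SHELL_ATTACKS", "SHELL_ATTACKS",
    "STEAL_PROTOCOLS", "STEAL_PROTOCOLS",
    "FILL_RESERVED_WORDS", "FILL_RESERVED_WORDS",
]

# Linear kernel and C=100 are assumptions made for this sketch only.
clf = SVC(kernel="linear", C=100.0)
clf.fit(X_train, y_train)

# Each prediction is the label of one attack intent.
predictions = clf.predict([[0.9, 0.05, 0.0], [0.0, 0.05, 0.9]])
```

`SVC` handles more than two classes by training one binary classifier per pair of classes and voting, so no extra wrapper is needed for the six-class setting described here.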
In order to verify the effectiveness and practicability of the database-based SVM covert tunnel attack intent detection model proposed in this article, this section answers the following four questions through related experiments:
The software environment used in this paper is Python 3.7, Scikit-Learn 0.21.3, Wireshark 3.2.7.0, the operating system is Ubuntu 16.04, and the hardware environment is Intel(R) Core(TM) i7-8550U @ 1.80 GHz central processing unit (CPU), 8 GB random-access memory (RAM). The goal of this article is to measure the effectiveness of the model, which is essentially a standard multi-class model. Therefore, precision, recall, F1 score, accuracy and macro average are used as evaluation indicators to evaluate the experimental results of the multi-class model. This is shown in the following
Evaluation indicators and formulas | Meanings in this paper |
---|---|
Precision | The percentage of samples correctly identified by the model as a given ICMP covert tunnel intention out of all samples the model assigned to that intention. |
Recall | The percentage of samples correctly classified into a category out of all samples of that category in the data set. |
F1 score | With precision fixed, the larger the F1 score, the larger the proportion of ICMP covert tunnel traffic correctly classified by the model relative to the total number of malicious traffic samples in the data set. With recall fixed, the larger the F1 score, the larger the proportion of ICMP covert tunnel traffic correctly classified by the model relative to the total number of malicious traffic samples identified by the model. |
Accuracy | The percentage of traffic that the model judges correctly out of the total traffic. The higher the accuracy, the more effective the model. |
Macro average | In multi-class evaluation, the higher the macro average, the better the model. |
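The indicators above follow the standard definitions, which can be sketched in plain Python as below. The helper names are hypothetical, and the four-sample label lists are toy data, not results from the paper.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[str],
                      y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for every class that appears in y_true."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(report: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of one metric over all classes."""
    return sum(row[metric] for row in report.values()) / len(report)


y_true = ["NORMAL", "NORMAL", "SHELL_ATTACKS", "SHELL_ATTACKS"]
y_pred = ["NORMAL", "SHELL_ATTACKS", "SHELL_ATTACKS", "SHELL_ATTACKS"]
report = per_class_metrics(y_true, y_pred)
```

The macro average weights every class equally, so a weak class lowers it even when that class has few samples.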
In order to answer Q1 and Q2, we build a feature dictionary based on the attack intent of shell attacks, access to sensitive directories, stealing communication protocol traffic, filling tunnel reserved words, and common network attacks, and construct feature vectors based on the feature dictionary as input to the model. In order to verify the effectiveness of the feature construction method in this paper, the feature vector is input into a separate model for training and prediction, and the effectiveness of the detection is evaluated.
The results of the comparison experiments using SVM, logistic regression (LR), and Naive Bayesian (NB) models are shown in
Model | Class | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|
SVM | NORMAL | 0.91 | 0.90 | 0.90 | 0.92 |
 | SHELL_ATTACKS | 0.89 | 0.89 | 0.89 | |
 | ACCESS_SENSITIVE_DIRS | 0.92 | 0.90 | 0.91 | |
 | STEAL_PROTOCOLS | 0.93 | 0.88 | 0.90 | |
 | FILL_RESERVED_WORDS | 0.91 | 0.90 | 0.90 | |
 | OTHER_CYBER_ATTACKS | 0.89 | 0.86 | 0.87 | |
LR | NORMAL | 0.89 | 0.89 | 0.89 | 0.91 |
 | SHELL_ATTACKS | 0.87 | 0.88 | 0.87 | |
 | ACCESS_SENSITIVE_DIRS | 0.90 | 0.88 | 0.89 | |
 | STEAL_PROTOCOLS | 0.91 | 0.86 | 0.88 | |
 | FILL_RESERVED_WORDS | 0.91 | 0.90 | 0.90 | |
 | OTHER_CYBER_ATTACKS | 0.88 | 0.86 | 0.87 | |
NB | NORMAL | 0.87 | 0.87 | 0.87 | 0.89 |
 | SHELL_ATTACKS | 0.89 | 0.89 | 0.89 | |
 | ACCESS_SENSITIVE_DIRS | 0.86 | 0.80 | 0.83 | |
 | STEAL_PROTOCOLS | 0.89 | 0.88 | 0.88 | |
 | FILL_RESERVED_WORDS | 0.88 | 0.87 | 0.87 | |
 | OTHER_CYBER_ATTACKS | 0.86 | 0.84 | 0.85 | |
Comparing the detection performance of the SVM model with that of the LR and NB models, the SVM model matches or exceeds the other models on every per-class metric and achieves the highest accuracy. This shows that the SVM model is more appropriate for discerning the specific attack intent of the covert tunnel. The feature construction method in this paper is effective, and SVM can be used as a classifier for covert tunnel specific attack intent detection [
The general method of using keywords for classification in machine learning is to split the training samples into words and use the collection of all resulting words as the vocabulary. This yields a high-dimensional vocabulary that often requires further dimensionality reduction. In order to verify the effectiveness of the dimensionality reduction achieved by our feature dictionary, we constructed a comparative experiment before and after dimensionality reduction. The pre-reduction experiment splits the data part of each traffic sample in the training set on spaces and special symbols and uses the set of all resulting words as the feature dictionary, which contains about 30,000 words. The post-reduction experiment adopts the feature dictionary construction method of this paper. The other experimental steps are the same.
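The vocabulary construction of the pre-reduction experiment can be sketched as below. The regular expression is an assumption about what counts as a special symbol, and both function names are hypothetical.

```python
import re
from typing import Dict, List


def split_words(data: str) -> List[str]:
    """Split a decoded data field on whitespace and special symbols."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", data) if w]


def build_vocabulary(training_data: List[str]) -> Dict[str, int]:
    """Collect every distinct word in the training set into a dictionary."""
    vocab: Dict[str, int] = {}
    for data in training_data:
        for word in split_words(data):
            vocab.setdefault(word, len(vocab))   # keep the first index
    return vocab


vocab = build_vocabulary(["cat /etc/passwd", "wget http://example.com/a.sh"])
```

Even these two short payloads produce nine distinct words, including fragments such as `a` and `example` that say nothing about attack intent. This is the kind of noise that grows the dictionary to tens of thousands of entries on a full training set.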
As shown in
In terms of CPU resource consumption, as shown in
In terms of memory resource consumption, as shown in
Dimensions | Class | Precision | Recall | F1 | Accuracy | Train time |
---|---|---|---|---|---|---|
About 30,000 | NORMAL | 0.86 | 0.80 | 0.83 | 0.87 | 4 min |
 | SHELL_ATTACKS | 0.87 | 0.81 | 0.84 | | |
 | ACCESS_SENSITIVE_DIRS | 0.89 | 0.88 | 0.88 | | |
 | STEAL_PROTOCOLS | 0.80 | 0.80 | 0.80 | | |
 | FILL_RESERVED_WORDS | 0.87 | 0.86 | 0.87 | | |
 | OTHER_CYBER_ATTACKS | 0.86 | 0.85 | 0.85 | | |
637 | NORMAL | 0.91 | 0.90 | 0.90 | 0.92 | 0.5 min |
 | SHELL_ATTACKS | 0.89 | 0.89 | 0.89 | | |
 | ACCESS_SENSITIVE_DIRS | 0.92 | 0.90 | 0.91 | | |
 | STEAL_PROTOCOLS | 0.93 | 0.88 | 0.91 | | |
 | FILL_RESERVED_WORDS | 0.91 | 0.90 | 0.90 | | |
 | OTHER_CYBER_ATTACKS | 0.89 | 0.86 | 0.87 | | |
In order to clarify the gap in detection capability between attack intention detection based on multi-class classification and anomaly detection based on binary classification, we aggregate all types of malicious samples into a single malicious class and conduct anomaly detection experiments based on binary classification. We then compare the results of the anomaly detection experiment with the results of the attack intention detection experiment.
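The aggregation step amounts to a relabeling of the same samples, which can be sketched as follows. The function name `to_binary` is hypothetical, and the class names NORMAL and ABNORMAL are those used in the results table.

```python
from typing import List


def to_binary(labels: List[str]) -> List[str]:
    """Collapse every attack-intent label into one ABNORMAL class."""
    return ["NORMAL" if label == "NORMAL" else "ABNORMAL" for label in labels]


binary = to_binary(["NORMAL", "SHELL_ATTACKS", "STEAL_PROTOCOLS", "NORMAL"])
```

Since only the labels change, the feature vectors and the classifier can stay the same in both experiments, which keeps the comparison focused on the difficulty of the task.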
The experimental results are shown in
Model | Category | Precision | Recall | Macro-Precision | Macro-Recall | Accuracy |
---|---|---|---|---|---|---|
Multi-classification | NORMAL | 0.91 | 0.90 | 0.91 | 0.89 | 0.92 |
 | SHELL_ATTACKS | 0.89 | 0.89 | | | |
 | ACCESS_SENSITIVE_DIRS | 0.92 | 0.90 | | | |
 | STEAL_PROTOCOLS | 0.93 | 0.88 | | | |
 | FILL_RESERVED_WORDS | 0.91 | 0.90 | | | |
 | OTHER_CYBER_ATTACKS | 0.89 | 0.86 | | | |
Binary classification | NORMAL | 0.95 | 0.94 | 0.96 | 0.95 | 0.97 |
 | ABNORMAL | 0.96 | 0.96 | | | |
This paper takes the ICMP data part as its starting point, extracts malicious attack intention keywords from five perspectives (shell commands, sensitive directories, communication protocol keywords, tunnel reserved words, and common network attack keywords), and builds the ICMPTend detection model. Compared with constructing feature vectors from a dictionary obtained by splitting all words in the samples, this approach reduces noise interference and greatly reduces the dimensionality of the feature vectors, and it can clarify the attack intention of the malicious traffic contained in the data part.