Advanced Persistent Threat Detection and Mitigation Using Machine Learning Model



Introduction
Advanced Persistent Threats (APTs) are among the major cyber security attacks, with far-reaching consequences for multinational corporations, governments and the public. Thwarting attackers before they achieve their malicious goals, such as sabotaging a program, taking over infrastructure or stealing credentials, is the ultimate goal of cybersecurity. Xuan et al. [1] debate the relevance of the advanced or sophisticated nature of the threat while defining an APT as a persistent, multi-stage cyber-attack on a target that compromises the organisation by retrieving information, inherently causing maximal loss in terms of finance and cyber damage. In 2018, the annual loss incurred by cyber-attacks such as APTs was predicted to increase by more than six trillion dollars. Since the North Korean-sponsored attack on Sony in 2014 and a devastating distributed denial-of-service (DDoS) attack on Dyn in 2016, most organisations and enterprises have faced increasing rates of cyber-attacks, especially in the form of APTs [2]. Due to the significant losses incurred, a higher ratio of investments is observed in APT detection and prevention systems.
Usually, the steps of an APT can be characterised by well-researched reconnaissance of the target to minimise the attackers' vulnerability, adoption of a weaponised strategy, widespread usage of malware in lateral movement, and finally exfiltration of data from the target organisation. Bahrami et al. [3] list a cyber kill chain (CKC) of seven APT attack stages: reconnaissance, weaponization (mainly phishing, Structured Query Language (SQL) injection, spyware and spam), delivery, exploitation, installation, command and control (C2), and action on objectives (AoO). The loss of secure information from an organisation, whether governmental or commercial, can compromise infrastructure and military installations. Corporate and nation/state-sponsored espionage to procure state-of-the-art technology and intellectual property (IP) is also one of the main objectives of APTs. On that note, an APT attack can be handled on two fronts: an adequate defence or prevention system and a sufficient attack detection system.
Because an APT is by nature a stealthy threat, a good attack detection system should be able to overcome the limitations of the traditional feature-based system by identifying abnormal patterns and attempts in computer networks, correlating them over a long period [4] and differentiating between false positives and false negatives [2]. Similarly, APTs are strategically motivated and well-funded, resulting in a non-repetitive pattern of attacks, which often goes unnoticed by traditional misuse-based detectors that use signature detection patterns of past similar activities [5]. In contrast to signature-based detection, anomaly-based detection can be used to detect any patterns that diverge from normal events in the network (a baseline profile). However, attackers often circumvent the detector by treating network events and system calls as temporal sequences, resulting in underperforming APT detection [6].
Wang et al. [4] classify APT attack detection into two different models based on the host and the network traffic. Host-based detection systems use classification models such as random forest and algorithms such as Naïve Bayes and decision trees to analyse the network connectivity, Central Processing Unit (CPU) usage, memory access, and process creation. Network traffic-based detection collects communication traffic data and analyses it by feature extraction and detection. However, the drawback with the implementation of detection is that it utilises intrusion detection and machine learning. Since attack detection is fundamentally a classification problem, neural networks have been observed to deal with attack recognition effectively. For APT attack recognition, a forward feedback neural network model was proposed by Chen et al. [7]; the integration of support vector machines (SVM) with neural networks and novel recurrent neural networks (RNN) have also been explored [8,9]. Because of the high complexity and the limitations of gradient propagation and network depth, Ameli et al. [10] observe that deep neural networks often handle real-time APT attack identification better than conventional neural networks.
This paper is organised such that the first section details the characteristics of APT along with the type of dataset used and the challenges and opportunities for APT attack detection. A survey of the various APT attacks and measures is detailed in Section 2, while Section 3 outlines the problem formulation for the proposed work. Section 4 details the various attacks detected and the algorithms involved. The defender mechanism is laid out in Section 5 along with its mathematical modelling. Section 6 details the experimental analysis carried out and the results recorded. Based on the observations, a conclusion is drawn in Section 7.
Cyberspace is widely used by several nations, states and governments for carrying out attacks. As a result of these cybernetic skills, disruption of electrical supplies and concerns related to election tampering prevail. Since their discovery, APT attacks have become increasingly damaging, and even high-profile systems are hacked despite complex protection algorithms [11]. Categorized as conventional or unconventional, many APT attacks related to intellectual property, privacy, and finance have taken place in different parts of the world, such as China, Pakistan, and Ukraine [12]. The difference between APT and traditional attacks is given in Fig. 1.

Figure 1: Difference between APT and traditional attacks
To determine whether an attack is an intentional APT, certain criteria are given by the authors in references [13][14][15]. Some of the inferences are given below:
APT attacks can be avoided in several ways: Unexpected or likely assaults require minimum countermeasures and security procedures to prevent such occurrences.
This attack requires slight modification on the part of attackers: If the attacker's goal doesn't require any modification or evasive movement in response to defensive movements, there is an issue in the target's environment.
APT attack uniqueness in its variants: The assault's effectiveness mainly depends on novel approaches and attack methodologies. However, established processes and tools can detect the attack if the techniques aren't novel.
APT attacks are either specific or broad, as the assault isn't consistent across all attack categories. For the purpose of dealing with the attack, it can be segregated into five distinct stages:
Stage 1-Reconnaissance: In stage one, the target is identified and explored thoroughly. It is a vital stage.
Stage 2-Establishing Foothold: Entrance, penetration and a stable presence on the objective are achieved in this stage. As accessing the target's network is the attacker's primary goal, this is the second significant stage.
Stage 3-Staying Undetected: To steal sensitive data and compromise critical components, the attackers need to traverse the target's network while staying hidden or undetectable.
Stage 4-Impairment/Exfiltration: Operations such as delivering commands from the attacker's command and control centre and retrieving corporate data are carried out in this stage. This is the stage where the attacker can destroy or weaken the essential components of the target organization.
Stage 5-Post Impediment/Post-Exfiltration: The attacker's goal is accomplished in this final stage, for example by deactivating critical components, destroying evidence, completing the exfiltration process, and guaranteeing a clean withdrawal from the organization's network.
For building an efficient model, the dataset sets a milestone [16].The commonly used APT datasets are tabulated below in Table 1.

Certain challenges are encountered in the APT attack detection process. A few of these challenges and opportunities for APT detection are discussed in detail in this section.
Long Duration Attacks-APT attacks are sometimes carried out over a long duration, which makes detecting them quite challenging. If the system shows any suspicious behaviour, it has to be correlated with previous behaviour in the system. The situation becomes critical in a large network with many connections, where false positives and incorrect leads are possible.
Combination with Malware detection-In an APT attack, malware has to be delivered in order to establish communication and data exfiltration tunnels. Numerous studies have addressed malware detection; a good example is the work by Sriram et al. [21], who used an end-to-end deep learning algorithm to identify different types of malware of dynamic file size.
Powerful & Determined Attackers-The strength and determination of APT attackers is another challenge. Even if a strong defence is in place, attackers can build a complex tool or strategy to break down the defence system. In particular, with plenty of resources now available, attackers develop new malware and custom tools to attain their goal.

Lack of Dedicated APT Network Intrusion Dataset-Although the popular datasets mentioned in Table 1 are useful, there is still a need for an efficient network intrusion detection dataset for investigation. For example, KDD 99 is a public benchmark widely used for evaluating the performance of IDS networks [22]. While both APT and DoS attacks are present in the dataset, most of the dataset is unsuitable for research and analysis because the host gets compromised gradually. Hence, to reduce false positives and improve IDS performance, the dataset needs to be labelled properly.
Infrastructure-Oriented Challenges-Another challenge in detecting and preventing an APT attack is the infrastructure or the environment-oriented threats like a large number of correlating events, large interconnections and data exfiltration techniques.
Adversarial ML-Based Attack Detection Methods-The last challenge encountered in the detection of APT attacks is that attackers bypass the defence systems without being discovered. Adversarial Machine Learning (AML) misleads the ML classifiers, and the samples created by techniques such as the Fast Gradient Sign and Jacobian-Saliency Map attacks affect deep-learning-based NIDS. Hence, the effectiveness of AML training needs to be increased.

Related Work
In current Intrusion Detection Systems (IDSs), the detection of APTs is a major challenge, as mentioned before. Numerous research studies have been carried out to address this multi-stage attack. A novel host-based APT detector, "SPuNgae", was proposed by the authors in reference [23]; it monitors the network and finds malicious URLs. Similarly, for the detection of data exfiltration, Sigholm et al. [24] utilised a Data Leakage Prevention (DLP) algorithm, which looks for data leakage in the network. By employing Cyber Counterintelligence (CCI) sensors, the location of leaked data is detected accurately. Another approach, called TerminAPT, was presented in [25]; it tracks the data flow taking place in the APT campaign. In an APT campaign, spear phishing is the common method for gathering information regarding the Point of Entry (PoE). In [26], the authors discussed a methodology involving mathematical computational analysis techniques for picking out spam emails. The method identifies tokens and characters of spam behaviour; its only limitation is that the tokens need to be defined thoroughly in the algorithm. Different APT detection & prevention models and ML-based APT methods are also available and are discussed in the upcoming sub-sections. The methodologies proposed so far have several drawbacks, and this work aims to improve on some of them, specifically in terms of accuracy, precision, F1 score and recall.
Various state-of-the-art and tactical detection and prevention approaches have been developed by renowned cyber security researchers for handling APT attacks. In reference [27], the honeypot technique, which can be employed in the production part of the network, is given. The technique diverts adversaries' focus from the intended goal, but it has drawbacks such as the lack of real-time APT detection & prevention and of post-infiltration detection. Big Data Analytics [28], a recent trend in the world of technology and data science, is recognised to be a good technique for detecting APTs. This method is also quite helpful for pattern matching when analysing the flow of the network topology. Unlike the honeypot method, Big Data analytics has disadvantages such as the lack of real-time protection and false positives. Finally, a context-based framework [29] improves user experience in information systems related to medical or context-based systems.
Data labelling is crucial in generating the correct answer when the question arises without certainty.In this process, the Machine Learning technique gets the primary focus.In ML, three types are available: Supervised, Unsupervised, and Semi-supervised.In detecting malicious RDP, supervised and unsupervised ML algorithms are preferred.
Unsupervised ML Algorithms-In this category, the algorithm learns the structure of the input data without the need for explicit labels. Clustering algorithms fall under this type; K-means [33], Agglomerative clustering, Density-based Spatial Clustering for Applications with Noise (DBSCAN) [34], and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [35] receive the most attention.

Attack Detection and Problem Formulation
There is a considerable limitation in the internet event log datasets that depict the behaviour of a real user. The datasets used here are taken from the Windows event log datasets of Los Alamos National Laboratory (LANL). Most of the publicly available datasets facilitate network-based intrusion detection. Sensitive information in the host event logs prevents organisations from distributing such data. This limitation is overcome by simulating the behaviour of users and attackers and generating synthetic datasets. Such approaches may not completely depict user behaviour in a real-world environment, as the datasets are generated purely based on hypothetical assumptions. Datasets are significant for successfully training and testing any machine learning algorithm. The primary limitations of generic intrusion datasets and systems are as follows:
The attack traffic is captured at the external endpoints. When the attack vectors of the APTs are within the internal networks, these datasets are ineffective.
The APT attacks from sophisticated attackers may not be represented, as the distinction between normal and anomalous behaviour is surmised in these datasets.
In semi-supervised learning, the real-world settings are not reflected efficiently, leading to data imbalance. However, supervised models operate optimally.
Real-time detection and prevention systems are limited, and most existing systems work on post-infiltration scenarios.

Dataset Combination
A comprehensive dataset and a unified dataset are combined while preserving the realistic nature of user behaviour.
Comprehensive dataset-The Operationally Transparent Computing Cyber (OpTC) dataset from the Defense Advanced Research Projects Agency (DARPA) is a comprehensive dataset used for anomaly detection. The host-based telemetry records are available in this dataset, and it is the most detailed public dataset that currently exists. Such datasets are privately gathered by executing professional security services or cybersecurity operations within the organization. Despite being generated by simulators, this dataset offers different types of malicious engagements similar to the baseline activity of modern tactics. The complexity and structure of private datasets are replicated for cyber defence research. The Los Alamos National Laboratory's (LANL) Unified Host and Network dataset is another public dataset that has similar features to the OpTC dataset.
Unified Dataset-The data format of unified datasets significantly lowers several barriers to quantitative comparison, reproduction and further research. However, in the Transparent Computing program, the Engagement 3 and 5 datasets are the only publicly available unified datasets for detecting APT attacks. Most of the existing research work uses limited, self-collected attack data. These datasets do not entirely represent sophisticated real-world attacks, as they contain limited attack scenarios. The LANL-based unified dataset is collected over 90 days. Detailed Windows event logs are provided comprehensively in this dataset, along with the missing logoff events. Events are categorized into days despite the obfuscation of timestamps in the dataset. However, only the activities of benign users are available in this dataset.

The Lateral Movement Detection Algorithm
Certain limitations are observed when the comprehensive and unified datasets are used individually. To overcome such limitations, malicious data is injected from the comprehensive dataset into the unified dataset. The attack event patterns and properties are retained as both datasets are gathered within the same organization. However, the mismatch and variations in the hash functions make it challenging to merge the two datasets. To avoid bias in classification using machine learning, the existing hosts can be mapped into a larger group of hosts in the new dataset. In the comprehensive dataset, the collection of malicious logon events is termed M, and in the unified dataset, the collection of benign RDP logon events is termed B.
For each event e_i ∈ M, the source host S_mi is mapped to S_mj, a unique source host randomly selected from an event e_j ∈ B. For the event e_i, the host tuple of user name and destination {U_i, D_i} is mapped to {U_k, D_k}, a randomly selected unique tuple from e_k ∈ B. The modified malicious event e'_i is inserted into the benign dataset chronologically and labelled. Algorithm 1 provides the details of the injection of malicious remote desktop protocol authentication events. μ represents the mean of the benign session duration, σ^2 represents the variance of the benign session duration, x is the set of benign source hosts, and y is the set of malicious source hosts.
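The mapping and chronological insertion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the `LogonEvent` record, its field names and the `inject` function are hypothetical, the random choice is simplified to a consistent per-host and per-tuple mapping, and the session-duration parameters μ and σ^2 are left out.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogonEvent:
    """Hypothetical RDP logon record; field names are assumptions."""
    time: float          # timestamp in arbitrary units
    src: str             # source host
    user: str            # user name
    dst: str             # destination host
    malicious: bool = False


def inject(benign, malicious, rng):
    """Remap malicious events onto benign hosts/tuples and merge by time."""
    benign_sources = sorted({e.src for e in benign})
    benign_tuples = sorted({(e.user, e.dst) for e in benign})
    source_map = {}      # malicious source host -> benign source host
    tuple_map = {}       # malicious (user, dst) -> benign (user, dst)
    injected = []
    for event in malicious:
        if event.src not in source_map:
            source_map[event.src] = rng.choice(benign_sources)
        key = (event.user, event.dst)
        if key not in tuple_map:
            tuple_map[key] = rng.choice(benign_tuples)
        user, dst = tuple_map[key]
        # Keep the original timestamp, swap identifiers, attach the label.
        injected.append(replace(event, src=source_map[event.src],
                                user=user, dst=dst, malicious=True))
    # Chronological insertion into the benign event stream.
    return sorted(list(benign) + injected, key=lambda e: e.time)


benign = [LogonEvent(1.0, "h1", "alice", "srv1"),
          LogonEvent(5.0, "h2", "bob", "srv2")]
malicious = [LogonEvent(3.0, "x9", "mallory", "dc1")]
merged = inject(benign, malicious, random.Random(0))
```

Keeping the maps in dictionaries means that repeated events from the same malicious host stay attributed to one benign host, which preserves the attack pattern across events.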

Defense Mechanism
In a real-time environment, the network defender takes action as soon as the attacker progresses through the system and limits this progress. For this purpose, a dynamic deception model is introduced that uses socket synchronization and IP address generation. These steps use hybrid encrypted communication and block cipher symmetric encryption, respectively. A Hidden Markov Model (HMM) is used for timing selection. The dynamic host configuration protocol (DHCP) enables policy allocation. The action of the defender is controlled by a belief update algorithm. A joint probability distribution is used over the attacker types, and the security states and the capability of the attacker are captured using a belief matrix. The defender can decide whether to save or spend more resources to thwart the reconnaissance missions based on the type of attacker.

Algorithm 1: Malicious Remote Desktop Protocol authentication events injection
Initialize: μ, σ^2, x and y / malicious and benign variables
1: Malicious_AuthTuple ← ("username" + "destination event") / in benign
2: Benign_AuthTuple ← ("username" + "destination event") / in malicious
/ Dictionary mapping malicious and benign data from source
3: Source ← dict{}
4: for each host ∈ y
5: Source

Socket Synchronization and IP Address Generation
Socket communication can be divided into two communication methodologies, based on the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP). Sockets are easy to synchronise, and socket communication is hence chosen as the optimal means in this work. Since the socket doesn't have a fixed port, attackers find it difficult to attack. Fig. 2 represents the communication flow of the encryption. A hybrid end-to-end encryption communication module is designed based on the original socket communication technology. The plaintext message M is hashed using the hash algorithm H. The hash value is then signed with SM2, and package II is further processed using the SM4 block cipher algorithm. At the receiving end, the sender's public key is used to recover the plaintext, and processing with SM4 is carried out again.
A total of 32 rounds of non-linear iteration are carried out with the help of key expansion and packet encryption with SM4 grouping. Thus, a pseudo-random sequence is generated that can be used as the dynamic IP address table. An encryption model using cipher block chaining (CBC) is used, in which the output of the previous round of encryption is XORed with the current input, and the initial parameter takes this role in the first round. Here, the generated key is used as the seed key input.
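The chaining idea behind the dynamic IP address table can be illustrated with a short sketch. It rests on several assumptions: Python's standard library has no SM4 implementation, so a truncated SHA-256 digest stands in for the SM4 block encryption; the function name `address_table`, the seed key, the initial parameter and the documentation prefix 2001:db8::/64 are all hypothetical.

```python
import hashlib
import ipaddress


def address_table(seed_key: bytes, iv: bytes, prefix: str, count: int):
    """Derive a pseudo-random IPv6 address table by CBC-style chaining."""
    net = ipaddress.IPv6Network(prefix)
    host_bits = 128 - net.prefixlen
    previous = iv                      # 16-byte initial parameter
    table = []
    for counter in range(count):
        block = counter.to_bytes(16, "big")
        # CBC-style step: XOR the current block with the previous output.
        mixed = bytes(a ^ b for a, b in zip(block, previous))
        # Stand-in for SM4 encryption under the seed key (assumption).
        previous = hashlib.sha256(seed_key + mixed).digest()[:16]
        host_part = int.from_bytes(previous, "big") % (1 << host_bits)
        table.append(net.network_address + host_part)
    return table


table = address_table(b"seed-key", bytes(16), "2001:db8::/64", 4)
```

Because every output depends on the previous one, both endpoints that share the seed key and initial parameter derive the same address sequence, while an observer without the key cannot predict the next address.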

Hidden Markov Model
A Hidden Markov Model (HMM) is incorporated to identify the future state of the system. Here the Markov chain is capable of determining the transition probability based on the visible states, such that q_t represents the current state and q_{t+1} is the future state.
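As an illustration of how an HMM moves from the current state q_t to an estimate of q_{t+1}, the sketch below implements the prediction step and one step of the standard forward algorithm. The two states, the transition and emission matrices and all probabilities are made-up values for illustration, not parameters from this work.

```python
def predict_next(belief, transition):
    """Distribution over q_{t+1} given the distribution over q_t."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def forward_step(belief, transition, emission, observation):
    """Predict the next state, then correct with the new observation."""
    predicted = predict_next(belief, transition)
    unnormalised = [predicted[j] * emission[j][observation]
                    for j in range(len(predicted))]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# State 0 = benign, state 1 = compromised (hypothetical labels).
transition = [[0.9, 0.1],
              [0.2, 0.8]]
# Observation 0 = no alert, observation 1 = alert.
emission = [[0.8, 0.2],
            [0.3, 0.7]]
belief = [0.5, 0.5]

predicted = predict_next(belief, transition)
posterior = forward_step(belief, transition, emission, observation=1)
```

An alert shifts probability mass towards the compromised state, which is the kind of signal a timing-selection policy could act on.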
Fig. 3 represents the dynamic policy assignment of the proposed work. As shown in the figure, the dynamic policy has two major parts: the dynamic IP address table and the dynamic time. As the first step, a dynamic policy is generated by the dynamic deception domain, which forwards it to the DHCPv6 server depending on the information from the honeypot, intrusion detection system and firewall. The server uses an IPv6 address dynamic protocol to control the lease period, which impacts the IP address. A dynamic timing generation is triggered by the length of the lease period.

Update Algorithm
A heuristic search algorithm called the online defence algorithm is used for identifying defence actions in real-time. The sample-based online defence algorithm generates security alerts on detecting an attacker's progress through the network using the security model structure, paving the way to large-scale domain computing analysis. Blocking a vulnerability or a similar defence action is employed to analyze the progress in assessing the attacking path of the attacker. As far as scalable networks are concerned, the challenge lies in computing the optimal action while deceptively interacting with the attacker. The offline POMDP solver evaluates the optimal action for every belief state before runtime. Even though the solver has higher efficiency, it cannot capture the optimal action in the case of large networks. Zainudin et al. in [16] addressed this issue with the help of Partially Observable Monte-Carlo Planning (POMCP), which can handle a large-scale network.
Compared with offline methods, online methods skip execution and computation stages, resulting in a more scalable approach.Action nodes and belief nodes are the two types of nodes in POMCP.
Action nodes are the child nodes of a belief node that can be reached using actions.
The belief state denotes belief nodes.
In the proposed methodology, a POMCP algorithm with a similar selection process is used, together with a modified belief update procedure that addresses the large observation space problem. In this technique, Algorithm 2 is used to check whether every incoming alert z_i ∈ Z matches Z(s) = Z(e), which is the security state. An alert is generated when the attacker triggers an exploit attempt. Alerts that are not in A(s), on the other hand, will not be generated; hence such alerts are declared false alerts. A generative model is called at the initial stage of simulation to provide the cost, observation and sample success for a particular state and action: (s, ϕ, y, −) ∼ G(s, ϕ, u_r). Here the state-action pair is represented as ∝_t.
Algorithm 2: Belief Update Algorithm
Initialize: n_k, ∝_{t+1} = U a(r,f), added_num = 0
1: procedure Belief_Update(∝_t, u_r, y_r)
2: while added_num < n_k do
3: (s, ϕ) ∼ ∝_t
4: (s, ϕ, y, −) ∼ G(s, ϕ, u_r)
5: if y_{Z(s)} = y^r_{Z(s)} then
6: ∝_{t+1} ← ∝_{t+1} ∪ {s, ϕ}
7: added_num ← added_num + 1

Using successive sampling and generative models, the history of the search tree can be built, as shown in Fig. 4. In a search tree, history is represented by the nodes, while the branches that extend from the nodes denote possible future histories. On the other hand, the greedy tree policy is followed by the MCTS such that the highest value is chosen at the initial stage of simulation.
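The sampling loop of the belief update can be sketched as a particle filter with rejection sampling, which is how POMCP-style belief updates are commonly realised. The code below is a simplified illustration: the toy generative model, the integer state (number of compromised hosts), the action and observation names and the `max_tries` guard are all assumptions, and the comparison uses the whole observation rather than only the alerts relevant to the security state.

```python
import random


def toy_generative_model(state, action, rng):
    """Hypothetical model G: the attacker advances with probability 0.5."""
    advanced = rng.random() < 0.5
    next_state = min(state + 1, 3) if advanced else state
    observation = "alert" if advanced else "quiet"
    return next_state, observation


def belief_update(particles, action, real_observation, model, n_k, rng,
                  max_tries=100_000):
    """Collect n_k simulated successors that reproduce the real observation."""
    new_particles = []
    tries = 0
    while len(new_particles) < n_k and tries < max_tries:
        tries += 1
        state = rng.choice(particles)          # sample a state from the belief
        next_state, simulated = model(state, action, rng)
        if simulated == real_observation:      # keep only matching samples
            new_particles.append(next_state)
    return new_particles


rng = random.Random(1)
belief_t = [0, 1, 0, 1]                         # particles of the current belief
belief_t1 = belief_update(belief_t, "monitor", "alert",
                          toy_generative_model, n_k=50, rng=rng)
```

The `max_tries` bound is a practical safeguard: when the real observation is very unlikely under the model, plain rejection sampling could otherwise loop for a long time.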

Mathematical Modeling
Consider the attack arrival from the user b to host a at time t, Here, σ represents the attack load, and N represents the set of hosts infected under the specific attack.The rate at which the host receives a response is given by: Characterization of the host dropping rate is done by the following expression while considering the attack loads:  For each attack event b towards the host a, the dynamic risk level is modelled using the following expression: Here, ω b represents the risk score.The time at which host b is compromised is estimated by t − t b .Preference ranking and maximum response guarantee of the host are estimated to obtain high response efficiency under stable security conditions.The preference ranking of host a is given by for the risk level vector Further, the end-to-end packet delay is modelled by the following expression considering the processing overhead, queueing delay, propagation delay, transmission delay and processing delay that occurs while ensuring confidentiality, integrity and authentication.
When interacting with the attacker, the defender's optimum action is obtained by solving the POMDP. This can be defined using an equation in which γ satisfies 0 < γ < 1, while c(b_t, u_t, ϕ_t) denotes the cost of the state.
The delay-aware virtual queue Q a is updated using the following expression:

Evaluation Metrics
A cluster of nodes is considered for performing pre-processing, visualization and data analysis. An Intel Xeon E-2224G Processor 4.70 GHz CPU and 32 GB RAM are used for this purpose. 10 Gbps Ethernet is used for interconnecting the nodes. A Microsoft Azure VM is used for training and validating the model under supervised learning conditions. Unsupervised learning is performed on an Intel Xeon W-2200 Processor with 18 AVX-512-enabled cores and 1 TB DDR4 memory. Data visualization is performed using Grafana, and the dataset is ingested into the Datadog cluster by deploying an instance. Python packages such as Pandas, SciPy and NumPy are used for data pre-processing. The Keras and MLlib libraries are used for developing the ML models in Python.

ML Metrics
Various ML techniques are evaluated, and their performance metrics are estimated to define the malicious RDP sessions.
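For reference, the four metrics used throughout the evaluation (accuracy, precision, recall and F1 score) follow from the confusion-matrix counts. The sketch below uses a small made-up label vector, with 1 standing for a malicious RDP session; the function name and the data are illustrative only.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With heavily imbalanced data, as is typical for malicious sessions among benign traffic, precision and recall are more informative than accuracy alone.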

Experiment
K-fold cross-validation is employed to validate the machine learning models with baseline features, as in Table 2. The value of k is set to 10. A comparison is made among the Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Feed-forward Neural Network (FNN), Decision Tree Classifier (DTC) and Adaptive Boosting (AdaBoost) classifiers. The AdaBoost classifier offers better accuracy, precision, F1 score and recall. This is because AdaBoost is designed to boost the performance of existing classifiers. Fig. 5 represents the performance of the AdaBoost classifier in terms of the metrics for a different number of clusters. Fig. 6 shows the cross-validation results for various iterations. It compares the proposed cross-validation model with existing models such as robustness tests [36] and bootstrapping [37].
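The fold construction behind k-fold cross-validation with k = 10 can be sketched in a few lines. This is a generic illustration rather than the evaluation code of this work; the function name, the sample count of 25 and the seed are arbitrary.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one test fold, so every model is scored on data it did not see during training, and the k scores can be averaged per classifier.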
Various attack types are used with multiple levels of stealth, aggression and attack knowledge ranging between high, moderate and low [38,39]. The attack and success probabilities for each condition are estimated. Further, the performance of the proposed model on individual datasets and the combined dataset is compared in Fig. 7 for various parameters. Better classification performance is observed when the combined dataset is tested with malicious traces from a user. Ensemble ML can be used for consolidating stand-alone classifiers to further improve performance. Here, a majority voting algorithm is used for leveraging the ML models in the ensemble. A conservative approach termed weighted voting is used, where weights are assigned based on intuition. The false positives and negatives may be reduced by assigning a higher weight to the classifier with better performance. A series of experiments is conducted to analyse the impact of adversarial attacks on the proposed model. Based on these experiments, it is evident that the proposed model is robust and successfully detects and mitigates various types of adversarial attacks.
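The difference between plain majority voting and weighted voting can be made concrete with a small sketch. The classifier names are taken from the comparison in this section, but the predictions and the weights are hypothetical values chosen only to show the effect of a higher weight on the better-performing classifier.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: classifier name -> predicted label (0 benign, 1 malicious)
    weights:     classifier name -> non-negative weight
    """
    score = {}
    for name, label in predictions.items():
        score[label] = score.get(label, 0.0) + weights[name]
    return max(score, key=score.get)


predictions = {"AdaBoost": 1, "RF": 0, "LR": 0}

# Equal weights reduce to majority voting.
majority = weighted_vote(predictions, {"AdaBoost": 1.0, "RF": 1.0, "LR": 1.0})

# A higher (hypothetical) weight lets the stronger classifier prevail.
weighted = weighted_vote(predictions, {"AdaBoost": 3.0, "RF": 1.0, "LR": 1.0})
```

In this example the majority vote labels the session benign, while the weighted vote follows AdaBoost and labels it malicious.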

Conclusion
In recent years, cyber threats have become a severe danger to people who use the internet extensively for all purposes. Advanced persistent threats (APTs) are among the most complex attacks and last a long time. During the lateral movement stage of an APT attack, a common tool exploited by attackers is the Remote Desktop Protocol (RDP). In this work, malicious attacks in RDP are detected and mitigated by leveraging the event logs in Windows. Multiple datasets are combined to overcome the shortcomings of the individual datasets while remaining faithful to the attack models. The anomalous RDP sessions are detected using a supervised learning algorithm by extracting relevant features. Classification algorithms, namely Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Feed-forward Neural Network (FNN), Decision Tree Classifier (DTC) and Adaptive Boosting (AdaBoost), are evaluated and compared for precision, recall, F1 score and accuracy. It is found that the AdaBoost classifier offers better accuracy, precision, F1 score and recall, recording 99.9%, 99.9%, 0.99 and 0.98, respectively. The dynamic deception model is used as a defence mechanism. It is a combination of block cipher symmetric encryption, HMM, DHCP and a belief update algorithm. Future work is directed towards deploying the system in online learning, hybrid systems and other session-based protocols to test the performance. Further, more event logs and test scenarios can be implemented to improve efficiency.

Figure 4: POMDP environment search tree construction with real action and real observation o

Figure 5: Number of estimators vs. precision, recall and training duration for stand-alone AdaBoost model

Figure 7: Performance evaluation on testing with independent and combined datasets

Table 1: Commonly used APT datasets

Table 2: Estimation of performance metrics during detection of RDP session with ML classifiers