Concept Drift Analysis and Malware Attack Detection System Using Secure Adaptive Windowing

Emad Alsuwat; Suhare Solaiman; Hatim Alsuwat

doi:10.32604/cmc.2023.035126

Open Access

ARTICLE

Emad Alsuwat¹,*, Suhare Solaiman¹, Hatim Alsuwat²

1 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 26571, Saudi Arabia
2 Department of Computer Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah, 24382, Saudi Arabia

* Corresponding Author: Emad Alsuwat. Email: email

Computers, Materials & Continua 2023, 75(2), 3743-3759. https://doi.org/10.32604/cmc.2023.035126

Received 08 August 2022; Accepted 29 January 2023; Issue published 31 March 2023

Abstract

Concept drift is a major security issue that must be resolved, since it presents a significant barrier to the deployment of machine learning (ML) models. Because attackers (and their benign counterparts) change their behavior dynamically, the distribution of test data frequently diverges from that of the original training data over time, resulting in substantial model failures. Owing to their dispersed and dynamic nature, distributed denial-of-service attacks pose a danger to cybersecurity, with serious consequences for users and businesses. This paper proposes a novel design for concept drift analysis and detection of malware attacks such as Distributed Denial of Service (DDOS) in the network. The goal of this combined architecture is to represent data accurately and create an effective cyber security prediction agent. The intrusion detection system and the concept drift of the network are analyzed using secure adaptive windowing with website data authentication protocol (SAW_WDA). The network is analyzed by the authentication protocol to avoid malware attacks. The data of network users is collected and classified using multilayer perceptron gradient decision tree (MLPGDT) classifiers. Based on the classification output, attackers and authorized users are identified. The experimental results are reported for the intrusion detection and concept drift analysis systems in terms of throughput, end-to-end delay, network security, and network concept drift, and for classification in terms of accuracy, memory, precision, and F1 score.

Keywords

Concept drift; machine learning; DDOS; cyber security; SAW_WDA; MLPGDT

1 Introduction

Today's technological world is changing rapidly, making it harder to protect systems and links against malicious attacks or breaches. An intrusion detection system (IDS) is a security technology designed to identify and prevent intrusions in a network system. Because it holds such a large amount of data and information, the Internet poses a variety of challenges to being made secure. Businesses, industries, and different spheres of daily activity all use computer networks. As a result of technological and business advancements, organizations and institutions all over the world have been obliged to build and employ modern networks for safety [1].

A shift in the features of a data stream is known as concept drift. It occurs when the properties of the decision attributes or of the classes to be predicted change unexpectedly between two given time points. Classification quality and the learning mechanism itself may suffer as a result [2]. In ML, concept drift refers to a shift in the relationship between input and output data in a data stream. The data may change in several ways, including (i) gradual changes over time, (ii) recurring or cyclical changes, and (iii) abrupt or sudden changes. Learning models must be able to adjust to such changes swiftly and accurately. An ideal drift detection approach detects drift in incoming communications autonomously. A drift detector may appear to be a simple classifier; however, it is not as straightforward as it seems. The model should usually be rebuilt as soon as feasible after the detector signals a drift [3].

Recent advances in cyber-physical systems (CPS) require learning techniques in embedded applications to work in non-stationary, time-variant contexts [4]. Concept drift learning, sometimes referred to as learning in non-stationary contexts, concentrates on event-driven changes in the CPS environment. As a result of such evolving concepts, changes in the feature data (x) and target variables (y) alter the underlying models built by learning methods. Concept drift detection in CPS reduces the impact of compounding errors and allows for cost-effective predictive maintenance [5]. In this setting, detecting concept drift with ensemble learning algorithms that combine many supervised techniques is impractical, if not infeasible.

A novel unsupervised ML method is required to solve these issues and to manage the complicated data patterns and violations of distributional assumptions hidden in industrial CPS data streams. In supervised ML problems, a classifier is trained on a labeled dataset of training samples with the goal of predicting a target variable. Concept drift in this situation refers to the change over time in the relationship between the input data and the target variable.

Concept drift may emerge in dynamic environments such as e-mail spam detection, where malicious opponents may attempt to change their e-mails to evade spam filters. As a result of these changes in data distribution, ineffective classifiers are unable to categorize newer samples accurately, necessitating the development of algorithms that respond to concept drift [6].

The chief contributions of this paper are as follows:

• To design a novel architecture for concept drift analysis and detection of malware attacks such as DDOS in the network.

• The network has been analyzed for detecting the intrusion and concept drift using secure adaptive windowing with website data authentication protocol (SAW_WDA) integrated with authentication protocol to avoid DDOS attacks and concept drift.

• The user data of the network will be collected and classified using multilayer perceptron gradient decision tree (MLPGDT) classifiers.

• Based on the classification output, the decision for the detection of attackers and authorized users will be identified.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the system model. Section 4 presents the performance analysis. Section 5 concludes the paper.

2 Related Work

Earlier research indicates that concept drift is pertinent to malware detection when static file analysis is performed [7]. Prior studies have looked into methods for identifying concept drift in malware families [8] and for warning human analysts when it is found during malware detection. The work in [9] examines the effectiveness of several machine learning features for detecting fraudulent websites; however, its scope is limited to host-based and content-based features. Lexical, host-based, and content-based features are extracted from each URL and stored as a feature vector. A supervised learning system takes these feature vectors as input and classifies the URLs as harmful or benign. Random Forest (RF), Gradient Boosted Trees (GBT), and Feed Forward Neural Networks (FFNN) are the supervised learning methods employed in that research. The algorithms are trained on datasets of unsafe and benign URLs collected from diverse sources. Although the suggested method is flexible and resistant to a range of threats, it disregards the dynamic nature of websites. Once the same training data falls into the hands of criminals, they can spot patterns in the detection methods and change some elements of harmful websites to get around the security [10].
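As a rough illustration of the feature-vector step described for [9], the following Python sketch derives a few lexical features from a URL string. The particular features and the function name `lexical_features` are assumptions chosen for illustration; they are not the feature set used in [9], and host-based or content-based features (which would require network lookups or page downloads) are left out.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Return a small, hypothetical set of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    # Each URL becomes one feature vector for a downstream classifier.
    for example in ("https://example.com/login", "http://192.0.2.7/a-b@c"):
        print(example, lexical_features(example))
```

A supervised classifier would then be trained on such vectors with harmful/benign labels. Because these features are static, a model trained on them is exposed to exactly the drift the paragraph describes once attackers alter their URLs.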

The machine learning techniques suggested in [11] for identifying fake websites include Lagrangian Support Vector Machine (LSVM), Logistic Regression (LR), Random Forest (RF), and Naive Bayes (NB), along with statistical techniques for finding concept drift in websites. According to the author of [12], only a few studies have produced practical results; that work aims to foster research on intelligent security techniques based on a cyclic process that begins with the discovery of new threats and ends with the analysis and development of prevention measures. The authors of [13] propose a novel Gradient Boosting Decision Tree (GBDT) training technique with tighter sensitivity bounds and more effective noise allocation. To tighten the sensitivity bounds, they analyze the gradient characteristics and the contribution of each tree in the GBDT, and propose adaptively controlling the gradients of the training data in every iteration together with leaf node clipping. Furthermore, they develop a unique boosting structure to distribute the privacy budget among trees, further reducing the accuracy loss. Their studies reveal that their technique outperforms other baselines in terms of model accuracy. Jiang et al. conducted a comprehensive review of several articles that used ML in security domains, resulting in a taxonomy of machine learning models and their applications in cybersecurity [14]. Because labeled data is rarely available in real-world applications, the work in [15] categorized existing unsupervised solutions for detecting anomalies in evolving data. The research in [16] looked at adversarial attacks on PDF malware detectors.

The author of [17] surveyed the state of the art of ML for data streams and presented possible research directions. According to [18], which focuses on label acquisition and model deployment, much existing research is not valid in many use cases. In [19], the author conducted a comparative examination of several methodologies for dealing with imbalanced data, applying them to various data distributions and application domains. The work in [20] investigated some of the limitations and challenges of Deep Learning (DL) methods in a traditional ML workflow for malware detection and classification, such as open benchmarks, class imbalance, concept drift, model interpretability, and adversarial learning.

3 System Model

This section covers the novel design for concept drift analysis and malware attack detection. Websites exhibiting concept drift are predicted and analyzed for DDOS attacks in the network. When data drift is detected, the web server is analyzed and predicted to be undergoing concept drift. SAW_WDA is then employed to minimize the concept drift and intrusion in the network. User data is collected and classified using MLPGDT, and the classified output determines whether a user is a malware user or an authorized user. Fig. 1 depicts the overall system architecture.

Figure 1: Overall architecture of the proposed system

3.1 Secure Adaptive Windowing with Website Data Authentication Protocol (SAW_WDA)

The intrusion detection system and the concept drift of the network are analyzed using secure adaptive windowing with website data authentication protocol (SAW_WDA). The SAW_WDA protocol is used to detect changes through adaptive windowing. The protocol is detailed below.

When no obvious change is found, the window W grows dynamically, and when a change is detected, it shrinks. For every partition W0 · W1 of W, the cut value is evaluated as follows. Let n denote the length of W, μ̂W the observed average of the elements of W, and μW the average of μt for t ∈ W. Let n0 denote the size of W0 and n1 the size of W1, so that n = n0 + n1. Let μ̂W0 and μ̂W1 be the averages of the values in W0 and W1, and μW0 and μW1 their expected values. To achieve the most stringent performance guarantees, the quantities in Eq. (1) are defined.

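$$m = \frac{1}{1/n_0 + 1/n_1} \ \text{(harmonic mean of } n_0 \text{ and } n_1\text{)}, \qquad \delta' = \frac{\delta}{n}, \qquad \epsilon_{cut} = \sqrt{\frac{1}{2m} \cdot \ln\frac{4}{\delta'}} \tag{1}$$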
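The cut threshold of Eq. (1) and the grow/shrink behavior of the window can be sketched in Python as follows. This is a minimal, unoptimized illustration of plain adaptive windowing under the assumption that stream values lie in [0, 1]; the class name `NaiveAdaptiveWindow` is hypothetical, and the authentication part of SAW_WDA is not modeled here.

```python
import math
from collections import deque


def epsilon_cut(n0: int, n1: int, delta: float) -> float:
    """Cut threshold of Eq. (1) for sub-windows of sizes n0 and n1."""
    n = n0 + n1
    m = 1.0 / (1.0 / n0 + 1.0 / n1)   # the "harmonic mean" term of Eq. (1)
    delta_prime = delta / n
    return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))


class NaiveAdaptiveWindow:
    """Window that grows by default and shrinks when a change is detected."""

    def __init__(self, delta: float = 0.01):
        self.delta = delta
        self.window = deque()

    def update(self, x: float) -> bool:
        """Append x, then drop the oldest items while any split of the
        window shows a mean gap above epsilon_cut. Returns True if the
        window was shrunk (i.e. a change was detected)."""
        self.window.append(x)
        shrunk = False
        while self._has_cut():
            self.window.popleft()
            shrunk = True
        return shrunk

    def _has_cut(self) -> bool:
        values = list(self.window)
        n = len(values)
        total = sum(values)
        head = 0.0
        for n0 in range(1, n):          # try every split W0 . W1
            head += values[n0 - 1]
            n1 = n - n0
            gap = abs(head / n0 - (total - head) / n1)
            if gap > epsilon_cut(n0, n1, self.delta):
                return True
        return False


if __name__ == "__main__":
    detector = NaiveAdaptiveWindow(delta=0.01)
    stream = [0.0] * 200 + [1.0] * 200   # abrupt drift at index 200
    for i, value in enumerate(stream):
        if detector.update(value):
            print("window shrunk at index", i)
            break
```

Checking every split on every update costs time linear in the window length, which is acceptable for a sketch; practical implementations usually compress the window into buckets to reduce this cost.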

Let S be a data stream and E an ensemble. When an instance arrives, the internal change detector D is used to train the online classifier incrementally. The ensemble members Ci ∈ E are re-weighted after every incoming instance, rather than recalculating the component classifiers, by Eq. (2).

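$$W_o = \sum_{1 \le t \le T} W(sc(P(t))), \qquad W_e = W(sc(C)) - W_o \tag{2}$$

$$W(sc(P(t_1)) \cap sc(P(t_2))) = a_1 \quad \text{for all } 1 \le t_1, t_2 \le T$$

$$a_1 = \begin{cases} W(sc(P(t_1))) & \text{if } t_1 = t_2 \\ 0 & \text{if } t_1 \ne t_2 \end{cases}$$

$$A(sc(P_{ls}(t_1)) \cap sc(C_f)) + \sum_{1 \le t_2 \le T,\ t_2 \ne t_1} A(sc(P_{ls}(t_1)) \cap sc(P_{us}(t_2))) \ge 0.6 \times A(sc(P_{ls}(t_1))) \quad \text{for all } 1 \le t_1 \le T$$

$$W(sc(P(t_1)) \cap sc(C)) = W(sc(P(t_1))) \quad \text{for all } 1 \le t_1 \le T \tag{3}$$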

When the underlying function that generates instances evolves, concept drift is said to occur. Formally, it can be described as any situation in which the joint probability distribution shifts. It may thus appear as a change in the class prior probabilities, a change in the class-conditional probability density functions (PDFs), or a combination of the two. The transition probability from road segment Rj at time t − 1 to road segment Ri at time t is computed by Eq. (4), where ‘traffic’ denotes the average network traffic.

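$$\Pr(R_j \text{ at } t-1 \mid R_i \text{ at } t) = p_j = \frac{\text{traffic}_j}{\sum_{p=1}^{n} \text{traffic}_p}$$

$$sc(P(t)) = \{(\overset{\leftharpoonup}{x}, \overset{\leftharpoonup}{y}, \overset{\leftharpoonup}{z}) \in \mathbb{R}^3 \mid p_{x,P(t)} \le \overset{\leftharpoonup}{x} \le p_{x,P(t)} + l_{x,P(t)},\ p_{y,P(t)} \le \overset{\leftharpoonup}{y} \le p_{y,P(t)} + l_{y,P(t)},\ p_{z,P(t)} \le \overset{\leftharpoonup}{z} \le p_{z,P(t)} + l_{z,P(t)}\}$$

$$sc(C) = \{(\overset{\leftharpoonup}{x}, \overset{\leftharpoonup}{y}, \overset{\leftharpoonup}{z}) \in \mathbb{R}^3 \mid 0 \le \overset{\leftharpoonup}{x} \le L_x,\ 0 \le \overset{\leftharpoonup}{y} \le L_y,\ 0 \le \overset{\leftharpoonup}{z} \le L_z\}$$

$$sc(C_f) = \{(\overset{\leftharpoonup}{x}, \overset{\leftharpoonup}{y}, \overset{\leftharpoonup}{z}) \in \mathbb{R}^3 \mid 0 \le \overset{\leftharpoonup}{x} \le L_x,\ 0 \le \overset{\leftharpoonup}{y} \le L_y,\ \overset{\leftharpoonup}{z} = 0\}$$

$$sc(C_{fh}) = \{(\overset{\leftharpoonup}{x}, \overset{\leftharpoonup}{y}) \in \mathbb{R}^2 \mid (\overset{\leftharpoonup}{x}, \overset{\leftharpoonup}{y}) \in sc(C_f)\} \tag{4}$$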

Assume that E is an ensemble and S is a data stream. When an instance arrives, the internal change detector D is used to train the online classifier incrementally. Each incoming instance results in a re-weighting of the ensemble members Ci ∈ E, rather than recalculating the component classifiers, by Eq. (2).

$$E = -\sum_{j} \left(p_j \cdot \log p_j\right) \tag{5}$$
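To make the traffic-based probability p_j of Eq. (4) and the entropy of Eq. (5) concrete, the following Python sketch computes both from a list of average traffic values. The function names are hypothetical, and the natural logarithm is an assumption, since the source does not state the base of the logarithm.

```python
import math
from typing import List, Sequence


def transition_probabilities(traffic: Sequence[float]) -> List[float]:
    """p_j = traffic_j / sum_p traffic_p, as in the first line of Eq. (4)."""
    total = sum(traffic)
    if total <= 0:
        raise ValueError("total traffic must be positive")
    return [t / total for t in traffic]


def entropy(probabilities: Sequence[float]) -> float:
    """E = -sum_j p_j * log(p_j), Eq. (5); natural log assumed.

    Terms with p_j = 0 are skipped, matching the limit p*log(p) -> 0.
    """
    return -sum(p * math.log(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    traffic = [120.0, 30.0, 30.0, 20.0]   # hypothetical average traffic values
    p = transition_probabilities(traffic)
    print("p_j:", p)
    print("entropy:", entropy(p))
```

Evenly spread traffic gives the maximum entropy (log n for n segments), while traffic concentrated on a single segment gives zero, so a sudden change in this value is one simple signal that the traffic distribution has shifted.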

The Hoeffding bound asserts that, after n independent observations of a variable with range R, the estimated mean deviates from the true mean by no more than ε with probability 1 − δ, where ε is given by Eq. (6).

$$\varepsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}} \tag{6}$$

where δ ∈ (0, 1) is a user-defined confidence parameter. Let W0 and W1 denote the two sub-windows. With probability 1 − δ, we obtain |μW0 − μW1| ≤ 2ε, where ε is the Hoeffding bound and μW0 and μW1 are the averages of the two sub-windows.
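The bound of Eq. (6) and the 2ε comparison of the two sub-window averages can be sketched in Python as below. The text does not say which n enters ε when two sub-windows are compared; using the size of the smaller sub-window is an assumption made here because it gives the more conservative (larger) ε. The function names are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), as in Eq. (6)."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def subwindows_differ(w0: Sequence[float], w1: Sequence[float],
                      value_range: float, delta: float) -> bool:
    """Return True when |mean(W0) - mean(W1)| exceeds 2 * epsilon.

    Assumption: epsilon is computed from the smaller sub-window size.
    """
    n = min(len(w0), len(w1))
    eps = hoeffding_bound(value_range, delta, n)
    mean0 = sum(w0) / len(w0)
    mean1 = sum(w1) / len(w1)
    return abs(mean0 - mean1) > 2.0 * eps


if __name__ == "__main__":
    old = [0.1] * 50
    new = [0.9] * 50
    print("epsilon:", hoeffding_bound(1.0, 0.05, 50))
    print("drift suspected:", subwindows_differ(old, new, 1.0, 0.05))
```

A gap larger than 2ε is unlikely, at confidence 1 − δ, to arise when both sub-windows share the same true mean, which is why it can be read as evidence of drift.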

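$$P(t):\ p_{P(t)} = (p_{i,P(t)},\ p_{j,P(t)},\ p_{k,P(t)}) \in I_{++}^{3}, \qquad l_{P(t)} = (l_{i,P(t)},\ l_{j,P(t)},\ l_{k,P(t)}) \in I_{++}^{3}$$

$$l_{i,P(t)} = \frac{l_{x,P(t)}}{1_x}, \qquad l_{j,P(t)} = \frac{l_{y,P(t)}}{1_y}, \qquad l_{k,P(t)} = \frac{l_{z,P(t)}}{1_z}$$

$$L_i = \frac{L_x}{1_x}, \qquad L_j = \frac{L_y}{1_y}, \qquad L_k = \frac{L_z}{1_z} \tag{7}$$

Here $I_{++}^{n}$ is the n-dimensional space of positive integers.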

Assume that W’s true mean is

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
