A Secure Intrusion Detection System in Cyberphysical Systems Using a Parameter-Tuned Deep-Stacked Autoencoder

Abstract: Cyberphysical systems (CPSs) are networked systems of cyber (computation, communication) and physical (sensors, actuators) elements that interact in a feedback loop, with the assistance of human intervention. Generally, CPSs underpin critical infrastructures and are considered important in the daily lives of humans because they form the basis of future smart devices. Increased utilization of CPSs, however, poses many threats, which may be of major significance for users. Such security issues in CPSs represent a global concern; therefore, developing a robust, secure, and effective CPS is currently a hot research topic. To address this issue, an intrusion detection system (IDS) can be designed to protect CPSs. When the IDS detects an anomaly, it instantly takes the necessary actions to avoid harm to the system. In this study, we introduce a new parameter-tuned deep-stacked autoencoder based on deep learning (DL), called PT-DSAE, for the IDS in CPSs. The proposed model involves preprocessing, feature extraction, parameter tuning, and classification. First, data preprocessing is performed to eliminate the noise present in the data. Next, a DL-based DSAE model is applied to detect anomalies in the CPS. In addition, the hyperparameters of the DSAE, such as the number of hidden layers, batch size, epoch count, and learning rate, are tuned using a search-and-rescue optimization algorithm. To assess the PT-DSAE model, a series of experiments were performed using data from a sensor-based CPS. Moreover, a detailed comparative analysis was performed to confirm the detection performance of the PT-DSAE technique. The experimental results verified its superior performance on the applied data over the compared methods.


Introduction
In general, sensors are embedded in cyberphysical systems (CPSs) to monitor anomalies and manage intrusions and hazards. Anomaly detection systems (ADSs) have been applied to predict and prevent abnormalities. However, ADSs suffer from false positives (FPs, false alarms) and false negatives (FNs, missed detections), which limit their performance in CPS domains.
In particular, FPs flag legitimate data as anomalous, whereas FNs let anomalous data pass as legitimate. These detection errors produce erroneous values, which are forwarded to the controller and result in suboptimal or destabilizing control actions that compromise system performance. For instance, detection errors can result in catastrophic events, such as reactor dispersion in process control systems, pollution in water distribution systems, and loss of traffic control in smart transportation networks [1].
CPS networks comprise sensors, actuators, and networking modules, and are applied in the fields of power, automation, manufacturing, civil infrastructure, and medicine, among others. Generally, a CPS is a complex system in which physical operations and cyber applications are tightly combined. Although information and communications technology (ICT) in CPSs is highly advanced, cybersecurity is still considered a vital issue in several sectors. Intrusion hazards are among the most serious vulnerabilities in CPSs. In the past few decades, close attention has been paid to the enhancement of CPS security. Intrusion detection (ID) is one of the important means of protecting the integrity of CPSs, and intrusion detection systems (IDSs) are usually applied to prevent attacks effectively. In 1980, Anderson presented the basic notion of ID, which was later followed by a large number of studies on IDSs.
In general, IDS approaches are categorized into two major classes: misuse detection and anomaly detection. In misuse detection, the features of well-known attacks are stored in a database; the audited data are compared against this database, and any match is reported as an intrusion. Although misuse detectors generate few FPs, they have major limitations. For example, developing and maintaining a comprehensive database is a tedious operation, and only well-known attacks can be detected. Many models have been developed for misuse detection. For example, Abbes et al. established a new protocol analysis to enhance the performance of pattern matching. In [2], the authors evaluated ID based on pattern matching. A rule-based expert method has also been used for misuse detection. Moreover, a genetic algorithm (GA) has been employed for misuse detection. Recently, data mining (DM) schemes have been used to develop misuse detection approaches. An extensive review of ID by GAs and DM is available in [3]. However, only a few efforts have been made to classify and detect system intrusions using colored Petri nets. Anomaly detectors model the normal behavior of a network, and an intrusion is defined as a considerable deviation from normal system operation. One of the major benefits of these detectors is their ability to identify previously unknown attacks. Unlike misuse detectors, however, anomaly detectors yield more FPs, which lowers their accuracy.
Some detection approaches depend on clustering models. Recently, several machine learning methods have been extensively applied in anomaly detection. Commonly used anomaly detection (AD) technologies include neural networks (NNs), GAs, and wavelets. Previous works on IDSs have considered either misuse detection or anomaly detection, each of which has major advantages and disadvantages. Earlier IDSs were applied only to identify either misuse or anomaly attacks, whereas IDSs that perform misuse and anomaly detection concurrently have been developed to address these limitations.
In this study, we introduce a new parameter-tuned deep-stacked autoencoder based on deep learning (DL), called PT-DSAE, for the IDS in CPSs. The proposed model comprises preprocessing to eliminate the noise present in the data. Next, a DL-based DSAE model is applied to detect anomalies in the CPS. In addition, hyperparameter tuning of the DSAE is performed by a search-and-rescue (SAR) optimization algorithm to tune the number of hidden layers, batch size, epoch count, and learning rate. To evaluate the experimental outcomes of the PT-DSAE model, a series of experiments were performed on data from a sensor-based CPS.

Literature Review
Different types of detectors have been introduced with machine learning (ML) and NNs. Goh et al. [4] established an unsupervised method for anomaly detection in CPSs based on recurrent neural networks (RNNs) and a cumulative sum approach. Kosek [5] proposed a contextual AD technology for smart grids based on NNs. Krishnamurthy et al. [6] used Bayesian networks, which provide a means for learning causal and temporal relations among cyber and physical variables from unlabeled data. Such models are employed to detect abnormalities and isolate root causes. Jones et al. [7] developed a method based on formal methods to perform AD in CPSs. This approach uses model-free, unsupervised learning to construct a signal temporal logic (STL) formula from the outputs collected during normal operation. Anomalies are then detected by flagging signals that do not satisfy the learned formula. Kong et al. [8] described a scheme based on formal methods for supervised anomaly learning.
Chibani et al. [9] investigated the problems faced in designing fault detection filters for fuzzy systems, considering errors and failures in discrete-time polynomial fuzzy systems. AD is also employed to detect security intrusions in CPSs. Urbina et al. [10] studied the physics-based detection of stealthy attacks on industrial control systems. Earlier works defined detection schemes that do not limit the impact of stealthy attacks; the authors therefore introduced a new metric to measure this impact and showed that attacks can be limited with a better detector configuration. Unlike these schemes, Kleinmann et al. [11] considered the detection of attacks on industrial control networks on the basis of cyber anomalies. Various approaches have also been considered for detecting sensor errors in traffic networks.
Lu et al. [12] reviewed earlier work on AD for traffic sensors and, according to the level of data used, categorized detection methods into three levels: macroscopic, mesoscopic, and microscopic. In general, several data correction approaches have provided practical guidelines for AD in traffic networks. Zygouras et al. [13] developed three methods based on Pearson's correlation, cross-correlation, and multivariate ARIMA to identify faulty traffic readings. They also employed crowdsourcing to resolve ambiguous readings from faulty sensors. Finally, Robinson [14] applied a method based on the correlation between flows at nearby sensors to detect faulty loop detectors.
Fig. 1 shows the process involved in the proposed PT-DSAE model. As depicted, the input data are first preprocessed to remove noise. Then, DSAE-based classification is performed, in which the parameters are optimized using an SAR optimization algorithm.

Stacked Autoencoder
The stacked autoencoder (SAE) applied in this study is built from several autoencoder (AE) layers and a logistic regression (LR) layer, as depicted in Fig. 2. The AE is the fundamental unit of the SAE classification method. It is composed of an encoding step (Layers 1 to 2) and a decoding, or reconstruction, step (Layers 2 to 3). These steps are given in Eqs. (1) and (2), where $W$ and $W^{T}$ (the transpose of $W$) are the weight matrices of the model; $b$ and $b'$ are two different bias vectors of the model; $s$ is a nonlinearity, such as the sigmoid function applied here; $y$ denotes the latent representation of the input $x$; and $z$ is viewed as a prediction of $x$ given $y$, with the same shape as $x$:
$$y = s(Wx + b) \qquad (1)$$
$$z = s(W^{T} y + b') \qquad (2)$$
Several AE layers are stacked in the unsupervised pretraining phase (Layers 1 to 4). The latent representation $y$ computed by one AE is applied as the input to the next AE layer. Each layer is then trained as an AE by minimizing its reconstruction error [15]. The reconstruction error (loss function $L(x, z)$) is evaluated over many iterations. Cross-entropy is applied to measure the reconstruction error, as given in Eq. (3), where $x_k$ and $z_k$ represent the kth elements of $x$ and $z$, respectively:
$$L(x, z) = -\sum_{k} \left[ x_k \log z_k + (1 - x_k) \log (1 - z_k) \right] \qquad (3)$$
The reconstruction error is minimized by applying gradient descent (GD). Hence, the weights and biases in Eqs. (1) and (2) are updated on the basis of Eqs. (4)-(6), where $\eta$ is the learning rate:
$$W \leftarrow W - \eta \frac{\partial L(x, z)}{\partial W} \qquad (4)$$
$$b \leftarrow b - \eta \frac{\partial L(x, z)}{\partial b} \qquad (5)$$
$$b' \leftarrow b' - \eta \frac{\partial L(x, z)}{\partial b'} \qquad (6)$$
Once the layers are pretrained, the network goes through a supervised fine-tuning stage, in which an LR layer is added as the output layer on top of the layers obtained in the unsupervised pretraining phase.
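The encoding, decoding, and cross-entropy steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes tied weights (the decoder reuses the transpose of W), a sigmoid nonlinearity, and a single gradient step on the decoder bias only. All function names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def encode(W, b, x):
    # Eq. (1): y = s(Wx + b); W has one row per hidden unit.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def decode(W, b_prime, y):
    # Eq. (2): z = s(W^T y + b'); the transpose reuses the encoder weights.
    n_inputs = len(W[0])
    return [sigmoid(sum(W[h][k] * y[h] for h in range(len(W))) + b_prime[k])
            for k in range(n_inputs)]

def reconstruction_loss(x, z, eps=1e-12):
    # Eq. (3): cross-entropy between x and z; eps guards against log(0).
    return -sum(xk * math.log(zk + eps) + (1.0 - xk) * math.log(1.0 - zk + eps)
                for xk, zk in zip(x, z))

def step_decoder_bias(b_prime, x, z, learning_rate):
    # For a sigmoid output with cross-entropy loss, dL/db'_k = z_k - x_k,
    # so one gradient-descent step is b' <- b' - learning_rate * (z - x).
    return [bk - learning_rate * (zk - xk)
            for bk, xk, zk in zip(b_prime, x, z)]

# Tiny example: 3 inputs, 2 hidden units, all parameters start at zero.
x = [1.0, 0.0, 1.0]
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0]
b_prime = [0.0, 0.0, 0.0]

y = encode(W, b, x)
z = decode(W, b_prime, y)
loss_before = reconstruction_loss(x, z)
b_prime = step_decoder_bias(b_prime, x, z, learning_rate=0.1)
loss_after = reconstruction_loss(x, decode(W, b_prime, y))
```

With zero parameters every reconstructed element is 0.5, so the initial loss is 3 log 2; the single bias step moves each output toward its target and lowers the loss. A full implementation would also update W and b in the same way.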
In this work, the probability that the input vector $x$ (Layer 4) belongs to class $i$ is given in Eq. (7), where $Y$ denotes the predicted class of the input vector $x$; $W$ and $b$ represent the weight matrix and bias vector; $W_i$ and $W_j$ represent the ith and jth rows of matrix $W$, respectively; $b_i$ and $b_j$ are the ith and jth elements of vector $b$, respectively; and softmax is the nonlinearity applied in this work:
$$P(Y = i \mid x, W, b) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_{j} e^{W_j x + b_j}} \qquad (7)$$
The class with the maximum probability is taken as the predicted label ($y_{pred}$) of the input vector $x$, as given in Eq. (8):
$$y_{pred} = \arg\max_{i} P(Y = i \mid x, W, b) \qquad (8)$$
The prediction error on a sample data set $D$, $Loss(D)$, is estimated on the basis of the true labels, as given in Eq. (9), where $y_j$ denotes the true label of $x_j$:
$$Loss(D) = -\sum_{j=1}^{|D|} \log P(Y = y_j \mid x_j, W, b) \qquad (9)$$
$Loss(D)$ is minimized by applying GD, in the same way as the reconstruction error is minimized.
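The softmax output layer, the predicted label, and the data set loss described above can be sketched as follows. This is an illustrative plain-Python version under the assumption that the loss is the negative log-likelihood of the true labels; the function names are hypothetical.

```python
import math

def softmax(scores):
    # Subtracting the maximum keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(W, b, x):
    # One score per class: W_i x + b_i, then softmax over classes.
    scores = [sum(w * xi for w, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    return softmax(scores)

def predict(W, b, x):
    # The predicted label is the class with the highest probability.
    probs = class_probabilities(W, b, x)
    return max(range(len(probs)), key=probs.__getitem__)

def negative_log_likelihood(W, b, data):
    # data is a list of (x, true_label) pairs.
    return -sum(math.log(class_probabilities(W, b, x)[label])
                for x, label in data)

# Tiny example with two classes and two input features.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
probs = class_probabilities(W, b, [2.0, 0.0])
label = predict(W, b, [2.0, 0.0])
loss = negative_log_likelihood(W, b, [([2.0, 0.0], 0), ([0.0, 2.0], 1)])
```

In this example the first class receives the larger score, so it becomes the predicted label; the loss shrinks as the probability assigned to the true labels grows.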

Parameter Optimization of a Deep-Stacked Autoencoder
In SAE networks, the pretraining stage is essential for obtaining good weights with the help of an optimization method; these weights serve as the initial parameters of the deep AE network, and the optimized parameters are then used to achieve the best detection accuracy. One effective method for this purpose is backpropagation (BP), which relies on GD. However, on large data sets BP suffers from a low convergence speed and a tendency to fall into a local extremum. Here, the L-BFGS method is applied to estimate the initial parameters. L-BFGS is a prominent limited-memory quasi-Newton method that can be applied to large-scale optimization problems, and it can search for optima with a high convergence speed. The procedure of L-BFGS is defined in Algorithm 1. The main objective is to identify the optimal parameters θ by minimizing a function f(x), where f(x) is a nonlinear, continuously differentiable objective function. The objective function is given in Eq. (4). Here, $H_k$ represents the inverse Hessian approximation, which is updated at each iteration to obtain $H_{k+1}$. In classical quasi-Newton methods, $H_k$ is dense, so storing and manipulating the matrix becomes infeasible in memory and computation as the problem size grows. In contrast, the L-BFGS approach does not store the full n × n inverse Hessian matrix; it keeps an implicit representation of $H_k$ in the form of vector pairs $\{s_k, y_k\}$. Specifically, the method keeps the r most recent correction pairs $\{s_i, y_i\}_{i=k-r}^{k-1}$ and uses them to carry out the update. The cost of every iteration is therefore low, and the L-BFGS approach exhibits a high execution speed and strong robustness.
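The idea of replacing the dense matrix with r stored correction pairs can be illustrated with the standard two-loop recursion. The sketch below is a generic textbook-style L-BFGS in plain Python, not the paper's Algorithm 1; the backtracking line search, the initial scaling, and all function names are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_loop_direction(grad, pairs):
    # Computes -H_k * grad from the stored (s, y) pairs (oldest first)
    # without ever forming the n x n matrix H_k.
    q = list(grad)
    history = []
    for s, y in reversed(pairs):              # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        history.append((alpha, rho))
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    # Initial scaling of the inverse Hessian (a common heuristic).
    gamma = dot(pairs[-1][0], pairs[-1][1]) / dot(pairs[-1][1], pairs[-1][1]) if pairs else 1.0
    r_vec = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(pairs, reversed(history)):  # oldest first
        beta = rho * dot(y, r_vec)
        r_vec = [ri + (alpha - beta) * si for ri, si in zip(r_vec, s)]
    return [-ri for ri in r_vec]

def lbfgs_minimize(f, grad_f, x0, r=5, max_iter=100, tol=1e-8):
    x, g, pairs = list(x0), grad_f(x0), []
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        d = two_loop_direction(g, pairs)
        # Backtracking line search with the Armijo condition.
        step, fx, slope = 1.0, f(x), dot(g, d)
        while step > 1e-12 and f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:                 # keep only pairs with positive curvature
            pairs.append((s, y))
            if len(pairs) > r:                # limited memory: drop the oldest pair
                pairs.pop(0)
        x, g = x_new, g_new
    return x

# Example: a simple convex quadratic with minimum at (1, -2).
objective = lambda v: (v[0] - 1.0) ** 2 + 10.0 * (v[1] + 2.0) ** 2
gradient = lambda v: [2.0 * (v[0] - 1.0), 20.0 * (v[1] + 2.0)]
solution = lbfgs_minimize(objective, gradient, [0.0, 0.0])
```

Memory use grows with r times the number of parameters rather than with the square of the number of parameters, which is what makes the method practical for pretraining large networks.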

A Deep-Stacked Autoencoder Model Based on Search and Rescue
To enhance the training process of the L-BFGS model, a SAR optimization algorithm (SAROA) is employed. In SAR, the positions of the humans correspond to the solutions obtained for the optimization problem, and the amount of clues found at these positions corresponds to the objective function value of those solutions [16]. Fig. 3 shows the flowchart of SAROA.
Here, M and X denote the memory matrix and the position matrix of the intrusions in a CPS, respectively; $X_{N1}$ denotes the position in the first dimension of the Nth member, and $M_{1D}$ denotes the position in the Dth dimension of the first memory entry. The algorithm has two phases, a social phase and an individual phase, described in the following.
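Because the SAR algorithm is used to tune the number of hidden layers, batch size, epoch count, and learning rate, each position can be read as one hyperparameter configuration. The sketch below shows one possible decoding of a position vector into such a configuration. The bounds, the unit-interval encoding, and the log scale for the learning rate are hypothetical choices for illustration; the text does not state the search ranges.

```python
# Hypothetical search ranges; the source does not specify them.
HYPERPARAMETER_BOUNDS = {
    "hidden_layers": (1, 5),
    "batch_size": (16, 256),
    "epochs": (10, 200),
    "learning_rate": (1e-4, 1e-1),
}

def decode_position(position):
    """Map a position in [0, 1]^4 to a DSAE hyperparameter configuration."""
    config = {}
    for name, u in zip(HYPERPARAMETER_BOUNDS, position):
        low, high = HYPERPARAMETER_BOUNDS[name]
        u = min(max(u, 0.0), 1.0)             # clamp to the unit interval
        if name == "learning_rate":
            # Interpolate on a log scale so small rates are sampled evenly.
            config[name] = low * (high / low) ** u
        else:
            # Integer-valued hyperparameters are rounded to the nearest value.
            config[name] = int(round(low + u * (high - low)))
    return config

middle = decode_position([0.5, 0.5, 0.5, 0.5])
```

The objective function of a position would then be a validation score of the DSAE trained with the decoded configuration, so that a higher value means a better clue.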
In the social phase, a random clue is selected, and the search direction is obtained from Eq. (11), where $X_i$, $C_k$, and $SD_i$ denote the position of the ith intrusion, the position of the kth clue, and the search direction of the ith member, respectively, and k is a random integer between 1 and 2N, chosen such that k ≠ i.
Importantly, the search should be organized so that group members do not repeatedly search the same positions. Therefore, not all dimensions of $X_i$ are changed by the move in the direction of Eq. (11); this restriction is applied using a binomial crossover operator. Moreover, if the selected clue is better than the clue at the current position, the region around the clue along the $SD_i$ direction is searched (Area 1); otherwise, the search continues around the current position along the $SD_i$ direction (Area 2). The update applied in the social phase is given in Eq. (12), where $X'_{i,j}$ denotes the new position in the jth dimension for the ith intrusion; $C_{k,j}$ represents the position in the jth dimension of the kth clue; $f(C_k)$ and $f(X_i)$ are the objective function values of the solutions $C_k$ and $X_i$, respectively; r1 is a random value with a uniform distribution in [−1, 1]; r2 is a uniformly distributed random value in [0, 1] that is drawn anew for each dimension, whereas r1 is fixed across dimensions; $j_{rand}$ is a random integer between 1 and D, which ensures that at least one dimension of $X'_{i,j}$ differs from $X_{i,j}$; and SE is an algorithm parameter between 0 and 1. Eq. (12) is applied to obtain the new position of the ith intrusion in all dimensions.
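The social phase can be sketched in plain Python as follows. The update rule used here follows the form commonly given for the SAR algorithm (search direction X_i − C_k, moved from the clue when the clue is better and from the current position otherwise); this form is an assumption, since the corresponding equations are not reproduced in the text above. The objective is treated as a quantity to maximize, and all function names are hypothetical.

```python
import random

def social_phase(i, X, C, f, SE=0.05, rng=random):
    """Return a candidate position for member i.

    X: list of N current positions; C: list of 2N clues whose first N rows
    are the positions in X and whose last N rows are the memory entries.
    """
    D = len(X[i])
    k = rng.choice([idx for idx in range(len(C)) if idx != i])  # k != i
    r1 = rng.uniform(-1.0, 1.0)       # one value shared by all dimensions
    j_rand = rng.randrange(D)         # guarantees at least one changed dimension
    clue_is_better = f(C[k]) > f(X[i])
    candidate = list(X[i])
    for j in range(D):
        r2 = rng.random()             # drawn anew for each dimension
        if r2 < SE or j == j_rand:    # binomial crossover
            direction = X[i][j] - C[k][j]
            base = C[k][j] if clue_is_better else X[i][j]
            candidate[j] = base + r1 * direction
    return candidate

# Tiny example: two members, two memory entries, objective = sum of coordinates.
X = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
M = [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
C = X + M
objective = sum
sparse = social_phase(0, X, C, objective, SE=0.0, rng=random.Random(0))
dense = social_phase(0, X, C, objective, SE=1.0, rng=random.Random(1))
```

With SE = 0 only the dimension j_rand can change, while with SE = 1 every dimension is updated, which illustrates how SE controls how strongly other members' clues influence the move.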
In the individual phase, each member searches around its present position, using the clues from the social phase to guide the search. Unlike in the social phase, all dimensions of $X_i$ are modified in the individual phase. The new position of the ith intrusion is obtained from Eq. (13), where k and m are random integers ranging from 1 to 2N. To avoid movement along other clues, k and m are chosen such that i ≠ k ≠ m. r3 is a random value with a uniform distribution between 0 and 1.
In metaheuristic approaches, solutions must lie within the solution space. When a solution leaves the considered solution space, it needs to be modified. Thus, when a new position falls outside the solution space, Eq. (14) is applied to change the new position, where $X^{max}_j$ and $X^{min}_j$ are the upper and lower bounds for the jth dimension, respectively.
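The individual-phase move and the boundary handling can be sketched as follows. Both rules are assumptions based on the form commonly given for the SAR algorithm: the move adds a random fraction of the difference between two clues, and a component that leaves the bounds is replaced by the midpoint between the previous position and the violated bound. Function names are hypothetical.

```python
import random

def individual_phase(i, X, C, rng=random):
    """Return a candidate position for member i; all dimensions are modified."""
    others = [idx for idx in range(len(C)) if idx != i]
    k, m = rng.sample(others, 2)      # distinct indices, both different from i
    r3 = rng.random()                 # uniform in [0, 1)
    return [x + r3 * (ck - cm) for x, ck, cm in zip(X[i], C[k], C[m])]

def repair_bounds(candidate, previous, lower, upper):
    """Pull out-of-range components back inside the solution space."""
    repaired = []
    for value, old, low, high in zip(candidate, previous, lower, upper):
        if value > high:
            value = (old + high) / 2.0
        elif value < low:
            value = (old + low) / 2.0
        repaired.append(value)
    return repaired

# Example of the boundary repair on the unit cube.
fixed = repair_bounds([1.5, -0.5, 0.3], [0.5, 0.5, 0.5],
                      [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Using the midpoint instead of clipping keeps the repaired position strictly inside the bounds whenever the previous position was inside them.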
In each iteration, the group members search according to these two phases. If the value of the objective function at the new position $X'_i$, $f(X'_i)$, is higher than the existing one, $f(X_i)$, the previous position $X_i$ is saved at a random location in the memory matrix (M) with the help of Eq. (15), and the new position is accepted with the help of Eq. (16); otherwise, the new position is discarded and the memory remains the same. Here, $M_n$ denotes the position of the nth clue saved in the memory matrix, and n is a random integer from 1 to N. These memory updates enhance the diversity of the algorithm and its capability to identify a global optimum.
In an SAR operation, time is a significant factor, because when people are wounded, any delay by the SAR teams may prevent the teams from finding them in time. Hence, the search described above must cover a large space within a limited time. Initially, the unsuccessful search number (USN) is set to 0 for all group members. When a member finds a better clue, its USN is reset to 0; otherwise, it is increased by 1, as given in Eq. (17), where $USN_i$ is the number of times member i has been unable to find a better clue. If the USN exceeds the maximum unsuccessful search number (MU), then a random position in the search space is selected by Eq. (18), and $USN_i$ is reset to 0. In Eq. (18), r4 refers to a random value with a uniform distribution ranging from 0 to 1, which differs from one dimension to another.
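The acceptance rule, the memory update, and the restart after too many unsuccessful searches can be sketched together. This is an illustrative plain-Python version that treats the objective as a quantity to maximize; the uniform restart formula and the function names are assumptions rather than the paper's exact equations.

```python
import random

def accept_or_reject(i, candidate, X, M, usn, f, rng=random):
    """Accept a better candidate, archive the old position, track failures."""
    if f(candidate) > f(X[i]):
        n = rng.randrange(len(M))     # random slot in the memory matrix
        M[n] = list(X[i])             # save the previous position as a clue
        X[i] = list(candidate)        # move to the new position
        usn[i] = 0                    # successful search: reset the counter
        return True
    usn[i] += 1                       # unsuccessful search
    return False

def restart_if_stuck(i, X, usn, lower, upper, MU, rng=random):
    """Relocate member i at random once its failure count exceeds MU."""
    if usn[i] > MU:
        # A fresh random number is drawn for every dimension.
        X[i] = [low + rng.random() * (high - low)
                for low, high in zip(lower, upper)]
        usn[i] = 0
        return True
    return False

# Tiny example with one member and one memory slot.
X = [[0.0, 0.0]]
M = [[9.0, 9.0]]
usn = [0]
accepted = accept_or_reject(0, [1.0, 1.0], X, M, usn, sum, rng=random.Random(0))
rejected = accept_or_reject(0, [0.5, 0.5], X, M, usn, sum, rng=random.Random(0))
```

After the first call the old position sits in memory and the member has moved; the second candidate is worse, so only the failure counter changes.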
Generally, SAR has two control parameters: the social effect (SE) and the MU. The SE controls the influence of group members on one another in the social phase and lies in the range [0, 1]. Higher values of SE increase the convergence rate and reduce the global search ability of the method. The MU parameter specifies the maximum number of unsuccessful searches allowed before a clue is abandoned. It lies in the range [0, 2 × $T_{max}$], where 2 × $T_{max}$ is the maximum number of searches and $T_{max}$ is the maximum number of iterations. With large values of MU, the search around a clue continues for longer; with small values, group members finish exploring the current clue early and move on to another position. Therefore, MU is set in relation to the dimension of the problem: as the search space grows, the permitted number of unsuccessful searches grows as well. Here, SE is set to 0.05, and MU is determined by Eq. (19). Analysis of the SAR parameters shows that these predefined values for SE and MU can be applied to identify CPS intrusions.

Performance Validations
For the experimental analysis, a series of experiments were performed on the NSL-KDD dataset, which includes samples from five classes (four attack types and normal traffic). This dataset contains a total of 45,927 samples of denial-of-service (DoS) attacks, 995 samples of R2L attacks, 11,656 samples of probe attacks, 52 samples of U2R attacks, and 67,343 normal samples, as shown in Tab. 1. Fig. 4 presents details related to this dataset.
With regard to the F-measure, Fig. 7 shows that the IDBN method attained the weakest classifier outcome, with a low F-measure of 0.908. The AK-NN method surpassed the IDBN method with an F-measure of 0.9292. In line with this, the DL approach generated an F-measure of 0.9412, while an acceptable F-measure of 0.9508 was generated by the DPC-DBN framework. Likewise, the DT scheme attained a reasonable F-measure of 0.9542, followed by the AdaBoost, RF, SVM, and T-SID methods, which attained close F-measure values of 0.9568, 0.9592, 0.9655, and 0.9729, respectively. Importantly, the proposed PT-DSAE technique attained the highest F-measure of 0.986.
With regard to accuracy, Fig. 7 shows that the AK-NN approach yielded the weakest classifier outcome, with a minimal accuracy of 0.9199. The DL scheme performed slightly better than the AK-NN model, with an accuracy of 0.9277. Moreover, the DT method generated an accuracy of 0.9365, while a reasonable accuracy of 0.9396 was attained by the T-SID model. Similarly, the DPC-DBN scheme yielded a considerable accuracy of 0.9498, whereas the AdaBoost, RF, IDBN, and SVM approaches exhibited close accuracy values of 0.9587, 0.9598, 0.9617, and 0.9632, respectively. Importantly, the newly proposed PT-DSAE scheme yielded a superior accuracy of 0.9849.
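For reference, the class distribution quoted above and the reported measures (precision, recall, F-measure, accuracy) can be computed as in the following sketch. The class counts are those stated in the text; the confusion counts in the example are hypothetical and are not results from the paper.

```python
# Class counts as stated in the text for the NSL-KDD data used.
CLASS_COUNTS = {"DoS": 45927, "R2L": 995, "Probe": 11656, "U2R": 52, "Normal": 67343}

def class_shares(counts):
    """Return each class's fraction of the whole data set."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure, and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2.0 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}

total_samples = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)

# Hypothetical confusion counts, for illustration only.
example = binary_metrics(tp=90, fp=10, fn=5, tn=95)
```

The shares make the class imbalance explicit: U2R and R2L together account for under 1% of the samples, which is why per-class precision and recall are informative alongside overall accuracy.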

Conclusion
In this study, we developed an effective IDS for CPSs using DL models. First, the input data were preprocessed to remove noise, and then a DSAE-based classification process was performed, in which the parameters were optimized using a SAR optimization algorithm. In SAE networks, the pretraining stage is essential for obtaining good weights with the help of an optimization method, and these weights serve as the initial parameters of the deep AE network. To improve the training process of the L-BFGS model, a SAR optimization algorithm was employed. For the experimental analysis, a series of experiments were performed on the NSL-KDD dataset, which includes samples from five classes. The experimental results showed that the PT-DSAE model identified intrusions with an average precision of 0.9791, recall of 0.9865, F-measure of 0.9860, and accuracy of 0.9849. Therefore, it can be applied as an effective tool for intrusion detection in CPSs. In the future, hybrid optimization algorithms can be used to improve the performance.
Funding Statement: The author(s) received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.