Two-Branch Intrusion Detection Method Based on Fusion of Deep Semantic and Statistical Features

Lan Xiong; Liang Wan; Jingxia Ren

doi:10.32604/cmc.2026.076986

icon Open Access

ARTICLE

Two-Branch Intrusion Detection Method Based on Fusion of Deep Semantic and Statistical Features

Lan Xiong, Liang Wan^*, Jingxia Ren

State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China

* Corresponding Author: Liang Wan. Email: email

(This article belongs to the Special Issue: Advances in Machine Learning and Artificial Intelligence for Intrusion Detection Systems)

Computers, Materials & Continua 2026, 87(3), 18 https://doi.org/10.32604/cmc.2026.076986

Received 30 November 2025; Accepted 08 January 2026; Issue published 09 April 2026

Abstract

The semantic complexity of large-scale malicious payloads in modern network traffic severely limits the robustness and generalization of existing Intrusion Detection Systems (IDS). This limitation presents a major challenge to network security. This paper proposes a dual-branch intrusion detection method called CPS-IDS. This method fuses deep semantic features with statistical features. The first branch uses the DeBERTav2 module. It performs deep semantic modeling on the session payload. This branch also incorporates a Time Encoder. The Time Encoder models the temporal behavior of the packet arrival interval time series. A Cross-Attention mechanism achieves the joint modeling of payload semantics and temporal behavior. This joint modeling accurately characterizes the potential attack intent and its temporal-coordinate dependency. The second branch utilizes a Transformer-LiteFF structure. It performs global dependency modeling on the flow statistical features. This process extracts the non-redundant representation of the statistical sequence. Finally, the design includes a Gated Fusion Mechanism. This mechanism efficiently integrates the multi-dimensional features. It then completes the intrusion classification. The paper validates the proposed CPS-IDS on three public datasets: CICIDS2017, UNSW-NB15 and CICIoT23. The method achieves accuracies of 99.92%, 94.54% and 97.91%, respectively, in multi-classification tasks. The experimental results demonstrate that CPS-IDS surpasses existing mainstream models in both accuracy and generalization. The system thus provides an effective solution for improving intrusion detection performance in complex network environments.

Keywords

Network intrusion detection; malicious traffic detection; feature fusion; modeling temporal dependencies; DeBERTav2

1 Introduction

Malicious network traffic presents substantial challenges to intrusion detection due to its high concealment, heterogeneity, and rapid evolution. The widespread deployment of Transport Layer Security (TLS/SSL) has resulted in a large proportion of encrypted traffic, rendering traditional detection mechanisms incapable of directly interpreting payload contents. In addition, the behavioral differences between benign and malicious traffic have become increasingly subtle, while contemporary attack strategies exhibit enhanced stealthiness and polymorphism. These factors collectively constrain the accuracy, stability, and generalization capability of existing intrusion detection systems.

To overcome these limitations, this paper introduces CPS-IDS, a Transformer-based [1] intrusion detection framework that integrates deep semantic features with statistical representations. The framework adopts the network session as the fundamental analysis unit. To address the pronounced class imbalance in intrusion datasets, an Auxiliary Classifier Generative Adversarial Network (AC-GAN [2]) is incorporated to synthesize additional samples for minority attack categories. For feature representation, the DeBERTav2 [3] module is employed to extract deep semantic information from payload sequences, the global dependency relationships in the statistical flow features are extracted using Transformer-LiteFF. These two complementary feature branches are subsequently fused to enhance the comprehensiveness and discriminative power of the learned representations, thereby enabling efficient and robust intrusion classification.

The main contributions of this article are summarized as follows: (1) A multi-modal feature fusion mechanism based on Cross-Attention has been proposed. Unlike existing IDS models that merely fuse features through simple vector concatenation, CPS-IDS is the first to achieve forced alignment modeling of payload semantics and inter-arrival time of packets. Through the cross-attention mechanism, the model can capture the nonlinear correlations of attack intentions in the payload content and temporal distribution, significantly enhancing the perception ability for low-frequency and covert attack behaviors. (2) An innovative dual-branch heterogeneous architecture was constructed to mine multi-dimensional features. In the semantic branch, the decoupled attention mechanism of DeBERTav2 was utilized to deeply analyze the complex semantic dependencies among payload bytes. Compared with traditional models that extract shallow syntactic features, this architecture possesses stronger feature depth. In the statistical branch, the Transformer with the Lite-FF structure was integrated to model multi-dimensional statistical features, effectively addressing the dimension redundancy issue in traditional feature extraction and extracting more discriminative high-order representations. (3) To address the severe class imbalance issue in network traffic, the AC-GAN was introduced. This strategy not only can generate high-quality samples of the minority class but also ensures the consistency of labels of the generated samples through the auxiliary classification loss. Experiments show that this method effectively alleviates the overfitting caused by the long-tail distribution of data and enhances the detection robustness of the model in real non-balanced environments.

The remainder of this paper is organized as follows. Section 2 reviews related studies. Section 3 details the architecture and core modules of CPS-IDS. Section 4 presents the datasets and experimental configurations. Section 5 reports the evaluation results and comparisons with baseline models. Section 6 concludes the work and outlines potential directions for future research.

2 Related Work

Machine learning (ML) and deep learning (DL)–based approaches for malicious traffic detection have undergone significant development over the past decades. Network Intrusion Detection Systems (NIDS) incorporating DL architectures, such as Convolutional Neural Networks (CNN) [4], Recurrent Neural Networks (RNN) [5], Autoencoders [6], and Long Short-Term Memory networks (LSTM) [7], have demonstrated strong performance in detecting large volumes of anomalous data and novel network attacks [8,9].

Traditional NIDS primarily rely on conventional ML models, such as Support Vector Machines (SVM) and Decision Trees, which classify traffic based on statistical features including session duration and packet counts. Fu et al. [10] reported that methods leveraging statistical and temporal features can achieve accurate detection of known malicious traffic and exhibit certain generalization capability; however, their performance is limited against previously unseen attacks. To improve detection, Liu et al. [11] proposed FS-Net, which learns feature representations from packet length sequences using a GRU-based autoencoder and reconstruction mechanism. Hu et al. [12] modeled packet sequences as graphs and applied Graph2Vec to obtain session-level embeddings. Zhang et al. [13] developed a real-time detection framework based on Online Isolation Forest for SD-WAN environments, enabling low-latency anomaly detection via packet-level and sequence-level modeling. Xie et al. [14] further combined TCP protocol awareness and traffic augmentation to extract packet sequence features from TLS-encrypted traffic for classification.

Compared with traditional ML methods, DL-based models offer stronger capabilities in automated feature extraction and complex traffic modeling. Yu et al. [15] designed a dual-CNN network to capture packet-level and session-level features in parallel, enhancing detection performance for minor classes through few-shot learning. Bader et al. [16] combined CNN and LSTM to learn spatiotemporal representations from raw byte streams while incorporating statistical features to improve detection efficacy. Zhu et al. [17] proposed CMTSNN, employing a cascaded CNN-LSTM structure for feature fusion, and Lin et al. [18]’s TSCRNN optimized temporal and spatial feature modeling. With the introduction of Transformer models, Han et al. [19] applied them to traffic classification, jointly considering packet headers and payloads. Wang et al. [20] proposed a Transformer-based NIDS that combines unsupervised learning with self-supervised masked reconstruction to enhance anomaly representation. Han et al. [21]’s ECNet further integrated multi-view features using gated feature fusion and a confidence mechanism, improving both representation capacity and detection robustness.

In terms of enhancing the transparency and adaptability of the detection system, Farooqui et al. [22] proposed the AutoSHARC framework. This research combines SHAP interpretive analysis with a feedback-driven re-training mechanism. By identifying the key features that cause a decline in QoS, it guides the re-training of the model, effectively solving the problem of performance degradation of detection models in the IoT environment due to network dynamic changes. Regarding the deployment challenge of deep learning models on resource-constrained devices, Umair et al. [23] explored the application of knowledge distillation (Knowledge Distillation) technology. They transferred the knowledge of the complex teacher model to the lightweight student model, maintaining high detection accuracy while significantly reducing inference time and memory usage, providing a feasible solution for real-time anomaly detection in consumer IoT devices.

In summary, DL-based methods integrating architectures such as Transformer, CNN, and GRU can substantially improve feature extraction efficiency and intrusion detection accuracy. Nevertheless, these approaches still face limitations in fully capturing spatial, temporal, and inter-packet dependencies in network traffic, resulting in incomplete feature representations and reduced robustness of classification. Therefore, enhancing the robustness and generalization of DL-based NIDS in complex and dynamic network environments remains a critical and urgent research challenge.

3 Proposed Method

This section presents a detailed description of the proposed Transformer-based intrusion detection framework, CPS-IDS, which integrates deep semantic features, temporal behaviors, and statistical representations. The model consists of three primary components: a data processing module, a feature learning module, and a feature fusion module. The overall architecture is illustrated in Fig. 1. CPS-IDS can be deployed at Local Area Network (LAN) boundaries or on edge nodes to enable real-time monitoring and analysis of communications between the internal LAN and the external Internet.

images

Figure 1: Overall architecture diagram.

3.1 Overview

The detection workflow of CPS-IDS is summarized in Algorithm 1. During training, an AC-GAN is employed to generate synthetic samples for minority-class attack traffic. This approach addresses the class imbalance problem and enhances the model’s ability to recognize rare attack types.

images

3.2 Data Processing Module

3.2.1 Data Extraction Module

The network traffic analysis framework NFStream [24] was employed to process raw PCAP files, systematically capturing both standard flow metrics and packet payload contents for each flow. During extraction, relevant statistical and protocol features were stored as structured CSV files for subsequent modeling. In the feature selection phase, dimensions highly correlated with network protocol behavior were retained from multiple time windows, including communication frequency, packet direction, flow duration, port usage, and TCP control flags. To mitigate the impact of differing magnitudes among statistical features and to improve training convergence and model stability, all numerical features were standardized using the Z-score method.

To convert raw network byte streams into structured representations suitable for deep learning, a hierarchical payload processing and serialization mechanism is adopted. Considering that discriminative semantic cues of network attacks (e.g., malicious signatures and protocol negotiation fields) are primarily concentrated at the early stage of a session, a two-level truncation strategy is employed. At the flow level, for each flow defined by a 5-tuple, only the first K=8 packets containing payloads are retained. At the packet level, the first B=64 bytes of each packet payload are extracted. This results in a fixed-length byte stream of K×B=512 bytes per flow, which preserves essential attack semantics while eliminating redundant transmissions and reducing computational overhead.

Following ET-BERT [25] and PEAN [26], bigram tokenization is adopted instead of single-byte tokenization. Bigram tokenization expands the vocabulary from 28 to 216, enabling a more expressive representation of common 16-bit protocol fields (e.g., ports and length indicators) and mitigating semantic fragmentation. A non-overlapping sliding window with window size W=2 and stride S=2 is applied to the byte stream:

tj=(b2j−1,b2j),j=1,…,256(1)

yielding a fixed-length sequence of 256 tokens per flow. This design aligns with protocol field structures and maintains strict byte consistency, making it more suitable for network traffic than adaptive subword schemes such as BPE.

The sequence length of 256 tokens is selected as a trade-off between semantic coverage and computational efficiency, effectively controlling the O(L2) complexity of Transformer self-attention. For flows with insufficient payload length, a special [PAD] token is appended to ensure consistent input dimensionality, facilitating stable batch processing and seamless integration with the downstream DeBERTaV2 encoder.

To comprehensively capture temporal traffic behavior, the inter-packet arrival time series was included as a third feature type during preprocessing. This time series, which reflects connection activity and protocol interaction rates, is computed as the difference between arrival times of consecutive packets within the same flow. Consequently, the preprocessing stage produces three distinct feature types: (1) inter-packet arrival time series, (2) payload token sequences, and (3) flow statistical features. Each feature type is subsequently processed by its corresponding modeling module and integrated during the feature fusion phase.

During the sample preparation stage, in response to the highly unbalanced distribution of the number of normal traffic and attack traffic in different network traffic data sets, this paper adopts a hybrid sample balancing strategy. Specifically, for the majority class samples (normal traffic) that account for a significantly higher proportion, they are randomly downsampled by 5% of their original size to effectively control the sample size; at the same time, all attack category samples are completely retained, thereby achieving an initial category balance of the training data without introducing any synthetic samples.

To verify the rationality of this sampling strategy, this paper conducted a comparative analysis of the category distribution before and after sampling on three public datasets with different distribution characteristics, namely CICIDS2017, UNSW-NB15, and CICIoT2023. The results are shown in Fig. 2.

images

Figure 2: Comparison of label distributions before and after sampling on different datasets.

In the CICIDS2017 dataset, the number of normal traffic samples before sampling was much higher than that of various attack traffic samples, resulting in a severely imbalanced data distribution. After 5% random down-sampling, the scale of normal traffic was significantly reduced, while the sample quantities of Bot, Web Attack, Infiltration, and Heartbleed attack categories remained unchanged. The attack categories such as DoS, DDoS, and PortScan, although their quantities decreased, still maintained their relative proportion relationships, indicating that the internal structure of the attack traffic was not disrupted.

In the UNSW-NB15 dataset, the sampling strategy also significantly reduced the proportion of normal traffic in the training set. At the same time, all attack category samples were completely retained, including samples of attack types with relatively small quantities such as Worms, Backdoor, and Shellcode. The distribution of attack categories before and after sampling was completely consistent, further verifying the effectiveness of this strategy in maintaining the stability of the attack distribution. Given the extremely large scale of attack traffic in the CICIoT2023 dataset, this paper conducts a comparative analysis of normal traffic and attack traffic from a macro perspective. The results show that the sampling operation effectively controlled the scale of attack traffic, making it suitable for model training. Meanwhile, various types of DDoS, DoS, and Mirai attacks still retain sufficient sample quantities, ensuring the diversity and representativeness of attack types.

The adopted hybrid downsampling strategy can effectively alleviate the class imbalance problem on different datasets. While significantly reducing the training cost, it maintains the original distribution structure of attack traffic and does not introduce additional noise, providing a reliable data foundation for subsequent model training and performance evaluation.

3.2.2 Minority Class Sample Generation and Augmentation Module

To mitigate the class imbalance arising from the predominance of normal traffic and the scarcity of attack samples, this study employs an AC-GAN for data augmentation following the initial preprocessing stage. As an extension of the GAN, AC-GAN incorporates conditional information, such as class labels, into both the generator and the discriminator. The generated pseudo-samples consist of three types of traffic features: the payload vector, the inter-packet arrival time series vector, and the statistical feature vector. The model structure of AC-GAN is shown in Table 1.

images

The generator employs a joint architecture comprising three parallel sub-networks, which take a low-dimensional latent noise vector z∼𝒩(0,I)∈Rd and a one-hot encoded category label c∈RNclass as inputs. The payload token generator utilizes a Transformer-based sequence generation model. The concatenated input is projected into a 256-dimensional latent space and augmented with positional embeddings, followed by two Transformer encoder layers (configured with 4 attention heads and a 2048-dimensional feed-forward network). A final linear head maps the representations to a 65,532-size vocabulary to generate a payload token sequence of length 256. The statistical and temporal sub-generators both adopt a dual-layer Transformer architecture. By incorporating learnable [CLS] tokens to aggregate global dependencies, these modules synthesize continuous feature vectors that maintain dimensional consistency with real-world traffic statistics and inter-arrival times (IAT).

The discriminator D takes the feature triplet (xpayload,xtime,xstats) and its corresponding category label c as inputs. To align the heterogeneous modalities, three independent linear layers integrated with spectral normalization project the features into a unified 256-dimensional latent space; this normalization is crucial for mitigating gradient explosion and stabilizing adversarial training. The projected representations are then concatenated with the label embedding and fed into a dual-layer MLP (hidden dimensions of 512 and 256). The network terminates in two specialized heads: an adversarial head (Ladv) that discriminates between authentic and synthesized samples, and an auxiliary classification head (Laux) that employs a Softmax function to predict the categorical distribution of the attack types.

The adversarial loss ℒadv encourages the generator G to approximate the true data distribution pdata, thereby deceiving the discriminator D. It can be minimized only if the generated features, including xpayload and xstats, closely resemble real traffic in both statistical and semantic properties:

ℒadv=Ex∼pdata[log⁡Dadv(x,c)]+Ez∼pz[log⁡(1−Dadv(G(z,c),c))].(2)

The auxiliary classification loss ℒcls ensures that the pseudo-sample G(z,c) conforms to the features of the specified class c. For example, when c represents a “Web Attack,” the generated payload xpayload contains Web Attack–related tokens, and xstats reflects corresponding statistical properties, such as short connections and low traffic volume:

ℒcls=Ex∼pdata[−log⁡Dcls(c∣x)]+Ez∼pz[−log⁡Dcls(c∣G(z,c))].(3)

The comprehensive objectives for the discriminator and generator are formulated as:

ℒD=−ℒadv+λℒcls,ℒG=ℒadv+λℒcls,(4)

where λ is a balancing coefficient. During my training, I set the value of λ at 0.3. By jointly minimizing ℒD and ℒG, AC-GAN generates high-quality pseudo-samples that are consistent with real samples of the target class in both semantic and statistical spaces, effectively addressing the limitations of traditional GANs on high-dimensional structured data.

During training, the model is optimized using the Adam optimizer, with the learning rates for the generator (lrg) and discriminator (lrd) meticulously set to 1×10−4 and 2×10−5, respectively. The batch size is uniformly set to 32. To improve generalization and prevent overconfidence, label smoothing is integrated into the training objective. Furthermore, a dynamic sample augmentation strategy is implemented: AC-GAN generates proportional synthetic samples for any minority class whose population falls below the predefined threshold of 8000. The training regime incorporates a warm-up phase for the first 10 epochs, followed by an early stopping mechanism (Patience = 3) to mitigate overfitting. This comprehensive approach ensures that the model effectively addresses class imbalance, leading to a substantial enhancement in recall rates for infrequent intrusions, such as UDP-Flood and Web attacks.

3.3 Feature Learning Module

To balance the detection accuracy and inference efficiency, CPS-IDS adopts a lightweight parameter configuration. As shown in Table 2, the semantic branch uses a 2-layer DeBERTav2 structure, with the hidden layer dimension set to 256, extracting byte-level features through decoupled attention mechanisms; meanwhile, the temporal encoder and the statistical branch maintain the same dimensional space, facilitating multimodal feature integration through a gated fusion mechanism.

images

3.3.1 Payload Semantic Modeling

The payload segment of network traffic contains specific communication content, such as HTTP requests, DNS queries, and fragments of attack commands. These sequences often include potential attack patterns, command signatures, and characteristic strings, which are valuable for intrusion detection. In this study, DeBERTav2 is employed as the feature representation learner for payload data. Compared with the standard BERT model [27], DeBERTav2 introduces architectural innovations that better capture complex semantic dependencies in non-natural language sequences, such as hexadecimal byte streams.

The raw payload is first converted into a hexadecimal byte sequence:

p=b1,b2,…,bL,(5)

where L denotes the payload length. A Bigram tokenization strategy is then applied, combining two adjacent bytes into a single token:

t=t1,t2,…,tN,ti∈V,(6)

where V represents the 16-bit vocabulary and N is the token sequence length. Sequences shorter than N are padded with the [PAD] token, while longer sequences are truncated to ensure uniform input dimensions.

In DeBERTav2, each token is mapped into a content embedding and a position embedding:

hi=Econtent(ti)+Epos(i),(7)

where Econtent and Epos denote content and position embeddings, respectively. The model employs decoupled attention to independently capture semantic and positional dependencies. In byte sequences, attack instructions or feature signatures often depend on specific byte combinations and their relative positions. Decoupled attention enables precise modeling of such sparse yet critical dependencies:

Attn(Qc,Kc,Qp,Kp,V)=Softmax(QcKc⊤d+QcKp⊤d+QpKc⊤d)V,(8)

where Qc,Kc originate from content embeddings and capture semantic dependencies, while Qp,Kp come from position embeddings and capture structural dependencies. This mechanism is crucial for interpreting protocol fields and malicious code fragments within unstructured byte streams.

During pre-training, DeBERTav2 employs traditional Masked Language Modeling (MLM) and introduces the Enhanced Mask Decoder (EMD) to improve context representation. For a masked sequence, the EMD objective is:

ℒEMD=−∑i∈ℳlog⁡Pθ(xi∣X,C),(9)

where ℳ denotes the set of masked token indices, and C represents the enhanced context encoding, including both content and position information. This allows the model to reconstruct missing bytes based on a rich global context, improving generalization in downstream detection tasks.

After multi-layer decoupled Transformer encoding and EMD pre-training, the payload sequence representation is:

H=h1,h2,…,hL,hi∈Rd.(10)

The [CLS] token embedding is taken as the global payload semantic representation:

Hpayload=h[CLS],(11)

which serves as the feature embedding for the network traffic payload.

3.3.2 Temporal Behavior Modeling

To capture the temporal dynamics of network flows, packet interarrival time series are utilized as input features. The sequence xtime is constructed by computing the time differences between consecutive packet arrivals and is subsequently fed into the Time Encoder. The encoder consists of a single-layer Transformer and employs a Time Embedding to map interval values into a high-dimensional space, enabling effective modeling of temporal dependencies. The resulting output sequence is denoted as Htime.

To integrate temporal information with payload semantics, the payload representation Hpayload and the interarrival time embedding Htime are input into a Cross-Attention module for deep interaction:

Zp=CrossAttn(Q=Hpayload,K=Htime,V=Htime).(12)

Here, Hpayload serves as the query Q, while Htime acts as both the key K and value V. The output Zp represents the fused payload-time semantic features, capturing the cross-modal relationships between packet content and interarrival time sequences. This fused representation Zp encodes temporal coordinate information and serves as the input for the subsequent gated fusion module.

3.3.3 Global Dependency Modeling of Statistical Features

Statistical features xstats, including flow duration, packet count, byte count, and the usage frequency of TCP protocol flags, provide macroscopic quantitative information about network traffic behavior. Conventional approaches often treat these features as flattened vectors, ignoring intrinsic correlations and global dependencies across feature dimensions.

To overcome this limitation, we propose the Transformer-LiteFF block for global dependency modeling of the statistical feature sequence xstats. Its structure is shown in Fig. 3. Each feature dimension is treated as a token and embedded into a high-dimensional space through a linear projection:

Hstats=Linear(xstats).(13)

images

Figure 3: Transformer-LiteFF block.

The embedded sequence Hstats is then fed into a Transformer Encoder, where the self-attention mechanism captures global dependencies among all feature dimensions. For example, the model can learn non-linear correlations between “packet count” and “flow duration” under specific attack types:

Hattn=TransformerEncoder(Hstats).(14)

To obtain a compact, non-redundant representation, we incorporate the Lite-FF Structure (Lite FlowFormer Block) [28]. This structure integrates a bottleneck layer into the Transformer’s feed-forward network, forcing feature compression while preserving critical information, thereby enhancing efficiency and reducing redundancy.

Finally, the processed statistical feature sequence Hlite is aggregated using a 1D Convolutional Layer to extract local key patterns and condense them into a single statistical feature vector Zs:

Zs=Conv1D(Hlite).(15)

The resulting vector Zs encodes global correlations and a compact representation of traffic statistical features, serving as input to the subsequent feature fusion module.

3.4 Feature Fusion Module

The Feature Fusion Module is designed to integrate the outputs of the dual-branch feature learning module, namely the joint semantic-temporal representation Zp and the statistical feature representation Zs, into a unified multi-modal feature vector. Due to the high heterogeneity and distinct semantic spaces of these two feature types (sequence-based vs. vector-based), simple concatenation is insufficient for effective integration.

To address this, a Gated Fusion Mechanism is employed for adaptive and efficient feature combination. A gating vector g is computed to dynamically control the contribution of each branch to the final representation. The gating vector is obtained via a simple Multi-Layer Perceptron (MLP) applied to the concatenated features:

g=σ(MLP([Zp‖Zs])),(16)

where σ denotes the sigmoid activation function, constraining g to values between 0 and 1, and [⋅‖⋅] represents vector concatenation.

The fused feature vector F is then computed as a gated weighted sum:

F=g⊙Zp+(1−g)⊙Zs,(17)

where ⊙ represents the Hadamard product (element-wise multiplication). This mechanism ensures that the final feature vector effectively balances semantic-temporal and statistical information, providing a comprehensive representation for subsequent classification.

4 Datasets and Experimental Setup

4.1 Dataset Description

This study employs three authoritative datasets representing diverse network scenarios: traditional networks (CICIDS2017 [29]), hybrid modern networks (UNSW-NB15 [30]), and IoT edge networks (CICIoT23 [31]). Common attack types such as DDoS, Brute Force, and Botnet are present across these datasets, enabling the extraction of shared attack patterns in protocol payloads, connection frequency, packet size distribution, traffic rate, mean and variance of packet lengths, and flow duration. The diversity of attacks provides complementary differentiation for the model: CICIDS2017 emphasizes session-level features for Web attacks (e.g., XSS), UNSW-NB15 supplements payload features for Shellcode attacks, and CICIoT23 includes resource-exhaustion attacks specific to IoT devices.

4.2 Experimental Setup

To evaluate the generalization and detection performance of the proposed method, experiments were conducted on the three datasets mentioned above. Key experimental configurations, hardware environment, and training strategies are detailed as follows.

Experiments were performed on a system equipped with an NVIDIA GeForce RTX 3070 GPU (8 GB VRAM), a 12th Gen Intel Core i5-12400 CPU, and 16 GB RAM. The operating system was [Placeholder for OS, e.g., Ubuntu 20.04], the deep learning framework was PyTorch 2.0, and Python version 3.10.

During training, the parameter configuration is shown in Table 3. The Cross-Entropy loss was used as the primary classification objective, with class-weighted samples applied to mitigate data imbalance. Model parameters were optimized using the AdamW optimizer, combined with a learning rate warm-up and decay schedule. To rigorously evaluate the generalization performance of the proposed CPS-IDS and prevent potential data leakage, a strict flow-level data separation strategy was adopted. The dataset was partitioned into training and testing sets with an 8:2 ratio. Unlike conventional packet-level random splitting, network packets were first aggregated into discrete flows according to their 5-tuple information (source IP, destination IP, source port, destination port, and protocol). Each flow, uniquely identified by its Flow ID, was assigned exclusively to either the training or testing set, ensuring that no packets from the same flow appeared in both subsets. Furthermore, highly identifiable fields such as IP addresses and timestamps were excluded during feature extraction to prevent the model from memorizing host-specific patterns. This design forces the framework to learn intrinsic semantic signatures and statistical dynamics of attack behaviors rather than dataset-specific artifacts. During training, the optimal model was selected based on validation performance, and the final results—including Accuracy, Precision, Recall, and F1-score—were reported on the independent test set to ensure the reliability and transparency of the experimental evaluation.

images

5 Results and Evaluation

To comprehensively assess the performance of the proposed CPS-IDS, four widely-used metrics are employed: Accuracy, Precision, Recall, and F1-score. These metrics are standard in multi-class intrusion detection and effectively quantify the model’s classification performance across different categories.

5.1 Performance Metrics

Multi-classification experiments were conducted on the CICIDS2017, UNSW-NB15, and CICIoT2023 datasets using the proposed method.

As presented in Table 4 and Fig. 4, CPS-IDS achieves a weighted-average F1-score of 0.9992. Notably, despite the high performance on majority classes, the recall for Infiltration remains relatively lower at 0.8947. This is primarily due to the feature manifold overlap; infiltration attacks often involve legitimate-looking internal-to-internal communications, which share high statistical and semantic similarity with Benign traffic. The AC-GAN helps bridge the quantity gap (from a support of only 38 samples), but the inherent ‘mimicking’ nature of such attacks poses a persistent challenge for absolute separation.

images

Figure 4: CICIDS2017 confusion matrix.

Table 5 and Fig. 5 illustrate the framework’s performance on the more complex UNSW-NB15 dataset. While the weighted-average F1-score reaches 0.9440, the recall rates for Analysis (0.3415) and DoS (0.3874) are noticeably constrained. We attribute this to two factors: extreme initial scarcity and high intra-class variance. Although AC-GAN balances the class distribution, the initial information available for the ‘Analysis’ category is insufficient to capture all its diverse sub-attack vectors, potentially leading to a partial mode collapse where the generator synthesizes only a subset of attack patterns. Furthermore, the overlapping feature space between low-intensity DoS and Benign bursts limits the classifier’s ability to draw a sharp decision boundary.

images

Figure 5: UNSW-NB15 confusion matrix.

The results in Table 6 and Fig. 6 indicate that CPS-IDS performs excellently on the CICIoT2023 dataset. However, WebAttack shows a recall of 0.7885. In IoT environments, Web-based attacks often utilize simplified protocols that can be statistically indistinguishable from legitimate device-to-cloud management traffic. This suggests that while multimodal fusion (Payload + Time + Stats) significantly improves detection compared to single-modality baselines, the dynamic nature of IoT traffic requires even more granular temporal modeling to fully isolate these stealthy intrusions.

images

Figure 6: CICIoT2023 confusion matrix.

Overall, the results across three benchmark datasets validate that the AC-GAN effectively mitigates class imbalance by populating underrepresented regions of the feature space. However, as noted in the persistent challenges for ‘Infiltration’ and ‘Analysis’, generative augmentation alone cannot fully resolve cases where malicious features are inherently similar to benign patterns. This underscores the necessity of our Cross-Attention mechanism, which attempts to leverage temporal-semantic dependencies to provide the additional discriminative power required for these challenging cases.

5.2 Comparative Experiments

To comprehensively assess the effectiveness of the proposed method, in the multi-classification experiment, I selected some intrusion detection methods from the past three years, such as Et-bert [25], Transformer [32], Graph2vec+RF [12], GTID [19], DeepCLG [33] and CTWA [34], and compared them with my method. The comparison indicators included Accuracy, Precision, Recall and F1-score. The specific experimental results are shown in Table 7.

images

As shown in Table 7 and Fig. 7, the proposed method consistently outperforms the comparative approaches across all three datasets. It achieves the highest Accuracy, Precision, Recall, and F1-score in every case. These results confirm the method’s effectiveness and superiority in multi-class intrusion detection, demonstrating stronger feature representation, enhanced anomaly detection capability, and robust generalization across diverse network scenarios.

images

Figure 7: Bar chart of the comparison experiment.

5.3 Generation of Sample Quality Assessment

To intuitively verify the effectiveness of the minority sample augmentation module in the CPS-IDS model, t-SNE (t-Distributed Stochastic Neighbor Embedding) is employed to project the high-dimensional fusion embeddings onto a 2D plane for visual analysis. For each of the three datasets—UNSW-NB15, CIC-IDS2017, and CIC-IoT23—800 real samples were randomly selected from the test sets, while 800 synthetic samples were generated for the corresponding classes using the trained conditional generator. The feature extraction point is situated at the multimodal feature fusion layer of the classifier, which captures a joint representation of payload semantics, flow statistics, and temporal characteristics. Fig. 8 illustrates the distribution comparison, where blue dots represent real traffic samples and red dots represent synthetic samples.

images

Figure 8: The t-SNE visualization display of the deep semantic features of the real and synthetic samples in CIC-IDS2017 (a), UNSW-NB15 (b), and CIC-IoT23 (c).

As shown in Fig. 8c, the red synthetic samples exhibit a high degree of overlap and intertwining with the blue real samples in the feature space. This demonstrates that the proposed generator accurately captures the intrinsic correlations between multimodal features, ensuring that the generated samples not only approximate the real distribution numerically but also maintain consistency in deep semantic logic. In Fig. 8a,b, real samples form several distinct, discrete clusters corresponding to various attack subcategories or behavioral patterns. Rather than being concentrated at a single point, the synthetic samples are distributed within and around the boundaries of these real clusters. This confirms the robust manifold coverage of the generation module, which effectively fills sparse regions in the feature space originally occupied by minority samples. In conclusion, the t-SNE visualization confirms that CPS-IDS is capable of generating minority samples with high fidelity and diversity.

5.4 Ablation Experiment

Through a systematic ablation study, we evaluate the contribution of four critical components—AC-GAN Sample Augmentation, DeBERTav2 Representation Learning, Statistical Feature Modeling, and Temporal Behavior Features with their Fusion—to the overall detection performance of the proposed model. Experiments were conducted on three public datasets: CICIDS2017, UNSW-NB15, and CICIoT2023. The results are summarized in Table 8 and Fig. 9.

images

Figure 9: Bar graph of the ablation experiment.

The results in Table 8 and Fig. 9 confirm that the Full Method achieves the best performance across all three datasets, demonstrating the effective synergy among all proposed modules. Removing any component leads to notable performance degradation, highlighting the necessity of multi-modal fusion and augmentation.

Key observations from the ablation study are as follows:

1. AC-GAN Augmentation (w/o AC-GAN): Beyond the F1-Score drop on UNSW-NB15 (−0.0238), the removal of AC-GAN leads to a sparse and biased feature space. Without synthesized minority samples, the classifier’s decision boundary collapses toward the majority class. AC-GAN improves feature quality by “filling” the underrepresented regions of the manifold with high-fidelity synthetic samples, ensuring the model learns a more continuous and robust class-separation boundary.

2. DeBERTav2 Semantic Learning (w/o Payload): The massive decline in UNSW-NB15 (F1-Score −0.1919) indicates that omitting payload semantics deprives the model of contextual discriminators. DeBERTav2’s disentangled attention improves feature quality by isolating the relative positions of bytes from their content, allowing the model to distinguish between similar byte sequences that serve different malicious functions. Without it, the model relies on shallow statistical signatures, failing to capture the “intent” of the attack.

3. Transform-Conv Statistical Modeling (w/o Stats): The sharp reduction on CICIoT2023 (−0.1926) highlights the loss of global flow-level descriptors. Statistical features provide a macroscopic view of the traffic (e.g., total volume, flow duration). Removing this module lowers feature quality by limiting the model to localized byte-level patterns, making it blind to high-volume flooding attacks (e.g., DDoS) that are characterized by their statistical frequency rather than their payload content.

4. Temporal Feature Fusion (w/o Temporal): Excluding the Time Encoder leads to a loss of dynamic behavioral signatures. Network attacks often exhibit specific “rhythms” (e.g., periodic heartbeat in C&C traffic or rapid bursts in scanning). Joint temporal-payload modeling ensures that semantic tokens are weighted by their arrival patterns. Removing this component degrades feature quality into a “static” representation, losing the ability to differentiate between legitimate high-speed communication and rapid automated attack bursts.

5. Lite-FF Structure (w/o Lite-FF): The impact of removing Lite-FF varies across datasets, which explains the observed inconsistency in the ablation trends. Lite-FF is not designed as a standalone discriminative module but rather as a lightweight refinement component for statistical feature representations. Its effectiveness therefore depends on the relative importance of statistical features within the overall feature hierarchy. In IoT-oriented datasets such as CICIoT2023, where payload information is limited or weakly informative, statistical features constitute the primary source of behavioral evidence. Under this condition, Lite-FF significantly improves representation compactness by suppressing redundancy and noise in high-dimensional statistical inputs, leading to more stable and discriminative flow-level representations. Consequently, its removal results in a noticeable performance degradation. In contrast, for CICIDS2017 and UNSW-NB15, payload semantic representations dominate the decision process, while statistical features mainly play a complementary role. In such scenarios, the refinement provided by Lite-FF yields a smaller marginal benefit, resulting in less consistent performance changes across datasets. This observation highlights that Lite-FF contributes adaptively depending on the dominance of statistical features in the target traffic domain.

5.5 Model Performance Analysis

In order to further evaluate the deployment feasibility of CPS-IDS in the actual network environment, this section quantitatively analyzes the number of model Parameters (Parameters), computational complexity (FLOPs), inference Latency (Latency) and training efficiency. Since ET-BERT is the representative Transformer model in this field and provides an open source implementation, it is used as the main comparison benchmark in this experiment. The comparison results are shown in Table 9.

images

The total parameter count of CPS-IDS is only 29.69 M, which is approximately 77.5% less than that of ET-BERT. This is mainly attributed to the introduction of the Lite-FF structure in the statistical dependency branch and the adoption of a more streamlined heterogeneous dual-branch design in this paper. While maintaining high detection accuracy, it significantly reduces the model’s occupation of storage resources, making it more conducive to the deployment of edge computing devices. In terms of computational complexity, the FLOPs of CPS-IDS is 5.11 GFLOPs, which is only about half of that of ET-BERT. This indicates that when processing the same scale of traffic data, the model proposed in this paper consumes less computing resources. In the single-sample inference scenario (Batch Size = 1), the average delay of CPS-IDS is 6.54 ms, which is approximately 2.39 ms shorter than that of ET-BERT. In a high-speed network environment, the reduction of milliseconds-level delay is crucial for preventing the spread of malicious attacks. In conclusion, CPS-IDS not only performs exceptionally well in detection accuracy, but also has significant advantages in model lightweighting and execution efficiency, achieving a good balance between performance and efficiency.

6 Conclusion

To address the increasingly complex and covert intrusion behaviors in modern networks, this paper proposes a deep intrusion detection system that combines an AC-GAN enhancement mechanism with Transformer-based modeling. The model integrates two complementary traffic features: semantic representations of payloads extracted via the DeBERTav2 module, and packet interarrival time series. These are fused through a Cross-Attention mechanism, effectively capturing both the temporal dynamics of network flows and their semantic dependencies on payload content, providing a fine-grained basis for intrusion detection.

Additionally, statistical behavior is modeled using the Transformer-LiteFF block, while the AC-GAN is employed to augment minority-class attack traffic. This approach is designed to alleviate data scarcity and enhance the model’s robustness againstimbalanced attack samples.

Experimental results on multiple mainstream datasets suggest that the proposed system achieves highly competitive detection performance, particularly in improving the recognition of minority attack classes. While the optimized DeBERTav2 and Transform-Conv designs aim to control computational cost, further validation is required to ensure their real-time performance in high-throughput production environments or on resource-constrained edge devices. Future research will focus on enhancing the model’s transferability to diverse real-world network settings and optimizing inference latency for line-rate detection.

Acknowledgement: State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang.

Funding Statement: This work was supported in part by the National Natural Science Foundation of China under Grant 62262004 (http://www.nsfc.gov.cn).

Author Contributions: Conceptualization: Lan Xiong, Liang Wan and Jingxia Ren; methodology: Lan Xiong; software: Lan Xiong; validation: Lan Xiong, Liang Wan and Jingxia Ren; formal analysis: Lan Xiong; investigation: Lan Xiong; resources: Lan Xiong; data curation: Lan Xiong; writing—original draft preparation: Lan Xiong; writing—review and editing: Lan Xiong, Liang Wan and Jingxia Ren; visualization: Jingxia Ren; supervision: Liang Wan; project administration: Liang Wan; funding acquisition: Liang Wan. All authors reviewed and approved the final version of the manuscript.

Availability of Data and Materials: The datasets used in this study—CICIDS2017, UNSW-NB15, and CICIoT2023—are publicly available. They can be accessed from their respective sources: CICIDS2017 (https://www.unb.ca/cic/datasets/ids-2017.html), UNSW-NB15 (https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/), and CICIoT2023 (https://www.unb.ca/cic/datasets/iot-2023.html).

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017); 2017 Dec 4–9; Long Beach, CA, USA. p. 5998–6008. [Google Scholar]

2. Mirza M, Osindero S. Conditional generative adversarial nets. arXiv:1411.1784. 2014. [Google Scholar]

3. He P, Gao J, Chen W. DeBERTaV3: improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. arXiv:2111.09543. 2021. [Google Scholar]

4. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Proc Neural Inf Process Syst. 2012;25:1097–105. doi:10.1145/3065386. [Google Scholar] [CrossRef]

5. Zhang A, Lipton ZC, Li M, Smola AJ. Dive into deep learning. Cambridge, UK: Cambridge University Press; 2023. [Google Scholar]

6. Michelucci U. An introduction to autoencoders. arXiv:2201.03898. 2022. [Google Scholar]

7. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80. doi:10.1162/neco.1997.9.8.1735. [Google Scholar] [PubMed] [CrossRef]

8. Ferrag MA, Maglaras L, Moschoyiannis S, Komninos N. Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study. J Inf Secur Appl. 2020;50:102419. doi:10.1016/j.jisa.2019.102419. [Google Scholar] [CrossRef]

9. Lee SW, Mohammadi M, Rashidi S, Patoary MKA, Abbasi A. Towards secure intrusion detection systems using deep learning techniques: comprehensive analysis and review. J Netw Comput Appl. 2021;187:103111. doi:10.1016/j.jnca.2021.103111. [Google Scholar] [CrossRef]

10. Fu Z, Liu M, Qin Y, Gu G, Fan J, Wei M. Encrypted malware traffic detection via graph-based network analysis. In: Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defens; 2022 Oct 26–28; New York, NY, USA. p. 495–509. doi:10.1145/3545948.3545983. [Google Scholar] [CrossRef]

11. Liu C, He L, Xiong G, Zhao B, Liu Z, Xie K, et al. FS-Net: a flow sequence network for encrypted traffic classification. In: Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications; 2019 Apr 29–May 2; Paris, France. p. 1495–503. doi:10.1109/INFOCOM.2019.8737507. [Google Scholar] [CrossRef]

12. Hu X, Gao W, Cheng G, Li J, Zhang G. Towards early and accurate network intrusion detection using graph embedding. IEEE Trans Inf Forensics Secur. 2023;18:5817–31. doi:10.1109/TIFS.2023.3318960. [Google Scholar] [CrossRef]

13. Zhang P, He F, Zhang H, Zhang W, He Z, Li Z, et al. Real-time malicious traffic detection with online isolation forest over SD-WAN. IEEE Trans Inf Forensics Secur. 2023;18:2076–90. doi:10.1109/TIFS.2023.3262121. [Google Scholar] [CrossRef]

14. Xie R, Wang Y, Cao J, Ma Y, Liu B, Li Z, et al. Rosetta: enabling robust TLS encrypted traffic classification in diverse network environments with TCP-aware traffic augmentation. In: Proceedings of the ACM Turing Award Celebration Conference; 2023 Jul 28–30; Wuhan, China. p. 131–2. doi:10.1145/3603165.3607437. [Google Scholar] [CrossRef]

15. Yu L, Dong J, Chen L, Xie R, Guo M, Zhou X. PBCNN: packet bytes-based convolutional neural network for network intrusion detection. Comput Netw. 2021;194:108117. doi:10.1016/j.comnet.2021.108117. [Google Scholar] [CrossRef]

16. Bader O, Lichy A, Hajaj C, Alghazi B. MalDIST: from encrypted traffic classification to malware traffic detection and classification. In: Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC); 2022 Jan 8–11; Las Vegas, NV, USA. p. 527–33. doi:10.1109/CCNC49033.2022.9700625. [Google Scholar] [CrossRef]

17. Zhu S, Xu X, Gao H, Lin J, Chen J, Liu S. CMTSNN: a deep learning model for multiclassification of abnormal and encrypted traffic of Internet of Things. IEEE Internet Things J. 2023;10:11773–91. doi:10.1109/JIOT.2023.3244544. [Google Scholar] [CrossRef]

18. Lin K, Xu X, Gao H. TSCRNN: a novel classification scheme of encrypted traffic based on flow spatiotemporal features for efficient management of IIoT. Comput Netw. 2021;190:107974. doi:10.1016/j.comnet.2021.107974. [Google Scholar] [CrossRef]

19. Han X, Cui S, Liu S, Liu J, Jiang S. Network intrusion detection based on n-gram frequency and time-aware transformer. Comput Secur. 2023;128:103171. doi:10.1016/j.cose.2023.103171. [Google Scholar] [CrossRef]

20. Wang W, Jian S, Tan Y, Peng Y, Liu J. Robust unsupervised network intrusion detection with self-supervised masked context reconstruction. Comput Secur. 2023;128:103131. doi:10.1016/j.cose.2023.103131. [Google Scholar] [CrossRef]

21. Han X, Liu S, Liu J, Cui S, Jiang S. ECNet: robust malicious network traffic detection with multi-view feature and confidence mechanism. IEEE Trans Inf Forensics Secur. 2024;19:6871–85. doi:10.1109/TIFS.2024.3426304. [Google Scholar] [CrossRef]

22. Farooqui MS, Khattak AA, Awaji BH, Alturki N, Alnazzawi N, Hanif M, et al. AutoSHARC: feedback driven explainable intrusion detection with SHAP-Guided Post-Hoc retraining for QoS sensitive IoT networks. Comput Model Eng Sci. 2025;145(3):4395–439. doi:10.32604/cmes.2025.072023. [Google Scholar] [CrossRef]

23. Umair M, Khan MS, Al Malwi W, Asiri F, Nafea I, Saeed F, et al. Knowledge distillation for lightweight and explainable intrusion detection in resource-constrained consumer devices. IEEE Trans Consumer Electron. 2025;71(4):12157–65. doi:10.1109/TCE.2025.3601183. [Google Scholar] [CrossRef]

24. Aouini Z, Pekar A. NFStream: a flexible network data analysis framework. Comput Netw. 2022;204:108719. doi:10.1016/j.comnet.2021.108719. [Google Scholar] [CrossRef]

25. Lin X, Xiong G, Gou G, Li Z, Shi J, Yu J. ET-BERT: a contextualized datagram representation with pre-training transformers for encrypted traffic classification. In: Proceedings of the WWW ’22: Proceedings of the ACM Web Conference 2022; 2022 Apr 25–29; Lyon, France. p. 633–42. doi:10.1145/3485447.3512217. [Google Scholar] [CrossRef]

26. Lin P, Ye K, Hu Y, Lin Y, Xu CZ. A novel multimodal deep learning framework for encrypted traffic classification. IEEE/ACM Trans Netw. 2023;31(3):1369–84. doi:10.1109/TNET.2022.3215507. [Google Scholar] [CrossRef]

27. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); 2019 Jun 2–7; Minneapolis, MA, USA. p. 4171–86. doi:10.18653/v1/N19-1423. [Google Scholar] [CrossRef]

28. Zhao R, Deng X, Yan Z, Ma J, Xue Z, Wang Y. MT-FlowFormer: a semi-supervised flow transformer for encrypted traffic classification. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22); 2022 Aug 14–18; Washington, DC, USA. p. 2576–84. doi:10.1145/3534678.3539314. [Google Scholar] [CrossRef]

29. Sharafaldin I, Lashkari AH, Ghorbani AA. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018); 2018 Jan 22–24; Funchal, Madeira, Portugal. p. 108–16. doi:10.5220/0006639801080116. [Google Scholar] [CrossRef]

30. Moustafa N, Slay J. UNSW-NB15: a comprehensive dataset for network intrusion detection systems. In: Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS); 2015 Nov 10–12; Canberra, ACT, Australia. p. 1–6. doi:10.1109/MilCIS.2015.7348942. [Google Scholar] [CrossRef]

31. Neto ECP, Dadkhah S, Ferreira R, Zohourian A, Lu R, Ghorbani AA. CICIoT2023: a real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors. 2023;23(13):5941. doi:10.3390/s23135941. [Google Scholar] [PubMed] [CrossRef]

32. Akuthota UC, Bhargava L. Transformer based intrusion detection for IoT networks. IEEE Internet Things J. 2025;12(5):6062–7. doi:10.1109/JIOT.2025.3525494. [Google Scholar] [CrossRef]

33. Gulzar Q, Mustafa K. Enhancing network security in industrial IoT environments: a DeepCLG hybrid learning model for cyberattack detection. Int J Mach Learn Cybern. 2025;16:4797–815. doi:10.1007/s13042-025-02544-w. [Google Scholar] [CrossRef]

34. Wang H, Yang Y, Tan P. CTWA: a novel incremental deep learning-based intrusion detection method for the Internet of Things. Artif Intell Rev. 2025;58:374. doi:10.1007/s10462-025-11358-9. [Google Scholar] [CrossRef]

Cite This Article

APA Style

Xiong, L., Wan, L., Ren, J. (2026). Two-Branch Intrusion Detection Method Based on Fusion of Deep Semantic and Statistical Features. Computers, Materials & Continua, 87(3), 18. https://doi.org/10.32604/cmc.2026.076986

Vancouver Style

Xiong L, Wan L, Ren J. Two-Branch Intrusion Detection Method Based on Fusion of Deep Semantic and Statistical Features. Comput Mater Contin. 2026;87(3):18. https://doi.org/10.32604/cmc.2026.076986

IEEE Style

L. Xiong, L. Wan, and J. Ren, “Two-Branch Intrusion Detection Method Based on Fusion of Deep Semantic and Statistical Features,” Comput. Mater. Contin., vol. 87, no. 3, pp. 18, 2026. https://doi.org/10.32604/cmc.2026.076986

BibTex EndNote RIS

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table of Content

Two-Branch Intrusion Detection Method Based on Fusion of Deep Semantic and Statistical Features

Abstract

Keywords

References

Cite This Article

1495

442

0

Related articles

Further Information

Guidelines

Follow Us

Join Us

Contact Us

WhatsApp:

Share Link