Open Access
ARTICLE
ATC-FusionNet: A Hybrid Deep Learning Ensemble for Network Intrusion Detection Systems
1 School of Information Science and Engineering (School of Cyberspace Security), Zhejiang Sci-Tech University, Hangzhou, China
2 Key Laboratory of Intelligent Textile and Flexible Interconnection of Zhejiang Province, Hangzhou, China
3 Zhejiang Shannon Communication Technology Company Ltd., Hangzhou, China
* Corresponding Author: Jiang Wu. Email:
Computers, Materials & Continua 2026, 88(1), 42 https://doi.org/10.32604/cmc.2026.078591
Received 04 January 2026; Accepted 13 March 2026; Issue published 08 May 2026
Abstract
The rapid growth of networked systems and the increasing diversity of cyberattack behaviors have posed significant challenges to intrusion detection, particularly in scenarios characterized by high-dimensional features and severe class imbalance. Conventional detection approaches based on handcrafted rules or shallow representations often exhibit limited robustness under such conditions. To address these issues, this paper presents a hybrid deep learning framework for network intrusion detection that integrates complementary feature learning mechanisms within a dual-branch architecture. Specifically, a Transformer branch is employed to model long-range temporal dependencies in network traffic, while a convolutional neural network (CNN) branch is used to capture localized and fine-grained feature patterns. An attention-based fusion strategy is further introduced to adaptively aggregate branch-specific representations and enhance intrusion-sensitive features. In addition, an autoencoder-based feature reconstruction module is incorporated before the dual-branch network to compress and reconstruct input features through an encoder–decoder structure, thereby preserving essential behavioral characteristics of network traffic and improving feature discriminability. To mitigate the impact of class imbalance, a Dynamic Weighted Logit-adjusted Focal Loss (DWLF) is introduced to reduce the bias toward majority classes during model optimization. Extensive experiments conducted on two public benchmark datasets demonstrate the effectiveness of the proposed approach. The proposed model achieves an overall accuracy of 90.01% on the UNSW-NB15 dataset and 97.82% on the NF-CSE-CIC-IDS2018 dataset. Experimental results indicate improved robustness under highly imbalanced data distributions, demonstrating stable performance across datasets with different traffic characteristics.
1 Introduction
The continuous advancement of information technologies and the accelerating pace of digital transformation have resulted in large-scale, highly interconnected network environments. Along with this evolution, cyber attacks have become increasingly diverse, concealed, and automated, posing persistent threats to modern networked systems [1]. Various security threats, including distributed denial-of-service (DDoS) attacks [2], privilege escalation, and multi-stage advanced persistent threats (APTs) [3], have become more frequent and destructive, creating serious risks for critical infrastructure and enterprise network security. In this context, timely and accurate identification of abnormal network traffic [4] has become a key research direction for safeguarding network security.
Network intrusion detection systems (NIDS) serve as essential security solutions that continuously analyze network environments to identify and monitor malicious behaviors [5]. Their goal is to detect abnormal activities or potential attacks from massive communication flows. Existing NIDS approaches can generally be divided into signature-based methods and anomaly-based methods. Signature-based methods identify known attacks using predefined attack signature databases, providing low false positive rates and high real-time efficiency [6]. However, they struggle to detect previously unseen or variant attacks. In contrast, anomaly-based approaches learn normal behavior patterns through statistical or machine learning models and detect deviations as potential intrusions [7]. Although these methods can identify unknown attacks, they often suffer from higher false positive rates.
Earlier intrusion detection studies were largely dominated by traditional machine learning techniques [8], where algorithms such as decision trees, support vector machines (SVM), and random forests were widely adopted. These methods offer advantages in terms of simplicity and interpretability but exhibit limitations when dealing with high-dimensional traffic features, complex attack behaviors, and severe class imbalance. With the rapid development of deep learning, neural network-based approaches have become increasingly popular for intrusion detection tasks. Convolutional neural networks (CNNs) [3] can automatically capture local structural patterns in traffic feature representations, while recurrent neural networks (RNNs) [9] and their variants [10] are effective for modeling sequential dependencies and temporal evolution in traffic data. More recently, Transformers [11], originally developed for natural language processing, have been introduced into cybersecurity research due to their powerful attention mechanisms [12] that enable long-range dependency modeling.
To address these challenges, this study proposes a principle-driven dual-branch intrusion detection framework termed ATC-FusionNet. The key idea is to explicitly decouple heterogeneous traffic representation learning into two complementary components: a Transformer branch dedicated to modeling long-range temporal dependencies and global traffic semantics, and a CNN branch responsible for capturing local structural patterns and short-term behavioral variations. By combining these two specialized learning pathways, the proposed architecture can more effectively model complex network traffic patterns.
The primary contributions of this work can be summarized as follows:
• Principle-driven dual-branch intrusion detection architecture. We design a dual-branch learning framework that explicitly separates global temporal dependency modeling and local structural pattern extraction through Transformer and CNN branches, respectively. This design enables complementary feature representation learning and improves the model’s capability to capture heterogeneous network traffic patterns.
• Autoencoder-assisted representation enhancement. An autoencoder-based feature reconstruction module is introduced before the dual-branch network to compress redundant information and enhance latent feature representations. This mechanism improves feature compactness and strengthens the representation of minority attack categories.
• Dynamic Weighted Logit-adjusted Focal Loss (DWLF). To address the severe class imbalance problem in intrusion detection datasets, we propose an imbalance-aware loss function that combines logit-level prior correction with focal modulation. The proposed DWLF effectively mitigates prior bias and gradient domination during training, leading to more stable optimization and improved detection performance for minority attack classes.
2 Related Work
In recent years, network intrusion detection systems (NIDS) have seen growing adoption of machine learning and deep learning techniques, achieving notable performance improvements. Existing studies can be broadly categorized according to their modeling strategies and feature learning paradigms.
2.1 Traditional Machine Learning Approaches
Early intrusion detection approaches mainly relied on traditional machine learning algorithms. Gu and Lu [13] introduced an IDS utilizing both support vector machines and Naive Bayes classifiers, where Naive Bayes was employed to modify original features and generate superior-quality datasets for training the SVM. This hybrid strategy achieved improved detection rates and reduced false alarms, demonstrating the effectiveness of probabilistic feature modeling. To further reduce false positives and false negatives in IDS, Aljawarneh et al. [14] presented a hybrid approach combining information gain-based feature selection with ensemble voting mechanisms, integrating classifiers such as Naive Bayes, AdaBoostM1, REPTree, Random Tree, and J48. Testing with the NSL-KDD dataset showed higher detection accuracy alongside fewer false positives.
2.2 Hybrid and Deep Learning Approaches
As deep learning techniques continue to evolve, hybrid architectures have gained widespread attention in NIDS research. Yang et al. [15] employed DCGAN to generate synthetic samples for data augmentation and applied an improved LSTM classifier for intrusion detection, demonstrating the importance of dataset expansion. However, GAN-based methods generally require substantial computational resources and lack interpretability.
Rajesh Kanna and Santhi [16] introduced a hybrid approach for intrusion detection, merging HMLSTM with OCNN. Spatial features were learned via CNN with hyperparameters optimized using Lion Swarm Optimization (LSO), while temporal dependencies were captured by HMLSTM. Wei et al. [17] further introduced a CNN-BiLSTM-Attention framework, where spatial and temporal features were jointly modeled and weighted using an attention mechanism. To address class imbalance, Equalization Loss v2 was adopted, allowing the model to better detect infrequent attack categories.
2.3 Autoencoder-Based Feature Learning and Imbalanced Data Handling
To address high-dimensionality and class imbalance, autoencoder-based models have been increasingly adopted. Liu et al. [18] developed an IDS which utilizes ADASYN oversampling in conjunction with LightGBM. By increasing minority class samples and leveraging ensemble learning, the model achieved improved detection performance while maintaining low computational complexity.
Li et al. [19] developed a lightweight online autoencoder-based IDS (AE-IDS), which was trained using only normal traffic data. The framework employed random forests for feature selection and affinity propagation for feature grouping, enhancing the representational capability of autoencoders. Autoencoders were employed alongside clustering methods, such as K-means and GMM, for anomaly detection.
3 Proposed Methodology
This section presents the proposed ATC-FusionNet framework for network traffic classification. The model is motivated by two major challenges in modern intrusion detection: (i) heterogeneous traffic patterns that exhibit both long-range temporal dependencies and short-term local variations, and (ii) severe class imbalance in real-world network datasets. To address these issues, ATC-FusionNet adopts a dual-branch architecture that decouples representation learning into complementary pathways, enabling more effective modeling of complex traffic behaviors.
3.1 Model Overview and Structure
Fig. 1 illustrates the overall architecture of the proposed model, which integrates an autoencoder with a dual-branch fusion framework.

Figure 1: Overview of the proposed ATC-FusionNet architecture, including autoencoder-based representation learning, dual-branch feature extraction, and attention-based fusion.
Raw network traffic is first preprocessed into fixed-length time-window sequences. The autoencoder performs nonlinear dimensionality reduction and feature reconstruction to produce stable low-dimensional embeddings. These encoded features are then fed into two parallel branches: a Transformer branch that captures global contextual dependencies and a CNN branch that extracts local spatial patterns. An attention-based fusion module dynamically learns the contribution of each branch and integrates their representations for final classification. Additionally, the framework employs a Dynamic Weighted Logit-adjusted Focal Loss (DWLF) to mitigate the impact of severe class imbalance during training.
3.2 Autoencoder-Based Feature Representation
An autoencoder (AE) [20] is employed as a pre-training module to extract compact and informative representations from network traffic features. An autoencoder consists of an encoder and a decoder, which project high-dimensional inputs into a lower-dimensional latent space [21]. The overall architecture of the autoencoder used in this study is illustrated in Fig. 2.

Figure 2: Autoencoder framework.
Let the input sample be $x \in \mathbb{R}^{d}$. The encoder maps $x$ to a low-dimensional latent representation
$$z = f(W_e x + b_e),$$
where $f(\cdot)$ denotes a nonlinear activation function and $W_e$, $b_e$ are the encoder weight matrix and bias. The decoder then reconstructs the input from the latent code as
$$\hat{x} = g(W_d z + b_d),$$
where $g(\cdot)$ is the decoder activation and $W_d$, $b_d$ are the decoder parameters. The autoencoder is trained by minimizing the mean squared reconstruction error
$$\mathcal{L}_{\mathrm{AE}} = \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert_2^2,$$
where $N$ denotes the number of training samples.
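The encoder–decoder mapping and reconstruction objective described above can be sketched in plain NumPy. The latent size, the tanh activation, and the linear decoder output are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_latent = 264, 64  # 264 matches the UNSW-NB15 feature dimensionality; latent size is assumed

# Randomly initialized encoder/decoder parameters, for illustration only
W_e, b_e = rng.normal(0, 0.1, (d_latent, d_in)), np.zeros(d_latent)
W_d, b_d = rng.normal(0, 0.1, (d_in, d_latent)), np.zeros(d_in)

def encode(x):
    # z = f(W_e x + b_e), with f = tanh as the assumed nonlinearity
    return np.tanh(W_e @ x + b_e)

def decode(z):
    # x_hat = g(W_d z + b_d), with an assumed linear output layer
    return W_d @ z + b_d

def reconstruction_loss(X):
    # Mean squared reconstruction error over a batch
    X_hat = np.array([decode(encode(x)) for x in X])
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

X = rng.normal(size=(4, d_in))
print(encode(X[0]).shape, reconstruction_loss(X) >= 0.0)
```

In the full framework the encoder output, not the reconstruction, is what feeds the two downstream branches; the decoder exists only to supply the training signal.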
3.3 CNN Branch
The CNN branch [22] is designed to capture local temporal dependencies and short-term dynamic patterns from the latent feature sequences generated by the autoencoder. Compared with attention-based models that focus on global relationships, convolutional operations are more effective in modeling localized feature interactions [23] within short temporal windows. Therefore, the CNN branch complements the Transformer branch by providing fine-grained structural pattern extraction.
The input to the branch is the latent feature sequence $Z \in \mathbb{R}^{T \times D}$, where T indicates the time window length and D refers to the dimensionality of the latent features.
The CNN branch adopts a hierarchical one-dimensional convolutional architecture [24]. The first convolutional layer applies a bank of filters along the temporal dimension, producing the feature map
$$H_1 = \mathrm{ReLU}(W_1 * Z + b_1),$$
where $*$ denotes the one-dimensional convolution operation and $W_1$, $b_1$ are the filter weights and bias, which encodes basic local interactions and short-term temporal patterns among adjacent time steps.
Subsequently, a one-dimensional max-pooling layer is applied to $H_1$:
$$P_1 = \mathrm{MaxPool}(H_1),$$
where pooling operates over non-overlapping temporal windows, reducing the temporal resolution while retaining the most salient activations.
The pooled features $P_1$ are passed through a second convolutional layer to extract higher-level temporal patterns:
$$H_2 = \mathrm{ReLU}(W_2 * P_1 + b_2).$$
Finally, a global average pooling layer aggregates the temporal feature map into a fixed-length vector $h_{\mathrm{cnn}}$,
defined as
$$h_{\mathrm{cnn}} = \frac{1}{T'} \sum_{t=1}^{T'} H_2[t,:],$$
where $T'$ denotes the temporal length of $H_2$.
This operation summarizes the temporal information of each feature channel and produces the final output of the CNN branch, which is subsequently used for feature fusion with the Transformer branch, as illustrated in Fig. 3.

Figure 3: CNN branch architecture.
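The conv → max-pool → conv → global-average-pool data flow of the CNN branch can be illustrated with a minimal NumPy sketch. The filter counts and kernel size here are illustrative assumptions; only the window length T = 8 comes from the paper's preprocessing:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, F1, F2, k = 8, 64, 32, 64, 3  # T = 8 per the preprocessing; other sizes are assumed

def relu(x):
    return np.maximum(0.0, x)

def conv1d(x, w, b):
    """Valid 1-D convolution along time. x: (T, D), w: (k, D, F), b: (F,)."""
    steps = x.shape[0] - w.shape[0] + 1
    return np.stack([np.tensordot(x[t:t + w.shape[0]], w, axes=([0, 1], [0, 1])) + b
                     for t in range(steps)])

def max_pool1d(x, size=2):
    """Non-overlapping temporal max pooling."""
    n = (x.shape[0] // size) * size
    return x[:n].reshape(-1, size, x.shape[1]).max(axis=1)

def cnn_branch(z, w1, b1, w2, b2):
    h1 = relu(conv1d(z, w1, b1))    # local interactions among adjacent time steps
    p1 = max_pool1d(h1)             # keep salient activations, halve resolution
    h2 = relu(conv1d(p1, w2, b2))   # higher-level temporal patterns
    return h2.mean(axis=0)          # global average pooling -> fixed-length vector

z = rng.normal(size=(T, D))
w1, b1 = rng.normal(0, 0.1, (k, D, F1)), np.zeros(F1)
w2, b2 = rng.normal(0, 0.1, (k, F1, F2)), np.zeros(F2)
print(cnn_branch(z, w1, b1, w2, b2).shape)  # fixed-length vector of F2 channels
```

Note how the branch collapses the temporal axis entirely, so its output can be concatenated with the Transformer branch's pooled vector regardless of window length.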
3.4 Transformer Branch
The Transformer [25] branch aims to learn long-range dependencies and global contextual patterns from the input feature sequence. Similar to the CNN branch, its input is the latent feature sequence produced by the autoencoder.
Since the Transformer architecture itself is permutation-invariant [26], explicit positional information is injected into the input sequence to preserve temporal order. In this work, a fixed sinusoidal positional encoding scheme is adopted. The positional encoding matrix is defined as
$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/D}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/D}}\right);$$
here, $pos$ denotes the position index within the sequence and $i$ the feature dimension index.
The position-enhanced feature sequence is then input to a deep Transformer encoder consisting of $L$ stacked identical layers, each containing a multi-head self-attention sub-layer and a position-wise feed-forward sub-layer.
Within each encoder layer, the input sequence is initially transformed through linear projections to generate the query (Q), key (K), and value (V) matrices. For an individual attention head, the scaled dot-product attention is defined as
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where $d_k$ denotes the dimensionality of the key vectors. Each head $i$ computes
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}),$$
and the final multi-head attention output is obtained by
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O};$$
here, $h$ denotes the number of attention heads and $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, $W^{O}$ are learnable projection matrices.
Residual connections and layer normalization are applied after each sub-layer to improve training stability.
A position-wise feed-forward network is included in each encoder layer to independently transform the representation at every time step. The FFN contains two linear transformations with an intermediate ReLU activation:
$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2.$$
Finally, a global average pooling layer is employed along the temporal dimension to aggregate the Transformer encoder outputs into a fixed-length feature vector $h_{\mathrm{trans}}$,
defined as
$$h_{\mathrm{trans}} = \frac{1}{T} \sum_{t=1}^{T} H_L[t,:];$$
here, $H_L$ denotes the output sequence of the final encoder layer and $T$ the time window length.
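The two standard building blocks of this branch, sinusoidal positional encoding and scaled dot-product self-attention, can be sketched in NumPy as follows (single head, illustrative dimensions):

```python
import numpy as np

def positional_encoding(T, D):
    """Fixed sinusoidal encoding: sine on even dims, cosine on odd dims."""
    pos = np.arange(T)[:, None]
    i = np.arange(D // 2)[None, :]
    angle = pos / np.power(10000.0, 2 * i / D)
    pe = np.zeros((T, D))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(2)
T, D = 8, 64
X = rng.normal(size=(T, D)) + positional_encoding(T, D)  # inject temporal order
out, w = scaled_dot_product_attention(X, X, X)           # self-attention: Q = K = V = X
print(out.shape, np.allclose(w.sum(axis=-1), 1.0))
```

Each row of the attention matrix is a probability distribution over all time steps, which is precisely what lets every position attend to the whole window and capture long-range dependencies.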
3.5 Attention-Based Feature Fusion Layer
Different from prior hybrid architectures that concatenate or statically fuse CNN and Transformer outputs, our fusion strategy is motivated by the observation that local flow-level patterns and global temporal dependencies contribute unevenly across attack categories, especially under extreme class imbalance. Therefore, a fixed fusion scheme may amplify dominant-class representations while suppressing minority-class cues.
To address this issue, we adopt an attention-based dynamic fusion mechanism that adaptively adjusts the contribution of each branch conditioned on the learned feature context, enabling the model to emphasize discriminative cues for minority attacks when necessary.
Let the output of the Transformer branch be denoted as the global feature vector $h_t$ and the output of the CNN branch as the local feature vector $h_c$. The two representations are first concatenated as
$$h = [h_t ; h_c],$$
where $[\cdot\,;\cdot]$ indicates feature concatenation. The concatenated feature vector $h$ is fed into an attention network that produces branch-level attention weights
$$\alpha = \mathrm{softmax}\big(W_2\,\delta(W_1 h + b_1) + b_2\big),$$
where $W_1$, $b_1$, $W_2$, $b_2$ are learnable parameters and $\delta(\cdot)$ is a nonlinear activation function. Here, $\alpha = [\alpha_t, \alpha_c]$ quantifies the relative contribution of each branch. The fused representation is obtained as the attention-weighted combination
$$h_{\mathrm{fused}} = \alpha_t h_t + \alpha_c h_c,$$
where $\alpha_t + \alpha_c = 1$. The fused vector is finally passed to a Softmax classification layer,
$$\hat{Y} = \mathrm{softmax}(W_o h_{\mathrm{fused}} + b_o) \in \mathbb{R}^{N \times C},$$
with N as the batch size and C as the total number of classes.
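A minimal sketch of this dynamic fusion step, assuming both branch vectors share dimensionality d and a one-hidden-layer attention network (the hidden size and tanh activation are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fusion(h_t, h_c, W1, b1, W2, b2):
    """Derive two branch weights from the concatenated features, then fuse."""
    h = np.concatenate([h_t, h_c])                    # h = [h_t ; h_c]
    alpha = softmax(W2 @ np.tanh(W1 @ h + b1) + b2)   # alpha = [alpha_t, alpha_c]
    return alpha[0] * h_t + alpha[1] * h_c, alpha

rng = np.random.default_rng(3)
d, hidden = 64, 16
h_t, h_c = rng.normal(size=d), rng.normal(size=d)
W1, b1 = rng.normal(0, 0.1, (hidden, 2 * d)), np.zeros(hidden)
W2, b2 = rng.normal(0, 0.1, (2, hidden)), np.zeros(2)
fused, alpha = attention_fusion(h_t, h_c, W1, b1, W2, b2)
print(fused.shape, np.isclose(alpha.sum(), 1.0))
```

Because the weights are recomputed per sample from the feature context, the fusion can lean on the CNN branch for one traffic pattern and on the Transformer branch for another, rather than fixing one global mixing ratio.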
3.6 Dynamic Weighted Logit-adjusted Focal Loss (DWLF)
To address the severe class imbalance commonly observed in network intrusion detection systems (NIDS), a Dynamic Weighted Logit-adjusted Focal Loss (DWLF) is employed. Class imbalance in intrusion detection typically manifests through skewed class priors and the dominance of easily classified majority samples. The proposed DWLF loss is designed to decouple these effects by jointly incorporating logit-level prior correction, focal modulation for hard-sample emphasis, and dynamic class reweighting.
Let $z = [z_1, \ldots, z_C]$ denote the logit vector produced by the classifier for a given sample, and let $\pi_c$ be the empirical prior probability of class $c$. The logit-adjusted score of class $c$ is computed as
$$\tilde{z}_c = z_c + \tau \log \pi_c,$$
where $\tau$ is a scaling factor controlling the strength of the prior correction.
The adjusted posterior probability is then obtained via the Softmax function
$$\tilde{p}_c = \frac{\exp(\tilde{z}_c)}{\sum_{j=1}^{C} \exp(\tilde{z}_j)},$$
where C denotes the total number of intrusion categories.
To emphasize hard-to-classify samples, a focal modulation term is introduced based on the adjusted posterior probability:
$$(1 - \tilde{p}_y)^{\gamma},$$
where $y$ denotes the ground-truth class and $\gamma \geq 0$ is the focusing parameter that down-weights well-classified samples.
Because class distributions may vary across mini-batches, a final class weight is constructed by combining a static global weight with a dynamically updated batch-level weight:
$$w_c = \lambda\, w_c^{\mathrm{static}} + (1 - \lambda)\, w_c^{\mathrm{dyn}},$$
where $\lambda \in [0, 1]$ balances the two components and $w_c^{\mathrm{static}}$ is set inversely proportional to the global frequency of class $c$. The dynamic weight $w_c^{\mathrm{dyn}}$ is updated from the class counts observed in the current mini-batch,
$$w_c^{\mathrm{dyn}} = \frac{N_b}{C \cdot n_c + \epsilon},$$
where $N_b$ is the mini-batch size, $n_c$ the number of samples of class $c$ in the batch, and $\epsilon$ a small constant preventing division by zero.
By integrating logit adjustment, focal modulation, and dynamic class weighting, the overall DWLF objective is formulated as
$$\mathcal{L}_{\mathrm{DWLF}} = -\frac{1}{N} \sum_{i=1}^{N} w_{y_i}\,(1 - \tilde{p}_{i, y_i})^{\gamma} \log \tilde{p}_{i, y_i},$$
where N is the mini-batch size and $y_i$ denotes the ground-truth label of the $i$-th sample.
This loss function is fully differentiable and can be optimized using standard stochastic gradient-based methods. During training, gradients are jointly propagated through all network components, enabling the proposed model to learn discriminative representations under dynamically changing and highly imbalanced traffic distributions. Overall, DWLF enables stable optimization under highly skewed class distributions, which is critical for real-world intrusion detection scenarios.
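A single-sample version of the loss can be sketched as follows. The inverse-batch-frequency form of the dynamic weight and the values of tau, gamma, and lambda are illustrative assumptions, not the paper's tuned settings:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dwlf_loss(logits, y, prior, w_static, batch_counts,
              tau=1.0, gamma=2.0, lam=0.5, eps=1e-8):
    """Dynamic Weighted Logit-adjusted Focal Loss for one sample (sketch).

    logits: (C,) raw classifier outputs; y: ground-truth class index
    prior: (C,) global class frequencies; w_static: (C,) static class weights
    batch_counts: (C,) per-class sample counts in the current mini-batch
    """
    # 1) logit-level prior correction
    p = softmax(logits + tau * np.log(prior + eps))
    # 2) dynamic batch-level weight (assumed inverse-batch-frequency form)
    w_dyn = batch_counts.sum() / (len(prior) * batch_counts + eps)
    w = lam * w_static + (1 - lam) * w_dyn
    # 3) focal modulation emphasizing hard samples
    return float(-w[y] * (1 - p[y]) ** gamma * np.log(p[y] + eps))

C = 4
prior = np.array([0.85, 0.10, 0.04, 0.01])       # highly skewed class priors
w_static = 1.0 / (prior * C)                     # inverse-frequency static weights
batch_counts = np.array([100.0, 15.0, 8.0, 2.0])
logits = np.array([2.0, 0.5, 0.1, -0.5])         # classifier favors the majority class
loss_major = dwlf_loss(logits, 0, prior, w_static, batch_counts)
loss_minor = dwlf_loss(logits, 3, prior, w_static, batch_counts)
print(loss_major < loss_minor)
```

With these logits, a correctly favored majority-class sample contributes almost nothing, while a misclassified rare-class sample is amplified by both the focal term and the combined class weight, which is exactly the gradient rebalancing the section describes.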
4 Experimental Results and Discussion
This section reports the experimental results and discussion of the proposed intrusion detection model, evaluated on two datasets: UNSW-NB15 and NF-CSE-CIC-IDS2018.
Experiments are conducted on Windows 11 using Python 3.8 and TensorFlow 2.5 (Conda), with an NVIDIA RTX 3090 Ti GPU, Intel Core i7 CPU, and 32 GB RAM. To guarantee a fair comparison, all models are trained and tested under identical experimental settings. The Adam optimizer is used with a learning rate of 0.001, and the batch size is fixed at 128. The temporal window size is set to T = 8, consistent with the sliding-window segmentation applied during preprocessing.
Four commonly adopted classification metrics are used: Precision, Recall, F1-score, and Accuracy. The evaluation metrics are computed as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
True positives (TP) are instances of the positive class that are correctly identified as positive, whereas true negatives (TN) are negative instances correctly recognized as negative. False positives (FP) occur when negative instances are mistakenly labeled as positive, and false negatives (FN) correspond to positive instances incorrectly assigned to the negative class. For multi-class classification, the metrics are computed using a macro-averaging strategy across all classes.
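Macro-averaging computes each metric per class from one-vs-rest counts and then takes the unweighted mean, so minority classes count equally. A minimal sketch:

```python
import numpy as np

def macro_metrics(y_true, y_pred, num_classes):
    """Per-class precision/recall/F1 from one-vs-rest counts, then unweighted means."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p); recalls.append(r); f1s.append(f1)
    accuracy = float(np.mean(y_true == y_pred))
    return (float(np.mean(precisions)), float(np.mean(recalls)),
            float(np.mean(f1s)), accuracy)

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
print(macro_metrics(y_true, y_pred, 3))
```

On this toy example the minority class 2 drags the macro scores down even though overall accuracy is 0.75, which is why macro-averaging is the appropriate headline metric for imbalanced intrusion data.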
To assess the performance and generalization of the proposed intrusion detection model, experiments are performed on two publicly accessible benchmark datasets: UNSW-NB15 and NF-CSE-CIC-IDS2018.
UNSW-NB15 [27] is a benchmark dataset designed to address limitations of earlier intrusion detection datasets. It contains both normal traffic and nine contemporary attack categories generated in a controlled laboratory environment. Each flow is described by 49 traffic features, excluding label information. The distribution of training and testing samples is shown in Fig. 4a.

Figure 4: Class distribution of samples in the two benchmark datasets.
The NF-CSE-CIC-IDS2018 [28] dataset is a benchmark for network intrusion detection. It contains over eight million flow records collected over multiple days, including benign traffic and various attack types such as DoS/DDoS, brute-force, web-based attacks, botnet activity, and infiltration. Each flow record has more than eighty features. As shown in Fig. 4b, the class distribution is highly imbalanced, with benign traffic and large-scale attacks dominating.
Both datasets are preprocessed using a unified pipeline prior to training. First, discriminative flow-level features are extracted from raw network traffic records following the unified flow format specification. No explicit feature selection or manual dimensionality reduction is applied at this stage; instead, all available flow-level features are retained to preserve complete traffic semantics. This design choice avoids introducing manual feature-selection bias and allows the autoencoder to automatically learn compact latent representations from the full feature space.
A sliding window mechanism with a fixed length of eight time steps is then employed to segment continuous traffic flows into sequential samples, where each window represents a complete network behavior unit and captures short-term temporal evolution patterns.
Numerical features are normalized using Z-score standardization to ensure comparable value ranges across different feature dimensions. Categorical features are encoded using a bounded strategy, where the number of categorical levels is explicitly limited (32 levels per categorical attribute) and infrequent categories are mapped to an auxiliary token. This design prevents uncontrolled dimensionality expansion after encoding.
Missing or invalid values are handled using standard preprocessing operations to maintain data consistency. After preprocessing, the resulting flow sequences are transformed into normalized tensor representations and fed into the autoencoder module, which performs implicit feature compression and representation learning prior to downstream classification. Following preprocessing, the resulting feature dimensionality is fixed at 264 for UNSW-NB15 and 290 for NF-CSE-CIC-IDS2018, respectively.
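The windowing, standardization, and bounded categorical encoding steps can be sketched as follows. The 32-level cap and the 8-step window come from the text; the most-frequent-first level selection and the rare-token id are assumptions:

```python
import numpy as np

MAX_LEVELS = 32  # bounded categorical encoding: cap on levels per attribute
WINDOW = 8       # sliding-window length in time steps

def zscore(col):
    """Z-score standardization of a numeric feature column."""
    std = col.std()
    return (col - col.mean()) / std if std > 0 else col - col.mean()

def encode_categorical(values):
    """Map the most frequent MAX_LEVELS - 1 categories to integer ids;
    everything else collapses to an auxiliary 'rare' token (id 0, assumed)."""
    uniq, counts = np.unique(values, return_counts=True)
    keep = uniq[np.argsort(-counts)][:MAX_LEVELS - 1]
    table = {v: i + 1 for i, v in enumerate(keep)}
    return np.array([table.get(v, 0) for v in values])

def sliding_windows(X, size=WINDOW):
    """Segment a flow sequence (N, D) into overlapping windows (N-size+1, size, D)."""
    return np.stack([X[i:i + size] for i in range(len(X) - size + 1)])

rng = np.random.default_rng(4)
num = zscore(rng.normal(5.0, 2.0, size=100))                 # numeric feature column
cat = encode_categorical(rng.integers(0, 50, size=100).astype(str))
X = np.column_stack([num, cat])
W = sliding_windows(X)
print(W.shape)  # (93, 8, 2): windows x time steps x features
```

Note how the 50 raw categories are compressed into at most 32 ids, so the encoded width stays bounded no matter how many distinct values appear in the raw traffic.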
In this study, the proposed approach is compared and analyzed against the following methods.
• XGBoost [29]: A gradient-boosted tree ensemble; in a comparative study, Lawal et al. evaluated multiple machine learning models, including XGBoost, on the CSE-CIC-IDS2018 dataset using undersampling to address class imbalance.
• Support Vector Machine (SVM) [30]: A classical supervised learning method that constructs optimal separating hyperplanes in high-dimensional feature spaces and is frequently used as a baseline in intrusion detection studies.
• Graph Attention Network (NEGAT) [31]: A self-supervised GNN-based model that exploits graph attention and contrastive learning to model structural dependencies among network flows for unsupervised intrusion detection.
• CNN-LSTM [32]: A hybrid deep learning framework combining convolutional neural networks for local feature extraction with LSTM layers to capture temporal dependencies in network flow data.
• CNN-BiLSTM-Attention [17]: An improved hybrid model utilizing bidirectional LSTM layers alongside an attention mechanism to capture temporal dependencies in both directions and highlight discriminative features.
• FlowTransformer [25]: A transformer-based network intrusion detection framework that models long-term dependencies in flow-level traffic and provides a modular architecture for flexible component substitution across different datasets.
• AI-SCAN [33]: A deep learning IDS that utilizes a CNN architecture with SMOTE-based data augmentation and class weighting strategies to handle imbalanced data.
Two publicly available benchmark datasets are adopted to evaluate the effectiveness of the proposed intrusion detection model: UNSW-NB15 and NF-CSE-CIC-IDS2018.
Table 1 summarizes the multi-class classification performance on the UNSW-NB15 and NF-CSE-CIC-IDS2018 datasets. In addition to classical machine learning and CNN-based baselines, FlowTransformer is included as a representative Transformer-based NIDS baseline, whose results are reproduced under the same experimental settings to ensure fair comparison.

Given the severe class imbalance present in both datasets, Macro-F1 is emphasized as the primary evaluation metric, since it reflects balanced performance across attack categories rather than being dominated by majority classes. On the UNSW-NB15 dataset, ATC-FusionNet achieves a Macro-F1 of
On the more challenging NF-CSE-CIC-IDS2018 dataset, ATC-FusionNet achieves a Macro-F1 of
To evaluate robustness and mitigate the influence of stochastic factors, all results for ATC-FusionNet are reported as the mean and standard deviation over multiple runs with different random seeds. The consistently low variance across metrics further confirms that the observed performance gains are stable rather than arising from random fluctuations.
Overall, the results on both benchmark datasets demonstrate that ATC-FusionNet achieves stable and competitive performance compared with recent Transformer-based and deep learning NIDS approaches, highlighting the effectiveness of the proposed architecture for multi-class intrusion detection tasks.
Table 2 reports the per-class classification performance of the proposed model on the UNSW-NB15 dataset. Overall, the model achieves stable performance across most traffic categories. In particular, the Normal class attains an F1-score of 0.9961, demonstrating that the proposed model can accurately separate benign samples from attack instances with a low false alarm rate.

For the majority of attack categories, including Generic, Exploits, Fuzzers, and DoS, the model consistently achieves F1-scores above 0.92. These attacks typically generate relatively stable traffic patterns with distinctive statistical characteristics, allowing the dual-branch architecture to capture both global temporal dependencies and local flow-level variations effectively.
From an error analysis perspective, Fig. 5 provides further insight into the misclassification behavior of the proposed model on the UNSW-NB15 dataset. Most samples are concentrated along the main diagonal, indicating that ATC-FusionNet achieves stable and reliable detection performance for the majority of traffic categories. Nevertheless, misclassifications are not uniformly distributed across classes and are mainly concentrated among minority attack types.

Figure 5: Confusion matrix (UNSW-NB15).
Specifically, samples from the Reconnaissance and Worms classes are occasionally misclassified as semantically related attack categories or benign traffic. This observation is consistent with their lower F1-scores reported in Table 2. Such failure cases are primarily caused by two factors. First, these attack categories suffer from severe sample scarcity, which limits the model’s ability to learn sufficiently discriminative decision boundaries. Second, their flow-level statistical characteristics exhibit substantial overlap with other attack types, reducing feature separability at the flow representation level.
Table 3 presents the per-class performance on the NF-CSE-CIC-IDS2018 dataset. The proposed model achieves outstanding detection performance for the dominant Benign traffic, with an accuracy of 0.9994, as well as for several high-frequency attack categories such as DoS-Hulk, DDoS-HOIC, and DoS-GoldenEye. The findings show that the model captures highly distinctive features for significant and large-scale attack behaviors.

However, for extremely low-frequency attack types, including Brute Force-Web and SQL Injection, the detection performance is noticeably reduced. As illustrated by the class distribution in Fig. 4b, these categories suffer from severe sample imbalance and contain extremely few samples. Consequently, even a small number of misclassifications can lead to disproportionately low evaluation scores, a phenomenon commonly observed in highly imbalanced intrusion detection scenarios.
Fig. 6 further reveals that misclassifications are mainly concentrated among these rare attack categories. Specifically, samples from Brute Force-Web and SQL Injection are occasionally confused with other web-related attacks or benign traffic. These attacks typically involve short-lived, low-intensity interactions that generate traffic patterns very similar to benign web activities; as a result, their flow-level statistical features overlap substantially with normal traffic, which significantly reduces feature separability even for deep representation models.

Figure 6: Confusion matrix (NF-CSE-CIC-IDS2018).
Overall, this analysis suggests that the remaining classification errors are mainly driven by intrinsic data characteristics rather than training instability or model underfitting. While the proposed Dynamic Weighted Logit-adjusted Focal Loss alleviates class-prior bias and enhances minority class learning, accurately detecting extremely rare and behaviorally similar attack types remains challenging.
Fig. 7 presents the reconstruction loss evolution during autoencoder pretraining on the UNSW-NB15 and CIC-IDS2018 datasets.

Figure 7: Autoencoder training loss curve.
For both datasets, the reconstruction loss decreases rapidly in the initial epochs, indicating that the autoencoder quickly captures dominant statistical regularities in network flow features. As training proceeds, the training and validation losses converge smoothly with a small gap, suggesting stable optimization and good generalization.
Different convergence behaviors are observed between the two datasets. UNSW-NB15 exhibits a relatively smooth convergence toward a low-loss plateau, reflecting more stable traffic patterns. In contrast, CIC-IDS2018 shows a higher initial loss and a steeper early decline, likely due to the greater diversity and complexity of its traffic behaviors.
Overall, these results indicate that the autoencoder effectively compresses raw flow features into a compact latent space while preserving essential information for downstream intrusion detection.
To further evaluate the effectiveness of the dual-branch architecture, the performance of single-branch models was compared with that of the dual-branch fusion model, as shown in Figs. 8 and 9.

Figure 8: Single- and dual-branch model comparison on UNSW-NB15.

Figure 9: Single- and dual-branch model comparison on NF-CSE-CIC-IDS2018.
As shown in Fig. 8, the dual-branch model consistently outperforms the single-branch models across multiple attack categories in the UNSW-NB15 dataset. This improvement indicates that combining global contextual modeling and local feature extraction provides complementary information for network traffic representation.
Notably, for the Worms category, the dual-branch model achieves improvements of approximately 0.12 and 0.23 over the Transformer and CNN models, respectively. This result highlights the advantage of feature fusion when dealing with extremely low-sample attack scenarios, where relying on a single representation type may lead to unstable decision boundaries.
In contrast, the CNN-only model exhibits relatively lower performance and larger fluctuations across several categories, suggesting that local convolutional patterns alone are insufficient to capture long-range dependencies in sequential network traffic.
Fig. 9 shows that all three models achieve high detection accuracy for Benign traffic and several high-frequency attack categories in the NF-CSE-CIC-IDS2018 dataset, reflecting stable performance in modeling mainstream traffic features.
However, for attack categories with limited samples and complex feature distributions, the dual-branch model demonstrates more pronounced advantages. In minority attack classes such as Brute Force-Web, SQL Injection, Brute Force-XSS, and Infiltration, the dual-branch model significantly outperforms both the Transformer-only and CNN-only models.
These attack types often involve short-lived or stealthy behaviors that generate subtle traffic variations. As a result, their statistical flow features exhibit substantial overlap with other attack categories or benign traffic. The fusion of global temporal dependencies and local feature patterns enables the proposed architecture to better capture these subtle variations, leading to improved detection performance.
To evaluate the effectiveness of DWLF and to compare it with simpler imbalance-handling strategies, systematic ablation experiments were conducted on both datasets, as summarized in Table 4.

The ablation results in Table 4 show that each component of the DWLF loss contributes to performance improvements. Introducing logit adjustment significantly improves the baseline cross-entropy performance, indicating that correcting class-prior bias is beneficial in highly imbalanced intrusion detection scenarios.
Adding the focal modulation further enhances detection performance by increasing the contribution of hard-to-classify samples during training. Static class re-weighting provides additional improvement by compensating for global class imbalance.
Finally, incorporating the dynamic weighting mechanism yields the best overall results, achieving accuracies of 90.01% and 97.82% on UNSW-NB15 and CIC-IDS2018, respectively. This result suggests that dynamically adapting class weights according to batch-level distributions helps stabilize optimization under highly skewed traffic distributions.
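The three ingredients discussed above (logit adjustment, focal modulation, and batch-level dynamic weighting) can be combined as in the following NumPy sketch. This is a schematic reconstruction, not the paper's exact DWLF formulation: the prior-shift scale `tau`, the inverse-frequency weighting, and the batch-level statistics are assumptions for illustration.

```python
import numpy as np

def dwlf_loss(logits, labels, class_priors, gamma=2.0, tau=1.0):
    """Sketch of a Dynamic Weighted Logit-adjusted Focal loss.

    logits: (batch, C) raw scores; labels: (batch,) class indices;
    class_priors: (C,) training-set class frequencies.
    """
    # 1) Logit adjustment: shift logits by the log class priors to
    #    counteract the bias toward majority classes.
    adj = logits + tau * np.log(class_priors)
    e = np.exp(adj - adj.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    pt = p[np.arange(len(labels)), labels]          # prob. of the true class
    # 2) Dynamic weights from the batch-level label distribution
    #    (balanced inverse frequency over the classes present in the batch).
    counts = np.bincount(labels, minlength=len(class_priors)).astype(float)
    n_present = np.count_nonzero(counts)
    w = np.where(counts > 0, counts.sum() / (n_present * np.maximum(counts, 1)), 0.0)
    # 3) Focal modulation down-weights easy, well-classified samples.
    loss = -w[labels] * (1.0 - pt) ** gamma * np.log(pt + 1e-12)
    return loss.mean()

rng = np.random.default_rng(1)
logits = rng.normal(size=(16, 5))
labels = rng.integers(0, 5, size=16)
priors = np.array([0.6, 0.2, 0.1, 0.07, 0.03])
loss = dwlf_loss(logits, labels, priors)
```

Setting `tau=0`, `gamma=0`, and uniform weights recovers plain cross-entropy, which is why each ablation row in Table 4 corresponds to switching one of these terms on.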
4.7 Computational Complexity and Deployment Feasibility
To assess the practical feasibility of ATC-FusionNet, we perform a quantitative complexity–performance analysis, reporting parameter count, FLOPs, training time per epoch, and inference latency in addition to detection accuracy. Two simplified baselines, namely CNN-only and Transformer-only models, are implemented with the same input representation and classifier head. Inference latency is measured as the average per-sample forward time over 100 runs under batch size 1 on an NVIDIA RTX 3090 Ti GPU, with the model evaluated in inference mode.
As shown in Table 5, ATC-FusionNet incurs a moderate increase in computational cost due to the dual-branch design and attention-based fusion. Nevertheless, the model remains compact with fewer than 1M parameters and an inference latency of approximately 3 ms per sample. Compared to the Transformer-only baseline, the additional overhead is modest, while consistently yielding improved detection performance. These results suggest that the architectural complexity of ATC-FusionNet is well motivated by its performance gains in challenging intrusion detection scenarios.

Memory usage during training and inference is primarily dominated by the Transformer encoder and the autoencoder components. Given the modest parameter scale (<1M parameters), the peak GPU memory footprint remains well within the capacity of a single modern GPU.
From a deployment perspective, the lightweight architecture and low inference latency further indicate that ATC-FusionNet is suitable for real-time intrusion detection scenarios. With an average inference time of approximately 3 ms per sample, the model can process network flow sequences efficiently without introducing significant delay in traffic monitoring pipelines.
Moreover, the compact parameter size and moderate computational requirements suggest that the model can be deployed not only on high-performance servers but also in resource-constrained environments, such as edge gateways or distributed network monitoring nodes. These characteristics make the proposed framework practical for large-scale or edge-oriented intrusion detection systems where both detection accuracy and computational efficiency are critical.
5 Conclusion
This study proposes an autoencoder-based dual-branch network intrusion detection model to address the limitations of existing methods in complex network environments, including low detection accuracy, insufficient recognition of minority attack classes, and inadequate feature representation. The model employs a parallel dual-branch architecture integrating Transformer [34] and CNN [35], enabling the simultaneous capture of long-range sequential dependencies and local critical behavior patterns. An attention-guided fusion mechanism (Attention Fusion) is incorporated to enhance the focus on key traffic patterns. At the feature input stage, a complete autoencoder-based feature reconstruction module is implemented, which improves the abstraction capability and stability of the model for high-dimensional network traffic through dimensionality reduction and reconstruction loss.
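The dimensionality-reduction-and-reconstruction idea behind the feature module can be sketched in a few lines of NumPy. This is a toy forward pass with random weights, assumed dimensions (32 input features compressed to an 8-dimensional code), and an assumed ReLU non-linearity, not the trained module itself.

```python
import numpy as np

rng = np.random.default_rng(42)

def autoencoder_forward(x, W_enc, W_dec):
    # Encoder: compress flow features to a low-dimensional code.
    z = np.maximum(x @ W_enc, 0.0)       # ReLU non-linearity (assumed)
    # Decoder: reconstruct the original feature vector from the code.
    x_hat = z @ W_dec
    # Mean-squared reconstruction loss drives the code to retain
    # the essential behavioral characteristics of the traffic.
    rec_loss = np.mean((x - x_hat) ** 2)
    return z, x_hat, rec_loss

x = rng.normal(size=(8, 32))              # 8 flows, 32 statistical features
W_enc = rng.normal(size=(32, 8)) * 0.1    # compress 32 -> 8
W_dec = rng.normal(size=(8, 32)) * 0.1    # reconstruct 8 -> 32
z, x_hat, rec_loss = autoencoder_forward(x, W_enc, W_dec)
```

The code `z` (rather than the raw features) is what feeds the dual-branch network, so minimizing the reconstruction loss acts as a stability-preserving pre-processing step.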
Experimental results on the UNSW-NB15 dataset indicate that the proposed model attains an overall accuracy of 90.01% and a Macro-F1 score of 85.14%, reflecting improved class-balanced performance beyond majority-class accuracy. In particular, for rare attack categories such as Analysis and Backdoor, the proposed model attains F1-scores of 0.7138 and 0.6784, respectively, which are higher than those typically reported by baseline models. On the NF-CSE-CIC-IDS2018 dataset, ATC-FusionNet achieves an accuracy of 97.82% with a Macro-F1 score of 77.20%, reflecting robust performance under more extreme class imbalance.
Despite these advancements, several challenges remain. In extremely imbalanced scenarios, the model may still produce misclassifications for certain highly similar attack categories. Specifically, samples from Reconnaissance and Worms, as well as Brute Force-Web and SQL Injection, are occasionally misclassified as semantically related attack categories or benign traffic due to their short-lived, low-intensity behaviors and substantial overlap in flow-level statistical features. In addition, the dual-branch architecture combined with the autoencoder introduces computational overhead. Future work will therefore focus on lightweight model variants and high-throughput, low-latency real-time intrusion detection for practical deployment, further improving the model's operational viability and practical value.
Acknowledgement: Thanks to our tutors and researchers for their assistance and guidance.
Funding Statement: This work was supported in part by the National Natural Science Foundation of China under Grant U22A2004 and in part by Zhejiang Key Laboratory of Digital Fashion and Data Governance, Zhejiang Sci-Tech University, Hangzhou, China.
Author Contributions: Study conception and design: Liping Wang and Jiang Wu; analysis and interpretation of results: Liping Wang and Jiang Wu; data collection: Liang Wang; draft manuscript preparation: Liping Wang. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The datasets used in this study are publicly available. The UNSW-NB15 dataset is available at https://research.unsw.edu.au/projects/unsw-nb15-dataset. The NF-CSE-CIC-IDS2018 dataset is available from the Canadian Institute for Cybersecurity at https://www.unb.ca/cic/datasets/ids-2018.html.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Odeh A, Taleb AA. Robust network security: a deep learning approach to intrusion detection in IoT. Comput Mater Contin. 2024;81(3):4149–69. doi:10.32604/cmc.2024.058052. [Google Scholar] [CrossRef]
2. Liu Z, Liu S, Zhang J. An industrial intrusion detection method based on hybrid convolutional neural networks with improved TCN. Comput Mater Contin. 2024;78(1):411–33. doi:10.32604/cmc.2023.046237. [Google Scholar] [CrossRef]
3. Alrayes FS, Zakariah M, Amin SU, Khan ZI, Alqurni JS. Network security enhanced with deep neural network-based intrusion detection system. Comput Mater Contin. 2024;80(1):1457–90. doi:10.32604/cmc.2024.051996. [Google Scholar] [CrossRef]
4. Deng J, Shen W, Feng Y, Lu G, Shen G, Cui L, et al. Lightweight intrusion detection using reservoir computing. Comput Mater Contin. 2024;78(1):1345–61. doi:10.32604/cmc.2023.047079. [Google Scholar] [CrossRef]
5. Salih AA, Abdulrazaq MB. Cybernet Model: a new deep learning model for cyber DDoS attacks detection and recognition. Comput Mater Contin. 2024;78(1):1275–95. doi:10.32604/cmc.2023.046101. [Google Scholar] [CrossRef]
6. Patel S, Kumar R, Singh G. Enhanced hybrid deep learning models-based anomaly detection method for two-stage binary and multi-class classification of attacks in intrusion detection systems. Algorithms. 2025;18(2):69. doi:10.3390/a18020069. [Google Scholar] [CrossRef]
7. Neto ECP, Iqbal S, Buffett S, Sultana M, Taylor A. Deep learning for intrusion detection in emerging technologies: a comprehensive survey and new perspectives. Artif Intell Rev. 2025;58(11):340. doi:10.1007/s10462-025-11346-z. [Google Scholar] [CrossRef]
8. Ayantayo A, Kaur A, Kour A, Schmoor X, Shah F, Vickers I, et al. Network intrusion detection using feature fusion with deep learning. J Big Data. 2023;10(1):167. doi:10.1186/s40537-023-00834-0. [Google Scholar] [CrossRef]
9. Abdel-Basset M, Manogaran G, Mirjalili S, Surendiran M. Artificial intelligence driven cyberattack detection system using integration of deep belief network with convolutional neural network on industrial IoT. Alex Eng J. 2025;110(2):438–50. doi:10.1016/j.aej.2024.10.009. [Google Scholar] [CrossRef]
10. Liu H, Lang B. Machine learning and deep learning methods for intrusion detection systems: a survey. Appl Sci. 2019;9(20):4396. doi:10.3390/app9204396. [Google Scholar] [CrossRef]
11. Kamal H, Mashaly M. Advanced hybrid transformer-convolutional neural network (transformer-CNN) deep learning model for effective intrusion detection systems with class imbalance mitigation using resampling techniques. Future Internet. 2024;16(12):481. doi:10.3390/fi16120481. [Google Scholar] [CrossRef]
12. Zhang C, Li J, Wang N, Zhang D. Research on intrusion detection method based on transformer and CNN-BiLSTM in Internet of Things. Sensors. 2025;25(9):2725. doi:10.3390/s25092725. [Google Scholar] [PubMed] [CrossRef]
13. Gu J, Lu S. An effective intrusion detection approach using SVM with Naive Bayes feature embedding. Comput Secur. 2021;103(3):102158. doi:10.1016/j.cose.2020.102158. [Google Scholar] [CrossRef]
14. Aljawarneh S, Aldwairi M, Yassein MB. Anomaly-based intrusion detection system through feature selection and hybrid model analysis. J Comput Sci. 2018;25(3):152–60. doi:10.1016/j.jocs.2017.03.006. [Google Scholar] [CrossRef]
15. Yang J, Li T, Liang G, He W, Zhao Y. A simple recurrent unit model based intrusion detection system with DCGAN. IEEE Access. 2019;7:83286–96. doi:10.1109/ACCESS.2019.2922692. [Google Scholar] [CrossRef]
16. Rajesh Kanna P, Santhi P. Unified deep learning approach for efficient intrusion detection using integrated spatial–temporal features. Knowl Based Syst. 2021;226(1):107132. doi:10.1016/j.knosys.2021.107132. [Google Scholar] [CrossRef]
17. Wei D, Li X, Ji W, He S. Network intrusion detection method based on CNN, BiLSTM, and attention mechanism. IEEE Access. 2024;12:1–14. doi:10.1109/ACCESS.2024.3384528. [Google Scholar] [CrossRef]
18. Liu J, Gao Y, Hu F. A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM. Comput Secur. 2021;106(99):102289. doi:10.1016/j.cose.2021.102289. [Google Scholar] [CrossRef]
19. Li X, Chen W, Zhang Q, Wu L. Building auto-encoder intrusion detection system based on random forest feature selection. Comput Secur. 2020;95(1):101851. doi:10.1016/j.cose.2020.101851. [Google Scholar] [CrossRef]
20. Lopes IO, Zou DQ, Abdulqadder IH, Ruambo FA, Yuan B, Jin H. Effective network intrusion detection via representation learning: a denoising autoencoder approach. Comput Commun. 2022;194(5):55–65. doi:10.1016/j.comcom.2022.07.027. [Google Scholar] [CrossRef]
21. Song Y, Hyun S, Cheong YG. Analysis of autoencoders for network intrusion detection. Sensors. 2021;21(13):4294. doi:10.3390/s21134294. [Google Scholar] [PubMed] [CrossRef]
22. Vanlalruata H, Najar AA, Laldinsanga C, Hussain J, Hmingliana L. A lightweight intrusion detection system using deep convolutional neural network. Comput Electr Eng. 2025;127:110561. doi:10.1016/j.compeleceng.2025.110561. [Google Scholar] [CrossRef]
23. Mohammadpour L, Ling TC, Liew CS, Aryanfar A. A survey of CNN-based network intrusion detection. Appl Sci. 2022;12(16):8162. doi:10.3390/app12168162. [Google Scholar] [CrossRef]
24. Alashjaee AM. Deep learning for network security: an Attention-CNN-LSTM model for accurate intrusion detection. Sci Rep. 2025;15(1):21856. doi:10.1038/s41598-025-07706-y. [Google Scholar] [PubMed] [CrossRef]
25. Manocchio LD, Layeghy S, Lo WW, Kulatilleke GK, Sarhan M, Portmann M. FlowTransformer: a transformer framework for flow-based network intrusion detection systems. Expert Syst Appl. 2024;241(10):122564. doi:10.1016/j.eswa.2023.122564. [Google Scholar] [CrossRef]
26. Almadhor A, Alsubai S, Kryvinska N, Al Hejaili A, Ayari M, Bouallegue B, et al. Evaluating large transformer models for anomaly detection of resource-constrained IoT devices for intrusion detection system. Sci Rep. 2025;15(1):37972. doi:10.1038/s41598-025-21826-5. [Google Scholar] [PubMed] [CrossRef]
27. Moustafa N, Slay J. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Proceedings of the Military Communications and Information Systems Conference (MilCIS); 2015 Nov 10–12; Canberra, ACT, Australia. p. 1–6. [Google Scholar]
28. Sarhan M, Layeghy S, Moustafa N, Portmann M. NetFlow datasets for machine learning-based network intrusion detection systems. In: Big Data Technologies and Applications (BDTA 2020) and Wireless Internet (WiCON 2020). Cham, Switzerland: Springer; 2021. p. 117–35. doi:10.1007/978-3-030-72802-1_9. [Google Scholar] [CrossRef]
29. Saidi Z, Ouidad A, El Idrissi Younes EB. A machine learning and blockchain-based framework for enhanced intrusion detection systems using the CSE-CIC-IDS2018 dataset. Informatica. 2025;49(18). doi:10.31449/inf.v49i18.9421. [Google Scholar] [CrossRef]
30. Ahmad I, Basheri M, Iqbal MJ, Raheem A. Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access. 2018;6:33789–95. doi:10.1109/ACCESS.2018.2841987. [Google Scholar] [CrossRef]
31. Sarhan M, Layeghy S, Portmann M. Evaluating standard feature sets towards increased generalisability and explainability of ML-based network intrusion detection. Big Data Res. 2022;30(3):100359. doi:10.1016/j.bdr.2022.100359. [Google Scholar] [CrossRef]
32. Halbouni A, Gunawan TS, Habaebi MH, Halbouni M, Kartiwi M, Ahmad R. CNN-LSTM: hybrid deep neural network for network intrusion detection system. IEEE Access. 2022;10(11):99837–49. doi:10.1109/ACCESS.2022.3206425. [Google Scholar] [CrossRef]
33. Shivakanth G. A performance analysis of ML-based intrusion detection systems in cloud environments. Int J Electr Electron Eng Telecommun. 2025;14(4):243–52. doi:10.18178/ijeetc.14.4.243-252. [Google Scholar] [CrossRef]
34. Wang J, Si C, Wang Z, Fu Q. A new industrial intrusion detection method based on CNN-BiLSTM. Comput Mater Contin. 2024;79(3):4297–4318. doi:10.32604/cmc.2024.050223. [Google Scholar] [CrossRef]
35. Ren K, Yuan S, Zhang C, Shi Y, Huang Z. CANET: a hierarchical CNN-attention model for network intrusion detection. Comput Commun. 2023;205(16):170–81. doi:10.1016/j.comcom.2023.04.018. [Google Scholar] [CrossRef]
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.