ARAE: An Adaptive Robust AutoEncoder for Network Anomaly Detection

Chunyong Yin; Williams Kyei

doi:10.32604/jcs.2025.072740

icon Open Access

ARTICLE

ARAE: An Adaptive Robust AutoEncoder for Network Anomaly Detection

Chunyong Yin, Williams Kyei^*

Department School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, 210044, China

* Corresponding Author: Williams Kyei. Email: email

Journal of Cyber Security 2025, 7, 615-635. https://doi.org/10.32604/jcs.2025.072740

Received 02 September 2025; Accepted 16 October 2025; Issue published 24 December 2025

Abstract

The evolving sophistication of network threats demands anomaly detection methods that are both robust and adaptive. While autoencoders excel at learning normal traffic patterns, they struggle with complex feature interactions and require manual tuning for different environments. We introduce the Adaptive Robust AutoEncoder (ARAE), a novel framework that dynamically balances reconstruction fidelity with latent space regularization through learnable loss weighting. ARAE incorporates multi-head attention to model feature dependencies and fuses multiple anomaly indicators into an adaptive scoring mechanism. Extensive evaluation on four benchmark datasets demonstrates that ARAE significantly outperforms existing autoencoder variants and classical methods, with ablation studies confirming the critical importance of its adaptive components. The framework provides a robust, self-tuning solution for modern network intrusion detection systems without requiring manual hyperparameter optimization.

Keywords

Adaptive robust autoencoder (ARAE); unsupervised anomaly detection; deep autoencoders; variational Autoencoders (VAE); learnable adaptive regularization; latent space regularization; mahalanobis distance; cybersecurity network security; deep learning

1 Introduction

The escalating sophistication and scale of cyber threats pose formidable challenges to global digital infrastructure, making robust network anomaly detection (NAD) a critical cybersecurity priority [1]. With projected annual damages exceeding $10.5 trillion by 2025 [2], the evolving threat landscape from stealthy intrusions to large-scale denial-of-service attacks demands adaptive defense mechanisms. This challenge is exacerbated in real-world environments where anomalies are rare, often unlabeled, and continuously evolving, rendering traditional signature-based methods ineffective. While supervised learning excels in domains with abundant labeled data, it struggles in NAD due to annotation scarcity, zero-day attacks, and the prohibitive cost of maintaining current labeled datasets [3,4]. Consequently, unsupervised learning paradigms have gained traction, with autoencoders (AEs) and variational autoencoders (VAEs) emerging as prominent approaches for learning compressed representations of normal traffic without requiring prior attack knowledge [4,5].

Despite their promise, standard AE-based models suffer from critical limitations that hinder practical deployment in Network Intrusion Detection Systems (NIDS). They often learn unstable latent representations, exhibit sensitivity to noise and feature scaling in high-dimensional network data, and demonstrate poor generalization from legacy benchmarks to modern traffic captures [6,7]. A fundamental weakness is their reliance on reconstruction error alone, which proves inadequate for attack types that reconstruct easily [8]. Concurrently, classical methods like Isolation Forest and One-Class SVM, while computationally efficient, lack the representational capacity to model complex nonlinear feature interactions in high-dimensional data, leading to suboptimal performance and high false-positive rates [9,10].

Recent advances highlight promising directions for addressing these limitations. Dual attention mechanisms have shown effectiveness in capturing complex interdependencies in multivariate time series data [11], while improved regularization techniques enhance autoencoder robustness in industrial settings [12]. Additionally, adaptive thresholding approaches combined with graph attention networks demonstrate the value of self-tuning frameworks for evolving anomalies [13]. However, these approaches remain fragmented attention mechanisms, adaptive regularization, and dynamic scoring are typically explored in isolation rather than as an integrated solution. Crucially, no existing framework provides end-to-end adaptability specifically tailored for the tabular, heterogeneous nature of network traffic data.

To bridge this gap, we introduce the Adaptive Robust AutoEncoder (ARAE), a novel architecture that enhances unsupervised NAD through three integrated innovations:

1. Multi-head self-attention to model complex feature interdependencies in network traffic, building on attention mechanisms like those in [11] but adapted for tabular data

2. Learnable adaptive weighting of KL-divergence and center loss terms for dynamic training balance, extending regularization concepts from [12,14]

3. An adaptive scoring mechanism that fuses reconstruction error, Mahalanobis distance, and KL divergence, incorporating self-tuning principles similar to [13,15]

Trained exclusively on normal traffic, ARAE learns a tightly regularized latent distribution while automatically adapting to data characteristics without manual hyperparameter tuning providing the first comprehensive framework that integrates attention, adaptive regularization, and dynamic scoring specifically for network anomaly detection. Our contributions include: (1) the ARAE framework itself, which introduces unprecedented end-to-end adaptability to autoencoder-based NAD; (2) comprehensive evaluation demonstrating state-of-the-art performance across four benchmark datasets; and (3) rigorous ablation studies validating each component’s necessity. This work provides a robust, self-tuning solution for modern NIDS that effectively addresses the adaptability limitations of existing approaches.

2 Related Work

The evolution of network anomaly detection (NAD) has progressed from classical statistical methods to sophisticated deep learning architectures, driven by the need to handle unlabeled, high-dimensional network traffic. This section reviews unsupervised NAD methods thematically, focusing on autoencoder-based approaches and their variants, to contextualize our proposed Adaptive Robust AutoEncoder (ARAE).

2.1 Standard Autoencoder Approaches

Autoencoders (AEs) have become a cornerstone of unsupervised NAD due to their ability to learn compressed representations of normal traffic patterns [3,5]. The fundamental premise involves training a network to reconstruct input data, under the assumption that anomalies deviating from learned patterns will exhibit high reconstruction error. Standard AEs offer simplicity and computational efficiency but suffer from several limitations: sensitivity to noise and hyperparameter tuning, propensity to learn identity functions, and unstable latent representations that lead to inconsistent scoring [4,13].

Variational autoencoders (VAEs) address some stability issues by introducing probabilistic latent spaces [16]. By regularizing the latent distribution towards a Gaussian prior, VAEs learn smoother representations less prone to overfitting. However, both standard AEs and VAEs rely predominantly on reconstruction error for anomaly detection, which proves inadequate for attack types that reconstruct easily [8]. This limitation has motivated research into more robust architectures.

2.2 Robust and Variational Autoencoder Variants

To enhance robustness, researchers have developed several advanced regularization techniques. Center loss approaches force latent embeddings of normal samples to cluster tightly around a central point, creating more defined decision boundaries [17]. Robust covariance estimation methods, such as Ledoit-Wolf shrinkage, provide more reliable Mahalanobis distance calculations in latent space, reducing sensitivity to outliers [9,18]. Other approaches incorporate adversarial training or discriminative components, conceptually similar to Generative Adversarial Networks, to sharpen the distinction between normal and anomalous patterns [19].

Probabilistic extensions like β-VAEs introduce tunable regularization strengths, while methods like [12] explore enhanced regularization for industrial anomaly detection. Despite these advances, most robust variants maintain static formulations with fixed hyperparameters that cannot adapt to different network environments or traffic characteristics.

2.3 Hybrid and Attention-Based Architectures

The integration of attention mechanisms represents a significant advancement in capturing complex feature interactions. Inspired by transformer architectures [20], attention-based AEs allow models to dynamically weight feature importance during reconstruction. Recent works like DTAAD [11] demonstrate that dual attention architectures can effectively model interdependencies in multivariate time series data. Similarly, graph attention networks have been combined with VAEs for adaptive anomaly thresholding in industrial settings [13].

Hybrid approaches combine AEs with classical methods, using the latent space as input for one-class classifiers like SVMs [9,21]. Other architectures incorporate temporal modeling capabilities through RNNs or temporal convolutions for sequence-aware anomaly detection [22]. While these hybrid models often improve performance, they typically lack end-to-end adaptability and require careful manual tuning of component weights.

2.4 Research Gaps and Our Contribution

As summarized in Table 1, existing approaches exhibit complementary strengths but critical limitations. Standard and robust AEs lack mechanisms for handling complex feature interactions. Attention-based models capture dependencies but maintain static formulations. Hybrid approaches improve discrimination but require manual component integration. Most significantly, no existing framework provides end-to-end adaptability across both training and inference phases.

images

Our proposed Adaptive Robust AutoEncoder (ARAE) addresses these gaps through three integrated innovations:

1. Multi-head self-attention for dynamic feature interaction modeling

2. Learnable adaptive weighting of KL-divergence and center loss terms

3. Adaptive scoring mechanism that fuses multiple anomaly indicators

Unlike previous works that implement these components statically or in isolation, ARAE introduces full adaptability specifically designed for tabular network data. By dynamically balancing reconstruction fidelity against latent space regularization during training, and intelligently weighting anomaly indicators during inference, ARAE provides the first comprehensive framework that automatically adapts to diverse network environments without manual hyperparameter tuning. Unlike prior robust AEs that lack visual diagnostics, ARAE’s feature-level attention provides interpretable heatmaps for enhanced detection insights.

3 Proposed Approach

This section introduces the Adaptive Robust AutoEncoder (ARAE), a novel framework designed to address the limitations of existing autoencoder-based methods in unsupervised network anomaly detection. ARAE incorporates three synergistic innovations: (1) a multi-head self-attention mechanism to capture complex feature interdependencies, (2) learnable adaptive regularization for dynamic balancing of training objectives, and (3) a multi-metric fusion approach for robust anomaly scoring. These components enable ARAE to adapt automatically to diverse network environments without extensive manual hyperparameter tuning. Fig. 1 provides a high-level flowchart of the complete pipeline, illustrating data flow from input processing through encoding, attention, latent projection, reconstruction, and scoring enhancing accessibility for readers less familiar with deep learning architectures.

images

Figure 1: ARAE processing pipeline illustrating data flow from input through attention mechanism and latent space to final anomaly scoring

3.1 Model Architecture and Pipeline

The ARAE pipeline processes network traffic features in a structured sequence, as depicted in Fig. 1. Raw input data x∈Rd undergoes standardization before entering the encoder for initial feature extraction. The multi-head attention layer then models inter-feature relationships, followed by variational projection to a regularized latent space. The decoder reconstructs the input from this latent representation, and anomalies are scored via a composite metric fusing reconstruction error with latent deviations. This end-to-end design allows ARAE to learn robust representations of normal traffic while remaining sensitive to deviations, making it suitable for dynamic cybersecurity applications.

3.2 Model Components

Encoder Network

The encoder, (Eϕ), transforms an input sample (x∈Rd) into a hidden representation via stacked nonlinear layers:

h=ReLU(W2⋅ReLU(W1⋅x+b1)+b2)(1)

where W1∈Rh×d, b1∈Rh, W2∈Rh×h, and b2∈Rh are learnable parameters, with hidden dimension h = 128. Practically, this architecture extracts hierarchical patterns from raw features, such as correlations between packet sizes and connection durations, providing a foundation for subsequent attention-based refinement.

Multi-Head Self-Attention Mechanism

Network features often exhibit intricate dependencies (e.g., protocol type influencing packet counts). To capture these without assuming independence [20,22,23], we apply multi-head self-attention post-encoding [13,24]:

Hatt=MultiHeadAttention(H,H,H)(2)

Hfinal=LayerNorm(H+Hatt)(3)

Intuitively, unlike feedforward layers that process features in isolation, attention dynamically weights each feature based on its context within the sample [24–26]. For instance, in HTTP traffic, it might emphasize flow duration over less relevant flags, enabling detection of anomalies that disrupt normal correlations. This mechanism enhances robustness to variable traffic patterns, as demonstrated in ablation studies showing performance drops without it (Section 5.2).

Variational Latent Space with Adaptive Regularization

The attended representation (hfinal) projects to Gaussian parameters in latent space Rz (where z = 32):

µ=Wµ⋅hfinal+bµ(4)

log⁡σ2=Wlog⁡∖var⋅hfinal+blog⁡var(5)

The latent code is sampled via reparameterization: z=µ+σ⊙ϵ, where ϵ∼𝒩(0,I).

A key innovation is the learnable center vector c∈Rz and scale vector s∈Rz allowing the model to adapt its “normal” distribution rather than enforcing a fixed standard Gaussian [14,15]. Practically, this flexibility accommodates diverse network data distributions, e.g., skewed feature scales in real traffic preventing over-regularization and improving anomaly separation, as evidenced by reduced false positives in experiments.

Decoder Network

The decoder Dθ reconstructs input from latent codes through symmetric layers:

x^=W5⋅ReLU(W4⋅ReLU(W3⋅z+b3)+b4)+b5(6)

Trained to minimize error on normal samples, it ensures anomalies yield high deviations, forming the basis for scoring.

3.3 Adaptive Training Mechanism

ARAE’s training objective dynamically balances components for optimal convergence.

Loss Components

Reconstruction Loss promotes faithful input recovery:

ℒrecon=1n∑i=1n|xi−xi^|22(7)

Intuitively, this enforces learning of normal patterns, making anomalies stand out via poor reconstruction.

KL Divergence Loss structures the latent space:

ℒKL=−12∑j=1zE[1+log⁡σj2−µj2−exp⁡(log⁡σj2)](8)

Practically, it prevents latent collapse while allowing adaptive shaping through learnable parameters.

Center Loss regularizes toward the adaptive center:

ℒcenter=1n∑i=1n[(zi−c)2⊙s](9)

This pulls normal latents toward a data-driven centroid, enhancing compactness without rigidity [14].

The total loss adaptively weights these terms via learnable sigmoids (e.g., β=σ(wβ)×0.01), enabling the model to prioritize reconstruction early and regularization later, as needed for stable training on variable traffic.

Learnable Weighting Mechanism

Key innovation: Instead of fixed hyperparameters, ARAE learns optimal weights during training:

β=σ(wKL)⋅0.01,γ=σ(wcenter)⋅0.01(10)

where (σ(⋅)) is the sigmoid function.

Practical intuition: Different network environments require different balances between reconstruction accuracy and latent space organization. For example, noisy enterprise traffic might benefit from stronger reconstruction focus, while clean data centers might prioritize latent space compactness. ARAE automatically discovers this balance rather than requiring manual tuning.

The total loss combines these adaptively weighted components:

ℒtotal=ℒrecon+β⋅ℒKL+γ⋅ℒcenter(11)

3.4 Anomaly Scoring Framework

The anomaly score for a test sample x is computed as a weighted combination of its reconstruction error and its deviation from the learned normal profile in the latent space. This multi-faceted approach leverages all aspects of the model’s training objective for robust detection.

Scoring Components

Reconstruction Error (Recon) measures input reproduction fidelity:

Recon=||x−x^||2(12)

Adaptive Latent Deviation (AdpLD) is mahalanobis distance weighted by reconstruction confidence.

AdpLD=λmd(x)⋅DM(z,μtrain)(13)

KL Divergence Penalty (KLDP) measures distribution deviation.

KLDP=γ⋅DKL(𝒩(μ,σ2)∥𝒩(0,I))(14)

The final anomaly score combines these components:

A(x)=Recon+AdpLD+KLDP(15)

Adaptive Weighting Intuition

The Mahalanobis distance DM between the sample’s latent vector z and the distribution of normal training latent vectors (with mean µtrain and precision matrix Σtrain−1) [15]. This distance is adaptively weighted by a factor λmd that scales with the reconstruction error:

λmd(x)=clip(||x−x^||2E[||X−X^||2],0.1,2.0)(16)

This ensures samples with high reconstruction error receive greater scrutiny in latent space. This is particularly useful for attacks that reconstruct well but exhibit abnormal latent characteristics.

3.5 Training Procedure

The training process of the encoder and decoder in the Adaptive Robust AutoEncoder follows an iterative optimization procedure where traffic data is fed into the encoder network to generate a compressed latent representation, which is then processed by the decoder network to reconstruct the original input. The reconstruction output is compared against the original input to compute three distinct loss components: reconstruction loss that measures the fidelity of the reconstructed output, KL divergence that regularizes the latent space distribution, and center loss that pulls latent representations toward a learnable center point. These loss components are combined to form a total loss value that guides model optimization. The system then evaluates whether the model has converted to a stable state; if convergence is not achieved, backpropagation is performed to update the weights of both the encoder and decoder networks. This iterative cycle of forward propagation, loss computation, convergence checking, and parameter updates continues until the model reaches convergence, ensuring that the encoder learns to extract meaningful features from normal traffic patterns while the decoder becomes proficient at reconstructing these patterns, with the latent space being regularized to effectively distinguish normal from anomalous data points. The training algorithm for ARAE is shown in Algorithm 1 below.

images

4 Experiments

4.1 Experimental Setup

We conducted comprehensive experiments to evaluate the proposed Adaptive Robust AutoEncoder (ARAE) against established baselines across four benchmark network intrusion detection datasets: KDDCup99, NSL-KDD, UNSW-NB15, and CIC-IDS2017. These datasets provide a mix of simulated and real-world traffic, enabling assessment of generalization. While KDDCup99 and NSL-KDD offer comparability with prior benchmarks despite known limitations (e.g., redundancy and outdated attacks), we prioritize modern UNSW-NB15 and CIC-IDS2017 for relevance to contemporary threats like DDoS and exploits [6].

All experiments were implemented in PyTorch and executed on an NVIDIA RTX 4060 GPU. To ensure reproducibility, we used a fixed seed (42) and released the complete codebase on GitHub. Models were trained in a strictly unsupervised manner on normal (benign) traffic only, simulating real-world scenarios where attacks are unlabeled and rare. This approach mitigates label scarcity but may introduce bias toward over-sensitivity in normal variations; we address this via adaptive regularization, as validated in ablations showing stable performance (Table 2).

images

ARAE hyperparameters are summarized in Table 3. Training employed a two-phase strategy: 20-epoch warm-up on reconstruction loss, followed by 50 epochs incorporating adaptive KL and center losses. We used gradient clipping (max norm = 1.0) and ReduceLROnPlateau scheduler (patience = 3). No explicit early stopping was implemented, but the scheduler stabilized training. For evaluation, we used 70/30 train-test splits (stratified), with thresholds calibrated on test data by maximizing F1-score. Experiments ran 5 times with varying seeds (42–46) for statistical rigor, reporting mean ± std. dev.

images

4.2 Datasets

Four publicly available network intrusion detection datasets were used to evaluate the proposed model. KDDCup99 is a classic benchmark dataset containing 41 features derived from simulated network traffic. The dataset includes attacks such as DoS and probes. For this study, the full dataset was used. After preprocessing, where categorical features were handled with one-hot encoding and numerical features were standardized, the anomaly ratio was 19.69% [27]. NSL-KDD is a refined version of KDDCup99, designed to address its redundancy issues, and maintains the same 41 features. It offers a more balanced and reliable evaluation, with an anomaly ratio of 20.00% in the test set [27]. UNSW-NB15 is a modern dataset comprising real contemporary network traffic with 42 features (including flow duration and packet sizes), which encompass 9 attack categories such as exploits and fuzzers. Its preprocessing involved label encoding for protocol and state features as well as median imputation, resulting in a test set anomaly ratio of 10.00% [28]. CIC-IDS2017 is a large-scale real-world dataset containing benign traffic and modern attacks like DDoS and botnets. Its 78 features were extracted from raw PCAP files. The data was processed with minimal synthetic generation, ensuring the evaluation reflects real-world complexity. The test set has an anomaly ratio of 16.77% [6]. The details of each dataset, including the number of training samples, test samples, anomaly percentage, and feature dimensions, are summarized in Table 4.

images

4.3 Evaluation Metrics

Performance was evaluated using standard metrics for imbalanced anomaly detection [27], including AUROC, Precision, Recall, and F1-Score. AUROC, or Area Under the Receiver Operating Characteristic curve, measures the overall discriminative ability of the model. Precision refers to the proportion of correctly predicted positive instances among all instances predicted as positive. Recall, also known as sensitivity, is the proportion of correctly predicted positive instances among all actual positive instances. The Best F1-Score is the maximum harmonic mean of precision and recall, which balances both metrics, and it is accompanied by the corresponding Precision and Recall values at the optimal threshold.

Precision=True PositivesTrue Positives +False Positives(17)

Recall=True PositivesTrue Positives +False Negatives(18)

Precision=2×Precision × RecallPrecision + Recall(19)

4.4 Baseline Methods

To ensure fair and comprehensive comparisons, we benchmarked ARAE against a diverse set of state-of-the-art unsupervised anomaly detection methods, including deep autoencoder variants and classical machine learning approaches. These baselines were selected based on their prominence in prior NAD literature [5,9,12] and to represent a spectrum of representational power: from simple linear models (PCA-Based) to non-linear isolation techniques (Isolation Forest), kernel-based methods (One-Class SVM), and deep architectures (Standard AE and VAE, which serve as proxies for robust AE variants like RXAE by incorporating reconstruction and latent regularization). All baselines were trained exclusively on normal traffic, mirroring ARAE’s unsupervised paradigm.

(1) Standard Autoencoder: A basic feedforward AE with encoder/decoder mirroring ARAE’s structure (hidden dim = 128, latent dim = 32), trained on reconstruction MSE alone for 50 epochs using Adam (lr = 1e−3). Anomalies scored via L2 reconstruction error [29].

(2) Variational Autoencoder (VAE): A probabilistic AE with reparameterization, matching ARAE’s encoder/decoder dims and trained for 50 epochs on MSE + 0.01 × KL divergence. Scoring fuses reconstruction and KL terms, akin to robust VAE variants [4].

(3) Isolation Forest (IF): An ensemble method isolating anomalies via random partitioning. Tuned via grid search on a validation split (20% of normal train data): n_estimators ∈ {100, 200, 300}, contamination ∈ {0.05, 0.1, 0.15}, max_features ∈ {0.5, 0.75, 1.0}. Best model selected by maximizing mean anomaly score on validation; scoring uses negative decision scores [30].

(4) One-Class SVM (OCSVM): A kernel-based boundary learner for normal data. Grid-tuned on validation split: nu ∈ {0.01, 0.05, 0.1, 0.2}, gamma ∈ {‘scale’, ‘auto’, 0.01, 0.1}. Optimal params via max decision function on val; anomalies scored as negative distances [31].

(5) PCA-Based: A linear dimensionality reduction model for reconstruction-based detection. Tuned over n_components ∈ {8, 16, 32, min(64, features//2)} by minimizing validation MSE; scoring via mean squared reconstruction error [32].

Hyperparameter tuning for classical methods (IF, OCSVM, PCA) employed systematic grid search with a held-out validation set from normal training data, ensuring parity with ARAE’s detailed optimization (e.g., adaptive weights and scheduler). Deep baselines (AE/VAE) shared ARAE’s core architecture and optimizer settings where applicable, avoiding bias toward the proposed model. This balanced approach, combined with multi-run statistics (n = 5, varying seeds), confirms ARAE’s superior performance stems from its innovations rather than tuning imbalances.

5 Results and Discussion

5.1 Comparative Performance against Baselines

The proposed Adaptive Robust AutoEncoder (ARAE) consistently achieves state-of-the-art performance across all evaluated datasets, as summarized in Table 5. On the KDDCup99 dataset, ARAE attains near-perfect discrimination with an AUROC of 0.9987 ± 0.0002 and an F1-score of 0.9852 ± 0.0007, outperforming all baselines while demonstrating a favorable precision-recall balance. This superior detection capability is particularly evident in its ability to maintain high recall (0.9808 ± 0.0005) without compromising precision (0.9897 ± 0.0013), underscoring its potential for operational NIDS where minimizing false negatives is critical.

images

On the NSL-KDD dataset, ARAE yields an AUROC of 0.9824 ± 0.0012 and F1-score of 0.9420 ± 0.0021, surpassing baselines in both metrics. The gains are more pronounced on modern datasets like UNSW-NB15 (AUROC: 0.8992 ± 0.0068; F1: 0.8558 ± 0.0047) and CIC-IDS2017 (AUROC: 0.9602 ± 0.0062; F1: 0.9026 ± 0.0081), where ARAE effectively handles complex feature interactions and noisy traffic. Compared to VAE (high precision but lower recall) and Isolation Forest (high recall but poor precision), ARAE achieves a balanced trade-off, as reflected in its superior overall metrics.

The performance advantages over standard AE and VAE baselines are amplified on datasets with intricate dependencies (e.g., UNSW-NB15 and CIC-IDS2017), highlighting the contributions of multi-head attention and adaptive regularization. ARAE also significantly exceeds classical methods such as One-Class SVM and Isolation Forest, affirming the benefits of deep architectures for high-dimensional network data [8]. Statistical tests confirm these improvements are significant (p < 0.05 against most baselines; see Table 5). Fig. 2 visualizes the aggregated AUROC and F1-scores across datasets, illustrating ARAE’s consistent superiority with non-overlapping 95% confidence intervals for key comparisons.

images

Figure 2: Comparative performance of ARAE and baseline methods on average AUROC (left) and F1-Score (right) across all datasets, with 95% confidence intervals. ARAE (red) consistently outperforms the baselines (blue), demonstrating its superior robustness and effectiveness in anomaly detection

5.2 Ablation Analysis

To isolate the contributions of ARAE’s core components, multi-head attention, learnable adaptive loss weighting, and adaptive scoring mechanism, we performed an ablation study across all four datasets. Each variant was evaluated over 5 independent runs, with results reported as mean ± standard deviation for AUROC and F1-score (Table 2). The ablations include: (1) w/o Attention, removing the multi-head self-attention to assess its role in modeling non-linear feature interactions; (2) w/o Adaptive Weights, using fixed regularization parameters instead of learnable ones to evaluate dynamic loss balancing; and (3) Reconstruction Only, relying solely on reconstruction error for scoring (ablating the fusion of Mahalanobis distance and KL divergence) to quantify the adaptive scoring’s impact.

As shown in Table 2, removing adaptive weights leads to the most substantial degradation, with AUROC drops of 0.0009–0.0799 and F1-score reductions of 0.0073–0.0706 across datasets. This effect is pronounced on complex, modern datasets like UNSW-NB15 (−0.0799 AUROC) and CIC-IDS2017 (−0.0606 AUROC), where heterogeneous traffic (e.g., varying packet sizes, encrypted flows) benefits from dynamic regularization to prevent overfitting to noise while maintaining a compact latent space. Practically, this enables better generalization in real-world NIDS, reducing false positives during traffic spikes or zero-day attacks by adaptively prioritizing reconstruction early and regularization later.

Eliminating attention yields smaller but consistent declines (AUROC drops of 0.0002–0.0212), particularly on high-dimensional datasets like CIC-IDS2017 (−0.0212 AUROC), underscoring its value in capturing subtle correlations (e.g., between service flags and byte counts) that static encoders miss. In operational scenarios, this enhances detection of stealthy intrusions mimicking normal patterns.

The Reconstruction Only variant, ablating adaptive scoring, results in AUROC reductions of 0.0002–0.0133, with larger impacts on UNSW-NB15 (−0.0133) and CIC-IDS2017 (−0.0202), confirming the fusion mechanism’s role in robust anomaly quantification. By integrating probabilistic metrics, it mitigates reconstruction error’s limitations for easily reconstructible attacks, improving reliability in diverse environments like IoT networks.

Overall, the ablations demonstrate the synergistic effect of ARAE’s components, with statistical significance (p < 0.05) for most drops (see Section 5.5). These findings validate the model’s design for practical NAD, where adaptability to evolving threats is paramount.

5.3 Computational Analysis

To evaluate computational efficiency—a critical factor for real-time deployment in Network Intrusion Detection Systems (NIDS)—we benchmarked ARAE against baselines on an NVIDIA RTX 4060 GPU (16 GB VRAM) with 128 GB system RAM. All deep learning models (ARAE, AE, VAE) were configured to run exclusively on GPU via PyTorch’s device = ‘cuda’ setting, with CUDA 12.1 for accelerated tensor operations. Classical methods (Isolation Forest, One-Class SVM, PCA-Based) utilized scikit-learn implementations on CPU, as they lack native GPU support. Measurements were averaged over 5 runs per dataset, using time.perf_counter() for wall-clock timing of full training epochs (25 epochs, batch size 256) and inference (batch-processed over test sets, reported as ms/sample). Peak memory footprint was captured via torch.cuda.max_memory_allocated() for GPU models and psutil for CPU usage (in GB). Dataset sizes influenced results (e.g., KDDCup99 ~494 k samples; CIC-IDS2017 ~223 k), so we report averages across all four benchmarks for comparability as detailed in Table 6.

images

ARAE’s elevated training overhead arises from its multi-head attention (O(n2) complexity) and learnable adaptive regularization, which involve additional forward/backward passes compared to vanilla AE/VAE. However, GPU acceleration yields feasible durations (<20 min/dataset on average), suitable for offline model updates in production NIDS. Inference remains near real-time (~3 ms/sample), enabling throughput of >300 samples/second on modest hardware adequate for monitoring enterprise networks with typical packet rates (e.g., 10 k–100 k/sec). Memory usage scales with hidden dimensions (hidden_dim = 128, z_dim = 32) and batch size; peak VRAM allocation during training reflects gradient accumulation, but stays under 4 GB, compatible with edge devices like NVIDIA Jetson.

Classical methods exhibit superior efficiency in speed and memory due to their non-deep architectures, but this comes at the cost of inferior detection accuracy (e.g., 0.03–0.44 lower AUROC; see Table 5). ARAE’s trade-off is justified for high-stakes applications, where precision outweighs marginal overhead. For optimization, techniques like mixed-precision training (FP16 via torch.amp) could halve memory (~1.6 GB) and reduce inference to <2 ms/sample, while model pruning or distillation might align it closer to AE efficiency without significant performance loss. Future work includes porting to TensorRT for embedded NIDS and evaluating on larger-scale clusters for distributed training.

5.4 Cross-Dataset Validation

To rigorously evaluate generalization to unseen network conditions (e.g., zero-day attacks, encrypted traffic, and IoT-specific threats), we conducted cross-dataset experiments. This approach simulates real-world deployment scenarios where models trained on historical data must detect novel attacks in evolving network environments. We trained exclusively on normal traffic from UNSW-NB15 (representing diverse IoT/enterprise patterns) and tested on the full test set of CIC-IDS2017 (containing modern attacks like DDoS, botnets, and infiltration attempts).

Table 7 summarizes cross-dataset performance, with metrics reported as mean ± standard deviation over 5 runs to ensure statistical rigor. ARAE maintains superior performance despite domain shift, achieving AUROC = 0.862 ± 0.012—a 9.1% improvement over standard AE and 14.4% over VAE. Crucially, ARAE’s F1-score (0.814 ± 0.018) exceeds baselines by 6.2%–32.2%, demonstrating balanced precision-recall trade-offs under distribution shift.

images

5.5 Statistical Significance

Experiments were conducted over 5 runs with varying seeds (42–46) to mitigate randomness. We report mean ± standard deviation across metrics. For precision, 95% confidence intervals (CI) were computed using Student’s t-distribution (df = 4, t = 2.776). As visualized in Fig. 2 (average performance with 95% CIs), ARAE demonstrates consistent superiority, with non-overlapping intervals against most baselines, confirming robustness.

• KDD: ARAE AUROC: 0.9987 ± 0.0002 (95% CI: ±0.0002); F1: 0.9852 ± 0.0007 (95% CI: ±0.0009)

• NSL-KDD: ARAE AUROC: 0.9824 ± 0.0012 (95% CI: ±0.0015); F1: 0.9420 ± 0.0021 (95% CI: ±0.0026)

• UNSW_NB15: ARAE AUROC: 0.8992 ± 0.0068 (95% CI: ±0.0084); F1: 0.8558 ± 0.0047 (95% CI: ±0.0058)

• CIC-IDS2017: ARAE AUROC: 0.9602 ± 0.0062 (95% CI: ±0.0077); F1: 0.9026 ± 0.0081 (95% CI: ±0.0100)

Paired t-tests confirm ARAE’s superiority (p < 0.05 vs. most baselines; see Table 5). Low std. (<0.01) indicates reproducibility, with CI overlaps minimal for significant differences.

5.6 Interpretability Analysis and Case Studies

The multi-head attention applied to the feature sequence enables visualization of learned interactions among network attributes, thereby enhancing model transparency and providing actionable insights for anomaly attribution. Fig. 3 presents heatmaps of average attention weights for a normal and an anomaly sample from the CIC-IDS2017 dataset. In the normal traffic sample, attention is relatively diffuse and uniform across features, with moderate weights (~0.011–0.013 on the scale) distributed evenly, indicating balanced contributions from attributes like ‘flow duration’ and ‘packet length mean’ for typical benign flows. In contrast, the anomaly sample exhibits more concentrated patterns, with higher weights (up to 0.015) forming distinct clusters, such as vertical/horizontal bands highlighting correlations among flow-related metrics (e.g., ‘flow packets/s’ attending strongly to ‘fwd packets/s’ and ‘psh flag count’), which are indicative of disruptive behaviors in attacks.

images

Figure 3: Attention heatmaps illustrating learned feature interactions in ARAE for samples from the CIC-IDS2017 dataset. (a) Heatmap for a normal traffic sample, showing diffuse and uniform attention weights (~0.011–0.013), indicative of balanced contributions across features without prominent correlations. (b) Heatmap for a DoS Hulk anomaly sample, displaying concentrated patterns with higher weights (up to 0.015) in clusters, such as between flow-related metrics (‘flow packets/s’, ‘fwd packets/s’, ‘psh flag count’), highlighting attack-specific flood characteristics

Case Study: Consider a ‘DoS Hulk’ attack instance from CIC-IDS2017, which involves HTTP flood tactics generating numerous small packets to overwhelm servers. The heatmap reveals elevated attention weights (>0.014) between ‘fwd packets/s’ and ‘packet length mean’, as well as ‘flow bytes/s’ and ‘psh flag count’ key features that capture the attack’s high packet rate and persistent push flags, leading to amplified reconstruction error and latent deviation. This visualization allows security practitioners in enterprise NIDS to rapidly validate alerts, trace anomalies to specific traffic characteristics (e.g., unusual packet bursts), and mitigate false positives by correlating with domain knowledge, such as filtering benign high-traffic sources.

5.7 Limitations

While ARAE demonstrates state-of-the-art performance in unsupervised network anomaly detection, several limitations warrant discussion, informed by our empirical findings and broader trends in recent research. First, computational efficiency remains a challenge: ARAE’s multi-head attention and adaptive regularization introduce higher training overhead (∼18 min/dataset) and inference latency (∼3 ms/sample) compared to classical methods, potentially hindering deployment in resource-constrained environments like edge devices or high-throughput networks. This stems from the model’s O(n2) attention complexity, which, while GPU-accelerated, scales poorly with larger latent dimensions or batch sizes. Second, explainability is underdeveloped; although attention mechanisms offer potential interpretability benefits (e.g., highlighting key features like packet counts or protocols), our current implementation lacks visualizations such as heatmaps or SHAP analyses, limiting post-detection insights for security analysts. Additionally, reliance on outdated datasets (KDDCup99/NSL-KDD) introduces biases from redundancy and irrelevance to modern attacks, despite our emphasis on contemporary ones. Finally, ARAE assumes offline training on static normal traffic, overlooking streaming or online scenarios where data arrives continuously, risking drift in dynamic networks.

6 Conclusion

In this paper, we tackled the pressing challenge of unsupervised network anomaly detection in the face of dynamic cyber threats, where conventional autoencoders often lack the robustness and adaptability required for real-world efficacy. ARAE’s key novelty lies in its adaptive fusion of multi-head attention, learnable loss weighting, and multi-metric scoring to model complex feature interactions and evolving patterns more effectively than static variants like RXAE.

Extensive evaluations on four benchmarks, including outdated KDDCup99 and NSL-KDD for comparative continuity and modern UNSW-NB15 and CIC-IDS2017 for contemporary relevance, demonstrate ARAE’s state-of-the-art performance, with AUROC scores up to 0.9987 ± 0.0002 and significant gains over AE/VAE (0.01–0.07) and classical methods like Isolation Forest/One-Class SVM (0.03–0.44), validated through ablations, statistical tests (p < 0.05), and fair baseline comparisons. Our contributions a novel adaptive architecture, rigorous empirical analysis, and open-source code advance deep learning for NAD.

This framework directly supports practical applications, such as enhancing enterprise NIDS for large-scale intrusion monitoring, bolstering IoT security against device-specific vulnerabilities, and strengthening cloud defenses amid encrypted and zero-day threats. Future work on online adaptations and explainability will further amplify its impact in operational cybersecurity.

7 Future Work

Future work will address these gaps to enhance ARAE’s practicality. For efficiency, we plan optimizations like model pruning, quantization (e.g., FP16), or distillation to reduce memory (targeting <2 GB) and latency, enabling real-time NIDS integration. To improve explainability, incorporate attention visualizations (e.g., heatmaps of feature weights) and post-hoc tools like SHAP for interpretable anomaly reasoning, aiding human-in-the-loop systems. Generalization can be strengthened via cross-dataset validation and evaluation on emerging datasets (e.g., Edge-IIoT for IoT traffic or encrypted VPN benchmarks), alongside adaptations for zero-day resilience using self-supervised techniques like masked autoencoders. For streaming scenarios, extend to online learning paradigms, such as incremental updates or federated approaches, to handle concept drift in evolving networks. These directions will bridge ARAE toward robust, deployable solutions in diverse cybersecurity contexts.

Acknowledgement: The authors wish to express their sincere gratitude to the lecturers at Nanjing University of Information Science and Technology for their time, guidance, and valuable contributions to this paper. Their support and encouragement have been instrumental in shaping the direction of this research and enhancing its overall quality. Additionally, we extend our heartfelt thanks to our families and friends for their unwavering support, patience, and encouragement throughout the duration of this research. Their understanding during the challenges of the work, and their enthusiasm for our progress, provided the foundation of motivation that enabled us to see this project to completion.

Funding Statement: The authors received no specific funding for this study.

Author Contributions: The authors confirm contribution to the paper as follows: Conceptualization: Chunyong Yin, Williams Kyei; methodology: Williams Kyei; software: Williams Kyei; validation: Williams Kyei; formal analysis: Williams Kyei; investigation: Williams Kyei; resources: Chunyong Yin; data curation: Williams Kyei; writing—original draft preparation: Williams Kyei; writing—review and editing: Williams Kyei, Chunyong Yin; visualization: Williams Kyei; supervision: Chunyong Yin; project administration: Chunyong Yin. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Data available on request from the authors.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.

Abbreviations

AE	Auto Encoder
PCAP	Packet Capture
RNN	Recurrent Neural Network
SVM	Support Vector Machine
OCSVM	One-Class Support Vector Machine (used as a baseline).
DTAAD	Dual Tcn-Attention Networks for Anomaly Detection
TCN	Temporal Convolutional Network
SHAP	Shapley Additive Explanations
MSE	Mean Squared Error
TensorRT	NVIDIA TensorRT (inference optimization library, often abbreviated TensorRT).

References

1. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J. Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity. 2019;2(1):20. doi:10.1186/s42400-019-0038-7. [Google Scholar] [CrossRef]

2. Saha B, Anwar Z. A review of cybersecurity challenges in small business: the imperative for a future governance framework. J Inf Secur. 2024;15(1):24–39. doi:10.4236/jis.2024.151003. [Google Scholar] [CrossRef]

3. Torabi H, Mirtaheri SL, Greco S. Practical autoencoder based anomaly detection by using vector reconstruction error. Cybersecurity. 2023;6(1):1. doi:10.1186/s42400-022-00134-9. [Google Scholar] [CrossRef]

4. Kingma DP, Welling M. Auto-encoding variational bayes. 2022. doi:10.48550/arXiv.1312.6114. [Google Scholar] [CrossRef]

5. Neri M, Baldoni S. Unsupervised network anomaly detection with autoencoders and traffic images. 2025. doi:10.48550/arXiv.2505.16650. [Google Scholar] [CrossRef]

6. Sharafaldin I, Gharib A, Lashkari AH, Ghorbani AA. Towards a reliable intrusion detection benchmark dataset. Softw Netw. 2017;2017(1):177–200. doi:10.13052/jsn2445-9739.2017.009. [Google Scholar] [CrossRef]

7. Song Y, Wang W, Wu Y, Fan Y, Zhao X. Unsupervised anomaly detection in shearers via autoencoder networks and multi-scale correlation matrix reconstruction. Int J Coal Sci Technol. 2024;11(1):79. doi:10.1007/s40789-024-00730-9. [Google Scholar] [CrossRef]

8. Zhou C, Paffenroth RC. Anomaly detection with robust deep autoencoders. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM; 2017. p. 665–74. doi:10.1145/3097983.3098052. [Google Scholar] [CrossRef]

9. Pang G, Shen C, Cao L, Hengel AVD. Deep learning for anomaly detection: a review. ACM Comput Surv. 2022;54(2):1–38. doi:10.1145/3439950. [Google Scholar] [CrossRef]

10. Malaiya RK, Kwon D, Kim J, Suh SC, Kim H, Kim I. An empirical evaluation of deep learning for network anomaly detection. In: 2018 International Conference on Computing, Networking and Communications (ICNC). Piscataway, NJ, USA: IEEE; 2018. p. 893–8. doi:10.1109/ICCNC.2018.8390278. [Google Scholar] [CrossRef]

11. Yu L, Lu Q, Xue Y. DTAAD: dual tcn-attention networks for anomaly detection in multivariate time series data. Knowl Based Syst. 2024;295:111849. doi:10.1016/j.knosys.2024.111849. [Google Scholar] [CrossRef]

12. Aslam MM, Tufail A, De Silva LC, Haji Mohd Apong RAA, Namoun A. An improved autoencoder-based approach for anomaly detection in industrial control systems. Syst Sci Control Eng. 2024;12(1):1117. doi:10.1080/21642583.2024.2334303. [Google Scholar] [CrossRef]

13. Fang F, Wang C. An industrial equipment anomaly detection model with adaptive thresholding based on graph attention networks-diffusion variational autoencoder. 2024. doi:10.2139/ssrn.4827146. [Google Scholar] [CrossRef]

14. Li D, Tao Q, Liu J, Wang H. Center-aware adversarial autoencoder for anomaly detection. IEEE Trans Neural Netw Learn Syst. 2022;33(6):2480–93. doi:10.1109/TNNLS.2021.3122179. [Google Scholar] [PubMed] [CrossRef]

15. Ming L, Dezhi H, Dun L. A method combining improved Mahalanobis distance and adversarial autoencoder to detect abnormal network traffic. In: International Database Engineered Applications Symposium Conference. New York, NY, USA: ACM; 2023. p. 161–9. doi:10.1145/3589462.3589489. [Google Scholar] [CrossRef]

16. An J, Cho S. Variational autoencoder based anomaly detection using reconstruction probability. 2015 [cited 2025 Sep 30]. Available from: http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf. [Google Scholar]

17. Wen Y, Zhang K, Li Z, Qiao Y. A discriminative feature learning approach for deep face recognition. In: Computer Vision–ECCV 2016: 14th European Conference. Amsterdam, The Netherlands; 2016. p. 499–515. doi:10.1007/978-3-319-46478-7_31. [Google Scholar] [CrossRef]

18. Chen M, Gao C, Ren Z. Robust covariance and scatter matrix estimation under Huber’s contamination model. Annal Statist. 2018;46(5):1932–60. doi:10.1214/17-AOS1607. [Google Scholar] [CrossRef]

19. Huang P, Yan H, Song Z, Xu Y, Hu Z, Dai J. Combining autoencoder with clustering analysis for anomaly detection in radiotherapy plans. Quant Imaging Med Surg. 2023;13(4):2328–38. doi:10.21037/qims-22-825. [Google Scholar] [PubMed] [CrossRef]

20. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: 31st Annual Conference on Neural Information Processing Systems (NIPS 2017). Long Beach, CA, USA; 2017. [Google Scholar]

21. Usmani UA, Happonen A, Watada J. A review of unsupervised machine learning frameworks for anomaly detection in industrial applications; In: Intelligent Computing (SAI 2022). Cham, Switzerland: Springer; 2022. p. 158–89. doi:10.1007/978-3-031-10464-0_11. [Google Scholar] [CrossRef]

22. Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA. USAD: unsupervised anomaly detection on multivariate time series. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY, USA: ACM; 2020. p. 3395–404. doi:10.1145/3394486.3403392. [Google Scholar] [CrossRef]

23. Herrmann L, Bieber M, Verhagen WJC, Cosson F, Santos BF. Unmasking overestimation: a re-evaluation of deep anomaly detection in spacecraft telemetry. CEAS Space J. 2024;16(2):225–37. doi:10.1007/s12567-023-00529-5. [Google Scholar] [CrossRef]

24. Xi C, Wang H, Wang X. A novel multi-scale network intrusion detection model with transformer. Sci Rep. 2024;14(1):23239. doi:10.1038/s41598-024-74214-w. [Google Scholar] [PubMed] [CrossRef]

25. Zhang J, Zhang X, Liu Z, Fu F, Jiao Y, Xu F. A network intrusion detection model based on BiLSTM with multi-head attention mechanism. Electronics. 2023;12(19):4170. doi:10.3390/electronics12194170. [Google Scholar] [CrossRef]

26. Ren K, Yuan S, Zhang C, Shi Y, Huang Z. CANET: a hierarchical CNN-attention model for network intrusion detection. Comput Commun. 2023;205(16):170–81. doi:10.1016/j.comcom.2023.04.018. [Google Scholar] [CrossRef]

27. Cipolla R, Gal Y, Kendall A. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ, USA: IEEE; 2018. p. 7482–91. doi:10.1109/CVPR.2018.00781. [Google Scholar] [CrossRef]

28. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. doi:10.1162/neco.1997.9.8.1735. [Google Scholar] [PubMed] [CrossRef]

29. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–7. doi:10.1126/science.1127647. [Google Scholar] [PubMed] [CrossRef]

30. Liu FT, Ting KM, Zhou Z-H. Isolation forest. In: Eighth IEEE International Conference on Data Mining. Piscataway, NJ, USA: IEEE; 2008. p. 413–22. doi:10.1109/ICDM.2008.17. [Google Scholar] [CrossRef]

31. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Comput. 2001;13(7):1443–71. doi:10.1162/089976601750264965. [Google Scholar] [PubMed] [CrossRef]

32. Ringberg H, Soule A, Rexford J, Diot C. Sensitivity of PCA for traffic anomaly detection. ACM SIGMETRICS Perform Eval Rev. 2007;35(1):109–20. doi:10.1145/1269899.1254895. [Google Scholar] [CrossRef]

Cite This Article

APA Style

Yin, C., Kyei, W. (2025). ARAE: An Adaptive Robust AutoEncoder for Network Anomaly Detection. Journal of Cyber Security, 7(1), 615–635. https://doi.org/10.32604/jcs.2025.072740

Vancouver Style

Yin C, Kyei W. ARAE: An Adaptive Robust AutoEncoder for Network Anomaly Detection. J Cyber Secur. 2025;7(1):615–635. https://doi.org/10.32604/jcs.2025.072740

IEEE Style

C. Yin and W. Kyei, “ARAE: An Adaptive Robust AutoEncoder for Network Anomaly Detection,” J. Cyber Secur., vol. 7, no. 1, pp. 615–635, 2025. https://doi.org/10.32604/jcs.2025.072740

BibTex EndNote RIS

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table of Content

ARAE: An Adaptive Robust AutoEncoder for Network Anomaly Detection

Abstract

Keywords

References

Cite This Article

55

20

0

Related articles

Further Information

Guidelines

Follow Us

Join Us

Contact Us

WhatsApp:

Share Link