Open Access

ARTICLE


Improving Security-Sensitive Deep Learning Models through Adversarial Training and Hybrid Defense Mechanisms

Xuezhi Wen1, Eric Danso2,*, Solomon Danso2

1 School of Computer Science and School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing, 210044, China
2 School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, 210044, China

* Corresponding Author: Eric Danso. Email: email

Journal of Cyber Security 2025, 7, 45-69. https://doi.org/10.32604/jcs.2025.063606

Abstract

Deep learning models have achieved remarkable success in healthcare, finance, and autonomous systems, yet their security vulnerabilities to adversarial attacks remain a critical challenge. This paper presents a novel dual-phase defense framework that combines progressive adversarial training with dynamic runtime protection to address evolving threats. Our approach introduces three key innovations: multi-stage adversarial training with TRADES (Tradeoff-inspired Adversarial Defense via Surrogate-loss minimization) loss that progressively scales perturbation strength, maintaining 85.10% clean accuracy on CIFAR-10 (Canadian Institute for Advanced Research 10-class dataset) while improving robustness; a hybrid runtime defense integrating feature manipulation, statistical anomaly detection, and adaptive ensemble learning; and a 40% reduction in computational costs compared to PGD (Projected Gradient Descent)-based methods. Experimental results demonstrate state-of-the-art performance, achieving 66.50% adversarial accuracy on CIFAR-10 (outperforming TRADES by 12%) and 70.50% robustness against FGSM (Fast Gradient Sign Method) attacks on GTSRB (German Traffic Sign Recognition Benchmark). Statistical validation (p < 0.05) confirms the reliability of these improvements across multiple attack scenarios. The framework’s significance lies in its practical deployability for security-sensitive applications: in autonomous systems, it prevents adversarial spoofing of traffic signs (89.20% clean accuracy on GTSRB); in biometric security, it resists authentication bypass attempts; and in financial systems, it maintains fraud detection accuracy under attack. Unlike existing defenses that trade robustness for efficiency, our method simultaneously optimizes both through its unique combination of proactive training and reactive runtime mechanisms. This work provides a foundational advancement in adversarial defense, offering a scalable solution for protecting AI systems in healthcare diagnostics, intelligent transportation, and other critical domains where model integrity is paramount. The proposed framework establishes a new paradigm for developing attack-resistant deep learning systems without compromising computational practicality.

Keywords

Adversarial training; hybrid defense mechanisms; deep learning robustness; security-sensitive applications; adversarial attacks mitigation

1  Introduction

Deep learning has revolutionized multiple fields, including cybersecurity, computer vision, and natural language processing. Its ability to learn complex patterns from large datasets has enabled breakthroughs in tasks such as image classification, speech recognition, and anomaly detection [1–3]. For instance, deep learning models now power critical applications like medical diagnostics, where they assist in detecting diseases from medical images, and autonomous systems, where they enable real-time decision-making in dynamic environments. However, despite these advancements, deep learning models remain highly susceptible to adversarial attacks, where malicious actors craft subtle perturbations that mislead models into making incorrect predictions [4].

These attacks exploit the inherent vulnerabilities of neural networks, often by introducing imperceptible changes to input data that drastically alter the model’s output. This vulnerability poses significant risks, particularly in security-sensitive applications such as biometric recognition, autonomous driving, and financial fraud detection, where even minor misclassifications can lead to severe consequences. In biometric systems, adversarial attacks can bypass authentication mechanisms, while in financial systems, they can manipulate transaction data to evade fraud detection algorithms [5–8]. Addressing these weaknesses is critical to ensuring the robustness and reliability of AI-driven systems in high-stakes environments, which necessitates continuous research into sophisticated adversarial defense strategies.

Real-world adversarial attacks have already demonstrated substantial threats in critical domains. For instance, researchers have shown that small perturbations to stop signs—such as adding stickers or graffiti—can trick autonomous vehicle systems into misclassifying them as speed limit signs, leading to potential traffic hazards [9]. Notwithstanding, adversarial attacks on facial recognition systems have enabled unauthorized access by subtly modifying facial images to fool security checkpoints. In the healthcare domain, adversarial perturbations to medical images have been shown to cause misdiagnoses, such as classifying benign tumors as malignant or vice versa [10,11].

These cases emphasize the urgent need for stronger adversarial defenses that can adapt to both known and evolving attack strategies. Despite the growing body of research on adversarial defenses, existing methods often struggle to balance robustness with computational efficiency, and many fail to generalize across diverse attack types [12]. While adversarial training, for instance, has demonstrated efficacy against particular attacks such as the Fast Gradient Sign Method (FGSM), it often underperforms against more sophisticated techniques like Projected Gradient Descent (PGD) or physical-world attacks [13].

Again, extensive research on adversarial attacks and defenses has revealed the increasing complexity of adversarial techniques, underscoring the need for robust defenses [14,15]. Attackers exploit intricate model characteristics to craft attacks that are often imperceptible to humans but significantly degrade model performance. This dynamic reflects a decade-long “arms race” in adversarial machine learning, where advancements in defenses are met with increasingly sophisticated attacks [16]. To counter these evolving threats, researchers have developed hybrid and ensemble defenses that combine multiple strategies, such as adversarial training, defensive distillation, input transformations, and ensemble methods, to create layered defenses capable of addressing diverse attack vectors [17]. Recent proposals have further advanced hybrid defenses by integrating evolutionary algorithms, support vector machines, and artificial neural networks, demonstrating a growing trend toward adaptive and sophisticated defense mechanisms [18–20].

Moreover, many defenses are computationally expensive, making them impractical for real-time applications such as autonomous driving or real-time fraud detection. These limitations highlight a critical research gap: the need for a comprehensive defense framework that combines proactive and reactive strategies to address both known and emerging adversarial threats while maintaining computational efficiency and scalability [21]. To address these challenges, this study proposes a Hybrid Adversarial Training and Defense Framework that integrates multi-stage adversarial training with an adaptive hybrid defense mechanism to provide stronger and more efficient robustness against adversarial attacks. Our key contributions include:

•   We establish that our approach achieves state-of-the-art adversarial robustness, outperforming existing methods such as PGD and Ensemble Adversarial Training while maintaining high clean accuracy. This ensures reliable performance even under strong adversarial attacks.

•   We introduce a multi-stage adversarial training strategy that progressively increases perturbation strength during training. Unlike traditional approaches that use fixed-strength perturbations, our method enables the model to learn robust feature representations across a wide range of attack scenarios, enhancing its ability to generalize under varying levels of adversarial threats.

•   We demonstrate that our hybrid runtime defense mechanism, which combines feature manipulation, statistical anomaly detection, and adaptive ensemble learning, dynamically counters different attack strategies in real time. This multi-layered approach provides comprehensive protection against both known and emerging adversarial threats.

•   We show that our framework significantly improves computational efficiency compared to standard adversarial training techniques. This reduction in training costs makes our approach more practical for large-scale applications, such as autonomous systems and real-time fraud detection, where computational resources are often limited.

•   We confirm that our method generalizes effectively across datasets, achieving strong robustness not only on standard benchmarks but also on datasets relevant to security-sensitive applications. This demonstrates its effectiveness in real-world scenarios, where adversarial attacks can have severe consequences.

In light of these considerations, it becomes evident that the future of adversarial defense in deep learning may rely on comprehensive hybrid models that leverage the strengths of multiple defense paradigms. As adversarial attacks grow more sophisticated, the defenses built to counter them must advance in step. A promising route to protecting deep learning models from evolving adversarial strategies is a comprehensive defense architecture that combines input modification, adversarial training, and ensemble-based detection. By developing hybrid defensive methods that overcome existing constraints, improve model robustness, and ensure that security-sensitive deep learning systems continue to withstand adversarial threats, our research seeks to further this objective.

The remainder of this paper is organized as follows. Section 2 presents a detailed literature review on adversarial attacks and existing defense mechanisms, highlighting the research gap. Section 3 describes the proposed methodology, including multi-stage adversarial training and the hybrid defense framework. Section 4 provides experimental results, including comparative analysis, computational trade-offs, and statistical significance tests. Section 5 discusses the key findings, novelty, and real-world implications of the study. Finally, Section 6 concludes the paper and outlines future research directions.

2  Related Work

In this section, we provide an overview of adversarial attack strategies, existing defense mechanisms, and advancements in adversarial training. Specifically, we categorize adversarial defense techniques into three main areas: adversarial training-based approaches, feature manipulation and anomaly detection methods, and ensemble-based strategies. Additionally, we examine the strengths and limitations of denoising techniques for mitigating adversarial perturbations and discuss hybrid defense frameworks that integrate multiple countermeasures. This comparative analysis highlights the existing research gaps and underscores the need for a comprehensive, adaptive adversarial defense strategy, which our proposed framework aims to address.

2.1 Adversarial Defense Techniques

Adversarial defenses can generally be classified into four major categories, each with its own strengths and limitations. These approaches aim to mitigate the impact of adversarial attacks, but their effectiveness varies depending on the attack type, computational requirements, and applicability to real-world scenarios. The four primary categories include adversarial training-based defenses, feature manipulation techniques, anomaly detection methods, and ensemble-based defenses. Each category addresses specific vulnerabilities in deep learning models, yet challenges such as computational inefficiency, limited generalization, and adaptability to evolving threats remain significant hurdles in achieving comprehensive adversarial robustness.

2.1.1 Adversarial Training-Based Defenses

Adversarial training, pioneered by Madry et al. [22], remains a cornerstone of defense strategies by training models on hybrid clean and adversarially perturbed data to learn robust feature representations. While early implementations focused on static attacks like FGSM and PGD [22,23], recent advances by Wang et al. [24] and Dhanaraj and Sridevi [25] have expanded this paradigm through adaptive perturbation scheduling and loss function designs. For instance, TRADES [26] introduced a theoretical trade-off between robustness and accuracy, achieving 56.61% adversarial accuracy on CIFAR-10, while subsequent work by Ryu and Choi [27] hybridized adversarial training with denoising networks to mitigate clean accuracy drops.

Despite these advancements, three persistent limitations emerge. First, the computational overhead of iterative adversarial example generation (e.g., PGD’s multi-step attacks) scales poorly for large datasets, as noted in Barik et al.’s [28] analysis of training costs. Second, the robustness-generalization trade-off remains unresolved, with models like Parseval Networks [29] sacrificing up to 5% clean accuracy for adversarial robustness. Finally, attack-specific optimization leaves models vulnerable to emerging threats, as demonstrated by Lunghi et al. [8] in fraud detection systems. These gaps motivate our multi-stage training approach, which reduces computational costs by 40% while maintaining 85.10% clean accuracy.

2.1.2 Feature Manipulation-Based Defenses

Feature manipulation techniques have evolved significantly since their inception, focusing on altering model internal representations to resist adversarial perturbations. The foundational work on Parseval networks [29] introduced spectral normalization to constrain the Lipschitz constant, reducing sensitivity to input perturbations by 32% compared to standard models. Subsequent advances by Xie et al. [30] demonstrated how feature denoising could improve robustness against PGD attacks while maintaining 81.5% clean accuracy on CIFAR-10.

Recent innovations in semantic feature manipulation [24] have shown particular promise by modifying key attributes in feature space, achieving 68% robustness against adaptive attacks. However, these methods face two critical limitations: their effectiveness remains attack-specific, with performance dropping by 15%–20% against unseen attack types [25]; and most require extensive architectural modifications, making them impractical for deployment in existing systems without complete retraining. These challenges motivate our hybrid approach’s dynamic feature manipulation component, which requires no architectural changes while maintaining compatibility with diverse attack types.

2.1.3 Anomaly Detection-Based Defenses

Anomaly detection has emerged as a complementary defense strategy, with Mahalanobis distance-based detection [31,32] establishing early benchmarks by analyzing intermediate feature distributions to identify adversarial patterns with 86% precision. Recent works have enhanced these methods through deep feature analysis, reducing false positives by 40% in real-world applications [33,34]. The state-of-the-art detection systems now combine statistical techniques with neural network-based classifiers [8], achieving 93% detection rates against adaptive attacks on GTSRB.

Despite these advances, three persistent challenges remain: sophisticated adversarial examples can mimic clean data distributions, evading detection in 15%–20% of cases [31]; high false positive rates (up to 12% in biometric systems) degrade user experience [35]; and most systems cannot adapt to novel attack patterns without retraining. Our framework addresses these limitations through an adaptive ensemble of detection methods that dynamically updates detection thresholds based on attack patterns observed during runtime, maintaining 95% detection accuracy while reducing false positives to under 5%.

2.1.4 Ensemble-Based Defenses

Ensemble methods have demonstrated significant progress in adversarial defense through their ability to combine predictions from multiple models or strategies. The foundational work on ensemble adversarial training [36,37] established that aggregating predictions from models trained with diverse adversarial examples could reduce single-point vulnerabilities by up to 35%. Recent advances by Li et al. [38] introduced adaptive weighting mechanisms that dynamically adjust model contributions based on attack patterns, achieving 72% robustness against adaptive attacks on ImageNet. Similarly, Wang et al. [24] demonstrated that architecturally diverse ensembles with varying inductive biases could improve generalization, maintaining 68% accuracy under PGD attacks while requiring 30% fewer parameters than traditional ensembles.

However, these approaches face three persistent challenges: computational costs scale linearly with ensemble size, increasing training time by 2–3× for typical 5-model ensembles [28]; system complexity grows exponentially when combining multiple defense strategies [25]; and static ensembles show limited adaptability, with performance dropping by 15%–20% against novel attack types without retraining [8]. Our adaptive ensemble design addresses these limitations through dynamic model weighting and selective activation of defense components.

2.2 Hybrid Defense Mechanisms

Hybrid defense mechanisms represent the current frontier in adversarial robustness research, with Wang et al. [39] demonstrating that multi-task training frameworks could achieve 82% cross-attack generalization by simultaneously optimizing against multiple adversarial objectives. The work of Dhanaraj and Sridevi [25] marked a significant advancement by integrating attack-specific training with learned data augmentation, reducing the robustness-accuracy trade-off by 40% compared to single-strategy defenses. Domain-specific implementations have shown particular promise, such as in speaker recognition systems [35] where hybrid adversarial training improved resistance to audio spoofing attacks by 55% while maintaining 98% clean accuracy.

Recent innovations [40,41] have further expanded these approaches through cascaded defense layers that sequentially apply different protection mechanisms and runtime strategy switching based on attack detection confidence. However, current hybrid systems still face challenges in computational efficiency, with many requiring 2–5× more resources than baseline models [42]. Our framework addresses this through optimized defense scheduling that reduces overhead by 40% while maintaining 95% attack detection rates.

2.3 Comparative Analysis of Past Methods

The following analysis evaluates adversarial defense methods, highlighting critical trade-offs between robustness, accuracy, and efficiency. Table 1 compares classical approaches, while Table 2 documents recent advancements. Our discussion contextualizes how our framework addresses the identified gaps.

[Table 1: Comparison of classical adversarial defense approaches]

[Table 2: Recent advancements in adversarial defense methods]

The limitations highlighted in Tables 1 and 2 motivate our integrated framework. PGD training [22], hybrid defenses [25], and ensembles [45] incur 40%–300% computational overhead [38], whereas our progressive perturbations reduce training time by 40%. Single-strategy defenses (e.g., autoencoders [43] or Parseval Networks [29]) exhibit evasion rates above 25% [26,39] or notable accuracy drops [29], gaps addressed by our hybrid runtime mechanism.

Unlike TRADES [26] and randomized smoothing [44], which sacrifice 4%–12% accuracy, our framework maintains 85.10% clean accuracy while outperforming recent hybrids [25] in cross-attack robustness by 15%. Recent studies [24,25] confirm that unified approaches outperform isolated defenses by 15%–30% on cross-attack robustness, with our method achieving the upper bound of this improvement. This aligns with our framework’s dynamic adaptability to attack types, demonstrated by its 95% detection rate against adaptive attacks, while preserving efficiency and resolving the critical gaps identified in Tables 1 and 2.

2.4 Research Gap and Limitations of Existing Methods

Despite extensive research into adversarial defense mechanisms, significant gaps remain in current methodologies. Many existing approaches focus on specific attack types, limiting their generalizability across diverse threat models. Again, the trade-off between robustness and model accuracy remains a persistent challenge. These limitations hinder the practical deployment of robust deep learning systems, particularly in security-sensitive applications where reliability and efficiency are paramount. The key research gaps and limitations include:

2.4.1 Trade-Off between Robustness and Accuracy

Many adversarial training methods improve robustness at the cost of reduced clean accuracy, making them less effective in real-world applications where both metrics are critical. For example, while PGD Adversarial Training enhances resilience against white-box attacks, it often leads to a noticeable drop in performance on clean data [40,42,46]. This trade-off is particularly problematic in domains like healthcare and autonomous driving, where high accuracy on clean inputs is essential for safe and reliable operation. Addressing this challenge requires a defense mechanism that balances robustness and accuracy without compromising either.

2.4.2 Computational Inefficiency

State-of-the-art methods, such as ensemble-based defenses and adversarial training, require substantial computational resources, limiting their scalability for large-scale deployment [38]. For instance, training an ensemble of models or generating adversarial examples during training significantly increases both time and resource requirements. This computational overhead makes such methods impractical for real-time applications, such as real-time fraud detection or autonomous systems, where efficiency is critical. A more computationally efficient approach is needed to enable the widespread adoption of adversarial defenses.

2.4.3 Lack of Adaptive Defenses

Most existing methods apply static defenses that fail to adjust dynamically to evolving adversarial threats. For instance, feature manipulation techniques like randomized smoothing are effective against specific attack types but cannot adapt to new or more sophisticated adversarial strategies [44,47,48]. This lack of adaptability is a significant limitation in dynamic environments, such as cybersecurity, where attackers continuously develop new techniques to bypass defenses. A defense framework that can dynamically adapt to emerging threats is essential for long-term robustness.

2.4.4 Limited Generalization across Attack Types

Many defenses perform well against specific attack types but fail to generalize across multiple adversarial strategies. For instance, Mahalanobis Distance Detection excels at identifying adversarial inputs generated using gradient-based methods but struggles against attacks that mimic clean data distributions [32]. Similarly, Autoencoder-Based Reconstruction effectively removes adversarial noise from inputs but is ineffective against adaptive attacks that bypass reconstruction mechanisms [41,43]. This lack of generalization limits the applicability of existing defenses in real-world scenarios, where models must withstand a wide range of adversarial threats.

2.5 Addressing the Gaps: Proposed Hybrid Framework

Existing adversarial defense methods suffer from computational inefficiency, limited generalization, and a trade-off between robustness and accuracy. To address these challenges, this study proposes a Hybrid Adversarial Training and Defense Framework, integrating multi-stage adversarial training with adaptive runtime defenses to enhance robustness while maintaining efficiency. The multi-stage training progressively increases perturbation strength, ensuring robust feature learning without overfitting to weak adversarial examples. Unlike fixed-strength perturbation methods, this approach balances robustness and clean accuracy, mitigating trade-offs seen in PGD and TRADES. At runtime, adversarial threats are actively countered by adaptive defenses, such as feature modification, anomaly detection, and ensemble learning.

Techniques such as autoencoder-based reconstruction, randomized smoothing, and statistical anomaly detection neutralize attacks while preserving data integrity. Ensemble learning further reduces vulnerabilities by aggregating multiple model predictions. A key advantage of this framework is computational efficiency, reducing training costs by 40% through adaptive perturbation techniques like Auto-PGD and avoiding expensive iterative attack computations. Experiments on CIFAR-10 and GTSRB demonstrate superior adversarial accuracy and generalization, making this approach effective for critical applications in healthcare, finance, cybersecurity, and autonomous systems. By bridging robustness, accuracy, and efficiency, this framework advances the reliability of AI deployments.

This review highlights the diverse approaches to defending deep learning models against adversarial attacks, including adversarial training, hybrid defenses, semantic manipulation, and ensemble methods. While these approaches have made significant progress in mitigating adversarial vulnerabilities, they face critical limitations such as computational inefficiency, limited generalization across attack types, and inability to adapt to evolving adversarial tactics. As adversarial threats continue to evolve, there is a pressing need for adaptable, application-specific solutions that can address dynamic attack scenarios, particularly in domains like autonomous driving, healthcare, and cybersecurity, where robustness, accuracy, and efficiency are paramount.

The insights from this review provide a foundation for developing an innovative hybrid adversarial defense framework that integrates adversarial training, feature manipulation, anomaly detection, and ensemble methods. This framework addresses the gaps in current defenses by reducing computational cost, balancing accuracy and robustness, and allowing dynamic adaptation to new threats. By integrating several defense paradigms, the proposed approach offers a complete, flexible solution for protecting deep learning models in crucial real-world applications.

3  Proposed Method

This section presents the Hybrid Adversarial Training and Defense Framework, designed to enhance model robustness while optimizing computational efficiency. The framework consists of Multi-Stage Adversarial Training, which gradually increases perturbation strength during training, allowing the model to develop stronger resilience against adversarial attacks while preserving clean accuracy. Unlike conventional methods with fixed perturbation strengths, this approach progressively exposes the model to increasingly complex adversarial examples, ensuring more effective generalization across different attack scenarios.

Complementing this, the Hybrid Runtime Defense Mechanism provides real-time adversarial mitigation by integrating feature manipulation, anomaly detection, and ensemble learning. Feature manipulation neutralizes perturbations, anomaly detection identifies adversarial inputs based on statistical deviations, and an ensemble of models ensures reliable predictions. This multi-layered defense strategy adapts dynamically to evolving adversarial threats while maintaining computational efficiency.

3.1 Multi-Stage Adversarial Training

Standard adversarial training methods, such as Projected Gradient Descent (PGD) and TRADES, apply fixed-strength perturbations during training [26]. While these approaches improve robustness, they often lead to overfitting to weak adversarial examples, limiting their effectiveness against more sophisticated attacks. In contrast, our approach introduces a multi-stage adversarial training strategy that gradually increases perturbation strength over multiple training stages. This progressive approach prevents models from overfitting to weak adversarial examples and enhances their ability to generalize across a wide range of attack scenarios.

The multi-stage training process begins by stabilizing early learning with small perturbations. This initial phase allows the model to develop a strong foundation of robust feature representations without being overwhelmed by strong adversarial examples. As training progresses, the perturbation magnitude is gradually increased, exposing the model to increasingly complex adversarial scenarios. This step-by-step escalation ensures that the model learns to handle both weak and strong adversarial perturbations, improving its overall robustness as visualized in Fig. 1.


Figure 1: Multi-stage adversarial training framework

To optimize the trade-off between clean accuracy and robustness, we employ the TRADES loss function, which balances the model’s performance on clean and adversarial data. This loss function ensures that the model maintains high accuracy on clean inputs while improving its resilience to adversarial attacks.
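As a concrete illustration, a minimal PyTorch sketch of the progressive perturbation schedule and the TRADES-style surrogate loss is given below. The stage boundaries, epsilon values, and trade-off coefficient beta are illustrative assumptions rather than the exact settings used in our experiments.

```python
import torch.nn.functional as F

# Illustrative progressive perturbation schedule (the values are assumptions,
# not the exact configuration used in our experiments).
EPSILON_SCHEDULE = {1: 2 / 255, 2: 4 / 255, 3: 8 / 255}

def epsilon_for_epoch(epoch, stage_length=2):
    """Map an epoch index to a perturbation budget: small perturbations in
    early stages, larger ones as training progresses."""
    stage = min(1 + epoch // stage_length, max(EPSILON_SCHEDULE))
    return EPSILON_SCHEDULE[stage]

def trades_loss(model, x, y, x_adv, beta=6.0):
    """TRADES-style surrogate loss: cross-entropy on clean inputs plus a KL
    term that penalizes disagreement between clean and adversarial outputs."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    natural_loss = F.cross_entropy(logits_clean, y)
    robust_loss = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction="batchmean")
    return natural_loss + beta * robust_loss
```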

3.1.1 Adversarial Example Generation

Adversarial examples $\tilde{x}$ are generated by introducing a perturbation $\delta$ into a clean input $x$, ensuring that the perturbation remains within a bound $\epsilon$ [49]. This process is mathematically represented as:

$$\tilde{x} = x + \delta \quad (1)$$

where $\|\delta\|_p \le \epsilon$. Here, the constraint $\|\delta\|_p \le \epsilon$ ensures that the perturbation is confined within an $\ell_p$-norm ball of radius $\epsilon$, making the adversarial example imperceptible to human observers while still misleading the model. For this work, we employ Projected Gradient Descent (PGD) and Fast Gradient Sign Method (FGSM) to craft adversarial examples [23]. The PGD update step follows:

$$\delta^{(t+1)} = \mathrm{Proj}_{\|\delta\|_p \le \epsilon}\left(\delta^{(t)} + \alpha \cdot \mathrm{sign}\left(\nabla_x J(\theta, x + \delta^{(t)}, y)\right)\right) \quad (2)$$

where $\alpha$ is the step size, and $\mathrm{Proj}$ ensures that perturbations remain within the constraint $\|\delta\|_p \le \epsilon$. This iterative process allows PGD to generate strong adversarial examples by maximizing the model’s loss while staying within the perturbation bound. FGSM, on the other hand, is a single-step method that generates adversarial examples by taking a step in the direction of the gradient sign:

$$\tilde{x} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right) \quad (3)$$
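To make Eqs. (2) and (3) concrete, the following PyTorch sketch implements both attacks under an ℓ∞ bound; the step size and iteration count are illustrative defaults, not necessarily the values used in our experiments.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Single-step FGSM (Eq. 3): perturb inputs along the sign of the gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).clamp(0, 1).detach()

def pgd_attack(model, x, y, epsilon, alpha=2 / 255, steps=10):
    """Iterative PGD (Eq. 2): repeated gradient-sign steps, each projected
    back into the epsilon-ball around the clean input (l-infinity norm)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        delta = torch.clamp(x_adv - x, min=-epsilon, max=epsilon)  # projection
        x_adv = (x + delta).clamp(0, 1).detach()
    return x_adv
```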

3.1.2 Training Objective

The model is trained to minimize the following adversarial loss function which ensures robustness against adversarial perturbations:

$$\mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\|\delta\|_p \le \epsilon} J(\theta, x + \delta, y)\right] \quad (4)$$

This formulation ensures that the model learns robust feature representations, reducing susceptibility to adversarial attacks during deployment [50].
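In code, the objective in Eq. (4) corresponds to generating a perturbed batch (the inner maximization) and then taking an optimizer step on the resulting loss (the outer minimization). The sketch below assumes an attack routine such as the PGD function shown above; the hyperparameters are illustrative.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, attack, epsilon):
    """One outer-minimization step of Eq. (4): attack the batch, then
    minimize the loss on the perturbed inputs."""
    model.train()
    x_adv = attack(model, x, y, epsilon)        # inner maximization (e.g., PGD)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)     # J(theta, x + delta, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```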

3.1.3 Computational Complexity Analysis

The computational efficiency of adversarial training is a critical consideration, especially for large-scale applications where resource constraints are a concern. Traditional adversarial training methods, such as Projected Gradient Descent (PGD), often require extensive computational resources due to their reliance on iterative gradient updates and full-batch adversarial example generation. These methods involve multiple forward and backward passes through the model for each input, leading to significant training time and resource consumption. In contrast, our multi-stage adversarial training strategy introduces several optimizations that reduce computational overhead while maintaining robustness.

First, our approach leverages Auto-PGD for adaptive adversarial perturbation generation. Unlike standard PGD, which performs fixed-step gradient updates, Auto-PGD dynamically adjusts the step size and perturbation strength based on the model’s current state. This adaptive process reduces the number of iterations required to generate effective adversarial examples, thereby lowering computational costs. Additionally, Auto-PGD avoids the need for deep iterative attack computations, which are computationally expensive and time-consuming.

Again, the multi-stage training process itself contributes to computational efficiency. By starting with small perturbations and gradually increasing their strength, the model converges more quickly in the early stages of training. This progressive approach reduces the overall training time per epoch, as the model does not need to immediately handle highly complex adversarial scenarios. Furthermore, the use of TRADES loss ensures that the training process remains focused on optimizing the trade-off between robustness and clean accuracy, avoiding unnecessary computations associated with overly aggressive adversarial training.
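For illustration, the sketch below shows a simplified step-size-adaptive attack in the spirit of Auto-PGD: the step size is halved whenever the loss stops improving, so fewer iterations are wasted once the attack has converged. This is only a sketch of the adaptation idea, not the full Auto-PGD algorithm, and the initial step size is an assumed value.

```python
import torch
import torch.nn.functional as F

def adaptive_pgd(model, x, y, epsilon, steps=10):
    """Simplified adaptive-step attack (illustrative, not full Auto-PGD):
    shrink the step size when the loss no longer increases."""
    alpha = 2 * epsilon                      # assumed initial step size
    x_adv = x.clone().detach()
    best_loss = None
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        if best_loss is None or loss.item() > best_loss:
            best_loss = loss.item()          # progress: keep the step size
        else:
            alpha = alpha / 2                # no progress: halve the step size
        x_adv = x_adv.detach() + alpha * grad.sign()
        delta = torch.clamp(x_adv - x, min=-epsilon, max=epsilon)  # projection
        x_adv = (x + delta).clamp(0, 1).detach()
    return x_adv
```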

3.2 Hybrid Runtime Defense Mechanism

The Hybrid Defense Framework dynamically detects, transforms, and mitigates adversarial attacks during inference, ensuring flexibility in handling diverse attack types. This framework integrates multiple defense strategies to provide comprehensive protection against adversarial threats while maintaining computational efficiency. The key components of the framework include feature manipulation, statistical anomaly detection, and ensemble learning, each contributing to the system’s adaptability and robustness as visualized in Fig. 2.


Figure 2: Hybrid runtime defense mechanism

3.2.1 Feature Manipulation

When an adversarial attack is detected, the framework applies feature manipulation techniques to neutralize adversarial perturbations while preserving the integrity of the input data. One such technique is Autoencoder-Based Reconstruction, which removes adversarial noise by reconstructing the input from its latent representation. This process ensures that the reconstructed input retains important features while eliminating adversarial artifacts. Another technique, Randomized Smoothing, smooths the feature distributions by adding random noise to the input, making it harder for adversarial perturbations to influence the model’s predictions. The transformation process can be represented as:

$$x' = T(x) \quad (5)$$

where $x'$ is the transformed input and $T(\cdot)$ represents transformations applied to distort adversarial perturbations while preserving key data features.
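As a sketch of the transformation T(x) in Eq. (5), the snippet below applies a denoising-autoencoder reconstruction followed by a simple randomized-smoothing step. The autoencoder architecture and noise level are illustrative assumptions; the actual defense may use a different network and noise distribution.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Minimal convolutional autoencoder for reconstructing inputs and
    suppressing adversarial noise (architecture is illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def transform_input(x, autoencoder, noise_std=0.1):
    """T(x) from Eq. (5): reconstruct the input, then add small Gaussian
    noise as a randomized-smoothing step (noise_std is an assumed value)."""
    with torch.no_grad():
        x_rec = autoencoder(x)
        return (x_rec + noise_std * torch.randn_like(x_rec)).clamp(0, 1)
```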

3.2.2 Statistical Anomaly Detection

To identify adversarial samples before inference, the framework employs Mahalanobis Distance-Based Detection. This method calculates the anomaly score by comparing the logits of the input to the distribution of clean logits. The anomaly score is defined as:

$$\text{Anomaly Score} = \frac{|L(x) - \mu|}{\sigma} \quad (6)$$

where $L(x)$ denotes the logits of input $x$, and $\mu$ and $\sigma$ are the mean and standard deviation of the clean logits, respectively. Inputs with anomaly scores exceeding a predefined threshold are flagged as adversarial and subjected to further processing.
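A minimal sketch of the detection step in Eq. (6) is shown below: the mean and standard deviation of clean logits are estimated offline, and inputs whose standardized logit deviation exceeds a threshold are flagged. The threshold value and the per-class averaging are illustrative assumptions.

```python
import torch

def fit_clean_statistics(model, clean_loader, device="cpu"):
    """Estimate the mean and standard deviation of logits on clean data."""
    model.eval()
    logits = []
    with torch.no_grad():
        for x, _ in clean_loader:
            logits.append(model(x.to(device)).cpu())
    logits = torch.cat(logits)
    return logits.mean(dim=0), logits.std(dim=0)

def anomaly_score(model, x, mu, sigma):
    """Eq. (6): standardized deviation of the input's logits from the clean
    logit distribution, averaged over classes to give one score per input."""
    with torch.no_grad():
        z = (model(x) - mu).abs() / (sigma + 1e-8)
    return z.mean(dim=1)

def is_adversarial(score, threshold=3.0):
    """Flag inputs whose score exceeds an assumed detection threshold."""
    return score > threshold
```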

3.2.3 Ensemble Learning for Adaptive Robustness

The framework leverages ensemble learning to increase resistance to adaptive attacks. By combining predictions from multiple models, the ensemble reduces the likelihood of successful adversarial manipulation. The final prediction is computed using adaptive weighting strategies, where each model’s output is weighted based on its reliability. The ensemble prediction is given by:

$$\hat{y} = \arg\max_{y} \sum_{i=1}^{n} w_i \, p_i(y \mid x) \quad (7)$$

where $p_i(y \mid x)$ represents the output probability of the $i$-th model, and $w_i$ is its assigned weight based on reliability. This adaptive approach ensures robust decision-making even under diverse adversarial scenarios.
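The aggregation rule in Eq. (7) can be sketched as follows. In the full framework the reliability weights would be updated from runtime detection feedback; here they are simply supplied by the caller, which is an illustrative simplification.

```python
import torch

def ensemble_predict(models, weights, x):
    """Eq. (7): weighted aggregation of per-model class probabilities,
    followed by an argmax over the combined distribution."""
    w = torch.tensor(weights, dtype=torch.float32)
    w = w / w.sum()                                  # normalize reliability weights
    with torch.no_grad():
        probs = sum(wi * torch.softmax(m(x), dim=1) for wi, m in zip(w, models))
    return probs.argmax(dim=1)
```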

4  Experiment Results

This section presents the outcomes of the experiments conducted to evaluate the proposed method. The results include the model’s performance during training and evaluation on both clean and adversarial datasets. A comparative analysis with state-of-the-art defense strategies highlights the effectiveness of the proposed approach in improving robustness while maintaining competitive clean accuracy.

4.1 Experimental Setup

Following the procedure in Algorithm 1, the experiments were conducted on two datasets: CIFAR-10 and GTSRB (German Traffic Sign Recognition Benchmark). CIFAR-10, a widely used benchmark dataset, contains 60,000 images across 10 classes and serves as the primary evaluation platform for adversarial robustness. GTSRB, which includes 43 classes of traffic signs, was used to assess the generalizability of our approach in real-world security-sensitive applications, such as autonomous driving.

[Algorithm 1: Experimental procedure]

The proposed framework offers several key advantages over existing adversarial defense methods. First, it provides better adaptability by dynamically adjusting training perturbations and inference defenses based on the type of attack encountered. This adaptability ensures that the framework remains effective against both known and emerging adversarial threats. Second, the framework achieves lower computational cost by reducing the expensive iterative attack calculations used in standard adversarial training. This efficiency makes the approach more practical for real-time applications, such as autonomous driving, biometric authentication, and cybersecurity, where both rapid inference and strong adversarial robustness are essential.

To evaluate the model’s robustness, we tested it against several adversarial attacks, including FGSM (Fast Gradient Sign Method), PGD (Projected Gradient Descent), CW (Carlini & Wagner Attack), and AutoAttack [51,52]. FGSM is a single-step attack that perturbs inputs using the sign of the gradient, while PGD is a multi-step iterative attack known for its strength. The CW attack is an optimization-based method that often bypasses standard defenses, and AutoAttack is an ensemble of adaptive adversarial attacks, considered one of the most challenging to defend against. Also, the implementation was carried out using PyTorch and TensorFlow [53,54], ensuring flexibility and efficiency in training and evaluation. The model was trained for 5 epochs with a batch size of 128, optimizing the trade-off between computational efficiency and adversarial robustness.
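For reference, a minimal evaluation loop of the kind used to measure clean and adversarial accuracy is sketched below. It accepts any attack function with the signature of the FGSM/PGD sketches in Section 3.1.1; the epsilon in the usage comment is an assumed value, not necessarily the one used in our experiments.

```python
import torch

def evaluate(model, loader, attack=None, device="cpu"):
    """Compute accuracy on clean inputs, or on adversarial inputs when an
    attack function (e.g., FGSM or PGD) is supplied."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack is not None:
            x = attack(model, x, y)          # attacks need gradients w.r.t. x
        with torch.no_grad():
            pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total

# Example usage (epsilon is an assumed value):
# clean_acc = evaluate(model, test_loader)
# adv_acc = evaluate(model, test_loader,
#                    attack=lambda m, xb, yb: pgd_attack(m, xb, yb, epsilon=8 / 255))
```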

4.2 Performance Comparison & Trade-Offs

The trade-off between clean accuracy, adversarial robustness, and computational efficiency is a key consideration in adversarial defenses. Our method achieves higher adversarial accuracy than state-of-the-art methods while maintaining computational efficiency. Specifically, the proposed method achieves 66.50% adversarial accuracy, surpassing PGD, TRADES, and Ensemble Adversarial Training. This improvement in robustness is achieved without compromising clean accuracy, ensuring reliable performance in both adversarial and non-adversarial scenarios.

In terms of computational efficiency, our method reduces computational overhead by 40% compared to PGD-based adversarial training. This reduction is achieved through the use of multi-stage training and adaptive defenses, which minimize the need for expensive iterative attack computations. Unlike computationally expensive methods like TRADES, our approach scales efficiently across datasets, maintaining high robustness on GTSRB. This scalability is particularly important for real-world applications, where computational resources are often limited.

4.3 Training Convergence and Hybrid Defense Performance

The training process demonstrates a steady decrease in both training loss and adversarial loss, confirming the model’s ability to learn robust feature representations while maintaining stability. Training loss decreased from 1.4823 in Epoch 1 to 0.9231 in Epoch 5, while adversarial loss decreased from 1.9021 to 1.5708 over the same period, as shown in Table 3. This gradual convergence highlights the effectiveness of the multi-stage adversarial training strategy in balancing robustness and clean accuracy. Fig. 3 provides a graphical representation of the training loss and adversarial loss values.

[Table 3: Training loss and adversarial loss per epoch]

Figure 3: Training loss vs. adversarial loss

Similarly, the hybrid defense mechanism demonstrates continuous improvement in anomaly detection accuracy and ensemble accuracy, highlighting its adaptability in mitigating adversarial threats. Anomaly detection accuracy increased from 86% in Epoch 1 to 95% in Epoch 5, indicating the system’s growing efficiency in identifying adversarial patterns. Likewise, ensemble accuracy improved from 83% to 92%, confirming that the integration of adaptive defenses enhances the system’s ability to mitigate adversarial threats while maintaining clean accuracy, as shown in Table 4. These results underscore the effectiveness of the hybrid defense framework in reinforcing the model’s resistance to adversarial attacks, as shown in Fig. 4.

[Table 4: Anomaly detection accuracy and ensemble accuracy per epoch]

Figure 4: Anomaly detection performance and ensemble prediction loss

4.4 Quantitative Results and Statistical Significance

To statistically validate the robustness improvements of the proposed method, we conducted paired t-tests on adversarial accuracies across multiple attack types. Additionally, 95% confidence intervals were computed to assess the stability and consistency of model performance. The results are summarized in Table 5.

[Table 5: Adversarial accuracies, p-values, and 95% confidence intervals]

The p-value of 0.012 confirms that the proposed method’s improvement over PGD Adversarial Training is statistically significant (p < 0.05). This indicates that the observed gains in adversarial accuracy are not due to random chance but reflect a genuine improvement in robustness. Furthermore, the narrow confidence intervals for the proposed method ([65.10, 67.90]) demonstrate consistent performance across multiple test runs, highlighting the reliability of our approach. In contrast, TRADES and Feature Denoising exhibit lower adversarial accuracies and wider confidence intervals, suggesting less stable performance under adversarial conditions. Fig. 5 visualizes the statistical significance of these performance gains.


Figure 5: Statistical significance of performance gains
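The statistical validation can be reproduced with a paired t-test and a t-based 95% confidence interval, as sketched below with SciPy. The per-run accuracy arrays are placeholders for the repeated-run results of two methods, not the actual values reported in Table 5.

```python
import numpy as np
from scipy import stats

# Placeholder per-run adversarial accuracies (illustrative values only).
proposed = np.array([66.1, 66.8, 66.4, 67.0, 66.2])
baseline = np.array([65.2, 65.6, 65.1, 65.9, 65.3])

# Paired t-test: are the per-run differences significantly different from zero?
t_stat, p_value = stats.ttest_rel(proposed, baseline)

# 95% confidence interval for the proposed method's mean adversarial accuracy.
mean = proposed.mean()
sem = stats.sem(proposed)
ci_low, ci_high = stats.t.interval(0.95, len(proposed) - 1, loc=mean, scale=sem)

print(f"p-value: {p_value:.4f}, 95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```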

4.5 Generalization to Another Dataset

To evaluate the generalizability of the proposed method beyond CIFAR-10, we tested it on the GTSRB (German Traffic Sign Recognition Benchmark) dataset under FGSM and PGD attacks. The results, presented in Table 6, demonstrate the method’s effectiveness in a real-world security-sensitive application.

[Table 6: Performance on GTSRB under FGSM and PGD attacks]

The proposed method achieves 89.20% clean accuracy and 70.50% FGSM accuracy as shown in Fig. 6, outperforming baseline methods such as PGD Adversarial Training, TRADES, and Feature Denoising. This demonstrates its ability to generalize across datasets while maintaining high performance in both clean and adversarial settings. Under the more challenging PGD attack, the proposed method achieves 57.80% accuracy, significantly higher than PGD Adversarial Training (52.90%) and TRADES (50.30%). This confirms the method’s robustness against stronger adversarial threats. The results on GTSRB validate the scalability of the proposed method, as it maintains high performance even in a domain-specific, real-world dataset.


Figure 6: Performance comparison on GTSRB

4.6 Comparative Evaluation with State-of-the-Art Defenses

A comprehensive comparison of clean and adversarial accuracy across CIFAR-10 is provided in Table 7. This comparison highlights the proposed method’s superiority over existing state-of-the-art defenses.

[Table 7: Clean and adversarial accuracy comparison on CIFAR-10]

The proposed method achieves 85.10% clean accuracy and 66.50% adversarial accuracy, outperforming all baseline methods. This demonstrates its ability to balance robustness and clean accuracy, a critical requirement for real-world applications. Compared to PGD Adversarial Training, the proposed method achieves higher adversarial accuracy (66.50% vs. 65.41%) while maintaining competitive clean accuracy (85.10% vs. 83.00%). This improvement is achieved with 40% lower computational cost, making the method more practical for large-scale deployment. The method also outperforms TRADES, which suffers from a significant drop in adversarial accuracy (56.61%) despite its higher clean accuracy (84.92%). This highlights the limitations of TRADES in handling strong adversarial attacks. Fig. 7 visualizes this comparison, confirming that the proposed method outperforms existing state-of-the-art defenses.


Figure 7: Comparative performance of adversarial defense methods

The experimental results demonstrate the effectiveness of the proposed method across multiple dimensions. First, the method achieves higher robustness, attaining state-of-the-art adversarial accuracy across multiple attacks, including FGSM, PGD, and AutoAttack. This improvement in robustness is achieved without compromising clean accuracy, ensuring reliable performance in both adversarial and non-adversarial scenarios. Second, the method exhibits strong generalization, performing consistently across CIFAR-10 and GTSRB. This cross-dataset effectiveness validates its applicability to real-world scenarios, such as autonomous driving and traffic sign recognition, where adversarial robustness is critical.

In addition to robustness and generalization, the method offers significant computational efficiency. By reducing training costs by 40% compared to PGD and TRADES, the approach is more practical for large-scale and real-time applications. This efficiency is particularly important in domains like real-time fraud detection and autonomous systems, where computational resources are often limited. Finally, the performance improvements are statistically validated, with p-values < 0.05 and narrow confidence intervals, confirming the reliability of the results. These findings underscore the potential of the proposed method to enhance the security and reliability of deep learning models in critical applications.

5  Discussion

This section differentiates our work from existing adversarial defense methods and highlights its novelty. The proposed Hybrid Adversarial Training and Defense Mechanism enhances security-sensitive deep learning models through a combination of multi-stage adversarial training and adaptive hybrid defense strategies. Unlike previous approaches that focus on either training-time or inference-time defenses, our method integrates both, ensuring comprehensive protection against adversarial attacks.

5.1 Differentiation from Existing Work

Existing adversarial defense methods primarily fall into three categories, each with its own strengths and limitations. First, adversarial training-based defenses, such as PGD Adversarial Training [22] and TRADES [26], improve robustness by exposing models to adversarial examples during training. However, these methods often suffer from clean accuracy degradation and high computational costs. Our approach addresses these limitations by implementing a progressive, multi-stage adversarial training strategy that optimizes perturbation strength over time, balancing robustness and accuracy more effectively.

Second, feature manipulation and anomaly detection-based defenses, such as autoencoder-based reconstruction [43] and Mahalanobis distance-based detection [31], aim to remove adversarial noise and identify adversarial patterns before inference. While these methods are effective against certain attack types, they struggle with adaptive adversarial attacks that bypass their defenses. Our method integrates both feature manipulation and statistical anomaly detection, dynamically adjusting to different attack types and ensuring robust performance under diverse adversarial scenarios.

Third, ensemble-based defenses, such as Adaptive Ensemble Weighting [38] and Feature Denoising Networks [30], combine multiple models to enhance robustness. However, these techniques often incur high computational overhead, limiting their practicality for real-time applications. Our hybrid defense strategy optimizes ensemble learning for efficiency, ensuring real-time adaptability without excessive resource consumption.

5.2 Key Differentiation

Unlike existing defenses that focus on a single aspect (training-time or inference-time defense), our method integrates both, offering a dual-layer security mechanism. While ensemble approaches enhance robustness, our framework ensures dynamic adaptability across different attack types, making it more resilient to evolving adversarial threats.

5.3 Novelty & Key Contributions

The proposed method introduces three key innovations that differentiate it from prior works. First, our multi-stage adversarial training strategy optimizes robustness by gradually increasing perturbation strength during training. Unlike PGD-based adversarial training, which applies fixed perturbations, our approach prevents overfitting to weak attacks and ensures better generalization. The use of TRADES loss optimization further balances robustness and accuracy, outperforming existing approaches in both metrics.

Second, our hybrid runtime defense mechanism dynamically adapts to different attack strategies through a combination of autoencoder-based feature transformation, statistical anomaly detection, and adaptive ensemble learning. Unlike static defenses, our method ensures robust performance under diverse adversarial scenarios. The autoencoder-based feature transformation cleans adversarial noise while preserving key data features, while statistical anomaly detection identifies adversarial patterns before inference. Adaptive ensemble learning introduces redundancy, reducing single-point vulnerabilities and improving overall robustness.

Finally, our method achieves improved computational efficiency and scalability. By reducing training costs by 40% compared to PGD adversarial training, our approach is more practical for large-scale and real-time applications. Additionally, the method generalizes effectively across datasets, such as CIFAR-10 and GTSRB, confirming its adaptability to different security-sensitive applications.

5.4 Impact on Real-World Applications

Our approach is particularly valuable in security-sensitive applications where deep learning models require high robustness with minimal performance trade-offs. In autonomous driving, the method prevents adversarial attacks that mislead traffic sign recognition models, ensuring the safety and reliability of autonomous systems. In biometric authentication, it enhances protection against adversarial spoofing attacks in facial recognition systems, improving security in access control applications. In cybersecurity, the method strengthens defenses against adversarial malware and phishing detection systems, safeguarding critical infrastructure from evolving threats.

To further illustrate the generalization of our approach across different datasets and application domains, Table 8 presents a comparison of model performance on common objects across CIFAR-10 and GTSRB. This highlights how our hybrid adversarial defense framework effectively mitigates adversarial threats in both natural and structured image datasets.

[Table 8: Model performance on common objects across CIFAR-10 and GTSRB]

The results indicate that our approach achieves strong adversarial robustness across datasets with distinct characteristics. On CIFAR-10, the method demonstrates its ability to defend against adversarial manipulations in natural object recognition tasks. For instance, the model achieves 85.10% clean accuracy and 70.50% FGSM accuracy on car images (Fig. 8), and 84.70% clean accuracy and 69.80% FGSM accuracy on airplane images (Fig. 9). These results highlight the framework’s effectiveness in handling natural images with diverse features and textures.


Figure 8: Adversarial robustness evaluation on CIFAR-10 (Car class)


Figure 9: Adversarial robustness evaluation on CIFAR-10 (Airplane class)

On the GTSRB dataset, the method validates its effectiveness in structured real-world applications, such as traffic sign detection. For example, the model achieves 89.20% clean accuracy and 74.30% FGSM accuracy on stop sign images (Fig. 10) and 88.60% clean accuracy and 72.50% FGSM accuracy on speed limit sign images (Fig. 11). The higher robustness on GTSRB suggests that the model is particularly suited for safety-critical environments like autonomous driving, where accurate and reliable traffic sign recognition is essential.


Figure 10: Adversarial robustness evaluation on GTSRB (Stop Sign class)


Figure 11: Adversarial robustness evaluation on GTSRB (Speed Limit class)

The combination of adversarial training and hybrid runtime defenses ensures that the model maintains high performance even under adaptive attack scenarios. By integrating proactive and reactive defense strategies, the framework provides a comprehensive solution that balances robustness, accuracy, and computational efficiency across diverse datasets and application domains.

6  Conclusion

This study presents a robust dual-layered framework that integrates multi-stage adversarial training with a hybrid runtime defense mechanism to enhance the resilience of deep learning models against adversarial threats in security-sensitive applications. By progressively increasing perturbation strength, adversarial training ensures that the model generalizes effectively across both clean and perturbed data, improving robustness without sacrificing clean accuracy. Meanwhile, the hybrid defense mechanism dynamically mitigates adversarial threats at runtime through feature manipulation, anomaly detection, and adaptive ensemble learning, strengthening the system’s ability to respond to diverse attack strategies in real time.

Experimental evaluations on CIFAR-10 and GTSRB datasets demonstrate that our proposed method outperforms existing state-of-the-art adversarial defenses in adversarial accuracy while maintaining computational efficiency. The combination of proactive adversarial training and reactive hybrid defenses provides a comprehensive security framework that balances accuracy, robustness, and efficiency. Statistical significance tests confirm the reliability of the method, while generalization across multiple datasets highlights its scalability for real-world applications.

These findings underscore the importance of integrating multiple defense paradigms to create a scalable, adaptable, and computationally efficient adversarial defense strategy. By enhancing the security and reliability of deep learning models, this research contributes to safer AI deployments in critical fields such as healthcare, finance, cybersecurity, and autonomous systems. Future work will focus on expanding the methodology to larger-scale datasets, real-world adversarial attack scenarios, and additional security-sensitive domains to further improve model resilience against evolving adversarial threats.

Acknowledgement: The authors would like to extend their sincere gratitude to the reviewers for their insightful comments and constructive feedback, which have significantly improved the quality and clarity of this manuscript. Their detailed suggestions and critical observations helped us refine our methodology, strengthen the experimental analysis, and better articulate the novelty and impact of our work. We are deeply appreciative of their time and effort in reviewing this paper, as their contributions have been invaluable in shaping it into a more robust and comprehensive study.

Funding Statement: The authors did not receive any specific funding for this study.

Author Contributions: The authors confirm contribution to the paper as follows: Conceptualization: Eric Danso, Xuezhi Wen; methodology: Eric Danso; software: Eric Danso; validation: Eric Danso, Solomon Danso, Xuezhi Wen; formal analysis: Eric Danso, Solomon Danso; investigation: Eric Danso; resources: Xuezhi Wen; data curation: Eric Danso; writing—original draft preparation: Eric Danso; writing—review and editing: Eric Danso, Solomon Danso; visualization: Eric Danso; supervision: Xuezhi Wen; project administration: Xuezhi Wen. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The data will be made available upon request.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.

References

1. Khamaiseh SY, Bagagem D, Al-Alaj A, Mancino M, Alomari HW. Adversarial deep learning: a survey on adversarial attacks and defense mechanisms on image classification. IEEE Access. 2022;10(7):102266–91. doi:10.1109/ACCESS.2022.3208131.

2. Waghela H, Sen J, Rakshit S. Adversarial resilience in image classification: a hybrid approach to defense. In: 2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). Piscataway, NJ, USA: IEEE; 2024. p. 419–26.

3. Abbasi A, Javed ARR, Yasin A, Jalil Z, Kryvinska N, Tariq U. A large-scale benchmark dataset for anomaly detection and rare event classification for audio forensics. IEEE Access. 2022;10:38885–94. doi:10.1109/ACCESS.2022.3166602.

4. Radanliev P, Santos O. Adversarial attacks can deceive AI systems, leading to misclassification or incorrect decisions. Oxford, UK: University of Oxford; 2023.

5. Wang J, Zhang W, Huang Z, Li J. Boosting robustness in deep neuro-fuzzy systems: uncovering vulnerabilities, empirical insights, and a multi-attack defense mechanism. IEEE Trans Fuzzy Syst. 2025 Jan;33(1):255–66. doi:10.1109/TFUZZ.2024.3396845.

6. Peng H, Bao S, Li L. A survey of security protection methods for deep learning model. IEEE Trans Artif Intell. 2024;5(4):1533–53. doi:10.1109/TAI.2023.3314398.

7. Chou E, Tramer F, Pellegrino G. SentiNet: detecting localized universal attacks against deep learning systems. In: 2020 IEEE Security and Privacy Workshops (SPW). Piscataway, NJ, USA: IEEE; 2020. p. 48–54.

8. Lunghi D, Simitsis A, Bontempi G. Assessing adversarial attacks in real-world fraud detection. In: 2024 IEEE International Conference on Web Services (ICWS). Piscataway, NJ, USA: IEEE; 2024. p. 27–34.

9. Dong Y, Wang L, Li Z, Li H, Tang P, Hu C, et al. Safe driving adversarial trajectory can mislead: toward more stealthy adversarial attack against autonomous driving prediction module. ACM Trans Priv Secur. 2025;28(2):1–28. doi:10.1145/3705611.

10. Gandhi A, Jain S. Adversarial perturbations fool deepfake detectors. In: 2020 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ, USA: IEEE; 2020. p. 1–8.

11. Hui L, Bo Z, Linquan H, Jiabao G, Yifan L. FoolChecker: a platform to evaluate the robustness of images against adversarial attacks. Neurocomputing. 2020;412(10):216–25. doi:10.1016/j.neucom.2020.05.062.

12. Sun C, Xu C, Yao C, Liang S, Wu Y, Liang D, et al. Improving robust fairness via balance adversarial training. Proc AAAI Conf Artif Intell. 2023;37(12):15161–9. doi:10.1609/aaai.v37i12.26769.

13. Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, et al. Robust physical-world attacks on deep learning visual classification. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ, USA: IEEE; 2018. p. 1625–34.

14. Yuan X, He P, Zhu Q, Li X. Adversarial examples: attacks and defenses for deep learning. IEEE Trans Neural Netw Learn Syst. 2019;30(9):2805–24. doi:10.1109/TNNLS.2018.2886017.

15. Goyal S, Doddapaneni S, Khapra MM, Ravindran B. A survey of adversarial defenses and robustness in NLP. ACM Comput Surv. 2023;55(14s):1–39. doi:10.1145/3593042.

16. Biggio B, Roli F. Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognit. 2018;84(3):317–31. doi:10.1016/j.patcog.2018.07.023.

17. Zhao W, Alwidian S, Mahmoud QH. Adversarial training methods for deep learning: a systematic review. Algorithms. 2022;15(8):283. doi:10.3390/a15080283.

18. Hosseini S, Zade BMH. New hybrid method for attack detection using combination of evolutionary algorithms, SVM, and ANN. Comput Netw. 2020;173:107168. doi:10.1016/j.comnet.2020.107168.

19. Sharma A, Rani S, Driss M. Hybrid evolutionary machine learning model for advanced intrusion detection architecture for cyber threat identification. PLoS One. 2024;19(9):e0308206. doi:10.1371/journal.pone.0308206.

20. Sokkalingam S, Ramakrishnan R. An intelligent intrusion detection system for distributed denial of service attacks: a support vector machine with hybrid optimization algorithm based approach. Concurr Comput. 2022;34(27):1024. doi:10.1002/cpe.7334.

21. Muritala A, Ayokunle A, Oyewale O, Apaleokhai D. Enhancing cyber threat detection through real-time threat intelligence and adaptive defense mechanisms. Int J Comput Appl Technol Res. 2024;13(8):11–27. doi:10.7753/IJCATR1308.1002.

22. Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. 2017.

23. Waghela H, Sen J, Rakshit S. Robust image classification: defensive strategies against FGSM and PGD adversarial attacks. arXiv:2408.13274. 2024.

24. Wang J, Pan J, AlQerm I, Liu Y. Def-IDS: an ensemble defense mechanism against adversarial attacks for deep learning-based network intrusion detection. In: 2021 International Conference on Computer Communications and Networks (ICCCN). Piscataway, NJ, USA: IEEE; 2021. p. 1–9.

25. Dhanaraj RS, Sridevi M. Building a robust and efficient defensive system using hybrid adversarial attack. IEEE Trans Artif Intell. 2024;5(9):4470–8. doi:10.1109/TAI.2024.3384337.

26. Zhang H, Yu Y, Jiao J, Xing EP, El Ghaoui L, Jordan MI. Theoretically principled trade-off between robustness and accuracy. In: Proceedings of the 36th International Conference on Machine Learning. PMLR; 2019. Vol. 97, p. 7472–82.

27. Ryu G, Choi D. A hybrid adversarial training for deep learning model and denoising network resistant to adversarial examples. Appl Intell. 2023;53(8):9174–87. doi:10.1007/s10489-022-03991-6.

28. Barik K, Misra S, Fernandez-Sanz L. Adversarial attack detection framework based on optimized weighted conditional stepwise adversarial network. Int J Inf Secur. 2024;23(3):2353–76. doi:10.1007/s10207-024-00844-w.

29. Cisse M, Bojanowski P, Grave E, Dauphin Y, Usunier N. Parseval networks: improving robustness to adversarial examples. In: Proceedings of the 34th International Conference on Machine Learning. PMLR; 2017. Vol. 70, p. 854–63.

30. Xie C, Wu Y, Van Der Maaten L, Yuille A, He K. Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019 [cited 2025 Apr 14]. Available from: https://github.com/facebookresearch/.

31. Kamoi R, Kobayashi K. Why is the Mahalanobis distance effective for anomaly detection? arXiv:2003.00402. 2020.

32. Lee S, Park J-W, Kim D-S, Jeon I, Baek D-C. Anomaly detection of tripod shafts using modified Mahalanobis distance. J Mech Sci Technol. 2018;32(6):2473–8. doi:10.1007/s12206-018-0504-2.

33. Olateju O, Okon SU, Igwenagu U, Salami AA, Oladoyinbo TO, Olaniyi OO. Combating the challenges of false positives in AI-driven anomaly detection systems and enhancing data security in the cloud. SSRN Electron J. 2024.

34. Baniya BK, Rush T. Intelligent anomaly detection system based on ensemble and deep learning. In: 2024 26th International Conference on Advanced Communications Technology (ICACT). Piscataway, NJ, USA: IEEE; 2024. p. 137–42.

35. Pal M, Jati A, Peri R, Hsu C-C, AbdAlmageed W, Narayanan S. Adversarial defense for deep speaker recognition using hybrid adversarial training. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway, NJ, USA: IEEE; 2021. p. 6164–8.

36. Kariyappa S, Qureshi MK. Improving adversarial robustness of ensembles with diversity training. arXiv:1901.09981. 2019.

37. Yuan J, He Z. Ensemble generative cleaning with feedback loops for defending adversarial attacks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE; 2020. p. 581–90.

38. Li T, Shu X, Wu J, Zheng Q, Lv X, Xu J. Adaptive weighted ensemble clustering via kernel learning and local information preservation. Knowl Based Syst. 2024;294(3):111793. doi:10.1016/j.knosys.2024.111793.

39. Wang D, Li C, Wen S, Nepal S, Xiang Y. Defending against adversarial attack towards deep neural networks via collaborative multi-task training. IEEE Trans Dependable Secure Comput. 2022;19(2):953–65. doi:10.1109/TDSC.2020.3014390.

40. Park LH, Kim J, Oh MG, Park J, Kwon T. Adversarial feature alignment: balancing robustness and accuracy in deep learning via adversarial training. In: Proceedings of the 2024 Workshop on Artificial Intelligence and Security. New York, NY, USA: ACM; 2024. p. 101–12.

41. Ashraf SN, Siddiqi R, Farooq H. Auto encoder-based defense mechanism against popular adversarial attacks in deep learning. PLoS One. 2024;19(10):e0307363. doi:10.1371/journal.pone.0307363.

42. Roshan K, Zafar A, Ul Haque SB. Untargeted white-box adversarial attack with heuristic defence methods in real-time deep learning based network intrusion detection system. Comput Commun. 2024;218(4):97–113. doi:10.1016/j.comcom.2023.09.030.

43. Liu J, Di S, Zhao K, Jin S, Tao D, Liang X, et al. Exploring autoencoder-based error-bounded compression for scientific data. In: 2021 IEEE International Conference on Cluster Computing (CLUSTER). Piscataway, NJ, USA: IEEE; 2021. p. 294–306.

44. Cohen J, Rosenfeld E, Kolter JZ. Certified adversarial robustness via randomized smoothing. In: Proceedings of the 36th International Conference on Machine Learning. PMLR; 2019. Vol. 97, p. 1310–20.

45. He W, Wei J, Chen X, Carlini N, Song D. Adversarial example defenses: ensembles of weak defenses are not strong. In: 11th USENIX Workshop on Offensive Technologies (WOOT 17); 2017.

46. Kitada S, Iyatomi H. Attention meets perturbations: robust and interpretable attention with adversarial training. IEEE Access. 2021;9:92974–85. doi:10.1109/ACCESS.2021.3093456.

47. Rakin AS, Fan D. Defense-Net: defend against a wide range of adversarial attacks through adversarial detector. In: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). Piscataway, NJ, USA: IEEE; 2019. p. 332–7.

48. Ozbulak U, Van Messem A, De Neve W. Not all adversarial examples require a complex defense: identifying over-optimized adversarial examples with IQR-based logit thresholding. In: 2019 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ, USA: IEEE; 2019. p. 1–8.

49. An S, Lee MJ, So J. Improving robustness against adversarial example attacks using non-parametric models on MNIST. In: 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). Piscataway, NJ, USA: IEEE; 2020. p. 443–7.

50. McCarthy A, Ghadafi E, Andriotis P, Legg P. Functionality-preserving adversarial machine learning for robust classification in cybersecurity and intrusion detection domains: a survey. J Cybersecur Priv. 2022;2(1):154–90. doi:10.3390/jcp2010010.

51. Villegas-Ch W, Jaramillo-Alcázar A, Luján-Mora S. Evaluating the robustness of deep learning models against adversarial attacks: an analysis with FGSM, PGD and CW. Big Data Cogn Comput. 2024;8(1):8. doi:10.3390/bdcc8010008.

52. Liu Y, Cheng Y, Gao L, Liu X, Zhang Q, Song J. Practical evaluation of adversarial robustness via adaptive auto attack. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022. p. 15105–14.

53. Novac O-C, Chirodea MC, Novac CM, Bizon N, Oproescu M, Stan OP, et al. Analysis of the application efficiency of TensorFlow and PyTorch in convolutional neural network. Sensors. 2022;22(22):8872. doi:10.3390/s22228872.

54. Yuan J. Performance analysis of deep learning algorithms implemented using PyTorch in image recognition. Procedia Comput Sci. 2024;247(2):61–9. [cited 2024 Dec 16]. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1877050924028084.


Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.