EdgeTrustX: A Privacy-Aware Federated Transformer Framework for Scalable and Explainable IoT Threat Detection

Saleh Alharbi

doi:10.32604/cmc.2026.073584

icon Open Access

ARTICLE

EdgeTrustX: A Privacy-Aware Federated Transformer Framework for Scalable and Explainable IoT Threat Detection

Saleh Alharbi^*

Information Technology Department, College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia

* Corresponding Author: Saleh Alharbi. Email: email

(This article belongs to the Special Issue: Towards Privacy-preserving, Secure and Trustworthy AI-enabled Systems)

Computers, Materials & Continua 2026, 87(3), 15 https://doi.org/10.32604/cmc.2026.073584

Received 21 September 2025; Accepted 13 November 2025; Issue published 09 April 2026

Abstract

Real-time threat detection in Internet of Things (IoT) networks requires scalable, privacy-preserving, and interpretable models capable of operating under strict latency constraints. This paper presents EdgeTrustX, a privacy-aware federated transformer framework that addresses these challenges by combining transformer-based representation learning with federated optimisation, differential privacy, and homomorphic encryption. The framework enables collaborative model training across heterogeneous IoT devices without exposing sensitive local data while maintaining computational feasibility for edge deployment. A multi-head attention mechanism integrated with a secure aggregation protocol supports adaptive feature weighting and privacy-protected parameter exchange. To enhance transparency, an explainability module that combines attention visualisation and SHAP analysis provides interpretable insights into attack patterns and decision boundaries. Extensive experiments on four public IoT benchmark datasets—namely, IoT23, NBaIoT, UNSWNB15, and CICIDS2017—demonstrate that EdgeTrustX achieves an average detection accuracy of 94.7%, closely approaching the centralised transformer baseline of 95.3% while preserving strong privacy guarantees under a strict epsilon differential privacy budget of 0.1. The system reduces membership inference attack success to 52.1%, achieves a 23% improvement in scalability, and maintains an average per-round latency of 449.2 ms, confirming its suitability for real-time operation in large-scale edge networks. The main contributions include (1) a privacy-preserving federated transformer architecture for IoT threat detection, (2) a scalable differential privacy-driven secure aggregation protocol, (3) an explainable AI component enabling transparent threat analysis, and (4) a comprehensive empirical evaluation validating accuracy, scalability, privacy preservation, and interpretability in diverse IoT scenarios.

Keywords

Federated learning; transformer networks; IoT security; privacy preservation; explainable AI; threat detection; edge computing; differential privacy

1 Introduction

The Internet of Things (IoT) ecosystem is expanding at an unprecedented scale, with projections indicating more than 75 billion connected devices by 2025 [1]. This explosive growth has reshaped key sectors such as smart healthcare, industrial automation, and smart cities, offering improved efficiency, real-time monitoring, and enhanced decision-making capabilities [2]. However, the rapid increase in interconnected devices has also intensified the security challenges within IoT networks, resulting in a surge of complex and evolving cyber threats targeting distributed and resource-constrained systems [3].

Traditional centralized security architectures are insufficient for the IoT landscape due to bandwidth limitations, latency concerns, and privacy risks associated with transmitting sensitive data across networks [4]. Centralizing data also introduces a single point of failure, increasing vulnerability to large-scale attacks. Furthermore, the heterogeneity in device capabilities, communication standards, and power constraints necessitates lightweight security mechanisms that can operate effectively at the edge [5].

Fig. 1 presents a conceptual overview of the EdgeTrustX architecture, illustrating the hierarchical interaction between IoT devices, the privacy-preserving aggregation layer, and the global coordination unit. IoT devices act as local intelligence nodes, processing device-level telemetry through transformer-based models and transmitting differentially private and homomorphically encrypted model updates to the aggregation layer. The privacy-preserving layer securely aggregates these encrypted updates, while the global model coordination unit harmonizes the federated learning process. The explainability engine interprets attention weights and SHAP-based feature contributions to provide transparent threat-analysis feedback to edge devices.

images

Figure 1: Overview of the EdgeTrustX architecture illustrating the data flow among IoT devices, the privacy-preserving aggregation layer, and the global coordination unit. Each local transformer module processes device-level telemetry from the IoT ecosystem and threat-detection interface, encrypts model updates using differential privacy and homomorphic encryption, and transmits them to the aggregation layer. The global model coordination unit aggregates these encrypted updates, while the explainability engine interprets attention weights and SHAP values to provide transparent threat-analysis feedback to edge devices

Federated Learning (FL) has emerged as a promising approach to IoT security by enabling collaborative training without exposing raw device data, preserving locality and mitigating privacy concerns [6]. Existing FL-based intrusion detection frameworks, however, exhibit several limitations. These include insufficient modeling of complex threat patterns, vulnerability to advanced adversarial attacks, lack of integrated interpretability, and poor scalability in large heterogeneous IoT deployments [7]. Studies such as Abd Elaziz et al. [8], Albogami [9], and Khan et al. [10] have demonstrated the potential of FL for distributed threat detection, yet they still struggle with communication overhead, adversarial robustness, and convergence inefficiencies under non-IID data. Energy-efficient adaptations, such as those proposed by Karunamurthy et al. [11], address computational constraints but do not fully resolve scalability or explainability concerns.

Transformer architectures have shown remarkable success in domains such as natural language processing and computer vision due to their ability to learn long-range dependencies through self-attention mechanisms [12]. Their capability to model temporal and contextual correlations makes them suitable for detecting multi-stage and complex intrusion patterns in IoT traffic. Emerging IoT-oriented transformer solutions, such as the works of Alsharaiah et al. [13], Song and Ma [14], and Saghir et al. [15], have demonstrated improved detection accuracy and interpretability. However, deploying transformers in federated IoT environments remains challenging due to device-level computational constraints, communication limitations, and the need for formal privacy guarantees [16].

Explainability has become a critical requirement for IoT security operations, where analysts must understand model decisions to validate threat classifications and enforce timely countermeasures. SHAP-based and attention-visualization techniques have been successfully applied to enhance visibility in threat detection frameworks [17]. Nonetheless, current explainable models often function as post-hoc mechanisms rather than being embedded directly into federated optimization pipelines, raising concerns about potential information leakage through interpretability outputs [18].

The importance of privacy-preserving learning mechanisms has been emphasized in recent IoT and 5G security surveys, which highlight the necessity of maintaining confidentiality under large-scale, heterogeneous device environments [19]. Communication-efficient decentralized approaches such as DIGEST provide enhanced scalability through optimized local updates and reduced synchronization overhead [20]. In parallel, transformer-driven and large language model (LLM)-based intrusion detection systems have demonstrated 4%–6% improvements in detection accuracy by capturing long-range attack dependencies and threat semantics [21]. Additional studies focusing on cross-domain and heterogeneous IoT environments reinforce the value of attention-based deep networks for identifying high-impact anomalies [22,23].

Building on these advances, this paper introduces EdgeTrustX, a privacy-aware federated transformer architecture explicitly designed for scalable and explainable IoT threat detection. EdgeTrustX combines a transformer-based local model with differential privacy and homomorphic encryption to safeguard sensitive device data. The framework integrates explainability through SHAP-attention fusion, enabling transparent threat-analysis insights while maintaining end-to-end privacy. Through synergistic utilization of state-of-the-art federated learning, privacy-preserving computation, and explainable AI, EdgeTrustX addresses long-standing challenges related to security, privacy, scalability, and interpretability in distributed IoT environments.

The EdgeTrustX incorporates a multi-head attention mechanism, which has been adapted for use in a federated environment. In this environment, attention weights are estimated locally on individual devices and integrated with privacy-preserving protocols. It offers a higher level of privacy protection, utilising differential privacy for model updates and homomorphic encryption for secure aggregation. In addition, an explanatory module is proposed, which will incorporate both attention visualisation and SHAP-based feature importance analysis to enhance the general interpretability of the threat detection decisions.

The results of large-scale experiments on four real-world IoT datasets suggest that EdgeTrustX’s threat detection accuracy of 94.7% is higher than privacy-preserving baselines by up to 8% and even offers a 23% higher scale compared to centralised transformer-based models.

The present approaches, such as FedAvg and DP-Fed, offer extremely simple privacy but lack scalability for non-homogeneous IoT systems, whereas HE-Fed incurs a huge computational cost. Transformer-XAI is very precise but centralised, exposing sensitive information about equipment. To evade these weaknesses, EdgeTrustX combines transformer-based models with formal privacy guarantees, scaling upgrades, and elucidation.

Existing federated learning frameworks often struggle with communication inefficiency, privacy leakage, and reduced convergence when working with non-IID data. Transformer-based approaches, while powerful, exhibit high computational costs and lack interpretability in distributed IoT environments. These limitations underscore the need for a unified framework that strikes a balance between privacy, scalability, and explainability, motivating the development of EdgeTrustX.

The main contributions of this study are summarised as follows:

(1) Privacy-Aware Federated Transformer Architecture: A novel transformer-based neural network architecture is presented for federated IoT environments, incorporating privacy-preserving mechanisms at multiple levels while maintaining high detection accuracy.

(2) Scalable Privacy-Preserving Aggregation Protocol: A sophisticated aggregation protocol is designed by combining differential privacy with homomorphic encryption, providing formal privacy guarantees and ensuring scalability across heterogeneous IoT networks.

(3) Explainable Threat Detection Module: A comprehensive explainability framework is integrated, combining attention visualisation, SHAP analysis, and feature-importance ranking to produce interpretable insights into threat detection decisions.

(4) Comprehensive Evaluation and Analysis: Extensive experiments are conducted on real-world IoT datasets, demonstrating superior performance in terms of accuracy, privacy preservation, scalability, and explainability compared with existing approaches.

The proposed framework is particularly relevant to industrial IoT, smart healthcare, and critical infrastructure monitoring, where privacy-preserving and interpretable threat detection is vital for operational trust and regulatory compliance.

The remainder of this paper is organised as follows: Section 2 provides a comprehensive review of related work in federated learning, transformer networks, and IoT security. Section 3 presents the EdgeTrustX framework, including the problem formulation, architecture design, and algorithmic details. Section 4 presents experimental results and comparative analysis. Finally, Section 6 concludes the paper and discusses future research directions.

2 Literature Review

2.1 IoT Security Landscape and Emerging Threats

The Internet of Things (IoT) ecosystem continues to expand rapidly, with billions of interconnected devices deployed across healthcare, transportation, industrial automation, smart grids, and consumer environments. This rapid proliferation amplifies the attack surface and exposes IoT systems to a growing landscape of security and privacy threats. Recent studies highlight that the distributed architecture, device heterogeneity, limited computational capability, and lack of standardized security protocols contribute to widespread vulnerabilities in IoT deployments [12]. Traditional centralized threat detection mechanisms are unsuitable for modern IoT environments due to latency constraints, communication overhead, and privacy concerns arising from centralized data aggregation [19]. Consequently, emerging security solutions must ensure accuracy, low latency, scalability, privacy preservation, and explainability to secure next-generation IoT ecosystems [20].

2.2 Federated Learning for Distributed IoT Threat Detection

Federated Learning (FL) provides a decentralized learning mechanism that preserves data locality and secures privacy by enabling collaborative model training across heterogeneous devices without transferring raw data [4]. In FL, IoT devices compute local updates and share encrypted model parameters with a central aggregator, which synthesizes a global model. This approach is particularly effective for sensitive domains such as medical IoT and industrial IoT security.

Several recent works demonstrate the potential of FL while revealing important limitations. Abd Elaziz et al. [1] introduced a TabTransformer-driven federated intrusion detection framework optimized via nature-inspired hyperparameter search, but it suffered from high communication overhead caused by frequent synchronization. Albogami [2] implemented a deep FL model with integrated privacy mechanisms suited for edge-IoT systems; however, its resilience against adversarial attacks was limited. Khan et al. [4] proposed a reinforcement-based federated fusion model that improved convergence speed and robustness under non-IID data settings—an intrinsic challenge in real-world IoT networks. Karunamurthy et al. [8] focused on reducing energy consumption by optimizing computation and communication loads on low-power IoT devices.

Among the core FL paradigms, horizontal federated learning remains the most suitable for IoT security scenarios because feature spaces are typically consistent across IoT nodes while datasets vary by device. This minimizes alignment complexity, reduces aggregation overhead, and enhances convergence efficiency. Therefore, EdgeTrustX employs horizontal FL to ensure performance scalability across diverse IoT architectures.

2.3 Transformer-Based Architectures for Threat Analysis

Transformers have demonstrated superior capability in sequence modeling and capturing long-range dependencies through multi-head self-attention mechanisms. These characteristics make them highly effective for analyzing complex network traffic and identifying multi-stage attack patterns [11].

Alsharaiah et al. [9] developed an explainable transformer model for spoofing attack detection in IoMT systems, employing SHAP-based interpretability to highlight critical features. Song and Ma [6] proposed a federated attention-based neural network that integrates attention mechanisms with federated optimization strategies, resulting in improved detection accuracy and reduced communication cost. Saghir et al. [11] employed transformer-based anomaly detection with explainable SHAP-driven transparency to interpret threat classification.

These studies validate that transformers can effectively capture complex threat semantics; however, integrating transformer architectures into federated environments requires solutions that address privacy, resource limitations, and real-time communication efficiency. Practical deployment also necessitates combining explainability mechanisms with robust privacy preservation.

2.4 Privacy Preservation in Federated IoT Systems

With IoT applications dealing with highly sensitive information—including health records, environmental telemetry, and industrial operational data—preserving privacy during model training is non-negotiable. Two major privacy-preserving technologies are widely studied: Differential Privacy (DP) and Homomorphic Encryption (HE).

Yuan et al. [21] provided a comprehensive survey on approximate homomorphic encryption for privacy-preserving machine learning, highlighting the trade-offs between computational cost and security strength. Han [20] introduced a blockchain-enhanced privacy-preserving FL system, where HE and optimization techniques jointly secure IoT communication and aggregation processes. Gholami and Seferoglu [13] proposed DIGEST, a decentralized learning method that enhances communication efficiency using local-update strategies, contributing to reduced aggregation overhead in privacy-sensitive environments.

Although both HE and DP offer strong privacy protection, integrating them simultaneously presents challenges in terms of computational efficiency, communication overhead, and model convergence. As highlighted in contemporary IoT FL systems [15–18], HE can slow down gradient exchange due to encryption complexity, while DP may reduce model utility when noise levels are high. Achieving a balanced integration remains a key research challenge that EdgeTrustX aims to address.

2.5 Explainable Artificial Intelligence for Security Operations

Explainable AI (XAI) enables visibility into threat detection decision-making processes, significantly improving trust, analyst interpretability, and operational transparency in IoT security systems. SHAP analysis and attention visualizations are the most widely adopted techniques for explaining the behavior of deep learning models.

Alabbadi and Bajaber [14] used SHAP-based interpretability in an IoT intrusion detection system, improving analyst situational awareness. Rampone et al. [19] integrated explainability features into a hybrid FL framework to support distributed, privacy-preserving intrusion detection with transparent anomaly interpretation. In addition, Alsharaiah et al. [9] and Saghir et al. [11] demonstrated successful use of attention-driven models that highlight significant features involved in threat prediction.

Despite these contributions, current XAI techniques often operate as post-hoc add-ons rather than being architecturally integrated. Furthermore, explanation outputs may expose sensitive information, posing a risk of privacy leakage in federated environments—an issue that remains largely unaddressed across existing literature. Achieving simultaneous privacy preservation and meaningful interpretability is therefore a crucial research necessity.

2.6 Research Gaps and Motivations

Despite advancements across IoT security, federated learning, transformers, and explainable AI, several key gaps remain unaddressed:

1. Underutilization of transformer architectures in federated IoT environments: Existing studies lack a unified approach to leveraging self-attention for distributed sequence modeling under privacy constraints.

2. Insufficient formal privacy guarantees: Most frameworks fail to provide differential privacy proofs or effective defenses against advanced threats such as gradient inversion or membership inference.

3. Lack of integrated explainability mechanisms: Current approaches apply XAI as post-processing rather than embedding it directly within federated optimization pipelines, leaving risks of information leakage.

4. Limited scalability evaluation: Most FL-based IDS solutions are tested on small-scale IoT networks with limited device diversity, overlooking performance variability in large heterogeneous deployments.

EdgeTrustX addresses these gaps by integrating transformer-based local models with homomorphic encryption and formal differential privacy guarantees, alongside a SHAP-attention interpretability module. The framework is evaluated across diverse IoT datasets and large-scale network configurations to ensure robustness, scalability, and transparency in practical deployments.

Table 1 summarises findings derived from an extensive review of more than 20 recent research contributions (2022–2025) on IoT threat detection, privacy-preserving federated learning, and explainable AI frameworks. All comparative insights are synthesised from empirical results and methodologies reported in the referenced works.

images

As shown in Table 1, most existing frameworks fail to simultaneously achieve high detection accuracy, privacy guarantees, explainability, and scalability. EdgeTrustX addresses these gaps by combining a transformer-based architecture with fully homomorphic encryption and differential privacy while incorporating SHAP-driven explainability for interpretable threat analytics.

3 Methodology

The section contains the detailed methodology behind EdgeTrustX, such as the formulated problem, system architecture, mathematical basis, and algorithm implementations. The first part outlines the problem definition and describes each component of the proposed framework.

3.1 Problem Formulation

Consider a federated IoT network consisting of N heterogeneous devices D={D1,D2,…,DN}, where each device Di maintains a local dataset Xi={(xi,j,yi,j)}j=1ni. Here, xi,j∈Rd represents the j-th feature vector of dimension d on the device Di, and yi,j∈{0,1} represents the corresponding binary label indicating whether the sample represents a threat (1) or benign behaviour (0). The total number of samples on the device Di is ni.

The objective of EdgeTrustX is to learn a global threat detection model fθ:Rd→[0,1] parameterised by θ that minimises the global loss function:

Lglobal(θ)=∑i=1NnintotalLi(θ)(1)

here, ntotal=∑i=1Nni is the total number of data samples, ensuring proportional contribution from each client. For binary classification tasks, the local loss Li(θ) is computed using the cross-entropy function:

Li(θ)=1ni∑j=1niℓ(fθ(xi,j),yi,j)(2)

The learning process must satisfy the following constraints:

Privacy Constraint: To ensure formal privacy guarantees, EdgeTrustX adopts (ϵ,δ)-differential privacy. Two datasets X and X′ are defined as adjacent if they differ by at most one record. The randomised mechanism 𝒜 satisfies (ϵ,δ)-DP if:

Pr[𝒜(X)∈S]≤eϵPr[𝒜(X′)∈S]+δ(3)

here, ϵ controls the privacy budget: smaller ϵ enforces stronger privacy. The parameter δ captures the probability of privacy leakage. In EdgeTrustX, Gaussian noise N(0,σ2) is added to gradients before secure aggregation to satisfy this constraint.

Communication Constraint: To optimise communication efficiency, a constraint is defined on the total transmitted model updates across T training rounds and N clients. Instead of using the ℓ0-norm, which counts non-zero parameters, the ℓ2-norm is adopted to measure the magnitude of updates Δθi(t), making the communication cost differentiable and better aligned with federated optimisation practices:

Ccomm=∑t=1T∑i=1N∥Δθi(t)∥2≤Cmax,(4)

here, ∥Δθi(t)∥2 measures the ℓ2-norm of local parameter updates sent by client i during round t, and Cmax represents the maximum allowable communication budget.

Explainability Constraint: The model must provide interpretable explanations for its decisions through attention weights and feature importance scores.

3.2 EdgeTrustX Architecture

Fig. 2 illustrates the comprehensive architecture of EdgeTrustX, which consists of five main components: (1) Local Transformer Modules, (2) Privacy-Preserving Aggregation Layer, (3) Global Model Coordination, (4) Explainability Engine, and (5) Threat Detection Interface.

images

Figure 2: Overall architecture of EdgeTrustX framework showing the integration of federated learning, transformer networks, privacy preservation, and explainability components

3.2.1 Local Transformer Module

Each IoT device Di implements a local transformer module that processes device-specific data while maintaining privacy. The transformer architecture consists of multi-head self-attention layers followed by position-wise feed-forward networks.

The multi-head attention mechanism is defined as:

MultiHead(Q,K,V)=Concat(head1,…,headh)WO(5)

where each attention head is computed as:

headi=Attention(QWiQ,KWiK,VWiV)(6)

the scaled dot-product attention is defined as:

Attention(Q,K,V)=softmax(QKTdk)V(7)

to preserve input-level privacy during multi-head self-attention computation, Gaussian noise is injected into the attention logits. This prevents adversaries from reconstructing sensitive information from intermediate attention maps while maintaining accurate global representations.

A~i,j=softmax(qikjTdk+N(0,σ2))(8)

here, qi and kj represent the query and key vectors, dk is the key dimension, and N(0,σ2) denotes Gaussian noise with variance σ2 calibrated to the privacy budget ϵ. This mechanism provides formal differential privacy guarantees while preventing leakage through attention visualization.

3.2.2 Privacy-Preserving Aggregation Protocol

The aggregation protocol combines differential privacy with secure multi-party computation to ensure privacy while maintaining model utility. The protocol consists of three phases: noise addition, secure aggregation, and model update.

Phase 1: Noise Addition. Each device Di adds calibrated Gaussian noise to its local model updates:

θ~i(t)=θi(t)+N(0,σ2I)(9)

where the noise variance is calibrated according to the differential privacy requirements:

σ2=2log⁡(1.25/δ)C2ϵ2(10)

here, C represents the clipping bound for gradient updates.

Phase 2: Secure Aggregation. The noisy updates are aggregated using homomorphic encryption. Each device encrypts its update using a shared public key:

E(θ~i(t))=Encpk(θ~i(t))(11)

the server performs homomorphic aggregation:

E(θ(t+1))=1N∑i=1NE(θ~i(t))(12)

Phase 3: Model Update. The aggregated model is decrypted and distributed to all participating devices:

θ(t+1)=Decsk(E(θ(t+1)))(13)

Conceptual Summary of the Privacy-Preserving Aggregation Workflow: The privacy-preserving aggregation process of EdgeTrustX operates in three sequential stages to ensure secure model coordination across distributed IoT devices.

(1) Noise Addition: Each client locally perturbs its gradient updates by injecting calibrated Gaussian noise according to differential-privacy parameters (ε,δ). This step guarantees that the contribution of any single data sample remains indistinguishable, protecting sensitive device-level information.

(2) Secure Aggregation: The perturbed gradients are then encrypted using the CKKS homomorphic encryption scheme, enabling the server to perform summation directly on ciphertexts without accessing the raw gradients. This preserves both confidentiality and computational integrity.

(3) The encrypted global aggregate is decrypted using the private key and redistributed to clients for the next training round, completing a secure and privacy-aware learning cycle.

Fig. 3 visually summarises these three phases, depicting the interaction between IoT devices, the privacy-preserving aggregation layer, and the global coordination server.

images

Figure 3: Conceptual flow of the EdgeTrustX privacy-preserving aggregation protocol showing differential-privacy noise addition, homomorphic-encryption-based aggregation, and secure global model update

3.2.3 Explainability Engine

The explainability engine provides multi-faceted interpretability through three complementary approaches: attention visualisation, SHAP analysis, and feature importance ranking.

Attention-Based Explanations: The attention weights from the transformer provide insight into which input features or time steps are most relevant for the decision:

Importance(xj)=∑h=1H∑i=1Lαh,i,j(l)(14)

where αh,i,j(l) represents the attention weight for feature j in head h of layer l.

SHAP-Based Explanations: SHAP values are computed to provide feature-level explanations:

ϕj=∑S⊆F\{j}|S|!(|F|−|S|−1)!|F|![f(S∪{j})−f(S)](15)

where F is the set of all features and S is a subset of features.

Integrated Explainability Score: A combination of attention and SHAP-based explanations to provide a comprehensive interpretability score:

ExplainScore(xj)=α⋅Importance(xj)+(1−α)⋅|ϕj|(16)

where α∈[0,1] is a weighting parameter.

3.3 Training Algorithm

Algorithm 1 presents the complete EdgeTrustX training procedure, which integrates federated learning with privacy preservation and explainability.

images

Data Distribution and Client Heterogeneity Modelling: To reflect the statistical heterogeneity typically observed in large-scale IoT networks, the local training data for each client was organised under both independent and non-identically distributed (non-IID) conditions. The non-IID scenario was simulated using a Dirichlet allocation strategy with a concentration parameter of α = 0.5, resulting in variable class proportions and sample volumes per client. This design captures the inherent diversity in device behaviour and traffic composition across heterogeneous IoT environments. The IID configuration, in contrast, ensured class-balanced random sampling, serving as a controlled benchmark. All stochastic partitions were generated with a fixed random seed (42) to preserve experimental reproducibility.

3.4 Rigorous Privacy and Utility Analysis

We formalise the privacy guarantees of EdgeTrustX and derive an explicit optimisation/utility bound under standard smoothness/convexity assumptions.

3.4.1 Notation and Assumptions

Two datasets X and X′ are adjacent if they differ in at most one record. Let Δ2(g) denote the ℓ2-sensitivity of a (vector-valued) function g. Let N(0,σ2I) denote isotropic Gaussian noise.

We assume throughout:

• Per-example gradients are clipped: for all examples z, ∥∇ℓ(θ;z)∥2≤C; client update deltas are clipped to norm ≤C before noise.

• For the attention-logit function h(⋅) computed locally (pre-softmax), per-example logits are norm-bounded: ∥h(z)∥2≤B (enforced by per-example logit clipping); hence Δ2(h)≤2B.

• Loss L(θ) is L-smooth; for the convergence bound, an additional assumption of μ-strong convexity (0<μ≤L).

3.4.2 Per-Round Differential Privacy

Theorem 1 (Per-Mechanism DP via Gaussian Mechanism): Let g be a function with ℓ2-sensitivity Δ2(g). For any ϵ0∈(0,1] and δ0∈(0,1), the mechanism

M(x)=g(x)+N(0,σ2I)with σ≥2ln⁡(1.25δ0)Δ2(g)ϵ0(17)

is (ϵ0,δ0)-differentially private.

Proof: This is the classical Gaussian mechanism; see, e.g., Dwork–Roth (2014, Thm 3.22). □

Corollary 1 (Per-Round DP for EdgeTrustX components): In each communication round t:

1. Attention-logit privatisation (Eq. (8)): with logit clipping enforcing Δ2(h)≤2B and noise std σatt≥2ln(1.25/δatt)(2B)/ϵatt, the attention-logit release is (ϵatt,δatt)-DP.

2. Noisy client updates (Eqs. (9) and (10)): with clipping bound C so that Δ2(Δθi)≤2C, and σupd≥2ln(1.25/δupd)(2C)/ϵupd, each client’s transmitted update is (ϵupd,δupd)-DP.

Proof: Apply Theorem 1 with Δ2(h)≤2B and Δ2(Δθi)≤2C. □

The differential privacy parameters are configured as ε = 0.1 and δ = 1 × 10−5 to achieve a rigorous balance between privacy and utility. These values ensure that the probability of an adversary inferring any single sample’s contribution remains negligible while maintaining convergence stability during federated optimisation. Empirical studies in privacy-preserving learning have shown that ε≤0.1 achieves strong protection against gradient inversion without compromising model accuracy by more than 1%–2%, justifying the selected configuration for EdgeTrustX.

3.4.3 Composition across Mechanisms and Rounds

Theorem 2 (Advanced Composition across Rounds): Suppose each round applies two mechanisms as in Cor. 1, each satisfying (ϵ0,δ0)-DP with ϵ0∈(0,1]. Across T rounds, the total mechanism is (ϵT,δT)-DP with ϵT≤

2(2T)ln⁡(1δ^) ϵ0+(2T)ϵ02,δT=(2T)δ0+δ^,(18)

for any δ^∈(0,1).

Proof: Use the advanced composition theorem (e.g., Dwork–Roth 2014, Thm 3.20) on 2T sequential mechanisms with identical (ϵ0,δ0), then collect δ terms. □

Remark (Moments Accountant/RDP Alternative).

Tighter bounds are obtainable via Rényi DP (RDP) or the moments accountant. If each mechanism is Gaussian with variance σ2, then for order α>1, εRDP(α)=α/(2σ2) (no subsampling). Over 2T compositions: εRDP,T(α)=(2T)α/(2σ2), and conversion to (ϵ,δ) gives ϵ(δ)=minα>1{εRDP,T(α)+ln⁡(1/δ)α−1}. This can be reported alongside the advanced-composition bound.

3.4.4 Effect of Secure Aggregation (HE) and Post-Processing

Theorem 3 (HE Aggregation Preserves DP by Post-Processing): Let each client transmit a DP-sanitised vector xi=Mi(Di) where Mi is (ϵT,δT)-DP w.r.t. Di. Let Enc/Dec be any (possibly randomised) encryption/decryption maps independent of Di, and let the server compute y=Dec(∑ iEnc(xi)). Then the distribution of y is also (ϵT,δT)-DP with respect to each Di.

Proof. Differential privacy is closed under post-processing: for any (possibly randomised) function F independent of the underlying data, F(M(D)) is (ϵ,δ)-DP if M is. Encryption, homomorphic addition, and decryption are (data-independent) mappings applied to DP outputs xi; hence, y preserves the same (ϵT,δT). □

Corollary 2 (End-to-End DP for EdgeTrustX): Under (A1)–(A2), with per-round parameters set as in Corollary 1 and composition as in Theorem 2, the global model sequence {θ(t)}t=1T released to clients is (ϵT,δT)-DP with respect to every individual record across all devices.

Proof: Combine Corollary 1, Theorem 2, and Theorem 3; the server’s decryption and model update are post-processing steps. □

3.4.5 Utility/Convergence under DP Noise

We analyse FedAvg with clipping and Gaussian noise in the strongly convex case.

Lemma 1 (Noise on Averaged Update): Let each of N clients send a clipped update with added noise ξi∼N(0,σupd2I). The server forms the average u↼=1N∑i=1N(Δθi+ξi). Then E[u↼]=1N∑iΔθi and Var(u↼)=σupd2NI.

Proof: Linearity of expectation; independence of ξi gives variance reduction by N. □

Theorem 4 (Utility Bound for DP-FedAvg (Strongly Convex)): Assume (A1) and (A3). Consider global updates θ(t+1)=θ(t)−ηg↼(t), where g↼(t) is the clipped, noisy average gradient proxy from Lemma 1 with variance σeff2:=σupd2/N per coordinate. Choose step size 0<η≤1/L. Then for all T≥1,

E[L(θ(T))−L⋆]≤(1−ημ)T(L(θ(0))−L⋆)+ηLdσeff22μ+ηLζ22μ,(19)

where d is the parameter dimension and ζ2 collects the (bounded) variance/bias from stochastic gradients and clipping (with ∥∇ℓ(θ;z)∥≤C giving ζ2≤C2).

Proof: By L-smoothness and μ-strong convexity (standard descent lemma),

L(θ+)≤L(θ)−η⟨∇L(θ),g↼⟩+Lη22∥g↼∥2.(20)

Take the conditional expectation given θ. Write g↼=∇L(θ)+ϵ, where ϵ has zero mean and covariance bounded by Σ≼σeff2I+ζ2I (DP noise + stochastic/clipping noise). Then E∥g↼∥2≤∥∇L(θ)∥2+d(σeff2+ζ2). With η≤1/L and strong convexity (∥∇L(θ)∥2≥2μ(L(θ)−L⋆)), it yields:

E[L(θ+)−L⋆]≤(1−ημ)(L(θ)−L⋆)+ηL2dσeff2+ηL2ζ2.(21)

Unroll the recursion for T steps and sum the geometric tail to get the stated bound. □

Corollary 3 (Explicit DP–Utility Tradeoff): Using the Gaussian calibration σupd=2ln(1.25/δ0)2Cϵ0 per round (Cor. 1), the effective noise term in Thm. 4 is σeff2=8C2ln(1.25/δ0)Nϵ02. Thus, the DP-induced steady-state error contribution is:

ηLd2μσeff2=4ηLdμ⋅C2ln⁡(1.25δ0)Nϵ02(22)

for fixed (ϵ0,δ0) and N, tighter privacy (smaller ϵ0) increases this term as 1/ϵ02, while more devices decrease it as 1/N.

Gradient Clipping and Convergence Assumptions: to ensure numerical stability during privacy-preserving training, gradient clipping was applied with an upper bound of ‖g‖2≤1.0, thereby limiting the sensitivity of model updates before the injection of differential privacy noise. This constraint controls the variance of the Gaussian noise and prevents gradient explosion in non-IID client distributions. Convergence was empirically assumed when the validation loss improvement remained below 1% over five consecutive communication rounds, a condition verified across all experimental configurations. These assumptions provide a reproducible basis for analysing training dynamics under privacy and encryption constraints.

3.4.6 Computational Complexity

The computational complexity of EdgeTrustX per round is analysed as follows:

Local Computation: Each device performs transformer computation with complexity O(nid2+ni2d) for attention computation and O(nid2) for feed-forward layers.

Communication Complexity: The communication cost per round is O(Nd) for encrypted model updates.

Aggregation Complexity: The server aggregation complexity is O(Nd) for homomorphic operations.

The total complexity per round is:

Ctotal=O(∑i=1N(nid2+ni2d)+Nd)(23)

Incorporating DP and HE introduces additional computational overhead due to noise injection and encryption. However, this cost is offset by reduced communication frequency and parameter sharing, leading to an overall 23% reduction in transmission volume while maintaining <0.5 s per round latency.

4 Results and Discussion

This section presents a comprehensive experimental evaluation of EdgeTrustX across multiple dimensions: threat detection performance, privacy preservation effectiveness, scalability analysis, and explainability assessment. Experiments are conducted on real-world IoT datasets and compare state-of-the-art baselines.

4.1 Experimental Setup

4.1.1 Datasets

EdgeTrustX is evaluated on four widely used IoT security datasets, with explicit links to the datasets provided to ensure reproducibility.

(1) IoT-23 Dataset1: Contains network traffic from 20 IoT devices with various malware families, including 325,307 benign flows and 129,274 malicious flows.

(2) N-BaIoT Dataset2: Comprises network traffic from 9 commercial IoT devices infected with Mirai and BASHLITE botnets, totalling 7,062,606 instances.

(3) UNSW-NB15 Dataset3: Contains 2,540,044 records with nine types of attacks, including Exploits, Reconnaissance, DoS, and Generic attacks.

(4) CIC-IDS2017 Dataset4: Includes 2,830,743 network flow records capturing normal traffic and contemporary attack scenarios.

Each dataset was partitioned across 20 clients under both IID and non-IID conditions. For IID, samples were randomly distributed, ensuring class balance; for non-IID, the Dirichlet distribution (α = 0.5) is adopted to emulate heterogeneity in client data sizes (ranging from 1 K to 80 K samples). A random seed of 42 was fixed for all splits to ensure reproducibility. Unless specified otherwise, reported results correspond to non-IID configurations, confirming the robustness of EdgeTrustX under heterogeneous data distributions.

Table 2 summarises the primary hyperparameters used for training EdgeTrustX and the comparison baselines to ensure experimental reproducibility.

images

4.1.2 Baseline Methods

We compare EdgeTrustX with the following state-of-the-art approaches:

FedAvg-CNN: Standard federated averaging using a deep convolutional architecture, represented by the deep federated learning approach in Albogami’s model [2].

FedProx-LSTM: Federated proximal variant implemented with an LSTM-based intrusion detection framework as demonstrated in Sorour et al.’s privacy-preserving LSTM-JSO federated model [15].

DP-Fed: Differential-privacy–enhanced federated learning following the DP-integrated FL mechanisms adopted in Albogami’s work [2].

HE-Fed: Homomorphic-encryption–based federated learning consistent with the blockchain-assisted HE-enabled FL system proposed by Han [20].

Centralised-Transformer: Centralised transformer-based intrusion detection system without privacy guarantees, represented by Alsharaiah et al.’s explainable transformer-powered IoMT threat detection model [9].

4.1.3 Implementation Details

EdgeTrustX is implemented in PyTorch with the following configuration: Transformer layers: 6 encoder layers with eight attention heads; Hidden dimension: 512; Feed-forward dimension: 2048; Learning rate: 0.001 with Adam optimiser; Batch size: 32; Local epochs: 5; Privacy parameters: ϵ=0.1, δ=10−5; Communication rounds: 100.

The experiments are conducted on a distributed testbed consisting of 20 edge devices (Raspberry Pi 4) and a central server (Intel Xeon E5-2690).

4.2 Threat Detection Performance

Table 3 presents a comprehensive comparison of threat detection performance across all datasets and methods.

images

EdgeTrustX achieves superior performance with an average accuracy of 94.7% across all datasets, representing an improvement of approximately 5%–8% over privacy-preserving federated learning baselines. The framework maintains performance close to that of the centralised transformer (95.3%) while providing strong privacy guarantees.

Fig. 4 provides a three-part analysis of EdgeTrustX’s effectiveness. Fig. 4a breaks down power consumption by components across IoT device types, with Jetson Nano consuming the most. Fig. 4b shows method-wise comparison on Raspberry Pi 4, where EdgeTrustX offers the best energy efficiency and battery life. Fig. 4c evaluates the energy-accuracy-privacy trade-off, plotting detection accuracy against energy, with bubble size indicating privacy score. EdgeTrustX achieves the optimal point, balancing all three axes. This figure validates that EdgeTrustX is energy-efficient, privacy-preserving, and accurate for resource-constrained IoT deployment compared to other federated learning baselines.

images

Figure 4: Detailed performance comparison showing accuracy, precision, recall, and F1-score across different datasets and methods

Attack Type-Specific Analysis

Table 4 presents the detection performance for different attack types, demonstrating EdgeTrustX’s effectiveness across various threat categories.

images

The results indicate that EdgeTrustX maintains robust performance across all attack types, achieving high F1-scores for botnet detection (0.961) and DDoS detection (0.955).

4.3 Privacy Preservation Analysis

4.3.1 Differential Privacy Evaluation

We evaluate the privacy preservation capabilities of EdgeTrustX through membership inference attacks and reconstruction attacks. Fig. 5 illustrates the privacy-utility trade-off under different privacy budgets.

images

Figure 5: Privacy-utility trade-off analysis showing model accuracy vs. privacy budget (ϵ) for different privacy mechanisms

Fig. 5 examines privacy-utility trade-offs in federated IoT learning. Fig. 5a illustrates the inverse relationship between detection accuracy and privacy budget (ϵ), with EdgeTrustX maintaining high accuracy even at low ϵ values. Fig. 5b compares effectiveness scores of privacy mechanisms (DP, HE, secure aggregation, noise addition), where EdgeTrustX outperforms others across all categories. Fig. 5c visualises the privacy-utility Pareto landscape. EdgeTrustX resides in the optimal zone, offering high privacy and utility simultaneously. The figure confirms EdgeTrustX’s strong privacy preservation with minimal performance compromise, outperforming traditional DP-Fed, HE-Fed, and centralised models.

Table 5 presents the quantitative results of the privacy evaluation using standard privacy attack methods.

images

EdgeTrustX achieves the best privacy score (0.923) with minimal utility loss (0.6%), demonstrating the effectiveness of the implemented privacy-preserving mechanisms.

The evaluation was conducted under a semi-honest adversarial model where the attacker observes exchanged gradients during federated rounds. Membership inference was performed using multiple shadow models trained on disjoint subsets of the same dataset. At the same time, gradient reconstruction employed an inversion strategy optimised with the Adam algorithm (learning rate = 0.001, 2000 iterations). All attack parameters, data partitions, and optimiser settings were standardised across experiments to maintain comparability. The privacy score reported in Table 5 is a normalised indicator combining membership-inference resistance, reconstruction difficulty, and utility retention, where higher values denote stronger privacy preservation. Detailed configurations and scripts used for this evaluation are provided in the supplementary material to support independent verification.

The strong performance of EdgeTrustX at low privacy budgets (ε ≤ 0.1) can be attributed to the architectural integration of transformer-based self-attention layers with federated aggregation regularisation. The self-attention mechanism enables contextual feature weighting, allowing the model to maintain discriminative representations even when privacy noise is injected into gradients. Furthermore, the multi-head attention structure distributes information learning across multiple subspaces, thereby mitigating the degradation typically caused by differential privacy perturbations. The use of layer normalisation and residual connections additionally stabilises training under high noise variance, preserving convergence and ensuring minimal accuracy loss. Collectively, these architectural properties enable EdgeTrustX to maintain an accuracy of up to 94.7% while adhering to stringent privacy guarantees, underscoring its robustness against privacy-utility trade-offs.

4.3.2 Homomorphic Encryption Efficiency

We analyse the computational overhead of homomorphic encryption operations in EdgeTrustX. Fig. 6 shows the encryption, computation, and decryption times for different model sizes.

images

Figure 6: Homomorphic encryption efficiency analysis showing computational overhead for different operations and model sizes

Fig. 6 evaluates the efficiency of homomorphic encryption (HE) in EdgeTrustX. Fig. 6a shows computation time scaling with model size, where EdgeTrustX outperforms baseline HE implementations. Fig. 6b plots memory usage, demonstrating the impact of encryption on model overhead. Fig. 6c analyses throughput versus device count, revealing that EdgeTrustX maintains linear scalability while standard HE-fed approaches degrade rapidly beyond 100 devices. These insights highlight EdgeTrustX’s optimisation for resource-constrained environments, making it scalable and efficient even under cryptographic load, validating its practical feasibility for large-scale IoT deployments with privacy guarantees.

4.4 Scalability Analysis

4.4.1 Device Scalability

We evaluate EdgeTrustX’s scalability by varying the number of participating devices from 10 to 200. Fig. 7 presents the scalability analysis results.

images

Figure 7: Scalability analysis showing convergence time, communication overhead, and model accuracy as functions of the number of participating devices

Fig. 7 assesses EdgeTrustX scalability across varying IoT network sizes. Fig. 7a presents convergence time contours as device count increases, showing that EdgeTrustX remains efficient up to 1000 nodes. Fig. 7b uses a forest plot to compare performance (in terms of memory, convergence, efficiency, and accuracy) across 200 different device setups. Fig. 7c analyzes convergence time vs. efficiency trade-off, with EdgeTrustX sustaining low latency and high efficiency, even as the number of devices scales. This demonstrates the framework’s robustness and adaptability under heavy load, reinforcing its suitability for diverse, large-scale federated IoT networks.

Although Table 6 presents empirical scalability results up to 200 physical IoT devices, an additional large-scale simulation was conducted to examine performance under extended network sizes. Using a federated simulation environment developed with PyTorch and Flower, up to 1000 virtual clients were modelled to emulate the communication latency, aggregation delay, and bandwidth heterogeneity typical of large IoT deployments. The simulated results indicated that EdgeTrustX maintained an average convergence efficiency above 78% and exhibited less than 18% accuracy degradation compared with the 200-device configuration, while communication overhead scaled linearly with client participation. These findings substantiate the claim that EdgeTrustX can remain computationally viable and communication-efficient when scaled toward 1000 nodes under realistic distributed network conditions. However, comprehensive hardware-based validation beyond 200 devices is reserved for future work.

images

4.4.2 Communication Efficiency

We analyse the communication efficiency of EdgeTrustX compared to baseline methods. Fig. 8 illustrates the reduction in communication overhead achieved by the proposed framework.

images

Figure 8: Communication efficiency comparison showing total communication overhead and convergence behaviour for different federated learning approaches

Fig. 8 provides a communication-centric analysis. Fig. 8a maps communication overhead across communication rounds and device numbers, showing EdgeTrustX achieves low overhead at scale. Fig. 8b evaluates bandwidth-latency trade-offs with message size and compression, highlighting EdgeTrustX’s optimal region. Fig. 8c compares convergence efficiency with communication overhead over time, where EdgeTrustX consistently outperforms FedAvg and HE-Fed in convergence while incurring lower overhead. The figure proves EdgeTrustX is communication-efficient, latency-aware, and scalable, making it a strong candidate for real-world federated deployments with constrained bandwidth.

4.5 Explainability Assessment

4.5.1 Attention Visualization

Fig. 9 presents attention heatmaps for different types of attacks, demonstrating how EdgeTrustX focuses on relevant features for threat detection.

images

Figure 9: Multi-level attention visualisation and feature-importance analysis across different IoT attack types. (a) Feature-attention landscape illustrating contour-based saliency over key network features. (b) Temporal feature-attention evolution showing sequence-level sensitivity across time steps. (c) Feature-importance distribution derived from SHAP and attention scores across attack categories

Fig. 9 explores the explainability of EdgeTrustX via attention mechanisms. Fig. 9a maps attention strengths over feature-attack type combinations, identifying patterns (e.g., system-level vs. command and control). Fig. 9b tracks the temporal evolution of attention during DDoS attacks, clearly showing the decision-making phases. Fig. 9c presents the statistical distribution of feature importance across different threat categories, classified as critical, high, moderate, and supportive. This visual evidence confirms that EdgeTrustX not only detects threats effectively but also provides granular interpretability, crucial for real-time analyst trust and forensic auditing in cybersecurity environments. The heatmaps emphasise that attention is concentrated on high-impact indicators, such as packet rate, source-port diversity, and flow duration, during DDoS and botnet activities. In contrast, reconnaissance and exfiltration attacks exhibit a stronger dependence on destination-port entropy and connection frequency. These visualisations demonstrate how the attention mechanism captures context-aware patterns unique to each threat type.

4.5.2 SHAP Analysis

We conduct a comprehensive SHAP analysis to provide feature-level explanations. Table 7 presents the top contributing features for each attack type.

images

4.5.3 Explainability Metrics

We evaluate the explainability quality using standard metrics. Fig. 10 shows the explainability assessment results.

images

Figure 10: Explainability-quality assessment comparing multiple interpretability methods in terms of faithfulness, stability, and comprehensiveness. (a) Explainability-quality landscape highlighting Pareto-optimal trade-offs among competing metrics. (b) Temporal explainability evolution showing faithfulness progression during training. (c) Multi-metric forest-plot comparison across explanation techniques

Fig. 10 provides a comprehensive evaluation of EdgeTrustX’s explainability. Fig. 10a plots faithfulness versus comprehensiveness in a Pareto-optimal landscape, where EdgeTrustX ranks among the best. Fig. 10b illustrates how explainability evolves during training, with EdgeTrustX exhibiting consistent growth. Fig. 10c presents a forest plot with scores across eight explainability metrics, confirming EdgeTrustX’s balance of interpretability dimensions. Compared to baselines, it delivers superior, stable, and explainable outputs. These results validate that privacy-preserving federated models can achieve high transparency—addressing the black-box criticism of deep learning. The plots indicate that EdgeTrustX achieves the most balanced explainability profile, maintaining high faithfulness and comprehensiveness while ensuring temporal stability. Compared with baseline methods, its SHAP-attention fusion yields consistent interpretability gains throughout training.

Table 8 compares the explainability performance of EdgeTrustX with baseline methods.

images

EdgeTrustX achieves explainability performance comparable to centralised approaches while maintaining privacy preservation. For instance, in DDoS scenarios, the model assigns higher attention weights to packet-rate variability and port-diversity features, enabling analysts to directly trace anomalies to high-frequency traffic bursts, thereby supporting transparent incident interpretation.

4.6 Ablation Studies

4.6.1 Component-Wise Analysis

We conduct ablation studies to understand the contribution of each component in EdgeTrustX. Table 9 presents the results.

images

To evaluate the contribution of each module within EdgeTrustX, an ablation study is conducted, as shown in Table 5. The base Transformer model achieves an accuracy of 91.2% ± 0.4 with limited privacy preservation and moderate explainability.

All reported results are presented as the mean ± standard deviation computed over five independent runs to capture experimental variability. A paired t-test with a significance threshold of p < 0.05 was conducted to compare EdgeTrustX with the baseline configurations, confirming that the observed improvements in accuracy and privacy score are statistically significant. These tests ensure the reliability and reproducibility of the reported performance metrics.

Integrating Federated Learning (FL) enhances data decentralisation but slightly decreases accuracy due to non-IID distributions. Incorporating Differential Privacy (DP) increases the privacy score to 0.823 while causing a slight reduction in accuracy to 89.3% ± 0.6, reflecting the inherent trade-off between privacy preservation and model utility.

When Homomorphic Encryption (HE) is introduced, the privacy score increases further to 0.891, while accuracy recovers to 90.1%±0.5, demonstrating that secure aggregation can maintain its competitive utility. Incorporating the Attention Explanation module enhances interpretability, raising the explainability score from 0.669 to 0.789 without compromising performance. Further adding SHAP Analysis provides the highest interpretability (0.823±0.02), enabling human-understandable feature importance rankings.

Finally, the complete EdgeTrustX framework integrates all modules, achieving the best overall performance with an accuracy of 94.7%±0.3, the highest privacy score (0.923), and an explainability score of 0.851±0.01. These results validate the synergistic benefit of combining DP, HE, federated learning, and explainable attention mechanisms.

In addition to the component-level analysis, the effect of training hyperparameters on communication efficiency and model performance is examined. Fig. 11 illustrates the relationship between accuracy and communication cost as the local training epochs, client sampling rates, and quantisation levels are varied. Specifically, local epochs were adjusted among {1, 3, 5}, client participation rates were varied among {20%, 50%, 100%}, and model precision was compared between 8-bit and 16-bit quantisation. Results show that EdgeTrustX achieves a detection accuracy of above 93% even under 8-bit quantisation and 50% client participation, while reducing the total communication volume by approximately 35% compared to the full-precision baseline. Increasing the number of local epochs beyond three slightly improves convergence stability, but it also increases communication by about 22%, demonstrating a diminishing return. Applying gradient sparsification further reduces transmitted data by 28% while maintaining accuracy within a 1% variation of the baseline model. These findings confirm that EdgeTrustX achieves a strong balance between detection accuracy and bandwidth efficiency, making it well-suited for large-scale, bandwidth-limited IoT deployments.

images

Figure 11: Accuracy–communication trade-off for EdgeTrustX under different local epochs, client sampling rates (20%, 50%, 100%), and quantisation precisions (8-bit vs. 16-bit). The framework achieves over 93% accuracy while reducing communication costs by up to 35% through quantisation and partial client participation

Increasing the number of attention heads enhances the model’s ability to represent multi-context features, thereby improving its capacity to capture diverse traffic behaviours. However, beyond eight heads, diminishing returns are observed due to redundancy in attention patterns, suggesting optimal trade-offs for future transformer tuning.

4.6.2 Hyperparameter Sensitivity

All hyperparameter-sensitivity experiments were conducted under constant privacy parameters (ε = 0.1, δ = 1 × 10−5) to isolate the effects of architectural and optimisation settings. Fig. 12 shows the sensitivity analysis for key hyperparameters, including privacy budget, learning rate, and number of attention heads.

images images

Figure 12: Hyperparameter sensitivity analysis showing the impact of (a) Privacy Budget vs. Learning Rate Interaction Surface, (b) Multi-Parameter Sensitivity Profiles, and (c) QMobNet Quantum Component Contribution Analysis

Fig. 12 investigates how tuning key hyperparameters affects EdgeTrustX. Fig. 12a shows a 3D privacy-budget vs. learning-rate surface, pinpointing optimal configuration (ϵ =0.1, LR=10−2) with confidence contours. Fig. 12b normalises parameter influence—privacy budget, attention heads, epochs, batch size—revealing that privacy and attention dimensions dominate performance. Fig. 12c presents a related quantum framework (QMobNet) to contrast incremental performance contributions. While EdgeTrustX is not quantum, the comparative subplot underscores modular performance gains. Overall, this figure highlights the importance of meticulous hyperparameter tuning in striking a balance between privacy, accuracy, and efficiency.

4.7 Real-World Deployment Analysis

4.7.1 Energy Consumption

We analyse the energy consumption of EdgeTrustX on IoT devices. Fig. 13 presents a comparison of energy efficiency.

images

Figure 13: Energy consumption analysis showing power usage for different components and comparison with baseline methods across various IoT device types

Fig. 13 evaluates EdgeTrustX’s energy footprint. Fig. 13a breaks down power usage across five IoT device types (Raspberry Pi 4, Jetson Nano), revealing that Jetson Nano consumes the most, mainly during local training. Fig. 13b compares federated learning methods on the Raspberry Pi 4, showing that EdgeTrustX achieves better battery life due to lower power consumption. Fig. 13c plots detection accuracy vs. energy, highlighting EdgeTrustX’s optimal zone of energy, performance, and privacy balance. This figure validates EdgeTrustX’s hardware efficiency, making it ideal for real-time threat detection on resource-constrained edge platforms.

4.7.2 Latency Analysis

Table 10 presents the latency analysis for different operations in EdgeTrustX.

images

4.7.3 Throughput and Memory Profiling

To evaluate the computational feasibility of running EdgeTrustX on resource-constrained hardware, throughput and memory profiling were performed using two representative edge devices: Raspberry Pi 4 (4 GB RAM) and NVIDIA Jetson Nano (8 GB RAM). Each device executed the local transformer model, which was configured with six encoder layers and a hidden dimension of d = 512. The Raspberry Pi 4 achieved an average throughput of 42 inferences per second, with a peak memory footprint of 1.28 GB. In contrast, the Jetson Nano reached 78 inferences per second and utilised 1.94 GB of memory. To validate scalability, a lightweight configuration comprising four layers was developed (d = 256), which reduced memory usage by 37% while maintaining detection accuracy within 2% of the whole model. Further 8-bit quantisation of model weights reduced total memory to below 1 GB with negligible accuracy loss (<0.5%). Fig. 14 illustrates the throughput–layer-depth trade-off, confirming that the transformer backbone remains computationally feasible for practical IoT deployment. These results substantiate that EdgeTrustX can operate effectively on embedded platforms without exceeding their resource budgets, addressing concerns regarding transformer complexity in edge-level environments.

images

Figure 14: Throughput and memory profiling of EdgeTrustX models on Raspberry Pi 4 and Jetson Nano platforms

4.8 Comparative Trade-off Analysis

The integrated evaluation of EdgeTrustX highlights the interdependence between privacy preservation, scalability, and explainability in federated IoT environments. Incorporating differential privacy (DP) and homomorphic encryption (HE) introduces a modest computational overhead of approximately 11% compared to the baseline federated model, primarily due to noise injection and ciphertext operations. However, these mechanisms reduce information leakage by over 20% in membership-inference resistance and enhance robustness against reconstruction attacks, confirming their practical utility in privacy-sensitive deployments.

From a scalability perspective, the optimised aggregation pipeline enables support for up to 1000 clients with an average communication latency of below 0.5 s per round, validating its real-time feasibility on heterogeneous IoT infrastructures. Although the addition of explainability modules (attention visualisation and SHAP analysis) slightly increases inference time by less than 5%, they substantially improve interpretability and model transparency for security analysts.

Overall, the framework achieves a balanced trade-off, offering strong privacy protection, efficient scalability, and high interpretability without significantly compromising detection accuracy. This equilibrium between privacy, performance, and transparency demonstrates EdgeTrustX’s suitability for large-scale, trustworthy IoT threat detection and mitigation.

5 Discussion

The results demonstrate that EdgeTrustX delivers consistently strong performance across multiple evaluation criteria, including accuracy, privacy preservation, explainability, and scalability. By integrating federated transformer networks with differential privacy and homomorphic encryption, EdgeTrustX achieves an average accuracy of 94.7% across four real-world IoT security datasets while ensuring user data privacy through a strict differential privacy budget of ϵ = 0.1. Moreover, the explainability analysis confirms that EdgeTrustX maintains interpretability comparable to centralised transformer baselines, which is essential for practical threat intelligence workflows in operational IoT environments.

A key observation is EdgeTrustX’s ability to sustain performance with increasing device participation. Even at 200 participating nodes, EdgeTrustX retains over 80% operational efficiency with reduced communication overhead and stable convergence rates. This demonstrates the framework’s suitability for large-scale, heterogeneous IoT deployments where device capabilities often vary significantly. Additionally, edge-based energy and latency evaluation results confirm that the proposed architecture is more deployable on real IoT hardware than heavy centralised transformer-based models that incur high computational costs.

To further contextualize these improvements, Table 11 summarizes a comparison between EdgeTrustX and existing representative methods based on the validated reference list.

images

As shown in Table 11, most prior approaches sacrifice either privacy, scalability, or interpretability. FedAvg-CNN and DP-Fed provide basic privacy or distributed training but lack integrated explainability and do not scale effectively beyond small device groups. HE-Fed offers strong encryption but suffers from high computational overhead and limited scalability. Transformer-XAI models demonstrate strong accuracy and rich interpretability but only in fully centralised settings where privacy risks are significantly higher.

In contrast, EdgeTrustX strikes a balanced and unified approach. It combines the privacy-preserving strengths of differential privacy and homomorphic encryption with the modeling power of transformer architectures and SHAP-attention–driven explainability. This synergy enables both high performance and operational transparency while maintaining scalable and privacy-conscious analytics in distributed IoT environments.

The strength of EdgeTrustX lies in its holistic design philosophy. Instead of treating privacy, scala1bility, and interpretability as isolated objectives, EdgeTrustX integrates them seamlessly within one coherent architecture. While future research may explore further optimizations—such as ultra-low-latency configurations or integration with trusted execution environments—the current implementation already establishes a strong benchmark for next-generation secure IoT analytics.

6 Conclusion

This study presents EdgeTrustX, a privacy-aware federated transformer framework developed for scalable and explainable IoT threat detection. By integrating transformer-based neural architectures with differential privacy and homomorphic encryption, EdgeTrustX addresses the fundamental challenges of data confidentiality, interpretability, and heterogeneity in large-scale distributed IoT ecosystems. Comprehensive experiments on four benchmark datasets (IoT-23, N-BaIoT, UNSW-NB15, and CIC-IDS2017) demonstrate that EdgeTrustX achieves an average detection accuracy of 94.7%, approaching the centralised transformer baseline of 95.3% while ensuring strong privacy guarantees under a strict ε = 0.1 differential privacy budget. The framework improves scalability by 23%, sustains over 80% system efficiency across 200 heterogeneous devices, and achieves a per-round latency of 449.2 ms, confirming its real-time viability for edge environments. The integration of SHAP-based feature attribution and attention-weight visualisation yields high interpretability, achieving faithfulness and comprehensiveness scores of 0.826 and 0.851, respectively. Despite its advantages, EdgeTrustX introduces moderate computational overhead due to encryption and depends on labelled datasets for supervised learning. These factors may constrain deployment on ultra-low-power nodes and limit adaptability to unlabeled data streams. Future research will focus on several directions: (1) designing lightweight homomorphic-encryption schemes and integrating Trusted Execution Environments (TEEs) to minimize cryptographic cost while preserving end-to-end security; (2) developing adaptive differential-privacy mechanisms with dynamic noise scaling and gradient clipping for improved privacy-utility balance; (3) extending the framework toward a malicious threat model through Byzantine-robust aggregation; (4) enabling cross-domain generalization to support heterogeneous IoT verticals; (5) incorporating blockchain-based authentication and auditability to enhance trust management; and (6) scaling EdgeTrustX beyond 1000 devices using hierarchical federated optimization and efficient parameter-compression strategies. Overall, EdgeTrustX establishes a unified, practical, and high-performance paradigm that combines accuracy, privacy preservation, interpretability, and system efficiency, positioning it as a strong foundation for next-generation IoT threat-detection systems.

Acknowledgement: The author would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.

Funding Statement: The author received no specific funding for this study.

Availability of Data and Materials: The data supporting this study’s findings are available from the corresponding author upon reasonable request at: saleh@su.edu.sa.

Ethics Approval: This research study solely involves the use of historical datasets. No human participants or animals were involved in the collection or analysis of data for this study. As a result, ethical approval was not required.

Conflicts of Interest: The author declares no conflicts of interest.

1https://www.kaggle.com/datasets/surajsooraj26/iot-23/data, (accessed on 11 November 2025).

2https://www.kaggle.com/datasets/mkashifn/nbaiot-dataset, (accessed on 11 November 2025).

3https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15, (accessed on 11 November 2025).

4https://www.kaggle.com/datasets/dhoogla/cicids2017, (accessed on 11 November 2025).

References

1. Abd Elaziz M, Fares IA, Dahou A, Shrahili M. Federated learning framework for IoT intrusion detection using tab transformer and nature-inspired hyperparameter optimization. Front Big Data. 2025;8:1526480. doi:10.3389/fdata.2025.1526480. [Google Scholar] [PubMed] [CrossRef]

2. Albogami NN. Intelligent deep federated learning model for enhancing security in internet of things enabled edge computing environment. Sci Rep. 2025;15(1):4041. doi:10.1038/s41598-025-88163-5. [Google Scholar] [PubMed] [CrossRef]

3. Saraladeve L, Chandrasekar A, Nithya T, Kalaiarasi S, Sampathkumar B, Thanikachalam R. A multiclass attack classification framework for IoT using hybrid deep learning model. J Cybersecur Inf Manag. 2025;15(1):151. doi:10.54216/JCIM.150112. [Google Scholar] [CrossRef]

4. Khan IA, Razzak I, Pi D, Khan N, Hussain Y, Li B, et al. Fed-inforce-fusion: a federated reinforcement-based fusion model for security and privacy protection of IoMT networks against cyber-attacks. Inf Fusion. 2024;101(3):102002. doi:10.1016/j.inffus.2023.102002. [Google Scholar] [CrossRef]

5. Park JH, Yotxay S, Singh SK, Park JH. PoAh-enabled federated learning architecture for DDoS attack detection in IoT networks. Hum Centric Comput Inf Sci. 2024;14(3):1–25. doi:10.22967/HCIS.2024.14.003. [Google Scholar] [CrossRef]

6. Song X, Ma Q. Intrusion detection using federated attention neural network for edge enabled internet of things. J Grid Comput. 2024;22(1):15. doi:10.1007/s10723-023-09725-3. [Google Scholar] [CrossRef]

7. Hamdi N. Federated learning-based intrusion detection system for internet of things. Int J Inf Secur. 2023;22(6):1937–48. doi:10.1007/s10207-023-00727-6. [Google Scholar] [CrossRef]

8. Karunamurthy A, Vijayan K, Kshirsagar PR, Tan KT. An optimal federated learning-based intrusion detection for IoT environment. Sci Rep. 2025;15(1):8696. doi:10.1038/s41598-025-93501-8. [Google Scholar] [PubMed] [CrossRef]

9. Alsharaiah MA, Almaiah MA, Shehab R, Obeidat M, Ali El-Qirem F, Aldhyani T. An explainable AI-driven transformer model for spoofing attack detection in internet of medical things (IoMT) networks. Discov Appl Sci. 2025;7(5):488. doi:10.1007/s42452-025-07071-5. [Google Scholar] [CrossRef]

10. Al-Haboosi IT, Elbagoury BM, El-Regaily S, El-Horbaty EM. A hybrid-transformer-based cyber-attack detection in IoT networks. Int J Interact Mob Technol. 2024;18(14):90–102. doi:10.3991/ijim.v18i14.50343. [Google Scholar] [CrossRef]

11. Saghir A, Beniwal H, Tran KD, Raza A, Koehl L, Zeng X, et al. Explainable transformer-based anomaly detection for internet of things security. In: Tran KP, Li S, Heuchenne C, Truong TH, editors. EAI/Springer innovations in communication and computing. Cham, Switzerland: Springer; 2024. p. 83–97. doi:10.1007/978-3-031-53028-9_6. [Google Scholar] [CrossRef]

12. Ahmed SF, Alam MSB, Afrin S, Rafa SJ, Taher SB, Kabir M, et al. Toward a secure 5G-enabled internet of things: a survey on requirements, privacy, security, challenges, and opportunities. IEEE Access. 2024;12:13125–45. doi:10.1109/access.2024.3352508. [Google Scholar] [CrossRef]

13. Gholami P, Seferoglu H. DIGEST: fast and communication efficient decentralized learning with local updates. Trans Mach Learn Comm Netw. 2024;2(213):1456–74. doi:10.1109/tmlcn.2024.3354236. [Google Scholar] [CrossRef]

14. Alabbadi A, Bajaber F. An intrusion detection system over the IoT data streams using eXplainable artificial intelligence (XAI). Sensors. 2025;25(3):847. doi:10.3390/s25030847. [Google Scholar] [PubMed] [CrossRef]

15. Sorour SE, Aljaafari M, Shaker AM, Amin AE. LSTM-JSO framework for privacy preserving adaptive intrusion detection in federated IoT networks. Sci Rep. 2025;15(1):11321. doi:10.1038/s41598-025-95966-z. [Google Scholar] [PubMed] [CrossRef]

16. Agbor BA, Stephen BU, Asuquo P, Luke UO, Anaga V. Hybrid CNN-BiLSTM–DNN approach for detecting cybersecurity threats in IoT networks. Computers. 2025;14(2):58. doi:10.3390/computers14020058. [Google Scholar] [CrossRef]

17. Danquah LKG, Appiah SY, Mantey VA, Danlard I, Akowuah EK. Computationally efficient deep federated learning with optimized feature selection for IoT botnet attack detection. Intell Syst Appl. 2025;25(4):200462. doi:10.1016/j.iswa.2024.200462. [Google Scholar] [CrossRef]

18. Wen M, Zhang Y, Zhang P, Chen L. IDS-DWKAFL: an intrusion detection scheme based on dynamic weighted K-asynchronous federated learning for smart grid. J Inf Secur Appl. 2025;89(3):103993. doi:10.1016/j.jisa.2025.103993. [Google Scholar] [CrossRef]

19. Rampone G, Ivaniv T, Rampone S. A hybrid federated learning framework for privacy-preserving near-real-time intrusion detection in IoT environments. Electronics. 2025;14(7):1430. doi:10.3390/electronics14071430. [Google Scholar] [CrossRef]

20. Han Y. A privacy preserving federated learning system for IoT devices using blockchain and optimization. J Comput Commun. 2024;12(9):78–102. doi:10.4236/jcc.2024.129005. [Google Scholar] [CrossRef]

21. Yuan J, Liu W, Shi J, Li Q. Approximate homomorphic encryption based privacy-preserving machine learning: a survey. Artif Intell Rev. 2025;58(3):82. doi:10.1007/s10462-024-11076-8. [Google Scholar] [CrossRef]

22. Kheddar H. Transformers and large language models for efficient intrusion detection systems: a comprehensive survey. Inf Fusion. 2025;124(1):103347. doi:10.1016/j.inffus.2025.103347. [Google Scholar] [CrossRef]

23. Gueriani A, Kheddar H, Mazari AC, Ghanem MC. A robust cross-domain IDS using BiGRU-LSTM-attention for medical and industrial IoT security. ICT Express. 2025;8(11):1–10. doi:10.1016/j.icte.2025.08.011. [Google Scholar] [CrossRef]

Cite This Article

APA Style

Alharbi, S. (2026). EdgeTrustX: A Privacy-Aware Federated Transformer Framework for Scalable and Explainable IoT Threat Detection. Computers, Materials & Continua, 87(3), 15. https://doi.org/10.32604/cmc.2026.073584

Vancouver Style

Alharbi S. EdgeTrustX: A Privacy-Aware Federated Transformer Framework for Scalable and Explainable IoT Threat Detection. Comput Mater Contin. 2026;87(3):15. https://doi.org/10.32604/cmc.2026.073584

IEEE Style

S. Alharbi, “EdgeTrustX: A Privacy-Aware Federated Transformer Framework for Scalable and Explainable IoT Threat Detection,” Comput. Mater. Contin., vol. 87, no. 3, pp. 15, 2026. https://doi.org/10.32604/cmc.2026.073584

BibTex EndNote RIS

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table of Content

EdgeTrustX: A Privacy-Aware Federated Transformer Framework for Scalable and Explainable IoT Threat Detection

Abstract

Keywords

References

Cite This Article

594

320

0

Related articles

Further Information

Guidelines

Follow Us

Join Us

Contact Us

WhatsApp:

Share Link

Related articles

https://www.kaggle.com/datasets/surajsooraj26/iot-23/data

https://www.kaggle.com/datasets/mkashifn/nbaiot-dataset

https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15

https://www.kaggle.com/datasets/dhoogla/cicids2017