Open Access
ARTICLE
Cross-Domain Robust Dynamic Trust Evaluation for Industrial Internet of Things Edge Nodes
School of Cryptography Engineering, Engineering University, Zhengzhou, China
* Corresponding Author: Zhiyu Ren. Email:
Computers, Materials & Continua 2026, 88(2), 99 https://doi.org/10.32604/cmc.2026.082704
Received 23 March 2026; Accepted 19 May 2026; Issue published 15 June 2026
Abstract
To address trust-score drift and unsafe online adaptation under cross-domain attack-contaminated streams in Industrial Internet of Things (IIoT) edge environments, this paper proposes a risk-aware lightweight test-time adaptation (TTA) framework, named RaL-TTA, for dynamic trust evaluation of edge nodes. RaL-TTA constructs a low-dimensional robust feature space and a source-domain normal-entropy reference baseline, and performs selective online maintenance in the target domain through Kolmogorov–Smirnov (KS) drift detection, SafeBrake risk gating, Adaptive Batch Normalization (AdaBN) anchor protection, and budgeted sample-level safeguards. Low-risk batches are adapted by updating only lightweight Batch Normalization (BN) parameters, whereas high-risk batches freeze online updates and invoke anchor-based protective inference. Experiments on Edge-IIoTset show that RaL-TTA substantially improves perturbation-stage attack detection and false-positive control compared with general TTA baselines while maintaining post-perturbation stability. In the main Edge-IIoTset setting, RaL-TTA achieves a perturbation-stage true positive rate (TPR) of 1.0000, false positive rate (FPR) of 0.0410, F1-score of 0.9544, and accuracy of 0.9713, while updating only 192 online parameters. External validation on X-IIoTID, a connectivity- and device-agnostic intrusion dataset for IIoT, further evaluates cross-service generalization under Modbus, Message Queuing Telemetry Transport (MQTT), and WebSocket target services. Additional sensitivity, startup-window robustness, calibration, and runtime-overhead analyses further characterize the stability, deployment assumptions, trust-score reliability, and edge-side feasibility of the proposed framework.
1 Introduction
The Industrial Internet of Things (IIoT) has developed rapidly with the accelerated integration of next-generation information technology and industrial manufacturing. It extends computing capabilities from cloud infrastructures to edge devices and has become an important infrastructure for industrial intelligence. Its security and reliability are directly related to the continuous and stable operation of physical production processes [1–3]. With the ubiquitous deployment of heterogeneous sensors and actuators, IIoT systems face complex security challenges arising from the coexistence of open connectivity and closed-loop physical processes. Dynamic trust evaluation is an important way to complement traditional static defense in heterogeneous edge scenarios and to support adaptive access control and edge-side risk decision-making [4,5].
Most existing studies on IIoT trust evaluation and industrial intrusion detection follow a static offline training mode [6–8]. Whether traditional machine-learning methods, such as random forests and support vector machines, or deep neural-network-based anomaly detection models are employed, their performance usually depends heavily on the assumption that the training data and deployment data follow an independent and identically distributed (IID) setting [9,10]. Such models are typically trained once in the source domain, such as a laboratory environment, and then fixed, which limits their adaptability to non-stationary environments. However, in actual industrial settings, edge nodes face severe domain-shift challenges. The heterogeneity of underlying communication protocols, such as migration between Message Queuing Telemetry Transport (MQTT) and Modbus over Transmission Control Protocol (Modbus/TCP), and the time-varying nature of production conditions can make the target-domain data distribution deviate from the source-domain prior [6,8]. Violating the IID assumption can therefore degrade the detection performance of a source-domain model in the target domain and distort the corresponding trust scores.
To alleviate the degradation of source-domain models in the target domain, researchers have regarded cross-domain trust evaluation as a distribution-transfer problem under non-stationary environments and introduced mechanisms such as domain adaptation (DA) to reduce the impact of domain shift [11,12]. For example, Adaptive Batch Normalization (AdaBN) achieves distribution alignment by re-estimating normalization statistics in the target domain [13]; Source Hypothesis Transfer (SHOT) freezes the source-domain classifier without accessing source data and iteratively optimizes the target-domain feature extractor [14]. To address continuous distribution changes in industrial edge environments, methods such as EdgeFD combine drift detection with model-weight integration to reduce the overhead caused by frequent fine-tuning and to alleviate catastrophic forgetting [11]. However, such methods often still rely on phased adaptation processes or iterative optimization and are sensitive to statistical estimates from small batches, making it difficult to meet the requirements of edge-node security detection in online unlabeled scenarios. Therefore, the research focus has gradually shifted to test-time adaptation (TTA) methods that can operate without retraining after deployment.
TTA reduces the deployment cost of adaptation because it does not require source-domain data after deployment [15]. AdaBN rapidly aligns distributions by re-estimating Batch Normalization (BN) statistics [13]; fully test-time adaptation by entropy minimization (TENT) optimizes model parameters online through entropy minimization to increase prediction confidence in the target domain [16]. For continuously changing target-domain distributions, methods such as continual test-time adaptation (CoTTA) suppress error accumulation and catastrophic forgetting through a weight-averaged teacher-student scheme, augmentation-averaged predictions, and stochastic restoration of source weights [12]. Other studies construct adaptation objectives from energy functions and iterative sampling [17], or monitor entropy drift online and perform entropy-distribution matching for more robust trigger-based adaptation [18]. However, directly applying general TTA methods to adversarial IIoT edge environments still faces resource and security constraints. Frequent backpropagation, multiple data augmentations, or iterative sampling can increase inference latency on resource-limited edge nodes. In addition, entropy minimization may compress prediction uncertainty and cause overconfident erroneous convergence or model contamination when malicious attacks or high-noise disturbances appear, thereby reducing the reliability of trust scores [10].
Existing TTA methods therefore remain insufficient for IIoT trust evaluation. AdaBN mainly recalibrates BN statistics and does not explicitly distinguish benign domain shift from attack-contaminated drift. TENT performs entropy minimization during testing, but may become overconfident on abnormal or adversarial target samples. Protected Online Entropy Matching (POEM) improves online entropy matching, but it is not specifically designed for dynamic trust evaluation under attack-risk constraints. In contrast, the proposed risk-aware lightweight test-time adaptation (RaL-TTA) framework introduces a risk-aware maintenance strategy that combines entropy-distribution shift detection, SafeBrake risk gating, AdaBN-based anchor protection, and budgeted sample-level safeguards. Therefore, the proposed method is not only an adaptation mechanism but also a security-oriented dynamic trust evaluation framework for edge-side IIoT streams. Table 1 summarizes the key differences.

To address resource constraints, cross-domain distribution shifts, and attack-stream contamination in unlabeled online updates for IIoT edge nodes, this paper proposes the RaL-TTA framework for cross-domain dynamic trust evaluation. Unlike general TTA strategies that lack explicit risk constraints, RaL-TTA builds an “offline trust baseline–online risk gating” mechanism. In the source-domain stage, it uses the low-dimensional feature set for Industrial Internet of Things (LoFT-IIoT) [19] and label-smoothed training to establish a normal-entropy reference baseline. In the target-domain stage, it combines shift detection with SafeBrake risk gating, freezes updates for high-risk batches, invokes Adaptive Batch Normalization (AdaBN) anchor protection, and performs restricted BN maintenance only for low-risk batches. Budgeted sample-level arbitration is used only as a supplementary safety boundary for a small number of highly ambiguous samples. This framework enhances detection performance, trust-score stability, and maintenance security in cross-domain online streams while controlling online overhead.
The main contributions of this work are summarized as follows:
1. We formulate security-oriented dynamic trust evaluation for IIoT edge nodes under cross-domain online streams, where the task-level trust score is derived from the estimated attack risk.
2. We develop RaL-TTA, a lightweight TTA framework that combines a low-dimensional source-domain trust baseline with risk-constrained online maintenance for edge-side deployment.
3. We introduce a protective online adaptation strategy integrating KS-based shift detection, SafeBrake risk gating, AdaBN-calibrated anchor inference, and budgeted sample-level safeguards to reduce unsafe adaptation under attack-contaminated streams.
4. We validate the proposed framework on Edge-IIoTset and an external service-holdout setting based on X-IIoTID, a connectivity- and device-agnostic intrusion dataset for IIoT, with ablation, sensitivity, calibration, startup-window robustness, and runtime-overhead analyses.
2 RaL-TTA Cross-Domain Dynamic Trust Evaluation Framework
To address the domain-shift problem caused by cross-protocol communication and continuous online streaming in IIoT edge nodes, this paper constructs the RaL-TTA cross-domain dynamic trust evaluation framework, as shown in Fig. 1. This framework consists of two phases: offline trust modeling in the source domain and online risk-constrained maintenance in the target domain. In the source-domain phase, labeled traffic is taken as input, and low-dimensional feature selection is completed through LoFT-IIoT [19], followed by training a lightweight trust evaluation model under the label-smoothing constraint. Meanwhile, the empirical cumulative distribution function (ECDF) of normal entropy is constructed based only on normal samples in the source domain, serving as a reference baseline for the target-domain online phase. Before deployment, a short target-domain startup window collected during controlled initialization is used for AdaBN anchor-model calibration, target-domain normal-reference statistics estimation, and threshold initialization. In the target-domain phase, unlabeled online streams are taken as input. First, Kolmogorov–Smirnov (KS)-based entropy-shift detection is executed, and then SafeBrake determines the risk status in combination with batch volatility. For low-risk batches, only restricted maintenance of BN affine parameters is performed, while for high-risk batches, updates are frozen and anchor protection is invoked. Sample-level arbitration is used only as a supplementary safety boundary. Finally, dynamic trust scores are output at the task level.

Figure 1: Overall framework of RaL-TTA for cross-domain dynamic trust evaluation. The source-domain phase constructs a low-dimensional feature space, trains the TrustMLP model, and builds a normal-entropy ECDF reference baseline. The target-domain phase processes unlabeled online batches, performs KS-based entropy-shift detection and SafeBrake risk gating, updates only BN affine parameters for low-risk batches, and invokes the AdaBN-calibrated anchor model with sample-level safeguards for high-risk batches. The startup window
2.2 Source-Domain Trust Baseline Construction
The construction of the source-domain trust baseline includes three steps: feature selection, model training, and estimation of the normal-entropy baseline. First, in the candidate feature space after removing protocol-specific fields, LoFT-IIoT [19] is used to select low-dimensional robust features to reduce reliance on protocol identifiers in cross-domain scenarios. Second, a lightweight multilayer perceptron (MLP) trust evaluator with label smoothing is trained on the selected low-dimensional feature space. Finally, a normal-entropy ECDF is constructed solely from the predicted entropy of normal samples in the source domain, serving as the normal reference baseline for the target-domain online phase.
2.3 Online Risk-Aware Adaptation Mechanism
During the online phase, edge nodes receive unlabeled target-domain traffic in online batches. First, the prediction entropy of the current batch is calculated, and the KS statistic is used to measure the discrepancy between the current batch and the source-domain normal-entropy baseline. Then, SafeBrake determines the current state as no drift, low-risk drift, or high-risk drift according to both entropy-distribution shift and batch-level feature volatility.
If no significant drift is detected, the current batch is directly evaluated by the online model, and the model state remains unchanged. If a low-risk drift is detected, restricted online maintenance is executed by updating only the BN affine parameters through entropy distribution alignment, attack-ratio prior regularization, and BN regularization. If a high-risk drift is detected, the online model update is frozen, and the AdaBN-calibrated anchor model is invoked to perform protective inference. Distance gating and budget-based sample-level arbitration are used only as supplementary security boundaries for a small number of highly ambiguous samples, thereby suppressing contamination-induced erroneous adaptation.
3.1 Problem Formulation and Notation
In IIoT edge scenarios, edge nodes continuously receive unlabeled network traffic from the target environment. This study considers cross-domain dynamic trust evaluation in a setting where the source domain is labeled, the target domain is unlabeled, and the online distribution changes over time. The goal is to balance anomaly detection capability, output stability, and online maintenance security in continuous online streams.
Let the labeled source-domain reference set be defined as
where
To reduce the redundancy of the original traffic features and enhance the cross-domain transferability, the samples are mapped to a low-dimensional feature space through a feature mapping function
where
A lightweight trust evaluator
where
Accordingly, the task-level trust score is defined as
where
To characterize the uncertainty of the model prediction for the current sample, the predictive entropy is introduced:
Here,
During the online stage, target-domain traffic arrives batch by batch. Let the current batch at time
where
Based on the predictive entropy of normal source-domain samples, the empirical cumulative distribution function (ECDF) is defined as
where
Similarly, the empirical entropy distribution of the current batch is defined as
where
The distribution shift of the current batch with respect to the normal source-domain reference is then defined as
A larger
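The quantities introduced in this subsection (predictive entropy, the normal-entropy ECDF, and the KS-type shift statistic) can be illustrated with a minimal sketch. It assumes natural-log Shannon entropy and the standard two-sample KS distance; the function names and the example probability vectors are hypothetical and are not taken from the paper.

```python
import math
from bisect import bisect_right

def predictive_entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ecdf(sorted_values, x):
    """Empirical CDF: fraction of the sorted reference values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def ks_statistic(reference, batch):
    """Two-sample KS distance: largest gap between the two ECDFs."""
    ref, cur = sorted(reference), sorted(batch)
    # Both ECDFs are right-continuous step functions, so the supremum
    # is attained at one of the observed values.
    return max(abs(ecdf(ref, v) - ecdf(cur, v)) for v in ref + cur)

# Hypothetical two-class outputs (normal, attack) for illustration only.
source_normal_probs = [[0.9, 0.1], [0.85, 0.15], [0.95, 0.05], [0.8, 0.2]]
batch_probs = [[0.6, 0.4], [0.55, 0.45], [0.9, 0.1], [0.5, 0.5]]

source_entropy = [predictive_entropy(p) for p in source_normal_probs]
batch_entropy = [predictive_entropy(p) for p in batch_probs]
shift = ks_statistic(source_entropy, batch_entropy)
print(f"KS shift of the batch against the normal baseline: {shift:.2f}")
```

In this toy example most batch entropies lie above every reference entropy, so the statistic is large, which matches the intuition that a larger value indicates a stronger departure from the source-domain normal reference.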
3.2 Construction of the Source-Domain Reference Model and Normal Baseline
3.2.1 Lightweight Feature Construction Based on LoFT-IIoT
IIoT traffic features typically exhibit significant cross-scale differences, with large variations across different fields. Such scale differences can make statistical estimation vulnerable to extreme values. Meanwhile, some application-layer fields in heterogeneous protocols have strong protocol specificity. If directly used as input, these fields may cause the model to over-rely on protocol-specific semantics, thereby weakening generalization in cross-domain scenarios. To address these issues, this paper adopts a lightweight feature construction strategy based on LoFT-IIoT [19] to filter and reduce the dimension of the original traffic features. The overall process is shown in Fig. 2.

Figure 2: LoFT-IIoT-based lightweight feature construction pipeline. Raw high-dimensional traffic features are first filtered to remove labels, timestamps, host identifiers, and protocol-specific shortcut fields, and log-smoothing is applied to reduce scale differences and extreme-value effects. Candidate features are grouped into statistical, behavioral, and protocol-related categories, scored by the mutual-information–variance joint score, and ranked to select the top-
First, the numerical fields are retained from the original traffic features, while the label column, time column, host identification fields, and sequence-number-like fields that may introduce identity shortcuts are removed. Meanwhile, the protocol-specific fields of MQTT, Hypertext Transfer Protocol (HTTP), Modbus, Domain Name System (DNS), Address Resolution Protocol (ARP), and Internet Control Message Protocol (ICMP) are masked, and only transferable low-level statistical and behavioral features are retained. Subsequently, logarithmic smoothing is performed on the candidate numerical features:
which compresses scale differences and reduces the influence of extreme values.
Second, the candidate features are classified into three semantic buckets, namely statistics, behavior, and protocol, based on the field semantics. A joint scoring strategy of mutual information and variance is adopted to describe their category discrimination ability and information activity. For the
where
Finally, features are pre-selected within each semantic bucket according to the joint score, and global ranking is then used for supplementation and trimming to obtain the low-dimensional feature subset
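The scoring and ranking steps described above can be sketched as follows. This is only an illustration under stated assumptions: the smoothing is taken to be a signed log1p transform, the mutual information is a plug-in estimate over equal-width bins, and the joint score is assumed to be a convex mix of max-normalized mutual information and variance with a hypothetical weight `alpha`. The per-bucket pre-selection is omitted for brevity, and the feature names are invented for the example.

```python
import math
from collections import Counter

def log_smooth(x):
    # Assumed form: signed log1p; the exact smoothing formula is not reproduced here.
    return math.copysign(math.log1p(abs(x)), x)

def mutual_information(values, labels, bins=4):
    """Plug-in MI (nats) between an equal-width-binned feature and class labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant features fall into a single bin
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    return sum((c / n) * math.log(c * n / (px[b] * py[y])) for (b, y), c in joint.items())

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_top_k(features, labels, k, alpha=0.5):
    """Rank features by the assumed joint score and keep the top k names."""
    smoothed = {name: [log_smooth(v) for v in col] for name, col in features.items()}
    mi = {name: mutual_information(col, labels) for name, col in smoothed.items()}
    var = {name: variance(col) for name, col in smoothed.items()}
    mi_max = max(mi.values()) or 1.0
    var_max = max(var.values()) or 1.0
    score = {name: alpha * mi[name] / mi_max + (1 - alpha) * var[name] / var_max
             for name in smoothed}
    return sorted(score, key=score.get, reverse=True)[:k]

# Hypothetical candidate features: one discriminative, one active but
# uninformative, one constant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "pkt_len": [60, 62, 64, 66, 1400, 1500, 1450, 1480],
    "inter_arrival": [1, 100, 1, 100, 1, 100, 1, 100],
    "const_flag": [5, 5, 5, 5, 5, 5, 5, 5],
}
print(select_top_k(features, labels, k=2))
```

The constant feature receives a zero score on both terms and is dropped, while the discriminative feature ranks first because it scores on both mutual information and variance.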
3.2.2 TrustMLP: A Lightweight Trust Assessment Model
After obtaining the low-dimensional feature representation, a lightweight MLP, named TrustMLP, is constructed as the source-domain trust evaluator. Let the model parameters be denoted by

Figure 3: TrustMLP architecture for task-level trust-score estimation. The selected
Given the input feature
Here,
3.2.3 Offline Training with Label Smoothing
If the standard cross-entropy loss is used directly for source-domain supervised training, the model tends to output probabilities very close to 0 or 1 in the later stages of training, which leads to overconfidence [20]. Because the statistical baseline is later built from predictive entropy, such overconfidence would concentrate the entropy values of normal samples in the low range and weaken the sensitivity of the entropy distribution to domain shift. Therefore, a label smoothing strategy is introduced in the source-domain offline training stage.
Let the number of classes be
Here,
Based on this, offline training is completed by minimizing the cross-entropy between the predicted distribution and the soft label distribution. Before the features are input into the model, logarithmic compression and standardization processing are still adopted to eliminate the scale differences among different features. After the offline training is completed, the model parameters with the best performance on the validation set are selected as the source-domain reference model
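As a minimal sketch of the training objective described above, the following code builds a soft label and evaluates the cross-entropy against it. It assumes the standard label-smoothing form, in which the true class receives 1 − ε plus a uniform share ε/K; the function names and the smoothing value 0.1 are illustrative choices, not the paper's configuration.

```python
import math

def smooth_labels(true_class, num_classes, epsilon):
    """Soft target: (1 - eps) on the true class plus eps / K spread uniformly."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon if c == true_class else 0.0) + uniform
            for c in range(num_classes)]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, soft_target):
    """Cross-entropy between the soft target and the softmax of the logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs))

target = smooth_labels(true_class=1, num_classes=2, epsilon=0.1)
overconfident = soft_cross_entropy([-10.0, 10.0], target)
moderate = soft_cross_entropy([-1.5, 1.5], target)
print(f"overconfident loss = {overconfident:.3f}, moderate loss = {moderate:.3f}")
```

Both logit vectors classify the sample correctly, but the extreme one incurs a clearly larger loss under the soft target. This is the effect that keeps normal-sample entropies from collapsing toward zero.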
3.2.4 Construction of the Source-Domain Normal-Entropy Baseline
To build the normal reference baseline required for online shift detection in the target domain, predictive entropy is computed only over the normal source-domain sample set
Based on the entropy set in Eq. (15), the corresponding empirical distribution function is still defined by Eq. (8).
This distribution characterizes the typical uncertainty structure of the model under normal behavior and serves as the reference for measuring the degree of distribution shift in the target domain during the online phase. The entropy baseline is derived only from normal source-domain samples and excludes attack samples, so that abnormal samples do not contaminate the normal reference distribution.
3.3 Risk-Constrained Online Trust Maintenance Mechanism
3.3.1 Calibration with a Target-Domain Normal Window
To enhance the statistical matching in the early stage of deployment, this paper introduces an explicit deployment assumption: a short controlled trial period containing normal target-domain traffic is available at the beginning of deployment. The corresponding target-domain normal calibration window is denoted by
where
In practical IIoT deployment, such a startup window can be collected during system commissioning, scheduled maintenance, device restart, or a short trusted initialization period in which the production process operates under known normal conditions. This assumption does not require attack labels and does not expose target-domain class labels to training or online updates. If a clean startup window cannot be guaranteed, the system should operate in a conservative mode by disabling online updates until operator-confirmed normal traffic is available. The startup-window robustness analysis in Section 4.7 further evaluates this assumption under limited and contaminated calibration data.
At deployment initialization, the collected
Based on
where
The online model is initialized as
Meanwhile, in the low-dimensional feature space, the mean vector and covariance matrix of the target-domain normal reference are estimated from
and
These statistics are used later for sample-level protection in high-risk stages.
Next,
In the main setting,
For the
where
Accordingly, the volatility baseline of the target-domain normal window is estimated by
In addition, based on the Mahalanobis-distance distribution of samples in
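The startup-window statistics used for sample-level protection can be sketched as below: a mean vector and covariance matrix are estimated from a clean calibration window, and a distance threshold is read off the empirical Mahalanobis-distance distribution. The ridge term, the nearest-rank quantile, the 0.99 level, and the two-dimensional toy window are assumptions made for illustration.

```python
import math

def mean_and_cov(rows, ridge=1e-6):
    """Sample mean and covariance; a small ridge keeps the matrix invertible."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    return mu, cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(x, mu, cov):
    diff = [xi - mi for xi, mi in zip(x, mu)]
    quad = sum(d * s for d, s in zip(diff, solve(cov, diff)))
    return math.sqrt(max(0.0, quad))

def quantile(values, q):
    """Nearest-rank empirical quantile."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))]

# Hypothetical clean startup window in a 2-D feature space.
window = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mu, cov = mean_and_cov(window)
window_dist = [mahalanobis(x, mu, cov) for x in window]
strong_thr = quantile(window_dist, 0.99)  # assumed quantile level
print(mahalanobis([5.0, 5.0], mu, cov) > strong_thr)
```

A sample far from the calibrated normal region exceeds the threshold and would be treated as a strong outlier in the high-risk protection stage.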
3.3.2 Entropy-Distribution Shift Detection
In the online stage, target-domain traffic arrives batch by batch. For the current batch defined in Eq. (7), the predictive entropy of each sample is first computed using the current online model
When
3.3.3 SafeBrake Risk Gatekeeping
Relying solely on entropy-distribution shift detection may still be affected by local noisy samples and short-term abnormal fluctuations. To improve decision robustness, we further introduce a batch-volatility statistic. Let the log-domain value of the
where
Based on the normal volatility baseline
where
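To make the three-state decision concrete, the sketch below combines an entropy-shift value with a batch-volatility value. The decision rule, the volatility statistic (mean per-feature standard deviation of log-domain values), and all threshold names are assumptions chosen for illustration; only the three states and their associated actions follow the description in this paper.

```python
import math
from enum import Enum

class RiskState(Enum):
    NO_DRIFT = "no_drift"
    LOW_RISK = "low_risk_drift"
    HIGH_RISK = "high_risk_drift"

# Action taken for each state, as described in Section 2.3.
ACTIONS = {
    RiskState.NO_DRIFT: "evaluate with the online model; keep the model state",
    RiskState.LOW_RISK: "update BN affine parameters only",
    RiskState.HIGH_RISK: "freeze updates; use the AdaBN-calibrated anchor model",
}

def batch_volatility(batch):
    """Mean per-feature standard deviation of log-domain values (assumed statistic)."""
    n, d = len(batch), len(batch[0])
    total = 0.0
    for j in range(d):
        col = [math.log1p(abs(row[j])) for row in batch]
        mean = sum(col) / n
        total += math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return total / d

def safebrake(ks_value, volatility, ks_threshold, volatility_baseline, ratio_limit):
    """Hypothetical gate: no drift, then low or high risk by relative volatility."""
    if ks_value <= ks_threshold:
        return RiskState.NO_DRIFT
    if volatility <= ratio_limit * volatility_baseline:
        return RiskState.LOW_RISK
    return RiskState.HIGH_RISK

state = safebrake(ks_value=0.4, volatility=3.0, ks_threshold=0.2,
                  volatility_baseline=1.0, ratio_limit=1.5)
print(state.value, "->", ACTIONS[state])
```

Separating the shift test from the volatility test lets a benign but stable shift trigger maintenance, while a shift accompanied by unusually volatile traffic is routed to the protective path.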
3.3.4 Constrained Online Update and Budgeted Selective Protection
When the current batch is determined to have a low-risk deviation, we adopt a restricted online update strategy to improve adaptation to target-domain data. Specifically, only the learnable affine parameters of the BN layers are updated, while all other weights are frozen. Let the set of learnable BN parameters at time
where
The entropy-quantile alignment term is defined as
where
The attack-ratio prior constraint is defined as
where
The BN-parameter regularization term is defined as
where
Accordingly, the BN-parameter update under low-risk conditions is written as
where
When the current batch is judged as a high-risk shift, online updating is suspended, i.e.,
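The restricted update can be illustrated with a toy model that has one BN channel followed by a frozen sigmoid head. The three loss terms below are assumed stand-ins for the entropy-quantile alignment, attack-ratio prior, and BN regularization terms: squared quantile gaps, a squared gap between the mean predicted attack probability and a prior, and a squared distance to the anchor BN parameters. Gradients are taken by central finite differences purely to keep the sketch dependency-free; all names, weights, and values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_entropy(p):
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def attack_probs(bn, hidden, head=(2.0, -1.0)):
    """Frozen head applied to the BN output; only bn = (gamma, beta) is trainable."""
    gamma, beta = bn
    return [sigmoid(head[0] * (gamma * h + beta) + head[1]) for h in hidden]

def quantiles(values, levels):
    ordered = sorted(values)
    return [ordered[min(len(ordered) - 1, int(q * len(ordered)))] for q in levels]

def maintenance_loss(bn, hidden, ref_q, bn_anchor, prior, levels, w_prior, w_reg):
    probs = attack_probs(bn, hidden)
    cur_q = quantiles([binary_entropy(p) for p in probs], levels)
    align = sum((c - r) ** 2 for c, r in zip(cur_q, ref_q)) / len(levels)
    prior_term = (sum(probs) / len(probs) - prior) ** 2
    reg = sum((a - b) ** 2 for a, b in zip(bn, bn_anchor))
    return align + w_prior * prior_term + w_reg * reg

def bn_step(bn, lr, eps=1e-4, **kw):
    """One gradient step on (gamma, beta) using central finite differences."""
    grad = []
    for i in range(len(bn)):
        up, dn = list(bn), list(bn)
        up[i] += eps
        dn[i] -= eps
        grad.append((maintenance_loss(up, **kw) - maintenance_loss(dn, **kw)) / (2 * eps))
    return [p - lr * g for p, g in zip(bn, grad)]

def maintain(bn, state, lr, **kw):
    """Update only under low-risk drift; otherwise keep the BN parameters frozen."""
    return bn_step(bn, lr, **kw) if state == "low_risk" else list(bn)

config = dict(hidden=[-1.2, -0.8, -0.5, -0.1, 0.05, 0.3, 0.6, 0.9],
              ref_q=[0.30, 0.40, 0.50], bn_anchor=[1.0, 0.0], prior=0.1,
              levels=(0.25, 0.5, 0.75), w_prior=1.0, w_reg=0.1)
bn0 = [1.0, 0.0]
bn1 = maintain(bn0, "low_risk", lr=0.1, **config)
print(maintenance_loss(bn0, **config), maintenance_loss(bn1, **config))
```

Because only `gamma` and `beta` are passed to the optimizer step, the head and all other weights cannot change, and a high-risk state returns the parameters untouched.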
Under high-risk conditions, the system enters a protection mode. Based on the calibrated anchor model
where
For strongly outlying samples, the anchor-model output is directly used instead. Let the strong-outlier set be
where
For the remaining samples, a budgeted selective-fallback mechanism is further constructed. In the final configuration, only samples predicted as normal by the online model and not identified as strong outliers are considered for arbitration. The candidate set is therefore defined as
To measure the necessity of correcting the current sample by the anchor model, the following score is defined:
where
The fallback-eligible subset and the selective-fallback budget are then defined as
where
Accordingly, the selective-fallback set is defined as
where
Under the protection mechanism, the final attack risk is defined as
and the protected task-level trust score and class output are written as
Here,
This mechanism ensures that strongly anomalous samples are preferentially protected by the anchor model, while only a limited number of high-risk samples with model disagreement are selectively corrected under a fixed budget. In this way, the spread of erroneous self-adaptation can be suppressed while the online model still retains its capability to recognize anomalous behavior. For ease of implementation, the startup-window calibration, entropy-shift detection, risk judgment, and constrained maintenance procedures are summarized in Algorithm 1.
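The protection logic described above can be sketched as follows. Strong outliers take the anchor output directly, and at most a budgeted number of the remaining samples that the online model predicts as normal fall back to the anchor. The correction score (anchor risk minus online risk), the eligibility threshold, the floor-based budget, and the mapping of trust to one minus risk are assumptions made for illustration.

```python
import math

def protected_risk(online, anchor, dist, strong_thr, score_thr, budget_ratio,
                   decision_thr=0.5):
    """Return final risks plus the strong-outlier and selective-fallback index sets."""
    n = len(online)
    # Strong outliers: distance to the target-normal reference exceeds the threshold.
    strong = {i for i in range(n) if dist[i] > strong_thr}
    # Candidates: predicted normal by the online model and not strong outliers.
    candidates = [i for i in range(n) if i not in strong and online[i] < decision_thr]
    # Assumed necessity score: how much more risk the anchor sees than the online model.
    score = {i: anchor[i] - online[i] for i in candidates}
    eligible = [i for i in candidates if score[i] > score_thr]
    budget = math.floor(budget_ratio * n)
    fallback = set(sorted(eligible, key=score.get, reverse=True)[:budget])
    use_anchor = strong | fallback
    risk = [anchor[i] if i in use_anchor else online[i] for i in range(n)]
    return risk, strong, fallback

# Hypothetical per-sample risks and distances for one high-risk batch.
online = [0.1, 0.2, 0.3, 0.9, 0.05, 0.4]
anchor = [0.8, 0.25, 0.9, 0.95, 0.7, 0.45]
dist = [1.0, 1.0, 1.0, 1.0, 9.0, 1.0]

risk, strong, fallback = protected_risk(online, anchor, dist, strong_thr=5.0,
                                        score_thr=0.3, budget_ratio=0.2)
trust = [1.0 - r for r in risk]          # assumed trust mapping
labels = [int(r >= 0.5) for r in risk]   # 1 = attack
print(risk, sorted(strong), sorted(fallback), labels)
```

With a budget of one sample, only the candidate with the largest disagreement is corrected. The second eligible sample keeps its online risk, which shows how the budget limits how far the anchor can override the online model.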

4 Experiments and Results Analysis
4.1 Experimental Environment and Datasets
4.1.1 Experimental Scenario and Cross-Domain Task
The experiments are conducted on the DNN-EdgeIIoT-dataset.csv file from the Edge-IIoTset benchmark proposed by Ferrag et al. [21]. This dataset is collected from network traffic in IIoT environments and contains both normal-behavior samples and multiple categories of attack samples, thereby providing a rich set of traffic features for edge-side security evaluation. In this study, the binary label Attack_label provided by the dataset is adopted as the supervision signal, where Normal is encoded as 0 and Attack is encoded as 1. The attack-type label Attack_type is used only for cross-domain task construction, statistical analysis, and result presentation, and is not involved in model input or online parameter updates. The overall statistics of the dataset are summarized in Table 2.

This paper focuses on the cross-domain behavior distribution shift problem faced by IIoT edge nodes under continuous online streaming conditions. Unlike protocol-only domain definitions, this study defines the cross-domain task as a compound-shift scenario involving both device communication relationships and attack types. In this setting, the normal behavior patterns and attack compositions differ between the source and target domains. Data division and online stream construction are presented in Section 4.1.2, and feature configuration is presented in Section 4.1.3.
4.1.2 Data Construction and Streaming Evaluation Settings
To ensure fair comparisons across methods, ablation experiments, and parameter-sensitivity experiments, data partitioning, independent startup-window construction, and three-phase online stream pre-generation are all performed once under fixed random seeds. Except for the variables under investigation, all experiments reuse the same data partitions and stream sequences. The source-domain training set consists of source-domain normal traffic and source-domain attack samples drawn at a fixed attack ratio. The target-domain normal and attack traffic are split at the group level into validation and test sets at a 1:1 ratio to avoid instance-level leakage. The resulting dataset compositions are summarized in Tables 3 and 4.


Before online evaluation, an independent subset

4.1.3 Feature Preprocessing and Input Feature Configuration
After data construction, the raw traffic features are uniformly preprocessed and configured, and all methods share the same preprocessing pipeline and input feature space. For numerical features, the log-compression strategy described in Section 3.2.1 is applied. The standardization parameters are estimated only on the source-domain training set. They are then fixed and applied to the source-domain validation set, target-domain validation set, target-domain test set, the startup window
To reduce the model dependence on protocol identifiers, explicit identity information, and protocol-specific fields, the experiments remove the label field, time field, communication-object identifier fields, and various protocol-specific fields, while retaining only general statistical features, behavioral features, and transport-layer-related features. On this basis, following the LoFT-IIoT feature-selection strategy, candidate features are screened on the source-domain training set and the final input dimensionality is determined. Unless otherwise specified, the main experiments adopt

The use of eight retained features reflects a trade-off among cross-domain robustness, lightweight edge-side deployment, and avoidance of shortcut learning; it does not assume that eight features are universally sufficient or optimal for all IIoT scenarios. The retained features cover complementary low-level behavioral evidence, including transport-layer integrity, port and connection patterns, packet-control behavior, packet-length statistics, temporal behavior, connection-state information, and flow-level session association. These categories jointly characterize packet control, timing, length, and connection behavior that remain meaningful under heterogeneous service or protocol shifts. The eight-dimensional representation therefore provides sufficient behavioral evidence for the evaluated cross-domain dynamic trust task while keeping the TrustMLP model and online BN maintenance lightweight, and it should be read as a conservative low-dimensional configuration for the studied evaluation setting.
4.1.4 Evaluation Metrics and Experimental Environment
To comprehensively evaluate the proposed method under cross-domain continuous online streams, the evaluation criteria are organized into four aspects, namely, overall classification performance, phase-wise streaming behavior, trust-score calibration, and resource overhead. Specifically, the overall classification metrics are used to measure the general detection capability of the model; the phase-wise metrics are introduced to characterize the dynamic behavior of the model during the initial deployment stage, the perturbation stage, and the recovery stage; the calibration metrics are used to evaluate the probabilistic interpretability of the task-level trust score; and the resource-overhead metrics are adopted to assess the feasibility of the proposed method under edge-side deployment conditions. The definitions of all evaluation metrics are summarized in Table 7, while the experimental environment and the main parameter settings are listed in Table 8.
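As a minimal sketch of the phase-wise classification metrics, the code below computes TPR, FPR, F1-score, and accuracy separately for each stream phase, using the standard confusion-matrix definitions. The phase tags and the toy labels are hypothetical and do not reproduce any reported result.

```python
def binary_metrics(y_true, y_pred):
    """TPR, FPR, F1 and accuracy with attack encoded as 1 and normal as 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "F1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "ACC": (tp + tn) / len(y_true) if y_true else 0.0,
    }

def phase_metrics(y_true, y_pred, phases):
    """Group samples by phase tag (e.g. P1, P2, P3) and score each group."""
    result = {}
    for name in sorted(set(phases)):
        idx = [i for i, ph in enumerate(phases) if ph == name]
        result[name] = binary_metrics([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    return result

# Toy stream: a normal phase followed by a mixed perturbation phase.
y_true = [0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
phases = ["P1"] * 4 + ["P2"] * 10

for name, scores in phase_metrics(y_true, y_pred, phases).items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Reporting the metrics per phase separates false alarms raised during normal operation from detection quality during the perturbation stage, which a single stream-level score would blur.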


4.1.5 External X-IIoTID Validation Protocol
To evaluate generalization beyond Edge-IIoTset, we additionally conduct external validation on the X-IIoTID dataset [22]. Unlike the Edge-IIoTset main experiment, the X-IIoTID experiment adopts a service-holdout cross-domain protocol. The source domain contains multiple source services, while the target domain consists of disjoint Modbus, MQTT, and WebSocket services. The target perturbation stage further includes multiple attack families, including false data injection, MQTT cloud broker subscription, Modbus register reading, scanning vulnerability, and fuzzing. This setting is used to evaluate whether RaL-TTA can maintain dynamic trust evaluation capability under external cross-service distribution shift.
4.2 Comparison of Cross-Domain Detection Performance
To compare the detection performance of different online adaptation strategies under cross-domain distribution shifts, we evaluate Source-Only, AdaBN-only [13], TENT [16], POEM [18], POEM+SafeBrake, and RaL-TTA under the unified experimental protocol described above. The results are reported as mean

As shown in Table 9, Source-Only and AdaBN-only still suffer from high false-positive rates in the target domain, indicating that merely relying on the source-domain model or simple BN-statistics recalibration is insufficient to mitigate cross-domain mismatch. TENT and POEM behave unstably under continuous unlabeled streams, suggesting that online updates without explicit risk constraints are vulnerable to attack-contaminated drift. In contrast, RaL-TTA achieves a strong perturbation-stage trade-off against the external TTA baselines: it maintains a P2_TPR of 1.0000, reduces P2_FPR to 0.0410, and achieves a P2_F1 of 0.9544. The internal POEM+SafeBrake control obtains slightly lower P2_FPR and higher P2_F1 in this controlled stream, but it does not include the full task-level trust-score formulation and budgeted sample-level safeguard used by RaL-TTA. Therefore, the results should be interpreted as showing that risk-aware gating is essential for safe online maintenance, while the full RaL-TTA framework provides a conservative trust-evaluation design with only a small raw-performance cost relative to the strongest internal control.
Fig. 4 further visualizes the online behavior of different methods. Source-Only, TENT, and POEM show unstable or low rolling accuracy under the target-domain stream, whereas the risk-protected POEM-based variants maintain more stable trajectories. RaL-TTA preserves high rolling accuracy during the perturbation and recovery phases, which is consistent with the phase-wise metrics in Table 9. These trajectories further support the role of risk-aware gating and protection in stabilizing online trust evaluation under attack-contaminated target streams.

Figure 4: Sliding-window accuracy trajectories of different online adaptation methods in the three-phase continuous online stream. Each curve reports the mean accuracy over five pre-built target streams with a rolling window of 256 samples, and the shaded region indicates one standard deviation. The vertical dashed lines mark the transitions from the P1 normal phase to the P2 mixed-perturbation phase and from the P2 phase to the P3 normal recovery phase.
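The sliding-window curves described in the caption can be reproduced in outline as follows. This is a sketch under stated assumptions: per-sample correctness is given as 0/1 flags, the window is trailing, and the population standard deviation is used across streams. The caption does not specify the last two choices, and the function names are illustrative.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_accuracy(correct_flags, window=256):
    """Return the accuracy over the trailing `window` samples at each step."""
    buf = deque()
    running = 0
    curve = []
    for flag in correct_flags:
        buf.append(flag)
        running += flag
        if len(buf) > window:
            running -= buf.popleft()
        curve.append(running / len(buf))
    return curve


def aggregate_streams(per_stream_flags, window=256):
    """Mean and standard deviation of rolling accuracy across streams."""
    curves = [rolling_accuracy(flags, window) for flags in per_stream_flags]
    steps = min(len(curve) for curve in curves)
    means = [mean([curve[t] for curve in curves]) for t in range(steps)]
    stds = [pstdev([curve[t] for curve in curves]) for t in range(steps)]
    return means, stds
```

Passing the five per-stream flag sequences to `aggregate_streams` gives the mean curve and the band width at each sample index; the first 255 points average over fewer than 256 samples because the window is still filling.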
4.3 External Validation on X-IIoTID
To evaluate generalization beyond Edge-IIoTset, we further conduct external validation on X-IIoTID under the service-holdout protocol described in Section 4.1.5. Table 10 reports the overall representative results under the strong perturbation setting. Compared with Source-Only, RaL-TTA reduces P2_FPR and P3_FPR while maintaining meaningful attack recall. The external setting is more challenging than the Edge-IIoTset main setting because the target services and attack families are held out from the source domain.

The X-IIoTID results show that general cross-domain TTA methods can produce very different trade-offs. For example, TENT obtains very low false-positive rates but loses almost all attack recall in Phase 2. POEM-based variants achieve competitive P2_F1, whereas RaL-TTA provides lower post-perturbation false positives. Therefore, RaL-TTA should be understood as a risk-aware trust-maintenance framework that balances attack detection, false-alarm suppression, and post-perturbation recovery, rather than as a method that maximizes every individual metric.
Table 11 further reveals that the X-IIoTID service-holdout setting is challenging. RaL-TTA shows relatively stable behavior on MQTT-related target traffic, while Modbus exhibits larger seed-level variance and WebSocket remains difficult due to the mixture of normal and attack samples. These findings provide a fine-grained view of cross-service generalization and suggest that service-specific calibration remains an important direction for future work. Table 12 reports the per-attack-family recall of RaL-TTA under the X-IIoTID service-holdout setting.


The per-attack-family results indicate that different attack types have different degrees of cross-domain difficulty. MQTT cloud broker subscription attacks are detected more reliably, whereas false data injection, fuzzing, and scanning-related attacks remain more challenging. This observation is consistent with the difficulty of unlabeled external cross-service adaptation and is acknowledged as a limitation of the current framework.
4.4 Analysis of Key Mechanisms
4.4.1 Ablation Study of Key Modules
To analyze the main sources of performance improvement during the perturbation stage, we construct three representative ablation settings: removing anchor protection (RaL-TTA w/o Anchor), removing the budgeted sample-level safeguard/rollback (BSR; RaL-TTA w/o BSR), and always freezing updates (AlwaysFreeze). The results are summarized in Table 13.

Table 13 shows that simply freezing updates cannot effectively handle cross-domain abnormal perturbations because AlwaysFreeze preserves attack recall but causes a high false-positive rate and poor recovery. The variant without anchor protection is close to the POEM+SafeBrake control, indicating that SafeBrake-style risk gating is the dominant source of false-positive control in the Edge-IIoTset stream. The comparison between RaL-TTA and RaL-TTA w/o BSR further shows that budgeted sample-level rollback has only a limited numerical effect under the main setting. Thus, the ablation study supports a conservative interpretation: risk gating is the primary stabilization mechanism, whereas AdaBN anchor protection and budgeted rollback provide additional safety boundaries for the full trust-evaluation framework rather than serving as the sole source of raw metric gains.
4.4.2 Analysis of Online Maintenance Behavior
To further illustrate how RaL-TTA operates in the main experiments, Table 14 summarizes the numbers of batches under different risk states and the corresponding maintenance actions across the three-phase online stream.

Table 14 shows that all 12 batches in the perturbation stage are judged as high-risk and therefore enter the protection mode, with no online updates being executed. This indicates that the proposed method prioritizes freezing unreliable adaptation when attack traffic is mixed into the stream. In contrast, updates occur mainly in low-risk batches during the initial normal phase and the recovery phase, showing that RaL-TTA follows a selective online strategy of freezing at high risk and maintaining at low risk.
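The freeze-at-high-risk, maintain-at-low-risk behavior can be sketched as a per-batch decision. The sketch below reduces the risk gate to a single two-sample Kolmogorov-Smirnov statistic between a reference entropy sample and the current batch's entropy values. This is a deliberate simplification: the full SafeBrake logic is not described in this section, and the threshold of 0.3 and the function names are hypothetical.

```python
import bisect


def ks_statistic(reference, batch):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, bat = sorted(reference), sorted(batch)
    gap = 0.0
    for x in ref + bat:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_bat = bisect.bisect_right(bat, x) / len(bat)
        gap = max(gap, abs(cdf_ref - cdf_bat))
    return gap


def maintenance_action(reference_entropy, batch_entropy, ks_threshold=0.3):
    """Return 'update' for low-risk batches and 'freeze' for high-risk ones.

    ks_threshold is a hypothetical value, not the paper's setting.
    """
    drift = ks_statistic(reference_entropy, batch_entropy)
    return "freeze" if drift > ks_threshold else "update"
```

In a stream loop, an `update` decision would trigger the lightweight BN-parameter update, while a `freeze` decision would skip the update and route the batch to anchor-based protective inference.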
4.4.3 Analysis of Trust-Score Calibration Ability
To verify that the task-level trust scores produced by RaL-TTA are probabilistically interpretable, we further assess calibration using expected calibration error (ECE), Brier score, and negative log-likelihood (NLL). Table 15 compares the trust-score reliability of RaL-TTA and representative baselines on the target-domain online stream.

Table 15 indicates that Source-Only suffers from large calibration errors after direct transfer to the target domain. During the perturbation stage, RaL-TTA and POEM+SafeBrake both provide substantially better calibration than Source-Only, with POEM+SafeBrake slightly lower on the three calibration metrics. Over the full stream, however, RaL-TTA obtains lower ECE, Brier score, and NLL, suggesting that the full protection-oriented trust-score output improves overall probabilistic reliability across deployment, perturbation, and recovery.
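For reference, the three calibration metrics can be computed as in the following sketch for a binary task, where `probs` are predicted attack probabilities and `labels` are 0/1 ground truth. The use of ten equal-width confidence bins and predicted-class confidence for ECE is an assumed convention; the binning used in the experiments is not stated in this section.

```python
import math


def brier_score(probs, labels):
    """Mean squared error between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Average binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE over predicted-class confidence with equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            avg_acc = sum(a for _, a in members) / len(members)
            ece += len(members) / len(probs) * abs(avg_conf - avg_acc)
    return ece
```

Lower values are better for all three metrics. ECE measures the gap between stated confidence and observed accuracy, whereas the Brier score and NLL also penalize confident errors on individual samples.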
4.5 Hyperparameter Selection and Sensitivity Analysis
To clarify how the key thresholds and online-adaptation parameters are selected, Table 16 summarizes the candidate values, selected values, and selection rules. The parameters are selected through a combination of validation-based tuning, startup-window quantile calibration, and conservative security-budget constraints.

Table 17 shows that RaL-TTA remains stable under moderate variations of

4.6 Computational Overhead Analysis
To compare the relative runtime cost of different methods, we conduct an overhead evaluation on a unified platform with a pre-built online stream. This subsection uses the test split under the pure-attack pressure setting


Figure 5: Overhead comparison of different online adaptation methods under a unified hardware and software platform with a pre-built online stream. The figure reports average batch latency, throughput, peak memory usage, and the number of online trainable parameters. Error bars indicate one standard deviation over three repeated measurements.
From Table 18 and Fig. 5, RaL-TTA uses 192 online-trainable parameters, with a total runtime of 3.1044 s, an average batch latency of 59.70 ms/batch, a throughput of 4188.37 samples/s, and a peak memory usage of 274.84 MB. Its overhead is higher than Source-Only and basic TTA baselines because it performs risk gating and protected inference, but the cost remains bounded and the number of online trainable parameters is unchanged at 192. These results support the feasibility of lightweight edge-side deployment while also clarifying that the additional safety mechanisms introduce a modest runtime and memory cost.
4.7 Startup-Window Robustness Analysis
The target-domain normal startup window is important for AdaBN calibration, target-domain normal-reference estimation, and sample-level threshold initialization. To evaluate the feasibility and limitation of this assumption, we test RaL-TTA under different startup-window sizes and contamination rates. This robustness experiment uses three seeds and independently resampled startup windows; therefore, the clean


Figure 6: Startup-window robustness under contaminated startup windows. Curves report the perturbation-stage F1-score under different startup contamination rates for
The results in Table 19 and Fig. 6 show that RaL-TTA performs reliably when the startup window is clean, even with 256 normal samples. With a sufficiently large startup window, the method also tolerates mild contamination. However, small contaminated windows or heavily contaminated startup data degrade calibration reliability. This indicates that the normal startup window is a practical but nontrivial deployment assumption, and further robust initialization under contaminated startup conditions remains future work.
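The sensitivity to startup contamination can be illustrated with a small sketch of quantile-based threshold initialization. It assumes that each startup sample has a scalar risk score and that the sample-level threshold is set to an empirical quantile of those scores; the 0.95 quantile, the nearest-rank rule, and the function names are illustrative assumptions rather than the paper's settings.

```python
import math
import random


def quantile_threshold(scores, q=0.95):
    """Empirical q-quantile of startup-window risk scores (nearest-rank rule)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def contaminated_window(normal_scores, attack_scores, size, rate, seed=0):
    """Sample a startup window in which a fraction `rate` comes from attacks."""
    rng = random.Random(seed)
    n_attack = round(size * rate)
    window = rng.sample(normal_scores, size - n_attack)
    window += rng.sample(attack_scores, n_attack)
    return window
```

Once the contamination rate exceeds the tail mass left above the quantile (5% in this sketch), the threshold is set by attack scores rather than normal ones, so later attacks fall below it. This is one plausible reading of why heavily contaminated startup data degrades calibration reliability.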
4.8 Discussion and Limitations
First, the trust score in this work is defined from a security-risk perspective. It is suitable for real-time edge-side security monitoring, but does not cover all dimensions of general trust management, such as long-term reputation, social interaction history, resource reliability, or quality-of-service evaluation. Second, RaL-TTA assumes that a short normal startup window is available for unsupervised calibration. Although this assumption is realistic during commissioning, maintenance restart, or trusted initialization, heavily contaminated startup data may weaken anchor construction and threshold estimation. Third, although this study includes external validation on X-IIoTID, both Edge-IIoTset and X-IIoTID are still public benchmark datasets. Real long-term industrial deployments may involve more complex temporal drift, device heterogeneity, and unseen attack behaviors. Fourth, false-positive control remains important for practical edge security systems. Although the main Edge-IIoTset setting reduces the perturbation-stage FPR to 0.0410, deployment-time alert fatigue still needs to be considered. In deployment, the trust score can be combined with multi-window smoothing, alert aggregation, or operator-confirmed escalation to reduce unnecessary alarms. Finally, the ablation results indicate that the conservative rollback branch has limited numerical effect under the main stream, and therefore more adaptive criteria for when to activate sample-level protection deserve further study.
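The alert-aggregation idea mentioned above can be sketched as a k-of-n rule over consecutive windows. The sketch assumes that a lower trust score means higher risk and that one trust score is produced per window; the threshold of 0.5 and the values k=3 and n=5 are hypothetical.

```python
from collections import deque


def k_of_n_alerts(trust_scores, threshold=0.5, k=3, n=5):
    """Escalate only when at least k of the last n windows look untrusted."""
    recent = deque(maxlen=n)
    escalations = []
    for score in trust_scores:
        recent.append(score < threshold)
        escalations.append(sum(recent) >= k)
    return escalations
```

A single low-trust window surrounded by normal ones does not escalate under this rule, which trades a short detection delay for fewer isolated false alarms.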
5 Conclusion
This paper has proposed RaL-TTA, a risk-aware lightweight test-time adaptation (TTA) framework for security-oriented dynamic trust evaluation of IIoT edge nodes under cross-domain online streams. By combining a low-dimensional source-domain trust baseline, KS-based entropy-shift detection, SafeBrake risk gating, AdaBN anchor protection, and budgeted sample-level safeguards, RaL-TTA selectively maintains the online model under low-risk conditions and freezes unsafe adaptation under high-risk attack-contaminated streams. Experiments on Edge-IIoTset demonstrate that RaL-TTA improves perturbation-stage attack detection over general TTA baselines while substantially reducing false positives and maintaining post-perturbation stability. External validation on X-IIoTID further evaluates cross-service generalization across Modbus, MQTT, and WebSocket target services. Additional ablation, sensitivity, startup-window robustness, calibration, and overhead analyses show that the proposed method achieves a favorable balance among detection performance, trust-score reliability, adaptation safety, and edge-side efficiency. Future work will focus on more robust initialization under heavily contaminated startup windows, real edge-hardware deployment, and broader validation across long-term industrial traffic streams.
Acknowledgement: Not applicable.
Funding Statement: This work was supported by the National Natural Science Foundation of China [Grant No. 62102449] and the Science and Technology Research Project of Henan Province [Grant No. 252102211080].
Author Contributions: The authors confirm contribution to the paper as follows: conceptualization, Qiuguo Guan and Zhiyu Ren; methodology, Qiuguo Guan; software and validation, Qiuguo Guan; formal analysis, Qiuguo Guan and Zhiyu Ren; writing—original draft preparation, Qiuguo Guan; writing—review and editing, Qiuguo Guan and Zhiyu Ren; supervision, Zhiyu Ren. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The datasets used in this study are publicly available. Edge-IIoTset and X-IIoTID are available from their public dataset sources cited in the manuscript. The experimental code and processed scripts can be made available from the corresponding author upon reasonable request.
Ethics Approval: Not applicable. This study does not involve human participants, human data, or animal experiments.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Alotaibi B. A survey on industrial Internet of Things security: requirements, attacks, AI-based solutions, and edge computing opportunities. Sensors. 2023;23(17):7470. doi:10.3390/s23177470.
2. Liu DQ, Liang HL, Zeng XJ, Zhang Q, Zhang ZD, Li MH. Edge computing application, architecture, and challenges in ubiquitous power Internet of Things. Front Energy Res. 2022;10:850252. doi:10.3389/fenrg.2022.850252.
3. China Academy of Information and Communications Technology. White paper on Internet of Things (2020) [Internet]. Beijing, China: China Academy of Information and Communications Technology; 2020 [cited 2026 Mar 19]. Available from: http://www.caict.ac.cn/.
4. Ferraris D, Fernandez-Gago C, Roman R, Lopez J. A survey on IoT trust model frameworks. J Supercomput. 2024;80(6):8259–96. doi:10.1007/s11227-023-05765-4.
5. Garagad V, Iyer N. Dynamic trust-based device legitimacy assessment towards secure IoT interactions. J Commun Softw Syst. 2022;18(3):269–76. doi:10.24138/jcomss-2021-0189.
6. Motmi A, Alhazmi S, Abu-Khadrah A, Al-Akhras M, Alhosban F. Trust management in industrial Internet of Things using a trusted E-Lithe protocol. Int J Adv Comput Sci Appl. 2022;13(2):334–45. doi:10.14569/ijacsa.2022.0130239.
7. Jayasinghe U, Lee GM, Um TW, Shi Q. Machine learning based trust computational model for IoT services. IEEE Trans Sustain Comput. 2019;4(1):39–52. doi:10.1109/tsusc.2018.2839623.
8. Duque Anton SD, Sinha S, Schotten HD. Anomaly-based intrusion detection in industrial data with SVM and random forests. In: Proceedings of the 27th International Conference on Software, Telecommunications and Computer Networks (SoftCOM); 2019 Sep 19–21; Split, Croatia. Piscataway, NJ, USA: IEEE; 2019. p. 1–6.
9. Rabanser S, Günnemann S, Lipton ZC. Failing loudly: an empirical study of methods for detecting dataset shift. In: Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019); 2019 Dec 8–14; Vancouver, Canada. Red Hook, NY, USA: Curran Associates, Inc.; 2019. p. 1396–408.
10. Niu SC, Wu JX, Zhang YF, Wen ZQ, Chen YF, Zhao PL, et al. Towards stable test-time adaptation in dynamic wild world. In: Proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023); 2023 May 1–5; Kigali, Rwanda.
11. Chen J, Mao FJ, Lv ZH, Tang JH. EdgeFD: an edge-friendly drift-aware fault diagnosis system for industrial IoT. In: Proceedings of the 2023 IEEE 23rd International Conference on Communication Technology (ICCT); 2023 Oct 13–16; Wuxi, China. Piscataway, NJ, USA: IEEE; 2023. p. 390–6.
12. Wang Q, Fink O, Van Gool L, Dai DX. Continual test-time domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18–24; New Orleans, LA, USA. Piscataway, NJ, USA: IEEE; 2022. p. 7201–11.
13. Li YH, Wang NY, Shi JP, Hou XD, Liu JY. Adaptive batch normalization for practical domain adaptation. Pattern Recognit. 2018;80(3):109–17. doi:10.1016/j.patcog.2018.03.005.
14. Liang J, Hu DP, Feng JS. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In: Proceedings of the 37th International Conference on Machine Learning (ICML); 2020 Jul 13–18; Virtual Event. New York, NY, USA: PMLR; 2020. p. 6028–39.
15. Liang J, He R, Tan T. A comprehensive survey on test-time adaptation under distribution shifts. Int J Comput Vis. 2025;133(1):31–64. doi:10.1007/s11263-024-02181-w.
16. Wang DQ, Shelhamer E, Liu ST, Olshausen BA, Darrell T. TENT: fully test-time adaptation by entropy minimization. In: Proceedings of the 9th International Conference on Learning Representations (ICLR 2021); 2021 May 3–7; Virtual Event.
17. Yuan YG, Xu BB, Hou L, Sun F, Shen HW, Cheng XQ. TEA: test-time energy adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2024 Jun 17–21; Seattle, WA, USA. Piscataway, NJ, USA: IEEE; 2024. p. 23901–11.
18. Bar Y, Shaer S, Romano Y. Protected test-time adaptation via online entropy matching: a betting approach. In: Proceedings of the Advances in Neural Information Processing Systems 37 (NeurIPS 2024); 2024 Dec 10–15; Vancouver, Canada. Red Hook, NY, USA: Curran Associates, Inc.; 2024. p. 85467–99.
19. Guan QG, Ren ZY, Wang QL. LoFT-IIoT: a lightweight trust feature extraction method for industrial Internet of Things. In: Proceedings of the 2025 IEEE 25th International Conference on Communication Technology (ICCT); 2025 Oct 16–18; Shenyang, China. Piscataway, NJ, USA: IEEE; 2025. p. 919–23.
20. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML); 2017 Aug 6–11; Sydney, Australia. New York, NY, USA: PMLR; 2017. p. 1321–30.
21. Ferrag MA, Friha O, Hamouda D, Maglaras L, Janicke H. Edge-IIoTset: a new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access. 2022;10:40281–306. doi:10.1109/access.2022.3165809.
22. Al-Hawawreh M, Sitnikova E, Aboutorab N. X-IIoTID: a connectivity- and device-agnostic intrusion dataset for industrial Internet of Things. IEEE Internet Things J. 2022;9(5):3962–77. doi:10.1109/jiot.2021.3102056.
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

