HiFraud: Hierarchical Privacy-Preserving Federated Learning with Star-Chain Knowledge Transfer for Cross-Institutional Fraud Detection
1 School of Economics and Management, Beijing Jiaotong University, Beijing, China
2 Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
* Corresponding Author: Lei Zhang. Email:
# These authors contributed equally to this work
(This article belongs to the Special Issue: Recent Advances in Malware Detection)
Computers, Materials & Continua 2026, 88(2), 34 https://doi.org/10.32604/cmc.2026.081922
Received 11 March 2026; Accepted 13 April 2026; Issue published 15 June 2026
Abstract
Financial fraud detection across institutions faces a fundamental tension between the need for diverse training data and regulatory prohibitions on sharing sensitive records. Existing federated learning approaches suffer from performance degradation under non-IID distributions and substantial utility losses when uniform differential privacy is applied to inherently sparse fraud signals. To this end, this paper proposes HiFraud, a hierarchical federated framework featuring three key components: fraud-aware dynamic clustering with complementarity regularization to group institutions by fraud pattern similarity while preserving rare-type representation; star-chain knowledge transfer augmented by not-true-class distillation to propagate novel fraud patterns rapidly within clusters while mitigating catastrophic forgetting; and privacy-adaptive aggregation via Rényi differential privacy composition, calibrating noise intensity to distributional divergence and fraud rarity. Experiments on IEEE-CIS, PaySim, and Worldline datasets show that HiFraud achieves an area under the receiver operating characteristic curve (AUC-ROC) of 0.935 under
Keywords
1 Introduction
Financial fraud poses a persistent and escalating threat to banking, e-commerce, and digital payment ecosystems, where rapid anomaly identification is essential to preventing substantial economic losses [1]. In 2024, the U.S. Federal Trade Commission reported $12.5 billion in consumer fraud losses, representing a 25% increase over the prior year, with investment scams alone accounting for $5.7 billion [2]. This growth has intensified the demand for detection systems capable of identifying diverse and evolving fraud patterns across institutional boundaries. However, privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) strictly prohibit the sharing of raw transaction data between organizations [3]. This creates a fundamental tension: effective fraud detection requires large-scale, heterogeneous datasets, yet the very data needed to build robust models cannot leave institutional boundaries.
Federated learning (FL) has emerged as a promising paradigm for enabling collaborative model training without centralizing sensitive data [4]. By allowing institutions to jointly optimize a shared model through the exchange of model parameters rather than raw data, FL offers a potential resolution to the privacy–utility dilemma in fraud detection. Nevertheless, the direct application of standard FL algorithms to cross-institutional fraud detection introduces two compounding challenges. First, different institutions face fundamentally distinct fraud types—credit card skimming at banks, account takeovers at e-commerce platforms, and money laundering at payment processors—resulting in extreme non-IID (non-independent and identically distributed) data distributions that violate the convergence assumptions underlying standard federated algorithms [5]. Second, fraudulent transactions are inherently rare events, with some institutions reporting fraud rates below 0.1%, rendering isolated local training insufficient to learn discriminative representations for minority-class patterns [6]. The simultaneous presence of distributional heterogeneity and extreme class imbalance demands architectural innovations beyond incremental adaptations of existing methods.
1.1 Federated Fraud Detection: Progress and Limitations
Early federated fraud detection work established the viability of collaborative training: Yang et al. [7] achieved an AUC of 95.5% with the first federated credit card fraud framework, while Abdul Salam et al. [8] introduced hybrid resampling to address class imbalance. However, the former assumes homogeneous fraud distributions and the latter lacks formal privacy guarantees. Subsequent studies have tackled class imbalance more systematically: Shah et al. [9] and Hilou et al. [10] applied client-side synthetic minority over-sampling technique (SMOTE) variants before federated aggregation, Farooq et al. [11] integrated adaptive aggregation with privacy-preserving mechanisms, Sarkar et al. [12] proposed Fed-Focal Loss to focus gradient updates on difficult fraud instances, and Wang et al. [13] developed Ratio Loss to dynamically counteract local–global imbalance mismatch without direct data access.
Despite these advances, three structural limitations persist across existing federated fraud detection systems. First, all aforementioned frameworks adopt flat federated architectures in which every participant is treated identically during aggregation, failing to exploit the natural similarities in fraud patterns among subsets of institutions. Second, class imbalance mitigation remains confined to the client level, without federated-layer coordination that could leverage cross-institutional knowledge about rare fraud types. Third, the integration of differential privacy—essential for regulatory compliance—typically imposes uniform noise that disproportionately degrades the detection of already-sparse fraud signals, creating a privacy–utility trade-off that existing approaches have not satisfactorily resolved. Recent work has begun to address these limitations from different angles. For instance, Aljunaid et al. [14] proposed an explainable AI-driven federated learning model for financial fraud detection that integrates secure aggregation with interpretable decision mechanisms, demonstrating the growing recognition that collaborative architectures must balance privacy, transparency, and detection accuracy in banking environments. However, their approach does not address hierarchical aggregation, dynamic clustering based on fraud pattern semantics, or the interplay between adaptive differential privacy and knowledge transfer that is central to our work.
1.2 Hierarchical Architectures and Knowledge Transfer in Federated Learning
Hierarchical federated learning introduces intermediate aggregation layers to address the communication overhead and data heterogeneity of large-scale federated systems. Liu et al. [15] proposed a cloud-edge-client architecture reducing communication costs by 60%, and comprehensive reviews have further examined cloud–edge–end collaboration for privacy-preserving AI [16,17]. Clustered federated learning refines this paradigm by grouping clients according to distributional similarity: Sattler et al. [18] demonstrated that model-agnostic clustering substantially improves convergence on non-IID data, while Gong et al. [19] achieved a 9.2% accuracy gain with adaptive cluster scheduling. On the clustering criteria side, Duan et al. [20] and Ali et al. [21] independently developed dynamic clustering frameworks supporting client migration in response to distribution drift, and Islam et al. [22] proposed a weight-based one-shot method achieving up to 45% accuracy gains. However, Yang et al. [23] identified a critical limitation: purely similarity-driven clustering may marginalize clients with rare patterns into low-influence groups, suggesting that clustering objectives should balance similarity with complementarity—an insight directly relevant to fraud detection.
In the knowledge transfer dimension, Wang et al. [24] provided theoretical and empirical evidence that sequential model passing achieves faster convergence under extreme non-IID conditions than parallel averaging, a finding further validated by Yan et al. [25] within hierarchical architectures. Xie et al. [26] proposed StarCPFL, combining centralized model distribution with chain-style sequential refinement. Despite these advances, sequential transfer inherits a well-known vulnerability: catastrophic forgetting, where training on subsequent clients overwrites previously accumulated knowledge [27]. Knowledge distillation has emerged as the predominant countermeasure. Lee et al. [28] proposed FedNTD, distilling knowledge on non-true classes to preserve discriminative capacity for absent categories, while He et al. [29] introduced selective self-distillation conditioned on teacher confidence. Arafeh et al. [30] further designed a warmup-based protocol to reduce forgetting during sequential initialization. These techniques provide a mature foundation for integrating distillation into sequential transfer, yet this combination has not been explored in cross-institutional fraud detection.
It is important to note how HiFraud’s star-chain mechanism fundamentally differs from existing approaches. Unlike StarCPFL [26], which applies star-chain communication in a general personalized federated learning setting with layer-wise clustering, HiFraud introduces three domain-specific innovations: (i) the star institution is selected based on a fraud-rate-adjusted performance metric (Eq. (5)) rather than simple accuracy, ensuring that institutions capable of detecting fraud under scarcity lead knowledge propagation; (ii) the chain ordering is determined by distributional similarity to the star in fraud pattern space, creating a curriculum-like transfer path that progressively adapts to increasingly diverse fraud distributions; and (iii) not-true-class distillation is integrated at each chain step specifically to preserve knowledge of fraud types absent from the current institution, which is critical in fraud detection where each institution may observe only a subset of fraud categories. These design choices transform the general-purpose star-chain topology into a fraud-aware knowledge propagation mechanism with theoretical and empirical advantages over flat aggregation in non-IID fraud settings.
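The star selection and chain ordering described in points (i) and (ii) can be pictured with a small sketch. It is only an illustration under stated assumptions: the function names `pick_star` and `order_chain` are hypothetical, the fraud-rate-adjusted score of Eq. (5) is taken as a precomputed input rather than reimplemented, and Euclidean distance stands in for whatever similarity measure the framework actually uses in fraud pattern space.

```python
import math


def pick_star(scores):
    """Return the institution id with the highest score.

    `scores` maps institution id -> fraud-rate-adjusted performance.
    The score itself (Eq. (5) in the paper) is assumed to be computed elsewhere.
    """
    return max(scores, key=scores.get)


def order_chain(features, star_id):
    """Order the remaining institutions from most to least similar to the star.

    `features` maps institution id -> fraud-pattern feature vector.
    Similarity is approximated by Euclidean distance (an assumption).
    """
    star = features[star_id]
    others = [inst for inst in features if inst != star_id]
    return sorted(others, key=lambda inst: math.dist(features[inst], star))


# Hypothetical cluster with four institutions and 2-D fraud-pattern features.
features = {
    "bank_a": [0.9, 0.1],
    "bank_b": [0.8, 0.2],
    "shop_c": [0.2, 0.8],
    "pay_d": [0.5, 0.5],
}
scores = {"bank_a": 0.91, "bank_b": 0.88, "shop_c": 0.80, "pay_d": 0.85}

star = pick_star(scores)
chain = order_chain(features, star)
```

Sorting nearest-first gives the curriculum-like path mentioned in the text: the model leaves the star, visits the most similar institution first, and reaches the most dissimilar one last.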
1.3 Privacy Preservation and Adversarial Robustness
Differential privacy (DP) remains the dominant formal privacy framework in federated learning. The moments accountant [31] enabled tight privacy loss tracking, while Rényi differential privacy (RDP) [32] further tightened multi-round composition bounds. However, uniform noise injection poses a particular challenge for fraud detection: because fraud signals are inherently sparse, flat noise mechanisms disproportionately obscure the patterns the model needs to learn, as demonstrated by Truex et al. [33] for local DP on imbalanced datasets. Adaptive mechanisms have been proposed to address this limitation: Xue et al. [34] dynamically adjusted clipping thresholds based on gradient norms, Yuan et al. [35] introduced amplitude-varying perturbation that reduces noise in later training stages, and Lin et al. [36] formalized the M2FDP framework decomposing privacy contributions across multi-tier networks. Nevertheless, a unified composition theorem tailored to hierarchical adaptive mechanisms remains an open challenge.
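To make the multi-round composition mentioned above concrete, the following sketch tracks Rényi DP for repeated Gaussian mechanisms and converts the result to an (ε, δ) guarantee. It relies only on two standard facts: a Gaussian mechanism with unit L2 sensitivity and noise standard deviation σ satisfies (α, α/(2σ²))-RDP, and (α, ε)-RDP implies (ε + log(1/δ)/(α − 1), δ)-DP. Subsampling amplification and the adaptive, hierarchical allocation used by HiFraud are deliberately left out, and all function names are illustrative.

```python
import math


def gaussian_rdp(alpha, noise_multiplier):
    """RDP epsilon at order alpha for one Gaussian mechanism (unit L2 sensitivity)."""
    return alpha / (2.0 * noise_multiplier ** 2)


def rdp_to_dp(alpha, rdp_epsilon, delta):
    """Convert an (alpha, rdp_epsilon)-RDP guarantee into an (epsilon, delta)-DP epsilon."""
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1.0)


def best_epsilon(noise_multiplier, rounds, delta, orders=range(2, 65)):
    """Compose `rounds` identical Gaussian mechanisms and pick the tightest order.

    RDP composes additively at a fixed order, so the per-round cost is
    multiplied by the number of rounds before conversion.
    """
    return min(
        rdp_to_dp(a, rounds * gaussian_rdp(a, noise_multiplier), delta)
        for a in orders
    )


eps_low_noise = best_epsilon(noise_multiplier=1.0, rounds=10, delta=1e-5)
eps_high_noise = best_epsilon(noise_multiplier=2.0, rounds=10, delta=1e-5)
```

Doubling the noise multiplier lowers the reported ε for the same number of rounds, which is the trade-off the adaptive mechanisms cited above try to manage per client rather than uniformly.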
Beyond formal guarantees, federated systems must withstand practical attacks. Bai et al. [37] surveyed membership inference attacks (MIA) in FL, concluding that standard configurations are vulnerable to success rates well above random chance, while Deng and Yang [38] showed that composite defenses combining gradient compression, selective sharing, and regularization can suppress MIA accuracy to below 38% with minimal task degradation. On the robustness front, Li et al. [39] demonstrated that classical Byzantine-robust aggregation methods degrade under non-IID distributions, motivating hierarchical solutions such as the two-tier Byzantine-resilient scheme of Nordlund et al. [40] and the cross-device protocol of Liu et al. [41]. However, neither MIA defenses nor Byzantine robustness mechanisms have been systematically integrated with hierarchical federated architectures for fraud detection, leaving a significant design gap.
The privacy-preserving mechanism in HiFraud is designed to defend against three specific categories of privacy threats that are particularly relevant to cross-institutional fraud detection. First, model inversion attacks, in which an adversary attempts to reconstruct sensitive transaction features from shared model parameters, are mitigated through the Gaussian noise mechanism applied during both star distribution (Eq. (6)) and chain transfer (Eq. (9)), which ensures that the transmitted parameters do not reveal individual transaction characteristics. Second, gradient leakage attacks, where an attacker infers training data from observed gradient updates, are countered by gradient clipping to sensitivity bound S combined with calibrated noise injection at each local adaptation step, following the Gaussian mechanism framework of Abadi et al. [31]. Third, membership inference attacks, in which an adversary determines whether a specific transaction was used in training, are addressed through the hierarchical aggregation structure that limits external visibility to cluster-level models rather than individual institutional updates, compounded by the adaptive noise calibration that provides stronger protection for institutions with distinctive fraud patterns. The formal privacy guarantee (Theorem 1) establishes that these layered defenses collectively satisfy
1.4 Our Approach and Contributions
The preceding analysis reveals a fragmented research landscape: clustered federated learning benefits non-IID data but has not been designed around fraud pattern semantics; sequential knowledge transfer offers convergence advantages but lacks privacy guarantees and forgetting mitigation; and adaptive differential privacy improves the privacy–utility trade-off but has not been coupled with hierarchical architectures. No existing framework unifies these individually mature components into a coherent system tailored to cross-institutional fraud detection.
This paper proposes HiFraud, a hierarchical privacy-preserving federated learning framework that addresses these challenges through a three-layer architecture integrating fraud-aware clustering, star-chain knowledge transfer with distillation-based forgetting mitigation, and privacy-adaptive aggregation grounded in Rényi differential privacy composition. The main contributions of this work are as follows:
• We propose a three-layer hierarchical architecture that combines fraud-aware dynamic clustering, intra-cluster star-chain transfer learning, and privacy-adaptive global aggregation. The framework achieves an AUC-ROC of 0.935 under
• We develop a star-chain knowledge transfer mechanism augmented with not-true-class distillation to mitigate catastrophic forgetting during sequential model passing. This mechanism enables detection of novel fraud patterns within 3 h inside clusters, compared to 24 h for flat federated architectures.
• We introduce a fraud-pattern-specific dynamic clustering strategy that balances distributional similarity with complementarity, preventing the marginalization of institutions with rare fraud types. This design improves detection performance for rare fraud categories by 18% compared to static geographic clustering.
• We design a hierarchical adaptive privacy allocation scheme based on Rényi differential privacy composition, calibrating noise intensity according to both distributional divergence and fraud pattern rarity. This approach reduces overall privacy budget consumption by 35% compared to uniform allocation while maintaining equivalent formal guarantees.
The remainder of this paper is organized as follows. Section 2 presents the HiFraud framework, detailing the fraud-aware dynamic clustering mechanism, the star-chain knowledge transfer with not-true-class distillation, and the privacy-adaptive aggregation scheme with formal privacy and convergence guarantees. Section 3 provides comprehensive experimental evaluations on three benchmark datasets, including comparisons with state-of-the-art baselines, ablation studies, privacy–utility analysis, and scalability assessments. Section 4 discusses the practical implications of the results and identifies limitations alongside directions for future research. Finally, Section 5 concludes the paper.
This section presents the design of HiFraud, a hierarchical federated learning framework for cross-institutional fraud detection. The framework adopts a three-layer architecture in which participating institutions are first grouped into clusters based on fraud pattern similarity, then engage in intra-cluster knowledge transfer through a star-chain mechanism augmented with distillation-based forgetting mitigation, and finally contribute to global model refinement via privacy-adaptive aggregation grounded in Rényi differential privacy. Fig. 1 provides an overview of the complete architecture. The global coordination layer manages cross-cluster aggregation and privacy budget allocation. Each cluster coordination layer facilitates star-chain knowledge transfer among its member institutions. At the institutional layer, local models are trained on private transaction data with adaptive differential privacy protection. The interplay among these three layers enables efficient knowledge sharing between similar institutions while maintaining formal privacy guarantees across the entire system. As illustrated in Fig. 1, the communication flow proceeds as follows: (1) at Layer 1, each institution trains locally on its private dataset

Figure 1: Overview of the HiFraud framework. The three-layer architecture comprises a global coordination layer (Layer 3) for cross-cluster aggregation and privacy budget management, cluster coordination layers (Layer 2) for intra-cluster star-chain knowledge transfer, and institutional layers (Layer 1) for local training with adaptive differential privacy. Within each cluster, the star institution (
Let
Algorithm 1 presents the complete training procedure of HiFraud.

2.1 Fraud-Aware Dynamic Clustering
Conventional federated clustering strategies group clients based on geographic proximity, organizational hierarchy, or generic model-weight similarity. In the context of fraud detection, however, institutions that are geographically distant may face nearly identical fraud schemes, while co-located institutions may encounter entirely different threat profiles. To capture this domain-specific structure, HiFraud introduces a fraud-aware clustering mechanism that groups institutions according to the distributional characteristics of their observed fraud patterns.
For each institution
where
To prevent the clustering process from leaking sensitive institutional information, each feature vector is perturbed with calibrated Laplace noise prior to transmission to the global coordinator:
where
Given the set of perturbed feature vectors
where
where
The clustering is re-executed every

Figure 2: Evolution of fraud-aware clusters over 50 communication rounds. Early rounds exhibit geographic groupings inherited from initialization, which progressively transition to fraud-pattern-based clusters as the feature vectors capture increasingly discriminative fraud characteristics. By round 30, institutions handling similar fraud types are co-located in the same cluster regardless of geographic origin.
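The Laplace perturbation applied to each institution's feature vector before it is sent to the global coordinator can be sketched as follows. This is a minimal illustration of the standard Laplace mechanism, not the paper's exact formulation: the L1 sensitivity of the feature vector and the clustering budget `epsilon` are assumed to be supplied by the caller, and the names `laplace_noise` and `perturb_features` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_features(features, sensitivity, epsilon, rng=None):
    """Add independent Laplace noise with scale sensitivity / epsilon to each entry.

    `sensitivity` is the assumed L1 sensitivity of the whole feature vector;
    in practice it would have to be enforced by clipping or normalisation.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in features]


rng = random.Random(0)
fraud_profile = [0.12, 0.55, 0.33]  # hypothetical fraud-pattern statistics
noisy_profile = perturb_features(fraud_profile, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller `epsilon` widens the noise scale, so the clustering step sees a blurrier picture of each institution; this is the budget that the hierarchical allocation has to trade against the transfer and aggregation phases.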
2.2 Star-Chain Knowledge Transfer with Distillation
Within each cluster, HiFraud employs a star-chain transfer mechanism to efficiently propagate fraud detection knowledge among member institutions. This mechanism operates in two sequential phases: a star distribution phase that disseminates the best-performing model to all cluster members, followed by a chain enhancement phase that refines models through sequential passing with distillation-based forgetting mitigation. The design is motivated by two complementary findings from the recent literature: sequential model transfer achieves superior convergence under extreme non-IID conditions compared to parallel aggregation [24], while knowledge distillation on non-true classes effectively preserves global discriminative capacity during local adaptation [28].
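As a rough illustration of the distillation idea borrowed from [28], the sketch below computes a not-true-class distillation term for a single sample: the true class is removed from both logit vectors, the remaining logits are softened with a temperature, and the KL divergence from the incoming (teacher) model to the locally adapted (student) model is returned. It assumes a multi-class fraud-type output with at least three classes (with two classes the term is identically zero), and it leaves the weighting against the cross-entropy loss to the caller. Function names are illustrative.

```python
import math


def _log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def ntd_loss(student_logits, teacher_logits, true_class, temperature=1.0):
    """KL(teacher || student) restricted to the not-true classes of one sample."""
    keep = [k for k in range(len(student_logits)) if k != true_class]
    log_p_teacher = _log_softmax([teacher_logits[k] / temperature for k in keep])
    log_p_student = _log_softmax([student_logits[k] / temperature for k in keep])
    return sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )


# Classes: 0 = legitimate, 1 = card skimming, 2 = account takeover (hypothetical).
teacher = [0.2, 2.0, 1.0]   # incoming model still separates both fraud types
student = [0.2, 1.0, 2.0]   # local model has drifted after seeing only one type
penalty = ntd_loss(student, teacher, true_class=0)
```

Because the true class is excluded, the term does not interfere with fitting the local label; it only penalises drift in how the model ranks the classes the current institution may never observe.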
In each cluster
where
The star model
where the noise scale
Following star distribution, models are refined through a chain of sequential local adaptations. Institutions within each cluster are ordered by their distributional similarity to the star, forming a transfer chain
To mitigate the catastrophic forgetting that arises from sequential training on heterogeneous data, each local adaptation step incorporates a not-true-class distillation loss. Specifically, for institution
where
The model update at each chain step is then given by:
where
The chain enhancement is executed for
where

Figure 3: Star-chain knowledge transfer within a cluster. In the star distribution phase (left), the best-performing institution (marked with
2.3 Privacy-Adaptive Mechanism
A uniform differential privacy mechanism applies identical noise to all institutions regardless of their data characteristics, which can disproportionately degrade detection performance for institutions with unique or rare fraud patterns. HiFraud addresses this through an adaptive noise calibration scheme that allocates stronger privacy protection to institutions whose data distributions diverge significantly from the cluster norm, while preserving model utility for institutions with representative distributions.
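The general shape of such a scheme can be sketched as clip, scale, and perturb. The code below is a simplified illustration: the clipping bound is taken from the 95th percentile of warmup gradient norms as discussed for Theorem 1, while the rule in `adaptive_sigma` (noise growing linearly with a divergence score) is a hypothetical stand-in and not the calibration defined in Eqs. (11) and (12).

```python
import math
import random


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (q in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def clip(grad, bound):
    """Rescale `grad` so that its L2 norm is at most `bound`."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, bound / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def adaptive_sigma(base_sigma, divergence, gain=1.0):
    """HYPOTHETICAL rule: more noise for institutions far from the cluster norm."""
    return base_sigma * (1.0 + gain * divergence)


def privatize(grad, bound, sigma, rng):
    """Clip the gradient, then add Gaussian noise with std sigma * bound."""
    return [g + rng.gauss(0.0, sigma * bound) for g in clip(grad, bound)]


rng = random.Random(0)
warmup_norms = [0.5 + 0.01 * i for i in range(100)]     # hypothetical warmup norms
S = percentile(warmup_norms, 95)                        # clipping bound
sigma_i = adaptive_sigma(base_sigma=1.0, divergence=0.3)
noisy_grad = privatize([3.0, 4.0], bound=S, sigma=sigma_i, rng=rng)
```

Tying the noise standard deviation to the clipping bound keeps the signal-to-noise ratio interpretable: `sigma` is then the usual noise multiplier that a privacy accountant expects.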
The noise scale
where
Here,
2.3.1 Hierarchical Privacy Budget Allocation
The total privacy budget
where
Within the star-chain transfer phase, the budget
2.3.2 Formal Privacy Guarantee
The following theorem establishes the end-to-end privacy guarantee of HiFraud.
Theorem 1: (Privacy Guarantee). Under the Gaussian mechanism with adaptive noise calibration defined in Eqs. (11) and (12), gradient clipping bound S, and Rényi DP composition in Eq. (13), the HiFraud framework satisfies
Proof: The proof follows from three observations. First, the clustering phase satisfies
Discussion of Assumptions. The privacy guarantee in Theorem 1 relies on three key assumptions that merit explicit examination in the context of fraud detection. First, the gradient clipping bound S assumes that all per-sample gradient norms can be bounded by S without significant information loss. In practice, fraud detection models may exhibit larger gradient norms for rare fraud samples; we mitigate this by setting S based on the 95th percentile of observed gradient norms during a non-private warmup phase of 5 rounds, following the adaptive clipping strategy of Xue et al. [34]. Second, the composition assumes that the noise mechanisms at different layers are applied independently, which holds in our architecture because each layer operates on distinct parameter spaces (feature vectors for clustering, model parameters for star-chain, aggregated models for global). Third, the use of
2.4 Global Aggregation and Convergence
At each global communication round, the cluster models
The global model is then redistributed to all clusters as the initialization for the next round of local training and star-chain transfer.
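A minimal sketch of the aggregation step is a weighted average of the cluster models. The weights used here are placeholders (for instance, proportional to cluster data volume); the weighting of the paper's global update rule in Eq. (15) may differ, and any privacy noise added at this layer is omitted.

```python
def aggregate(cluster_models, weights):
    """Weighted average of flat parameter vectors, one vector per cluster."""
    total = sum(weights)
    dim = len(cluster_models[0])
    return [
        sum(w * model[j] for w, model in zip(weights, cluster_models)) / total
        for j in range(dim)
    ]


# Two hypothetical cluster models with two parameters each.
global_model = aggregate([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3])
```

The second cluster carries three times the weight of the first, so the result sits three quarters of the way towards its parameters.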
Theorem 2: (Convergence Rate). Assume that the global loss function
where
Proof: The proof proceeds by bounding the expected gradient norm through a standard one-step descent analysis adapted to the hierarchical structure. By L-smoothness:
Substituting the global update rule (Eq. (15)) and decomposing the update into three sources of error—stochastic gradient variance within clusters, privacy noise, and inter-cluster heterogeneity—yields:
where
Telescoping across T rounds with learning rate
Substituting
The convergence bound in Eq. (16) decomposes into three interpretable terms. The first term
2.5 Computational Complexity and Communication Overhead
The per-round computational cost comprises: clustering (
This section presents a comprehensive evaluation of HiFraud across multiple dimensions: overall detection performance, component-wise ablation, fraud pattern propagation speed, privacy–utility trade-offs, adversarial robustness, scalability, per-type detection, and sensitivity to key hyperparameters.
3.1 Datasets and Experimental Setup
We evaluate HiFraud on three benchmark fraud detection datasets with distinct characteristics. Table 1 summarizes the key statistics of each dataset.

The IEEE-CIS Fraud Detection dataset contains 590,540 e-commerce transactions with a 3.5% fraud rate and 433 heterogeneous features spanning transaction metadata (amount, product category, device information), identity features (email domain, device type, address), and V-features derived from principal component analysis (PCA) of anonymized variables. PaySim provides 6.36 million synthetic mobile money transactions simulating real-world transfer, cash-out, and payment operations with 11 features including transaction type, amount, origin and destination account balances, and a binary fraud indicator; despite being synthetically generated, PaySim preserves the statistical properties of a real mobile money dataset from a developing country, including realistic class imbalance (0.13% fraud rate). The Worldline dataset comprises 284,807 credit card transactions with a fraud rate of 0.172%, an extreme class imbalance comparable to that of PaySim; 28 of its numerical features are PCA projections released for anonymization, and only the “Time” and “Amount” features retain their original semantics.
To simulate realistic cross-institutional settings, we partition each dataset across
The base fraud detection model at each institution is a 4-layer fully connected neural network with hidden dimensions [256, 128, 64, 32], ReLU activations, batch normalization after each hidden layer, and a sigmoid output layer. The model contains approximately 168 K trainable parameters. We use the Adam optimizer with an initial learning rate of
All experiments are implemented in PyTorch 1.13.0 with Opacus 1.4.0 for differential privacy accounting. Unless otherwise stated, we use the following default configuration: the number of clusters K is determined dynamically within
Table 2 compares HiFraud against seven baselines spanning centralized training, standard federated methods, and recent hierarchical and clustered approaches. All federated methods are evaluated under the same data partition and, where applicable, the same total privacy budget

HiFraud achieves the highest AUC-ROC of 0.935, surpassing the centralized baseline (0.912) by 2.3 percentage points and the strongest federated competitor (FedFraud) by 1.8%. The improvement over DP-FedAvg is 10.5%, demonstrating that the hierarchical architecture substantially recovers the performance typically lost to differential privacy. In terms of convergence speed, HiFraud reaches its plateau in 30 communication rounds, representing a 37% reduction compared to DP-FedAvg (49 rounds) and a 21% reduction compared to FedFraud (38 rounds). The membership inference attack success rate of 10.2% approaches the random-guess baseline of 10%, indicating that the combination of hierarchical aggregation and adaptive noise provides strong empirical privacy protection.
We note that HiFraud’s AUC-ROC (0.935) exceeds the centralized baseline (0.912) by 2.3 percentage points. This seemingly counterintuitive result can be attributed to two factors. First, the hierarchical structure introduces an implicit regularization effect: by training specialized cluster models before global aggregation, the framework prevents overfitting to the dominant non-fraud class that occurs in centralized training on imbalanced datasets. Second, the complementarity-aware clustering ensures that rare fraud patterns, which may be underrepresented in a single centralized training pass, receive focused attention within their assigned clusters through the star-chain mechanism. A similar phenomenon has been observed in clustered federated learning settings where local specialization outperforms global averaging on heterogeneous data [18]. However, we emphasize that this advantage is dataset- and partition-dependent: the centralized baseline represents a single-model upper bound under our specific non-IID partition, and centralized training with ensemble methods or specialized imbalance handling could potentially match or exceed HiFraud’s performance.
Fig. 4 presents the convergence trajectories of all methods across 50 communication rounds. HiFraud exhibits the steepest initial ascent and reaches 90% of its final performance by round 12, while DP-FedAvg requires approximately 30 rounds to reach the same relative milestone. The acceleration is attributable to the star-chain transfer mechanism, which enables efficient knowledge propagation within clusters of similar institutions, reducing the number of global rounds needed to disseminate useful fraud patterns.

Figure 4: Convergence trajectories of all methods on the IEEE-CIS dataset. HiFraud achieves the fastest convergence and highest final AUC-ROC, reaching 90% of its plateau by round 12.
Fig. 5 extends the comparison across all three datasets. HiFraud consistently outperforms all federated baselines on every dataset. The performance advantage is most pronounced on the Worldline dataset (0.948 vs. 0.858 for DP-FedAvg), where the extreme class imbalance (0.172% fraud rate) amplifies the benefit of the complementarity-aware clustering and adaptive privacy allocation. On PaySim, HiFraud achieves 0.962, exceeding even the centralized baseline (0.955), which we attribute to the regularization effect of the hierarchical structure preventing overfitting to the dominant non-fraud class.

Figure 5: AUC-ROC comparison across three datasets. HiFraud consistently achieves the highest performance among federated methods and surpasses centralized training on PaySim.
To quantify the contribution of each architectural component, we conduct an ablation study in which individual modules are removed while keeping all other components unchanged. Fig. 6 summarizes the results in terms of AUC-ROC and convergence rounds.

Figure 6: Ablation study on the IEEE-CIS dataset. Each bar pair shows the AUC-ROC (left axis, blue) and convergence rounds (right axis, orange) when one component is removed from the full HiFraud framework.
Removing the star-chain transfer mechanism produces the largest performance degradation, reducing AUC-ROC from 0.935 to 0.867 (a 6.8 percentage point drop) and increasing convergence rounds from 30 to 42. This result confirms that sequential knowledge transfer within clusters is the primary driver of both accuracy and convergence speed improvements. Without NTD distillation, AUC-ROC decreases to 0.901, demonstrating that forgetting mitigation accounts for approximately 3.4 percentage points of the total gain. Notably, even without distillation, the star-chain mechanism alone still outperforms all baselines, indicating that the transfer topology provides value independent of the forgetting mitigation strategy.
Removing fraud-aware clustering and reverting to random cluster assignment reduces AUC-ROC to 0.895 and increases convergence to 42 rounds, highlighting the importance of grouping institutions by fraud pattern similarity rather than arbitrary criteria. The adaptive differential privacy mechanism contributes 2.5 percentage points over uniform noise allocation, with its removal reducing AUC-ROC to 0.910 while only modestly affecting convergence. The complementarity regularizer provides a smaller but meaningful improvement of 1.7 percentage points, with its primary benefit concentrated on rare fraud types as discussed in Section 3.7.
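The percentage-point figures quoted in this ablation follow directly from the reported AUC-ROC values, as the short check below shows. Only values stated in the text are used; the complementarity regularizer is omitted because its ablated AUC-ROC is not given explicitly.

```python
FULL_AUC = 0.935

# AUC-ROC after removing one component, as reported in the ablation study.
ablated_auc = {
    "star-chain transfer": 0.867,
    "NTD distillation": 0.901,
    "fraud-aware clustering": 0.895,
    "adaptive DP": 0.910,
}


def drop_in_points(full, without):
    """Absolute AUC-ROC drop expressed in percentage points."""
    return round((full - without) * 100, 1)


drops = {name: drop_in_points(FULL_AUC, auc) for name, auc in ablated_auc.items()}
```

These are absolute differences on the AUC-ROC scale, not relative changes; a 6.8-point drop from 0.935 corresponds to a relative decrease of about 7.3%.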
A critical operational requirement for fraud detection systems is the ability to rapidly disseminate knowledge of newly emerging fraud patterns across institutions. To evaluate this capability, we simulate the injection of a novel fraud type at round 25 in a single institution and track the detection rate of this new pattern across the cluster hierarchy over subsequent rounds. Fig. 7 presents the results.

Figure 7: Detection rate of a novel fraud pattern injected at round 25. Within the same cluster, 80% of institutions detect the new pattern within 2 rounds (
Within the same cluster, the star-chain mechanism enables 80% of institutions to detect the novel fraud pattern within 2 communication rounds, corresponding to approximately 3 h in our experimental setup. This “3-h” figure is derived from our experimental configuration in which each communication round takes approximately 90 min, comprising local training (
3.5.1 Privacy–Utility Trade-Off
Fig. 8 illustrates the relationship between the total privacy budget

Figure 8: Privacy–utility trade-off. HiFraud achieves superior AUC-ROC at every privacy budget level. At the operating point
The advantage of HiFraud is most pronounced at tighter privacy budgets. At
Table 3 provides a detailed breakdown of how the privacy budget is allocated across the three layers of HiFraud, compared to uniform and non-hierarchical adaptive allocation.
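To make the notion of a layer-wise budget concrete, the sketch below composes Gaussian-mechanism releases across three layers using the standard Rényi DP results of Mironov [32]: a Gaussian release with unit sensitivity and noise scale sigma satisfies RDP of order alpha with epsilon equal to alpha / (2 sigma^2), RDP guarantees of the same order add under composition, and an (alpha, epsilon)-RDP guarantee converts to (epsilon + log(1/delta) / (alpha - 1), delta)-DP. The noise scales and release counts are hypothetical placeholders and are not the values of Table 3; subsampling amplification and the divergence- and rarity-based calibration of HiFraud are not modeled.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP epsilon of order alpha for one Gaussian-mechanism release."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(rdp_epsilon, alpha, delta):
    """Convert an (alpha, rdp_epsilon)-RDP guarantee to (epsilon, delta)-DP."""
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1.0)


def total_epsilon(layers, delta, alphas=range(2, 129)):
    """Compose Gaussian releases across layers and return the tightest epsilon.

    layers: list of (sigma, releases) pairs, one pair per layer.
    """
    best = math.inf
    for alpha in alphas:
        # RDP composes additively at a fixed order alpha.
        composed = sum(n * gaussian_rdp(alpha, sigma) for sigma, n in layers)
        best = min(best, rdp_to_dp(composed, alpha, delta))
    return best


# Hypothetical (sigma, releases) per layer; NOT the allocation in Table 3.
uniform = [(8.0, 30), (8.0, 30), (8.0, 30)]
layered = [(6.0, 30), (8.0, 30), (12.0, 30)]

print("uniform allocation:", total_epsilon(uniform, delta=1e-5))
print("layered allocation:", total_epsilon(layered, delta=1e-5))
```

Minimizing over the order alpha is the usual way to report the tightest converted guarantee; a finer grid of orders can only lower the reported epsilon.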

3.5.2 Resistance to Membership Inference Attacks
We evaluate the empirical privacy protection of HiFraud against membership inference attacks using the gradient-based attack framework described in Bai et al. [37]. The attacker is assumed to have passive access to the aggregated model updates at the cluster level and employs a binary classifier trained to distinguish between member and non-member samples. Fig. 9 reports the attack success rates across all methods.

Figure 9: Membership inference attack success rate. Lower values indicate stronger privacy. HiFraud achieves 10.2%, approaching the 10% random-guess baseline.
HiFraud achieves the lowest MIA success rate of 10.2%, approaching the 10% random-guess baseline for our 10-class attack formulation. This represents a 64% reduction relative to FedAvg (28.7%) and a 17% reduction relative to FedFraud (12.3%). The strong privacy protection results from the compounding effect of three mechanisms: the adaptive noise injection obscures individual gradient contributions, the hierarchical aggregation limits the attacker’s visibility to cluster-level updates rather than institutional-level parameters, and the star-chain transfer introduces additional noise through sequential model passing. Notably, HiFraud provides stronger empirical privacy than DP-FedAvg (15.2%) despite achieving substantially higher detection performance, demonstrating that the hierarchical architecture enables a more favorable privacy–utility operating point.
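The relative reductions quoted above follow directly from the reported attack success rates. A minimal sketch of the calculation is given below; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return (baseline - ours) / baseline * 100.0


# MIA success rates (%) reported in the text.
BASELINE_MIA = {"FedAvg": 28.7, "DP-FedAvg": 15.2, "FedFraud": 12.3}
HIFRAUD_MIA = 10.2

for name, rate in BASELINE_MIA.items():
    rel = relative_reduction(rate, HIFRAUD_MIA)
    absolute = rate - HIFRAUD_MIA
    print(f"vs {name}: {rel:.1f}% relative, {absolute:.1f} points absolute")
```

Reporting the absolute difference alongside the relative one helps when the baseline rate is already low, as with FedFraud, where a 17% relative reduction corresponds to about 2 points.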
Fig. 10 evaluates the performance and communication efficiency of HiFraud as the number of participating institutions increases from 10 to 100.

Figure 10: Scalability analysis. HiFraud maintains stable AUC-ROC as the federation grows to 100 institutions, while FedAvg degrades by 4.1 percentage points. Communication cost scales as
HiFraud maintains stable detection performance across all federation sizes, with AUC-ROC decreasing only marginally from 0.930 (10 institutions) to 0.927 (100 institutions), a degradation of 0.3 percentage points. In contrast, FedAvg suffers a 4.1 percentage point decline over the same range, from 0.882 to 0.842, as increasing data heterogeneity overwhelms the flat aggregation mechanism. The communication cost of HiFraud grows sublinearly with the number of institutions due to the hierarchical structure: at 100 institutions, HiFraud requires 4.8

3.7 Per-Type Detection and Clustering Analysis
To evaluate whether the hierarchical architecture improves detection uniformly or preferentially benefits specific fraud categories, we report per-type AUC-ROC in Fig. 11.

Figure 11: Per-fraud-type AUC-ROC. HiFraud achieves the most uniform performance across fraud types, with the greatest improvement on the rarest category (Synthetic ID: +23.0 percentage points over DP-FedAvg).
HiFraud provides substantial and consistent improvements across all fraud types, with the most pronounced gains on rare categories. For Synthetic ID fraud, the rarest type in our experimental setup, HiFraud achieves an AUC-ROC of 0.910, compared to 0.782 for FedFraud and 0.680 for DP-FedAvg. This 23.0 percentage point improvement over DP-FedAvg demonstrates the effectiveness of the complementarity regularizer in preventing the marginalization of institutions holding rare fraud types. The performance variance across fraud types is also notably reduced: the standard deviation of per-type AUC-ROC is 0.015 for HiFraud, compared to 0.058 for FedFraud and 0.078 for DP-FedAvg, indicating more equitable detection across all fraud categories.
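As an illustration of how per-type AUC-ROC and its spread can be computed, the sketch below scores each fraud type against legitimate transactions only, using the rank-based (Mann-Whitney) form of AUC, and then takes the standard deviation across types. The toy scores and the type name `card_not_present` are hypothetical; the evaluation protocol behind Fig. 11, including whether the population or the sample standard deviation is reported, is an assumption here.

```python
from statistics import pstdev


def auc_roc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_type_auc(scores, fraud_types):
    """AUC of each fraud type against legitimate transactions only.

    fraud_types[i] is None for a legitimate transaction, else a type name.
    """
    result = {}
    for t in sorted({ft for ft in fraud_types if ft is not None}):
        kept = [(s, 1 if ft == t else 0)
                for s, ft in zip(scores, fraud_types)
                if ft is None or ft == t]
        result[t] = auc_roc([s for s, _ in kept], [y for _, y in kept])
    return result


# Toy model scores, purely illustrative.
scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8, 0.35, 0.7]
types = [None, None, None, None,
         "card_not_present", "card_not_present",
         "synthetic_id", "synthetic_id"]

aucs = per_type_auc(scores, types)
spread = pstdev(list(aucs.values()))  # population standard deviation across types
print(aucs, spread)
```

A smaller spread corresponds to more uniform detection across fraud categories, which is the property the standard deviations above are meant to capture.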
Fig. 12 visualizes the evolution of cluster assignments over the course of training, using a t-distributed stochastic neighbor embedding (t-SNE) projection of institutional fraud feature vectors with markers indicating the dominant fraud type at each institution.

Figure 12: Evolution of cluster assignments visualized via t-SNE projection. Colors indicate cluster assignment; marker shapes indicate dominant fraud type. By round 30, clusters align closely with fraud type rather than initial geographic grouping.
At round 1, cluster assignments reflect the initial geographic grouping, with each cluster containing a heterogeneous mix of fraud types (indicated by diverse marker shapes within each color group). By round 15, the clustering begins to reorganize around fraud pattern similarity, as the fraud-aware feature vectors become more discriminative through iterative refinement. By round 30, clusters are strongly aligned with fraud type: institutions facing similar fraud categories are co-located in the same cluster regardless of their geographic origin. This transition demonstrates that the dynamic re-clustering mechanism successfully adapts to the underlying fraud structure of the data, enabling increasingly specialized intra-cluster knowledge transfer as training progresses.
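One way to quantify the alignment visible in Fig. 12 is cluster purity with respect to the dominant fraud type. The paper does not report this metric, so the following is only an illustrative sketch with hypothetical assignments for six institutions.

```python
from collections import Counter


def cluster_purity(cluster_ids, fraud_types):
    """Fraction of institutions whose dominant fraud type is their cluster's majority type."""
    by_cluster = {}
    for cluster, fraud_type in zip(cluster_ids, fraud_types):
        by_cluster.setdefault(cluster, []).append(fraud_type)
    # Count the members of the most common type in each cluster.
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in by_cluster.values())
    return majority / len(cluster_ids)


# Hypothetical dominant fraud type per institution.
dominant_type = ["A", "A", "B", "B", "C", "C"]

# Hypothetical assignments: mixed clusters at round 1, type-aligned at round 30.
round_1 = [0, 1, 2, 0, 1, 2]
round_30 = [0, 0, 1, 1, 2, 2]

print("round 1 purity:", cluster_purity(round_1, dominant_type))
print("round 30 purity:", cluster_purity(round_30, dominant_type))
```

Tracking such a score at each re-clustering step would give a numerical counterpart to the qualitative transition from geographic to fraud-type grouping.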
We investigate the sensitivity of HiFraud to three key hyperparameters: the transfer coefficient

Figure 13: Sensitivity to key hyperparameters. (a) Transfer coefficient
The transfer coefficient
The re-clustering interval
The experimental results carry several practical implications. The finding that HiFraud surpasses centralized training on PaySim (0.962 vs. 0.955) challenges the assumption that federated approaches necessarily sacrifice detection quality for privacy, suggesting that the hierarchical structure introduces a beneficial inductive bias by increasing the diversity of training signals without exposing raw data. In production settings, financial institutions can thus achieve detection performance meeting or exceeding centralized alternatives while fully complying with GDPR and CCPA, and the 3-h propagation latency for novel fraud patterns enables rapid collective response to emerging threats. Equally important, the hierarchical architecture fundamentally alters the privacy–utility trade-off: at the stringent budget of
4.2 Limitations and Future Work
Despite the strong performance across multiple benchmarks, several limitations merit acknowledgment. The dynamic re-clustering mechanism introduces periodic disruptions to intra-cluster learning; our sensitivity analysis shows that the interval
Additionally, the current evaluation is conducted on benchmark datasets that, while widely used in the fraud detection literature, may not fully capture the complexity of production fraud systems. Real-world deployments involve continuously evolving fraud tactics, adversarial adaptation, and regulatory constraints that vary across jurisdictions. Future work should evaluate HiFraud on proprietary institutional datasets in controlled pilot studies to validate the framework’s effectiveness under genuine operational conditions.
This paper proposed HiFraud, a hierarchical federated learning framework that addresses the fundamental challenges of cross-institutional fraud detection through a three-layer architecture integrating fraud-aware dynamic clustering with complementarity regularization, star-chain knowledge transfer augmented by not-true-class distillation for forgetting mitigation, and privacy-adaptive aggregation grounded in Rényi differential privacy composition. The key technical contributions include: (i) a fraud-aware dynamic clustering mechanism with complementarity regularization that groups institutions by fraud pattern similarity while preserving rare-type representation; (ii) a star-chain knowledge transfer mechanism with domain-specific innovations including fraud-rate-adjusted star selection, similarity-ordered chain traversal, and not-true-class distillation for forgetting mitigation; (iii) a hierarchical adaptive privacy allocation scheme based on Rényi DP composition that calibrates noise to distributional divergence and fraud rarity; and (iv) formal privacy and convergence guarantees with detailed proofs under explicit assumptions. Experiments on three benchmark datasets demonstrated that HiFraud achieves an AUC-ROC of 0.935 under
Acknowledgement: Not applicable.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: Conceptualization, Zhihao Zhang and Zhuodong Liu; methodology, Zhihao Zhang and Zhuodong Liu; software, Zhihao Zhang and Zhuodong Liu; validation, Zhihao Zhang, Zhuodong Liu and Xiangyu Li; formal analysis, Zhihao Zhang and Zhuodong Liu; investigation, Zhihao Zhang, Zhuodong Liu and Xiangyu Li; resources, Lei Zhang; data curation, Xiangyu Li; writing—original draft preparation, Zhihao Zhang and Zhuodong Liu; writing—review and editing, Xiangyu Li and Lei Zhang; visualization, Zhihao Zhang and Zhuodong Liu; supervision, Lei Zhang; project administration, Lei Zhang. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The three benchmark datasets used in this study are publicly available: IEEE-CIS Fraud Detection (https://www.kaggle.com/c/ieee-fraud-detection), PaySim (https://www.kaggle.com/datasets/ealaxi/paysim1), and Worldline (https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud). The federated data partitioning configurations, which simulate cross-institutional settings as described in Section 3.1, are not derived from real institutional records and do not contain sensitive information. The experimental code, including data partitioning scripts and all baseline implementations, will be released upon acceptance of this paper.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Chatterjee P, Das D, Rawat DB. Digital twin for credit card fraud detection: opportunities, challenges, and fraud detection advancements. Future Gener Comput Syst. 2024;158:410–26.
2. Federal Trade Commission. New FTC data show consumers reported losing more than $12.5 billion to fraud in 2024 [Internet]. Washington, DC, USA: FTC; 2025 [cited 2025 Mar 15]. Available from: https://www.ftc.gov/news-events/news/press-releases/2025/03/new-ftc-data-show-big-jump-reported-losses-fraud-125-billion-2024.
3. Yang Z. Privacy-aware financial risk control: a federated learning approach with differential privacy optimization. J Comput Technol Softw. 2025;4:37–52.
4. McMahan B, Moore E, Ramage D, Hampson S, Agüera y Arcas B. Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS); 2017 Apr 20–22; Fort Lauderdale, FL, USA. p. 1273–82.
5. Zhu H, Xu J, Liu S, Jin Y. Federated learning on non-IID data: a survey. Neurocomputing. 2021;465:371–90. doi:10.1016/j.neucom.2021.07.098.
6. Zhang J, Li C, Qi J, He J. A survey on class imbalance in federated learning. arXiv:2303.11673. 2023.
7. Yang W, Zhang Y, Ye K, Li L, Xu CZ. FFD: a federated learning based method for credit card fraud detection. In: Proceedings of the International Conference on Big Data; 2019 Dec 10–13; Los Angeles, CA, USA. p. 18–32.
8. Abdul Salam M, Fouad KM, Elbably DL, Elsayed SM. Federated learning model for credit card fraud detection with data balancing techniques. Neural Comput Appl. 2024;36(11):7359–78. doi:10.1007/s00521-023-09410-2.
9. Shah M, Shah P, Patil S. Secure and efficient fraud detection using federated learning and distributed search databases. In: Proceedings of the IEEE 4th International Conference on AI in Cybersecurity (ICAIC). Piscataway, NJ, USA: IEEE; 2025. p. 1–6.
10. Hilou H, Ahmed M, Dheeb S, Radhi A, Khadim Z, Majeed M, et al. Federated learning for credit card fraud detection: a privacy-preserving approach with SMOTE optimization. J Al-Qadisiyah Comput Sci Math. 2025;17(3):Comp 44–57.
11. Farooq M, Munir S, Manzoor M, Shaheen M. AI-driven adaptive federated learning with privacy preservation and imbalance adjustment for financial credit card fraud detection. Appl Comput Intell Soft Comput. 2025;2025:7116768.
12. Sarkar D, Narang A, Rai S. Fed-Focal Loss for imbalanced data classification in federated learning. arXiv:2011.06283. 2020.
13. Wang L, Xu S, Wang X, Zhu Q. Addressing class imbalance in federated learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park, CA, USA: AAAI Press; 2021. Vol. 35, p. 10165–73.
14. Aljunaid SK, Almheiri SJ, Dawood H, Khan MA. Secure and transparent banking: explainable AI-driven federated learning model for financial fraud detection. J Risk Financ Manag. 2025;18(4):179.
15. Liu L, Zhang J, Song S, Letaief KB. Client-edge-cloud hierarchical federated learning. In: Proceedings of the IEEE International Conference on Communications (ICC); 2020 Jun 7–11; Dublin, Ireland. p. 1–6.
16. Zhan S, Huang L, Luo G, Zheng S, Gao Z, Chao HC. A review on federated learning architectures for privacy-preserving AI: lightweight and secure cloud-edge–end collaboration. Electronics. 2025;14(13):2512.
17. Albshaier L, Almarri S, Albuali A. Federated learning for cloud and edge security: a systematic review of challenges and AI opportunities. Electronics. 2025;14(5):1019.
18. Sattler F, Müller KR, Samek W. Clustered federated learning: model-agnostic distributed multitask learning for non-IID data. IEEE Trans Neural Netw Learn Syst. 2021;32:3710–22.
19. Gong B, Xing T, Liu Z, Xi W, Chen X. Adaptive client clustering for efficient federated learning over non-IID and imbalanced data. IEEE Trans Big Data. 2024;10(6):1051–65. doi:10.1109/tbdata.2022.3167994.
20. Duan M, Liu D, Ji X, Wu Y, Liang L, Chen X, et al. Flexible clustered federated learning for client-level data distribution shift. IEEE Trans Parallel Distrib Syst. 2022;33:2661–74. doi:10.1109/tpds.2021.3134263.
21. Ali SS, Ali M, Bhatti DMS, Choi BJ. dy-TACFL: dynamic temporal adaptive clustered federated learning for heterogeneous clients. Electronics. 2025;14(1):152. doi:10.3390/electronics14010152.
22. Islam M, Javaherian S, Xu F, Yuan X, Chen L, Tzeng N. FedClust: optimizing federated learning on non-IID data through weight-driven client clustering. In: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); 2024 May 27–31; San Francisco, CA, USA. p. 1184–6.
23. Yang X, Feng J, Tong Y, Wang L, Guo S, Fang B, et al. DA-PFL: dynamic affinity aggregation in personalized federated learning under class imbalance. IEEE Trans Neural Netw Learn Syst. 2025;36:20184–98.
24. Wang N, Deng Y, Feng W, Fan S, Yin J, Ng S. One-shot sequential federated learning for non-IID data by enhancing local model diversity. In: MM ’24: Proceedings of the 32nd ACM International Conference on Multimedia. New York, NY, USA: ACM; 2024. p. 5201–10.
25. Yan X, Zuo S, Fan R, Hu H, Shen L, Zhao P, et al. Sequential federated learning in hierarchical architecture on non-IID datasets. IEEE Trans Mob Comput. 2024;24(10):11110–24. doi:10.1109/tmc.2025.3573928.
26. Xie R, Liang W, Chen Y, He D, Jin K, Li K, et al. StarCPFL: star-centric personalized federated learning with layer-wised clustering. Future Gener Comput Syst. 2025;175:108037.
27. Criado M, Casado F, Iglesias R, Regueiro C, Barro S. Non-IID data and continual learning processes in federated learning: a long road ahead. Inf Fusion. 2022;88(3):263–80. doi:10.1016/j.inffus.2022.07.024.
28. Lee G, Jeong M, Shin Y, Bae S, Yun S. Preservation of the global knowledge by not-true distillation in federated learning. arXiv:2106.03097. 2022.
29. He Y, Chen Y, Yang X, Yu H, Huang Y, Gu Y. Learning critically: selective self-distillation in federated learning on non-IID data. IEEE Trans Big Data. 2024;10:789–800.
30. Arafeh M, Hammoud A, Guizani M, Mourad A, Otrok H, Ould-Slimane H, et al. WFSL: warmup-based federated sequential learning. IEEE Internet Things J. 2025;12:1974–89.
31. Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, et al. Deep learning with differential privacy. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS); 2016 Oct 24–28; Vienna, Austria. p. 308–18.
32. Mironov I. Rényi differential privacy. In: Proceedings of the IEEE 30th Computer Security Foundations Symposium (CSF); 2017 Aug 21–25; Santa Barbara, CA, USA. p. 263–75.
33. Truex S, Liu L, Chow KH, Gursoy ME, Wei W. LDP-Fed: federated learning with local differential privacy. In: Proceedings of the EdgeSys Workshop; 2020 Apr 27; Heraklion, Greece. p. 61–6.
34. Xue R, Xue K, Zhu B, Luo X, Zhang T, Sun Q, et al. Differentially private federated learning with an adaptive noise mechanism. IEEE Trans Inf Forensics Secur. 2024;19:74–87. doi:10.1109/tifs.2023.3318944.
35. Yuan X, Ni W, Ding M, Wei K, Li J, Poor HV. Amplitude-varying perturbation for balancing privacy and utility in federated learning. IEEE Trans Inf Forensics Secur. 2023;18:1884–97. doi:10.1109/tifs.2023.3258255.
36. Lin F, Chen E, Han D, Brinton CG. Differentially-private multi-tier federated learning: a formal analysis and evaluation. IEEE/ACM Trans Netw. 2025;34:2226–41.
37. Bai L, Hu H, Ye Q, Li H, Wang L, Xu J. Membership inference attacks and defenses in federated learning: a survey. ACM Comput Surv. 2024;57(4):1–35. doi:10.1145/3704633.
38. Deng X, Yang J. Multi-layer defense strategies and privacy preserving enhancements for membership reasoning attacks in a federated learning framework. In: Proceedings of the 5th International Conference on Computer Science and Blockchain (CCSB); 2025 Aug 1–3; Shenzhen, China. p. 278–82.
39. Li S, Ngai ECH, Voigt T. An experimental study of Byzantine-robust aggregation schemes in federated learning. IEEE Trans Big Data. 2024;10(6):975–88. doi:10.1109/tbdata.2023.3237397.
40. Nordlund D, Liao J, Chen Z. Byzantine-resilient hierarchical federated learning with clustered over-the-air aggregation. In: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Workshops; 2024 Apr 14–19; Seoul, Republic of Korea. p. 715–9.
41. Liu J, Wu Y, Du W, Sun R, Xu G, Liu L, et al. Byzantine-robust hierarchical aggregation for cross-device federated learning in consumer IoT. IEEE Trans Consum Electron. 2025;71(2):6359–70. doi:10.1109/tce.2024.3450649.
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

