Open Access
ARTICLE
PRIME: A Physics-Guided Residual Integrated Framework for Multi-Task Aircraft Engine Diagnostics
1 Faculty of Sciences and Technology, Department of Computer Sciences, L2IS Laboratory, Cadi Ayyad University, Marrakech, Morocco
2 Faculty of Sciences Semlalia, Department of Computer Sciences, LISI Laboratory, Cadi Ayyad University, Marrakech, Morocco
* Corresponding Author: Ouail Mjahed. Email:
Computer Modeling in Engineering & Sciences 2026, 147(3), 14 https://doi.org/10.32604/cmes.2026.083272
Received 31 March 2026; Accepted 14 May 2026; Issue published 30 June 2026
Abstract
Accurate aircraft engine diagnostics is essential for ensuring operational safety and enabling predictive maintenance under heterogeneous operating conditions. Although deep learning models can effectively capture high-dimensional multivariate sensor dynamics, purely data-driven approaches often entangle operating-condition variability with degradation-sensitive patterns, which limits robustness and generalization. This paper introduces PRIME, a physics-guided residual integrated framework for multi-task aircraft engine diagnostics. Rather than embedding explicit thermodynamic equations or physical constraints into the optimization process, PRIME relies on a physically motivated residual decomposition strategy that separates operating-condition-driven nominal behavior from degradation-sensitive sensor deviations. Specifically, nominal responses are estimated from operating-condition representations and subtracted from observed sensor signals to isolate fault-relevant residual patterns. These residual representations are then processed by a hybrid temporal architecture combining temporal convolutional networks and transformer-based self-attention, enabling joint modeling of local degradation signatures and long-range temporal dependencies. Within a unified optimization framework, PRIME simultaneously performs Fault Detection (FD), Fault Type Classification (FTC), and Health State Estimation (HSE). Extensive experiments on NASA C-MAPSS, N-CMAPSS, and the ALFA dataset show consistent and statistically significant improvements over strong baseline models. For FD and FTC, PRIME achieves gains of approximately 2%–4% over the strongest neural baselines evaluated under the same protocol, with larger margins over classical machine learning approaches. For HSE, PRIME yields more faithful degradation trajectories, leading to systematic reductions in estimation error across single- and multi-regime datasets. 
When Remaining Useful Life (RUL) is projected from the learned health trajectory through a threshold-based mechanism, the resulting estimates also improve substantially, with RMSE reductions of up to about 22% under complex operating conditions. These results show that physics-guided residual disentanglement improves robustness, interpretability, and multi-task diagnostic performance. More broadly, they support the view that HSE provides a useful latent degradation representation for downstream prognostic assessment, even though RUL is not directly optimized by the model.
Aircraft engines constitute one of the most safety-critical subsystems in modern aviation, where unexpected failures can severely impact operational safety, mission reliability, and maintenance costs. To mitigate such risks, Prognostics and Health Management (PHM) systems have become essential for enabling condition-based maintenance and early fault detection. The increasing deployment of onboard sensing technologies has led to the availability of high-dimensional multivariate time-series data describing engine behavior across diverse operating regimes.
Benchmark datasets such as the NASA C-MAPSS turbofan simulation dataset [1] have played a central role in advancing data-driven diagnostic research by providing multivariate run-to-failure trajectories under multiple fault conditions. More recently, the N-CMAPSS dataset [2] introduced higher-fidelity flight condition simulations and richer labeling structures, enabling both prognostics and fault classification under realistic operational variability. In parallel, complementary real-world datasets such as the AirLab Failure and Anomaly (ALFA) dataset [3] provide annotated UAV fault and anomaly scenarios, allowing additional evaluation under practical flight conditions.
Early approaches to aircraft engine fault diagnosis relied primarily on classical machine learning methods, including Support Vector Machines, k-Nearest Neighbors, and ensemble learning techniques [4,5]. While effective under controlled conditions, these models depend heavily on handcrafted features and often struggle to generalize across complex and variable operating regimes.
The emergence of deep learning has significantly improved diagnostic performance by enabling automatic feature extraction from raw multivariate signals. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have demonstrated strong capabilities in modeling temporal degradation patterns [6,7]. More recently, transformer-based architectures leveraging self-attention mechanisms have shown superior ability to capture long-range temporal dependencies in time-series PHM applications [8,9]. Despite their success, purely data-driven deep models often suffer from limited robustness under varying operating conditions, reduced interpretability, and degraded performance when encountering rare or previously unseen fault modes.
A key challenge in aircraft engine diagnostics lies in the strong coupling between operating-condition dynamics and degradation-induced sensor variations. Changes in altitude, throttle settings, and environmental conditions significantly influence sensor measurements, potentially masking early degradation signatures. Without explicitly accounting for this coupling, data-driven models may inadvertently learn operating regime characteristics rather than fault-related patterns, leading to unreliable generalization.
To address these limitations, hybrid and domain-guided learning paradigms have attracted increasing attention. Theory-guided data science frameworks [10] and physics-informed neural networks [11,12] advocate integrating domain knowledge into data-driven architectures to enhance robustness and interpretability. In the context of aero-engine diagnostics, hybrid approaches combining physics-based performance models with deep learning have demonstrated improved stability under variable operating conditions [13,14]. However, many existing approaches either impose loosely coupled physical constraints or do not explicitly disentangle operating-condition effects from degradation dynamics within the learned representations.
Motivated by the need for robust and generalizable aircraft engine diagnostics under varying operating conditions, this paper proposes PRIME (Physics-guided Residual Integrated framework for Multi-task Engine diagnostics), a hybrid framework that explicitly decouples operating-condition dynamics from degradation-sensitive sensor patterns through a physically motivated residual decomposition strategy. Specifically, PRIME estimates a nominal sensor response conditioned on the operating regime and subtracts it from the observed measurements to isolate degradation-relevant residual signals prior to feature extraction. These residual representations are subsequently processed using a hybrid temporal architecture combining Temporal Convolutional Networks and transformer-based self-attention, enabling the model to capture both local degradation signatures and long-range temporal dependencies. The proposed nominal–residual separation is intended to reduce the entanglement between operating-condition variability and degradation-sensitive patterns. However, this decomposition remains an approximation and may become less accurate when strong nonlinear interactions exist between regime dynamics and fault evolution. More broadly, while the framework is evaluated on heterogeneous datasets, the present study does not yet constitute a dedicated stress test for unseen operating regimes or sensor drift. Accordingly, the value of PRIME should be understood not only in terms of average accuracy gains, but also in its unified treatment of FD, FTC, and HSE, its support for downstream prognostic analysis, and its built-in interpretability under variable operating conditions. In addition, a hierarchical attention mechanism provides sensor-level and temporal interpretability, improving diagnostic transparency and trustworthiness.
Unlike conventional regime normalization techniques, which attempt to mitigate operating-condition variability through preprocessing transformations or domain adaptation strategies, PRIME performs representation-level disentanglement through residual modeling. By explicitly separating nominal regime-dependent behavior from degradation-induced variations, the framework allows the neural network to focus more directly on fault-related information.
It is important to distinguish the proposed approach from physics-informed neural network (PINN) paradigms in the strict sense. PINN-based methods typically embed governing equations, conservation laws, or explicit physical constraints into the optimization process. PRIME does not require explicit thermodynamic equations or equation-constrained loss terms. Instead, its physical grounding lies in a structured and physically motivated separation between operating-condition-driven nominal behavior and degradation-sensitive residual deviations. In this sense, PRIME is best characterized as a physics-guided residual learning framework rather than a physics-informed model in the strict PINN sense.
Consequently, PRIME combines the flexibility of data-driven deep learning with a physically motivated residual representation, enabling improved robustness under varying operating conditions while remaining compatible with practical PHM settings in which explicit first-principles models may be unavailable or difficult to integrate.
The main contributions of this work are summarized as follows:
• We propose a physics-guided residual decomposition strategy that explicitly separates operating-condition effects from degradation-sensitive sensor variations in aircraft engine data.
• We design a hybrid deep architecture integrating temporal convolution and transformer-based self-attention for multi-scale temporal modeling.
• We incorporate a hierarchical attention mechanism to enhance interpretability and robustness of diagnostic decisions.
• We develop a unified multi-task framework capable of simultaneously performing fault detection (FD), fault type classification (FTC), and health state estimation (HSE) within a single model.
• We conduct extensive experiments on C-MAPSS, N-CMAPSS, and ALFA datasets, demonstrating consistent and statistically significant improvements over strong machine learning and deep learning baselines under a unified evaluation protocol.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed PRIME framework. Section 4 describes the experimental setup, while Section 5 presents and discusses the experimental results. Sections 6 and 7 report the statistical significance analysis and the interpretability/physical consistency analysis, respectively. Section 8 presents the ablation study, Section 9 analyzes the computational complexity, Section 10 discusses the comparison with related literature, Section 11 outlines the main limitations of the proposed approach, and Section 12 concludes the paper.
This section reviews recent advances in aircraft engine fault diagnosis and prognostics with a focus on five complementary directions: deep temporal modeling, transformer-based PHM architectures, robustness under varying operating conditions, residual and physics-guided hybrid learning, and explainability in safety-critical PHM systems.
2.1 Deep Temporal Modeling for Engine Fault Diagnosis
Deep temporal models have significantly advanced aircraft engine fault diagnosis by enabling direct learning from multivariate sensor trajectories. Beyond early CNN- and LSTM-based architectures, recent studies have explored more expressive temporal modeling strategies for capturing progressive degradation patterns. Multi-scale convolutional networks and dilated Temporal Convolutional Networks (TCNs) have shown improved capability in capturing local and medium-range degradation dynamics at different temporal resolutions [15,16].
Graph-based deep learning approaches have also been introduced to explicitly model inter-sensor dependencies. In particular, Graph Neural Networks (GNNs) have shown promising results in engine fault classification by learning relational structures among correlated sensor channels [17,18]. These developments highlight the importance of jointly modeling temporal evolution and cross-sensor interactions.
However, most temporal models remain predominantly data-driven and often assume stationary degradation structures. As a result, they may not explicitly account for operating-condition variability, which can lead to entanglement between regime effects and degradation-sensitive patterns.
2.2 Transformer-Based Architectures in PHM
Transformer models have recently gained increasing attention in PHM due to their ability to capture long-range dependencies without recurrent computations. Architectures such as Informer, Autoformer, and related time-series transformers have been adapted to Remaining Useful Life (RUL) prediction and health state estimation [19,20].
In engine diagnostics, attention-based architectures can improve feature weighting across sensors and time steps, while also providing a degree of interpretability by highlighting critical degradation windows [21]. These properties make transformers attractive for complex PHM settings in which both short-term fluctuations and long-horizon dependencies are important.
Recent work has also explored the use of time-series foundation models for few-shot aircraft engine prognostics, highlighting the growing importance of transferability and data-efficient temporal representation learning in PHM settings [22]. Current aircraft engine PHM studies have explored adaptive deep Q-learning and heterogeneous deep ensembles, further underscoring the need for operating-condition-aware and unified diagnostic frameworks [23,24].
Nevertheless, transformer-based models remain largely data-driven and typically exhibit quadratic complexity with sequence length. More importantly, most existing transformer PHM studies focus on prognostics, especially RUL estimation, rather than explicit multi-class fault diagnosis under heterogeneous operating conditions. Their robustness under strong regime shifts also remains an open challenge.
2.3 Robustness under Variable Operating Conditions
A persistent difficulty in aircraft engine diagnostics is the strong influence of operating conditions on sensor measurements. Variations in altitude, throttle setting, ambient pressure, and flight regime can induce substantial changes in the observed signals, often masking early degradation signatures.
To address this issue, several normalization and domain adaptation strategies have been proposed. Domain-adversarial neural networks and transfer learning approaches have been applied to C-MAPSS subsets with multiple operating regimes to reduce distribution shifts between training and testing data [25]. Other studies employ operating-condition clustering or regime-specific normalization prior to model training [26].
Although these strategies can mitigate regime bias, they often treat operating conditions as nuisance variables to be normalized away, rather than explicitly separating nominal operating behavior from degradation-related deviations. Consequently, degradation-sensitive information may remain entangled with regime-dependent variability in the learned latent space.
2.4 Residual Learning and Physics-Guided Hybrid Frameworks
Residual-based modeling has long been recognized as an effective strategy for isolating fault-relevant deviations from nominal behavior. In aero-engine monitoring, residual generation techniques derived from performance analysis or reference models have been used for fault isolation in model-based diagnosis frameworks [27].
More recently, hybrid approaches combining physical insight with neural networks have been proposed to improve robustness and interpretability. For example, physics-augmented deep models incorporate estimated thermodynamic variables or simulator-derived features as additional inputs to improve fault classification performance [13]. Other approaches use physics-aware regularization to encourage consistency between model outputs and known physical relationships [28,29].
It is important, however, to distinguish between physics-informed learning in the strict PINN sense and broader physics-guided or physically motivated hybrid modeling. PINN-style methods explicitly embed governing equations, conservation laws, or differentiable physical constraints into the optimization process [11,12]. By contrast, many practical PHM methods do not impose explicit equations, but instead exploit domain knowledge through architecture design, feature engineering, or residual decomposition. PRIME belongs to this latter category: its physical motivation lies in separating operating-condition-driven nominal behavior from degradation-sensitive residual deviations, rather than enforcing thermodynamic equations in the loss function.
Despite these advances, limited work has systematically combined residual-based operating-condition disentanglement with hybrid temporal modeling and built-in interpretability for multi-task aircraft engine diagnostics.
2.5 Explainability in PHM Models
As deep PHM models become increasingly complex, interpretability has become essential, particularly in safety-critical applications. Attention visualization, SHAP values, saliency analysis, and related post-hoc techniques have been used to identify important sensors and degradation stages in engine diagnostics [21,30]. However, post-hoc explanation methods do not always guarantee faithful correspondence with the internal reasoning process of the model. This limitation has motivated interest in architectures that incorporate interpretability directly into the learning pipeline. In aircraft engine diagnostics, such intrinsically interpretable designs remain relatively underexplored.
In contrast to prior work, PRIME introduces a physics-guided residual decomposition mechanism that explicitly separates operating-condition-driven nominal behavior from degradation-sensitive sensor deviations prior to deep temporal feature extraction. Unlike preprocessing-based regime normalization or domain adaptation methods, PRIME modifies the representation space itself through nominal–residual separation, allowing the downstream network to focus more directly on fault-relevant information.
Furthermore, PRIME integrates this residual modeling strategy within a hybrid Temporal Convolutional–Transformer architecture equipped with hierarchical attention. This combination enables:
• multi-scale temporal degradation modeling across short- and long-range dependencies,
• improved robustness under heterogeneous operating regimes,
• simultaneous multi-task predictions for FD, FTC, and HSE,
• built-in interpretability at both sensor and temporal levels.
Accordingly, the contribution of PRIME is not to introduce physics-informed learning in the strict PINN sense, but to provide a unified physics-guided residual learning framework for interpretable and robust multi-task aircraft engine diagnostics.
This section presents PRIME, a physics-guided and domain-robust deep learning framework designed for aircraft engine diagnostics under heterogeneous operating conditions.
PRIME is built around four main design principles: (i) explicit disentanglement of operating-condition effects from degradation-sensitive sensor variations, (ii) physically motivated residual decomposition, (iii) hybrid temporal modeling of residual dynamics, and (iv) unified multi-task prediction of fault detection (FD), fault type classification (FTC), and health state estimation (HSE).
Fig. 1 illustrates the overall workflow of PRIME. At a high level, the framework first estimates nominal operating-condition-driven behavior, then isolates degradation-sensitive residuals, and finally processes these residual representations through a hybrid temporal encoder with hierarchical attention to support multi-task diagnostic inference.

Figure 1: PRIME framework: physics-guided residual decomposition with hierarchical attention for multi-task aircraft engine diagnostics.
Let
where
where
3.2 Neural Architecture of PRIME
PRIME is organized as a two-branch neural architecture followed by a shared latent representation and three task-specific prediction heads. The first branch encodes operating-condition variables in order to estimate nominal behavior under varying regimes. The second branch processes degradation-sensitive residual signals through a hybrid temporal encoder that combines local and long-range sequence modeling.
The attended shared representation is then exploited by three heads dedicated to FD, FTC, and HSE. Importantly, PRIME does not directly optimize RUL through an additional neural head. Instead, RUL is treated as a downstream prognostic quantity projected from the learned health trajectory, as described in Section 3.8. Table 1 summarizes the functional neural architecture of the framework.

Because PRIME follows a sequential pipeline in which operating-condition encoding precedes residual extraction and temporal modeling, errors introduced in upstream stages may influence downstream representations. This modular design was adopted to improve interpretability and explicit nominal–residual disentanglement, but it also implies that imperfect nominal estimation may affect subsequent diagnostic processing.
3.3 Operating-Condition Encoding
The operating-condition branch captures regime-dependent nominal dynamics driven by environmental and control variables. A gated recurrent unit (GRU) is used to model temporal dependencies in the operating-condition sequence [31,32]:
where
3.4 Physics-Guided Residual Decomposition
To isolate degradation-sensitive deviations from operating-condition-driven variability, PRIME introduces a physics-guided residual decomposition layer. The approach is physically motivated rather than equation-constrained: it does not embed explicit thermodynamic equations or conservation laws in the loss function. Instead, it relies on the assumption that observed sensor measurements can be decomposed into: (i) a nominal component primarily associated with operating conditions, and (ii) a residual component more sensitive to degradation.
Nominal sensor behavior is estimated as:
where
The residual signal is then computed as:
This residual decomposition relies on an approximate additive separation between regime-dependent nominal behavior and degradation-sensitive deviations. Such an assumption is physically motivated and practically useful for disentangling operating-condition effects from fault-related patterns, but it does not imply that engine dynamics are strictly additive in all regimes. In particular, strong nonlinear coupling between operating conditions and degradation mechanisms may reduce the fidelity of this decomposition.
In addition, the quality of the residual representation depends on the accuracy of the nominal estimation stage. If the operating-condition encoder does not fully capture regime-dependent behavior, the subtraction step may introduce residual bias, which can then propagate to the downstream temporal modeling and task-specific heads. Accordingly, the proposed residual representation should be interpreted as a structured approximation for improving diagnostic inference rather than as an exact physical decomposition of engine behavior.
This decomposition suppresses operating-condition effects and highlights fault-relevant deviations. As a result, the downstream temporal encoder receives inputs that are less dominated by regime variability and more directly linked to degradation-sensitive patterns.
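The nominal–residual separation described above can be illustrated with a minimal sketch. It is not the PRIME implementation: the learned operating-condition encoder is replaced here by a simple per-regime mean (an assumed stand-in), and the function names, regime labels, and toy values are all hypothetical.

```python
from statistics import mean

def fit_regime_means(readings, regimes):
    """Average each sensor per operating regime.

    Assumed stand-in for the learned nominal model; PRIME itself uses a
    neural operating-condition encoder rather than regime averages.
    """
    grouped = {}
    for row, regime in zip(readings, regimes):
        grouped.setdefault(regime, []).append(row)
    return {
        regime: [mean(col) for col in zip(*rows)]
        for regime, rows in grouped.items()
    }

def residuals(readings, regimes, nominal):
    """Subtract the regime-conditioned nominal response from each observation."""
    return [
        [x - n for x, n in zip(row, nominal[regime])]
        for row, regime in zip(readings, regimes)
    ]

# Toy healthy data: two regimes with very different sensor baselines.
healthy = [[100.0, 5.0], [102.0, 5.5], [300.0, 9.0], [298.0, 8.5]]
healthy_regimes = ["idle", "idle", "cruise", "cruise"]
nominal = fit_regime_means(healthy, healthy_regimes)

# New observations: the first sensor has drifted upward in the "cruise" regime.
observed = [[101.0, 5.25], [305.0, 8.75]]
observed_regimes = ["idle", "cruise"]
res = residuals(observed, observed_regimes, nominal)
```

In this toy case the raw readings differ by roughly 200 units between regimes, whereas the residuals differ only where the drift was injected, which is the effect the decomposition is meant to produce.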
The residual sequence is processed through a hybrid temporal encoder composed of a Temporal Convolutional Network (TCN) followed by a Transformer encoder. The TCN captures short-range and local temporal correlations [15,33], while the Transformer models long-range dependencies across time and residual patterns [34]:
This hybrid design combines the strengths of convolutional temporal abstraction and self-attention-based sequence modeling, enabling more effective representation of progressive degradation signatures.
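To make the division of labor concrete, the following sketch applies a causal dilated convolution (the local operation underlying a TCN) followed by single-head scaled dot-product self-attention. It is a simplified illustration under stated assumptions: the kernel weights, the dilation, and the use of identity query/key/value projections are hypothetical choices, and no learned parameters, residual connections, or normalization layers are included.

```python
import math

def causal_dilated_conv(seq, kernel, dilation):
    """1-D causal convolution: output at t uses seq[t], seq[t-d], seq[t-2d], ...

    Positions before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)])
    return out

# Residual sequence with a single impulse at t = 2.
residual = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Local stage: each output mixes the current step with the step two cycles earlier.
local = causal_dilated_conv(residual, kernel=[0.5, 0.5], dilation=2)

# Global stage: every time step attends to every other time step.
features = [[v] for v in local]
context = self_attention(features)
```

The convolution only spreads the impulse to positions within its receptive field, while the attention stage lets every position draw on the whole sequence; stacking dilations in a real TCN widens the local field, but long-range mixing is left to attention.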
3.6 Hierarchical Attention Mechanism
To enhance interpretability and diagnostic precision, PRIME applies hierarchical attention over the encoded residual features [34,35]. Two complementary attention mechanisms are used.
Sensor-level attention assigns adaptive importance weights to each sensor channel:
where
Temporal attention identifies critical time steps associated with degradation progression:
The combination of sensor-level and temporal attention yields an attended representation that is both more discriminative and more interpretable.
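A two-stage attention pooling of this kind can be sketched as follows. The scoring functions are assumptions made for illustration: each sensor is scored by its mean absolute activation and each time step by the magnitude of its sensor-weighted summary, whereas PRIME learns its attention scores. All names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hierarchical_attention(features):
    """Pool features[t][s] (time step t, sensor s) with sensor then temporal weights."""
    n_steps = len(features)
    n_sensors = len(features[0])

    # Sensor level: one weight per channel (assumed scorer: mean absolute activation).
    sensor_scores = [
        sum(abs(row[s]) for row in features) / n_steps for s in range(n_sensors)
    ]
    alpha = softmax(sensor_scores)

    # Collapse the sensor axis at every time step using the sensor weights.
    per_step = [sum(a * x for a, x in zip(alpha, row)) for row in features]

    # Temporal level: one weight per time step (assumed scorer: summary magnitude).
    beta = softmax([abs(v) for v in per_step])

    # Final attended scalar summary of the window.
    attended = sum(b * v for b, v in zip(beta, per_step))
    return alpha, beta, attended

# Toy encoded residuals: sensor 0 deviates strongly at the last time step.
encoded = [[0.1, 0.0], [0.2, 0.0], [2.0, 0.1]]
alpha, beta, attended = hierarchical_attention(encoded)
```

Because both weight vectors sum to one, they can be read directly as relative importances: here `alpha` favors sensor 0 and `beta` peaks at the final step, which is the kind of sensor-level and temporal attribution the mechanism is intended to expose.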
3.7 Multi-Task Diagnostic Framework
PRIME is formulated as a unified multi-task learning framework addressing three complementary objectives: Fault Detection (FD), Fault Type Classification (FTC), and Health State Estimation (HSE).
Given an input sequence
which feeds three task-specific heads:
where
HSE as a Continuous Diagnostic Variable: HSE provides a continuous representation of degradation dynamics. Rather than treating health as a direct rescaling of RUL, PRIME learns a relative health index reflecting the progressive deviation of the engine from nominal behavior. Healthy conditions correspond to values close to 1, while increasingly degraded states approach 0. In multi-fault settings, intermediate values may capture relative degradation severity.
Because the HSE head is trained from residual degradation-sensitive representations, it is expected to encode smoother degradation dynamics than purely class-based supervision alone. In the main evaluation protocol, HSE is treated primarily as a continuous regression task. Accordingly, the predicted health index
Joint Optimization Objective: The overall training objective combines the three task-specific losses:
where
3.8 From Health State Estimation to RUL Projection
PRIME does not directly optimize Remaining Useful Life (RUL). Instead, RUL is estimated as a downstream prognostic projection from the learned health trajectory
Failure threshold: A failure event is defined as the first time at which the predicted health trajectory reaches a predefined threshold
The estimated RUL at time
Monotonic extrapolation: In practical settings, the predicted health trajectory may not intersect the threshold within the observed time horizon. In such cases,
Let the fitted local degradation model be:
with
Evaluation protocol: RUL performance is evaluated using RMSE, MAE, and the NASA scoring function. Importantly, no additional RUL loss is introduced during training. Therefore, RUL should be interpreted as a downstream operational indicator derived from the learned health trajectory rather than as an independently optimized prediction task.
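For reference, the three RUL metrics can be computed as in the sketch below. The scoring function is written in the form commonly used with the C-MAPSS benchmark, with constants 13 and 10 for early and late predictions; that PRIME uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def rmse(true_rul, pred_rul):
    """Root mean squared error between true and predicted RUL."""
    n = len(true_rul)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(true_rul, pred_rul)) / n)

def mae(true_rul, pred_rul):
    """Mean absolute error between true and predicted RUL."""
    n = len(true_rul)
    return sum(abs(p - y) for y, p in zip(true_rul, pred_rul)) / n

def nasa_score(true_rul, pred_rul):
    """Asymmetric score: late predictions (d > 0) are penalized more than early ones."""
    total = 0.0
    for y, p in zip(true_rul, pred_rul):
        d = p - y  # positive when the model overestimates remaining life
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0
        else:
            total += math.exp(d / 10.0) - 1.0
    return total

# Same absolute error, opposite sign: the late prediction scores worse.
late = nasa_score([50.0], [60.0])
early = nasa_score([50.0], [40.0])
```

The asymmetry reflects the operational cost structure: overestimating remaining life risks an in-service failure, so it is weighted more heavily than a conservative underestimate.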
Discussion. More flexible nonlinear projection strategies could also be considered, such as spline-based extrapolation, piecewise degradation models, Gaussian-process trajectory forecasting, or parametric wear models. Such alternatives may better capture nonlinear degradation phases, especially in settings with strong regime changes or accelerated wear near failure. Exploring these nonlinear HSE-to-RUL mappings is left for future work.
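The threshold-based projection of this subsection can be sketched as follows, assuming a linear local degradation model fitted by least squares over the most recent points. The threshold value, the fitting-window length, and the function names are hypothetical; the sketch only illustrates the two cases (observed crossing versus extrapolated crossing).

```python
def fit_line(ts, hs):
    """Ordinary least-squares fit h ~ slope * t + intercept (needs >= 2 distinct ts)."""
    n = len(ts)
    t_bar = sum(ts) / n
    h_bar = sum(hs) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (h - h_bar) for t, h in zip(ts, hs))
    slope = sxy / sxx
    return slope, h_bar - slope * t_bar

def failure_time(health, threshold, fit_window):
    """First time the health trajectory reaches `threshold`, extrapolating if needed.

    Returns None when the recent trend is not decreasing, since no crossing
    can then be projected.
    """
    # Case 1: the threshold is reached within the observed horizon.
    for t, h in enumerate(health):
        if h <= threshold:
            return float(t)
    # Case 2: extrapolate a line fitted to the last `fit_window` points.
    ts = list(range(len(health)))[-fit_window:]
    hs = health[-fit_window:]
    slope, intercept = fit_line(ts, hs)
    if slope >= 0:
        return None
    return (threshold - intercept) / slope

def rul_at(t, t_fail):
    """Remaining useful life at time t, clipped at zero after failure."""
    return max(t_fail - t, 0.0)

# Health decays by 0.1 per cycle and has not yet reached the threshold.
trajectory = [1.0, 0.9, 0.8, 0.7, 0.6]
t_fail = failure_time(trajectory, threshold=0.2, fit_window=3)
rul_now = rul_at(len(trajectory) - 1, t_fail)
```

With this toy trajectory the fitted slope is about -0.1 per cycle, so the projected crossing falls near cycle 8 and the RUL at the last observed cycle is about 4 cycles.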
Algorithm 1 summarizes the training and inference procedure of PRIME. The framework first estimates nominal operating-condition-driven behavior, then extracts degradation-sensitive residuals, and finally processes these residual representations through a hybrid temporal encoder with hierarchical attention for multi-task prediction.

This section describes the datasets, preprocessing protocol, implementation details, evaluation metrics, and hyperparameter configuration used to validate the proposed PRIME framework.
The proposed PRIME framework is evaluated on three benchmark datasets covering both simulated turbofan degradation scenarios and complementary real-world UAV fault conditions: C-MAPSS, N-CMAPSS, and ALFA. Together, these datasets support a broad assessment of diagnostic robustness under controlled simulation settings and practical flight disturbances.
C-MAPSS: The NASA Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset [1,36] provides multivariate run-to-failure time-series collected from simulated turbofan engines under controlled degradation scenarios. The benchmark consists of trajectories generated under three operating settings and 21 sensor measurements, together with unit index and cycle information, resulting in 26 columns in the raw data format. Four subsets (FD001–FD004) are provided, differing in operating conditions and fault modes. FD002 and FD004 involve multiple operating regimes, while FD003 and FD004 include two degradation modes, namely HPC degradation and fan degradation [37].
N-CMAPSS: The N-CMAPSS dataset [2,38] extends C-MAPSS with higher-fidelity physics-based simulation and more realistic flight profiles. It includes additional environmental variables, richer sensor measurements, and explicit fault annotations. Compared with C-MAPSS, N-CMAPSS introduces greater operational variability and more complex degradation behavior, making it particularly relevant for evaluating robustness under realistic flight-condition changes.
ALFA: The AirLab Failure and Anomaly (ALFA) dataset [3] is a real fixed-wing UAV fault and anomaly detection dataset comprising 47 autonomous flight sequences collected under multiple failure scenarios. These include sudden full engine power loss as well as control-surface actuator faults. Unlike the simulation-based C-MAPSS and N-CMAPSS benchmarks, ALFA reflects real-world noise, non-stationarity, and operational uncertainty. In this work, ALFA is used as a complementary real-world dataset for evaluating fault detection robustness under practical flight conditions, rather than as a full turbofan prognostics benchmark.
Table 2 summarizes the key characteristics of the six evaluation datasets used to assess the PRIME framework: the four C-MAPSS subsets (FD001–FD004), N-CMAPSS, and ALFA.

4.2 Data Preparation and Preprocessing
To ensure methodological consistency across datasets and tasks, all experiments in this work are conducted under a unified ten-fold cross-validation protocol at the engine-unit level. For each fold, engine trajectories are partitioned into training and validation/test subsets without overlap, so that all reported results are expressed as mean ± standard deviation across folds.
A unified preprocessing pipeline is then applied across datasets and tasks.
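The engine-unit-level fold assignment described at the start of this subsection can be sketched as below. The key property is that all samples from one engine land on the same side of the split. The function name, the seed, and the round-robin assignment are assumptions for illustration, not the authors' code.

```python
import random

def engine_level_folds(unit_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs with no engine shared between the two.

    `unit_ids[i]` is the engine identifier of sample i. With fewer distinct
    engines than folds, some folds would be empty.
    """
    units = sorted(set(unit_ids))
    random.Random(seed).shuffle(units)
    # Round-robin assignment of whole engines to folds.
    folds = [units[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i, u in enumerate(unit_ids) if u not in held]
        test_idx = [i for i, u in enumerate(unit_ids) if u in held]
        yield train_idx, test_idx

# Toy example: 20 engines with 3 windows each.
unit_ids = [u for u in range(20) for _ in range(3)]
splits = list(engine_level_folds(unit_ids, n_folds=10, seed=0))
```

Splitting by engine rather than by window matters because overlapping windows from the same trajectory are highly correlated; mixing them across train and test would inflate the reported scores.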
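Sensor signals are standardized using z-score normalization:
$$\tilde{x}_{s}(t) = \frac{x_{s}(t) - \mu_{s}}{\sigma_{s}}$$
where $x_{s}(t)$ is the raw measurement of sensor $s$ at time $t$, and $\mu_{s}$ and $\sigma_{s}$ denote the mean and standard deviation of that sensor.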
A sliding-window strategy is applied to construct temporal samples:
This segmentation enables the learning of short- and mid-term degradation dynamics.
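The two preprocessing steps above can be sketched together. The window length and stride below are hypothetical values, and the guard for constant sensors is an added assumption; in practice the normalization statistics would normally be estimated on training engines only, although this passage does not state the exact choice.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each sensor column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Constant sensors have zero spread; fall back to 1.0 to avoid dividing by zero.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]

def sliding_windows(rows, window, stride=1):
    """Cut a trajectory into overlapping windows of `window` consecutive cycles."""
    return [rows[i:i + window] for i in range(0, len(rows) - window + 1, stride)]

# Toy trajectory: 4 cycles, 2 sensors (the second sensor is constant).
trajectory = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
normalized = zscore_columns(trajectory)
windows = sliding_windows(normalized, window=3, stride=1)
```

A trajectory of length T yields T - window + 1 samples at stride 1, so shorter windows give more training samples but less temporal context per sample.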
Labels are generated according to the task formulation:
• FD: binary labels derived from fault annotations or failure-oriented decision criteria, depending on the dataset.
• FTC: multi-class labels corresponding to distinct fault modes (FD003, FD004).
• HSE: continuous health targets defined as a relative degradation indicator, where values close to 1 correspond to healthy operation and lower values indicate progressive degradation.
For ALFA, noise filtering and outlier removal are additionally applied to mitigate measurement variability inherent to real flight data.
The PRIME model is implemented in PyTorch. Experiments are conducted on an NVIDIA RTX A6000 GPU (48 GB VRAM).
Model parameters are optimized using the Adam optimizer [39].
Training is performed for a maximum of 100 epochs with early stopping based on validation loss to mitigate overfitting.
A ten-fold cross-validation protocol was adopted at the engine-unit level. For each fold, engines were partitioned into training and validation/test subsets without overlap, and all reported results are expressed as mean ± standard deviation across folds.
Performance evaluation is conducted separately for the three diagnostic tasks: FD, FTC, and HSE. Classification metrics are derived from the confusion matrix, where
4.4.1 Classification Metrics (FD, FTC)
For each class
The overall Accuracy (
where
For binary FD, the False Alarm Rate (FAR) is defined as:
where
The Area Under the Receiver Operating Characteristic Curve (AUC) is additionally reported to evaluate discriminative capability independently of decision thresholds. For multi-class FTC, the main evaluation metrics are Accuracy, Macro-Recall, Macro-
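The confusion-matrix-based metrics of this subsection can be sketched as follows. This is an illustration using the standard textbook definitions of Precision, Recall, F1, Accuracy, and FAR; the function names, the row/column convention, and the example matrices are assumptions and do not correspond to results in this paper:

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, F1, and overall accuracy.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to the class, but predicted as another

    def safe_div(num, den):
        # Return 0 where the denominator is 0 instead of dividing.
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

def false_alarm_rate(cm):
    """FAR = FP / (FP + TN) for a 2x2 matrix with class 0 = nominal, class 1 = faulty."""
    cm = np.asarray(cm, dtype=float)
    tn, fp = cm[0, 0], cm[0, 1]
    return fp / (fp + tn)

# Hypothetical 3-class confusion matrix (made-up counts).
cm_example = [[50, 2, 3],
              [4, 40, 6],
              [1, 5, 39]]
precision, recall, f1, accuracy = per_class_metrics(cm_example)
macro_recall = recall.mean()
macro_f1 = f1.mean()
far = false_alarm_rate([[90, 10], [5, 95]])
```

Macro-averaged scores are obtained here as the unweighted mean of the per-class values, which gives each class the same influence regardless of its frequency.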
4.4.2 Regression Metrics (Continuous HSE and RUL)
For continuous Health State Estimation (HSE), performance is evaluated using regression metrics, namely Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), coefficient of determination (
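For reference, the regression metrics named above can be computed as in the following sketch. MAE, RMSE, and the coefficient of determination follow their standard definitions. The `nasa_score` function implements the asymmetric exponential scoring function commonly used with C-MAPSS (constants 13 and 10); whether the paper uses exactly this form is an assumption, and the example values are made up:

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equally long sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

def nasa_score(rul_true, rul_pred):
    """Asymmetric score: late predictions (pred > true) are penalized more heavily."""
    total = 0.0
    for t, p in zip(rul_true, rul_pred):
        d = p - t
        total += math.exp(-d / 13.0) - 1.0 if d < 0 else math.exp(d / 10.0) - 1.0
    return total

# Hypothetical targets and predictions (in cycles).
y_true = [100, 80, 60, 40]
y_pred = [90, 85, 60, 50]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, rmse, r2)  # 6.25 7.5 0.8875
```

The asymmetry of the score reflects that overestimating the remaining life is operationally more costly than underestimating it.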
To validate performance differences between PRIME and baseline models, statistical comparisons are conducted across cross-validation folds. Paired Student’s
where
Statistical significance is established at
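A fold-wise paired comparison of this kind can be sketched as follows, assuming that the effect size is Cohen's d computed on the paired differences (the exact effect-size definition is not recoverable from the text above, so this choice is an assumption). The fold scores below are invented for illustration only and are not results from this paper:

```python
import math
import statistics

def paired_t_and_cohens_d(scores_a, scores_b):
    """Paired t statistic, Cohen's d on the differences, and degrees of freedom."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)           # sample standard deviation (n - 1)
    t_stat = mean_d / (sd_d / math.sqrt(n))  # paired Student's t statistic
    cohens_d = mean_d / sd_d                 # standardized mean difference
    return t_stat, cohens_d, n - 1

# Made-up accuracies over ten folds for two models.
model_a = [97.1, 97.6, 97.3, 97.8, 97.4, 97.5, 97.2, 97.9, 97.6, 97.4]
model_b = [94.5, 94.9, 94.2, 95.1, 94.6, 94.8, 94.3, 95.0, 94.7, 94.4]
t_stat, cohens_d, dof = paired_t_and_cohens_d(model_a, model_b)
```

The p-value is then obtained from the Student's t distribution with `dof` degrees of freedom; a statistics library routine such as `scipy.stats.ttest_rel` returns both the statistic and the p-value directly.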
PRIME is compared against representative deep learning baselines: CNN-based architectures [41], LSTM networks [42], BiLSTM models [43], and Transformer-based models [34,44].
All baselines are trained under identical preprocessing and evaluation protocols to ensure fair comparison.
4.6 Implementation and Hyperparameter Configuration
The hyperparameters of PRIME were selected through validation-based grid search to ensure fair comparison across datasets and baseline models. In order to improve reproducibility, Table 3 reports the main configuration settings associated with the operating-condition encoder, the residual temporal encoder, the optimization procedure, and the multi-task objective.

5 Multi-Task Results and Discussion
This section evaluates PRIME across the three diagnostic tasks: Fault Detection (FD), Fault Type Classification (FTC), and Health State Estimation (HSE), followed by the derived Remaining Useful Life (RUL) analysis.
Unless otherwise stated, all experiments are conducted using a ten-fold cross-validation protocol at the engine-unit level. The reported results are expressed as mean ± standard deviation across folds.
Although PRIME is designed as a unified multi-task framework, each task is evaluated only when the corresponding dataset provides annotations that support a meaningful and methodologically consistent assessment.
Fault Detection (FD) is reported on all datasets, since all of them provide labels that allow discrimination between nominal and faulty operating conditions.
Fault Type Classification (FTC) is evaluated only on FD003 and FD004, which provide a directly comparable multi-class fault taxonomy under controlled simulated degradation settings. Although N-CMAPSS contains richer fault-related annotations, the subset adopted in this study was selected primarily for FD and HSE evaluation under heterogeneous operating conditions. Its label organization is not directly aligned with the FTC configuration used for FD003 and FD004; therefore, N-CMAPSS is not included in the main FTC benchmark in order to preserve task consistency across datasets.
Health State Estimation (HSE) is assessed on FD001–FD004 and N-CMAPSS, where progressive degradation trajectories are available. By contrast, ALFA is not used for HSE or RUL evaluation because it does not provide continuous health trajectories or standardized run-to-failure targets comparable to those available in C-MAPSS and N-CMAPSS.
ALFA is therefore included as a complementary real-world UAV fault/anomaly dataset for evaluating FD robustness under practical flight conditions only. In this sense, PRIME remains a flexible multi-task architecture, but the activation of its task-specific heads depends on the annotation structure and prognostic suitability of each dataset.
This task–dataset alignment ensures methodological consistency, avoids task–label mismatch, and enables fairer interpretation of the reported results. Pairwise statistical significance is further assessed using paired
5.2 Fault Detection Performance
Table 4 summarizes the binary FD results across all datasets using ten-fold cross-validation.

As expected, FD001 yields the highest overall performance due to its lower operating-regime variability and simpler degradation patterns compared to multi-condition datasets.
PRIME consistently achieves the highest detection performance across all datasets. Accuracy improvements over the Transformer baseline range from +2.18% (FD002) to +3.38% (FD004). On FD004, PRIME increases the
AUC values remain consistently high for PRIME, exceeding 0.98 on five out of six datasets and reaching 0.993 on FD001. Even under the most challenging scenario (FD004), PRIME maintains strong discrimination capability with an AUC of 0.975. Standard deviations remain below 0.7 across all datasets, indicating stable training and low cross-fold variability. Performance gains are particularly pronounced on datasets characterized by higher operating-condition variability (FD003, FD004, and N-CMAPSS), suggesting improved robustness under heterogeneous regimes.
For binary FD, ROC curves are computed for each cross-validation fold. True positive rates (TPR) are interpolated over a common false positive rate (FPR) grid and averaged across folds to obtain robust estimates.
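The fold-averaging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the grid resolution, the function names, and the two toy folds are hypothetical, ties between scores are not treated specially, and each fold is assumed to contain both classes:

```python
import numpy as np

def roc_points(y_true, scores):
    """Empirical ROC points obtained by sweeping the threshold over sorted scores."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)
    fps = np.cumsum(y_sorted == 0)
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def mean_roc(folds, grid_size=101):
    """Interpolate each fold's TPR on a common FPR grid and average across folds."""
    grid = np.linspace(0.0, 1.0, grid_size)
    tprs = []
    for y_true, scores in folds:
        fpr, tpr = roc_points(y_true, scores)
        tpr_interp = np.interp(grid, fpr, tpr)
        tpr_interp[0] = 0.0  # force the curve through the origin
        tprs.append(tpr_interp)
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0
    # Trapezoidal area under the averaged curve.
    auc = float(np.sum((grid[1:] - grid[:-1]) * (mean_tpr[1:] + mean_tpr[:-1]) / 2.0))
    return grid, mean_tpr, auc

# Two toy folds: (labels, scores). Values are made up.
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
]
grid, mean_tpr, auc = mean_roc(folds)
```

Averaging on a common grid is what allows curves from folds with different numbers of samples and thresholds to be combined point by point.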
Fig. 2 presents the ROC curves for binary fault detection across three datasets (FD001, FD004, and N-CMAPSS). The curves compare PRIME with baseline models (Transformer, BiLSTM, and CNN).

Figure 2: ROC curves for binary FD across FD001, FD004, and N-CMAPSS.
On FD004, PRIME achieves a higher AUC (0.975) compared to the Transformer baseline (0.942), indicating improved discrimination capability. The largest performance gap occurs in the low-FPR region (
Although the observed gains over the strongest Transformer baseline remain moderate in absolute terms on some datasets, they are consistent across classification, health estimation, and downstream prognostic indicators. In this sense, the contribution of PRIME should not be interpreted as an accuracy-only improvement, but rather as a more integrated trade-off between multi-task performance, interpretability, and robustness under heterogeneous operating regimes. At the same time, we note that improved Accuracy,
5.3 Fault Type Classification Performance
Fault type classification (FTC) is evaluated on FD003 and FD004, which are the two datasets in this study providing a directly comparable multi-class fault taxonomy. N-CMAPSS is not included in the main FTC benchmark because the adopted subset was selected for FD/HSE evaluation and does not align directly with the FTC label structure considered for FD003 and FD004. Table 5 reports the ten-fold cross-validation results.

PRIME consistently achieves the best multi-class classification performance on both FTC benchmarks. On FD003, macro-
The gains are even more pronounced on FD004, where macro-
Taken together, these improvements indicate that PRIME better disentangles fault-specific signatures under heterogeneous operating regimes. The consistent increase in macro-recall further suggests that the gains are distributed across classes rather than driven by a single dominant category.
To provide a finer-grained assessment of multi-class behavior, Table 6 reports the per-class Precision (

The per-class analysis confirms that the observed gains are not driven by a single operating state. On both datasets, the Healthy class remains the easiest to discriminate, whereas the Combined fault class is the most challenging, which is expected given the overlap of multiple degradation signatures under heterogeneous operating conditions. Nevertheless, PRIME maintains strong and balanced performance across all classes, with per-class
Fig. 3 presents the confusion matrices for FTC on FD003 and FD004. The matrices confirm that PRIME preserves strong discrimination for the Healthy class while reducing confusion among degraded categories. The largest residual ambiguity is observed for the Combined fault class, which remains the most difficult operating state under multi-regime conditions.

Figure 3: Normalized confusion matrices (in %) of PRIME for fault type classification (FTC) on FD003 and FD004.
Fig. 4 presents the one-vs.-rest ROC curves for multi-class FTC on FD003 and FD004. The four operating states considered are Healthy, HPC degradation, Fan degradation, and Combined fault, together with the macro-averaged ROC.

Figure 4: One-vs.-rest ROC curves for multi-class FTC on FD003 and FD004.
PRIME consistently achieves the highest AUC across all classes, confirming strong discrimination between healthy operation and different fault mechanisms. These ROC results further corroborate the quantitative gains observed in Accuracy, macro-
5.4 Health State Estimation Performance
Health State Estimation (HSE) performance is evaluated using RMSE, MAE, and

On FD001, RMSE decreases from 16.4 cycles (CNN) and 10.2 cycles (Transformer) to 8.2 cycles, corresponding to reductions of 50.0% and 19.6%, respectively. MAE is reduced to 6.2 cycles, while
On more complex datasets, PRIME maintains similar improvements. For FD004, RMSE decreases from 16.7 to 14.6 cycles (
As expected, performance gradually declines from FD001 to FD004 and N-CMAPSS due to increasing degradation complexity and multi-regime operating variability. Nevertheless, PRIME consistently outperforms CNN, LSTM, BiLSTM, and Transformer baselines, demonstrating improved robustness under heterogeneous degradation conditions.
The consistent improvements in RMSE, MAE, and
ALFA is not included in the HSE evaluation because it does not provide continuous degradation trajectories or calibrated health targets comparable to those available in C-MAPSS and N-CMAPSS.
While HSE provides a continuous degradation trajectory

PRIME consistently achieves the lowest RUL prediction error across all evaluated datasets. On FD001, RMSE decreases from 13.5 cycles (Transformer) to 10.6 cycles (PRIME), corresponding to a 21.5% reduction, while MAE decreases from 10.3 to 8.1 cycles (21.4%). The NASA score is reduced from 201 to 158, indicating a substantial improvement in prognostic reliability.
Similar trends are observed on the more challenging datasets. On FD004, PRIME reduces RMSE from 23.6 to 18.4 cycles (22.0%) and lowers the NASA score from 392 to 318 (18.9%). On N-CMAPSS, RMSE decreases from 27.8 to 22.1 cycles (20.5%), while the NASA score improves from 486 to 402 (17.3%). These results indicate that the health trajectory learned by PRIME remains sufficiently informative to support robust downstream prognostic projection even under more complex operating regimes.
As expected, RUL estimation becomes more difficult as operating-condition variability and degradation complexity increase from FD001 to FD004 and N-CMAPSS. Nevertheless, PRIME maintains a consistent advantage over the strongest baseline, with RMSE reductions of approximately 20%–22% across the most relevant benchmarks. This suggests that improved health trajectory fidelity contributes directly to more accurate and operationally meaningful RUL estimates.
ALFA is excluded from the RUL benchmark because it does not provide a standardized run-to-failure structure suitable for threshold-based prognostic projection and cycle-level RUL evaluation.
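To make the notion of threshold-based prognostic projection concrete, the following sketch extrapolates a linear trend fitted to an estimated health trajectory until it crosses a failure threshold. This is only one plausible realization: the linear trend, the function name `project_rul`, the threshold value, and the synthetic trajectory are all assumptions and should not be read as the projection mechanism actually used in PRIME:

```python
import numpy as np

def project_rul(health, threshold, max_rul=None):
    """Extrapolate a linear health trend to the failure threshold.

    health: estimated health values, one per cycle (values near 1 = healthy).
    Returns the projected number of cycles remaining after the last observation,
    or max_rul when no downward trend is present.
    """
    health = np.asarray(health, dtype=float)
    t = np.arange(len(health))
    slope, intercept = np.polyfit(t, health, deg=1)
    if slope >= 0:
        return max_rul  # no degradation trend: the projection is undefined
    t_fail = (threshold - intercept) / slope
    rul = max(float(t_fail - t[-1]), 0.0)
    if max_rul is not None:
        rul = min(rul, max_rul)
    return rul

# Synthetic trajectory losing 0.01 health per cycle over 20 cycles.
health = 1.0 - 0.01 * np.arange(20)
rul = project_rul(health, threshold=0.5)
print(round(rul, 6))  # 31.0
```

Because the projected crossing time depends directly on `threshold`, any such scheme inherits a sensitivity to the chosen failure threshold, which motivates calibrating it on validation data.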
Sensitivity to the failure threshold

On FD004, the performance remains relatively stable in the range
These results suggest that the proposed HSE-to-RUL projection is moderately robust to threshold selection within a reasonable operating range, while still benefiting from calibration on validation data. Based on this analysis,
6 Statistical Significance Analysis
To rigorously assess the superiority of PRIME, statistical hypothesis testing was conducted separately for each task. Accuracy was used for FD, Macro-
For each task, paired Student’s
Table 10 summarizes the statistical comparison between PRIME and baseline models on Accuracy.

PRIME significantly outperforms all baselines in FD. For comparisons against CNN and LSTM, the paired
6.2 Fault Type Classification (FTC)
Statistical tests were conducted on Macro-

PRIME achieves statistically significant improvements in Macro-
6.3 Health State Estimation (HSE)
Statistical evaluation for HSE was performed on MAE values across folds (Table 12).

PRIME yields statistically significant reductions in MAE compared with all baselines across the considered HSE datasets. The paired
Across all datasets and tasks, PRIME yields statistically significant improvements over the considered baselines. The paired
Effect size analysis further reveals large to very large practical effects. For both FD and FTC,
Statistical significance should not be interpreted as evidence of large practical gains in all cases. In the present work, paired tests are used primarily to assess whether the observed improvements are systematic across folds, while effect sizes are reported to complement p-values with a measure of practical relevance. Nevertheless, the current analysis remains limited to fold-wise statistics and does not include deeper uncertainty characterization, such as confidence intervals at the engine level or failure-case-level uncertainty analysis.
While mean
7 Interpretability and Physical Consistency Analysis
To interpret the behavior of the proposed framework and verify the physical consistency of the learned representations, we analyze three complementary aspects: residual heatmaps, latent space separation using t-SNE, and quantitative sensor importance.
To analyze the degradation patterns captured by PRIME, we examine sensor-level residual distributions through a residual heatmap visualization.
Fig. 5 presents the per-class sensor-level residual heatmap for FD004. Residuals are defined as

Figure 5: Per-class sensor-level residual heatmap for FD004.
Fourteen sensors exhibiting significant degradation-related variability are retained:
Each macro-band corresponds to a fault category (HPC degradation, Fan degradation, or Combined Fault), while the fourteen horizontal stripes represent temporal residual trajectories of the selected sensors.
Distinct class-dependent patterns emerge. HPC degradation shows progressively increasing residuals during later cycles, mainly in temperature and pressure sensors. In contrast, Fan degradation produces more localized and abrupt deviations. The Combined fault generates larger residual magnitudes and broader activation regions, reflecting interacting degradation mechanisms across multiple engine subsystems.
Thermodynamic variables such as
Regions with strong residual activation align with high-confidence classification decisions, establishing a direct link between physics-guided residual modeling and improved FD and FTC performance. The structured residual organization confirms that PRIME captures both abrupt fault transitions and gradual degradation dynamics while preserving physical interpretability.
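The residual idea underlying this analysis, namely subtracting an operating-condition-driven nominal response from the observed sensor signals, can be illustrated with a deliberately simplified stand-in. In the sketch below, the learned operating-condition encoder of PRIME is replaced by an affine least-squares map fitted on healthy data; this substitution, the function names, and the synthetic data are assumptions made for illustration only:

```python
import numpy as np

def fit_nominal(op_cond, sensors):
    """Least-squares affine map from operating conditions to nominal sensor values."""
    X = np.hstack([op_cond, np.ones((op_cond.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, sensors, rcond=None)
    return W

def residuals(op_cond, sensors, W):
    """Observed sensors minus the nominal response predicted from operating conditions."""
    X = np.hstack([op_cond, np.ones((op_cond.shape[0], 1))])
    return sensors - X @ W

# Synthetic example: 3 operating-condition variables, 2 sensors.
rng = np.random.default_rng(0)
A = np.array([[1.0, -0.5], [0.3, 0.8], [-0.2, 0.1]])
b = np.array([2.0, -1.0])

op_healthy = rng.normal(size=(200, 3))
sens_healthy = op_healthy @ A + b + 0.01 * rng.normal(size=(200, 2))
W = fit_nominal(op_healthy, sens_healthy)

# Unseen operating conditions with a simulated offset of 0.5 on sensor 0.
op_test = rng.normal(size=(100, 3))
sens_faulty = op_test @ A + b + 0.01 * rng.normal(size=(100, 2))
sens_faulty[:, 0] += 0.5
res = residuals(op_test, sens_faulty, W)
print(res.mean(axis=0))  # roughly [0.5, 0.0]
```

Although the raw sensor values vary strongly with the operating conditions, the residual of the affected sensor stays close to the injected offset while the unaffected sensor stays close to zero, which is the separation that the residual heatmaps are intended to expose.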
7.2 Latent Space Separation Analysis Using t-SNE
To further analyze the discriminative structure of the learned representations, we apply t-distributed Stochastic Neighbor Embedding (t-SNE) [45] to the 128-dimensional latent features extracted prior to the FTC classification head. The projection is computed using perplexity = 30, learning rate = 200, 1500 iterations, and a fixed random seed for reproducibility.
Fig. 6 compares PRIME with the Transformer baseline on FD004. The Transformer latent space exhibits partial class overlap and dispersion across operating regimes, indicating residual entanglement between regime variability and fault signatures. In contrast, PRIME produces more compact and well-separated clusters, with clearer inter-class margins and reduced inter-regime mixing.

Figure 6: t-SNE visualization of latent space for FTC on FD004.
This improved geometric separability is consistent with the higher macro-
7.3 Quantitative Sensor Importance
To complement the qualitative insights provided by the residual heatmaps, we perform a quantitative analysis of sensor relevance using two complementary approaches: attention weight statistics and permutation feature importance.
Attention Weight Statistics: The hierarchical attention mechanism assigns an importance weight to each sensor representation. Let
where
Table 13 reports the average attention weights of the most influential sensors for the FD004 dataset.

The results show that thermodynamic variables associated with compressor and turbine stages receive the highest attention weights, highlighting their dominant role in capturing degradation dynamics.
Permutation Feature Importance: To validate sensor contributions independently of the attention mechanism, permutation feature importance (PFI) is computed. For each sensor channel
where
The permutation analysis confirms that compressor and turbine thermodynamic variables, particularly
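Permutation feature importance of this kind can be sketched as follows, assuming that importance is measured as the drop in a performance metric after shuffling one sensor channel across samples. The number of repetitions, the function names, and the toy model (which by construction depends only on channel 0) are illustrative assumptions:

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in metric when one sensor channel is shuffled across samples.

    X has shape (n_samples, ..., n_sensors); the last axis indexes sensors.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[-1])
    for j in range(X.shape[-1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(X.shape[0])
            X_perm[..., j] = X[perm][..., j]  # break the link between channel j and y
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def toy_predict(X):
    # Toy classifier that looks only at the window mean of sensor channel 0.
    return (X[:, :, 0].mean(axis=1) > 0).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5, 3))  # (samples, window length, sensors), synthetic
y = toy_predict(X)                # labels generated by the toy model itself
importances = permutation_importance(toy_predict, X, y, accuracy)
```

In this toy setting only channel 0 receives a non-zero importance, since shuffling the other channels cannot change the predictions. Unlike attention weights, the procedure treats the model as a black box, which is why it serves as an independent check.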
To quantify the contribution of each architectural component in PRIME, we conducted an ablation study under the same ten-fold cross-validation protocol used for FD, FTC, and HSE evaluations.
We evaluated three reduced variants alongside the full model:
• PRIME w/o Attention: hierarchical attention module removed.
• PRIME w/o Physics Residual: residual separation of operational and degradation effects removed.
• PRIME w/o Transformer: transformer encoder replaced by a BiLSTM layer.
• Full PRIME: complete proposed architecture.
Table 14 summarizes the ablation results for the fault detection (FD) task, reporting the mean and standard deviation of Accuracy (

Across all datasets, each architectural component contributes positively to FD performance. The largest degradation occurs when the physics-guided residual module is removed. On FD004, accuracy decreases from 97.52% to 89.7% (a drop of about 7.8 percentage points), confirming that separating operational variability from degradation dynamics is essential under heterogeneous operating regimes.
Removing hierarchical attention leads to a consistent performance drop of approximately 2%–3%, highlighting its role in enhancing sensor-level feature selectivity. Replacing the transformer with a BiLSTM causes a moderate decrease (about 1%–2%), suggesting that global contextual modeling improves temporal representation stability.
Overall, Full PRIME consistently achieves the highest accuracy with the lowest variance. Statistical tests across cross-validation folds confirm that the improvements are significant (
8.2 Multi-Task Ablation Analysis
To evaluate whether these contributions generalize beyond FD, the ablation study is extended to FTC and HSE tasks. Table 15 reports FTC results. Removing the physics-guided residual module produces the largest degradation, reducing macro-

Similarly, the HSE ablation (Table 16) shows that removing the physics-guided residual module substantially increases prediction error. MAE rises by 38.7% on FD001, 22.4% on FD004, and 20.1% on N-CMAPSS, demonstrating that residual-based disentanglement significantly improves degradation trajectory estimation.

Overall, the ablation analysis confirms that the physics-guided residual module is the dominant contributor to PRIME’s performance improvements, while hierarchical attention and transformer-based temporal modeling further enhance feature selectivity and representation stability.
8.3 Ablation Study on Loss Weighting
To assess the sensitivity of PRIME to the task-balancing coefficients, we vary the loss weights associated with the FD, FTC, and HSE objectives on two multi-task benchmarks, namely FD003 and FD004. These two datasets were selected because they both support the complete FD–FTC–HSE setting, while FD004 provides the more challenging multi-regime scenario.
Table 17 reports the performance obtained under different weighting configurations, where

Consistent trends are observed on both datasets. Increasing
On FD003, the balanced configuration
Overall, the results suggest that PRIME remains reasonably stable under moderate perturbations of the task weights on both FD003 and FD004. However, this analysis should not be interpreted as evidence of universal robustness across all datasets or task structures. In particular, datasets such as ALFA do not support the full FD–FTC–HSE configuration considered here. Therefore, the present ablation supports the stability of the proposed multi-task formulation on the two controlled multi-task benchmarks, while broader cross-dataset validation remains an important direction for future work.
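The role of the task-balancing coefficients can be illustrated with a weighted-sum objective. The sketch below assumes binary cross-entropy for FD, categorical cross-entropy for FTC, and squared error for HSE, combined linearly; these loss choices, the function names, and the numerical values are assumptions for illustration and are not taken from the paper's configuration:

```python
import math

def binary_cross_entropy(p, y):
    """FD loss for one sample: p is the predicted fault probability, y is 0 or 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cross_entropy(probs, label):
    """FTC loss for one sample: probs is a list of class probabilities."""
    return -math.log(probs[label])

def squared_error(pred, target):
    """HSE loss for one sample."""
    return (pred - target) ** 2

def multitask_loss(l_fd, l_ftc, l_hse, w_fd=1.0, w_ftc=1.0, w_hse=1.0):
    """Weighted sum of the three task losses."""
    return w_fd * l_fd + w_ftc * l_ftc + w_hse * l_hse

# One hypothetical sample: faulty (y = 1), true fault class 2, true health 0.6.
l_fd = binary_cross_entropy(0.9, 1)
l_ftc = cross_entropy([0.1, 0.2, 0.6, 0.1], 2)
l_hse = squared_error(0.55, 0.6)

balanced = multitask_loss(l_fd, l_ftc, l_hse)
hse_heavy = multitask_loss(l_fd, l_ftc, l_hse, w_hse=5.0)
```

Raising one weight increases the share of the gradient signal devoted to that task, which is the trade-off explored by varying the weighting configurations.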
9 Computational Complexity Analysis
This section provides a comparative computational analysis between PRIME and the adopted baselines (CNN, LSTM, BiLSTM, Transformer). We report dominant time and memory complexities with respect to sequence length
Table 18 summarizes both asymptotic complexity and empirical measurements obtained under the experimental setup described in Section 4 (single GPU, fixed window length, identical batch size).

Several observations can be drawn.
First, Transformer architectures exhibit quadratic complexity with respect to sequence length (
Second, although PRIME shares similar asymptotic complexity with LSTM (
Third, Transformer models exhibit the highest inference latency, which may limit their applicability in latency-sensitive industrial monitoring or embedded prognostic systems. PRIME achieves a favorable balance between parameter count and computational cost, remaining substantially lighter than Transformer architectures while maintaining competitive or superior predictive performance (Section 5).
Overall, the proposed framework provides an effective trade-off between modeling capacity and computational efficiency, making it suitable for real-time fault diagnosis and scalable deployment scenarios.
10 State-of-the-Art Performance Comparison
This section positions PRIME relative to representative published studies on C-MAPSS. Because prior works rely on heterogeneous evaluation settings, including different train/test partitions, preprocessing strategies, subset selections, target definitions, and reporting protocols, the following comparisons are intended for contextual positioning only rather than for strict head-to-head benchmarking. Accordingly, the main empirical conclusions of this paper are drawn from the controlled comparisons against CNN, LSTM, BiLSTM, and Transformer baselines evaluated under the same ten-fold cross-validation protocol adopted in this work.
Under the unified cross-validation protocol used in this work, PRIME consistently outperforms all controlled baseline models for both fault diagnosis and health trajectory estimation. Relative to the broader literature, Tables 19 and 20 indicate that PRIME remains competitive with representative published methods. In several cases, its reported performance under the present protocol is comparable to or higher than values reported in prior studies, although such observations remain contextual because the underlying evaluation settings are not fully aligned.
For FD, PRIME achieves very strong discrimination capability under the adopted protocol, with the best results obtained on FD001 (99.47% accuracy, 99.37% recall, and 99.48%
11 Limitations and Future Work
Despite the promising performance of PRIME across multiple datasets and tasks, several limitations should be acknowledged.
First, the proposed residual decomposition relies on an approximate additive separation between nominal operating-condition behavior and degradation-sensitive deviations. Although this approximation is effective on the considered benchmarks, it may become less accurate under strong nonlinear coupling, abrupt regime transitions, or fault modes whose manifestation depends jointly on operating conditions and degradation state.
Second, the quality of the residual representation depends on the nominal estimation stage. If the operating-condition encoder does not fully capture regime-dependent behavior, the subtraction step may introduce bias, which can then affect downstream temporal modeling and task-specific predictions. More generally, the sequential organization of the pipeline may propagate upstream errors to later stages.
Third, although PRIME is evaluated on C-MAPSS, N-CMAPSS, and ALFA, the current benchmark coverage remains limited with respect to real-world variability, fault diversity, and deployment conditions. In particular, FTC is restricted to FD003 and FD004 because the adopted N-CMAPSS subset does not provide a directly aligned fault taxonomy, while ALFA is used only for FD since it does not provide continuous health trajectories or standardized run-to-failure targets for HSE and RUL evaluation.
Fourth, while cross-dataset results suggest encouraging generalization, the current experiments do not explicitly stress-test the framework under unseen operating regimes, sensor drift, missing-sensor conditions, or severe domain shifts. In addition, although fold-wise mean
Finally, although the observed gains over strong baselines are systematic and statistically supported, some of the absolute improvements remain moderate, especially against advanced Transformer-based models. The value of PRIME should therefore be interpreted not only through average performance gains, but also through its unified treatment of FD, FTC, and HSE, its support for downstream prognostic projection, and its built-in interpretability.
Future work will investigate more flexible disentanglement strategies for nonlinear regime–degradation interactions, broader robustness evaluation under unseen regimes and sensor drift, adaptive multi-task weighting, uncertainty-aware learning, and more advanced HSE-to-RUL projection mechanisms. Additional directions include lightweight deployment-oriented variants of PRIME and the integration of richer multimodal sensing inputs such as vibration, acoustic, and environmental measurements.
This paper presented PRIME, a physics-guided residual integrated multi-task framework for aircraft engine diagnostics under heterogeneous operating conditions. Rather than enforcing explicit thermodynamic equations or physics-based constraints in the optimization process, PRIME relies on a physically motivated residual decomposition strategy that separates operating-condition-driven nominal behavior from degradation-sensitive sensor deviations. This residual disentanglement mechanism, coupled with hybrid temporal modeling and hierarchical attention, enables robust and interpretable diagnostic inference across variable operating regimes.
Extensive experiments on NASA C-MAPSS, N-CMAPSS, and ALFA showed that PRIME consistently outperforms strong baseline models across the considered tasks. For Fault Detection (FD), the framework achieved near-ceiling performance, with accuracy and
For Health State Estimation (HSE), PRIME produced more faithful degradation trajectories and consistently reduced estimation error relative to all baselines, with RMSE ranging from 8.2 cycles on FD001 to 21.2 cycles on N-CMAPSS and
Beyond quantitative performance, PRIME provides built-in interpretability through residual analysis and hierarchical attention. The resulting sensor-level and temporal patterns are consistent with degradation-sensitive engine behavior, thereby improving transparency and practical trustworthiness in safety-critical PHM settings. In particular, the strongest gains were observed on the more challenging multi-regime benchmarks, suggesting that residual operating-condition disentanglement is especially beneficial when regime variability and degradation patterns are strongly entangled.
Overall, the contribution of PRIME lies not in introducing fundamentally new neural primitives, but in providing a principled integration of residual operating-condition disentanglement, hybrid temporal modeling, hierarchical interpretability, and multi-task diagnostic learning within a unified PHM framework. This integrated design offers a robust and scalable solution for aircraft engine diagnostics in complex operating environments.
The proposed residual disentanglement should therefore be understood as a practically effective approximation rather than as a universally valid physical decomposition. The present results support its usefulness on the considered benchmarks, but broader validation under unseen regimes, sensor drift, and more detailed uncertainty analysis will be necessary to further assess its deployment readiness. Similarly, while the observed gains are systematic and statistically supported, future work should better characterize the relationship between predictive improvements, architectural complexity, and cross-metric consistency.
Future work will focus on extending the framework toward better-calibrated prognostic projection, broader evaluation across heterogeneous fleet conditions, adaptive task balancing, and improved integration of complementary sensing modalities for real-world deployment.
Acknowledgement: Not applicable.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: conception and design, data curation, literature review, analysis and interpretation of results: Soukaina Mjahed and Ouail Mjahed; draft manuscript preparation: Ouail Mjahed, writing—review and editing, supervision: Soukaina Mjahed. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: The data supporting the conclusions of this study are freely available at the websites cited in references [1–3].
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| ALFA | AirLab Failure and Anomaly dataset |
| AUC | Area Under the Curve |
| BiLSTM | Bidirectional Long Short-Term Memory |
| CNN | Convolutional Neural Network |
| C-MAPSS | Commercial Modular Aero-Propulsion System Simulation dataset |
| FAR | False Alarm Rate |
| FD | Fault Detection |
| FTC | Fault Type Classification |
| GNN | Graph Neural Network |
| GRU | Gated Recurrent Unit |
| HSE | Health State Estimation |
| LSTM | Long Short-Term Memory |
| MAE | Mean Absolute Error |
| N-CMAPSS | Numerical Commercial Modular Aero-Propulsion System Simulation dataset |
| PHM | Prognostics and Health Management |
| PINN | Physics-Informed Neural Network |
| PRIME | Physics-guided Residual Integrated framework for Multi-task aircraft Engine diagnostics |
| RMSE | Root Mean Squared Error |
| ROC | Receiver Operating Characteristic |
| RUL | Remaining Useful Life |
| t-SNE | t-distributed Stochastic Neighbor Embedding |
| TCN | Temporal Convolutional Network |
References
1. Saxena A, Goebel K, Simon D, Eklund NHW. Damage propagation modeling for aircraft engine run-to-failure simulation. In: Proceedings of 2008 International Conference on Prognostics and Health Management; 2008 Oct 6–9; Denver, CO, USA. p. 1–9. doi:10.1109/PHM.2008.4711414. [Google Scholar] [CrossRef]
2. Chatterjee S, Keprate A. Exploratory data analysis of the N-CMAPSS dataset for prognostics. In: Proceedings of 2021 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM); 2021 Dec 13–16; Singapore. p. 1114–21. doi:10.1109/IEEM50564.2021.9673064. [Google Scholar] [CrossRef]
3. Keipour A, Mousaei M, Scherer S. ALFA: a dataset for uav fault and anomaly detection. Int J Robot Res. 2021;20(2–3):515–20. doi:10.1177/0278364920966642. [Google Scholar] [CrossRef]
4. Heimes FO. Recurrent neural networks for remaining useful life estimation. In: Proceedings of 2008 International Conference on Prognostics and Health Management; 2008 Oct 6–9; Denver, CO, USA. p. 1–6. doi:10.1109/PHM.2008.4711422. [Google Scholar] [CrossRef]
5. Baptista ML, Henriques EMP, Prendinger H. Classification prognostics approaches in aviation. Measurement. 2021;182(5):109756. doi:10.1016/j.measurement.2021.109756. [Google Scholar] [CrossRef]
6. Zheng S, Ristovski K, Farahat A, Gupta C. Long short-term memory network for remaining useful life estimation. In: Proceedings of 2017 IEEE International Conference on Prognostics and Health Management (ICPHM); 2017 Jun 19–21; Dallas, TX, USA. p. 88–95. doi:10.1109/ICPHM.2017.7998311. [Google Scholar] [CrossRef]
7. Li X, Ding Q, Sun J-Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab Eng Syst Saf. 2018;172(1–2):1–11. doi:10.1016/j.ress.2017.11.021. [Google Scholar] [CrossRef]
8. Fang X, Xiao L, Shan Y. PBMT: a novel transformer-based model for accurate RUL prediction in industrial systems. In: Proceedings of 2024 Global Reliability and Prognostics and Health Management Conference (PHM-Beijing); 2024 Oct 11–13; Beijing, China. p. 1–8. doi:10.1109/PHM-Beijing63284.2024.10874751. [Google Scholar] [CrossRef]
9. Kim G, Choi JG, Lim S. Using transformer and a reweighting technique to develop a remaining useful life estimation method for turbofan engines. Eng Appl Artif Intell. 2024;133(Pt E):108475. doi:10.1016/j.engappai.2024.108475. [Google Scholar] [CrossRef]
10. Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, et al. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans Knowl Data Eng. 2017;29(10):2318–31. doi:10.1109/TKDE.2017.2720168. [Google Scholar] [CrossRef]
11. Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks. J Comput Phys. 2019;378:686–707. doi:10.1016/j.jcp.2018.10.045. [Google Scholar] [CrossRef]
12. Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L. Physics-informed machine learning. Nat Rev Phys. 2021;3(6):422–40. doi:10.1038/s42254-021-00314-5. [Google Scholar] [CrossRef]
13. Xiao D, Xiao H, Li R, Wang Z. Application of physical-structure-driven deep learning and compensation methods in aircraft engine health management. Eng Appl Artif Intell. 2024;136(Pt B):109024. doi:10.1016/j.engappai.2024.109024. [Google Scholar] [CrossRef]
14. Fu S, Avdelidis NP, Plastropoulos A. Novel hybrid prognostics of aircraft systems. Electronics. 2025;14(11):2193. doi:10.3390/electronics14112193. [Google Scholar] [CrossRef]
15. Chai A, Fang Z, Lian M, Huang P, Guo C, Yin W, et al. Hi-MDTCN: hierarchical multi-scale dilated temporal convolutional network for tool condition monitoring. Sensors. 2025;25(24):7603. doi:10.3390/s25247603. [Google Scholar] [PubMed] [CrossRef]
16. Xu Z, Zhang Y, Miao Q. An attention-based multi-scale temporal convolutional network for remaining useful life prediction. Reliab Eng Syst Saf. 2024;250(3):110288. doi:10.1016/j.ress.2024.110288. [Google Scholar] [CrossRef]
17. Wang Y, Wu M, Li X, Xie L, Chen Z. A survey on graph neural networks for remaining useful life prediction: methodologies, evaluation and future trends. Mech Syst Signal Process. 2025;229(1):112449. doi:10.1016/j.ymssp.2025.112449. [Google Scholar] [CrossRef]
18. Li Z, Ma J, Fan R, Zhao Y, Ai J, Dong Y. Aircraft sensor fault diagnosis based on GraphSAGE and attention mechanism. Sensors. 2025;25(3):809. doi:10.3390/s25030809. [Google Scholar] [PubMed] [CrossRef]
19. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. Proc AAAI Conf Artif Intell. 2021;35(12):11106–15. doi:10.1609/aaai.v35i12.17325. [Google Scholar] [CrossRef]
20. Zhong J, Li H, Chen Y, Huang C, Zhong S, Geng H. Remaining useful life prediction of rolling bearings based on ECA-CAE and autoformer. Biomimetics. 2024;9(1):40. doi:10.3390/biomimetics9010040. [Google Scholar] [PubMed] [CrossRef]
21. Zhu J, Liang S, Ma Z, Huang X. Attention-based multi-modal learning for aircraft engine fan fault diagnosis. Aerosp Sci Technol. 2025;162(1):110194. doi:10.1016/j.ast.2025.110194. [Google Scholar] [CrossRef]
22. Dinten R, Zorrilla M. Using time series foundation models for few-shot remaining useful life prediction of aircraft engines. Comput Model Eng Sci. 2025;144(1):239–65. doi:10.32604/cmes.2025.065461. [Google Scholar] [CrossRef]
23. Szrama S. Adaptive cluster-count selection via deep Q-learning for turbofan engine prognostics and health monitoring. Neurocomputing. 2026;665:132294. doi:10.1016/j.neucom.2025.132294. [Google Scholar] [CrossRef]
24. Szrama S. Turbofan engine health status prediction with heterogeneous ensemble deep neural networks. Int J Data Sci Anal. 2026;22(1):23. doi:10.1007/s41060-025-00989-4. [Google Scholar] [CrossRef]
25. Duan Y, Xiao J, Li H, Zhang J. Cross-domain remaining useful life prediction based on adversarial training. Machines. 2022;10(6):438. doi:10.3390/machines10060438. [Google Scholar] [CrossRef]
26. Ren L, Qin H, Cai N, Li B, Xie Z. A hybrid degradation evaluation model for aero-engines. Sustainability. 2023;15(1):29. doi:10.3390/su15010029. [Google Scholar] [CrossRef]
27. Liu J, Yu Z, Zuo H, Fu R, Feng X. Multi-stage residual life prediction of aero-engine based on real-time clustering and combined prediction model. Reliab Eng Syst Saf. 2022;225(11):108624. doi:10.1016/j.ress.2022.108624. [Google Scholar] [CrossRef]
28. Willard J, Jia X, Xu S, Steinbach M, Kumar V. Integrating physics-based modeling with machine learning: a survey. arXiv:2003.04919. 2022. [Google Scholar]
29. Chao MA, Kulkarni C, Goebel K, Fink O. Fusing physics-based and deep learning models for prognostics. Reliab Eng Syst Saf. 2022;217(3):107961. doi:10.1016/j.ress.2021.107961. [Google Scholar] [CrossRef]
30. Cummins L, Sommers A, Ramezani S, Mittal S, Jabour J, Seale M, et al. Explainable predictive maintenance: a survey of current methods, challenges and opportunities. IEEE Access. 2024;12(4):57574–602. doi:10.1109/ACCESS.2024.3391130. [Google Scholar] [CrossRef]
31. He S, Wang S, Zhang R. A generalizable gated graph recurrent unit (Graph-GRU) network for nonlinear response prediction of cross-structures. Comput Struct. 2025;318(5):107968. doi:10.1016/j.compstruc.2025.107968. [Google Scholar] [CrossRef]
32. Mienye ID, Swart TG, Obaido G. Recurrent neural networks: a comprehensive review of architectures, variants, and applications. Information. 2024;15(9):517. doi:10.3390/info15090517. [Google Scholar] [CrossRef]
33. Wang M, Qin F. A TCN-linear hybrid model for chaotic time series forecasting. Entropy. 2024;26(6):467. doi:10.3390/e26060467. [Google Scholar] [PubMed] [CrossRef]
34. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Proceedings of 31st Conference on Neural Information Processing Systems (NIPS 2017); 2017 Dec 4–9; Long Beach, CA, USA. p. 5998–6008. [Google Scholar]
35. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: Proceedings of 3rd International Conference on Learning Representations (ICLR); 2015 May 7–9; San Diego, CA, USA. p. 1–15. [Google Scholar]
36. Saxena A, Goebel K. Turbofan engine degradation simulation data set. NASA Ames Progn Data Repos. 2008;18:878–87. [Google Scholar]
37. Ramasso E, Saxena A. Performance benchmarking and analysis of prognostic methods for C-MAPSS datasets. Int J Progn Health Manag. 2014;5(2):1–15. doi:10.36001/ijphm.2014.v5i2.2236. [Google Scholar] [CrossRef]
38. Chao MA, Kulkarni C, Goebel K, Fink O. Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics. Data. 2021;6(1):5. doi:10.3390/data6010005. [Google Scholar] [CrossRef]
39. Kingma DP, Ba J. Adam: a method for stochastic optimization. In: Proceedings of 3rd International Conference on Learning Representations (ICLR); 2015 May 7–9; San Diego, CA, USA. p. 1–15. [Google Scholar]
40. Rajić V. Statistical hypothesis testing: a comprehensive review of theory, methods, and applications. Mathematics. 2026;14(2):300. doi:10.3390/math14020300. [Google Scholar] [CrossRef]
41. Purwono I, Maarif A, Rahmaniar W, Imam H, Frisky AZK, ul Haq QM. Understanding of convolutional neural network (CNN): a review. Int J Robot Control Syst. 2023;2(4):739–48. doi:10.31763/ijrcs.v2i4.888. [Google Scholar] [CrossRef]
42. Krichen M, Mihoub A. Long short-term memory networks: a comprehensive survey. Artif Intell. 2025;6(9):215. doi:10.3390/ai6090215. [Google Scholar] [CrossRef]
43. Yang T, Cheng Y, Ren Y, Lou Y, Wei M, Xin H. A deep learning framework for sequence mining with bidirectional LSTM and multi-scale attention. In: Proceedings of 2nd International Conference on Innovation Management and Information System (ICIIS 2025); 2025 Apr 18–20; Shenzhen, China. p. 472–6. doi:10.1145/3745676.3745751. [Google Scholar] [CrossRef]
44. Su L, Zuo X, Li R, Wang X, Zhao H, Huang B. A systematic review for transformer-based long-term series forecasting. Artif Intell Rev. 2025;58(3):80. doi:10.1007/s10462-024-11044-2. [Google Scholar] [CrossRef]
45. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605. [Google Scholar]
46. Özcan H. Predictive maintenance in aircraft engine maintenance using the C-MAPSS dataset: performance comparison and evaluation of machine learning classification algorithms. Artif Intell Eng Des Anal Manuf. 2026;40:e4. doi:10.1017/S0890060426100249. [Google Scholar] [CrossRef]
47. Sharma DB, Kodipalli A, Rao T, Rohini BR. Machine predictive maintenance classification using machine learning. In: Proceedings of 2023 International Conference on Computational Intelligence for Information, Security and Communication Applications (CIISCA); 2023 Jun 22–23; Bengaluru, India. p. 308–13. doi:10.1109/CIISCA59740.2023.00066. [Google Scholar] [CrossRef]
48. Al Hasib A, Rahman A, Khabir M, Shawon M. An interpretable systematic review of machine learning models for predictive maintenance of aircraft engine. arXiv:2309.13310. 2023. [Google Scholar]
49. Melkumian SA. Predictive maintenance analysis of turbofan engine sensor data. J Purdue Undergrad Res. 2024;14(1):8. doi:10.7771/2158-4052.1708. [Google Scholar] [CrossRef]
50. Wu J, Kong L, Kang S, Zuo H, Yang Y, Cheng Z. Aircraft engine fault diagnosis model based on 1DCNN-BiLSTM with CBAM. Sensors. 2024;24(3):780. doi:10.3390/s24030780. [Google Scholar] [PubMed] [CrossRef]
51. Li Y, Chen Y, Hu Z, Zhang H. Remaining useful life prediction of aero-engine enabled by fusing knowledge and deep learning models. Reliab Eng Syst Saf. 2023;229(4):108869. doi:10.1016/j.ress.2022.108869. [Google Scholar] [CrossRef]
52. Zhu Q, Xiong Q, Yang Z, Yu Y. A novel feature-fusion-based end-to-end approach for remaining useful life prediction. J Intell Manuf. 2023;34(8):3495–505. doi:10.1007/s10845-022-02015-x. [Google Scholar] [CrossRef]
53. Yang X, Gao X, Zheng H, Yang M, Liu Y. A hybrid prognosis method based on health indicator and Wiener process: the case of multi-sensor monitored aero-engine. Eng Appl Artif Intell. 2025;144(Pt C):110099. doi:10.1016/j.engappai.2025.110099. [Google Scholar] [CrossRef]
54. Liu CL, Xiao B, Hsu SS. Self-supervised learning for remaining useful life prediction using simple triplet networks. Adv Eng Inform. 2025;64(1):103038. doi:10.1016/j.aei.2024.103038. [Google Scholar] [CrossRef]
55. Nunes P, Santos J, Rocha E. Combining generalized fault trees and k-LSTM ensembles for enhancing prognostics and health management. CIRP J Manuf Sci Technol. 2025;63:505–21. doi:10.1016/j.cirpj.2025.11.002. [Google Scholar] [CrossRef]
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

