Open Access

ARTICLE

Engine Failure Prediction on Large-Scale CMAPSS Data Using Hybrid Feature Selection and Imbalance-Aware Learning

Ahmad Junaid1, Abid Iqbal2,*, Abuzar Khan1, Ghassan Husnain1,*, Abdul-Rahim Ahmad3, Mohammed Al-Naeem4
1 Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, 25000, Pakistan
2 Department of Computer Engineering, College of Computer Sciences and Information Technology, King Faisal University, Al Ahsa, 31982, Saudi Arabia
3 Department of Information Systems, King Faisal University, Al-Hofuf, 31982, Saudi Arabia
4 Department of Computer Networks Communications, CCSIT, King Faisal University, Al Ahsa, 31982, Saudi Arabia
* Corresponding Authors: Abid Iqbal; Ghassan Husnain
(This article belongs to the Special Issue: AI for Industry 4.0 and 5.0: Intelligent Robotics, Cyber-Physical Systems, and Resilient Automation)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.073189

Received 12 September 2025; Accepted 02 December 2025; Published online 22 December 2025

Abstract

Most predictive maintenance studies emphasize accuracy but pay little attention to interpretability or deployment readiness. This study improves on prior methods by developing a compact yet robust system that predicts when turbofan engines will fail. It uses the NASA CMAPSS dataset, which contains over 200,000 engine cycles from 260 engines. The process begins with systematic preprocessing: imputation, outlier removal, scaling, and remaining useful life (RUL) labelling. Dimensionality is reduced using a hybrid selection method that combines variance filtering, recursive feature elimination, and gradient-boosted importance scores, yielding a stable set of 10 informative sensors. To mitigate class imbalance, minority cases are oversampled and class-weighted losses are applied during training. Benchmarking is carried out with logistic regression, gradient boosting, and a recurrent design that integrates gated recurrent units with long short-term memory networks. The Long Short-Term Memory–Gated Recurrent Unit (LSTM–GRU) hybrid achieved the strongest performance, with an F1 score of 0.92, precision of 0.93, recall of 0.91, Receiver Operating Characteristic–Area Under the Curve (ROC-AUC) of 0.97, and minority recall of 0.75. Interpretability analysis using permutation importance and Shapley values indicates that sensors 13, 15, and 11 are the most important indicators of engine wear. The proposed system combines imbalance handling, feature reduction, and interpretability into a practical design suitable for real industrial settings.
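The labelling and imbalance-handling steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 30-cycle failure horizon, the toy engine lengths, and the use of plain random oversampling (rather than whatever oversampler the paper uses) are all assumptions made for illustration.

```python
import random
from collections import Counter


def label_rul(last_cycle_per_engine):
    """Expand {engine_id: last observed cycle} into (engine, cycle, RUL) rows.

    RUL at cycle t is (last cycle - t), so the final cycle has RUL 0.
    """
    rows = []
    for engine_id, last_cycle in last_cycle_per_engine.items():
        for cycle in range(1, last_cycle + 1):
            rows.append((engine_id, cycle, last_cycle - cycle))
    return rows


def failure_labels(rows, horizon=30):
    """Binary target: 1 if RUL <= horizon. The horizon is a hypothetical choice."""
    return [1 if rul <= horizon else 0 for _, _, rul in rows]


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


if __name__ == "__main__":
    toy_engines = {1: 100, 2: 150}  # toy run lengths, not CMAPSS values
    rows = label_rul(toy_engines)
    y = failure_labels(rows, horizon=30)
    print("class counts:", Counter(y))
    print("class weights:", balanced_class_weights(y))
    _, y_balanced = random_oversample(rows, y)
    print("after oversampling:", Counter(y_balanced))
```

In a real pipeline, oversampling would normally be applied to the training split only, so that duplicated minority rows do not leak into validation or test data.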

Keywords

Predictive maintenance; CMAPSS dataset; feature selection; class imbalance; LSTM-GRU hybrid model; interpretability; industrial deployment