Open Access

ARTICLE

HDFPM: A Heterogeneous Disk Failure Prediction Method Based on Time Series Features

Zhongrui Jing1, Hongzhang Yang1,*, Jiangpu Guo2
1 School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, 300384, China
2 R&D Department, Roycom Information Technology Corporation, Tianjin, 301721, China
* Corresponding Author: Hongzhang Yang. Email: email
(This article belongs to the Special Issue: Signal Processing for Fault Diagnosis)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.067759

Received 12 May 2025; Accepted 22 October 2025; Published online 26 November 2025

Abstract

Hard disk drives (HDDs) serve as the primary storage devices in modern data centers. An HDD failure often leads to severe data loss and significantly degrades the reliability of the storage system. Numerous studies have proposed machine learning-based HDD failure prediction models. However, the Self-Monitoring, Analysis, and Reporting Technology (SMART) attributes differ across HDD manufacturers. We define hard drives of the same brand and model as homogeneous HDD groups, and those from different brands or models as heterogeneous HDD groups. In practical engineering scenarios, a data center is often composed of a heterogeneous population of HDDs spanning multiple vendors and models. Existing research predominantly focuses on homogeneous datasets and ignores the model’s generalization capability across heterogeneous HDDs. As a result, HDD models with limited samples often suffer from poor training effectiveness and prediction performance. To address this issue, we investigate generalizable SMART predictors across heterogeneous HDD groups. By extracting time-series features within a fixed sliding time window, we propose the Heterogeneous Disk Failure Prediction Method based on Time Series Features (HDFPM). This method is adaptable to HDD models with limited sample sizes, which enhances its applicability and robustness across diverse drive populations. Experimental results show that the proposed model achieves an F1-score of 0.9518 when applied to two different Seagate HDD models, while maintaining the False Positive Rate (FPR) below 1%. After incorporating the Complexity-Ratio Dynamic Time Warping (CDTW) based feature enhancement method, the best prediction model achieves a True Positive Rate (TPR) of up to 0.93 between the two models. For next-day failure prediction across various Seagate models, the model achieves an F1-score of up to 0.8792. Moreover, the experimental results show that, within the same brand, the higher the proportion of shared SMART attributes across different models, the better the prediction performance. In addition, HDFPM demonstrates the best stability and the most significant performance in heterogeneous environments.
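
To make the idea of extracting time-series features within a fixed sliding window concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the window length or which statistics HDFPM uses, so the 7-step window, the chosen statistics (mean, standard deviation, minimum, maximum, slope), the function name `window_features`, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def window_features(series, window=7):
    """Return one feature dict per full sliding window over a SMART attribute series.

    The window length and the set of statistics are illustrative assumptions;
    the source does not specify them.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    features = []
    # Slide a fixed-length window one step at a time over the series.
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        features.append({
            "mean": mean(w),
            "std": pstdev(w),  # population standard deviation of the window
            "min": min(w),
            "max": max(w),
            "slope": (w[-1] - w[0]) / (window - 1),  # average change per step
        })
    return features


# Hypothetical daily raw values of a single SMART attribute (not real data).
reallocated = [0, 0, 0, 1, 1, 3, 6, 10]
feats = window_features(reallocated, window=7)
```

Each resulting dict summarizes the recent trend of one attribute, so drives with different raw SMART ranges can be described by the same kind of window-level statistics.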

Keywords

Heterogeneous hard disk drives; failure prediction; time series feature; constrained dynamic time warping; sensitivity analysis