Open Access

ARTICLE

A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation

Thierry Mugenzi, Cahit Perkgoz*

Department of Computer Engineering, Faculty of Engineering, Eskisehir Technical University, Eskisehir, 26555, Türkiye

* Corresponding Author: Cahit Perkgoz.

Computers, Materials & Continua 2026, 86(1), 1-21. https://doi.org/10.32604/cmc.2025.070381

Abstract

Missing data is a major challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that uses a composite loss function to improve robustness and precision. The proposed loss combines (i) a guided, masked mean squared error that focuses on missing entries; (ii) a noise-aware regularization term that improves resilience against data corruption; and (iii) a variance penalty that encourages expressive yet stable reconstructions. We evaluate the proposed model under four missingness mechanisms (Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship), with systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Experiments on four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) show that the proposed model consistently outperforms baseline methods, including both traditional and deep learning-based techniques. An ablation study shows that each component of the loss function contributes additively to performance. We also assess the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yield the highest area under the receiver operating characteristic curve across all scenarios. The model scales well and remains robust, with performance improving as dataset size and feature count increase. These results indicate that the proposed method produces imputations that are not only numerically accurate but also semantically useful, making it a promising solution for robust data recovery in clinical applications.
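The abstract names the three loss terms but does not give their formulas, so the following is only a minimal NumPy sketch of how such a composite loss could be assembled. Everything specific in it is an assumption made for illustration and is not taken from the article: the function names, the weights `lam_noise` and `lam_var`, the choice of a consistency term between clean-input and noisy-input reconstructions as the "noise-aware" part, and the choice of a per-feature variance gap as the "variance penalty". The helper `mcar_mask` illustrates only the simplest of the four mechanisms, Missing Completely at Random.

```python
import numpy as np


def mcar_mask(shape, ratio, rng):
    """Return a 0/1 array; 1 marks an entry hidden completely at random."""
    return (rng.random(shape) < ratio).astype(float)


def masked_mse(x_true, x_hat, mask):
    """Mean squared error restricted to entries where mask == 1."""
    n_hidden = mask.sum()
    if n_hidden == 0:
        return 0.0
    return float((((x_true - x_hat) ** 2) * mask).sum() / n_hidden)


def composite_loss(x_true, x_hat, x_hat_noisy, mask, lam_noise=0.1, lam_var=0.01):
    """Hypothetical three-term loss; the weights and term forms are assumptions.

    x_true      : ground-truth values, shape (n_samples, n_features)
    x_hat       : reconstruction obtained from the masked input
    x_hat_noisy : reconstruction obtained from a noise-corrupted masked input
    mask        : 1 where an entry was hidden from the model, 0 elsewhere
    """
    # (i) reconstruction error on the hidden entries only
    rec = masked_mse(x_true, x_hat, mask)
    # (ii) assumed noise-aware term: the two reconstructions should agree
    noise = float(np.mean((x_hat - x_hat_noisy) ** 2))
    # (iii) assumed variance penalty: per-feature variance should match the data
    var_gap = float(np.mean((x_hat.var(axis=0) - x_true.var(axis=0)) ** 2))
    return rec + lam_noise * noise + lam_var * var_gap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 3))
    mask = mcar_mask(x.shape, 0.3, rng)
    # A perfect, noise-insensitive reconstruction incurs no loss.
    print(composite_loss(x, x, x, mask))
```

In an actual training loop the same three terms would be written with a differentiable framework so that gradients reach the autoencoder weights; the NumPy version above only shows how the terms are evaluated and combined.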

Keywords

Missing data imputation; autoencoder; deep learning; missing mechanisms

Cite This Article

APA Style
Mugenzi, T., Perkgoz, C. (2026). A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation. Computers, Materials & Continua, 86(1), 1–21. https://doi.org/10.32604/cmc.2025.070381
Vancouver Style
Mugenzi T, Perkgoz C. A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation. Comput Mater Contin. 2026;86(1):1–21. https://doi.org/10.32604/cmc.2025.070381
IEEE Style
T. Mugenzi and C. Perkgoz, “A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation,” Comput. Mater. Contin., vol. 86, no. 1, pp. 1–21, 2026. https://doi.org/10.32604/cmc.2025.070381



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.