A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation

Thierry Mugenzi; Cahit Perkgoz

doi:10.32604/cmc.2025.070381

Open Access

ARTICLE

Thierry Mugenzi, Cahit Perkgoz^*

Department of Computer Engineering, Faculty of Engineering, Eskisehir Technical University, Eskisehir, 26555, Türkiye

* Corresponding Author: Cahit Perkgoz. Email: email

Computers, Materials & Continua 2026, 86(1), 1-21. https://doi.org/10.32604/cmc.2025.070381

Received 15 July 2025; Accepted 23 September 2025; Issue published 10 November 2025

Abstract

Missing data poses a major challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model under four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, with systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) are used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assess the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yield the highest area under the receiver operating characteristic curve across all scenarios. The model demonstrates strong scalability and robustness, with performance improving as dataset size and feature count increase. These results underscore the capacity of the proposed method to produce imputations that are not only numerically accurate but also semantically useful, making it a promising solution for robust data recovery in clinical applications.
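The three-part loss named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the form of the noise term (a gap between reconstructions of clean and noise-corrupted input), the form of the variance term (a squared gap between reconstruction variance and data variance), and the weights `lam_noise` and `lam_var` are all hypothetical and may differ from the authors' definitions. The example works on a single feature column stored as a plain list.

```python
from statistics import pvariance


def masked_mse(x_true, x_hat, mask):
    """Mean squared error restricted to entries where mask is 1.

    Assumption: mask marks cells that were hidden during training,
    so the true value is known and the error can be measured.
    """
    sq = [(t - h) ** 2 for t, h, m in zip(x_true, x_hat, mask) if m]
    return sum(sq) / len(sq) if sq else 0.0


def noise_consistency(x_hat_clean, x_hat_noisy):
    """Assumed noise-aware term: mean squared gap between the
    reconstruction of the clean input and of a noise-corrupted copy."""
    pairs = list(zip(x_hat_clean, x_hat_noisy))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def variance_penalty(x_true, x_hat):
    """Assumed variance term: squared gap between the population
    variance of the reconstruction and that of the data."""
    return (pvariance(x_hat) - pvariance(x_true)) ** 2


def composite_loss(x_true, x_hat, x_hat_noisy, mask,
                   lam_noise=0.1, lam_var=0.01):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (masked_mse(x_true, x_hat, mask)
            + lam_noise * noise_consistency(x_hat, x_hat_noisy)
            + lam_var * variance_penalty(x_true, x_hat))


# Toy feature column: entries 1 and 3 were hidden from the model.
x_true = [1.0, 2.0, 3.0, 4.0]
x_hat = [1.0, 2.5, 3.0, 3.0]        # reconstruction from clean input
x_hat_noisy = [1.0, 2.5, 3.5, 3.0]  # reconstruction from noisy input
mask = [0, 1, 0, 1]

loss = composite_loss(x_true, x_hat, x_hat_noisy, mask)
print(f"composite loss: {loss:.6f}")
```

In a real training loop these terms would be computed on batched tensors with an automatic differentiation library; plain lists are used here only to keep the arithmetic easy to follow.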

Keywords

Missing data imputation; autoencoder; deep learning; missing mechanisms

Cite This Article

APA Style

Mugenzi, T., & Perkgoz, C. (2026). A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation. Computers, Materials & Continua, 86(1), 1–21. https://doi.org/10.32604/cmc.2025.070381

Vancouver Style

Mugenzi T, Perkgoz C. A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation. Comput Mater Contin. 2026;86(1):1–21. https://doi.org/10.32604/cmc.2025.070381

IEEE Style

T. Mugenzi and C. Perkgoz, “A Composite Loss-Based Autoencoder for Accurate and Scalable Missing Data Imputation,” Comput. Mater. Contin., vol. 86, no. 1, pp. 1–21, 2026. https://doi.org/10.32604/cmc.2025.070381

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
