Open Access

ARTICLE

Structure-Based Virtual Sample Generation Using Average-Linkage Clustering for Small Dataset Problems

Chih-Chieh Chang*, Khairul Izyan Bin Anuar, Yu-Hwa Liu

School of Management, National Taiwan University of Science and Technology, No. 43, Sec. 4, Keelung Rd., Taipei, 106335, Taiwan

* Corresponding Author: Chih-Chieh Chang. Email: email

Computers, Materials & Continua 2026, 87(1), 34 https://doi.org/10.32604/cmc.2025.073177

Abstract

Small datasets are often challenging to model because of their limited sample size. This research introduces a novel solution to this problem: average-linkage virtual sample generation (ALVSG). ALVSG leverages the underlying data structure to create virtual samples that augment the original dataset. The ALVSG process consists of two steps. In the first step, average-linkage clustering is applied to the dataset to create a dendrogram. The dendrogram represents the hierarchical structure of the dataset, with each merging operation regarded as a linkage. The linkages are then combined into an average-based dataset, which serves as a new representation of the dataset. In the second step, virtual samples are generated from the average-based dataset: a set of 100 virtual samples is drawn uniformly within the provided boundary. These virtual samples are then added to the original dataset, creating a larger dataset that improves the generalization performance of models trained on it. The efficacy of the ALVSG approach is validated through resampling experiments and t-tests conducted on two small real-world datasets. The experiments use three forecasting models: support vector regression (SVR), a deep learning model (DL), and XGBoost. The results show that the ALVSG approach outperforms the baseline methods in terms of mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE).
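
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how linkages are combined or how the boundary is defined, so the sketch makes two assumptions: each linkage is represented by the mean of the points in the merged cluster, and the boundary is the per-dimension minimum and maximum of those means. The function names `average_linkage_means` and `generate_virtual_samples` and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(col) for col in zip(*points))


def average_linkage_means(points):
    """Run average-linkage agglomerative clustering and return the mean
    of each merged cluster, one per linkage (n - 1 in total)."""
    clusters = [[p] for p in points]
    means = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all pairwise distances between clusters.
            total = sum(math.dist(a, b) for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # ASSUMPTION: a linkage is summarized by the mean of its merged cluster.
        means.append(mean_point(merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return means


def generate_virtual_samples(avg_points, n=100, seed=0):
    """Draw n samples uniformly inside the bounding box of avg_points.
    ASSUMPTION: the 'provided boundary' is the per-dimension min and max."""
    bounds = [(min(col), max(col)) for col in zip(*avg_points)]
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


# Hypothetical toy data: each row is one observation.
data = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0)]
avg_dataset = average_linkage_means(data)
virtual = generate_virtual_samples(avg_dataset, n=100)
augmented = data + virtual  # original samples plus 100 virtual samples
```

Under these assumptions, `augmented` would serve as the training set for a forecasting model. The pairwise search is written for readability on very small datasets, not for efficiency.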

Keywords

Small datasets; average linkage; virtual sample generation; forecasting; accuracy improvements

Cite This Article

APA Style
Chang, C., Anuar, K.I.B., Liu, Y. (2026). Structure-Based Virtual Sample Generation Using Average-Linkage Clustering for Small Dataset Problems. Computers, Materials & Continua, 87(1), 34. https://doi.org/10.32604/cmc.2025.073177
Vancouver Style
Chang C, Anuar KIB, Liu Y. Structure-Based Virtual Sample Generation Using Average-Linkage Clustering for Small Dataset Problems. Comput Mater Contin. 2026;87(1):34. https://doi.org/10.32604/cmc.2025.073177
IEEE Style
C. Chang, K. I. B. Anuar, and Y. Liu, “Structure-Based Virtual Sample Generation Using Average-Linkage Clustering for Small Dataset Problems,” Comput. Mater. Contin., vol. 87, no. 1, pp. 34, 2026. https://doi.org/10.32604/cmc.2025.073177



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.