
Open Access

ARTICLE

ADS: Adaptive Dataset Selection for Fine-Tuning in Anomalous Text

Xiaoyong Zhao1, Jiamin Wu2,*, Lei Wang2
1 School of Information Management, Beijing Information Science and Technology University, Beijing, China
2 College of Computer Science, Beijing Information Science and Technology University, Beijing, China
* Corresponding Author: Jiamin Wu

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.077179

Received 03 December 2025; Accepted 18 March 2026; Published online 22 May 2026

Abstract

As large language models continue to improve, further strengthening their ability on complex tasks has become a key issue. Anomalous text detection is one such task: its semantic complexity and high-risk nature make it difficult for models to identify non-standard semantics. Existing fine-tuning methods, however, rely heavily on static data selection strategies that cannot adapt to the evolving capabilities of the model, which results in low training efficiency. This article proposes ADS (Adaptive Dataset Selection), an adaptive data selection framework for anomalous text detection. ADS performs model-aware data selection prior to fine-tuning, tailoring the selection to the initial state of the pre-trained language model by choosing the samples that are most informative for the target anomaly detection task. Empirical results on mainstream large language model architectures show that ADS significantly reduces the amount of fine-tuning data while still outperforming existing static strategies and mainstream compression methods. With only 1,000 fine-tuning samples, ADS achieves a 92% F1 score and improves accuracy by more than 22% over the baseline. This study proposes an efficient data selection mechanism based on the dynamic match between model capability and data, providing theoretical support and a practical path for fine-tuning large models in low-resource scenarios.
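
The abstract does not state how ADS scores a sample as "informative", so the following is only an illustrative sketch of model-aware selection in general, not the authors' method. It assumes, purely for illustration, that informativeness is approximated by the predictive entropy of the current model on each candidate sample. The names `predictive_entropy`, `select_most_informative`, `predict_proba`, and the toy probabilities are all hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(samples, predict_proba, budget):
    """Return up to `budget` samples the current model is least certain about.

    `predict_proba` is an assumed callable mapping a sample to a list of
    class probabilities; it stands in for a pre-trained model's output.
    """
    scored = [
        (predictive_entropy(predict_proba(sample)), index)
        for index, sample in enumerate(samples)
    ]
    # Highest entropy first; the original index breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [samples[index] for _, index in scored[:budget]]


# Toy stand-in for model outputs as [normal, anomalous] probabilities.
toy_probs = {
    "routine status update": [0.98, 0.02],
    "ambiguous coded message": [0.55, 0.45],
    "obvious threat": [0.05, 0.95],
    "borderline sarcasm": [0.60, 0.40],
}
pool = list(toy_probs)
subset = select_most_informative(pool, lambda text: toy_probs[text], budget=2)
print(subset)
```

In this toy pool, the two samples whose predicted probabilities are closest to an even split are the ones kept. Because the scores come from the model itself, the same pool would yield a different subset for a different pre-trained model, which is the general sense in which such selection is model-aware.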

Keywords

Adaptive dataset selection; anomalous text detection; fine-tuning; large language models; dynamic sample optimization; data diversity