TY  - EJOU
AU  - Zhao, Xiaoyong
AU  - Wu, Jiamin
AU  - Wang, Lei

TI  - ADS: Adaptive Dataset Selection for Fine-Tuning in Anomalous Text
T2  - Computers, Materials & Continua

PY  - 
VL  - 
IS  - 
SN  - 1546-2226

AB  - As large language models continue to improve, further enhancing their ability on complex tasks has become a key issue. Anomalous text detection is challenging because its semantic complexity and high-risk content make non-standard semantics difficult for a model to identify. Existing fine-tuning methods rely heavily on static data selection strategies, which struggle to adapt to the dynamic evolution of model capabilities and therefore lead to low training efficiency. This article proposes ADS (Adaptive Dataset Selection), an adaptive data selection framework for anomalous text detection. ADS performs model-aware data selection prior to fine-tuning, adapting the initial state of pre-trained language models by selecting the samples that are most informative for the target anomaly detection task. Empirical results on mainstream large language model architectures show that ADS substantially reduces the amount of fine-tuning data while still outperforming existing static strategies and mainstream compression methods. With only 1000 fine-tuning samples, ADS achieves a 92% F1 score and an accuracy improvement of over 22% compared to the baseline. This study proposes an efficient data selection mechanism from the perspective of model capability and dynamic adaptation of data, providing theoretical support and a practical path for fine-tuning large models in low-resource scenarios.
KW  - Adaptive dataset selection; anomalous text detection; fine-tuning; large language models; dynamic sample optimization; data diversity

DO  - 10.32604/cmc.2026.077179