
Open Access

ARTICLE

Task-Structured Curriculum Learning for Multi-Task Distillation: Enhancing Step-by-Step Knowledge Transfer in Language Models

Ahmet Ezgi1, Aytuğ Onan2,*
1 Department of Computer Engineering, Faculty of Engineering and Architecture, İzmir Katip Çelebi University, İzmir, 35620, Turkey
2 Department of Computer Engineering, Faculty of Engineering, İzmir Institute of Technology, İzmir, 35430, Turkey
* Corresponding Author: Aytuğ Onan
(This article belongs to the Special Issue: Contrastive Representation Learning for Next-Generation LLMs: Methods and Applications in NLP)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071301

Received 04 August 2025; Accepted 31 October 2025; Published online 19 December 2025

Abstract

Knowledge distillation has become a standard technique for compressing large language models into efficient student models, but existing methods often struggle to balance prediction accuracy with explanation quality. Recent approaches such as Distilling Step-by-Step (DSbS) introduce explanation supervision, yet they apply it in a uniform manner that may not fully exploit the different learning dynamics of prediction and explanation. In this work, we propose a task-structured curriculum learning (TSCL) framework that structures training into three sequential phases: (i) prediction-only, to establish stable feature representations; (ii) joint prediction–explanation, to align task outputs with rationale generation; and (iii) explanation-only, to refine the quality of rationales. This design provides a simple but effective modification to DSbS, requiring no architectural changes and adding negligible training cost. We justify the phase scheduling with ablation studies and convergence analysis, showing that an initial prediction-heavy stage followed by a balanced joint phase improves both stability and explanation alignment. Extensive experiments on five datasets (e-SNLI, ANLI, CommonsenseQA, SVAMP, and MedNLI) demonstrate that TSCL consistently outperforms strong baselines, achieving gains of +1.7–2.6 points in accuracy and 0.8–1.2 in ROUGE-L, corresponding to relative error reductions of up to 21%. Beyond lexical metrics, human evaluation and ERASER-style faithfulness diagnostics confirm that TSCL produces more faithful and informative explanations. Comparative training curves further reveal faster convergence and lower variance across seeds. Efficiency analysis shows less than 3% overhead in wall-clock training time and no additional inference cost, making the approach practical for real-world deployment. This study demonstrates that a simple task-structured curriculum can significantly improve the effectiveness of knowledge distillation. By separating and sequencing objectives, TSCL achieves a better balance between accuracy, stability, and explanation quality. The framework generalizes across domains, including medical NLI, and offers a principled recipe for future applications in multimodal reasoning and reinforcement learning.
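The abstract describes the three-phase curriculum but not its exact hyperparameters. The sketch below illustrates one way such a schedule could be wired into a standard two-loss (prediction + explanation) distillation loop; the function names (`tscl_weights`, `tscl_loss`), the phase boundaries (30% / 80% of training), and the 0.5/0.5 joint weighting are hypothetical illustrations, not values reported in the paper.

```python
# Minimal sketch of a task-structured curriculum schedule for multi-task
# distillation, assuming a step-based training loop with separate losses
# for label prediction and rationale generation. Phase boundaries and
# weights are assumed for illustration only.

def tscl_weights(step: int, total_steps: int,
                 phase1_end: float = 0.3, phase2_end: float = 0.8):
    """Return (w_pred, w_expl) for the current training step.

    Phase 1 (prediction-only):   w_pred = 1.0, w_expl = 0.0
    Phase 2 (joint):             balanced weighting
    Phase 3 (explanation-only):  w_pred = 0.0, w_expl = 1.0
    """
    progress = step / max(total_steps, 1)
    if progress < phase1_end:        # establish stable task features
        return 1.0, 0.0
    elif progress < phase2_end:      # align predictions with rationales
        return 0.5, 0.5
    else:                            # refine rationale quality
        return 0.0, 1.0


def tscl_loss(pred_loss: float, expl_loss: float,
              step: int, total_steps: int) -> float:
    """Combine prediction and explanation losses with phase-dependent weights."""
    w_pred, w_expl = tscl_weights(step, total_steps)
    return w_pred * pred_loss + w_expl * expl_loss


if __name__ == "__main__":
    total = 10
    for step in range(total):
        # Placeholder per-task losses; in practice these would come from the
        # student model's label head and rationale-generation head.
        print(step, tscl_weights(step, total), tscl_loss(0.9, 1.4, step, total))
```

Because the schedule only rescales two existing loss terms, it requires no architectural changes and adds essentially no training overhead, consistent with the efficiency claims in the abstract.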

Keywords

Knowledge distillation; curriculum learning; language models; multi-task learning; step-by-step learning