Yaxin Zhao¹, Qi Han², Hui Shu², Yan Guang²,*
CMC-Computers, Materials & Continua, Vol.86, No.2, pp. 1-24, 2026, DOI:10.32604/cmc.2025.070511
09 December 2025
Abstract: Large Language Models (LLMs) are increasingly applied to code translation. However, existing evaluation methodologies suffer from two major limitations: (1) heavy overlap between test data and pretraining corpora, which introduces significant bias into performance evaluation; and (2) mainstream metrics that focus primarily on surface-level accuracy and fail to uncover the underlying factors constraining model capabilities. To address these issues, this paper presents TCode (Translation-Oriented Code Evaluation benchmark), a complexity-controllable, contamination-free benchmark dataset for code translation, together with a dedicated static feature sensitivity evaluation framework. The dataset is carefully designed to control complexity along multiple dimensions, including syntactic…