Shengjia Chang, Baojiang Cui*, Shaocong Feng
CMC-Computers, Materials & Continua, Vol.85, No.3, pp. 4345-4374, 2025, DOI:10.32604/cmc.2025.070195
- 23 October 2025
Abstract Binary Code Similarity Detection (BCSD) is vital for vulnerability discovery, malware detection, and software security, especially when source code is unavailable. Yet, it faces challenges from semantic loss, recompilation variations, and obfuscation. Recent advances in artificial intelligence, particularly natural language processing (NLP), graph representation learning (GRL), and large language models (LLMs), have markedly improved accuracy, enabling better recognition of code variants and deeper semantic understanding. This paper presents a comprehensive review of 82 studies published between 1975 and 2025, systematically tracing the historical evolution of BCSD and analyzing the progressive incorporation of artificial intelligence (AI) techniques. Particular…