Binary Code Similarity Detection: Retrospective Review and Future Directions

Shengjia Chang; Baojiang Cui; Shaocong Feng

doi:10.32604/cmc.2025.070195

Open Access

REVIEW

School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, 100876, China

* Corresponding Author: Baojiang Cui

Computers, Materials & Continua 2025, 85(3), 4345-4374. https://doi.org/10.32604/cmc.2025.070195

Received 10 July 2025; Accepted 10 September 2025; Issue published 23 October 2025

Abstract

Binary Code Similarity Detection (BCSD) is vital for vulnerability discovery, malware detection, and software security, especially when source code is unavailable. It is nevertheless hampered by the loss of semantic information during compilation, by variations introduced when the same code is recompiled, and by obfuscation. Recent advances in artificial intelligence (AI), particularly natural language processing (NLP), graph representation learning (GRL), and large language models (LLMs), have markedly improved detection accuracy, enabling better recognition of code variants and deeper semantic understanding. This paper reviews 82 studies published between 1975 and 2025, tracing the historical evolution of BCSD and analyzing how AI techniques have been progressively incorporated into it. Particular emphasis is placed on LLMs, which have recently emerged as transformative tools for semantic representation and detection performance. The review is organized around five research questions: (1) the chronological development and milestones of BCSD; (2) the construction of AI-driven technical roadmaps that chart methodological transitions; (3) the design and implementation of general analytical workflows for binary code analysis; (4) the applicability, strengths, and limitations of LLMs in capturing semantic and structural features of binary code; and (5) the persistent challenges and promising directions for future research. By synthesizing insights across these dimensions, the study shows how LLMs are reshaping binary code analysis and creating new opportunities to improve accuracy, scalability, and adaptability in real-world scenarios. The review fills a gap in the existing literature and offers a forward-looking reference for researchers and practitioners seeking to advance AI-powered BCSD methods and applications.
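A common pattern in learning-based BCSD is to map each binary function to a fixed-length vector (an embedding) and then score pairs of functions by vector similarity. The sketch below illustrates only that final comparison step, assuming cosine similarity as the metric. It is not the method of any specific study in this review: the embedding values are hand-written toy numbers standing in for the output of a trained encoder, and the function names (`memcpy_O0_x86` and so on) and the `rank_candidates` helper are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as dissimilar to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: List[float], repository: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k repository functions most similar to the query."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in repository.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; a real system would obtain these from a trained encoder.
    repo = {
        "memcpy_O0_x86": [0.9, 0.1, 0.3],
        "memcpy_O3_arm": [0.8, 0.2, 0.35],
        "strlen_O2_x86": [0.1, 0.9, 0.2],
    }
    query = [0.85, 0.15, 0.3]
    for name, score in rank_candidates(query, repo, top_k=2):
        print(f"{name}: {score:.3f}")
```

With these toy vectors the two `memcpy` variants rank above `strlen`, mimicking the goal of recognizing the same function across optimization levels and architectures. In practice the quality of the ranking depends almost entirely on how well the encoder places semantically equivalent functions close together, which is where the NLP, GRL, and LLM techniques discussed in this review come in.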

Keywords

Binary code similarity detection; semantic code representation; graph-based modeling; representation learning; large language models

Cite This Article

APA Style

Chang, S., Cui, B., & Feng, S. (2025). Binary Code Similarity Detection: Retrospective Review and Future Directions. Computers, Materials & Continua, 85(3), 4345–4374. https://doi.org/10.32604/cmc.2025.070195

Vancouver Style

Chang S, Cui B, Feng S. Binary Code Similarity Detection: Retrospective Review and Future Directions. Comput Mater Contin. 2025;85(3):4345–4374. https://doi.org/10.32604/cmc.2025.070195

IEEE Style

S. Chang, B. Cui, and S. Feng, “Binary Code Similarity Detection: Retrospective Review and Future Directions,” Comput. Mater. Contin., vol. 85, no. 3, pp. 4345–4374, 2025. https://doi.org/10.32604/cmc.2025.070195

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
