Open Access

ARTICLE

Deep Learning-Based Program-Wide Binary Code Similarity for Smart Contracts

Yuan Zhuang1, Baobao Wang1, Jianguo Sun2,*, Haoyang Liu1, Shuqi Yang1, Qingan Da3

1 Harbin Engineering University, Harbin, 150000, China
2 University of Sanya, Sanya, 572000, China
3 University of Alberta, Edmonton, T5J4P6, Canada

* Corresponding Author: Jianguo Sun. Email: email

Computers, Materials & Continua 2023, 74(1), 1011-1024. https://doi.org/10.32604/cmc.2023.028058

Abstract

Recently, security issues in smart contracts have attracted great attention because of the enormous financial losses caused by vulnerability attacks. As critical security issues in smart contracts multiply, there is a growing need to detect similar code in order to hunt for vulnerabilities. Binary similarity detection, which quantitatively measures the difference between given pieces of code, has been widely adopted to facilitate critical security analysis. However, smart contracts differ from common programs in ways such as the diversity of bytecode generation and high code homogeneity, so directly applying existing graph-matching and machine-learning-based techniques to smart contracts suffers from low accuracy, poor scalability, and the restriction of binary similarity to the function level. Therefore, this paper investigates graph neural networks for detecting smart contract binary code similarity at the program level: we conduct instruction-level normalization to reduce noisy code during smart contract pre-processing and construct contract control flow graphs to represent smart contracts. In particular, two improved models, a Graph Convolutional Network (GCN) and a Message Passing Neural Network (MPNN), are explored to encode the contract graphs into quantitative vectors that capture the semantic information and the program-wide control flow information with temporal order. Similarity detection can then be accomplished efficiently by measuring the distance between the two target contract embeddings. To evaluate the effectiveness and efficiency of the proposed method, extensive experiments are performed on two real-world datasets, i.e., smart contracts from the Ethereum and Enterprise Operation System (EOS) blockchain-based platforms. The results show that the proposed approach outperforms three state-of-the-art methods by a large margin, improving accuracy by up to 6.1% and 17.06%.
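The pipeline summarized in the abstract (normalize instructions, build a control flow graph, encode it into a vector, compare vectors) can be illustrated with a minimal sketch. This is not the paper's GCN or MPNN model: the opcode vocabulary `VOCAB`, the bag-of-opcodes block features, the single untrained mean-aggregation step, and the toy contracts are all assumptions made for illustration only.

```python
import math

# Hypothetical opcode vocabulary (assumption; the paper's feature set is not given here).
VOCAB = ["PUSH", "DUP", "SWAP", "JUMP", "JUMPI", "ADD", "SSTORE"]

def normalize(instr):
    """Instruction-level normalization: drop operands and fold width variants
    such as PUSH1/PUSH32 or DUP2 into a single opcode family."""
    opcode = instr.split()[0]
    for family in ("PUSH", "DUP", "SWAP", "LOG"):
        if opcode.startswith(family):
            return family
    return opcode

def block_features(block):
    """Bag-of-opcodes count vector for one basic block of the CFG."""
    ops = [normalize(i) for i in block]
    return [float(ops.count(v)) for v in VOCAB]

def propagate(features, edges):
    """One round of mean aggregation over each node, itself included, and its
    CFG neighbours. A trained GCN layer would also apply learned weights."""
    neighbours = {i: {i} for i in range(len(features))}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    out = []
    for i in range(len(features)):
        group = [features[j] for j in sorted(neighbours[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def embed(blocks, edges):
    """Program-level embedding: mean-pool the propagated block vectors."""
    feats = propagate([block_features(b) for b in blocks], edges)
    return [sum(col) / len(feats) for col in zip(*feats)]

def cosine(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy contracts: a and b differ only in operands and PUSH/DUP widths; c differs in logic.
contract_a = ([["PUSH1 0x60", "PUSH1 0x40", "JUMPI"], ["DUP1", "ADD", "SSTORE"]], [(0, 1)])
contract_b = ([["PUSH2 0x0060", "PUSH1 0x80", "JUMPI"], ["DUP2", "ADD", "SSTORE"]], [(0, 1)])
contract_c = ([["SWAP1", "JUMP"], ["ADD", "ADD"]], [(0, 1)])

sim_ab = cosine(embed(*contract_a), embed(*contract_b))
sim_ac = cosine(embed(*contract_a), embed(*contract_c))
```

In this sketch, normalization makes `contract_a` and `contract_b` indistinguishable, so `sim_ab` is higher than `sim_ac`. A learned model would replace the fixed mean aggregation with trained layers and could also encode edge direction and execution order, which this sketch ignores.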

Cite This Article

Y. Zhuang, B. Wang, J. Sun, H. Liu, S. Yang et al., "Deep learning-based program-wide binary code similarity for smart contracts," Computers, Materials & Continua, vol. 74, no. 1, pp. 1011–1024, 2023. https://doi.org/10.32604/cmc.2023.028058



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.