GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching

AI-generated keywords: Cross-language Binary-Source Matching

AI-generated Key Points

  • Matching binary code to source code and vice versa is crucial in various fields including computer security, software engineering, and reverse engineering.
  • Most existing methods match source code with binary code within a single programming language, whereas real-world programs are written in different languages depending on their requirements.
  • Cross-language binary-to-source code matching has gained increased interest.
  • The authors propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary and source codes.
  • The goal of GraphBinMatch is to accurately predict matches between binary and source code across different programming languages.
  • Cross-language binary-source matching is important in practical scenarios where software applications are written in multiple programming languages to meet various requirements.
  • Detecting binary-source code clones across different languages can be beneficial for vulnerability assessment and improving code bases.
  • Input files are converted to LLVM IR, a language-independent format commonly used in modern compilers, to facilitate easier comparison of code written in different programming languages.
  • GraphBinMatch significantly outperforms state-of-the-art approaches with improvements of up to 15% in terms of F1 score.
  • GraphBinMatch also outperforms existing approaches on single-language binary-to-source code matching.
  • The paper concludes with discussions on related works and future research directions.
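
The F1 score mentioned in the key points is the harmonic mean of precision and recall over the predicted match / no-match labels. The sketch below is a generic illustration of that metric, not the authors' evaluation code; the function name `f1_score` and the 0/1 label encoding are assumptions made for this example.

```python
def f1_score(y_true, y_pred):
    """Compute F1 for binary labels (1 = match, 0 = no match).

    Hypothetical helper for illustration; label encoding is an assumption.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for four binary/source pairs (made-up data, not from the paper).
truth = [1, 1, 0, 0]
preds = [1, 0, 1, 0]
score = f1_score(truth, preds)
```

With these toy labels there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1 score is 0.5.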

Authors: Ali TehraniJamsaz, Hanze Chen, Ali Jannesari

License: CC BY 4.0

Abstract: Matching binary to source code and vice versa has various applications in different fields, such as computer security, software engineering, and reverse engineering. Even though there exist methods that try to match source code with binary code to accelerate the reverse engineering process, most of them are designed to focus on one programming language. However, in real life, programs are developed using different programming languages depending on their requirements. Thus, cross-language binary-to-source code matching has recently gained more attention. Nonetheless, the existing approaches still struggle to have precise predictions due to the inherent difficulties when the problem of matching binary code and source code needs to be addressed across programming languages. In this paper, we address the problem of cross-language binary source code matching. We propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary and source codes. We evaluate GraphBinMatch on several tasks, such as cross-language binary-to-source code matching and cross-language source-to-source matching. We also evaluate our approach performance on single-language binary-to-source code matching. Experimental results show that GraphBinMatch outperforms state-of-the-art significantly, with improvements as high as 15% over the F1 score.
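
To give a feel for what "learning the similarity between graphs" involves, here is a deliberately tiny sketch: node features are averaged with their neighbours for a few rounds, pooled into one vector per graph, and two graphs are compared by cosine similarity. This is not the GraphBinMatch architecture, which the abstract does not describe in detail; there are no trained weights here, and all names (`propagate`, `embed`, `cosine`) and the toy graphs are assumptions for illustration only.

```python
import math


def propagate(features, edges):
    """One round of mean-neighbour aggregation (untrained, illustrative)."""
    neighbours = [[] for _ in features]
    for src, dst in edges:
        # Treat edges as undirected for this toy example.
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    dim = len(features[0])
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in neighbours[i]]
        updated.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return updated


def embed(features, edges, rounds=2):
    """Pool node features into a single graph-level vector by averaging."""
    for _ in range(rounds):
        features = propagate(features, edges)
    dim = len(features[0])
    return [sum(vec[d] for vec in features) / len(features) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two made-up graphs standing in for a source-code graph and a binary-code graph.
source_graph = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
binary_graph = ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.5]], [(0, 1), (1, 2)])

similarity = cosine(embed(*source_graph), embed(*binary_graph))
```

In a learned model the aggregation step would carry trainable parameters and the similarity would be optimized against labelled matching and non-matching pairs; the sketch only shows the overall data flow from graphs to a single score.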

Submitted to arXiv on 10 Apr. 2023


Results of the summarizing process for the arXiv paper: 2304.04658v1

Matching binary code to source code and vice versa is a crucial task in various fields, including computer security, software engineering, and reverse engineering. Existing methods mostly match source code with binary code for a specific programming language, but in practice programs are developed in different languages depending on their requirements. This has led to increased interest in cross-language binary-to-source code matching.
In this paper, the authors propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary and source code. The goal is to accurately predict matches between binary and source code across different programming languages. The paper highlights the importance of cross-language binary-source matching in practical scenarios where software applications are written in multiple programming languages. Detecting binary-source code clones across languages can be beneficial, especially for vulnerability assessment and for improving code bases.
To make code written in different programming languages easier to compare, the authors convert input files to LLVM IR, a language-independent intermediate representation commonly used in modern compilers, so that both sides of a comparison share the same representation.
The authors evaluate GraphBinMatch on several tasks: cross-language binary-to-source code matching, cross-language source-to-source matching, and single-language binary-to-source code matching. Experimental results show that GraphBinMatch significantly outperforms state-of-the-art approaches, with improvements of up to 15% in F1 score, and that its effectiveness extends from cross-language matching to single-language scenarios. The paper concludes with discussions of related work and future research directions.
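
As a rough illustration of the LLVM IR conversion step for the source side, the sketch below builds a `clang` invocation that emits textual LLVM IR (`-S -emit-llvm`) for a C or C++ file. It only constructs the command and does not run it. The helper name `llvm_ir_command`, the output directory, and the restriction to C/C++ are assumptions for this example; the summary does not say which toolchain the authors used, and obtaining LLVM IR from a binary requires a separate lifting or decompilation tool that is not shown here.

```python
from pathlib import Path


def llvm_ir_command(source, out_dir="ir"):
    """Build a clang command that emits textual LLVM IR for one source file.

    Hypothetical helper: assumes clang/clang++ are installed and that the
    input is C or C++. Other languages would need their own front ends.
    """
    src = Path(source)
    compiler = "clang++" if src.suffix in {".cpp", ".cc", ".cxx"} else "clang"
    out_file = Path(out_dir) / (src.stem + ".ll")
    # -S with -emit-llvm asks clang for human-readable LLVM IR (.ll).
    return [compiler, "-S", "-emit-llvm", str(src), "-o", str(out_file)]


cmd = llvm_ir_command("solution.cpp")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists and clang is available on the system.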
Created on 07 Jul. 2023

