GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching
AI-generated Key Points
- Matching binary code to source code and vice versa is crucial in various fields including computer security, software engineering, and reverse engineering.
- Most existing methods that match source code with binary code focus on a single programming language, but real-world programs are written in different languages depending on their requirements.
- Cross-language binary-to-source code matching has therefore recently gained more attention.
- The authors propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary code and source code.
- The goal of GraphBinMatch is to accurately predict matches between binary and source code across different programming languages.
- Cross-language binary-source matching is important in practical scenarios where software applications are written in multiple programming languages to meet various requirements.
- Detecting binary-source code clones across different languages can be beneficial for vulnerability assessment and improving code bases.
- Input files are converted to LLVM IR, the language-independent intermediate representation used by LLVM-based compilers, which makes code written in different programming languages easier to compare.
- GraphBinMatch significantly outperforms state-of-the-art approaches with improvements of up to 15% in terms of F1 score.
- GraphBinMatch also demonstrates superior performance in single-language scenarios.
- The paper concludes with discussions on related works and future research directions.
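The key points note that inputs are lowered to LLVM IR before comparison. As a rough illustration of how IR can become a graph that a graph neural network consumes, the sketch below extracts a basic-block control-flow graph from a small hand-written LLVM IR function using plain text parsing. This is a hypothetical simplification: the helper `ir_to_cfg` and the toy function are assumptions, only `br` terminators are handled, and the graph representation actually used by GraphBinMatch is not described in this summary and may differ.

```python
import re

# A tiny hand-written LLVM IR function (hypothetical example input).
TOY_IR = """
define i32 @abs(i32 %x) {
entry:
  %neg = icmp slt i32 %x, 0
  br i1 %neg, label %flip, label %done
flip:
  %y = sub i32 0, %x
  br label %done
done:
  %r = phi i32 [ %y, %flip ], [ %x, %entry ]
  ret i32 %r
}
"""

LABEL_RE = re.compile(r"^([\w.]+):")        # a basic-block label such as "entry:"
TARGET_RE = re.compile(r"label %([\w.]+)")  # branch targets such as "label %done"


def ir_to_cfg(ir: str) -> dict[str, list[str]]:
    """Map each basic-block label to the labels its `br` terminator jumps to.

    Hypothetical helper: ignores switch/invoke terminators and data-flow edges.
    """
    cfg: dict[str, list[str]] = {}
    current = None
    for raw in ir.splitlines():
        line = raw.strip()
        label = LABEL_RE.match(line)
        if label:
            current = label.group(1)
            cfg[current] = []
        elif current is not None and line.startswith("br "):
            cfg[current].extend(TARGET_RE.findall(line))
    return cfg


print(ir_to_cfg(TOY_IR))
# {'entry': ['flip', 'done'], 'flip': ['done'], 'done': []}
```

Because every front end that targets LLVM emits the same IR syntax, a graph built this way looks the same regardless of whether the IR came from C, C++, or another LLVM-supported language, which is the property the key points rely on.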
Authors: Ali TehraniJamsaz, Hanze Chen, Ali Jannesari
Abstract: Matching binary to source code and vice versa has various applications in different fields, such as computer security, software engineering, and reverse engineering. Even though there exist methods that try to match source code with binary code to accelerate the reverse engineering process, most of them are designed to focus on one programming language. However, in real life, programs are developed using different programming languages depending on their requirements. Thus, cross-language binary-to-source code matching has recently gained more attention. Nonetheless, the existing approaches still struggle to make precise predictions due to the inherent difficulties of matching binary code and source code across programming languages. In this paper, we address the problem of cross-language binary-to-source code matching. We propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary code and source code. We evaluate GraphBinMatch on several tasks, such as cross-language binary-to-source code matching and cross-language source-to-source matching. We also evaluate our approach's performance on single-language binary-to-source code matching. Experimental results show that GraphBinMatch significantly outperforms the state of the art, with improvements as high as 15% in F1 score.
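The abstract describes a graph neural network that learns the similarity between binary and source code, evaluated by F1 score. The sketch below shows the general shape of such a pipeline under loudly stated assumptions: a generic Siamese setup in which both graphs are encoded with the same weights via sum-aggregation message passing and mean pooling, compared by cosine similarity, and thresholded into a match decision. The functions `encode`, `predict_match`, the weight matrix, and the 0.5 threshold are all hypothetical; the weights are untrained, and GraphBinMatch's actual architecture, node features, loss, and decision rule are not given here. The `f1_score` helper computes the standard F1 metric, the harmonic mean of precision and recall.

```python
import math

# A graph is (node feature vectors, undirected edge list). Illustrative only.
Graph = tuple[list[list[float]], list[tuple[int, int]]]


def encode(graph: Graph, weight: list[list[float]], steps: int = 2) -> list[float]:
    """Embed a graph with `steps` rounds of message passing, then mean pooling.

    `weight` is a square d x d matrix shared by both inputs (Siamese setup).
    """
    h = [row[:] for row in graph[0]]
    d = len(h[0])
    for _ in range(steps):
        agg = [row[:] for row in h]        # self-loop: start from the node's own state
        for u, v in graph[1]:               # add neighbour states in both directions
            for k in range(d):
                agg[u][k] += h[v][k]
                agg[v][k] += h[u][k]
        # New state = tanh(W @ aggregated state) for every node.
        h = [[math.tanh(sum(w * a for w, a in zip(w_row, node))) for w_row in weight]
             for node in agg]
    return [sum(node[k] for node in h) / len(h) for k in range(d)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_match(g1: Graph, g2: Graph, weight: list[list[float]],
                  threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: match if cosine similarity exceeds `threshold`."""
    return cosine(encode(g1, weight), encode(g2, weight)) > threshold


def f1_score(preds: list[bool], labels: list[bool]) -> float:
    """Harmonic mean of precision and recall over binary match predictions."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


W = [[0.5, -0.2], [0.1, 0.4]]  # arbitrary, untrained weights
g = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
print(predict_match(g, g, W))  # identical graphs -> True
print(f1_score([True, True, False, False], [True, False, True, False]))  # 0.5
```

Sharing one encoder between the two inputs is what lets a single learned embedding space place a binary and its matching source close together; in a trained system the weights would be fit on labeled matching and non-matching pairs rather than chosen by hand.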