Malicious Source Code Detection Using Transformer

AI-generated keywords: Malicious Code

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Open source code is commonly used in modern software development
  • Reused code presents a risk for supply chain attacks
  • Malicious actors can inject malicious code into products that rely on reused code
  • Many approaches have been developed to detect vulnerable packages, but detecting malicious code within packages is uncommon
  • The Malicious Source Code Detection using Transformers (MSDT) algorithm was introduced to address this issue
  • MSDT is a static analysis method based on deep learning that detects real-world code injection cases in source code packages
  • MSDT embeds a dataset of over 600,000 different functions as vectors, applies a clustering algorithm to those vectors, and flags the outliers as malicious functions
  • Extensive experiments evaluated MSDT's performance and showed that it can detect functions injected with malicious code with precision@k values of up to 0.909
  • Developing new detection methods for identifying malicious code within open source packages is important to mitigate supply chain attacks in software development
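
The embed-then-cluster idea in the key points can be sketched as follows. This is a minimal illustration, not the authors' implementation: the metadata does not name the embedding model or the clustering algorithm, so DBSCAN is swapped in here as an assumption, and the 2-D vectors, `eps`, and `min_samples` are made-up toy values standing in for real function embeddings.

```python
import math


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Toy "embeddings": four similar functions and one that sits far away.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = dbscan(embeddings, eps=0.5, min_samples=3)

# Points labeled -1 belong to no cluster; these are the outlier candidates.
outliers = [i for i, label in enumerate(labels) if label == -1]
print(labels, outliers)
```

In this sketch, a function whose vector falls in no dense region is treated as a candidate for manual review. How the real method scores or ranks outliers is not described in the metadata.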

Authors: Chen Tsfaty, Michael Fire

License: CC BY-NC-ND 4.0

Abstract: Using open source code is considered a common practice in modern software development. However, reusing other code allows bad actors to access a wide developers' community, and hence the products that rely on it. Those attacks are categorized as supply chain attacks. Recent years saw a growing number of supply chain attacks that leverage open source during software development, relying on the download and installation procedures, whether automatic or manual. Over the years, many approaches have been invented for detecting vulnerable packages. However, it is uncommon to detect malicious code within packages. Those detection approaches can be broadly categorized as analyses that use (dynamic) and do not use (static) code execution. Here, we introduce the Malicious Source code Detection using Transformers (MSDT) algorithm. MSDT is a novel static analysis based on a deep learning method that detects real-world code injection cases in source code packages. In this study, we used MSDT and a dataset with over 600,000 different functions to embed various functions and applied a clustering algorithm to the resulting vectors, detecting the malicious functions by detecting the outliers. We evaluated MSDT's performance by conducting extensive experiments and demonstrated that our algorithm is capable of detecting functions that were injected with malicious code with precision@k values of up to 0.909.
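
To make the static/dynamic distinction concrete, the sketch below collects function definitions from source text without executing any of it, which is the kind of per-function input the abstract describes. It assumes the packages are written in Python (the metadata does not state the language), and `extract_functions` and `SAMPLE` are hypothetical names used only for illustration.

```python
import ast

# Hypothetical package source; it is parsed, never run.
SAMPLE = '''
def add(a, b):
    return a + b

def greet(name):
    return "hello " + name
'''


def extract_functions(source):
    """Map each function name to its source text using only the syntax tree."""
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions[node.name] = ast.get_source_segment(source, node)
    return functions


functions = extract_functions(SAMPLE)
for name, body in functions.items():
    print(name, len(body))
```

Because the code is only parsed, a malicious payload inside a function cannot run during analysis. Each extracted function body could then be passed to an embedding model.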

Submitted to arXiv on 16 Sep. 2022


Results of the summarizing process for the arXiv paper: 2209.07957v1

This paper's license does not allow us to build upon its content, so the summary below is generated from the paper's metadata rather than the full article.

The use of open source code is a common practice in modern software development, but it also presents a risk for supply chain attacks. Bad actors can access a wide community of developers through reused code and inject malicious code into products that rely on it. While many approaches have been developed to detect vulnerable packages, detecting malicious code within packages is uncommon. To address this issue, Chen Tsfaty and Michael Fire introduce the Malicious Source Code Detection using Transformers (MSDT) algorithm. MSDT is a novel static analysis based on deep learning that detects real-world code injection cases in source code packages. The authors used MSDT and a dataset with over 600,000 different functions to embed various functions and applied a clustering algorithm to the resulting vectors, detecting the malicious functions by identifying outliers. Extensive experiments were conducted to evaluate MSDT's performance, demonstrating its capability to detect functions injected with malicious code with precision@k values of up to 0.909. This study highlights the importance of developing new detection methods for identifying malicious code within open source packages in order to mitigate supply chain attacks in software development.
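
For readers unfamiliar with the reported metric, precision@k is the fraction of the top k ranked items that are true positives. The sketch below assumes functions are ranked from most to least suspicious by some outlier score and that ground-truth labels are available. The function name and the example rankings are hypothetical and are not taken from the paper.

```python
def precision_at_k(ranked_is_malicious, k):
    """Fraction of the top-k ranked functions that are truly malicious.

    ranked_is_malicious: 1/0 (or True/False) labels, ordered from most to
    least suspicious according to the detector.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(ranked_is_malicious[:k]) / k


# Hypothetical ranking: 1 = actually malicious, 0 = benign.
ranking = [1, 1, 0, 1, 0]
p_at_3 = precision_at_k(ranking, 3)  # 2 of the top 3 are malicious
p_at_5 = precision_at_k(ranking, 5)  # 3 of the top 5 are malicious

# Arithmetic illustration only: 10 hits in a top 11 gives roughly 0.909.
# This is not the paper's reported configuration.
p_example = precision_at_k([1] * 10 + [0], 11)
print(p_at_3, p_at_5, round(p_example, 3))
```

A value close to 1 means nearly every function the detector ranks highest is a real injection, which matters when analysts can review only a small number of flagged functions.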
Created on 15 Jun. 2023
