Formalizing and Benchmarking Prompt Injection Attacks and Defenses

AI-generated keywords: Prompt injection attacks

AI-generated Key Points

  • Prompt injection attacks are a type of cyber attack that aims to insert harmful instructions or data into the input of an LLM-Integrated Application.
  • Previous studies on prompt injection attacks have primarily focused on case studies and lacked a formal framework for understanding these attacks and their defenses.
  • A new study proposes a comprehensive framework for formalizing prompt injection attacks, encompassing existing attack strategies and allowing for the design of new attacks by combining different techniques.
  • The study conducted a systematic evaluation of five prompt injection attacks using ten different LLMs across seven tasks, with a newly designed attack consistently outperforming existing attacks in various target tasks and injected data scenarios.
  • Ten defense mechanisms against prompt injection attacks were systematically benchmarked, revealing that existing defenses are inadequate in effectively preventing or detecting these attacks.
Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong

To appear in USENIX Security Symposium 2024
License: CC BY 4.0

Abstract: A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

Submitted to arXiv on 19 Oct. 2023

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2310.12815v3

, , , , Prompt injection attacks, also known as malicious injections, are a type of cyber attack that aims to insert harmful instructions or data into the input of an LLM-Integrated Application. This can manipulate the results produced by the application and potentially compromise its functionality. While there have been previous studies on prompt injection attacks, they have primarily focused on case studies and lacked a formal framework for understanding these attacks and their defenses. To address this gap, a new study proposes a comprehensive framework for formalizing prompt injection attacks. This framework not only encompasses existing attack strategies but also allows for the design of new attacks by combining different techniques. The study then conducts a systematic evaluation of five prompt injection attacks using ten different LLMs across seven tasks. One notable finding is that a newly designed attack inspired by the proposed framework consistently outperforms existing attacks in various target tasks and injected data scenarios. This highlights the effectiveness of utilizing a structured framework for designing and evaluating prompt injection attacks. In addition to evaluating attacks, the study also systematically benchmarks ten defense mechanisms against prompt injection attacks. These defenses include both prevention-based strategies, which aim to redesign task prompts or preprocess data to thwart injected instructions, and detection-based methods, which focus on identifying compromised data within tasks. However, the evaluation reveals that existing defenses are inadequate in effectively preventing or detecting prompt injection attacks. This indicates a need for more robust defense mechanisms in this domain. Overall, this study makes significant contributions by introducing a formalized framework for prompt injection attacks, conducting thorough evaluations of both attacks and defenses using diverse LLMs and tasks, and providing an open-source platform for further research in this area. By establishing a common benchmark for evaluating future prompt injection attacks and defenses, this work sets the stage for advancements in cybersecurity measures against malicious injections in LLM-Integrated Applications.
Created on 08 Aug. 2024

Assess the quality of the AI-generated content by voting

Score: 0

Why do we need votes?

Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.

Similar papers summarized with our AI tools

Navigate through even more similar papers through a

tree representation

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.