Defeating Prompt Injections by Design

AI-generated keywords: Prompt Injection Attacks Large Language Models CaMeL Defense Mechanism Control and Data Flows Agentic Systems

AI-generated Key Points

  • Large Language Models (LLMs) in agentic systems are vulnerable to prompt injection attacks when handling untrusted data
  • Authors propose CaMeL as a defense mechanism to create a protective layer around LLMs
  • CaMeL extracts control and data flows from trusted queries to prevent untrusted data from impacting program flow
  • CaMeL uses capability concept to prevent exfiltration of private data over unauthorized flows
  • CaMeL demonstrates effectiveness by solving 67% of tasks with provable security in AgentDojo [NeurIPS 2024]
  • Importance of securing both control and data flows against prompt injection attacks in agentic systems is highlighted
  • Various defense mechanisms like using delimiters and prompting sandwiching are discussed to make models more resilient to malicious instructions
Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr

License: CC BY 4.0

Abstract: Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL relies on a notion of a capability to prevent the exfiltration of private data over unauthorized data flows. We demonstrate effectiveness of CaMeL by solving $67\%$ of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.

Submitted to arXiv on 24 Mar. 2025

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2503.18813v1

In the paper "Defeating Prompt Injections by Design," authors Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr address the increasing deployment of Large Language Models (LLMs) in agentic systems that interact with external environments. These LLM agents are vulnerable to prompt injection attacks when handling untrusted data. To combat this issue, the authors propose CaMeL, a robust defense mechanism that creates a protective system layer around the LLM to secure it even when underlying models may be susceptible to attacks. CaMeL operates by explicitly extracting control and data flows from trusted queries. This ensures that untrusted data retrieved by the LLM cannot impact program flow. Additionally, CaMeL relies on a capability concept to prevent the exfiltration of private data over unauthorized data flows. The effectiveness of CaMeL is demonstrated through its ability to solve 67% of tasks with provable security in AgentDojo [NeurIPS 2024], an agentic security benchmark. The authors highlight the importance of securing both control and data flows against prompt injection attacks in agentic systems. They discuss various defense mechanisms proposed by researchers to mitigate these risks. Methods such as using delimiters to mark boundaries of untrusted content within context and prompting sandwiching are explored as ways to make models more resilient to malicious instructions. Overall, the paper emphasizes the significance of developing robust defenses like CaMeL to protect LLM agents from prompt injection attacks and ensure secure interactions with external environments in agentic systems.
Created on 11 Mar. 2026

Assess the quality of the AI-generated content by voting

Score: 0

Why do we need votes?

Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.

Similar papers summarized with our AI tools

Navigate through even more similar papers through a

tree representation

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.