SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

AI-generated keywords: Reinforcement learning

AI-generated Key Points

  • Reinforcement learning (RL) is increasingly used in real-world, safety-critical scenarios
  • Focus on backdoor poisoning attacks, in which an adversary tampers with an RL agent's training to manipulate its behavior at inference time
  • Proof that existing attacks cannot generalize across domains and Markov Decision Processes (MDPs)
  • Introduction of a novel poisoning attack framework that aligns the adversary's objectives with those of finding an optimal policy, guaranteeing attack success in the limit
  • Introduction of "SleeperNets" as a universal backdoor attack strategy using dynamic reward poisoning techniques
  • Evaluation of the SleeperNets attack in six environments, showing higher attack success rates than existing methods while maintaining benign episodic return
  • Formal analysis of the weaknesses of static reward poisoning attacks
  • Introduction of an "outer-loop" threat model in which the adversary poisons rewards and observations after each episode, enabling more informed attacks
  • Development of a novel framework utilizing dynamic reward poisoning for creating RL backdoor attacks with provable guarantees of success and stealth over time
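The outer-loop threat model can be illustrated with a minimal Python sketch. It assumes a hypothetical `Episode` record, a poisoning budget `beta` (the fraction of timesteps poisoned per episode), and a caller-supplied `poison_step` function; none of these names come from the paper. Because this adversary acts only after an episode ends, it can inspect the whole trajectory before deciding which steps to poison.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Episode:
    """One finished episode (hypothetical record format)."""
    observations: list  # o_0 ... o_{T-1}
    actions: list       # a_0 ... a_{T-1}
    rewards: list       # r_0 ... r_{T-1}


def outer_loop_poison(episode, beta, poison_step, rng):
    """Hypothetical outer-loop adversary.

    Runs only after the episode has ended, so it sees the whole trajectory
    before choosing which timesteps to poison. Returns a poisoned copy of
    the episode and the sorted list of poisoned timesteps.
    """
    T = len(episode.actions)
    n_poison = int(round(beta * T))
    steps = rng.choice(T, size=n_poison, replace=False)
    obs = list(episode.observations)
    rew = list(episode.rewards)
    for t in steps:
        # poison_step decides the new observation and reward for step t.
        obs[t], rew[t] = poison_step(episode, int(t))
    return Episode(obs, list(episode.actions), rew), sorted(int(t) for t in steps)


def example_poison_step(episode, t, target_action=1):
    """Illustrative rule only: stamp a 2x2 trigger patch and apply a static +/-1 reward."""
    o = episode.observations[t].copy()
    o[:2, :2] = 1.0
    r = 1.0 if episode.actions[t] == target_action else -1.0
    return o, r


rng = np.random.default_rng(0)
episode = Episode(
    observations=[np.zeros((4, 4)) for _ in range(100)],
    actions=[t % 3 for t in range(100)],
    rewards=[0.0] * 100,
)
poisoned, steps = outer_loop_poison(episode, beta=0.05,
                                    poison_step=example_poison_step, rng=rng)
print(steps)
```

The per-step rule is deliberately kept separate from the episode-level loop, so a static or a dynamic reward rule can be plugged in without changing how timesteps are selected.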

Authors: Ethan Rathbun, Christopher Amato, Alina Oprea

23 pages, 14 figures, NeurIPS
License: CC BY 4.0

Abstract: Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications -- making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving their inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy -- guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.
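To make the backdoor objective concrete, here is a minimal sketch of how attack success might be measured: stamp a trigger onto observations and count how often the agent then chooses the target action. The 3x3 corner-patch trigger, the target action index, and the toy `backdoored_policy` are hypothetical stand-ins, not the triggers or agents used in the paper.

```python
import numpy as np

TARGET_ACTION = 2  # hypothetical target action a+


def apply_trigger(obs, size=3, value=1.0):
    """Stamp a size x size patch into the top-left corner (hypothetical trigger)."""
    triggered = obs.copy()
    triggered[:size, :size] = value
    return triggered


def attack_success_rate(policy, observations, target_action=TARGET_ACTION):
    """Fraction of triggered observations on which the policy picks the target action."""
    hits = [policy(apply_trigger(o)) == target_action for o in observations]
    return float(np.mean(hits))


def backdoored_policy(obs):
    """Toy stand-in for a poisoned agent: acts normally unless the patch is present."""
    if np.all(obs[:3, :3] == 1.0):
        return TARGET_ACTION
    return int(obs.sum() > 0)  # arbitrary "benign" behaviour


rng = np.random.default_rng(0)
clean = [rng.random((8, 8)) * 0.5 for _ in range(100)]
print(attack_success_rate(backdoored_policy, clean))  # prints 1.0
```

In a real evaluation the policy would be the trained agent, and benign episodic return would be measured separately on episodes without the trigger; a stealthy attack scores high on the first metric without lowering the second.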

Submitted to arXiv on 30 May 2024

Results of the summarizing process for the arXiv paper: 2405.20539v2

Reinforcement learning (RL) is a rapidly evolving field with increasing applications in real-world, safety-critical scenarios. As the use of RL algorithms grows, it becomes crucial to ensure their robustness against adversarial attacks. This study focuses on backdoor poisoning, a stealthy form of training-time attack against RL agents. In such an attack, the adversary intervenes in the training process so that the agent reliably performs a specific target action whenever it observes a predetermined trigger at inference time. The research uncovers theoretical limitations of existing work by proving that prior attacks cannot generalize across different domains and Markov Decision Processes (MDPs). Motivated by this finding, the authors formulate a novel poisoning attack framework that aligns the adversary's objectives with those of finding an optimal policy, guaranteeing attack success in the limit. Building on this theoretical analysis, they introduce "SleeperNets", a universal backdoor attack that exploits a newly proposed threat model and employs dynamic reward poisoning. The study evaluates SleeperNets in six environments spanning multiple domains, such as robotic navigation, video game playing, self-driving, and stock trading. The results show significant improvements in attack success rates over existing methods while maintaining benign episodic return. The main contributions are:
1. The first formal analysis of static reward poisoning attacks, highlighting their weaknesses.
2. An "outer-loop" threat model in which the adversary manipulates the agent's rewards and state observations after each episode, enabling more informed poisoning attacks.
3. A novel framework that uses dynamic reward poisoning to create RL backdoor attacks with provable guarantees of success and stealth over time.
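Why a fixed reward bonus can fail, and why a value-aware (dynamic) reward can help, can be seen in a toy one-step example. This is an illustration under stated assumptions, not the paper's proof or its exact reward rule: the reward magnitude `C`, the discount `GAMMA`, and the dynamic rule `C * [a == a+] - GAMMA * V(s')` are choices made here, and the sketch assumes the adversary has an estimate of the next state's value V(s').

```python
GAMMA = 0.99  # discount factor (assumed)
C = 1.0       # poisoned reward magnitude (assumed)


def static_poisoned_reward(action, target):
    # Static poisoning: fixed +C for the target action, -C otherwise,
    # regardless of where the action leads.
    return C if action == target else -C


def dynamic_poisoned_reward(action, target, next_value):
    # Illustrative dynamic poisoning (an assumption, not the paper's exact rule):
    # subtract the discounted value of the next state so that the one-step
    # backup below reduces to C * [action == target].
    return (C if action == target else 0.0) - GAMMA * next_value


def q_value(reward, next_value):
    # One-step Bellman backup: Q(s, a) = r + GAMMA * V(s').
    return reward + GAMMA * next_value


def greedy_action(mode, next_values, target="a+"):
    """Return the greedy action and the Q-values in a triggered state."""
    q = {}
    for action, v in next_values.items():
        if mode == "static":
            r = static_poisoned_reward(action, target)
        else:
            r = dynamic_poisoned_reward(action, target, v)
        q[action] = q_value(r, v)
    return max(q, key=q.get), q


# Toy triggered state: the target action a+ leads to a low-value state,
# while the alternative a0 leads to a high-value state.
next_values = {"a+": 0.0, "a0": 10.0}
for mode in ("static", "dynamic"):
    best, q = greedy_action(mode, next_values)
    print(mode, "greedy action:", best)
```

With these numbers, static poisoning gives Q(a+) = 1 but Q(a0) = -1 + 0.99 * 10 = 8.9, so a greedy agent ignores the trigger. The dynamic rule gives Q(a+) = 1 and Q(a0) = 0, so the target action wins however valuable the alternative's successor state is. The sketch looks at a single triggered step in isolation and ignores how poisoned values propagate to earlier timesteps.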
Overall, by showing how backdoor poisoning attacks such as SleeperNets can compromise RL training, the study provides insights that can inform efforts to secure RL algorithms against adversarial threats.
Created on 21 Oct. 2025
