Planning Goals for Exploration
AI-generated Key Points
- The paper proposes a new approach to goal-directed exploration in reinforcement learning.
- The authors focus on the goal-conditioned reinforcement learning (GCRL) paradigm, in which a single policy is trained to reach whatever goal it is commanded to achieve.
- They introduce a new method called "Planning Exploratory Goals" (PEG), which sets goals for each training episode to directly optimize an intrinsic exploration reward.
- PEG plans with learned world models to find goal commands that induce trajectories with high exploration value, even if the goal commands themselves are previously unobserved or physically implausible states.
- PEG enables more efficient and effective training of generalist GCRL policies relative to baselines and ablations in challenging simulated robotics environments.
- The authors review related work on exploration in reinforcement learning, noting that recent approaches improve exploration but still have shortcomings, such as launching exploration from suboptimal states ("launchpads") or not optimizing for exploration directly.
- Overall, PEG addresses goal-directed exploration by directly optimizing goal selection so that the resulting trajectories have high exploration value.
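The goal-planning idea in the key points above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the 1-D state space, the `imagined_final_state` stand-in for a learned world model plus goal-conditioned policy, the nearest-neighbor novelty score used as the exploration value, and the cross-entropy-method-style sampler used as a simple stand-in for a sampling-based planner.

```python
import random

def imagined_final_state(start, goal, reach=3.0, horizon=10):
    """Hypothetical world-model rollout: the current goal-conditioned policy
    walks toward `goal` but can move at most `reach` away from `start`."""
    state = start
    step = reach / horizon
    for _ in range(horizon):
        state += max(-step, min(step, goal - state))
    return state

def exploration_value(state, visited):
    """Toy exploration value (an assumption): distance to the nearest visited state."""
    return min(abs(state - v) for v in visited)

def plan_goal(start, visited, iters=5, samples=64, elites=8, seed=0):
    """Sampling-based search over goal commands. Each candidate goal is scored
    by the exploration value of the state the policy is imagined to end in,
    not by the novelty of the goal itself."""
    rng = random.Random(seed)
    mean, std = start, 5.0
    for _ in range(iters):
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        candidates.sort(
            key=lambda g: exploration_value(imagined_final_state(start, g), visited),
            reverse=True,
        )
        top = candidates[:elites]
        # Refit the sampling distribution to the best-scoring goal commands.
        mean = sum(top) / elites
        std = max(1e-3, (sum((g - mean) ** 2 for g in top) / elites) ** 0.5)
    return mean

visited = [-2.0, -1.0, 0.0, 1.0]
goal = plan_goal(start=0.0, visited=visited)
final = imagined_final_state(0.0, goal)
print(goal, final, exploration_value(final, visited))
```

Because only the induced final state is scored, the planned goal in this toy setup can lie beyond what the policy actually reaches; what matters is where commanding it is predicted to take the agent.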
Authors: Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
Abstract: Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. PEG first chooses goal commands such that the agent's goal-conditioned policy, at its current level of training, will end up in states with high exploration potential. It then launches an exploration policy starting at those promising states. To enable this direct optimization, PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands". In challenging simulated robotics environments including a multi-legged ant robot in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables more efficient and effective training of goal-conditioned policies relative to baselines and ablations. Our ant successfully navigates a long maze, and the robot arm successfully builds a stack of three blocks upon command. Website: https://penn-pal-lab.github.io/peg/
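The two-phase episode described in the abstract (first pursue the chosen goal, then launch an exploration policy from wherever the agent ends up) can be sketched as follows. This is a toy illustration under stated assumptions: the integer-line environment, the greedy unit-step "goal-conditioned policy", and the random-walk "exploration policy" are hypothetical placeholders, not the learned policies used in the paper.

```python
import random

def run_episode(start, goal, go_steps=5, explore_steps=5, seed=0):
    """Run one training episode on an integer line and return
    (launch_state, trajectory)."""
    rng = random.Random(seed)
    state = start
    trajectory = [state]

    # Phase 1: follow the goal-conditioned policy toward the commanded goal.
    # Placeholder policy: take one unit step toward the goal, then stay put.
    for _ in range(go_steps):
        if state < goal:
            state += 1
        elif state > goal:
            state -= 1
        trajectory.append(state)
    launch_state = state

    # Phase 2: launch the exploration policy from the state reached in phase 1.
    # Placeholder policy: an unbiased random walk.
    for _ in range(explore_steps):
        state += rng.choice([-1, 1])
        trajectory.append(state)

    return launch_state, trajectory

launch, traj = run_episode(start=0, goal=3)
print(launch, traj)
```

In this sketch the quality of an episode depends on where phase 1 ends, which is why the choice of goal command matters for exploration.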