Transformers are Sample Efficient World Models
AI-generated Key Points
⚠ The paper's license does not allow us to build on its content, so the key points are generated from the paper's metadata rather than the full article.
- Deep reinforcement learning agents have been limited in their application to real-world problems due to sample inefficiency
- Model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches
- Ensuring that the world model is accurate over extended periods of time has been a challenge
- IRIS is a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer
- With the equivalent of only two hours of gameplay on the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046 and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search and even surpassing MuZero
- The design of IRIS is motivated by the success of Transformers in sequence modeling tasks
- To foster future research on Transformers and world models for sample-efficient reinforcement learning, the authors release their codebase at https://github.com/eloialonso/iris
Authors: Vincent Micheli, Eloi Alonso, François Fleuret
Abstract: Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at https://github.com/eloialonso/iris.
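The two headline figures in the abstract, the "mean human normalized score" and the "two hours of gameplay", can be illustrated with a short sketch. It assumes the usual definition of the human normalized score, (agent - random) / (human - random), and the common Atari 100k convention of 100,000 agent steps with a frame skip of 4 at 60 frames per second; neither is spelled out in the abstract. The function names and the per-game numbers below are hypothetical and are not results from the paper.

```python
from typing import Dict, Tuple


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return (agent - random) / (human - random).

    0.0 matches a random policy, 1.0 matches the human reference score.
    """
    if human == random:
        raise ValueError("human and random reference scores must differ")
    return (agent - random) / (human - random)


def mean_human_normalized_score(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Average the per-game normalized scores.

    `scores` maps a game name to (agent, random, human) raw scores.
    """
    values = [human_normalized_score(*triple) for triple in scores.values()]
    return sum(values) / len(values)


def count_superhuman(scores: Dict[str, Tuple[float, float, float]]) -> int:
    """Count games where the normalized score exceeds 1.0 (the human reference)."""
    return sum(1 for triple in scores.values() if human_normalized_score(*triple) > 1.0)


def gameplay_hours(agent_steps: int = 100_000, frame_skip: int = 4, fps: int = 60) -> float:
    """Convert agent steps to hours of real-time play.

    The defaults are assumed benchmark conventions, not values from the abstract.
    """
    return agent_steps * frame_skip / fps / 3600


# Hypothetical raw scores, for illustration only: (agent, random, human).
example_scores = {
    "game_a": (30.0, 10.0, 20.0),  # normalized score 2.0, above human
    "game_b": (5.0, 0.0, 10.0),    # normalized score 0.5, below human
}

print(mean_human_normalized_score(example_scores))  # 1.25
print(count_superhuman(example_scores))             # 1
print(round(gameplay_hours(), 2))                   # 1.85
```

Under these assumptions, 100,000 agent steps correspond to 400,000 frames, or roughly 1.85 hours of play, which is consistent with the "two hours" quoted in the abstract. A mean score above 1.0, such as the reported 1.046, means the agent exceeds the human reference on average, even though it may outperform humans on only a subset of games.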