TD-MPC2: Scalable, Robust World Models for Continuous Control

AI-generated keywords: TD-MPC2, model-based reinforcement learning, local trajectory optimization, implicit world model, scalability

AI-generated Key Points

  • TD-MPC2 is a model-based reinforcement learning algorithm focusing on optimizing local trajectories within the latent space of an implicit world model
  • Like its predecessor TD-MPC, TD-MPC2 uses a decoder-free world model; it adds a series of improvements and significantly outperforms baselines across 104 online RL tasks spanning 4 task domains
  • With a single set of hyperparameters, TD-MPC2 consistently delivers strong results and demonstrates enhanced agent capabilities as both the model and data sizes increase
  • A single 317M parameter agent trained using TD-MPC2 successfully performs 80 tasks spanning multiple domains, embodiments, and action spaces
  • This multi-task result highlights the scalability and robustness of TD-MPC2 across diverse continuous control tasks
  • The authors emphasize democratizing RL by enhancing open-source algorithms like TD-MPC2 to make them more accessible to smaller teams and individuals with limited resources
  • While other methods like DreamerV3 excel in specific domains, TD-MPC2 stands out for its versatility and performance across a broader spectrum of challenges
  • Researchers and practitioners are invited to join efforts in advancing algorithmic robustness within reinforcement learning by refining existing approaches like TD-MPC2

Authors: Nicklas Hansen, Hao Su, Xiaolong Wang

ICLR 2024. Explore videos, models, data, code, and more at https://tdmpc2.com
License: CC BY 4.0

Abstract: TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com
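To make "local trajectory optimization in the latent space" concrete, here is a minimal sketch, not the authors' implementation. It assumes toy one-dimensional stand-ins for the learned components (`encode`, `dynamics`, `reward`, and `value` are hypothetical hand-written functions, not learned networks) and uses a simple cross-entropy-method style sampler, which may differ from the planner used in the paper. The point it illustrates is that candidate action sequences are scored entirely in latent space, with a terminal value estimate, and no observation is ever decoded.

```python
import random

# Hypothetical toy stand-ins for learned components (1-D for brevity).
def encode(obs):        # observation -> latent state
    return obs

def dynamics(z, a):     # latent transition; no decoder involved
    return z + 0.1 * a

def reward(z, a):       # predicted reward: stay close to the origin
    return -(z ** 2)

def value(z):           # terminal value estimate beyond the horizon
    return -10.0 * z ** 2

def score(z, actions, gamma=0.99):
    """Discounted return of an action sequence, rolled out in latent space."""
    total, discount = 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * value(z)

def plan(obs, horizon=3, samples=64, elites=8, iters=4, seed=0):
    """CEM-style planner (an assumption for illustration): returns the first action."""
    rng = random.Random(seed)
    z0 = encode(obs)
    mean, std = [0.0] * horizon, [1.0] * horizon
    for _ in range(iters):
        # Sample action sequences, clipped to the action range [-1, 1].
        candidates = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        # Keep the highest-scoring sequences and refit the sampling distribution.
        candidates.sort(key=lambda seq: score(z0, seq), reverse=True)
        top = candidates[:elites]
        for t in range(horizon):
            col = [seq[t] for seq in top]
            mean[t] = sum(col) / elites
            var = sum((x - mean[t]) ** 2 for x in col) / elites
            std[t] = max(var ** 0.5, 0.05)  # floor keeps some exploration
    # Execute only the first action, then replan at the next step (MPC).
    return mean[0]
```

In this toy setup, calling `plan(1.0)` should return a negative first action, since moving the latent state toward zero increases both the predicted reward and the terminal value.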

Submitted to arXiv on 25 Oct. 2023

Results of the summarizing process for the arXiv paper: 2310.16828v2

TD-MPC2 is a model-based reinforcement learning algorithm that performs local trajectory optimization in the latent space of a learned implicit world model. Like its predecessor TD-MPC, it is decoder-free; TD-MPC2 adds a series of improvements to that algorithm and significantly outperforms baselines across 104 online RL tasks spanning 4 task domains. With a single set of hyperparameters, it delivers consistently strong results, and agent capabilities increase as both model and data size grow. As a demonstration of this scaling, a single 317M parameter agent trained with TD-MPC2 performs 80 tasks spanning multiple domains, embodiments, and action spaces.

The authors emphasize the importance of democratizing RL by improving the robustness of open-source algorithms like TD-MPC2, making them more accessible to smaller teams and individuals with limited resources. TD-MPC2 is part of the ongoing search for a general RL algorithm that performs well across diverse tasks without extensive tuning or expert intervention. While contemporary methods such as DreamerV3 have shown promise in domains like Atari games and Minecraft, TD-MPC2 is positioned as a versatile method for a broad range of continuous control tasks.

The authors invite researchers and practitioners to join their efforts in advancing algorithmic robustness in reinforcement learning, both by refining existing approaches like TD-MPC2 and by exploring new avenues for improvement. Videos, models, data, code, and more information on TD-MPC2 are available at https://tdmpc2.com.
Created on 27 Aug. 2024

