DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning

AI-generated keywords: Offline RL, Model-based RL, DOMAIN, Adaptive Sampling Distribution, Lower Bound

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • The paper proposes a new algorithm called DOMAIN for model-based reinforcement learning in offline settings.
  • Offline RL involves learning from a fixed dataset without interacting with the environment.
  • Model-based RL learns an environment model from the dataset and generates additional out-of-distribution model data to address distribution shift.
  • Previous algorithms rely on estimating model uncertainty for conservatism, but this can be unreliable and result in poor performance.
  • DOMAIN introduces an adaptive sampling distribution of model samples to adjust the penalty for using model data, without relying on estimating model uncertainty.
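  • The authors show theoretically that the Q value learned by DOMAIN outside a specific region is a lower bound of the true Q value, that DOMAIN is less conservative than previous model-based offline RL algorithms, and that it guarantees safe policy improvement (called "security policy improvement" in the abstract).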
  • Experimental results show that DOMAIN outperforms prior RL algorithms on the D4RL dataset benchmark and other tasks requiring generalization capabilities.
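To make the model-based part of the key points concrete, here is a minimal sketch of how a model learned from a fixed dataset can generate additional, out-of-distribution transitions. It is an illustration only, not the paper's implementation: the 1-D toy dynamics `s' = s + a`, the linear model, and the names `fit_linear_model` and `rollout` are all assumptions made for this example.

```python
import random

def fit_linear_model(transitions):
    """Least-squares fit of s' ~ w_s * s + w_a * a from (s, a, s') tuples."""
    sss = sum(s * s for s, a, _ in transitions)
    saa = sum(a * a for s, a, _ in transitions)
    ssa = sum(s * a for s, a, _ in transitions)
    bs = sum(s * n for s, a, n in transitions)
    ba = sum(a * n for s, a, n in transitions)
    det = sss * saa - ssa * ssa  # determinant of the 2x2 normal equations
    w_s = (bs * saa - ba * ssa) / det
    w_a = (ba * sss - bs * ssa) / det
    return w_s, w_a

def rollout(model, start_state, policy, horizon):
    """Generate `horizon` synthetic transitions from the learned model."""
    w_s, w_a = model
    s, out = start_state, []
    for _ in range(horizon):
        a = policy(s)
        s_next = w_s * s + w_a * a
        out.append((s, a, s_next))
        s = s_next
    return out

rng = random.Random(0)

# Fixed offline dataset: the (assumed) behavior policy only uses small actions.
dataset = []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    a = rng.uniform(-0.1, 0.1)
    dataset.append((s, a, s + a))  # toy ground-truth dynamics

model = fit_linear_model(dataset)

# Branch short rollouts from dataset states using a wider action range,
# so the model data covers actions the dataset never contains.
model_data = []
for s, _, _ in dataset[:20]:
    model_data.extend(
        rollout(model, s, lambda _s: rng.uniform(-1.0, 1.0), horizon=5)
    )

print(model, len(model_data))
```

In this toy case the model is exact, so the synthetic data is trustworthy. With a real learned model the out-of-distribution transitions can be wrong, which is why the key points stress conservatism when mixing model data with offline data.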

Authors: Xiao-Yin Liu, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou

13 pages, 6 figures

Abstract: Model-based reinforcement learning (RL), which learns environment model from offline dataset and generates more out-of-distribution model data, has become an effective approach to the problem of distribution shift in offline RL. Due to the gap between the learned and actual environment, conservatism should be incorporated into the algorithm to balance accurate offline data and imprecise model data. The conservatism of current algorithms mostly relies on model uncertainty estimation. However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and the previous methods ignore differences between the model data, which brings great conservatism. Therefore, this paper proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) without estimating model uncertainty to address the above issues. DOMAIN introduces adaptive sampling distribution of model samples, which can adaptively adjust the model data penalty. In this paper, we theoretically demonstrate that the Q value learned by the DOMAIN outside the region is a lower bound of the true Q value, the DOMAIN is less conservative than previous model-based offline RL algorithms and has the guarantee of security policy improvement. The results of extensive experiments show that DOMAIN outperforms prior RL algorithms on the D4RL dataset benchmark, and achieves better performance than other RL algorithms on tasks that require generalization.

Submitted to arXiv on 16 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.08925v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

The paper titled "DOMAIN: Mildly Conservative Model-Based Offline Reinforcement Learning" proposes a new algorithm called DOMAIN for model-based reinforcement learning (RL) in offline settings. Offline RL involves learning from a fixed dataset without interacting with the environment. Model-based RL learns an environment model from this dataset and generates additional out-of-distribution model data to address the problem of distribution shift.

One challenge in model-based RL is the gap between the learned environment model and the actual environment, which can lead to inaccurate predictions. To address this, conservatism needs to be incorporated into the algorithm to balance accurate offline data against imprecise model data. Previous algorithms rely on estimating model uncertainty for conservatism, but uncertainty estimation can be unreliable and result in poor performance in certain scenarios. These methods also ignore differences between the model data, which leads to excessive conservatism.

In response to these issues, DOMAIN introduces an adaptive sampling distribution of model samples that can adjust the penalty for using model data. Unlike previous approaches, DOMAIN does not rely on estimating model uncertainty. The authors theoretically demonstrate three things: the Q value learned by DOMAIN outside a specific region is a lower bound of the true Q value, DOMAIN is less conservative than previous model-based offline RL algorithms, and DOMAIN guarantees safe policy improvement (called "security policy improvement" in the abstract). The paper presents extensive experimental results showing that DOMAIN outperforms prior RL algorithms on the D4RL dataset benchmark as well as on other tasks requiring generalization capabilities.

The authors of this paper are Xiao-Yin Liu, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, and Zeng-Guang Hou. The paper is categorized under cs.LG (Computer Science - Machine Learning) and cs.AI (Computer Science - Artificial Intelligence). It is 13 pages long and includes 6 figures.
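The idea of a sampling distribution over model samples that changes the size of the penalty can be sketched with a generic conservative regularizer: the expected Q value on model samples, weighted by a sampling distribution, minus the mean Q value on dataset samples. This is a hedged illustration only. The regularizer form, the softmax weighting rule, and the per-sample `score` values are assumptions made for this example; this page is built from metadata and does not describe how DOMAIN actually constructs its distribution.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conservative_regularizer(q_model, weights, q_data):
    """Weighted mean of Q on model samples minus plain mean of Q on dataset samples."""
    model_term = sum(w * q for w, q in zip(weights, q_model))
    data_term = sum(q_data) / len(q_data)
    return model_term - data_term

# Toy numbers, all made up for illustration.
q_data = [1.0, 1.2, 0.9]    # Q values on offline dataset pairs
q_model = [1.1, 2.5, 4.0]   # Q values on model-generated pairs
score = [0.05, 0.4, 1.5]    # hypothetical per-sample score (placeholder)

uniform = [1.0 / len(q_model)] * len(q_model)
adaptive = softmax(score)   # placeholder rule: higher score, more weight

reg_uniform = conservative_regularizer(q_model, uniform, q_data)
reg_adaptive = conservative_regularizer(q_model, adaptive, q_data)
print(reg_uniform, reg_adaptive)
```

Minimizing such a term pushes Q down where the sampling distribution puts mass and up on dataset pairs. Changing the weights therefore changes which model samples are penalized most, which is the general mechanism the summary attributes to an adaptive sampling distribution.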
Created on 01 Oct. 2023
