MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

AI-generated keywords: autonomous agents, generalist agents, MineDojo, simulation suite, embodied agents

AI-generated Key Points

  • Autonomous agents have made strides in specialized domains like Atari games and Go, but struggle to generalize across tasks
  • Researchers propose a trinity of ingredients for building generalist agents: diverse task environment, large-scale multimodal knowledge base, and flexible agent architecture
  • MineDojo framework based on Minecraft offers simulation suite with open-ended tasks and internet-scale knowledge base
  • Utilizes novel learning algorithm for embodied agents leveraging pre-trained video-language models as reward function
  • Agent trained using this approach shows competitive performance and up to 73% improvement in success rates
  • Introduces open-ended task suite, internet-scale domain knowledge, and agent learning techniques utilizing large pre-trained models
  • MineDojo simulation suite and knowledge base are open-sourced (https://minedojo.org) for further research
  • Offers programmatic tasks focused on survival, harvesting materials, tech advancement, and combat skills, as well as creative tasks without straightforward success criteria
  • Novel task evaluation metric based on contrastive video-language model used to assess creative tasks accurately
  • Task mining from YouTube tutorial videos expands number of task definitions significantly compared to existing challenges in the field

Authors: Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar

License: CC BY 4.0

Abstract: Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited and manually conceived objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports a multitude of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions. Using MineDojo's data, we propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward. We open-source the simulation suite and knowledge bases (https://minedojo.org) to promote research towards the goal of generally capable embodied agents.

Submitted to arXiv on 17 Jun. 2022

Results of the summarizing process for the arXiv paper: 2206.08853v1

Autonomous agents have made impressive strides in specialized domains such as Atari games and Go. However, they often struggle to generalize across a wide range of tasks and capabilities. To address this limitation, the authors propose a trinity of ingredients for building generalist agents: an environment supporting diverse tasks and goals, a large-scale database of multimodal knowledge, and a flexible agent architecture. They introduce MineDojo, a framework based on Minecraft that offers a simulation suite with thousands of open-ended tasks and an internet-scale knowledge base comprising videos, tutorials, wiki pages, and forum discussions.

The approach uses a novel learning algorithm for embodied agents that leverages large pre-trained video-language models as a learned reward function. A video-text contrastive model is trained on the large volume of YouTube data in MineDojo to associate natural language subtitles with video segments. The resulting correlation score serves as a reward function for reinforcement learning, without the need for manually designed dense shaping rewards. The results show that an agent trained this way performs competitively with traditionally trained agents and achieves up to 73% improvement in success rates.

The paper's contributions are an open-ended task suite, internet-scale domain knowledge, and agent learning techniques that use large pre-trained models. The MineDojo simulation suite and knowledge base are open-sourced to facilitate further research on generally capable embodied agents. In addition to programmatic tasks focused on survival, harvesting materials, advancing through tech trees, and combat skills, MineDojo offers creative tasks with no straightforward success criteria. A novel evaluation metric based on a pre-trained contrastive video-language model is used to assess these creative tasks. Systematic approaches such as task mining from YouTube tutorial videos expand the number of task definitions significantly compared to existing challenges in the field.

Overall, MineDojo presents a comprehensive framework for developing generalist embodied agents by combining diverse task environments with extensive domain knowledge and advanced learning algorithms. Researchers are encouraged to use these resources to advance the field towards more adaptable and capable autonomous agents.
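
To make the idea of a correlation score used as a reward more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the use of plain cosine similarity, and the three-dimensional toy embeddings are all assumptions made for illustration. In practice the embeddings would come from a trained video-text contrastive model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def language_reward(video_embedding: Sequence[float],
                    text_embedding: Sequence[float]) -> float:
    """Hypothetical reward: how well a recent video clip matches the task prompt.

    Assumption: both embeddings live in the shared space of a contrastive
    video-language model, so a higher similarity means a better match.
    """
    return cosine_similarity(video_embedding, text_embedding)


# Toy embeddings standing in for real model outputs (illustrative values only).
prompt = [1.0, 0.0, 0.0]          # e.g. the embedded task description
matching_clip = [0.9, 0.1, 0.0]   # a clip that resembles the task
unrelated_clip = [0.0, 1.0, 0.0]  # a clip that does not

print(language_reward(matching_clip, prompt) > language_reward(unrelated_clip, prompt))
```

In a reinforcement learning loop, a score like this would be computed from the agent's recent frames at each step and handed to the learner in place of a hand-written shaping reward.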
Created on 12 Aug. 2024
