FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Authors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield

License: CC BY 4.0

Abstract: We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test time to a novel object without fine-tuning, as long as its CAD model is given or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions. Project page: https://nvlabs.github.io/FoundationPose/

Submitted to arXiv on 13 Dec. 2023
