GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

AI-generated keywords: Text-guided 3D modeling

AI-generated Key Points

The paper's license does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Pretrained text-to-image diffusion models are increasingly used to guide text-based 3D modeling
  • Existing methods struggle with complex descriptions of multiple objects, because vectorized text embeddings cannot capture the entities and the relationships between them
  • GraphDreamer is introduced as a framework for generating compositional 3D scenes from structured scene graphs
  • GraphDreamer represents objects with signed distance fields and imposes a constraint that prevents objects from inter-penetrating
  • A text prompt designed for ChatGPT automatically generates scene graphs from textual inputs
  • Qualitative and quantitative experiments validate that GraphDreamer produces high-fidelity compositional 3D scenes with disentangled object entities
  • GraphDreamer advances text-guided 3D modeling by translating complex multi-object descriptions into coherent 3D scenes
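To make the node/edge idea concrete, here is a minimal Python sketch of a scene graph in which each node holds an object description and each edge holds a relation between two objects, from which per-object, per-relation and whole-scene text prompts can be assembled. The class name `SceneGraph`, its methods and the prompt templates are illustrative assumptions; this page only has the paper metadata, so this is not GraphDreamer's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: nodes are objects, edges are pairwise relations."""

    nodes: dict[str, str] = field(default_factory=dict)  # node id -> object description
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (subject id, relation, object id)

    def add_object(self, node_id: str, description: str) -> None:
        self.nodes[node_id] = description

    def add_relation(self, subj: str, relation: str, obj: str) -> None:
        # An edge may only connect objects that already exist in the graph.
        if subj not in self.nodes or obj not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((subj, relation, obj))

    def node_prompts(self) -> dict[str, str]:
        # One text prompt per object, describing that object on its own.
        return dict(self.nodes)

    def edge_prompts(self) -> list[str]:
        # One prompt per edge, describing the two objects and their relation.
        return [f"{self.nodes[s]} {r} {self.nodes[o]}" for s, r, o in self.edges]

    def scene_prompt(self) -> str:
        # Global prompt: all edge descriptions, followed by any objects that have no edge.
        connected = {n for s, _, o in self.edges for n in (s, o)}
        isolated = [d for n, d in self.nodes.items() if n not in connected]
        return ", ".join(self.edge_prompts() + isolated)
```

For example, adding the nodes `cat` ("a black cat") and `sofa` ("a red sofa") with the edge `("cat", "sleeping on", "sofa")` yields the edge prompt "a black cat sleeping on a red sofa". Keeping node and edge prompts separate is what lets each object and each relation be described independently instead of through a single holistic sentence.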

Authors: Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf

Technical Report (18 pages, 11 figures, https://graphdreamer.github.io/)

Abstract: As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.
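A signed distance field (SDF) is negative inside an object and positive outside, which makes overlap between objects easy to detect: a sample point lies inside two objects exactly when both of their SDFs are negative there. The Python sketch below illustrates this with analytic spheres standing in for learned object SDFs. The abstract does not state the exact inter-penetration constraint, so the penalty formula and all function names here are illustrative assumptions, not the paper's loss.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, center, radius: float) -> np.ndarray:
    """Signed distance to a sphere for points of shape [N, 3]: negative inside, positive outside."""
    return np.linalg.norm(points - np.asarray(center, dtype=float), axis=-1) - radius


def scene_sdf(sdfs: np.ndarray) -> np.ndarray:
    """Union of K objects: pointwise minimum over object SDFs, shape [K, N] -> [N]."""
    return sdfs.min(axis=0)


def penetration_penalty(sdfs: np.ndarray) -> float:
    """Hypothetical penalty: at each point, sum the penetration depth of every object
    except the deepest one. It is zero when no point is inside two objects at once."""
    inside = np.maximum(-sdfs, 0.0)  # [K, N] how far each point is inside each object
    total = inside.sum(axis=0)       # summed depth over all objects
    deepest = inside.max(axis=0)     # depth in the object that "owns" the point
    return float((total - deepest).mean())


# Sample points along the x-axis and compare separated vs. overlapping unit spheres.
xs = np.linspace(-2.0, 2.0, 401)
pts = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=-1)
apart = np.stack([sphere_sdf(pts, (-1.5, 0, 0), 1.0), sphere_sdf(pts, (1.5, 0, 0), 1.0)])
overlap = np.stack([sphere_sdf(pts, (-0.5, 0, 0), 1.0), sphere_sdf(pts, (0.5, 0, 0), 1.0)])
print(penetration_penalty(apart), penetration_penalty(overlap))
```

Taking the pointwise minimum is the standard way to form the union of SDFs. The penalty ignores the deepest object at each point, so a point that belongs to exactly one object costs nothing, and only genuinely shared volume is penalized.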

Submitted to arXiv on 30 Nov. 2023

Results of the summarizing process for the arXiv paper: 2312.00093v1

In <kw>text-guided 3D modeling</kw>, distilling knowledge from <kw>pretrained text-to-image diffusion models</kw> has gained significant traction as these models have become more capable. However, most existing methods generate a single holistic 3D model from plain text and fall short when the description depicts a scene with multiple objects: a vectorized text embedding cannot capture many entities and the relationships between them, and modeling the whole scene holistically prevents accurate grounding of individual text entities.

To address these challenges, the authors propose <kw>GraphDreamer</kw>, a framework that generates compositional 3D scenes from scene graphs, in which objects are represented as nodes and their interactions as edges. By exploiting both node and edge information, GraphDreamer makes better use of the pretrained text-to-image diffusion model and can fully disentangle the objects without image-level supervision. Objects are represented with <kw>signed distance fields</kw>, which facilitates modeling object-wise relationships, and a constraint is imposed to prevent objects from inter-penetrating. To avoid creating scene graphs by hand, the authors also design a text prompt that lets ChatGPT generate a scene graph from a text input.

Qualitative and quantitative experiments are reported to validate that GraphDreamer produces high-fidelity compositional 3D scenes with disentangled object entities, which the authors present as an advance for <kw>text-guided 3D modeling</kw> of complex, multi-object descriptions.
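Because the ChatGPT step returns text, its reply has to be turned into a graph before it can drive generation. The Python sketch below assumes a hypothetical JSON reply with an `objects` list and index-based `relations` triples, and validates that every edge points at a listed object. The abstract does not describe the actual prompt or output format, so the format and the function name `parse_scene_graph` are illustrative assumptions.

```python
import json


def parse_scene_graph(llm_reply: str):
    """Parse a hypothetical JSON reply of the form
    {"objects": ["a black cat", ...], "relations": [[0, "sleeping on", 1], ...]}
    into (objects, edges), rejecting edges that reference missing objects."""
    data = json.loads(llm_reply)
    objects = [str(o) for o in data["objects"]]
    edges = []
    for subj, relation, obj in data.get("relations", []):
        # Endpoints must be integer indices into the object list.
        if not isinstance(subj, int) or not isinstance(obj, int):
            raise ValueError("edge endpoints must be integer object indices")
        if not (0 <= subj < len(objects) and 0 <= obj < len(objects)):
            raise ValueError(f"edge ({subj}, {relation!r}, {obj}) references a missing object")
        if subj == obj:
            raise ValueError("an object cannot be related to itself")
        edges.append((subj, str(relation), obj))
    return objects, edges
```

Validating the reply before use matters because a language model can return indices that do not match its own object list; failing early is cheaper than discovering a malformed graph during 3D optimization.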
Overall, GraphDreamer replaces a single holistic text prompt with <kw>structured scene graphs</kw>, giving the knowledge distillation from pretrained text-to-image models explicit object and relation information, with the aim of improving the fidelity of compositional 3D scene synthesis.
Created on 29 Apr. 2025
