Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars

AI-generated keywords: Synthetic 3D Scenes, Photorealistic 2D Images, Stochastic Grammar, Physics-Based Rendering, Machine Learning

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Authors propose a systematic learning-based approach for generating synthetic 3D scenes and photorealistic 2D images
  • Pipeline of algorithms can automatically generate diverse indoor scenes using a stochastic grammar and physics-based rendering
  • Precise customization and control of scene attributes is possible
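  • Renders photorealistic RGB images while synthesizing detailed per-pixel ground truth data such as depth, surface normal, object identity, and material information, along with environment settings such as illumination and camera viewpoint
  • Synthesized dataset improves performance in machine-learning-based scene understanding tasks such as depth and surface normal prediction, semantic segmentation, and reconstruction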
  • Provides benchmarks for trained models through controllable modifications of object attributes and scene properties
  • Paper accepted in the International Journal of Computer Vision (IJCV) in 2018

Authors: Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

Accepted in IJCV 2018

Abstract: We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, we devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. Our pipeline is capable of synthesizing scene layouts with high diversity, and it is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity, and material information (detailed to object parts), as well as environments (e.g., illuminations and camera viewpoints). We demonstrate the value of our synthesized dataset, by improving performance in certain machine-learning-based scene understanding tasks--depth and surface normal prediction, semantic segmentation, reconstruction, etc.--and by providing benchmarks for and diagnostics of trained models by modifying object attributes and scene properties in a controllable manner.
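
To make the idea of a stochastic grammar concrete, here is a minimal sketch of sampling from a toy And-Or graph. It is not the authors' implementation: the `Node` class, the `sample` function, the object names, and the branching probabilities are all hypothetical, and the attributes and spatial relations of the paper's attributed Spatial And-Or Graph are left out. An And-node expands all of its children, an Or-node picks one child at random according to its weights, and terminal nodes are the objects placed in the scene.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "and", "or", or "terminal"
    children: list = field(default_factory=list)
    weights: list = field(default_factory=list)  # branching probabilities, Or-nodes only


def sample(node, rng):
    """Return the terminal names produced by one random derivation from `node`."""
    if node.kind == "terminal":
        return [node.name]
    if node.kind == "or":
        # Or-node: choose exactly one alternative according to its weights.
        chosen = rng.choices(node.children, weights=node.weights, k=1)[0]
        return sample(chosen, rng)
    # And-node: every child is part of the scene.
    objects = []
    for child in node.children:
        objects.extend(sample(child, rng))
    return objects


def terminal(name):
    return Node(name, "terminal")


# Hypothetical toy grammar for a bedroom (names and probabilities are made up).
sleeping = Node("sleeping", "or", [terminal("bed"), terminal("sofa_bed")], [0.8, 0.2])
storage = Node("storage", "or", [terminal("wardrobe"), terminal("dresser")], [0.5, 0.5])
scene = Node("bedroom", "and", [terminal("room"), sleeping, storage])

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(sample(scene, rng))
```

Each call to `sample` yields one scene configuration; changing the Or-node weights is one simple way such a grammar could be made configurable.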

Submitted to arXiv on 01 Apr. 2017

Results of the summarizing process for the arXiv paper: 1704.00112v3

The authors propose a systematic learning-based approach for generating large quantities of synthetic 3D scenes and photorealistic 2D images, along with ground truth information, to train, benchmark, and diagnose computer vision and robotics algorithms. They develop a pipeline of algorithms that automatically generates diverse indoor scenes using a stochastic grammar, represented as an attributed Spatial And-Or Graph, together with physics-based rendering. The pipeline allows precise customization and control of scene attributes. It renders photorealistic RGB images while synthesizing detailed per-pixel ground truth data such as depth, surface normal, object identity, and material information, along with environment settings such as illumination and camera viewpoint. The authors demonstrate the value of their synthesized dataset by improving performance in machine-learning-based scene understanding tasks (depth and surface normal prediction, semantic segmentation, and reconstruction) and by providing benchmarks for and diagnostics of trained models through controllable modifications of object attributes and scene properties. The paper was accepted by the International Journal of Computer Vision (IJCV) in 2018.
Created on 11 Oct. 2023
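
As an illustration of why per-pixel ground truth is useful for diagnosing trained models, the sketch below groups a depth predictor's error by the object each pixel belongs to. It is an assumption-laden toy, not the paper's evaluation protocol: the function name `per_object_depth_error`, the 2x3 "images", and all numeric values are invented for the example.

```python
from collections import defaultdict


def per_object_depth_error(pred_depth, gt_depth, object_ids):
    """Mean absolute depth error per object label.

    All three arguments are row-major lists of rows with identical shape:
    predicted depth, ground-truth depth, and the ground-truth object label
    of every pixel.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pred_row, gt_row, id_row in zip(pred_depth, gt_depth, object_ids):
        for pred, gt, obj in zip(pred_row, gt_row, id_row):
            sums[obj] += abs(pred - gt)
            counts[obj] += 1
    return {obj: sums[obj] / counts[obj] for obj in sums}


# Made-up 2x3 example: depths in meters, labels from a synthetic renderer.
gt_depth = [[2.0, 2.0, 3.0],
            [2.0, 3.0, 3.0]]
pred_depth = [[2.5, 2.0, 3.0],
              [1.5, 3.0, 4.0]]
object_ids = [["bed", "bed", "wall"],
              ["bed", "wall", "wall"]]

if __name__ == "__main__":
    errors = per_object_depth_error(pred_depth, gt_depth, object_ids)
    for obj, err in sorted(errors.items()):
        print(f"{obj}: {err:.3f}")
```

Because a synthetic pipeline knows the identity of every pixel, a breakdown like this comes for free, whereas real data would need manual annotation.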
