SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images

AI-generated keywords: Self-Supervised, Bird's-Eye-View, Monocular, Frontal View, Semantic Mapping, Automated Driving

AI-generated Key Points

The license of the paper does not allow us to build upon its content, and the key points are generated from the paper metadata rather than the full article.

  • The paper addresses the need for Bird's-Eye-View (BEV) semantic maps in automated driving pipelines.
  • Existing approaches for generating BEV maps rely on fully supervised training and require large amounts of annotated data.
  • The authors propose a self-supervised approach for generating a BEV semantic map using a single monocular image from the frontal view (FV).
  • The model leverages FV semantic annotations from video sequences during training instead of BEV ground truth annotations.
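  • The proposed SkyEye architecture learns through two modes of self-supervision: implicit supervision, which enforces spatial consistency of the scene over time based on FV semantic sequences, and explicit supervision, which uses BEV pseudolabels generated from FV semantic annotations and self-supervised depth estimates.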
  • Extensive evaluations on the KITTI-360 dataset show that the self-supervised approach performs comparably to state-of-the-art fully supervised methods.
  • It achieves competitive results using only 1% of the direct BEV supervision used by fully supervised approaches.
  • The authors publicly release their code and the BEV datasets generated from the KITTI-360 and Waymo datasets to facilitate further research.
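Implicit supervision, as described for SkyEye, enforces spatial consistency of the scene over time using FV semantic sequences. One generic way to express such a consistency check, offered here only as an illustrative assumption and not as SkyEye's actual loss, is to move 3D points with predicted labels from frame t into a later frame t+k using the relative camera pose, project them with the camera intrinsics, and measure how often the predicted class matches the FV annotation at that pixel. The function name `temporal_label_agreement`, the 4x4 pose convention `T_k_from_t`, and the agreement metric are all hypothetical.

```python
import numpy as np

def temporal_label_agreement(points_t, pred_labels, fv_labels_k, T_k_from_t, K):
    """Hypothetical check: fraction of points whose predicted class matches the
    FV label observed at their reprojection in a later frame t+k.

    points_t:    (N, 3) points in the camera frame at time t (x right, y down, z forward)
    pred_labels: (N,) predicted class IDs for those points
    fv_labels_k: (H, W) FV semantic labels of frame t+k
    T_k_from_t:  (4, 4) rigid transform from camera frame t to camera frame t+k
    K:           (3, 3) pinhole intrinsics
    """
    h, w = fv_labels_k.shape
    homog = np.hstack([points_t, np.ones((points_t.shape[0], 1))])  # (N, 4)
    pts_k = (T_k_from_t @ homog.T).T[:, :3]                        # points in frame t+k

    in_front = pts_k[:, 2] > 1e-6            # drop points behind the camera
    pix = (K @ pts_k[in_front].T).T          # homogeneous pixel coordinates
    u = np.floor(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.floor(pix[:, 1] / pix[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)  # points landing in the image

    if not inside.any():
        return float("nan")  # no overlap with frame t+k
    observed = fv_labels_k[v[inside], u[inside]]
    predicted = np.asarray(pred_labels)[in_front][inside]
    return float(np.mean(observed == predicted))
```

A trainable objective would compare class probabilities (for example with a cross-entropy term) rather than hard labels so that gradients can flow; this sketch only shows the geometric reprojection step.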

Authors: Nikhil Gosala, Kürsat Petek, Paulo L. J. Drews-Jr, Wolfram Burgard, Abhinav Valada

14 pages, 7 figures

Abstract: Bird's-Eye-View (BEV) semantic maps have become an essential component of automated driving pipelines due to the rich representation they provide for decision-making tasks. However, existing approaches for generating these maps still follow a fully supervised training paradigm and hence rely on large amounts of annotated BEV data. In this work, we address this limitation by proposing the first self-supervised approach for generating a BEV semantic map using a single monocular image from the frontal view (FV). During training, we overcome the need for BEV ground truth annotations by leveraging the more easily available FV semantic annotations of video sequences. Thus, we propose the SkyEye architecture that learns based on two modes of self-supervision, namely, implicit supervision and explicit supervision. Implicit supervision trains the model by enforcing spatial consistency of the scene over time based on FV semantic sequences, while explicit supervision exploits BEV pseudolabels generated from FV semantic annotations and self-supervised depth estimates. Extensive evaluations on the KITTI-360 dataset demonstrate that our self-supervised approach performs on par with the state-of-the-art fully supervised methods and achieves competitive results using only 1% of direct supervision in the BEV compared to fully supervised approaches. Finally, we publicly release both our code and the BEV datasets generated from the KITTI-360 and Waymo datasets.
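Explicit supervision relies on BEV pseudolabels built from FV semantic annotations and self-supervised depth estimates. As a rough illustration of how such pseudolabels can be formed in general, and not SkyEye's actual pipeline (whose details are not covered by this page), the sketch below back-projects each labeled FV pixel with a pinhole camera model and bins it into a top-down grid, taking a per-cell majority vote. The function name `fv_to_bev_pseudolabel`, the grid extents, cell size, ignore label, and the OpenCV-style camera frame (x right, y down, z forward) are all assumptions.

```python
import numpy as np

def fv_to_bev_pseudolabel(semantics, depth, K, num_classes,
                          x_range=(-25.0, 25.0), z_range=(0.0, 50.0),
                          cell_size=0.5, ignore_label=255):
    """Hypothetical helper: lift FV semantic labels into a BEV grid using depth.

    semantics: (H, W) integer class IDs; depth: (H, W) metric depth along z.
    Assumes a pinhole camera with x right, y down, z forward.
    """
    h, w = semantics.shape
    u, _ = np.meshgrid(np.arange(w), np.arange(h))  # pixel column index per pixel
    fx, cx = K[0, 0], K[0, 2]
    z = depth
    x = (u - cx) * z / fx  # lateral position; height (y) is discarded for BEV

    n_cols = int(round((x_range[1] - x_range[0]) / cell_size))
    n_rows = int(round((z_range[1] - z_range[0]) / cell_size))
    cols = np.floor((x - x_range[0]) / cell_size).astype(int)
    rows = np.floor((z - z_range[0]) / cell_size).astype(int)  # row 0 = nearest

    # Keep pixels with positive depth, a known class, and a cell inside the grid
    valid = (z > 0) & (semantics >= 0) & (semantics < num_classes)
    valid &= (cols >= 0) & (cols < n_cols) & (rows >= 0) & (rows < n_rows)

    # Count votes per (class, row, col); np.add.at accumulates repeated indices
    counts = np.zeros((num_classes, n_rows, n_cols), dtype=np.int64)
    np.add.at(counts, (semantics[valid], rows[valid], cols[valid]), 1)

    bev = counts.argmax(axis=0)                  # majority class per cell
    bev[counts.sum(axis=0) == 0] = ignore_label  # cells no pixel landed in
    return bev
```

This sketch does not handle occlusion, height filtering, or aggregation over multiple frames; it only shows how per-pixel labels and depth can be collapsed into a top-down grid.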

Submitted to arXiv on 08 Feb. 2023

Results of the summarizing process for the arXiv paper: 2302.04233v1

The paper titled "SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images" addresses the need for Bird's-Eye-View (BEV) semantic maps in automated driving pipelines. These maps provide a rich representation for decision-making tasks. However, existing approaches for generating BEV maps rely on fully supervised training and require large amounts of annotated BEV data. To overcome this limitation, the authors propose the first self-supervised approach for generating a BEV semantic map from a single monocular frontal view (FV) image. Instead of relying on BEV ground truth annotations, the model is trained with the more easily available FV semantic annotations of video sequences. The proposed SkyEye architecture learns through two modes of self-supervision: implicit supervision and explicit supervision. Implicit supervision enforces spatial consistency of the scene over time based on FV semantic sequences. Explicit supervision uses BEV pseudolabels generated from FV semantic annotations and self-supervised depth estimates. Extensive evaluations on the KITTI-360 dataset show that the self-supervised approach performs on par with state-of-the-art fully supervised methods. It also achieves competitive results using only 1% of the direct BEV supervision used by fully supervised approaches. The authors publicly release both their code and the BEV datasets generated from the KITTI-360 and Waymo datasets to facilitate further research. Overall, the paper introduces a self-supervised approach to BEV semantic mapping that reduces reliance on annotated BEV data while performing comparably to fully supervised methods.
Created on 16 Oct. 2023
Available in other languages: fr
