You Only Segment Once: Towards Real-Time Panoptic Segmentation

AI-generated keywords: Panoptic Segmentation, YOSO, Efficiency, Accuracy, Dynamic Convolution

AI-generated Key Points

  • YOSO is a real-time panoptic segmentation framework
  • It aims to achieve efficiency and accuracy in image segmentation
  • YOSO predicts masks using dynamic convolutions between panoptic kernels and image feature maps
  • It handles both instance and semantic segmentation in a single pass
  • The authors designed a feature pyramid aggregator for efficient feature map extraction
  • They also designed a separable dynamic decoder for panoptic kernel generation
  • YOSO performs multi-head cross-attention through separable dynamic convolution to enhance efficiency and accuracy
  • YOSO achieves competitive performance compared to state-of-the-art models
  • Reported results: 46.4 PQ at 45.6 FPS on COCO, 52.5 PQ at 22.6 FPS on Cityscapes, 38.0 PQ at 35.4 FPS on ADE20K, and 34.1 PQ at 7.1 FPS on Mapillary Vistas
  • Increasing the number of stages improves PQ but lowers FPS; two stages strike the best balance between speed and accuracy
  • Increasing the number of proposal kernels from 50 to 100 improves PQ, and the gain saturates at 150 kernels; larger kernel counts also reduce speed

Authors: Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao

CVPR 2023
License: CC BY 4.0

Abstract: In this paper, we propose YOSO, a real-time panoptic segmentation framework. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, in which you only need to segment once for both instance and semantic segmentation tasks. To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation. The aggregator re-parameterizes interpolation-first modules in a convolution-first way, which significantly speeds up the pipeline without any additional costs. The decoder performs multi-head cross-attention via separable dynamic convolution for better efficiency and accuracy. To the best of our knowledge, YOSO is the first real-time panoptic segmentation framework that delivers competitive performance compared to state-of-the-art models. Specifically, YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K; and 34.1 PQ, 7.1 FPS on Mapillary Vistas. Code is available at https://github.com/hujiecpp/YOSO.
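The mask prediction step described in the abstract, a dynamic convolution between panoptic kernels and an image feature map, can be sketched as a per-pixel dot product. This is a minimal NumPy illustration rather than the authors' implementation: the function names `predict_masks` and `assign_pixels`, the tensor shapes, and the argmax-based pixel assignment are all assumptions made for the example.

```python
import numpy as np

def predict_masks(kernels, features):
    """Apply N kernels of size C as 1x1 dynamic convolutions.

    kernels:  (N, C)    one kernel per predicted segment (hypothetical shape)
    features: (C, H, W) image feature map
    returns:  (N, H, W) mask logits, one map per kernel
    """
    n, c = kernels.shape
    c_feat, h, w = features.shape
    if c != c_feat:
        raise ValueError("kernel and feature channel counts must match")
    # A 1x1 convolution is a matrix product over the channel axis.
    logits = kernels @ features.reshape(c, h * w)
    return logits.reshape(n, h, w)

def assign_pixels(mask_logits):
    """Assumed post-processing: give each pixel to its highest-scoring kernel."""
    return mask_logits.argmax(axis=0)

# Toy input: two kernels, two channels, a 2x2 feature map.
kernels = np.eye(2)
features = np.zeros((2, 2, 2))
features[0, 0, :] = 1.0  # channel 0 is active on the top row
features[1, 1, :] = 1.0  # channel 1 is active on the bottom row

mask_logits = predict_masks(kernels, features)
segment_ids = assign_pixels(mask_logits)
```

Because every kernel produces its own mask from the same feature map, "thing" and "stuff" segments come out of one pass, which is the sense in which the image is segmented only once.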

Submitted to arXiv on 26 Mar. 2023

Results of the summarizing process for the arXiv paper: 2303.14651v1

In this paper, the authors propose YOSO, a real-time panoptic segmentation framework designed to be both efficient and accurate. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, so instance and semantic segmentation are handled in a single pass.

To reduce computational overhead, the authors design a feature pyramid aggregator for feature map extraction and a separable dynamic decoder for panoptic kernel generation. The aggregator re-parameterizes interpolation-first modules in a convolution-first manner, which significantly speeds up the pipeline at no additional cost. The decoder performs multi-head cross-attention through separable dynamic convolution to improve both efficiency and accuracy.

The authors state that, to the best of their knowledge, YOSO is the first real-time panoptic segmentation framework with performance competitive with state-of-the-art models. It reaches 46.4 PQ at 45.6 FPS on COCO, 52.5 PQ at 22.6 FPS on Cityscapes, 38.0 PQ at 35.4 FPS on ADE20K, and 34.1 PQ at 7.1 FPS on Mapillary Vistas.

The authors also study how configuration choices affect performance. Increasing the number of stages improves PQ but lowers FPS, and two stages strike the best balance between speed and accuracy. Increasing the number of proposal kernels from 50 to 100 improves PQ, and the gain saturates at 150 kernels; larger kernel counts also reduce speed.
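The convolution-first idea can be illustrated with a small numerical check. The sketch below assumes a bias-free 1x1 convolution and nearest-neighbor upsampling as a simple stand-in for interpolation; the helper names and shapes are hypothetical, and the paper's actual aggregator modules may differ. Since both operations are linear, one acting on channels and the other on spatial positions, swapping their order leaves the output unchanged while the convolution runs on the smaller map.

```python
import numpy as np

def conv1x1(x, weight):
    """Bias-free 1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def upsample_nearest(x, scale):
    """Nearest-neighbor upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def conv1x1_macs(c_in, c_out, h, w):
    """Multiply-accumulate count of a 1x1 convolution on an H x W map."""
    return c_in * c_out * h * w

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))   # low-resolution feature map
w = rng.standard_normal((16, 8))     # 1x1 convolution weights

# Interpolation-first: upsample, then convolve on the large map.
interp_first = conv1x1(upsample_nearest(x, 2), w)
# Convolution-first: convolve on the small map, then upsample.
conv_first = upsample_nearest(conv1x1(x, w), 2)

outputs_match = np.allclose(interp_first, conv_first)
macs_interp_first = conv1x1_macs(8, 16, 8, 8)  # convolution sees the 8x8 map
macs_conv_first = conv1x1_macs(8, 16, 4, 4)    # convolution sees the 4x4 map
```

In this toy setting the convolution-first order needs a quarter of the multiply-accumulates for a 2x upsampling factor, which is the kind of saving that comes without changing the result.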
Created on 10 Jul. 2023
