You Only Segment Once: Towards Real-Time Panoptic Segmentation
AI-generated Key Points
- YOSO is a real-time panoptic segmentation framework
- It aims to achieve efficiency and accuracy in image segmentation
- YOSO predicts masks using dynamic convolutions between panoptic kernels and image feature maps
- It handles both instance and semantic segmentation in a single pass
- The authors designed a feature pyramid aggregator for efficient feature map extraction
- They also designed a separable dynamic decoder for panoptic kernel generation
- YOSO performs multi-head cross-attention through separable dynamic convolution to enhance efficiency and accuracy
- YOSO achieves competitive performance compared to state-of-the-art models
- Reported results: 46.4 PQ at 45.6 FPS on COCO, 52.5 PQ at 22.6 FPS on Cityscapes, 38.0 PQ at 35.4 FPS on ADE20K, and 34.1 PQ at 7.1 FPS on Mapillary Vistas.
- Increasing the number of decoder stages improves PQ but lowers FPS; two stages strike the best balance between speed and accuracy.
- Increasing the number of proposal kernels from 50 to 100 improves PQ; the gain saturates at 150 kernels, and larger kernel counts also reduce speed.
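The mask prediction step described in the key points above can be illustrated with a minimal NumPy sketch. It assumes the dynamic convolution uses 1x1 kernels, so it reduces to a dot product between each kernel and the feature vector at every pixel. The function name `predict_masks`, the array shapes, and the sigmoid threshold are illustrative assumptions and are not taken from the YOSO code.

```python
import numpy as np


def predict_masks(kernels: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply N dynamic 1x1 kernels to a C x H x W feature map.

    kernels:  (N, C) array, one kernel per predicted segment (assumed shape).
    features: (C, H, W) array of image features.
    Returns:  (N, H, W) array of mask logits.
    """
    # A 1x1 convolution is a per-pixel dot product over the channel axis.
    return np.einsum("nc,chw->nhw", kernels, features)


rng = np.random.default_rng(0)
num_kernels, channels, height, width = 100, 16, 8, 8  # illustrative sizes
kernels = rng.standard_normal((num_kernels, channels))
features = rng.standard_normal((channels, height, width))

mask_logits = predict_masks(kernels, features)
# Sigmoid plus a 0.5 threshold yields binary masks (assumed post-processing).
binary_masks = 1.0 / (1.0 + np.exp(-mask_logits)) > 0.5
print(mask_logits.shape)
```

Because every kernel is applied to the same feature map in one operation, instance ("thing") and semantic ("stuff") masks come out of a single pass, which is the sense in which the image is segmented once.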
Authors: Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao
Abstract: In this paper, we propose YOSO, a real-time panoptic segmentation framework. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, in which you only need to segment once for both instance and semantic segmentation tasks. To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation. The aggregator re-parameterizes interpolation-first modules in a convolution-first way, which significantly speeds up the pipeline without any additional costs. The decoder performs multi-head cross-attention via separable dynamic convolution for better efficiency and accuracy. To the best of our knowledge, YOSO is the first real-time panoptic segmentation framework that delivers competitive performance compared to state-of-the-art models. Specifically, YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K; and 34.1 PQ, 7.1 FPS on Mapillary Vistas. Code is available at https://github.com/hujiecpp/YOSO.
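The idea of reordering interpolation and convolution mentioned in the abstract can be sketched under simplifying assumptions. The example below uses a 1x1 convolution and nearest-neighbor upsampling, both written by hand in NumPy; the aggregator in the paper may use other convolution and interpolation types, so this only illustrates why the two orders can give the same output while the convolution-first order touches fewer pixels. The helper names `upsample_nearest` and `conv1x1` are hypothetical.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbor upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: weight is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 16, 16))   # low-resolution feature map
weight = rng.standard_normal((8, 32))   # 1x1 convolution weights

# Interpolation-first: upsample to 64 x 64, then convolve 4096 pixels.
interp_first = conv1x1(upsample_nearest(x, 4), weight)
# Convolution-first: convolve 256 pixels, then upsample the smaller output.
conv_first = upsample_nearest(conv1x1(x, weight), 4)

same = np.allclose(interp_first, conv_first)
print(same, conv_first.shape)
```

Both operations are linear and act on different axes (channels versus spatial positions), so they commute in this setting; the convolution-first order runs the convolution on 16 times fewer pixels here.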