DETRs with Collaborative Hybrid Assignments Training

AI-generated keywords: Co-DETR, DETR, sparse supervision, feature learning, attention learning

AI-generated Key Points

  • Authors address the issue of sparse supervision in DETR models caused by too few positive samples assigned during training
  • Proposed training scheme called Co-DETR (Collaborative Hybrid Assignments Training) to enhance learning ability of DETR-based detectors
  • Co-DETR improves feature learning in encoder and attention learning in decoder through two main components: collaborative hybrid assignments training and customized positive queries generation
  • Collaborative hybrid assignments training involves training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN
  • Customized positive queries are generated by extracting positive coordinates from these auxiliary heads, improving efficiency of training positive samples in the decoder
  • During inference, auxiliary heads are discarded with no additional parameters or computational cost to original detector
  • Co-DETR requires no hand-crafted non-maximum suppression (NMS) at inference
  • Evaluated on DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR; the state-of-the-art DINO-Deformable-DETR with Swin-L improves from 58.5% to 59.5% AP on COCO val
  • With a ViT-L backbone, achieves 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods with much smaller model sizes
  • Co-DETR presents an effective solution for improving feature learning and attention learning in DETR-based detectors while achieving state-of-the-art performance on various benchmark datasets
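
To make the sparse-supervision point concrete, here is a minimal sketch (not the authors' code) that counts positive samples under the two assignment styles. It rests on several assumptions: IoU is the only matching cost (DETR's actual matching cost also includes classification and box-regression terms), a brute-force search stands in for the Hungarian algorithm, and a 0.5 IoU threshold stands in for a Faster R-CNN-style one-to-many rule. The boxes and the names `one_to_one` and `one_to_many` are made up for illustration.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def one_to_one(candidates, gts):
    """Optimal one-to-one matching by brute force (tiny inputs only).

    Assumption: total IoU is the objective. Returns {gt_index: candidate_index}.
    """
    best, best_score = None, -1.0
    for perm in permutations(range(len(candidates)), len(gts)):
        score = sum(iou(candidates[c], gts[g]) for g, c in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return {g: c for g, c in enumerate(best)}


def one_to_many(candidates, gts, thr=0.5):
    """Threshold rule: a candidate is positive for its best-overlapping
    ground truth if that IoU is at least `thr` (illustrative value)."""
    positives = {}
    for c, box in enumerate(candidates):
        overlaps = [iou(box, gt) for gt in gts]
        g = max(range(len(gts)), key=overlaps.__getitem__)
        if overlaps[g] >= thr:
            positives.setdefault(g, []).append(c)
    return positives


# Two objects and six candidate boxes (made-up coordinates).
gts = [(10, 10, 50, 50), (60, 60, 100, 100)]
candidates = [
    (12, 12, 52, 52), (8, 8, 48, 48), (15, 10, 55, 50),  # near object 0
    (60, 62, 100, 102), (65, 65, 105, 105),              # near object 1
    (0, 60, 30, 90),                                     # background
]

o2o = one_to_one(candidates, gts)
o2m = one_to_many(candidates, gts)
print("one-to-one positives: ", len(o2o))
print("one-to-many positives:", sum(len(v) for v in o2m.values()))
```

On this toy input, one-to-one matching yields one positive per object, while the threshold rule marks several candidates per object as positive. That denser signal is what the auxiliary heads are meant to supply to the encoder.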

Authors: Zhuofan Zong, Guanglu Song, Yu Liu

ICCV 2023. Code is available at https://github.com/Sense-X/Co-DETR
License: CC BY 4.0

Abstract: In this paper, we provide the observation that too few queries assigned as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely Co-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN. In addition, we generate extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. In inference, these auxiliary heads are discarded, and thus our method introduces no additional parameters or computational cost to the original detector while requiring no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, incorporated with a ViT-L backbone, we achieve 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear margins with much smaller model sizes. Code is available at https://github.com/Sense-X/Co-DETR.
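
The customized positive queries step in the abstract can be pictured with the following rough sketch, which is an illustration under stated assumptions rather than the paper's implementation. It assumes each auxiliary head reports its positive boxes together with the ground-truth index it assigned, and it turns every such box into a decoder query that is positive by construction. The `PositiveQuery` container, the function names, the head labels, the coordinates, and the choice to normalize boxes to (cx, cy, w, h) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PositiveQuery:
    head: str          # auxiliary head that produced the coordinate
    box_cxcywh: tuple  # normalized (cx, cy, w, h) used to initialize the query
    target_gt: int     # ground-truth index assigned by that auxiliary head


def to_cxcywh(box, img_w, img_h):
    """Convert (x1, y1, x2, y2) in pixels to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return (
        (x1 + x2) / 2 / img_w,
        (y1 + y2) / 2 / img_h,
        (x2 - x1) / img_w,
        (y2 - y1) / img_h,
    )


def build_positive_queries(aux_positives, img_w, img_h):
    """Flatten the positives of every auxiliary head into extra queries.

    `aux_positives` maps a head name to a list of (box, gt_index) pairs.
    Each resulting query already knows its target, so no one-to-one
    matching is needed for this extra group (an assumption of the sketch).
    """
    queries = []
    for head, positives in aux_positives.items():
        for box, gt_index in positives:
            queries.append(
                PositiveQuery(head, to_cxcywh(box, img_w, img_h), gt_index)
            )
    return queries


# Made-up positives from two auxiliary heads on a 200x200 image.
aux_positives = {
    "atss": [((12, 12, 52, 52), 0), ((60, 62, 100, 102), 1)],
    "faster_rcnn": [
        ((8, 8, 48, 48), 0),
        ((15, 10, 55, 50), 0),
        ((65, 65, 105, 105), 1),
    ],
}

queries = build_positive_queries(aux_positives, img_w=200, img_h=200)
for q in queries:
    print(q.head, q.box_cxcywh, "-> gt", q.target_gt)
```

Because every query in this extra group starts from a coordinate that an auxiliary head already labeled positive, the decoder sees many more positive samples per image than one-to-one matching alone would give it.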

Submitted to arXiv on 22 Nov. 2022


Results of the summarizing process for the arXiv paper: 2211.12860v5

In this paper, the authors address the sparse supervision in DETR (Detection Transformer) models that arises when one-to-one set matching assigns too few queries as positive samples during training. They propose a training scheme called Co-DETR (Collaborative Hybrid Assignments Training) to enhance the learning ability of DETR-based detectors. Co-DETR improves feature learning in the encoder and attention learning in the decoder through two main components: collaborative hybrid assignments training and customized positive queries generation. The first component trains multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN, which enhances the encoder's learning ability in end-to-end detectors. The second generates customized positive queries by extracting positive coordinates from these auxiliary heads, which improves the training efficiency of positive samples in the decoder. During inference, the auxiliary heads are discarded, so the method adds no parameters or computational cost to the original detector and requires no hand-crafted non-maximum suppression (NMS). The approach is evaluated on several DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR; the state-of-the-art DINO-Deformable-DETR with Swin-L improves from 58.5% to 59.5% AP on COCO val. With a ViT-L backbone, Co-DETR achieves 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods with much smaller model sizes. Overall, Co-DETR improves feature learning and attention learning in DETR-based detectors while achieving state-of-the-art performance on these benchmarks.
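
The split between training and inference described above can be sketched as follows. This is a toy illustration, not the authors' code: the loss functions are simple stand-ins, the per-head weights are hypothetical, and the names `hybrid_training_loss` and `predict` are invented here. The only point it makes is structural: auxiliary heads add terms to the training objective, while the inference path never touches them.

```python
from typing import Callable, Dict, Sequence

LossFn = Callable[[Sequence[float], Sequence[float]], float]


def hybrid_training_loss(
    features: Sequence[float],
    targets: Sequence[float],
    decoder_loss: LossFn,
    aux_losses: Dict[str, LossFn],
    aux_weights: Dict[str, float],
) -> float:
    """Decoder (one-to-one) loss plus weighted auxiliary-head losses."""
    total = decoder_loss(features, targets)
    for name, loss_fn in aux_losses.items():
        total += aux_weights[name] * loss_fn(features, targets)
    return total


def predict(features: Sequence[float], decoder_predict: Callable):
    """Inference path: only the original detector is called."""
    return decoder_predict(features)


# Stand-in losses; the real heads use their own detection losses.
def mean_abs_error(features, targets):
    return sum(abs(f - t) for f, t in zip(features, targets)) / len(targets)


def mean_sq_error(features, targets):
    return sum((f - t) ** 2 for f, t in zip(features, targets)) / len(targets)


features = [0.5, 1.0, 0.0]
targets = [1.0, 1.0, 0.0]

total = hybrid_training_loss(
    features,
    targets,
    decoder_loss=mean_abs_error,
    aux_losses={"atss": mean_sq_error, "faster_rcnn": mean_abs_error},
    aux_weights={"atss": 1.0, "faster_rcnn": 0.5},  # hypothetical weights
)
outputs = predict(features, decoder_predict=lambda f: [x >= 0.5 for x in f])
print("training loss:", total)
print("inference outputs:", outputs)
```

Since `predict` takes no auxiliary heads at all, dropping them after training changes neither the parameter count nor the compute of the deployed detector in this sketch.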
Created on 10 Nov. 2023
