Detect Every Thing with Few Examples

AI-generated keywords: DE-ViT, Object Detection, DINOv2 Backbone, Region Propagation, Few-Shot Detection

AI-generated Key Points

  • DE-ViT is an open-set object detector for detecting arbitrary categories beyond those seen during training.
  • It uses vision-only DINOv2 backbones and learns new categories through example images instead of language.
  • To improve general detection ability, DE-ViT transforms multi-class classification into binary classification tasks while bypassing per-class inference.
  • It introduces a novel region propagation technique for localization.
  • DE-ViT is evaluated on open-vocabulary, few-shot, and one-shot object detection benchmarks on the COCO and LVIS datasets.
  • For open-vocabulary detection on COCO, DE-ViT outperforms the state of the art (SoTA) by 6.9 AP50 and reaches 50 AP50 on novel classes.
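
The idea of learning categories from example images and making one binary decision per class can be pictured with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors stand in for backbone features, and the function names, the cosine-similarity score, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Average equal-length feature vectors into a single class prototype."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(examples):
    """examples: dict mapping a class name to a list of example feature vectors."""
    return {name: mean_vector(vectors) for name, vectors in examples.items()}


def binary_decisions(region_feature, prototypes, threshold=0.5):
    """Return one independent yes/no decision per class.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return {
        name: cosine(region_feature, prototype) >= threshold
        for name, prototype in prototypes.items()
    }


# Toy features: a new class is added by supplying a few example vectors.
examples = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(examples)
print(binary_decisions([1.0, 0.1], prototypes))
print(binary_decisions([-1.0, -1.0], prototypes))
```

Because each class is scored independently, a region can be rejected by every class, which is the behavior an open-set detector needs for background or unknown objects. Adding a category only requires adding its examples to the dictionary; no retraining step appears in this sketch.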

Authors: Xinyu Zhang, Yuting Wang, Abdeslam Boularias

License: CC BY 4.0

Abstract: Open-set object detection aims at detecting arbitrary categories beyond those seen during training. Most recent advancements have adopted the open-vocabulary paradigm, utilizing vision-language backbones to represent categories with language. In this paper, we introduce DE-ViT, an open-set object detector that employs vision-only DINOv2 backbones and learns new categories through example images instead of language. To improve general detection ability, we transform multi-classification tasks into binary classification tasks while bypassing per-class inference, and propose a novel region propagation technique for localization. We evaluate DE-ViT on open-vocabulary, few-shot, and one-shot object detection benchmarks with COCO and LVIS. For COCO, DE-ViT outperforms the open-vocabulary SoTA by 6.9 AP50 and achieves 50 AP50 in novel classes. DE-ViT surpasses the few-shot SoTA by 15 mAP on 10-shot and 7.2 mAP on 30-shot, and the one-shot SoTA by 2.8 AP50. For LVIS, DE-ViT outperforms the open-vocabulary SoTA by 2.2 mask AP and reaches 34.3 mask APr. Code is available at https://github.com/mlzxy/devit.

Submitted to arXiv on 22 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.12969v2

DE-ViT is an open-set object detector that addresses the challenge of detecting arbitrary categories beyond those seen during training. Unlike previous approaches that rely on vision-language backbones, DE-ViT employs vision-only DINOv2 backbones and learns new categories through example images instead of language. To improve general detection ability, it transforms multi-class classification into binary classification tasks while bypassing per-class inference. DE-ViT also introduces a novel region propagation technique for localization. Its performance is evaluated on open-vocabulary, few-shot, and one-shot object detection benchmarks on the COCO and LVIS datasets. For open-vocabulary detection on COCO, DE-ViT outperforms the state of the art (SoTA) by 6.9 AP50 and reaches 50 AP50 on novel classes.
Created on 03 Jan. 2024
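
The AP50 figures quoted above refer to average precision computed with an intersection-over-union (IoU) threshold of 0.5: a predicted box counts as a correct detection only if it overlaps a ground-truth box of the same class with IoU of at least 0.5. A minimal sketch of that matching criterion follows; the function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration, and the full AP computation (ranking by score and integrating the precision-recall curve) is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(predicted_box, ground_truth_box):
    """True if the prediction would count as a hit under the AP50 criterion."""
    return iou(predicted_box, ground_truth_box) >= 0.5


ground_truth = (0, 0, 10, 10)
print(is_match_at_50((2, 0, 12, 10), ground_truth))  # shifted by 2 units
print(is_match_at_50((5, 0, 15, 10), ground_truth))  # shifted by 5 units
```

In this example a box shifted by 2 units has IoU 80/120 and is accepted, while a box shifted by 5 units has IoU 50/150 and is rejected.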
