GeneCIS: A Benchmark for General Conditional Image Similarity
AI-generated Key Points
- Models should be able to adapt to different notions of similarity dynamically
- The authors propose the GeneCIS benchmark for General Conditional Image Similarity to measure models' ability to adapt to a range of similarity conditions
- The benchmark is designed for zero-shot evaluation only and considers an open-set of similarity conditions
- Baselines from powerful CLIP models struggle on GeneCIS and performance on the benchmark is weakly correlated with ImageNet accuracy
- The authors propose a simple, scalable solution based on automatically mining information from existing image-caption datasets to address this issue
- Their method offers a substantial boost over the baselines on GeneCIS and further improves zero-shot performance on related image retrieval benchmarks
- Their model surpasses state-of-the-art supervised models on MIT-States despite being evaluated zero-shot
- Table 1 reports evaluation statistics, including the number of retrieval templates and gallery images. The benchmarks are carefully constructed so that each gallery of 10 to 15 images contains only one 'positive' target.
- Figure 3 shows the distribution of objects and attributes specified in the conditions; the space of conditions spans a long tail of over 400 attributes and 100 objects.
- Table 2 reports results for the strongest ViT-B/16 model, and Figure 5 shows that scaling up the number of mined triplets used for training improves performance.
- Figure 6 explores how different CLIP backbones affect the model's performance.
- The paper introduces an important but understudied problem in computer vision: General Conditional Image Similarity.
- The proposed benchmark evaluates an open set of similarity conditions and is designed for zero-shot testing only.
- The authors propose a scalable way to train conditional similarity models by mining information from widely available image-caption datasets.
- Their method not only boosts performance over all baselines on GeneCIS but also provides substantial zero-shot gains on related image retrieval tasks.
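The gallery setup described in the key points (one 'positive' image among 10 to 15 candidates) lends itself to a ranking metric. The sketch below assumes a Recall@K-style score, which this summary does not state explicitly; the function name `recall_at_k` and the toy similarity scores are hypothetical and only illustrate how such an evaluation could be computed once a model has scored each gallery image against a (reference image, condition) query.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    positive_index: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose single positive gallery image ranks in the top k.

    scores[q][i] is the (assumed) similarity between query q and gallery image i.
    positive_index[q] is the position of the one correct image in query q's gallery.
    """
    hits = 0
    for query_scores, pos in zip(scores, positive_index):
        # Rank gallery positions from most to least similar.
        ranked = sorted(
            range(len(query_scores)),
            key=lambda i: query_scores[i],
            reverse=True,
        )
        if pos in ranked[:k]:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Two toy queries with three-image galleries (real galleries would hold 10 to 15).
    toy_scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5]]
    toy_positives = [0, 2]
    print(recall_at_k(toy_scores, toy_positives, k=1))
    print(recall_at_k(toy_scores, toy_positives, k=2))
```

With galleries this small, chance performance at K=1 is already between roughly 7% and 10%, which is worth keeping in mind when reading reported numbers.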
Authors: Sagar Vaze, Nicolas Carion, Ishan Misra
Abstract: We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open-set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly correlated with ImageNet accuracy, suggesting that simply scaling existing methods is not fruitful. We further propose a simple, scalable solution based on automatically mining information from existing image-caption datasets. We find our method offers a substantial boost over the baselines on GeneCIS, and further improves zero-shot performance on related image retrieval benchmarks. In fact, though evaluated zero-shot, our model surpasses state-of-the-art supervised models on MIT-States. Project page at https://sgvaze.github.io/genecis/.
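The abstract mentions mining training information from image-caption datasets but does not spell out the procedure. As a rough illustration only, and not the authors' pipeline, the sketch below pairs images whose captions share a word and treats that word as the condition, yielding (reference image, condition, target image) triplets. The function `mine_triplets`, the whitespace tokenization, and the stopword list are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical minimal stopword list; a real system would need something far richer.
DEFAULT_STOPWORDS: FrozenSet[str] = frozenset({"a", "an", "the", "of", "on", "in"})


def mine_triplets(
    captions: Dict[str, str],
    stopwords: FrozenSet[str] = DEFAULT_STOPWORDS,
) -> List[Tuple[str, str, str]]:
    """Return (reference_id, condition_word, target_id) for images sharing a caption word."""
    # Map each non-stopword to the set of images whose caption contains it.
    word_to_images: Dict[str, Set[str]] = defaultdict(set)
    for image_id, caption in captions.items():
        for token in caption.lower().split():
            word = token.strip(".,!?")
            if word and word not in stopwords:
                word_to_images[word].add(image_id)

    triplets: List[Tuple[str, str, str]] = []
    for word in sorted(word_to_images):
        # Every ordered pair of distinct images sharing the word forms one triplet.
        for ref, tgt in permutations(sorted(word_to_images[word]), 2):
            triplets.append((ref, word, tgt))
    return triplets


if __name__ == "__main__":
    toy_captions = {
        "img_a": "A red car.",
        "img_b": "A red bus.",
        "img_c": "A blue car.",
    }
    for triplet in mine_triplets(toy_captions):
        print(triplet)
```

In this toy data, "red" links `img_a` and `img_b` while "car" links `img_a` and `img_c`, so the same reference image is paired with different targets depending on the condition, which is the behaviour a conditional similarity model is meant to learn.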