DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

AI-generated keywords: perceptual similarity, DreamSim, image-level embeddings, NIGHTS dataset, human visual perception

AI-generated Key Points

  • Introduces DreamSim, a holistic perceptual metric that addresses limitations of current perceptual similarity metrics
  • Motivated by the many ways in which images can be perceived as similar, including higher-level concepts such as object pose and semantic content
  • Collects the NIGHTS dataset of human similarity judgments over image triplets, focusing on mid-level similarities
  • Finds that features from large pre-trained vision models outperform standard perceptual metrics on the NIGHTS dataset
  • Builds DreamSim by tuning these models on the NIGHTS data to align better with human perception
  • Compared with previous metrics and modern image embeddings, DreamSim weights foreground objects heavily while remaining sensitive to color and layout
  • Extends the measurement of perceptual image similarity to factors beyond low-level similarities
  • Offers the NIGHTS dataset and DreamSim as tools for studying human visual perception and for tasks such as image retrieval and synthesis

Authors: Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, Phillip Isola

Website: https://dreamsim-nights.github.io/
Code: https://github.com/ssundaram21/dreamsim
License: CC BY-NC-SA 4.0

Abstract: Current perceptual similarity metrics operate at the level of pixels and patches. These metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level similarities and differences in image layout, object pose, and semantic content. In this paper, we develop a perceptual metric that assesses images holistically. Our first step is to collect a new dataset of human similarity judgments over image pairs that are alike in diverse ways. Critical to this dataset is that judgments are nearly automatic and shared by all observers. To achieve this we use recent text-to-image models to create synthetic pairs that are perturbed along various dimensions. We observe that popular perceptual metrics fall short of explaining our new data, and we introduce a new metric, DreamSim, tuned to better align with human perception. We analyze how our metric is affected by different visual attributes, and find that it focuses heavily on foreground objects and semantic content while also being sensitive to color and layout. Notably, despite being trained on synthetic data, our metric generalizes to real images, giving strong results on retrieval and reconstruction tasks. Furthermore, our metric outperforms both prior learned metrics and recent large vision models on these tasks.

Submitted to arXiv on 15 Jun. 2023

Results of the summarizing process for the arXiv paper: 2306.09344v1

In this paper, the authors introduce DreamSim, a holistic perceptual metric designed to address the limitations of current perceptual similarity metrics. Their motivation is that images can be perceived as similar in many ways, including higher-level concepts such as object pose and semantic content. Existing metrics like PSNR and SSIM capture these higher-level similarities poorly, which has led researchers to explore image-level embeddings from large vision models like DINO and CLIP.

To bridge the gap between low-level metrics and such embeddings, the authors collect a new dataset named NIGHTS (Novel Image Generations with Human-Tested Similarity) containing human similarity judgments over image triplets. This dataset focuses on mid-level similarities rather than only low-level perturbations or high-level category differences. The authors find that features from recent large pre-trained vision models outperform standard perceptual metrics on their dataset, and they develop DreamSim by tuning these models on their data.

DreamSim aligns better with human perception, agreeing closely with human judgments in quantitative evaluation and in qualitative comparisons on real images. Compared with previous metrics and modern image embeddings, it weights foreground objects heavily while also taking color and layout into account.

In summary, this work expands the task of measuring perceptual image similarity to encompass factors beyond low-level similarities and provides a new metric that captures mid-level similarities. The NIGHTS dataset and DreamSim can be integrated into existing pipelines for tasks such as image retrieval and synthesis.
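As an illustration of how a metric can be scored against triplet judgments like those described above, the following is a minimal sketch, not the authors' evaluation code. It assumes each triplet has already been embedded as a reference vector and two candidate vectors, that distance is measured as cosine distance between embeddings, and that each human judgment is encoded as 0 (candidate A is closer to the reference) or 1 (candidate B is closer). The function names `cosine_distance` and `triplet_agreement` and the toy vectors are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """Row-wise cosine distance (1 - cosine similarity) between two arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - dot / norms


def triplet_agreement(ref, cand_a, cand_b, human_choice):
    """Fraction of triplets where the metric picks the same candidate as humans.

    human_choice uses an assumed encoding: 0 means A is closer, 1 means B is closer.
    """
    d_a = cosine_distance(ref, cand_a)
    d_b = cosine_distance(ref, cand_b)
    # The metric "chooses" whichever candidate is nearer to the reference.
    metric_choice = (d_b < d_a).astype(int)
    return float(np.mean(metric_choice == np.asarray(human_choice)))


# Toy 2-D "embeddings" for three triplets (made-up values, for illustration only).
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cand_a = [[1.0, 0.1], [1.0, 0.0], [1.0, 0.0]]
cand_b = [[0.0, 1.0], [0.1, 1.0], [1.0, 0.9]]
human = [0, 1, 0]

score = triplet_agreement(ref, cand_a, cand_b, human)
print(f"agreement with human judgments: {score:.3f}")
```

In practice the embeddings would come from whichever vision model is being evaluated, and a higher agreement score indicates a metric that tracks human judgments more closely.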
Created on 30 Sep. 2024
