Synthetic Data from Diffusion Models Improves ImageNet Classification

AI-generated keywords: Generative Models

AI-generated Key Points

  • Deep generative models have advanced in generating high-quality, photo-realistic images based on text prompts.
  • These models can be used for generative data augmentation to enhance challenging discriminative tasks.
  • Large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models with state-of-the-art Fréchet Inception Distance (FID, 1.76) and Inception Score (239) at a resolution of 256x256.
  • Generated samples achieve a new state-of-the-art in Classification Accuracy Scores, with 64.96 for 256x256 generative samples and improving to 69.24 for 1024x1024 samples.
  • Augmenting the ImageNet training set with these generated samples leads to significant improvements in ImageNet classification accuracy compared to strong ResNet and Vision Transformer baselines.
  • Previous studies have shown that synthetic data generated with GLIDE improves zero-shot and few-shot image classification performance.
  • Fine-tuning the Imagen text-to-image model for class-conditional ImageNet generation yields state-of-the-art generative models.
  • This research highlights the potential of using large-scale text-to-image diffusion models for generative data augmentation, leading to improved performance in challenging discriminative tasks such as ImageNet classification.

Authors: Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet

License: CC BY 4.0

Abstract: Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data augmentation, helping to improve challenging discriminative tasks? We show that large-scale text-to-image diffusion models can be fine-tuned to produce class conditional models with SOTA FID (1.76 at 256x256 resolution) and Inception Score (239 at 256x256). The model also yields a new SOTA in Classification Accuracy Scores (64.96 for 256x256 generative samples, improving to 69.24 for 1024x1024 samples). Augmenting the ImageNet training set with samples from the resulting models yields significant improvements in ImageNet classification accuracy over strong ResNet and Vision Transformer baselines.
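
The Classification Accuracy Score mentioned in the abstract is commonly understood as the accuracy, on real held-out data, of a classifier trained only on generated samples. The following is a minimal sketch of that idea under stated assumptions: the nearest-centroid classifier, the 2-D feature tuples, the labels, and all function names are hypothetical stand-ins for illustration, not the paper's ResNet setup or data.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Train a nearest-centroid classifier: one mean vector per class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in grouped.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def classification_accuracy_score(synth_x, synth_y, real_x, real_y):
    """Train on synthetic samples only, then measure accuracy on real samples."""
    centroids = fit_centroids(synth_x, synth_y)
    correct = sum(predict(centroids, x) == y for x, y in zip(real_x, real_y))
    return correct / len(real_y)


# Toy 2-D "features" standing in for images (hypothetical data).
synthetic_x = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.8, 1.0)]
synthetic_y = ["cat", "cat", "dog", "dog"]
real_x = [(0.1, 0.2), (0.9, 0.8), (0.3, 0.1), (0.45, 0.4)]
real_y = ["cat", "dog", "cat", "dog"]

cas = classification_accuracy_score(synthetic_x, synthetic_y, real_x, real_y)
print(f"CAS on toy data: {cas:.2f}")
```

The key design point is that the real data never touches training: any gap between this score and the accuracy of a classifier trained on real data reflects how well the generated samples cover the real distribution.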

Submitted to arXiv on 17 Apr. 2023


Results of the summarizing process for the arXiv paper: 2304.08466v1

Deep generative models have made significant advances in generating high-quality, photo-realistic images from text prompts. Such models could be used for generative data augmentation to improve challenging discriminative tasks. In this study, the researchers demonstrate that large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models with state-of-the-art Fréchet Inception Distance (FID, 1.76) and Inception Score (239) at a resolution of 256x256. The generated samples also achieve a new state of the art in Classification Accuracy Scores: 64.96 for 256x256 generative samples, improving to 69.24 for 1024x1024 samples. Augmenting the ImageNet training set with these generated samples yields significant improvements in ImageNet classification accuracy over strong ResNet and Vision Transformer baselines.

The authors provide additional context by discussing related work on synthetic data generation with diffusion models. Previous studies have shown that synthetic data generated with GLIDE improves zero-shot and few-shot image classification performance. Augmenting individual images using a pretrained diffusion model has also demonstrated improvements in few-shot settings. Two recent papers trained ImageNet classifiers on images generated by diffusion models without fine-tuning those models, and found that the generated images did not improve accuracy on the clean ImageNet validation set. In contrast, this study shows that fine-tuning the Imagen text-to-image model for class-conditional ImageNet generation leads to state-of-the-art models.

Overall, this research highlights the potential of large-scale text-to-image diffusion models for generative data augmentation, leading to improved performance in challenging discriminative tasks such as ImageNet classification.
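
Generative data augmentation, as described in the summary, amounts to training on real examples together with model-generated ones. A minimal sketch of that mixing step might look as follows. The function name, the `per_class` knob, and the file names are hypothetical and chosen for illustration; the summary does not state how many synthetic samples the authors add or how they are mixed.

```python
import random
from collections import defaultdict


def augment_with_synthetic(real, synthetic, per_class, seed=0):
    """Return the real examples plus up to `per_class` synthetic examples per label.

    Each example is an (item, label) pair. `per_class` is an illustrative
    knob, not a value taken from the paper.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in synthetic:
        by_label[label].append((item, label))

    combined = list(real)
    for label in sorted(by_label):
        pool = by_label[label]
        combined.extend(rng.sample(pool, min(per_class, len(pool))))
    rng.shuffle(combined)  # interleave real and synthetic examples
    return combined


# Hypothetical file names standing in for real and generated images.
real = [("real_cat_0.png", "cat"), ("real_dog_0.png", "dog")]
synthetic = [
    (f"gen_{label}_{i}.png", label) for label in ("cat", "dog") for i in range(5)
]
train_set = augment_with_synthetic(real, synthetic, per_class=3)
```

Keeping the amount of synthetic data as an explicit parameter makes it easy to compare classifiers trained with different real-to-synthetic ratios.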
Created on 24 Dec. 2023

