Self-Improving Diffusion Models with Synthetic Data
AI-generated Key Points
- The demand for real data to train large generative models is outpacing its availability, leading to a shift towards utilizing synthetic data.
- Training new generative models on synthetic data from current or past generation models creates an autophagous (self-consuming) loop that degrades the quality and/or diversity of the synthetic data, a failure mode termed model autophagy disorder (MAD) and model collapse.
- Current thinking around model autophagy recommends avoiding synthetic data in training so that the system does not deteriorate into MADness.
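- Self-Improving Diffusion Models with Synthetic Data (SIMS) is a new training concept for diffusion models that treats synthetic data differently from real data: self-synthesized data provides negative guidance during generation, steering the model away from the non-ideal synthetic data manifold and towards the real data distribution.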
- SIMS sets new records in Fréchet inception distance (FID) for CIFAR-10 and ImageNet-64 generation and achieves competitive results on FFHQ-64 and ImageNet-512.
- To the best of the authors' knowledge, SIMS is the first prophylactic generative AI algorithm that can be iteratively trained on self-generated synthetic data without going MAD. It can also adjust a diffusion model's synthetic data distribution to match a desired in-domain target distribution, which helps mitigate biases and ensure fairness.
- The research was supported by NSF grants, ONR grants, an AFOSR grant, DOE grants, a Vannevar Bush Faculty Fellowship, and a Ken Kennedy Institute Fellowship.
Authors: Sina Alemohammad, Ahmed Imtiaz Humayun, Shruti Agarwal, John Collomosse, Richard Baraniuk
Abstract: The artificial intelligence (AI) world is running out of real data for training increasingly large generative models, resulting in accelerating pressure to train on synthetic data. Unfortunately, training new generative models with synthetic data from current or past generation models creates an autophagous (self-consuming) loop that degrades the quality and/or diversity of the synthetic data in what has been termed model autophagy disorder (MAD) and model collapse. Current thinking around model autophagy recommends that synthetic data is to be avoided for model training lest the system deteriorate into MADness. In this paper, we take a different tack that treats synthetic data differently from real data. Self-IMproving diffusion models with Synthetic data (SIMS) is a new training concept for diffusion models that uses self-synthesized data to provide negative guidance during the generation process to steer a model's generative process away from the non-ideal synthetic data manifold and towards the real data distribution. We demonstrate that SIMS is capable of self-improvement; it establishes new records based on the Fréchet inception distance (FID) metric for CIFAR-10 and ImageNet-64 generation and achieves competitive results on FFHQ-64 and ImageNet-512. Moreover, SIMS is, to the best of our knowledge, the first prophylactic generative AI algorithm that can be iteratively trained on self-generated synthetic data without going MAD. As a bonus, SIMS can adjust a diffusion model's synthetic data distribution to match any desired in-domain target distribution to help mitigate biases and ensure fairness.
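The negative-guidance idea in the abstract can be illustrated with a toy sketch. The abstract does not state the exact guidance rule, so the linear extrapolation below is an assumption modeled on common guidance schemes for diffusion models. The function names, the 1-D Gaussian scores, the chosen means, and the weight `w` are all hypothetical and only serve the illustration. It compares the score of a base model with the score of an auxiliary model assumed to be fine-tuned on the base model's own samples, then pushes in the opposite direction of the drift.

```python
import numpy as np


def gaussian_score(x, mu, sigma=1.0):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mu, sigma^2)."""
    return -(x - mu) / sigma**2


def negative_guidance(score_base, score_aux, w):
    """Extrapolate away from the auxiliary (synthetic-data) score.

    Assumed form: s_guided = s_base - w * (s_aux - s_base).
    With w = 0 this returns the base score unchanged.
    """
    return score_base - w * (score_aux - score_base)


# Toy setup (all values are illustrative assumptions):
#   real data        ~ N(0.0, 1)
#   base model       ~ N(0.5, 1)  (imperfect fit to the real data)
#   auxiliary model  ~ N(1.0, 1)  (drifts further after training on base samples)
x = np.linspace(-3.0, 3.0, 7)
s_base = gaussian_score(x, mu=0.5)
s_aux = gaussian_score(x, mu=1.0)

# Pushing away from the auxiliary score moves the base score back toward
# the real-data score in this toy case.
s_guided = negative_guidance(s_base, s_aux, w=1.0)
s_real = gaussian_score(x, mu=0.0)
print(np.max(np.abs(s_guided - s_real)))
```

In this toy the drift from base to auxiliary model is assumed to mirror the base model's own error, which is why `w = 1.0` recovers the real-data score exactly. In practice the right weight would have to be tuned, and nothing here reflects the paper's actual networks or training details.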