Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

AI-generated keywords: Diffusion Forcing

AI-generated Key Points

  • Novel training paradigm combining next-token prediction models and full-sequence diffusion models
  • Allows for variable-length generation and guides sampling towards desirable trajectories
  • Offers capabilities such as rolling out sequences beyond training horizon, new sampling schemes, and improved decision-making performance
  • Demonstrates flexibility in controllable sequential compositional generation
  • Enables long-horizon imitation learning and robust visuomotor control in robotics applications
  • Shows robustness to noisy or missing observations through Bayes filtering principles
  • Competes favorably with prior approaches in time series forecasting
  • Applicable in language modeling, planning, video generation, robotic manipulation, and time series forecasting

Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

License: CC BY 4.0

Abstract: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of next-token prediction models, such as variable-length generation, with the strengths of full-sequence diffusion models, such as the ability to guide sampling to desirable trajectories. Our method offers a range of additional capabilities, such as (1) rolling-out sequences of continuous tokens, such as video, with lengths past the training horizon, where baselines diverge and (2) new sampling and guiding schemes that uniquely profit from Diffusion Forcing's variable-horizon and causal architecture, and which lead to marked performance gains in decision-making and planning tasks. In addition to its empirical success, our method is proven to optimize a variational lower bound on the likelihoods of all subsequences of tokens drawn from the true joint distribution. Project website: https://boyuan.space/diffusion-forcing
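
The core training idea in the abstract, independent per-token noise levels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The cosine schedule, the uniform sampling of noise levels, and the names `cosine_alpha_bar` and `noise_tokens_independently` are assumptions chosen for illustration. The causal denoising model that would be trained on the noisy tokens is not shown.

```python
import numpy as np


def cosine_alpha_bar(num_steps):
    """Signal fraction alpha_bar[k] for k = 0..num_steps.

    Assumed cosine schedule: alpha_bar[0] = 1 (clean token),
    alpha_bar[num_steps] ~ 0 (almost pure noise).
    """
    k = np.arange(num_steps + 1) / num_steps
    return np.cos(0.5 * np.pi * k) ** 2


def noise_tokens_independently(tokens, num_steps, rng):
    """Forward-diffuse each token with its own, independently drawn noise level.

    tokens: array of shape (T, D), one row per sequence token.
    Returns (noisy_tokens, levels, noise).
    """
    num_tokens = tokens.shape[0]
    alpha_bar = cosine_alpha_bar(num_steps)
    # One noise level per token, drawn independently (uniform sampling is an assumption).
    levels = rng.integers(0, num_steps + 1, size=num_tokens)
    a = alpha_bar[levels][:, None]  # shape (T, 1), broadcasts over the feature axis
    noise = rng.standard_normal(tokens.shape)
    noisy = np.sqrt(a) * tokens + np.sqrt(1.0 - a) * noise
    return noisy, levels, noise


rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))  # toy sequence: 8 tokens, 4 features each
noisy, levels, noise = noise_tokens_independently(tokens, num_steps=100, rng=rng)
```

In this sketch a denoiser would receive `noisy` together with `levels` and be trained to recover `noise` (or the clean `tokens`) for every position. A full-sequence diffusion model would instead share a single noise level across all tokens, and standard next-token training corresponds to clean past tokens with a fully noised next token.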

Submitted to arXiv on 01 Jul. 2024

Results of the summarizing process for the arXiv paper: 2407.01392v1

Diffusion Forcing is a novel training paradigm that combines the strengths of next-token prediction models and full-sequence diffusion models in sequence generative modeling. This approach allows for variable-length generation while guiding sampling towards desirable trajectories. It also offers additional capabilities such as rolling out sequences of continuous tokens beyond the training horizon, introducing new sampling and guiding schemes, and leveraging its variable-horizon and causal architecture for improved performance in decision-making and planning tasks. The study demonstrates the flexibility of Diffusion Forcing in controllable sequential compositional generation by modifying the sampling scheme to compose sub-sequences of observed sequences. In robotics applications, Diffusion Forcing enables long-horizon imitation learning and robust visuomotor control by incorporating memory into the latent state. This results in higher success rates compared to diffusion policy algorithms without memory. Additionally, the method shows robustness to noisy or missing observations through principles from Bayes filtering. Furthermore, Diffusion Forcing proves to be a good general-purpose sequence model in time series forecasting, competing favorably with prior diffusion and transformer-based approaches. The paper showcases the potential of Diffusion Forcing in various applications such as language modeling, planning, video generation, robotic manipulation, and time series forecasting. Overall, Diffusion Forcing presents a promising approach that bridges the gap between next-token prediction models and full-sequence diffusion models for enhanced sequence generative modeling across diverse domains.
Created on 04 Apr. 2025
