[MASK] is All You Need

AI-generated keywords: Generative models, Masked Generative Models, Non-Autoregressive Models, Discrete Interpolants, Discrete-state models

AI-generated Key Points

  • Two prominent paradigms in generative models: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models (e.g., Diffusion Models)
  • Proposal of a novel approach bridging these paradigms using discrete-state models in vision domain
  • Methodology involves comprehensive analysis across both types of models and redefines traditional discriminative tasks as an unmasking process within a discrete-state model
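  • Introduction of a framework called Discrete Interpolants, which achieves state-of-the-art or competitive performance compared to previous discrete-state methods on benchmarks such as ImageNet256, MS COCO, and the video dataset FaceForensics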
  • Leveraging [MASK] in discrete-state models to bridge gap between different generative models and integrate generative and discriminative tasks seamlessly
  • Potential extension to other approaches by utilizing discrete stochastic interpolants mentioned in related works
  • Acknowledgment of Timy Phan and Moyang Li for proofreading assistance and Owen Vincent for technical support
  • Support received from various entities including the German Federal Ministry for Economic Affairs and Climate Action, Bayer AG, and the German Research Foundation (DFG)
  • Gratitude expressed to the Gauss Center for Supercomputing for providing computational resources through NIC on JUWELS at JSC and HPC resources from the Erlangen National High Performance Computing Center (NHR@FAU funded by DFG)

Authors: Vincent Tao Hu, Björn Ommer

Technical Report (WIP), Project Page (code, model, dataset): https://compvis.github.io/mask/
License: CC BY 4.0

Abstract: In generative models, two paradigms have gained traction in various applications: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models, e.g., Diffusion Models. In this work, we propose using discrete-state models to connect them and explore their scalability in the vision domain. First, we conduct a step-by-step analysis, in a scalable manner, of a unified design space across the two types of models, covering timestep-independence, noise schedule, temperature, guidance strength, etc. Second, we re-cast typical discriminative tasks, e.g., image segmentation, as an unmasking process from [MASK] tokens on a discrete-state model. This enables us to perform various sampling processes, including flexible conditional sampling, by training only once to model the joint distribution. All aforementioned explorations lead to our framework named Discrete Interpolants, which enables us to achieve state-of-the-art or competitive performance compared to previous discrete-state based methods on various benchmarks, such as ImageNet256, MS COCO, and the video dataset FaceForensics. In summary, by leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models, as well as generative and discriminative tasks.
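
The unmasking process described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the linear schedule `kappa`, the `MASK` id, the random choice of which positions to reveal, and the stand-in predictor `toy_predict` are all assumptions made for illustration. Real samplers typically also use temperature and guidance, which are omitted here.

```python
import random

MASK = -1  # hypothetical id for the [MASK] token


def kappa(t):
    """Assumed linear schedule: probability that a token is kept at time t in [0, 1]."""
    return t


def corrupt(tokens, t, rng):
    """Training-time corruption: keep each token with probability kappa(t), else mask it."""
    return [tok if rng.random() < kappa(t) else MASK for tok in tokens]


def unmask(x, predict, steps, rng):
    """Iteratively fill in [MASK] positions; known tokens are never changed."""
    x = list(x)
    for i in range(steps):
        masked = [j for j, tok in enumerate(x) if tok == MASK]
        if not masked:
            break
        # Reveal an equal share of the remaining masked positions per step,
        # so the final step reveals everything that is left.
        n_reveal = -(-len(masked) // (steps - i))  # ceiling division
        guesses = predict(x)
        for j in rng.sample(masked, n_reveal):
            x[j] = guesses[j]
    return x


def toy_predict(x):
    """Stand-in for a trained network: guesses token 7 at every position."""
    return [7] * len(x)


rng = random.Random(0)

# Unconditional generation: start from an all-[MASK] sequence.
generated = unmask([MASK] * 8, toy_predict, steps=4, rng=rng)

# Conditional sampling on a joint sequence: the image tokens are given,
# the (hypothetical) segmentation tokens start as [MASK] and are unmasked.
image_tokens = [3, 1, 4, 1]
joint = unmask(image_tokens + [MASK] * 4, toy_predict, steps=2, rng=rng)
print(generated, joint)
```

Because known tokens are left untouched, the same routine covers both directions of conditioning: masking the segmentation half gives a discriminative-style prediction, while masking the image half instead would give generation conditioned on a segmentation. This is one way to read the abstract's claim that a single model of the joint distribution supports flexible conditional sampling.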

Submitted to arXiv on 09 Dec. 2024

Results of the summarizing process for the arXiv paper: 2412.06787v2

Generative modeling has two prominent paradigms: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models such as Diffusion Models. In this study, we propose bridging the two with discrete-state models in the vision domain. Our methodology analyzes both model types in a unified design space (timestep-independence, noise schedule, temperature, guidance strength) and recasts traditional discriminative tasks, such as image segmentation, as an unmasking process within a discrete-state model. The resulting framework, Discrete Interpolants, achieves state-of-the-art or competitive performance compared to previous discrete-state methods on benchmarks including ImageNet256, MS COCO, and the video dataset FaceForensics. By leveraging [MASK] in discrete-state models, we bridge Masked Generative and Non-Autoregressive Diffusion models, as well as generative and discriminative tasks. Our method has the potential to extend to other approaches mentioned in related works by utilizing discrete stochastic interpolants.

We would like to acknowledge Timy Phan and Moyang Li for proofreading assistance and Owen Vincent for technical support. This project has received support from the German Federal Ministry for Economic Affairs and Climate Action, Bayer AG, and the German Research Foundation (DFG), among others. We also thank the Gauss Center for Supercomputing for providing computational resources through NIC on JUWELS at JSC, and acknowledge HPC resources from the Erlangen National High Performance Computing Center (NHR@FAU, funded by DFG).
Created on 23 Dec. 2024
