MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

AI-generated keywords: Text-to-image generation, MobileDiffusion, Architecture optimization, Sampling techniques, Mobile-based image synthesis

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • MobileDiffusion is introduced by Zhao, Xu, Xiao, and Hou
  • Extensive optimizations in both architecture and sampling techniques
  • Reduces redundancy, improves computational efficiency, and minimizes parameter count while preserving image quality
  • Uses distillation and diffusion-GAN finetuning to achieve 8-step and 1-step inference, respectively
  • Achieves sub-second inference speed for generating a $512\times512$ image on mobile devices
  • Addresses the model size and inference speed limits that impede deploying text-to-image models on mobile platforms
  • Reported as a new state of the art for efficient text-to-image generation on mobile devices

Authors: Yang Zhao, Yanwu Xu, Zhisheng Xiao, Tingbo Hou

Abstract: The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and slow inference speed. In this paper, we propose MobileDiffusion, a highly efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to reduce redundancy, enhance computational efficiency, and minimize the model's parameter count, while preserving image generation quality. Additionally, we employ distillation and diffusion-GAN finetuning techniques on MobileDiffusion to achieve 8-step and 1-step inference, respectively. Empirical studies, conducted both quantitatively and qualitatively, demonstrate the effectiveness of our proposed techniques. MobileDiffusion achieves a remarkable sub-second inference speed for generating a $512\times512$ image on mobile devices, establishing a new state of the art.

Submitted to arXiv on 28 Nov. 2023

Results of the summarizing process for the arXiv paper: 2311.16567v1

Because this paper's license does not allow us to build upon its content, the summary below was generated from the paper's metadata rather than from the full article.

Deploying large-scale text-to-image models on mobile devices has been hindered by their substantial model size and slow inference speed. To address this, Yang Zhao, Yanwu Xu, Zhisheng Xiao, and Tingbo Hou introduce MobileDiffusion, a highly efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. By comprehensively examining model architecture design, the authors reduce redundancy, improve computational efficiency, and minimize the model's parameter count while preserving image generation quality. On the sampling side, they apply distillation to achieve 8-step inference and diffusion-GAN finetuning to achieve 1-step inference. Quantitative and qualitative empirical studies demonstrate the effectiveness of these techniques. MobileDiffusion achieves sub-second inference speed for generating a $512\times512$ image on mobile devices, which the authors report as a new state of the art for efficient text-to-image generation on mobile platforms.
Created on 24 Feb. 2024
