Human Motion Diffusion Model

AI-generated keywords: Motion Diffusion Model, Generative Model, Human Motion, Text-to-Motion, Action-to-Motion

AI-generated Key Points

Natural and expressive human motion generation is challenging in computer animation due to the diversity of possible motions, human perceptual sensitivity to them, and the difficulty of describing them accurately.
Current generative solutions are either low-quality or limited in expressiveness.
Diffusion models have shown remarkable generative capabilities in other domains and are promising candidates for human motion due to their many-to-many nature.
Motion Diffusion Model (MDM) is introduced as a carefully adapted classifier-free diffusion-based generative model for the human motion domain.
MDM is transformer-based, combining insights from motion generation literature.
MDM predicts the sample rather than the noise in each diffusion step, facilitating the use of established geometric losses on the locations and velocities of the motion such as foot contact loss.
MDM achieves state-of-the-art quality in several motion generation tasks while requiring only about three days of training with lightweight resources.
In text-to-motion, MDM generates coherent motions that achieve state-of-the-art results on leading benchmarks.
Human evaluators prefer MDM-generated motions over real motions 42% of the time.
In action-to-motion, MDM outperforms state-of-the-art models designed specifically for this task on common benchmarks.
MDM also demonstrates completion and editing by adapting diffusion image inpainting: a motion prefix and suffix are fixed and the model fills in the gap, optionally under a textual condition that guides the fill while preserving the semantics of the original input.
By performing inpainting in joint space rather than temporally, MDM also demonstrates semantic editing of specific body parts without changing the others.
Overall, MDM is a promising solution for generating high-quality, diverse human motions with controllability, while remaining lightweight and efficient compared to existing methods.
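
The key points describe MDM as classifier-free. In classifier-free guidance generally, one network is queried with and without the condition, and the two outputs are blended with a guidance scale. The sketch below shows only that blending step on toy numbers. The function name, the flat list representation, and the scale of 2.5 are illustrative assumptions, not values taken from the paper.

```python
from typing import List

Vector = List[float]


def guided_prediction(uncond: Vector, cond: Vector, scale: float) -> Vector:
    """Blend unconditional and conditional predictions.

    scale = 0 returns the unconditional output, scale = 1 the conditional
    one, and scale > 1 extrapolates further toward the condition.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy numbers standing in for flattened pose features (hypothetical data).
uncond = [0.0, 1.0, 2.0]   # model output with the condition dropped
cond = [1.0, 1.0, 4.0]     # model output with the text or action condition
guided = guided_prediction(uncond, cond, scale=2.5)
```

A larger scale trades diversity for closer adherence to the condition, which is one way a diffusion model can be made more controllable.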

Authors: Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, Amit H. Bermano

arXiv: 2209.14916v2 (cs.CV)

License: CC BY-SA 4.0

Abstract: Natural and expressive human motion generation is the holy grail of computer animation. It is a challenging task, due to the diversity of possible motion, human perceptual sensitivity to it, and the difficulty of accurately describing it. Therefore, current generative solutions are either low-quality or limited in expressiveness. Diffusion models, which have already shown remarkable generative capabilities in other domains, are promising candidates for human motion due to their many-to-many nature, but they tend to be resource hungry and hard to control. In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for the human motion domain. MDM is transformer-based, combining insights from motion generation literature. A notable design-choice is the prediction of the sample, rather than the noise, in each diffusion step. This facilitates the use of established geometric losses on the locations and velocities of the motion, such as the foot contact loss. As we demonstrate, MDM is a generic approach, enabling different modes of conditioning, and different generation tasks. We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion. Project page: https://guytevet.github.io/mdm-page/

Submitted to arXiv on 29 Sep. 2022

Results of the summarizing process for the arXiv paper: 2209.14916v2

Comprehensive Summary

The generation of natural and expressive human motion is a challenging task in computer animation due to the diversity of possible motions, human perceptual sensitivity to them, and the difficulty of describing them accurately. Current generative solutions are either low-quality or limited in expressiveness. Diffusion models have shown remarkable generative capabilities in other domains and are promising candidates for human motion due to their many-to-many nature.

The Motion Diffusion Model (MDM) is introduced as a carefully adapted classifier-free diffusion-based generative model for the human motion domain. MDM is transformer-based, combining insights from the motion generation literature. A notable design choice is the prediction of the sample, rather than the noise, in each diffusion step, which facilitates the use of established geometric losses on the locations and velocities of the motion, such as the foot contact loss. MDM's generic approach enables different modes of conditioning and different generation tasks.

MDM achieves state-of-the-art quality in several motion generation tasks while requiring only about three days of training with lightweight resources. In text-to-motion, MDM generates coherent motions that achieve state-of-the-art results on leading benchmarks, and human evaluators prefer MDM-generated motions over real motions 42% of the time. In action-to-motion, MDM outperforms state-of-the-art models designed specifically for this task on common benchmarks. MDM also demonstrates completion and editing by adapting diffusion image inpainting: a motion prefix and suffix are fixed and the model fills in the gap, optionally under a textual condition that guides the fill while preserving the semantics of the original input. By performing inpainting in joint space rather than temporally, MDM also demonstrates semantic editing of specific body parts without changing the others.

Overall, MDM is a promising solution for generating high-quality, diverse human motions with controllability, while remaining lightweight and efficient compared to existing methods. The code can be found at https://github.com/GuyTevet/motion-diffusion-model
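
Because the network outputs a motion rather than a noise tensor, losses on joint locations and velocities can be computed directly on that output. The following is a minimal sketch of two such losses, assuming the motion is already stored as 3D joint positions and that per-frame foot contact flags are given. A real pipeline may first need to convert joint rotations to positions. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]     # one 3D position per joint
Motion = List[Frame]    # one frame per time step


def _sq_dist(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _velocity(motion: Motion, i: int, j: int) -> Point:
    """Displacement of joint j between frame i and frame i + 1."""
    p, q = motion[i][j], motion[i + 1][j]
    return (q[0] - p[0], q[1] - p[1], q[2] - p[2])


def velocity_loss(target: Motion, predicted: Motion) -> float:
    """Squared velocity error, summed over joints, averaged over transitions."""
    n_frames, n_joints = len(target), len(target[0])
    total = 0.0
    for i in range(n_frames - 1):
        for j in range(n_joints):
            total += _sq_dist(_velocity(target, i, j), _velocity(predicted, i, j))
    return total / (n_frames - 1)


def foot_contact_loss(predicted: Motion, foot_joints: Sequence[int],
                      contact: Sequence[Sequence[bool]]) -> float:
    """Penalize foot movement on frames where the foot is flagged as grounded.

    contact[i][k] is True when foot_joints[k] touches the ground at frame i.
    """
    n_frames = len(predicted)
    total = 0.0
    for i in range(n_frames - 1):
        for k, j in enumerate(foot_joints):
            if contact[i][k]:
                total += _sq_dist(_velocity(predicted, i, j), (0.0, 0.0, 0.0))
    return total / (n_frames - 1)


# Toy clip: joint 0 is a hip moving forward, joint 1 is a planted foot.
target = [[(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],
          [(0.5, 1.0, 0.0), (0.0, 0.0, 0.0)],
          [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]]
# Prediction where the foot slides along with the hip.
predicted = [[(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],
             [(0.5, 1.0, 0.0), (0.5, 0.0, 0.0)],
             [(1.0, 1.0, 0.0), (1.0, 0.0, 0.0)]]
contact = [[True], [True], [True]]

sliding_penalty = foot_contact_loss(predicted, [1], contact)
```

In this toy clip the sliding foot is penalized by both losses, while the ground-truth motion incurs no foot contact penalty.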

Layman's Summary: Scientists have created a computer program that makes animated characters move the way real people do. This is hard because people move in many different ways, viewers quickly notice unnatural movement, and movement is tricky to describe in words. Earlier programs either produce poor-quality movement or only a narrow range of it. The new program, called Motion Diffusion Model (MDM), can produce a wide variety of movements, and training it takes only about three days.

Definitions
- Natural and expressive human motion generation: creating movements that look like those made by real people
- Computer animation: using computers to create moving pictures
- Diversity: being different from each other
- Perceptual sensitivity: being able to notice small differences or changes
- Generative solutions: computer programs that create things automatically
- Diffusion models: programs that learn to create data by starting from random noise and removing the noise step by step
- Classifier-free: steering the result toward a request, such as a text prompt, without training a separate sorting program
- Transformer-based: using a type of neural network called a transformer
- Geometric losses: training penalties measured on the motion itself, such as joint positions and speeds
- Foot contact loss: a penalty that discourages feet from sliding while they touch the ground
- Text-to-motion tasks: creating movements based on written words
- Action-to-motion tasks: creating movements based on an action label, such as kicking
- Semantic input/editing/completion/inpainting: changing or finishing something while keeping its meaning intact

Generating Natural and Expressive Human Motion with the Motion Diffusion Model

Generating natural human motion is a challenging problem in computer animation. The challenge comes from the diversity of possible motions, human perceptual sensitivity to them, and the difficulty of describing them accurately. As a result, current generative solutions are either low-quality or limited in expressiveness. Diffusion models have shown remarkable generative capabilities in other domains and are promising candidates for human motion due to their many-to-many nature, although they tend to be resource hungry and hard to control. The Motion Diffusion Model (MDM) is a carefully adapted classifier-free diffusion-based generative model for the human motion domain. It is transformer-based and combines insights from the motion generation literature, aiming to generate high-quality, diverse motions with controllability while remaining lightweight and efficient compared to existing methods.

Design Choices

A notable design choice in MDM is to predict the sample, rather than the noise, in each diffusion step. This facilitates the use of established geometric losses on the locations and velocities of the motion, such as the foot contact loss. In addition, MDM's generic approach enables different modes of conditioning and different generation tasks, such as text-to-motion and action-to-motion.
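
To see why predicting the sample is a choice rather than a restriction, it helps to recall the standard forward diffusion relation, in which a noisy input is a weighted sum of the clean sample and Gaussian noise. Given the noisy input, either quantity determines the other. The sketch below checks this on toy vectors. The symbol alpha_bar (the cumulative noise-schedule term), the function names, and the numbers are assumptions for illustration and do not come from the summary above.

```python
import math
import random
from typing import List

Vector = List[float]


def add_noise(x0: Vector, eps: Vector, alpha_bar: float) -> Vector:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def noise_from_sample(x_t: Vector, x0_pred: Vector, alpha_bar: float) -> Vector:
    """Noise implied by a predicted clean sample."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * x) / b for xt, x in zip(x_t, x0_pred)]


def sample_from_noise(x_t: Vector, eps_pred: Vector, alpha_bar: float) -> Vector:
    """Clean sample implied by a predicted noise vector."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_pred)]


rng = random.Random(0)                    # fixed seed so the toy run is repeatable
x0 = [0.2, -0.5, 1.0]                     # toy stand-in for motion features
eps = [rng.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise
alpha_bar = 0.6                           # arbitrary noise level, 0 < alpha_bar < 1
x_t = add_noise(x0, eps, alpha_bar)

eps_back = noise_from_sample(x_t, x0, alpha_bar)
x0_back = sample_from_noise(x_t, eps, alpha_bar)
```

Both parameterizations carry the same information, but only the sample prediction is itself a motion, so geometric losses can be applied to it without an extra conversion step.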

Performance Results

MDM achieves state-of-the-art quality in several motion generation tasks while requiring only about three days of training with lightweight resources. In text-to-motion, it generates coherent motions that achieve state-of-the-art results on leading benchmarks, and human evaluators prefer MDM-generated motions over real motions 42% of the time. In action-to-motion, MDM outperforms state-of-the-art models designed specifically for this task on common benchmarks.

MDM also demonstrates completion and editing by adapting diffusion image inpainting. Fixing a motion prefix and suffix lets the model fill in the gap, and a textual condition can guide the fill while preserving the semantics of the original input. Performing the inpainting in joint space rather than temporally enables semantic editing of specific body parts without changing the others.
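
One common way to realize diffusion inpainting is to overwrite the known part of the model's sample estimate with the input motion at every denoising step, so that only the masked-out region is actually generated. The sketch below shows that overwrite step with a temporal mask (prefix and suffix kept) and a joint mask (all joints kept except the edited ones). It assumes this overwrite strategy applies here, uses one scalar per joint for brevity, and all names are hypothetical.

```python
from typing import Iterable, List

Motion = List[List[float]]   # motion[frame][joint], one scalar per joint
Mask = List[List[bool]]      # True where the input motion must be kept


def overwrite_known(predicted: Motion, reference: Motion, keep: Mask) -> Motion:
    """Keep reference values where the mask is True, predicted values elsewhere."""
    return [
        [ref if k else pred for pred, ref, k in zip(p_row, r_row, k_row)]
        for p_row, r_row, k_row in zip(predicted, reference, keep)
    ]


def temporal_mask(n_frames: int, n_joints: int, prefix: int, suffix: int) -> Mask:
    """Keep the first `prefix` and last `suffix` frames, generate the middle."""
    return [[i < prefix or i >= n_frames - suffix] * n_joints
            for i in range(n_frames)]


def joint_mask(n_frames: int, n_joints: int, edited_joints: Iterable[int]) -> Mask:
    """Keep every joint except the ones being edited."""
    edited = set(edited_joints)
    return [[j not in edited for j in range(n_joints)] for _ in range(n_frames)]


# Toy data: the input motion is all ones, the model's estimate is all zeros.
n_frames, n_joints = 4, 3
reference = [[1.0] * n_joints for _ in range(n_frames)]
predicted = [[0.0] * n_joints for _ in range(n_frames)]

in_between = overwrite_known(predicted, reference,
                             temporal_mask(n_frames, n_joints, prefix=1, suffix=1))
part_edit = overwrite_known(predicted, reference,
                            joint_mask(n_frames, n_joints, edited_joints=[2]))
```

In a sampling loop, `overwrite_known` would be applied to the sample estimate at each step, with the text condition steering whatever the mask leaves free.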

Conclusion

Overall, MDM is a promising solution for generating high-quality, diverse human motions with controllability, while remaining lightweight and efficient compared to existing methods. The code can be found at https://github.com/GuyTevet/motion-diffusion-model

Created on 25 May. 2023
