TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

AI-generated keywords: TokenHMR

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors address the challenge of regressing 3D human pose and shape from a single image with a focus on achieving high 3D accuracy
  • Observation that 3D pose accuracy paradoxically declines as 2D accuracy increases, caused by biases in the pseudo-ground-truth (p-GT) data and the use of an approximate camera projection model
  • Introduction of Threshold-Adaptive Loss Scaling (TALS), a loss that penalizes gross 2D and p-GT errors but not smaller ones
  • Proposal of a tokenized representation of human pose, reformulating the problem as token prediction so that estimates are restricted to the space of valid poses
  • Extensive experiments on the EMDB and 3DPW datasets show that the reformulated keypoint loss and tokenization improve 3D accuracy over state-of-the-art methods

Authors: Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black

CVPR 2024

Abstract: We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy. The current best methods leverage large datasets of 3D pseudo-ground-truth (p-GT) and 2D keypoints, leading to robust performance. With such methods, we observe a paradoxical decline in 3D pose accuracy with increasing 2D accuracy. This is caused by biases in the p-GT and the use of an approximate camera projection model. We quantify the error induced by current camera models and show that fitting 2D keypoints and p-GT accurately causes incorrect 3D poses. Our analysis defines the invalid distances within which minimizing 2D and p-GT losses is detrimental. We use this to formulate a new loss Threshold-Adaptive Loss Scaling (TALS) that penalizes gross 2D and p-GT losses but not smaller ones. With such a loss, there are many 3D poses that could equally explain the 2D evidence. To reduce this ambiguity we need a prior over valid human poses but such priors can introduce unwanted bias. To address this, we exploit a tokenized representation of human pose and reformulate the problem as token prediction. This restricts the estimated poses to the space of valid poses, effectively providing a uniform prior. Extensive experiments on the EMDB and 3DPW datasets show that our reformulated keypoint loss and tokenization allows us to train on in-the-wild data while improving 3D accuracy over the state-of-the-art. Our models and code are available for research at https://tokenhmr.is.tue.mpg.de.
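The abstract describes TALS only as a loss that penalizes gross 2D and p-GT errors but not smaller ones; it does not give the thresholds or the scaling rule. The following is a minimal sketch of that general idea, not the paper's formulation. The function name `threshold_scaled_loss` and the parameters `threshold` and `low_weight` are hypothetical, chosen for illustration.

```python
import math


def threshold_scaled_loss(pred, target, threshold, low_weight=0.0):
    """Mean per-keypoint distance, down-weighted when below `threshold`.

    Illustrative assumption only: errors larger than `threshold` keep full
    weight, errors at or below it are scaled by `low_weight` (0.0 ignores
    them). The actual TALS thresholds and scaling are not in the abstract.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")
    total = 0.0
    for p, t in zip(pred, target):
        dist = math.dist(p, t)  # Euclidean distance between two points
        weight = 1.0 if dist > threshold else low_weight
        total += weight * dist
    return total / len(pred)


# Toy 2D keypoints: the first error is small (0.1), the second is gross (5.0).
predicted = [(0.0, 0.0), (3.0, 4.0)]
observed = [(0.1, 0.0), (0.0, 0.0)]
loss = threshold_scaled_loss(predicted, observed, threshold=1.0)
```

With `low_weight=0.0`, only the gross error contributes to the loss, so the small error is not pushed toward zero. This mirrors the abstract's point that fitting 2D keypoints and p-GT too accurately can produce incorrect 3D poses.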

Submitted to arXiv on 25 Apr. 2024

Results of the summarizing process for the arXiv paper: 2404.16752v1

In their paper "TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation," authors Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, and Michael J. Black address the challenging task of regressing 3D human pose and shape from a single image while focusing on achieving high 3D accuracy. The current state-of-the-art methods rely on large datasets of 3D pseudo-ground-truth (p-GT) and 2D keypoints to achieve robust performance. However, the authors make an intriguing observation that as 2D accuracy increases, there is a paradoxical decline in 3D pose accuracy. This phenomenon is attributed to biases present in the p-GT data and the utilization of an approximate camera projection model.
To address this issue, the authors conduct a thorough analysis to quantify the error introduced by existing camera models and demonstrate that accurately fitting 2D keypoints and p-GT can lead to incorrect 3D poses. They define specific invalid distances within which minimizing losses related to 2D keypoints and p-GT becomes detrimental. To mitigate this problem, they propose a novel loss function called Threshold-Adaptive Loss Scaling (TALS), which penalizes significant errors in 2D and p-GT data without affecting smaller errors.
Furthermore, the paper discusses the challenge of reducing ambiguity in estimating valid human poses based on given evidence. While prior knowledge about valid poses can introduce bias, the authors propose a solution by leveraging tokenized representations of human pose and formulating the problem as token prediction. This approach effectively restricts estimated poses to a space of valid configurations, providing a uniform prior without introducing unwanted biases.
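The tokenization idea can be illustrated with a generic vector-quantization codebook lookup. This is a sketch under stated assumptions, not the paper's tokenizer: the metadata does not describe how tokens are learned or decoded, and the function names `quantize` and `decode` and the toy 2-D codebook values are hypothetical.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def decode(tokens, codebook):
    """Map token indices back to their codebook entries."""
    return [codebook[t] for t in tokens]


# Hypothetical codebook; in practice the entries would be learned from valid poses.
toy_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A continuous estimate is snapped to its nearest token ...
token = quantize((0.9, 0.2), toy_codebook)

# ... and any predicted token sequence decodes only to codebook entries.
decoded = decode([token, 2], toy_codebook)
```

Because the model predicts indices rather than free continuous values, every output decodes to entries of the codebook. That is the sense in which token prediction restricts estimates to a space of valid configurations.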
Extensive experiments on the EMDB and 3DPW datasets show that the reformulated keypoint loss and the tokenization technique enable training on in-the-wild data while improving 3D accuracy over existing state-of-the-art methods. The authors make their models and code available for research at https://tokenhmr.is.tue.mpg.de. Overall, the work advances human mesh recovery from single images by combining a thresholded keypoint loss with a tokenized pose representation.
Created on 03 Oct. 2024
