LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

AI-generated keywords: Machine Learning

AI-generated Key Points

  • **Machine Learning:** New techniques and frameworks are constantly being developed to improve performance and efficiency.
  • **Joint Embedding Predictive Architectures (JEPAs):** A framework for learning world models in compact latent spaces.
  • **LeWorldModel (LeWM):** Trains stably end-to-end from raw pixels and needs only one tunable loss hyperparameter.
  • **Efficiency in Planning:** LeWM plans up to 48 times faster than foundation-model-based world models while remaining competitive across control tasks.
  • **Physical Structure Encoding:** Probing of physical quantities shows that LeWM's latent space encodes meaningful physical structure.

Authors: Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero

License: CC BY 4.0

Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
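The two-term objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the Gaussian regularizer, so the sketch substitutes a simple moment-matching penalty (batch mean toward zero, batch covariance toward identity), which constrains only the first two moments. The function names and the weight `lam`, standing in for the single tunable loss hyperparameter, are hypothetical.

```python
import numpy as np

def prediction_loss(pred_next, target_next):
    """Mean squared error between predicted and encoded next embeddings."""
    return float(np.mean((pred_next - target_next) ** 2))

def gaussian_regularizer(z):
    """Assumed stand-in regularizer (not the paper's): penalize the batch
    mean for deviating from 0 and the batch covariance from the identity."""
    mean = z.mean(axis=0)
    centered = z - mean
    cov = centered.T @ centered / (z.shape[0] - 1)
    eye = np.eye(z.shape[1])
    return float(np.sum(mean ** 2) + np.sum((cov - eye) ** 2))

def total_loss(pred_next, target_next, z, lam=1.0):
    """Prediction term plus one weighted regularization term."""
    return prediction_loss(pred_next, target_next) + lam * gaussian_regularizer(z)

# Synthetic embeddings: one roughly Gaussian batch and one collapsed batch.
rng = np.random.default_rng(0)
z_gauss = rng.standard_normal((4096, 8))
z_collapsed = np.zeros((4096, 8))

# A collapsed batch has zero covariance, so the penalty equals the
# embedding dimension (8.0 here); the Gaussian batch scores near zero.
penalty_collapsed = gaussian_regularizer(z_collapsed)
penalty_gauss = gaussian_regularizer(z_gauss)
```

The point of the sketch is that a collapsed representation, where every input maps to the same embedding, makes the prediction term trivially small but is penalized heavily by the regularizer, which is the failure mode the abstract says the second loss term is meant to prevent.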

Submitted to arXiv on 13 Mar. 2026

Results of the summarizing process for the arXiv paper: 2603.19312v1

In machine learning, Joint Embedding Predictive Architectures (JEPAs) have emerged as a promising framework for learning world models in compact latent spaces. Existing methods, however, are fragile and often rely on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to prevent representation collapse. LeWorldModel (LeWM) was introduced to address these challenges.

**Machine Learning:** New techniques and frameworks are constantly being developed to improve performance and efficiency.

**Joint Embedding Predictive Architectures (JEPAs):** JEPAs are a machine learning framework for learning world models in compact latent spaces.

**LeWorldModel (LeWM):** According to its authors, LeWM is the first JEPA to train stably end-to-end from raw pixels, using two loss terms and a single tunable loss hyperparameter.

**Efficiency in Planning:** LeWM plans up to 48 times faster than foundation-model-based world models while remaining competitive across a range of control tasks.

**Physical Structure Encoding:** Probing of physical quantities shows that LeWM's latent space encodes meaningful physical structure.
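Probing a latent space for a physical quantity is commonly done by fitting a simple readout on frozen embeddings and checking how well it predicts the quantity on held-out data. The sketch below shows a linear probe fitted by least squares. It is an illustration under stated assumptions: the source does not say which probe the authors use, the data is synthetic, and the names `fit_linear_probe`, `probe_r2`, and `position` are hypothetical.

```python
import numpy as np

def _with_bias(latents):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([latents, np.ones((latents.shape[0], 1))])

def fit_linear_probe(latents, targets):
    """Least-squares linear map from frozen latents to a scalar quantity."""
    weights, *_ = np.linalg.lstsq(_with_bias(latents), targets, rcond=None)
    return weights

def probe_r2(latents, targets, weights):
    """Coefficient of determination of the probe on the given data."""
    pred = _with_bias(latents) @ weights
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in for encoder outputs and a physical quantity that is,
# by construction, linearly decodable from them (plus a little noise).
rng = np.random.default_rng(0)
latents = rng.standard_normal((500, 16))
true_w = rng.standard_normal(16)
position = latents @ true_w + 0.01 * rng.standard_normal(500)

# Fit on the first 400 samples, evaluate on the held-out 100.
w = fit_linear_probe(latents[:400], position[:400])
r2 = probe_r2(latents[400:], position[400:], w)
```

A high held-out R² indicates that the quantity is linearly recoverable from the embeddings. On real encoder outputs the score depends on what the model has learned, which is the property a probing analysis is meant to measure.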
Created on 24 May. 2026
