Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

AI-generated keywords: Matrix-aware optimizers, Muon, Pion, Newton-Schulz iterations, spectral gradient orthogonalization

AI-generated Key Points

  • **Muon Limitations:**
    • In cross-modality vision-language-action (VLA) training, inherently low-rank action-module gradients cause Muon's uniform spectral whitening to amplify noisy tail directions.
    • In reinforcement learning with verifiable rewards (RLVR), low signal-to-noise ratio (SNR) gradients and the need to preserve per-head specialization from prior training make whitening unstable.
  • **Introduction of Pion:**
    • A drop-in replacement for Muon designed to address these limitations.
    • Replaces uniform spectral whitening with a two-stage Promotion+Suppression mechanism called the high-pass NS iteration.
  • **Pion Features and Benefits:**
    • Preserves Muon's computational efficiency while inducing a sharp spectral high-pass effect: dominant singular values are anchored at 1 and noisy tail components are suppressed toward 0, with controllable filter strength.
    • Supports a per-head mode that applies updates independently across attention heads via a simple reshape, preserving pretrained per-head heterogeneity at no extra cost.
  • **Performance Comparison:**
    • On the LIBERO and LIBERO-Plus benchmarks, Pion consistently outperforms both Muon and AdamW across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching a 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and 32.2% for AdamW.
    • The advantage extends to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks.
    • In RLVR post-training of Qwen3-1.7B/4B with GRPO and GMPO, Pion outperforms AdamW on MATH and GSM8K, while Muon collapses to zero.
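
The first limitation can be illustrated with a minimal NumPy sketch of uniform spectral whitening. It uses the classic cubic Newton-Schulz step, which drives every nonzero singular value toward 1; Muon's reference implementation uses a tuned quintic polynomial with only a few steps, so this is an illustration of the whitening effect, not Muon's actual code. The helper names (`newton_schulz_orthogonalize`, `matrix_with_spectrum`) and the test spectrum are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the orthogonal polar factor U V^T of G with cubic Newton-Schulz steps."""
    # Frobenius normalisation guarantees every singular value lies in [0, 1].
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        # Odd matrix polynomial: maps each singular value s to 1.5*s - 0.5*s**3,
        # which increases s on (0, 1) and has an attracting fixed point at 1.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def matrix_with_spectrum(sigmas, m: int, n: int, seed: int = 0) -> np.ndarray:
    """Build an m x n matrix with the given nonzero singular values (hypothetical test helper)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, len(sigmas))))  # orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, len(sigmas))))
    return U @ np.diag(sigmas) @ V.T


# Three dominant "signal" directions plus two weak, noisy tail directions (rank 5, shape 8 x 6).
G = matrix_with_spectrum([10.0, 8.0, 6.0, 0.05, 0.01], m=8, n=6)
O = newton_schulz_orthogonalize(G)
print(np.round(np.linalg.svd(O, compute_uv=False), 4))
```

After whitening, the two tail directions, originally up to 1000 times weaker than the dominant ones, carry the same unit weight as the signal directions; only the exactly-zero singular value stays at 0.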

Authors: Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

License: CC BY 4.0

Abstract: Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
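
The mechanism in the abstract rests on a standard fact: an odd matrix polynomial acts on each singular value separately, so NS-style iterations can reshape the spectrum using matrix products alone, without an explicit SVD. The derivation below uses the classic cubic Newton-Schulz step; Muon's tuned polynomial and Pion's Promotion+Suppression coefficients are not reproduced here, and the cutoff τ in the idealized high-pass target is notation introduced for illustration, not the paper's.

```latex
% Thin SVD: M = U \Sigma V^\top with U^\top U = V^\top V = I
M M^\top = U \Sigma^2 U^\top
\;\Longrightarrow\;
(M M^\top)^k M = U \Sigma^{2k+1} V^\top

% Hence any odd matrix polynomial acts on the singular values one by one:
p(M) = aM + b\,(MM^\top)M + c\,(MM^\top)^2 M = U\,p(\Sigma)\,V^\top,
\qquad p(\sigma) = a\sigma + b\sigma^3 + c\sigma^5

% Classic cubic NS step, (a, b, c) = (3/2, -1/2, 0):
p(\sigma) = \tfrac{3}{2}\sigma - \tfrac{1}{2}\sigma^3,\quad p(1) = 1,\quad p'(1) = 0,\quad
p(\sigma) - \sigma = \tfrac{1}{2}\sigma(1 - \sigma^2) > 0 \ \text{for}\ \sigma \in (0, 1)

% Iterating from a Frobenius-normalised M (all \sigma_i \in [0, 1]) gives uniform whitening,
% where U_r, V_r hold the singular vectors of the r nonzero singular values:
p^{(k)}(M) \to U_r V_r^\top \quad (\text{every nonzero } \sigma_i \to 1)

% An idealized high-pass filter with cutoff \tau instead targets
\phi_\tau(\sigma) = \begin{cases} 1, & \sigma \ge \tau \\ 0, & \sigma < \tau \end{cases}
```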

Submitted to arXiv on 19 May 2026


Results of the summarizing process for the arXiv paper: 2605.19282v1

Pion: A Novel Optimizer for Overcoming Limitations of Muon in Various Learning Scenarios

In the realm of matrix-aware optimizers, Muon stands out as a powerful tool that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization, driving all singular values of the momentum matrix toward 1. This uniform spectral whitening enhances exploration and outperforms AdamW in large language model (LLM) pretraining. However, the authors show that it can lead to fundamental limitations beyond pretraining.

The first regime is cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause uniform whitening to amplify noisy tail directions. The second is reinforcement learning with verifiable rewards (RLVR), where low signal-to-noise ratio (SNR) gradients and the need to preserve per-head specialization from prior training make Muon's whitening unstable.

To address these challenges, the authors propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform whitening with a two-stage Promotion+Suppression mechanism called the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost, so that pretrained per-head heterogeneity is preserved.

In VLA training on the LIBERO and LIBERO-Plus benchmarks, Pion consistently outperforms both Muon and AdamW across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures. For instance, with VLA-Adapter it reaches a 100% success rate on LIBERO Object after just 1,500 training steps, compared with 97.0% for Muon and only 32.2% for AdamW.
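
The Promotion+Suppression idea can be sketched with two odd matrix polynomials. This is a hypothetical construction assuming NumPy, not Pion's actual iteration: the promotion map (the cubic NS step s -> 1.5 s - 0.5 s^3) and the suppression map (s -> 2.5 s^3 - 1.5 s^5) are chosen here only because 1 is an attracting fixed point of both while 0 also attracts under suppression. The number of promotion steps stands in for the "controllable filter strength"; `high_pass_ns` and `matrix_with_spectrum` are hypothetical names.

```python
import numpy as np


def high_pass_ns(G: np.ndarray, promotion_steps: int = 3, suppression_steps: int = 6) -> np.ndarray:
    """Hypothetical two-stage promote-then-suppress spectral filter (not the paper's coefficients)."""
    # Frobenius normalisation: all singular values start in [0, 1].
    X = G / (np.linalg.norm(G) + 1e-12)
    # Stage 1, promotion: s -> 1.5 s - 0.5 s^3 pushes singular values toward 1.
    # Large values get there within a few steps, tiny ones grow only ~1.5x per step,
    # so the step count acts as the filter-strength knob (more steps = lower cutoff).
    for _ in range(promotion_steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    # Stage 2, suppression: s -> 2.5 s^3 - 1.5 s^5 fixes 0 and 1 (both attracting)
    # and has a repelling threshold at sqrt(2/3) ~ 0.816: values above it snap to 1,
    # values below it collapse to 0.
    for _ in range(suppression_steps):
        XXt = X @ X.T
        X = 2.5 * XXt @ X - 1.5 * XXt @ XXt @ X
    return X


def matrix_with_spectrum(sigmas, m: int, n: int, seed: int = 0) -> np.ndarray:
    """Build an m x n matrix with the given nonzero singular values (hypothetical test helper)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, len(sigmas))))
    V, _ = np.linalg.qr(rng.standard_normal((n, len(sigmas))))
    return U @ np.diag(sigmas) @ V.T


G = matrix_with_spectrum([10.0, 8.0, 6.0, 0.05, 0.01], m=8, n=6)
strong = np.linalg.svd(high_pass_ns(G, promotion_steps=3), compute_uv=False)
weak = np.linalg.svd(high_pass_ns(G, promotion_steps=16), compute_uv=False)
print(np.round(strong, 4))  # three dominant directions kept, both tails removed
print(np.round(weak, 4))    # the sigma = 0.05 tail now also passes the filter
```

With 3 promotion steps only the three dominant directions survive; with 16, the σ = 0.05 tail also crosses the sqrt(2/3) threshold while σ = 0.01 is still removed. With enough promotion steps every nonzero direction crosses the threshold and the sketch degenerates into plain uniform whitening.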

Pion's advantages also carry over from simulation to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training of Qwen3-1.7B and Qwen3-4B models with GRPO and GMPO, Pion also outperforms AdamW on the MATH and GSM8K benchmarks, while Muon collapses to zero. Overall, Pion addresses Muon's limitations beyond pretraining through its high-pass NS iteration and its per-head mode for preserving pretrained per-head heterogeneity.
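
The per-head mode amounts to a reshape before orthogonalization. The sketch below assumes NumPy and a fused projection gradient of shape (n_heads * head_dim, d_model) whose rows are laid out head-major (an assumption; check the layout of the model at hand). It uses an exact SVD polar factor as a stand-in for the NS or high-pass iteration; `orthogonalize_per_head` and `polar_factor` are hypothetical names. Because the batched call processes every head slab independently, one head's gradient scale cannot leak into another head's update.

```python
import numpy as np


def polar_factor(G: np.ndarray) -> np.ndarray:
    """Exact orthogonal polar factor U V^T (stand-in for an NS or high-pass iteration).

    np.linalg.svd broadcasts over leading axes, so G may be a stack of matrices.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def orthogonalize_per_head(G: np.ndarray, n_heads: int, orthogonalize=polar_factor) -> np.ndarray:
    """Hypothetical per-head mode for a fused (n_heads * head_dim, d_model) gradient.

    Assumes head h owns rows h*head_dim:(h+1)*head_dim.
    """
    out_dim, d_model = G.shape
    assert out_dim % n_heads == 0, "output dimension must split evenly across heads"
    head_dim = out_dim // n_heads
    heads = G.reshape(n_heads, head_dim, d_model)  # one (head_dim, d_model) slab per head
    return orthogonalize(heads).reshape(out_dim, d_model)


rng = np.random.default_rng(0)
n_heads, head_dim, d_model = 4, 8, 64
G = rng.standard_normal((n_heads * head_dim, d_model))

per_head = orthogonalize_per_head(G, n_heads)
whole = polar_factor(G)

# Make head 1's gradient 100x larger than the others.
G_scaled = G.copy()
G_scaled[head_dim:2 * head_dim] *= 100.0
per_head_scaled = orthogonalize_per_head(G_scaled, n_heads)
whole_scaled = polar_factor(G_scaled)

# Per-head mode: head 0's update is unaffected. Whole-matrix mode: it changes.
print(np.allclose(per_head_scaled[:head_dim], per_head[:head_dim]))
print(np.allclose(whole_scaled[:head_dim], whole[:head_dim]))
```

In whole-matrix mode, rescaling a single head shifts the update of every other head, which illustrates how a shared orthogonalization can blur per-head differences that per-head mode keeps separate.
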
Created on 25 May 2026

