TabICLv2: A better, faster, scalable, and open tabular foundation model

AI-generated keywords: Predictive modeling

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Tabular foundation models such as TabPFNv2 and TabICL have recently overtaken gradient-boosted trees at the top of predictive benchmarks.
  • This shift demonstrates the value of in-context learning for tabular data.
  • TabICLv2 is a new state-of-the-art foundation model for regression and classification, built on three pillars:
      • A novel synthetic data generation engine designed for high pretraining diversity
      • Architectural innovations, including a new scalable softmax in attention that improves generalization to larger datasets without prohibitive long-sequence pretraining
      • Optimized pretraining protocols, notably replacing AdamW with the Muon optimizer
  • On the TabArena and TALENT benchmarks, TabICLv2 without any tuning surpasses RealTabPFN-2.5, which is hyperparameter-tuned, ensembled, and fine-tuned on real data.
  • With only moderate pretraining compute, TabICLv2 generalizes to million-scale datasets under 50GB of GPU memory while being markedly faster than RealTabPFN-2.5.
  • Extensive ablation studies quantify these contributions.
  • The authors have released inference code and model weights at https://github.com/soda-inria/tabicl, with the synthetic data engine and pretraining code to follow.
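
The key points mention a scalable softmax in attention but, being based on metadata only, do not give its formula. As a rough illustration of the general idea, the sketch below assumes an SSMax-style variant from the wider literature, in which logits are multiplied by s * log(n) for a context of n entries. This is an assumption and not necessarily the TabICLv2 formulation; the names `scalable_softmax`, `peak_weight`, and the fixed scalar `s` are hypothetical.

```python
import math


def softmax(logits):
    """Standard softmax, stabilised by subtracting the largest logit."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scalable_softmax(logits, s=1.0):
    """SSMax-style softmax (assumed variant, not confirmed for TabICLv2).

    Logits are scaled by s * log(n), where n is the number of logits.
    `s` is a fixed hypothetical scalar here; it could also be learned.
    """
    scale = s * math.log(len(logits))
    return softmax([scale * z for z in logits])


def peak_weight(fn, n):
    """Attention weight on one relevant key (logit 2.0) among n - 1 distractors (logit 0.0)."""
    return fn([2.0] + [0.0] * (n - 1))[0]


# Compare how the weight on the relevant key changes as the context grows.
for n in (10, 1_000, 100_000):
    print(n, peak_weight(softmax, n), peak_weight(scalable_softmax, n))
```

With a plain softmax the weight on the relevant key shrinks towards zero as the number of distractors grows, whereas the length-dependent scaling keeps it concentrated. This is one plausible reading of how such a mechanism could help a model pretrained on short contexts generalize to larger datasets.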

Authors: Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan

Abstract: Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new state-of-the-art foundation model for regression and classification built on three pillars: (1) a novel synthetic data generation engine designed for high pretraining diversity; (2) various architectural innovations, including a new scalable softmax in attention improving generalization to larger datasets without prohibitive long-sequence pretraining; and (3) optimized pretraining protocols, notably replacing AdamW with the Muon optimizer. On the TabArena and TALENT benchmarks, TabICLv2 without any tuning surpasses the performance of the current state of the art, RealTabPFN-2.5 (hyperparameter-tuned, ensembled, and fine-tuned on real data). With only moderate pretraining compute, TabICLv2 generalizes effectively to million-scale datasets under 50GB GPU memory while being markedly faster than RealTabPFN-2.5. We provide extensive ablation studies to quantify these contributions and commit to open research by first releasing inference code and model weights at https://github.com/soda-inria/tabicl, with synthetic data engine and pretraining code to follow.

Submitted to arXiv on 11 Feb. 2026

Results of the summarizing process for the arXiv paper: 2602.11139v1

This paper's license doesn't allow us to build upon its content and the summarizing process is here made with the paper's metadata rather than the article.

Tabular foundation models such as TabPFNv2 and TabICL have recently overtaken gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. TabICLv2 is a new foundation model for regression and classification built on three pillars.
First, it introduces a novel synthetic data generation engine designed for high pretraining diversity. Second, it incorporates several architectural innovations, including a new scalable softmax in attention that improves generalization to larger datasets without prohibitively long-sequence pretraining. Third, it uses optimized pretraining protocols, notably replacing AdamW with the Muon optimizer.
On the TabArena and TALENT benchmarks, TabICLv2 without any tuning surpasses the current state of the art, RealTabPFN-2.5, which is hyperparameter-tuned, ensembled, and fine-tuned on real data. With only moderate pretraining compute, TabICLv2 generalizes to million-scale datasets under 50GB of GPU memory while being markedly faster than RealTabPFN-2.5. Extensive ablation studies quantify the contribution of each enhancement.
As a commitment to open research, the authors have released inference code and model weights on GitHub (https://github.com/soda-inria/tabicl), with the synthetic data engine and pretraining code to follow. The paper, "TabICLv2: A better, faster, scalable, and open tabular foundation model", is authored by Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan.
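
The summary names the Muon optimizer without describing it. As background, Muon applies momentum to the gradient of a weight matrix and then approximately orthogonalizes the result with a Newton-Schulz iteration before taking the step. The sketch below is a simplified illustration under stated assumptions: it uses the classical cubic Newton-Schulz iteration instead of the tuned polynomial used in common Muon implementations, the function names and hyperparameters (`lr`, `beta`, `steps`) are hypothetical, and none of it is taken from the TabICLv2 pretraining code, which has not been released yet.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(m, steps=30):
    """Approximate the nearest (semi-)orthogonal matrix to `m`.

    Classical cubic Newton-Schulz: X <- 1.5 * X - 0.5 * (X X^T) X.
    Dividing by the Frobenius norm first keeps all singular values
    at or below 1, inside the iteration's convergence region.
    """
    norm = math.sqrt(sum(v * v for row in m for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_y)]
             for row_x, row_y in zip(x, xxt_x)]
    return x


def muon_like_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon-style update for a single weight matrix."""
    # Accumulate momentum: buf <- beta * buf + grad.
    momentum = [[beta * b + g for b, g in zip(row_b, row_g)]
                for row_b, row_g in zip(momentum, grad)]
    # Orthogonalize the momentum buffer, then step against it.
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(row_w, row_u)]
              for row_w, row_u in zip(weight, update)]
    return weight, momentum


w = [[0.0, 0.0], [0.0, 0.0]]
g = [[3.0, 1.0], [1.0, 2.0]]
buf = [[0.0, 0.0], [0.0, 0.0]]
w, buf = muon_like_step(w, g, buf)
print(w)
```

The point of the orthogonalization is that every direction of the update ends up with roughly the same magnitude, regardless of how unevenly scaled the raw gradient is. AdamW, by contrast, rescales each parameter element-wise.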
Created on 12 Feb. 2026
