Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

AI-generated keywords: Model soups

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors propose a novel approach called "model soups" for maximizing model accuracy in fine-tuning large pre-trained models
  • The conventional recipe trains multiple models with different hyperparameters and keeps only the one that performs best on a held-out validation set; model soups instead average the weights of the models fine-tuned with different hyperparameter configurations
  • Model soups lead to improvements in accuracy and robustness without increasing inference or memory costs
  • Applied to large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, model soups improve significantly over the best model in a hyperparameter sweep on ImageNet; the resulting ViT-G attains 90.94% top-1 accuracy, a new state of the art at the time
  • The approach extends to multiple image classification and natural language processing tasks, and it improves out-of-distribution performance and zero-shot performance on new downstream tasks
  • The authors analytically relate the performance similarity of weight averaging and logit ensembling to the flatness of the loss and the confidence of the predictions, and validate this relation empirically
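
The weight averaging described in the key points can be illustrated with a minimal sketch. It assumes each fine-tuned model is a dictionary mapping parameter names to flat lists of floats; the names `uniform_soup`, `model_a`, `model_b`, and `model_c` are hypothetical and do not come from the paper or its repository. It shows only a plain element-wise mean over models that share one architecture, not the authors' full recipe.

```python
from typing import Dict, List

# Assumption: a "model" is a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def uniform_soup(models: List[Weights]) -> Weights:
    """Return the element-wise mean of every parameter across the given models."""
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    for m in models:
        # All models must share the same architecture (same names and sizes).
        if m.keys() != names:
            raise ValueError("models have different parameter names")
        for name in names:
            if len(m[name]) != len(models[0][name]):
                raise ValueError(f"parameter {name!r} has mismatched sizes")
    n = len(models)
    soup: Weights = {}
    for name in names:
        # zip(*...) groups the i-th entry of this parameter from every model.
        columns = zip(*(m[name] for m in models))
        soup[name] = [sum(col) / n for col in columns]
    return soup


# Three hypothetical fine-tuning runs of the same tiny model.
model_a = {"w": [1.0, 2.0], "b": [0.0]}
model_b = {"w": [3.0, 4.0], "b": [1.0]}
model_c = {"w": [5.0, 6.0], "b": [2.0]}

soup = uniform_soup([model_a, model_b, model_c])
print(soup)
```

The result has the same size as any single input model, which is why inference and memory costs do not grow with the number of models averaged.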

Authors: Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

ICML 2022. The last three authors contributed equally

Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.
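
To make the contrast between a conventional ensemble and a model soup concrete, here is a small sketch under stated assumptions. The two toy models (`linear` and `tiny_relu_net`) and the helper names are hypothetical and chosen only for illustration. The sketch shows that averaging outputs needs one forward pass per model while averaging weights needs a single pass, and that the two coincide exactly only when the output is linear in the weights. It does not reproduce the paper's analysis in terms of loss flatness and prediction confidence.

```python
from typing import Callable, List

Model = Callable[[float, List[float]], float]


def linear(x: float, w: List[float]) -> float:
    # Output is linear in the weights.
    return w[0] * x


def tiny_relu_net(x: float, w: List[float]) -> float:
    # One hidden ReLU unit: output is not linear in the weights.
    return w[1] * max(0.0, w[0] * x)


def ensemble_output(f: Model, x: float, weights: List[List[float]]) -> float:
    """Average the outputs: one forward pass per model."""
    return sum(f(x, w) for w in weights) / len(weights)


def soup_output(f: Model, x: float, weights: List[List[float]]) -> float:
    """Average the weights first, then run a single forward pass."""
    k = len(weights)
    averaged = [sum(col) / k for col in zip(*weights)]
    return f(x, averaged)


x = 1.0
linear_models = [[1.0], [3.0]]
relu_models = [[1.0, 2.0], [3.0, 4.0]]

linear_ensemble = ensemble_output(linear, x, linear_models)
linear_soup = soup_output(linear, x, linear_models)
relu_ensemble = ensemble_output(tiny_relu_net, x, relu_models)
relu_soup = soup_output(tiny_relu_net, x, relu_models)

print(linear_ensemble, linear_soup)  # identical for the linear model
print(relu_ensemble, relu_soup)      # differ for the nonlinear model
```

For the nonlinear toy model the two outputs differ, which is the gap the abstract's final claim is about: the authors relate how close the soup stays to the ensemble to the flatness of the loss and the confidence of the predictions.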

Submitted to arXiv on 10 Mar. 2022


Results of the summarizing process for the arXiv paper: 2203.05482v3

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In their paper titled "Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time," Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, and Ludwig Schmidt revisit how to maximize accuracy when fine-tuning large pre-trained models. The conventional recipe trains multiple models with different hyperparameters, selects the one that performs best on a held-out validation set, and discards the rest. The authors instead average the weights of the models fine-tuned with different hyperparameter configurations, which works because such fine-tuned models often appear to lie in a single low-error basin. The resulting "model soups" often improve both accuracy and robustness and, unlike a conventional ensemble, incur no additional inference or memory cost regardless of how many models are averaged. When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, the soup recipe improves significantly over the best model in a hyperparameter sweep on ImageNet; the resulting ViT-G model attains 90.94% top-1 accuracy, which was a new state of the art. The approach also extends to multiple image classification and natural language processing tasks, and it improves out-of-distribution performance and zero-shot performance on new downstream tasks. Finally, the authors analytically relate the performance similarity of weight averaging and logit ensembling to the flatness of the loss and the confidence of the predictions, and they validate this relation empirically.
The code is available at https://github.com/mlfoundations/model-soups.
Created on 01 Jun. 2025

