When Does Re-initialization Work?

AI-generated keywords: Re-initialization, Regularization, Label Noise, Self-Distillation, Empirical

AI-generated Key Points

  • Re-initializing a neural network during training has been observed to improve generalization in recent works.
  • However, the technique is neither widely adopted in deep learning practice nor often used in state-of-the-art training protocols.
  • The authors conducted an extensive empirical comparison of standard training with a selection of re-initialization methods by training over 15,000 models on a variety of image classification benchmarks.
  • Re-initialization methods are consistently beneficial for generalization in the absence of any other regularization.
  • When combined with carefully tuned regularization techniques resembling state-of-the-art training protocols (data augmentation, weight decay, and learning rate schedules), re-initialization methods offer little to no added benefit for generalization.
  • In that regime, however, optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay.
  • Under label noise, where other regularization techniques offer little help, re-initialization significantly improves upon standard training, even alongside carefully tuned regularization.
  • Fixed-budget Born-Again Networks (BANs), a self-distillation method, do not improve on standard training in most cases but can serve as an important baseline for more sophisticated re-initialization methods.
  • A deeper understanding of why re-initialization does or does not work is still missing. Future work could explore implications for online learning and extend the study beyond the specific datasets and architectures used.
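The key points mention fixed-budget BANs but do not spell out the procedure. The following is a minimal sketch. It assumes that "fixed-budget" means the epoch budget of one standard training run is split across generations, and that each generation is a freshly initialized network distilled from the previous one. The helper names (`split_budget`, `self_distillation_loss`, `run_fixed_budget_ban`), the mixing weight `alpha`, and the caller-supplied `train_fn` are hypothetical illustrations, not the paper's code.

```python
import math
from typing import Callable, List, Optional, Sequence


def split_budget(total_epochs: int, generations: int) -> List[int]:
    """Split a fixed epoch budget across BAN generations (hypothetical helper)."""
    base, rem = divmod(total_epochs, generations)
    # The first `rem` generations get one extra epoch so the parts sum to total_epochs.
    return [base + (1 if g < rem else 0) for g in range(generations)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_distillation_loss(student_logits: Sequence[float],
                           teacher_logits: Sequence[float],
                           label: int,
                           alpha: float = 0.5) -> float:
    """alpha * CE(true label) + (1 - alpha) * CE(teacher's soft targets).

    The mixing weight `alpha` is an assumption; BAN variants differ here.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    ce_hard = -math.log(p_student[label])
    ce_soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return alpha * ce_hard + (1 - alpha) * ce_soft


def run_fixed_budget_ban(total_epochs: int,
                         generations: int,
                         train_fn: Callable[[int, Optional[object]], object]) -> object:
    """Train `generations` models in sequence within one epoch budget.

    `train_fn(epochs, teacher)` is a hypothetical caller-supplied function that
    trains a freshly initialized model for `epochs` epochs, distilling from
    `teacher` when it is not None, and returns the trained model.
    """
    teacher = None
    for epochs in split_budget(total_epochs, generations):
        teacher = train_fn(epochs, teacher)
    return teacher
```

For example, `split_budget(200, 3)` returns `[67, 67, 66]`. Under this assumption, each generation gets roughly a third of the epochs a single standard run would use.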

Authors: Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jörg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu

License: CC BY 4.0

Abstract: Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
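The abstract mentions learning under label noise but does not say how the noise is introduced. This sketch assumes symmetric noise, a common way to simulate it: a fixed fraction of training labels is reassigned uniformly at random to a different class. The function name and its parameters are hypothetical, and the text here does not confirm that this is the paper's protocol.

```python
import random
from typing import List, Sequence


def add_symmetric_label_noise(labels: Sequence[int],
                              num_classes: int,
                              noise_rate: float,
                              seed: int = 0) -> List[int]:
    """Return a copy of `labels` with a `noise_rate` fraction corrupted.

    Each corrupted label is replaced by a different class chosen uniformly
    at random, so exactly round(noise_rate * len(labels)) labels change.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the corruption reproducible
    noisy = list(labels)
    n_corrupt = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_corrupt):
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy
```

Some protocols instead resample from all classes, including the original one, which makes the effective corruption rate slightly lower than `noise_rate`. This sketch excludes the true class so that the nominal rate is exact.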

Submitted to arXiv on 20 Jun. 2022

Summary of the arXiv paper 2206.10011v1

In recent works, re-initializing a neural network during training has been observed to improve generalization. However, the technique is neither widely adopted in deep learning practice nor often used in state-of-the-art training protocols. This raises two questions: when does re-initialization work, and should it be combined with regularization techniques such as data augmentation, weight decay, and learning rate schedules? To answer them, the authors conducted an extensive empirical comparison of standard training with a selection of re-initialization methods, training over 15,000 models on a variety of image classification benchmarks.

The authors found that re-initialization methods are consistently beneficial for generalization in the absence of any other regularization. However, when combined with carefully tuned regularization techniques resembling state-of-the-art training protocols (data augmentation, weight decay, and learning rate schedules), re-initialization offers little to no added benefit for generalization. Even so, optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay in this setting. Surprisingly, under label noise, where other regularization techniques offer little help, re-initialization significantly improves upon standard training. The authors also investigated self-distillation and found that fixed-budget Born-Again Networks (BANs) do not improve on standard training in most cases, although they serve as an important baseline for more sophisticated re-initialization methods.

One limitation of the study concerns explanation. Clear empirical trends emerged about when re-initialization works for the datasets and architectures considered (CIFAR-10/100 and Tiny ImageNet), but a deeper understanding of why it does or does not work is missing. Future work could explore implications for online learning, the setting in which Shrink & Perturb was first proposed and shown to be helpful. It could also extend the study beyond these datasets and architectures to other tasks and data modalities, which may shed further light on why re-initialization works well in some contexts and not in others.
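The summary names Shrink & Perturb without defining it. The method multiplies every weight by a shrink factor and adds a small amount of random noise. This sketch works on a flat list of floats and assumes Gaussian noise and a fixed reset period. The function names, default values, and the caller-supplied `step_fn` are hypothetical, and the paper's exact settings may differ.

```python
import random
from typing import Callable, List


def shrink_and_perturb(params: List[float],
                       rng: random.Random,
                       shrink: float = 0.5,
                       noise_scale: float = 0.01) -> List[float]:
    """theta <- shrink * theta + noise, with noise ~ N(0, noise_scale**2) (assumed)."""
    return [shrink * w + rng.gauss(0.0, noise_scale) for w in params]


def train_with_periodic_sp(params: List[float],
                           epochs: int,
                           period: int,
                           step_fn: Callable[[List[float]], List[float]],
                           shrink: float = 0.5,
                           noise_scale: float = 0.01,
                           seed: int = 0) -> List[float]:
    """Run `epochs` epochs, applying Shrink & Perturb every `period` epochs.

    `step_fn` is a hypothetical stand-in for one epoch of training. No reset
    is applied after the final epoch, so the returned weights are trained ones.
    """
    rng = random.Random(seed)
    for epoch in range(1, epochs + 1):
        params = step_fn(params)
        if epoch % period == 0 and epoch < epochs:
            params = shrink_and_perturb(params, rng, shrink, noise_scale)
    return params
```

The two parameters cover a spectrum. Setting `shrink=1.0` and `noise_scale=0.0` leaves training unchanged. Setting `shrink=0.0` discards the learned weights entirely and amounts to re-initializing from N(0, `noise_scale**2`).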
Created on 11 Apr. 2023
