Width and Depth Limits Commute in Residual Networks

AI-generated keywords: Deep Neural Networks, Skip Connections, Residual Networks, Scalability, Network Architecture

AI-generated Key Points

  • Study by Soufiane Hayou and Greg Yang on deep neural networks with skip connections
  • Scaling the residual branches by $1/\sqrt{depth}$ gives the same limiting covariance structure no matter how the width and depth limits are taken
  • Explains why the standard infinite-width-then-depth analysis gives practical insights even for networks whose depth is of the same order as their width
  • Pre-activations in this setting have Gaussian distributions, with direct applications in Bayesian deep learning
  • Extensive simulations show an excellent match with the theoretical results
  • A new proof technique establishes that the large-depth and large-width limits commute in residual neural networks (resnets) at initialization
  • A concentration-of-measure result for a McKean-Vlasov process supports earlier analyses that take the width to infinity before the depth
  • The technique does not cover network behavior after training, leaving open how the results might change with the choice of learning rate
  • Clarifies how deep neural networks with skip connections scale, showing that the limiting covariance structure is robust to the way width and depth grow
  • Highlights the influence of network architecture on model behavior and lays a foundation for future research on training strategies
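A short heuristic calculation shows the role of the $1/\sqrt{depth}$ factor. The sketch assumes a ResNet with $L$ residual blocks and width $n$, of the form $Y_l = Y_{l-1} + \alpha_L W_l \phi(Y_{l-1})$. The weights are Gaussian with variance $1/n$, and $\alpha_L$ is a generic branch scale. This parametrization is an illustrative assumption and may differ in detail from the paper's. Taking the width to infinity first gives a layer-wise recursion for the covariance kernel. With ReLU, the diagonal of that recursion makes the effect of $\alpha_L$ explicit:

```latex
% Assumed model: Y_0 = W_in x,  Y_l = Y_{l-1} + \alpha_L W_l \phi(Y_{l-1}),
% (W_l)_{ij} \sim \mathcal{N}(0, 1/n),  l = 1, \dots, L.
% Infinite-width kernel: q_l(x,x') = \lim_{n\to\infty} \tfrac{1}{n}\langle Y_l(x), Y_l(x')\rangle.
% The cross term vanishes because W_l is zero-mean and independent of Y_{l-1}.
\begin{align*}
q_l(x,x') &= q_{l-1}(x,x') + \alpha_L^2\,\mathbb{E}\big[\phi(Z)\,\phi(Z')\big],
  \qquad (Z,Z') \sim \mathcal{N}\!\left(0, \begin{pmatrix} q_{l-1}(x,x) & q_{l-1}(x,x') \\ q_{l-1}(x,x') & q_{l-1}(x',x') \end{pmatrix}\right).\\
\intertext{For $\phi = \mathrm{ReLU}$, $\mathbb{E}[\phi(Z)^2] = \tfrac12\, q_{l-1}(x,x)$, so the diagonal solves exactly}
q_L(x,x) &= \Big(1 + \tfrac{\alpha_L^2}{2}\Big)^{L} q_0(x,x)
 \;\xrightarrow[L\to\infty]{}\;
 \begin{cases}
   \infty, & L\alpha_L^2 \to \infty,\\
   q_0(x,x), & L\alpha_L^2 \to 0 \quad\text{(the branches have no effect)},\\
   e^{c/2}\, q_0(x,x), & L\alpha_L^2 \to c \in (0,\infty).
 \end{cases}\\
\intertext{With $\alpha_L = 1/\sqrt{L}$ and $t = l/L$, the recursion is an Euler step of size $1/L$ for}
\frac{d}{dt}\, q_t(x,x') &= \mathbb{E}\big[\phi(Z_t)\,\phi(Z_t')\big], \qquad t \in [0,1].
\end{align*}
```

Here $(Z_t, Z_t')$ is Gaussian, with covariance built from $q_t$ at $(x,x)$, $(x,x')$ and $(x',x')$. Taking $c = 1$ gives $q_1(x,x) = e^{1/2} q_0(x,x)$ for ReLU. This calculation covers only the width-then-depth order. The paper's result is that the other ways of taking the limit lead to the same covariance.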

Authors: Soufiane Hayou, Greg Yang

24 pages, 8 figures. arXiv admin note: text overlap with arXiv:2210.00688
License: CC BY 4.0

Abstract: We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions, which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
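The claim that the limit does not depend on the order can be probed numerically. The Python/NumPy sketch below assumes an illustrative ResNet with ReLU branches scaled by $1/\sqrt{depth}$ and Gaussian weights of variance $1/\text{fan-in}$. The helpers `simulate_kernel`, `relu_dual` and `limit_kernel` are hypothetical, not code from the paper.

The sketch compares two kernels:

- the empirical output kernel of one finite network;
- the infinite-width-then-depth kernel, obtained by Euler integration of $dQ/dt = \mathbb{E}[\mathrm{ReLU}(Z)\,\mathrm{ReLU}(Z)^\top]$, $Z \sim \mathcal{N}(0, Q)$, using the closed-form Gaussian expectation for ReLU.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def simulate_kernel(X, width, depth, seed=0):
    """Empirical output kernel (1/width) * Y_L^T Y_L of one random ResNet.

    Assumed (illustrative) architecture:
        Y_0 = W_in X,  Y_l = Y_{l-1} + W_l relu(Y_{l-1}) / sqrt(depth),
    with N(0, 1/fan_in) weights. X has shape (d, m): one input per column.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    W_in = rng.standard_normal((width, d)) / np.sqrt(d)
    Y = W_in @ X                                    # shape (width, m)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        Y = Y + W @ relu(Y) / np.sqrt(depth)        # branch scaled by 1/sqrt(depth)
    return Y.T @ Y / width


def relu_dual(Q):
    """E[relu(Z) relu(Z)^T] for Z ~ N(0, Q); proportional to the degree-1 arc-cosine kernel."""
    s = np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
    cos = np.clip(Q / s, -1.0, 1.0)
    theta = np.arccos(cos)
    return s * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)


def limit_kernel(X, steps=10_000):
    """Infinite-width-then-depth kernel: Euler steps for dQ/dt = relu_dual(Q), t in [0, 1]."""
    Q = X.T @ X / X.shape[0]                        # input-layer kernel (weight variance 1/d)
    for _ in range(steps):
        Q = Q + relu_dual(Q) / steps
    return Q


# Two unit-norm inputs in R^2 with cosine similarity 0.6 (illustrative choice).
X = np.array([[1.0, 0.6],
              [0.0, 0.8]])
ref = limit_kernel(X)
for width, depth in [(1024, 32), (256, 256)]:
    emp = simulate_kernel(X, width, depth)
    err = np.linalg.norm(emp - ref) / np.linalg.norm(ref)
    print(f"width={width:5d}  depth={depth:4d}  relative error={err:.3f}")
```

The loop tries a wide, shallow configuration and a square one where depth equals width. If the limits commute, both errors should be dominated by random fluctuations of order $1/\sqrt{width}$ rather than by a systematic gap that grows with depth. The printed values depend on the random seed and are not results from the paper.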

Submitted to arXiv on 01 Feb. 2023

Results of the summarizing process for the arXiv paper: 2302.00453v2

In their study "Width and Depth Limits Commute in Residual Networks," Soufiane Hayou and Greg Yang investigate the behavior of deep neural networks with skip connections as the width and depth go to infinity. They show that when the residual branches are scaled by $1/\sqrt{depth}$, the resulting covariance structure is the same no matter how the limit is taken. This explains why the standard infinite-width-then-depth analysis remains informative even for networks whose depth is comparable to their width. Hayou and Yang also show that the pre-activations in this setting have Gaussian distributions, which has direct applications in Bayesian deep learning. They report extensive simulations that closely match the theory.

The authors use a new proof technique to establish that, in residual neural networks (resnets), the large-depth and large-width limits commute at initialization. Their concentration-of-measure result for a McKean-Vlasov process supports earlier analyses of deep and wide networks that take the width to infinity before the depth. They acknowledge that the technique does not address network behavior after training, where the outcome may depend on the choice of learning rate.

Overall, the paper clarifies how deep neural networks with skip connections behave as they scale. It shows that the limiting covariance structure does not depend on the order in which width and depth grow. It also underscores the importance of understanding how architecture shapes model behavior, and it lays a foundation for future research on training strategies for large neural networks.
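The Gaussianity statement can be checked in a similar spirit. The idea is to sample one output pre-activation over many independent initializations and compare its skewness and excess kurtosis with the Gaussian value of zero.

This is a hypothetical sketch. It assumes the architecture $Y_0 = W_{in} x$, $Y_l = Y_{l-1} + W_l\,\mathrm{ReLU}(Y_{l-1})/\sqrt{depth}$ with Gaussian weights of variance $1/\text{fan-in}$. The names `sample_preactivation` and `skew_and_excess_kurtosis` are illustrative. Near-zero moments are consistent with a Gaussian limit but do not prove it.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sample_preactivation(x, width, depth, trials, seed=0):
    """Coordinate 0 of Y_L(x) across `trials` independent random initializations.

    Assumed (illustrative) architecture:
        Y_0 = W_in x,  Y_l = Y_{l-1} + W_l relu(Y_{l-1}) / sqrt(depth),
    with N(0, 1/fan_in) weights. All trials are propagated as one batch.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    W_in = rng.standard_normal((trials, width, d)) / np.sqrt(d)
    Y = W_in @ x                                    # shape (trials, width)
    for _ in range(depth):
        W = rng.standard_normal((trials, width, width)) / np.sqrt(width)
        # Batched matrix-vector product: one independent network per trial.
        Y = Y + np.einsum("tij,tj->ti", W, relu(Y)) / np.sqrt(depth)
    return Y[:, 0]


def skew_and_excess_kurtosis(samples):
    """Population skewness and excess kurtosis; both are 0 for a Gaussian."""
    z = (samples - samples.mean()) / samples.std()
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)


x = np.array([1.0, 0.0])                            # a unit-norm input (illustrative)
samples = sample_preactivation(x, width=64, depth=16, trials=1000)
skew, kurt = skew_and_excess_kurtosis(samples)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

Batching the trials in one `einsum` keeps the sketch short. With 1000 samples, the standard error of the sample excess kurtosis of a Gaussian is roughly $\sqrt{24/1000} \approx 0.15$, so values within a few tenths of zero are what a Gaussian would produce.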
Created on 02 Jun. 2024
