On the infinite-depth limit of finite-width neural networks

AI-generated keywords: Infinite-depth limit, Finite-width residual neural networks, Random Gaussian weights, Pre-activations convergence, Zero-drift diffusion process

AI-generated Key Points

  • Authors investigate infinite-depth limit of finite-width residual neural networks with random Gaussian weights
  • Pre-activations converge to zero-drift diffusion process in infinite-depth limit, different from weak convergence to Gaussian random variable in infinite-width limit
  • Diverse distributions observed in infinite-depth limit based on activation function chosen, with distinct closed-form expressions identified in two scenarios
  • Change in regime for post-activation norms when transitioning from width of 3 to 4
  • Comparison between sequential limits: first approaching infinite depth then width versus more common path of width preceding depth
  • Understanding limiting laws crucial for designing resilient neural networks and gaining insights into overparameterized networks
  • Unlike infinite-width limit where Gaussian distribution is typically obtained under certain conditions, behavior in infinite-depth limit influenced by activation function selection
  • Leveraging Itô's lemma allows obtaining known distributions by adjusting activation functions carefully
  • Zero probability of process collapse for specific conditions in infinite-depth neural networks with various activation functions (including ReLU), highlighting robustness and stability compared to other configurations

Authors: Soufiane Hayou

71 pages, 21 figures
License: CC BY 4.0

Abstract: In this paper, we study the infinite-depth limit of finite-width residual neural networks with random Gaussian weights. With proper scaling, we show that by fixing the width and taking the depth to infinity, the pre-activations converge in distribution to a zero-drift diffusion process. Unlike the infinite-width limit where the pre-activations converge weakly to a Gaussian random variable, we show that the infinite-depth limit yields different distributions depending on the choice of the activation function. We document two cases where these distributions have closed-form (different) expressions. We further show an intriguing change of regime phenomenon of the post-activation norms when the width increases from 3 to 4. Lastly, we study the sequential limit infinite-depth-then-infinite-width and compare it with the more commonly studied infinite-width-then-infinite-depth limit.
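
The setting in the abstract can be explored numerically with a minimal sketch like the one below. It is not the author's code. The abstract only says "with proper scaling", so the recursion used here is an assumption: each layer adds (1/sqrt(L)) * W_l * phi(Y_{l-1}) to the previous pre-activations, with W_l an n x n matrix of i.i.d. N(0, 1/n) entries. The function name `resnet_preactivations` and its parameters are hypothetical, chosen for illustration.

```python
import math
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def resnet_preactivations(y0, depth, phi=relu, rng=None):
    """Return the final pre-activations Y_L of a width-n residual network.

    Assumed recursion (an illustration, not quoted from the paper):
        Y_l = Y_{l-1} + (1 / sqrt(L)) * W_l @ phi(Y_{l-1}),
    where W_l has i.i.d. N(0, 1/n) entries and L = depth.
    """
    if rng is None:
        rng = random.Random()
    n = len(y0)
    y = list(y0)
    scale = 1.0 / math.sqrt(depth)   # assumed depth scaling
    sigma = 1.0 / math.sqrt(n)       # standard deviation of each weight
    for _ in range(depth):
        a = phi(y)
        # A fresh Gaussian weight matrix per layer, applied row by row.
        update = [sum(rng.gauss(0.0, sigma) * a_j for a_j in a) for _ in range(n)]
        y = [y_i + scale * u_i for y_i, u_i in zip(y, update)]
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    # Sample Y_L many times at fixed width 4 and growing depth; the empirical
    # distribution of the samples is what the infinite-depth limit describes.
    for depth in (10, 100, 1000):
        samples = [resnet_preactivations([1.0] * 4, depth, rng=rng)[0]
                   for _ in range(200)]
        mean = sum(samples) / len(samples)
        print(f"depth={depth:5d}  mean of first coordinate ~ {mean:.3f}")
```

Swapping `phi` for another activation and comparing histograms of the samples is one way to see the dependence on the activation function that the abstract describes.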

Submitted to arXiv on 03 Oct. 2022

Results of the summarizing process for the arXiv paper: 2210.00688v3

The author of this paper investigates the infinite-depth limit of finite-width residual neural networks with random Gaussian weights. By fixing the width and letting the depth go to infinity, the paper shows that, with proper scaling, the pre-activations converge in distribution to a zero-drift diffusion process. This differs from the infinite-width limit, where the pre-activations converge weakly to a Gaussian random variable. The infinite-depth limit instead yields different distributions depending on the chosen activation function, and two cases are identified where these distributions have distinct closed-form expressions.

The paper also reports a change of regime for the post-activation norms when the width increases from 3 to 4. It further compares two sequential limits: infinite depth followed by infinite width, versus the more commonly studied infinite width followed by infinite depth.

Understanding these limiting laws is crucial for designing resilient neural networks and gaining insights into overparameterized networks. While previous research has focused on the infinite-width limit, this work contributes by studying the infinite-depth limit of finite-width networks. A key finding is that, unlike the infinite-width case, where a Gaussian distribution is typically obtained under certain conditions on the activation function, the behavior in the infinite-depth limit is notably influenced by the choice of activation function. Leveraging Itô's lemma makes it possible to obtain known distributions by carefully adjusting the activation function. Furthermore, for general width, the process has zero probability of collapse under specific conditions for various activation functions (including ReLU), which points to robustness and stability compared to other configurations.
Created on 02 Jun. 2024
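
One way to see where a zero-drift diffusion can come from is the sketch below. It assumes a specific recursion that the summary above does not state: width $n$, depth $L$, residual scaling $1/\sqrt{L}$, and weights $W_l$ with i.i.d. $\mathcal{N}(0, 1/n)$ entries. The symbols $Y_l$, $\phi$ and $B_t$ are notation chosen here for illustration.

```latex
% Assumed residual recursion (illustrative, not quoted from the paper):
Y_l = Y_{l-1} + \frac{1}{\sqrt{L}}\, W_l\, \phi(Y_{l-1}),
\qquad W_l^{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0, \tfrac{1}{n}\right).

% Conditional on Y_{l-1}, each coordinate of W_l \phi(Y_{l-1}) has variance
% \|\phi(Y_{l-1})\|^2 / n, so the increment is a centered Gaussian:
Y_l - Y_{l-1} \,\big|\, Y_{l-1} \;\sim\;
\mathcal{N}\!\left(0,\; \frac{\|\phi(Y_{l-1})\|^2}{n}\cdot\frac{1}{L}\, I_n\right).

% With time step \Delta t = 1/L, this matches one Euler--Maruyama step of
dY_t = \frac{1}{\sqrt{n}}\, \|\phi(Y_t)\|\, dB_t .
```

Under these assumptions the increment has zero mean, so the candidate limit has no drift term, and its diffusion coefficient depends on the activation through $\|\phi(Y_t)\|$. This is consistent with the limiting distribution depending on the choice of activation function.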
