Grokking as Compression: A Nonlinear Complexity Perspective

AI-generated keywords: Grokking, Linear Mapping Number (LMN), Compression, Information, Complexity, Deep Learning

AI-generated Key Points

  • The study investigates the phenomenon of grokking in deep learning models
  • Grokking refers to delayed generalization after memorization
  • The authors attribute grokking to compression
  • They introduce a metric called the linear mapping number (LMN) to measure network complexity
  • LMN is preferred over the $L_2$ norm for characterizing model complexity because it is interpretable as information/computation and is linearly related to test loss during the compression phase
  • LMN reveals an intriguing phenomenon in which an XOR network switches between two generalization solutions, which the $L_2$ norm does not show
  • Previous work has studied grokking through toy models and has proposed complexity measures such as the linear region number
  • This work extends these measures with LMN, which accommodates networks with any activation function
  • The authors highlight the connection between compression and deep learning, citing information bottleneck theory and the success of language models
  • Considering information and compression perspectives is crucial for understanding generalization puzzles in deep learning

Authors: Ziming Liu, Ziqian Zhong, Max Tegmark

License: CC BY 4.0

Abstract: We attribute grokking, the phenomenon where generalization is much delayed after memorization, to compression. To do so, we define linear mapping number (LMN) to measure network complexity, which is a generalized version of linear region number for ReLU networks. LMN can nicely characterize neural network compression before generalization. Although the $L_2$ norm has been a popular choice for characterizing model complexity, we argue in favor of LMN for a number of reasons: (1) LMN can be naturally interpreted as information/computation, while $L_2$ cannot. (2) In the compression phase, LMN has linear relations with test losses, while $L_2$ is correlated with test losses in a complicated nonlinear way. (3) LMN also reveals an intriguing phenomenon of the XOR network switching between two generalization solutions, while $L_2$ does not. Besides explaining grokking, we argue that LMN is a promising candidate as the neural network version of the Kolmogorov complexity since it explicitly considers local or conditioned linear computations aligned with the nature of modern artificial neural networks.
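The abstract describes LMN as a generalization of the linear region number for ReLU networks. As a minimal sketch of that simpler quantity (not the paper's LMN, whose exact formula is not reproduced on this page), the Python/NumPy code below estimates how many linear regions a random ReLU MLP visits on sampled inputs by counting distinct ReLU on/off patterns. It also computes the parameter $L_2$ norm that the abstract contrasts with LMN. The network sizes, the sampling range, and the helper names `init_mlp`, `count_linear_regions` and `l2_norm` are hypothetical choices made for illustration.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Build a random toy MLP as a list of (W, b) pairs (hypothetical, not from the paper)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = rng.normal(0.0, 0.1, size=n_out)
        params.append((W, b))
    return params


def activation_patterns(params, X):
    """Return, for each row of X, the on/off state of every hidden ReLU unit."""
    h = X
    patterns = []
    for W, b in params[:-1]:  # hidden layers only; the output layer stays linear
        pre = h @ W.T + b
        patterns.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(patterns, axis=1)


def count_linear_regions(params, X):
    """Count distinct activation patterns among the samples X.

    Inside one activation region the ReLU network is a single affine map, so
    this is a sample-based estimate of the linear region number. It can only
    undercount the activation regions that exist in the sampled domain.
    """
    return len(np.unique(activation_patterns(params, X), axis=0))


def l2_norm(params):
    """Euclidean norm of all weights and biases, flattened into one vector."""
    return float(np.sqrt(sum(np.sum(W**2) + np.sum(b**2) for W, b in params)))


if __name__ == "__main__":
    X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5000, 2))
    params = init_mlp([2, 16, 16, 1])
    print("regions hit:", count_linear_regions(params, X))
    print("L2 norm:", l2_norm(params))
```

Counting patterns on samples is a common practical proxy because enumerating every region exactly is generally expensive. The resulting count depends on where and how densely the inputs are sampled.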

Submitted to arXiv on 09 Oct. 2023


Results of the summarizing process for the arXiv paper: 2310.05918v1

In this study, the authors investigate grokking, the phenomenon in which deep learning models generalize only long after they have memorized the training data. They attribute this phenomenon to compression and study it from a computation/information complexity perspective. To measure network complexity, they introduce the linear mapping number (LMN), a generalization of the linear region number for ReLU networks, and use it to characterize how a network compresses before it generalizes.

The authors argue for LMN over the more common $L_2$ norm as a measure of model complexity, for three reasons. First, LMN can be naturally interpreted as information/computation, whereas the $L_2$ norm cannot. Second, during the compression phase, LMN is linearly related to the test loss, while the $L_2$ norm is correlated with the test loss in a complicated nonlinear way. Third, LMN reveals an intriguing phenomenon in which an XOR network switches between two generalization solutions, which the $L_2$ norm does not show.

The authors also review related work on grokking and on complexity measures in deep learning. Previous attempts to understand grokking have used toy models and measures that characterize its dynamics. From an information perspective, complexity measures such as the linear region number have been proposed; this work extends them with LMN, which applies to general networks with any activation function. Finally, the authors highlight the connection between compression and deep learning: information bottleneck theory suggests that training proceeds through an initial fitting phase followed by a compression phase, and recent studies have attributed the success of language models to compression.
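LMN is described as extending the linear region number to networks with any activation function. For smooth activations such as tanh there are no discrete regions, so one way to picture a "linear mapping" count is to compute the local affine map (Jacobian plus offset) at each sample and count how many distinct maps appear up to a tolerance. The sketch below does this with greedy clustering. It is a hypothetical illustration of the idea, not the paper's LMN definition; the names `forward_with_jacobian` and `count_local_linear_maps`, the tanh architecture, and the tolerance `tol` are all assumptions.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Build a random toy MLP as a list of (W, b) pairs (hypothetical, not from the paper)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = rng.normal(0.0, 0.1, size=n_out)
        params.append((W, b))
    return params


def forward_with_jacobian(params, x):
    """Return the output y and the Jacobian dy/dx of a tanh MLP at a single input x."""
    h, J = x, np.eye(x.shape[0])
    for W, b in params[:-1]:  # tanh hidden layers
        z = W @ h + b
        h = np.tanh(z)
        J = ((1.0 - h**2)[:, None] * W) @ J  # chain rule: diag(tanh'(z)) W J
    W, b = params[-1]  # linear output layer
    return W @ h + b, W @ J


def count_local_linear_maps(params, X, tol=0.1):
    """Greedily count distinct local affine maps y ~ J x + c over the samples X.

    Two samples are treated as sharing one map when their flattened [J, c]
    vectors are within `tol` in Euclidean distance. Illustrative proxy only.
    """
    reps = []
    for x in X:
        y, J = forward_with_jacobian(params, x)
        v = np.concatenate([J.ravel(), (y - J @ x).ravel()])  # [J, c] with c = y - J x
        if not any(np.linalg.norm(v - r) < tol for r in reps):
            reps.append(v)
    return len(reps)


if __name__ == "__main__":
    X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(500, 2))
    params = init_mlp([2, 16, 16, 1])
    for tol in (0.05, 0.1, 0.5):
        print(tol, count_local_linear_maps(params, X, tol))
```

Because the clustering is greedy, the count can depend on sample order and on the chosen `tol`; a purely linear network always yields a count of one, since every sample shares the same Jacobian and offset.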
The authors argue that information and compression perspectives are crucial for resolving generalization puzzles in deep learning, and propose LMN as a useful metric in this regard. In summary, this study explores grokking from a computation/information complexity perspective by introducing LMN as a measure of network complexity.
Created on 02 Nov. 2023

