Interpreting Grokked Transformers in Complex Modular Arithmetic

AI-generated keywords: Grokking

AI-generated Key Points

  • Focus on grokking in classification tasks involving simple algorithmic data
  • Study explores interpretability of algorithms within grokked models and mechanisms behind delayed generalization
  • Investigates complex modular arithmetic through interpretable reverse engineering
  • Introduces two novel progress measures, Fourier Frequency Sparsity and Fourier Coefficient Ratio, to evaluate internal representations
  • Situates the work within mechanistic interpretability, which studies neural networks during training and inference
  • Draws on reverse engineering of neural networks, an approach used to understand phenomena like double descent
  • Detailed analysis of a wide range of modular arithmetic operations including addition, subtraction, multiplication, polynomials, and multi-task mixtures
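Each task listed above is a classification problem over input pairs (a, b) in Z_p. The following is a minimal sketch of how such datasets can be built, assuming a prime modulus p = 97 and two illustrative polynomials; the names `OPERATIONS`, `make_dataset`, `train_test_split`, and `is_symmetric` are hypothetical and are not taken from the authors' repository.

```python
import random
from itertools import product

P = 97  # assumed prime modulus (a common choice in grokking studies)

# Hypothetical operation table: each op maps an input pair (a, b) to a label in Z_P.
OPERATIONS = {
    "add": lambda a, b: (a + b) % P,
    "sub": lambda a, b: (a - b) % P,
    "mul": lambda a, b: (a * b) % P,
    # Illustrative polynomials, not necessarily the exact set studied in the paper.
    "a2_plus_b2": lambda a, b: (a * a + b * b) % P,
    "a2_plus_ab_plus_b2": lambda a, b: (a * a + a * b + b * b) % P,
}


def make_dataset(op_names):
    """Enumerate all P*P pairs for each op as (op_name, a, b, label).

    Passing several op names yields a simple multi-task mixture; the op name
    could be given to the model as an extra input token (an assumption).
    """
    return [
        (name, a, b, OPERATIONS[name](a, b))
        for name in op_names
        for a, b in product(range(P), repeat=2)
    ]


def train_test_split(examples, train_fraction, seed=0):
    """Random split; the train fraction is a key knob in grokking experiments."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def is_symmetric(op_name):
    """True if the label table satisfies op(a, b) == op(b, a) for every pair."""
    op = OPERATIONS[op_name]
    return all(op(a, b) == op(b, a) for a, b in product(range(P), repeat=2))


mixture = make_dataset(["add", "sub"])
train, test = train_test_split(mixture, train_fraction=0.3)
print(len(mixture), len(train), len(test))               # 18818 5645 13173
print([is_symmetric(n) for n in ("add", "sub", "mul")])  # [True, False, True]
```

Unlike addition and multiplication, the subtraction label table is not symmetric in its two operands, which is a useful fact to keep in mind when reading the asymmetry result in the abstract.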

Authors: Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo

Code: https://github.com/frt03/grok_mod_poly
License: CC BY 4.0

Abstract: Grokking has been actively explored to reveal the mystery of delayed generalization. Identifying interpretable algorithms inside grokked models offers a suggestive hint toward understanding the mechanism of grokking. In this work, beyond the simplest and well-studied modular addition, we observe the internal circuits learned through grokking in complex modular arithmetic via interpretable reverse engineering, which highlights significant differences in their dynamics: subtraction poses a strong asymmetry on the Transformer; multiplication requires cosine-biased components at all the frequencies in a Fourier domain; polynomials often result in the superposition of the patterns from elementary arithmetic, but clear patterns do not emerge in challenging cases; grokking can easily occur even in higher-degree formulas with basic symmetric and alternating expressions. We also introduce novel progress measures for modular arithmetic, Fourier Frequency Sparsity and Fourier Coefficient Ratio, which not only indicate late generalization but also characterize distinctive internal representations of grokked models per modular operation. Our empirical analysis emphasizes the importance of holistic evaluation among various combinations.
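The abstract names Fourier Frequency Sparsity and Fourier Coefficient Ratio without giving formulas. The sketch below is one plausible reading under stated assumptions, not the authors' definitions. It projects an embedding matrix `W_E` (one row per residue 0..p-1) onto cosine and sine waves at frequencies k = 1..(p-1)/2 and treats a frequency as significant when its norm exceeds `tau` times the largest one. It then reports (i) the fraction of significant frequencies as a sparsity score and (ii) the mean cosine share among significant frequencies, where a value near 1 would indicate cosine-biased components. The function names and the threshold `tau` are hypothetical; the paper and the linked repository give the exact measures.

```python
import numpy as np


def fourier_components(W_E, p):
    """Project each embedding dimension onto cos/sin waves over Z_p.

    W_E: array of shape (p, d), one row per input residue 0..p-1.
    Returns (cos_norms, sin_norms), each of shape (K,), K = (p - 1) // 2,
    for frequencies k = 1..K.
    """
    x = np.arange(p)
    ks = np.arange(1, (p - 1) // 2 + 1)
    angles = 2 * np.pi * np.outer(ks, x) / p  # shape (K, p)
    cos_coef = np.cos(angles) @ W_E           # shape (K, d)
    sin_coef = np.sin(angles) @ W_E           # shape (K, d)
    return np.linalg.norm(cos_coef, axis=1), np.linalg.norm(sin_coef, axis=1)


def significant_mask(cos_norms, sin_norms, tau=0.1):
    """Assumed rule: a frequency is significant if its total norm > tau * max."""
    total = np.sqrt(cos_norms**2 + sin_norms**2)
    return total > tau * total.max()


def fourier_frequency_sparsity(W_E, p, tau=0.1):
    """Assumed definition: fraction of significant frequencies (smaller = sparser)."""
    c, s = fourier_components(W_E, p)
    return float(significant_mask(c, s, tau).mean())


def fourier_coefficient_ratio(W_E, p, tau=0.1):
    """Assumed definition: mean cosine share among significant frequencies
    (1.0 = purely cosine, 0.5 = balanced cosine and sine)."""
    c, s = fourier_components(W_E, p)
    mask = significant_mask(c, s, tau)
    return float(np.mean(c[mask] / (c[mask] + s[mask])))


p = 97
x = np.arange(p)
W_E = np.zeros((p, 8))
W_E[:, 0] = np.cos(2 * np.pi * 5 * x / p)  # a single, purely cosine frequency
print(fourier_frequency_sparsity(W_E, p))  # 1/48: one of 48 frequencies is significant
print(fourier_coefficient_ratio(W_E, p))   # close to 1.0 (cosine-dominated)
```

Under these assumptions, an embedding that uses many frequencies with mostly cosine components would score high on both measures, while a sparse mix of cosine and sine waves would score low on sparsity and near 0.5 on the ratio.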

Submitted to arXiv on 26 Feb. 2024


Results of the summarizing process for the arXiv paper: 2402.16726v2

This work focuses on grokking in classification tasks on simple algorithmic data, a setting commonly explored in the literature. It examines the interpretable algorithms learned by grokked models to shed light on the mechanisms behind delayed generalization. Going beyond basic modular addition, it studies complex modular arithmetic through interpretable reverse engineering and reveals how different mathematical operations are represented internally by Transformer models. To evaluate the internal representations of grokked models for each modular operation, it introduces two novel progress measures, Fourier Frequency Sparsity and Fourier Coefficient Ratio. The empirical analysis underscores the importance of holistic evaluation across various combinations.

The work is situated within mechanistic interpretability, which studies neural networks during training and inference. By systematically reverse engineering neural networks, researchers aim to understand phenomena such as double descent and to identify functional modules, or circuits, inside these complex systems. Previous studies revealed algorithmic patterns acquired on tasks such as modular addition or group composition, using techniques such as the Fourier transform of logits or analysis of gradients. This work instead provides a detailed analysis of a wide range of modular arithmetic operations, including addition, subtraction, multiplication, polynomials, and multi-task mixtures. Unlike previous studies that focused on specific operations or tasks, it offers a comprehensive exploration of grokking across these arithmetic settings.
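For context on the prior modular-addition results mentioned above: the circuit reported in earlier work scores each candidate answer c by a sum of cos(w_k(a + b - c)) over a few key frequencies w_k = 2πk/p, where cos(w_k(a + b)) and sin(w_k(a + b)) are assembled from products of single-input waves via trigonometric identities. The sketch below reproduces that arithmetic directly, without a trained model. The function name `fourier_logits`, the modulus, and the key frequencies are illustrative assumptions, and the subtraction branch simply swaps in the identities for a - b.

```python
import numpy as np


def fourier_logits(a, b, p, key_freqs, op="add"):
    """Score every candidate c in Z_p by sum_k cos(w_k (t - c)),
    where t = a + b (op="add") or t = a - b (op="sub"), w_k = 2*pi*k/p."""
    w = 2 * np.pi * np.asarray(key_freqs) / p  # key angular frequencies, shape (K,)
    ca, sa = np.cos(w * a), np.sin(w * a)
    cb, sb = np.cos(w * b), np.sin(w * b)
    if op == "add":
        # cos(w(a+b)) and sin(w(a+b)) from products of single-input waves
        cos_t, sin_t = ca * cb - sa * sb, sa * cb + ca * sb
    elif op == "sub":
        # cos(w(a-b)) and sin(w(a-b))
        cos_t, sin_t = ca * cb + sa * sb, sa * cb - ca * sb
    else:
        raise ValueError(f"unsupported op: {op}")
    wc = np.outer(w, np.arange(p))  # shape (K, p)
    # cos(w(t - c)) = cos(wt)cos(wc) + sin(wt)sin(wc), summed over key frequencies
    return (cos_t[:, None] * np.cos(wc) + sin_t[:, None] * np.sin(wc)).sum(axis=0)


p, freqs = 97, [4, 11, 23]  # illustrative modulus and key frequencies
print(int(np.argmax(fourier_logits(30, 80, p, freqs))))            # 13
print(int(np.argmax(fourier_logits(30, 80, p, freqs, op="sub"))))  # 47
```

Because p is prime, every term equals 1 only when c ≡ t (mod p), so the argmax recovers the correct answer for any nonempty set of nonzero key frequencies.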
Created on 08 Jul. 2024
