Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

AI-generated keywords: Pushdown Layers, Transformer, Syntactic Generalization, Sample Efficiency, Language Understanding

AI-generated Key Points

  • Pushdown Layers are introduced as a new self-attention layer for Transformer language models
  • They address the challenge of capturing recursive structure in human language
  • A stack tape tracks the estimated depth of each token in an incremental parse of the observed prefix
  • The stack tape lets Transformer models softly modulate attention and learn to "skip" over closed constituents
  • Models with Pushdown Layers achieve significantly better syntactic generalization than standard Transformer models
  • Their syntactic generalization is 3-5 times more sample-efficient than that of standard Transformer models
  • The WIKITREES dataset was created, consisting of over 100 million tokens from Wikipedia articles
  • On WIKITREES, Pushdown Transformers exhibit drastically more sample-efficient syntactic generalization than base Transformers
  • Staged finetuning of GPT2-medium with Pushdown Layers improves language understanding tasks, not only syntactic generalization
  • Replacing the final 12 self-attention blocks with Pushdown Layers improves performance on several GLUE text classification tasks

Authors: Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

Accepted at EMNLP 2023 (Long Papers)
License: CC BY 4.0

Abstract: Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer language models poorly capture long-tail recursive structure and exhibit sample-inefficient syntactic generalization. This work introduces Pushdown Layers, a new self-attention layer that models recursive state via a stack tape that tracks estimated depths of every token in an incremental parse of the observed prefix. Transformer LMs with Pushdown Layers are syntactic language models that autoregressively and synchronously update this stack tape as they predict new tokens, in turn using the stack tape to softly modulate attention over tokens -- for instance, learning to "skip" over closed constituents. When trained on a corpus of strings annotated with silver constituency parses, Transformers equipped with Pushdown Layers achieve dramatically better and 3-5x more sample-efficient syntactic generalization, while maintaining similar perplexities. Pushdown Layers are a drop-in replacement for standard self-attention. We illustrate this by finetuning GPT2-medium with Pushdown Layers on an automatically parsed WikiText-103, leading to improvements on several GLUE text classification tasks.
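To make the stack tape idea concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the function name `read_tape`, the whitespace-separated bracket input, and the definition of depth as the number of constituents open when a word is read are all assumptions made for this example. The model described in the abstract estimates depths as it predicts tokens, whereas this toy version simply reads brackets that are already given.

```python
def read_tape(bracketed):
    """Toy stack tape (hypothetical helper, not from the paper).

    Reads a whitespace-separated prefix containing "(", ")" and words.
    Returns three parallel lists: the words, the depth of each word
    (number of constituents open when it was read), and a flag telling
    whether the word lies inside a constituent that has been closed.
    """
    words, depths, closed = [], [], []
    open_starts = []  # index of the first word of each open constituent

    for symbol in bracketed.split():
        if symbol == "(":
            # A new constituent opens at the next word position.
            open_starts.append(len(words))
        elif symbol == ")":
            # The innermost open constituent closes; mark its words.
            start = open_starts.pop()
            for i in range(start, len(words)):
                closed[i] = True
        else:
            words.append(symbol)
            depths.append(len(open_starts))
            closed.append(False)

    return words, depths, closed


# Prefix "( ( the dog ) barks": the noun phrase is closed, the clause is open.
words, depths, closed = read_tape("( ( the dog ) barks")
for word, depth, is_closed in zip(words, depths, closed):
    print(word, depth, is_closed)
```

In this toy run, "the" and "dog" sit inside a closed constituent while "barks" does not, which is the kind of per-token state an attention layer could consult when deciding what to skip.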

Submitted to arXiv on 29 Oct. 2023

Results of the summarizing process for the arXiv paper: 2310.19089v1

This work introduces Pushdown Layers, a new self-attention layer for Transformer language models that addresses the challenge of capturing recursive structure in human language. Recursion is a fundamental feature of language but is difficult to model with self-attention due to the lack of an explicit mechanism for tracking recursive states. Pushdown Layers solve this problem by using a stack tape to track estimated depths of tokens in an incremental parse of the observed prefix. With Pushdown Layers, Transformer models can autoregressively update the stack tape as they predict new tokens, allowing them to softly modulate attention over tokens and learn to "skip" over closed constituents.

The authors trained Transformers equipped with Pushdown Layers on a corpus of strings annotated with silver constituency parses and found that these models achieve significantly better syntactic generalization and are 3-5 times more sample-efficient compared to standard Transformer language models. To further evaluate the effectiveness of Pushdown Layers, the authors created a dataset called WIKITREES consisting of over 100 million tokens extracted from Wikipedia articles. They trained Pushdown Transformers on different amounts of data from WIKITREES and measured their sample efficiency in syntactic generalization tasks. The results showed that Pushdown Transformers exhibit drastically more sample-efficient syntactic generalization compared to base Transformers.

Additionally, the authors performed staged finetuning of GPT2-medium with Pushdown Layers and observed improvements in language understanding tasks beyond just syntactic generalization. By replacing the final 12 self-attention blocks with Pushdown Layers, they achieved better performance on several GLUE text classification tasks.

Overall, this work demonstrates that Pushdown Layers offer improvements in modeling recursive structure and can enhance both syntactic generalization and language understanding tasks in large-scale language modeling scenarios.
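The phrase "softly modulate attention" can be illustrated with a small Python sketch. This is a simplified stand-in under stated assumptions, not the layer from the paper: the function name `modulated_attention` and the fixed scalar `penalty` are hypothetical, and the summary does not give the actual formula by which the stack tape enters the attention computation. Here, tokens flagged as belonging to a closed constituent simply have a constant subtracted from their attention logits before the softmax, so they are down-weighted rather than masked out.

```python
import math


def modulated_attention(scores, closed, penalty=2.0):
    """Softmax over attention logits with a soft penalty on closed tokens.

    scores  : raw attention logits of one query over the prefix tokens
    closed  : booleans, True if the token is inside a closed constituent
    penalty : assumed fixed constant; a real model would learn this effect
    """
    # Subtract the penalty from the logit of every closed token.
    logits = [s - penalty * float(c) for s, c in zip(scores, closed)]

    # Numerically stable softmax.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Equal raw scores; the first two tokens belong to a closed constituent,
# so the third token ends up with the largest attention weight.
weights = modulated_attention([0.0, 0.0, 0.0], [True, True, False])
print(weights)
```

Because the penalty is finite, closed tokens still receive some attention, which is what distinguishes soft modulation from a hard mask. Setting `penalty=0.0` recovers an ordinary softmax.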
Created on 04 Dec. 2023
