Landmark Attention: Random-Access Infinite Context Length for Transformers
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.
- Transformers have limitations in handling longer contexts due to large memory requirements
- Previous approaches compromised random-access flexibility or relied on separate mechanisms for context retrieval
- The authors propose a novel approach using landmark tokens to represent input blocks and training the attention mechanism to select relevant blocks
- This eliminates the need for a separate mechanism and allows retrieval of blocks directly through the attention mechanism
- The method integrates with specialized data structures and the system's memory hierarchy, which enables processing of arbitrarily long contexts
- The method achieves performance comparable to Transformer-XL while significantly reducing the number of tokens retrieved at each step
- Fine-tuning LLaMA 7B with this method extends its context length capacity up to 32k tokens, allowing for inference at GPT-4's context lengths
- "Landmark Attention" addresses memory limitations of transformers when handling longer contexts
- Random-access flexibility is maintained while efficiently retrieving relevant blocks through the attention mechanism itself
- Experiments show performance comparable to Transformer-XL with fewer retrieved tokens per step, and a fine-tuned LLaMA 7B that handles contexts of up to 32k tokens
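To make the "fewer retrieved tokens" point concrete, here is a rough back-of-the-envelope sketch. It assumes each query scores one landmark per block and then reads only the tokens of the few best-scoring blocks. The function name, the block size of 64, and the choice of 4 retrieved blocks are hypothetical values picked for illustration; they are not taken from the paper.

```python
import math


def attended_tokens(context_len: int, block_size: int, top_k: int) -> int:
    """Rough count of entries one query touches under block retrieval.

    Assumption: the query scores one landmark per block, then reads
    every token of the top_k selected blocks. Local context and other
    bookkeeping are ignored.
    """
    num_blocks = math.ceil(context_len / block_size)
    return num_blocks + top_k * block_size


# Hypothetical settings, chosen only for illustration.
full = 32768                             # full attention touches every token
sparse = attended_tokens(32768, 64, 4)   # 512 landmarks + 4 * 64 tokens
print(full, sparse)
```

Under these assumed settings a query touches 768 entries instead of 32,768, which is the kind of saving the key points describe.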
Authors: Amirkeivan Mohtashami, Martin Jaggi
Abstract: While transformers have shown remarkable success in natural language processing, their attention mechanism's large memory requirements have limited their ability to handle longer contexts. Prior approaches, such as recurrent memory or retrieval-based augmentation, have either compromised the random-access flexibility of attention (i.e., the capability to select any token in the entire context) or relied on separate mechanisms for relevant context retrieval, which may not be compatible with the model's attention. In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. Our method uses a landmark token to represent each block of the input and trains the attention to use it for selecting relevant blocks, enabling retrieval of blocks directly through the attention mechanism instead of by relying on a separate mechanism. Our approach seamlessly integrates with specialized data structures and the system's memory hierarchy, enabling processing of arbitrarily long context lengths. We demonstrate that our method can obtain comparable performance with Transformer-XL while significantly reducing the number of retrieved tokens in each step. Finally, we show that fine-tuning LLaMA 7B with our method successfully extends its context length capacity up to 32k tokens, allowing for inference at the context lengths of GPT-4.
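The block-retrieval idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: in the paper the landmark is a trained token, whereas here the landmark key of each block is simply assumed to be the mean of the block's keys. The function names, the single-query setting, and the top-k selection rule are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def landmark_block_attention(query, keys, values, block_size, top_k):
    """Attend only over the top_k blocks whose landmark best matches the query.

    Returns (indices of the chosen blocks, attention output vector).
    """
    n = len(keys)
    blocks = [(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # ASSUMPTION: stand-in landmark key = mean of the block's keys.
    # The paper trains a dedicated landmark token instead.
    dim = len(query)
    landmarks = [
        [sum(keys[i][d] for i in range(s, e)) / (e - s) for d in range(dim)]
        for s, e in blocks
    ]

    # Step 1: score every block through its landmark and keep the best top_k.
    scores = [dot(query, lm) for lm in landmarks]
    ranked = sorted(range(len(blocks)), key=lambda b: scores[b], reverse=True)
    chosen = sorted(ranked[:top_k])

    # Step 2: ordinary scaled dot-product attention, restricted to the
    # tokens that live in the chosen blocks.
    token_ids = [i for b in chosen for i in range(*blocks[b])]
    scale = 1.0 / math.sqrt(dim)
    weights = softmax([dot(query, keys[i]) * scale for i in token_ids])
    out_dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, token_ids))
        for d in range(out_dim)
    ]
    return chosen, output
```

For example, with six keys grouped into three blocks of two and a query aligned with the middle block's keys, the function selects only that block and returns the average of its values. The point of the sketch is that block selection and token-level attention use the same dot-product scoring, so no separate retriever is involved.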