Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

AI-generated keywords: Context, Infinite Transformers, Attention Mechanism, Scaling

AI-generated Key Points

The paper's license does not allow us to build upon its content, so the key points below are generated from the paper's metadata rather than the full article.

  • The paper introduces a method for scaling Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation
  • The key innovation is a new attention mechanism called Infini-attention
  • Infini-attention adds a compressive memory to the vanilla attention mechanism and combines masked local attention and long-term linear attention within a single Transformer block
  • Effectiveness is demonstrated on long-context language modeling benchmarks, 1M-sequence-length passkey context block retrieval, and 500K-length book summarization, using 1B and 8B LLMs
  • The method introduces minimal bounded memory parameters, enabling fast streaming inference for LLMs
  • The authors present the approach as an efficient way to extend LLMs to infinitely long inputs
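
Because this page only has the paper's metadata, the following is a minimal sketch of how a single Infini-attention head might process one segment, under assumptions that are ours rather than confirmed by the page: the compressive memory is read and updated with a linear-attention rule using the feature map ELU(x) + 1, and the local and memory outputs are blended by a sigmoid gate with a learned scalar `beta`. The names `infini_attention_segment`, `elu_plus_one`, `EPS`, and all shapes are hypothetical; there are no learned projections, multiple heads, or training here, and this is not the authors' code.

```python
import numpy as np

# Hypothetical single-head sketch; shapes, names and EPS are illustrative assumptions,
# not the authors' implementation.
EPS = 1e-6


def elu_plus_one(x):
    """Assumed positive feature map sigma(x) = ELU(x) + 1, as in linear attention."""
    # For x > 0 this is x + 1; otherwise exp(x). np.minimum avoids overflow in the unused branch.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def infini_attention_segment(q, k, v, memory, z, beta):
    """Process one segment for one head.

    q, k: (L, d_k); v: (L, d_v); memory: (d_k, d_v); z: (d_k,); beta: gate logit.
    Returns the segment output (L, d_v) and the updated (memory, z).
    """
    seg_len, d_k = q.shape

    # 1) Masked (causal) dot-product attention inside the current segment only.
    scores = q @ k.T / np.sqrt(d_k)
    future = np.triu(np.ones((seg_len, seg_len), dtype=bool), k=1)
    a_dot = softmax(np.where(future, -np.inf, scores)) @ v

    # 2) Linear-attention read from the compressive memory built by earlier segments.
    sq = elu_plus_one(q)
    a_mem = (sq @ memory) / ((sq @ z)[:, None] + EPS)

    # 3) Add this segment's keys/values to the fixed-size memory and normalizer.
    sk = elu_plus_one(k)
    memory = memory + sk.T @ v
    z = z + sk.sum(axis=0)

    # 4) Blend long-term (memory) and local outputs with a sigmoid gate.
    g = 1.0 / (1.0 + np.exp(-beta))
    out = g * a_mem + (1.0 - g) * a_dot
    return out, memory, z


# Usage: stream four random segments through one head.
rng = np.random.default_rng(0)
d_k, d_v, seg_len = 8, 8, 16
memory, z = np.zeros((d_k, d_v)), np.zeros(d_k)
for _ in range(4):
    q, k, v = (rng.standard_normal((seg_len, d)) for d in (d_k, d_k, d_v))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta=0.0)
print(out.shape, memory.shape)  # (16, 8) (8, 8)
```

In this sketch the memory stays a `d_k × d_v` matrix plus a `d_k` vector no matter how many segments are streamed, which is one way to read the "bounded memory" claim; the real model's details may differ.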

Authors: Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal

9 pages, 4 figures, 4 tables

Abstract: This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
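
The abstract's "bounded memory" claim can be made concrete with a back-of-the-envelope count of floats held per layer during streaming. This sketch assumes a memory layout that the page does not spell out: a local key/value cache for one segment plus, per head, a `d_k × d_v` memory matrix and a `d_k` normalization vector. The configuration (16 heads, `d_k = d_v = 128`, 2,048-token segments) and both function names are hypothetical, not the paper's 1B/8B settings, and the count ignores batch size, number of layers, and numeric precision.

```python
# Back-of-the-envelope memory count per layer (floats), under assumed layouts.
# The configuration below is hypothetical, not the paper's 1B/8B settings.


def kv_cache_floats(seq_len, n_heads, d_k, d_v):
    """Standard attention caches every key and value: grows linearly with seq_len."""
    return seq_len * n_heads * (d_k + d_v)


def infini_attention_floats(n_heads, d_k, d_v, segment_len):
    """Assumed Infini-attention state: a local KV cache for one segment plus,
    per head, a d_k x d_v memory matrix and a d_k normalization vector.
    seq_len does not appear, so the state is bounded."""
    local_cache = segment_len * n_heads * (d_k + d_v)
    compressive_memory = n_heads * (d_k * d_v + d_k)
    return local_cache + compressive_memory


n_heads, d_k, d_v, segment_len = 16, 128, 128, 2048
print(f"{kv_cache_floats(1_000_000, n_heads, d_k, d_v):,}")  # 4,096,000,000
print(f"{infini_attention_floats(n_heads, d_k, d_v, segment_len):,}")  # 8,652,800
```

Under these assumptions the standard cache keeps growing with every token of a 1M-length input, while the Infini-attention state stays fixed whatever the input length; actual savings depend on the real model configuration.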

Submitted to arXiv on 10 Apr. 2024


Summary of the arXiv paper 2404.07143v1

The paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" by Tsendsuren Munkhdalai, Manaal Faruqui, and Siddharth Gopal introduces a method for scaling Transformer-based Large Language Models (LLMs) to infinitely long inputs while keeping memory and computation bounded. Its key component is a new attention mechanism, Infini-attention, which incorporates a compressive memory into the vanilla attention mechanism and combines masked local attention and long-term linear attention within a single Transformer block. The authors evaluate the approach on long-context language modeling benchmarks, passkey context block retrieval at a sequence length of 1M, and book summarization at a length of 500K, using LLMs with 1B and 8B parameters. Because the method adds only minimal bounded memory parameters, it enables fast streaming inference. Overall, the paper presents Infini-attention as an efficient way to extend LLMs to very long inputs, which could benefit tasks that require processing extensive context.
Created on 18 Apr. 2024
