Extending Context Window of Large Language Models via Positional Interpolation

AI-generated keywords: Position Interpolation; RoPE-based Pretrained LLMs; Context Window Sizes; Passkey Retrieval; Language Modeling

AI-generated Key Points

  • Position Interpolation (PI) is a novel method that extends the context window sizes of RoPE-based pretrained LLMs.
  • PI extends the context window to as many as 32768 tokens with minimal fine-tuning (within 1000 steps) and shows strong empirical results on tasks that require long context.
  • PI works by linearly down-scaling the input position indices to match the original context window size instead of extrapolating beyond the trained length.
  • Theoretical analysis supports interpolation as a more stable alternative to extrapolation.
  • PI retains the original architecture of models and can reuse existing optimization and infrastructure.
  • Experiments on long document summarization using the GovReport dataset show ROUGE scores that are competitive with other baselines.
  • PI complements retrieval-augmented LLMs by allowing more documents to be included in the input without modifying the attention mechanism or model architecture.
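
The linear down-scaling described in the key points can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the original window of 2048 and extended window of 8192 are example values, and the angle formula is the standard RoPE definition with base 10000.

```python
def rope_angles(position, dim, base=10000.0):
    """Standard RoPE rotation angles for one (possibly fractional) position.

    Angle i is position * base^(-2i/dim), for i = 0 .. dim/2 - 1.
    """
    return [position * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def interpolate_positions(seq_len, original_window, extended_window):
    """Down-scale integer indices so the extended window maps into the original one.

    Hypothetical helper: every index m becomes m * original_window / extended_window.
    """
    scale = original_window / extended_window
    return [m * scale for m in range(seq_len)]


# Example values (assumptions): original window 2048, extended to 8192.
positions = interpolate_positions(seq_len=8192, original_window=2048, extended_window=8192)
print(positions[:5])   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(max(positions))  # 2047.75, still inside the range seen during pretraining

# The scaled (fractional) positions are then fed to RoPE as usual.
angles = rope_angles(positions[4], dim=8)
```

Because every scaled position stays below the original window size, the model only ever sees position values in the range it was trained on; the fractional positions in between are what the fine-tuning step adapts to.
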

Authors: Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian

License: CC BY 4.0

Abstract: We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B. Meanwhile, the model extended by Position Interpolation preserves quality relatively well on tasks within its original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least $\sim 600 \times$ smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.
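
The down-scaling step in the abstract can be written compactly. In the notation below, $f(\mathbf{x}, m)$ is the RoPE embedding of vector $\mathbf{x}$ at position $m$, $L$ is the original context window, and $L'$ is the extended one; the symbol names are chosen here for illustration:

$$
f'(\mathbf{x}, m) = f\!\left(\mathbf{x}, \frac{m L}{L'}\right), \qquad 0 \le m < L' \;\Longrightarrow\; 0 \le \frac{m L}{L'} < L .
$$

As a worked example under assumed values $L = 2048$ and $L' = 8192$, the last position $m = 8191$ is mapped to $8191 \cdot 2048 / 8192 = 2047.75$, which lies inside the range seen during pretraining.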

Submitted to arXiv on 27 Jun. 2023

Results of the summarizing process for the arXiv paper: 2306.15595v1

The article introduces Position Interpolation (PI), a method that extends the context window sizes of RoPE-based pretrained LLMs. PI extends the context window to as many as 32768 tokens with minimal fine-tuning (within 1000 steps), and the authors demonstrate its effectiveness on passkey retrieval, language modeling, and long document summarization. PI works by linearly down-scaling the input position indices to match the original context window size instead of extrapolating beyond the trained length. This avoids the catastrophically high attention scores that extrapolation can produce and that disrupt the self-attention mechanism. The theoretical analysis supports this: the upper bound for interpolation is at least about 600 times smaller than that for extrapolation. A key advantage of PI is that extended models retain their original architecture and can reuse most existing optimization and infrastructure. In the experiments on long document summarization using the GovReport dataset, the authors fine-tune extended LLaMA models with a context window of 16384 after truncating all input documents to their first 15000 tokens. The resulting ROUGE scores are competitive with other baselines. The article also discusses related work on retrieval-augmented LLMs and notes that PI complements these approaches by allowing more documents to be included in the input without modifying the attention mechanism or model architecture. Overall, PI extends the context window of RoPE-based pretrained LLMs while keeping the models stable and reusable.
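
The token budget in the GovReport setup can be checked with a short sketch. The two constants come from the summary above; the helper name and the stand-in token list are hypothetical, and a real pipeline would use the model's tokenizer.

```python
CONTEXT_WINDOW = 16384     # extended context window used for fine-tuning
MAX_INPUT_TOKENS = 15000   # documents are truncated to their first 15000 tokens


def truncate_document(token_ids, max_tokens=MAX_INPUT_TOKENS):
    """Keep only the first max_tokens tokens of a document (hypothetical helper)."""
    return token_ids[:max_tokens]


doc = list(range(20000))   # stand-in for tokenizer output, not real data
truncated = truncate_document(doc)
remaining = CONTEXT_WINDOW - len(truncated)
print(len(truncated), remaining)  # 15000 1384
```

Truncating to 15000 tokens leaves 1384 positions of the 16384-token window for anything else that must share it, such as the instruction text and the generated summary.
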
Created on 25 Jan. 2024
