In Defense of RAG in the Era of Long-Context Language Models
AI-generated Key Points
- Two key approaches in language model research: retrieval-augmented generation (RAG) and long-context large language models (LLMs)
- RAG leverages external knowledge for context-based answer generation, enhancing factual accuracy and reducing hallucinations
- Advances in long-context LLMs allow models to take much longer text sequences as input
- Long-context LLMs have been shown to outperform RAG on lengthy contexts, but an extremely long context can dilute the model's focus on relevant information and degrade answer quality
- Order-preserve retrieval-augmented generation (OP-RAG) aims to enhance RAG performance for long-context question-answer applications by preserving the order of retrieved chunks from the original document
- OP-RAG achieves higher answer quality with fewer tokens compared to long-context LLMs that process the entire context as input
- OP-RAG offers a promising alternative for long-context question answering: answer quality first rises and then falls as more chunks are retrieved, so a well-chosen number of order-preserved chunks balances context length against answer quality
Authors: Tan Yu, Anbang Xu, Rama Akkiraju
Abstract: Overcoming the context limitations of early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation in the past. Recently, the emergence of long-context LLMs has allowed models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-context applications. Unlike existing works favoring the long-context LLM over RAG, we argue that an extremely long context in LLMs suffers from a diminished focus on relevant information and leads to potential degradation in answer quality. This paper revisits RAG in long-context answer generation. We propose an order-preserve retrieval-augmented generation (OP-RAG) mechanism, which significantly improves the performance of RAG for long-context question-answer applications. With OP-RAG, as the number of retrieved chunks increases, the answer quality initially rises and then declines, forming an inverted U-shaped curve. There exist sweet points where OP-RAG can achieve higher answer quality with far fewer tokens than a long-context LLM taking the whole context as input. Extensive experiments on public benchmarks demonstrate the superiority of our OP-RAG.
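The order-preserving step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The summary only states that retrieved chunks keep the order they had in the original document, so everything else here is an assumption: the function names (`bow_cosine`, `retrieve`), the toy bag-of-words similarity standing in for a real embedding model, and the example chunks.

```python
import math
import re
from collections import Counter


def bow_cosine(a, b):
    """Toy bag-of-words cosine similarity (assumption: a stand-in for a real embedding model)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k, preserve_order=True):
    """Pick the top_k most similar chunks; optionally restore document order."""
    scored = [(bow_cosine(query, chunk), idx) for idx, chunk in enumerate(chunks)]
    # Select by descending similarity; ties go to the earlier chunk.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    indices = [idx for _, idx in top]
    if preserve_order:
        # Order-preserving step: sort the selected chunks by original position.
        indices.sort()
    return [chunks[i] for i in indices]


# Hypothetical document, already split into chunks in reading order.
chunks = [
    "The river floods every spring.",
    "Engineers built a dam in 1950.",
    "Local farmers grow rice nearby.",
    "After the dam, the river floods stopped.",
]
query = "dam river floods"

relevance_order = retrieve(query, chunks, top_k=2, preserve_order=False)
document_order = retrieve(query, chunks, top_k=2, preserve_order=True)
```

In this toy case both calls select the same two chunks. Ranking by relevance puts the later chunk first, while the order-preserving variant returns them in the sequence they appear in the document, which is the property OP-RAG relies on. Varying `top_k` is how one would probe the inverted U-shaped trade-off described in the abstract.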