Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

AI-generated keywords: Retrieval Augmented Generation (RAG)

AI-generated Key Points

  • Compared performance of Retrieval Augmented Generation (RAG) and Long-Context (LC) Large Language Models (LLMs)
  • LC consistently outperformed RAG in terms of average performance when adequately resourced
  • RAG's lower cost remained a significant advantage
  • Proposed Self-Route method to route queries to RAG or LC based on model self-reflection, maintaining comparable performance to LC while significantly reducing costs

Authors: Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

License: CC BY 4.0

Abstract: Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. Self-Route significantly reduces the computation cost while maintaining a comparable performance to LC. Our findings provide a guideline for long-context applications of LLMs using RAG and LC.
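The routing idea in the abstract can be illustrated with a minimal sketch. The abstract only says that Self-Route sends a query to RAG or LC based on model self-reflection. Everything concrete below is therefore an assumption made for illustration, not the authors' implementation: the `self_route` function, the `retrieve` and `llm` callables, the prompt wording, and the "unanswerable" sentinel.

```python
from typing import Callable, List, Tuple

# Hypothetical sentinel the model is asked to emit when the retrieved
# chunks are not sufficient; the exact wording is an assumption.
UNANSWERABLE = "unanswerable"

Retriever = Callable[[str, str, int], List[str]]  # (query, document, top_k) -> chunks
LLM = Callable[[str], str]                        # prompt -> completion


def self_route(query: str, document: str, retrieve: Retriever,
               llm: LLM, top_k: int = 5) -> Tuple[str, str]:
    """Return (answer, route), where route is "rag" or "lc"."""
    # Step 1: cheap attempt that sees only the retrieved chunks.
    chunks = retrieve(query, document, top_k)
    rag_prompt = (
        "Answer the question using only the text below. "
        f"If the text is insufficient, reply '{UNANSWERABLE}'.\n\n"
        + "\n".join(chunks)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    answer = llm(rag_prompt).strip()
    if answer.lower().rstrip(".") != UNANSWERABLE:
        return answer, "rag"

    # Step 2: the model declined, so fall back to the full long context.
    lc_prompt = f"{document}\n\nQuestion: {query}\nAnswer:"
    return llm(lc_prompt).strip(), "lc"
```

Under this sketch, only queries the model declines in the first step incur the cost of the full context, which is where the cost saving would come from.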

Submitted to arXiv on 23 Jul. 2024

Results of the summarizing process for the arXiv paper: 2407.16833v1

In our study, we compared the performance of Retrieval Augmented Generation (RAG) and Long-Context (LC) Large Language Models (LLMs) on public datasets, keeping only real, query-based tasks in English. Summarization tasks without queries were excluded from the comparison. The datasets were NarrativeQA, Qasper, MultiFieldQA, HotpotQA, 2WikiMultihopQA, MuSiQue, and QMSum from LongBench, plus En.QA and En.MC from ∞Bench. For evaluation metrics, we used F1 scores for open-ended QA tasks, accuracy for multi-choice QA tasks, and ROUGE scores for summarization tasks. We evaluated three recent LLMs: Gemini-1.5-Pro, which supports up to 1 million tokens; GPT-4o, which supports 128k tokens; and GPT-3.5-Turbo, which supports 16k tokens. Our results showed that LC consistently outperformed RAG in terms of average performance when adequately resourced. However, RAG's lower cost remained a significant advantage. To leverage the strengths of both approaches while reducing computation costs, we proposed Self-Route, a method that routes queries to RAG or LC based on model self-reflection. This approach maintained performance comparable to LC while significantly reducing costs.
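For the open-ended QA metric mentioned above, the summary says only "F1 scores". A common reading in QA benchmarks is a token-level F1 between the predicted and reference answers after light normalization. The sketch below assumes that reading. The `normalize` and `token_f1` helpers and the normalization steps (lowercasing, dropping punctuation and English articles) are illustrative assumptions, not the paper's evaluation code.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (assumed F1 variant)."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("The cat sat", "cat sat down")` has precision 1 and recall 2/3, giving an F1 of 0.8.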
Created on 28 Aug. 2024
