Searching for Best Practices in Retrieval-Augmented Generation

AI-generated keywords: retrieval-augmented generation, up-to-date information, response quality, summarization methods, generator fine-tuning

AI-generated Key Points

  • Integration of up-to-date information in retrieval-augmented generation (RAG) techniques enhances response quality and mitigates hallucinations, especially in specialized domains.
  • RAG workflows involve multiple processing steps that can be executed in different ways.
  • Summarizing retrieved documents is a crucial step in the RAG pipeline: retrieval results can contain redundant or unnecessary information that hinders accurate response generation by large language models (LLMs).
  • Efficient summarization methods are essential for optimizing the RAG workflow; common approaches are either extractive or abstractive.
  • Query-based summarization methods keep the information relevant to the query; Recomp, LongLLMLingua, and Selective Context are evaluated as representative approaches.
  • Reranking models such as monoT5 and RankLLaMA, evaluated together with TILDEv2 on the MS MARCO Passage ranking dataset, offer different trade-offs between performance and efficiency; generator fine-tuning is studied as a separate step to improve response generation.
  • After reranking, document repacking arranges documents according to their relevancy scores from the reranking phase, since the order in which documents are presented can affect subsequent processes such as LLM response generation.
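
The repacking step in the key points above can be illustrated with a minimal Python sketch. It assumes the reranker has already produced (document, score) pairs; the function name `repack` and the three ordering modes (`forward`, `reverse`, `sides`) are hypothetical labels chosen for illustration, not the paper's implementation. In `sides`, the most relevant documents go to both ends of the context and the least relevant end up in the middle.

```python
from typing import List, Tuple

# Each item is (document_text, relevancy_score), as produced by a reranker.
Scored = Tuple[str, float]


def repack(docs: List[Scored], mode: str = "forward") -> List[str]:
    """Order reranked documents before they are placed in the LLM prompt.

    Modes (hypothetical names for this sketch):
      forward - most relevant first (descending score)
      reverse - most relevant last (ascending score)
      sides   - most relevant at both ends, least relevant in the middle
    """
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    texts = [text for text, _ in ranked]
    if mode == "forward":
        return texts
    if mode == "reverse":
        return texts[::-1]
    if mode == "sides":
        head, tail = [], []
        for i, text in enumerate(texts):
            # Alternate: 1st -> front, 2nd -> back, 3rd -> front, ...
            (head if i % 2 == 0 else tail).append(text)
        # Reverse the tail so the 2nd-best document ends up last.
        return head + tail[::-1]
    raise ValueError(f"unknown mode: {mode}")


docs = [("d1", 0.2), ("d2", 0.9), ("d3", 0.5), ("d4", 0.7)]
print(repack(docs, "forward"))  # ['d2', 'd4', 'd3', 'd1']
print(repack(docs, "reverse"))  # ['d1', 'd3', 'd4', 'd2']
print(repack(docs, "sides"))    # ['d2', 'd3', 'd1', 'd4']
```

Which ordering works best is an empirical question; the sketch only shows that repacking is a cheap reordering applied to scores the reranker has already computed.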

Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

License: CC BY 4.0

Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in various ways. Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. Through extensive experiments, we suggest several strategies for deploying RAG that balance both performance and efficiency. Moreover, we demonstrate that multimodal retrieval techniques can significantly enhance question-answering capabilities about visual inputs and accelerate the generation of multimodal content using a "retrieval as generation" strategy.

Submitted to arXiv on 01 Jul. 2024
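
The abstract describes a RAG workflow as a sequence of steps, each of which can be executed in several ways. The minimal sketch below treats each step as an interchangeable function with the same signature, so different configurations can be compared. The stage functions (`keyword_retrieve`, `overlap_rerank`, `top_k`) and `run_pipeline` are hypothetical toy stand-ins based on word overlap, not the retrievers or rerankers evaluated in the paper.

```python
import re
from typing import Callable, List

# A stage maps (query, documents) -> documents, so any stage can be swapped
# for an alternative implementation. All names here are hypothetical.
Stage = Callable[[str, List[str]], List[str]]


def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for real text processing."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retrieve(query: str, docs: List[str]) -> List[str]:
    """Toy retrieval: keep documents sharing at least one word with the query."""
    q = tokens(query)
    return [d for d in docs if q & tokens(d)]


def overlap_rerank(query: str, docs: List[str]) -> List[str]:
    """Toy reranker: sort by number of shared words (stand-in for a neural model)."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)


def top_k(k: int) -> Stage:
    """Build a stage that keeps only the first k documents."""
    def stage(query: str, docs: List[str]) -> List[str]:
        return docs[:k]
    return stage


def run_pipeline(query: str, corpus: List[str], stages: List[Stage]) -> List[str]:
    """Apply the configured stages in order."""
    docs = corpus
    for stage in stages:
        docs = stage(query, docs)
    return docs


corpus = [
    "RAG retrieves documents for a query",
    "Rerankers reorder retrieved documents for RAG",
    "Bananas are yellow",
]
query = "how does RAG rerank retrieved documents"

without_rerank = [keyword_retrieve, top_k(1)]
with_rerank = [keyword_retrieve, overlap_rerank, top_k(1)]
print(run_pipeline(query, corpus, without_rerank))  # ['RAG retrieves documents for a query']
print(run_pipeline(query, corpus, with_rerank))     # ['Rerankers reorder retrieved documents for RAG']
```

Adding or removing a single stage changes which document reaches the generator; the paper explores such combinations empirically with real components.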

Results of the summarizing process for the arXiv paper: 2407.01219v1

In the realm of retrieval-augmented generation (RAG), integrating up-to-date information has proven effective in enhancing response quality and mitigating hallucinations, especially in specialized domains. While various RAG approaches have been proposed to improve large language models through query-dependent retrievals, these methods often suffer from complex implementation and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in different ways.

One crucial step in the RAG pipeline is the summarization of retrieved documents. Retrieval results may contain redundant or unnecessary information that can hinder accurate response generation by large language models (LLMs), and long prompts also slow down inference. Efficient summarization methods are therefore essential for optimizing the RAG workflow. Summarization can be extractive or abstractive: extractive methods segment text into sentences and rank them by importance, while abstractive compressors synthesize information from multiple documents into a cohesive summary. Because RAG retrieves information relevant to a query, in this paper we focus on query-based summarization methods, namely Recomp, LongLLMLingua, and Selective Context, and evaluate them through extensive experiments on benchmark datasets such as NQ, TriviaQA, and HotpotQA. Recomp stands out for its strong performance. LongLLMLingua does not perform well on these datasets, but because it was not trained on them it shows better generalization and remains a viable alternative.

We also examine other stages of the workflow. Reranking models such as monoT5 and RankLLaMA, along with the TILDEv2 approach, are evaluated on the MS MARCO Passage ranking dataset to find a balance between performance and efficiency. After reranking, documents are repacked, that is, arranged according to the relevancy scores from the reranking phase, because the order in which documents appear can affect subsequent processes such as LLM response generation. Generator fine-tuning is investigated as a further way to improve how effectively LLMs generate responses from the provided context. Overall, our study provides insights into optimal practices for deploying RAG that balance performance and efficiency across the stages of the workflow.
Created on 19 Nov. 2025
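
The extractive approach described above (segment retrieved text into sentences, rank them, keep the most important) can be sketched as follows for the query-based setting. The word-overlap score is a deliberately simple stand-in for the learned or language-model-based scoring used by methods such as Recomp, LongLLMLingua, and Selective Context; the function names are hypothetical, and this is not an implementation of any of those methods.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text: str) -> set:
    """Lowercased word set used for overlap scoring."""
    return set(re.findall(r"\w+", text.lower()))


def extractive_summary(query: str, docs: List[str], max_sentences: int = 2) -> str:
    """Toy query-based extractive compression.

    1. Segment every retrieved document into sentences.
    2. Score each sentence by word overlap with the query.
    3. Keep the top-scoring sentences (score > 0), restored to original order.
    """
    q = words(query)
    sentences = [s for d in docs for s in split_sentences(d)]
    scores = [len(q & words(s)) for s in sentences]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(i for i in best[:max_sentences] if scores[i] > 0)
    return " ".join(sentences[i] for i in keep)


docs = [
    "Paris is the capital of France. It hosts many museums.",
    "The Eiffel Tower is in Paris. Bananas are yellow.",
]
print(extractive_summary("What is the capital of France?", docs))
# Paris is the capital of France. The Eiffel Tower is in Paris.
```

Restoring the selected sentences to their original order keeps the compressed context readable, and dropping zero-score sentences lets an irrelevant retrieval shrink to an empty context instead of padding the prompt.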

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.