RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

AI-generated keywords: Retrieval-augmented language models

AI-generated Key Points

  • Retrieval-augmented language models (LMs) typically retrieve only short contiguous chunks, which limits their holistic understanding of the overall document context.
  • RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a novel approach that addresses this limitation.
  • RAPTOR recursively embeds, clusters, and summarizes chunks of text, constructing a tree with differing levels of summarization from the bottom up.
  • RAPTOR retrieves from this tree at inference time, enabling a more comprehensive understanding of the document context.
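  • Controlled experiments show that retrieval with recursive summaries significantly outperforms traditional retrieval-augmented LMs on several tasks, and that coupling RAPTOR with GPT-4 gives state-of-the-art results on question answering that requires complex, multi-step reasoning.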
  • RAPTOR also outperforms current retrieval augmentation methods when applied to collections of long documents.
  • RAPTOR enhances the relevance and effectiveness of retrieved information by leveraging text summarization techniques at different scales.
  • The code for RAPTOR will be released publicly to facilitate further research and development in this area.
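
The bottom-up tree construction described in these key points can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: the bag-of-words `embed`, the greedy threshold `cluster` and the truncating `summarize` are stand-ins assumed here (a real pipeline would use a neural sentence encoder, a proper clustering algorithm and an LLM summarizer), and `build_tree`, `threshold` and `max_levels` are illustrative names.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int  # 0 = original chunk, 1+ = summary layers
    children: list[Node] = field(default_factory=list)


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in (assumption) for a neural sentence encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(nodes: list[Node], threshold: float) -> list[list[Node]]:
    # Greedy stand-in for a real clustering algorithm: each node joins the first
    # group whose seed (first member) is similar enough, otherwise starts a new group.
    groups: list[list[Node]] = []
    for node in nodes:
        vec = embed(node.text)
        for group in groups:
            if cosine(vec, embed(group[0].text)) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def summarize(texts: list[str], max_words: int = 30) -> str:
    # Placeholder "summary": truncated concatenation (an LLM call in practice).
    return " ".join(" ".join(texts).split()[:max_words])


def build_tree(chunks: list[str], threshold: float = 0.4, max_levels: int = 5) -> list[list[Node]]:
    # levels[0] holds the leaf chunks; each later entry is one layer of summaries.
    levels = [[Node(text=c, level=0) for c in chunks]]
    while len(levels[-1]) > 1 and len(levels) <= max_levels:
        groups = cluster(levels[-1], threshold)
        if len(groups) == len(levels[-1]):
            break  # nothing merged, so another layer would add no abstraction
        depth = len(levels)
        levels.append([
            Node(text=summarize([n.text for n in g]), level=depth, children=g)
            for g in groups
        ])
    return levels
```

Each pass groups similar nodes and replaces every group with one summary node that keeps pointers to its children, so higher layers cover progressively larger spans of the document. The loop stops when a pass merges nothing or a single root remains.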

Authors: Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning

License: CC BY 4.0

Abstract: Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
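
The abstract does not specify how nodes are chosen at inference time. One plausible strategy, assumed here, is to score every node of the tree (leaf chunks and summaries alike) against the query and keep the best-scoring nodes that fit a context budget. The toy bag-of-words embedding, the word-count budget and the name `retrieve_collapsed` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in (assumption) for a neural sentence encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_collapsed(query: str, nodes: list[tuple[str, int]], word_budget: int) -> list[tuple[str, int]]:
    """Rank nodes from every tree level together and keep the best ones that fit.

    `nodes` holds (text, level) pairs: level 0 = original chunk, higher = summaries.
    """
    q = embed(query)
    # Highest similarity first; ties keep their input order (sorted is stable).
    ranked = sorted(nodes, key=lambda node: cosine(q, embed(node[0])), reverse=True)
    selected: list[tuple[str, int]] = []
    used = 0
    for text, level in ranked:
        cost = len(text.split())
        if used + cost > word_budget:
            continue  # skip nodes that would overflow the context budget
        selected.append((text, level))
        used += cost
    return selected
```

Because summaries compete directly with raw chunks, a broad question can pull in a high-level summary while a detail question can still retrieve the specific chunk that answers it.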

Submitted to arXiv on 31 Jan. 2024

Results of the summarizing process for the arXiv paper: 2401.18059v1

Retrieval-augmented language models (LMs) have shown promise in adapting to changes in the world state and incorporating long-tail knowledge. However, existing methods often retrieve only short contiguous chunks from a retrieval corpus, which limits their ability to understand the overall document context holistically. In this paper, we propose a novel approach called RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) that addresses this limitation. RAPTOR recursively embeds, clusters, and summarizes chunks of text, constructing a tree with different levels of summarization from the bottom up. This allows information to be integrated across lengthy documents at different levels of abstraction. At inference time, our RAPTOR model retrieves from this tree, enabling a more comprehensive understanding of the document context. We conducted controlled experiments comparing RAPTOR with traditional retrieval-augmented LMs using three language models: UnifiedQA, GPT-3, and GPT-4. The results demonstrate that retrieval with recursive summaries significantly outperforms existing methods on several tasks. In particular, when coupled with GPT-4, RAPTOR achieves state-of-the-art performance on question-answering tasks involving complex, multi-step reasoning; on the QuALITY benchmark, for example, it improves the best performance by 20% in absolute accuracy. Beyond QA, RAPTOR also outperforms current retrieval augmentation methods when applied to collections of long documents. By using summarization to provide context at different scales, RAPTOR makes the retrieved information more relevant and effective. Our work demonstrates the effectiveness of text summarization for retrieval augmentation and shows its potential for handling long documents.
Related work has examined why retrieval systems remain necessary despite advances in hardware and algorithms that let models handle longer contexts: models often struggle to use long-range context effectively, and their performance diminishes as context length increases. Retrieval systems play a crucial role in selecting the most relevant information for knowledge-intensive tasks, especially when the important information is embedded in lengthy contexts. Most existing retrieval methods follow the standard approach of chunking the corpus and encoding the chunks with BERT-based retrievers, which may not capture the full semantic depth of the text. Snippets extracted from technical or scientific documents, for example, can lack important context, making them hard to interpret accurately. To address these limitations, RAPTOR incorporates recursive summarization, which provides a condensed view of documents while preserving granular details. This enables more focused engagement with the content and helps capture distant interdependencies in the text that other methods may overlook.
In summary, our work introduces RAPTOR, a retrieval-augmented language model that leverages recursive summarization to enhance contextual understanding and improve performance on various tasks. The experiments show that it outperforms existing methods and highlight its potential for handling long documents effectively. We will make the code for RAPTOR publicly available to facilitate further research in this area.
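
For contrast with RAPTOR's tree, the standard chunking step that flat retrievers apply before encoding can be sketched as a split into contiguous, fixed-size word windows. The window size and overlap are illustrative assumptions (real pipelines usually count tokens rather than words), and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into contiguous windows of `size` words, optionally overlapping."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop early enough that the last window is not made only of overlap words.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]
```

With `size=4`, the ten-word passage `"w0 w1 ... w9"` yields `['w0 w1 w2 w3', 'w4 w5 w6 w7', 'w8 w9']`. A question whose answer depends on both `w1` and `w9` cannot be served by any single chunk, which is the kind of gap that recursive summaries are meant to fill.
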
Created on 02 Feb. 2024

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.