Self-Retrieval: Building an Information Retrieval System with One Large Language Model

AI-generated keywords: Information Retrieval

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Large language models (LLMs) have transformed information retrieval (IR) systems
  • Self-Retrieval is an end-to-end architecture that integrates essential IR functionalities into a single LLM
  • Self-Retrieval operates by internalizing the corpus within an LLM through natural language indexing
  • Experimental results show that Self-Retrieval surpasses previous approaches and enhances downstream applications driven by LLMs

Authors: Qiaoyu Tang, Jiawei Chen, Bowen Yu, Yaojie Lu, Cheng Fu, Haiyang Yu, Hongyu Lin, Fei Huang, Ben He, Xianpei Han, Le Sun, Yongbin Li

Abstract: The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way humans access information. Due to their isolated architecture and limited interaction, existing IR systems are unable to fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that can fully internalize the required abilities of IR systems into a single LLM and deeply leverage the capabilities of LLMs during the IR process. Specifically, Self-Retrieval internalizes the corpus to be retrieved into an LLM via a natural language indexing architecture. The entire retrieval process is then redefined as a procedure of document generation and self-assessment, which can be executed end-to-end using a single large language model. Experimental results demonstrate that Self-Retrieval not only outperforms previous retrieval approaches by a large margin but can also significantly boost the performance of LLM-driven downstream applications such as retrieval-augmented generation.
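
The abstract describes retrieval as two steps carried out by one model: generating candidate documents, then assessing them. The following is a minimal toy sketch of that generate-then-self-assess control flow only. It is not the authors' implementation: the corpus list, the token-overlap scoring, and every name (CORPUS, generate_candidates, self_assess, retrieve) are hypothetical stand-ins for what an LLM with an internalized corpus would do.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: a plain list stands in for a corpus that a real system
# would have internalized into the model's parameters.
CORPUS = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Python is a programming language.",
]


def tokens(text: str) -> set[str]:
    """Lowercase the words of a text and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


@dataclass
class Candidate:
    passage: str
    score: float


def generate_candidates(query: str, k: int = 2) -> list[str]:
    """Stand-in for the generation step: propose k passages from the corpus.

    Token overlap replaces LLM decoding here (an illustrative assumption).
    """
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def self_assess(query: str, passage: str) -> float:
    """Stand-in for self-assessment: fraction of query tokens in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def retrieve(query: str, k: int = 2) -> list[Candidate]:
    """Generate candidates, score each one, and return them best first."""
    scored = [Candidate(p, self_assess(query, p)) for p in generate_candidates(query, k)]
    return sorted(scored, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    for cand in retrieve("What is the capital of France?"):
        print(f"{cand.score:.2f}  {cand.passage}")
```

In this sketch both steps are simple functions over the same stand-in corpus; in the architecture the abstract describes, both would be performed by the same LLM.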

Submitted to arXiv on 23 Feb. 2024

Results of the summarizing process for the arXiv paper: 2403.00801v1

The emergence of large language models (LLMs) has changed how humans access information and, with it, the role of information retrieval (IR) systems. Existing IR systems, with their isolated architecture and limited interaction with LLMs, have struggled to adapt to this change. Self-Retrieval is an end-to-end, LLM-driven architecture that integrates the essential IR functionalities into a single LLM and draws on the model's capabilities throughout the retrieval process. It internalizes the corpus within the LLM through natural language indexing and redefines retrieval as document generation followed by self-assessment, both executed by the same model. According to the abstract, experiments show that Self-Retrieval outperforms previous retrieval approaches and significantly improves LLM-driven downstream applications such as retrieval-augmented generation. The paper, "Self-Retrieval: Building an Information Retrieval System with One Large Language Model", was authored by Qiaoyu Tang, Jiawei Chen, Bowen Yu, Yaojie Lu, Cheng Fu, Haiyang Yu, Hongyu Lin, Fei Huang, Ben He, Xianpei Han, Le Sun and Yongbin Li.
Created on 10 Mar. 2024

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.