Generate rather than Retrieve: Large Language Models are Strong Context Generators
AI-generated Key Points
- The paper proposes a new approach to solving knowledge-intensive tasks using large language model generators instead of document retrievers.
- The proposed method is called GenRead and involves prompting a large language model to generate contextual documents based on a given question and then reading the generated documents to produce the final answer.
- The authors also propose a clustering-based prompting method that selects distinct prompts, so that the generated documents cover different perspectives and achieve better recall over acceptable answers.
- GenRead is demonstrated to be effective through extensive experiments on three different knowledge-intensive tasks, including open-domain QA, fact checking, and dialogue systems.
- GenRead achieves exact match scores of 71.6 on TriviaQA and 54.4 on WebQ, outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9 respectively, without retrieving any documents from external knowledge sources.
- However, GenRead is harder to update with new knowledge and to adapt to new domains. Retrieve-then-read methods can swap in new documents when new information becomes available, or add documents from a new domain to adapt quickly to downstream tasks.
- Future research directions include incorporating new knowledge efficiently into generate-then-read methods while minimizing hallucination errors in the generated documents.
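The generate-then-read flow and the idea of using distinct prompts can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not the authors' implementation: `build_cluster_prompts`, `gen_read`, the prompt template, and the `generate` and `read` callables (stand-ins for a large language model and a reader model) are hypothetical, and the cluster ids are assumed to have been computed beforehand by some clustering step over example question-document pairs.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A (question, document) pair shown to the generator as an in-context example.
Demo = Tuple[str, str]


def build_cluster_prompts(
    demos: Sequence[Demo],
    cluster_ids: Sequence[int],
    question: str,
    per_cluster: int = 2,
) -> List[str]:
    """Build one prompt per cluster so that each prompt shows different examples.

    Assumption: cluster_ids[i] is the precomputed cluster of demos[i].
    """
    by_cluster: Dict[int, List[Demo]] = defaultdict(list)
    for demo, cid in zip(demos, cluster_ids):
        by_cluster[cid].append(demo)

    prompts = []
    for cid in sorted(by_cluster):
        # Take the first few examples of this cluster as demonstrations.
        shots = by_cluster[cid][:per_cluster]
        parts = [f"Question: {q}\nDocument: {d}" for q, d in shots]
        # The target question goes last, with the document left open.
        parts.append(f"Question: {question}\nDocument:")
        prompts.append("\n\n".join(parts))
    return prompts


def gen_read(
    question: str,
    prompts: Sequence[str],
    generate: Callable[[str], str],
    read: Callable[[str, List[str]], str],
) -> str:
    """Generate one contextual document per prompt, then read them to answer."""
    documents = [generate(prompt) for prompt in prompts]
    return read(question, documents)
```

In this sketch no external corpus is consulted: the only context passed to `read` is what `generate` produced, which is the contrast with a retrieve-then-read pipeline that the key points describe.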
Authors: Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang
Abstract: Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a novel clustering-based prompting method that selects distinct prompts, resulting in generated documents that cover different perspectives, leading to better recall over acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks, including open-domain QA, fact checking, and dialogue systems. Notably, GenRead achieves 71.6 and 54.4 exact match scores on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Lastly, we demonstrate that the model performance can be further improved by combining retrieval and generation. Our code and generated documents can be found at https://github.com/wyu97/GenRead.
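For readers unfamiliar with the exact match metric quoted in the abstract, the following is a minimal sketch of how such a score is commonly computed in open-domain QA evaluation. The normalization steps (lowercasing, stripping punctuation and English articles, collapsing whitespace) follow a widely used convention and are an assumption here; the paper's own evaluation script may differ. The function names are hypothetical.

```python
import re
import string
from typing import List, Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: Sequence[str], references: Sequence[List[str]]) -> float:
    """Percentage of predictions that exactly match an acceptable answer."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Because each question can have several acceptable answers, a prediction counts as correct if it matches any one of them after normalization.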