ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision

AI-generated keywords: Multi-hop question answering; Dense retrievers; Retriever Supervision with Consistency and Relevance (ReSCORE); Iterative retrieval-augmented generation (RAG); Large language models

AI-generated Key Points

  • Dense retrievers outperform sparse methods like BM25 in multi-hop question answering by leveraging semantic embeddings.
  • Fine-tuning dense retrievers requires labeled query-document pairs, which are hard to obtain in MHQA because queries vary considerably across reasoning steps.
  • ReSCORE (Retriever Supervision with Consistency and Relevance) uses large language models to capture each document's relevance to the question and its consistency with the correct answer.
  • ReSCORE trains a retriever within an iterative question-answering framework without labeled documents, and yields significant improvements in retrieval performance.
  • Dense retrievers like Contriever are recognized as more effective overall in MHQA due to their reliance on domain-specific query and document embeddings.
  • ReSCORE leverages large language models to enhance retrieval accuracy without needing labeled data, offering a solution to labor-intensive and costly training processes.

Authors: Dosung Lee, Wonjun Oh, Boyoung Kim, Minyoung Kim, Joonsuk Park, Paul Hongsuck Seo

9 pages, 3 figures, ACL 2025
License: CC BY 4.0

Abstract: Multi-hop question answering (MHQA) involves reasoning across multiple documents to answer complex questions. Dense retrievers typically outperform sparse methods like BM25 by leveraging semantic embeddings; however, they require labeled query-document pairs for fine-tuning. This poses a significant challenge in MHQA due to the high variability of queries (reformulated questions) throughout the reasoning steps. To overcome this limitation, we introduce Retriever Supervision with Consistency and Relevance (ReSCORE), a novel method for training dense retrievers for MHQA without labeled documents. ReSCORE leverages large language models to capture each document's relevance to the question and consistency with the correct answer, and uses them to train a retriever within an iterative question-answering framework. Experiments on three MHQA benchmarks demonstrate the effectiveness of ReSCORE, with significant improvements in retrieval and, in turn, state-of-the-art MHQA performance. Our implementation is available at: https://leeds1219.github.io/ReSCORE.
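
The supervision idea in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that a language model has already produced a log-probability for each candidate document's relevance to the question and for its consistency with the correct answer. The sketch adds the two, normalizes them into a soft label distribution over documents, and measures how far a retriever's score distribution is from it with a KL divergence. The function names, the additive combination, and the KL loss are assumptions made for illustration, not details confirmed by the abstract.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_distribution(
    relevance_logps: List[float], consistency_logps: List[float]
) -> List[float]:
    """Combine per-document relevance and consistency log-probabilities
    (assumed to come from an LLM) into a soft label distribution.
    Adding log-probabilities is an assumption of this sketch."""
    joint = [r + c for r, c in zip(relevance_logps, consistency_logps)]
    return softmax(joint)


def kl_divergence(target: List[float], predicted: List[float], eps: float = 1e-12) -> float:
    """KL(target || predicted), used here as a stand-in training loss."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Hypothetical numbers for three candidate documents.
relevance_logps = [-2.0, -5.0, -4.0]     # how well each document matches the question
consistency_logps = [-1.0, -6.0, -1.5]   # how well each document supports the answer
retriever_scores = [0.2, 0.9, 0.4]       # untrained retriever similarity scores

target = pseudo_label_distribution(relevance_logps, consistency_logps)
predicted = softmax(retriever_scores)
loss = kl_divergence(target, predicted)
```

In this toy case the soft labels favor the first document while the retriever favors the second, so the loss is positive; minimizing it would pull the retriever's ranking toward the LLM-derived labels without any human-labeled documents.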

Submitted to arXiv on 27 May. 2025


Results of the summarizing process for the arXiv paper: 2505.21250v1

In multi-hop question answering (MHQA), dense retrievers typically outperform sparse methods like BM25 by leveraging semantic embeddings. However, fine-tuning them requires labeled query-document pairs, which are hard to obtain in MHQA because queries vary considerably across reasoning steps. To address this limitation, the authors introduce Retriever Supervision with Consistency and Relevance (ReSCORE). ReSCORE uses large language models to capture each document's relevance to the question and its consistency with the correct answer. These signals make it possible to train a retriever within an iterative question-answering framework without labeled documents. Experiments on three MHQA benchmarks demonstrate the effectiveness of ReSCORE, with significant improvements in retrieval performance and, in turn, state-of-the-art MHQA performance.

An iterative retrieval-augmented generation (RAG) approach is commonly employed in MHQA systems: relevant documents are retrieved iteratively to generate partial answers until a final answer is reached. While sparse retrievers like BM25 are frequently used in these systems, dense retrievers such as Contriever have been recognized as more effective overall because they rely on query and document embeddings trained specifically for the target domain. Despite this advantage, training dense retrievers for MHQA can be labor-intensive and costly, since it needs labeled documents that reflect relevance at each iteration. ReSCORE offers a solution by leveraging large language models to streamline this process and improve retrieval accuracy without requiring labeled data. The implementation of ReSCORE is publicly available at https://leeds1219.github.io/ReSCORE.
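
The iterative retrieve-then-answer loop described in the summary can be sketched as follows. This is only an illustration under stated assumptions: the corpus, the word-overlap retriever, and the rule-based `toy_generate` function are hypothetical stand-ins for a dense retriever and a large language model, and the `iterative_qa` interface is invented for this example rather than taken from the ReSCORE implementation.

```python
import re
from typing import Callable, List, Optional, Tuple

# Toy corpus, invented for illustration only.
CORPUS = [
    "The film Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
]


def tokens(text: str) -> set:
    """Lowercased alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_retrieve(query: str, k: int = 1) -> List[str]:
    """Stand-in retriever: rank documents by word overlap with the query."""
    ranked = sorted(CORPUS, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)
    return ranked[:k]


def toy_generate(question: str, context: List[str]) -> Tuple[bool, str]:
    """Stand-in for an LLM: returns (is_final, text). When not final,
    the text is a reformulated query for the next hop."""
    joined = " ".join(context)
    if "born in" in joined:
        return True, "London"
    if "directed by" in joined:
        return False, "Where was Christopher Nolan born?"
    return False, question


def iterative_qa(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], Tuple[bool, str]],
    max_hops: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retrieve, generate a partial answer, and repeat until a final answer."""
    query = question
    context: List[str] = []
    for _ in range(max_hops):
        for doc in retrieve(query):
            if doc not in context:  # avoid adding the same document twice
                context.append(doc)
        is_final, text = generate(question, context)
        if is_final:
            return text, context
        query = text  # the partial output becomes the next query
    return None, context


answer, context = iterative_qa(
    "Where was the director of the film Inception born?",
    toy_retrieve,
    toy_generate,
)
```

The second-hop query differs from the original question, which is the query variability the summary refers to: a human-labeled (question, document) pair for the first hop says nothing about which document is relevant to the reformulated query.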
Created on 02 Feb. 2026

