RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

AI-generated keywords: RankT5

AI-generated Key Points

  • Limited studies on leveraging T5 for text ranking
  • Existing approaches treat text ranking as a classification problem
  • RankT5 introduces two T5-based ranking model structures: encoder-decoder and encoder-only
  • Models directly output ranking scores and can be fine-tuned with pairwise or listwise ranking losses
  • Experiments show significant improvements in ranking performance across various datasets
  • RankT5 fine-tuned with listwise ranking losses shows better zero-shot performance on out-of-domain datasets than the model fine-tuned with classification losses
  • Focuses on short document or passage ranking tasks, not long document ranking such as the MS MARCO document ranking task
  • Uses the Natural Questions (NQ) dataset, with over 50,000 queries in the training partition and 8,000 in the development partition
  • Preprocessing setup similar to previous work; a dual-encoder retriever fine-tuned on NQ retrieves the top 1000 passages for each query
  • Training data constructed by selecting one positive document per query and randomly sampling negative examples from other queries
  • Maximum sequence length set to 128

Authors: Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma, Jing Lu, Jianmo Ni, Xuanhui Wang, Michael Bendersky

13 pages
License: CC BY 4.0

Abstract: Recently, substantial progress has been made in text ranking based on pretrained language models such as BERT. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually formulate text ranking as classification and rely on postprocessing to obtain a ranked list. In this paper, we propose RankT5 and study two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they not only can directly output ranking scores for each query-document pair, but also can be fine-tuned with "pairwise" or "listwise" ranking losses to optimize ranking performances. Our experiments show that the proposed models with ranking losses can achieve substantial ranking performance gains on different public text ranking data sets. Moreover, when fine-tuned with listwise ranking losses, the ranking model appears to have better zero-shot ranking performance on out-of-domain data sets compared to the model fine-tuned with classification losses.

Submitted to arXiv on 12 Oct. 2022


Results of the summarizing process for the arXiv paper: 2210.10634v1

The paper titled "RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses" addresses the issue of limited studies on leveraging powerful sequence-to-sequence models like T5 for text ranking. While recent progress has been made in text ranking using pretrained language models such as BERT, there is a lack of exploration on how to utilize models like T5 effectively. Existing approaches often treat text ranking as a classification problem and rely on postprocessing techniques to obtain a ranked list.

In this paper, the authors propose RankT5, which introduces two T5-based ranking model structures: an encoder-decoder model and an encoder-only model. These models not only directly output ranking scores for each query-document pair but also allow fine-tuning with "pairwise" or "listwise" ranking losses to optimize ranking performance.

The experiments conducted by the authors demonstrate that the proposed models with ranking losses achieve significant improvements in ranking performance across various public text ranking datasets. Additionally, when fine-tuned with listwise ranking losses, the RankT5 model exhibits better zero-shot ranking performance on out-of-domain datasets compared to the model fine-tuned with classification losses.

It is important to note that this paper focuses on short document or passage ranking tasks rather than long document rankings like the MS MARCO document ranking task. The authors use the Natural Questions (NQ) dataset, which consists of over 50,000 queries in the training partition and 8,000 queries in the development partition. They adopt a preprocessing setup similar to previous work and employ a dual-encoder retriever fine-tuned on NQ to retrieve the top 1000 passages for each query. To construct the training data, they select one document with label 1 for each query and randomly sample (m - 1) documents from other queries as negative examples. The maximum sequence length is set to 128.
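
The training-list construction described in this summary can be sketched as follows. This is a minimal illustration that takes the summary's wording literally, not the authors' code: the function name `build_training_list`, the `positives` dictionary layout, and the use of Python's `random` module are all assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple


def build_training_list(
    query_id: str,
    positives: Dict[str, List[str]],
    m: int,
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Return m (passage, label) pairs for one query.

    `positives` is a hypothetical mapping from query id to the passages
    labeled 1 for that query. One positive is kept for `query_id`, and
    (m - 1) negatives are drawn from passages attached to other queries.
    """
    positive = rng.choice(positives[query_id])
    # Candidate negatives: every passage that belongs to a different query.
    pool = [
        doc
        for other_id, docs in positives.items()
        if other_id != query_id
        for doc in docs
    ]
    # random.Random.sample raises ValueError if the pool has fewer than m - 1 items.
    negatives = rng.sample(pool, m - 1)
    return [(positive, 1)] + [(doc, 0) for doc in negatives]


example = build_training_list(
    "q1",
    {"q1": ["a"], "q2": ["b", "c"], "q3": ["d"]},
    m=3,
    rng=random.Random(0),
)
```

Each resulting list holds exactly one passage with label 1, which is the shape a pairwise or listwise loss needs in order to compare a positive against its negatives.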
Overall, this paper presents RankT5 as a novel approach to text ranking using T5-based models and demonstrates its effectiveness through experiments on various datasets. The findings suggest that fine-tuning with ranking losses can significantly enhance ranking performance, particularly when utilizing listwise ranking losses for zero-shot ranking on out-of-domain data.
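
To make the difference between the two loss families concrete, here is a small sketch of one common pairwise loss (pairwise logistic) and one common listwise loss (softmax cross-entropy) computed over the scores of a single query's candidate list. The summary does not state which exact formulations the authors use, so these two choices, the function names, and the plain-Python implementation are assumptions for illustration only.

```python
import math
from typing import Sequence


def pairwise_logistic_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Sum of log(1 + exp(s_j - s_i)) over all pairs with labels[i] > labels[j]."""
    loss = 0.0
    for s_i, y_i in zip(scores, labels):
        for s_j, y_j in zip(scores, labels):
            if y_i > y_j:
                # Penalize a less relevant item j scoring close to or above item i.
                loss += math.log1p(math.exp(s_j - s_i))
    return loss


def listwise_softmax_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Softmax cross-entropy over the whole list: -sum_i y_i * log softmax(s)_i."""
    max_s = max(scores)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_z = max_s + math.log(sum(math.exp(s - max_s) for s in scores))
    return -sum(y * (s - log_z) for s, y in zip(scores, labels))


labels = [1, 0, 0]
good = [2.0, 0.5, -1.0]  # positive passage scored highest
bad = [-1.0, 0.5, 2.0]   # positive passage scored lowest
assert pairwise_logistic_loss(good, labels) < pairwise_logistic_loss(bad, labels)
assert listwise_softmax_loss(good, labels) < listwise_softmax_loss(bad, labels)
```

Both functions depend only on how the scores within one list compare to each other, which is what distinguishes them from a classification loss applied to each query-document pair in isolation.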
Created on 30 Jun. 2023

