Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions

AI-generated keywords: Contrast consistency, OpenQA, Dense Passage Retriever, Query-side Contrastive Loss, Data Augmentation

AI-generated Key Points

  • Contrast consistency, a model's ability to make consistently correct predictions in the presence of perturbations, is an essential property in natural language processing.
  • This has been studied in tasks such as sentiment analysis and reading comprehension, but not in open-domain question answering (OpenQA).
  • Collecting perturbed questions that satisfy factuality requirements is difficult, so researchers collected minimally edited questions as challenging contrast sets to evaluate OpenQA models.
  • The widely used dense passage retriever (DPR) performed poorly on these contrast sets despite fitting the training set well and performing competitively on standard test sets.
  • To improve DPR training, the researchers introduced a simple and effective query-side contrastive loss with data augmentation.
  • Experiments on the contrast sets demonstrated that DPR's contrast consistency improved without sacrificing its accuracy on standard test sets.
  • For future research, a set of candidate minimally edited questions was also generated from a large corpus using antonym edits, word additions or removals, and other techniques.

Authors: Zhihan Zhang, Wenhao Yu, Zheng Ning, Mingxuan Ju, Meng Jiang

Accepted at TACL. This is a pre-MIT Press publication version.
License: CC BY 4.0

Abstract: Contrast consistency, the ability of a model to make consistently correct predictions in the presence of perturbations, is an essential aspect in NLP. While studied in tasks such as sentiment analysis and reading comprehension, it remains unexplored in open-domain question answering (OpenQA) due to the difficulty of collecting perturbed questions that satisfy factuality requirements. In this work, we collect minimally edited questions as challenging contrast sets to evaluate OpenQA models. Our collection approach combines both human annotation and large language model generation. We find that the widely used dense passage retriever (DPR) performs poorly on our contrast sets, despite fitting the training set well and performing competitively on standard test sets. To address this issue, we introduce a simple and effective query-side contrastive loss with the aid of data augmentation to improve DPR training. Our experiments on the contrast sets demonstrate that DPR's contrast consistency is improved without sacrificing its accuracy on the standard test sets.
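The notion of contrast consistency described in the abstract can be illustrated with a small sketch. The abstract does not give the exact metric, so this assumes one plausible reading: a pair counts as consistent only when both the original question and its minimally edited counterpart are answered correctly. The function name `contrast_consistency`, the exact-match comparison, and the toy questions and predictor are all illustrative assumptions, not taken from the paper.

```python
from typing import Callable, List, Tuple

# (question, gold_answer); hypothetical data layout for illustration only.
QA = Tuple[str, str]


def contrast_consistency(pairs: List[Tuple[QA, QA]],
                         predict: Callable[[str], str]) -> float:
    """Fraction of pairs where BOTH the original and the edited question
    are answered correctly (assumed definition, exact match after
    lowercasing and stripping whitespace)."""
    if not pairs:
        return 0.0
    consistent = 0
    for (q_orig, a_orig), (q_edit, a_edit) in pairs:
        ok_orig = predict(q_orig).strip().lower() == a_orig.strip().lower()
        ok_edit = predict(q_edit).strip().lower() == a_edit.strip().lower()
        if ok_orig and ok_edit:
            consistent += 1
    return consistent / len(pairs)


# Toy contrast set: each edited question differs by one word but has a
# different answer.
toy_pairs = [
    (("What is the capital of Australia?", "Canberra"),
     ("What is the largest city of Australia?", "Sydney")),
    (("What is the boiling point of water in Celsius?", "100"),
     ("What is the freezing point of water in Celsius?", "0")),
]

# Toy predictor that ignores the edit in the first pair, mimicking a
# model that is insensitive to small but meaning-changing edits.
toy_answers = {
    "What is the capital of Australia?": "Canberra",
    "What is the largest city of Australia?": "Canberra",  # wrong
    "What is the boiling point of water in Celsius?": "100",
    "What is the freezing point of water in Celsius?": "0",
}


def toy_predict(question: str) -> str:
    return toy_answers.get(question, "")


score = contrast_consistency(toy_pairs, toy_predict)
```

In this toy setup the predictor answers three of the four questions correctly, yet only one of the two pairs is consistent, which is the kind of gap between standard accuracy and contrast consistency that the abstract reports for DPR.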

Submitted to arXiv on 23 May 2023

Results of the summarizing process for the arXiv paper: 2305.14441v1

In natural language processing, contrast consistency is a model's ability to make consistently correct predictions in the presence of perturbations. While this has been studied in tasks such as sentiment analysis and reading comprehension, it remains unexplored in open-domain question answering (OpenQA) due to the difficulty of collecting perturbed questions that satisfy factuality requirements. To address this issue, researchers collected minimally edited questions as challenging contrast sets to evaluate OpenQA models using a combination of human annotation and large language model generation. The widely used dense passage retriever (DPR) was found to perform poorly on these contrast sets despite fitting the training set well and performing competitively on standard test sets. To improve DPR training, the researchers introduced a simple and effective query-side contrastive loss with data augmentation. Experiments on the contrast sets demonstrated that DPR's contrast consistency improved without sacrificing its accuracy on standard test sets. For future research in this area, the researchers also generated a set of candidate minimally edited questions from a large corpus using antonym edits, word additions or removals, and other techniques.
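The summary names a query-side contrastive loss but does not state its form. The following is a minimal sketch of one common way such a loss is written, an InfoNCE-style objective over question embeddings. It assumes that positives are augmented variants of the question with the same answer and that negatives are minimally edited questions with a different answer; the function names, the dot-product similarity, and the `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def query_side_contrastive_loss(anchor: Sequence[float],
                                positive: Sequence[float],
                                negatives: List[Sequence[float]],
                                temperature: float = 1.0) -> float:
    """InfoNCE-style loss on question embeddings (assumed form).

    Pulls the anchor question toward its positive (e.g. an augmented
    question with the same answer) and pushes it away from negatives
    (e.g. minimally edited questions with a different answer).
    """
    pos_logit = dot(anchor, positive) / temperature
    neg_logits = [dot(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive.
    return log_denom - pos_logit


# Toy 2-d embeddings: the anchor matches the positive and is orthogonal
# to the negative, so the loss should be small.
loss_separated = query_side_contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])

# Same anchor, but now the negative looks like the anchor and the
# positive does not, so the loss should be larger.
loss_confused = query_side_contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Under these assumptions, the loss is low when the question encoder already separates a question from its minimally edited counterpart and high when it maps them to nearly the same embedding. In a real training setup a term like this would presumably be combined with the retriever's usual passage-ranking objective.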
Created on 25 Jun. 2023
