Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions
AI-generated Key Points
- Contrast consistency, a model's ability to make consistently correct predictions in the presence of perturbations, is an essential property in natural language processing.
- This has been studied in tasks such as sentiment analysis and reading comprehension, but not in open-domain question answering (OpenQA).
- Collecting perturbed questions that satisfy factuality requirements is difficult, so researchers collected minimally edited questions as challenging contrast sets to evaluate OpenQA models.
- The widely used dense passage retriever (DPR) performed poorly on these contrast sets despite fitting the training set well and performing competitively on standard test sets.
- To improve DPR training, the researchers introduced a simple and effective query-side contrastive loss with data augmentation.
- Experiments on the contrast sets demonstrated that DPR's contrast consistency improved without sacrificing its accuracy on standard test sets.
- Candidate minimally edited questions were generated from a large corpus by applying antonym edits, adding or removing words, and other techniques, and are intended to support future research.
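The key points describe contrast consistency only in words. A minimal sketch of one way to score it follows, assuming each original question is paired with one minimally edited question and that a pair counts as consistent only when both are answered correctly. The function name `contrast_consistency` and the pair-level scoring rule are illustrative assumptions, not necessarily the metric used in the paper.

```python
from typing import List, Tuple


def contrast_consistency(pairs: List[Tuple[bool, bool]]) -> float:
    """Fraction of (original, edited) question pairs answered correctly on both sides.

    Each tuple holds two flags: whether the model answered the original
    question correctly, and whether it answered the minimally edited
    question correctly. This pair-level rule is an assumption made for
    illustration.
    """
    if not pairs:
        return 0.0
    both_correct = sum(1 for orig_ok, edit_ok in pairs if orig_ok and edit_ok)
    return both_correct / len(pairs)


# Hypothetical evaluation results for four question pairs.
results = [(True, True), (True, False), (False, False), (True, True)]
score = contrast_consistency(results)
```

A model can score well on the original questions alone and still score low here, which is the kind of gap the key points report for DPR.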
Authors: Zhihan Zhang, Wenhao Yu, Zheng Ning, Mingxuan Ju, Meng Jiang
Abstract: Contrast consistency, the ability of a model to make consistently correct predictions in the presence of perturbations, is an essential aspect in NLP. While studied in tasks such as sentiment analysis and reading comprehension, it remains unexplored in open-domain question answering (OpenQA) due to the difficulty of collecting perturbed questions that satisfy factuality requirements. In this work, we collect minimally edited questions as challenging contrast sets to evaluate OpenQA models. Our collection approach combines both human annotation and large language model generation. We find that the widely used dense passage retriever (DPR) performs poorly on our contrast sets, despite fitting the training set well and performing competitively on standard test sets. To address this issue, we introduce a simple and effective query-side contrastive loss with the aid of data augmentation to improve DPR training. Our experiments on the contrast sets demonstrate that DPR's contrast consistency is improved without sacrificing its accuracy on the standard test sets.
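The abstract names a query-side contrastive loss but does not give its form. The sketch below shows one common formulation (an InfoNCE-style loss over question embeddings) under stated assumptions: the positive is an augmented version of the same question, the negatives are minimally edited questions with different answers, and similarity is a dot product. The function `query_contrastive_loss`, its parameters, and the temperature default are hypothetical and may differ from what the authors implemented.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_contrastive_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """InfoNCE-style loss computed over question embeddings only.

    anchor:    embedding of the original question
    positive:  embedding of an augmented question with the same answer (assumed)
    negatives: embeddings of minimally edited questions with different answers (assumed)
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Negative log-probability assigned to the positive.
    return log_denominator - logits[0]


# Toy 2-d embeddings: the positive matches the anchor, the negative does not.
loss = query_contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
```

Minimizing a loss of this shape pulls the original question toward its augmented version and pushes it away from the edited questions in embedding space. In practice it would be added to the retriever's usual training objective, not used in place of it.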