BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

AI-generated keywords: Defeasible Reasoning

AI-generated Key Points

Automated reasoning with unstructured natural text is important for many NLP and AI applications.
Existing evaluation assumes consistent information, but real-world information is often inconsistent or contradictory.
Google Research developed the BoardgameQA dataset to measure LM reasoning capacity in the presence of conflicting input sources.
BoardgameQA incorporates reasoning with implicit background knowledge and scenarios where additional information needs to come from the model itself.
LMs perform poorly when reasoning with conflicting inputs, and smaller models also perform poorly when additional knowledge from the model is needed.
One widely applicable way to resolve conflicts is to impose preferences over information sources (e.g., by credibility or recency) and adopt the source with higher preference.
BoardgameQA highlights an important gap in current LMs' understanding capacity and can guide future work to improve their understanding ability under this setup.

Authors: Mehran Kazemi, Quan Yuan, Deepti Bhatia, Najoung Kim, Xin Xu, Vaiva Imbrasaite, Deepak Ramachandran

arXiv: 2306.07934v1 (cs.CL)

License: CC BY 4.0

Abstract: Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real-world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely-applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.

Submitted to arXiv on 13 Jun. 2023

Results of the summarizing process for the arXiv paper: 2306.07934v1

Automated reasoning with unstructured natural text is a crucial requirement for many potential applications of Natural Language Processing (NLP) and for developing robust Artificial Intelligence (AI) systems. While Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. In the real world, however, available information is frequently inconsistent or contradictory, so models need a strategy to resolve such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources, based on source credibility or recency, and adopt the source with higher preference.

To address this issue, researchers at Google Research formulated the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning and developed a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA incorporates reasoning with implicit background knowledge to better reflect reasoning problems in downstream applications. The dataset includes scenarios where the knowledge required for reasoning is only partially provided as input and additional information needs to come from the model itself.

The researchers benchmarked various LMs on BoardgameQA and observed that LMs perform poorly when reasoning with conflicting inputs. In the case of smaller models, performance was also poor when additional knowledge from the LM was needed. Since reasoning over contradicting and incomplete sets of information is a common scenario in real-world applications, these results highlight an important gap in the reasoning capacity of current LMs.

The work spans three dimensions: text-based logical reasoning, reasoning with conflicting sources, and reasoning with incomplete information. Earlier works on natural language logical reasoning finetuned LMs to directly provide answers to logical questions. Later work showed that explicitly generating the entire proof leads to substantial improvements in both accuracy and interpretability.

In conclusion, BoardgameQA provides a valuable resource for measuring the natural language reasoning ability of LMs in the presence of conflicting input sources. The dataset highlights an important gap in the reasoning capacity of current LMs and can guide future work, either on methodology that improves their reasoning ability under this setup or on alternative formulations of conflict resolution that better suit LMs.

Summary: This is about computers that can read and understand words like people do. Sometimes the information they get is not clear or one piece of it disagrees with another, but we want them to still be able to figure things out. Researchers at Google made a test for computers, called BoardgameQA, to see how well they can understand confusing information. The test also checks whether the computer can use what it already knows to solve problems. It showed that computers need more help with this kind of thinking.

Definitions
- Automated reasoning: When a computer uses its own knowledge and rules to solve problems.
- Unstructured natural text: Words that are written or spoken in a way that is similar to how people talk, without following strict rules.
- NLP (Natural Language Processing): A type of technology that helps computers understand human language.
- AI (Artificial Intelligence): When a computer can do things that usually require human intelligence, like learning and problem-solving.
- LM (Language Model): A type of program used in NLP and AI that helps computers understand language by predicting what words come next based on what it has learned before.

Automated Reasoning with Unstructured Natural Text: A Closer Look at BoardgameQA

Natural Language Processing (NLP) and Artificial Intelligence (AI) systems have become increasingly important in our daily lives. As such, automated reasoning with unstructured natural text is a crucial requirement for many potential applications of NLP and AI. While Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. In the real world, however, available information is frequently inconsistent or contradictory, so models need a strategy for resolving such conflicts when they arise. To address this issue, researchers at Google Research formulated the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning and developed a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. In this article, we take a closer look at BoardgameQA and explore how it can be used to measure LM reasoning under conflicting input sources.

What is BoardgameQA?

BoardgameQA incorporates reasoning with implicit background knowledge to better reflect real-world reasoning problems. The dataset includes scenarios where the knowledge required for reasoning is only partially provided as input and the rest needs to come from the model itself. This makes it suited to evaluating LM reasoning both over incomplete information and over contradicting inputs, where conflicts are resolved by preferring one source over another, for example based on credibility or recency.
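To make the idea of preference-guided conflict resolution concrete, here is a minimal sketch in Python. It is not the dataset's format or the authors' method: the `Claim` class, the `resolve` function, the source names, and the integer `preference` scores are all hypothetical names chosen for illustration. The sketch only shows the general principle of adopting the conclusion of the more preferred source when two sources disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    statement: str   # the proposition the source talks about
    holds: bool      # True if the source asserts it, False if it denies it
    source: str      # hypothetical source name
    preference: int  # higher means more preferred (e.g. more credible or recent)


def resolve(claims, statement):
    """Return the verdict of the most preferred source, or None if undecided."""
    relevant = [c for c in claims if c.statement == statement]
    if not relevant:
        return None  # no source says anything about this statement
    top = max(c.preference for c in relevant)
    verdicts = {c.holds for c in relevant if c.preference == top}
    # Equally preferred sources that disagree leave the conflict unresolved.
    return verdicts.pop() if len(verdicts) == 1 else None


# Two hypothetical sources contradict each other; the newer one is preferred.
claims = [
    Claim("the dog attacks the cat", True, "old_rulebook", preference=1),
    Claim("the dog attacks the cat", False, "updated_rulebook", preference=2),
]
print(resolve(claims, "the dog attacks the cat"))  # prints False
```

Returning `None` for missing or tied evidence is a design choice of this sketch: it keeps "cannot be decided" separate from "false", which matters whenever the available information is incomplete.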

Benchmarking Results

The researchers benchmarked various LMs on BoardgameQA and observed that LMs perform poorly when reasoning with conflicting inputs. For smaller models, performance was also poor when additional knowledge from the LM was needed. According to the paper's abstract, finetuning improves performance, but it nevertheless remains poor.
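One way to surface this kind of gap is to report accuracy separately for examples with and without conflicting inputs. The sketch below is only an illustration under stated assumptions: the slice names, the label strings, and the toy records are hypothetical and are not results or data from the paper.

```python
from collections import defaultdict


def accuracy_by_slice(examples):
    """Compute accuracy separately for each value of the 'slice' field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ex in examples:
        totals[ex["slice"]] += 1
        correct[ex["slice"]] += int(ex["predicted"] == ex["gold"])
    return {name: correct[name] / totals[name] for name in totals}


# Toy, made-up records purely to exercise the function.
examples = [
    {"slice": "no_conflict", "gold": "proved", "predicted": "proved"},
    {"slice": "no_conflict", "gold": "disproved", "predicted": "disproved"},
    {"slice": "conflict", "gold": "disproved", "predicted": "proved"},
    {"slice": "conflict", "gold": "proved", "predicted": "proved"},
]
print(accuracy_by_slice(examples))
```

Splitting the score this way keeps a strong result on conflict-free examples from hiding weak performance on the conflicting ones.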

Conclusion

In conclusion, BoardgameQA provides a valuable resource for measuring the natural language reasoning ability of LMs in challenging settings that involve conflicting input sources or incomplete information requiring additional knowledge from the model itself. The dataset highlights an important gap in the reasoning capacity of current LMs and can guide future work, either on methodology that improves their reasoning ability under these setups or on alternative formulations of conflict resolution that better suit LMs.

Created on 14 Jun. 2023
