Self-Consistency Improves Chain of Thought Reasoning in Language Models

AI-generated keywords: Self-consistency, Reasoning Accuracy, Diverse Outputs, Ensemble Results, Chain of Thought

AI-generated Key Points

  • Proposes self-consistency, a method to improve the reasoning accuracy of large language models
  • Tasks requiring deliberate thinking often admit multiple ways to arrive at the correct answer
  • This process is simulated by sampling a diverse set of outputs from the model's decoder, each representing a different reasoning path
  • Hypothesis: correct reasoning processes tend to agree more often in their final answer than incorrect ones
  • Implementation: prompt the model with manually written chain of thought exemplars, then sample multiple candidate outputs for diversity
  • Results are ensembled by selecting the most consistent answer among the generated answers
  • Experiments showed substantial improvements over chain of thought prompting with a single generated path
  • Self-consistency improved accuracy across a range of arithmetic and commonsense reasoning benchmarks
  • The approach draws on the natural diversity of human thinking processes and applies it to language models

Authors: Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou

License: CC BY 4.0

Abstract: We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models. The idea is to sample a diverse set of outputs from a language model and return the most consistent answer in the set. Such ensembling method improves reasoning accuracy when combined with chain of thought prompting. For arithmetic and commonsense reasoning benchmarks we find that self-consistency yields significant accuracy improvements in a variety of datasets, such as GSM8K (+10%), SVAMP (+14%), MultiArith (+24%), CommonsenseQA (+5%) and ARC (easy +4%, challenge +5%).
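"Return the most consistent answer in the set" can be read as a majority vote over the sampled final answers. As a sketch in notation introduced here for illustration (m sampled outputs with final answers a_1, ..., a_m; these symbols do not appear in the text above):

```latex
\hat{a} \;=\; \arg\max_{a} \sum_{i=1}^{m} \mathbb{1}\left[a_i = a\right]
```

Here the indicator equals 1 when the i-th sampled answer matches the candidate a and 0 otherwise, so \hat{a} is the answer reached by the largest number of samples.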

Submitted to arXiv on 21 Mar. 2022

Results of the summarizing process for the arXiv paper: 2203.11171v1

We propose self-consistency, a method to improve the reasoning accuracy of large language models. We observe that in tasks requiring deliberate thinking, there are often multiple ways to arrive at the correct answer. To simulate this process in language models, we sample a diverse set of outputs from the model's decoder. Each output represents a different reasoning path toward a final answer. Some of these paths may contain mistakes and reach a wrong answer, but we hypothesize that correct reasoning processes tend to agree in their final answer more often than incorrect ones do.

To implement self-consistency, we first prompt the language model with a set of manually written chain of thought exemplars. Then, we sample a set of candidate outputs from the model's decoder, which introduces diversity in the generated reasoning paths. Finally, we ensemble the results by selecting the most consistent answer, that is, the final answer reached by the largest number of sampled paths.

In our experimental investigation, we combine chain of thought prompting with self-consistency and demonstrate substantial improvements compared to using chain of thought alone with a single generated path. Self-consistency yields significant accuracy improvements on arithmetic and commonsense reasoning benchmarks: GSM8K (+10%), SVAMP (+14%), MultiArith (+24%), CommonsenseQA (+5%) and ARC (easy +4%, challenge +5%). The approach is motivated by the natural diversity of human thinking processes and applies it to language models by ensembling diverse reasoning paths.
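The three steps described above (prompt, sample several reasoning paths, select the most frequent answer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `extract_answer` and `self_consistency` are hypothetical, the "The answer is X." ending is an assumed output convention, and the language model is replaced by canned strings so the example runs on its own.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(reasoning: str) -> Optional[str]:
    """Pull the final answer out of one sampled reasoning path.

    Assumption: each path ends with 'The answer is X.' (hypothetical format).
    Returns None when no such ending is found.
    """
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", reasoning.strip())
    return match.group(1) if match else None


def self_consistency(sample_path: Callable[[], str], num_paths: int = 10) -> Optional[str]:
    """Sample `num_paths` reasoning paths and return the most frequent answer.

    `sample_path` stands in for one stochastic call to a language model that
    has already been prompted with chain of thought exemplars.
    """
    answers = []
    for _ in range(num_paths):
        answer = extract_answer(sample_path())
        if answer is not None:  # skip paths with no parseable final answer
            answers.append(answer)
    if not answers:
        return None
    # Majority vote: the answer reached by the most sampled paths wins.
    return Counter(answers).most_common(1)[0][0]


# Canned outputs standing in for sampled model generations (illustrative only).
canned = iter([
    "Roger starts with 5 balls. 2 cans of 3 is 6. 5 + 6 = 11. The answer is 11.",
    "He buys 2 * 3 = 6 balls and had 5, so 11 in total. The answer is 11.",
    "5 + 2 + 3 = 10, plus 2 more is 12. The answer is 12.",
    "I am not sure how many balls are in a can.",
    "6 new balls plus 5 old balls. The answer is 11.",
])
result = self_consistency(lambda: next(canned), num_paths=5)
print(result)
```

In a real setting `sample_path` would call a model with a non-zero sampling temperature so that repeated calls produce different reasoning paths; that call is deliberately left out here because no specific model API is named in the text.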
Created on 27 Sep. 2023
