Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

AI-generated keywords: Chain-of-thought prompting

AI-generated Key Points

  • Recent advancements in natural language processing have been driven by the development of large language models, which have shown improved performance and sample efficiency.
  • Scaling up model size alone has not been sufficient for achieving high performance on challenging tasks such as arithmetic, commonsense, and symbolic reasoning.
  • The paper explores how the reasoning ability of large language models can be unlocked by a simple method called chain-of-thought prompting.
  • The method is motivated by two ideas: arithmetic reasoning benefits from generating natural language rationales that lead to the final answer, and large language models can learn a task in context from a few exemplars supplied in the prompt.
  • Ling et al. (2017) pioneered the idea of using natural language rationales to solve math word problems through a series of intermediate steps, while Cobbe et al. (2021) extended this work by creating a larger dataset and using it to finetune a pretrained language model rather than training a model from scratch.
  • In program synthesis, Nye et al. (2021) leveraged language models to predict the final outputs of Python programs via step-by-step prediction.
  • Chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning tasks such as arithmetic, commonsense, and symbolic reasoning.
  • Experiments on three large language models show that providing a few chain-of-thought demonstrations as exemplars in the prompt improves performance; a 540B-parameter model prompted with just eight such exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
  • Chain-of-thought prompting is presented as a simple and broadly applicable method for enhancing reasoning in language models.
  • The broadening range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.

Authors: Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

License: CC BY 4.0

Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

Submitted to arXiv on 28 Jan. 2022

Results of the summarizing process for the arXiv paper: 2201.11903v6

Recent advancements in natural language processing have been driven by the development of large language models, which have shown improved performance and sample efficiency. However, scaling up model size alone has not been sufficient for achieving high performance on challenging tasks such as arithmetic, commonsense, and symbolic reasoning. This paper explores how the reasoning ability of large language models can be unlocked by a simple method called chain-of-thought prompting.

The method is motivated by two ideas. First, arithmetic reasoning benefits from generating natural language rationales that lead to the final answer. Second, large language models can learn a task in context from a few exemplars supplied in the prompt. The paper draws on related work in several research areas, including the use of intermediate steps to solve reasoning problems and recent work on prompting. Ling et al. (2017) pioneered the idea of using natural language rationales to solve math word problems through a series of intermediate steps, while Cobbe et al. (2021) extended this work by creating a larger dataset and using it to finetune a pretrained language model rather than training a model from scratch. In program synthesis, Nye et al. (2021) leveraged language models to predict the final outputs of Python programs via step-by-step prediction.

The paper shows that chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning tasks such as arithmetic, commonsense, and symbolic reasoning. Experiments on three large language models show that providing a few chain-of-thought demonstrations as exemplars in the prompt improves performance across these tasks. In particular, a 540B-parameter model prompted with just eight such exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

In conclusion, this paper presents chain-of-thought prompting as a simple and broadly applicable method for enhancing reasoning in language models. The method unlocks emergent properties of model scale that allow sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves. The broadening range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.
Created on 02 May. 2023
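
The chain-of-thought prompting described in the summary above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `Exemplar`, `build_prompt`, and `extract_answer` are hypothetical, the "Q:/A:" layout and the closing phrase "The answer is X." are assumed formatting conventions, and the exemplar text is an illustrative math word problem. The sketch only builds the prompt and parses a completion; sending the prompt to a model is left out because the summary names no API.

```python
import re
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One few-shot demonstration (hypothetical container for this sketch)."""
    question: str
    rationale: str  # intermediate reasoning steps written in natural language
    answer: str


def build_prompt(exemplars, question, with_rationale=True):
    """Format exemplars followed by the new question.

    with_rationale=True  -> chain-of-thought prompt (rationale, then answer)
    with_rationale=False -> standard few-shot prompt (answer only)
    """
    blocks = []
    for ex in exemplars:
        final = f"The answer is {ex.answer}."
        completion = f"{ex.rationale} {final}" if with_rationale else final
        blocks.append(f"Q: {ex.question}\nA: {completion}")
    # The new question ends with an open "A:" for the model to continue.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion):
    """Return the last number following 'The answer is', or None if absent."""
    matches = re.findall(r"The answer is\s+(-?\d[\d,]*(?:\.\d+)?)", completion)
    return matches[-1].replace(",", "") if matches else None


# Illustrative exemplar; the wording is an assumption of this sketch.
EXEMPLARS = [
    Exemplar(
        question=(
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        rationale=(
            "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11."
        ),
        answer="11",
    )
]
NEW_QUESTION = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

cot_prompt = build_prompt(EXEMPLARS, NEW_QUESTION)
standard_prompt = build_prompt(EXEMPLARS, NEW_QUESTION, with_rationale=False)
print(cot_prompt)
```

The only difference between the two prompts is whether each exemplar's answer is preceded by its rationale, which is the contrast the paper evaluates. In practice the abstract reports using eight exemplars rather than the single one shown here.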
