Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

AI-generated keywords: Chain-of-thought prompting

AI-generated Key Points

Recent advancements in natural language processing have been driven by the development of large language models, which have shown improved performance and sample efficiency.
Scaling up model size alone has not been sufficient for achieving high performance on challenging tasks such as arithmetic, commonsense, and symbolic reasoning.
The paper explores how the reasoning ability of large language models can be unlocked by a simple method called chain-of-thought prompting.
The method is motivated by two ideas: generating natural language rationales that lead to the final answer for arithmetic reasoning and in-context few-shot learning via prompting.
Ling et al. (2017) pioneered the idea of using natural language rationales to solve math word problems through a series of intermediate steps, while Cobbe et al. (2021) extended this work by creating a larger dataset and using it to finetune a pretrained language model rather than training a model from scratch.
In program synthesis, Nye et al. (2021) leveraged language models to predict the final outputs of Python programs via step-by-step prediction.
Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Prompting a 540B-parameter language model with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
Chain-of-thought prompting is presented as a simple and broadly applicable method for enhancing reasoning in language models.
The broadening range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.


Authors: Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

arXiv: 2201.11903v6 (cs.CL)

License: CC BY 4.0

Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

Submitted to arXiv on 28 Jan. 2022


Results of the summarizing process for the arXiv paper: 2201.11903v6

Comprehensive Summary

Recent advancements in natural language processing have been driven by the development of large language models, which have shown improved performance and sample efficiency. However, scaling up model size alone has not been sufficient for achieving high performance on challenging tasks such as arithmetic, commonsense, and symbolic reasoning. This paper explores how the reasoning ability of large language models can be unlocked by a simple method called chain-of-thought prompting, in which a few demonstrations containing intermediate reasoning steps are provided as exemplars in the prompt.

The method is motivated by two ideas: generating natural language rationales that lead to the final answer for arithmetic reasoning, and in-context few-shot learning via prompting. The paper draws inspiration from related work in several research areas, including the use of intermediate steps to solve reasoning problems and recent work on prompting. Ling et al. (2017) pioneered the idea of using natural language rationales to solve math word problems through a series of intermediate steps, while Cobbe et al. (2021) extended this work by creating a larger dataset and using it to finetune a pretrained language model rather than training a model from scratch. In program synthesis, Nye et al. (2021) leveraged language models to predict the final outputs of Python programs via step-by-step prediction.

Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. For instance, prompting a 540B-parameter language model with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

In conclusion, this paper presents chain-of-thought prompting as a simple and broadly applicable method for enhancing reasoning in language models. The method unlocks emergent properties of model scale that allow sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves. The broadening range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.


Layman's Summary

Recent advancements in language technology have made computers better at understanding and using human language. However, they still struggle with tasks like math and commonsense reasoning. A new method called chain-of-thought prompting helps these computers reason better by giving them a few worked examples that spell out the intermediate steps leading to the answer. The method has been tested on three language models and improves their performance on complex problems such as math word problems. This result may inspire more research on language-based approaches to reasoning.

Definitions
- Natural Language Processing: The ability of a computer to understand and use human language.
- Language models: Computer programs that are designed to understand and generate human-like language.
- Sample efficiency: The ability of a model to learn from fewer examples.
- Arithmetic: A branch of mathematics that deals with numbers and their operations.
- Commonsense reasoning: The ability to make logical deductions based on general knowledge about the world.
- Symbolic reasoning: The ability to manipulate symbols or abstract concepts according to rules or procedures.

Unlocking Reasoning in Language Models with Chain-of-Thought Prompting

In recent years, natural language processing (NLP) has seen tremendous advancements driven by the development of large language models. These models have achieved impressive performance and sample efficiency on various tasks such as sentiment analysis, question answering, and machine translation. However, scaling up model size alone has not been sufficient for achieving high performance on challenging tasks such as arithmetic, commonsense, and symbolic reasoning. This paper explores how the reasoning ability of large language models can be unlocked by a simple method called chain-of-thought prompting.

Motivation Behind Chain-of-Thought Prompting

The method is motivated by two ideas: generating natural language rationales that lead to the final answer for arithmetic reasoning, and in-context few-shot learning via prompting. The paper draws inspiration from related work in several research areas, including the use of intermediate steps to solve reasoning problems and recent work on prompting. Ling et al. (2017) pioneered the idea of using natural language rationales to solve math word problems through a series of intermediate steps, while Cobbe et al. (2021) extended this work by creating a larger dataset and using it to finetune a pretrained language model rather than training a model from scratch. In program synthesis, Nye et al. (2021) leveraged language models to predict the final outputs of Python programs via step-by-step prediction.
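To make the combination of these two ideas concrete, here is a minimal sketch of how a chain-of-thought prompt might be assembled: each few-shot exemplar pairs a question with a natural language rationale, and the new question is appended with an empty answer slot. The `Exemplar` class, the `build_cot_prompt` helper, the "Q:/A:" layout, the closing "The answer is ..." phrase, and the sample problems are all assumptions made for illustration; they are not taken from the paper's released prompts.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One few-shot demonstration (hypothetical structure for this sketch)."""
    question: str
    rationale: str  # intermediate reasoning steps in natural language
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Format exemplars as Q/A pairs whose answers spell out the reasoning."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The new question ends with a bare "A:" so the model can continue with
    # its own reasoning steps followed by a final answer.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Illustrative exemplar written for this sketch, not copied from the paper.
exemplars = [
    Exemplar(
        question="A shelf holds 3 boxes with 4 books each. 2 more books are added. "
                 "How many books are on the shelf?",
        rationale="3 boxes of 4 books is 3 * 4 = 12 books. "
                  "Adding 2 more gives 12 + 2 = 14.",
        answer="14",
    )
]

prompt = build_cot_prompt(
    exemplars, "A farmer has 5 pens with 6 sheep each. How many sheep are there?"
)
print(prompt)
```

The resulting string would be sent to a language model as-is; no finetuning is involved, which is what distinguishes this setup from the finetuning approach of Cobbe et al. (2021) described above.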

Experimental Results

The paper shows that chain-of-thought prompting significantly improves the ability of large language models to perform complex reasoning tasks such as arithmetic, commonsense, and symbolic reasoning. Experiments on three large language models show gains across these task types, and prompting a 540B-parameter language model with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
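Accuracy on a math word problem benchmark is typically computed from the final answer alone, so a generated chain of thought has to be reduced to that answer before scoring. The sketch below shows one way this could be done. It assumes completions end with a phrase of the form "The answer is N." and that a numeric exact match counts as correct; the `extract_answer` and `accuracy` helpers and the sample completions are hypothetical and are not the paper's evaluation code.

```python
import re
from typing import Optional

# Assumed convention: the completion states its result as "The answer is N."
ANSWER_PATTERN = re.compile(r"The answer is\s*\$?(-?[\d,]*\.?\d+)")


def extract_answer(completion: str) -> Optional[str]:
    """Return the last stated numeric answer, or None if none is found."""
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return None
    # Drop thousands separators so "1,234" compares equal to "1234".
    return matches[-1].replace(",", "")


def accuracy(completions: list[str], references: list[str]) -> float:
    """Fraction of completions whose extracted answer equals the reference."""
    correct = 0
    for completion, reference in zip(completions, references):
        predicted = extract_answer(completion)
        if predicted is not None and float(predicted) == float(reference):
            correct += 1
    return correct / len(references)


# Made-up completions for illustration only.
completions = [
    "5 pens of 6 sheep is 5 * 6 = 30. The answer is 30.",
    "I think it is 7. The answer is 8.",
    "No final statement here.",
]
references = ["30", "7", "3"]
score = accuracy(completions, references)
print(f"accuracy = {score:.3f}")
```

A completion with no recognizable final answer is counted as wrong here, which is a design choice of this sketch rather than something the summary specifies.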

Conclusion

In conclusion, this paper presents chain-of-thought prompting as a simple and broadly applicable method for enhancing reasoning in language models. The method unlocks emergent properties of model scale that allow sufficiently large language models to perform reasoning tasks that otherwise have flat scaling curves. The broadening range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.

Created on 02 May. 2023

