Solving math word problems with process- and outcome-based feedback

AI-generated keywords: Supervision methods, Language models, Math word problems, Outcome-based approaches, Process-based approaches

AI-generated Key Points

  • Study examines supervision methods for language models in solving math word problems
  • Researchers compare outcome-based and process-based approaches
  • Investigate both final-answer and reasoning errors
  • Experiments conducted on the GSM8K task
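  • Pure outcome-based supervision yields similar final-answer error rates with less label supervision
  • Correct reasoning steps require process-based supervision or learned reward models that emulate process-based feedback
  • Best results improve on prior work: final-answer error 16.8% → 12.7%, and reasoning error among final-answer-correct solutions 14.0% → 3.4%
  • Highlights the importance of process-based feedback when training language models for math problem solving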
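The contrast between the two supervision styles can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions, not the paper's code: the `Solution` class, the helper names, and the rule that every step from the first annotated mistake onward is labeled incorrect are all hypothetical. The toy solution reaches the right final answer through the wrong operations, so outcome-based supervision rewards it while step-level labels flag the error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Solution:
    steps: List[str]      # model-generated reasoning steps
    final_answer: str     # answer extracted from the last step


def outcome_labels(solution: Solution, reference_answer: str) -> List[int]:
    """Outcome-based supervision: a single label for the whole solution,
    based only on whether the final answer matches the reference."""
    return [int(solution.final_answer.strip() == reference_answer.strip())]


def process_labels(num_steps: int, first_error: Optional[int]) -> List[int]:
    """Process-based supervision: one label per reasoning step.
    Assumed convention (hypothetical): steps before the first annotated
    mistake are labeled 1; that step and every later step are labeled 0."""
    if first_error is None:
        return [1] * num_steps
    return [int(i < first_error) for i in range(num_steps)]


# Toy problem: "Tom has 3 boxes of 4 apples and eats 2. How many are left?"
# The correct reasoning is 3 * 4 = 12, then 12 - 2 = 10.
flawed = Solution(steps=["3 + 4 = 7 apples.", "7 + 3 = 10 apples left."],
                  final_answer="10")

print(outcome_labels(flawed, "10"))            # [1]: the final answer is right
print(process_labels(len(flawed.steps), 0))    # [0, 0]: the first step is wrong
```

The same solution thus receives a positive outcome label but negative step labels, which is the kind of final-answer-correct reasoning error the study measures.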

Authors: Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins

License: CC BY 4.0

Abstract: Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% → 12.7% final-answer error and 14.0% → 3.4% reasoning error among final-answer-correct solutions.
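The two error rates quoted in the abstract can be computed as in the sketch below. It is a hedged illustration: the `Graded` record and the function names are hypothetical, each solution is assumed to already carry a final-answer check and a human judgment of its reasoning, and the toy numbers are invented, not results from the paper. The reasoning error is measured only over solutions whose final answer is correct.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Graded:
    final_answer_correct: bool  # final answer matches the reference answer
    reasoning_correct: bool     # every step judged correct (e.g. by a human grader)


def final_answer_error(samples: List[Graded]) -> float:
    """Fraction of all solutions whose final answer is wrong."""
    return sum(not s.final_answer_correct for s in samples) / len(samples)


def reasoning_error_among_correct(samples: List[Graded]) -> float:
    """Fraction of final-answer-correct solutions that contain a reasoning error."""
    correct = [s for s in samples if s.final_answer_correct]
    if not correct:
        return float("nan")  # undefined when no final answer is correct
    return sum(not s.reasoning_correct for s in correct) / len(correct)


# Invented toy data: 10 solutions, 8 with correct final answers, one of which
# reaches the answer through flawed reasoning.
samples = ([Graded(True, True)] * 7
           + [Graded(True, False)]
           + [Graded(False, False)] * 2)

print(final_answer_error(samples))             # 0.2
print(reasoning_error_among_correct(samples))  # 0.125
```

Because the second metric conditions on a correct final answer, the two rates can move independently of each other.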

Submitted to arXiv on 25 Nov. 2022

Summary of the arXiv paper 2211.14275v1

The study examines supervision methods for language models solving math word problems. The researchers compare outcome-based and process-based approaches, evaluating both final-answer errors and reasoning errors. In experiments on GSM8K, they find that pure outcome-based supervision reaches similar final-answer error rates with less label supervision, but that correct reasoning steps require process-based supervision or learned reward models that emulate process-based feedback. Their best models reduce both final-answer and reasoning errors relative to the previous best results. These findings highlight the value of process-based feedback when training language models for math problem solving.
Created on 06 Feb. 2024
