PAL: Program-aided Language Models

AI-generated keywords: Program-Aided Language models (PaL)

AI-generated Key Points

  • Large language models (LLMs) have made progress in reasoning tasks through few-shot prompting
  • LLMs often make mistakes in the solution part of the problem, even when correctly decomposed
  • Program-Aided Language models (PaL) use an LLM to read a natural language problem and generate a program as the intermediate reasoning steps
  • The solution step is offloaded to a programmatic runtime like a Python interpreter
  • Experiments on 12 reasoning tasks achieved state-of-the-art results in all benchmarks
  • PaL with Codex surpassed PaLM-540B with chain-of-thought prompting by an absolute 8% on GSM, outperformed chain-of-thought by 11% on three BIG-Bench Hard tasks, and by an absolute 40% on GSM-hard
  • The approach combines the strengths of LLMs in understanding natural language with the accuracy of a Python interpreter for solving complex arithmetic and logical problems

Authors: Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig

The first three authors contributed equally. Our code and data are publicly available at http://reasonwithpal.com/
License: CC ZERO 1.0

Abstract: Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks when provided with a few examples at test time (few-shot prompting). Much of this success can be attributed to prompting methods for reasoning, such as chain-of-thought, that employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, LLMs often make logical and arithmetic mistakes in the solution part, even when the problem is correctly decomposed. We present Program-Aided Language models (PaL): a new method that uses the LLM to understand natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a programmatic runtime such as a Python interpreter. With PaL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We experiment with 12 reasoning tasks from BIG-Bench Hard and other benchmarks, including mathematical reasoning, symbolic reasoning, and algorithmic problems. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models, and we set new state-of-the-art results in all 12 benchmarks. For example, PaL using Codex achieves state-of-the-art few-shot accuracy on the GSM benchmark of math word problems when the model is allowed only a single decoding, surpassing PaLM-540B with chain-of-thought prompting by an absolute 8%. In three reasoning tasks from the BIG-Bench Hard benchmark, PaL outperforms CoT by 11%. On GSM-hard, a more challenging version of GSM that we create, PaL outperforms chain-of-thought by an absolute 40%.
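
The division of labor described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `fake_llm`, `run_program`, and `pal_answer` are hypothetical names, the word problem is made up for the example, and the "model" is a stub that returns a fixed program where a real system would send a few-shot prompt to a code-generating LLM.

```python
# Minimal sketch of the PaL idea: the model writes a program,
# and the Python interpreter (not the model) computes the answer.

def fake_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call.

    Returns Python source whose steps mirror the reasoning steps
    a model might produce for the question. The question is ignored
    because this stub always returns the same program.
    """
    return (
        "def solution():\n"
        "    # Olivia starts with $23 and buys five bagels at $3 each.\n"
        "    money_initial = 23\n"
        "    bagels = 5\n"
        "    bagel_cost = 3\n"
        "    money_spent = bagels * bagel_cost\n"
        "    money_left = money_initial - money_spent\n"
        "    return money_left\n"
    )


def run_program(source: str):
    """Execute generated source and return the result of solution().

    exec() on model output is unsafe outside a sandbox; it is used
    here only to keep the sketch short.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["solution"]()


def pal_answer(question: str, generate=fake_llm):
    """Decompose with the (stubbed) model, then solve with the interpreter."""
    return run_program(generate(question))


answer = pal_answer(
    "Olivia has $23. She bought five bagels for $3 each. "
    "How much money does she have left?"
)
print(answer)  # 8
```

The model's only job in this sketch is to produce the text of `solution()`; the subtraction and multiplication are never performed by the model itself.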

Submitted to arXiv on 18 Nov. 2022


Results of the summarizing process for the arXiv paper: 2211.10435v1

Large language models (LLMs) have made significant progress in reasoning tasks, including mathematical and symbolic reasoning, through few-shot prompting. However, LLMs often make mistakes in the solution part of the problem, even when correctly decomposed. To address this issue, we propose Program-Aided Language models (PaL), which use LLMs to understand natural language problems and generate programs as intermediate reasoning steps. The solution step is then offloaded to a programmatic runtime like a Python interpreter. We conducted experiments on 12 reasoning tasks and achieved state-of-the-art results in all benchmarks. PaL outperformed larger models and surpassed chain-of-thought prompting by a significant margin in various tasks. Our approach combines the strengths of LLMs in understanding natural language with the accuracy of Python interpreters for solving complex arithmetic and logical problems.
Created on 13 Jul. 2023
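
To illustrate the summary's point that mistakes tend to occur in the solution step even when the decomposition is correct, the sketch below runs one fixed decomposition on small and on much larger inputs. The function name, its parameters, and the numbers are assumptions chosen for illustration and are not taken from the paper's benchmarks.

```python
# One decomposition, written once as code. The interpreter evaluates it
# exactly regardless of how large the numbers are, because Python integers
# have arbitrary precision.

def solution(money_initial: int, items: int, item_cost: int) -> int:
    money_spent = items * item_cost      # step 1: total cost
    return money_initial - money_spent   # step 2: what remains


small = solution(23, 5, 3)
large = solution(23_000_000_007, 5_123_456, 3_001)

print(small)  # 8
print(large)  # 7624508551
```

The reasoning steps are identical in both calls; only the arithmetic gets harder, and that part is handled by the interpreter.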
