Design of Chain-of-Thought in Math Problem Solving

AI-generated keywords: Chain-of-Thought (CoT)

AI-generated Key Points

  • Chain-of-Thought (CoT) design for math problem solving
  • Program CoTs are categorized into three types: non-describing program, self-describing program, and comment-describing program
  • Self-describing programs generally outperform the other CoT types, and the best-performing 30B-parameter combination even surpasses GPT-3.5-turbo with few-shot prompting
  • Program CoTs are more effective than natural language CoTs for math problem solving tasks
  • Self-describing and comment-describing programs both outperform non-describing programs, and self-describing programs perform best of the three
  • Using Python for program CoTs yields better results than using Wolfram Language
  • Datasets and code used in the study are publicly available for further research
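To make the three program styles concrete, here is a minimal Python sketch that solves one hypothetical word problem in each style. It assumes a reading of the labels that the key points do not spell out: a non-describing program uses placeholder variable names (v1, v2, ...) and no comments, a self-describing program takes its variable names from the problem text, and a comment-describing program keeps the placeholder names but adds natural-language comments. The problem and the function names are illustrative only and are not taken from the paper's datasets or released code.

```python
# Hypothetical word problem (not from GSM8K, MATHQA, or SVAMP):
# "A baker makes 24 muffins and sells 3/4 of them at $2 each.
#  How much money does the baker earn?"


def non_describing():
    """Non-describing style: placeholder names, no explanation in the body."""
    v1 = 24
    v2 = 3 / 4
    v3 = v1 * v2
    v4 = 2
    v5 = v3 * v4
    return v5


def self_describing():
    """Self-describing style: variable names carry the meaning from the problem."""
    muffins_made = 24
    fraction_sold = 3 / 4
    muffins_sold = muffins_made * fraction_sold
    price_per_muffin = 2
    money_earned = muffins_sold * price_per_muffin
    return money_earned


def comment_describing():
    """Comment-describing style: placeholder names plus natural-language comments."""
    # number of muffins the baker makes
    v1 = 24
    # fraction of the muffins that are sold
    v2 = 3 / 4
    # number of muffins sold
    v3 = v1 * v2
    # price of one muffin in dollars
    v4 = 2
    # total money earned
    v5 = v3 * v4
    return v5


if __name__ == "__main__":
    for solve in (non_describing, self_describing, comment_describing):
        print(solve.__name__, solve())
```

All three functions return 36.0; they differ only in how much of the problem's meaning is visible in the code itself.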

Authors: Zhanming Jie, Trung Quoc Luong, Xinbo Zhang, Xiaoran Jin, Hang Li

15 pages
License: CC BY 4.0

Abstract: Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem solving. We conduct a comprehensive examination of methods for designing CoT, comparing conventional natural language CoT with various program CoTs, including the self-describing program, the comment-describing program, and the non-describing program. Furthermore, we investigate the impact of programming language on program CoTs, comparing Python and Wolfram Language. Through extensive experiments on GSM8K, MATHQA, and SVAMP, we find that program CoTs often have superior effectiveness in math problem solving. Notably, the best performing combination with 30B parameters beats GPT-3.5-turbo by a significant margin. The results show that self-describing program offers greater diversity and thus can generally achieve higher performance. We also find that Python is a better choice of language than Wolfram for program CoTs. The experimental results provide a valuable guideline for future CoT designs that take into account both programming language and coding style for further advancements. Our datasets and code are publicly available.

Submitted to arXiv on 20 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.11054v2

In this study, we explore the design of Chain-of-Thought (CoT) for math problem solving, categorizing program CoTs into three types: non-describing program, self-describing program, and comment-describing program. Through extensive experiments on the GSM8K, MATHQA, and SVAMP datasets, we find that the self-describing program generally outperforms the other types of CoTs, and the best-performing 30B-parameter combination even surpasses GPT-3.5-turbo with few-shot prompting. Our findings suggest that program CoTs are more effective than natural language CoTs for math problem solving. Both self-describing and comment-describing programs outperform non-describing programs, and self-describing programs also outperform comment-describing ones. Furthermore, using Python for program CoTs yields better results than using Wolfram Language. These results offer guidance for future CoT designs for math problem solving that take into account both the programming language and the coding style. The datasets and code used in our study are publicly available to facilitate further research.
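Because a program CoT is code rather than prose, its final answer is typically obtained by running it with an interpreter and comparing the result with the reference answer. The sketch below shows that step in Python under two assumptions that are ours, not the paper's: the generated program stores its answer in a variable named `result`, and answers are numeric and compared with a small absolute tolerance. `run_program_cot`, `is_correct`, and `accuracy` are hypothetical helpers, not the authors' released evaluation code.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical convention: the generated program binds its final answer to `result`.
ANSWER_VAR = "result"


def run_program_cot(program: str, answer_var: str = ANSWER_VAR) -> Optional[object]:
    """Execute a generated program CoT and return the value bound to answer_var.

    Returns None if the program raises an exception or never defines answer_var.
    Warning: exec() runs arbitrary code and is NOT a sandbox; isolate it in practice.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
    except Exception:
        return None
    return namespace.get(answer_var)


def is_correct(prediction: Optional[object], gold: object, tol: float = 1e-4) -> bool:
    """Compare a numeric prediction with the gold answer within an absolute tolerance."""
    if prediction is None:
        return False
    try:
        return math.isclose(float(prediction), float(gold), rel_tol=0.0, abs_tol=tol)
    except (TypeError, ValueError):
        return False


def accuracy(samples: List[Tuple[str, object]]) -> float:
    """Fraction of (program, gold_answer) pairs whose executed result matches the gold answer."""
    if not samples:
        return 0.0
    correct = sum(is_correct(run_program_cot(p), g) for p, g in samples)
    return correct / len(samples)


# A self-describing program CoT for a hypothetical word problem.
EXAMPLE_PROGRAM = """
muffins_made = 24
fraction_sold = 3 / 4
muffins_sold = muffins_made * fraction_sold
price_per_muffin = 2
result = muffins_sold * price_per_muffin
"""

if __name__ == "__main__":
    print(run_program_cot(EXAMPLE_PROGRAM))  # 36.0
    print(accuracy([(EXAMPLE_PROGRAM, 36), ("result = 1 / 0", 5)]))  # 0.5
```

Mapping execution failures to None lets a crashing or incomplete program simply count as wrong instead of aborting the whole evaluation loop.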
Created on 01 Dec. 2024
