ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

AI-generated keywords: Large Language Models, Mathematical Problem-Solving, Self-Critique Pipeline, Feedback Learning, Cognitive Reasoning Abilities

AI-generated Key Points

  • Study focuses on enhancing mathematical problem-solving capabilities and language abilities of large language models (LLMs)
  • Introduces the Self-Critique pipeline, which targets the feedback learning stage of LLM alignment
  • Trains a Math-Critique model from the LLM itself to provide feedback signals on generated mathematical responses
  • Sequentially applies rejective fine-tuning and direct preference optimization over the LLM's own generations to improve problem-solving and language capabilities together
  • Experiments show a significant gain in the LLM's math problem-solving while language ability still improves, outperforming LLMs that could be two times larger
  • Related techniques are deployed in the ChatGLM online serving system; the evaluation dataset and scripts are released for further exploration
  • Discussion on existing approaches for math problem-solving in LLMs, including prompting methods, supervised fine-tuning, reinforcement learning techniques, decoding strategies, and external tool utilization
  • Importance of mathematical evaluation through benchmark datasets like GSM8k and MATH highlighted
  • Detailed discussion on datasets for evaluating mathematical capabilities across different languages such as AQuA, Mathematics, SAT-Math, NumGLUE with specific mention of Chinese datasets like Math23K and CMath covering various proficiency levels

Authors: Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong

License: CC BY 4.0

Abstract: Large language models (LLMs) have shown excellent mastering of human language, but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets to enhance LLMs' mathematics are developed, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems. In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM's own generations for data collection. Based on ChatGLM3-32B, we conduct a series of experiments on both academic and our newly created challenging dataset, MathUserEval. Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed to ChatGLM (https://chatglm.cn), an online serving LLM. Related evaluation dataset and scripts are released at https://github.com/THUDM/ChatGLM-Math.
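
To make the data-collection step in the abstract concrete, here is a minimal sketch of how critique scores might be turned into training data for the two stages. It is an illustration under stated assumptions, not the authors' implementation: the 1-10 score scale, the thresholds `min_score` and `min_gap`, and every name below (`Scored`, `toy_critique`, `select_rft`, `build_dpo_pairs`) are hypothetical, and `toy_critique` is a stand-in for a trained Math-Critique model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Scored:
    question: str
    response: str
    score: float  # critique score; a 1-10 scale is ASSUMED here


def toy_critique(question: str, response: str) -> float:
    """Hypothetical stand-in for a Math-Critique model.

    It only checks simple integer addition such as "2+3"; a real critique
    model would be a fine-tuned LLM that grades free-form answers.
    """
    left, right = question.split("+")
    correct = str(int(left) + int(right))
    return 10.0 if response.strip() == correct else 2.0


def score_samples(
    samples: Dict[str, List[str]],
    critique: Callable[[str, str], float],
) -> List[Scored]:
    """Score every sampled response of every question with the critique."""
    return [
        Scored(q, r, critique(q, r))
        for q, responses in samples.items()
        for r in responses
    ]


def select_rft(scored: List[Scored], min_score: float = 8.0) -> List[Scored]:
    """Rejection step: keep only responses the critique rates highly.

    The threshold is an illustrative assumption.
    """
    return [s for s in scored if s.score >= min_score]


def build_dpo_pairs(
    scored: List[Scored], min_gap: float = 4.0
) -> List[Tuple[str, str, str]]:
    """Build (question, chosen, rejected) triples for preference training.

    For each question, pair the best- and worst-scored responses, and keep
    the pair only if the score gap is large enough (assumed heuristic).
    """
    by_question: Dict[str, List[Scored]] = {}
    for s in scored:
        by_question.setdefault(s.question, []).append(s)

    pairs = []
    for question, items in by_question.items():
        best = max(items, key=lambda s: s.score)
        worst = min(items, key=lambda s: s.score)
        if best.score - worst.score >= min_gap:
            pairs.append((question, best.response, worst.response))
    return pairs
```

In this sketch, the output of `select_rft` would feed a supervised fine-tuning run, and the triples from `build_dpo_pairs` would feed a preference-optimization run afterwards. Questions whose samples all score alike yield no pair, since they carry no preference signal.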

Submitted to arXiv on 03 Apr. 2024

Results of the summarizing process for the arXiv paper: 2404.02893v1

This study aims to improve the mathematical problem-solving of large language models (LLMs) while maintaining and improving their language abilities. The researchers tailor a Self-Critique pipeline to the feedback learning stage of LLM alignment. They first train a Math-Critique model from the LLM itself to provide feedback signals on generated mathematical responses, then sequentially apply rejective fine-tuning and direct preference optimization over the LLM's own generations. In experiments on academic benchmarks and on the newly created MathUserEval dataset, with ChatGLM3-32B as the base model, the pipeline significantly enhances mathematical problem-solving while still improving language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed in ChatGLM, an online serving LLM system, and the evaluation dataset and scripts have been released. The paper also reviews existing approaches to math problem-solving in LLMs, including prompting methods, supervised fine-tuning, reinforcement learning techniques, decoding strategies, and external tool use. It highlights the role of benchmark datasets such as GSM8k and MATH in assessing the cognitive reasoning abilities of LLMs, and it surveys datasets for evaluating mathematical capabilities across languages, such as AQuA, Mathematics, SAT-Math, and NumGLUE, along with Chinese datasets such as Math23K and CMath that range from elementary school to exam-level problems.
Overall, this work shows how a critique-driven feedback pipeline can advance both language ability and mathematical problem-solving in large language models.
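
For reference, the direct preference optimization stage mentioned above is usually written with the standard DPO objective shown below. This is the general formulation, given as background; the summary does not state whether the paper uses exactly this loss or a modified variant. Here $x$ is a question, $y_w$ and $y_l$ are the preferred and dispreferred responses, $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is a frozen reference model, $\sigma$ is the logistic function, and $\beta$ is a temperature hyperparameter.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Minimizing this loss raises the likelihood of $y_w$ relative to $y_l$, measured against the reference model, which is why the preference pairs need a clear quality gap between the two responses.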
Created on 20 Oct. 2024
