Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

AI-generated keywords: Large Language Models, Mathematical Reasoning, Pre-training Loss, Supervised Fine-tuning, Rejection Sampling Fine-Tuning

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Study titled "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models" by Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, and Chang Zhou
  • Focus on mathematical reasoning for large language models (LLMs) and the relationship between LLM capacity and performance
  • Pre-training loss is a better indicator of model performance than parameter count
  • Log-linear relation between the amount of supervised data and model performance under supervised fine-tuning (SFT)
  • Better models improve less when the supervised dataset is enlarged
  • Rejection sampling Fine-Tuning (RFT) proposed to augment the training data without human effort
  • RFT improves mathematical reasoning more when the augmented samples contain more distinct reasoning paths
  • RFT brings larger improvements for less performant LLMs
  • Combining rejection samples from multiple models pushes LLaMA-7B to 49.3% accuracy, compared with 35.9% for SFT
  • Takeaway: pre-training loss is a more informative measure than parameter count, and RFT is a way to improve mathematical reasoning without additional human effort
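The log-linear relation mentioned in the key points can be illustrated with a small least-squares fit of accuracy against the logarithm of the dataset size. This is only a sketch: the function name `fit_log_linear` is hypothetical, and the data points are synthetic values chosen for illustration, not results from the paper.

```python
import math

def fit_log_linear(sizes, accuracies):
    """Least-squares fit of accuracy = a + b * ln(size)."""
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(accuracies) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic, made-up points (NOT taken from the paper): accuracy rises by
# a fixed amount each time the amount of supervised data doubles.
sizes = [1000, 2000, 4000, 8000]
accuracies = [20.0, 25.0, 30.0, 35.0]

a, b = fit_log_linear(sizes, accuracies)
gain_per_doubling = b * math.log(2)  # accuracy points gained per doubling
print(f"a = {a:.2f}, b = {b:.2f}, gain per doubling = {gain_per_doubling:.2f}")
```

Under a log-linear relation, each doubling of the data adds a constant number of accuracy points, so the absolute gain per added example shrinks as the dataset grows.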

Authors: Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, Chang Zhou

Working in Progress

Abstract: Mathematical reasoning is a challenging task for large language models (LLMs), while the scaling relationship of it with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM. We find that pre-training loss is a better indicator of the model's performance than the model's parameter count. We apply supervised fine-tuning (SFT) with different amounts of supervised data and empirically find a log-linear relation between data amount and model performance, and we find better models improve less with enlarged supervised datasets. To augment more data samples for improving model performances without any human effort, we propose to apply Rejection sampling Fine-Tuning (RFT). RFT uses supervised models to generate and collect correct reasoning paths as augmented fine-tuning datasets. We find with augmented samples containing more distinct reasoning paths, RFT improves mathematical reasoning performance more for LLMs. We also find RFT brings more improvement for less performant LLMs. Furthermore, we combine rejection samples from multiple models which push LLaMA-7B to an accuracy of 49.3% and outperforms the supervised fine-tuning (SFT) accuracy of 35.9% significantly.
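The RFT procedure described in the abstract (generate reasoning paths with a supervised model, keep the correct ones, prefer distinct paths) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_fn` stands in for a fine-tuned model, the `####` answer delimiter is a hypothetical convention, and whitespace-insensitive text matching is an assumed notion of "distinct" that the abstract does not specify.

```python
import itertools
from typing import Callable, Dict, List, Tuple

def extract_answer(path: str) -> str:
    """Hypothetical convention: the final answer follows the last '####'."""
    return path.rsplit("####", 1)[-1].strip()

def normalize(path: str) -> str:
    """Assumed distinctness key: lowercase text with all whitespace removed."""
    return "".join(path.lower().split())

def collect_rft_samples(
    questions: List[Tuple[str, str]],
    sample_fn: Callable[[str], str],
    k: int,
) -> List[Dict[str, str]]:
    """Sample k paths per question; keep correct, distinct reasoning paths."""
    augmented = []
    for question, gold_answer in questions:
        seen = set()
        for _ in range(k):
            path = sample_fn(question)
            if extract_answer(path) != gold_answer:
                continue  # rejection step: wrong final answer
            key = normalize(path)
            if key in seen:
                continue  # same reasoning path already collected
            seen.add(key)
            augmented.append({"question": question, "reasoning": path})
    return augmented

# Stub sampler that cycles through canned outputs instead of calling a model.
canned = [
    "3 + 4 = 7 #### 7",
    "4 + 3 = 7 #### 7",
    "3 + 4 = 8 #### 8",    # wrong answer, rejected
    "3 + 4 = 7  #### 7",   # duplicate of the first after normalization
]
cycler = itertools.cycle(canned)
samples = collect_rft_samples([("What is 3 + 4?", "7")], lambda q: next(cycler), k=4)
print(len(samples))  # two correct, distinct paths survive
```

The collected samples would then be used as the augmented fine-tuning dataset. In practice `sample_fn` would draw from a model with a nonzero sampling temperature so that repeated calls yield different paths.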

Submitted to arXiv on 03 Aug. 2023


Results of the summarizing process for the arXiv paper: 2308.01825v1

The paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

In "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models," Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, and Chang Zhou examine how mathematical reasoning performance scales with the capacity of large language models (LLMs). They study how three factors influence the reasoning performance of a supervised LLM: pre-training loss, the amount of supervised data, and the amount of augmented data.

The authors find that pre-training loss is a better indicator of model performance than parameter count. Applying supervised fine-tuning (SFT) with different amounts of supervised data, they empirically find a log-linear relation between data amount and model performance, and they observe that better models improve less as the supervised dataset grows.

To obtain more training data without human effort, they propose Rejection sampling Fine-Tuning (RFT), which uses supervised models to generate reasoning paths and collects the correct ones as an augmented fine-tuning dataset. RFT improves mathematical reasoning more when the augmented samples contain more distinct reasoning paths, and it brings larger improvements for less performant LLMs. Combining rejection samples from multiple models pushes LLaMA-7B to an accuracy of 49.3%, compared with 35.9% for SFT, a gain of 13.4 percentage points.

Overall, the work points to pre-training loss as a more informative measure than parameter count and to RFT as a way to improve mathematical reasoning without additional human effort.
Created on 12 Sep. 2024
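The summary above mentions combining rejection samples from multiple models. One plausible way to do this is to pool the per-model samples and drop repeated reasoning paths, as sketched below. The metadata does not say how the authors merged or deduplicated samples, so the function `merge_rejection_samples`, the model names, and the whitespace-insensitive duplicate check are all assumptions made for illustration.

```python
from typing import Dict, List

def merge_rejection_samples(
    per_model: Dict[str, List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Pool samples from several models, keeping one copy of each path."""
    merged = []
    seen = set()
    for model_name, samples in per_model.items():
        for sample in samples:
            # Assumed duplicate check: same question and same reasoning
            # text once case and whitespace are ignored.
            key = (sample["question"], "".join(sample["reasoning"].lower().split()))
            if key in seen:
                continue
            seen.add(key)
            merged.append({**sample, "source": model_name})
    return merged

# Hypothetical per-model outputs (placeholder names, made-up samples).
per_model = {
    "model_a": [
        {"question": "What is 3 + 4?", "reasoning": "3 + 4 = 7 #### 7"},
    ],
    "model_b": [
        {"question": "What is 3 + 4?", "reasoning": "3+4=7 #### 7"},      # duplicate
        {"question": "What is 3 + 4?", "reasoning": "4 + 3 = 7 #### 7"},  # new path
    ],
}
merged = merge_rejection_samples(per_model)
print([m["source"] for m in merged])
```

Pooling across models is one way to increase the number of distinct reasoning paths per question, which the summary identifies as the property that makes RFT more effective.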


