Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

AI-generated keywords: rStar; self-play mutual reasoning; small language models; Monte Carlo Tree Search; diverse set of human-like reasoning actions

AI-generated Key Points

  • Introduction of rStar, a self-play mutual reasoning approach for enhancing the reasoning capabilities of small language models (SLMs)
  • Key innovation of rStar: decoupling reasoning into a self-play mutual generation-discrimination process
  • Use of Monte Carlo Tree Search (MCTS), augmented by the target SLM with a diverse set of human-like reasoning actions, to construct higher-quality reasoning trajectories
  • Verification of each generated trajectory by a second, discriminator SLM of similar capability; trajectories on which both models agree are more likely to be correct
  • Effectiveness of rStar demonstrated through experiments on various reasoning problems such as GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA
  • GSM8K accuracy gains with rStar: from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct
  • Ablation study of the rich action space, evaluating LLaMA3-8B on 200 sampled GSM8K questions
  • Mutual reasoning through self-play improves the problem-solving ability of smaller language models without fine-tuning or help from superior models such as GPT-4
  • Availability of rStar code for further exploration and implementation at https://github.com/zhentingqi/rStar

Authors: Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, Mao Yang

License: CC BY 4.0

Abstract: This paper introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments the Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. The mutually agreed reasoning trajectories are considered mutually consistent and are thus more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct. Code will be available at https://github.com/zhentingqi/rStar.

Submitted to arXiv on 12 Aug. 2024
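
The generation-discrimination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names Trajectory, mutually_consistent, pick_answer, and toy_discriminator are hypothetical, and so is the split parameter. The sketch assumes that the discriminator sees only a prefix of each trajectory and returns its own final answer, and that a simple majority vote picks among the surviving trajectories. The abstract specifies neither detail. The toy discriminator stands in for a second SLM.

```python
from collections import Counter
from typing import Callable, List, NamedTuple, Optional


class Trajectory(NamedTuple):
    steps: List[str]  # intermediate reasoning steps from the generator
    answer: str       # final answer the generator reached


# Hypothetical discriminator signature: (question, partial steps) -> final answer.
Discriminator = Callable[[str, List[str]], str]


def mutually_consistent(
    question: str,
    candidates: List[Trajectory],
    discriminator: Discriminator,
    split: float = 0.5,  # assumed fraction of steps shown to the discriminator
) -> List[Trajectory]:
    """Keep only trajectories whose answer the discriminator also reaches."""
    kept = []
    for traj in candidates:
        cut = max(1, int(len(traj.steps) * split))
        completed = discriminator(question, traj.steps[:cut])
        if completed.strip() == traj.answer.strip():
            kept.append(traj)
    return kept


def pick_answer(kept: List[Trajectory]) -> Optional[str]:
    """Majority vote over agreed trajectories (a stand-in selection rule)."""
    if not kept:
        return None
    return Counter(t.answer for t in kept).most_common(1)[0][0]


def toy_discriminator(question: str, prefix: List[str]) -> str:
    # Stand-in for a second SLM: it always concludes "18".
    return "18"


candidates = [
    Trajectory(["16 - 3 - 4 = 9", "9 * 2 = 18"], "18"),
    Trajectory(["16 - 3 = 13", "13 * 2 = 26"], "26"),
]
kept = mutually_consistent("toy question", candidates, toy_discriminator)
print(pick_answer(kept))
```

In this toy run only the first trajectory survives, because it is the only one whose answer matches what the stand-in discriminator produces.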

Results of the summarizing process for the arXiv paper: 2408.06195v1

In this paper, the authors introduce rStar, a self-play mutual reasoning approach that improves the reasoning capabilities of small language models (SLMs) without fine-tuning or reliance on superior models. The key innovation is decoupling reasoning into a self-play mutual generation-discrimination process. A target SLM first augments Monte Carlo Tree Search (MCTS) with a diverse set of human-like reasoning actions to construct higher-quality reasoning trajectories. A second SLM, with capabilities similar to the target SLM, then acts as a discriminator that verifies each trajectory the target SLM generates. Trajectories on which both models agree are deemed mutually consistent and are therefore more likely to be correct.

Experiments across five SLMs show that rStar is effective on a range of reasoning benchmarks: GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. On GSM8K, rStar raises accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct.

The authors also run an ablation study on the rich action space, evaluating LLaMA3-8B on 200 sampled GSM8K questions to show how different combinations of actions affect performance on complex reasoning tasks.

Overall, this research shows that mutual reasoning through self-play can substantially improve the problem-solving abilities of smaller language models without fine-tuning or help from superior models such as GPT-4. The code for rStar will be made available at https://github.com/zhentingqi/rStar.
Created on 14 Aug. 2024
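
To put the GSM8K figures from the summary in perspective, the following short Python snippet computes the absolute gain in percentage points for each model. The accuracy values are copied from the abstract. The names reported and absolute_gain are illustrative only.

```python
# GSM8K accuracy (%) before and after rStar, as reported in the abstract.
reported = {
    "LLaMA2-7B": (12.51, 63.91),
    "Mistral-7B": (36.46, 81.88),
    "LLaMA3-8B-Instruct": (74.53, 91.13),
}


def absolute_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points, rounded to two decimals."""
    return round(after - before, 2)


for model, (before, after) in reported.items():
    print(f"{model}: +{absolute_gain(before, after):.2f} points")
```

The gains are 51.40, 45.42, and 16.60 points respectively, so the weakest base model benefits most in absolute terms.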
