Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

AI-generated keywords: Language Models, Large Language Models (LLMs), Thought, Reinforcement Learning (RL), Test-time Scaling

AI-generated Key Points

  • Language is a fundamental tool for human reasoning
  • Large Language Models (LLMs) are being used for complex reasoning tasks
  • Introduction of "thought" concept allows LLMs to emulate human reasoning processes
  • Reinforcement learning (RL) is applied to train LLMs for better reasoning capabilities
  • Encouraging LLMs to engage in more extensive "thinking" at test time improves reasoning accuracy
  • Train-time and test-time scaling lead to the development of Large Reasoning Models
  • OpenAI's o1 series marks a significant milestone in this research direction
  • Automated data construction, learning-to-reason techniques, and test-time scaling are key components driving large reasoning model development
  • Test-time enhancement techniques can further strengthen LLMs' reasoning by enabling strategic reasoning across solution spaces, leveraging past experiences, and optimizing workflows dynamically
  • Designing robust evaluation benchmarks is crucial for documenting improvements in LLM capabilities and guiding future research directions

Authors: Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li

36 pages, 5 figures
License: CC BY 4.0

Abstract: Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought", a sequence of tokens representing intermediate steps in the reasoning process. This paradigm enables LLMs to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can significantly boost reasoning accuracy. Train-time and test-time scaling therefore combine to open a new research frontier: a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects aimed at building large reasoning models, and conclude with open challenges and future research directions.
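The trial-and-error generation of reasoning trajectories mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the simplest variant, rejection sampling against an outcome check: a policy samples several trajectories per question, and only those whose final answer verifies are kept as training data. The names `toy_policy`, `verify`, and `collect_trajectories` are hypothetical; the toy policy is a random stand-in for an LLM, not the survey's method, and real pipelines may use richer search (such as tree search) and learned reward models instead of an exact answer check.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    question: tuple[int, int]
    steps: list[str]   # intermediate "thought" steps
    answer: int        # final answer


def toy_policy(question: tuple[int, int], rng: random.Random) -> Trajectory:
    """Hypothetical stand-in for an LLM: writes steps, sometimes makes a slip."""
    a, b = question
    slip = rng.choice([0, 0, 0, 1, -1])  # simulated arithmetic error
    result = a + b + slip
    steps = [f"add {a} and {b}", f"result is {result}"]
    return Trajectory(question, steps, result)


def verify(traj: Trajectory) -> bool:
    """Outcome check: compare the final answer with the known ground truth."""
    a, b = traj.question
    return traj.answer == a + b


def collect_trajectories(questions, samples_per_question, seed=0):
    """Trial and error: sample several trajectories per question, keep verified ones."""
    rng = random.Random(seed)
    kept = []
    for q in questions:
        for _ in range(samples_per_question):
            traj = toy_policy(q, rng)
            if verify(traj):
                kept.append(traj)
    return kept


data = collect_trajectories([(2, 3), (10, 7)], samples_per_question=8)
for t in data[:3]:
    print(t.steps, t.answer)
```

The kept trajectories could then serve as supervised or RL training data; the filtering step is what turns cheap, unreliable sampling into higher-quality data.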

Submitted to arXiv on 16 Jan. 2025

Results of the summarizing process for the arXiv paper: 2501.09686v3

Language has long been recognized as a fundamental tool for human reasoning. The emergence of Large Language Models (LLMs) has sparked considerable interest in using these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought": a sequence of tokens representing intermediate steps in the reasoning process. This approach allows LLMs to emulate intricate human reasoning processes such as tree search and reflective thinking. A recent trend, learning to reason, applies reinforcement learning (RL) to train LLMs to master reasoning processes. This method enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing ample training data. Moreover, studies have shown that encouraging LLMs to "think" with more tokens during test-time inference can substantially improve reasoning accuracy. Together, train-time and test-time scaling open a new research frontier: the development of Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this direction. In this survey, we review recent advances in LLM reasoning, starting with the foundational background of LLMs. We then explore the key technical components driving the development of large reasoning models: automated data construction, learning-to-reason techniques, and test-time scaling. We also discuss how test-time enhancement techniques could further strengthen LLMs' reasoning by enabling them to reason strategically across solution spaces, leverage past experiences, and optimize workflows dynamically.
Additionally, designing robust evaluation benchmarks is crucial for documenting improvements in LLM capabilities and guiding future research. We review popular benchmarks for LLM reasoning and categorize them systematically according to a taxonomy. Overall, the field is moving toward Large Reasoning Models that can effectively mimic complex, human-like reasoning processes. By building on advances in automated data construction, learning-to-reason techniques, and test-time scaling strategies, researchers are paving the way for more sophisticated language models that excel at complex reasoning tasks.
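As a rough illustration of test-time scaling, the Python sketch below uses majority voting over independently sampled answers (often called self-consistency), one common way to spend more inference compute. It assumes a hypothetical `noisy_solver` that returns the correct answer with probability 0.6 and otherwise one of four distractors; it is a toy model, not the o1 approach or a specific method from the survey.

```python
import random
from collections import Counter


def noisy_solver(rng: random.Random, correct: int = 42, p_correct: float = 0.6) -> int:
    """Hypothetical stand-in for the final answer of one sampled reasoning chain."""
    if rng.random() < p_correct:
        return correct
    # Wrong answers are spread over several distractors.
    return rng.choice([40, 41, 43, 44])


def majority_vote(answers: list[int]) -> int:
    """Most frequent answer; ties go to the answer encountered first."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0, correct: int = 42) -> float:
    """Fraction of trials in which the majority answer over n_samples chains is correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        answers = [noisy_solver(rng, correct) for _ in range(n_samples)]
        hits += majority_vote(answers) == correct
    return hits / trials


for n in (1, 5, 15):
    print(n, accuracy(n))
```

Comparing `accuracy(n)` for increasing `n` shows how sampling more reasoning chains raises the chance that the plurality answer is correct, provided each chain is right more often than it lands on any single wrong answer.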
Created on 21 Feb. 2025
