Enhance Reasoning for Large Language Models in the Game Werewolf

AI-generated keywords: Large Language Models, External Thinker Module, Dual-System Reasoning, Social Deduction Games, Data-Driven Approaches

AI-generated Key Points

- A novel framework is proposed to enhance the reasoning capabilities of Large Language Models (LLMs) by integrating them with an external Thinker module
- The Thinker module draws on knowledge from databases and uses optimization techniques for complex logical analysis and domain-specific tasks
- A hierarchy is established in which LLMs handle intuitive System-1 tasks, while the Thinker specializes in cognitive System-2 reasoning
- The framework is demonstrated in a 9-player Werewolf game, which demands dual-system reasoning
- A communication protocol is introduced for interaction between the LLMs and the Thinker; the Thinker is trained on data from 18,800 human game sessions and with reinforcement learning
- Experimental results show proficiency in deductive reasoning, speech generation, and online game evaluation
- A fine-tuned 6B LLM surpasses GPT4 when integrated with the Thinker
- The study contributes the largest dataset for social deduction games to date, built from authentic game sessions and speech data
- The framework comprises three key components: the Listener for natural language understanding, the Thinker for deep logical analysis and decision-making, and the Presenter for generating coherent language output from the Thinker's strategic instructions
- Data preparation involved collecting game data from Werewolf sessions, enriching it with domain-specific corpus sources, and using advanced models such as Paraformer for Automatic Speech Recognition (ASR)
- The study aims to develop more generalized and flexible methods through data-driven approaches
- Evaluation covers AI-vs-AI games and games with one human player against multiple AIs; evaluating AI performance with human players is challenging because of their interactive nature and varied speech strategies


Authors: Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, Haobo Fu

arXiv: 2402.02330v1 (cs.AI)

License: CC BY 4.0

Abstract: This paper presents an innovative framework that integrates Large Language Models (LLMs) with an external Thinker module to enhance the reasoning capabilities of LLM-based agents. Unlike augmenting LLMs with prompt engineering, Thinker directly harnesses knowledge from databases and employs various optimization techniques. The framework forms a reasoning hierarchy where LLMs handle intuitive System-1 tasks such as natural language processing, while the Thinker focuses on cognitive System-2 tasks that require complex logical analysis and domain-specific knowledge. Our framework is presented using a 9-player Werewolf game that demands dual-system reasoning. We introduce a communication protocol between LLMs and the Thinker, and train the Thinker using data from 18800 human sessions and reinforcement learning. Experiments demonstrate the framework's effectiveness in deductive reasoning, speech generation, and online game evaluation. Additionally, we fine-tune a 6B LLM to surpass GPT4 when integrated with the Thinker. This paper also contributes the largest dataset for social deduction games to date.

Submitted to arXiv on 04 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.02330v1

Comprehensive Summary

A novel framework is proposed to enhance the reasoning capabilities of Large Language Models (LLMs) by integrating them with an external Thinker module. Unlike methods that rely on prompt engineering, the Thinker module draws directly on knowledge from databases and employs optimization techniques to handle complex logical analysis and domain-specific tasks. The framework establishes a hierarchy in which LLMs handle intuitive System-1 tasks such as natural language processing, while the Thinker specializes in cognitive System-2 reasoning. The framework is demonstrated in a 9-player Werewolf game, which demands dual-system reasoning. A communication protocol is introduced to connect the LLMs and the Thinker, and the Thinker is trained on data from 18,800 human game sessions and with reinforcement learning. Experimental results show the framework's effectiveness in deductive reasoning, speech generation, and online game evaluation. In addition, a fine-tuned 6B LLM surpasses GPT4 when integrated with the Thinker. The study also contributes the largest dataset for social deduction games to date.

The methodology relies on authentic game sessions and speech data so that the data aligns closely with real-world scenarios and human interaction patterns. The framework comprises three key components: the Listener for natural language understanding, the Thinker for deep logical analysis and decision-making, and the Presenter for generating coherent language output from the Thinker's strategic instructions. Data preparation involved collecting game data from Werewolf sessions hosted on an online platform, enriching it with domain-specific corpus sources, and using advanced models such as Paraformer for Automatic Speech Recognition (ASR). Overall, the study aims to develop more generalized and flexible methods through data-driven approaches.

Evaluation covers AI-vs-AI games and games in which one human player faces multiple AIs. Evaluating AI performance with human players is challenging because of their interactive nature and varied speech strategies.
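
The three-component design described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the names `Observation`, `Instruction`, `listener`, `thinker`, and `presenter` are hypothetical, the keyword-matching Listener and template Presenter stand in for what the paper implements with LLMs, and the accusation-counting Thinker stands in for the paper's Thinker, which is trained on human game data and with reinforcement learning. What the sketch shows is only the shape of the communication: speech is converted into structured data before any reasoning happens, and reasoning produces a structured instruction before any language is generated.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical structured message from the Listener to the Thinker."""
    speaker: int
    claimed_role: Optional[str] = None          # e.g. "seer", or None if no claim
    accusations: List[int] = field(default_factory=list)


@dataclass
class Instruction:
    """Hypothetical structured message from the Thinker to the Presenter."""
    vote_target: Optional[int] = None


def listener(speaker: int, speech: str) -> Observation:
    """System-1 stand-in: turn free-form speech into a structured Observation.

    A naive keyword parser replaces LLM-based language understanding."""
    text = speech.lower()
    role = "seer" if "i am the seer" in text else None
    accused = [int(n) for n in re.findall(r"player (\d+) is a (?:wolf|werewolf)", text)]
    return Observation(speaker, role, accused)


def thinker(me: int, observations: List[Observation]) -> Instruction:
    """System-2 stand-in: reason over structured data only, never raw text.

    Accusations from a claimed seer count double (an arbitrary toy rule)."""
    suspicion: Dict[int, float] = {}
    for obs in observations:
        weight = 2.0 if obs.claimed_role == "seer" else 1.0
        for target in obs.accusations:
            if target != me:
                suspicion[target] = suspicion.get(target, 0.0) + weight
    if not suspicion:
        return Instruction()
    # Highest suspicion wins; ties go to the lower seat number.
    target = max(suspicion, key=lambda p: (suspicion[p], -p))
    return Instruction(vote_target=target)


def presenter(instruction: Instruction) -> str:
    """System-1 stand-in: render the Thinker's instruction as a speech."""
    if instruction.vote_target is None:
        return "I have no strong read yet; let's hear more."
    return (f"I find player {instruction.vote_target} the most suspicious "
            "and will vote for them.")


if __name__ == "__main__":
    speeches = [
        (1, "I am the seer. Player 4 is a wolf."),
        (2, "I agree, player 4 is a werewolf."),
        (3, "Player 2 is a wolf."),
    ]
    observations = [listener(seat, text) for seat, text in speeches]
    print(presenter(thinker(me=5, observations=observations)))
    # prints: I find player 4 the most suspicious and will vote for them.
```

Keeping the Thinker's input and output structured is the point of the separation in this sketch: the toy `thinker` could be replaced by a learned policy, or the template `presenter` by an LLM, without changing the other components.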

Layman's Summary

Summary
- A new plan was made to make smart language models even smarter by adding a special Thinker module.
- The Thinker uses information from databases and special techniques to solve tricky problems and tasks in specific areas.
- The language models handle quick, intuitive jobs like understanding and writing text, while the Thinker handles slow, careful reasoning.
- This plan worked well in the game Werewolf, where players have to use both kinds of thinking.
- The researchers made a way for the language models and the Thinker to talk to each other, and trained the Thinker using lots of game data and learning methods.

Definitions
- Framework: A plan or structure that helps organize things.
- Module: A part or piece that does a specific job within a larger system.
- Database: A place where information is stored and can be easily accessed.
- Optimization: Making something work as well as possible by making it more efficient or effective.
- Hierarchy: A system where things are organized in levels or ranks based on importance or power.

Blog article

Introduction: Language models have made significant strides in recent years, with the development of large language models (LLMs) such as GPT-3 and BERT. These models have shown impressive capabilities in natural language processing tasks, but they still struggle with complex logical reasoning and domain-specific tasks. To address this limitation, the paper proposes a framework that integrates LLMs with an external Thinker module to enhance their reasoning abilities.

The Framework: The proposed framework establishes a hierarchy in which LLMs handle intuitive System-1 tasks like natural language processing, while the Thinker specializes in cognitive System-2 reasoning. This differs from approaches that augment LLMs through prompt engineering, which can be time-consuming and of limited effectiveness.

The Thinker Module: The Thinker module draws on knowledge from databases and employs optimization techniques to handle complex logical analysis and domain-specific tasks. It is trained on data from 18,800 human game sessions and with reinforcement learning, which grounds its decisions in real-world scenarios and human interaction patterns.

Application in Werewolf Game Scenario: To showcase the framework, the authors applied it to a 9-player Werewolf game, which requires dual-system reasoning. The results demonstrate its proficiency in deductive reasoning, speech generation, and online game evaluation. Notably, a fine-tuned 6B LLM surpasses GPT4 when integrated with the Thinker.

Communication Protocol: A communication protocol connects the LLMs and the Thinker module. The Listener handles natural language understanding of the players' speech, the Thinker performs logical analysis and decision-making, and the Presenter generates coherent language output based on the Thinker's strategic instructions.

Data Preparation: One of the study's key contributions is the largest dataset for social deduction games to date. Data preparation involved collecting game data from Werewolf sessions hosted on an online platform, enriching it with domain-specific corpus sources, and using advanced models such as Paraformer for Automatic Speech Recognition (ASR).

Evaluation Challenges: Evaluating AI performance in games with human players is challenging because of their interactive nature and varied speech strategies. The framework was evaluated both in AI-vs-AI games and in games where one human player faces multiple AIs.

Conclusion: The paper proposes a framework for enhancing the reasoning capabilities of LLMs by integrating them with an external Thinker module. The results demonstrate its proficiency in deductive reasoning, speech generation, and online game evaluation. The study also contributes the largest dataset for social deduction games to date and uses data-driven approaches to develop more generalized and flexible methods. Future research could apply the framework to other domains that require dual-system reasoning and improve its performance with additional training data and more advanced models.
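
The summaries above credit the Thinker with deductive reasoning as a System-2 task. As a toy illustration of what such deduction involves, and explicitly not the paper's method (its Thinker is trained on human game data and with reinforcement learning rather than by enumeration), the sketch below brute-forces every role assignment consistent with public seer claims and reports how often each player is a werewolf. The setup of three werewolves and one seer, the uniform prior over assignments, the rule that good players never fake a seer claim, and the names `consistent` and `wolf_probabilities` are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Set, Tuple

PLAYERS = range(1, 10)  # 9 seats, as in the paper's game
NUM_WOLVES = 3          # assumption: a typical 9-player setup with one seer

# Hypothetical input format: seer-claimer seat -> list of (target, announced_as_wolf)
Claims = Dict[int, List[Tuple[int, bool]]]


def consistent(wolves: Set[int], seer: int, claims: Claims) -> bool:
    """Check one hypothetical world (wolf set + true seer) against public claims."""
    if seer in wolves:
        return False
    for claimer, checks in claims.items():
        if claimer in wolves:
            continue                      # werewolves may lie freely
        if claimer != seer:
            return False                  # assumption: good players never fake a seer claim
        for target, announced_as_wolf in checks:
            if (target in wolves) != announced_as_wolf:
                return False              # the true seer's checks are truthful
    return True


def wolf_probabilities(claims: Claims) -> Dict[int, Fraction]:
    """P(player is a werewolf) under a uniform prior over consistent worlds."""
    counts = {p: 0 for p in PLAYERS}
    total = 0
    for wolves in combinations(PLAYERS, NUM_WOLVES):
        wolf_set = set(wolves)
        for seer in PLAYERS:
            if consistent(wolf_set, seer, claims):
                total += 1
                for p in wolf_set:
                    counts[p] += 1
    if total == 0:
        raise ValueError("the claims contradict every possible role assignment")
    return {p: Fraction(c, total) for p, c in counts.items()}


if __name__ == "__main__":
    # Seats 1 and 2 both claim seer: 1 says "3 is a wolf", 2 says "1 is a wolf".
    probs = wolf_probabilities({1: [(3, True)], 2: [(1, True)]})
    print(probs[1], probs[2], probs[3])   # prints: 21/23 16/23 6/23
```

With two competing seer claims, seat 1 comes out most likely to be a werewolf: seat 2's claim being true only requires seat 1 to be a wolf, whereas seat 1's claim being true requires both seat 2 and seat 3 to be wolves, which leaves fewer consistent worlds.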

Created on 03 Apr. 2024

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.