Enhance Reasoning for Large Language Models in the Game Werewolf

AI-generated keywords: Large Language Models, External Thinker Module, Dual-System Reasoning, Social Deduction Games, Data-Driven Approaches

AI-generated Key Points

  • A novel framework proposed to enhance reasoning capabilities of Large Language Models (LLMs) by integrating them with an external Thinker module
  • Thinker module leverages knowledge from databases and optimization techniques for complex logical analysis and domain-specific tasks
  • Hierarchy established where LLMs focus on System-1 tasks, while Thinker specializes in cognitive System-2 reasoning
  • Framework demonstrated effectiveness in a 9-player Werewolf game scenario requiring dual-system reasoning
  • Communication protocol introduced to facilitate interaction between LLMs and the Thinker; the Thinker itself is trained using data from 18,800 human game sessions and reinforcement learning
  • Experimental results showcase proficiency in deductive reasoning, speech generation, and online game evaluation
  • A fine-tuned 6B LLM surpasses GPT4 when integrated with the Thinker
  • Contributes the largest dataset for social deduction games to date, built from authentic game sessions and speech data
  • Framework comprises three key components: The Listener for natural language understanding, the Thinker for deep logical analysis and decision-making, and the Presenter for generating coherent language output based on strategic instructions from the Thinker
  • Data preparation involved collecting game data from Werewolf sessions enriched with domain-specific corpus sources and utilizing advanced models like Paraformer for Automatic Speech Recognition (ASR)
  • Study aims to develop more generalized and flexible methods by incorporating data-driven approaches
  • Evaluation involves testing AI performance in games involving AI vs AI or one human player against multiple AIs; challenges arise in evaluating AI performance in settings with human players due to interactive nature and varied speech strategies
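
The Listener, Thinker, and Presenter split described in the key points can be illustrated with a toy pipeline. This is only a minimal sketch under loud assumptions: the `Claim` and `Instruction` types, the `"accuse"`/`"defend"` labels, and the keyword and vote-counting rules are all hypothetical stand-ins invented for illustration. In the paper the Listener and Presenter are LLM-based and the Thinker is trained on human data with reinforcement learning; none of that is reproduced here. The sketch only shows how structured messages could sit between the language components and the reasoning component.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Claim:
    """Hypothetical structured feature extracted from one speech."""
    speaker: int           # seat number of the speaking player
    target: Optional[int]  # seat the speech refers to
    label: str             # illustrative labels: "accuse" or "defend"


@dataclass
class Instruction:
    """Hypothetical strategic instruction passed from Thinker to Presenter."""
    action: str
    target: Optional[int]


def listener(speaker: int, speech: str) -> List[Claim]:
    """Stand-in for the Listener: keyword matching instead of an LLM."""
    words = re.findall(r"[a-z0-9]+", speech.lower())
    label = "accuse" if "werewolf" in words else "defend"
    claims = []
    for i, word in enumerate(words[:-1]):
        if word == "player" and words[i + 1].isdigit():
            claims.append(Claim(speaker, int(words[i + 1]), label))
    return claims


def thinker(claims: List[Claim], me: int) -> Instruction:
    """Stand-in for the Thinker: vote for the most-accused other seat."""
    counts: Dict[int, int] = {}
    for claim in claims:
        if claim.label == "accuse" and claim.target not in (None, me):
            counts[claim.target] = counts.get(claim.target, 0) + 1
    if not counts:
        return Instruction("abstain", None)
    # Ties are broken in favour of the lowest seat number.
    target = max(sorted(counts), key=lambda seat: counts[seat])
    return Instruction("vote", target)


def presenter(instruction: Instruction) -> str:
    """Stand-in for the Presenter: a template instead of an LLM."""
    if instruction.action == "vote":
        return f"I vote for player {instruction.target}."
    return "I have no strong suspicion this round."


if __name__ == "__main__":
    speeches = [
        (1, "I think player 3 is a werewolf."),
        (2, "Player 3 is a werewolf, I am sure."),
        (3, "Player 1 is a werewolf!"),
    ]
    all_claims = [c for seat, text in speeches for c in listener(seat, text)]
    print(presenter(thinker(all_claims, me=4)))
```

Keeping the middle stage free of raw text is the point of the design: the reasoning component only ever sees and emits structured values, so it can be swapped for a learned policy without touching the language side.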

Authors: Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, Haobo Fu

License: CC BY 4.0

Abstract: This paper presents an innovative framework that integrates Large Language Models (LLMs) with an external Thinker module to enhance the reasoning capabilities of LLM-based agents. Unlike augmenting LLMs with prompt engineering, Thinker directly harnesses knowledge from databases and employs various optimization techniques. The framework forms a reasoning hierarchy where LLMs handle intuitive System-1 tasks such as natural language processing, while the Thinker focuses on cognitive System-2 tasks that require complex logical analysis and domain-specific knowledge. Our framework is presented using a 9-player Werewolf game that demands dual-system reasoning. We introduce a communication protocol between LLMs and the Thinker, and train the Thinker using data from 18800 human sessions and reinforcement learning. Experiments demonstrate the framework's effectiveness in deductive reasoning, speech generation, and online game evaluation. Additionally, we fine-tune a 6B LLM to surpass GPT4 when integrated with the Thinker. This paper also contributes the largest dataset for social deduction games to date.

Submitted to arXiv on 04 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.02330v1

A novel framework is proposed to enhance the reasoning capabilities of Large Language Models (LLMs) by integrating them with an external Thinker module. Unlike methods that rely on prompt engineering, the Thinker module leverages knowledge from databases and employs optimization techniques to handle complex logical analysis and domain-specific tasks. The framework establishes a hierarchy where LLMs focus on intuitive System-1 tasks like natural language processing, while the Thinker specializes in cognitive System-2 reasoning.

The effectiveness of this framework is demonstrated in a 9-player Werewolf game, which requires dual-system reasoning. A communication protocol is introduced to facilitate interaction between LLMs and the Thinker, and the Thinker is trained using data from 18,800 human game sessions and reinforcement learning. Experimental results showcase the framework's proficiency in deductive reasoning, speech generation, and online game evaluation. Furthermore, a fine-tuned 6B LLM surpasses GPT4 when integrated with the Thinker. This study also contributes the largest dataset for social deduction games to date.

The methodology involves collecting authentic game sessions and speech data to align closely with real-world scenarios and human interaction patterns. The framework comprises three key components: the Listener for natural language understanding, the Thinker for deep logical analysis and decision-making, and the Presenter for generating coherent language output based on strategic instructions from the Thinker. Data preparation involved collecting game data from Werewolf sessions hosted on an online platform, enriching it with domain-specific corpus sources, and transcribing speech with Automatic Speech Recognition (ASR) models such as Paraformer. Overall, this study aims to develop more generalized and flexible methods by incorporating data-driven approaches.

Evaluation involves testing AI performance in games played AI vs AI or with one human player against multiple AIs. Evaluating AI performance in settings with human players is challenging because of their interactive nature and the varied speech strategies humans use.
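
For the AI vs AI setting mentioned above, one simple way to summarize results is a per-role win rate. The sketch below is an assumption about how such a tally could be computed, not the paper's evaluation protocol or metric; the `win_rates` helper and the record layout are hypothetical, and the records in the usage example are made-up placeholders rather than reported results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def win_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of games won, grouped by role.

    Each record is (role_of_evaluated_agent, whether_that_agent_side_won).
    This record layout is an assumption made for illustration.
    """
    games: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for role, won in records:
        games[role] += 1
        wins[role] += int(won)
    return {role: wins[role] / games[role] for role in games}


if __name__ == "__main__":
    # Placeholder records, not data from the paper.
    placeholder = [("werewolf", True), ("werewolf", False), ("villager", True)]
    print(win_rates(placeholder))
```

A tally like this only makes sense when every seat is controlled by a fixed agent; with a human in the game, the same numbers are confounded by how that person chooses to speak and vote, which is the difficulty the summary points to.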
Created on 03 Apr. 2024
