Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs

AI-generated keywords: Social Intelligence, Theory of Mind, Natural Language Processing, GPT-3, Person-Centric NLP

AI-generated Key Points

  • The paper explores social intelligence and Theory of Mind (ToM) in NLP systems
  • It focuses on GPT-3 and evaluates its performance in terms of social intelligence
  • GPT-3 struggles with Theory of Mind tasks, reaching only 55% accuracy on SocialIQa and 60% on ToMi, well below human performance
  • The authors argue that scale alone is not sufficient to give models Theory of Mind
  • Person-centric NLP approaches may be more effective in developing neural Theory of Mind capabilities
  • On the existing ToMi QA benchmark (Le et al., 2019), GPT-3 is much less accurate on mental-state questions (60%) than on factual questions (90–100%)
  • Ongoing challenges exist in incorporating social intelligence and ToM into large language models

Authors: Maarten Sap, Ronan LeBras, Daniel Fried, Yejin Choi

EMNLP 2022
License: CC BY 4.0

Abstract: Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open question of social intelligence and Theory of Mind in modern NLP systems from an empirical and theory-based perspective. We show that one of today's largest language models (GPT-3; Brown et al., 2020) lacks this kind of social intelligence out-of-the-box, using two tasks: SocialIQa (Sap et al., 2019), which measures models' ability to understand intents and reactions of participants of social interactions, and ToMi (Le et al., 2019), which measures whether models can infer mental states and realities of participants of situations. Our results show that models struggle substantially at these Theory of Mind tasks, with well-below-human accuracies of 55% and 60% on SocialIQa and ToMi, respectively. To conclude, we draw on theories from pragmatics to contextualize this shortcoming of large language models, by examining the limitations stemming from their data, neural architecture, and training paradigms. Challenging the prevalent narrative that only scale is needed, we posit that person-centric NLP approaches might be more effective towards neural Theory of Mind.

Submitted to arXiv on 24 Oct. 2022

Results of the summarizing process for the arXiv paper: 2210.13312v1

The paper titled "Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs" explores social intelligence and Theory of Mind (ToM) in modern Natural Language Processing (NLP) systems. Social intelligence and ToM refer to the ability to understand and reason about the mental states, intentions, and reactions of individuals involved in social interactions. As NLP systems are increasingly used in complex social situations, it becomes crucial for them to grasp social dynamics.

The study focuses on GPT-3 (Brown et al., 2020), one of the largest language models available at the time, and investigates its out-of-the-box social intelligence. Two tasks are used to evaluate GPT-3: SocialIQa (Sap et al., 2019), which measures the model's understanding of intents and reactions in social interactions; and ToMi (Le et al., 2019), which assesses whether the model can infer the mental states and realities of participants in a situation. ToMi is a question-answering benchmark inspired by the classic Sally-Anne false-belief Theory of Mind test.

The results reveal that GPT-3 struggles with these Theory of Mind tasks, with accuracies well below human performance: 55% on SocialIQa and 60% on ToMi. On ToMi, the 60% figure applies to questions about participants' mental states, compared with 90–100% accuracy on factual questions.

To explain this shortcoming, the authors draw on theories from pragmatics and examine the limitations that stem from the data, neural architecture, and training paradigms of large language models. They challenge the prevailing belief that scale alone is sufficient, and suggest that person-centric NLP approaches may be more effective for developing neural Theory of Mind. Overall, the research highlights ongoing challenges in incorporating social intelligence and Theory of Mind into large language models.
Created on 25 Sep. 2023
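
To make the distinction between factual and mental-state questions concrete, here is a minimal Python sketch of a Sally-Anne-style false-belief story. It is an illustration only, not the ToMi dataset format or the authors' evaluation code; the `World` class, its methods, and the `sally_anne_story` helper are hypothetical names introduced here. The sketch assumes that an agent's belief about an object is updated only when the agent is present while the object is placed or moved.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy story state (hypothetical, for illustration only)."""
    location: dict = field(default_factory=dict)  # object -> container (reality)
    present: set = field(default_factory=set)     # agents currently in the room
    belief: dict = field(default_factory=dict)    # (agent, object) -> container

    def enter(self, agent):
        self.present.add(agent)
        # An entering agent observes where every object currently is.
        for obj, place in self.location.items():
            self.belief[(agent, obj)] = place

    def leave(self, agent):
        self.present.discard(agent)

    def put(self, obj, place):
        # Reality always changes...
        self.location[obj] = place
        # ...but only agents in the room witness it and update their belief.
        for witness in self.present:
            self.belief[(witness, obj)] = place


def sally_anne_story():
    w = World()
    w.enter("Sally")
    w.enter("Anne")
    w.put("marble", "basket")  # both agents see the marble in the basket
    w.leave("Sally")
    w.put("marble", "box")     # Anne moves it while Sally is away
    return w


w = sally_anne_story()
# Factual question: where is the marble really?
print("Reality:", w.location["marble"])
# Mental-state questions: where does each agent think the marble is?
print("Sally believes:", w.belief[("Sally", "marble")])
print("Anne believes:", w.belief[("Anne", "marble")])
```

In this toy story the factual answer and Sally's belief diverge, which is the kind of gap a mental-state question probes: answering it requires tracking what each participant witnessed, not just the final state of the world.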
