Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea

AI-generated keywords: Language Models, North Korea, Misinformation, Geopolitical Contexts, Accuracy

AI-generated Key Points

  • Researchers investigate how large language models (LLMs) generate information about North Korea, a country known for its lack of reliable sources and the prevalence of sensationalist falsehoods
  • The research addresses two main questions:
      1. How do current LLMs generate information about North Korea given the scarcity of reliable sources?
      2. Are there differences in how various LLMs generate information about North Korea across different languages?
  • The constructed dataset covers two categories of topics: widely circulated false rumors that credible sources have rarely corrected, and lesser-known information
  • Evaluated LLMs: ChatGPT-3.5, Gemini, Claude 3 Sonnet, Solar-Mini (for Korean), and Qwen-72B (for Mandarin Chinese), tested in Korean, English, and Mandarin Chinese
  • The study measures the models' accuracy, consistency, and refusal-to-answer rates on 13 topics with verifiable ground truth
  • Study highlights critical nuances overlooked in addressing LLM hallucinations and misinformation; emphasizes need for rigorous scrutiny when using LLMs in multiple languages in sensitive geopolitical contexts
  • Background section discusses history of misinformation surrounding North Korea due to lack of communication with outside world; Western media's contribution to sensationalist reporting and false information; attitudes towards North Koreans and journalistic standards related to reporting on North Korea
  • Findings show model capacity doesn't always correlate with higher accuracy; Claude 3 Sonnet exhibited highest accuracy across all three languages tested, followed by ChatGPT-3.5 and Gemini; Gemini's lower accuracy attributed to high refusal-to-answer frequency; consistency levels varied across languages and models
  • Research sheds light on how different LLMs generate information about North Korea across various languages; underscores importance of critically evaluating their outputs in sensitive geopolitical contexts where misinformation can have significant implications

Authors: Eunjung Cho, Won Ik Cho, Soomin Seo

Accepted at COLING 2025
License: CC BY-SA 4.0

Abstract: Hallucination in large language models (LLMs) remains a significant challenge for their safe deployment, particularly due to its potential to spread misinformation. Most existing solutions address this challenge by focusing on aligning the models with credible sources or by improving how models communicate their confidence (or lack thereof) in their outputs. While these measures may be effective in most contexts, they may fall short in scenarios requiring more nuanced approaches, especially in situations where access to accurate data is limited or determining credible sources is challenging. In this study, we take North Korea - a country characterised by an extreme lack of reliable sources and the prevalence of sensationalist falsehoods - as a case study. We explore and evaluate how some of the best-performing multilingual LLMs and specific language-based models generate information about North Korea in three languages spoken in countries with significant geo-political interests: English (United States, United Kingdom), Korean (South Korea), and Mandarin Chinese (China). Our findings reveal significant differences, suggesting that the choice of model and language can lead to vastly different understandings of North Korea, which has important implications given the global security challenges the country poses.

Submitted to arXiv on 10 Jan. 2025

Results of the summarizing process for the arXiv paper: 2501.05981v1

In this study, researchers investigate how large language models (LLMs) generate information about North Korea. The country is known for its extreme lack of reliable sources and prevalence of sensationalist falsehoods. The research aims to address two main questions: 1) How do current LLMs generate information about topics on North Korea given the scarcity of reliable sources? 2) Are there differences in how various LLMs generate information about North Korea across different languages?

To answer these questions, the researchers construct a dataset focusing on two categories of topics about North Korea: widely circulated but false rumors with limited correction by credible sources and lesser-known information. They evaluate some of the most widely used LLMs - ChatGPT-3.5, Gemini, Claude 3 Sonnet, Solar-Mini (for Korean), and Qwen-72B (for Mandarin Chinese) - in three languages: Korean, English, and Mandarin Chinese. For 13 topics with verifiable ground truth, they measure accuracy, consistency, and refusal-to-answer rates of the models.

The study makes two key contributions: highlighting critical nuances overlooked in current methods for addressing LLM hallucinations and misinformation; emphasizing the need for more rigorous scrutiny when using LLMs in multiple languages, especially in sensitive geopolitical contexts where misinformation can have serious consequences.

The background section discusses the history of misinformation surrounding North Korea due to its lack of communication with the outside world. It also touches on how Western media coverage has contributed to sensationalist reporting and false information about the country. Additionally, it explores attitudes towards North Koreans and journalistic standards related to reporting on North Korea.

The findings reveal that model capacity does not always correlate with higher accuracy. Claude 3 Sonnet exhibited the highest accuracy across all three languages tested, followed by ChatGPT-3.5 and Gemini. Gemini's lower accuracy was attributed to its high refusal-to-answer frequency. Consistency levels varied across languages and models.

In conclusion, this research sheds light on how different LLMs generate information about North Korea across various languages and highlights the importance of critically evaluating their outputs in sensitive geopolitical contexts where misinformation can have significant implications.
Created on 16 Jan. 2025
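
The three measures named in the summary (accuracy, consistency, and refusal-to-answer rate) can be made concrete with a small sketch. The definitions below are assumptions for illustration only and are not taken from the paper: each model response is hand-labelled as correct, incorrect, or refused; accuracy and refusal rate are fractions of all responses; and a topic counts as consistent when every repeated response to it carries the same label. The `Trial` class, the label names, and the `summarize` function are hypothetical, as is the sample data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical response labels; the paper's annotation scheme may differ.
CORRECT, INCORRECT, REFUSED = "correct", "incorrect", "refused"


@dataclass
class Trial:
    """One labelled model response (illustrative structure, not from the paper)."""
    model: str
    language: str
    topic: str
    label: str  # one of CORRECT, INCORRECT, REFUSED


def summarize(trials):
    """Compute accuracy, refusal rate, and consistency for a list of trials.

    Assumed definitions:
      accuracy     = correct responses / all responses (refusals count as not correct)
      refusal_rate = refused responses / all responses
      consistency  = topics whose responses all share one label / all topics
    """
    if not trials:
        raise ValueError("at least one trial is required")

    total = len(trials)
    counts = Counter(t.label for t in trials)

    # Group the labels of repeated queries by topic.
    labels_by_topic = {}
    for t in trials:
        labels_by_topic.setdefault(t.topic, []).append(t.label)

    consistent_topics = sum(
        1 for labels in labels_by_topic.values() if len(set(labels)) == 1
    )

    return {
        "accuracy": counts[CORRECT] / total,
        "refusal_rate": counts[REFUSED] / total,
        "consistency": consistent_topics / len(labels_by_topic),
    }


# Made-up sample: one model, one language, two topics, two queries per topic.
example = [
    Trial("model_a", "en", "topic_1", CORRECT),
    Trial("model_a", "en", "topic_1", CORRECT),
    Trial("model_a", "en", "topic_2", CORRECT),
    Trial("model_a", "en", "topic_2", REFUSED),
]

if __name__ == "__main__":
    print(summarize(example))
```

Because refusals count against accuracy under these assumed definitions, a model that frequently declines to answer scores lower on accuracy even when it rarely states falsehoods, which matches the summary's remark that Gemini's lower accuracy was attributed to its refusal frequency. In practice the same function would be called once per model and language pair so the results can be compared across both.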

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.