Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

AI-generated keywords: Large Language Models, Hallucinations, Detecting, Mitigating, Curating

AI-generated Key Points

  • Large language models (LLMs) are prone to hallucination: they generate content that deviates from user input or established knowledge.
  • The survey covers recent efforts to detect, explain, and mitigate hallucinations, with particular emphasis on the challenges unique to LLMs.
  • Taxonomies of LLM hallucination phenomena and evaluation benchmarks are presented, along with an analysis of existing approaches to mitigate them.
  • Curating pre-training corpora is important to reduce hallucinations during training; strategies like up-sampling data from factual sources and adding topic prefixes to sentences can improve LLM performance and reduce hallucinations.
  • Mitigating hallucinations during supervised fine-tuning (SFT) is crucial; well-designed SFT strategies are needed to prevent inaccurate responses from LLMs.
  • Effective selection and filtering strategies for data curation could help mitigate hallucinations in LLMs; future research may explore new approaches to improve the reliability of LLMs in real-world scenarios.

Authors: Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

work in progress; 32 pages
License: CC BY-NC-SA 4.0

Abstract: While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.

Submitted to arXiv on 03 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.01219v1

This paper addresses hallucination in large language models (LLMs): LLMs occasionally generate content that deviates from user input or established knowledge. It surveys recent efforts to detect, explain, and mitigate hallucinations, with particular emphasis on the challenges unique to LLMs. It presents taxonomies of LLM hallucination phenomena and evaluation benchmarks, along with an analysis of existing mitigation approaches. The paper also discusses the importance of curating pre-training corpora to reduce hallucinations introduced during training; strategies such as up-sampling data from factual sources like Wikipedia and adding topic prefixes to sentences have been proposed to improve LLM performance and reduce hallucinations. It further examines mitigation during supervised fine-tuning (SFT), highlighting the need for well-designed SFT strategies to prevent inaccurate responses. Overall, the paper suggests that more effective selection and filtering strategies for data curation could help mitigate hallucinations. Future research may explore new approaches to this challenge and improve the reliability of LLMs in real-world scenarios.
Created on 31 Mar. 2024
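
The two pre-training data strategies named in the summary, up-sampling factual sources and adding topic prefixes to sentences, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Document` type, the function names, the "topic: sentence" prefix format, and the use of whole-document duplication as a stand-in for sampling weights.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical container for one pre-training document."""
    source: str                 # e.g. "wikipedia" or "web" (assumed labels)
    topic: str                  # e.g. a page title (assumed to be available)
    sentences: tuple[str, ...]


def add_topic_prefix(doc: Document) -> list[str]:
    # Prepend the document topic to every sentence so each sentence
    # carries its subject even when read in isolation.
    # The "topic: sentence" format is an assumption for this sketch.
    return [f"{doc.topic}: {s}" for s in doc.sentences]


def upsample(docs: list[Document],
             factual_sources: frozenset[str],
             factor: int) -> list[Document]:
    # Repeat documents from sources treated as factual `factor` times;
    # all other documents appear once. Integer duplication is a
    # simplification of weighted sampling.
    out: list[Document] = []
    for doc in docs:
        repeats = factor if doc.source in factual_sources else 1
        out.extend([doc] * repeats)
    return out


def build_corpus(docs: list[Document],
                 factual_sources: frozenset[str] = frozenset({"wikipedia"}),
                 factor: int = 2) -> list[str]:
    # Up-sample first, then flatten to topic-prefixed sentences.
    corpus: list[str] = []
    for doc in upsample(docs, factual_sources, factor):
        corpus.extend(add_topic_prefix(doc))
    return corpus


docs = [
    Document("wikipedia", "Paris", ("It is the capital of France.",)),
    Document("web", "Blog", ("Hello.",)),
]
corpus = build_corpus(docs)
```

With the default `factor=2`, the Wikipedia sentence appears twice in `corpus` and the web sentence once, each carrying its topic prefix. A real pipeline would more likely adjust sampling probabilities than duplicate text, and might apply prefixes only to selected sources; both choices are left open here.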
