Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models

AI-generated keywords: Knowledge Injection, Large Language Models (LLMs), Entity Triplets, Entity Summaries, NBA Domain

AI-generated Key Points

  • Authors enhance performance of Large Language Models (LLMs) through knowledge injection
  • Utilize entity triplets and summaries from Wikipedia API to create 54K training samples for NBA domain
  • Preserve triplet format for naturalness in generated responses
  • Model trained using special token "TRUE_FACT:" and causal language model objective
  • Experiment with two settings for knowledge injection: Intermediate tuning and Combined tuning
  • Evaluate effectiveness of techniques and knowledge retention during intermediate finetuning stages
  • Introduce HaloCheck, a lightweight BlackBox framework for quantifying hallucinations in LLMs
  • Compare with selfcheckGPT-NLI to show efficiency in detecting contradictions in responses
  • Contribute insights into reducing hallucinations in low-parameter LLMs and introduce novel evaluation framework
  • Pave way for future research to expand approaches across domains and improve model performance

Authors: Mohamed Elaraby, Mengyin Lu, Jacob Dunn, Xueying Zhang, Yu Wang, Shizhu Liu, Pingchuan Tian, Yuping Wang, Yuxuan Wang

License: CC BY-SA 4.0

Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP). Although convenient for research and practical applications, open-source LLMs with fewer parameters often suffer from severe hallucinations compared to their larger counterparts. This paper focuses on measuring and reducing hallucinations in BLOOM 7B, a representative of such weaker open-source LLMs that are publicly available for research and commercial applications. We introduce HaloCheck, a lightweight BlackBox knowledge-free framework designed to quantify the severity of hallucinations in LLMs. Additionally, we explore techniques like knowledge injection and teacher-student approaches to alleviate hallucinations in low-parameter LLMs. Our experiments effectively demonstrate the reduction of hallucinations in challenging domains for these LLMs.

Submitted to arXiv on 22 Aug. 2023

Results of the summarizing process for the arXiv paper: 2308.11764v4

Building on previous research, the authors use knowledge injection to improve the performance of Large Language Models (LLMs). From entity triplets and entity summaries extracted through the Wikipedia API, they create 54K training samples for the NBA domain. Unlike previous approaches, they preserve the triplet format to maintain naturalness in generated responses. Because the model is decoder-only, it is trained with a causal language model objective, using the special token "TRUE_FACT:". The authors experiment with two knowledge-injection settings: intermediate tuning, in which the model is finetuned exclusively on the knowledge text before the supervised finetuning (SFT) data, and combined tuning, in which both types of data are used for finetuning jointly. They evaluate the effectiveness of both settings, along with how well knowledge is retained across the intermediate finetuning stages. The authors also introduce HaloCheck, a lightweight BlackBox framework that quantifies hallucinations in LLMs without requiring extensive computational resources or question generation modules. Quantitative and qualitative comparisons with selfcheckGPT-NLI show its efficiency in detecting subtle contradictions within sampled responses. The study contributes insights into reducing hallucinations in low-parameter LLMs and a novel framework for evaluating hallucination severity in generated responses. It also opens the way for future research to extend these approaches to multiple domains and improve model performance on challenging tasks.
Created on 04 Sep. 2024
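
The knowledge-injection setup described in the summary can be illustrated with a minimal Python sketch. It is not the authors' code: the summary only states that triplets and summaries are used with the special token "TRUE_FACT:" and that the two settings differ in whether knowledge text is seen before the SFT data or together with it. Placing the token as a prefix, joining the triplet fields with spaces, and the helper names `triplet_to_sample`, `intermediate_schedule`, and `combined_schedule` are all assumptions made for illustration, as is the example triplet.

```python
import random
from typing import List, Sequence, Tuple

# Special token named in the summary; using it as a prefix is an assumption.
TRUE_FACT_TOKEN = "TRUE_FACT:"

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def triplet_to_sample(triplet: Triplet) -> str:
    """Render one triplet as a knowledge sample (hypothetical serialization)."""
    subject, relation, obj = triplet
    return f"{TRUE_FACT_TOKEN} {subject} {relation} {obj}"


def intermediate_schedule(knowledge: Sequence[str], sft: Sequence[str]) -> List[List[str]]:
    """Intermediate tuning: one stage on knowledge text only, then one stage on SFT data."""
    return [list(knowledge), list(sft)]


def combined_schedule(knowledge: Sequence[str], sft: Sequence[str], seed: int = 0) -> List[List[str]]:
    """Combined tuning: a single stage over both kinds of data, mixed together."""
    mixed = list(knowledge) + list(sft)
    random.Random(seed).shuffle(mixed)  # shuffling is an assumption, not from the source
    return [mixed]


# Illustrative inputs only; not taken from the paper's 54K samples.
knowledge = [triplet_to_sample(("Los Angeles Lakers", "league", "NBA"))]
sft = ["Question: Which league do the Lakers play in? Answer: The NBA."]

stages_intermediate = intermediate_schedule(knowledge, sft)
stages_combined = combined_schedule(knowledge, sft)
```

Each inner list stands for one finetuning stage, so the two settings differ only in how the same samples are grouped: two consecutive stages for intermediate tuning, one mixed stage for combined tuning.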
