Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

AI-generated keywords: Large Vision Language Models, Hallucination Evaluation, Event Hallucinations, Discriminative and Generative Evaluation, Comprehensive Taxonomy

AI-generated Key Points

  • Large Vision Language Models (LVLMs) struggle with hallucinations, i.e., inconsistencies between images and their descriptions
  • Previous research has focused on hallucinations related to objects, attributes, and relations in LVLMs but overlooked complex narrative-based hallucinations
  • Authors introduce a refined taxonomy of hallucinations that includes a new category: Event Hallucination
  • Using advanced LLMs, authors generate and filter fine-grained hallucinatory data with a focus on event hallucinations
  • Proposed benchmark aims to assess LVLMs' ability to handle various types of hallucinations effectively
  • Authors provide a reliable tool for evaluating LVLMs' efficacy in addressing hallucination issues through their taxonomy and evaluation framework
  • Authors plan to release code and data for further research and development in this area
  • Authors emphasize the importance of annotating different types of hallucinations for better understanding and evaluation

Authors: Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

License: CC BY 4.0

Abstract: Large Vision Language Models (LVLMs) exhibit remarkable capabilities but struggle with hallucinations: inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination. We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations, with a particular focus on event hallucinations, laying the groundwork for integrating discriminative and generative evaluation methods within our universal evaluation framework. The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations, making it a reliable and comprehensive tool for gauging LVLMs' efficacy in handling hallucinations. We will release our code and data.

Submitted to arXiv on 24 Feb. 2024


Results of the summarizing process for the arXiv paper: 2402.15721v1

The paper "Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models" by Chaoya Jiang et al. addresses the hallucinations that Large Vision Language Models (LVLMs) produce, that is, inconsistencies between images and their descriptions. Previous evaluation studies identified such hallucinations at the level of objects, attributes, and relations, but overlooked complex hallucinations that construct an entire narrative around a fictional entity. To close this gap, the authors introduce a refined taxonomy of hallucinations that includes a new category: Event Hallucination. Using advanced LLMs, they generate and filter fine-grained hallucinatory data covering the various hallucination types, with a particular focus on event hallucinations. This data lays the foundation for integrating discriminative and generative evaluation methods within a universal evaluation framework. The proposed benchmark assesses LVLMs' ability to handle a broad spectrum of hallucinations, and the authors present it as a reliable and comprehensive tool for gauging LVLMs' efficacy in handling hallucinations. They also plan to release their code and data for further research and development in this area. In addition to discussing the generative evaluation methods currently used to assess models by the hallucinatory content they generate, the paper emphasizes the importance of annotating the different types of hallucinations to improve understanding and evaluation. Overall, "Hal-Eval" contributes a comprehensive taxonomy and evaluation framework for assessing how well LVLMs handle complex hallucinations.
Created on 20 Mar. 2024
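
To make the idea of a fine-grained, per-category evaluation concrete, here is a minimal sketch of how a discriminative (yes/no) evaluation could be scored separately for each hallucination type in the taxonomy. The four type names come from the summary above; the record layout, the function name `per_type_accuracy`, the choice of accuracy as the metric, and the toy data are assumptions made for illustration and are not taken from the paper or its released code.

```python
from collections import defaultdict
from enum import Enum


class HallucinationType(Enum):
    """The four categories named in the taxonomy described above."""
    OBJECT = "object"
    ATTRIBUTE = "attribute"
    RELATION = "relation"
    EVENT = "event"


def per_type_accuracy(records):
    """Compute accuracy separately for each hallucination type.

    records: iterable of (HallucinationType, gold, pred) tuples, where
    gold and pred are booleans answering the (hypothetical) question
    "does this caption contain a hallucination of this type?".
    Types with no records are omitted from the result.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for h_type, gold, pred in records:
        total[h_type] += 1
        correct[h_type] += int(gold == pred)
    return {h_type: correct[h_type] / total[h_type] for h_type in total}


# Made-up toy records, purely for illustration (not benchmark data).
toy_records = [
    (HallucinationType.OBJECT, True, True),
    (HallucinationType.OBJECT, False, True),
    (HallucinationType.EVENT, True, False),
    (HallucinationType.EVENT, True, True),
    (HallucinationType.EVENT, False, False),
    (HallucinationType.RELATION, False, False),
]

scores = per_type_accuracy(toy_records)
for h_type, acc in scores.items():
    print(f"{h_type.value}: {acc:.2f}")
```

Reporting one score per type, rather than a single pooled number, is what would let a weakness on event hallucinations show up even when object-level performance is strong.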

