Human Uncertainty in Concept-Based AI Systems
AI-generated Key Points
- Placing a human in the loop can mitigate risks in deploying AI systems in safety-critical settings.
- Addressing challenges arising from human error and uncertainty within human-AI interactions is crucial but understudied.
- Real-world decision-making involves occasional mistakes and uncertainty, contrary to assumptions made in previous research.
- Training with uncertain concept labels can help address weaknesses in concept-based systems when dealing with uncertain interventions.
- Annotator calibration is a challenge: some annotations are poorly calibrated, and these degrade overall calibration.
- Annotators tend to underestimate small probabilities and overestimate large probabilities, potentially influenced by cognitive load reduction or rounding errors.
- Poor calibration raises questions about the interface used or inherent limitations in eliciting uncertainties in a crowdsourcing setting with limited cognitive resources.
- The richness of the CUB-S dataset poses a substantial challenge for concept-based models.
- Using models trained on coarse-grained uncertainty from CUB helps mitigate failures under test-time uncertainty to some extent but is not perfect.
- Intervening with coarse-grained annotations affects performance on the bird species classification task.
- Evaluation protocols focus on task accuracy. Skyline interventions represent the best possible intervention policy and highlight the impact of different types of uncertainty on performance.
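
The calibration points above can be made concrete with a small sketch. It computes a binned expected calibration error (ECE), a standard metric that compares stated probabilities with how often the concept is actually present. This summary does not say which calibration metric the authors used, so the metric choice, the function name `expected_calibration_error`, and the toy annotations are all assumptions made for illustration.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary concepts (illustrative; not the authors' code).

    probs  : annotator-stated probabilities that a concept is present
    labels : ground-truth presence of the concept (0 or 1)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Map each probability to a bin index; p == 1.0 falls in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        # Gap between mean stated probability and empirical frequency in this bin,
        # weighted by the fraction of annotations that fall in the bin.
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap
    return ece

# Hypothetical annotations: one annotator whose extreme answers are always right,
# and one who says "certain" for every concept but is right half the time.
calibrated = expected_calibration_error([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
overconfident = expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 1, 0, 0])
print(calibrated, overconfident)
```

A lower value means stated probabilities track observed frequencies more closely. Pooling annotations from both hypothetical annotators would raise the overall ECE, which is the effect the calibration bullet describes.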
Authors: Katherine M. Collins, Matthew Barker, Mateo Espinosa Zarlenga, Naveen Raman, Umang Bhatt, Mateja Jamnik, Ilia Sucholutsky, Adrian Weller, Krishnamurthy Dvijotham
Abstract: Placing a human in the loop may abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such human-AI interactions is an important and understudied issue. In this work, we study human uncertainty in the context of concept-based models, a family of AI systems that enable human feedback via concept interventions where an expert intervenes on human-interpretable concepts relevant to the task. Prior work in this space often assumes that humans are oracles who are always certain and correct. Yet, real-world decision-making by humans is prone to occasional mistakes and uncertainty. We study how existing concept-based models deal with uncertain interventions from humans using two novel datasets: UMNIST, a visual dataset with controlled simulated uncertainty based on the MNIST dataset, and CUB-S, a relabeling of the popular CUB concept dataset with rich, densely-annotated soft labels from humans. We show that training with uncertain concept labels may help mitigate weaknesses of concept-based systems when handling uncertain interventions. These results allow us to identify several open challenges, which we argue can be tackled through future multidisciplinary research on building interactive uncertainty-aware systems. To facilitate further research, we release a new elicitation platform, UElic, to collect uncertain feedback from humans in collaborative prediction tasks.
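
As a rough illustration of the concept interventions the abstract describes, the sketch below shows a toy concept bottleneck: predicted concept probabilities feed a linear label predictor, and a human can overwrite a concept with either a certain value (1.0) or an uncertain soft label (for example 0.6). This is a minimal sketch under stated assumptions, not the authors' models or code; the function names `intervene` and `predict_label`, the weights, and the probabilities are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def intervene(concept_probs, interventions):
    """Return a copy of the concept probabilities with human-provided
    values written over the model's predictions.

    interventions maps a concept index to a probability in [0, 1];
    1.0 or 0.0 is a certain answer, anything in between is uncertain.
    """
    updated = concept_probs.copy()
    for idx, human_prob in interventions.items():
        updated[idx] = human_prob
    return updated

def predict_label(concept_probs, weights, bias):
    """Toy label predictor: a linear layer on concepts followed by softmax."""
    return softmax(weights @ concept_probs + bias)

# Hypothetical model state: three concepts, two classes.
concept_probs = np.array([0.2, 0.7, 0.5])
weights = np.array([[2.0, -1.0, 0.5],
                    [-2.0, 1.0, -0.5]])
bias = np.zeros(2)

before = predict_label(concept_probs, weights, bias)
certain = predict_label(intervene(concept_probs, {0: 1.0}), weights, bias)
uncertain = predict_label(intervene(concept_probs, {0: 0.6}), weights, bias)
print(before, certain, uncertain)
```

In this toy setup the uncertain intervention shifts the class prediction less than the certain one. A model that has only seen hard 0/1 concept labels during training may respond poorly to in-between values like 0.6, which is the weakness the paper reports can be mitigated by training with uncertain concept labels.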