The Complexity of Why-Provenance for Datalog Queries
AI-generated Key Points
- Explainable AI aims to understand why a database query result is obtained
- Why-provenance is a standard way to explain a query result by providing information about the witnesses to that result
- Computational complexity of why-provenance for Datalog queries remains unexplored
- Why-provenance for recursive queries, even with limited linear recursion, is an intractable problem
- Non-recursive queries are highly tractable for why-provenance
- Unambiguous proof trees are introduced to overcome arbitrary proof trees' conceptual limitations and do not affect the data complexity of the why-provenance problem
- SAT solvers can help compute why-provenance efficiently and make it applicable in practice for (recursive) Datalog queries
- Future research should provide a complete classification of the data complexity of the why-provenance problem as a dichotomy result and explore combined complexity when Datalog queries are part of input.
- Thorough experimental evaluations are necessary to better understand our SAT-based machinery's practical application potential.
- Understanding why a database query result is obtained through explainability techniques like why-provenance is crucial towards achieving Explainable AI goals in ontology-based applications using expressive database query languages like Datalog.
Authors: Marco Calautti, Ester Livshits, Andreas Pieris, Markus Schneider
Abstract: Explaining why a database query result is obtained is an essential task towards the goal of Explainable AI, especially nowadays where expressive database query languages such as Datalog play a critical role in the development of ontology-based applications. A standard way of explaining a query result is the so-called why-provenance, which essentially provides information about the witnesses to a query result in the form of subsets of the input database that are sufficient to derive that result. To our surprise, despite the fact that the notion of why-provenance for Datalog queries has been around for decades and intensively studied, its computational complexity remains unexplored. The goal of this work is to fill this apparent gap in the why-provenance literature. Towards this end, we pinpoint the data complexity of why-provenance for Datalog queries and key subclasses thereof. The takeaway of our work is that why-provenance for recursive queries, even if the recursion is limited to be linear, is an intractable problem, whereas for non-recursive queries is highly tractable. Having said that, we experimentally confirm, by exploiting SAT solvers, that making why-provenance for (recursive) Datalog queries work in practice is not an unrealistic goal.
Ask questions about this paper to our AI assistant
You can also chat with multiple papers at once here.
Assess the quality of the AI-generated content by voting
Score: 0
Why do we need votes?
Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.
The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.
Look for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.
Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.