=Paper=
{{Paper
|id=Vol-3741/paper34
|storemode=property
|title=Computing the Why-Provenance for Datalog Queries via SAT Solvers
|pdfUrl=https://ceur-ws.org/Vol-3741/paper34.pdf
|volume=Vol-3741
|authors=Marco Calautti,Ester Livshits,Andreas Pieris,Markus Schneider
|dblpUrl=https://dblp.org/rec/conf/sebd/CalauttiLPS24
}}
==Computing the Why-Provenance for Datalog Queries via SAT Solvers==
Computing the Why-Provenance for Datalog Queries via SAT Solversβ (Discussion Paper) Marco Calautti1,** , Ester Livshits2 , Andreas Pieris2,3 and Markus Schneider2 1 Department of Computer Science, University of Milan 2 School of Informatics, University of Edinburgh 3 Department of Computer Science, University of Cyprus Abstract Explaining an answer to a Datalog query is an essential task towards Explainable AI, especially nowadays where Datalog plays a critical role in the development of ontology-based applications. A well-established approach for explaining a query answer is the so-called why-provenance, which essentially collects all the subsets of the input database that can be used to obtain that answer via some derivation process, typically represented as a proof tree. It is well known, however, that computing the why-provenance for Datalog queries is computationally expensive, and thus, very few attempts can be found in the literature. The goal of this work is to demonstrate how off-the-shelf SAT solvers can be exploited towards an efficient computation of the why-provenance for Datalog queries. Interestingly, our SAT-based approach allows us to build the why-provenance in an incremental fashion, that is, one explanation at a time, which is much more useful in a practical context than the one-shot computation of the whole set of explanations as done by existing approaches. Keywords Datalog queries, explainability, why-provenance, computational complexity 1. Introduction Datalog has emerged in the 1980s as a logic-based query language from Logic Programming and has been extensively studied since then [2]. The name Datalog reflects the intention of devising a counterpart of Prolog for data processing. It essentially extends the language of unions of conjunctive queries, which corresponds to the select-project-join-union fragment of relational algebra, with the important feature of recursion, much needed to express some natural queries. As for any other query language, explaining why a result to a Datalog query is obtained is crucial towards transparent data-intensive applications. A well-established approach for providing such explanations to query answers is the so-called why-provenance [3]. Its essence is to collect all the subsets of the input database that as a whole can be used to derive a certain answer. More precisely, for Datalog queries, the why-provenance of an answer tuple Β―π‘ is obtained by considering all the possible proof trees π of the fact Ans(π‘Β―), with Ans being the answer predicate of the Datalog query in question, and then collecting all the database facts that label the leaves of π . Recall that a proof tree of a fact πΌ w.r.t. a database π· and a set Ξ£ of Datalog SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy β This is a short version of the paper [1] ** Corresponding author. $ marco.calautti@unimi.it (M. Calautti); ester.livshits@ed.ac.uk (E. Livshits); apieris@inf.ed.ac.uk (A. Pieris); m.schneider@ed.ac.uk (M. Schneider) Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings rules forms a tree-like representation of a way for deriving πΌ by starting from π· and executing the rules occurring in Ξ£ [2]. Despite its wide acceptance, why-provenance for Datalog queries comes with two weaknesses: it is computationally very expensive, and it may provide counterintuitive explanations. The first weakness is manifested by the fact that, although why-provenance for Datalog queries has been around for decades, only a couple of works have considered implementing it for recursive queries [4, 5]. An attempt to change this state of affairs was made in [6] by focusing on the more practical setting of computing the why-provenance of a given query answer (a.k.a. on-demand why-provenance), instead of computing the why-provenance for all the query answers. Concerning the second weakness, it has been observed that there are proof trees that correspond to unnatural derivation processes, e.g., derivations where an atom is derived from itself [7]. Now, an explanation witnessed via such an unnatural proof tree, might be classified as a counterintuitive one as it does not correspond to an intuitive derivation process that can be extracted from the proof tree; this is further discussed in Section 3. The main goal of this work is to tackle the two weaknesses of why-provenance for Datalog queries discussed above. In particular, we place our work in the more practical setting of on-demand why-provenance, and target an efficient implementation that provides conceptually meaningful explanations for the given query answer. 2. Preliminaries We consider the disjoint countably infinite sets C and V of constants and variables, respectively. We may refer to constants and variables as terms. For brevity, given an integer π > 0, we may write [π] for the set of integers {1, . . . , π}. Relational Databases. A schema S is a finite set of relation names (or predicates) with associated arity. We write π /π to say that π has arity π β₯ 0; we may write ar(π ) for π. A (relational) atom πΌ over S is an expression of the form π (π‘Β―), where π /π β S and Β―π‘ is an π-tuple of terms. By abuse of notation, we may treat tuples as the set of their elements. A fact is an atom that mentions only constants. A database over S is a finite set of facts over S. The active domain of a database π·, denoted dom(π·), is the set of constants in π·. Syntax and Semantics of Datalog Programs. A (Datalog) rule π over a schema S is an expression of the form Β― 0 ) :β π 1 (π₯ π 0 (π₯ Β― 1 ), . . . , π π (π₯ Β―π) for π β₯ 1, where π π (π₯Β― π ) is a (constant-free) relational atom over S for π β {0, . . . , π}, and each variable in π₯ Β― 0 occurs in π₯ Β― π for some π β [π]. We refer to π 0 (π₯ Β― 0 ) as the head of π, denoted head(π), and to the expression that appears on the right of the :β symbol as the body of π, denoted body(π), which we may treat as the set of its atoms. A Datalog program over a schema S is defined as a finite set Ξ£ of Datalog rules over S. A predicate π occurring in Ξ£ is called extensional if there is no rule in Ξ£ having π in its head, and intensional if there exists at least one rule in Ξ£ with π in its head. The extensional (database) schema of Ξ£, denoted edb(Ξ£), is the set of all extensional predicates in Ξ£, while the intensional schema of Ξ£, denoted idb(Ξ£), is the set of all intensional predicates in Ξ£. Note that, by definition, edb(Ξ£) β© idb(Ξ£) = β . The schema of Ξ£, denoted sch(Ξ£), is the set edb(Ξ£) βͺ idb(Ξ£), which is in general a subset of S since some predicates of S may not appear in Ξ£. An elegant property of Datalog programs is that they have three equivalent semantics: model-theoretic, fixpoint, and proof-theoretic [2]. We recall the proof-theoretic semantics of Datalog programs since it is closer to the notion of why-provenance. To this end, we need the key notion of proof tree of a fact. For a database π· and a Datalog program Ξ£, let base(π·, Ξ£) = {π (π‘Β―) | π β sch(Ξ£) and Β―π‘ β dom(π·)ar(π ) }, the facts that can be formed using predicates of sch(Ξ£) and constants of dom(π·). Definition 2.1. (Proof Tree) Consider a Datalog program Ξ£, a database π· over edb(Ξ£), and a fact πΌ over sch(Ξ£). A proof tree of πΌ w.r.t. π· and Ξ£ is a finite labeled rooted tree π = (π, πΈ, π), with π : π β base(π·, Ξ£), such that: 1. If π£ β π is the root, then π(π£) = πΌ. 2. If π£ β π is a leaf, then π(π£) β π·. 3. If π£ β π is a node with π β₯ 1 children π’1 , . . . , π’ βοΈπ , then there is a rule π 0 (π₯Β― 0 ) :β π 1 (π₯ Β― π ) β Ξ£ and a function β : πβ[π] π₯ Β― 1 ), . . . , π π (π₯ Β― π β C such that π(π£) = π 0 (β(π₯ Β― 0 )), and π(π’π ) = π π (β(π₯ Β― π )) for each π β [π]. Essentially, a proof tree of a fact πΌ w.r.t. π· and Ξ£ indicates that we can derive πΌ starting from π· end executing the rules of Ξ£. Now, given a Datalog program Ξ£ and a database π· over sch(Ξ£), the semantics of Ξ£ on π· is Ξ£(π·) = {πΌ | there is a proof tree of πΌ w.r.t. π· and Ξ£}. That is, the set of facts that can be proven using π· and Ξ£. Datalog Queries. It is now straightforward to recall the syntax and the semantics of Datalog queries. A Datalog query is a pair π = (Ξ£, π ), where Ξ£ is a Datalog program and π a predicate of idb(Ξ£). Now, for a database π· over edb(Ξ£), the answer to π over π· is defined as the set {οΈ }οΈ π(π·) = Β―π‘ β dom(π·)ar(π ) | π (π‘Β―) β Ξ£(π·) , that is, the set of tuples Β―π‘ of dom(π·)ar(π ) such that the fact π (π‘Β―) can be derived using π· and Ξ£. Why-Provenance for Datalog Queries. As discussed in the introduction, why-provenance is a standard way of explaining query results. It essentially collects all the subsets of the database (without unnecessary atoms) that allow to prove (or derive) a query result. We now formalize this simple idea. Given a proof tree π = (π, πΈ, π) (of some fact w.r.t. some database and Datalog program), the support of π is the set support(π ) = {π(π£) | π£ β π is a leaf of π }, which is essentially the set of facts that label the leaves of the proof tree π . Note that support(π ) is a subset of the underlying database since, by definition, the leaves of a proof tree are labeled with database atoms. The formal definition of why-provenance for Datalog queries follows. Definition 2.2. (Why-Provenance for Datalog) Consider a Datalog query π = (Ξ£, π ), a database π· over edb(Ξ£), and a tuple Β―π‘ β dom(π·)ar(π ) . The why-provenance of Β―π‘ w.r.t. π· and π is defined as the family of sets of facts why(π‘Β―, π·, π) = {support(π ) | π is a proof tree of π (π‘Β―) w.r.t. π· and Ξ£}. Intuitively speaking, a set of facts π·β² β π· that belongs to why(π‘Β―, π·, π) should be understood as a reason why the tuple Β―π‘ is an answer to the query π over the database π·, i.e., π·β² explains why Β―π‘ β π(π·). In particular, all the facts of π·β² are used in order to derive Β―π‘ as an answer. 3. Unambiguous Proof Trees The standard notion of why-provenance defined above relies on arbitrary proof trees. However, as discussed in [7], there are proof trees that are counterintuitive. For example, such a proof tree is one where a fact is derived from itself, that is, it contains two nodes labeled with the same fact and one is a descendant of the other. Now, a member of why(π‘Β―, π·, π), witnessed via such an unnatural proof tree, might be classified as a counterintuitive explanation of Β―π‘ as it does not correspond to a natural derivation process that can be extracted from the proof tree. Therefore, we need refined classes of proof trees that overcome the conceptual limitations of arbitrary proof trees. Some refined classes of proof trees have been recently discussed in [7]: non-recursive proof trees, minimal-depth proof trees, and hereditary minimal-depth proof trees. Roughly speaking, a non-recursive proof tree is a proof tree that does not contain two nodes labeled with the same fact and one is a descendant of the other, a minimal-depth proof tree is a proof tree that has the minimum depth among all the proof trees of the same fact, and a hereditary minimal-depth proof tree is minimizing the depth of each of its subtrees. Although non-recursive and (hereditary) minimal-depth proof trees form well-justified notions that deserve our attention, there are still proof trees from those classes that can be classified as counterintuitive. In fact, there are proof trees that are non-recursive and (hereditary) minimal-depth, but they are ambiguous in the way some facts are derived. Here is a simple example that illustrates this phenomenon. Example 3.1. Consider the Datalog program Ξ£ π΄(π₯) :β π(π₯) π΄(π₯) :β π΄(π¦), π΄(π§), π (π¦, π§, π₯) that encodes the path accessibility problem [8]. The predicate π represents source nodes, π΄ represents nodes that are accessible from the source nodes, and π represents accessibility conditions, that is, π (π¦, π§, π₯) means that if both π¦ and π§ are accessible from the source nodes, then so is π₯. We further consider the database π· = {π(π), π(π), π (π, π, π), π (π, π, π), π (π, π, π)}. The following is a proof tree of the fact π΄(π) w.r.t. π· and Ξ£ that is non-recursive and (hereditary) minimal-depth, but it suffers from the ambiguity issue described above: π΄(π) π΄(π) π΄(π) π (π, π, π) π΄(π) π΄(π) π (π, π, π) π΄(π) π΄(π) π (π, π, π) π(π) π(π) π(π) π(π) Indeed, there are two nodes labeled with the fact π΄(π), but their subtrees differ, and thus, it is ambiguous how π΄(π) is derived. Hence, the database π·, which belongs to the why-provenance of (π) w.r.t. π· and π due to the above proof tree, might be classified as a counterintuitive explanation since it does not correspond to an intuitive derivation process where each fact is derived once due to a unique reason. Indeed, the intuitive explanations that one expects are the following: π is accessible from π via π (i.e., the subset {π(π), π (π, π, π), π (π, π, π)} of π·), or π is accessible from π via π (i.e., the subset {π(π), π (π, π, π), π (π, π, π)} of π·). This leads to the class of unambiguous proof trees, where all occurrences of a fact must be proved via the same derivation. Two rooted trees π = (π, πΈ, π) and π β² = (π β² , πΈ β² , πβ² ) are isomorphic, denoted π β π β² , if there exists a bijection β : π β π β² such that, for each π£ β π , π(π£) = πβ² (β(π£)), and for each π’, π£ β π , (π’, π£) β πΈ iff (β(π’), β(π£)) β πΈ β² . Let π [π£] be the subtree of the proof tree π rooted at node π£. The formal definition of unambiguous proof trees follows. Definition 3.2. (Unambiguous Proof Tree) Consider a Datalog program Ξ£, a database π· over edb(Ξ£), and a fact πΌ over sch(Ξ£). An unambiguous proof tree of πΌ w.r.t. π· and Ξ£ is a proof tree π = (π, πΈ, π) of πΌ w.r.t. π· and Ξ£ such that, for all π£, π’ β π , π(π£) = π(π’) implies π [π£] β π [π’]. Why-provenance relative to unambiguous proof trees is defined in the obvious way: for a Datalog query π = (Ξ£, π ), a database π· over edb(Ξ£), and a tuple Β―π‘ β dom(π·)ar(π ) , the why-provenance of Β―π‘ w.r.t. π· and π relative to unambiguous proof trees is the family of sets of facts {support(π ) | π is an unambiguous proof tree of π (π‘Β―) w.r.t. π· and Ξ£}, denoted whyUN (π‘Β―, π·, π). The main concern of this work is to efficiently compute the why- provenance of a tuple relative to unambiguous proof trees. To this end, we are going to exploit off-the-shelf SAT solvers and report encouraging results; this is the subject of the next two sections. To the best of our knowledge, this is the first time that SAT solvers are used for computing the why-provenance. Let us stress that focusing on unambiguous proof trees, apart from their conceptual advantage discussed above, was crucial towards our encouraging results as it is unclear how a SAT-based implementation can be made practical for arbitrary proof trees. 4. From Why-Provenance to SAT In this section, we show that the why-provenance of a tuple relative to unambiguous proof trees can be extracted from the satisfying truth assignments of a Boolean formula. Compactly Representing Unambiguous Proof Trees. The construction of the Boolean formula relies on a characterization of the existence of an unambiguous proof tree of a fact πΌ w.r.t. π· and Ξ£ via the existence of a so-called compressed directed acyclic graph (DAG) of πΌ w.r.t. π· and Ξ£, which, intuitively speaking, is a compact representation of an unambiguous proof tree of πΌ w.r.t. π· and Ξ£; this is needed, as an unambiguous proof tree can be exponentially large in the size of π·. We proceed to formalize the notion of compressed DAG and give the characterization in question. Recall that a DAG πΊ is rooted if it has exactly one node, the root, with no incoming edges. A node of πΊ is a leaf if it has no outgoing edges. Definition 4.1. (Compressed DAG) Consider a Datalog program Ξ£, a database π· over edb(Ξ£), and a fact πΌ over sch(Ξ£). A compressed DAG of πΌ w.r.t. π· and Ξ£ is a rooted DAG πΊ = (π, πΈ), with π β base(π·, Ξ£), such that: 1. The root of πΊ is πΌ. 2. If π½ β π is a leaf node, then π½ β π·. 3. If π½ β π has π β₯ 1 outgoing edges (π½, πΎ1 ), . . . , (π½, βοΈ πΎπ ), then there is a rule π 0 (π₯Β― 0 ) :β π 1 (π₯ Β― π ) β Ξ£ and a function β : πβ[π] π₯ Β― 1 ), . . . , π π (π₯ Β― π β C such that π½ = π 0 (β(π₯ Β― 0 )) and {πΎπ }πβ[π] = {π π (β(π₯ Β― π )) | π β [π]}. For a compressed DAG πΊ = (π, πΈ), the support of πΊ is defined analogously to the support of a proof tree, that is, support(πΊ) = {π£ β π | π£ is a leaf of πΊ}. The desired characterization follows. Proposition 4.2. For a Datalog program Ξ£, a database π· over edb(Ξ£), a fact πΌ over sch(Ξ£), and a database π·β² β π·, the following are equivalent: 1. There exists an unambiguous proof tree π of πΌ w.r.t. π· and Ξ£ such that support(π ) = π·β² . 2. There exists a compressed DAG πΊ of πΌ w.r.t. π· and Ξ£ such that support(πΊ) = π·β² . Note that the above characterization relies on the fact that we focus on unambiguous proof trees. More precisely, unambiguity allows us to use a single node in the compressed DAG as a representative for all the (possibly exponentially many) nodes in the proof tree labelled with the same fact. The Boolean Formula. Fix a Datalog query π = (Ξ£, π ), a database π· over edb(Ξ£), and a tuple Β―π‘ β dom(π·)ar(π ) . We construct in polynomial time in π· a Boolean formula π(π‘Β―,π·,π) such that the why-provenance of Β―π‘ w.r.t. π· and π relative to unambiguous proof trees can be computed from the truth assignments that make π(π‘Β―,π·,π) true. Among the Boolean variables of π(π‘Β―,π·,π) , we also have the disjoint sets of variables ππ and ππΈ , where each variable in ππ corresponds to a possible node of some compressed DAG of π (π‘Β―) w.r.t. π· and Ξ£, while each variable in ππΈ corresponds to a possible edge between two nodes of some compressed DAG of π (π‘Β―) w.r.t. π· and Ξ£. The key idea is that the variables of ππ and ππΈ that become true via a satisfying truth assignment of π(π‘Β―,π·,π) , induce the nodes and the edges of a compressed DAG πΊ for π (π‘Β―) w.r.t. π· and Ξ£, which, by Proposition 4.2, implies that support(πΊ) β whyUN (π‘Β―, π·, π). The Boolean formula π(π‘Β―,π·,π) is a conjunction of the form πgraph β§ πroot β§ πproof β§ πacyclic , where πgraph is in charge of guaranteeing consistency between the truth assignments of the variables in ππ and the variables in ππΈ , i.e., if an edge between two nodes is part of πΊ, then the two nodes must belong to πΊ as well. The formula πroot guarantees that the atom π (π‘Β―) is a node of πΊ, is the root of πΊ, and no other node π£ of πΊ can be the root (i.e., π£ must have at least one incoming edge). The formula πproof is in charge of guaranteeing that, whenever an intensional atom πΌ is a node of πΊ, then it must have the correct children in πΊ. Finally, πacyclic is in charge of checking that πΊ, namely the graph whose edges correspond to the true variables in ππΈ , is acyclic. We can prove the following crucial result about the above Boolean formula: Scenario Databases Recursive #Rules TClosure [9, 10] π·bitcoin (235K), π·facebook (88.2K) β 2 Doctors [6] π·1 (100K) β 6 Galen [6] π·1 (26.5K), π·2 (30.5K), π·3 (67K), π·4 (82K) β 14 Andersen [11] π·1 (68K), π·2 (340K), π·3 (680K), π·4 (3.4M), π·5 (6.8M) β 2 CSDA [11] π·httpd (10M), π·postgresql (34.8M), π·linux (44M) β 2 Table 1 Experimental scenarios. For each database π·, the number of tuples occurring in π· is in parenthesis. Proposition 4.3. Consider a Datalog query π = (Ξ£, π ), a database π· over edb(Ξ£), and a tuple Β―π‘ β dom(π·)ar(π ) . It holds that: β’ π(π‘Β―,π·,π) is in CNF and can be constructed in polynomial time w.r.t. π·. β’ There is a one-to-one correspondence between the members of whyUN (π‘Β―, π·, π) and the satisfying assignments of π(π‘Β―,π·,π) . β’ Each member of whyUN (π‘Β―, π·, π) can be recovered from a satisfying assignment of π(π‘Β―,π·,π) in polynomial time w.r.t. the size of π(π‘Β―,π·,π) . 5. Implementation and Experimental Evaluation Proposition 4.3 provides a way for computing the why-provenance of a tuple relative to unam- biguous proof trees via off-the-shelf SAT solvers. But how does this machinery behave when applied in a practical context? In particular, we are interested in the incremental computation of the why-provenance by enumerating its members, which is more useful in practice than computing the whole set at once. This is achieved by adapting a standard technique from the SAT literature for enumerating the satisfying assignments of a Boolean formula, called blocking clause. After asking the SAT solver for an arbitrary satisfying assignment π of π(π‘Β―,π·,π) , we output the member of whyUN (π‘Β―, π·, π), denoted db(π ), corresponding to π , and then construct a βblockingβ clause πΆdb(π ) which expresses that no other satisfying assignment π β² should give rise to the same member of the why-provenance. This will exclude the previously computed explanations from the computation. We keep adding such blocking clauses each time we get a new member of the why-provenance until the formula is unsatisfiable. We now proceed to experimentally evaluate the SAT-based approach discussed above. To this end, we consider a variety of scenarios from the Datalog literature consisting of a query π = (Ξ£, π ) and a family of databases π over edb(Ξ£). All the scenarios are summarized in Table 1. Experimental Setup. For each scenario π consisting of the query π = (Ξ£, π ) and the family of databases π, and for each π· β π, we have computed π(π·) using DLV [12], version 2.1.1, and selected 100 tuples Β―π‘1π ,π· , . . . , Β―π‘100 π ,π· from π(π·) uniformly at random. Then, for each π β [100], we constructed the Boolean formula π(π‘Β―π ,π·,π) via a C++ implementation. Finally, we ran π ,π· the state-of-the-art SAT solver Glucose [13], version 4.2.1, with the above formula as input, to enumerate the members of whyUN (π‘Β―ππ ,π· , π·, π). All the experiments have been conducted on a laptop with an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, and 32GB of RAM, running Fedora Linux 37. Experimental Results and Take-Home Messages. Concerning the construction of the formula, in most of the scenarios, the runtime is in the order of some seconds, and we have observed that almost all the time is spent for computing the intermediate data structures, needed to build the formula. Considering the more demanding scenarios Andersen and CSDA, the total time is in the order of some minutes. Note, however, that it would be similarly demanding even for query answering. For the incremental computation of the why-provenance, we considered the time between the current member of the why-provenance and the next (the delay). All delays are in the order of milliseconds. Hence, once the formula is built, incrementally computing the members of the why-provenance is extremely fast. We also compared our SAT-based approach with the one of [6], which is the only one in the literature for constructing the why-provenance of a given tuple. We point out that for a query π, a database π·, and a tuple Β―π‘ β π(π·), the technique of [6], which we call rule-based, constructs the set why(π‘Β―, π·, π) via a rewriting of π into a set of existential rules. Since whyUN (π‘Β―, π·, π) β why(π‘Β―, π·, π), one may think that computing why(π‘Β―, π·, π) is more demanding. However, computing whyUN (π‘Β―, π·, π) requires checking for unambiguity. We use VLog [14], version 0.9.0, for the rule-based implementation, and we perform the comparison over the Galen and Doctors-based scenarios as these are the only ones considered in [6], and thus, we have access to the rewritten set of existential rules. For the comparison, since the approach of [6] does not support incremental computation, we considered the end-to-end runtime for constructing whyUN (π‘Β―, π·, π) and why(π‘Β―, π·, π) using the SAT-based and the rule- based implementation, respectively; we set a 5 minutes timeout for both approaches. We observed that our SAT-based implementation consistently outperforms the rule-based one. In particular, for the Galen scenario, in most cases, the rule-based implementation does not finish in less than 5 minutes, with the worst case occurring with the largest database, where 41 out of the 100 runs time out. 6. Future Steps From our analysis, it is clear that our future efforts should focus on improving the construction of the Boolean formula. Since proof trees describe a finite reasoning process, it will be interesting to understand how they can be adapted to other rule-based formalisms with finite reasoning, in order to apply our SAT-based approach, such as logic programs with finite stable models [15, 16, 17], and ontology-mediated queries that guarantee the termination of the chase (e.g., see [18, 19, 20, 21, 22, 23]). Finally, it would be interesting to see how to adapt our approach to explain answers obtained over uncertain data, such as inconsistent databases (e.g., see [24, 25, 26, 27, 28]). Acknowledgments This work was funded by the European Union - Next Generation EU under the MUR PRIN-PNRR grant P2022KHTX7 βDISTORTβ, and by the EPSRC grant EP/S003800/1 βEQUIDβ. References [1] M. Calautti, E. Livshits, A. Pieris, M. Schneider, Computing the why-provenance for datalog queries via SAT solvers, in: AAAI, 2024, pp. 10459β10466. [2] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995. [3] P. Buneman, S. Khanna, W. C. Tan, Why and where: A characterization of data provenance, in: ICDT, 2001, pp. 316β330. [4] D. Zhao, P. Subotic, B. Scholz, Debugging large-scale datalog: A scalable provenance evaluation strategy, ACM Trans. Program. Lang. Syst. 42 (2020) 7:1β7:35. [5] J. Esparza, M. Luttenberger, M. Schlund, Fpsolve: A generic solver for fixpoint equations over semirings, in: CIAA, 2014, pp. 1β15. [6] A. Elhalawati, M. KrΓΆtzsch, S. Mennicke, An existential rule framework for computing why-provenance on-demand for datalog, in: RuleML+RR, 2022. [7] C. Bourgaux, P. Bourhis, L. Peterfreund, M. Thomazo, Revisiting semiring provenance for datalog, in: KR, 2022. [8] S. A. Cook, An observation on time-storage trade off, J. Comput. Syst. Sci. 9 (1974) 308β316. [9] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, C. E. Leiserson, Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics, CoRR abs/1908.02591 (2019). URL: http://arxiv.org/abs/1908.02591. [10] J. McAuley, J. Leskovec, Learning to discover social circles in ego networks, in: NIPS, 2012, p. 539β547. [11] Z. Fan, S. Mallireddy, P. Koutris, Towards better understanding of the performance and design of datalog systems, in: Datalog 2.0, 2022, pp. 166β180. [12] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, F. Scarcello, The DLV system for knowledge representation and reasoning, ACM Trans. Comput. Log. 7 (2006) 499β562. [13] G. Audemard, L. Simon, On the glucose SAT solver, Int. J. Artif. Intell. Tools 27 (2018) 1840001:1β1840001:25. [14] J. Urbani, C. Jacobs, M. KrΓΆtzsch, Column-oriented datalog materialization for large knowledge graphs, in: AAAI, 2016, pp. 258β264. [15] M. Calautti, S. Greco, F. Spezzano, I. Trubitsyna, Checking termination of bottom-up evaluation of logic programs with function symbols, TPLP 15 (2015) 854β889. [16] M. Calautti, S. Greco, I. Trubitsyna, Detecting decidable classes of finitely ground logic programs with function symbols, in: PPDP, 2013, pp. 239β250. [17] M. Calautti, S. Greco, C. Molinaro, I. Trubitsyna, Logic program termination analysis using atom sizes, in: IJCAI, 2015, pp. 2833β2839. [18] T. Gogacz, J. Marcinkowski, A. Pieris, Uniform restricted chase termination, SIAM J. Comput. 52 (2023) 641β683. [19] M. Calautti, A. Pieris, Semi-oblivious chase termination: The sticky case, ToCS 65 (2021) 84β121. [20] M. Calautti, G. Gottlob, A. Pieris, Non-uniformly terminating chase: Size and complexity, in: PODS, 2022, pp. 369β378. [21] M. KrΓΆtzsch, M. Marx, S. Rudolph, The power of the terminating chase (invited talk), in: ICDT, volume 127, 2019, pp. 3:1β3:17. [22] M. Calautti, M. Milani, A. Pieris, Semi-oblivious chase termination for linear existential rules: An experimental study, VLDB 16 (2023) 2858β2870. [23] M. Calautti, G. Gottlob, A. Pieris, Chase termination for guarded existential rules, in: PODS, 2015, pp. 91β103. [24] M. Arenas, L. E. Bertossi, J. Chomicki, Consistent query answers in inconsistent databases, in: PODS, 1999, pp. 68β79. [25] M. Calautti, M. Console, A. Pieris, Counting database repairs under primary keys revisited, in: PODS, 2019, pp. 104β118. [26] M. Calautti, S. Greco, C. Molinaro, I. Trubitsyna, Preference-based inconsistency-tolerant query answering under existential rules, AI 312 (2022) 103772. [27] M. Calautti, L. Caroprese, S. Greco, C. Molinaro, I. Trubitsyna, E. Zumpano, Existential active integrity constraints, ESWA 168 (2021) 114297. [28] M. Calautti, M. Console, A. Pieris, Benchmarking approximate consistent query answering, in: PODS, 2021, pp. 233β246.