                         Rule-aware Datalog Fact Explanation Using Group-SAT
                         Solver
                         Akira Charoensit1 , David Carral1 , Pierre Bisquert1,2 , Lucas Rouquette1 and Federico Ulliana1
                         1
                             Inria, LIRMM, Univ Montpellier, CNRS, France
                         2
                             IATE, Univ Montpellier, INRAE, Institut Agro, Montpellier, France


                                        Abstract
                                        One of the major benefits of symbolic AI is explainability. When new knowledge is obtained via a reasoning
                                        process, it is possible to determine precisely all elements of the knowledge base that yield this knowledge.
                                        Typically, one would use a SAT solver to compute the explanations. However, SAT-solving is computationally
                                        expensive, and as the knowledge base grows, the time required increases exponentially. This work presents a
                                        method for filtering a Datalog knowledge base to optimise the time used by a SAT solver. This is achieved by
                                        creating a hypergraph representing the grounded knowledge base and pruning the nodes that are not reachable
                                        from the fact that we want to explain. This approach proves to be time-effective. Interestingly, one additional
                                        benefit of using this hypergraph is that it is possible to encode more information about the rules used in the
                                        reasoning process. By using an off-the-shelf group-SAT solver, this extra information allows us to find specific
                                        explanations that would be missed if we only considered facts.




                         1. The Explanation Issue
                         In the realm of Artificial Intelligence, symbolic AI stands out for its ability to provide explainable results
                         and predictions. In particular, knowledge and rule-based systems make it possible for users to trace the
                         derivation of new knowledge back to the exact elements that have contributed to it. This traceability
                         is crucial for applications where understanding the reasoning process is as important as the outcome
                         itself, such as in healthcare or regulatory enforcement.
                            While the issue of computing explanations has long been studied for Description Logics (see e.g.,
                         [1, 2, 3, 4, 5]), the extension of the approach to mainstream rule languages such as Datalog has not
                         been considered so far. Datalog is widely recognised as a language which leverages recursion for data
                         processing [6] and plays a significant role in ontology-mediated query answering [7]. On the one hand,
                         it encompasses RDF-Schema and OWL-RL ontologies. On the other hand, many reasoning tasks on
                         ontologies can be reduced to query answering in this language [8]. Providing explanations for Datalog
                         is thus a step towards more reliable data-intensive applications, and the goal of this work is to contribute
                         to the explainability of Datalog.
   Explaining reasoning in Datalog raises, however, two key challenges. The first is to choose a form of
explanation that is suitable for the reasoning task. The second is to actually compute such explanations,
which is known to be expensive even for basic forms of explanation [9, 10].

Choosing Explanations Let us illustrate the importance of choosing the "right" notion of explanation.
Hereafter, we assume the reader is familiar with propositional and first-order logic. Consider the
knowledge base 𝒦 = ⟨ℛ, ℱ⟩ in Figure 1 with facts ℱ = {Boss(alice, alice)} and rules ℛ = {r1, r2,
r3}. Consider the task of explaining the fact φ = Manager(alice), which is entailed by 𝒦. As Figure 1
illustrates, φ can be derived in two different ways starting from ℱ. One is by applying r1. The other is
by applying r2 and then r3.


RuleML+RR'24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16–22, 2024,
Bucharest, Romania
✉ akira.charoensit@inria.fr (A. Charoensit); david.carral@inria.fr (D. Carral); pierre.bisquert@inrae.fr (P. Bisquert);
lucas.rouquette@inria.fr (L. Rouquette); federico.ulliana@inria.fr (F. Ulliana)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


r1: ∀xy.Boss(x, y) → Manager(x)
r2: ∀x.Boss(x, x) → CEO(x)
r3: ∀x.CEO(x) → Manager(x)

A = Boss(alice, alice)    B = CEO(alice)    C = Manager(alice)

SAT encoding: A ∧ (A → C) ∧ (A → B) ∧ (B → C) ∧ ¬C
MUS 1: A ∧ (A → C) ∧ ¬C                MUS 2: A ∧ (A → B) ∧ (B → C) ∧ ¬C
explanation 1: facts Boss(alice, alice); rules r1
explanation 2: facts Boss(alice, alice); rules r2, r3

Figure 1: Computing Minimal KB-Support Explanations via MUSes


   One possible form of explanation consists in computing the fact-support of φ with respect to ℱ,
that is, the set of facts ℱ′ ⊆ ℱ that can contribute to the derivation of φ. The notion of fact-support
corresponds to that of why-provenance [8, 10, 9, 5]. For the example in Figure 1, this would yield ℱ′ = ℱ.
Here, fact-support provides insufficient information, as it does not show the role of the rules in the
reasoning task.
   A more meaningful notion of explanation here is that of kb-support, that is, a subset 𝒦′ of 𝒦
which entails φ. To be precise, we need to identify minimal subsets of the knowledge base that preserve
the entailment, since users typically prefer concise explanations. In the example of Figure 1, there are
two kb-support explanations: 𝒦′1 = ⟨{r1}, ℱ⟩ and 𝒦′2 = ⟨{r2, r3}, ℱ⟩. The interest of the notion of
kb-support is that it better explains the roles taken by both the rules and the data in the reasoning
task, thereby giving the user more power to potentially take action and revise the knowledge base. In
the context of Description Logics, kb-supports correspond to the notion of justifications with ABoxes
(the standard version of justifications instead considers entailment of axioms over TBoxes) [1, 2]. In this
work, we will thus consider the notion of kb-support, which enables more detailed explanations for fact
entailment over Datalog knowledge bases.

Computing Explanations An effective approach to tackling the expensive theoretical cost of com-
puting explanations is to rely on the use of SAT solvers [2, 8]. Nowadays, these are considered mature
and effective tools. The critical step of this reduction is that of encoding the input knowledge base and
entailment (i.e., the inputs of the explanation problem) as a propositional formula. To illustrate, Figure 1
shows the SAT encoding of the explanation problem, which goes as follows. It associates every atom
entailed by the knowledge base with a distinct propositional variable (here, A, B, and C). Then, a
conjunctive formula is built. Its elements are either:

  (i) a propositional variable corresponding to a fact in ℱ (e.g., A represents Boss(alice, alice)),

  (ii) a grounded rule corresponding to a rule application (e.g., A → B represents Boss(alice, alice) →
       CEO(alice)), or

 (iii) a negated literal, whose propositional variable corresponds to the entailment (e.g., ¬C corresponds
       to ¬Manager(alice)).

  An important detail here is that negating the fact to explain in the encoded formula allows one to
solve the explanation problem by looking at its minimal unsatisfiable sets (MUSes) [11]. As Figure 1
shows, each MUS of the encoded formula provides an explanation.
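To make this reduction concrete, the following sketch encodes the Figure 1 example and enumerates its MUSes by brute force. The helper names (`satisfiable`, `muses`) are illustrative, and a real implementation would delegate both tasks to an off-the-shelf SAT solver; this toy version only works for tiny formulas.

```python
from itertools import combinations, product

def satisfiable(clauses, n_vars):
    """Brute-force SAT check: try every assignment of the n_vars variables.
    A clause is a tuple of literals (positive/negative integers)."""
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

def muses(clauses, n_vars):
    """Enumerate minimal unsatisfiable subsets by increasing size:
    an unsat subset is minimal iff it contains no previously found MUS."""
    found = []
    for k in range(1, len(clauses) + 1):
        for subset in combinations(clauses, k):
            if not satisfiable(subset, n_vars):
                if not any(set(m) <= set(subset) for m in found):
                    found.append(subset)
    return found

# Figure 1 example: A = Boss(alice,alice), B = CEO(alice), C = Manager(alice)
A, B, C = 1, 2, 3
formula = [
    (A,),      # fact A
    (-A, C),   # grounding of r1: A -> C
    (-A, B),   # grounding of r2: A -> B
    (-B, C),   # grounding of r3: B -> C
    (-C,),     # negated entailment
]
for mus in muses(formula, 3):
    print(sorted(mus))
```

Running this yields exactly the two MUSes of Figure 1: one built from the r1 grounding, the other from the r2 and r3 groundings.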

Computing All Explanations Computing explanations via MUS for Datalog can lead to potential
pitfalls. The main issue is that two distinct rule applications can "conflict" by resulting in the same
SAT encoding. This case is illustrated in Figure 2. Here, we can see that the applications of r1 and r2 are
both encoded as A → C. As a result, the MUS enumeration no longer yields a (minimal) explanation.
The situation becomes even more complex when many recursive rules share the same groundings. To
r1: ∀xy.Boss(x, y) → Manager(x)        r2: ∀x.Boss(x, x) → Manager(x)
A = Boss(alice, alice)    C = Manager(alice)

SAT encoding: A ∧ (A → C) ∧ ¬C
single MUS: A ∧ (A → C) ∧ ¬C
explanation: facts Boss(alice, alice); rules r1, r2

Group-SAT encoding: A ∧ {A → C}r1 ∧ {A → C}r2 ∧ ¬C
Group MUS 1: A ∧ {A → C}r1 ∧ ¬C        Group MUS 2: A ∧ {A → C}r2 ∧ ¬C
explanation 1: facts Boss(alice, alice); rules r1
explanation 2: facts Boss(alice, alice); rules r2

Figure 2: SAT vs Group-SAT Translation Example


address these issues, we present an encoding into group-SAT formulas [11] which allows us to establish
a precise correspondence between explanations and group-MUSes.

Computing All Explanations Efficiently While solvers are efficient, MUS computation remains
hard. Therefore, significant challenges may arise in terms of scalability as the size and complexity of
the knowledge base increase. The main reason for this is that in more complex knowledge bases there
may be a larger number of atoms and rules which are not relevant for the entailment to explain. This
is illustrated in Figure 3: atoms s(c) and s(d), as well as rule ๐‘Ÿ4 , are irrelevant for explaining goal(a).
Considering irrelevant atoms will slow down both the encoding computation and the MUS enumeration
task. Hence, we introduce a filtering technique that reduces the size of the encoded formula while
preserving its soundness and completeness.

Contribution The contributions of this paper are the following:

   1. We introduce the first comprehensive approach for computing kb-support explanations for fact
      entailment (a.k.a. justifications with ABoxes) over Datalog knowledge bases.

   2. We present a reduction from Datalog rules and facts to group-SAT formulas, and establish an
      exact correspondence between kb-support explanations and group-MUSes (Section 2).

   3. We study the filtering of facts that are irrelevant for the explanation task and show that it is com-
      putationally hard. In light of this, we present a two-step rule-based approach for approximating
      relevance; this includes a static step that preprocesses the input KB and a dynamic step that
      traces a single entailment (Section 3).

   4. We present an experimental evaluation showing the effectiveness of our approach. In particular,
      we show that the two-step approach achieves good performance when dynamically explaining
      any entailment query (Section 4).


2. From Datalog Explanations to Group-MUS
The Logical Setting We assume a first-order signature with functions. We consider mutually disjoint
sets of variables, constants, and functions. We call term any variable, constant, or functional term of
the form f(t1, . . . , tn) where f is a function symbol and every ti is a term. We write lists t1, . . . , tn
of terms as t̄ and often treat these as sets. An atom is a first-order formula of the form P(t̄) where
P is a relational predicate and t̄ a sequence of terms. An atom is grounded if it does not contain any
variable. For a first-order formula Φ and a list x̄ of variables, we write Φ[x̄] to indicate that x̄ is the
set of all free variables that occur in Φ. A fact is a grounded atom without function symbols. A rule r
is a first-order formula of the form ∀x̄.B[x̄] → H[z̄] where B[x̄] and H[z̄] are conjunctions of atoms
and z̄ ⊆ x̄. Such a rule is called Datalog if it is function-free and H is a single atom. We will often
omit universal quantifiers when writing rules. Moreover, we identify conjunctions of atoms such as B
and H above with the corresponding sets, and define body(r) = B and head(r) = H. A knowledge
base (KB) 𝒦 is a tuple ⟨ℛ, ℱ⟩ where ℛ and ℱ are finite sets of rules and facts, respectively. A Datalog
knowledge base is such that ℱ only contains grounded atoms without function symbols and ℛ only
contains Datalog rules. For 𝒦1 = ⟨ℛ1, ℱ1⟩ and 𝒦2 = ⟨ℛ2, ℱ2⟩ we write 𝒦1 ⊆ 𝒦2 when ℛ1 ⊆ ℛ2 and
ℱ1 ⊆ ℱ2.

Fact Entailment The chase is a standard method for computing universal models of knowledge
bases that can in turn be used for tasks like fact entailment [12]. A substitution is a partial function
mapping variables to ground terms. A trigger for a fact set ℱ and a rule set ℛ is a tuple τ = ⟨r, σ⟩ where
r ∈ ℛ, and σ is a substitution such that σ(body(r)) ⊆ ℱ. The trigger outputs out(τ) = σ(head(r)).
For a KB 𝒦 = ⟨ℛ, ℱ⟩, let Chase0(𝒦) = ℱ; and, for every i ≥ 1, let Chasei(𝒦) be the minimal fact
set that includes Chasei−1(𝒦) and out(τ) for every trigger τ of Chasei−1(𝒦) and ℛ. Moreover, let
Chase(𝒦) = ⋃_{i≥0} Chasei(𝒦). A KB 𝒦 entails a fact φ, denoted 𝒦 ⊨ φ, if and only if φ ∈ Chase(𝒦).
  From now on, every knowledge base considered in this section is Datalog. Knowledge bases with
functions will be employed in Section 3. At this point, we can formally define the notion of explanation
for Datalog knowledge bases.

Definition 1 (Explanation). For a Datalog knowledge base 𝒦 = ⟨ℛ, ℱ⟩ and a fact φ, a kb-support
explanation of 𝒦 ⊨ φ is a KB 𝒦′ ⊆ 𝒦 such that 𝒦′ ⊨ φ but 𝒦″ ⊭ φ for every 𝒦″ ⊂ 𝒦′. We denote by
Expl(𝒦, φ) the set of all explanations of 𝒦 ⊨ φ.

From Explanations to Group-MUS We now show how to reduce the explanation problem to
group-MUS (GMUS) enumeration. Group-SAT (GSAT) formulas are natural extensions of SAT formulas
where constraints are modeled as sets or groups [13]. The underlying idea is that all clauses in a group
must hold together. Below, we assume the standard notions of literal (positive or negative propositional
variable) and clause (disjunction of literals). A group 𝒢 is a set of clauses. To keep the formalisation
concise, we slightly enrich the standard definition and also assume that every group 𝒢 has a unique
identifier i, denoted by 𝒢i. Otherwise said, we consider two groups with the same clauses but with
different identifiers to be different. A GSAT formula F is a set of groups F = {𝒢1, . . . , 𝒢n}. A GSAT
formula is satisfiable if the conjunction of all the clauses in the union of its groups is satisfiable. A
GMUS of F is a minimal set of the groups of F that is unsatisfiable. We write GMUS(F) for the set of all
GMUSes of F. Figure 2 illustrates the GSAT formula stemming from the encoding of the input facts and
rules. In the formula, we have two groups which contain the same (set of) clauses; these are denoted as
{A → C}r1 and {A → C}r2, where the identifier of each group corresponds to the rule that generates
it. Also note that, with this encoding, every GMUS in Figure 2 corresponds to one explanation.
    To reduce the computation of the explanations for a fact ψ over a KB 𝒦 to group-MUS enumeration,
our goal is to produce a GSAT formula which encodes the derivations enabled by the KB as well as
the (negation of the) fact to explain. Let 𝒦 = ⟨ℛ, ℱ⟩; we define GSAT(𝒦 ∧ ¬ψ) as the minimal set
containing:

     • the group 𝒢r with 𝒢 = Grounding𝒦(r), for every r ∈ ℛ

     • the group 𝒢φ with 𝒢 = {φ}, for every φ ∈ ℱ

     • the group 𝒢ψ with 𝒢 = {¬ψ}

Above, Grounding𝒦(r) is the set containing the grounded rule σ(body(r)) → σ(head(r)) for every
trigger τ = ⟨r, σ⟩ over Chase(𝒦). Given a set of groups F, we denote by KBs(F) the knowledge base
built by using all facts and rules that are identifiers of the groups in F. Proposition 1 establishes a
precise correspondence between the explanations for 𝒦 ⊨ ψ and the group-MUSes of GSAT(𝒦 ∧ ¬ψ).
Proposition 1. Expl(𝒦, ψ) = ⋃_{F ∈ GMUS(GSAT(𝒦 ∧ ¬ψ))} KBs(F).
Rules
r1: ∀x.p(x) → t(x, x)
r2: ∀xy.t(x, y) ∧ q(y) → t(y, x)
r3: ∀xy.t(x, x) ∧ t(x, y) → goal(x)
r4: ∀x.s(x) → v(x, x)
Facts
p(a), q(a), t(b, a), s(c), s(d)

Derivations: p(a) yields t(a, a) via r1; t(b, a) and q(a) yield t(a, b) via r2; t(a, a) (paired with
t(a, a) or t(a, b)) yields goal(a) via r3; s(c) and s(d) yield v(c, c) and v(d, d) via r4. The relevant
approximation covers p(a), q(a), t(b, a) and r1, r2, r3.

Figure 3: Relevance: Exact vs Approximate


3. Knowledge-Base Filtering via Relevance
To explain an entailment of a knowledge base, it is often the case that only a portion of the knowledge
baseโ€™s facts and rules are necessary or relevant to that entailment. In this section, we present a formal
definition of relevance and introduce a rule-based technique for filtering out facts that are irrelevant for
explaining a given fact. This technique can drastically reduce the size of the encoded formula to be
solved.

Relevance We say that a fact ψ is relevant for the entailment of a fact φ in a knowledge base 𝒦
if there exists an explanation 𝒦′ ∈ Expl(𝒦, φ) that contains ψ. This definition naturally extends to
rules. Consider the knowledge base in Figure 3 with facts ℱ = {p(a), q(a), t(b, a), s(c), s(d)} and
rules ℛ = {r1, r2, r3, r4}. The entailment φ = goal(a) can be derived in two ways. The first is by
applying r1, which yields t(a, a); this atom allows for the application of r3 to derive φ. The second
is by applying r3 on the results of r1 and r2. The fact φ has, however, only one explanation, namely,
𝒦′ = ⟨{r1, r3}, {p(a)}⟩. Indeed, 𝒦″ = ⟨{r1, r2, r3}, {p(a), q(a), t(b, a)}⟩ is not minimal, as 𝒦′ ⊂ 𝒦″
(see Definition 1). As a result, with Expl(𝒦, φ) = {𝒦′}, the only elements relevant to the entailment of
φ in 𝒦 are p(a), r1, and r3. Despite its apparent simplicity, it turns out that deciding relevance is hard.
This holds true even if one focuses only on deciding the relevance of facts, for a fixed rule set. By a
reduction from SAT, we have the following proposition. Proofs can be found in Appendix A.
Proposition 2. For a fixed rule set ℛ, deciding if a fact ψ ∈ ℱ is relevant for φ on ⟨ℛ, ℱ⟩ is NP-complete.

Approximating Relevance Considering this hardness result, we turn our attention to the task
of approximating relevant facts and rules. This is achieved using a two-step rule-based approach:
a static step which preprocesses the input knowledge base, followed by a dynamic step for tracing
each individual entailment. The static step builds an entailment graph which tracks all relationships
between entailed atoms and rules. This entailment graph allows us to compute an approximation
of the relevant facts. For instance, as illustrated in Figure 3, the relevance approximation includes the
atoms p(a), q(a), t(b, a) and the rules r1, r2, r3.
   We now present our approach and give a construction for the entailment graph based on rules
with function symbols, which can be deployed on reasoners supporting this formalism. Before detailing
each step, we introduce the following notation. Given a predicate P, we denote by P+ and fP a fresh
predicate and a function unique to P, respectively; below, the same will hold for predicates B and H.
Moreover, given a rule r, we respectively denote by Er and fr a fresh predicate and a function unique
to r.

Static step (entailment graph building) Given a knowledge base 𝒦 = ⟨ℛ, ℱ⟩, we compute
𝒮 = Chase(⟨EntGraph(ℛ), ℱ⟩), where EntGraph(ℛ) is the minimal set containing the following
rules:

                      ๐‘Ÿ๐‘ : p(๐‘ฅ) โ†’ p+ (๐‘ฅ, ๐‘“p (๐‘ฅ))                              goal+ (a, fgoal (a))   Er3 (ft (a, a), ft (a, a), fgoal (a))
                                         +
                      ๐‘Ÿ๐‘ก : t(๐‘ฅ, ๐‘ฆ) โ†’ t (๐‘ฅ, ๐‘ฆ, ๐‘“t (๐‘ฅ, ๐‘ฆ))
                      ๐‘Ÿ๐‘ž : q(๐‘ฅ) โ†’ q+ (๐‘ฅ, ๐‘“q (๐‘ฅ))                                      ๐‘Ÿ3โ€ฒ

                   ๐‘Ÿ๐‘”๐‘œ๐‘Ž๐‘™ : goal(๐‘ฅ) โ†’ goal+ (๐‘ฅ, ๐‘“goal (๐‘ฅ))
                                                                              t+ (a, a, ft (a, a))         Er1 (fp (a), ft (a, a))
                      ๐‘Ÿ๐‘  : s(๐‘ฅ) โ†’ s+ (๐‘ฅ, ๐‘“s (๐‘ฅ))

                                                                                      ๐‘Ÿ1โ€ฒ

    ๐‘Ÿ1โ€ฒ : p+ (๐‘ฅ, ๐‘“p (๐‘ฅ)) โ†’ t+ (๐‘ฅ, ๐‘ฅ, ๐‘“t (๐‘ฅ, ๐‘ฅ)) โˆง Er1 (๐‘“p (๐‘ฅ), ๐‘“t (๐‘ฅ, ๐‘ฅ))
                                                                                 p+ (a, fp (a))
    ๐‘Ÿ2โ€ฒ : t+ (๐‘ฅ, ๐‘ฆ, ๐‘“t (๐‘ฅ, ๐‘ฆ)) โˆง q+ (๐‘ฅ, ๐‘“q (๐‘ฅ))
             โ†’ t+ (๐‘ฆ, ๐‘ฅ, ๐‘“t (๐‘ฆ, ๐‘ฅ)) โˆง Er2 (๐‘“๐‘ก (๐‘ฅ, ๐‘ฆ), ๐‘“q (๐‘ฅ), ๐‘“t (๐‘ฆ, ๐‘ฅ))
                                                                                      ๐‘Ÿ๐‘
    ๐‘Ÿ3โ€ฒ : t+ (๐‘ฅ, ๐‘ฅ, ๐‘“t (๐‘ฅ, ๐‘ฅ)) โˆง t+ (๐‘ฅ, ๐‘ฆ, ๐‘“t (๐‘ฅ, ๐‘ฆ))
             โ†’ goal+ (๐‘ฅ, ๐‘“goal (๐‘ฅ)) โˆง Er3 (๐‘“t (๐‘ฅ, ๐‘ฅ), ๐‘“t (๐‘ฅ, ๐‘ฆ), ๐‘“goal (๐‘ฅ))
                                                                                     p(a)
    ๐‘Ÿ4โ€ฒ : s+ (๐‘ฅ, ๐‘“s (๐‘ฅ)) โ†’ v+ (๐‘ฅ, ๐‘ฅ, ๐‘“v (๐‘ฅ, ๐‘ฅ)) โˆง Er4 (๐‘“s (๐‘ฅ), ๐‘“v (๐‘ฅ, ๐‘ฅ))


Figure 4: Static Step Rules and Inferences


   1. P(x̄) → P+(x̄, fP(x̄))                      for every predicate P occurring in ℱ

   2. B1+(x̄1, y1) ∧ . . . ∧ Bn+(x̄n, yn) → H+(x̄, fH(x̄)) ∧ Er(y1, . . . , yn, fH(x̄))
                                    for every rule r = B1(x̄1) ∧ . . . ∧ Bn(x̄n) → H(x̄) ∈ ℛ

   The aim of the static step is to produce the entailment graph, i.e., a hypergraph representing the links
(in terms of entailment) between the atoms. In order to do that, we first need to be able to reference
the atoms in Chase(𝒦). This is done by rules of type (1), which infer, for any atom P(t̄) of arity k, an
atom P+(t̄, fP(t̄)) of arity k + 1, where fP(t̄) is a functional term representing P(t̄) itself. Then, every
rule r is transformed into a rule of type (2) which, once applied, uses the functional terms to link all
images of the body atoms to the image of the head atom via a hyperedge relation Er proper to r. Let
us emphasise that this step is done only once for every input 𝒦.
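The two rule schemas can be instantiated mechanically. The sketch below emits the rewritten rules as plain strings for rule r1 of Figure 3; the textual syntax (`p+`, `f_p`, `E_r1`) is an illustrative serialisation, not the concrete syntax of any particular reasoner.

```python
def ent_graph(rules, fact_preds):
    """Produce the EntGraph rewriting as strings: type-(1) rules tag every
    atom with a functional term naming it, type-(2) rules add a hyperedge
    atom E_r linking the body atoms' names to the head atom's name."""
    def plus(pred, args, name):
        return f"{pred}+({', '.join(args)}, {name})"

    def fterm(pred, args):
        return f"f_{pred}({', '.join(args)})"

    out = []
    for pred, arity in fact_preds:                       # type (1)
        xs = [f"x{i}" for i in range(1, arity + 1)]
        out.append(f"{pred}({', '.join(xs)}) -> "
                   f"{plus(pred, xs, fterm(pred, xs))}")
    for rid, body, (hpred, hargs) in rules:              # type (2)
        names = [fterm(p, a) for p, a in body]
        lhs = ' & '.join(plus(p, a, n) for (p, a), n in zip(body, names))
        hname = fterm(hpred, hargs)
        edge = f"E_{rid}({', '.join(names + [hname])})"
        out.append(f"{lhs} -> {plus(hpred, hargs, hname)} & {edge}")
    return out

# Rule r1 of Figure 3: p(x) -> t(x, x)
rules = [('r1', [('p', ['x'])], ('t', ['x', 'x']))]
for line in ent_graph(rules, [('p', 1)]):
    print(line)
```

The output reproduces rules rp and r1′ of Figure 4 in this string syntax.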
   Figure 4 illustrates the rules of the static step EntGraph(ℛ) for the rule base of Figure 3, as well
as their applications starting from p(a). First, the rule rp produces the atom p+(a, fp(a)), where fp(a)
represents p(a). Consider then rule r1. It is transformed in the following way: the body p(x) and the
head t(x, x) are respectively replaced by p+(x, fp(x)) and t+(x, x, ft(x, x)), and a hyperedge atom
Er1(fp(x), ft(x, x)) is added to the head to represent the link between the two atoms in the entailment
graph. Note that this edge atom uses the functional terms corresponding to the atoms appearing in
the rule. Since we have p+(a, fp(a)), the rule r1′ produces both t+(a, a, ft(a, a)) and the edge atom
Er1(fp(a), ft(a, a)), representing the fact that p+(a, fp(a)) is used to derive t+(a, a, ft(a, a)). Finally, r3′
is obtained from r3 in a similar way, and it is used to derive goal+(a, fgoal(a)).


Dynamic Step (tracing) For any fact φ to explain, we define 𝒟 = Chase(⟨relPropag(ℛ, φ), 𝒮⟩),
where 𝒮 is the result of the static step and relPropag(ℛ, φ) is the minimal set containing the following
rules:

   1. → Rel(fP(x̄))                                                          if φ = P(x̄)

   2. Er(y1, . . . , yn, z) ∧ Rel(z) → Rel(y1) ∧ . . . ∧ Rel(yn) ∧ RelEdge(fr(y1, . . . , yn, z))
                                                                for every rule r in ℛ
Note first that rule (1) has an empty body; this is used to bootstrap the tracing for φ. Furthermore, for
rules of type (2), we consider that Er and fr are the fresh predicate and function unique to r introduced
relPropag(ℛ, goal(a))
rrelgoal: → Rel(fgoal(a))
rrel1: Er1(y, x) ∧ Rel(x) → Rel(y) ∧ RelEdge(fr1(y, x))
rrel2: Er2(y, z, x) ∧ Rel(x) → Rel(y) ∧ Rel(z) ∧ RelEdge(fr2(y, z, x))
rrel3: Er3(y, z, x) ∧ Rel(x) → Rel(y) ∧ Rel(z) ∧ RelEdge(fr3(y, z, x))
rrel4: Er4(y, x) ∧ Rel(x) → Rel(y) ∧ RelEdge(fr4(y, x))

Inferences: rrelgoal yields Rel(fgoal(a)); with Er3(ft(a, a), ft(a, a), fgoal(a)), rule rrel3 yields
Rel(ft(a, a)) and RelEdge(fr3(ft(a, a), ft(a, a), fgoal(a))); with Er1(fp(a), ft(a, a)), rule rrel1 yields
Rel(fp(a)) and RelEdge(fr1(fp(a), ft(a, a))).

Figure 5: Dynamic Step Rules and Inferences


in the static step, and that Rel and RelEdge are reserved predicates. Intuitively, these rules recursively trace
a fact back to the ancestor atoms which belong to the input fact base.
   Figure 5 illustrates the rules of the dynamic step relPropag(ℛ, φ) for the rule base of Figure 3 and
the entailment φ = goal(a), as well as their applications. In a nutshell, the atom to be explained is
marked as relevant thanks to the (empty-body) rule → Rel(fgoal(a)). Then the relevance relation is
propagated back using the edge atoms produced during the static step. In order to do this, we need the
"propagation rules" rrel1–rrel4. For instance, since we have Er3(ft(a, a), ft(a, a), fgoal(a)) from the static
step, and goal(a) is relevant (i.e., Rel(fgoal(a))), then by rule rrel3 we have that both t(a, a) and r3 are
relevant (i.e., Rel(ft(a, a)) and RelEdge(fr3(ft(a, a), ft(a, a), fgoal(a))), respectively). The same process
applies for deeming p(a) and rule r1 relevant by the application of rrel1.
   Finally, for 𝒦 = ⟨ℛ, ℱ⟩, we define the approximation of the relevant KB as SupRelφ(𝒦) = ⟨ℛ′, ℱ′⟩
where ℱ′ = {P(t̄) ∈ ℱ | 𝒟 ⊨ Rel(fP(t̄))} and ℛ′ = {r ∈ ℛ | 𝒟 ⊨ RelEdge(fr(t1, . . . , tn, t′))}.
Note that both the 𝒮 and 𝒟 knowledge bases always admit a finite (terminating) chase. The soundness and
completeness result below states that this approximation provides the same explanations as the exact relevance.

Proposition 3. Expl(𝒦, φ) = Expl(SupRelφ(𝒦), φ)
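Once the hyperedges are materialised, the dynamic step amounts to a backward reachability computation over the hypergraph. A minimal sketch, assuming the entailment graph is given as explicit `(rule_id, body_nodes, head_node)` edges with illustrative string node names:

```python
def sup_rel(edges, facts, goal):
    """Back-propagate relevance over the entailment hypergraph: whenever
    an edge's head is relevant, its body nodes and its rule become
    relevant too (the role of rules of type (2) in relPropag)."""
    rel_nodes = {goal}
    rel_rules = set()
    changed = True
    while changed:
        changed = False
        for rid, body, head in edges:
            if head in rel_nodes and (
                    not set(body) <= rel_nodes or rid not in rel_rules):
                rel_nodes.update(body)
                rel_rules.add(rid)
                changed = True
    # keep only input facts among the relevant nodes
    return {f for f in facts if f in rel_nodes}, rel_rules

# Entailment graph of Figure 3 (one edge per rule application)
edges = [
    ('r1', ['p(a)'], 't(a,a)'),
    ('r2', ['t(b,a)', 'q(a)'], 't(a,b)'),
    ('r3', ['t(a,a)', 't(a,a)'], 'goal(a)'),
    ('r3', ['t(a,a)', 't(a,b)'], 'goal(a)'),
    ('r4', ['s(c)'], 'v(c,c)'),
    ('r4', ['s(d)'], 'v(d,d)'),
]
facts = ['p(a)', 'q(a)', 't(b,a)', 's(c)', 's(d)']
rel_facts, rel_rules = sup_rel(edges, facts, 'goal(a)')
print(sorted(rel_facts), sorted(rel_rules))
```

As in Figure 3, s(c), s(d), and r4 are pruned, while the over-approximation keeps q(a), t(b, a), and r2 even though they appear in no minimal explanation.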


4. Experimental Analysis
We have implemented our approach by using the MARCO tool for group-MUS enumeration [11] (Section
2) and InteGraal [14] for approximating relevance (Section 3).

Benchmarks We selected 24 interesting ontologies from the MOWL corpus [15]. These ontologies
belong to the EL profile and were translated into Datalog programs using the Java OWL API [16]
and the translation presented in [17]. We made sure that each ontology has at least 5 extensional facts
and produces at least 5 intensional facts. For each ontology, we picked 5 entailment queries to explain
among the atoms with the greatest reasoning depth (which are hence, intuitively, the most challenging to
explain), that is, atoms derived in the last (breadth-first) step of the saturation of the ontology.

Protocol For each ontology, we compute the entailment graph using InteGraal [14] and the rules for
the static step presented in Section 3; recall that this step is done only once per ontology. Subsequently,
for each entailment query, we again use InteGraal [14] and the rules for the dynamic step presented in
Section 3 to compute a knowledge base that consists only of the part of the input ontology relevant to
that particular entailment query. This relevant knowledge base is then translated into a GSAT formula
as described in Section 2 and fed to the MARCO tool [11]. The tool returns all MUSes which, when
translated back into Datalog knowledge base statements, constitute the explanations.

Figure 6: Query Explanation Performances

We compared against the OWL API [16], which is the only tool we are aware of that computes kb-support
explanations (i.e., ontology axioms corresponding to facts and rules). We implemented our benchmarking
protocols using the B-Runner library [18] and made them available online [19].
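To make the pipeline concrete, a group-MUS enumerator can be emulated by brute force on tiny inputs. In the sketch below (illustrative only; MARCO [11] implements this far more efficiently), each group stands for the clauses of one fact or rule, and the `is_unsat` callback plays the role of the SAT solver: in our encoding, a set of groups is unsatisfiable together with the hard clauses exactly when the corresponding sub-knowledge-base entails the query.

```python
from itertools import combinations

def group_muses(groups, is_unsat):
    """Enumerate all subset-minimal sets of groups G with is_unsat(G) True.

    Iterating by increasing size guarantees that every recorded set is
    minimal: all of its proper subsets were already tested and found
    satisfiable.
    """
    muses = []
    items = sorted(groups)
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            s = frozenset(subset)
            # Skip supersets of an already-found MUS.
            if is_unsat(s) and not any(m <= s for m in muses):
                muses.append(s)
    return muses

# Toy instance: the query is entailed by {f1, r} or by {f2, r}, so those
# two sets are exactly the group-MUSes (i.e. the two explanations).
entails = lambda s: "r" in s and ("f1" in s or "f2" in s)
result = group_muses({"f1", "f2", "r"}, entails)
```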

Setup All our experiments were run on a server with an AMD Ryzen 9 3900XT 12-core processor @ 4.7 GHz,
128 GB of RAM @ 2.4 GHz, and a 2 TB NVMe SSD.

Analysis Figure 6 reports the time taken by the static step, which corresponds to the entailment
graph construction, vs the dynamic step which is further divided into filtering (via InteGraal) and MUS
enumeration (via MARCO). For the dynamic step, we report the average time for the 5 queries we
considered. Times are in milliseconds.
   As expected, we can observe that the time taken by the static step (even if within tenths of a second)
is significantly longer than that of the dynamic step. Interestingly, we found that the time required for
constructing the entailment graph depends on the number of possible rule groundings as well as on the
depth (in a breadth-first sense) of the chase. This cost, being fixed, is however amortised when multiple
explanation queries are submitted.
   Concerning the dynamic step, we can observe that it is much faster than the static step, on
the order of tens of milliseconds. Again, we found that a crucial aspect here is the depth of the derived
atom: the deeper an atom is in the entailment graph, the more one has to propagate relevance. Our
results also indicate that the group-MUS enumerator is very fast (on the order of milliseconds), which
further validates this approach for computing explanations. Note also that the generality of our method
means that it can be implemented with any reasoner supporting the rules described in Section 3 and
any group-MUS enumerator.
   We conclude with a comparison with the explanation facility provided by the OWL API. We can see
that our approach is generally more competitive if we consider only the dynamic step and, depending
on the case, can even be competitive for one-off explanations that include both the static and dynamic
steps as a single operation.


5. Related Work & Conclusion
We have introduced the first comprehensive approach for computing kb-support explanations for fact
entailments over Datalog knowledge bases. Our method leverages group-MUS enumeration for
computing explanations, as well as a rule-based approach to filter out atoms irrelevant to the task. Our
approach has been implemented and experimentally evaluated.
   Our work considers explanations in the form of kb-support (a.k.a. justifications with ABoxes), which,
to the best of our knowledge, has not been investigated so far for Datalog [9, 8, 10]. Indeed, existing
methods for Datalog focus on the notion of why-provenance, which corresponds to that of fact-support.
The recent work of [8] studies the use of SAT solvers for computing why-provenance for Datalog based
on so-called non-ambiguous proof trees. Rule-based approaches for on-demand computation of Datalog
provenance have been studied in [10]. The recent work of [5] considers provenance computation for
the EL description logic [20, 21]. In the context of answer set programming (ASP), the work of [21]
considers a notion of graph explanation; this can be seen as an extension of fact-support including all
inferred facts (but no rules). While these approaches are close to ours in spirit, the technical development
remains different. Notably, considering kb-supports allows us to obtain more explanations than why-
provenance, and thus a better understanding of the entailed facts. On the other hand, this also required
us to establish a new correspondence between group-MUSes and kb-support explanations.
   In the context of description logics, reasoning explanations have long been studied. Justifications
are also called minimal axiom sets, and their computation is referred to as axiom pinpointing. The
approaches closest to ours are the following. [1] and [2] consider the problem of computing axiom
explanations for EL TBoxes (hence, without data). [1] studies the complexity of computing relevant
axioms, while [2] presents a reduction to GMUS enumeration. However, in contrast to our work, neither
of these approaches considers data (ABoxes). The problem of filtering the knowledge base down to a set
of relevant facts has been considered in [8, 10, 3, 4]. All of these approaches employ a notion akin to that
of a dependency graph. However, our work differs in that it considers a different notion of explanation.
Moreover, our work is the only one that studies the complexity of deciding the relevance of atoms
(NP-complete).
   Through this work we demonstrated the practical applicability of group-SAT solvers for knowledge
bases, paving the way for more efficient and explainable AI systems. Future work includes the extension
of our setting to richer rule languages, including suitable forms of functions, negation, and disjunction.


Acknowledgments
This work was supported by the Inria-DFKI bilateral project R4Agri and by the ANR project CQFD
(ANR-18-CE23-0003).


References
 [1] F. Baader, R. Peñaloza, B. Suntisrivaraporn, Pinpointing in the description logic EL+, in: Annual
     Conference on Artificial Intelligence, Springer, 2007, pp. 52–67.
 [2] N. Manthey, R. Peñaloza, S. Rudolph, SATPin: axiom pinpointing for lightweight description logics
     through incremental SAT, KI - Künstliche Intelligenz 34 (2020) 389–394.
 [3] Z. Zhou, G. Qi, B. Suntisrivaraporn, A new method of finding all justifications in OWL 2 EL, in: 2013
     IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent
     Technologies (IAT), volume 1, IEEE, 2013, pp. 213–220.
 [4] Z. Zhou, G. Qi, A dependency-graph based approach for finding justification in OWL 2 EL, Intelligent
     Data Analysis 22 (2018) 1315–1335.
 [5] S. Borgwardt, S. Breuer, A. Kovtunova, Computing ABox justifications for query answers via
     Datalog rewriting, in: Description Logics, 2023.
 [6] S. Abiteboul, R. Hull, V. Vianu, Foundations of databases, volume 8, Addison-Wesley Reading,
     1995.
 [7] G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, M. Zakharyaschev, Ontology-
     based data access: A survey, in: J. Lang (Ed.), Proceedings of the Twenty-Seventh International
     Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, 2018,
     pp. 5511–5519. URL: https://doi.org/10.24963/ijcai.2018/777. doi:10.24963/IJCAI.2018/777.
 [8] M. Calautti, E. Livshits, A. Pieris, M. Schneider, Computing the why-provenance for datalog queries
     via SAT solvers, in: M. J. Wooldridge, J. G. Dy, S. Natarajan (Eds.), Thirty-Eighth AAAI Conference
     on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of
     Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial
     Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, AAAI Press, 2024, pp. 10459–
     10466. URL: https://doi.org/10.1609/aaai.v38i9.28914. doi:10.1609/AAAI.V38I9.28914.
 [9] C. Bourgaux, P. Bourhis, L. Peterfreund, M. Thomazo, Revisiting semiring provenance for datalog,
      in: G. Kern-Isberner, G. Lakemeyer, T. Meyer (Eds.), Proceedings of the 19th International Confer-
      ence on Principles of Knowledge Representation and Reasoning, KR 2022, Haifa, Israel, July 31 -
     August 5, 2022, 2022. URL: https://proceedings.kr.org/2022/10/.
[10] A. Elhalawati, M. Krötzsch, S. Mennicke, An existential rule framework for computing why-
     provenance on-demand for datalog, in: G. Governatori, A. Turhan (Eds.), Rules and Reasoning
     - 6th International Joint Conference on Rules and Reasoning, RuleML+RR 2022, Berlin, Ger-
     many, September 26-28, 2022, Proceedings, volume 13752 of Lecture Notes in Computer Science,
     Springer, 2022, pp. 146–163. URL: https://doi.org/10.1007/978-3-031-21541-4_10. doi:10.1007/
     978-3-031-21541-4_10.
[11] M. H. Liffiton, A. Previti, A. Malik, J. Marques-Silva, Fast, flexible MUS enumeration, Con-
     straints An Int. J. 21 (2016) 223–250. URL: https://doi.org/10.1007/s10601-015-9183-0. doi:10.
     1007/S10601-015-9183-0.
[12] C. Beeri, M. Y. Vardi, A proof procedure for data dependencies, J. ACM 31 (1984) 718–741. URL:
     https://doi.org/10.1145/1634.1636. doi:10.1145/1634.1636.
[13] M. H. Liffiton, K. A. Sakallah, Algorithms for computing minimal unsatisfiable subsets of
     constraints, J. Autom. Reason. 40 (2008) 1–33. URL: https://doi.org/10.1007/s10817-007-9084-z.
     doi:10.1007/S10817-007-9084-Z.
[14] J.-F. Baget, P. Bisquert, M. Leclère, M.-L. Mugnier, G. Pérution-Kihli, F. Tornil, F. Ulliana, InteGraal:
     a Tool for Data-Integration and Reasoning on Heterogeneous and Federated Sources, in: BDA 2023
     - 39e Conférence sur la Gestion de Données – Principes, Technologies et Applications, Montpellier,
     France, 2023. URL: https://hal-lirmm.ccsd.cnrs.fr/lirmm-04304601.
[15] N. Matentzoglu, B. Parsia, The Manchester OWL Corpus (MOWLCorp), original serialisation, 2014.
      URL: https://doi.org/10.5281/zenodo.10851. doi:10.5281/zenodo.10851.
[16] M. Horridge, S. Bechhofer, The OWL API: A Java API for OWL ontologies, Semantic Web 2 (2011)
     11–21. URL: https://doi.org/10.3233/SW-2011-0025. doi:10.3233/SW-2011-0025.
[17] D. Carral, J. Zalewski, P. Hitzler, An efficient algorithm for reasoning over OWL EL ontologies with
     nominal schemas, J. Log. Comput. 33 (2023) 136–162. URL: https://doi.org/10.1093/logcom/exac032.
     doi:10.1093/LOGCOM/EXAC032.
[18] F. Ulliana, P. Bisquert, A. Charoensit, R. Colin, F. Tornil, Q. Yeche, Collaborative benchmarking with
     B-Runner, in: Rules and Reasoning - 8th International Joint Conference on Rules and Reasoning,
     RuleML+RR 2024, Bucharest, Romania, September 16-18, 2024, Proceedings, Lecture Notes in
     Computer Science, 2024.
[19] Artifacts for Rule-aware Datalog Fact Explanation Using Group-SAT Solver, https://gitlab.inria.fr/
      boreal-artifacts/ruleml2024, 2024.
[20] T. Eiter, T. Geibinger, Explaining answer-set programs with abstract constraint atoms, in: IJCAI,
     2023, pp. 3193–3202.
[21] M. Alviano, L. L. Trieu, T. C. Son, M. Balduccini, Explanations for answer set programming, arXiv
      preprint arXiv:2308.15879 (2023).
A. Proofs for Section 3

                                             Fact Relevance(ℛ)

 Instance: A fact set ℱ, a fact φ, and a fact ψ ∈ ℱ.
 Question: Is ψ relevant for the entailment of φ with respect to ⟨ℛ, ℱ⟩?


Lemma 1. Fact Relevance(ℛ) is in NP.
Proof. We consider an oracle that, for a fixed ℛ, decides in polynomial time Fact Entailment(ℛ),
that is, whether ⟨ℛ, ℱ⟩ ⊨ φ for a given fact set ℱ and fact φ. We then consider a non-deterministic
Turing machine that accepts on input ⟨ℱ, φ, ψ⟩ if and only if ψ is relevant for φ with ⟨ℛ, ℱ⟩, and that
proceeds as follows.
   1. Non-deterministically guess sets ℱ′ ⊆ ℱ and ℛ′ ⊆ ℛ such that ψ ∈ ℱ′.

   2. Check with the oracle for Fact Entailment(ℛ) whether ⟨ℛ′, ℱ′⟩ ⊨ φ. If it does not, reject.

   3. Check the minimality of ⟨ℛ′, ℱ′⟩: if there exists ⟨ℛ″, ℱ″⟩ ⊆ ⟨ℛ′, ℱ′⟩ with |ℱ″| = |ℱ′| − 1 or
      |ℛ″| = |ℛ′| − 1 such that ⟨ℛ″, ℱ″⟩ ⊨ φ (checked with the oracle), then reject; otherwise accept.


   (Soundness) Given a fact set ℱ, a fact φ, and some fact ψ ∈ ℱ, we show that if the TM accepts
on input ⟨ℱ, φ, ψ⟩ then ψ is relevant for φ with ⟨ℛ, ℱ⟩. By hypothesis the TM accepts, so there is a
pair ⟨ℛ′, ℱ′⟩ ⊆ ⟨ℛ, ℱ⟩ such that ψ ∈ ℱ′, ⟨ℛ′, ℱ′⟩ ⊨ φ, and ⟨ℛ″, ℱ″⟩ ⊭ φ for every ℱ″ ⊂ ℱ′ and
ℛ″ ⊂ ℛ′ (by monotonicity of Datalog entailment, checking only the subsets with one fewer element
suffices). Hence, by the definitions of explanations and relevance, ⟨ℛ′, ℱ′⟩ is an explanation for φ
containing ψ, thus ψ is relevant for φ.
   (Completeness) Given a fact set ℱ, a fact φ, and some fact ψ ∈ ℱ, we show that if ψ is relevant for φ
then the TM accepts on input ⟨ℱ, φ, ψ⟩. By hypothesis ψ is relevant for φ, so there is an explanation ℰ =
⟨ℛ′, ℱ′⟩ for φ containing ψ. Hence, ψ ∈ ℱ′ and ⟨ℛ′, ℱ′⟩ ⊨ φ. At step 1 the TM guesses the explanation
ℰ = ⟨ℛ′, ℱ′⟩, and by construction of ℰ the oracle returns "yes" at step 2. Again by hypothesis, ⟨ℛ′, ℱ′⟩
is minimal, thus the oracle answers "no" for every ⟨ℛ″, ℱ″⟩ ⊂ ⟨ℛ′, ℱ′⟩ of size |⟨ℛ′, ℱ′⟩| − 1 checked
in step 3, and the TM accepts.
   (Termination) Step 1 is a non-deterministic guess of subsets of ℱ and ℛ, and the P oracle is then
used a number of times linear in the size of the guessed subsets. Hence Fact Relevance(ℛ) is in
NP^P = NP.
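On small instances, the guess-and-check procedure above can be simulated exhaustively. The sketch below (illustrative only, with a naive forward-chaining `entails` standing in for the Fact Entailment oracle) enumerates sub-knowledge-bases containing ψ, keeps those entailing φ, and tests minimality by dropping single facts or rules, exactly as in steps 1-3:

```python
from itertools import combinations

def entails(rules, facts, phi):
    """Naive forward chaining over propositional rules (body, head)."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return phi in derived

def is_relevant(rules, facts, phi, psi):
    """psi is relevant for phi iff some minimal entailing sub-KB contains it."""
    for rs in range(len(rules) + 1):
        for r_sub in combinations(rules, rs):
            for fs in range(len(facts) + 1):
                for f_sub in combinations(facts, fs):
                    if psi not in f_sub or not entails(r_sub, f_sub, phi):
                        continue
                    # Minimality: dropping any single fact or rule loses phi.
                    if all(not entails(r_sub, [f for f in f_sub if f != x], phi)
                           for x in f_sub) and \
                       all(not entails([r for r in r_sub if r != x], f_sub, phi)
                           for x in r_sub):
                        return True
    return False

# p -> q -> goal: p is relevant for goal, the unused fact u is not.
rules = [(("p",), "q"), (("q",), "goal")]
facts = ["p", "u"]
```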

Lemma 2. Fact Relevance(ℛ) is NP-hard.
Proof. We show that Fact Relevance(ℛ) is NP-hard via a reduction from SAT.
   (Translation) For a CNF formula Φ = C_1 ∧ . . . ∧ C_m defined over the variables {V_1, . . . , V_n},
we produce a knowledge base 𝒦 = ⟨ℛ, ℱ⟩ with nullary predicates Source and Target such that Φ is
satisfiable if and only if Source is relevant for the entailment of Target. The knowledge base 𝒦 is built
as follows:

              ℱ = {T(v_i), F(v_i) | 1 ≤ i ≤ n} ∪ {Source}
              ℛ = {T(v_k) → T(c_i) | 1 ≤ i ≤ m and every positive literal V_k in C_i} ∪    (1)
                  {F(v_k) → T(c_i) | 1 ≤ i ≤ m and every negative literal ¬V_k in C_i} ∪   (2)
                  {Source ∧ T(c_1) ∧ . . . ∧ T(c_m) → Target} ∪                            (3)
                  {T(x) ∧ F(x) → Target}                                                   (4)

  Note that in the construction above, we introduced a unique constant v_i for each variable V_i and a
unique constant c_i for each clause C_i in Φ. Note also that the only rule with variables is (4).
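The translation itself is direct to implement. The following Python sketch is a hypothetical rendering (rules as plain strings; clauses as DIMACS-style integer lists, with k for V_k and -k for ¬V_k) that builds the fact and rule sets of 𝒦 for a given CNF:

```python
def cnf_to_kb(clauses, n_vars):
    """Build <R, F> such that the CNF is satisfiable iff Source is
    relevant for the entailment of Target."""
    facts = ([f"T(v{i})" for i in range(1, n_vars + 1)]
             + [f"F(v{i})" for i in range(1, n_vars + 1)]
             + ["Source"])
    rules = []
    for i, clause in enumerate(clauses, start=1):
        for lit in clause:
            if lit > 0:
                rules.append(f"T(v{lit}) -> T(c{i})")    # rule family (1)
            else:
                rules.append(f"F(v{-lit}) -> T(c{i})")   # rule family (2)
    body = " & ".join(f"T(c{i})" for i in range(1, len(clauses) + 1))
    rules.append(f"Source & {body} -> Target")           # rule (3)
    rules.append("T(x) & F(x) -> Target")                # rule (4)
    return facts, rules

# Phi = (V1 or not V2) and (V2): n = 2 variables, m = 2 clauses.
facts, rules = cnf_to_kb([[1, -2], [2]], n_vars=2)
```

The size bounds used in the polynomiality argument (2n + 1 facts, at most k × m + 2 rules) can be read off directly from this construction.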
  (Soundness) We show that if Source is relevant for the entailment of Target with 𝒦 then the CNF
formula Φ is satisfiable.

  (a) By hypothesis there is an explanation ⟨ℛ′, ℱ′⟩ ⊆ ⟨ℛ, ℱ⟩ such that Source ∈ ℱ′ and ⟨ℛ′, ℱ′⟩ ⊨
      Target.

  (b) We show first that for every constant v_i appearing in ℱ′, we have that T(v_i) ∈ ℱ′ if and only if
      F(v_i) ∉ ℱ′. Indeed, suppose T(v_i), F(v_i) ∈ ℱ′ for some constant v_i. Then, with rule (4) we have
      ⟨ℛ′, {T(v_i), F(v_i)}⟩ ⊨ Target. This contradicts the fact that ℱ′ is a minimal set that, together
      with ℛ′, entails Target.

  (c) Let σ_ℱ′ be the assignment such that σ_ℱ′(V_i) = True if and only if T(v_i) ∈ ℱ′, for every variable
      V_i in Φ and its corresponding constant v_i in 𝒦.

  (d) From (b), we know that (4) cannot be applied in ℱ′. Therefore, rule (3) has been applied to
      infer Target. Hence, ⟨ℛ′, ℱ′⟩ ⊨ T(c_i) for every 1 ≤ i ≤ m.

  (e) For 1 ≤ i ≤ m, we know that T(c_i) is inferred either by a rule from (1) or by a rule from (2).

      If a rule of the form (1) has been applied, then there is a constant v_k such that T(v_k) ∈ ℱ′ and
      T(v_k) → T(c_i) ∈ ℛ′. Let V_k be the variable corresponding to v_k and C_i the clause corresponding
      to c_i in Φ; then by the construction of the rules in (1) we have that V_k ∈ C_i, and by (c) we know
      that σ_ℱ′(V_k) = True, thus σ_ℱ′(C_i) = True.
      If a rule of the form (2) has been applied, then there is a constant v_k such that F(v_k) ∈ ℱ′ and
      F(v_k) → T(c_i) ∈ ℛ′. Let V_k be the variable corresponding to v_k and C_i the clause corresponding
      to c_i in Φ; then by the construction of the rules in (2) we have that ¬V_k ∈ C_i, and by (c) we know
      that σ_ℱ′(V_k) = False, thus σ_ℱ′(C_i) = True.
      Hence, σ_ℱ′(C_i) = True for every clause C_i in Φ, thus σ_ℱ′(Φ) = True and Φ is satisfiable.

  (Completeness) We show that if Φ is satisfiable then Source is relevant for Target with 𝒦.

  (a) By hypothesis, there is an assignment σ such that σ is a model of Φ. Consider the set of facts
      ℱ′ = {Source} ∪ {T(v_i) | σ(V_i) = True} ∪ {F(v_i) | σ(V_i) = False} ⊆ ℱ.

  (b) By hypothesis, we have σ(C_i) = True for every clause C_i in Φ, and one of two cases holds.

      There is a positive literal V_k ∈ C_i such that σ(V_k) = True. In this case, let v_k and c_i be
      the constants corresponding to V_k and C_i respectively. By (a) we have that T(v_k) ∈ ℱ′, and
      T(v_k) → T(c_i) ∈ ℛ by the construction of the rules in (1).
      There is a negative literal ¬V_k ∈ C_i such that σ(V_k) = False. Let v_k and c_i be the constants
      corresponding to V_k and C_i respectively. By (a) we have that F(v_k) ∈ ℱ′, and F(v_k) → T(c_i) ∈ ℛ
      by the construction of the rules in (2).
      In both cases, ⟨ℛ, ℱ′⟩ ⊨ T(c_i) for every 1 ≤ i ≤ m.

  (c) Let ℱ″ be a minimal subset of ℱ′ and ℛ′ a minimal subset of ℛ (obtained in linear time by
      deletion, i.e., by removing one fact/rule at a time until the entailment is lost) such that
      ⟨ℛ′, ℱ″⟩ ⊨ Target. We know that (3) and (4) are the only rules that can infer Target. However,
      (4) cannot be applied on ℱ″. Indeed, ℱ′ is built from an assignment σ, and therefore at most one
      of T(v_i) and F(v_i) belongs to ℱ″ for any constant v_i corresponding to a variable V_i in Φ.
      Hence rule (3) has been applied to derive Target.

  (d) To conclude, note that to apply (3) the atom Source must belong to ℱ″, and ⟨ℛ′, ℱ″⟩ is an
      explanation of Target. Hence, Source is relevant for Target.
   (Polynomiality) We show that the size of 𝒦 is polynomial in the size of Φ. The fact set ℱ contains
2n + 1 atoms, with n the number of propositional variables in Φ. The rule set contains at most k × m + 2
rules, where m is the number of clauses in Φ and k the largest number of literals in a clause. Thus the
instance 𝒦 is polynomial with respect to Φ, and the translation can be computed in polynomial time.