                         Rule-aware Datalog Fact Explanation Using Group-SAT
                         Solver
                         Akira Charoensit1 , David Carral1 , Pierre Bisquert1,2 , Lucas Rouquette1 and Federico Ulliana1
                         1
                             Inria, LIRMM, Univ Montpellier, CNRS, France
                         2
                             IATE, Univ Montpellier, INRAE, Institut Agro, Montpellier, France


                                        Abstract
                                        One of the major benefits of symbolic AI is explainability. When new knowledge is obtained via a reasoning
                                        process, it is possible to determine precisely all elements of the knowledge base that yield this knowledge.
                                        Typically, one would use a SAT solver to compute the explanations. However, SAT-solving is computationally
                                        expensive, and as the knowledge base grows, the time required increases exponentially. This work presents a
                                        method for filtering a Datalog knowledge base to optimise the time used by a SAT solver. This is achieved by
                                        creating a hypergraph representing the grounded knowledge base and pruning the nodes that are not reachable
                                        from the fact that we want to explain. This approach proves to be time-effective. Interestingly, one additional
                                        benefit of using this hypergraph is that it is possible to encode more information about the rules used in the
                                        reasoning process. By using an off-the-shelf group-SAT solver, this extra information allows us to find specific
                                        explanations that would be missed if we only considered facts.




                         1. The Explanation Issue
                         In the realm of Artificial Intelligence, symbolic AI stands out for its ability to provide explainable results
                         and predictions. In particular, knowledge and rule-based systems make it possible for users to trace the
                         derivation of new knowledge back to the exact elements that have contributed to it. This traceability
                         is crucial for applications where understanding the reasoning process is as important as the outcome
                         itself, such as in healthcare or regulatory enforcement.
                            While the issue of computing explanations has long been studied for Description Logics (see e.g.,
                         [1, 2, 3, 4, 5]), the extension of the approach to mainstream rule languages such as Datalog has not
                         been considered so far. Datalog is widely recognised as a language which leverages recursion for data
                         processing [6] and plays a significant role in ontology-mediated query answering [7]. On the one hand,
                         it encompasses RDF-Schema and OWL-RL ontologies. On the other hand, many reasoning tasks on
                         ontologies can be reduced to query answering in this language [8]. Providing explanations for Datalog
                         is thus a step towards more reliable data-intensive applications, and the goal of this work is to contribute
                         to the explainability of Datalog.
   Explaining reasoning in Datalog raises, however, two key challenges. The first is to choose a form of
explanation that is suitable for the reasoning task. The second is to actually compute such explanations,
which is known to be expensive even for basic forms of explanation [9, 10].

Choosing Explanations Let us illustrate the importance of choosing the "right" notion of explanation.
Hereafter, we assume the reader is familiar with propositional and first-order logic. Consider the
knowledge base 𝒦 = ⟨ℛ, ℱ⟩ in Figure 1 with facts ℱ = {Boss(alice, alice)} and rules ℛ = {r1, r2,
r3}. Consider the task of explaining the fact φ = Manager(alice), which is entailed by 𝒦. As Figure 1
illustrates, φ can be derived in two different ways starting from ℱ. One is by applying r1. The other is
by applying r2 and then r3.


RuleML+RR'24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16–22, 2024,
Bucharest, Romania
✉ akira.charoensit@inria.fr (A. Charoensit); david.carral@inria.fr (D. Carral); pierre.bisquert@inrae.fr (P. Bisquert);
lucas.rouquette@inria.fr (L. Rouquette); federico.ulliana@inria.fr (F. Ulliana)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


r1: ∀xy.Boss(x, y) → Manager(x)
r2: ∀x.Boss(x, x) → CEO(x)
r3: ∀x.CEO(x) → Manager(x)

A = Boss(alice, alice)    B = CEO(alice)    C = Manager(alice)

SAT encoding: A ∧ (A → C) ∧ (A → B) ∧ (B → C) ∧ ¬C
MUS 1: A ∧ (A → C) ∧ ¬C                MUS 2: A ∧ (A → B) ∧ (B → C) ∧ ¬C
explanation 1: facts Boss(alice, alice); rules r1
explanation 2: facts Boss(alice, alice); rules r2, r3

Figure 1: Computing Minimal KB-Support Explanations via MUSes


   One possible form of explanation consists in computing the fact-support of φ with respect to ℱ,
that is, the set of facts ℱ′ ⊆ ℱ that can contribute to the derivation of φ. The notion of fact-support
corresponds to that of why-provenance [8, 10, 9, 5]. For the example in Figure 1, this would yield ℱ′ = ℱ.
Here, fact-support provides insufficient information, as it does not show the role of the rules in the
reasoning task.
   A more meaningful notion of explanation here is that of kb-support, that is, a subset 𝒦′ of 𝒦
which entails φ. To be precise, we need to identify minimal subsets of the knowledge base that preserve
the entailment, since users typically prefer concise explanations. In the example of Figure 1, there are
two kb-support explanations: 𝒦′1 = ⟨{r1}, ℱ⟩ and 𝒦′2 = ⟨{r2, r3}, ℱ⟩. The interest of the notion of
kb-support is that it better explains the roles taken by both the rules and the data in the reasoning
task, thereby giving the user more power to potentially take action and revise the knowledge base. In
the context of Description Logics, kb-supports correspond to the notion of justifications with ABoxes
(the standard version of justifications instead considers entailment of axioms over TBoxes) [1, 2]. In this
work, we will thus consider the notion of kb-support, which enables more detailed explanations for fact
entailment over Datalog knowledge bases.

Computing Explanations An effective approach to tackling the expensive theoretical cost of com-
puting explanations is to rely on the use of SAT solvers [2, 8]. Nowadays, these are considered mature
and effective tools. The critical step of this reduction is that of encoding the input knowledge base and
entailment (i.e., the inputs of the explanation problem) as a propositional formula. To illustrate, Figure 1
shows the SAT encoding of the explanation problem, which goes as follows. It associates every atom
entailed by the knowledge base with a distinct propositional variable (here, A, B, and C). Then, a
conjunctive formula is built. Its elements are either:

  (i) a propositional variable corresponding to a fact in ℱ (e.g., A represents Boss(alice, alice)),

  (ii) a grounded rule corresponding to a rule application (e.g., A → B represents Boss(alice, alice) →
       CEO(alice)), or

 (iii) a negated literal, whose propositional variable corresponds to the entailment (e.g., ¬C corresponds
       to ¬Manager(alice)).

  An important detail here is that negating the fact to explain in the encoded formula allows one to
solve the explanation problem by looking at its minimal unsatisfiable sets (MUSes) [11]. As Figure 1
shows, each MUS of the encoded formula provides an explanation.
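To make this reduction concrete, the following sketch encodes the Figure 1 example and enumerates its MUSes by brute force. The helper names (`satisfiable`, `muses`) are illustrative, and a real implementation would delegate both tasks to an off-the-shelf SAT solver; this toy version only works for tiny formulas.

```python
from itertools import combinations, product

def satisfiable(clauses, n_vars):
    """Brute-force SAT check: try every assignment of the n_vars variables.
    A clause is a tuple of literals (positive/negative integers)."""
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

def muses(clauses, n_vars):
    """Enumerate minimal unsatisfiable subsets by increasing size:
    an unsat subset is minimal iff it contains no previously found MUS."""
    found = []
    for k in range(1, len(clauses) + 1):
        for subset in combinations(clauses, k):
            if not satisfiable(subset, n_vars):
                if not any(set(m) <= set(subset) for m in found):
                    found.append(subset)
    return found

# Figure 1 example: A = Boss(alice,alice), B = CEO(alice), C = Manager(alice)
A, B, C = 1, 2, 3
formula = [
    (A,),      # fact A
    (-A, C),   # grounding of r1: A -> C
    (-A, B),   # grounding of r2: A -> B
    (-B, C),   # grounding of r3: B -> C
    (-C,),     # negated entailment
]
for mus in muses(formula, 3):
    print(sorted(mus))
```

Running this yields exactly the two MUSes of Figure 1: one built from the r1 grounding, the other from the r2 and r3 groundings.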

Computing All Explanations Computing explanations via MUS for Datalog can lead to potential
pitfalls. The main issue is that two distinct rule applications can "conflict" by resulting in the same
SAT encoding. This case is illustrated in Figure 2. Here, we can see that the applications of r1 and r2 are
both encoded as A → C. As a result, the MUS enumeration no longer yields a (minimal) explanation.
The situation becomes even more complex when many recursive rules share the same groundings. To
r1: ∀xy.Boss(x, y) → Manager(x)        r2: ∀x.Boss(x, x) → Manager(x)
A = Boss(alice, alice)    C = Manager(alice)

SAT encoding: A ∧ (A → C) ∧ ¬C
single MUS: A ∧ (A → C) ∧ ¬C
explanation: facts Boss(alice, alice); rules r1, r2

Group-SAT encoding: A ∧ {A → C}r1 ∧ {A → C}r2 ∧ ¬C
Group MUS 1: A ∧ {A → C}r1 ∧ ¬C        Group MUS 2: A ∧ {A → C}r2 ∧ ¬C
explanation 1: facts Boss(alice, alice); rules r1
explanation 2: facts Boss(alice, alice); rules r2

Figure 2: SAT vs Group-SAT Translation Example


address these issues, we present an encoding into group-SAT formulas [11] which allows us to establish
a precise correspondence between explanations and group-MUSes.

Computing All Explanations Efficiently While solvers are efficient, MUS computation remains
hard. Therefore, significant challenges may arise in terms of scalability as the size and complexity of
the knowledge base increase. The main reason for this is that in more complex knowledge bases there
may be a larger number of atoms and rules which are not relevant for the entailment to explain. This
is illustrated in Figure 3: atoms s(c) and s(d), as well as rule ๐‘Ÿ4 , are irrelevant for explaining goal(a).
Considering irrelevant atoms will slow down both the encoding computation and the MUS enumeration
task. Hence, we introduce a filtering technique that reduces the size of the encoded formula while
preserving its soundness and completeness.

Contribution The contributions of this paper are the following:

   1. We introduce the first comprehensive approach for computing kb-support explanations for fact
      entailment (a.k.a. justifications with ABoxes) over Datalog knowledge bases.

   2. We present a reduction from Datalog rules and facts to group-SAT formulas, and establish an
      exact correspondence between kb-support explanations and group-MUSes (Section 2).

   3. We study the filtering of facts that are irrelevant for the explanation task and show that it is com-
      putationally hard. In light of this, we present a two-step rule-based approach for approximating
      relevance; this includes a static step that preprocesses the input KB and a dynamic step that
      traces a single entailment (Section 3).

   4. We present an experimental evaluation showing the effectiveness of our approach. In particular,
      we show that the two-step approach achieves good performance when dynamically explaining
      any entailment query (Section 4).


2. From Datalog Explanations to Group-MUS
The Logical Setting We assume a first-order signature with functions. We consider mutually disjoint
sets of variables, constants, and functions. We call term any variable, constant, or functional term of
the form f(t1, . . . , tn) where f is a function symbol and every ti is a term. We write lists t1, . . . , tn
of terms as t̄ and often treat these as sets. An atom is a first-order formula of the form P(t̄) where
P is a relational predicate and t̄ a sequence of terms. An atom is grounded if it does not contain any
variable. For a first-order formula Φ and a list x̄ of variables, we write Φ[x̄] to indicate that x̄ is the
set of all free variables that occur in Φ. A fact is a grounded atom without function symbols. A rule r
is a first-order formula of the form ∀x̄.B[x̄] → H[z̄] where B[x̄] and H[z̄] are conjunctions of atoms
and z̄ ⊆ x̄. Such a rule is called Datalog if it is function-free and H is a single atom. We will often
omit universal quantifiers when writing rules. Moreover, we identify conjunctions of atoms such as B
and H above with the corresponding sets, and define body(r) = B and head(r) = H. A knowledge
base (KB) 𝒦 is a tuple ⟨ℛ, ℱ⟩ where ℛ and ℱ are finite sets of rules and facts, respectively. A Datalog
knowledge base is such that ℱ only contains grounded atoms without function symbols and ℛ only
contains Datalog rules. For 𝒦1 = ⟨ℛ1, ℱ1⟩ and 𝒦2 = ⟨ℛ2, ℱ2⟩ we write 𝒦1 ⊆ 𝒦2 when ℛ1 ⊆ ℛ2 and
ℱ1 ⊆ ℱ2.

Fact Entailment The chase is a standard method for computing universal models of knowledge
bases that can in turn be used for tasks like fact entailment [12]. A substitution is a partial function
mapping variables to ground terms. A trigger for a fact set ℱ and a rule set ℛ is a tuple τ = ⟨r, σ⟩ where
r ∈ ℛ, and σ is a substitution such that σ(body(r)) ⊆ ℱ. The trigger outputs out(τ) = σ(head(r)).
For a KB 𝒦 = ⟨ℛ, ℱ⟩, let Chase0(𝒦) = ℱ; and, for every i ≥ 1, let Chasei(𝒦) be the minimal fact
set that includes Chasei−1(𝒦) and out(τ) for every trigger τ of Chasei−1(𝒦) and ℛ. Moreover, let
Chase(𝒦) = ⋃_{i≥0} Chasei(𝒦). A KB 𝒦 entails a fact φ, denoted 𝒦 ⊨ φ, if and only if φ ∈ Chase(𝒦).
  From now on, every knowledge base considered in this section is Datalog. Knowledge bases with
functions will be employed in Section 3. At this point, we can formally define the notion of explanation
for Datalog knowledge bases.

Definition 1 (Explanation). For a Datalog knowledge base 𝒦 = ⟨ℛ, ℱ⟩ and a fact φ, a kb-support
explanation of 𝒦 ⊨ φ is a KB 𝒦′ ⊆ 𝒦 such that 𝒦′ ⊨ φ but 𝒦″ ⊭ φ for every 𝒦″ ⊂ 𝒦′. We denote by
Expl(𝒦, φ) the set of all explanations of 𝒦 ⊨ φ.

From Explanations to Group-MUS We now show how to reduce the explanation problem to
group-MUS (GMUS) enumeration. Group-SAT (GSAT) formulas are natural extensions of SAT formulas
where constraints are modeled as sets or groups [13]. The underlying idea is that all clauses in a group
must hold together. Below, we assume the standard notions of literal (positive or negative propositional
variable) and clause (disjunction of literals). A group 𝒢 is a set of clauses. To keep the formalisation
concise, we slightly enrich the standard definition and also assume that every group 𝒢 has a unique
identifier i, denoted by 𝒢i. Otherwise said, we consider two groups with the same clauses but with
different identifiers to be different. A GSAT formula F is a set of groups F = {𝒢1, . . . , 𝒢n}. A GSAT
formula is satisfiable if the conjunction of all the clauses in the union of its groups is satisfiable. A
GMUS of F is a minimal set of the groups of F that is unsatisfiable. We write GMUS(F) for the set of all
GMUSes of F. Figure 2 illustrates the GSAT formula stemming from the encoding of the input facts and
rules. In the formula, we have two groups which contain the same (set of) clauses; these are denoted as
{A → C}r1 and {A → C}r2, where the identifier of each group corresponds to the rule that generates
it. Also note that, with this encoding, every GMUS in Figure 2 corresponds to one explanation.
    To reduce the computation of the explanations for a fact ψ over a KB 𝒦 to group-MUS enumeration,
our goal is to produce a GSAT formula which encodes the derivations enabled by the KB as well as
the (negation of the) fact to explain. Let 𝒦 = ⟨ℛ, ℱ⟩; we define GSAT(𝒦 ∧ ¬ψ) as the minimal set
containing:

     • the group 𝒢r with 𝒢 = Grounding𝒦(r), for every r ∈ ℛ

     • the group 𝒢φ with 𝒢 = {φ}, for every φ ∈ ℱ

     • the group 𝒢ψ with 𝒢 = {¬ψ}

Above, Grounding𝒦(r) is the set containing the grounded rule σ(body(r)) → σ(head(r)) for every
trigger τ = ⟨r, σ⟩ over Chase(𝒦). Given a set of groups F, we denote by KBs(F) the knowledge base
built by using all facts and rules that are identifiers of the groups in F. Proposition 1 establishes a
precise correspondence between the explanations for 𝒦 ⊨ ψ and the group-MUSes of GSAT(𝒦 ∧ ¬ψ).
Proposition 1. Expl(𝒦, ψ) = ⋃_{F ∈ GMUS(GSAT(𝒦 ∧ ¬ψ))} KBs(F).
Rules
r1: ∀x.p(x) → t(x, x)
r2: ∀xy.t(x, y) ∧ q(y) → t(y, x)
r3: ∀xy.t(x, x) ∧ t(x, y) → goal(x)
r4: ∀x.s(x) → v(x, x)
Facts
p(a), q(a), t(b, a), s(c), s(d)

Derivations: p(a) yields t(a, a) via r1; t(b, a) and q(a) yield t(a, b) via r2; t(a, a) (paired with
t(a, a) or t(a, b)) yields goal(a) via r3; s(c) and s(d) yield v(c, c) and v(d, d) via r4. The relevant
approximation covers p(a), q(a), t(b, a) and r1, r2, r3.

Figure 3: Relevance: Exact vs Approximate


3. Knowledge-Base Filtering via Relevance
To explain an entailment of a knowledge base, it is often the case that only a portion of the knowledge
baseโ€™s facts and rules are necessary or relevant to that entailment. In this section, we present a formal
definition of relevance and introduce a rule-based technique for filtering out facts that are irrelevant for
explaining a given fact. This technique can drastically reduce the size of the encoded formula to be
solved.

Relevance We say that a fact ψ is relevant for the entailment of a fact φ in a knowledge base 𝒦
if there exists an explanation 𝒦′ ∈ Expl(𝒦, φ) that contains ψ. This definition naturally extends to
rules. Consider the knowledge base in Figure 3 with facts ℱ = {p(a), q(a), t(b, a), s(c), s(d)} and
rules ℛ = {r1, r2, r3, r4}. The entailment φ = goal(a) can be derived in two ways. The first is by
applying r1, which yields t(a, a); this atom allows for the application of r3 to derive φ. The second
is by applying r3 on the results of r1 and r2. The fact φ has, however, only one explanation, namely,
𝒦′ = ⟨{r1, r3}, {p(a)}⟩. Indeed, 𝒦″ = ⟨{r1, r2, r3}, {p(a), q(a), t(b, a)}⟩ is not minimal, as 𝒦′ ⊂ 𝒦″
(see Definition 1). As a result, with Expl(𝒦, φ) = {𝒦′}, the only elements relevant to the entailment of
φ in 𝒦 are p(a), r1, and r3. Despite its apparent simplicity, it turns out that deciding relevance is hard.
This holds true even if one focuses only on deciding the relevance of facts, for a fixed rule set. By a
reduction from SAT, we have the following proposition. Proofs can be found in Appendix A.
Proposition 2. For a fixed rule set ℛ, deciding if a fact ψ ∈ ℱ is relevant for φ on ⟨ℛ, ℱ⟩ is NP-complete.

Approximating Relevance Considering this hardness result, we turn our attention to the task
of approximating relevant facts and rules. This is achieved using a two-step rule-based approach:
a static step which preprocesses the input knowledge base, followed by a dynamic step for tracing
each individual entailment. The static step builds an entailment graph which tracks all relationships
between entailed atoms and rules. This entailment graph allows us to compute an approximation
of the relevant facts. For instance, as illustrated in Figure 3, the relevance approximation includes the
atoms p(a), q(a), t(b, a) and the rules r1, r2, r3.
   We now present our approach and give a construction for the entailment graph based on rules
with function symbols, which can be deployed on reasoners supporting this formalism. Before detailing
each step, we introduce the following notation. Given a predicate P, we denote by P+ and fP a fresh
predicate and a function unique to P, respectively; below, the same will hold for predicates B and H.
Moreover, given a rule r, we respectively denote by Er and fr a fresh predicate and a function unique
to r.

Static step (entailment graph building) Given a knowledge base 𝒦 = ⟨ℛ, ℱ⟩, we compute
𝒮 = Chase(⟨EntGraph(ℛ), ℱ⟩), where EntGraph(ℛ) is the minimal set containing the following
rules:

                      ๐‘Ÿ๐‘ : p(๐‘ฅ) โ†’ p+ (๐‘ฅ, ๐‘“p (๐‘ฅ))                              goal+ (a, fgoal (a))   Er3 (ft (a, a), ft (a, a), fgoal (a))
                                         +
                      ๐‘Ÿ๐‘ก : t(๐‘ฅ, ๐‘ฆ) โ†’ t (๐‘ฅ, ๐‘ฆ, ๐‘“t (๐‘ฅ, ๐‘ฆ))
                      ๐‘Ÿ๐‘ž : q(๐‘ฅ) โ†’ q+ (๐‘ฅ, ๐‘“q (๐‘ฅ))                                      ๐‘Ÿ3โ€ฒ

                   ๐‘Ÿ๐‘”๐‘œ๐‘Ž๐‘™ : goal(๐‘ฅ) โ†’ goal+ (๐‘ฅ, ๐‘“goal (๐‘ฅ))
                                                                              t+ (a, a, ft (a, a))         Er1 (fp (a), ft (a, a))
                      ๐‘Ÿ๐‘  : s(๐‘ฅ) โ†’ s+ (๐‘ฅ, ๐‘“s (๐‘ฅ))

                                                                                      ๐‘Ÿ1โ€ฒ

    ๐‘Ÿ1โ€ฒ : p+ (๐‘ฅ, ๐‘“p (๐‘ฅ)) โ†’ t+ (๐‘ฅ, ๐‘ฅ, ๐‘“t (๐‘ฅ, ๐‘ฅ)) โˆง Er1 (๐‘“p (๐‘ฅ), ๐‘“t (๐‘ฅ, ๐‘ฅ))
                                                                                 p+ (a, fp (a))
    ๐‘Ÿ2โ€ฒ : t+ (๐‘ฅ, ๐‘ฆ, ๐‘“t (๐‘ฅ, ๐‘ฆ)) โˆง q+ (๐‘ฅ, ๐‘“q (๐‘ฅ))
             โ†’ t+ (๐‘ฆ, ๐‘ฅ, ๐‘“t (๐‘ฆ, ๐‘ฅ)) โˆง Er2 (๐‘“๐‘ก (๐‘ฅ, ๐‘ฆ), ๐‘“q (๐‘ฅ), ๐‘“t (๐‘ฆ, ๐‘ฅ))
                                                                                      ๐‘Ÿ๐‘
    ๐‘Ÿ3โ€ฒ : t+ (๐‘ฅ, ๐‘ฅ, ๐‘“t (๐‘ฅ, ๐‘ฅ)) โˆง t+ (๐‘ฅ, ๐‘ฆ, ๐‘“t (๐‘ฅ, ๐‘ฆ))
             โ†’ goal+ (๐‘ฅ, ๐‘“goal (๐‘ฅ)) โˆง Er3 (๐‘“t (๐‘ฅ, ๐‘ฅ), ๐‘“t (๐‘ฅ, ๐‘ฆ), ๐‘“goal (๐‘ฅ))
                                                                                     p(a)
    ๐‘Ÿ4โ€ฒ : s+ (๐‘ฅ, ๐‘“s (๐‘ฅ)) โ†’ v+ (๐‘ฅ, ๐‘ฅ, ๐‘“v (๐‘ฅ, ๐‘ฅ)) โˆง Er4 (๐‘“s (๐‘ฅ), ๐‘“v (๐‘ฅ, ๐‘ฅ))


Figure 4: Static Step Rules and Inferences


   1. P(x̄) → P+(x̄, fP(x̄))                      for every predicate P occurring in ℱ

   2. B1+(x̄1, y1) ∧ . . . ∧ Bn+(x̄n, yn) → H+(x̄, fH(x̄)) ∧ Er(y1, . . . , yn, fH(x̄))
                                    for every rule r = B1(x̄1) ∧ . . . ∧ Bn(x̄n) → H(x̄) ∈ ℛ

   The aim of the static step is to produce the entailment graph, i.e., a hypergraph representing the links
(in terms of entailment) between the atoms. In order to do that, we first need to be able to reference
the atoms in Chase(𝒦). This is done by rules of type (1), which infer, for any atom P(t̄) of arity k, an
atom P+(t̄, fP(t̄)) of arity k + 1, where fP(t̄) is a functional term representing P(t̄) itself. Then, every
rule r is transformed into a rule of type (2) which, once applied, uses the functional terms to link all
images of the body atoms to the image of the head atom via a hyperedge relation Er proper to r. Let
us emphasise that this step is done only once for every input 𝒦.
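The two rule schemas can be instantiated mechanically. The sketch below emits the rewritten rules as plain strings for rule r1 of Figure 3; the textual syntax (`p+`, `f_p`, `E_r1`) is an illustrative serialisation, not the concrete syntax of any particular reasoner.

```python
def ent_graph(rules, fact_preds):
    """Produce the EntGraph rewriting as strings: type-(1) rules tag every
    atom with a functional term naming it, type-(2) rules add a hyperedge
    atom E_r linking the body atoms' names to the head atom's name."""
    def plus(pred, args, name):
        return f"{pred}+({', '.join(args)}, {name})"

    def fterm(pred, args):
        return f"f_{pred}({', '.join(args)})"

    out = []
    for pred, arity in fact_preds:                       # type (1)
        xs = [f"x{i}" for i in range(1, arity + 1)]
        out.append(f"{pred}({', '.join(xs)}) -> "
                   f"{plus(pred, xs, fterm(pred, xs))}")
    for rid, body, (hpred, hargs) in rules:              # type (2)
        names = [fterm(p, a) for p, a in body]
        lhs = ' & '.join(plus(p, a, n) for (p, a), n in zip(body, names))
        hname = fterm(hpred, hargs)
        edge = f"E_{rid}({', '.join(names + [hname])})"
        out.append(f"{lhs} -> {plus(hpred, hargs, hname)} & {edge}")
    return out

# Rule r1 of Figure 3: p(x) -> t(x, x)
rules = [('r1', [('p', ['x'])], ('t', ['x', 'x']))]
for line in ent_graph(rules, [('p', 1)]):
    print(line)
```

The output reproduces rules rp and r1′ of Figure 4 in this string syntax.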
   Figure 4 illustrates the rules of the static step EntGraph(ℛ) for the rule base of Figure 3, as well
as their applications starting from p(a). First, the rule rp produces the atom p+(a, fp(a)), where fp(a)
represents p(a). Consider then rule r1. It is transformed in the following way: the body p(x) and the
head t(x, x) are respectively replaced by p+(x, fp(x)) and t+(x, x, ft(x, x)), and a hyperedge atom
Er1(fp(x), ft(x, x)) is added to the head to represent the link between the two atoms in the entailment
graph. Note that this edge atom uses the functional terms corresponding to the atoms appearing in
the rule. Since we have p+(a, fp(a)), the rule r1′ produces both t+(a, a, ft(a, a)) and the edge atom
Er1(fp(a), ft(a, a)), representing the fact that p+(a, fp(a)) is used to derive t+(a, a, ft(a, a)). Finally, r3′
is obtained from r3 in a similar way, and it is used to derive goal+(a, fgoal(a)).


Dynamic Step (tracing) For any fact φ to explain, we define 𝒟 = Chase(⟨relPropag(ℛ, φ), 𝒮⟩),
where 𝒮 is the result of the static step and relPropag(ℛ, φ) is the minimal set containing the following
rules:

   1. → Rel(fP(x̄))                                                          if φ = P(x̄)

   2. Er(y1, . . . , yn, z) ∧ Rel(z) → Rel(y1) ∧ . . . ∧ Rel(yn) ∧ RelEdge(fr(y1, . . . , yn, z))
                                                                for every rule r in ℛ
Note first that rule (1) has an empty body; this is used to bootstrap the tracing for φ. Furthermore, for
rules of type (2), we consider that Er and fr are the fresh predicate and function unique to r introduced
relPropag(ℛ, goal(a))
rrelgoal: → Rel(fgoal(a))
rrel1: Er1(y, x) ∧ Rel(x) → Rel(y) ∧ RelEdge(fr1(y, x))
rrel2: Er2(y, z, x) ∧ Rel(x) → Rel(y) ∧ Rel(z) ∧ RelEdge(fr2(y, z, x))
rrel3: Er3(y, z, x) ∧ Rel(x) → Rel(y) ∧ Rel(z) ∧ RelEdge(fr3(y, z, x))
rrel4: Er4(y, x) ∧ Rel(x) → Rel(y) ∧ RelEdge(fr4(y, x))

Inferences: rrelgoal yields Rel(fgoal(a)); with Er3(ft(a, a), ft(a, a), fgoal(a)), rule rrel3 yields
Rel(ft(a, a)) and RelEdge(fr3(ft(a, a), ft(a, a), fgoal(a))); with Er1(fp(a), ft(a, a)), rule rrel1 yields
Rel(fp(a)) and RelEdge(fr1(fp(a), ft(a, a))).

Figure 5: Dynamic Step Rules and Inferences


in the static step, and that Rel and RelEdge are reserved predicates. Intuitively, these rules recursively trace
a fact back to the ancestor atoms which belong to the input fact base.
   Figure 5 illustrates the rules of the dynamic step relPropag(ℛ, φ) for the rule base of Figure 3 and
the entailment φ = goal(a), as well as their applications. In a nutshell, the atom to be explained is
marked as relevant thanks to the (empty-body) rule → Rel(fgoal(a)). Then the relevance relation is
propagated back using the edge atoms produced during the static step. In order to do this, we need the
"propagation rules" rrel1–rrel4. For instance, since we have Er3(ft(a, a), ft(a, a), fgoal(a)) from the static
step, and goal(a) is relevant (i.e., Rel(fgoal(a))), then by rule rrel3 we have that both t(a, a) and r3 are
relevant (i.e., Rel(ft(a, a)) and RelEdge(fr3(ft(a, a), ft(a, a), fgoal(a))), respectively). The same process
applies for deeming p(a) and rule r1 relevant by the application of rrel1.
   Finally, for 𝒦 = ⟨ℛ, ℱ⟩, we define the approximation of the relevant KB as SupRelφ(𝒦) = ⟨ℛ′, ℱ′⟩
where ℱ′ = {P(t̄) ∈ ℱ | 𝒟 ⊨ Rel(fP(t̄))} and ℛ′ = {r ∈ ℛ | 𝒟 ⊨ RelEdge(fr(t1, . . . , tn, t′))}.
Note that both the 𝒮 and 𝒟 knowledge bases always admit a finite (terminating) chase. The soundness and
completeness result below states that this approximation provides the same explanations as the exact relevance.

Proposition 3. Expl(𝒦, φ) = Expl(SupRelφ(𝒦), φ)
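Once the hyperedges are materialised, the dynamic step amounts to a backward reachability computation over the hypergraph. A minimal sketch, assuming the entailment graph is given as explicit `(rule_id, body_nodes, head_node)` edges with illustrative string node names:

```python
def sup_rel(edges, facts, goal):
    """Back-propagate relevance over the entailment hypergraph: whenever
    an edge's head is relevant, its body nodes and its rule become
    relevant too (the role of rules of type (2) in relPropag)."""
    rel_nodes = {goal}
    rel_rules = set()
    changed = True
    while changed:
        changed = False
        for rid, body, head in edges:
            if head in rel_nodes and (
                    not set(body) <= rel_nodes or rid not in rel_rules):
                rel_nodes.update(body)
                rel_rules.add(rid)
                changed = True
    # keep only input facts among the relevant nodes
    return {f for f in facts if f in rel_nodes}, rel_rules

# Entailment graph of Figure 3 (one edge per rule application)
edges = [
    ('r1', ['p(a)'], 't(a,a)'),
    ('r2', ['t(b,a)', 'q(a)'], 't(a,b)'),
    ('r3', ['t(a,a)', 't(a,a)'], 'goal(a)'),
    ('r3', ['t(a,a)', 't(a,b)'], 'goal(a)'),
    ('r4', ['s(c)'], 'v(c,c)'),
    ('r4', ['s(d)'], 'v(d,d)'),
]
facts = ['p(a)', 'q(a)', 't(b,a)', 's(c)', 's(d)']
rel_facts, rel_rules = sup_rel(edges, facts, 'goal(a)')
print(sorted(rel_facts), sorted(rel_rules))
```

As in Figure 3, s(c), s(d), and r4 are pruned, while the over-approximation keeps q(a), t(b, a), and r2 even though they appear in no minimal explanation.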


4. Experimental Analysis
We have implemented our approach by using the MARCO tool for group-MUS enumeration [11] (Section
2) and InteGraal [14] for approximating relevance (Section 3).

Benchmarks We selected 24 interesting ontologies from the MOWL corpus [15]. These ontologies
belong to the EL profile and were translated into Datalog programs using the Java OWL API [16]
and the translation presented in [17]. We made sure that each ontology has at least 5 extensional facts
and produces at least 5 intensional facts. For each ontology, we picked 5 entailment queries to explain
among the atoms with the greatest reasoning depth (which are hence, intuitively, the most challenging to
explain), that is, atoms derived in the last (breadth-first) step of the saturation of the ontology.

Protocol For each ontology, we compute the entailment graph using InteGraal [14] and the rules for
the static step presented in Section 3; recall that this step is done only once per ontology. Subsequently,
for each entailment query, we again use InteGraal [14] and the rules for the dynamic step presented in
Section 3 to compute a knowledge base that consists only of the part of the input ontology relevant to
that particular entailment query. This relevant knowledge base is then translated into a GSAT formula
as described in Section 2 and fed to the MARCO tool [11]. The tool returns all MUSes which, when
translated back into Datalog knowledge base statements, constitute the explanations.

Figure 6: Query Explanation Performances

We compared against the OWL API [16], which is the only tool we are aware of that computes kb-support
explanations (i.e., ontology axioms corresponding to facts and rules). We implemented our benchmarking
protocols using the B-Runner library [18] and made them available online [19].
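To make the pipeline concrete, a group-MUS enumerator can be emulated by brute force on tiny inputs. In the sketch below (illustrative only; MARCO [11] implements this far more efficiently), each group stands for the clauses of one fact or rule, and the `is_unsat` callback plays the role of the SAT solver: in our encoding, a set of groups is unsatisfiable together with the hard clauses exactly when the corresponding sub-knowledge-base entails the query.

```python
from itertools import combinations

def group_muses(groups, is_unsat):
    """Enumerate all subset-minimal sets of groups G with is_unsat(G) True.

    Iterating by increasing size guarantees that every recorded set is
    minimal: all of its proper subsets were already tested and found
    satisfiable.
    """
    muses = []
    items = sorted(groups)
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            s = frozenset(subset)
            # Skip supersets of an already-found MUS.
            if is_unsat(s) and not any(m <= s for m in muses):
                muses.append(s)
    return muses

# Toy instance: the query is entailed by {f1, r} or by {f2, r}, so those
# two sets are exactly the group-MUSes (i.e. the two explanations).
entails = lambda s: "r" in s and ("f1" in s or "f2" in s)
result = group_muses({"f1", "f2", "r"}, entails)
```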

Setup All our experiments were run on a server with an AMD Ryzen 9 3900XT 12-core processor @ 4.7 GHz,
128 GB of RAM @ 2.4 GHz, and a 2 TB NVMe SSD.

Analysis Figure 6 reports the time taken by the static step, which corresponds to the entailment
graph construction, vs the dynamic step which is further divided into filtering (via InteGraal) and MUS
enumeration (via MARCO). For the dynamic step, we report the average time for the 5 queries we
considered. Times are in milliseconds.
   As expected, we can observe that the time taken by the static step (even if within tenths of a second)
is significantly longer than that of the dynamic step. Interestingly, we found that the time required for
constructing the entailment graph depends on the number of possible rule groundings as well as on the
depth (in a breadth-first sense) of the chase. This cost, being fixed, is however amortised when multiple
explanation queries are submitted.
   Concerning the dynamic step, we can observe that it is much faster than the static step, on
the order of tens of milliseconds. Again, we found that a crucial aspect here is the depth of the derived
atom: the deeper an atom is in the entailment graph, the more one has to propagate relevance. Our
results also indicate that the group-MUS enumerator is very fast (on the order of milliseconds), which
further validates this approach for computing explanations. Note also that the generality of our method
means that it can be implemented with any reasoner supporting the rules described in Section 3 and
any group-MUS enumerator.
   We conclude with a comparison with the explanation facility provided by the OWL API. We can see
that our approach is generally more competitive if we consider only the dynamic step and, depending
on the case, can even be competitive for one-off explanations that include both the static and dynamic
steps as a single operation.


5. Related Work & Conclusion
We have introduced the first comprehensive approach for computing kb-support explanations for fact
entailments over Datalog knowledge bases. Our method leverages group-MUS enumeration for
computing explanations, as well as a rule-based approach to filter out atoms irrelevant to the task. Our
approach has been implemented and experimentally evaluated.
   Our work considers explanations in the form of kb-support (a.k.a. justifications with ABoxes), which,
to the best of our knowledge, has not been investigated so far for Datalog [9, 8, 10]. Indeed, existing
methods for Datalog focus on the notion of why-provenance, which corresponds to that of fact-support.
The recent work of [8] studies the use of SAT solvers for computing why-provenance for Datalog based
on so-called non-ambiguous proof trees. Rule-based approaches for on-demand computation of Datalog
provenance have been studied in [10]. The recent work of [5] considers provenance computation for
the EL description logic [20, 21]. In the context of answer set programming (ASP), the work of [21]
considers a notion of graph explanation; this can be seen as an extension of fact-support including all
inferred facts (but no rules). While these approaches are close to ours in spirit, the technical development
remains different. Notably, considering kb-supports allows us to obtain more explanations than why-
provenance, and thus a better understanding of the entailed facts. On the other hand, this also required
us to establish a new correspondence between group-MUSes and kb-support explanations.
   In the context of description logics, reasoning explanations have long been studied. Justifications
are also called minimal axiom sets, and their computation is referred to as axiom pinpointing. The
approaches closest to ours are the following. [1] and [2] consider the problem of computing axiom
explanations for EL TBoxes (hence, without data). [1] studies the complexity of computing relevant
axioms, while [2] presents a reduction to GMUS enumeration. However, in contrast to our work, neither
of these approaches considers data (ABoxes). The problem of filtering the knowledge base down to a set
of relevant facts has been considered in [8, 10, 3, 4]. All of these approaches employ a notion akin to that
of a dependency graph. However, our work differs in that it considers a different notion of explanation.
Moreover, our work is the only one that studies the complexity of deciding the relevance of atoms
(NP-complete).
   Through this work we demonstrated the practical applicability of group-SAT solvers for knowledge
bases, paving the way for more efficient and explainable AI systems. Future work includes the extension
of our setting to richer rule languages, including suitable forms of functions, negation, and disjunction.


Acknowledgments
This work was supported by the Inria-DFKI bilateral project R4Agri and by the ANR project CQFD
(ANR-18-CE23-0003).


References
 [1] F. Baader, R. Peñaloza, B. Suntisrivaraporn, Pinpointing in the description logic EL+, in: Annual
     Conference on Artificial Intelligence, Springer, 2007, pp. 52–67.
 [2] N. Manthey, R. Peñaloza, S. Rudolph, SATPin: axiom pinpointing for lightweight description logics
     through incremental SAT, KI - Künstliche Intelligenz 34 (2020) 389–394.
 [3] Z. Zhou, G. Qi, B. Suntisrivaraporn, A new method of finding all justifications in OWL 2 EL, in: 2013
     IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent
     Technologies (IAT), volume 1, IEEE, 2013, pp. 213–220.
 [4] Z. Zhou, G. Qi, A dependency-graph based approach for finding justification in OWL 2 EL, Intelligent
     Data Analysis 22 (2018) 1315–1335.
 [5] S. Borgwardt, S. Breuer, A. Kovtunova, Computing ABox justifications for query answers via
     Datalog rewriting, in: Description Logics, 2023.
 [6] S. Abiteboul, R. Hull, V. Vianu, Foundations of databases, volume 8, Addison-Wesley Reading,
     1995.
 [7] G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, M. Zakharyaschev, Ontology-
     based data access: A survey, in: J. Lang (Ed.), Proceedings of the Twenty-Seventh International
     Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, 2018,
     pp. 5511–5519. URL: https://doi.org/10.24963/ijcai.2018/777. doi:10.24963/IJCAI.2018/777.
 [8] M. Calautti, E. Livshits, A. Pieris, M. Schneider, Computing the why-provenance for datalog queries
     via SAT solvers, in: M. J. Wooldridge, J. G. Dy, S. Natarajan (Eds.), Thirty-Eighth AAAI Conference
     on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of
     Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial
     Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, AAAI Press, 2024, pp. 10459–
     10466. URL: https://doi.org/10.1609/aaai.v38i9.28914. doi:10.1609/AAAI.V38I9.28914.
 [9] C. Bourgaux, P. Bourhis, L. Peterfreund, M. Thomazo, Revisiting semiring provenance for datalog,
      in: G. Kern-Isberner, G. Lakemeyer, T. Meyer (Eds.), Proceedings of the 19th International Confer-
      ence on Principles of Knowledge Representation and Reasoning, KR 2022, Haifa, Israel, July 31 -
     August 5, 2022, 2022. URL: https://proceedings.kr.org/2022/10/.
[10] A. Elhalawati, M. Krötzsch, S. Mennicke, An existential rule framework for computing why-
     provenance on-demand for datalog, in: G. Governatori, A. Turhan (Eds.), Rules and Reasoning
     - 6th International Joint Conference on Rules and Reasoning, RuleML+RR 2022, Berlin, Ger-
     many, September 26-28, 2022, Proceedings, volume 13752 of Lecture Notes in Computer Science,
     Springer, 2022, pp. 146–163. URL: https://doi.org/10.1007/978-3-031-21541-4_10. doi:10.1007/
     978-3-031-21541-4_10.
[11] M. H. Liffiton, A. Previti, A. Malik, J. Marques-Silva, Fast, flexible MUS enumeration, Con-
     straints An Int. J. 21 (2016) 223–250. URL: https://doi.org/10.1007/s10601-015-9183-0. doi:10.
     1007/S10601-015-9183-0.
[12] C. Beeri, M. Y. Vardi, A proof procedure for data dependencies, J. ACM 31 (1984) 718–741. URL:
     https://doi.org/10.1145/1634.1636. doi:10.1145/1634.1636.
[13] M. H. Liffiton, K. A. Sakallah, Algorithms for computing minimal unsatisfiable subsets of
     constraints, J. Autom. Reason. 40 (2008) 1–33. URL: https://doi.org/10.1007/s10817-007-9084-z.
     doi:10.1007/S10817-007-9084-Z.
[14] J.-F. Baget, P. Bisquert, M. Leclère, M.-L. Mugnier, G. Pérution-Kihli, F. Tornil, F. Ulliana, InteGraal:
     a Tool for Data-Integration and Reasoning on Heterogeneous and Federated Sources, in: BDA 2023
     - 39e Conférence sur la Gestion de Données – Principes, Technologies et Applications, Montpellier,
     France, 2023. URL: https://hal-lirmm.ccsd.cnrs.fr/lirmm-04304601.
[15] N. Matentzoglu, B. Parsia, The Manchester OWL Corpus (MOWLCorp), original serialisation, 2014.
      URL: https://doi.org/10.5281/zenodo.10851. doi:10.5281/zenodo.10851.
[16] M. Horridge, S. Bechhofer, The OWL API: A Java API for OWL ontologies, Semantic Web 2 (2011)
     11–21. URL: https://doi.org/10.3233/SW-2011-0025. doi:10.3233/SW-2011-0025.
[17] D. Carral, J. Zalewski, P. Hitzler, An efficient algorithm for reasoning over OWL EL ontologies with
     nominal schemas, J. Log. Comput. 33 (2023) 136–162. URL: https://doi.org/10.1093/logcom/exac032.
     doi:10.1093/LOGCOM/EXAC032.
[18] F. Ulliana, P. Bisquert, A. Charoensit, R. Colin, F. Tornil, Q. Yeche, Collaborative benchmarking with
     B-Runner, in: Rules and Reasoning - 8th International Joint Conference on Rules and Reasoning,
     RuleML+RR 2024, Bucharest, Romania, September 16-18, 2024, Proceedings, Lecture Notes in
     Computer Science, 2024.
[19] Artifacts for Rule-aware Datalog Fact Explanation Using Group-SAT Solver, https://gitlab.inria.fr/
      boreal-artifacts/ruleml2024, 2024.
[20] T. Eiter, T. Geibinger, Explaining answer-set programs with abstract constraint atoms, in: IJCAI,
     2023, pp. 3193–3202.
[21] M. Alviano, L. L. Trieu, T. C. Son, M. Balduccini, Explanations for answer set programming, arXiv
      preprint arXiv:2308.15879 (2023).
A. Proofs for Section 3

                                             Fact Relevance(ℛ)

 Instance: A fact set ℱ, a fact φ, and a fact ψ ∈ ℱ.
 Question: Is ψ relevant for the entailment of φ with respect to ⟨ℛ, ℱ⟩?


Lemma 1. Fact Relevance(ℛ) is in NP.
Proof. We consider an oracle that, for a fixed ℛ, decides in polynomial time Fact Entailment(ℛ),
that is, whether ⟨ℛ, ℱ⟩ ⊨ φ for a given fact set ℱ and fact φ. We then consider a non-deterministic
Turing machine that accepts on input ⟨ℱ, φ, ψ⟩ if and only if ψ is relevant for φ with ⟨ℛ, ℱ⟩, and that
proceeds as follows.
   1. Non-deterministically guess sets ℱ′ ⊆ ℱ and ℛ′ ⊆ ℛ such that ψ ∈ ℱ′.

   2. Check with the oracle for Fact Entailment(ℛ) whether ⟨ℛ′, ℱ′⟩ ⊨ φ. If it does not, reject.

   3. Check the minimality of ⟨ℛ′, ℱ′⟩: if there exists ⟨ℛ″, ℱ″⟩ ⊆ ⟨ℛ′, ℱ′⟩ with |ℱ″| = |ℱ′| − 1 or
      |ℛ″| = |ℛ′| − 1 such that ⟨ℛ″, ℱ″⟩ ⊨ φ (checked with the oracle), then reject; otherwise accept.


   (Soundness) Given a fact set ℱ, a fact φ, and some fact ψ ∈ ℱ, we show that if the TM accepts
on input ⟨ℱ, φ, ψ⟩ then ψ is relevant for φ with ⟨ℛ, ℱ⟩. By hypothesis the TM accepts, so there is a
pair ⟨ℛ′, ℱ′⟩ ⊆ ⟨ℛ, ℱ⟩ such that ψ ∈ ℱ′, ⟨ℛ′, ℱ′⟩ ⊨ φ, and ⟨ℛ″, ℱ″⟩ ⊭ φ for every ℱ″ ⊂ ℱ′ and
ℛ″ ⊂ ℛ′ (by monotonicity of Datalog entailment, checking only the subsets with one fewer element
suffices). Hence, by the definitions of explanations and relevance, ⟨ℛ′, ℱ′⟩ is an explanation for φ
containing ψ, thus ψ is relevant for φ.
   (Completeness) Given a fact set ℱ, a fact φ, and some fact ψ ∈ ℱ, we show that if ψ is relevant for φ
then the TM accepts on input ⟨ℱ, φ, ψ⟩. By hypothesis ψ is relevant for φ, so there is an explanation ℰ =
⟨ℛ′, ℱ′⟩ for φ containing ψ. Hence, ψ ∈ ℱ′ and ⟨ℛ′, ℱ′⟩ ⊨ φ. At step 1 the TM guesses the explanation
ℰ = ⟨ℛ′, ℱ′⟩, and by construction of ℰ the oracle returns "yes" at step 2. Again by hypothesis, ⟨ℛ′, ℱ′⟩
is minimal, thus the oracle answers "no" for every ⟨ℛ″, ℱ″⟩ ⊂ ⟨ℛ′, ℱ′⟩ of size |⟨ℛ′, ℱ′⟩| − 1 checked
in step 3, and the TM accepts.
   (Termination) Step 1 is a non-deterministic guess of subsets of ℱ and ℛ, and the P oracle is then
used a number of times linear in the size of the guessed subsets. Hence Fact Relevance(ℛ) is in
NP^P = NP.
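On small instances, the guess-and-check procedure above can be simulated exhaustively. The sketch below (illustrative only, with a naive forward-chaining `entails` standing in for the Fact Entailment oracle) enumerates sub-knowledge-bases containing ψ, keeps those entailing φ, and tests minimality by dropping single facts or rules, exactly as in steps 1-3:

```python
from itertools import combinations

def entails(rules, facts, phi):
    """Naive forward chaining over propositional rules (body, head)."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return phi in derived

def is_relevant(rules, facts, phi, psi):
    """psi is relevant for phi iff some minimal entailing sub-KB contains it."""
    for rs in range(len(rules) + 1):
        for r_sub in combinations(rules, rs):
            for fs in range(len(facts) + 1):
                for f_sub in combinations(facts, fs):
                    if psi not in f_sub or not entails(r_sub, f_sub, phi):
                        continue
                    # Minimality: dropping any single fact or rule loses phi.
                    if all(not entails(r_sub, [f for f in f_sub if f != x], phi)
                           for x in f_sub) and \
                       all(not entails([r for r in r_sub if r != x], f_sub, phi)
                           for x in r_sub):
                        return True
    return False

# p -> q -> goal: p is relevant for goal, the unused fact u is not.
rules = [(("p",), "q"), (("q",), "goal")]
facts = ["p", "u"]
```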

Lemma 2. Fact Relevance(ℛ) is NP-hard.
Proof. We show that Fact Relevance(ℛ) is NP-hard via a reduction from SAT.
   (Translation) For a CNF formula Φ = C_1 ∧ . . . ∧ C_m defined over the variables {V_1, . . . , V_n},
we produce a knowledge base 𝒦 = ⟨ℛ, ℱ⟩ with nullary predicates Source and Target such that Φ is
satisfiable if and only if Source is relevant for the entailment of Target. The knowledge base 𝒦 is built
as follows:

              ℱ = {T(v_i), F(v_i) | 1 ≤ i ≤ n} ∪ {Source}
              ℛ = {T(v_k) → T(c_i) | 1 ≤ i ≤ m and every positive literal V_k in C_i} ∪    (1)
                  {F(v_k) → T(c_i) | 1 ≤ i ≤ m and every negative literal ¬V_k in C_i} ∪   (2)
                  {Source ∧ T(c_1) ∧ . . . ∧ T(c_m) → Target} ∪                            (3)
                  {T(x) ∧ F(x) → Target}                                                   (4)

  Note that in the construction above, we introduced a unique constant v_i for each variable V_i and a
unique constant c_i for each clause C_i in Φ. Note also that the only rule with variables is (4).
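The translation itself is direct to implement. The following Python sketch is a hypothetical rendering (rules as plain strings; clauses as DIMACS-style integer lists, with k for V_k and -k for ¬V_k) that builds the fact and rule sets of 𝒦 for a given CNF:

```python
def cnf_to_kb(clauses, n_vars):
    """Build <R, F> such that the CNF is satisfiable iff Source is
    relevant for the entailment of Target."""
    facts = ([f"T(v{i})" for i in range(1, n_vars + 1)]
             + [f"F(v{i})" for i in range(1, n_vars + 1)]
             + ["Source"])
    rules = []
    for i, clause in enumerate(clauses, start=1):
        for lit in clause:
            if lit > 0:
                rules.append(f"T(v{lit}) -> T(c{i})")    # rule family (1)
            else:
                rules.append(f"F(v{-lit}) -> T(c{i})")   # rule family (2)
    body = " & ".join(f"T(c{i})" for i in range(1, len(clauses) + 1))
    rules.append(f"Source & {body} -> Target")           # rule (3)
    rules.append("T(x) & F(x) -> Target")                # rule (4)
    return facts, rules

# Phi = (V1 or not V2) and (V2): n = 2 variables, m = 2 clauses.
facts, rules = cnf_to_kb([[1, -2], [2]], n_vars=2)
```

The size bounds used in the polynomiality argument (2n + 1 facts, at most k × m + 2 rules) can be read off directly from this construction.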
  (Soundness) We show that if Source is relevant for the entailment of Target with 𝒦 then the CNF
formula Φ is satisfiable.

  (a) By hypothesis there is an explanation ⟨ℛ′, ℱ′⟩ ⊆ ⟨ℛ, ℱ⟩ such that Source ∈ ℱ′ and ⟨ℛ′, ℱ′⟩ ⊨
      Target.

  (b) We show first that for every constant v_i appearing in ℱ′, we have that T(v_i) ∈ ℱ′ if and only if
      F(v_i) ∉ ℱ′. Indeed, suppose T(v_i), F(v_i) ∈ ℱ′ for some constant v_i. Then, with rule (4) we have
      ⟨ℛ′, {T(v_i), F(v_i)}⟩ ⊨ Target. This contradicts the fact that ℱ′ is a minimal set that, together
      with ℛ′, entails Target.

  (c) Let σ_ℱ′ be the assignment such that σ_ℱ′(V_i) = True if and only if T(v_i) ∈ ℱ′, for every variable
      V_i in Φ and its corresponding constant v_i in 𝒦.

  (d) From (b), we know that (4) cannot be applied in ℱ′. Therefore, rule (3) has been applied to
      infer Target. Hence, ⟨ℛ′, ℱ′⟩ ⊨ T(c_i) for every 1 ≤ i ≤ m.

  (e) For 1 ≤ i ≤ m, we know that T(c_i) is inferred either by a rule from (1) or by a rule from (2).

      If a rule of the form (1) has been applied, then there is a constant v_k such that T(v_k) ∈ ℱ′ and
      T(v_k) → T(c_i) ∈ ℛ′. Let V_k be the variable corresponding to v_k and C_i the clause corresponding
      to c_i in Φ; then by the construction of the rules in (1) we have that V_k ∈ C_i, and by (c) we know
      that σ_ℱ′(V_k) = True, thus σ_ℱ′(C_i) = True.
      If a rule of the form (2) has been applied, then there is a constant v_k such that F(v_k) ∈ ℱ′ and
      F(v_k) → T(c_i) ∈ ℛ′. Let V_k be the variable corresponding to v_k and C_i the clause corresponding
      to c_i in Φ; then by the construction of the rules in (2) we have that ¬V_k ∈ C_i, and by (c) we know
      that σ_ℱ′(V_k) = False, thus σ_ℱ′(C_i) = True.
      Hence, σ_ℱ′(C_i) = True for every clause C_i in Φ, thus σ_ℱ′(Φ) = True and Φ is satisfiable.

  (Completeness) We show that if Φ is satisfiable then Source is relevant for Target with 𝒦.

  (a) By hypothesis, there is an assignment σ such that σ is a model of Φ. Consider the set of facts
      ℱ′ = {Source} ∪ {T(v_i) | σ(V_i) = True} ∪ {F(v_i) | σ(V_i) = False} ⊆ ℱ.

  (b) By hypothesis, we have σ(C_i) = True for every clause C_i in Φ, and one of two cases holds.

      There is a positive literal V_k ∈ C_i such that σ(V_k) = True. In this case, let v_k and c_i be
      the constants corresponding to V_k and C_i respectively. By (a) we have that T(v_k) ∈ ℱ′, and
      T(v_k) → T(c_i) ∈ ℛ by the construction of the rules in (1).
      There is a negative literal ¬V_k ∈ C_i such that σ(V_k) = False. Let v_k and c_i be the constants
      corresponding to V_k and C_i respectively. By (a) we have that F(v_k) ∈ ℱ′, and F(v_k) → T(c_i) ∈ ℛ
      by the construction of the rules in (2).
      In both cases, ⟨ℛ, ℱ′⟩ ⊨ T(c_i) for every 1 ≤ i ≤ m.

  (c) Let ℱ″ be a minimal subset of ℱ′ and ℛ′ a minimal subset of ℛ (obtained in linear time by
      deletion, i.e., by removing one fact/rule at a time until the entailment is lost) such that
      ⟨ℛ′, ℱ″⟩ ⊨ Target. We know that (3) and (4) are the only rules that can infer Target. However,
      (4) cannot be applied on ℱ″. Indeed, ℱ′ is built from an assignment σ, and therefore at most one
      of T(v_i) and F(v_i) belongs to ℱ″ for any constant v_i corresponding to a variable V_i in Φ.
      Hence rule (3) has been applied to derive Target.

  (d) To conclude, note that to apply (3) the atom Source must belong to ℱ″, and ⟨ℛ′, ℱ″⟩ is an
      explanation of Target. Hence, Source is relevant for Target.
   (Polynomiality) We show that the size of 𝒦 is polynomial in the size of Φ. The fact set ℱ contains
2n + 1 atoms, with n the number of propositional variables in Φ. The rule set contains at most k × m + 2
rules, where m is the number of clauses in Φ and k the largest number of literals in a clause. Thus the
instance 𝒦 is polynomial with respect to Φ, and the translation can be computed in polynomial time.