=Paper=
{{Paper
|id=Vol-3816/paper12
|storemode=property
|title=Rule-aware Datalog Fact Explanation Using Group-SAT Solver
|pdfUrl=https://ceur-ws.org/Vol-3816/paper12.pdf
|volume=Vol-3816
|authors=Akira Charoensit,David Carral,Pierre Bisquert,Lucas Rouquette,Federico Ulliana
|dblpUrl=https://dblp.org/rec/conf/rulemlrr/CharoensitCBRU24
}}
==Rule-aware Datalog Fact Explanation Using Group-SAT Solver==
Akira Charoensit¹, David Carral¹, Pierre Bisquert¹,², Lucas Rouquette¹ and Federico Ulliana¹
¹ Inria, LIRMM, Univ Montpellier, CNRS, France
² IATE, Univ Montpellier, INRAE, Institut Agro, Montpellier, France
Abstract
One of the major benefits of symbolic AI is explainability. When new knowledge is obtained via a reasoning
process, it is possible to determine precisely all elements of the knowledge base that yield this knowledge.
Typically, one would use a SAT solver to compute the explanations. However, SAT-solving is computationally
expensive, and as the knowledge base grows, the time required increases exponentially. This work presents a
method for filtering a Datalog knowledge base to optimise the time used by a SAT solver. This is achieved by
creating a hypergraph representing the grounded knowledge base and pruning the nodes that are not reachable
from the fact that we want to explain. This approach proves to be time-effective. Interestingly, one additional
benefit of using this hypergraph is that it is possible to encode more information about the rules used in the
reasoning process. By using an off-the-shelf group-SAT solver, this extra information allows us to find specific
explanations that would be missed if we only considered facts.
1. The Explanation Issue
In the realm of Artificial Intelligence, symbolic AI stands out for its ability to provide explainable results
and predictions. In particular, knowledge and rule-based systems make it possible for users to trace the
derivation of new knowledge back to the exact elements that have contributed to it. This traceability
is crucial for applications where understanding the reasoning process is as important as the outcome
itself, such as in healthcare or regulatory enforcement.
While the issue of computing explanations has long been studied for Description Logics (see e.g.,
[1, 2, 3, 4, 5]), the extension of the approach to mainstream rule languages such as Datalog has not
been considered so far. Datalog is widely recognised as a language which leverages recursion for data
processing [6] and plays a significant role in ontology-mediated query answering [7]. On the one hand,
it encompasses RDF-Schema and OWL-RL ontologies. On the other hand, many reasoning tasks on
ontologies can be reduced to query answering in this language [8]. Providing explanations for Datalog
is thus a step towards more reliable data-intensive applications, and the goal of this work is to contribute
to the explainability of Datalog.
Explaining reasoning in Datalog raises, however, two key challenges. The first is to choose a form of
explanation that is suitable for the reasoning task. The second is to actually compute such explanations,
which is known to be expensive even for basic forms of explanation [9, 10].
Choosing Explanations Let us illustrate the importance of choosing the "right" notion of explanation.
Hereafter, we assume the reader is familiar with propositional and first-order logic. Consider the
knowledge base K = ⟨R, F⟩ in Figure 1 with facts F = {Boss(alice, alice)} and rules R = {r1, r2, r3}.
Consider the task of explaining the fact α = Manager(alice), which is entailed by K. As Figure 1
illustrates, α can be derived in two different ways starting from F. One is by applying r1. The other is
by applying r2 and then r3.
RuleML+RR'24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16-22, 2024,
Bucharest, Romania
akira.charoensit@inria.fr (A. Charoensit); david.carral@inria.fr (D. Carral); pierre.bisquert@inrae.fr (P. Bisquert);
lucas.rouquette@inria.fr (L. Rouquette); federico.ulliana@inria.fr (F. Ulliana)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 1: Computing Minimal KB-Support Explanations via MUSes. The rules are r1: ∀x,y. Boss(x, y) → Manager(x); r2: ∀x. Boss(x, x) → CEO(x); r3: ∀x. CEO(x) → Manager(x). The SAT encoding is A ∧ (A → C) ∧ (A → B) ∧ (B → C) ∧ ¬C, where A, B, and C stand for Boss(alice, alice), CEO(alice), and Manager(alice), respectively. MUS 1 is A ∧ (A → C) ∧ ¬C, yielding explanation 1 (facts: Boss(alice, alice); rules: r1); MUS 2 is A ∧ (A → B) ∧ (B → C) ∧ ¬C, yielding explanation 2 (facts: Boss(alice, alice); rules: r2, r3).
One possible form of explanation consists in computing the fact-support of α with respect to F,
that is, the set of facts F′ ⊆ F that can contribute to the derivation of α. The notion of fact-support
corresponds to that of why-provenance [8, 10, 9, 5]. For the example in Figure 1, this would yield F′ = F.
Here, fact-support provides insufficient information, as it does not show the role of rules in the
reasoning task.
A more meaningful notion of explanation here is that of kb-support, that is, a subset K′ of K
which entails α. To be precise, we need to identify minimal subsets of the knowledge base that preserve
the entailment, since users typically prefer concise explanations. In the example of Figure 1, there are
two kb-support explanations: K′1 = ⟨{r1}, F⟩ and K′2 = ⟨{r2, r3}, F⟩. The interest of the notion of
kb-support is that it better explains the roles played by both the rules and the data in the reasoning
task, thereby giving the user more power to potentially take action and revise the knowledge base. In
the context of Description Logics, kb-supports correspond to the notion of justifications with ABoxes
(the standard version of justifications rather considers entailment of axioms over TBoxes) [1, 2]. In this
work, we will thus consider the notion of kb-support, which enables more detailed explanations for fact
entailment over Datalog knowledge bases.
Computing Explanations An effective approach to tackling the high theoretical cost of computing
explanations is to rely on SAT solvers [2, 8], which are nowadays considered mature and effective tools.
The critical step of this reduction is encoding the input knowledge base and entailment (i.e., the inputs
of the explanation problem) as a propositional formula. To illustrate, Figure 1 shows the SAT encoding
of the explanation problem, which goes as follows. It associates every atom entailed by the knowledge
base with a distinct propositional variable (here, A, B, and C). Then, a conjunctive formula is built.
Its elements are either:
(i) a propositional variable corresponding to a fact in F (e.g., A represents Boss(alice, alice)),
(ii) a grounded rule corresponding to a rule application (e.g., A → B represents Boss(alice, alice) →
CEO(alice)), or
(iii) a negated literal whose propositional variable corresponds to the entailment (e.g., ¬C corresponds
to ¬Manager(alice)).
An important detail here is that negating the fact to explain in the encoded formula allows one to
solve the explanation problem by looking at the formula's minimal unsatisfiable sets (MUSes) [11]. As
Figure 1 shows, each MUS of the encoded formula provides an explanation.
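The reduction just described can be sketched with a tiny brute-force MUS enumerator. This is an illustrative sketch under our own encoding choices, not the solver used by the authors: the clause set below hand-encodes the example of Figure 1 (variables A, B, C as integers 1, 2, 3), and `muses` tests every subset for satisfiability, which is only feasible at toy scale.

```python
from itertools import combinations, product

# Propositional variables: 1 = A = Boss(alice, alice),
#                          2 = B = CEO(alice),
#                          3 = C = Manager(alice).
# A clause is a tuple of literals; a negative int is a negated variable.
CLAUSES = [
    (1,),      # A        : the fact Boss(alice, alice)
    (-1, 3),   # A -> C   : grounding of r1
    (-1, 2),   # A -> B   : grounding of r2
    (-2, 3),   # B -> C   : grounding of r3
    (-3,),     # not C    : negation of the fact to explain
]

def satisfiable(clauses):
    """Check satisfiability by trying every truth assignment."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def muses(clauses):
    """Enumerate all minimal unsatisfiable subsets (MUSes) exhaustively."""
    found = []
    for size in range(1, len(clauses) + 1):
        for subset in combinations(clauses, size):
            if any(set(m) <= set(subset) for m in found):
                continue  # some already-found MUS is contained in it
            if not satisfiable(subset):
                found.append(subset)
    return found
```

Running `muses(CLAUSES)` returns exactly the two MUSes of Figure 1, one per explanation.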
Computing All Explanations Computing explanations via MUSes for Datalog can lead to potential
pitfalls. The main issue is that two distinct rule applications can "conflict" by resulting in the same
SAT encoding. This case is illustrated in Figure 2. Here, we can see that the applications of r1 and r2 are
both encoded as A → C. As a result, the MUS enumeration no longer yields a (minimal) explanation.
The situation becomes even more complex when many recursive rules share the same groundings. To
address these issues, we present an encoding into group-SAT formulas [11] which allows us to establish
a precise correspondence between explanations and group-MUSes.
Figure 2: SAT vs Group-SAT Translation Example. The rules are r1: ∀x,y. Boss(x, y) → Manager(x) and r2: ∀x. Boss(x, x) → Manager(x). The plain SAT encoding A ∧ (A → C) ∧ ¬C has a single MUS, A ∧ (A → C) ∧ ¬C, and hence yields a single explanation (facts: Boss(alice, alice); rules: r1, r2). The group-SAT encoding A ∧ {A → C}r1 ∧ {A → C}r2 ∧ ¬C has two group-MUSes, A ∧ {A → C}r1 ∧ ¬C and A ∧ {A → C}r2 ∧ ¬C, corresponding to explanation 1 (facts: Boss(alice, alice); rules: r1) and explanation 2 (facts: Boss(alice, alice); rules: r2).
Computing All Explanations Efficiently While solvers are efficient, MUS computation remains
hard. Therefore, significant challenges may arise in terms of scalability as the size and complexity of
the knowledge base increase. The main reason for this is that in more complex knowledge bases there
may be a larger number of atoms and rules which are not relevant for the entailment to explain. This
is illustrated in Figure 3: atoms s(c) and s(d), as well as rule r4, are irrelevant for explaining goal(a).
Considering irrelevant atoms will slow down both the encoding computation and the MUS enumeration
task. Hence, we introduce a filtering technique that reduces the size of the encoded formula while
preserving its soundness and completeness.
Contribution The contributions of this paper are the following:
1. We introduce the first comprehensive approach for computing kb-support explanations for fact
entailment (a.k.a. justifications with ABoxes) over Datalog knowledge bases.
2. We present a reduction from Datalog rules and facts to group-SAT formulas, and establish an
exact correspondence between kb-support explanations and group-MUSes (Section 2).
3. We study the filtering of facts that are irrelevant for the explanation task and show that it is
computationally hard. In light of this, we present a two-step rule-based approach for approximating
relevance: a static step that preprocesses the input KB, followed by a dynamic step that traces a
single entailment (Section 3).
4. We present an experimental evaluation showing the effectiveness of our approach. In particular,
we show that the two-step approach achieves good performance when dynamically explaining
arbitrary entailment queries (Section 4).
2. From Datalog Explanations to Group-MUS
The Logical Setting We assume a first-order signature with functions. We consider mutually disjoint
sets of variables, constants, and functions. We call term any variable, constant, or functional term of
the form f(t1, ..., tn) where f is a function symbol and every ti is a term. We write lists t1, ..., tn
of terms as t̄ and often treat these as sets. An atom is a first-order formula of the form P(t̄) where
P is a relational predicate and t̄ a sequence of terms. An atom is grounded if it does not contain any
variable. For a first-order formula Φ and a list x̄ of variables, we write Φ[x̄] to indicate that x̄ is the
set of all free variables that occur in Φ. A fact is a grounded atom without function symbols. A rule r
is a first-order formula of the form ∀x̄. B[x̄] → H[z̄] where B[x̄] and H[z̄] are conjunctions of atoms
and z̄ ⊆ x̄. Such a rule is called Datalog if it is function-free and H is a single atom. We will often
omit universal quantifiers when writing rules. Moreover, we identify conjunctions of atoms such as B
and H above with the corresponding sets, and define body(r) = B and head(r) = H. A knowledge
base (KB) K is a tuple ⟨R, F⟩ where R and F are finite sets of rules and facts, respectively. A Datalog
knowledge base is such that F only contains grounded atoms without function symbols and R only
contains Datalog rules. For K1 = ⟨R1, F1⟩ and K2 = ⟨R2, F2⟩, we write K1 ⊆ K2 when R1 ⊆ R2 and
F1 ⊆ F2.
Fact Entailment The chase is a standard method for computing universal models of knowledge
bases, which can in turn be used for tasks like fact entailment [12]. A substitution is a partial function
mapping variables to ground terms. A trigger for a fact set F and a rule set R is a tuple t = ⟨r, σ⟩ where
r ∈ R and σ is a substitution such that σ(body(r)) ⊆ F. The trigger outputs out(t) = σ(head(r)).
For a KB K = ⟨R, F⟩, let Chase0(K) = F; and, for every i ≥ 1, let Chasei(K) be the minimal fact
set that includes Chasei−1(K) and out(t) for every trigger t of Chasei−1(K) and R. Moreover, let
Chase(K) = ⋃i≥0 Chasei(K). A KB K entails a fact α, denoted K ⊨ α, if and only if α ∈ Chase(K).
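As an illustration, the chase just defined can be implemented naively in a few lines for Datalog. This is a didactic sketch (breadth-first, no indexing), not the reasoner used in the paper; the tuple encoding of atoms and of Figure 1's KB is our own.

```python
def chase(facts, rules):
    """Naive breadth-first chase: saturate the fact set under all rules.
    An atom is (predicate, args); strings starting with '?' are variables.
    A rule is (body_atoms, head_atom)."""
    derived = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for subst in matches(body, derived):
                atom = substitute(head, subst)
                if atom not in derived:
                    new.add(atom)
        if not new:          # fixpoint reached: Chase_i = Chase_{i-1}
            return derived
        derived |= new

def matches(body, facts, subst=None):
    """Yield every substitution mapping all body atoms into `facts`."""
    subst = subst or {}
    if not body:
        yield subst
        return
    for fact in facts:
        s = unify(body[0], fact, dict(subst))
        if s is not None:
            yield from matches(body[1:], facts, s)

def unify(atom, fact, subst):
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    for a, t in zip(args, fargs):
        if a.startswith('?'):
            if subst.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return subst

def substitute(atom, subst):
    pred, args = atom
    return (pred, tuple(subst.get(a, a) for a in args))

# The KB of Figure 1: Boss(alice, alice) plus rules r1, r2, r3.
F = {("Boss", ("alice", "alice"))}
R = [([("Boss", ("?x", "?y"))], ("Manager", ("?x",))),   # r1
     ([("Boss", ("?x", "?x"))], ("CEO", ("?x",))),       # r2
     ([("CEO", ("?x",))], ("Manager", ("?x",)))]         # r3
```

On this KB, `chase(F, R)` saturates to {Boss(alice, alice), CEO(alice), Manager(alice)}.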
From now on, every knowledge base considered in this section is Datalog. Knowledge bases with
functions will be employed in Section 3. At this point, we can formally define the notion of explanation
for Datalog knowledge bases.
Definition 1 (Explanation). For a Datalog knowledge base K = ⟨R, F⟩ and a fact α, a kb-support
explanation of K ⊨ α is a KB K′ ⊆ K such that K′ ⊨ α but K″ ⊭ α for every K″ ⊊ K′. We denote by
Expl(K, α) the set of all explanations of K ⊨ α.
From Explanations to Group-MUS We now show how to reduce the explanation problem to
group-MUS (GMUS) enumeration. Group-SAT (GSAT) formulas are natural extensions of SAT formulas
where constraints are modeled as sets, or groups [13]. The underlying idea is that all clauses in a group
must hold together. Below, we assume the standard notions of literal (a positive or negative propositional
variable) and clause (a disjunction of literals). A group G is a set of clauses. To keep the formalisation
concise, we slightly enrich the standard definition and also assume that every group G has a unique
identifier i, denoted by Gi. Otherwise said, we consider two groups with the same clauses but with
different identifiers to be different. A GSAT formula F is a set of groups F = {G1, ..., Gn}. A GSAT
formula is satisfiable if the conjunction of all the clauses in the union of its groups is satisfiable. A
GMUS of F is a minimal set of the groups of F that is unsatisfiable. We write GMUS(F) for the set of all
GMUSes of F. Figure 2 illustrates the GSAT formula stemming from the encoding of the input facts and
rules. In the formula, we have two groups which contain the same (set of) clauses; these are denoted
{A → C}r1 and {A → C}r2, where the identifier of each group corresponds to the rule that generates
it. Also note that, with this encoding, every GMUS in Figure 2 corresponds to one explanation.
To reduce the computation of the explanations for a fact α over a KB K to group-MUS enumeration,
our goal is to produce a GSAT formula which encodes the derivations enabled by the KB as well as
the negation of the fact to explain. For K = ⟨R, F⟩, we define GSAT(K ∧ ¬α) as the minimal set
containing:
• the group Gr with G = GroundingK(r), for every r ∈ R;
• the group Gβ with G = {β}, for every β ∈ F;
• the group Gα with G = {¬α}.
Above, GroundingK(r) is the set containing the grounded rule σ(body(r)) → σ(head(r)) for every
trigger t = ⟨r, σ⟩ over Chase(K). Given a set of groups F, we denote by KBs(F) the knowledge base
built by using all facts and rules that are identifiers of the groups in F. Proposition 1 establishes a
precise correspondence between the explanations for K ⊨ α and the group-MUSes of GSAT(K ∧ ¬α).

Proposition 1. Expl(K, α) = ⋃ { KBs(F) | F ∈ GMUS(GSAT(K ∧ ¬α)) }.
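A minimal brute-force sketch of group-MUS enumeration, under our own toy encoding of Figure 2 (variable 1 for A, 3 for C; group identifiers such as `"neg-goal"` are plain strings we introduce for illustration). A real enumerator such as MARCO is far more sophisticated; this only works at toy scale.

```python
from itertools import combinations, product

def satisfiable(clauses):
    """Check satisfiability by trying every truth assignment."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def gmuses(groups):
    """All minimal unsatisfiable sets of groups, by exhaustive search.
    `groups` maps a group identifier (a fact or a rule) to its clauses."""
    ids = list(groups)
    found = []
    for size in range(1, len(ids) + 1):
        for subset in combinations(ids, size):
            if any(set(m) <= set(subset) for m in found):
                continue  # a smaller GMUS is already contained in it
            clauses = [c for g in subset for c in groups[g]]
            if not satisfiable(clauses):
                found.append(subset)
    return found

# The encoding of Figure 2: r1 and r2 both ground to A -> C (clause (-1, 3)),
# but remain distinguishable through their group identifiers.
GROUPS = {
    "Boss(alice,alice)": [(1,)],
    "r1": [(-1, 3)],
    "r2": [(-1, 3)],
    "neg-goal": [(-3,)],
}
```

Here `gmuses(GROUPS)` yields two group-MUSes, one per explanation, whereas plain MUS enumeration over the merged clause set would yield only one.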
Figure 3: Relevance: Exact vs Approximate. Rules: r1: ∀x. p(x) → t(x, x); r2: ∀x,y. t(x, y) ∧ q(y) → t(y, x); r3: ∀x,y. t(x, x) ∧ t(x, y) → goal(x); r4: ∀x. s(x) → v(x, x). Facts: p(a), q(a), t(b, a), s(c), s(d). The entailment graph links p(a) to t(a, a) via r1, t(b, a) and q(a) to t(a, b) via r2, t(a, a) and t(a, b) to goal(a) via r3, and s(c), s(d) to v(c, c), v(d, d) via r4. The relevance approximation covers p(a), q(a), t(b, a) and rules r1, r2, r3.
3. Knowledge-Base Filtering via Relevance
To explain an entailment of a knowledge base, it is often the case that only a portion of the knowledge
base's facts and rules are necessary or relevant to that entailment. In this section, we present a formal
definition of relevance and introduce a rule-based technique for filtering out facts that are irrelevant for
explaining a given fact. This technique can drastically reduce the size of the encoded formula to be
solved.
Relevance We say that a fact β is relevant for the entailment of a fact α in a knowledge base K
if there exists an explanation K′ ∈ Expl(K, α) that contains β. This definition naturally extends to
rules. Consider the knowledge base in Figure 3 with facts F = {p(a), q(a), t(b, a), s(c), s(d)} and
rules R = {r1, r2, r3, r4}. The entailment α = goal(a) can be derived in two ways. The first is by
applying r1, which yields t(a, a); this atom allows for the application of r3 to derive α. The second
is by applying r3 on the results of r1 and r2. The fact α has, however, only one explanation, namely
K′ = ⟨{r1, r3}, {p(a)}⟩. Indeed, K″ = ⟨{r1, r2, r3}, {p(a), q(a), t(b, a)}⟩ is not minimal, as K′ ⊊ K″
(see Definition 1). As a result, with Expl(K, α) = {K′}, the only elements relevant to the entailment of
α in K are p(a) and r1, r3. Despite its apparent simplicity, it turns out that deciding relevance is hard.
This holds true even if one focuses only on deciding the relevance of facts, for a fixed rule set. By a
reduction from SAT, we obtain the following proposition. Proofs can be found in Appendix A.

Proposition 2. For a fixed rule set R, deciding if a fact β ∈ F is relevant for α on ⟨R, F⟩ is NP-complete.
Approximating Relevance Considering this hardness result, we turn our attention to the task
of approximating relevant facts and rules. This is achieved using a two-step rule-based approach:
a static step which preprocesses the input knowledge base, followed by a dynamic step for tracing
each individual entailment. The static step builds an entailment graph which tracks all relationships
between the entailed atoms and the rules. This entailment graph allows us to compute an approximation
of the set of relevant facts. For instance, as illustrated in Figure 3, the relevance approximation includes atoms
p(a), q(a), t(b, a) and rules r1, r2, r3.
We now present our approach and give a construction for the entailment graph based on rules
with function symbols, which can be deployed on any reasoner supporting this formalism. Before detailing
each step, we introduce the following notation. Given a predicate P, we denote by P+ and fP a fresh
predicate and a function symbol unique to P, respectively; below, the same will hold for predicates B and H.
Moreover, given a rule r, we respectively denote by Er and fr a fresh predicate and a function symbol unique
to r.
Static step (entailment graph building) Given a knowledge base K = ⟨R, F⟩, we compute
G = Chase(⟨EntGraph(R), F⟩), where EntGraph(R) is the minimal set containing the following
rules:

1. P(x̄) → P+(x̄, fP(x̄)), for every predicate P occurring in F;
2. B1+(x̄1, y1) ∧ ... ∧ Bn+(x̄n, yn) → H+(x̄, fH(x̄)) ∧ Er(y1, ..., yn, fH(x̄)),
for every rule r = B1(x̄1) ∧ ... ∧ Bn(x̄n) → H(x̄) in R.

Figure 4: Static Step Rules and Inferences. For the rule base of Figure 3, EntGraph(R) contains the type-(1) rules
rp: p(x) → p+(x, fp(x)); rt: t(x, y) → t+(x, y, ft(x, y)); rq: q(x) → q+(x, fq(x)); rgoal: goal(x) → goal+(x, fgoal(x)); rs: s(x) → s+(x, fs(x)),
together with the type-(2) rules
r1′: p+(x, fp(x)) → t+(x, x, ft(x, x)) ∧ Er1(fp(x), ft(x, x));
r2′: t+(x, y, ft(x, y)) ∧ q+(y, fq(y)) → t+(y, x, ft(y, x)) ∧ Er2(ft(x, y), fq(y), ft(y, x));
r3′: t+(x, x, ft(x, x)) ∧ t+(x, y, ft(x, y)) → goal+(x, fgoal(x)) ∧ Er3(ft(x, x), ft(x, y), fgoal(x));
r4′: s+(x, fs(x)) → v+(x, x, fv(x, x)) ∧ Er4(fs(x), fv(x, x)).
Starting from p(a), rule rp derives p+(a, fp(a)); then r1′ derives t+(a, a, ft(a, a)) and Er1(fp(a), ft(a, a)); then r3′ derives goal+(a, fgoal(a)) and Er3(ft(a, a), ft(a, a), fgoal(a)).
The aim of the static step is to produce the entailment graph, i.e., a hypergraph representing the links
(in terms of entailment) between the atoms. In order to do that, we first need to be able to reference
the atoms in Chase(K). This is done by rules of type (1), which infer, for any atom P(t̄) of arity n, an
atom P+(t̄, fP(t̄)) of arity n + 1, where fP(t̄) is a functional term representing P(t̄) itself. Then, every
rule r is transformed into a rule of type (2) which, once applied, uses the functional terms to link all
images of the body atoms to the image of the head atom via a hyperedge relation Er proper to r. Let
us emphasise that this step is done only once for every input K.
Figure 4 illustrates the static-step rules EntGraph(R) for the rule base of Figure 3, as well
as their applications starting from p(a). First, the rule rp produces the atom p+(a, fp(a)), where fp(a)
represents p(a). Consider then rule r1. It is transformed in the following way: the body p(x) and the
head t(x, x) are respectively replaced by p+(x, fp(x)) and t+(x, x, ft(x, x)), and a hyperedge atom
Er1(fp(x), ft(x, x)) is added to the head to represent the link between the two atoms in the entailment
graph. Note that this edge atom uses the functional terms corresponding to the atoms appearing in
the rule. Since we have p+(a, fp(a)), the rule r1′ produces both t+(a, a, ft(a, a)) and the edge atom
Er1(fp(a), ft(a, a)), representing the fact that p+(a, fp(a)) is used to derive t+(a, a, ft(a, a)). Finally, r3′
is obtained from r3 in a similar way, and it is used to derive goal+(a, fgoal(a)).
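The effect of the static step can be mimicked directly in code: saturate the KB while recording one hyperedge per trigger, which plays the role of the Er atoms. The sketch below is an illustrative alternative to materialising the graph via rules with function symbols, with our own tuple encoding of Figure 3's rules; the matching helpers are included so the block is self-contained.

```python
def entailment_graph(facts, rules):
    """Saturate the KB and record one hyperedge per trigger:
    (rule_name, frozenset_of_ground_body_atoms, head_atom)."""
    derived, edges = set(facts), set()
    while True:
        new = set()
        for name, body, head in rules:
            for subst in matches(body, derived):
                ground = frozenset(substitute(a, subst) for a in body)
                out = substitute(head, subst)
                edges.add((name, ground, out))
                if out not in derived:
                    new.add(out)
        if not new:
            return derived, edges
        derived |= new

def matches(body, facts, subst=None):
    """Yield every substitution mapping all body atoms into `facts`."""
    subst = subst or {}
    if not body:
        yield subst
        return
    for fact in facts:
        s = unify(body[0], fact, dict(subst))
        if s is not None:
            yield from matches(body[1:], facts, s)

def unify(atom, fact, subst):
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    for a, t in zip(args, fargs):
        if a.startswith('?'):
            if subst.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return subst

def substitute(atom, subst):
    pred, args = atom
    return (pred, tuple(subst.get(a, a) for a in args))

# The KB of Figure 3.
F3 = {("p", ("a",)), ("q", ("a",)), ("t", ("b", "a")),
      ("s", ("c",)), ("s", ("d",))}
R3 = [("r1", [("p", ("?x",))], ("t", ("?x", "?x"))),
      ("r2", [("t", ("?x", "?y")), ("q", ("?y",))], ("t", ("?y", "?x"))),
      ("r3", [("t", ("?x", "?x")), ("t", ("?x", "?y"))], ("goal", ("?x",))),
      ("r4", [("s", ("?x",))], ("v", ("?x", "?x")))]
```

On this KB the recorded edges include, for instance, the r1 hyperedge from p(a) to t(a, a) and the r3 hyperedge from {t(a, a), t(a, b)} to goal(a).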
Dynamic Step (tracing) For any fact α to explain, we define D = Chase(⟨relPropag(R, α), G⟩),
where G is the result of the static step and relPropag(R, α) is the minimal set containing the following
rules:

1. → Rel(fP(x̄)) if α = P(x̄);
2. Er(y1, ..., yn, z) ∧ Rel(z) → Rel(y1) ∧ ... ∧ Rel(yn) ∧ RelEdge(fr(y1, ..., yn, z)),
for every rule r in R.
Note first that rule (1) has an empty body; this is used to bootstrap the tracing for α. Furthermore, for
rules of type (2), we consider that Er and fr are the fresh predicate and function unique to r introduced
in the static step, and that Rel and RelEdge are reserved predicates. Intuitively, these rules recursively trace
a fact back to the ancestor atoms which belong to the input fact base.

Figure 5: Dynamic Step Rules and Inferences. For the rule base of Figure 3 and α = goal(a), relPropag(R, α) contains
rgoal(a): → Rel(fgoal(a));
rrel1: Er1(y, x) ∧ Rel(x) → Rel(y) ∧ RelEdge(fr1(y, x));
rrel2: Er2(y, z, x) ∧ Rel(x) → Rel(y) ∧ Rel(z) ∧ RelEdge(fr2(y, z, x));
rrel3: Er3(y, z, x) ∧ Rel(x) → Rel(y) ∧ Rel(z) ∧ RelEdge(fr3(y, z, x));
rrel4: Er4(y, x) ∧ Rel(x) → Rel(y) ∧ RelEdge(fr4(y, x)).
Starting from Rel(fgoal(a)) and the edge Er3(ft(a, a), ft(a, a), fgoal(a)), rule rrel3 derives Rel(ft(a, a)) and RelEdge(fr3(ft(a, a), ft(a, a), fgoal(a))); then, from Er1(fp(a), ft(a, a)), rule rrel1 derives Rel(fp(a)) and RelEdge(fr1(fp(a), ft(a, a))).
Figure 5 illustrates the rules of the dynamic step relPropag(R, α) for the rule base of Figure 3 and
the entailment α = goal(a), as well as their applications. In a nutshell, the atom to be explained is
marked as relevant thanks to the (empty-body) rule → Rel(fgoal(a)). Then the relevance relation is
propagated back using the edge atoms produced during the static step. In order to do this, we use the
"propagation rules" rrel1 to rrel4. For instance, since we have Er3(ft(a, a), ft(a, a), fgoal(a)) from the static
step, and goal(a) is relevant (i.e., Rel(fgoal(a))), then by rule rrel3 both t(a, a) and r3 are deemed
relevant (i.e., Rel(ft(a, a)) and RelEdge(fr3(ft(a, a), ft(a, a), fgoal(a))), respectively). The same process
applies for deeming p(a) and rule r1 relevant by the application of rrel1.
Finally, for K = ⟨R, F⟩, we define the approximation of the relevant KB as SupRelα(K) = ⟨R′, F′⟩,
where F′ = {P(t̄) ∈ F | D ⊨ Rel(fP(t̄))} and R′ = {r ∈ R | D ⊨ RelEdge(fr(t1, ..., tn, t′)) for some terms t1, ..., tn, t′}.
Note that both the G and D knowledge bases always admit a finite (terminating) chase. The soundness and
completeness result below states that this approximation yields the same explanations as exact relevance.

Proposition 3. Expl(K, α) = Expl(SupRelα(K), α).
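The dynamic step amounts to a backward reachability computation over the hyperedges recorded by the static step. The sketch below hand-writes the hyperedges of Figure 3's entailment graph (our own tuple encoding) and propagates relevance from goal(a); as in the text, the result is an over-approximation: q(a), t(b, a) and r2 are kept although the only exact explanation uses p(a), r1, r3.

```python
def relevant(edges, goal):
    """Back-propagate relevance from `goal` over entailment-graph hyperedges.
    Each edge is (rule_name, frozenset_of_body_atoms, head_atom)."""
    rel_atoms, rel_edges = {goal}, set()
    while True:
        # Fire every edge whose head is already relevant.
        fired = {e for e in edges if e[2] in rel_atoms} - rel_edges
        if not fired:
            break
        rel_edges |= fired
        for _, body, _ in fired:
            rel_atoms |= body   # all body atoms become relevant
    return rel_atoms, {name for name, _, _ in rel_edges}

# Hyperedges of the entailment graph of Figure 3, written out by hand.
EDGES = {
    ("r1", frozenset({("p", ("a",))}), ("t", ("a", "a"))),
    ("r2", frozenset({("t", ("b", "a")), ("q", ("a",))}), ("t", ("a", "b"))),
    ("r3", frozenset({("t", ("a", "a"))}), ("goal", ("a",))),
    ("r3", frozenset({("t", ("a", "a")), ("t", ("a", "b"))}), ("goal", ("a",))),
    ("r4", frozenset({("s", ("c",))}), ("v", ("c", "c"))),
    ("r4", frozenset({("s", ("d",))}), ("v", ("d", "d"))),
}
```

`relevant(EDGES, ("goal", ("a",)))` keeps p(a), q(a), t(b, a) and rules r1, r2, r3, while s(c), s(d), the v atoms, and r4 are filtered out, matching the approximation shown in Figure 3.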
4. Experimental Analysis
We have implemented our approach by using the MARCO tool for group-MUS enumeration [11] (Section
2) and InteGraal [14] for approximating relevance (Section 3).
Benchmarks We selected 24 ontologies from the MOWL corpus [15]. These ontologies belong to
the EL profile and were translated into Datalog programs using the Java OWL API [16]
and the translation presented in [17]. We made sure that each ontology has at least 5 extensional facts
and produces at least 5 intensional facts. For each ontology, we picked 5 entailment queries to explain
among the atoms with the greatest reasoning depth, that is, atoms derived in the last (breadth-first)
step of the saturation of the ontology; intuitively, these are the most challenging to explain.
Protocol For each ontology, we compute the entailment graph using InteGraal [14] and the rules for
the static step presented in Section 3; recall that this step is done only once per ontology. Subsequently,
for each entailment query, we again use InteGraal [14] and the rules for the dynamic step presented in
Section 3 to compute a knowledge base that consists only of the part of the input ontology relevant to
that particular entailment query. This relevant knowledge base is then translated into a GSAT formula
as described in Section 2 and fed to the MARCO tool [11]. The tool returns all MUSes which, when
translated back into Datalog knowledge base statements, constitute the explanations. We compared against
the OWL API [16], which is the only tool we are aware of that computes kb-support explanations (i.e.,
ontology axioms corresponding to facts and rules). We implemented our benchmarking protocols using
the B-Runner library [18] and made them available online [19].
Figure 6: Query Explanation Performances
Setup All our experiments are run on a server with AMD Ryzen 9 3900XT 12-Core Processor @4.7GHz,
128GB of RAM @2.4GHz, and 2TB of SSD NVME.
Analysis Figure 6 reports the time taken by the static step, which corresponds to the entailment
graph construction, vs the dynamic step which is further divided into filtering (via InteGraal) and MUS
enumeration (via MARCO). For the dynamic step, we report the average time for the 5 queries we
considered. Times are in milliseconds.
As expected, we can observe that the time taken by the static step (though it remains within tenths of
a second) is significantly longer than that of the dynamic step. Interestingly, we found that the time required for
constructing the entailment graph depends on the number of possible rule groundings as well as on the
depth (in a breadth-first sense) of the chase. This cost, being fixed, is however amortised when multiple
explanation queries are submitted.
Concerning the dynamic step, we can observe that it is much faster than the static step, in
the order of tens of milliseconds. Again, we found that a crucial aspect here is the depth of the derived
atom: the deeper an atom is in the entailment graph, the more relevance has to be propagated. Our
results also indicate that the group-MUS enumerator is very fast (in the order of milliseconds), which
further validates this approach for computing explanations. Note also that the generality of our method
means that it can be implemented with any reasoner supporting the rules described in Section 3 and
any group-MUS enumerator.
We conclude with a comparison with the explanation facility provided by the OWL API. We can see
that our approach is generally more competitive if we consider only the dynamic step, and, depending
on the case, can even be more competitive for one-off explanations that include both the static and
dynamic steps as a single operation.
5. Related Work & Conclusion
We have introduced the first comprehensive approach for computing kb-support explanations for fact
entailments over Datalog knowledge bases. Our method leverages group-MUS enumeration for
computing explanations, as well as a rule-based approach for filtering out atoms that are irrelevant to the
task. Our approach has been implemented and experimentally evaluated.
Our work considers explanations in the form of kb-support (a.k.a. justifications with ABoxes), which,
to the best of our knowledge, has not been investigated so far for Datalog [9, 8, 10]. Indeed, existing
methods for Datalog focus on the notion of why-provenance which corresponds to that of fact-support.
The recent work of [8] studies the use of SAT solvers for computing why-provenance for Datalog based
on so-called non-ambiguous proof trees. Rule-based approaches for on-demand computation of Datalog
provenance have been studied in [10]. The recent work of [5] considers provenance computation for
the EL Description Logic [20, 21]. In the context of answer set programming (ASP), the work of [21]
considers a notion of graph-explanation; this can be seen as an extension of fact-support including all
inferred facts (but no rules). While these approaches are close to ours in spirit, the technical development
remains different. Notably, considering kb-supports allows us to obtain more explanations than why-
provenance, and thus a better understanding of the entailed facts. On the other hand, this also required
us to establish a new correspondence between group-MUSes and kb-support explanations.
In the context of Description Logics, reasoning explanations have long been studied. Justifications,
also called minimal axiom sets, and their computation are referred to as axiom pinpointing. The
approaches closest to ours are the following. [1] and [2] consider the problem of computing axiom
explanations for EL TBoxes (hence, without data). [1] studies the complexity of computing relevant
axioms, while [2] presents a reduction to GMUS enumeration. However, in contrast to our work, none
of these approaches consider data (ABoxes). The problem of filtering the knowledge base to a set of
relevant facts has been considered in [8, 10, 3, 4]. All of these approaches employ a notion akin to that
of a dependency graph. However, our work differs in that it considers a different notion of explanation.
Moreover, our work is the only one that studies the complexity of deciding the relevance of atoms
(NP-complete).
Through this work we demonstrated the practical applicability of group-SAT solvers for knowledge
bases, paving the way for more efficient and explainable AI systems. Future work includes the extension
of our setting to richer rule languages, including suitable forms of functions, negation, and disjunction.
Acknowledgments
This work was supported by the Inria-DFKI bilateral project R4Agri and by the ANR project CQFD
(ANR-18-CE23-0003).
References
[1] F. Baader, R. Penaloza, B. Suntisrivaraporn, Pinpointing in the description logic, in: Annual
Conference on Artificial Intelligence, Springer, 2007, pp. 52โ67.
[2] N. Manthey, R. Peรฑaloza, S. Rudolph, Satpin: Axiom pinpointing for lightweight description logics
through incremental sat, KI-Kรผnstliche Intelligenz 34 (2020) 389โ394.
[3] Z. Zhou, G. Qi, B. Suntisrivaraporn, A new method of finding all justifications in owl 2 el, in: 2013
IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent
Technologies (IAT), volume 1, IEEE, 2013, pp. 213โ220.
[4] Z. Zhou, G. Qi, A dependency-graph based approach for finding justification in owl 2 el, Intelligent
Data Analysis 22 (2018) 1315โ1335.
[5] S. Borgwardt, S. Breuer, A. Kovtunova, Computing abox justifications for query answers via
datalog rewriting., in: Description Logics, 2023.
[6] S. Abiteboul, R. Hull, V. Vianu, Foundations of databases, volume 8, Addison-Wesley Reading,
1995.
[7] G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, M. Zakharyaschev, Ontology-
based data access: A survey, in: J. Lang (Ed.), Proceedings of the Twenty-Seventh International
Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, 2018,
pp. 5511โ5519. URL: https://doi.org/10.24963/ijcai.2018/777. doi:10.24963/IJCAI.2018/777.
[8] M. Calautti, E. Livshits, A. Pieris, M. Schneider, Computing the why-provenance for datalog queries
via SAT solvers, in: M. J. Wooldridge, J. G. Dy, S. Natarajan (Eds.), Thirty-Eighth AAAI Conference
on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of
Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial
Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, AAAI Press, 2024, pp. 10459–10466.
URL: https://doi.org/10.1609/aaai.v38i9.28914. doi:10.1609/AAAI.V38I9.28914.
[9] C. Bourgaux, P. Bourhis, L. Peterfreund, M. Thomazo, Revisiting semiring provenance for datalog,
in: G. Kern-Isberner, G. Lakemeyer, T. Meyer (Eds.), Proceedings of the 19th International Confer-
ence on Principles of Knowledge Representation and Reasoning, KR 2022, Haifa, Israel, July 31 -
August 5, 2022, 2022. URL: https://proceedings.kr.org/2022/10/.
[10] A. Elhalawati, M. Krötzsch, S. Mennicke, An existential rule framework for computing why-
provenance on-demand for datalog, in: G. Governatori, A. Turhan (Eds.), Rules and Reasoning
- 6th International Joint Conference on Rules and Reasoning, RuleML+RR 2022, Berlin, Ger-
many, September 26-28, 2022, Proceedings, volume 13752 of Lecture Notes in Computer Science,
Springer, 2022, pp. 146–163. URL: https://doi.org/10.1007/978-3-031-21541-4_10.
doi:10.1007/978-3-031-21541-4_10.
[11] M. H. Liffiton, A. Previti, A. Malik, J. Marques-Silva, Fast, flexible MUS enumeration, Con-
straints An Int. J. 21 (2016) 223–250. URL: https://doi.org/10.1007/s10601-015-9183-0.
doi:10.1007/S10601-015-9183-0.
[12] C. Beeri, M. Y. Vardi, A proof procedure for data dependencies, J. ACM 31 (1984) 718–741. URL:
https://doi.org/10.1145/1634.1636. doi:10.1145/1634.1636.
[13] M. H. Liffiton, K. A. Sakallah, Algorithms for computing minimal unsatisfiable subsets of
constraints, J. Autom. Reason. 40 (2008) 1–33. URL: https://doi.org/10.1007/s10817-007-9084-z.
doi:10.1007/S10817-007-9084-Z.
[14] J.-F. Baget, P. Bisquert, M. Leclère, M.-L. Mugnier, G. Pérution-Kihli, F. Tornil, F. Ulliana, InteGraal:
a Tool for Data-Integration and Reasoning on Heterogeneous and Federated Sources, in: BDA 2023
- 39e Conférence sur la Gestion de Données - Principes, Technologies et Applications, Montpellier,
France, 2023. URL: https://hal-lirmm.ccsd.cnrs.fr/lirmm-04304601.
[15] N. Matentzoglu, B. Parsia, The Manchester OWL Corpus (MOWLCorp), original serialisation, 2014.
URL: https://doi.org/10.5281/zenodo.10851. doi:10.5281/zenodo.10851.
[16] M. Horridge, S. Bechhofer, The OWL API: a Java API for OWL ontologies, Semantic Web 2 (2011)
11–21. URL: https://doi.org/10.3233/SW-2011-0025. doi:10.3233/SW-2011-0025.
[17] D. Carral, J. Zalewski, P. Hitzler, An efficient algorithm for reasoning over OWL EL ontologies with
nominal schemas, J. Log. Comput. 33 (2023) 136–162. URL: https://doi.org/10.1093/logcom/exac032.
doi:10.1093/LOGCOM/EXAC032.
[18] F. Ulliana, P. Bisquert, A. Charoensit, R. Colin, F. Tornil, Q. Yeche, Collaborative benchmarking with
B-Runner, in: Rules and Reasoning - 8th International Joint Conference on Rules and Reasoning,
RuleML+RR 2024, Bucharest, Romania, September 16-18, 2024, Proceedings, Lecture Notes in
Computer Science, 2024.
[19] Artifacts for Rule-aware Datalog Fact Explanation Using Group-SAT Solver, https://gitlab.inria.fr/
boreal-artifacts/ruleml2024, 2024.
[20] T. Eiter, T. Geibinger, Explaining answer-set programs with abstract constraint atoms, in: IJCAI,
2023, pp. 3193–3202.
[21] M. Alviano, L. L. Trieu, T. C. Son, M. Balduccini, Explanations for answer set programming, arXiv
preprint arXiv:2308.15879 (2023).
A. Proofs for Section 3

Fact Relevance(ℛ)
Instance: A fact set ℱ, a fact α, and a fact f ∈ ℱ.
Question: Is f relevant for the entailment of α with respect to ⟨ℛ, ℱ⟩?
Lemma 1. Fact Relevance(ℛ) is in NP.

Proof. We consider an oracle that, for a fixed ℛ, decides in polynomial time Fact Entailment(ℛ),
that is, whether ⟨ℛ, ℱ⟩ ⊨ α for a given fact set ℱ and fact α. We then consider a non-deterministic
Turing machine (TM) that accepts on input ⟨ℱ, α, f⟩ if and only if f is relevant for α with ⟨ℛ, ℱ⟩,
and that proceeds as follows.

1. Non-deterministically guess sets ℱ′ ⊆ ℱ and ℛ′ ⊆ ℛ such that f ∈ ℱ′.
2. Check with the oracle for Fact Entailment(ℛ) whether ⟨ℛ′, ℱ′⟩ ⊨ α. If it does not, reject.
3. Check the minimality of ⟨ℛ′, ℱ′⟩: if there exists ⟨ℛ″, ℱ″⟩ ⊆ ⟨ℛ′, ℱ′⟩ with |ℱ″| = |ℱ′| − 1 or
|ℛ″| = |ℛ′| − 1 such that ⟨ℛ″, ℱ″⟩ ⊨ α (checked with the oracle), then reject; otherwise accept.

(Soundness) Given a fact set ℱ, a fact α, and some fact f ∈ ℱ, we show that if the TM accepts
on input ⟨ℱ, α, f⟩, then f is relevant for α with ⟨ℛ, ℱ⟩. By hypothesis the TM accepts, so there is a
pair ⟨ℛ′, ℱ′⟩ ⊆ ⟨ℛ, ℱ⟩ such that f ∈ ℱ′, ⟨ℛ′, ℱ′⟩ ⊨ α, and ⟨ℛ″, ℱ″⟩ ⊭ α for every ℱ″ ⊂ ℱ′ and
ℛ″ ⊂ ℛ′. Hence, by the definitions of explanations and relevance, ⟨ℛ′, ℱ′⟩ is an explanation for α
containing f, and thus f is relevant for α.

(Completeness) Given a fact set ℱ, a fact α, and some fact f ∈ ℱ, we show that if f is relevant for α,
then the TM accepts on input ⟨ℱ, α, f⟩. By hypothesis f is relevant for α, so there is an explanation ℰ =
⟨ℛ′, ℱ′⟩ for α containing f. Hence, f ∈ ℱ′ and ⟨ℛ′, ℱ′⟩ ⊨ α. At step 1 the TM guesses the explanation
ℰ = ⟨ℛ′, ℱ′⟩, and by construction of ℰ the oracle returns "yes" at step 2. Again by hypothesis, ⟨ℛ′, ℱ′⟩
is minimal, so the oracle answers "no" for every ⟨ℛ″, ℱ″⟩ ⊂ ⟨ℛ′, ℱ′⟩ of size |⟨ℛ′, ℱ′⟩| − 1 checked
at step 3, and the TM accepts.

(Termination) Step 1 is a non-deterministic guess of subsets of ℱ and ℛ, and steps 2 and 3 call the
polynomial-time oracle a number of times that is linear in the size of the guessed subsets. Hence, Fact
Relevance(ℛ) is in NP^P = NP.
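The guess-and-check procedure above can be simulated deterministically (in exponential time) for small ground instances. The sketch below is not part of the paper's implementation; it assumes rules are encoded as (body, head) pairs over string atoms, with entailment computed by naive forward chaining.

```python
from itertools import chain, combinations

def entails(rules, facts, alpha):
    """Naive forward chaining over ground rules.
    A rule is a pair (body, head) with body a frozenset of atoms."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return alpha in known

def subsets(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_relevant(rules, facts, alpha, f):
    """Exponential-time simulation of the non-deterministic procedure:
    f is relevant for alpha iff some pair <R', F'> with f in F' entails
    alpha and loses the entailment when any single rule or fact is removed."""
    for rs in subsets(rules):
        for fs in subsets(facts):
            if f not in fs or not entails(rs, fs, alpha):
                continue
            minimal_rules = all(
                not entails([r for r in rs if r != r0], fs, alpha) for r0 in rs)
            minimal_facts = all(
                not entails(rs, [g for g in fs if g != g0], alpha) for g0 in fs)
            if minimal_rules and minimal_facts:
                return True
    return False
```

For instance, with the rules {a → b, b → c} and the facts {a, d}, the fact a is relevant for c while d is not, since d occurs in no minimal entailing pair.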
Lemma 2. Fact Relevance(ℛ) is NP-hard.

Proof. We show that Fact Relevance(ℛ) is NP-hard via a reduction from SAT.

(Translation) For a CNF formula Φ = C_1 ∧ . . . ∧ C_m defined over the variables {p_1, . . . , p_n},
we produce a knowledge base 𝒦 = ⟨ℛ, ℱ⟩ with nullary predicates Source and Target such that Φ is
satisfiable if and only if Source is relevant for the entailment of Target. The knowledge base 𝒦 is built
as follows:

ℱ = {T(v_i), F(v_i) | 1 ≤ i ≤ n} ∪ {Source}

ℛ = {T(v_i) → T(c_j) | 1 ≤ j ≤ m and every positive literal p_i in C_j} ∪ (1)
    {F(v_i) → T(c_j) | 1 ≤ j ≤ m and every negative literal ¬p_i in C_j} ∪ (2)
    {Source ∧ T(c_1) ∧ . . . ∧ T(c_m) → Target} ∪ (3)
    {T(x) ∧ F(x) → Target} (4)

Note that in the construction above, we introduce a unique constant v_i for each variable p_i and a
unique constant c_j for each clause C_j in Φ. Note that, therefore, the only rule with variables is (4).
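The translation above can be sketched programmatically. This is an illustrative construction, not the paper's artifact [19]: the atom names (T(v1), T(c1), ...) and the DIMACS-style clause encoding are our own choices, and rule (4) is grounded over the constants v_1, ..., v_n since only ground rules are represented.

```python
def cnf_to_kb(clauses, n_vars):
    """Build the ground knowledge base of the reduction from a CNF formula.
    clauses: list of clauses in DIMACS style, e.g. [1, -2] encodes p1 or not-p2.
    Returns (rules, facts) over string atoms; a rule is (frozenset(body), head)."""
    facts = {"Source"}
    for i in range(1, n_vars + 1):
        facts |= {f"T(v{i})", f"F(v{i})"}           # truth and falsity of p_i
    rules = []
    for j, clause in enumerate(clauses, start=1):
        for lit in clause:                          # rules (1) and (2)
            atom = f"T(v{abs(lit)})" if lit > 0 else f"F(v{abs(lit)})"
            rules.append((frozenset({atom}), f"T(c{j})"))
    # rule (3): Source together with all satisfied clauses yields Target
    all_clauses = {f"T(c{j})" for j in range(1, len(clauses) + 1)}
    rules.append((frozenset({"Source"} | all_clauses), "Target"))
    # rule (4), grounded over every constant v_i
    for i in range(1, n_vars + 1):
        rules.append((frozenset({f"T(v{i})", f"F(v{i})"}), "Target"))
    return rules, sorted(facts)
```

For Φ = (p1 ∨ ¬p2) ∧ p2, the call cnf_to_kb([[1, -2], [2]], 2) yields 2n + 1 = 5 facts and 3 + 1 + 2 = 6 rules, matching the size bounds argued in the Polynomiality paragraph below.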
(Soundness) We show that if Source is relevant for the entailment of Target with 𝒦, then the CNF
formula Φ is satisfiable.

(a) By hypothesis, there is an explanation ⟨ℛ′, ℱ′⟩ ⊆ ⟨ℛ, ℱ⟩ such that Source ∈ ℱ′ and ⟨ℛ′, ℱ′⟩ ⊨
Target.

(b) We first show that, for every constant v_i appearing in ℱ′, we have T(v_i) ∈ ℱ′ if and only if
F(v_i) ∉ ℱ′. Indeed, suppose T(v_i), F(v_i) ∈ ℱ′ for some constant v_i. Then, with rule (4), we have
⟨ℛ′, {T(v_i), F(v_i)}⟩ ⊨ Target. This contradicts the fact that ℱ′ is a minimal set that, together
with ℛ′, entails Target.

(c) Let σ_ℱ′ be the assignment such that σ_ℱ′(p_i) = True if and only if T(v_i) ∈ ℱ′, for every variable
p_i in Φ and its corresponding constant v_i in 𝒦.

(d) From (b), we know that (4) cannot be applied in ℱ′. Therefore, rule (3) has been applied to
infer Target. Hence, ⟨ℛ′, ℱ′⟩ ⊨ T(c_j) for every 1 ≤ j ≤ m.

(e) For 1 ≤ j ≤ m, we know that T(c_j) is inferred either by a rule from (1) or by a rule from (2).

If a rule of the form (1) has been applied, then there is a constant v_i such that T(v_i) ∈ ℱ′ and
T(v_i) → T(c_j) ∈ ℛ′. Let p_i be the variable corresponding to v_i and C_j the clause corresponding
to c_j in Φ; then, by the construction of the rules in (1), we have p_i ∈ C_j, and by (c) we know
that σ_ℱ′(p_i) = True, thus σ_ℱ′(C_j) = True.

If a rule of the form (2) has been applied, then there is a constant v_i such that F(v_i) ∈ ℱ′ and
F(v_i) → T(c_j) ∈ ℛ′. Let p_i be the variable corresponding to v_i and C_j the clause corresponding
to c_j in Φ; then, by the construction of the rules in (2), we have ¬p_i ∈ C_j, and by (c) we know
that σ_ℱ′(p_i) = False, thus σ_ℱ′(C_j) = True.

Hence, σ_ℱ′(C_j) = True for every clause C_j in Φ; thus σ_ℱ′(Φ) = True and Φ is satisfiable.
(Completeness) We show that if Φ is satisfiable, then Source is relevant for Target with 𝒦.

(a) By hypothesis, there is an assignment σ that is a model of Φ. Consider the fact set
ℱ′ = {Source} ∪ {T(v_i) | σ(p_i) = True} ∪ {F(v_i) | σ(p_i) = False} ⊆ ℱ.

(b) By hypothesis, σ(C_j) = True for every clause C_j in Φ, and one of two cases holds.

There is a positive literal p_i ∈ C_j such that σ(p_i) = True. In this case, let v_i and c_j be
the constants corresponding to p_i and C_j, respectively. By (a), we have T(v_i) ∈ ℱ′, and
T(v_i) → T(c_j) ∈ ℛ by the construction of the rules in (1).

There is a negative literal ¬p_i ∈ C_j such that σ(p_i) = False. In this case, let v_i and c_j be the
constants corresponding to p_i and C_j, respectively. By (a), we have F(v_i) ∈ ℱ′, and F(v_i) → T(c_j) ∈ ℛ
by the construction of the rules in (2).

In both cases, ⟨ℛ, ℱ′⟩ ⊨ T(c_j) for every 1 ≤ j ≤ m.

(c) Let ℱ″ be a minimal subset of ℱ′ and ℛ′ a minimal subset of ℛ (obtained in linear time by
deletion, i.e., by removing facts and rules one at a time and keeping a removal only if the entailment
is preserved) such that ⟨ℛ′, ℱ″⟩ ⊨ Target. We know that (3) and (4) are the only rules that can infer
Target. However, (4) cannot be applied on ℱ″: since ℱ′ is built from an assignment σ, at most one
of T(v_i) and F(v_i) belongs to ℱ″ for any constant v_i corresponding to a variable p_i in Φ. Hence,
rule (3) has been applied to derive Target.

(d) To conclude, note that to apply (3) the atom Source must belong to ℱ″, and that ⟨ℛ′, ℱ″⟩ is an
explanation of Target. Hence, Source is relevant for Target.
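The deletion-based minimisation invoked in step (c) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the entailment checker is assumed to be supplied as a callable taking (rules, facts, alpha), and the (body, head) rule encoding is our own choice.

```python
def shrink(rules, facts, alpha, entails):
    """Deletion-based minimisation: try to drop each fact, then each rule,
    and keep a removal only if alpha is still entailed.  Since datalog
    entailment is monotone, one pass over the facts and rules yields a
    minimal pair, using a linear number of calls to the checker."""
    facts, rules = list(facts), list(rules)
    for g in list(facts):
        rest = [x for x in facts if x != g]
        if entails(rules, rest, alpha):
            facts = rest
    for r in list(rules):
        rest = [x for x in rules if x != r]
        if entails(rest, facts, alpha):
            rules = rest
    return rules, facts
```

Monotonicity is what makes the single pass sufficient: once a fact or rule survives a deletion attempt, no later deletion can make it removable.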
(Polynomiality) We show that the size of 𝒦 is polynomial in the size of Φ. The fact set ℱ contains
2n + 1 atoms, with n the number of propositional variables in Φ. The rule set contains at most
m × k + 2 rules, where m is the number of clauses in Φ and k the largest number of literals in a
clause. Thus, the instance 𝒦 is of polynomial size with respect to Φ, and the translation can be
computed in polynomial time.