Concept Abduction for Description Logics

Birte Glimm, Yevgeny Kazakov and Michael Welt
Ulm University, Germany

DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel

Abstract
We present two alternative algorithms for computing (all or some) solutions to the concept abduction problem: one algorithm is based on Reiter's hitting set tree algorithm, whereas the other one relies on a SAT encoding. In contrast to previous work, the algorithms do not rely on a refutation-based calculus and, hence, can also be used with efficient reasoners for tractable DLs such as β„°β„’ and its extensions. An adaptation to other forms of (logic-based) abduction, e.g., to ABox abduction, is also possible.

Keywords: Description Logics, abduction, ontologies, reasoning

1. Introduction

The reasoning service of abduction, as originally introduced by Peirce [1], is a form of reasoning that is used to generate possible explanations for a given observation. In logic-based abduction, the objective is to find a set of hypotheses (the explanation) which are consistent with a background theory and which logically entail the observation when taken together with the background knowledge. Often some minimality criterion, e.g., set inclusion or cardinality, is used to select preferred explanations. Abduction has numerous applications in artificial intelligence, e.g., in diagnosis, planning, or natural language understanding, and abductive reasoning has been studied intensively in the context of propositional logic (see, e.g., [2]) and also in the context of answer set programming (see, e.g., [3]). For logic-based abduction over ontologies, approaches started to emerge more recently (see, e.g., [4, 5, 6, 7]). With the exception of recent approaches based on forgetting [8, 9], practical approaches for logic-based abduction over ontologies are based on refutation-based calculi such as tableau.

Such approaches are based on the observation that the knowledge base 𝒦 does not yet entail the observation 𝑂, i.e., 𝒦 βˆͺ {¬𝑂} is satisfiable, whereas an explanation 𝐸 has to be such that 𝒦 βˆͺ 𝐸 βˆͺ {¬𝑂} is unsatisfiable since it is desired that 𝒦 βˆͺ 𝐸 |= 𝑂. Hence, open branches in a tableau for 𝒦 βˆͺ {¬𝑂} can β€œguide” the process of finding explanations as explanations have to be such that they lead to closing the open branches.

While refutation-based calculi are dominantly used for reasoning in expressive Description Logics (DLs), tractable reasoning procedures for DLs such as β„°β„’ are not based on refutation, but directly derive (logical) consequences, which prevents the use of existing approaches to abduction. We aim at closing this gap and, inspired by the work of Kazakov and Glimm [10] for computing justifications (i.e., minimal subsets of axioms from a knowledge base that are needed for a given entailment), we propose an alternative approach to efficiently compute some or all (minimal) explanations. The approach is based on Reiter's minimal hitting set algorithm [11] without relying on a refutation-based calculus to guide the algorithm. An encoding of the problem into SAT allows for better managing the combinatorial problem of selecting what constitutes an explanation.
As our goal is the development of an approach for DLs such as β„°β„’, where the main reasoning task involves determining concept subsumptions, we focus on concept abduction [12] as introduced next. The accompanying technical report gives further details, examples, and complete proofs [13].

2. Preliminaries

We use the standard syntax and semantics of DLs (see, e.g., [14]). Note that the DL β„°β„’ only allows for using the top concept, conjunctions, and existential restrictions as concepts. We now introduce the basic ideas of (concept) abduction (see also [12, 15]).

Definition 1. A concept abduction problem 𝒫 is a tuple βŸ¨π’¦, β„‹, π‘‚βŸ© with 𝒦 a knowledge base, β„‹, the hypotheses, a set of atomic concepts, and 𝑂, the observation, a single atomic concept.ΒΉ A set 𝐸 = {𝐴1, . . . , 𝐴𝑛} βŠ† β„‹ is an explanation for 𝒫 if 𝒦 |= 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂. Such an 𝐸 is explanatory if 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂 βˆ‰ 𝒦, 𝐸 is satisfiable if 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 is satisfiable w.r.t. 𝒦, 𝐸 is relevant if βˆ… ΜΈ|= 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂, and 𝐸 is (syntactically) minimal if there is no other explanation 𝐸′ for 𝒫 such that 𝐸′ ⊊ 𝐸.

Note that the given set of hypotheses allows for restricting the set of concepts that can be used in explanations, but it may also contain all concepts used in the knowledge base. In the remainder, we are interested in finding explanations that are explanatory, satisfiable, relevant, and (syntactically) minimal. Abusing notation, for a set of concepts 𝐸 = {𝐴1, . . . , 𝐴𝑛}, we also write 𝒦 |= 𝐸 βŠ‘ 𝑂 instead of 𝒦 |= 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂.

Example 1. Consider the knowledge base 𝒦 containing the axioms

𝐴 βŠ‘ βˆƒπ‘Ÿ.𝐡    (1)
βˆƒπ‘Ÿ.𝐡 βŠ‘ 𝐢    (2)
𝐢 βŠ“ 𝐢′ βŠ‘ 𝐷    (3)
𝐴 βŠ“ 𝐡 βŠ‘ βŠ₯    (4)

and the abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© with β„‹ = {𝐴, 𝐡, 𝐢, 𝐢′} and 𝑂 = 𝐷. From Axiom (1) and Axiom (2), we have 𝒦 |= 𝐴 βŠ‘ 𝐢. Hence, together with Axiom (3), we find that 𝒦 |= 𝐴 βŠ“ 𝐢′ βŠ‘ 𝐷 = 𝑂.
Among others, we have the following explanations for 𝒫:

𝐸1 = {𝐴, 𝐢′}    𝐸2 = {𝐢, 𝐢′}    𝐸3 = {𝐴, 𝐡}    𝐸4 = {𝐴, 𝐢, 𝐢′}

Among these, only 𝐸1 is explanatory, relevant, satisfiable, and minimal. We have that 𝐸2 is not explanatory since 𝐢 βŠ“ 𝐢′ βŠ‘ 𝐷 ∈ 𝒦, and 𝐸3 is not satisfiable (due to Axiom (4)). Finally, 𝐸4 is not minimal since 𝐸1 ⊊ 𝐸4. Note that {𝐷} is not an explanation since 𝐷 βˆ‰ β„‹, but even if it were, the explanation would not be relevant since βˆ… |= 𝐷 βŠ‘ 𝐷.

ΒΉ Note that requiring the observation to be an atomic concept is without loss of generality since, for an observation in the form of a complex concept 𝐢, we can simply introduce an axiom 𝑂 ≑ 𝐢 and use 𝑂 as the observation. We can proceed analogously for hypotheses.

Algorithm 1: Finding one explanation
Minimize(𝒦, β„‹, 𝑂): compute a minimal explanation for the concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ©
input: a knowledge base 𝒦, a set of hypotheses β„‹, and an observation 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂
output: a minimal explanation 𝐸 βŠ† β„‹ such that 𝒦 |= 𝐸 βŠ‘ 𝑂 (cf. Definition 1)
1  𝐸 ← β„‹;
2  for 𝐴 ∈ β„‹ do
3      if 𝒦 |= (𝐸 βˆ– {𝐴}) βŠ‘ 𝑂 then
4          𝐸 ← 𝐸 βˆ– {𝐴};
5  return 𝐸;

3. Computing Abductive Explanations

Before proposing our approach, we point out that the number of (minimal) explanations for a given observation may be exponential in the size of the knowledge base. We start by showing how to compute one explanation before generalizing the approach to compute all explanations, first using hitting set trees and then via a SAT encoding.

3.1. Computing One Abductive Explanation

Given a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©, a naive algorithm for computing one explanation 𝐸 such that 𝒦 |= 𝐸 βŠ‘ 𝑂 is relatively easy. If 𝒦 ΜΈ|= β„‹ βŠ‘ 𝑂, there is no satisfiable explanation and we can stop. Otherwise, we start from 𝐸 = β„‹ and repeatedly remove concepts from 𝐸 as long as this does not break the entailment 𝒦 |= 𝐸 βŠ‘ 𝑂.
At a certain point, no concept can be removed without breaking the entailment, which implies that 𝐸 is a minimal explanation w.r.t. 𝒫. Algorithm 1 summarizes this idea. Note that Algorithm 1 makes calls to a reasoning service for concept subsumption checking without relying on a particular kind of procedure (e.g., tableau-based as well as consequence-based procedures may be used). It remains to check whether the returned 𝐸 is explanatory, satisfiable, and relevant, which requires at most two further subsumption checks and a check for set containment.

The correctness of Algorithm 1 relies on the fact that a conjunction with additional conjuncts is more specific:

Lemma 1. Let 𝒦 be a DL knowledge base, 𝐸 and 𝐸′ two sets of atomic concepts such that 𝐸′ βŠ† 𝐸, and 𝑂 an atomic concept. Then 𝒦 |= 𝐸′ βŠ‘ 𝑂 implies 𝒦 |= 𝐸 βŠ‘ 𝑂.

Proof Sketch. Intuitively, the interpretation of a conjunction is more restricted the more conjuncts the conjunction contains.

Note that Lemma 1 only relies on the semantics of conjunctions and subsumption, which is shared among all DLs. We show that Algorithm 1 is correct under the assumed semantics.

Theorem 1. Let 𝐸 be the output of Algorithm 1 for the input 𝒦, β„‹, and 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂. Then 𝐸 is a minimal explanation for the abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ©.

Proof Sketch. Since 𝒦 |= β„‹ βŠ‘ 𝑂 by assumption, 𝐸 is initialized such that 𝒦 |= 𝐸 βŠ‘ 𝑂 (Line 1). Since we only remove a concept from 𝐸 (Line 4) if the subsumption still holds (Line 3), 𝒦 |= 𝐸 βŠ‘ 𝑂 is preserved. We can further show that 𝐸 is a minimal explanation as unneeded conjuncts are removed step by step.

Finally, observe that a run of Algorithm 1 requires exactly 𝑛 subsumption tests, where 𝑛 = ||β„‹||, which is bounded by the number of concepts in 𝒦 (two further subsumption tests and a set containment test are needed for checking whether 𝐸 is relevant, satisfiable, and explanatory).
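To make the procedure concrete, Algorithm 1 can be sketched in a few lines of Python. The function `entails` is a stand-in for a call to a DL reasoner deciding 𝒦 |= βŠ“πΈ βŠ‘ 𝑂; to obtain a runnable example, we hard-wire a toy saturation oracle for the knowledge base of Example 1. The oracle and all names are our own illustration, not part of an actual implementation:

```python
def entails(premises, goal):
    """Toy subsumption oracle for the K of Example 1:
    A <= exists r.B, exists r.B <= C, C and C' <= D, A and B <= bottom.
    Decides K |= (conjunction of premises) <= goal by saturation."""
    derived = set(premises)
    while True:
        new = set(derived)
        if "A" in derived:
            new.add("r.B")               # A <= exists r.B
        if "r.B" in derived:
            new.add("C")                 # exists r.B <= C
        if {"C", "C'"} <= derived:
            new.add("D")                 # C and C' <= D
        if {"A", "B"} <= derived:
            new.add("bot")               # A and B <= bottom
        if new == derived:
            break
        derived = new
    return goal in derived or "bot" in derived  # bottom is subsumed by everything

def minimize(entails, hypotheses, observation):
    """Algorithm 1: greedily remove concepts while the subsumption survives."""
    explanation = list(hypotheses)
    for concept in list(hypotheses):
        candidate = [c for c in explanation if c != concept]
        if entails(candidate, observation):
            explanation = candidate
    return set(explanation)
```

Running `minimize(entails, ["A", "B", "C", "C'"], "D")` returns {"C", "C'"} (= 𝐸2 of Example 1), while the enumeration order ["C", "C'", "A", "B"] yields {"A", "B"} (= 𝐸3): the result depends on the order in which the for-loop visits the concepts.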
Hence, the complexity of computing one minimal explanation is bounded by a linear function over the complexity of subsumption checking. In particular, for tractable languages such as β„°β„’ and its tractable extensions [16], one (minimal) explanation for the concept abduction problem can be computed in polynomial time, which is worst-case optimal [12].Β² For DLs where subsumption checking is ExpTime-complete, such as π’œβ„’π’ž [17, 18], the algorithm requires worst-case exponential time.

3.2. Computing All Abductive Explanations

An explanation for a concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ© consists of concepts whose conjunction is subsumed by the observation. As we have seen in Example 1, there can be several different explanations. To find further explanations, it is therefore necessary to omit a concept from every explanation of βŸ¨π’¦, β„‹, π‘‚βŸ© found so far, and the question arises of how to compute all (minimal) explanations.

Note first that the output of Algorithm 1 depends on the order in which the concepts in β„‹ are enumerated in the for-loop (Line 2). Different orders of the concepts can result in different removals and, consequently, different explanations.

Lemma 2. For each (minimal) explanation 𝐸 of a concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ©, there exists some order of concepts in β„‹ for which Algorithm 1 with the input 𝒦, β„‹, and 𝑂 returns 𝐸.

Proof Sketch. Assume, to the contrary of what is to be shown, that a minimal explanation πΈπ‘˜ is not returned. We can show that processing first the concepts in β„‹ βˆ– πΈπ‘˜ and then the concepts from πΈπ‘˜ leads to a contradiction as desired.

Β² Note that although subsumption checking for β„°β„’++ is tractable, the complexity of checking whether an explanation exists is shown to be NP-complete by Bienvenu [12]. This apparent contradiction arises because unsatisfiable explanations (which can only occur in logics that can express unsatisfiable concepts/βŠ₯, such as β„°β„’++) are excluded as explanations in the definition of Bienvenu.
Our algorithm, however, might terminate with an unsatisfiable explanation. If that has to be avoided, some form of backtracking would be needed for such cases, which indeed leads to a higher complexity of the problem.

Figure 1: Two HS-trees for the concept abduction problem from Example 1

The property stated in Lemma 2 means that for computing all explanations for 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©, it is sufficient to run Algorithm 1 for all possible orders of concepts in β„‹. As the number of explanations can be exponential in 𝑛, the exponential behavior of an algorithm for computing all explanations cannot be avoided in general. Unfortunately, the described algorithm is not very practical since it performs exponentially many subsumption tests for all inputs, even if, e.g., 𝒫 has just one explanation. This is because the algorithm is not goal-directed: the computation of each next explanation does not depend on the explanations computed before. Hence, the question arises of how we can find a more goal-directed algorithm.

Suppose that we have computed an explanation 𝐸1 using Algorithm 1. The next explanation 𝐸2 must be different from 𝐸1, so 𝐸2 should miss at least one concept from 𝐸1. Hence, the next explanation 𝐸2 can be found by finding 𝐴1 ∈ 𝐸1 such that 𝒦 |= (β„‹ βˆ– {𝐴1}) βŠ‘ 𝑂 and calling Algorithm 1 for the input 𝒦, β„‹ βˆ– {𝐴1}, and 𝑂. The next explanation 𝐸3, similarly, should miss something from 𝐸1 and something from 𝐸2, so it can be found by finding some 𝐴1 ∈ 𝐸1 and 𝐴2 ∈ 𝐸2 such that 𝒦 |= (β„‹ βˆ– {𝐴1, 𝐴2}) βŠ‘ 𝑂 and calling Algorithm 1 for the input 𝒦, β„‹ βˆ– {𝐴1, 𝐴2}, and 𝑂.
In general, when explanations 𝐸𝑖 (1 ≀ 𝑖 ≀ π‘˜) have been computed, the next explanation can be found by calling Algorithm 1 for the input 𝒦, β„‹ βˆ– {𝐴𝑖 | 1 ≀ 𝑖 ≀ π‘˜}, and 𝑂 such that 𝐴𝑖 ∈ 𝐸𝑖 (1 ≀ 𝑖 ≀ π‘˜) and 𝒦 |= (β„‹ βˆ– {𝐴𝑖 | 1 ≀ 𝑖 ≀ π‘˜}) βŠ‘ 𝑂. The enumeration of the subsets β„‹ βˆ– {𝐴𝑖 | 1 ≀ 𝑖 ≀ π‘˜} can be organized using a data structure called a hitting set tree.

Definition 2. Let 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© be a concept abduction problem. A hitting set tree (short: HS-tree) for 𝒫 is a labeled tree 𝑇 = (𝑉, 𝐸, 𝐿) with 𝑉 ΜΈ= βˆ… such that:

1. each non-leaf node 𝑣 ∈ 𝑉 is labeled with an explanation 𝐿(𝑣) = 𝐸 for 𝒫 and, for each 𝐴 ∈ 𝐸, 𝑣 has an outgoing edge βŸ¨π‘£, π‘€βŸ© ∈ 𝐸 with label 𝐿(𝑣, 𝑀) = 𝐴, and
2. each leaf node 𝑣 ∈ 𝑉 is labeled by a special symbol 𝐿(𝑣) = βŠ₯.

For each 𝑣 ∈ 𝑉, let 𝐻(𝑣) be the set of edge labels appearing on the path from 𝑣 to the root node of 𝑇. Then the following properties should additionally hold:

3. for each non-leaf node 𝑣 ∈ 𝑉, we have 𝐿(𝑣) ∩ 𝐻(𝑣) = βˆ…, and
4. for each leaf node 𝑣 ∈ 𝑉, we have 𝒦 ΜΈ|= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂.

Figure 1 shows an example of two different HS-trees for the concept abduction problem from Example 1. The following example shows that an explanation might occur more than once in an HS-tree:

Figure 2: An HS-tree for the concept abduction problem from Example 2

Example 2. Consider the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© with 𝒦 = {𝐴′ βŠ‘ 𝐴, 𝐴′ βŠ“ 𝐢 βŠ‘ βŠ₯, 𝐴 βŠ“ 𝐡 βŠ‘ 𝐷}, β„‹ = {𝐴, 𝐴′, 𝐡, 𝐢}, and 𝑂 = 𝐷, which has the (minimal) explanations 𝐸1 = {𝐴′, 𝐢} (not satisfiable), 𝐸2 = {𝐴, 𝐡} (not explanatory), and 𝐸3 = {𝐴′, 𝐡} (satisfiable, explanatory, and relevant). Figure 2 shows an HS-tree for 𝒫, where the explanation 𝐸2 = {𝐴, 𝐡} labels two different nodes of the tree.

We next prove that every HS-tree must contain every explanation at least once.

Lemma 3.
Let 𝑇 = (𝑉, 𝐸, 𝐿) be an HS-tree for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©. Then, for each explanation 𝐸 for 𝒫, there exists a node 𝑣 ∈ 𝑉 such that 𝐿(𝑣) = 𝐸.

Proof Sketch. Using Lemma 1 and Conditions 1, 3 and 4 of Definition 2, we can show that a node 𝑣 ∈ 𝑉 for which 𝐻(𝑣) is maximal (w.r.t. set inclusion) and 𝐻(𝑣) ∩ 𝐸 = βˆ… is such that 𝐿(𝑣) = 𝐸.

We next show that each HS-tree 𝑇 = (𝑉, 𝐸, 𝐿) for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© has at most exponentially many nodes in the number of concepts in β„‹.

Lemma 4. Every HS-tree 𝑇 for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© has at most βˆ‘_{0β‰€π‘˜β‰€π‘›} 𝑛^π‘˜ nodes, where 𝑛 is the number of concepts in β„‹.

Proof Sketch. By analyzing Conditions 1 and 3 of Definition 2, we can show that both the depth and the branching factor of 𝑇 are bounded by ||β„‹||, which gives the desired bound.

For constructing an HS-tree 𝑇 = (𝑉, 𝐸, 𝐿) for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©, we can use Reiter's Hitting Set Tree algorithm (short: HST-algorithm) [11, 19]: We start by creating the root node 𝑣0 ∈ 𝑉. Then we repeatedly assign labels to nodes and edges as follows. For each 𝑣 ∈ 𝑉, if 𝐿(𝑣) was not yet assigned, we calculate 𝐻(𝑣). If 𝒦 ΜΈ|= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂, we set 𝐿(𝑣) = βŠ₯ according to Condition 4 of Definition 2. Otherwise, we compute an explanation 𝐸 for 𝒦 |= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂 using Algorithm 1 and set 𝐿(𝑣) = 𝐸. Note that 𝐸 satisfies Condition 3 of Definition 2 since 𝐸 βŠ† (β„‹ βˆ– 𝐻(𝑣)). Next, for each 𝐴 ∈ 𝐸, we create a successor node 𝑀 of 𝑣 and set 𝐿(𝑣, 𝑀) = 𝐴. This ensures that Condition 1 of Definition 2 is satisfied for 𝑣. Since, by Lemma 4, 𝑇 has a bounded number of nodes, this process eventually terminates.
Algorithm 2: Computing all explanations by the Hitting Set Tree algorithm
ComputeExplanationsHST(𝒦, β„‹, 𝑂): compute all explanations for βŸ¨π’¦, β„‹, π‘‚βŸ©
input: a knowledge base 𝒦, a set of hypotheses β„‹, and an observation 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂
output: the set of all (minimal) subsets 𝐸 βŠ† β„‹ such that 𝒦 |= 𝐸 βŠ‘ 𝑂
1  𝑆 ← βˆ…;
2  𝑄 ← {βˆ…};
3  while 𝑄 ΜΈ= βˆ… do
4      𝐻 ← choose 𝐻 ∈ 𝑄;
5      𝑄 ← 𝑄 βˆ– {𝐻};
6      if 𝒦 |= (β„‹ βˆ– 𝐻) βŠ‘ 𝑂 then
7          𝐸 ← Minimize(𝒦, β„‹ βˆ– 𝐻, 𝑂);
8          𝑆 ← 𝑆 βˆͺ {𝐸};
9          for 𝐴 ∈ 𝐸 do
10             𝑄 ← 𝑄 βˆͺ {𝐻 βˆͺ {𝐴}};
11 return 𝑆;

Note that unlike the algorithm sketched in Lemma 2, the input for each call of Algorithm 1 now depends on the results returned by the previous calls. The main idea of the HST-algorithm is to systematically compute two kinds of sets: (1) explanations 𝐸 for the concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ© and (2) sets 𝐻 that contain one element from each explanation 𝐸 on a branch. The name of the algorithm comes from the notion of a hitting set, which characterizes the latter sets.

Definition 3. Let 𝑃 be a set of sets of some elements. A set 𝐻 is a hitting set for 𝑃 if 𝐻 ∩ 𝑆 ΜΈ= βˆ… for each 𝑆 ∈ 𝑃. A hitting set 𝐻 for 𝑃 is minimal if no 𝐻′ ⊊ 𝐻 is a hitting set for 𝑃.

Intuitively, a hitting set for 𝑃 is a set 𝐻 that contains at least one element from every set 𝑆 ∈ 𝑃. An HS-tree is then a tree 𝑇 = (𝑉, 𝐸, 𝐿) such that for each 𝑣 ∈ 𝑉, 𝐻(𝑣) is a hitting set of the set of explanations on the path from 𝑣 to the root of 𝑇. The leaf nodes 𝑣 of 𝑇 correspond to hitting sets 𝐻(𝑣) such that 𝒦 ΜΈ|= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂. Intuitively, the set 𝐻(𝑣) represents a set whose removal from β„‹ breaks the subsumption. We get the following bound on the number of subsumption tests performed by the HST-algorithm:

Lemma 5. An HS-tree for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© can be constructed using at most βˆ‘_{1β‰€π‘˜β‰€π‘›+1} 𝑛^π‘˜ subsumption tests, where 𝑛 is the number of concepts in β„‹.

Proof.
We call Algorithm 1 exactly once per node. Combined with Lemma 4, we get the desired bound on the number of subsumption tests performed by the HST-algorithm.

The HST-algorithm can further be optimized in several ways. First, it is not necessary to store the complete HS-tree in memory. For computing an explanation 𝐸 at each node 𝑣, it is sufficient to know just the set 𝐻(𝑣). For each successor 𝑀 of 𝑣 associated with some 𝐴 ∈ 𝐿(𝑣), the set 𝐻(𝑀) can be computed as 𝐻(𝑀) = 𝐻(𝑣) βˆͺ {𝐴}. Hence, it is possible to compute all explanations by recursively processing and creating the sets 𝐻(𝑣) as shown in Algorithm 2.

The algorithm saves all explanations in a set 𝑆, which is initially empty (Line 1). The explanations are computed by processing the sets 𝐻(𝑣); the sets that are not yet processed are stored in the queue 𝑄, which initially contains 𝐻(𝑣0) = βˆ… for the root node 𝑣0 (Line 2). The elements of 𝑄 are then repeatedly processed in a loop (Lines 3–10) until 𝑄 becomes empty. First, we choose any 𝐻 ∈ 𝑄 (Line 4) and remove it from 𝑄 (Line 5). Then, we test whether 𝒦 |= (β„‹ βˆ– 𝐻) βŠ‘ 𝑂 (Line 6). If the subsumption holds, the corresponding node 𝑣 of the HS-tree with 𝐻(𝑣) = 𝐻 is not a leaf node. We then compute an explanation 𝐸 using Algorithm 1 and add it to 𝑆 (Lines 7–8). Further, for each 𝐴 ∈ 𝐸, we create the set 𝐻(𝑀) = 𝐻(𝑣) βˆͺ {𝐴} for the corresponding successor node 𝑀 of 𝑣 and add 𝐻(𝑀) to 𝑄 for later processing (Lines 9–10). If the subsumption 𝒦 |= (β„‹ βˆ– 𝐻) βŠ‘ 𝑂 does not hold, we have reached a leaf of the HS-tree and no further children of this node should be created.

Lemma 6. Given 𝒦, β„‹, and 𝑂 as input, Algorithm 2 returns all explanations for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©.

Proof Sketch. We can prove the claim by showing that the following invariant always holds in the main loop (Lines 3–10): If 𝐸 is an explanation for 𝒫, then either 𝐸 ∈ 𝑆 or there exists 𝐻 ∈ 𝑄 such that 𝐸 βŠ† β„‹ βˆ– 𝐻.
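As a concrete illustration, Algorithm 2 can be sketched in Python as follows. A toy saturation oracle for the knowledge base of Example 1 again stands in for the reasoner call; the oracle and all names are our own illustration, not part of an actual implementation:

```python
def entails(premises, goal):
    """Toy subsumption oracle for the K of Example 1 (stand-in for a reasoner)."""
    derived = set(premises)
    while True:
        new = set(derived)
        if "A" in derived: new.add("r.B")          # A <= exists r.B
        if "r.B" in derived: new.add("C")          # exists r.B <= C
        if {"C", "C'"} <= derived: new.add("D")    # C and C' <= D
        if {"A", "B"} <= derived: new.add("bot")   # A and B <= bottom
        if new == derived:
            return goal in derived or "bot" in derived
        derived = new

def minimize(hypotheses, observation):
    """Algorithm 1 (Minimize): shrink to a minimal explanation."""
    explanation = list(hypotheses)
    for concept in list(hypotheses):
        candidate = [c for c in explanation if c != concept]
        if entails(candidate, observation):
            explanation = candidate
    return frozenset(explanation)

def compute_explanations_hst(hypotheses, observation):
    """Algorithm 2: process the sets H(v) of the HS-tree via a queue."""
    explanations = set()                 # S
    queue = [frozenset()]                # Q, with H(v0) = {} for the root
    while queue:
        h = queue.pop()                  # choose and remove some H from Q
        rest = [c for c in hypotheses if c not in h]
        if entails(rest, observation):   # non-leaf node: extract an explanation
            e = minimize(rest, observation)
            explanations.add(e)
            for concept in e:            # one successor per concept in E
                queue.append(h | {concept})
    return explanations
```

For Example 1, `compute_explanations_hst(["A", "B", "C", "C'"], "D")` returns the three minimal explanations {𝐴, 𝐢′}, {𝐢, 𝐢′}, and {𝐴, 𝐡} (cf. Lemma 3); duplicate occurrences on different branches are absorbed by the set 𝑆.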
Note that, as before, checking whether a returned explanation is explanatory, satisfiable, and relevant requires at most two subsumption checks and a set membership check.

3.3. Computing Abductive Explanations using SAT Solvers

The main idea of the HST-algorithm is to systematically compute two kinds of sets: (1) explanations 𝐸 for a concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ© and (2) hitting sets 𝐻 that contain one element from each explanation 𝐸 on a branch. Intuitively, for a leaf node 𝑣, 𝐻(𝑣) represents a set whose removal from β„‹ breaks the subsumption, which means that every explanation on the path to 𝑣 is also a minimal hitting set for 𝐻(𝑣). This property is called the hitting set duality and it takes a prominent place in the HST-algorithm. We can, however, also use this property as the basis of a direct algorithm for computing abductive explanations.

Suppose that we have already computed some set 𝑆 of explanations and some set 𝑃 of hitting sets for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©. How can we find a new explanation? As mentioned, each new explanation must be a hitting set for 𝑃, i.e., it should contain one concept from every set in 𝑃. Furthermore, it should be different from any of the previously computed explanations, i.e., it should miss one concept from every 𝐸 ∈ 𝑆. Suppose we have found a subset 𝑀 βŠ† β„‹ satisfying these two requirements:

βˆ€π‘… ∈ 𝑃 : 𝑀 ∩ 𝑅 ΜΈ= βˆ…,    (5)
βˆ€πΈ ∈ 𝑆 : 𝐸 βˆ– 𝑀 ΜΈ= βˆ….    (6)

If 𝒦 |= 𝑀 βŠ‘ 𝑂, then, using Algorithm 1, we can extract a minimal subset 𝐸′ βŠ† 𝑀 such that 𝒦 |= 𝐸′ βŠ‘ 𝑂. Note that 𝐸′ still misses at least one concept from each 𝐸 ∈ 𝑆 since (6) is preserved under removal of concepts from 𝑀. Therefore, 𝐸′ is a new explanation for 𝒫.
If 𝒦 ΜΈ|= 𝑀 βŠ‘ 𝑂, then, similarly, by adding concepts 𝐴 ∈ β„‹ to 𝑀 while preserving 𝒦 ΜΈ|= 𝑀 βŠ‘ 𝑂, we can find a maximal superset 𝑁 of 𝑀 (𝑀 βŠ† 𝑁 βŠ† β„‹) such that 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂: see Algorithm 3. Note that (5) is preserved under additions of elements to 𝑀. Thus, using any set 𝑀 satisfying (5) and (6), we can find either a new explanation or a new minimal hitting set.

Algorithm 3: Maximizing non-subsumption
Maximize(𝒦, β„‹, 𝑀, 𝑂): compute a maximal subset 𝑁 βŠ† β„‹ such that 𝑀 βŠ† 𝑁 and 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂
input: a knowledge base 𝒦, a set of hypotheses β„‹, a subset 𝑀 βŠ† β„‹, and an observation 𝑂 such that 𝒦 ΜΈ|= 𝑀 βŠ‘ 𝑂
output: 𝑁 with 𝑀 βŠ† 𝑁 βŠ† β„‹ such that 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂 but 𝒦 |= 𝑁′ βŠ‘ 𝑂 for every 𝑁′ with 𝑁 ⊊ 𝑁′ βŠ† β„‹
1  𝑁 ← 𝑀;
2  for 𝐴 ∈ β„‹ βˆ– 𝑀 do
3      if 𝒦 ΜΈ|= (𝑁 βˆͺ {𝐴}) βŠ‘ 𝑂 then
4          𝑁 ← 𝑁 βˆͺ {𝐴};
5  return 𝑁;

The question arises of how we can find a set 𝑀 satisfying Conditions (5) and (6). These conditions encode a rather complex combinatorial problem, for which propositional satisfiability (SAT) solvers offer a convenient and effective tool. In the following, we describe a propositional encoding of Conditions (5) and (6).

To formulate the propositional encoding, we assign to each concept 𝐴 ∈ β„‹ a fresh propositional variable 𝑝𝐴. Then, every propositional interpretation ℐ determines a set 𝑀 = 𝑀(ℐ) = {𝐴 ∈ β„‹ | 𝑝𝐴^ℐ = 1} of concepts whose corresponding propositional variable is true. We construct a propositional formula 𝐹 such that 𝐹^ℐ = 1 if and only if 𝑀(ℐ) satisfies (5) and (6) for the given sets 𝑆 of explanations and 𝑃 of minimal hitting sets. Thus, to find a subset 𝑀 satisfying (5) and (6), it is sufficient to find a model ℐ of 𝐹 and compute 𝑀(ℐ). We define 𝐹 as follows:

𝐹 = 𝐹(𝑆, 𝑃) = β‹€_{𝐸 ∈ 𝑆} ⋁_{𝐴 ∈ 𝐸} ¬𝑝𝐴 ∧ β‹€_{𝑅 ∈ 𝑃} ⋁_{𝐴 ∈ 𝑅} 𝑝𝐴.    (7)

Example 3. Let 𝒦 be the ontology from Example 1. We assign the propositional variables 𝑝𝐴, 𝑝𝐡, 𝑝𝐢, 𝑝𝐢′ to the concepts 𝐴, 𝐡, 𝐢, 𝐢′, respectively.
Let 𝑆 = {{𝐴, 𝐢′}, {𝐢, 𝐢′}} be the set of the explanations 𝐸1 and 𝐸2 from Example 1 and 𝑃 a set containing {𝐴, 𝐢}, i.e., assume we have constructed the left-most branch of the HS-tree on the left-hand side of Figure 1. Then, according to (7), we have:

𝐹 = 𝐹(𝑆, 𝑃) = (¬𝑝𝐴 ∨ ¬𝑝𝐢′) ∧ (¬𝑝𝐢 ∨ ¬𝑝𝐢′) ∧ (𝑝𝐴 ∨ 𝑝𝐢).

𝐹 has a model ℐ with 𝑝𝐴^ℐ = 1 and 𝑝𝐡^ℐ = 𝑝𝐢^ℐ = 𝑝𝐢′^ℐ = 0, which gives 𝑀(ℐ) = {𝐴}.

Once the set 𝑀 determined by a model ℐ of 𝐹 is found, we can extract either a new explanation 𝐸 or a new minimal hitting set 𝑅 from 𝑀 by minimizing the subsumee using Algorithm 1 or maximizing non-subsumption using Algorithm 3. After that, we can update 𝐹 according to (7) and compute a new model of 𝐹, if any exists. Once 𝐹 is unsatisfiable, 𝑆 contains all explanations for 𝒫 and 𝑃 contains all minimal hitting sets.

Algorithm 4: Computing all explanations using a SAT solver
ComputeExplanationsSAT(𝒦, β„‹, 𝑂): compute all explanations for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©
input: a knowledge base 𝒦, a set of hypotheses β„‹, and an observation 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂
output: the set of all minimal subsets 𝐸 βŠ† β„‹ such that 𝒦 |= 𝐸 βŠ‘ 𝑂
1  𝑆 ← βˆ…;
2  𝐹 ← ⊀;
3  while βˆƒ ℐ : 𝐹^ℐ = 1 do
4      ℐ ← choose ℐ : 𝐹^ℐ = 1;
5      𝑀 ← {𝐴 | 𝑝𝐴^ℐ = 1};
6      if 𝒦 |= 𝑀 βŠ‘ 𝑂 then
7          𝐸 ← Minimize(𝒦, 𝑀, 𝑂);
8          𝑆 ← 𝑆 βˆͺ {𝐸};
9          𝐹 ← 𝐹 ∧ ⋁{¬𝑝𝐴 | 𝐴 ∈ 𝐸};
10     else
11         𝑁 ← Maximize(𝒦, β„‹, 𝑀, 𝑂);
12         𝐹 ← 𝐹 ∧ ⋁{𝑝𝐴 | 𝐴 ∈ β„‹ βˆ– 𝑁};
13 return 𝑆;

Algorithm 4 summarizes the described procedure for computing all explanations for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© using a SAT solver. We start by creating an empty set 𝑆 of explanations (Line 1) and a formula 𝐹 that is always true (Line 2). Then, in a loop (Lines 3–12), as long as 𝐹 is satisfiable (which is checked using a SAT solver), we take any model ℐ of 𝐹 (Line 4), extract the corresponding set 𝑀 = 𝑀(ℐ) that it defines (Line 5), and check the subsumption 𝒦 |= 𝑀 βŠ‘ 𝑂 (Line 6).
If the entailment holds, we compute an explanation for 𝒦 |= 𝑀 βŠ‘ 𝑂 using Algorithm 1 (Line 7), which, by Lemma 1, is an explanation for 𝒫. This explanation is then added to 𝑆 (Line 8) and 𝐹 is extended with a new conjunct for this explanation according to (7) (Line 9). If the entailment does not hold, we compute a maximal superset 𝑁 of 𝑀 such that 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂 using Algorithm 3 (Line 11) and extend 𝐹 with the corresponding conjunct for the new minimal hitting set 𝑅 = β„‹ βˆ– 𝑁 according to (7) (Line 12). As soon as 𝐹 becomes unsatisfiable, we return the set 𝑆 of computed explanations (Line 13).

Example 4. Consider the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© from Example 1 and the propositional encoding of the concepts in β„‹ from Example 3. The following table shows a run of Algorithm 4 for the inputs 𝒦, β„‹, and 𝑂. Every row in this table corresponds to one iteration of the while-loop (Lines 3–12). The first column gives the interpretation ℐ of 𝐹 computed in this iteration. The second column shows the value of 𝑀 computed for this interpretation and whether the entailment 𝒦 |= 𝑀 βŠ‘ 𝑂 holds. The third column shows the result of minimizing the subsumee or maximizing the non-subsumption using Algorithms 1 and 3. The last column shows the conjunct that is added to 𝐹 for the corresponding explanation or minimal hitting set.

𝑝𝐴^ℐ 𝑝𝐡^ℐ 𝑝𝐢^ℐ 𝑝𝐢′^ℐ | 𝒦 |=? 𝑀 βŠ‘ 𝑂          | 𝒦 |= min(𝑀) βŠ‘ 𝑂 / 𝒦 ΜΈ|= max(𝑀) βŠ‘ 𝑂 | new conjunct
0    0    0    0     | 𝒦 ΜΈ|= ⊀ βŠ‘ 𝑂          | 𝒦 ΜΈ|= {𝐴, 𝐢} βŠ‘ 𝑂                   | 𝑝𝐡 ∨ 𝑝𝐢′
0    1    0    0     | 𝒦 ΜΈ|= {𝐡} βŠ‘ 𝑂        | 𝒦 ΜΈ|= {𝐡, 𝐢} βŠ‘ 𝑂                   | 𝑝𝐴 ∨ 𝑝𝐢′
1    1    0    0     | 𝒦 |= {𝐴, 𝐡} βŠ‘ 𝑂      | 𝒦 |= {𝐴, 𝐡} βŠ‘ 𝑂                    | ¬𝑝𝐴 ∨ ¬𝑝𝐡
1    0    0    1     | 𝒦 |= {𝐴, 𝐢′} βŠ‘ 𝑂     | 𝒦 |= {𝐴, 𝐢′} βŠ‘ 𝑂                   | ¬𝑝𝐴 ∨ ¬𝑝𝐢′
0    0    1    1     | 𝒦 |= {𝐢, 𝐢′} βŠ‘ 𝑂     | 𝒦 |= {𝐢, 𝐢′} βŠ‘ 𝑂                   | ¬𝑝𝐢 ∨ ¬𝑝𝐢′
0    1    0    1     | 𝒦 ΜΈ|= {𝐡, 𝐢′} βŠ‘ 𝑂    | 𝒦 ΜΈ|= {𝐡, 𝐢′} βŠ‘ 𝑂                  | 𝑝𝐴 ∨ 𝑝𝐢

We briefly discuss similarities and differences between Algorithm 2 and Algorithm 4. Both algorithms systematically explore subsets of β„‹ and minimize the subsumee of such subsets to compute explanations. Algorithm 2 constructs such subsets (β„‹ βˆ– 𝐻) manually, by removing one concept appearing in a previously computed explanation (if there is any) in all possible ways. Algorithm 4 enumerates such subsets 𝑀 with the help of a SAT solver. The main difference is that Algorithm 2 may encounter the same subsets many times (on different branches), whereas the propositional encoding in Algorithm 4 ensures that such subsets never repeat.

Of course, an iteration of Algorithm 2 cannot be directly compared to an iteration of Algorithm 4. Both iterations use at most one call to Algorithm 1, but Algorithm 4 may also require a call to Algorithm 3, as well as checking the satisfiability of 𝐹. The latter requires solving an NP-complete problem, for which no polynomial algorithm is known to date. In order to check the satisfiability of 𝐹, a SAT solver usually tries several (in the worst case exponentially many) propositional interpretations until a model of 𝐹 is found. As each such interpretation ℐ corresponds to a subset 𝑀(ℐ) βŠ† β„‹, this process can be compared to the enumeration of subsets in Algorithm 2. However, a SAT solver usually implements a number of sophisticated optimizations, which make the search for models very efficient in practice, whereas the subset enumeration strategy used by Algorithm 2 is rather simplistic. Hence, Algorithm 4 is likely to win in speed.
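The SAT-driven loop of Algorithm 4 can likewise be sketched in Python. For self-containedness, we replace the SAT solver by a brute-force search for a subset 𝑀 satisfying Conditions (5) and (6) (i.e., a model of 𝐹(𝑆, 𝑃)) and again use a toy subsumption oracle for Example 1; the oracle and all names are our own illustration, and a practical implementation would instead maintain 𝐹 in an (incremental) SAT solver:

```python
from itertools import combinations

def entails(premises, goal):
    """Toy subsumption oracle for the K of Example 1 (stand-in for a reasoner)."""
    derived = set(premises)
    while True:
        new = set(derived)
        if "A" in derived: new.add("r.B")          # A <= exists r.B
        if "r.B" in derived: new.add("C")          # exists r.B <= C
        if {"C", "C'"} <= derived: new.add("D")    # C and C' <= D
        if {"A", "B"} <= derived: new.add("bot")   # A and B <= bottom
        if new == derived:
            return goal in derived or "bot" in derived
        derived = new

def minimize(hypotheses, observation):
    """Algorithm 1: shrink to a minimal subsumee."""
    e = list(hypotheses)
    for c in list(hypotheses):
        if entails([x for x in e if x != c], observation):
            e = [x for x in e if x != c]
    return frozenset(e)

def maximize(hypotheses, m, observation):
    """Algorithm 3: grow M to a maximal N with K not|= N <= O."""
    n = set(m)
    for c in hypotheses:
        if c not in n and not entails(n | {c}, observation):
            n.add(c)
    return n

def compute_explanations_sat(hypotheses, observation):
    """Algorithm 4, with the SAT call replaced by brute-force model search."""
    s, p = set(), []                     # explanations S, minimal hitting sets P
    while True:
        # find some M satisfying (5) and (6), i.e., a model of F(S, P)
        m = next((set(c) for r in range(len(hypotheses) + 1)
                  for c in combinations(hypotheses, r)
                  if all(set(c) & hit for hit in p)   # condition (5)
                  and all(e - set(c) for e in s)),    # condition (6)
                 None)
        if m is None:                    # F(S, P) is unsatisfiable: done
            return s
        if entails(m, observation):      # minimize the subsumee (Algorithm 1)
            s.add(minimize(sorted(m), observation))
        else:                            # record a new minimal hitting set
            p.append(set(hypotheses) - maximize(hypotheses, m, observation))
```

For Example 1, `compute_explanations_sat(["A", "B", "C", "C'"], "D")` again yields the explanations {𝐴, 𝐡}, {𝐴, 𝐢′}, and {𝐢, 𝐢′}. Each iteration adds a genuinely new explanation or a genuinely new minimal hitting set, so the loop terminates.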
On the other hand, Algorithm 4 requires saving all explanations (and minimal hitting sets) in the propositional formula 𝐹, which might result in a formula of exponential size if the number of such explanations or hitting sets is exponential. In this regard, Algorithm 2 could be more memory efficient since saving (all) explanations is optional (see the discussion at the end of Section 3.2). Hence, both algorithms have their own advantages and disadvantages.

4. Conclusions

We have presented two alternative algorithms for computing (all or some) solutions to the concept abduction problem: one algorithm is based on Reiter's hitting set tree algorithm, whereas the other one relies on a SAT encoding. It remains to be analyzed how these algorithms behave in practice and how they differ on different real-world ontologies, where an important aspect is also finding efficient incremental SAT solvers. In contrast to previous work, the algorithms do not rely on a refutation-based calculus and, hence, can also be used with efficient reasoners for tractable DLs such as β„°β„’ and its extensions. Another direction for future work is extending the approach to other forms of (logic-based) abduction, e.g., to ABox abduction, and a comparison with other existing approaches, which mostly focus on ABox abduction.

References

[1] C. S. Peirce, Deduction, Induction, and Hypothesis, Popular Science Monthly 13 (1878) 470–482.
[2] T. Eiter, G. Gottlob, The complexity of logic-based abduction, J. ACM 42 (1995) 3–42. URL: https://doi.org/10.1145/200836.200838.
[3] M. Denecker, A. C. Kakas, Abduction in logic programming, in: A. C. Kakas, F. Sadri (Eds.), Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part I, volume 2407 of Lecture Notes in Computer Science, Springer, 2002, pp. 402–436. URL: https://doi.org/10.1007/3-540-45628-7_16.
[4] C. Elsenbroich, O. Kutz, U. Sattler, A case for abductive reasoning over ontologies, in: B. C.
Grau, P. Hitzler, C. Shankey, E. Wallace (Eds.), Proceedings of the OWLED*06 Workshop on OWL: Experiences and Directions, Athens, Georgia, USA, November 10-11, 2006, volume 216 of CEUR Workshop Proceedings, CEUR-WS.org, 2006. URL: http://ceur-ws.org/Vol-216/submission_25.pdf.
[5] S. Klarman, U. Endriss, S. Schlobach, ABox abduction in the description logic ALC, J. Autom. Reason. 46 (2011) 43–80. URL: https://doi.org/10.1007/s10817-010-9168-z.
[6] K. Halland, A. Britz, S. Klarman, TBox abduction in ALC using a DL tableau, in: M. Bienvenu, M. Ortiz, R. Rosati, M. Simkus (Eds.), Informal Proceedings of the 27th International Workshop on Description Logics, Vienna, Austria, July 17-20, 2014, volume 1193 of CEUR Workshop Proceedings, CEUR-WS.org, 2014, pp. 556–566. URL: http://ceur-ws.org/Vol-1193/paper_42.pdf.
[7] J. PukancovΓ‘, M. Homola, ABox abduction for description logics: The case of multiple observations, in: M. Ortiz, T. Schneider (Eds.), Proceedings of the 31st International Workshop on Description Logics co-located with the 16th International Conference on Principles of Knowledge Representation and Reasoning (KR 2018), Tempe, Arizona, US, October 27-29, 2018, volume 2211 of CEUR Workshop Proceedings, CEUR-WS.org, 2018. URL: http://ceur-ws.org/Vol-2211/paper-31.pdf.
[8] P. Koopmann, W. Del-Pinto, S. Tourret, R. A. Schmidt, Signature-based abduction for expressive description logics, in: D. Calvanese, E. Erdem, M. Thielscher (Eds.), Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, Rhodes, Greece, September 12-18, 2020, 2020, pp. 592–602. URL: https://doi.org/10.24963/kr.2020/59.
[9] W. Del-Pinto, R. A.
Schmidt, ABox abduction via forgetting in ALC, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, AAAI Press, 2019, pp. 2768–2775. URL: https://doi.org/10.1609/aaai.v33i01.33012768.
[10] B. Glimm, Y. Kazakov, Classical algorithms for reasoning and explanation in description logics, in: M. KrΓΆtzsch, D. Stepanova (Eds.), Reasoning Web. Explainable Artificial Intelligence - 15th International Summer School 2019, Bolzano, Italy, September 20-24, 2019, Tutorial Lectures, volume 11810 of Lecture Notes in Computer Science, Springer, 2019, pp. 1–64. URL: https://doi.org/10.1007/978-3-030-31423-1_1.
[11] R. Reiter, A theory of diagnosis from first principles, Artificial Intelligence 32 (1987) 57–95. URL: https://www.sciencedirect.com/science/article/pii/0004370287900622. doi:10.1016/0004-3702(87)90062-2.
[12] M. Bienvenu, Complexity of abduction in the EL family of lightweight description logics, in: G. Brewka, J. Lang (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Eleventh International Conference, KR 2008, Sydney, Australia, September 16-19, 2008, AAAI Press, 2008, pp. 220–230. URL: http://www.aaai.org/Library/KR/2008/kr08-022.php.
[13] B. Glimm, Y. Kazakov, M. Welt, Concept Abduction for Description Logics – Technical Report, Technical Report, Ulm University, Institute of Artificial Intelligence, 2022. https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.090/Publikationen/2022/GlKW22a_report.pdf.
[14] M. KrΓΆtzsch, F. Simančík, I. Horrocks, A description logic primer, CoRR abs/1201.4089 (2012). Available at http://arxiv.org/pdf/1201.4089.pdf.
[15] T. Eiter, G. Gottlob, The complexity of logic-based abduction, J. ACM 42 (1995) 3–42.
URL: https://doi.org/10.1145/200836.200838. doi:10.1145/200836.200838. [16] F. Baader, S. Brandt, C. Lutz, Pushing the β„°β„’ envelope further, in: Proc. 5th Workshop on OWL: Experiences and Directions (OWLED’08), volume 496, CEUR, 2008. [17] K. Schild, A correspondence theory for terminological logics: Preliminary report, in: J. Mylopoulos, R. Reiter (Eds.), Proc. 12th Int. Joint Conf. on Artificial Intelligence (IJCAI’91), Morgan Kaufmann, 1991, pp. 466–471. [18] F. M. Donini, F. Massacci, EXPTIME tableaux for π’œβ„’π’ž, J. of Artificial Intelligence 124 (2000) 87–138. [19] R. Greiner, B. A. Smith, R. W. Wilkerson, Readings in model-based diagnosis, Morgan Kaufmann Publishers Inc., 1992, pp. 49–53.