Concept Abduction for Description Logics

Birte Glimm, Yevgeny Kazakov and Michael Welt
Ulm University, Germany

DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel

Abstract
We present two alternative algorithms for computing (all or some) solutions to the concept abduction problem: one algorithm is based on Reiter's hitting set tree algorithm, whereas the other one relies on a SAT encoding. In contrast to previous work, the algorithms do not rely on a refutation-based calculus and, hence, can also be used with efficient reasoners for tractable DLs such as β„°β„’ and its extensions. An adaptation to other forms of (logic-based) abduction, e.g., to ABox abduction, is also possible.

Keywords: Description Logics, abduction, ontologies, reasoning

1. Introduction

The reasoning service of abduction, as originally introduced by Peirce [1], is a form of reasoning that is used to generate possible explanations for a given observation. In logic-based abduction, the objective is to find a set of hypotheses (the explanation) which are consistent with a background theory and which logically entail the observation when taken together with the background knowledge. Often some minimality criterion, e.g., set inclusion or cardinality, is used to select preferred explanations. Abduction has numerous applications in artificial intelligence, e.g., in diagnosis, planning, or natural language understanding, and abductive reasoning has been studied intensively in the context of propositional logic (see, e.g., [2]) and also in the context of answer set programming (see, e.g., [3]). For logic-based abduction over ontologies, approaches started to emerge more recently (see, e.g., [4, 5, 6, 7]). With the exception of recent approaches based on forgetting [8, 9], practical approaches for logic-based abduction over ontologies are based on refutation-based calculi such as tableau.

Such approaches are based on the observation that the knowledge base 𝒦 does not yet entail the observation 𝑂, i.e., 𝒦 βˆͺ {¬𝑂} is satisfiable, whereas an explanation 𝐸 has to be such that 𝒦 βˆͺ 𝐸 βˆͺ {¬𝑂} is unsatisfiable since it is desired that 𝒦 βˆͺ 𝐸 |= 𝑂. Hence, open branches in a tableau for 𝒦 βˆͺ {¬𝑂} can β€œguide” the process of finding explanations as explanations have to be such that they lead to closing the open branches.

While refutation-based calculi are dominantly used for reasoning in expressive Description Logics (DLs), tractable reasoning procedures for DLs such as β„°β„’ are not based on refutation, but directly derive (logical) consequences, which prevents the use of existing approaches to abduction. We aim at closing this gap and, inspired by the work of Kazakov and Glimm [10] for computing justifications (i.e., minimal subsets of axioms from a knowledge base that are needed for a given entailment), we propose an alternative approach to efficiently compute some or all (minimal) explanations. The approach is based on Reiter's minimal hitting set algorithm [11] without relying on a refutation-based calculus to guide the algorithm. An encoding of the problem into SAT allows for better managing the combinatorial problem of selecting what constitutes an explanation.
As our goal is the development of an approach for DLs such as β„°β„’, where the main reasoning task involves determining concept subsumptions, we focus on concept abduction [12] as introduced next. The accompanying technical report gives further details, examples, and complete proofs [13].

2. Preliminaries

We use the standard syntax and semantics of DLs (see, e.g., [14]). Note that the DL β„°β„’ only allows for using the top concept, conjunctions, and existential restrictions as concepts. We now introduce the basic ideas of (concept) abduction (see also [12, 15]).

Definition 1. A concept abduction problem 𝒫 is a tuple βŸ¨π’¦, β„‹, π‘‚βŸ© with 𝒦 a knowledge base, β„‹, the hypotheses, a set of atomic concepts, and 𝑂, the observation, a single atomic concept.ΒΉ A set 𝐸 = {𝐴1, . . . , 𝐴𝑛} βŠ† β„‹ is an explanation for 𝒫 if 𝒦 |= 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂. Such an 𝐸 is explanatory if 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂 βˆ‰ 𝒦, 𝐸 is satisfiable if 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 is satisfiable w.r.t. 𝒦, 𝐸 is relevant if βˆ… ΜΈ|= 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂, and 𝐸 is (syntactically) minimal if there is no other explanation 𝐸′ for 𝒫 such that 𝐸′ ⊊ 𝐸.

Note that the given set of hypotheses allows for restricting the set of concepts that can be used in explanations, but it may also contain all concepts used in the knowledge base. In the remainder, we are interested in finding explanations that are explanatory, satisfiable, relevant, and (syntactically) minimal. Abusing notation, for a set of concepts 𝐸 = {𝐴1, . . . , 𝐴𝑛}, we also write 𝒦 |= 𝐸 βŠ‘ 𝑂 instead of 𝒦 |= 𝐴1 βŠ“ . . . βŠ“ 𝐴𝑛 βŠ‘ 𝑂.

Example 1. Consider the knowledge base 𝒦 containing the axioms

𝐴 βŠ‘ βˆƒπ‘Ÿ.𝐡    (1)
βˆƒπ‘Ÿ.𝐡 βŠ‘ 𝐢    (2)
𝐢 βŠ“ 𝐢′ βŠ‘ 𝐷    (3)
𝐴 βŠ“ 𝐡 βŠ‘ βŠ₯    (4)

and the abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© with β„‹ = {𝐴, 𝐡, 𝐢, 𝐢′} and 𝑂 = 𝐷. From Axiom (1) and Axiom (2), we have 𝒦 |= 𝐴 βŠ‘ 𝐢. Hence, together with Axiom (3), we find that 𝒦 |= 𝐴 βŠ“ 𝐢′ βŠ‘ 𝐷 = 𝑂.
Among others, we have the following explanations for 𝒫:

𝐸1 = {𝐴, 𝐢′}    𝐸2 = {𝐢, 𝐢′}    𝐸3 = {𝐴, 𝐡}    𝐸4 = {𝐴, 𝐢, 𝐢′}

Among these, only 𝐸1 is explanatory, relevant, satisfiable, and minimal. We have that 𝐸2 is not explanatory since 𝐢 βŠ“ 𝐢′ βŠ‘ 𝐷 ∈ 𝒦, and 𝐸3 is not satisfiable (due to Axiom (4)). Finally, 𝐸4 is not minimal since 𝐸1 ⊊ 𝐸4. Note that {𝐷} is not an explanation since 𝐷 βˆ‰ β„‹, but even if it were, the explanation would not be relevant since βˆ… |= 𝐷 βŠ‘ 𝐷.

ΒΉ Note that requiring the observation to be an atomic concept is without loss of generality since, for an observation in the form of a complex concept 𝐢, we can simply introduce an axiom 𝑂 ≑ 𝐢 and use 𝑂 as the observation. We can proceed analogously for hypotheses.

Algorithm 1: Finding one explanation
Minimize(𝒦, β„‹, 𝑂): compute a minimal explanation for the concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ©
input: a knowledge base 𝒦, a set of hypotheses β„‹, and an observation 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂
output: a minimal explanation 𝐸 βŠ† β„‹ such that 𝒦 |= 𝐸 βŠ‘ 𝑂 (cf. Definition 1)
1  𝐸 ← β„‹;
2  for 𝐴 ∈ β„‹ do
3      if 𝒦 |= (𝐸 βˆ– {𝐴}) βŠ‘ 𝑂 then
4          𝐸 ← 𝐸 βˆ– {𝐴};
5  return 𝐸;

3. Computing Abductive Explanations

Before proposing our approach, we point out that the number of (minimal) explanations for a given observation may be exponential in the size of the knowledge base. We start by showing how to compute one explanation before generalizing the approach to compute all explanations, first using hitting set trees and then via a SAT encoding.

3.1. Computing One Abductive Explanation

Given a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©, a naive algorithm for computing one explanation 𝐸 such that 𝒦 |= 𝐸 βŠ‘ 𝑂 is relatively easy. If 𝒦 ΜΈ|= β„‹ βŠ‘ 𝑂, there is no satisfiable explanation and we can stop. Otherwise, we start from 𝐸 = β„‹ and repeatedly remove concepts from 𝐸 as long as this does not break the entailment 𝒦 |= 𝐸 βŠ‘ 𝑂.
At a certain point, no concept can be removed without breaking the entailment, which implies that 𝐸 is a minimal explanation w.r.t. 𝒫. Algorithm 1 summarizes this idea. Note that Algorithm 1 makes calls to a reasoning service for concept subsumption checking without relying on a particular kind of procedure (e.g., tableau-based as well as consequence-based procedures may be used). It remains to check whether the returned 𝐸 is explanatory, satisfiable, and relevant, which requires at most two further subsumption checks and a check for set containment.

The correctness of Algorithm 1 relies on the fact that a conjunction with additional conjuncts is more specific:

Lemma 1. Let 𝒦 be a DL knowledge base, 𝐸 and 𝐸′ two sets of atomic concepts such that 𝐸′ βŠ† 𝐸, and 𝑂 an atomic concept. Then 𝒦 |= 𝐸′ βŠ‘ 𝑂 implies 𝒦 |= 𝐸 βŠ‘ 𝑂.

Proof Sketch. Intuitively, the interpretation of a conjunction is more restricted the more conjuncts the conjunction contains.

Note that Lemma 1 only relies on the semantics of conjunctions and subsumption, which is shared among all DLs. We show that Algorithm 1 is correct under the assumed semantics.

Theorem 1. Let 𝐸 be the output of Algorithm 1 for the input 𝒦, β„‹, and 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂. Then 𝐸 is a minimal explanation for the abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ©.

Proof Sketch. Since 𝒦 |= β„‹ βŠ‘ 𝑂 by assumption, 𝐸 is initialized such that 𝒦 |= 𝐸 βŠ‘ 𝑂 (Line 1). Since we only remove a concept from 𝐸 (Line 4) if the subsumption still holds (Line 3), 𝒦 |= 𝐸 βŠ‘ 𝑂 is preserved. We can further show that 𝐸 is a minimal explanation as unneeded conjuncts are removed step by step.

Finally, observe that a run of Algorithm 1 requires exactly 𝑛 subsumption tests, where 𝑛 = ||β„‹||, which is bounded by the number of concepts in 𝒦 (two further subsumption tests and a set containment test are needed for checking whether 𝐸 is relevant, satisfiable, and explanatory).
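To make the procedure concrete, Algorithm 1 can be sketched in a few lines of Python. The function `entails` is a stand-in for a call to a DL reasoner deciding 𝒦 |= βŠ“πΈ βŠ‘ 𝑂; to obtain a runnable example, we hard-wire a toy saturation oracle for the knowledge base of Example 1. The oracle and all names are our own illustration, not part of an actual implementation:

```python
def entails(premises, goal):
    """Toy subsumption oracle for the K of Example 1:
    A <= exists r.B, exists r.B <= C, C and C' <= D, A and B <= bottom.
    Decides K |= (conjunction of premises) <= goal by saturation."""
    derived = set(premises)
    while True:
        new = set(derived)
        if "A" in derived:
            new.add("r.B")               # A <= exists r.B
        if "r.B" in derived:
            new.add("C")                 # exists r.B <= C
        if {"C", "C'"} <= derived:
            new.add("D")                 # C and C' <= D
        if {"A", "B"} <= derived:
            new.add("bot")               # A and B <= bottom
        if new == derived:
            break
        derived = new
    return goal in derived or "bot" in derived  # bottom is subsumed by everything

def minimize(entails, hypotheses, observation):
    """Algorithm 1: greedily remove concepts while the subsumption survives."""
    explanation = list(hypotheses)
    for concept in list(hypotheses):
        candidate = [c for c in explanation if c != concept]
        if entails(candidate, observation):
            explanation = candidate
    return set(explanation)
```

Running `minimize(entails, ["A", "B", "C", "C'"], "D")` returns {"C", "C'"} (= 𝐸2 of Example 1), while the enumeration order ["C", "C'", "A", "B"] yields {"A", "B"} (= 𝐸3): the result depends on the order in which the for-loop visits the concepts.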
Hence, the complexity of computing one minimal explanation is bounded by a linear function over the complexity of subsumption checking. In particular, for tractable languages such as β„°β„’ and its tractable extensions [16], one (minimal) explanation for the concept abduction problem can be computed in polynomial time, which is worst-case optimal [12].Β² For DLs where subsumption checking is ExpTime-complete, such as π’œβ„’π’ž [17, 18], the algorithm requires worst-case exponential time.

3.2. Computing All Abductive Explanations

An explanation for a concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ© consists of concepts whose conjunction is subsumed by the observation. As we have seen in Example 1, there can be several different explanations. To find further explanations, it is therefore necessary to omit a concept from every explanation of βŸ¨π’¦, β„‹, π‘‚βŸ© found so far, and the question arises of how to compute all (minimal) explanations.

Note first that the output of Algorithm 1 depends on the order in which the concepts in β„‹ are enumerated in the for-loop (Line 2). Different orders of the concepts can result in different removals and, consequently, different explanations.

Lemma 2. For each (minimal) explanation 𝐸 of a concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ©, there exists some order of concepts in β„‹ for which Algorithm 1 with the input 𝒦, β„‹, and 𝑂 returns 𝐸.

Proof Sketch. Assume, to the contrary of what is to be shown, that a minimal explanation πΈπ‘˜ is not returned. We can show that processing first the concepts in β„‹ βˆ– πΈπ‘˜ and then the concepts from πΈπ‘˜ leads to a contradiction as desired.

Β² Note that although subsumption checking for β„°β„’++ is tractable, the complexity of checking whether an explanation exists is shown to be NP-complete by Bienvenu [12]. This apparent contradiction arises because unsatisfiable explanations (which can only occur in logics that can express unsatisfiable concepts/βŠ₯, such as β„°β„’++) are excluded as explanations in the definition of Bienvenu.
Our algorithm, however, might terminate with an unsatisfiable explanation. If that has to be avoided, some form of backtracking would be needed for such cases, which indeed leads to a higher complexity of the problem.

Figure 1: Two HS-trees for the concept abduction problem from Example 1

The property stated in Lemma 2 means that for computing all explanations for 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©, it is sufficient to run Algorithm 1 for all possible orders of concepts in β„‹. As the number of explanations can be exponential in 𝑛, the exponential behavior of an algorithm for computing all explanations cannot be avoided in general. Unfortunately, the described algorithm is not very practical since it performs exponentially many subsumption tests for all inputs, even if, e.g., 𝒫 has just one explanation. This is because the algorithm is not goal-directed: the computation of each next explanation does not depend on the explanations computed before. Hence, the question arises of how we can find a more goal-directed algorithm.

Suppose that we have computed an explanation 𝐸1 using Algorithm 1. The next explanation 𝐸2 must be different from 𝐸1, so 𝐸2 should miss at least one concept from 𝐸1. Hence, the next explanation 𝐸2 can be found by finding 𝐴1 ∈ 𝐸1 such that 𝒦 |= (β„‹ βˆ– {𝐴1}) βŠ‘ 𝑂 and calling Algorithm 1 for the input 𝒦, β„‹ βˆ– {𝐴1}, and 𝑂. The next explanation 𝐸3, similarly, should miss something from 𝐸1 and something from 𝐸2, so it can be found by finding some 𝐴1 ∈ 𝐸1 and 𝐴2 ∈ 𝐸2 such that 𝒦 |= (β„‹ βˆ– {𝐴1, 𝐴2}) βŠ‘ 𝑂 and calling Algorithm 1 for the input 𝒦, β„‹ βˆ– {𝐴1, 𝐴2}, and 𝑂.
In general, when explanations 𝐸𝑖 (1 ≀ 𝑖 ≀ π‘˜) have been computed, the next explanation can be found by calling Algorithm 1 for the input 𝒦, β„‹ βˆ– {𝐴𝑖 | 1 ≀ 𝑖 ≀ π‘˜}, and 𝑂 such that 𝐴𝑖 ∈ 𝐸𝑖 (1 ≀ 𝑖 ≀ π‘˜) and 𝒦 |= (β„‹ βˆ– {𝐴𝑖 | 1 ≀ 𝑖 ≀ π‘˜}) βŠ‘ 𝑂. The enumeration of the subsets β„‹ βˆ– {𝐴𝑖 | 1 ≀ 𝑖 ≀ π‘˜} can be organized using a data structure called a hitting set tree.

Definition 2. Let 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© be a concept abduction problem. A hitting set tree (short: HS-tree) for 𝒫 is a labeled tree 𝑇 = (𝑉, 𝐸, 𝐿) with 𝑉 ΜΈ= βˆ… such that:

1. each non-leaf node 𝑣 ∈ 𝑉 is labeled with an explanation 𝐿(𝑣) = 𝐸 for 𝒫 and, for each 𝐴 ∈ 𝐸, 𝑣 has an outgoing edge βŸ¨π‘£, π‘€βŸ© ∈ 𝐸 with label 𝐿(𝑣, 𝑀) = 𝐴, and
2. each leaf node 𝑣 ∈ 𝑉 is labeled by a special symbol 𝐿(𝑣) = βŠ₯.

For each 𝑣 ∈ 𝑉, let 𝐻(𝑣) be the set of edge labels appearing on the path from 𝑣 to the root node of 𝑇. Then the following properties should additionally hold:

3. for each non-leaf node 𝑣 ∈ 𝑉, we have 𝐿(𝑣) ∩ 𝐻(𝑣) = βˆ…, and
4. for each leaf node 𝑣 ∈ 𝑉, we have 𝒦 ΜΈ|= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂.

Figure 1 shows an example of two different HS-trees for the concept abduction problem from Example 1. The following example shows that an explanation might occur more than once in an HS-tree:

Figure 2: An HS-tree for the concept abduction problem from Example 2

Example 2. Consider the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© with 𝒦 = {𝐴′ βŠ‘ 𝐴, 𝐴′ βŠ“ 𝐢 βŠ‘ βŠ₯, 𝐴 βŠ“ 𝐡 βŠ‘ 𝐷}, β„‹ = {𝐴, 𝐴′, 𝐡, 𝐢}, and 𝑂 = 𝐷, which has the (minimal) explanations 𝐸1 = {𝐴′, 𝐢} (not satisfiable), 𝐸2 = {𝐴, 𝐡} (not explanatory), and 𝐸3 = {𝐴′, 𝐡} (satisfiable, explanatory, and relevant). Figure 2 shows an HS-tree for 𝒫, where the explanation 𝐸2 = {𝐴, 𝐡} labels two different nodes of the tree.

We next prove that every HS-tree must contain every explanation at least once.

Lemma 3.
Let 𝑇 = (𝑉, 𝐸, 𝐿) be an HS-tree for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©. Then, for each explanation 𝐸 for 𝒫, there exists a node 𝑣 ∈ 𝑉 such that 𝐿(𝑣) = 𝐸.

Proof Sketch. Using Lemma 1 and Conditions 1, 3 and 4 of Definition 2, we can show that a node 𝑣 ∈ 𝑉 for which 𝐻(𝑣) is maximal (w.r.t. set inclusion) and 𝐻(𝑣) ∩ 𝐸 = βˆ… is such that 𝐿(𝑣) = 𝐸.

We next show that each HS-tree 𝑇 = (𝑉, 𝐸, 𝐿) for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© has at most exponentially many nodes in the number of concepts in β„‹.

Lemma 4. Every HS-tree 𝑇 for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© has at most βˆ‘_{0β‰€π‘˜β‰€π‘›} 𝑛^π‘˜ nodes, where 𝑛 is the number of concepts in β„‹.

Proof Sketch. By analyzing Conditions 1 and 3 of Definition 2, we can show that both the depth and the branching factor of 𝑇 are bounded by ||β„‹||, which gives the desired bound.

For constructing an HS-tree 𝑇 = (𝑉, 𝐸, 𝐿) for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©, we can use Reiter's Hitting Set Tree algorithm (short: HST-algorithm) [11, 19]: We start by creating the root node 𝑣0 ∈ 𝑉. Then we repeatedly assign labels to nodes and edges as follows. For each 𝑣 ∈ 𝑉, if 𝐿(𝑣) was not yet assigned, we calculate 𝐻(𝑣). If 𝒦 ΜΈ|= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂, we set 𝐿(𝑣) = βŠ₯ according to Condition 4 of Definition 2. Otherwise, we compute an explanation 𝐸 for 𝒦 |= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂 using Algorithm 1 and set 𝐿(𝑣) = 𝐸. Note that 𝐸 satisfies Condition 3 of Definition 2 since 𝐸 βŠ† (β„‹ βˆ– 𝐻(𝑣)). Next, for each 𝐴 ∈ 𝐸, we create a successor node 𝑀 of 𝑣 and set 𝐿(𝑣, 𝑀) = 𝐴. This ensures that Condition 1 of Definition 2 is satisfied for 𝑣. Since, by Lemma 4, 𝑇 has a bounded number of nodes, this process eventually terminates.
Algorithm 2: Computing all explanations by the Hitting Set Tree algorithm
ComputeExplanationsHST(𝒦, β„‹, 𝑂): compute all explanations for βŸ¨π’¦, β„‹, π‘‚βŸ©
input: a knowledge base 𝒦, a set of hypotheses β„‹, and an observation 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂
output: the set of all (minimal) subsets 𝐸 βŠ† β„‹ such that 𝒦 |= 𝐸 βŠ‘ 𝑂
1  𝑆 ← βˆ…;
2  𝑄 ← {βˆ…};
3  while 𝑄 ΜΈ= βˆ… do
4      𝐻 ← choose 𝐻 ∈ 𝑄;
5      𝑄 ← 𝑄 βˆ– {𝐻};
6      if 𝒦 |= (β„‹ βˆ– 𝐻) βŠ‘ 𝑂 then
7          𝐸 ← Minimize(𝒦, β„‹ βˆ– 𝐻, 𝑂);
8          𝑆 ← 𝑆 βˆͺ {𝐸};
9          for 𝐴 ∈ 𝐸 do
10             𝑄 ← 𝑄 βˆͺ {𝐻 βˆͺ {𝐴}};
11 return 𝑆;

Note that unlike the algorithm sketched in Lemma 2, the input for each call of Algorithm 1 now depends on the results returned by the previous calls. The main idea of the HST-algorithm is to systematically compute two kinds of sets: (1) explanations 𝐸 for the concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ© and (2) sets 𝐻 that contain one element from each explanation 𝐸 on a branch. The name of the algorithm comes from the notion of a hitting set, which characterizes the latter sets.

Definition 3. Let 𝑃 be a set of sets of some elements. A set 𝐻 is a hitting set for 𝑃 if 𝐻 ∩ 𝑆 ΜΈ= βˆ… for each 𝑆 ∈ 𝑃. A hitting set 𝐻 for 𝑃 is minimal if no 𝐻′ ⊊ 𝐻 is a hitting set for 𝑃.

Intuitively, a hitting set for 𝑃 is a set 𝐻 that contains at least one element from every set 𝑆 ∈ 𝑃. An HS-tree is then a tree 𝑇 = (𝑉, 𝐸, 𝐿) such that for each 𝑣 ∈ 𝑉, 𝐻(𝑣) is a hitting set of the set of explanations on the path from 𝑣 to the root of 𝑇. The leaf nodes 𝑣 of 𝑇 correspond to hitting sets 𝐻(𝑣) such that 𝒦 ΜΈ|= (β„‹ βˆ– 𝐻(𝑣)) βŠ‘ 𝑂. Intuitively, the set 𝐻(𝑣) represents a set whose removal from β„‹ breaks the subsumption. We get the following bound on the number of subsumption tests performed by the HST-algorithm:

Lemma 5. An HS-tree for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© can be constructed using at most βˆ‘_{1β‰€π‘˜β‰€π‘›+1} 𝑛^π‘˜ subsumption tests, where 𝑛 is the number of concepts in β„‹.

Proof.
We call Algorithm 1 exactly once per node. Combined with Lemma 4, we get the desired bound on the number of subsumption tests performed by the HST-algorithm.

The HST-algorithm can further be optimized in several ways. First, it is not necessary to store the complete HS-tree in memory. For computing an explanation 𝐸 at each node 𝑣, it is sufficient to know just the set 𝐻(𝑣). For each successor 𝑀 of 𝑣 associated with some 𝐴 ∈ 𝐿(𝑣), the set 𝐻(𝑀) can be computed as 𝐻(𝑀) = 𝐻(𝑣) βˆͺ {𝐴}. Hence, it is possible to compute all explanations by recursively processing and creating the sets 𝐻(𝑣) as shown in Algorithm 2.

The algorithm saves all explanations in a set 𝑆, which is initially empty (Line 1). The explanations are computed by processing the sets 𝐻(𝑣); the sets that are not yet processed are stored in the queue 𝑄, which initially contains 𝐻(𝑣0) = βˆ… for the root node 𝑣0 (Line 2). The elements of 𝑄 are then repeatedly processed in a loop (Lines 3–10) until 𝑄 becomes empty. First, we choose any 𝐻 ∈ 𝑄 (Line 4) and remove it from 𝑄 (Line 5). Then, we test whether 𝒦 |= (β„‹ βˆ– 𝐻) βŠ‘ 𝑂 (Line 6). If the subsumption holds, the corresponding node 𝑣 of the HS-tree with 𝐻(𝑣) = 𝐻 is not a leaf node. We then compute an explanation 𝐸 using Algorithm 1 and add it to 𝑆 (Lines 7–8). Further, for each 𝐴 ∈ 𝐸, we create the set 𝐻(𝑀) = 𝐻(𝑣) βˆͺ {𝐴} for the corresponding successor node 𝑀 of 𝑣 and add 𝐻(𝑀) to 𝑄 for later processing (Lines 9–10). If the subsumption 𝒦 |= (β„‹ βˆ– 𝐻) βŠ‘ 𝑂 does not hold, we have reached a leaf of the HS-tree and no further children of this node should be created.

Lemma 6. Given 𝒦, β„‹, and 𝑂 as input, Algorithm 2 returns all explanations for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©.

Proof Sketch. We can prove the claim by showing that the following invariant always holds in the main loop (Lines 3–10): If 𝐸 is an explanation for 𝒫, then either 𝐸 ∈ 𝑆 or there exists 𝐻 ∈ 𝑄 such that 𝐸 βŠ† β„‹ βˆ– 𝐻.
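As a concrete illustration, Algorithm 2 can be sketched in Python as follows. A toy saturation oracle for the knowledge base of Example 1 again stands in for the reasoner call; the oracle and all names are our own illustration, not part of an actual implementation:

```python
def entails(premises, goal):
    """Toy subsumption oracle for the K of Example 1 (stand-in for a reasoner)."""
    derived = set(premises)
    while True:
        new = set(derived)
        if "A" in derived: new.add("r.B")          # A <= exists r.B
        if "r.B" in derived: new.add("C")          # exists r.B <= C
        if {"C", "C'"} <= derived: new.add("D")    # C and C' <= D
        if {"A", "B"} <= derived: new.add("bot")   # A and B <= bottom
        if new == derived:
            return goal in derived or "bot" in derived
        derived = new

def minimize(hypotheses, observation):
    """Algorithm 1 (Minimize): shrink to a minimal explanation."""
    explanation = list(hypotheses)
    for concept in list(hypotheses):
        candidate = [c for c in explanation if c != concept]
        if entails(candidate, observation):
            explanation = candidate
    return frozenset(explanation)

def compute_explanations_hst(hypotheses, observation):
    """Algorithm 2: process the sets H(v) of the HS-tree via a queue."""
    explanations = set()                 # S
    queue = [frozenset()]                # Q, with H(v0) = {} for the root
    while queue:
        h = queue.pop()                  # choose and remove some H from Q
        rest = [c for c in hypotheses if c not in h]
        if entails(rest, observation):   # non-leaf node: extract an explanation
            e = minimize(rest, observation)
            explanations.add(e)
            for concept in e:            # one successor per concept in E
                queue.append(h | {concept})
    return explanations
```

For Example 1, `compute_explanations_hst(["A", "B", "C", "C'"], "D")` returns the three minimal explanations {𝐴, 𝐢′}, {𝐢, 𝐢′}, and {𝐴, 𝐡} (cf. Lemma 3); duplicate occurrences on different branches are absorbed by the set 𝑆.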
Note that, as before, checking whether a returned explanation is explanatory, satisfiable, and relevant requires at most two subsumption checks and a set membership check.

3.3. Computing Abductive Explanations using SAT Solvers

The main idea of the HST-algorithm is to systematically compute two kinds of sets: (1) explanations 𝐸 for a concept abduction problem βŸ¨π’¦, β„‹, π‘‚βŸ© and (2) hitting sets 𝐻 that contain one element from each explanation 𝐸 on a branch. Intuitively, for a leaf node 𝑣, 𝐻(𝑣) represents a set whose removal from β„‹ breaks the subsumption, which means that every explanation on the path to 𝑣 is also a minimal hitting set for 𝐻(𝑣). This property is called the hitting set duality and it takes a prominent place in the HST-algorithm. We can, however, also use this property as the basis of a direct algorithm for computing abductive explanations.

Suppose that we have already computed some set 𝑆 of explanations and some set 𝑃 of hitting sets for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©. How can we find a new explanation? As mentioned, each new explanation must be a hitting set for 𝑃, i.e., it should contain one concept from every set in 𝑃. Furthermore, it should be different from any of the previously computed explanations, i.e., it should miss one concept from every 𝐸 ∈ 𝑆. Suppose we have found a subset 𝑀 βŠ† β„‹ satisfying these two requirements:

βˆ€π‘… ∈ 𝑃 : 𝑀 ∩ 𝑅 ΜΈ= βˆ…,    (5)
βˆ€πΈ ∈ 𝑆 : 𝐸 βˆ– 𝑀 ΜΈ= βˆ….    (6)

If 𝒦 |= 𝑀 βŠ‘ 𝑂, then, using Algorithm 1, we can extract a minimal subset 𝐸′ βŠ† 𝑀 such that 𝒦 |= 𝐸′ βŠ‘ 𝑂. Note that 𝐸′ still misses at least one concept from each 𝐸 ∈ 𝑆 since (6) is preserved under removal of concepts from 𝑀. Therefore, 𝐸′ is a new explanation for 𝒫.
If 𝒦 ΜΈ|= 𝑀 βŠ‘ 𝑂, then, similarly, by adding concepts 𝐴 ∈ β„‹ to 𝑀 while preserving 𝒦 ΜΈ|= 𝑀 βŠ‘ 𝑂, we can find a maximal superset 𝑁 of 𝑀 (𝑀 βŠ† 𝑁 βŠ† β„‹) such that 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂: see Algorithm 3. Note that (5) is preserved under additions of elements to 𝑀. Thus, using any set 𝑀 satisfying (5) and (6), we can find either a new explanation or a new minimal hitting set.

Algorithm 3: Maximizing non-subsumption
Maximize(𝒦, β„‹, 𝑀, 𝑂): compute a maximal subset 𝑁 βŠ† β„‹ such that 𝑀 βŠ† 𝑁 and 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂
input: a knowledge base 𝒦, a set of hypotheses β„‹, a subset 𝑀 βŠ† β„‹, and an observation 𝑂 such that 𝒦 ΜΈ|= 𝑀 βŠ‘ 𝑂
output: 𝑁 with 𝑀 βŠ† 𝑁 βŠ† β„‹ such that 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂 but 𝒦 |= 𝑁′ βŠ‘ 𝑂 for every 𝑁′ with 𝑁 ⊊ 𝑁′ βŠ† β„‹
1  𝑁 ← 𝑀;
2  for 𝐴 ∈ β„‹ βˆ– 𝑀 do
3      if 𝒦 ΜΈ|= (𝑁 βˆͺ {𝐴}) βŠ‘ 𝑂 then
4          𝑁 ← 𝑁 βˆͺ {𝐴};
5  return 𝑁;

The question arises of how we can find a set 𝑀 satisfying Conditions (5) and (6). These conditions encode a rather complex combinatorial problem, for which propositional satisfiability (SAT) solvers offer a convenient and effective tool. In the following, we describe a propositional encoding of Conditions (5) and (6).

To formulate the propositional encoding, we assign to each concept 𝐴 ∈ β„‹ a fresh propositional variable 𝑝𝐴. Then, every propositional interpretation ℐ determines a set 𝑀 = 𝑀(ℐ) = {𝐴 ∈ β„‹ | 𝑝𝐴^ℐ = 1} of concepts whose corresponding propositional variable is true. We construct a propositional formula 𝐹 such that 𝐹^ℐ = 1 if and only if 𝑀(ℐ) satisfies (5) and (6) for the given sets 𝑆 of explanations and 𝑃 of minimal hitting sets. Thus, to find a subset 𝑀 satisfying (5) and (6), it is sufficient to find a model ℐ of 𝐹 and compute 𝑀(ℐ). We define 𝐹 as follows:

𝐹 = 𝐹(𝑆, 𝑃) = β‹€_{𝐸 ∈ 𝑆} ⋁_{𝐴 ∈ 𝐸} ¬𝑝𝐴 ∧ β‹€_{𝑅 ∈ 𝑃} ⋁_{𝐴 ∈ 𝑅} 𝑝𝐴.    (7)

Example 3. Let 𝒦 be the ontology from Example 1. We assign the propositional variables 𝑝𝐴, 𝑝𝐡, 𝑝𝐢, 𝑝𝐢′ to the concepts 𝐴, 𝐡, 𝐢, 𝐢′, respectively.
Let 𝑆 = {{𝐴, 𝐢′}, {𝐢, 𝐢′}} be the set of the explanations 𝐸1 and 𝐸2 from Example 1 and 𝑃 a set containing {𝐴, 𝐢}, i.e., assume we have constructed the left-most branch of the HS-tree on the left-hand side of Figure 1. Then, according to (7), we have:

𝐹 = 𝐹(𝑆, 𝑃) = (¬𝑝𝐴 ∨ ¬𝑝𝐢′) ∧ (¬𝑝𝐢 ∨ ¬𝑝𝐢′) ∧ (𝑝𝐴 ∨ 𝑝𝐢).

𝐹 has a model ℐ with 𝑝𝐴^ℐ = 1 and 𝑝𝐡^ℐ = 𝑝𝐢^ℐ = 𝑝𝐢′^ℐ = 0, which gives 𝑀(ℐ) = {𝐴}.

Once the set 𝑀 determined by a model ℐ of 𝐹 is found, we can extract either a new explanation 𝐸 or a new minimal hitting set 𝑅 from 𝑀 by minimizing the subsumee using Algorithm 1 or maximizing non-subsumption using Algorithm 3. After that, we can update 𝐹 according to (7) and compute a new model of 𝐹, if any exists. Once 𝐹 is unsatisfiable, 𝑆 contains all explanations for 𝒫 and 𝑃 contains all minimal hitting sets.

Algorithm 4: Computing all explanations using a SAT solver
ComputeExplanationsSAT(𝒦, β„‹, 𝑂): compute all explanations for the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ©
input: a knowledge base 𝒦, a set of hypotheses β„‹, and an observation 𝑂 such that 𝒦 |= β„‹ βŠ‘ 𝑂
output: the set of all minimal subsets 𝐸 βŠ† β„‹ such that 𝒦 |= 𝐸 βŠ‘ 𝑂
1  𝑆 ← βˆ…;
2  𝐹 ← ⊀;
3  while βˆƒ ℐ : 𝐹^ℐ = 1 do
4      ℐ ← choose ℐ : 𝐹^ℐ = 1;
5      𝑀 ← {𝐴 | 𝑝𝐴^ℐ = 1};
6      if 𝒦 |= 𝑀 βŠ‘ 𝑂 then
7          𝐸 ← Minimize(𝒦, 𝑀, 𝑂);
8          𝑆 ← 𝑆 βˆͺ {𝐸};
9          𝐹 ← 𝐹 ∧ ⋁{¬𝑝𝐴 | 𝐴 ∈ 𝐸};
10     else
11         𝑁 ← Maximize(𝒦, β„‹, 𝑀, 𝑂);
12         𝐹 ← 𝐹 ∧ ⋁{𝑝𝐴 | 𝐴 ∈ β„‹ βˆ– 𝑁};
13 return 𝑆;

Algorithm 4 summarizes the described procedure for computing all explanations for a concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© using a SAT solver. We start by creating an empty set 𝑆 of explanations (Line 1) and a formula 𝐹 that is always true (Line 2). Then, in a loop (Lines 3–12), as long as 𝐹 is satisfiable (which is checked using a SAT solver), we take any model ℐ of 𝐹 (Line 4), extract the corresponding set 𝑀 = 𝑀(ℐ) that it defines (Line 5), and check the subsumption 𝒦 |= 𝑀 βŠ‘ 𝑂 (Line 6).
If the entailment holds, we compute an explanation for 𝒦 |= 𝑀 βŠ‘ 𝑂 using Algorithm 1 (Line 7), which, by Lemma 1, is an explanation for 𝒫. This explanation is then added to 𝑆 (Line 8) and 𝐹 is extended with a new conjunct for this explanation according to (7) (Line 9). If the entailment does not hold, we compute a maximal superset 𝑁 of 𝑀 such that 𝒦 ΜΈ|= 𝑁 βŠ‘ 𝑂 using Algorithm 3 (Line 11) and extend 𝐹 with the corresponding conjunct for the new minimal hitting set 𝑅 = β„‹ βˆ– 𝑁 according to (7) (Line 12). As soon as 𝐹 becomes unsatisfiable, we return the set 𝑆 of computed explanations (Line 13).

Example 4. Consider the concept abduction problem 𝒫 = βŸ¨π’¦, β„‹, π‘‚βŸ© from Example 1 and the propositional encoding of the concepts in β„‹ from Example 3. The following table shows a run of Algorithm 4 for the inputs 𝒦, β„‹, and 𝑂. Every row in this table corresponds to one iteration of the while-loop (Lines 3–12). The first column gives the interpretation ℐ of 𝐹 computed in this iteration. The second column shows the value of 𝑀 computed for this interpretation and whether the entailment 𝒦 |= 𝑀 βŠ‘ 𝑂 holds. The third column shows the result of minimizing the subsumee or maximizing the non-subsumption using Algorithms 1 and 3. The last column shows the conjunct that is added to 𝐹 for the corresponding explanation or minimal hitting set.

𝑝𝐴^ℐ 𝑝𝐡^ℐ 𝑝𝐢^ℐ 𝑝𝐢′^ℐ | 𝒦 |=? 𝑀 βŠ‘ 𝑂          | 𝒦 |= min(𝑀) βŠ‘ 𝑂 / 𝒦 ΜΈ|= max(𝑀) βŠ‘ 𝑂 | new conjunct
0    0    0    0     | 𝒦 ΜΈ|= ⊀ βŠ‘ 𝑂          | 𝒦 ΜΈ|= {𝐴, 𝐢} βŠ‘ 𝑂                   | 𝑝𝐡 ∨ 𝑝𝐢′
0    1    0    0     | 𝒦 ΜΈ|= {𝐡} βŠ‘ 𝑂        | 𝒦 ΜΈ|= {𝐡, 𝐢} βŠ‘ 𝑂                   | 𝑝𝐴 ∨ 𝑝𝐢′
1    1    0    0     | 𝒦 |= {𝐴, 𝐡} βŠ‘ 𝑂      | 𝒦 |= {𝐴, 𝐡} βŠ‘ 𝑂                    | ¬𝑝𝐴 ∨ ¬𝑝𝐡
1    0    0    1     | 𝒦 |= {𝐴, 𝐢′} βŠ‘ 𝑂     | 𝒦 |= {𝐴, 𝐢′} βŠ‘ 𝑂                   | ¬𝑝𝐴 ∨ ¬𝑝𝐢′
0    0    1    1     | 𝒦 |= {𝐢, 𝐢′} βŠ‘ 𝑂     | 𝒦 |= {𝐢, 𝐢′} βŠ‘ 𝑂                   | ¬𝑝𝐢 ∨ ¬𝑝𝐢′
0    1    0    1     | 𝒦 ΜΈ|= {𝐡, 𝐢′} βŠ‘ 𝑂    | 𝒦 ΜΈ|= {𝐡, 𝐢′} βŠ‘ 𝑂                  | 𝑝𝐴 ∨ 𝑝𝐢

We briefly discuss similarities and differences between Algorithm 2 and Algorithm 4. Both algorithms systematically explore subsets of β„‹ and minimize the subsumee of such subsets to compute explanations. Algorithm 2 constructs such subsets (β„‹ βˆ– 𝐻) manually, by removing one concept appearing in a previously computed explanation (if there is any) in all possible ways. Algorithm 4 enumerates such subsets 𝑀 with the help of a SAT solver. The main difference is that Algorithm 2 may encounter the same subsets many times (on different branches), whereas the propositional encoding in Algorithm 4 ensures that such subsets never repeat.

Of course, an iteration of Algorithm 2 cannot be directly compared to an iteration of Algorithm 4. Both iterations use at most one call to Algorithm 1, but Algorithm 4 may also require a call to Algorithm 3, as well as checking the satisfiability of 𝐹. The latter requires solving an NP-complete problem, for which no polynomial algorithm is known to date. In order to check the satisfiability of 𝐹, a SAT solver usually tries several (in the worst case exponentially many) propositional interpretations until a model of 𝐹 is found. As each such interpretation ℐ corresponds to a subset 𝑀(ℐ) βŠ† β„‹, this process can be compared to the enumeration of subsets in Algorithm 2. However, a SAT solver usually implements a number of sophisticated optimizations, which make the search for models very efficient in practice, whereas the subset enumeration strategy used by Algorithm 2 is rather simplistic. Hence, Algorithm 4 is likely to win in speed.
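The SAT-driven loop of Algorithm 4 can likewise be sketched in Python. For self-containedness, we replace the SAT solver by a brute-force search for a subset 𝑀 satisfying Conditions (5) and (6) (i.e., a model of 𝐹(𝑆, 𝑃)) and again use a toy subsumption oracle for Example 1; the oracle and all names are our own illustration, and a practical implementation would instead maintain 𝐹 in an (incremental) SAT solver:

```python
from itertools import combinations

def entails(premises, goal):
    """Toy subsumption oracle for the K of Example 1 (stand-in for a reasoner)."""
    derived = set(premises)
    while True:
        new = set(derived)
        if "A" in derived: new.add("r.B")          # A <= exists r.B
        if "r.B" in derived: new.add("C")          # exists r.B <= C
        if {"C", "C'"} <= derived: new.add("D")    # C and C' <= D
        if {"A", "B"} <= derived: new.add("bot")   # A and B <= bottom
        if new == derived:
            return goal in derived or "bot" in derived
        derived = new

def minimize(hypotheses, observation):
    """Algorithm 1: shrink to a minimal subsumee."""
    e = list(hypotheses)
    for c in list(hypotheses):
        if entails([x for x in e if x != c], observation):
            e = [x for x in e if x != c]
    return frozenset(e)

def maximize(hypotheses, m, observation):
    """Algorithm 3: grow M to a maximal N with K not|= N <= O."""
    n = set(m)
    for c in hypotheses:
        if c not in n and not entails(n | {c}, observation):
            n.add(c)
    return n

def compute_explanations_sat(hypotheses, observation):
    """Algorithm 4, with the SAT call replaced by brute-force model search."""
    s, p = set(), []                     # explanations S, minimal hitting sets P
    while True:
        # find some M satisfying (5) and (6), i.e., a model of F(S, P)
        m = next((set(c) for r in range(len(hypotheses) + 1)
                  for c in combinations(hypotheses, r)
                  if all(set(c) & hit for hit in p)   # condition (5)
                  and all(e - set(c) for e in s)),    # condition (6)
                 None)
        if m is None:                    # F(S, P) is unsatisfiable: done
            return s
        if entails(m, observation):      # minimize the subsumee (Algorithm 1)
            s.add(minimize(sorted(m), observation))
        else:                            # record a new minimal hitting set
            p.append(set(hypotheses) - maximize(hypotheses, m, observation))
```

For Example 1, `compute_explanations_sat(["A", "B", "C", "C'"], "D")` again yields the explanations {𝐴, 𝐡}, {𝐴, 𝐢′}, and {𝐢, 𝐢′}. Each iteration adds a genuinely new explanation or a genuinely new minimal hitting set, so the loop terminates.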
On the other hand, Algorithm 4 requires saving all explanations (and minimal hitting sets) in the propositional formula 𝐹, which might result in a formula of exponential size if the number of such explanations or hitting sets is exponential. In this regard, Algorithm 2 could be more memory efficient since saving (all) explanations is optional (see the discussion at the end of Section 3.2). Hence, both algorithms have their own advantages and disadvantages.

4. Conclusions

We have presented two alternative algorithms for computing (all or some) solutions to the concept abduction problem: one algorithm is based on Reiter's hitting set tree algorithm, whereas the other one relies on a SAT encoding. It remains to be analyzed how these algorithms behave in practice and how they differ on different real-world ontologies, where an important aspect is also finding efficient incremental SAT solvers. In contrast to previous work, the algorithms do not rely on a refutation-based calculus and, hence, can also be used with efficient reasoners for tractable DLs such as β„°β„’ and its extensions. Another direction for future work is extending the approach to other forms of (logic-based) abduction, e.g., to ABox abduction, and a comparison with other existing approaches, which mostly focus on ABox abduction.

References

[1] C. S. Peirce, Deduction, Induction, and Hypothesis, Popular Science Monthly 13 (1878) 470–482.
[2] T. Eiter, G. Gottlob, The complexity of logic-based abduction, J. ACM 42 (1995) 3–42. URL: https://doi.org/10.1145/200836.200838.
[3] M. Denecker, A. C. Kakas, Abduction in logic programming, in: A. C. Kakas, F. Sadri (Eds.), Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part I, volume 2407 of Lecture Notes in Computer Science, Springer, 2002, pp. 402–436. URL: https://doi.org/10.1007/3-540-45628-7_16.
[4] C. Elsenbroich, O. Kutz, U. Sattler, A case for abductive reasoning over ontologies, in: B. C.
Grau, P. Hitzler, C. Shankey, E. Wallace (Eds.), Proceedings of the OWLED*06 Workshop on OWL: Experiences and Directions, Athens, Georgia, USA, November 10-11, 2006, volume 216 of CEUR Workshop Proceedings, CEUR-WS.org, 2006. URL: http://ceur-ws.org/Vol-216/submission_25.pdf.
[5] S. Klarman, U. Endriss, S. Schlobach, ABox abduction in the description logic ALC, J. Autom. Reason. 46 (2011) 43–80. URL: https://doi.org/10.1007/s10817-010-9168-z.
[6] K. Halland, A. Britz, S. Klarman, TBox abduction in ALC using a DL tableau, in: M. Bienvenu, M. Ortiz, R. Rosati, M. Simkus (Eds.), Informal Proceedings of the 27th International Workshop on Description Logics, Vienna, Austria, July 17-20, 2014, volume 1193 of CEUR Workshop Proceedings, CEUR-WS.org, 2014, pp. 556–566. URL: http://ceur-ws.org/Vol-1193/paper_42.pdf.
[7] J. PukancovΓ‘, M. Homola, ABox abduction for description logics: The case of multiple observations, in: M. Ortiz, T. Schneider (Eds.), Proceedings of the 31st International Workshop on Description Logics co-located with the 16th International Conference on Principles of Knowledge Representation and Reasoning (KR 2018), Tempe, Arizona, US, October 27-29, 2018, volume 2211 of CEUR Workshop Proceedings, CEUR-WS.org, 2018. URL: http://ceur-ws.org/Vol-2211/paper-31.pdf.
[8] P. Koopmann, W. Del-Pinto, S. Tourret, R. A. Schmidt, Signature-based abduction for expressive description logics, in: D. Calvanese, E. Erdem, M. Thielscher (Eds.), Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, Rhodes, Greece, September 12-18, 2020, 2020, pp. 592–602. URL: https://doi.org/10.24963/kr.2020/59.
[9] W. Del-Pinto, R. A.
Schmidt, ABox abduction via forgetting in ALC, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, AAAI Press, 2019, pp. 2768–2775. URL: https://doi.org/10.1609/aaai.v33i01.33012768.
[10] B. Glimm, Y. Kazakov, Classical algorithms for reasoning and explanation in description logics, in: M. KrΓΆtzsch, D. Stepanova (Eds.), Reasoning Web. Explainable Artificial Intelligence - 15th International Summer School 2019, Bolzano, Italy, September 20-24, 2019, Tutorial Lectures, volume 11810 of Lecture Notes in Computer Science, Springer, 2019, pp. 1–64. URL: https://doi.org/10.1007/978-3-030-31423-1_1.
[11] R. Reiter, A theory of diagnosis from first principles, Artificial Intelligence 32 (1987) 57–95. URL: https://www.sciencedirect.com/science/article/pii/0004370287900622. doi:10.1016/0004-3702(87)90062-2.
[12] M. Bienvenu, Complexity of abduction in the EL family of lightweight description logics, in: G. Brewka, J. Lang (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Eleventh International Conference, KR 2008, Sydney, Australia, September 16-19, 2008, AAAI Press, 2008, pp. 220–230. URL: http://www.aaai.org/Library/KR/2008/kr08-022.php.
[13] B. Glimm, Y. Kazakov, M. Welt, Concept Abduction for Description Logics – Technical Report, Technical Report, Ulm University, Institute of Artificial Intelligence, 2022. https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.090/Publikationen/2022/GlKW22a_report.pdf.
[14] M. KrΓΆtzsch, F. Simančík, I. Horrocks, A description logic primer, CoRR abs/1201.4089 (2012). Available at http://arxiv.org/pdf/1201.4089.pdf.
[15] T. Eiter, G. Gottlob, The complexity of logic-based abduction, J. ACM 42 (1995) 3–42.
URL: https://doi.org/10.1145/200836.200838. doi:10.1145/200836.200838. [16] F. Baader, S. Brandt, C. Lutz, Pushing the β„°β„’ envelope further, in: Proc. 5th Workshop on OWL: Experiences and Directions (OWLED’08), volume 496, CEUR, 2008. [17] K. Schild, A correspondence theory for terminological logics: Preliminary report, in: J. Mylopoulos, R. Reiter (Eds.), Proc. 12th Int. Joint Conf. on Artificial Intelligence (IJCAI’91), Morgan Kaufmann, 1991, pp. 466–471. [18] F. M. Donini, F. Massacci, EXPTIME tableaux for π’œβ„’π’ž, J. of Artificial Intelligence 124 (2000) 87–138. [19] R. Greiner, B. A. Smith, R. W. Wilkerson, Readings in model-based diagnosis, Morgan Kaufmann Publishers Inc., 1992, pp. 49–53.