=Paper=
{{Paper
|id=Vol-3263/paper-13
|storemode=property
|title=Hybrid MHS-MXP ABox Abduction Solver: First Empirical Results
|pdfUrl=https://ceur-ws.org/Vol-3263/paper-13.pdf
|volume=Vol-3263
|authors=Martin Homola,Júlia Pukancová,Iveta Balintová,Janka Boborová
|dblpUrl=https://dblp.org/rec/conf/dlog/HomolaPBB22
}}
==Hybrid MHS-MXP ABox Abduction Solver: First Empirical Results==
Hybrid MHS-MXP ABox Abduction Solver: First Empirical Results Martin Homola, Júlia Pukancová, Iveta Balintová and Janka Boborová Comenius University in Bratislava, Mlynská dolina, 842 41 Bratislava, Slovakia Abstract MHS-MXP is a hybrid algorithm which adopts the divide-and-conquer strategy of fast but incomplete MergeXplain (MXP) and combines it with Minimal Hitting Set (MHS) to regain completeness. We describe a class of inputs on which MHS-MXP has an advantage. We provide an experimental implementation which enables us to perform first preliminary empirical evaluation on this class of inputs. Keywords ABox abduction, description logics, ontologies 1. Introduction In the context of DL, ABox abduction [1] assumes a DL KB 𝒦 and an extensional observation 𝑂 (in form of an ABox assertion). Explanations (also extensional) are sets of ABox assertions ℰ such that 𝒦 together with ℰ entails 𝑂. The MHS algorithm [2] is the classic method to find all explanations of an ABox abduction problem. MHS is mostly applied relying on an external DL reasoner for consistency checking as a black box. MHS is complete. It searches through the space of all possible explanations, from the smallest (in terms of cardinality) towards the largest. This approach is useful, particularly in cases where there are smaller explanations. However in cases where there is (even a small number of) explanations of larger size this search strategy may be inefficient. Notably, the worst-case complexity of the underlying problems is also hard. The MHS problem is NP-complete [3] and consistency checking of DL KBs repeatedly called by MHS depends on the particular DL, but for many DLs it may be exponential or worse. Alternative strategies were explored. QuickXplain (QXP) [4], and more recently its extension MergeXplain [5] employ the divide and conquer strategy which allows to find one (QXP) or even multiple explanations (MXP) very efficiently. On the other hand, these methods are not complete, i.e., there is no warranty that all explanations will be found. However, when MXP is run repeatedly, with slightly modified inputs, it divides the search space differently and it may return a different set of explanations than in the previous runs. Based on this key observation, we have proposed a combined algorithm, dubbed MHS-MXP [6], DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel Envelope-Open homola@fmph.uniba.sk (M. Homola); pukancova@fmph.uniba.sk (J. Pukancová); balintova37@uniba.sk (I. Balintová); boborova3@uniba.sk (J. Boborová) GLOBE https://dai.fmph.uniba.sk/~homola/ (M. Homola); https://dai.fmph.uniba.sk/~pukancova/ (J. Pukancová) Orcid 0000-0001-6384-9771 (M. Homola) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) that iterates runs of MXP and it uses MHS to steer the search space exploration in such a way that completeness is retained. In the current report we study relevant properties of MHS-MXP that help not only to establish its correctness, but also hint the class of inputs on which it may have an advantage over MHS. We conjecture, that one such class of favourable inputs are those with a smaller number of shorter explanations. We provide a preliminary implementation and show first empirical results that supports the above conjecture. The main advantage of our work compared to some other promising approaches in DL abduction [7, 8] lies in being a truly black box approach and thus it may be immediately paired with any DL for which a suitable reasoner is available (provided that a model can be extracted from it, see below). In fact, the approach is not limited to DLs and it can be applied in any case in which MHS is applicable. Different approaches to DL abduction, on the other hand, have other advantages, e.g. they are able to provide for richer abducibles than our approach. 2. Preliminaries We assume familiarity with the basics of DL [9], including vocabulary consisting of individuals 𝑁I = {𝑎, 𝑏, …}, roles 𝑁R = {𝑃, 𝑄, …}, and atomic concepts 𝑁C = {𝐴, 𝐵, …}; complex concepts 𝐶, 𝐷, … built by constructors (e.g. ¬, ⊓, ⊔, ∃, ∀, in case of 𝒜 ℒ 𝒞 [10]); a KB 𝒦 = 𝒯 ∪ 𝒜 composed of a TBox 𝒯 (with subsumption axioms of the form 𝐶 ⊑ 𝐷) and an ABox 𝒜 (with concept assertions of the form 𝐶(𝑎) and (possibly negated [11]) role assertions of the form 𝑅(𝑎) and ¬𝑅(𝑎)). We also remind about the semantics that relies on models 𝑀 of a KB 𝒦, that satisfy all axioms or assertions 𝜙 in 𝒦 (𝑀 ⊧ 𝜙); and the reasoning tasks of checking the consistency of 𝒦 (if it has a model) and entailment (𝒦 ⊧ 𝜙 if 𝑀 ⊧ 𝜙 for all its models 𝑀). In ABox abduction [1], we are given a KB 𝒦 and an observation 𝑂 consisting of an ABox assertion. The task is to find an explanation ℰ, again, consisting of ABox assertions, such that 𝒦 ∪ ℰ ⊧ 𝑂. Explanations are drawn from some set of abducibles Abd. Definition 1 (ABox Abduction Problem). Let Abd be a finite set of ABox assertions. An ABox abduction problem is a pair 𝒫 = (𝒦 , 𝑂) such that 𝒦 is a knowledge base in DL and 𝑂 is an ABox assertion. An explanation of 𝒫 (on Abd) is any finite set of ABox assertions ℰ ⊆ Abd such that 𝒦 ∪ ℰ ⊧ 𝑂. We limit the explanations to atomic and negated atomic concept and role assertions; hence Abd ⊆ {𝐴(𝑎), ¬𝐴(𝑎) ∣ 𝐴 ∈ 𝑁C , 𝑎 ∈ 𝑁I } ∪ {𝑅(𝑎, 𝑏), ¬𝑅(𝑎, 𝑏) ∣ 𝑅 ∈ 𝑁R , 𝑎, 𝑏 ∈ 𝑁I }. Note that we do not limit the observations, apart from allowing only one (possibly complex) ABox assertion. According to Elsenbroich et al. [1] it is reasonable to require from each explanation ℰ of 𝒫 = (𝒦 , 𝑂) to be: (a) consistent (𝒦 ∪ ℰ is consistent); (b) relevant (ℰ ⊧ ̸ 𝑂); and (c) explanatory (𝒦 ⊧ ̸ 𝑂). Explanations that satisfy these three conditions will be called desired. In addition, in order to avoid excess hypothesizing, minimality is required. Definition 2 (Minimality). Assume an ABox abduction problem 𝒫 = (𝒦 , 𝑂). Given explana- tions ℰ and ℰ ′ of 𝒫, ℰ is (syntactically) smaller than ℰ ′ if ℰ ⊆ ℰ ′ .An explanation ℰ of 𝒫 is (syntactically) minimal if there is no other explanation ℰ ′ of 𝒫 that is smaller than ℰ. Algorithm 1 MHS(𝒦,𝑂,Abd) Input: Knowledge base 𝒦, observation 𝑂, abducibles Abd Output: 𝒮ℰ all explanations of 𝒫 = (𝒦 , 𝑂) w.r.t. Abd 1: 𝑀 ← a model 𝑀 of 𝒦 ∪ {¬𝑂} 10: prune 𝑛 2: if 𝑀 = n u l l then 11: else if exists model 𝑀 of 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛) then 3: return ” n o t h i n g t o e x p l a i n ” 12: label 𝑛 by 𝐿(𝑛) ← Abd(𝑀) 4: end if 13: else if 𝐻 (𝑛) is desired then 5: 𝑇 ← (𝑉 = {𝑟}, 𝐸 = ∅, 𝐿 = {𝑟 ↦ Abd(𝑀)}) 14: 𝒮ℰ ← 𝒮ℰ ∪ {𝐻 (𝑛)} 6: for each 𝜎 ∈ 𝐿(𝑟) create new 𝜎-successor 𝑛𝜎 of 𝑟 15: end if 7: 𝒮ℰ ← {} 16: for each 𝜎 ∈ 𝐿(𝑛) create new 𝜎-successor 𝑛𝜎 of 𝑛 8: while exists next node 𝑛 in 𝑇 w.r.t. BFS do 17: end while 9: if 𝑛 can be pruned then 18: return 𝒮ℰ 3. Computing Explanations We now review first the complete MHS algorithm and then the faster but approximative MXP algorithm. The hybrid approach that tries to combine “the best of both worlds” is then introduced in Section 4. 3.1. Minimal Hitting Set Adopting the well-known result of Reiter [2], computing all minimal explanations of (𝒦 , 𝑂) reduces to finding all minimal hitting sets of the set of models of 𝐾 ∪ {¬𝑂} in the following sense. Also, if some of the models contain no abducibles then there are no explanations. Observation 1. The minimal explanations of (𝒦 , 𝑂) on Abd directly corresponds to the minimal hitting sets of {Abd(𝑀) ∣ 𝑀 ⊧ 𝒦 ∪ {¬𝑂}} where Abd(𝑀) = {𝜙 ∈ Abd ∣ 𝑀 ⊧ ̸ 𝜙}. Observation 2. If Abd(𝑀) = ∅ for some 𝑀 ⊧ 𝒦 ∪ {¬𝑂}, then (𝒦 , 𝑂) has no explanations on Abd. The MHS algorithm (Algorithm 1) works by constructing an HS-tree 𝑇 = (𝑉 , 𝐸, 𝐿) in which each node is labelled by Abd(𝑀) for some model 𝑀 of 𝒦 ∪ {¬𝑂} and whose edges are labelled by elements of the parent’s label. If a node 𝑛1 ∈ 𝑉 has a successor 𝑛2 ∈ 𝑉 such that 𝐿(⟨𝑛1 , 𝑛2 ⟩) = 𝜎 then 𝑛2 is a 𝜎-successor of 𝑛1 . The HS-tree has the property that the node-label 𝐿(𝑛) and the union 𝐻 (𝑛) of the edge-labels on the path from the root 𝑟 to each node 𝑛 are disjoint. Each label 𝐿(𝑛) can be found as Abd(𝑀) of any model of 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛), by one call to an external DL reasoner. If no such model 𝑀 exists then 𝐻 (𝑛) corresponds to a hitting set. Note that if 𝑀 exists but Abd(𝑀) = ∅, then in accord with Observation 2 𝐻 (𝑛) cannot be extended to a hitting set. Subset pruning eliminates non-minimal hitting sets: given a hitting set 𝐻 (𝑛), nodes 𝑛′ with 𝐻 (𝑛) ⊆ 𝐻 (𝑛′ ) are pruned. Equal-paths pruning prunes also nodes 𝑛′ with 𝐻 (𝑛) = 𝐻 (𝑛′ ), even if 𝐻 (𝑛) is not a hitting set. Once completed, a pruned HS-tree contains all minimal hitting sets [2]. MHS is sound and complete [2, 12]. Algorithm 2 MXP(ℬ,𝒞) Input: background theory ℬ, set of possibly faulty constraints 𝒞 Output: a set of minimal conflicts Γ 1: if ¬isConsistent(ℬ) then 19: 𝑋 ← GetConflict(ℬ ∪ 𝒞2′ , 𝒞2′ , 𝒞1′ ) 2: return ” n o e x p l a n a t i o n ” 20: 𝛾 ← 𝑋 ∪ GetConflict(ℬ ∪ 𝑋 , 𝑋 , 𝒞2′ ) 3: else if isConsistent(ℬ ∪ 𝒞 ) then 21: 𝒞1′ ← 𝒞1′ \{𝜎} where 𝜎 ∈ 𝑋 4: return ∅ 22: Γ ← Γ ∪ {𝛾 } 5: end if 23: end while 6: ⟨ , Γ⟩ ← FindConflicts(ℬ, 𝒞) 24: return ⟨𝒞1′ ∪ 𝒞2′ , Γ⟩ 7: return Γ 25: end function 8: function FindConflicts(ℬ, 𝒞) 26: function GetConflict(ℬ, 𝐷, 𝒞) 9: if isConsistent(ℬ ∪ 𝒞 ) then 27: if 𝐷 ≠ ∅ ∧ ¬isConsistent(ℬ) then 10: return ⟨𝒞 , ∅⟩ 28: return ∅ 11: else if |𝒞 | = 1 then 29: else if |𝒞 | = 1 then 12: return ⟨∅, {𝒞 }⟩ 30: return 𝒞 13: end if 31: end if 14: Split 𝒞 into disjoint, non-empty sets 𝒞1 and 𝒞2 32: Split 𝒞 into disjoint, non-empty sets 𝒞1 and 𝒞2 15: ⟨𝒞1′ , Γ1 ⟩ ← FindConflicts(ℬ, 𝒞1 ) 33: 𝐷2 ← GetConflict(ℬ ∪ 𝒞1 , 𝒞1 , 𝒞2 ) 16: ⟨𝒞2′ , Γ2 ⟩ ← FindConflicts(ℬ, 𝒞2 ) 34: 𝐷1 ← GetConflict(ℬ ∪ 𝐷2 , 𝐷2 , 𝒞1 ) 17: Γ ← Γ 1 ∪ Γ2 35: return 𝐷1 ∪ 𝐷2 18: while ¬isConsistent(𝒞1′ ∪ 𝒞2′ ∪ ℬ) do 36: end function Theorem 1. The MHS algorithm is sound and complete (i.e., it returns the set 𝒮ℰ of all minimal desired explanations of 𝒦 and 𝑂 on Abd). The fact that MHS explores the search space using breadth-first search (BFS) allows to limit the search for explanations by maximum size. The algorithm is still complete w.r.t. any given target size [12]. 3.2. MergeXplain Both QXP [4] and MXP [5] were originally designed to find minimal inconsistent subsets (dubbed conflicts) of an over-constrained knowledge base 𝒦 = ℬ ∪ 𝒞, where ℬ is the consistent background theory and 𝒞 is the “suspicious” part from which the conflicts are drawn. The algorithm is listed in Algorithm 2. The essence of QXP is captured in the GetConflict function, which cleverly decomposes 𝒞 by splitting it into smaller and smaller subsets such that it is always able to reconstruct one minimal conflict, if it only exists. The auxiliary function isConsistent(𝒦) encapsulates calls to an external reasoner; it returns true if 𝒦 is consistent and false otherwise. Thus, if we just wanted to find one minimal explanation of an ABox abduction problem, adopting a result of Junker [4] we could use GetConflict in the following way. Theorem 2. Assume an ABox abduction problem 𝒫 = (𝒦 , 𝑂) and a set of abducibles Abd. If there is at least one explanation 𝛾 ⊆ Abd of 𝒫 then calling GetConflict(𝒦 ∪{¬𝑂}, 𝒦 ∪{¬𝑂}, Abd) returns some minimal explanation 𝛿 ⊆ Abd of 𝒫. MXP, captured in the FindConflicts function, finds as many conflicts as it is possible to reconstruct from one way in which 𝒞 can be split. MXP relies on GetConflict to recover some of the conflicts that would be lost due to splitting. This ensures that it keeps the important property of QXP that at least one minimal is found in each run, if it exists. This approach can be immediately adopted for ABox abduction: in order to find explanations for an abduction problem 𝒫 = (𝒦 , 𝑂) on Abd one needs to call MXP(𝒦 ∪ {¬𝑂}, Abd). This observation allows us to adopt the following result from Shchekotykhin et al. [5]: Theorem 3. Assume an ABox abduction problem 𝒫 = (𝒦 , 𝑂) and a set of abducibles Abd. If there is at least one explanation 𝛾 ⊆ Abd of 𝒫 then calling MXP(𝒦 ∪ {¬𝑂}, Abd) returns a nonempty set Γ of minimal explanations of 𝒫. In fact, MXP is thorough in its decomposition of 𝒞, which is broken to smaller and smaller subsets until they are consistent with ℬ or until only sets of size 1 remain. This directly implies that all conflicts of size 1 will always be found and returned by a single run of MXP. This observation will prove to be useful for our hybrid algorithm. Observation 3. Given an ABox abduction problem 𝒫 = (𝒦 , 𝑂), a set of abducibles Abd, and any 𝛾 ⊆ Abd s.t. |𝛾 | = 1, if 𝒦 ∪ 𝛾 ⊧ 𝑂 then 𝛾 ∈ MXP(𝒦 ∪ {¬𝑂}, Abd). Thus MXP is sound and it always finds at least one minimal explanation (Theorem 3), and it finds all explanations of size one (Observation 3). Still, MXP is not complete. Some explanations may be lost, especially in cases with multiple partially overlapping explanations. Example 1. Let 𝒦 = {𝐴 ⊓ 𝐵 ⊑ 𝐷, 𝐴 ⊓ 𝐶 ⊑ 𝐷} and let 𝑂 = 𝐷(𝑎). Let us ignore negated ABox expressions and start with Abd = {𝐴(𝑎), 𝐵(𝑎), 𝐶(𝑎)}. There are two minimal explanations of 𝒫 = (𝒦 , 𝑂): {𝐴(𝑎), 𝐵(𝑎)}, and {𝐴(𝑎), 𝐶(𝑎)}. Calling MXP(𝒦 ∪ {¬𝑂}, Abd), it passes the initial tests and calls FindConflicts(𝒦 ∪ {¬𝑂}, Abd). FindConflicts needs to decide how to split 𝒞 = Abd into 𝒞1 and 𝒞2 . Let us assume the split was 𝒞1 = {𝐴(𝑎)} and 𝒞2 = {𝐵(𝑎), 𝐶(𝑎)}. Since both 𝒞1 and 𝒞2 are now conflict-free w.r.t. 𝒦 ∪ {¬𝑂}, the two consecutive recursive calls return ⟨𝒞1′ , ∅⟩ and ⟨𝒞2′ , ∅⟩ where 𝒞1′ = {𝐴(𝑎)} and 𝒞2′ = {𝐵(𝑎), 𝐶(𝑎)}. In the while loop, GetConflict(𝒦 ∪ {¬𝑂} ∪ {𝐵(𝑎), 𝐶(𝑎)}, {𝐵(𝑎), 𝐶(𝑎)}, {𝐴(𝑎)}) returns 𝑋 = {𝐴(𝑎)} while GetConflict(𝒦 ∪{¬𝑂}∪{𝐴(𝑎)}, {𝐴(𝑎)}, {𝐵(𝑎), 𝐶(𝑎)}) returns 𝐵(𝑎), and hence the first conflict 𝛾 = {𝐴(𝑎), 𝐵(𝑎)} is found and added into Γ. However, consecutively 𝐴(𝑎) is removed from 𝒞1′ leaving it empty, and thus the other conflict is not found and Γ = {{𝐴(𝑎), 𝐵(𝑎)}} is returned. Finally, not only is MXP able to find all explanations of size 1, but it also has the property that if no longer explanations are returned in a given run then in fact this is because there are none. So in such a case we are sure that we have found all explanations whatsoever in a single run and we do not have to search any further. Theorem 4. Given an ABox abduction problem 𝒫 = (𝒦 , 𝑂), a set of abducibles Abd, let Γ = MXP(𝒦 ∪ {¬𝑂}, Abd). If there is no 𝛾 ∈ Γ s.t. |𝛾 | > 1, then for all minimal 𝛿 ⊆ Abd s.t. 𝒦 ∪ 𝛿 ⊧ 𝑂 we have that 𝛿 ∈ Γ. Algorithm 3 MHS-MXP(𝒦,𝑂,Abd) Input: knowledge base 𝒦, observation 𝑂, set of abducibles Abd Output: set 𝒮ℰ of all explanations of 𝒫 = (𝒦 , 𝑂) of the class Abd 1: Con ← {} ▷ Set of conflicts 2: Mod ← {} ▷ Set of cached models 3: if ¬isConsistent(𝒦 ∪ {¬𝑂}) then 4: return ” n o t h i n g t o e x p l a i n ” 5: else if Abd(𝑀) = ∅ where Mod = {𝑀} then 6: return 𝒮ℰ = ∅ 7: end if 8: 𝑇 ← (𝑉 = {𝑟}, 𝐸 = ∅, 𝐿 = ∅) ▷ Init. HS-Tree 9: while there is next node 𝑛 in 𝑇 w.r.t. BFS do 10: if 𝛾 ⊈ 𝐻 (𝑛) for all 𝛾 ∈ Con then ▷ Otherwise 𝑛 is pruned 11: ⟨ , Γ⟩ ← FindConflicts(𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛), Abd ⧵ 𝐻 (𝑛)) 12: Con ← Con ∪ {𝐻 (𝑛) ∪ 𝛾 ∣ 𝛾 ∈ Γ} 13: if Γ ≠ ∅ and ∃𝛾 ∈ Γ ∶ |𝛾 | > 1 then ▷ HS-tree is extended under 𝑛 14: 𝐿(𝑛) ← Abd(𝑀) ⧵ 𝐻 (𝑛) for some 𝑀 ∈ Mod s.t. 𝑀 ⊧ 𝐻 (𝑛) 15: for each 𝜎 ∈ 𝐿(𝑛) create new 𝜎-successor 𝑛𝜎 of 𝑛 16: end if 17: end if 18: end while 19: return 𝒮ℰ ← {𝛾 ∈ Con ∣ 𝛾 is desired} 20: function isConsistent(𝒦) 21: if there is 𝑀 ⊧ 𝒦 then 22: Mod ← Mod ∪ {𝑀} 23: return true 24: else 25: return false 26: end if 27: end function 4. Combined MHS-MXP Algorithm The idea to use MXP to find all explanations is based on the observation that running it multiple times in a row may result in a consecutive extension of the overall set of conflicts found so far. A naïve, and possibly to a large extent successful idea, would be to randomize the set splits MXP does in each recursive call. We would likely find different conflicts each time, however it would not be clear when to stop. We will instead explore a hybrid approach, and we will show that by modifying MXP’s inputs in its consecutive iterations, the search space exploration can be guided by the construction of an HS-tree from the obtained outputs, and thus completeness will be achieved. The combined MHS-MXP algorithm, listed as Algorithm 3, therefore constructs the HS-tree 𝑇 as usual, but in each node 𝑛, instead of simply retrieving one model of 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛), it launches MXP by calling FindConflicts. It starts by checking the consistency of 𝒦 ∪ {¬𝑂}. We use a modified isConsistent function which stores all models in the model cache Mod. The stored models are later used to construct the HS-tree and label its nodes. For this reason we also assume that FindConflicts uses this modified isConsistent function. Then the main loop is initiated. For the root node 𝑟, FindConflicts is simply called passing 𝒦 ∪ {¬𝑂} as the background theory and Abd as the set of conflicts (as 𝐻 (𝑛) = ∅ at this point). The obtained conflicts Γ are stored in Con. We then verify if all conflicts were already found or if the search needs to go on (line 13). From Theorem 3 we know that if no conflicts were returned in Γ, it means there are no conflicts whatsoever. Also from Observation 3 we know that all conflicts of size 1 are always found and returned in Γ. Finally, by Theorem 4 we have that if any larger conflicts remain, at least one is also present in Γ. Hence, if there is no 𝛾 ∈ Γ with |𝛾 | > 1 there are no other explanations to be found and the search can be terminated. If however at least one such 𝛾 was returned in Γ then the HS-tree is extended under 𝑟 using the model 𝑀 that was previously found and stored in Mod. When consecutively any other node 𝑛 ≠ 𝑟 is visited by the main loop, we first check if it can be pruned: it is pruned if 𝐻 (𝑛) contains any of the conflicts already stored in Con. If not, we now want to use MXP with the goal to explore as much as possible of that part of the space of explanations that extends 𝐻 (𝑛). Therefore we call FindConflicts passing 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛) as the background theory and Abd ⧵ 𝐻 (𝑛) as the set of conflicts. If we are lucky, we might cut off this branch completely in line 13, that is, if no extension of 𝐻 (𝑛) of size greater than 1 is found (by Theorem 4). Otherwise we extend the HS-tree below 𝑛. To be able to do that, we need a model of 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛). However, we do not need to run another consistency check here, as by design of our algorithm at this point such a model is already cached in Mod. Observation 4. For each node 𝑛 of the HS-tree visited by the main loop of MHS-MXP(𝒦 , 𝑂, Abd) either 𝐻 (𝑛) ∈ Con or 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛) is consistent and at least for one 𝑀 ∈ Mod, 𝑀 ⊧ 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛). For the root 𝑟 this trivially holds as it was only visited (as the first node in the main loop) if the consistency check in line 3 was positive, i.e. a suitable model was found and cached. For any other node 𝑛, this holds due to FindConflicts was previously called in the parent node ′ 𝑛 of 𝑛, and from Observation 3 we know that during that call all possible inconsistent extensions of 𝐻 (𝑛′ ) of size 1 were added to Con. So, if 𝑛 was not pruned in line 10, 𝐻 (𝑛) = 𝐻 (𝑛′ ) ∪ {𝜎 } must be consistent with 𝒦 ∪ {¬𝑂}. Moreover, since FindConflicts did not prove {𝜎 } being a conflict for 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛′ ), at some point it must have checked the consistency of 𝒦 ∪ {¬𝑂} ∪ 𝐻 (𝑛′ ) together with {𝜎 } or some superset thereof with a positive result, and at that point the respective model was added to Mod. Finally, by the time a complete HS-tree is constructed, all explanations are accumulated in Con. However, due to calls to FindConflicts where (nonempty) 𝐻 (𝑛) was passed together with 𝒦 as the consistent background theory, some of these conflicts in Con may be non-minimal and they have to be filtered out. At this point we also filter out any other undesired explanations. Then the remaining minimal and desired explanations are returned as 𝒮ℰ . Theorem 5. The MHS-MXP algorithm is sound and complete (i.e., it returns the set 𝒮ℰ of all minimal desired explanations of 𝒦 and 𝑂 on Abd). 5. Advantages and Limitations First of all, it is immediately apparent from the properties of MXP alone, that our approach absolutely crushes MHS in cases when all explanations are of size one. By Observation 3 and Theorem 4 the search may immediately stop after one call to MXP in the root node of the HS-tree. Without this “look ahead” capability provided to the hybrid algorithm by MXP, pure MHS has no way of knowing it could stop and has to generate the HS-tree completely. Let us now consider some cases when bigger explanations come into play. Example 2. Let 𝒦 = {𝐴 ⊓ 𝐵 ⊑ 𝐹 , 𝐶 ⊓ 𝐷(𝑎), 𝐸(𝑏)}, let 𝑂 = 𝐹 (𝑎) and let Abd = {𝐴(𝑎), 𝐵(𝑎), 𝐶(𝑎), 𝐷(𝑎)}. There is exactly one explanation of (𝒦 , 𝑂), ℰ1 = {𝐴(𝑎), 𝐵(𝑎)}. If we run MHS-MXP, it first checks 𝒦 ∪ {¬𝐹 (𝑎)} for consistency and it obtains a model 𝑀 thereof, say one with Abd(𝑀) = {𝐴(𝑎), 𝐶(𝑎)}. The call to FindConflicts in the root does not allow to terminate the search, since ℰ1 was returned and |ℰ1 | > 1. Therefore 𝑛1 and 𝑛2 are added to the HS-tree with 𝐻 (𝑛1 ) = {𝐴(𝑎)} and 𝐻 (𝑛2 ) = {𝐶(𝑎)}. When FindConflicts is called in 𝑛1 , it returns one conflict {𝐵(𝑎)} which together with 𝐻 (𝑛1 ) makes up for the explanation ℰ1 . What is more, this branch is consecutively cut off, as no greater conflicts were found. Notably, further exploration of branches extending 𝐻 (𝑛1 ) with 𝐶(𝑎) and 𝐷(𝑎) is avoided (in comparison with MHS). Then FindConflicts is called in 𝑛2 returning one conflict {𝐴(𝑎), 𝐵(𝑎)}, corresponding to the non-minimal explanation {𝐶(𝑎), 𝐴(𝑎), 𝐵(𝑎)}. However, since there was a conflict extending 𝐻 (𝑛1 ) by a size greater than one, we may not terminate yet and must explore this branch in the HS-tree further, until only extensions of size one are returned by MXP in each path. The case presented in Example 2 and similar cases with a small overall number of explanations can still be handled rather efficiently, compared to MHS. It can be observed that a significant part of the search space is cut off. However consider the following modification of the inputs. Example 3. Given the 𝒦 and 𝑂 as in Example 2, let Abd = {𝐴(𝑎), 𝐵(𝑎), 𝐶(𝑎), 𝐷(𝑎), 𝐸(𝑎), ¬𝐸(𝑎)}. The abduction problem (𝒦 , 𝑂) now has two explanations ℰ1 and ℰ2 = {𝐸(𝑎), ¬𝐸(𝑎)}, where the second one is undesired (inconsistent). When FindConflicts is called in the root node 𝑟 it returns {{𝐴(𝑎), 𝐵(𝑎)}, {𝐸(𝑎), ¬𝐸(𝑎)}}. W.l.o.g. we may assume that the same model 𝑀 was used to label 𝑟 and that 𝑀 ⊧ ̸ 𝐸(𝑎). This time Abd(𝑀) = {𝐴(𝑎), 𝐶(𝑎), 𝐸(𝑎)} and in addition to 𝑛1 and 𝑛2 as above also 𝑛3 is generated with 𝐻 (𝑛3 ) = {𝐸(𝑎)}. Now the search cannot be immediately cut off after MXP is called in any of the three nodes 𝑛1 , 𝑛2 , or 𝑛3 . E.g., in 𝑛1 FindConflicts returns {{𝐵(𝑎)}, {𝐸(𝑎), ¬𝐸(𝑎)}}. Only branches where all but one element from each explanation is already present can be cut off safely. The exercise we make in Example 3 shows that the larger the overall amount of explanations (including also various undesired explanations as ℰ2 ) and also the greater their size, the less advantage MHS-MXP is likely to retain. It should also be noted that while adding complementary assertions (inducing inconsistent explanations) to abducibles does not make a difference for MHS, it does make a difference (towards the worse) for MHS-MXP. This is also true of assertions that are mutually inconsistent due to the background ontology (and therefore inducing another type of undesired explanations, called irrelevant). The problem of finding all explanations of a given abduction problem is hard. MHS itself is NP-complete [3], and it still has to call DL reasoning (likely exponential or worse) in every node. While MHS-MXP provides an advantage on certain inputs we have no reason to suppose it is substantially better in the worst case. We were however interested to explore the improvement in the advantageous cases empirically on which we report in the following part of this paper. 6. Implementation and Evaluation An implementation1 of MHS-MXP was developed in Java using Maven for dependency manage- ment. Compared to the previous version of the implementation [6], the current version features: pure MHS mode, observations in form of a set of complex concept assertions, and equal-paths pruning (in addition to subset pruning). An external DL reasoner is called in each node to query about the model. The model data is extracted through OWL API [13], using the g e t T y p e s method. This approach is preliminary and it has known issues2 , we are currently working on an alternate approach (cf. Conclusions). Using the implementation, we ran3 tests in order to understand if MHS-MXP may have a certain advantage at least on cases of inputs that we identified as favourable. We have used the LUBM ontology [14]. It does not contain disjunction, so we are able to test on it even given the limited model extraction method that we currently use. LUBM does not feature atomic concepts that would allow explanations of size greater than 1. Hence in each input we have set the observation to 𝐷(𝑎) where 𝑎 is an individual and 𝐷 is a new auxiliary concept name. Then we have randomly generated a set of LUBM concept names 𝐴1 , … , 𝐴𝑛 , and added an equivalence 𝐷 ≡ 𝐴1 ⊓ ⋯ ⊓ 𝐴𝑛 into the input ontology. If 𝐴1 , … , 𝐴𝑛 are independent concepts with no subconcepts there is exactly one explanation {𝐴1 (𝑎), … , 𝐴𝑛 (𝑎)}. If 𝐴1 , … , 𝐴𝑛 are not independent the explanation will be shorter. If some of 𝐴1 , … , 𝐴𝑛 have (independent) subconcepts there will be multiple partially overlapping explanations. We targeted explanations of size up to 5, hence we generated inputs for 𝑛 ∈ [1..5] – 50 inputs altogether. We have aggregated the inputs into five groups S1–S5 based on the size of the largest explanation. The inputs were generated randomly (however LUBM concepts with subconcepts were drawn twice as often to ensure higher number of explanations). The number of generated samples was consecutively reduced in order to obtain balanced groups S1–S5, each with 10 inputs. To verify the second part of our conjecture, i.e. that MHS-MXP may perform better on inputs with smaller number of explanations, we also aggregated the same 50 inputs differently, into 1 The implementation is available at https://github.com/IvetBx/MHS-MXP-algorithm. 2 We would need to extract all atomic concept assertions that are true in a model (i.e. in a complete completion tree constructed during a consistency check). OWL API is not well equipped for this task. The g e t T y p e s method instead returns all concept assertions that are entailed for a given individual. In cases e.g. when the ontology contain disjunction and 𝒦 ⊧ 𝐴 ⊔ 𝐵(𝑎) this would yield incorrect results. However for simple ontologies without disjunction there is always one model in which exactly those atomic concept assertions are true for a given individual which are also entailed (which is the model that we will extract). Therefore we may employ this preliminary model extraction method in our limited evaluation as described in the following. 3 The experiments were executed on a virtual machine with 8 cores (16 threads) of the processor Intel Xeon CPU E5-2695 v4, 32GB RAM, 2.10GHz, running Ubuntu 20.04 and Oracle Java SE Runtime Environment v1.8.0_201. To measure the execution times in Java, ThreadMXBean from the package java.lang.management was used. The user time measured the actual time without system overhead. The maximum Java heap size to 4GB. groups C1–C5 accordingly. Basic characteristics of groups S1–S5 and C1–C5 are given in Table 1. Table 1 Statistics for input groups: #: number of inputs; Cm , Ca , CM : min, average, and max count of explanations; Sm , Sa , SM : min, average, and max size of the largest explanation Set # Cm Ca CM Set # Cm Ca CM Sm Sa SM S1 10 2 8 21 C1 9 2 5.8 10 1 1.11 2 S2 10 9 70.5 160 C2 11 16 52 100 1 2 3 S3 10 48 213.4 480 C3 17 112 260.3 480 2 3.39 4 S4 10 168 418.8 840 C4 4 504 640.5 840 4 4.25 5 S5 10 504 2628 6720 C5 9 1176 2864 6720 5 5 5 Our implementation supports atomic and negated atomic concept assertions as abducibles, where the latter may be suppressed by a switch. We have used this to obtain an unfavourable case for MHS-MXP (both atomic and negated atomic concepts allowed) and a favourable case (negated atomic concepts suppressed). However, all generated inputs only have explanations involving atomic concept assertions, hence each input has exactly the same number of explanations in either case (favourable and unfavourable) – only the search space in the unfavourable case (and the inherent difficulty for MHS-MXP to handle it) is larger. Testing the inputs on the unfavourable case did not show significant difference between MHS and MHS-MXP. For the lack of space we omit details. The results on the favourable case are plotted in Figure 1. MHS runs are plotted on the left, and MHS-MXP runs are plotted on the right; results from groups S1–S5 are at the top, while groups C1–C5 are at the bottom. To observe the efficiency of search space exploration we measured for each level of the HS-tree (𝑥-axis) the time by which it is fully completed (𝑦-axis). Group averages are plotted. Recall that when MHS completes level 𝑛 it has already found all explanations of size up to 𝑛. In contrast MHS-MXP, after completing level 𝑛, has already found all explanations of size up to 𝑛+1 (cf. Observation 3). MHS runs show similar times for S1–S5 up to level 2. S4–S5 reached time out in level 3, S2–S3 in level 4. Few S1 runs managed to finish levels 4 and 5. (Note that the results for levels 4 and 5 are “polluted” by the fact that only a small number of runs which finished before the time out are averaged here. However, levels 0–3 show the general trend.) In contrast, MHS-MXP was able to finish all runs within the same time out, even if sometimes required a few more levels than the maximal size of explanation for the given input. We can see a clear correlation between the maximal length of explanations and the required running time, in this respect each consecutive group between S1 and S5 proved to be harder. The results on groups C1–C5 shows very similar trends, proving that MHS-MXP has greater advantage on inputs with smaller number of explanations. 7. Discussion We have reported on an ABox abduction algorithm based on hybrid combination of MHS and MXP. The aim of this approach is to obtain a complete algorithm, that is more efficient than MHS MHS-MXP 104 104 all S5 S4 103 103 S3 S2 S1 102 102 101 101 100 100 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 MHS MHS-MXP 104 104 all C5 C4 103 103 C3 C2 C1 102 102 101 101 100 100 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 Figure 1: Average time (in the seconds) for computing individual levels from the certain group of inputs MHS, at least on a class of favourable inputs that we have described. We have implemented the algorithm and our first preliminary evaluation on the LUBM ontology confirms this conjecture. Both MHS and MHS-MXP rely on an external DL reasoner (as a black box) to compute models of the KB. However DL reasoners do not return models as a standard reasoning service. They do not even necessarily compute them, however, tableau-based reasoners compute a finite represen- tation of a model which contains sufficient information for our approach. We currently employ a simplified workaround for model extraction which does not work in general. It allowed us to make experiments on LUBM, however we want to extend these to other (real world) ontologies. We are currently looking into the possibility to use OWLKnowledgeExplorerReasoner extension in OWL API for true model extension. The advantage of using an API would be that different (tableau-based) DL reasoners could then be plugged in. Another option would be to use a tight source-code integration of a DL reasoner such as in the case the MHS solver AAA [15] which tightly integrates Pellet [16]. Acknowledgments We would like to express our thanks to anonymous reviewers for their valuable feedback on this and also on the previous version of this report. This work was supported by the Slovak Research and Development Agency under the Contract no. APVV-19-0220 (ORBIS) and by the EU H2020 programme under Contract no. 952215 (TAILOR). Martin Homola is also supported by projects VEGA 1/0621/22 and APVV-20-0353. References [1] C. Elsenbroich, O. Kutz, U. Sattler, A case for abductive reasoning over ontologies, in: Proceedings of the OWLED*06 Workshop on OWL: Experiences and Directions, Athens, GA, US, volume 216 of CEUR-WS, 2006. [2] R. Reiter, A theory of diagnosis from first principles, Artificial intelligence 32 (1987) 57–95. [3] R. M. Karp, Reducibility among combinatorial problems, in: Proceedings of a symposium on the Complexity of Computer Computations, March 20–22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York., 1972, pp. 85–103. [4] U. Junker, QuickXplain: Preferred explanations and relaxations for over-constrained problems, in: Proceedings of the Nineteenth National Conference on Artificial Intelli- gence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, San Jose, California, US, AAAI Press, 2004, pp. 167–172. [5] K. M. Shchekotykhin, D. Jannach, T. Schmitz, MergeXplain: Fast computation of mul- tiple conflicts for diagnosis, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, AAAI Press, 2015. [6] M. Homola, J. Pukancová, J. Gablíková, K. Fabianová, Merge, explain, iterate, in: S. Borg- wardt, T. Meyer (Eds.), Proceedings of the 33rd International Workshop on Description Logics (DL 2020) co-located with the 17th International Conference on Principles of Knowl- edge Representation and Reasoning (KR 2020), Online Event [Rhodes, Greece], September 12th to 14th, 2020, volume 2663 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. [7] W. Del-Pinto, R. A. Schmidt, Abox abduction via forgetting in ALC, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, AAAI Press, 2019, pp. 2768–2775. [8] P. Koopmann, W. Del-Pinto, S. Tourret, R. A. Schmidt, Signature-based abduction for expressive description logics, in: Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020, Rhodes, Greece, 2020, pp. 592–602. [9] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, 2003. [10] M. Schmidt-Schauß, G. Smolka, Attributive concept descriptions with complements, Artificial intelligence 48 (1991) 1–26. [11] I. Horrocks, O. Kutz, U. Sattler, The even more irresistible 𝒮 ℛ𝒪ℐ 𝒬, in: Proceedings, Tenth International Conference on Principles of Knowledge Representation and Reasoning, Lake District of the United Kingdom, AAAI, 2006, pp. 57–67. [12] J. Pukancová, M. Homola, ABox abduction for description logics: The case of multiple observations, in: Proceedings of the 31st International Workshop on Description Logics, Tempe, Arizona, US, volume 2211 of CEUR-WS, 2018. [13] M. Horridge, S. Bechhofer, The OWL API: A java API for OWL ontologies, Semantic Web 2 (2011) 11–21. [14] Y. Guo, Z. Pan, J. Heflin, LUBM: A benchmark for OWL knowledge base systems, Journal of Web Semantics 3 (2005) 158–182. [15] J. Pukancová, M. Homola, The AAA abox abduction solver, Künstliche Intell. 34 (2020) 517–522. [16] E. Sirin, B. Parsia, B. Cuenca Grau, A. Kalyanpur, Y. Katz, Pellet: A practical OWL-DL reasoner, Journal of Web Semantics 5 (2007) 51–53.