Enumerating Justifications using Resolution

Yevgeny Kazakov and Peter Skočovský
The University of Ulm, Germany
{yevgeny.kazakov, peter.skocovsky}@uni-ulm.de

Abstract. We propose a new procedure that can enumerate justifications of a logical entailment given a set of inferences using which this entailment can be derived from axioms in the ontology. The procedure is based on the extension of the resolution method with so-called answer literals. In comparison to other (SAT-based) methods for enumerating justifications, our procedure can enumerate justifications in any user-defined order that extends the subset relation. The procedure is easy to implement and can be parametrized with ordering and selection strategies used in resolution. We describe an implementation of the procedure provided in PULi, a new Java-based Proof Utility Library, and provide an empirical comparison of (several strategies of) our procedure and other SAT-based tools on popular EL ontologies. The experiments show that our procedure provides a comparable, if not better, performance than those highly optimized tools. For example, using one of the strategies, we were able to compute all justifications for all direct subsumptions of Snomed CT in about 1.5 hours. No other tool used in our experiments was able to do this even within a much longer period.

1 Introduction and Motivation

Axiom pinpointing, or computing justifications (minimal subsets of axioms of the ontology that entail a given logical consequence) has been a widely studied research topic in ontology engineering [2–4, 10, 11, 17, 19, 21, 23]. Most of the recent methods focus on the so-called EL family of description logics (DLs), in which logical consequences can be proved by deriving new axioms from existing ones using inference rules. The resulting inferences are usually encoded as propositional (Horn) clauses, and justifications are computed from them using (modifications of) SAT solvers.
To ensure correctness, the input inference set must be complete, that is, the inferences must suffice to derive the consequence from any subset of the ontology from which it follows. In this paper, we present a new resolution-based procedure that enumerates all justifications of an entailment given a complete set of inferences. Apart from the completeness requirement, the form of inferences can be arbitrary and does not depend on any logic. For example, our method can be used with the inferences provided by existing consequence-based procedures [5, 14, 16, 22]. The procedure can enumerate justifications in any given order, provided it extends the proper subset relation on sets of axioms. The performance of the procedure depends on the strategy it follows while enumerating justifications. We have empirically evaluated three simple strategies and experimentally compared our procedure with other highly optimized justification computation tools.

The paper is organized as follows. In Section 2 we describe related work. Section 3 defines basic notions such as inferences and resolution. In Section 4 we present the new procedure, and in Section 5 we describe its implementation and empirical evaluation.

2 Related Work

There are, generally, two kinds of procedures for computing justifications [19] using a reasoner. Black-Box procedures use a reasoner solely for entailment checking, and thus can be used with any reasoner and DL. Glass-Box procedures require some additional information from the reasoner, such as the inferences that the reasoner has used, and thus can only work with reasoners that provide it.

In a nutshell, Black-Box procedures [6, 11–13] add or remove axioms from a set of axioms (a justification candidate) while maintaining the invariant that the set entails the given goal logical consequence. An external reasoner is used for checking this invariant.
Finding one justification is relatively easy and can be done by trying to remove each axiom once and checking whether the result still entails the goal. If it does not, the removed axiom is added back. Provided the entailment check takes polynomial time in the number of all axioms, e.g., in the case of EL, this procedure runs in polynomial time. However, finding all justifications is hard even in the case of EL, because there may be exponentially many of them [6]. When a procedure for finding one justification is extended to a procedure for justification enumeration, the main problem is how to prevent repetition of justifications. Most techniques rely on the hitting set duality that was introduced in the field of Model-Based Diagnosis [9, 20] and later adapted to DLs [10, 13]. Given a collection of sets, a hitting set contains some element from each of them. A minimal hitting set of the set of all justifications of some goal is a repair: when it is removed from the ontology, the goal is no longer entailed. Given a number of justifications of some goal, we can find a new one by removing one of their hitting sets from the ontology and applying the procedure for finding one justification.

Many Glass-Box procedures focus on EL, because EL reasoning is relatively simple and efficient. Some of them employ propositional satisfiability (SAT) solvers. Their input is the set of inferences recorded by the reasoner during classification of an ontology. The inferences are used to reduce entailment to satisfiability of Horn clauses. This approach was first used by EL+SAT [21, 23], which includes an EL reasoner that records the inferences, plus a few optimizations for reducing the set of inferences relevant to the goal. A common feature of the SAT-based procedures is enumeration of candidate axiom sets exploiting the hitting set duality and further minimization of these candidates.
EL+SAT and EL2MUS [4] use two instances of a SAT solver: one for enumeration of candidates and the other one for checking whether a candidate entails the goal. SATPin [17] uses one SAT solver for both of these tasks and encodes a candidate as the set of axiom atoms currently assigned true. Enumeration of the same candidate twice is avoided by adding a blocking clause to the enumerating solver. If a candidate contains a justification, its blocking clause consists of the negated atoms that encode the axioms of that justification. EL+SAT and SATPin block candidates that do not entail the goal in the same way as justifications. EL2MUS does it differently: when its specialized enumeration solver finds a non-entailing candidate, the complement of that candidate is automatically a repair, and the repair is blocked by adding a clause consisting of the positive atoms of its axioms. Further differences are that in EL2MUS the entailment checking solver is specialized for Horn clauses, and that EL+SAT and SATPin extract justifications by a deletion-based procedure, while EL2MUS uses a more efficient insertion-based procedure. EL2MCS [3] uses MaxSAT [1, 18] to enumerate all repairs and extracts justifications from them using the hitting set duality. BEACON [2] is a tool that integrates the justification procedure of EL2MUS.

Up to a few optimizations, the mentioned SAT-based procedures use inferences only for the entailment check. Had they delegated the entailment check to a separate DL reasoner, they could be considered Black-Box. Our approach uses a similar encoding of the derivability check; however, it does not rely on the hitting set duality and, instead, manipulates the inferences directly.

3 Preliminaries

3.1 Inferences, Supports, and Justifications

In this section, we introduce a general type of inferences that can manipulate arbitrary objects, which we call axioms. An inference is an expression inf of the form ⟨α1, …, αn ⊢ α⟩, where α1, …
, αn is a (possibly empty) sequence of axioms called the premises of inf, and α is an axiom called the conclusion of inf. An ontology is a finite set of axioms. Let I be an inference set. An I-derivation from O is a sequence of inferences d = ⟨inf1, …, infk⟩ from I such that for every i with 1 ≤ i ≤ k and each premise α′ of infi that is not in O, there exists j < i such that α′ is the conclusion of infj. An axiom α is derivable from O using I (notation: O ⊢I α) if either α ∈ O or there exists an I-derivation d = ⟨inf1, …, infk⟩ from O such that α is the conclusion of infk. A support for O ⊢I α is a subset O′ ⊆ O such that O′ ⊢I α. A subset-minimal support for O ⊢I α is called a justification for O ⊢I α.

Suppose that ⊨ is an entailment relation between ontologies and axioms. A support for O ⊨ α is a subset O′ ⊆ O such that O′ ⊨ α. A subset-minimal support for O ⊨ α is called a justification for O ⊨ α (also called a minimal axiom set, MinA [6]). An inference ⟨α1, …, αn ⊢ α⟩ is sound if {α1, …, αn} ⊨ α. A set of inferences I is complete for the entailment O ⊨ α if O′ ⊨ α implies O′ ⊢I α for every O′ ⊆ O. Note that if I is a set of sound inferences that is complete for the entailment O ⊨ α, then O′ ⊨ α iff O′ ⊢I α for every O′ ⊆ O. In particular, supports and justifications for O ⊢I α coincide with supports and, respectively, justifications for O ⊨ α.

Example 1. Consider the ontology Oe = {A ⊑ B ⊓ C; A ⊑ B; A ⊑ C} over DL axioms and the axiom αe = A ⊑ C ⊓ B. Assume that some consequence-based procedure performed the inferences Ie = {⟨A ⊑ B ⊓ C ⊢ A ⊑ B⟩, ⟨A ⊑ B ⊓ C ⊢ A ⊑ C⟩, ⟨A ⊑ C, A ⊑ B ⊢ A ⊑ C ⊓ B⟩} in order to derive αe from Oe. It is easy to see that Oe′ = {A ⊑ B; A ⊑ C} ⊢Ie αe and Oe′′ = {A ⊑ B ⊓ C} ⊢Ie αe, but {A ⊑ B} ⊬Ie αe and {A ⊑ C} ⊬Ie αe. Hence, Oe′ and Oe′′ are justifications for Oe ⊢Ie αe. All inferences in Ie are also sound for the standard entailment relation ⊨.
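The derivability relation just defined can be decided by a simple fixpoint computation. The following sketch (illustrative Python; the pair encoding of inferences and all names are ours, not part of the paper's implementation) checks O ⊢I α by forward chaining over the inference set:

```python
from typing import FrozenSet, List, Tuple

# Illustrative encoding (ours): an inference is a pair (premises, conclusion);
# axioms can be any hashable objects, here strings.
Inference = Tuple[Tuple[str, ...], str]

def derives(ontology: FrozenSet[str], inferences: List[Inference], goal: str) -> bool:
    """Check O |-_I goal: saturate the set of derivable axioms to a fixpoint."""
    derived = set(ontology)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in inferences:
            # An inference fires once all of its premises are derivable.
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return goal in derived
```

On the data of Example 1, this confirms, for instance, that {A ⊑ B, A ⊑ C} derives A ⊑ C ⊓ B while {A ⊑ B} alone does not.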
Since Oe′ and Oe′′ are the only two justifications for Oe ⊨ αe, the inference set Ie is complete for the entailment Oe ⊨ αe.

Fig. 1. Propositional resolution and factoring rules:

  Resolution: from c ∨ a and ¬a ∨ d derive c ∨ d.
  Factoring: from c ∨ a ∨ a derive c ∨ a.

3.2 Resolution with Answer Literals

In this section we introduce the resolution calculus, which is a popular method for automated theorem proving [7]. We will mainly use resolution for propositional Horn clauses. A (propositional) atom is a propositional variable a. A (propositional) literal is either an atom l = a (positive literal) or a negation of an atom l = ¬a (negative literal). A (propositional) clause is a disjunction of literals c = l1 ∨ ⋯ ∨ ln, n ≥ 0. As usual, we do not distinguish between the order of literals in clauses, i.e., we associate clauses with multisets of literals. Given two clauses c1 and c2, we denote by c1 ∨ c2 the clause consisting of all literals from c1 and all literals from c2. A clause is Horn if it has at most one positive literal. The empty clause □ is the clause containing n = 0 literals.

The inference rules for the propositional resolution calculus are given in Figure 1. We say that a set of clauses S is closed under the resolution rules if every clause derived by the rules in Figure 1 from S already occurs in S. The resolution calculus is refutationally complete: a set of clauses S that is closed under the resolution rules is satisfiable if and only if it does not contain the empty clause. This means that for checking satisfiability of the input set of clauses, it is sufficient to deductively close this set of clauses under the resolution rules and check whether the empty clause is derived in the closure. To reduce the number of resolution inferences (and hence the size of the closure), several refinements of the resolution calculus were proposed. The rules in Figure 1 can be restricted using orderings and selection functions [7].
In particular, for Horn clauses, it is sufficient to select one (positive or negative) literal in each clause, and require that the resolution inferences are applied only on those literals (Theorem 7.2 in [7]).¹ This strategy is called resolution with free selection. In addition to rule restrictions, one can also use a number of simplification rules that can remove or replace clauses in the closure S. We will use two such rules. Elimination of duplicate literals removes all duplicate literals from a clause (including duplicate negative literals). Subsumption deletion removes a clause c from S if there exists another clause c′ in S that is a sub-clause of c, i.e., c = c′ ∨ c′′ for some (possibly empty) clause c′′. In this case we say that c′ subsumes c.

Example 2. Consider the set of Horn clauses 1-7 below. We apply resolution with free selection that selects the underlined literals in clauses. Clauses 8-10 are obtained by resolution inferences from the clauses shown on the right.

1: ¬p1 ∨ p2
2: ¬p1 ∨ p3
3: ¬p2 ∨ ¬p3 ∨ p4
4: p1
5: p2
6: p3
7: ¬p4
8: ¬p3 ∨ p4  ⟨3, 5⟩
9: p4  ⟨6, 8⟩
10: □  ⟨7, 9⟩

Note that the resolution rule was not applied, e.g., to clauses 3 and 6, because the literal ¬p3 in clause 3 is not selected. Also note that many clauses in the closure above can be derived by several resolution inferences. For example, clause 5 can be obtained by resolving clauses 1 and 4, and clause 6 by resolving 2 and 4. Therefore the empty clause 10 can be derived from several subsets of the original clauses 1-7.

¹ Note that the factoring rule cannot apply to Horn clauses.

Although the resolution calculus is mainly used for checking satisfiability of a clause set, it can also be used for finding unsatisfiable subsets of clauses. To do this, it is sufficient to add to every input clause a fresh positive answer literal [8]. Resolution rules can then be applied to the extended clauses on the remaining (ordinary) literals using the usual orderings and selection functions.
Thus, if a clause with answer literals is derived, then this clause with the answer literals removed can be derived from the clauses for which the answer literals were introduced. In particular, if a clause containing only answer literals is derived, then the set of clauses that corresponds to these answer literals is unsatisfiable. Completeness of resolution means that all such unsatisfiable sets of clauses can be found in this way. If answer literals are added to some but not all clauses and a clause with only answer literals is derived, then the set of clauses that corresponds to the answer literals plus the clauses without answer literals is unsatisfiable.

Example 3. Consider the clauses 1-7 from Example 2. Let us add answer literals a1-a3 to clauses 4-6 and apply the resolution inferences on the remaining (underlined) literals as in Example 2, eliminating duplicate literals if they appear.

1: ¬p1 ∨ p2
2: ¬p1 ∨ p3
3: ¬p2 ∨ ¬p3 ∨ p4
4: p1 ∨ a1
5: p2 ∨ a2
6: p3 ∨ a3
7: ¬p4
8: p2 ∨ a1  ⟨1, 4⟩
9: p3 ∨ a1  ⟨2, 4⟩
10: ¬p3 ∨ p4 ∨ a2  ⟨3, 5⟩
11: ¬p3 ∨ p4 ∨ a1  ⟨3, 8⟩
12: p4 ∨ a2 ∨ a3  ⟨6, 10⟩
13: p4 ∨ a1 ∨ a2  ⟨9, 10⟩
14: p4 ∨ a1 ∨ a3  ⟨6, 11⟩
15: p4 ∨ a1  ⟨9, 11⟩
16: a2 ∨ a3  ⟨7, 12⟩
17: a1 ∨ a2  ⟨7, 13⟩
18: a1 ∨ a3  ⟨7, 14⟩
19: a1  ⟨7, 15⟩

Clauses 16-19 contain only answer literals, so the corresponding sets of clauses are unsatisfiable in conjunction with the input clauses without answer literals. For example, clause 16 means that clauses 1-3, 5-7 are unsatisfiable, and clause 19 means that clauses 1-4, 7 are also unsatisfiable. Note that clause 19 subsumes clauses 17-18; if subsumed clauses are deleted, we obtain only clauses with answer literals that correspond to minimal subsets of clauses 4-6 that are unsatisfiable in conjunction with the remaining input clauses 1-3, 7.
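The refutational use of resolution described in this section can be sketched as follows (illustrative Python; the (neg, pos) encoding of Horn clauses is our simplification, and no ordering or selection refinements are applied): saturate the clause set under binary resolution and report unsatisfiability iff the empty clause appears.

```python
# Illustrative sketch (ours): a Horn clause is (neg, pos) with neg a frozenset
# of negated atoms and pos the positive atom or None; the empty clause is
# (frozenset(), None). No ordering/selection refinements are applied.

def unsatisfiable(clauses):
    """Naive saturation under binary resolution for Horn clauses."""
    closure = set(clauses)
    changed = True
    while changed:
        changed = False
        snapshot = list(closure)
        for neg1, pos1 in snapshot:
            if pos1 is None:
                continue  # no positive literal to resolve on
            for neg2, pos2 in snapshot:
                if pos1 in neg2:
                    # Resolve on pos1: remove the complementary pair,
                    # keep all remaining literals of both clauses.
                    resolvent = (frozenset(neg1 | (neg2 - {pos1})), pos2)
                    if resolvent not in closure:
                        closure.add(resolvent)
                        changed = True
    return (frozenset(), None) in closure
```

On clauses 1-7 of Example 2, the saturation derives the empty clause, as expected.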
4 Enumerating Justifications using Resolution

In this section, we present a new procedure that, given an ontology O, an inference set I and a goal axiom αg, enumerates justifications for O ⊢I αg. It uses the usual reduction of the derivability problem O ⊢I αg to satisfiability of propositional Horn clauses, in combination with the resolution procedure with answer literals.

Given the derivability problem O ⊢I αg, we assign to each axiom αi occurring in I a fresh atom pαi. Each inference ⟨α1, …, αn ⊢ α⟩ ∈ I is then translated to the Horn clause ¬pα1 ∨ ⋯ ∨ ¬pαn ∨ pα. In addition, for each axiom α ∈ O that appears in I, we introduce a (unit) clause pα. Finally, we add the clause ¬pαg encoding the assumption that αg is not derivable. It is easy to see that O ⊢I αg if and only if the resulting set of clauses is unsatisfiable.

We now extend this reduction to find justifications for O ⊢I αg. Recall that a subset O′ ⊆ O is a support for O ⊢I αg if O′ ⊢I αg. Hence, the set of clauses pα for α ∈ O′ is unsatisfiable in combination with the clauses encoding the inferences and ¬pαg. We can find all such minimal subsets (corresponding to justifications) by adding a fresh answer literal to every clause pα with α ∈ O, and applying resolution on non-answer literals together with elimination of redundant clauses.

Example 4. Consider the ontology Oe, inferences Ie and axiom αe from Example 1. To encode the derivability problem Oe ⊢Ie αe, we assign atoms p1-p4 to the axioms occurring in Ie as follows: p1: A ⊑ B ⊓ C, p2: A ⊑ B, p3: A ⊑ C, p4: A ⊑ C ⊓ B. The encoding produces clauses 1-7 from Example 3: the inferences Ie are encoded by clauses 1-3, the axioms in Oe result in clauses 4-6 with answer literals, and the assumption that αe is not derivable is encoded by clause 7.
The derived clauses 16-19 correspond to supports of Oe ⊢Ie αe, and by eliminating the redundant clauses 17-18, we obtain clauses 16 and 19, which correspond to the justifications Oe′ and Oe′′ from Example 1.

One disadvantage of the described procedure is that it requires the closure under the resolution rules to be fully computed before any justification can be found. Indeed, since derived clauses may be subsumed by later clauses, one cannot immediately see whether a clause with only answer literals corresponds to a justification or not. For example, clauses 17-18 in Example 3 are subsumed by clause 19, which is derived later, and hence do not correspond to justifications. We address this problem by using non-chronological application of resolution inferences. Intuitively, instead of processing clauses in the order in which they are derived, we process clauses containing fewer answer literals first. Thus, in Example 3, we process clause 15 before clauses 12-14.

The improved procedure can enumerate justifications, i.e., return justifications one by one without waiting for the algorithm to terminate. The procedure is described in Algorithm 1. It is a minor variation of the standard saturation-based procedure for computing the closure under (resolution) rules, which uses a priority queue to store unprocessed clauses instead of an ordinary queue. Let ⪯ be a total preorder on clauses (a transitive reflexive relation for which every two clauses are comparable). As usual, we write c1 ≺ c2 if c1 ⪯ c2 but c2 ⋠ c1. We say that ⪯ is admissible if c1 ≺ c2 whenever the set of answer literals of c1 is a proper subset of the set of answer literals of c2. For example, it is required that ¬p3 ∨ p4 ∨ a1 ≺ p4 ∨ a1 ∨ a2, but not necessary that p4 ∨ a1 ≺ p4 ∨ a2 ∨ a3. Note that if c is derived by resolution from clauses c1 and c2, then c1 ⪯ c and c2 ⪯ c, since c contains the answer literals of both c1 and c2. We say that a clause d (not necessarily occurring in Q) is minimal w.r.t.
Q if there exists no clause c ∈ Q such that c ≺ d. A priority queue based on ⪯ is a queue in which the remove operation can take out only minimal elements w.r.t. Q.² Given such a queue Q, Algorithm 1 initializes it with the translation of the input problem O ⊢I α (line 3), and then repeatedly applies resolution inferences between minimal clauses taken out of this queue (loop 4-13) and the clauses in S that were processed before. Specifically, the removed minimal clause c is first simplified by removing duplicate literals (line 6) and then checked for subsumption by previously processed clauses in S (in particular, whether c itself was processed before). If c is subsumed by some c′ ∈ S, it is ignored and the next (minimal) clause is taken from the queue Q. Otherwise, c is added to S (line 8). If c contains only answer literals, then it corresponds to a justification (as we show next), which is then reported by the algorithm (line 10). Otherwise, resolution inferences are applied on the selected non-answer literal in c (line 12).

Algorithm 1: Enumeration of justifications using resolution
Enumerate(O ⊢I α, ⪯): enumerate justifications for O ⊢I α
input: O ⊢I α – the problem for which to enumerate justifications,
       ⪯ – an admissible preorder on clauses
 1  S ← createEmptyList()           // for processed clauses
 2  Q ← createEmptyQueue(⪯)         // for unprocessed clauses
 3  Q.addAll(encode(O ⊢I α))        // add the clause encoding of the problem
 4  while Q ≠ ∅ do
 5      c ← Q.remove()              // take one minimal element out of the queue
 6      c ← simplify(c)             // remove duplicate literals from c
 7      if c is not subsumed by any c′ ∈ S then
 8          S.add(c)
 9          if c contains only answer literals then
10              report decode(c)    // a new justification is found
11          else                    // apply resolution rules to c and clauses in S
12              for c′ ∈ resolve(c, S) do
13                  Q.add(c′)

² If there are several minimal elements in the queue, one of them is chosen arbitrarily.
The new clauses derived by resolution are then added to Q (line 13) and the loop continues.

We now prove that Algorithm 1 in line 10 always returns a (new) justification. It is easy to see that if a clause d was minimal w.r.t. Q at the beginning of the while loop (line 4), then it remains minimal w.r.t. Q at the end of the loop (line 13). Indeed, for the clause c taken from the queue (line 5), we have c ⊀ d. For all clauses c′ obtained by resolving c with clauses from S (line 12), we have c ⪯ c′. Hence c′ ⊀ d for all c′ added to Q (line 13) (for otherwise c ⪯ c′ ≺ d). This, in particular, implies that each clause in S is always minimal w.r.t. Q and, consequently, if c1 was added to S before c2, then c1 ⪯ c2 (for otherwise c2 ≺ c1 and c1 would not be minimal w.r.t. Q when c2 ∈ Q). Hence, there cannot be two clauses c1 and c2 in S that contain only answer literals such that c1 is a proper sub-clause of c2: in this case c1 ≺ c2, thus c2 must be added to S after c1, but then c2 would be subsumed by c1 (see line 7). Hence each result returned in line 10 is a (new) justification.

Since clauses are added to S in the order defined by ⪯, the justifications are also returned according to this order. Hence Algorithm 1 can return justifications in any user-defined order ⪯ on subsets of axioms as long as s1 ⊊ s2 implies s1 ≺ s2. Indeed, any such order ⪯ can be lifted to an admissible order on clauses by comparing the sets of answer literals of clauses like the corresponding sets of axioms. For example, one can define s1 ⪯ s2 by ||s1|| ≤ ||s2||, where ||s|| is the cardinality of s. Instead of ||s|| one can use any other measure m(s) that is monotonic over the proper subset relation (i.e., s1 ⊊ s2 implies m(s1) < m(s2)), e.g., the length of s, that is, the total number of symbols needed to write down all axioms in s.
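To make the above concrete, here is a minimal end-to-end sketch of the encoding and of Algorithm 1 (illustrative Python; the triple clause representation, the fixed free-selection strategy and the cardinality order are our simplifying assumptions, and the pairwise subsumption test stands in for the optimized data structures of the actual implementation):

```python
import heapq
from itertools import count

# Illustrative sketch (ours): a clause is (neg, pos, ans) with neg a frozenset
# of negated atoms, pos the positive atom or None, and ans a frozenset of
# answer literals (here: the ontology axioms themselves). The admissible
# order compares the number of answer literals, so justifications come out
# in order of increasing cardinality.

def encode(ontology, inferences, goal):
    """Translate the derivability problem into Horn clauses with answer literals."""
    clauses = [(frozenset(ps), c, frozenset()) for ps, c in inferences]
    occurring = {a for ps, c in inferences for a in ps} | {c for _, c in inferences}
    clauses += [(frozenset(), a, frozenset([a])) for a in ontology & occurring]
    clauses.append((frozenset({goal}), None, frozenset()))  # assume goal underivable
    return clauses

def select(clause):
    """Free selection: a negative literal if any, else the positive one."""
    neg, pos, _ = clause
    return ('-', min(neg)) if neg else ('+', pos)

def subsumes(c1, c2):
    """Every literal of c1 (including answer literals) occurs in c2."""
    return (c1[0] <= c2[0] and c1[2] <= c2[2]
            and (c1[1] is None or c1[1] == c2[1]))

def enumerate_justifications(ontology, inferences, goal):
    """Yield justifications for O |-_I goal, smallest answer-literal sets first."""
    queue, tie = [], count()
    for c in encode(ontology, inferences, goal):
        heapq.heappush(queue, (len(c[2]), next(tie), c))
    processed = []
    while queue:
        _, _, c = heapq.heappop(queue)
        if any(subsumes(s, c) for s in processed):
            continue                       # line 7 of Algorithm 1
        processed.append(c)
        if not c[0] and c[1] is None:      # only answer literals remain
            yield set(c[2])                # line 10: decode a justification
            continue
        sign, atom = select(c)             # resolve only on the selected literal
        for s in processed:
            s_sign, s_atom = select(s)
            if s_atom != atom or s_sign == sign:
                continue
            if sign == '-':   # c contains the negated atom, s the positive one
                r = ((c[0] - {atom}) | s[0], c[1], c[2] | s[2])
            else:             # c contains the positive atom, s the negated one
                r = (c[0] | (s[0] - {atom}), s[1], c[2] | s[2])
            heapq.heappush(queue, (len(r[2]), next(tie), r))
```

On the data of Example 1, this yields first {A ⊑ B ⊓ C} and then {A ⊑ B, A ⊑ C}, matching the cardinality order.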
5 Implementation and Evaluation

We have implemented Algorithm 1 as a part of the new Java-based Proof Utility Library (PULi).³ In our implementation, we used the standard Java priority queue for Q, and employed a few optimisations to improve the performance of the algorithm.

First, we have noticed that our implementation spends over 95% of its time on checking subsumptions in line 7. To improve subsumption checks, we developed a new data structure for storing sets of elements and checking whether a given set is a superset of some stored set. In a nutshell, we index the sets by 128-bit vectors, represented as pairs of 64-bit integers, where each element in a set sets one bit of the bit vector to 1 using its hash value. This idea is reminiscent of Bloom filters.⁴ We store the sets in a trie⁵ with the bit vector as the key, and use bitwise operations to determine whether one vector has all bits of the other vector, which gives us a necessary condition for set inclusion. Using this data structure, we were able to significantly improve the subsumption tests.

We have also noticed that the queue Q often contains about 10 times more elements than the closure S. To improve memory consumption, we do not create the resolvents c′ immediately (see line 12), but instead store in the queue Q the pairs of clauses (from S) from which these resolvents were obtained. This does not reduce the number of elements in the queue, but it reduces the memory consumed by each element to essentially a few pointers.

We have evaluated our implementation on inferences computed for entailed axioms in some large EL ontologies, and compared its performance with the SAT-based tools for enumeration of justifications from inferences: EL2MUS [4], EL2MCS [3] and SATPin [17]. The inferences were extracted by EL+SAT [21, 23] (in the following called sat inferences) and ELK [15] (in the following called elk inferences).
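The bit-vector filter described above can be illustrated by the following sketch (illustrative Python; the 128-bit width follows the description, while the hashing scheme and names are our assumptions):

```python
# Illustrative sketch of the Bloom-filter-like subset test: each element sets
# one bit of a 128-bit mask via its hash. Mask containment is a necessary
# (but not sufficient) condition for set inclusion, so it cheaply prunes
# most candidates before an exact subsumption check.

def mask(elements):
    m = 0
    for x in elements:
        m |= 1 << (hash(x) % 128)
    return m

def may_be_subset(s1, s2):
    """False means s1 is certainly not a subset of s2; True means an exact
    check is still needed (hash collisions can cause false positives)."""
    return mask(s1) & ~mask(s2) == 0
```

Only candidates passing this cheap necessary test need the exact subset check; as described above, the implementation additionally keys the stored sets by their bit vectors in a trie so that not all stored sets have to be scanned.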
Both EL+SAT and ELK are capable of computing small inference sets that derive particular entailed axioms and are complete for these entailments (see Section 3.1). See Table 2 for statistics about the inferences obtained for the entailments.

For our evaluation, we chose the ontologies GO-PLUS, GALEN and SNOMED, which contain (mostly) EL axioms. GO-PLUS is a recent version of Gene Ontology,⁶ which imports a number of other ontologies. The provided distribution included subsumption axioms that were inferred (annotated with is_inferred), which we have removed. GALEN is version 7 of OpenGALEN.⁷ We did not use the more recent version 8, because the other tools were running out of memory. SNOMED is the version of Snomed CT⁸ released on 2015-01-31. From the first two ontologies we removed non-EL axioms, such as functional property axioms, and axioms that contain inverse property expressions and disjunctions. We have also adapted the input ontologies so that they could be processed by (the reasoner of) EL+SAT: we removed disjointness axioms and replaced property equivalences with pairs of property inclusions. Duplicate axioms were removed by loading and saving the ontologies with the OWL API. With these ontologies, we have computed justifications for the entailed direct subsumptions between atomic concepts (in the following called the queries) using various tools.

³ https://github.com/liveontologies/puli
⁴ https://en.wikipedia.org/wiki/Bloom_filter
⁵ https://en.wikipedia.org/wiki/Trie
⁶ http://geneontology.org/page/download-ontology
⁷ http://www.opengalen.org/sources/sources.html

Table 1. Summary of the input ontologies

             GO-PLUS    GALEN   SNOMED
# axioms      105557    44475   315521
# concepts     57173    28482   315510
# roles          157      964       77
# queries      90443    91332   468478

Table 2. Summary of sizes of inference sets

                 GO-PLUS     GALEN    SNOMED
sat  average       470.3   59140.0     997.8
     median         39.0  110290.0       1.0
     max         15915.0  152802.0   39381.0
elk  average       166.9    3602.0     110.3
     median         43.0    3648.0       8.0
     max          7919.0   81501.0    1958.0
All queries were processed by the tools in a fixed random order to achieve a fair distribution of easy and hard problems. We used a global timeout of one hour for each tool and a local timeout of one minute per query. To run the experiments we used a PC with an Intel Core i5 2.5 GHz processor and 7.7 GiB RAM running 64-bit Ubuntu 14.04. Table 1 shows the numbers of axioms, atomic concepts, atomic roles, and queries of each input ontology.

As an admissible order on clauses for our implementation of Algorithm 1, we chose the relation ⪯ that compares the number of different answer literals in clauses. When using this order, cardinality-minimal justifications are found first. Note that finding such justifications is NP-hard [6]; in practice, however, our algorithm with this order found the first justifications of all the queries of GO-PLUS, GALEN and SNOMED in about 13 minutes, 2 hours and 1.5 hours, respectively.

To control resolution inferences, we used three different selection strategies (for Horn clauses) that we detail next. For a propositional atom p, let #(p) be the number of input clauses in which p appears as a (positive) literal. Given a clause c, the BottomUp strategy selects a negative literal ¬p of c whose value #(p) is minimal; if there are no negative literals, the (only) positive literal of c is selected. The TopDown strategy selects a positive literal, if there is one, and otherwise selects a negative literal as in BottomUp. Finally, the Threshold strategy selects a negative literal ¬p with the minimal value #(p) if #(p) does not exceed a given threshold value or there is no positive literal in c; otherwise the positive literal is selected. In our experiments we used the threshold value of 2. Intuitively, the BottomUp strategy simulates unit resolution, TopDown simulates SLD resolution, and Threshold is a combination thereof.
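The three strategies can be sketched as follows (illustrative Python; the (neg, pos) clause representation and all names are ours, with counts[p] playing the role of #(p)):

```python
# Illustrative sketches (ours) of the three selection strategies: a clause is
# (neg, pos) with neg a set of negated atoms and pos the positive atom or
# None; counts[p] = #(p), the number of input clauses with positive literal p.

def bottom_up(clause, counts):
    """Select a negative literal with minimal #(p); else the positive one."""
    neg, pos = clause
    return ('-', min(neg, key=lambda p: counts.get(p, 0))) if neg else ('+', pos)

def top_down(clause, counts):
    """Select the positive literal if present; else fall back to BottomUp."""
    neg, pos = clause
    return ('+', pos) if pos is not None else bottom_up(clause, counts)

def threshold(clause, counts, t=2):
    """Select a cheap negative literal (#(p) <= t); otherwise the positive one."""
    neg, pos = clause
    if neg:
        p = min(neg, key=lambda q: counts.get(q, 0))
        if counts.get(p, 0) <= t or pos is None:
            return ('-', p)
    return ('+', pos)
```

For example, on the clause ¬p1 ∨ ¬p2 ∨ q with #(p1) = 1 and #(p2) = 3, BottomUp selects ¬p1, TopDown selects q, and Threshold (with t = 2) selects ¬p1.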
Table 3 shows for how many queries all justifications were computed within the global and local timeouts.⁹ The first six rows correspond to experiments on sat inferences and the other six rows to experiments on elk inferences.

Table 3. Number of queries processed in 1h / number of 60s timeouts / % of processed queries among all queries / % of 60s timeouts among the queries attempted in 1h

                        GO-PLUS                    GALEN                   SNOMED
sat  BottomUp    2004 / 43 /  2.2 / 2.15     123 / 56 /  0.1 / 45.5    3270 / 31 /  0.7 / 0.95
     TopDown    16662 / 48 / 18.4 / 0.29    5028 / 18 /  5.5 /  0.36   7388 / 30 /  1.6 / 0.41
     Threshold  23968 / 45 / 26.5 / 0.19    2873 /  3 /  3.1 /  0.10  18700 / 14 /  4.0 / 0.07
     EL2MUS     10357 / 42 / 11.5 / 0.41    4862 / 32 /  5.3 /  0.66  12773 / 37 /  2.7 / 0.29
     EL2MCS      6369 / 43 /  7.0 / 0.68    3194 / 27 /  3.5 /  0.85   6163 / 46 /  1.3 / 0.75
     SATPin      4390 / 53 /  4.9 / 1.21    1475 / 39 /  1.6 /  2.64   3490 / 46 /  0.7 / 1.32
elk  BottomUp    3113 / 41 /  3.4 / 1.32    8204 / 23 /  9.0 /  0.28  85977 / 20 / 18.4 / 0.02
     TopDown    18392 / 43 / 20.3 / 0.23    6757 / 33 /  7.4 /  0.49  73063 / 15 / 15.6 / 0.02
     Threshold  30579 / 47 / 33.8 / 0.15   25824 / 15 / 28.3 /  0.06 291151 /  1 / 62.1 / 0.00
     EL2MUS     12877 / 48 / 14.2 / 0.37   12349 / 40 / 13.5 /  0.32  15054 / 42 /  3.2 / 0.28
     EL2MCS      6694 / 49 /  7.4 / 0.73    8545 / 47 /  9.4 /  0.55   6466 / 48 /  1.4 / 0.74
     SATPin      4689 / 50 /  5.2 / 1.07    3050 / 48 /  3.3 /  1.57   4320 / 52 /  0.9 / 1.20

Note that every tool processed more queries and had relatively fewer timeouts on elk inferences. Notice also that the resolution procedure with the Threshold selection strategy processed the most queries and had the fewest timeouts on all ontologies. In particular, for SNOMED it timed out only in one case. It turns out that, without any timeout, Threshold was able to find all justifications for all queries of SNOMED within about 1.5 hours, with the longest query taking less than 4 minutes.

⁸ http://www.snomed.org/
⁹ All experimental data is available at https://osf.io/9sj8n/
None of the SAT-based tools was able to find all justifications for all queries of SNOMED even within 24 hours.

To determine whether the Threshold strategy was the best among the resolution strategies on all queries, we have plotted in Figure 2 the distributions of the query times for all resolution strategies. Each point ⟨x, y⟩ of a plot represents the proportion x of queries that were solved by the method in under the time y. For instance, TopDown solved about 90% of the queries of GALEN in under 0.1 seconds. Each plot considers only queries attempted by all tools on that plot. Since each plot represents the distribution of times and not a direct comparison of times for each query, even if one line is completely below another one, this does not mean that the corresponding method is faster for every query. To get a more detailed comparison, we have also plotted the distribution of minimum query times with a thin black line. For a query, the minimum time is the time spent by the tool that was fastest on that query (among the tools on the same plot). If a plot for some tool coincides with this black line at a point (x, y), then all queries solved within time y by some tool were also solved within time y by this tool. In particular, this tool is the fastest for all queries with the minimal time y. This analysis shows, for example, that TopDown was the best tool for easy queries (solved under 0.1 seconds by some tool) in GALEN, and Threshold was the best tool for hard queries (solved over 1 second by all tools) on all ontologies. In Figure 3 we present a similar comparison of Threshold with the SAT-based tools.

[Figure 2: for each of GO-PLUS, GALEN and SNOMED, query times from 0.01 to 60 seconds (log scale) are plotted against the percentage of queries, with one line per selection strategy (BottomUp, TopDown, Threshold) and a thin black line for the minimum times.]

Fig. 2.
Distribution of query times for the selection strategies on elk inferences 60 60 60 GO-P LUS G ALEN S NOMED time in seconds time in seconds time in seconds 10 10 10 1 1 1 0.1 0.1 0.1 0.01 0.01 0.01 % of queries % of queries % of queries 0 60 80 90 100 0 60 80 90 100 0 60 80 90 100 SATPin EL2MCS EL2MUS Threshold minimum Fig. 3. Distribution of query times on elk inferences10 S NOMED.11 EL2MUS wins on queries solved between about 10 and 50 milliseconds in G ALEN. For all other query ranges, there appears to be no absolute winner. 6 Summary We presented a new procedure that enumerates justifications using inferences that de- rive the goal consequence from an ontology. The inferences are encoded as Horn clauses and resolution with answer literals is applied. Our procedure can be parameterized by an ordering in which the justifications should be enumerated (as long as it extends the subset relation) and by a strategy that selects literals for resolution. The algorithm is relatively easy to implement and it can be easily used also with non-Horn and non- propositional clauses. Our empirical evaluation shows that the procedure provides com- parable, if not better performance than other tools that also use inferences as input. We were able to compute all justifications for all direct subsumptions of Snomed CT in about 1.5 hours. Currently, we cannot explain the difference in the performance of the evaluated selection strategies. We hope to explore this question in the future. 10 Plots on sat inferences have the same shape, except that all lines are shifted up. 11 For other ontologies Threshold times out on some query for which some other tool does not. References 1. Ansótegui, C., Bonet, M.L., Levy, J.: Sat-based maxsat algorithms. Artif. Intell. 196, 77–105 (2013), http://dx.doi.org/10.1016/j.artint.2013.01.002 2. Arif, M.F., Mencía, C., Ignatiev, A., Manthey, N., Peñaloza, R., Marques-Silva, J.: BEACON: an efficient sat-based tool for debugging EL+ ontologies. 
In: Creignou, N., Berre, D.L. (eds.) Proc. 19th Int. Conf. on Theory and Applications of Satisfiability Testing (SAT'16). Lecture Notes in Computer Science, vol. 9710, pp. 521–530. Springer (2016), http://dx.doi.org/10.1007/978-3-319-40970-2_32
3. Arif, M.F., Mencía, C., Marques-Silva, J.: Efficient axiom pinpointing with EL2MCS. In: Hölldobler, S., Krötzsch, M., Peñaloza, R., Rudolph, S. (eds.) Proc. 38th Annual German Conf. on Artificial Intelligence (KI'15). Lecture Notes in Computer Science, vol. 9324, pp. 225–233. Springer (2015), http://dx.doi.org/10.1007/978-3-319-24489-1_17
4. Arif, M.F., Mencía, C., Marques-Silva, J.: Efficient MUS enumeration of Horn formulae with applications to axiom pinpointing. CoRR abs/1505.04365 (2015), http://arxiv.org/abs/1505.04365
5. Baader, F., Lutz, C., Suntisrivaraporn, B.: Efficient reasoning in EL+. In: Parsia, B., Sattler, U., Toman, D. (eds.) Proc. 19th Int. Workshop on Description Logics (DL'06). CEUR Workshop Proceedings, vol. 189. CEUR-WS.org (2006)
6. Baader, F., Peñaloza, R., Suntisrivaraporn, B.: Pinpointing in the description logic EL+. In: Hertzberg, J., Beetz, M., Englert, R. (eds.) Proc. 30th Annual German Conf. on Artificial Intelligence (KI'07). Lecture Notes in Computer Science, vol. 4667, pp. 52–67. Springer (2007), https://doi.org/10.1007/978-3-540-74565-5_7
7. Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Robinson, J.A., Voronkov, A. (eds.) Handbook of Automated Reasoning, pp. 19–99. Elsevier and MIT Press (2001)
8. Green, C.: Theorem proving by resolution as a basis for question-answering systems. Machine Intelligence, pp. 183–205 (1969)
9. Greiner, R., Smith, B.A., Wilkerson, R.W.: A correction to the algorithm in Reiter's theory of diagnosis. Artif. Intell. 41(1), 79–88 (1989), http://dx.doi.org/10.1016/0004-3702(89)90079-9
10. Horridge, M.: Justification based explanation in ontologies. Ph.D. thesis, University of Manchester, UK (2011), http://www.manchester.ac.uk/escholar/uk-ac-man-scw:131699
11. Junker, U.: QUICKXPLAIN: preferred explanations and relaxations for over-constrained problems. In: McGuinness, D.L., Ferguson, G. (eds.) Proc. 19th AAAI Conf. on Artificial Intelligence (AAAI'04). pp. 167–172. AAAI Press / The MIT Press (2004), http://www.aaai.org/Library/AAAI/2004/aaai04-027.php
12. Kalyanpur, A.: Debugging and Repair of OWL Ontologies. Ph.D. thesis, University of Maryland College Park, USA (2006)
13. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of OWL DL entailments. In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) Proc. 6th Int. Semantic Web Conf. (ISWC'07). LNCS, vol. 4825, pp. 267–280. Springer (2007)
14. Kazakov, Y.: Consequence-driven reasoning for Horn SHIQ ontologies. In: Boutilier, C. (ed.) Proc. 21st Int. Joint Conf. on Artificial Intelligence (IJCAI'09). pp. 2040–2045. IJCAI (2009)
15. Kazakov, Y., Klinov, P.: Goal-directed tracing of inferences in EL ontologies. In: Proc. 13th Int. Semantic Web Conf. (ISWC'14), Riva del Garda, Italy, October 19–23, 2014, Part II. pp. 196–211 (2014)
16. Kazakov, Y., Krötzsch, M., Simančík, F.: The incredible ELK: From polynomial procedures to efficient reasoning with EL ontologies. J. of Automated Reasoning 53(1), 1–61 (2014)
17. Manthey, N., Peñaloza, R., Rudolph, S.: Efficient axiom pinpointing in EL using SAT technology. In: Lenzerini, M., Peñaloza, R. (eds.) Proc. 29th Int. Workshop on Description Logics (DL'16). CEUR Workshop Proceedings, vol. 1577. CEUR-WS.org (2016), http://ceur-ws.org/Vol-1577/paper_33.pdf
18. Morgado, A., Liffiton, M.H., Marques-Silva, J.: MaxSAT-based MCS enumeration. In: Biere, A., Nahir, A., Vos, T.E.J. (eds.) Proc. 8th Int. Haifa Verification Conf. (HVC'12). Lecture Notes in Computer Science, vol. 7857, pp. 86–101. Springer (2012), http://dx.doi.org/10.1007/978-3-642-39611-3_13
19. Parsia, B., Sirin, E., Kalyanpur, A.: Debugging OWL ontologies. In: Ellis, A., Hagino, T. (eds.) Proc. 14th Int. Conf. on World Wide Web (WWW'05). pp. 633–640. ACM (2005), http://doi.acm.org/10.1145/1060745.1060837
20. Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987), http://dx.doi.org/10.1016/0004-3702(87)90062-2
21. Sebastiani, R., Vescovi, M.: Axiom pinpointing in lightweight description logics via Horn-SAT encoding and conflict analysis. In: Schmidt, R.A. (ed.) Proc. 22nd Conf. on Automated Deduction (CADE'09). Lecture Notes in Computer Science, vol. 5663, pp. 84–99. Springer (2009), http://dx.doi.org/10.1007/978-3-642-02959-2_6
22. Simančík, F., Kazakov, Y., Horrocks, I.: Consequence-based reasoning beyond Horn ontologies. In: Walsh, T. (ed.) Proc. 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI'11). pp. 1093–1098. AAAI Press/IJCAI (2011)
23. Vescovi, M.: Exploiting SAT and SMT Techniques for Automated Reasoning and Ontology Manipulation in Description Logics. Ph.D. thesis, University of Trento, Italy (2011), http://eprints-phd.biblio.unitn.it/477/