=Paper=
{{Paper
|id=None
|storemode=property
|title=Optimising Resolution-Based Rewriting Algorithms for DL Ontologies
|pdfUrl=https://ceur-ws.org/Vol-1014/paper_74.pdf
|volume=Vol-1014
|dblpUrl=https://dblp.org/rec/conf/dlog/TrivelaSCS13
}}
==Optimising Resolution-Based Rewriting Algorithms for DL Ontologies==
Optimising Resolution-Based Rewriting Algorithms for DL Ontologies? Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos Stamou School of Electrical and Computer Engineering, National Technical University of Athens, Greece Abstract. Resolution-based rewriting algorithms have been widely used for computing disjunctive datalog rewritings for DL TBoxes, and re- cently, also for computing datalog rewriting for queries over TBoxes ex- pressed in DL-Lite and ELHI. Although such algorithms are general enough to support a wide variety of (even very complex) DLs, this gen- erality comes with performance prices. In the current paper we present a resolution-based (query) rewriting algorithm for ELHI that is based on a hyper-resolution like inference rule and which avoids performing many redundant inferences. We have implemented the algorithm and have con- ducted an experimental evaluation using large and complex ontologies. 1 Introduction A prominent approach to query answering over DL ontologies is via rewriting the given input into formalisms for which efficient data retrieval systems exist. More precisely, in TBox rewriting (resp. query rewriting) the input TBox T (resp. TBox T and query Q) is transformed into a set of sentences P, typically a (disjunctive) datalog program called rewriting, such that for any dataset D the answers of any ground query Qg (resp. of the query Q) w.r.t. D and T coincide with the answers of Qg (resp. Q) w.r.t. D and P [12, 3, 19]. Consequently, after computing P the problem of query answering can be delegated to efficient and scalable (deductive) database and datalog evaluation systems by either directly evaluating P using off-the-shelf systems [8], by implementing customised engines [16], or by integrating P in an optimal way into data saturation engines [23]. Numerous rewriting systems for DLs of varying expressivity have been de- veloped the last decade. One of the first practical systems for answering ground queries over OWL DL ontologies, namely KAON2 [16], was based on TBox- rewriting. Also recently, a large number of rewriting-based systems for answer- ing arbitrary queries over less expressive DLs have been developed. Prominent examples include QuOnto [1], Presto [21], Quest [20], Rapid [7], Nyaya [17], and IQAROS [24], which support DL-Lite, and Requiem [19] and Clipper [8] which support ELHI and Horn-SHIQ, respectively. An important approach for computing rewritings is by using resolution calculi [2]. In this setting, the input is first transformed into a set of clauses which ? Work was supported by an FP7 Marie Curie CIG grant and Europeana Creative. is then saturated using resolution to derive new (datalog) clauses. On the one hand, such calculi are worst-case optimal and allow for a large number of existing optimisations, like ordering restrictions and subsumption deletion. On the other hand, since there exist many resolution-based decision procedures for expressive fragments of first order logic [9, 13] it is (relatively) easier to design a resolution- based rewriting algorithm for an expressive DL compared to designing a custom made one. For example, to the best of our knowledge, none of the tailor made systems for DL-Lite can currently support more expressive DLs. However, the efficiency of resolution-based approaches has also been criticised [22]. Even with all existing optimisations the saturation can perform many redun- dant inferences producing clauses that contain function symbols and which don’t subsequently lead to the generation of function-free (datalog) clauses that are part of the rewriting. Hence, tailor made approaches have greatly surpassed the resolution-based ones [22, 14]. To circumvent this issue and provide an efficient resolution-based query rewriting algorithm for DL-Lite, an optimised hyper- resolution like inference that avoids producing (intermediate) redundant clauses that contain function symbols has been proposed [7]. The calculus was imple- mented in Rapid and as shown also by independent evaluations, Rapid is cur- rently one of the fastest query rewriting systems [7, 14], however, like most of them it can only support DL-Lite. In the current paper we investigate whether a resolution-based rewriting algorithm that is based on a similar hyper-resolution step can be defined for for ELHI TBoxes. This is a technically very challenging task as the structure of ELHI axioms implies many complex interactions between the clauses (in contrast to DL-Lite, ELHI is not FO-rewritable and subsumption checking is in ExpTime). However, we show that a rewriting can be computed by a calculus that contains an extension of the hyper-resolution step of DL-Lite plus a new resolution rule that we expect to be rarely applied in practice. To achieve this we first cast the Rapid calculus for DL-Lite to the framework of resolution and sketch its correctness. Finally, we conducted an experimental evaluation using large-scale real-world ontologies. Our results show that in many tests state-of- the-art systems cannot compute a rewriting within a reasonable amount of time (more than one hour) compared to the new implementation. 2 Preliminaries We use the standard notions of first-order term, atom, variable, sentence, con- stant, function symbols, functional terms, entailment (|=), and the like. Resolution-Based Calculi We use standard notions from (resolution) theorem- proving like clause, (hyper-)resolvent and most general unifier (mgu). An infer- ence rule, or simply inference is an n + 1-ary relation usually written as follows: C1 C2 . . . Cn (1) C where C1 is called the main premise, C2 , . . . Cn are called the side premises and C is called the conclusion or resolvent. An inference system I, also called calculus, 2 is a collection of inference rules. Let Σ be a set of clauses, C a clause and I an inference system. A derivation of C from Σ by I, written Σ `I C (or simply Σ ` C if I is clear from the context), is a sequence of clauses C1 , . . . , Cm such that Cm = C, each Ci is either a member of Σ or the conclusion of an inference by I from Σ ∪ {C1 , . . . , Ci−1 }. In that case we say that C is derivable from Σ by I. We write Σ `i C to denote that the depth of the corresponding derivation tree [6] constructed for C from Σ is less or equal to i. We also often write Σ, C ` C 0 instead of Σ ∪ {C} ` C 0 . An inference system with particular interest to us is SLD, denoted by ISLD , where all side premises are members of Σ. Description Logics Let C, R, and I be countable, pairwise disjoint sets of atomic concepts, atomic roles, and individuals. An ELHI-role is either an atomic role P or its inverse P − . The set of ELHI-concepts is defined inductively as follows, where A ∈ C, R is an ELHI-role, and C(i) are ELHI-concepts: C := > | ⊥ | A | C1 u C2 | ∃R.C An ELHI-TBox T is a finite set of GCIs C1 v C2 , with Ci ELHI-concepts, and RIAs R1 v R2 with Ri ELHI-roles. We assume from now on that ELHI-TBoxes are normalised, i.e., they contain only GCIs of the form A1 v A2 , A1 uA2 v A, A1 v ∃R.A2 , or ∃R.A2 v A1 , where A(i) ∈ C∪{>}, and R ∈ R. An ABox A is a finite set of assertions A(a) or P (a, b), for A ∈ C, P ∈ R, and a, b ∈ I. An ELHI-ontology is a set O = T ∪ A. DL-Lite is obtained from ELHI by disallowing GCIs of the form ∃R.A1 v A2 where A1 6= >.1 We call such GCIs RA-GCIs while all the rest DL-Lite-GCIs. Queries and Query Rewriting A datalog rule r is an expression of the form H ← B1 ∧ . . . ∧ Bn where H, called head, is a (possibly empty) function-free atom, {B1 , . . . , Bn }, called body, is a set of function-free atoms, and each variable in the head also occurs in the body. A variable that appears twice in the body and not in the head is called ej-variable; we use ejvar(r) to denote all ej-variables of r. A datalog program P is a finite set of datalog rules. A union of conjunctive queries (UCQ) Q is a set of datalog rules such that their head atoms share the same predicate, called query predicate, which does not appear anywhere in the body. A conjunctive query (CQ) is a UCQ with exactly one rule. We often abuse notation and identify a CQ with the only rule it contains instead of a singleton set. For a query Q with query predicate Q, a tuple of constants ~a is an answer of Q w.r.t. a TBox T and an ABox A if the arity of ~a agrees with the arity of Q and T ∪ A ∪ Q |= Q(~a). We denote with cert(Q, T ∪ A) the answers to Q w.r.t. T ∪ A. Finally, we recall the notion of (query) rewriting [3, 19, 21]. Definition 1. Let Q be a CQ with query predicate Q and let T be a TBox. A datalog rewriting (or simply rewriting) R of a CQ Q w.r.t. T is a datalog program whose rules can be partitioned into two disjoint sets, RD and RQ such that RD does not mention Q, RQ is a UCQ with query predicate Q, and where 1 In the literature, this DL is called DL-LiteR,u [4], however, for simplicity we call it DL-Lite. Typically, DL-Lite also allows for axioms of the form A1 v ¬A2 and R1 v ¬R2 , however, these do not have any effects in query answering when T ∪ A is consistent (see, e.g., [19]); hence we will discard them here. 3 for each A consistent w.r.t. T and using only predicates from T we have: cert(Q, T ∪ A) = cert(RQ , RD ∪ A). The Requiem System Throughout the paper we will use the inference system implemented by Requiem as a yardstick, to highlight the inefficiencies of typical resolution-based rewriting algorithms and sketch completeness of the refined approach. Like most rewriting systems, the behaviour of Requiem on input Q and T can be characterised by the application of the following three steps: 1. Clausification: the input TBox T is transformed into a set of clauses TC , using standard techniques like skolemisation (see Example 1). 2. Saturation: TC ∪Q is saturated by using (binary) resolution with free selection [2] (denoted as IREQ ) computing a new set TCsat . 3. Post-processing: all function-free clauses in TCsat are returned. Next, we briefly illustrate the Requiem calculus through an example. Example 1. Consider the TBox T consisting of the following GCIs (left side), together with the respective clauses produced at step 1. (right side): A v ∃R.B R(x, f (x)) ← A(x) (2a) B(f (x)) ← A(x) (2b) RvS S(x, y) ← R(x, y) (3) As can be seen, the first GCI produces two clauses, where f is a skolem function that is uniquely associated with the specific occurence of concept ∃R.B. Consider now the query Q1 = Q(x) ← S(x, y) ∧ C(y). When applied to Q1 and T the Requiem algorithm would perform the following inferences: (2a) and (3) produces S(x, f (x)) ← A(x) (4) Q1 and (4) produces Q2 = Q(x) ← A(x) ∧ C(f (x)) (5) Finally, the function-free set returned is R = {Q1 , (3)}. ♦ Conventions To simplify the presentation, in the following we assume that TBoxes are given in a clausal form. Clauses that are produced by clausfying RA- GCIs and DL-Lite-GCIs are called RA-clauses and DL-Lite-clauses, respectively. Finally, clauses with the query predicate in the head are called Q-clauses; note that such clauses can contain functional terms in the body (e.g., clause Q2 in Example 1). 3 An Optimised Calculus for DL-Lite In the current section we present a hyper-resolution like rewriting algorithm for DL-Lite, which characterises the algorithm implemented in Rapid using the framework of resolution. In addition, we sketch its proof of correctness, which implies correctness of Rapid; this is also important subsequently for ELHI. 4 Example 2. Consider the TBox T and query Q1 from Example 1 as well as the inferences performed by IREQ on T ∪ Q1 . As it can be observed the algorithm performed two redundant inferences since neither clauses (4) and Q2 nor any other clause derived from these is a member of the computed rewriting R. These clauses would be of importance only if a clause of the form C(x) ← B(x) also existed in T . Then, the latter would resolve with (2b) producing C(f (x)) ← A(x) which would subsequently resolve with Q2 to produce Q(x) ← A(x), that would be part of R. ♦ As it has been argued in [22], TBoxes typically contain many clauses of the form R(x, fi (x)) ← Ai (x). Together with clause (3) this implies that the al- gorithm would produce many clauses of the form S(x, fi (x)) ← Ai (u) and Q(x) ← A(x) ∧ C(fi (x)) which can adversely affect performance. One could try to remedy the above issue by applying the following ap- proach/refinement on IREQ : first, saturate the clauses of T obtaining Tsat and then, resolve Q1 with (4) only if a clause of the form C(f (x)) ← A(x) also exists in Tsat . If it does, then Q(x) ← A(x) can be obtained directly from Q1 as a hyper-resolvent; if it doesn’t, then we can avoid computing Q2 . However, to im- plement this hyper-resolution step the algorithm needs to perform a quadratic loop over the set Tsat , which in practice can be very large. As we will show in the next section this issue is even more acute in more expressive ontologies like ELHI which allow for RA-clauses (see also discussion in [22]). Our inability to efficiently implement the above refinement is because IREQ follows a “forward” style approach to apply resolution which generates new clauses that contain function symbols (e.g., clause (4)). In contrast Rapid is based on a goal-oriented “backwards” style approach that resembles SLD deriva- tions. In the previous example, it will first resolve Q1 with (3) to obtain Q02 = Q(x) ← R(x, y) ∧ C(y), while then, it will resolve Q02 with clause (2a) only if also a clause of the form C(f (x)) ← A(x) exists in T . The crucial difference is that, in the latter case, to implement the hyper-resolution step we pick clauses from T rather than Tsat . In addition to T being much smaller than Tsat , as explained in Example 1, functional terms are unique per occurence of ∃R.B, and this can be exploited in practice using indexes over the functional terms. The next definition characterises the Rapid algorithms using the framework of resolution. Definition 2. Let Q be a CQ, let C(i) be (not necessarily distinct) DL-Lite clause(s) with head atom(s) C(i) . With Ilite we denote the inference system that consists of the following inference rules, where Q0 σ is a function-free (hyper- )resolvent of Q and C(i) : Q C unfolding: s.t. if x 7→ f (y) ∈ σ, then x 6∈ ejvar(Q) Q0 σ Q C 1 C2 shrinking: s.t. there exists x 7→ f (y) ∈ σ with x ∈ ejvar(Q) Q0 σ Finally, for Q a CQ and T a DL-Lite-TBox let Rapid-Lite(Q, T ) be all function- free clauses derivable from Q ∪ T by Ilite . 5 Intuitively, unfolding corresponds to all classical binary resolution inferences that won’t introduce function symbols in Q, while shrinking is a hyper-resolution step which “eliminates” an ej-variable of Q using clauses that mention a functional term f . Example 3. Consider T and Q1 from Example 1 and consider also the TBox T 0 = {C(x) ← B(x)} ∪ T . Applying Ilite to T ∪ Q1 performs the following inferences: unfolding on Q1 and (3) produces Q02 = Q(x) ← R(x, y) ∧ C(y) unfolding on Q02 and C(x) ← B(x) produces Q03 = Q(x) ← R(x, y) ∧ B(y) shrinking on Q03 , (2a), and (2b) produces Q4 = Q(x) ← A(x) The set R0 = {Q1 , Q02 , Q03 , Q4 } is a rewriting of Q1 w.r.t. T . ♦ Correctness of our rewriting algorithm follows by showing how a derivation con- structed by IREQ can be transformed into a derivation by Ilite ; hence, saturating T ∪ Q by Ilite will create all necessary members of a rewriting. This is done in two steps. First, we show that each Requiem derivation can be transformed into an SLD derivation. Lemma 1. Let T be a DL-Lite-TBox and let Q be a CQ with query predicate Q. Every Q-clause Q0 derivable from T ∪ Q by IREQ is also derivable from T ∪ Q by ISLD . The Lemma is proven by showing how to “unfold” a Requiem inference Q, C ` Q0 where T `i C into Q, C1 ` Q00 , Q00 , C2 ` Q0 where T `i−1 C1 , T `i−1 C2 and C1 , C2 ` C. By repeated application of this unfolding we will eventually (fully) unfold any derivation of a Q-clause Q0 into inferences of the form Q, C1 ` Q1 , . . . , Qn−1 , Cn ` Q0 , where Ci ∈ T , i.e., into an SLD derivation of Q0 . Second, we show that each SLD derivation can be transformed into a deriva- tion by Ilite . By definition of Ilite and Example 3 we can see that inferences using the unfolding rule directly corresponds to one SLD inference, while shrinking cor- responds to many SLD inferences where, the main premise is a function-free CQ, the side premises contain the same function symbol f , and the conclusion is also a function-free CQ. Hence, we need to show that each SLD derivation can be transformed into one that satisfies the following property. Definition 3. Let Q1 , Q2 , . . . , Qn be an SLD derivation of Qn such that Qi , Ci ` Qi+1 for some clause Ci . Assume also that all Q2 , . . . , Qn−1 contain a term that mentions a function symbol f while Q1 and Qn are function-free. We say that the derivation is function compact if all side premises Ci with 1 ≤ i < n used in the derivation also mention f . The following can be shown for Horn clauses with one existential variable. Lemma 2. Any SLD derivation can be transformed to a function compact one. Using Lemmas 1 and 2 we can show the following. Theorem 1. Let a DL-Lite-TBox T and a CQ Q. Every derivation from T ∪ Q by Ilite terminates. Moreover, Rapid-Lite(Q, T ) is a rewriting of Q w.r.t. T . 6 4 A Rewriting Algorithm for ELHI In the current section we extend the inference system Ilite in order to provide a (query) rewriting algorithm for ELHI ontologies. ELHI additionally allows for RA-clauses of the form E(x) ← R(x, y) ∧ F (x). According to the Requiem calculus this clause interacts with clause (3) of Ex- ample 2 producing E(x) ← A(x) ∧ F (f (x)). Again this inference is of interest only if also a clause of the form F (f (x)) ← G(x) can be deduced and hence then produce F (x) ← A(x) ∧ G(x) which is function-free. As argued in [22] ontologies typically have many clauses of these forms which can lead to a quadratic number of redundant inferences. A straightforward approach to obtain an efficient calculus for ELHI ontolo- gies would be to extend Definition 2 to allow for arbitrary ELHI-clauses as side premises in shrinking and unfolding. Then, Lemmas 1 and 2 apply with few modifications and hence this calculus would produce a rewriting. Example 4. Let T be the ELHI-TBox consisting of the following clauses: C(x) ← S(x, y) ∧ D(y) (6) S(f (x), x) ← B(x) (7) K(x) ← S(y, x) ∧ C(y) (8) and let also the query Q1 = Q(x) ← K(x). By unfolding on Q1 and (8) we obtain Q2 = Q(x) ← S(y, x) ∧ C(y); then, by unfolding on Q2 and (6) we obtain Q3 = Q(x) ← S(y, x) ∧ S(y, z) ∧ D(z); finally, by shrinking on Q3 and (7) we can obtain Q4 = Q(x) ← B(x) ∧ D(x). It can be verified that R = {Q1 , Q2 , Q3 , Q4 } is a rewriting of Q w.r.t. T ♦ However, as the previous example shows, unfolding using RA-clauses produces clauses that contain more variables than the main premise and hence to variable proliferation which implies termination problems. There are two solutions to this issue. We either decide a bound on the number of times that such clauses can be used as side premises or, like in Ilite , we still only allow DL-Lite-clauses as side premises. But then, such a calculus would require additional inference rules in order to be able to derive intermediate clauses that can then be side premises in inferences. Next, we show how the Requiem calculus would behave when applied to the input of Example 4 which motivates our ELHI calculus. Example 5. Consider the TBox T and query Q1 from Example 4. When applied to T and Q1 the Requiem algorithm would perform the following inferences with the respective conclusions: (6) and (7) produces C(f (x)) ← B(x) ∧ D(x) (9) (8) and (7) produces K(x) ← B(x) ∧ C(f (x)) (10) (10) and (9) produces K(x) ← B(x) ∧ D(x) (11) Q1 and (11) produces Q(x) ← B(x) ∧ D(x) (12) 7 The set R = {Q1 , (6), (8), (12)} is a (datalog) rewriting of Q1 w.r.t. T . First, we can observe that, if we extend shrinking to accept RA-clauses as main premises, then the inferences performed to produce clause (11) from clause (8) can be captured by a single shrinking inference over clause (8) with side premises (7) and (9). However, we can observe that clause (9) is produced by re- solving together an RA-clause with a DL-Lite-clause and this cannot be captured by any of the inferences of Ilite . ♦ Motivated by the above example, our caluclus for ELHI-ontologies consists of a new inference rule which can infer clauses like clause (9), together with unfolding and (an extension of) shrinking that permit as a main premise either a CQ or an RA-clause (i.e., Ilite with main premise clauses that are non DL-Lite). Definition 4. With IEL we denote the inference system consisting of unfold- ing, where the main premise is either a CQ or an RA-clause, together with the following inference rules, where Υ is either a CQ or an RA-clause, for n ≥ 2 each Ci is a DL-Lite-clause, and Υ 0 σ is a function-free hyper-resolvent of Υ and Ci : Υ C1 . . . Cn n-shrinking: s.t. there exists x 7→ f (y) ∈ σ with x ∈ ejvar(Υ ) Υ 0σ B(x) ← R(x, y) ∧ C(y) R(f (x), x) ← A(x) or B(f (x)) ← A(x) ∧ C(x) function: B(x) ← R(y, x) ∧ C(y) R(x, f (x)) ← A(x) B(f (x)) ← A(x) ∧ C(x) Finally, for Q a CQ and T an ELHI-TBox, Rapid-EL(Q, T ) is defined as all function-free clauses derivable from Q ∪ T by IEL . First, note that we had to extend shrinking to accept possibly more that two side premises. This is because due to the function rule we can now deduce new clauses that contain function symbols; hence there can be more than two clauses contain- ing some function f , unlike the case in DL-Lite. In practice, however, we expect that n is small since such clauses are only produced by the function rule and only when there is a complex interaction between a clause containing R(x, f (x)) (R(f (x), x)) and an RA-clause containing the inverse R(y, x) (R(x, y)). Example 6. Consider Example 5. The inference between clauses (6) and (7) that produces clause (9) corresponds to an inference using the function rule. Then, clause (11) can be produced by n-shrinking over (8) with side premises clauses (7) and (9), while (12) is produced by unfolding on Q and (8), hence computing the required rewriting. ♦ As mentioned in the previous section correctness of Rapid-Lite is heavily based on showing that Requiem derivations can be unfolded into SLD derivations. Consider now ELHI and an inference of the form Υ, C ` Υ 0 , where T `IREQ C. 8 If the derivation of C does not involve an RA-clause, then again we can fully unfold into an SLD derivation, like in DL-Lite. If it does, then this inference can be unfolded into Υ, C1 ` Υ2 , . . . , Υn−1 , Cn ` Υ 0 up to a certain level; i.e., for some Ci we might have Ci 6∈ T with Ci derivable by some RA-clause. By inspecting the types of inferences performed by IREQ using RA-clauses (cf. [18, Table 4]) we can see that we can unfold up to the point where Ci has one of the following forms, where the notation [C(x)] means that C(x) can be omitted: A(x) ← B(x) ∧ [C(x)] (13) B(f (x)) ← A(x) ∧ [C(x)] (14) Again by inspection of IREQ we can see that clauses of form (14) can be pro- duced from RA-clauses by an inference which is exactly the one captured by the function rule. The derivation of clauses of form (13) is more involved and is characterised next. Lemma 3. Let P be a set of DL-Lite-clauses and let Υ1 be an RA-clause. Con- sider also a derivation Υ1 , . . . , Υn by IREQ , such that Υi , Ci ` Υi+1 and Ci is deriv- able from P by IREQ . Assume also that Υn is of the form A(x) ← B(x) ∧ [C(x)] while no Υi with i < n is of the same form as Υn . Then, Υn can be derived from P ∪ Υ1 by unfolding and n-shrinking. Using Lemma 3 and our observation regarding clauses of form (14) we can show the following. Lemma 4. Let T be an ELHI-TBox and let Q be a CQ with query predicate Q. Let also Plt be all DL-Lite-clauses derivable from T by IEL . Then, every Q-clause Q0 derivable from T ∪ Q by IREQ is also derivable from Plt ∪ Q by ISLD . Finally, we can show the following. Theorem 2. Let an ELHI-TBox T and a CQ Q. Every derivation from T ∪ Q by IEL terminates. Moreover, Rapid-EL(Q, T ) is a datalog rewriting of Q, T . 5 Evaluation We have extended the Rapid2 query rewriting system [7] to support ELHI ontologies. Rapid attempts to compactly represent many unfolding inferences as datalog rules in an effort to compute small (datalog) rewritings. We conducted an experimental evaluation comparing against Requiem [19], Presto [21], and Clipper [8], using real-world large scale DL-Lite and ELHI ontologies. Regarding DL-Lite, we used DL-Lite versions of the OpenGALEN2 (http://www.opengalen.org/), OBO protein (http://www.obofoundry.org/), and NCI 3.12e (http://evs.nci.nih.gov/ftp1/NCI_Thesaurus) ontologies, which we denote by Gd , Pd , and N, respectively. This part extends the evalua- tion in [7] for large scale ontologies, and uses datalog rewritings instead of UCQs. 2 http://www.image.ece.ntua.gr/~achort/rapid.zip 9 Table 1. Statistics of the used test ontologies O concepts roles GCIs RIAs O concepts roles GCIs RIAs O concepts roles GCIs RIAs Gd 23193 851 49046 882 S 4298 519 6004 372 Ge 30048 851 63726 882 Pd 35351 6 43351 0 C 4282 22 9564 15 Pe 37560 6 52383 0 N 29173 66 53341 0 Regarding ELHI, we used ELHI versions of the NASA SWEET 2.3 (http: //sweet.jpl.nasa.gov/ontology/), periodic table (http://www.cs.man.ac. uk/~stevensr/ontology/), OpenGALEN2, and OBO protein ontologies, which we denote by S, C, Ge , and Pe , respectively. The DL-Lite and ELHI versions of the ontologies where obtained by normalizing the ontologies and keeping the appropriate subset of axioms. Table 1 provides statistics for the ontologies. For each of them we manually constructed 5 test queries. All tests were performed on a dual core 1.8GHz Intel Celeron processor laptop running Windows 8 and JVM 1.7 with 3.6GB maximum heap size. The timeout limit was set to 2 hours. The results for DL-Lite are shown in the upper part of Table 2, which includes the computation time and the size of the computed rewriting; “t/o” denotes timeout. First, we observe that neither Presto nor Clipper managed to compute a rewriting for Gd , Requiem required 9–18 minutes, while Rapid required only milliseconds. Similar observations can be made for the rest of the DL-Lite ontolo- gies. Notable cases are queries 1–5 over N for Clipper, where it required about 13 minutes for each of them, query 5 over Pd and query 3 over N for Requiem, for which it did not manage to compute a rewriting, and all queries over Pd and N for Presto for which it required 1 to 2 hours, being in general much slower also from Requiem and Clipper. Rapid was slower only for query 5 over Pd , where it required 7:15 minutes. The analysis showed that 7:00 out of the 7:15 minutes are spent in backwards subsumption checking (which guarantees a compact re- sult with no equivalent or subsumed queries), while the actual rewriting time is only 15 seconds. Note that no other system performs backwards subsumption. In fact, in this case e.g. Clipper returns several rewritings that are equivalent up to variable renaming, which justifies also the difference in the rewriting sizes. The results for ELHI are shown in the lower part of Table 2; we did not run Presto since it does not support ELHI. Our conclusions are similar to those of DL-Lite—that is, in the large scale ontologies Rapid greatly outperforms all other systems while in query 5 of the ELHI version of the OBO protein ontology (Pe ) it performed worse than Clipper since it spent 13:27 out of the 13:52 minutes in backwards subsumption. Note also that for Ge , Rapid was the only system that managed to compute a rewriting within ‘reasonable’ time (8–11 minutes). In summary, we can see that computing even some rewriting over large scale and complex ontologies is still an unresolved issue as in many cases many state- of-the-art systems did not manage to terminate, required several minutes, or even hours. This could be acceptable if a rewriting is computed once for a fixed ontology, however, practice has shown that ontologies are very often dynamic [5, 11, 10]. Moreover, in design time, ontology engineers tend to run reasoners often 10 Table 2. Evaluation results. Time (hh:ss:mm.msec) Rewriting size O Rapid Requiem Presto Clipper Rapid Requiem Presto Clipper DL-Lite .11 18:34.96 t/o t/o 225 30870 t/o t/o .09 9:01.05 t/o t/o 1276 1152 t/o t/o Gd .11 t/o t/o t/o 973 t/o t/o t/o .02 12:08.44 t/o t/o 667 11306 t/o t/o .34 t/o t/o t/o 3267 t/o t/o t/o .07 4.72 1:15:07.96 1:03.97 29 27 48 29 .81 45.58 58:19.05 1:04.94 1356 1356 2621 1356 Pd 1.40 9:50.54 1:16:36.30 1:08.40 33919 33887 33888 33919 6.99 11:30.87 1:21:28.37 1:08.70 34879 34733 35416 34879 7:14.56 t/o 1:12:51.73 1:08.86 27907 t/o 2670 54430 0.06 4.08 1:51:31.96 12:09.45 488 5002 469 488 0.04 37.70 1:56:40.12 13:02.02 1804 1765 1766 1804 N 0.15 t/o 1:59:20.03 13:15.10 4143 t/o 3546 4143 0.05 9:42.13 1:57:19.87 13:17.46 1875 64500 1917 1875 0.03 1:06:26.66 1:57:09.73 13:08.27 256 219150 208 340 ELHI .08 .17 n/a 13.67 172 298 n/a 171 .16 .36 n/a 13.32 473 1523 n/a 367 S .02 .44 n/a 13.79 629 1674 n/a 518 .05 1.09 n/a 13.32 1065 2861 n/a 949 .04 38.08 n/a 13.30 1075 18716 n/a 959 .06 3:36.54 n/a 21.42 1103 6800 n/a 2892 .05 3:40.24 n/a 19.73 879 6941 n/a 2892 C .09 3:47.85 n/a 18.95 1653 6889 n/a 2892 .08 3:31.29 n/a 20.48 1609 8077 n/a 2849 .15 9:10.73 n/a 20.33 1743 57054 n/a 2893 11:01.71 t/o n/a t/o 142566 t/o n/a t/o 8:45.87 t/o n/a t/o 140640 t/o n/a t/o Ge 10:18.01 t/o n/a t/o 143636 t/o n/a t/o 9:26.00 t/o n/a t/o 136679 t/o n/a t/o 7:47.52 t/o n/a t/o 128566 t/o n/a t/o 4.80 t/o n/a 1:42.08 51641 t/o n/a 51641 29.73 t/o n/a 1:42.30 52877 t/o n/a 52877 Pe 2.59 t/o n/a 1:45.35 51614 t/o n/a 51614 15.71 t/o n/a 1:43.19 52407 t/o n/a 52407 13:51.97 t/o n/a 1:47.10 79427 t/o n/a 105950 to see what effects their changes have on reasoning. In this case, any computation requiring more than a couple of minutes might not be acceptable. 6 Conclusions We have presented an efficient resolution-based rewriting algorithm for ELHI. Despite the complex interactions in ELHI we show that a calculus using exten- sions of the unfolding and shrinking rules of Rapid for DL-Lite, plus a new rule can be defined. Our experimental evaluation shows that there are large scale ontologies which existing systems cannot handle, while the new algorithm only requires milliseconds. This is to a large extent due to the hyper-resolution infer- ence performed by shrinking, which avoids well-known inefficiencies of resolution- based rewriting algorithms [22]. Concerning future work, we plan to investigate whether a similar hyper- resolution step can be used to optimise resolution-based rewriting algorithms for non-Horn DLs like ALC, like those implemented in the KAON2 system [15]. 11 References 1. Andrea Acciarri, Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maur- izio Lenzerini, Mattia Palmieri, and Riccardo Rosati. Quonto: Querying ontologies. In AAAI, pages 1670–1671, 2005. 2. Leo Bachmair and Harald Ganzinger. Resolution theorem proving. In Handbook of Automated Reasoning, pages 19–99. Elsevier and MIT Press, 2001. 3. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in de- scription logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 4. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Data complexity of query answering in description logics. Artificial Intelligence, 195:335–360, 2013. 5. Diego Calvanese, Evgeny Kharlamov, Werner Nutt, and Dmitriy Zheleznyakov. Evolution of dl-lite knowledge bases. In International Semantic Web Conference (1), pages 112–128, 2010. 6. Chin-Liang Chang and Richard C. T. Lee. Symbolic logic and mechanical theorem proving. Computer science classics. Academic Press, 1973. 7. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting in OWL 2 QL. In Proc. of CADE 23, pages 192–206, 2011. 8. Thomas Eiter, Magdalena Ortiz, Mantas Simkus, Trung-Kien Tran, and Guohui Xiao. Query rewriting for Horn-SHIQ plus rules. In AAAI, 2012. 9. Christian G Fermüller, Alexander Leitsch, Ullrich Hustadt, and Tanel Tammet. Resolution decision procedures. In Handbook of Automated Reasoning, pages 1791– 1849. Elsevier Science Publishers BV, 2001. 10. Giorgos Flouris, Dimitris Manakanatas, Haridimos Kondylakis, Dimitris Plex- ousakis, and Grigoris Antoniou. Ontology change: classification and survey. Knowl- edge Eng. Review, 23(2):117–152, 2008. 11. Rafael S. Gonçalves, Bijan Parsia, and Ulrike Sattler. Analysing the evolution of the nci thesaurus. In CBMS, pages 1–6, 2011. 12. Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Deciding Expressive Description Logics in the Framework of Resolution. Information & Computation, 206(5):579– 601, 2008. 13. Ullrich Hustadt and Renate A Schmidt. Issues of decidability for description logics in the framework of resolution. In Automated Deduction in Classical and Non- Classical Logics, pages 191–205. Springer, 2000. 14. Martha Imprialou, Giorgos Stoilos, and Bernardo Cuenca Grau. Benchmarking ontology-based query rewriting systems. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2012). AAAI Press, 2012. 15. Boris Motik. Description Logics and Disjunctive Datalog—More Than just a Fleet- ing Resemblance? In Proc. of the 4th Workshop on Methods for Modalities (M4M- 4), volume 194, pages 246–265, 2005. 16. Boris Motik, Ulrike Sattler, and Rudi Studer. Query answering for OWL-DL with rules. J. Web Sem., 3(1):41–60, 2005. 17. Giorgio Orsi and Andreas Pieris. Optimizing query answering under ontological constraints. VLDB Endowment, 4(11):1004–1015, 2011. 18. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. Rewriting Conjunctive Queries under Description Logic Constraints. In Andrea Cal‘i, Georg Gottlob, Laks V.S. Lakshmanan, and Davide Martinenghi, editors, Proc. of the Int. Work- shop on Logic in Databases (LID 2008), Rome, Italy, May 19–20 2008. 12 19. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. Tractable query answer- ing and rewriting under description logic constraints. Journal of Applied Logic, 8(2):186–209, 2010. 20. Mariano Rodriguez-Muro and Diego Calvanese. High performance query answering over DL-Lite ontologies. In KR, 2012. 21. Riccardo Rosati and Alessandro Almatelli. Improving query answering over DL- Lite ontologies. In Proc. of KR-10, 2010. 22. Frantisek Simancik, Yevgeny Kazakov, and Ian Horrocks. Consequence-based rea- soning beyond horn ontologies. In IJCAI, pages 1093–1098, 2011. 23. Giorgos Stoilos, Bernardo Cuenca Grau, Boris Motik, and Ian Horrocks. Repair- ing ontologies for incomplete reasoners. In Proceedings of the 10th International Semantic Web Conference (ISWC-11), Bonn, Germany, pages 681–696, 2011. 24. Tassos Venetis, Giorgos Stoilos, and Giorgos Stamou. Query extensions and in- cremental query rewriting for OWL 2 QL ontologies. Journal on Data Semantics, Accepted, 2013. 13