=Paper= {{Paper |id=None |storemode=property |title=Optimising Resolution-Based Rewriting Algorithms for DL Ontologies |pdfUrl=https://ceur-ws.org/Vol-1014/paper_74.pdf |volume=Vol-1014 |dblpUrl=https://dblp.org/rec/conf/dlog/TrivelaSCS13 }} ==Optimising Resolution-Based Rewriting Algorithms for DL Ontologies== https://ceur-ws.org/Vol-1014/paper_74.pdf
         Optimising Resolution-Based Rewriting
             Algorithms for DL Ontologies?

Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos Stamou

                   School of Electrical and Computer Engineering,
                   National Technical University of Athens, Greece



        Abstract. Resolution-based rewriting algorithms have been widely used
        for computing disjunctive datalog rewritings for DL TBoxes, and re-
        cently, also for computing datalog rewriting for queries over TBoxes ex-
        pressed in DL-Lite and ELHI. Although such algorithms are general
        enough to support a wide variety of (even very complex) DLs, this gen-
        erality comes with performance prices. In the current paper we present a
        resolution-based (query) rewriting algorithm for ELHI that is based on
        a hyper-resolution like inference rule and which avoids performing many
        redundant inferences. We have implemented the algorithm and have con-
        ducted an experimental evaluation using large and complex ontologies.


1     Introduction
A prominent approach to query answering over DL ontologies is via rewriting
the given input into formalisms for which efficient data retrieval systems exist.
More precisely, in TBox rewriting (resp. query rewriting) the input TBox T
(resp. TBox T and query Q) is transformed into a set of sentences P, typically a
(disjunctive) datalog program called rewriting, such that for any dataset D the
answers of any ground query Qg (resp. of the query Q) w.r.t. D and T coincide
with the answers of Qg (resp. Q) w.r.t. D and P [12, 3, 19]. Consequently, after
computing P the problem of query answering can be delegated to efficient and
scalable (deductive) database and datalog evaluation systems by either directly
evaluating P using off-the-shelf systems [8], by implementing customised engines
[16], or by integrating P in an optimal way into data saturation engines [23].
    Numerous rewriting systems for DLs of varying expressivity have been de-
veloped the last decade. One of the first practical systems for answering ground
queries over OWL DL ontologies, namely KAON2 [16], was based on TBox-
rewriting. Also recently, a large number of rewriting-based systems for answer-
ing arbitrary queries over less expressive DLs have been developed. Prominent
examples include QuOnto [1], Presto [21], Quest [20], Rapid [7], Nyaya [17], and
IQAROS [24], which support DL-Lite, and Requiem [19] and Clipper [8] which
support ELHI and Horn-SHIQ, respectively.
    An important approach for computing rewritings is by using resolution calculi
[2]. In this setting, the input is first transformed into a set of clauses which
?
    Work was supported by an FP7 Marie Curie CIG grant and Europeana Creative.
is then saturated using resolution to derive new (datalog) clauses. On the one
hand, such calculi are worst-case optimal and allow for a large number of existing
optimisations, like ordering restrictions and subsumption deletion. On the other
hand, since there exist many resolution-based decision procedures for expressive
fragments of first order logic [9, 13] it is (relatively) easier to design a resolution-
based rewriting algorithm for an expressive DL compared to designing a custom
made one. For example, to the best of our knowledge, none of the tailor made
systems for DL-Lite can currently support more expressive DLs.
    However, the efficiency of resolution-based approaches has also been criticised
[22]. Even with all existing optimisations the saturation can perform many redun-
dant inferences producing clauses that contain function symbols and which don’t
subsequently lead to the generation of function-free (datalog) clauses that are
part of the rewriting. Hence, tailor made approaches have greatly surpassed the
resolution-based ones [22, 14]. To circumvent this issue and provide an efficient
resolution-based query rewriting algorithm for DL-Lite, an optimised hyper-
resolution like inference that avoids producing (intermediate) redundant clauses
that contain function symbols has been proposed [7]. The calculus was imple-
mented in Rapid and as shown also by independent evaluations, Rapid is cur-
rently one of the fastest query rewriting systems [7, 14], however, like most of
them it can only support DL-Lite.
    In the current paper we investigate whether a resolution-based rewriting
algorithm that is based on a similar hyper-resolution step can be defined for
for ELHI TBoxes. This is a technically very challenging task as the structure
of ELHI axioms implies many complex interactions between the clauses (in
contrast to DL-Lite, ELHI is not FO-rewritable and subsumption checking is in
ExpTime). However, we show that a rewriting can be computed by a calculus
that contains an extension of the hyper-resolution step of DL-Lite plus a new
resolution rule that we expect to be rarely applied in practice. To achieve this
we first cast the Rapid calculus for DL-Lite to the framework of resolution and
sketch its correctness. Finally, we conducted an experimental evaluation using
large-scale real-world ontologies. Our results show that in many tests state-of-
the-art systems cannot compute a rewriting within a reasonable amount of time
(more than one hour) compared to the new implementation.

2    Preliminaries
We use the standard notions of first-order term, atom, variable, sentence, con-
stant, function symbols, functional terms, entailment (|=), and the like.
Resolution-Based Calculi We use standard notions from (resolution) theorem-
proving like clause, (hyper-)resolvent and most general unifier (mgu). An infer-
ence rule, or simply inference is an n + 1-ary relation usually written as follows:
                                    C1 C2 . . . Cn
                                                                                  (1)
                                         C
where C1 is called the main premise, C2 , . . . Cn are called the side premises and C
is called the conclusion or resolvent. An inference system I, also called calculus,


                                           2
is a collection of inference rules. Let Σ be a set of clauses, C a clause and I an
inference system. A derivation of C from Σ by I, written Σ `I C (or simply
Σ ` C if I is clear from the context), is a sequence of clauses C1 , . . . , Cm such
that Cm = C, each Ci is either a member of Σ or the conclusion of an inference
by I from Σ ∪ {C1 , . . . , Ci−1 }. In that case we say that C is derivable from Σ by
I. We write Σ `i C to denote that the depth of the corresponding derivation tree
[6] constructed for C from Σ is less or equal to i. We also often write Σ, C ` C 0
instead of Σ ∪ {C} ` C 0 . An inference system with particular interest to us is
SLD, denoted by ISLD , where all side premises are members of Σ.

Description Logics Let C, R, and I be countable, pairwise disjoint sets of
atomic concepts, atomic roles, and individuals. An ELHI-role is either an atomic
role P or its inverse P − . The set of ELHI-concepts is defined inductively as
follows, where A ∈ C, R is an ELHI-role, and C(i) are ELHI-concepts: C := > |
⊥ | A | C1 u C2 | ∃R.C An ELHI-TBox T is a finite set of GCIs C1 v C2 , with
Ci ELHI-concepts, and RIAs R1 v R2 with Ri ELHI-roles. We assume from
now on that ELHI-TBoxes are normalised, i.e., they contain only GCIs of the
form A1 v A2 , A1 uA2 v A, A1 v ∃R.A2 , or ∃R.A2 v A1 , where A(i) ∈ C∪{>},
and R ∈ R. An ABox A is a finite set of assertions A(a) or P (a, b), for A ∈ C,
P ∈ R, and a, b ∈ I. An ELHI-ontology is a set O = T ∪ A. DL-Lite is obtained
from ELHI by disallowing GCIs of the form ∃R.A1 v A2 where A1 6= >.1 We
call such GCIs RA-GCIs while all the rest DL-Lite-GCIs.

Queries and Query Rewriting A datalog rule r is an expression of the form
H ← B1 ∧ . . . ∧ Bn where H, called head, is a (possibly empty) function-free
atom, {B1 , . . . , Bn }, called body, is a set of function-free atoms, and each variable
in the head also occurs in the body. A variable that appears twice in the body
and not in the head is called ej-variable; we use ejvar(r) to denote all ej-variables
of r. A datalog program P is a finite set of datalog rules. A union of conjunctive
queries (UCQ) Q is a set of datalog rules such that their head atoms share the
same predicate, called query predicate, which does not appear anywhere in the
body. A conjunctive query (CQ) is a UCQ with exactly one rule. We often abuse
notation and identify a CQ with the only rule it contains instead of a singleton
set. For a query Q with query predicate Q, a tuple of constants ~a is an answer
of Q w.r.t. a TBox T and an ABox A if the arity of ~a agrees with the arity of Q
and T ∪ A ∪ Q |= Q(~a). We denote with cert(Q, T ∪ A) the answers to Q w.r.t.
T ∪ A. Finally, we recall the notion of (query) rewriting [3, 19, 21].

Definition 1. Let Q be a CQ with query predicate Q and let T be a TBox.
A datalog rewriting (or simply rewriting) R of a CQ Q w.r.t. T is a datalog
program whose rules can be partitioned into two disjoint sets, RD and RQ such
that RD does not mention Q, RQ is a UCQ with query predicate Q, and where
1
    In the literature, this DL is called DL-LiteR,u [4], however, for simplicity we call
    it DL-Lite. Typically, DL-Lite also allows for axioms of the form A1 v ¬A2 and
    R1 v ¬R2 , however, these do not have any effects in query answering when T ∪ A
    is consistent (see, e.g., [19]); hence we will discard them here.


                                            3
for each A consistent w.r.t. T and using only predicates from T we have:
                      cert(Q, T ∪ A) = cert(RQ , RD ∪ A).

The Requiem System Throughout the paper we will use the inference system
implemented by Requiem as a yardstick, to highlight the inefficiencies of typical
resolution-based rewriting algorithms and sketch completeness of the refined
approach. Like most rewriting systems, the behaviour of Requiem on input Q
and T can be characterised by the application of the following three steps:
 1. Clausification: the input TBox T is transformed into a set of clauses TC ,
    using standard techniques like skolemisation (see Example 1).
 2. Saturation: TC ∪Q is saturated by using (binary) resolution with free selection
    [2] (denoted as IREQ ) computing a new set TCsat .
 3. Post-processing: all function-free clauses in TCsat are returned.
Next, we briefly illustrate the Requiem calculus through an example.
Example 1. Consider the TBox T consisting of the following GCIs (left side),
together with the respective clauses produced at step 1. (right side):
                      A v ∃R.B           R(x, f (x)) ← A(x)                   (2a)
                                         B(f (x)) ← A(x)                      (2b)
                       RvS              S(x, y) ← R(x, y)                      (3)
As can be seen, the first GCI produces two clauses, where f is a skolem function
that is uniquely associated with the specific occurence of concept ∃R.B.
   Consider now the query Q1 = Q(x) ← S(x, y) ∧ C(y). When applied to Q1
and T the Requiem algorithm would perform the following inferences:
             (2a) and (3) produces S(x, f (x)) ← A(x)                          (4)
              Q1 and (4) produces Q2 = Q(x) ← A(x) ∧ C(f (x))                  (5)
Finally, the function-free set returned is R = {Q1 , (3)}.                       ♦

Conventions To simplify the presentation, in the following we assume that
TBoxes are given in a clausal form. Clauses that are produced by clausfying RA-
GCIs and DL-Lite-GCIs are called RA-clauses and DL-Lite-clauses, respectively.
Finally, clauses with the query predicate in the head are called Q-clauses; note
that such clauses can contain functional terms in the body (e.g., clause Q2 in
Example 1).


3   An Optimised Calculus for DL-Lite
In the current section we present a hyper-resolution like rewriting algorithm
for DL-Lite, which characterises the algorithm implemented in Rapid using the
framework of resolution. In addition, we sketch its proof of correctness, which
implies correctness of Rapid; this is also important subsequently for ELHI.


                                        4
Example 2. Consider the TBox T and query Q1 from Example 1 as well as the
inferences performed by IREQ on T ∪ Q1 . As it can be observed the algorithm
performed two redundant inferences since neither clauses (4) and Q2 nor any
other clause derived from these is a member of the computed rewriting R.
    These clauses would be of importance only if a clause of the form C(x) ←
B(x) also existed in T . Then, the latter would resolve with (2b) producing
C(f (x)) ← A(x) which would subsequently resolve with Q2 to produce Q(x) ←
A(x), that would be part of R.                                             ♦
As it has been argued in [22], TBoxes typically contain many clauses of the
form R(x, fi (x)) ← Ai (x). Together with clause (3) this implies that the al-
gorithm would produce many clauses of the form S(x, fi (x)) ← Ai (u) and
Q(x) ← A(x) ∧ C(fi (x)) which can adversely affect performance.
    One could try to remedy the above issue by applying the following ap-
proach/refinement on IREQ : first, saturate the clauses of T obtaining Tsat and
then, resolve Q1 with (4) only if a clause of the form C(f (x)) ← A(x) also exists
in Tsat . If it does, then Q(x) ← A(x) can be obtained directly from Q1 as a
hyper-resolvent; if it doesn’t, then we can avoid computing Q2 . However, to im-
plement this hyper-resolution step the algorithm needs to perform a quadratic
loop over the set Tsat , which in practice can be very large. As we will show in
the next section this issue is even more acute in more expressive ontologies like
ELHI which allow for RA-clauses (see also discussion in [22]).
    Our inability to efficiently implement the above refinement is because IREQ
follows a “forward” style approach to apply resolution which generates new
clauses that contain function symbols (e.g., clause (4)). In contrast Rapid is
based on a goal-oriented “backwards” style approach that resembles SLD deriva-
tions. In the previous example, it will first resolve Q1 with (3) to obtain Q02 =
Q(x) ← R(x, y) ∧ C(y), while then, it will resolve Q02 with clause (2a) only if also
a clause of the form C(f (x)) ← A(x) exists in T . The crucial difference is that,
in the latter case, to implement the hyper-resolution step we pick clauses from T
rather than Tsat . In addition to T being much smaller than Tsat , as explained in
Example 1, functional terms are unique per occurence of ∃R.B, and this can be
exploited in practice using indexes over the functional terms. The next definition
characterises the Rapid algorithms using the framework of resolution.
Definition 2. Let Q be a CQ, let C(i) be (not necessarily distinct) DL-Lite
clause(s) with head atom(s) C(i) . With Ilite we denote the inference system that
consists of the following inference rules, where Q0 σ is a function-free (hyper-
)resolvent of Q and C(i) :
               Q          C
  unfolding:                    s.t. if x 7→ f (y) ∈ σ, then x 6∈ ejvar(Q)
                   Q0 σ

               Q       C 1 C2
  shrinking:                    s.t. there exists x 7→ f (y) ∈ σ with x ∈ ejvar(Q)
                     Q0 σ
Finally, for Q a CQ and T a DL-Lite-TBox let Rapid-Lite(Q, T ) be all function-
free clauses derivable from Q ∪ T by Ilite .


                                          5
Intuitively, unfolding corresponds to all classical binary resolution inferences that
won’t introduce function symbols in Q, while shrinking is a hyper-resolution step
which “eliminates” an ej-variable of Q using clauses that mention a functional
term f .
Example 3. Consider T and Q1 from Example 1 and consider also the TBox
T 0 = {C(x) ← B(x)} ∪ T . Applying Ilite to T ∪ Q1 performs the following
inferences:
              unfolding on Q1 and (3) produces Q02 = Q(x) ← R(x, y) ∧ C(y)
  unfolding on Q02 and C(x) ← B(x) produces Q03 = Q(x) ← R(x, y) ∧ B(y)
     shrinking on Q03 , (2a), and (2b) produces Q4 = Q(x) ← A(x)
The set R0 = {Q1 , Q02 , Q03 , Q4 } is a rewriting of Q1 w.r.t. T .                ♦
Correctness of our rewriting algorithm follows by showing how a derivation con-
structed by IREQ can be transformed into a derivation by Ilite ; hence, saturating
T ∪ Q by Ilite will create all necessary members of a rewriting. This is done in
two steps. First, we show that each Requiem derivation can be transformed into
an SLD derivation.
Lemma 1. Let T be a DL-Lite-TBox and let Q be a CQ with query predicate
Q. Every Q-clause Q0 derivable from T ∪ Q by IREQ is also derivable from T ∪ Q
by ISLD .
The Lemma is proven by showing how to “unfold” a Requiem inference Q, C ` Q0
where T `i C into Q, C1 ` Q00 , Q00 , C2 ` Q0 where T `i−1 C1 , T `i−1 C2 and
C1 , C2 ` C. By repeated application of this unfolding we will eventually (fully)
unfold any derivation of a Q-clause Q0 into inferences of the form Q, C1 `
Q1 , . . . , Qn−1 , Cn ` Q0 , where Ci ∈ T , i.e., into an SLD derivation of Q0 .
     Second, we show that each SLD derivation can be transformed into a deriva-
tion by Ilite . By definition of Ilite and Example 3 we can see that inferences using
the unfolding rule directly corresponds to one SLD inference, while shrinking cor-
responds to many SLD inferences where, the main premise is a function-free CQ,
the side premises contain the same function symbol f , and the conclusion is also
a function-free CQ. Hence, we need to show that each SLD derivation can be
transformed into one that satisfies the following property.
Definition 3. Let Q1 , Q2 , . . . , Qn be an SLD derivation of Qn such that Qi , Ci `
Qi+1 for some clause Ci . Assume also that all Q2 , . . . , Qn−1 contain a term that
mentions a function symbol f while Q1 and Qn are function-free. We say that
the derivation is function compact if all side premises Ci with 1 ≤ i < n used in
the derivation also mention f .
   The following can be shown for Horn clauses with one existential variable.
Lemma 2. Any SLD derivation can be transformed to a function compact one.
   Using Lemmas 1 and 2 we can show the following.
Theorem 1. Let a DL-Lite-TBox T and a CQ Q. Every derivation from T ∪ Q
by Ilite terminates. Moreover, Rapid-Lite(Q, T ) is a rewriting of Q w.r.t. T .


                                           6
4   A Rewriting Algorithm for ELHI

In the current section we extend the inference system Ilite in order to provide a
(query) rewriting algorithm for ELHI ontologies.
    ELHI additionally allows for RA-clauses of the form E(x) ← R(x, y) ∧ F (x).
According to the Requiem calculus this clause interacts with clause (3) of Ex-
ample 2 producing E(x) ← A(x) ∧ F (f (x)). Again this inference is of interest
only if also a clause of the form F (f (x)) ← G(x) can be deduced and hence then
produce F (x) ← A(x) ∧ G(x) which is function-free. As argued in [22] ontologies
typically have many clauses of these forms which can lead to a quadratic number
of redundant inferences.
    A straightforward approach to obtain an efficient calculus for ELHI ontolo-
gies would be to extend Definition 2 to allow for arbitrary ELHI-clauses as side
premises in shrinking and unfolding. Then, Lemmas 1 and 2 apply with few
modifications and hence this calculus would produce a rewriting.
Example 4. Let T be the ELHI-TBox consisting of the following clauses:

                                 C(x) ← S(x, y) ∧ D(y)                           (6)
                           S(f (x), x) ← B(x)                                    (7)
                                K(x) ← S(y, x) ∧ C(y)                            (8)

and let also the query Q1 = Q(x) ← K(x).
    By unfolding on Q1 and (8) we obtain Q2 = Q(x) ← S(y, x) ∧ C(y); then,
by unfolding on Q2 and (6) we obtain Q3 = Q(x) ← S(y, x) ∧ S(y, z) ∧ D(z);
finally, by shrinking on Q3 and (7) we can obtain Q4 = Q(x) ← B(x) ∧ D(x). It
can be verified that R = {Q1 , Q2 , Q3 , Q4 } is a rewriting of Q w.r.t. T ♦
However, as the previous example shows, unfolding using RA-clauses produces
clauses that contain more variables than the main premise and hence to variable
proliferation which implies termination problems. There are two solutions to this
issue. We either decide a bound on the number of times that such clauses can be
used as side premises or, like in Ilite , we still only allow DL-Lite-clauses as side
premises. But then, such a calculus would require additional inference rules in
order to be able to derive intermediate clauses that can then be side premises in
inferences. Next, we show how the Requiem calculus would behave when applied
to the input of Example 4 which motivates our ELHI calculus.

Example 5. Consider the TBox T and query Q1 from Example 4. When applied
to T and Q1 the Requiem algorithm would perform the following inferences with
the respective conclusions:

                  (6) and (7) produces C(f (x)) ← B(x) ∧ D(x)                    (9)
                  (8) and (7) produces K(x) ← B(x) ∧ C(f (x))                   (10)
                (10) and (9) produces K(x) ← B(x) ∧ D(x)                        (11)
                Q1 and (11) produces Q(x) ← B(x) ∧ D(x)                         (12)

                                         7
The set R = {Q1 , (6), (8), (12)} is a (datalog) rewriting of Q1 w.r.t. T .
    First, we can observe that, if we extend shrinking to accept RA-clauses as
main premises, then the inferences performed to produce clause (11) from clause
(8) can be captured by a single shrinking inference over clause (8) with side
premises (7) and (9). However, we can observe that clause (9) is produced by re-
solving together an RA-clause with a DL-Lite-clause and this cannot be captured
by any of the inferences of Ilite .                                           ♦

Motivated by the above example, our caluclus for ELHI-ontologies consists of a
new inference rule which can infer clauses like clause (9), together with unfolding
and (an extension of) shrinking that permit as a main premise either a CQ or
an RA-clause (i.e., Ilite with main premise clauses that are non DL-Lite).

Definition 4. With IEL we denote the inference system consisting of unfold-
ing, where the main premise is either a CQ or an RA-clause, together with the
following inference rules, where Υ is either a CQ or an RA-clause, for n ≥ 2
each Ci is a DL-Lite-clause, and Υ 0 σ is a function-free hyper-resolvent of Υ and
Ci :
               Υ    C1 . . . Cn
n-shrinking:                      s.t. there exists x 7→ f (y) ∈ σ with x ∈ ejvar(Υ )
                    Υ 0σ

               B(x) ← R(x, y) ∧ C(y)   R(f (x), x) ← A(x)
                                                          or
                        B(f (x)) ← A(x) ∧ C(x)
   function:
               B(x) ← R(y, x) ∧ C(y)   R(x, f (x)) ← A(x)
                        B(f (x)) ← A(x) ∧ C(x)

Finally, for Q a CQ and T an ELHI-TBox, Rapid-EL(Q, T ) is defined as all
function-free clauses derivable from Q ∪ T by IEL .

First, note that we had to extend shrinking to accept possibly more that two side
premises. This is because due to the function rule we can now deduce new clauses
that contain function symbols; hence there can be more than two clauses contain-
ing some function f , unlike the case in DL-Lite. In practice, however, we expect
that n is small since such clauses are only produced by the function rule and
only when there is a complex interaction between a clause containing R(x, f (x))
(R(f (x), x)) and an RA-clause containing the inverse R(y, x) (R(x, y)).

Example 6. Consider Example 5. The inference between clauses (6) and (7) that
produces clause (9) corresponds to an inference using the function rule. Then,
clause (11) can be produced by n-shrinking over (8) with side premises clauses
(7) and (9), while (12) is produced by unfolding on Q and (8), hence computing
the required rewriting.                                                      ♦

As mentioned in the previous section correctness of Rapid-Lite is heavily based
on showing that Requiem derivations can be unfolded into SLD derivations.
Consider now ELHI and an inference of the form Υ, C ` Υ 0 , where T `IREQ C.


                                         8
If the derivation of C does not involve an RA-clause, then again we can fully
unfold into an SLD derivation, like in DL-Lite. If it does, then this inference can
be unfolded into Υ, C1 ` Υ2 , . . . , Υn−1 , Cn ` Υ 0 up to a certain level; i.e., for some
Ci we might have Ci 6∈ T with Ci derivable by some RA-clause. By inspecting
the types of inferences performed by IREQ using RA-clauses (cf. [18, Table 4])
we can see that we can unfold up to the point where Ci has one of the following
forms, where the notation [C(x)] means that C(x) can be omitted:

                                 A(x) ← B(x) ∧ [C(x)]                                 (13)
                              B(f (x)) ← A(x) ∧ [C(x)]                                (14)

Again by inspection of IREQ we can see that clauses of form (14) can be pro-
duced from RA-clauses by an inference which is exactly the one captured by
the function rule. The derivation of clauses of form (13) is more involved and is
characterised next.
Lemma 3. Let P be a set of DL-Lite-clauses and let Υ1 be an RA-clause. Con-
sider also a derivation Υ1 , . . . , Υn by IREQ , such that Υi , Ci ` Υi+1 and Ci is deriv-
able from P by IREQ . Assume also that Υn is of the form A(x) ← B(x) ∧ [C(x)]
while no Υi with i < n is of the same form as Υn . Then, Υn can be derived from
P ∪ Υ1 by unfolding and n-shrinking.
Using Lemma 3 and our observation regarding clauses of form (14) we can show
the following.
Lemma 4. Let T be an ELHI-TBox and let Q be a CQ with query predicate
Q. Let also Plt be all DL-Lite-clauses derivable from T by IEL . Then, every
Q-clause Q0 derivable from T ∪ Q by IREQ is also derivable from Plt ∪ Q by ISLD .
     Finally, we can show the following.
Theorem 2. Let an ELHI-TBox T and a CQ Q. Every derivation from T ∪ Q
by IEL terminates. Moreover, Rapid-EL(Q, T ) is a datalog rewriting of Q, T .


5     Evaluation
We have extended the Rapid2 query rewriting system [7] to support ELHI
ontologies. Rapid attempts to compactly represent many unfolding inferences as
datalog rules in an effort to compute small (datalog) rewritings.
    We conducted an experimental evaluation comparing against Requiem [19],
Presto [21], and Clipper [8], using real-world large scale DL-Lite and ELHI
ontologies. Regarding DL-Lite, we used DL-Lite versions of the OpenGALEN2
(http://www.opengalen.org/), OBO protein (http://www.obofoundry.org/),
and NCI 3.12e (http://evs.nci.nih.gov/ftp1/NCI_Thesaurus) ontologies,
which we denote by Gd , Pd , and N, respectively. This part extends the evalua-
tion in [7] for large scale ontologies, and uses datalog rewritings instead of UCQs.
2
    http://www.image.ece.ntua.gr/~achort/rapid.zip


                                            9
                   Table 1. Statistics of the used test ontologies

  O concepts roles GCIs RIAs O concepts roles GCIs RIAs O concepts roles GCIs RIAs
  Gd 23193 851 49046 882 S       4298    519 6004 372 Ge 30048 851 63726 882
  Pd 35351     6 43351 0     C 4282      22 9564 15 Pe 37560         6 52383 0
                             N 29173     66 53341 0




Regarding ELHI, we used ELHI versions of the NASA SWEET 2.3 (http:
//sweet.jpl.nasa.gov/ontology/), periodic table (http://www.cs.man.ac.
uk/~stevensr/ontology/), OpenGALEN2, and OBO protein ontologies, which
we denote by S, C, Ge , and Pe , respectively. The DL-Lite and ELHI versions
of the ontologies where obtained by normalizing the ontologies and keeping the
appropriate subset of axioms. Table 1 provides statistics for the ontologies. For
each of them we manually constructed 5 test queries. All tests were performed
on a dual core 1.8GHz Intel Celeron processor laptop running Windows 8 and
JVM 1.7 with 3.6GB maximum heap size. The timeout limit was set to 2 hours.
    The results for DL-Lite are shown in the upper part of Table 2, which includes
the computation time and the size of the computed rewriting; “t/o” denotes
timeout. First, we observe that neither Presto nor Clipper managed to compute
a rewriting for Gd , Requiem required 9–18 minutes, while Rapid required only
milliseconds. Similar observations can be made for the rest of the DL-Lite ontolo-
gies. Notable cases are queries 1–5 over N for Clipper, where it required about
13 minutes for each of them, query 5 over Pd and query 3 over N for Requiem,
for which it did not manage to compute a rewriting, and all queries over Pd and
N for Presto for which it required 1 to 2 hours, being in general much slower also
from Requiem and Clipper. Rapid was slower only for query 5 over Pd , where
it required 7:15 minutes. The analysis showed that 7:00 out of the 7:15 minutes
are spent in backwards subsumption checking (which guarantees a compact re-
sult with no equivalent or subsumed queries), while the actual rewriting time is
only 15 seconds. Note that no other system performs backwards subsumption.
In fact, in this case e.g. Clipper returns several rewritings that are equivalent up
to variable renaming, which justifies also the difference in the rewriting sizes.
    The results for ELHI are shown in the lower part of Table 2; we did not run
Presto since it does not support ELHI. Our conclusions are similar to those of
DL-Lite—that is, in the large scale ontologies Rapid greatly outperforms all other
systems while in query 5 of the ELHI version of the OBO protein ontology (Pe )
it performed worse than Clipper since it spent 13:27 out of the 13:52 minutes in
backwards subsumption. Note also that for Ge , Rapid was the only system that
managed to compute a rewriting within ‘reasonable’ time (8–11 minutes).
    In summary, we can see that computing even some rewriting over large scale
and complex ontologies is still an unresolved issue as in many cases many state-
of-the-art systems did not manage to terminate, required several minutes, or
even hours. This could be acceptable if a rewriting is computed once for a fixed
ontology, however, practice has shown that ontologies are very often dynamic [5,
11, 10]. Moreover, in design time, ontology engineers tend to run reasoners often

                                         10
                            Table 2. Evaluation results.
                  Time (hh:ss:mm.msec)                     Rewriting size
          O  Rapid Requiem        Presto Clipper Rapid Requiem Presto Clipper
                                          DL-Lite
                 .11 18:34.96      t/o       t/o       225   30870 t/o    t/o
                 .09    9:01.05    t/o       t/o      1276    1152 t/o    t/o
         Gd      .11    t/o        t/o       t/o       973   t/o    t/o   t/o
                 .02 12:08.44      t/o       t/o       667   11306 t/o    t/o
                 .34    t/o        t/o       t/o      3267   t/o    t/o   t/o
                 .07       4.72 1:15:07.96 1:03.97      29       27   48     29
                 .81      45.58 58:19.05 1:04.94      1356    1356 2621    1356
         Pd     1.40    9:50.54 1:16:36.30 1:08.40 33919     33887 33888 33919
                6.99 11:30.87 1:21:28.37 1:08.70 34879       34733 35416 34879
             7:14.56    t/o     1:12:51.73 1:08.86 27907     t/o    2670 54430
                0.06       4.08 1:51:31.96 12:09.45    488    5002   469    488
                0.04      37.70 1:56:40.12 13:02.02   1804    1765 1766    1804
         N      0.15    t/o     1:59:20.03 13:15.10   4143   t/o    3546   4143
                0.05    9:42.13 1:57:19.87 13:17.46   1875   64500 1917    1875
                0.03 1:06:26.66 1:57:09.73 13:08.27    256 219150    208    340
                                           ELHI
                 .08        .17    n/a        13.67    172     298 n/a      171
                 .16        .36    n/a        13.32    473    1523 n/a      367
         S       .02        .44    n/a        13.79    629    1674 n/a      518
                 .05       1.09    n/a        13.32   1065    2861 n/a      949
                 .04      38.08    n/a        13.30   1075   18716 n/a      959
                 .06    3:36.54    n/a        21.42   1103    6800 n/a     2892
                 .05    3:40.24    n/a        19.73    879    6941 n/a     2892
         C       .09    3:47.85    n/a        18.95   1653    6889 n/a     2892
                 .08    3:31.29    n/a        20.48   1609    8077 n/a     2849
                 .15    9:10.73    n/a        20.33   1743   57054 n/a     2893
            11:01.71    t/o        n/a       t/o    142566   t/o    n/a   t/o
             8:45.87    t/o        n/a       t/o    140640   t/o    n/a   t/o
         Ge 10:18.01    t/o        n/a       t/o    143636   t/o    n/a   t/o
             9:26.00    t/o        n/a       t/o    136679   t/o    n/a   t/o
             7:47.52    t/o        n/a       t/o    128566   t/o    n/a   t/o
                4.80    t/o        n/a      1:42.08 51641    t/o    n/a   51641
               29.73    t/o        n/a      1:42.30 52877    t/o    n/a   52877
         Pe     2.59    t/o        n/a      1:45.35 51614    t/o    n/a   51614
               15.71    t/o        n/a      1:43.19 52407    t/o    n/a   52407
            13:51.97    t/o        n/a      1:47.10 79427    t/o    n/a 105950



to see what effects their changes have on reasoning. In this case, any computation
requiring more than a couple of minutes might not be acceptable.

6   Conclusions
We have presented an efficient resolution-based rewriting algorithm for ELHI.
Despite the complex interactions in ELHI we show that a calculus using exten-
sions of the unfolding and shrinking rules of Rapid for DL-Lite, plus a new rule
can be defined. Our experimental evaluation shows that there are large scale
ontologies which existing systems cannot handle, while the new algorithm only
requires milliseconds. This is to a large extent due to the hyper-resolution infer-
ence performed by shrinking, which avoids well-known inefficiencies of resolution-
based rewriting algorithms [22].
    Concerning future work, we plan to investigate whether a similar hyper-
resolution step can be used to optimise resolution-based rewriting algorithms
for non-Horn DLs like ALC, like those implemented in the KAON2 system [15].


                                          11
References
 1. Andrea Acciarri, Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maur-
    izio Lenzerini, Mattia Palmieri, and Riccardo Rosati. Quonto: Querying ontologies.
    In AAAI, pages 1670–1671, 2005.
 2. Leo Bachmair and Harald Ganzinger. Resolution theorem proving. In Handbook
    of Automated Reasoning, pages 19–99. Elsevier and MIT Press, 2001.
 3. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    and Riccardo Rosati. Tractable reasoning and efficient query answering in de-
    scription logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429,
    2007.
 4. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    and Riccardo Rosati. Data complexity of query answering in description logics.
    Artificial Intelligence, 195:335–360, 2013.
 5. Diego Calvanese, Evgeny Kharlamov, Werner Nutt, and Dmitriy Zheleznyakov.
    Evolution of dl-lite knowledge bases. In International Semantic Web Conference
    (1), pages 112–128, 2010.
 6. Chin-Liang Chang and Richard C. T. Lee. Symbolic logic and mechanical theorem
    proving. Computer science classics. Academic Press, 1973.
 7. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting in OWL 2
    QL. In Proc. of CADE 23, pages 192–206, 2011.
 8. Thomas Eiter, Magdalena Ortiz, Mantas Simkus, Trung-Kien Tran, and Guohui
    Xiao. Query rewriting for Horn-SHIQ plus rules. In AAAI, 2012.
 9. Christian G Fermüller, Alexander Leitsch, Ullrich Hustadt, and Tanel Tammet.
    Resolution decision procedures. In Handbook of Automated Reasoning, pages 1791–
    1849. Elsevier Science Publishers BV, 2001.
10. Giorgos Flouris, Dimitris Manakanatas, Haridimos Kondylakis, Dimitris Plex-
    ousakis, and Grigoris Antoniou. Ontology change: classification and survey. Knowl-
    edge Eng. Review, 23(2):117–152, 2008.
11. Rafael S. Gonçalves, Bijan Parsia, and Ulrike Sattler. Analysing the evolution of
    the nci thesaurus. In CBMS, pages 1–6, 2011.
12. Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Deciding Expressive Description
    Logics in the Framework of Resolution. Information & Computation, 206(5):579–
    601, 2008.
13. Ullrich Hustadt and Renate A Schmidt. Issues of decidability for description logics
    in the framework of resolution. In Automated Deduction in Classical and Non-
    Classical Logics, pages 191–205. Springer, 2000.
14. Martha Imprialou, Giorgos Stoilos, and Bernardo Cuenca Grau. Benchmarking
    ontology-based query rewriting systems. In Proceedings of the Twenty-Sixth AAAI
    Conference on Artificial Intelligence (AAAI 2012). AAAI Press, 2012.
15. Boris Motik. Description Logics and Disjunctive Datalog—More Than just a Fleet-
    ing Resemblance? In Proc. of the 4th Workshop on Methods for Modalities (M4M-
    4), volume 194, pages 246–265, 2005.
16. Boris Motik, Ulrike Sattler, and Rudi Studer. Query answering for OWL-DL with
    rules. J. Web Sem., 3(1):41–60, 2005.
17. Giorgio Orsi and Andreas Pieris. Optimizing query answering under ontological
    constraints. VLDB Endowment, 4(11):1004–1015, 2011.
18. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. Rewriting Conjunctive
    Queries under Description Logic Constraints. In Andrea Cal‘i, Georg Gottlob,
    Laks V.S. Lakshmanan, and Davide Martinenghi, editors, Proc. of the Int. Work-
    shop on Logic in Databases (LID 2008), Rome, Italy, May 19–20 2008.


                                          12
19. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. Tractable query answer-
    ing and rewriting under description logic constraints. Journal of Applied Logic,
    8(2):186–209, 2010.
20. Mariano Rodriguez-Muro and Diego Calvanese. High performance query answering
    over DL-Lite ontologies. In KR, 2012.
21. Riccardo Rosati and Alessandro Almatelli. Improving query answering over DL-
    Lite ontologies. In Proc. of KR-10, 2010.
22. Frantisek Simancik, Yevgeny Kazakov, and Ian Horrocks. Consequence-based rea-
    soning beyond horn ontologies. In IJCAI, pages 1093–1098, 2011.
23. Giorgos Stoilos, Bernardo Cuenca Grau, Boris Motik, and Ian Horrocks. Repair-
    ing ontologies for incomplete reasoners. In Proceedings of the 10th International
    Semantic Web Conference (ISWC-11), Bonn, Germany, pages 681–696, 2011.
24. Tassos Venetis, Giorgos Stoilos, and Giorgos Stamou. Query extensions and in-
    cremental query rewriting for OWL 2 QL ontologies. Journal on Data Semantics,
    Accepted, 2013.




                                         13