=Paper= {{Paper |id=Vol-477/paper-5 |storemode=property |title=A Comparison of Query Rewriting Techniques for DL-lite |pdfUrl=https://ceur-ws.org/Vol-477/paper_2.pdf |volume=Vol-477 |dblpUrl=https://dblp.org/rec/conf/dlog/Perez-UrbinaMH09 }} ==A Comparison of Query Rewriting Techniques for DL-lite== https://ceur-ws.org/Vol-477/paper_2.pdf
    A Comparison of Query Rewriting Techniques
                    for DL-Lite

              Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

                   Oxford University Computing Laboratory
                              Oxford, England
      {hector.perez-urbina,boris.motik,ian.horrocks}@comlab.ox.ac.uk



1    Introduction

An incomplete database is defined by a set of constraints and a partial database
instance [1]. Answering conjunctive queries over incomplete databases is an im-
portant computational task that lies at the core of many problems, such as in-
formation integration [12], data exchange [9], and data warehousing [17]. Given
a query and an incomplete database, the task is to compute the so-called certain
answers—the tuples that satisfy the query in every database instance that con-
forms to the partial instance and satisfies the constraints [16]. It is well known
that answering queries under general constraints is undecidable; therefore, the
expressivity of the constraint languages considered is typically restricted in order
to achieve decidability.
    Description Logics (DLs) [3] can be viewed as very expressive but decidable
first-order fragments, which makes them natural candidates for constraint lan-
guages. DLs are a family of knowledge representation formalisms that represent a
given domain in terms of concepts (unary predicates), roles (binary predicates),
and individuals (constants). A DL Knowledge Base (KB) K = hT , Ai consists
of a terminological component T called the TBox, and an assertional compo-
nent A called the ABox. In analogy to incomplete databases, the TBox can be
seen as a conceptual schema containing the set of constraints, and the ABox
as some partial instance of the schema. The use of DLs as constraint languages
has already proven to be useful in a variety of scenarios such as ontology-based
information integration [7] and the Semantic Web [10]. Various DLs exist for
which it is possible to transform a conjunctive query Q and a TBox T into a
datalog query Q0 —a so-called rewriting of Q w.r.t. T —such that for every ABox
A, the answers to Q over T and A, and the answers to Q0 over A coincide. Thus,
the problem of answering Q over K = hT , Ai for a specific A can be reduced
to evaluating the datalog query Q0 over A. A motivation for answering queries
via query rewriting is the prospect of using existing deductive database sys-
tems to evaluate the computed rewritings, thus taking advantage of the query
optimization strategies provided by such systems.
    The problem of conjunctive query rewriting for DLs has been considered by
Rosati [15] for the EL family of languages [2], and by Calvanese et al. for the
DL-Lite family of languages [6]. The rewriting algorithm proposed by Calvanese
2       Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

et al., which we refer to as the CGLLR algorithm, takes as input a conjunctive
query Q and a DL-LiteR TBox T and computes a union of conjunctive queries
that is a rewriting of Q w.r.t. T . The CGLLR algorithm lies at the core of
reasoners such as QuOnto1 and Owlgres2 . The aforementioned approaches are
closely related; however, they have been developed for two different DL families.
Motivated by the prospect of developing a unified algorithm, in our previous
work [13] we described a novel rewriting algorithm for the DL ELHI [2]. The
algorithm takes as input a conjunctive query Q and an ELHI TBox T , and
produces a rewriting Q0 of Q w.r.t. T . A distinguishing feature of our algorithm
is that it exhibits “pay-as-you-go” behavior: if T is in ELHI, ELH or EL, then
Q0 is a datalog query, as in [15]; if T is in DL-Lite+ [13], then Q0 consists of a
union of conjunctive queries and a linear datalog query; and, if T is in DL-LiteR
or DL-Litecore , then Q0 is a union of conjunctive queries, as in [6]. Hence, our
approach not only handles all the aforementioned sublanguages of ELHI, but
it is worst-case optimal w.r.t. data complexity for all of them, which makes it a
generalization and an extension of the approaches of [15] and [6].
     Our algorithm and the CGLLR algorithm mainly differ in the way existential
quantification is handled. In our algorithm, functional terms are used to keep
track of role successors. In contrast, the CGLLR algorithm deals with existen-
tial quantification by relying on a so-called reduction step. Moreover, unlike our
algorithm, the CGLLR algorithm does not handle qualified existential quan-
tification axioms natively; instead, a preliminary step is required where every
such an axiom is replaced with a suitable encoding that introduces a fresh role.
Our algorithm does not derive the queries produced by the reduction step and it
handles qualified existential quantification natively; therefore, we conjecture that
our algorithm will often produce smaller rewritings than the CGLLR algorithm.
     In order to gather empirical evidence to support our conjecture, we imple-
mented both algorithms in their original form—that is, we did not implement
any optimization to reduce the size of the rewritings. We then compared the
two implementations using a benchmark suite containing different DL-LiteR on-
tologies with TBoxes of varying complexity, and a set of test queries derived
from applications that use these ontologies. Our results clearly indicate that the
reduction step of the CGLLR algorithm can lead to a significant increase in the
size of the rewritings. The implementation of our algorithm—called REQUIEM
(REsolution-based QUery rewrIting for Expressive Models)—not only produced
significantly smaller rewritings, but was also significantly faster in most cases.
Finally, in order to determine the level of redundancy in the rewritings pro-
duced by our algorithm, we implemented an optimized version of REQUIEM,
called REQUIEM-Full, that uses forward subsumption [5] and performs an a
posteriori query subsumption check on the rewritings. REQUIEM produced the
same rewritings as REQUIEM-full for 63% of the queries. Therefore, our empiri-
cal evaluation suggests that REQUIEM alone can be effectively used in practice
in various scenarios.
1
    http://www.dis.uniroma1.it/~quonto/
2
    http://pellet.owldl.com/owlgres/
                   A Comparison of Query Rewriting Techniques for DL-Lite           3

2    Preliminaries

For P an atomic role, a basic role has the form P or P − . For A an atomic
concept and R a basic role, an antecedent concept has the form A or ∃R; and a
consequent concept has the form A, ∃R, or ∃R.A. A DL-LiteR TBox is a set of
inclusion assertions or inclusion axioms of the form A v C or R1 v R2 , where
A is an antecedent concept, C is a consequent concept, and R1 and R2 are basic
roles. The former inclusion assertion is called a concept inclusion, and the latter,
a role inclusion. An ABox is a set of membership assertions of the form A(a)
or P (a, b), where A is an atomic concept, P is an atomic role, and a and b are
constants. A DL-LiteR Knowledge Base (KB) K is a tuple hT , Ai, where T is a
DL-LiteR TBox, and A is an ABox. The semantics of K is defined as usual [3].
     We use the well-known definitions of constants, variables, function symbols,
terms, and atoms of first-order logic [8]. A Horn clause C is an expression of the
form D0 ← D1 ∧ ... ∧ Dn where each Di is an atom. The atom D0 is called the
head, and the set {D1 , ..., Dn } is called the body. Variables that occur in the body
at most once and do not occur in the head are called unbound variables and may
be denoted with the symbol ; all other variables are called bound variables.
A Horn clause is safe if every variable occurring in the head also occurs in the
body. The depth of a term t is defined as depth(t) = 0 if t is a constant or a
variable, and depth(f (s1 , ..., sn )) = 1 + max(depth(si )) for 1 ≤ i ≤ n if t is a
functional term f (s1 , ..., sn ). The notion of depth for atoms and Horn clauses is
extended in the natural way. An atom Di occurring in a Horn clause C is said
to be deepest in C if depth(Di ) ≥ depth(Dj ) for every body atom Dj of C. A
conjunctive query over a DL KB K is a safe Horn clause whose head predicate
does not occur in K, and whose body is a set of atoms whose predicates are
concept and role names occurring in K. A union of conjunctive queries over K is
a set of conjunctive queries over K with the same head up to variable renaming
[4]. A tuple of constants ~a is a certain answer to a union of conjunctive queries
Q over K if and only if K ∪ Q |= QP (~a), where QP is the head predicate of Q,
and Q is considered to be a set of universally quantified implications with the
usual first-order semantics [8]. The set of all answers to Q over K is denoted by
ans(Q, K). Given a conjunctive query Q and a TBox T , a query Q0 is said to be
a rewriting of Q w.r.t. T if ans(Q, hT , Ai) = ans(Q0 , A) for every ABox A.


3    Query Rewriting Algorithms

In this section we briefly present the two rewriting algorithms we consider in our
empirical evaluation: the CGLLR algorithm and our resolution-based algorithm.
Both algorithms transform a conjunctive query Q and a DL-LiteR TBox T into
a union of conjunctive queries Q0 that is a rewriting of Q w.r.t. T .
    The CGLLR Algorithm This algorithm uses the axioms of T as rewriting
rules and applies them to the body atoms of Q. The rewriting rules are based
on a partial function ref(D, α) that takes as input an axiom α ∈ T and a body
atom D of Q. We define ref(D, α) as follows:
4         Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

      Input: Conjunctive query Q, DL-LiteR TBox T
      R = {Q};
      repeat
         foreach query Q0 ∈ R do
             (reformulation) foreach atom D in Q0 do
                 foreach axiom α ∈ T do
                     if α is applicable to D then
                         R = R ∪ {Q0 [D/ref(D, α)]};
                     end
                 end
             end
             (reduction) forall atoms D1 , D2 in Q0 do
                 if D1 and D2 unify then
                     σ = MGU(D1 , D2 );
                     R = R ∪ {λ(Q0 σ))};
                 end
             end
         end
      until no query unique up to variable renaming can be added to R ;
      return R;
                      Algorithm 1: The CGLLR algorithm


    – if D = A(x), then we have that (i) if α = B v A, then ref(D, α) = B(x);
      (ii) if α = ∃P v A, then ref(D, α) = P (x, ); and (iii) if α = ∃P − v A, then
      ref(D, α) = P ( , x).
    – If D = P (x, ), then we have that (i) if α = A v ∃P , then ref(D, α) = A(x);
      (ii) if α = ∃S v ∃P , then ref(D, α) = S(x, ); and (iii) if α = ∃S − v ∃P ,
      then ref(D, α) = S( , x).
    – If D = P ( , x), then we have that (i) if α = A v ∃P − , then ref(D, α) = A(x);
      (ii) if α = ∃S v ∃P − , then ref(D, α) = S(x, ); and (iii) if α = ∃S − v ∃P − ,
      then ref(D, α) = S( , x).
    – If D = P (x, y), then we have that (i) if either α = S v P or α = S − v P − ,
      then ref(D, α) = S(x, y); and (ii) if either α = S v P − or α = S − v P , then
      ref(D, α) = S(y, x).

If ref(D, α) is defined for α and D, we say that α is applicable to D.
    The CGLLR algorithm is shown in Algorithm 1. Starting with the original
query Q, the algorithm keeps producing queries in two steps until no other query
unique up to variable renaming can be produced. In the reformulation step the
algorithm rewrites the body atoms of a given query Q0 by using applicable TBox
axioms as rewriting rules, generating a new query for every atom reformulation.
The expression Q[D/D0 ] denotes the conjunctive query obtained from Q by
replacing the body atom D with a new atom D0 . Then, in the reduction step
the algorithm produces a new query λ(Q0 σ) for each pair of body atoms of Q0
that unify. The function MGU takes as input two atoms and returns their most
general unifier [4]. The function λ takes as input a conjunctive query Q and
                   A Comparison of Query Rewriting Techniques for DL-Lite           5

   Input: Conjunctive query Q, DL-LiteR TBox T
   R = Ξ(T ) ∪ {Q};
   repeat
      (saturation) forall clauses C1 , C2 in R do
          R = R ∪ resolve(C1 , C2 );
      end
   until no query unique up to variable renaming can be added to R ;
   return {C | C ∈ unfold(ff(R)), and C has the same head predicate as Q};
                Algorithm 2: Our resolution-based algorithm


returns a new conjunctive query obtained by replacing each occurrence of an
unbound variable in Q with the symbol . The reduction step is necessary to
achieve completeness since an axiom that is not applicable to any body atom of
Q0 may be applicable to some body atom of λ(Q0 σ).
    As a final remark, we point out that the CGLLR algorithm does not handle
qualified existential quantification natively—that is, there is no rewriting rule
for axioms of the form A v ∃R.B. Instead, w.l.o.g. every axiom of such a form
occurring in T is replaced with the set of axioms {A v ∃P1 , ∃P1− v B, P1 v R},
where P1 is a new atomic role not occurring in T .
    Resolution-based Algorithm Our algorithm takes as input a conjunctive
query Q and an ELHI TBox T , and produces a datalog query that is a rewriting
of Q w.r.t. T . We have shown that if T is in DL-LiteR , then the rewriting is
a union of conjunctive queries. The algorithm first transforms Q and T into
clauses, and then computes the rewriting by using a resolution-based calculus to
derive new clauses from the initial set.
    Our algorithm is presented in Algorithm 2, where we show the parts of the
original algorithm that are relevant to DL-LiteR only. The rewriting is computed
in four steps: clausification, saturation, unfolding, and pruning. The algorithm
starts by transforming Q and T into a set of clauses Ξ(T ) ∪ {Q}. The expression
Ξ(T ) denotes the set of clauses obtained from T according to Table 1. Then, the
algorithm keeps producing clauses in the saturation step until no other clause
unique up to variable renaming can be produced. The function resolve takes two
clauses C1 and C2 , and it returns a set containing every clause CR that can be
obtained by combining the atoms of C1 and C2 according to the inference tem-
plates shown in Table 2. A template of the form P1 R P2 denotes that, if C1 is a
clause of the form of P1 and C2 is a clause of the form of P2 , then resolve(C1 , C2 )
contains all clauses of the form of R that can be constructed from C1 and C2 ; oth-
erwise, resolve(C1 , C2 ) = ∅. After the saturation step, the resulting function-free
clauses are unfolded. The function ff takes a set of clauses N and returns the set
of function-free clauses of N . The function unfold takes a set of clauses N , and re-
turns the set obtained by unfolding every clause in N ; for example, if we have that
N = {QP (x) ← A(x), A(x) ← B(x)}, then unfold(N ) = N ∪ {QP (x) ← B(x)},
which results by unfolding A(x) ← B(x) into QP (x) ← A(x). A formal descrip-
tion of the unfolding step can be found in [13]. In the last step, every clause that
does not have the same head predicate as Q is dropped.
6      Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

                Table 1. Translating T into a set of clauses Ξ(T )


                       DL-LiteR clause DL-LiteR axiom
                       B(x) ← A(x)         AvB
                       P (x, f (x)) ← A(x) A v ∃P.B
                       B(f (x)) ← A(x)
                       P (f (x), x) ← A(x) A v ∃P − .B
                       B(f (x)) ← A(x)
                       A(x) ← P (x, y)     ∃P v A
                       A(x) ← P (y, x)     ∃P − v A
                       S(x, y) ← P (x, y) P v S, P − v S −
                       S(x, y) ← P (y, x) P v S − , P − v S

Note 1. Each axiom of the form A v ∃R.B is uniquely associated with a function
symbol f .


    The Main Difference The presented techniques mainly differ in their han-
dling of existential quantification: while our algorithm deals with axioms con-
taining existential quantifiers on the right-hand side by introducing functional
terms, the CGLLR algorithm does so by restricting the applicability of such ax-
ioms and relying on the reduction step. We explore this difference by means of
an example (taken from [6]). Consider a DL-LiteR TBox T that consists of the
following axioms:

                              Professor v ∃teaches,                           (1)
                                      −
                             ∃teaches v Student.                              (2)

The TBox T states that a professor teaches at least someone, and that someone
that is taught is a student. Consider the query

                      Q(x) ← teaches(x, y) ∧ Student(y).                      (3)

    We first analyze the execution of the CGLLR algorithm. In the first iteration,
the axiom (1) is not applicable to teaches(x, y) because teaches(x, y) has more
than one bound variable. The reason why the applicability of (1) has to be
restricted in this case is that the CGLLR algorithm does not keep track of
information about role successors. In fact, naively allowing axioms of the form
of (1) to be applicable in such a case would result in the loss of soundness.
To illustrate this point, suppose that (1) were applicable to teaches(x, y) and
ref(teaches(x, y), (1)) = Professor(x); the algorithm would then obtain

                       Q(x) ← Professor(x) ∧ Student(y).                      (4)

Note that the relation between x and y is lost—that is, the fact that the individ-
ual represented by y must be a teaches-successor of the individual represented
by x is not captured by query (4). In particular, if T were to contain axiom (1)
only, then query (4) would clearly produce wrong answers.
                   A Comparison of Query Rewriting Techniques for DL-Lite              7

                Table 2. Inference templates for the function resolve

                           C(x) ← B(x) B(f (x)) ← A(x)
                                 C(f (x)) ← A(x)

   B(x) ← P (x, y) P (x, f (x)) ← A(x)       B(x) ← P (x, y) P (f (x), x) ← A(x)
             B(x) ← A(x)                             B(f (x)) ← A(x)

   B(x) ← P (y, x) P (x, f (x)) ← A(x)       B(x) ← P (y, x) P (f (x), x) ← A(x)
           B(f (x)) ← A(x)                             B(x) ← A(x)

   S(x, y) ← P (x, y) P (x, f (x)) ← A(x) S(x, y) ← P (x, y) P (f (x), x) ← A(x)
            S(x, f (x)) ← A(x)                     S(f (x), x) ← A(x)

   S(x, y) ← P (y, x) P (x, f (x)) ← A(x) S(x, y) ← P (y, x) P (f (x), x) ← A(x)
            S(f (x), x) ← A(x)                     S(x, f (x)) ← A(x)

                       u) ← B(t) ∧ Di (t~i ) B(f (x)) ← A(x)
                                   V
                   QP (~
                              u)σ ← A(x)σ ∧ Di (t~i )σ
                                             V
                          QP (~
          where σ = MGU(B(t), B(f (x)), and B(t) is deepest in its clause.

                     u) ← P (s, t) ∧ Di (t~i ) P (x, f (x)) ← A(x)
                                      V
                 QP (~
                               u)σ ← A(x)σ ∧ Di (t~i )σ
                                                 V
                         QP (~
       where σ = MGU(P (s, t), P (x, f (x)), and P (s, t) is deepest in its clause.

                     u) ← P (s, t) ∧ Di (t~i ) P (f (x), x) ← A(x)
                                      V
                 QP (~
                               u)σ ← A(x)σ ∧ Di (t~i )σ
                                                 V
                         QP (~
       where σ = MGU(P (s, t), P (f (x), x), and P (s, t) is deepest in its clause.




   Although the applicability of (1) is restricted, the axiom (2) is applicable to
Student(y) in (3), producing

                       Q(x) ← teaches(x, y) ∧ teaches( , y).                          (5)

In the next iteration, neither (1) nor (2) are applicable to any body atom of (5),
so no query is added in the reformulation step. In the reduction step, however,
the algorithm produces

                                Q(x) ← teaches(x, ),                                  (6)

by unifying the body atoms of (5). In the following iteration, the axiom (1) is
applicable to the only body atom of (6), producing

                                Q(x) ← Professor(x).                                  (7)

Note that without the reduction step, the algorithm would not have produced
query (7). It can easily be verified that no more queries unique up to variable
renaming can be produced; thus, the algorithm returns {(3), (5), (6), (7)}.
8       Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

   We now analyze the execution of our algorithm. The axioms (1) and (2) are
translated into the following clauses:

                         teaches(x, f (x)) ← Professor(x),                        (8)
                              Student(x) ← teaches(y, x).                         (9)

In the saturation step the algorithm produces

    Student(f (x)) ← Professor(x),                         (resolve((8), (9)))   (10)
            Q(x) ← Professor(x) ∧ Student(f (x)),          (resolve((3), (8)))   (11)
            Q(x) ← teaches(x, f (x)) ∧ Professor(x),      (resolve((3), (10)))   (12)
            Q(x) ← Professor(x).                          (resolve((8), (12)))   (13)

Note the difference between queries (4) and (11). Since the function symbol f
is uniquely associated with clause (8), unlike query (4), query (11) captures the
fact that the individual represented by f (x) must be a teaches-successor of the
individual represented by x. It can easily be verified that no other clause is
produced in the first step.
    Clearly, ff(R) = {(3), (9), (13)}; therefore, we have that unfold(ff(R)) consists
of ff(R) and the query

                      Q(x) ← teaches(x, y) ∧ teaches(z, y),                      (14)

which results from unfolding (9) into (3). Finally, since the clause (9) does not
have the same head predicate as query (3), our algorithm returns {(3), (13), (14)}.
    The use of functional terms is what mainly distinguishes our algorithm from
the CGLLR algorithm. This distinctive feature makes our approach more goal-
oriented, in the sense that it does not need to derive the queries produced by
the reduction step of the CGLLR algorithm in order to be complete. Moreover,
we have shown that every clause containing functional terms produced in the
saturation step of our algorithm can be safely dropped. Furthermore, our algo-
rithm handles qualified existential quantification natively, so it does not need to
introduce auxiliary roles. These are the main reasons why we conjecture that our
algorithm will often produce smaller rewritings than the CGLLR algorithm.


4    Evaluation
The main goal of the evaluation is to compare the algorithms described in Section
3 w.r.t. the size of the rewritings they produce. Since a rewriting containing fewer
queries than another is not necessarily less complex, we consider the size of a
rewriting as being the number of symbols needed to represent it in the standard
datalog notation. In order to give an indication of likely performance, we also
present the running time of both implementations; we point out, however, that
no special care was taken to obtain time efficient implementations and that the
times reported correspond to the computation of the rewritings only (we did not
evaluate the computed rewritings).
                  A Comparison of Query Rewriting Techniques for DL-Lite         9

    The implementation of our algorithm is called REQUIEM and we refer to our
implementation of the CGLLR algorithm as C. Both implementations are avail-
able at REQUIEM’s Web site3 . All tests were performed on a desktop computer
with a 2.59 GHz Intel processor, 1.87 GB of RAM, running Windows XP. We
used Java 1.6.0 Update 7 with a maximum heap size of 256 MB.
    Test Ontologies and Queries We selected test DL-LiteR ontologies with
varying numbers of axioms containing existential quantification. We consid-
ered ontologies that were developed in the context of real projects. The queries
we used were selected based on canonical examples used in the corresponding
project.
    VICODI is an ontology developed in the EU-funded project of the same
name,4 that captures information about European history contextualized w.r.t.
location, time and subject. STOCKEXCHANGE is an ontology developed for
ontology-based data access [14], that represents the domain of financial institu-
tions of the European Union. UNIVERSITY is a DL-LiteR version of LUBM5 —a
benchmark developed at Lehigh University for testing the performance of on-
tology management and reasoning systems; the ontology describes the organiza-
tional structure of universities. ADOLENA (Abilities and Disabilities OntoLogy
for ENhancing Accessibility) is an ontology developed for ontology-based data
access for the South African National Accessibility Portal [11]; the ontology de-
scribes abilities, disabilities, and devices. We decided to include two synthetic
ontologies in our tests in order to have a controlled scenario to help us under-
stand the impact of the reduction step of the CGLLR algorithm. PATH1 and
PATH5 model information about graphs: nodes are represented by individuals,
and vertices are ABox assertions of the form edge(a, b). The ontology PATH5
contains concepts representing paths of length 1–5, while PATH1 contains a
concept representing paths of length 1 only. All the ontologies and queries are
available at REQUIEM’s Web site.
    Results Figure 1 shows the size of the rewritings produced by REQUIEM
and C for the different considered ontologies and queries. Figure 2 shows the
time it took the implementations to compute the rewritings.
    As can be seen in Figure 1, the rewritings produced by REQUIEM were never
larger than those produced by C and were often significantly smaller. The most
extreme case was the fifth query over ADOLENA, in which REQUIEM produced
624 queries, as opposed to the more than 78,500 produced by C. We compared
the actual rewritings in order to verify whether for every query QR produced by
REQUIEM, there was an equivalent query up to variable renaming QP produced
by C. It turned out that this was the case for all of our tests.
    A closer inspection of the rewritings produced by C revealed that most of
them contained a considerable number of queries produced in the reduction step
of the algorithm. The impact of the reduction step can be clearly seen in the tests
involving PATH1 and PATH5. The queries we used with these two ontologies are
3
  http://www.comlab.ox.ac.uk/projects/requiem/
4
  http://www.vicodi.org/
5
  http://swat.cse.lehigh.edu/projects/lubm/
10      Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks


                                                       Fig. 1. Size of the rewritings
                                                VICODI                                                                         PATH1
                             100,000                                                                    10,000


                               10,000
                                                                                                          1,000

                                1,000




             Symbols




                                                                                       Symbols
                                                                                                           100
                                 100

                                                                                                            10
                                   10


                                    1                                                                        1
                                        Q0     Q1          Q2        Q3        Q4                                  Q0         Q1         Q2        Q3        Q4
                       REQUIEM          454    762        6,525    11,911     3,761              REQUIEM           42         70         98       126       154
                       C                454    812        6,525    11,911    16,255              C                 42         92        262       734       1,824

                                                PATH5                                                                      STOCKEXCHANGE
                           10,000,000                                                                10,000,000

                            1,000,000                                                                 1,000,000

                             100,000                                                                   100,000




                                                                                       Symbols
             Symbols




                              10,000                                                                    10,000

                                1,000                                                                     1,000

                                 100                                                                       100

                                  10                                                                        10

                                   1                                                                         1
                                        Q0     Q1          Q2        Q3        Q4                                  Q0         Q1         Q2        Q3        Q4
                       REQUIEM          122    286         486      708       938                REQUIEM           158      11,422     56,536    111,092   466,896
                       C                298   2,814      23,740    200,696 1,690,902             C                 158      13,680     121,674   173,088 1,602,203

                                              UNIVERSITY                                                                      ADOLENA
                            1,000,000                                                                10,000,000


                             100,000                                                                  1,000,000

                                                                                                       100,000
                               10,000
             Symbols




                                                                                       Symbols
                                                                                                        10,000
                                1,000
                                                                                                          1,000
                                 100
                                                                                                           100

                                   10                                                                       10

                                    1                                                                        1
                                        Q0     Q1          Q2        Q3        Q4                                  Q0         Q1         Q2        Q3        Q4
                       REQUIEM          118   10,378     29,376    113,270   279,266             REQUIEM          21,933     7,122     10,108    33,454    70,320
                       C                286   18,496     151,848   348,782   822,279             C                39,593    116,137    413,760   461,549 7,747,561




of the form Qn (x0 ) ← edge(x0 , x1 ) ∧ ... ∧ edge(xn , xn+1 ), for 0 ≤ n ≤ 4. Clearly,
every pair of body atoms in queries ofthis form unify; therefore, for every Q0
with m body atoms, C will produce m      2 new queries in the reduction step. The
most extreme case for these ontologies was the fifth query over PATH5, in which
REQUIEM produced 16 queries, as opposed to the more than 25,000 produced
by C. Note that for every query Q00 produced from a query Q0 in the reduction
step, there is a substitution σ such that Q0 σ = Q00 , in which case we say that
Q0 subsumes Q00 . It is well known that every such query Q00 can be dropped
from the rewriting without sacrificing completeness [8]. Identifying such queries,
however, is not straightforward since the CGLLR algorithm does not keep track
of which queries were produced in the reduction step. Notably, several of the
rewritings produced by C contained a considerable number of queries containing
the auxiliary roles introduced to simulate axioms of the form A v ∃R.B (see
Section 3). We believe that the CGLLR algorithm could benefit if such axioms
were handled natively. To the best of our knowledge, however, there does not
exist a version of the CGLLR algorithm that does so.
    The results shown in Figure 2 suggest that REQUIEM will be significantly
faster than C in case the queries are relatively complex and/or the ontologies
contain a relatively large number of existential quantification axioms. The most
extreme case was again the fifth query over ADOLENA, in which REQUIEM
took a little over half a second, as opposed to the almost 20 minutes taken by C.
    Finally, in order to determine the level of redundancy in the rewritings pro-
duced by our algorithm, we implemented REQUIEM-Full—an optimized version
of REQUIEM that uses forward subsumption [5] and performs an a posteriori
                                       A Comparison of Query Rewriting Techniques for DL-Lite                                                                          11


                                                                     Fig. 2. Running time
                                                    VICODI                                                                        PATH1
                                    1,000                                                                        100




                                     100




             Milliseconds




                                                                                        Milliseconds
                                                                                                                  10


                                      10




                                        1                                                                          1
                                            Q0     Q1          Q2      Q3       Q4                                      Q0       Q1         Q2      Q3        Q4
                            REQUIEM         250    266         297    359      282                     REQUIEM           1        1         1       16        32
                            C                1      1          31      47       78                     C                 1       16         16      16        31

                                                    PATH5                                                                     STOCKEXCHANGE
                                1,000,000                                                                    100,000


                                 100,000
                                                                                                              10,000

                                  10,000




                                                                                        Milliseconds
            Milliseconds




                                                                                                                1,000
                                   1,000
                                                                                                                 100
                                     100

                                                                                                                  10
                                      10


                                       1                                                                           1
                                            Q0     Q1          Q2      Q3       Q4                                      Q0       Q1         Q2      Q3        Q4
                            REQUIEM         15     15          94     922     24,172                   REQUIEM          16       93        625     1,438    21,704
                            C               1      15          141    3,546   265,875                  C                 1       78        922     1,109    80,031

                                                  UNIVERSITY                                                                     ADOLENA
                                 100,000                                                                   10,000,000

                                                                                                            1,000,000
                                   10,000
                                                                                                             100,000
             Milliseconds




                                                                                        Milliseconds
                                    1,000
                                                                                                              10,000

                                                                                                                1,000
                                     100

                                                                                                                 100
                                      10
                                                                                                                  10

                                        1                                                                          1
                                            Q0     Q1          Q2      Q3       Q4                                      Q0       Q1         Q2      Q3        Q4
                            REQUIEM         32     78          156    1,828    7,594                   REQUIEM          203      79        109     281       657
                            C                1     62          531    4,610   16,219                   C                157      687       4,187   8,000   1,119,000




query subsumption check on the rewritings. For every clause C produced in the
saturation step, the forward subsumption optimization verifies whether there is
a previously produced clause C 0 such that C 0 subsumes C. If this is the case,
then C is not added to the set of produced clauses. The query subsumption
check takes the rewriting after the pruning step and gets rid of every clause C
for which there is another clause C 0 such that C 0 subsumes C. Note that in
the case of the CGLLR algorithm, there always is a previously produced query
that subsumes every query produced in the reduction step; therefore, forward
subsumption would get rid of every query produced in the reduction step on
the fly, which would compromise completeness. Moreover, given the size of the
rewritings, a straightforward optimization based on an a posteriori query sub-
sumption check may be impractical. REQUIEM produced the same rewritings as
REQUIEM-Full for 63% of the queries. This suggests that REQUIEM alone can
be used effectively in practice in various scenarios.


5   Future Work

We plan to develop optimization techniques for REQUIEM in order to reduce or
eliminate recursion when possible in the computed rewritings. Additionally, we
plan to use REQUIEM in an ontology-based data access system using existing
database technology in order to evaluate the practicality of our approach for
query answering. Finally, we plan to extend REQUIEM to be a reasoner for
OWL 2 QL.
12      Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

References
 1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley,
    1995.
 2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL Envelope. In International
    Joint Conferences on Artificial Intelligence (IJCAI-05), 2005.
 3. F. Baader and W. Nutt. Basic Description Logics, chapter 2, pages 47–100. Cam-
    bridge University Press, 2003.
 4. F. Baader and W. Snyder. Unification Theory. In A. Robinson and A. Voronkov,
    editors, Handbook of Automated Reasoning, volume I, chapter 8, pages 445–532.
    Elsevier Science, 2001.
 5. L. Bachmair and H. Ganzinger. Resolution Theorem Proving. In A. Robinson
    and A. Voronkov, editors, Handbook of Automated Reasoning, volume 1, chapter 2,
    pages 19–100. North Holland, 2001.
 6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable
    Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Fam-
    ily. J. of Automated Reasoning, 2007.
 7. D. Calvanese, G. D. Giacomo, M. Lenzerini, D. Nardi, and R. Rosati. Descrip-
    tion Logic Framework for Information Integration. In Principles of Knowledge
    Representation and Reasoning, pages 2–13, 1998.
 8. C.-L. Chang and R. C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving.
    Academic Press, Inc., Orlando, FL, USA, 1997.
 9. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and
    Query Answering. In ICDT, pages 207–224, 2003.
10. J. Heflin and J. Hendler. A portrait of the semantic web in action. IEEE Intelligent
    Systems, 16(2):54–59, 2001.
11. C. M. Keet, R. Alberts, A. Gerber, and G. Chimamiwa. Enhancing web por-
    tals with ontology-based data access: The case study of south africa’s accessibility
    portal for people with disabilities. In OWLED, 2008.
12. M. Lenzerini. Data Integration: a theoretical perspective. In PODS ’02: Proceedings
    of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of
    database systems, pages 233–246, New York, NY, USA, 2002. ACM Press.
13. H. Pérez-Urbina, B. Motik, and I. Horrocks. Rewriting Conjunctive Queries under
    Description Logic Constraints. In Proceedings of the International Workshop on
    Logics in Databases, May 2008.
14. M. Rodriguez-Muro, L. Lubyte, and D. Calvanese. Realizing ontology based data
    access: A plug-in for protégé. In Proc. of the Workshop on Information Integration
    Methods, Architectures, and Systems (IIMAS 2008), pages 286–289. IEEE Com-
    puter Society Press, 2008.
15. R. Rosati. On conjunctive query answering in EL. In Proceedings of the 2007
    International Workshop on Description Logics (DL2007), CEUR-WS, 2007.
16. R. van der Meyden. Logical Approaches to Incomplete Information: A Survey. In
    J. Chomicki and G. Saake, editors, Logics for Databases and Information Systems,
    pages 307–356. Kluwer, 1998.
17. J. Widom. Research Problems in Data Warehousing. In 4th International Confer-
    ence on Information and Knowledge Management, pages 25–30, Baltimore, Mary-
    land, 1995.