=Paper= {{Paper |id=Vol-477/paper-11 |storemode=property |title=Combined FO Rewritability for Conjunctive Query Answering in DL-Lite |pdfUrl=https://ceur-ws.org/Vol-477/paper_32.pdf |volume=Vol-477 |dblpUrl=https://dblp.org/rec/conf/dlog/KontchakovLTWZ09 }} ==Combined FO Rewritability for Conjunctive Query Answering in DL-Lite== https://ceur-ws.org/Vol-477/paper_32.pdf
         Combined FO Rewritability for Conjunctive
               Query Answering in DL-Lite

        R. Kontchakov,1 C. Lutz,2 D. Toman,3 F. Wolter4 and M. Zakharyaschev1
    1
        School of CS and Information Systems 2 Fachbereich Mathematik und Informatik
           Birkbeck College London, UK             Universität Bremen, Germany
         {roman,michael}@dcs.bbk.ac.uk            clu@informatik.uni-bremen.de
           3                                      4
            D.R. Cheriton School of CS                Department of Computer Science
          University of Waterloo, Canada               University of Liverpool, UK
               david@uwaterloo.ca                         frank@csc.liv.ac.uk



1        Introduction

Standard description logic (DL) reasoning services such as satisfiability and sub-
sumption mainly aim to support TBox design. When the design has been com-
pleted and the TBox is used in an actual application, it is usually combined
with instance data stored in an ABox, and therefore query answering becomes
the most important reasoning service. In this context, conjunctive queries (CQs)
have become very popular, and a rich literature on the subject has emerged in
recent years; see, e.g., [5, 9, 12–14, 19, 22]. Unfortunately, CQ answering in ex-
pressive DLs of the ALC and SHIQ families is often computationally harder
than satisfiability and subsumption [11, 15], while in typical applications it is
also much more time critical. To address this problem, it has been proposed to
use standard relational database management systems (RDBMS) for DL query
processing [7]. Indeed, the DL-Lite family of DLs was designed to allow CQ an-
swering using standard RDBMSs while still capturing as much basic expressivity
of ontology languages as possible [2, 3, 6–9, 20].
    The basic idea behind CQ answering in DL-Lite is to store the ABox as
a relational instance in the RDBMS and to rewrite the query to account for
the implicit information represented in the TBox, of which the RDBMS is un-
aware. The foundation of this approach is captured formally by the notion of
FO rewritability [8].1 Recently, an alternative approach to using RDBMSs for
CQ answering in DLs has been proposed by generalizing FO rewritability to
combined FO rewritability [17, 18]. In this new approach, the TBox is incorpo-
rated mainly into the ABox rather than into the query, though limited query
rewriting is still necessary. Formally, a DL enjoys combined FO rewritability if,
given a knowledge base (T , A) and a CQ q, it is possible to effectively rewrite
(i) the ABox A and the TBox T into a finite FO structure M (independently
of q) and (ii) q and (possibly) T into an FO query q † (independently of A) in
such a way that query answers are preserved—i.e., the answers to q † over M
are the same as the certain answers to q over (T , A). Important members of the
1
    The term used in [8] is FO reducibility, but we prefer ‘rewritability’ to avoid confusion
    with the complexity class FO from descriptive complexity.
EL family of DLs, which is incomparable in expressive power to the DL-Lite
family, enjoy combined FO rewritability [17, 18]. In fact, this was originally the
main motivation for introducing the more general notion of rewritability: while
a DL can only enjoy FO rewritability if its data complexity for CQ answering is
in LogSpace [8, 9] (actually, in AC0 [3]), PTime DLs such as those in the EL
family can allow combined FO rewritability.
    However, even for DLs such as DL-Lite in which the data complexity of CQ
answering is in AC0 , the novel approach to CQ answering can be useful. On
the one hand, this approach presupposes authority over the data to implement
ABox rewriting, which makes it unsuitable for applications in which only a query
interface is provided such as virtual data integration. On the other hand, though,
the increased flexibility obtained by allowing to rewrite both the query and the
ABox can be expected to enable more efficient query execution. In this context,
it is interesting to note that the query rewriting for DL-Lite of the original
approach [8, 9] involves an exponential blowup in the size of the query (the
rewritten query is a union of (|T | · |q|)|q| many CQs of length ≤ |q|), whereas the
approach for EL [17, 18] does not involve such a blowup.
    In this paper, we apply the novel approach to CQ answering pursued in
[17, 18] to the DL-Lite family of DLs. More precisely, we concentrate on the
DL-LiteN horn [2, 3] (which properly contains DL-LiteF ,u of [8]) and show that it
is possible to rewrite (i) every DL-LiteN  horn TBox T and ABox A (independently
of a given CQ q) to a finite FO structure of size O(|T | · |A|) and (ii) every CQ q
(independently of both T and A) into an FO query q † such that the answers are
preserved. The proof is much less trivial than one might expect since, compared
to EL, the presence of inverse roles in DL-Lite complicates matters considerably.
This also results, in general, in the rewritten query q † to be of size 2O(|q|) and
thus exponential in the size of q (which is commonly small), but independent
of the size of T (which is often large). However, we identify a large and natural
class of queries where the size of q † is only O(|q|2 ).
    We perform an initial experimental evaluation of our approach and compare
its performance to the approach of [8, 9] implemented in the QuOnto system [1,
21]. Summarized, the evaluation confirms the intuition that our combined ap-
proach should typically lead to smaller query execution times. In particular, the
QuOnto rewriting often significantly blows up queries that use concepts with
many subsumees, which typically leads to long query execution times. The ex-
ecution times in our own approach are typically small if we stay within the
identified query class that admits polynomial rewriting. On the negative side,
the ABox rewriting can take long and should either be implemented offline or
using (yet to be developed) incremental techniques.


2   DL-LiteN
           horn

The DL-Lite family of DLs was originally introduced in [7–9]. We consider
DL-LiteNhorn , the most expressive dialect of DL-Lite that does not include role
inclusions and for which query answering is in AC0 for data complexity; for
relations to other logics in the DL-Lite family see [3, 2]. Let NI , NC , and NR
be countably infinite sets of individual names, concept names, and role names.
Roles R and concepts C of DL-LiteN
                                 horn are defined as follows, where P ∈ NR ,
A ∈ NC and m > 0:
            R ::= P | P −          and      C ::= > | ⊥ | A | ≥ m R.
As usual, we write ∃R for ≥ 1 R and identify (R− )− with R. A DL-LiteN                horn
TBox is a finite set of concept inclusions (CIs) of the form C1 u · · · u Cn v C,
where C1 , . . . , Cn and C are concepts. For example, the CIs ≥ 2 R v ⊥ and
A u B v ⊥ say that the role R is functional and that concepts A and B are
disjoint, respectively. An ABox is a finite set of concept assertions A(a) and
role assertions R(a, b), where A is a concept name, R a role and a, b ∈ NI . We
use Ind(A) to denote the set of individual names occurring in A. A DL-LiteN           horn
knowledge base (KB) is a pair (T , A) with T a TBox and A an ABox. The
semantics of the DL-LiteN     horn constructs is defined in the standard way [4], with
the unique names assumption for ABox individuals. For I an interpretation, C1 ,
C2 concepts, R a role, and a, b individual names, we use the notation I |= C1 v
C2 , I |= C1 (a), and I |= R(a, b) with the usual meaning. A KB K = (T , A) is
consistent if there is an interpretation I with I |= α for all α ∈ T ∪ A. In this
case, we write I |= K and say that I is a model of K.
    Let NV be a countably infinite set of variables. Taken together, the sets NV
and NI form the set NT of terms. A first-order (FO) query is a first-order formula
q = ϕ(v) in the signature NC ∪ NR with terms from NT , where the concept and
role names are treated as unary and binary predicates, respectively, and the
sequence v = v1 , . . . , vk of variables from NV contains all the free variables of ϕ.
The free variables are called the answer variables of q, and q is k-ary if it has
k answer variables. A conjunctive query (CQ, for short) is a first-order query of
the form q = ∃u ψ(u, v), where ψ is a conjunction of concept atoms A(t) and
role atoms P (t, t0 ) with t, t0 ∈ NT . The variables in u are called the quantified
variables of q. We denote by var(q) the set of all variables in u and v, by qvar(q)
the set of quantified variables, by avar(q) the set of answer variables, and by
term(q) the set of terms in q.
    For I an interpretation, q = ϕ(v) a k-ary FO query, and a1 , . . . , ak ∈ NI ,
we write I |= q[a1 , . . . , ak ] if I satisfies q with vi assigned to aIi , 1 ≤ i ≤ k.
A certain answer for a k-ary CQ q and a KB K is a tuple (a1 , . . . , ak ) such
that a1 , . . . , ak occur in K and I |= q[a1 , . . . , ak ] for each model I of K. We use
cert(q, K) to denote the set of all certain answers for q and K. This defines the
query answering problem studied in this paper: given a DL-LiteN             horn KB K and
a CQ q, compute cert(q, K).


3    ABox Rewriting / Canonical Interpretations
The ABox rewriting part of our approach consists of expanding the ABox A to a
canonical interpretation IK for the KB K = (T , A). The ABox A can be viewed
as a ‘fragment’ of IK , i.e., every ABox individual is an element of ∆IK , but there
are also additional elements that witness existential and number restrictions.
Expanding A to IK involves adding these additional elements, connecting them
to the ABox individuals and other additional elements by means of roles, and
adding extra concept memberships for the ABox individuals.
   Let K = (T , A) be a DL-LiteN
                               horn knowledge base. For a role R, let

              domA (R) = {a ∈ Ind(A) | ∃b ∈ Ind(A) R(a, b) ∈ A}.
We call a role R generating w.r.t. K if there are a ∈ Ind(A) and R1 , . . . , Rn
such that Rn = R, K |= ∃R1 (a), a 6∈ domA (R1 ), and T |= ∃Ri− v ∃Ri+1 for
1 ≤ i < n. In other words, R is generating w.r.t. K if every model I of K must
contain a d ∈ ∆I with an incoming R-arrow but there is an I where no such d
is identified by an individual in the ABox. The canonical interpretation IK for
K is defined now as follows:
     ∆IK = Ind(A) ∪ {xR | R is generating w.r.t. K},
     AIK = {a ∈ Ind(A) | K |= A(a)} ∪ {xR ∈ ∆I | K |= ∃R− v A},
     P IK = {(a, b) ∈ Ind(A) × Ind(A) | P (a, b) ∈ A} ∪
                               {(a, xP ) | K |= ∃P (a) and a ∈
                                                             / domA (P )} ∪
                               {(xP − , a) | K |= ∃P − (a) and a ∈
                                                                 / domA (P − )} ∪
                               {(xS , xP ) | K |= ∃S − v ∃P and P 6= S − } ∪
                               {(xP − , xS ) | K |= ∃S − v ∃P − and P 6= S},
      aIK = a.
In the above definition, P ranges over role names, while R and S range over role
names and their inverses. We do not discuss in detail how IK can be constructed
in the context of database systems here; for a similar construction see [17]. Note
that IK is finite by definition, and that ∆IK \ Ind(A) can typically be expected
to be relatively small.

Example 1. Consider the KB K = (T , A) with
         T = {A1 v A, A2 v A, ∃P − v ∃S, ∃S − v ∃R, A v ∃P },
         A = {A1 (a), A2 (b), S(a, b)}.
Then the canonical interpretation IK is as follows:
∆IK = {a, b, xP , xS , xR },
AIK = {a, b},                  AI1 K = {a},                   AI2 K = {b},
P IK = {(a, xP ), (b, xP )},   S IK = {(a, b), (xP , xS )},   RIK = {(b, xR ), (xS , xR )}.

We note that IK is not always a model of K. Indeed, this cannot be expected
since DL-LiteN horn does not enjoy the finite model property whereas IK needs to
be finite to be stored as a relational instance. As a concrete example, consider
K = (T , A) with T = {A v ∃P, ≥ 2 P − v ⊥} and A = {A(a), A(b)}. Then
xP ∈ (≥ 2 P − )IK , and so IK 6|= K. From the perspective of query answering,
this is not a problem. What is more problematic is that IK does not give the
right answers to queries, i.e., it is not the case that IK |= q[a] iff K |= q[a] for all
k-ary CQs q and k-tuples a ⊆ Ind(A). In contrast, the ‘unravelling’ of IK into
an (infinite) interpretation UK as introduced in the following does give the right
answers to queries.
    A path in IK is a finite sequence axR1 · · · xRn , for n ≥ 0, where

 – a ∈ Ind(A);
 – (aIK , xR1 ) ∈ R1IK ;
 – (xRi , xRi+1 ) ∈ RiIK , for 1 ≤ i < n;
 – Ri+1 6= Ri− , for 1 ≤ i < n.

We denote by paths(IK ) the set of all paths in IK and by tail(p) the last element
in p ∈ paths(IK ). Then UK is defined as follows:
     ∆UK    =   paths(IK );
     aUK    =   a for all a ∈ Ind(A);
     AU K   =   {p | tail(p) ∈ AIK };
     P UK   =   {(a, b) | a, b ∈ Ind(A) ∧ P (a, b) ∈ A} ∪
                {(s, s · xP ) | s · xP ∈ ∆UK } ∪ {(s · xP − , s) | s · xP − ∈ ∆UK },
where ‘·’ denotes concatenation. Just like IK , UK is in general not a model of K.
A simple example is given by T = {A v ≥ 2 P } and A = {A(a)}, where we
have UK 6|= A v ≥ 2P because a is P -related only to xP in UK . Nevertheless,
UK gives the right answers to all conjunctive queries. A proof of the following
theorem can be found in the full version of this paper [16].
Theorem 1. For every satisfiable DL-LiteN  horn KB K, every k-ary conjunctive
query q, and every k-tuple a ⊆ Ind(A), K |= q[a] iff UK |= q[a].
In the next section, we show how to rewrite the query q such that the answers
to q in UK (which is infinite and cannot be stored in a database) are identical
to the answers to the rewritten query in IK , thus paving the way to using IK
as a database instance for query answering. In the rewriting, we exploit several
structural properties of IK and UK that follow immediately from the definitions.
Of particular interest are the following two properties:

(p1) If (d, e), (d, e0 ) ∈ RIK with e 6= e0 , then d ∈ Ind(A) or d = xR− .
(p2) If (d, e), (d, e0 ) ∈ RUK with e 6= e0 , then d ∈ Ind(A).

We also use an additional concept name aux with the interpretations auxIK =
∆IK \ Ind(A) and auxUK = ∆UK \ Ind(A). Thus, aux distinguishes the elements
of IK used to satisfy existential restrictions and number restrictions from those
that correspond to individual names.


4    Query Rewriting

We present a procedure that rewrites a k-ary CQ q into an FO query q † such that
IK |= q † [a] iff UK |= q[a], for all KBs K = (T , A) and all k-tuples a ⊆ Ind(A).
Using Theorem 1 we obtain the intended answers to K and q by using an RDBMS
to execute q † over IK stored as a relational instance.
    Our construction of q † proceeds in two steps. First, we consider only queries
q satisfying a syntactic condition of being free of bad spikes. For queries of this
sort, we construct a rewriting q ∗ of size O(|q|2 ). In the second step, we show how
an unrestricted query q can be transformed into a query q † that is a disjunction
of queries free of bad spikes and such that the answers to q and q † coincide
on interpretations UK . The size of q † is 2O(|q|) . Applying the first step to each
                                                            2
disjunct, we thus obtain a rewriting of q of size 2O(|q|) . We note that the first
step is quite useful by itself as it establishes queries free of bad spikes as a large
and natural class of queries that can be rewritten with only a polynomial blowup.


4.1    Polynomial Rewriting

To simplify the notation, we sometimes identify a CQ with the set of its atoms
and do not distinguish atoms P (u, v) and P − (v, u) in queries. To prepare for
the rewriting of arbitrary queries, which uses the rewriting presented here as
a sub-procedure, we allow the input query q to contain atoms ¬aux(v). Note
that this is a purely technical issue and in applications aux(v) can be completely
hidden from the user.
Definition 1. Let q be a conjunctive query. A subquery
                 q c = {R0 (t0 , t1 ), R1 (t1 , t2 ), . . . , Rn−1 (tn−1 , tn )} ⊆ q
is said to be a cycle if t0 = tn , ti 6= tj for 0 ≤ i < j < n, and R0 6= R1− if n = 2.
                                                                                        −
    For n ≥ 3 each 5-tuple (ti , Ri , ti+n 1 , Ri+n 1 , ti+n 2 ) with i < n and Ri = Ri+   n1
                         c
is called a spike in q , where +n denotes addition modulo n. The spike is bad
w.r.t. q if ti+n 1 is a quantified variable and ¬aux(ti+n 1 ) ∈     / q. Now we call q c free
                                 c
of bad spikes if no spike in q is bad w.r.t. q, and q is free of bad spikes if every
cycle q c ⊆ q is.
The condition that R0 6= R1− for n = 2 in the definition of cycles ensures that
the query {P (u, v)}, which is identified with {P (u, v), P − (v, u)}, is not regarded
as a cycle. Fix a query q = ∃u ϕ that is free of bad spikes. Define inductively
sets Id(i)            (0)
       q by taking Idq = {(t, t, ∅) | t ∈ term(q)} and, for i > 0,


  Id(i+1)
    q     = Id(i)
              q ∪
                                                                              −
      {(t1 , t2 , {R}) | ∃(s1 , s2 , R) ∈ Id(i)
                                            q ∃R(t1 , s1 ), R(t2 , s2 ) ∈ q (R ∈/ R)} ∪
                    {(t1 , t2 , R1 ∪ R2 )) | ∃s (t1 , s, R1 ) ∈ Idq(i) ∧ (s, t2 , R2 ) ∈ Id(i)
                                                                                           q }.

Set Idq = i≥0 Id(i)
         S
                 q and let Cyc(q) be the set of quantified variables v occurring
on some cycle in q. We define the rewriting of q as q ∗ = ∃u (ϕ ∧ ϕ1 ∧ ϕ2 ), where
                                 ^
                   ϕ1 =                   ¬aux(v),
                                 v∈avar(q)∪Cyc(q)
                                          ^
                                                  (s = xR ) → (t = t0 ) .
                                                                       
                      ϕ2 =
                                  R(t,s),R(t0 ,s0 )∈q,
                                 (s,s0 ,R)∈Idq ,R− ∈R
                                                    /

The main result of this paper is the following theorem whose rather non-trivial
proof can be found in [16]:
Theorem 2. Let q = ∃u ϕ be a k-ary CQ free of bad spikes. Then, for every
knowledge base K and every k-tuple a ⊆ Ind(A), IK |= q ∗ [a] iff UK |= q[a].
Together, Theorems 1 and 2 show that the rewritten query q ∗ is as required.
Clearly, the size of q ∗ is O(|q|2 ). We now give an intuitive explanation of this
rewriting whose general purpose is to allow only matches in IK that can be
‘reproduced’ in UK .
    The conjunct ϕ1 of q ∗ ensures that answer variables and variables in cycles are
matched to ABox individuals. The former is required independently of whether
we use UK or IK and just reflects the fact that query answers can contain ABox
individuals only. The latter is more interesting. First, it is readily seen that no
variable in a cycle can be matched to an element of auxUK if the cycle is free of
bad spikes and thus, completeness is not compromised. To see why variables in
cycles must be matched to ABox individuals to ensure soundness, consider the
query q0 = ∃v1 , v2 , v3 (P1 (v1 , v2 ) ∧ P2 (v2 , v3 ) ∧ P3 (v3 , v1 )). As q0 is a cycle,
q0∗ = ∃v1 v2 v3 (P1 (v1 , v2 )∧P2 (v2 , v3 )∧P3 (v3 , v1 )∧¬aux(v1 )∧¬aux(v2 )∧¬aux(v3 )),
where the ¬aux(vi ) are introduced since the vi are in a cycle. To see that without
this rewriting we would not get correct answers, consider K0 = (T0 , A0 ) with
  T0 = {A v ∃P1 , ∃P1− v ∃P2 , ∃P2− v ∃P3 , ∃P3− v ∃P1 },               A0 = {A(a)}.
Then K 6|= q0 and thus UK 6|= q0 , but IK |= q0 . On the other hand, IK 6|= q0∗ .
   To explain the conjunct ϕ2 , consider q1 = ∃u (P (v, u) ∧ P (w, u)). Then we
have (v, w, {P }) ∈ Id(1)
                      q1 and, therefore,

      q1∗ = ∃u (P (v, u) ∧ P (w, u) ∧ ¬aux(v) ∧ ¬aux(w) ∧ (u = xP → v = w)),
where ¬aux(v) ∧ ¬aux(w) is introduced by ϕ1 since v and w are answer variables
and the last conjunct is introduced by ϕ2 . Completeness is not compromised
because, by (p2), no non-ABox element can be in a relation RUK to two distinct
elements and u = xP implies that u is matched with a non-ABox element. To see
that the last conjunct is necessary to ensure soundness, consider K1 = (T1 , A1 ),
for T1 = {A v ∃P } and A1 = {A(a), A(b)}. Then K1 6|= q1 [a, b] and thus
UK1 6|= q1 [a, b], but IK1 |= q1 [a, b]. On the other hand, IK1 6|= q1∗ [a, b]. The use of
the pre-condition u = xP instead of aux(u) for identifying v and w is one of the
main differences to the rewriting for EL in [17]. It reflects property (p1) of IK .


4.2     Exponential Rewriting

We now show how an unrestricted query q can be transformed into a query q †
that is a disjunction of queries free of bad spikes and such that the answers to q
and q † coincide on the interpretations UK .
   Given a spike S = (t0 , R0 , t1 , R1 , t2 ), its center variable v(S) is t1 (which
must, by definition, be a quantified variable). We rewrite q = ∃u ϕ into a dis-
junction of the form               _
                           q† =        ∃u (ϕi ∧ ψi ∧ χi ),                        (†)
                                      i

where

 1. each ϕi is a substitution instance of ϕ (obtained from ϕ by replacing terms);
 2. each ψi is a conjunction of atoms aux(t);
 3. each χi is a conjunction of negated atoms ¬aux(t);
 4. for each ϕi and each spike S in a cycle in ϕi , either v(S) is not quantified or
    ¬aux(v(S)) is a conjunct of χi .
The last condition guarantees that each disjunct is free of bad spikes. Indeed,
each disjunct is of the form discussed in the previous section and, using Theo-
rem 2, we can apply the rewriting from the previous section to each disjunct of
q † and get a rewriting for q that can be passed to an RDBMS for execution.
Theorem 3. For every k-ary CQ q, one can construct an FO query q † of the
form (†) such that, for every knowledge base K and k-tuple a ⊆ Ind(A), we have
UK |= q[a] iff UK |= q † [a].
We now discuss the construction of q † in detail. The rewriting is iterative in
the following sense: a single step rewriting obtains a formula of the form q † , but
for which Condition 4 may not yet be satisfied. Hence we reapply the rewriting
to the disjuncts of the obtained formula until Condition 4 is satisfied. For this
reason, we assume that the input query q = ∃u ϕ to the rewriting may contain
atoms ¬aux(v). Let S(q) be the set of bad spikes S in cycles in q and let Γ =
{v(S) | S ∈ S(q)}. For every subset U of Γ , define an equivalence relation ∼U
on term(q) as the reflexive and transitive closure of
                    {(t0 , t1 ) | (t0 , R, t, R0 , t1 ) ∈ S(q), t ∈ U }.
Intuitively, U is a ‘guess’ identifying those variables in Γ that are mapped to
elements of auxUK , whereas all variables in Γ \ U are mapped to elements of ∆IK
that are identified by individual names. Having this guess allows us to eliminate
bad spikes by (i) adding atoms ¬aux(v) for all v ∈ Γ \ U and (ii) identifying
all terms t, t0 with t ∼U t0 as these have to be mapped to the same element in
matches of q in UK by property (p2) from Section 3.
                                   ^               ^
                 q U = ∃u ϕU ∧
                                                                 
                                       aux(tξ ) ∧       ¬aux(tξ ) ,
                                     t∈U              t∈(Γ \U )

where tξ is a representative of the ∼U -equivalence class ξ of t and ϕU results
from ϕ by replacing every t in it with tξ .
Lemma 1. For every k-ary W  CQ q, every KB K, and every k-tuple a ⊆ Ind(A),
we have UK |= q[a] iff UK |= U ⊆Γ q U [a].

Clearly, all bad spikes in q are eliminated during the construction of q U . However,
as mentioned above, this elimination can introduce new bad spikes. So we apply
the rewriting to the disjuncts of q U and repeat this process until we end up with
a query satisfying Condition 4 above. This procedure must terminate since at
least two terms are identified at each step. The outcome is the desired rewriting
q † of q. It is not hard to see that it is of size 2O(|q|) .


5   Experiments
We evaluate the performance of our combined FO rewriting technique by com-
paring it (i) with unrewritten queries over the original ABox, which does not yield
correct results but provides an ‘ultimate lower bound’ in the sense that queries
cannot be expected to be executed any faster; and (ii) with the non-combined FO
query rewriting introduced in [8, 9] and implemented in the QuOnto system [1,
21].
    The experiments employ two DL-Lite ontologies formulated in DL-Litecore ,
the common fragment of DL-LiteN     horn and the logic underlying QuOnto. The
first ontology, called Galen-Lite, is the DL-Litecore approximation of the well-
known medical ontology Galen: it consists of exactly those DL-Litecore con-
cept inclusions (in the vocabulary of Galen) that are logical consequences of
Galen. Galen-Lite contains 2734 concept names, 207 roles, and 4890 axioms.
The second ontology is called Core; it is a DL-Litecore representation of (a
fragment of) a supply-chain management system used by the bookstore chain
Ottakar’s, now rebranded as Waterstone’s. Core contains 84 concept names, 77
roles, and 381 axioms. Galen-Lite defines a complex concept hierarchy, but there
are only few axioms that involve roles. This leads to canonical interpretations
whose anonymous (i.e., non-ABox) part is relatively small. Core, in contrast,
comprises more interesting statements about roles, which gives a richer anony-
mous part in canonical interpretations. Core’s concept hierarchy, however, is
not as sophisticated as the one of Galen-Lite. Both ontologies can be found at
http://www.dcs.bbk.ac.uk/~roman/query-rewriting/.
    In the experiments for the combined FO rewriting technique, we use the
following representation of IK . As relational database systems are not optimized
to handle a large number of relatively small relations, one for each concept and
role name in the ontology, only two relations have been used to represent IK :
acbox(conceptid,indid) and arbox(roleid,domain-indid,range-indid),
where conceptid and roleid are numerical identifiers for concept names and
roles names, indid, domain-indid, and range-indid are numerical identifiers
for individuals, acbox represents concept memberships, and arbox role member-
ships. B+tree indices were created on the attribute pairs {conceptid,indid},
{roleid,domain-indid} and {roleid,range-indid}. We distinguish ABox in-
dividuals from anonymous ones by positive and negative ids, and thus need not
store aux as a relation. As an example for this representation, take the query
‘q = A(x) ∧ ¬aux(x)’ which, with respect to the Core ontology (that merely
provides the concept identifier for A), translates to the SQL statement
     select indid from acbox where conceptid=31 and indid > 0.
Data sets (ABox instances) for our experiments were generated randomly. The
data was stored and the test queries were executed on PostgreSQL version 8.3.7
running on a SUN Fire-280R server with two UltraSPARC III 1.2GHz CPUs,
4GB memory and 1TB storage under Solaris 5.10.
    Figures 1 and 2 summarize the running times for each of the test queries
and varying ABox size. For each ABox, we report the number of individuals
(Ind, in thousands), the numbers of concept assertions (CAs, in millions) and
role assertions (RAs, in millions) in the original ABox and in the rewritten
ABox (i.e, the canonical interpretation). For each query, we show the following
execution times in the columns UN, RW and QO:
       ABox size (in M)                           query
  Ind original canonical    coreQ1       coreQ2          coreQ3       coreQ4
(in K) CA RA CA RA UN RW QO UN RW QO UN RW QO UN RW QO
   100 0.5 0.5 1 1.8 0.4 0.8 >2h 1.5 4.5 92.8 1.4 6.0 >2h 1.4 1.8 2.6
   100 1 1 1.8 3.2 0.6 2.3 >2h 1.7 7.6 241.1 1.5 11.4 >2h 3.1 3.4 3.1
   100 5 5 5.8        10 8.1 16.5 >2h 9.7 27.6 2048.2 9.4 33.3 >2h 18.8 20.6 18.6
   500 0.5 0.5 1.5 2.3 0.2 0.5 >2h 2.2 7.9 113.9 0.4 3.4 >2h 2.3 1.5 2.3
   500 1 1 2.8 4.3 0.6 1.7 >2h 3.2 13.6 271.0 1.2 12.8 >2h 2.9 2.7 2.8
   500 5 5 9 15.9 4.0 6.6 >2h 9.7 48.8 1921.3 7.9 69.4 >2h 20.6 20.5 19.7
   500 10 10 15 27.7 5.8 11.2 >2h 26.9 93.5 5093.2 23.2 125.1 >2h 47.5 48.1 44.1

         Fig. 1. Query processing time (in seconds) for the Core ontology.

       ABox size (in M)                         query
  Ind original canonical galenQ1     galenQ2         galenQ3         galenQ4
(in K) CA RA CA RA UN RW QO UN RW QO UN RW QO UN RW QO
     5 0.5 0.5 1.6 1.2 0.3 0.4 0.5 0.1 2.2 6.1 0.2 0.9 97.9 1.6 3.1 1.5
     5 1 1 2.4 1.9 0.7 0.6 1.0 1.1 3.5 9.9 0.5 1.5 133.9 1.6 5.3 1.6
     5 5 5 7.2 6.2 6.2 6.6 15.4 9.5 7.5 226.2 5.1 2.6 2330.8 22.0 27.1 22.6
    50 0.5 0.5 3.4 3.8 0.1 0.3 0.4 0.1 1.5 6.7 0.1 0.6 136.9 1.3 8.1 1.3
    50 1 1 5.5 4.8 0.2 0.7 0.5 0.2 3.5 9.0 0.2 1.3 152.8 2.6 14.9 1.6
    50 5 5 15.5 12.0 1.1 1.6 2.5 1.2 15.1 40.7 16.3 6.2 3397.3 8.8 40.0 9.2
    50 10 10 24.6 18.7 2.1 2.4 5.7 1.5 21.8 117.3 2.7 16.9 1318.6 23.8 72.0 19.9

      Fig. 2. Query processing time (in seconds) for the Galen-Lite ontology.
UN: the unmodified query over the original ABox;
RW: the query rewritten as in Section 4 over the canonical interpretation from
   Section 3;
QO: the query produced by QuOnto (over the original ABox).
The queries referred to in the figure are as follows, where the symbols A, B, R,
etc., stand for concept and role names in Core and Galen-Lite:
              coreQ1(x) = ∃yz A(x), B(y), R(x, y), R(x, z), C(z)
              coreQ2(x) = ∃yz R(x, y), S(x, z)
              coreQ3(x) = ∃yz A(x), R(x, y), S(x, z)
              coreQ4(x) = ∃xyzw R(x, y), R(z, y), R(x, w), R(z, w)
             galenQ1(x) = ∃y A(x), R(x, y), B(y)
             galenQ2(x) = ∃y A0 (x), R0 (x, y), B 0 (y)
             galenQ3(x) = ∃yz A(x), B(y), R(x, y), R(x, z), C(z)
             galenQ4(x) = ∃xyzw R(x, y), R(z, y), R(x, w), R(z, w)
Thus, the only difference between, say, coreQ1 and galenQ3 is that the abstract
symbols have been replaced by distinct concept and role names. Note that coreQ4
and galenQ4 contain bad spikes, whereas all other queries are free of them. These
queries show only a sample of the queries that we have tested, but the results
are representative.
   To evaluate the results, we first compare the query processing times for the
unmodified query over the original ABox (UN) with the times for our combined
FO rewriting technique (RW). The figures show that RW is almost always less
than 10 times longer than UN, in many case much less than 5 times longer. In
terms of absolute numbers, RW queries are all answered within one minute, and
usually even within a few seconds (just like UN queries). This applies even to
the queries coreQ4 and galenQ4, for which an exponential rewriting is required.
Finally, we observe that the performance of the combined FO rewriting approach
is similar for Core and Galen-Lite, and thus does not seem to strongly depend
on the structure of the TBox. We find these results extremely encouraging: while
our approach results in a considerable increase in the ABox size and rewriting the
ABox may take some time (up to an hour using standard RDBMSs, which are not
optimized for this task), the actual query execution is usually not significantly
slower than the ‘ultimate lower bound’.
    We now compare UN/RW with the QuOnto approach, which results in query
execution times that are much less favorable for both Core and Galen. To explain
this discrepancy, recall that the RW approach rewrites the query independently
of the TBox, whereas QuOnto rewriting does depend on the TBox. To see a basic
example of this dependence, consider q(x) = A(x) and T = {A1 v A, A2 v A},
where the QuOnto rewriting is q 0 (x) = A(x) ∨ A1 (x) ∨ A2 (x), i.e., there is one
disjunct for each subsumee of the concept A in the query. Galen-Lite is a TBox
in which many concepts have a lot of subsumees. For example, the concepts A
and B in galenQ2 have 39 and 11 (direct and indirect) subsumees, which results
in 429 subqueries; galenQ3 even generates 3072 subqueries.
    Other sources of query expansion in QuOnto stem from mandatory role par-
ticipation assertions and the possibility of query folding due to multiple oc-
currences of the same role. Like many ontologies originating from ER design,
the Core TBox contains such constraints. For the queries coreQ1, coreQ2, and
coreQ3, this results in 180, 310, and 2100 disjuncts in the rewritten query, re-
spectively. We note that the weak performance of QuOnto on coreQ1 and coreQ3
over the Core TBox can, in part, be traced to poor query optimizer choices. On
queries with very many disjuncts, the probability of the optimizer choosing a
bad strategy for at least one of the disjuncts is rather high, and the resulting
execution time will dominate the execution time of the overall query.



6   Conclusion


We introduce a novel approach to CQ answering over DL-LiteN   horn TBoxes using
relational database systems. Our experimental evaluation suggests that, in ap-
plications where the user has authority over the data and can implement ABox
rewriting, this approach may be preferable to the TBox-based rewriting [8, 9].
The work presented in this paper is only a first step: future work includes an
adaptation of our approach to variants of DL-Lite with role hierarchies, extend-
ing the class of CQs for which a polynomial rewriting is possible, finding a more
careful exponential rewriting step, and developing incremental approaches to
ABox rewriting. It might also be interesting to combine the ideas underlying
query rewriting in QuOnto with ABox rewriting to improve query execution
times in the QuOnto approach.
References

 1. A. Acciarri, D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, M. Palmieri,
    and R. Rosati. QuOnto: Querying ontologies. In Proc. of the 20th Nat. Conf.
    on Artificial Intelligence (AAAI 2005), pages 1670–1671, 2005.
 2. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. DL-Lite in the
    light of first-order logic. In Proc. of the 22nd Nat. Conf. on Artificial Intelligence
    (AAAI 2007), pages 361–366, 2007.
 3. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite fam-
    ily and relations. Technical Report BBKCS-09-03, SCSIS, Birkbeck College, Lon-
    don, 2009 (available at http://www.dcs.bbk.ac.uk/research/techreps/2009/
    bbkcs-09-03.pdf).
 4. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, edi-
    tors. The Description Logic Handbook: Theory, Implementation and Applications.
    Cambridge University Press, 2003.
 5. D. Calvanese, G. De Giacomo, and M. Lenzerini. On the decidability of query
    containment under constraints. In PODS’98, pages 149–158, New York, NY, USA,
    1998. ACM.
 6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, and
    M. Ruzzi. Data integration through DL-Lite A ontologies. In K.-D. Schewe and
    B. Thalheim, editors, Revised Selected Papers of the 3rd Int. Workshop on Seman-
    tics in Data and Knowledge Bases (SDKB 2008), volume 4925 of Lecture Notes in
    Computer Science, pages 26–47. Springer, 2008.
 7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. DL-Lite:
    Tractable description logics for ontologies. In Proc. of the 20th Nat. Conf. on
    Artificial Intelligence (AAAI 2005), pages 602–607, 2005.
 8. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Data
    complexity of query answering in description logics. In Proc. of the 10th Int. Conf.
    on the Principles of Knowledge Representation and Reasoning (KR 2006), pages
    260–270, 2006.
 9. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. of Automated Reasoning, 39(3):385–429, 2007.
10. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Inconsis-
    tency tolerance in P2P data integration: An epistemic logic approach. Information
    Systems, 33(4):360–384, 2008.
11. T. Eiter, C. Lutz, M. Ortiz, and M. Šimkus. Query answering in description logics
    with transitive roles. In Proceedings of the 21st International Joint Conference on
    Artificial Intelligence IJCAI09. AAAI Press, 2009.
12. B. Glimm, I. Horrocks, C. Lutz, and U. Sattler. Conjunctive query answering for
    the description logic SHIQ. In Proc. of the 20th Int. Joint Conf. on Artificial
    Intelligence (IJCAI 2007), pages 399–404, 2007.
13. B. Glimm, I. Horrocks, and U. Sattler. Conjunctive query answering for description
    logics with transitive roles. In Proceedings of the 2006 Description Logic Workshop
    (DL 2006). CEUR Workshop Proceedings, 2006.
14. I. Horrocks and S. Tessaris. A conjunctive query language for description logic
    ABoxes. In In AAAI/IAAI, pages 399–404, 2000.
15. C. Lutz. The complexity of conjunctive query answering in expressive description
    logics. In A. Armando, P. Baumgartner, and G. Dowek, editors, Proceedings of
    the 4th International Joint Conference on Automated Reasoning (IJCAR2008),
    number 5195 in LNAI, pages 179–193. Springer, 2008.
16. C. Lutz, R. Kontchakov, D. Toman, F. Wolter, and M. Zakharyaschev. Combined
    FO rewritability for conjunctive query answering in DL-Lite. Available at http:
    //www.informatik.uni-bremen.de/~clu/, 2009.
17. C. Lutz, D. Toman, and F. Wolter. Conjunctive query answering in EL using a
    database system. In Proc. of the 5th Int. Workshop on OWL: Experiences and
    Directions (OWLED 2008), 2008.
18. C. Lutz, D. Toman, and F. Wolter. Conjunctive Query Answering in the Descrip-
    tion Logic EL using a Relational Database System. In Proc. of the 21th Int. Joint
    Conf. on Artificial Intelligence (IJCAI 2009), page in press, 2009.
19. M. Ortiz, D. Calvanese, and T. Eiter. Data complexity of query answering in
    expressive description logics via tableaux. J. of Automated Reasoning, 41(1):61–
    98, 2008.
20. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati.
    Linking data to ontologies. J. on Data Semantics, X:133–173, 2008.
21. A. Poggi, M. Rodriguez, and M. Ruzzi. Ontology-based database access with
    DIG-Mastro and the OBDA Plugin for Protégé. In Kendall Clark and Peter F.
    Patel-Schneider, editors, Proc. of the 4th Int. Workshop on OWL: Experiences and
    Directions (OWLED 2008 DC), 2008.
22. H. Perez-Urbina, B. Motik, and I. Horrocks. Rewriting Conjunctive Queries over
    Description Logic Knowledge Bases. In Proc. of the 3rd Int. Workshop on Seman-
    tics in Data and Knowledge Bases (SDKB 2008), number 4925 in LNCS, pages
    199–214, Springer, 2008.