Query Answering via Modal Definability
             with FaCT++: First Blood

           S. Kikot1 , D. Tsarkov2 , M. Zakharyaschev1 , and E. Zolin3
           1
             Department of Computer Science and Information Systems,
                       Birkbeck, University of London, U.K.
                          {kikot,michael}@dcs.bbk.ac.uk
           2
             School of Computer Science, University of Manchester, U.K.
                               tsarkov@cs.man.ac.uk
                     3
                       Faculty of Mathematics and Mechanics,
                         Moscow State University, Russia
                                 ezolin@gmail.com


      Abstract. We use results on modal definability of first-order formulas
      to reduce the problem of answering conjunctive queries over knowledge
      bases (of any expressivity) to checking inconsistency of those knowledge
      bases extended with a number of ALCI concept assertions. This reduc-
      tion has been shown to work for conjunctive queries without cycles that
      only involve bound variables. In this paper, we present an optimised
      algorithm for the reduction, its implementation using FaCT++ as an
      underlying reasoner and the results of first experiments.


1   Introduction
Conjunctive query (CQ) answering over description logic (DL) ontologies has
recently become one of the hottest topics in the DL community. The DLs con-
sidered range from

 – the DL-Lite family and OWL 2 QL, which support first-order rewritability
   (and so belong to the class AC0 for data complexity) and can be used with
   standard relational database systems [2, 20, 1, 13], to
 – Horn DLs encompassing the EL family, OWL 2 EL, Horn-SROIQ and Horn-
   SHOIQ, which support DataLog rewritability (and so are P-complete for
   data complexity) and can be used with DataLog engines [21, 14, 19, 5], and
   further to
 – very expressive DLs which are coNP-hard for data complexity and require
   some special techniques for answering CQs; see [10, 15, 18, 7, 4, 3] and refer-
   ences therein for details.

In this paper, we present yet another approach to answering CQs over DL on-
tologies. It is based on a transformation of a given CQ q(x1 , . . . , xn ) to ALCI-
concepts C1 , . . . , Cn such that, for any knowledge base K = (T , A) (in any DL
language) and any n-tuple a = (a1 , . . . , an ) of individuals from the ABox A, we
have K |= q(a) iff K ∪ {a1 : C1 , . . . , an : Cn } is inconsistent. This transforma-
tion of CQs, which we call concept-rewriting (or c-rewriting for short), is based
on the approach to modal definability of first-order formulas developed in [23,
24, 12]. It works (at least) for connected CQs that have no cycles with only
bound variables. (Note that c-rewriting is based solely on the shape of a CQ and
hence works even for cyclic CQs with non-simple roles in cycles, which cannot
be handled by the standard query answering techniques [6, 15].) Thus, in theory,
answering c-rewritable CQs can be delegated to any standard DL reasoner.
    In practice, however, there are a few major obstacles to implementing the
c-rewriting approach using a DL reasoner as a black box. The principal one
is that checking consistency of a KB with a large ABox may be expensive,
especially if we have to test all possible tuples of individuals in order to compute
all answers to a given CQ. Even if a reasoner supports incremental ABoxes [9],
the approach is still impractical as the algorithms for processing additions and
(especially) removals of ABox assertions are quite involved and complicated.
    Our solution in this paper is to integrate the c-rewriting approach into the DL
reasoner FaCT++. During the consistency check, FaCT++ materialises the
ABox as a completion graph. For every answer candidate, the concepts involved
in the c-rewriting of the query are added, in the constructed completion graph,
to the labels of the corresponding individuals, and then the tableau algorithm
is used to check consistency of the updated completion graph. Thus, we do not
check consistency from scratch, but rather use the already computed completion
graph as a starting point. To cope with numerous answer candidates, we propose
three optimisation techniques that drastically improve performance, as our first
experimental results demonstrate.

2     Concept-Rewritable Conjunctive Queries
By a knowledge base (KB) in this paper we understand a KB K = (T , A) for-
mulated in the language of any DL, for example, SROIQ underlying OWL 2.
We use ind(A) to denote the set of individual names that occur in the ABox A.
    As usual, a conjunctive query (CQ) q(x) is a formula ∃u ϕ(x, u), where ϕ is
a conjunction of atoms of the form Ai (z1 ) or Rj (z1 , z2 ) with zk ∈ x ∪ u (without
loss of generality, we assume that CQs do not contain individual names). The
free variables x in q(x) are called answer variables. We typically use x and y
to denote tuples of answer variables. A tuple a ⊆ ind(A) is a certain answer to
q(x) over a KB K if I |= q(a) for all I |= K; in this case we write K |= q(a).
Where convenient, we regard a CQ as the set of its atoms. In this paper, we
consider only CQs with at least one answer variable.

Definition 1. Given a CQ q(x) with x = (x1 , . . . , xn ), a concept rewriting
(or a c-rewriting) of q(x) in a DL L is an n-tuple (x1 : C1 , . . . , xn : Cn ) with L-
concepts Ci such that, for any KB1 K = (T , A) and any n-tuple a = (a1 , . . . , an )
of individuals in A, we have K |= q(a) iff the KB K ∪ {a1 : C1 , . . . , an : Cn } is
1
    In any DL or even first-order language; see the discussion after Definition 4 in [23].
inconsistent. We say that a CQ q(x) is concept-rewritable (or c-rewritable) in L
if it has a c-rewriting in L. The size of a c-rewriting is defined as the sum of the
sizes of all Ci , 1 ≤ i ≤ n. Note that a c-rewriting of a CQ may (and usually does)
contain concept or role names that do not occur in the CQ. We assume that such
names are fresh in the sense that they are taken from a special alphabet that is
not used in any DL knowledge bases K mentioned above.

Example 1. Here are some c-rewritings in ALC (with B a fresh concept name):

 – (x : ¬A) is a c-rewriting of the CQ q(x) = A(x);
 – (x : B u ∀R.¬B) is a c-rewriting of q(x) = R(x, x);
 – (x : ∀R.¬B, y : B) is a c-rewriting of q(x, y) = R(x, y);
 – (x : ∀R.B, y : ∀R.¬B) is a c-rewriting of q(x, y) = ∃u (R(x, u) ∧ R(y, u)).

We only show this for q(x) = R(x, x). Suppose K = (T , A). We need to show
that, for every a ∈ ind(A), we have K |= R(a, a) iff K ∪ {a : B u ∀R.¬B} is
inconsistent. The implication ⇒ is trivial. For the converse, suppose K 6|= R(a, a)
and consider a model I of K where I 6|= R(a, a). Since B is a fresh concept name,
we can set B I := {aI }. Then, clearly, we have I |= K ∪ {a : B u ∀R.¬B}.

Example 2. On the other hand, the query q(x) = ∃u (R(x, u) ∧ R(u, u)) is not
c-rewritable in ALC. Indeed, suppose there exists an ALC-concept C such that
K |= q(a) iff K ∪ {a : C} is inconsistent, for every KB K = (T , A) and every
a ∈ ind(A). Consider the KBs K = (T , A) and K0 = (T , A0 ) in the DL S, where

                   T = { A v ∀R.¬A, > v ∃R.>, Trans(R) },
                   A = { A(a) },
                  A0 = { A(a), R(a, b), R(b, b) }.

Clearly, K 6|= q(a), and so K ∪ {a : C} is consistent. On the other hand, we have
K0 |= q(a), and so K0 ∪ {a : C} is inconsistent. By the finite model property
of S (see, e.g., [16, Corollary 4.3]), K ∪ {a : C} has a finite model I. Since every
element in I has an R-successor, aI has an R-successor w such that (w, w) ∈ RI .
Moreover, w 6= aI due to the first axiom in T . Now, by taking bI := w, we obtain
a model of K0 ∪ {a : C}, contrary to our assumption. (We do not know whether
the same result can be proved using ALC KBs.)

    Two sufficient conditions of c-rewritability of CQs in ALC and ALCI can
be obtained as a consequence of the modal definability results from [12]. To
formulate these conditions, we need a few more definitions.
    A CQ q(x) is called connected if its primal graph (whose vertices are the
variables in q and edges are the pairs (z, v) such that R(z, v) ∈ q, for some role
name R) is connected. We say that q(x) is internally cycle-free if its primal graph
has no cycles that consist of bound variables. Thus, all CQs from Example 1 are
internally cycle-free, while that in Example 2 is not, as R(u, u) is a cycle through
the bound variable u. Finally, we call q(x) internally reachable if each bound
variable u in q(x) is reachable via a directed path from some free variable;
more precisely, if there is a path S1 (z0 , z1 ), S2 (z1 , z2 ), . . . , Sn (zn−1 , zn ) such that
z0 ∈ x, u = zn and each Si is a role name in the given DL. For example, the CQ
q(x) = ∃u R(x, u) is internally reachable, while q 0 (x) = ∃u R(u, x) is not.
Theorem 1 (cf. [12], Theorem 7.8). (i ) Every connected and internally
cycle-free CQ q(x) has a c-rewriting in ALCI of size O(|q|).
   (ii ) Every connected, internally cycle-free, and internally reachable CQ q(x)
has a c-rewriting in ALC of size O(|q|).
    Below we present an algorithm that runs in quadratic time and, given a con-
nected CQ q(x), checks whether it is internally cycle-free and, if it is, constructs
a c-rewriting of q(x) in ALCI of size O(|q|). Moreover, if q(x) is internally
reachable, the resulting c-rewriting will be in ALC.
    Note that the c-rewriting algorithm from [12] can only be applied to CQs
without concept (unary) atoms, and in order to deal with arbitrary CQs, it was
suggested to eliminate such atoms at the price of introducing fresh role names.
In contrast, the c-rewriting algorithm presented below is directly applicable to
queries with concept atoms. These atoms are treated uniformly with fresh con-
cept names introduced by the algorithm (see A and B2 in Examples 6 and 7
below). However, one can see that the output of the algorithm below coincides
with that of the algorithm from [12] after eliminating the fresh role names in the
latter in accordance with their definitions. Thus, the new algorithm is correct.

3    C-Rewriting Algorithm
Suppose we are given a connected CQ q(x) (in this section, we treat every CQ
as the set of its atoms). The algorithm proceeds in four stages.
    First, it ‘unravels’ q(x) to a CQ q 0 (y) that has no cycles containing at least
one free variable. While doing this, it keeps the information about the substi-
tution σ : y → x of free variables required to transform q 0 (y) back to q(x).
Secondly, the algorithm checks whether the CQ q 0 (y) is tree-shaped, in which
case it represents q 0 (y) by means of an ELIO-concept C with nominals. Thirdly,
the algorithm translates C into a c-rewriting (y1 : D1 , . . . , ym : Dm ) of q 0 (y).
Finally, it identifies variables in y in accordance with the substitution σ and
returns the resulting c-rewriting (x1 : C1 , . . . , xn : Cn ) of q(x) in ALCI (or even
in ALC, if q(x) is internally reachable).
Example 3. Consider again the CQ q(x) = R(x, x).
Stage 1: We unravel q(x) into the CQ q 0 (x, y) = R(x, y) with the substitution
    σ(x) = x and σ(y) = x.
Stage 2: We represent q 0 (x, y) by the ELIO-assertion x : ({x} u ∃R.{y}).
Stage 3: We construct the c-rewriting (x : ¬∃R.B, y : B) of q 0 (x, y) in ALC,
    where B is a fresh concept name.
Stage 4: We merge x and y and take the conjunction of their concepts, which
    gives us the c-rewriting x : (B u ¬∃R.B) of the initial query q(x) in ALC.
    Note that it is equivalent to the c-rewriting we gave in Example 1.
    Now we proceed to a detailed description of all the stages of the algorithm.
Stage 1: Unravelling the CQ
In this stage, the input is a connected CQ q(x). The output is a connected CQ
q 0 (y), with |y| ≥ |x|, that has no cycles containing any free variables, together
with a substitution σ : y → x such that σ(q 0 (y)) = q(x). In particular, if q is
internally cycle-free, then q 0 is a tree-shaped CQ (i.e., its primal graph is a tree).
Example 4. For example, the CQ q(x, y) on the left in the figure below is un-
ravelled into the tree-shaped CQ q 0 (x1 , x2 , y1 , y2 , y3 ) on the right:
                   5                                          5
          u                v                     u                     v
         1                     4               1         2   3       4
                2     3                                                   6
                                     6
        x                      y:A            x1    x2       y1 :A   y2       y3
   More precisely, we transform a CQ q(x) into a CQ q 0 (y) as follows:
 1. Initially, we take the identical substitution σ : x → x, i.e., σ(xi ) = xi .
 2. Find in the current CQ a cycle that contains at least one free variable. If no
    such cycle is found, we return the current CQ and halt.
 3. Pick from this cycle any free variable x and any role atom containing this
    variable. Assume that the chosen atom has the form R(x, z), where R is a
    role name (and possibly z = x). (If the chosen atom has the form R(z, x) the
    subsequent operations are analogous.)
 4. Take a fresh free variable y, remove the atom R(x, z) from the CQ and add
    the atom R(y, z). Extend the substitution σ by σ(y) := x. Go to Step 2.

Stage 2: Rolling-up a tree-shaped CQ into an ELIO-concept
In this stage, the input is a connected CQ q 0 (y) that has no cycles involving
any free variables. If the CQ contains a cycle (which consists of bound variables
only), then an error message is returned (since this means that the initial query
q(x) is not internally cycle-free). Otherwise, the CQ is tree-shaped and the
output is an ELIO-concept C with nominals y, denoted C(y), that ‘represents’
the query q 0 (y) in the following sense: for any interpretation I and any tuple e
of its elements with |y| = |e| and e1 being the first component of e,

                          I |= q 0 (e)   ⇐⇒   I |= e1 : C(e).

    The algorithm given below performs traversal of the primal graph of the CQ
in a depth-first manner, starting from some answer variable and marking all
‘visited’ variables. If it encounters an already marked variable then the graph
has a cycle. Otherwise, the traversal is recorded as an ELIO-concept. All this is
achieved by assigning C(y) = Build ELIO Concept(q 0 , ∅, NULL, y1 ).
Example 5. The CQ q 0 (x1 , x2 , y1 , y2 , y3 ) from Example 4 is transformed by this
algorithm into the pair (x1 : C), where

C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 . ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .{y3 }) .
                                                                                  
Algorithm 1 Rolling-up a tree-shaped CQ into an ELIO-concept (here R is
either a role name or its inverse and InverseOf(R(u, v)) = R− (v, u)).
 1: function Build ELIO Concept(Query q, Set V isited, atom at, var v)
 2:    Concept C ← >
 3:    if v is free in q then
 4:        C ← {v}
 5:    for all A(v) ∈ q do
 6:        C ←C uA
 7:    V isited ← V isited ∪ {v}
 8:    for all R(v, w) ∈ q : R(v, w) 6= InverseOf(at) do
 9:        if w ∈ V isited then
10:            throw ‘Cycle through only bound variables found’
11:        C ← C u ∃R.Build ELIO Concept(q, V isited, R(v, w), w)
12:    return C


As shown in [12], this concept C = C(x1 , x2 , y1 , y2 , y3 ) is related to the query
q 0 (x1 , x2 , y1 , y2 , y3 ) and to the original query q(x, y) in the following way: for
any KB K = (T , A) and any individuals a, b ∈ ind(A), we have K |= q(a, b) iff
K |= q 0 (a, a, b, b, b) iff K |= a : C(a, a, b, b, b).

Remark 1. If we are happy to deal with ELIO-concepts (for instance, in case
our KB is already formulated in an extension of ELIO), then we can terminate
the algorithm now, for the above ELIO-concept can already be used for finding
answers to q(x). For example, to check whether a given pair of individuals (a, b)
is a certain answer to the CQ q(x, y) from Example 5, we can simply check
whether the KB K ∪ {a : ¬C(a, a, b, b, b)} is inconsistent.
    However, usually the addition of inverse roles and nominals to a DL makes
reasoning harder. So, our aim below is to avoid these constructors and produce
concepts that have neither nominals nor (if possible) inverse roles.

Stage 3: Converting ELIO-concepts to c-rewritings
As an input, we are given an ELIO-concept C with nominals y = (y1 , . . . , ym ).
For an output, we construct an m-tuple of ALCI-concepts (D1 , . . . , Dm ) such
that (y1 : D1 , . . . , ym : Dm ) is a c-rewriting of q 0 (y) in ALCI. Moreover, as shown
in [12], if the initial query q(x) is internally reachable, the algorithm below
constructs a c-rewriting in ALC.
    To describe the algorithm, we require a few definitions. Denote by d({x}, C)
the role-name-depth of an occurrence of a nominal {x} in C, which is the number
of constructors of the form ∃R, where R is a role name (and not an inverse role),
under which this occurrence of {x} lies in C. Formally, it is defined by induction:

 – d({x}, {x}) = 0;
 – d({x}, C u D) = d({x}, D u C) = d({x}, C), if this occurrence of {x} is in C;
 – d({x}, ∃R.C) = d({x}, C) + 1;
 – d({x}, ∃R− .C) = d({x}, C).
Thus, in Example 5, nominal occurrences have the following role-name-depths:
C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 . ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .{y3 }) .
                                                                                  

      0                  2                        2              2            3
   Simple ELIO-concepts are defined by induction (note ‘or’ in the last item):

 – each nominal {xi } is a simple concept;
 – if C is simple, then so is ∃R− .C;
 – if C or D is simple, then so is C u D.

Typical examples of simple concepts are {x}, ∃R− .{x}, ∃R− .({x} u ∃R.{y}).
Simple concepts were introduced in [11] for technical reasons: they correspond
to minimal valuations for antecedents of generalised Sahlqvist formulas [8]. We
need this notion to ensure that the resulting rewriting is in ALC for internally
reachable CQs. Examples of non-simple concepts are ∃R.{x} and ∃R− .∃R.{x}.
   Now, the algorithm takes an ELIO-concept C with nominals y = (y1 , . . . , ym ).
Note that each nominal from y has exactly one occurrence in C. The algorithm
produces an ordered set of pairs Res = (y1 : D1 , . . . , ym : Dm ), where Dj are
ALCI-concepts. Initially, Res is set to empty. Then we proceed as follows.

 1. The concept C has the form C = {y1 } u E. If E has no nominals, then add
    (y1 : ¬E) to Res and halt.
 2. Otherwise, pick a nominal {y} in E with maximal d({y}, E).
 3. Find in E the maximal (w.r.t. the subconcept relation) simple subconcept H
    containing {y} and no other nominals. To this end, start the search with {y}
    (which is simple) and go upwards in the syntactic tree of C until the concept
    under consideration either is non-simple or contains another nominal.
 4. Replace this occurrence of H in C with a fresh concept name B and denote
    the result by C 0 .
 5. Starting with the sequent H ⇒ B, apply the following rules repeatedly (the
    omitted rule with F 0 u F is symmetrical):

               F u F0 ⇒ G                                           ∃R− .F ⇒ G
       (r1)                (where {y} is in F )              (r2)
              F ⇒ ¬F 0 t G                                           F ⇒ ∀R.G
    until a sequent of the form {y} ⇒ D is obtained, for some ALCI-concept D.
 6. Add the pair (y : D) to Res.
 7. Go to Step 1 with the new concept C := C 0 .

    Note that since the concept H is simple and contains only one nominal, the
transformation of Step 5 is always possible and even deterministic. As shown
in [12, Theorem 7.9], if the initial CQ q(x) was internally cycle-free, then the
above algorithm returns a c-rewriting of q 0 (y) in ALCI; if additionally the initial
CQ q(x) was internally reachable, then D and E above are always ALC-concepts,
so in this case the algorithm produces a c-rewriting of q 0 (y) in ALC.

Example 6. Let C be as in Example 5. We show how Stage 3 works.
Iteration 1: The deepest nominal in C is {y3 }. Since ∃R6 .{y3 } is not simple, we
    take H = {y3 } and replace it with a fresh concept name B1 . No rules from
    Step 4 were needed. So we add the pair y3 : B1 to Res and obtain

    C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 . ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .B1 ) .
                                                                                    

Iteration 2: The nominals {y1 }, {x2 } and {y2 } are equally deep, so any of them
    will do. Let us pick {y2 }. Then H = ∃R4− .({y2 } u ∃R6 .B1 ), since it is simple,
    while ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .B1 ) contains a different nominal {x2 }.
    We introduce a fresh concept name B2 and transform the sequent H ⇒ B2
    as follows:
        ∃R4− .({y2 } u ∃R6 .B1 ) ⇒                 B2            (initial sequent)
               {y2 } u ∃R6 .B1 ⇒              ∀R4 .B2            by (r2)
               {y2 }             ⇒ ¬∃R6 .B1 t ∀R4 .B2            by (r1)

    So we add the pair y2 : ¬∃R6 .B1 t ∀R4 .B2 to Res and obtain:

             C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 .(∃R2− .{x2 } u B2 ) .
                                                                          

Iteration 3: Now we pick the nominal {x2 }, so H = (∃R2− .{x2 } u B2 ). Replace
    it in C with a fresh concept name B3 and transform the sequent H ⇒ B3 :

             ∃R2− .{x2 } u B2   ⇒             B3            (initial sequent)
             ∃R2− .{x2 }        ⇒       ¬B2 t B3            by (r1)
                   {x2 }        ⇒ ∀R2 .(¬B2 t B3 )          by (r2)

    So we add x2 : ∀R2 .(¬B2 t B3 ) to Res and obtain the concept

                     C = {x1 } u ∃R1 .(∃R3 .({y1 } u A) u ∃R5 .B3 ).

Iteration 4: Pick the nominal {y1 }, set H = ({y1 }uA), introduce a fresh concept
    name B4 , add y1 : (¬A t B4 ) to Res and obtain the concept

                         C = {x1 } u ∃R1 .(∃R3 .B4 u ∃R5 .B3 ).

Iteration 5: We notice that C = {x1 }uE, where the concept E has no nominals.
    So we add x1 : ¬∃R1 .(∃R3 .B4 u ∃R5 .B3 ) to Res and halt.

   The resulting set Res consists of all the framed pairs.

Stage 4: Merging nominals
In the set Res of pairs (y1 : D1 , . . . , ym : Dm ), we replace the variables y with x
in accordance with the substitution σ defined in Stage 1. Thus we obtain a new
set Res0 . Finally, for every variable xi ∈ x, we set Ci to be the conjunction of
all concepts that are assigned in Res0 to the variable xi . As shown in [12], the
resulting tuple (x1 : C1 , . . . , xn : Cn ) is a c-rewriting of the initial CQ q(x).
Example 7. For the set Res constructed in Example 6, Stage 4 yields the follow-
ing c-rewriting in ALC of the CQ q(x, y) from Example 4:

    x : ¬∃R1 .(∃R3 .B4 u ∃R5 .B3 ) u ∀R2 .(¬B2 t B3 ),
                                                                              
                                   y : B1 u (¬A t B4 ) u (¬∃R6 .B1 t ∀R4 .B2 ) .


4     Implementation, Optimisations and Experiments

We have implemented the c-rewriting approach to answering CQs over ontologies
in the DL reasoner FaCT++ [22]. FaCT++ is a tableau-based reasoner, which
means that in order to check whether a given KB K is consistent, it constructs
a so-called completion graph whose nodes include all the individuals from K
labelled with concept expressions.
     Given a KB K = (T , A) and a CQ q(x), FaCT++ first tests K for consis-
tency and keeps in memory the ‘deterministic’ part D of its completion graph
obtained by applying deterministic tableau rules (which are always applied be-
fore non-deterministic ones in FaCT++). This will save a lot of reasoning time
in later stages. Next, it checks whether q(x) satisfies the conditions of Theorem 1
and, if this is the case, constructs its c-rewriting (x1 : C1 , . . . , xn : Cn ). Now, for
every n-tuple of a = (a1 , . . . , an ) of individuals from A, it adds the concept Ci
to the label of ai , for 1 ≤ i ≤ n, in the completion graph D and runs the tableau
algorithm to check whether the resulting tableau is closed. If it is, which means
that the KB K ∪ {a1 : C1 , . . . , an : Cn } is inconsistent, the tuple a is returned as
a certain answer to q over K.
     The benefit of such an implementation over the ‘reasoner as a black box’
approach is that every time KB consistency checking starts from D rather than
from scratch. As the size of a typical CQ (and hence of its c-rewriting) is negligi-
bly smaller than the size of an ABox, only a small part of the completion graph
requires rebuilding. However, iterating over all tuples of individuals from A is
still a very time consuming task. Below, we describe a few optimisations that
have been implemented to reduce the search space.


4.1   Optimisations

We call an n-tuple of concepts (E1 , . . . , En ) an upper bound for a CQ q(x) if
K |= q(a) implies K |= ai : Ei , 1 ≤ i ≤ n, for any KB K = (T , A) and any n-
tuple a of individuals from A. The concept Ei is called an upper bound for xi in
q(x). If (E1 , . . . , En ) is an upper bound for q(x), then we can clearly restrict the
search for answers to q(x) in K to the tuples a for which K |= ai : Ei , 1 ≤ i ≤ n.
For example, if q(x) contains atoms Ai (xi ), 1 ≤ i ≤ n, then (A1 , . . . , An ) is an
upper bound for q(x). If q(x) contains an atom R(x1 , z), then (∃R.>, >, . . . , >)
is another upper bound for q(x).
    We are now in a position to describe our optimisations.
Rolling-up elimination. Suppose that the ELIO-concept C built in Stage 2 has
a subexpression of the form2 {yj } u F , where F is an ELI-concept (without
nominals). Then clearly F can be taken as an upper bound for yj in q 0 (y) and,
consequently, for the variable σ(yj ) in q(x). Moreover, it is now safe to remove
F from the ELIO-concept C, that is, replace {yj } u F with {yj } in C, which
simplifies the subsequent reasoning.

Query approximation. Here we build a stronger upper bound, but do not alter
the ELIO-concept C. Recall that, in Stage 2, we constructed an ELIO-concept,
say H1 , by traversing the graph of q 0 (y) starting from y1 . Let us construct ELIO-
concepts Hj , for 1 ≤ j ≤ m, similarly but starting
                                                  d       from yj . Now we replace in
Hj every nominal {y} with the conjunction {A | A(xi ) ∈ q}, where xi := σ(y)
and the empty conjunction is understood as >. It is not hard to see that the
resulting tuple of ELI-concepts (F1 , . . . , Fm ) forms an upper bound for q 0 (y). An
upper dbound (E1 , . . . , En ) for the original CQ q(x) can be obtained by taking
Ei := {Fj | σ(yj ) = xi }.

Avoiding search. Suppose that (x : B,x1 : C1 ,. . . ,xn : Cn ) is a c-rewriting of a CQ
q(x, x1 , . . . , xn ), where B is a fresh concept name. Fix any tuple a = (a1 , . . . , an )
of individuals from A. In order to find all answers to q of the form (b, a), we
would normally run, for every individual b from K, the tableau algorithm on
the KB K ∪ {b : B, a1 : C1 , . . . , an : Cn } to check whether it is inconsistent. We
can optimise this procedure as follows. Initially, the set of candidates for b is
taken to be G := ind(A). We run the tableau algorithm only once on the KB
K ∪ {a1 : C1 , . . . , an : Cn } and, for every complete clash-free tableau for it, we
remove from G all individuals b whose labels in M do not include ¬B. After that3
we return the tuples (b, a), for all b ∈ G, as answers. A similar optimisation is
applied to a c-rewriting of the form (x : ¬B, x1 : C1 , . . . , xn : Cn ).

4.2    Evaluation
We tested our implementation of query answering via c-rewriting with FaCT++
using the testbed described in [17]. It contains an extension of the LUBM ontol-
ogy, an ABox generator and 11 CQs. For our tests, we generated an ABox for 1
university; it contains 17,138 individuals and 76,039 assertions. The tests were
performed on a Mac OS X laptop with 3.06 GHz processor and 8 Gb of RAM.
On this machine, the consistency check for the whole KB takes 2.3 seconds, while
each of the subsequent checks for answer candidates takes about 0.2 ms.
   The results of experiments are shown in the table below, which gives the
number of tuples to be checked in order to answer the 11 CQs from [17]. We ran
four series of tests: the first one without optimisations, the second one with the
2
  We assume that conjunctions are stored as sets of conjuncts, so {yj } and F are not
  necessarily adjacent conjuncts, but rather ‘belong’ to the same conjunction.
3
  For better performance, the implemented algorithm is even more involved: for in-
  stance, it uses the information whether ¬B was added to the label of a node deter-
  ministically or non-deterministically. We omit details because of the space limit.
rolling-up optimisation, the third with both rolling-up and query approximation,
and the fourth series with all three optimisations.
    method      q1 q2 q3 q4 q5 q6 req1 req2 req3 req4 req5
    no optim. 0.3G 0.3G 17k 0.3G 17k 0.3G 17k 0.3G 5T 0.3G 17k
    rolling-up 4.2M 2.8M 536 8 6k 0.9M 0 13.5M 6.8G 10M 8k
    r+q         4M 2.8M 507 8 3k 0         0 0.8M 2.2G 16k 0
    r+q+a       4M 2.8M 507 1 3k 0         0   536 2.2G 1k  0
    The first row in the table is clear: as the ABox contains about 17k individuals,
the number of tuples to be checked is 17,138n , where n is the number of free
variables in the CQ. The ‘rolling-up’ row already shows a significant reduction
in the number of answer candidates. An interesting bit is the CQ
      req1(x) = ∃y, z (works For (x, y) ∧ affiliated Organization Of (y, z)),
which is c-rewritten to (x : ∀works For .∀affiliated Organization Of .⊥), and there
are no existential restrictions or property assertions in the KB that involve the
role affiliated Organization Of. As a result, the set of individuals to choose from
is empty. The ‘r+q’ row shows that in some cases (e.g., q6 and req5) the answer
set becomes empty as there are no instances of the upper bound concepts. Note
also a significant reduction in the number of answer candidates for req4. The
‘r+q+a’ row shows that 3 out of 11 CQs are of the form suitable for the avoiding
search optimisation. For example, the CQ
            req2(x, y) = (Person(x) ∧ teacher Of (x, y) ∧ Course(y))
leads to the rolled-up c-rewriting (x : ∀teacher Of.¬B, y : B) with B a fresh con-
cept name. By avoiding search for y, the number of checks is reduced by 3 orders
of magnitude, as there are only 1600 instances of the concept Course in the KB.

5   Conclusions
In this paper, we have presented a novel c-rewriting algorithm that reduces the
problem of answering conjunctive queries over ontologies to knowledge base con-
sistency checking. The CQs allowed by the algorithm must contain no cycles that
involve only existentially quantified variables (note that all the 11 CQs from [17]
satisfy this condition). The allowed KBs are formulated in any decidable DL. We
integrated the c-rewriting algorithm into the DL reasoner FaCT++, together
with a number of optimisations. Our first experiments show that the c-rewriting
approach with FaCT++ works reasonably well for CQs with one answer vari-
able: it takes under one second to find all answers to such CQs in the experiments
described above. Answering CQs with two variables may take up to 15 minutes.
The experiments also demonstrate that even simple optimisations can drasti-
cally reduce the number of answer candidates. Apart from implementing and
fine-tuning new optimisation techniques, we plan to characterise those CQs that
are not c-rewritable ([12] characterises modal definability, not c-rewritability of
CQs). Another interesting question is whether c-rewritings with number restric-
tions (i.e., in ALCQ or ALCIQ) can extend the class of c-rewritable CQs.
References
 1. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
    and relations. J. of Artificial Intelligence Research 36, 1–69 (2009)
 2. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. of Automated Reasoning 39(3), 385–429 (2007)
 3. Calvanese, D., Giacomo, G.D., Lenzerini, M., Rosati, R.: View-based query answer-
    ing in description logics: Semantics and complexity. J. of Computer and System
    Sciences 78(1), 26–46 (2012)
 4. Eiter, T., Ortiz, M., Simkus, M.: Conjunctive query answering in the description
    logic SH using knots. J. of Computer and System Sciences 78(1), 47–85 (2012)
 5. Eiter, T., Ortiz, M., Simkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn-
    SHIQ plus rules. In: Hoffmann, J., Selman, B. (eds.) Proc. of the 26th Nat. Conf.
    on Artificial Intelligence (AAAI 2010). pp. 726–733. AAAI Press (2012)
 6. Glimm, B., Kazakov, Y., Lutz, C.: Status QIO: An update. In: Rosati, R.,
    Rudolph, S., Zakharyaschev, M. (eds.) Proc. of the 24th Int. Workshop on De-
    scription Logics (DL 2011). CEUR Workshop Proceedings, vol. 745. CEUR-WS.org
    (2011)
 7. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Conjunctive query answering for the
    description logic SHIQ. J. of Artificial Intelligence Research 31, 157–204 (2008)
 8. Goranko, V., Vakarelov, D.: Elementary canonical formulae: extending Sahlqvist’s
    theorem. Annals of Pure and Applied Logic 141(1–2), 180–217 (2006)
 9. Halaschek-Wiener, C., Parsia, B., Sirin, E., Kalyanpur, A.: Description logic rea-
    soning for dynamic ABoxes. In: Parsia, B., Sattler, U., Toman, D. (eds.) Proc.
    of the 19th Int. Workshop on Description Logics (DL 2006). CEUR Workshop
    Proceedings, vol. 189. CEUR-WS.org (2006)
10. Hustadt, U., Motik, B., Sattler, U.: Reasoning in description logics by a reduction
    to disjunctive Datalog. J. of Automated Reasoning 39(3), 351–384 (2007)
11. Kikot, S.: An extension of Kracht’s theorem to generalized Sahlqvist formulas.
    Journal of Applied Non-Classical Logics 19(2), 227–251 (2009)
12. Kikot, S., Zolin, E.: Modal definability of first-order formulas with free variables
    and query answering. Journal of Applied Logic 11(2), 190–216 (2013)
13. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com-
    bined approach to query answering in DL-Lite. In: Lin, F., Sattler, U., Truszczyn-
    ski, M. (eds.) Proc. of the 12th Int. Conf. on the Principles of Knowledge Repre-
    sentation and Reasoning (KR 2010). pp. 247–257. AAAI Press (2010)
14. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description
    logic EL using a relational database system. In: Proc. of the 21st Int. Joint Conf.
    on Artificial Intelligence (IJCAI 2009). pp. 2070–2075. AAAI Press (2009)
15. Lutz, C.: The complexity of conjunctive query answering in expressive description
    logics. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) Proc. of the 4th Int.
    Joint Conf. on Automated Reasoning (IJCAR 2008). LNAI, vol. 5195, pp. 179–193.
    Springer (2008)
16. Lutz, C., Areces, C., Horrocks, I., Sattler, U.: Keys, nominals, and concrete do-
    mains. J. of Artificial Intelligence Research 23, 667–726 (2004)
17. Lutz, C., Seylan, I., Toman, D., Wolter, F.: The combined approach to OBDA:
    taming role hierarchies using filters. In: Fokoue, A., Liebig, T., Goodman, E.,
    Weaver, J., Urbani, J., Mizell, D. (eds.) Joint Workshop on Scalable and High-
    Performance Semantic Web Systems (SSWS+HPCSW 2012). pp. 16–31. CEUR-
    WS.org (2012)
18. Ortiz, M., Calvanese, D., Eiter, T.: Data complexity of query answering in ex-
    pressive description logics via tableaux. J. of Automated Reasoning 41(1), 61–98
    (2008)
19. Ortiz, M., Rudolph, S., Simkus, M.: Query answering in the Horn fragments of the
    description logics SHOIQ and SROIQ. In: Walsh, T. (ed.) Proc. of the 22nd Int.
    Joint Conf. on Artificial Intelligence (IJCAI 2011). pp. 1039–1044. AAAI Press
    (2011)
20. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
    Linking data to ontologies. J. on Data Semantics X, 133–173 (2008)
21. Rosati, R.: On conjunctive query answering in EL. In: Calvanese, D., Franconi,
    E., Haarslev, V., Lembo, D., Motik, B., Turhan, A.Y., Tessaris, S. (eds.) Proc.
    of the 20th Int. Workshop on Description Logics (DL 2007). CEUR Workshop
    Proceedings, vol. 250. CEUR-WS.org (2007)
22. Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: System description.
    In: Furbach, U., Shankar, N. (eds.) Proc. of the 3rd Int. Joint Conf. on Automated
    Reasoning (IJCAR 2006). Lecture Notes in Artificial Intelligence, vol. 4130, pp.
    292–297. Springer (2006)
23. Zolin, E.: Query answering based on modal correspondence theory. In: Proc. of the
    4th “Methods for modalities” Workshop (M4M-4). pp. 21–37. m4m.loria.fr (2005)
24. Zolin, E.: Modal logic applied to query answering and the case for variable modal-
    ities. In: Calvanese, D., Franconi, E., Haarslev, V., Lembo, D., Motik, B., Turhan,
    A.Y., Tessaris, S. (eds.) Proc. of the 20th Int. Workshop on Description Logics
    (DL 2007). CEUR Workshop Proceedings, vol. 250. CEUR-WS.org (2007)