Query Answering via Modal Definability with FaCT++: First Blood S. Kikot1 , D. Tsarkov2 , M. Zakharyaschev1 , and E. Zolin3 1 Department of Computer Science and Information Systems, Birkbeck, University of London, U.K. {kikot,michael}@dcs.bbk.ac.uk 2 School of Computer Science, University of Manchester, U.K. tsarkov@cs.man.ac.uk 3 Faculty of Mathematics and Mechanics, Moscow State University, Russia ezolin@gmail.com Abstract. We use results on modal definability of first-order formulas to reduce the problem of answering conjunctive queries over knowledge bases (of any expressivity) to checking inconsistency of those knowledge bases extended with a number of ALCI concept assertions. This reduc- tion has been shown to work for conjunctive queries without cycles that only involve bound variables. In this paper, we present an optimised algorithm for the reduction, its implementation using FaCT++ as an underlying reasoner and the results of first experiments. 1 Introduction Conjunctive query (CQ) answering over description logic (DL) ontologies has recently become one of the hottest topics in the DL community. The DLs con- sidered range from – the DL-Lite family and OWL 2 QL, which support first-order rewritability (and so belong to the class AC0 for data complexity) and can be used with standard relational database systems [2, 20, 1, 13], to – Horn DLs encompassing the EL family, OWL 2 EL, Horn-SROIQ and Horn- SHOIQ, which support DataLog rewritability (and so are P-complete for data complexity) and can be used with DataLog engines [21, 14, 19, 5], and further to – very expressive DLs which are coNP-hard for data complexity and require some special techniques for answering CQs; see [10, 15, 18, 7, 4, 3] and refer- ences therein for details. In this paper, we present yet another approach to answering CQs over DL on- tologies. It is based on a transformation of a given CQ q(x1 , . . . , xn ) to ALCI- concepts C1 , . . . , Cn such that, for any knowledge base K = (T , A) (in any DL language) and any n-tuple a = (a1 , . . . , an ) of individuals from the ABox A, we have K |= q(a) iff K ∪ {a1 : C1 , . . . , an : Cn } is inconsistent. This transforma- tion of CQs, which we call concept-rewriting (or c-rewriting for short), is based on the approach to modal definability of first-order formulas developed in [23, 24, 12]. It works (at least) for connected CQs that have no cycles with only bound variables. (Note that c-rewriting is based solely on the shape of a CQ and hence works even for cyclic CQs with non-simple roles in cycles, which cannot be handled by the standard query answering techniques [6, 15].) Thus, in theory, answering c-rewritable CQs can be delegated to any standard DL reasoner. In practice, however, there are a few major obstacles to implementing the c-rewriting approach using a DL reasoner as a black box. The principal one is that checking consistency of a KB with a large ABox may be expensive, especially if we have to test all possible tuples of individuals in order to compute all answers to a given CQ. Even if a reasoner supports incremental ABoxes [9], the approach is still impractical as the algorithms for processing additions and (especially) removals of ABox assertions are quite involved and complicated. Our solution in this paper is to integrate the c-rewriting approach into the DL reasoner FaCT++. During the consistency check, FaCT++ materialises the ABox as a completion graph. For every answer candidate, the concepts involved in the c-rewriting of the query are added, in the constructed completion graph, to the labels of the corresponding individuals, and then the tableau algorithm is used to check consistency of the updated completion graph. Thus, we do not check consistency from scratch, but rather use the already computed completion graph as a starting point. To cope with numerous answer candidates, we propose three optimisation techniques that drastically improve performance, as our first experimental results demonstrate. 2 Concept-Rewritable Conjunctive Queries By a knowledge base (KB) in this paper we understand a KB K = (T , A) for- mulated in the language of any DL, for example, SROIQ underlying OWL 2. We use ind(A) to denote the set of individual names that occur in the ABox A. As usual, a conjunctive query (CQ) q(x) is a formula ∃u ϕ(x, u), where ϕ is a conjunction of atoms of the form Ai (z1 ) or Rj (z1 , z2 ) with zk ∈ x ∪ u (without loss of generality, we assume that CQs do not contain individual names). The free variables x in q(x) are called answer variables. We typically use x and y to denote tuples of answer variables. A tuple a ⊆ ind(A) is a certain answer to q(x) over a KB K if I |= q(a) for all I |= K; in this case we write K |= q(a). Where convenient, we regard a CQ as the set of its atoms. In this paper, we consider only CQs with at least one answer variable. Definition 1. Given a CQ q(x) with x = (x1 , . . . , xn ), a concept rewriting (or a c-rewriting) of q(x) in a DL L is an n-tuple (x1 : C1 , . . . , xn : Cn ) with L- concepts Ci such that, for any KB1 K = (T , A) and any n-tuple a = (a1 , . . . , an ) of individuals in A, we have K |= q(a) iff the KB K ∪ {a1 : C1 , . . . , an : Cn } is 1 In any DL or even first-order language; see the discussion after Definition 4 in [23]. inconsistent. We say that a CQ q(x) is concept-rewritable (or c-rewritable) in L if it has a c-rewriting in L. The size of a c-rewriting is defined as the sum of the sizes of all Ci , 1 ≤ i ≤ n. Note that a c-rewriting of a CQ may (and usually does) contain concept or role names that do not occur in the CQ. We assume that such names are fresh in the sense that they are taken from a special alphabet that is not used in any DL knowledge bases K mentioned above. Example 1. Here are some c-rewritings in ALC (with B a fresh concept name): – (x : ¬A) is a c-rewriting of the CQ q(x) = A(x); – (x : B u ∀R.¬B) is a c-rewriting of q(x) = R(x, x); – (x : ∀R.¬B, y : B) is a c-rewriting of q(x, y) = R(x, y); – (x : ∀R.B, y : ∀R.¬B) is a c-rewriting of q(x, y) = ∃u (R(x, u) ∧ R(y, u)). We only show this for q(x) = R(x, x). Suppose K = (T , A). We need to show that, for every a ∈ ind(A), we have K |= R(a, a) iff K ∪ {a : B u ∀R.¬B} is inconsistent. The implication ⇒ is trivial. For the converse, suppose K 6|= R(a, a) and consider a model I of K where I 6|= R(a, a). Since B is a fresh concept name, we can set B I := {aI }. Then, clearly, we have I |= K ∪ {a : B u ∀R.¬B}. Example 2. On the other hand, the query q(x) = ∃u (R(x, u) ∧ R(u, u)) is not c-rewritable in ALC. Indeed, suppose there exists an ALC-concept C such that K |= q(a) iff K ∪ {a : C} is inconsistent, for every KB K = (T , A) and every a ∈ ind(A). Consider the KBs K = (T , A) and K0 = (T , A0 ) in the DL S, where T = { A v ∀R.¬A, > v ∃R.>, Trans(R) }, A = { A(a) }, A0 = { A(a), R(a, b), R(b, b) }. Clearly, K 6|= q(a), and so K ∪ {a : C} is consistent. On the other hand, we have K0 |= q(a), and so K0 ∪ {a : C} is inconsistent. By the finite model property of S (see, e.g., [16, Corollary 4.3]), K ∪ {a : C} has a finite model I. Since every element in I has an R-successor, aI has an R-successor w such that (w, w) ∈ RI . Moreover, w 6= aI due to the first axiom in T . Now, by taking bI := w, we obtain a model of K0 ∪ {a : C}, contrary to our assumption. (We do not know whether the same result can be proved using ALC KBs.) Two sufficient conditions of c-rewritability of CQs in ALC and ALCI can be obtained as a consequence of the modal definability results from [12]. To formulate these conditions, we need a few more definitions. A CQ q(x) is called connected if its primal graph (whose vertices are the variables in q and edges are the pairs (z, v) such that R(z, v) ∈ q, for some role name R) is connected. We say that q(x) is internally cycle-free if its primal graph has no cycles that consist of bound variables. Thus, all CQs from Example 1 are internally cycle-free, while that in Example 2 is not, as R(u, u) is a cycle through the bound variable u. Finally, we call q(x) internally reachable if each bound variable u in q(x) is reachable via a directed path from some free variable; more precisely, if there is a path S1 (z0 , z1 ), S2 (z1 , z2 ), . . . , Sn (zn−1 , zn ) such that z0 ∈ x, u = zn and each Si is a role name in the given DL. For example, the CQ q(x) = ∃u R(x, u) is internally reachable, while q 0 (x) = ∃u R(u, x) is not. Theorem 1 (cf. [12], Theorem 7.8). (i ) Every connected and internally cycle-free CQ q(x) has a c-rewriting in ALCI of size O(|q|). (ii ) Every connected, internally cycle-free, and internally reachable CQ q(x) has a c-rewriting in ALC of size O(|q|). Below we present an algorithm that runs in quadratic time and, given a con- nected CQ q(x), checks whether it is internally cycle-free and, if it is, constructs a c-rewriting of q(x) in ALCI of size O(|q|). Moreover, if q(x) is internally reachable, the resulting c-rewriting will be in ALC. Note that the c-rewriting algorithm from [12] can only be applied to CQs without concept (unary) atoms, and in order to deal with arbitrary CQs, it was suggested to eliminate such atoms at the price of introducing fresh role names. In contrast, the c-rewriting algorithm presented below is directly applicable to queries with concept atoms. These atoms are treated uniformly with fresh con- cept names introduced by the algorithm (see A and B2 in Examples 6 and 7 below). However, one can see that the output of the algorithm below coincides with that of the algorithm from [12] after eliminating the fresh role names in the latter in accordance with their definitions. Thus, the new algorithm is correct. 3 C-Rewriting Algorithm Suppose we are given a connected CQ q(x) (in this section, we treat every CQ as the set of its atoms). The algorithm proceeds in four stages. First, it ‘unravels’ q(x) to a CQ q 0 (y) that has no cycles containing at least one free variable. While doing this, it keeps the information about the substi- tution σ : y → x of free variables required to transform q 0 (y) back to q(x). Secondly, the algorithm checks whether the CQ q 0 (y) is tree-shaped, in which case it represents q 0 (y) by means of an ELIO-concept C with nominals. Thirdly, the algorithm translates C into a c-rewriting (y1 : D1 , . . . , ym : Dm ) of q 0 (y). Finally, it identifies variables in y in accordance with the substitution σ and returns the resulting c-rewriting (x1 : C1 , . . . , xn : Cn ) of q(x) in ALCI (or even in ALC, if q(x) is internally reachable). Example 3. Consider again the CQ q(x) = R(x, x). Stage 1: We unravel q(x) into the CQ q 0 (x, y) = R(x, y) with the substitution σ(x) = x and σ(y) = x. Stage 2: We represent q 0 (x, y) by the ELIO-assertion x : ({x} u ∃R.{y}). Stage 3: We construct the c-rewriting (x : ¬∃R.B, y : B) of q 0 (x, y) in ALC, where B is a fresh concept name. Stage 4: We merge x and y and take the conjunction of their concepts, which gives us the c-rewriting x : (B u ¬∃R.B) of the initial query q(x) in ALC. Note that it is equivalent to the c-rewriting we gave in Example 1. Now we proceed to a detailed description of all the stages of the algorithm. Stage 1: Unravelling the CQ In this stage, the input is a connected CQ q(x). The output is a connected CQ q 0 (y), with |y| ≥ |x|, that has no cycles containing any free variables, together with a substitution σ : y → x such that σ(q 0 (y)) = q(x). In particular, if q is internally cycle-free, then q 0 is a tree-shaped CQ (i.e., its primal graph is a tree). Example 4. For example, the CQ q(x, y) on the left in the figure below is un- ravelled into the tree-shaped CQ q 0 (x1 , x2 , y1 , y2 , y3 ) on the right: 5 5 u v u v 1 4 1 2 3 4 2 3 6 6 x y:A x1 x2 y1 :A y2 y3 More precisely, we transform a CQ q(x) into a CQ q 0 (y) as follows: 1. Initially, we take the identical substitution σ : x → x, i.e., σ(xi ) = xi . 2. Find in the current CQ a cycle that contains at least one free variable. If no such cycle is found, we return the current CQ and halt. 3. Pick from this cycle any free variable x and any role atom containing this variable. Assume that the chosen atom has the form R(x, z), where R is a role name (and possibly z = x). (If the chosen atom has the form R(z, x) the subsequent operations are analogous.) 4. Take a fresh free variable y, remove the atom R(x, z) from the CQ and add the atom R(y, z). Extend the substitution σ by σ(y) := x. Go to Step 2. Stage 2: Rolling-up a tree-shaped CQ into an ELIO-concept In this stage, the input is a connected CQ q 0 (y) that has no cycles involving any free variables. If the CQ contains a cycle (which consists of bound variables only), then an error message is returned (since this means that the initial query q(x) is not internally cycle-free). Otherwise, the CQ is tree-shaped and the output is an ELIO-concept C with nominals y, denoted C(y), that ‘represents’ the query q 0 (y) in the following sense: for any interpretation I and any tuple e of its elements with |y| = |e| and e1 being the first component of e, I |= q 0 (e) ⇐⇒ I |= e1 : C(e). The algorithm given below performs traversal of the primal graph of the CQ in a depth-first manner, starting from some answer variable and marking all ‘visited’ variables. If it encounters an already marked variable then the graph has a cycle. Otherwise, the traversal is recorded as an ELIO-concept. All this is achieved by assigning C(y) = Build ELIO Concept(q 0 , ∅, NULL, y1 ). Example 5. The CQ q 0 (x1 , x2 , y1 , y2 , y3 ) from Example 4 is transformed by this algorithm into the pair (x1 : C), where C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 . ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .{y3 }) .   Algorithm 1 Rolling-up a tree-shaped CQ into an ELIO-concept (here R is either a role name or its inverse and InverseOf(R(u, v)) = R− (v, u)). 1: function Build ELIO Concept(Query q, Set V isited, atom at, var v) 2: Concept C ← > 3: if v is free in q then 4: C ← {v} 5: for all A(v) ∈ q do 6: C ←C uA 7: V isited ← V isited ∪ {v} 8: for all R(v, w) ∈ q : R(v, w) 6= InverseOf(at) do 9: if w ∈ V isited then 10: throw ‘Cycle through only bound variables found’ 11: C ← C u ∃R.Build ELIO Concept(q, V isited, R(v, w), w) 12: return C As shown in [12], this concept C = C(x1 , x2 , y1 , y2 , y3 ) is related to the query q 0 (x1 , x2 , y1 , y2 , y3 ) and to the original query q(x, y) in the following way: for any KB K = (T , A) and any individuals a, b ∈ ind(A), we have K |= q(a, b) iff K |= q 0 (a, a, b, b, b) iff K |= a : C(a, a, b, b, b). Remark 1. If we are happy to deal with ELIO-concepts (for instance, in case our KB is already formulated in an extension of ELIO), then we can terminate the algorithm now, for the above ELIO-concept can already be used for finding answers to q(x). For example, to check whether a given pair of individuals (a, b) is a certain answer to the CQ q(x, y) from Example 5, we can simply check whether the KB K ∪ {a : ¬C(a, a, b, b, b)} is inconsistent. However, usually the addition of inverse roles and nominals to a DL makes reasoning harder. So, our aim below is to avoid these constructors and produce concepts that have neither nominals nor (if possible) inverse roles. Stage 3: Converting ELIO-concepts to c-rewritings As an input, we are given an ELIO-concept C with nominals y = (y1 , . . . , ym ). For an output, we construct an m-tuple of ALCI-concepts (D1 , . . . , Dm ) such that (y1 : D1 , . . . , ym : Dm ) is a c-rewriting of q 0 (y) in ALCI. Moreover, as shown in [12], if the initial query q(x) is internally reachable, the algorithm below constructs a c-rewriting in ALC. To describe the algorithm, we require a few definitions. Denote by d({x}, C) the role-name-depth of an occurrence of a nominal {x} in C, which is the number of constructors of the form ∃R, where R is a role name (and not an inverse role), under which this occurrence of {x} lies in C. Formally, it is defined by induction: – d({x}, {x}) = 0; – d({x}, C u D) = d({x}, D u C) = d({x}, C), if this occurrence of {x} is in C; – d({x}, ∃R.C) = d({x}, C) + 1; – d({x}, ∃R− .C) = d({x}, C). Thus, in Example 5, nominal occurrences have the following role-name-depths: C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 . ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .{y3 }) .   0 2 2 2 3 Simple ELIO-concepts are defined by induction (note ‘or’ in the last item): – each nominal {xi } is a simple concept; – if C is simple, then so is ∃R− .C; – if C or D is simple, then so is C u D. Typical examples of simple concepts are {x}, ∃R− .{x}, ∃R− .({x} u ∃R.{y}). Simple concepts were introduced in [11] for technical reasons: they correspond to minimal valuations for antecedents of generalised Sahlqvist formulas [8]. We need this notion to ensure that the resulting rewriting is in ALC for internally reachable CQs. Examples of non-simple concepts are ∃R.{x} and ∃R− .∃R.{x}. Now, the algorithm takes an ELIO-concept C with nominals y = (y1 , . . . , ym ). Note that each nominal from y has exactly one occurrence in C. The algorithm produces an ordered set of pairs Res = (y1 : D1 , . . . , ym : Dm ), where Dj are ALCI-concepts. Initially, Res is set to empty. Then we proceed as follows. 1. The concept C has the form C = {y1 } u E. If E has no nominals, then add (y1 : ¬E) to Res and halt. 2. Otherwise, pick a nominal {y} in E with maximal d({y}, E). 3. Find in E the maximal (w.r.t. the subconcept relation) simple subconcept H containing {y} and no other nominals. To this end, start the search with {y} (which is simple) and go upwards in the syntactic tree of C until the concept under consideration either is non-simple or contains another nominal. 4. Replace this occurrence of H in C with a fresh concept name B and denote the result by C 0 . 5. Starting with the sequent H ⇒ B, apply the following rules repeatedly (the omitted rule with F 0 u F is symmetrical): F u F0 ⇒ G ∃R− .F ⇒ G (r1) (where {y} is in F ) (r2) F ⇒ ¬F 0 t G F ⇒ ∀R.G until a sequent of the form {y} ⇒ D is obtained, for some ALCI-concept D. 6. Add the pair (y : D) to Res. 7. Go to Step 1 with the new concept C := C 0 . Note that since the concept H is simple and contains only one nominal, the transformation of Step 5 is always possible and even deterministic. As shown in [12, Theorem 7.9], if the initial CQ q(x) was internally cycle-free, then the above algorithm returns a c-rewriting of q 0 (y) in ALCI; if additionally the initial CQ q(x) was internally reachable, then D and E above are always ALC-concepts, so in this case the algorithm produces a c-rewriting of q 0 (y) in ALC. Example 6. Let C be as in Example 5. We show how Stage 3 works. Iteration 1: The deepest nominal in C is {y3 }. Since ∃R6 .{y3 } is not simple, we take H = {y3 } and replace it with a fresh concept name B1 . No rules from Step 4 were needed. So we add the pair y3 : B1 to Res and obtain C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 . ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .B1 ) .   Iteration 2: The nominals {y1 }, {x2 } and {y2 } are equally deep, so any of them will do. Let us pick {y2 }. Then H = ∃R4− .({y2 } u ∃R6 .B1 ), since it is simple, while ∃R2− .{x2 } u ∃R4− .({y2 } u ∃R6 .B1 ) contains a different nominal {x2 }. We introduce a fresh concept name B2 and transform the sequent H ⇒ B2 as follows: ∃R4− .({y2 } u ∃R6 .B1 ) ⇒ B2 (initial sequent) {y2 } u ∃R6 .B1 ⇒ ∀R4 .B2 by (r2) {y2 } ⇒ ¬∃R6 .B1 t ∀R4 .B2 by (r1) So we add the pair y2 : ¬∃R6 .B1 t ∀R4 .B2 to Res and obtain: C = {x1 } u ∃R1 . ∃R3 .({y1 } u A) u ∃R5 .(∃R2− .{x2 } u B2 ) .  Iteration 3: Now we pick the nominal {x2 }, so H = (∃R2− .{x2 } u B2 ). Replace it in C with a fresh concept name B3 and transform the sequent H ⇒ B3 : ∃R2− .{x2 } u B2 ⇒ B3 (initial sequent) ∃R2− .{x2 } ⇒ ¬B2 t B3 by (r1) {x2 } ⇒ ∀R2 .(¬B2 t B3 ) by (r2) So we add x2 : ∀R2 .(¬B2 t B3 ) to Res and obtain the concept C = {x1 } u ∃R1 .(∃R3 .({y1 } u A) u ∃R5 .B3 ). Iteration 4: Pick the nominal {y1 }, set H = ({y1 }uA), introduce a fresh concept name B4 , add y1 : (¬A t B4 ) to Res and obtain the concept C = {x1 } u ∃R1 .(∃R3 .B4 u ∃R5 .B3 ). Iteration 5: We notice that C = {x1 }uE, where the concept E has no nominals. So we add x1 : ¬∃R1 .(∃R3 .B4 u ∃R5 .B3 ) to Res and halt. The resulting set Res consists of all the framed pairs. Stage 4: Merging nominals In the set Res of pairs (y1 : D1 , . . . , ym : Dm ), we replace the variables y with x in accordance with the substitution σ defined in Stage 1. Thus we obtain a new set Res0 . Finally, for every variable xi ∈ x, we set Ci to be the conjunction of all concepts that are assigned in Res0 to the variable xi . As shown in [12], the resulting tuple (x1 : C1 , . . . , xn : Cn ) is a c-rewriting of the initial CQ q(x). Example 7. For the set Res constructed in Example 6, Stage 4 yields the follow- ing c-rewriting in ALC of the CQ q(x, y) from Example 4: x : ¬∃R1 .(∃R3 .B4 u ∃R5 .B3 ) u ∀R2 .(¬B2 t B3 ),  y : B1 u (¬A t B4 ) u (¬∃R6 .B1 t ∀R4 .B2 ) . 4 Implementation, Optimisations and Experiments We have implemented the c-rewriting approach to answering CQs over ontologies in the DL reasoner FaCT++ [22]. FaCT++ is a tableau-based reasoner, which means that in order to check whether a given KB K is consistent, it constructs a so-called completion graph whose nodes include all the individuals from K labelled with concept expressions. Given a KB K = (T , A) and a CQ q(x), FaCT++ first tests K for consis- tency and keeps in memory the ‘deterministic’ part D of its completion graph obtained by applying deterministic tableau rules (which are always applied be- fore non-deterministic ones in FaCT++). This will save a lot of reasoning time in later stages. Next, it checks whether q(x) satisfies the conditions of Theorem 1 and, if this is the case, constructs its c-rewriting (x1 : C1 , . . . , xn : Cn ). Now, for every n-tuple of a = (a1 , . . . , an ) of individuals from A, it adds the concept Ci to the label of ai , for 1 ≤ i ≤ n, in the completion graph D and runs the tableau algorithm to check whether the resulting tableau is closed. If it is, which means that the KB K ∪ {a1 : C1 , . . . , an : Cn } is inconsistent, the tuple a is returned as a certain answer to q over K. The benefit of such an implementation over the ‘reasoner as a black box’ approach is that every time KB consistency checking starts from D rather than from scratch. As the size of a typical CQ (and hence of its c-rewriting) is negligi- bly smaller than the size of an ABox, only a small part of the completion graph requires rebuilding. However, iterating over all tuples of individuals from A is still a very time consuming task. Below, we describe a few optimisations that have been implemented to reduce the search space. 4.1 Optimisations We call an n-tuple of concepts (E1 , . . . , En ) an upper bound for a CQ q(x) if K |= q(a) implies K |= ai : Ei , 1 ≤ i ≤ n, for any KB K = (T , A) and any n- tuple a of individuals from A. The concept Ei is called an upper bound for xi in q(x). If (E1 , . . . , En ) is an upper bound for q(x), then we can clearly restrict the search for answers to q(x) in K to the tuples a for which K |= ai : Ei , 1 ≤ i ≤ n. For example, if q(x) contains atoms Ai (xi ), 1 ≤ i ≤ n, then (A1 , . . . , An ) is an upper bound for q(x). If q(x) contains an atom R(x1 , z), then (∃R.>, >, . . . , >) is another upper bound for q(x). We are now in a position to describe our optimisations. Rolling-up elimination. Suppose that the ELIO-concept C built in Stage 2 has a subexpression of the form2 {yj } u F , where F is an ELI-concept (without nominals). Then clearly F can be taken as an upper bound for yj in q 0 (y) and, consequently, for the variable σ(yj ) in q(x). Moreover, it is now safe to remove F from the ELIO-concept C, that is, replace {yj } u F with {yj } in C, which simplifies the subsequent reasoning. Query approximation. Here we build a stronger upper bound, but do not alter the ELIO-concept C. Recall that, in Stage 2, we constructed an ELIO-concept, say H1 , by traversing the graph of q 0 (y) starting from y1 . Let us construct ELIO- concepts Hj , for 1 ≤ j ≤ m, similarly but starting d from yj . Now we replace in Hj every nominal {y} with the conjunction {A | A(xi ) ∈ q}, where xi := σ(y) and the empty conjunction is understood as >. It is not hard to see that the resulting tuple of ELI-concepts (F1 , . . . , Fm ) forms an upper bound for q 0 (y). An upper dbound (E1 , . . . , En ) for the original CQ q(x) can be obtained by taking Ei := {Fj | σ(yj ) = xi }. Avoiding search. Suppose that (x : B,x1 : C1 ,. . . ,xn : Cn ) is a c-rewriting of a CQ q(x, x1 , . . . , xn ), where B is a fresh concept name. Fix any tuple a = (a1 , . . . , an ) of individuals from A. In order to find all answers to q of the form (b, a), we would normally run, for every individual b from K, the tableau algorithm on the KB K ∪ {b : B, a1 : C1 , . . . , an : Cn } to check whether it is inconsistent. We can optimise this procedure as follows. Initially, the set of candidates for b is taken to be G := ind(A). We run the tableau algorithm only once on the KB K ∪ {a1 : C1 , . . . , an : Cn } and, for every complete clash-free tableau for it, we remove from G all individuals b whose labels in M do not include ¬B. After that3 we return the tuples (b, a), for all b ∈ G, as answers. A similar optimisation is applied to a c-rewriting of the form (x : ¬B, x1 : C1 , . . . , xn : Cn ). 4.2 Evaluation We tested our implementation of query answering via c-rewriting with FaCT++ using the testbed described in [17]. It contains an extension of the LUBM ontol- ogy, an ABox generator and 11 CQs. For our tests, we generated an ABox for 1 university; it contains 17,138 individuals and 76,039 assertions. The tests were performed on a Mac OS X laptop with 3.06 GHz processor and 8 Gb of RAM. On this machine, the consistency check for the whole KB takes 2.3 seconds, while each of the subsequent checks for answer candidates takes about 0.2 ms. The results of experiments are shown in the table below, which gives the number of tuples to be checked in order to answer the 11 CQs from [17]. We ran four series of tests: the first one without optimisations, the second one with the 2 We assume that conjunctions are stored as sets of conjuncts, so {yj } and F are not necessarily adjacent conjuncts, but rather ‘belong’ to the same conjunction. 3 For better performance, the implemented algorithm is even more involved: for in- stance, it uses the information whether ¬B was added to the label of a node deter- ministically or non-deterministically. We omit details because of the space limit. rolling-up optimisation, the third with both rolling-up and query approximation, and the fourth series with all three optimisations. method q1 q2 q3 q4 q5 q6 req1 req2 req3 req4 req5 no optim. 0.3G 0.3G 17k 0.3G 17k 0.3G 17k 0.3G 5T 0.3G 17k rolling-up 4.2M 2.8M 536 8 6k 0.9M 0 13.5M 6.8G 10M 8k r+q 4M 2.8M 507 8 3k 0 0 0.8M 2.2G 16k 0 r+q+a 4M 2.8M 507 1 3k 0 0 536 2.2G 1k 0 The first row in the table is clear: as the ABox contains about 17k individuals, the number of tuples to be checked is 17,138n , where n is the number of free variables in the CQ. The ‘rolling-up’ row already shows a significant reduction in the number of answer candidates. An interesting bit is the CQ req1(x) = ∃y, z (works For (x, y) ∧ affiliated Organization Of (y, z)), which is c-rewritten to (x : ∀works For .∀affiliated Organization Of .⊥), and there are no existential restrictions or property assertions in the KB that involve the role affiliated Organization Of. As a result, the set of individuals to choose from is empty. The ‘r+q’ row shows that in some cases (e.g., q6 and req5) the answer set becomes empty as there are no instances of the upper bound concepts. Note also a significant reduction in the number of answer candidates for req4. The ‘r+q+a’ row shows that 3 out of 11 CQs are of the form suitable for the avoiding search optimisation. For example, the CQ req2(x, y) = (Person(x) ∧ teacher Of (x, y) ∧ Course(y)) leads to the rolled-up c-rewriting (x : ∀teacher Of.¬B, y : B) with B a fresh con- cept name. By avoiding search for y, the number of checks is reduced by 3 orders of magnitude, as there are only 1600 instances of the concept Course in the KB. 5 Conclusions In this paper, we have presented a novel c-rewriting algorithm that reduces the problem of answering conjunctive queries over ontologies to knowledge base con- sistency checking. The CQs allowed by the algorithm must contain no cycles that involve only existentially quantified variables (note that all the 11 CQs from [17] satisfy this condition). The allowed KBs are formulated in any decidable DL. We integrated the c-rewriting algorithm into the DL reasoner FaCT++, together with a number of optimisations. Our first experiments show that the c-rewriting approach with FaCT++ works reasonably well for CQs with one answer vari- able: it takes under one second to find all answers to such CQs in the experiments described above. Answering CQs with two variables may take up to 15 minutes. The experiments also demonstrate that even simple optimisations can drasti- cally reduce the number of answer candidates. Apart from implementing and fine-tuning new optimisation techniques, we plan to characterise those CQs that are not c-rewritable ([12] characterises modal definability, not c-rewritability of CQs). Another interesting question is whether c-rewritings with number restric- tions (i.e., in ALCQ or ALCIQ) can extend the class of c-rewritable CQs. References 1. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. of Artificial Intelligence Research 36, 1–69 (2009) 2. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), 385–429 (2007) 3. Calvanese, D., Giacomo, G.D., Lenzerini, M., Rosati, R.: View-based query answer- ing in description logics: Semantics and complexity. J. of Computer and System Sciences 78(1), 26–46 (2012) 4. Eiter, T., Ortiz, M., Simkus, M.: Conjunctive query answering in the description logic SH using knots. J. of Computer and System Sciences 78(1), 47–85 (2012) 5. Eiter, T., Ortiz, M., Simkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn- SHIQ plus rules. In: Hoffmann, J., Selman, B. (eds.) Proc. of the 26th Nat. Conf. on Artificial Intelligence (AAAI 2010). pp. 726–733. AAAI Press (2012) 6. Glimm, B., Kazakov, Y., Lutz, C.: Status QIO: An update. In: Rosati, R., Rudolph, S., Zakharyaschev, M. (eds.) Proc. of the 24th Int. Workshop on De- scription Logics (DL 2011). CEUR Workshop Proceedings, vol. 745. CEUR-WS.org (2011) 7. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Conjunctive query answering for the description logic SHIQ. J. of Artificial Intelligence Research 31, 157–204 (2008) 8. Goranko, V., Vakarelov, D.: Elementary canonical formulae: extending Sahlqvist’s theorem. Annals of Pure and Applied Logic 141(1–2), 180–217 (2006) 9. Halaschek-Wiener, C., Parsia, B., Sirin, E., Kalyanpur, A.: Description logic rea- soning for dynamic ABoxes. In: Parsia, B., Sattler, U., Toman, D. (eds.) Proc. of the 19th Int. Workshop on Description Logics (DL 2006). CEUR Workshop Proceedings, vol. 189. CEUR-WS.org (2006) 10. Hustadt, U., Motik, B., Sattler, U.: Reasoning in description logics by a reduction to disjunctive Datalog. J. of Automated Reasoning 39(3), 351–384 (2007) 11. Kikot, S.: An extension of Kracht’s theorem to generalized Sahlqvist formulas. Journal of Applied Non-Classical Logics 19(2), 227–251 (2009) 12. Kikot, S., Zolin, E.: Modal definability of first-order formulas with free variables and query answering. Journal of Applied Logic 11(2), 190–216 (2013) 13. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The com- bined approach to query answering in DL-Lite. In: Lin, F., Sattler, U., Truszczyn- ski, M. (eds.) Proc. of the 12th Int. Conf. on the Principles of Knowledge Repre- sentation and Reasoning (KR 2010). pp. 247–257. AAAI Press (2010) 14. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL using a relational database system. In: Proc. of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI 2009). pp. 2070–2075. AAAI Press (2009) 15. Lutz, C.: The complexity of conjunctive query answering in expressive description logics. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) Proc. of the 4th Int. Joint Conf. on Automated Reasoning (IJCAR 2008). LNAI, vol. 5195, pp. 179–193. Springer (2008) 16. Lutz, C., Areces, C., Horrocks, I., Sattler, U.: Keys, nominals, and concrete do- mains. J. of Artificial Intelligence Research 23, 667–726 (2004) 17. Lutz, C., Seylan, I., Toman, D., Wolter, F.: The combined approach to OBDA: taming role hierarchies using filters. In: Fokoue, A., Liebig, T., Goodman, E., Weaver, J., Urbani, J., Mizell, D. (eds.) Joint Workshop on Scalable and High- Performance Semantic Web Systems (SSWS+HPCSW 2012). pp. 16–31. CEUR- WS.org (2012) 18. Ortiz, M., Calvanese, D., Eiter, T.: Data complexity of query answering in ex- pressive description logics via tableaux. J. of Automated Reasoning 41(1), 61–98 (2008) 19. Ortiz, M., Rudolph, S., Simkus, M.: Query answering in the Horn fragments of the description logics SHOIQ and SROIQ. In: Walsh, T. (ed.) Proc. of the 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI 2011). pp. 1039–1044. AAAI Press (2011) 20. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data Semantics X, 133–173 (2008) 21. Rosati, R.: On conjunctive query answering in EL. In: Calvanese, D., Franconi, E., Haarslev, V., Lembo, D., Motik, B., Turhan, A.Y., Tessaris, S. (eds.) Proc. of the 20th Int. Workshop on Description Logics (DL 2007). CEUR Workshop Proceedings, vol. 250. CEUR-WS.org (2007) 22. Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: System description. In: Furbach, U., Shankar, N. (eds.) Proc. of the 3rd Int. Joint Conf. on Automated Reasoning (IJCAR 2006). Lecture Notes in Artificial Intelligence, vol. 4130, pp. 292–297. Springer (2006) 23. Zolin, E.: Query answering based on modal correspondence theory. In: Proc. of the 4th “Methods for modalities” Workshop (M4M-4). pp. 21–37. m4m.loria.fr (2005) 24. Zolin, E.: Modal logic applied to query answering and the case for variable modal- ities. In: Calvanese, D., Franconi, E., Haarslev, V., Lembo, D., Motik, B., Turhan, A.Y., Tessaris, S. (eds.) Proc. of the 20th Int. Workshop on Description Logics (DL 2007). CEUR Workshop Proceedings, vol. 250. CEUR-WS.org (2007)