Status QI O: An Update Birte Glimm1 , Yevgeny Kazakov1 , and Carsten Lutz2 1 The University of Oxford, Department of Computer Science, UK 2 Universität Bremen, Germany Abstract. We prove co-N2ExpTime-hardness for conjunctive query entailment in the description logic ALCOIF , thus improving the previously known 2ExpTime lower bound. The result transfers to OWL DL and OWL2 DL, of which ALCOIF is an important fragment. A matching upper bound remains open. 1 Introduction Due to its importance for ontology-based data access and data integration, conjunctive query (CQ) answering has developed into one of the most widely studied reasoning tasks in description logic (DL). Nevertheless, the precise complexity (and sometimes even decidability) of CQ answering in several important expressive DLs is still an open problem. In particular, this concerns fragments of the W3C-standardized OWL DL on- tology language that comprise nominals, inverse roles, and number restrictions, a com- bination of expressive means that is notorious for interacting in intricate ways. In this paper, we concentrate on the basic such fragment ALCOIF in which number restric- tions take the form of global functionality constraints. Decidability of CQ answering in ALCOIF and its extension ALCOIQ with qual- ified number restrictions has been shown only very recently [1]. Since the proof is based on a mutual enumeration of finite models and theorems of first-order logic, it does not yield any upper complexity bound. The best known lower bound for CQ answer- ing in ALCOIF is 2ExpTime, inherited from the fragment ALCI of ALCOIF that does not include nominals and functionality constraints [2, 3]. The aim of this paper is to improve upon this lower bound by establishing co-N2ExpTime-hardness. Note that CQ answering in the fragment ALCIF of ALCOIF that does not include nominals is in 2ExpTime [4], and the same is true for the fragment ALCQO that does not in- clude inverse roles [5] and ALCOI that does not include functionality restrictions [6]. Thus, our result shows that the combination of nominals, inverse roles, and number restrictions leads to an increase of complexity of CQ answering from 2ExpTime to (at least) co-N2ExpTime. This parallels the situation for the subsumption problem, which is co-NExpTime-complete for ALCOIF , but ExpTime-complete in any of ALCIF , ALCQO, and ALCOI. Since ALCOIF is a fragment of OWL DL (in both the OWL1 and the OWL2 version), our co-N2ExpTime lower bound obviously also applies to CQ answering in this language. We prove our result by a reduction of the tiling problem that requires to tile a torus n n of size 22 × 22 . Our construction combines elements of two existing hardness proofs, but also requires the development of novel ideas. We follow the general strategy of the proofs that show N2ExpTime-hardness of satisfiability in SROIQ [7] and in the ex- tension of SHOIF with role conjunctions [8]. One central part of those proofs is the n realization of a counter that counts up to 22 . We realize this counter using a (rather sub- tle!) adaptation of the conjunctive queries that have been developed in [2, 3] to establish 2ExpTime-hardness of CQ-answering in ALCI. An extended technical report including proofs and further details is available [9]. 2 Preliminaries We assume standard notation for the syntax and semantics of ALCOIF knowledge bases [10]. The presence of nominals allows for only working with TBoxes, which consist of concept inclusions (CIs) C v D. A knowledge base (KB) is then simply a TBox. Let NV be a countably infinite set of variables. An atom is an expression C(v) or r(v, v0 ), where C is a (potentially compound) ALCOIF -concept, r is an atomic role, and v, v0 ∈ NV .3 A conjunctive query q is a finite set of atoms. We use Var(q) to denote the set of variables that occur in the query q. Let K be an ALCOIF KB, I = (·I , ∆I ) a model of K, q a conjunctive query, and π : Var(q) → ∆I a total function. We write I |=π C(v) if π(v) ∈ C I and I |=π r(v, v0 ) if hπ(v), π(v0 )i ∈ rI . If I |=π at for all at ∈ q, we write I |=π q and call π a match for I and q. We say that I satisfies q and write I |= q if there is a match π for I and q. If I |= q for all models I of a KB K, we write K |= q and say that K entails q. The conjunctive query entailment problem is, given a knowledge base K and a query q, to decide whether K |= q. This is the decision problem corresponding to query answering, see e.g. [11]. A domino system is a triple D = (T, H, V), where T = {1, . . . , k} is a finite set of tiles and H, V ⊆ T × T are horizontal and vertical matching relations. A tiling of m × m for a domino system D with initial condition c0 = ht10 , . . . , tn0 i, ti0 ∈ T for 1 ≤ i ≤ n, is a mapping t : {0, . . . , m − 1} × {0, . . . , m − 1} → T such that ht(i, j), t(i + 1 mod m, j)i ∈ H, ht(i, j), t(i, j + 1 mod m)i ∈ V, and t(i, 0) = ti−1 0 (0 ≤ i, j < m). There exists a domino system D0 for which it is N2ExpTime-complete to decide, given an initial condition c0 n n of length n, whether D0 admits a tiling of 22 × 22 with initial condition c0 [12]. 3 Conjunctive Query Entailment in ALCOI F Our aim is to construct, for an initial condition c0 of length n, an ALCOIF -KB K0 n n and conjunctive query q0 such that K0 6|= q0 iff D0 admits a tiling of 22 × 22 with initial 0 condition c . Intuitively, the models of K0 that we are interested in have the form depicted in n n Figure 1: a torus of dimension 22 × 22 , where the lower left corner is identified by the nominal o, the upper right corner by the nominal e, each horizontal dashed arrow denotes the role h, and each vertical dotted arrow the role v. We will install two counters that identify the vertical and horizontal position of torus nodes. To store the counter values, we use binary trees of (roughly) depth n below the torus nodes, where each 3 Complex concepts C in atoms C(x) are used w.l.o.g.; to eliminate them, we can replace C(x) with AC (x) for a fresh atomic concept AC and add C v AC to the TBox. n 22 {e} n 22 {o} 2n Fig. 1. Schematic depiction of the torus of the 2n leaves store one bit of each counter (represented via concept names X and Y). The filled circles in Figure 1 denote true torus nodes, which are labeled by a tile later on, while the unfilled circles denote auxiliary nodes that will help us in properly incrementing the counters. This incrementation is the main difficulty of the reduction, and it is achieved with the help of the query q0 . As the details are intricate, we defer a discussion of the details until later, and first concentrate on the construction of K0 . The following concept inclusions (1) to (9) of K0 lay the foundation for enforcing the torus structure with attached trees. Successors in trees are connected via the com- position of the roles r− and r, from now on denoted by r− ; r. This is needed in the query construction later on, similar to the use of symmetric roles in [2, 3]. We call additional nodes between r− and r the ‘intermediate’ tree nodes. Note that no branching occurs at intermediate nodes. Also for the query construction, the root of a tree below a true torus node is the torus node itself while the root of a tree below an auxiliary torus node is reachable by traveling one step along the role r (see Figure 1). To distinguish these two kinds of trees, we label trees of the former kind with the concept name B and call them black trees, and trees of the latter kind with the concept name W and call them white trees. Later on, we will use white trees that are on the vertical axis to increment the vertical counter and white trees that are on the horizontal axis to increment the hor- izontal counter. To support this, we further label white trees of the former kind with V and white trees of the latter kind with H. The basic idea for constructing the torus itself is similar to what is done in [13, 7, 8]: the maximum value of both counters (indicated by the concept names MX and MY ) identifies the upper right corner, which has to sat- isfy the nominal e and is thus unique. Inverse functionality for h and v then guarantees uniqueness of elements for all other values of the horizontal and vertical counters, and that the torus ‘closes’ in the expected way. We use concept names L0 , . . . , Ln to mark the levels of the trees, to deal with the symmetry of the composition r− ; r. Thus, the B W L0 H W L0 L1 A1 ¬A1 A1 ¬A1 L1 L2 A1 A1 ¬A1 ¬A1 A2 ¬A2 A2 ¬A2 L2 A1 A1 ¬A1 ¬A1 A2 ¬A2 A2 ¬A2 Fig. 2. A black and a white tree, for n = 2 concept B u L0 identifies the true torus nodes. {o} v B u L0 (1) B u L0 v ∃h.(W u ∃r.(H u W u L0 ) u ∃h.(B u L0 )) (2) B u L0 v ∃v.(W u ∃r.(V u W u L0 ) u ∃v.(B u L0 )) (3) B u L0 u MX u MY v {e} (4) Li v ∃r− .∃r.(Ai+1 u Li+1 ) u ∃r− .∃r.(¬Ai+1 u Li+1 ) i v 6 1h− .> > v 6 1v− .> (9) Note that the concept names A1 , . . . , An implement a binary counter for the leafs of the trees, i.e., for counting the bit positions in the horizontal and vertical counters. In summary, the internal structure of the trees is as shown in Figure 2 where branching tree nodes have dark background and intermediate nodes have light background. The next step is to make sure that the horizontal and vertical counter have value 0 at the origin and that MX is true at the root of a tree when the horizontal counter has reached the maximum value, and similarly for MY . We use ∀(r− ; r)n .C to denote the 2n- quantifier prefixed ∀r− .∀r. · · · ∀r− .∀r.C. Recall that the concept name X represents the truth value of bits of the horizontal counter, and likewise for Y and the vertical counter. {o} v ∀(r− ; r)n .(¬X u ¬Y) (10) Ln v (X ↔ MX ) u (Y ↔ MY ) (11) Li−1 u ∃r− .∃r.(Li u Ai u MX ) u ∃r− .∃r.(Li u ¬Ai u MX ) v MX 0 u ∃v.∃r.∃v− .∃r.> u − ∃h− .∃r.∃v.∃r.∃v− .∃r.> u ∃v− .∃r.> (27) NI v ∃v.∃r .∃v .∃r .∃h.∃r .∃h .∃r .> u ∃h.∃r .∃h .∃r .> u − − − − − − − − − ∃v− .∃r− .∃h.∃r− .∃h− .∃r− .> u ∃h− .∃r− .> (28) In matches of the refined query, the endpoints of the two foldings shown in Figure 3 will match at the end of (possibly empty) side chains at the level n + 1, v and v0 will match at the end of the side chains between the levels n and n + 1, and the adjacent inner nodes labeled with ¬Ai resp. Ai will match at the end of the side chains at the level n. For this reason, we propagate all relevant concept names to the end of those chains. For C ∈ {A1 , . . . , An , ¬A1 , . . . , ¬An , X, ¬X, Y, ¬Y}, C0 ∈ {Q, B, W}, add the concept inclusions (Ln t Ln+1 ) u C v ∀h.∀r.∀h− .∀r.∀v.∀r.∀v− .∀r.C u ∀v.∀r.∀v− .∀r.C u ∀h− .∀r.∀v.∀r.∀v− .∀r.C u ∀v− .∀r.C (29) Q u C v ∀v.∀r− .∀v− .∀r− .∀h.∀r− .∀h− .∀r− .C0 u ∀h.∀r− .∀h− .∀r− .C0 u 0 ∀v− .∀r− .∀h.∀r− .∀h− .∀r− .C0 u ∀h− .∀r− .C0 (30) Figure 6 shows, for n = 2, how to fold the refined counting query such that v is mapped to a Q-node of a black tree and v0 to a Q-node of a white successor tree that can be reached via crossing an h-edge in the torus, and likewise for the case where v0 is mapped to a white tree, and v to a black successor tree reachable via h. We display only those side chains that are needed for accommodating the query match. To get started, note that in the left part of Figure 6, the h-edge in the right half of a meta role as shown in Figure 4 is matched onto the crossing h-edge in the model. The square and diamond nodes indicate where the middle and end parts of each meta role in the query match. Crossings of v-edges are similar. We now define counting query parts in a more precise way. Note that each counting query consists of 4n + 4 meta roles. In the subsequent definition, qi,mj is a meta role used in the counting query for Ai , where j ranges over 0, . . . , 4n + 3. Definition 1. For all i, j with 1 ≤ i ≤ n and 0 ≤ j < 4n + 4, put i, j qm := { r(v1i, j , v0i, j ), v(v1i, j , vi,2 j ), r(vi,3 j , vi,2 j ), v(vi,4 j , vi,3 j ), r(vi,5 j , vi,4 j ), h(vi,5 j , vi,6 j ), r(v7i, j , v6i, j ), h(v8i, j , vi,7 j ), r(vi,9 j , vi,8 j ), r(vi,9 j , vi,10j ), h(vi,10j , vi,11j ), r(vi,11j , vi,12j ), i, j i, j i, j i, j h(v12 , v13 ), r(v13 , v14 ), v(vi,14j , vi,15j ), r(vi,15j , vi,16j ), v(vi,17j , vi,16j ), r(vi,17j , vi,0 j+1 )} with v0i,4n+4 = vi,0 0 . For each i with 1 ≤ i ≤ n, the counting query for Ai is [ qic := { Ai (vi,0 0 ), ¬Ai (vi,1 0 ), Ai (vi,2n+2 0 ), ¬Ai (vi,2n+3 0 )} ∪ qi,mj 0≤ j<4n+4 9 = v, v9 with vi,0 = v0 . The counting query qc for the whole counter is i,2n+2 [ qc := {B(v), Q(v), W(v0 ), Q(v0 )} ∪ qic 1≤i≤n Note that each counting query qic is a cycle as intended since vi,4n+4 0 = vi,0 0 . As explained above, the overall counting query qc links a Q-node x1 of a tree to the Q-nodes x2 of its successor trees that represent the same bit position for the horizontal and vertical counters. To establish the central property (∗) in models of K0 that do not match the query q0 to be constructed, it thus remains to modify q0 such that it matches only if the truth assignment at the Ln -predecessor x10 of x1 to X 0 and Y 0 is not identical to the truth assignment at the Ln -predecessor x20 of x2 to X and Y. This is achieved by the second type of component queries in q0 , the copying components. To prepare for these components, let us distinguish two types of side chains in the tree nodes: the outgoing chains starting with h and v shown to the left in Figure 5 and the incoming chains starting with the inverses of h and v show to the right in Figure 5. As can be seen in Figure 6, when the query has a match, the incoming chains are used in a predecessor tree and the outgoing chains are used in a successor tree. We will distinguish the ends of incoming chains from the ends of the outgoing chains on the levels n and n + 1 using an additional concept P: (Ln t Ln+1 ) v ∀h.∀r.∀h− .∀r.∀v.∀r.∀v− .∀r.P u ∀v.∀r.∀v− .∀r.P u ∀h− .∀r.∀v.∀r.∀v− .∀r.¬P u ∀v− .∀r.¬P (31) The copying components take the form displayed in Figure 7, i.e., there are 8 such components in total. Each component is like the upper half of a counting component, except that the concept labels have changed to negated conjunctions. In Figure 7, the four copying components in each row take care of each possible truth assignment to X 0 and Y 0 for the predecessor and the corresponding assignment to X and Y for the successor. We need two queries per truth assignment to deal with the two possible ways in which a counting query can match for the variables v and v0 : (a) v matches into a tree that satisfies B, and v0 into a successor tree that satisfies W; (b) v0 matches into a tree that satisfies W, and v into a successor tree that satisfies B; To explain in detail how the copying queries work, consider case (a). Due to the Q- label in the counting queries, the variables v and v0 can only be matched to Q-nodes. Thus assume that v is matched to a Q-node x1 of a predecessor tree that satisfies B and v0 to a Q-node x2 of a successor tree that satisfies W. The relevant queries are those ¬(P uX 0 uY 0 ) ¬(¬P uXuY) ¬(P uX 0 u¬Y 0 ) ¬(¬P uXu¬Y) ¬(P u¬X 0 uY 0 ) ¬(¬P u¬XuY) ¬(P u¬X 0 u¬Y 0 ) ¬(¬P u¬Xu¬Y) v v0 v v0 v v0 v v0 ¬(P uX 0 uY 0 ) ¬(¬P uXuY) ¬(P uX 0 u¬Y 0 ) ¬(¬P uXu¬Y) ¬(P u¬X 0 uY 0 ) ¬(¬P u¬XuY) ¬(P u¬X 0 u¬Y 0 ) ¬(¬P u¬Xu¬Y) v0 v v0 v v0 v v0 v Fig. 7. The 8 query copying components from the first row in Figure 7. Let u be the neighboring variable of v in the copying components, and u0 the neighboring variable of v0 ; thus, both u and u0 are labeled with negated conjunctions. Similar to the situation in Figure 6, u can only be matched to the ends of two outgoing chains (satisfying P): one chain is at level n and another one is at level n + 1. Let us refer to the ends of these chains as x10 and x100 respectively. Likewise, u0 can only be matched to one of the two ends x20 and x200 of incoming chains (satisfying ¬P) at the levels n and n + 1, respectively. First assume that the truth assignment at x10 to X 0 and Y 0 is identical to the truth assignment at x20 to X and Y and thus there should be no match of the overall query q0 . Take the corresponding counting component from the first row, e.g., the first one when X 0 , Y 0 , X, and Y are all interpreted as true. It can be seen that, in this situation, the matches with u 7→ x10 or with u0 7→ x20 are not possible because they violate the concept labels of u and respectively u0 . The match u 7→ x100 and u0 7→ x200 is also not possible because the path from u to u0 is not long enough. Thus, there is no match of this component, whence no match of the overall query q0 . Conversely, assume that the truth assignment at x10 to X 0 and Y 0 is different from the truth assignment at x20 to X and Y, e.g., that x10 satisfies X 0 but x20 does not satisfy X. Since by (26) the truth values of X 0 and X are complemented at the level n + 1, x100 does not satisfy X 0 and x200 satisfies X. Then the first two components from the first row have a match u 7→ x100 and u0 7→ x20 and the next two components have a match u 7→ x10 and u0 7→ x200 . All components in the second row have a match due to the use of the concept name P in the labels (note the swapped v, v0 ). Thus,the overall query q0 matches. A formal definition of copying queries can be found in [9]. This finishes the construction of the query q0 and of the part of K0 that enforces the torus structure. It remains to encode tilings of the domino system D0 : > v T1 t · · · t Tk Ti u T j v ⊥ 1≤i< j≤k (32) T i u ∃h.T j v ⊥ T k u ∃v.T ` v ⊥ hi, ji < H, hk, `i < V (33) Finally, we enforce the initial condition c 0 = ht10 , . . . , tn0 i of the torus. {o} v T t10 u ∀h.(T t20 u ∀h.(T t30 u ∀h.(T t40 u . . . ∀h.T tn0 . . .))) (34) More details regarding the correctness of the reduction can be found in [9]. The most challenging issue is to show that when D0 admits a tiling with initial condition c0 and we build a model I of K that has the intended torus shape, then I 6|= q0 : we need to prove that there are no unintended foldings and matchings of the query q0 . Theorem 1. Conjunctive query entailment by ALCOIF knowledge bases is co-N2Exp- Time-hard. 4 Conclusions We have shown that conjunctive query entailment in the Description Logic ALCOIF is hard for co-N2ExpTime. The challenging problem of finding a matching upper bound, or in fact any elementary upper bound, remains open. Acknowledgments B. Glimm is supported by EPSRC grant EP/F065841/1, Y. Kazakov by EPSRC grant EP/G02085X, and C. Lutz by DFG SFB/TR 8 “Spatial Cognition”. References 1. Rudolph, S., Glimm, B.: Nominals, inverses, counting, and conjunctive queries. J. of Artifi- cial Intelligence Research 39 (2010) 429–481 2. Lutz, C.: Inverse roles make conjunctive queries hard. In: Proc. of the 2007 Description Logic Workshop (DL 2007). (2007) 3. Lutz, C.: The complexity of conjunctive query answering in expressive description logics. In: Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR-08), LNCS (2008) 179–193 4. Calvanese, D., De Giacomo, G., Lenzerini, M.: On the decidability of query containment under constraints. In: Proc. of the Seventeenth ACM SIGACT SIGMOD Sym. on Principles of Database Systems (PODS-98), ACM Press and Addison Wesley (1998) 149–158 5. Glimm, B., Horrocks, I., Sattler, U.: Unions of conjunctive queries in SHOQ. In: Proc. of the 11th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR-08), AAAI Press/The MIT Press (2008) 6. Calvanese, D., Eiter, T., Ortiz, M.: Regular path queries in expressive description logics with nominals. In: Proc. of the 21st Int. Joint Conf. on AI (IJCAI-09). (2009) 714–720 7. Kazakov, Y.: RIQ and SROIQ are harder than SHOIQ. In: Proc. of the 11th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR-08), AAAI Press/The MIT Press (2008) 8. Glimm, B., Kazakov, Y.: Role conjunctions in expressive description logics. In: Proc. of the 15th Int. Conf. on Logic for Programming and Automated Reasoning (LPAR 2008). Volume 5330 of LNCS., Springer (2008) 391–405 9. Glimm, B., Kazakov, Y., Lutz, C.: Status QIO: An update. Technical report, The University of Oxford (2011) http://www.comlab.ox.ac.uk/files/3969/GlKL11a.pdf. 10. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Descrip- tion Logic Handbook. Cambridge University Press (2003) 11. Glimm, B., Horrocks, I., Lutz, C., Sattler, U.: Conjunctive query answering for the descrip- tion logic SHIQ. J. of Artificial Intelligence Research 31 (2008) 151–198 12. Börger, E., Grädel, E., Gurevich, Y.: The Classical Decision Problem. Perspectives in Math- ematical Logic. Springer (1997) Second printing, Springer Verlag, 2001. 13. Tobies, S.: The complexity of reasoning with cardinality restrictions and nominals in expres- sive description logics. J. of Artificial Intelligence Research 12 (2000)