On the Data Complexity of Ontology-Mediated Queries with a Covering Axiom O. Gerasimova1 , S. Kikot2 , V. Podolskii3,1 , and M. Zakharyaschev2 1 National Research University Higher School of Economics, Moscow, Russia 2 Birkbeck, University of London, U.K. 3 Steklov Mathematical Institute, Moscow, Russia Abstract. This paper reports on our ongoing work that aims at a classification of conjunctive queries q according to the data complexity of answering ontology- mediated queries ({A v T t F }, q). We give examples of queries from the complexity classes C ∈ {AC0 , L, NL, P, CO NP}, and obtain a few syntactical conditions for C-membership and C-hardness. 1 Introduction The OWL 2 QL profile of OWL 2 —as well as the underlining description logics from the DL-Lite family [4, 2]—were designed to ensure that every ontology-mediated query (OMQ, for short) (T , q) with an OWL 2 QL ontology T and a conjunctive query (CQ) q is first-order (FO) rewritable. However, when developing ontologies for ontology- based data access (OBDA) [10] applications, domain experts are often tempted to use axioms with constructs that are not available in OWL 2 QL. For example, the NPD FactPages ontology,4 which was created to facilitate querying the datasets of the Nor- wegian Petroleum Directorate,5 contains cardinality restrictions and covering axioms of the form A v B1 t · · · t Bn . Typical answers to the question whether such axioms could have a negative impact on OMQ rewriting are as follows: (i) the data satisfies the axioms anyway (because of the database schema), (ii) our ‘real-world queries’ are never affected by them, and (iii) OBDA systems such as Ontop drop everything outside OWL 2 QL. Ideally, of course, we would rather want our system to detect automatically whether the given OMQ is FO-rewritable and alert the user if this is not so. Furthermore, in case of non-FO-rewritability, we might want the system to check whether a datalog rewriting is possible, and so on. From the complexity-theoretic point of view, we are thus interested in the data complexity of answering a given OMQ with an expressive ontology. A systematic investigation of this problem was started in [3], which showed among other results that answering OMQs of the form (Disn , u), where u is a union of CQs (UCQ) and Disn = {A v B1 t · · · t Bn }, is polynomially equivalent to the constraint satisfaction problems CSP(A). In particular, a P/CO NP dichotomy for such OMQs would give a dichotomy for CSPs, thereby confirming the Feder-Vardi conjecture. As 4 http://sws.ifi.uio.no/project/npd-v2/ 5 http://factpages.npd.no/factpages/ shown in [7], answering CQs with basic schema.org ontologies (in particular, Disn ) and CQs of qvar-size ≤ 2 is in P for combined complexity, where q is of qvar-size n if the restriction of q to its quantified variables is a disjoint union of CQs with at most n variables each. Moreover, FO- and datalog-rewritability of OMQs of the form (T , u), where T is a schema.org ontology and u is a UCQ, are decidable in NE XP T IME. It has also been recently established in [5] that checking FO-rewritability of OMQs with on- tologies formulated in any description logic between ALCI and SHI is 2NE XP T IME- complete. Datalog rewritability of OMQs with ontologies given in disjunctive datalog has been investigated in [8]. In this paper, we consider one fixed non-Horn ontology Dis = {A v T t F }. Ul- timately aiming at a complete classification of CQs q according to the data complexity of answering OMQs Q = (Dis, q), here we present our initial observations about this problem. Ideally, we would like to obtain transparent necessary and sufficient conditions relating the structure of q—say, the way how T and F occur in it—with the complexity of answering Q. For example, one such condition guaranteeing datalog rewritability, and so tractability of answering Q follows from [8, Theorem 27]: it suffices that q contains at most one occurrence of F or at most one occurrence of T . We obtain a few conditions in the same spirit for the complexity classes AC0 , L, NL and P. We also give quite a few simple and instructive CQs distinguishing between NL and P, and develop techniques for establishing P and CO NP lower bounds. 2 Preliminaries In our context, a conjunctive query (CQ) is a first-order (FO) formula of the form q(x) = ∃y ϕ(x, y), where ϕ is a conjunction of unary or binary atoms P (z) with z ⊆ x ∪ y.Unions of conjunctive queries (UCQ) is a disjunction of conjunctive queries. Given an ABox (or data instance) A, we denote by ind(A) the set of individual names that occur in A. A tuple a ⊆ ind(A) is a certain answer to the OMQ Q = (Dis, q(x)) over A if M |= q(a), for every model M of Dis ∪ A; in this case we write Dis, A |= q(a). If the set x of answer variables is empty, a certain answer to Q over D is ‘yes’ if M |= q, for every model M of Dis ∪ A, and ‘no’ otherwise. OMQs and CQs without answer variables x are called Boolean. We often regard CQs as sets of their atoms. In this paper, we assume that the all CQs are connected. Let Q = (Dis, q(x)) be a fixed OMQ. By answering Q, we understand the problem of checking, given an ABox A and a tuple a ⊆ ind(A), whether Dis, A |= q(a). It is readily seen that this problem is always in CO NP. It is in the complexity class AC0 if there is an FO-formula q 0 (x), called an FO-rewriting of Q, such that Dis, A |= q(a) iff q 0 (a) holds in the model given by A, for any ABox A and any tuple a ⊆ ind(A). A datalog program, Π, is a finite set of rules of the form ∀z (γ0 ← γ1 ∧ · · · ∧ γm ), where each γi is an atom Q(y) with y ⊆ z or an equality (z = z 0 ) with z, z 0 ∈ z. (As usual, we omit ∀z.) The atom γ0 is the head of the rule, and γ1 , . . . , γm its body. All variables in the head must occur in the body, and = can only occur in the body. The predicates in the heads of rules in Π are IDB predicates, the rest (including =) EDB predicates. A program Π is called linear if the body of every rule in Π contains at most one IDB predicate. A datalog query is a pair (Π, G(x)), where Π is a datalog program and G(x) an atom. A tuple a ⊆ ind(A) is an answer to (Π, G(x)) over an ABox A if G(a) holds in the first-order structure with domain ind(A) obtained by closing A under the rules in Π; in this case we write Π, A |= G(a). A datalog query (Π, G(x)) is a datalog rewriting of an OMQ Q = (Dis, q(x)) in case Dis, A |= q(a) iff Π, A |= G(a), for any ABox A and any a ⊆ ind(A). The evaluation problem for (Π, G(x))—that is, checking, given an ABox A and a tuple a ⊆ ind(A), whether Π, A |= G(a)—is known to be in P; for linear Π, this problem is in NL; see [6] and references therein. 3 AC0 By a solitary occurrence of F in a CQ q we mean any F (x) ∈ q such that T (x) ∈ / q; likewise, a solitary occurrence of T in q is any T (x) ∈ q such that F (x) ∈ / q. Theorem 1. For any CQ q without solitary occurrences of F (or T ), answering the OMQ Q = (Dis, q) is in AC0 . Proof. We show that Dis, A |= q(a) iff A |= q(a). Suppose that A 6|= q(a) and F (x) ∈ q ⇒ T (x) ∈ q. Take A0 = A ∪ {F (a) | a ∈ ind(A) ∧ T (a) ∈ / A}. Clearly, A0 |= Dis and A0 6|= q(a). The converse direction is trivial. In particular, answering any OMQ Q = (Dis, q), where q does not contain one of F or T , is in AC0 . This observation can be easily generalised to OMQs with ontologies Disn = {A v B1 t · · · t Bn }, for n ≥ 2: Theorem 2. Suppose q is any CQ that does not contain an occurrence of Bi , for some i (1 ≤ i ≤ n). Then answering the OMQ Q = (Disn , q) is in AC0 . Thus, only those CQs can ‘feel’ Disn as far as FO-rewritability is concerned that contain all the Bn (which makes them quite complex in practice). Theorem 1 also shows that Q = (Dis, q) satisfying the respective condition has a trivial FO-rewriting, viz. q itself. This is not accidental as shown by the following observation: Proposition 1. If Q = (Dis, q) is in AC0 , then q is a rewriting of Q. Proof. By [3, Proposition 5.9], if Q is FO-rewritable, it has a UCQ rewriting. Then there is a homomorphism from q to any CQ q 0 in this rewriting. We do not know yet whether the sufficient condition for FO-rewritability given by Theorem 1 is also a necessary one for minimal CQs q (that are not equivalent to any of their proper subqueries). For non-minimal CQs, this is not the case as shown by F R ◦ R F T R ◦ R T which is in AC0 because it is equivalent to the CQ ◦ R FT R ◦ . Below we obtain some partial results showing how a single F -atom and a single T -atom in q can cause L- and NL-hardness. 4 L and NL We say that a Boolean CQ q is an F -T -CQ if it has exactly one atom of the form F (x), exactly one atom of the form T (y), and the variables x and y are distinct. Theorem 3. Answering any OMQ Q = (Dis, q) with an F -T -CQ q is L-hard. Proof. The proof is by reduction to the reachability problem for undirected graphs, which is known to be L-complete; see, e.g., [1]. Let q 0 be the CQ obtained from q by removing the atoms F (x) and T (y). Suppose we are given an undirected graph G = (V, E) and two vertices s, t ∈ V . It will be convenient to regard G as a directed graph such that (u, v) ∈ E iff (v, u) ∈ E, for any u, v ∈ V . We encode G by means of an ABox AG that is obtained from G as follows. For every edge e = (u, v) ∈ E, let q 0e be the set of atoms in q 0 with x renamed to u, y to v and all other variables z to ze . Then AG comprises all such q e , for e ∈ E, as well as F (s), T (t) and A(v), for v ∈ V \ {s, t}. Our aim is to show that s →G t iff Dis, AG |= q. Suppose s →G t, that is, there exists a path s = v0 , v1 , . . . , vn = t in G with ei = (vi , vi+1 ) ∈ E, for i < n. Consider an arbitrary model I of Dis and AG . Since I |= Dis and F (s), T (t), A(vi ), for 1 ≤ i < n, are all in AG , we can find some i < n such that I |= F (vi ) and I |= T (vi+1 ). As q 0ei is an isomorphic copy of q 0 , we obtain I |= q. Conversely, suppose s 6→G t. Define an interpretation I by extending the ABox AG with F I = {v ∈ V | s →G v} and T I = {v ∈ V | s 6→G v}. Clearly, I is a model of Dis. By the construction, the elements of the connected component of I containing s cannot be instances of T , while the remaining elements of I cannot be instances of F . Since q is connected, it follows that I 6|= q. We call a Boolean CQ q linear-directed if all of its variables can be arranged in a sequence v0 , . . . , vm such that all binary predicates in q are of the form R(vi , vi+1 ), for some i, 0 ≤ i < m. Theorem 4. Answering any OMQ Q = (Dis, q) with a linear-directed CQ q contain- ing both a solitary F and a solitary T is NL-hard. Proof. Suppose F (vk ) ∈ q, T (vk ) ∈ / q and F (vl ) ∈ / q, T (vl ) ∈ q, for some k, l with 0 ≤ k < l ≤ m. We rename the sequence vk , . . . , vl to x0 , . . . , xn . The proof proceeds by reduction to the reachability problem in directed graphs, which is known to be NL- complete; see, e.g., [1]. Given a directed graph G = (V, E) and vertices s, t ∈ V , we construct the ABox AG in the same way as in the proof of Theorem 3 treating x0 as x and xn as y. Again, we show that s →G t iff Dis, AG |= q. The implication (⇒) is established exactly as above. To prove (⇐), we assume that s 6→G t and consider the same model I as defined in the proof of Theorem 3. Taking account of linear-directedness of q, we immediately conclude that there is no homomorphism h : q → I with h(x0 ) ∈ V . It remains to show that there is no homomorphism h : q → I with h(x0 ) ∈ / V either. Suppose to the contrary that such a homomorphism exists. Then there exist B ∈ {F, T } and a homomorphism f : q → (AG2 ∪ {B(r)}), where G2 = ({s, r, t}, {(s, r), (r, t)}). We denote the points of AG2 between s and r by x0 , x1 , . . . , xn and those between r and t by xn , x01 , . . . , x0n . By comparing the lengths of appropriate segments of q, we obtain f (x0 ) = xi , for some i (0 < i < n). As F (x0 ) ∈ q, we must have F (xi ) ∈ q; see the picture below. As f (xi ) = x2i if 2i ≤ n, and f (xi ) = x02i mod n otherwise, we also have F (x2i mod n ) ∈ q; more generally, F (xki mod n ) ∈ q for all natural k. Now, since the equation of the form ‘iX = n mod n’ always has a solution, F (xn ) ∈ q, which is impossible if B = T . If B = F , we use a similar argument starting from T (xi ) ∈ q and show that T (xn ) ∈ q, which is again a contradiction. F f (x0 ) f (xi ) B ... x0i ... x0n x0 ... xi ... ... xn T F ... ... ... T x0 xi xj xn Theorems 1 and 4 give the following dichotomy for OMQs Q = (Dis, q) with linear-directed CQs q: – either q does not contain a solitary F or a solitary T , and answering Q is in AC0 , – or q contains both solitary F and T , and answering Q is NL-hard. We now complement the sufficient conditions of L- and NL-hardness obtained above with sufficient conditions of OMQ answering in L- and NL. A CQ q 0 (x, y) is symmetric if the CQs q 0 (x, y) and q 0 (y, x) are equivalent in the sense that q 0 (a, b) holds in A iff q 0 (b, a) holds in A, for any ABox A and a, b ∈ ind(A). Theorem 5. Let Q = (Dis, q) be any OMQ such that q = ∃x, y (F (x) ∧ q 01 (x) ∧ q 0 (x, y) ∧ q 02 (y) ∧ T (y)), for some connected CQs q 0 (x, y), q 01 (x) and q 02 (y) that do not contain solitary T and F , and q 0 (x, y) is symmetric. Then answering Q can be done in L. Proof. It is not hard to show that, for any ABox A, we have Dis, A |= q iff there exist v0 , v1 , . . . , vn ∈ ind(A), for some n ≥ 1, such that the following conditions hold: – F (v0 ), A(v1 ), . . . , A(vn−1 ), T (vn ) ∈ A; – A |= q 0 (vi , vi+1 ) for 0 ≤ i < n; – A |= q 01 (vi ) for 0 ≤ i < n; – A |= q 02 (vi ) for 1 ≤ i ≤ n. It remains to observe that checking these conditions reduces to checking VT –VF reach- ability in the undirected graph GA = (VA , EA ) defined below. The vertices in GA comprise the set VA = VT ∪ VA ∪ VF , where – VT = {v ∈ ind(A) | A |= T (v) ∧ q 02 (v)}; – VA = {v ∈ ind(A) | A |= A(v) ∧ q 01 (v) ∧ q 02 (v)}; – VF = {v ∈ ind(A) | A |= F (v) ∧ q 01 (v)}. The edges in GA comprise the set EA = ET A ∪ EAA ∪ EF A , where – Eall = {(x, y) | A |= q 0 (x, y)}; – ET A = {(x, y) ∈ Eall | (x ∈ VT ∧ y ∈ VA ) ∨ (y ∈ VT ∧ x ∈ VA )}; – EAA = {(x, y) ∈ Eall | x ∈ VA ∧ y ∈ VA }; – EF A = {(x, y) ∈ Eall | (x ∈ VF ∧ y ∈ VA ) ∨ (y ∈ VF ∧ x ∈ VA )}. It is readily seen that GA = (VA , EA ) is undirected in the sense that, for all of its vertices u and v, (u, v) ∈ EA iff (u, v) ∈ EA . If we do not require q 0 (x, y) to be symmetric, the complexity upper bound increases to NL: Theorem 6. Let Q = (Dis, q) be any OMQ such that q = ∃x, y (F (x) ∧ T (y) ∧ q 0 (x, y)), for some connected CQ q 0 (x, y) without solitary occurrences of F and T . Then answer- ing Q can be done in NL. Proof. We claim that the datalog query (Π, G) with the following linear datalog pro- gram Π, where q̃ 0 is the result of omitting all the ∃ from q 0 : G ← F (x) ∧ q̃ 0 (x, y) ∧ P (y) P (x) ← T (x) P (x) ← A(x) ∧ q̃ 0 (x, y) ∧ P (y) is a datalog rewriting of Q. Indeed, if Π, A |= G then there are v0 , v1 , . . . , vn ∈ ind(A) such that F (v0 ), A(v1 ), . . . , A(vn−1 ), T (vn ) ∈ A and q 0 (vi , vi+1 ) holds in A, for 0 ≤ i < n. Clearly, in any model I of Dis and A there is i with I |= F (vi ) ∧ T (vi+1 ). It follows that Dis, A |= q. Conversely, suppose Π, A 6|= G. Let VP = {v ∈ ind(A) | Π, A |= P (v)}. Define a model I of Dis with domain ind(A) by setting T I = {v | T (v) ∈ A} ∪ {v ∈ VP | A(v) ∈ A}, F I = F A ∪ {v ∈ / VP | A(v) ∈ A}. We claim that I 6|= q. Indeed, otherwise there is a homomorphism h : q → I. As h(y) ∈ T I , we have Π, A |= P (h(y)). As h(x) ∈ F I , we have either F (h(x)) ∈ A or A(h(x)) ∈ A, contrary to Π, A 6|= G. The sufficient conditions of Theorems 5 and 6 only apply to CQs with exactly one solitary occurrence of F and exactly one solitary occurrence of T . What happens if we allow more than one solitary occurrences of F or T ? 5 P The following result is a consequence of [8, Theorem 27]: Theorem 7. Let Q = (Dis, q) be any OMQ such that q = ∃x, y1 , . . . , yn (F (x) ∧ T (y1 ) ∧ · · · ∧ T (yn ) ∧ q 0 (x, y1 , . . . , yn )), for some connected CQ q 0 (x, y1 , . . . , yn ) without solitary occurrences of T and F . Then answering Q can be done in P. Indeed, for any ABox A, we have Dis, A |= q iff Π, A |= G, where Π is the following datalog program and q̃ 0 is the result of omitting all the ∃ from q 0 : G ← F (x) ∧ q̃ 0 (x, y1 , . . . , yn ) ∧ P (y1 ) ∧ · · · ∧ P (yn ) P (x) ← T (x) P (x) ← A(x) ∧ q̃ 0 (x, y1 , . . . , yn ) ∧ P (y1 ) ∧ · · · ∧ P (yn ). Is the P-upper bound of Theorem 7 optimal? The following example gives a typical OMQ in the scope of that theorem answering which is P-hard. Example 1. We show that Q = (Dis, q) is P-hard for q shown in the picture below. T T F S R The proof is by reduction of the alternating monotone circuit evaluation problem, which is known to be P-complete [9]. An example of an alternating monotone circuit is shown in the picture below. Given such a circuit C and an input α, we define an ABox Aα C as the set of the following atoms: – R(g, h), if a gate g is an input of a gate h; – S(g, h), if g and h are distinct inputs of some AND-gate; – S(g, g), if g is an input gate or a non-output AND-gate; – T (g), if g is an input gate with 1 under α; – F (g), for the only output gate g; – A(g), for those g that are neither inputs nor the output. To illustrate, the picture below shows an alternating monotone circuit C, an input α for it, and the ABox Aα C , where the solid arrows represent R and the dashed ones S: F AND OR OR A A AND AND AND A A A OR OR OR OR A A A A 1 0 1 0 0 0 T T One can show that C(α) = 1 iff Dis, Aα C |= q. Curiously, by changing S to R in the CQ from Example 1, we obtain an OMQ that is NL-complete as follows from Theorem 8 below. 6 NL vs. P Theorem 8. Answering any OMQ Q = (Dis, q n ) with n−1 ^  q n = ∃x1 , . . . , xn , y T (xi ) ∧ R(xi , xi+1 ) ∧ T (xn ) ∧ R(xn , y) ∧ F (y), i=1 for n ≥ 1, is NL-complete. Proof. The lower bound follows from Theorem 4. The proof of the upper one is by reduction to directed reachability. We split q n into two CQs: n−1 ^ q 0n = ∃x1 , . . . , xn  T (xi ) ∧ R(xi , xi+1 ) ∧ T (xn ), i=1  q = ∃x, y T (x) ∧ R(x, y) ∧ F (y) . One can show that, for any ABox A, we have Dis, A |= q n iff there exist a homomor- phism f : q 0n → A and a directed R-path f (xn ), v0 , v1 , . . . , vm ∈ ind(A) such that A(vi ) ∈ A, for i = 1, . . . , m − 1, and F (vm ) ∈ A. Clearly, this criterion reduces to directed reachability. To further illustrate how minor modifications to the structure of CQs can send them to different complexity classes, we collect in Table 1 a number of CQs in the scope of Theorem 7, some of which turn out to be NL-complete, while others are P-complete. (All the omitted labels on the arrows in Table 1 are assumed to be R, −/A means either blank or A, and F T /A means either F T or A). Here, we only sketch the proof of P-hardness for the OMQ (Dis, q), where q is T F T R R The proof is by reduction of the monotone circuit evaluation problem. Given a mono- tone circuit C and an input α, we define an ABox Aα C as the following labelled directed graph, all of whose edges are labelled with R. For each gate g of C except the inputs and output, the graph contains two vertices g and g 0 labelled with A; the output gate g gives rise to only one vertex g labelled with F , while each input gate g to only one vertex g 0 labelled according to α. For an OR-gate g = h1 ∨ h2 , we have the directed edges (h01 , g), (h02 , g), (g, rg ), where rg is a new vertex labelled with T . For an AND- gate g = h1 ∧ h2 , we have the edges (h01 , g), (g, h02 ). Also, for each gate g, we have the edges (g, g 0 ), (g 0 , tg ), where tg is a new vertex labelled with T . An example illustrating the construction is given below. One can show that C(α) = 1 iff Dis, Aα C |= q. T tg1 F g3 T tg2 g3 AND g10 A A g20 rg2 T g1 AND OR g2 g1 A A g2 1 1 0 T T F The membership in NL for the CQs in the left column of Table 1 can be shown by constructing appropriate linear datalog programs. For example, answering the OMQ NL-complete P-complete T T F T T F S R T F T T T T F T T −/A F T −/A T F T T FT F T T FT F T T FT FT F T −/A T FT FT F T T −/A FT F T T T −/A FT F T T −/A −/A FT F T T T −/A FT FT F T T F T T F T T FT F T F T /A T F T −/A T F T A T F T F T T F T Table 1. NL- and P-complete OMQs in the scope of Theorem 7. with the last CQ of the left column can be done by the following linear program: P (x) ← R(x, y), T (y), R(x, z), R(z, v), T (v) P (x) ← R(x, y), T (y), R(x, z), R(z, v), P (v), A(v) P (x) ← R(x, y), P (y), A(y) G ← P (x), F (x) Note that the classification problem we deal with in this section can be regarded as an instance of a more general problem of classifying datalog programs in terms of their data complexity, in particular, finding an NL/P dichotomy. 7 CO NP On the other hand, a minor extension of the CQ from Example 1 can lead to CO NP- completeness. First we show that answering the OMQ Q = (Dis, q) with the Boolean CQ q given in the picture below is CO NP-complete. T T F F R S Q Consider the ABoxes AN constructed according to the pattern shown below for N = 3: Q R a3 S c2 b3 S T A F b2 c3 F A R Q a2 T T a0 Q R A F c1 b0 F A T S b1 c0 S a1 R Q Let V = {a0 , . . . , aN }. It is not hard to see that (i) for any interpretation I based on AN , if I 6|= q then either V ⊆ T I or V ⊆ F I ; (ii) the interpretations I and I 0 0 obtained by extending AN with T I = T A ∪ V and F I = F A ∪ V , respectively, are both models of Dis that do not satisfy q. Given a 2+2-CNF φ with clauses D1 , . . . , DN and variables p1 , . . . , pM , we take M disjoint copies of AN , distinguishing between them by the superscripts 1, . . . , M . For example, a23 is the a3 -point of the second copy of AN and V 2 = {a20 , . . . , a2N }. For each Dn of the form ¬pi ∨ ¬pj ∨ pk ∨ pl , we add to those copies the atoms R(ain , ajn ), S(ajn , akn ) and Q(akn , aln ), and denote the resulting ABox by Aφ . A ain−1 A ajn−1 A akn−1 A aln−1 R Q A ain A ajn A akn A aln S pi pj pk pl We show that φ is satisfiable iff Dis, Aφ 6|= q. Let q 0 = R(x, y) ∧ S(y, z) ∧ Q(z, w). Observe that any possible match of q 0 in Aφ falls into one of the two groups: (A) (ain , bin , cin , ain+1 ), for 0 ≤ n ≤ N , 1 ≤ i ≤ M and addition modulo N + 1; (B) (ain , ajn , akn , aln ), for some clause Dn = (¬pi ∨ ¬pj ∨ pk ∨ pl ) in φ. Suppose φ is satisfiable under S an assignment a. We define a S model I of Dis by extending Aφ with T I = T Aφ ∪ {V i | a(pi ) = 1}, F I = F Aφ ∪ {V i | a(pi ) = 0}. We claim that I 6|= q. Indeed, the tuples in (A) cannot yield a match by (ii) above, while the tuples in (B) do not give a match since a(Dn ) = 1, for all n ≤ N . To see this, suppose a tuple (ain , ajn , akn , aln ) from (B) is a match for q in I. Then {ain , ajn } ⊆ T I and {akn , aln } ⊆ F I , from which a(pi ) = 1, a(pj ) = 1, a(pk ) = 0 and a(pl ) = 0, and so the clause Dn = ¬pi ∨ ¬pj ∨ pk ∨ pl is false under a. Conversely, suppose Dis, Aφ 6|= q. Then there is a model I of Dis based on Aφ such that I 6|= q. By (i) above applied to the copies of AN , for every i ≤ M , we have either Vi ⊆ T I or Vi ⊆ F I . In the former case, we set a(pi ) = 1; in the latter one, we set a(pi ) = 0. We claim that φ is satisfiable under a. Indeed, if Dn = ¬pi ∨¬pj ∨pk ∨pl is false under a, then a(pi ) = 1, a(pj ) = 1, a(pk ) = 0 and a(pl ) = 0, and so the tuple (ain , ajn , akn , aln ) would be a match for q in I. The proposed method is generic in the sense that we can try to apply it to any ‘sufficiently asymmetric’ CQ q with two T -atoms and two F -atoms: we use a T -F fragment of q for copying the values of the Boolean variables, and the whole q for encoding the clauses of a 2 + 2-CNF. However, this method does not work for the CQ T T F F q0 R R R which requires a somewhat different technique. We show CO NP-hardness of (Dis, q 0 ) by reduction of 3SAT. Given a 3CNF ψ, we define an ABox Aψ as follows. First, for every variable p in ψ, we construct a ‘gadget’ shown in the picture below, where the number of A-nodes above each of the circles matches the number of clauses in ψ; we refer to these nodes as p-nodes and, respectively, ¬p-nodes (below the circles, there are 2 p- and 2 ¬p-nodes): A A ... ... A A A A F T F T p ¬p T F T F A A A A T T F F A A Observe that, for any model I of Dis and the constructed gadget for p, if I 6|= q then either (i) the p-nodes are all in F I and the ¬p-nodes are all in T I , or (ii) the p-nodes are all in T I and the ¬p-nodes are all in F I . Now, for every clause c = (l1 ∨ l2 ∨ l3 ) in ψ, we add to the constructed gadgets the atoms T (c), R(c, ac¬l1 ), R(ac¬l1 , acl2 ), R(acl2 , acl3 ), where c is a new individual, ac¬l1 a fresh ¬l1 -node, acl2 a fresh l2 -node, and acl3 a fresh l3 -node. For example, for the clause c = (p ∨ q ∨ r), we obtain the fragment below. The resulting ABox is denoted by Aψ . T A A A F A T F A T F A T ¬p q r T F T F T F A A A One can show that ψ is satisfiable iff Dis, Aψ 6|= q 0 . Acknowledgements. The work of O. Gerasimova and M. Zakharyaschev was carried out at the National Research University Higher School of Economics and supported by the Russian Science Foundation under grant 17-11-01294; the work of V. Podol- skii was supported by the Russian Academic Excellence Project ‘5-100’ and by grant MK-7312.2016.1. Thanks are due to Frank Wolter and Carsten Lutz for comments, suggestions and discussions. References 1. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge Univer- sity Press, New York, NY, USA, 1st edn. (2009) 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and re- lations. Journal of Artificial Intelligence Research (JAIR) 36, 1–69 (2009) 3. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, csp, and MMSNP. ACM Transactions on Database Systems 39(4), 33:1– 44 (2014) 4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: the DL-Lite family. Journal of Automated Reasoning 39(3), 385–429 (2007) 5. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, MMSNP, and expressive description logics. CoRR abs/1701.02231 (2017), http://arxiv.org/ abs/1701.02231 6. Gottlob, G., Papadimitriou, C.H.: On the complexity of single-rule datalog queries. Inf. Com- put. 183(1), 104–122 (2003), http://dx.doi.org/10.1016/S0890-5401(03) 00012-9 7. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: Calvanese, D., Konev, B. (eds.) Proceedings of the 28th International Workshop on Description Logics, Athens,Greece, June 7-10, 2015. CEUR Workshop Proceedings, vol. 1350. CEUR-WS.org (2015), http://ceur-ws.org/Vol-1350/paper-24.pdf 8. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs and non-Horn ontologies. Artif. Intell. 236, 90–118 (2016), http://dx.doi.org/10. 1016/j.artint.2016.03.006 9. Papadimitriou, C.: Computational Complexity. Addison-Wesley (1994) 10. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. Journal on Data Semantics X, 133–173 (2008)