Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering O. Gerasimova1 , S. Kikot2 , and M. Zakharyaschev3 1 National Research University Higher School of Economics, Moscow, Russia 2 University of Oxford, U.K. 3 Birkbeck, University of London, U.K. Abstract. We prove a number of new syntactic and semantic sufficient and nec- essary conditions for ontology-mediated queries (OMQs) with one covering ax- iom to be in the classes AC0 and NL for data complexity, and to be L-, NL- or P-hard. We also give two new examples of very simple CO NP-complete OMQs. 1 Introduction The problem we are concerned with in this paper originated from the observation that the NPD FactPages ontology4, used for testing ontology-based data access (OBDA) in industry [13, 12], contained covering axioms of the form A v B1 t · · · t Bn . It has been known since Schaerf’s paper [18] that answering ontology-mediated queries (OMQs) with a covering axiom can be CO NP-hard for data complexity, and so such OMQs are not suitable for OBDA systems in general. In fact, two interesting OMQs with covering can be extracted from [18]: Q1 = (Cov > , q 1 ) and Q2 = (Cov > , q 2 ), where Cov > = {> v F t T } and the Boolean conjunctive queries (CQs) q 1 and q 2 are shown below. F F N1 R N2 F T T T q1 q2 S R P1 P2 Schaerf pointed out that answering these queries involves case analysis and showed that Q2 is CO NP-hard for data complexity. It is readily seen that Q1 is NL-complete. The problem we started attacking in [9] was to find a complete syntactic or semantic clas- sification of OMQs with covering according to their data complexity. The direct attack failed, and connections with other hard classification problems such as boundedness or linearisability of datalog programs (to be discussed in Section 3) indicate that a longer siege is needed. In this paper, we report on our recent findings. We consider four ontologies with covering: Cov > as above, Cov A = {A v F t T }, Cov ⊥ ⊥ > = {> v F t T, F u T v ⊥}, Cov A = {A v F t T, F u T v ⊥}, and we classify connected Boolean CQs q by the number of solitary occurrences of F and T in them, where a solitary F is any F (x) ∈ q with T (x) ∈ / q, and symmetrically for T . We prove a number of new syntactic and semantic sufficient and necessary conditions for 4 http://sws.ifi.uio.no/project/npd-v2/ such OMQs with at most one solitary F to be in the complexity classes AC0 and NL, and to be L-, NL- or P-hard. We also give two new examples of very simple CO NP- complete OMQs. 2 Preliminaries In this paper, a Boolean conjunctive query (CQ) is any first-order (FO) sentence of the form q = ∃x ϕ(x), where ϕ is a conjunction of unary or binary atoms P (y) with y ⊆ x. We often regard CQs as sets of their atoms, depict them as labelled digraphs, and assume that all of our CQs are connected. By answering an ontology-mediated query (OMQ) Q = (T , q) with a TBox T of the form defined above, we understand the problem of checking, given an ABox A (a finite set of unary or binary ground atoms), whether q holds in every model of T ∪ A, or T , A |= q in symbols. For every Q, this problem is clearly in CO NP. It is in the complexity class AC0 if there is an FO-sentence q 0 , called an FO-rewriting of Q, such that T , A |= q iff A |= q 0 , for any ABox A. A datalog program, Π, is a finite set of rules ∀x (γ0 ← γ1 ∧ · · · ∧ γm ), where each γi is an atom Q(y) with y ⊆ x. (As usual, we omit ∀x.) The atom γ0 is the head of the rule, and γ1 , . . . , γm its body. All the variables in the head must occur in the body. The predicates in the head of rules are IDB predicates, the rest EDB predicates [1]. A datalog query is a pair (Π, G), where Π is a datalog program and G an 0-ary atom, the goal. The answer to (Π, G) over an ABox A is ‘yes’ if G holds in the FO- structure obtained by closing A under Π, in which case we write Π, A |= G. A datalog query (Π, G) is a datalog rewriting of an OMQ Q = (T , q) in case T , A |= q iff Π, A |= G, for any ABox A. The answering problem for (Π, G)—i.e., checking, given an ABox A, whether Π, A |= G—is clearly in P. Answering a datalog query with a linear program, whose rules have at most one IDB predicate in the body, can be done in NL. The NL upper bound also holds for datalog queries with a linear-stratified program defined as follows. A stratified program [1] is a sequence Π = (Π0 , . . . , Πn ) of datalog programs, called the strata of Π, such that each predicate in Π can occur in the head of a rule only in one stratum Πi and can occur in the body of a rule only in strata Πj with j ≥ i. If, additionally, the body of each rule in Π contains at most one occurrence of a head predicate from the same stratum, Π is called linear-stratified. Every linear- stratified program can be converted to an equivalent linear datalog program [2]. 3 Connections in High Places We begin by putting our little problem into the context of more general investigations of (i) boundedness and linearisability of datalog programs and (ii) the data complexity of answering OMQs with expressive ontologies. The decision problem whether a given datalog program is bounded has been a hot research topic in database theory since the late 1980s. Thus, it was shown that bounded- ness is undecidable already for linear datalog programs with binary IDB predicates [20] and single rule programs (aka sirups) [15]. On the other hand, deciding boundedness is 2E XP T IME-complete for monadic datalog programs [7, 4] and PS PACE-complete for linear monadic programs [7]; for linear sirups, it is even NP-complete [20]. The last two results are relevant for deciding FO-rewritability of OMQs (Cov A , q), where q has a single solitary F and is called a 1-CQ. Indeed, suppose that F (x) and T (y1 ), . . . , T (yn ) are all the solitary occurrences of F and T in q. Let Πq be a monadic datalog program with three rules G ← F (x), q 0 , P (y1 ), . . . , P (yn ), (1) P (x) ← T (x), (2) 0 P (x) ← A(x), q , P (y1 ), . . . , P (yn ), (3) where q 0 = q \ {F (x), T (y1 ), . . . , T (yn )} and P is a fresh predicate symbol that never occurs in our ABoxes. Then, for any ABox A, we have Cov A , A |= q iff Πq , A |= G. Thus, FO-rewritability of (Cov A , q) is clearly related to boundedness of the sirup (3). The problem of linearising datalog programs, that is, transforming them into equiv- alent linear datalog programs, which are known to be in NL for data complexity, has also attracted much attention [16, 17, 21, 2] after the Ullman and van Gelder pioneering paper [19]. The success of the OBDA paradigm spurred considerable interest in the data com- plexity of answering individual OMQs with expressive ontologies. Thus, by establish- ing a remarkable connection to CSPs, it was shown in [5] that deciding FO-rewritability and datalog rewritability of OMQs with a SHIU ontology and a (Boolean) atomic query is NE XP T IME-complete. This result is obviously applicable to (Cov A , q) with a tree-shaped q. In [8], it was shown that checking rewritability of Boolean monadic disjunctive datalog programs into FO and monadic datalog can be done in, respectively, 2NE XP T IME and 3E XP T IME, which is applicable to all of our OMQs (Cov A , q). An AC0 /NL/P trichotomy for the data complexity of answering OMQs with an EL ontology and atomic query, which can be checked in E XP T IME, was established in [14]. This result is applicable to OMQs (Cov A , q), in which q is an F -tree having a single solitary F (x) such that the binary atoms in q form a ditree with root x. Indeed, denote by TQ the EL TBox with concept inclusions F u Cq v G0 , T v P and A u Cq v P , where Cq is an EL-concept representing q \ {F (x)} with P for T (so for q of the form F FT T x R1 y1 R2 y2 R3 y3 Cq = ∃R1 .(F u P u ∃R2 .∃R3 .P )). Then, for any ABox A that does not contain G0 , we have ΠQ , A |= G iff TQ , A |= ∃x G0 (x). Yet, despite all of these efforts and results (implying, in view of the recent positive solution to the Feder-Vardi conjecture [6, 22], that there is a P/CO NP dichotomy for OMQs with a SHIU ontology and a (Boolean) atomic query, which is decidable in NE XP T IME), we are still lacking simple and transparent, in particular syntactic, con- ditions guaranteeing this or that data complexity or type of rewritability. Some results in this direction were obtained in [10, 11]. That a transparent classification of monadic sirups according to their data complexity has not been found so far and the close con- nection to CSPs indicate that this problem is extremely hard in general. Our aim below is to demonstrate that, for the OMQs with covering considered in this paper, syntactic conditions do exist but are difficult to find. 4 0-CQs As observed in [9], if an OMQ Q (of the form defined above) is in AC0 , then q is an FO-rewriting of Q. By a 0-CQ we mean any CQ that does not contain a solitary F . A twin in a CQ q is any pair F (x), T (x) ∈ q. Here is an encouraging syntactic criterion of FO-rewritability for OMQs of the form (Cov ⊥A , q): Theorem 1. (i) If q is a 0-CQ, then answering both (Cov ⊥ A , q) and (Cov A , q) is in AC0 , with q being an FO-rewriting of these OMQs. (ii) If q is not a 0-CQ and does not contain twins, then answering both (Cov ⊥ > , q) and (Cov > , q) is L-hard. Proof. (i) follows from [9, Theorem 1]. (ii) The proof is by reduction to the reachabil- ity problem for undirected graphs, which is known to be L-complete; see, e.g., [3]. Denote by q 0 the CQ obtained from q by gluing together all the variables x with F (x) ∈ q and also all the variables y with T (y) ∈ q. Thus, q 0 contains a single F - atom, F (x), and a single T -atom, T (y). Clearly, there is a homomorphism h : q → q 0 . Let q 00 = q 0 \ {F (x), T (y)}. Suppose we are given an undirected graph G = (V, E) and two vertices s, t ∈ V . We regard G as a directed graph such that (u, v) ∈ E iff (v, u) ∈ E, for any u, v ∈ V . Now, we encode G by means of an ABox AG that is obtained from G as follows. For every edge e = (u, v) ∈ E, let q 00e be the set of atoms in q 00 with x renamed to u, y to v and all other variables z to ze . Then AG comprises all such q 00e , for e ∈ E, as well as F (s) and T (t). We show that s →G t iff Cov > , AG |= q. Suppose that s →G t, i.e., there exists a path s = v0 , . . . , vn = t in G with ei = (vi , vi+1 ) ∈ E, for i < n. Consider an arbitrary model I of Cov > and AG . Since I |= Cov > , and F (s) and T (t) are in AG , we can find some i < n such that I |= F (vi ) and I |= T (vi+1 ). As q 00ei is an isomorphic copy of q 00 , we obtain I |= q 0 , and so I |= q. Conversely, suppose s 6→G t. Define an interpretation I, extending the ABox AG , by setting F I to be the set of objects in AG that are reachable from s and T I its complement. Clearly, I is a model of Cov > . By the construction, the elements of the connected component of I containing s cannot be instances of T , while the remaining elements of I are not be instances of F . Since q is connected, it follows that I 6|= q. 0 Corollary 1. An OMQ (Cov ⊥ A , q) is in AC iff q is a 0-CQ, which can be decided in linear time. If twins can occur in CQs (that is, F and T are not necessarily disjoint), the pic- ture becomes more complex. On one hand, we have the following criterion for OMQs (Cov A , q) with a path CQ q whose variables x0 , . . . , xn in q are ordered so that the binary atoms in q form a chain R1 (x0 , x1 ), . . . , Rn (xn−1 , xn ). Theorem 2 ([9]). An OMQ (Cov A , q) with a path CQ q is in AC0 iff q is a 0-CQ. If q contains both solitary F and T , then (Cov A , q) is NL-hard. On the other hand, this AC0 /NL criterion collapses for path CQs with loops: Proposition 1. The OMQ (Cov A , q), where q is shown below, is in AC0 . FT T F FT R R R S S S Proof. Follows from the sufficient condition of Theorem 4 (i). Note that the CQ q above is minimal (not equivalent to any of its proper sub-CQs). Note also that, if a minimal 1-CQ q contains both a solitary F and a solitary T , then FO-rewritability of (Cov A , q) implies that q contains at least one twin (F T ) and at least one y with T (y) ∈/ q and F (y) ∈ / q (which can be shown using Theorem 4 (i)). 5 1-CQs 1-CQs have exactly one solitary F . As observed in Section 3, we have the following: Theorem 3. (i) Answering any OMQ (Cov A , q) with a 1-CQ q can be done in P. (ii) Answering any OMQ (Cov A , q) with an F -tree q is either in AC0 or NL- complete or P-complete. The trichotomy can be decided in E XP T IME. Theorem 3 (ii) was proved by a reduction to the AC0 /NL/P-trichotomy of [14]. It is to be noted, however, that applying the algorithm from [14] in our case is tricky because the input ontology TQ must first be converted to a normal form. As a result, we do not obtain transparent syntactic criteria on the shape of q that would guarantee that the OMQ (Cov A , q) belongs to the desired complexity class (see examples below). We now give a semantic sufficient condition for an OMQ with a 1-CQ to lie in NL. This condition uses ideas and constructions from [7, 14]. Let Q = (Cov A , q) be an OMQ with a 1-CQ q having a solitary F (x). Define by induction a class KQ of ABoxes that will be called cactuses for Q. We start by setting KQ := {q}, regarding q as an ABox, and then recursively apply to KQ the following two rules: (bud) if T (y) ∈ A ∈ KQ with solitary T (y), then we add to KQ the ABox obtained by replacing T (y) in A with (q \ F (x)) ∪ {A(x)}, in which x is renamed to y and all of the other variables are given fresh names; (prune) if Cov A , A0 |= Q, where A0 = A \ {T (y)} and T (y) is solitary, we add to KQ the ABox obtained by removing T (y) from A ∈ KQ . It is readily seen that, for any ABox A0 , we have Cov A , A0 |= Q iff there exist A ∈ KQ and a homomorphism h : A → A0 . Denote by K†Q the set of minimal cactuses in KQ (that have no proper sub-cactuses in KQ ). For a cactus C ∈ KQ , we refer to the copies of (maximal subsets of) q that comprise C as segments. The skeleton C s of C is the ditree whose nodes are the segments s of C and edges (s, s0 ) mean that s0 was attached to s by budding. The atoms T (y) ∈ s are called the buds of s. The rank r(s) of s is defined by induction: if s is a leaf, then r(s) = 0; for non-leaf s, we compute the maximal rank m of its children and then set ( m + 1, if s has ≥ 2 children of rank m; r(s) = m, otherwise. The width of C and C s is the rank of the root in C s . We say that K†Q is of width k if it contains a cactus of width k but no cactus of greater width. The depth of C and C s is the number of edges in the longest branch in C s . We illustrate the definition by an example. Denote by q T nT , for n ≥ 0, the 1-CQ shown below, where all the binary predicates are R and the n variables without labels do not occur in F - or T -atoms: F T T ... n Example 1. Let Q = (Cov > , q T 1T ). In the picture below, we show a cactus C obtained by applying (bud) twice to q T 1T (with A = > omitted): F T z T T T One can check that Cov > , C \ {T (z)} |= q T 1T , and so an application of (prune) will remove T (z) from C. Using this observation, one can show that K†Q is of width 1. On the other hand, if Q = (Cov A , q T 1T ) then K†Q is of unbounded width as follows from Theorem 6 below. Theorem 4. Let Q = (Cov A , q) be an OMQ with a 1-CQ q. Then (i) Q is in AC0 iff for every C ∈ K†Q , there is a homomorphism h : q → C; (ii) Q is rewritable in linear datalog, and so is in NL, if K†Q is of bounded width. Proof. The proof of (i) is straightforward. To prove (ii), we represent the ABoxes in KQ as words in a tree alphabet and construct a non-deterministic finite tree automaton AQ such that K†Q ⊆ L(AQ ) ⊆ KQ . Then we show that, for K†Q of finite width, the automaton AQ can be transformed into a monadic linear-stratified datalog rewriting of Q. It can further be converted into a linear datalog rewriting at the expense of increasing the arity if IDBs in the program [2]. It is worth noting that, for Q = (Cov > , q) with q from Proposition 1, K†Q consists of q and the cactus of depth 1, in which the only solitary T is removed by (prune). Clearly, there is a homomorphism from q into this cactus, and so Q is FO-rewritable. However, for the 1-CQ q in the picture below (where all edges are bidirectional), (Cov > , q) is not FO-rewritable, but there is a homomorphism from q to both cactuses of depth 1. We do not know whether, in general, there is an upper bound Nq such that the existence of homomorphisms h : q → C, for all C ∈ KQ of depth Nq , would ensure FO-rewritability of (Cov A , q). For 1-CQs q with a single solitary T , one can take Nq = |q| + 1. Neither do we know the exact complexity of deciding FO-rewritability of OMQs with 1-CQs. As mentioned in Section 3, this problem is reducible to the boundedness problem for monadic datalog programs, which is known to be in 2E XP T IME. S R, Q R, Q T S S R Q R FT Q FT R S F FT FT Q T S S S R Q R S R, Q R, Q Theorem 4 (ii) allows us to obtain a sufficient condition for linear-datalog rewritabil- ity of OMQs (Cov A , q) with an F -path CQ q, that is, a path CQ with a single solitary F at its root. We represent such a q as shown in the picture below, which indicates all the solitary occurrences of F and T : F T T T ... ... ... ... q = x y1 yi ym ym+1 We require the following sub-CQs of q: – q i is the suffix of q that starts at yi , but without T (yi ), for 1 ≤ i ≤ m; – q ∗i is the prefix of q that ends at yi , but without F (x) and T (yi ), for 1 ≤ i ≤ m; – q ∗m+1 is q without F (x), and write fi : q i  q if fi is a homomorphism from q i into q with fi (yi ) = x. Theorem 5. If for each 1 ≤ i ≤ m there exist fi : q i  q, then (Cov A , q) is rewritable into a linear datalog program, and so is NL-complete. For F -path CQs q without twins, we extend Theorem 5 to a NL/P dichotomy (pro- vided that NL 6= P). Given such a CQ q, we denote by Nq the set of the numbers indicating the length of the path from x to each of the yi , i = 1, . . . , m + 1. Theorem 6. Let Q = (Cov A , q) be an OMQ where q is an F -path CQ without twins having a single binary relation. The following are equivalent unless NL = P: (i) Q is NL-complete; (ii) {0} ∪ Nq is an arithmetic progression; (iii) there exist fi : q i  q for every i = 1, . . . , m. If these conditions do not hold, then Q is P-complete. Proof. It is readily seen, using Theorem 5, that (ii) ⇒ (iii) ⇒ (i). We prove the implication (i) ⇒ (ii). Suppose {0} ∪ Nq is not an arithmetic progression. We show that Q is P-hard by reduction of the monotone circuit evaluation problem. Let X be the sub-CQ of q between F (x) and the first T but without the F and T atoms, and let |X| be the number of atoms in X. Let Y be the first sub-CQ between neighbouring T -atoms without these T -atoms such that |Y | = 6 |X|, n the number of X-sub-CQs before Y , and let Z be the rest of the query without the first T atom; see the picture below where n = 2. F X T X T Y T Z x0 x1 x2 x3 x4 We distinguish between two cases: |Y | > |X| and |Y | < |X|. Depending on the case, we use the following gadgets for ∧-gates, which are shown below for n = 2 (it should be clear how to modify the construction for n > 2); the ∨-gate gadget is the same for both cases. c ∧-gate for |Y | > |X| A A T X ∨-gate Z Y X A ∧-gate for |X| > |Y | b0 a0 c X X c X X T T A T T X X X X X T T T T T Y Y X A Y Z Y Z Z Z Y Z A A A A A b a b a a b Given any monotone Boolean circuit C and its input α, we construct an ABox AC,α by replacing ∧- and ∨-gates with the inputs a and b and output c by the gadgets above, placing T on the input gates evaluated to 1 under α and F on the output gate. We claim that Cov A , AC,α |= q iff C(α) = 1. (⇐) It is not hard to show by induction that whenever a non-input gate g in C outputs 1 under α, then the sub-ABox of AC,α generated by g and with F placed on g validates Q. For example, suppose T (a), T (b) and F (c) hold in the left-hand side gadget. Then either F (b0 ) or T (b0 ). In the former case, q is satisfied; so assume T (b0 ). We have either F (a0 ) or T (a0 ). Now, in either case, q is satisfied, as required. (⇒) Suppose C(α) = 0. We construct a model I of Cov A based on AC,α by placing T or F to g and g 0 (if available) depending on the value of g in C under α. It can be checked that I 6|= q. In particular, if |Y | > |X|, c is the conjunction of b and a, c = b = 0 and a = 1, then if x0 goes to c, x1 goes to a0 , then x2 must go to the point below a0 , and so x3 must go somewhere between a and the point above a, which is impossible. If a = 0, b = 1 and x0 goes to a0 and x1 goes to the central T , then x2 must go somewhere between the central T and b0 , which is also impossible. If |Y | < |X|, c = b ∧ a, b = d ∨ e, a = c = 0, b = 1, then if x0 goes to c, x2 goes to b, then x3 cannot go to a as a = 0, and so it must go into some X-segment of the gate b = d ∨ e going out of b, which is again impossible. Note that the proof of P-hardness in Theorem 6 does not go through for A = >. Thus, for (Cov > , q T 1T ), we are in the framework of Example 1 and, by Theorem 4 (ii), this OMQ is in NL. In fact, we have the following NL/P dichotomy for the OMQs of the form (Cov > , q T nT ). Theorem 7. (i) The OMQ (Cov > , q T 1T ) is NL-complete (whereas (Cov A , q T 1T ) is P-complete). (ii) The OMQs (Cov > , q T nT ) (and (Cov A , q T nT )), for n ≥ 2, are P-complete. Proof. The proof is similar to that of P-hardness in Theorem 6. We now apply Theorem 4 (ii) to the class of T F -path CQs of the form T F T T ... ... ... ... qT F = y x y1 ym ym+1 0 where the T (yi ) and F (x) are all the solitary occurrences of T and F in q T F . We represent this CQ as q T F = {T (y0 )} ∪ q 0 ∪ q, where q 0 is the sub-CQ of q T F between y0 and x with T (y0 ) removed and q is the same as in Theorem 5 (and q ∗m+1 is q without F (x)). Theorem 8. If q satisfies the condition of Theorem 5 and there is a homomorphism h : q ∗m+1 → q 0 such that h(x) = y0 , then answering (Cov A , q T F ) is NL-complete. For example, the OMQ (Cov A , q) with q shown below is NL-complete: T FT F T On the other hand, as shown in [9], (Cov A , q) with q of the form T F T is P-complete; in fact, it follows from the proof that (Cov > , q) is P-complete, too. 6 2-CQs A 2-CQ has at least two solitary F and at least two solitary T . For example, as shown in [9], answering (Cov A , q) with the following CQ q is CO NP-complete: T T F F The proof can be generalised to the class of 2-2-CQs, which are path 2-CQs where all the F are located after all the T , and every occurrence of T or F is solitary. We represent any given 2-2-CQ q as shown below T T F F p x r y s z u w v where p, r, u and v do not contain F and T , while s may contain solitary occurrences of both T and F (in other words, the T shown in the picture are the first two occurrences of T in q and the F are the last two occurrences of F in q). Denote by q r the suffix of q that starts from x but without T (x); similarly, q u is the suffix of q starting from z but without F (z). Denote by q − r the prefix of q that ends at y but without T (y); similarly, q− u is the prefix of q ending at w but without F (w). Using the construction from [9], one can show the following: Theorem 9. Any OMQ (Cov A , q) with a 2-2-CQ q is CO NP-complete provided the following conditions are satisfied: (i) there is no homomorphism h1 : q u → q r with h1 (z) = x, and (ii) there is no homomorphism h2 : q − − r → q u with h2 (y) = w. Unfortunately, the proof does not generalise to the other two path 2-CQs with four variables, whose complexity has so far been open: T F T F q1 T F F T q2 Theorem 10. The OMQs Q1 = (Cov A , q 1 ) and Q2 = (Cov A , q 2 ) are CO NP-complete. Proof. We prove CO NP-hardness of Q1 by reduction of the NP-complete 3SAT. Given a 3CNF ψ, we construct an ABox Aψ as follows. First, for every variable p occurring in ψ, we take the following p-gadget, where n is the number of clauses in ψ: F F F F F F F F F F T T T T T F T T T T T a1 ... an b1 ... bn A A A A A A A A A A A A T F F T p ¬p F T T F A A A A A A A A A A A A ... ... c1 cn d1 dn F F F F F T F F F F F T T T T T T T T T T The key property of the p-gadget is that, for any model I of Cov A based on this gadget, if I 6|= q 1 then either the A-points on left-hand side of the gadget are all in T I and the A-points on the right-hand side are all in F I , or the other way round. We refer to the ai and bi as p↑ - and ¬p↑ -contacts, and to the ci and di as p↓ - and ¬p↓ -contacts, respectively. Now, for every clause c = (l1 ∨ l2 ∨ l3 ) in ψ, we add to the constructed gadgets for the variables in ψ the atoms R(uc¬l1 , vlc2 ), R(vlc2 , c), T (c), R(c, wlc3 ), where c is a new individual, uc¬l1 a fresh ¬l1↑ -contact, vlc2 a fresh l2↓ -contact, and wlc3 a fresh l3↓ -contact. For example, for the clause c = (p ∨ q ∨ ¬r), we obtain the fragment below: A A T A bi cj c dk A ¬p A A q A A ¬r A The resulting ABox Aψ is such that ψ is satisfiable iff Cov A , Aψ 6|= q 1 . For the OMQ Q2 , we use (simplified) p-gadgets presented in Theorem 11 and con- nect them similarly to the picture above with T replaced by F at c. We now sketch two generalisation of Theorem 10. In Theorem 11 and 12, we assume that p and v do not contain F and T , while r and u may only contain solitary occurrences of T (F 6∈ r, u), and s only solitary occurrences of F (T 6∈ s). Theorem 11. Any OMQ (Cov A , q) with q of the form T F F T p x r y s z u w v is CO NP-complete provided the following conditions are satisfied: (i) there is no ho- momorphism h1 : rt → u with h1 (y) = w, and (ii) there is no homomorphism h2 : ut → r with h2 (z) = x, where r is the sub-CQ of q between x and y without T (x), F (y), and similarly for u, rt is r with T (x) and ut is u with T (w). The proof uses the following p-gadget: v v v v v v v v v v T T T T T T T T T T u u u u u u u u u u p v T F F F F F F F F F F F F T r s s s s s s s s s s s u a1 ... an A r u A b1 ... bn p A r A r A r A r A r r A r A r A r A r A p p p p p v v p p p p p p s u r s F p F F ¬p F u s u r v v v v v v v v v v A A A A A A A A A A A A p u ... u ... u c1 u cn u u u u c1 u cn u p s s s s s r u s s s s s s F F F F F T T F F F F F p v r r r r r r r r r r T T T T T T T T T T p p p p p p p p p p If one of the two conditions of this theorem is not satisfied then the given gadget will not work; see the shaded parts. For example, the complexity of the OMQ with the CQ below is still unknown: T F F F T T T x y z w ext In Theorem 12, we use r = r(x, y) ∧ T (y) ∧ s1 (y, y1 ) ∧ F (y1 ), where s1 is the part of s such that s(y, z) = s1 (y, y1 ) ∧ F (y1 ) ∧ s2 (y1 , z) and s1 (y, y1 ) does not contain any occurrences of F . In other words, the variable y1 corresponds to the first appearance of F in s, where s is the sub-CQ of q between y and z without F (y), T (z). Theorem 12. Any OMQ (Cov A , q) with q of the form T F T F p x r y s z u w v is CO NP-complete provided the following conditions hold: (i) there is no homomor- phism g1 : rt → u with g1 (y) = w, and (ii) there is no homomorphism g2 : u → r ext with g2 (z) = x and g2 (w) = y1 . The CQ below does not satisfy (ii), and its complexity is unknown: T T F F F T T T F x y z w It is also unknown whether there are OMQs with 2CQs in P. Acknowledgements. The work of O. Gerasimova and M. Zakharyaschev was carried out at the National Research University Higher School of Economics and supported by the Russian Science Foundation under grant 17-11-01294. References 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995) 2. Afrati, F.N., Gergatsoulis, M., Toni, F.: Linearisability on datalog programs. Theor. Comput. Sci. 308(1-3), 199–226 (2003), https://doi.org/10.1016/S0304-3975(02) 00730-2 3. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge Univer- sity Press, New York, NY, USA, 1st edn. (2009) 4. Benedikt, M., ten Cate, B., Colcombet, T., Vanden Boom, M.: The complexity of bound- edness for guarded logics. In: 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10, 2015. pp. 293–304. IEEE Computer Society (2015), https://doi.org/10.1109/LICS.2015.36 5. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Transactions on Database Systems 39(4), 33:1–44 (2014) 6. Bulatov, A.A.: A dichotomy theorem for nonuniform csps. In: Umans, C. (ed.) 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017. pp. 319–330. IEEE Computer Society (2017), https://doi.org/ 10.1109/FOCS.2017.37 7. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization prob- lems for database logic programs (preliminary report). In: STOC. pp. 477–490 (1988) 8. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, mmsnp, and expressive description logics (invited talk). In: Benedikt, M., Orsi, G. (eds.) 20th Inter- national Conference on Database Theory, ICDT 2017, March 21-24, 2017, Venice, Italy. LIPIcs, vol. 68, pp. 1:1–1:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017), https://doi.org/10.4230/LIPIcs.ICDT.2017.1 9. Gerasimova, O., Kikot, S., Podolskii, V., Zakharyaschev, M.: On the data complexity of ontology-mediated queries with a covering axiom. In: Proceedings of the 30th International Workshop on Description Logics (2017) 10. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: Calvanese, D., Konev, B. (eds.) Proceedings of the 28th International Workshop on Description Logics, Athens,Greece, June 7-10, 2015. CEUR Workshop Proceedings, vol. 1350. CEUR-WS.org (2015), http://ceur-ws.org/Vol-1350/paper-24.pdf 11. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs and non-Horn ontologies. Artif. Intell. 236, 90–118 (2016), http://dx.doi.org/10. 1016/j.artint.2016.03.006 12. Kharlamov, E., Hovland, D., Skjæveland, M.G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G., Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., Giese, M., Lie, H., Ioannidis, Y.E., Kotidis, Y., Koubarakis, M., Waaler, A.: Ontology based data access in statoil. J. Web Sem. 44, 3–36 (2017), https://doi.org/10.1016/j.websem.2017.05.005 13. Lanti, D., Rezk, M., Xiao, G., Calvanese, D.: The NPD benchmark: Reality check for OBDA systems. In: Alonso, G., Geerts, F., Popa, L., Barceló, P., Teubner, J., Ugarte, M., den Buss- che, J.V., Paredaens, J. (eds.) Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015. pp. 617–628. OpenProceedings.org (2015), https://doi.org/10.5441/002/edbt.2015.62 14. Lutz, C., Sabellek, L.: Ontology-mediated querying with the description logic EL: tri- chotomy and linear datalog rewritability. In: Sierra, C. (ed.) Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017. pp. 1181–1187. ijcai.org (2017), https://doi.org/10.24963/ ijcai.2017/164 15. Marcinkowski, J.: DATALOG sirups uniform boundedness is undecidable. In: Proceedings, 11th Annual IEEE Symposium on Logic in Computer Science, New Brunswick, New Jersey, USA, July 27-30, 1996. pp. 13–24. IEEE Computer Society (1996), https://doi.org/ 10.1109/LICS.1996.561299 16. Ramakrishnan, R., Sagiv, Y., Ullman, J.D., Vardi, M.Y.: Proof-tree transformation theorems and their applications. In: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART sym- posium on Principles of database systems. pp. 172–181. ACM (1989) 17. Saraiya, Y.P.: Linearizing nonlinear recursions in polynomial time. In: Silberschatz, A. (ed.) Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 29-31, 1989, Philadelphia, Pennsylvania, USA. pp. 182–189. ACM Press (1989), http://doi.acm.org/10.1145/73721.73740 18. Schaerf, A.: On the complexity of the instance checking problem in concept languages with existential quantification. J. of Intelligent Information Systems 2, 265–278 (1993) 19. Ullman, J.D., Gelder, A.V.: Parallel complexity of logical query programs. Algorithmica 3, 5–42 (1988), https://doi.org/10.1007/BF01762108 20. Vardi, M.Y.: Decidability and undecidability results for boundedness of linear recursive queries. In: Edmondson-Yurkanan, C., Yannakakis, M. (eds.) Proceedings of the Seventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 21-23, 1988, Austin, Texas, USA. pp. 341–351. ACM (1988), http://doi.acm.org/ 10.1145/308386.308470 21. Zhang, W., Yu, C.T., Troy, D.: Necessary and sufficient conditions to linearize double re- cursive programs in logic databases. ACM Trans. Database Syst. 15(3), 459–482 (1990), http://doi.acm.org/10.1145/88636.89237 22. Zhuk, D.: A proof of CSP dichotomy conjecture. In: Umans, C. (ed.) 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, Octo- ber 15-17, 2017. pp. 331–342. IEEE Computer Society (2017), https://doi.org/10. 1109/FOCS.2017.38