On the Data Complexity of Ontology-Mediated Queries
              with a Covering Axiom

               O. Gerasimova1 , S. Kikot2 , V. Podolskii3,1 , and M. Zakharyaschev2
           1
               National Research University Higher School of Economics, Moscow, Russia
                               2
                                 Birkbeck, University of London, U.K.
                          3
                            Steklov Mathematical Institute, Moscow, Russia


          Abstract. This paper reports on our ongoing work that aims at a classification of
          conjunctive queries q according to the data complexity of answering ontology-
          mediated queries ({A v T t F }, q). We give examples of queries from the
          complexity classes C ∈ {AC0 , L, NL, P, CO NP}, and obtain a few syntactical
          conditions for C-membership and C-hardness.


1      Introduction

The OWL 2 QL profile of OWL 2 —as well as the underlining description logics from
the DL-Lite family [4, 2]—were designed to ensure that every ontology-mediated query
(OMQ, for short) (T , q) with an OWL 2 QL ontology T and a conjunctive query (CQ)
q is first-order (FO) rewritable. However, when developing ontologies for ontology-
based data access (OBDA) [10] applications, domain experts are often tempted to use
axioms with constructs that are not available in OWL 2 QL. For example, the NPD
FactPages ontology,4 which was created to facilitate querying the datasets of the Nor-
wegian Petroleum Directorate,5 contains cardinality restrictions and covering axioms of
the form A v B1 t · · · t Bn . Typical answers to the question whether such axioms
could have a negative impact on OMQ rewriting are as follows: (i) the data satisfies
the axioms anyway (because of the database schema), (ii) our ‘real-world queries’ are
never affected by them, and (iii) OBDA systems such as Ontop drop everything outside
OWL 2 QL. Ideally, of course, we would rather want our system to detect automatically
whether the given OMQ is FO-rewritable and alert the user if this is not so. Furthermore,
in case of non-FO-rewritability, we might want the system to check whether a datalog
rewriting is possible, and so on. From the complexity-theoretic point of view, we are
thus interested in the data complexity of answering a given OMQ with an expressive
ontology.
    A systematic investigation of this problem was started in [3], which showed among
other results that answering OMQs of the form (Disn , u), where u is a union of CQs
(UCQ) and Disn = {A v B1 t · · · t Bn }, is polynomially equivalent to the constraint
satisfaction problems CSP(A). In particular, a P/CO NP dichotomy for such OMQs
would give a dichotomy for CSPs, thereby confirming the Feder-Vardi conjecture. As
 4
     http://sws.ifi.uio.no/project/npd-v2/
 5
     http://factpages.npd.no/factpages/
shown in [7], answering CQs with basic schema.org ontologies (in particular, Disn )
and CQs of qvar-size ≤ 2 is in P for combined complexity, where q is of qvar-size n if
the restriction of q to its quantified variables is a disjoint union of CQs with at most n
variables each. Moreover, FO- and datalog-rewritability of OMQs of the form (T , u),
where T is a schema.org ontology and u is a UCQ, are decidable in NE XP T IME. It has
also been recently established in [5] that checking FO-rewritability of OMQs with on-
tologies formulated in any description logic between ALCI and SHI is 2NE XP T IME-
complete. Datalog rewritability of OMQs with ontologies given in disjunctive datalog
has been investigated in [8].
    In this paper, we consider one fixed non-Horn ontology Dis = {A v T t F }. Ul-
timately aiming at a complete classification of CQs q according to the data complexity
of answering OMQs Q = (Dis, q), here we present our initial observations about this
problem. Ideally, we would like to obtain transparent necessary and sufficient conditions
relating the structure of q—say, the way how T and F occur in it—with the complexity
of answering Q. For example, one such condition guaranteeing datalog rewritability,
and so tractability of answering Q follows from [8, Theorem 27]: it suffices that q
contains at most one occurrence of F or at most one occurrence of T . We obtain a few
conditions in the same spirit for the complexity classes AC0 , L, NL and P. We also give
quite a few simple and instructive CQs distinguishing between NL and P, and develop
techniques for establishing P and CO NP lower bounds.


2   Preliminaries

In our context, a conjunctive query (CQ) is a first-order (FO) formula of the form
q(x) = ∃y ϕ(x, y), where ϕ is a conjunction of unary or binary atoms P (z) with
z ⊆ x ∪ y.Unions of conjunctive queries (UCQ) is a disjunction of conjunctive queries.
Given an ABox (or data instance) A, we denote by ind(A) the set of individual names
that occur in A. A tuple a ⊆ ind(A) is a certain answer to the OMQ Q = (Dis, q(x))
over A if M |= q(a), for every model M of Dis ∪ A; in this case we write Dis, A |=
q(a). If the set x of answer variables is empty, a certain answer to Q over D is ‘yes’ if
M |= q, for every model M of Dis ∪ A, and ‘no’ otherwise. OMQs and CQs without
answer variables x are called Boolean. We often regard CQs as sets of their atoms. In
this paper, we assume that the all CQs are connected.
     Let Q = (Dis, q(x)) be a fixed OMQ. By answering Q, we understand the problem
of checking, given an ABox A and a tuple a ⊆ ind(A), whether Dis, A |= q(a). It is
readily seen that this problem is always in CO NP. It is in the complexity class AC0 if
there is an FO-formula q 0 (x), called an FO-rewriting of Q, such that Dis, A |= q(a)
iff q 0 (a) holds in the model given by A, for any ABox A and any tuple a ⊆ ind(A).
     A datalog program, Π, is a finite set of rules of the form ∀z (γ0 ← γ1 ∧ · · · ∧ γm ),
where each γi is an atom Q(y) with y ⊆ z or an equality (z = z 0 ) with z, z 0 ∈ z. (As
usual, we omit ∀z.) The atom γ0 is the head of the rule, and γ1 , . . . , γm its body. All
variables in the head must occur in the body, and = can only occur in the body. The
predicates in the heads of rules in Π are IDB predicates, the rest (including =) EDB
predicates. A program Π is called linear if the body of every rule in Π contains at most
one IDB predicate.
     A datalog query is a pair (Π, G(x)), where Π is a datalog program and G(x) an
atom. A tuple a ⊆ ind(A) is an answer to (Π, G(x)) over an ABox A if G(a) holds
in the first-order structure with domain ind(A) obtained by closing A under the rules
in Π; in this case we write Π, A |= G(a). A datalog query (Π, G(x)) is a datalog
rewriting of an OMQ Q = (Dis, q(x)) in case Dis, A |= q(a) iff Π, A |= G(a),
for any ABox A and any a ⊆ ind(A). The evaluation problem for (Π, G(x))—that
is, checking, given an ABox A and a tuple a ⊆ ind(A), whether Π, A |= G(a)—is
known to be in P; for linear Π, this problem is in NL; see [6] and references therein.


3   AC0

By a solitary occurrence of F in a CQ q we mean any F (x) ∈ q such that T (x) ∈ / q;
likewise, a solitary occurrence of T in q is any T (x) ∈ q such that F (x) ∈
                                                                           / q.

Theorem 1. For any CQ q without solitary occurrences of F (or T ), answering the
OMQ Q = (Dis, q) is in AC0 .

Proof. We show that Dis, A |= q(a) iff A |= q(a). Suppose that A 6|= q(a) and
F (x) ∈ q ⇒ T (x) ∈ q. Take A0 = A ∪ {F (a) | a ∈ ind(A) ∧ T (a) ∈
                                                                 / A}. Clearly,
A0 |= Dis and A0 6|= q(a). The converse direction is trivial.

   In particular, answering any OMQ Q = (Dis, q), where q does not contain one of
F or T , is in AC0 . This observation can be easily generalised to OMQs with ontologies
Disn = {A v B1 t · · · t Bn }, for n ≥ 2:

Theorem 2. Suppose q is any CQ that does not contain an occurrence of Bi , for some
i (1 ≤ i ≤ n). Then answering the OMQ Q = (Disn , q) is in AC0 .

    Thus, only those CQs can ‘feel’ Disn as far as FO-rewritability is concerned that
contain all the Bn (which makes them quite complex in practice). Theorem 1 also shows
that Q = (Dis, q) satisfying the respective condition has a trivial FO-rewriting, viz. q
itself. This is not accidental as shown by the following observation:

Proposition 1. If Q = (Dis, q) is in AC0 , then q is a rewriting of Q.

Proof. By [3, Proposition 5.9], if Q is FO-rewritable, it has a UCQ rewriting. Then
there is a homomorphism from q to any CQ q 0 in this rewriting.

   We do not know yet whether the sufficient condition for FO-rewritability given
by Theorem 1 is also a necessary one for minimal CQs q (that are not equivalent to
any of their proper subqueries). For non-minimal CQs, this is not the case as shown
by F R ◦ R F T R ◦ R T which is in AC0 because it is equivalent to the CQ
◦   R
        FT   R
                 ◦ . Below we obtain some partial results showing how a single F -atom
and a single T -atom in q can cause L- and NL-hardness.
4    L and NL

We say that a Boolean CQ q is an F -T -CQ if it has exactly one atom of the form F (x),
exactly one atom of the form T (y), and the variables x and y are distinct.

Theorem 3. Answering any OMQ Q = (Dis, q) with an F -T -CQ q is L-hard.

Proof. The proof is by reduction to the reachability problem for undirected graphs,
which is known to be L-complete; see, e.g., [1]. Let q 0 be the CQ obtained from q
by removing the atoms F (x) and T (y). Suppose we are given an undirected graph
G = (V, E) and two vertices s, t ∈ V . It will be convenient to regard G as a directed
graph such that (u, v) ∈ E iff (v, u) ∈ E, for any u, v ∈ V . We encode G by means
of an ABox AG that is obtained from G as follows. For every edge e = (u, v) ∈ E,
let q 0e be the set of atoms in q 0 with x renamed to u, y to v and all other variables z
to ze . Then AG comprises all such q e , for e ∈ E, as well as F (s), T (t) and A(v), for
v ∈ V \ {s, t}. Our aim is to show that s →G t iff Dis, AG |= q.
    Suppose s →G t, that is, there exists a path s = v0 , v1 , . . . , vn = t in G with
ei = (vi , vi+1 ) ∈ E, for i < n. Consider an arbitrary model I of Dis and AG . Since
I |= Dis and F (s), T (t), A(vi ), for 1 ≤ i < n, are all in AG , we can find some i < n
such that I |= F (vi ) and I |= T (vi+1 ). As q 0ei is an isomorphic copy of q 0 , we obtain
I |= q. Conversely, suppose s 6→G t. Define an interpretation I by extending the ABox
AG with F I = {v ∈ V | s →G v} and T I = {v ∈ V | s 6→G v}. Clearly, I is a model
of Dis. By the construction, the elements of the connected component of I containing
s cannot be instances of T , while the remaining elements of I cannot be instances of
F . Since q is connected, it follows that I 6|= q.

   We call a Boolean CQ q linear-directed if all of its variables can be arranged in a
sequence v0 , . . . , vm such that all binary predicates in q are of the form R(vi , vi+1 ), for
some i, 0 ≤ i < m.

Theorem 4. Answering any OMQ Q = (Dis, q) with a linear-directed CQ q contain-
ing both a solitary F and a solitary T is NL-hard.

Proof. Suppose F (vk ) ∈ q, T (vk ) ∈ / q and F (vl ) ∈
                                                      / q, T (vl ) ∈ q, for some k, l with
0 ≤ k < l ≤ m. We rename the sequence vk , . . . , vl to x0 , . . . , xn . The proof proceeds
by reduction to the reachability problem in directed graphs, which is known to be NL-
complete; see, e.g., [1]. Given a directed graph G = (V, E) and vertices s, t ∈ V , we
construct the ABox AG in the same way as in the proof of Theorem 3 treating x0 as x
and xn as y. Again, we show that s →G t iff Dis, AG |= q. The implication (⇒) is
established exactly as above.
    To prove (⇐), we assume that s 6→G t and consider the same model I as defined
in the proof of Theorem 3. Taking account of linear-directedness of q, we immediately
conclude that there is no homomorphism h : q → I with h(x0 ) ∈ V . It remains to
show that there is no homomorphism h : q → I with h(x0 ) ∈            / V either. Suppose to
the contrary that such a homomorphism exists. Then there exist B ∈ {F, T } and a
homomorphism f : q → (AG2 ∪ {B(r)}), where G2 = ({s, r, t}, {(s, r), (r, t)}). We
denote the points of AG2 between s and r by x0 , x1 , . . . , xn and those between r and
t by xn , x01 , . . . , x0n . By comparing the lengths of appropriate segments of q, we obtain
f (x0 ) = xi , for some i (0 < i < n). As F (x0 ) ∈ q, we must have F (xi ) ∈ q; see
the picture below. As f (xi ) = x2i if 2i ≤ n, and f (xi ) = x02i mod n otherwise, we
also have F (x2i mod n ) ∈ q; more generally, F (xki mod n ) ∈ q for all natural k. Now,
since the equation of the form ‘iX = n mod n’ always has a solution, F (xn ) ∈ q,
which is impossible if B = T . If B = F , we use a similar argument starting from
T (xi ) ∈ q and show that T (xn ) ∈ q, which is again a contradiction.

       F            f (x0 )         f (xi )           B      ...      x0i       ...   x0n

       x0     ...       xi    ...             ...     xn                              T

                    F         ...             ...            ...            T
                        x0            xi              xj             xn

    Theorems 1 and 4 give the following dichotomy for OMQs Q = (Dis, q) with
linear-directed CQs q:
 – either q does not contain a solitary F or a solitary T , and answering Q is in AC0 ,
 – or q contains both solitary F and T , and answering Q is NL-hard.
We now complement the sufficient conditions of L- and NL-hardness obtained above
with sufficient conditions of OMQ answering in L- and NL.
   A CQ q 0 (x, y) is symmetric if the CQs q 0 (x, y) and q 0 (y, x) are equivalent in the
sense that q 0 (a, b) holds in A iff q 0 (b, a) holds in A, for any ABox A and a, b ∈ ind(A).

Theorem 5. Let Q = (Dis, q) be any OMQ such that

                 q = ∃x, y (F (x) ∧ q 01 (x) ∧ q 0 (x, y) ∧ q 02 (y) ∧ T (y)),

for some connected CQs q 0 (x, y), q 01 (x) and q 02 (y) that do not contain solitary T and
F , and q 0 (x, y) is symmetric. Then answering Q can be done in L.

Proof. It is not hard to show that, for any ABox A, we have Dis, A |= q iff there exist
v0 , v1 , . . . , vn ∈ ind(A), for some n ≥ 1, such that the following conditions hold:
 – F (v0 ), A(v1 ), . . . , A(vn−1 ), T (vn ) ∈ A;
 – A |= q 0 (vi , vi+1 ) for 0 ≤ i < n;
 – A |= q 01 (vi ) for 0 ≤ i < n;
 – A |= q 02 (vi ) for 1 ≤ i ≤ n.
It remains to observe that checking these conditions reduces to checking VT –VF reach-
ability in the undirected graph GA = (VA , EA ) defined below. The vertices in GA
comprise the set VA = VT ∪ VA ∪ VF , where
 – VT = {v ∈ ind(A) | A |= T (v) ∧ q 02 (v)};
 – VA = {v ∈ ind(A) | A |= A(v) ∧ q 01 (v) ∧ q 02 (v)};
 – VF = {v ∈ ind(A) | A |= F (v) ∧ q 01 (v)}.
The edges in GA comprise the set EA = ET A ∪ EAA ∪ EF A , where
    – Eall = {(x, y) | A |= q 0 (x, y)};
    – ET A = {(x, y) ∈ Eall | (x ∈ VT ∧ y ∈ VA ) ∨ (y ∈ VT ∧ x ∈ VA )};
    – EAA = {(x, y) ∈ Eall | x ∈ VA ∧ y ∈ VA };
    – EF A = {(x, y) ∈ Eall | (x ∈ VF ∧ y ∈ VA ) ∨ (y ∈ VF ∧ x ∈ VA )}.
It is readily seen that GA = (VA , EA ) is undirected in the sense that, for all of its
vertices u and v, (u, v) ∈ EA iff (u, v) ∈ EA .
    If we do not require q 0 (x, y) to be symmetric, the complexity upper bound increases
to NL:
Theorem 6. Let Q = (Dis, q) be any OMQ such that
                               q = ∃x, y (F (x) ∧ T (y) ∧ q 0 (x, y)),
for some connected CQ q 0 (x, y) without solitary occurrences of F and T . Then answer-
ing Q can be done in NL.
Proof. We claim that the datalog query (Π, G) with the following linear datalog pro-
gram Π, where q̃ 0 is the result of omitting all the ∃ from q 0 :
                                    G ← F (x) ∧ q̃ 0 (x, y) ∧ P (y)
                                 P (x) ← T (x)
                                 P (x) ← A(x) ∧ q̃ 0 (x, y) ∧ P (y)
is a datalog rewriting of Q. Indeed, if Π, A |= G then there are v0 , v1 , . . . , vn ∈ ind(A)
such that F (v0 ), A(v1 ), . . . , A(vn−1 ), T (vn ) ∈ A and q 0 (vi , vi+1 ) holds in A, for
0 ≤ i < n. Clearly, in any model I of Dis and A there is i with I |= F (vi ) ∧ T (vi+1 ).
It follows that Dis, A |= q.
     Conversely, suppose Π, A 6|= G. Let VP = {v ∈ ind(A) | Π, A |= P (v)}. Define
a model I of Dis with domain ind(A) by setting
T I = {v | T (v) ∈ A} ∪ {v ∈ VP | A(v) ∈ A},                   F I = F A ∪ {v ∈
                                                                              / VP | A(v) ∈ A}.
We claim that I 6|= q. Indeed, otherwise there is a homomorphism h : q → I. As
h(y) ∈ T I , we have Π, A |= P (h(y)). As h(x) ∈ F I , we have either F (h(x)) ∈ A or
A(h(x)) ∈ A, contrary to Π, A 6|= G.
    The sufficient conditions of Theorems 5 and 6 only apply to CQs with exactly one
solitary occurrence of F and exactly one solitary occurrence of T . What happens if we
allow more than one solitary occurrences of F or T ?

5     P
The following result is a consequence of [8, Theorem 27]:
Theorem 7. Let Q = (Dis, q) be any OMQ such that
          q = ∃x, y1 , . . . , yn (F (x) ∧ T (y1 ) ∧ · · · ∧ T (yn ) ∧ q 0 (x, y1 , . . . , yn )),
for some connected CQ q 0 (x, y1 , . . . , yn ) without solitary occurrences of T and F .
Then answering Q can be done in P.
    Indeed, for any ABox A, we have Dis, A |= q iff Π, A |= G, where Π is the
following datalog program and q̃ 0 is the result of omitting all the ∃ from q 0 :
                       G ← F (x) ∧ q̃ 0 (x, y1 , . . . , yn ) ∧ P (y1 ) ∧ · · · ∧ P (yn )
                    P (x) ← T (x)
                    P (x) ← A(x) ∧ q̃ 0 (x, y1 , . . . , yn ) ∧ P (y1 ) ∧ · · · ∧ P (yn ).
  Is the P-upper bound of Theorem 7 optimal? The following example gives a typical
OMQ in the scope of that theorem answering which is P-hard.
Example 1. We show that Q = (Dis, q) is P-hard for q shown in the picture below.
                         T             T             F
                                                   S           R

The proof is by reduction of the alternating monotone circuit evaluation problem, which
is known to be P-complete [9]. An example of an alternating monotone circuit is shown
in the picture below. Given such a circuit C and an input α, we define an ABox Aα  C as
the set of the following atoms:
    – R(g, h), if a gate g is an input of a gate h;
    – S(g, h), if g and h are distinct inputs of some AND-gate;
    – S(g, g), if g is an input gate or a non-output AND-gate;
    – T (g), if g is an input gate with 1 under α;
    – F (g), for the only output gate g;
    – A(g), for those g that are neither inputs nor the output.
To illustrate, the picture below shows an alternating monotone circuit C, an input α for
it, and the ABox Aα  C , where the solid arrows represent R and the dashed ones S:
                                                                               F
                         AND


                    OR         OR
                                                                     A                A

              AND        AND        AND
                                                                A          A        A

         OR         OR         OR         OR
                                                           A                            A    A
                                                                       A
     1        0      1         0     0         0       T               T

One can show that C(α) = 1 iff Dis, Aα
                                     C |= q.

    Curiously, by changing S to R in the CQ from Example 1, we obtain an OMQ that
is NL-complete as follows from Theorem 8 below.

6        NL vs. P
Theorem 8. Answering any OMQ Q = (Dis, q n ) with
                                    n−1
                                    ^                            
      q n = ∃x1 , . . . , xn , y          T (xi ) ∧ R(xi , xi+1 ) ∧ T (xn ) ∧ R(xn , y) ∧ F (y),
                                    i=1
for n ≥ 1, is NL-complete.

Proof. The lower bound follows from Theorem 4. The proof of the upper one is by
reduction to directed reachability. We split q n into two CQs:

                                          n−1
                                          ^
                q 0n = ∃x1 , . . . , xn
                                                                         
                                                  T (xi ) ∧ R(xi , xi+1 ) ∧ T (xn ),
                                          i=1
                                                   
                  q = ∃x, y T (x) ∧ R(x, y) ∧ F (y) .

One can show that, for any ABox A, we have Dis, A |= q n iff there exist a homomor-
phism f : q 0n → A and a directed R-path f (xn ), v0 , v1 , . . . , vm ∈ ind(A) such that
A(vi ) ∈ A, for i = 1, . . . , m − 1, and F (vm ) ∈ A. Clearly, this criterion reduces to
directed reachability.

    To further illustrate how minor modifications to the structure of CQs can send them
to different complexity classes, we collect in Table 1 a number of CQs in the scope of
Theorem 7, some of which turn out to be NL-complete, while others are P-complete.
(All the omitted labels on the arrows in Table 1 are assumed to be R, −/A means either
blank or A, and F T /A means either F T or A).
    Here, we only sketch the proof of P-hardness for the OMQ (Dis, q), where q is
                              T             F              T
                                              R              R
The proof is by reduction of the monotone circuit evaluation problem. Given a mono-
tone circuit C and an input α, we define an ABox Aα         C as the following labelled directed
graph, all of whose edges are labelled with R. For each gate g of C except the inputs
and output, the graph contains two vertices g and g 0 labelled with A; the output gate
g gives rise to only one vertex g labelled with F , while each input gate g to only one
vertex g 0 labelled according to α. For an OR-gate g = h1 ∨ h2 , we have the directed
edges (h01 , g), (h02 , g), (g, rg ), where rg is a new vertex labelled with T . For an AND-
gate g = h1 ∧ h2 , we have the edges (h01 , g), (g, h02 ). Also, for each gate g, we have the
edges (g, g 0 ), (g 0 , tg ), where tg is a new vertex labelled with T . An example illustrating
the construction is given below. One can show that C(α) = 1 iff Dis, Aα          C |= q.


                                                       T     tg1       F       g3        T       tg2
                     g3
                   AND                                           g10   A        A      g20       rg2
                                                                                             T
       g1 AND                OR    g2                            g1    A        A      g2

     1               1                    0            T                   T             F

    The membership in NL for the CQs in the left column of Table 1 can be shown
by constructing appropriate linear datalog programs. For example, answering the OMQ
                NL-complete                                                    P-complete
                                                              T           T            F
T           T        F
                                                                  S            R
                                                              T           F            T


T           T         T          F                            T           T            −/A         F


                                                              T       −/A              T           F


T           T        FT          F                            T           T            FT                   F


T           T        FT          FT         F


T       −/A           T          FT         FT        F


 T          T        −/A         FT         F                 T       T            T         −/A       FT       F


 T          T        −/A         −/A        FT        F


 T      T        T         −/A         FT        FT       F


T                     T                     F                 T                    T                            F


T                T                     FT             F       T       F T /A           T                    F


                                                              T                 −/A                    T            F


                                                              T                    A          T                 F


 T          F                    T                            T           F                        T


                Table 1. NL- and P-complete OMQs in the scope of Theorem 7.


with the last CQ of the left column can be done by the following linear program:

                     P (x) ← R(x, y), T (y), R(x, z), R(z, v), T (v)
                     P (x) ← R(x, y), T (y), R(x, z), R(z, v), P (v), A(v)
                     P (x) ← R(x, y), P (y), A(y)
                        G ← P (x), F (x)

Note that the classification problem we deal with in this section can be regarded as an
instance of a more general problem of classifying datalog programs in terms of their
data complexity, in particular, finding an NL/P dichotomy.


7    CO NP

On the other hand, a minor extension of the CQ from Example 1 can lead to CO NP-
completeness. First we show that answering the OMQ Q = (Dis, q) with the Boolean
CQ q given in the picture below is CO NP-complete.
                              T                         T                             F                          F
                                  R                                      S                               Q

Consider the ABoxes AN constructed according to the pattern shown below for N = 3:
                                                                 Q            R
                                                                         a3
                                           S            c2                            b3       S
                                                                         T
                                                             A                    F
                                           b2                                                      c3
                                                    F                                      A
                                  R                                                                          Q


                                      a2        T                                              T        a0


                                  Q                                                                          R
                                                    A                                      F
                                           c1                                                      b0
                                                             F                    A
                                                                         T
                                           S            b1                            c0       S
                                                                         a1
                                                                 R            Q


Let V = {a0 , . . . , aN }. It is not hard to see that (i) for any interpretation I based
on AN , if I 6|= q then either V ⊆ T I or V ⊆ F I ; (ii) the interpretations I and I 0
                                                             0
obtained by extending AN with T I = T A ∪ V and F I = F A ∪ V , respectively, are
both models of Dis that do not satisfy q.
    Given a 2+2-CNF φ with clauses D1 , . . . , DN and variables p1 , . . . , pM , we take
M disjoint copies of AN , distinguishing between them by the superscripts 1, . . . , M .
For example, a23 is the a3 -point of the second copy of AN and V 2 = {a20 , . . . , a2N }. For
each Dn of the form ¬pi ∨ ¬pj ∨ pk ∨ pl , we add to those copies the atoms R(ain , ajn ),
S(ajn , akn ) and Q(akn , aln ), and denote the resulting ABox by Aφ .

                A     ain−1                A        ajn−1                         A        akn−1                     A     aln−1

                              R                                                                              Q


            A       ain               A        ajn                            A        akn                       A       aln
                                                                     S

                pi                         pj                                         pk                             pl

We show that φ is satisfiable iff Dis, Aφ 6|= q. Let q 0 = R(x, y) ∧ S(y, z) ∧ Q(z, w).
Observe that any possible match of q 0 in Aφ falls into one of the two groups:
(A) (ain , bin , cin , ain+1 ), for 0 ≤ n ≤ N , 1 ≤ i ≤ M and addition modulo N + 1;
(B) (ain , ajn , akn , aln ), for some clause Dn = (¬pi ∨ ¬pj ∨ pk ∨ pl ) in φ.
    Suppose φ is satisfiable under       S an assignment a. We define a S model I of Dis by
extending Aφ with T I = T Aφ ∪ {V i | a(pi ) = 1}, F I = F Aφ ∪ {V i | a(pi ) = 0}.
We claim that I 6|= q. Indeed, the tuples in (A) cannot yield a match by (ii) above,
while the tuples in (B) do not give a match since a(Dn ) = 1, for all n ≤ N . To see this,
suppose a tuple (ain , ajn , akn , aln ) from (B) is a match for q in I. Then {ain , ajn } ⊆ T I
and {akn , aln } ⊆ F I , from which a(pi ) = 1, a(pj ) = 1, a(pk ) = 0 and a(pl ) = 0, and
so the clause Dn = ¬pi ∨ ¬pj ∨ pk ∨ pl is false under a.
    Conversely, suppose Dis, Aφ 6|= q. Then there is a model I of Dis based on Aφ
such that I 6|= q. By (i) above applied to the copies of AN , for every i ≤ M , we have
either Vi ⊆ T I or Vi ⊆ F I . In the former case, we set a(pi ) = 1; in the latter one, we
set a(pi ) = 0. We claim that φ is satisfiable under a. Indeed, if Dn = ¬pi ∨¬pj ∨pk ∨pl
is false under a, then a(pi ) = 1, a(pj ) = 1, a(pk ) = 0 and a(pl ) = 0, and so the tuple
(ain , ajn , akn , aln ) would be a match for q in I.
     The proposed method is generic in the sense that we can try to apply it to any
‘sufficiently asymmetric’ CQ q with two T -atoms and two F -atoms: we use a T -F
fragment of q for copying the values of the Boolean variables, and the whole q for
encoding the clauses of a 2 + 2-CNF. However, this method does not work for the CQ
                                  T              T         F             F
                        q0
                                        R                   R             R

which requires a somewhat different technique. We show CO NP-hardness of (Dis, q 0 )
by reduction of 3SAT. Given a 3CNF ψ, we define an ABox Aψ as follows. First, for
every variable p in ψ, we construct a ‘gadget’ shown in the picture below, where the
number of A-nodes above each of the circles matches the number of clauses in ψ; we
refer to these nodes as p-nodes and, respectively, ¬p-nodes (below the circles, there are
2 p- and 2 ¬p-nodes):
                                  A                             A
                                  ...                           ...
                                  A                             A

                                  A                             A
                          F                     T       F                 T

                                  p                             ¬p
                          T                     F       T                 F


                                  A                             A

                                  A                             A             T       T

           F          F           A                             A


Observe that, for any model I of Dis and the constructed gadget for p, if I 6|= q then
either (i) the p-nodes are all in F I and the ¬p-nodes are all in T I , or (ii) the p-nodes
are all in T I and the ¬p-nodes are all in F I .
    Now, for every clause c = (l1 ∨ l2 ∨ l3 ) in ψ, we add to the constructed gadgets the
atoms T (c), R(c, ac¬l1 ), R(ac¬l1 , acl2 ), R(acl2 , acl3 ), where c is a new individual, ac¬l1 a
fresh ¬l1 -node, acl2 a fresh l2 -node, and acl3 a fresh l3 -node. For example, for the clause
c = (p ∨ q ∨ r), we obtain the fragment below. The resulting ABox is denoted by Aψ .
               T                  A                         A                 A


                              F   A         T       F       A   T     F       A   T
                                  ¬p                        q                 r
                              T             F       T           F     T           F

                                  A                         A                 A

One can show that ψ is satisfiable iff Dis, Aψ 6|= q 0 .
Acknowledgements. The work of O. Gerasimova and M. Zakharyaschev was carried
out at the National Research University Higher School of Economics and supported
by the Russian Science Foundation under grant 17-11-01294; the work of V. Podol-
skii was supported by the Russian Academic Excellence Project ‘5-100’ and by grant
MK-7312.2016.1. Thanks are due to Frank Wolter and Carsten Lutz for comments,
suggestions and discussions.


References
 1. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge Univer-
    sity Press, New York, NY, USA, 1st edn. (2009)
 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and re-
    lations. Journal of Artificial Intelligence Research (JAIR) 36, 1–69 (2009)
 3. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through
    disjunctive datalog, csp, and MMSNP. ACM Transactions on Database Systems 39(4), 33:1–
    44 (2014)
 4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning
    and efficient query answering in description logics: the DL-Lite family. Journal of Automated
    Reasoning 39(3), 385–429 (2007)
 5. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, MMSNP,
    and expressive description logics. CoRR abs/1701.02231 (2017), http://arxiv.org/
    abs/1701.02231
 6. Gottlob, G., Papadimitriou, C.H.: On the complexity of single-rule datalog queries. Inf. Com-
    put. 183(1), 104–122 (2003), http://dx.doi.org/10.1016/S0890-5401(03)
    00012-9
 7. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: Calvanese,
    D., Konev, B. (eds.) Proceedings of the 28th International Workshop on Description Logics,
    Athens,Greece, June 7-10, 2015. CEUR Workshop Proceedings, vol. 1350. CEUR-WS.org
    (2015), http://ceur-ws.org/Vol-1350/paper-24.pdf
 8. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs
    and non-Horn ontologies. Artif. Intell. 236, 90–118 (2016), http://dx.doi.org/10.
    1016/j.artint.2016.03.006
 9. Papadimitriou, C.: Computational Complexity. Addison-Wesley (1994)
10. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking
    data to ontologies. Journal on Data Semantics X, 133–173 (2008)