=Paper=
{{Paper
|id=Vol-2211/paper-36
|storemode=property
|title=Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering
|pdfUrl=https://ceur-ws.org/Vol-2211/paper-36.pdf
|volume=Vol-2211
|authors=Michael Zakharyaschev,Stanislav Kikot,Olga Gerasimova
|dblpUrl=https://dblp.org/rec/conf/dlog/ZakharyaschevKG18
}}
==Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering==
<pdf width="1500px">https://ceur-ws.org/Vol-2211/paper-36.pdf</pdf>
<pre>
                Towards a Data Complexity Classification of
                 Ontology-Mediated Queries with Covering

                       O. Gerasimova1 , S. Kikot2 , and M. Zakharyaschev3
            1
                National Research University Higher School of Economics, Moscow, Russia
                                     2
                                       University of Oxford, U.K.
                                3
                                  Birkbeck, University of London, U.K.


            Abstract. We prove a number of new syntactic and semantic sufficient and nec-
            essary conditions for ontology-mediated queries (OMQs) with one covering ax-
            iom to be in the classes AC0 and NL for data complexity, and to be L-, NL- or
            P-hard. We also give two new examples of very simple CO NP-complete OMQs.


1      Introduction

The problem we are concerned with in this paper originated from the observation that
the NPD FactPages ontology4, used for testing ontology-based data access (OBDA)
in industry [13, 12], contained covering axioms of the form A v B1 t · · · t Bn . It
has been known since Schaerf’s paper [18] that answering ontology-mediated queries
(OMQs) with a covering axiom can be CO NP-hard for data complexity, and so such
OMQs are not suitable for OBDA systems in general. In fact, two interesting OMQs
with covering can be extracted from [18]: Q1 = (Cov > , q 1 ) and Q2 = (Cov > , q 2 ),
where Cov > = {> v F t T } and the Boolean conjunctive queries (CQs) q 1 and q 2
are shown below.
                                                               F                 F
                                                                   N1   R   N2
                                F              T           T                         T
       q1                                                                                   q2
                         S              R                          P1       P2

Schaerf pointed out that answering these queries involves case analysis and showed that
Q2 is CO NP-hard for data complexity. It is readily seen that Q1 is NL-complete. The
problem we started attacking in [9] was to find a complete syntactic or semantic clas-
sification of OMQs with covering according to their data complexity. The direct attack
failed, and connections with other hard classification problems such as boundedness or
linearisability of datalog programs (to be discussed in Section 3) indicate that a longer
siege is needed. In this paper, we report on our recent findings.
     We consider four ontologies with covering: Cov > as above, Cov A = {A v F t T },
Cov ⊥                                        ⊥
     > = {> v F t T, F u T v ⊥}, Cov A = {A v F t T, F u T v ⊥}, and we
classify connected Boolean CQs q by the number of solitary occurrences of F and T in
them, where a solitary F is any F (x) ∈ q with T (x) ∈ / q, and symmetrically for T . We
prove a number of new syntactic and semantic sufficient and necessary conditions for
 4
     http://sws.ifi.uio.no/project/npd-v2/
such OMQs with at most one solitary F to be in the complexity classes AC0 and NL,
and to be L-, NL- or P-hard. We also give two new examples of very simple CO NP-
complete OMQs.


2   Preliminaries
In this paper, a Boolean conjunctive query (CQ) is any first-order (FO) sentence of
the form q = ∃x ϕ(x), where ϕ is a conjunction of unary or binary atoms P (y) with
y ⊆ x. We often regard CQs as sets of their atoms, depict them as labelled digraphs, and
assume that all of our CQs are connected. By answering an ontology-mediated query
(OMQ) Q = (T , q) with a TBox T of the form defined above, we understand the
problem of checking, given an ABox A (a finite set of unary or binary ground atoms),
whether q holds in every model of T ∪ A, or T , A |= q in symbols. For every Q, this
problem is clearly in CO NP. It is in the complexity class AC0 if there is an FO-sentence
q 0 , called an FO-rewriting of Q, such that T , A |= q iff A |= q 0 , for any ABox A.
      A datalog program, Π, is a finite set of rules ∀x (γ0 ← γ1 ∧ · · · ∧ γm ), where each
γi is an atom Q(y) with y ⊆ x. (As usual, we omit ∀x.) The atom γ0 is the head of the
rule, and γ1 , . . . , γm its body. All the variables in the head must occur in the body. The
predicates in the head of rules are IDB predicates, the rest EDB predicates [1].
      A datalog query is a pair (Π, G), where Π is a datalog program and G an 0-ary
atom, the goal. The answer to (Π, G) over an ABox A is ‘yes’ if G holds in the FO-
structure obtained by closing A under Π, in which case we write Π, A |= G. A datalog
query (Π, G) is a datalog rewriting of an OMQ Q = (T , q) in case T , A |= q iff
Π, A |= G, for any ABox A. The answering problem for (Π, G)—i.e., checking, given
an ABox A, whether Π, A |= G—is clearly in P. Answering a datalog query with a
linear program, whose rules have at most one IDB predicate in the body, can be done in
NL. The NL upper bound also holds for datalog queries with a linear-stratified program
defined as follows. A stratified program [1] is a sequence Π = (Π0 , . . . , Πn ) of datalog
programs, called the strata of Π, such that each predicate in Π can occur in the head
of a rule only in one stratum Πi and can occur in the body of a rule only in strata Πj
with j ≥ i. If, additionally, the body of each rule in Π contains at most one occurrence
of a head predicate from the same stratum, Π is called linear-stratified. Every linear-
stratified program can be converted to an equivalent linear datalog program [2].


3   Connections in High Places
We begin by putting our little problem into the context of more general investigations
of (i) boundedness and linearisability of datalog programs and (ii) the data complexity
of answering OMQs with expressive ontologies.
    The decision problem whether a given datalog program is bounded has been a hot
research topic in database theory since the late 1980s. Thus, it was shown that bounded-
ness is undecidable already for linear datalog programs with binary IDB predicates [20]
and single rule programs (aka sirups) [15]. On the other hand, deciding boundedness
is 2E XP T IME-complete for monadic datalog programs [7, 4] and PS PACE-complete for
linear monadic programs [7]; for linear sirups, it is even NP-complete [20].
    The last two results are relevant for deciding FO-rewritability of OMQs (Cov A , q),
where q has a single solitary F and is called a 1-CQ. Indeed, suppose that F (x) and
T (y1 ), . . . , T (yn ) are all the solitary occurrences of F and T in q. Let Πq be a monadic
datalog program with three rules

                              G ← F (x), q 0 , P (y1 ), . . . , P (yn ),                  (1)
                          P (x) ← T (x),                                                  (2)
                                               0
                          P (x) ← A(x), q , P (y1 ), . . . , P (yn ),                     (3)

where q 0 = q \ {F (x), T (y1 ), . . . , T (yn )} and P is a fresh predicate symbol that never
occurs in our ABoxes. Then, for any ABox A, we have Cov A , A |= q iff Πq , A |= G.
Thus, FO-rewritability of (Cov A , q) is clearly related to boundedness of the sirup (3).
    The problem of linearising datalog programs, that is, transforming them into equiv-
alent linear datalog programs, which are known to be in NL for data complexity, has
also attracted much attention [16, 17, 21, 2] after the Ullman and van Gelder pioneering
paper [19].
    The success of the OBDA paradigm spurred considerable interest in the data com-
plexity of answering individual OMQs with expressive ontologies. Thus, by establish-
ing a remarkable connection to CSPs, it was shown in [5] that deciding FO-rewritability
and datalog rewritability of OMQs with a SHIU ontology and a (Boolean) atomic
query is NE XP T IME-complete. This result is obviously applicable to (Cov A , q) with
a tree-shaped q. In [8], it was shown that checking rewritability of Boolean monadic
disjunctive datalog programs into FO and monadic datalog can be done in, respectively,
2NE XP T IME and 3E XP T IME, which is applicable to all of our OMQs (Cov A , q).
    An AC0 /NL/P trichotomy for the data complexity of answering OMQs with an EL
ontology and atomic query, which can be checked in E XP T IME, was established in [14].
This result is applicable to OMQs (Cov A , q), in which q is an F -tree having a single
solitary F (x) such that the binary atoms in q form a ditree with root x. Indeed, denote
by TQ the EL TBox with concept inclusions F u Cq v G0 , T v P and A u Cq v P ,
where Cq is an EL-concept representing q \ {F (x)} with P for T (so for q of the form
                               F         FT                       T

                               x    R1    y1       R2   y2   R3   y3

Cq = ∃R1 .(F u P u ∃R2 .∃R3 .P )). Then, for any ABox A that does not contain G0 ,
we have ΠQ , A |= G iff TQ , A |= ∃x G0 (x).
     Yet, despite all of these efforts and results (implying, in view of the recent positive
solution to the Feder-Vardi conjecture [6, 22], that there is a P/CO NP dichotomy for
OMQs with a SHIU ontology and a (Boolean) atomic query, which is decidable in
NE XP T IME), we are still lacking simple and transparent, in particular syntactic, con-
ditions guaranteeing this or that data complexity or type of rewritability. Some results
in this direction were obtained in [10, 11]. That a transparent classification of monadic
sirups according to their data complexity has not been found so far and the close con-
nection to CSPs indicate that this problem is extremely hard in general. Our aim below
is to demonstrate that, for the OMQs with covering considered in this paper, syntactic
conditions do exist but are difficult to find.
4   0-CQs
As observed in [9], if an OMQ Q (of the form defined above) is in AC0 , then q is an
FO-rewriting of Q. By a 0-CQ we mean any CQ that does not contain a solitary F . A
twin in a CQ q is any pair F (x), T (x) ∈ q. Here is an encouraging syntactic criterion
of FO-rewritability for OMQs of the form (Cov ⊥A , q):

Theorem 1. (i) If q is a 0-CQ, then answering both (Cov ⊥   A , q) and (Cov A , q) is in
AC0 , with q being an FO-rewriting of these OMQs.
   (ii) If q is not a 0-CQ and does not contain twins, then answering both (Cov ⊥ > , q)
and (Cov > , q) is L-hard.

Proof. (i) follows from [9, Theorem 1]. (ii) The proof is by reduction to the reachabil-
ity problem for undirected graphs, which is known to be L-complete; see, e.g., [3].
Denote by q 0 the CQ obtained from q by gluing together all the variables x with
F (x) ∈ q and also all the variables y with T (y) ∈ q. Thus, q 0 contains a single F -
atom, F (x), and a single T -atom, T (y). Clearly, there is a homomorphism h : q → q 0 .
Let q 00 = q 0 \ {F (x), T (y)}.
     Suppose we are given an undirected graph G = (V, E) and two vertices s, t ∈ V .
We regard G as a directed graph such that (u, v) ∈ E iff (v, u) ∈ E, for any u, v ∈ V .
Now, we encode G by means of an ABox AG that is obtained from G as follows. For
every edge e = (u, v) ∈ E, let q 00e be the set of atoms in q 00 with x renamed to u, y to
v and all other variables z to ze . Then AG comprises all such q 00e , for e ∈ E, as well as
F (s) and T (t). We show that s →G t iff Cov > , AG |= q.
     Suppose that s →G t, i.e., there exists a path s = v0 , . . . , vn = t in G with ei =
(vi , vi+1 ) ∈ E, for i < n. Consider an arbitrary model I of Cov > and AG . Since
I |= Cov > , and F (s) and T (t) are in AG , we can find some i < n such that I |= F (vi )
and I |= T (vi+1 ). As q 00ei is an isomorphic copy of q 00 , we obtain I |= q 0 , and so
I |= q. Conversely, suppose s 6→G t. Define an interpretation I, extending the ABox
AG , by setting F I to be the set of objects in AG that are reachable from s and T I its
complement. Clearly, I is a model of Cov > . By the construction, the elements of the
connected component of I containing s cannot be instances of T , while the remaining
elements of I are not be instances of F . Since q is connected, it follows that I 6|= q.
                                        0
Corollary 1. An OMQ (Cov ⊥
                         A , q) is in AC iff q is a 0-CQ, which can be decided in
linear time.

    If twins can occur in CQs (that is, F and T are not necessarily disjoint), the pic-
ture becomes more complex. On one hand, we have the following criterion for OMQs
(Cov A , q) with a path CQ q whose variables x0 , . . . , xn in q are ordered so that the
binary atoms in q form a chain R1 (x0 , x1 ), . . . , Rn (xn−1 , xn ).

Theorem 2 ([9]). An OMQ (Cov A , q) with a path CQ q is in AC0 iff q is a 0-CQ. If q
contains both solitary F and T , then (Cov A , q) is NL-hard.

    On the other hand, this AC0 /NL criterion collapses for path CQs with loops:

Proposition 1. The OMQ (Cov A , q), where q is shown below, is in AC0 .
                         FT         T                  F         FT

                          R    R         R         S        S    S


Proof. Follows from the sufficient condition of Theorem 4 (i).
   Note that the CQ q above is minimal (not equivalent to any of its proper sub-CQs).
Note also that, if a minimal 1-CQ q contains both a solitary F and a solitary T , then
FO-rewritability of (Cov A , q) implies that q contains at least one twin (F T ) and at least
one y with T (y) ∈/ q and F (y) ∈ / q (which can be shown using Theorem 4 (i)).


5   1-CQs
1-CQs have exactly one solitary F . As observed in Section 3, we have the following:
Theorem 3. (i) Answering any OMQ (Cov A , q) with a 1-CQ q can be done in P.
   (ii) Answering any OMQ (Cov A , q) with an F -tree q is either in AC0 or NL-
complete or P-complete. The trichotomy can be decided in E XP T IME.
     Theorem 3 (ii) was proved by a reduction to the AC0 /NL/P-trichotomy of [14].
It is to be noted, however, that applying the algorithm from [14] in our case is tricky
because the input ontology TQ must first be converted to a normal form. As a result, we
do not obtain transparent syntactic criteria on the shape of q that would guarantee that
the OMQ (Cov A , q) belongs to the desired complexity class (see examples below).
     We now give a semantic sufficient condition for an OMQ with a 1-CQ to lie in NL.
This condition uses ideas and constructions from [7, 14]. Let Q = (Cov A , q) be an
OMQ with a 1-CQ q having a solitary F (x). Define by induction a class KQ of ABoxes
that will be called cactuses for Q. We start by setting KQ := {q}, regarding q as an
ABox, and then recursively apply to KQ the following two rules:
(bud) if T (y) ∈ A ∈ KQ with solitary T (y), then we add to KQ the ABox obtained by
    replacing T (y) in A with (q \ F (x)) ∪ {A(x)}, in which x is renamed to y and all
    of the other variables are given fresh names;
(prune) if Cov A , A0 |= Q, where A0 = A \ {T (y)} and T (y) is solitary, we add to KQ
    the ABox obtained by removing T (y) from A ∈ KQ .
It is readily seen that, for any ABox A0 , we have Cov A , A0 |= Q iff there exist A ∈ KQ
and a homomorphism h : A → A0 . Denote by K†Q the set of minimal cactuses in KQ
(that have no proper sub-cactuses in KQ ).
     For a cactus C ∈ KQ , we refer to the copies of (maximal subsets of) q that comprise
C as segments. The skeleton C s of C is the ditree whose nodes are the segments s of
C and edges (s, s0 ) mean that s0 was attached to s by budding. The atoms T (y) ∈ s
are called the buds of s. The rank r(s) of s is defined by induction: if s is a leaf, then
r(s) = 0; for non-leaf s, we compute the maximal rank m of its children and then set
                             (
                               m + 1, if s has ≥ 2 children of rank m;
                    r(s) =
                               m,       otherwise.
The width of C and C s is the rank of the root in C s . We say that K†Q is of width k if it
contains a cactus of width k but no cactus of greater width. The depth of C and C s is the
number of edges in the longest branch in C s .
    We illustrate the definition by an example. Denote by q T nT , for n ≥ 0, the 1-CQ
shown below, where all the binary predicates are R and the n variables without labels
do not occur in F - or T -atoms:
                             F       T                     T
                                               ...

                                                n

Example 1. Let Q = (Cov > , q T 1T ). In the picture below, we show a cactus C obtained
by applying (bud) twice to q T 1T (with A = > omitted):
                      F      T

                              z
                                                     T             T


                                          T

One can check that Cov > , C \ {T (z)} |= q T 1T , and so an application of (prune) will
remove T (z) from C. Using this observation, one can show that K†Q is of width 1. On
the other hand, if Q = (Cov A , q T 1T ) then K†Q is of unbounded width as follows from
Theorem 6 below.

Theorem 4. Let Q = (Cov A , q) be an OMQ with a 1-CQ q. Then
   (i) Q is in AC0 iff for every C ∈ K†Q , there is a homomorphism h : q → C;
   (ii) Q is rewritable in linear datalog, and so is in NL, if K†Q is of bounded width.

Proof. The proof of (i) is straightforward. To prove (ii), we represent the ABoxes in
KQ as words in a tree alphabet and construct a non-deterministic finite tree automaton
AQ such that K†Q ⊆ L(AQ ) ⊆ KQ . Then we show that, for K†Q of finite width, the
automaton AQ can be transformed into a monadic linear-stratified datalog rewriting of
Q. It can further be converted into a linear datalog rewriting at the expense of increasing
the arity if IDBs in the program [2].

    It is worth noting that, for Q = (Cov > , q) with q from Proposition 1, K†Q consists of
q and the cactus of depth 1, in which the only solitary T is removed by (prune). Clearly,
there is a homomorphism from q into this cactus, and so Q is FO-rewritable. However,
for the 1-CQ q in the picture below (where all edges are bidirectional), (Cov > , q) is not
FO-rewritable, but there is a homomorphism from q to both cactuses of depth 1. We
do not know whether, in general, there is an upper bound Nq such that the existence of
homomorphisms h : q → C, for all C ∈ KQ of depth Nq , would ensure FO-rewritability
of (Cov A , q). For 1-CQs q with a single solitary T , one can take Nq = |q| + 1. Neither
do we know the exact complexity of deciding FO-rewritability of OMQs with 1-CQs.
As mentioned in Section 3, this problem is reducible to the boundedness problem for
monadic datalog programs, which is known to be in 2E XP T IME.
         S                                                         R, Q           R, Q


    T
             S             S        R          Q             R     FT         Q   FT
                                                                                          R
     S                                                                                           F
                                                                   FT             FT
                                                                                          Q
    T
             S             S        S          R             Q                R
         S                                                         R, Q           R, Q

    Theorem 4 (ii) allows us to obtain a sufficient condition for linear-datalog rewritabil-
ity of OMQs (Cov A , q) with an F -path CQ q, that is, a path CQ with a single solitary
F at its root. We represent such a q as shown in the picture below, which indicates all
the solitary occurrences of F and T :
             F                 T                   T                      T
                     ...                ...                  ...                    ...
    q = x                      y1                  yi                     ym                  ym+1

We require the following sub-CQs of q:
 – q i is the suffix of q that starts at yi , but without T (yi ), for 1 ≤ i ≤ m;
 – q ∗i is the prefix of q that ends at yi , but without F (x) and T (yi ), for 1 ≤ i ≤ m;
 – q ∗m+1 is q without F (x),
and write fi : q i  q if fi is a homomorphism from q i into q with fi (yi ) = x.

Theorem 5. If for each 1 ≤ i ≤ m there exist fi : q i  q, then (Cov A , q) is rewritable
into a linear datalog program, and so is NL-complete.

    For F -path CQs q without twins, we extend Theorem 5 to a NL/P dichotomy (pro-
vided that NL 6= P). Given such a CQ q, we denote by Nq the set of the numbers
indicating the length of the path from x to each of the yi , i = 1, . . . , m + 1.

Theorem 6. Let Q = (Cov A , q) be an OMQ where q is an F -path CQ without twins
having a single binary relation. The following are equivalent unless NL = P:
(i) Q is NL-complete;
(ii) {0} ∪ Nq is an arithmetic progression;
(iii) there exist fi : q i  q for every i = 1, . . . , m.
If these conditions do not hold, then Q is P-complete.

Proof. It is readily seen, using Theorem 5, that (ii) ⇒ (iii) ⇒ (i). We prove the
implication (i) ⇒ (ii). Suppose {0} ∪ Nq is not an arithmetic progression. We show
that Q is P-hard by reduction of the monotone circuit evaluation problem. Let X be the
sub-CQ of q between F (x) and the first T but without the F and T atoms, and let |X|
be the number of atoms in X. Let Y be the first sub-CQ between neighbouring T -atoms
without these T -atoms such that |Y | =
                                      6 |X|, n the number of X-sub-CQs before Y , and
let Z be the rest of the query without the first T atom; see the picture below where
n = 2.
                                       F       X   T    X   T    Y   T    Z
                                       x0          x1       x2       x3       x4

We distinguish between two cases: |Y | > |X| and |Y | < |X|. Depending on the case,
we use the following gadgets for ∧-gates, which are shown below for n = 2 (it should
be clear how to modify the construction for n > 2); the ∨-gate gadget is the same for
both cases.
                                   c
 ∧-gate for |Y | > |X|                 A

              A           T
                                       X                                                   ∨-gate
         Z            Y       X
                                       A           ∧-gate for |X| > |Y |
         b0                   a0                                                               c
                  X                    X                         c                     X            X
                  T                    T                    A
                                                                                   T                       T
                  X                    X                    X
                                                                                   X               X
                  T                    T                    T
                                                                                   T                       T
                  Y                    Y                    X             A
                                                                                   Y       Z       Y       Z
         Z                                 Z                         Y        Z    A               A
                  A           A                             A
              b                    a                             b        a            a               b
Given any monotone Boolean circuit C and its input α, we construct an ABox AC,α
by replacing ∧- and ∨-gates with the inputs a and b and output c by the gadgets above,
placing T on the input gates evaluated to 1 under α and F on the output gate. We claim
that Cov A , AC,α |= q iff C(α) = 1.
    (⇐) It is not hard to show by induction that whenever a non-input gate g in C
outputs 1 under α, then the sub-ABox of AC,α generated by g and with F placed on
g validates Q. For example, suppose T (a), T (b) and F (c) hold in the left-hand side
gadget. Then either F (b0 ) or T (b0 ). In the former case, q is satisfied; so assume T (b0 ).
We have either F (a0 ) or T (a0 ). Now, in either case, q is satisfied, as required.
    (⇒) Suppose C(α) = 0. We construct a model I of Cov A based on AC,α by
placing T or F to g and g 0 (if available) depending on the value of g in C under α. It
can be checked that I 6|= q. In particular, if |Y | > |X|, c is the conjunction of b and a,
c = b = 0 and a = 1, then if x0 goes to c, x1 goes to a0 , then x2 must go to the point
below a0 , and so x3 must go somewhere between a and the point above a, which is
impossible. If a = 0, b = 1 and x0 goes to a0 and x1 goes to the central T , then x2 must
go somewhere between the central T and b0 , which is also impossible. If |Y | < |X|,
c = b ∧ a, b = d ∨ e, a = c = 0, b = 1, then if x0 goes to c, x2 goes to b, then x3
cannot go to a as a = 0, and so it must go into some X-segment of the gate b = d ∨ e
going out of b, which is again impossible.

    Note that the proof of P-hardness in Theorem 6 does not go through for A = >.
Thus, for (Cov > , q T 1T ), we are in the framework of Example 1 and, by Theorem 4 (ii),
this OMQ is in NL. In fact, we have the following NL/P dichotomy for the OMQs of
the form (Cov > , q T nT ).

Theorem 7. (i) The OMQ (Cov > , q T 1T ) is NL-complete (whereas (Cov A , q T 1T ) is
P-complete).
     (ii) The OMQs (Cov > , q T nT ) (and (Cov A , q T nT )), for n ≥ 2, are P-complete.

Proof. The proof is similar to that of P-hardness in Theorem 6.

     We now apply Theorem 4 (ii) to the class of T F -path CQs of the form

              T                  F                     T                        T
                      ...                    ...                    ...              ...
    qT F = y                     x                     y1                       ym         ym+1
            0


where the T (yi ) and F (x) are all the solitary occurrences of T and F in q T F . We
represent this CQ as
                             q T F = {T (y0 )} ∪ q 0 ∪ q,
where q 0 is the sub-CQ of q T F between y0 and x with T (y0 ) removed and q is the
same as in Theorem 5 (and q ∗m+1 is q without F (x)).

Theorem 8. If q satisfies the condition of Theorem 5 and there is a homomorphism
h : q ∗m+1 → q 0 such that h(x) = y0 , then answering (Cov A , q T F ) is NL-complete.

     For example, the OMQ (Cov A , q) with q shown below is NL-complete:
                                 T           FT        F                T


On the other hand, as shown in [9], (Cov A , q) with q of the form
                                         T         F            T


is P-complete; in fact, it follows from the proof that (Cov > , q) is P-complete, too.


6    2-CQs

A 2-CQ has at least two solitary F and at least two solitary T . For example, as shown
in [9], answering (Cov A , q) with the following CQ q is CO NP-complete:
                             T               T              F               F


The proof can be generalised to the class of 2-2-CQs, which are path 2-CQs where
all the F are located after all the T , and every occurrence of T or F is solitary. We
represent any given 2-2-CQ q as shown below
                             T               T              F               F
                       p     x       r       y     s        z       u       w    v

where p, r, u and v do not contain F and T , while s may contain solitary occurrences
of both T and F (in other words, the T shown in the picture are the first two occurrences
of T in q and the F are the last two occurrences of F in q). Denote by q r the suffix of
q that starts from x but without T (x); similarly, q u is the suffix of q starting from z but
without F (z). Denote by q − r the prefix of q that ends at y but without T (y); similarly,
q−
 u  is the prefix of q ending  at w but without F (w). Using the construction from [9],
one can show the following:
Theorem 9. Any OMQ (Cov A , q) with a 2-2-CQ q is CO NP-complete provided the
following conditions are satisfied: (i) there is no homomorphism h1 : q u → q r with
h1 (z) = x, and (ii) there is no homomorphism h2 : q −      −
                                                      r → q u with h2 (y) = w.

    Unfortunately, the proof does not generalise to the other two path 2-CQs with four
variables, whose complexity has so far been open:
                                                          T                F                T                      F
                                        q1
                                                          T                F                F                      T
                                        q2

Theorem 10. The OMQs Q1 = (Cov A , q 1 ) and Q2 = (Cov A , q 2 ) are CO NP-complete.
Proof. We prove CO NP-hardness of Q1 by reduction of the NP-complete 3SAT. Given
a 3CNF ψ, we construct an ABox Aψ as follows. First, for every variable p occurring
in ψ, we take the following p-gadget, where n is the number of clauses in ψ:
              F       F                 F             F                F                F                F             F            F            F


              T       T                 T             T                T        F       T                T             T            T            T

                               a1               ...           an                                b1           ...           bn
      A           A        A                A             A            A            A                A             A            A            A           A


      T                                                                F                F                                                                T

                                    p                                                                                  ¬p
      F                                                                T                T                                                                F


      A           A        A                A             A            A            A                A             A            A            A           A
                                                ...                                                          ...
                               c1                             cn                                d1                         dn
              F       F                 F             F                F        T       F                F             F            F            F


              T       T                 T             T                T                T                T             T            T            T


The key property of the p-gadget is that, for any model I of Cov A based on this gadget,
if I 6|= q 1 then either the A-points on left-hand side of the gadget are all in T I and
the A-points on the right-hand side are all in F I , or the other way round. We refer to
the ai and bi as p↑ - and ¬p↑ -contacts, and to the ci and di as p↓ - and ¬p↓ -contacts,
respectively.
    Now, for every clause c = (l1 ∨ l2 ∨ l3 ) in ψ, we add to the constructed gadgets for
the variables in ψ the atoms R(uc¬l1 , vlc2 ), R(vlc2 , c), T (c), R(c, wlc3 ), where c is a new
individual, uc¬l1 a fresh ¬l1↑ -contact, vlc2 a fresh l2↓ -contact, and wlc3 a fresh l3↓ -contact.
For example, for the clause c = (p ∨ q ∨ ¬r), we obtain the fragment below:
                           A                                                    A                    T                                   A

                      bi                                                   cj                        c                              dk

          A           ¬p                    A                      A       q        A                         A                 ¬r                   A


The resulting ABox Aψ is such that ψ is satisfiable iff Cov A , Aψ 6|= q 1 .
   For the OMQ Q2 , we use (simplified) p-gadgets presented in Theorem 11 and con-
nect them similarly to the picture above with T replaced by F at c.
   We now sketch two generalisation of Theorem 10.
   In Theorem 11 and 12, we assume that p and v do not contain F and T , while
r and u may only contain solitary occurrences of T (F 6∈ r, u), and s only solitary
occurrences of F (T 6∈ s).
Theorem 11. Any OMQ (Cov A , q) with q of the form
                                                             T                                  F               F                           T
                                                 p           x              r                   y       s       z           u               w                v
is CO NP-complete provided the following conditions are satisfied: (i) there is no ho-
momorphism h1 : rt → u with h1 (y) = w, and (ii) there is no homomorphism
h2 : ut → r with h2 (z) = x, where r is the sub-CQ of q between x and y without
T (x), F (y), and similarly for u, rt is r with T (x) and ut is u with T (w).
         The proof uses the following p-gadget:
                 v            v             v                v                  v                                       v               v                v                v            v
                 T            T             T                T              T                                               T               T                    T            T            T

                 u            u             u                u                  u                                       u               u                u                u            u
     p                                                                                                                                                                                             v
  T              F            F             F                F              F                       F       F               F               F                    F            F            F           T

     r           s            s             s                s               s                          s               s               s                s                s            s           u
                                   a1            ...              an        A               r                   u           A               b1               ...              bn
 p       A   r        A   r        A    r        A     r          A    r                                                            r       A    r           A       r        A    r       A   r   A       p
                 p            p             p                p                          v                           v                   p                p                p            p
                                                                            p                                                   p
     s                                                                              u                                   r                                                                              s

         F                              p                                   F                                               F                                        ¬p                            F

     u                                                                              s                                   u                                                                              r
                 v            v             v                v                          v                           v                   v                v                v            v
         A            A            A             A                A         A                                               A               A                A                A            A       A
 p           u                     ... u                                                                                                                                      ... u
                      c1 u                       cn u                  u                                                            u            u           c1 u                          cn u            p
     s           s            s             s                s                  r                                       u               s                s                s            s           s
                                                                                                        s
  F              F            F             F                F              T                                               T               F                    F            F            F           F
                                                                             p                                          v
     r           r            r             r                r                                                                          r                r                r            r           r

  T              T            T             T                T                                                                              T                    T            T            T           T
   p              p            p             p                p                                                                         p                p                p            p           p


If one of the two conditions of this theorem is not satisfied then the given gadget will
not work; see the shaded parts. For example, the complexity of the OMQ with the CQ
below is still unknown:
             T                                         F                            F                   F               T                        T                                             T

             x                                           y                                              z                                                                                      w
                                                                      ext
    In Theorem 12, we use r       = r(x, y) ∧ T (y) ∧ s1 (y, y1 ) ∧ F (y1 ), where s1 is
the part of s such that s(y, z) = s1 (y, y1 ) ∧ F (y1 ) ∧ s2 (y1 , z) and s1 (y, y1 ) does not
contain any occurrences of F . In other words, the variable y1 corresponds to the first
appearance of F in s, where s is the sub-CQ of q between y and z without F (y), T (z).
Theorem 12. Any OMQ (Cov A , q) with q of the form
                                                             T                                  F               T                           F
                                                 p           x              r                   y       s       z           u               w                v
is CO NP-complete provided the following conditions hold: (i) there is no homomor-
phism g1 : rt → u with g1 (y) = w, and (ii) there is no homomorphism g2 : u → r ext
with g2 (z) = x and g2 (w) = y1 .
         The CQ below does not satisfy (ii), and its complexity is unknown:
         T                    T                      F                          F                       F                   T                        T                        T                    F

         x                                           y                                                                      z                                                                      w

It is also unknown whether there are OMQs with 2CQs in P.
Acknowledgements. The work of O. Gerasimova and M. Zakharyaschev was carried
out at the National Research University Higher School of Economics and supported by
the Russian Science Foundation under grant 17-11-01294.


References
 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
 2. Afrati, F.N., Gergatsoulis, M., Toni, F.: Linearisability on datalog programs. Theor. Comput.
    Sci. 308(1-3), 199–226 (2003), https://doi.org/10.1016/S0304-3975(02)
    00730-2
 3. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge Univer-
    sity Press, New York, NY, USA, 1st edn. (2009)
 4. Benedikt, M., ten Cate, B., Colcombet, T., Vanden Boom, M.: The complexity of bound-
    edness for guarded logics. In: 30th Annual ACM/IEEE Symposium on Logic in Computer
    Science, LICS 2015, Kyoto, Japan, July 6-10, 2015. pp. 293–304. IEEE Computer Society
    (2015), https://doi.org/10.1109/LICS.2015.36
 5. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through
    disjunctive datalog, CSP, and MMSNP. ACM Transactions on Database Systems 39(4),
    33:1–44 (2014)
 6. Bulatov, A.A.: A dichotomy theorem for nonuniform csps. In: Umans, C. (ed.) 58th IEEE
    Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA,
    October 15-17, 2017. pp. 319–330. IEEE Computer Society (2017), https://doi.org/
    10.1109/FOCS.2017.37
 7. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization prob-
    lems for database logic programs (preliminary report). In: STOC. pp. 477–490 (1988)
 8. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, mmsnp, and
    expressive description logics (invited talk). In: Benedikt, M., Orsi, G. (eds.) 20th Inter-
    national Conference on Database Theory, ICDT 2017, March 21-24, 2017, Venice, Italy.
    LIPIcs, vol. 68, pp. 1:1–1:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017),
    https://doi.org/10.4230/LIPIcs.ICDT.2017.1
 9. Gerasimova, O., Kikot, S., Podolskii, V., Zakharyaschev, M.: On the data complexity of
    ontology-mediated queries with a covering axiom. In: Proceedings of the 30th International
    Workshop on Description Logics (2017)
10. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: Calvanese,
    D., Konev, B. (eds.) Proceedings of the 28th International Workshop on Description Logics,
    Athens,Greece, June 7-10, 2015. CEUR Workshop Proceedings, vol. 1350. CEUR-WS.org
    (2015), http://ceur-ws.org/Vol-1350/paper-24.pdf
11. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs
    and non-Horn ontologies. Artif. Intell. 236, 90–118 (2016), http://dx.doi.org/10.
    1016/j.artint.2016.03.006
12. Kharlamov, E., Hovland, D., Skjæveland, M.G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G.,
    Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., Giese, M., Lie, H., Ioannidis, Y.E., Kotidis,
    Y., Koubarakis, M., Waaler, A.: Ontology based data access in statoil. J. Web Sem. 44, 3–36
    (2017), https://doi.org/10.1016/j.websem.2017.05.005
13. Lanti, D., Rezk, M., Xiao, G., Calvanese, D.: The NPD benchmark: Reality check for OBDA
    systems. In: Alonso, G., Geerts, F., Popa, L., Barceló, P., Teubner, J., Ugarte, M., den Buss-
    che, J.V., Paredaens, J. (eds.) Proceedings of the 18th International Conference on Extending
    Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015. pp. 617–628.
    OpenProceedings.org (2015), https://doi.org/10.5441/002/edbt.2015.62
14. Lutz, C., Sabellek, L.: Ontology-mediated querying with the description logic EL: tri-
    chotomy and linear datalog rewritability. In: Sierra, C. (ed.) Proceedings of the Twenty-Sixth
    International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia,
    August 19-25, 2017. pp. 1181–1187. ijcai.org (2017), https://doi.org/10.24963/
    ijcai.2017/164
15. Marcinkowski, J.: DATALOG sirups uniform boundedness is undecidable. In: Proceedings,
    11th Annual IEEE Symposium on Logic in Computer Science, New Brunswick, New Jersey,
    USA, July 27-30, 1996. pp. 13–24. IEEE Computer Society (1996), https://doi.org/
    10.1109/LICS.1996.561299
16. Ramakrishnan, R., Sagiv, Y., Ullman, J.D., Vardi, M.Y.: Proof-tree transformation theorems
    and their applications. In: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART sym-
    posium on Principles of database systems. pp. 172–181. ACM (1989)
17. Saraiya, Y.P.: Linearizing nonlinear recursions in polynomial time. In: Silberschatz, A. (ed.)
    Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles
    of Database Systems, March 29-31, 1989, Philadelphia, Pennsylvania, USA. pp. 182–189.
    ACM Press (1989), http://doi.acm.org/10.1145/73721.73740
18. Schaerf, A.: On the complexity of the instance checking problem in concept languages with
    existential quantification. J. of Intelligent Information Systems 2, 265–278 (1993)
19. Ullman, J.D., Gelder, A.V.: Parallel complexity of logical query programs. Algorithmica 3,
    5–42 (1988), https://doi.org/10.1007/BF01762108
20. Vardi, M.Y.: Decidability and undecidability results for boundedness of linear recursive
    queries. In: Edmondson-Yurkanan, C., Yannakakis, M. (eds.) Proceedings of the Seventh
    ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March
    21-23, 1988, Austin, Texas, USA. pp. 341–351. ACM (1988), http://doi.acm.org/
    10.1145/308386.308470
21. Zhang, W., Yu, C.T., Troy, D.: Necessary and sufficient conditions to linearize double re-
    cursive programs in logic databases. ACM Trans. Database Syst. 15(3), 459–482 (1990),
    http://doi.acm.org/10.1145/88636.89237
22. Zhuk, D.: A proof of CSP dichotomy conjecture. In: Umans, C. (ed.) 58th IEEE Annual
    Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, Octo-
    ber 15-17, 2017. pp. 331–342. IEEE Computer Society (2017), https://doi.org/10.
    1109/FOCS.2017.38

</pre>