Constraint Patterns for Tractable
    Ontology-Mediated Queries with Datatypes

                André Hernich, Julio Lemos, and Frank Wolter

                           University of Liverpool, UK
                   {hernich,jlemos,wolter}@liverpool.ac.uk


      Abstract. Adding datatypes to ontology-mediated conjunctive queries
      (OMQs) often makes query answering hard. This applies, in particular, to
      datatypes with non-unary predicates. In this paper we propose a new, non-
      uniform way, of analysing the data-complexity of OMQ answering with
      datatypes containing higher-arity predicates. We aim at a classification of
      the patterns of datatype atoms in OMQs into those that can occur in non-
      tractable OMQs and those that only occur in tractable OMQs. Our main
      result is a P/coNP-dichotomy for OMQs over DL-Lite TBoxes and rooted
      CQs using the datatype (Q, ≤). The proof employs a recent dichotomy
      result by Bodirsky and Kara for temporal constraint satisfaction problems.


1    Introduction
In recent years, querying data using ontologies has become one of the main
applications of description logics (DLs). The general idea is that an ontology
is used to enrich incomplete and heterogeous data with a semantics and with
background knowledge, thus serving as an interface for querying data and allowing
to derive additional facts. In this area called ontology-based data management
(OBDM) one of the main research problems is to identify ontology languages
and queries for which query answering scales to large amounts of data [11, 6]. In
DL, ontologies take the form of a TBox, data is stored in an ABox, and the most
important class of queries are conjunctive queries (CQs). A basic observation
regarding this setup is that even for DLs from the DL-Lite family that have
been designed for tractable OBDM the addition of datatypes to the TBoxes or
the CQs typically leads to non-tractable query answering problems [2, 20]. As
a consequence of this, the use of datatypes in ontology and query languages
for OBDM has been severely restricted. For example, the OWL2 QL standard
admits datatypes with unary predicates only. Nevertheless, in applications there
is clearly a need for more expressive datatypes and, in particular, datatypes with
predicates of higher arity.
    The aim of this paper is to revisit OBDM with expressive datatypes from a
new, non-uniform, perspective. Instead of the standard approach that aims at
the definition of DLs L and query languages Q such that for any TBox T in L
and any query q in Q, answering q under T is tractable in data-complexity we
now aim at describing the complexity of query answering with datatypes at a
more fine-grained level by taking into account the way in which datatype atoms
2        André Hernich, Julio Lemos, and Frank Wolter

can occur in queries. In more detail, an ontology-mediated query (OMQ) Q is a
triple Q = (T , q, D) consisting of a TBox T 1 , a CQ q, and a datatype D (that
we identify with a relational structure (D, R1 , . . .)). The TBox T in Q can refer
to the datatype D using existential restrictions ∃U where U is an attribute. The
CQ q can contain atoms using attributes and relations Ri from D. We aim at
a classification of the data complexity of query answering with OMQs (T , q, D)
based on the datatype pattern dtype(q) of q that consists of all atoms using the
relations in D. To illustrate, the CQ q uses the datatype D = (Q, ≤) and asks
for all rectangles whose width is larger than its height:
            q(x)   ←     rectangle(x) ∧ height(x, u) ∧ width(x, v) ∧ u ≤ v.
Thus, height and width are attributes and the datatype pattern dtype(q) of q is
u ≤ v. The following CQ q 0 uses (Q, ≤) as well and asks for all countries having
a neigbour to the west that is larger and a neighbour to the east that is smaller:
        q 0 (x) ← country(x) ∧ westneighbour(x, y1 ) ∧ eastneighbour(x, y2 ) ∧
                  area(x, u) ∧ area(y1 , u1 ) ∧ area(y2 , u2 ) ∧
                  (u ≤ u1 ) ∧ (u2 ≤ u).
Its datatype pattern dtype(q 0 ) is (u ≤ u1 ) ∧ (u2 ≤ u).
    Our main results assume that either the CQ q is a rooted CQ (a CQ in
which each variable is connected to an answer variable not in dtype(q)) or that
the chase of the TBox under consideration is finite for all ABoxes. Under this
assumption, we first show a close link between the complexity of query evaluation
for OMQs with datatype D and the evaluation problem for positive existential
sentences in the structure D̄ in which every relation R has been replaced by its
complement (thus, D̄ = (Q, >) if D = (Q, ≤)). In more detail, we show that
evaluating an OMQ with datatype D and a datatype pattern with at most k
atoms is polynomially reducible to the complement of the problem PEk (D̄) of
evaluating positive existential formulas in k-CNF in D̄. Conversely, PEk (D̄) is
polynomially reducible to the complement of evaluating a single fixed OMQ
(depending only on k) with datatype D and a datatype pattern with nk atoms,
where n is the number of relations in D.
    Basic complexity results can be obtained easily from this observation. For
example, from the fact that PE1 (D) is tractable for D ∈ {(Z, <), (Z, ≤), (Q, <),
(Q, ≤)}, we obtain that evaluating OMQs (T , q, D) in which q is a rooted CQ
whose datatype pattern is a singleton is tractable. Conversely, from the fact
that PE2 (Q, >) is non-tractable we obtain an intractable OMQ (T , q, D) with
datatype D = (Q, ≤) and a rooted CQ q whose datatype pattern consists of two
atoms.
    Our main result is a P/coNP-dichotomy for OMQs using the datatype D =
(Q, ≤). Namely, we show that for any datatype pattern q0 over (Q, ≤) exactly
one of the following two conditions holds (unless P=coNP):
1
    The results presented in this paper do not depend on the particular tractable DL we
    extend with datatypes. To prove the lower bounds we require that the TBox is given
    in a DL containing DL-Litecore ; the upper bounds can be proved for all standard
    Horn-DLs for which CQ evaluation is in PTime.
                       Tractable Ontology-Mediated Queries with Datatypes           3

 – Evaluating rooted OMQs (T , q, D) with dtype(q) = q0 is in PTime.
 – There exists a rooted OMQ (T , q, D) with dtype(q) = q0 whose evaluation
   problem is coNP-hard.
The proof uses a recent dichotomy result by Bodirsky and Kára for temporal con-
straint satisfaction problems [7] and provides a sufficient and necessary condition
for the evaluation problem to be in PTime that can be checked in linear time.
    Due to space limitations, some proofs had to be deferred to the full version
of this paper, which is available on the authors’ websites.
Related Work Expressive DLs with datatypes (or concrete domains) have been
introduced in [4] and studied extensively [15]. In the context of tractable DLs,
reasoning with datatypes has been studied in [3, 17] for EL and in [19, 20, 2] for DL-
Lite. These works focus on finding ontology languages for which typical reasoning
tasks are tractable. In contrast, here we start with ontology languages for which
query answering is intractable in general, and aim at a complexity classification
of query answering guided by the datatype pattern. Our methodology is closely
related to recent work relating OBDM to constraint satisfaction problems [16, 5,
13]. However, here we classify datatype patterns according to the data-complexity
of evaluating the OMQs containing them, whereas in [16] TBoxes are classified
according to the data-complexity of OMQs containing them, and in [5] OMQs
themselves are classified according to their data-complexity. Consequently, here
we establish a link to temporal constraint satisfiaction [7] whereas the work
mentioned above establishes a link to standard constraint satisfaction and the
Feder-Vardi conjecture [12, 10].


2    Preliminaries
A datatype is a tuple D = (D, R1 , R2 , . . . ), where D is a non-empty set and
R1 , R2 , . . . are relations on D. We call dom(D) := D the domain of D. To
simplify the presentation, we will not distinguish between a relation Ri and its
name (i.e., we use Ri both as a relation and as the name of the relation Ri ). The
complement of D, denoted by D̄, is obtained by replacing each k-ary relation Ri
in D by its complement R̄i := Dk \ Ri . For example, we have (Q, ≤) = (Q, >).
    We assume countably infinite and mutually disjoint sets of concept names, role
names, attribute names, and individual names. Concept names will typically be
denoted by A, role names by P , attribute names by U , and individual names by
a, b, c. The logic DL-Litecore and Horn DLs such as Horn-ALCHIQ are defined as
usual [11, 1, 14, 9]. We consider the extension Lattrib of such DLs L in which the
existential restrictions ∃U for attribute names U can occur in exactly the same
places in concepts and concept inclusions as concept names can occur in L. Unless
stated otherwise, TBoxes range over Lattrib TBoxes, where L is DL-Litecore or
any standard Horn-DL with data complexity for CQs in PTime.
    Let D be a datatype. A D-ABox consists of assertions of the form A(a),
P (a, b), and U (a, u), where A is a concept name, P is a role name, U is an
attribute name, a, b are individual names, and u ∈ dom(D). A D-knowledge base
(D-KB) is a pair (T , A) consisting of a TBox T and a D-ABox A.
4       André Hernich, Julio Lemos, and Frank Wolter

     An interpretation I = (∆I , ·I ) over a datatype D consists of a non-empty
domain ∆I = ∆Iind ∪dom(D) and an interpretation function ·I that assigns to each
concept name A a set AI ⊆ ∆Iind , to each role name P a relation P I ⊆ ∆Iind ×∆Iind ,
and to each attribute name U a relation U I ⊆ ∆Iind × dom(D). The elements in
∆Iind are called individuals, whereas the elements in dom(D) are called data values.
We assume that ∆Iind and dom(D) are disjoint. Throughout this paper, we make
the standard name assumption: if I is an interpretation, then we set aI := a for
all individual names a. We also set uI := u for each u ∈ dom(D), and RI := R
for each relation R of D. The interpretation I induces the interpretations C I
and S I for each complex concept C and complex role S in the standard way.
     We say that I is a model of a KB (T , A) if aI ∈ AI , (aI , bI ) ∈ P I , and
(a , uI ) ∈ U I for all assertions A(a), P (a, b), and U (a, u) in A, and if for every
   I

inclusion X v Y in T we have X I ⊆ Y I . A KB (T , A) is satisfiable if it has a
model; in this case we say that A is satisfiable relative to T .
     We consider conjunctive queries (CQs) q (over D) of the form q(x̄) ← ϕ,
where x̄ is the tuple of answer variables of q, and ϕ is a conjunction of atomic
formulas of the form A(y), P (y, z), U (y, u), or R(u1 , . . . , uk ), where A, P , U , and
R range over concept names, role names, attribute names, and relation names in
D, respectively; each y, z is a variable; and each u, u1 , . . . , uk is a variable. As
usual, all variables of x̄ must occur in some atom of ϕ. The size |q| of q is the
number of atoms in q. The datatype pattern of q, denoted by dtype(q), is the
conjunction of all atoms in ϕ that use a relation in D. The variables that occur
in dtype(q) are called data variables. We assume that all data variables occur in
some atom U (·, ·) outside of dtype(q). A match of q in an interpretation I is a
mapping µ from the variables of ϕ to ∆I such that for each atom X(t1 , . . . , tk )
of ϕ we have (µ(t1 ), . . . , µ(tk )) ∈ X I . A tuple c̄ of individual names and data
values is an answer to q in an interpretation I if there is a match µ of q in I
such that µ(x̄) = c̄. We denote this by I |= q(c̄). Given a KB (T , A), we write
T , A |= q(c̄) if c̄ is an answer to q in every model of (T , A).
     The connection graph of a CQ q is the undirected graph with the variables
of q as its vertices and an edge between any two distinct variables if they occur
together in an atom of q that does not belong to dtype(q). We say that q is rooted
if for every variable y of q the connection graph contains a path from y to an
answer variable of q.
     We consider ontology-mediated queries (OMQs) of the form Q = (T , q, D),
where D is a datatype, T is a TBox, and q is a CQ over D. Given a D-ABox
A and a tuple c̄, we write A |= Q(c̄) if (T , A) |= q(c̄). An OMQ Q = (T , q, D)
is rooted if q is rooted. The query evaluation problem for Q is the problem to
decide for given D-ABox A and c̄ whether A |= Q(c̄).


3    Query Evaluation and Positive Existential Sentences

We establish a tight link between the OMQ evaluation problem with datatype D
and the satisfaction problem for positive existential sentences over D̄. To this end
we first introduce a variant of the universal (or canonical) model for standard
                         Tractable Ontology-Mediated Queries with Datatypes              5

Horn-DL KBs. In contrast to KBs with Horn-DL TBoxes without datatypes, in
general there does not exist a universal model for KBs with datatypes.
Example 1. Consider the KB (T , A) with T = {A v ∃U1 , A v ∃U2 } and A =
{A(a)} and with datatype (Q, ≤). Consider the OMQs Qi = (T , qi , (Q, ≤)),
i = 1, 2, where
q1 (x) ← U1 (x, u1 ) ∧ U2 (x, u2 ) ∧ u1 ≤ u2   q2 (x) ← U1 (x, u1 ) ∧ U2 (x, u2 ) ∧ u2 ≤ u1
Clearly A 6|= Q1 (a) since for the interpretation I with U1I = {(a, 2)} and
U2I = {(a, 1)} we have I 6|= q1 (a). Also, A 6|= Q2 (a) since for the interpretation J
with U1J = {(a, 1)} and U2J = {(a, 2)} we have J 6|= q2 (a). However, there does
not exist a model I of T and A such that I 6|= qi (a) for both i = 1 and i = 2.
The reason that universal models do not exist is that distinct interpretations
of attributes can be required to refute the entailment of CQs. The notion of
pre-interpretation formalizes this intuition: it fixes the interpretation of concept
and roles names but leaves the interpretation of attributes open by adding
placeholders for data values (called data nulls) to the set of possible values
of attributes. A pre-interpretation J over D is the same as an interpretation
over D with the exception that attribute names U are now interpreted as sets
U J ⊆ ∆J                      J             J
          ind × (dom(D) ∪ ∆null ), where ∆null is a set of data nulls disjoint from
  J
∆ind ∪ dom(D). The definitions of the interpretations C J of a concept C and S J
of a role S are extended from interpretations to pre-interpretations in the obvious
way. A pre-model of a KB is a pre-interpretation that satisfies all assertions and
inclusions in the KB.
    Pre-interpretations J can be completed to interpretations by assigning data
values to data nulls. A completion function f for J is a mapping f : ∆J        null →
dom(D). The completion f (J ) of J by f is the interpretation I obtained from
J by setting AI = AJ for all concept names A, P I = P J for all role names P ,
and U I = (U J ∩ (∆J                                            J        J
                      ind × dom(D))) ∪ {(d, f (v)) | (d, v) ∈ U , v ∈ ∆null } for all
attribute names U . An interpretation I is called a completion of J if there exists
a completion function f for J such that f (J ) = I.
    Using a straightforward modification of the standard chase procedure for
Horn-DLs (see, e.g., [9]) one can construct a pre-model can(T , A) of any satisfiable
D-KB (T , A) such that for any CQ q over D and any c̄:
   (T , A) |= q(c̄)   ⇔     f (can(T , A)) |= q(c̄) for all completion functions f .
We call can(T , A) with this property a universal pre-model of T and A.
Lemma 1. For every satisfable D-KB (T , A) there exists a universal pre-model
can(T , A).
Example 2. A universal pre-model can(T , A) for the KB (T , A) given in Exam-
                                                                         can(T ,A)
ple 1 is given by setting ∆can(T ,A) = {a, u1 , u2 }; Acan(T ,A) = {a}; U1         =
                 can(T ,A)
{(a, u1 )}; and U2         = {(a, u2 )}.
    A completion function f for can(T , A) maps u1 and u2 to rational numbers
and defines a completion f (can(T , A)) in which U1 is interpreted as (a, f (u1 ))
and U2 is interpreted as (a, f (u2 )).
6        André Hernich, Julio Lemos, and Frank Wolter

The universal pre-model can(T , A) can be infinite. If we are given a rooted OMQ
Q = (T , q, D), then it is sufficient to consider the subinterpretation cann (T , A) of
can(T , A) induced by the set of domain elements that are reachable from ABox
elements in at most n = |q| steps. We call cann (T , A) a n-universal pre-model of
T and A. As Q is rooted, it has the following property for any c̄:

    (T , A) |= q(c̄)   ⇔     f (cann (T , A)) |= q(c̄) for all completion functions f .

A straightforward modification of the standard chase shows that an n-universal
pre-model cann (T , A) can be computed in polynomial time in the size of A [9].
Lemma 2. Assume OMQ Q = (T , q, D) is given. Then one can compute for
any ABox A that is satisfiable relative to T a |q|-universal pre-model of T and
A in polynomial time in the size of A.
A positive existential sentence Φ over a datatype D is a first-order sentence
built from atomic formulas over the relations in D by using solely conjunction,
disjunction and existential quantifiers. Atomic formulas can use both individual
variables and constants from D. Φ is in Conjunctive Normal Form (CNF) if it
has the form
                            m
                            ^                           ni
                                                        _
                  Φ = ∃x̄         ci ,   where ci =           ci,j for i = 1, . . . m
                            i=1                         j=1

where ci,j are atomic formulas. If ni = k, for each i, then we say that Φ is in
k-CNF. The problem of deciding whether a positive existential sentence in k-CNF
is satisfied in D is denoted PEk (D). We now show that we have a polynomial
time reduction from evaluating OMQs over D to the complement of the problem
PEk (D̄) and vice versa.
Theorem 1. Let k > 0 and let D = (D, R1 , . . . , Rn ) be a datatype.
   Let Q = (T , q, D) be a rooted OMQ and assume that dtype(q) has k atoms.
Then evaluating Q is polynomially reducible to the complement of PEk (D̄).
   Conversely, there is a rooted OMQ Q = (T , q, D) such that dtype(q) has nk
atoms and PEk (D̄) is polynomially reducible to the complement of evaluating Q.
Proof. Assume q is given as q(x̄) ← ϕ. Let A be an ABox satisfiable relative to
T and let c̄ be a tuple of individual names and data values in A of the same
length as x̄. Remove from ϕ the datatype pattern of q and denote by ψ the
remaining atoms in ϕ. A match of ψ in a pre-interpretation I is a mapping µ
from the variables in ψ to ∆I such that for each atom X(t1 , . . . , tk ) in ψ we have
(µ(t1 ), . . . , µ(tk )) ∈ X I . Consider the set X of all matches µ of ψ with µ(x̄) = c̄
                                                                   Vk
in cann (T , A), where n = |q|. Now assume that dtype(q) = i=1 Si (z̄i ) and let
                                                k
                                              ^ _
                                   Φ := ∃ū             S̄i (µ(z̄i )),
                                              µ∈X i=1

where ū is a repetition-free enumeration of all data nulls in the set {µ(z̄i ) | µ ∈ X,
1 ≤ i ≤ k} (here we identify data nulls with individual variables in the k-CNF
                                         Tractable Ontology-Mediated Queries with Datatypes                                                      7

Φ). It is readily checked that T , A |= q(c̄) if, and only if, D̄ 6|= Φ. This establishes
the first part.
                                      Vm                   Wk
    Conversely, assume Φ = ∃x̄ i=1 ci , where ci = j=1 ci,j for 1 ≤ i ≤ m. We
first assume that Φ is uniform, that is, for each 1 ≤ j ≤ k there exists a relation
Sj (independent from i) such that ci,j is of the form S̄j (t̄i,j ). In this case we
can construct the required OMQ (T , q, D) with |dtype(q)| = k. The TBox T is
independent from k and defined as T = {A v ∃U }. Assume that the relations Sj
are of arity lj and ci,j = S̄j (t̄i,j ) for all 1 ≤ i ≤ m. Before defining the CQ q we
define an ABox AΦ as follows:
 – AΦ uses individuals cΦ and c1 , . . . , cm that are connected by a role name P
   using the assertions P (cΦ , ci ) for 1 ≤ i ≤ m;
 – in addition AΦ uses individuals ci,j which are connected to the individuals ci
   by role names P1 , . . . , Pk using the assertions Pj (ci , ci,j ) for 1 ≤ i ≤ m and
   1 ≤ j ≤ k;
 – in addition AΦ uses individuals dt for each variable and constant t in Φ that
   are connected to the ci,j using role names N1j , . . . , Nljj for 1 ≤ j ≤ k and the
   assertions Nrj (ci,j , dt ) if the r-th component of t̄i,j equals t;
 – finally AΦ contains A(dt ) if t is a variable in Φ and U (dt , t) if t is a constant
   in Φ.
Define the query q(x) ← ψ, by setting
                      k                                  lj                                                                     
                      ^                                   ^
                                                                    Nrj (yj , zj,r )∧U (zj,r , uj,r ) ∧Sj (uj,1 , . . . , uj,lj )
                                                                                                     
ψ = P (x, y)∧                    Pj (y, yj )∧
                      j=1                                r=1

It is not difficult to show that D̄ |= Φ if, and only if, T , AΦ 6|= q(cΦ ). For
                                                                                  
        Φ0 = ∃x1 ∃x2 (R1 (1, x1 ) ∨ R2 (x2 , x1 )) ∧ (R1 (x1 , x2 ) ∨ R2 (x2 , 2))

the ABox AΦ and query q are shown in Figure 1.

                                              cΦ
                                  P                      P                                                          x
                      c1                                            c2
                                                                                                                P
                                                                                                               y
                 P1         P2                                 P1         P2                             P1                   P2
       c11                            c12          c21                             c22             y1                               y2
                                                                                                  N11         N21       N12          N22
                                  N22                    N21                                z11          z12             z21               z22
          N11         N21               N11        N12              N12        N22

        d1        A         dx1                              dx2                   d2       u11          u12             u21               u22
             U                                                                 U                        R1                     R2
         1                                                                         2

                                                   Fig. 1. ABox AΦ0 and CQ q


    It remains to consider the case in which Φ is not uniform. We may assume that
Ri 6= ∅ for all 1 ≤ i ≤ n. We equivalently transform Φ into a uniform sentence Ψ
8         André Hernich, Julio Lemos, and Frank Wolter

in nk-CNF over D̄. To this end, each conjunct ci of Φ is equivalently transformed
into a conjunct c0i of the form
                                k
                                _                               k
                                                                _
                                      R̄1 (t̄1i,j ) ∨ · · · ∨         R̄n (t̄ni,j )
                                j=1                             j=1


We construct R̄1 (t̄1i,j ) for a fixed i and 1 ≤ j ≤ k. The remaining atoms are
constructed in the same way. ci contains between 0 and k disjuncts of the form
R̄1 (t̄). Thus, if ci contains at least one disjunct of this form we take sufficiently
many copies to obtain R̄1 (t̄1i,1 )∨. . . , ∨R1 (t̄1i,k ) that is equivalent to the disjunction
over all atoms of the form R̄1 (t̄) in ci . If ci does not contain any R̄1 (t̄), then
take c̄ with c̄ ∈ R1 and let t̄1i,1 = · · · = t̄1i,k = c̄. Then R̄1 (t̄1i,1 ) ∨ . . . ∨ R̄1 (t̄1i,k ) is
unsatisfiable, as required.                                                                           t
                                                                                                      u

    We illustrate Theorem 1 by transfering some complexity results from temporal
CSP over (Q, <) to OMQs. Recall that we are after a complexity classification.
The following result shows that answering OMQs with datatype (Q, ≤) can be
intractable already for datatype patterns of size two.

Corollary 1. There is a rooted OMQ Q = (T , q, (Q, ≤)) with T = {A v ∃U }
and |dtype(q)| = 2 such that evaluating Q is coNP-hard.

Proof. The BETWEENNESS problem in        V temporal constraints satisfaction is the
problem to decide whether β := ∃u (x,y,z)∈C (x < y < z ∨ z < y < x) is
satisfiable in (Q, <), where C is a set of triples of the form (x, y, z). This problem
is NP-complete [18]. Clearly, β is equivalent to the 2-CNF
       ^
∃u           (x < y ∨ z < y) ∧ (x < y ∨ y < x) ∧ (y < z ∨ z < y) ∧ (y < z ∨ y < x).
    (x,y,z)∈C

The claim now follows from Theorem 1.                                                                t
                                                                                                     u

This is optimal as shown by the following result.
Corollary 2. Let D ∈ {(Z, <), (Z, ≤), (Q, <), (Q, ≤)}. Then evaluating rooted
OMQs Q = (T , q, D) with |dtype(q)| = 1 is in PTime.

Proof. Follows from Theorem 1 and the observation that satisfiability of sentences
in 1-CNF in D is decidable in PTime.                                            t
                                                                                u


4     A Dichotomy for (Q, ≤)
In this section, we focus on rooted OMQs that use the datatype (Q, ≤). We prove
a P/coNP-dichotomy of evaluating such OMQs based on their datatype pattern,
and provide a simple syntactic characterization of the datatype patterns of rooted
OMQs with datatype (Q, ≤) for which the evaluation problem can be solved
in polynomial time (Theorem 4). These results are based on a recent dichtomy
result by Bodirsky and Kára [7] for temporal constraint satisfaction problems.
                              Tractable Ontology-Mediated Queries with Datatypes                          9

      We start by reviewing the temporal constraint satisfaction framework of [7]. A
temporal constraint language is a logical structure Γ = (Q, R1 , R2 , . . . ), where each
Ri of arity k is definable by a first-order formula Φ(x1 , . . . , xk ) over (Q, <), i.e.,
Ri = {(a1 , . . . , ak ) ∈ Qk | (Q, <) |= Φ(a1 , . . . , ak )}. A primitive positive sentence
over Γ is a first-order sentence over Γ built from atomic formulas using conjunction
and existential quantification. It is crucial for the results in [7] that both first-
order definitions of relations in Γ and primitive positive sentences over Γ do not
contain constants. The problem of deciding whether a primitive positive sentence
over Γ is satisfied in Γ is denoted by CSP(Γ ).
      Bodirsky and Kára [7] proved that for every temporal constraint language
Γ , CSP(Γ ) is either in PTime or NP-complete. They characterize the temporal
languages Γ for which CSP(Γ ) is tractable in terms of preservation properties of
the relations in Γ . A function f : Qk → Q preserves a relation R ⊆ Qn if for all
t1 , . . . , tk ∈ R we have f (t1 , . . . , tk ) ∈ R. Here, f (t1 , . . . , tk ) is obtained as follows.
Given a tuple t of length n and an integer i ∈ {1, . . . , n}, let t[i] denote the ith                     
component of t. Then, f (t1 , . . . , tk ) = f (t1 [1], . . . , tk [1]), . . . , f (t1 [n], . . . , tk [n]) .
We say that f preserves a temporal constraint language Γ if f preserves all
relations in Γ . The following functions are considered in [7]:
  – min : Q2 → Q which maps its two arguments to the minimal one;
  – mi : Q2 → Q which maps (x, y) ∈ Q2 to α(x) if x = y, to β(y) if x > y, and to
    γ(x) if x < y, where α, β, γ are any functions with α(x) < β(x) < γ(x) < α(y)
    for all x < y;
  – mx : Q2 → Q which maps (x, y) ∈ Q2 to β(x) if x = y, and to α(min{x, y})
    if x 6= y, where α, β are any functions with α(x) < β(x) < α(y) for all x < y;
  – ll : Q2 → Q which is any function that satisfies ll (x, y) < ll (x0 , y 0 ) iff x ≤ 0
    and (x, y) is lexicographically smaller than (x0 , y 0 ), or x, x0 > 0 and (y, x) is
    lexicographically smaller than (y 0 , x0 );
  – The dual of f ∈ {min, mi , mx , ll }, which maps (x, y) ∈ Q2 to −f (−x, −y).
Theorem 2 ([7]). Let Γ be a temporal constraint language. If Γ is preserved
under min, mi , mx , ll , one of their duals, or a constant function, then CSP(Γ )
is in PTime. Otherwise, CSP(Γ ) is NP-complete.
    We now translate the evaluation problem for rooted OMQs over (Q, ≤) into
the temporal constraint satisfaction framework. With every datatype pattern
                       Vk
q0 (z1 , . . . , zn ) = i=1 zsi ≤ zti we associate the temporal constraint language
                                                                                      Wk
 Γq0 := (Q, <, Rq0 ) where Rq0 := {(c1 , . . . , cn ) ∈ Qn | (Q, <) |=                  i=1 cti < csi }.
                                      Vk
Theorem 3. Let q0 (z1 , . . . , zn ) = i=1 zsi ≤ zti be a datatype pattern.
    If Q = (T , q, (Q, ≤)) is a rooted OMQ with dtype(q) = q0 , then evaluating Q
is polynomially reducible to the complement of CSP(Γq0 ).
    Conversely, there is a rooted OMQ Q = (T , q, (Q, ≤)) with dtype(q) = q0 such
that CSP(Γq0 ) is polynomially reducible to the complement of evaluating Q.
Proof (Sketch). We refine the proof of Theorem 1. For the first part, we construct
the positive existential sentence as in the proof of Theorem 1, and then express
10      André Hernich, Julio Lemos, and Frank Wolter

each disjunctive clause by a single Rq0 -atom. The result is a primitive positive
sentence Ψ 0 over Γq0 . Note that Γq0 may still contain constants. To obtain a
primitive positive sentence Ψ without constants, we turn each constant in Ψ 0
into an existentially quantified variable and add constraints that ensure that
the order of the elements assigned to these variables corresponds to the order
                                                                             0
of the constants. More precisely, let
                      Vl−1             c1 < · · · < cl be the constants in Ψ . Then,
Ψ = ∃c1 · · · ∃cl Ψ ∧ i=1 ci < ci+1 . It can be shown that Γq0 |= Ψ iff Γq0 |= Ψ 0 .
                   0

    For the second part, we first translate a given primitive positive sentence
over Γq0 into a positive existential sentence over (Q, >). Atoms using the relation
Rq0 are replaced by disjunctive formulas expressing the predicate defining the
relation Rq0 . Atoms using < are expanded into disjunctions with k atoms. The
resulting sentence is in k-CNF. The second part of the theorem now follows from
the construction in the proof of Theorem 1.                                        t
                                                                                   u

   Since < is preserved under min, mi , mx , ll and their duals, but not under
any constant function, Theorem 2 implies that CSP(Γq0 ) is in PTime iff Rq0 is
preserved under min, mi , mx , ll or one of their duals. Together with Theorem 3,
we obtain the following corollary.

Corollary 3. Let q0 be a datatype pattern. If Rq0 is preserved under min, mi ,
mx , ll , or one of their duals, then evaluating rooted OMQs (T , q, (Q, ≤)) with
dtype(q) = q0 is in PTime. Otherwise, there is a rooted OMQ Q = (T , q, (Q, ≤))
with dtype(q) = q0 such that evaluating Q is coNP-complete.

    To illustrate, consider the datatype pattern q0 (x, y, z) = x ≤ y ∧ y ≤ z. It
is straightforward to verify that Rq0 = {(a, b, c) ∈ Q3 | a > b ∨ b > c} is not
preserved under min, mi , mx , ll or their duals, so there are OMQs (T , q, (Q, ≤))
with dtype(q) = q0 for which theVevaluation problem       Vn is coNP-complete. On the
                                           n
other hand, if q0 has the form i=1 x0 ≤ xi or i=1 xi ≤ x0 , then it follows
from [8, Proposition 3.5] that Rq0 is preserved under ll or its dual. In particular,
evaluation of OMQs (T , q, (Q, ≤)) with dtype(q) = q0 is in PTime. In fact, we will
now show that these two forms of datatype patterns, which we call min-pattern
and max-pattern, respectively, essentially characterize all the tractable cases.
    The following lemma is the core of the characterization result. It implies that
preservation under min, mi , mx or their duals collapses to preservation under ll
or its dual for relations that are definable by normal disjunctive formulas. By
a disjunctive formula we mean a disjunction of atoms of the form x < y. A
disjunctive formula Φ(x1 , . . . , xn ) is normal if the directed graph with vertex set
{x1 , . . . , xn } and edge set {(xj , xi ) | xi < xj ∈ Φ} is acyclic.

Lemma 3. Let R ⊆ Qn be defined by a normal disjunctive formula Φ(x1 , . . . , xn )
over (Q, <). If RWis preserved under
                                 Wn min, mi , mx , ll , or one of their duals, then
                  n
Φ has the form i=1 x0 > xi or i=1 xi > x0 .

In particular,
          Wn [8, Proposition
                         Vn 3.5] implies that a relation defined by a formula of
the form i=1 x0 > xi or i=1 xi > x0 is preserved under ll or its dual.
   We apply the lemma to relations Rq0 for acyclic datatype patterns q0 . A
datatype pattern q0 is acyclic if the directed graph with the variables of q0 as
                          Tractable Ontology-Mediated Queries with Datatypes                11

vertices and an edge (x, y) for each atom x ≤ y of q0 is acyclic. Since a cycle
x0 ≤ x1 ∧x1 ≤ x2 ∧· · ·∧xn ≤ x0 tells us that x0 , x1 , . . . , xn have to be mapped to
the same data value, we can eliminate any cycle by removing all of its atoms, and
replacing x1 , . . . , xn by x0 . Let q0acyclic be the acyclic datatype pattern obtained
from a datatype pattern q0 by eliminating all of its cycles.
Theorem 4. Let q0 be a datatype pattern over (Q, ≤). If q0acyclic is a min-pattern
or a max-pattern, then evaluating rooted OMQs (T , q, (Q, ≤)) with dtype(q) = q0 is
in PTime. Otherwise, there is a rooted OMQ Q = (T , q, (Q, ≤)) with dtype(q) = q0
such that evaluating Q is coNP-complete.
Proof. To simplify the presentation, we assume without loss of generality that
q0 is acyclic. By Corollary 3, it suffices to show that q0 is a min-pattern or a
max-pattern iff Rq0 is preserved under min, mi , mx , ll , or one of their       Wnduals.
     If q0 is a min-pattern, then Rq0 is defined by a formula of the form i=1 x0 >
xi . Similarly, if q0 is a max-pattern, then Rq0 is defined by a formula of the form
W n
  i=1 xi > x0 . It is known [8, Proposition 3.5] that relations defined by such
formulas are preserved under ll and dual-ll , respectively.
     Now suppose that Rq0 is preserved under min, mi , mx , ll , or one of their
                                  Vk
duals. Let q0 (z1 , . . . , zn ) = i=1 zsi ≤ zti . Then, Rq0 is defined by Φ(z1 , . . . , zn ) =
Wk
  i=1 zti < zsi . Clearly, Φ is disjunctive, and W    since q0 is acyclic
                                                                       Wnit is also normal.
                                                        n
Thus, Lemma 3 implies that Φ has the form i=1 x0 > xi or i=1 xi > x0 . This
implies that q0 is a min-pattern or a max-pattern.                                             t
                                                                                               u
   Note that the fact that q0acyclic is neither a min-pattern nor a max-pattern
does not imply that evaluation is hard for all OMQs Q = (T , q, (Q, ≤)) with
dtype(q) = q0 . For example, q0acyclic may have several connected components, each
a min-pattern or a max-pattern. If no component is connected to another one via
a path of existential variables in q one can show that evaluating Q is in PTime.


5    Conclusion
We have presented first promising results for a non-uniform complexity analysis
of ontology-mediated queries with expressive datatypes. Many research questions
arise within this framework, including the following: (1) It remains an interesting
open problem whether our results generalize to non-rooted OMQs. Such OMQs
can “look” arbitrarily deep into a universal pre-model. We suspect that a finite
portion of a universal pre-model suffices, but this is far from obvious. (2) The
TBoxes we consider have very limited expressive power regarding datatypes
and it would be of interest to generalize our method to TBoxes that admit
more constructors using datatypes. (3) In this paper, we have used datatype
patterns within CQs to classify the complexity of OMQs. It would be of interest
to complement and extend this approach with classifications based on TBoxes,
OMQs, or extended patterns in CQs that take into account additional atoms
not using datatype relations. (4) In addition to the PTime/coNP dichotomy
considered above, it would be of interest to investigate FO-rewritability and
Datalog-rewritability of OMQs as well [5].
12      André Hernich, Julio Lemos, and Frank Wolter

References
 1. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
    and relations. J. Artif. Intell. Res. (JAIR) 36, 1–69 (2009)
 2. Artale, A., Ryzhikov, V., Kontchakov, R.: DL-Lite with attributes and datatypes.
    In: ECAI. pp. 61–66 (2012)
 3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: IJCAI-05. pp.
    364–369 (2005)
 4. Baader, F., Hanschke, P.: A scheme for integrating concrete domains into concept
    languages. In: IJCAI 1991. pp. 452–457
 5. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A
    study through disjunctive datalog, csp, and MMSNP. ACM Trans. Database Syst.
    39(4), 33:1–33:44 (2014)
 6. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable
    description logics. In: Reasoning Web 2015. pp. 218–307
 7. Bodirsky, M., Kára, J.: The complexity of temporal constraint satisfaction problems.
    J. ACM 57(2) (2010)
 8. Bodirsky, M., Kára, J.: A fast algorithm and datalog inexpressibility for temporal
    reasoning. ACM Trans. Comput. Log. 11(3) (2010), http://doi.acm.org/10.1145/
    1740582.1740583
 9. Botoeva, E., Kontchakov, R., Ryzhikov, V., Wolter, F., Zakharyaschev, M.: Query
    inseparability for description logic knowledge bases. In: KR 2014
10. Bulatov, A.A., Jeavons, P., Krokhin, A.A.: Classifying the complexity of constraints
    using finite algebras. SIAM J. Comput. 34(3), 720–742 (2005)
11. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. Autom. Reasoning 39(3), 385–429 (2007)
12. Feder, T., Vardi, M.Y.: Monotone monadic SNP and constraint satisfaction. In:
    Proc. of the ACM Symposium on Theory of Computing. pp. 612–622 (1993)
13. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In:
    IJCAI 2015. pp. 3048–3054
14. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive
    description logics. In: IJCAI 2005. pp. 466–471
15. Lutz, C.: Description logics with concrete domains-a survey. In: Advances in Modal
    Logic 4. pp. 265–296 (2002)
16. Lutz, C., Wolter, F.: Non-uniform data complexity of query answering in description
    logics. In: KR 2012
17. Magka, D., Kazakov, Y., Horrocks, I.: Tractable extensions of the description logic
    EL with numerical datatypes. J. Autom. Reasoning 47(4), 427–450 (2011)
18. Opatrny, J.: Total ordering problem. SIAM J. Comput. 8(1), 111–114 (1979)
19. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
    Linking data to ontologies. J. Data Semantics 10, 133–173 (2008)
20. Savkovic, O., Calvanese, D.: Introducing datatypes in DL-Lite. In: ECAI 2012. pp.
    720–725