Constraint Patterns for Tractable Ontology-Mediated Queries with Datatypes André Hernich, Julio Lemos, and Frank Wolter University of Liverpool, UK {hernich,jlemos,wolter}@liverpool.ac.uk Abstract. Adding datatypes to ontology-mediated conjunctive queries (OMQs) often makes query answering hard. This applies, in particular, to datatypes with non-unary predicates. In this paper we propose a new, non- uniform way, of analysing the data-complexity of OMQ answering with datatypes containing higher-arity predicates. We aim at a classification of the patterns of datatype atoms in OMQs into those that can occur in non- tractable OMQs and those that only occur in tractable OMQs. Our main result is a P/coNP-dichotomy for OMQs over DL-Lite TBoxes and rooted CQs using the datatype (Q, ≤). The proof employs a recent dichotomy result by Bodirsky and Kara for temporal constraint satisfaction problems. 1 Introduction In recent years, querying data using ontologies has become one of the main applications of description logics (DLs). The general idea is that an ontology is used to enrich incomplete and heterogeous data with a semantics and with background knowledge, thus serving as an interface for querying data and allowing to derive additional facts. In this area called ontology-based data management (OBDM) one of the main research problems is to identify ontology languages and queries for which query answering scales to large amounts of data [11, 6]. In DL, ontologies take the form of a TBox, data is stored in an ABox, and the most important class of queries are conjunctive queries (CQs). A basic observation regarding this setup is that even for DLs from the DL-Lite family that have been designed for tractable OBDM the addition of datatypes to the TBoxes or the CQs typically leads to non-tractable query answering problems [2, 20]. As a consequence of this, the use of datatypes in ontology and query languages for OBDM has been severely restricted. For example, the OWL2 QL standard admits datatypes with unary predicates only. Nevertheless, in applications there is clearly a need for more expressive datatypes and, in particular, datatypes with predicates of higher arity. The aim of this paper is to revisit OBDM with expressive datatypes from a new, non-uniform, perspective. Instead of the standard approach that aims at the definition of DLs L and query languages Q such that for any TBox T in L and any query q in Q, answering q under T is tractable in data-complexity we now aim at describing the complexity of query answering with datatypes at a more fine-grained level by taking into account the way in which datatype atoms 2 André Hernich, Julio Lemos, and Frank Wolter can occur in queries. In more detail, an ontology-mediated query (OMQ) Q is a triple Q = (T , q, D) consisting of a TBox T 1 , a CQ q, and a datatype D (that we identify with a relational structure (D, R1 , . . .)). The TBox T in Q can refer to the datatype D using existential restrictions ∃U where U is an attribute. The CQ q can contain atoms using attributes and relations Ri from D. We aim at a classification of the data complexity of query answering with OMQs (T , q, D) based on the datatype pattern dtype(q) of q that consists of all atoms using the relations in D. To illustrate, the CQ q uses the datatype D = (Q, ≤) and asks for all rectangles whose width is larger than its height: q(x) ← rectangle(x) ∧ height(x, u) ∧ width(x, v) ∧ u ≤ v. Thus, height and width are attributes and the datatype pattern dtype(q) of q is u ≤ v. The following CQ q 0 uses (Q, ≤) as well and asks for all countries having a neigbour to the west that is larger and a neighbour to the east that is smaller: q 0 (x) ← country(x) ∧ westneighbour(x, y1 ) ∧ eastneighbour(x, y2 ) ∧ area(x, u) ∧ area(y1 , u1 ) ∧ area(y2 , u2 ) ∧ (u ≤ u1 ) ∧ (u2 ≤ u). Its datatype pattern dtype(q 0 ) is (u ≤ u1 ) ∧ (u2 ≤ u). Our main results assume that either the CQ q is a rooted CQ (a CQ in which each variable is connected to an answer variable not in dtype(q)) or that the chase of the TBox under consideration is finite for all ABoxes. Under this assumption, we first show a close link between the complexity of query evaluation for OMQs with datatype D and the evaluation problem for positive existential sentences in the structure D̄ in which every relation R has been replaced by its complement (thus, D̄ = (Q, >) if D = (Q, ≤)). In more detail, we show that evaluating an OMQ with datatype D and a datatype pattern with at most k atoms is polynomially reducible to the complement of the problem PEk (D̄) of evaluating positive existential formulas in k-CNF in D̄. Conversely, PEk (D̄) is polynomially reducible to the complement of evaluating a single fixed OMQ (depending only on k) with datatype D and a datatype pattern with nk atoms, where n is the number of relations in D. Basic complexity results can be obtained easily from this observation. For example, from the fact that PE1 (D) is tractable for D ∈ {(Z, <), (Z, ≤), (Q, <), (Q, ≤)}, we obtain that evaluating OMQs (T , q, D) in which q is a rooted CQ whose datatype pattern is a singleton is tractable. Conversely, from the fact that PE2 (Q, >) is non-tractable we obtain an intractable OMQ (T , q, D) with datatype D = (Q, ≤) and a rooted CQ q whose datatype pattern consists of two atoms. Our main result is a P/coNP-dichotomy for OMQs using the datatype D = (Q, ≤). Namely, we show that for any datatype pattern q0 over (Q, ≤) exactly one of the following two conditions holds (unless P=coNP): 1 The results presented in this paper do not depend on the particular tractable DL we extend with datatypes. To prove the lower bounds we require that the TBox is given in a DL containing DL-Litecore ; the upper bounds can be proved for all standard Horn-DLs for which CQ evaluation is in PTime. Tractable Ontology-Mediated Queries with Datatypes 3 – Evaluating rooted OMQs (T , q, D) with dtype(q) = q0 is in PTime. – There exists a rooted OMQ (T , q, D) with dtype(q) = q0 whose evaluation problem is coNP-hard. The proof uses a recent dichotomy result by Bodirsky and Kára for temporal con- straint satisfaction problems [7] and provides a sufficient and necessary condition for the evaluation problem to be in PTime that can be checked in linear time. Due to space limitations, some proofs had to be deferred to the full version of this paper, which is available on the authors’ websites. Related Work Expressive DLs with datatypes (or concrete domains) have been introduced in [4] and studied extensively [15]. In the context of tractable DLs, reasoning with datatypes has been studied in [3, 17] for EL and in [19, 20, 2] for DL- Lite. These works focus on finding ontology languages for which typical reasoning tasks are tractable. In contrast, here we start with ontology languages for which query answering is intractable in general, and aim at a complexity classification of query answering guided by the datatype pattern. Our methodology is closely related to recent work relating OBDM to constraint satisfaction problems [16, 5, 13]. However, here we classify datatype patterns according to the data-complexity of evaluating the OMQs containing them, whereas in [16] TBoxes are classified according to the data-complexity of OMQs containing them, and in [5] OMQs themselves are classified according to their data-complexity. Consequently, here we establish a link to temporal constraint satisfiaction [7] whereas the work mentioned above establishes a link to standard constraint satisfaction and the Feder-Vardi conjecture [12, 10]. 2 Preliminaries A datatype is a tuple D = (D, R1 , R2 , . . . ), where D is a non-empty set and R1 , R2 , . . . are relations on D. We call dom(D) := D the domain of D. To simplify the presentation, we will not distinguish between a relation Ri and its name (i.e., we use Ri both as a relation and as the name of the relation Ri ). The complement of D, denoted by D̄, is obtained by replacing each k-ary relation Ri in D by its complement R̄i := Dk \ Ri . For example, we have (Q, ≤) = (Q, >). We assume countably infinite and mutually disjoint sets of concept names, role names, attribute names, and individual names. Concept names will typically be denoted by A, role names by P , attribute names by U , and individual names by a, b, c. The logic DL-Litecore and Horn DLs such as Horn-ALCHIQ are defined as usual [11, 1, 14, 9]. We consider the extension Lattrib of such DLs L in which the existential restrictions ∃U for attribute names U can occur in exactly the same places in concepts and concept inclusions as concept names can occur in L. Unless stated otherwise, TBoxes range over Lattrib TBoxes, where L is DL-Litecore or any standard Horn-DL with data complexity for CQs in PTime. Let D be a datatype. A D-ABox consists of assertions of the form A(a), P (a, b), and U (a, u), where A is a concept name, P is a role name, U is an attribute name, a, b are individual names, and u ∈ dom(D). A D-knowledge base (D-KB) is a pair (T , A) consisting of a TBox T and a D-ABox A. 4 André Hernich, Julio Lemos, and Frank Wolter An interpretation I = (∆I , ·I ) over a datatype D consists of a non-empty domain ∆I = ∆Iind ∪dom(D) and an interpretation function ·I that assigns to each concept name A a set AI ⊆ ∆Iind , to each role name P a relation P I ⊆ ∆Iind ×∆Iind , and to each attribute name U a relation U I ⊆ ∆Iind × dom(D). The elements in ∆Iind are called individuals, whereas the elements in dom(D) are called data values. We assume that ∆Iind and dom(D) are disjoint. Throughout this paper, we make the standard name assumption: if I is an interpretation, then we set aI := a for all individual names a. We also set uI := u for each u ∈ dom(D), and RI := R for each relation R of D. The interpretation I induces the interpretations C I and S I for each complex concept C and complex role S in the standard way. We say that I is a model of a KB (T , A) if aI ∈ AI , (aI , bI ) ∈ P I , and (a , uI ) ∈ U I for all assertions A(a), P (a, b), and U (a, u) in A, and if for every I inclusion X v Y in T we have X I ⊆ Y I . A KB (T , A) is satisfiable if it has a model; in this case we say that A is satisfiable relative to T . We consider conjunctive queries (CQs) q (over D) of the form q(x̄) ← ϕ, where x̄ is the tuple of answer variables of q, and ϕ is a conjunction of atomic formulas of the form A(y), P (y, z), U (y, u), or R(u1 , . . . , uk ), where A, P , U , and R range over concept names, role names, attribute names, and relation names in D, respectively; each y, z is a variable; and each u, u1 , . . . , uk is a variable. As usual, all variables of x̄ must occur in some atom of ϕ. The size |q| of q is the number of atoms in q. The datatype pattern of q, denoted by dtype(q), is the conjunction of all atoms in ϕ that use a relation in D. The variables that occur in dtype(q) are called data variables. We assume that all data variables occur in some atom U (·, ·) outside of dtype(q). A match of q in an interpretation I is a mapping µ from the variables of ϕ to ∆I such that for each atom X(t1 , . . . , tk ) of ϕ we have (µ(t1 ), . . . , µ(tk )) ∈ X I . A tuple c̄ of individual names and data values is an answer to q in an interpretation I if there is a match µ of q in I such that µ(x̄) = c̄. We denote this by I |= q(c̄). Given a KB (T , A), we write T , A |= q(c̄) if c̄ is an answer to q in every model of (T , A). The connection graph of a CQ q is the undirected graph with the variables of q as its vertices and an edge between any two distinct variables if they occur together in an atom of q that does not belong to dtype(q). We say that q is rooted if for every variable y of q the connection graph contains a path from y to an answer variable of q. We consider ontology-mediated queries (OMQs) of the form Q = (T , q, D), where D is a datatype, T is a TBox, and q is a CQ over D. Given a D-ABox A and a tuple c̄, we write A |= Q(c̄) if (T , A) |= q(c̄). An OMQ Q = (T , q, D) is rooted if q is rooted. The query evaluation problem for Q is the problem to decide for given D-ABox A and c̄ whether A |= Q(c̄). 3 Query Evaluation and Positive Existential Sentences We establish a tight link between the OMQ evaluation problem with datatype D and the satisfaction problem for positive existential sentences over D̄. To this end we first introduce a variant of the universal (or canonical) model for standard Tractable Ontology-Mediated Queries with Datatypes 5 Horn-DL KBs. In contrast to KBs with Horn-DL TBoxes without datatypes, in general there does not exist a universal model for KBs with datatypes. Example 1. Consider the KB (T , A) with T = {A v ∃U1 , A v ∃U2 } and A = {A(a)} and with datatype (Q, ≤). Consider the OMQs Qi = (T , qi , (Q, ≤)), i = 1, 2, where q1 (x) ← U1 (x, u1 ) ∧ U2 (x, u2 ) ∧ u1 ≤ u2 q2 (x) ← U1 (x, u1 ) ∧ U2 (x, u2 ) ∧ u2 ≤ u1 Clearly A 6|= Q1 (a) since for the interpretation I with U1I = {(a, 2)} and U2I = {(a, 1)} we have I 6|= q1 (a). Also, A 6|= Q2 (a) since for the interpretation J with U1J = {(a, 1)} and U2J = {(a, 2)} we have J 6|= q2 (a). However, there does not exist a model I of T and A such that I 6|= qi (a) for both i = 1 and i = 2. The reason that universal models do not exist is that distinct interpretations of attributes can be required to refute the entailment of CQs. The notion of pre-interpretation formalizes this intuition: it fixes the interpretation of concept and roles names but leaves the interpretation of attributes open by adding placeholders for data values (called data nulls) to the set of possible values of attributes. A pre-interpretation J over D is the same as an interpretation over D with the exception that attribute names U are now interpreted as sets U J ⊆ ∆J J J ind × (dom(D) ∪ ∆null ), where ∆null is a set of data nulls disjoint from J ∆ind ∪ dom(D). The definitions of the interpretations C J of a concept C and S J of a role S are extended from interpretations to pre-interpretations in the obvious way. A pre-model of a KB is a pre-interpretation that satisfies all assertions and inclusions in the KB. Pre-interpretations J can be completed to interpretations by assigning data values to data nulls. A completion function f for J is a mapping f : ∆J null → dom(D). The completion f (J ) of J by f is the interpretation I obtained from J by setting AI = AJ for all concept names A, P I = P J for all role names P , and U I = (U J ∩ (∆J J J ind × dom(D))) ∪ {(d, f (v)) | (d, v) ∈ U , v ∈ ∆null } for all attribute names U . An interpretation I is called a completion of J if there exists a completion function f for J such that f (J ) = I. Using a straightforward modification of the standard chase procedure for Horn-DLs (see, e.g., [9]) one can construct a pre-model can(T , A) of any satisfiable D-KB (T , A) such that for any CQ q over D and any c̄: (T , A) |= q(c̄) ⇔ f (can(T , A)) |= q(c̄) for all completion functions f . We call can(T , A) with this property a universal pre-model of T and A. Lemma 1. For every satisfable D-KB (T , A) there exists a universal pre-model can(T , A). Example 2. A universal pre-model can(T , A) for the KB (T , A) given in Exam- can(T ,A) ple 1 is given by setting ∆can(T ,A) = {a, u1 , u2 }; Acan(T ,A) = {a}; U1 = can(T ,A) {(a, u1 )}; and U2 = {(a, u2 )}. A completion function f for can(T , A) maps u1 and u2 to rational numbers and defines a completion f (can(T , A)) in which U1 is interpreted as (a, f (u1 )) and U2 is interpreted as (a, f (u2 )). 6 André Hernich, Julio Lemos, and Frank Wolter The universal pre-model can(T , A) can be infinite. If we are given a rooted OMQ Q = (T , q, D), then it is sufficient to consider the subinterpretation cann (T , A) of can(T , A) induced by the set of domain elements that are reachable from ABox elements in at most n = |q| steps. We call cann (T , A) a n-universal pre-model of T and A. As Q is rooted, it has the following property for any c̄: (T , A) |= q(c̄) ⇔ f (cann (T , A)) |= q(c̄) for all completion functions f . A straightforward modification of the standard chase shows that an n-universal pre-model cann (T , A) can be computed in polynomial time in the size of A [9]. Lemma 2. Assume OMQ Q = (T , q, D) is given. Then one can compute for any ABox A that is satisfiable relative to T a |q|-universal pre-model of T and A in polynomial time in the size of A. A positive existential sentence Φ over a datatype D is a first-order sentence built from atomic formulas over the relations in D by using solely conjunction, disjunction and existential quantifiers. Atomic formulas can use both individual variables and constants from D. Φ is in Conjunctive Normal Form (CNF) if it has the form m ^ ni _ Φ = ∃x̄ ci , where ci = ci,j for i = 1, . . . m i=1 j=1 where ci,j are atomic formulas. If ni = k, for each i, then we say that Φ is in k-CNF. The problem of deciding whether a positive existential sentence in k-CNF is satisfied in D is denoted PEk (D). We now show that we have a polynomial time reduction from evaluating OMQs over D to the complement of the problem PEk (D̄) and vice versa. Theorem 1. Let k > 0 and let D = (D, R1 , . . . , Rn ) be a datatype. Let Q = (T , q, D) be a rooted OMQ and assume that dtype(q) has k atoms. Then evaluating Q is polynomially reducible to the complement of PEk (D̄). Conversely, there is a rooted OMQ Q = (T , q, D) such that dtype(q) has nk atoms and PEk (D̄) is polynomially reducible to the complement of evaluating Q. Proof. Assume q is given as q(x̄) ← ϕ. Let A be an ABox satisfiable relative to T and let c̄ be a tuple of individual names and data values in A of the same length as x̄. Remove from ϕ the datatype pattern of q and denote by ψ the remaining atoms in ϕ. A match of ψ in a pre-interpretation I is a mapping µ from the variables in ψ to ∆I such that for each atom X(t1 , . . . , tk ) in ψ we have (µ(t1 ), . . . , µ(tk )) ∈ X I . Consider the set X of all matches µ of ψ with µ(x̄) = c̄ Vk in cann (T , A), where n = |q|. Now assume that dtype(q) = i=1 Si (z̄i ) and let k ^ _ Φ := ∃ū S̄i (µ(z̄i )), µ∈X i=1 where ū is a repetition-free enumeration of all data nulls in the set {µ(z̄i ) | µ ∈ X, 1 ≤ i ≤ k} (here we identify data nulls with individual variables in the k-CNF Tractable Ontology-Mediated Queries with Datatypes 7 Φ). It is readily checked that T , A |= q(c̄) if, and only if, D̄ 6|= Φ. This establishes the first part. Vm Wk Conversely, assume Φ = ∃x̄ i=1 ci , where ci = j=1 ci,j for 1 ≤ i ≤ m. We first assume that Φ is uniform, that is, for each 1 ≤ j ≤ k there exists a relation Sj (independent from i) such that ci,j is of the form S̄j (t̄i,j ). In this case we can construct the required OMQ (T , q, D) with |dtype(q)| = k. The TBox T is independent from k and defined as T = {A v ∃U }. Assume that the relations Sj are of arity lj and ci,j = S̄j (t̄i,j ) for all 1 ≤ i ≤ m. Before defining the CQ q we define an ABox AΦ as follows: – AΦ uses individuals cΦ and c1 , . . . , cm that are connected by a role name P using the assertions P (cΦ , ci ) for 1 ≤ i ≤ m; – in addition AΦ uses individuals ci,j which are connected to the individuals ci by role names P1 , . . . , Pk using the assertions Pj (ci , ci,j ) for 1 ≤ i ≤ m and 1 ≤ j ≤ k; – in addition AΦ uses individuals dt for each variable and constant t in Φ that are connected to the ci,j using role names N1j , . . . , Nljj for 1 ≤ j ≤ k and the assertions Nrj (ci,j , dt ) if the r-th component of t̄i,j equals t; – finally AΦ contains A(dt ) if t is a variable in Φ and U (dt , t) if t is a constant in Φ. Define the query q(x) ← ψ, by setting k  lj  ^ ^ Nrj (yj , zj,r )∧U (zj,r , uj,r ) ∧Sj (uj,1 , . . . , uj,lj )  ψ = P (x, y)∧ Pj (y, yj )∧ j=1 r=1 It is not difficult to show that D̄ |= Φ if, and only if, T , AΦ 6|= q(cΦ ). For  Φ0 = ∃x1 ∃x2 (R1 (1, x1 ) ∨ R2 (x2 , x1 )) ∧ (R1 (x1 , x2 ) ∨ R2 (x2 , 2)) the ABox AΦ and query q are shown in Figure 1. cΦ P P x c1 c2 P y P1 P2 P1 P2 P1 P2 c11 c12 c21 c22 y1 y2 N11 N21 N12 N22 N22 N21 z11 z12 z21 z22 N11 N21 N11 N12 N12 N22 d1 A dx1 dx2 d2 u11 u12 u21 u22 U U R1 R2 1 2 Fig. 1. ABox AΦ0 and CQ q It remains to consider the case in which Φ is not uniform. We may assume that Ri 6= ∅ for all 1 ≤ i ≤ n. We equivalently transform Φ into a uniform sentence Ψ 8 André Hernich, Julio Lemos, and Frank Wolter in nk-CNF over D̄. To this end, each conjunct ci of Φ is equivalently transformed into a conjunct c0i of the form k _ k _ R̄1 (t̄1i,j ) ∨ · · · ∨ R̄n (t̄ni,j ) j=1 j=1 We construct R̄1 (t̄1i,j ) for a fixed i and 1 ≤ j ≤ k. The remaining atoms are constructed in the same way. ci contains between 0 and k disjuncts of the form R̄1 (t̄). Thus, if ci contains at least one disjunct of this form we take sufficiently many copies to obtain R̄1 (t̄1i,1 )∨. . . , ∨R1 (t̄1i,k ) that is equivalent to the disjunction over all atoms of the form R̄1 (t̄) in ci . If ci does not contain any R̄1 (t̄), then take c̄ with c̄ ∈ R1 and let t̄1i,1 = · · · = t̄1i,k = c̄. Then R̄1 (t̄1i,1 ) ∨ . . . ∨ R̄1 (t̄1i,k ) is unsatisfiable, as required. t u We illustrate Theorem 1 by transfering some complexity results from temporal CSP over (Q, <) to OMQs. Recall that we are after a complexity classification. The following result shows that answering OMQs with datatype (Q, ≤) can be intractable already for datatype patterns of size two. Corollary 1. There is a rooted OMQ Q = (T , q, (Q, ≤)) with T = {A v ∃U } and |dtype(q)| = 2 such that evaluating Q is coNP-hard. Proof. The BETWEENNESS problem in V temporal constraints satisfaction is the problem to decide whether β := ∃u (x,y,z)∈C (x < y < z ∨ z < y < x) is satisfiable in (Q, <), where C is a set of triples of the form (x, y, z). This problem is NP-complete [18]. Clearly, β is equivalent to the 2-CNF ^ ∃u (x < y ∨ z < y) ∧ (x < y ∨ y < x) ∧ (y < z ∨ z < y) ∧ (y < z ∨ y < x). (x,y,z)∈C The claim now follows from Theorem 1. t u This is optimal as shown by the following result. Corollary 2. Let D ∈ {(Z, <), (Z, ≤), (Q, <), (Q, ≤)}. Then evaluating rooted OMQs Q = (T , q, D) with |dtype(q)| = 1 is in PTime. Proof. Follows from Theorem 1 and the observation that satisfiability of sentences in 1-CNF in D is decidable in PTime. t u 4 A Dichotomy for (Q, ≤) In this section, we focus on rooted OMQs that use the datatype (Q, ≤). We prove a P/coNP-dichotomy of evaluating such OMQs based on their datatype pattern, and provide a simple syntactic characterization of the datatype patterns of rooted OMQs with datatype (Q, ≤) for which the evaluation problem can be solved in polynomial time (Theorem 4). These results are based on a recent dichtomy result by Bodirsky and Kára [7] for temporal constraint satisfaction problems. Tractable Ontology-Mediated Queries with Datatypes 9 We start by reviewing the temporal constraint satisfaction framework of [7]. A temporal constraint language is a logical structure Γ = (Q, R1 , R2 , . . . ), where each Ri of arity k is definable by a first-order formula Φ(x1 , . . . , xk ) over (Q, <), i.e., Ri = {(a1 , . . . , ak ) ∈ Qk | (Q, <) |= Φ(a1 , . . . , ak )}. A primitive positive sentence over Γ is a first-order sentence over Γ built from atomic formulas using conjunction and existential quantification. It is crucial for the results in [7] that both first- order definitions of relations in Γ and primitive positive sentences over Γ do not contain constants. The problem of deciding whether a primitive positive sentence over Γ is satisfied in Γ is denoted by CSP(Γ ). Bodirsky and Kára [7] proved that for every temporal constraint language Γ , CSP(Γ ) is either in PTime or NP-complete. They characterize the temporal languages Γ for which CSP(Γ ) is tractable in terms of preservation properties of the relations in Γ . A function f : Qk → Q preserves a relation R ⊆ Qn if for all t1 , . . . , tk ∈ R we have f (t1 , . . . , tk ) ∈ R. Here, f (t1 , . . . , tk ) is obtained as follows. Given a tuple t of length n and an integer i ∈ {1, . . . , n}, let t[i] denote the ith  component of t. Then, f (t1 , . . . , tk ) = f (t1 [1], . . . , tk [1]), . . . , f (t1 [n], . . . , tk [n]) . We say that f preserves a temporal constraint language Γ if f preserves all relations in Γ . The following functions are considered in [7]: – min : Q2 → Q which maps its two arguments to the minimal one; – mi : Q2 → Q which maps (x, y) ∈ Q2 to α(x) if x = y, to β(y) if x > y, and to γ(x) if x < y, where α, β, γ are any functions with α(x) < β(x) < γ(x) < α(y) for all x < y; – mx : Q2 → Q which maps (x, y) ∈ Q2 to β(x) if x = y, and to α(min{x, y}) if x 6= y, where α, β are any functions with α(x) < β(x) < α(y) for all x < y; – ll : Q2 → Q which is any function that satisfies ll (x, y) < ll (x0 , y 0 ) iff x ≤ 0 and (x, y) is lexicographically smaller than (x0 , y 0 ), or x, x0 > 0 and (y, x) is lexicographically smaller than (y 0 , x0 ); – The dual of f ∈ {min, mi , mx , ll }, which maps (x, y) ∈ Q2 to −f (−x, −y). Theorem 2 ([7]). Let Γ be a temporal constraint language. If Γ is preserved under min, mi , mx , ll , one of their duals, or a constant function, then CSP(Γ ) is in PTime. Otherwise, CSP(Γ ) is NP-complete. We now translate the evaluation problem for rooted OMQs over (Q, ≤) into the temporal constraint satisfaction framework. With every datatype pattern Vk q0 (z1 , . . . , zn ) = i=1 zsi ≤ zti we associate the temporal constraint language Wk Γq0 := (Q, <, Rq0 ) where Rq0 := {(c1 , . . . , cn ) ∈ Qn | (Q, <) |= i=1 cti < csi }. Vk Theorem 3. Let q0 (z1 , . . . , zn ) = i=1 zsi ≤ zti be a datatype pattern. If Q = (T , q, (Q, ≤)) is a rooted OMQ with dtype(q) = q0 , then evaluating Q is polynomially reducible to the complement of CSP(Γq0 ). Conversely, there is a rooted OMQ Q = (T , q, (Q, ≤)) with dtype(q) = q0 such that CSP(Γq0 ) is polynomially reducible to the complement of evaluating Q. Proof (Sketch). We refine the proof of Theorem 1. For the first part, we construct the positive existential sentence as in the proof of Theorem 1, and then express 10 André Hernich, Julio Lemos, and Frank Wolter each disjunctive clause by a single Rq0 -atom. The result is a primitive positive sentence Ψ 0 over Γq0 . Note that Γq0 may still contain constants. To obtain a primitive positive sentence Ψ without constants, we turn each constant in Ψ 0 into an existentially quantified variable and add constraints that ensure that the order of the elements assigned to these variables corresponds to the order 0 of the constants. More precisely, let Vl−1  c1 < · · · < cl be the constants in Ψ . Then, Ψ = ∃c1 · · · ∃cl Ψ ∧ i=1 ci < ci+1 . It can be shown that Γq0 |= Ψ iff Γq0 |= Ψ 0 . 0 For the second part, we first translate a given primitive positive sentence over Γq0 into a positive existential sentence over (Q, >). Atoms using the relation Rq0 are replaced by disjunctive formulas expressing the predicate defining the relation Rq0 . Atoms using < are expanded into disjunctions with k atoms. The resulting sentence is in k-CNF. The second part of the theorem now follows from the construction in the proof of Theorem 1. t u Since < is preserved under min, mi , mx , ll and their duals, but not under any constant function, Theorem 2 implies that CSP(Γq0 ) is in PTime iff Rq0 is preserved under min, mi , mx , ll or one of their duals. Together with Theorem 3, we obtain the following corollary. Corollary 3. Let q0 be a datatype pattern. If Rq0 is preserved under min, mi , mx , ll , or one of their duals, then evaluating rooted OMQs (T , q, (Q, ≤)) with dtype(q) = q0 is in PTime. Otherwise, there is a rooted OMQ Q = (T , q, (Q, ≤)) with dtype(q) = q0 such that evaluating Q is coNP-complete. To illustrate, consider the datatype pattern q0 (x, y, z) = x ≤ y ∧ y ≤ z. It is straightforward to verify that Rq0 = {(a, b, c) ∈ Q3 | a > b ∨ b > c} is not preserved under min, mi , mx , ll or their duals, so there are OMQs (T , q, (Q, ≤)) with dtype(q) = q0 for which theVevaluation problem Vn is coNP-complete. On the n other hand, if q0 has the form i=1 x0 ≤ xi or i=1 xi ≤ x0 , then it follows from [8, Proposition 3.5] that Rq0 is preserved under ll or its dual. In particular, evaluation of OMQs (T , q, (Q, ≤)) with dtype(q) = q0 is in PTime. In fact, we will now show that these two forms of datatype patterns, which we call min-pattern and max-pattern, respectively, essentially characterize all the tractable cases. The following lemma is the core of the characterization result. It implies that preservation under min, mi , mx or their duals collapses to preservation under ll or its dual for relations that are definable by normal disjunctive formulas. By a disjunctive formula we mean a disjunction of atoms of the form x < y. A disjunctive formula Φ(x1 , . . . , xn ) is normal if the directed graph with vertex set {x1 , . . . , xn } and edge set {(xj , xi ) | xi < xj ∈ Φ} is acyclic. Lemma 3. Let R ⊆ Qn be defined by a normal disjunctive formula Φ(x1 , . . . , xn ) over (Q, <). If RWis preserved under Wn min, mi , mx , ll , or one of their duals, then n Φ has the form i=1 x0 > xi or i=1 xi > x0 . In particular, Wn [8, Proposition Vn 3.5] implies that a relation defined by a formula of the form i=1 x0 > xi or i=1 xi > x0 is preserved under ll or its dual. We apply the lemma to relations Rq0 for acyclic datatype patterns q0 . A datatype pattern q0 is acyclic if the directed graph with the variables of q0 as Tractable Ontology-Mediated Queries with Datatypes 11 vertices and an edge (x, y) for each atom x ≤ y of q0 is acyclic. Since a cycle x0 ≤ x1 ∧x1 ≤ x2 ∧· · ·∧xn ≤ x0 tells us that x0 , x1 , . . . , xn have to be mapped to the same data value, we can eliminate any cycle by removing all of its atoms, and replacing x1 , . . . , xn by x0 . Let q0acyclic be the acyclic datatype pattern obtained from a datatype pattern q0 by eliminating all of its cycles. Theorem 4. Let q0 be a datatype pattern over (Q, ≤). If q0acyclic is a min-pattern or a max-pattern, then evaluating rooted OMQs (T , q, (Q, ≤)) with dtype(q) = q0 is in PTime. Otherwise, there is a rooted OMQ Q = (T , q, (Q, ≤)) with dtype(q) = q0 such that evaluating Q is coNP-complete. Proof. To simplify the presentation, we assume without loss of generality that q0 is acyclic. By Corollary 3, it suffices to show that q0 is a min-pattern or a max-pattern iff Rq0 is preserved under min, mi , mx , ll , or one of their Wnduals. If q0 is a min-pattern, then Rq0 is defined by a formula of the form i=1 x0 > xi . Similarly, if q0 is a max-pattern, then Rq0 is defined by a formula of the form W n i=1 xi > x0 . It is known [8, Proposition 3.5] that relations defined by such formulas are preserved under ll and dual-ll , respectively. Now suppose that Rq0 is preserved under min, mi , mx , ll , or one of their Vk duals. Let q0 (z1 , . . . , zn ) = i=1 zsi ≤ zti . Then, Rq0 is defined by Φ(z1 , . . . , zn ) = Wk i=1 zti < zsi . Clearly, Φ is disjunctive, and W since q0 is acyclic Wnit is also normal. n Thus, Lemma 3 implies that Φ has the form i=1 x0 > xi or i=1 xi > x0 . This implies that q0 is a min-pattern or a max-pattern. t u Note that the fact that q0acyclic is neither a min-pattern nor a max-pattern does not imply that evaluation is hard for all OMQs Q = (T , q, (Q, ≤)) with dtype(q) = q0 . For example, q0acyclic may have several connected components, each a min-pattern or a max-pattern. If no component is connected to another one via a path of existential variables in q one can show that evaluating Q is in PTime. 5 Conclusion We have presented first promising results for a non-uniform complexity analysis of ontology-mediated queries with expressive datatypes. Many research questions arise within this framework, including the following: (1) It remains an interesting open problem whether our results generalize to non-rooted OMQs. Such OMQs can “look” arbitrarily deep into a universal pre-model. We suspect that a finite portion of a universal pre-model suffices, but this is far from obvious. (2) The TBoxes we consider have very limited expressive power regarding datatypes and it would be of interest to generalize our method to TBoxes that admit more constructors using datatypes. (3) In this paper, we have used datatype patterns within CQs to classify the complexity of OMQs. It would be of interest to complement and extend this approach with classifications based on TBoxes, OMQs, or extended patterns in CQs that take into account additional atoms not using datatype relations. (4) In addition to the PTime/coNP dichotomy considered above, it would be of interest to investigate FO-rewritability and Datalog-rewritability of OMQs as well [5]. 12 André Hernich, Julio Lemos, and Frank Wolter References 1. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. Artif. Intell. Res. (JAIR) 36, 1–69 (2009) 2. Artale, A., Ryzhikov, V., Kontchakov, R.: DL-Lite with attributes and datatypes. In: ECAI. pp. 61–66 (2012) 3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: IJCAI-05. pp. 364–369 (2005) 4. Baader, F., Hanschke, P.: A scheme for integrating concrete domains into concept languages. In: IJCAI 1991. pp. 452–457 5. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, csp, and MMSNP. ACM Trans. Database Syst. 39(4), 33:1–33:44 (2014) 6. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable description logics. In: Reasoning Web 2015. pp. 218–307 7. Bodirsky, M., Kára, J.: The complexity of temporal constraint satisfaction problems. J. ACM 57(2) (2010) 8. Bodirsky, M., Kára, J.: A fast algorithm and datalog inexpressibility for temporal reasoning. ACM Trans. Comput. Log. 11(3) (2010), http://doi.acm.org/10.1145/ 1740582.1740583 9. Botoeva, E., Kontchakov, R., Ryzhikov, V., Wolter, F., Zakharyaschev, M.: Query inseparability for description logic knowledge bases. In: KR 2014 10. Bulatov, A.A., Jeavons, P., Krokhin, A.A.: Classifying the complexity of constraints using finite algebras. SIAM J. Comput. 34(3), 720–742 (2005) 11. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning 39(3), 385–429 (2007) 12. Feder, T., Vardi, M.Y.: Monotone monadic SNP and constraint satisfaction. In: Proc. of the ACM Symposium on Theory of Computing. pp. 612–622 (1993) 13. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: IJCAI 2015. pp. 3048–3054 14. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive description logics. In: IJCAI 2005. pp. 466–471 15. Lutz, C.: Description logics with concrete domains-a survey. In: Advances in Modal Logic 4. pp. 265–296 (2002) 16. Lutz, C., Wolter, F.: Non-uniform data complexity of query answering in description logics. In: KR 2012 17. Magka, D., Kazakov, Y., Horrocks, I.: Tractable extensions of the description logic EL with numerical datatypes. J. Autom. Reasoning 47(4), 427–450 (2011) 18. Opatrny, J.: Total ordering problem. SIAM J. Comput. 8(1), 111–114 (1979) 19. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. Data Semantics 10, 133–173 (2008) 20. Savkovic, O., Calvanese, D.: Introducing datatypes in DL-Lite. In: ECAI 2012. pp. 720–725