From Conjunctive Queries to SPARQL Queries in Ontology-Mediated Querying Cristina Feier1 , Carsten Lutz1 , and Frank Wolter2 1 University of Bremen, Germany feier@uni-bremen.de clu@uni-bremen.de 2 University of Liverpool, United Kingdom wolter@liverpool.ac.uk Abstract. We consider the rewritability of ontology-mediated queries (OMQs) based on UCQs into OMQs based on (certain kinds of) SPARQL queries. Our focus is on ALCI as a paradigmatic expressive DL. For rewritability into SPARQL queries that are unions of basic graph pat- terns, we show that the existence of a rewriting is decidable, based on a suitable characterization; for unary OMQs, this coincides with rewritabil- ity into instance queries. For SPARQL queries that additionally admit projection, we make some interesting first observations. In particular, we show that whenever there is a rewriting, then there is one that uses the same TBox as the original OMQ and only ALCI concepts from a certain finite class of such concepts. We also observe that if the TBox of the original OMQ falls into Horn-ALCI, then a rewriting always exists. 1 Introduction We study ontology-mediated querying with expressive description logics (DLs), focussing on ALCI as a paradigmatic such DL. A variety of query languages has been considered in this context, including conjunctive queries (CQs), unions of conjunctive queries (UCQs), instance queries (IQs), and SPARQL queries. The former two are a particularly natural choice because they play a fundamental role in database theory and systems, and in fact they are used in a large number of theoretical studies on ontology-mediated querying [3, 4, 6]. On the practical side, however, there do not seem to be any systems that fully support CQs and UCQs whereas there are systems that support IQs and SPARQL queries, such as Hermit [8, 12]. This raises the question of when and how a (U)CQ can be expressed as an IQ or as a SPARQL query. The relationship between unary (U)CQs and IQs, which can be seen as a simple form of SPARQL query, has been studied in [10,11,13], and very recently in [7]. While it is easy to see that every tree-shaped CQ can be translated into an equivalent IQ, it is more surprising that also some CQs that are not tree-shaped turn out to be IQ-rewritable when we have an expressive DL at our disposal [13]. For example, the CQ r(x, x), which asks to return all objects from the data that are involved in a reflexive r-loop, can be rewritten into the equivalent IQ P → ∃r.P (x) (equivalently ¬P t ∃r.P (x)). Here, P behaves like a monadic second-order variable due to the open-world assumption made for OMQs: we are free to interpret P in any possible way and when making P true at an object we are forced to make also ∃r.P true if and only if the object is involved in a reflexive r-loop. Kikot and Zolin have identified a large class of CQs that are rewritable into IQs in the context of ALCI, namely that class of unary CQs in which every cycle passes through the answer variable. This was further elaborated in [7] where it is shown that this condition precisely characterizes IQ- rewritability, the condition is generalized to the case of non-empty TBoxes and non-full ABox signatures, and tight complexity bounds for deciding whether an IQ-rewriting exists are obtained, between NP-complete in the case of the empty TBox and full ABox signature and 2NExpTime-complete in the general case. The purpose of this paper is to present initial results on the rewritability of (U)CQs of arity greater than one, for which IQs are clearly not a suitable target. Kikot and Zolin replace IQ-rewritability by Turing reductions to knowledge base consistency [11]. We prefer to consider rewritability into (two fragments of) SPARQL, namely into unions of basic graph patterns (UBGP), that is, UCQs that contain no quantified variables but potentially have atoms C(x) with C a compound concept, and into PUBGPs, which consist of a projection applied to a UBGP. From an implementation perspective, answering such queries is closely related to knowledge base consistency and other common reasoning tasks which has resulted in support by practical systems, while answering (U)CQs is not. In fact, an imporant difference is that the union in UBGPs and the projection in PUBGPs are operations on query answers while the disjunction of UCQs and the existential quantification of CQs are logical operators on the level of models which makes them more difficult to handle. Note that we disregard many parts of SPARQL such as difference, optional, and the binding of variables to concept and role names. Also, we stick to the OWL 2 direct semantics entailment regime. After giving preliminaries, in Section 3 we summarize the results from [7] regarding IQ-rewritability in ALCI. We also observe that IQs and unary UBGPs have the same expressive power, which is not entirely trivial, and that unary PUBGPs are more expressive. In Section 4, we then study UBGP-rewritability of OMQs from (ALCI, UCQ), that is, the TBox is formulated in ALCI, the actual query is a UCQ, and an ABox signature may be imposed. Remarkably, even if the UCQ is full (that is, it has no quantified variables), UBGP-rewritability is not guaranteed. We develop a characterization of UBGP-rewritability and use it to show that this problem is decidable.3 The characterization is more technical than in the case of IQ-rewritability which we believe to be unavoidable. It also implies that, in UBGP-rewritings, it is always sufficient to use the same TBox as in the original OMQ. For PUBGP-rewritability, our results are less complete. We start with observ- ing that PUBGP-rewritings always exist when the TBox of the original OMQ is formulated in Horn-ALCI, that otherwise rewritings are not guaranteed to exist and that, when they exist, they might necessarily involve BGPs whose number of 3 Here and in the subsequent results, we assume that every CQ in the original UCQ is connected in the sense that every variable is reachable from an answer variable. variables is exponential in the size of the TBox of the original OMQ. We then es- tablish a (non-effective) characterization which shows that PUBGP-rewritability implies rewritability into a particular class of PUBGPs. It follows that it is never necessary to modify the TBox when constructing PUBGP-rewritings and that in atoms of the form C(x), the concept C can be restricted to a certain finite class of concepts. This is a first step towards decidability of PUBGP-rewritability, which remains as an interesting open problem. A long version of the paper containing full proofs in the appendix is available at http://www.informatik.uni-bremen.de/tdki/research/papers.html. 2 Preliminaries We assume familiarity with standard DL notation and languages [2] and only introduce notions that are potentially ambiguous. The main DL studied in this paper is ALCI. An ABox A is a finite set of assertions of the form A(a) and r(a, b), where A is a concept name, r a role name, and a, b are individual names. We use ind(A) to denote the set of all individual names that occur in A. An interpretation is a model of an ABox A if it satisfies all assertions in A, that is, a ∈ AI when A(a) is in A and (a, b) ∈ rI when r(a, b) is in A. We thus make the standard name assumption. A signature Σ is a set of concept and role names. We use sig(A) to denote the set of concept and role names that occur in the ABox A, and likewise for other syntactic objects such as TBoxes. An ABox A is a Σ-ABox if sig(A) ⊆ Σ. An instance query (IQ) takes the form C(x) where C is a concept from the DL under consideration and x a variable. In this paper, C will always be an ALCI concept. For an interpretation I, we write I |= C(a) if a ∈ C I . A conjunctive query (CQ) is of the form q(x) = ∃y ϕ(x, y), where x and y are tuples of variables and ϕ(x, y) is a conjunction of atoms of the form A(z), r(z1 , z2 ), or z1 = z2 , with A a concept name, r a role name, and z, z1 , z2 ∈ x ∪ y. When z1 = z2 occurs in q we assume w.l.o.g. that z1 , z2 ∈ x and there is no other atom in q which W contains x2 . A union of conjunctive queries (UCQ) q(x) is a formula of the form i qi (x), where each qi (x) is a CQ. When compound concepts are admitted in place of concept names in a CQ/UCQ, then we speak about an extended CQ/UCQ. For any kind of query, the free variables are called answer variables, the arity is the number of answer variables, and a query is Boolean if it has arity zero. A CQ is full if all variables are answer variables and a UCQ is full if every disjunct is. A homomorphism from a CQ q(x) to an interpretation I is a function h : x ∪ y → ∆I such that h(z) ∈ AI for every atom A(z) of q(x), (h(z1 ), h(z2 )) ∈ rI for every atom r(z1 , z2 ) of q(x), and h(x1 ) = h(x2 ) for every atom x1 = x2 of q(x). We write I |= q(a) and call a an answer to q(x) onWI if there is a homomorphism from q(x) to I with h(x) = a. For a UCQ i qi (x) and an interpretation I, we write I |= q(a) if I |= qi (a) for some i. The semantics of extended CQ/UCQs is as expected. We now introduce query languages related to SPARQL [9]. A basic graph pattern (BGP) is an extended full CQ. Note that IQs coincide with unary BGPs. S A union of basic graph patterns (UBGP) is ofSthe form i ϕi (x) where each ϕi is a BGP and a PUBGP is of the form Πx ( i ϕi (x, yi )) where each ϕi (x, yi ) is a BGP. UBGPs correspond to PUBGPs in which all tuples yi are empty as in this case the projection to x denoted by Πx is vacuous. Note that BGPs and UBGPs must have non-zero arity while this is not the case for PUBGPs. An ontology-mediated query (OMQ) takes the form Q = (T , Σ, q(x)) with T a TBox, Σ ⊆ sig(T ) ∪ sig(q) an ABox signature, and q(x) a query. The arity of Q is the arity of q(x). When Σ is sig(T ) ∪ sig(q), then for brevity we denote it with Σfull and speak of the full ABox signature. Let A be a Σ-ABox. If q(x) is an IQ or a (possibly extended) UCQ, then a is an answer to Q on A if I |= q(a) for all modelsSI of A and T . This also captures the case where q(x) is a BGP. If q(x) = Πx ( i ϕi (x, yi )) is a PUBGP, then a is an answer to Q on A if there exist i and b such that ab is an answer to (T , Σ, ϕi (x, yi )). This also captures the case where q(x) is a UBGP. In either case, we write A |= Q(a) if a is an answer to Q on A. It is important to note and to keep in mind that, in OMQs, the union in (P)UBGPs behaves differently from the disjunction in UCQs and the projection in PUBGPs behaves differently from the existential quantification in UCQs (also note that the order is reversed). For example, let T = {∃r.> v A t B} and A an {A, B}-ABox. Then the UCQ-based OMQ (T , Σfull , A(x) ∨ B(x)) returns every individual in the range of r as an answer whereas the UBGP-based OMQ (T , Σfull , A(x) ∪ B(x)) returns only those individuals a from the range of r such that A(a) ∈ A or B(a) ∈ A. In fact, the latter query corresponds to executing the two OMQs (T , Σfull , A(x)) and (T , Σfull , B(x)) independently and then taking the union of their answer sets. Likewise, the projection of PUBGPs is a projection on answer sets. We use (L, Q) to refer to the OMQ language in which the TBox is formulated in the DL L and where the actual queries are from the query language Q, such as in (ALCI, UCQ). Definition 1. Let (L, Q) be an OMQ language. An OMQ Q = (T , Σ, q(x)) is (L, Q)-rewritable if there is an OMQ Q0 from (L, Q) such that the answers to Q and to Q0 are identical on any Σ-ABox that is consistent with T . In this case, we say that Q is rewritable into Q0 and call Q0 a rewriting of Q. We are interested in rewriting OMQs from (ALCI, UCQ) into OMQs based on IQs, UBGPs, and PUBGPs. For brevity, we speak of Q-rewritability instead of (ALCI, Q)-rewritability. For example, IQ-rewritability means rewritability of an OMQ from (ALCI, UCQ) into an OMQ from (ALCI, IQ) with the IQ formu- lated in ALCI, and PUBGP-rewritability means rewritability into an OMQ from (ALCI, PUBGP) where the PUBGP uses only ALCI compound concepts. The following example of IQ-rewritability is from [7]. Examples and non-examples of PUBGP-rewritability are given later in this paper. Example 1. Let q1 (x) = r(x, x). The OMQ Q1 = (∅, {r}, q1 (x)) is rewritable into the OMQ (∅, {r}, C(x)) where C is the ALCI concept P → ∃r.P. In contrast, let q2 (x) = ∃y r(x, y) ∧ r(y, y). It follows from Theorem 1 below that the OMQ Q2 = (∅, {r}, q2 (x)) is not rewritable into an OMQ from (ALCI, IQ). We close this section with some additional, more technical preliminaries. We say that an OMQ Q = (T , Σ, q(x)) is empty if for all Σ-ABoxes A, there is no answer to Q on A. Let Q1 , Q2 be OMQs, Qi = (Ti , Σ, qi (x)) for i ∈ {1, 2}. Then Q1 is contained in Q2 , written Q1 ⊆ Q2 , if for all Σ-ABoxes A, every answer to Q1 on A is also an answer to Q2 on A. Further, Q1 and Q2 are equivalent, written Q1 ≡ Q2 , if Q1 ⊆ Q2 and Q2 ⊆ Q1 . Every CQ q(x) = ∃y ϕ(x, y) gives rise to an undirecteded graph Gq whose nodes are the elements of x ∪ y and that contains an edge {z1 , z2 } if ϕ(x, y) contains an atom r(z10 , z20 ) or z10 = z20 with {z1 , z2 } = {z10 , z20 }. Gq might contain self loops. We say that q(x) is connected if every variable is reachable from an answer variable in Gq . An UCQ is connected if every CQ in it is and an OMQ from (ALCI, UCQ) is connected if the UCQ in it is. A contraction of a CQ q(x) is a CQ obtained from q(x) by zero or more variable identifications, where the identification of an answer variable x with any non-answer variable yields x. 3 IQ-rewritability We give an overview of the results on IQ-rewritability in (ALCI, UCQ) from [7]. In that paper, several other DLs are considered as well, including ALC and variations of ALC and ALCI that support role hierarchies, the universal role, and functional roles. In some cases, the characterization of OMQ rewritability and the complexity of deciding IQ-rewritability are rather different from the ALCI case, which we briefly comment on. We also show that unary OMQs from (ALCI, IQ), (ALCI, BGP), and (ALCI, UBGP) have the same expressive power while unary OMQs from (ALCI, PUBGP) are more expressive. We start with a fundamental characterization of OMQs from (ALCI, UCQ) that are IQ-rewritable, first giving some preliminaries. Let q(x) be a unary CQ. We can assume w.l.o.g. that, being unary, q(x) contains no equality atoms. A cycle in q(x) is a sequence of non-identical atoms r0 (x0 , x1 ), . . . , rn−1 (xn−1 , xn ) in q(x), n ≥ 1, where4 1. r0 , . . . , rn−1 are (potentially inverse) roles, 2. xi 6= xj for 0 ≤ i < j < n, and x0 = xn . We say that q(x) is x-acyclic if every cycle in it passes through x. A UCQ is x-acyclic if every CQ in it is. Let q(x) be a UCQ. We use qacyc (x) to denote the UCQ that consists of all con contractions of a CQ from q(x) that are x-acyclic and qacyc (x) to denote the UCQ obtained from qacyc (x) by restricting every CQ in it to atoms that only use variables reachable in Gq from the answer variable x. The following theorem is the announced characterization [7]. Theorem 1. Let Q = (T , Σ, q(x)) be a unary OMQ from (ALCI, UCQ) that is non-empty. Then the following are equivalent: 4 We require the atoms be non-identical to prevent r(x0 , x1 ), r− (x1 , x0 ) from being a cycle (both atoms are identical). 1. Q is IQ-rewritable; 2. Q is rewritable into an OMQ Q0 = (T , Σ, C(x)) from (ALCI, IQ); con 3. Q ≡ (T , Σ, qacyc (x)). Theorem 1 excludes empty OMQs, but these are trivially IQ-rewritable; moreover, emptiness of OMQs from (ALCI, UCQ) is decidable (and 2ExpTime- complete) [1]. Equivalence of Points 1 and 2 implies that it is not necessary to modify the TBox when constructing an IQ-rewriting. The latter is not the case, for example, when constructing IQ-rewritings of OMQs from (ALC, UCQ) as- suming that the constructed IQ must be an ALC concept. In that case, Point 1 and 3 of Theorem 1 are still equivalent, but equivalence with Point 2 is not as it might be necessary to (mildly) extend the TBox [7]. In the direction “3 ⇒ 2”, we construct actual rewritings, extending a con- struction due to Kikot and Zolin [11] to accomodate TBoxes, ABox signatures, and UCQs (instead of CQs). This extension yields the following. Lemma 1. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ). If q(x) is x-acyclic and connected, then Q is rewritable into an OMQ (T , Σ, C(x)) with C(x) an IQ. The size of the IQ C(x) is polynomial in the size of q(x). The IQ C(x) constructed in the proof of Lemma 1 takes the form P → (C1 t· · ·tCn ) where P is a concept name or > and C1 , . . . , Cn are ELI-concepts. Based on Theorem 1 and variations thereof and using careful reductions to OMQ containment [5], one can establish the following complexity results [7]. Theorem 2. Deciding IQ-rewritability of a unary OMQ from (ALCI, UCQ) is 1. 2 NExpTime-complete in the general case; 2. 2 ExpTime-complete for OMQs based on the full ABox signature; 3. NP-complete for OMQs based on the empty TBox. Different complexities are obtained for other DLs. For example, IQ-rewritability is undecidable in (ALCF, CQ) and between ExpTime and coNExpTime in (ALC, UCQ) when the ABox signature is full. As observed in [7], IQ-rewritings for OMQs based on the empty TBox can be viewed as underapproximations for OMQs with non-empty TBoxes in the sense that an IQ-rewriting for (∅, Σ, q(x)) with q(x) a UCQ is also an IQ-rewriting for (T , Σ, q(x)) for any ALCI TBox T . We now compare OMQs based on IQs to unary OMQs based on BGPs, UBGPs, and PUBGPs. It is obvious from the definitions that OMQs from (ALCI, IQ) and (ALCI, BGP) have the same expressive power. The follow- ing is less trivial to show than one might think as converting an OMQ from (ALCI, UBGP) into an OMQ from (ALCI, IQ) requires a careful modification of the TBox to bridge the semantic gap between unions in UBGPs and disjunc- tion in IQs. A proof is in the long version of the paper. Theorem 3. The language (ALCI, IQ) has the same expressive power as unary (ALCI, UBGP). The next example shows that unary (ALCI, PUBGP) is more expressive than (ALCI, IQ). Example 2. Let Q = (∅, {r}, q(x)) be the OMQ where q(x) is the PUBGP Πx ϕ(x, y) and ϕ(x, y) is the BGP r(x, y) ∧ (P → ∃r.P )(y). It can be verified that Q is equivalent to the OMQ Q2 from Example 1, which is not expressible in (ALCI, IQ). 4 UBGP-Rewritability We study UBGP-rewritability in (ALCI, UCQ). We first present an example and a characterization of UBGP-rewritability, and then use it to show that this property is decidable. The example illustrates that as a result of disjunction in UCQs behaving differently from union in UBGPs, OMQs based on full UCQs are not always UBGP-rewritable. Note that this is in contrast to the CQ case since every OMQ based on a full CQ is by definition also an OMQ based on a BGP. Example 3. Take the UCQ q(x, y) = (A(x) ∧ r(x, y)) ∨ (r(x, y) ∧ A(y)). We first consider the OMQ Q0 = (∅, Σfull , q(x, y)) based on the empty TBox. Clearly, Q0 is rewritable into the OMQ Q00 = (∅, Σfull , q 0 (x, y)) from the language (ALCI, UBGP) in which the disjunction from the UCQ q(x, y) is replaced by BGP union: q 0 (x, y) = (A(x) ∧ r(x, y)) ∪ (r(x, y) ∧ A(y)). Now consider the TBox T = {M v ∀s.A t ∀t.A} and let Q1 = (T , Σfull , q(x, y)). Then Q01 = (T , Σfull , q 0 (x, y)) is not a rewriting of Q1 . To see this, consider the ABox A = {r(a, b), M (c), s(c, a), t(c, b)}. Then A |= Q1 (a, b), but A 6|= Q01 (a, b). In fact, Q1 is not UBGP-rewritable. We sketchS the proof. Assume to the contrary that there is a rewriting Q = (T 0 , Σ, i ϕi (x, y)) of Q1 from (ALCI, UBGP). Then A |= Q(a, b). Consider the ABox A0 obtained from A by dropping all assertions that use c, taking the union with two copies Aa and Ab of A that do not share individuals with A or with each other, and then identifying the copy of a in Aa with a, and the copy of b in Ab with b. Thus, A0 = {r(a, b), M (c1 ), s(c1 , a), t(c1 , b0 ), M (c2 ), s(c2 , a0 ), s(c2 , b), t(a0 , b)}. We then have A0 6|= Q1 (a, b), and so A0 6|= Q(a, b). By construction, there is a homomorphism from A to A0 that maps a to a and also a homomorphism from A to A0 that maps b to b. As entailment of ALCI concepts is preserved under ABox homomorphisms, for every ALCI concept C, A |= (T 0 , Σfull , C(x))(a) implies A0 |= (T 0 , Σfull , C(x))(a) and the same holds for b. We obtain A0 |= Q(a, b) from A |= Q(a, b) and have derived a contradiction. We now give the characterization of UBGP-rewritability, starting with some technical preliminaries. We first identify a set of concepts to be used in UBGP- rewritings and, later on, also in PUBGP-rewritings. Let Q = (T , Σ, q(x)) be a non-Boolean OMQ from (ALCI, UCQ). Let Qq(x) be the set of unary CQs p(x) that are connected and x-acyclic and can be obtained from a contraction of a CQ in q(x) by dropping all equality atoms, then (potentially) taking a subquery, and finally choosing any variable x as the new answer variable and existentially quantifying all other variables. For every such p(x), take an ALCI concept Cp such that the CQ-based OMQ (T , Σ, p(x)) is equivalent to the IQ-based OMQ (T , Σ, Cp (x)), as constructed in the proof of Lemma 1. Let sub(Q) be the closure under subconcepts of concepts Cp , p(x) ∈ Qq(x) , and concepts used in T . A Q-type is a minimal set of concepts that contains C or ¬C for every C ∈ sub(Q). For a Q-type τ , set Cτ = u C. Also C∈τ let CQ be the set of concepts of the form Cτ1 t · · · t Cτ` with each τi a Q-type. We next define an approximation of Q from below that is formulated in (ALCI, UBGP) and based on the TBox T from Q and the concepts from CQ . More precisely, QUBGP is the OMQ (T , Σ, q 0 (x)) where q 0 (x) is the UBGP that is the union of all BGPs ϕ(x) such that 1. for every x ∈ x, there is a unique atom D(x) ∈ ϕ(x), and D ∈ CQ and 2. (T , Σ, ϕ(x)) ⊆ Q. It is easy to see that there are only finitely many such BGPs. Note that, by construction, QUBGP ⊆ Q. Theorem 4. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ) that is con- nected, of arity at least one, and non-empty. Then the following are equivalent: 1. Q is UBGP-rewritable; 2. QUBGP is a rewriting of Q; 3. Q ⊆ QUBGP . Since QUBGP uses the same TBox as Q, Theorem 4 implies that it is not nec- essary to modify the TBox when constructing UBGP-rewritings. The proof of Theorem 4 is a minor variation of the proof of Theorem 7 below, given after Lemma 2. The next theorem can be obtained by a reduction of UBGP-rewritability to containment in monadic disjunctive datalag [5], based on Theorem 4. Theorem 5. UBGP-rewritability in (ALCI, UCQ) is decidable. A 2NExpTime lower bound follows from Theorems 2 and 3. The described reduction only yields a 4NExpTime upper bound. 5 PUBGP-Rewritability We study PUBGP-rewritability in (ALCI, UCQ), first observing that these al- ways exist when the TBox of the original OMQ is formulated in the fragment Horn-ALCI of ALCI. This is essentially a consequence of the fact that Horn- ALCI has universal models, more details are given in the long version. Theorem 6. Every OMQ from (Horn-ALCI, UCQ) is PUBGP-rewritable. We next give an example showing that PUBGP-rewritings are not guaranteed to exist in the non-Horn case. This is already true for OMQs from (ALC, CQ) even when ALCI concepts are admitted in PUBGPs. Example 4. Let Q = (T , Σfull , q(x)) with T = {M v M1 t M2 , M1 v ∀s.A, M2 v ∀t.A, A v ∀t.A} and q(x) = ∃y r(x, y) ∧ A(y) ∧ r(y, y). Then Q is not PUBGP-rewritable. For a proof by contradiction, assume that Q0 = (T 0 , Σfull , q 0 (x)) is a PUBGP-rewriting. For every n ≥ 1, let An = {r(a, b1 ), r(a, b2 ), r(b1 , b1 ), r(b2 , b2 ), M (c1 ), s(c1 , b1 ), t(c1 , c2 ), . . . , t(cn , b2 )}. It can be verified that An |= Q(a) and thus An |= Q0 (a), for every n. Choose n > 0 that exceeds the number of variables in any BGP in q 0 . As An |= Q0 (a), there must be a BGP ϕ(x, y) in q 0 and a b such that An |= Qϕ (a, b) for the OMQ Qϕ = (T , Σfull , ϕ(x, y)). Further, there must be a ci in An that is not in b. Let ind− = ind(An ) \ {ci } and for each b ∈ ind− , let Abn denote the disjoint copy of An obtained by renaming every individual c to cb . Now consider the ABox A0n obtained from An by removing all assertions that use ci , taking the disjoint union with all Abn , b ∈ ind− , and then identifying each individual b ∈ ind− with the individual bb from Abn . Note that this construction is similar to the construction of A0 in Example 3. We have A0n |= Qϕ (a, b) since for every b ∈ ind− , there is a homomorphism h from An to A0n with h(b) = b and for all b, b0 ∈ ind− and role names v, v(b, b0 ) ∈ An iff v(b, b0 ) ∈ A0n . On the other hand, it is easy to see that A0n 6|= Q(a) due to the removal of ci . Next example shows that when constructing PUBGP-rewritings of OMQs from (ALC, CQ), it can be necessary to use BGPs whose topology is different from that of the CQs in the original OMQ. In fact, exponentially many variables (in the size of the TBox of the original OMQ) can be required in BGPs. We conjecture that the example can even be improved to show a double exponential lower bound on the number of variables. As for the previous example, the same is true for (ALCI, CQ), that is, when ALCI concepts are admitted in PUBGPs. Example 5. Let Q = (T , Σ, q(x)) be the OMQ with T = {Ai v ∀r.Ai+1 t ∀s.Ai+1 | 0 ≤ i < n} ∪ {An v A} Σ = {A0 , r, s, t} q(x) = ∃y t(x, y) ∧ A(y) ∧ t(y, y). Q is rewritable into the OMQ Q0 = (∅, Σ, q 0 (x)) where q 0 (x) is the PUBGP Πx ϕ(x, y) with ϕ(x, y) as depicted in Figure 5. Observe that the topmost part is a full binary tree of depth n. We aim to show that there is no rewriting of Q that uses less than 2n variables. A0 r s r s r s ... ... ... ... r s r s t t ... ... ... t t t t t t x Fig. 1. PUBGP-rewriting for the OMQ Q in Example 5 Assume for a proof by contradiction that there is such a rewriting Q0 = (T 0 , Σ, q 0 (x)). Let Aϕ be ϕ viewed as an ABox. Clearly, Aϕ |= Q(x) and thus there must be a BGP ϕ0 (x, y0 ) in q 0 and a homomorphism h such that Aϕ |= Qϕ (x, h(y0 )) where Qϕ = (T 0 , Σ, ϕ0 (x, y0 )). Some leaf node y from the binary tree in ϕ must be outside the range of h. Now consider the ABox A0ϕ obtained from Aϕ by dropping all assertions that use y, adding a disjoint copy Azϕ of Aϕ for every z ∈ ind(Aϕ ) except y, and then identifying each z from Aϕ with the copy of z in Azϕ . Note that this construction is similar to the construction of A0 in Example 3. Then, as in Example 3, A0ϕ |= Q0 (x, h(y0 )) follows from the fact that ϕ0 (x, y0 ) is a BGP in q 0 . On the other hand, A0ϕ 6|= Q(x) and we have derived a contradiction. We now show that PUBGP-rewritability implies rewritability into a more controlled class of PUBGPs, based on the set of concepts CQ defined in Section 4. A CQ -PUBGP is a PUBGP that uses only concepts from CQ . Theorem 7. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ) that is con- nected, of arity at least one, and non-empty. Then the following are equivalent: 1. Q is PUBGP-rewritable; 2. Q is rewritable into an OMQ Q0 = (T , Σ, q 0 (x)) with q 0 (x) a CQ -PUBGP. Theorem 7 clearly implies that it is not necessary to modify the TBox when con- structing PUBGP-rewritings. It also implies that it is not necessary to use con- cepts from outside CQ . Unlike Theorem 4, however, it does not immediately give rise to a decision procedure for PUBGP-rewritability. Note that an approxima- tion QPUBGP of Q from below, formulated in (ALCI, PUBGP) and constructed in exact analogy with the query QUBGP from Theorem 4, is not guaranteed to be finite. In fact, the main challenge in showing decidability of PUBGP-rewritability is to give a bound on the number of variables used in PUBGP-rewritings. An additional complication is that there are no existing approaches for deciding containment or equivalence between an OMQ from (ALCI, UCQ) and an OMQ from (ALCI, PUBGP). Regarding the converse direction, it is possible to prove that every OMQ from (ALCI, PUBGP) can be rewritten into (ALCI, UCQ), the construction being similar to that used in the proof of Theorem 3. To prove Theorems 7 and 4, we introduce a certain normalization of PUBGP- rewritings. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ) S and let QR = (TR , Σ, qR (x)) be a PUBGP-rewriting of Q with qR (x) = Πx ( i ϕi (x, yi )). We may assume w.l.o.g. that the set of BGPs ϕi (x, yi ) in qR (x) is closed under contraction.5 From each BGP ϕi (x, yi ), we construct all BGPs ϕi,j (x, yi ) that satisfy the following conditions: 1. r(z1 , z2 ) ∈ ϕi,j iff r(z1 , z2 ) ∈ ϕi ; 2. z1 = z2 ∈ ϕi,j iff z1 = z2 ∈ ϕi ; 3. for each z ∈ xyi , there is a unique atom C(z) ∈ ϕi,j , and C ∈ CQ ; 4. (T , Σ, Πx ϕi,j (x, yi )) ⊆ Q. We call the OMQ Q0R = (T , Σ, Πx ( i,j ϕi,j (x, yi ))) from (ALCI, PUBGP) the S normalization of QR . In the long version of the paper, we show the following, which clearly yields Theorem 7. Lemma 2. If QR is a PUBGP-rewriting of Q, then so is its normalization Q0R . We note that Lemma 2 also gives Theorem 4. In fact, assume that an OMQ Q = (T , Σ, q(x)) from (ALCI, UCQ) is rewritable into an OMQ QR = (T 0 , Σ, q 0 (x)) from (ALCI, UBGP), a subclass of (ALCI, PUBGP). Then, the rewriting Q0R constructed above is also from (ALCI, UBGP). Recall the OMQ QUBGP constructed before Theorem 4. By construction, QUBGP ⊆ Q and QUBGP contains every BGP from QR . Thus, we also have Q ⊆ QUBGP , as required. 6 Conclusion The main problem left open in this paper is whether PUBGP-rewritability is de- cidable in (ALCI, UCQ) and related OMQ languages. As a precursor, it would be interesting to show that equivalence of an OMQ from (ALCI, UCQ) and an OMQ from (ALCI, PUBGP) is decidable. Another interesting question is whether rewritability into PUBGPs is actually different from rewritability into (more) complete SPARQL, that is, whether features such as difference make it possible to rewrite additional queries. Finally, it would be interesting to investi- gate the consequence of admitting constants in the original query and nominals in the rewriting. Acknowledgements. Cristina Feier and Carsten Lutz were supported by ERC Consolidator Grant 647289 CODA. Frank Wolter was supported by EPSRC UK grant EP/M012646/1. 5 Meaning that we treat ϕi (x, yi ) as a CQ with answer variables x. References 1. Franz Baader, Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. Query and predicate emptiness in ontology-based data access. J. Artif. Intell. Res., 56:1–59, 2016. 2. Franz Baader, Ian Horrocks, Carsten Lutz, and Ulrike Sattler. An Introduction to Description Logic. Cambridge University Press, 2017. 3. Meghyn Bienvenu and Magdalena Ortiz. Ontology-mediated query answering with data-tractable description logics. In Proc. of Reasoning Web, volume 9203 of LNCS, pages 218–307. Springer, 2015. 4. Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology- based data access: A study through disjunctive Datalog, CSP, and MMSNP. ACM Trans. Database Syst., 39(4):33:1–33:44, 2014. 5. Pierre Bourhis and Carsten Lutz. Containment in monadic disjunctive datalog, MMSNP, and expressive description logics. In Proc. of KR, pages 207–216. AAAI Press, 2016. 6. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, and Riccardo Rosati. Ontologies and databases: The DL-Lite approach. In Proc. of Reasoning Web 2009, volume 5689 of LNCS, pages 255–356. Springer, 2009. 7. Cristina Feier, Carsten Lutz, and Frank Wolter. From conjunctive queries to in- stance queries in ontology-mediated querying. In Proc. of IJCAI, pages 1810–1816. ijcai.org, 2018. 8. Birte Glimm, Ian Horrocks, Boris Motik, Giorgos Stoilos, and Zhe Wang. Hermit: An OWL 2 reasoner. J. of Autom. Reasoning, 53(3):245–269, 2014. 9. Birte Glimm and Chimezie Ogbuji, editors. SPARQL 1.1 Entail- ment Regimes. W3C Recommendation, 21 March 2013. Available at https://www.w3.org/TR/sparql11-entailment/. 10. Stanislav Kikot, Dmitry Tsarkov, Michael Zakharyaschev, and Evgeny Zolin. Query answering via modal definability with FaCT++: First blood. In Proc. DL, volume 1014 of CEUR Workshop Proceedings, pages 328–340. CEUR-WS.org, 2013. 11. Stanislav Kikot and Evgeny Zolin. Modal definability of first-order formulas with free variables and query answering. J. Applied Logic, 11(2):190–216, 2013. 12. Ilianna Kollia, Birte Glimm, and Ian Horrocks. SPARQL query answering over OWL ontologies. In Proc. of ESWC, volume 6643 of LNCS, pages 382–396. Springer, 2011. 13. Evgeny Zolin. Modal logic applied to query answering and the case for variable modalities. In Proc. of DL, volume 250 of CEUR Workshop Proceedings. CEUR- WS.org, 2007.