From Conjunctive Queries to SPARQL Queries
       in Ontology-Mediated Querying

              Cristina Feier1 , Carsten Lutz1 , and Frank Wolter2
                         1
                          University of Bremen, Germany
                    feier@uni-bremen.de clu@uni-bremen.de
                    2
                      University of Liverpool, United Kingdom
                            wolter@liverpool.ac.uk


      Abstract. We consider the rewritability of ontology-mediated queries
      (OMQs) based on UCQs into OMQs based on (certain kinds of) SPARQL
      queries. Our focus is on ALCI as a paradigmatic expressive DL. For
      rewritability into SPARQL queries that are unions of basic graph pat-
      terns, we show that the existence of a rewriting is decidable, based on a
      suitable characterization; for unary OMQs, this coincides with rewritabil-
      ity into instance queries. For SPARQL queries that additionally admit
      projection, we make some interesting first observations. In particular,
      we show that whenever there is a rewriting, then there is one that uses
      the same TBox as the original OMQ and only ALCI concepts from a
      certain finite class of such concepts. We also observe that if the TBox of
      the original OMQ falls into Horn-ALCI, then a rewriting always exists.


1   Introduction
We study ontology-mediated querying with expressive description logics (DLs),
focussing on ALCI as a paradigmatic such DL. A variety of query languages has
been considered in this context, including conjunctive queries (CQs), unions of
conjunctive queries (UCQs), instance queries (IQs), and SPARQL queries. The
former two are a particularly natural choice because they play a fundamental
role in database theory and systems, and in fact they are used in a large number
of theoretical studies on ontology-mediated querying [3, 4, 6]. On the practical
side, however, there do not seem to be any systems that fully support CQs and
UCQs whereas there are systems that support IQs and SPARQL queries, such
as Hermit [8, 12]. This raises the question of when and how a (U)CQ can be
expressed as an IQ or as a SPARQL query.
    The relationship between unary (U)CQs and IQs, which can be seen as a
simple form of SPARQL query, has been studied in [10,11,13], and very recently
in [7]. While it is easy to see that every tree-shaped CQ can be translated
into an equivalent IQ, it is more surprising that also some CQs that are not
tree-shaped turn out to be IQ-rewritable when we have an expressive DL at
our disposal [13]. For example, the CQ r(x, x), which asks to return all objects
from the data that are involved in a reflexive r-loop, can be rewritten into the
equivalent IQ P → ∃r.P (x) (equivalently ¬P t ∃r.P (x)). Here, P behaves like
a monadic second-order variable due to the open-world assumption made for
OMQs: we are free to interpret P in any possible way and when making P true
at an object we are forced to make also ∃r.P true if and only if the object is
involved in a reflexive r-loop. Kikot and Zolin have identified a large class of CQs
that are rewritable into IQs in the context of ALCI, namely that class of unary
CQs in which every cycle passes through the answer variable. This was further
elaborated in [7] where it is shown that this condition precisely characterizes IQ-
rewritability, the condition is generalized to the case of non-empty TBoxes and
non-full ABox signatures, and tight complexity bounds for deciding whether an
IQ-rewriting exists are obtained, between NP-complete in the case of the empty
TBox and full ABox signature and 2NExpTime-complete in the general case.
    The purpose of this paper is to present initial results on the rewritability of
(U)CQs of arity greater than one, for which IQs are clearly not a suitable target.
Kikot and Zolin replace IQ-rewritability by Turing reductions to knowledge base
consistency [11]. We prefer to consider rewritability into (two fragments of)
SPARQL, namely into unions of basic graph patterns (UBGP), that is, UCQs
that contain no quantified variables but potentially have atoms C(x) with C a
compound concept, and into PUBGPs, which consist of a projection applied to
a UBGP. From an implementation perspective, answering such queries is closely
related to knowledge base consistency and other common reasoning tasks which
has resulted in support by practical systems, while answering (U)CQs is not. In
fact, an imporant difference is that the union in UBGPs and the projection in
PUBGPs are operations on query answers while the disjunction of UCQs and
the existential quantification of CQs are logical operators on the level of models
which makes them more difficult to handle. Note that we disregard many parts
of SPARQL such as difference, optional, and the binding of variables to concept
and role names. Also, we stick to the OWL 2 direct semantics entailment regime.
    After giving preliminaries, in Section 3 we summarize the results from [7]
regarding IQ-rewritability in ALCI. We also observe that IQs and unary UBGPs
have the same expressive power, which is not entirely trivial, and that unary
PUBGPs are more expressive. In Section 4, we then study UBGP-rewritability of
OMQs from (ALCI, UCQ), that is, the TBox is formulated in ALCI, the actual
query is a UCQ, and an ABox signature may be imposed. Remarkably, even if
the UCQ is full (that is, it has no quantified variables), UBGP-rewritability is
not guaranteed. We develop a characterization of UBGP-rewritability and use it
to show that this problem is decidable.3 The characterization is more technical
than in the case of IQ-rewritability which we believe to be unavoidable. It also
implies that, in UBGP-rewritings, it is always sufficient to use the same TBox
as in the original OMQ.
    For PUBGP-rewritability, our results are less complete. We start with observ-
ing that PUBGP-rewritings always exist when the TBox of the original OMQ is
formulated in Horn-ALCI, that otherwise rewritings are not guaranteed to exist
and that, when they exist, they might necessarily involve BGPs whose number of
3
    Here and in the subsequent results, we assume that every CQ in the original UCQ
    is connected in the sense that every variable is reachable from an answer variable.
variables is exponential in the size of the TBox of the original OMQ. We then es-
tablish a (non-effective) characterization which shows that PUBGP-rewritability
implies rewritability into a particular class of PUBGPs. It follows that it is never
necessary to modify the TBox when constructing PUBGP-rewritings and that in
atoms of the form C(x), the concept C can be restricted to a certain finite class
of concepts. This is a first step towards decidability of PUBGP-rewritability,
which remains as an interesting open problem.
    A long version of the paper containing full proofs in the appendix is available
at http://www.informatik.uni-bremen.de/tdki/research/papers.html.


2    Preliminaries

We assume familiarity with standard DL notation and languages [2] and only
introduce notions that are potentially ambiguous. The main DL studied in this
paper is ALCI. An ABox A is a finite set of assertions of the form A(a) and
r(a, b), where A is a concept name, r a role name, and a, b are individual names.
We use ind(A) to denote the set of all individual names that occur in A. An
interpretation is a model of an ABox A if it satisfies all assertions in A, that is,
a ∈ AI when A(a) is in A and (a, b) ∈ rI when r(a, b) is in A. We thus make the
standard name assumption. A signature Σ is a set of concept and role names.
We use sig(A) to denote the set of concept and role names that occur in the
ABox A, and likewise for other syntactic objects such as TBoxes. An ABox A
is a Σ-ABox if sig(A) ⊆ Σ.
     An instance query (IQ) takes the form C(x) where C is a concept from the DL
under consideration and x a variable. In this paper, C will always be an ALCI
concept. For an interpretation I, we write I |= C(a) if a ∈ C I . A conjunctive
query (CQ) is of the form q(x) = ∃y ϕ(x, y), where x and y are tuples of variables
and ϕ(x, y) is a conjunction of atoms of the form A(z), r(z1 , z2 ), or z1 = z2 , with
A a concept name, r a role name, and z, z1 , z2 ∈ x ∪ y. When z1 = z2 occurs in q
we assume w.l.o.g. that z1 , z2 ∈ x and there is no other atom in q which W contains
x2 . A union of conjunctive queries (UCQ) q(x) is a formula of the form i qi (x),
where each qi (x) is a CQ. When compound concepts are admitted in place of
concept names in a CQ/UCQ, then we speak about an extended CQ/UCQ. For
any kind of query, the free variables are called answer variables, the arity is the
number of answer variables, and a query is Boolean if it has arity zero. A CQ is
full if all variables are answer variables and a UCQ is full if every disjunct is.
     A homomorphism from a CQ q(x) to an interpretation I is a function h :
x ∪ y → ∆I such that h(z) ∈ AI for every atom A(z) of q(x), (h(z1 ), h(z2 )) ∈ rI
for every atom r(z1 , z2 ) of q(x), and h(x1 ) = h(x2 ) for every atom x1 = x2
of q(x). We write I |= q(a) and call a an answer to q(x) onWI if there is a
homomorphism from q(x) to I with h(x) = a. For a UCQ i qi (x) and an
interpretation I, we write I |= q(a) if I |= qi (a) for some i. The semantics of
extended CQ/UCQs is as expected.
     We now introduce query languages related to SPARQL [9]. A basic graph
pattern (BGP) is an extended full CQ. Note that IQs coincide with unary BGPs.
                                                            S
A union of basic graph patterns (UBGP) is ofSthe form i ϕi (x) where each ϕi
is a BGP and a PUBGP is of the form Πx ( i ϕi (x, yi )) where each ϕi (x, yi )
is a BGP. UBGPs correspond to PUBGPs in which all tuples yi are empty as
in this case the projection to x denoted by Πx is vacuous. Note that BGPs and
UBGPs must have non-zero arity while this is not the case for PUBGPs.
    An ontology-mediated query (OMQ) takes the form Q = (T , Σ, q(x)) with T
a TBox, Σ ⊆ sig(T ) ∪ sig(q) an ABox signature, and q(x) a query. The arity of
Q is the arity of q(x). When Σ is sig(T ) ∪ sig(q), then for brevity we denote it
with Σfull and speak of the full ABox signature. Let A be a Σ-ABox. If q(x) is
an IQ or a (possibly extended) UCQ, then a is an answer to Q on A if I |= q(a)
for all modelsSI of A and T . This also captures the case where q(x) is a BGP.
If q(x) = Πx ( i ϕi (x, yi )) is a PUBGP, then a is an answer to Q on A if there
exist i and b such that ab is an answer to (T , Σ, ϕi (x, yi )). This also captures
the case where q(x) is a UBGP. In either case, we write A |= Q(a) if a is an
answer to Q on A.
    It is important to note and to keep in mind that, in OMQs, the union in
(P)UBGPs behaves differently from the disjunction in UCQs and the projection
in PUBGPs behaves differently from the existential quantification in UCQs (also
note that the order is reversed). For example, let T = {∃r.> v A t B} and A
an {A, B}-ABox. Then the UCQ-based OMQ (T , Σfull , A(x) ∨ B(x)) returns
every individual in the range of r as an answer whereas the UBGP-based OMQ
(T , Σfull , A(x) ∪ B(x)) returns only those individuals a from the range of r such
that A(a) ∈ A or B(a) ∈ A. In fact, the latter query corresponds to executing the
two OMQs (T , Σfull , A(x)) and (T , Σfull , B(x)) independently and then taking the
union of their answer sets. Likewise, the projection of PUBGPs is a projection
on answer sets.
    We use (L, Q) to refer to the OMQ language in which the TBox is formulated
in the DL L and where the actual queries are from the query language Q, such
as in (ALCI, UCQ).
Definition 1. Let (L, Q) be an OMQ language. An OMQ Q = (T , Σ, q(x)) is
(L, Q)-rewritable if there is an OMQ Q0 from (L, Q) such that the answers to Q
and to Q0 are identical on any Σ-ABox that is consistent with T . In this case,
we say that Q is rewritable into Q0 and call Q0 a rewriting of Q.
We are interested in rewriting OMQs from (ALCI, UCQ) into OMQs based on
IQs, UBGPs, and PUBGPs. For brevity, we speak of Q-rewritability instead of
(ALCI, Q)-rewritability. For example, IQ-rewritability means rewritability of an
OMQ from (ALCI, UCQ) into an OMQ from (ALCI, IQ) with the IQ formu-
lated in ALCI, and PUBGP-rewritability means rewritability into an OMQ from
(ALCI, PUBGP) where the PUBGP uses only ALCI compound concepts. The
following example of IQ-rewritability is from [7]. Examples and non-examples of
PUBGP-rewritability are given later in this paper.
Example 1. Let q1 (x) = r(x, x). The OMQ Q1 = (∅, {r}, q1 (x)) is rewritable into
the OMQ (∅, {r}, C(x)) where C is the ALCI concept P → ∃r.P. In contrast,
let q2 (x) = ∃y r(x, y) ∧ r(y, y). It follows from Theorem 1 below that the OMQ
Q2 = (∅, {r}, q2 (x)) is not rewritable into an OMQ from (ALCI, IQ).
We close this section with some additional, more technical preliminaries. We
say that an OMQ Q = (T , Σ, q(x)) is empty if for all Σ-ABoxes A, there is no
answer to Q on A. Let Q1 , Q2 be OMQs, Qi = (Ti , Σ, qi (x)) for i ∈ {1, 2}. Then
Q1 is contained in Q2 , written Q1 ⊆ Q2 , if for all Σ-ABoxes A, every answer
to Q1 on A is also an answer to Q2 on A. Further, Q1 and Q2 are equivalent,
written Q1 ≡ Q2 , if Q1 ⊆ Q2 and Q2 ⊆ Q1 .
    Every CQ q(x) = ∃y ϕ(x, y) gives rise to an undirecteded graph Gq whose
nodes are the elements of x ∪ y and that contains an edge {z1 , z2 } if ϕ(x, y)
contains an atom r(z10 , z20 ) or z10 = z20 with {z1 , z2 } = {z10 , z20 }. Gq might contain
self loops. We say that q(x) is connected if every variable is reachable from an
answer variable in Gq . An UCQ is connected if every CQ in it is and an OMQ
from (ALCI, UCQ) is connected if the UCQ in it is. A contraction of a CQ q(x)
is a CQ obtained from q(x) by zero or more variable identifications, where the
identification of an answer variable x with any non-answer variable yields x.


3     IQ-rewritability
We give an overview of the results on IQ-rewritability in (ALCI, UCQ) from [7].
In that paper, several other DLs are considered as well, including ALC and
variations of ALC and ALCI that support role hierarchies, the universal role,
and functional roles. In some cases, the characterization of OMQ rewritability
and the complexity of deciding IQ-rewritability are rather different from the
ALCI case, which we briefly comment on. We also show that unary OMQs from
(ALCI, IQ), (ALCI, BGP), and (ALCI, UBGP) have the same expressive power
while unary OMQs from (ALCI, PUBGP) are more expressive.
    We start with a fundamental characterization of OMQs from (ALCI, UCQ)
that are IQ-rewritable, first giving some preliminaries. Let q(x) be a unary CQ.
We can assume w.l.o.g. that, being unary, q(x) contains no equality atoms. A
cycle in q(x) is a sequence of non-identical atoms r0 (x0 , x1 ), . . . , rn−1 (xn−1 , xn )
in q(x), n ≥ 1, where4
 1. r0 , . . . , rn−1 are (potentially inverse) roles,
 2. xi 6= xj for 0 ≤ i < j < n, and x0 = xn .
We say that q(x) is x-acyclic if every cycle in it passes through x. A UCQ is
x-acyclic if every CQ in it is.
   Let q(x) be a UCQ. We use qacyc (x) to denote the UCQ that consists of all
                                                           con
contractions of a CQ from q(x) that are x-acyclic and qacyc    (x) to denote the
UCQ obtained from qacyc (x) by restricting every CQ in it to atoms that only use
variables reachable in Gq from the answer variable x. The following theorem is
the announced characterization [7].
Theorem 1. Let Q = (T , Σ, q(x)) be a unary OMQ from (ALCI, UCQ) that is
non-empty. Then the following are equivalent:
4
    We require the atoms be non-identical to prevent r(x0 , x1 ), r− (x1 , x0 ) from being a
    cycle (both atoms are identical).
1. Q is IQ-rewritable;
2. Q is rewritable into an OMQ Q0 = (T , Σ, C(x)) from (ALCI, IQ);
                con
3. Q ≡ (T , Σ, qacyc (x)).
    Theorem 1 excludes empty OMQs, but these are trivially IQ-rewritable;
moreover, emptiness of OMQs from (ALCI, UCQ) is decidable (and 2ExpTime-
complete) [1]. Equivalence of Points 1 and 2 implies that it is not necessary to
modify the TBox when constructing an IQ-rewriting. The latter is not the case,
for example, when constructing IQ-rewritings of OMQs from (ALC, UCQ) as-
suming that the constructed IQ must be an ALC concept. In that case, Point 1
and 3 of Theorem 1 are still equivalent, but equivalence with Point 2 is not as
it might be necessary to (mildly) extend the TBox [7].
    In the direction “3 ⇒ 2”, we construct actual rewritings, extending a con-
struction due to Kikot and Zolin [11] to accomodate TBoxes, ABox signatures,
and UCQs (instead of CQs). This extension yields the following.
Lemma 1. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ). If q(x) is
x-acyclic and connected, then Q is rewritable into an OMQ (T , Σ, C(x)) with
C(x) an IQ. The size of the IQ C(x) is polynomial in the size of q(x).
   The IQ C(x) constructed in the proof of Lemma 1 takes the form P →
(C1 t· · ·tCn ) where P is a concept name or > and C1 , . . . , Cn are ELI-concepts.
Based on Theorem 1 and variations thereof and using careful reductions to OMQ
containment [5], one can establish the following complexity results [7].
Theorem 2. Deciding IQ-rewritability of a unary OMQ from (ALCI, UCQ) is
1. 2 NExpTime-complete in the general case;
2. 2 ExpTime-complete for OMQs based on the full ABox signature;
3. NP-complete for OMQs based on the empty TBox.
Different complexities are obtained for other DLs. For example, IQ-rewritability
is undecidable in (ALCF, CQ) and between ExpTime and coNExpTime in
(ALC, UCQ) when the ABox signature is full. As observed in [7], IQ-rewritings
for OMQs based on the empty TBox can be viewed as underapproximations for
OMQs with non-empty TBoxes in the sense that an IQ-rewriting for (∅, Σ, q(x))
with q(x) a UCQ is also an IQ-rewriting for (T , Σ, q(x)) for any ALCI TBox T .
    We now compare OMQs based on IQs to unary OMQs based on BGPs,
UBGPs, and PUBGPs. It is obvious from the definitions that OMQs from
(ALCI, IQ) and (ALCI, BGP) have the same expressive power. The follow-
ing is less trivial to show than one might think as converting an OMQ from
(ALCI, UBGP) into an OMQ from (ALCI, IQ) requires a careful modification
of the TBox to bridge the semantic gap between unions in UBGPs and disjunc-
tion in IQs. A proof is in the long version of the paper.
Theorem 3. The language (ALCI, IQ) has the same expressive power as unary
(ALCI, UBGP).
  The next example shows that unary (ALCI, PUBGP) is more expressive than
(ALCI, IQ).
Example 2. Let Q = (∅, {r}, q(x)) be the OMQ where q(x) is the PUBGP
Πx ϕ(x, y) and ϕ(x, y) is the BGP r(x, y) ∧ (P → ∃r.P )(y). It can be verified
that Q is equivalent to the OMQ Q2 from Example 1, which is not expressible
in (ALCI, IQ).


4    UBGP-Rewritability

We study UBGP-rewritability in (ALCI, UCQ). We first present an example
and a characterization of UBGP-rewritability, and then use it to show that this
property is decidable. The example illustrates that as a result of disjunction in
UCQs behaving differently from union in UBGPs, OMQs based on full UCQs
are not always UBGP-rewritable. Note that this is in contrast to the CQ case
since every OMQ based on a full CQ is by definition also an OMQ based on a
BGP.

Example 3. Take the UCQ

                      q(x, y) = (A(x) ∧ r(x, y)) ∨ (r(x, y) ∧ A(y)).

We first consider the OMQ Q0 = (∅, Σfull , q(x, y)) based on the empty TBox.
Clearly, Q0 is rewritable into the OMQ Q00 = (∅, Σfull , q 0 (x, y)) from the language
(ALCI, UBGP) in which the disjunction from the UCQ q(x, y) is replaced by
BGP union:
                  q 0 (x, y) = (A(x) ∧ r(x, y)) ∪ (r(x, y) ∧ A(y)).
Now consider the TBox T = {M v ∀s.A t ∀t.A} and let Q1 = (T , Σfull , q(x, y)).
Then Q01 = (T , Σfull , q 0 (x, y)) is not a rewriting of Q1 . To see this, consider the
ABox
                            A = {r(a, b), M (c), s(c, a), t(c, b)}.
Then A |= Q1 (a, b), but A 6|= Q01 (a, b). In fact, Q1 is not UBGP-rewritable.
We sketchS the proof. Assume to the contrary that there is a rewriting Q =
(T 0 , Σ, i ϕi (x, y)) of Q1 from (ALCI, UBGP). Then A |= Q(a, b). Consider the
ABox A0 obtained from A by dropping all assertions that use c, taking the union
with two copies Aa and Ab of A that do not share individuals with A or with
each other, and then identifying the copy of a in Aa with a, and the copy of b
in Ab with b. Thus,

     A0 = {r(a, b), M (c1 ), s(c1 , a), t(c1 , b0 ), M (c2 ), s(c2 , a0 ), s(c2 , b), t(a0 , b)}.

We then have A0 6|= Q1 (a, b), and so A0 6|= Q(a, b). By construction, there is a
homomorphism from A to A0 that maps a to a and also a homomorphism from A
to A0 that maps b to b. As entailment of ALCI concepts is preserved under ABox
homomorphisms, for every ALCI concept C, A |= (T 0 , Σfull , C(x))(a) implies
A0 |= (T 0 , Σfull , C(x))(a) and the same holds for b. We obtain A0 |= Q(a, b) from
A |= Q(a, b) and have derived a contradiction.
We now give the characterization of UBGP-rewritability, starting with some
technical preliminaries. We first identify a set of concepts to be used in UBGP-
rewritings and, later on, also in PUBGP-rewritings.
    Let Q = (T , Σ, q(x)) be a non-Boolean OMQ from (ALCI, UCQ). Let Qq(x)
be the set of unary CQs p(x) that are connected and x-acyclic and can be
obtained from a contraction of a CQ in q(x) by dropping all equality atoms,
then (potentially) taking a subquery, and finally choosing any variable x as the
new answer variable and existentially quantifying all other variables. For every
such p(x), take an ALCI concept Cp such that the CQ-based OMQ (T , Σ, p(x))
is equivalent to the IQ-based OMQ (T , Σ, Cp (x)), as constructed in the proof of
Lemma 1. Let sub(Q) be the closure under subconcepts of concepts Cp , p(x) ∈
Qq(x) , and concepts used in T . A Q-type is a minimal set of concepts that
contains C or ¬C for every C ∈ sub(Q). For a Q-type τ , set Cτ = u C. Also
                                                                     C∈τ
let CQ be the set of concepts of the form Cτ1 t · · · t Cτ` with each τi a Q-type.
    We next define an approximation of Q from below that is formulated in
(ALCI, UBGP) and based on the TBox T from Q and the concepts from CQ .
More precisely, QUBGP is the OMQ (T , Σ, q 0 (x)) where q 0 (x) is the UBGP that
is the union of all BGPs ϕ(x) such that
1. for every x ∈ x, there is a unique atom D(x) ∈ ϕ(x), and D ∈ CQ and
2. (T , Σ, ϕ(x)) ⊆ Q.
It is easy to see that there are only finitely many such BGPs. Note that, by
construction, QUBGP ⊆ Q.
Theorem 4. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ) that is con-
nected, of arity at least one, and non-empty. Then the following are equivalent:
1. Q is UBGP-rewritable;
2. QUBGP is a rewriting of Q;
3. Q ⊆ QUBGP .
Since QUBGP uses the same TBox as Q, Theorem 4 implies that it is not nec-
essary to modify the TBox when constructing UBGP-rewritings. The proof of
Theorem 4 is a minor variation of the proof of Theorem 7 below, given after
Lemma 2.
    The next theorem can be obtained by a reduction of UBGP-rewritability to
containment in monadic disjunctive datalag [5], based on Theorem 4.
Theorem 5. UBGP-rewritability in (ALCI, UCQ) is decidable.
   A 2NExpTime lower bound follows from Theorems 2 and 3. The described
reduction only yields a 4NExpTime upper bound.

5   PUBGP-Rewritability
We study PUBGP-rewritability in (ALCI, UCQ), first observing that these al-
ways exist when the TBox of the original OMQ is formulated in the fragment
Horn-ALCI of ALCI. This is essentially a consequence of the fact that Horn-
ALCI has universal models, more details are given in the long version.
Theorem 6. Every OMQ from (Horn-ALCI, UCQ) is PUBGP-rewritable.
    We next give an example showing that PUBGP-rewritings are not guaranteed
to exist in the non-Horn case. This is already true for OMQs from (ALC, CQ)
even when ALCI concepts are admitted in PUBGPs.
Example 4. Let Q = (T , Σfull , q(x)) with

            T = {M v M1 t M2 , M1 v ∀s.A, M2 v ∀t.A, A v ∀t.A}

and q(x) = ∃y r(x, y) ∧ A(y) ∧ r(y, y). Then Q is not PUBGP-rewritable. For a
proof by contradiction, assume that Q0 = (T 0 , Σfull , q 0 (x)) is a PUBGP-rewriting.
For every n ≥ 1, let
                  An = {r(a, b1 ), r(a, b2 ), r(b1 , b1 ), r(b2 , b2 ),
                        M (c1 ), s(c1 , b1 ), t(c1 , c2 ), . . . , t(cn , b2 )}.

It can be verified that An |= Q(a) and thus An |= Q0 (a), for every n. Choose
n > 0 that exceeds the number of variables in any BGP in q 0 . As An |= Q0 (a),
there must be a BGP ϕ(x, y) in q 0 and a b such that An |= Qϕ (a, b) for the
OMQ Qϕ = (T , Σfull , ϕ(x, y)). Further, there must be a ci in An that is not in b.
Let ind− = ind(An ) \ {ci } and for each b ∈ ind− , let Abn denote the disjoint copy
of An obtained by renaming every individual c to cb . Now consider the ABox A0n
obtained from An by removing all assertions that use ci , taking the disjoint union
with all Abn , b ∈ ind− , and then identifying each individual b ∈ ind− with the
individual bb from Abn . Note that this construction is similar to the construction
of A0 in Example 3. We have A0n |= Qϕ (a, b) since for every b ∈ ind− , there is a
homomorphism h from An to A0n with h(b) = b and for all b, b0 ∈ ind− and role
names v, v(b, b0 ) ∈ An iff v(b, b0 ) ∈ A0n . On the other hand, it is easy to see that
A0n 6|= Q(a) due to the removal of ci .
    Next example shows that when constructing PUBGP-rewritings of OMQs
from (ALC, CQ), it can be necessary to use BGPs whose topology is different
from that of the CQs in the original OMQ. In fact, exponentially many variables
(in the size of the TBox of the original OMQ) can be required in BGPs. We
conjecture that the example can even be improved to show a double exponential
lower bound on the number of variables. As for the previous example, the same
is true for (ALCI, CQ), that is, when ALCI concepts are admitted in PUBGPs.
Example 5. Let Q = (T , Σ, q(x)) be the OMQ with

              T = {Ai v ∀r.Ai+1 t ∀s.Ai+1 | 0 ≤ i < n} ∪ {An v A}
              Σ = {A0 , r, s, t}
            q(x) = ∃y t(x, y) ∧ A(y) ∧ t(y, y).

Q is rewritable into the OMQ Q0 = (∅, Σ, q 0 (x)) where q 0 (x) is the PUBGP
Πx ϕ(x, y) with ϕ(x, y) as depicted in Figure 5. Observe that the topmost part
is a full binary tree of depth n. We aim to show that there is no rewriting of Q
that uses less than 2n variables.
                                        A0
                                  r               s

                          r       s               r           s

                        ...       ...         ...             ...


                    r         s                           r         s
               t          t       ...   ...   ...             t         t

                              t   t           t       t

                                        x


              Fig. 1. PUBGP-rewriting for the OMQ Q in Example 5

     Assume for a proof by contradiction that there is such a rewriting Q0 =
(T 0 , Σ, q 0 (x)). Let Aϕ be ϕ viewed as an ABox. Clearly, Aϕ |= Q(x) and thus
there must be a BGP ϕ0 (x, y0 ) in q 0 and a homomorphism h such that Aϕ |=
Qϕ (x, h(y0 )) where Qϕ = (T 0 , Σ, ϕ0 (x, y0 )). Some leaf node y from the binary
tree in ϕ must be outside the range of h. Now consider the ABox A0ϕ obtained
from Aϕ by dropping all assertions that use y, adding a disjoint copy Azϕ of Aϕ
for every z ∈ ind(Aϕ ) except y, and then identifying each z from Aϕ with the
copy of z in Azϕ . Note that this construction is similar to the construction of
A0 in Example 3. Then, as in Example 3, A0ϕ |= Q0 (x, h(y0 )) follows from the
fact that ϕ0 (x, y0 ) is a BGP in q 0 . On the other hand, A0ϕ 6|= Q(x) and we have
derived a contradiction.
   We now show that PUBGP-rewritability implies rewritability into a more
controlled class of PUBGPs, based on the set of concepts CQ defined in Section 4.
A CQ -PUBGP is a PUBGP that uses only concepts from CQ .
Theorem 7. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ) that is con-
nected, of arity at least one, and non-empty. Then the following are equivalent:
1. Q is PUBGP-rewritable;
2. Q is rewritable into an OMQ Q0 = (T , Σ, q 0 (x)) with q 0 (x) a CQ -PUBGP.
Theorem 7 clearly implies that it is not necessary to modify the TBox when con-
structing PUBGP-rewritings. It also implies that it is not necessary to use con-
cepts from outside CQ . Unlike Theorem 4, however, it does not immediately give
rise to a decision procedure for PUBGP-rewritability. Note that an approxima-
tion QPUBGP of Q from below, formulated in (ALCI, PUBGP) and constructed
in exact analogy with the query QUBGP from Theorem 4, is not guaranteed to be
finite. In fact, the main challenge in showing decidability of PUBGP-rewritability
is to give a bound on the number of variables used in PUBGP-rewritings. An
additional complication is that there are no existing approaches for deciding
containment or equivalence between an OMQ from (ALCI, UCQ) and an OMQ
from (ALCI, PUBGP). Regarding the converse direction, it is possible to prove
that every OMQ from (ALCI, PUBGP) can be rewritten into (ALCI, UCQ),
the construction being similar to that used in the proof of Theorem 3.
    To prove Theorems 7 and 4, we introduce a certain normalization of PUBGP-
rewritings. Let Q = (T , Σ, q(x)) be an OMQ from (ALCI, UCQ)  S and let QR =
(TR , Σ, qR (x)) be a PUBGP-rewriting of Q with qR (x) = Πx ( i ϕi (x, yi )). We
may assume w.l.o.g. that the set of BGPs ϕi (x, yi ) in qR (x) is closed under
contraction.5 From each BGP ϕi (x, yi ), we construct all BGPs ϕi,j (x, yi ) that
satisfy the following conditions:
 1. r(z1 , z2 ) ∈ ϕi,j iff r(z1 , z2 ) ∈ ϕi ;
 2. z1 = z2 ∈ ϕi,j iff z1 = z2 ∈ ϕi ;
 3. for each z ∈ xyi , there is a unique atom C(z) ∈ ϕi,j , and C ∈ CQ ;
 4. (T , Σ, Πx ϕi,j (x, yi )) ⊆ Q.
We call the OMQ Q0R = (T , Σ, Πx ( i,j ϕi,j (x, yi ))) from (ALCI, PUBGP) the
                                              S
normalization of QR . In the long version of the paper, we show the following,
which clearly yields Theorem 7.
Lemma 2. If QR is a PUBGP-rewriting of Q, then so is its normalization Q0R .
     We note that Lemma 2 also gives Theorem 4. In fact, assume that an
OMQ Q = (T , Σ, q(x)) from (ALCI, UCQ) is rewritable into an OMQ QR =
(T 0 , Σ, q 0 (x)) from (ALCI, UBGP), a subclass of (ALCI, PUBGP). Then, the
rewriting Q0R constructed above is also from (ALCI, UBGP). Recall the OMQ
QUBGP constructed before Theorem 4. By construction, QUBGP ⊆ Q and QUBGP
contains every BGP from QR . Thus, we also have Q ⊆ QUBGP , as required.


6     Conclusion
The main problem left open in this paper is whether PUBGP-rewritability is de-
cidable in (ALCI, UCQ) and related OMQ languages. As a precursor, it would
be interesting to show that equivalence of an OMQ from (ALCI, UCQ) and
an OMQ from (ALCI, PUBGP) is decidable. Another interesting question is
whether rewritability into PUBGPs is actually different from rewritability into
(more) complete SPARQL, that is, whether features such as difference make it
possible to rewrite additional queries. Finally, it would be interesting to investi-
gate the consequence of admitting constants in the original query and nominals
in the rewriting.


Acknowledgements. Cristina Feier and Carsten Lutz were supported by ERC
Consolidator Grant 647289 CODA. Frank Wolter was supported by EPSRC UK
grant EP/M012646/1.
5
    Meaning that we treat ϕi (x, yi ) as a CQ with answer variables x.
References
 1. Franz Baader, Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. Query and
    predicate emptiness in ontology-based data access. J. Artif. Intell. Res., 56:1–59,
    2016.
 2. Franz Baader, Ian Horrocks, Carsten Lutz, and Ulrike Sattler. An Introduction to
    Description Logic. Cambridge University Press, 2017.
 3. Meghyn Bienvenu and Magdalena Ortiz. Ontology-mediated query answering with
    data-tractable description logics. In Proc. of Reasoning Web, volume 9203 of LNCS,
    pages 218–307. Springer, 2015.
 4. Meghyn Bienvenu, Balder ten Cate, Carsten Lutz, and Frank Wolter. Ontology-
    based data access: A study through disjunctive Datalog, CSP, and MMSNP. ACM
    Trans. Database Syst., 39(4):33:1–33:44, 2014.
 5. Pierre Bourhis and Carsten Lutz. Containment in monadic disjunctive datalog,
    MMSNP, and expressive description logics. In Proc. of KR, pages 207–216. AAAI
    Press, 2016.
 6. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    Antonella Poggi, Mariano Rodriguez-Muro, and Riccardo Rosati. Ontologies and
    databases: The DL-Lite approach. In Proc. of Reasoning Web 2009, volume 5689
    of LNCS, pages 255–356. Springer, 2009.
 7. Cristina Feier, Carsten Lutz, and Frank Wolter. From conjunctive queries to in-
    stance queries in ontology-mediated querying. In Proc. of IJCAI, pages 1810–1816.
    ijcai.org, 2018.
 8. Birte Glimm, Ian Horrocks, Boris Motik, Giorgos Stoilos, and Zhe Wang. Hermit:
    An OWL 2 reasoner. J. of Autom. Reasoning, 53(3):245–269, 2014.
 9. Birte Glimm and Chimezie Ogbuji, editors.                  SPARQL 1.1 Entail-
    ment Regimes.        W3C Recommendation, 21 March 2013.               Available at
    https://www.w3.org/TR/sparql11-entailment/.
10. Stanislav Kikot, Dmitry Tsarkov, Michael Zakharyaschev, and Evgeny Zolin.
    Query answering via modal definability with FaCT++: First blood. In Proc. DL,
    volume 1014 of CEUR Workshop Proceedings, pages 328–340. CEUR-WS.org, 2013.
11. Stanislav Kikot and Evgeny Zolin. Modal definability of first-order formulas with
    free variables and query answering. J. Applied Logic, 11(2):190–216, 2013.
12. Ilianna Kollia, Birte Glimm, and Ian Horrocks. SPARQL query answering over
    OWL ontologies. In Proc. of ESWC, volume 6643 of LNCS, pages 382–396.
    Springer, 2011.
13. Evgeny Zolin. Modal logic applied to query answering and the case for variable
    modalities. In Proc. of DL, volume 250 of CEUR Workshop Proceedings. CEUR-
    WS.org, 2007.