=Paper=
{{Paper
|id=Vol-1350/paper-01
|storemode=property
|title=Query
                Rewriting Beyond DL-Lite
|pdfUrl=https://ceur-ws.org/Vol-1350/paper-01.pdf
|volume=Vol-1350
|dblpUrl=https://dblp.org/rec/conf/dlog/Lutz15
}}
==Query
                Rewriting Beyond DL-Lite==
<pdf width="1500px">https://ceur-ws.org/Vol-1350/paper-01.pdf</pdf>
<pre>
            Query Rewriting Beyond DL-Lite

                                  Carsten Lutz

               Fachbereich Informatik, Universität Bremen, Germany


1   Abstract of Invited Talk

Query rewriting has become a very prominent tool for efficiently implementing
ontology-mediated querying in practice. The technique was originally introduced
in the context of DL-Lite [4], but is now increasingly being used also for more
expressive DLs. While rewritings are not guaranteed to exist beyond DL-Lite,
the simple structure of ontologies that emerge from practical applications gives
hope that non-existence of rewritings is a rare case.
    The aim of the talk is to survey FO- and Datalog-rewriting of ontology-
mediated queries in description logics beyond DL-Lite. It is structured into three
parts. The first part is concerned with FO-rewritings in Horn-DLs such as EL,
ELI, and Horn-SHI, the second part considers FO-rewritings in non-Horn-DLs
such as ALC and ALCI, and the third part is about Datalog-rewritings in non-
Horn DLs. In all three parts, I will try to emphasize useful characterizations of
FO-rewritability, practically efficient algorithms for constructing rewritings, and
relevant computational complexity results.
    The presentation is based on joint work with Meghyn Bienvenu, Balder ten
Cate, Peter Hansen, İnanc Seylan, and Frank Wolter. The subsequent section
provides some supplementary material that is featured in the talk, but has not
yet been published elsewhere. It establishes a link between the first and the
second part of the talk.


2   Supplementary Material

In [3, 7], we have proposed an approach to deciding the FO-rewritability of OMQs
for the case where the ontology/TBox is formulated in a Horn DL such as EL,
ELI, and Horn-ALCI. The approach has led to efficient (yet complete) practical
implementations, and it relies on a characterization of FO-rewritability in terms
of tree-shaped ABoxes. Intuitively, the characterization relies on a property of
TBoxes that is called ‘unraveling tolerance’ and which typically is enjoyed by
Horn DLs, but not by DLs that include forms of disjunction. In contrast, the only
known complete approach to deciding FO-rewritability of OMQs in which the
TBox is formulated in full (non-Horn) ALC and ALCI is via the CSP connection
in [9, 2]. Since ALC- and ALCI-TBoxes are typically not unraveling tolerant, it
might seems that these two world are largely unrelated. In the following, though,
we point out a characterization of FO-rewritability in full ALCI that establishes
an interesting connection to tree-shaped ABoxes and thus to the Horn case. We
consider Boolean atomic queries (BAQs), that is, queries of the form ∃x A(x)
with A a concept name.
    An ontology-mediated query (OMQ) is a triple Q = (T , Σ, q) with T a TBox,
Σ an ABox signature (set of concept and role names), and q a query. An OBDA
language is a set of OMQs. We use (ALCI, BAQ) to denote the OBDA language
that consists of all OMQs (T , Σ, q) with T an ALCI-TBox and q a BAQ, and
likewise for other combinations of a DL and a query language. An OMQ Q =
(T , Σ, q) is FO-rewritable if there is an FO-sentence ϕ such that for every Σ-
ABox A that is consistent w.r.t. T , we have A |= Q iff A |= ϕ.
    As usual in OBDA, an ABox is a finite set of assertions of the form A(a)
or r(a, b) with A a concept name and r a role name. We write r− (a, b) ∈ A to
mean r(b, a) ∈ A and use Ind(A) to denote the set of individuals used in A. An
ABox A is tree-shaped if the undirected graph (Ind(A), {{a, b} | r(a, b) ∈ A}) is
a tree and whenever r(a, b) ∈ A, then (i) s(a, b) ∈ A implies r = s and (ii) A
contains no assertion of the form s(b, a). Tree-shapedness of conjunctive queries
(CQs) is defined accordingly. Note that, in both cases, our trees allow upwards-
and downwards-directed edges, but no multi-edges.
    We now introduce unravelings of ABoxes and the notion of unraveling toler-
ance [9]. Let A be an ABox and a ∈ Ind(A). The unraveling Aua of A at a is the
following (possibly infinite) ABox:

 – Ind(Aua ) is the set of sequences b0 r0 b1 · · · rn−1 bn , n ≥ 0, such that b0 = a,
   b0 , . . . , bn ∈ Ind(A) and r0 , . . . , rn−1 are (potentially inverse) roles;
 – for each C(b) ∈ A and α = b0 · · · bn ∈ Ind(Aua ) with bn = b: C(α) ∈ Aua ;
 – for each α = b0 r0 · · · rn−1 bn ∈ Ind(Aua ) with n > 0: rn−1 (b0 · · · bn−1 , α) ∈ Aua .

For all α = b0 · · · bn ∈ Ind(Aua ), we write tail(α) to denote bn . Note that Aua is tree-
shaped. An OMQ Q = (T , Σ, q) is unraveling tolerant if for every Σ-ABox A,
A |= Q implies Aua |= Q for some a ∈ Ind(A). Note that this is essentially the
same notion of unraveling tolerance as introduced in [9].
    It can be shown as in [9] that in OBDA languages where the TBoxes are
formulated in Horn DLs such as EL, ELI, and Horn-ALC and where queries are
BAQs or atomic queries (AQs, queries of the form A(x) with A a concept name),
all OMQs are unraveling tolerant. This underlies the following characterization
from [3].
Theorem 1 ([3]). A BAQ Q = (T , Σ, q) from (Horn-ALCI, AQ) is FO-re-
writable iff there exists a k ≥ 0 such that for all tree-shaped Σ-ABoxes A which
are consistent with T , A |= Q implies A|k |= Q where Ak is A with all nodes on
level exceeding k removed.
Using a pumping argument, it can be shown that if there is any bound k as
Theorem 1, then we can choose k = 22|T | . Based on this, worst-case optimal
(ExpTime) decision procedures for FO-rewritability in (Horn-ALCI, AQ) can
be devised using automata methods. Efficiently computing rewritings in practice
requires further algorithm engineering [7].
    We will now establish a characterization of FO-rewritability in the non-
Horn OBDA language (ALCI, BAQ). It is shown in [2] that for every OMQ
Q = (T , Σ, q) from (ALCI, BAQ), there is a CSP template (a finite relational
structure) TQ over signature Σ such that for all Σ-ABoxes A, we have A, T |= q
iff A 6→ TQ , that is, iff there is no homomorphism from A to TQ (in the standard
sense of labeled directed graphs). We say that a CSP template T is FO-definable
if there is an FO-sentence ϕ such that for all finite Σ-structures S, we have
S → T iff S |= ϕ. The complement of T is definable in monadic Datalog if there
is a monadic Datalog program Π such that for all finite Σ-structures S, we have
S 6→ T iff S |= Π. Note that a CSP template is FO-definable iff its complement is
(just take the negation of the defining sentence), but this is not true for monadic
Datalog definability.
    It is easy to see that an OMQ Q is FO-rewritable if and only if the comple-
ment of TQ is FO-definable, and likewise for rewritability into monadic Datalog.
In [2], this observation is used together with results on the FO-definability of
CSPs [8] to show the following.
Theorem 2 ([2]). FO-rewritability in (ALCI, BAQ) and (ALCI, AQ) is decid-
able and NExpTime-complete.
This approach is also capable of producing actual rewritings, but unfortunately it
is best-case exponential. This calls for a better understanding of FO-rewritability
in (ALCI, BAQ) and related languages, as a basis for more practical (yet com-
plete) approaches.
    As a preliminary, we show that unraveling tolerance is equivalent to rewritabil-
ity into monadic Datalog. This actually follows straightforwardly from known
results about CSPs.
Theorem 3. An OMQ from (ALCI, BAQ) is unraveling tolerant iff it is re-
writable into monadic Datalog.

Proof. A CSP template T over signature Σ has tree duality iff there is a set O
of tree-shaped Σ-structures (called obstructions and where tree-shapedness is
defined as for ABoxes and CQs above) such that for all finite Σ-structures S,
we have T ← S iff S 6← O for all O ∈ O. It was shown in [5] that T has tree
duality iff the complement of T is definable in monadic Datalog. It thus remains
to show that an OMQ from (ALCI, BAQ) is unraveling tolerant iff TQ has tree
duality.
    “if”. Assume that Q = (T , Σ, q) is unraveling tolerant. Let O be the set of all
tree-shaped Σ-ABoxes A with A |= Q. Then O witnesses tree duality: if TQ ← A
for some Σ-ABox A, then A 6|= Q; since B |= Q and B → A implies A |= Q [2],
we thus have A 6← B for all B ∈ O as required. Conversely, assume that A is a
Σ-ABox with A 6← B for all B ∈ O. Clearly, Aua → A for all a ∈ Ind(A). Thus, no
such Aua is in O, implying that Aua 6|= Q. Since Q is unraveling tolerant, A 6|= Q
which implies TQ ← A as required.
    “only if”. Assume that TQ has tree duality with set of obstructions O. Let A
be a Σ-ABox with A |= Q. Then TQ 6← A and thus A ← B for some B ∈ O. Since
B is tree-shaped, A ← B implies Aua ← B for some a ∈ Ind(A). Consequently
TQ 6← Aua which yields Aua |= Q as required.                           t
                                                                       u

We now establish the announced characterization.
Theorem 4. Let Q = (T , q, Σ) be an OMQ from (ALCI, BAQ). Then Q is
FO-rewritable iff

1. Q is FO-rewritable on tree-shaped ABoxes and
2. Q is unraveling tolerant.

Proof. “if”. Assume that Q is unraveling tolerant and FO-rewritable on tree-
shaped ABoxes. By Theorem 3, the complement of the template TQ is definable
                                                0
by a monadic Datalog program ΠQ . Let ΠQ            be obtained from ΠQ by identifying
the variables in rule bodies in all possible ways and then retaining only those
                                                                          0
rules whose bodies are a tree-shaped CQ. It can be verified that ΠQ         is a rewriting
                          u
of Q: A |= Q implies Aa |= Q for some a ∈ Ind(A) (since Q is unraveling tolerant)
                                                                             0
implies Aua |= ΠQ (since ΠQ is a rewriting of Q) implies Aua |= ΠQ              (since Aua
                                  0                                              0
is tree-shaped) implies A |= ΠQ       (since AuA → A). Conversely, A |= ΠQ         implies
                                    0
A |= ΠQ (by construction of ΠQ        ) implies A |= Q. It is easy to further modify
   0
ΠQ    so that in addition to being tree shaped, every role body contains at most
one EDB atom.
                                        0
     We now use the existence of ΠQ       to argue that Q has an FO-rewriting ϕ on
tree-shaped ABoxes that takes the form of a union of tree-shaped CQs. Let ψ be
an FO-rewriting of Q on tree-shaped ABoxes. By Gaifman’s locality theorem,
there is a number d ≥ 0 such that for every Σ-ABox A, we have A |= ψ iff A∗d |= ψ
where A∗d is obtained by taking the disjoint union of all d-neighborhoods in A;
here, the d-neighborhood in A around a ∈ Ind(A) is the restriction of A to all
individuals that can be reached from a on a role path in A of length at most d.
Note that ψ is a rewriting of Q and every OMQ from (ALCI, BAQ) satisfies
the property that if a Σ-ABox A is the disjoint union of ABoxes A1 , . . . , Ak ,
then A |= Q iff Ai |= Q for at least one Ai . We can thus strengthen the above
obervation as follows: for every Σ-ABox A, we have A |= ψ iff there is some d-
                                                                      0
neighborhood N in A such that N |= ψ. Since both ψ and ΠQ               are rewritings of
                                                                                 0
the same query Q, the same applies to the monadic Datalog program ΠQ               instead
of to ψ. Moreover, we can find an ` ≥ 0 such that for every Σ-ABox A with
          0
A |= ΠQ     , there is an A0 ⊆ A with A0 |= ΠQ    0
                                                     and in which every individual has
                                                        0
degree at most `—due to the special shape of ΠQ           , we can in fact simply choose
                                              0
for ` the number of IDB relations in ΠQ . Combining these two observations,
we get the following: for every tree-shaped Σ-ABox A with A |= Q, there is a
tree-shaped ABox A0 ⊆ A of depth at most d and degree at most ` such that
A0 |= Q. We can thus choose as the desired rewriting ϕ the UCQ that consists
of all tree-shaped ABoxes A (viewed as a CQ) that satisfy A |= Q and are of
depth at most d and of degree at most `.
     It remains to note that, due to its syntactic shape, ϕ is an FO-rewriting not
only on tree-shaped ABoxes, but also on unrestricted ones. First assume that
A is a Σ-ABox with A |= Q. Since Q is unraveling tolerant, there then is an
a ∈ Ind(A) with Aua |= Q. Since ϕ is an FO-rewriting on tree-shaped ABoxes,
we get Aua |= ϕ. Since ϕ is a UCQ and Aua → A, we obtain A |= ϕ. Conversely,
assume A |= ϕ. Since ϕ is a union of tree-shaped CQs, this yields Aua |= ϕ for
some a ∈ Ind(A), thus Aua |= Q and A |= Q.
     “only if”. Assume that Q is FO-rewritable. Then it is clearly also FO-
rewritable on tree-shaped ABoxes (the same rewriting works). It thus remains
to show that Q is unraveling tolerant.
     It is proved in [1] that a CSP template T over signature Σ is FO-rewritable
iff it has finite duality, that is, iff there is a finite set of structures O such that for
all finite Σ-structures S, we have T ← S iff S 6← O for all O ∈ O. It was shown
in [10] that finite duality implies tree duality. In fact, as observed in [8], we can
assume w.l.o.g. that the finitely many elements of O are finite and tree-shaped.
One could call this finite duality in terms of finite trees.
     Now back to our OMQ Q. Since Q is FO-rewritable, so is TQ . By the above
result on finite duality in terms of finite trees, there is thus a finite set Γ of
tree-shaped ABoxes such that for all Σ-ABoxesWA, we have A |= Q iff B → A
for some B ∈ Γ . Consequently, the UCQ qb = B∈Γ qB is an FO-rewriting of
Q, where qB is B viewed as a Boolean CQ in the obvious way. Note that qb is a
disjunction of tree-shaped CQs. It is thus straightforward to show that for all
Σ-ABoxes A, we have A |= qb iff Aua |= qb for some a ∈ Ind(A). The unraveling
tolerance of Q follows.                                                                  t
                                                                                         u

The proof of Theorem 4 also yields the following corollary, which strengthens the
observation from [2] that in (ALCI, BAQ), every FO-rewritable OMQ is UCQ-
rewritable (essentially a consequence of Rossmann’s homomorphism preservation
theorem).
Corollary 1. If an OMQ in (ALCI, BAQ) is FO-rewritable, then it is rewritable
into a union of tree-shaped conjunctive queries.
We remark that, even when switching to the OBDA language (ALC, BAQ), it is
not possible to replace the undirected trees in Corollary 1 with directed trees.
    We close with some discussion of Theorem 4. As future work, we plan to adapt
the result from (ALCI, BAQ) to (ALCI, AQ) and to use them as a basis for
developing practically feasible algorithms that construct FO-rewritings. Dealing
with (ALCI, AQ) seems to require more liberal definitions of tree-shaped ABoxes
and of unraveling tolerance which allow for back-edges to the root as in the
tree-model property for DLs with nominals. To obtain a first impression of the
effect of answer variables, the reader might want to consider the following OMQ
Q = (T , Σ, q) from (ALCI, BAQ):

      T = {P u ∃r.P v A, ¬P u ∃r.¬P v A}                Σ = {r}       q = ∃x A(x)

and its variation Q0 from (ALCI, BAQ) obtained by replacing q with the AQ
q 0 = A(x). Q is not unraveling tolerant as witnessed by the ABox A = {r(a, a)}
which satisfies A |= Q, but Aua 6|= Q. The same is true for Q0 if the notion of
unraveling tolerance is adapted in a naive way to non-Boolean OMQs. However,
while Q is not FO-rewritable (by Theorem 4), it is not too hard to prove that
the FO-formula r(x, x) is an FO-rewriting of Q0 .
    Another interesting question concerns the complexity of deciding FO-re-
writability in (ALCI, BAQ) (and of course also (ALCI, AQ)) via Theorem 4.
It is shown in [5] that unraveling tolerance is decidable (in 3-ExpTime) and
using techniques from [2], it is possible to prove NExpTime-hardness. We spec-
ulate that the problem might actually be NExpTime-complete. Regarding FO-
rewritability in (ALCI, BAQ) on tree-shaped ABoxes, it seems likely that a
2-ExpTime lower bound can be established by combining reductions from [3]
and [6]—thus FO-rewritability on tree-shaped ABoxes would be harder than on
unrestricted ABoxes! However, if we already know that Q is unraveling tolerant,
then FO-rewritability on trees is trivially in NExpTime, simply by Theorem [6].
Acknowledgements. I am grateful to Frank Wolter for, as always, very helpful
and stimulating discussions.


References
 1. Atserias, A.: On digraph coloring problems and treewidth duality. Eur. J. Comb.
    29(4), 796–820 (2008)
 2. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A
    study through disjunctive datalog, CSP, and MMSNP. ACM Trans. Database Syst.
    39(4), 33:1–33:44 (2014)
 3. Bienvenu, M., Lutz, C., Wolter, F.: First-order rewritability of atomic queries in
    Horn description logics. In: Proc. of IJCAI (2013)
 4. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    J. Autom. Reasoning 39(3), 385–429 (2007)
 5. Feder, T., Vardi, M.Y.: The computational structure of monotone monadic SNP
    and constraint satisfaction: A study through datalog and group theory. SIAM J.
    Comput. 28(1), 57–104 (1998)
 6. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conser-
    vative extensions in description logics. In: Proc. of KR (2006)
 7. Hansen, P., Lutz, C., İnanç Seylan, Wolter, F.: Efficient query rewriting in the
    description logic EL and beyond. In: Proc. of IJCAI (2015)
 8. Larose, B., Loten, C., Tardif, C.: A characterisation of first-order constraint satis-
    faction problems. Logical Methods in Computer Science 3(4) (2007)
 9. Lutz, C., Wolter, F.: Non-uniform data complexity of query answering in descrip-
    tion logics. In: Proc. of KR (2012)
10. Nesetril, J., Tardif, C.: Duality theorems for finite structures (characterising gaps
    and good characterisations). J. Comb. Theory, Ser. B 80(1), 80–97 (2000)

</pre>