=Paper=
{{Paper
|id=Vol-1350/paper-01
|storemode=property
|title=Query
Rewriting Beyond DL-Lite
|pdfUrl=https://ceur-ws.org/Vol-1350/paper-01.pdf
|volume=Vol-1350
|dblpUrl=https://dblp.org/rec/conf/dlog/Lutz15
}}
==Query
Rewriting Beyond DL-Lite==
Query Rewriting Beyond DL-Lite Carsten Lutz Fachbereich Informatik, Universität Bremen, Germany 1 Abstract of Invited Talk Query rewriting has become a very prominent tool for efficiently implementing ontology-mediated querying in practice. The technique was originally introduced in the context of DL-Lite [4], but is now increasingly being used also for more expressive DLs. While rewritings are not guaranteed to exist beyond DL-Lite, the simple structure of ontologies that emerge from practical applications gives hope that non-existence of rewritings is a rare case. The aim of the talk is to survey FO- and Datalog-rewriting of ontology- mediated queries in description logics beyond DL-Lite. It is structured into three parts. The first part is concerned with FO-rewritings in Horn-DLs such as EL, ELI, and Horn-SHI, the second part considers FO-rewritings in non-Horn-DLs such as ALC and ALCI, and the third part is about Datalog-rewritings in non- Horn DLs. In all three parts, I will try to emphasize useful characterizations of FO-rewritability, practically efficient algorithms for constructing rewritings, and relevant computational complexity results. The presentation is based on joint work with Meghyn Bienvenu, Balder ten Cate, Peter Hansen, İnanc Seylan, and Frank Wolter. The subsequent section provides some supplementary material that is featured in the talk, but has not yet been published elsewhere. It establishes a link between the first and the second part of the talk. 2 Supplementary Material In [3, 7], we have proposed an approach to deciding the FO-rewritability of OMQs for the case where the ontology/TBox is formulated in a Horn DL such as EL, ELI, and Horn-ALCI. The approach has led to efficient (yet complete) practical implementations, and it relies on a characterization of FO-rewritability in terms of tree-shaped ABoxes. Intuitively, the characterization relies on a property of TBoxes that is called ‘unraveling tolerance’ and which typically is enjoyed by Horn DLs, but not by DLs that include forms of disjunction. In contrast, the only known complete approach to deciding FO-rewritability of OMQs in which the TBox is formulated in full (non-Horn) ALC and ALCI is via the CSP connection in [9, 2]. Since ALC- and ALCI-TBoxes are typically not unraveling tolerant, it might seems that these two world are largely unrelated. In the following, though, we point out a characterization of FO-rewritability in full ALCI that establishes an interesting connection to tree-shaped ABoxes and thus to the Horn case. We consider Boolean atomic queries (BAQs), that is, queries of the form ∃x A(x) with A a concept name. An ontology-mediated query (OMQ) is a triple Q = (T , Σ, q) with T a TBox, Σ an ABox signature (set of concept and role names), and q a query. An OBDA language is a set of OMQs. We use (ALCI, BAQ) to denote the OBDA language that consists of all OMQs (T , Σ, q) with T an ALCI-TBox and q a BAQ, and likewise for other combinations of a DL and a query language. An OMQ Q = (T , Σ, q) is FO-rewritable if there is an FO-sentence ϕ such that for every Σ- ABox A that is consistent w.r.t. T , we have A |= Q iff A |= ϕ. As usual in OBDA, an ABox is a finite set of assertions of the form A(a) or r(a, b) with A a concept name and r a role name. We write r− (a, b) ∈ A to mean r(b, a) ∈ A and use Ind(A) to denote the set of individuals used in A. An ABox A is tree-shaped if the undirected graph (Ind(A), {{a, b} | r(a, b) ∈ A}) is a tree and whenever r(a, b) ∈ A, then (i) s(a, b) ∈ A implies r = s and (ii) A contains no assertion of the form s(b, a). Tree-shapedness of conjunctive queries (CQs) is defined accordingly. Note that, in both cases, our trees allow upwards- and downwards-directed edges, but no multi-edges. We now introduce unravelings of ABoxes and the notion of unraveling toler- ance [9]. Let A be an ABox and a ∈ Ind(A). The unraveling Aua of A at a is the following (possibly infinite) ABox: – Ind(Aua ) is the set of sequences b0 r0 b1 · · · rn−1 bn , n ≥ 0, such that b0 = a, b0 , . . . , bn ∈ Ind(A) and r0 , . . . , rn−1 are (potentially inverse) roles; – for each C(b) ∈ A and α = b0 · · · bn ∈ Ind(Aua ) with bn = b: C(α) ∈ Aua ; – for each α = b0 r0 · · · rn−1 bn ∈ Ind(Aua ) with n > 0: rn−1 (b0 · · · bn−1 , α) ∈ Aua . For all α = b0 · · · bn ∈ Ind(Aua ), we write tail(α) to denote bn . Note that Aua is tree- shaped. An OMQ Q = (T , Σ, q) is unraveling tolerant if for every Σ-ABox A, A |= Q implies Aua |= Q for some a ∈ Ind(A). Note that this is essentially the same notion of unraveling tolerance as introduced in [9]. It can be shown as in [9] that in OBDA languages where the TBoxes are formulated in Horn DLs such as EL, ELI, and Horn-ALC and where queries are BAQs or atomic queries (AQs, queries of the form A(x) with A a concept name), all OMQs are unraveling tolerant. This underlies the following characterization from [3]. Theorem 1 ([3]). A BAQ Q = (T , Σ, q) from (Horn-ALCI, AQ) is FO-re- writable iff there exists a k ≥ 0 such that for all tree-shaped Σ-ABoxes A which are consistent with T , A |= Q implies A|k |= Q where Ak is A with all nodes on level exceeding k removed. Using a pumping argument, it can be shown that if there is any bound k as Theorem 1, then we can choose k = 22|T | . Based on this, worst-case optimal (ExpTime) decision procedures for FO-rewritability in (Horn-ALCI, AQ) can be devised using automata methods. Efficiently computing rewritings in practice requires further algorithm engineering [7]. We will now establish a characterization of FO-rewritability in the non- Horn OBDA language (ALCI, BAQ). It is shown in [2] that for every OMQ Q = (T , Σ, q) from (ALCI, BAQ), there is a CSP template (a finite relational structure) TQ over signature Σ such that for all Σ-ABoxes A, we have A, T |= q iff A 6→ TQ , that is, iff there is no homomorphism from A to TQ (in the standard sense of labeled directed graphs). We say that a CSP template T is FO-definable if there is an FO-sentence ϕ such that for all finite Σ-structures S, we have S → T iff S |= ϕ. The complement of T is definable in monadic Datalog if there is a monadic Datalog program Π such that for all finite Σ-structures S, we have S 6→ T iff S |= Π. Note that a CSP template is FO-definable iff its complement is (just take the negation of the defining sentence), but this is not true for monadic Datalog definability. It is easy to see that an OMQ Q is FO-rewritable if and only if the comple- ment of TQ is FO-definable, and likewise for rewritability into monadic Datalog. In [2], this observation is used together with results on the FO-definability of CSPs [8] to show the following. Theorem 2 ([2]). FO-rewritability in (ALCI, BAQ) and (ALCI, AQ) is decid- able and NExpTime-complete. This approach is also capable of producing actual rewritings, but unfortunately it is best-case exponential. This calls for a better understanding of FO-rewritability in (ALCI, BAQ) and related languages, as a basis for more practical (yet com- plete) approaches. As a preliminary, we show that unraveling tolerance is equivalent to rewritabil- ity into monadic Datalog. This actually follows straightforwardly from known results about CSPs. Theorem 3. An OMQ from (ALCI, BAQ) is unraveling tolerant iff it is re- writable into monadic Datalog. Proof. A CSP template T over signature Σ has tree duality iff there is a set O of tree-shaped Σ-structures (called obstructions and where tree-shapedness is defined as for ABoxes and CQs above) such that for all finite Σ-structures S, we have T ← S iff S 6← O for all O ∈ O. It was shown in [5] that T has tree duality iff the complement of T is definable in monadic Datalog. It thus remains to show that an OMQ from (ALCI, BAQ) is unraveling tolerant iff TQ has tree duality. “if”. Assume that Q = (T , Σ, q) is unraveling tolerant. Let O be the set of all tree-shaped Σ-ABoxes A with A |= Q. Then O witnesses tree duality: if TQ ← A for some Σ-ABox A, then A 6|= Q; since B |= Q and B → A implies A |= Q [2], we thus have A 6← B for all B ∈ O as required. Conversely, assume that A is a Σ-ABox with A 6← B for all B ∈ O. Clearly, Aua → A for all a ∈ Ind(A). Thus, no such Aua is in O, implying that Aua 6|= Q. Since Q is unraveling tolerant, A 6|= Q which implies TQ ← A as required. “only if”. Assume that TQ has tree duality with set of obstructions O. Let A be a Σ-ABox with A |= Q. Then TQ 6← A and thus A ← B for some B ∈ O. Since B is tree-shaped, A ← B implies Aua ← B for some a ∈ Ind(A). Consequently TQ 6← Aua which yields Aua |= Q as required. t u We now establish the announced characterization. Theorem 4. Let Q = (T , q, Σ) be an OMQ from (ALCI, BAQ). Then Q is FO-rewritable iff 1. Q is FO-rewritable on tree-shaped ABoxes and 2. Q is unraveling tolerant. Proof. “if”. Assume that Q is unraveling tolerant and FO-rewritable on tree- shaped ABoxes. By Theorem 3, the complement of the template TQ is definable 0 by a monadic Datalog program ΠQ . Let ΠQ be obtained from ΠQ by identifying the variables in rule bodies in all possible ways and then retaining only those 0 rules whose bodies are a tree-shaped CQ. It can be verified that ΠQ is a rewriting u of Q: A |= Q implies Aa |= Q for some a ∈ Ind(A) (since Q is unraveling tolerant) 0 implies Aua |= ΠQ (since ΠQ is a rewriting of Q) implies Aua |= ΠQ (since Aua 0 0 is tree-shaped) implies A |= ΠQ (since AuA → A). Conversely, A |= ΠQ implies 0 A |= ΠQ (by construction of ΠQ ) implies A |= Q. It is easy to further modify 0 ΠQ so that in addition to being tree shaped, every role body contains at most one EDB atom. 0 We now use the existence of ΠQ to argue that Q has an FO-rewriting ϕ on tree-shaped ABoxes that takes the form of a union of tree-shaped CQs. Let ψ be an FO-rewriting of Q on tree-shaped ABoxes. By Gaifman’s locality theorem, there is a number d ≥ 0 such that for every Σ-ABox A, we have A |= ψ iff A∗d |= ψ where A∗d is obtained by taking the disjoint union of all d-neighborhoods in A; here, the d-neighborhood in A around a ∈ Ind(A) is the restriction of A to all individuals that can be reached from a on a role path in A of length at most d. Note that ψ is a rewriting of Q and every OMQ from (ALCI, BAQ) satisfies the property that if a Σ-ABox A is the disjoint union of ABoxes A1 , . . . , Ak , then A |= Q iff Ai |= Q for at least one Ai . We can thus strengthen the above obervation as follows: for every Σ-ABox A, we have A |= ψ iff there is some d- 0 neighborhood N in A such that N |= ψ. Since both ψ and ΠQ are rewritings of 0 the same query Q, the same applies to the monadic Datalog program ΠQ instead of to ψ. Moreover, we can find an ` ≥ 0 such that for every Σ-ABox A with 0 A |= ΠQ , there is an A0 ⊆ A with A0 |= ΠQ 0 and in which every individual has 0 degree at most `—due to the special shape of ΠQ , we can in fact simply choose 0 for ` the number of IDB relations in ΠQ . Combining these two observations, we get the following: for every tree-shaped Σ-ABox A with A |= Q, there is a tree-shaped ABox A0 ⊆ A of depth at most d and degree at most ` such that A0 |= Q. We can thus choose as the desired rewriting ϕ the UCQ that consists of all tree-shaped ABoxes A (viewed as a CQ) that satisfy A |= Q and are of depth at most d and of degree at most `. It remains to note that, due to its syntactic shape, ϕ is an FO-rewriting not only on tree-shaped ABoxes, but also on unrestricted ones. First assume that A is a Σ-ABox with A |= Q. Since Q is unraveling tolerant, there then is an a ∈ Ind(A) with Aua |= Q. Since ϕ is an FO-rewriting on tree-shaped ABoxes, we get Aua |= ϕ. Since ϕ is a UCQ and Aua → A, we obtain A |= ϕ. Conversely, assume A |= ϕ. Since ϕ is a union of tree-shaped CQs, this yields Aua |= ϕ for some a ∈ Ind(A), thus Aua |= Q and A |= Q. “only if”. Assume that Q is FO-rewritable. Then it is clearly also FO- rewritable on tree-shaped ABoxes (the same rewriting works). It thus remains to show that Q is unraveling tolerant. It is proved in [1] that a CSP template T over signature Σ is FO-rewritable iff it has finite duality, that is, iff there is a finite set of structures O such that for all finite Σ-structures S, we have T ← S iff S 6← O for all O ∈ O. It was shown in [10] that finite duality implies tree duality. In fact, as observed in [8], we can assume w.l.o.g. that the finitely many elements of O are finite and tree-shaped. One could call this finite duality in terms of finite trees. Now back to our OMQ Q. Since Q is FO-rewritable, so is TQ . By the above result on finite duality in terms of finite trees, there is thus a finite set Γ of tree-shaped ABoxes such that for all Σ-ABoxesWA, we have A |= Q iff B → A for some B ∈ Γ . Consequently, the UCQ qb = B∈Γ qB is an FO-rewriting of Q, where qB is B viewed as a Boolean CQ in the obvious way. Note that qb is a disjunction of tree-shaped CQs. It is thus straightforward to show that for all Σ-ABoxes A, we have A |= qb iff Aua |= qb for some a ∈ Ind(A). The unraveling tolerance of Q follows. t u The proof of Theorem 4 also yields the following corollary, which strengthens the observation from [2] that in (ALCI, BAQ), every FO-rewritable OMQ is UCQ- rewritable (essentially a consequence of Rossmann’s homomorphism preservation theorem). Corollary 1. If an OMQ in (ALCI, BAQ) is FO-rewritable, then it is rewritable into a union of tree-shaped conjunctive queries. We remark that, even when switching to the OBDA language (ALC, BAQ), it is not possible to replace the undirected trees in Corollary 1 with directed trees. We close with some discussion of Theorem 4. As future work, we plan to adapt the result from (ALCI, BAQ) to (ALCI, AQ) and to use them as a basis for developing practically feasible algorithms that construct FO-rewritings. Dealing with (ALCI, AQ) seems to require more liberal definitions of tree-shaped ABoxes and of unraveling tolerance which allow for back-edges to the root as in the tree-model property for DLs with nominals. To obtain a first impression of the effect of answer variables, the reader might want to consider the following OMQ Q = (T , Σ, q) from (ALCI, BAQ): T = {P u ∃r.P v A, ¬P u ∃r.¬P v A} Σ = {r} q = ∃x A(x) and its variation Q0 from (ALCI, BAQ) obtained by replacing q with the AQ q 0 = A(x). Q is not unraveling tolerant as witnessed by the ABox A = {r(a, a)} which satisfies A |= Q, but Aua 6|= Q. The same is true for Q0 if the notion of unraveling tolerance is adapted in a naive way to non-Boolean OMQs. However, while Q is not FO-rewritable (by Theorem 4), it is not too hard to prove that the FO-formula r(x, x) is an FO-rewriting of Q0 . Another interesting question concerns the complexity of deciding FO-re- writability in (ALCI, BAQ) (and of course also (ALCI, AQ)) via Theorem 4. It is shown in [5] that unraveling tolerance is decidable (in 3-ExpTime) and using techniques from [2], it is possible to prove NExpTime-hardness. We spec- ulate that the problem might actually be NExpTime-complete. Regarding FO- rewritability in (ALCI, BAQ) on tree-shaped ABoxes, it seems likely that a 2-ExpTime lower bound can be established by combining reductions from [3] and [6]—thus FO-rewritability on tree-shaped ABoxes would be harder than on unrestricted ABoxes! However, if we already know that Q is unraveling tolerant, then FO-rewritability on trees is trivially in NExpTime, simply by Theorem [6]. Acknowledgements. I am grateful to Frank Wolter for, as always, very helpful and stimulating discussions. References 1. Atserias, A.: On digraph coloring problems and treewidth duality. Eur. J. Comb. 29(4), 796–820 (2008) 2. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Trans. Database Syst. 39(4), 33:1–33:44 (2014) 3. Bienvenu, M., Lutz, C., Wolter, F.: First-order rewritability of atomic queries in Horn description logics. In: Proc. of IJCAI (2013) 4. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning 39(3), 385–429 (2007) 5. Feder, T., Vardi, M.Y.: The computational structure of monotone monadic SNP and constraint satisfaction: A study through datalog and group theory. SIAM J. Comput. 28(1), 57–104 (1998) 6. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conser- vative extensions in description logics. In: Proc. of KR (2006) 7. Hansen, P., Lutz, C., İnanç Seylan, Wolter, F.: Efficient query rewriting in the description logic EL and beyond. In: Proc. of IJCAI (2015) 8. Larose, B., Loten, C., Tardif, C.: A characterisation of first-order constraint satis- faction problems. Logical Methods in Computer Science 3(4) (2007) 9. Lutz, C., Wolter, F.: Non-uniform data complexity of query answering in descrip- tion logics. In: Proc. of KR (2012) 10. Nesetril, J., Tardif, C.: Duality theorems for finite structures (characterising gaps and good characterisations). J. Comb. Theory, Ser. B 80(1), 80–97 (2000)