Polynomial Combined Rewritings for Linear Existential
       Rules and DL-Lite with n-ary Relations

                   Georg Gottlob1 , Marco Manna2 , and Andreas Pieris3
               1
                  Department of Computer Science, University of Oxford, UK
                             georg.gottlob@cs.ox.ac.uk
      2
        Department of Mathematics and Computer Science, University of Calabria, Italy
                                 manna@mat.unical.it
        3
          Institute of Information Systems, Vienna University of Technology, Austria
                              pieris@dbai.tuwien.ac.at


1   Introduction

This paper considers the setting of ontology-based query answering (OBQA). In this
setting, Description Logics (DLs) and existential rules (a.k.a. tuple-generating depen-
dencies or Datalog± rules) are popular ontology languages, while conjunctive queries
(CQs) is the main querying tool. Among KR researchers there is a clear consensus that
the required level of scalability in OBQA can only be achieved via query rewriting,
which attempts to reduce OBQA into the problem of evaluating a query over a rela-
tional database. Two query languages are usually considered: first-order queries (FO)
and non-recursive Datalog queries (NDL).
    An interesting approach to query rewriting is the polynomial combined approach [7],
which can be described as follows: an ontology Σ can be incorporated together with a
CQ q into a database query qΣ in polynomial time, and also with a database D into
a database DΣ in polynomial time, such that qΣ over DΣ yields the same result as q
evaluated over the knowledge base consisting of D and Σ. The polynomial combined
approach has been applied to ELHdr       ⊥ [7], an extension of the well-known DL EL, to
DL-LiteN  horn [5, 6], one  of the most   expressive logics of the DL-Lite family, and only
recently to the main guarded- and sticky-based classes of existential rules [3].
    Research Challenges. The problem of applying the polynomial combined approach
to existing DLs and classes of existential rules is relatively understood. Nevertheless,
there are still basic open questions that deserve our attention. Regarding DLs, little is
known about formalisms with n-ary relations such as DLR-LiteR , that is, the exten-
sion of DL-LiteR with n-ary roles. Regarding existential rules, it is open whether the
polynomial combined approach can be applied to the class of linear existential rules (or
simply linear rules), that is, assertions of the form ∀X∀Y(s(X, Y) → ∃Z p(X, Z)),
where s(X, Y) and p(X, Z) are atomic formulas [1].
    It is not difficult to show that, if linear rules are polynomially combined rewritable,
then also DLR-LiteR is polynomially combined rewritable — this follows from the
fact that query answering under DLR-LiteR can be easily reduced to query answering
under linear rules [1]. Thus, the key question that we need to answer, which has been
explicitly stated as an open problem in [3], is the following:
                             p(a,b,c)                 p(b,c,d)


                             p(a,z1,b)                p(b,z5,c)


                p(z1,a,z2)               p(b,z4,z1)                                z1              z5


                p(z1,z3,a)               p(z4,z1,b)   p(z5,c,b)               z3              z4

                                                                                        (b)
                p(z3,a,z1)


                 ∃A∃B∃C∃D (p(A,a,B) ∧ p(C,B,b) ∧ p(D,c,b))

                                         (a)


                                 Fig. 1. Illustration of a proof generator.

 Given a (Boolean) CQ q, a database D, and a set Σ of linear rules, can we rewrite
   in polynomial time: (i) q and Σ, independently of D, into a Q-query qΣ , where
   Q ∈ {FO, NDL}, and (ii) D and Σ, independently of q, into a database DΣ , such
   that (D ∪ Σ) |= q iff DΣ |= qΣ ?

    The answer to the above question is affirmative under the assumption that the arity
of the underlying schema is bounded; implicit in [2]. However, little is known for arbi-
trary linear rules, without any assumption on the underlying schema. We give a positive
answer even for linear rules that use predicates of unbounded arity. For more details,
we refer the reader to [4].


2   Proof Generator

We assume the reader is familiar with the chase procedure. Recall that the chase for
a database D and a set Σ of rules, denoted chase(D, Σ), is a universal model of D
and Σ, and thus (D ∪ Σ) |= q iff chase(D, Σ) |= q, for each CQ q. The instance
chase(D, Σ) can be naturally seen as a directed labeled graph, called chase graph,
denoted CG(D, Σ). It is also easy to verify that for linear rules, CG(D, Σ) is a directed
forest; for details on the chase, see, e.g., [1]. Our main technical tool is called proof
generator, and it formalizes the intuitive idea that (Boolean) CQ answering under linear
rules can be realized as a reachability problem on the chase graph. Let us illustrate the
key ideas underlying the proof generator via a simple example.

Example 1. Let D = {p(a, b, c), p(b, c, d)}, and let Σ be the set of linear rules (for
brevity, the universal quantifiers are omitted):

    p(X, Y, Z) → ∃W p(X, W, Y )                        p(X, Y, Z) → ∃W p(Z, W, Y )
    p(X, Y, Z) → ∃W p(Y, X, W )                        p(X, Y, Z) → p(Y, Z, X).

Given the BCQ q = ∃A∃B∃C∃D (p(A, a, B) ∧ p(C, B, b) ∧ p(D, c, b)), as shown in
Figure 1(a), there exists a homomorphism h (dashed arrows in the figure) that maps q to
an initial segment of the chase graph of D and Σ, and thus (D∪Σ) |= q. It is interesting
to observe that the nulls occurring in h(q), i.e., z1 , z3 , z4 and z5 , form a rooted forest
F , depicted in Figure 1(b), with the following properties; for brevity, let ν(z) be the
atom in CG(D, Σ) where the null z is invented (see shaded atoms in Figure 1(a) for
ν(z1 ), ν(z3 ), ν(z4 ) and ν(z5 )): (i) for every root node z, ν(z) is reachable from D; (ii)
for every edge (z, w), ν(w) is reachable from ν(z); and (iii) for every atom a ∈ h(q),
there exists a unique path π in F that contains all the nulls in a, and, assuming that
the leaf node of π is z, a is reachable from ν(z). Indeed, it is easy to verify that ν(z1 )
and ν(z5 ) are reachable from D, ν(z3 ) and ν(z4 ) are reachable from ν(z1 ), and finally,
h(p(A, a, B)) = p(z3 , a, z1 ) is reachable from ν(z3 ), h(p(C, B, b)) = p(z4 , z1 , b) is
reachable from ν(z4 ), and h(p(D, c, b)) = p(z5 , c, b) is reachable from ν(z5 ).
    Roughly speaking, the triple consisting of: (1) the homomorphism h, that maps q to
the chase; (2) the function ν, that gives the atoms in the chase where the nulls occurring
in h(q) were invented; and (3) the forest F , that encodes how the nulls of h(q) depend
on each other, as well as the order of their generation, is what we call a proof generator
for q w.r.t. D and Σ. The existence of such a triple, allows us to generate the part of
the chase due to which the query is entailed, i.e., the proof of the query (hence the
name “proof generator”). Therefore, for query answering purposes under linear rules,
we simply need to check if such a proof generator exists.
Lemma 1. (D ∪ Σ) |= q iff there exists a proof generator for q w.r.t. D and Σ.


3   The Combined Rewriting
We give a positive answer to our research question regarding linear rules and the poly-
nomial combined approach. More precisely, we show that:
Theorem 1. The class of linear rules is polynomially combined Q-rewritable, where
Q ∈ {FO, NDL}.
    To establish the above theorem, we heavily rely on the notion of the proof generator.
Fix a (Boolean) CQ q, a database D, and a set Σ of linear rules. By Lemma 1, it
suffices to construct in polynomial time a query qΣ ∈ Q (independently of D), and
a database DΣ (independently of q), such that DΣ |= qΣ iff a proof generator for q
w.r.t. D and Σ exists. Roughly, the query qΣ guesses a triple (h, ν, F ) (as described in
Example 1), and then verifies that the guessed triple is a proof generator for q w.r.t. D
and Σ. The interesting part of qΣ is the component that applies the crucial reachability
checks required by the definition of the proof generator. Although the path among two
atoms in the chase graph may be of exponential size, its existence can be checked via
Q-queries of polynomial size. An immediate consequence of Theorem 1 is that:
Corollary 1. The description logic DLR-LiteR is polynomially combined Q-rewritable,
where Q ∈ {FO, NDL}.
    The results of this work are, for the moment, of theoretical nature and we do not
claim that they will directly lead to better practical algorithms. We believe that a smart
implementation of the proposed techniques can lead to more efficient rewriting proce-
dures; this will be the subject of future research. We are also planning to optimize the
proposed rewriting algorithm, with the aim of constructing queries of optimal size.
    Acknowledgements. G. Gottlob was supported by the EPSRC Programme Grant
EP/M025268/ “VADA: Value Added Data Systems – Principles and Architecture”. M.
Manna was supported by the MIUR project “SI-LAB BA2KNOW – Business Anali-
tycs to Know”, and by Regione Calabria, programme POR Calabria FESR 2007-2013,
projects “ITravel PLUS” and “KnowRex: Un sistema per il riconoscimento e l’estrazione
di conoscenza”. A. Pieris was supported by the Austrian Science Fund (FWF): P25207-
N23 and Y698.


References
1. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query
   answering over ontologies. J. Web Sem. 14, 57–83 (2012)
2. Gottlob, G., Kikot, S., Kontchakov, R., Podolskii, V.V., Schwentick, T., Zakharyaschev, M.:
   The price of query rewriting in ontology-based data access. Artif. Intell. 213, 42–59 (2014)
3. Gottlob, G., Manna, M., Pieris, A.: Polynomial combined rewritings for existential rules. In:
   KR (2014)
4. Gottlob, G., Manna, M., Pieris, A.: Polynomial rewritings for linear existential rules. In: IJCAI
   (2015)
5. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach
   to query answering in DL-Lite. In: KR (2010)
6. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach
   to ontology-based data access. In: IJCAI. pp. 2656–2661 (2011)
7. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL
   using a relational database system. In: IJCAI. pp. 2070–2075 (2009)