Polynomial Combined Rewritings for Linear Existential Rules and DL-Lite with n-ary Relations Georg Gottlob1 , Marco Manna2 , and Andreas Pieris3 1 Department of Computer Science, University of Oxford, UK georg.gottlob@cs.ox.ac.uk 2 Department of Mathematics and Computer Science, University of Calabria, Italy manna@mat.unical.it 3 Institute of Information Systems, Vienna University of Technology, Austria pieris@dbai.tuwien.ac.at 1 Introduction This paper considers the setting of ontology-based query answering (OBQA). In this setting, Description Logics (DLs) and existential rules (a.k.a. tuple-generating depen- dencies or Datalog± rules) are popular ontology languages, while conjunctive queries (CQs) is the main querying tool. Among KR researchers there is a clear consensus that the required level of scalability in OBQA can only be achieved via query rewriting, which attempts to reduce OBQA into the problem of evaluating a query over a rela- tional database. Two query languages are usually considered: first-order queries (FO) and non-recursive Datalog queries (NDL). An interesting approach to query rewriting is the polynomial combined approach [7], which can be described as follows: an ontology Σ can be incorporated together with a CQ q into a database query qΣ in polynomial time, and also with a database D into a database DΣ in polynomial time, such that qΣ over DΣ yields the same result as q evaluated over the knowledge base consisting of D and Σ. The polynomial combined approach has been applied to ELHdr ⊥ [7], an extension of the well-known DL EL, to DL-LiteN horn [5, 6], one of the most expressive logics of the DL-Lite family, and only recently to the main guarded- and sticky-based classes of existential rules [3]. Research Challenges. The problem of applying the polynomial combined approach to existing DLs and classes of existential rules is relatively understood. Nevertheless, there are still basic open questions that deserve our attention. Regarding DLs, little is known about formalisms with n-ary relations such as DLR-LiteR , that is, the exten- sion of DL-LiteR with n-ary roles. Regarding existential rules, it is open whether the polynomial combined approach can be applied to the class of linear existential rules (or simply linear rules), that is, assertions of the form ∀X∀Y(s(X, Y) → ∃Z p(X, Z)), where s(X, Y) and p(X, Z) are atomic formulas [1]. It is not difficult to show that, if linear rules are polynomially combined rewritable, then also DLR-LiteR is polynomially combined rewritable — this follows from the fact that query answering under DLR-LiteR can be easily reduced to query answering under linear rules [1]. Thus, the key question that we need to answer, which has been explicitly stated as an open problem in [3], is the following: p(a,b,c) p(b,c,d) p(a,z1,b) p(b,z5,c) p(z1,a,z2) p(b,z4,z1) z1 z5 p(z1,z3,a) p(z4,z1,b) p(z5,c,b) z3 z4 (b) p(z3,a,z1) ∃A∃B∃C∃D (p(A,a,B) ∧ p(C,B,b) ∧ p(D,c,b)) (a) Fig. 1. Illustration of a proof generator. Given a (Boolean) CQ q, a database D, and a set Σ of linear rules, can we rewrite in polynomial time: (i) q and Σ, independently of D, into a Q-query qΣ , where Q ∈ {FO, NDL}, and (ii) D and Σ, independently of q, into a database DΣ , such that (D ∪ Σ) |= q iff DΣ |= qΣ ? The answer to the above question is affirmative under the assumption that the arity of the underlying schema is bounded; implicit in [2]. However, little is known for arbi- trary linear rules, without any assumption on the underlying schema. We give a positive answer even for linear rules that use predicates of unbounded arity. For more details, we refer the reader to [4]. 2 Proof Generator We assume the reader is familiar with the chase procedure. Recall that the chase for a database D and a set Σ of rules, denoted chase(D, Σ), is a universal model of D and Σ, and thus (D ∪ Σ) |= q iff chase(D, Σ) |= q, for each CQ q. The instance chase(D, Σ) can be naturally seen as a directed labeled graph, called chase graph, denoted CG(D, Σ). It is also easy to verify that for linear rules, CG(D, Σ) is a directed forest; for details on the chase, see, e.g., [1]. Our main technical tool is called proof generator, and it formalizes the intuitive idea that (Boolean) CQ answering under linear rules can be realized as a reachability problem on the chase graph. Let us illustrate the key ideas underlying the proof generator via a simple example. Example 1. Let D = {p(a, b, c), p(b, c, d)}, and let Σ be the set of linear rules (for brevity, the universal quantifiers are omitted): p(X, Y, Z) → ∃W p(X, W, Y ) p(X, Y, Z) → ∃W p(Z, W, Y ) p(X, Y, Z) → ∃W p(Y, X, W ) p(X, Y, Z) → p(Y, Z, X). Given the BCQ q = ∃A∃B∃C∃D (p(A, a, B) ∧ p(C, B, b) ∧ p(D, c, b)), as shown in Figure 1(a), there exists a homomorphism h (dashed arrows in the figure) that maps q to an initial segment of the chase graph of D and Σ, and thus (D∪Σ) |= q. It is interesting to observe that the nulls occurring in h(q), i.e., z1 , z3 , z4 and z5 , form a rooted forest F , depicted in Figure 1(b), with the following properties; for brevity, let ν(z) be the atom in CG(D, Σ) where the null z is invented (see shaded atoms in Figure 1(a) for ν(z1 ), ν(z3 ), ν(z4 ) and ν(z5 )): (i) for every root node z, ν(z) is reachable from D; (ii) for every edge (z, w), ν(w) is reachable from ν(z); and (iii) for every atom a ∈ h(q), there exists a unique path π in F that contains all the nulls in a, and, assuming that the leaf node of π is z, a is reachable from ν(z). Indeed, it is easy to verify that ν(z1 ) and ν(z5 ) are reachable from D, ν(z3 ) and ν(z4 ) are reachable from ν(z1 ), and finally, h(p(A, a, B)) = p(z3 , a, z1 ) is reachable from ν(z3 ), h(p(C, B, b)) = p(z4 , z1 , b) is reachable from ν(z4 ), and h(p(D, c, b)) = p(z5 , c, b) is reachable from ν(z5 ). Roughly speaking, the triple consisting of: (1) the homomorphism h, that maps q to the chase; (2) the function ν, that gives the atoms in the chase where the nulls occurring in h(q) were invented; and (3) the forest F , that encodes how the nulls of h(q) depend on each other, as well as the order of their generation, is what we call a proof generator for q w.r.t. D and Σ. The existence of such a triple, allows us to generate the part of the chase due to which the query is entailed, i.e., the proof of the query (hence the name “proof generator”). Therefore, for query answering purposes under linear rules, we simply need to check if such a proof generator exists. Lemma 1. (D ∪ Σ) |= q iff there exists a proof generator for q w.r.t. D and Σ. 3 The Combined Rewriting We give a positive answer to our research question regarding linear rules and the poly- nomial combined approach. More precisely, we show that: Theorem 1. The class of linear rules is polynomially combined Q-rewritable, where Q ∈ {FO, NDL}. To establish the above theorem, we heavily rely on the notion of the proof generator. Fix a (Boolean) CQ q, a database D, and a set Σ of linear rules. By Lemma 1, it suffices to construct in polynomial time a query qΣ ∈ Q (independently of D), and a database DΣ (independently of q), such that DΣ |= qΣ iff a proof generator for q w.r.t. D and Σ exists. Roughly, the query qΣ guesses a triple (h, ν, F ) (as described in Example 1), and then verifies that the guessed triple is a proof generator for q w.r.t. D and Σ. The interesting part of qΣ is the component that applies the crucial reachability checks required by the definition of the proof generator. Although the path among two atoms in the chase graph may be of exponential size, its existence can be checked via Q-queries of polynomial size. An immediate consequence of Theorem 1 is that: Corollary 1. The description logic DLR-LiteR is polynomially combined Q-rewritable, where Q ∈ {FO, NDL}. The results of this work are, for the moment, of theoretical nature and we do not claim that they will directly lead to better practical algorithms. We believe that a smart implementation of the proposed techniques can lead to more efficient rewriting proce- dures; this will be the subject of future research. We are also planning to optimize the proposed rewriting algorithm, with the aim of constructing queries of optimal size. Acknowledgements. G. Gottlob was supported by the EPSRC Programme Grant EP/M025268/ “VADA: Value Added Data Systems – Principles and Architecture”. M. Manna was supported by the MIUR project “SI-LAB BA2KNOW – Business Anali- tycs to Know”, and by Regione Calabria, programme POR Calabria FESR 2007-2013, projects “ITravel PLUS” and “KnowRex: Un sistema per il riconoscimento e l’estrazione di conoscenza”. A. Pieris was supported by the Austrian Science Fund (FWF): P25207- N23 and Y698. References 1. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012) 2. Gottlob, G., Kikot, S., Kontchakov, R., Podolskii, V.V., Schwentick, T., Zakharyaschev, M.: The price of query rewriting in ontology-based data access. Artif. Intell. 213, 42–59 (2014) 3. Gottlob, G., Manna, M., Pieris, A.: Polynomial combined rewritings for existential rules. In: KR (2014) 4. Gottlob, G., Manna, M., Pieris, A.: Polynomial rewritings for linear existential rules. In: IJCAI (2015) 5. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to query answering in DL-Lite. In: KR (2010) 6. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to ontology-based data access. In: IJCAI. pp. 2656–2661 (2011) 7. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL using a relational database system. In: IJCAI. pp. 2070–2075 (2009)