Guarded Ontology-Mediated Queries Distributing Over Components⋆ Pablo Barceló1 , Gerald Berger2 , and Andreas Pieris3 1 Center for Semantic Web Research & DCC, University of Chile pbarcelo@dcc.uchile.cl 2 Institute of Information Systems, TU Wien gberger@dbai.tuwien.ac.at 3 School of Informatics, University of Edinburgh apieris@inf.ed.ac.uk 1 Introduction Ontology-based data access (OBDA) has recently emerged as a promising application of knowledge representation and reasoning technologies in information management systems. The aim of OBDA is to facilitate access to data that is significantly heteroge- neous and incomplete. This is achieved via an ontology that provides a unified concep- tual view of the data, and makes it accessible via database queries formulated solely in the vocabulary of the ontology. The actual database query and the ontology can be seen as two components of one composite query, called ontology-mediated query [8]. OBDA can then be realized as the problem of answering ontology-mediated queries. An important topic for declarative database query languages, and in particular (ex- tensions of) Datalog, is coordination-free evaluation. This has its roots in declarative networking [15], an approach where distributed computations are programmed using Datalog-based formalisms. In this setting, programs (or queries) are specified over a global schema, and are executed by multiple computing nodes over which the database is distributed. These nodes can perform local computations, and also communicate asynchronously with each other via messages. The model assumes that messages can never be lost but can be arbitrarily delayed. An intrinsic source of inefficiency in such systems are the global barriers raised by the need for synchronization in computing the result of queries. This has motivated a fruitful line of research for isolating classes of queries that can be evaluated in a coordination-free manner [1–4, 16]. It is natural to ask whether the results on coordination-free evaluation for declarative database query languages can be transferred to ontology-mediated queries. Undoubt- edly, a positive answer to this question, apart from possible applications to declarative networking, will significantly contribute towards more efficient procedures for OBDA, since it will enable distributed ontology-mediated query evaluation in a coordination- free way. The goal of the present work is to initiate the study of coordination-free eval- uation for ontology-mediated queries, and give strong indications that the answer to the above challenging question is affirmative. Distribution Over Components. More concretely, we focus on the question whether the answer to an ontology-mediated query can be computed by parallelizing it over the ⋆ This short paper is based on [6, 7]. (maximally connected) components of the database, i.e., whether the query distributes over components. In other words, given an ontology-mediated query Q, the question is whether S for every database D the answer to Q over D, denoted Q(D), coincides with 1≤i≤n Q(D i ), where D1 , . . . , Dn are the components of D. In this case, Q(D) can be computed without any communication over a network using a distribution where every computing node is assigned some of the connected components of the database, and every component is assigned to at least one computing node. The notion of dis- tribution over components has been introduced in [2], and explicitly considered for Datalog queries in [3]. It has been shown that connected Datalog, that is, the fragment of Datalog where all rule-bodies are connected, provides an effective syntax for Data- log queries that distribute over components, while the problem of deciding whether a Datalog query distributes over components is undecidable. Aims and Objectives. As said, this work concentrates on ontology-mediated queries and their distribution over components. We focus on guarded ontology-mediated queries where the database query is a conjunctive query and the ontology is formulated using guarded existential rules (a.k.a. tuple-generating dependencies and Datalog± rules), that is, sentences of the form ∀x̄∀ȳ(ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄)) where ϕ, ψ are conjunctions of atoms with ϕ being guarded, i.e., it has an atom, called guard, that contains all the vari- ables (x̄ ∪ ȳ) [10, 11]. It is widely accepted that the class of guarded existential rules forms a natural and convenient way for modeling ontologies, which generalizes promi- nent description logics such as ELHI [5]. The goal of this work is to answer the fol- lowing questions: (1) Can we characterize the fragment of guarded ontology-mediated queries that distribute over components via the notion of connectedness, i.e., when the rule-bodies and the conjunctive query are connected? (2) What is the complexity of deciding whether a guarded ontology-mediated query distributes over components? Our Results. Our results can be summarized as follows: – The fragment of guarded ontology-mediated queries that distribute over compo- nents has the same expressive power as the fragment of guarded ontology-mediated queries where the conjunctive query is connected. Notice that the body of a guarded existential rule is always connected due to the existence of the guard atom. Thus, this result provides a positive answer to the first question above. – Regarding the second question, we show that deciding whether a guarded ontology- mediated query distributes over components is strongly related to a classical static analysis task, namely containment. We show that the problem in question is feasi- ble in polynomial time assuming access to an oracle that can solve containment for guarded ontology-mediated queries. We then show that the containment problem is feasible in 2E XP T IME, which immediately implies that distribution over compo- nents is also feasible in 2E XP T IME. A matching lower bound can be easily shown by exploiting the fact that evaluation of guarded OMQs is 2E XP T IME-hard [10]. 2 Distribution of Queries and Connectedness Let us first recall the basics on ontology-mediated querying. An ontology-mediated query (OMQ) over a (relational) schema S is a triple (S, Σ, q), where Σ is a set of existential rules (the ontology), and q is a conjunctive query (CQ). A guarded OMQ is an OMQ as the one above where Σ is a set of guarded existential rules; we write (G, CQ) for the OMQ language that consists of all guarded OMQs.4 The semantics of an OMQ is given in terms of certain answers. The evaluation of Q = (S, Σ, q) over an S-database D, which is a finite set of atoms of the form R(t̄), where R ∈ S and t̄ is a tuple of constants, denoted Q(D), is defined as the certain answers to q w.r.t. D and Σ. Equivalently, Q(D) is the evaluation of q over the canonical instance of D and Σ that can be constructed by the well-known chase procedure. Recall that the chase adds new atoms to D as dictated by Σ until the final result, written chase(D, Σ), satisfies all the rules of Σ. For more details on ontology-mediated queries see, e.g., [7]. The notion of distribution over components has been introduced in [2], and it states that the answer to a query can be computed by parallelizing it over the (maximally connected) components of the input database. But let us first make precise what a com- ponent is. A set A of atoms is connected if for all terms c, d occurring in A there exists a sequence α1 , . . . , αn of atoms in A such that c occurs in α1 , d occurs in αn , and αi , αi+1 have at least one term in common, for each i ∈ {1, . . . , n − 1}. We call B ⊆ A a component of A if (i) B is connected, and (ii) for every α ∈ A \ B, B ∪ {α} is not connected. Let co(A) be the set of components of A. We are now ready to intro- duce the notion of distribution over components. Consider an OMQ Q = (S, Σ, q). We say that Q distributes over components if Q(D) = Q(D1 ) ∪ · · · ∪ Q(Dn ), where co(D) = {D1 , . . . , Dn }, for every S-database D. Roughly, the centralized answer to Q w.r.t. D is precisely obtained when we parallelize Q over the components of D. Let DIST be the class of OMQs that distribute over components. We proceed to characterize the expressive power of the fragment of (G, CQ) that distributes over components. This is done via connectedness of CQs. A CQ q is con- nected if the set of atoms occurring in q is connected; we write CCQ for the class of connected CQs. The main result of this section states that every guarded OMQ that dis- tributes over components is equivalent to a connected guarded OMQ. Given two OMQ languages O1 and O2 , we write O1 = O2 to state that they are equally expressive, i.e., for every query Q1 ∈ O1 over S we can construct a query Q2 ∈ O2 over S such that Q1 (D) = Q2 (D), for every S-database D, and vice versa. Then: Theorem 1. (G, CQ) ∩ DIST = (G, CCQ). For the (⊇) direction we show that connectedness ensures distribution over compo- nents. This is a consequence of the fact that, given a query Q = (S, Σ, q) ∈ (G, CCQ), for every S-database D with co(D) = {D1 , . . . , Dm }, chase(D, Σ) can be partitioned into {I1 , . . . , Im } such that, for each i ∈ {1, . . . , m}, Ii depends solely on Di . Consider now a query Q = (S, Σ, q(x̄)) ∈ (G, CQ) ∩ DIST. Observe that if Q is unsatisfiable, i.e., there is no S-database D such that Q(D) 6= ∅, then the claim holds trivially; simply choose an arbitrary unsatisfiable query from (G, CCQ). The interesting case is when Q is satisfiable. Assuming that {q1 , . . . , qk } are the components of q, we can show that there exists i ∈ {1, . . . , k} such that Qi = (S, Σ, qi (x̄)) is well-defined and equivalent to Q. Since Qi ∈ (G, CCQ) the claim follows. 4 Notice that, in general, OMQs are defined for arbitrary ontology languages based on first-order logic (not only on existential rules), and arbitrary query languages (not only CQs). 3 Deciding Distribution Over Components In this section we focus on the problem of deciding whether a guarded OMQ distributes over components. In fact, this problem can be defined for an OMQ language based on an arbitrary class C of existential rules: PROBLEM : Dist(C, CQ) INPUT : An OMQ Q ∈ (C, CQ). QUESTION : Does Q distribute over components? Our goal is to pinpoint the complexity of Dist(G, CQ). Towards this direction, we first show that it is closely related to the crucial task of containment: given two OMQs Q1 , Q2 ∈ (G, CQ) over a schema S, decide whether Q1 (D) ⊆ Q2 (D), for every S- database D, written as Q1 ⊆ Q2 . By exploiting Theorem 1 we can show the following: Proposition 1. Let Q = (S, Σ, q(x̄)) ∈ (G, CQ). The following are equivalent: 1. Q distributes over components. 2. Q is unsatisfiable or there exists q̂(x̄) ∈ co(q)5 such that (S, Σ, q̂(x̄)) ⊆ Q. Notice that checking unsatisfiability can be easily reduced to containment. There- fore, the above results essentially states that Dist(G, CQ) is feasible in polynomial time assuming access to a C oracle, where C is a complexity class powerful enough for check- ing containment among guarded OMQs. The latter is a highly non-trivial problem, and we have recently shown that is 2E XP T IME-complete [6]. To obtain the 2E XP T IME up- per bound, we make use of techniques based on two-way alternating parity automata on trees (2WAPA) [12]. We first show that if Q1 and Q2 are guarded OMQs such that Q1 is not contained in Q2 , then this is witnessed over a class of “tree-like” databases that can be represented as the set of trees accepted by a 2WAPA A. We then build a 2WAPA B with exponentially many states that recognizes those trees accepted by A that represent witnesses to non-containment of Q1 in Q2 . Hence, Q1 is contained in Q2 iff B accepts no tree. Since the emptiness problem for 2WAPA is feasible in exponential time in the number of states [12], we obtain that containment for guarded OMQs is in 2E XP T IME. A matching lower bound follows from [9]. Thus, Proposition 1 together with the fact that containment for guarded OMQs is in 2E XP T IME implies that Dist(G, CQ) is in 2E XP T IME, while a matching lower bound can be shown by exploiting the fact that evaluation of (G, CQ) queries is 2E XP T IME-hard [10]. Then: Theorem 2. Dist(G, CQ) is 2E XP T IME-complete. 4 Next Steps Here is a couple of interesting open problems that we are planning to tackle: (i) distri- bution over components in the presence of equality and denial constraints is not well 5 This denotes the set of components of the set of atoms in q, or simply, the components of q. understood, and (ii) we do not know how distribution over components behaves in the case of guarded OMQs enriched with non-monotonic negation [13, 14]. Acknowledgements. Barceló is supported by the Millenium Nucleus Center for Se- mantic Web Research under grant NC120004. Berger is funded by the Austrian Science Fund (FWF), project W1255-N23, and a DOC Fellowship of the Austrian Academy of Sciences. Pieris is supported by the EPSRC Programme Grant EP/M025268/ “VADA: Value Added Data Systems – Principles and Architecture”. References 1. Alvaro, P., Conway, N., Hellerstein, J.M., Maier, D.: Blazes: Coordination analysis for dis- tributed programs. In: ICDE. pp. 52–63 (2014) 2. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Weaker forms of monotonicity for declar- ative networking: A more fine-grained answer to the calm-conjecture. In: PODS. pp. 64–75 (2014) 3. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Datalog queries distributing over compo- nents. In: ICDT. pp. 308–323 (2015) 4. Ameloot, T.J., Neven, F., den Bussche, J.V.: Relational transducers for declarative network- ing. J. ACM 60(2), 15 (2013) 5. Baader, F., Brandt, S., Lutz, C.: Pushing the el envelope. In: Proc. of IJCAI. pp. 364–369 (2005) 6. Barceló, P., Berger, G., Pieris, A.: Containment for rule-based ontology-mediated queries. CoRR abs/1703.07994 (2017), http://arxiv.org/abs/1703.07994 7. Berger, G., Pieris, A.: Ontology-mediated queries distributing over components. In: IJCAI. pp. 943–949 (2016) 8. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, csp, and MMSNP. ACM Trans. Database Syst. 39(4), 33:1–33:44 (2014) 9. Bienvenu, M., Hansen, P., Lutz, C., Wolter, F.: First order-rewritability and containment of conjunctive queries in horn description logics. In: IJCAI. pp. 965–971 (2016) 10. Calı̀, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013) 11. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012) 12. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization prob- lems for database logic programs (preliminary report). In: STOC. pp. 477–490 (1988) 13. Gottlob, G., Hernich, A., Kupke, C., Lukasiewicz, T.: Stable model semantics for guarded existential rules and description logics. In: KR (2014) 14. Hernich, A., Kupke, C., Lukasiewicz, T., Gottlob, G.: Well-founded semantics for extended Datalog and ontological reasoning. In: PODS. pp. 225–236 (2013) 15. Loo, B.T., Condie, T., Garofalakis, M.N., Gay, D.E., Hellerstein, J.M., Maniatis, P., Ramakr- ishnan, R., Roscoe, T., Stoica, I.: Declarative networking. Commun. ACM 52(11), 87–95 (2009) 16. Zinn, D., Green, T.J., Ludäscher, B.: Win-move is coordination-free (sometimes). In: ICDT. pp. 99–113 (2012)