=Paper= {{Paper |id=Vol-1912/paper30 |storemode=property |title=Guarded Ontology-Mediated Queries Distributing Over Components |pdfUrl=https://ceur-ws.org/Vol-1912/paper30.pdf |volume=Vol-1912 |authors=Pablo Barceló,Gerald Berger,Andreas Pieris |dblpUrl=https://dblp.org/rec/conf/amw/BarceloBP17 }} ==Guarded Ontology-Mediated Queries Distributing Over Components== https://ceur-ws.org/Vol-1912/paper30.pdf
                    Guarded Ontology-Mediated Queries
                      Distributing Over Components⋆

                     Pablo Barceló1 , Gerald Berger2 , and Andreas Pieris3
                1
                    Center for Semantic Web Research & DCC, University of Chile
                                  pbarcelo@dcc.uchile.cl
       2
          Institute of Information Systems, TU Wien gberger@dbai.tuwien.ac.at
         3
            School of Informatics, University of Edinburgh apieris@inf.ed.ac.uk



1 Introduction

Ontology-based data access (OBDA) has recently emerged as a promising application
of knowledge representation and reasoning technologies in information management
systems. The aim of OBDA is to facilitate access to data that is significantly heteroge-
neous and incomplete. This is achieved via an ontology that provides a unified concep-
tual view of the data, and makes it accessible via database queries formulated solely in
the vocabulary of the ontology. The actual database query and the ontology can be seen
as two components of one composite query, called ontology-mediated query [8]. OBDA
can then be realized as the problem of answering ontology-mediated queries.
    An important topic for declarative database query languages, and in particular (ex-
tensions of) Datalog, is coordination-free evaluation. This has its roots in declarative
networking [15], an approach where distributed computations are programmed using
Datalog-based formalisms. In this setting, programs (or queries) are specified over a
global schema, and are executed by multiple computing nodes over which the database
is distributed. These nodes can perform local computations, and also communicate
asynchronously with each other via messages. The model assumes that messages can
never be lost but can be arbitrarily delayed. An intrinsic source of inefficiency in such
systems are the global barriers raised by the need for synchronization in computing the
result of queries. This has motivated a fruitful line of research for isolating classes of
queries that can be evaluated in a coordination-free manner [1–4, 16].
    It is natural to ask whether the results on coordination-free evaluation for declarative
database query languages can be transferred to ontology-mediated queries. Undoubt-
edly, a positive answer to this question, apart from possible applications to declarative
networking, will significantly contribute towards more efficient procedures for OBDA,
since it will enable distributed ontology-mediated query evaluation in a coordination-
free way. The goal of the present work is to initiate the study of coordination-free eval-
uation for ontology-mediated queries, and give strong indications that the answer to the
above challenging question is affirmative.
Distribution Over Components. More concretely, we focus on the question whether
the answer to an ontology-mediated query can be computed by parallelizing it over the
⋆
    This short paper is based on [6, 7].
(maximally connected) components of the database, i.e., whether the query distributes
over components. In other words, given an ontology-mediated query Q, the question is
whether
S        for every database D the answer to Q over D, denoted Q(D), coincides with
   1≤i≤n Q(D  i ), where D1 , . . . , Dn are the components of D. In this case, Q(D) can
be computed without any communication over a network using a distribution where
every computing node is assigned some of the connected components of the database,
and every component is assigned to at least one computing node. The notion of dis-
tribution over components has been introduced in [2], and explicitly considered for
Datalog queries in [3]. It has been shown that connected Datalog, that is, the fragment
of Datalog where all rule-bodies are connected, provides an effective syntax for Data-
log queries that distribute over components, while the problem of deciding whether a
Datalog query distributes over components is undecidable.
Aims and Objectives. As said, this work concentrates on ontology-mediated queries
and their distribution over components. We focus on guarded ontology-mediated queries
where the database query is a conjunctive query and the ontology is formulated using
guarded existential rules (a.k.a. tuple-generating dependencies and Datalog± rules), that
is, sentences of the form ∀x̄∀ȳ(ϕ(x̄, ȳ) → ∃z̄ ψ(x̄, z̄)) where ϕ, ψ are conjunctions of
atoms with ϕ being guarded, i.e., it has an atom, called guard, that contains all the vari-
ables (x̄ ∪ ȳ) [10, 11]. It is widely accepted that the class of guarded existential rules
forms a natural and convenient way for modeling ontologies, which generalizes promi-
nent description logics such as ELHI [5]. The goal of this work is to answer the fol-
lowing questions: (1) Can we characterize the fragment of guarded ontology-mediated
queries that distribute over components via the notion of connectedness, i.e., when the
rule-bodies and the conjunctive query are connected? (2) What is the complexity of
deciding whether a guarded ontology-mediated query distributes over components?
Our Results. Our results can be summarized as follows:
 – The fragment of guarded ontology-mediated queries that distribute over compo-
   nents has the same expressive power as the fragment of guarded ontology-mediated
   queries where the conjunctive query is connected. Notice that the body of a guarded
   existential rule is always connected due to the existence of the guard atom. Thus,
   this result provides a positive answer to the first question above.
 – Regarding the second question, we show that deciding whether a guarded ontology-
   mediated query distributes over components is strongly related to a classical static
   analysis task, namely containment. We show that the problem in question is feasi-
   ble in polynomial time assuming access to an oracle that can solve containment for
   guarded ontology-mediated queries. We then show that the containment problem
   is feasible in 2E XP T IME, which immediately implies that distribution over compo-
   nents is also feasible in 2E XP T IME. A matching lower bound can be easily shown
   by exploiting the fact that evaluation of guarded OMQs is 2E XP T IME-hard [10].


2 Distribution of Queries and Connectedness
Let us first recall the basics on ontology-mediated querying. An ontology-mediated
query (OMQ) over a (relational) schema S is a triple (S, Σ, q), where Σ is a set of
existential rules (the ontology), and q is a conjunctive query (CQ). A guarded OMQ
is an OMQ as the one above where Σ is a set of guarded existential rules; we write
(G, CQ) for the OMQ language that consists of all guarded OMQs.4 The semantics of
an OMQ is given in terms of certain answers. The evaluation of Q = (S, Σ, q) over an
S-database D, which is a finite set of atoms of the form R(t̄), where R ∈ S and t̄ is a
tuple of constants, denoted Q(D), is defined as the certain answers to q w.r.t. D and Σ.
Equivalently, Q(D) is the evaluation of q over the canonical instance of D and Σ that
can be constructed by the well-known chase procedure. Recall that the chase adds new
atoms to D as dictated by Σ until the final result, written chase(D, Σ), satisfies all the
rules of Σ. For more details on ontology-mediated queries see, e.g., [7].
    The notion of distribution over components has been introduced in [2], and it states
that the answer to a query can be computed by parallelizing it over the (maximally
connected) components of the input database. But let us first make precise what a com-
ponent is. A set A of atoms is connected if for all terms c, d occurring in A there exists
a sequence α1 , . . . , αn of atoms in A such that c occurs in α1 , d occurs in αn , and αi ,
αi+1 have at least one term in common, for each i ∈ {1, . . . , n − 1}. We call B ⊆ A
a component of A if (i) B is connected, and (ii) for every α ∈ A \ B, B ∪ {α} is
not connected. Let co(A) be the set of components of A. We are now ready to intro-
duce the notion of distribution over components. Consider an OMQ Q = (S, Σ, q).
We say that Q distributes over components if Q(D) = Q(D1 ) ∪ · · · ∪ Q(Dn ), where
co(D) = {D1 , . . . , Dn }, for every S-database D. Roughly, the centralized answer to
Q w.r.t. D is precisely obtained when we parallelize Q over the components of D. Let
DIST be the class of OMQs that distribute over components.
    We proceed to characterize the expressive power of the fragment of (G, CQ) that
distributes over components. This is done via connectedness of CQs. A CQ q is con-
nected if the set of atoms occurring in q is connected; we write CCQ for the class of
connected CQs. The main result of this section states that every guarded OMQ that dis-
tributes over components is equivalent to a connected guarded OMQ. Given two OMQ
languages O1 and O2 , we write O1 = O2 to state that they are equally expressive, i.e.,
for every query Q1 ∈ O1 over S we can construct a query Q2 ∈ O2 over S such that
Q1 (D) = Q2 (D), for every S-database D, and vice versa. Then:

Theorem 1. (G, CQ) ∩ DIST = (G, CCQ).

     For the (⊇) direction we show that connectedness ensures distribution over compo-
nents. This is a consequence of the fact that, given a query Q = (S, Σ, q) ∈ (G, CCQ),
for every S-database D with co(D) = {D1 , . . . , Dm }, chase(D, Σ) can be partitioned
into {I1 , . . . , Im } such that, for each i ∈ {1, . . . , m}, Ii depends solely on Di . Consider
now a query Q = (S, Σ, q(x̄)) ∈ (G, CQ) ∩ DIST. Observe that if Q is unsatisfiable,
i.e., there is no S-database D such that Q(D) 6= ∅, then the claim holds trivially;
simply choose an arbitrary unsatisfiable query from (G, CCQ). The interesting case
is when Q is satisfiable. Assuming that {q1 , . . . , qk } are the components of q, we can
show that there exists i ∈ {1, . . . , k} such that Qi = (S, Σ, qi (x̄)) is well-defined and
equivalent to Q. Since Qi ∈ (G, CCQ) the claim follows.
 4
     Notice that, in general, OMQs are defined for arbitrary ontology languages based on first-order
     logic (not only on existential rules), and arbitrary query languages (not only CQs).
3 Deciding Distribution Over Components

In this section we focus on the problem of deciding whether a guarded OMQ distributes
over components. In fact, this problem can be defined for an OMQ language based on
an arbitrary class C of existential rules:

                      PROBLEM : Dist(C, CQ)
                      INPUT :    An OMQ Q ∈ (C, CQ).
                      QUESTION : Does Q distribute over components?

     Our goal is to pinpoint the complexity of Dist(G, CQ). Towards this direction, we
first show that it is closely related to the crucial task of containment: given two OMQs
Q1 , Q2 ∈ (G, CQ) over a schema S, decide whether Q1 (D) ⊆ Q2 (D), for every S-
database D, written as Q1 ⊆ Q2 . By exploiting Theorem 1 we can show the following:

Proposition 1. Let Q = (S, Σ, q(x̄)) ∈ (G, CQ). The following are equivalent:

 1. Q distributes over components.
 2. Q is unsatisfiable or there exists q̂(x̄) ∈ co(q)5 such that (S, Σ, q̂(x̄)) ⊆ Q.

    Notice that checking unsatisfiability can be easily reduced to containment. There-
fore, the above results essentially states that Dist(G, CQ) is feasible in polynomial time
assuming access to a C oracle, where C is a complexity class powerful enough for check-
ing containment among guarded OMQs. The latter is a highly non-trivial problem, and
we have recently shown that is 2E XP T IME-complete [6]. To obtain the 2E XP T IME up-
per bound, we make use of techniques based on two-way alternating parity automata on
trees (2WAPA) [12]. We first show that if Q1 and Q2 are guarded OMQs such that Q1 is
not contained in Q2 , then this is witnessed over a class of “tree-like” databases that can
be represented as the set of trees accepted by a 2WAPA A. We then build a 2WAPA B
with exponentially many states that recognizes those trees accepted by A that represent
witnesses to non-containment of Q1 in Q2 . Hence, Q1 is contained in Q2 iff B accepts
no tree. Since the emptiness problem for 2WAPA is feasible in exponential time in the
number of states [12], we obtain that containment for guarded OMQs is in 2E XP T IME.
A matching lower bound follows from [9]. Thus, Proposition 1 together with the fact
that containment for guarded OMQs is in 2E XP T IME implies that Dist(G, CQ) is in
2E XP T IME, while a matching lower bound can be shown by exploiting the fact that
evaluation of (G, CQ) queries is 2E XP T IME-hard [10]. Then:

Theorem 2. Dist(G, CQ) is 2E XP T IME-complete.


4 Next Steps

Here is a couple of interesting open problems that we are planning to tackle: (i) distri-
bution over components in the presence of equality and denial constraints is not well
 5
     This denotes the set of components of the set of atoms in q, or simply, the components of q.
understood, and (ii) we do not know how distribution over components behaves in the
case of guarded OMQs enriched with non-monotonic negation [13, 14].
Acknowledgements. Barceló is supported by the Millenium Nucleus Center for Se-
mantic Web Research under grant NC120004. Berger is funded by the Austrian Science
Fund (FWF), project W1255-N23, and a DOC Fellowship of the Austrian Academy of
Sciences. Pieris is supported by the EPSRC Programme Grant EP/M025268/ “VADA:
Value Added Data Systems – Principles and Architecture”.


References
 1. Alvaro, P., Conway, N., Hellerstein, J.M., Maier, D.: Blazes: Coordination analysis for dis-
    tributed programs. In: ICDE. pp. 52–63 (2014)
 2. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Weaker forms of monotonicity for declar-
    ative networking: A more fine-grained answer to the calm-conjecture. In: PODS. pp. 64–75
    (2014)
 3. Ameloot, T.J., Ketsman, B., Neven, F., Zinn, D.: Datalog queries distributing over compo-
    nents. In: ICDT. pp. 308–323 (2015)
 4. Ameloot, T.J., Neven, F., den Bussche, J.V.: Relational transducers for declarative network-
    ing. J. ACM 60(2), 15 (2013)
 5. Baader, F., Brandt, S., Lutz, C.: Pushing the el envelope. In: Proc. of IJCAI. pp. 364–369
    (2005)
 6. Barceló, P., Berger, G., Pieris, A.: Containment for rule-based ontology-mediated queries.
    CoRR abs/1703.07994 (2017), http://arxiv.org/abs/1703.07994
 7. Berger, G., Pieris, A.: Ontology-mediated queries distributing over components. In: IJCAI.
    pp. 943–949 (2016)
 8. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through
    disjunctive datalog, csp, and MMSNP. ACM Trans. Database Syst. 39(4), 33:1–33:44 (2014)
 9. Bienvenu, M., Hansen, P., Lutz, C., Wolter, F.: First order-rewritability and containment of
    conjunctive queries in horn description logics. In: IJCAI. pp. 965–971 (2016)
10. Calı̀, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive
    relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)
11. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable
    query answering over ontologies. J. Web Sem. 14, 57–83 (2012)
12. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization prob-
    lems for database logic programs (preliminary report). In: STOC. pp. 477–490 (1988)
13. Gottlob, G., Hernich, A., Kupke, C., Lukasiewicz, T.: Stable model semantics for guarded
    existential rules and description logics. In: KR (2014)
14. Hernich, A., Kupke, C., Lukasiewicz, T., Gottlob, G.: Well-founded semantics for extended
    Datalog and ontological reasoning. In: PODS. pp. 225–236 (2013)
15. Loo, B.T., Condie, T., Garofalakis, M.N., Gay, D.E., Hellerstein, J.M., Maniatis, P., Ramakr-
    ishnan, R., Roscoe, T., Stoica, I.: Declarative networking. Commun. ACM 52(11), 87–95
    (2009)
16. Zinn, D., Green, T.J., Ludäscher, B.: Win-move is coordination-free (sometimes). In: ICDT.
    pp. 99–113 (2012)