=Paper=
{{Paper
|id=Vol-1195/long14
|storemode=property
|title=Query Answering over Contextualized RDF Knowledge with Forall-Existential Bridge Rules: Attaining Decidability Using Acyclicity
|pdfUrl=https://ceur-ws.org/Vol-1195/long14.pdf
|volume=Vol-1195
|dblpUrl=https://dblp.org/rec/conf/cilc/JosephKS14
}}
==Query Answering over Contextualized RDF Knowledge with Forall-Existential Bridge Rules: Attaining Decidability Using Acyclicity==
Query Answering over Contextualized RDF Knowledge with Forall-Existential Bridge Rules: Attaining Decidability using Acyclicity Mathew Joseph1,2 , Gabriel Kuper2 , and Luciano Serafini1 1 DKM, FBK-IRST, Trento, Italy 2 DISI, University Of Trento, Trento, Italy {mathew,serafini}@fbk.eu, kuper@disi.unitn.it Abstract. The recent outburst of context-dependent knowledge on the Seman- tic Web (SW) has led to the realization of the importance of the quads in the SW community. Quads, which extend a standard RDF triple, by adding a new parameter of the ‘context’ of an RDF triple, thus informs a reasoner to distin- guish between the knowledge in various contexts. Although this distinction sep- arates the triples in an RDF graph into various contexts, and allows the reasoning to be decoupled across various contexts, bridge rules need to be provided for inter-operating the knowledge across these contexts. We call a set of quads to- gether with the bridge rules, a quad-system. In this paper, we discuss the problem of query answering over quad-systems with expressive forall-existential bridge rules. It turns out the query answering over quad-systems is undecidable, in gen- eral. We derive a decidable class of quad-systems, namely context-acyclic quad- systems, for which query answering can be done using forward chaining. Tight bounds for data and combined complexity of query entailment has been estab- lished for the derived class. Keywords: Contextualized RDF/OWL knowledge, Contextualized Query Answering, Quads, Forall-Existential Rules, Semantic Web, Knowledge Representation. 1 Introduction One of the major recent changes in the SW community is the transformation from a triple to a quad as its primary knowledge carrier. As a consequence, more and more triple stores are becoming quad stores. Some of the popular quad-stores are 4store1 , Openlink Virtuoso2 , and some of the current popular triple stores like Sesame3 inter- nally keep track of the context by storing arrays of four names (c, s, p, o) (further de- noted as c : (s, p, o)), where c is an identifier that stands for the context of the triple (s, p, o). Some of the recent initiatives in this direction have also extended existing formats like N-Triples to N-Quads. The latest Billion triples challenge datasets (BTC 2012) have all been released in the N-Quads format. 1 http://4store.org 2 http://virtuoso.openlinksw.com/rdf-quad-store/ 3 http://www.openrdf.org/ 210 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules One of the main benefits of quads over triples are that they allow users to specify various attributes of meta-knowledge that further qualify knowledge [8], and also al- low users to query for this meta knowledge [30]. Examples of these attributes, which are also called context dimensions [12], are provenance, creator, intended user, creation time, validity time, geo-location, and topic. Having defined various contexts in which triples are dispersed, one can declare in another meta-context mc, statements such as mc : (c1 , creator, John), mc : (c1 , expiryTime, “jun-2013”) that talk about the knowl- edge in context c1 , in this case its creator and expiry time. Another benefit of such a contextualized approach is that it opens possibilities of interesting ways for querying a contextualized knowledge base. For instance, if context c1 contains knowledge about Football World Cup 2014 and context c2 about Football Euro Cup 2012. Then the query “who beat Italy in both Euro Cup 2012 and World Cup 2014” can be formalized as the conjunctive query: c1 : (x, beat, Italy) ^ c2 : (x, beat, Italy), where x is a variable. As the knowledge can be separated context wise and simultaneously be fed to separate reasoning engines, this approach increases both efficiency and scalability. Besides the above flexibility, bridge rules [4] can be provided for inter-interoperating the knowledge in different contexts. Such rules are primarily of the form: c : (x) ! c0 : 0 (x) where , 0 are both atomic concept (role) symbols, c, c0 are contexts. The semantics of such a rule is that if, for any a, (a) holds in context c, then 0 (a) should hold in con- text c0 , where a is a unary/binary vector dependending on whether , 0 are concept/role symbols. Although such bridge rules serve the purpose of specifying knowledge inter- operability from a source context c to a target context c0 , in many practical situations there is the need of interoperating multiple source contexts with multiple target targets, for which the bridge rules of the form (1) is inadequate. Besides, one would also want the ability of creating new values in target contexts for the bridge rules. In this work, we consider forall-existential bridge rules that allows conjunctions and existential quantifiers in them, and hence is more expressive than those, in DDL [4] and McCarthy et al. [28]. A set of quads together with such bridge rules is called a quad-system. The main contributions of this work can be summarized as: 1. We provide a basic semantics for contextual reasoning over quad-systems, and study contextualized conjunctive query answering over them. For query answer- ing, we use the notion of a distributed chase, which is an extension of a standard chase [22, 1] that is widely used in databases and KR for the same. 2. We show that conjunctive query answering over quad-systems, in general, is unde- cidable. We derive a class of quad-systems called context acyclic quad-systems, for which query answering is decidable and can be done by forward chaining. We give both data and combined complexity of conjunctive query entailment for the same. The paper is structured as follows: In section 2, we formalize the idea of contextual- ized quad-systems, giving various definitions and notations for setting the background. 211 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules In section 3, we formalize the query answering on quad-systems, define notions such as distributed chase that is further used for query answering, and give the undecidabil- ity results of query entailment for unrestricted quad-systems. In section 4, we present context acyclic quad-systems and its properties. We give an account of relevant related works in section 5, and conclude in section 6. A version of this paper with detailed proofs is available at [23]. 2 Contextualized Quad-Systems In this section, we formalize the notion of a quad-system and its semantics. For any vector or sequence x, we denote by kxk the number of symbols in x, and by {x} the set of symbols in x. For any sets A and B, A ! B denotes the set of all functions from set A to set B. Given the set of URIs U, the set of blank nodes B, and the set of literals L, the set C = U ] B ] L are called the set of (RDF) constants. Any (s, p, o) 2 C ⇥ C ⇥ C is called a generalized RDF triple (from now on, just triple). A graph is defined as a set of triples. A Quad is a tuple of the form c : (s, p, o), where (s, p, o) is a triple and c is a URI4 , called the context identifier that denotes the context of the RDF triple. A quad-graph is defined as a set of quads. For any quad-graph Q and any context identifier c, we denote by graphQ (c) the set {(s, p, o)|c : (s, p, o) 2 Q}. We denote by QC the quad-graph whose set of context identifiers is C. Let V be the set of variables, any element of the set CV = V[C is a term. Any (s, p, o) 2 CV ⇥CV ⇥CV is called a triple pattern, and an expression of the form c : (s, p, o), where (s, p, o) is a triple pattern, c a context identifier, is called a quad pattern. A triple pattern t, whose variables are elements of the vector x or elements of the vector y is written as t(x, y). For any function f : A ! B, the restriction of f to a set A0 , is the mapping f |A0 from A0 \A to B s.t. f |A0 (a) = f (a), for each a 2 A\A0 . For any triple pattern t = (s, p, o) and a function µ from V to a set A, t[µ] denotes (µ0 (s), µ0 (p), µ0 (o)), where µ0 is an extension of µSto C s.t. µ0 |C is the identity function. For any set of triple patterns G, G[µ] denotes t2G t[µ]. For any vector of constants a = ha1 , . . . , akak i, and vector of variables x of the same length, x/a is the function µ s.t. µ(xi ) = ai , for 1 i kak. We use the notation t(a, y) to denote t(x, y)[x/a]. Bridge rules (BRs) Bridge rules (BR) enables knowledge propagation across contexts. Formally, a BR is an expression of the form: 8x8z [c1 : t1 (x, z) ^ ... ^ cn : tn (x, z) ! 9y c01 : t01 (x, y) ^ ... ^ c0m : t0m (x, y)] (1) where c1 , ..., cn , c01 , ..., c0m are context identifiers, x, y, z are vectors of variables s.t. {x}, {y}, and {z} are pairwise disjoint. t1 (x, z), ..., tn (x, z) are triple patterns which do not contain blank-nodes, and whose set of variables are from x or z. t01 (x, y), ...,t0m (x, y) are triple patterns, whose set of variables are from x or y, and also does not contain blank-nodes. For any BR, r, of the form (1), body(r) is the set of quad patterns {c1 : t1 (x, z),...,cn : tn (x, z)}, and head(r) is the set of quad patterns {c01 : t01 (x, y), ... c0m : t0m (x, y)}. 4 Although, in general a context identifier can be a constant, for the ease of notation, we restrict them to be a URI 212 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules Definition 1 (Quad-System). A quad-system QSC is defined as a pair hQC , Ri, where QC is a quad-graph, whose set of context identifiers is C, and R is a set of BRs. For any quad-graph QC (BR r), its symbols size kQC k (krk) is the number of symbols required to print QC (r). Hence, kQC k ⇡ 4⇤|QC |, where |QC | denotes the cardinality of the set QC . Note that |QC | equals the number of quads in QC . For a BR r, krk ⇡ 4 ⇤ k, where k is the number of quad-patterns in r. For a set of BRs R, its size kRk is given as ⌃r2R krk. For any quad-system QSC = hQC , Ri, its size kQSC k = kQC k + kRk. Semantics In order to provide a semantics for enabling reasoning over a quad-system, we need to use a local semantics for each context to interpret the knowledge pertaining to it. Since the primary goal of this paper is a decision procedure for query answering over quad-systems based on forward chaining, we consider the following desiderata for the choice of the local semantics: – there exists a set of inference rules and an operation lclosure() that computes the deductive closure of a graph w.r.t to the local semantics using the inference rules. – given a finite graph as input, the lclosure() operation, terminates with a finite graph as output in polynomial time whose size is polynomial w.r.t. to the input set. Some of the alternatives for the local semantics satisfying the above mentioned criterion are Simple, RDF, RDFS [19], OWL-Horst [20] etc. Assuming that a local semantics has been fixed, for any context c, we denote by I c = h c , ·c i an interpretation structure for the local semantics, where c is the interpretation domain, ·c the corresponding in- terpretation function. Also |=local denotes the local satisfaction relation between a local interpretation structure and a graph. Given a quad graph QC , a distributed interpretation structure is an indexed set I C = {I c }c2C , where I c is a local interpretation structure, for each c 2 C. We define the satisfaction relation |= between a distributed interpretation structure I C and a quad-system QSC as: Definition 2 (Model of a Quad-System). A distributed interpretation structure I C = {I c }c2C satisfies a quad-system QSC = hQC , Ri, in symbols I C |= QSC , iff all the following conditions are satisfied: 1. I c |=local graphQC (c), for each c 2 C; 2. aci = acj , for any a 2 C, ci , cj 2 C; 3. for S eachc BR r 2 R of the form (1) and for each 2 V ! C , where C = , if c2C I c1 |=local t1 (x, z)[ ], ..., I cn |=local tn (x, z)[ ], then there exists function 0 ◆ , s.t. c01 0 I |=local t01 (x, y)[ 0 ], ..., I cm |=local t0m (x, y)[ 0 ]. Condition 1 in the above definition ensures that for any model I of a quad-graph, each C I c 2 I C is a local model of the set of triples in context c. Condition 2 ensures that any constant c is rigid, i.e. represents the same resource across a quad-graph, irrespective of the context in which it occurs. Condition 3 ensure that any model of a quad-system satisfies each BR in it. Any I C s.t. I C |= QSC is said to be a model of QSC . A quad- system QSC is said to be consistent if there exists a model I C , s.t. I C |= QSC , and otherwise said to be inconsistent. For any quad-system QSC = hQC , Ri, it can be 213 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules the case that graphQC (c) is locally consistent, for each c 2 C, whereas QSC is not consistent. This is because the set of BRs R adds more knowledge to the quad-system, and restricts the set of models that satisfy the quad-system. Definition 3 (Quad-system entailment). (a) A quad-system QSC entails a quad c : (s, p, o), in symbols QSC |= c : (s, p, o), iff for any distributed interpretation structure I C , if I C |= QSC then I C |= h{c : (s, p, o)}, ;i. (b) A quad-system QSC entails a quad- graph Q0C 0 , in symbols QSC |= Q0C 0 iff QSC |= c : (s, p, o) for any c : (s, p, o) 2 Q0C 0 . (c) A quad-system QSC entails a BR r iff for any I C , if I C |= QSC then I C |= h;, {r}i. (d) For a set of BRs R, QSC |= R iff QSC |= r, for every r 2 R. (e) Finally, a quad- system QSC entails another quad-system QSC0 0 = hQ0C 0 , R0 i, in symbols QSC |= QSC0 0 iff QSC |= Q0C 0 and QSC |= R0 . We call the decision problems (DPs) corresponding to the entailment problems (EPs) in (a), (b), (c), (d), and (e) as quad EP, quad-graph EP, BR EP, BRs EP, and quad-system EP, respectively. 3 Query Answering on Quad-Systems In the realm of quad-systems, the classical conjunctive queries or select-project-join queries are slightly extended to what we call Contextualized Conjunctive Queries (CCQs). A CCQ CQ(x) is an expression of the form: 9y q1 (x, y) ^ ... ^ qp (x, y) (2) where qi , for i = 1, ..., p are quad patterns over vectors of free variables x and quanti- fied variables y. A CCQ is called a boolean CCQ if it does not have any free variables. For any CCQ CQ(x) and a vector a of constants s.t. kxk = kak, CQ(a) is boolean. A vector a is an answer for a CCQ CQ(x) w.r.t. structure S IC , in symbols IC |= CQ(a), iff there exists assignment µ : {y} ! B s.t. IC |= i=1,...,p qi (a, y)[µ]. A vector a is a certain answer for a CCQ CQ(x) over a quad-system QSC , iff IC |= CQ(a), for every model IC of QSC . Given a quad-system QSC , a CCQ CQ(x), and a vector a, DP of determining whether QSC |= CQ(a) is called the CCQ EP. It can be noted that the other DPs over quad-systems that we have seen are reducible to CCQ EP. Hence, in this paper, we primarily focus on the CCQ EP. dChase of a Quad-System In order to do query answering over a quad-system, we em- ploy what has been called in the literature, a chase [22, 1], specifically, we adopt notion of the skolem chase given in Marnette [27] and Cuenca Grau et al [9]. In order to fit the framework of quad-systems, we extend the standard notion of chase to a distributed chase, abbreviated dChase. In the following, we show how the dChase of a quad-system can be constructed. For any BR r of the form (1), the skolemization sk(r) is the result of replacing each yi 2 {y} with a globally unique Skolem function fir , s.t. fir : Ckxk ! Bsk , where Bsk is a fresh set of blank nodes called skolem blank nodes. Intuitively, for every distinct vector a of constants, with kak = kxk, fir (a) is a fresh blank node, whose node id is a hash of a. Let f r = hf1r , ..., fkyk r i be a vector of distinct Skolem functions; for any BR 214 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules r the form (1), with slight abuse (Datalog notation) we write its skolemization sk(r) as follows: c1 : t1 (x, z), ..., cn : tn (x, z) ! c01 : t01 (x, f r ), ..., c0m : t0m (x, f r ) (3) Moreover, a skolemized BR r of the form (3) can be replaced by the following se- mantically equivalent set of formulas, whose symbol size is worst case quadratic w.r.t krk: {c1 : t1 (x, z), ..., cn : tn (x, z) ! c01 : t01 (x, f r ), (4) ..., c1 : t1 (x, z), ..., cn : tn (x, z) ! c0m : t0m (x, f r )} Note that each BR in the above set has exactly one quad pattern with optional function symbols in its head part. Also note that a BR with out function symbols can be replaced with a set of BRs with single quad-pattern heads. Hence, w.l.o.g, we assume that any BR in a skolemized set sk(R) of BRs is of the form (4). For any quad-graph QC and a skolemized BR r of the form (4), application of r on QC , denoted by r(QC ), is given as: [ r(QC ) = c01 : t01 (x, f r )[µ] | c1 : t1 (x, z)[µ] 2 QC , ..., cn : tn (x, z)[µ] 2 QC µ2V!C For any set of skolemized BRs R, application of R on QC is given by: R(QC ) = S r2R r(QC ). For any quad-graph QC , we define: [ lclosure(QC ) = {c : (s, p, o) |(s, p, o) 2 lclosure(graphQC (c))} c2C For any quad-system QSC = hQC , Ri, generating BRs RF is the set of BRs in sk(R) with function symbols, and the non-generating BRs is the set RI = sk(R) \ RF . Let dChase0 (QSC ) = lclosure(QC ); for i 2 N, dChasei+1 (QSC ) = lclosure(dChasei (QSC ) [ RI (dChasei (QSC ))), if RI (dChasei (QSC )) 6✓ dChasei (QSC ); lclosure(dChasei (QSC ) [ RF (dChasei (QSC ))), otherwise; The dChase of QSC , denoted dChase(QSC ), is given as: [ dChase(QSC ) = dChasei (QSC ) i2N Intuitively, dChasei (QSC ) can be thought of as the state of dChase(QSC ) at the end of iteration i. It can be noted that, if there exists i s.t. dChasei (QSC ) = dChasei+1 (QSC ), then dChase(QSC ) = dChasei (QSC ). An iteration i, s.t. dChasei (QSC ) is com- puted by the application of the set of (resp. non-)generating BRs RF (resp. RI ), on dChasei 1 (QSC ) is called a (resp. non-)generating iteration. The dChase dChase(QSC ) of a consistent quad-system QSC is a universal model [10] of the quad-system, i.e. it is a model of QSC , and for any model IC of QSC , there is a homomorphism from 215 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules dChase(QSC ) to IC . Hence, for any boolean CCQ CQ(), QSC |= CQ() iff there exists a map µ : V(CQ) ! C s.t. {CQ()}[µ] ✓ dChase(QSC ). We call the se- quence dChase0 (QSC ), dChase1 (QSC ), ..., the dChase sequence of QSC . The fol- lowing lemma shows that in a dChase sequence of a quad-system, the result of a single generating iteration and a subsequent number of non-generating iterations causes only an exponential blow up in size. Lemma 1. For a quad-system QSC = hQC , Ri, the following holds: (i) if i 2 N is a generating iteration, then kdChasei (QSC )k = O(kdChasei 1 (QSC )kkRk ), (ii) sup- pose i 2 N is a generating iteration, and for any j 1, i+1, ..., i+j are non-generating iterations, then kdChasei+j (QSC )k = O(kdChasei 1 (QSC )kkRk ), (iii) for any iter- ation k, dChasek (QSC ) can be computed in time O(kdChasek 1 (QSC )kkRk ). Proof (sketch). (i) R can be applied on dChasei 1 (QSC ) by grounding R to the set of constants in dChasei 1 (QSC ), the number of such groundings is of the order O( kdChasei 1 (QSC )kkRk ), kR(dChasei 1 (QSC ))k = O(kRk⇤kdChasei 1 (QSC )kkRk ). Since lclosure only increases the size polynomially, kdChasei (QSC )k = O( kdChasei 1 ( QSC )kkRk ). (ii) From (i) we know that kR(dChasei 1 (QSC ))k = O(kdChasei 1 (QSC )kkRk ). Since, no new constant is introduced in any subsequent non-generating iterations, and since any quad contains only four constants, the set of constants in any subsequent dChase iteration is O(4 ⇤ kdChasei 1 (QSC )kkRk ). Since only these many constants can appear in positions c, s, p, o of any quad generated in the subsequent iterations, the size of dChasei+j (QSC ) can only increase polynomially, which means that kdChasei+j ( QSC )k = O(kdChasei 1 (QSC )kkRk ). (iii) Since any dChase iteration k involves the following two operations: (a) lclosure(), and (b) computing R(dChasek 1 (QSC )). (a) can be done in PTIME w.r.t to its in- put. (b) can be done in the following manner: ground R to the set of constants in dChasei 1 (QSC ); then for each grounding g, if body(g) ✓ dChasei 1 (QSC ), then add head(g) to R(dChasek 1 (QSC )). Since, the number of such groundings is of the order O(kdChasek 1 (QSC )kkRk ), and checking if each grounding is contained in dChasek 1 (QSC ), can be done in time polynomial in kdChasek 1 (QSC )k, the time taken for (b) is O(kdChasek 1 (QSC )kkRk ). Hence, any iteration k can be done in time O(kdChasek 1 (QSC )kkRk ). t u Although, we now know how to compute the dChase of a quad-system, which can be used for deciding CCQ EP, it turns out that for the class of quad-systems whose BRs are of the form (1), which we call unrestricted quad-systems, the dChase can be infi- nite. This raises the question if there are other approaches that can be used, for instance similar problem arises in DLs with value creation, due to the presence of existential quantifiers, whereas the approaches like the one in Glim et al. [16] provides an algo- rithm for CQ entailment based on query rewriting. Theorem 1. The CCQ EP over unrestricted quad-systems is undecidable. Proof (sketch). We show that the well known undecidable problem of non-emptiness of intersection of context-free grammars (CFGs) is reducible to the CCQ entailment 216 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules problem. Given two CFGs, G1 = hV1 , T, S1 , P1 i and G2 = hV2 , T, S2 , P2 i, where V1 , V2 are the set of variables, T s.t. T \ (V1 [ V2 ) = ; is the set of terminals. S1 2 V1 is the start symbol of G1 , and P1 are the set of PRs of the form v ! w, where v 2 V , w is a sequence of the form w1 ...wn , where wi 2 V1 [ T . Similarly s2 , P2 is defined. Deciding whether the language generated by the grammars L(G1 ) and L(G2 ) have non-empty intersection is known to be undecidable [18]. Given two CFGs G1 = hV1 , T, S1 , P1 i and G2 = hV2 , T, S2 , P2 i, we encode gram- mars G1 , G2 into a quad-system QSc = hQc , Ri, with only a single context identifier c. Each PR r = v ! w 2 P1 [ P2 , with w = w1 w2 w3 ..wn , is encoded as a BR of the form: c : (x1 , w1 , x2 ), c : (x2 , w2 , x3 ), ..., c : (xn , wn , xn+1 ) ! c : (x1 , v, xn+1 ), where x1 , .., xn+1 are variables. For each terminal symbol ti 2 T , R contains a BR of the form: c : (x, rdf:type, C) ! 9y c : (x, ti , y), c : (y, rdf:type, C) and Qc is the singleton: {c : (a, rdf:type, C)}. It can be observed that: QSc |= 9y c : (a, S1 , y) ^ c : (a, S2 , y) , L(G1 ) \ L(G2 ) 6= ; We refer the reader to [23] for the complete proof. t u 4 Context Acyclic Quad-Systems: A decidable class In the previous section, we saw that query answering on unrestricted quad-systems is undecidable, in general. We in the following define a class of quad-systems for which query entailment is decidable. The class has the property that algorithms based on for- ward chaining, for deciding query entailment, can straightforwardly be implemented (by minor extensions) on existing quad stores. It should be noted that the technique we propose is reminiscent of the Weak acyclicity [13, 11] technique used in the realm of Datalog+-. Consider a BR r of the form: c1 : t1 (x, z), c2 : t2 (x, z) ! 9y c3 : t3 (x, y), c4 : t4 (x, y). Since such a rule triggers propagation of knowledge in a quad-system, specifically triples from the source contexts c1 , c2 to the target contexts c3 , c4 in a quad-system. As shown in Fig. 1, we can view a BR as a propagation rule across distinct com- c1 : t1 (x, z), c2 : t2 (x, z) ! 9y c3 : t3 (x, y), c4 : t4 (x, y) partments of knowledge, divided as contexts. For any BR of the form (1), each context in the set c1 c3 {c01 , ..., c0m } is said to de- c2 c4 pend on the set of con- texts {c1 , ..., cn }. In a quad- system QSC = hQC , Ri, for any r 2 R, of Fig. 1 the form (1), any context whose identifier is in the set {c | c : (s, p, o) 2 head(r), s or p or o is an existentially quantified variable}, is called a triple generating context (TGC). One can analyze the set of BRs in a quad-system QSC 217 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules using a context dependency graph, which is a directed graph, whose nodes are context identifiers in C, s.t. the nodes corresponding to TGCs are marked with a ⇤, and whose edges are constructed as follows: for each BR of the form (1), there exists an edge from each ci to c0j , for i = 1, ..., n, j = 1, ..., m. A quad-system is said to be context acyclic, iff its context dependency graph does not contain cycles involving TGCs. Example 1. Consider a quad-system, whose set of BRs R are: c1 : (x1 , x2 , U1 ) ! 9y1 c2 : (x1 , x2 , y1 ), c3 : (x2 , rdf:type, rdf:Property) (5) c2 : (x1 , x2 , z1 ) ! c1 : (x1 , x2 , U1 ) (6) c3 : (x1 , x2 , x3 ) ! c1 : (x1 , x2 , x3 ) where U1 be a URI, whose corresponding dependency graph is shown in Fig. 2. Note that the node corresponding to the triple generating context c2 is marked with a ‘⇤’ symbol. Since the cycle (c1 , c2 , c1 ) in the quad-system contains c2 which is a TGC, the quad-system is not context acyclic. In a context acyclic quad-system QSC , since there exists no cyclic path through any TGC node in the context dependency graph, there exists a set of TGCs C 0 ✓ C s.t. for any c 2 C 0 , there exists no incoming path5 from a TGC to c. We call such TGCs, level-1 TGCs. In other words, a TGC c is a level-1 TGC, if for any c0 2 C, there exists an incoming path from c0 to c, implies c0 is not a TGC. For l 1, a level-l+1 TGC c is a TGC that has an incoming path from a level-l TGC, and for any incoming path from a level-l0 TGC to c, is s.t. l0 l. Extending the notion of level also to the non- TGCs, we say that any non-TGC that does not have any incoming paths from a TGC is at level-0; we say that any non-TGC c 2 C is at level-l, if there exists an incoming path from a c1 level-l TGC to c, and for any incoming path from a level-l0 TGC to c, is s.t. l0 l. Hence, the set of contexts in a context acyclic quad-system can be partitioned using the above notion of levels. c2 c3 Definition 4. For a quad-system QSC , a con- ⇤ text c 2 C is said to be saturated in Fig. 2: Context an iteration i, iff for any quad of the form Dependency graph c : (s, p, o), c : (s, p, o) 2 dChase(QSC ) implies c : (s, p, o) 2 dChasei (QSC ). Intuitively, context c is saturated in the dChase iteration i, if no new quad of the form c : (s, p, o) will be generated in any dChasek (QSC ), for any k > i. The following lemma gives the relation between the sat- uration of a context and the required number of dChase iterations, for a context acyclic quad-system. Lemma 2. For any context acyclic quad-system, the following holds: (i) any level-0 context is saturated before the first generating iteration, (ii) any level-1 TGC is satu- rated after the first generating iteration, (iii) any level-k context is saturated before the k + 1th generating iteration. 5 assume that paths have at least one edge 218 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules Proof. Let QSC = hQC , Ri be the quad-system, whose first generating iteration is i. (i) for any level-0 context c, any BR r 2 R, and any quad-pattern of the form c : (s, p, o), if c : (s, p, o) 2 head(r), then for any c0 s.t. c0 : (s0 , p0 , o0 ) occurs in body(r) implies that c0 is a level-0 context and r is a non-generating BR. Also, since c0 is a level- 0 context, the same applies to c0 . Hence, it turns out that only non-generating BRs can bring triples to any level-0 context. Since at the end of iteration i 1, dChasei 1 (QSC ) is closed w.r.t. the set of non-generating BRs (otherwise, by construction of dChase, i would not be a generating iteration). This implies that c is saturated before the first generating iteration i. (ii) for any level-1 TGC c, any BR r 2 R, and any quad-pattern c : (s, p, o), if c : (s, p, o) 2 head(r), then for any c0 s.t. c0 : (s0 , p0 , o0 ) occurs in body(r) implies that c0 is a level-0 context (Otherwise level of c would be greater than 1). This means that only contexts from which triples get propagated to c are level-0 contexts. From (i) we know that all the level-0 contexts are saturated before ith iteration, and since during the ith iteration RF is applied followed by the lclosure() operation (RI need not be applied, since dChasei 1 (QSC ) is closed w.r.t. RI ), c is saturated after iteration i, the 1st generating iteration. (iii) can be obtained from generalization of (i) and (ii), and from the fact that any level-k context can only have incoming paths from contexts whose levels are less than or equal to k. t u ⇤ ⇤ c1 c1 c4 c4 .. .. .. c2 .. c2 c3 c3 ⇤ ⇤ .. .. (a) (b) Fig. 3 Example 2. Consider the dependency graph in Fig. 3a, where .. indicates part of the graph that is not under the scope of our discussion. The TGCs nodes c1 and c3 are marked with a ⇤. It can be seen that both c2 and c4 are level-0 contexts, since they do not have any incoming paths from TGCs. Since the only incoming paths to context c1 are from c2 and c4 , which are not TGCs, c1 is a level-1 TGC. Context c3 is a level-2 TGC, since it has an incoming path from the level-1 TGC c1 , and has no incoming path 219 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules from a TGC whose level is greater than 1. Since the level-0 contexts only have incoming paths from level-0 contexts and only appear on the head part of non-generating BRs, before first generating iteration, all the level-0 TGCs becomes saturated, as the set of non-generating BRs RI has been exhaustively applied. This situation is reflected in Fig. 3b, where the saturated nodes are shaded with gray. Note that after the first and second generating iterations c1 and c3 also become saturated, respectively. The following lemma shows that for context acyclic quad-systems, there exists a finite bound on the size and computation time of its dChase. Lemma 3. For any context acyclic quad-system QSC = hQC , Ri, the following holds: (i) the number of dChase iterations is finite, (ii) size of the dChase kdChase(QSC )k kQS k = O(22 C ), (iii) computing dChase(QSC ) is in 2EXPTIME, (iv) if R and the set of schema triples in QC is fixed, then kdChase(QSC )k is a polynomial in kQSC k, and computing dChase(QSC ) is in PTIME. Proof. (i) Since QSC is context-acyclic, all the contexts can be partitioned according to their levels. Also, the number of levels k is s.t. k |C|. Hence, applying lemma 1, before the k + 1th generating iteration all the contexts becomes saturated, and k + 1th generating iteration do not produce any new quads, terminating the dChase computation process. (ii) In the dChase computation process, since by lemma 1, any generating itera- tion and a sequence of non-generating iterations can only increase the dChase size exponentially in kRk, the size of the dChase before k + 1 th generating iteration is k k O(kdChase0 (QSC )kkRk ), which can be written as O(kQSC kkRk ) (†). As seen in (i), there can only be |C| generating iterations, and a sequence of non-generating iterations. Hence, applying k = |C| to (†), and taking into account the fact that |C| kQSC k, the kQS k size of the dChase kdChase(QSC )k = O(22 C ). (iii) Since in any dChase iteration except the final one, atleast one new quad should kQS k be produced and the final dChase can have at most O(22 C ) quads (by ii), the total kQS k number of iterations are bounded by O(22 C ) (†). Since from lemma 1, we know that for any iteration i, computing dChasei (QSC ) is of the order O(kdChasei 1 (QSC kQS k )kkRk ). Since, kdChasei 1 (QSC )k can at most be O(22 C ), computing dChasei ( kQSC k QSC ) is of the order O(2kRk⇤2 )). Also since kRk kQSC k, any iteration requires 2kQSC k O(2 ) time (‡). From (†) and (‡), we can conclude that the time required for computing dChase is in 2EXPTIME. (iv) In (ii) we saw that the size of the dChase before k + 1th generating iteration k is given by O(kQSC kkRk ) (⇧). Since by hypothesis kRk is a constant and also the k size of the dependency graph and the levels in it. Hence, the expression kRk in (⇧) amounts to a constant z. Hence, kdChase(QSC )k = O(kQSC k ). Hence, the size of z dChase(QSC ) is a polynomial in kQSC k. Also, since in any dChase iteration except the final one, atleast one quad should be produced and the final dChase can have at most O(kQSC kz ) quads, the total num- ber of iterations are bounded by O(kQSC kz ) (†). Also from lemma 1, we know that any dChase iteration i, computing dChasei (QSC ) involves two steps: (a) com- puting R(dChasei 1 (QSC )), and (b) computing lclosure(), which can be done in 220 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules PTIME in the size of its input. Since computing R(dChasei 1 (QSC )) is of the or- der O(kdChasei 1 (QSC )kkRk ), where |R| is a constant and kdChasei 1 (QSC )k is a polynomial is kQSC k, each iteration can be done in time polynomial in kQSC k (‡). From (†) and (‡), it can be concluded that dChase can be computed in PTIME. t u Lemma 4. For any context acyclic quad-system, the following holds: (i) data complex- ity of CCQ entailment is in PTIME (ii) combined complexity of CCQ entailment is in 2EXPTIME. Proof. For a context acyclic quad-system QSC = hQC , Ri, since dChase(QSC ) is fi- nite, a boolean CCQ CQ() can naively be evaluated by grounding the set of constants in the chase to the variables in the CQ(), and then checking if any of these ground- ings are contained in dChase(QSC ). The number of such groundings can at most be kdChase(QSC )kkCQ()k (†). (i) Since for data complexity, the size of the BRs kRk, the set of schema triples, and kCQ()k is fixed to constant. From lemma 3 (iv), we know that under the above mentioned settings the dChase can be computed in PTIME and is polynomial in the size of QSC . Since kCQ()k is fixed to a constant, and from (†), binding the set of constants in dChase(QSC ) on CQ() still gives a number of bindings that is worst case polynomial in the size of QSC . Since membership of these bindings can checked in the polynomially sized dChase in PTIME, the time required for CCQ evaluation is in PTIME. kQS k (ii) Since in this case kdChase(QSC )k = O(22 C ) (‡), from (†) and (‡), binding kQS k the set of constants in kdChase(QSC k to variables in CQ() amounts to O(2kCQ()k⇤2 C ) bindings. Since the size of dChase is double exponential in kQSC k, checking the mem- bership of each of these bindings can be done in 2EXPTIME. Hence, the combined complexity is in 2EXPTIME. t u Theorem 2. For any context acyclic quad-system, the following holds: (i) The data complexity of CCQ entailment is PTIME-complete, (ii) The combined complexity of CCQ entailment is 2EXPTIME-complete. For PTIME-hardness of data complexity, it can be shown that the well known problem of 3HornSat, the satisfiability of propositional Horn formulas with atmost 3 literals, and for 2EXPTIME-hardness for the combined complexity, it can be shown that the word problem of a double exponentially time bounded deterministic turing machine, which is a well known 2EXPTIME-hard problem, is reducible to the CCQ entailment problem (see [23] for detailed proof). Reconsidering the quad-system in example 1, which is not context acyclic. Suppose that the contexts are enabled with RDFS inferencing, i.e lclosure() = rdfsclosure(). During dChase construction, since any application of rule (5) can only create a triple in c2 in which the skolem blank node is in the object position, where as the application of rule (6), does not propogate constants in object postion to c1 . Although at a first look, the dChase might seem to terminate, but since the application of the following RDFS inference rule in c2 : (s, p, o) ! (o ,rdf:type, rdfs:Resource), derives a quad of the form c2 : ( :b, rdf:type, rdfs:Resource), where :b is the skolem blank-node created by the application of rule (5). Now by application of rule (6) leads 221 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules to c1 : ( :b, rdf:type, U1 ). Since rule (5) is applicable on c1 : ( :b, rdf:type, U1 ), which again brings a new skolem blank node to c2 , and hence the dChase construction doesn’t terminate. Hence, as seen above the notion of context acyclicity can alarm us about such infinite cases. 5 Related Work Contexts and Distributed Logics The work on contexts began in the 80s when Mc- Carthy [21] proposed context as a solution to the generality problem in AI. After this various studies about logics of contexts mainly in the field of KR was done by Guha [29], Distributed First Order Logics by Ghidini et al. [14] and Local Model Semantics by Giunchiglia et al. [15]. Primarily in these works contexts are formalized as a first or- der/propositional theories, and bridge rules were provided to inter-operate the various contexts. Some of the initial works on contexts relevant to semantic web were the ones like Distributed Description Logics [4] by Borgida et al., E-connections [26] by Kutz et al., Context-OWL [5] by Bouqet et al., and the work of CKR [31, 24] by Serafini et al. These were mainly logics based on DLs, which formalized contexts as OWL KBs, whose semantics is given using a distributed interpretation structure with additional se- mantic conditions that suits varying requirements. Compared to these works, the bridge rules we consider are much more expressive with conjunctions and existential variables that supports value/blank-node creation. 89 rules, TGDs, Datalog+- rules Query answering over rules with universal existen- tial quantifiers in the context of databases/KR, where these rules are called tuple gen- erating dependencies (TGDs)/Datalog+- rules, was done by Beeri and Vardi [3] even in the early 80s, where the authors show that the query entailment problem in gen- eral is undecidable. However, recently many classes of such rules have been identified for which query answering is decidable. These includes (a) fragments s.t. the result- ing models have bounded tree widths, called bounded treewidth sets (BTS), such as Weakly guarded rules [7], Frontier guarded rules [2], (b) fragments called finite unifica- tion sets (FUS), such as ‘sticky’ rules [6, 17], and (c) fragments called finite extension sets (FES), where sufficient conditions are enforced to ensure finiteness of the chase and its termination. The approach used for query answering in FUS is to rewrite the input query w.r.t. to the TGDs to another query that can be evaluated directly on the set of instances, s.t. the answers for the former query and latter query coincides. The approach is called the query rewriting approach. FES classes uses certain termination guarantying tests that check whether certain sufficient conditions are satisfied by the structure of TGDs. A large number of classes in FES are based on tests that detects ‘acyclicity conditions’ by analyzing the information flow between the TGD rules. Weak acyclicity [13, 11], was one of the first such notions, and was extended to joint acyclic- ity [25], super weak acyclicity [27], and model faithful acyclicity [9]. The most similar approach to ours is the weak acyclicity technique, where the structure of the rules is ana- lyzed using a dependency graph that models the propagation of constants across various predicates positions, and restricting the dependency graph to be acyclic. Although this technique can be used in our scenario by translating a quad-system to a set of TGDs; if the obtained translation is weakly acyclic, then one could use existing algorithms for chase computation for the TGDs to compute the chase, the query entailment check can 222 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules Quad-System dChase size w.r.t Data Complexity of Combined Complexity Fragment input quad-system CCQ entailment of CCQ entailment Unrestricted Quad-Systems Infinite Undecidable Undecidable Context acylic Quad-Systems Double exponential PTIME-complete 2EXPTIME-complete Table 1: Complexity info for various quad-system fragments be done by querying the obtained chase. However, our approach has the advantage of straightforward implementability on existing quad-stores. 6 Summary and Conclusion In this paper, we study the problem of query answering over contextualized RDF knowl- edge. We show that the problem in general is undecidable, and present a decidable class called context acyclic quad-systems. Table 1 summarizes the main results obtained. We can show that the notion of context acyclicity, introduced in section 4 can be used to extend the currently established tools for contextual reasoning to give support for ex- pressive BRs with conjuction and existentials with decidability guarantees. We view the results obtained in this paper as a general foundation for contextual reasoning and query answering over contextualized RDF knowledge formats such as Quads, and can straightforwardly be used to extend existing Quad stores to encorporate for-all existen- tial BRs of the form (1). References 1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995) 2. Baget, J., Mugnier, M., Rudolph, S., Thomazo, M.: Walking the Complexity Lines for Gen- eralized Guarded Existential Rules. In: IJCAI. pp. 712–717 (2011) 3. Beeri, C., Vardi, M.Y.: The Implication Problem for Data Dependencies. In: ICALP. pp. 73–85 (1981) 4. Borgida, A., Serafini, L.: Distributed Description Logics: Assimilating Information from Peer Sources. J. Data Semantics 1, 153–184 (2003) 5. Bouquet, P., Giunchiglia, F., van Harmelen, F., Serafini, L., Stuckenschmidt, H.: C-OWL: Contextualizing Ontologies. In: ISWC. pp. 164–179 (2003) 6. Calı̀, A., Gottlob, G., Pieris, A.: Query Answering under Non-guarded Rules in Datalog+/-. In: Hitzler, P., Lukasiewicz, T. (eds.) RR. Lecture Notes in Computer Science, vol. 6333, pp. 1–17. Springer (2010) 7. Cali, A., Gottlob, G., Lukasiewicz, T., Marnette, B., Pieris, A.: Datalog+/-: A Family of Logical Knowledge Representation and Query Languages for New Applications. In: Logic in Computer Science (LICS), 25th Annual IEEE Symposium on. pp. 228 –242 (july 2010) 8. Carroll, J., Bizer, C., Hayes, P., Stickler, P.: Named graphs, provenance and trust. In: Proc. of the 14th int.l. conf. on WWW. pp. 613–622. ACM, New York, NY, USA (2005) 9. Cuenca Grau, B., Horrocks, I., Krötzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.: Acyclicity Conditions and their Application to Query Answering in Description Logics. In: Proceedings of the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR’12). pp. 243–253. AAAI Press (2012) 223 M. Joseph et al. Query answering over Contextualized RDF knowledge with Forall-Existential Bridge Rules 10. Deutsch, A., Nash, A., Remmel, J.: The chase revisited. In: Proceedings of the twenty- seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. pp. 149–158. PODS ’08 (2008) 11. Deutsch, A., Tannen, V.: Reformulation of XML Queries and Constraints. In: In ICDT. pp. 225–241 (2003) 12. D.Lenat: The Dimensions of Context Space. Tech. rep., CYCorp (1998), published online http://www.cyc.com/doc/context-space.pdf 13. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data Exchange: Semantics and Query An- swering. In: Theoretical Computer Science. pp. 28(1):89–124 (2005) 14. Ghidini, C., Serafini, L.: Distributed first order logics. In: Frontiers Of Combining Systems 2, Studies in Logic and Computation. pp. 121–140. Research Studies Press (1998) 15. Giunchiglia, F., Ghidini, C.: Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence 127 (2001) 16. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Answering conjunctive queries in the SHIQ description logic. In: Proceedings of the IJCAI’07. pp. 299–404. AAAI Press (2007) 17. Gottlob, G., Manna, M., Pieris, A.: Polynomial Combined Rewritings for Existential Rules. In: KR’14: International Conference on Principles of Knowledge Representation and Rea- soning (2014) 18. Harrison, M.A.: Introduction to Formal Language Theory. Addison-Wesley Longman Pub- lishing Company, Inc., Boston, MA, USA, 1st edn. (1978) 19. Hayes, P. (ed.): RDF Semantics. W3C Recommendation (Feb 2004) 20. ter Horst, H.J.: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Web Semantics: Science, Services and Agents on the WWW 3(2-3), 79–115 (2005) 21. J.McCarthy: Generality in AI. Comm. of the ACM 30(12), 1029–1035 (1987) 22. Johnson, D.S., Klug, A.C.: Testing containment of conjunctive queries under functional and inclusion dependencies. Computer and System Sciences 28, 167–189 (1984) 23. Joseph, M., Kuper, G., Serafini, L.: Query Answering over Contextualized RDF/OWL Knowledge with Forall-Existential Bridge Rules: Attaining Decidability using Acyclicity (full version). Tech. rep., CoRR Technical Report arXiv:1406.0893, Arxiv e-Print archive (2014), http://arxiv.org/abs/1406.0893 24. Joseph, M., Serafini, L.: Simple reasoning for contextualized RDF knowledge. In: Proc. of Workshop on Modular Ontologies (WOMO-2011) (2011) 25. Krötzsch, M., Rudolph, S.: Extending decidable existential rules by joining acyclicity and guardedness. In: Walsh, T. (ed.) Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI’11). pp. 963–968. AAAI Press/IJCAI (2011) 26. Kutz, O., Lutz, C., Wolter, F., Zakharyaschev, M.: E-Connections of Abstract Description Systems. Artificial Intelligence 156(1), 1–73 (2004) 27. Marnette, B.: Generalized schema-mappings: from termination to tractability. In: Proceed- ings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. pp. 13–22. PODS ’09, ACM, New York, NY, USA (2009) 28. McCarthy, J., Buvac, S., Costello, T., Fikes, R., Genesereth, M., Giunchiglia, F.: Formalizing Context (Expanded Notes) (1995) 29. R.Guha: Contexts: a Formalization and some Applications. Ph.D. thesis, Stanford (1992) 30. Schueler, B., Sizov, S., Staab, S., Tran, D.T.: Querying for meta knowledge. In: WWW ’08: Proceeding of the 17th international conference on World Wide Web. pp. 625–634. ACM, New York, NY, USA (2008) 31. Serafini, L., Homola, M.: Contextualized knowledge repositories for the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web (2012) 224