Well Founded Semantics for P2P Deductive Databases (Discussion Paper) Luciano Caroprese and Ester Zumpano DIMES, University of Calabria, 87036 Rende, Italy {l.caroprese, e.zumpano}@dimes.unical.it Abstract. This paper stems from previous works of the same authors in which a declarative semantics for Peer-to-Peer (P2P) systems, defined in terms of Preferred Weak Models, is proposed. Under this semantics only facts not making the local databases inconsistent can be imported. As in the general case a P2P system may admit many preferred weak models whose computational complexity is prohibitive, the paper looks for a more pragmatic solution. It assigns to a P2P system its Well Founded Model, a partial deterministic model that captures the intuition that if an atom is true in a preferred weak model, but it is false in another one, then it is undefined in the well founded model. 1 Introduction Different works investigate topic related to the integration of information and the computation of queries in P2P systems [5, 6, 18, 23, 25, 8, 2, 16]. This paper follows the proposal in [7, 8] in which the Preferred Weak Semantics has been proposed. Under this semantics only facts not making the local databases in- consistent can be imported, and the preferred weak models are the consistent scenarios in which peers import maximal sets of facts not violating constraints. Example 1. Consider a P2P in which the peer: (i) P 3 contains two atoms: r(a) and r(b); (ii) P 2 imports data from P 3 using the (mapping) rule q(X) ←- r(X) (Please, note the special syntax we use for mapping rules.). Moreover imported atoms must satisfy the constraint ← q(X), q(Y ), X 6= Y stating that the relation q may contain at most one tuple; (iii) P 1 imports data from P 2 , using the (mapping) rule p(X) ←- q(X). P 1 also contains the rules s ← p(X) stating that s is true if the relation p contains at least one tuple, and t ← p(X), p(Y ), X 6= Y , stating that t is true if the relation p contains at least two distinct tuples. The intuition is that, with r(a) and r(b) true in P 3 , either q(a) or q(b) could be imported in P 2 and, consequently, only one tuple is imported in the relation p of the peer P 1 . Note that whatever is the derivation in P 2 , s is derived in P 1 while t is not derived. Therefore, s and t are, respectively, true and false in P 1 . 2 Copyright c 2019 for the individual papers by the papers authors. Copying permit- ted for private and academic purposes. This volume is published and copyrighted by its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy. A P2P system may admits many preferred weak models and the computational complexity is prohibitive. Therefore, a more pragmatic solution is needed. This paper presents the Well Founded Model Semantics: a deterministic model a de- terministic model whose computation is polynomial time. As for future extension we plan to extend the semantics of P2P system by considering logic programs with function symbols [3, 4]. 2 P2P Systems: Syntax and Semantics Familiarity is assumed with deductive database [1], logic programming, stable models, head cycle free (HCF) program, well founded model and computational complexity [17, 21, 22, 24]. A (peer) predicate symbol is a pair i : p, where i is a peer identifier and p is a predicate symbol. A (peer) atom is of the form i : A, where i is a peer identifier and A is a standard atom. A (peer) literal is a peer atom i : A or its negation not i : A. A conjunction i : A1 , . . . , i : Am , not i : Am+1 , . . . , not i : An , φ, where φ is a conjunction of built-in atoms, will be also denoted as i : B, with B equals to A1 , . . . , Am , not Am+1 , . . . , not An , φ. A (peer) rule can be of one of the following three types: – standard rule. It is of the form i : H ← i : B, where i : H is an atom and i : B is a conjunction of atoms and built-in atoms. – integrity constraint. It is of the form ← i : B, where i : B is a conjunc- tion of literals and built-in atoms. – mapping rule. It is of the form i : H ←- j : B, where i : H is an atom, j : B is a conjunction of atoms and built-in atoms and i 6= j. i : H is called head while i : B (resp. j : B) is called body. Negation is allowed just in the body of integrity constraints. The definition of a predicate i : p consists of the set of rules in whose head the predicate symbol i : p occurs. A predicate can be of three different kinds: base predicate, derived predicate and mapping predicate. A base predicate is defined by a set of ground facts; a derived predicate is defined by a set of standard rules and a mapping predicate is defined by a set of mapping rules. An atom i : p(X) is a base atom (resp. derived atom, mapping atom) if i : p is a base predicate (resp. standard predicate, mapping predicate). Given an interpretation M , M [D] (resp. M [LP], M [MP]) denotes the subset of base atoms (resp. derived atoms, mapping atoms) in M . Definition 1. P2P System. A peer P i is a tuple hDi , LP i , MP i , IC i i, where (i) Di is a set of facts (local database); (ii) LP i is a set of standard rules; (iii) MP i is a set of mapping rules and (iv) IC i is a set of constraints over predicates defined by Di , LP i and MP i . A P2P system PS is a set of peers {P 1 , . . . , P n }. 2 Given a P2P system PS = {P 1 , . . . , P n }, where P i = hDi , LP i , MP i , IC i i, D, LP, MP and IC denote, respectively, the global sets of ground S facts, stan- dard rules, mapping rules and integrity constraints, i.e. D = i∈[1..n] Di , LP = S S S i∈[1..n] LP i , MP = i∈[1..n] MP i and IC = i∈[1..n] IC i . In the rest of this paper, with a little abuse of notation, PS will be also denoted both with the tuple hD, LP, MP, ICi and the set D ∪ LP ∪ MP ∪ IC. We now review the Preferred Weak Model semantics in [8]. For each peer P i = hDi , LP i , MP i , IC i i, the set Di ∪ LP i is a positive normal program, thus it admits just one minimal model that represents the local knowledge of P i . It is assumed that each peer is locally consistent, i.e. its local knowledge satisfies IC i (i.e. Di ∪ LP i |= IC i ). Therefore, inconsistencies may be introduced just when the peer imports data. The intuitive meaning of a mapping rule i : H ←- j : B ∈ MP i is that if the body conjunction j : B is true in the source peer P j the atom i : H can be imported in P i only if it does not imply (directly or indirectly) the violation of some constraint in IC i . Given a mapping rule r = H ←- B, the corresponding standard logic rule H ← B will be denoted as St(r). Analogously, given a set of mapping rules MP, St(MP) = {St(r) | r ∈ MP} and given a P2P system PS = D ∪ LP ∪ MP ∪ IC, St(PS) = D ∪ LP ∪ St(MP) ∪ IC. Given an interpretation M , an atom H and a conjunction of atoms B: - valM (H ← B) = valM (H) ≥ valM (B), - valM (H ←- B) = valM (H) ≤ valM (B). Therefore, if the body is true, the head of a standard rule must be true, whereas the head of a mapping rule could be true. Intuitively, a weak model M is an interpretation that satisfies all standard rules, mapping rules and constraints of PS and such that each atom H ∈ M [MP] (i.e. each mapping atom) is supported from a mapping rule H ←- B whose body B is satisfied by M . A preferred weak model is a weak model containing a maximal subset of mapping atoms. Definition 2. (Preferred) Weak Model. Given a P2P system PS = D ∪ LP ∪ MP ∪ IC, an interpretation M is a weak model for PS if {M } = MM(St(PS M )), where PS M is the program obtained from ground(PS) by removing all mapping rules whose head is false w.r.t. M . Given two weak models M and N , M is said to preferable to N , and is denoted as M w N , if M [MP] ⊇ N [MP]. Moreover, if M w N and N 6w M , then M = N . A weak model M is said to be preferred if there is no weak model N such that N = M . The set of weak models for a P2P system PS will be denoted by WM(PS), whereas the set of preferred weak models will be denoted by PWM(PS). 2 Theorem 1. For every consistent P2P system PS, PWM(PS) 6= ∅. Example 2. Consider a P2P system PS in which the peer: P 2 contains the facts q(a) and q(b), whereas P 1 contains the mapping rule p(X) ←- q(X) and the constraint ← p(X), p(Y ), X 6= Y . WM(PS) are: M0 = {q(a), q(b)}, M1 = {q(a), q(b), p(a)} and M2 = {q(a), q(b), p(b)}, whereas PWM(PS) are M1 and M2 as they import maximal sets of atoms from P 2 . 2 3 Computing the Preferred Weak Model Semantics This section presents an alternative characterization of the preferred weak model semantics, that allows to model a P2P system PS with a single logic program Rewt (PS). Let’s firstly introduce some preliminaries. Given an atom A = i : p(x), At denotes the atom i : pt (x) and Av denotes the atom i : pv (x). At will be called the testing atom, whereas Av will be called the violating atom. Definition 3. Given a conjunction B = A1 , . . . , Ah , not Ah+1 , . . . , not An , B1 , . . . , Bk , not Bk+1 , . . . , not Bm , φ (1) where Ai (i ∈ [1.. n]) is a mapping atom or a derived atom, Bi (i ∈ [1.. m]) is a base atom and φ is a conjunction of built in atoms, we define B t = At1 , . . . , Ath , not Ath+1 , . . . , not Atn , B1 , . . . , Bk , not Bk+1 , . . . , not Bm , φ (2) Therefore, given a negation free conjunction B = A1 , . . . , Ah , B1 , . . . , Bk , . . . , φ, then B t = At1 , . . . , Ath , B1 , . . . , Bk , φ. In the following, the rewriting of a P2P system is reported. Definition 4. Rewriting of an integrity constraint. Given an in- tegrity constraint i = ← B (that is of the form (1)) its rewriting is defined as Rewt (i) = {Av1 ∨ . . . ∨ Avh ← B t }. 2 If B t (of the form (2)), is true, at least one of the violating atoms Av1 , . . . , Avh is true. Therefore, at least one of the atoms A1 , . . . , Ah cannot be inferred. Definition 5. Rewriting of a standard rule. Given a standard rule s = H ← B, its rewriting is defined as Rewt (s) = {H ← B; H t ← B t ; Av1 ∨. . .∨Avh ← B t , H v }. 2 In order to find the mapping atoms that, if imported, generate some inconsis- tencies (i.e. in order to find their corresponding violating atoms), all possible mapping testing atoms are imported and the derived testing atoms are inferred. In the previous definition, if B t is true and the violating atom H v is true, then the body of the disjunctive rule is true and therefore it can be deduced that at least one of the violating atoms Av1 , . . . , Avh is true (i.e. to avoid such inconsistencies at least one of atoms A1 , . . . , Ah cannot be inferred). Definition 6. Rewriting of a mapping rule. Given a mapping rule m = H ←- B, its rewriting is defined as Rewt (m) = {H t ← B; H ← H t , not H v }. 2 Intuitively, to check whether a mapping atom H generates some inconsistencies, if imported in its target peer, a testing atom H t is imported in the same peer. Rather than violating some integrity constraint, it (eventually) generates, by rules obtained from the rewriting of standard rules and integrity constraints, the atom H v . In this case H, cannot be inferred and inconsistencies are prevented. Definition 7. Rewriting of a P2P system. S Given a P2P system PS = D S ∪ LP ∪ MP ∪ IC, then: Rew S t (MP) = m∈MP Rewt (m), Rewt (LP) = s∈LP Rewt (s), Rewt (IC) = i∈IC Rewt (i) and Rewt (PS) = D ∪ Rewt (LP) ∪ Rewt (MP) ∪ Rewt (IC) 2 Definition 8. Total Stable Model. Given a P2P system PS and a stable model M for Rewt (PS), the interpretation obtained by deleting from M its violating and testing atoms, denoted as T (M ), is a total stable model of PS. The set of total stable models of PS is denoted as T SM(PS). 2 Example 3. Consider the P2P system PS in Ex. 2. From Def. (7) we obtain: Rewt (PS) ={q(a); q(b); pt (X) ← q(X); p(X) ← pt (X), not pv (X); pv (X) ∨ pv (Y ) ← pt (X), pt (Y ), X 6= Y } The stable models of Rewt (PS) are: M1 = {q(a), q(b), pt (a), pt (b), pv (a), p(b)}, M2 = {q(a), q(b), pt (a), pt (b), p(a), pv (b)}. Then, the total stable models of PS are T SM(PS) = {{q(a), q(b), p(b)}, {q(a), q(b), p(a)}}. 2 Theorem 2. For every P2P system PS, T SM(PS) = PWM(PS). 2 4 Well Founded Semantics and Distributed Computation A P2P system may admit many preferred weak models whose computational complexity has been shown to be prohibitive [7]. Therefore, a deterministic model whose computation is guaranteed to be polynomial time is needed. In more details, the rewriting presented in Section 3 allows modeling a P2P system by a single disjunctive logic program. By assuming that this program is HCF, it can be rewritten into an equivalent normal program for which a Well Founded Model Semantics can be adopted. Such a semantics allows to compute in polynomial time a deterministic model describing the P2P system, by capturing the intuition that if an atom is true in a total stable (or preferred weak) model of PS and is false in another one, then it is undefined in the well founded model. Theorem 3. Let PS = D ∪ LP ∪ MP ∪ IC be a P2P system, then Rewt (PS) is HCF iff there are not two distinct atoms occurring in the body of a rule in ground(LP ∪ IC) mutually dependent by positive recursion. 2 we assume each P2P system PS is s.t. Rewt (PS) is HCF. PS will be called as HCF P2P system. From previous hypothesis, it follows that Rewt (PS) can be normalized as SM(Rewt (PS)) = SM(N ormalized(Rewt (PS))). Definition 9. Rewriting of an HCF P2P system. Given an HCF P2P system PS, Reww (PS) = N ormalized(Rewt (PS)). 2 Therefore, the preferred weak models of an HCF P2P system PS corresponds to the stable models of the normal program Reww (PS). Next step is to adopt for Reww (PS) a three-valued semantics that allows computing deterministic models and in particular the well founded model. Definition 10. Well Founded Semantics. Given an HCF P2P system PS and the well founded model of Reww (PS), say hT, F i, the well founded model semantics of PS is given by hT (T ), T (F )i. 2 Example 4. The rewriting of the HCF P2P system PS in Example 3 is: Reww (PS) ={q(a); q(b); pt (X) ← q(X); p(X) ← pt (X), not pv (X); pv (X) ← pt (X), pt (Y ), X 6= Y, not pv (Y ); pv (Y ) ← pt (X), pt (Y ), X 6= Y, not pv (X)} The well founded model of Reww (PS) is h{q(a), q(b), pt (a), pt (b)}, ∅i and the well founded semantics of PS is given by h{q(a), q(b)}, ∅i. The atoms q(a) and q(b) are true, while the atoms p(a) and p(b) are undefined. 2 Theorem 4. Let PS be a HCF P2P system, then deciding: (i) whether an in- terpretation M is a preferred weak model of PS is P -time; (ii) whether an atom A is true in some preferred weak model of PS is N P-complete; (iii) whether an atom A is true in every preferred weak model of PS is coN P-complete; (iv) whether an atom A is true in the well founded model of PS is P -time. 2 Reww (PS) allows to compute the well founded semantics of PS in polynomial time. In this section we present a technique allowing to compute the well founded model in a distributed way. The basic idea is that each peer computes its own portion of the “unique logic program”, sending to the other peers the result. Definition 11. Let PS= be an HCF P2P system, where P i = hDi , LP i , MP i , IC i i, for i ∈ [1..n]. Then, Reww (P i ) = Reww (Di ∪ LP i ∪ MP i ∪ IC i ). Previous definition allows to S derive a single normal logic program for each peer. Observe that, Reww (PS) = i∈[1..n] Reww (P i ). Example 5. The rewritings located on peers P 1 and P 2 of Ex. 2 are: Reww (P 1 ) = {pt (X) ← q(X); p(X) ← pt (X), not pv (X); pv (X) ← pt (X), pt (Y ), not pv (Y ), X 6= Y ; 2 pv (Y ) ← pt (X), pt (Y ), not pv (X), X 6= Y }. Reww (P 2 ) = {q(a); q(b)}. The idea is the following: if a peer receives a query, then it will recursively query the peer to which it is connected through mapping rules, before being able to calculate its answer. Formally, a local query submitted by a user to a peer does not differ from a remote query submitted by another peer. Once retrieved the necessary data from neighbor peers, the peer computes its well founded model and evaluates the query (either local or remote) on that model; then if the query is a remote query, the answer is sent to the requesting peer. For the sake of presentation, a partial model reports, using the syntax [T, U ], the sets of true and undefined atoms, instead of the sets of true and false atoms. Example 6. Consider the P2P system in Example 2. If P 1 receives the query p(X), it submits the query p(X) to the peer P 2 . Once P 2 receives the query q(X), it computes its well founded model W2 = [{q(a), q(b)}, ∅]. P 1 receives the data, populates its local database and computes its well founded model W1 = [{pt (a), pt (b)}, {pv (a), pv (b), p(a), p(b)}] and finally, evaluates the query p(X) over W1 . The answer will be [∅, {p(a), p(b)}]. Observe that, the peer replies providing two undefined atoms: p(a) and p(b). 2 Algorithm: ComputeAnsweri input: 1) i : q(X) - a query 2) s - a sender which is a peer P k (remote query) or null (local query) output: [T, U ] where T (resp. U ) is the set of true (resp. undefined ) atoms begin while true wait for an input i : q(X) and s; P = Reww (P i ); for each (i : h(X) ←- j : b(X)) ∈ MP i [T, U ] = ComputeAnswerj (j : b(X), P i ); P = P ∪ T ∪ {a ← not a | a ∈ U }; end for [Tw , Uw ] = ComputeW ellF oundedM odel(P ); answer = [{i : q(x) | i : q(x) ∈ Tw }, {i : q(x) | i : q(x) ∈ Uw }]; if isN ull(s) then show(answer); else send(answer, P k ); end if end while end We associate to a P2P system, PS, a graph g(PS) whose nodes are the peers of PS and whose edges are the pairs (P i , P j ) s.t. (i : H ←- j : B) ∈ PS. We say that PS is acyclic if g(PC) is acyclic. In order to guarantee the termination of the computation, from now on we assume that our P2P systems are acyclic. Without loss of generality, we assume that each mapping rule is of the form i : p(X) ←- j : q(X). The behavior of the peer P i = hDi , LP i , MP i , IC i i within the P2P system can be modeled by the algorithm ComputeAnsweri , running on the peer P i . It receives in input a query i : q(X) submitted by a sender s which can be another peer or a user. For each mapping rule i : h(X) ←- j : b(X), the peer P i queries the peer P j asking for j : b(X). P i will receive true and undefined tuples that will enrich the local knowledge. Observe that, true atoms will be simply inserted into P while for each undefined atom a, a rule a ← not a allowing to infer the correct truth value for a (undefined) in the well founded model, is inserted. Once the local knowledge has been updated, the local well founded model is computed by means of the function ComputeW ellF oundedM odel and the answer for the query is extracted. If the sender is a user then the answer is simply shown, otherwise (if it is a peer) it is sent back to it. A system prototype for query answering in a P2P network based on the proposed semantics has been implemented. The system has been developed using Java. The communication among peers is performed by using JXTA libraries [20]. XSB [26] is a logic programming and deductive database system used for com- puting the answer to a given query using the well founded semantics. InterProlog [19] is an open source front-end that provides Java with the ability to interact with XSB engine. References 1. Abiteboul,S., Hull,R., Vianu,V., Foundations of Databases. Addison-Wesley, 1994. 2. Bertossi, L. and Bravo, L. Query Answering in Peer-to-Peer Data Exchange Sys- tems. EDBT Workshops, 2004. 3. M. Calautti, S. Greco, I. Trubitsyna: Detecting Decidable Classes of Finitely Ground Logic Programs with Function Symbols. ACM Trans. Comput. Log. 18(4): 28:1-28:42 (2017) 4. M. Calautti, S. Greco, F. Spezzano, I. Trubitsyna: Checking termination of bottom- up evaluation of logic programs with function symbols. TPLP 15(6): 854-889 (2015) 5. Calı̀, A., Calvanese, D., De Giacomo, G., and M. Lenzerini, On the decidability and complexity of query answering over inconsistent and incomplete databases. PODS, 260-271, 2003. 6. Calvanese, D., De Giacomo, G., and M. Lenzerini, and Rosati, R., Logical founda- tions of peer-to-peer data integration. PODS, 241-251, 2004. 7. Caroprese, L., Zumpano, E., A Logic Framework for P2P Deductive Databases, TPLP 2019. 8. Caroprese, L., Zumpano, E., Modeling Cooperation in P2P Data Management Systems. ISMIS 2008: 225-235. 9. Caroprese, L., Zumpano, E., P2P Deductive Databases: Well Founded Semantics and Distributed Computation. ADBIS 2017: 91-99 10. Caroprese, L., Zumpano, E., A Declarative Semantics for P2P Systems. CD-MAKE 2017: 315-329 11. Caroprese, L., Zumpano, E., Computing a Deterministic Semantics for P2P De- ductive Databases. IDEAS 2017: 184-191 12. Caroprese, L., Zumpano, E., P2P deductive databases: a system prototype. iiWAS 2017: 258-265 13. Caroprese, L., Zumpano, E., Integration of Unsound Data in P2P Systems. ADBIS 2018: 278-290 14. Caroprese, L., Zumpano, E., Generalized Maximal Consistent Answers in P2P Deductive Databases. DEXA (2) 2016: 368-376. 15. Caroprese, L., Zumpano, E., A Deterministic Model for P2P Deductive Databases. IDEAS 2016: 193-198. 16. Franconi, E., Kuper, G. M., Lopatenko, A., Zaihrayeu,I., A Robust Logical and Computational Characterisation of Peer toPeer Database Systems. DBISP2P, 64- 76, 2003. 17. Gelfond, M., Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. Fifth Conf. on Logic Programming, 1070-1080, 1998. 18. Halevy, A., Ives, Z., Suciu, D., and Tatarinov, I., Schema mediation in peer data management systems. int. Conf. on Database Theory, pages 505-516, 2003. 19. InterProlog. http://www.declarativa.com/interprolog/. 20. Project JXTA, http://www.jxta.org/. 21. Lonc, Z., Truszczynski, M., On the Problem of Computing the Well-Founded Se- mantics. Computational Logic 2000: 673-687. 22. J. W. Lloyd, Foundations of Logic Programming. Springer-Verlag, 1987. 23. Madhavan, J., and Halevy, A. Y., Composing mappings among data sources. VLDB, 572-583, 2003. 24. Papadimitriou, C. H., Computational Complexity. Addison-Wesley, 1994. 25. Tatarinov, I., Halevy., A., Efficient Query reformulation in Peer Data Management Systems, SIGMOD, pages 539-550, 2004. 26. XSB Project. http://xsb.sourceforge.net.