=Paper=
{{Paper
|id=Vol-1378/paper18
|storemode=property
|title=PPDL: Probabilistic Programming with Datalog
|pdfUrl=https://ceur-ws.org/Vol-1378/AMW_2015_paper_18.pdf
|volume=Vol-1378
|dblpUrl=https://dblp.org/rec/conf/amw/CateKO15
}}
==PPDL: Probabilistic Programming with Datalog==
Balder ten Cate<sup>1</sup>, Benny Kimelfeld<sup>2,1</sup> (Taub Fellow, supported by the Taub Foundation), and Dan Olteanu<sup>3,1</sup>

<sup>1</sup> LogicBlox, Inc., USA; <sup>2</sup> Technion, Israel; <sup>3</sup> University of Oxford, UK

===1 Introduction===

There has been a substantial recent focus on the concept of probabilistic programming [6] and on its positioning as a prominent paradigm for advancing and facilitating the development of machine-learning applications. (An effort in this direction is led by DARPA's Probabilistic Programming for Advancing Machine Learning, PPAML, program.) A probabilistic-programming language typically consists of two components: a specification of a stochastic process (the prior) and a specification of observations that restrict the probability space to a conditional subspace (the posterior).

This paper gives a brief overview of Probabilistic Programming Datalog (PPDL), a recently proposed declarative framework for specifying statistical models on top of a database through an appropriate extension of Datalog [1]. By virtue of extending Datalog, PPDL offers a natural integration with the database and has a robust declarative semantics: the meaning of a program is independent of the algorithmic evaluation of rules and is invariant under logical program transformations. PPDL provides convenient mechanisms that admit common numerical probability functions as first-class citizens in the language; in particular, conclusions of rules may contain values drawn from such functions.

===2 PPDL===

The semantics of a PPDL program is a probability distribution over the possible outcomes of the input database with respect to the program. These outcomes are minimal solutions with respect to a related program that involves existentially quantified variables in conclusions. Observations are incorporated by means of logical integrity constraints.
As argued in [1], the ability to express probabilistic models concisely and declaratively in a Datalog extension, with probability distributions as first-class citizens, is what sets PPDL apart from the wealth of literature on probabilistic Datalog [3], probabilistic databases [8, 11], and Markov Logic Networks [4, 7, 10]. In the remainder of this section we introduce PPDL using an example program and show how to interpret it probabilistically.

Our example is inspired by Pearl's burglar example, which has frequently been used to illustrate probabilistic programming. It models a statistical process of alarms going off due to burglaries and earthquakes, and the goal is to estimate the likelihood that a given collection of alarms indicates these alarming events. Consider a database consisting of the following relations: House(h, c) represents houses h and their location cities c, Business(b, c) represents businesses b and their location cities c, City(c, r) represents cities c and their associated burglary rates r, and ObservedAlarm(x) represents units (houses or businesses) x where the alarm went off. These are the EDB relations, which are not changed by the program. Figure 1 shows an instance over this schema.

* House(id, city): (NP1, Napa), (NP2, Napa), (YC1, Yucaipa)
* Business(id, city): (NP3, Napa), (YC2, Yucaipa)
* City(name, burglaryrate): (Napa, 0.03), (Yucaipa, 0.01)
* ObservedAlarm(unit): NP1, YC1

Fig. 1. Database instance in the running example.

Now consider the PPDL program P in Figure 2, where some rules use the Flip distribution in their heads. The first rule states, intuitively, that for every fact of the form City(c, r), there must be a fact Earthquake(c, y) where y is drawn from the Flip (Bernoulli) distribution with the parameter 0.01.
The fourth rule states that a burglary happens in a unit (house or business) with probability r, where r is the rate of burglaries in the city of the unit. (Note that Burglary(x, c, 1) and Burglary(x, c, 0) represent the facts that a burglary did, respectively, did not happen at unit x in city c; likewise for Earthquake and Trig.) Finally, c1 is a constraint stating that Alarm and ObservedAlarm contain the same tuples.

We now address the semantics of a program. What does it mean for a rule head like Earthquake(c, Flip[0.01]) to be satisfied? What if this fact is derived by multiple, or even equivalent, rules? Do we need to sample more than once? The probabilistic semantics of the above PPDL program is established via an extension of Datalog, named Datalog<sup>∃</sup>, where rule heads can have existential quantifiers. Datalog<sup>∃</sup> rules (a.k.a. existential rules, which are syntactically isomorphic to tuple-generating dependencies) have been used extensively in many areas, including data exchange [5] and ontological reasoning [2, 9].

1. Earthquake(c, Flip[0.01]) ← City(c, r)
2. Unit(h, c) ← House(h, c)
3. Unit(b, c) ← Business(b, c)
4. Burglary(x, c, Flip[r]) ← Unit(x, c), City(c, r)
5. Trig(x, Flip[0.6]) ← Unit(x, c), Earthquake(c, 1)
6. Trig(x, Flip[0.9]) ← Burglary(x, c, 1)
7. Alarm(x) ← Trig(x, 1)
c1. Alarm(x) ↔ ObservedAlarm(x)

Fig. 2. PPDL program P for Pearl's burglar example.

1. ∃y Earthquake<sup>Flip</sup><sub>2</sub>(c, y, 0.01) ← City(c, r)
2. Unit(h, c) ← House(h, c)
3. Unit(b, c) ← Business(b, c)
4. ∃y Burglary<sup>Flip</sup><sub>3</sub>(x, c, y, r) ← Unit(x, c), City(c, r)
5. ∃y Trig<sup>Flip</sup><sub>2</sub>(x, y, 0.6) ← Unit(x, c), Earthquake(c, 1)
6. ∃y Trig<sup>Flip</sup><sub>2</sub>(x, y, 0.9) ← Burglary(x, c, 1)
7. Alarm(x) ← Trig(x, 1)
8. Earthquake(c, d) ← Earthquake<sup>Flip</sup><sub>2</sub>(c, d, p)
9. Burglary(x, c, b) ← Burglary<sup>Flip</sup><sub>3</sub>(x, c, b, p)
10. Trig(x, y) ← Trig<sup>Flip</sup><sub>2</sub>(x, y, p)

Fig. 3. The Datalog<sup>∃</sup> program P̂ for the PPDL program P in Figure 2.
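As an illustration only, the stochastic process specified by the rules of P can be sketched in Python over the Figure 1 instance. The encoding of relations as sets and dictionaries, and the function names, are assumptions of this sketch (not part of PPDL), and the handling of multiple Trig derivations is simplified to "some draw succeeded":

```python
import random

# EDB relations of Figure 1 (hypothetical Python encoding)
house = {("NP1", "Napa"), ("NP2", "Napa"), ("YC1", "Yucaipa")}
business = {("NP3", "Napa"), ("YC2", "Yucaipa")}
city = {"Napa": 0.03, "Yucaipa": 0.01}

def flip(p):
    """Flip[p]: a Bernoulli draw that is 1 with probability p."""
    return 1 if random.random() < p else 0

def sample_outcome():
    """Sample one possible outcome of the rules of P (constraint c1 ignored)."""
    unit = house | business                          # rules 2-3
    earthquake = {c: flip(0.01) for c in city}       # rule 1
    burglary = {x: flip(city[c]) for x, c in unit}   # rule 4
    trig = {x: 0 for x, _ in unit}
    for x, c in unit:
        if earthquake[c] == 1:                       # rule 5
            trig[x] = trig[x] or flip(0.6)
        if burglary[x] == 1:                         # rule 6
            trig[x] = trig[x] or flip(0.9)
    alarm = {x for x, t in trig.items() if t == 1}   # rule 7
    return burglary, alarm

burglary, alarm = sample_outcome()
```

Repeatedly calling sample_outcome draws possible worlds from the prior; the constraint c1 is handled separately, by conditioning.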
Our PPDL program P gives rise to the Datalog<sup>∃</sup> program P̂ in Figure 3. Note that P̂ does not take the constraints into account. To illustrate the translation, note how rule 6 in P becomes rule 6 in P̂. A special, distributional relation symbol Trig<sup>Flip</sup><sub>2</sub> is created for Trig that captures the intention of the rule: whenever the premise holds (there is a burglary at a unit x), there exists a fact Trig<sup>Flip</sup><sub>2</sub>(x, y, 0.9) where y is drawn from a Bernoulli distribution with parameter 0.9. Rule 10 is implicitly added to update Trig with the content of Trig<sup>Flip</sup><sub>2</sub>, where the additional parameter (i.e., the above 0.9) is projected out.

In the absence of constraints, a possible outcome of an input database instance I (a "possible world") is a minimal super-instance of I that satisfies the rules of the program. One possible outcome of the input instance in Figure 1 with respect to the rules of P is the database instance formed by the input instance together with the relations in Figure 4. Each tuple of a distributional relation has a weight w(f), which is the probability of the random choice made for that fact; for presentation's sake, the sampled values are under the attribute name draw. Ignoring the constraints (c1 in the example), the probability of this outcome is the product of all of the numbers in the columns titled w(f), that is, 0.01 × 0.99 × 0.03 × · · · × 0.4. Note that this multiplication is an instance of the chain rule Pr(A<sub>1</sub> ∧ · · · ∧ A<sub>n</sub>) = Pr(A<sub>1</sub>) × Pr(A<sub>2</sub> | A<sub>1</sub>) × · · · and does not reflect an assumption of independence among the involved draws. One needs to show that this formula gives a proper probability space (i.e., that the probabilities of all possible worlds sum up to 1); we do so via an adaptation of the chase procedure [1].
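Concretely, the probability of the Figure 4 outcome is the product of the w(f) entries of its distributional facts. A minimal sketch, with the weight list read off Figure 4:

```python
import math

# w(f) values of the distributional facts in the outcome of Figure 4
weights = [
    0.01, 0.99,              # Earthquake draws (Napa, Yucaipa)
    0.03, 0.97, 0.03, 0.99,  # Burglary draws (NP1, NP2, NP3, YU1)
    0.9, 0.1,                # Trig draws via rule 6 (burglaries at NP1, NP3)
    0.6, 0.6, 0.4,           # Trig draws via rule 5 (earthquake in Napa)
]
outcome_probability = math.prod(weights)  # 0.01 × 0.99 × 0.03 × · · · × 0.4 ≈ 1.11e-07
```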
Constraints, such as c1 in our example, do not trigger the generation of tuples but rather have the semantics of conditional probability: violating possible worlds (where Alarm differs from ObservedAlarm) are eliminated, and the probability is normalized across the remaining worlds. Hence, we unify the concept of observations in Bayesian statistics with that of integrity constraints in databases.

In summary, a PPDL program associates with every given input instance a probability distribution over possible outcomes. One can then, for example, ask for the marginal probability of an event such as Burglary(NP1). Standard techniques from the probabilistic-programming literature, analytical (such as lifted inference) or sampling-based (such as MCMC), can be used to answer such questions.

* Earthquake<sup>Flip</sup><sub>2</sub>(city, draw, param, w(f)): (Napa, 1, 0.01, 0.01), (Yucaipa, 0, 0.01, 0.99)
* Earthquake(city, draw): (Napa, 1), (Yucaipa, 0)
* Burglary<sup>Flip</sup><sub>3</sub>(unit, city, draw, param, w(f)): (NP1, Napa, 1, 0.03, 0.03), (NP2, Napa, 0, 0.03, 0.97), (NP3, Napa, 1, 0.03, 0.03), (YU1, Yucaipa, 0, 0.01, 0.99)
* Burglary(unit, city, draw): (NP1, Napa, 1), (NP2, Napa, 0), (NP3, Napa, 1), (YU1, Yucaipa, 0)
* Unit(id, city): (NP1, Napa), (NP2, Napa), (NP3, Napa), (YU1, Yucaipa)
* Trig<sup>Flip</sup><sub>2</sub>(unit, draw, param, w(f)): (NP1, 1, 0.9, 0.9), (NP3, 0, 0.9, 0.1), (NP1, 1, 0.6, 0.6), (NP2, 1, 0.6, 0.6), (NP3, 0, 0.6, 0.4)
* Trig(unit, draw): (NP1, 1), (NP2, 1), (NP3, 0)
* Alarm(unit): NP1, NP2

Fig. 4. An outcome of the instance in Figure 1 with respect to the PPDL program P.

===3 Discussion===

Currently, PPDL semantics supports only discrete numerical distributions (e.g., Poisson). Even then, the space of possible outcomes may be uncountable, as individual outcomes may be infinite. We have defined a probability measure over possible outcomes by applying the known concept of cylinder sets to a probabilistic chase procedure. We have also shown that the resulting semantics is robust under different chases; moreover, we have identified conditions guaranteeing that all possible outcomes are finite (in which case the probability space is discrete) [1].
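The conditioning semantics of Section 2 (constraints as observations) can be approximated by rejection sampling: draw outcomes from the prior and keep only those in which Alarm equals ObservedAlarm. A self-contained sketch, under the same hypothetical Python encoding of the Figure 1 instance used above (names and encoding are assumptions of the sketch):

```python
import random

# EDB relations of Figure 1 (hypothetical Python encoding)
house = {("NP1", "Napa"), ("NP2", "Napa"), ("YC1", "Yucaipa")}
business = {("NP3", "Napa"), ("YC2", "Yucaipa")}
city = {"Napa": 0.03, "Yucaipa": 0.01}
observed_alarm = {"NP1", "YC1"}  # the ObservedAlarm relation

def flip(p):
    """Flip[p]: a Bernoulli draw that is 1 with probability p."""
    return 1 if random.random() < p else 0

def sample_prior():
    """One possible outcome of the rules of P, ignoring constraint c1."""
    unit = house | business
    quake = {c: flip(0.01) for c in city}
    burglary = {x: flip(city[c]) for x, c in unit}
    trig = {x: (quake[c] and flip(0.6)) or (burglary[x] and flip(0.9))
            for x, c in unit}
    return burglary, {x for x, t in trig.items() if t}

def marginal_burglary(unit_id, n=100_000):
    """Estimate Pr(Burglary at unit_id | Alarm = ObservedAlarm) by rejection."""
    kept = hits = 0
    for _ in range(n):
        burglary, alarm = sample_prior()
        if alarm == observed_alarm:  # constraint c1: keep only consistent worlds
            kept += 1
            hits += burglary[unit_id]
    return hits / kept if kept else float("nan")
```

In this model, conditioning on the observed alarm at NP1 should shift the estimated burglary probability at NP1 above its 0.03 prior rate, illustrating how observations reshape the distribution over possible worlds.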
The framework has a natural extension to continuous distributions (e.g., Gaussian or Pareto), though this requires a nontrivial generalization of our semantics. Additional future directions include an investigation of semantic aspects of expressive power, the tractability of inference, and a practical implementation (e.g., corresponding sampling techniques).

===Acknowledgements===

We are thankful to Molham Aref, Vince Bárány, Todd J. Green, Emir Pasalic, and Zografoula Vagena for insightful discussions and feedback on this work. We are grateful to Kathleen Fisher and Suresh Jagannathan for including us in DARPA's PPAML initiative; this work grew out of our efforts to design translations of probabilistic programs into statistical solvers.

===References===

# V. Bárány, B. ten Cate, B. Kimelfeld, D. Olteanu, and Z. Vagena. Declarative statistical modeling with Datalog. CoRR, abs/1412.2221, 2014.
# A. Calì, G. Gottlob, T. Lukasiewicz, B. Marnette, and A. Pieris. Datalog+/-: A family of logical knowledge representation and query languages for new applications. In LICS, pages 228–242, 2010.
# D. Deutch, C. Koch, and T. Milo. On probabilistic fixpoint and Markov chain query languages. In PODS, pages 215–226, 2010.
# P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on AI and Machine Learning. Morgan & Claypool Publishers, 2009.
# R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: Semantics and query answering. In ICDT, volume 2572 of LNCS, pages 207–224. Springer, 2003.
# N. D. Goodman. The principles and practice of probabilistic programming. In POPL, pages 399–402, 2013.
# G. Gottlob, T. Lukasiewicz, M. Martinez, and G. Simari. Query answering under probabilistic uncertainty in Datalog+/- ontologies. Annals of Mathematics and Artificial Intelligence, 69(1):37–72, 2013.
# B. Kimelfeld and P. Senellart. Probabilistic XML: Models and complexity. In Advances in Probabilistic Databases for Uncertain Information Management, volume 304 of Studies in Fuzziness and Soft Computing, pages 39–66. 2013.
# M. Krötzsch and S. Rudolph. Extending decidable existential rules by joining acyclicity and guardedness. In IJCAI, pages 963–968, 2011.
# F. Niu, C. Ré, A. Doan, and J. W. Shavlik. Tuffy: Scaling up statistical inference in Markov Logic Networks using an RDBMS. PVLDB, 4(6):373–384, 2011.
# D. Suciu, D. Olteanu, C. Ré, and C. Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.