<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>probKanren: A Simple Probabilistic Extension for microKanren</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Zinkov</string-name>
          <email>zinkov@robots.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William E. Byrd</string-name>
          <email>webyrd@uab.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Engineering Science, University of Oxford</institution>
          ,
          <addr-line>25 Banbury Rd, Oxford</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham</institution>
          ,
          <addr-line>705 20th Street S., Birmingham, AL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <issue>1</issue>
      <abstract>
        <p>Probabilistic programming can be conceptually seen as generalisation of logic programming where instead of just returning a set of answers to a given query, we also return a probability distribution over those answers. But many contemporary probabilistic logic programming languages implementations are not simple extensions of existing logic programming languages but instead involve their own unique implementations. Here we introduce probKanren, a simple extension to microKanren that transforms it into a probabilistic programming language without needing to make any modifications to the underlying logic language's search. We use several illustrative examples from the probabilistic programming and program synthesis literature to demonstrate the practicality of the approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Probabilistic Logic Programming</kwd>
        <kwd>miniKanren</kwd>
        <kwd>Probabilistic Programming</kwd>
        <kwd>Sequential Monte Carlo</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Conceptually, logic programming provides a way to model non-determinism. This is
accomplished by maintaining a set of answers that satisfy a set of logical constraints. A natural
generalisation to this domain is adding a notion of uncertainty to this set of answers by
associating with them a probability distribution.</p>
      <p>But the conceptual simplicity of this sort of generalisation is not reflected in the complexity
of many existing probabilistic logic programming systems. They often involve implementing
sophisticated inference algorithms, and the resulting systems are rarely simple layers on top
of existing logic programming systems.</p>
      <p>We believe that a conceptually simple extension deserves a conceptually simple
implementation to go along with it. We thus contribute a simple way to extend microKanren, a small
logic programming domain-specific language such that it becomes a probabilistic programming
language.</p>
      <sec id="sec-1-1">
        <title>1.1. Illustrated Example</title>
        <p>To help explain how to use probKanren we introduce the following example:</p>
        <p>PLP 2021. https://zinkov.com/ (R. Zinkov); http://webyrd.net/ (W. E. Byrd)</p>
        <p>© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>There is a rich history of extending logic programming formalisms to support probabilistic
inference. Early systems like PRISM [1] and ProbLog [2] allowed associating discrete distributions
with facts. Early versions of ProbLog were also built on top of Prolog, matching one of the goals
of our work. Later work [3] introduces distribution clauses so that some continuous distributions,
such as Gaussians, can be represented. Other work extended these methods further while focusing
on efficient exact inference algorithms such as weighted model integration [4, 5]. Our work is most
similar to [6], except that while they combine their forward reasoning with an importance
sampler, we use a particle cascade instead, which can be more sample efficient.</p>
      <p>3. Background</p>
      <sec id="sec-2-1">
        <title>3.1. microKanren</title>
        <p>microKanren [7, 8] is a pure logic programming language embedded in Scheme. The language
consists of a set of terms, a set of goal primitives, and two run forms that turn goal queries into
a set of answers. The goal primitives consist of a fresh form for introducing logic variables,
a unification primitive ==, a conjunction combinator conj, a disjunction combinator disj, and
ways to define and apply relations. Further forms are shown in detail in Figure 1.</p>
        <p>Goal expressions represent a possibly backtracking search procedure. These goals all take as
input some state and output a stream of possible output states. We run these goals by starting
with an initial state that holds an empty substitution, and passing it into the top-level
goal expression, which then passes it recursively down to sub-expressions. Encountering fresh
extends the lexical environment with a binding between a lexical variable and a logic variable;
conj applies the first goal to the state passed in, and then applies the second goal to each
state in the stream resulting from the first goal; disj applies both goals to the same input
state and concatenates the resulting streams; and == unifies its arguments in the context of
the current state, discarding any states in which unification fails.</p>
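        <p>As an illustration, the goal constructors described above can be sketched in a few lines of Python (the paper's actual implementation is in Scheme; this sketch uses finite lists in place of microKanren's interleaving streams and omits fresh and delay):</p>
        <preformat>
```python
# A minimal Python sketch of microKanren-style goals, for illustration only.
# A goal maps a substitution (the state) to a list of substitutions (a
# finite stream of answers).

class Var:
    """A logic variable, identified by object identity."""
    pass

def walk(t, s):
    # Follow bindings for t in substitution s until we reach a
    # non-variable or an unbound variable.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(u, v, s):
    # Return an extended substitution, or None on failure.
    u, v = walk(u, s), walk(v, s)
    if u is v or (u == v and not isinstance(u, Var)):
        return s
    if isinstance(u, Var):
        return {**s, u: v}
    if isinstance(v, Var):
        return {**s, v: u}
    if isinstance(u, tuple) and isinstance(v, tuple) and len(u) == len(v):
        for a, b in zip(u, v):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def eq(u, v):          # == : unification as a goal
    return lambda s: [r] if (r := unify(u, v, s)) is not None else []

def disj(g1, g2):      # disjunction: concatenate both answer streams
    return lambda s: g1(s) + g2(s)

def conj(g1, g2):      # conjunction: thread g1's answers through g2
    return lambda s: [s2 for s1 in g1(s) for s2 in g2(s1)]
```
        </preformat>
        <p>For example, conj(disj(eq(x, 1), eq(x, 2)), eq(x, 2)) applied to the empty substitution yields exactly one answer, binding x to 2; the x ↦ 1 branch is discarded because its unification with 2 fails.</p>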
        <p>To handle goal expressions that might diverge, the streams are expanded in an interleaving [9]
fashion, giving each branch a chance to produce answers. This interleaving search is sound and
complete [10].</p>
        <preformat>
⟨prog⟩      ::= (run ⟨number⟩ (⟨id⟩) ⟨goal-expr⟩)
⟨goal-expr⟩ ::= (disj ⟨goal-expr⟩ ⟨goal-expr⟩)
              | (conj ⟨goal-expr⟩ ⟨goal-expr⟩)
              | (fresh (⟨id⟩) ⟨goal-expr⟩)
              | (== ⟨term-expr⟩ ⟨term-expr⟩)
              | (letrec-rel ((⟨id⟩ (⟨id⟩ ...) ⟨goal-expr⟩) ...) ⟨goal-expr⟩)
              | (call-rel ⟨lexical-var-ref⟩ ⟨term-expr⟩ ...)
              | (prim-rel-call ⟨lexical-var-ref⟩ ⟨term-expr⟩ ...)
              | (delay ⟨goal-expr⟩)
⟨term-expr⟩ ::= (quote ⟨datum⟩)
              | ⟨lexical-var-ref⟩
              | (cons ⟨term-expr⟩ ⟨term-expr⟩)
              | ⟨term⟩
⟨term⟩      ::= ⟨number⟩
              | #f | #t
              | ⟨symbol⟩
              | (⟨term⟩ . ⟨term⟩)
              | ⟨logic-var⟩
        </preformat>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Grammar and Definitions</title>
        <p>To create probKanren, we extend this grammar with distribution clauses such as normal and
bern. Distribution clauses take as arguments the parameters of the distribution, with a last
argument representing a draw from that distribution. For example, (normal 0 1 x) means x
represents a draw from the 𝒩(0, 1) distribution.</p>
        <p>These are just another type of goal expression. We do not need to add a notion of probabilistic
variables to the language grammar, as they can be treated as logic variables constrained in a
particular way. The semantics of the language does change, though, from an answer set to a
probability distribution over that answer set.</p>
        <p>We follow the semantics of [11]: a denotational semantics where each language form
is associated with a measurable function, together with an operational semantics given by a sampler.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Probabilistic Programming</title>
        <p>Probabilistic programming languages [12, 13] are a family of domain-specific languages for
posing and efficiently solving probabilistic modelling problems. At their core, all have a way to
sample and observe data under a probability distribution or Markov kernel.</p>
        <p>There are many inference algorithms for probabilistic programming languages, but methods
based on likelihood-weighting and Sequential Monte Carlo algorithms are the simplest to
implement.</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.4. Sequential Monte Carlo</title>
        <p>Sequential Monte Carlo (SMC) [14] is an efficient online way to sample from probabilistic models,
and is especially suited to state-space domains. If we imagine our probabilistic programs as
straight-line programs with no control flow, we can number every sample statement x_0, x_1, …, x_N
and every observe statement y_0, y_1, …, y_N. The probability density over our latent
variables x_{0:N} and observed data y_{0:N} can then be defined as:</p>
        <disp-formula>\[ p(x_{0:N}, y_{0:N}) = p(x_0) \prod_{n=1}^{N} p(x_n \mid x_{n-1}) \prod_{n=0}^{N} p(y_n \mid x_n) \]</disp-formula>
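        <p>As a concrete illustration, this density can be evaluated for a toy linear-Gaussian state-space model (the model and its parameter values here are illustrative, not taken from the paper):</p>
        <preformat>
```python
import math

def log_normal_pdf(x, mean, std):
    # log density of N(mean, std^2) evaluated at x
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean)**2 / (2 * std**2)

def log_joint(xs, ys, trans_std=1.0, obs_std=0.5):
    """log p(x_{0:N}, y_{0:N}) for a toy Gaussian state-space model:
    x_0 ~ N(0, 1), x_n ~ N(x_{n-1}, trans_std^2), y_n ~ N(x_n, obs_std^2)."""
    lp = log_normal_pdf(xs[0], 0.0, 1.0)                  # p(x_0)
    for n in range(1, len(xs)):                           # prod_{n=1}^N p(x_n | x_{n-1})
        lp += log_normal_pdf(xs[n], xs[n - 1], trans_std)
    for n in range(len(ys)):                              # prod_{n=0}^N p(y_n | x_n)
        lp += log_normal_pdf(ys[n], xs[n], obs_std)
    return lp
```
        </preformat>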
      </sec>
      <sec id="sec-2-5">
        <title>3.5. Sequential Importance Sampling</title>
        <p>The simplest way to sample from our target distribution p(x_{0:N} ∣ y_{0:N}) is to sample from a
sequence of intermediary distributions p(x_{0:n} ∣ y_{0:n}) where n goes from 1 to N. We do this
by sampling a population of K particles {x^{(1)}, …, x^{(k)}, …, x^{(K)}} from a proposal distribution
q(x_n ∣ x_{0:n−1}, y_{0:n}), which, when combined with a set of importance weights
{w^{(1)}, …, w^{(k)}, …, w^{(K)}}, lets us approximate each intermediary distribution.</p>
        <p>Thanks to the recurrence relation</p>
        <disp-formula>\[ p(x_{0:n} \mid y_{0:n}) \propto p(x_{0:n-1} \mid y_{0:n-1})\, p(x_n \mid x_{n-1})\, p(y_n \mid x_n) \]</disp-formula>
        <p>we know that in N rounds we can compute the desired distribution as follows. Initialise with</p>
        <disp-formula>\[ x_0^{(k)} \sim p(x_0), \qquad w_0^{(k)} = 1/K. \]</disp-formula>
        <p>Then, for n from 1 to N,</p>
        <disp-formula>\[ x_n^{(k)} \sim q(x_n \mid x_{0:n-1}^{(k)}, y_{0:n}), \qquad w_n^{(k)} = w_{n-1}^{(k)}\, \frac{p(x_n^{(k)} \mid x_{n-1}^{(k)})\, p(y_n \mid x_n^{(k)})}{q(x_n^{(k)} \mid x_{0:n-1}^{(k)})}. \]</disp-formula>
        <p>We get the best results if q(x_n ∣ x_{0:n−1}) is equal to p(x_n ∣ x_{0:n−1}, y_{0:n}).</p>
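        <p>This scheme can be sketched as follows for a toy Gaussian state-space model, using the bootstrap proposal q(x_n ∣ x_{0:n−1}) = p(x_n ∣ x_{n−1}), so that the weight update reduces to the observation likelihood (an illustrative sketch under those assumptions, not the paper's implementation):</p>
        <preformat>
```python
import math
import random

def normal_pdf(x, mean, std):
    # density of N(mean, std^2) at x
    return math.exp(-(x - mean)**2 / (2 * std**2)) / math.sqrt(2 * math.pi * std**2)

def sis(ys, n_particles=1000, trans_std=1.0, obs_std=0.5, seed=0):
    """Sequential importance sampling (no resampling) for the toy model
    x_0 ~ N(0, 1), x_n ~ N(x_{n-1}, trans_std^2), y_n ~ N(x_n, obs_std^2)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # x_0^(k) ~ p(x_0)
    ws = [1.0 / n_particles] * n_particles                  # w_0^(k) = 1/K
    for t, y in enumerate(ys):
        if t > 0:
            # bootstrap proposal: x_n^(k) ~ p(x_n | x_{n-1}^(k))
            xs = [rng.gauss(x, trans_std) for x in xs]
        # weight update reduces to the likelihood p(y_n | x_n^(k))
        ws = [w * normal_pdf(y, x, obs_std) for w, x in zip(ws, xs)]
        z = sum(ws)
        ws = [w / z for w in ws]                            # normalise
    return xs, ws

def posterior_mean(xs, ws):
    # weighted posterior mean of the final latent state
    return sum(w * x for w, x in zip(ws, xs))
```
        </preformat>
        <p>Running this on a few repeated observations pulls the weighted particle population toward the observed values, approximating the filtering distribution.</p>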
      </sec>
      <sec id="sec-2-6">
        <title>3.6. Sequential Importance Resampling</title>
        <p>Unfortunately, over time SIS is likely to lead to most of the importance weight concentrating in a small fraction
of the particles, with the rest of the particles having negligible weight. This is usually called
weight degeneracy. We address this sample efficiency issue in the standard way, by replicating the
particles with high weight and dropping the ones with low weight. This is called a resampling
step, and it happens after each round. Resampling effectively removes particles with low weight
and duplicates particles with higher weight by sampling with replacement from our existing particles:</p>
        <disp-formula>\[ a^{(k)} \sim \mathrm{Categorical}\!\left(w_{n-1}^{(1:K)}\right) \]</disp-formula>
        <p>For each particle k, resampling tells us which of the existing particles, a^{(k)}, will be its ancestor. Drawing
a^{(k)} from a categorical distribution over the weights is called multinomial resampling, but there are other methods
as well [15].</p>
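        <p>Multinomial resampling is short enough to sketch directly (an illustrative Python sketch; the function name is ours):</p>
        <preformat>
```python
import random

def multinomial_resample(particles, weights, rng=random):
    """Multinomial resampling: draw K ancestor indices a^(k) ~ Categorical(w),
    replicate the chosen particles, and reset every weight to 1/K."""
    n = len(particles)
    # sampling with replacement, proportional to the weights
    ancestors = rng.choices(range(n), weights=weights, k=n)
    new_particles = [particles[a] for a in ancestors]
    new_weights = [1.0 / n] * n
    return new_particles, new_weights
```
        </preformat>
        <p>Particles with high weight are likely to be chosen as ancestors many times, while particles with negligible weight are likely to be dropped.</p>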
      </sec>
      <sec id="sec-2-7">
        <title>3.7. Particle Cascade</title>
        <p>Unfortunately, SMC as defined here requires having access to all the particles at every step in
the process. This conflicts with the functional nature of our implementation. Instead we make
use of particle cascades [16], which remove this barrier by allowing every particle to be resampled
asynchronously, with the associated weights taken relative to a global running average.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Proposed Method</title>
      <p>We propose to extend microKanren by turning the search into an SMC sampler. We accomplish
this by augmenting each of the search streams with a set of particles. These particles represent
the empirical distribution of that stream. Each particle has associated with it a substitution of
all the logic and random variables, as well as a weight that is proportional to the likelihood of
the substitution.</p>
      <p>We follow [3] and place the following restrictions on our distribution clauses and the random
variables they specify. Firstly, the arguments of distribution clauses must be ground. Secondly, a random variable
cannot unify with any arithmetic expression.</p>
      <p>An initial set of particles is created from the probabilistic program when it is first run. When
encountering primitives such as normal, we first check that all the parameter terms are ground.
Then, if the last argument is fresh, we sample a value for it for each particle and add
this sampled value, along with the associated logic variable, to the substitution of
that particle. If all the terms are ground, we treat the primitive as an observation statement
and multiply the weight of each particle by the likelihood of the observation.</p>
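      <p>The sample-or-observe rule for a distribution clause such as normal can be sketched as follows (a Python sketch with hypothetical names; the actual implementation operates on microKanren's Scheme substitutions and streams):</p>
      <preformat>
```python
import math
import random

class Var:
    """A logic variable, identified by object identity."""
    pass

def walk(t, s):
    # Resolve t against substitution s.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def normal_pdf(x, mean, std):
    return math.exp(-(x - mean)**2 / (2 * std**2)) / math.sqrt(2 * math.pi * std**2)

def normal_goal(mean_t, std_t, out_t):
    """(normal mean std x): if x is fresh in a particle's substitution,
    sample a value for it; if x is ground, treat the clause as an
    observation and scale the particle's weight by the likelihood."""
    def goal(particle):
        subst, weight = particle
        mean, std = walk(mean_t, subst), walk(std_t, subst)
        # the paper requires distribution parameters to be ground
        assert not isinstance(mean, Var) and not isinstance(std, Var)
        out = walk(out_t, subst)
        if isinstance(out, Var):                 # fresh: sample a value
            value = random.gauss(mean, std)
            return ({**subst, out: value}, weight)
        # ground: observe, multiplying in the likelihood
        return (subst, weight * normal_pdf(out, mean, std))
    return goal
```
      </preformat>
      <p>Applying normal_goal(0.0, 1.0, x) with x fresh extends the substitution with a sampled value; applying it with the last argument already bound to 0.0 leaves the substitution alone and multiplies the weight by 𝒩(0; 0, 1).</p>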
      <p>As conjunctions (conj) are encountered, all particles in all output streams from the first goal
become part of the initial states that are each passed to the second goal.</p>
      <p>As disjunctions (disj) are encountered, we split the number of particles allocated to
each stream evenly. Whenever we encounter a unification primitive, we run a resampling step. This
helps to prune particles with low weight and replicate particles with high weight.</p>
      <p>As an optimisation, we may create more particles during resampling, based on a globally
stored counter of the effective sample size of all particles across all streams.</p>
      <p>This extension does not modify the search, and streams are managed exactly as in
microKanren. An additional advantage is that, because the microKanren search is complete, we are guaranteed
to recover the true posterior, as all paths of the search space will eventually be explored given
enough particles.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Experiments</title>
      <p>We validate that probKanren is at least as expressive as other probabilistic logic programming
languages by implementing the Friends who Smoke model.</p>
      <p>Friends who Smoke is a probabilistic logic program which models the social nature of who
smokes cigarettes. The model predicts that people who are friends with people who smoke
are more likely to smoke. We replicate the example on https://dtai.cs.kuleuven.be/problog/
tutorial/basic/05_smokers.html using 2000 particles and get an empirical distribution that seems
to match up with the discrete distribution returned from ProbLog.</p>
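      <p>To convey the flavour of the model, the following toy Python forward-sampler approximates a two-person version of it by simple Monte Carlo (the friendship structure and parameter values here are illustrative, not taken from the ProbLog tutorial or our probKanren encoding):</p>
      <preformat>
```python
import random

def smokers_model(rng):
    """One forward sample from a toy 'friends who smoke' model: a person
    smokes if stressed, or if a smoking friend influences them.
    Parameters (0.3, 0.2) are illustrative."""
    stress = {p: rng.random() < 0.3 for p in ("ann", "bob")}
    influences = rng.random() < 0.2          # does bob influence ann?
    smokes_bob = stress["bob"]
    smokes_ann = stress["ann"] or (smokes_bob and influences)
    return smokes_ann

def estimate(n=20000, seed=0):
    # empirical probability that ann smokes
    rng = random.Random(seed)
    return sum(smokers_model(rng) for _ in range(n)) / n
```
      </preformat>
      <p>The exact probability for this toy model is 1 − 0.7 · (1 − 0.3 · 0.2) = 0.342, and the empirical estimate converges to it as the number of samples grows, mirroring how probKanren's particle population approximates ProbLog's discrete answer distribution.</p>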
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>We made a simple-to-implement extension to microKanren that lets us support probabilistic
inference over both discrete and continuous distributions. The approach does not require
modifying the underlying search algorithm or touching any of the backtracking code, and it comes with a
theoretical guarantee that, if the underlying search is complete, the probabilistic extension
will recover the true posterior given enough particles.</p>
      <p>References</p>
      <p>[7] J. Hemann, D. P. Friedman, W. E. Byrd, M. Might, A small embedding of logic programming
with a simple complete search, in: Proceedings of the 12th Symposium on Dynamic
Languages, DLS 2016, Association for Computing Machinery, 2016, pp. 96–107.
doi:10.1145/2989225.2989230.</p>
      <p>[8] D. P. Friedman, W. E. Byrd, O. Kiselyov, J. Hemann, The Reasoned Schemer, 2nd ed., MIT
Press, 2018.</p>
      <p>[9] O. Kiselyov, C.-c. Shan, D. P. Friedman, A. Sabry, Backtracking, interleaving, and
terminating monad transformers: (functional pearl), ACM SIGPLAN Notices 40 (2005) 192–203.</p>
      <p>[10] D. Rozplokhas, A. Vyatkin, D. Boulytchev, Certified semantics for miniKanren, in:
Proceedings of the 2019 miniKanren and Relational Programming Workshop, 2019, pp. 80–98.</p>
      <p>[11] S. Staton, H. Yang, F. Wood, C. Heunen, O. Kammar, Semantics for probabilistic
programming: Higher-order functions, continuous distributions, and soft constraints, in:
Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science,
Association for Computing Machinery, New York, NY, USA, 2016, pp. 525–534.
doi:10.1145/2933575.2935313.</p>
      <p>[12] F. Wood, J. W. van de Meent, V. Mansinghka, A new approach to probabilistic programming
inference, in: Artificial Intelligence and Statistics, PMLR, 2014, pp. 1024–1032.</p>
      <p>[13] J. van de Meent, B. Paige, H. Yang, F. Wood, An introduction to probabilistic programming,
CoRR abs/1809.10756 (2018).</p>
      <p>[14] N. Chopin, O. Papaspiliopoulos, et al., An Introduction to Sequential Monte Carlo, volume 4,
Springer, 2020.</p>
      <p>[15] R. Douc, O. Cappé, Comparison of resampling schemes for particle filtering, in: ISPA
2005: Proceedings of the 4th International Symposium on Image and Signal Processing
and Analysis, IEEE, 2005, pp. 64–69.</p>
      <p>[16] B. Paige, F. D. Wood, A. Doucet, Y. W. Teh, Asynchronous anytime sequential Monte Carlo,
in: Advances in Neural Information Processing Systems, 2014, pp. 3410–3418.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kameya</surname>
          </string-name>
          ,
          <article-title>PRISM: a language for symbolic-statistical modeling</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>97</volume>
          ,
          Citeseer
          ,
          <year>1997</year>
          , pp.
          <fpage>1330</fpage>
          -
          <lpage>1339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>De Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          ,
          <article-title>ProbLog: A probabilistic Prolog and its application in link discovery</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>7</volume>
          ,
          Hyderabad
          ,
          <year>2007</year>
          , pp.
          <fpage>2462</fpage>
          -
          <lpage>2467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gutmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          , L. De Raedt,
          <article-title>Extending ProbLog with continuous distributions</article-title>
          ,
          <source>in: International Conference on Inductive Logic Programming</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <article-title>Inference in probabilistic logic programs with continuous random variables</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>12</volume>
          (
          <year>2012</year>
          )
          <fpage>505</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Belle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passerini</surname>
          </string-name>
          , G. Van den Broeck,
          <article-title>Probabilistic inference in hybrid domains by weighted model integration</article-title>
          ,
          <source>in: Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gutmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Thon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bruynooghe</surname>
          </string-name>
          , L. De Raedt,
          <article-title>The magic of logical inference in probabilistic programming</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>11</volume>
          (
          <year>2011</year>
          )
          <fpage>663</fpage>
          -
          <lpage>680</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>