=Paper=
{{Paper
|id=Vol-2970/plppaper4
|storemode=property
|title=probKanren: A Simple Probabilistic Extension for microKanren
|pdfUrl=https://ceur-ws.org/Vol-2970/plppaper4.pdf
|volume=Vol-2970
|authors=Robert Zinkov,Willliam E. Byrd
|dblpUrl=https://dblp.org/rec/conf/iclp/ZinkovB21
}}
==probKanren: A Simple Probabilistic Extension for microKanren==
<pdf width="1500px">https://ceur-ws.org/Vol-2970/plppaper4.pdf</pdf>
<pre>
probKanren: A Simple Probabilistic extension for
microKanren
Robert Zinkov1 , William E. Byrd2
1
 Dept. of Engineering Science, University of Oxford, 25 Banbury Rd, Oxford, UK
2
 Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, 705 20th Street S., Birmingham, AL
35233, United States of America


                                         Abstract
                                         Probabilistic programming can be conceptually seen as generalisation of logic programming where
                                         instead of just returning a set of answers to a given query, we also return a probability distribution over
                                         those answers. But many contemporary probabilistic logic programming languages implementations are
                                         not simple extensions of existing logic programming languages but instead involve their own unique
                                         implementations. Here we introduce probKanren, a simple extension to microKanren that transforms it
                                         into a probabilistic programming language without needing to make any modifications to the underlying
                                         logic language’s search. We use several illustrative examples from the probabilistic programming and
                                         program synthesis literature to demonstrate the practicality of the approach.

                                         Keywords
                                         Probabilistic Logic Programming, miniKanren, Probabilistic Programming, Sequential Monte Carlo


1. Introduction
Conceptually, logic programming provides a way to model non-determinism. This is accom-
plished by maintaining a set of answers that satisfy a set of logical constraints. A natural
generalisation to this domain is adding a notion of uncertainty to this set of answers by associ-
ating with them a probability distribution.
   But the conceptual simplicity of this sort of generalisation is not reflected in the complexity
of many existing probabilistic logic programming systems. They often involve implementing
sophisticated inference algorithms and the underlying systems are not just implemented on top
of existing logic programming systems.
   We believe that a conceptually simple extension deserves a conceptually simple implemen-
tation to go along with it. We thus contribute a simple way to extend microKanren, a small
logic programming domain-specific language such that it becomes a probabilistic programming
language.

1.1. Illustrated Example
To help explain how to use probKanren we introduce the following example:

PLP 2021
Envelope-Open zinkov@robots.ox.ac.uk (R. Zinkov); webyrd@uab.edu (W. E. Byrd)
GLOBE https://zinkov.com/ (R. Zinkov); http://webyrd.net/ (W. E. Byrd)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073       CEUR Workshop Proceedings (CEUR-WS.org)
  (run 1000 (q)
   (conj
    (normal 0 3 q)
    (normal q 2 4)))

  In the above program, we have a probabilistic model for the Gaussian unknown mean problem.
We observe a value 4 from the 𝒩 (𝑞, 2) distribution, and that 𝑞 has a prior distribution of 𝒩 (0, 3).
This probKanren program draws 1000 samples from a conditional normal distribution that
represents the posterior probability distribution associated with 𝑞.


2. Related Work
There is a rich history of extending logic programming formalisms to support probabilistic
inference. Early systems like PRISM[1] and ProbLog[2] allowed associating discrete distributions
with facts. Early versions of ProbLog were also built on top of Prolog, matching one of the goals
of our work. Later work[3] introduces distribution clauses so that some continuous distributions
like Gaussians can be represented. Other work extended these methods further while focusing
on efficient exact inference algorithms like weighted model integration[4, 5]. Our work is most
similar to [6], except that while they combine their forward reasoning with an importance
sampler we use a particle cascade instead which can be more sample efficient.


3. Background
3.1. microKanren
microKanren[7, 8] is a pure logic programming language embedded in Scheme. The language
consists of a set of terms, a set of goal primitives, and two run forms that turn goal queries into
a set of answers. The goal primitives consist of a f r e s h form for introducing logic variables,
a unification primitive = = , a conjunction combinator c o n j , a disjunction combinator d i s j , and
ways to define and apply relations. Further forms are shown in detail in Figure 1.
    Goal expressions represent a possibly backtracking search procedure. These goals all take as
input some state and output a stream of possible output states. We run these goals by starting
with an initial state that holds an empty substitution dictionary, and passing it into the top-level
goal expression which then passes it recursively down to sub-expressions. Encountering f r e s h
extends the lexical environment with a binding between a lexical variable and a logic variable;
c o n j will apply the first goal to the state passed in, and then apply the second goal to the states
associated with each resulting stream from the first goal and finally unify the results; d i s j will
apply both goals to the same input state and concatenate the resulting streams; and = = unifies
its arguments in the context of the current state, discarding any streams which fail to unify.
    To handle goal expressions that might diverge, the streams are expanded in an interleaving[9]
fashion, giving each branch a chance to produce answers. This interleaving search is sound and
complete[10].
⟨prog⟩ ::= (run ⟨number⟩ (⟨id⟩) ⟨goal-expr⟩)

⟨goal-expr⟩ ::= (disj ⟨goal-expr⟩ ⟨goal-expr⟩)
  | (conj ⟨goal-expr⟩ ⟨goal-expr⟩)
  | (fresh (⟨id⟩) ⟨goal-expr⟩)
  | (== ⟨term-expr⟩ ⟨term-expr⟩)
  | (letrec-rel ((⟨id⟩ (⟨id⟩ ...) ⟨goal-expr⟩) ...)
     ⟨goal-expr⟩)
  | (call-rel ⟨lexical-var-ref ⟩ ⟨term-expr⟩ ...)
  | (prim-rel-call ⟨lexical-var-ref ⟩ ⟨term-expr⟩ ...)
  | (delay ⟨goal-expr⟩)

⟨term-expr⟩ ::= (quote ⟨datum⟩)
   | ⟨lexical-var-ref ⟩
   | (cons ⟨term-expr⟩ ⟨term-expr⟩)
   | ⟨term⟩

⟨term⟩ ::= ⟨number⟩
   | #f | #t
   | ⟨symbol⟩
   | (⟨term⟩ . ⟨term⟩)
   | ⟨logic-var⟩

Figure 1: Grammar for microKanren


3.2. Grammar and Definitions
To create probKanren, we extend this grammar with distribution clauses such as n o r m a l and
b e r n . Distribution clauses take as arguments the parameters of the distribution and a last
argument representing a draw from that distribution. For example, ( n o r m a l 0 1 x ) means x
represents a draw from the 𝒩 (0, 1) distribution.
    These are just another type of goal expression. We do not need to add a notion of probabilistic
variables to the language grammar, as they can treated as logic variables constrained in a
particular way. The semantics of the language though does change from an answer set to a
probability distribution on that answer set.
    We follow the semantics of [11], this is a denotational semantics where each language form
is associated with a measurable function, and an operational semantics of a sampler.

3.3. Probabilistic Programming
Probabilistic Programming Languages[12, 13] are a family of domain-specific languages for
posing and efficiently solving probabilistic modelling problems. At their core, all have a way to
s a m p l e and o b s e r v e data under a probability distribution or Markov kernel.
    There are many inference algorithms for probabilistic programming languages, but methods
based on likelihood-weighting and Sequential Monte Carlo algorithms are the simplest to
implement.
3.4. Sequential Monte Carlo
Sequential Monte Carlo[14](SMC) is an efficient online way to sample from probabilistic models,
which is especially suited for state-space domains. If we imagine our probabilistic programs as
straight-line programs with no control-flow we can imagine numbering every sample function
𝑓1 , 𝑓2 , … , 𝑓𝑛 and every observe function 𝑔1 , 𝑔2 , … , 𝑔𝑛 then our probability density over our latent
variables 𝑥0∶𝑇 and observed data 𝑦0∶𝑇 can be defined as:
                                                      𝑇                 𝑇
                         𝑝(𝑥0∶𝑇 , 𝑦0∶𝑇 ) = 𝑝(𝑥0 ) ∏ 𝑓𝑡 (𝑥𝑡 ∣ 𝑥𝑡−1 ) ∏ 𝑔𝑡 (𝑦𝑡 ∣ 𝑥𝑡 )                  (1)
                                                  𝑡=1                  𝑡=0

3.5. Sequential Importance Sampling
The simplest way to sample from our target distribution 𝑝(𝑥0∶𝑇 ∣ 𝑦0∶𝑇 ) is to sample from a
sequence of intermediary distributions 𝑝(𝑥0∶𝑡 ∣ 𝑦0∶𝑡 ) where 𝑡 goes from 1 to 𝑇. We do this
                                                (1)       (𝑖)   (𝑁 )
by sampling a population of 𝑁 particles {𝑥𝑡 , … , 𝑥𝑡 , … 𝑥𝑡 } from a proposal distribution
                                                                               (1)      (𝑖)    (𝑁 )
𝑞(𝑥𝑡 ∣ 𝑥0∶𝑡−1 , 𝑦0∶𝑡 ), which when combined with a set of importance weights {𝑤𝑡 , … , 𝑤𝑡 , … 𝑤𝑡 }
lets us approximate each intermediary distribution.
   Thanks to the recurrence relation:

                                                               𝑓𝑡 (𝑥𝑡 ∣ 𝑥𝑡−1 )𝑔𝑡 (𝑦𝑡 ∣ 𝑥𝑡 )
                       𝑝(𝑥0∶𝑡 ∣ 𝑦0∶𝑡 ) = 𝑝(𝑥0∶𝑡−1 ∣ 𝑦0∶𝑡−1 )                                         (2)
                                                                    𝑝(𝑦𝑡 ∣ 𝑦0∶𝑡−1 )
  we know in 𝑇 rounds we compute the desired distribution as follows:
  Initialise with:

                                                (𝑖)
                                               𝑥0 ∼ 𝑝(𝑥0 )                                           (3)
                                               (𝑖)
                                              𝑤0 = 1/𝑁                                               (4)

  Then for 𝑡 from 1 to 𝑇

                                   (𝑖)           (𝑖)
                                  𝑥𝑡 ∼ 𝑞(𝑥𝑡 ∣ 𝑥0∶𝑡−1 )                                               (5)
                                               (𝑖)   (𝑖)         (𝑖)
                                  (𝑖)  (𝑖) 𝑓 (𝑥𝑡 ∣ 𝑥𝑡−1 )𝑔(𝑦𝑡 ∣ 𝑥𝑡 )
                                 𝑤𝑡 ∝ 𝑤𝑡−1         (𝑖)   (𝑖)
                                                                                                     (6)
                                                𝑞(𝑥𝑡 ∣ 𝑥0∶𝑡−1 )

  We get the best results if 𝑞(𝑥𝑡 ∣ 𝑥0∶𝑡−1 ) is equal to 𝑝(𝑥𝑡 ∣ 𝑥0∶𝑡−1 , 𝑦0∶𝑡 ).

3.6. Sequential Importance Resampling
Unfortunately, over time SIS is likely to lead to most of the importance weight in a small fraction
of the particles, with the rest of the particles having negligible weight. This is usually called
weight degeneracy. We address this sample efficiency issue in the standard way by replicating the
particles with high weight and dropping the ones with low weight. This is called a resampling
step, and it happens after each round. Resampling effectively removes particles with low weight
and duplicates particles with higher weight by sampling with replacement our existing particles.

                                  (𝑖)       (1∶𝑁 )
                                 𝑎𝑡 ∼ 𝑟(𝑤𝑡−1 )
                                                     (𝑖)
                                  (𝑖)           (𝑎 )
                                                𝑡
                                 𝑥𝑡 ∼ 𝑞(𝑥𝑡 ∣ 𝑥0∶𝑡−1 )
                                                  (𝑖)   (𝑖)         (𝑖)
                                  (𝑖)     (𝑖) 𝑓 (𝑥𝑡 ∣ 𝑥𝑡−1 )𝑔(𝑦𝑡 ∣ 𝑥𝑡 )
                                𝑤𝑡 ∝ 𝑤𝑡−1             (𝑖)   (𝑖)
                                                   𝑞(𝑥𝑡 ∣ 𝑥0∶𝑡−1 )

   For each particle 𝑖, resampling tells us which of the existing particles 𝑎𝑖 will be its ancestor. We
set 𝑟(𝑤) = Categorical(𝑤), which is called multinomial resampling but there are other methods
as well[15].

3.7. Particle Cascade
Unfortunately, SMC as defined here requires having access to all the particles at every step in
the process. This conflicts with the functional nature of our implementation. Instead we make
use of Particle Cascades [16], removing this barrier allowing every particle to be resampled
asynchronously with the associated weights being relative to a global running average.


4. Proposed Method
We propose to extend microKanren by turning the search into an SMC sampler. We accomplish
this by augmenting each of the search streams with a set of particles. These particles represent
the empirical distribution of that stream. Each particle has associated with it a substitution of
all the logic and random variables as well as a weight that is proportional to the likelihood of
the substitution.
   We follow [3] and place the following restrictions on our distribution clauses and the random
variables they specify.
   Firstly, the arguments of distribution clauses must be grounded. Secondly, a random variable
cannot unify with any arithmetic expression.
   An initial set of particles is created from the probabilistic program when it is first run. When
encountering primitives such as n o r m a l we first check if all the parameter terms are grounded
and then if the last argument is fresh, we sample a value for it for each particle and then add
this sampled value along with the associated logic variable to the substitution associated with
that particle. If all the terms are ground, we treat the primitive like an observation statement
and multiply the weights of each substitution with the likelihood of the observation.
   As conjunctions (c o n j ) are encountered, all particles in all output streams from the first goal
become part of the initial states that are each passed to the second goal.
   As disjunctions (d i s j ) are encountered, we split evenly the number of particles allocated to
each stream. Whenever we encounter a unification primitive, we run a resampling step. This
helps to prune particles with low weight and replicate particles with high weight.
   As an optimisation, we may create more particles during resampling based on a globally
stored counter of the effective sample size of all particles across all streams.
   This extension does not modify the search and streams are managed exactly as in microKan-
ren. An additional advantage, due to the microKanren search being complete, we are guaranteed
to recover the true posterior as all paths of the search space will eventually be explored given
enough particles.


5. Experiments
We validate that probKanren is at least as expressive as other probabilistic logic programming
languages by implementing the Friends who Smoke model.
   Friends who Smoke is a probabilistic logic program which models the social nature of who
smokes cigarettes. The model predicts that people who are friends with people who smoke
are more likely to smoke. We replicate the example on https://dtai.cs.kuleuven.be/problog/
tutorial/basic/05_smokers.html using 2000 particles and get an empirical distribution that seems
to match up with the discrete distribution returned from ProbLog.


6. Conclusions
We made a simple-to-implement extension to microKanren that lets us support probabilistic
inference on both discrete and continuous distributions. The approach does not require modify-
ing the underlying search algorithm or touching any of the backtracking code and comes with a
theoretical guarantee that if the underlying search is complete then the probabilistic extension
will recover the true posterior given enough particles.


References
 [1] T. Sato, Y. Kameya, Prism: a language for symbolic-statistical modeling, in: IJCAI,
     volume 97, Citeseer, 1997, pp. 1330–1339.
 [2] L. De Raedt, A. Kimmig, H. Toivonen, Problog: A probabilistic prolog and its application
     in link discovery., in: IJCAI, volume 7, Hyderabad, 2007, pp. 2462–2467.
 [3] B. Gutmann, M. Jaeger, L. De Raedt, Extending problog with continuous distributions, in:
     International Conference on Inductive Logic Programming, Springer, 2010, pp. 76–91.
 [4] M. A. Islam, C. Ramakrishnan, I. Ramakrishnan, Inference in probabilistic logic programs
     with continuous random variables, Theory and Practice of Logic Programming 12 (2012)
     505–523.
 [5] V. Belle, A. Passerini, G. Van den Broeck, Probabilistic inference in hybrid domains by
     weighted model integration, in: Twenty-Fourth International Joint Conference on Artificial
     Intelligence, 2015.
 [6] B. Gutmann, I. Thon, A. Kimmig, M. Bruynooghe, L. De Raedt, The magic of logical
     inference in probabilistic programming, Theory and Practice of Logic Programming 11
     (2011) 663–680.
 [7] J. Hemann, D. P. Friedman, W. E. Byrd, M. Might, A small embedding of logic programming
     with a simple complete search, in: Proceedings of the 12th Symposium on Dynamic
     Languages, DLS 2016, Association for Computing Machinery, 2016, p. 96–107. URL: https:
     //doi.org/10.1145/2989225.2989230. doi:1 0 . 1 1 4 5 / 2 9 8 9 2 2 5 . 2 9 8 9 2 3 0 .
 [8] D. P. Friedman, W. E. Byrd, O. Kiselyov, J. Hemann, The Reasoned Schemer, 2nd ed., MIT
     Press, 2018.
 [9] O. Kiselyov, C.-c. Shan, D. P. Friedman, A. Sabry, Backtracking, interleaving, and terminat-
     ing monad transformers: (functional pearl), ACM SIGPLAN Notices 40 (2005) 192–203.
[10] D. Rozplokhas, A. Vyatkin, D. Boulytchev, Certified semantics for minikanren, in: Pro-
     ceedings of the 2019 miniKanren and Relational Programming Workshop, 2019, pp. 80–98.
[11] S. Staton, H. Yang, F. Wood, C. Heunen, O. Kammar, Semantics for probabilistic pro-
     gramming: Higher-order functions, continuous distributions, and soft constraints, in:
     Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science,
     Association for Computing Machinery, New York, NY, USA, 2016, p. 525–534. URL:
     https://doi.org/10.1145/2933575.2935313. doi:1 0 . 1 1 4 5 / 2 9 3 3 5 7 5 . 2 9 3 5 3 1 3 .
[12] F. Wood, J. W. Meent, V. Mansinghka, A new approach to probabilistic programming
     inference, in: Artificial Intelligence and Statistics, PMLR, 2014, pp. 1024–1032.
[13] J. van de Meent, B. Paige, H. Yang, F. Wood, An introduction to probabilistic programming,
     CoRR abs/1809.10756 (2018).
[14] N. Chopin, O. Papaspiliopoulos, et al., An Introduction to Sequential Monte Carlo, volume 4,
     Springer, 2020.
[15] R. Douc, O. Cappé, Comparison of resampling schemes for particle filtering, in: ISPA
     2005. Proceedings of the 4th International Symposium on Image and Signal Processing
     and Analysis, 2005., IEEE, 2005, pp. 64–69.
[16] B. Paige, F. D. Wood, A. Doucet, Y. W. Teh, Asynchronous anytime sequential monte carlo,
     in: Advances in Neural Information Processing Systems, 2014, pp. 3410–3418.

</pre>