<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>probKanren: A Simple Probabilistic Extension for microKanren</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Zinkov</string-name>
          <email>zinkov@robots.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William E. Byrd</string-name>
          <email>webyrd@uab.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Engineering Science, University of Oxford</institution>
          ,
          <addr-line>25 Banbury Rd, Oxford</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham</institution>
          ,
          <addr-line>705 20th Street S., Birmingham, AL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <issue>1</issue>
      <abstract>
        <p>Probabilistic programming can be conceptually seen as generalisation of logic programming where instead of just returning a set of answers to a given query, we also return a probability distribution over those answers. But many contemporary probabilistic logic programming languages implementations are not simple extensions of existing logic programming languages but instead involve their own unique implementations. Here we introduce probKanren, a simple extension to microKanren that transforms it into a probabilistic programming language without needing to make any modifications to the underlying logic language's search. We use several illustrative examples from the probabilistic programming and program synthesis literature to demonstrate the practicality of the approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Probabilistic Logic Programming</kwd>
        <kwd>miniKanren</kwd>
        <kwd>Probabilistic Programming</kwd>
        <kwd>Sequential Monte Carlo</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Conceptually, logic programming provides a way to model non-determinism. This is
accomplished by maintaining a set of answers that satisfy a set of logical constraints. A natural
generalisation to this domain is adding a notion of uncertainty to this set of answers by
associating with them a probability distribution.</p>
      <p>But the conceptual simplicity of this sort of generalisation is not reflected in the complexity
of many existing probabilistic logic programming systems. They often involve implementing
sophisticated inference algorithms, and the resulting systems are rarely simple layers on top
of existing logic programming systems.</p>
      <p>We believe that a conceptually simple extension deserves a conceptually simple
implementation to go along with it. We thus contribute a simple way to extend microKanren, a small
logic programming domain-specific language such that it becomes a probabilistic programming
language.</p>
      <sec id="sec-1-1">
        <title>1.1. Illustrated Example</title>
        <p>To help explain how to use probKanren we introduce the following example:</p>
        <p>PLP 2021. https://zinkov.com/ (R. Zinkov); http://webyrd.net/ (W. E. Byrd)</p>
        <p>© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>There is a rich history of extending logic programming formalisms to support probabilistic
inference. Early systems like PRISM [1] and ProbLog [2] allowed associating discrete distributions
with facts. Early versions of ProbLog were also built on top of Prolog, matching one of the goals
of our work. Later work [3] introduces distribution clauses so that some continuous distributions,
such as Gaussians, can be represented. Other work extended these methods further while focusing
on efficient exact inference algorithms such as weighted model integration [4, 5]. Our work is most
similar to [6], except that while they combine their forward reasoning with an importance
sampler, we use a particle cascade instead, which can be more sample efficient.</p>
      <p>3. Background</p>
      <sec id="sec-2-1">
        <title>3.1. microKanren</title>
        <p>microKanren [7, 8] is a pure logic programming language embedded in Scheme. The language
consists of a set of terms, a set of goal primitives, and two run forms that turn goal queries into
a set of answers. The goal primitives consist of a fresh form for introducing logic variables,
a unification primitive ==, a conjunction combinator conj, a disjunction combinator disj, and
ways to define and apply relations. Further forms are shown in detail in Figure 1.</p>
        <p>Goal expressions represent a possibly backtracking search procedure. These goals all take as
input some state and output a stream of possible output states. We run these goals by starting
with an initial state that holds an empty substitution, and passing it into the top-level
goal expression, which then passes it recursively down to sub-expressions. Encountering fresh
extends the lexical environment with a binding between a lexical variable and a logic variable;
conj applies the first goal to the state passed in, and then applies the second goal to each
state in the stream resulting from the first goal; disj applies both goals to the same input
state and concatenates the resulting streams; and == unifies its arguments in the context of
the current state, discarding any states in which unification fails.</p>
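        <p>As an illustration, the goal constructors described above can be sketched in a few lines of Python (the paper's actual implementation is in Scheme; this sketch uses finite lists in place of microKanren's interleaving streams and omits fresh and delay):</p>
        <preformat>
```python
# A minimal Python sketch of microKanren-style goals, for illustration only.
# A goal maps a substitution (the state) to a list of substitutions (a
# finite stream of answers).

class Var:
    """A logic variable, identified by object identity."""
    pass

def walk(t, s):
    # Follow bindings for t in substitution s until we reach a
    # non-variable or an unbound variable.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(u, v, s):
    # Return an extended substitution, or None on failure.
    u, v = walk(u, s), walk(v, s)
    if u is v or (u == v and not isinstance(u, Var)):
        return s
    if isinstance(u, Var):
        return {**s, u: v}
    if isinstance(v, Var):
        return {**s, v: u}
    if isinstance(u, tuple) and isinstance(v, tuple) and len(u) == len(v):
        for a, b in zip(u, v):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def eq(u, v):          # == : unification as a goal
    return lambda s: [r] if (r := unify(u, v, s)) is not None else []

def disj(g1, g2):      # disjunction: concatenate both answer streams
    return lambda s: g1(s) + g2(s)

def conj(g1, g2):      # conjunction: thread g1's answers through g2
    return lambda s: [s2 for s1 in g1(s) for s2 in g2(s1)]
```
        </preformat>
        <p>For example, conj(disj(eq(x, 1), eq(x, 2)), eq(x, 2)) applied to the empty substitution yields exactly one answer, binding x to 2; the x ↦ 1 branch is discarded because its unification with 2 fails.</p>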
        <p>To handle goal expressions that might diverge, the streams are expanded in an interleaving [9]
fashion, giving each branch a chance to produce answers. This interleaving search is sound and
complete [10].</p>
        <preformat>
⟨prog⟩      ::= (run ⟨number⟩ (⟨id⟩) ⟨goal-expr⟩)
⟨goal-expr⟩ ::= (disj ⟨goal-expr⟩ ⟨goal-expr⟩)
              | (conj ⟨goal-expr⟩ ⟨goal-expr⟩)
              | (fresh (⟨id⟩) ⟨goal-expr⟩)
              | (== ⟨term-expr⟩ ⟨term-expr⟩)
              | (letrec-rel ((⟨id⟩ (⟨id⟩ ...) ⟨goal-expr⟩) ...) ⟨goal-expr⟩)
              | (call-rel ⟨lexical-var-ref⟩ ⟨term-expr⟩ ...)
              | (prim-rel-call ⟨lexical-var-ref⟩ ⟨term-expr⟩ ...)
              | (delay ⟨goal-expr⟩)
⟨term-expr⟩ ::= (quote ⟨datum⟩)
              | ⟨lexical-var-ref⟩
              | (cons ⟨term-expr⟩ ⟨term-expr⟩)
              | ⟨term⟩
⟨term⟩      ::= ⟨number⟩
              | #f | #t
              | ⟨symbol⟩
              | (⟨term⟩ . ⟨term⟩)
              | ⟨logic-var⟩
        </preformat>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Grammar and Definitions</title>
        <p>To create probKanren, we extend this grammar with distribution clauses such as normal and
bern. Distribution clauses take as arguments the parameters of the distribution, with a last
argument representing a draw from that distribution. For example, (normal 0 1 x) means x
represents a draw from the 𝒩(0, 1) distribution.</p>
        <p>These are just another type of goal expression. We do not need to add a notion of probabilistic
variables to the language grammar, as they can be treated as logic variables constrained in a
particular way. The semantics of the language does change, though, from an answer set to a
probability distribution over that answer set.</p>
        <p>We follow the semantics of [11]: a denotational semantics where each language form
is associated with a measurable function, together with an operational semantics given by a sampler.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Probabilistic Programming</title>
        <p>Probabilistic programming languages [12, 13] are a family of domain-specific languages for
posing and efficiently solving probabilistic modelling problems. At their core, all have a way to
sample and observe data under a probability distribution or Markov kernel.</p>
        <p>There are many inference algorithms for probabilistic programming languages, but methods
based on likelihood-weighting and Sequential Monte Carlo algorithms are the simplest to
implement.</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.4. Sequential Monte Carlo</title>
        <p>Sequential Monte Carlo (SMC) [14] is an efficient online way to sample from probabilistic models,
and is especially suited to state-space domains. If we imagine our probabilistic programs as
straight-line programs with no control flow, we can number every sample statement x_0, x_1, …, x_N
and every observe statement y_0, y_1, …, y_N. The probability density over our latent
variables x_{0:N} and observed data y_{0:N} can then be defined as:</p>
        <disp-formula>\[ p(x_{0:N}, y_{0:N}) = p(x_0) \prod_{n=1}^{N} p(x_n \mid x_{n-1}) \prod_{n=0}^{N} p(y_n \mid x_n) \]</disp-formula>
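        <p>As a concrete illustration, this density can be evaluated for a toy linear-Gaussian state-space model (the model and its parameter values here are illustrative, not taken from the paper):</p>
        <preformat>
```python
import math

def log_normal_pdf(x, mean, std):
    # log density of N(mean, std^2) evaluated at x
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean)**2 / (2 * std**2)

def log_joint(xs, ys, trans_std=1.0, obs_std=0.5):
    """log p(x_{0:N}, y_{0:N}) for a toy Gaussian state-space model:
    x_0 ~ N(0, 1), x_n ~ N(x_{n-1}, trans_std^2), y_n ~ N(x_n, obs_std^2)."""
    lp = log_normal_pdf(xs[0], 0.0, 1.0)                  # p(x_0)
    for n in range(1, len(xs)):                           # prod_{n=1}^N p(x_n | x_{n-1})
        lp += log_normal_pdf(xs[n], xs[n - 1], trans_std)
    for n in range(len(ys)):                              # prod_{n=0}^N p(y_n | x_n)
        lp += log_normal_pdf(ys[n], xs[n], obs_std)
    return lp
```
        </preformat>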
      </sec>
      <sec id="sec-2-5">
        <title>3.5. Sequential Importance Sampling</title>
        <p>The simplest way to sample from our target distribution p(x_{0:N} ∣ y_{0:N}) is to sample from a
sequence of intermediary distributions p(x_{0:n} ∣ y_{0:n}) where n goes from 1 to N. We do this
by sampling a population of K particles {x^{(1)}, …, x^{(k)}, …, x^{(K)}} from a proposal distribution
q(x_n ∣ x_{0:n−1}, y_{0:n}), which, when combined with a set of importance weights
{w^{(1)}, …, w^{(k)}, …, w^{(K)}}, lets us approximate each intermediary distribution.</p>
        <p>Thanks to the recurrence relation</p>
        <disp-formula>\[ p(x_{0:n} \mid y_{0:n}) \propto p(x_{0:n-1} \mid y_{0:n-1})\, p(x_n \mid x_{n-1})\, p(y_n \mid x_n) \]</disp-formula>
        <p>we know that in N rounds we can compute the desired distribution as follows. Initialise with</p>
        <disp-formula>\[ x_0^{(k)} \sim p(x_0), \qquad w_0^{(k)} = 1/K. \]</disp-formula>
        <p>Then, for n from 1 to N,</p>
        <disp-formula>\[ x_n^{(k)} \sim q(x_n \mid x_{0:n-1}^{(k)}, y_{0:n}), \qquad w_n^{(k)} = w_{n-1}^{(k)}\, \frac{p(x_n^{(k)} \mid x_{n-1}^{(k)})\, p(y_n \mid x_n^{(k)})}{q(x_n^{(k)} \mid x_{0:n-1}^{(k)})}. \]</disp-formula>
        <p>We get the best results if q(x_n ∣ x_{0:n−1}) is equal to p(x_n ∣ x_{0:n−1}, y_{0:n}).</p>
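        <p>This scheme can be sketched as follows for a toy Gaussian state-space model, using the bootstrap proposal q(x_n ∣ x_{0:n−1}) = p(x_n ∣ x_{n−1}), so that the weight update reduces to the observation likelihood (an illustrative sketch under those assumptions, not the paper's implementation):</p>
        <preformat>
```python
import math
import random

def normal_pdf(x, mean, std):
    # density of N(mean, std^2) at x
    return math.exp(-(x - mean)**2 / (2 * std**2)) / math.sqrt(2 * math.pi * std**2)

def sis(ys, n_particles=1000, trans_std=1.0, obs_std=0.5, seed=0):
    """Sequential importance sampling (no resampling) for the toy model
    x_0 ~ N(0, 1), x_n ~ N(x_{n-1}, trans_std^2), y_n ~ N(x_n, obs_std^2)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # x_0^(k) ~ p(x_0)
    ws = [1.0 / n_particles] * n_particles                  # w_0^(k) = 1/K
    for t, y in enumerate(ys):
        if t > 0:
            # bootstrap proposal: x_n^(k) ~ p(x_n | x_{n-1}^(k))
            xs = [rng.gauss(x, trans_std) for x in xs]
        # weight update reduces to the likelihood p(y_n | x_n^(k))
        ws = [w * normal_pdf(y, x, obs_std) for w, x in zip(ws, xs)]
        z = sum(ws)
        ws = [w / z for w in ws]                            # normalise
    return xs, ws

def posterior_mean(xs, ws):
    # weighted posterior mean of the final latent state
    return sum(w * x for w, x in zip(ws, xs))
```
        </preformat>
        <p>Running this on a few repeated observations pulls the weighted particle population toward the observed values, approximating the filtering distribution.</p>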
      </sec>
      <sec id="sec-2-6">
        <title>3.6. Sequential Importance Resampling</title>
        <p>Unfortunately, over time SIS is likely to lead to most of the importance weight concentrating in a small fraction
of the particles, with the rest of the particles having negligible weight. This is usually called
weight degeneracy. We address this sample efficiency issue in the standard way, by replicating the
particles with high weight and dropping the ones with low weight. This is called a resampling
step, and it happens after each round. Resampling effectively removes particles with low weight
and duplicates particles with higher weight by sampling with replacement from our existing particles:</p>
        <disp-formula>\[ a^{(k)} \sim \mathrm{Categorical}\!\left(w_{n-1}^{(1:K)}\right) \]</disp-formula>
        <p>For each particle k, resampling tells us which of the existing particles, a^{(k)}, will be its ancestor. Drawing
a^{(k)} from a categorical distribution over the weights is called multinomial resampling, but there are other methods
as well [15].</p>
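        <p>Multinomial resampling is short enough to sketch directly (an illustrative Python sketch; the function name is ours):</p>
        <preformat>
```python
import random

def multinomial_resample(particles, weights, rng=random):
    """Multinomial resampling: draw K ancestor indices a^(k) ~ Categorical(w),
    replicate the chosen particles, and reset every weight to 1/K."""
    n = len(particles)
    # sampling with replacement, proportional to the weights
    ancestors = rng.choices(range(n), weights=weights, k=n)
    new_particles = [particles[a] for a in ancestors]
    new_weights = [1.0 / n] * n
    return new_particles, new_weights
```
        </preformat>
        <p>Particles with high weight are likely to be chosen as ancestors many times, while particles with negligible weight are likely to be dropped.</p>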
      </sec>
      <sec id="sec-2-7">
        <title>3.7. Particle Cascade</title>
        <p>Unfortunately, SMC as defined here requires having access to all the particles at every step in
the process. This conflicts with the functional nature of our implementation. Instead we make
use of particle cascades [16], which remove this barrier by allowing every particle to be resampled
asynchronously, with the associated weights taken relative to a global running average.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Proposed Method</title>
      <p>We propose to extend microKanren by turning the search into an SMC sampler. We accomplish
this by augmenting each of the search streams with a set of particles. These particles represent
the empirical distribution of that stream. Each particle has associated with it a substitution of
all the logic and random variables, as well as a weight that is proportional to the likelihood of
the substitution.</p>
      <p>We follow [3] and place the following restrictions on our distribution clauses and the random
variables they specify. Firstly, the arguments of distribution clauses must be ground. Secondly, a random variable
cannot unify with any arithmetic expression.</p>
      <p>An initial set of particles is created from the probabilistic program when it is first run. When
encountering primitives such as normal, we first check that all the parameter terms are ground.
Then, if the last argument is fresh, we sample a value for it for each particle and add
this sampled value, along with the associated logic variable, to the substitution of
that particle. If all the terms are ground, we treat the primitive as an observation statement
and multiply the weight of each particle by the likelihood of the observation.</p>
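      <p>The sample-or-observe rule for a distribution clause such as normal can be sketched as follows (a Python sketch with hypothetical names; the actual implementation operates on microKanren's Scheme substitutions and streams):</p>
      <preformat>
```python
import math
import random

class Var:
    """A logic variable, identified by object identity."""
    pass

def walk(t, s):
    # Resolve t against substitution s.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def normal_pdf(x, mean, std):
    return math.exp(-(x - mean)**2 / (2 * std**2)) / math.sqrt(2 * math.pi * std**2)

def normal_goal(mean_t, std_t, out_t):
    """(normal mean std x): if x is fresh in a particle's substitution,
    sample a value for it; if x is ground, treat the clause as an
    observation and scale the particle's weight by the likelihood."""
    def goal(particle):
        subst, weight = particle
        mean, std = walk(mean_t, subst), walk(std_t, subst)
        # the paper requires distribution parameters to be ground
        assert not isinstance(mean, Var) and not isinstance(std, Var)
        out = walk(out_t, subst)
        if isinstance(out, Var):                 # fresh: sample a value
            value = random.gauss(mean, std)
            return ({**subst, out: value}, weight)
        # ground: observe, multiplying in the likelihood
        return (subst, weight * normal_pdf(out, mean, std))
    return goal
```
      </preformat>
      <p>Applying normal_goal(0.0, 1.0, x) with x fresh extends the substitution with a sampled value; applying it with the last argument already bound to 0.0 leaves the substitution alone and multiplies the weight by 𝒩(0; 0, 1).</p>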
      <p>As conjunctions (conj) are encountered, all particles in all output streams from the first goal
become part of the initial states that are each passed to the second goal.</p>
      <p>As disjunctions (disj) are encountered, we split the number of particles allocated to
each stream evenly. Whenever we encounter a unification primitive, we run a resampling step. This
helps to prune particles with low weight and replicate particles with high weight.</p>
      <p>As an optimisation, we may create more particles during resampling, based on a globally
stored counter of the effective sample size of all particles across all streams.</p>
      <p>This extension does not modify the search, and streams are managed exactly as in
microKanren. An additional advantage is that, because the microKanren search is complete, we are guaranteed
to recover the true posterior, as all paths of the search space will eventually be explored given
enough particles.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Experiments</title>
      <p>We validate that probKanren is at least as expressive as other probabilistic logic programming
languages by implementing the Friends who Smoke model.</p>
      <p>Friends who Smoke is a probabilistic logic program which models the social nature of who
smokes cigarettes. The model predicts that people who are friends with people who smoke
are more likely to smoke. We replicate the example on https://dtai.cs.kuleuven.be/problog/
tutorial/basic/05_smokers.html using 2000 particles and get an empirical distribution that seems
to match up with the discrete distribution returned from ProbLog.</p>
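      <p>To convey the flavour of the model, the following toy Python forward-sampler approximates a two-person version of it by simple Monte Carlo (the friendship structure and parameter values here are illustrative, not taken from the ProbLog tutorial or our probKanren encoding):</p>
      <preformat>
```python
import random

def smokers_model(rng):
    """One forward sample from a toy 'friends who smoke' model: a person
    smokes if stressed, or if a smoking friend influences them.
    Parameters (0.3, 0.2) are illustrative."""
    stress = {p: rng.random() < 0.3 for p in ("ann", "bob")}
    influences = rng.random() < 0.2          # does bob influence ann?
    smokes_bob = stress["bob"]
    smokes_ann = stress["ann"] or (smokes_bob and influences)
    return smokes_ann

def estimate(n=20000, seed=0):
    # empirical probability that ann smokes
    rng = random.Random(seed)
    return sum(smokers_model(rng) for _ in range(n)) / n
```
      </preformat>
      <p>The exact probability for this toy model is 1 − 0.7 · (1 − 0.3 · 0.2) = 0.342, and the empirical estimate converges to it as the number of samples grows, mirroring how probKanren's particle population approximates ProbLog's discrete answer distribution.</p>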
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>We made a simple-to-implement extension to microKanren that lets us support probabilistic
inference over both discrete and continuous distributions. The approach does not require
modifying the underlying search algorithm or touching any of the backtracking code, and it comes with a
theoretical guarantee that, if the underlying search is complete, the probabilistic extension
will recover the true posterior given enough particles.</p>
      <p>References</p>
      <p>[7] J. Hemann, D. P. Friedman, W. E. Byrd, M. Might, A small embedding of logic programming
with a simple complete search, in: Proceedings of the 12th Symposium on Dynamic
Languages, DLS 2016, Association for Computing Machinery, 2016, pp. 96–107.
doi:10.1145/2989225.2989230.</p>
      <p>[8] D. P. Friedman, W. E. Byrd, O. Kiselyov, J. Hemann, The Reasoned Schemer, 2nd ed., MIT
Press, 2018.</p>
      <p>[9] O. Kiselyov, C.-c. Shan, D. P. Friedman, A. Sabry, Backtracking, interleaving, and
terminating monad transformers: (functional pearl), ACM SIGPLAN Notices 40 (2005) 192–203.</p>
      <p>[10] D. Rozplokhas, A. Vyatkin, D. Boulytchev, Certified semantics for miniKanren, in:
Proceedings of the 2019 miniKanren and Relational Programming Workshop, 2019, pp. 80–98.</p>
      <p>[11] S. Staton, H. Yang, F. Wood, C. Heunen, O. Kammar, Semantics for probabilistic
programming: Higher-order functions, continuous distributions, and soft constraints, in:
Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science,
Association for Computing Machinery, New York, NY, USA, 2016, pp. 525–534.
doi:10.1145/2933575.2935313.</p>
      <p>[12] F. Wood, J. W. van de Meent, V. Mansinghka, A new approach to probabilistic programming
inference, in: Artificial Intelligence and Statistics, PMLR, 2014, pp. 1024–1032.</p>
      <p>[13] J. van de Meent, B. Paige, H. Yang, F. Wood, An introduction to probabilistic programming,
CoRR abs/1809.10756 (2018).</p>
      <p>[14] N. Chopin, O. Papaspiliopoulos, et al., An Introduction to Sequential Monte Carlo, volume 4,
Springer, 2020.</p>
      <p>[15] R. Douc, O. Cappé, Comparison of resampling schemes for particle filtering, in: ISPA
2005: Proceedings of the 4th International Symposium on Image and Signal Processing
and Analysis, IEEE, 2005, pp. 64–69.</p>
      <p>[16] B. Paige, F. D. Wood, A. Doucet, Y. W. Teh, Asynchronous anytime sequential Monte Carlo,
in: Advances in Neural Information Processing Systems, 2014, pp. 3410–3418.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kameya</surname>
          </string-name>
          ,
          <article-title>PRISM: a language for symbolic-statistical modeling</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>97</volume>
          ,
          Citeseer
          ,
          <year>1997</year>
          , pp.
          <fpage>1330</fpage>
          -
          <lpage>1339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>De Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          ,
          <article-title>ProbLog: A probabilistic Prolog and its application in link discovery</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>7</volume>
          ,
          Hyderabad
          ,
          <year>2007</year>
          , pp.
          <fpage>2462</fpage>
          -
          <lpage>2467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gutmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          , L. De Raedt,
          <article-title>Extending ProbLog with continuous distributions</article-title>
          ,
          <source>in: International Conference on Inductive Logic Programming</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <article-title>Inference in probabilistic logic programs with continuous random variables</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>12</volume>
          (
          <year>2012</year>
          )
          <fpage>505</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Belle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passerini</surname>
          </string-name>
          , G. Van den Broeck,
          <article-title>Probabilistic inference in hybrid domains by weighted model integration</article-title>
          ,
          <source>in: Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gutmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Thon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bruynooghe</surname>
          </string-name>
          , L. De Raedt,
          <article-title>The magic of logical inference in probabilistic programming</article-title>
          ,
          <source>Theory and Practice of Logic Programming</source>
          <volume>11</volume>
          (
          <year>2011</year>
          )
          <fpage>663</fpage>
          -
          <lpage>680</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>