<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Probabilistic Logic Programming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arnaud Nguembang Fadja</string-name>
          <email>arnaud.nguembangfadja@unife.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evelina Lamma</string-name>
          <email>evelina.lamma@unife.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Riguzzi</string-name>
          <email>fabrizio.riguzzi@unife.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria - University of Ferrara Via Saragat</institution>
          <addr-line>1, I-44122, Ferrara</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Matematica e Informatica - University of Ferrara Via Saragat</institution>
          <addr-line>1, I-44122, Ferrara</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>Probabilistic logic programming under the distribution semantics has been very useful in machine learning. However, inference is expensive, so machine learning algorithms may turn out to be slow. In this paper we consider a restriction of the language called hierarchical PLP in which clauses and predicates are hierarchically organized. In this case the language becomes truth-functional and inference reduces to the evaluation of formulas in the product fuzzy logic. Programs in this language can also be seen as arithmetic circuits or deep neural networks, and inference can be performed again quickly when the parameters change. Learning can then be performed by EM or backpropagation.</p>
      </abstract>
      <kwd-group>
        <kwd>Probabilistic Logic Programming</kwd>
        <kwd>Distribution Semantics</kwd>
        <kwd>Deep Neural Networks</kwd>
        <kwd>Arithmetic Circuits</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Probabilistic Logic Programming (PLP) under the distribution semantics [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]
has found successful applications in machine learning [
        <xref ref-type="bibr" rid="ref12 ref17 ref23">23,17,12</xref>
        ], see in particular
the systems SLIPCOVER [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and LEMUR [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        These systems need to perform inference repeatedly in order to evaluate
clauses and theories. However, inference is expensive so learning may take much
time. In this paper we consider a restriction of the language of Logic Programs
with Annotated Disjunctions [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] called hierarchical PLP in which clauses and
predicates are hierarchically organized. In this case the language becomes
truth-functional and equivalent to the product fuzzy logic. Inference then is much
cheaper as a simple dynamic programming algorithm similar to PRISM [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] is
sufficient and knowledge compilation as performed for example by ProbLog2 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
and cplint [
        <xref ref-type="bibr" rid="ref18 ref24 ref25">18,24,25</xref>
        ] is not necessary.
      </p>
      <p>Programs in this language can also be seen as arithmetic circuits or deep
neural networks. From a network we can quickly recompute the probability of
the root node if the clause parameters change, as the structure remains the same.
Learning can then be performed by EM or backpropagation.</p>
      <p>The paper is organized as follows: we first introduce general PLP in Section
2, then we present hierarchical PLP in Section 3. Section 4 discusses inference.
Connections with related work are drawn in Section 5 and Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Probabilistic Logic Programming under the Distribution Semantics</title>
      <p>
        We consider PLP languages under the distribution semantics [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] that have
been shown expressive enough to represent a wide variety of domains [
        <xref ref-type="bibr" rid="ref1 ref2 ref21">2,21,1</xref>
        ].
A program in a language adopting the distribution semantics defines a
probability distribution over normal logic programs called instances or worlds. Each
normal program is assumed to have a total well-founded model [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. Then, the
distribution is extended to queries and the probability of a query is obtained by
marginalizing the joint distribution of the query and the programs.
      </p>
      <p>
        A PLP language under the distribution semantics with a general syntax is
Logic Programs with Annotated Disjunctions (LPADs) [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. We present here the
semantics of LPADs for the case of no function symbols; if function symbols are
allowed see [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        In LPADs, heads of clauses are disjunctions in which each atom is annotated
with a probability. Let us consider an LPAD T with n clauses: T = {C1, . . . , Cn}.
Each clause Ci takes the form: hi1 : Πi1; . . . ; hivi : Πivi : − bi1, . . . , biui, where
hi1, . . . , hivi are logical atoms, bi1, . . . , biui are logical literals and
Πi1, . . . , Πivi are real numbers in the interval [0, 1] that sum to 1. bi1, . . . , biui
is indicated with body(Ci). Note that if vi = 1 the clause corresponds to a
non-disjunctive clause. We also allow clauses where Πi1 + . . . + Πivi &lt; 1: in this
case the head of the annotated disjunctive clause implicitly contains an extra atom
null that does not appear in the body of any clause and whose annotation is
1 − (Πi1 + . . . + Πivi). We denote by ground(T) the grounding of an LPAD T.
      </p>
      <p>
        Each grounding Ciθj of a clause Ci corresponds to a random variable Xij
with values {1, . . . , vi} where vi is the number of head atoms of Ci. The random
variables Xij are independent of each other. An atomic choice [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is a triple
(Ci, θj , k) where Ci ∈ T , θj is a substitution that grounds Ci and k ∈ {1, . . . , vi}
identifies one of the head atoms. In practice (Ci, θj , k) corresponds to an
assignment Xij = k.
      </p>
      <p>A selection σ is a set of atomic choices that, for each clause Ciθj in ground(T ),
contains an atomic choice (Ci, θj , k). Let us indicate with ST the set of all
selections. A selection σ identifies a normal logic program lσ defined as lσ =
{(hik : − body(Ci))θj |(Ci, θj , k) ∈ σ}. lσ is called an instance, possible world or
simply world of T . Since the random variables associated to ground clauses are
independent, we can assign a probability to instances: P (lσ) = ∏(Ci,θj,k)∈σ Πik.</p>
      <p>We consider only sound LPADs where, for each selection σ in ST, the
well-founded model of the program lσ chosen by σ is two-valued. We write lσ |= q
to mean that the query q is true in the well-founded model of the program lσ.
Since the well-founded model of each world is two-valued, q can only be true or
false in lσ.</p>
      <p>We denote the set of all instances by LT . Let P (LT ) be the distribution over
instances. The probability of a query q given an instance l is P (q|l) = 1 if l |= q
and 0 otherwise. The probability of a query q is given by</p>
      <p>P (q) = ∑ l∈LT P (q, l) = ∑ l∈LT P (q|l)P (l) = ∑ l∈LT:l|=q P (l)    (1)</p>
      <p>
        Example 1. Let us consider the UW-CSE domain [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where the objective is to
predict the “advised by” relation between students and professors. In this case
a program for advisedby/2 may be
advisedby(A, B) : 0.3 : −
      </p>
      <p>student(A), professor(B), project(C, A), project(C, B).
advisedby(A, B) : 0.6 : −</p>
      <p>student(A), professor(B), ta(C, A), taughtby(C, B).
where project(C, A) means that C is a project with participant A, ta(C, A)
means that A is a teaching assistant for course C and taughtby(C, B) means
that course C is taught by B. The probability that a student is advised by a
professor depends on the number of joint projects and the number of courses
the professor teaches where the student is a TA: the higher these numbers, the
higher the probability.</p>
      <p>Suppose we want to compute the probability of q = advisedby(harry, ben)
where harry is a student, ben is a professor, they have one joint project and ben
teaches one course where harry is a TA. Then the first clause has one grounding
with head q where the body is true and the second clause has one grounding with
head q where the body is true. The worlds where q is true are those that contain at
least one of these ground clauses, independently of the presence of other groundings,
so P (advisedby(harry, ben)) = 0.3 · 0.6 + 0.3 · 0.4 + 0.7 · 0.6 = 0.72.</p>
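      <p>As a check on this computation, the value 0.72 can be reproduced by explicitly
enumerating the four worlds induced by the two ground clauses. The following Python
sketch is our own illustration, not part of any system described in the paper:</p>
      <p>
```python
from itertools import product

# Parameters of the two ground clauses with head advisedby(harry, ben);
# both bodies are true in the example interpretation.
params = [0.3, 0.6]

# Each world includes (True) or excludes (False) every ground clause,
# independently, with the clause parameter as inclusion probability.
p_query = 0.0
for world in product([True, False], repeat=len(params)):
    p_world = 1.0
    for included, pi in zip(world, params):
        p_world *= pi if included else (1.0 - pi)
    # the query is true in the worlds containing at least one clause
    if any(world):
        p_query += p_world

print(p_query)  # 0.3*0.6 + 0.3*0.4 + 0.7*0.6, approximately 0.72
```
      </p>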
    </sec>
    <sec id="sec-4">
      <title>Hierarchical PLP</title>
      <p>
        Suppose we want to compute the probability of atoms for a predicate r using a
probabilistic logic program under the distribution semantics [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. In particular,
we want to compute the probability of a ground atom r(t), where t is a vector
of terms. r(t) can be an example in a learning problem and r a target predicate.
We want to compute the probability of r(t), starting from a set of ground atoms
(an interpretation) that represent what is true about the example encoded by
r(t). These ground atoms are built over a set of predicates that we call input
predicates, in the sense that their definition is given as input and is certain.
      </p>
      <p>We consider a specific form of LPAD defining r in terms of the input
predicates. The program contains a set of rules that define r using a number
of input and hidden predicates. Hidden predicates are disjoint from input and
target predicates. Each rule in the program has a single head atom annotated
with a probability. Moreover, the program is hierarchically defined so that it
can be divided into layers. Each layer contains a set of hidden predicates that
are defined in terms of predicates of the layer immediately below or in terms of
input predicates. The partition of predicates into layers induces a partition of
clauses into layers, with the clauses of layer i defining the predicates of layer i.
The rules in layer 1 define r.</p>
      <p>
        This is an extreme form of program stratification: it is stronger than acyclicity
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] because it is not imposed on the atom dependency graph but on the predicate
dependency graph, and is also stronger than stratification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] that, while applied
to the predicate dependency graph, allows clauses with positive literals built on
predicates in the same layer. As such, it also prevents inductive definitions and
recursion in general, thus making the language not Turing-complete.
      </p>
      <p>A generic clause C is of the form</p>
      <p>C = p(X) : π : − φ(X, Y ), b1(X, Y ), . . . , bm(X, Y )
where φ(X, Y ) is a conjunction of literals for the input predicates using variables
X, Y . bi(X, Y ) for i = 1, . . . , m is a literal built on a hidden predicate. Y is a
possibly empty vector of variables, existentially quantified with scope
the body. Only literals for input predicates can introduce new variables into the
clause and all literals for hidden predicates must use the whole set of variables
X, Y . Moreover, we require that the predicate of each bi(X, Y ) does not appear
elsewhere in the body of C or in the body of any other clause. We call hierarchical
PLP the language that admits only programs of this form.</p>
      <p>A generic program defining r is thus:</p>
      <p>C1 = r(X) : π1 : − φ1, b11, . . . , b1m1
. . .
Cn = r(X) : πn : − φn, bn1, . . . , bnmn
C111 = r11(X) : π111 : − φ111, b1111, . . . , b111m111
. . .
C11n11 = r11(X) : π11n11 : − φ11n11, b11n111, . . . , b11n11m11n11
. . .
Cn11 = rn1(X) : πn11 : − φn11, bn111, . . . , bn11mn11
. . .
Cn1nn1 = rn1(X) : πn1nn1 : − φn1nn1, bn1nn11, . . . , bn1nn1mn1nn1
where we omitted the variables except for rule heads. Such a program can be
represented with the tree of Figure 1 that contains a node for each literal of
hidden predicates and each clause. The tree is divided into levels, with levels
containing literal nodes alternating with levels containing clause nodes.</p>
      <p>Each clause node is indicated with Cp where p is a sequence of integers
encoding the path from the root to the node. For example C124 indicates the clause
obtained by going to the first child from the left of the root (C1), then to the
second child of C1 (b12) then to the fourth child of b12. Each literal node is
indicated similarly with bp where p is a sequence of integers encoding the path
from the root. (Figure 1 shows the tree, with levels of clause nodes Cp
alternating with levels of literal nodes bp.) Therefore, nodes have a subscript
that is formed by as many integers as the number of levels from the root. The
predicate of literal bp is rp, which is different for every value of p.</p>
      <p>The clauses in lower layers of the tree include a larger number of variables
than those in upper layers: the head rpij(X, Y) of clause Cpijk contains more
variables than the head rp(X) of clause Cpi that calls it.</p>
      <p>The constraints imposed on the program require each literal for a hidden
predicate to be built on a predicate that does not appear elsewhere, neither in
the body of the same clause nor in the body of any other clause.</p>
      <p>Example 2. Let us consider a modified version of the program of Example 1:
C1 = advisedby(A, B) : 0.3 : −
student(A), professor(B), project(C, A), project(C, B),
r11(A, B, C).</p>
      <p>C2 = advisedby(A, B) : 0.6 : −</p>
      <p>student(A), professor(B), ta(C, A), taughtby(C, B).</p>
      <p>C111 = r11(A, B, C) : 0.2 : −</p>
      <p>publication(D, A, C), publication(D, B, C).
where publication(A, B, C) means that A is a publication with author B
produced in project C, and student/1, professor/1, project/2, ta/2, taughtby/2 and
publication/3 are input predicates.</p>
      <p>In this case, the probability of q = advisedby(harry, ben) depends not only
on the number of joint courses and projects but also on the number of joint
publications from projects. The clause for r11(A, B, C) computes an
aggregation over the publications of a project, and the clause at the level above
aggregates over projects. Such a program can be represented with the tree of Figure 2.</p>
      <p>Writing programs in hierarchical PLP may be unintuitive for humans because
of the need to satisfy the constraints and because the hidden predicates may
not have a clear meaning. However, the idea is that the structure of the program
is learned by means of a specialized algorithm, with hidden predicates generated
by a form of predicate invention. The learning algorithm should search only the
space of programs satisfying the constraints, to ensure that the final program is
a hierarchical PLP.</p>
    </sec>
    <sec id="sec-5">
      <title>Inference</title>
      <p>In order to perform inference with such a program, we can generate its grounding.
Each ground probabilistic clause is associated with a random variable whose
probability of being true is given by the parameter of the clause and that is
independent of all the other clause random variables.</p>
      <p>Ground atoms are Boolean random variables as well, whose probability of
being true can be computed by performing inference on the program. Given a
ground clause Cpi = ap : πpi : − bpi1, . . . , bpimp, where p is a path, we can
compute the probability that the body is true by multiplying the probability
of being true of each individual atom in positive literals and one minus the
probability of being true of each individual atom in negative literals. In fact,
different literals in the body depend on disjoint sets of clause random variables
because of the structure of the program, so the random variables of different
literals are independent as well. Therefore the probability of the body of Cpi is
P (bpi1, . . . , bpimp) = ∏k=1..mp P (bpik), where P (bpik) = 1 − P (apik) if bpik = ¬apik.
Note that if a is a literal for an input predicate, then P (a) = 1 if a belongs to
the example interpretation and P (a) = 0 otherwise.</p>
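      <p>A minimal sketch of this body-probability computation follows. The Python code
and the encoding of literals as pairs of an atom probability and a sign are our own
illustration:</p>
      <p>
```python
# Body probability of a ground clause: the product of the literal
# probabilities, where a negative literal contributes 1 - P(atom).
# The (probability, is_positive) encoding is our own illustration.
def body_probability(literals):
    p = 1.0
    for p_atom, positive in literals:
        p *= p_atom if positive else (1.0 - p_atom)
    return p

# two positive literals and one negative literal:
p = body_probability([(0.9, True), (0.5, True), (0.2, False)])
# p is 0.9 * 0.5 * (1 - 0.2), i.e. approximately 0.36
```
      </p>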
      <p>Now let us determine how to compute the probability of atoms for hidden
predicates. Given an atom ap of a literal bp, to compute P (ap) we need to take
into account the contribution of every ground clause for the predicate of ap.
Suppose these clauses are {Cp1, . . . , Cpop}. If we have a single clause Cp1 =
ap : πp1 : − bp11, . . . , bp1mp1, then P (ap) = πp1 · P (body(Cp1)). If we have two
clauses, then we can compute the contribution of each clause as above and then
combine them with the formula P (ap) = 1 − (1 − πp1 · P (body(Cp1))) · (1 −
πp2 · P (body(Cp2))) because the contributions of the two clauses depend on
disjoint sets of variables and so they are independent. In fact, the probability of
a disjunction of two independent random variables is P (a ∨ b) = 1 − (1 − P (a)) ·
(1 − P (b)) = P (a) + P (b) − P (a) · P (b). We can define the operator ⊕ that
combines two probabilities as follows: p ⊕ q = 1 − (1 − p) · (1 − q). This operator is
commutative and associative and we can compute sequences of applications as
⊕i pi = 1 − ∏i (1 − pi)</p>
      <p>
The operators × and ⊕ so defined are respectively the t-norm and t-conorm of
the product fuzzy logic [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. They are called product t-norm and probabilistic
sum respectively. For programs of the type above, their interpretation according
to the distribution semantics is thus equivalent to that of a fuzzy logic.
Therefore the probability of a query can be computed in a truth-functional way, by
combining probability/fuzzy values in conjunctions and disjunctions using the
t-norm and t-conorm respectively, without considering how the values have been
generated: the probability of a conjunction or disjunction of literals depends only
on the probability of the two literals. This is in marked contrast with general
probabilistic logic programming where knowing the probability of two literals is
not enough to compute the probability of their conjunction or disjunction, as
they may depend on common random variables.
      </p>
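      <p>The operator ⊕ and the identity above can be illustrated with a short Python
sketch (our own code, not part of the paper):</p>
      <p>
```python
from functools import reduce

# probabilistic sum, the t-conorm of the product fuzzy logic:
# p (+) q = 1 - (1 - p) * (1 - q)
def psum(p, q):
    return 1.0 - (1.0 - p) * (1.0 - q)

ps = [0.3, 0.6, 0.5]
# folding psum over the sequence...
combined = reduce(psum, ps)
# ...gives the same value as 1 - prod_i (1 - p_i):
direct = 1.0
for p in ps:
    direct *= (1.0 - p)
direct = 1.0 - direct
# combined and direct agree (both approximately 0.86)
```
      </p>
      <p>Because ⊕ is commutative and associative, the order in which clause
contributions are combined does not matter.</p>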
      <p>
        Therefore, if the probabilistic program of Figure 1 is ground, the probability
of the example atom can be computed with the arithmetic circuit [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] of Figure
3, where nodes are labeled with the operation they perform and edges from ⊕
nodes are labeled with a probabilistic parameter that must be multiplied by the
child output before combining it with ⊕.
      </p>
      <p>The arithmetic circuit can also be interpreted as a deep neural network where
nodes can have different activation functions: nodes labeled with × compute the
product of their inputs, nodes labeled with ⊕ compute a weighted probabilistic
sum of this form:</p>
      <p>π1 · q1 ⊕ . . . ⊕ πn · qn
where qi is the output of the ith child and πi is the weight of the edge to the ith
child.</p>
      <p>The output of nodes is also indicated in Figure 3, using letter p for ⊕ nodes
and letter q for × nodes, subscripted with the path from the root to the node.
The network built in this way provides the value p of the probability of the
example. Moreover, if we update the parameters π of clauses, we can quickly
recompute p in time linear in the number of ground clauses.</p>
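      <p>The circuit evaluation, and its cheap re-evaluation when parameters change,
can be sketched as follows. The node encoding is our own illustration: an "or" node
combines weighted child outputs with probabilistic sum, an "and" node multiplies
its children, and a leaf carries the probability of an input literal.</p>
      <p>
```python
def evaluate(node, params):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "and":
        p = 1.0
        for child in node[1]:
            p *= evaluate(child, params)
        return p
    # "or" node: pi_1 * q_1 (+) ... (+) pi_n * q_n, children paired
    # with the name of their clause parameter
    p = 1.0
    for name, child in node[1]:
        p *= 1.0 - params[name] * evaluate(child, params)
    return 1.0 - p

# a query provable by two clauses whose bodies are certainly true:
circuit = ("or", [("pi1", ("and", [("leaf", 1.0)])),
                  ("pi2", ("and", [("leaf", 1.0)]))])
p1 = evaluate(circuit, {"pi1": 0.3, "pi2": 0.6})  # 1 - 0.7*0.4 = 0.72
# updating the parameters reuses the same structure, in linear time:
p2 = evaluate(circuit, {"pi1": 0.1, "pi2": 0.2})  # 1 - 0.9*0.8 = 0.28
```
      </p>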
      <p>When the program is not ground, we can build its grounding obtaining a
circuit/neural network of the type of Figure 3, where however some of the
parameters can be the same for different edges. So in this case the circuit will
exhibit parameter sharing.</p>
      <p>Example 3. Consider the program of Example 2 and suppose harry and ben have
two joint courses c1 and c2, two joint projects pr1 and pr2, two joint publications
p1 and p2 from project pr1 and two joint publications p3 and p4 from project
pr2. The resulting ground program is</p>
      <p>G1 = advisedby(harry, ben) : 0.3 : −
student(harry), professor(ben), project(pr1, harry),
project(pr1, ben), r11(harry, ben, pr1).</p>
      <p>G2 = advisedby(harry, ben) : 0.3 : −
student(harry), professor(ben), project(pr2, harry),
project(pr2, ben), r11(harry, ben, pr2).</p>
      <p>G3 = advisedby(harry, ben) : 0.6 : −</p>
      <p>student(harry), professor(ben), ta(c1, harry), taughtby(c1, ben).
G4 = advisedby(harry, ben) : 0.6 : −</p>
      <p>student(harry), professor(ben), ta(c2, harry), taughtby(c2, ben).
G111 = r11(harry, ben, pr1) : 0.2 : −</p>
      <p>publication(p1, harry, pr1), publication(p1, ben, pr1).</p>
      <p>G112 = r11(harry, ben, pr1) : 0.2 : −</p>
      <p>publication(p2, harry, pr1), publication(p2, ben, pr1).</p>
      <p>G211 = r11(harry, ben, pr2) : 0.2 : −</p>
      <p>publication(p3, harry, pr2), publication(p3, ben, pr2).</p>
      <p>G212 = r11(harry, ben, pr2) : 0.2 : −</p>
      <p>publication(p4, harry, pr2), publication(p4, ben, pr2).</p>
      <p>The program tree is shown in Figure 4. The corresponding arithmetic circuit is
shown in Figure 5 together with the values computed by the nodes.</p>
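      <p>Under the truth-functional rules of the previous paragraphs, the node values of
Figure 5 can be reproduced numerically. The following Python sketch is our own; it
assumes, as in the example, that all input literals in the bodies are true:</p>
      <p>
```python
def psum(p, q):  # probabilistic sum
    return 1.0 - (1.0 - p) * (1.0 - q)

# r11(harry, ben, pr1): clauses G111 and G112, parameter 0.2,
# both bodies true, so the atom probability is 0.2 (+) 0.2 = 0.36
r11_pr1 = psum(0.2 * 1.0, 0.2 * 1.0)
r11_pr2 = psum(0.2 * 1.0, 0.2 * 1.0)  # same for project pr2

# advisedby(harry, ben): G1 and G2 contribute 0.3 times the r11
# probability; G3 and G4 contribute 0.6 times a true body.
contribs = [0.3 * r11_pr1, 0.3 * r11_pr2, 0.6, 0.6]
p = 1.0
for c in contribs:
    p *= 1.0 - c
p = 1.0 - p  # probability of advisedby(harry, ben)
```
      </p>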
      <p>
        The network can be built by performing inference using tabling and answer
subsumption using PITA(IND,IND) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]: a program transformation is applied
that adds an extra argument to each subgoal of the program and of the query.
The extra argument is used to store the probability of answers to the subgoal:
when a subgoal returns, the extra argument will be instantiated to the
probability of the ground atom that corresponds to the subgoal without the extra
argument. In programs of hierarchical PLP, when a subgoal returns the original
arguments are guaranteed to be instantiated.
      </p>
      <p>The program transformation also adds to the body of clauses suitable
literals that combine the extra arguments of the subgoals: the probabilities of the
answers for the subgoals in the body should be multiplied together to give the
probability to be assigned to the extra argument of the head atom.</p>
      <p>
        Since a subgoal may unify with the head of multiple groundings of
multiple clauses, we need to combine the contributions of these groundings. This is
achieved by means of tabling with answer subsumption. Tabling is a Logic
Programming technique that reduces computation time and ensures termination for
a large class of programs [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. The idea of tabling is simple: keep a store of the
subgoals encountered in a derivation together with answers to these subgoals. If
one of the subgoals is encountered again, its answers are retrieved from the store
rather than recomputing them. Besides saving time, tabling ensures termination
for programs without function symbols under the Well-Founded Semantics [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>
        Answer subsumption [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] is a tabling feature that, when a new answer for a
tabled subgoal is found, combines old answers with the new one. In PITA(IND,
IND) the combination operator is probabilistic sum. Computation by PITA(IND,
IND) is thus equivalent to the evaluation of the program arithmetic circuit.
      </p>
      <p>Parameter learning can be performed by EM or backpropagation. In this case
inference has to be performed repeatedly on the same program with different
values of the parameters. So we could use an algorithm similar to PITA(IND,IND)
to build a representation of the arithmetic circuit, instead of just computing the
probability. To do so it is enough to use the extra argument for storing a term
representing the circuit instead of the probability, and to change the
implementation of the predicates for combining the values of the extra arguments in the
body and for combining the values from different clause groundings.</p>
      <p>The result of inference would thus be a term representing the arithmetic
circuit, which can then be used to perform regular inference several times with
different parameter values.</p>
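      <p>As a toy illustration of gradient-based parameter learning on such a circuit
(our own sketch with numerical gradients; it is not the EM algorithm of the cited
works, and the learning rate and clamping bounds are arbitrary):</p>
      <p>
```python
import math

# Circuit of a single positive example provable by two clauses:
# p(pi1, pi2) = 1 - (1 - pi1*b1) * (1 - pi2*b2), with the body
# probabilities b1, b2 fixed by the interpretation.
def prob(pi1, pi2, b1=1.0, b2=1.0):
    return 1.0 - (1.0 - pi1 * b1) * (1.0 - pi2 * b2)

pi1, pi2, lr, eps = 0.3, 0.6, 0.1, 1e-6
for _ in range(100):
    # numerical gradients of the log-likelihood log p
    g1 = (math.log(prob(pi1 + eps, pi2)) - math.log(prob(pi1, pi2))) / eps
    g2 = (math.log(prob(pi1, pi2 + eps)) - math.log(prob(pi1, pi2))) / eps
    # gradient ascent, keeping the parameters inside (0, 1)
    pi1 = min(0.999, max(0.001, pi1 + lr * g1))
    pi2 = min(0.999, max(0.001, pi2 + lr * g2))
# the likelihood of the positive example increases at every step
```
      </p>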
      <p>
        Implementing EM would adapt the algorithm of [
        <xref ref-type="bibr" rid="ref6 ref7">7,6</xref>
        ] for hierarchical PLP.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] the authors discuss an approach for building deep neural networks using
a template expressed as a set of weighted rules. Similarly to our approach, the
resulting network has nodes representing ground atoms and nodes representing
ground rules and the values of ground rule nodes are aggregated to compute the
value of atom nodes. Differently from us, the contributions of different ground
rules are aggregated in two steps: first the contributions of different groundings
of the same rule sharing the same head and then the contributions of groundings
for different rules, resulting in an extra level of nodes between the ground rule
nodes and the atom nodes.
      </p>
      <p>The proposal is parametric in the activation functions of ground rule nodes,
extra level nodes and atom nodes. In particular, the authors introduce two
families of activation functions that are inspired by Lukasiewicz fuzzy logic.</p>
      <p>In this paper we show that by properly restricting the form of weighted rules
and by suitably choosing the activation functions, we can build a neural network
whose output is the probability of the example according to the distribution
semantics.</p>
      <p>
        Our proposal aims at combining deep learning with probabilistic
programming, as does [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], where the authors propose the Turing-complete probabilistic
programming language Edward. Programs in Edward define computational graphs
and inference is performed by stochastic graph optimization using TensorFlow as
the underlying engine. Hierarchical PLP, unlike Edward, is not Turing-complete, but it
ensures fast inference by circuit evaluation. Moreover, it is based on logic so it
handles well domains with multiple entities connected by relationships. Similarly
to Edward, hierarchical PLP can be compiled to TensorFlow and investigating
the advantages of this approach is an interesting direction for future work.
      </p>
      <p>
        Hierarchical PLP is also related to Probabilistic Soft Logic (PSL) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] which
differs from Markov Logic because atom random variables are defined over
continuous variables in the [0, 1] unit interval and logic formulas are interpreted
using Lukasiewicz fuzzy logic. We differ from PSL because PSL defines a joint
probability distribution over fuzzy variables, while the random variables in
hierarchical PLP are still Boolean and the fuzzy values are the probabilities that are
combined with the product fuzzy logic. Moreover, the main inference problem
in PSL is MAP, i.e., finding a most probable assignment, rather than MARG,
i.e., computing the probability of a query, as in hierarchical PLP.
      </p>
      <p>
        Hierarchical PLP is similar to sum-product networks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]: the circuits can be
seen as sum-product networks where children of sum nodes are not mutually
exclusive but independent, and each product node has a leaf child associated
with a hidden random variable. The aim however is different: while sum-product
networks represent a distribution over input data, the programs in hierarchical
PLP describe only a distribution over the truth values of the query.
      </p>
      <p>
        Inference in hierarchical PLP is in a way “lifted” [
        <xref ref-type="bibr" rid="ref22 ref5">5,22</xref>
        ]: the probability of
the ground atoms can be computed knowing only the sizes of the populations of
individuals that can instantiate the existentially quantified variables.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>We have presented hierarchical PLP, a restriction of the language of LPADs that
allows inference to be performed quickly using a simple and cheap dynamic
programming algorithm such as PITA(IND,IND). Such a language is particularly useful
in machine learning where inference needs to be repeated many times.</p>
      <p>Programs in this restricted language can also be seen as arithmetic circuits,
similar to sum-product networks, and as neural networks.</p>
      <p>The parameters can be trained by gradient descent, computing the gradients
using a form of backpropagation. Since the random variables associated with clauses
are unobserved, an EM approach can also be tried for parameter learning. In
the future, we plan to develop and test these parameter learning algorithms and
to study the problem of structure learning.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alberti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cota</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zese</surname>
          </string-name>
          , R.: cplint on SWISH:
          <article-title>Probabilistic logical inference with a web browser</article-title>
          .
          <source>Intell. Artif</source>
          .
          <volume>11</volume>
          (
          <issue>1</issue>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alberti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cota</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zese</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Probabilistic logical inference on the web</article-title>
          . In:
          <string-name>
            <surname>Adorni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cagnoni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maratea</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.)
          <source>AI*IA 2016. LNCS</source>
          , vol.
          <volume>10037</volume>
          , pp.
          <fpage>351</fpage>
          -
          <lpage>363</lpage>
          . Springer International Publishing (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Apt</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bezem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Acyclic programs</article-title>
          .
          <source>New Generat. Comput</source>
          .
          <volume>9</volume>
          (
          <issue>3</issue>
          /4),
          <fpage>335</fpage>
          -
          <lpage>364</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broecheler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getoor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Hinge-loss Markov random fields and probabilistic soft logic</article-title>
          .
          <source>arXiv preprint arXiv:1505.04406 [cs.LG]</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamma</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costa</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zese</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Lifted variable elimination for probabilistic logic programming</article-title>
          .
          <source>Theor. Pract. Log. Prog</source>
          .
          <volume>14</volume>
          (
          <issue>4-5</issue>
          ),
          <fpage>681</fpage>
          -
          <lpage>695</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Experimentation of an expectation maximization algorithm for probabilistic logic programs</article-title>
          .
          <source>Intell. Artif</source>
          .
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Expectation Maximization over Binary Decision Diagrams for probabilistic logic programs</article-title>
          .
          <source>Intell. Data Anal</source>
          .
          <volume>17</volume>
          (
          <issue>2</issue>
          ),
          <fpage>343</fpage>
          -
          <lpage>363</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Structure learning of probabilistic logic programs by searching the clause space</article-title>
          .
          <source>Theor. Pract. Log. Prog</source>
          .
          <volume>15</volume>
          (
          <issue>2</issue>
          ),
          <fpage>169</fpage>
          -
          <lpage>212</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chandra</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Horn clause queries and generalizations</article-title>
          .
          <source>J. Logic Program</source>
          .
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Darwiche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A differential approach to inference in Bayesian networks</article-title>
          .
          <source>J. ACM</source>
          <volume>50</volume>
          (
          <issue>3</issue>
          ),
          <fpage>280</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Di Mauro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Bandit-based Monte-Carlo structure learning of probabilistic logic programs</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>100</volume>
          (
          <issue>1</issue>
          ),
          <fpage>127</fpage>
          -
          <lpage>156</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Fierens</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van den Broeck</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Renkens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shterionov</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Inference and learning in probabilistic logic programs using weighted boolean formulas</article-title>
          .
          <source>Theor. Pract. Log. Prog</source>
          .
          <volume>15</volume>
          (
          <issue>3</issue>
          ),
          <fpage>358</fpage>
          -
          <lpage>401</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hájek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Metamathematics of fuzzy logic</article-title>
          ,
          <source>vol. 4</source>
          . Springer (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kok</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning the structure of Markov Logic Networks</article-title>
          .
          In:
          <source>ICML 2005</source>
          . pp.
          <fpage>441</fpage>
          -
          <lpage>448</lpage>
          . ACM (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Poole</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The Independent Choice Logic for modelling multiple agents under uncertainty</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>94</volume>
          ,
          <fpage>7</fpage>
          -
          <lpage>56</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          :
          <article-title>Sum-product networks: A new deep architecture</article-title>
          . In:
          <string-name>
            <surname>Cozman</surname>
            ,
            <given-names>F.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeffer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>UAI 2011</source>
          . pp.
          <fpage>337</fpage>
          -
          <lpage>346</lpage>
          . AUAI Press (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>ALLPAD: Approximate learning of logic programs with annotated disjunctions</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>70</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>207</fpage>
          -
          <lpage>223</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Extended semantics and inference for the Independent Choice Logic</article-title>
          .
          <source>Logic Journal of the IGPL</source>
          <volume>17</volume>
          (
          <issue>6</issue>
          ),
          <fpage>589</fpage>
          -
          <lpage>629</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Speeding up inference for probabilistic logic programs</article-title>
          .
          <source>Comput. J</source>
          .
          <volume>57</volume>
          (
          <issue>3</issue>
          ),
          <fpage>347</fpage>
          -
          <lpage>363</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The distribution semantics for normal programs with function symbols</article-title>
          .
          <source>Int. J. Approx. Reason</source>
          .
          <volume>77</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamma</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zese</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cota</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Probabilistic logic programming on the web</article-title>
          .
          <source>Softw.-Pract. Exper</source>
          .
          <volume>46</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1381</fpage>
          -
          <lpage>1396</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellodi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zese</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cota</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamma</surname>
          </string-name>
          , E.:
          <article-title>A survey of lifted inference approaches for probabilistic logic programming under the distribution semantics</article-title>
          .
          <source>Int. J. Approx. Reason</source>
          .
          <volume>80</volume>
          ,
          <fpage>313</fpage>
          -
          <lpage>333</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Mauro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Applying the information bottleneck to statistical relational learning</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>86</volume>
          (
          <issue>1</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>114</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swift</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The PITA system: Tabling and answer subsumption for reasoning under uncertainty</article-title>
          .
          <source>Theor. Pract. Log. Prog</source>
          .
          <volume>11</volume>
          (
          <issue>4-5</issue>
          ),
          <fpage>433</fpage>
          -
          <lpage>449</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Riguzzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swift</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Well-definedness and efficient inference for probabilistic logic programming under the distribution semantics</article-title>
          .
          <source>Theor. Pract. Log. Prog</source>
          .
          <volume>13</volume>
          (
          <issue>Special Issue 02 - 25th Annual GULP Conference</issue>
          ),
          <fpage>279</fpage>
          -
          <lpage>302</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Sato</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A statistical learning method for logic programs with distribution semantics</article-title>
          . In:
          <string-name>
            <surname>Sterling</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (ed.)
          <source>ICLP 1995</source>
          . pp.
          <fpage>715</fpage>
          -
          <lpage>729</lpage>
          . MIT Press, Cambridge, Massachusetts (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Sourek</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aschenbrenner</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Železný</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuzelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Lifted relational neural networks</article-title>
          . In:
          <string-name>
            <surname>Besold</surname>
            ,
            <given-names>T.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Avila Garcez</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>G.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miikkulainen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>NIPS Workshop on Cognitive Computation 2015. CEUR Workshop Proceedings</source>
          , vol.
          <volume>1583</volume>
          . CEUR-WS.org
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Swift</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warren</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          :
          <article-title>XSB: extending Prolog with tabled logic programming</article-title>
          .
          <source>Theor. Pract. Log. Prog</source>
          .
          <volume>12</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>157</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saurous</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          :
          <article-title>Deep probabilistic programming</article-title>
          .
          In:
          <source>International Conference on Learning Representations</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Van Gelder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ross</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlipf</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          :
          <article-title>The well-founded semantics for general logic programs</article-title>
          .
          <source>J. ACM</source>
          <volume>38</volume>
          (
          <issue>3</issue>
          ),
          <fpage>620</fpage>
          -
          <lpage>650</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Vennekens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbaeten</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruynooghe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Logic Programs With Annotated Disjunctions</article-title>
          .
          In:
          <source>ICLP 2004. LNCS</source>
          , vol.
          <volume>3132</volume>
          , pp.
          <fpage>431</fpage>
          -
          <lpage>445</lpage>
          . Springer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>