Optimizing Inference for Probabilistic Logic Programs Exploiting Independence and Exclusiveness

Fabrizio Riguzzi
ENDIF – University of Ferrara, Via Saragat 1, I-44122, Ferrara, Italy
fabrizio.riguzzi@unife.it

Abstract. Probabilistic Logic Programming (PLP) is gaining popularity due to its many applications, in particular in Machine Learning. An important problem in PLP is how to compute the probability of queries. PITA is an algorithm for solving this problem that exploits tabling, answer subsumption and Binary Decision Diagrams (BDDs). PITA does not impose any restriction on the programs. Other algorithms, such as PRISM, achieve a higher speed by imposing two restrictions on the program, namely that subgoals are independent and that clause bodies are mutually exclusive. Another assumption that simplifies inference is that clause bodies are independent. In this paper we present the algorithms PITA(IND,IND) and PITA(OPT). PITA(IND,IND) assumes that subgoals and clause bodies are independent. PITA(OPT) instead first checks whether these assumptions hold for subprograms and subgoals: if they do, PITA(OPT) uses a simplified calculation, otherwise it resorts to BDDs. Experiments on a number of benchmark datasets show that PITA(IND,IND) is the fastest on datasets respecting the assumptions, while PITA(OPT) is a good option when nothing is known about a dataset.

1 Introduction

Probabilistic Logic Programming (PLP) is an emerging field devoted to the study of the representation of uncertain information in logic programming. Its use is increasing in particular in the Machine Learning field [5], where many domains present complex and uncertain relations among the entities.

A wide variety of semantics and languages have been proposed in PLP. Among these, the distribution semantics [20] is probably the most used. Many languages adopt this semantics, such as Probabilistic Logic Programs [3], Independent Choice Logic [11], PRISM [21], pD [7], Logic Programs with Annotated Disjunctions (LPADs) [25] and ProbLog [4]. All these languages have the same expressive power, as a theory in one language can be translated into another. LPADs offer the most general syntax, as the constructs of all the other languages can be directly encoded in LPADs.

An important problem in PLP is computing the probability of queries, i.e., the problem of inference. Solving this problem quickly is fundamental especially for Machine Learning applications, where a high number of queries has to be answered. PRISM [21] is a system that performs inference but restricts the class of programs it can handle: subgoals in the body of clauses must be independent and bodies of clauses with the same head must be mutually exclusive. These restrictions allow PRISM to be very fast.

Recently, other algorithms have been proposed that lift these restrictions, such as Ailog2 [12], ProbLog [9], cplint [13, 14], SLGAD [15] and PITA [16, 18]. In particular, PITA associates to each subgoal an extra argument used to store a Binary Decision Diagram (BDD) that encodes the explanations for the subgoal. PITA uses a Prolog library that exposes the functions of a highly efficient BDD package: the conjunction of BDDs is used for handling conjunctions of subgoals in the body, while the disjunction of BDDs is used for combining explanations for the same subgoal coming from different clauses. PITA exploits tabling and answer subsumption to effectively combine answers for the same subgoal and to store them for fast retrieval.
PITA was recently shown [17] to be highly customizable for specific settings in order to increase its efficiency. When the modeling assumptions of PRISM hold (independence of subgoals and exclusiveness of clauses), PITA can be specialized into PITA(IND,EXC), obtaining a system that is comparable or superior to PRISM in speed.

In this paper, we consider another special case that can be treated by specializing PITA, one where the bodies of clauses with the same head are mutually independent, a situation first considered in [8]. This requires a different modification of PITA, and PITA(IND,IND), the resulting system, is much faster on programs where these assumptions hold.

In order to generalize these results, we propose PITA(OPT), a version of PITA that performs conjunctions and disjunctions of explanations for subgoals by first checking whether the independence or exclusiveness assumptions hold. Then, depending on the test results, it uses a simplified calculation or, if no assumption holds, it falls back on using BDDs.

In order to investigate the performance of PITA(IND,IND) and PITA(OPT) in comparison with the other systems, including the specialized ones, we performed a number of experiments on real and artificial datasets. The results show that PITA(IND,IND) is the fastest when the corresponding assumptions hold. Moreover, PITA(OPT) is much faster than general purpose algorithms such as PITA and ProbLog in some cases, while being only slightly slower in the other cases, so that PITA(OPT) can be considered a valid alternative when nothing is known about the program.

After presenting the basic concepts of PLP, we illustrate PITA. Then we present the modeling assumptions that simplify probability computations and the system PITA(IND,IND). The description of PITA(OPT) follows, together with the experiments performed on a number of real and artificial datasets. We then conclude and present directions for future work.

2 Probabilistic Logic Programming

The distribution semantics [20] is one of the most widely used semantics for probabilistic logic programming. In the distribution semantics, a probabilistic logic program defines a probability distribution over a set of normal logic programs (called worlds). The distribution is extended to a joint distribution over worlds and a query, and the probability of the query is obtained from this distribution by marginalization.

The languages based on the distribution semantics differ in the way they define the distribution over logic programs. Each language allows probabilistic choices among atoms in clauses. We consider here LPADs for their general syntax. LPADs are sets of disjunctive clauses in which each atom in the head is annotated with a probability.

Formally, a Logic Program with Annotated Disjunctions [25] consists of a finite set of annotated disjunctive clauses. An annotated disjunctive clause C_i is of the form

$h_{i1} : \Pi_{i1} ; \ldots ; h_{in_i} : \Pi_{in_i} \leftarrow b_{i1}, \ldots, b_{im_i}.$

In such a clause, $h_{i1}, \ldots, h_{in_i}$ are logical atoms and $b_{i1}, \ldots, b_{im_i}$ are logical literals; $\Pi_{i1}, \ldots, \Pi_{in_i}$ are real numbers in the interval $[0, 1]$ such that $\sum_{k=1}^{n_i} \Pi_{ik} \le 1$. Note that if $n_i = 1$ and $\Pi_{i1} = 1$, the clause corresponds to a non-disjunctive clause. If $\sum_{k=1}^{n_i} \Pi_{ik} < 1$, the head of the annotated disjunctive clause implicitly contains an extra atom null that does not appear in the body of any clause and whose annotation is $1 - \sum_{k=1}^{n_i} \Pi_{ik}$. We denote by ground(T) the grounding of an LPAD T.

Example 1.
The following LPAD T encodes a very simple model of the development of an epidemic or a pandemic:

C1 = epidemic : 0.6 ; pandemic : 0.3 ← flu(X), cold.
C2 = cold : 0.7.
C3 = flu(david).
C4 = flu(robert).

This program models the fact that if somebody has the flu and the climate is cold, there is the possibility that an epidemic or a pandemic arises. We are uncertain about whether the climate is cold, but we know for sure that David and Robert have the flu.

We now present the distribution semantics for the case in which the program does not contain function symbols, so that its Herbrand base is finite. (The distribution semantics for programs with function symbols has been defined as well [20, 12, 18].)

An atomic choice is a selection of the k-th atom for a grounding C_iθ_j of a probabilistic clause C_i and is represented by the triple (C_i, θ_j, k). An atomic choice represents an equation of the form X_ij = k, where X_ij is a random variable associated to C_iθ_j. A set of atomic choices κ is consistent if (C_i, θ_j, k) ∈ κ, (C_i, θ_j, m) ∈ κ ⇒ k = m, i.e., only one head is selected for a ground clause. A composite choice κ is a consistent set of atomic choices. The probability of a composite choice κ is $P(\kappa) = \prod_{(C_i, \theta_j, k) \in \kappa} \Pi_{ik}$. A selection σ is a total composite choice (one atomic choice for every grounding of each probabilistic clause). Let us call S_T the set of all selections. A selection σ identifies a logic program w_σ called a world. The probability of w_σ is $P(w_\sigma) = P(\sigma) = \prod_{(C_i, \theta_j, k) \in \sigma} \Pi_{ik}$. Since the program does not contain function symbols, the set of worlds is finite, W_T = {w_1, ..., w_m}, and P(w) is a distribution over worlds: $\sum_{w \in W_T} P(w) = 1$.

We can define the conditional probability of a query Q given a world w: P(Q|w) = 1 if Q is true in w and 0 otherwise. The probability of the query can then be obtained by marginalizing over the query:

$P(Q) = \sum_{w} P(Q, w) = \sum_{w} P(Q \mid w) P(w) = \sum_{w \models Q} P(w)$    (1)

Example 2. For the LPAD T of Example 1, clause C1 has two groundings, C1θ1 with θ1 = {X/david} and C1θ2 with θ2 = {X/robert}, while clause C2 has a single grounding C2∅. Since each grounding of C1 has three head atoms (counting null) and C2 has two, T has 3 × 3 × 2 = 18 worlds. The query epidemic is true in 5 of them and its probability is

P(epidemic) = 0.6·0.6·0.7 + 0.6·0.3·0.7 + 0.6·0.1·0.7 + 0.3·0.6·0.7 + 0.1·0.6·0.7 = 0.588

Inference in probabilistic logic programming is performed by finding a covering set of explanations for queries. A composite choice κ identifies a set of worlds ω_κ that contains all the worlds associated to a selection that is a superset of κ, i.e., ω_κ = {w_σ | σ ∈ S_T, σ ⊇ κ}. We define the set of worlds identified by a set of composite choices K as $\omega_K = \bigcup_{\kappa \in K} \omega_\kappa$. Given a ground atom Q, a composite choice κ is an explanation for Q if Q is true in every world of ω_κ. In Example 1, the composite choice κ1 = {(C2, ∅, 1), (C1, {X/david}, 1)} is an explanation for epidemic. A set of composite choices K is covering with respect to Q if every world w_σ in which Q is true is such that w_σ ∈ ω_K. For Example 1, a covering set of explanations for epidemic is K = {κ1, κ2} where κ1 = {(C2, ∅, 1), (C1, {X/david}, 1)} and κ2 = {(C2, ∅, 1), (C1, {X/robert}, 1)}. If we associate the variables X11 to C1{X/david}, X12 to C1{X/robert} and X21 to C2∅, the query is true if the following formula is true:

$f(\mathbf{X}) = (X_{21} = 1 \wedge X_{11} = 1) \vee (X_{21} = 1 \wedge X_{12} = 1)$    (2)
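As a concreteness check, the following self-contained sketch (our own illustration, not part of any of the systems discussed here; written for SWI-Prolog, with invented predicate names choice/2, world_prob/2 and prob_epidemic/1) enumerates the 18 worlds of Example 1 and recovers the probability computed in Example 2:

% Brute-force check of Example 2: enumerate all 18 worlds of Example 1 and
% sum the probabilities of the worlds in which the query epidemic is true.
choice(c1_david,  [0.6, 0.3, 0.1]).   % C1{X/david}: epidemic, pandemic, null
choice(c1_robert, [0.6, 0.3, 0.1]).   % C1{X/robert}
choice(c2,        [0.7, 0.3]).        % C2: cold, null

% world_prob(+World, -P): probability of a world given as Var-Index pairs
world_prob([], 1.0).
world_prob([V-K|Rest], P) :-
    choice(V, Probs),
    nth1(K, Probs, PK),
    world_prob(Rest, P0),
    P is PK * P0.

% epidemic holds when cold is selected and at least one grounding of C1
% selects head 1; the if-then-else keeps the test deterministic.
epidemic_true(W) :-
    memberchk(c2-1, W),
    ( memberchk(c1_david-1, W) -> true ; memberchk(c1_robert-1, W) ).

prob_epidemic(P) :-
    findall(PW,
            ( member(K1, [1,2,3]), member(K2, [1,2,3]), member(K3, [1,2]),
              W = [c1_david-K1, c1_robert-K2, c2-K3],
              epidemic_true(W),
              world_prob(W, PW) ),
            Ps),
    sum_list(Ps, P).   % P ≈ 0.588, matching Example 2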
Two composite choices κ1 and κ2 are exclusive if their union is inconsistent, i.e., if there exists a clause C_i and a substitution θ_j grounding C_i such that (C_i, θ_j, k) ∈ κ1, (C_i, θ_j, m) ∈ κ2 and k ≠ m. A set K of composite choices is mutually exclusive if, for all κ1 ∈ K, κ2 ∈ K, κ1 ≠ κ2 implies that κ1 and κ2 are exclusive. As an illustration, the set of composite choices

K2 = {{(C2, ∅, 1), (C1, {X/david}, 1)}, {(C2, ∅, 1), (C1, {X/david}, 0), (C1, {X/robert}, 1)}}

is mutually exclusive for the theory of Example 1.

Given a covering set of explanations for a query, the query is true if the disjunction of the explanations in the covering set is true, where each explanation is interpreted as the conjunction of all its atomic choices. In this way we obtain a formula in Disjunctive Normal Form (DNF).

The covering set of explanations that is found for a query is not necessarily mutually exclusive, so the probability of the query cannot be computed by a summation as in Formula (1). The explanations first have to be made mutually exclusive so that a summation can be computed. This problem, known as the disjoint sum problem, is #P-complete [24]. The most effective way to date of solving the problem makes use of decision diagrams, which represent the DNF formula in a way that allows the probability to be computed with a simple dynamic programming algorithm [4].

Since the random variables associated to atomic choices can assume multiple values, we need to use Multivalued Decision Diagrams (MDDs) [23]. An MDD represents a function f(X) taking Boolean values on a set of multivalued variables X by means of a rooted graph that has one level for each variable. Each node n has one child for each possible value of the multivalued variable associated to n. The leaves store either 0 or 1. Given values for all the variables X, an MDD can be used to compute the value of f(X) by traversing the graph starting from the root and returning the value associated to the leaf that is reached. Since MDDs split paths on the basis of the values of a variable, the branches are mutually exclusive, so a dynamic programming algorithm can be applied for computing the probability. Figure 1(a) shows the MDD corresponding to Formula (2).

Most packages for the manipulation of decision diagrams are however restricted to work on Binary Decision Diagrams (BDDs), i.e., decision diagrams where all the variables are Boolean. These packages offer Boolean operators between BDDs and apply simplification rules to the results of operations in order to reduce as much as possible the size of the BDD, obtaining a reduced BDD. A node n in a BDD has two children: the 1-child and the 0-child. When drawing BDDs, rather than using edge labels, the 0-branch, the one going to the 0-child, is distinguished from the 1-branch by drawing it with a dashed line.

To work on MDDs with a BDD package we must represent multivalued variables by means of binary variables. The following encoding, proposed in [19], gives good performance. For a multivalued variable X_ij, corresponding to ground clause C_iθ_j and having n_i values, we use n_i − 1 Boolean variables X_ij1, ..., X_ijn_i−1; we represent the equation X_ij = k, for k = 1, ..., n_i − 1, by means of the conjunction $\overline{X_{ij1}} \wedge \ldots \wedge \overline{X_{ijk-1}} \wedge X_{ijk}$, and the equation X_ij = n_i by means of the conjunction $\overline{X_{ij1}} \wedge \ldots \wedge \overline{X_{ijn_i-1}}$. The BDD corresponding to the MDD of Figure 1(a) is shown in Figure 1(b).
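For instance, applying this encoding to the three-valued variable X_11 of Example 2 (n_1 = 3, hence two Boolean variables X_111 and X_112), the three equations are represented as

$X_{11} = 1 \;\mapsto\; X_{111} \qquad X_{11} = 2 \;\mapsto\; \overline{X_{111}} \wedge X_{112} \qquad X_{11} = 3 \;\mapsto\; \overline{X_{111}} \wedge \overline{X_{112}}$

The three conjunctions are pairwise exclusive and exhaustive, which is what makes the translation preserve the mutual exclusiveness of MDD branches.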
BDDs obtained in this way can be used as well for computing the probability of queries by associating to each Boolean variable X_ijk a parameter π_ik that represents P(X_ijk = 1). The parameters are obtained from those of the multivalued variables in this way:

$\pi_{i1} = \Pi_{i1}, \qquad \ldots, \qquad \pi_{ik} = \frac{\Pi_{ik}}{\prod_{j=1}^{k-1}(1 - \pi_{ij})}, \qquad \ldots$

up to k = n_i − 1.

[Fig. 1. Decision diagrams for Example 1: (a) the MDD over X11, X12, X21; (b) the corresponding BDD.]

3 The PITA System

PITA computes the probability of a query from a probabilistic program in the form of an LPAD by first transforming the LPAD into a normal program containing calls for manipulating BDDs. The idea is to add an extra argument to each subgoal to store a BDD encoding the explanations for the answers of the subgoal. The extra arguments of subgoals are combined using a set of general library functions:

– init, end: initialize and terminate the data structures for manipulating BDDs;
– bdd_zero(-D), bdd_one(-D), bdd_and(+D1,+D2,-DO), bdd_or(+D1,+D2,-DO), bdd_not(+D1,-DO): Boolean operations between BDDs;
– get_var_n(+R,+S,+Probs,-Var): returns the multivalued random variable associated to rule R with grounding substitution S and list of probabilities Probs;
– bdd_equality(+Var,+Value,-D): D is the BDD representing Var=Value, i.e., that the multivalued random variable Var is assigned Value;
– ret_prob(+D,-P): returns the probability of the BDD D.

These functions are implemented in C as an interface to the CUDD library for manipulating BDDs. A BDD is represented in Prolog as an integer that is a pointer in memory to the root node of the BDD.

The PITA transformation applies to atoms, literals and clauses. The transformation for a head atom h, PITA_h(h), is h with the variable D added as the last argument. Similarly, the transformation for a positive body literal b_j, PITA_b(b_j), is b_j with the variable D_j added as the last argument. The transformation for a negative body literal b_j = ¬a_j, PITA_b(b_j), is the Prolog conditional (PITA'_b(a_j) → not(DN_j, D_j) ; one(D_j)), where PITA'_b(a_j) is a_j with the variable DN_j added as the last argument. In other words, the input data structure, DN_j, is negated if it exists; otherwise the data structure for the constant function 1 is returned.

The disjunctive clause

C_r = h_1 : Π_1 ; ... ; h_n : Π_n ← b_1, ..., b_m.

where the parameters sum to 1, is transformed into the set of clauses PITA(C_r):

PITA(C_r, i) = PITA_h(h_i) ← one(DD_0),
    PITA_b(b_1), and(DD_0, D_1, DD_1), ...,
    PITA_b(b_m), and(DD_m−1, D_m, DD_m),
    get_var_n(r, VC, [Π_1, ..., Π_n], Var),
    equality(Var, i, Π_i, DD),
    and(DD_m, DD, D).

for i = 1, ..., n, where VC is a list containing each variable appearing in C_r. The predicates one/1, not/2, and/3 and equality/4 are defined by

one(D) ← bdd_one(D).
not(A, B) ← bdd_not(A, B).
and(A, B, C) ← bdd_and(A, B, C).
equality(V, I, P, D) ← bdd_equality(V, I, D).

PITA uses tabling and a feature called answer subsumption, available in XSB Prolog, that, when a new answer for a tabled subgoal is found, combines old answers with the new one according to a partial order or lattice. For example, if the lattice is on the second argument of a binary predicate p, answer subsumption may be specified by means of the declaration

table p(_, or/3 - zero/1)

where zero/1 is the bottom element of the lattice and or/3 is the join operation of the lattice.
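As a toy illustration of this mechanism (a sketch following the declaration form just described; the max-based join, the bottom element 0 and the facts for p/2 are ours, chosen only for illustration):

% Answers for p/2 are merged through a lattice whose join is max/2.
:- table p(_, or/3 - zero/1).

zero(0).                        % bottom element of the lattice
or(A, B, C) :- C is max(A, B). % join operation

p(a, 2).
p(a, 5).

% ?- p(a, X).   returns the single answer X = 5: p(a, 2) has been subsumed.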
Thus if a table has an answer p(a, d1) and a new answer p(a, d2) is derived, the answer p(a, d1) is replaced by p(a, d3), where d3 is obtained by calling or(d1, d2, d3).

In PITA various predicates of the transformed program should be declared as tabled. For a predicate p/n, the declaration is

table p(_1, ..., _n, or/3 - zero/1)

which indicates that answer subsumption is used to form the disjunction of BDDs, with

zero(D) ← bdd_zero(D).
or(A, B, C) ← bdd_or(A, B, C).

At a minimum, the predicate of the goal and all the predicates appearing in negative literals should be tabled with answer subsumption. However, it is usually better to table every predicate whose answers have multiple explanations and are going to be reused often.

4 Modeling Assumptions

PRISM makes the following modeling assumptions [22]:

1. the probability of a conjunction (A, B) is computed as the product of the probabilities of A and B (and-independence assumption);
2. the probability of a disjunction (A; B) is computed as the sum of the probabilities of A and B (or-exclusiveness assumption).

These assumptions can be stated more formally by referring to explanations. Given an explanation κ, let RV(κ) = {C_iθ_j | (C_i, θ_j, k) ∈ κ}. Given a set of explanations K, let $RV(K) = \bigcup_{\kappa \in K} RV(\kappa)$. Two sets of explanations, K1 and K2, are independent if RV(K1) ∩ RV(K2) = ∅ and exclusive if, ∀κ1 ∈ K1, κ2 ∈ K2, κ1 and κ2 are exclusive.

Assumption 1 means that, when deriving a covering set of explanations for a goal, the covering sets of explanations K_i and K_j for two ground subgoals in the body of a clause are independent.

Assumption 2 means that, when deriving a covering set of explanations for a goal, two covering sets of explanations K_i and K_j obtained for a ground subgoal h from two different clauses are exclusive. This implies that the atom h is derived using clauses that have mutually exclusive bodies, i.e., that their bodies are not both true in any world.

PRISM [21] and PITA(IND,EXC) [17] exploit these assumptions to speed up the computation. PITA(IND,EXC) differs from PITA in the definition of the one/1, zero/1, not/2, and/3, or/3 and equality/4 predicates, which now work on probabilities P rather than on BDDs. Their definitions are

zero(0).
one(1).
not(A, B) ← B is 1 - A.
and(A, B, C) ← C is A * B.
or(A, B, C) ← C is A + B.
equality(V, N, P, P).

The or-exclusiveness assumption can be replaced by

3. the probability of a disjunction (A; B) is computed as if A and B were independent (or-independence assumption).

This means that, when deriving a covering set of explanations for a goal, two covering sets of explanations K_i and K_j obtained for a ground subgoal h from two different clauses are independent. PITA(IND,EXC) can exploit this assumption by modifying the or/3 predicate in this way:

or(A, B, P) ← P is A + B - A * B.

We call the resulting system PITA(IND,IND). For example, two independent disjuncts with probabilities 0.4 and 0.5 yield 0.4 + 0.5 − 0.4 · 0.5 = 0.7, whereas the exclusiveness rule would give 0.9. The exclusiveness assumption for a conjunction of literals would mean that the conjunction is true in no world and thus always has probability 0, so it does not make sense to consider a PITA(EXC,_) system.

An example of a program satisfying Assumptions 1 and 3 is the following:

path(Node, Node).
path(Source, Target) : 0.3 ← edge(Source, Node), path(Node, Target).
edge(0, 1) : 0.3.
...

depending on the structure of the graph. For example, the graphs in Figures 2(a) and 2(b) respect these assumptions for the query path(0, 1). Similar graphs of increasing sizes can be obtained with the procedures presented in [1].
We call the first graph type a "lanes" graph and the second a "branches" graph. The graphs of the type of the one in Figure 2(c), called "parachutes" graphs, instead satisfy only Assumption 1 for the query path(0, 1).

All three types of graphs respect Assumption 1 because, when deriving the goal path(0, 1), paths are built incrementally starting from node 0 and adding one edge at a time with the second clause of the definition of path/2. Since the edge that is added does not appear in the following path, the assumption is respected.

Lanes and branches graphs respect Assumption 3 because, when deriving the goal path(0, 1), ground instantiations of the second path clause have path(i, 1) in the head and originate atomic choices of the form (C2, {Source/i, Target/1, Node/j}, 1). Explanations for path(i, 1) also contain atomic choices (E_ij, ∅, 1) for every fact edge(i, j) : 0.3 in the path. In the lanes graph each node except 0 and 1 lies on a single path, so the explanations for path(i, 1) do not share random variables. In the branches graphs, each explanation for path(i, 1) depends on a different set of edges. In the parachutes graph instead this is not true: for example, the path from 3 to 1 shares the edge from 3 to 1 with the path 2, 3, 1.

[Fig. 2. Graphs: (a) lanes, (b) branches, (c) parachutes.]

5 PITA(OPT)

PITA(OPT) differs from PITA because, before applying BDD logical operators between sets of explanations, it checks for the truth of the assumptions. If they hold, then simplified probability computations are used.

The data structures used to store probabilistic information in PITA(OPT) are pairs (P, T) where P is a real number representing a probability and T is a term built with the functor symbols zero/0, one/0, c/2, or/2, and/2, not/1 and the integers. If T is an integer, it represents a pointer to the root node of a BDD. If T is not an integer, it represents a Boolean expression in which terms of the form zero, one, c(Var, Val) and the integers are the base cases: c(Var, Val) indicates the equation Var = Val, while an integer indicates a BDD. In this way we are able to represent Boolean formulas by means of a BDD, by means of a Prolog term, or a combination thereof. The last case happens when an expression has been only partially converted to a BDD. For example,

or(and(0x94ba008, c(1,1)), not(c(2,3)))

represents the expression (B ∧ X_1 = 1) ∨ ¬(X_2 = 3), where B is the Boolean function represented by the BDD whose root node address in memory is 0x94ba008.

PITA(OPT) differs from PITA also in the definition of zero/1, one/1, not/2, and/3, or/3 and equality/4, which now work on pairs (P, T) rather than on BDDs. equality/4 is defined as

equality(V, N, P, (P, c(V, N))).

The one/1 and zero/1 predicates are defined as

zero((0, zero)).
one((1, one)).

The or/3 and and/3 predicates first check whether the independence or the exclusiveness assumption holds. If so, they update the value of the probability using the appropriate formula and return a compound term. If not, they "evaluate" the terms, turning them into BDDs, apply the corresponding BDD operation and return the resulting BDD together with the probability it represents:

or((PA, TA), (PB, TB), (PC, or(TA, TB))) ←
    ind(TA, TB), !, PC is PA + PB - PA * PB.
or((PA, TA), (PB, TB), (PC, or(TA, TB))) ←
    exc(TA, TB), !, PC is PA + PB.
or((_PA, TA), (_PB, TB), (PC, TC)) ←
    ev(TA, TA1), ev(TB, TB1), bdd_or(TA1, TB1, TC), ret_prob(TC, PC).
and((PA, TA), (PB, TB), (PC, and(TA, TB))) ←
    ind(TA, TB), !, PC is PA * PB.
and((_PA, TA), (_PB, TB), _) ←
    exc(TA, TB), !, fail.
and((_PA, TA), (_PB, TB), (PC, TC)) ←
    ev(TA, TA1), ev(TB, TB1), bdd_and(TA1, TB1, TC), ret_prob(TC, PC).

where ev/2 evaluates a term returning a BDD. The not/2 predicate is very simple: it complements the probability and returns a new term:

not((P, B), (P1, not(B))) ← !, P1 is 1 - P.

When checking for exclusiveness between two terms, if one of them is an integer, then the other is evaluated to obtain a BDD and the conjunction of the two BDDs is computed. If the result is equal to the 0 function, the terms are exclusive. Otherwise the predicate exc/2 recurses through the structure of the two terms. The following code defines exc/2:

exc(A, B) ← integer(A), !, ev(B, BB), bdd_and(A, BB, C), bdd_zero(Z), Z = C.
exc(A, B) ← integer(B), !, ev(A, AB), bdd_and(AB, B, C), bdd_zero(Z), Z = C.
exc(zero, _) ← !.
exc(_, zero) ← !.
exc(c(V, N), c(V, N1)) ← !, N \= N1.
exc(c(V, N), or(X, Y)) ← !, exc(c(V, N), X), exc(c(V, N), Y).
exc(c(V, N), and(X, Y)) ← !, (exc(c(V, N), X); exc(c(V, N), Y)).
exc(or(A, B), or(X, Y)) ← !, exc(A, X), exc(A, Y), exc(B, X), exc(B, Y).
exc(or(A, B), and(X, Y)) ← !, (exc(A, X); exc(A, Y)), (exc(B, X); exc(B, Y)).
exc(and(A, B), and(X, Y)) ← !, (exc(A, X); exc(A, Y); exc(B, X); exc(B, Y)).
exc(and(A, B), or(X, Y)) ← !, (exc(A, X); exc(B, X)), (exc(A, Y); exc(B, Y)).
exc(not(A), A) ← !.
exc(not(A), and(X, Y)) ← !, (exc(not(A), X); exc(not(A), Y)).
exc(not(A), or(X, Y)) ← !, exc(not(A), X), exc(not(A), Y).
exc(A, or(X, Y)) ← !, exc(A, X), exc(A, Y).
exc(A, and(X, Y)) ← exc(A, X); exc(A, Y).

In the test of independence between two terms, if one of them is a BDD, then the library function bdd_ind(B1, B2, I) is called. This function is implemented in C and uses the CUDD function Cudd_SupportIndex, which returns an array indicating which variables appear in a BDD (the support variables). bdd_ind(B1, B2, I) checks whether there is an intersection between the sets of support variables of B1 and B2 and returns I = 1 if the intersection is empty. If neither of the two terms is a BDD, then ind/2 visits the structure of the first term until it reaches either a BDD or an atomic choice. In the latter case it checks for the absence of the variable in the second term with the predicate absent/2. ind/2 is defined as:

ind(one, _) ← !.
ind(zero, _) ← !.
ind(_, one) ← !.
ind(_, zero) ← !.
ind(A, B) ← integer(A), !, ev(B, BB), bdd_ind(A, BB, I), I = 1.
ind(A, B) ← integer(B), !, ev(A, AB), bdd_ind(AB, B, I), I = 1.
ind(c(V, N), B) ← !, absent(V, B).
ind(or(X, Y), B) ← !, ind(X, B), ind(Y, B).
ind(and(X, Y), B) ← !, ind(X, B), ind(Y, B).
ind(not(A), B) ← ind(A, B).

absent(V, c(V1, N1)) ← !, V \= V1.
absent(V, or(X, Y)) ← !, absent(V, X), absent(V, Y).
absent(V, and(X, Y)) ← !, absent(V, X), absent(V, Y).
absent(V, not(A)) ← absent(V, A).

The predicates exc/2 and ind/2 define sufficient conditions for exclusion and independence respectively. The evaluation of a term, i.e., its transformation into a BDD, is defined as

ev(B, B) ← integer(B), !.
ev(zero, B) ← !, bdd_zero(B).
ev(one, B) ← !, bdd_one(B).
ev(c(V, N), B) ← !, bdd_equality(V, N, B).
ev(and(A, B), C) ← !, ev(A, BA), ev(B, BB), bdd_and(BA, BB, C).
ev(or(A, B), C) ← !, ev(A, BA), ev(B, BB), bdd_or(BA, BB, C).
ev(not(A), C) ← ev(A, BA), bdd_not(BA, C).
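To see how these predicates interact, consider the following illustrative goals (ours, not from the paper's test suite; they assume the clauses above are loaded and exercise only the cases that never touch a BDD, with probabilities chosen so that the arithmetic is exact):

% Disjunction of explanations over distinct variables: ind/2 succeeds
% (absent/2 verifies that variable 1 does not occur in c(2,1)), so the
% probability is combined arithmetically and no BDD is built.
?- or((0.5, c(1,1)), (0.5, c(2,1)), R).
R = (0.75, or(c(1,1), c(2,1))).   % 0.5 + 0.5 - 0.5*0.5

% Disjunction of exclusive explanations (same variable, different values):
% ind/2 fails, exc/2 succeeds, and the probabilities are simply summed.
?- or((0.5, c(1,1)), (0.25, c(1,2)), R).
R = (0.75, or(c(1,1), c(1,2))).   % 0.5 + 0.25

% Conjunction of exclusive explanations: the second clause of and/3 cuts
% and fails, reflecting that the conjunction is true in no world.
?- and((0.5, c(1,1)), (0.25, c(1,2)), R).
false.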
6 Experiments

In this section we compare PITA, PITA(IND,EXC), PITA(IND,IND), PITA(OPT), PRISM and ProbLog on a number of datasets.

We first consider the graphs of Figure 2 and the path program shown in Section 4. The execution times of PITA(OPT), PITA(IND,IND), PITA and ProbLog for graphs of increasing sizes are shown in Figure 3 for lanes, Figure 4 for branches and Figure 5 for parachutes graphs. Figure 3 clearly shows the advantage of the (IND,IND) modeling assumptions, which allow PITA(IND,IND) to achieve high speed and scalability. PITA(OPT) has lower performance but is still much better than PITA and ProbLog. Figure 4 again shows the good performance of PITA(IND,IND). Here PITA(OPT) is faster and more scalable than PITA and ProbLog as well.

[Fig. 3. Execution times on the lanes graphs (time in seconds, log scale, versus size N; PITA(OPT), PITA(IND,IND), PITA, ProbLog).]

[Fig. 4. Execution times on the branches graphs (same systems and axes as Fig. 3).]

The parachutes graphs do not respect the (IND,IND) modeling assumption, so PITA(IND,IND) has not been applied to this dataset. Figure 5 compares PITA, PITA(OPT) and ProbLog and shows that PITA(OPT) is the fastest and most scalable.

The blood type dataset [10] encodes the genetic inheritance of blood type in families of increasing size. In this dataset the (IND,EXC) assumption holds, so PRISM can also be used. Figure 6(a) shows the execution times of the algorithms. As can be seen, PITA(IND,EXC) is much faster and more scalable than PRISM, which exploits the same assumptions, and PITA is slightly faster than PITA(OPT).

The growing head dataset [10] contains propositional programs with heads of increasing size. In this dataset neither the (IND,EXC) nor the (IND,IND) assumptions hold, so we compare PITA(OPT) with PITA and ProbLog. Figure 6(b) shows that PITA(OPT) is faster than PITA for sizes larger than 12 and is able to solve 5 more programs.

The growing negated body dataset [10] contains propositional programs with bodies of increasing size. Also in this dataset neither the (IND,EXC) nor the (IND,IND) assumptions hold, so we compare PITA(OPT) with PITA and ProbLog.

[Fig. 5. Execution times on the parachutes graphs (PITA(OPT), PITA, ProbLog).]

[Fig. 6. (a) Execution times on the blood type dataset (PITA(OPT), PITA(IND,EXC), PITA, PRISM, ProbLog). (b) Execution times on the growing head dataset (PITA(OPT), PITA, ProbLog).]

Figure 7(a) shows an even larger improvement of PITA(OPT) with respect to PITA and ProbLog.

The UWCSE dataset [10] encodes a university domain. Also in this dataset neither the (IND,EXC) nor the (IND,IND) assumptions hold, so we compare PITA(OPT) with PITA and ProbLog. Figure 7(b) shows again a large improvement of PITA(OPT) with respect to PITA and ProbLog.

These experiments show that, if we know that the program respects either the (IND,EXC) or the (IND,IND) assumptions, using the corresponding algorithm gives the best results. If nothing is known about the program, PITA(OPT) is a good option, since it gives very good results on datasets where these assumptions hold for the whole program or for parts of it, while paying a limited penalty on datasets where these assumptions are completely false.
[Fig. 7. (a) Execution times on the growing negated body dataset (PITA(OPT), PITA, ProbLog). (b) Execution times on the UWCSE dataset (PITA(OPT), PITA, ProbLog).]

7 Conclusions

We have discussed how assumptions on the program can greatly simplify the computation of the probability of queries. When subgoals in the body of clauses are independent and the bodies of clauses for the same atom are independent as well, PITA(IND,IND) achieves high speed, being faster than general purpose inference algorithms. When it is not known whether these assumptions hold, PITA(OPT) can be used: it applies simplified probability computations when the corresponding assumptions hold for the program or for parts of it. In the future we plan to investigate other optimization techniques such as those presented in [2, 6].

References

1. Bragaglia, S., Riguzzi, F.: Approximate inference for logic programs with annotated disjunctions. In: International Conference on Inductive Logic Programming. LNAI, vol. 6489, pp. 30–37. Springer (2011)
2. Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., De Raedt, L.: Lifted probabilistic inference by first-order knowledge compilation. In: International Joint Conference on Artificial Intelligence. pp. 2178–2185. AAAI Press/IJCAI (2011)
3. Dantsin, E.: Probabilistic logic programs and their semantics. In: Russian Conference on Logic Programming. LNCS, vol. 592, pp. 152–164. Springer (1991)
4. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic Prolog and its application in link discovery. In: International Joint Conference on Artificial Intelligence. pp. 2462–2467 (2007)
5. De Raedt, L., Frasconi, P., Kersting, K., Muggleton, S. (eds.): Probabilistic Inductive Logic Programming - Theory and Applications. LNCS, vol. 4911. Springer (2008)
6. Fierens, D., Van den Broeck, G., Thon, I., Gutmann, B., De Raedt, L.: Inference in probabilistic logic programs using weighted CNFs. In: Conference on Uncertainty in Artificial Intelligence. pp. 211–220. AUAI Press (2011)
7. Fuhr, N.: Probabilistic Datalog: Implementing logical information retrieval for advanced applications. J. Am. Soc. Inf. Sci. 51(2), 95–110 (2000)
8. Hommersom, A., Lucas, P.: Generalising the interaction rules in probabilistic logic. In: International Joint Conference on Artificial Intelligence (2011)
9. Kimmig, A., Demoen, B., De Raedt, L., Costa, V.S., Rocha, R.: On the implementation of the probabilistic logic programming language ProbLog. Theor. Pract. Log. Prog. 11(2-3), 235–262 (2011)
10. Meert, W., Struyf, J., Blockeel, H.: CP-Logic theory inference with contextual variable elimination and comparison to BDD based inference methods. In: International Conference on Inductive Logic Programming. LNCS, vol. 5989, pp. 96–109. Springer (2009)
11. Poole, D.: The independent choice logic for modelling multiple agents under uncertainty. Artif. Intell. 94(1-2), 7–56 (1997)
12. Poole, D.: Abducing through negation as failure: stable models within the independent choice logic. J. Log. Program. 44(1-3), 5–35 (2000)
13. Riguzzi, F.: A top down interpreter for LPAD and CP-logic. In: Congress of the Italian Association for Artificial Intelligence. LNAI, vol. 4733, pp. 109–120. Springer (2007)
14. Riguzzi, F.: Extended semantics and inference for the Independent Choice Logic. Logic J. IGPL 17(6), 589–629 (2009)
15. Riguzzi, F.: SLGAD resolution for inference on Logic Programs with Annotated Disjunctions. Fundam. Inform. 102(3-4), 429–466 (2010)
16. Riguzzi, F., Swift, T.: Tabling and answer subsumption for reasoning on Logic Programs with Annotated Disjunctions. In: Technical Communications of the International Conference on Logic Programming. LIPIcs, vol. 7, pp. 162–171. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2010)
17. Riguzzi, F., Swift, T.: The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theor. Pract. Log. Prog., International Conference on Logic Programming Special Issue 11(4-5), 433–449 (2011)
18. Riguzzi, F., Swift, T.: Well-definedness and efficient inference for probabilistic logic programming under the distribution semantics. Theor. Pract. Log. Prog., Convegno Italiano di Logica Computazionale Special Issue (2012), to appear
19. Sang, T., Beame, P., Kautz, H.A.: Solving Bayesian networks by weighted model counting. In: National Conference on Artificial Intelligence. pp. 475–482. AAAI Press / The MIT Press (2005)
20. Sato, T.: A statistical learning method for logic programs with distribution semantics. In: International Conference on Logic Programming. pp. 715–729. MIT Press (1995)
21. Sato, T., Kameya, Y.: PRISM: A language for symbolic-statistical modeling. In: International Joint Conference on Artificial Intelligence. pp. 1330–1339 (1997)
22. Sato, T., Zhou, N.F., Kameya, Y., Izumi, Y.: PRISM User's Manual (Version 2.0) (2010)
23. Thayse, A., Davio, M., Deschamps, J.P.: Optimization of multivalued decision algorithms. In: International Symposium on Multiple-Valued Logic. pp. 171–178 (1978)
24. Valiant, L.G.: The complexity of enumeration and reliability problems. SIAM J. Comput. 8(3), 410–421 (1979)
25. Vennekens, J., Verbaeten, S., Bruynooghe, M.: Logic programs with annotated disjunctions. In: International Conference on Logic Programming. LNCS, vol. 3131, pp. 195–209. Springer (2004)