=Paper=
{{Paper
|id=Vol-1413/paper-06
|storemode=property
|title=The Distribution Semantics is Well-Defined for All Normal Programs
|pdfUrl=https://ceur-ws.org/Vol-1413/paper-06.pdf
|volume=Vol-1413
|dblpUrl=https://dblp.org/rec/conf/iclp/Riguzzi15
}}
==The Distribution Semantics is Well-Defined for All Normal Programs==
Fabrizio Riguzzi
Dipartimento di Matematica e Informatica, Università di Ferrara
Via Saragat 1, I-44122, Ferrara, Italy
fabrizio.riguzzi@unife.it
Abstract. The distribution semantics is an approach for integrating
logic programming and probability theory that underlies many languages
and has been successfully applied in many domains. When the program
has function symbols, the semantics was defined for special cases: either
the program has to be definite or the queries must have a finite number
of finite explanations. In this paper we show that it is possible to define
the semantics for all programs.
Keywords: Distribution Semantics, Function Symbols, ProbLog, Probabilistic
Logic Programming
1 Introduction
The distribution semantics [1, 2] was successfully applied in many domains and
underlies many languages that combine logic programming with probability the-
ory such as Probabilistic Horn Abduction, Independent Choice Logic, PRISM,
Logic Programs with Annotated Disjunctions and ProbLog.
The definition of the distribution semantics can be given quite simply in the
case of no function symbols in the program: a probabilistic logic program under
the distribution semantics defines a probability distribution over normal logic
programs called worlds and the probability of a ground query can be obtained
by marginalizing the joint distribution of the worlds and the query. When the
program has function symbols, however, this simple definition no longer works,
as the probability of each individual world is zero.
A definition of the distribution semantics for programs with function symbols
was proposed in [1, 3] but restricted to definite programs. The case of normal
programs was taken into account in [4] where the semantics required that the
programs are acyclic. A looser condition was proposed in [5] but still required
each goal to have a finite set of finite explanations.
In this paper we show that the distribution semantics can be defined for all
programs, thus also for programs that have goals with an infinite number of
possibly infinite explanations. We do so by adapting the definition of the well-
founded semantics in terms of iterated fixpoints of [6] to the case of ProbLog,
similarly to the way in which the TP operator has been adapted in [7] to the
case of stratified ProbLog programs using parameterized interpretations. In the
case of an infinite number of infinite explanations, we show that the probability of
queries is defined in the limit and that the limit always exists.
We consider the case of ProbLog but the results are equally applicable to all
other languages under the distribution semantics, as there are linear transfor-
mations from one language to another that preserve the semantics.
The paper is organized as follows. Section 2 presents preliminary material
on fixpoints and the well-founded semantics. Section 3 introduces the distribu-
tion semantics for programs without function symbols. Section 4 discusses the
definition of the distribution semantics with function symbols in the case of fi-
nite set of finite explanations. Section 5 represents the main contribution of this
paper and discusses the case of infinite set of infinite explanations. Finally, Sec-
tion 6 concludes the paper. The proofs of the main results are reported in the
Appendix.
2 Preliminaries
A relation on a set S is a partial order if it is reflexive, antisymmetric and
transitive. In the following, let S be a set with a partial order ≤. a ∈ S is an
upper bound of a subset X of S if x ≤ a for all x ∈ X. Similarly, b ∈ S is a lower
bound of X if b ≤ x for all x ∈ X.
a ∈ S is the least upper bound of a subset X of S if a is an upper bound
of X and, for all upper bounds a′ of X, we have a ≤ a′. Similarly, b ∈ S is the
greatest lower bound of a subset X of S if b is a lower bound of X and, for all
lower bounds b′ of X, we have b′ ≤ b. The least upper bound of X is unique, if
it exists, and is denoted by lub(X). Similarly, the greatest lower bound of X is
unique, if it exists, and is denoted by glb(X).
A partially ordered set L is a complete lattice if lub(X) and glb(X) exist for
every subset X of L. We let > denote the top element lub(L) and ⊥ denote the
bottom element glb(L) of the complete lattice L.
Let L be a complete lattice and T : L → L be a mapping. We say T is
monotonic if T (x) ≤ T (y), whenever x ≤ y. We say a ∈ L is the least fixpoint of
T if a is a fixpoint (that is, T (a) = a) and for all fixpoints b of T we have a ≤ b.
Similarly, we define greatest fixpoint.
Let L be a complete lattice and T : L → L be monotonic. Then we define
T ↑ 0 = ⊥; T ↑ α = T (T ↑ (α − 1)), if α is a successor ordinal; T ↑ α = lub({T ↑
β|β < α}), if α is a limit ordinal; T ↓ 0 = >; T ↓ α = T (T ↓ (α − 1)), if α is a
successor ordinal; T ↓ α = glb({T ↓ β|β < α}), if α is a limit ordinal.
Proposition 1. Let L be a complete lattice and T : L → L be monotonic. Then
T has a least fixpoint, lfp(T ), and a greatest fixpoint, gfp(T ).
A normal program P is a set of normal rules. A normal rule has the form
r = h ← b1 , . . . , bn , not c1 , . . . , not cm (1)
where h, b1 , . . . , bn , c1 , . . . , cm are atoms.
The set of ground atoms that can be built with the symbols of a program P
is called the Herbrand base and is denoted as BP .
A two-valued interpretation I is a subset of BP . I is the set of true atoms.
The set Int2 of two-valued interpretations for a program P forms a complete
lattice where the partial order ≤ is given by the subset relation ⊆. The least
upper bound and greatest lower bound are defined as lub(X) = ⋃_{I∈X} I and
glb(X) = ⋂_{I∈X} I. The bottom and top element are respectively ∅ and BP .
A three-valued interpretation I is a pair ⟨IT , IF ⟩ where IT and IF are subsets
of BP and represent respectively the set of true and false atoms. The union of
two three-valued interpretations ⟨IT , IF ⟩ and ⟨JT , JF ⟩ is defined as ⟨IT , IF ⟩ ∪
⟨JT , JF ⟩ = ⟨IT ∪ JT , IF ∪ JF ⟩. The intersection of two three-valued interpretations
⟨IT , IF ⟩ and ⟨JT , JF ⟩ is defined as ⟨IT , IF ⟩ ∩ ⟨JT , JF ⟩ = ⟨IT ∩ JT , IF ∩ JF ⟩.
The set Int3 of three-valued interpretations for a program P forms a complete
lattice where the partial order ≤ is defined as ⟨IT , IF ⟩ ≤ ⟨JT , JF ⟩ if IT ⊆ JT
and IF ⊆ JF . The least upper bound and greatest lower bound are defined
as lub(X) = ⋃_{I∈X} I and glb(X) = ⋂_{I∈X} I. The bottom and top element are
respectively ⟨∅, ∅⟩ and ⟨BP , BP ⟩.
The well-founded semantics (WFS) assigns a three-valued model to a pro-
gram, i.e., it identifies a three-valued interpretation as the meaning of the pro-
gram. The WFS was given in [8] in terms of the least fixpoint of an operator that
is composed of two sub-operators, one computing consequences and the other
computing unfounded sets. We give here the alternative definition of the WFS
of [6] that is based on a different iterated fixpoint.
Definition 1. For a normal program P , sets Tr and Fa of ground atoms, and
a 3-valued interpretation I we define
OpTrue^P_I (Tr ) = {a | a is not true in I; and there is a clause b ← l1 , ..., ln in P ,
a grounding substitution θ such that a = bθ and for every 1 ≤ i ≤ n either
li θ is true in I, or li θ ∈ Tr };
OpFalse^P_I (Fa) = {a | a is not false in I; and for every clause b ← l1 , ..., ln in P
and grounding substitution θ such that a = bθ there is some i (1 ≤ i ≤ n)
such that li θ is false in I or li θ ∈ Fa}.
In words, the operator OpTrue^P_I extends the interpretation I to add the new
atomic facts that can be derived from P knowing I, while OpFalse^P_I adds the new
negations of atomic facts that can be shown false in P by knowing I. OpTrue^P_I
and OpFalse^P_I are both monotonic [6], so they both have a least and a greatest
fixpoint. An iterated fixpoint operator builds up dynamic strata by constructing
successive three-valued interpretations as follows.
Definition 2 (Iterated Fixed Point). For a normal program P , let IFP_P :
Int3 → Int3 be defined as IFP_P (I) = I ∪ ⟨lfp(OpTrue^P_I ), gfp(OpFalse^P_I )⟩.
IFP_P is monotonic [6] and thus has a least fixed point lfp(IFP_P ). Moreover, the
well-founded model WFM (P ) of P is in fact lfp(IFP_P ). Let δ be the smallest
ordinal such that WFM (P ) = IFP_P ↑ δ. We refer to δ as the depth of P . The
stratum of atom a is the least ordinal β such that a ∈ IFP_P ↑ β (where a may
be either in the true or false component of IFP_P ↑ β). Undefined atoms of the
well-founded model do not belong to any stratum, i.e., they are not added to
IFP_P ↑ δ for any ordinal δ.
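As a concrete illustration of Definitions 1 and 2, the following Python sketch (the representation of ground rules and all names are ours, not from the paper) iterates the two sub-operators to their least and greatest fixpoints and accumulates the well-founded model of a finite ground program:

```python
# A ground rule is (head, positive_body, negative_body);
# atoms is the (finite) Herbrand base.

def well_founded_model(rules, atoms):
    true, false = set(), set()          # three-valued interpretation <IT, IF>
    while True:
        # lfp(OpTrue_I): atoms derivable from P given I
        # (negative literals are checked against the false atoms of I)
        tr = set()
        while True:
            new = {h for (h, pos, neg) in rules
                   if h not in true
                   and all(b in true or b in tr for b in pos)
                   and all(c in false for c in neg)}
            if new <= tr:
                break
            tr |= new
        # gfp(OpFalse_I): start from all remaining atoms and shrink; an atom
        # stays only if every rule for it has a body literal false in I or
        # presumed false (in Fa)
        fa = set(atoms) - false
        while True:
            keep = {a for a in fa
                    if all(any(b in false or b in fa for b in pos)
                           or any(c in true for c in neg)
                           for (h, pos, neg) in rules if h == a)}
            if keep == fa:
                break
            fa = keep
        if not tr and not fa:           # lfp(IFP_P) reached
            break
        true |= tr
        false |= fa
    return true, false, set(atoms) - true - false   # undefined atoms
```

On the program {a. b ← a. p ← not q. q ← not p. c ← not d.} this yields a, b, c true, d false, and p, q undefined, matching the well-founded semantics.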
3 The Distribution Semantics for Programs without
Function Symbols
We present the distribution semantics for the case of ProbLog [9] as it is the
language with the simplest syntax. A ProbLog program P is composed of a set
of normal rules R and a set F of probabilistic facts. Each probabilistic fact is of
the form pi :: ai where pi ∈ [0, 1] and ai is an atom, meaning that each ground
instantiation ai θ of ai is true with probability pi and false with probability
1 − pi . Each world is obtained by selecting or rejecting each grounding of all
probabilistic facts.
An atomic choice is the selection or not of a grounding F θ of a probabilistic
fact F . It is represented with the triple (F, θ, i) where i ∈ {0, 1}. A set κ of atomic
choices is consistent if it does not contain two atomic choices (F, θ, i) and (F, θ, j)
with i ≠ j (only one alternative is selected for a ground probabilistic fact).
The function consistent(κ) returns true if κ is consistent. A composite choice
κ is a consistent set of atomic choices. The probability of a composite choice κ
is P (κ) = ∏_{(Fi ,θ,1)∈κ} pi · ∏_{(Fi ,θ,0)∈κ} (1 − pi ) where pi is the probability of the i-th
probabilistic fact Fi . A selection σ is a total composite choice, i.e., it contains
one atomic choice for every grounding of each probabilistic fact. A world wσ is
a logic program that is identified by a selection σ. The world wσ is formed by
including the atom corresponding to each atomic choice (F, θ, 1) of σ.
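As a concrete reading of these definitions, the following sketch (the data layout and names are ours, not the paper's) computes the probability of a composite choice and checks consistency; an atomic choice is a (fact_id, theta, k) triple and probs maps each fact to its probability:

```python
def consistent(kappa):
    """True iff kappa never selects two alternatives for one ground fact."""
    seen = {}
    for fact, theta, k in kappa:
        if seen.setdefault((fact, theta), k) != k:
            return False
    return True

def prob_composite_choice(kappa, probs):
    """P(kappa) = prod over (F,theta,1) of p_F times prod over (F,theta,0) of 1 - p_F."""
    assert consistent(kappa)
    p = 1.0
    for fact, theta, k in kappa:
        p *= probs[fact] if k == 1 else 1.0 - probs[fact]
    return p
```

For instance, with hypothetical probabilities probs = {'F1': 0.3, 'F2': 0.6}, the composite choice {(F1, {X/0}, 1), (F2, ∅, 0)} gets probability 0.3 · 0.4 = 0.12.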
The probability of a world wσ is P (wσ ) = P (σ). Since in this section we
are assuming programs without function symbols, the set of groundings of each
probabilistic fact is finite, and so is the set of worlds WP . Accordingly, for a
ProbLog program P, WP = {w1 , . . . , wm }. Moreover, P (w) is a distribution over
worlds: ∑_{w∈WP} P (w) = 1. We call sound a program for which every world
has a two-valued well-founded model. We consider only sound programs, as the
uncertainty should be handled by the choices rather than by the semantics of
negation.
Let q be a query in the form of a ground atom. We define the conditional
probability of q given a world w as: P (q|w) = 1 if q is true in w and 0 otherwise.
Since the program is sound, q can only be true or false in a world. The probability
of q can thus be computed by summing out the worlds from the joint distribution
of the query and the worlds: P (q) = ∑_w P (q, w) = ∑_w P (q|w)P (w) = ∑_{w|=q} P (w).
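For programs without function symbols this marginalization can be carried out by brute-force enumeration of the finitely many worlds. The sketch below (helper names are ours; it assumes the worlds are definite programs, so a simple least-model computation decides the query) does exactly that:

```python
import itertools

def least_model(facts, rules):
    """Least Herbrand model of a ground definite program."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def prob_query(query, prob_facts, rules):
    """P(q) = sum over worlds w with w |= q of P(w).

    prob_facts: list of (atom, p) probabilistic facts;
    rules: list of (head, [body atoms]) definite rules."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(prob_facts)):
        w_prob, facts = 1.0, set()
        for (atom, p), k in zip(prob_facts, bits):
            w_prob *= p if k else 1.0 - p   # P(w) = P(sigma)
            if k:
                facts.add(atom)
        if query in least_model(facts, rules):
            total += w_prob                 # contributes to sum_{w |= q} P(w)
    return total
```

For example, with hypothetical probabilistic facts 0.4 :: u0 and 0.7 :: u1 and rules q ← u0 and q ← u1, the sketch returns P(q) = 1 − 0.6 · 0.3 = 0.82.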
4 The Distribution Semantics for Programs with
Function Symbols
When a program contains function symbols there is the possibility that its
grounding may be infinite. If so, the number of atomic choices in a selection that
defines a world is countably infinite and there is an uncountably infinite number
of worlds. In this case, the probability of each individual world is zero, since it is
the product of infinitely many numbers all smaller than one. So the semantics of
Section 3 is not well-defined.
Example 1. Consider the program
p(0) ← u(0). t ← ¬s. F1 = a :: u(X).
p(s(X)) ← p(X), u(X). s ← r, ¬q. F2 = b :: r.
q ← u(X).
The set of worlds is infinite and uncountable. In fact, each world can be put
in a one to one relation with a selection and a selection can be represented as
a countable sequence of atomic choices of which the first involves fact F2 , the
second F1 /{X/0}, the third F1 /{X/s(0)} and so on. The set of selections can be
shown uncountable by Cantor’s diagonal argument. Suppose the set of selections
is countable. Then the selections could be listed in order, suppose from top to
bottom. Suppose the atomic choices of each selection are listed from left to right.
We can pick a composite choice that differs from the first selection in the first
atomic choice (if (F2 , ∅, k) is the first atomic choice of the first selection, pick
(F2 , ∅, 1 − k)), from the second selection in the second atomic choice (similar to
the case of the first atomic choice) and so on. In this way we have obtained a
selection that is not present in the list, because it differs from each selection in
the list in at least one atomic choice. So it is not possible to list the selections
in order.
We now present the definition of the distribution semantics for programs with
function symbols following [4]. The semantics for a probabilistic logic program P
with function symbols is given by defining a probability measure µ over the set of
worlds WP . Informally, µ assigns a probability to a set of subsets of WP , rather
than to every element of (the infinite set) WP . The approach dates back to [10]
who defined a probability measure µ as a real-valued function whose domain is
a σ-algebra Ω on a set W called the sample space. Together hW, Ω, µi is called
a probability space.
Definition 3. [11, Section 3.1] The set Ω of subsets of W is a σ-algebra on
the set W iff (σ-1) W ∈ Ω; (σ-2) Ω is closed under complementation, i.e.,
ω ∈ Ω → (W \ ω) ∈ Ω; and (σ-3) Ω is closed under countable union, i.e., if
ωi ∈ Ω for i = 1, 2, . . . then ⋃_i ωi ∈ Ω.
The elements of Ω are called measurable sets. Importantly, for defining the dis-
tribution semantics for programs with function symbols, not every subset of W
need be present in Ω.
Definition 4. [10] Given a sample space W and a σ-algebra Ω of subsets of W,
a probability measure is a function µ : Ω → R that satisfies the following axioms:
(µ-1) µ(ω) ≥ 0 for all ω ∈ Ω; (µ-2) µ(W) = 1; (µ-3) µ is countably additive,
i.e., if O = {ω1 , ω2 , . . .} ⊆ Ω is a countable collection of pairwise disjoint sets,
then µ(⋃_i ωi ) = ∑_i µ(ωi ).
We first consider the finite additivity version of probability spaces. In this
weaker version, the σ-algebra is replaced by an algebra.
Definition 5. [11, Section 3.1] The set Ω of subsets of W is an algebra on the
set W iff it respects conditions (σ-1), (σ-2) and condition (a-3): Ω is closed
under finite union, i.e., ω1 ∈ Ω, ω2 ∈ Ω → (ω1 ∪ ω2 ) ∈ Ω.
The probability measure is replaced by a finitely additive probability measure.
Definition 6. Given a sample space W and an algebra Ω of subsets of W, a
finitely additive probability measure is a function µ : Ω → R that satisfies
axioms (µ-1) and (µ-2) of Definition 4 and axiom (m-3): µ is finitely additive,
i.e., ω1 ∩ ω2 = ∅ → µ(ω1 ∪ ω2 ) = µ(ω1 ) + µ(ω2 ) for all ω1 , ω2 ∈ Ω.
Towards defining a suitable algebra given a probabilistic logic program P, we
define the set of worlds ωκ compatible with a composite choice κ as ωκ = {wσ ∈
WP | κ ⊆ σ}. Thus a composite choice identifies a set of worlds. For programs
without function symbols, P (κ) = ∑_{w∈ωκ} P (w).
Given a set of composite choices K, the set of worlds ωK compatible with K
is ωK = ⋃_{κ∈K} ωκ . Two composite choices κ1 and κ2 are incompatible if their
union is not consistent. A set K of composite choices is pairwise incompatible if
for all κ1 ∈ K, κ2 ∈ K, κ1 ≠ κ2 implies that κ1 and κ2 are incompatible.
Regardless of whether a probabilistic logic program has a finite number of
worlds or not, obtaining pairwise incompatible sets of composite choices is an
important problem. This is because the probability of a pairwise incompatible
set K of composite choices is defined as P (K) = ∑_{κ∈K} P (κ), which is easily
computed. Two sets K1 and K2 of finite composite choices are equivalent if they
correspond to the same set of worlds: ωK1 = ωK2 .
One way to assign probabilities to a set K of composite choices is to construct
an equivalent set that is pairwise incompatible; such a set can be constructed
through the technique of splitting. More specifically, if F θ is an instantiated fact
and κ is a composite choice that does not contain an atomic choice (F, θ, k)
for any k, the split of κ on F θ is the set of composite choices Sκ,F θ = {κ ∪
{(F, θ, 0)}, κ∪{(F, θ, 1)}}. It is easy to see that κ and Sκ,F θ identify the same set
of possible worlds, i.e., that ωκ = ωSκ,F θ , and that Sκ,F θ is pairwise incompatible.
The technique of splitting composite choices on formulas is used for the following
result [12].
Theorem 1 (Existence of a pairwise incompatible set of composite choices [12]).
Given a finite set K of composite choices, there exists a finite set K′ of pairwise
incompatible composite choices such that K and K′ are equivalent.
Proof: Given a finite set of composite choices K, there are two possibilities to
form a new set K′ of composite choices so that K and K′ are equivalent:
1. removing dominated elements: if κ1 , κ2 ∈ K and κ1 ⊂ κ2 , let K′ =
K \ {κ2 }.
2. splitting elements: if κ1 , κ2 ∈ K are compatible (and neither is a superset
of the other), there is an (F, θ, k) ∈ κ1 \ κ2 . We replace κ2 by the split of κ2
on F θ. Let K′ = K \ {κ2 } ∪ Sκ2 ,F θ .
In both cases ωK = ωK′ . If we repeat these two operations until neither is
applicable we obtain a splitting algorithm that terminates, because K is a finite
set of composite choices. The resulting set K′ is pairwise incompatible and is
equivalent to the original set. ♦
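The proof above is effective and can be sketched directly in code (representation and names are ours): repeatedly drop dominated composite choices and split one member of a compatible pair, until the set is pairwise incompatible; the probability of the result is then just a sum of products.

```python
from math import prod

def consistent(kappa):
    """True iff no ground fact gets two different alternatives."""
    seen = {}
    for fact, theta, k in kappa:
        if seen.setdefault((fact, theta), k) != k:
            return False
    return True

def split(kappa, fact, theta):
    """Split kappa on F theta: the two results identify the same worlds as kappa."""
    return {frozenset(kappa | {(fact, theta, 0)}),
            frozenset(kappa | {(fact, theta, 1)})}

def make_pairwise_incompatible(K):
    """Equivalent pairwise-incompatible set (operations 1 and 2 of the proof)."""
    K = {frozenset(k) for k in K}
    while True:
        dominated = {k2 for k1 in K for k2 in K if k1 < k2}
        if dominated:                     # operation 1: remove dominated elements
            K -= dominated
            continue
        pair = next(((k1, k2) for k1 in K for k2 in K
                     if k1 != k2 and consistent(k1 | k2)), None)
        if pair is None:                  # neither operation applies: done
            return K
        k1, k2 = pair                     # operation 2: split k2 on some F theta
        fact, theta, _ = next(iter(k1 - k2))
        K = (K - {k2}) | split(k2, fact, theta)

def prob(K, probs):
    """P(K) for a pairwise-incompatible set of finite composite choices."""
    return sum(prod(probs[f] if k else 1 - probs[f] for (f, _, k) in kappa)
               for kappa in K)
```

For instance, with hypothetical probabilities p1 = 0.3, p2 = 0.6, the set K = {{(F1, ∅, 1)}, {(F2, ∅, 1)}} splits into a pairwise incompatible set with P(K) = 1 − 0.7 · 0.4 = 0.72.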
Theorem 2 (Equivalence of the probability of two equivalent pairwise incom-
patible finite sets of finite composite choices [13]). If K1 and K2 are both pairwise
incompatible finite sets of finite composite choices such that they are equivalent,
then P (K1 ) = P (K2 ).
For a probabilistic logic program P, we can thus define a unique probability
measure µ : ΩP → [0, 1] where ΩP is defined as the set of sets of worlds identified
by finite sets of finite composite choices: ΩP = {ωK | K is a finite set of finite
composite choices}. ΩP is an algebra over WP since WP = ωK with K = {∅}.
Moreover, the complement ω^c_K of ωK , where K is a finite set of finite composite
choices, is ωK̄ where K̄ is a finite set of finite composite choices. In fact, K̄ can
be obtained with the function duals(K) of [12] that performs Reiter's hitting set
algorithm over K, generating an element κ of K̄ by picking an atomic choice
(F, θ, k) from each element of K and inserting in κ the atomic choice (F, θ, 1 − k).
After this process is performed in all possible ways, inconsistent sets of atomic
choices are removed, obtaining K̄. Since the possible choices of the atomic choices
are finite, so is K̄. Finally, condition (a-3) holds since the union of ωK1 with ωK2
is equal to ωK1 ∪K2 by the definition of ωK .
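The duals(K) construction just described can be sketched as follows (our names; atomic choices are again (fact, substitution, k) triples): every way of picking one atomic choice per element of K and flipping it yields a candidate element of the dual, and inconsistent candidates are discarded.

```python
import itertools

def consistent(kappa):
    """True iff no ground fact gets two different alternatives."""
    seen = {}
    for fact, theta, k in kappa:
        if seen.setdefault((fact, theta), k) != k:
            return False
    return True

def duals(K):
    """Hitting-set dual of K: omega_{duals(K)} is the complement of omega_K."""
    result = set()
    for picks in itertools.product(*[sorted(kappa) for kappa in K]):
        dual = frozenset((f, th, 1 - k) for (f, th, k) in picks)
        if consistent(dual):          # inconsistent sets are removed
            result.add(dual)
    return result
```

For example, if K = {{(F1, ∅, 1)}, {(F2, ∅, 1)}} (worlds selecting F1 or F2), the dual is {{(F1, ∅, 0), (F2, ∅, 0)}}: the worlds selecting neither.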
The corresponding measure µ is defined by µ(ωK ) = P (K′ ) where K′ is a
pairwise incompatible set of composite choices equivalent to K. ⟨WP , ΩP , µ⟩ is a
finitely additive probability space according to Definition 6 because µ(ω{∅} ) = 1,
µ(ωK ) ≥ 0 for all K and, if ωK1 ∩ ωK2 = ∅ and K′1 (K′2 ) is pairwise incompatible
and equivalent to K1 (K2 ), then K′1 ∪ K′2 is pairwise incompatible and
µ(ωK1 ∪ ωK2 ) = ∑_{κ∈K′1 ∪K′2} P (κ) = ∑_{κ1 ∈K′1} P (κ1 ) + ∑_{κ2 ∈K′2} P (κ2 ) = µ(ωK1 ) + µ(ωK2 ).
Given a query q, a composite choice κ is an explanation for q if ∀w ∈ ωκ : w |= q.
A set K of composite choices is covering wrt q if every world in which q is true
belongs to ωK .
Definition 7. For a probabilistic logic program P, the probability of a ground
atom q is given by P (q) = µ({w|w ∈ WP , w |= q}).
If q has a finite set K of finite explanations such that K is covering then {w | w ∈
WP ∧ w |= q} = ωK ∈ ΩP and we say that P (q) is finitely well-defined for the
distribution semantics. A program P is finitely well-defined if the probability of
all ground atoms in the grounding of P is finitely well-defined.
Example 2. Consider the program of Example 1. The set K = {κ} with κ =
{(F1 , {X/0}, 1), (F1 , {X/s(0)}, 1)} is a pairwise incompatible finite set of finite
explanations that is covering for the query p(s(0)). Definition 7 therefore
applies, and P (p(s(0))) = P (κ) = a².
5 Infinite Covering Set of Explanations
In this section we go beyond [4] and we remove the requirement of the finiteness
of the covering set of explanations and of each explanation for a query q.
Example 3. In Example 1, the query s has the pairwise incompatible covering
set of explanations K^s = {κ^s_0 , κ^s_1 , . . .} with
κ^s_i = {(F2 , ∅, 1), (F1 , {X/0}, 1), . . . , (F1 , {X/s^{i−1}(0)}, 1), (F1 , {X/s^i(0)}, 0)}
where s^i(0) is the term where the functor s is applied i times to 0. So K^s is
countably infinite. A covering set of explanations for t is K^t = {{(F2 , ∅, 0)}, κ^t }
where κ^t is the infinite composite choice
κ^t = {(F2 , ∅, 1), (F1 , {X/0}, 1), (F1 , {X/s(0)}, 1), . . .}
For a probabilistic logic program P, we can define the probability measure
µ : ΩP → [0, 1] where ΩP is defined as the set of sets of worlds identified
by countable sets of countable composite choices: ΩP = {ωK |K is a countable
set of countable composite choices }.
Lemma 3. ΩP is a σ-algebra over WP .
Proof: (σ-1) is true as in the algebra case. To see that the complement ω^c_K of
ωK is in ΩP , let us prove by induction that the dual K̄ of K is a countable
set of countable composite choices and then that ω^c_K = ωK̄ . In the base case,
if K1 = {κ1 }, then we can obtain K̄1 by picking each atomic choice (F, θ, k) of
κ1 and inserting in K̄1 the composite choice {(F, θ, 1 − k)}. As there is a finite
or countable number of atomic choices in κ1 , K̄1 is a finite or countable set of
composite choices, each with one atomic choice.
In the inductive case, assume that Kn−1 = {κ1 , . . . , κn−1 } and that K̄n−1
is a finite or countable set of composite choices. Let Kn = Kn−1 ∪ {κn } and
K̄n−1 = {κ′1 , κ′2 , . . .}. We can obtain K̄n by picking each κ′i and each atomic
choice (F, θ, k) of κn . If (F, θ, k) ∈ κ′i , then discard κ′i ; else if (F, θ, k′ ) ∈ κ′i
with k′ ≠ k, insert κ′i in K̄n . Otherwise generate the composite choice κ′′i =
κ′i ∪ {(F, θ, 1 − k)} and insert it in K̄n . Doing this for all atomic choices
(F, θ, k) in κn generates a finite set of composite choices if κn is finite and a
countable number if κn is countable. Doing this for all κ′i we obtain that K̄n is a
countable union of countable sets, which is a countable set [14, page 3]. ω^c_K = ωK̄
because all composite choices of K̄ are incompatible with each world of ωK , as
they are incompatible with each composite choice of K. So ω^c_K ∈ ΩP . (σ-3) is
true as in the algebra case. ♦
We can see K as limn→∞ Kn where Kn = {κ1 , . . . , κn }. Each Kn is a finite
set of composite choices and we can compute an equivalent finite pairwise incom-
patible set of composite choices K′n . For each K′n we can compute the probability
P (K′n ), noting that the probability of infinite composite choices is 0.
Now consider limn→∞ P (K′n ). We can see the P (K′n )s as the partial sums
of a series. Moreover, it can be shown that P (K′n−1 ) ≤ P (K′n ), so the series
has non-negative terms. Such a series converges if the sequence of partial sums
is bounded from above [15, page 92]. Since P (K′n ) is bounded by 1, the limit
limn→∞ P (K′n ) exists. So we can define the measure µ as µ(ωK ) = limn→∞ P (K′n ).
Theorem 4. ⟨WP , ΩP , µ⟩ is a probability space according to Definition 4.
Proof: (µ-1) and (µ-2) hold as for the finite case. For (µ-3), let O =
{ωL1 , ωL2 , . . .} be a countable set of subsets of ΩP such that the ωLi s are pair-
wise disjoint. Let L′i be the pairwise incompatible set equivalent to Li and let
L be ⋃_{i=1}^∞ L′i . Since the ωLi s are pairwise disjoint, L is pairwise incom-
patible. ΩP is a σ-algebra, so L is countable. Let L be {κ1 , κ2 , . . .} and let
K′n be {κ1 , . . . , κn }. Then µ(O) = limn→∞ P (K′n ) = limn→∞ ∑_{κ∈K′n} P (κ) =
∑_{κ∈L} P (κ). Since L = ⋃_{i=1}^∞ L′i , by rearranging the terms in the last summation
we get µ(O) = ∑_{κ∈L} P (κ) = ∑_{n=1}^∞ P (L′n ) = ∑_{n=1}^∞ µ(ωLn ). ♦
For a probabilistic logic program P, the probability of a ground atom q is
again given by P (q) = µ({w|w ∈ WP , w |= q}). If q has a countable set K of
explanations such that K is covering then {w|w ∈ WP ∧ w |= q} = ωK ∈ ΩP
and we say that P (q) is well-defined for the distribution semantics. A program
P is well-defined if the probability of all ground atoms in the grounding of P is
well-defined.
Example 4. Consider Example 3. Since the explanations in K^s are pairwise in-
compatible, the probability of s can be computed as
P (s) = b(1 − a) + ba(1 − a) + ba²(1 − a) + . . . = b(1 − a)/(1 − a) = b,
since the sum is a geometric series. K^t is also pairwise incompatible and P (κ^t ) =
0, so P (t) = 1 − b + 0 = 1 − b, which is what we intuitively expect.
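With hypothetical parameter values a = 0.7 and b = 0.4 (ours, chosen for illustration), the convergence of the partial sums P(K′_n) to b can be checked numerically:

```python
# Partial sums of the series b(1-a) + b*a*(1-a) + b*a^2*(1-a) + ...
# Each term is P(kappa^s_i) = b * a^i * (1 - a).
a, b = 0.7, 0.4
partial_sums = []
total = 0.0
for i in range(200):
    total += b * (a ** i) * (1 - a)
    partial_sums.append(total)

# the sequence is non-decreasing and bounded by 1, hence convergent
assert all(x <= y for x, y in zip(partial_sums, partial_sums[1:]))
print(abs(partial_sums[-1] - b))   # practically zero: the limit is b
```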
We now want to show that every program has a countable set of countable expla-
nations that is covering for each query. In the following, we consider only ground
programs; they may however be countably infinite, and thus they can be the result
of grounding a program with function symbols.
Given two sets of composite choices K1 and K2 , define the conjunction K1 ⊗
K2 of K1 and K2 as K1 ⊗ K2 = {κ1 ∪ κ2 | κ1 ∈ K1 , κ2 ∈ K2 , consistent(κ1 ∪ κ2 )}.
Similarly to [7], we define parametrized interpretations and an IFPC_P oper-
ator. Differently from [7], here parametrized interpretations associate a set of
composite choices instead of a Boolean formula to each atom.
Definition 8 (Parameterized two-valued interpretations). A parameter-
ized positive two-valued interpretation Tr of a ground probabilistic logic program
P with atoms BP is a set of pairs (a, Ka ) with a ∈ BP and Ka a set
of composite choices. A parameterized negative two-valued interpretation Fa of
a ground probabilistic logic program P with atoms BP is a set of pairs (a, K¬a )
with a ∈ BP and K¬a a set of composite choices.
Parametrized two-valued interpretations form a complete lattice where the par-
tial order is defined as I ≤ J if ∀(a, Ka ) ∈ I, (a, La ) ∈ J : ωKa ⊆ ωLa . The
least upper bound and greatest lower bound always exist and are lub(X) =
{(a, ⋃_{(a,Ka )∈I,I∈X} Ka ) | a ∈ BP } and glb(X) = {(a, ⊗_{(a,Ka )∈I,I∈X} Ka ) | a ∈ BP }.
The top element ⊤ is {(a, {∅}) | a ∈ BP } and the bottom element ⊥ is {(a, ∅) | a ∈
BP }.
Definition 9 (Parameterized three-valued interpretation). A parameter-
ized three-valued interpretation I of a ground probabilistic logic program P with
atoms BP is a set of triples (a, Ka , K¬a ) with a ∈ BP and Ka and K¬a sets of
composite choices.
Parametrized three-valued interpretations form a complete lattice where the par-
tial order is defined as I ≤ J if ∀(a, Ka , K¬a ) ∈ I, (a, La , L¬a ) ∈ J : ωKa ⊆ ωLa
and ωK¬a ⊆ ωL¬a . The least upper bound and greatest lower bound always exist
and are lub(X) = {(a, ⋃_{(a,Ka ,K¬a )∈I,I∈X} Ka , ⋃_{(a,Ka ,K¬a )∈I,I∈X} K¬a ) | a ∈ BP }
and glb(X) = {(a, ⊗_{(a,Ka ,K¬a )∈I,I∈X} Ka , ⊗_{(a,Ka ,K¬a )∈I,I∈X} K¬a ) | a ∈ BP }. The
top element ⊤ is {(a, {∅}, {∅}) | a ∈ BP }, the bottom element ⊥ is {(a, ∅, ∅) | a ∈
BP }.
Definition 10. For a ground program P, a two-valued parametrized positive
interpretation Tr with pairs (a, La ), a two-valued parametrized negative inter-
pretation Fa with pairs (a, M¬a ) and a three-valued parametrized interpretation
I with triples (a, Ka , K¬a ), we define OpTrueC^P_I (Tr ) = {(a, L′a ) | a ∈ BP } where
L′a = {{(a, ∅, 1)}} if a ∈ F
L′a = ⋃_{a←b1 ,...,bn ,¬c1 ,...,¬cm ∈R} ((Lb1 ∪ Kb1 ) ⊗ . . . ⊗ (Lbn ∪ Kbn ) ⊗ K¬c1 ⊗ . . . ⊗ K¬cm ) if a ∈ BP \ F
and OpFalseC^P_I (Fa) = {(a, M′¬a ) | a ∈ BP } where
M′¬a = {{(a, ∅, 0)}} if a ∈ F
M′¬a = ⊗_{a←b1 ,...,bn ,¬c1 ,...,¬cm ∈R} ((M¬b1 ⊗ K¬b1 ) ∪ . . . ∪ (M¬bn ⊗ K¬bn ) ∪ Kc1 ∪ . . . ∪ Kcm ) if a ∈ BP \ F
Proposition 5. OpTrueC^P_I and OpFalseC^P_I are monotonic.
Since OpTrueC^P_I and OpFalseC^P_I are monotonic, they have a least fixpoint and
a greatest fixpoint.
Definition 11 (Iterated Fixed Point). For a ground program P, let IFPC^P
be defined as IFPC^P (I) = {(a, Ka , K¬a ) | (a, Ka ) ∈ lfp(OpTrueC^P_I ), (a, K¬a ) ∈
lfp(OpFalseC^P_I )}.
Proposition 6. IFPC^P is monotonic.
So IFPC^P has a least fixpoint. Let WFMC (P) denote lfp(IFPC^P ), and let δ
be the smallest ordinal such that IFPC^P ↑ δ = WFMC (P). We refer to δ as the
depth of P.
Theorem 7. For a ground probabilistic logic program P with atoms BP , let K^α_a
and K^α_¬a be the formulas associated with atom a in IFPC^P ↑ α. For every atom
a and total choice σ, there is an iteration α0 such that for all α > α0 we have:
wσ ∈ ω_{K^α_a} ↔ WFM (wσ ) |= a    wσ ∈ ω_{K^α_¬a} ↔ WFM (wσ ) |= ¬a
Theorem 8. For a ground probabilistic logic program P, let K^α_a and K^α_¬a be the
formulas associated with atom a in IFPC^P ↑ α. For every atom a and every
iteration α, K^α_a and K^α_¬a are countable sets of countable composite choices.
So every query for every program has a countable set of countable explanations
that is covering, and the probability measure is well-defined. Moreover, since the
program is sound, for all atoms a, ω_{K^δ_a} = ω^c_{K^δ_¬a} where δ is the depth of the
program, as in each world a is either true or false.
5.1 Comparison with Sato and Kameya’s Definition
Sato and Kameya [3] define the distribution semantics for definite programs.
They build a probability measure on the sample space WP from a collection
of finite distributions. Let F be {F1 , F2 , . . .} and let Xi be a random variable
associated to Fi whose domain is {0, 1}.
The finite distributions P^(n)_P (X1 = k1 , . . . , Xn = kn ) for n ≥ 1 must be such
that
0 ≤ P^(n)_P (X1 = k1 , . . . , Xn = kn ) ≤ 1
∑_{k1 ,...,kn} P^(n)_P (X1 = k1 , . . . , Xn = kn ) = 1
∑_{kn+1} P^(n+1)_P (X1 = k1 , . . . , Xn+1 = kn+1 ) = P^(n)_P (X1 = k1 , . . . , Xn = kn )
(2)
The last equation is called the compatibility condition. It can be proved [16] from
the compatibility condition that there exists a probability space (WP , ΨP , η)
where η is a probability measure on ΨP , the minimal σ-algebra containing open
sets of WP , such that for any n,
η(X1 = k1 , . . . , Xn = kn ) = P^(n)_P (X1 = k1 , . . . , Xn = kn ). (3)
P^(n)_P (X1 = k1 , . . . , Xn = kn ) is defined as P^(n)_P (X1 = k1 , . . . , Xn = kn ) =
p1 . . . pn where pi is the annotation of alternative ki in fact Fi . This definition
clearly satisfies the properties in (2). P^(n)_P (X1 = k1 , . . . , Xn = kn ) is then ex-
tended to a probability measure over BP .
We conjecture that this definition of the distribution semantics with function
symbols coincides, for definite programs, with the one given above.
To show that the two definitions coincide, we conjecture that ΨP = ΩP .
Moreover, X1 = k1 , . . . , Xn = kn is equivalent to the set of composite choices
K = {{(F1 , ∅, k1 ), . . . , (Fn , ∅, kn )}} and µ(ωK ) gives p1 . . . pn , which satisfies
equation (3).
6 Conclusions
We have presented a definition of the distribution semantics in terms of an
iterated fixpoint operator that allowed us to prove that the semantics is well
defined for all programs. The operator we have presented is also interesting from
an inference point of view, as it can be used for forward inference similarly to
[7].
References
1. Poole, D.: Logic programming, abduction and probability - a top-down anytime
algorithm for estimating prior and posterior probabilities. New Gen. Comp. 11
(1993) 377–400
2. Sato, T.: A statistical learning method for logic programs with distribution se-
mantics. In: International Conference on Logic Programming. (1995) 715–729
3. Sato, T., Kameya, Y.: Parameter learning of logic programs for symbolic-statistical
modeling. J. Artif. Intell. Res. 15 (2001) 391–454
4. Poole, D.: The Independent Choice Logic for modelling multiple agents under
uncertainty. Artif. Intell. 94 (1997) 7–56
5. Riguzzi, F., Swift, T.: Terminating evaluation of logic programs with finite three-
valued models. ACM T. Comput. Log. 15 (2014) 32:1–32:38
6. Przymusinski, T.: Every logic program has a natural stratification and an iterated
least fixed point model. In: ACM Conference on Principles of Database Systems.
(1989) 11–21
7. Vlasselaer, J., Van den Broeck, G., Kimmig, A., Meert, W., De Raedt, L.: Anytime
inference in probabilistic logic programs with Tp-compilation. In: International Joint
Conference on Artificial Intelligence. (2015) 1852–1858
8. Van Gelder, A., Ross, K.A., Schlipf, J.S.: The well-founded semantics for general
logic programs. J. ACM 38 (1991) 620–650
9. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic Prolog and
its application in link discovery. In: International Joint Conference on Artificial
Intelligence. (2007) 2462–2467
10. Kolmogorov, A.N.: Foundations of the Theory of Probability. Chelsea Publishing
Company, New York (1950)
11. Srivastava, S.: A Course on Borel Sets. Graduate Texts in Mathematics. Springer
(2013)
12. Poole, D.: Abducing through negation as failure: stable models within the inde-
pendent choice logic. J. Logic Program. 44 (2000) 5–35
13. Poole, D.: Probabilistic Horn abduction and Bayesian networks. Artif. Intell. 64
(1993)
14. Cohn, P.: Basic Algebra: Groups, Rings, and Fields. Springer (2003)
15. Brannan, D.: A First Course in Mathematical Analysis. Cambridge University
Press (2006)
16. Chow, Y., Teicher, H.: Probability Theory: Independence, Interchangeability, Mar-
tingales. Springer Texts in Statistics. Springer (2012)
A Proofs of Theorems
Proposition 5. OpTrueC_I^P and OpFalseC_I^P are monotonic.

Proof: Let us consider OpTrueC_I^P. We have to prove that if Tr_1 ≤ Tr_2 then OpTrueC_I^P(Tr_1) ≤ OpTrueC_I^P(Tr_2). Tr_1 ≤ Tr_2 means that ∀(a, L_a) ∈ Tr_1, (a, M_a) ∈ Tr_2 : L_a ⊆ M_a. Let (a, L'_a) be the elements of OpTrueC_I^P(Tr_1) and (a, M'_a) the elements of OpTrueC_I^P(Tr_2). We have to prove that L'_a ⊆ M'_a. If a ∈ F then L'_a = M'_a = {{(a, θ, 1)}}. If a ∈ B_P \ F, then L'_a and M'_a have the same structure. Since ∀b ∈ B_P : L_b ⊆ M_b, then L'_a ⊆ M'_a.
We can prove similarly that OpFalseC_I^P is monotonic. ♦
Proposition 6. IFPC^P is monotonic.

Proof: We have to prove that if I_1 ≤ I_2 then IFPC^P(I_1) ≤ IFPC^P(I_2). I_1 ≤ I_2 means that ∀(a, L_a, L_¬a) ∈ I_1, (a, M_a, M_¬a) ∈ I_2 : L_a ⊆ M_a, L_¬a ⊆ M_¬a. Let (a, L'_a, L'_¬a) be the elements of IFPC^P(I_1) and (a, M'_a, M'_¬a) the elements of IFPC^P(I_2). We have to prove that L'_a ⊆ M'_a and L'_¬a ⊆ M'_¬a. This follows from the monotonicity of OpTrueC^P_{I_1} and OpFalseC^P_{I_2} in I_1 and I_2 respectively, which can be proved as in Proposition 5. ♦
Lemma 9. For a ground probabilistic logic program P with probabilistic facts F, rules R and atoms B_P, let L_a^α be the formula associated with atom a in OpTrueC_I^P ↑ α. For every atom a, total choice σ and iteration α, we have:

w_σ ∈ ω_{L_a^α} → WFM(w_σ|I) |= a

where w_σ|I is obtained by adding to w_σ as facts the atoms a for which (a, K_a, K_¬a) ∈ I and w_σ ∈ ω_{K_a}, and by removing all the rules with a in the head for which (a, K_a, K_¬a) ∈ I and w_σ ∈ ω_{K_¬a}.

Proof: Let us prove the lemma by transfinite induction: assume the thesis for all β < α and let us prove it for α. If α is a successor ordinal, then the thesis is easily verified for a ∈ F. Otherwise assume w_σ ∈ ω_{L_a^α} where

L_a^α = ⋃_{a ← b_1,…,b_n,¬c_1,…,c_m ∈ R} ((L_{b_1}^{α−1} ∪ K_{b_1}) ⊗ … ⊗ (L_{b_n}^{α−1} ∪ K_{b_n}) ⊗ K_{¬c_1} ⊗ … ⊗ K_{¬c_m})

This means that there is a rule a ← b_1, …, b_n, ¬c_1, …, c_m ∈ R such that w_σ ∈ ω_{L_{b_i}^{α−1} ∪ K_{b_i}} for i = 1, …, n and w_σ ∈ ω_{K_{¬c_j}} for j = 1, …, m. By the inductive assumption and because of how w_σ|I is built, WFM(w_σ|I) |= b_i and WFM(w_σ|I) |= ¬c_j, so WFM(w_σ|I) |= a.
If α is a limit ordinal, then

L_a^α = lub({L_a^β | β < α}) = ⋃_{β<α} L_a^β

If w_σ ∈ ω_{L_a^α} then there must exist a β < α such that w_σ ∈ ω_{L_a^β}. By the inductive assumption the thesis holds. ♦
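The construction of L_a^α combines, for each rule, one explanation per body atom. For intuition only, here is a minimal Python sketch of that combination step on a hypothetical definite ground program (negation, the K_¬c sets, and transfinite iterations are omitted): each atom is mapped to a set of explanations, an explanation being a set of probabilistic facts, and the ⊗ step unions one explanation per body atom.

```python
# Toy definite ground program (hypothetical): rules as (head, body) pairs.
facts = {"f1", "f2"}                                    # probabilistic facts
rules = [("a", ["f1"]), ("a", ["b"]), ("b", ["f1", "f2"])]

# Each atom maps to a set of explanations (frozensets of facts).
expl = {x: set() for x in facts | {h for h, _ in rules}}
for f in facts:
    expl[f] = {frozenset([f])}

changed = True
while changed:                                          # least fixpoint iteration
    changed = False
    for head, body in rules:
        # The (x) operation: union one explanation per body atom.
        combos = [frozenset()]
        for b in body:
            combos = [c | e for c in combos for e in expl[b]]
        for c in combos:
            if c not in expl[head]:
                expl[head].add(c)
                changed = True

print(expl["a"])
```

Here atom a ends up with two explanations, {f1} via the first rule and {f1, f2} via b, mirroring how each rule contributes a combination of its body atoms' explanations.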
Lemma 10. For a ground probabilistic logic program P with probabilistic facts F, rules R and atoms B_P, let M_¬a^α be the set of composite choices associated with atom a in OpFalseC_I^P ↓ α. For every atom a, total choice σ and iteration α, we have:

w_σ ∈ ω_{M_¬a^α} → WFM(w_σ|I) |= ¬a

where w_σ|I is built as in Lemma 9.

Proof: Similar to the proof of Lemma 9. ♦
Lemma 11. For a ground probabilistic logic program P with probabilistic facts F, rules R and atoms B_P, let K_a^α and K_¬a^α be the formulas associated with atom a in IFPC^P ↑ α. For every atom a, total choice σ and iteration α, we have:

w_σ ∈ ω_{K_a^α} → WFM(w_σ) |= a    (4)
w_σ ∈ ω_{K_¬a^α} → WFM(w_σ) |= ¬a    (5)

Proof: Let us first prove that for all α, WFM(w_σ) = WFM(w_σ|IFPC^P ↑ α). We can prove it by transfinite induction. Consider the case of α a successor ordinal and consider an atom b. If w_σ ∉ ω_{K_b^α} and w_σ ∉ ω_{K_¬b^α}, then the rules for b in w_σ and w_σ|IFPC^P ↑ α are the same. If w_σ ∈ ω_{K_b^α}, then b is a fact in w_σ|IFPC^P ↑ α but, according to Lemma 9, WFM(w_σ|IFPC^P ↑ (α−1)) |= b. By the inductive hypothesis WFM(w_σ) |= b, so b has the same truth value in WFM(w_σ) and WFM(w_σ|IFPC^P ↑ α). Similarly, if w_σ ∈ ω_{K_¬b^α}, then WFM(w_σ) |= ¬b and b has the same truth value in WFM(w_σ) and WFM(w_σ|IFPC^P ↑ α). So overall WFM(w_σ) = WFM(w_σ|IFPC^P ↑ α).
If α is a limit ordinal, then K_b^α = ⋃_{β<α} K_b^β and K_¬b^α = ⋃_{β<α} K_¬b^β. So if w_σ ∈ ω_{K_b^α} there is a β such that w_σ ∈ ω_{K_b^β}, and by the inductive hypothesis WFM(w_σ) |= b, so b has the same truth value in WFM(w_σ) and WFM(w_σ|IFPC^P ↑ α). Similarly if w_σ ∈ ω_{K_¬b^α}.
We can now prove the lemma by transfinite induction. Consider the case of α a successor ordinal. Since (a, K_a^α) ∈ lfp(OpTrueC^P_{IFPC^P ↑ (α−1)}), by Lemma 9

w_σ ∈ ω_{K_a^α} → WFM(w_σ|IFPC^P ↑ (α−1)) |= a

Since WFM(w_σ|IFPC^P ↑ (α−1)) = WFM(w_σ), (4) is proved.
Since (a, K_¬a^α) ∈ gfp(OpFalseC^P_{IFPC^P ↑ (α−1)}), by Lemma 10

w_σ ∈ ω_{K_¬a^α} → WFM(w_σ|IFPC^P ↑ (α−1)) |= ¬a

Since WFM(w_σ|IFPC^P ↑ (α−1)) = WFM(w_σ), (5) is proved.
If α is a limit ordinal, K_a^α = ⋃_{β<α} K_a^β and K_¬a^α = ⋃_{β<α} K_¬a^β. If w_σ ∈ ω_{K_a^α} there is a β such that w_σ ∈ ω_{K_a^β}, and by the inductive hypothesis (4) is proved. Similarly for (5). ♦
Lemma 12. For a ground probabilistic logic program P with probabilistic facts F, rules R and atoms B_P, let K_a^α and K_¬a^α be the formulas associated with atom a in IFPC^P ↑ α. For every atom a, total choice σ and iteration α, we have:

a ∈ IFP^{w_σ} ↑ α → w_σ ∈ ω_{K_a^α}
¬a ∈ IFP^{w_σ} ↑ α → w_σ ∈ ω_{K_¬a^α}

Proof: Let us prove it by double transfinite induction. If α is a successor ordinal, assume that

a ∈ IFP^{w_σ} ↑ (α−1) → w_σ ∈ ω_{K_a^{α−1}}
¬a ∈ IFP^{w_σ} ↑ (α−1) → w_σ ∈ ω_{K_¬a^{α−1}}

Let us perform transfinite induction on the iterations of OpTrueC^P_{IFPC^P ↑ (α−1)}. Let us consider a successor ordinal δ: assume that

a ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ (δ−1) → w_σ ∈ ω_{L_a^{δ−1}}
¬a ∈ OpFalse^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↓ (δ−1) → w_σ ∈ ω_{M_¬a^{δ−1}}

and prove that

a ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ δ → w_σ ∈ ω_{L_a^δ}
¬a ∈ OpFalse^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↓ δ → w_σ ∈ ω_{M_¬a^δ}

Consider a. If a ∈ F then it is easily proved. For other atoms a, a ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ δ means that there is a rule a ← b_1, …, b_n, ¬c_1, …, c_m such that, for all i = 1, …, n, b_i ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ (δ−1) and, for all j = 1, …, m, ¬c_j ∈ IFP^{w_σ} ↑ (α−1). By the inductive hypothesis ∀i : w_σ ∈ ω_{L_{b_i}^{δ−1}} ∨ w_σ ∈ ω_{K_{b_i}^{α−1}} and ∀j : w_σ ∈ ω_{K_{¬c_j}^{α−1}}, so, by the definition of OpTrueC^P_{IFPC^P ↑ (α−1)}, w_σ ∈ ω_{L_a^δ}. Analogously for ¬a.
If δ is a limit ordinal, then L_a^δ = ⋃_{µ<δ} L_a^µ and M_¬a^δ = ⊗_{µ<δ} M_¬a^µ. By the inductive hypothesis, for all µ < δ,

a ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ µ → w_σ ∈ ω_{L_a^µ}
¬a ∈ OpFalse^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↓ µ → w_σ ∈ ω_{M_¬a^µ}

If a ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ δ, then there exists a µ < δ such that a ∈ OpTrue^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↑ µ. By the inductive hypothesis, w_σ ∈ ω_{L_a^δ}.
If ¬a ∈ OpFalse^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↓ δ, then, for all µ < δ, ¬a ∈ OpFalse^{w_σ}_{IFP^{w_σ} ↑ (α−1)} ↓ µ. By the inductive hypothesis, w_σ ∈ ω_{M_¬a^δ}.
Consider a limit ordinal α. Then K_a^α = ⋃_{β<α} K_a^β and K_¬a^α = ⋃_{β<α} K_¬a^β. The inductive hypothesis is

a ∈ IFP^{w_σ} ↑ β → w_σ ∈ ω_{K_a^β}
¬a ∈ IFP^{w_σ} ↑ β → w_σ ∈ ω_{K_¬a^β}

If a ∈ IFP^{w_σ} ↑ α, then there exists a β < α such that a ∈ IFP^{w_σ} ↑ β. By the inductive hypothesis w_σ ∈ ω_{K_a^β}, so w_σ ∈ ω_{K_a^α}. Similarly for ¬a. ♦
Theorem 7. For a ground probabilistic logic program P with atoms B_P, let K_a^α and K_¬a^α be the formulas associated with atom a in IFPC^P ↑ α. For every atom a and total choice σ, there is an iteration α_0 such that for all α > α_0 we have:

w_σ ∈ ω_{K_a^α} ↔ WFM(w_σ) |= a
w_σ ∈ ω_{K_¬a^α} ↔ WFM(w_σ) |= ¬a

Proof: The → direction is Lemma 11. In the other direction, WFM(w_σ) |= a implies ∃α_0 ∀α ≥ α_0 : IFP^{w_σ} ↑ α |= a. By Lemma 12, w_σ ∈ ω_{K_a^α}. WFM(w_σ) |= ¬a implies ∃α_0 ∀α ≥ α_0 : IFP^{w_σ} ↑ α |= ¬a. By Lemma 12, w_σ ∈ ω_{K_¬a^α}. ♦
Theorem 8. For a ground probabilistic logic program P, let K_a^α and K_¬a^α be the formulas associated with atom a in IFPC^P ↑ α. For every atom a and every iteration α, K_a^α and K_¬a^α are countable sets of countable composite choices.

Proof: It can be proved by observing that each iteration of OpTrueC^P_{IFPC^P ↑ β} and OpFalseC^P_{IFPC^P ↑ β} generates countable sets of countable explanations, since the set of rules is countable. ♦