=Paper=
{{Paper
|id=Vol-3193/paper1PLP
|storemode=property
|title=Exploiting the Full Power of Pearl’s Causality in Probabilistic Logic Programming
|pdfUrl=https://ceur-ws.org/Vol-3193/paper1PLP.pdf
|volume=Vol-3193
|authors=Kilian Rückschloß,Felix Weitkämper
|dblpUrl=https://dblp.org/rec/conf/iclp/RuckschlossW22a
}}
==Exploiting the Full Power of Pearl’s Causality in Probabilistic Logic Programming==
Kilian Rückschloß1 , Felix Weitkämper1
1 Ludwig-Maximilians-Universität München, Institut für Informatik,
Lehr- und Forschungseinheit für Programmier- und Modellierungssprachen,
Oettingenstraße 67, D-80538 München
Abstract
We introduce new semantics for acyclic probabilistic logic programs in terms of Pearl’s func-
tional causal models. Further, we show that our semantics is consistent with the classical
distribution semantics and CP-logic. This enables us to establish all query types of functional
causal models, namely probability calculus, predicting the effect of external interventions and
counterfactual reasoning, within probabilistic logic programming. Finally, we briefly discuss
the problems regarding knowledge representation and the structure learning task which result
from the different semantics and query types.
Keywords
Counterfactual Reasoning, Functional Causal Models, Causal Bayesian Networks, Causal Struc-
ture Discovery, Distribution Semantics, CP-Logic, Probabilistic Logic Programming
1. Introduction
Intuitively, an acyclic probabilistic logic program under Problog semantics defines
a distribution by solving a system of equations in terms of mutually independent
predefined Boolean random variables. This intuition, however, is currently not reflected
in the official distribution semantics of probabilistic logic programming [1, §2.1]. To
illustrate this issue we introduce a version of the sprinkler example from [2, §1.4]:
Consider a road that passes along a field with a sprinkler in it. The sprinkler is
switched on, written 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟, by a weather sensor with probability 𝜋2 := 0.7 in
spring or summer, denoted by 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚. Moreover, it rains, denoted by 𝑟𝑎𝑖𝑛,
with probability 𝜋3 := 0.1 in spring or summer and with probability 𝜋4 := 0.6 in fall
or winter. If it rains or the sprinkler is on, the pavement of the road gets wet, denoted
by 𝑤𝑒𝑡. Whenever the pavement is wet, the road is slippery, denoted by 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦.
To model the situation above we generate mutually independent Boolean random
variables 𝑢1-𝑢4 with 𝜋(𝑢1) = 0.5 and with 𝜋(𝑢𝑖) = 𝜋𝑖 for all 2 ≤ 𝑖 ≤ 4. The described
PLP 2022: The 9th Workshop on Probabilistic Logic Programming, August 1st, 2022, Haifa, Israel
kilian.rueckschloss@lmu.de (K. Rückschloß); felix.weitkaemper@lmu.de (F. Weitkämper)
https://www.pms.ifi.lmu.de/mitarbeiter/derzeitige/kilian-rueckschloss/ (K. Rückschloß);
https://www.pms.ifi.lmu.de/mitarbeiter/derzeitige/felix-weitkaemper/index.html (F. Weitkämper)
ORCID: 0000-0002-7891-6030 (K. Rückschloß); 0000-0002-3895-8279 (F. Weitkämper)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International
(CC BY 4.0).
mechanism is then represented by the following system of equations:
𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 := 𝑢1
𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 := 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 ∧ 𝑢2
𝑟𝑎𝑖𝑛 := (𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 ∧ 𝑢3) ∨ (¬𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 ∧ 𝑢4)
𝑤𝑒𝑡 := (𝑟𝑎𝑖𝑛 ∨ 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟)
𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦 := 𝑤𝑒𝑡 (1)
Further, with the usual reading of (probabilistic) logic programs one would encode (1)
into the following LPAD-program P:
u1:0.5. u2:0.7. u3:0.1. u4:0.6.
szn_spr_sum :- u1.
sprinkler :- szn_spr_sum, u2.
rain :- szn_spr_sum, u3.
rain :- \+szn_spr_sum, u4.
wet :- rain.
wet :- sprinkler.
slippery :- wet.
However, neither the distribution semantics [1, §2.1] nor CP-logic [3] link the pro-
gram P to the system of equations (1). Note for instance that these semantics do not
introduce any random variables.
Similar to [4], the current paper uses the framework of Pearl’s functional causal models
[2, §1.4] to obtain a new semantics for probabilistic logic programs. As a consequence
of this new semantics we obtain an easy correspondence between probabilistic logic pro-
grams and causal Bayesian networks. Moreover, we are able to answer counterfactual
queries with the theory discussed in [2, §1.4.4]:
Let us for instance assume we observe that the sprinkler is on and that the road
is slippery. We would like to answer the query: "What is the probability of the road
being slippery if the sprinkler were off?". In order to do so we need to update the
distribution of the error terms 𝑢𝑖 in our program P according to the evidence 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟
and 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦 in a first step. Note that our observation may introduce new dependencies!
Hence, we encode the joint distribution of the error terms 𝑢1−𝑢4 explicitly. To this aim
we introduce new meta error terms 𝑣1, 𝑣11, 𝑣01, ..., 𝑣1111, ..., 𝑣0001, where, for instance,
𝑣01...1 encodes the event that 𝑢4 is true given that we observe 𝑢1 to be false, 𝑢2 to be true
and so on, i.e. we add the clause
u4 :- \+u1,u2,...,v01...1.
Further, we update the probabilities of the meta error terms according to our observa-
tions. In particular, we recognize that it is spring or summer because the sprinkler is
observed to be on, i.e. we replace 𝑣1 : 0.5 by the fact 𝑣1. In a second step we erase the
clause with 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 in the head to assure that the sprinkler is off. Finally, we query
the modified program P′ for 𝜋(𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦) to obtain 0.1 for the desired probability.
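For this particular query the construction simplifies considerably: the evidence 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟
forces 𝑢1 and 𝑢2 to be true and tells us nothing about 𝑢3 and 𝑢4, so the modified program P′
is equivalent, for the purpose of this query, to the following sketch (the general meta error
term construction is spelled out in Section 4):
u1. u2. u3:0.1. u4:0.6.
szn_spr_sum :- u1.
rain :- szn_spr_sum, u3.
rain :- \+szn_spr_sum, u4.
wet :- rain.
wet :- sprinkler.
slippery :- wet.
Since 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 is certainly true in this program, querying it for 𝜋(𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦) indeed
returns 𝜋(𝑢3) = 0.1.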
We are not aware of any other semantics for probabilistic logic programming that
is able to answer counterfactual queries using Pearl’s theory of causality as we just did.
However, there also exists a notion of counterfactual query in CP-logic [5]. In a future
paper, we want to prove that these two approaches to counterfactual reasoning are
equivalent, which means that CP-logic indeed consistently generalizes Pearl’s theory
of causality. As the theory of causal structure discovery is written in the language of
causal Bayesian networks and causal models our results can also be seen as a first step
towards structure learning of probabilistic logic programs under their causal semantics.
Our presentation begins with recalling Pearl’s functional causal models and the
distribution semantics in Section 2. Further, in Section 3 we define our new semantics
and show that it generalizes the distribution semantics. Moreover, we prove that our
semantics yields a notion of intervention that is consistent with the one in CP-logic. In
Section 4 we discuss how counterfactual reasoning can be realized within probabilistic
logic programming, and Section 5 treats syntactic sugar and non-ground programs. Finally,
Section 6 closes this paper with a discussion of our results.
2. Preliminaries
Before we begin with the presentation of our results, we introduce Pearl’s functional
causal models, which are the target structures for our new semantics and we recall the
classical distribution semantics for probabilistic logic programming.
2.1. Pearl’s Functional Causal Models
Let us recall the definition of a functional causal model from [2, §1.4.1]:
Definition 1 (Causal Model). A functional causal model or causal model ℳ on a
set of variables V is a system of equations, which consists of one equation of the form
𝑋 := 𝑓𝑋 (pa(𝑋), error(𝑋)) (2)
for each variable 𝑋 ∈ V. Here the parents pa(𝑋) ⊆ V of 𝑋 form a subset of the set of
variables V, the error term error(𝑋) of 𝑋 is a vector of random variables and 𝑓𝑋 is a
function defining 𝑋 in terms of the parents 𝑝𝑎(𝑋) and the error term error(𝑋) of 𝑋.
Example 1. The system of equations (1) from Section 1 forms a causal model on the
set of variables V := {𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚, 𝑟𝑎𝑖𝑛, 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟, 𝑤𝑒𝑡, 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦} if we define
error(𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚) := 𝑢1, error(𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟) := 𝑢2 and error(𝑟𝑎𝑖𝑛) := (𝑢3, 𝑢4).
Next, we note that causal models do not only support queries about conditional and
unconditional probabilities. They come along with two other more general query types,
namely determining the effect of interventions and counterfactuals. To proceed we fix
a causal model ℳ on a set of variables V.
Assume we are given a subset of variables X := {𝑋1 , ..., 𝑋𝑘 } ⊆ V together with a vector
of possible values x := (𝑥1 , ..., 𝑥𝑘 ) for the variables in X. In order to model the effect
of setting the variables in X to the values specified by x we simply replace the equation
for 𝑋𝑖 in ℳ by 𝑋𝑖 := 𝑥𝑖 for all 1 ≤ 𝑖 ≤ 𝑘 and calculate the desired probability. The
resulting probability distribution is then denoted by 𝜋(_| do(X = x)) [2, §1.4.3].
Example 2. To predict the effect of switching the sprinkler on in the causal model (1) we
simply replace the equation for 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 by 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 := 𝑇 𝑟𝑢𝑒.
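For illustration, this intervention leaves the distribution of 𝑟𝑎𝑖𝑛 untouched, in contrast to
conditioning on the observation that the sprinkler is on: from (1) we obtain
𝜋(𝑟𝑎𝑖𝑛 | do(𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 := 𝑇𝑟𝑢𝑒)) = 𝜋(𝑟𝑎𝑖𝑛) = 0.5 · 0.1 + 0.5 · 0.6 = 0.35, whereas
𝜋(𝑟𝑎𝑖𝑛 | 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟) = 𝜋(𝑢3) = 0.1, since observing the sprinkler to be on reveals that it is
spring or summer.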
Finally, let E ⊆ V and let X ⊆ V be two subsets of our set of variables V. Now
suppose we observe the evidence that E = e and we ask ourselves what would have
happened if we had set X = x. Note that in general X = x and E = e may contradict
each other. To answer this query we proceed in three steps: In the abduction step
we adjust the distribution of our error terms by replacing the distribution 𝜋(error(𝑉 ))
with the conditional distribution 𝜋(error(𝑉 )|E = e) for all variables 𝑉 ∈ V. Next, in
the action step we intervene in the resulting model according to X = x. Finally, we are
able to compute the desired probabilities from the modified model in the prediction
step [2, §1.4.4]. For an illustration of the treatment of counterfactuals we refer to the
introduction.
2.2. Causal Models and Bayesian Networks
Next, we compare causal models with the widespread formalism of Bayesian networks.
Recall that a Bayesian network consists of a directed acyclic graph 𝐺 on the involved
random variables V and of the conditional probability distributions 𝜋(𝑋| pa(𝑋)) for
every 𝑋 ∈ V, where pa(𝑋) denotes the set of parents of 𝑋 in 𝐺. Further, a given
distribution on V can be stored in a Bayesian network on the graph 𝐺 if the Markov
condition is satisfied, i.e. if each 𝑋 ∈ V is independent of its predecessors conditioned
on its parents pa(𝑋). This motivates the following definition:
Definition 2 (Acyclic & Markovian Causal Models). For each causal model ℳ on V
we define the causal diagram graph(ℳ) to be the directed graph on V, which is given
by drawing an arrow 𝑋 → 𝑌 if and only if 𝑋 ∈ pa(𝑌 ). Further, we call ℳ an acyclic
causal model if its causal diagram graph(ℳ) is a directed acyclic graph. Finally, an
acyclic causal model is said to be Markovian if the error terms error(𝑋) are mutually
independent for all 𝑋 ∈ V.
The next result from [2, §1.4.2] links Markovian causal models to the theory of
Bayesian networks:
Theorem 1. Each Markovian causal model induces a distribution, which can be repre-
sented by a Bayesian network on the associated causal diagram.
Example 3. For the causal diagram of the causal model in Example 1 we obtain the
following acyclic graph:
𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 → 𝑟𝑎𝑖𝑛, 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 → 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟, 𝑟𝑎𝑖𝑛 → 𝑤𝑒𝑡, 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 → 𝑤𝑒𝑡, 𝑤𝑒𝑡 → 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦 (3)
As the error terms are mutually independent by construction we have a Markovian causal
model, i.e. the resulting distribution can be represented by a Bayesian network on the graph
above.
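For illustration, the conditional probability tables of this Bayesian network can be read off
from (1): 𝜋(𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚) = 0.5; 𝜋(𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 | 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚) = 0.7 and
𝜋(𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 | ¬𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚) = 0; 𝜋(𝑟𝑎𝑖𝑛 | 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚) = 0.1 and
𝜋(𝑟𝑎𝑖𝑛 | ¬𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚) = 0.6; 𝑤𝑒𝑡 is true with probability 1 if 𝑟𝑎𝑖𝑛 ∨ 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 holds
and with probability 0 otherwise; and 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦 behaves like 𝑤𝑒𝑡.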
Theorem 1 yields a way to determine effects of interventions from a Bayesian network
structure. Assume we are given a Markovian causal model ℳ. We know that ℳ gives
rise to a Bayesian network ℬ on the causal diagram graph(ℳ). As it turns out to
calculate the effect of setting X := {𝑋1 , ..., 𝑋𝑘 } to x := (𝑥1 , ..., 𝑥𝑘 ) one can modify ℬ
by changing the distributions 𝜋(𝑋𝑖 | pa(𝑋𝑖 )) to
{︃
1, 𝑋𝑖 = 𝑥𝑖
𝜋(𝑋𝑖 | pa(𝑋𝑖 )) = for all 1 ≤ 𝑖 ≤ 𝑘.
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
The resulting structure is called a causal Bayesian network, i.e. a causal Bayesian
network is a Bayesian network that additionally supports queries about the effect of
external interventions. [2, §1.3]
To summarize we are left with the following hierarchy: Probability distributions
support the calculation of (conditional) probabilities. Causal Bayesian networks are
more general because they are also able to predict the effect of interventions. Finally,
causal models are the most general structure as they additionally provide the possibility
of counterfactual reasoning. We also would like to highlight that each of these structures
comes along with a corresponding learning problem if we want to derive a certain
model from observational data.
2.3. Probabilistic Logic Programming under the Distribution
Semantics
We proceed by recalling the classical semantics, the distribution semantics, of probabilis-
tic logic programming. As the semantics of non-ground probabilistic logic programs
is usually defined by grounding [6], we will focus on propositional probabilistic logic
programs. Further, note that this paper uses LPAD-notation.
2.3.1. Syntax of Probabilistic Logic Programs
Before we discuss the semantics of a probabilistic logic program, we need to introduce
its syntax. Here, we construct a (probabilistic) logic program from a propositional
alphabet P:
Definition 3 (propositional alphabet). A propositional alphabet P is a finite set of
propositions together with a subset E(P) ⊆ P of external propositions. Further, we
define I(P) := P ∖ E(P) to be the set of internal propositions.
Example 4. To build the program P in Section 1 we need the alphabet P with internal
propositions I(P) := {𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚, 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟, 𝑟𝑎𝑖𝑛, 𝑤𝑒𝑡, 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦} and with the
external propositions E(P) := {𝑢1, 𝑢2, 𝑢3, 𝑢4}.
From propositional alphabets we build literals, clauses and random facts. Random
facts are used to specify the probabilities in our model. To proceed let us fix a proposi-
tional alphabet P.
Definition 4 (Literals and Clauses).
i) A literal 𝑙 is an expression 𝑝 or ¬𝑝 for a proposition 𝑝 ∈ P. We call 𝑙 a positive
literal if it is of the form 𝑝 and a negative literal if it is of the form ¬𝑝.
ii) A clause 𝐿𝐶 is an expression of the form ℎ ← 𝑏1 , ..., 𝑏𝑛 , where head(𝐿𝐶) := ℎ ∈ I(P)
is an internal proposition and where body(𝐿𝐶) := {𝑏1 , ..., 𝑏𝑛 } is a finite set of
literals.
iii) A random fact 𝑅𝐹 is an expression of the form 𝑢(𝑅𝐹 ) : 𝜋(𝑅𝐹 ), where 𝑢(𝑅𝐹 ) ∈ E(P)
is an external proposition and where 𝜋(𝑅𝐹 ) ∈ [0, 1] is a number called the prob-
ability of 𝑢(𝑅𝐹 ).
Example 5. In Example 4 we have that 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚 is a positive literal, whereas ¬𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚
is a negative literal. Further, 𝑟𝑎𝑖𝑛 ← ¬𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚, 𝑢4 is a clause and 𝑢4 : 0.6 is a
random fact.
Next, we define logic programs and probabilistic logic programs as follows:
Definition 5 (Logic Programs and Probabilistic Logic Programs).
i) A logic program P is a finite set of clauses. We call such a program P acyclic if
there exists a level function 𝜆 : P → N, which satisfies the following assertion
for each clause 𝐿𝐶 ∈ P: ∀𝐿∈body(𝐿𝐶) : 𝜆(head(𝐿𝐶)) > 𝜆(𝐿). (Here we use the
convention 𝜆(¬𝑝) = 𝜆(𝑝) for every proposition 𝑝 ∈ P.)
ii) A probabilistic logic program P is given by a logic program LP(P) and a set Facts(P),
which consists of a unique random fact for every external proposition. We call LP(P)
the underlying logic program of P. Further, we call P acyclic if LP(P) is an
acyclic logic program.
Finally, we define the dependency graph graph(P) of P to be the directed graph
on the set of internal propositions I(P), which is given by drawing an arrow 𝑝 → 𝑞
if and only if there exists a clause 𝐿𝐶 ∈ LP(P) with head(𝐿𝐶) = 𝑞 and such
that {𝑝, ¬𝑝} ∩ body(𝐿𝐶) ̸= ∅.
Notation 1. To reflect the closed world assumption we will omit random facts of the
form 𝑢 : 0 in the set Facts(P).
Example 6. The program P from the introduction is an acyclic probabilistic logic pro-
gram. We obtain the corresponding underlying logic program LP(P) by erasing all ran-
dom facts of the form 𝑢𝑖 : _ from P. Further, the dependency graph graph(P) of P is
given by (3).
Finally, we define formulas as usual in propositional logic:
Definition 6. Let Q ⊆ P be a set of propositions. A Q-formula is inductively de-
fined by:
- Each proposition 𝑝 ∈ Q is a Q-formula.
- For each Q-formula 𝜑 we have that ¬(𝜑) is a Q-formula.
- For any two Q-formulas 𝜑 and 𝜓 we have that (𝜑 ∧ 𝜓) is a Q-formula.
- For any two Q-formulas 𝜑 and 𝜓 we have that (𝜑 ∨ 𝜓) is a Q-formula.
- For any two Q-formulas 𝜑 and 𝜓 we have that (𝜑 ↔ 𝜓) is a Q-formula.
2.3.2. The Semantics of Propositions and Formulas
We will use Clark translation to define the semantics of acyclic logic programs, i.e. we
define the semantics of logic programs via the semantics of propositions and formulas.
To this aim we fix a set of propositions Q ⊆ P.
Definition 7. A Q-structure is a function ℳ : Q → {𝑇 𝑟𝑢𝑒, 𝐹 𝑎𝑙𝑠𝑒}, 𝑝 ↦→ 𝑝ℳ .
Next, we give a meaning to every Q-formula in the usual way:
Definition 8. For a Q-structure ℳ the models relation |= is defined inductively over
the structure of a Q-formula:
- For each proposition 𝑝 ∈ Q we write ℳ |= 𝑝 if and only if 𝑝ℳ = 𝑇 𝑟𝑢𝑒.
- For each formula 𝜑 we write ℳ |= ¬(𝜑) if and only if not ℳ |= 𝜑, i.e. if ℳ ̸|= 𝜑.
- For any two formulas 𝜑 and 𝜓 we write ℳ |= (𝜑 ∧ 𝜓) if and only if ℳ |= 𝜑
and ℳ |= 𝜓.
- For any two formulas 𝜑 and 𝜓 we write ℳ |= (𝜑 ∨ 𝜓) if and only if ℳ |= 𝜑
or ℳ |= 𝜓.
- For any two formulas 𝜑 and 𝜓 we write ℳ |= (𝜑 ↔ 𝜓) if and only if either
ℳ |= 𝜑 and ℳ |= 𝜓 or ℳ ̸|= 𝜑 and ℳ ̸|= 𝜓.
Now we can give a semantics to acyclic logic programs.
2.3.3. The Semantics of Acyclic Logic Programs
As already mentioned we define the semantics of an acyclic logic program by using
Clark translation.
Definition 9 (Clark Translation). For each logic program P we define its Clark trans-
lation by
P^CL := { 𝑝 ↔ ⋁_{𝐿𝐶 ∈ P, head(𝐿𝐶) = 𝑝} ⋀_{𝑙 ∈ body(𝐿𝐶)} 𝑙  :  𝑝 ∈ I(P) }
with the convention that an empty disjunction evaluates to 𝐹 𝑎𝑙𝑠𝑒 and that the empty
conjunction evaluates to 𝑇 𝑟𝑢𝑒.
Example 7. One key observation is that the Clark translation of the underlying logic
program LP(P) of the probabilistic logic program P in Example 6 can be obtained from
the system of equations (1) by replacing the ":="-sign with the "↔"-sign.
In the next proposition we formally justify why it is permissible to exchange the
":="-sign for the "↔"-sign.
Proposition 1. Let P be an acyclic logic program.
i) For each E(P)-structure ℰ there exists a unique P-structure ℳ(ℰ, P) such that ℳ(ℰ, P)
extends ℰ and such that ℳ(ℰ, P) |= P^CL .
ii) We have that ℳ(ℰ, P) is obtained from ℰ by solving the system of equations:
ES(P) := { 𝑝 := ⋁_{𝐿𝐶 ∈ P, head(𝐿𝐶) = 𝑝} ⋀_{𝑙 ∈ body(𝐿𝐶)} 𝑙  :  𝑝 ∈ I(P) }
Proof. Let P be an acyclic logic program. First, we observe that i) trivially follows
from ii) by the definition of |=. Hence, all we need to show is that ES(P) possesses a
unique solution for every E(P)-structure ℰ.
Let ℰ be an E(P)-structure. As P is acyclic there exists a level function 𝜆 : P → N.
Let us proceed by induction over the number 𝑛 of distinct values taken by 𝜆.
n=1 In this case we have body(𝐿𝐶) = ∅ for every clause 𝐿𝐶 ∈ P. Hence, the
only P-structure ℳ extending ℰ and satisfying ES(P) is given by
ℳ : P → {𝑇𝑟𝑢𝑒, 𝐹𝑎𝑙𝑠𝑒},
𝑝 ↦→ 𝑝^ℰ if 𝑝 ∈ E(P),
𝑝 ↦→ 𝑇𝑟𝑢𝑒 if 𝑝 ∈ I(P) and 𝑝 = head(𝐿𝐶) for a clause 𝐿𝐶 ∈ P,
𝑝 ↦→ 𝐹𝑎𝑙𝑠𝑒 otherwise.
n>1 Denote by Pmax the set of all propositions 𝑝 ∈ P of maximal level 𝜆(𝑝). Further,
denote by Pmax the set of all clauses 𝐿𝐶 ∈ P with head(𝐿𝐶) ∈ Pmax . Finally,
let P′ := P ∖ Pmax be the resulting alphabet and P′ := P ∖ Pmax the resulting
program. Now by the induction hypothesis there exists a unique P′-structure
ℳ′ := ℳ(ℰ, P′ ) extending ℰ and satisfying ES(P′ ). To conclude we observe that
by acyclicity the only P-structure ℳ extending ℰ and satisfying ES(P) is given by
ℳ : P → {𝑇𝑟𝑢𝑒, 𝐹𝑎𝑙𝑠𝑒},
𝑝 ↦→ 𝑝^ℳ′ if 𝑝 ∈ P′ ,
𝑝 ↦→ 𝑇𝑟𝑢𝑒 if 𝑝 = head(𝐿𝐶) and ℳ′ |= body(𝐿𝐶) for a clause 𝐿𝐶 ∈ Pmax ,
𝑝 ↦→ 𝐹𝑎𝑙𝑠𝑒 otherwise.
Remark 1. We take the assignment ℰ ↦→ ℳ(ℰ, P) as the semantics of an acyclic logic
program P. Note that this is consistent with the standard minimal Herbrand-model se-
mantics. [7]
Corollary 1. Let P be an acyclic logic program. Each model ℳ |= P^CL of the Clark
translation is uniquely determined by its restriction ℳ|E(P) to the external propo-
sitions.
Proof. This is a direct consequence of Proposition 1.
2.3.4. The Distribution Semantics of Acyclic Probabilistic Logic Programs
Finally, we are in the position to give the definition of the distribution semantics for
acyclic probabilistic logic programs:
Definition 10. Let P be a probabilistic logic program. An atomic choice is a sub-
set C ⊆ Facts(P) of the random facts of P. To each atomic choice C we associate the
E(P)-structure
ℰ(C) : E(P) → {𝑇𝑟𝑢𝑒, 𝐹𝑎𝑙𝑠𝑒},
𝑒 ↦→ 𝑇𝑟𝑢𝑒 if there exists a random fact of the form 𝑒 : _ in C, and 𝑒 ↦→ 𝐹𝑎𝑙𝑠𝑒 otherwise,
and the probability
𝜋(C) := ∏_{𝑅𝐹 ∈ C} 𝜋(𝑅𝐹 ) · ∏_{𝑅𝐹 ∉ C} (1 − 𝜋(𝑅𝐹 )).
The distribution semantics of an acyclic probabilistic logic program P is the distri-
bution 𝜋P^𝑑𝑖𝑠𝑡 on the P-structures ℳ, which is given by
𝜋P^𝑑𝑖𝑠𝑡 (ℳ) := 𝜋(C) if ℳ = ℳ(ℰ(C), LP(P)) for an atomic choice C, and 𝜋P^𝑑𝑖𝑠𝑡 (ℳ) := 0 otherwise.
Finally, for every P-formula 𝜑 we define the probability for 𝜑 to hold by
𝜋(𝜑) := ∑_{ℳ a P-structure with ℳ |= 𝜑} 𝜋P^𝑑𝑖𝑠𝑡 (ℳ). (4)
Remark 2. Remark 1 ensures that Definition 10 defines the distribution semantics as
in [1, 2.1.2].
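For illustration (derived from the program P of the introduction), consider the atomic
choice C := {u1 : 0.5, u3 : 0.1}. Then ℰ(C) sets 𝑢1 and 𝑢3 to 𝑇𝑟𝑢𝑒 and 𝑢2 and 𝑢4 to 𝐹𝑎𝑙𝑠𝑒,
the induced structure ℳ(ℰ(C), LP(P)) makes 𝑠𝑧𝑛_𝑠𝑝𝑟_𝑠𝑢𝑚, 𝑟𝑎𝑖𝑛, 𝑤𝑒𝑡 and 𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦 true
and 𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟 false, and this P-structure receives the probability
𝜋(C) = 0.5 · 0.1 · (1 − 0.7) · (1 − 0.6) = 0.006.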
3. The Functional Causal Models Semantics for
Probabilistic Logic Programs
Having finished our preparations, we now proceed to the functional causal model
semantics we are heading for:
Definition 11. Let P be an acyclic probabilistic logic program. We define the functional
causal models semantics or FCM semantics of P to be the functional causal model
on the internal propositions I(P), which is given by the system of equations
FCM(P) := { 𝑝^FCM := ⋁_{𝐿𝐶 ∈ LP(P), head(𝐿𝐶) = 𝑝} ⋀_{𝑙 ∈ body(𝐿𝐶)} 𝑙^FCM  :  𝑝 ∈ I(P) }
and mutually independent Boolean random variables 𝑢(𝑅𝐹 )^FCM for every random fact
𝑅𝐹 ∈ Facts(P) that are distributed according to 𝜋(𝑢(𝑅𝐹 )^FCM = 𝑇𝑟𝑢𝑒) = 𝜋(𝑅𝐹 ). Here we
again use the convention that an empty disjunction evaluates to 𝐹𝑎𝑙𝑠𝑒 and that an empty
conjunction evaluates to 𝑇𝑟𝑢𝑒.
Remark 3. By construction the dependency graph graph(P) of P yields the causal dia-
gram graph(FCM(P)) of the associated functional causal model.
Remark 4. The theory so far has been restricted to acyclic probabilistic logic programs,
since the Clark translation is known not to be correct for general (possibly cyclic) logic programs [7].
However, potentially cyclic programs with stratified negation can also be accommodated
by cycle-breaking. This process replaces a propositional (or ground) logic program with an
equivalent acyclic logic program. There are different algorithms available for this task,
the current state-of-the-art being treewidth-aware cycle-breaking based on forward-
unfolding of the logic program [8].
Example 8. As intended in the introduction the causal model (1) yields the FCM-semantics
of the program P.
Next, we check that the FCM-semantics indeed yields a well-defined distribution.
Proposition 2. Let P be an acyclic probabilistic logic program. The FCM-semantics
assigns to each proposition 𝑝 ∈ P a unique random variable 𝑝^FCM .
Proof. Similar to Proposition 1.
Our next aim is to verify that the FCM-semantics yields a consistent generalization of
the distribution semantics. For this purpose we introduce the following notation:
Notation 2. Let P be an acyclic probabilistic logic program. For each P-structure ℳ we
write
𝜋P^FCM (ℳ) := 𝜋[{𝑝^FCM = ℳ(𝑝) : 𝑝 ∈ P}].
Note that 𝜋P^FCM encodes the joint distribution of the random variables 𝑝^FCM . Hence, the
distribution of the random variables 𝑝^FCM is uniquely determined by 𝜋P^FCM .
With Notation 2 at hand we now proceed to the proof of the desired consistency:
Theorem 2. For each acyclic probabilistic logic program P we have that 𝜋P^FCM = 𝜋P^𝑑𝑖𝑠𝑡 .
Proof. We have to show that 𝜋P^FCM (ℳ) = 𝜋P^𝑑𝑖𝑠𝑡 (ℳ) for every P-structure ℳ. Denote
by ℰ the restriction of ℳ to the external alphabet E(P) and by C the unique atomic
choice with ℰ = ℰ(C). Now observe that
𝜋P^FCM (ℳ) = 𝜋[{𝑝^FCM = ℳ(𝑝) for all 𝑝 ∈ P}]            (by definition)
           = 𝜋[{𝑒^FCM = ℳ(𝑒) : 𝑒 ∈ E(P)}]                (by Corollary 1)
           = 𝜋[{𝑒^FCM = ℰ(𝑒) : 𝑒 ∈ E(P)}]                 (by definition)
           = ∏_{𝑅𝐹 ∈ C} 𝜋(𝑅𝐹 ) · ∏_{𝑅𝐹 ∉ C} (1 − 𝜋(𝑅𝐹 )) = 𝜋(C) = 𝜋P^𝑑𝑖𝑠𝑡 (ℳ).
Corollary 2. The FCM-semantics and the distribution semantics yield the same probability
for every proposition in P.
Proof. This follows by marginalization and equation (4).
Further, recall from 2.1 that functional causal models do not only provide a probability
distribution; they also support two further query types, namely the prediction of the
effect of interventions and counterfactual reasoning.
Predicting the Effect of External Interventions in FCM-semantics
Let P be a probabilistic logic program and let X1 , X2 ⊆ I(P) be two subsets of internal
propositions. Assume we would like to calculate the effect of setting the variables
in X1^FCM to 𝑇𝑟𝑢𝑒 and the variables in X2^FCM to 𝐹𝑎𝑙𝑠𝑒. According to the discussion
in 2.1 and Definition 11 the desired distribution can be calculated from a modified
program P′ , which is obtained by the following two steps, illustrated below:
1.) Replace every clause 𝐿𝐶 ∈ P with head(𝐿𝐶) ∈ X1 by the clause head(𝐿𝐶) ←.
2.) Erase every clause 𝐿𝐶 ∈ P with head(𝐿𝐶) ∈ X2 from P.
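For instance, taking P from the introduction with X1 = {𝑠𝑝𝑟𝑖𝑛𝑘𝑙𝑒𝑟} and X2 = {𝑟𝑎𝑖𝑛}, i.e.
switching the sprinkler on and preventing rain, the two steps yield the modified program
u1:0.5. u2:0.7. u3:0.1. u4:0.6.
szn_spr_sum :- u1.
sprinkler.
wet :- rain.
wet :- sprinkler.
slippery :- wet.
in which querying for 𝜋(𝑤𝑒𝑡) and 𝜋(𝑠𝑙𝑖𝑝𝑝𝑒𝑟𝑦) returns 1, while 𝜋(𝑟𝑎𝑖𝑛) = 0.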
We observe that this is the same notion of intervention that was introduced for
CP-logic in [3, §7.5] and which is implemented in cplint [9]. As CP-logic is consistent
with the distribution semantics, Theorem 2 yields the following result:
Theorem 3. The FCM-semantics consistently generalizes the notion of intervention in
CP-logic. □
Next, we want to use probabilistic logic programming as a language for causal
Bayesian networks:
Definition 12. We call an acyclic probabilistic logic program P Markovian if for every
external proposition 𝑢 ∈ E(P) that occurs in the body of a clause 𝐿𝐶1 we have
∀𝐿𝐶2 ∈LP(P) : (head(𝐿𝐶1 ) ̸= head(𝐿𝐶2 ) ⇒ {𝑢, ¬𝑢} ∩ body(𝐿𝐶2 ) = ∅) .
Example 9. The program P in the introduction yields a Markovian probabilistic logic
program.
Theorem 4 (Probabilistic Logic Programs & Causal Bayesian Networks). A Markovian
probabilistic logic program P, equipped with the above notion of intervention, gives rise
to a Boolean causal Bayesian network 𝒞ℬ(P) on the dependency graph graph(P) of P.
Proof. This follows directly from Definition 11, Remark 3 and Theorem 1.
Definition 13 (Bayesian networks semantics). In the situation of Theorem 4 we call
𝒞ℬ(_) the causal Bayesian network semantics for Markovian probabilistic logic pro-
grams.
Remark 5. As the causal Bayesian network semantics allows for the computation of the
effect of interventions it is a proper generalization of the distribution semantics.
Next, we see that the causal Bayesian network semantics yields a proper generaliza-
tion of the distribution semantics for probabilistic logic programs.
Example 10. Consider the Markovian probabilistic logic programs P1 (first line) and P2
(second line) below:
a :- b. b :- u. u:0.5.
b :- a. a :- u. u:0.5.
It is easy to observe that P1 and P2 determine the same distribution. However, intervening
according to 𝑎^FCM = 𝑇𝑟𝑢𝑒 leaves 𝑏^FCM unchanged in P1 , while 𝑏^FCM becomes true
in P2 . Hence, P1 and P2 are equivalent under the distribution semantics, whereas they
are not equivalent under the causal Bayesian network semantics.
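Concretely, both programs yield 𝜋(𝑎^FCM = 𝑇𝑟𝑢𝑒) = 𝜋(𝑏^FCM = 𝑇𝑟𝑢𝑒) = 0.5 with 𝑎^FCM
and 𝑏^FCM always taking the same value, whereas 𝜋(𝑏^FCM = 𝑇𝑟𝑢𝑒 | do(𝑎^FCM = 𝑇𝑟𝑢𝑒))
equals 0.5 in P1 but 1 in P2 .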
Considering Example 10, we see that from the classical LPAD-syntax alone it is not clear
which knowledge a probabilistic logic program actually states! Hence, one should be
careful when using probabilistic logic programs for causal inference. In particular, this
observation has tremendous consequences for the structure learning theory of such
programs:
Assume we want to learn the probabilistic logic program P1 from data. If we only
claim to learn P1 under the distribution semantics, it is equally good to end up with P2 .
However, if we are interested in the effect of interventions, we need to distinguish
between P1 and P2 .
In the theory of Bayesian networks the task of learning a causal Bayesian network is
referred to as causal structure discovery. One of our future goals is to develop a causal
structure discovery algorithm for non-ground probabilistic logic programs.
4. Counterfactual Reasoning in Probabilistic Logic
Programming
Finally, we also want to establish counterfactual reasoning in probabilistic logic pro-
gramming. Assume that we are given an acyclic probabilistic logic program P. Addi-
tionally, let E ⊆ I(P) and X ⊆ I(P) be two subsets of internal propositions. Now
suppose we observe the evidence E^FCM = e and we ask ourselves what would have
happened if we had set X^FCM = x. In particular, it may be that E^FCM = e and
X^FCM = x contradict each other!
Recalling the discussion in 2.1 and Definition 11 we proceed as follows: First, in
the abduction step, we build a modified program P′ in which the distribution of the
external predicates 𝑢1^FCM, ..., 𝑢𝑛^FCM is adjusted according to our observations.
Note that, given new evidence, the random variables 𝑢1^FCM, ..., 𝑢𝑛^FCM are in general
no longer mutually independent! Hence, we encode the full joint distribution of the 𝑢𝑖 in
our program. To this aim we introduce 2^𝑛 − 1 new meta external predicates 𝑣𝑎, which
we enumerate with sequences 𝑎 = (𝑎1 , ..., 𝑎𝑚 ) ∈ {0, 1}^𝑚 with 𝑚 ≤ 𝑛 and 𝑎𝑚 = 1. The
predicate 𝑣𝑎 encodes the event that 𝑢𝑚^FCM is true given that, for each 𝑗 < 𝑚, we
observe 𝑢𝑗 if 𝑎𝑗 = 1 and ¬𝑢𝑗 if 𝑎𝑗 = 0. From Theorem 4 we see that the distribution of
the 𝑢𝑖 can be stored in the following program P0 :
v1 : p1. %u1:p1 random fact in P
v11 : p2. v01 : p2. %u2:p2 random fact in P
...
v11...1 : pn. ... v0...01 : pn. %un : pn random fact in P
u1 :- v1.
u2 :- u1,v11. u2 :- \+u1,v01.
...
un :- u1,u2,...,un-1,v11...1.
...
un :- \+u1,\+u2,...,\+un-1,v0...01.
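In other words, P0 encodes the chain rule factorization
𝜋(𝑢1 , ..., 𝑢𝑛 ) = 𝜋(𝑢1 ) · 𝜋(𝑢2 | 𝑢1 ) · ... · 𝜋(𝑢𝑛 | 𝑢1 , ..., 𝑢𝑛−1 ),
with 𝜋(𝑣𝑎 ) playing the role of the conditional probability of 𝑢𝑚 given the observation
pattern 𝑎; in P0 itself all of these conditionals equal 𝑝𝑚 , since the 𝑢𝑖 are initially mutually
independent.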
Hence, the program P′0 that results from replacing the random facts of P with the program
P0 yields the same distribution on the internal predicates. Finally, we obtain the modified
program P′ by replacing each random fact 𝑣𝑎 : 𝜋𝑎 in the program P′0 with the random fact
𝑣𝑎 : 𝜋(𝑣𝑎 |e). We can then compute the desired probabilities by intervening according
to X^FCM = x. For a detailed example of the treatment of counterfactuals we refer the
reader to the introduction. Note that there is also a notion of counterfactual reasoning
in CP-logic [5]. We verify in a future paper that these two approaches to counterfactual
reasoning in probabilistic logic programming lead to the same result.
Further, just as the causal Bayesian network semantics generalizes the distribution
semantics, the FCM-semantics properly generalizes the causal Bayesian
network semantics.
Example 11. Consider the probabilistic logic programs P1 , respectively P2 below:
u1:0.5. u2:0.5. u3:0.4.
treatment :- u1.
recovery :- u2.
recovery :- treatment, u3.
u1:0.5. u2:0.5. u3:0.7.
treatment :- u1.
recovery :- \+treatment, u2.
recovery :- treatment, u3.
According to Theorem 4 both programs give rise to a causal Bayesian network on the
graph 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 → 𝑟𝑒𝑐𝑜𝑣𝑒𝑟𝑦.
Observe that in P1 there is a baseline chance of 0.5 of recovering from an illness,
and treatment gives a patient who would not otherwise have recovered an additional
chance of 0.4 of recovering. Therefore, the probability of recovery under
treatment is 0.7.
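Explicitly, under P1 recovery given treatment holds if and only if 𝑢2 ∨ 𝑢3 is true, so that
𝜋(𝑟𝑒𝑐𝑜𝑣𝑒𝑟𝑦 | 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡) = 1 − (1 − 0.5) · (1 − 0.4) = 0.7.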
In P2 , the recovery rates with and without treatment have been modelled separately,
with a 0.7 recovery rate under treatment and a 0.5 rate without it.
Therefore the programs P1/2 yield the same probabilities for 𝜋(𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡),
𝜋(𝑟𝑒𝑐𝑜𝑣𝑒𝑟𝑦|𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡) and 𝜋(𝑟𝑒𝑐𝑜𝑣𝑒𝑟𝑦|¬𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡). Thus they share the same causal
Bayesian network semantics and therefore the same distribution semantics.
Next, let us assume we observe that treatment is false and that recovery is true. Let 𝑄
be the query: “What is the probability of recovery if treatment had been true?” In both pro-
grams P1/2 our observations imply that 𝑢2 is true. Hence, P1 answers the query 𝑄 with 1.
For P2 we obtain that P′2 is equivalent to the following program:
v1:0.5. v11 : 0.7. v01:0.7.
u1 :- v1. u2. u3 :- u1,v11. u3:-\+u1,v01.
treatment :- u1.
recovery :- \+treatment, u2.
recovery :- treatment, u3.
This means that P2 answers 𝑄 with a probability of 0.7.
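Indeed, after additionally intervening according to 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡^FCM = 𝑇𝑟𝑢𝑒 in this program,
𝑟𝑒𝑐𝑜𝑣𝑒𝑟𝑦 becomes equivalent to 𝑢3, which is true with probability 0.7 regardless of 𝑢1.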
Note that when modelling under the distribution semantics, one would often be tempted
to prefer P2 , since separating the possible situations makes it easier to see the probabilities
of recovery of each reference class at first glance. However, when using the resulting
program for counterfactual reasoning, the FCM-semantics of P2 implies that recovery
with and without treatment are completely independent. In particular, we obtain for every
patient who recovered untreated a probability of 0.3 that they would not have recovered if they
had indeed received the treatment. This would be an extraordinary claim, and usually the
interpretation encoded by P1 , namely that treatment enhances any individual’s chance
of recovery, would be closer to the intended meaning.
5. Syntactic Sugar and Non-Ground Probabilistic Logic
Programs
In probabilistic logic programming under Problog semantics, a special case of the
distribution semantics [1, §2.1.2], one introduces the following syntactic sugar:
Definition 14 (Problog Program). A Problog clause 𝑅𝐶 is an expression of the form
ℎ : 𝜋 ← 𝑏1 , ..., 𝑏𝑛 for a proposition head(𝑅𝐶) := ℎ, a real number 𝜋(𝑅𝐶) := 𝜋 ∈ [0, 1]
and literals body(𝑅𝐶) := {𝑏1 , ..., 𝑏𝑛 }. Further, a Problog program P is a finite set of
Problog clauses.
For each Problog clause 𝑅𝐶 ∈ P we add a distinct external proposition 𝑢(𝑅𝐶) to our alpha-
bet. In this case the FCM-semantics of P is given by the FCM-semantics of the program
PLP(P), which is defined by
LP(PLP(P)) := {head(𝑅𝐶) ← body(𝑅𝐶) ∪ {𝑢(𝑅𝐶)} : 𝑅𝐶 ∈ P}
Facts(PLP(P)) := {𝑢(𝑅𝐶) : 𝜋(𝑅𝐶) : 𝑅𝐶 ∈ P}.
Finally, we call P acyclic if PLP(P) is an acyclic probabilistic logic program.
Remark 6. An acyclic Problog program is automatically Markovian, i.e. it possesses a well-
defined causal Bayesian network semantics.
Example 12. The causal model in Example 1 can be defined using the following Problog
program:
szn_spr_sum:0.5.
sprinkler :0.7 :- szn_spr_sum.
rain:0.1 :- szn_spr_sum.
rain:0.6 :- \+szn_spr_sum.
wet:1 :- rain.
wet:1 :- sprinkler.
slippery :1 :- wet.
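For instance, the translation of Definition 14 turns the clause sprinkler:0.7 :- szn_spr_sum.
into the clause sprinkler :- szn_spr_sum, u. together with the random fact u:0.7 for a fresh
external proposition u, so that, up to the naming of the external propositions and up to
probability-1 random facts for the deterministic clauses, this Problog program recovers the
program P from the introduction.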
Finally, non-ground probabilistic logic programs can be considered as Problog pro-
grams built over relational alphabets. Instantiating such a non-ground program with a
given domain then yields a propositional program to which the results of this paper can
be applied. Hence, the FCM-semantics for probabilistic logic programs and all results
of this paper easily generalize to the non-ground case. In particular by Remark 6 all
non-ground programs constructed in this way possess a well-defined causal Bayesian
network semantics.
6. Conclusion
In this paper we present an interpretation of acyclic probabilistic logic programs in terms
of Pearl’s functional causal models. As a consequence we are able to establish the three
query types of functional causal models, namely probability calculus, prediction of the
effect of interventions and counterfactual reasoning, within the framework of probabilistic
logic programming.
Moreover, we see that each of these query types comes with its own semantics and
that it is not clear from the syntax which knowledge a given probabilistic logic program
actually states. Hence, one should be careful when using a probabilistic logic program
for causal queries.
Finally, we highlight that each of those semantics yields a new structure learning task.
This brings the subject of causal structure discovery to the realm of probabilistic logic
programming. In the future we aim to develop causal structure discovery algorithms
for learning non-ground probabilistic logic programs under causal Bayesian network
and FCM-semantics.
Acknowledgments
The research leading to this publication was supported by LMUexcellent, funded by
the Federal Ministry of Education and Research (BMBF) and the Free State of Bavaria
under the Excellence Strategy of the Federal Government and the Länder.
The authors would like to thank Kailin Sun for proofreading the initial submission.
References
[1] F. Riguzzi, Foundations of Probabilistic Logic Programming: Languages, Semantics,
Inference and Learning, River Publishers, 2020.
[2] J. Pearl, Causality, 2 ed., Cambridge University Press, Cambridge, UK, 2000.
doi:10.1017/CBO9780511803161.
[3] J. Vennekens, M. Denecker, M. Bruynooghe, CP-logic: A Language of Causal
Probabilistic Events and Its Relation to Logic Programming, Theory Pract. Log.
Program. 9 (2009) 245–308. doi:10.1017/S1471068409003767.
[4] A. Bochman, V. Lifschitz, Pearl’s causality in a logical setting, in: B. Bonet,
S. Koenig (Eds.), Proceedings of the Twenty-Ninth AAAI Conference on Artificial
Intelligence, January 25-30, 2015, Austin, Texas, USA, AAAI Press, 2015, pp. 1446–
1452. doi:10.1609/aaai.v29i1.9411.
[5] J. Vennekens, M. Bruynooghe, M. Denecker, Embracing events in causal modelling:
Interventions and counterfactuals in cp-logic, in: T. Janhunen, I. Niemelä (Eds.),
Logics in Artificial Intelligence, Springer Berlin Heidelberg, Berlin, Heidelberg,
2010, pp. 313–325.
[6] J. Vennekens, S. Verbaeten, Logic Programs with Annotated Disjunctions, Tech-
nical Report CW 368, Katholieke Universiteit Leuven, Department of Com-
puter Science, 2003. URL: https://www.cs.kuleuven.be/publicaties/rapporten/cw/
CW368.abs.html.
[7] F. Fages, Consistency of Clark's completion and existence of stable models, Methods
of Logic in Computer Science 1 (1994) 51–60.
[8] T. Eiter, M. Hecher, R. Kiesel, Treewidth-aware cycle breaking for algebraic answer
set counting, in: M. Bienvenu, G. Lakemeyer, E. Erdem (Eds.), Proceedings of
the 18th International Conference on Principles of Knowledge Representation
and Reasoning, KR 2021, Online event, November 3-12, 2021, 2021, pp. 269–279.
doi:10.24963/kr.2021/26.
[9] F. Riguzzi, G. Cota, E. Bellodi, R. Zese, Causal inference in cplint, Inter-
national Journal of Approximate Reasoning 91 (2017) 216–232. doi:10.1016/
j.ijar.2017.09.007.