Correct Causal Inference in Probabilistic Logic Programming

Kilian Rückschloß, Felix Weitkämper
Ludwig-Maximilians-Universität München, Institut für Informatik, Lehr- und Forschungseinheit für Programmier- und Modellierungssprachen, Oettingenstraße 67, D-80538 München (Germany)

Abstract
In this work we investigate the relationship between the classical distribution semantics and the causal semantics for probabilistic logic programming. We first give an infinite family of programs that all yield the same distribution. From this we deduce that, unlike in classical causal structure discovery, in the relational case it is not possible to build a finite search space of causal explanations for an observed distribution. Finally, this article develops a procedure to verify the causal content of a probabilistic logic program whose structure is obtained from observations.

Keywords
Probabilistic Logic Programming, Causal Structure Discovery, Causal Semantics, Statistical Relational Artificial Intelligence

Haifa’22: 15th Workshop on Answer Set Programming and Other Computing Paradigms
kilian.rueckschloss@lmu.de (K. Rückschloß); felix.weitkaemper@lmu.de (F. Weitkämper)
https://www.pms.ifi.lmu.de/mitarbeiter/derzeitige/kilian-rueckschloss/ (K. Rückschloß); https://www.pms.ifi.lmu.de/mitarbeiter/derzeitige/felix-weitkaemper/index.html (F. Weitkämper)
ORCID: 0000-0002-7891-6030 (K. Rückschloß); 0000-0002-3895-8279 (F. Weitkämper)

1. Introduction
To illustrate the difficulties of causal structure discovery in probabilistic logic programming, we begin by constructing an infinite family of programs that all induce the same distribution: Assume we are given a database that models a directed graph 𝐺, that is, it consists of a fact 𝑒(𝑥, 𝑦) for every edge 𝑥 → 𝑦 in 𝐺. Moreover, each node 𝑥 in 𝐺 may satisfy a certain property 𝑝; in this case our database contains the fact 𝑝(𝑥). Let 𝑛 ∈ ℕ be a natural number and let 𝑥 be a node in 𝐺. We write 𝑛_𝑝𝑎𝑡ℎ(𝑥) if there exists a path of length 𝑛 from 𝑥 to a node satisfying 𝑝. The predicate 𝑛_𝑝𝑎𝑡ℎ(_) can be defined by the following normal clause:

𝑛_𝑝𝑎𝑡ℎ(𝑋) ← 𝑒(𝑋, 𝑋1), 𝑒(𝑋1, 𝑋2), ..., 𝑒(𝑋𝑛−1, 𝑋𝑛), 𝑝(𝑋𝑛)

Further, we write 𝑛𝑜_𝑛_𝑝𝑎𝑡ℎ(𝑥) if no such path exists, i.e. 𝑛𝑜_𝑛_𝑝𝑎𝑡ℎ(𝑋) ← ¬𝑛_𝑝𝑎𝑡ℎ(𝑋).

Let 𝛼 ∈ (0, 1] be an arbitrary real number. Further, assume that every node 𝑥 in 𝐺 gives rise to two Boolean random variables 𝑟(𝑥) and 𝑠(𝑥). We consider the following probabilistic logic program:

\[
\mathbf{P}_n :=
\begin{cases}
r(X) : \alpha \leftarrow n\_path(X)\\
s(X) \leftarrow r(X),\ n\_path(X)\\
s(X) : \alpha \leftarrow no\_n\_path(X)\\
r(X) \leftarrow s(X),\ no\_n\_path(X)
\end{cases}
\]

An easy calculation shows that all the programs P𝑛 yield the same distribution of the random variables 𝑟(𝑥) and 𝑠(𝑥), where 𝑥 ranges over the nodes of the fixed directed graph 𝐺. However, for a fixed 𝑛 ∈ ℕ the program P𝑛 describes a causal mechanism in which, for every node 𝑥 in 𝐺, we find that 𝑠(𝑥) is a cause of 𝑟(𝑥) unless 𝑛_𝑝𝑎𝑡ℎ(𝑥) holds, whereas 𝑟(𝑥) is a cause of 𝑠(𝑥) if 𝑛_𝑝𝑎𝑡ℎ(𝑥) holds in 𝐺. Hence, all the P𝑛 represent different causal structures.
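To make the indistinguishability claim concrete, here is a minimal sketch in Python; the three-node graph, the property 𝑝 and the value of 𝛼 are illustrative assumptions, not data from the paper. It computes the exact joint distribution induced by P𝑛 by enumerating the Boolean error terms:

```python
from itertools import product

edges = {("a", "b"), ("b", "c")}   # a hypothetical directed graph G
has_p = {"c"}                      # nodes satisfying the property p
nodes = sorted({"a", "b", "c"})
alpha = 0.3

def n_path(x, n):
    """True iff a path of length n leads from x to a node satisfying p."""
    frontier = {x}
    for _ in range(n):
        frontier = {w for (v, w) in edges if v in frontier}
    return any(v in has_p for v in frontier)

def joint(n):
    """Exact joint distribution of (r(x), s(x)) for all nodes x under P_n.

    For each node exactly one probabilistic clause is applicable, so a
    single Boolean error term u per node suffices."""
    dist = {}
    for errs in product([False, True], repeat=len(nodes)):
        weight, state = 1.0, []
        for x, u in zip(nodes, errs):
            weight *= alpha if u else 1.0 - alpha
            if n_path(x, n):
                r = u        # r(X) : alpha <- n_path(X)
                s = r        # s(X) <- r(X), n_path(X)
            else:
                s = u        # s(X) : alpha <- no_n_path(X)
                r = s        # r(X) <- s(X), no_n_path(X)
            state.append((x, r, s))
        dist[tuple(state)] = dist.get(tuple(state), 0.0) + weight
    return dist

# The same distribution for every n, although the causal roles of r and s
# differ between the programs P_n:
assert joint(1) == joint(2) == joint(5)
```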
Furthermore, assume we are given data sampled from the distribution of 𝑟(_) and 𝑠(_), and that a structure learning algorithm like SLIPCOVER [1] yields a program P that induces the same distribution as all the P𝑛. Hence, we obtain infinitely many hypotheses for the underlying causal mechanism, even if we assume that P represents the right distribution. Therefore, unlike in classical causal structure discovery, in probabilistic logic programming it is in general impossible to compute the search space of all causal explanations for an observed distribution. Hence, given a probabilistic logic program whose structure has been derived from observational data, the best one can do is to verify its causal content. The aim of this contribution is to propose a procedure for this verification task.

In Section 2 we begin by recalling Pearl’s functional causal models and their relation to causal Bayesian networks. Further, we give a brief overview of the theory of causal structure discovery for causal Bayesian networks. Finally, we introduce a semantics for probabilistic logic programs in terms of functional causal models, which allows us to ground a given program to a causal Bayesian network. The main idea of our work in Section 3 is then to lift the indistinguishability statement for causal Bayesian networks, Theorem 5, to probabilistic logic programming in order to accomplish our verification task.

2. Preliminaries
We begin by recalling Pearl’s theory of causality. Further, we establish a semantics for probabilistic logic programs that allows us to reason about causal relationships in those programs. Note that in the following we use 𝜋 to denote probabilities of random variables.

2.1. Pearl’s Functional Causal Models
Let us recall the definition of a functional causal model [2, 1.4.1]:

Definition 1 (Functional Causal Model). A functional causal model or causal model ℳ on a set of variables V is a system of equations that consists of one defining equation of the form

𝑋 := 𝑓𝑋(Pa(𝑋), Error(𝑋))

for each variable 𝑋 ∈ V. Here the parents Pa(𝑋) ⊆ V of 𝑋 form a subset of the set of variables V, the error term Error(𝑋) of 𝑋 is a vector of random variables and 𝑓𝑋 is a function defining 𝑋 in terms of the parents Pa(𝑋) and the error term Error(𝑋) of 𝑋.

To illustrate Definition 1 we introduce the following example:

Example 1. Assume we are given a database of countries. Further, we know which products 𝑝 are produced in country 𝑐, denoted by 𝑝𝑟𝑜𝑑(𝑐, 𝑝). Each country 𝑐 needs each product 𝑝, denoted by 𝑛𝑒𝑒𝑑(𝑐, 𝑝), with probability 𝛼. This is reflected in the following equation:

\[ need(c, p) := u_1(c, p) \quad\text{with } \pi(u_1(c, p)) := \alpha \tag{1} \]

Analogously, a country 𝑐 is corrupt, denoted 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐), with probability 𝛽, i.e.

\[ corrupt(c) := u_2(c) \quad\text{with } \pi(u_2(c)) := \beta. \tag{2} \]

If a country 𝑐1 needs a product 𝑝 and does not produce 𝑝, but 𝑐2 does, then with probability 𝛾 the country 𝑐1 imports from 𝑐2, denoted by 𝑖𝑚𝑝(𝑐1, 𝑐2), i.e.

\[ imp(c_1, c_2) := \bigvee_{\substack{p:\ \neg prod(c_1, p),\\ prod(c_2, p)}} need(c_1, p) \wedge u_3(c_1, c_2, p) \quad\text{with } \pi(u_3(c_1, c_2, p)) := \gamma. \tag{3} \]

Further, if a country 𝑐1 imports from a country 𝑐2, we observe economic growth in 𝑐2, denoted 𝑔𝑟𝑜𝑤𝑠(𝑐2), with probability 𝛿, i.e.

\[ grows(c_2) := \bigvee_{c_1 \text{ country}} imp(c_1, c_2) \wedge u_4(c_1, c_2) \quad\text{with } \pi(u_4(c_1, c_2)) := \delta. \tag{4} \]

Finally, if we find economic growth in country 𝑐 and 𝑐 is not corrupt, the living standard rises in 𝑐, denoted by 𝑙𝑖𝑣𝑒(𝑐), with probability 𝜖, i.e.

\[ live(c) := grows(c) \wedge \neg corrupt(c) \wedge u_5(c) \quad\text{with } \pi(u_5(c)) := \epsilon. \tag{5} \]

The equations (1)–(5) yield a functional causal model ℳ.
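The structural equations of Example 1 translate directly into executable form. The following minimal sketch (a hypothetical toy instance with two countries and one product; all names and parameter values are illustrative assumptions) evaluates equations (1)–(5) on fresh error terms:

```python
import random

countries, products = ["c1", "c2"], ["p"]
prod = {("c2", "p")}                        # c2 produces p
alpha, beta, gamma, delta, eps = 0.4, 0.2, 0.7, 0.6, 0.8

def sample():
    """Evaluate the structural equations (1)-(5) on fresh error terms."""
    u = lambda q: random.random() < q       # a Boolean error term
    need = {(c, p): u(alpha) for c in countries for p in products}        # (1)
    corrupt = {c: u(beta) for c in countries}                             # (2)
    imp = {(c1, c2): any(need[c1, p] and u(gamma)                         # (3)
                         for p in products
                         if (c1, p) not in prod and (c2, p) in prod)
           for c1 in countries for c2 in countries}
    grows = {c2: any(imp[c1, c2] and u(delta) for c1 in countries)        # (4)
             for c2 in countries}
    live = {c: grows[c] and not corrupt[c] and u(eps) for c in countries} # (5)
    return need, corrupt, imp, grows, live

# Monte-Carlo estimate of pi(live(c2)):
print(sum(sample()[4]["c2"] for _ in range(10_000)) / 10_000)
```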
Next, we note that causal models support not only queries about conditional and unconditional probabilities, but also two more general query types, namely determining the effect of interventions and evaluating counterfactuals. Here, we focus on the treatment of external interventions.

2.1.1. Predicting the Effect of Interventions
Fix a causal model ℳ on a set of variables V. Assume we are given a subset of variables X := {𝑋1, ..., 𝑋𝑘} ⊆ V together with a vector of possible values x := (𝑥1, ..., 𝑥𝑘). To calculate the effect of setting the variables in X to the values specified by x, we replace the equation for 𝑋𝑖 in ℳ by 𝑋𝑖 := 𝑥𝑖 for all 1 ≤ 𝑖 ≤ 𝑘 and calculate the desired probabilities. The resulting probability distribution is then denoted by 𝜋(_ | do(X = x)) [2, §1.4.3].

Example 2. If a commission roots out corruption in country 𝑐, we can model this intervention by replacing equation (2) for 𝑐 in the causal model ℳ of Example 1 with 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐) := 𝐹𝑎𝑙𝑠𝑒.

2.2. Causal Bayesian Networks and d-Separation
We briefly recall the theory of Bayesian networks, a widespread representation for probability distributions. We compare them with the functional causal models of Section 2.1 and obtain a causal semantics for Bayesian networks, i.e. we discuss how to determine the effect of an intervention from a Bayesian network’s structure. We begin by recalling the definition of a Bayesian network:

Definition 2 (Bayesian Network). A Bayesian network on a set of random variables V consists of a directed acyclic graph 𝐺 on V and the conditional probability distributions 𝜋(𝑋 | Pa(𝑋)) of the random variables 𝑋 ∈ V conditioned on the set Pa(𝑋) of their parents in the graph 𝐺. It gives rise to a probability distribution on V = {𝑋1, ..., 𝑋𝑘} by setting

\[ \pi(X_1 = x_1, \ldots, X_k = x_k) = \prod_{i=1}^{k} \pi\bigl(X_i = x_i \mid \mathrm{pa}(X_i)\bigr), \]

where pa(𝑋𝑖) := {𝑋𝑗 = 𝑥𝑗 | 𝑋𝑗 ∈ Pa(𝑋𝑖)}.

We proceed by recalling the relation between Bayesian networks and causal models.

Definition 3 (Acyclic & Markovian Causal Models). For each causal model ℳ on V we define the causal diagram graph(ℳ) to be the directed graph on V which is given by drawing an arrow 𝑋 → 𝑌 if and only if 𝑋 ∈ Pa(𝑌). Further, we call ℳ an acyclic causal model if its causal diagram graph(ℳ) is a directed acyclic graph. Finally, an acyclic causal model is said to be Markovian if the error terms Error(𝑋) are mutually independent for all 𝑋 ∈ V.

The next theorem describes how causal models and Bayesian networks are related to each other [2, 1.4.2]:

Theorem 1. Every Markovian causal model ℳ gives rise to a probability distribution that can be stored in a Bayesian network ℬ(ℳ) on its causal diagram.

Theorem 1 yields a way to determine the effects of interventions from a Bayesian network ℬ(ℳ) corresponding to a Markovian causal model ℳ. To calculate the effect of setting X := {𝑋1, ..., 𝑋𝑘} to x := (𝑥1, ..., 𝑥𝑘), one can modify ℬ(ℳ) by adjusting the conditional distributions 𝜋(𝑋𝑖 | Pa(𝑋𝑖)) to 𝜋(𝑋𝑖 | pa(𝑋𝑖)) = 𝛿𝑋𝑖,𝑥𝑖, where 𝛿𝑋𝑖,𝑥𝑖 denotes the Kronecker delta. The resulting structure is then called the causal Bayesian network 𝒞ℬ(ℳ) [2, §1.3]. In other words, a causal Bayesian network is a Bayesian network that supports the calculation of the effects of interventions.
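Returning to the intervention of Example 2, the sketch below (same hypothetical toy setup as above) replaces the defining equation of 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐) by 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐) := 𝐹𝑎𝑙𝑠𝑒 for the intervened countries, so that samples are drawn from 𝜋(_ | do(corrupt(c) = False)):

```python
import random

countries, products = ["c1", "c2"], ["p"]
prod = {("c2", "p")}
alpha, beta, gamma, delta, eps = 0.4, 0.2, 0.7, 0.6, 0.8

def sample_do(target=frozenset()):
    """Sample from the model of Example 1 under do(corrupt(c) = False)
    for every country c in `target`."""
    u = lambda q: random.random() < q
    need = {(c, p): u(alpha) for c in countries for p in products}
    corrupt = {c: False if c in target else u(beta)   # replaced equation (2)
               for c in countries}
    imp = {(c1, c2): any(need[c1, p] and u(gamma) for p in products
                         if (c1, p) not in prod and (c2, p) in prod)
           for c1 in countries for c2 in countries}
    grows = {c2: any(imp[c1, c2] and u(delta) for c1 in countries)
             for c2 in countries}
    return {c: grows[c] and not corrupt[c] and u(eps) for c in countries}

# pi(live(c2) | do(corrupt(c2) = False)) by simulation; rooting out
# corruption raises the probability that the living standard rises:
print(sum(sample_do(frozenset({"c2"}))["c2"] for _ in range(10_000)) / 10_000)
```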
Example 3. Assume that the error terms 𝑢1(_, _), 𝑢2(_), 𝑢3(_, _, _), 𝑢4(_, _) and 𝑢5(_) are mutually independent, i.e. the causal model ℳ of Example 1 is Markovian. Hence, for three countries 𝑐1, 𝑐2, 𝑐3 and one product 𝑝 with 𝑝𝑟𝑜𝑑(𝑐2, 𝑝) and not 𝑝𝑟𝑜𝑑(𝑐𝑖, 𝑝) for 𝑖 = 1, 3, we obtain by Theorem 1 that, omitting isolated points, the resulting distribution can be stored in a causal Bayesian network on the graph with the edges

𝑛𝑒𝑒𝑑(𝑐3, 𝑝) → 𝑖𝑚𝑝(𝑐3, 𝑐2) → 𝑔𝑟𝑜𝑤𝑠(𝑐2) ← 𝑖𝑚𝑝(𝑐1, 𝑐2) ← 𝑛𝑒𝑒𝑑(𝑐1, 𝑝) and 𝑔𝑟𝑜𝑤𝑠(𝑐2) → 𝑙𝑖𝑣𝑒(𝑐2) ← 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐2).

Finally, we state a modularity result for causal models, whose proof is given in the Appendix:

Theorem 2 (Modularity of Causal Models). Let ℳ1 and ℳ2 be functional causal models that induce the same distribution on the set of variables V and let 𝑋 ∈ V be a variable. Furthermore, let us assume that the set Pa(𝑋) of the parents of 𝑋 coincides for both causal diagrams graph(ℳ1) and graph(ℳ2). If we construct the causal model ℳ3 by exchanging the defining equation for 𝑋 in ℳ1 with the defining equation for 𝑋 in ℳ2, we obtain that the causal Bayesian networks 𝒞ℬ(ℳ1) and 𝒞ℬ(ℳ3) coincide.

2.2.1. d-Separation
The graph underlying a Bayesian network gives rise to a correct conditional independence oracle in the following way [3]:

Definition 4 (d-Connecting Path). Let 𝐺 be a directed acyclic graph. A subgraph of the form 𝐴 → 𝐶 ← 𝐵 is called a collider at 𝐶. We say that a collider is shielded if there exists an edge 𝐴 − 𝐵; otherwise, the collider is said to be unshielded. Further, let Z be a set of nodes. A d-connecting path 𝑃 := 𝑅1 − ... − 𝑅𝑛 is an undirected path in 𝐺 such that no non-collider of the path 𝑃 lies in Z and such that for every collider at 𝐶 of the path 𝑃 there exists a directed path from 𝐶 to a node in Z. Moreover, we say that Z d-connects two nodes 𝐴 and 𝐵 if there exists a d-connecting path between 𝐴 and 𝐵 with respect to Z; otherwise, we say that Z d-separates 𝐴 and 𝐵. Finally, two sets of nodes A and B are said to be d-separated by Z if Z d-separates 𝐴 and 𝐵 for every 𝐴 ∈ A and every 𝐵 ∈ B; otherwise, the sets A and B are d-connected by Z.

Remark 1. The term “d-connected” is a shorthand for “directionally connected” [4].

Theorem 3 (Correctness of d-Separation). Let 𝐺 be a directed acyclic graph on the set V. Further, let Z ⊆ V be a subset of nodes and let 𝐴, 𝐵 ∈ V be nodes of 𝐺. If Z d-separates 𝐴 and 𝐵, we obtain that 𝐴 and 𝐵 are independent conditioned on Z in every distribution that can be represented by a Bayesian network on 𝐺.

Note that in general d-separation does not yield a complete independence oracle [5, 1.4]. This motivates the following definition:

Definition 5 (Causal Faithfulness). Let 𝜋 be a distribution on a set of random variables V and let 𝐺 be a directed acyclic graph on these random variables. We say that the distribution 𝜋 is (causally) faithful to the graph 𝐺 if two random variables 𝐴, 𝐵 ∈ V are conditionally independent with respect to a subset of random variables Z ⊆ V if and only if 𝐴 and 𝐵 are d-separated by Z in 𝐺.

However, the following theorem shows that faithfulness holds for almost all discrete Bayesian networks [6]:

Theorem 4 (Obstruction to Causal Faithfulness). Let 𝐺 be a directed acyclic graph and let 𝜃 ∈ [0, 1]ⁿ be a vector which determines the conditional distributions that turn 𝐺 into a discrete Bayesian network ℬ representing the distribution 𝜋. In this case we obtain finitely many non-trivial polynomial equations such that 𝜋 is faithful to 𝐺 unless 𝜃 solves one of these equations.

For an in-depth discussion of causal faithfulness we refer the reader to [7].
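Definition 4 can be checked mechanically on small graphs. The following minimal sketch (graphs are plain edge sets; a node counts as its own descendant, which matches the usual reading of “directed path into Z”) enumerates undirected paths naively, which is fine for toy examples such as the graph of Example 3:

```python
def descendants(g, v):
    """Nodes reachable from v by a directed path (v counts as reachable)."""
    out, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for (a, b) in g:
            if a == x and b not in out:
                out.add(b)
                stack.append(b)
    return out

def d_connecting(g, path, z):
    """Check the condition of Definition 4 along an undirected path."""
    for i in range(1, len(path) - 1):
        left, c, right = path[i - 1], path[i], path[i + 1]
        if (left, c) in g and (right, c) in g:    # collider at c
            if not descendants(g, c) & z:         # no directed path into Z
                return False
        elif c in z:                              # non-collider lies in Z
            return False
    return True

def d_connected(g, a, b, z):
    adj = {}
    for (x, y) in g:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    def extend(path):
        if path[-1] == b:
            return d_connecting(g, path, z)
        return any(extend(path + [n]) for n in adj.get(path[-1], ())
                   if n not in path)
    return extend([a])

# The graph of Example 3: the collider at grows(c2) blocks the path between
# need(c1,p) and need(c3,p) unless we condition on a descendant of grows(c2).
g = {("need(c1,p)", "imp(c1,c2)"), ("need(c3,p)", "imp(c3,c2)"),
     ("imp(c1,c2)", "grows(c2)"), ("imp(c3,c2)", "grows(c2)"),
     ("grows(c2)", "live(c2)"), ("corrupt(c2)", "live(c2)")}
print(d_connected(g, "need(c1,p)", "need(c3,p)", set()))         # False
print(d_connected(g, "need(c1,p)", "need(c3,p)", {"live(c2)"}))  # True
```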
2.2.2. Causal Structure Discovery for Bayesian Networks
At this point we want to discuss how we can learn a causal Bayesian network, i.e. a Bayesian network that allows us to predict the effect of interventions, from observational data. Since we only have observational data, the best we can do is to learn a Bayesian network that represents the right distribution. However, we want to find the Bayesian network with the right causal semantics! More precisely, we want to find all possible causal explanations for the observed distribution. We consider this an inverse problem, and to tackle it we assume that the causal Bayesian network that generated our data satisfies the causal faithfulness assumption of Definition 5. In the light of Theorem 4 this does not seem too restrictive. Furthermore, the following fact is known [8]:

Theorem 5. Let 𝐺 and 𝐺′ be two directed acyclic graphs on a set of random variables V. In this case the following statements are equivalent:
i) Each distribution 𝜋 that is faithful to 𝐺 is also faithful to 𝐺′.
ii) The graphs 𝐺 and 𝐺′ share the same adjacencies and the same unshielded colliders.

Theorem 5 forms the theoretical basis for causal structure discovery algorithms such as the PC algorithm [9] and its generalizations.

Example 4. It is easy to see that in Example 3 the direction of the causal relationship between 𝑛𝑒𝑒𝑑(_, _) and 𝑖𝑚𝑝(_, _) cannot be identified.
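The criterion of Theorem 5 is straightforward to implement. A minimal sketch (graphs as edge sets, as in the previous sketch) of the indistinguishability test, applied to the unidentifiable edge of Example 4:

```python
def skeleton(g):
    return {frozenset(e) for e in g}

def unshielded_colliders(g):
    sk = skeleton(g)
    return {(a, c, b)
            for (a, c) in g for (b, c2) in g
            if c == c2 and a < b and frozenset((a, b)) not in sk}

def faithfully_indistinguishable(g1, g2):
    """Theorem 5: same adjacencies and same unshielded colliders."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))

# Reversing need(c1,p) -> imp(c1,c2) in the graph of Example 3 preserves
# adjacencies and unshielded colliders, so this direction is not
# identifiable; reversing imp(c1,c2) -> grows(c2) instead would alter the
# collider structure at grows(c2) and hence be detectable.
g = {("need(c1,p)", "imp(c1,c2)"), ("need(c3,p)", "imp(c3,c2)"),
     ("imp(c1,c2)", "grows(c2)"), ("imp(c3,c2)", "grows(c2)"),
     ("grows(c2)", "live(c2)"), ("corrupt(c2)", "live(c2)")}
g_rev = (g - {("need(c1,p)", "imp(c1,c2)")}) | {("imp(c1,c2)", "need(c1,p)")}
print(faithfully_indistinguishable(g, g_rev))  # True
```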
2.3. Probabilistic Logic Programming under the FCM-Semantics
Let us fix a query language, i.e. a language Q ⊇ L ⊇ E in three parts with an extensional vocabulary E and a logical vocabulary L. Here Q is a finite multisorted relational vocabulary, i.e. it consists of a finite set of sorts, a finite set of relation symbols and a finite set of constants in those sorts, as well as a countably infinite set of variables in every sort. For every variable or constant 𝑥 we denote its sort by s(𝑥). Further, L is a subvocabulary of Q that contains all of the variables and constants of Q as well as a (possibly empty) subset of the relation symbols of Q. Moreover, E is a subvocabulary of L that satisfies the same properties with respect to L as L does with respect to Q.

Example 5. In order to model Example 1 we need two sorts: 𝑐𝑜𝑢𝑛𝑡𝑟𝑖𝑒𝑠 and 𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑠. Further, we model 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(_), 𝑛𝑒𝑒𝑑(_, _), 𝑖𝑚𝑝(_, _), 𝑔𝑟𝑜𝑤𝑠(_) and 𝑙𝑖𝑣𝑒(_) as random predicates and 𝑝𝑟𝑜𝑑(_, _) as an external predicate. Later we will also see an example of an internal predicate.

As usual, an atom is an expression of the form 𝑟(𝑡1, . . . , 𝑡𝑛), where 𝑟 is a relation symbol and 𝑡1 to 𝑡𝑛 are constants or variables of the correct sorts, and a literal is an expression of the form 𝐴 or ¬𝐴 for an atom 𝐴. It is called an external atom or literal if 𝑟 is in E, a logical atom or literal if 𝑟 is in L, an internal atom or literal if 𝑟 lies in L ∖ E and a random atom or literal if 𝑟 is in Q ∖ L. A literal of the form 𝐴 is called positive and a literal of the form ¬𝐴 is called negative. Moreover, we will use the operator var(_) to refer to the variables in an expression.

The logical vocabulary will be used to formulate constraints and conditions for our probabilistic logic program. The purpose of a probabilistic logic program, however, is to define distributions for the random variables determined by the language Q. This is done by so-called random clauses, i.e. LPAD-clauses with one random atom in the head.

Definition 6 (Random Clause). A random clause 𝑅𝐶 is an expression of the form

𝑅 : 𝜋(𝑅𝐶) ← 𝑅1, ..., 𝑅𝑚, 𝐿1, ..., 𝐿𝑛,

where 𝑅 := effect(𝑅𝐶) is a random atom, called the effect of 𝑅𝐶, causes(𝑅𝐶) := {𝑅1, ..., 𝑅𝑚} is a finite set of random literals, called the causes of 𝑅𝐶, cond(𝑅𝐶) := {𝐿1, ..., 𝐿𝑛} is a finite set of logical atoms, called the condition of 𝑅𝐶, and 𝜋(𝑅𝐶) ∈ (0, 1] is a number called the probability of 𝑅𝐶. In this case we also denote the random clause 𝑅𝐶 by effect(𝑅𝐶) : 𝜋(𝑅𝐶) ← causes(𝑅𝐶) ∪ cond(𝑅𝐶).

Having established the necessary syntax, we introduce the corresponding semantics. The semantics of logical expressions is given in a straightforward way; we just want to highlight the unique names assumption in our definition of a structure: An L-structure Λ consists of a sort universe sᴧ for every sort s, an element of the corresponding sort universe for every constant in L, such that two different constants are mapped to different elements, and a relation among the corresponding sort universes for every relation symbol in L. Whether a logical formula is satisfied by a given L-structure (under an interpretation of its free variables) is determined by the usual rules of first-order logic. Finally, note that the semantics of external expressions is defined analogously.

Lifted Programs
We start with the definition of a lifted program:

Definition 7 (Lifted Program & Program Structure). A (lifted) program P := (R, I, C) is a triple that consists of a finite set of normal clauses I of the form 𝐻 ← 𝐵1, ..., 𝐵𝑚 with logical literals 𝐵1, ..., 𝐵𝑚 and an internal atom 𝐻, integrity constraints C of the form ⊥ ← 𝐿1, ..., 𝐿𝑘 for logical literals 𝐿𝑖 with 1 ≤ 𝑖 ≤ 𝑘, as well as a finite set of random clauses R. We call C the constraints, I the internal part and R the random part of P. Further, we say that P is stratified if its internal part I is a stratified set of normal clauses.

Example 6. We model the causal mechanism of Example 1 by a program P, given by the random part

𝑛𝑒𝑒𝑑(𝐶, 𝑃) : 𝛼
𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝐶) : 𝛽
𝑖𝑚𝑝(𝐶1, 𝐶2) : 𝛾 ← 𝑛𝑒𝑒𝑑(𝐶1, 𝑃), 𝑛𝑒𝑔_𝑝𝑟𝑜𝑑(𝐶1, 𝑃), 𝑝𝑟𝑜𝑑(𝐶2, 𝑃)
𝑔𝑟𝑜𝑤𝑠(𝐶2) : 𝛿 ← 𝑖𝑚𝑝(𝐶1, 𝐶2)
𝑙𝑖𝑣𝑒(𝐶) : 𝜖 ← 𝑔𝑟𝑜𝑤𝑠(𝐶), ¬𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝐶)

and the internal part

𝑛𝑒𝑔_𝑝𝑟𝑜𝑑(𝐶, 𝑃) ← ¬𝑝𝑟𝑜𝑑(𝐶, 𝑃)

for an internal predicate 𝑛𝑒𝑔_𝑝𝑟𝑜𝑑(_, _). Note that the database ℰ described in Example 3 yields an external database for P.

Notation 1. Let P := (R, I, C) be a stratified program and let ℰ be an E-structure. In our setting we may w.l.o.g. assume that ℰ is a Herbrand model of an alphabet E*, extending the external alphabet E by constants. Further, denote by L* and Q* the extensions of the languages L and Q by the new constants in E*. We write

ℰᴵ := {𝐿 ground atom of L* : I ∪ ℰ ⊨ 𝐿}

for the minimal Herbrand model of I ∪ ℰ, which is the result of applying the stratified datalog program I to ℰ. An E-structure ℰ is said to be an external database for the program P if it satisfies all integrity constraints in C, i.e. if ℰᴵ ⊨ C. A ground variable is a ground random atom 𝐺 := 𝑟(𝑥1, ..., 𝑥𝑛) of Q*. Finally, we write 𝒢(ℰ) for the set of all ground variables. The term ground variable indicates that we expect 𝐺 to become a proper random variable under our semantics.

From now on we will assume every program to be stratified.
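Definitions 6 and 7 suggest a direct machine representation. The following minimal sketch (a hypothetical encoding, not the authors’ implementation) stores a random clause as effect, causes, condition and probability, and encodes the random part of Example 6; atoms are predicate-argument pairs and capitalised arguments stand for variables:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RandomClause:
    effect: tuple        # random atom, e.g. ("grows", ("C2",))
    causes: tuple = ()   # random literals: ("+", atom) or ("-", atom)
    cond: tuple = ()     # logical atoms forming the condition
    prob: float = 1.0    # pi(RC) in (0, 1]

alpha, beta, gamma, delta, eps = 0.4, 0.2, 0.7, 0.6, 0.8  # illustrative

# The random part R of the program P of Example 6:
R = (
    RandomClause(("need", ("C", "P")), prob=alpha),
    RandomClause(("corrupt", ("C",)), prob=beta),
    RandomClause(("imp", ("C1", "C2")),
                 causes=(("+", ("need", ("C1", "P"))),),
                 cond=(("neg_prod", ("C1", "P")), ("prod", ("C2", "P"))),
                 prob=gamma),
    RandomClause(("grows", ("C2",)),
                 causes=(("+", ("imp", ("C1", "C2"))),),
                 prob=delta),
    RandomClause(("live", ("C",)),
                 causes=(("+", ("grows", ("C",))), ("-", ("corrupt", ("C",)))),
                 prob=eps),
)
```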
Definition 8 (FCM-Semantics of Lifted Programs). Let P := (R, I, C) be a program and let ℰ be an external database for P. The grounding Pℰ of the program P with respect to ℰ is the Boolean functional causal model on the set of ground variables 𝒢(ℰ) given by the following defining equation for every ground variable 𝐺 ∈ 𝒢(ℰ):

\[
G := \bigvee_{\substack{RC \in \mathbf{R},\ \iota \text{ interpretation on } \mathrm{var}(RC)\\ \mathrm{effect}(RC)^{\iota} = G,\ (\mathcal{E}^{\mathbf{I}}, \iota) \models \mathrm{cond}(RC)}} \Biggl( \bigwedge_{C \in \mathrm{causes}(RC)} C^{\iota} \wedge u(RC, \iota) \Biggr)
\]

Here, the error term 𝑢(𝑅𝐶, 𝜄) is a distinct Boolean random variable with the distribution 𝜋(𝑢(𝑅𝐶, 𝜄)) = 𝜋(𝑅𝐶) for every random clause 𝑅𝐶 ∈ R and every variable interpretation 𝜄 on var(𝑅𝐶). Besides, the error terms 𝑢(𝑅𝐶, 𝜄) are assumed to be mutually independent.

Example 7. As intended, the causal model ℳ of Example 1 yields the semantics of the program P in Example 6 with respect to the described database ℰ.

Definition 9 (Ground Graph & Acyclicity). Let ℰ be an external database for the program P := (R, I, C). Then the ground graph Graphℰ(P) is the directed graph on the set of ground variables 𝒢(ℰ) with an edge 𝐺1 → 𝐺2 if and only if there exist a random clause 𝑅𝐶 ∈ R, a cause 𝐶 ∈ causes(𝑅𝐶) and a variable interpretation 𝜄 on var(𝑅𝐶) such that (ℰᴵ, 𝜄) ⊨ cond(𝑅𝐶), 𝐶^𝜄 ∈ {𝐺1, ¬𝐺1} and effect(𝑅𝐶)^𝜄 = 𝐺2. In this case we say that the edge 𝐺1 → 𝐺2 is induced by 𝑅𝐶. Moreover, we call P an acyclic program if Graphℰ(P) is a directed acyclic graph for every external database ℰ.

Example 8. The graph in Example 3 is the ground graph of the program P in Example 6 with respect to the described database ℰ. Further, it is easy to see that P yields an acyclic program.

From now on we assume every program P := (R, I, C) to be acyclic. Then for every external database ℰ the grounding Pℰ yields a unique expression for every ground variable 𝐺 ∈ 𝒢(ℰ) in terms of the error terms 𝑢(𝑅𝐶, 𝜄). Hence, it induces a unique probability distribution on 𝒢(ℰ). We will show in a forthcoming paper that this distribution coincides with the distribution induced by the LPAD-program P ∪ ℰ via the distribution semantics [10, Chapter 1 §2.1]. All in all, Definition 8 yields a generalization of the standard semantics for acyclic LPAD-programs with interesting properties regarding causation. Further, since we observe that Graphℰ(P) is the causal diagram of Pℰ, we obtain:

Proposition 1. Let P := (R, I, C) be a program. For every external database ℰ the grounding Pℰ yields a distribution which can be stored in a Boolean Bayesian network on the ground graph Graphℰ(P). This is also the causal Bayesian network corresponding to the functional causal model Pℰ.

We call the Bayesian network 𝒞ℬ(P, ℰ) of Proposition 1 the causal Bayesian network semantics of P with respect to ℰ. In our future paper we will also show that our semantics is consistent with CP-logic [11]. Finally, note that Pearl’s do-calculus is implemented in cplint with respect to the CP-logic semantics [12].
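As a sketch of Definitions 8 and 9, the following code continues the hypothetical RandomClause encoding and the clause list R from the sketch in Section 2.3 (sorts are ignored for brevity; the individuals and the minimal Herbrand model ℰᴵ of Example 3 are hard-coded toy assumptions). It enumerates the variable interpretations whose condition holds in ℰᴵ and collects the induced edges of the ground graph:

```python
from itertools import product as cartesian

inds = {"C": ("c1", "c2", "c3"), "C1": ("c1", "c2", "c3"),
        "C2": ("c1", "c2", "c3"), "P": ("p",)}
facts = {("prod", ("c2", "p")),                  # E^I: prod(c2,p) plus the
         ("neg_prod", ("c1", "p")),              # derived neg_prod facts
         ("neg_prod", ("c3", "p"))}

def variables(atom):
    return [a for a in atom[1] if a[0].isupper()]

def ground(atom, iota):
    return (atom[0], tuple(iota.get(a, a) for a in atom[1]))

def interpretations(rc):
    """All interpretations on var(RC) whose condition holds in E^I."""
    vs = sorted({v for a in (rc.effect, *rc.cond) for v in variables(a)}
                | {v for (_, a) in rc.causes for v in variables(a)})
    for vals in cartesian(*(inds[v] for v in vs)):
        iota = dict(zip(vs, vals))
        if all(ground(a, iota) in facts for a in rc.cond):
            yield iota

def ground_graph(R):
    """The edges cause^iota -> effect^iota of Definition 9."""
    return {(ground(c, iota), ground(rc.effect, iota))
            for rc in R for iota in interpretations(rc)
            for (_, c) in rc.causes}

for edge in sorted(ground_graph(R)):
    print(edge)   # contains, e.g., need(c1,p) -> imp(c1,c2); cf. Example 8
```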
3. Verifying the Causal Content of Lifted Programs
The proofs of all further statements can be found in the proof appendix. We now return to the task of verifying the causal content of a probabilistic logic program whose structure was derived from observational data. More formally, we investigate the following situation:

Problem 1. Suppose we are given a query alphabet Q ⊇ L ⊇ E, a stratified datalog program I, a set of integrity constraints C and an E-structure ℰ with ℰᴵ ⊨ C. Further, assume we are given data sampled from a functional causal model P̃ℰ, where P̃ = (R̃, I, C) is a program in the query alphabet Q. From a structure learning algorithm like SLIPCOVER [1] we obtain a set of random clauses R and hence another program P = (R, I, C) together with a functional causal model Pℰ. Can we use the program P to predict the effect of external interventions if we assume that Pℰ induces the same distribution as P̃ℰ?

As we are mainly interested in the causal Bayesian network semantics of the program P̃, we introduce the following equivalence relation:

Definition 10. Let ℰ be an external database for the programs P1 = (R1, I, C) and P2 = (R2, I, C). We call P1 and P2 intervention equivalent on ℰ if they induce the same causal Bayesian network semantics on ℰ, i.e. if the causal Bayesian networks 𝒞ℬ(P1, ℰ) and 𝒞ℬ(P2, ℰ) coincide. Note that this is the case if and only if the causal models P1ℰ and P2ℰ predict the same effects for every intervention.

To apply Theorem 5, we need to make the following assumption.

Definition 11 (Faithful Programs). Let P := (R, I, C) be a program and let ℰ be an external database. We say that P is faithful for ℰ if the distribution induced by Pℰ is faithful to the ground graph Graphℰ(P). We say that two programs P1 and P2 are faithfully indistinguishable on ℰ if P1ℰ and P2ℰ induce the same distribution and if both P1 and P2 are faithful for ℰ.

Assumption 1 (Faithfulness). In the situation of Problem 1 the programs P̃ and P are faithfully indistinguishable on the external database ℰ.

With our notion of faithfulness from Definition 11, Theorem 5 translates to the following statement about faithfully indistinguishable programs:

Theorem 6. Let ℰ be an external database for the programs P1 = (R1, I, C) and P2 = (R2, I, C). If P1 and P2 are faithfully indistinguishable on ℰ, the ground graphs Graphℰ(P1) and Graphℰ(P2) share the same adjacencies and the same unshielded colliders.

Hence, Assumption 1 yields that the ground graphs Graphℰ(P) and Graphℰ(P̃) share the same adjacencies and the same unshielded colliders. To reason about colliders instead of unshielded colliders, we restrict ourselves to triangle free programs.

Definition 12 (Triangle Free Program Structures). We call a program P triangle free if for every external database ℰ the skeleton of the ground graph Graphℰ(P) does not contain any triangles.

Proposition 2. For every triangle free program P := (R, I, C) and every external database ℰ, every collider in the ground graph Graphℰ(P) is unshielded. □

Assumption 2. The programs P and P̃ in Problem 1 are triangle free.

If in Problem 1 every edge of the ground graph Graphℰ(P) is involved in a collider, Theorem 6 and Proposition 2 imply that every program faithfully indistinguishable from P yields the same ground graph on ℰ. Due to the following consequence of Theorem 2, the causal Bayesian network semantics also coincide in this case.

Lemma 1. Let P1 := (R1, I, C) and P2 := (R2, I, C) be programs such that the functional causal models P1ℰ and P2ℰ induce the same distribution for an external database ℰ. In this case P1 and P2 are intervention equivalent on ℰ if the ground graphs Graphℰ(P1) and Graphℰ(P2) coincide. □
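Both structural conditions just introduced are easy to check mechanically on a given ground graph. A minimal sketch (edge sets as before) of the triangle-freeness test of Definition 12 and of the graph comparison appearing in the hypothesis of Lemma 1:

```python
def triangle_free(g):
    """Definition 12 on a single ground graph: no triangle in the skeleton."""
    sk = {frozenset(e) for e in g}
    nodes = sorted({v for e in g for v in e})
    return not any(frozenset((a, b)) in sk
                   and frozenset((b, c)) in sk
                   and frozenset((a, c)) in sk
                   for a in nodes for b in nodes for c in nodes
                   if a < b < c)

def same_ground_graph(g1, g2):
    """Hypothesis of Lemma 1: the two ground graphs coincide."""
    return set(g1) == set(g2)

print(triangle_free({("a", "b"), ("b", "c"), ("a", "c")}))  # False
# Applied to ground_graph(R) from the previous sketch it returns True.
```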
Assertions i)-iii) of the following lemma list conditions under which a given random clause induces a collider in the ground graph of an external database ℰ. Moreover, the symmetry represented by the query alphabet Q implies that all edges induced by a random clause satisfying Assertion iv) also appear in every ground graph of a program faithfully indistinguishable from P. Note that Assertion iv) yields an analogue of the relational orientation rule in the RPC-algorithm [13].

Lemma 2. Let P1 := (R1, I, C) and P2 := (R2, I, C) be triangle free programs that are faithfully indistinguishable on the external database ℰ. Let 𝑅𝐶 ∈ R1 be a random clause that induces an edge 𝐶^𝜄 → effect(𝑅𝐶)^𝜄 in the ground graph Graphℰ(P1) for an interpretation 𝜄 on var(𝑅𝐶). Moreover, assume that the random clause 𝑅𝐶 ∈ R1 and the interpretation 𝜄 on var(𝑅𝐶) satisfy one of the following assertions:
i) We find two causes 𝐶1, 𝐶2 ∈ causes(𝑅𝐶) such that 𝐶1^𝜄 ≠ 𝐶2^𝜄.
ii) There exists a second random clause 𝑅𝐶′ ∈ R1 together with an interpretation 𝜄′ on var(𝑅𝐶′) satisfying (ℰᴵ, 𝜄′) ⊨ cond(𝑅𝐶′) such that we find effect(𝑅𝐶)^𝜄 = effect(𝑅𝐶′)^𝜄′ and 𝐶^𝜄 ≠ (𝐶′)^𝜄′ for two causes 𝐶 ∈ causes(𝑅𝐶) and 𝐶′ ∈ causes(𝑅𝐶′).
iii) There exists a cause 𝐶 ∈ causes(𝑅𝐶) together with a variable 𝑋 ∈ var(𝐶) such that 𝑋 ∉ var(effect(𝑅𝐶)). Moreover, we find another interpretation 𝜄′ on var(𝑅𝐶) satisfying (ℰᴵ, 𝜄′) ⊨ cond(𝑅𝐶), effect(𝑅𝐶)^𝜄 = effect(𝑅𝐶)^𝜄′ and 𝜄(𝑋) ≠ 𝜄′(𝑋).
iv) There is only one cause 𝐶 ∈ causes(𝑅𝐶) in 𝑅𝐶 and there exists a variable 𝑋 ∈ var(effect(𝑅𝐶)) such that 𝑋 ∉ var(𝐶). Moreover, there exists an interpretation 𝜄′ on var(𝑅𝐶) of the same type in L as 𝜄, i.e. 𝜄 makes an existential formula 𝜑 true if and only if 𝜄′ does, such that 𝐶^𝜄 = 𝐶^𝜄′ and such that 𝜄(𝑋) ≠ 𝜄′(𝑋).
Then the edge 𝐶^𝜄 → effect(𝑅𝐶)^𝜄 appears in the ground graph Graphℰ(P2) as well.

Combining Theorem 6, Lemma 1 and Lemma 2, we get the following corollary:

Theorem 7. Let P1 := (R1, I, C) and P2 := (R2, I, C) be triangle free programs that are faithfully indistinguishable on the external database ℰ. Further, assume that every random clause 𝑅𝐶 ∈ R1 with causes(𝑅𝐶) ≠ ∅ and every interpretation 𝜄 on var(𝑅𝐶) with (ℰ, 𝜄) ⊨ cond(𝑅𝐶) satisfy at least one of the Assertions i)-iv) of Lemma 2. In this case P1 and P2 are intervention equivalent on the external database ℰ. □

Example 9. Assume we obtained the program P as described in Problem 1 and Assumption 1 from a structure learning algorithm on the database ℰ of Example 8. In this case we cannot verify the causal content of P! However, if we added a country producing 𝑝 or a country not producing 𝑝, the program P would be guaranteed to be intervention equivalent to P̃, as desired.

Theorem 7 yields a partial answer to Problem 1 if we want to predict the effects of external interventions in the functional causal model P̃ℰ. Next, we want to generalize our result to other external databases. To this aim we introduce the following assumption:

Assumption 3 (Generalization). In Problem 1 we additionally assume that the query alphabet Q contains no constants and that the programs P̃ and P are faithfully indistinguishable on every external database ℰ′.

Next, we modify Assertions i), iii) and iv) of Lemma 2 to guarantee that under Assumption 3 every edge in the ground graph Graphℰ′(P) of an arbitrary external database ℰ′ also occurs in the ground graph Graphℰ′(P̃):

Lemma 3. Let P1 := (R1, I, C) and P2 := (R2, I, C) be triangle free programs that are faithfully indistinguishable on every external database ℰ. Further, assume that no variable appears twice in the head of a clause in I. Moreover, let 𝑅𝐶 ∈ R1 be a random clause that induces an edge 𝐶^𝜄′ → effect(𝑅𝐶)^𝜄′ in the ground graph Graphℰ′(P1) of an external database ℰ′ for an interpretation 𝜄′ on var(𝑅𝐶) in ℰ′.
Furthermore, assume that the random clause 𝑅𝐶 ∈ R1 satisfies at least one of the following assertions:
i) We find two causes 𝐶1, 𝐶2 ∈ causes(𝑅𝐶) that do not share the same underlying predicate.
ii) We find a cause 𝐶 ∈ causes(𝑅𝐶) and a variable 𝑋 ∈ var(𝐶) such that 𝑋 ∉ var(effect(𝑅𝐶)). Moreover, 𝐶 and effect(𝑅𝐶) do not share the same underlying predicate.
iii) We find a cause 𝐶 ∈ causes(𝑅𝐶) and a variable 𝑋 ∈ var(effect(𝑅𝐶)) such that 𝑋 ∉ var(𝐶). Moreover, 𝐶 and effect(𝑅𝐶) do not share the same underlying predicate.
Then we obtain that the edge 𝐶^𝜄′ → effect(𝑅𝐶)^𝜄′ is contained in the ground graph Graphℰ′(P2) as well.

The proof of Lemma 3 requires the construction of external databases in which causes are witnessed by more than one individual. Adding extra individuals is enabled by not having any kind of privileged equality, which is reflected in the following assumption:

Assumption 4. In the situation of Problem 1 no variable appears twice in the head of a clause in I.

Finally, we combine Theorem 6, Lemma 3 and Lemma 2:

Theorem 8. In the situation of Lemma 3 we assume that every random clause in P1 satisfies one of the Assertions i)-iii). In this case the programs P1 and P2 are intervention equivalent on every external database ℰ. □

Example 10. Assume we obtained the program P as described in Problem 1 and Assumption 3 from a structure discovery algorithm on the database ℰ of Example 8. In this case we can invoke Theorem 8 to see that P and P̃ are intervention equivalent on every external database ℰ′.

Based on our results we propose the following procedure to verify the causal content of a given probabilistic logic program:

Procedure 1. In the situation of Problem 1 we first check that every random clause 𝑅𝐶 of the program P and every interpretation 𝜄 on var(𝑅𝐶) in the external database ℰ satisfies one of the Assertions i)-iv) of Lemma 2. In this case we obtain that P̃ and P are intervention equivalent on ℰ under Assumption 1 and Assumption 2. Further, we check for every random clause 𝑅𝐶 of P whether there is an interpretation 𝜄 on var(𝑅𝐶) in ℰ such that (ℰᴵ, 𝜄) ⊨ cond(𝑅𝐶). Next, we use this result to justify Assumption 3, i.e. to ensure that no random clause is verified on the basis of a collider not observed in the ground graph Graphℰ(P). Finally, we check whether every random clause in P satisfies one of the Assertions i)-iii) of Lemma 3 and invoke Theorem 8 to conclude that under Assumption 4 the programs P and P̃ are intervention equivalent on every external database ℰ′, as desired.
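A minimal sketch of the first step of Procedure 1, continuing the earlier sketches (it assumes RandomClause, R, interpretations and ground from Sections 2.3 and 2.3’s grounding sketch): for every random clause and every satisfying interpretation it looks for a second clause instance with the same ground effect but a different ground cause, a simplified test covering Assertions i)-iii) of Lemma 2; the type-based Assertion iv) is omitted here for brevity.

```python
def collider_verified(rc, iota, R):
    """Simplified check for Assertions i)-iii) of Lemma 2: does the ground
    effect of (rc, iota) receive a second, distinct ground cause?"""
    causes = {ground(a, iota) for (_, a) in rc.causes}
    if len(causes) >= 2:                      # i) two distinct ground causes
        return True
    eff = ground(rc.effect, iota)
    for rc2 in R:                             # ii)/iii): another instance
        for iota2 in interpretations(rc2):    # with the same effect but a
            if (ground(rc2.effect, iota2) == eff and   # different cause
                    any(ground(a, iota2) not in causes
                        for (_, a) in rc2.causes)):
                return True
    return False

for rc in R:
    if rc.causes:
        ok = all(collider_verified(rc, iota, R)
                 for iota in interpretations(rc))
        print(rc.effect, ok)
# On the database of Example 3 the imp clause cannot be verified this way,
# in line with Example 9.
```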
4. Conclusion
We first illustrated that in the lifted case the search space of causal explanations for an observed distribution can be infinite. Hence, there is no straightforward generalization of causal structure discovery in this setting. However, we developed a procedure to verify the causal content of an acyclic, stratified, triangle free probabilistic logic program whose structure was determined from observational data. Intuitively, we argued that we can verify every clause that observably implies only colliders in the associated ground graphs. In future work we would like to remove the assumption of triangle freeness. Furthermore, we want to verify pairs of clauses that induce colliders in the ground graph for every external database. To obtain a complete result we also aim to incorporate the orientation rules of the PC algorithm into our framework.

Acknowledgement
The research leading to this publication was supported by LMUexcellent, funded by the Federal Ministry of Education and Research (BMBF) and the Free State of Bavaria under the Excellence Strategy of the Federal Government and the Länder.

References
[1] E. Bellodi, F. Riguzzi, Structure learning of probabilistic logic programs by searching the clause space, Theory and Practice of Logic Programming 15 (2014) 169–212. doi:10.1017/s1471068413000689.
[2] J. Pearl, Causality, 2nd ed., Cambridge University Press, Cambridge, UK, 2000. doi:10.1017/CBO9780511803161.
[3] T. Verma, J. Pearl, Causal networks: Semantics and expressiveness, in: Uncertainty in Artificial Intelligence, volume 9 of Machine Intelligence and Pattern Recognition, North-Holland, 1990, pp. 69–76. doi:10.1016/B978-0-444-88650-7.50011-1.
[4] D. Geiger, T. Verma, J. Pearl, Identifying independence in Bayesian networks, Networks 20 (1990) 507–534. doi:10.1002/net.3230200504.
[5] M. McDermott, Redundant causation, The British Journal for the Philosophy of Science 46 (1995) 523–544. URL: http://www.jstor.org/stable/687896.
[6] C. Meek, Strong completeness and faithfulness in Bayesian networks, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 411–418.
[7] N. Weinberger, Faithfulness, coordination and causal coincidences, Erkenntnis 83 (2018) 113–133. doi:10.1007/s10670-017-9882-6.
[8] T. Verma, J. Pearl, Equivalence and synthesis of causal models, in: Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI ’90, Elsevier Science Inc., USA, 1990, pp. 255–270. doi:10.5555/647233.719736.
[9] C. Meek, Causal inference and causal explanation with background knowledge, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 403–410. doi:10.5555/2074158.2074204.
[10] F. Riguzzi, Foundations of Probabilistic Logic Programming: Languages, Semantics, Inference and Learning, River Publishers, 2020.
[11] J. Vennekens, M. Denecker, M. Bruynooghe, CP-logic: A language of causal probabilistic events and its relation to logic programming, Theory and Practice of Logic Programming 9 (2009) 245–308.
[12] F. Riguzzi, G. Cota, E. Bellodi, R. Zese, Causal inference in cplint, International Journal of Approximate Reasoning 91 (2017) 216–232. doi:10.1016/j.ijar.2017.09.007.
[13] M. Maier, Causal Discovery for Relational Domains: Representation, Reasoning, and Learning, Doctoral dissertation 279, University of Massachusetts Amherst, 2014. URL: https://scholarworks.umass.edu/dissertations_2/279/.

Proof Appendix
Proof of Theorem 2. Recall Definition 3 to see that 𝑋 has the same parents in the causal diagrams graph(ℳ2) and graph(ℳ3) and that every other node 𝑌 ≠ 𝑋 has the same parents in graph(ℳ3) as in graph(ℳ1). Furthermore, apply the assumption to obtain that the causal diagram graph(ℳ3) coincides with graph(ℳ1). Hence, we are left to show that the parameters of the causal Bayesian networks 𝒞ℬ(ℳ1) and 𝒞ℬ(ℳ3) coincide as well. As ℳ1 and ℳ2 induce the same distribution, we find that the parameters for 𝑋 coincide in the causal Bayesian networks 𝒞ℬ(ℳ1) and 𝒞ℬ(ℳ2). To conclude, we recall that the parameters of every node 𝑌 ∈ V in 𝒞ℬ(ℳ𝑖), 𝑖 = 1, 2, 3, can be computed from the corresponding defining equation by intervening on the set of parents Pa(𝑌) [2, Theorem 3.3.2].

Proof of Theorem 6.
Assume that the functional causal models P1ℰ and P2ℰ induce the same distribution 𝜋. Further, assume that both programs P1 and P2 are faithful for ℰ. In this case, by Definition 11, the ground graphs Graphℰ(P1) and Graphℰ(P2) are both faithful to the distribution 𝜋. Hence, we obtain the desired result from Theorem 5.

Proof of Lemma 2. If 𝑅𝐶 satisfies one of the Assertions i)-iii), it is easy to see that the edge 𝐶^𝜄 → effect(𝑅𝐶)^𝜄 is involved in a collider at effect(𝑅𝐶)^𝜄, and the claim follows since by Theorem 6 and Proposition 2 the ground graphs Graphℰ(P1) and Graphℰ(P2) share the same colliders.

Next, assume that 𝑅𝐶 and 𝜄 satisfy iv). By assumption there exists an interpretation 𝜄′ on var(𝑅𝐶) of the same type in L as 𝜄 such that 𝐶^𝜄 = 𝐶^𝜄′ and such that 𝜄(𝑋) ≠ 𝜄′(𝑋) for a variable 𝑋 ∈ var(effect(𝑅𝐶)). As 𝜄 and 𝜄′ have the same type in L, we find that the edges 𝐶^𝜄 − effect(𝑅𝐶)^𝜄 and 𝐶^𝜄 − effect(𝑅𝐶)^𝜄′ also share the same orientation in the ground graph Graphℰ(P2). If they were both pointing to 𝐶^𝜄, we would observe a collider in Graphℰ(P2) not contained in Graphℰ(P1). This is a contradiction to Theorem 6 and Proposition 2 and the claim follows. □

To prove Lemma 3 we need the following lemma:

Lemma 4. Let P = (R, I, C) be a lifted program such that no variable appears twice in the head of a clause in I and let ℰ be an external database. Assume the following holds for every external atom 𝐴 = 𝑙(𝑥1, ..., 𝑥𝑚):

(*) Let 𝐼 be a subset of {1, . . . , 𝑚} and let 𝜄 be an interpretation on var(𝐴) with 𝜄(𝑥𝑖) = 𝑎 if and only if 𝑖 ∈ 𝐼. Moreover, let 𝑎′ be an individual in ℰ and let 𝜄′ be the interpretation on var(𝐴) which is given by setting 𝜄′(𝑥𝑖) = 𝑎′ for 𝑖 ∈ 𝐼 and 𝜄′(𝑥𝑗) = 𝜄(𝑥𝑗) for 𝑗 ∉ 𝐼. In this case (ℰ, 𝜄) ⊨ 𝐴 if and only if (ℰ, 𝜄′) ⊨ 𝐴.

Then (*) also holds for every internal atom 𝐴.

Proof. Let 𝒯 denote the fixpoint operator of the stratified datalog program I. We use induction on the iteration of 𝒯 to show that for every 𝑛 ∈ ℕ we have (𝒯ⁿ(ℰ), 𝜄) ⊨ 𝐴 if and only if (𝒯ⁿ(ℰ), 𝜄′) ⊨ 𝐴. Note that the case 𝑛 = 0 follows directly from the assumption of the lemma.

So assume that (𝒯ⁿ⁺¹(ℰ), 𝜄) ⊨ 𝐴. If we find (𝒯ⁿ⁺¹(ℰ), 𝜄′) ⊨ 𝐴, we are finished. Hence, let us assume that (𝒯ⁿ⁺¹(ℰ), 𝜄′) ⊭ 𝐴. In this case there exist a clause 𝐶 ∈ I given by 𝐻 ← 𝐵1, ..., 𝐵𝑘 and an interpretation 𝜈 on var(𝐶) such that 𝐻^𝜈 = 𝐴^𝜄 and such that (𝒯ⁿ(ℰ), 𝜈) ⊨ {𝐵1, ..., 𝐵𝑘}. By the assumption of the lemma, 𝐻 = 𝑙(𝑤1, . . . , 𝑤𝑚), where all 𝑤1, . . . , 𝑤𝑚 are distinct. Further, we know that 𝜈(𝑤𝑖) = 𝜄(𝑥𝑖) = 𝑎 for all 𝑖 ∈ 𝐼. Define 𝜈′ by setting 𝜈′(𝑤𝑖) = 𝑎′ for 𝑖 ∈ 𝐼 and 𝜈′(𝑣) = 𝜈(𝑣) for all other variables 𝑣 on which 𝜈 is defined. Then the restrictions of 𝜈 and 𝜈′ to the variables in any 𝐵𝑖, 1 ≤ 𝑖 ≤ 𝑘, satisfy the requirements of the induction hypothesis. Hence, (𝒯ⁿ(ℰ), 𝜈′) ⊨ {𝐵1, ..., 𝐵𝑘} and thus (𝒯ⁿ⁺¹(ℰ), 𝜈′) ⊨ 𝐻. Note that 𝐻^𝜈′ = 𝐴^𝜄′ by construction. Therefore, (𝒯ⁿ⁺¹(ℰ), 𝜄′) ⊨ 𝐴 as required.

Proof of Lemma 3. We distinguish the following cases:

Case. The clause 𝑅𝐶 ∈ R1 satisfies i). In this case the desired result follows immediately from Lemma 2.

Case. The clause 𝑅𝐶 ∈ R1 satisfies ii). By Theorem 6 we find an edge 𝐶^𝜄′ − effect(𝑅𝐶)^𝜄′ in the ground graph Graphℰ′(P2). This edge now needs to be induced by a random clause R̃C ∈ R2, i.e. we obtain that 𝐶^𝜄′ − effect(𝑅𝐶)^𝜄′ = C̃^𝜄̃ → effect(R̃C)^𝜄̃. Further, we may assume w.l.o.g. that the clauses 𝑅𝐶 and R̃C do not share common variables.
Hence, we obtain an interpretation 𝜄 on var(𝑅𝐶) ∪ var(R̃C) such that

((ℰ′)ᴵ, 𝜄) ⊨ edge_eq ∪ cond(𝑅𝐶) ∪ cond(R̃C),

where edge_eq denotes the set

{ (C̃ = effect(𝑅𝐶) ∧ 𝐶 = effect(R̃C)) ∨ (effect(R̃C) = effect(𝑅𝐶) ∧ 𝐶 = C̃) }.

Moreover, we may assume w.l.o.g. that there exists a variable 𝑋 ∈ var(𝐶) such that 𝑋 ∉ var(effect(𝑅𝐶)). Furthermore, choose a variable 𝑋′ not contained in var(𝑅𝐶) ∪ var(R̃C). Next, choose a constant 𝜄(𝑋′) not contained in ℰ′ of the sort s(𝑋) and add for each ground atom 𝑝(𝑐1, ..., 𝜄(𝑋), ..., 𝑐𝑛) in ℰ′ the ground atom 𝑝(𝑐1, ..., 𝜄(𝑋′), ..., 𝑐𝑛). In this way we end up with an external database ℰ′′ and an interpretation 𝜄 on var(𝑅𝐶) ∪ var(R̃C) ∪ {𝑋′}. Further, Lemma 4 yields that the following set is satisfied with respect to 𝜄 and (ℰ′′)ᴵ:

(edge_eq ∪ cond(𝑅𝐶))[𝑋′/𝑋] ∪ edge_eq ∪ cond(𝑅𝐶) ∪ cond(R̃C) ∪ {𝑋 ≠ 𝑋′}

Hence, 𝑅𝐶 ∈ R1 satisfies Assertion iii) of Lemma 2 with respect to the interpretation 𝜄 and the external database ℰ′′, and the edge 𝐶^𝜄 − effect(𝑅𝐶)^𝜄 is induced by R̃C in Graphℰ′′(P2) as well. From Lemma 2 we further obtain that the orientation of the edge 𝐶^𝜄 − effect(𝑅𝐶)^𝜄 in Graphℰ′′(P2) is also given by 𝐶^𝜄 → effect(𝑅𝐶)^𝜄. Hence, we obtain from Definition 9 that the formula C̃ = effect(𝑅𝐶) ∧ 𝐶 = effect(R̃C) is not satisfiable with respect to I ∪ C, as 𝐶 and effect(𝑅𝐶) do not share the same underlying random predicate. However, this means that

(ℰᴵ, 𝜄′) ⊨ effect(R̃C) = effect(𝑅𝐶) ∧ 𝐶 = C̃

and the desired result follows.

Case. The clause 𝑅𝐶 ∈ R1 satisfies iii). This can be proven analogously to the previous case.