Correct Causal Inference in Probabilistic Logic Programming

Kilian Rückschloß, Felix Weitkämper
Ludwig-Maximilians-Universität München, Institut für Informatik, Lehr- und Forschungseinheit für Programmier- und Modellierungssprachen, Oettingenstraße 67, D-80538 München (Germany)

Abstract
In this work we investigate the relationship between the classical distribution semantics and the causal semantics for probabilistic logic programming. We first give an infinite family of programs that all yield the same distribution. From this we deduce that, unlike in classical causal structure discovery, in the relational case it is not possible to build a finite search space of causal explanations for an observed distribution. Finally, this article develops a procedure to verify the causal content of a probabilistic logic program whose structure is obtained from observations.

Keywords
Probabilistic Logic Programming, Causal Structure Discovery, Causal Semantics, Statistical Relational Artificial Intelligence

Haifa’22: 15th Workshop on Answer Set Programming and Other Computing Paradigms
kilian.rueckschloss@lmu.de (K. Rückschloß); felix.weitkaemper@lmu.de (F. Weitkämper)
https://www.pms.ifi.lmu.de/mitarbeiter/derzeitige/kilian-rueckschloss/ (K. Rückschloß); https://www.pms.ifi.lmu.de/mitarbeiter/derzeitige/felix-weitkaemper/index.html (F. Weitkämper)
ORCID: 0000-0002-7891-6030 (K. Rückschloß); 0000-0002-3895-8279 (F. Weitkämper)

1. Introduction
To illustrate the difficulties of causal structure discovery in probabilistic logic programming, we begin by constructing an infinite family of programs that all induce the same distribution: Assume we are given a database that models a directed graph 𝐺, that is, it consists of a fact 𝑒(𝑥, 𝑦) for every edge 𝑥 → 𝑦 in 𝐺. Moreover, each node 𝑥 in 𝐺 may satisfy a certain property 𝑝; in this case our database contains the fact 𝑝(𝑥). Let 𝑛 ∈ ℕ be a natural number and let 𝑥 be a node in 𝐺. We write 𝑛_𝑝𝑎𝑡ℎ(𝑥) if there exists a path of length 𝑛 from 𝑥 to a node satisfying 𝑝. The predicate 𝑛_𝑝𝑎𝑡ℎ(_) can be defined by the following normal clause:

𝑛_𝑝𝑎𝑡ℎ(𝑋) ← 𝑒(𝑋, 𝑋1), 𝑒(𝑋1, 𝑋2), ..., 𝑒(𝑋𝑛−1, 𝑋𝑛), 𝑝(𝑋𝑛)

Further, we write 𝑛𝑜_𝑛_𝑝𝑎𝑡ℎ(𝑥) if no such path exists, i.e. 𝑛𝑜_𝑛_𝑝𝑎𝑡ℎ(𝑋) ← ¬𝑛_𝑝𝑎𝑡ℎ(𝑋).

Let 𝛼 ∈ (0, 1] be an arbitrary real number. Further, assume that every node 𝑥 in 𝐺 gives rise to two Boolean random variables 𝑟(𝑥) and 𝑠(𝑥). We consider the following probabilistic logic program:

\[
\mathbf{P}_n :=
\begin{cases}
r(X) : \alpha \leftarrow n\_path(X)\\
s(X) \leftarrow r(X),\ n\_path(X)\\
s(X) : \alpha \leftarrow no\_n\_path(X)\\
r(X) \leftarrow s(X),\ no\_n\_path(X)
\end{cases}
\]

An easy calculation shows that all the programs P𝑛 yield the same distribution of the random variables 𝑟(𝑥) and 𝑠(𝑥), where 𝑥 ranges over the nodes of the fixed directed graph 𝐺. However, for a fixed 𝑛 ∈ ℕ the program P𝑛 describes a causal mechanism in which, for every node 𝑥 in 𝐺, we find that 𝑠(𝑥) is a cause of 𝑟(𝑥) unless 𝑛_𝑝𝑎𝑡ℎ(𝑥) holds, whereas 𝑟(𝑥) is a cause of 𝑠(𝑥) if 𝑛_𝑝𝑎𝑡ℎ(𝑥) holds in 𝐺. Hence, all the P𝑛 represent different causal structures.
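To make the indistinguishability claim concrete, here is a minimal sketch in Python; the three-node graph, the property 𝑝 and the value of 𝛼 are illustrative assumptions, not data from the paper. It computes the exact joint distribution induced by P𝑛 by enumerating the Boolean error terms:

```python
from itertools import product

edges = {("a", "b"), ("b", "c")}   # a hypothetical directed graph G
has_p = {"c"}                      # nodes satisfying the property p
nodes = sorted({"a", "b", "c"})
alpha = 0.3

def n_path(x, n):
    """True iff a path of length n leads from x to a node satisfying p."""
    frontier = {x}
    for _ in range(n):
        frontier = {w for (v, w) in edges if v in frontier}
    return any(v in has_p for v in frontier)

def joint(n):
    """Exact joint distribution of (r(x), s(x)) for all nodes x under P_n.

    For each node exactly one probabilistic clause is applicable, so a
    single Boolean error term u per node suffices."""
    dist = {}
    for errs in product([False, True], repeat=len(nodes)):
        weight, state = 1.0, []
        for x, u in zip(nodes, errs):
            weight *= alpha if u else 1.0 - alpha
            if n_path(x, n):
                r = u        # r(X) : alpha <- n_path(X)
                s = r        # s(X) <- r(X), n_path(X)
            else:
                s = u        # s(X) : alpha <- no_n_path(X)
                r = s        # r(X) <- s(X), no_n_path(X)
            state.append((x, r, s))
        dist[tuple(state)] = dist.get(tuple(state), 0.0) + weight
    return dist

# The same distribution for every n, although the causal roles of r and s
# differ between the programs P_n:
assert joint(1) == joint(2) == joint(5)
```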
Furthermore, assume we are given data sampled from the distribution of 𝑟(_) and 𝑠(_), and that a structure learning algorithm like SLIPCOVER [1] yields a program P that induces the same distribution as all the P𝑛. Hence, we obtain infinitely many hypotheses for the underlying causal mechanism, even if we assume that P represents the right distribution. Therefore, unlike in classical causal structure discovery, in probabilistic logic programming it is in general impossible to compute the search space of all causal explanations for an observed distribution. Hence, given a probabilistic logic program whose structure has been derived from observational data, the best one can do is to verify its causal content. The aim of this contribution is to propose a procedure for this verification task.

In Section 2 we begin by recalling Pearl’s functional causal models and their relation to causal Bayesian networks. Further, we give a brief overview of the theory of causal structure discovery for causal Bayesian networks. Finally, we introduce a semantics for probabilistic logic programs in terms of functional causal models, which allows us to ground a given program to a causal Bayesian network. The main idea of our work in Section 3 is then to lift the indistinguishability statement for causal Bayesian networks, Theorem 5, to probabilistic logic programming in order to accomplish our verification task.

2. Preliminaries
We begin by recalling Pearl’s theory of causality. Further, we establish a semantics for probabilistic logic programs that allows us to reason about causal relationships in those programs. Note that in the following we use 𝜋 to denote probabilities of random variables.

2.1. Pearl’s Functional Causal Models
Let us recall the definition of a functional causal model [2, 1.4.1]:

Definition 1 (Functional Causal Model). A functional causal model or causal model ℳ on a set of variables V is a system of equations that consists of one defining equation of the form

𝑋 := 𝑓𝑋(Pa(𝑋), Error(𝑋))

for each variable 𝑋 ∈ V. Here the parents Pa(𝑋) ⊆ V of 𝑋 form a subset of the set of variables V, the error term Error(𝑋) of 𝑋 is a vector of random variables and 𝑓𝑋 is a function defining 𝑋 in terms of the parents Pa(𝑋) and the error term Error(𝑋) of 𝑋.

To illustrate Definition 1 we introduce the following example:

Example 1. Assume we are given a database of countries. Further, we know which products 𝑝 are produced in country 𝑐, denoted by 𝑝𝑟𝑜𝑑(𝑐, 𝑝). Each country 𝑐 needs each product 𝑝, denoted by 𝑛𝑒𝑒𝑑(𝑐, 𝑝), with probability 𝛼. This is reflected in the following equation:

\[ need(c, p) := u_1(c, p) \quad\text{with } \pi(u_1(c, p)) := \alpha \tag{1} \]

Analogously, a country 𝑐 is corrupt, denoted 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐), with probability 𝛽, i.e.

\[ corrupt(c) := u_2(c) \quad\text{with } \pi(u_2(c)) := \beta. \tag{2} \]

If a country 𝑐1 needs a product 𝑝 and does not produce 𝑝, but 𝑐2 does, then with probability 𝛾 the country 𝑐1 imports from 𝑐2, denoted by 𝑖𝑚𝑝(𝑐1, 𝑐2), i.e.

\[ imp(c_1, c_2) := \bigvee_{\substack{p:\ \neg prod(c_1, p),\\ prod(c_2, p)}} need(c_1, p) \wedge u_3(c_1, c_2, p) \quad\text{with } \pi(u_3(c_1, c_2, p)) := \gamma. \tag{3} \]

Further, if a country 𝑐1 imports from a country 𝑐2, we observe economic growth in 𝑐2, denoted 𝑔𝑟𝑜𝑤𝑠(𝑐2), with probability 𝛿, i.e.

\[ grows(c_2) := \bigvee_{c_1 \text{ country}} imp(c_1, c_2) \wedge u_4(c_1, c_2) \quad\text{with } \pi(u_4(c_1, c_2)) := \delta. \tag{4} \]

Finally, if we find economic growth in country 𝑐 and 𝑐 is not corrupt, the living standard rises in 𝑐, denoted by 𝑙𝑖𝑣𝑒(𝑐), with probability 𝜖, i.e.

\[ live(c) := grows(c) \wedge \neg corrupt(c) \wedge u_5(c) \quad\text{with } \pi(u_5(c)) := \epsilon. \tag{5} \]

The equations (1)–(5) yield a functional causal model ℳ.
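The structural equations of Example 1 translate directly into executable form. The following minimal sketch (a hypothetical toy instance with two countries and one product; all names and parameter values are illustrative assumptions) evaluates equations (1)–(5) on fresh error terms:

```python
import random

countries, products = ["c1", "c2"], ["p"]
prod = {("c2", "p")}                        # c2 produces p
alpha, beta, gamma, delta, eps = 0.4, 0.2, 0.7, 0.6, 0.8

def sample():
    """Evaluate the structural equations (1)-(5) on fresh error terms."""
    u = lambda q: random.random() < q       # a Boolean error term
    need = {(c, p): u(alpha) for c in countries for p in products}        # (1)
    corrupt = {c: u(beta) for c in countries}                             # (2)
    imp = {(c1, c2): any(need[c1, p] and u(gamma)                         # (3)
                         for p in products
                         if (c1, p) not in prod and (c2, p) in prod)
           for c1 in countries for c2 in countries}
    grows = {c2: any(imp[c1, c2] and u(delta) for c1 in countries)        # (4)
             for c2 in countries}
    live = {c: grows[c] and not corrupt[c] and u(eps) for c in countries} # (5)
    return need, corrupt, imp, grows, live

# Monte-Carlo estimate of pi(live(c2)):
print(sum(sample()[4]["c2"] for _ in range(10_000)) / 10_000)
```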
Next, we note that causal models support not only queries about conditional and unconditional probabilities, but also two more general query types, namely determining the effect of interventions and evaluating counterfactuals. Here, we focus on the treatment of external interventions.

2.1.1. Predicting the Effect of Interventions
Fix a causal model ℳ on a set of variables V. Assume we are given a subset of variables X := {𝑋1, ..., 𝑋𝑘} ⊆ V together with a vector of possible values x := (𝑥1, ..., 𝑥𝑘). To calculate the effect of setting the variables in X to the values specified by x, we replace the equation for 𝑋𝑖 in ℳ by 𝑋𝑖 := 𝑥𝑖 for all 1 ≤ 𝑖 ≤ 𝑘 and calculate the desired probabilities. The resulting probability distribution is then denoted by 𝜋(_ | do(X = x)) [2, §1.4.3].

Example 2. If a commission roots out corruption in country 𝑐, we can model this intervention by replacing equation (2) for 𝑐 in the causal model ℳ of Example 1 with 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐) := 𝐹𝑎𝑙𝑠𝑒.

2.2. Causal Bayesian Networks and d-Separation
We briefly recall the theory of Bayesian networks, a widespread representation for probability distributions. We compare them with the functional causal models of Section 2.1 and obtain a causal semantics for Bayesian networks, i.e. we discuss how to determine the effect of an intervention from a Bayesian network’s structure. We begin by recalling the definition of a Bayesian network:

Definition 2 (Bayesian Network). A Bayesian network on a set of random variables V consists of a directed acyclic graph 𝐺 on V and the conditional probability distributions 𝜋(𝑋 | Pa(𝑋)) of the random variables 𝑋 ∈ V conditioned on the set Pa(𝑋) of their parents in the graph 𝐺. It gives rise to a probability distribution on V = {𝑋1, ..., 𝑋𝑘} by setting

\[ \pi(X_1 = x_1, \ldots, X_k = x_k) = \prod_{i=1}^{k} \pi\bigl(X_i = x_i \mid \mathrm{pa}(X_i)\bigr), \]

where pa(𝑋𝑖) := {𝑋𝑗 = 𝑥𝑗 | 𝑋𝑗 ∈ Pa(𝑋𝑖)}.

We proceed by recalling the relation between Bayesian networks and causal models.

Definition 3 (Acyclic & Markovian Causal Models). For each causal model ℳ on V we define the causal diagram graph(ℳ) to be the directed graph on V which is given by drawing an arrow 𝑋 → 𝑌 if and only if 𝑋 ∈ Pa(𝑌). Further, we call ℳ an acyclic causal model if its causal diagram graph(ℳ) is a directed acyclic graph. Finally, an acyclic causal model is said to be Markovian if the error terms Error(𝑋) are mutually independent for all 𝑋 ∈ V.

The next theorem describes how causal models and Bayesian networks are related to each other [2, 1.4.2]:

Theorem 1. Every Markovian causal model ℳ gives rise to a probability distribution that can be stored in a Bayesian network ℬ(ℳ) on its causal diagram.

Theorem 1 yields a way to determine the effects of interventions from a Bayesian network ℬ(ℳ) corresponding to a Markovian causal model ℳ. To calculate the effect of setting X := {𝑋1, ..., 𝑋𝑘} to x := (𝑥1, ..., 𝑥𝑘), one can modify ℬ(ℳ) by adjusting the conditional distributions 𝜋(𝑋𝑖 | Pa(𝑋𝑖)) to 𝜋(𝑋𝑖 | pa(𝑋𝑖)) = 𝛿𝑋𝑖,𝑥𝑖, where 𝛿𝑋𝑖,𝑥𝑖 denotes the Kronecker delta. The resulting structure is then called the causal Bayesian network 𝒞ℬ(ℳ) [2, §1.3]. In other words, a causal Bayesian network is a Bayesian network that supports the calculation of the effects of interventions.
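Returning to the intervention of Example 2, the sketch below (same hypothetical toy setup as above) replaces the defining equation of 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐) by 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐) := 𝐹𝑎𝑙𝑠𝑒 for the intervened countries, so that samples are drawn from 𝜋(_ | do(corrupt(c) = False)):

```python
import random

countries, products = ["c1", "c2"], ["p"]
prod = {("c2", "p")}
alpha, beta, gamma, delta, eps = 0.4, 0.2, 0.7, 0.6, 0.8

def sample_do(target=frozenset()):
    """Sample from the model of Example 1 under do(corrupt(c) = False)
    for every country c in `target`."""
    u = lambda q: random.random() < q
    need = {(c, p): u(alpha) for c in countries for p in products}
    corrupt = {c: False if c in target else u(beta)   # replaced equation (2)
               for c in countries}
    imp = {(c1, c2): any(need[c1, p] and u(gamma) for p in products
                         if (c1, p) not in prod and (c2, p) in prod)
           for c1 in countries for c2 in countries}
    grows = {c2: any(imp[c1, c2] and u(delta) for c1 in countries)
             for c2 in countries}
    return {c: grows[c] and not corrupt[c] and u(eps) for c in countries}

# pi(live(c2) | do(corrupt(c2) = False)) by simulation; rooting out
# corruption raises the probability that the living standard rises:
print(sum(sample_do(frozenset({"c2"}))["c2"] for _ in range(10_000)) / 10_000)
```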
Example 3. Assume that the error terms 𝑢1(_, _), 𝑢2(_), 𝑢3(_, _, _), 𝑢4(_, _) and 𝑢5(_) are mutually independent, i.e. the causal model ℳ of Example 1 is Markovian. Hence, for three countries 𝑐1, 𝑐2, 𝑐3 and one product 𝑝 with 𝑝𝑟𝑜𝑑(𝑐2, 𝑝) and not 𝑝𝑟𝑜𝑑(𝑐𝑖, 𝑝) for 𝑖 = 1, 3, we obtain by Theorem 1 that, omitting isolated points, the resulting distribution can be stored in a causal Bayesian network on the graph with the edges

𝑛𝑒𝑒𝑑(𝑐3, 𝑝) → 𝑖𝑚𝑝(𝑐3, 𝑐2) → 𝑔𝑟𝑜𝑤𝑠(𝑐2) ← 𝑖𝑚𝑝(𝑐1, 𝑐2) ← 𝑛𝑒𝑒𝑑(𝑐1, 𝑝) and 𝑔𝑟𝑜𝑤𝑠(𝑐2) → 𝑙𝑖𝑣𝑒(𝑐2) ← 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝑐2).

Finally, we state a modularity result for causal models, whose proof is given in the Appendix:

Theorem 2 (Modularity of Causal Models). Let ℳ1 and ℳ2 be functional causal models that induce the same distribution on the set of variables V and let 𝑋 ∈ V be a variable. Furthermore, let us assume that the set Pa(𝑋) of the parents of 𝑋 coincides for both causal diagrams graph(ℳ1) and graph(ℳ2). If we construct the causal model ℳ3 by exchanging the defining equation for 𝑋 in ℳ1 with the defining equation for 𝑋 in ℳ2, we obtain that the causal Bayesian networks 𝒞ℬ(ℳ1) and 𝒞ℬ(ℳ3) coincide.

2.2.1. d-Separation
The graph underlying a Bayesian network gives rise to a correct conditional independence oracle in the following way [3]:

Definition 4 (d-Connecting Path). Let 𝐺 be a directed acyclic graph. A subgraph of the form 𝐴 → 𝐶 ← 𝐵 is called a collider at 𝐶. We say that a collider is shielded if there exists an edge 𝐴 − 𝐵; otherwise, the collider is said to be unshielded. Further, let Z be a set of nodes. A d-connecting path 𝑃 := 𝑅1 − ... − 𝑅𝑛 is an undirected path in 𝐺 such that no non-collider of the path 𝑃 lies in Z and such that for every collider at 𝐶 of the path 𝑃 there exists a directed path from 𝐶 to a node in Z. Moreover, we say that Z d-connects two nodes 𝐴 and 𝐵 if there exists a d-connecting path between 𝐴 and 𝐵 with respect to Z; otherwise, we say that Z d-separates 𝐴 and 𝐵. Finally, two sets of nodes A and B are said to be d-separated by Z if Z d-separates 𝐴 and 𝐵 for every 𝐴 ∈ A and every 𝐵 ∈ B; otherwise, the sets A and B are d-connected by Z.

Remark 1. The term “d-connected” is a shorthand for “directionally connected” [4].

Theorem 3 (Correctness of d-Separation). Let 𝐺 be a directed acyclic graph on the set V. Further, let Z ⊆ V be a subset of nodes and let 𝐴, 𝐵 ∈ V be nodes of 𝐺. If Z d-separates 𝐴 and 𝐵, we obtain that 𝐴 and 𝐵 are independent conditioned on Z in every distribution that can be represented by a Bayesian network on 𝐺.

Note that in general d-separation does not yield a complete independence oracle [5, 1.4]. This motivates the following definition:

Definition 5 (Causal Faithfulness). Let 𝜋 be a distribution on a set of random variables V and let 𝐺 be a directed acyclic graph on these random variables. We say that the distribution 𝜋 is (causally) faithful to the graph 𝐺 if two random variables 𝐴, 𝐵 ∈ V are conditionally independent with respect to a subset of random variables Z ⊆ V if and only if 𝐴 and 𝐵 are d-separated by Z in 𝐺.

However, the following theorem shows that faithfulness holds for almost all discrete Bayesian networks [6]:

Theorem 4 (Obstruction to Causal Faithfulness). Let 𝐺 be a directed acyclic graph and let 𝜃 ∈ [0, 1]ⁿ be a vector which determines the conditional distributions that turn 𝐺 into a discrete Bayesian network ℬ representing the distribution 𝜋. In this case we obtain finitely many non-trivial polynomial equations such that 𝜋 is faithful to 𝐺 unless 𝜃 solves one of these equations.

For an in-depth discussion of causal faithfulness we refer the reader to [7].
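Definition 4 can be checked mechanically on small graphs. The following minimal sketch (graphs are plain edge sets; a node counts as its own descendant, which matches the usual reading of “directed path into Z”) enumerates undirected paths naively, which is fine for toy examples such as the graph of Example 3:

```python
def descendants(g, v):
    """Nodes reachable from v by a directed path (v counts as reachable)."""
    out, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for (a, b) in g:
            if a == x and b not in out:
                out.add(b)
                stack.append(b)
    return out

def d_connecting(g, path, z):
    """Check the condition of Definition 4 along an undirected path."""
    for i in range(1, len(path) - 1):
        left, c, right = path[i - 1], path[i], path[i + 1]
        if (left, c) in g and (right, c) in g:    # collider at c
            if not descendants(g, c) & z:         # no directed path into Z
                return False
        elif c in z:                              # non-collider lies in Z
            return False
    return True

def d_connected(g, a, b, z):
    adj = {}
    for (x, y) in g:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    def extend(path):
        if path[-1] == b:
            return d_connecting(g, path, z)
        return any(extend(path + [n]) for n in adj.get(path[-1], ())
                   if n not in path)
    return extend([a])

# The graph of Example 3: the collider at grows(c2) blocks the path between
# need(c1,p) and need(c3,p) unless we condition on a descendant of grows(c2).
g = {("need(c1,p)", "imp(c1,c2)"), ("need(c3,p)", "imp(c3,c2)"),
     ("imp(c1,c2)", "grows(c2)"), ("imp(c3,c2)", "grows(c2)"),
     ("grows(c2)", "live(c2)"), ("corrupt(c2)", "live(c2)")}
print(d_connected(g, "need(c1,p)", "need(c3,p)", set()))         # False
print(d_connected(g, "need(c1,p)", "need(c3,p)", {"live(c2)"}))  # True
```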
2.2.2. Causal Structure Discovery for Bayesian Networks
At this point we want to discuss how we can learn a causal Bayesian network, i.e. a Bayesian network that allows us to predict the effect of interventions, from observational data. Since we only have observational data, the best we can do is to learn a Bayesian network that represents the right distribution. However, we want to find the Bayesian network with the right causal semantics! More precisely, we want to find all possible causal explanations for the observed distribution. We consider this an inverse problem, and to tackle it we assume that the causal Bayesian network that generated our data satisfies the causal faithfulness assumption of Definition 5. In the light of Theorem 4 this does not seem too restrictive. Furthermore, the following fact is known [8]:

Theorem 5. Let 𝐺 and 𝐺′ be two directed acyclic graphs on a set of random variables V. In this case the following statements are equivalent:
i) Each distribution 𝜋 that is faithful to 𝐺 is also faithful to 𝐺′.
ii) The graphs 𝐺 and 𝐺′ share the same adjacencies and the same unshielded colliders.

Theorem 5 forms the theoretical basis for causal structure discovery algorithms such as the PC algorithm [9] and its generalizations.

Example 4. It is easy to see that in Example 3 the direction of the causal relationship between 𝑛𝑒𝑒𝑑(_, _) and 𝑖𝑚𝑝(_, _) cannot be identified.
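The criterion of Theorem 5 is straightforward to implement. A minimal sketch (graphs as edge sets, as in the previous sketch) of the indistinguishability test, applied to the unidentifiable edge of Example 4:

```python
def skeleton(g):
    return {frozenset(e) for e in g}

def unshielded_colliders(g):
    sk = skeleton(g)
    return {(a, c, b)
            for (a, c) in g for (b, c2) in g
            if c == c2 and a < b and frozenset((a, b)) not in sk}

def faithfully_indistinguishable(g1, g2):
    """Theorem 5: same adjacencies and same unshielded colliders."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))

# Reversing need(c1,p) -> imp(c1,c2) in the graph of Example 3 preserves
# adjacencies and unshielded colliders, so this direction is not
# identifiable; reversing imp(c1,c2) -> grows(c2) instead would alter the
# collider structure at grows(c2) and hence be detectable.
g = {("need(c1,p)", "imp(c1,c2)"), ("need(c3,p)", "imp(c3,c2)"),
     ("imp(c1,c2)", "grows(c2)"), ("imp(c3,c2)", "grows(c2)"),
     ("grows(c2)", "live(c2)"), ("corrupt(c2)", "live(c2)")}
g_rev = (g - {("need(c1,p)", "imp(c1,c2)")}) | {("imp(c1,c2)", "need(c1,p)")}
print(faithfully_indistinguishable(g, g_rev))  # True
```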
2.3. Probabilistic Logic Programming under the FCM-Semantics
Let us fix a query language, i.e. a language Q ⊇ L ⊇ E in three parts with an extensional vocabulary E and a logical vocabulary L. Here Q is a finite multisorted relational vocabulary, i.e. it consists of a finite set of sorts, a finite set of relation symbols and a finite set of constants in those sorts, as well as a countably infinite set of variables in every sort. For every variable or constant 𝑥 we denote its sort by s(𝑥). Further, L is a subvocabulary of Q that contains all of the variables and constants of Q as well as a (possibly empty) subset of the relation symbols of Q. Moreover, E is a subvocabulary of L that satisfies the same properties with respect to L as L does with respect to Q.

Example 5. In order to model Example 1 we need two sorts: 𝑐𝑜𝑢𝑛𝑡𝑟𝑖𝑒𝑠 and 𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑠. Further, we model 𝑐𝑜𝑟𝑟𝑢𝑝𝑡(_), 𝑛𝑒𝑒𝑑(_, _), 𝑖𝑚𝑝(_, _), 𝑔𝑟𝑜𝑤𝑠(_) and 𝑙𝑖𝑣𝑒(_) as random predicates and 𝑝𝑟𝑜𝑑(_, _) as an external predicate. Later we will also see an example of an internal predicate.

As usual, an atom is an expression of the form 𝑟(𝑡1, . . . , 𝑡𝑛), where 𝑟 is a relation symbol and 𝑡1 to 𝑡𝑛 are constants or variables of the correct sorts, and a literal is an expression of the form 𝐴 or ¬𝐴 for an atom 𝐴. It is called an external atom or literal if 𝑟 is in E, a logical atom or literal if 𝑟 is in L, an internal atom or literal if 𝑟 lies in L ∖ E and a random atom or literal if 𝑟 is in Q ∖ L. A literal of the form 𝐴 is called positive and a literal of the form ¬𝐴 is called negative. Moreover, we will use the operator var(_) to refer to the variables in an expression.

The logical vocabulary will be used to formulate constraints and conditions for our probabilistic logic program. The purpose of a probabilistic logic program, however, is to define distributions for the random variables determined by the language Q. This is done by so-called random clauses, i.e. LPAD-clauses with one random atom in the head.

Definition 6 (Random Clause). A random clause 𝑅𝐶 is an expression of the form

𝑅 : 𝜋(𝑅𝐶) ← 𝑅1, ..., 𝑅𝑚, 𝐿1, ..., 𝐿𝑛,

where 𝑅 := effect(𝑅𝐶) is a random atom, called the effect of 𝑅𝐶, causes(𝑅𝐶) := {𝑅1, ..., 𝑅𝑚} is a finite set of random literals, called the causes of 𝑅𝐶, cond(𝑅𝐶) := {𝐿1, ..., 𝐿𝑛} is a finite set of logical atoms, called the condition of 𝑅𝐶, and 𝜋(𝑅𝐶) ∈ (0, 1] is a number called the probability of 𝑅𝐶. In this case we also denote the random clause 𝑅𝐶 by effect(𝑅𝐶) : 𝜋(𝑅𝐶) ← causes(𝑅𝐶) ∪ cond(𝑅𝐶).

Having established the necessary syntax, we introduce the corresponding semantics. The semantics of logical expressions is given in a straightforward way; we just want to highlight the unique names assumption in our definition of a structure: An L-structure Λ consists of a sort universe sᴧ for every sort s, an element of the corresponding sort universe for every constant in L, such that two different constants are mapped to different elements, and a relation among the corresponding sort universes for every relation symbol in L. Whether a logical formula is satisfied by a given L-structure (under an interpretation of its free variables) is determined by the usual rules of first-order logic. Finally, note that the semantics of external expressions is defined analogously.

Lifted Programs
We start with the definition of a lifted program:

Definition 7 (Lifted Program & Program Structure). A (lifted) program P := (R, I, C) is a triple that consists of a finite set of normal clauses I of the form 𝐻 ← 𝐵1, ..., 𝐵𝑚 with logical literals 𝐵1, ..., 𝐵𝑚 and an internal atom 𝐻, integrity constraints C of the form ⊥ ← 𝐿1, ..., 𝐿𝑘 for logical literals 𝐿𝑖 with 1 ≤ 𝑖 ≤ 𝑘, as well as a finite set of random clauses R. We call C the constraints, I the internal part and R the random part of P. Further, we say that P is stratified if its internal part I is a stratified set of normal clauses.

Example 6. We model the causal mechanism of Example 1 by a program P, given by the random part

𝑛𝑒𝑒𝑑(𝐶, 𝑃) : 𝛼
𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝐶) : 𝛽
𝑖𝑚𝑝(𝐶1, 𝐶2) : 𝛾 ← 𝑛𝑒𝑒𝑑(𝐶1, 𝑃), 𝑛𝑒𝑔_𝑝𝑟𝑜𝑑(𝐶1, 𝑃), 𝑝𝑟𝑜𝑑(𝐶2, 𝑃)
𝑔𝑟𝑜𝑤𝑠(𝐶2) : 𝛿 ← 𝑖𝑚𝑝(𝐶1, 𝐶2)
𝑙𝑖𝑣𝑒(𝐶) : 𝜖 ← 𝑔𝑟𝑜𝑤𝑠(𝐶), ¬𝑐𝑜𝑟𝑟𝑢𝑝𝑡(𝐶)

and the internal part

𝑛𝑒𝑔_𝑝𝑟𝑜𝑑(𝐶, 𝑃) ← ¬𝑝𝑟𝑜𝑑(𝐶, 𝑃)

for an internal predicate 𝑛𝑒𝑔_𝑝𝑟𝑜𝑑(_, _). Note that the database ℰ described in Example 3 yields an external database for P.

Notation 1. Let P := (R, I, C) be a stratified program and let ℰ be an E-structure. In our setting we may w.l.o.g. assume that ℰ is a Herbrand model of an alphabet E*, extending the external alphabet E by constants. Further, denote by L* and Q* the extensions of the languages L and Q by the new constants in E*. We write

ℰᴵ := {𝐿 ground atom of L* : I ∪ ℰ ⊨ 𝐿}

for the minimal Herbrand model of I ∪ ℰ, which is the result of applying the stratified datalog program I to ℰ. An E-structure ℰ is said to be an external database for the program P if it satisfies all integrity constraints in C, i.e. if ℰᴵ ⊨ C. A ground variable is a ground random atom 𝐺 := 𝑟(𝑥1, ..., 𝑥𝑛) of Q*. Finally, we write 𝒢(ℰ) for the set of all ground variables. The term ground variable indicates that we expect 𝐺 to become a proper random variable under our semantics.

From now on we will assume every program to be stratified.
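Definitions 6 and 7 suggest a direct machine representation. The following minimal sketch (a hypothetical encoding, not the authors’ implementation) stores a random clause as effect, causes, condition and probability, and encodes the random part of Example 6; atoms are predicate-argument pairs and capitalised arguments stand for variables:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RandomClause:
    effect: tuple        # random atom, e.g. ("grows", ("C2",))
    causes: tuple = ()   # random literals: ("+", atom) or ("-", atom)
    cond: tuple = ()     # logical atoms forming the condition
    prob: float = 1.0    # pi(RC) in (0, 1]

alpha, beta, gamma, delta, eps = 0.4, 0.2, 0.7, 0.6, 0.8  # illustrative

# The random part R of the program P of Example 6:
R = (
    RandomClause(("need", ("C", "P")), prob=alpha),
    RandomClause(("corrupt", ("C",)), prob=beta),
    RandomClause(("imp", ("C1", "C2")),
                 causes=(("+", ("need", ("C1", "P"))),),
                 cond=(("neg_prod", ("C1", "P")), ("prod", ("C2", "P"))),
                 prob=gamma),
    RandomClause(("grows", ("C2",)),
                 causes=(("+", ("imp", ("C1", "C2"))),),
                 prob=delta),
    RandomClause(("live", ("C",)),
                 causes=(("+", ("grows", ("C",))), ("-", ("corrupt", ("C",)))),
                 prob=eps),
)
```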
Definition 8 (FCM-Semantics of Lifted Programs). Let P := (R, I, C) be a program and let ℰ be an external database for P. The grounding Pℰ of the program P with respect to ℰ is the Boolean functional causal model on the set of ground variables 𝒢(ℰ) given by the following defining equation for every ground variable 𝐺 ∈ 𝒢(ℰ):

\[
G := \bigvee_{\substack{RC \in \mathbf{R},\ \iota \text{ interpretation on } \mathrm{var}(RC)\\ \mathrm{effect}(RC)^{\iota} = G,\ (\mathcal{E}^{\mathbf{I}}, \iota) \models \mathrm{cond}(RC)}} \Biggl( \bigwedge_{C \in \mathrm{causes}(RC)} C^{\iota} \wedge u(RC, \iota) \Biggr)
\]

Here, the error term 𝑢(𝑅𝐶, 𝜄) is a distinct Boolean random variable with the distribution 𝜋(𝑢(𝑅𝐶, 𝜄)) = 𝜋(𝑅𝐶) for every random clause 𝑅𝐶 ∈ R and every variable interpretation 𝜄 on var(𝑅𝐶). Besides, the error terms 𝑢(𝑅𝐶, 𝜄) are assumed to be mutually independent.

Example 7. As intended, the causal model ℳ of Example 1 yields the semantics of the program P in Example 6 with respect to the described database ℰ.

Definition 9 (Ground Graph & Acyclicity). Let ℰ be an external database for the program P := (R, I, C). Then the ground graph Graphℰ(P) is the directed graph on the set of ground variables 𝒢(ℰ) with an edge 𝐺1 → 𝐺2 if and only if there exist a random clause 𝑅𝐶 ∈ R, a cause 𝐶 ∈ causes(𝑅𝐶) and a variable interpretation 𝜄 on var(𝑅𝐶) such that (ℰᴵ, 𝜄) ⊨ cond(𝑅𝐶), 𝐶^𝜄 ∈ {𝐺1, ¬𝐺1} and effect(𝑅𝐶)^𝜄 = 𝐺2. In this case we say that the edge 𝐺1 → 𝐺2 is induced by 𝑅𝐶. Moreover, we call P an acyclic program if Graphℰ(P) is a directed acyclic graph for every external database ℰ.

Example 8. The graph in Example 3 is the ground graph of the program P in Example 6 with respect to the described database ℰ. Further, it is easy to see that P yields an acyclic program.

From now on we assume every program P := (R, I, C) to be acyclic. Then for every external database ℰ the grounding Pℰ yields a unique expression for every ground variable 𝐺 ∈ 𝒢(ℰ) in terms of the error terms 𝑢(𝑅𝐶, 𝜄). Hence, it induces a unique probability distribution on 𝒢(ℰ). We will show in a forthcoming paper that this distribution coincides with the distribution induced by the LPAD-program P ∪ ℰ via the distribution semantics [10, Chapter 1 §2.1]. All in all, Definition 8 yields a generalization of the standard semantics for acyclic LPAD-programs with interesting properties regarding causation. Further, since we observe that Graphℰ(P) is the causal diagram of Pℰ, we obtain:

Proposition 1. Let P := (R, I, C) be a program. For every external database ℰ the grounding Pℰ yields a distribution which can be stored in a Boolean Bayesian network on the ground graph Graphℰ(P). This is also the causal Bayesian network corresponding to the functional causal model Pℰ.

We call the Bayesian network 𝒞ℬ(P, ℰ) of Proposition 1 the causal Bayesian network semantics of P with respect to ℰ. In our future paper we will also show that our semantics is consistent with CP-logic [11]. Finally, note that Pearl’s do-calculus is implemented in cplint with respect to the CP-logic semantics [12].
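As a sketch of Definitions 8 and 9, the following code continues the hypothetical RandomClause encoding and the clause list R from the sketch in Section 2.3 (sorts are ignored for brevity; the individuals and the minimal Herbrand model ℰᴵ of Example 3 are hard-coded toy assumptions). It enumerates the variable interpretations whose condition holds in ℰᴵ and collects the induced edges of the ground graph:

```python
from itertools import product as cartesian

inds = {"C": ("c1", "c2", "c3"), "C1": ("c1", "c2", "c3"),
        "C2": ("c1", "c2", "c3"), "P": ("p",)}
facts = {("prod", ("c2", "p")),                  # E^I: prod(c2,p) plus the
         ("neg_prod", ("c1", "p")),              # derived neg_prod facts
         ("neg_prod", ("c3", "p"))}

def variables(atom):
    return [a for a in atom[1] if a[0].isupper()]

def ground(atom, iota):
    return (atom[0], tuple(iota.get(a, a) for a in atom[1]))

def interpretations(rc):
    """All interpretations on var(RC) whose condition holds in E^I."""
    vs = sorted({v for a in (rc.effect, *rc.cond) for v in variables(a)}
                | {v for (_, a) in rc.causes for v in variables(a)})
    for vals in cartesian(*(inds[v] for v in vs)):
        iota = dict(zip(vs, vals))
        if all(ground(a, iota) in facts for a in rc.cond):
            yield iota

def ground_graph(R):
    """The edges cause^iota -> effect^iota of Definition 9."""
    return {(ground(c, iota), ground(rc.effect, iota))
            for rc in R for iota in interpretations(rc)
            for (_, c) in rc.causes}

for edge in sorted(ground_graph(R)):
    print(edge)   # contains, e.g., need(c1,p) -> imp(c1,c2); cf. Example 8
```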
3. Verifying the Causal Content of Lifted Programs
The proofs of all further statements can be found in the proof appendix. We now return to the task of verifying the causal content of a probabilistic logic program whose structure was derived from observational data. More formally, we investigate the following situation:

Problem 1. Suppose we are given a query alphabet Q ⊇ L ⊇ E, a stratified datalog program I, a set of integrity constraints C and an E-structure ℰ with ℰᴵ ⊨ C. Further, assume we are given data sampled from a functional causal model P̃ℰ, where P̃ = (R̃, I, C) is a program in the query alphabet Q. From a structure learning algorithm like SLIPCOVER [1] we obtain a set of random clauses R and hence another program P = (R, I, C) together with a functional causal model Pℰ. Can we use the program P to predict the effect of external interventions if we assume that Pℰ induces the same distribution as P̃ℰ?

As we are mainly interested in the causal Bayesian network semantics of the program P̃, we introduce the following equivalence relation:

Definition 10. Let ℰ be an external database for the programs P1 = (R1, I, C) and P2 = (R2, I, C). We call P1 and P2 intervention equivalent on ℰ if they induce the same causal Bayesian network semantics on ℰ, i.e. if the causal Bayesian networks 𝒞ℬ(P1, ℰ) and 𝒞ℬ(P2, ℰ) coincide. Note that this is the case if and only if the causal models P1ℰ and P2ℰ predict the same effects for every intervention.

To apply Theorem 5, we need to make the following assumption.

Definition 11 (Faithful Programs). Let P := (R, I, C) be a program and let ℰ be an external database. We say that P is faithful for ℰ if the distribution induced by Pℰ is faithful to the ground graph Graphℰ(P). We say that two programs P1 and P2 are faithfully indistinguishable on ℰ if P1ℰ and P2ℰ induce the same distribution and if both P1 and P2 are faithful for ℰ.

Assumption 1 (Faithfulness). In the situation of Problem 1 the programs P̃ and P are faithfully indistinguishable on the external database ℰ.

With our notion of faithfulness from Definition 11, Theorem 5 translates to the following statement about faithfully indistinguishable programs:

Theorem 6. Let ℰ be an external database for the programs P1 = (R1, I, C) and P2 = (R2, I, C). If P1 and P2 are faithfully indistinguishable on ℰ, the ground graphs Graphℰ(P1) and Graphℰ(P2) share the same adjacencies and the same unshielded colliders.

Hence, Assumption 1 yields that the ground graphs Graphℰ(P) and Graphℰ(P̃) share the same adjacencies and the same unshielded colliders. To reason about colliders instead of unshielded colliders, we restrict ourselves to triangle free programs.

Definition 12 (Triangle Free Program Structures). We call a program P triangle free if for every external database ℰ the skeleton of the ground graph Graphℰ(P) does not contain any triangles.

Proposition 2. For every triangle free program P := (R, I, C) and every external database ℰ, every collider in the ground graph Graphℰ(P) is unshielded. □

Assumption 2. The programs P and P̃ in Problem 1 are triangle free.

If in Problem 1 every edge of the ground graph Graphℰ(P) is involved in a collider, Theorem 6 and Proposition 2 imply that every program faithfully indistinguishable from P yields the same ground graph on ℰ. Due to the following consequence of Theorem 2, the causal Bayesian network semantics also coincide in this case.

Lemma 1. Let P1 := (R1, I, C) and P2 := (R2, I, C) be programs such that the functional causal models P1ℰ and P2ℰ induce the same distribution for an external database ℰ. In this case P1 and P2 are intervention equivalent on ℰ if the ground graphs Graphℰ(P1) and Graphℰ(P2) coincide. □
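Both structural conditions just introduced are easy to check mechanically on a given ground graph. A minimal sketch (edge sets as before) of the triangle-freeness test of Definition 12 and of the graph comparison appearing in the hypothesis of Lemma 1:

```python
def triangle_free(g):
    """Definition 12 on a single ground graph: no triangle in the skeleton."""
    sk = {frozenset(e) for e in g}
    nodes = sorted({v for e in g for v in e})
    return not any(frozenset((a, b)) in sk
                   and frozenset((b, c)) in sk
                   and frozenset((a, c)) in sk
                   for a in nodes for b in nodes for c in nodes
                   if a < b < c)

def same_ground_graph(g1, g2):
    """Hypothesis of Lemma 1: the two ground graphs coincide."""
    return set(g1) == set(g2)

print(triangle_free({("a", "b"), ("b", "c"), ("a", "c")}))  # False
# Applied to ground_graph(R) from the previous sketch it returns True.
```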
Assertions i)-iii) of the following lemma list conditions under which a given random clause induces a collider in the ground graph of an external database ℰ. Moreover, the symmetry represented by the query alphabet Q implies that all edges induced by a random clause satisfying Assertion iv) also appear in every ground graph of a program faithfully indistinguishable from P. Note that Assertion iv) yields an analogue of the relational orientation rule in the RPC-algorithm [13].

Lemma 2. Let P1 := (R1, I, C) and P2 := (R2, I, C) be triangle free programs that are faithfully indistinguishable on the external database ℰ. Let 𝑅𝐶 ∈ R1 be a random clause that induces an edge 𝐶^𝜄 → effect(𝑅𝐶)^𝜄 in the ground graph Graphℰ(P1) for an interpretation 𝜄 on var(𝑅𝐶). Moreover, assume that the random clause 𝑅𝐶 ∈ R1 and the interpretation 𝜄 on var(𝑅𝐶) satisfy one of the following assertions:
i) We find two causes 𝐶1, 𝐶2 ∈ causes(𝑅𝐶) such that 𝐶1^𝜄 ≠ 𝐶2^𝜄.
ii) There exists a second random clause 𝑅𝐶′ ∈ R1 together with an interpretation 𝜄′ on var(𝑅𝐶′) satisfying (ℰᴵ, 𝜄′) ⊨ cond(𝑅𝐶′) such that we find effect(𝑅𝐶)^𝜄 = effect(𝑅𝐶′)^𝜄′ and 𝐶^𝜄 ≠ (𝐶′)^𝜄′ for two causes 𝐶 ∈ causes(𝑅𝐶) and 𝐶′ ∈ causes(𝑅𝐶′).
iii) There exists a cause 𝐶 ∈ causes(𝑅𝐶) together with a variable 𝑋 ∈ var(𝐶) such that 𝑋 ∉ var(effect(𝑅𝐶)). Moreover, we find another interpretation 𝜄′ on var(𝑅𝐶) satisfying (ℰᴵ, 𝜄′) ⊨ cond(𝑅𝐶), effect(𝑅𝐶)^𝜄 = effect(𝑅𝐶)^𝜄′ and 𝜄(𝑋) ≠ 𝜄′(𝑋).
iv) There is only one cause 𝐶 ∈ causes(𝑅𝐶) in 𝑅𝐶 and there exists a variable 𝑋 ∈ var(effect(𝑅𝐶)) such that 𝑋 ∉ var(𝐶). Moreover, there exists an interpretation 𝜄′ on var(𝑅𝐶) of the same type in L as 𝜄, i.e. 𝜄 makes an existential formula 𝜑 true if and only if 𝜄′ does, such that 𝐶^𝜄 = 𝐶^𝜄′ and such that 𝜄(𝑋) ≠ 𝜄′(𝑋).
Then the edge 𝐶^𝜄 → effect(𝑅𝐶)^𝜄 appears in the ground graph Graphℰ(P2) as well.

Combining Theorem 6, Lemma 1 and Lemma 2, we get the following corollary:

Theorem 7. Let P1 := (R1, I, C) and P2 := (R2, I, C) be triangle free programs that are faithfully indistinguishable on the external database ℰ. Further, assume that every random clause 𝑅𝐶 ∈ R1 with causes(𝑅𝐶) ≠ ∅ and every interpretation 𝜄 on var(𝑅𝐶) with (ℰ, 𝜄) ⊨ cond(𝑅𝐶) satisfy at least one of the Assertions i)-iv) of Lemma 2. In this case P1 and P2 are intervention equivalent on the external database ℰ. □

Example 9. Assume we obtained the program P as described in Problem 1 and Assumption 1 from a structure learning algorithm on the database ℰ of Example 8. In this case we cannot verify the causal content of P! However, if we added a country producing 𝑝 or a country not producing 𝑝, the program P would be guaranteed to be intervention equivalent to P̃, as desired.

Theorem 7 yields a partial answer to Problem 1 if we want to predict the effects of external interventions in the functional causal model P̃ℰ. Next, we want to generalize our result to other external databases. To this aim we introduce the following assumption:

Assumption 3 (Generalization). In Problem 1 we additionally assume that the query alphabet Q contains no constants and that the programs P̃ and P are faithfully indistinguishable on every external database ℰ′.

Next, we modify Assertions i), iii) and iv) of Lemma 2 to guarantee that under Assumption 3 every edge in the ground graph Graphℰ′(P) of an arbitrary external database ℰ′ also occurs in the ground graph Graphℰ′(P̃):

Lemma 3. Let P1 := (R1, I, C) and P2 := (R2, I, C) be triangle free programs that are faithfully indistinguishable on every external database ℰ. Further, assume that no variable appears twice in the head of a clause in I. Moreover, let 𝑅𝐶 ∈ R1 be a random clause that induces an edge 𝐶^𝜄′ → effect(𝑅𝐶)^𝜄′ in the ground graph Graphℰ′(P1) of an external database ℰ′ for an interpretation 𝜄′ on var(𝑅𝐶) in ℰ′.
Furthermore, assume that the random clause 𝑅𝐶 ∈ R1 satisfies at least one of the following assertions:
i) We find two causes 𝐶1, 𝐶2 ∈ causes(𝑅𝐶) that do not share the same underlying predicate.
ii) We find a cause 𝐶 ∈ causes(𝑅𝐶) and a variable 𝑋 ∈ var(𝐶) such that 𝑋 ∉ var(effect(𝑅𝐶)). Moreover, 𝐶 and effect(𝑅𝐶) do not share the same underlying predicate.
iii) We find a cause 𝐶 ∈ causes(𝑅𝐶) and a variable 𝑋 ∈ var(effect(𝑅𝐶)) such that 𝑋 ∉ var(𝐶). Moreover, 𝐶 and effect(𝑅𝐶) do not share the same underlying predicate.
Then we obtain that the edge 𝐶^𝜄′ → effect(𝑅𝐶)^𝜄′ is contained in the ground graph Graphℰ′(P2) as well.

The proof of Lemma 3 requires the construction of external databases in which causes are witnessed by more than one individual. Adding extra individuals is enabled by not having any kind of privileged equality, which is reflected in the following assumption:

Assumption 4. In the situation of Problem 1 no variable appears twice in the head of a clause in I.

Finally, we combine Theorem 6, Lemma 3 and Lemma 2:

Theorem 8. In the situation of Lemma 3 we assume that every random clause in P1 satisfies one of the Assertions i)-iii). In this case the programs P1 and P2 are intervention equivalent on every external database ℰ. □

Example 10. Assume we obtained the program P as described in Problem 1 and Assumption 3 from a structure discovery algorithm on the database ℰ of Example 8. In this case we can invoke Theorem 8 to see that P and P̃ are intervention equivalent on every external database ℰ′.

Based on our results we propose the following procedure to verify the causal content of a given probabilistic logic program:

Procedure 1. In the situation of Problem 1 we first check that every random clause 𝑅𝐶 of the program P and every interpretation 𝜄 on var(𝑅𝐶) in the external database ℰ satisfies one of the Assertions i)-iv) of Lemma 2. In this case we obtain that P̃ and P are intervention equivalent on ℰ under Assumption 1 and Assumption 2. Further, we check for every random clause 𝑅𝐶 of P whether there is an interpretation 𝜄 on var(𝑅𝐶) in ℰ such that (ℰᴵ, 𝜄) ⊨ cond(𝑅𝐶). Next, we use this result to justify Assumption 3, i.e. to ensure that no random clause is verified on the basis of a collider not observed in the ground graph Graphℰ(P). Finally, we check whether every random clause in P satisfies one of the Assertions i)-iii) of Lemma 3 and invoke Theorem 8 to conclude that under Assumption 4 the programs P and P̃ are intervention equivalent on every external database ℰ′, as desired.
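A minimal sketch of the first step of Procedure 1, continuing the earlier sketches (it assumes RandomClause, R, interpretations and ground from Sections 2.3 and 2.3’s grounding sketch): for every random clause and every satisfying interpretation it looks for a second clause instance with the same ground effect but a different ground cause, a simplified test covering Assertions i)-iii) of Lemma 2; the type-based Assertion iv) is omitted here for brevity.

```python
def collider_verified(rc, iota, R):
    """Simplified check for Assertions i)-iii) of Lemma 2: does the ground
    effect of (rc, iota) receive a second, distinct ground cause?"""
    causes = {ground(a, iota) for (_, a) in rc.causes}
    if len(causes) >= 2:                      # i) two distinct ground causes
        return True
    eff = ground(rc.effect, iota)
    for rc2 in R:                             # ii)/iii): another instance
        for iota2 in interpretations(rc2):    # with the same effect but a
            if (ground(rc2.effect, iota2) == eff and   # different cause
                    any(ground(a, iota2) not in causes
                        for (_, a) in rc2.causes)):
                return True
    return False

for rc in R:
    if rc.causes:
        ok = all(collider_verified(rc, iota, R)
                 for iota in interpretations(rc))
        print(rc.effect, ok)
# On the database of Example 3 the imp clause cannot be verified this way,
# in line with Example 9.
```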
4. Conclusion
We first illustrated that in the lifted case the search space of causal explanations for an observed distribution can be infinite. Hence, there is no straightforward generalization of causal structure discovery in this setting. However, we developed a procedure to verify the causal content of an acyclic, stratified, triangle free probabilistic logic program whose structure was determined from observational data. Intuitively, we argued that we can verify every clause that observably implies only colliders in the associated ground graphs. In future work we would like to remove the assumption of triangle freeness. Furthermore, we want to verify pairs of clauses that induce colliders in the ground graph for every external database. To obtain a complete result we also aim to incorporate the orientation rules of the PC algorithm into our framework.

Acknowledgement
The research leading to this publication was supported by LMUexcellent, funded by the Federal Ministry of Education and Research (BMBF) and the Free State of Bavaria under the Excellence Strategy of the Federal Government and the Länder.

References
[1] E. Bellodi, F. Riguzzi, Structure learning of probabilistic logic programs by searching the clause space, Theory and Practice of Logic Programming 15 (2014) 169–212. doi:10.1017/s1471068413000689.
[2] J. Pearl, Causality, 2nd ed., Cambridge University Press, Cambridge, UK, 2000. doi:10.1017/CBO9780511803161.
[3] T. Verma, J. Pearl, Causal networks: Semantics and expressiveness, in: Uncertainty in Artificial Intelligence, volume 9 of Machine Intelligence and Pattern Recognition, North-Holland, 1990, pp. 69–76. doi:10.1016/B978-0-444-88650-7.50011-1.
[4] D. Geiger, T. Verma, J. Pearl, Identifying independence in Bayesian networks, Networks 20 (1990) 507–534. doi:10.1002/net.3230200504.
[5] M. McDermott, Redundant causation, The British Journal for the Philosophy of Science 46 (1995) 523–544. URL: http://www.jstor.org/stable/687896.
[6] C. Meek, Strong completeness and faithfulness in Bayesian networks, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 411–418.
[7] N. Weinberger, Faithfulness, coordination and causal coincidences, Erkenntnis 83 (2018) 113–133. doi:10.1007/s10670-017-9882-6.
[8] T. Verma, J. Pearl, Equivalence and synthesis of causal models, in: Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI ’90, Elsevier Science Inc., USA, 1990, pp. 255–270. doi:10.5555/647233.719736.
[9] C. Meek, Causal inference and causal explanation with background knowledge, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 403–410. doi:10.5555/2074158.2074204.
[10] F. Riguzzi, Foundations of Probabilistic Logic Programming: Languages, Semantics, Inference and Learning, River Publishers, 2020.
[11] J. Vennekens, M. Denecker, M. Bruynooghe, CP-logic: A language of causal probabilistic events and its relation to logic programming, Theory and Practice of Logic Programming 9 (2009) 245–308.
[12] F. Riguzzi, G. Cota, E. Bellodi, R. Zese, Causal inference in cplint, International Journal of Approximate Reasoning 91 (2017) 216–232. doi:10.1016/j.ijar.2017.09.007.
[13] M. Maier, Causal Discovery for Relational Domains: Representation, Reasoning, and Learning, Doctoral dissertation 279, University of Massachusetts Amherst, 2014. URL: https://scholarworks.umass.edu/dissertations_2/279/.

Proof Appendix
Proof of Theorem 2. Recall Definition 3 to see that 𝑋 has the same parents in the causal diagrams graph(ℳ2) and graph(ℳ3) and that every other node 𝑌 ≠ 𝑋 has the same parents in graph(ℳ3) as in graph(ℳ1). Furthermore, apply the assumption to obtain that the causal diagram graph(ℳ3) coincides with graph(ℳ1). Hence, we are left to show that the parameters of the causal Bayesian networks 𝒞ℬ(ℳ1) and 𝒞ℬ(ℳ3) coincide as well. As ℳ1 and ℳ2 induce the same distribution, we find that the parameters for 𝑋 coincide in the causal Bayesian networks 𝒞ℬ(ℳ1) and 𝒞ℬ(ℳ2). To conclude, we recall that the parameters of every node 𝑌 ∈ V in 𝒞ℬ(ℳ𝑖), 𝑖 = 1, 2, 3, can be computed from the corresponding defining equation by intervening on the set of parents Pa(𝑌) [2, Theorem 3.3.2].

Proof of Theorem 6.
Assume that the functional causal models P1ℰ and P2ℰ induce the same distribution 𝜋. Further, assume that both programs P1 and P2 are faithful for ℰ. In this case, by Definition 11, the ground graphs Graphℰ(P1) and Graphℰ(P2) are both faithful to the distribution 𝜋. Hence, we obtain the desired result from Theorem 5.

Proof of Lemma 2. If 𝑅𝐶 satisfies one of the Assertions i)-iii), it is easy to see that the edge 𝐶^𝜄 → effect(𝑅𝐶)^𝜄 is involved in a collider at effect(𝑅𝐶)^𝜄, and the claim follows since by Theorem 6 and Proposition 2 the ground graphs Graphℰ(P1) and Graphℰ(P2) share the same colliders.

Next, assume that 𝑅𝐶 and 𝜄 satisfy iv). By assumption there exists an interpretation 𝜄′ on var(𝑅𝐶) of the same type in L as 𝜄 such that 𝐶^𝜄 = 𝐶^𝜄′ and such that 𝜄(𝑋) ≠ 𝜄′(𝑋) for a variable 𝑋 ∈ var(effect(𝑅𝐶)). As 𝜄 and 𝜄′ have the same type in L, we find that the edges 𝐶^𝜄 − effect(𝑅𝐶)^𝜄 and 𝐶^𝜄 − effect(𝑅𝐶)^𝜄′ also share the same orientation in the ground graph Graphℰ(P2). If they were both pointing to 𝐶^𝜄, we would observe a collider in Graphℰ(P2) not contained in Graphℰ(P1). This is a contradiction to Theorem 6 and Proposition 2 and the claim follows. □

To prove Lemma 3 we need the following lemma:

Lemma 4. Let P = (R, I, C) be a lifted program such that no variable appears twice in the head of a clause in I and let ℰ be an external database. Assume the following holds for every external atom 𝐴 = 𝑙(𝑥1, ..., 𝑥𝑚):

(*) Let 𝐼 be a subset of {1, . . . , 𝑚} and let 𝜄 be an interpretation on var(𝐴) with 𝜄(𝑥𝑖) = 𝑎 if and only if 𝑖 ∈ 𝐼. Moreover, let 𝑎′ be an individual in ℰ and let 𝜄′ be the interpretation on var(𝐴) which is given by setting 𝜄′(𝑥𝑖) = 𝑎′ for 𝑖 ∈ 𝐼 and 𝜄′(𝑥𝑗) = 𝜄(𝑥𝑗) for 𝑗 ∉ 𝐼. In this case (ℰ, 𝜄) ⊨ 𝐴 if and only if (ℰ, 𝜄′) ⊨ 𝐴.

Then (*) also holds for every internal atom 𝐴.

Proof. Let 𝒯 denote the fixpoint operator of the stratified datalog program I. We use induction on the iteration of 𝒯 to show that for every 𝑛 ∈ ℕ we have (𝒯ⁿ(ℰ), 𝜄) ⊨ 𝐴 if and only if (𝒯ⁿ(ℰ), 𝜄′) ⊨ 𝐴. Note that the case 𝑛 = 0 follows directly from the assumption of the lemma.

So assume that (𝒯ⁿ⁺¹(ℰ), 𝜄) ⊨ 𝐴. If we find (𝒯ⁿ⁺¹(ℰ), 𝜄′) ⊨ 𝐴, we are finished. Hence, let us assume that (𝒯ⁿ⁺¹(ℰ), 𝜄′) ⊭ 𝐴. In this case there exist a clause 𝐶 ∈ I given by 𝐻 ← 𝐵1, ..., 𝐵𝑘 and an interpretation 𝜈 on var(𝐶) such that 𝐻^𝜈 = 𝐴^𝜄 and such that (𝒯ⁿ(ℰ), 𝜈) ⊨ {𝐵1, ..., 𝐵𝑘}. By the assumption of the lemma, 𝐻 = 𝑙(𝑤1, . . . , 𝑤𝑚), where all 𝑤1, . . . , 𝑤𝑚 are distinct. Further, we know that 𝜈(𝑤𝑖) = 𝜄(𝑥𝑖) = 𝑎 for all 𝑖 ∈ 𝐼. Define 𝜈′ by setting 𝜈′(𝑤𝑖) = 𝑎′ for 𝑖 ∈ 𝐼 and 𝜈′(𝑣) = 𝜈(𝑣) for all other variables 𝑣 on which 𝜈 is defined. Then the restrictions of 𝜈 and 𝜈′ to the variables in any 𝐵𝑖, 1 ≤ 𝑖 ≤ 𝑘, satisfy the requirements of the induction hypothesis. Hence, (𝒯ⁿ(ℰ), 𝜈′) ⊨ {𝐵1, ..., 𝐵𝑘} and thus (𝒯ⁿ⁺¹(ℰ), 𝜈′) ⊨ 𝐻. Note that 𝐻^𝜈′ = 𝐴^𝜄′ by construction. Therefore, (𝒯ⁿ⁺¹(ℰ), 𝜄′) ⊨ 𝐴 as required.

Proof of Lemma 3. We distinguish the following cases:

Case. The clause 𝑅𝐶 ∈ R1 satisfies i). In this case the desired result follows immediately from Lemma 2.

Case. The clause 𝑅𝐶 ∈ R1 satisfies ii). By Theorem 6 we find an edge 𝐶^𝜄′ − effect(𝑅𝐶)^𝜄′ in the ground graph Graphℰ′(P2). This edge now needs to be induced by a random clause R̃C ∈ R2, i.e. we obtain that 𝐶^𝜄′ − effect(𝑅𝐶)^𝜄′ = C̃^𝜄̃ → effect(R̃C)^𝜄̃. Further, we may assume w.l.o.g. that the clauses 𝑅𝐶 and R̃C do not share common variables.
Hence, we obtain an interpretation 𝜄 on var(𝑅𝐶) ∪ var(R̃C) such that

((ℰ′)ᴵ, 𝜄) ⊨ edge_eq ∪ cond(𝑅𝐶) ∪ cond(R̃C),

where edge_eq denotes the set

{ (C̃ = effect(𝑅𝐶) ∧ 𝐶 = effect(R̃C)) ∨ (effect(R̃C) = effect(𝑅𝐶) ∧ 𝐶 = C̃) }.

Moreover, we may assume w.l.o.g. that there exists a variable 𝑋 ∈ var(𝐶) such that 𝑋 ∉ var(effect(𝑅𝐶)). Furthermore, choose a variable 𝑋′ not contained in var(𝑅𝐶) ∪ var(R̃C). Next, choose a constant 𝜄(𝑋′) not contained in ℰ′ of the sort s(𝑋) and add for each ground atom 𝑝(𝑐1, ..., 𝜄(𝑋), ..., 𝑐𝑛) in ℰ′ the ground atom 𝑝(𝑐1, ..., 𝜄(𝑋′), ..., 𝑐𝑛). In this way we end up with an external database ℰ′′ and an interpretation 𝜄 on var(𝑅𝐶) ∪ var(R̃C) ∪ {𝑋′}. Further, Lemma 4 yields that the following set is satisfied with respect to 𝜄 and (ℰ′′)ᴵ:

(edge_eq ∪ cond(𝑅𝐶))[𝑋′/𝑋] ∪ edge_eq ∪ cond(𝑅𝐶) ∪ cond(R̃C) ∪ {𝑋 ≠ 𝑋′}

Hence, 𝑅𝐶 ∈ R1 satisfies Assertion iii) of Lemma 2 with respect to the interpretation 𝜄 and the external database ℰ′′, and the edge 𝐶^𝜄 − effect(𝑅𝐶)^𝜄 is induced by R̃C in Graphℰ′′(P2) as well. From Lemma 2 we further obtain that the orientation of the edge 𝐶^𝜄 − effect(𝑅𝐶)^𝜄 in Graphℰ′′(P2) is also given by 𝐶^𝜄 → effect(𝑅𝐶)^𝜄. Hence, we obtain from Definition 9 that the formula C̃ = effect(𝑅𝐶) ∧ 𝐶 = effect(R̃C) is not satisfiable with respect to I ∪ C, as 𝐶 and effect(𝑅𝐶) do not share the same underlying random predicate. However, this means that

(ℰᴵ, 𝜄′) ⊨ effect(R̃C) = effect(𝑅𝐶) ∧ 𝐶 = C̃

and the desired result follows.

Case. The clause 𝑅𝐶 ∈ R1 satisfies iii). This can be proven analogously to the previous case.