Generation of Parametrically Uniform Knowledge Bases in a Relational Probabilistic Logic with Maximum Entropy Semantics Christoph Beierle, Markus Höhnerbach, Marcus Marto Dept. of Computer Science, University of Hagen, Germany Abstract. In a relational setting, the maximum entropy model of a set of proba- bilistic conditionals can be defined referring to the full set of ground instances of the conditionals. The logic FO-PCL uses the notion of parametric uniformity to ensure that the full grounding of the conditionals can be avoided, thereby greatly simplifying the maximum entropy model computation. In this paper, we describe a system that realises an approach transforming an FO-PCL knowledge base con- sisting of relational probabilistic conditionals into a knowledge base having the same maximum entropy model that is parametrically uniform. The implemented system provides different execution and evaluation modes, including the genera- tion of all possible solutions, and is available within an integrated development environment for relational probabilistic logic. 1 Introduction There are different approaches addressing the challenge of modelling uncertainty by combining logic with probabilities, see e.g. the foundational work reported in [18, 6, 7]. Probabilistic conditional logic considers conditional probabilities P (B|A) for condi- tionals of the form if A then B with probability x [1, 19], formally denoted by (B|A)[x]; thus, P (B|A) denotes the probability of B given that we know A holds. While these ap- proaches are often based upon propositional logic, there are also several developments aiming at exploiting the expressiveness of relational or general first order logic in com- bination with probabilities, e.g. Bayesian logic programs and Markov logic networks (see [11] for an overview). Some of these approaches (see e.g. [15, 22]) employ the principle of maximum entropy (ME) [20, 14]. Among these, there is also FO-PCL [10], a first-order extension of propositional probabilistic conditional logic. An example of a conditional in FO-PCL is “If V likes U, then U likes V with probability 0.9, for different U, V”, formally denoted by h(likes(U, V ) | likes(V, U )) [0.9] , U 6= V i. The models of an FO-PCL knowledge base R consisting of a set of such condition- als are probability distributions over possible worlds [12] satisfying each conditional in R, where a possible world is a subset of the Herbrand base which is determined by the set of all ground instances of conditionals satisfying the respective constraint. The ME principle is used to select the uniquely determined model ME (R) having maxi- mum entropy. The computation of ME (R) leads to an optimisation problem with one optimisation parameter to be determined for each admissible ground instance of ev- ery conditional in R. However, if the knowledge base is parametrically uniform, i.e. all ground instances of a conditional share the same optimisation parameter value, for each conditional in R just one optimisation parameter has to be determined [10]. Thus, parametric uniformity significantly simplifies the task of computing ME (R) [8]. In [17, 4], a set of of transformation rules PU is presented allowing to transform any consistent knowledge base into a parametrically uniform knowledge base with the same maximum entropy model. In this paper, we introduce the system PU sys imple- menting PU and automatically generating PU(R) for any consistent R. This allows for a simpler ME model computation by computing ME (PU(R)) instead of ME (R). This paper is an extended version of [3]. In particular, we expand the background about FO-PCL by recalling the notions of inter-rule und intra-rule intercations, give a more detailed overview on PU sys , and provide results obtained from an evaluation of the system. After recalling the basics of FO-PCL and PU (Sec. 2), we present PU sys in Sec. 3. We illustrate the reasons for multiple solutions of PU transformations (Sec. 4), describe their optimised generation (Sec. 5), give some first application and evaluation results (Sec. 6), and conclude and point out further work (Sec. 7). 2 Background: Interactions and PU Transformation Rules An FO-PCL conditional h(ψ | φ) [ξ] , Ci consists of an antecedent φ and a consequent ψ, both being quantifier-free first-order, non-equational formulas, a probability value ξ ∈ [0, 1], and a constraint formula C with inquatations over the variables (and con- stants) occurring in φ and ψ. As an illustration of a simple FO-PCL knowledge base we consider the following: Example 1 (Misanthrope). The misanthrope example (adapted from [10]) models the friendship relations for a group of people. Usually, if a person V likes a person U , then U also likes V . However, there is a misanthrope a, and the chances that a likes someone is considerably low. The knowledge base RMI = {R1 , R2 } contains two conditionals: R1 : h(likes(U, V ) | likes(V, U )) [0.9] , U 6= V i R2 : h(likes(a, V )) [0.05] , V 6= ai t u A ground instance of a conditional satisfying the conditional’s constraint formula like U 6= V is called admissible. A model of an FO-PCL knowledge base R is a joint probability function P satisfying each conditional in R where P satisfies a conditional h(ψ | φ) [ξ] , Ci iff P (ψg |φg ) = ξ for every admissible ground instance h(ψg | φg ) [ξ] , >i of the conditional [10]. Note that a full grounding of RMI ignoring the constraint for- mulas would immediately yield an inconsistency since there is no probability distribu- tion P with P (likes(a, a)) = 0.05 and P (likes(a, a)|likes(a, a)) = 0.9. Parametric uniformity [10] significantly simplifies the task of computing the maximum entropy [20, 14] model ME (R) since it reduces the number of optimisation parameters to be taken into account from the number of admissible ground instances to the number of conditionals in R [8]. For details about the underlying optimisation problem and its parameters determining the FO-PCL model ME (R), we refer to [10, 8]. In [17], the reasons causing R to be not parametrically uniform are investigated in detail. A central observation is that for any parametrically uniform R, the sets of ground instances of atoms and pairs of different atoms occurring in the admissible ground in- stances of two conditionals must be either disjoint or identical (otherwise there is an im- balanced sharing [17, 4]). For instance, RMI from Ex. 1 is not parametrically uniform since the sets of admissible ground atoms originating from likes(U, V ) of conditional R1 and from likes(a, V ) of conditional R2 are neither equal nor disjoint. For a single conditional it is required that each admissible ground instance of an atom p or of a pair of different atoms p0 , p00 occurs the same number of times in the ground instances of the conditional as any other admissible ground instance of the atom p or of the pair of atoms p0 , p00 (otherwise there is an imbalanced use [17, 4]). The syntactic criterion of inter-rule and intra-rule interactions [17] identifies these constellations by analysing the syntactic structure of the conditionals in R; no grounding operation is required. Inter-Rule Interactions An inter-rule interaction of type 1 between two conditionals R1 and R2 with respect to P , regarding the variable V and the constant c, denoted R1 ← hP iV,c → R2 is a situation a) R1 = h(. . . P (. . . , c, . . .) . . .)[ξR1 ], CR1 i, R2 = h(. . . P (. . . , V, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= c, or b) R1 = h(. . . P (. . . , U, . . .) . . .)[ξR1 ], CR1 i, CR1  U 6= c, R2 = h(. . . P (. . . , V, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= c . where CR2 2 V 6= c denotes that from CR2 , it does not follow that V is not equal to c. An inter-rule interaction of type 2 between R1 and R2 with respect to P , regarding the variables V and Z, denoted R1 ← hP iV,Z → R2 , is a situation with a) R1 = h(. . . P (. . . , U, . . . , U, . . .) . . .)[ξR1 ], CR1 i, R2 = h(. . . P (. . . , V, . . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z, or b) R1 = h(. . . P (. . . , U, . . . , W, . . .) . . .)[ξR1 ], CR1 i, CR1  U 6= W , R2 = h(. . . P (. . . , V, . . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z. An inter-rule interaction of type 3 between R1 and R2 with respect to P and Q, regarding the variables V and Z, denoted R1 ← hP, QiV,Z → R2 , involves two different atoms P (. . .) and Q(. . .) and is a situation with a) R1 = h(. . . P (. . . , U, . . .) . . . Q(. . . , U, . . .) . . .)[ξR1 ], CR1 i, R2 = h(. . . P (. . . , V, . . .) . . . Q(. . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z, or b) R1 = h(. . . P (. . . , U, . . .) . . . Q(. . . , W, . . .) . . .)[ξR1 ], CR1 i, CR1  U 6= W , R2 = h(. . . P (. . . , V, . . .) . . . Q(. . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z. The three types of inter-rule interactions capture exactly the different reasons for imbalanced sharing [17]. E.g., for RMI the imbalanced sharing sketched above is iden- tified by the inter-rule interaction of type 1 R2 ← hlikesiU,a → R1 . For each of the different types of interactions, there is a corresponding interaction removing transformation rule in PU (cf. Figure 1). For instance, the transformation rule (TE 1 ) removes an inter-rule interaction of type 1. The application of T E1 replaces a conditional R with two new conditionals ν(σ(R)) and ν(σ̄(R)), where σ(R) is the result of applying the variable substitution σ = {V /c} to R, and σ̄(R) is the result of adding the constraint V 6= c to the constraint formula of R. The operator ν transforms a conditional in constraint normal form, a normal form required for the recognition of R ∪ {R1 , R2 } R1 ← hP iV,c → R2 , (TE 1 ) R ∪ {R1 } ∪ ν{σ(R2 ), σ(R2 )} σ = {V /c} R ∪ {R1 , R2 } R1 ← hP iV,Z → R2 , (TE 2 ) R ∪ {R1 } ∪ ν{σ(R2 ), σ(R2 )} σ = {V /Z} R ∪ {R1 , R2 } R1 ← hP, QiV,Z → R2 , (TE 3 ) R ∪ {R1 } ∪ ν{σ(R2 ), σ(R2 )} σ = {V /Z} R ∪ {R} hQiV,c → R, (TA1 ) R ∪ ν{σ(R), σ(R)} σ = {V /c} R ∪ {R} hQiV,Z → R, (TA2 ) R ∪ ν{σ(R), σ(R)} σ = {V /Z} R ∪ {R} hQ, SiV,Z → R, (TA3 ) R ∪ ν{σ(R), σ(R)} σ = {V /Z} Fig. 1. Transformation rules PU [17] interactions. Similarly, (TE 2 ) and (TE 3 ) remove inter-rule interactions of type 2 and 3, respectively. Example 2 (Application of PU to RMI ). The knowledge base RMI of Ex. 1 is not parametrically uniform since it contains two inter-rule interactions of type 1 denoted by R2 ← hlikesiU,a → R1 and R2 ← hlikesiV,a → R1 . The application of T E1 for the first interaction replaces R1 with R1 1 : h(likes(a, V ) | likes(V, a)) [0.9] , V 6= ai R1 2 : h(likes(U, V ) | likes(V, U )) [0.9] , U 6= V ∧ U 6= ai. R1 1 is the result of the substitution σ = {U/a} applied to R1 and R1 2 is the result of adding the constraint U 6= a to R1 . The atom likes(U, V ) of R1 that caused the first interaction becomes likes(a, V ) in R1 1 and its set of ground atoms is now identical to the set of ground atoms from likes(a, V ) of R2 . The added constraint U 6= a in R1 2 leads to disjoint sets of the ground atoms of the discussed atoms. The second interaction between R2 and R1 2 is also removed by applying T E1 , thereby replacing R1 2 with R1 2 1 : h(likes(U, a)|likes(a, U ))[0.9], U 6= ai R1 2 2 : h(likes(U, V )|likes(V, U ))[0.9], U 6= V ∧ U 6= a ∧ V 6= ai. The resulting knowledge base is PU(RMI ) = {R1 1 , R1 2 1 , R1 2 2 , R2 }. t u Intra-Rule Interactions While inter-rule interactions involve two conditionals, intra- rule interactions occur within a single conditional and capture the notion of imbalanced use. Suppose R is a conditional containing atoms PR , QR with the predicate symbols P and Q, respectively. An intra-rule interaction of type 1 in R with respect to Q, regarding the variable V and the constant c, denoted hQiV,c → R, is a situation with PR = P (. . . , U, . . .), QR = Q(. . . , V, . . .), U ∈ / vars(QR ), CR  U 6= V , CR  U 6= c, and CR 2 V 6= c . An intra-rule interaction of type 2 in R with respect to Q, regarding the variables V and Z, denoted hQiV,Z → R, is a situation with PR = P (. . . , U, . . .), QR = Q(. . . , V, . . . , Z . . .), U ∈ / vars(QR ), CR  U 6= V , CR  U 6= Z, and CR 2 V 6= Z. If SR with predicate symbol S is a further atom in R, an intra-rule interaction of type 3 in R with respect to Q and S, regarding the variables V and Z, denoted hQ, SiV,Z → R, is a situation with PR = P (. . . , U, . . .), QR = Q(. . . , V, . . .), SR = S(. . . Z . . .), U ∈ / vars(QR ), U ∈ / vars(SR ), CR  U 6= V , CR  U 6= Z, and CR 2 V 6= Z. Each type of intra-rule interaction can be removed by one of the three rule (TA1 ), (TA2 ), (TA3 ) in PU (Figure 1). Example 3 (Intra-rule interaction of type 1). Suppose that the signature for the knowl- edge base RIA1 = {R1 } contains the constants D = {a, b, c}: R1 : h(p(U ) | q(V )) [ξ] , U 6= V ∧ U 6= ai There is an intra-rule interaction of type 1 denoted hqiV,a → R1 . The reason for this interaction is that the number of occurrences of the ground atom q(a) in the admissible ground instances of R1 is 2, but for q(b) (and also q(c)) only 1. Note that this imbalance persists for any larger set of constants D. The application of (TA1 ) replaces R1 with R1 1 : h(p(U ) | q(a)) [ξ] , U 6= ai R1 2 : h(p(U ) | q(V )) [ξ] , U 6= V ∧ U 6= a ∧ V 6= ai and the resulting knowledge base PU(RIA1 ) is parametrically uniform. t u The importance of inter- and intra-rule interactions is that they fully capture the reasons for a knowledge base to be not parametrically uniform. Correctness and com- pleteness of the set of interaction removing transformation rules PU is given by the following proposition: Proposition 1. [17] Exhaustively applying PU to a knowledge base R yields a knowl- edge base PU(R) such that R and PU(R) have the same maximum-entropy model and PU(R) is parametrically uniform. A proof of this proposition and further details of PU can be found in [17, 4]. As a consequence of Proposition 1, we can obtain the maximum entropy model ME (R) by computing ME (PU(R)), thereby reducing the number of optimisation parameters to be considered from the number of admissible ground instances of R to the num- ber of conditionals in PU(R). For instance, considering a set D of constants having 15 elements in Ex. 1, there are 224 admissible ground instances for RMI , but only 4 conditionals in PU(RMI ) (cf. Ex. 2). 3 Implementation of the Transformation System PU In this section, we present an overview of the software system PU sys that implements the transformation system PU. PU sys has been designed as a plugin for KReator1 [9], 1 Source code of KReator and PU sys can be found at http://kreator-ide. sourceforge.net/ FO-PCL Lexer/ Semantic Analysis Knowledge Parser/ • type consistency Base AST • vars(C) ⊆ vars(φ ∪ ψ) Transformation FO-PCL Object- • constraint normal form Knowledge Structure • recognition of interactions Base • application of transformation rules Fig. 2. Conceptual pipeline of the system PU sys . Fig. 3. Interactive transformation mode applied to Ex. 1. which is an integrated development environment for relational probabilistic logic. The basic architecture of the implemented transformation system is illustrated in Figure 2. The input knowledge base is parsed into an abstract syntax tree (AST) from which an object structure is created. Thereby it is ensured that the types of variables and con- stants in a conditional are conform with the signature of the knowledge base (type con- sistency) and that the variables of the constraint formula are used in either the premise or the conclusion of the conditional. The key operations of the transformation phase are the transformation of conditionals in constraint normal form, the recognition of interac- tions, and the application of the transformation rules. These operations operate directly on the object structure. After the transformation phase, the output knowledge base is created from the resulting object structure. Transformation Modes PU sys ensures that all conditionals of the initial knowl- edge base are transformed into constraint-normal form. If more than one interaction is found, one of these interactions has to be selected for the application of the corre- sponding transformation rule. Therefore, PU sys offers different transformation modes for different rule application strategies; moreover, PU sys can be easily extended with additional transformation modes. The Interactive mode allows to control, monitor and trace single steps of a transformation process through a graphical user interface. All interactions present in a knowledge base are listed in a short form notation. They can be selected separately to apply the corresponding transformation rule or to view more detailed information in a higher-level, declarative notation which originates from the definition of the interactions. In the Automatic mode, an applicable transformation rule is selected automatically and applied, and this step is repeated until a parametrically uniform knowledge base is reached. The All Solutions transformation mode creates all results that are obtainable by applying different orders of rule applications. Thereby, it avoids the multiple generation of the same parametrically uniform knowledge base, and moreover, it avoids the generation of knowledge bases that are just variants of each other with respect to variable renamings. As this mode is of particular interest when in- vestigating properties of PU related e.g. to minimal solutions or confluence properties, this mode will be described in more detail in Sec. 5. User Interface A transformation process can be started with KReator by executing a KReator-Script [9]. All transformation parameters (e.g. transformation mode) can be set either by using a graphical user interface or within the script itself. The transformation systems also supports batch processing for multiple knowledge base files. A screenshot of the Interactive transformation mode applied to RM I from Ex. 1 is shown in Figure 3. The right table displays the conditionals of the knowledge base to be transformed and is updated accordingly after a transformation rule has been applied. All interactions that have been recognised are listed in a short form notation in the left table. They can be selected to apply the corresponding transformation rule. The bottom pane displays detailed information about the reasons why an interaction exists by tracing it back to the declarative notation used in the definitions of the interactions. 4 Multiple Solutions The application of different transformation rules form PU to a knowledge base R may lead to different parametric uniform knowledge bases (though still having the same maximum entropy model due to Proposition 1), i.e. PU is not confluent. The following knowledge base presented in [16] illustrates this. Example 4. Let R = {R1 , R2 } be the knowledge base with: R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i R2 : h(p(X, Y ) | q(W )) [0.3] , >i There are three interactions in R: Ia : R1 ← pX,Y → R2 Ib : R1 ← hp, qiX,W → R2 Ic : R1 ← hp, qiY,W → R2 Choosing first the interaction Ia and applying PU exhaustively yields the parametri- cally uniform knowledge base Ra with the following four conditionals: R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i Ra2 : h(p(X, Y ) | q(W )) [0.3] , X 6= Y i Ra3 : h(p(Y, Y ) | q(Y )) [0.3] , >i Ra4 : h(p(Y, Y ) | q(W )) [0.3] , Y 6= W i Choosing first the interaction Ib and applying PU exhaustively yields Rb with six con- ditionals: R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i Rb2 : h(p(Y, Y ) | q(Y )) [0.3] , >i Rb3 : h(p(X, Y ) | q(X)) [0.3] , X 6= Y i Rb4 : h(p(X, Y ) | q(Y )) [0.3] , X 6= Y i Rb5 : h(p(Y, Y ) | q(W )) [0.3] , W 6= Y i Rb6 : h(p(X, Y ) | q(W )) [0.3] , W 6= X ∧ W 6= Y ∧ X 6= Y i Choosing first the interaction Ic and applying PU exhaustively yields a knowledge base Rc also with six conditionals; in fact, Rc differs from Rb only by a renaming of variables. t u Thus, even when taking variable renamings into account, in Example 4, PU can transform R into two different parametrically uniform knowledge bases, Ra and Rb . Here, the choice of the interaction that gets removed first determines the solution, while in general, the splitting in different solutions may occur at any stage of the transfor- mation process. From a computational point of view, one is especially interested in small knowledge bases. Since it is not clear which particular choice of interaction re- moval will lead to a minimal knowledge base, such a minimal knowledge base can be obtained by selecting it from the set of all solutions. 5 Generation of all Solutions In principle, it is possible to enumerate the solutions in a simple way by branching out the search algorithm every time that there is more than one option which interaction to remove first. However, this might be not feasible even for small knowledge bases. It would also give no information about the number of unique solutions, i.e. solutions that differ in more than a variable naming. Knowledge bases obtained by PU whose conditionals differ only in variable naming are equivalent. A source for this ambiguity in the transformation process is that equality constraints are realised using substitutions. However, an equality constraint A = B can be realised by a substitution A/B as well by B/A if A and B are both variables. Definition 1 (pt-equivalent conditionals). Let R be a knowledge base, R ∈ R, and let σ = σn ◦ . . . ◦ σ1 and σ 0 = σm 0 ◦ . . . ◦ σ10 be substitutions obtained from applying two sequences of PU transformations to R. Then the conditionals σ(R) and σ 0 (R) are equivalent with respect to PU transformations (or just pt-equivalent) iff there is a variable renaming ρ such that ρ(σ(R)) = σ 0 (R). Note that this notion is relative to the root conditional R. For instance, in Example 4, R20 : h(p(X, X) | q(X)) [0.3] , >i R200 : h(p(Y, Y ) | q(Y )) [0.3] , >i originating from R2 with the substitutions W/X and Y /X respectively W/Y and X/Y are pt-equivalent as there is the variable renaming ρ = X/Y such that ρ(R20 ) = R200 . An algorithm to find the solutions has to make two choices during the process: R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i R2 : h(p(X, Y ) | q(W )) [0.3] , >i X/W h(p(X, Y ) | q(X)) [0.3] , >i h(p(X, Y ) | q(W )) [0.3] , W 6= Xi X/Y W/Y Rb2 : h(p(Y, Y ) | q(Y )) Rb3 : h(p(X, Y ) | q(X)) Rb4 : h(p(X, Y ) | q(Y )) h(p(X, Y ) | q(W )) [0.3] , >i [0.3] , X 6= Y i [0.3] , X 6= Y i [0.3] , W 6= X ∧ W 6= Y i X/Y Rb5 : h(p(Y, Y ) | q(W )) Rb6 : h(p(X, Y ) | q(W )) [0.3] , W 6= Y i [0.3] , W 6= X ∧ W 6= Y ∧ X 6= Y i Fig. 4. The auxiliary graph of the solution knowledge base Rb from Example 4. Q1 : What conditionals should be checked for interactions? Q2 : Which transformations should be executed on these conditionals ensuring that all solutions are generated? A naive approach might choose all conditionals and all transformations, leading to an algorithm that branches out extremely and that may generate a huge number of knowl- edge bases. The algorithm introduced in this paper uses an auxiliary graph to answer those two questions in a more sensible way. An auxiliary graph is essentially a repre- sentation for the set of knowledge bases that can be reached through the transformation process. It is a directed graph containing two types of nodes: conditional nodes repre- senting a single conditional, and substitution nodes representing a substitution acting on a conditional. The nodes are connected such that conditional nodes are connected to their respective interaction-removing substitution nodes, and substitution nodes are connected to the conditional nodes that are the result of applying said substitution to the parent conditional. Example 5. Fig. 4 is an auxiliary graph representing the solution knowledge base Rb from Example 4. It shows how such graphs look like: On the top level there are the conditionals of the original knowledge base (rectangles). Below these there are the interaction-removing substitutions σ (ellipses) connected to the conditional node R they apply to, and to the two resulting conditional nodes σ(R) and σ̄(R). This means that each substitution node has exactly one incoming and two leaving edges. The condition- als in Rb are precisely the six leaf nodes in the graph. t u Such an auxiliary graph can also be constructed for the whole transformation pro- cess behind PU. The algorithm starts with the empty graph and adds a conditional node for each conditional in the initial knowledge base. Then we successively pick one con- ditional node, compute the set of conditional nodes in the graph that it can possibly interact with, check for interactions with said nodes and add the corresponding sub- stitution nodes for the found interactions. When the substitution node gets added, we also have to connect its outgoing edges. At this point we use the equivalence between conditionals from Definition 1 to check whether a pt-equivalent conditional is already contained in the graph. If this is the fact, then it suffices to connect the substitution node to said conditional node, and we do not have to add a new conditional node to the graph. Example 6. Fig. 5 is the auxiliary graph corresponding to the knowledge base R from Example 4. In the first row, there are the two conditionals of the original knowledge base R, and the second row contains the substitution nodes corresponding to the three interactions Ia , Ib , Ic in R. The third row contains the six conditionals obtained by applying the corresponding interaction removing transformations. The fourth row con- tains the substitution nodes corresponding to the interactions among the conditionals in the third row. Note that three of the resulting conditionals in the fifth row have multiple incoming edges since, up to pt-equivalence, they can be generated in different ways. t u This operation effectively transforms the graph from a tree to a directed acyclic graph. This graph can now answer the question Q1 posed before: The substitution nodes denote exactly the substitutions that can be applied to its parent conditional node during the interaction removal process. In order to answer question Q2 , the auxiliary graph is reduced by identifying and removing redundancies caused by substitution nodes. We will give an exemplary sit- uation of removing redundancies introduced by the order of interaction removal for independent interactions on the same conditional: Let R ∈ R be a conditional that has two interactions in R with interaction removing substitutions σ1 , σ2 . Assume that those are independent, i.e. removing one interaction changes nothing about the other inter- action. Then the graph will contain both σ1 and σ2 as substitution nodes below R. As these are independent from each other, σ2 is also a substitution child node of σ1 (R) as well as σ¯1 (R) and vice versa . Thus, both substitution nodes σ1 and σ2 below R lead to the same conditionals below. This means that we can fuse the two substitution child nodes of R to one substitution node {σ1 , σ2 }. We can pick an arbitrary representative that will represent the substitution node and especially determine the edges. Remov- ing all such redundancies in a bottom-up manner yields the reduced auxiliary graph which is uniquely determined if a choice function for picking a representative for fused substitution nodes is given. Example 7. Fig. 6 shows the reduced graph to Example 4. Note how there is just one conditional node with more than one substitution child node, corresponding to R2 . t u The reduced graph can be used to determine which interaction-removing substitu- tions on a given conditional are sufficient for generating all solutions. Starting with the set M containing the conditional nodes in the first row of the graph (i.e., the set of con- ditionals in the original knowledge base), do the following: While there is a conditional node C in M that is not a leaf node, choose (non-deterministically) one of C’s child substitution nodes and replace C in M by the two child nodes of the chosen substitution node. Using the reduced graph in this way allows the generation of all solutions without getting multiple copies or variable renamed versions. Example 8. As there is only one conditional node in the reduced graph in Fig. 6 (i.e. R2 ), there is only one (non-deterministic) choice to be made. Thus, the graph represents h(p(X, Y ) | q(W )) [0.3] , >i h(p(U, U ) | q(V )) [0.2] , U 6= V i X/Y Y /W X/W h(p(Y, Y ) | q(W )) [0.3] , >i h(p(X, Y ) | q(W )) [0.3] , X 6= Y i h(p(X, Y ) | q(Y )) [0.3] , >i h(p(X, Y ) | q(X)) [0.3] , >i h(p(X, Y ) | q(W )) [0.3] , W 6= Xi h(p(X, Y ) | q(W )) [0.3] , W 6= Y i Y /W X/Y X/Y X/Y Y /W X/Y X/W h(p(Y, Y ) | q(Y )) [0.3] , >i h(p(X, Y ) | q(W )) [0.3] , W 6= X ∧ X 6= Y i h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ W 6= Xi h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ X 6= Y i Y /X Y /W X/W h(p(Y, Y ) | q(W )) [0.3] , W 6= Y i h(p(X, Y ) | q(Y )) [0.3] , X 6= Y i h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ X 6= Y ∧ W 6= Xi h(p(X, Y ) | q(X)) [0.3] , X 6= Y i Fig. 5. The auxiliary graph of Example 4 h(p(X, Y ) | q(W )) [0.3] , >i h(p(U, U ) | q(V )) [0.2] , U 6= V i X/Y X/W ; (Y /W ) h(p(X, Y ) | q(W )) [0.3] , X 6= Y i h(p(Y, Y ) | q(W )) [0.3] , >i h(p(X, Y ) | q(X)) [0.3] , >i h(p(X, Y ) | q(W )) [0.3] , W 6= Xi X/Y Y /W X/Y ; (Y /W ) h(p(X, Y ) | q(X)) [0.3] , X 6= Y i h(p(Y, Y ) | q(Y )) [0.3] , >i h(p(Y, Y ) | q(W )) [0.3] , W 6= Y i h(p(X, Y ) | q(W )) [0.3] , W 6= X ∧ X 6= Y i Y /W h(p(X, Y ) | q(Y )) [0.3] , X 6= Y i h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ X 6= Y ∧ W 6= Xi Fig. 6. Reduced graph of the auxiliary graph from Example 4 exactly the two parametrically uniform solutions Ra and Rb (cf. Example 4) which correspond to the leave nodes obtained by choosing either the left substitution child node X/Y or the right substitution child node X/W of R2 . t u 6 Applications and First Evaluation Results The system PU sys was successfully applied to many different examples, including all examples discussed in [10, 17, 8, 4]. As expected, the optimised generation of all solu- tions presented in Sec. 5 turned out to be much more efficient than an naive approach of simply executing all possible transformation sequences. For instance, for the knowl- edge base R (Example 4) exactly the two solutions were generated, compared to 28 knowledge bases in the naive approach. There are also examples, where the naive ap- proach generates many solutions (e.g. 96), while the optimised version yields a single unique solution [2]. For another knowledge base, all four (non-redundant) solutions were generated within 10 seconds, while the naive approach did not terminate within more than four hours. In [2], PU sys is tested and evaluated with respect to many dif- ferent knowledge bases, including a series of synthetically generated knowledge bases varying in the number of sorts, constants, predicates, predicate arities, and condition- als. Among these parameters, especially increasing the arity of predicates increased the number of conditionals in PU(R) due to an increased number of interactions in the original knowledge base R. For all knowledge bases considered in [10, 17, 8, 4], the in- crease from |R| to |PU(R)| was at most by a factor of 2.2, while for some synthetically generated knowledge bases there was a factor of 8.1. However, as pointed out in Sec. 2, crucial for maximum entropy computation is the relationship between |PU(R)| and | gnd(R)| (i.e. the number of admissible groundings of R); in the evaluation in [2], this relationship is often characterised by factors in the order of magnitude between 102 and 104 . As shown in [8], this factor between |PU(R)| and | gnd(R)| is responsible for the simplification obtained in computing ME (R) by determining ME (PU(R)) instead. 7 Conclusions and Further Work The software system PU sys implements the transformation system PU and provides an optimised generation of all possible solutions, thereby transforming any consistent FO-PCL knowledge R base to a semantically equivalent and parametrically uniform one and thus simplifying the maximum entropy model computation. An open question is how PU should be modified so that a confluent set of transformation rules is ob- tained. While the objective of PU is to simplify the ME model computation, thereby not taking inference into account, the techniques used in PU are related to techniques like shattering used in lifted inference [21, 5, 13] Our current work also includes the development of sophisticated methods of using the obtained maximum entropy model ME (R) for deriving the probability ME (R)(F ) for a given formula or conditional F ; here, techniques of lifted inference might be employed. References 1. E. Adams. The Logic of Conditionals. D. Reidel, Dordrecht, 1975. 2. A. Becker. Test, evaluation and assessment of a transformation system for knowledge bases with relational probabilistic conditionals. M.Sc. Thesis, Dept. of Computer Science, Univer- sity of Hagen, Germany, 2014. (in German). 3. C. Beierle, M. Höhnerbach, and M. Marto. Implementation of a transformation system for relational probabilistic knowledge bases simplifying the maximum entropy model computa- tion. In Proc. FLAIRS 2014, pages 486–489, Menlo Park, CA, 2014. AAAI Press. 4. C. Beierle and A. Krämer. Achieving parametric uniformity for knowledge bases in a rela- tional probabilistic conditional logic with maximum entropy semantics. Annals of Mathe- matics and Artificial Intelligence, 2014. (to appear). 5. R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In L. P. Kaelbling and A. Saffiotti, editors, IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pages 1319–1325. Professional Book Center, 2005. 6. R. Fagin, J. Halpern, and N. Megiddo. A logic for reasoning about probabilities. Information and Computation, 87:78–128, 1990. 7. R. Fagin and J. Y. Halpern. Reasoning about knowledge and probability. J. ACM, 41(2):340– 367, 1994. 8. M. Finthammer and C. Beierle. How to exploit parametric uniformity for maximum en- tropy reasoning in a relational probabilistic logic. In L. Fariñas del Cerro, A. Herzig, and J. Mengin, editors, JELIA, volume 7519 of LNAI, pages 189–201. Springer, 2012. 9. M. Finthammer and M. Thimm. An integrated development environment for probabilistic relational reasoning. Logic Journal of the IGPL, 20(5):831–871, 2012. 10. J. Fisseler. Learning and Modeling with Probabilistic Conditional Logic, volume 328 of Dissertations in Artificial Intelligence. IOS Press, Amsterdam, 2010. 11. L. Getoor and B. Taskar, editors. Introduction to Statistical Relational Learning. MIT Press, 2007. 12. J. Halpern. Reasoning About Uncertainty. MIT Press, 2005. 13. A. Jaimovich, O. Meshi, and N. Friedman. Template based inference in symmetric relational Markov random fields. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2007. 14. G. Kern-Isberner. Characterizing the principle of minimum cross-entropy within a conditional-logical framework. Artificial Intelligence, 98:169–208, 1998. 15. G. Kern-Isberner and T. Lukasiewicz. Combining probabilistic logic programming with the power of maximum entropy. Artificial Intelligence, Special Issue on Nonmonotonic Reason- ing, 157(1-2):139–202, 2004. 16. A. Krämer. Transformation rules for lifted inference in relational probabilistic logic knowl- edge bases. B.Sc. Thesis, Dept. of Computer Science, University of Hagen, Germany, 2011. 17. A. Krämer and C. Beierle. On lifted inference for a relational probabilistic conditional logic with maximum entropy semantics. In T. Lukasiewicz and A. Sali, editors, Foundations of Information and Knowledge Systems (FoIKS 2012), volume 7153 of LNCS, pages 224–243. Springer, 2012. 18. N. Nilsson. Probabilistic logic. Artificial Intelligence, 28:71–87, 1986. 19. D. Nute and C. Cross. Conditional logic. In D. Gabbay and F. Guenther, editors, Handbook of Philosophical Logic, volume 4, pages 1–98. Kluwer Academic Publishers, second edition, 2002. 20. J. Paris. The uncertain reasoner’s companion – A mathematical perspective. Cambridge University Press, 1994. 21. D. Poole. First-order probabilistic inference. In G. Gottlob and T. Walsh, editors, Proc. IJCAI-03, pages 985–991. Morgan Kaufmann, 2003. 22. M. Thimm, G. Kern-Isberner, and J. Fisseler. Relational probabilistic conditional reasoning at maximum entropy. In ECSQARU, volume 6717 of LNCS, pages 447–458. Springer, 2011.