Generation of Parametrically Uniform Knowledge Bases
  in a Relational Probabilistic Logic with Maximum
                  Entropy Semantics

                Christoph Beierle, Markus Höhnerbach, Marcus Marto

                  Dept. of Computer Science, University of Hagen, Germany


       Abstract. In a relational setting, the maximum entropy model of a set of proba-
       bilistic conditionals can be defined referring to the full set of ground instances of
       the conditionals. The logic FO-PCL uses the notion of parametric uniformity to
       ensure that the full grounding of the conditionals can be avoided, thereby greatly
       simplifying the maximum entropy model computation. In this paper, we describe
       a system that realises an approach transforming an FO-PCL knowledge base con-
       sisting of relational probabilistic conditionals into a knowledge base having the
       same maximum entropy model that is parametrically uniform. The implemented
       system provides different execution and evaluation modes, including the genera-
       tion of all possible solutions, and is available within an integrated development
       environment for relational probabilistic logic.


1   Introduction
There are different approaches addressing the challenge of modelling uncertainty by
combining logic with probabilities, see e.g. the foundational work reported in [18, 6, 7].
Probabilistic conditional logic considers conditional probabilities P (B|A) for condi-
tionals of the form if A then B with probability x [1, 19], formally denoted by (B|A)[x];
thus, P (B|A) denotes the probability of B given that we know A holds. While these ap-
proaches are often based upon propositional logic, there are also several developments
aiming at exploiting the expressiveness of relational or general first order logic in com-
bination with probabilities, e.g. Bayesian logic programs and Markov logic networks
(see [11] for an overview). Some of these approaches (see e.g. [15, 22]) employ the
principle of maximum entropy (ME) [20, 14]. Among these, there is also FO-PCL [10],
a first-order extension of propositional probabilistic conditional logic. An example of a
conditional in FO-PCL is “If V likes U, then U likes V with probability 0.9, for different
U, V”, formally denoted by h(likes(U, V ) | likes(V, U )) [0.9] , U 6= V i.
     The models of an FO-PCL knowledge base R consisting of a set of such condition-
als are probability distributions over possible worlds [12] satisfying each conditional
in R, where a possible world is a subset of the Herbrand base which is determined by
the set of all ground instances of conditionals satisfying the respective constraint. The
ME principle is used to select the uniquely determined model ME (R) having maxi-
mum entropy. The computation of ME (R) leads to an optimisation problem with one
optimisation parameter to be determined for each admissible ground instance of ev-
ery conditional in R. However, if the knowledge base is parametrically uniform, i.e.
all ground instances of a conditional share the same optimisation parameter value, for
each conditional in R just one optimisation parameter has to be determined [10]. Thus,
parametric uniformity significantly simplifies the task of computing ME (R) [8].
    In [17, 4], a set of of transformation rules PU is presented allowing to transform
any consistent knowledge base into a parametrically uniform knowledge base with the
same maximum entropy model. In this paper, we introduce the system PU sys imple-
menting PU and automatically generating PU(R) for any consistent R. This allows
for a simpler ME model computation by computing ME (PU(R)) instead of ME (R).
    This paper is an extended version of [3]. In particular, we expand the background
about FO-PCL by recalling the notions of inter-rule und intra-rule intercations, give a
more detailed overview on PU sys , and provide results obtained from an evaluation of
the system. After recalling the basics of FO-PCL and PU (Sec. 2), we present PU sys in
Sec. 3. We illustrate the reasons for multiple solutions of PU transformations (Sec. 4),
describe their optimised generation (Sec. 5), give some first application and evaluation
results (Sec. 6), and conclude and point out further work (Sec. 7).


2   Background: Interactions and PU Transformation Rules
An FO-PCL conditional h(ψ | φ) [ξ] , Ci consists of an antecedent φ and a consequent
ψ, both being quantifier-free first-order, non-equational formulas, a probability value
ξ ∈ [0, 1], and a constraint formula C with inquatations over the variables (and con-
stants) occurring in φ and ψ. As an illustration of a simple FO-PCL knowledge base we
consider the following:

Example 1 (Misanthrope). The misanthrope example (adapted from [10]) models the
friendship relations for a group of people. Usually, if a person V likes a person U , then
U also likes V . However, there is a misanthrope a, and the chances that a likes someone
is considerably low. The knowledge base RMI = {R1 , R2 } contains two conditionals:
           R1 : h(likes(U, V ) | likes(V, U )) [0.9] , U 6= V i
           R2 : h(likes(a, V )) [0.05] , V 6= ai                                            t
                                                                                            u

    A ground instance of a conditional satisfying the conditional’s constraint formula
like U 6= V is called admissible. A model of an FO-PCL knowledge base R is a joint
probability function P satisfying each conditional in R where P satisfies a conditional
h(ψ | φ) [ξ] , Ci iff P (ψg |φg ) = ξ for every admissible ground instance h(ψg | φg ) [ξ] , >i
of the conditional [10]. Note that a full grounding of RMI ignoring the constraint for-
mulas would immediately yield an inconsistency since there is no probability distribu-
tion P with P (likes(a, a)) = 0.05 and P (likes(a, a)|likes(a, a)) = 0.9. Parametric
uniformity [10] significantly simplifies the task of computing the maximum entropy
[20, 14] model ME (R) since it reduces the number of optimisation parameters to be
taken into account from the number of admissible ground instances to the number of
conditionals in R [8]. For details about the underlying optimisation problem and its
parameters determining the FO-PCL model ME (R), we refer to [10, 8].
    In [17], the reasons causing R to be not parametrically uniform are investigated in
detail. A central observation is that for any parametrically uniform R, the sets of ground
instances of atoms and pairs of different atoms occurring in the admissible ground in-
stances of two conditionals must be either disjoint or identical (otherwise there is an im-
balanced sharing [17, 4]). For instance, RMI from Ex. 1 is not parametrically uniform
since the sets of admissible ground atoms originating from likes(U, V ) of conditional
R1 and from likes(a, V ) of conditional R2 are neither equal nor disjoint. For a single
conditional it is required that each admissible ground instance of an atom p or of a pair
of different atoms p0 , p00 occurs the same number of times in the ground instances of
the conditional as any other admissible ground instance of the atom p or of the pair of
atoms p0 , p00 (otherwise there is an imbalanced use [17, 4]). The syntactic criterion of
inter-rule and intra-rule interactions [17] identifies these constellations by analysing
the syntactic structure of the conditionals in R; no grounding operation is required.
Inter-Rule Interactions An inter-rule interaction of type 1 between two conditionals
R1 and R2 with respect to P , regarding the variable V and the constant c, denoted
R1 ← hP iV,c → R2 is a situation
a) R1 = h(. . . P (. . . , c, . . .) . . .)[ξR1 ], CR1 i,
   R2 = h(. . . P (. . . , V, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= c, or
b) R1 = h(. . . P (. . . , U, . . .) . . .)[ξR1 ], CR1 i, CR1  U 6= c,
   R2 = h(. . . P (. . . , V, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= c .
where CR2 2 V 6= c denotes that from CR2 , it does not follow that V is not equal to c.
    An inter-rule interaction of type 2 between R1 and R2 with respect to P , regarding
the variables V and Z, denoted R1 ← hP iV,Z → R2 , is a situation with
a) R1 = h(. . . P (. . . , U, . . . , U, . . .) . . .)[ξR1 ], CR1 i,
   R2 = h(. . . P (. . . , V, . . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z, or
b) R1 = h(. . . P (. . . , U, . . . , W, . . .) . . .)[ξR1 ], CR1 i, CR1  U 6= W ,
   R2 = h(. . . P (. . . , V, . . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z.
    An inter-rule interaction of type 3 between R1 and R2 with respect to P and Q,
regarding the variables V and Z, denoted R1 ← hP, QiV,Z → R2 , involves two different
atoms P (. . .) and Q(. . .) and is a situation with
a) R1 = h(. . . P (. . . , U, . . .) . . . Q(. . . , U, . . .) . . .)[ξR1 ], CR1 i,
   R2 = h(. . . P (. . . , V, . . .) . . . Q(. . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z, or
b) R1 = h(. . . P (. . . , U, . . .) . . . Q(. . . , W, . . .) . . .)[ξR1 ], CR1 i, CR1  U 6= W ,
   R2 = h(. . . P (. . . , V, . . .) . . . Q(. . . , Z, . . .) . . .)[ξR2 ], CR2 i, and CR2 2 V 6= Z.
     The three types of inter-rule interactions capture exactly the different reasons for
imbalanced sharing [17]. E.g., for RMI the imbalanced sharing sketched above is iden-
tified by the inter-rule interaction of type 1 R2 ← hlikesiU,a → R1 .
     For each of the different types of interactions, there is a corresponding interaction
removing transformation rule in PU (cf. Figure 1). For instance, the transformation rule
(TE 1 ) removes an inter-rule interaction of type 1. The application of T E1 replaces a
conditional R with two new conditionals ν(σ(R)) and ν(σ̄(R)), where σ(R) is the
result of applying the variable substitution σ = {V /c} to R, and σ̄(R) is the result of
adding the constraint V 6= c to the constraint formula of R. The operator ν transforms
a conditional in constraint normal form, a normal form required for the recognition of
                              R ∪ {R1 , R2 }          R1 ← hP iV,c → R2 ,
              (TE 1 )
                        R ∪ {R1 } ∪ ν{σ(R2 ), σ(R2 )} σ = {V /c}

                              R ∪ {R1 , R2 }          R1 ← hP iV,Z → R2 ,
              (TE 2 )
                        R ∪ {R1 } ∪ ν{σ(R2 ), σ(R2 )} σ = {V /Z}

                              R ∪ {R1 , R2 }          R1 ← hP, QiV,Z → R2 ,
              (TE 3 )
                        R ∪ {R1 } ∪ ν{σ(R2 ), σ(R2 )} σ = {V /Z}

                                 R ∪ {R}               hQiV,c → R,
              (TA1 )
                            R ∪ ν{σ(R), σ(R)}          σ = {V /c}

                                 R ∪ {R}               hQiV,Z → R,
              (TA2 )
                            R ∪ ν{σ(R), σ(R)}          σ = {V /Z}

                                 R ∪ {R}               hQ, SiV,Z → R,
              (TA3 )
                            R ∪ ν{σ(R), σ(R)}          σ = {V /Z}


                             Fig. 1. Transformation rules PU [17]


interactions. Similarly, (TE 2 ) and (TE 3 ) remove inter-rule interactions of type 2 and
3, respectively.
Example 2 (Application of PU to RMI ). The knowledge base RMI of Ex. 1 is not
parametrically uniform since it contains two inter-rule interactions of type 1 denoted by
R2 ← hlikesiU,a → R1 and R2 ← hlikesiV,a → R1 . The application of T E1 for the
first interaction replaces R1 with
             R1 1 : h(likes(a, V ) | likes(V, a)) [0.9] , V 6= ai
             R1 2 : h(likes(U, V ) | likes(V, U )) [0.9] , U 6= V ∧ U 6= ai.
R1 1 is the result of the substitution σ = {U/a} applied to R1 and R1 2 is the result of
adding the constraint U 6= a to R1 . The atom likes(U, V ) of R1 that caused the first
interaction becomes likes(a, V ) in R1 1 and its set of ground atoms is now identical to
the set of ground atoms from likes(a, V ) of R2 . The added constraint U 6= a in R1 2
leads to disjoint sets of the ground atoms of the discussed atoms. The second interaction
between R2 and R1 2 is also removed by applying T E1 , thereby replacing R1 2 with
        R1 2 1 : h(likes(U, a)|likes(a, U ))[0.9], U 6= ai
        R1 2 2 : h(likes(U, V )|likes(V, U ))[0.9], U 6= V ∧ U 6= a ∧ V 6= ai.
The resulting knowledge base is PU(RMI ) = {R1 1 , R1 2 1 , R1 2 2 , R2 }.              t
                                                                                        u

Intra-Rule Interactions While inter-rule interactions involve two conditionals, intra-
rule interactions occur within a single conditional and capture the notion of imbalanced
use. Suppose R is a conditional containing atoms PR , QR with the predicate symbols
P and Q, respectively.
    An intra-rule interaction of type 1 in R with respect to Q, regarding the variable V
and the constant c, denoted hQiV,c → R, is a situation with PR = P (. . . , U, . . .), QR =
Q(. . . , V, . . .), U ∈
                       / vars(QR ), CR  U 6= V , CR  U 6= c, and CR 2 V 6= c .
    An intra-rule interaction of type 2 in R with respect to Q, regarding the variables
V and Z, denoted hQiV,Z → R, is a situation with PR = P (. . . , U, . . .), QR =
Q(. . . , V, . . . , Z . . .), U ∈
                                 / vars(QR ), CR  U 6= V , CR  U 6= Z, and CR 2 V 6=
Z.
    If SR with predicate symbol S is a further atom in R, an intra-rule interaction
of type 3 in R with respect to Q and S, regarding the variables V and Z, denoted
hQ, SiV,Z → R, is a situation with PR = P (. . . , U, . . .), QR = Q(. . . , V, . . .), SR =
S(. . . Z . . .), U ∈  / vars(QR ), U ∈  / vars(SR ), CR  U 6= V , CR  U 6= Z, and
CR 2 V 6= Z.
    Each type of intra-rule interaction can be removed by one of the three rule (TA1 ),
(TA2 ), (TA3 ) in PU (Figure 1).
Example 3 (Intra-rule interaction of type 1). Suppose that the signature for the knowl-
edge base RIA1 = {R1 } contains the constants D = {a, b, c}:
                       R1 : h(p(U ) | q(V )) [ξ] , U 6= V ∧ U 6= ai
There is an intra-rule interaction of type 1 denoted hqiV,a → R1 . The reason for this
interaction is that the number of occurrences of the ground atom q(a) in the admissible
ground instances of R1 is 2, but for q(b) (and also q(c)) only 1. Note that this imbalance
persists for any larger set of constants D. The application of (TA1 ) replaces R1 with
                 R1 1 : h(p(U ) | q(a)) [ξ] , U 6= ai
                 R1 2 : h(p(U ) | q(V )) [ξ] , U 6= V ∧ U 6= a ∧ V 6= ai
and the resulting knowledge base PU(RIA1 ) is parametrically uniform.                    t
                                                                                         u
    The importance of inter- and intra-rule interactions is that they fully capture the
reasons for a knowledge base to be not parametrically uniform. Correctness and com-
pleteness of the set of interaction removing transformation rules PU is given by the
following proposition:
Proposition 1. [17] Exhaustively applying PU to a knowledge base R yields a knowl-
edge base PU(R) such that R and PU(R) have the same maximum-entropy model
and PU(R) is parametrically uniform.
A proof of this proposition and further details of PU can be found in [17, 4]. As a
consequence of Proposition 1, we can obtain the maximum entropy model ME (R)
by computing ME (PU(R)), thereby reducing the number of optimisation parameters
to be considered from the number of admissible ground instances of R to the num-
ber of conditionals in PU(R). For instance, considering a set D of constants having
15 elements in Ex. 1, there are 224 admissible ground instances for RMI , but only 4
conditionals in PU(RMI ) (cf. Ex. 2).

3     Implementation of the Transformation System PU
In this section, we present an overview of the software system PU sys that implements
the transformation system PU. PU sys has been designed as a plugin for KReator1 [9],
 1
     Source code of KReator and PU sys can be found at http://kreator-ide.
     sourceforge.net/
   FO-PCL         Lexer/             Semantic Analysis
  Knowledge       Parser/        • type consistency
    Base           AST           • vars(C) ⊆ vars(φ ∪ ψ)

                                           Transformation
                                                                               FO-PCL
                  Object-        • constraint normal form
                                                                              Knowledge
                 Structure       • recognition of interactions
                                                                                Base
                                 • application of transformation rules

                     Fig. 2. Conceptual pipeline of the system PU sys .


                  Fig. 3. Interactive transformation mode applied to Ex. 1.


which is an integrated development environment for relational probabilistic logic. The
basic architecture of the implemented transformation system is illustrated in Figure 2.
The input knowledge base is parsed into an abstract syntax tree (AST) from which an
object structure is created. Thereby it is ensured that the types of variables and con-
stants in a conditional are conform with the signature of the knowledge base (type con-
sistency) and that the variables of the constraint formula are used in either the premise
or the conclusion of the conditional. The key operations of the transformation phase are
the transformation of conditionals in constraint normal form, the recognition of interac-
tions, and the application of the transformation rules. These operations operate directly
on the object structure. After the transformation phase, the output knowledge base is
created from the resulting object structure.
Transformation Modes PU sys ensures that all conditionals of the initial knowl-
edge base are transformed into constraint-normal form. If more than one interaction
is found, one of these interactions has to be selected for the application of the corre-
sponding transformation rule. Therefore, PU sys offers different transformation modes
for different rule application strategies; moreover, PU sys can be easily extended with
additional transformation modes. The Interactive mode allows to control, monitor and
trace single steps of a transformation process through a graphical user interface. All
interactions present in a knowledge base are listed in a short form notation. They can
be selected separately to apply the corresponding transformation rule or to view more
detailed information in a higher-level, declarative notation which originates from the
definition of the interactions. In the Automatic mode, an applicable transformation rule
is selected automatically and applied, and this step is repeated until a parametrically
uniform knowledge base is reached. The All Solutions transformation mode creates all
results that are obtainable by applying different orders of rule applications. Thereby,
it avoids the multiple generation of the same parametrically uniform knowledge base,
and moreover, it avoids the generation of knowledge bases that are just variants of each
other with respect to variable renamings. As this mode is of particular interest when in-
vestigating properties of PU related e.g. to minimal solutions or confluence properties,
this mode will be described in more detail in Sec. 5.
User Interface A transformation process can be started with KReator by executing a
KReator-Script [9]. All transformation parameters (e.g. transformation mode) can be set
either by using a graphical user interface or within the script itself. The transformation
systems also supports batch processing for multiple knowledge base files. A screenshot
of the Interactive transformation mode applied to RM I from Ex. 1 is shown in Figure 3.
The right table displays the conditionals of the knowledge base to be transformed and
is updated accordingly after a transformation rule has been applied. All interactions
that have been recognised are listed in a short form notation in the left table. They can
be selected to apply the corresponding transformation rule. The bottom pane displays
detailed information about the reasons why an interaction exists by tracing it back to
the declarative notation used in the definitions of the interactions.

4   Multiple Solutions
The application of different transformation rules form PU to a knowledge base R may
lead to different parametric uniform knowledge bases (though still having the same
maximum entropy model due to Proposition 1), i.e. PU is not confluent. The following
knowledge base presented in [16] illustrates this.
Example 4. Let R = {R1 , R2 } be the knowledge base with:
              R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i
              R2 : h(p(X, Y ) | q(W )) [0.3] , >i
There are three interactions in R:
              Ia : R1 ← pX,Y → R2
              Ib : R1 ← hp, qiX,W → R2
              Ic : R1 ← hp, qiY,W → R2
Choosing first the interaction Ia and applying PU exhaustively yields the parametri-
cally uniform knowledge base Ra with the following four conditionals:
              R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i
              Ra2 : h(p(X, Y ) | q(W )) [0.3] , X 6= Y i
              Ra3 : h(p(Y, Y ) | q(Y )) [0.3] , >i
              Ra4 : h(p(Y, Y ) | q(W )) [0.3] , Y 6= W i
Choosing first the interaction Ib and applying PU exhaustively yields Rb with six con-
ditionals:
              R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i
              Rb2 : h(p(Y, Y ) | q(Y )) [0.3] , >i
              Rb3 : h(p(X, Y ) | q(X)) [0.3] , X 6= Y i
              Rb4 : h(p(X, Y ) | q(Y )) [0.3] , X 6= Y i
              Rb5 : h(p(Y, Y ) | q(W )) [0.3] , W 6= Y i
              Rb6 : h(p(X, Y ) | q(W )) [0.3] , W 6= X ∧ W 6= Y ∧ X 6= Y i
Choosing first the interaction Ic and applying PU exhaustively yields a knowledge
base Rc also with six conditionals; in fact, Rc differs from Rb only by a renaming of
variables.                                                                         t
                                                                                   u

    Thus, even when taking variable renamings into account, in Example 4, PU can
transform R into two different parametrically uniform knowledge bases, Ra and Rb .
Here, the choice of the interaction that gets removed first determines the solution, while
in general, the splitting in different solutions may occur at any stage of the transfor-
mation process. From a computational point of view, one is especially interested in
small knowledge bases. Since it is not clear which particular choice of interaction re-
moval will lead to a minimal knowledge base, such a minimal knowledge base can be
obtained by selecting it from the set of all solutions.


5   Generation of all Solutions

In principle, it is possible to enumerate the solutions in a simple way by branching out
the search algorithm every time that there is more than one option which interaction
to remove first. However, this might be not feasible even for small knowledge bases. It
would also give no information about the number of unique solutions, i.e. solutions that
differ in more than a variable naming.
    Knowledge bases obtained by PU whose conditionals differ only in variable naming
are equivalent. A source for this ambiguity in the transformation process is that equality
constraints are realised using substitutions. However, an equality constraint A = B can
be realised by a substitution A/B as well by B/A if A and B are both variables.

Definition 1 (pt-equivalent conditionals). Let R be a knowledge base, R ∈ R, and
let σ = σn ◦ . . . ◦ σ1 and σ 0 = σm
                                   0
                                     ◦ . . . ◦ σ10 be substitutions obtained from applying
two sequences of PU transformations to R. Then the conditionals σ(R) and σ 0 (R)
are equivalent with respect to PU transformations (or just pt-equivalent) iff there is a
variable renaming ρ such that ρ(σ(R)) = σ 0 (R).

Note that this notion is relative to the root conditional R. For instance, in Example 4,
              R20 : h(p(X, X) | q(X)) [0.3] , >i
              R200 : h(p(Y, Y ) | q(Y )) [0.3] , >i
originating from R2 with the substitutions W/X and Y /X respectively W/Y and X/Y
are pt-equivalent as there is the variable renaming ρ = X/Y such that ρ(R20 ) = R200 .
    An algorithm to find the solutions has to make two choices during the process:
 R1 : h(p(U, U ) | q(V )) [0.2] , U 6= V i    R2 : h(p(X, Y ) | q(W )) [0.3] , >i


                                                              X/W


                                 h(p(X, Y ) | q(X)) [0.3] , >i   h(p(X, Y ) | q(W )) [0.3] , W 6= Xi


                                             X/Y                                W/Y


    Rb2 : h(p(Y, Y ) | q(Y ))      Rb3 : h(p(X, Y ) | q(X))           Rb4 : h(p(X, Y ) | q(Y ))    h(p(X, Y ) | q(W ))
           [0.3] , >i                     [0.3] , X 6= Y i                   [0.3] , X 6= Y i       [0.3] , W 6= X ∧ W 6= Y i


                                                                                                             X/Y


                                                                 Rb5 : h(p(Y, Y ) | q(W ))        Rb6 : h(p(X, Y ) | q(W ))
                                                                        [0.3] , W 6= Y i                 [0.3] , W 6= X ∧ W 6= Y ∧ X 6= Y i


            Fig. 4. The auxiliary graph of the solution knowledge base Rb from Example 4.


   Q1 : What conditionals should be checked for interactions?
   Q2 : Which transformations should be executed on these conditionals ensuring that
        all solutions are generated?

A naive approach might choose all conditionals and all transformations, leading to an
algorithm that branches out extremely and that may generate a huge number of knowl-
edge bases. The algorithm introduced in this paper uses an auxiliary graph to answer
those two questions in a more sensible way. An auxiliary graph is essentially a repre-
sentation for the set of knowledge bases that can be reached through the transformation
process. It is a directed graph containing two types of nodes: conditional nodes repre-
senting a single conditional, and substitution nodes representing a substitution acting
on a conditional. The nodes are connected such that conditional nodes are connected
to their respective interaction-removing substitution nodes, and substitution nodes are
connected to the conditional nodes that are the result of applying said substitution to
the parent conditional.

Example 5. Fig. 4 is an auxiliary graph representing the solution knowledge base Rb
from Example 4. It shows how such graphs look like: On the top level there are the
conditionals of the original knowledge base (rectangles). Below these there are the
interaction-removing substitutions σ (ellipses) connected to the conditional node R they
apply to, and to the two resulting conditional nodes σ(R) and σ̄(R). This means that
each substitution node has exactly one incoming and two leaving edges. The condition-
als in Rb are precisely the six leaf nodes in the graph.                               t
                                                                                       u

     Such an auxiliary graph can also be constructed for the whole transformation pro-
cess behind PU. The algorithm starts with the empty graph and adds a conditional node
for each conditional in the initial knowledge base. Then we successively pick one con-
ditional node, compute the set of conditional nodes in the graph that it can possibly
interact with, check for interactions with said nodes and add the corresponding sub-
stitution nodes for the found interactions. When the substitution node gets added, we
also have to connect its outgoing edges. At this point we use the equivalence between
conditionals from Definition 1 to check whether a pt-equivalent conditional is already
contained in the graph. If this is the fact, then it suffices to connect the substitution node
to said conditional node, and we do not have to add a new conditional node to the graph.
Example 6. Fig. 5 is the auxiliary graph corresponding to the knowledge base R from
Example 4. In the first row, there are the two conditionals of the original knowledge
base R, and the second row contains the substitution nodes corresponding to the three
interactions Ia , Ib , Ic in R. The third row contains the six conditionals obtained by
applying the corresponding interaction removing transformations. The fourth row con-
tains the substitution nodes corresponding to the interactions among the conditionals in
the third row. Note that three of the resulting conditionals in the fifth row have multiple
incoming edges since, up to pt-equivalence, they can be generated in different ways. t   u
    This operation effectively transforms the graph from a tree to a directed acyclic
graph. This graph can now answer the question Q1 posed before: The substitution nodes
denote exactly the substitutions that can be applied to its parent conditional node during
the interaction removal process.
    In order to answer question Q2 , the auxiliary graph is reduced by identifying and
removing redundancies caused by substitution nodes. We will give an exemplary sit-
uation of removing redundancies introduced by the order of interaction removal for
independent interactions on the same conditional: Let R ∈ R be a conditional that has
two interactions in R with interaction removing substitutions σ1 , σ2 . Assume that those
are independent, i.e. removing one interaction changes nothing about the other inter-
action. Then the graph will contain both σ1 and σ2 as substitution nodes below R. As
these are independent from each other, σ2 is also a substitution child node of σ1 (R) as
well as σ¯1 (R) and vice versa . Thus, both substitution nodes σ1 and σ2 below R lead
to the same conditionals below. This means that we can fuse the two substitution child
nodes of R to one substitution node {σ1 , σ2 }. We can pick an arbitrary representative
that will represent the substitution node and especially determine the edges. Remov-
ing all such redundancies in a bottom-up manner yields the reduced auxiliary graph
which is uniquely determined if a choice function for picking a representative for fused
substitution nodes is given.
Example 7. Fig. 6 shows the reduced graph to Example 4. Note how there is just one
conditional node with more than one substitution child node, corresponding to R2 . t
                                                                                   u
    The reduced graph can be used to determine which interaction-removing substitu-
tions on a given conditional are sufficient for generating all solutions. Starting with the
set M containing the conditional nodes in the first row of the graph (i.e., the set of con-
ditionals in the original knowledge base), do the following: While there is a conditional
node C in M that is not a leaf node, choose (non-deterministically) one of C’s child
substitution nodes and replace C in M by the two child nodes of the chosen substitution
node. Using the reduced graph in this way allows the generation of all solutions without
getting multiple copies or variable renamed versions.
Example 8. As there is only one conditional node in the reduced graph in Fig. 6 (i.e.
R2 ), there is only one (non-deterministic) choice to be made. Thus, the graph represents
                                                  h(p(X, Y ) | q(W )) [0.3] , >i         h(p(U, U ) | q(V )) [0.2] , U 6= V i


                                                 X/Y             Y /W           X/W


h(p(Y, Y ) | q(W )) [0.3] , >i    h(p(X, Y ) | q(W )) [0.3] , X 6= Y i        h(p(X, Y ) | q(Y )) [0.3] , >i        h(p(X, Y ) | q(X)) [0.3] , >i      h(p(X, Y ) | q(W )) [0.3] , W 6= Xi     h(p(X, Y ) | q(W )) [0.3] , W 6= Y i


                                                 Y /W                                     X/Y                                   X/Y
                                                                                                                                                              X/Y              Y /W                  X/Y               X/W

                                     h(p(Y, Y ) | q(Y )) [0.3] , >i      h(p(X, Y ) | q(W )) [0.3] , W 6= X ∧ X 6= Y i             h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ W 6= Xi        h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ X 6= Y i


                                                                                 Y /X           Y /W                                                                                                       X/W


                       h(p(Y, Y ) | q(W )) [0.3] , W 6= Y i       h(p(X, Y ) | q(Y )) [0.3] , X 6= Y i      h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ X 6= Y ∧ W 6= Xi                            h(p(X, Y ) | q(X)) [0.3] , X 6= Y i


                                                                                  Fig. 5. The auxiliary graph of Example 4

                                                              h(p(X, Y ) | q(W )) [0.3] , >i     h(p(U, U ) | q(V )) [0.2] , U 6= V i


                                                         X/Y                              X/W ; (Y /W )


     h(p(X, Y ) | q(W )) [0.3] , X 6= Y i    h(p(Y, Y ) | q(W )) [0.3] , >i        h(p(X, Y ) | q(X)) [0.3] , >i          h(p(X, Y ) | q(W )) [0.3] , W 6= Xi


                                                         X/Y                                    Y /W                                 X/Y ; (Y /W )


     h(p(X, Y ) | q(X)) [0.3] , X 6= Y i     h(p(Y, Y ) | q(Y )) [0.3] , >i       h(p(Y, Y ) | q(W )) [0.3] , W 6= Y i         h(p(X, Y ) | q(W )) [0.3] , W 6= X ∧ X 6= Y i


                                                                                                                                                      Y /W


                                                                                                               h(p(X, Y ) | q(Y )) [0.3] , X 6= Y i          h(p(X, Y ) | q(W )) [0.3] , W 6= Y ∧ X 6= Y ∧ W 6= Xi


                                                                  Fig. 6. Reduced graph of the auxiliary graph from Example 4
exactly the two parametrically uniform solutions Ra and Rb (cf. Example 4) which
correspond to the leave nodes obtained by choosing either the left substitution child
node X/Y or the right substitution child node X/W of R2 .                          t
                                                                                   u

6   Applications and First Evaluation Results
The system PU sys was successfully applied to many different examples, including all
examples discussed in [10, 17, 8, 4]. As expected, the optimised generation of all solu-
tions presented in Sec. 5 turned out to be much more efficient than an naive approach
of simply executing all possible transformation sequences. For instance, for the knowl-
edge base R (Example 4) exactly the two solutions were generated, compared to 28
knowledge bases in the naive approach. There are also examples, where the naive ap-
proach generates many solutions (e.g. 96), while the optimised version yields a single
unique solution [2]. For another knowledge base, all four (non-redundant) solutions
were generated within 10 seconds, while the naive approach did not terminate within
more than four hours. In [2], PU sys is tested and evaluated with respect to many dif-
ferent knowledge bases, including a series of synthetically generated knowledge bases
varying in the number of sorts, constants, predicates, predicate arities, and condition-
als. Among these parameters, especially increasing the arity of predicates increased the
number of conditionals in PU(R) due to an increased number of interactions in the
original knowledge base R. For all knowledge bases considered in [10, 17, 8, 4], the in-
crease from |R| to |PU(R)| was at most by a factor of 2.2, while for some synthetically
generated knowledge bases there was a factor of 8.1. However, as pointed out in Sec. 2,
crucial for maximum entropy computation is the relationship between |PU(R)| and
| gnd(R)| (i.e. the number of admissible groundings of R); in the evaluation in [2], this
relationship is often characterised by factors in the order of magnitude between 102 and
104 . As shown in [8], this factor between |PU(R)| and | gnd(R)| is responsible for the
simplification obtained in computing ME (R) by determining ME (PU(R)) instead.

7   Conclusions and Further Work
The software system PU sys implements the transformation system PU and provides
an optimised generation of all possible solutions, thereby transforming any consistent
FO-PCL knowledge R base to a semantically equivalent and parametrically uniform
one and thus simplifying the maximum entropy model computation. An open question
is how PU should be modified so that a confluent set of transformation rules is ob-
tained. While the objective of PU is to simplify the ME model computation, thereby
not taking inference into account, the techniques used in PU are related to techniques
like shattering used in lifted inference [21, 5, 13] Our current work also includes the
development of sophisticated methods of using the obtained maximum entropy model
ME (R) for deriving the probability ME (R)(F ) for a given formula or conditional F ;
here, techniques of lifted inference might be employed.

References
 1. E. Adams. The Logic of Conditionals. D. Reidel, Dordrecht, 1975.
 2. A. Becker. Test, evaluation and assessment of a transformation system for knowledge bases
    with relational probabilistic conditionals. M.Sc. Thesis, Dept. of Computer Science, Univer-
    sity of Hagen, Germany, 2014. (in German).
 3. C. Beierle, M. Höhnerbach, and M. Marto. Implementation of a transformation system for
    relational probabilistic knowledge bases simplifying the maximum entropy model computa-
    tion. In Proc. FLAIRS 2014, pages 486–489, Menlo Park, CA, 2014. AAAI Press.
 4. C. Beierle and A. Krämer. Achieving parametric uniformity for knowledge bases in a rela-
    tional probabilistic conditional logic with maximum entropy semantics. Annals of Mathe-
    matics and Artificial Intelligence, 2014. (to appear).
 5. R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In L. P.
    Kaelbling and A. Saffiotti, editors, IJCAI-05, Proceedings of the Nineteenth International
    Joint Conference on Artificial Intelligence, pages 1319–1325. Professional Book Center,
    2005.
 6. R. Fagin, J. Halpern, and N. Megiddo. A logic for reasoning about probabilities. Information
    and Computation, 87:78–128, 1990.
 7. R. Fagin and J. Y. Halpern. Reasoning about knowledge and probability. J. ACM, 41(2):340–
    367, 1994.
 8. M. Finthammer and C. Beierle. How to exploit parametric uniformity for maximum en-
    tropy reasoning in a relational probabilistic logic. In L. Fariñas del Cerro, A. Herzig, and
    J. Mengin, editors, JELIA, volume 7519 of LNAI, pages 189–201. Springer, 2012.
 9. M. Finthammer and M. Thimm. An integrated development environment for probabilistic
    relational reasoning. Logic Journal of the IGPL, 20(5):831–871, 2012.
10. J. Fisseler. Learning and Modeling with Probabilistic Conditional Logic, volume 328 of
    Dissertations in Artificial Intelligence. IOS Press, Amsterdam, 2010.
11. L. Getoor and B. Taskar, editors. Introduction to Statistical Relational Learning. MIT Press,
    2007.
12. J. Halpern. Reasoning About Uncertainty. MIT Press, 2005.
13. A. Jaimovich, O. Meshi, and N. Friedman. Template based inference in symmetric relational
    Markov random fields. In Proceedings of the Twenty-Third Conference on Uncertainty in
    Artificial Intelligence. AUAI Press, 2007.
14. G. Kern-Isberner. Characterizing the principle of minimum cross-entropy within a
    conditional-logical framework. Artificial Intelligence, 98:169–208, 1998.
15. G. Kern-Isberner and T. Lukasiewicz. Combining probabilistic logic programming with the
    power of maximum entropy. Artificial Intelligence, Special Issue on Nonmonotonic Reason-
    ing, 157(1-2):139–202, 2004.
16. A. Krämer. Transformation rules for lifted inference in relational probabilistic logic knowl-
    edge bases. B.Sc. Thesis, Dept. of Computer Science, University of Hagen, Germany, 2011.
17. A. Krämer and C. Beierle. On lifted inference for a relational probabilistic conditional logic
    with maximum entropy semantics. In T. Lukasiewicz and A. Sali, editors, Foundations of
    Information and Knowledge Systems (FoIKS 2012), volume 7153 of LNCS, pages 224–243.
    Springer, 2012.
18. N. Nilsson. Probabilistic logic. Artificial Intelligence, 28:71–87, 1986.
19. D. Nute and C. Cross. Conditional logic. In D. Gabbay and F. Guenther, editors, Handbook
    of Philosophical Logic, volume 4, pages 1–98. Kluwer Academic Publishers, second edition,
    2002.
20. J. Paris. The uncertain reasoner’s companion – A mathematical perspective. Cambridge
    University Press, 1994.
21. D. Poole. First-order probabilistic inference. In G. Gottlob and T. Walsh, editors, Proc.
    IJCAI-03, pages 985–991. Morgan Kaufmann, 2003.
22. M. Thimm, G. Kern-Isberner, and J. Fisseler. Relational probabilistic conditional reasoning
    at maximum entropy. In ECSQARU, volume 6717 of LNCS, pages 447–458. Springer, 2011.