Structure Learning in Bayesian Networks with Parent Divorcing

Ulrich von Waldow (waldow@in.tum.de)
Technische Universität München, Arcisstr. 21, 80333 Munich, Germany

Florian Röhrbein (florian.roehrbein@in.tum.de)
Technische Universität München, Arcisstr. 21, 80333 Munich, Germany

Abstract

Bayesian networks (BNs) are an essential tool for the modeling of cognitive processes. They represent probabilistic knowledge in an intuitive way and allow inferences to be drawn from current evidence and built-in hypotheses. In this paper, a structure learning scheme for BNs is examined that is based on so-called Child-friendly Parent Divorcing (CfPD). This algorithm groups together nodes with similar properties by adding a new node to the existing network. The updating of all affected probabilities is formulated as an optimization problem. The resulting procedure reduces the size of the conditional probability tables (CPTs) significantly and hence improves the efficiency of the network, making it suitable for the larger networks typically encountered in cognitive modelling.

Keywords: Bayesian network; learning; inference; adding nodes; probability computation

Introduction

A BN is a directed acyclic graph (DAG) with a probability distribution P, expressed as a CPT on each node or variable, respectively. We assume binary nodes, so each value is either true or false. The CPTs then have a size of 2^k, where k is the number of parents of a node. Let p be a node and let O be the set of parents of p. The aim is to reduce the size of the CPT of node p by splitting the set of parent nodes O into smaller sets O1, O2 with O1 ∪ O2 = O. To this end we introduce a new node x as a new parent of p and rewire the connections of the network, which results in a smaller CPT for node p. Of course, several rules and net topologies need to be taken into account. In addition, the probabilities of node p change and new probabilities for x must be computed. A small sketch of this bookkeeping follows below.
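To make the counting concrete, here is a minimal sketch (ours, not from the paper) of the CPT sizes for binary nodes; the function names and the example values m = 6, k = 2 are hypothetical.

    # CPT sizes for binary nodes, as used in the text: a node with m parents
    # has 2**m entries; after divorcing, p keeps k original parents plus the
    # new node x, and x takes over the remaining m - k parents.
    def entries_before(m: int) -> int:
        return 2 ** m

    def entries_after(m: int, k: int) -> int:
        return 2 ** (k + 1) + 2 ** (m - k)  # CPT of p plus CPT of x

    print(entries_before(6))    # 64
    print(entries_after(6, 2))  # 8 + 16 = 24

Whether the total actually shrinks depends on m and k, which is one reason the scanning step described below imposes minimum sizes on the sets involved.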
This paper describes the procedure of Child-friendly Parent Divorcing (Röhrbein, Eggert, & Körner, 2009) and the behavior as well as the variation of the considered BN. The findings are explained and illustrated in a small example. The effect of reducing the total number of CPT entries is shown in a simulation in which CfPD is run on five random BNs. Furthermore, a cognition scheme is presented that models a BN as a scenario for recognizing objects from evidence about their location and shape; the nodes of this network are arranged in different levels.

Child-Friendly Parent Divorcing

Standard Parent Divorcing

The underlying technique for the CfPD learning algorithm is based on a design tool explained in (Olesen, Kjaerluff, Jensen, & Jensen, 1989). In that paper, Parent Divorcing is defined as a technique for designing more efficient Bayesian models. By introducing a new node as a divisive node, the number of incoming edges of a selected node is reduced. This reduces the number of entries in the CPT of the selected node and therefore increases computational efficiency. Figure 1 shows a simple example of standard Parent Divorcing. On the left side, node p has a CPT of size 2^m. After inserting node x, node p in the resulting graph on the right-hand side has a CPT of size 2^(k+1) < 2^m (Neapolitan, 2003).

Figure 1: Simple example of Parent Divorcing according to (Olesen et al., 1989). The left side represents the initial graph. Inserting x as an intermediate node splits the parent nodes into two sets of nodes.

Modification of the BN

In general, the following statement holds for a BN: let k be the number of parents of a node x; then the size of the CPT of x is 2^k. Moreover, let n be the total number of nodes xi with i ∈ {1, ..., n}; if no node xi has more than k parents, the total network needs fewer than n · 2^k entries.

The Parent Divorcing described above is now modified in such a way that the resulting sets of parent nodes are not arbitrary but similar in some way, i.e., they have a feature in common. Hence the aim is to find a type of similarity between those nodes that should be divorced into a set of groups (henceforth called the to-be-divorced parents). Here the structure, or rather the connectivity, comes into consideration, precisely the number of common child nodes.

Let D = (V, E) be a DAG with V as the set of nodes and E as the set of directed edges. CfPD aims to find a set of nodes O = {o1, ..., om} with O ⊆ V that have at least one child node p1 ∈ V with p1 ∉ O in common. These nodes are then examined for another node p2 ∈ V with p2 ∉ O that has a set of parents Q with Q ⊂ O.

In summary: p1 has the set of parents O = {o1, ..., om} and p2 has the set of parents Q ⊂ O. Therefore O is the set of the to-be-divorced parents. Next, a node x is inserted into V, which divides the to-be-divorced parents into the two sets O1 = {o1, ..., om} \ Q and O2 = Q. Furthermore, Par(p1) = O1 ∪ {x} and Par(p2) = O2 hold. Figure 4 shows this example with O1 = {o1, o2} and O2 = {o3, o4}.

Functionality and Rules

The section above provides an introduction and a small example of the CfPD. This section explains the algorithm in detail. The algorithm is a combination of structure and parameter learning. In the following, let D = (V, E) be a DAG with V as the set of nodes (random variables), |V| = n, and E as the set of directed edges with |E| ≤ n(n−1)/2. In addition, let P be a joint probability distribution of the variables in V; then B = (D, P) is a BN (Neapolitan, 2003). Summing up, the algorithm can be grouped into five main steps:

1. Scanning the network, to find out whether CfPD can be executed.

2. Selecting a node that meets the expectations.

3. Adding a new node and updating the connections, which means adding and deleting some links.

4. Computing the parameters of the new node and the modified nodes.

5. Handling of already learned nodes.

These five steps are explained in detail in the following section; the divorce itself is pure set manipulation, as the small sketch below shows.
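A minimal sketch of that set manipulation, using the node names of the figure 4 example (the Python representation is our own, not the authors'):

    # To-be-divorced parents O = Par(p1) and Q = Par(p2), with Q a subset of
    # O as in the figure 4 example; the divorce splits O along Q.
    O = {"o1", "o2", "o3", "o4"}   # Par(p1)
    Q = {"o3", "o4"}               # Par(p2)

    O1 = O - Q                     # stays directly connected to p1
    O2 = Q                         # rewired through the new node x

    print(sorted(O1))              # ['o1', 'o2'] -> new Par(p1) = O1 | {'x'}
    print(sorted(O2))              # ['o3', 'o4'] -> Par(x) = O2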
DoChildFriendlyParentDivorcing(bnet)
0:  // n := number of nodes of the Bayesian network bnet
1:  if (scanning(bnet) = true){
2:    forall i ∈ n{
3:      p1 := max(Par(i))
4:      if (∀ j ∈ Child(Par(p1)) \ p1 : |Par(j) ∩ Par(p1)| < 2){
5:        i := i \ p1 → return to line 3
6:      }
7:      elseif (∀ j ∈ Child(Par(p1)) : |max(Par(j))| ≥ 2){
8:        p2 := rand(max(Par(j)))
9:      }
10:     else {p2 := max(Par(j))}
11:   }
12:   O := Par(p1)
13:   O′ := Par(p2)
14:   O2 := O ∩ O′
15:   O1 := O \ O2
16:   delete(edges) : O2 → p1
17:   insert(x) : O2 → x ∧ x → p1
18:   computing(P(x)) ∧ computing(P(p1))
19: }
20: else {Child-friendly Parent Divorcing is not possible}

Figure 2: Pseudocode of Child-friendly Parent Divorcing.

scanning(bnet)
1:  forall (i ∈ n){
2:    if (∃ i : |Par(i)| ≥ 3 ∧ ∃ j ∈ Child(Par(i)), j ≠ i : |Par(i) ∩ Par(j)| ≥ 2){
3:      return true
4:    }
5:    else {return false}
6:  }

Figure 3: Pseudocode of the scanning step.

Scanning. This step scans the graph D to check whether it satisfies the expectations, so that CfPD can be used. Figure 3 shows the pseudocode of this step. The following expectations must be met: (1) There must be a node p1 ∈ V that has at least three parent nodes (line 2), otherwise a divorce will be inefficient. Assume Par(p1) = O ⊆ V with |O| = m and m ≥ 3. (2) There must be a set of nodes O′ ⊆ O with |O′| = k and k ≥ 2 that have another child node p2 ∈ V in common. (3) If p1 is the only node that the to-be-divorced parents have in common, the algorithm should not be run, because the required feature of having another child node in common is not satisfied. If one of these points is violated, CfPD is not possible, or, in the case of point (3), should not be executed.

Selecting. If the scanning step returns true, there may be several nodes that meet the requirements. The pseudocode in figure 2 handles these conditions. Regarding expectation (1) of the scanning step, we are looking for a node whose number of parents is above a certain threshold t ≥ 3. If more than one node {p1, ..., pk} comes into question because its number of parents is greater than or equal to the threshold (thus |Par(pi)| ≥ t with i ∈ {1, ..., k}), we select the node with the maximum number of parents, hence max{|Par(pi)|} (line 3). This is because that node has the largest CPT, which is what we are trying to reduce. If two or more nodes have the same maximum number of parent nodes, we call this set M, so ∀ pm ∈ M : |Par(pm)| = max{|Par(pi)|}. In that case the algorithm chooses the node for which the intersection of the parent nodes of another node pj ≠ pm with the parent nodes of pm is maximized, hence max{|Par(pm) ∩ Par(pj)|} with pm ∈ M. Again, several nodes may fulfil these conditions; the algorithm then chooses randomly (line 7). We could go still deeper in selecting the best-fitting node, but there may always be more than one node with the same features, so we decided to choose randomly at that point.

After selecting the node that will be used, a second decision has to be made. Say we select p1 as the node whose parents are the to-be-divorced parents (set O). Then we have to choose a second node pj ≠ p1 with at least two parents in the set of the to-be-divorced parents: |Par(pj) ∩ O| ≥ 2. If there is none except p1, we decide not to split the set O. Otherwise we choose the node pj with the maximal |Par(pj)| among all pj ≠ p1. Thus at least two of the to-be-divorced parent nodes have a child other than p1 in common. Of course, more than one node pj may meet these expectations; in that case the algorithm again selects randomly.

Adding. The third step is adding a divorcing node, named x, and reordering the connections, which means adding new directed edges into x and one outgoing edge from x. This is done in such a way that the to-be-divorced parents are split into two sets. Assume we chose p1 and p2 in the step above. Then Par(p1) = {o1, ..., om} = O is the set of the to-be-divorced parents and Par(p2) = O′. The intersection of O and O′ determines the splitting of the set O; a runnable sketch of these three steps is given below.
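The following runnable sketch condenses the scanning, selecting and adding steps of Figures 2 and 3. It is our simplification, not the authors' implementation: the network is a plain dict mapping each node to its parent set, and ties are broken at random as in the paper.

    import random

    def scan_and_select(parents, threshold=3):
        """Scanning + selecting: return (p1, p2) if CfPD is applicable, else None."""
        # derive Child(.) from the parent sets
        children = {}
        for v, ps in parents.items():
            for u in ps:
                children.setdefault(u, set()).add(v)
        # expectation (1): nodes with at least `threshold` parents, largest CPT first
        candidates = sorted((v for v in parents if len(parents[v]) >= threshold),
                            key=lambda v: len(parents[v]), reverse=True)
        for p1 in candidates:
            # expectations (2)/(3): another child j sharing >= 2 of p1's parents
            overlap = {}
            for o in parents[p1]:
                for j in children.get(o, ()):
                    if j != p1:
                        overlap[j] = len(parents[j] & parents[p1])
            eligible = [j for j, k in overlap.items() if k >= 2]
            if eligible:
                best = max(overlap[j] for j in eligible)
                return p1, random.choice([j for j in eligible if overlap[j] == best])
        return None  # CfPD is not possible

    def divorce(parents, p1, p2, x="x"):
        """Adding: insert x so that Par(x) = O2 and Par(p1) = O1 | {x}."""
        o2 = parents[p1] & parents[p2]           # O2 = O intersect O'
        parents[x] = set(o2)                     # insert edges O2 -> x
        parents[p1] = (parents[p1] - o2) | {x}   # delete O2 -> p1, add x -> p1
        return parents

    net = {"o1": set(), "o2": set(), "o3": set(), "o4": set(),
           "p1": {"o1", "o2", "o3", "o4"}, "p2": {"o3", "o4"}}
    pair = scan_and_select(net)
    if pair:
        divorce(net, *pair)
    print(net["p1"], net["x"])   # {'o1', 'o2', 'x'} and {'o3', 'o4'}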
We split the set O into two parts O1 and O2 such that O1 = {o1, ..., ok} = O \ (O ∩ O′) and O2 = {ok+1, ..., om} = O ∩ O′. Figure 4 shows an example with m = 4.

Figure 4: A simple example of Child-friendly Parent Divorcing. (a) initial graph; (b) p1 ⇒ getting O; (c) p2 ⇒ getting O′; (d) inserting x and divorcing O ⇒ getting O1 and O2; (e) resulting graph.

Computing. By adding a new node to a BN, some changes must be made to the entries of the CPTs: the CPT of node x must be set up and the CPT of p1 must be adjusted. In the example in figure 4a, p1 has 2^4 = 16 entries in its CPT, twice as many as in 4e, where the CPT of p1 has 2^3 = 8 entries. However, CfPD should not change any probabilities; more precisely, it should not change the joint probability distribution P of B. Referring to the small example in figure 4, we compute the joint probability distribution of p1 in the resulting graph. Let A be the initial graph and B the resulting graph after adding the new node. Then the equation

PA(p1, p2, o1, o2, o3, o4) = PB(p1, p2, x, o1, o2, o3, o4)   (1)

must hold. We can now use the chain rule (Neapolitan, 2003) to compute the joint probability distributions:

PA(p1, p2, o1, o2, o3, o4) = P(p1 | o1, o2, o3, o4) · P(p2 | o3, o4) · P(o1) · P(o2) · P(o3) · P(o4)

PB(p1, p2, x, o1, o2, o3, o4) = Σ_x P(p1 | x, o1, o2) · P(x | o3, o4) · P(p2 | o3, o4) · P(o1) · P(o2) · P(o3) · P(o4)

Inserting these two equations into (1) results in:

P(p1 | o1, o2, o3, o4) = Σ_x P(p1 | x, o1, o2) · P(x | o3, o4)
  = P(p1 | x = T, o1, o2) · P(x = T | o3, o4) + P(p1 | x = F, o1, o2) · P(x = F | o3, o4)

This equation leads us to the following general formula:

P(p1 | o1, ..., om) = P(p1 | x = T, o1, ..., ok) · P(x = T | ok+1, ..., om) + P(p1 | x = F, o1, ..., ok) · P(x = F | ok+1, ..., om)   (2)

Additionally, we can fix the conditional probability of p1 with a little trick, by assuming a relationship. The motivation, according to (Röhrbein et al., 2009), is that the link between x and p1 reflects a kind of taxonomic relationship: variable x represents a subclass of p1, and therefore P(p1 = T) is 1 whenever the newly-inserted variable x is true. Thus

P(p1 = T | x = T, o1, ..., ok) = 1   (3)
P(p1 = F | x = T, o1, ..., ok) = 0   (4)

Using equation (3) in (2) leads to:

P(p1 | o1, ..., om) = 1 − [1 − P(p1 | x = F, o1, ..., ok)] · [1 − P(x = T | ok+1, ..., om)]   (5)

Equations (2) and (5) will subsequently be implemented as an optimization problem to ensure that the probability of occurrence of node p1 in the resulting graph is the same as in the initial graph; a small numerical check of the two equations follows below.
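Equation (5) is simply (2) with the taxonomic constraint (3) substituted in. The brute-force check below (our sketch; all probabilities are made up) confirms the algebra on the figure 4 example.

    import itertools
    import random

    random.seed(0)
    states = [True, False]
    # hypothetical CPTs after the divorce: P(x=T | o3, o4) and P(p1=T | x, o1, o2),
    # with the taxonomic constraint (3): P(p1=T | x=T, o1, o2) = 1
    p_x = {s: random.random() for s in itertools.product(states, repeat=2)}
    p_p1 = {(x,) + s: (1.0 if x else random.random())
            for x in states for s in itertools.product(states, repeat=2)}

    for o1, o2, o3, o4 in itertools.product(states, repeat=4):
        px = p_x[(o3, o4)]
        # equation (2): marginalize the divorcing node x out
        eq2 = p_p1[(True, o1, o2)] * px + p_p1[(False, o1, o2)] * (1.0 - px)
        # equation (5): the closed form under the taxonomic constraint
        eq5 = 1.0 - (1.0 - p_p1[(False, o1, o2)]) * (1.0 - px)
        assert abs(eq2 - eq5) < 1e-12
    print("equations (2) and (5) agree")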
Handling. If new knowledge about a scenario or a BN emerges, a new node and new links must be inserted. It is therefore essential for the algorithm to specify which node is a divorcing node and to set the connections appropriately. An example is shown for the graph in figure 4. Assume there is new knowledge about p1 and p2, expressed in a variable om+1. In the initial case, the new node is inserted and links are set to p1 and p2. However, if the new knowledge emerges after divorcing the parent nodes (i.e. after inserting x), the algorithm must identify the new knowledge om+1 as part of the to-be-divorced set and consequently set the links to x and p2.

Scenarios

There are several net topologies that need to be examined. As a result, there are different scenarios for reordering and inserting new nodes, depending on the net topology. Figure 5 shows four different types of scenarios (topologies).

a) In this case it would not be efficient to introduce a new node x, because the size of the set of parents that can be divorced is 1 (i.e. ok+1 is the only parent node of the set of the to-be-divorced parents that has two child nodes in common). The insertion of a new node x would not reduce the size of the CPT of node p1; thus a divorce is redundant. The algorithm only divides a set with at least two considered nodes.

b) This case describes the classical case of CfPD. The nodes p1 and p2 have a set of parent nodes in common: ok+1, ..., om. The algorithm selects the node with the highest number of incoming edges, i.e. the node with the most parents; if the numbers are equal, it chooses randomly. The node x is inserted and divides the two sets of parent nodes o1, ..., ok and ok+1, ..., om.

c) Here we have a total connection, meaning that every node o1, ..., om is connected to both p1 and p2. The algorithm inserts a node x that separates each node oi from the others. In this case the size of the CPT is not minimized, but the sum of entries is reduced: initially the sum of all entries is m + 2^m + 2^m, and afterwards it is m + 2^m + 2.

d) This case shows a scenario where a node p1 has parent nodes in common with more than one other node. Here the algorithm can be processed twice and hence inserts more than one dividing node.

Figure 5: Different topologies that can occur in a BN. The left side shows the initial configuration, the right side the result after the CfPD. (a) not efficient; (b) classical; (c) total connection; (d) more iterations.

Value computation as an optimization problem

In this section, the computation of the probabilities of the changed and newly-inserted nodes is formulated as an optimization problem. The aim is to minimize the error between the absolute probability of occurrence P(p1 = T | o1, ..., om) in the initial graph (Pbefore) and the absolute probability of occurrence P(p1 = T | x, o1, ..., ok) in the net resulting from the CfPD (Pafter).

Initial values. Let us define Pbefore first. Assume that in the initial BN we have chosen p1 in step 2 of the algorithm. Then m is the number of parents of p1 and Par(p1) = {o1, ..., om}. Pbefore represents the absolute probability of the event P(p1 = T):

Pbefore = Σ_{o1} ... Σ_{om} P(o1) · ... · P(om) · P(p1 = T | o1, ..., om)   (6)

Now let us define Pafter. Assume we introduced a new node x in the resulting BN; this node splits the original set of parents into {o1, ..., ok} and {ok+1, ..., om}. Pafter can then be calculated as follows:

Pafter = Σ_{o1} ... Σ_{ok} Σ_x P(o1) · ... · P(ok) · P(x) · P(p1 = T | o1, ..., ok, x)   (7)

Both sums can be evaluated by enumeration for small networks, as sketched below.
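For small m, both quantities can be computed by enumerating the parent states. The sketch below (ours; all numbers are made up) evaluates (6) and (7) for m = 4 parents split into k = 2 and m − k = 2.

    import itertools
    import math
    import random

    random.seed(1)
    T, F = True, False
    states = [T, F]
    m, k = 4, 2
    prior = [random.random() for _ in range(m)]   # P(o_i = T)
    # hypothetical CPTs: the old P(p1=T | o1..om), the new P(x=T | o_{k+1}..o_m)
    # and the new P(p1=T | o1..ok, x)
    cpt_old = {s: random.random() for s in itertools.product(states, repeat=m)}
    cpt_x = {s: random.random() for s in itertools.product(states, repeat=m - k)}
    cpt_new = {s: random.random() for s in itertools.product(states, repeat=k + 1)}

    def w(s, idx):
        """prod_i P(o_i = s_i) over the parent indices in idx"""
        return math.prod(prior[i] if b else 1 - prior[i] for b, i in zip(s, idx))

    # equation (6): P_before
    p_before = sum(cpt_old[s] * w(s, range(m))
                   for s in itertools.product(states, repeat=m))

    # marginal P(x = T), needed for equation (7)
    p_xT = sum(cpt_x[s] * w(s, range(k, m))
               for s in itertools.product(states, repeat=m - k))

    # equation (7): P_after, with the sum over x folded into the line
    p_after = sum((cpt_new[s + (T,)] * p_xT + cpt_new[s + (F,)] * (1 - p_xT)) * w(s, range(k))
                  for s in itertools.product(states, repeat=k))

    print(p_before, p_after)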
The optimization problem. Now we can formulate the optimization problem as a cost-minimizing function, where the cost is the difference between Pbefore and Pafter. There are two sets of decision variables: the entries P(p1 = T | o1, ..., ok, x) and the entries P(x = T | ok+1, ..., om). The optimization problem for the first case, where no correlation between p1 and x is assumed (see equation (2) above), is as follows:

1: minimize Pbefore − Pafter
subject to
2: P(x = F | ok+1, ..., om) = 1 − P(x = T | ok+1, ..., om)
3: Pbefore − Pafter ≥ 0
4: P(p1 = T | o1, ..., om) ≤ Σ_x P(p1 = T | o1, ..., ok, x) · P(x | ok+1, ..., om)
5: 0 ≤ P(x = T | ok+1, ..., om) ≤ 1
6: 0 ≤ P(p1 = T | o1, ..., ok, x) ≤ 1   (8)

where constraints 2, 4, 5 and 6 hold for all combinations of the parent states. Taking into account that there is a correlation between p1 and x, we have to replace line 6 of the optimization problem above, according to (3), by:

7: 0 ≤ P(p1 = T | o1, ..., ok, x = F) ≤ 1
8: P(p1 = T | o1, ..., ok, x = T) = 1   (9)

Line 1 represents the objective function: the aim is to minimize the difference between Pbefore and Pafter, so we have a minimization problem. Lines 2-8 represent the constraints that must be fulfilled. Line 2 assigns the values of the probability P(x = F | ok+1, ..., om) for all combinations of {ok+1, ..., om}. In line 3 we ensure that the new probability Pafter is not larger than Pbefore: in case of an error, i.e. Pbefore ≠ Pafter, the absolute probability of occurrence of p1 after the CfPD is smaller than before; in other words, the occurrence is not overestimated. This constraint was chosen because it intuitively seems more reliable to underestimate the occurrence of a variable than to underestimate its non-occurrence. Sometimes, though, it seems more reliable to overestimate the occurrence of a variable, e.g. when a variable describes the failure of a part of a system. In that case, the user has to decide whether to use Pbefore or Pafter as minuend or subtrahend in line 3 and, accordingly, whether to use ≤ for underestimation or ≥ otherwise (see line 4). The fourth line couples the recombined probability Σ_x P(p1 = T | o1, ..., ok, x) · P(x | ok+1, ..., om) to the corresponding entry of the initial CPT of p1. The last two lines ensure the non-negativity of the decision variables and that their values are not larger than 1. Assuming the correlation between x and p1, the constraint for P(p1 | o1, ..., ok, x) is fixed by P(p1 = T | o1, ..., ok, x = T) = 1 according to (3); hence line 6 of (8) is replaced by lines 7 and 8 of (9). A simplified sketch of this program with an off-the-shelf solver follows below.
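As an illustration, a simplified version of problem (8) can be set up with an off-the-shelf solver; the sketch below uses scipy's SLSQP, which is our substitution for the FICO Xpress solver the authors used, on the figure 4 example (m = 4, k = 2). Constraint 2 is implicit because P(x = F | ·) is stored as 1 − P(x = T | ·), the variable bounds cover constraints 5 and 6, and the pointwise constraint 4 is omitted for brevity. All probabilities are random stand-ins.

    import itertools
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    S = list(itertools.product([1, 0], repeat=2))          # joint states of a parent pair
    prior = rng.random(4)                                  # P(o_i = T), i = 1..4
    cpt_old = {s + t: rng.random() for s in S for t in S}  # P(p1=T | o1..o4)

    def w(s, idx):                                         # prod_i P(o_i = s_i)
        return float(np.prod([prior[i] if b else 1 - prior[i] for b, i in zip(s, idx)]))

    def unpack(z):
        px = dict(zip(S, z[:4]))                           # P(x=T | o3, o4)
        pp1 = {s + (x,): z[4 + 2 * i + x]                  # P(p1=T | o1, o2, x)
               for i, s in enumerate(S) for x in (0, 1)}
        return px, pp1

    p_before = sum(cpt_old[s + t] * w(s, (0, 1)) * w(t, (2, 3)) for s in S for t in S)

    def p_after(z):
        px, pp1 = unpack(z)
        px_marg = sum(px[t] * w(t, (2, 3)) for t in S)     # marginal P(x = T)
        return sum((pp1[s + (1,)] * px_marg + pp1[s + (0,)] * (1 - px_marg)) * w(s, (0, 1))
                   for s in S)

    def gap(z):                                            # objective, also constraint 3
        return p_before - p_after(z)

    res = minimize(gap, x0=np.full(12, 0.5), bounds=[(0.0, 1.0)] * 12, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": gap}])
    print(res.fun)   # P_before - P_after at the optimum, >= 0

The correlated variant (9) would additionally pin the entries P(p1 = T | o1, o2, x = T) to 1, e.g. by narrowing the corresponding bounds to (1, 1).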
We took the BN B = (G, P) from figure 4a with its JPD P and Pbefore = P(p1 | o1, o2) = 0.6202 and performed the CfPD, which yields the network B′ = (G′, P′) in 4e. We calculated the JPTs of p1 and x using the optimization approach on the one hand and the EM-algorithm with 10 samples on the other. Now we can calculate the values of Popt(p1 | o1, o2) and PEM10(p1 | o1, o2) as well as the JPD P′opt of net B′ for the optimization approach and the JPD P′EM10 for the EM-algorithm, respectively. Using the Bayes Net Toolbox for Matlab we obtain the following results: Popt(p1 | o1, o2) = 0.6202, PEM10(p1 | o1, o2) = 0.7554, P′opt = 0.9915 and P′EM10 = 0.8620. As we can see, the optimization approach yields a result that does not change the probability of occurrence of p1, and the JPD shows only a very small deviation. The result of the EM-algorithm with sample size 10, in contrast, deviates by more than 13% in the absolute probability of p1, and the JPD of the complete net deviates by almost 14%. We can obtain good results with the EM-algorithm as well by increasing the sample size: e.g. we obtain PEM50(p1 | o1, o2) = 0.6342 and P′EM50 = 0.9534 for a database with 50 samples. However, even if the sample size is increased further, the results for the JPD do not get better than with the optimization approach.

Training Data

We created five different, randomly-connected BNs with 30 nodes and 250 edges. The CfPD algorithm was then executed 10 times in succession on each of these networks. Table 1 shows the total number of CPT entries over the first five iterations; figure 6 plots the decreasing trend over all 10 iterations.

iteration   BNet1    BNet2     BNet3    BNet4    BNet5
0 (start)   339243   1138913   311404   829771   285098
1           210347   93409     181612   313803   156202
2           79405    78081     116844   184907   125514
3           63597    62273     85164    120139   92780
4           47597    54625     69836    88011    76972
5           31597    46451     53486    55755    60654

Table 1: Number of CPT entries of the five random BNs. The first row (start) gives the total number of entries before executing the algorithm.

Figure 6: Processing the CfPD algorithm on 5 randomly-connected BNs with 30 nodes and 250 links.

Already in the first iteration we can observe a significant reduction in size. This is evident because the node with the highest number of parents, and hence with the largest CPT, is selected and reduced. For example, the chosen node in BNet5 has 19 parents and therefore 2^19 = 524288 CPT entries. After executing CfPD on this net, the chosen node has 8 parents and the newly-inserted node has 12 parents, so the total number of entries is reduced by 2^19 − (2^8 + 2^12) = 519936 in this single step.

After the 10 iterations, BNet5 has 40 nodes and only 14666 CPT entries in the resulting graph, a small fraction of the initial count. Taking BNet5 as an example again, the algorithm can be processed 84 times on this network, which leads to a network with 114 nodes and 760 CPT entries. The gain per iteration thus decreases. The number of edges in a BN is restricted by n(n−1)/2. On the one hand, a highly-connected network with many edges offers the opportunity to execute CfPD several times, leading to a large reduction in the number of CPT entries. On the other hand, in a sparsely-connected network with only a few edges, the algorithm can be executed only a limited number of times and the reduction in CPT entries is correspondingly smaller. Figure 7 shows the difference over the first five iterations between BNs with 30 nodes and 100 edges on the one side and 350 edges on the other. The improvement in the total number of CPT entries is therefore very high at the beginning but flattens as the number of iterations grows. If we look at BNet2 from figure 6, the number of CPT entries after 14 iterations is 8255, compared to 412167 entries at the start, an improvement of over 98%. We therefore have to weigh the improvement from reducing the number of CPT entries against the cost of adding a new node to the net.

Figure 7: Decreasing trend of the number of CPT entries after the first 5 iterations for two different random BNs. Left: BN with 30 nodes and 100 edges. Right: BN with 30 nodes and 350 edges.

The simple entry bookkeeping behind these figures is sketched below.
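A small sketch (ours) with the BNet5 numbers quoted above; the helper name is hypothetical.

    # Total CPT entries of a network, given each node's parent count.
    def total_entries(parent_counts):
        return sum(2 ** k for k in parent_counts)

    # BNet5 before the first divorce: one 19-parent node dominates the count.
    print(total_entries([19]))        # 524288
    # After one divorce: the node keeps 8 parents, the new node x gets 12.
    print(total_entries([8, 12]))     # 256 + 4096 = 4352
    print(total_entries([19]) - total_entries([8, 12]))   # 519936 entries saved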
Cognition

We have shown that grouping nodes with a common feature by introducing a new node can reduce the number of CPT entries significantly. We also pointed out that a high number of iterations reduces the number of entries continuously, but that the improvement shrinks with each pass. Now let us make a small extension to the learning scheme of CfPD by choosing an appropriate node, or more precisely a convenient level, manually. Assume we have a learning scheme that identifies objects from the current place as well as the shape of the object. A simple example can be seen in figure 8. In this example, we have three levels of identifiers: place, shape and object (see figure 8a). There are four example places in level one, three different shapes in level two and five example objects in level three. By obtaining knowledge about the place and the shape, the observer, also called the agent, can identify a unique object with a suitable probability. CfPD may help us to improve these levels. Figure 8b shows the resulting graph after executing CfPD on the second level, which means that the parents of the shape nodes, i.e. level one, are divided. Assume the following scenario: the agent recognizes that it is in the kitchen and distinguishes a square-shaped object. We have thus obtained evidence for the first and second levels, and the agent can assume that this object is a table with high probability and, e.g., a cup with very low probability.

CfPD can be used to improve a level of this network. If the shape level includes forms such as triangular, square and round, the algorithm can group the first two together by creating a new node and hence a new level called, for example, angled; a toy sketch of this level bookkeeping follows below.

Figure 8: Example of a cognition scheme. (a) A cognition scheme for identifying objects according to their location and shape: a 3-level scheme with 12 nodes, 27 links and 92 CPT entries. (b) The graph after processing CfPD on the second level (shapes): a cognition scheme with four levels, 13 nodes, 25 links and 80 CPT entries.
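The level bookkeeping of this manual variant can be sketched as follows. The dictionary layout and the helper are our own simplification; the rewiring of edges and the probability update are left out.

    # Levels of the cognition scheme from figure 8; inserting the new node
    # 'angled' above the shape level yields the four-level scheme of 8b.
    levels = {
        "place": ["garden", "bathroom", "kitchen", "living room"],
        "shape": ["square", "triangular", "round"],
        "object": ["cup", "pot", "ball", "towel", "table"],
    }

    def insert_level(levels, above, new_node):
        """Insert a new level holding `new_node` directly above level `above`."""
        out = {}
        for name, nodes in levels.items():
            if name == above:
                out["level-" + new_node] = [new_node]   # the inserted level
            out[name] = list(nodes)
        return out

    print(list(insert_level(levels, "shape", "angled")))
    # ['place', 'level-angled', 'shape', 'object']  -> 13 nodes on four levels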
Summary

By adding a new node to a BN and updating the links in the net, new probabilities must be computed and existing probabilities must be changed so that the probability of occurrence of the observed node does not change. We therefore formulated the problem as an optimization approach and minimized the cost between the probability of occurrence of the observed node before and after the divorce. Other objectives, such as the Kullback-Leibler divergence, are also possible and could be used instead of the probability of occurrence; this requires some testing and might be a subject for future work. We tested the optimization problem on the classical approach applied in this paper, using FICO Xpress IVE 7.6 with random probabilities. The computed probabilities ensured the same probability of occurrence for the observed node in the initial network as well as in the resulting net.

We showed that the size of the CPTs can be reduced in every step by repeatedly applying the algorithm to an existing BN. Furthermore, the improvement during the first iterations is considerable, and particularly in the case of highly-connected networks a significant reduction in the total number of CPT entries can be achieved. By reducing the sizes of the CPTs, the efficiency increases distinctly.

We gave an example of how the CfPD algorithm can be used for the cognition of objects. There, the algorithm is processed on a level of a BN instead of searching for a node with a large CPT. As a result, the number of CPT entries is reduced and a new level is created that partitions the chosen level more precisely.

References

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

Friedman, N. (n.d.). Learning belief networks in the presence of missing values and hidden variables.

Jensen, F. V. (2007). Bayesian networks and decision graphs. Springer.

Neapolitan, R. E. (2003). Learning Bayesian networks. Prentice Hall.

Olesen, K. G., Kjaerluff, U., Jensen, F., & Jensen, F. V. (1989). A MUNIN network for the median nerve - a case study on loops. Applied Artificial Intelligence.

Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence.

Röhrbein, F., Eggert, J., & Körner, E. (2009). Child-friendly divorcing: Incremental hierarchy learning in Bayesian networks. In IJCNN 2009.