<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Structure Learning in Bayesian Networks with Parent Divorcing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Röhrbein (florian.roehrbein@in.tum.de)</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität München</institution>,
          <addr-line>Arcisstr. 21, 80333 Munich</addr-line>,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>146</fpage>
      <lpage>151</lpage>
      <abstract>
        <p>Bayesian networks (BNs) are an essential tool for the modeling of cognitive processes. They represent probabilistic knowledge in an intuitive way and allow one to draw inferences based on current evidence and built-in hypotheses. In this paper, a structure learning scheme for BNs is examined that is based on so-called Child-friendly Parent Divorcing (CfPD). This algorithm groups together nodes with similar properties by adding a new node to the existing network. The updating of all affected probabilities is formulated as an optimization problem. The resulting procedure reduces the size of the conditional probability tables (CPTs) significantly and hence improves the efficiency of the network, making it suitable for the larger networks typically encountered in cognitive modelling.</p>
      </abstract>
      <kwd-group>
        <kwd>Bayesian network</kwd>
        <kwd>learning</kwd>
        <kwd>inference</kwd>
        <kwd>adding nodes</kwd>
        <kwd>probability computation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A BN is a directed acyclic graph (DAG) with a probability
distribution P, which is expressed as a CPT on each node, or
on each variable, respectively. We assume binary nodes, thus
a value can be true or false. The CPTs have a size of 2^k,
where k is the number of parents of a node. Let p be a node
of the BN and let O be the set of parents of p. The aim is
to reduce the size of the CPT of node p by splitting the set
of parent nodes O into smaller sets O1, O2 with O1 ∪ O2 = O.
Therefore, we introduce a new node, x, as a new parent of p
and reset the connections of the network. The result is a
smaller CPT of node p. Of course there are several rules and
net topologies that need to be taken into account. In addition,
the probabilities of node p change and new probabilities for x
must be computed.</p>
      <p>
        This paper describes the procedure of Child-friendly Parent
Divorcing
        <xref ref-type="bibr" rid="ref6">(Röhrbein, Eggert, &amp; Körner, 2009)</xref>
        as well as the
behavior and variations of the considered BN. The findings
are explained and illustrated in a small example. The effect
of reducing the total number of entries in the CPTs
is shown in a simulation, where CfPD is processed on five
random BNs. Furthermore, a cognition scheme is shown that
models a BN as a scenario for recognizing objects by providing
evidence about their location and shape. For this purpose, the nodes
of the network are arranged in different levels.
The underlying technique for the CfPD learning algorithm is
based on a design tool that is explained in
        <xref ref-type="bibr" rid="ref4">(Olesen, Kjaerluff,
Jensen, &amp; Jensen, 1989)</xref>
        . In that paper, Parent Divorcing is
defined as a technique for designing more efficient Bayesian
models. By introducing a new node as a divisive node, the number
of incoming edges of a selected node is reduced. This
reduces the number of entries of the CPT of the selected node and
therefore increases computational efficiency. Figure 1
shows a simple example of standard Parent Divorcing. On the
left side, node p has a CPT of size 2^m. After inserting node
x, the node p in the resulting graph on the right-hand side has
a CPT of size 2^(k+1) &lt; 2^m
        <xref ref-type="bibr" rid="ref3">(Neapolitan, 2003)</xref>
        .
      </p>
      <sec id="sec-1-1">
        <title>Modification</title>
        <p>In general the following statement holds for a BN: Let k be
the number of parents of a node x; then the size of the CPT of
x is 2^k. Moreover, let n be the total number of nodes xi with
i ∈ {1, ..., n}; then, if no node xi has more than k parents,
the total network needs at most n · 2^k entries.</p>
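The bound above can be illustrated with a short sketch; the node names and parent sets below are illustrative, not taken from the paper's figures:

```python
# Sketch: CPT sizes in a binary BN. A node with k parents has a CPT with
# 2**k entries. Node names and parent sets are illustrative only.
parents = {
    "o1": set(), "o2": set(), "o3": set(), "o4": set(),
    "p1": {"o1", "o2", "o3", "o4"},   # 4 parents -> 2**4 = 16 entries
    "p2": {"o3", "o4"},               # 2 parents -> 2**2 = 4 entries
}

def cpt_size(node):
    return 2 ** len(parents[node])

total = sum(cpt_size(v) for v in parents)
n = len(parents)
k = max(len(p) for p in parents.values())
# No node has more than k parents, so the total stays within n * 2**k.
assert total == 24                # 4 * 1 + 16 + 4
assert total <= n * 2 ** k        # 24 <= 6 * 16 = 96
```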
        <p>The Parent Divorcing mentioned above is now modified in
such a way that the resulting sets of parent nodes are not
arbitrary, but similar in some way, or, respectively, have a feature
in common. Hence the aim is to find a type of
similarity between those nodes that should be divorced into a
set of groups (henceforth called the to-be-divorced parents).
Here the structure, or rather the connectivity, comes into
consideration, precisely the number of common child nodes.
Let D = (V, E) be a DAG with V as the set of nodes and E
as the set of directed edges. The CfPD aims to find a set of
nodes O = {o1, ..., om} with O ⊆ V that have at least one child
node p1 ∈ V with p1 ∉ O in common. These nodes should
then be examined to see whether there is another node p2 ∈ V with p2 ∉ O
that has a set of parents Q with Q ⊂ O.</p>
        <p>In summary: p1 has the set of parents O = {o1, ..., om} and
p2 has the set of parents Q ⊂ O. Therefore O is the set of the
to-be-divorced parents. Next, a node x is inserted in V, which
divides the to-be-divorced parents into two sets,
O1 = {o1, ..., om} \ Q and O2 = Q. Furthermore, Par(p1) =
O1 ∪ {x} and Par(p2) = O2 hold. Figure 4 shows this example
with O1 = {o1, o2} and O2 = {o3, o4}.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Functionality and Rules</title>
        <p>
          The section above provided an introduction and a small
example of CfPD. This section explains the algorithm in
detail. The algorithm is a combination of structure and
parameter learning. In the following, let D = (V, E) be a DAG with
V as the set of nodes, or random variables respectively, with
|V| = n, and E as the set of directed edges with |E| ≤ n(n − 1)/2.
In addition, let P be a joint probability distribution of the
variables in V; then B = (D, P) is a BN
          <xref ref-type="bibr" rid="ref3">(Neapolitan, 2003)</xref>
          .
Summing up, the algorithm can be grouped into five main steps:
1. Scanning the network, to find out whether CfPD can be
executed.
2. Selecting a node that meets the expectations.
3. Adding a new node and updating the connections, which
means adding and deleting some links.
4. Computing the parameters of the new node and the
modified nodes.
5. Handling of already learned nodes.
        </p>
        <sec id="sec-1-2-1">
          <title>The Five Steps in Detail</title>
          <p>These five steps are explained in detail in the following
section.</p>
          <p>DoChildFriendlyParentDivorcing(bnet)
0: // n := number of nodes of Bayesian network bnet
1: if (scanning(bnet) = true) {
2:   forall i ∈ n {
3:     p1 := max(Par(i))
4:     if (∀ j ∈ Child(Par(p1)) \ p1 : |Par(j) ∩ Par(p1)| &lt; 2) {
5:       i := i \ p1 → return to line 3
6:     }
7:     else if (∀ j ∈ Child(Par(p1)) : |max(Par(j))| ≥ 2) {
8:       p2 := rand(max(Par(j)))
9:     }
10:    else { p2 := max(Par(j)) }
11:  }
12:  O := Par(p1)
13:  O' := Par(p2)
14:  O2 := O ∩ O'
15:  O1 := O \ O2
16:  delete(edges): O2 → p1
17:  insert(x): O2 → x ∧ x → p1
18:  computing(P(x)) ∧ computing(P(p1))
19: }
20: else { Child-friendly Parent Divorcing is not possible }</p>
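As a cross-check of the pseudocode, here is a minimal executable sketch of steps 2 to 4 (selection, splitting, rewiring) on a graph stored as parent sets. The scanning step and the tie-breaking heuristics are simplified, and all names are illustrative:

```python
# Minimal sketch of the CfPD structure update (steps 2-4). The graph is a
# dict mapping each node to its parent set; selection heuristics are
# simplified relative to the pseudocode, and names are illustrative.
def cfpd_step(parents, new_node="x"):
    # Step 2: select p1, the node with the most parents (largest CPT).
    p1 = max(parents, key=lambda v: len(parents[v]))
    if len(parents[p1]) < 3:
        return None                     # a divorce would be inefficient
    # Partner p2: another node sharing at least two parents with p1.
    candidates = [v for v in parents
                  if v != p1 and len(parents[v] & parents[p1]) >= 2]
    if not candidates:
        return None                     # no common child besides p1
    p2 = max(candidates, key=lambda v: len(parents[v] & parents[p1]))
    # Step 3: split O into O1 (kept) and O2 (divorced) and rewire via x.
    O = parents[p1]
    O2 = O & parents[p2]                # O2 = O ∩ O'
    O1 = O - O2                         # O1 = O \ O2
    parents[new_node] = set(O2)         # O2 -> x
    parents[p1] = O1 | {new_node}       # x -> p1
    return p1, p2, O1, O2

g = {"o1": set(), "o2": set(), "o3": set(), "o4": set(),
     "p1": {"o1", "o2", "o3", "o4"}, "p2": {"o3", "o4"}}
result = cfpd_step(g)
# p1 now has parents {o1, o2, x}; the new node x has parents {o3, o4}.
```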
          <p>Scanning: This step scans the graph D to check if it
satisfies the expectations so that CfPD can be used. Figure 3
shows the pseudocode of this step. The following
expectations must be met: (1) There must be a node p1 ∈ V that has
at least three parent nodes (line 2), otherwise a divorce would
be inefficient. Assume Par(p1) = O ⊆ V with |O| = m and
m ≥ 3. (2) There must be a set of nodes O' ⊆ O with |O'| = k
and k ≥ 2 that have another child node p2 ∈ V in common.
(3) If p1 is the only node that the to-be-divorced parents have
in common, the algorithm should not be processed, because
the required feature of having another child node in common
is not satisfied. If one of these points is not satisfied, CfPD is
not possible or should not be executed (for point 3).
Selecting: If the scanning step returns true, there may be
several nodes that meet the requirements. The pseudocode
in figure 3 handles these conditions. With regard to step 1 of
the scanning step, we are looking for a node whose number
of parents is above a certain threshold t ≥ 3. If there is more
than one node {p1, ..., pk} that comes into question because
their number of parents is higher than or equal to the threshold
(thus |Par(pi)| ≥ t with i ∈ {1, ..., k}), we should select the node
with the maximum number of parents, hence max{|Par(pi)|}
(line 3). This is because that node has the largest CPT, which
we are trying to reduce. Possibly there are two or more nodes
with the same, maximum number of parent nodes; we will
call this set M, so ∀ pm ∈ M : |Par(pm)| = max(|Par(pi)|). In
that case the algorithm should choose the node for which
the intersection of the parent nodes of
another node pj ≠ pm and the parent nodes of pm is
maximized, hence max{|Par(pm) ∩ Par(pj)|}, where pm ∈ M.
Again there may be several nodes that fulfil these conditions.
Then the algorithm should choose randomly (line 7). We
could go even deeper in selecting the best-fitting node, but
there may always be more than one node with the same
features, so we decided to choose randomly at that point.
After selecting the node that will be used, a second
decision has to be made. Let's say we select p1 as the node
whose parents are the to-be-divorced parents (set O). Then
we have to choose a second node pj ≠ p1 with at least two
parents that are in the set of the to-be-divorced parents:
|Par(pj) ∩ O| ≥ 2. If there is none except p1, we decide not
to split the set O. Otherwise we choose the node pj such that
∀ pj ≠ p1 : max(|Par(pj)|). Thus at least two of the
to-be-divorced parent nodes have a child, other than p1,
in common. Of course it can occur that more than one node
pj meets these expectations. If this is the case, the algorithm
should again select randomly.</p>
          <p>Adding: The third step is adding a divorcing node, which
will be named x, and reordering the connections of the links.
That means adding new directed edges to x and one
outgoing edge from x. This is executed in such a way that
the to-be-divorced parents are split into two sets. Assume
we chose p1 and p2 in the step above. Then Par(p1) =
{o1, ..., om} = O is the set of the to-be-divorced parents
and Par(p2) = O'. The intersection of O and O'
identifies the splitting of the set O. We split the set O into two
parts O1 and O2 such that the following statement holds:
O1 = {o1, ..., ok} = O \ (O ∩ O') and O2 = {ok+1, ..., om} =
O ∩ O'. Figure 4 shows an example with m = 4.</p>
          <p>Figure 4: (a) initial graph; (b) p1 ⇒ getting O; (c) p2 ⇒ getting O';
(d) inserting x and divorcing O ⇒ getting O1 and O2; (e) resulting graph.</p>
          <p>Computing: By adding a new node to a BN, some changes
must be made to the entries of the CPTs. The CPT of node
x must be set up and the CPT of p1 must be adjusted. Looking
at the example in figure 4a, p1 has 2^4 = 16 entries
in its CPT, twice as many as in figure 4e, where the number of entries in
the CPT of p1 is 2^3 = 8. However,
CfPD should not change any probabilities; more precisely, it
should not change the joint probability distribution P of B.
Referring to the small example in figure 4, we compute the
joint probability distribution of p1 in the resulting graph. Let
A be the initial graph and B be the resulting graph after adding
a new node. Hence the equation</p>
          <p>
            P_A(p1, p2, o1, o2, o3, o4) = Σ_x P_B(p1, p2, x, o1, o2, o3, o4)   (1)
must hold. We can now use the chain rule
            <xref ref-type="bibr" rid="ref3">(Neapolitan, 2003)</xref>
            to compute the joint probability distributions:
          </p>
          <p>P_A(p1, p2, o1, o2, o3, o4) =
= P(p1|o1, o2, o3, o4) · P(p2|o3, o4) · P(o1) · P(o2) · P(o3) · P(o4)</p>
          <p>Σ_x P_B(p1, p2, x, o1, o2, o3, o4) =
= Σ_x P(p1|x, o1, o2) · P(x|o3, o4) · P(p2|o3, o4) · P(o1) · P(o2) · P(o3) · P(o4)</p>
          <p>
            Now inserting these two equations into (1) results in:
P(p1|o1, o2, o3, o4) = Σ_x P(p1|x, o1, o2) · P(x|o3, o4)
= P(p1|x = T, o1, o2) · P(x = T|o3, o4) + P(p1|x = F, o1, o2) · P(x = F|o3, o4)
This equation leads us to the following general formula:
P(p1|o1, ..., om) =
= P(p1|x = T, o1, ..., ok) · P(x = T|ok+1, ..., om) +
+ P(p1|x = F, o1, ..., ok) · P(x = F|ok+1, ..., om)   (2)
Additionally, we can fix the conditional probability of p1 with
a little trick, by assuming a relationship. The motivation,
according to
            <xref ref-type="bibr" rid="ref6">(Röhrbein et al., 2009)</xref>
            , is that the link between x
and p1 reflects a kind of taxonomic relationship. Variable x
represents a subclass of p1, and therefore P(p1 = T) is 1
whenever the value of the newly-inserted variable x is true. Thus
          </p>
          <p>P(p1 = T|x = T, o1, ..., ok) = 1   (3)</p>
          <p>P(p1 = F|x = T, o1, ..., ok) = 0   (4)</p>
        </sec>
        <sec id="sec-1-2-2">
          <title>Using equation (3) in (2) leads to:</title>
          <p>P(p1|o1, ..., om) = 1 − [1 − P(p1|x = F, o1, ..., ok)] · [1 − P(x = T|ok+1, ..., om)]   (5)</p>
          <p>The equations (2) and (5) will subsequently be implemented
as an optimization problem to ensure that the probability of
occurrence of node p1 in the resulting graph is the same as in
the initial graph.</p>
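The algebra above can be checked numerically; the probability values below are arbitrary placeholders, chosen only to exercise the identity:

```python
# Numeric check: with P(p1=T|x=T,...) = 1 (constraint (3)), marginalizing x
# as in (2) gives the same result as the product form (5).
# The probability values are arbitrary, for illustration only.
p_x_true = 0.3        # P(x = T | ok+1, ..., om) for one parent configuration
p_p1_xF = 0.4         # P(p1 = T | x = F, o1, ..., ok)
p_p1_xT = 1.0         # P(p1 = T | x = T, o1, ..., ok), fixed by (3)

# Equation (2): sum over the two states of x.
via_marginalization = p_p1_xT * p_x_true + p_p1_xF * (1.0 - p_x_true)
# Equation (5): 1 - [1 - P(p1|x=F,...)] * [1 - P(x=T|...)].
via_product_form = 1.0 - (1.0 - p_p1_xF) * (1.0 - p_x_true)

assert abs(via_marginalization - via_product_form) < 1e-12
```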
          <p>Handling: If new knowledge of a scenario, or of a BN
respectively, emerges, a new node and new links must be
inserted. Thus it is essential for the algorithm to specify which
node is a divorcing node and to set the connections
appropriately. An example is shown in the graph in figure 4. Assume
there is new knowledge about p1 and p2, expressed in a
variable om+1. In the initial case, the new node is inserted and
links are set to p1 and p2. However, if the new knowledge
emerges after divorcing the parent nodes (i.e. after inserting
x), the algorithm must identify the new knowledge om+1 as
part of the to-be-divorced set and consequently set the links
to x and p2.</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>Scenarios</title>
        <p>There are several net topologies that need to be examined.
As a result there are different scenarios of reordering and
inserting new nodes, depending on the net topology. Figure 5
shows four different types of scenarios (topologies): (a) not
efficient, (b) classical, (c) total connection, (d) more iterations.
a) In this case, it would not be efficient to introduce a new
node x, because the size of the set of the parents that can
be divorced is 1 (i.e. ok+1 is the only parent node of the set
of the to-be-divorced parents that has two child nodes in
common). The insertion of a new node x does not reduce
the size of the CPT of node p1; thus a divorce is
redundant. The algorithm only divides a set with at least two
considered nodes.</p>
        <p>b) This case describes the classical case of CfPD. The
nodes p1 and p2 have a set of parent nodes in common:
ok+1, ..., om. In this case the algorithm selects the node with
the highest number of incoming edges, which is the node
with the most parents. If the numbers are equal, it chooses
randomly. The node x is inserted and divides the two sets
of parent nodes: o1, ..., ok and ok+1, ..., om.
c) Here we have a total connection. This means that every
node o1, ..., om is connected to the nodes p1 and p2. The
algorithm inserts a node x that divides each node oi from
the others. In this case, the size of the CPT is not minimized,
but the sum of entries is reduced: initially the sum of all
entries is m + 2^m + 2^m, and afterwards we have a sum
of m + 2^m + 2·2.
d) This case shows a scenario where a node p1 has a number
of parent nodes in common with more than one other
node. In this case the algorithm can be processed two times
and hence more than one dividing node is inserted.</p>
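For case (c), the entry counts can be replayed with a small calculation, assuming x becomes the single parent of both p1 and p2, so that each of them keeps a CPT with 2^1 = 2 entries; the value of m is illustrative:

```python
# Entry count for the total-connection case (c): m root nodes, all connected
# to both p1 and p2. After inserting x between the roots and the two
# children, each child keeps a CPT of size 2**1 = 2 (assumed rewiring).
m = 5
before = m + 2 ** m + 2 ** m      # roots + CPT(p1) + CPT(p2) = 69 for m = 5
after = m + 2 ** m + 2 + 2        # roots + CPT(x) + CPT(p1) + CPT(p2) = 41
assert after < before             # the sum of entries is reduced
```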
      </sec>
      <sec id="sec-1-4">
        <title>Value computation as an optimization problem</title>
        <p>In this section, the computation of the probabilities of the
changed and the newly-inserted nodes is formulated as
an optimization problem. The aim is to minimize the
error between the absolute probability of occurrence
P(p1 = T|o1, ..., om) in the initial graph (P_before) and the
absolute probability of occurrence P(p1 = T|x, o1, ..., ok)
in the net resulting from the CfPD (P_after).</p>
        <p>Initial values: Let us define P_before first. Assume in the
initial BN we have chosen p1 in step 2 of the algorithm. Then
m is the number of parents of p1 and Par(p1) = {o1, ..., om}.
P_before represents the absolute probability of the event that p1 = T:
P_before = Σ_{o1} ... Σ_{om} P(o1) · ... · P(om) · P(p1 = T|o1, ..., om)
Now let us define P_after. Assume we have introduced a new node x
in the resulting BN. This node splits the original set of parents
into {o1, ..., ok} and {ok+1, ..., om}. P_after can then be calculated
as follows:
P_after = Σ_{o1} ... Σ_{ok} Σ_x P(o1) · ... · P(ok) · P(x) · P(p1 = T|o1, ..., ok, x)</p>
        <p>The optimization problem: Now we can formulate the
optimization problem as a cost-minimizing function, where
the cost is the difference between P_before and P_after.
There are two sets of decision variables: the entries
P(p1 = T|o1, ..., ok, x) for all configurations of o1, ..., ok, x, and the entries
P(x = T|ok+1, ..., om) for all configurations of ok+1, ..., om. The
optimization problem for the first case, where no correlation
between p1 and x is assumed (see equation (2) above), is as
follows:
1: minimize P_before − P_after</p>
        <p>subject to
2: for all combinations of {ok+1, ..., om}:
P(x = F|ok+1, ..., om) = 1 − P(x = T|ok+1, ..., om)
3: P_before − P_after ≥ 0
4: for all combinations of {o1, ..., om}:
Σ_x P(p1 = T|o1, ..., ok, x) · P(x|ok+1, ..., om) ≤ P(p1 = T|o1, ..., om)
5: 0 ≤ P(x = T|ok+1, ..., om) ≤ 1
6: 0 ≤ P(p1 = T|o1, ..., ok, x) ≤ 1</p>
        <p>Taking into account that there is a correlation between p1 and
x, we have to replace line 6 of the optimization problem above
according to (3) by:
7: P(p1 = T|o1, ..., ok, x = T) = 1
8: 0 ≤ P(p1 = T|o1, ..., ok, x = F) ≤ 1</p>
        <p>Line 1 represents the objective function. The aim is to
minimize the difference between P_before and P_after, thus we
have a minimization problem. Lines 2-8 represent the
constraints that must be fulfilled. Line 2 assigns the values for
the probability P(x = F|ok+1, ..., om) for all combinations
of {ok+1, ..., om}. In line 3 we ensure that the probability
P_after is not larger than the probability P_before. In
case of an error, i.e. P_before ≠ P_after, the absolute
probability of occurrence of p1 after the CfPD is thus less than
before, or in other words the occurrence is not overestimated.
This constraint has been chosen because intuitively it seems
more reliable to underestimate the occurrence of a variable
than to overestimate it. However, sometimes it seems more
reliable to overestimate the occurrence of a variable, e.g.
when a variable describes the failure of part of a system. In that
case, the user has to differentiate and determine whether
to use P_before or rather P_after as minuend or subtrahend in
line 3, and accordingly to use the inequality for underestimation
or the reverse (see line 4). The fourth line ensures that
the probabilities, precisely the JPD of P(p1|o1, ..., ok, x), are
not larger than before. The last two lines only ensure the
non-negativity of the decision variables and that their
values are not larger than 1. Assuming the correlation between
x and p1, the constraint for P(p1|o1, ..., ok, x) is fixed by
P(p1 = T|o1, ..., ok, x = T) = 1 according to (3). Hence line
6 of (8) is replaced by lines 7 and 8 of (9). We took the
BN B = (G, P) from figure 4a with its JPD P and P_before =
P(p1|o1, o2) = 0.6202 and performed the CfPD. This yields
the network B' = (G', P') in figure 4e. We calculated the CPTs of
p1 and x using the optimization approach on the one side and
the EM-algorithm with 10 samples on the other. Now we can
calculate the values for P_opt(p1|o1, o2) and P_EM10(p1|o1, o2) as
well as the JPD P'_opt of net B' for the optimization approach
and the JPD P'_EM10 for the EM-algorithm, respectively. Using
the Bayes Net Toolbox for Matlab we receive the following
results: P_opt(p1|o1, o2) = 0.6202, P_EM10(p1|o1, o2) = 0.7554,
P'_opt = 0.9915 and P'_EM10 = 0.8620. As we can see, the
optimization approach yields a result that does not change the
probability of occurrence of p1; also, the JPD has only a very
small deviation. The result of the EM-algorithm with
sample size 10, in contrast, has a deviation of more than 13% regarding
the absolute probability of p1, and the JPD of the complete
net deviates by almost 14%. We can, however, obtain good results
for the EM-algorithm as well by increasing the sample size: e.g.
we receive P_EM50(p1|o1, o2) = 0.6342 and P'_EM50 = 0.9534 for a
database with 50 samples. Even if the sample size is
increased further, we will not get better results for
the JPD than with the optimization approach.</p>
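As a toy illustration of the optimization (not the solver model used in the paper), a coarse grid search can pick the two free parameters of a single parent configuration so that P_after does not overestimate P_before, using the product form (5); the target value is illustrative:

```python
# Toy sketch of the optimization for one parent configuration, with the
# taxonomic constraint (3) built in: P(p1=T|x=T,...) = 1. We grid-search
# the two free CPT entries instead of calling an LP/NLP solver; the target
# probability is illustrative.
import itertools

p_before = 0.62                    # target probability of occurrence of p1
grid = [i / 100 for i in range(101)]

best, best_cost = None, float("inf")
for p_x, p_p1_xF in itertools.product(grid, grid):
    # P_after via the product form (5).
    p_after = 1.0 - (1.0 - p_p1_xF) * (1.0 - p_x)
    cost = p_before - p_after
    # Constraint (line 3): do not overestimate, i.e. P_after <= P_before.
    if 0.0 <= cost < best_cost:
        best, best_cost = (p_x, p_p1_xF), cost

assert best is not None and best_cost < 1e-9
```

A grid of step 0.01 is enough here because the target is itself a two-decimal value; a real model would optimize all parent configurations jointly, as in the constraint set above.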
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Training Data</title>
      <p>We created five different, randomly connected BNs with 30
nodes and 250 edges. Afterwards the CfPD algorithm was
executed 10 times on each of these networks. Table 1 shows
the total number of CPT entries for the 10 iterations. Figure
6 represents the decreasing trend of the number of entries for
each iteration. Already in the first iteration we can observe a
large reduction, because the
node with the highest possible number of parents, and hence
with the largest CPT, is selected and reduced. For example,
the number of parents of the chosen node in BNet5 is 19,
hence this node has 2^19 = 524288 CPT entries. After
executing CfPD on this net, the chosen node has 8 parents
and the newly-inserted node has 12 parents. Hence the
total number is reduced by 2^19 − (2^8 + 2^12) = 519936 entries.
Figure 6 represents the trend over 10 iterations.</p>
      <p>For example, BNet5 has 40 nodes and only 14666 CPT entries in the
resulting graph, which is 1.54% of the entries of the initial graph.
Taking BNet5 as an example again, the algorithm can be
processed 84 times on this network, which leads to a network
with 114 nodes and 760 (0.07%) CPT entries. Thus the size
of the reduction decreases with each iteration. The number of
edges in a BN is restricted by n(n − 1)/2. On the one hand,
a network with high connectivity, i.e. a higher
number of edges, provides the opportunity to execute CfPD
several times, leading to a large reduction of the number
of CPT entries. On the other hand, if we consider a network
that is less connected (hence it has only a few edges), we can
only execute the algorithm a limited number of times and the
reduction in the number of CPT entries will also be smaller.
Figure 7 shows the difference over five iterations between five BNs
with 30 nodes and 100 edges on the one side and 350 edges
on the other. The improvement regarding the total
number of CPT entries is thus very high at the beginning, but
flattens as the number of iterations rises. If we look at BNet2
from figure 6, the number of CPT entries after 14 iterations is 8255,
compared to the start, when the network had 412167 entries,
which is an improvement of over 98%. Therefore we have to
weigh the improvement of reducing the number of entries in
the CPTs on the one hand against adding a new node to the
net on the other.</p>
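The BNet5 arithmetic can be replayed directly:

```python
# BNet5 example: a node with 19 parents has 2**19 CPT entries; after CfPD
# it has 8 parents and the new node x has 12, so 2**8 + 2**12 entries remain.
entries_before = 2 ** 19            # 524288
entries_after = 2 ** 8 + 2 ** 12    # 256 + 4096 = 4352
reduction = entries_before - entries_after
assert entries_before == 524288
assert reduction == 519936
```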
    </sec>
    <sec id="sec-3">
      <title>Cognition</title>
      <p>We have shown that grouping nodes with a common
feature by introducing a new node can reduce the number of
CPT entries significantly. We also pointed out that a high
number of iterations will reduce the number of entries
continuously, but the improvement decreases with each processing.</p>
      <p>Now let us make a little extension to the learning scheme
of CfPD by choosing an appropriate node, or more precisely
a convenient level, manually. Assume we have a learning
scheme that identifies objects by defining the actual place
as well as the shape of the object. A simple example can
be seen in figure 8. In this example, we have three levels
of identifiers: place, shape and object (see figure 8a). There
are four example places in level one (garden, bathroom,
kitchen, living room), three different shapes
in level two (square, round, triangular) and five example objects
in level three (cup, pot, ball, towel, table). By
obtaining knowledge about the place and the shape, the observer,
also called agent, can identify a unique object with a suitable
probability. CfPD may help us to improve these levels. Figure
8b is the resulting graph after executing CfPD on the second
level, which means that the parents of the shape nodes, i.e.
level one, became divided. Assume the following scenario:
given evidence about an object's place and shape, the agent
can assume that this object is a table with high probability
and e.g. a cup with very low probability. CfPD can be used to
improve a level of this network. If the level shapes includes
forms such as triangular, square and round, the algorithm
can group together the first two by creating a new node and
hence a new level called, for example, angled.</p>
      <p>Figure 8: (a) a cognition scheme for identifying objects
according to their location and shape; (b) graph after
processing CfPD on the second level, named shapes.</p>
    </sec>
    <sec id="sec-4">
      <title>Summary</title>
      <p>By adding a new node to a BN and updating the links in the
net, new probabilities must be computed and existing
probabilities must be changed so that the probability of
occurrence of the observed node does not change. Therefore we
formulated the problem as an optimization approach and
minimized the cost, the difference between the probability of
occurrence of the observed node before and after the divorce.
Other objectives, such as the Kullback-Leibler divergence,
might also be possible and could be used instead of the
probability of occurrence. This will
require some testing and might be a subject for future work.
We tested the optimization problem for the classical approach
applied in this paper, using FICO Xpress IVE 7.6 with
random probabilities. The computed probabilities ensured the
same probability of occurrence for the observed node in the
initial network as well as in the resulting net.</p>
      <p>We showed that the size of the CPTs can be reduced in
every step by repeatedly applying the algorithm to an existing BN.
Furthermore, the improvement during the first iterations is quite
considerable, and particularly in the case of highly-connected
networks a significant reduction in the total number of CPT
entries can be achieved. By reducing the sizes of the CPTs,
the efficiency increases distinctly.</p>
      <p>We provided an example of how the CfPD algorithm can be
used for the cognition of objects. The algorithm is processed on
a level of a BN instead of searching for a node with a large
CPT. As a result the number of CPT entries is, of course,
reduced, and a new level is created that partitions the chosen
level more precisely.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>C. M.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Pattern recognition and machine learning</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>F. V.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Bayesian networks and decision graphs</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Neapolitan</surname>
            ,
            <given-names>R. E.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Learning Bayesian networks</article-title>
          .
          <source>Prentice Hall.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Olesen</surname>
            ,
            <given-names>K. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kjaerluff</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>F. V.</given-names>
          </string-name>
          (
          <year>1989</year>
          ).
          <article-title>A MUNIN network for the median nerve - a case study on loops</article-title>
          .
          <source>Applied Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Pearl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1986</year>
          ).
          <article-title>Fusion, propagation, and structuring in belief networks</article-title>
          .
          <source>Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Röhrbein</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eggert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Körner</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Child-friendly divorcing: Incremental hierarchy learning in Bayesian networks</article-title>
          .
          <source>In IJCNN</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>