Semantics for Reducing Complexity and Improving Accuracy in
        Model Creation Using Bayesian Network Decision Tools


                                               Oscar Kipersztok
                                        Boeing Research & Technology
                                          P.O.Box 3707, MC: 4C-77
                                               Seattle, WA 98124
                                         oscar.kipersztok@boeing.com


                        Abstract                                likelihood, impact and timing of events and trends
                                                                (Kipersztok, 2004).
The work presented simplifies and makes accessible the
process of using advanced probabilistic models to reason        DecAid is aimed at strategic decision making where the
about complex scenarios without the need for advanced           risk of making the wrong decision can be very costly and
training. More specifically, it greatly simplifies the effort   where there is need for argumentative rigor and careful
involved in building Bayesian Networks for making               documentation of ideas, associations and assumptions
probabilistic predictions in complex domains. These             leading to the final decision. The modeling methodology
methods typically require trained users with a                  was created to enable domain experts to create Bayesian
sophisticated understanding of how to build and use these       networks (BN) without having to familiarize with the
networks to predict future events. It entails the creation of   theory of graphical probabilistic networks or the practice
simplified semantics that keeps the complexity of the           of how to build them. Such users may not also require the
methodology transparent to users. We provide more               involvement of a knowledge engineer. At the levels where
precise semantics to the definition of concept variables in     high impact decisions are made, requiring high-level of
the domain model, as well as using those semantics to           abstraction and dealing with large number of variables
assign more precise and robust meaning to predicted             and interdependencies, it is less likely that decision
outcomes. This work is presented in the context of a tool       makers will use advanced decision analytic tools
and methodology, called DecAid, where complex                   requiring learning specialized methodology to define and
cognitive models are created by defining domain-specific        represent complex domain knowledge. The overall goals
concepts using free language and defining relations and         and requirements identified for the development of the
causal weights between them. In response to a user query        DecAid tool were described in (Kipersztok, 2007).
the DecAid, unconstrained, directed graph is converted          In a world of rapid change it is incresingly challenging to
into a Bayesian network to enable predictions of events         stay abreast of occurring events and trends, making it
and trends.                                                     more difficult to process information without the use of
                                                                advanced technology tools designed to manage
 1    INTRODUCTION                                              complexity and large volumes of information.
                                                                Furthermore, strategic decision makers recognize the need
DecAid is a hypothesis-driven decision support tool that
                                                                for argumentative explanations to strategic decisions that
facilitates complex strategic decisions with features that
                                                                capture the hypothetical reasoning and the evidential
allow for easy, fast, knowledge capture and modeling in
                                                                context behind each decision. For these reasons the need
complex domains. It identifies the key variables relevant
                                                                arises to rely on advanced methods to gather, organize,
to a specific query. While the cognitive, unconstrained,
                                                                process and analyze data and knowledge.
model is built, the defined concepts are used to create a
probabilistic model to forecast events and trends.              Bayesian networks practitioners recognize the need to
Similarly, the free-language used to define and label the       make the technology more accessible to end users due to
concepts is used to generate a document search classifier       the challenges presented during the model creation
to retrieve evidence for validation of hypotheses raised by     process. Some of the most significant challenges that
the predictive model. DecAid’s   goal   is   to   predict       DecAid aims to address are: 1) the complexity in eliciting
                                                                expert knowledge, 2) defining a, potentially, large number


                                                           41
of parameters and relations in a particular domain, 3)           remain unchanged, or decrease. Various levels of
adhering to conditional independence constraint in the           granularity can be selected to define the trend concept
definition of causal variables, and 4) requiring to avoid        states.
feedback reasoning during model creation that may result
                                                                 In this section we describe the formal definitions that
in graphs with cycles.
                                                                 enable the creation of a DN and its subsequent conversion
The first challenge has been addressed by various                into a BN.
software packages (e.g., Netica, GeNIe, Hugin, etc.) that
enable users to build BN with user-friendly interfaces           2.1       Definition of a DecAid Network (DN)
equipped with knowledge elicitation tools. Learning              Similar to a Bayesian network, each DecAid variable
algorithms have also provided the means for automated            (DV) represents a concept, which is some aspect of the
construction of BN structures and their parameters from          domain modeled. More specifically, a DV defines a
data.    To address the second challenge, canonical              probability distribution over its possible values and it is
structures have been defined that reduce the number of           discrete—i.e., finite-valued and typically taking 2, 3, 5, or
parameters needed to construct conditional probability           7 values. For example, we might have a DV named
tables (CPT). (Farry et al, 2008) review several canonical       ‘Barometric   Pressure’   that   has   3   values:   ‘decreasing’,  
models, including Influence Networks (Rose and Smith,            ‘unchanged’,  and   ‘increasing’.   The set of values is taken
1996), Noisy-OR, Noisy-MAX, Qualitative Probabilistic            to have some natural ordering so that we can speak of
Networks (QPN) and Causal Influence Models (CIM).                high values versus low values. If the variable is binary,
They, in particular, emphasize usability of CIM models           we would say that values such as false / off / does-not-
where the causal influence of each parent is captured by a       occur  would  be  “low”  compared  to  true  /  on  /  occurs.
single number and the combined influence of all parents
is the mean of the individual parent values. (Pfautz et al,      More formally, a DecAid model M includes a set V of
2007) address the first three challenges and describe            DVs and, taken together, the variables in V jointly
additional ones in findings from in-depth analyses of their      describe a distribution over the entire scenario modeled
experience in facilitation of model construction from            by M. Along with the set V, the model M includes a
numerous projects.                                               directed graph structure G connecting the variables of V.
                                                                 Each variable in V is a node of G and each arc denotes a
The purpose of this work is to describe formal semantics         direct  probabilistic  influence  of  the  parent’s  value  on  the  
that enable DecAid to be directly accessible to domain           distribution  over  the  child’s  values.  The  directed  graph  G
experts to create BN models without having to concern            is unconstrained—all connections are allowed and cycles
themselves with these challenges. These semantics are            are permitted. Each arc is labeled with a single real
aimed at easing the constraints imposed by the                   number  between  −1  and  1  called  the   weight. Intuitively,
aforementioned challenges by enabling users to define            the closer |w| is to 1, the stronger the influence of the
concepts and their relations in free-association mode.           parent over the child and the closer |w| is to 0, the weaker
Concepts are defined and labeled using free language and         the influence. If the weight is positive, a high parent
a single numerical weight is assigned to each parent-child       value makes high child values more likely and a low
relation. This effort results in the creation of the DecAid      parent value make low child values more likely. A
(unconstrained) network (DN), a directed graph, which            negative weight flips the influence so that a high parent
allows cycles. The step of creating a BN from the DN             value makes low child values more likely and a low
starts with a query definition, and it involves the              parent value makes high child values more likely (other
identification of the query-specific sub graph and removal       things being equal). Note that a moderate parent value
of its cycles by, optimally, minimizing the information          will make moderate child values more likely.
loss. The result is BN directed acyclic graph specific to
the query.

 2    FROM DECAID NETWORKS TO
      BAYESIAN NETWORKS
DecAid is a system for simple but powerful probabilistic
modeling of arbitrary scenarios. It enables domain expert
to create DecAid networks by defining concepts with free
language and causal relations between them. For each
pair of relations, the user assigns a weight of causal belief.
There are two types of concepts: a) Event concepts that
                                                                 Figure 2.1: A DecAid Unconstrained Model
represent quantities that can occur or not-occur; and b)
Trend concepts that represent quantities that increase,


                                                            42
Figure 2.1 shows an example of an unconstrained DN               What follows is a description of the method used to
representing three concept variables in the Aviation             express the random variable (RV) encoded by a DV. That
Safety domain and five parent-child relations with their         is, we show how to calculate a conditional probability
corresponding weights.                                           table (CPT) for each variable in the DecAid model given
                                                                 its parent set and the size of each variable.
Once, the unconstrained model is built, DecAid is capable
of transforming the DN into a BN in order to make
predictions in response to queries.                               3.1 Concepts Defined as Random Variables
                                                                 Let X be an n-valued DV from a DecAid model D. We
                                                                 say that the sample space S for X is the real interval [0,1).
2.2      Transforming a DecAid Network (DN)                      That is, we can suppose that X describes an experiment
         into a Bayesian Network structure                       whose outcome is a real number r such   that   0   ≤   r < 1.
                                                                 The values of the random variable X break the sample
A user can make a query to the DN by defining a set of           space into n disjoint events—namely, half-open intervals
observation variables and a target variable. In response         of equal length. The set of events is thus:
to the query, DecAid is capable of transforming the
unconstrained (directed graph) model to a Bayesian                { r  [k/n , (k+1)/n)    :  for  0  ≤  k < n }
network by carrying out the following sequence of steps:                    Example (3.1.1)
1) Identifying all cycles in the unconstrained model. We         If X has 2 states, the events corresponding to the states of
use an algorithm by (Johnson, 1975) that finds the               X are:
elementary cycles in the directed graph by improving over
the original algorithm by (Tarjan, 1973);                          { r  [0.0, 0.5) , r  [0.5,1.0) }.

2) Eliminating the cycles in the unconstrained model by                     Example (3.1.2)
removing the weak edges. This is done, optimally, in             If X has 5 states, the events corresponding to the states of
order to minimize the information loss in the                    X are:
unconstrained model. This step constitutes a tradeoff
between increased expressive power for domain-expert               { r  [0.0, 0.2), r  [0.2, 0.4), r  [0.4, 0.6), r  [0.6,
users and modest information loss resulting from removal         0.8), and r  [0.8, 1.0) }.
of edges that least contribute to the information flow.
3) Identifying the sub graph relevant to the query by             3.2 Conditional Probability Tables
pruning the non relevant variables from the resulting            The heart of the probabilistic semantics is the definition
Bayesian network (Geiger et al, 1990). This step                 of local conditional probability distributions for DecAid
constitutes an important feature of DecAid in that it can        variables. We consider the various cases below: a) where
list all the relevant parameters to the user that are relevant   the variable has no parents, b) where it has one parent of
to a specific user query.                                        weight 1, c) where it has one parent of arbitrary weight,
The last step in the creation of a query specific Bayesian       and finally, d) where it has any number of parents.
network is the creation of the conditional probability           Case 3.2.1 -Variables without parents
tables (CPT). The semantics to achieve that are described
in section 3.                                                    If X has no parents in D, then it is simply given a uniform
                                                                 distribution:
For practitioners involved in high-level, strategic,
decision making the use of Bayesian network building                        P(X = xk) = 1/n for  0  ≤  k < n .
tools can be counterintuitive and may require significant        That is, the event X = xk corresponds to r  [k/n, (k+1)/n).
training time, unavailable to such intended users. Making,       The probability equals the proportion of the total length of
however, the BN technology accessible through tools like         S contributed by X=xk. Since the total length of S is 1.0,
DecAid not only will improve the accuracy of decision            it is simply equal to the length of the interval, which is
making but will also provide the means to document and           (k+1  −  k)/n = 1/n.
track the chain of causal reasoning behind each decision.
                                                                            Example (3.2.1.1)
                                                                 If X has 2 states, P(X = xk)  =  0.5    for  0  ≤  k ≤  1.
3        SEMANTICS TO CREATE
                                                                            Example (3.2.1.2)
         CODITIONAL PROBABILITY
         TABLES                                                  If X has 5 states, P(X = xk)  =  0.2  for  0  ≤  k ≤  4.
                                                                 Case 3.2.2 – Variables with one parent and |w| = 1


                                                            43
We first describe the case where we have a single parent            P(x | y)             x0         x1           x2         x3           x4
Y and where the link from Y to its child X has weight 1.
We need to show how to calculate the conditional                    y0                   0.4        0.4          0.2        0            0
probability P(X = xk | Y = yj). This is given by the                y1                   0          0            0.2        0.4          0.4
formula:
P(X = xk | Y = yj , w = 1) = P(X = xk & Y = yj) / P(Y = yj) .
                                                                    Case 3.2.3 – Variables with one parent and |w| < 1
That is, the conditional probability of the event X = xk
given that Y = yj is equal to the intersection of the               We next look at the case where the weight is different
intervals corresponding to these events divided by the              than 1. It is useful to refer to the distribution defined in
length of the interval corresponding to Y = yj.                     Case 2a as the full-weight distribution—i.e., where w=1.
           Example (3.2.2.1)                                        Let Pfull(X | yj ) be the distribution over the values of X
                                                                    given Y = yj under the assumption that the arc from Y to X
Suppose YoX and Y has 5 states and X has 2 states,                  has weight w = 1. Let U(X) be the uniform distribution
P(X = x0 | Y = y2) = | Intersection of [0, 0.5) & [0.4, 0.6) | /    over the values of X.  Then,  if  the  weight  is  0  ≤  w < 1, we
| 0.6 – 0.4 |= 0.5                                                  have
The full CPT would be:                                              P(X | yj ,  0  ≤  w < 1 ) = w·Pfull(X | yj ) + (1 – w)·U(X)

                          P(x | y)         x0       x1              That is, the final distribution is a weighted combination of
                                                                    the distribution calculated in Case 3.2.1 and the uniform
                          y0               1        0               distribution—which is the default distribution if there
                          y1               1        0               were no parent. Note that the weight acts as the
                                                                    probability that we get the full-weight distribution instead
                          y2               0.5      0.5             of a uniform distribution.
                          y3               0        1
                          y4               0        1                          Example (3.2.3.1)
                                                                    Following the previous example (II.2.3), suppose YoX
                                                                    and Y has 2 states and X has 5 states. But now suppose
           Example (3.2.2.2)
                                                                    that the weight of the arc is w = 0.6, then we have
Suppose ZoX and Z has 3 states and X has 5 states,                  P(X = x0 | Y = y0) = w·Pfull(X | y0 ) + (1 – w)·U(X)
P(X = x0 | Z = z0) = | Intersection of [0, 0.2) & [0, 0.33) | / |                            = 0.6·0.4 + (1.0 – 0.6)·(1/5)
0.33 – 0 | = 0.6
                                                                                             = 0.24 + 0.4·0.2 = 0.3 + .08 = 0.32

The full CPT is:
                                                                    The full CPT is:
P(x | z)           x0     x1         x2    x3      x4
                                                                         P(x | y)              x0         x1           x2         x3           x4
z0                 0.6    0.4        0     0       0
z1                 0      0.2        0.6   0.2     0                     y0                    0.32       0.32         0.2        0.08         0.08

z2                 0      0          0     0.4     0.6                   y1                    0.08       0.08         0.2        0.32         0.32


           Example (3.2.2.3)                                        If   the   weight   is   negative,   the   direction   of   the   parent’s  
                                                                    influence is reversed. If Y is an m-valued variable, we can
Suppose YoX and Y has 2 states and X has 5 states,                  calculate the resulting distribution using a similar
P(X = x0 | Y = y0) = | Intersection of [0, 0.2) & [0, 0.5) | / |    calculation   above   but   for   the   “opposed”   value   of   the  
0.5 – 0 | = 0.2 / 0.5 = 0.4                                         parent.    By  “opposed”  we  mean  the value at the other side
                                                                    of the range—i.e., highest is opposed to lowest, second-
                                                                    highest is opposed to second-lowest, etc. More
The full CPT is:                                                    specifically, if the weight w < 0, we have


                                                               44
P(X | yj , –1  ≤  w < 0) = w·Pfull(X | ym-j-1 ) + (1 – w)·U(X)               = c·[ 0.03, 0.03, 0.02, 0.03, 0.04 ]
                                                                             = [0.2, 0.2, 0.13, 0.2, 0.27]
            Example (3.2.3.2)                                         The full CPT is:
Following the previous example (II.2.3), suppose YoX                   P(x | y, z)         x0       x1        x2      x3         x4
and Y has 2 states and X has 5 states. But now suppose
that the weight of the arc is w = –0.6, then we have                   y0 , z0             0.2      0.2       0.13    0.2        0.27

P(X = x2 | Y = y1) = 0.6·0.2 + (1.0 – 0.6)·(1/5) = 0.12 +              y0 , z1             0.15     0.3       0.4     0.1        0.05
0.4·0.2 = 0.12 + .08 = 0.2                                             y0 , z2             0.48     0.36      0.08    0.04       0.04
The full CPT is:                                                       y1 , z0             0.04     0.04      0.08    0.36       0.48
P(x | y)            x0       x1        x2     x3      x4               y1 , z1             0.05     0.1       0.4     0.3        0.15
y0                  0.08     0.08      0.2    0.32    0.32             y1 , z2             0.27     0.2       0.13    0.2        0.2
y1                  0.32     0.32      0.2    0.08    0.08


Case 3.2.4 – Variables with multiple parents
                                                                       4      DISCUSSION
                                                                      DecAid is used for strategic decision making. Here are a
The remaining situation is when we have a variable with               few examples of such decisions: a) when to launch a new
multiple parents. In this situation, we assume that the               product into a specific market, b) how close is a rouge
influence of each parent is independent of the influence of           country to achieving nuclear weapon capability, or c)
other parents. So, if X has parents Y1, Y2,  …,  YN, we set           whether to invest in a particular emerging technology.
P(X | Y1, Y2,  …,  YN) = c·P(X | Y1) ·P(X | Y2)  ·…·P(X  |  YN)       These are decisions that involve several variables and
                                                                      their inter relations. The system enables decision makers
where c is normalization constant to make the distribution            to define concepts of the problem in a simple, intuitive,
sum to 1.                                                             manner using free language. As the user defines the
                                                                      concepts and relations, the system is creating an
                                                                      unconstrained model. Once, the model is built, DecAid is
           Example (3.2.4.1)                                          capable of making predictions in response to queries by
Suppose X has 5 states and two parents: Y with 2 states               converting the unconstrained model into a Bayesian
and weight 0.5 and Z with 3 states and weight –0.5. As                network.
we saw above from examples (3.2.2.1) and (3.2.2.2), if we
ignore the weights of the arcs and the fact that there are
multiple parents, we have for parent Z:
Pfull(X | z0) = [ 0.6, 0.4, 0.0, 0.0, 0.0 ]
And for parent Y:
Pfull(X | y0) = [ 0.4, 0.4, 0.2, 0.0, 0.0 ]


Next, adding in the effect of the weights on the                      Figure 4.2: Predictions made by DecAid Model
distributions, we get:
                                                                      Figure 4.2 shows two such predictions derived from the
P(X | z0, w= –.5) = [ 0.1, 0.1, 0.1, 0.3, 0.4 ]                       model in Figure 2.1. The first prediction forecasts a 0.85
                                                                      probability  that  “Public  concern”  will  increase  given  that  
and
                                                                      “Occurrence   of   accidents”   has   increased   and  
P(X | y0, w= .5) = [ 0.3, 0.3, 0.2, 0.1, 0.1 ]                        “Government   oversight”   does   not   occur.   The   second  
                                                                      prediction lowers   the   forecast   that   “Public   concern”   will  
Now, combining both parents we get                                    increase  to  a  0.53  probability,  if  “Government  oversight”  
P(X | y0 , z0) = c·P(X | y0)·P(X | z0)                                occurs.
        = c·[ 0.3, 0.3, 0.2, 0.1, 0.1 ]·[ 0.1, 0.1, 0.1, 0.3,         At the final stage of making decisions, summarization and
0.4 ]                                                                 argumentation becomes critical steps. It is the aim of
                                                                      DecAid to facilitate the capture of knowledge and


                                                                 45
information for that last stage, as well, by combining the            probability is defined as the size of the intersection of
predictive analytic capability obtained from the cognitive            parent and child intervals divided by the size of parent
models with the ability to retrieve evidential data and               interval. 3) The magnitude of the weight is used as the
information to validated predictive hypotheses, which is              probability that you get the full-weight conditional
outside the scope of this paper.                                      distribution instead of a uniform distribution. 4) The sign
                                                                      of the weight is used to reverse the direction of influence.
The complete probabilistic semantics of a DecAid model
                                                                      And 5) the probabilistic influence of multiple parents on a
include how the local probability models are combined
                                                                      child are assumed to be independent of one another.
(not discussed here). The cornerstone of the semantics,
however, is the definition given here for the complete                The semantics described in this paper enable the creation
CPT of a local variable from the simple numeric weights               of a Bayesian networks from an unconstrained, directed
associated with its parents as provided by the end-user               graph model created by a user within a simpler, more
creating the model.                                                   intuitive, framework implemented in a tool called DecAid,
                                                                      without requiring specialized training in how to build
DecAid variables represent a tradeoff between simplicity
                                                                      Bayesian networks.
of model definition and expressive power. Aside from
adding temporal modeling (Nodelman, et. al. 2002, 2003)
to DecAid, there are additional areas where the balance               Acknowledgement
between simplicity and expressivity could be further
enhanced. In one such area there is, currently, complete              This work would have not been possible without the
symmetry between the positive effect of a parent taking               contributions and valuable, long-term, collaboration with
on a high value and the negative effect of a parent taking            Uri Nodelman for which the author and The Boeing
on a low value. Sometimes this symmetry is warranted                  Company are greatly thankful.
but sometimes it is not. For example, consider a child
variable   “Strength of a   Fire”   (“fire”)   with   a parent        References
“Oxygen   Level   Present”   (“oxygen”).   Increasing   oxygen  
                                                                      Farry M, Pfautz J, Cox Z, Bisantz A, Stone R, and Roth E
will tend to increase the fire and decreasing oxygen will
                                                                      (2008),  “An  Experimental  Procedure  for  Evaluating  User-
tend to lessen the fire. But now consider an alternative
                                                                      Centered methods for Rapid Bayesian Network
parent  “Use  of  fire-extinguisher”.    If  the  fire-extinguisher
                                                                      Construction”,  Proceedings of the 6th Bayesian Modeling
is used, that will tend to lessen the fire. But lack of fire-
                                                                      Applications Workshop at the 24th Annual Conference on
extinguisher use does not, in itself, increase the fire. So
                                                                      Uncertainty in AI: UAI 2008, Helsinki, Finland.
we may, in general, want to allow an asymmetry between
the impact of a high-value parent and a low-value parent
                                                                      Geiger, Verma, and Pearl, (1990), "Identifying
where the high-value has the regular effect but the low-
                                                                      Independence in Bayesian Networks", Networks 20:507-
value has no special impact on the child.
                                                                      534.

Furthermore, the assumption that the effects of multiple              Johnson, B. (1975),  “Finding  all  the  elementary  circuits  in  
parents are independent of each other is strong.                      a  directed  graph”,  SIAM Journal of Computing, 4(1).
Obviously, there are many cases where this assumption is
unwarranted. The problem would be to find a simple,                   Kipersztok, O (2007). “Using Human Factors, Reasoning
understandable way for end-users to convey extra                      and Text Processing for Hypothesis Validation.” Third
information about covariance and to find an algorithm                 International Workshop on Knowledge and Reasoning for
that could examine the link structure in other parts of the           Answering Questions. International Joint Conference in
DecAid model and extract some useful information about                Artificial Intelligence, Hydrabad, India.
the dependencies among the parents.
                                                                      Kipersztok O (2004). “Combining Cognitive Causal
5. SUMMARY                                                            Models with Reasoning and Text Processing Methods for
                                                                      Decision Support.” Second Bayesian Modeling
The semantics definition is given in this paper for the               Applications Workshop at the Uncertainty in AI
complete CPT of a local variable from simple numeric                  Conference, Banff Canada.
weights associated with its parents as provided by the
end-user creating a DecAid model. Explicitly, 1) The                  Nodelman, U., Shelton, C. R., and Koller, D. (2002).
values of a DV are represented as equal-length                        “Continuous Time Bayesian Networks.” Proceedings of
subintervals of the unit interval and making explicit that            the Eighteenth Conference on Uncertainty in Artificial
they have a natural ordering so they can be seen as                   Intelligence, pp. 378-387, 2002.
coming in opposed pairs (except for a possible middle-
most value). 2) A single parent full-weight conditional


                                                                46
Nodelman, U., Shelton, C.R., and Koller, D. (2003).
“Learning        Continuous           Time        Bayesian
Networks.” Proceedings of the Nineteenth Conference on
Uncertainty in Artificial Intelligence, pp. 451-458, 2003.

Pfautz J, Cox Z, Catto G, Koelle D, Campolongo J, Roth
E  (2007),”User-Centered Methods for Rapid Creation and
Validation of Bayesian Networks”.  In Proceedings of 5th
Bayesian Applications Workshop at Uncertainty in
Artificial Intelligence (UAI 07).

Rosen J and Smith W (1996), "Influence Net Modeling
with Causal Strengths: An Evolutionary Approach," In
the Proceedings of the 1996 Command and Control
Research and Technology Symposium, Monterey CA.

Tarjan, R. E. (1972), "Depth-first search and linear graph
algorithms", SIAM Journal on Computing 1 (2): 146–160.

Tarjan, R.E. (1973), “Enumeration of the elementary
circuits of a directed graph”, SIAM Journal of Computing
2 :211-216.


.


                                                        47