Simplifying Probability Elicitation and Uncertainty Modeling in Bayesian Networks

Patrick Paulson*, Thomas E. Carroll, Chitra Sivaraman, Peter Neorr, Stephen D. Unwin, and Shamina Hossain
Pacific Northwest National Laboratory, Richland, WA
*Corresponding author. E-mail: patrick.paulson@pnl.gov

Abstract

In this paper we contribute two novel methods that simplify the demands of knowledge elicitation for particular types of Bayesian networks. The first method simplifies the task of experts providing conditional probabilities when the states that a random variable takes can be described by a fully ordered set. In this order, each state's definition is inclusive of the preceding state's definition. Knowledge for a state is then elicited as a conditional probability given the preceding state. The second method leverages the Dempster-Shafer theory of evidence to provide a way for the expert to express the degree of ignorance that they feel about the estimates being provided.

Introduction

Currently, system administrators must be intimately familiar with their cyber assets and their organization's missions. But as the network of cyber resources continues to grow, it becomes exceedingly difficult to adequately prioritize time and resources across possible threats, because the crucial tie between cyber assets and organizational missions is absent from most cyber monitoring tools. As business needs and market pressures cause cyber systems to become more interconnected, and thus more susceptible to cyber attacks, organizations require a tool that allows them to gauge risk exposure from multiple risk perspectives, such as public safety, environmental impact, and shareholder return.

This need motivated us to develop Carim, a decision-support methodology that provides an assessment of the consequences of threats to components of cyber systems, so that security personnel can better allocate resources to protect key components. Because of the evolving nature of cyber attacks, we have relied on non-probabilistic techniques to allow us to characterize the completeness of the knowledge used to make risk assessments.

Carim models each asset in a system as a particular asset type. Asset types have known mitigating relationships with other asset types. The mitigating relationships are elicited from domain experts and best practices, and encompass a consensus view on the types of actions that can be taken to reduce an asset's vulnerability. For example, a workstation might have mitigating relationships that include the installation of anti-virus software, a backup server and related software, and a procedure for installing operating system patches. Each mitigating relationship involves other assets that might have mitigating relationships that require analysis. This network of mitigation relationships gives us a tool to elicit best practices from domain experts. It is similar to the causal mapping approach used for constructing Bayesian networks, where expert knowledge is represented as causal maps that are then, in turn, used to construct Bayesian networks (Nadkarni & Shenoy 2004). However, the elicitation of the conditional probabilities necessary for Bayesian networks proved difficult. This drove us to develop new methods for eliciting knowledge from experts.

Our Contributions In this paper we present two novel methods to simplify the demands of knowledge elicitation for certain types of Bayesian networks. The first method describes the values of a random variable using a fully ordered set of states; in this order, each state's definition is inclusive of the preceding state's definition. The second method uses the Dempster-Shafer theory of evidence to provide a way for experts to express uncertainty in the estimates being provided.

Related Work Domain experts are heavily relied on to provide information about probabilistic networks. Yet, these experts often struggle with the complexity of this responsibility. The elicitation of the probabilistic values needed for reasoning under uncertainty is a critical obstacle, and the problem has attracted the attention of much of the Bayesian network community (Druzdzel & van der Gaag 1995; van der Gaag et al. 1999; Olumuş & Erbaş 2004). Various methods have been designed to elicit probabilistic relations.
But these methods tend to be very time consuming, and are difficult to apply when many thousands of probabilities must be assessed.

While Bayesian networks have been used previously to reason about beliefs (see, for example, (Simon & Weber 2006; Simon, Weber, & Levrat 2007)), we generalize these methods and formally tie them to the Dempster-Shafer (DS) theory of evidence. We simplify DS theory such that the focal elements of a node (i.e., the subsets with non-zero mass) are confined to the singletons plus a general "don't know." No other subset is assigned mass.

Organization The paper is structured as follows. First, we describe a scenario to which the Carim methodology was applied, and discuss approaches to eliciting expert knowledge and representing uncertainty in the estimates that experts provide. Next, we provide the background necessary to understand our contributions. We then discuss our contributions, which are novel methods for eliciting knowledge for Bayesian networks. Finally, we summarize and conclude.

Figure 1: A man-in-the-middle attack scenario in which the attacker can eavesdrop on the wireless communication channels.

Figure 2: The elements for assessing the vulnerability of substation communications to an attack.

Eliciting SCADA Domain Knowledge for Carim

Carim has been applied to security in the domain of supervisory control and data acquisition (SCADA) networks, the networks used to control industrial processes. In this domain, we were particularly concerned with the possibility of a man-in-the-middle attack when a SCADA network included unsecured links between nodes. Figure 1 is a representation of this scenario. Our resident expert suggested two technical fixes that could be used, either independently or together, to protect against such an attack: SecureDNP, an encrypted wire protocol, and SSCP, a protocol that ensures data integrity. The effectiveness of these techniques depends on the "Rekey" policy used: how often the encryption and authentication keys are changed. Finally, the vulnerability to a man-in-the-middle attack depends on the capabilities of the attacker: an insider might have access to the required keys, and a state-backed attacker may have access to enough computing power to break the encryption scheme. The factors considered by Carim in assessing vulnerability to this attack are summarized in Figure 2.

In order to assess the vulnerability of the SCADA network using traditional Bayesian techniques, we would be required to elicit from our expert conditional probabilities for each combination of mitigation states and attacker expertise. In approaching our expert with this task, we realized that the expert was much more comfortable providing some values than others; for example, the relative effectiveness of rekeying policies was unknown, but the traffic authentication values were clearer. When encoding these values into a Bayesian network, however, the uncertainty of our expert disappears. While there is a good deal of controversy on the subject, the traditional approach to handling this problem with Bayesian networks is to ensure that the elicited probabilities encompass the doubts of the expert, and not to support additional "second order probabilities" (Pearl 1988, pp. 360-363). We desired, however, to explicitly model uncertainty so that the end-user would have a measure of the applicability of the results. The method described here allows the expert to express uncertainty without forcing them to further analyze the factors causing that uncertainty so that it can be expressed in one probability distribution.

We also realized that we were putting an undue burden on the expert by requiring them to state probabilities that had to match the constraints of the problem: it is always required, for example, that the vulnerability not decrease when the only change is that the expertise of the attacker increases.
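Constraints of this kind can at least be checked mechanically. The helper below is a minimal sketch of our own (not part of Carim) that verifies elicited vulnerabilities never decrease as the attacker's expertise increases, with all other factors held fixed:

```python
def monotone_in_expertise(vulns):
    """True if vulnerability never decreases as attacker expertise grows.

    vulns: elicited vulnerabilities ordered from least to most expert
    attacker, with every other factor (mitigations, rekey policy, ...)
    held fixed. Illustrative helper only.
    """
    return all(a <= b for a, b in zip(vulns, vulns[1:]))

# Illustrative elicited values for increasing expertise levels.
ok = monotone_in_expertise([0.10, 0.30, 0.30, 0.80])   # satisfies the constraint
bad = monotone_in_expertise([0.20, 0.10, 0.50])        # dips, so it is rejected
```

A check like this can be run during an elicitation session so that inconsistent answers are caught while the expert is still available to revise them.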
We are charged, then, with the following requirements:

1. Devise procedures to simplify the elicitation of probabilities that are constrained by additional factors.

2. Model the uncertainty of the knowledge used, to provide a measure of the applicability of the model.

A Technique to Simplify Elicitation

In order to apply a Bayesian network to this problem, the expert was required to provide conditional probabilities for each state of compromise of the asset, for each combination of the random variables that can affect the asset's state, as illustrated in Figure 3.

Figure 3: Elicitation requirements for man-in-the-middle attack on substation communications. The left pane is the state space; the right is the sample space of the variable.

As can be seen in Figure 3, the elicited probabilities in this problem have some interesting characteristics because of additional constraints on the state spaces of the variable. In particular, it is assumed that some states of compromise are "more difficult" to achieve than others; attackers with "higher" levels of expertise are accorded more probability of moving the asset into the more difficult states.

Because of these considerations, we simplified our elicitation technique. For given states of the mitigations and a specific level of expertise, we first have the expert give an estimate of the "uncertainty" they have in assessing the hypothesized situation. They are then asked to give, for the given level of expertise l, an estimate of the percentage of attackers with expertise l that can achieve the lowest level of compromise c_0 on the asset. (Since the lowest level of compromise is "completely secure", this value is 100 percent.) Then, for each succeeding level of compromise c_i, they are asked to estimate what percentage of attackers with expertise l who can achieve level of compromise c_{i-1} can also achieve level of compromise c_i. The Method section describes how we then convert these elicited values to probabilities used in a Bayesian network.

Using Figure 3 as an example, we are eliciting values for when keys are changed every 12 months and both SecureDNP (encryption) and SSCP (authentication) are used. Using our technique, we elicited the values given in Table 1 for the case when the attacker has medium expertise. The values are elicited as percentages of the potential attackers with the given level of expertise that can move an asset to a more compromised level, given the state of mitigations.

Table 1: The reduced elicitation requirements for substation communication security. These values are for the case when keys are changed every 12 months, both SecureDNP and SSCP are enabled, and the attacker has medium expertise.

    State      Elicited value
    Secure     1
    Eavesdrop  60%
    Inject     10%
    Join       50%
    Control    10%
    Unknown    0.25

Since we assume that all such attackers can leave the asset in the "Secure" state, the first elicited value is the percentage of attackers that can change the state to "being eavesdropped". In our example, the expert asserts a value of 60 percent. The next value we elicit is the percentage of attackers who can change the state to "inject messages"; an attacker who can effect this change also has the expertise to eavesdrop. The expert testifies that 10 percent of all attackers who can eavesdrop can also inject messages. We continue eliciting values in this fashion in the order that the states are specified. Finally, we ask the expert to quantify her confidence in the values she provided. If the expert feels the amount of information given in the constraints is sufficient to determine the elicited values, then the "unknown" value would be zero. If the expert feels that they have no basis for their judgments, then "unknown" would be one. Viewed this way, the "unknown" value is the portion of the information required to make a judgment that is missing.

As described above, the expert is thus also allowed to specify a probability for the special state Unknown, which is probability mass they do not feel comfortable assigning to any particular state.
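The walkthrough above can be sketched as a small routine. The function name is ours, and the treatment of the residual Unknown mass (simply scaling the state probabilities so everything sums to one) is our assumption, not a specification of Carim:

```python
def chain_to_distribution(chain, unknown):
    """Turn chained elicitation answers into one distribution.

    chain[i]: fraction of attackers who can reach compromise level i+1
    given that they can reach level i (level 0, "Secure", has P = 1).
    unknown: the expert's residual "don't know" mass.
    Assumption (ours): the Unknown mass scales the state probabilities
    so that the whole distribution sums to one.
    """
    reach = [1.0]                       # P(attacker can reach level i)
    for p in chain:
        reach.append(reach[-1] * p)
    # P(deepest level reached is exactly i) = reach[i] - reach[i + 1]
    exact = [reach[i] - reach[i + 1] for i in range(len(chain))]
    exact.append(reach[-1])             # the deepest level itself
    return [p * (1.0 - unknown) for p in exact] + [unknown]

# Table 1: Eavesdrop 60%, Inject 10%, Join 50%, Control 10%, Unknown 0.25.
dist = chain_to_distribution([0.60, 0.10, 0.50, 0.10], 0.25)
```

Note that the expert only ever answers one conditional question per state, yet the routine recovers a full distribution over the disjoint outcomes plus Unknown.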
Background

In the following we briefly describe the foundations, Bayesian networks and the Dempster-Shafer theory of evidence, on which we build our contributions.

Bayesian Networks Qualitative Bayesian networks (Halpern 2003, p. 135), as a special case of discrete influence diagrams (Kjaerulff & Madsen 2008, p. ix), are convenient for eliciting and encoding an expert's impressions of the factors that influence values in their domain of expertise. To be operational, however, a quantitative Bayesian network requires a myriad of conditional probabilities to be specified, one for each combination of values, which an expert may not feel comfortable estimating.

A Bayesian network N = (G, P) over a set of random variables X = {X_1, ..., X_n} consists of a directed acyclic graph (DAG) G that encodes a set of conditional independence assertions, together with local probability distributions P for each variable. Together, G and P form a joint probability distribution over X.

To be a Bayesian network, N must possess the local Markov property. Denote by pa(X_i) and nond(X_i) the sets of parents and non-descendants, respectively, of X_i. A network possesses the local Markov property if, for each X_i ∈ X and for every combination of values x_pa of pa(X_i) and x_nond of nond(X_i), the proposition (Neapolitan 2004, p. 37)

    P(x_i) = 0 ∨ P(x_pa | x_nond) = 0 ∨ P(x_i | x_pa, x_nond) = P(x_i | x_pa)

evaluates to true. In words, the local Markov property states that each variable is conditionally independent of its non-descendants given its parent variables.

Figure 4: The inclusive compromise states of the substation communications, represented as nested sets (Controlled ⊆ Joined ⊆ Injection ⊆ Eavesdropped ⊆ Secure).

The local Markov property makes Bayesian networks an effective technique for eliciting knowledge: by viewing the network, an expert can determine whether all factors are being considered when determining the probability of an event.
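To make the factorization that G and P induce concrete, consider a toy network in the spirit of Figure 2. The node names and numbers below are illustrative only, not elicited Carim values:

```python
# Toy Bayesian network: Rekey -> Compromise <- Expertise.
# The joint factorizes as P(r, e, c) = P(r) * P(e) * P(c | r, e).
P_r = {"12mo": 0.7, "never": 0.3}       # rekey policy prior
P_e = {"low": 0.5, "high": 0.5}         # attacker expertise prior
P_c = {                                  # P(compromise | rekey, expertise)
    ("12mo", "low"):   {"secure": 0.9, "compromised": 0.1},
    ("12mo", "high"):  {"secure": 0.6, "compromised": 0.4},
    ("never", "low"):  {"secure": 0.7, "compromised": 0.3},
    ("never", "high"): {"secure": 0.2, "compromised": 0.8},
}

def joint(r, e, c):
    """Joint probability from the local distributions."""
    return P_r[r] * P_e[e] * P_c[(r, e)][c]

# Marginal P(compromised), summing the joint over the parent values.
p_comp = sum(joint(r, e, "compromised") for r in P_r for e in P_e)
```

Only the local tables are ever elicited; the joint and any marginal of interest are computed from them, which is exactly what makes the elicitation burden tractable.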
Dempster-Shafer Theory The inability to express uncertainty is a drawback of approaches based on probability theory (Halpern 2003, p. 24). However, expressing uncertainty is a necessity when attempting to elicit understanding in knowledge-poor domains (see, for example, (Forester et al. 2004; Donell & Lehner 1993; O'Hagan & Oakley 2004)). In contrast to purely probabilistic methods for capturing domain knowledge, Dempster-Shafer (DS) theory provides a rich mechanism for describing the range of beliefs about a result (Gordon & Shortliffe 1990). This richness comes at the expense of complexity, both in eliciting the values that express the different types of ignorance and in combining multiple pieces of evidence (Ai, Wang, & Wang 2008).

In the following we summarize DS theory; we refer the reader to (Gordon & Shortliffe 1990) for a fuller treatment. Let X be a random variable specified by the finite set X of its values; the set X is also called the frame of discernment. A basic probability assignment (BPA) m_X over X is a function

    m_X : 2^X → [0, 1],

where 2^X is the power set of X, for which

    m_X(∅) = 0   and   Σ_{S ⊆ X} m_X(S) = 1.

The mass, or degree of belief, m_X(S) of S is a measure of the portion of belief that is committed exactly to S by m_X, and not to any particular subset of S. Each subset S such that m_X(S) > 0 is called a focal element. Two measures bound the probability interval associated with each S. The function

    bel_X(S) = Σ_{T ⊆ S} m_X(T)

computes the belief (or support) for each S ⊆ X. The plausibility of each S ⊆ X is given by

    pl_X(S) = Σ_{T ∩ S ≠ ∅} m_X(T).

Belief measures the total mass that is constrained to stay within the set of interest, while plausibility measures the total mass that can visit somewhere in the set of interest but can also move outside it. From the definitions, we see that m_X(S) ≤ bel_X(S) ≤ pl_X(S).

In the next section, we discuss our methods for eliciting knowledge from experts.

Method

We next discuss our contributions to eliciting knowledge from experts. The first contribution applies when the states of a variable can be described by a fully ordered set; in this set, each state implies all the preceding states. Our second contribution uses the Dempster-Shafer theory of evidence to allow experts to express uncertainty in the estimates that they provide.

Simplifying Conditional Probability Elicitation in Bayesian Networks

Our goal is to elicit values in the form shown in Table 1 and calculate conditional probabilities that can be used in a Bayesian network. As an example, consider Figure 3, in which we need to elicit the probability of compromise conditioned on the Rekey policy, wire protocol, data integrity protocol, and the attacker's level of expertise.

The elements of the sample space of the random variables in a Carim model often belong to a simple order. For example, when considered in terms of the progression of an attack, the states of compromise in Figure 2 can be ordered as Secure ≺ Eavesdropped ≺ Injection ≺ Joined ≺ Controlled. What this says is that for the attacker to control devices, she must have joined the network; to join, she must have the ability to inject traffic; and so on. The implication of states can be represented as inclusive sets, as we have done in Figure 4.

An advantage of this constraint when eliciting knowledge is that we can state our elicitation in terms of an already elicited value, which eases the cognitive load on the subject matter expert. For example, instead of asking "What is the likelihood that a person with high skill level can eavesdrop on the network?", and then separately asking "What is the likelihood that a person with high skill level can inject traffic into the network?", we can ask "What is the likelihood that a person with high skill level who can eavesdrop on the network can also inject traffic into it?"

Let s_1, ..., s_n be states of X such that state s_{i+1} implies s_i (i.e., s_{i+1} is a subset of s_i). We elicit beliefs P(s_1), P(s_2 | s_1), ..., P(s_n | s_{n-1}) from experts, given the parents of X. But in probability theory, the elements of the sample space of a random variable must be disjoint. We obtain a disjoint sample space by defining x_i to mean s_i ∧ ¬s_{i+1}. Treating the beliefs as probabilities, the probability P(x_i) of X taking the value x_i given X's parents is

    P(x_i) = (1 − P(s_{i+1} | s_i)) P(s_1) ∏_{j=2}^{i} P(s_j | s_{j-1}),    (1)

for i = 1, ..., n − 1, and

    P(x_n) = P(s_1) ∏_{j=2}^{n} P(s_j | s_{j-1}).    (2)

If X is conditionally dependent on other variables, we now have all the necessary values to construct a Bayesian network to compute P(x_i).

Implementing a Subset of Dempster-Shafer Theory with Bayesian Networks

The greatest disadvantage of DS theory is that, in contrast to probabilistic models, which are described by their respective density functions, DS models must be described by a set function whose domain grows exponentially with the number of variable values. It would be difficult to elicit a degree of belief for each and every subset. If we can represent the problem with a graph that satisfies the Markov property, we can instead use the computational efficiency of Bayesian networks to compute degrees of belief.
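Under this restriction, with mass only on singletons plus the whole frame, belief and plausibility take simple closed forms. The sketch below is our own illustration; the masses shown are invented, not elicited values:

```python
def bel(x, mass):
    """Belief in the singleton {x}: only {x} itself contributes."""
    return mass[x]

def pl(x, mass):
    """Plausibility of {x}: its own mass plus the whole-frame mass."""
    return mass[x] + mass["Unknown"]

# Masses over singleton states plus "Unknown" (the whole frame X).
m = {"Secure": 0.30, "Eavesdropped": 0.40, "Injection": 0.05, "Unknown": 0.25}

lower = bel("Eavesdropped", m)   # 0.40
upper = pl("Eavesdropped", m)    # 0.65
```

The interval [bel, pl] for each state is then exactly what the restricted model can report to the end-user: the width of the interval is the expert's declared ignorance.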
Beliefs are elicited from experts for each value x of variable X, and also for the element Unknown, which is equivalent in DS theory to the whole set X. All other sets have no mass. Given these conditions, P satisfies the requirements of a BPA m_X, since

    Σ_{x̄ ∈ X ∪ {Unknown}} P(x̄) = 1.

A Bayesian network can be constructed such that the node that represents X has a state for each of its focal elements. The node's conditional probability table comprises the elicited conditional probabilities of X given its parents. The network output for the node computes P(x̄) for each x̄ ∈ X ∪ {Unknown}. The belief in x is simply bel_X(x) = P(x), and the plausibility is pl_X(x) = P(x) + P(Unknown).

We now consider an example. There are three variables X, Y, and Z, where X conditionally depends on Y and Z, and Y and Z are conditionally independent. From the definition of joint probability, the probability P(x̄) for x̄ ∈ X ∪ {Unknown} is

    P(x̄) = Σ_{ȳ ∈ Y ∪ {Unknown}} Σ_{z̄ ∈ Z ∪ {Unknown}} P(x̄ | ȳ, z̄) P(ȳ) P(z̄).

This is the probability computed by the Bayesian network.

Conclusion

In eliciting knowledge for Carim, we frequently came upon situations where we needed to determine the subjective probability, according to a domain expert, of a member of a simple order; for example, the probability that a threat will compromise an asset at a particular level of compromise. Additionally, the domain expert may be able to recognize when their knowledge about an area is incomplete, yet be unable to further describe the characteristics of the incomplete knowledge. For these reasons, we wanted our users to be aware of the completeness of the knowledge behind their decisions.

We solved these problems by using Bayesian networks constructed with knowledge gained via the elicitation methods described in this paper. The first method simplifies the elicitation of conditional probabilities when the sample space of a random variable can be described by a fully ordered set of inclusive states; the conditional probability of a state then depends only on its predecessor. The second method implements a subset of Dempster-Shafer theory using a Bayesian network, which allows the network to provide a measure of uncertainty along with its output.

References

Ai, L.; Wang, J.; and Wang, X. 2008. Multi-features fusion diagnosis of tremor based on artificial neural network and D-S evidence theory. Signal Processing 88(12):2927–2935.

Carley, K. M., and Palmquist, M. 1992. Extracting, representing and analyzing mental models. Social Forces 70(3):601–636.

Davey, B. A., and Priestley, H. A. 2002. Introduction to Lattices and Order. Cambridge University Press, 2nd edition.

Donell, M. L., and Lehner, P. E. 1993. Uncertainty handling and cognitive biases in knowledge engineering. IEEE Transactions on Systems, Man and Cybernetics 23(2):563–570.

Druzdzel, M., and van der Gaag, L. 1995. Elicitation of probabilities for belief networks: Combining qualitative and quantitative information. In Uncertainty in Artificial Intelligence: Proceedings of the 11th Conference, 141–148. Los Altos, CA: Morgan Kaufmann.

Fiot, C.; Saptawati, G.; Laurent, A.; and Teisseire, M. 2008. Learning Bayesian network structure from incomplete data without any assumption. In Haritsa, J.; Kotagiri, R.; and Pudi, V., eds., Database Systems for Advanced Applications, volume 4947 of Lecture Notes in Computer Science, 408–423. Springer Berlin/Heidelberg. doi:10.1007/978-3-540-78568-2_30.

Forester, J.; Bley, D.; Cooper, S.; Lois, E.; Siu, N.; Kolaczkowski, A.; and Wreathall, J. 2004. Expert elicitation approach for performing ATHEANA quantification. Reliability Engineering & System Safety 83(2):207–220.

Gordon, J., and Shortliffe, E. H. 1990. The Dempster-Shafer theory of evidence. In Shafer, G., and Pearl, J., eds., Readings in Uncertain Reasoning, 529–539. San Mateo, California: Morgan Kaufmann Publishers.

Halpern, J. Y. 2003. Reasoning about Uncertainty. Cambridge, Massachusetts: The MIT Press.

Heckerman, D. 1997. Bayesian networks for data mining. Data Mining and Knowledge Discovery 1:79–119.

Kjaerulff, U. B., and Madsen, A. L. 2008. Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Springer.

Nadkarni, S., and Shenoy, P. 2004. A causal mapping approach to constructing Bayesian networks. Decision Support Systems 38:259–281.

Neapolitan, R. E. 2004. Learning Bayesian Networks. Upper Saddle River, NJ: Prentice Hall.

O'Hagan, A., and Oakley, J. E. 2004. Probability is perfect, but we can't elicit it perfectly. Reliability Engineering & System Safety 85(1–3):239–248.

Olumuş, H., and Erbaş, S. O. 2004. Determining the conditional probabilities in Bayesian networks. Hacettepe Journal of Mathematics and Statistics 33:69–76.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

Simon, C., and Weber, P. 2006. Bayesian networks implementation of Dempster-Shafer theory to model reliability uncertainty. In Proc. of the 1st International Conference on Availability, Reliability and Security (ARES '06).

Simon, C.; Weber, P.; and Levrat, E. 2007. Bayesian networks and evidence theory to model complex systems reliability. Journal of Computers 2(1):33–43.

van der Gaag, L.; Renooij, S.; Witteman, C.; Aleman, B.; and Taal, B. 1999. How to elicit many probabilities. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 647–654. Morgan Kaufmann Publishers.