Simplifying Probability Elicitation and Uncertainty Modeling in Bayesian Networks

Patrick Paulson*, Thomas E. Carroll, Chitra Sivaraman, Peter Neorr, Stephen D. Unwin, and Shamina Hossain
Pacific Northwest National Laboratory, Richland, WA
*Corresponding author. E-mail: patrick.paulson@pnl.gov

Abstract

In this paper we contribute two novel methods that simplify the demands of knowledge elicitation for particular types of Bayesian networks. The first method simplifies the task of experts providing conditional probabilities when the states that a random variable takes can be described by a fully ordered set. In this order, each state's definition is inclusive of the preceding state's definition. Knowledge for a state is then elicited as a conditional probability given the preceding state. The second method leverages the Dempster-Shafer theory of evidence to provide a way for the expert to express the degree of ignorance that they feel about the estimates being provided.

Introduction

Currently, system administrators must be intimately familiar with their cyber assets and their organization's missions. But as the network of cyber resources continues to grow, it becomes exceedingly difficult to adequately prioritize time and resources across possible threats, because the crucial tie between cyber assets and organizational missions is absent from most cyber monitoring tools. As business needs and market pressures cause cyber systems to become more interconnected, and thus more susceptible to cyber attacks, organizations require a tool that allows them to gauge risk exposure from multiple risk perspectives, such as public safety, environmental impact, and shareholder return.

This need motivated us to develop Carim, a decision-support methodology that provides an assessment of the consequences of threats to components of cyber systems, so that security personnel can better allocate resources to protect key components. Because of the evolving nature of cyber attacks, we have relied on non-probabilistic techniques to allow us to characterize the completeness of the knowledge used to make risk assessments.

Carim models each asset in a system as a particular asset type. Asset types have known mitigating relationships with other asset types. The mitigating relationships are elicited from domain experts and best practices, and encompass a consensus view on the types of actions that can be taken to reduce an asset's vulnerability. For example, a workstation might have mitigating relationships that include the installation of anti-virus software, a backup server and related software, and a procedure for installing operating system patches. Each mitigating relationship involves other assets that might have mitigating relationships that require analysis. This network of mitigation relationships gives us a tool to elicit best practices from domain experts. It is similar to the causal mapping approach used for constructing Bayesian networks, where expert knowledge is represented as causal maps that are then, in turn, used to construct Bayesian networks (Nadkarni & Shenoy 2004). However, the elicitation of the conditional probabilities necessary for Bayesian networks proved difficult. This drove us to develop new methods for eliciting knowledge from experts.

Our Contributions In this paper we present two novel methods to simplify the demands of knowledge elicitation for certain types of Bayesian networks. The first method describes the values of a random variable using a fully ordered set of states; in this order, each state's definition is inclusive of the preceding state's definition. The second method uses the Dempster-Shafer theory of evidence to provide a way for experts to express uncertainty in the estimates being provided.

Related Work Domain experts are heavily relied on to provide information about probabilistic networks. Yet, these experts often struggle with the complexity of this responsibility. The elicitation of the probabilistic values needed for reasoning under uncertainty is a critical obstacle, and the problem has attracted the attention of much of the Bayesian network community (Druzdzel & van der Gaag 1995; van der Gaag et al. 1999; Olumuş & Erbaş 2004). Various methods have been designed to elicit probabilistic relations.
But these methods tend to be very time consuming, and are difficult to apply when many thousands of probabilities must be assessed.

While Bayesian networks have been used previously to reason about beliefs (see, for example, (Simon & Weber 2006; Simon, Weber, & Levrat 2007)), we generalize these methods and formally tie them to the Dempster-Shafer (DS) theory of evidence. We simplify DS theory such that the focal elements of a node (i.e., the subsets with non-zero mass) are confined to the singletons plus a general "don't know." No other subset is assigned mass.

Organization The paper is structured as follows. First, we describe a scenario to which the Carim methodology was applied, and discuss approaches to eliciting expert knowledge and representing uncertainty in the estimates that experts provide. Next, we provide the background necessary to understand our contributions. We then discuss our contributions, which are novel methods for eliciting knowledge for Bayesian networks. Finally, we summarize and conclude.

Figure 1: A man-in-the-middle attack scenario in which the attacker can eavesdrop on the wireless communication channels.

Figure 2: The elements for assessing the vulnerability of substation communications to an attack.

Eliciting SCADA Domain Knowledge for Carim

Carim has been applied to security in the domain of supervisory control and data acquisition (SCADA) networks, the networks used to control industrial processes. In this domain, we were particularly concerned with the possibility of a man-in-the-middle attack when a SCADA network included unsecured links between nodes. Figure 1 is a representation of this scenario. Our resident expert suggested two technical fixes that could be used, either independently or together, to protect against such an attack: SecureDNP, an encrypted wire protocol, and SSCP, a protocol that ensures data integrity. The effectiveness of these techniques depends on the "Rekey" policy used: how often the encryption and authentication keys are changed. Finally, the vulnerability to a man-in-the-middle attack depends on the capabilities of the attacker: an insider might have access to the required keys, and a state-backed attacker may have access to enough computing power to break the encryption scheme. The factors considered by Carim in assessing vulnerability to this attack are summarized in Figure 2.

In order to assess the vulnerability of the SCADA network using traditional Bayesian techniques, we would be required to elicit from our expert conditional probabilities for each combination of mitigation states and attacker expertise. In approaching our expert with this task, we realized that the expert was much more comfortable providing some values than others; for example, the relative effectiveness of rekeying policies was unknown, but the traffic authentication values were clearer. When encoding these values into a Bayesian network, however, the uncertainty of our expert disappears. While there is a good deal of controversy on the subject, the traditional approach to handling this problem with Bayesian networks is to ensure that the elicited probabilities encompass the doubts of the expert, and not to support additional "second order probabilities" (Pearl 1988, pp. 360-363). We desired, however, to explicitly model uncertainty so that the end-user would have a measure of the applicability of the results. The method described here allows the expert to express uncertainty without forcing them to further analyze the factors causing that uncertainty so that it can be expressed in one probability distribution.

We also realized that we were putting an undue burden on the expert by requiring them to state probabilities that had to match the constraints of the problem: it is always required, for example, that the vulnerability not decrease when the only change is that the expertise of the attacker increases.
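Constraints of this kind can at least be checked mechanically. The helper below is a minimal sketch of our own (not part of Carim) that verifies elicited vulnerabilities never decrease as the attacker's expertise increases, with all other factors held fixed:

```python
def monotone_in_expertise(vulns):
    """True if vulnerability never decreases as attacker expertise grows.

    vulns: elicited vulnerabilities ordered from least to most expert
    attacker, with every other factor (mitigations, rekey policy, ...)
    held fixed. Illustrative helper only.
    """
    return all(a <= b for a, b in zip(vulns, vulns[1:]))

# Illustrative elicited values for increasing expertise levels.
ok = monotone_in_expertise([0.10, 0.30, 0.30, 0.80])   # satisfies the constraint
bad = monotone_in_expertise([0.20, 0.10, 0.50])        # dips, so it is rejected
```

A check like this can be run during an elicitation session so that inconsistent answers are caught while the expert is still available to revise them.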
We are charged, then, with the following requirements:

1. Devise procedures to simplify the elicitation of probabilities that are constrained by additional factors.

2. Model the uncertainty of the knowledge used, to provide a measure of the applicability of the model.

A Technique to Simplify Elicitation

In order to apply a Bayesian network to this problem, the expert was required to provide conditional probabilities for each state of compromise of the asset, for each combination of the random variables that can affect the asset's state, as illustrated in Figure 3.

Figure 3: Elicitation requirements for man-in-the-middle attack on substation communications. The left pane is the state space; the right is the sample space of the variable.

As can be seen in Figure 3, the elicited probabilities in this problem have some interesting characteristics because of additional constraints on the state spaces of the variable. In particular, it is assumed that some states of compromise are "more difficult" to achieve than others; attackers with "higher" levels of expertise are accorded more probability of moving the asset into the more difficult states.

Because of these considerations, we simplified our elicitation technique. For given states of the mitigations and a specific level of expertise, we first have the expert give an estimate of the "uncertainty" they have in assessing the hypothesized situation. They are then asked to give, for the given level of expertise l, an estimate of the percentage of attackers with expertise l that can achieve the lowest level of compromise c_0 on the asset. (Since the lowest level of compromise is "completely secure", this value is 100 percent.) Then, for each succeeding level of compromise c_i, they are asked to estimate what percentage of attackers with expertise l who can achieve level of compromise c_{i-1} can also achieve level of compromise c_i. The Method section describes how we then convert these elicited values to probabilities used in a Bayesian network.

Using Figure 3 as an example, we are eliciting values for when keys are changed every 12 months and both SecureDNP (encryption) and SSCP (authentication) are used. Using our technique, we elicited the values given in Table 1 for the case when the attacker has medium expertise. The values are elicited as percentages of the potential attackers with the given level of expertise that can move an asset to a more compromised level, given the state of mitigations.

Table 1: The reduced elicitation requirements for substation communication security. These values are for the case when keys are changed every 12 months, both SecureDNP and SSCP are enabled, and the attacker has medium expertise.

    State      Elicited value
    Secure     1
    Eavesdrop  60%
    Inject     10%
    Join       50%
    Control    10%
    Unknown    0.25

Since we assume that all such attackers can leave the asset in the "Secure" state, the first elicited value is the percentage of attackers that can change the state to "being eavesdropped". In our example, the expert asserts a value of 60 percent. The next value we elicit is the percentage of attackers who can change the state to "inject messages"; an attacker who can effect this change also has the expertise to eavesdrop. The expert testifies that 10 percent of all attackers who can eavesdrop can also inject messages. We continue eliciting values in this fashion in the order that the states are specified. Finally, we ask the expert to quantify her confidence in the values she provided. If the expert feels the amount of information given in the constraints is sufficient to determine the elicited values, then the "unknown" value would be zero. If the expert feels that they have no basis for their judgments, then "unknown" would be one. Viewed this way, the "unknown" value is the portion of the information required to make a judgment that is missing.

As described above, the expert is thus also allowed to specify a probability for the special state Unknown, which is probability mass they do not feel comfortable assigning to any particular state.
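The walkthrough above can be sketched as a small routine. The function name is ours, and the treatment of the residual Unknown mass (simply scaling the state probabilities so everything sums to one) is our assumption, not a specification of Carim:

```python
def chain_to_distribution(chain, unknown):
    """Turn chained elicitation answers into one distribution.

    chain[i]: fraction of attackers who can reach compromise level i+1
    given that they can reach level i (level 0, "Secure", has P = 1).
    unknown: the expert's residual "don't know" mass.
    Assumption (ours): the Unknown mass scales the state probabilities
    so that the whole distribution sums to one.
    """
    reach = [1.0]                       # P(attacker can reach level i)
    for p in chain:
        reach.append(reach[-1] * p)
    # P(deepest level reached is exactly i) = reach[i] - reach[i + 1]
    exact = [reach[i] - reach[i + 1] for i in range(len(chain))]
    exact.append(reach[-1])             # the deepest level itself
    return [p * (1.0 - unknown) for p in exact] + [unknown]

# Table 1: Eavesdrop 60%, Inject 10%, Join 50%, Control 10%, Unknown 0.25.
dist = chain_to_distribution([0.60, 0.10, 0.50, 0.10], 0.25)
```

Note that the expert only ever answers one conditional question per state, yet the routine recovers a full distribution over the disjoint outcomes plus Unknown.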
Background

In the following we briefly describe the foundations, Bayesian networks and the Dempster-Shafer theory of evidence, on which we build our contributions.

Bayesian Networks Qualitative Bayesian networks (Halpern 2003, p. 135), as a special case of discrete influence diagrams (Kjaerulff & Madsen 2008, p. ix), are convenient for eliciting and encoding an expert's impressions of the factors that influence values in their domain of expertise. To be operational, however, a quantitative Bayesian network requires a myriad of conditional probabilities to be specified, one for each combination of values, which an expert may not feel comfortable estimating.

A Bayesian network N = (G, P) over a set of random variables X = {X_1, ..., X_n} consists of a directed acyclic graph (DAG) G that encodes a set of conditional independence assertions, together with local probability distributions P for each variable. Together, G and P form a joint probability distribution over X.

To be a Bayesian network, N must possess the local Markov property. Denote by pa(X_i) and nond(X_i) the sets of parents and non-descendants, respectively, of X_i. A network possesses the local Markov property if, for each X_i ∈ X and for every combination of values x_pa of pa(X_i) and x_nond of nond(X_i), the proposition (Neapolitan 2004, p. 37)

    P(x_i) = 0 ∨ P(x_pa | x_nond) = 0 ∨ P(x_i | x_pa, x_nond) = P(x_i | x_pa)

evaluates to true. In words, the local Markov property states that each variable is conditionally independent of its non-descendants given its parent variables.

Figure 4: The inclusive compromise states of the substation communications, represented as nested sets (Controlled ⊆ Joined ⊆ Injection ⊆ Eavesdropped ⊆ Secure).

The local Markov property makes Bayesian networks an effective technique for eliciting knowledge: by viewing the network, an expert can determine whether all factors are being considered when determining the probability of an event.
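To make the factorization that G and P induce concrete, consider a toy network in the spirit of Figure 2. The node names and numbers below are illustrative only, not elicited Carim values:

```python
# Toy Bayesian network: Rekey -> Compromise <- Expertise.
# The joint factorizes as P(r, e, c) = P(r) * P(e) * P(c | r, e).
P_r = {"12mo": 0.7, "never": 0.3}       # rekey policy prior
P_e = {"low": 0.5, "high": 0.5}         # attacker expertise prior
P_c = {                                  # P(compromise | rekey, expertise)
    ("12mo", "low"):   {"secure": 0.9, "compromised": 0.1},
    ("12mo", "high"):  {"secure": 0.6, "compromised": 0.4},
    ("never", "low"):  {"secure": 0.7, "compromised": 0.3},
    ("never", "high"): {"secure": 0.2, "compromised": 0.8},
}

def joint(r, e, c):
    """Joint probability from the local distributions."""
    return P_r[r] * P_e[e] * P_c[(r, e)][c]

# Marginal P(compromised), summing the joint over the parent values.
p_comp = sum(joint(r, e, "compromised") for r in P_r for e in P_e)
```

Only the local tables are ever elicited; the joint and any marginal of interest are computed from them, which is exactly what makes the elicitation burden tractable.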
Dempster-Shafer Theory The inability to express uncertainty is a drawback of approaches based on probability theory (Halpern 2003, p. 24). However, expressing uncertainty is a necessity when attempting to elicit understanding in knowledge-poor domains (see, for example, (Forester et al. 2004; Donell & Lehner 1993; O'Hagan & Oakley 2004)). In contrast to purely probabilistic methods for capturing domain knowledge, Dempster-Shafer (DS) theory provides a rich mechanism for describing the range of beliefs about a result (Gordon & Shortliffe 1990). This richness comes at the expense of complexity, both in eliciting the values that express the different types of ignorance and in combining multiple pieces of evidence (Ai, Wang, & Wang 2008).

In the following we summarize DS theory; we refer the reader to (Gordon & Shortliffe 1990) for a fuller treatment. Let X be a random variable specified by the finite set X of its values; the set X is also called the frame of discernment. A basic probability assignment (BPA) m_X over X is a function

    m_X : 2^X → [0, 1],

where 2^X is the power set of X, for which

    m_X(∅) = 0   and   Σ_{S ⊆ X} m_X(S) = 1.

The mass, or degree of belief, m_X(S) of S is a measure of the portion of belief that is committed exactly to S by m_X, and not to any particular subset of S. Each subset S such that m_X(S) > 0 is called a focal element. Two measures bound the probability interval associated with each S. The function

    bel_X(S) = Σ_{T ⊆ S} m_X(T)

computes the belief (or support) for each S ⊆ X. The plausibility of each S ⊆ X is given by

    pl_X(S) = Σ_{T ∩ S ≠ ∅} m_X(T).

Belief measures the total mass that is constrained to stay within the set of interest, while plausibility measures the total mass that can visit somewhere in the set of interest but can also move outside it. From the definitions, we see that m_X(S) ≤ bel_X(S) ≤ pl_X(S).

In the next section, we discuss our methods for eliciting knowledge from experts.

Method

We next discuss our contributions to eliciting knowledge from experts. The first contribution applies when the states of a variable can be described by a fully ordered set; in this set, each state implies all the preceding states. Our second contribution uses the Dempster-Shafer theory of evidence to allow experts to express uncertainty in the estimates that they provide.

Simplifying Conditional Probability Elicitation in Bayesian Networks

Our goal is to elicit values in the form shown in Table 1 and calculate conditional probabilities that can be used in a Bayesian network. As an example, consider Figure 3, in which we need to elicit the probability of compromise conditioned on the Rekey policy, wire protocol, data integrity protocol, and the attacker's level of expertise.

The elements of the sample space of the random variables in a Carim model often belong to a simple order. For example, when considered in terms of the progression of an attack, the states of compromise in Figure 2 can be ordered as Secure ≺ Eavesdropped ≺ Injection ≺ Joined ≺ Controlled. What this says is that for the attacker to control devices, she must have joined the network; to join, she must have the ability to inject traffic; and so on. The implication of states can be represented as inclusive sets, as we have done in Figure 4.

An advantage of this constraint when eliciting knowledge is that we can state our elicitation in terms of an already elicited value, which eases the cognitive load on the subject matter expert. For example, instead of asking "What is the likelihood that a person with high skill level can eavesdrop on the network?", and then separately asking "What is the likelihood that a person with high skill level can inject traffic into the network?", we can ask "What is the likelihood that a person with high skill level who can eavesdrop on the network can also inject traffic into it?"

Let s_1, ..., s_n be states of X such that state s_{i+1} implies s_i (i.e., s_{i+1} is a subset of s_i). We elicit beliefs P(s_1), P(s_2 | s_1), ..., P(s_n | s_{n-1}) from experts, given the parents of X. But in probability theory, the elements of the sample space of a random variable must be disjoint. We obtain a disjoint sample space by defining x_i to mean s_i ∧ ¬s_{i+1}. Treating the beliefs as probabilities, the probability P(x_i) of X taking the value x_i given X's parents is

    P(x_i) = (1 − P(s_{i+1} | s_i)) P(s_1) ∏_{j=2}^{i} P(s_j | s_{j-1}),    (1)

for i = 1, ..., n − 1, and

    P(x_n) = P(s_1) ∏_{j=2}^{n} P(s_j | s_{j-1}).    (2)

If X is conditionally dependent on other variables, we now have all the necessary values to construct a Bayesian network to compute P(x_i).

Implementing a Subset of Dempster-Shafer Theory with Bayesian Networks

The greatest disadvantage of DS theory is that, in contrast to probabilistic models, which are described by their respective density functions, DS models must be described by a set function whose domain grows exponentially with the number of variable values. It would be difficult to elicit a degree of belief for each and every subset. If we can represent the problem with a graph that satisfies the Markov property, we can instead use the computational efficiency of Bayesian networks to compute degrees of belief.
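Under this restriction, with mass only on singletons plus the whole frame, belief and plausibility take simple closed forms. The sketch below is our own illustration; the masses shown are invented, not elicited values:

```python
def bel(x, mass):
    """Belief in the singleton {x}: only {x} itself contributes."""
    return mass[x]

def pl(x, mass):
    """Plausibility of {x}: its own mass plus the whole-frame mass."""
    return mass[x] + mass["Unknown"]

# Masses over singleton states plus "Unknown" (the whole frame X).
m = {"Secure": 0.30, "Eavesdropped": 0.40, "Injection": 0.05, "Unknown": 0.25}

lower = bel("Eavesdropped", m)   # 0.40
upper = pl("Eavesdropped", m)    # 0.65
```

The interval [bel, pl] for each state is then exactly what the restricted model can report to the end-user: the width of the interval is the expert's declared ignorance.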
Beliefs are elicited from experts for each value x of variable X, and also for the element Unknown, which is equivalent in DS theory to the whole set X. All other sets have no mass. Given these conditions, P satisfies the requirements of a BPA m_X, since

    Σ_{x̄ ∈ X ∪ {Unknown}} P(x̄) = 1.

A Bayesian network can be constructed such that the node that represents X has a state for each of its focal elements. The node's conditional probability table comprises the elicited conditional probabilities of X given its parents. The network output for the node computes P(x̄) for each x̄ ∈ X ∪ {Unknown}. The belief in x is simply bel_X(x) = P(x), and the plausibility is pl_X(x) = P(x) + P(Unknown).

We now consider an example. There are three variables X, Y, and Z, where X conditionally depends on Y and Z, and Y and Z are conditionally independent. From the definition of joint probability, the probability P(x̄) for x̄ ∈ X ∪ {Unknown} is

    P(x̄) = Σ_{ȳ ∈ Y ∪ {Unknown}} Σ_{z̄ ∈ Z ∪ {Unknown}} P(x̄ | ȳ, z̄) P(ȳ) P(z̄).

This is the probability computed by the Bayesian network.

Conclusion

In eliciting knowledge for Carim, we frequently came upon situations where we needed to determine the subjective probability, according to a domain expert, of a member of a simple order; for example, the probability that a threat will compromise an asset at a particular level of compromise. Additionally, the domain expert may be able to recognize when their knowledge about an area is incomplete, yet be unable to further describe the characteristics of the incomplete knowledge. For these reasons, we wanted our users to be aware of the completeness of the knowledge behind their decisions.

We solved these problems by using Bayesian networks constructed with knowledge gained via the elicitation methods described in this paper. The first method simplifies the elicitation of conditional probabilities when the sample space of a random variable can be described by a fully ordered set of inclusive states; the conditional probability of a state then depends only on its predecessor. The second method implements a subset of Dempster-Shafer theory using a Bayesian network, which allows the network to provide a measure of uncertainty along with its output.

References

Ai, L.; Wang, J.; and Wang, X. 2008. Multi-features fusion diagnosis of tremor based on artificial neural network and D-S evidence theory. Signal Processing 88(12):2927–2935.

Carley, K. M., and Palmquist, M. 1992. Extracting, representing and analyzing mental models. Social Forces 70(3):601–636.

Davey, B. A., and Priestley, H. A. 2002. Introduction to Lattices and Order. Cambridge University Press, 2nd edition.

Donell, M. L., and Lehner, P. E. 1993. Uncertainty handling and cognitive biases in knowledge engineering. IEEE Transactions on Systems, Man and Cybernetics 23(2):563–570.

Druzdzel, M., and van der Gaag, L. 1995. Elicitation of probabilities for belief networks: Combining qualitative and quantitative information. In Uncertainty in Artificial Intelligence: Proceedings of the 11th Conference, 141–148. Los Altos, CA: Morgan Kaufmann.

Fiot, C.; Saptawati, G.; Laurent, A.; and Teisseire, M. 2008. Learning Bayesian network structure from incomplete data without any assumption. In Haritsa, J.; Kotagiri, R.; and Pudi, V., eds., Database Systems for Advanced Applications, volume 4947 of Lecture Notes in Computer Science, 408–423. Springer Berlin/Heidelberg. doi:10.1007/978-3-540-78568-2_30.

Forester, J.; Bley, D.; Cooper, S.; Lois, E.; Siu, N.; Kolaczkowski, A.; and Wreathall, J. 2004. Expert elicitation approach for performing ATHEANA quantification. Reliability Engineering & System Safety 83(2):207–220.

Gordon, J., and Shortliffe, E. H. 1990. The Dempster-Shafer theory of evidence. In Shafer, G., and Pearl, J., eds., Readings in Uncertain Reasoning, 529–539. San Mateo, California: Morgan Kaufmann Publishers.

Halpern, J. Y. 2003. Reasoning about Uncertainty. Cambridge, Massachusetts: The MIT Press.

Heckerman, D. 1997. Bayesian networks for data mining. Data Mining and Knowledge Discovery 1:79–119.

Kjaerulff, U. B., and Madsen, A. L. 2008. Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Springer.

Nadkarni, S., and Shenoy, P. 2004. A causal mapping approach to constructing Bayesian networks. Decision Support Systems 38:259–281.

Neapolitan, R. E. 2004. Learning Bayesian Networks. Upper Saddle River, NJ: Prentice Hall.

O'Hagan, A., and Oakley, J. E. 2004. Probability is perfect, but we can't elicit it perfectly. Reliability Engineering & System Safety 85(1–3):239–248.

Olumuş, H., and Erbaş, S. O. 2004. Determining the conditional probabilities in Bayesian networks. Hacettepe Journal of Mathematics and Statistics 33:69–76.

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

Simon, C., and Weber, P. 2006. Bayesian networks implementation of Dempster-Shafer theory to model reliability uncertainty. In Proc. of the 1st International Conference on Availability, Reliability and Security (ARES '06).

Simon, C.; Weber, P.; and Levrat, E. 2007. Bayesian networks and evidence theory to model complex systems reliability. Journal of Computers 2(1):33–43.

van der Gaag, L.; Renooij, S.; Witteman, C.; Aleman, B.; and Taal, B. 1999. How to elicit many probabilities. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 647–654. Morgan Kaufmann Publishers.