Target Beliefs for SME-oriented, Bayesian Network-based Modeling

Robert Schrag                          Edward Wright, Robert Kerr, Robert Johnson
Haystax Technology                     Haystax Technology
11210 Corsica Mist Ave                 8251 Greensboro Dr, Suite 1000
Las Vegas, NV 89135                    McLean, VA 22102

BMAW 2016 - Page 21 of 59

Abstract

Our framework supporting non-technical subject matter experts' authoring of useful Bayesian networks has presented requirements for fixed probability soft or virtual evidence findings that we refer to as target beliefs. We describe exogenously motivated target belief requirements for model nodes lacking explicit priors and mechanistically motivated requirements induced by logical constraints over nodes that in the framework are strictly binary. Compared to the best published results, our target belief satisfaction methods are competitive in result quality and processing time on much larger problems.

1. INTRODUCTION

The variety of soft or virtual evidence finding on a Bayesian network (BN) node in which a specified probability distribution must be maintained during BN inference—called a fixed probability finding by Ben Mrad et al. (2015) and called a target belief here—has received limited attention. Published results for inference algorithms respecting such findings have addressed small, artificial problems including at most 15 nodes (Peng et al., 2010; Zhang et al., 2008).

Our work on one real application has required addressing dozens of such findings in a BN comprising hundreds of nodes. In this context, target beliefs are motivated by modelers' need to address authoritative sources exogenous to the model itself, where beliefs should hold for selected non-root model nodes—i.e., nodes lacking explicit prior probability distributions (that otherwise might be used to achieve target beliefs directly).

For example, if a binary node Divorces appears deep in a person risk assessment network as an indicator of a top-level binary node Trustworthy, usually (without target beliefs or other node findings) the network's computed belief in Divorces will depend on the network's conditional probability tables (CPTs)[1]—not on a published statistic about the divorce rate in an intended subject population. To make our model's belief in Divorces agree with the exogenous statistic, a modeler can:

1. Adjust CPTs throughout the model to agree with the exogenous specification.
2. Invoke Jeffrey's rule (Jeffrey, 1983) to compute a likelihood finding on Divorces that achieves the specified belief.
3. Specify a target belief for Divorces and rely on target belief satisfaction machinery to achieve the target.

The first option is not entirely compatible with our modeling framework.[2] The modeler's manual effort under either of the first two options may be undermined as soon as s/he modifies the model again.[3] The last option offloads the work of target belief satisfaction to an automated process—at the expense of executing that process as often as necessary. Execution time may be acceptable for a given use case if the model is small, if it is not modified often, or if model development is sufficiently simplified under this approach to enhance overall productivity. As we intend our framework to be subject matter expert- (SME-)friendly, this option is attractive. The more we can free a modeler to concentrate on higher-level decisions with greater domain impact, the more and better models s/he should be able to deliver.

[1] Including top-level node priors as a degenerate case.
[2] Our framework automatically computes CPTs (see section 2) to reflect a modeler's specified strength with which a child node (counter-)indicates its parent node. So, modifying CPTs is appropriate only when modifying these strengths is. Likewise, the representation would not naturally accommodate a conventional approach to machine learning of CPT entries.
[3] In principle, any of a large variety of modifications—including more invocations of this option to address additional exogenous probabilities—could affect computed belief in Divorces.

Our work adapting the framework to realize probabilistic argument maps for intelligence analysis (Schrag et al., 2016a; 2016b) has surfaced powerful representations (Logic constraints—see section 4) that can improve model clarity and correctness and that often require target beliefs.

In the following sections, we outline the framework, our large person risk assessment model, and the view of framework models as probabilistic argument maps. We explain how Logic constraints can improve arguments (models) and how target beliefs can support such constraints. We briefly review existing competitive target belief processing methods, then describe our own method and results.

2. SME-ORIENTED MODELING FRAMEWORK

We developed the framework to facilitate creation of useful BNs by non-technical SMEs. Faced with the challenge of operationalizing SMEs' policy-guided reasoning about person trustworthiness in a comprehensive risk model (Schrag et al., 2014), we first developed a model encoding hundreds of policy statements. The need for SMEs both to understand the model and to author its elements inspired us to develop and apply a technical approach using exclusively binary random variables (BN nodes) over the domain {true, false}. This led us to an overall representation that happens to extend standard argument maps (CIA, 2006) with Bayesian probabilistic reasoning (Schrag et al., 2016a; 2016b).

In the framework, every node (or argument map statement[4]) is a Hypothesis. Some Hypotheses are Logic nodes whose CPTs are deterministic. Connecting the nodes are links whose types are listed in Table 1. Argument maps' SupportedBy and RefutedBy links correspond to our IndicatedBy and CounterIndicatedBy links.

Table 1: Framework link types (center column). For the last two link types, the argument map-downstream statement (BN-downstream node) is a Logic node.

  Argument map-          IndicatedBy              Argument map-
  downstream[5]          CounterIndicatedBy       upstream
  statement              MitigatedBy              statement(s)
                         RelevantIf
                         OppositeOf
                         ImpliedByConjunction
                         ImpliedByDisjunction

We encode strengths for non-Logic node-input links (first four rows of Table 1) using fixed odds ratios per Figure 1.

Figure 1: Odds ratios for discrete link strengths—odds against on the left (1:16 through 1:1, log2 odds –4 through 0) and odds in favor on the right (2:1 through 16:1, log2 odds 1 through 4), labeled Weakly, Moderately, Strongly, VeryStrongly, and Absolutely. Absolutely is intended as logical implication. We do not otherwise commit SMEs to absolute certainty.

[4] Our binary BN nodes correspond to propositions bearing truth values. In the argument map point of view, these propositions may be understood to be statements.
[5] Per argument map convention, "downstream" is left, "upstream" right in the left-flowing argument map of Figure 3. Except for Logic nodes, this is opposite of links' causal direction in BNs.

A framework process (Wright et al., 2015) converts specifications into corresponding BNs. The conversion process recognizes a pattern of link types incident on a given node and constructs an appropriate CPT reflecting specified polarities and strengths. The SME thus works in a graphical user interface (GUI) with an argument map representation (as if at a "dashboard"), and BN mechanics and minutiae all remain conveniently "under the hood."

The framework includes stock noisyOr and noisyAnd distributions (bearing a standard Leak parameter) for BN nodes with more than one parent. While these have so far been sufficient in our modeling efforts, we also could fall back to fine distribution specification. We have deliberately designed the framework to skirt standard CPT elicitation, which can tend to fatigue SMEs. Consider an indicator of h different Hypotheses, so with h BN parents and 2^h CPT rows. Suppose belief is discretized on a 7-point scale.[6] Then standard, row-by-row elicitation requires 2^h entries. With noisyOr or noisyAnd, we need only h entries bearing a polarity and strength for each parent, plus a Leak value for the distribution.

We are working to make modeling in the framework more accessible to SMEs, particularly via model editing capabilities in the GUI exhibited in Figure 3. (Schrag et al., 2016a) describes our framework encoding of an analyst's argument, a favorable comparison of resulting modeled probabilities to analyst-computed ones, and a favorable comparison of CPTs generated by the framework vs. elicited directly from analysts.

[6] As (Karvetski et al., 2013) note, the inference quality of models developed this way usually rivals that of models developed with arbitrary-precision CPTs.

3. PERSON RISK MODEL WITH EXOGENOUS BELIEF REQUIREMENTS

Our person risk assessment application includes a core generic person BN accounting for interactions among beliefs about random variables representing different person attribute concepts like those in Figure 2.

Figure 2: Partial generic person attribute concept BN (top), with related event categories (bottom). Concept nodes include Trustworthy, Reliable, CommitsMisdemeanor, CommittedToSchool, and CommittedToCareer; event categories include law enforcement, school, and employment events. BN influences point (causally) from indicated concept hypothesis to indicating concept. Stronger indications have thicker arrows. A single negative indication has a red, double-lined arrow.

The framework processes a given person's event evidence to specialize this generic BN into a person-specific BN (Schrag et al., 2014).

We have specified target beliefs for some two dozen nodes in the generic person network. By processing the target beliefs in an event evidence-free context, we ensure that events have the effects intended, respecting both indication strengths and exogenous statistics.[7]

[7] Such a dividing line between generic model and evidence may not be so bright in a probabilistic argument map, where an intelligence analyst may enter both hypothesis and evidence nodes incrementally.

4. INTELLIGENCE ANALYSIS MODEL MOTIVATING REQUIREMENTS FROM LOGIC CONSTRAINTS

Figure 3 is a screenshot of a model addressing the CIA's Iraq retaliation scenario (Heuer, 2013)[8], where Iraq might respond to US forces' bombing of its intelligence headquarters by conducting major, minor, or no terror attacks, given limited evidence about Saddam Hussein's disposition and public statements, Iraq's historical responses, and the status of Iraq's national security apparatus. This model emphasizes Saddam's incentives to act. By setting a hard finding of false on the incentive-collecting node SaddamWins, we can examine computed beliefs under Saddam's worst-case scenario (and, by comparing this to his best-case scenario, determine that conducting major terror attacks is not his best move). See (Schrag et al., 2016a) for details.

[8] See chapter 8, "Analysis of Competing Hypotheses."

Figure 3: Statement nodes are connected by positive (solid grey line) and negative (dashed grey line) indication links of various strengths (per line thicknesses). Argument flow (from evidence to outcomes) is from right to left—e.g., SaddamWins is strongly indicated by SaddamKeepsFace. Outcome hypothesis nodes are circled in yellow. SaddamWins (hard finding false) captures Saddam's incentives to act or not. Belief bars' tick marks fall on a linear scale. Colors are explained in (Schrag et al., 2016a), also (Schrag et al., 2016b).

In developing the model in Figure 3, we identified some representation and reasoning shortcomings for which we are now implementing responsive capabilities (Schrag et al., 2016b). Relevant to our discussion here, TerrorAttacksFail (likewise TerrorAttacksSucceed) should be allowed to be true only when TerrorAttacks also is true.

We are working towards Logic nodes supporting any propositional expression using unary, binary, or higher-arity operators[9]. When a Logic statement has a hard true finding[10], we refer to it as a Logic constraint, otherwise as a summarizing Logic statement.

We know that an attempted action can succeed or fail only if it occurs. By explicitly modeling (as Hypotheses) both of the potential action results and adding a Logic constraint[11], we can force zero probability for every excluded truth value combination, improving the model. See Figure 4. The constraint node (left, in the right model fragment) ensures that the model will believe in attack success/failure only when an attack actually occurs. Setting the hard true finding on this node turns the summarizing Logic statement (left, in the left fragment) into the Logic constraint—but also distorts the model's computed probabilities for the three Hypotheses. Presuming these probabilities have been deliberately engineered by the modeler, our framework must restore them. It does so by implementing (bottom fragment) a target belief (per the ConstraintTBC node) on one of the Hypotheses.

Figure 4: Logic constraints can help ensure sound reasoning.

[9] See, e.g., https://en.wikipedia.org/wiki/Truth_table.
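The occurrence/success/failure constraint can be sanity-checked outside the framework by brute-force enumeration. The sketch below is plain illustrative Python, not framework code (the lowercase names abbreviate the model's statement names); it lists the combinations of (occurs, succeeds, fails) that the constraint admits—exactly three—and hence the five combinations that a hard true finding on the constraint node forces to zero probability.

```python
from itertools import product

def constraint(occurs: bool, succeeds: bool, fails: bool) -> bool:
    """If the attack occurs, exactly one of succeeds/fails holds (xor);
    if it does not occur, neither holds (nor)."""
    if occurs:
        return succeeds != fails
    return not (succeeds or fails)

# Enumerate all 8 truth-value combinations of the three Hypotheses.
allowed = [combo for combo in product([True, False], repeat=3)
           if constraint(*combo)]
excluded = [combo for combo in product([True, False], repeat=3)
            if not constraint(*combo)]
```

A Logic constraint node carrying this deterministic CPT, with a hard true finding, zeroes the joint probability of every combination in `excluded`; target belief machinery then restores the modeler's engineered marginals for the three Hypotheses.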
[10] A likelihood finding could be used to implement a soft constraint.
[11] This constraint can be rendered (abbreviating statement names) as (or (and occurs (xor succeeds fails)) (and (not occurs) (nor succeeds fails))), or more compactly via an if-then-else logic function (notated ite) as (ite occurs (xor succeeds fails) (nor succeeds fails))—if an attack occurs, it either succeeds or fails; else it neither succeeds nor fails.

We implement a target belief either (depending on purpose) using a BN node like ConstraintTBC or (equivalently) via a likelihood finding on the subject BN node. The GUI does not ordinarily expose an auxiliary node like ConstraintTBC to a SME/analyst-class user.

This example is for illustration. We can implement this particular BN pattern without target beliefs. We also could implement absolute-strength IndicatedBy links as simple implication Logic constraints. However, this would not naturally accommodate one of these links' key properties—the ability to specify degree of belief in the link's upstream node when the downstream node is true—relevant because we can infer nothing about P given P → Q and knowing Q to be true. It also demands two target belief specs that tend to compete. We are working to identify more Logic constraint patterns that can be implemented without target beliefs and to generalize specification of belief degree for any underdetermined entries in a summarizing Logic statement's CPT.

5. TARGET BELIEF PROCESSING

Ben Mrad et al. (2015) survey BN inference methods addressing fixed probability findings—our target beliefs. The most recent published results (Peng et al., 2010) address problems with no more than 15 nodes (all binary). Apparently, earlier approaches materialized full joint distributions—these authors anecdotally reported late-breaking results using a BN representation, with dramatically improved efficiency. Ben Mrad et al. report related capabilities in the commercial BN tools Netica and BayesiaLab. Netica's "calibration" findings are concerned with comparing predictions to real data and could help identify where target beliefs were needed, but would do nothing to satisfy them. We have not experimented with BayesiaLab.

While our performance results may similarly be construed as anecdotal—we have not systematically explored a relevant problem space—we have addressed a much larger problem. Our person risk assessment BN includes over 600 nodes and 26 target beliefs.

The basic scheme of our target belief processing approach is to interleave applications of Jeffrey's rule[12] with standard BN inference. Intuitively, each iteration—or "fitting step" (Zhang et al., 2008)—measures the difference between affected nodes' currently computed beliefs and specified target beliefs, makes changes to bring one or more nodes closer to target, and propagates these changes in BN inference. We continue iterating until a statistic over computed-vs.-target belief differences meets a desired criterion, or until reaching a limit on iterations, in which case we report failure. Just as for hard findings and likelihood findings, not all sets of target beliefs can be achieved simultaneously. In our intended incremental model development concept of operations (CONOPS), the framework's report that a latest-asserted target belief induces unsatisfiability should be taken as a signal that a modeling issue requires attention—much as would the similar report about a latest-asserted CPT.

We have implemented the following refinements to this basic scheme, improving performance.

1. Measure beliefs on a (modified) log odds scale.
2. Conservatively[13] apply Jeffrey's rule to all affected nodes in early iterations/fitting steps, then in late steps select for adjustment just the node with the greatest difference between computed and target beliefs.
3. Save the work from previous target belief processing for a given model (e.g., under edit) to support fast incremental operation.

[12] See (Jeffrey, 1983), as mentioned in section 1.
[13] See section 5.2.

5.1 MODIFIED LOG ODDS BELIEF MEASUREMENT

Calculating the differences between beliefs measured on a scale in the log odds family, vs. on a linear scale, better reflects differences' actual impacts. We use the function depicted in Figure 5—a variation on log odds in which each factor of 2 less than even odds (valued at 0) loses one unit of distance that we refer to as a bit. So, for belief = 0.125 we calculate –2 bits.

Figure 5: Belief transformation function (modified log odds) used in calculating computed-vs.-target belief differences.

We express differences between beliefs in terms of such bits. So, difference(0.999, 0.87) = 7.02 bits and difference(0.87, 0.76) = 0.90 bits, whereas both pairs of untransformed beliefs (that is, (0.999, 0.87) and (0.87, 0.76)) have the same ratio, 1.14.[14] The transformation seems to inhibit oscillations among competing target beliefs.

[14] This difference metric is more conservative than the Kullback-Leibler distance or cross-entropy metric used in (Peng et al., 2010)'s I-divergence calculation. The absolute value of this function also has the advantage of being symmetric.

5.2 MULTIPLE ADJUSTING IN ONE FITTING STEP

Moving all affected nodes all the way to their target beliefs in one fitting step is too aggressive in this model. We can get closer to a solution by adjusting more conservatively. We found that applying Jeffrey's rule to take affected variables {1/2, 1/3, 1/4, ...} of the way toward their target beliefs in successive fitting steps worked better than scaling calculated differences by any fixed proportion. This trick seems to be advantageous just for the first two or three fitting steps, after which single-node adjustments become more effective.

Incorporating both this refinement and the preceding one, and running with a maximum belief difference of 0.275 bits for any node (yielding adequate model fidelity for our application), we complete target belief processing in 19 seconds (running inside a Linux virtual machine on a 2012-vintage Dell Precision M4800 Windows laptop).[15] That is not necessarily GUI-fast, but this is a larger model than many of our SME users may ever develop. Fitting steps took a little less than one second on average, with each step's processing dominated by the single call to BN inference.

These results remain practically anecdotal, as we have so far developed in our framework only this one large model including many target beliefs. Experience with different models may lead to more generally useful values for run-time parameters.

[15] We found that tightening tolerance by a factor of 6.6 increased run time by a factor of 3.0.

5.3 INCREMENTAL OPERATION

Under incremental operation, we execute only single-node fitting steps, as individual model edits usually have limited effect on overall target belief satisfaction. So far, we have experimented with incremental operation only for our person risk model.

Over two runs (with target beliefs processed in original input order vs. reversed):

- Average processing times per affected node were 2.1 and 2.3 seconds, respectively. Individual target beliefs processed in about 1.1 seconds or less about half the time. Figure 6 plots processing times for the first run, by affected node number, including a 4-node moving average.
- The least number of fitting steps was 0, the greatest 17 (taking from 0 to 8.7 seconds).
- Total run times were 54 and 59 seconds, respectively.

So, batch (vs. incremental) processing can be advantageous, depending on CONOPS and use case.

Figure 6: Run-time by affected node increment (seconds, 0–10, against node number, 1–25), with 4-node moving average window.

6. CONCLUSION

Target beliefs have an important place in our SME-oriented modeling framework, where their processing is supported effectively by the methods described here. We might reduce or eliminate requirements for exogenous target beliefs by pushing SMEs towards arbitrary-precision link strengths (see Schrag et al., 2016b), but we are counting on target belief machinery to implement Logic constraints that make the SMEs' accessible modeling representation more expressive and versatile—ultimately more powerful. We expect target belief processing to be well within GUI response times for small models, including, per (Burns, 2015), the vast majority of intelligence analysis problems amenable to our argument mapping approach. We anticipate further work, especially to develop theory and practice for efficient implementation of different Logic constraint patterns.

Acknowledgements

We gratefully acknowledge the stimulating context of broader collaboration we have shared with other co-authors of (Schrag et al., 2016a; 2016b).

References

Ali Ben Mrad, Veronique Delcroix, Sylvain Piechowiak, Philip Leicester, and Mohamed Abid (2015), "An explication of uncertain evidence in Bayesian networks: likelihood evidence and probabilistic evidence," Applied Intelligence, published online 20 June 2015.

Kevin Burns (2015), "Bayesian HELP: Assisting Inferences in All-Source Intelligence," Cognitive Assistance in Government, Papers from the AAAI 2015 Fall Symposium, 7–13.

CIA Directorate of Intelligence (2006), "A Tradecraft Primer: The Use of Argument Mapping," Tradecraft Review 3(1), Kent Center for Analytic Tradecraft, Sherman Kent School.

Richards J. Heuer, Jr. (2013), Psychology of Intelligence Analysis, Central Intelligence Agency Historical Document. https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis (Posted: Mar 16, 2007. Last Updated: Jun 26, 2013.)

R. Jeffrey (1983), The Logic of Decision, 2nd Edition, University of Chicago Press.

Christopher W. Karvetski, Kenneth C. Olson, Donald T. Gantz, and Glenn A. Cross (2013), "Structuring and analyzing competing hypotheses with Bayesian networks for intelligence analysis," EURO J Decis Process 1:205–231.

Yun Peng, Shenyong Zhang, and Rong Pan (2010), "Bayesian Network Reasoning with Uncertain Evidences," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 18(5):539–564.

Robert Schrag, Edward Wright, Robert Kerr, and Bryan Ware (2014), "Processing Events in Probabilistic Risk Assessment," 9th International Conference on Semantic Technologies for Intelligence, Defense, and Security (STIDS).

Robert Schrag, Joan McIntyre, Melonie Richey, Kathryn Laskey, Edward Wright, Robert Kerr, Robert Johnson, Bryan Ware, and Robert Hoffman (2016a), "Probabilistic Argument Maps for Intelligence Analysis: Completed Capabilities," 16th Workshop on Computational Models of Natural Argument.

Robert Schrag, Edward Wright, Robert Kerr, Robert Johnson, Bryan Ware, Joan McIntyre, Melonie Richey, Kathryn Laskey, and Robert Hoffman (2016b), "Probabilistic Argument Maps for Intelligence Analysis: Capabilities Underway," 16th Workshop on Computational Models of Natural Argument.

Edward Wright, Robert Schrag, Robert Kerr, and Bryan Ware (2015), "Automating the Construction of Indicator-Hypothesis Bayesian Networks from Qualitative Specifications," Haystax Technology technical report, https://labs.haystax.com/wp-content/uploads/2016/06/BMAW15-160303-update.pdf.

Shenyong Zhang, Yun Peng, and Xiaopu Wang (2008), "An Efficient Method for Probabilistic Knowledge Integration," Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Dayton, November 3–5, vol. 2, pp 179–182.
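Illustrative sketch. The paper gives section 5.1's belief transformation only by example, so the following Python is a reconstruction, not the authors' code: the piecewise form shown reproduces the stated –2 bits at belief 0.125 and the 7.02-bit difference for (0.999, 0.87) (the paper's 0.90-bit figure for (0.87, 0.76) comes out closer to 0.88 under this reading). The `jeffrey_likelihood_ratio` helper is the standard virtual-evidence computation for a binary node—the likelihood ratio that moves a computed belief to a target belief in one propagation—and is likewise not an excerpt from the framework.

```python
import math

def to_bits(belief: float) -> float:
    """Modified log odds: 0 at belief 0.5; each factor of 2 in the
    belief (or, above 0.5, in its complement) is one 'bit' of distance."""
    if belief <= 0.5:
        return math.log2(2.0 * belief)
    return -math.log2(2.0 * (1.0 - belief))

def difference(p: float, q: float) -> float:
    """Computed-vs.-target belief difference, in bits."""
    return abs(to_bits(p) - to_bits(q))

def jeffrey_likelihood_ratio(current: float, target: float) -> float:
    """Likelihood ratio L(true)/L(false) of a virtual-evidence finding
    that shifts a binary node's belief from current to target."""
    return (target * (1.0 - current)) / (current * (1.0 - target))

def apply_likelihood_ratio(current: float, ratio: float) -> float:
    """Posterior belief after propagating the likelihood ratio."""
    return ratio * current / (ratio * current + (1.0 - current))
```

For a fractional fitting step as in section 5.2, one would aim not at the target itself but at an intermediate belief part of the way toward it before computing the likelihood ratio.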