Target Beliefs for SME-oriented, Bayesian Network-based Modeling

Robert Schrag                          Edward Wright, Robert Kerr, Robert Johnson
Haystax Technology                     Haystax Technology
11210 Corsica Mist Ave                 8251 Greensboro Dr, Suite 1000
Las Vegas, NV 89135                    McLean, VA 22102

BMAW 2016 - Page 21 of 59

Abstract

Our framework supporting non-technical subject matter experts' authoring of useful Bayesian networks has presented requirements for fixed probability soft or virtual evidence findings that we refer to as target beliefs. We describe exogenously motivated target belief requirements for model nodes lacking explicit priors and mechanistically motivated requirements induced by logical constraints over nodes that in the framework are strictly binary. Compared to the best published results, our target belief satisfaction methods are competitive in result quality and processing time on much larger problems.

1. INTRODUCTION

The variety of soft or virtual evidence finding on a Bayesian network (BN) node in which a specified probability distribution must be maintained during BN inference—called a fixed probability finding by Ben Mrad et al. (2015) and called a target belief here—has received limited attention. Published results for inference algorithms respecting such findings have addressed small, artificial problems including at most 15 nodes (Peng et al., 2010; Zhang et al., 2008).

Our work on one real application has required addressing dozens of such findings in a BN comprising hundreds of nodes. In this context, target beliefs are motivated by modelers' need to address authoritative sources exogenous to the model itself, where beliefs should hold for selected non-root model nodes—i.e., nodes lacking explicit prior probability distributions (that otherwise might be used to achieve target beliefs directly).

For example, if a binary node Divorces appears deep in a person risk assessment network as an indicator of a top-level binary node Trustworthy, usually (without target beliefs or other node findings) the network's computed belief in Divorces will depend on the network's conditional probability tables (CPTs)[1]—not on a published statistic about the divorce rate in an intended subject population. To make our model's belief in Divorces agree with the exogenous statistic, a modeler can:

1. Adjust CPTs throughout the model to agree with the exogenous specification.
2. Invoke Jeffrey's rule (Jeffrey, 1983) to compute a likelihood finding on Divorces that achieves the specified belief.
3. Specify a target belief for Divorces and rely on target belief satisfaction machinery to achieve the target.

The first option is not entirely compatible with our modeling framework.[2] The modeler's manual effort under either of the first two options may be undermined as soon as s/he modifies the model again.[3] The last option offloads the work of target belief satisfaction to an automated process—at the expense of executing that process as often as necessary. Execution time may be acceptable for a given use case if the model is small, if it is not modified often, or if model development is sufficiently simplified under this approach to enhance overall productivity. As we intend our framework to be subject matter expert- (SME-)friendly, this option is attractive. The more we can free a modeler to concentrate on higher-level decisions with greater domain impact, the more and better models s/he should be able to deliver.

[1] Including top-level node priors as a degenerate case.
[2] Our framework automatically computes CPTs (see section 2) to reflect a modeler's specified strength with which a child node (counter-)indicates its parent node. So, modifying CPTs is appropriate only when modifying these strengths is. Likewise, the representation would not naturally accommodate a conventional approach to machine learning of CPT entries.
[3] In principle, any of a large variety of modifications—including more invocations of this option to address additional exogenous probabilities—could affect computed belief in Divorces.

Our work adapting the framework to realize probabilistic argument maps for intelligence analysis (Schrag et al., 2016a; 2016b) has surfaced powerful representations (Logic constraints—see section 4) that can improve model clarity and correctness and that often require target beliefs.

In the following sections, we outline the framework, our large person risk assessment model, and the view of framework models as probabilistic argument maps. We explain how Logic constraints can improve arguments (models) and how target beliefs can support such constraints. We briefly review existing competitive target belief processing methods, then describe our own method and results.

2. SME-ORIENTED MODELING FRAMEWORK

We developed the framework to facilitate creation of useful BNs by non-technical SMEs. Faced with the challenge of operationalizing SMEs' policy-guided reasoning about person trustworthiness in a comprehensive risk model (Schrag et al., 2014), we first developed a model encoding hundreds of policy statements. The need for SMEs both to understand the model and to author its elements inspired us to develop and apply a technical approach using exclusively binary random variables (BN nodes) over the domain {true, false}. This led us to an overall representation that happens to extend standard argument maps (CIA, 2006) with Bayesian probabilistic reasoning (Schrag et al., 2016a; 2016b).

In the framework, every node (or argument map statement[4]) is a Hypothesis. Some Hypotheses are Logic nodes whose CPTs are deterministic. Connecting the nodes are links whose types are listed in Table 1. Argument maps' SupportedBy and RefutedBy links correspond to our IndicatedBy and CounterIndicatedBy links.

Table 1: Framework link types (center column). For the last two link types, the argument map-downstream statement (BN-downstream node) is a Logic node.

  Argument map-          IndicatedBy              Argument map-
  downstream[5]          CounterIndicatedBy       upstream
  statement              MitigatedBy              statement(s)
                         RelevantIf
                         OppositeOf
                         ImpliedByConjunction
                         ImpliedByDisjunction

We encode strengths for non-Logic node-input links (first four rows of Table 1) using fixed odds ratios per Figure 1.

Figure 1: Odds ratios for discrete link strengths—odds against on the left (1:16 through 1:1, log2 odds –4 through 0) and odds in favor on the right (2:1 through 16:1, log2 odds 1 through 4), labeled Weakly, Moderately, Strongly, VeryStrongly, and Absolutely. Absolutely is intended as logical implication. We do not otherwise commit SMEs to absolute certainty.

[4] Our binary BN nodes correspond to propositions bearing truth values. In the argument map point of view, these propositions may be understood to be statements.
[5] Per argument map convention, "downstream" is left, "upstream" right in the left-flowing argument map of Figure 3. Except for Logic nodes, this is opposite of links' causal direction in BNs.

A framework process (Wright et al., 2015) converts specifications into corresponding BNs. The conversion process recognizes a pattern of link types incident on a given node and constructs an appropriate CPT reflecting specified polarities and strengths. The SME thus works in a graphical user interface (GUI) with an argument map representation (as if at a "dashboard"), and BN mechanics and minutiae all remain conveniently "under the hood."

The framework includes stock noisyOr and noisyAnd distributions (bearing a standard Leak parameter) for BN nodes with more than one parent. While these have so far been sufficient in our modeling efforts, we also could fall back to fine distribution specification. We have deliberately designed the framework to skirt standard CPT elicitation, which can tend to fatigue SMEs. Consider an indicator of h different Hypotheses, so with h BN parents and 2^h CPT rows. Suppose belief is discretized on a 7-point scale.[6] Then standard, row-by-row elicitation requires 2^h entries. With noisyOr or noisyAnd, we need only h entries bearing a polarity and strength for each parent, plus a Leak value for the distribution.

We are working to make modeling in the framework more accessible to SMEs, particularly via model editing capabilities in the GUI exhibited in Figure 3. (Schrag et al., 2016a) describes our framework encoding of an analyst's argument, a favorable comparison of resulting modeled probabilities to analyst-computed ones, and a favorable comparison of CPTs generated by the framework vs. elicited directly from analysts.

[6] As (Karvetski et al., 2013) note, the inference quality of models developed this way usually rivals that of models developed with arbitrary-precision CPTs.

3. PERSON RISK MODEL WITH EXOGENOUS BELIEF REQUIREMENTS

Our person risk assessment application includes a core generic person BN accounting for interactions among beliefs about random variables representing different person attribute concepts like those in Figure 2.

Figure 2: Partial generic person attribute concept BN (top), with related event categories (bottom). Concept nodes include Trustworthy, Reliable, CommitsMisdemeanor, CommittedToSchool, and CommittedToCareer; event categories include law enforcement, school, and employment events. BN influences point (causally) from indicated concept hypothesis to indicating concept. Stronger indications have thicker arrows. A single negative indication has a red, double-lined arrow.

The framework processes a given person's event evidence to specialize this generic BN into a person-specific BN (Schrag et al., 2014).

We have specified target beliefs for some two dozen nodes in the generic person network. By processing the target beliefs in an event evidence-free context, we ensure that events have the effects intended, respecting both indication strengths and exogenous statistics.[7]

[7] Such a dividing line between generic model and evidence may not be so bright in a probabilistic argument map, where an intelligence analyst may enter both hypothesis and evidence nodes incrementally.

4. INTELLIGENCE ANALYSIS MODEL MOTIVATING REQUIREMENTS FROM LOGIC CONSTRAINTS

Figure 3 is a screenshot of a model addressing the CIA's Iraq retaliation scenario (Heuer, 2013)[8], where Iraq might respond to US forces' bombing of its intelligence headquarters by conducting major, minor, or no terror attacks, given limited evidence about Saddam Hussein's disposition and public statements, Iraq's historical responses, and the status of Iraq's national security apparatus. This model emphasizes Saddam's incentives to act. By setting a hard finding of false on the incentive-collecting node SaddamWins, we can examine computed beliefs under Saddam's worst-case scenario (and, by comparing this to his best-case scenario, determine that conducting major terror attacks is not his best move). See (Schrag et al., 2016a) for details.

[8] See chapter 8, "Analysis of Competing Hypotheses."

Figure 3: Statement nodes are connected by positive (solid grey line) and negative (dashed grey line) indication links of various strengths (per line thicknesses). Argument flow (from evidence to outcomes) is from right to left—e.g., SaddamWins is strongly indicated by SaddamKeepsFace. Outcome hypothesis nodes are circled in yellow. SaddamWins (hard finding false) captures Saddam's incentives to act or not. Belief bars' tick marks fall on a linear scale. Colors are explained in (Schrag et al., 2016a), also (Schrag et al., 2016b).

In developing the model in Figure 3, we identified some representation and reasoning shortcomings for which we are now implementing responsive capabilities (Schrag et al., 2016b). Relevant to our discussion here, TerrorAttacksFail (likewise TerrorAttacksSucceed) should be allowed to be true only when TerrorAttacks also is true.

We are working towards Logic nodes supporting any propositional expression using unary, binary, or higher-arity operators[9]. When a Logic statement has a hard true finding[10], we refer to it as a Logic constraint, otherwise as a summarizing Logic statement.

We know that an attempted action can succeed or fail only if it occurs. By explicitly modeling (as Hypotheses) both of the potential action results and adding a Logic constraint[11], we can force zero probability for every excluded truth value combination, improving the model. See Figure 4. The constraint node (left, in the right model fragment) ensures that the model will believe in attack success/failure only when an attack actually occurs. Setting the hard true finding on this node turns the summarizing Logic statement (left, in the left fragment) into the Logic constraint—but also distorts the model's computed probabilities for the three Hypotheses. Presuming these probabilities have been deliberately engineered by the modeler, our framework must restore them. It does so by implementing (bottom fragment) a target belief (per the ConstraintTBC node) on one of the Hypotheses.

Figure 4: Logic constraints can help ensure sound reasoning.

[9] See, e.g., https://en.wikipedia.org/wiki/Truth_table.
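The occurrence/success/failure constraint can be sanity-checked outside the framework by brute-force enumeration. The sketch below is plain illustrative Python, not framework code (the lowercase names abbreviate the model's statement names); it lists the combinations of (occurs, succeeds, fails) that the constraint admits—exactly three—and hence the five combinations that a hard true finding on the constraint node forces to zero probability.

```python
from itertools import product

def constraint(occurs: bool, succeeds: bool, fails: bool) -> bool:
    """If the attack occurs, exactly one of succeeds/fails holds (xor);
    if it does not occur, neither holds (nor)."""
    if occurs:
        return succeeds != fails
    return not (succeeds or fails)

# Enumerate all 8 truth-value combinations of the three Hypotheses.
allowed = [combo for combo in product([True, False], repeat=3)
           if constraint(*combo)]
excluded = [combo for combo in product([True, False], repeat=3)
            if not constraint(*combo)]
```

A Logic constraint node carrying this deterministic CPT, with a hard true finding, zeroes the joint probability of every combination in `excluded`; target belief machinery then restores the modeler's engineered marginals for the three Hypotheses.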
[10] A likelihood finding could be used to implement a soft constraint.
[11] This constraint can be rendered (abbreviating statement names) as (or (and occurs (xor succeeds fails)) (and (not occurs) (nor succeeds fails))), or more compactly via an if-then-else logic function (notated ite) as (ite occurs (xor succeeds fails) (nor succeeds fails))—if an attack occurs, it either succeeds or fails; else it neither succeeds nor fails.

We implement a target belief either (depending on purpose) using a BN node like ConstraintTBC or (equivalently) via a likelihood finding on the subject BN node. The GUI does not ordinarily expose an auxiliary node like ConstraintTBC to a SME/analyst-class user.

This example is for illustration. We can implement this particular BN pattern without target beliefs. We also could implement absolute-strength IndicatedBy links as simple implication Logic constraints. However, this would not naturally accommodate one of these links' key properties—the ability to specify degree of belief in the link's upstream node when the downstream node is true—relevant because we can infer nothing about P given P → Q and knowing Q to be true. It also demands two target belief specs that tend to compete. We are working to identify more Logic constraint patterns that can be implemented without target beliefs and to generalize specification of belief degree for any underdetermined entries in a summarizing Logic statement's CPT.

5. TARGET BELIEF PROCESSING

Ben Mrad et al. (2015) survey BN inference methods addressing fixed probability findings—our target beliefs. The most recent published results (Peng et al., 2010) address problems with no more than 15 nodes (all binary). Apparently, earlier approaches materialized full joint distributions—these authors anecdotally reported late-breaking results using a BN representation, with dramatically improved efficiency. Ben Mrad et al. report related capabilities in the commercial BN tools Netica and BayesiaLab. Netica's "calibration" findings are concerned with comparing predictions to real data and could help identify where target beliefs were needed, but would do nothing to satisfy them. We have not experimented with BayesiaLab.

While our performance results may similarly be construed as anecdotal—we have not systematically explored a relevant problem space—we have addressed a much larger problem. Our person risk assessment BN includes over 600 nodes and 26 target beliefs.

The basic scheme of our target belief processing approach is to interleave applications of Jeffrey's rule[12] with standard BN inference. Intuitively, each iteration—or "fitting step" (Zhang et al., 2008)—measures the difference between affected nodes' currently computed beliefs and specified target beliefs, makes changes to bring one or more nodes closer to target, and propagates these changes in BN inference. We continue iterating until a statistic over computed-vs.-target belief differences meets a desired criterion, or until reaching a limit on iterations, in which case we report failure. Just as for hard findings and likelihood findings, not all sets of target beliefs can be achieved simultaneously. In our intended incremental model development concept of operations (CONOPS), the framework's report that a latest-asserted target belief induces unsatisfiability should be taken as a signal that a modeling issue requires attention—much as would the similar report about a latest-asserted CPT.

We have implemented the following refinements to this basic scheme, improving performance.

1. Measure beliefs on a (modified) log odds scale.
2. Conservatively[13] apply Jeffrey's rule to all affected nodes in early iterations/fitting steps, then in late steps select for adjustment just the node with the greatest difference between computed and target beliefs.
3. Save the work from previous target belief processing for a given model (e.g., under edit) to support fast incremental operation.

[12] See (Jeffrey, 1983), as mentioned in section 1.
[13] See section 5.2.

5.1 MODIFIED LOG ODDS BELIEF MEASUREMENT

Calculating the differences between beliefs measured on a scale in the log odds family, vs. on a linear scale, better reflects differences' actual impacts. We use the function depicted in Figure 5—a variation on log odds in which each factor of 2 less than even odds (valued at 0) loses one unit of distance that we refer to as a bit. So, for belief = 0.125 we calculate –2 bits.

Figure 5: Belief transformation function (modified log odds) used in calculating computed-vs.-target belief differences.

We express differences between beliefs in terms of such bits. So, difference(0.999, 0.87) = 7.02 bits and difference(0.87, 0.76) = 0.90 bits, whereas both pairs of untransformed beliefs (that is, (0.999, 0.87) and (0.87, 0.76)) have the same ratio, 1.14.[14] The transformation seems to inhibit oscillations among competing target beliefs.

[14] This difference metric is more conservative than the Kullback-Leibler distance or cross-entropy metric used in (Peng et al., 2010)'s I-divergence calculation. The absolute value of this function also has the advantage of being symmetric.

5.2 MULTIPLE ADJUSTING IN ONE FITTING STEP

Moving all affected nodes all the way to their target beliefs in one fitting step is too aggressive in this model. We can get closer to a solution by adjusting more conservatively. We found that applying Jeffrey's rule to take affected variables {1/2, 1/3, 1/4, ...} of the way toward their target beliefs in successive fitting steps worked better than scaling calculated differences by any fixed proportion. This trick seems to be advantageous just for the first two or three fitting steps, after which single-node adjustments become more effective.

Incorporating both this refinement and the preceding one, and running with a maximum belief difference of 0.275 bits for any node (yielding adequate model fidelity for our application), we complete target belief processing in 19 seconds (running inside a Linux virtual machine on a 2012-vintage Dell Precision M4800 Windows laptop).[15] That is not necessarily GUI-fast, but this is a larger model than many of our SME users may ever develop. Fitting steps took a little less than one second on average, with each step's processing dominated by the single call to BN inference.

These results remain practically anecdotal, as we have so far developed in our framework only this one large model including many target beliefs. Experience with different models may lead to more generally useful values for run-time parameters.

[15] We found that tightening tolerance by a factor of 6.6 increased run time by a factor of 3.0.

5.3 INCREMENTAL OPERATION

Under incremental operation, we execute only single-node fitting steps, as individual model edits usually have limited effect on overall target belief satisfaction. So far, we have experimented with incremental operation only for our person risk model.

Over two runs (with target beliefs processed in original input order vs. reversed):

- Average processing times per affected node were 2.1 and 2.3 seconds, respectively. Individual target beliefs processed in about 1.1 seconds or less about half the time. Figure 6 plots processing times for the first run, by affected node number, including a 4-node moving average.
- The least number of fitting steps was 0, the greatest 17 (taking from 0 to 8.7 seconds).
- Total run times were 54 and 59 seconds, respectively.

So, batch (vs. incremental) processing can be advantageous, depending on CONOPS and use case.

Figure 6: Run-time by affected node increment (seconds, 0–10, against node number, 1–25), with 4-node moving average window.

6. CONCLUSION

Target beliefs have an important place in our SME-oriented modeling framework, where their processing is supported effectively by the methods described here. We might reduce or eliminate requirements for exogenous target beliefs by pushing SMEs towards arbitrary-precision link strengths (see Schrag et al., 2016b), but we are counting on target belief machinery to implement Logic constraints that make the SMEs' accessible modeling representation more expressive and versatile—ultimately more powerful. We expect target belief processing to be well within GUI response times for small models, including, per (Burns, 2015), the vast majority of intelligence analysis problems amenable to our argument mapping approach. We anticipate further work, especially to develop theory and practice for efficient implementation of different Logic constraint patterns.

Acknowledgements

We gratefully acknowledge the stimulating context of broader collaboration we have shared with other co-authors of (Schrag et al., 2016a; 2016b).

References

Ali Ben Mrad, Veronique Delcroix, Sylvain Piechowiak, Philip Leicester, and Mohamed Abid (2015), "An explication of uncertain evidence in Bayesian networks: likelihood evidence and probabilistic evidence," Applied Intelligence, published online 20 June 2015.

Kevin Burns (2015), "Bayesian HELP: Assisting Inferences in All-Source Intelligence," Cognitive Assistance in Government, Papers from the AAAI 2015 Fall Symposium, 7–13.

CIA Directorate of Intelligence (2006), "A Tradecraft Primer: The Use of Argument Mapping," Tradecraft Review 3(1), Kent Center for Analytic Tradecraft, Sherman Kent School.

Richards J. Heuer, Jr. (2013), Psychology of Intelligence Analysis, Central Intelligence Agency Historical Document. https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis (Posted: Mar 16, 2007. Last Updated: Jun 26, 2013.)

R. Jeffrey (1983), The Logic of Decision, 2nd Edition, University of Chicago Press.

Christopher W. Karvetski, Kenneth C. Olson, Donald T. Gantz, and Glenn A. Cross (2013), "Structuring and analyzing competing hypotheses with Bayesian networks for intelligence analysis," EURO J Decis Process 1:205–231.

Yun Peng, Shenyong Zhang, and Rong Pan (2010), "Bayesian Network Reasoning with Uncertain Evidences," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 18(5):539–564.

Robert Schrag, Edward Wright, Robert Kerr, and Bryan Ware (2014), "Processing Events in Probabilistic Risk Assessment," 9th International Conference on Semantic Technologies for Intelligence, Defense, and Security (STIDS).

Robert Schrag, Joan McIntyre, Melonie Richey, Kathryn Laskey, Edward Wright, Robert Kerr, Robert Johnson, Bryan Ware, and Robert Hoffman (2016a), "Probabilistic Argument Maps for Intelligence Analysis: Completed Capabilities," 16th Workshop on Computational Models of Natural Argument.

Robert Schrag, Edward Wright, Robert Kerr, Robert Johnson, Bryan Ware, Joan McIntyre, Melonie Richey, Kathryn Laskey, and Robert Hoffman (2016b), "Probabilistic Argument Maps for Intelligence Analysis: Capabilities Underway," 16th Workshop on Computational Models of Natural Argument.

Edward Wright, Robert Schrag, Robert Kerr, and Bryan Ware (2015), "Automating the Construction of Indicator-Hypothesis Bayesian Networks from Qualitative Specifications," Haystax Technology technical report, https://labs.haystax.com/wp-content/uploads/2016/06/BMAW15-160303-update.pdf.

Shenyong Zhang, Yun Peng, and Xiaopu Wang (2008), "An Efficient Method for Probabilistic Knowledge Integration," Proceedings of the 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Dayton, November 3–5, vol. 2, pp 179–182.
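Illustrative sketch. The paper gives section 5.1's belief transformation only by example, so the following Python is a reconstruction, not the authors' code: the piecewise form shown reproduces the stated –2 bits at belief 0.125 and the 7.02-bit difference for (0.999, 0.87) (the paper's 0.90-bit figure for (0.87, 0.76) comes out closer to 0.88 under this reading). The `jeffrey_likelihood_ratio` helper is the standard virtual-evidence computation for a binary node—the likelihood ratio that moves a computed belief to a target belief in one propagation—and is likewise not an excerpt from the framework.

```python
import math

def to_bits(belief: float) -> float:
    """Modified log odds: 0 at belief 0.5; each factor of 2 in the
    belief (or, above 0.5, in its complement) is one 'bit' of distance."""
    if belief <= 0.5:
        return math.log2(2.0 * belief)
    return -math.log2(2.0 * (1.0 - belief))

def difference(p: float, q: float) -> float:
    """Computed-vs.-target belief difference, in bits."""
    return abs(to_bits(p) - to_bits(q))

def jeffrey_likelihood_ratio(current: float, target: float) -> float:
    """Likelihood ratio L(true)/L(false) of a virtual-evidence finding
    that shifts a binary node's belief from current to target."""
    return (target * (1.0 - current)) / (current * (1.0 - target))

def apply_likelihood_ratio(current: float, ratio: float) -> float:
    """Posterior belief after propagating the likelihood ratio."""
    return ratio * current / (ratio * current + (1.0 - current))
```

For a fractional fitting step as in section 5.2, one would aim not at the target itself but at an intermediate belief part of the way toward it before computing the likelihood ratio.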