An Experimental Procedure for Evaluating User-Centered Methods
                for Rapid Bayesian Network Construction


Michael Farry, Jonathan Pfautz & Zach Cox
Charles River Analytics, Inc.
625 Mount Auburn St.
Cambridge, MA 02138

Ann Bisantz & Richard Stone
Industrial and Systems Engineering
University at Buffalo
Amherst NY 14260

Emilie Roth
Roth Cognitive Engineering
Brookline, MA


Abstract

Bayesian networks (BNs) are excellent tools for reasoning about uncertainty and capturing detailed domain knowledge. However, the complexity of BN structures can pose a challenge to domain experts without a background in artificial intelligence or probability when they construct or analyze BN models. Several canonical models have been developed to reduce the complexity of BN structures, but there is little research on the accessibility and usability of these canonical models, their associated user interfaces, and the contents of the models, including their probabilistic relationships. In this paper, we present an experimental procedure to evaluate our novel Causal Influence Model structure by measuring users’ ability to construct new models from scratch, and their ability to comprehend previously constructed models. [Results of our experiment will be presented at the workshop.]

1.    INTRODUCTION AND MOTIVATION

A Bayesian network (BN) (Jensen, 2001; Pearl, 1988) is a probabilistic model used to reason under uncertainty. Successful efforts in applying Bayesian modeling to a variety of domains (e.g., computer vision (Rimey & Brown, 1994), social networks (Koelle et al., 2006), human cognition (Guarino et al., 2006; Glymour, 2001), and disease detection (Pang et al., 2004)) have inspired knowledge engineers to use BNs to capture domain knowledge from experts. However, expressing an expert’s domain knowledge in a BN is cumbersome due to the complex, tedious, and mathematical nature of conditional probability table (CPT) construction. Adding states and parents to a node quickly results in an exponential explosion in the number of CPT entries required (Pfautz et al., 2007). Canonical models such as Noisy-OR (Henrion, 1989; Pearl, 1988), Noisy-MAX (Diez & Galan, 2003; Diez, 1993; Henrion, 1989), Qualitative Probabilistic Networks (Wellman, 1990) and Influence Networks (IN) (Jensen, 1996; Rosen & Smith, 1996a; Rosen & Smith, 1996b) have been developed to mitigate this problem. In response to some issues raised by those models, and to simplify the Bayesian modeling process through novel user interface techniques, we developed a new canonical model, the Causal Influence Model (CIM) (Cox & Pfautz, 2007; Pfautz et al., 2007). The CIM paradigm was inspired by anecdotal evidence gained by developing systems for domain experts interacting with BNs and by an analysis of other canonical models to determine the constraints that limit their generalizability and applicability.

There have been few user-centered evaluation efforts to assess how (and if) canonical models help domain experts elicit their knowledge and understanding of models presented to them, or how graphical interfaces and their features and properties impact the way people create, interpret, reason with, or base actions on Bayesian networks. The purpose of our study is to provide baseline information on how people construct and describe CIMs presented and created within a graphical user interface.

1.1 BACKGROUND

A canonical model (Diez & Druzdzel, 2001) is a modeling pattern that allows probabilistic relationships between nodes to be specified by a reduced set of parameters (i.e., without completing every cell in a CPT). By assuming that the reduced parameters can still accurately represent the domain being modeled, users can quickly build a complex BN that would otherwise take a large amount of time. Most canonical models achieve their reduced parameters by assuming the independent effects of parents. This assumption allows a linear number of parameters to quantify an entire CPT; in the best-case scenario, only a single parameter per parent is needed. Canonical models can also serve as a “front-end” tool for the initial model-building effort, since the CPTs can always be refined by hand or with data at a later time. Some of the simplified patterns followed by canonical models have been motivated by the process followed when eliciting key factors and probabilistic relationships
from domain experts (O'Hagan et al., 2006; Hastie & Dawes, 2001).

A review of canonical models sheds light on the advantages and drawbacks of each model. The Influence Network (IN) model can only be used with Boolean nodes. It assumes that the child node has a baseline probability of occurring independently of any parent effects and that each parent independently influences the child to be more or less likely to be true. Since a single baseline probability for the child and a single change in probability for each parent are simple parameters for users to specify, the IN represents a powerful mechanism for capturing domain knowledge. However, since only Boolean nodes are allowed in the IN model, model flexibility is significantly reduced. BNs commonly contain nodes that represent concepts other than the occurrence or non-occurrence of events, and INs cannot be used to simplify these BNs without considerably re-architecting the model.

The Noisy-OR model is also used only with Boolean nodes and assumes that a true state in any parent can cause the child to be true independently of the other parents, with some uncertainty. Similar to INs, the main drawback of the Noisy-OR is its limitation to only Boolean nodes. The Noisy-MAX model generalizes the Noisy-OR and allows ordinal nodes at the expense of increasing the complexity of parameters. Although Noisy-MAX does work with ordinal nodes, it cannot be used with more general discrete nodes that do not have ordered states. These nodes, referred to as categorical nodes, have an arbitrary number of unordered states and usually represent the category or type of something. Qualitative Probabilistic Networks (QPNs) allow for the construction of purely qualitative relationships between nodes in a network, to abstract from the highly quantitative and numerical nature of typical Bayesian models. QPNs consider the “signs” inherent in probabilistic relationships between nodes, and consider the additive synergies between nodes to capture more complicated probabilistic relationships between them (e.g., if A and B both have a positive influence on node C, their influences may be synergistic in nature: if A and B are both true, their cumulative influence upon C may be greater than just the sum of their individual influences). QPNs allow for more qualitative model elicitation and may therefore be appropriate for interactions with non-technical experts, but they are limited in their ability to provide hard, numerical estimates of the likelihood of events.

The Causal Influence Model (CIM) is a canonical model that retains the desirable properties of the IN while providing solutions to its problems. The CIM assumes that each node is discrete and has an arbitrary number of states with arbitrary meaning. Each node has a baseline probability distribution, independent of any parent effects. Each parent independently influences these baseline probabilities to be more or less likely. The CIM also introduces simplifications that govern the generation of conditional probability relationships, enabling Boolean, ordinal, and categorical nodes to be included. A full description of the mathematical formulas that govern CIMs, including formulas to translate CIM link strengths into conditional probability tables, is provided in (Cox et al., 2007).

Studies have been conducted to analyze and mitigate complexities that arise in the construction of Bayesian models as a result of knowledge elicitation (Onisko, Druzdzel, & Wasyluk, 2001), but no studies to date have assessed the accessibility and usability of various canonical models and associated user interfaces when provided directly to domain experts. The following study investigates how users interpret and create CIMs within a particular user interface.

2.   METHOD

2.1 PARTICIPANTS

Up to twenty participants are recruited from the university community to perform the study. After providing informed consent, participants are given the Ishihara Test for color blindness. Participants who pass this screening continue with the study.

2.2 EXPERIMENTAL SYSTEM

We have developed a CIM-enabled version of our BNet.Builder product to allow us to experiment with graphical interfaces for Bayesian network modeling (Pfautz et al., 2007). Using a simple point-and-click interface, users can create, label, connect, and move nodes in the model. Users can also create and modify causal links to represent positive or negative influences between nodes and the strength of those relationships. In addition, users can post evidence to, or remove it from, any node and view the effects of posted evidence on the belief states of other nodes. Link strengths are converted into CPTs based on algorithms provided in (Cox et al., 2007; Pfautz et al., 2007). The positivity or negativity of a causal link and the link strength are represented visually by the color and thickness of the link, respectively.

To simplify model construction for this particular experiment, the CIM interface has been constrained so that all nodes are Boolean; initial beliefs are set to 0.5 for all nodes and cannot be changed directly by the user (but can change based on evidence or link strengths); and only “hard” evidence can be posted (i.e., evidence that the node is either fully true or fully false). This represents a set of simplifications we have found useful in other work, particularly among users less familiar with Bayesian modeling techniques. Our main goal in this study is to determine whether participants can reason about previously constructed CIMs and construct models to match a given situation. Since these are specific, novel, and fundamental questions with little previous research behind them, we have started with a simple case. The
inclusion of additional node types, in particular, is useful       energy in the battery. When the car is running, the
for future work in comparing CIMs to other canonical               alternator “recharges” the battery. This process only
models such as INs, Noisy-OR, and Noisy-MAX.                       works if the alternator is working, and the battery is
                                                                   new.
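As a point of reference for the canonical models named above, the following is a minimal sketch of how a Noisy-OR specification expands into a full CPT for Boolean nodes. It is not the CIM conversion, whose formulas are given in the cited technical report and are not reproduced here. The function names, the optional `leak` term, and the link probabilities are illustrative assumptions; the node labels borrow the negatively phrased form of the headlight example.

```python
from itertools import product


def full_cpt_rows(num_parents, states_per_node=2):
    """Number of parent-state combinations a hand-built CPT must cover."""
    return states_per_node ** num_parents


def noisy_or_cpt(link_probs, leak=0.0):
    """Expand per-parent Noisy-OR parameters into P(child=True | parents).

    link_probs[i] is the assumed probability that parent i, when true,
    causes the child to be true on its own. `leak` is an optional
    background probability that the child is true with no active parent.
    """
    cpt = {}
    for combo in product([False, True], repeat=len(link_probs)):
        # The child stays false only if every active cause fails to fire.
        p_child_false = 1.0 - leak
        for active, p in zip(combo, link_probs):
            if active:
                p_child_false *= 1.0 - p
        cpt[combo] = 1.0 - p_child_false
    return cpt


# Hypothetical numbers: parents are ("battery is old", "alternator is failing"),
# child is "headlights are dim". Two parameters yield all four CPT rows.
cpt = noisy_or_cpt([0.8, 0.6])
for combo, p_true in cpt.items():
    print(combo, round(p_true, 2))
```

With ten Boolean parents, `full_cpt_rows(10)` gives 1024 parent combinations, whereas the Noisy-OR specification still needs only one number per parent, which is the linear-versus-exponential contrast described in Section 1.1.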
2.3 EXPERIMENTAL TASKS
                                                               Four models/vignettes have been constructed for each
Participants will be asked to provide descriptions of and
                                                               task (a total of 12). Each model has the following
answer questions about a series of CIMs shown in the
                                                               relationships: 1 child/1 parent, 2 children/1 parent, 1
BNet.Builder interface. In the first task, participants will   child/2 parents, 2 children/2 parents. In all cases, all
be shown a model and asked questions about the structure
                                                               children are linked to all parents. Also, in all but the 1
and nature of relationships in the model (specifically,
                                                               child/1 parent case, one parent-child link is negative. This
questions asking them to describe elements of the model,       simplification provides the basis for the initial study. We
and questions related to abductive and deductive
                                                               expect to expand upon this simple representation with
reasoning using the model). For instance, given the
                                                               later empirical work.
following example model (Figure 1), participants would
be asked:
                                                               2.4 INDEPENDENT VARIABLE
•   Description: This picture shows a model of part of a
    car. Describe what causes headlights to be dim, or not     Two stimuli sets are created based on the 12 models.
    dim.                                                       Either the nodes in the models (or phrases in the vignette)
•   Abductive Reasoning: If the headlights are dim, what       are phrased positively, or they include at least one node
    does that mean about the other parts of the car?           that uses negative phrasing (e.g., “battery is not new”).
•   Deductive Reasoning: The alternator is working.            This difference allows us to investigate how semantic
    What does that suggest about the headlights? The           properties of the model or situation affect task
    battery is old. What does that suggest about the           performance. This condition has been inspired by our
    headlights? What if the battery is new and the             experience in domain expert interaction with CIM
    alternator is failing?                                     modeling interfaces, where we observed the articulation
                                                               of variable names as a source of common confusion. The
                                                               use of negatives in the variable name (e.g., “not raining”)
                                                               or logical antonyms (e.g., “happy” and “sad”) tends to
                                                               lead to later confusion in expressing causal relationships
                                                               (e.g., “if it is not not-raining, then it is unlikely that
                                                               Rakesh will not bring his umbrella”). By including this
                                                               specific independent variable, we will be able to assess
                                                               which specific patterns of reasoning are most difficult for
                                                               users. Participants are randomly assigned to one of the
                                                               two stimuli sets (up to 10 participants per condition). This
                                                               sample size is consistent with those used in usability-type
                                                               tests, and will allow us to analyze verbal protocols of
                                                               participants to look for patterns across conditions.
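The stimulus design and participant assignment described in Sections 2.3 and 2.4 can be sketched as follows. This is only an illustration under stated assumptions: the task and condition labels are hypothetical names, and the balanced shuffle-and-split is one plausible way to realize random assignment with up to 10 participants per condition, not necessarily the procedure used in the study.

```python
import random
from itertools import product

# Hypothetical labels for the three tasks of Section 2.3.
TASKS = ("interpret", "manipulate", "construct")
# (children, parents) for the four structures listed in Section 2.3.
STRUCTURES = ((1, 1), (2, 1), (1, 2), (2, 2))


def build_models():
    """One model/vignette per task and structure (3 x 4 = 12 in total)."""
    models = []
    for task, (children, parents) in product(TASKS, STRUCTURES):
        models.append({
            "task": task,
            "children": children,
            "parents": parents,
            # All children are linked to all parents.
            "links": children * parents,
            # Every structure except 1 child / 1 parent has one negative link.
            "negative_links": 0 if (children, parents) == (1, 1) else 1,
        })
    return models


def assign_conditions(participant_ids, seed=None):
    """Shuffle participants and split them evenly across the two stimuli sets."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        "positive_phrasing": sorted(ids[:half]),
        "negative_phrasing": sorted(ids[half:]),
    }


models = build_models()
groups = assign_conditions(range(1, 21), seed=0)
print(len(models), "models;", {name: len(ids) for name, ids in groups.items()})
```

Fixing the seed makes the assignment reproducible for record keeping; leaving it as `None` draws a fresh assignment each run.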

                                                               2.5 DEPENDENT MEASURES AND ANALYSIS
Figure 1. Example model used in the experiment. The            Throughout all three tasks, participants are asked to “talk
green link represents positive influence, while the red link   aloud” while performing the task to describe how they are
represents negative influence within our CIM-enabled           thinking about or creating the models. Screen capture
interface.                                                     software is used to record participants’ interaction with
In the second task, participants can manipulate the causal     and construction of models. Participants are also fitted
links and post evidence to see how changing the strength       with a view point eye tracker (lightweight glasses that
and directionality of the links between the nodes, and         have an attached camera that tracks the corneal
evidence about the state of the nodes, affects beliefs about   movements of the participant’s eye to assess gaze relative
whether the nodes are true or false. They will respond to      to the computer screen they are working on). The eye
similar sets of questions as provided in the first task.       tracking system is used to record aspects of gaze position
Finally, in the third task, participants will be asked to      and dwell time at a screen location. Time to complete the
construct models from scratch using the interface based        tasks is also being recorded.
on several different vignettes, such as the following:         Data from the audio, eye tracking, and screen capture
    The headlight system on a car is dependent on two          processes is combined to create a “process trace” of each
    components: a battery, which stores energy to power        participant’s behavior describing and creating CIMs
    the lights, and an alternator, which converts              (Woods, 1993). Verbalizations and actions are coded and
    mechanical energy from the car’s engine into stored        analyzed (Bainbridge & Sanderson, 1995; Sanderson &
                                                               Fisher, 1994; Woods, 1993) to identify the correctness
and completeness of the descriptions and answers provided by participants in the first task, the processes with which participants constructed the models in the second task, and the form and content of the models produced in the third task.

3.   ANTICIPATED RESULTS AND DISCUSSION

The purpose of this study is to provide baseline information regarding how people construct and describe CIMs presented and created within the BNet.Builder interface. There is continued interest in simplifying the manner in which domain expertise is elicited, and the creation and presentation of Bayesian network models through direct manipulation and visualization. However, information on how these tools are used by practitioners, how they affect the models that people produce, and how they affect the way that people interpret models or predict outcomes is missing. We anticipate that users will have more difficulty explaining and constructing models with more parent-child connections. We also anticipate users having more difficulty explaining and constructing models when there are more nodes with negative causal links because of the increase in complexity of the models.

In this study, we intend to identify the reasoning patterns involving negative quantities that give users the most trouble. We anticipate that users will have the most difficulty interpreting and creating models when nodes are presented with “negatively phrased” labels (e.g., assessing the influence of a node labeled “battery is not new” on a node labeled “headlights are dim”). If this is the case, it suggests a need for developers of CIMs (and BNs in general) to encourage users to employ certain modeling patterns, possibly by constraining the description of nodes. These constraints, in turn, can be accomplished through prior training or interface wizards, or through intelligent, automatic processing of user entries, and provision of suggested alternatives (e.g., pop-up suggestions). These interventions could be tested in further studies.

The primary contribution of this paper will be process- and product-oriented descriptions of how this graphical tool is used to interpret and create CIMs. Future research could examine how models created within the CIM framework compare to those using more traditional BN structures, from the point of view of the user. This study uses simple Bayesian models, with constrained parameters and interaction capabilities, and only Boolean nodes. Future studies, guided by these initial findings, can use more complex models and a greater variety of node types (e.g., categorical, ordinal), and can allow subjects greater flexibility in manipulating CPTs and posting evidence. Other issues for investigation include measuring and mitigating user tendencies to confuse “evidence” and “belief” (both as terms, and in the values these terms represent), measuring tendencies to disregard parental independence when constructing CIMs, and further observation of user reaction to non-intuitive but correct behavior (e.g., becoming confused when particular variables appear overly sensitive or insensitive to posted evidence).

The CIM interface provides a user-friendly way to express causal influences between nodes, vastly decreasing the number of parameters needed to construct causal models and providing the capability for a much broader base of users to perform Bayesian modeling. Within the experimental interface, participants express relative degrees of influence over a range of 11 steps (from positive 5 to negative 5, with a neutral intermediate value). Additional studies are necessary to clarify the appropriate level of granularity of influence assignment (e.g., 3 steps? 11 steps? 51 steps?) as well as whether other methods of assigning strengths across sets of links (e.g., normalized strengths, rank ordered strengths) have merit. Finally, detailed studies with real-world models, situations, and domain experts are required.

Acknowledgements

We would like to thank David Koelle, Geoffrey Catto, Joseph Campolongo, Sam Mahoney, Sean Guarino, and Eric Carlson for their contributions to the development of the CIM and to identifying hypotheses to investigate. We also extend our deepest gratitude to Greg Zacharias for his continued funding and support of our work with Bayesian networks.

References

Bainbridge, L., & Sanderson, P. (2005). Verbal protocol analysis. In J. R. Wilson & E. N. Corlett (Eds.), Evaluation of Human Work (pp. 159-184). Boca Raton: Taylor and Francis.

Cox, Z. & Pfautz, J. (2007). Causal Influence Models: A Method for Simplifying Construction of Bayesian Networks. (Rep. No. R-BN07-01). Cambridge, MA: Charles River Analytics Inc.

Diez, F. J. (1993). Parameter Adjustment in Bayes Networks: The Generalized Noisy OR-Gate. In Proceedings of the 9th Conference of Uncertainty in Artificial Intelligence, (pp. 99-105). San Mateo, CA: Morgan Kaufmann.

Diez, F. J. & Druzdzel, M. J. (2001). Fundamentals of Canonical Models. In Proceedings of Ponencia Congreso: IX Conferencia De La Asociacion Espanola Para La Inteligencia Artificial (CAEPIA-TTIA 2001), (pp. 1125-1134).

Diez, F. J. & Galan, S. F. (2003). An Efficient Factorization for the Noisy MAX. International Journal of Intelligent Systems, 18, 165-177.

Glymour, C. (2001). The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology. Cambridge, MA: The MIT Press.

Guarino, S., Pfautz, J., Cox, Z., & Roth, E. (2006). Modeling Human Reasoning About Meta-Information. In Proceedings of 4th Bayesian Modeling Applications Workshop at the 22nd Annual Conference on Uncertainty in AI: UAI '06. Cambridge, Massachusetts.

Hastie, R. & Dawes, R. M. (2001). Rational Choice in an Uncertain World: The Psychology of Judgment and Decision-Making. London, UK: Sage Publications.

Henrion, M. (1989). Some Practical Issues in Constructing Belief Networks. In L. Kanal, T. Levitt, & J. Lemmer (Eds.), Uncertainty in Artificial Intelligence 3 (pp. 161-173). North Holland: Elsevier Science Publishers.

Jensen, F. V. (1996). An Introduction to Bayesian Networks. London: University College London Press.

Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. New York: Springer-Verlag.

Koelle, D., Pfautz, J., Farry, M., Cox, Z., Catto, G., & Campolongo, J. (2006). Applications of Bayesian Belief Networks in Social Network Analysis. In Proceedings of 4th Bayesian Modeling Applications Workshop at the 22nd Annual Conference on Uncertainty in AI: UAI '06. Cambridge, Massachusetts.

Kraaijeveld, P., Druzdzel, M., Onisko, A., & Wasyluk, H. (2005). GeNIeRate: An Interactive Generator of Diagnostic Bayesian Network Models. In Proceedings of Working Notes of the 16th International Workshop on Principles of Diagnosis (DX-05), (pp. 175-180).

O'Hagan, A., Buck, C., Daneshkhah, A., Eiser, R., Garthwaite, P., Jenkinson, D. et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. New York: Wiley & Sons.

Onisko, A., Druzdzel, M., & Wasyluk, H. (2001). Learning Bayesian Network Parameters From Small Data Sets: Application of Noisy-OR Gates. International Journal of Approximate Reasoning, 27(2), 165-182.

Pang, B., Zhang, D., Li, N., & Wang, K. (2004). Computerized Tongue Diagnosis Based on Bayesian Networks. IEEE Transactions on Biomedical Engineering, 51(10), 1803-1810.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.

Pfautz, J., Cox, Z., Koelle, D., Catto, G., Campolongo, J., & Roth, E. (2007). User-Centered Methods for Rapid Creation and Validation of Bayesian Networks. In Proceedings of 5th Bayesian Applications Workshop at Uncertainty in Artificial Intelligence (UAI '07). Vancouver, British Columbia.

Rimey, R. & Brown, C. (1994). Control of Selective Perception Using Bayes Nets and Decision Theory. International Journal of Computer Vision, 12(2-3), 173-207.

Rosen, J. & Smith, W. (1996a). Influence Net Modeling With Causal Strengths: An Evolutionary Approach. In Proceedings of Command and Control Research and Technology Symposium.

Rosen, J. A. & Smith, W. L. (1996b). Influencing Global Situations: A Collaborative Approach. US Air Force Air Chronicles.

Sanderson, P. M., & Fisher, C. (1994). Exploratory sequential data analysis. Human Computer Interaction, 9(3), 251-317.

Van der Gaag, L.C., Geenen, P.L., & Tabachneck-Schijf, H.J.M. (2006). Verifying Monotonicity of Bayesian Networks with Domain Experts. In Proceedings of 4th Bayesian Modeling Applications Workshop at the 22nd Annual Conference on Uncertainty in AI: UAI '06. Cambridge, Massachusetts.

Wellman, M. P. (1990). Fundamental Concepts of Qualitative Probabilistic Networks. Artificial Intelligence, 44(3), 257-303.

Woods, D. D. (1993). Process tracing methods for the study of cognition outside of the experimental psychology laboratory. In G. A. Klein, J. Orasanu, R. Calderwood & C. E. Zsambok (Eds.), Decision-making in action: Models and Methods (pp. 228-251). Norwood NJ: Ablex Publishers.