An Experimental Procedure for Evaluating User-Centered Methods for Rapid Bayesian Network Construction

Michael Farry, Jonathan Pfautz & Zach Cox
Charles River Analytics, Inc.
625 Mount Auburn St.
Cambridge, MA 02138

Ann Bisantz & Richard Stone
Industrial and Systems Engineering
University at Buffalo
Amherst NY 14260

Emilie Roth
Roth Cognitive Engineering
Brookline, MA

Abstract

Bayesian networks (BNs) are excellent tools for reasoning about uncertainty and capturing detailed domain knowledge. However, the complexity of BN structures can pose a challenge to domain experts without a background in artificial intelligence or probability when they construct or analyze BN models. Several canonical models have been developed to reduce the complexity of BN structures, but there is little research on the accessibility and usability of these canonical models, their associated user interfaces, and the contents of the models, including their probabilistic relationships. In this paper, we present an experimental procedure to evaluate our novel Causal Influence Model structure by measuring users’ ability to construct new models from scratch, and their ability to comprehend previously constructed models. [Results of our experiment will be presented at the workshop.]

1. INTRODUCTION AND MOTIVATION

A Bayesian network (BN) (Jensen, 2001; Pearl, 1988) is a probabilistic model used to reason under uncertainty. Successful efforts in applying Bayesian modeling to a variety of domains (e.g., computer vision (Rimey & Brown, 1994), social networks (Koelle et al., 2006), human cognition (Guarino et al., 2006; Glymour, 2001), and disease detection (Pang et al., 2004)) have inspired knowledge engineers to use BNs to capture domain knowledge from experts. However, expressing an expert’s domain knowledge in a BN is cumbersome due to the complex, tedious, and mathematical nature of conditional probability table (CPT) construction. Adding states and parents to a node quickly results in an exponential explosion in the number of CPT entries required (Pfautz et al., 2007). Canonical models such as Noisy-OR (Henrion, 1989; Pearl, 1988), Noisy-MAX (Diez & Galan, 2003; Diez, 1993; Henrion, 1989), Qualitative Probabilistic Networks (Wellman, 1990) and Influence Networks (IN) (Jensen, 1996; Rosen & Smith, 1996a; Rosen & Smith, 1996b) have been developed to mitigate this problem. In response to some issues raised by those models, and to simplify the Bayesian modeling process through novel user interface techniques, we developed a new canonical model, the Causal Influence Model (CIM) (Cox & Pfautz, 2007; Pfautz et al., 2007). The CIM paradigm was inspired by anecdotal evidence gained by developing systems for domain experts interacting with BNs and by an analysis of other canonical models to determine the constraints that limit their generalizability and applicability.

There have been few user-centered evaluation efforts to assess how (and if) canonical models help domain experts elicit their knowledge and understanding of models presented to them, or how graphical interfaces and their features and properties impact the way people create, interpret, reason with, or base actions on Bayesian networks. The purpose of our study is to provide baseline information on how people construct and describe CIMs presented and created within a graphical user interface.

1.1 BACKGROUND

A canonical model (Diez & Druzdzel, 2001) is a modeling pattern that allows probabilistic relationships between nodes to be specified by a reduced set of parameters (i.e., without completing every cell in a CPT). By assuming that the reduced parameters can still accurately represent the domain being modeled, users can quickly build a complex BN that would otherwise take a large amount of time. Most canonical models achieve their reduced parameters by assuming the independent effects of parents. This assumption allows a linear number of parameters to quantify an entire CPT; in the best-case scenario, only a single parameter per parent is needed. Canonical models can also serve as a “front-end” tool for the initial model-building effort, since the CPTs can always be refined by hand or with data at a later time. Some of the simplified patterns followed by canonical models have been motivated by the process followed when eliciting key factors and probabilistic relationships from domain experts (O'Hagan et al., 2006; Hastie & Dawes, 2001).

A review of canonical models sheds light on the advantages and drawbacks of each model. The Influence Network (IN) model can only be used with Boolean nodes. It assumes that the child node has a baseline probability of occurring independently of any parent effects and that each parent independently influences the child to be more or less likely to be true. Since a single baseline probability for the child and a single change in probability for each parent are simple parameters for users to specify, the IN represents a powerful mechanism for capturing domain knowledge. However, since only Boolean nodes are allowed in the IN model, model flexibility is significantly reduced. BNs commonly contain nodes that represent concepts other than the occurrence or non-occurrence of events, and INs cannot be used to simplify these BNs without considerably re-architecting the model.

The Noisy-OR model is also used only with Boolean nodes and assumes that a true state in any parent can cause the child to be true independently of the other parents, with some uncertainty. As with INs, the main drawback of the Noisy-OR is its restriction to Boolean nodes. The Noisy-MAX model generalizes the Noisy-OR and allows ordinal nodes at the expense of increasing the complexity of parameters. Although Noisy-MAX does work with ordinal nodes, it cannot be used with more general discrete nodes that do not have ordered states. These nodes, referred to as categorical nodes, have an arbitrary number of unordered states and usually represent the category or type of something. Qualitative Probabilistic Networks (QPNs) allow for the construction of purely qualitative relationships between nodes in a network, to abstract from the highly quantitative and numerical nature of typical Bayesian models. QPNs consider the “signs” inherent in probabilistic relationships between nodes, and consider the additive synergies between nodes to capture more complicated probabilistic relationships between them (e.g., if A and B both have a positive influence on node C, their influences may be synergistic in nature: if A and B are both true, their cumulative influence upon C may be greater than just the sum of their individual influences). QPNs allow for more qualitative model elicitation and may therefore be appropriate for interactions with non-technical experts, but they are limited in their ability to provide hard, numerical estimates of the likelihood of events.

The Causal Influence Model (CIM) is a canonical model that retains the desirable properties of the IN while providing solutions to its problems. The CIM assumes that each node is discrete and has an arbitrary number of states with arbitrary meaning. Each node has a baseline probability distribution, independent of any parent effects. Each parent independently influences these baseline probabilities to be more or less likely. The CIM also introduces simplifications that govern the generation of conditional probability relationships, enabling Boolean, ordinal, and categorical nodes to be included. A full description of the mathematical formulas that govern CIMs, including formulas to translate CIM link strengths into conditional probability tables, is provided in (Cox et al., 2007).

Studies have been conducted to analyze and mitigate complexities that arise in the construction of Bayesian models as a result of knowledge elicitation (Onisko, Druzdzel, & Wasyluk, 2001), but no studies to date have assessed the accessibility and usability of various canonical models and associated user interfaces when provided directly to domain experts. The following study investigates how users interpret and create CIMs within a particular user interface.

2. METHOD

2.1 PARTICIPANTS

Up to twenty participants are recruited from the university community to perform the study. After providing informed consent, participants are given the Ishihara Test for color blindness. Participants who pass this screening continue with the study.

2.2 EXPERIMENTAL SYSTEM

We have developed a CIM-enabled version of our BNet.Builder product to allow us to experiment with graphical interfaces for Bayesian network modeling (Pfautz et al., 2007). Using a simple point-and-click interface, users can create, label, connect, and move nodes in the model. Users can also create and modify causal links to represent positive or negative influences between nodes and the strength of those relationships. Users can also post evidence to, or remove it from, any node and view the effects of posted evidence on the belief states of other nodes. Link strengths are converted to CPTs using algorithms provided in (Cox et al., 2007; Pfautz et al., 2007). The positivity or negativity of a causal link and the link strength are represented visually by the color and thickness of the link, respectively.

To simplify model construction for this particular experiment, the CIM interface has been constrained so that all nodes are Boolean; initial beliefs are set to 0.5 for all nodes and cannot be changed directly by the user (but can change based on evidence or link strengths); and only “hard” evidence can be posted (i.e., evidence that the node is either fully true or fully false). This represents a set of simplifications we have found useful in other work, particularly among users less familiar with Bayesian modeling techniques. Our main goal in this study is to determine whether participants can reason about previously constructed CIMs and construct models to match a given situation. Since these are specific, novel, and fundamental questions with little previous research behind them, we have started with a simple case. The inclusion of additional node types, in particular, is useful for future work in comparing CIMs to other canonical models such as INs, Noisy-OR, and Noisy-MAX.

2.3 EXPERIMENTAL TASKS

Participants will be asked to provide descriptions of and answer questions about a series of CIMs shown in the BNet.Builder interface. In the first task, participants will be shown a model and asked questions about the structure and nature of relationships in the model (specifically, questions asking them to describe elements of the model, and questions related to abductive and deductive reasoning using the model). For instance, given the following example model (Figure 1), participants would be asked:

• Description: This picture shows a model of part of a car. Describe what causes headlights to be dim, or not dim.
• Abductive Reasoning: If the headlights are dim, what does that mean about the other parts of the car?
• Deductive Reasoning: The alternator is working. What does that suggest about the headlights? The battery is old. What does that suggest about the headlights? What if the battery is new and the alternator is failing?

Figure 1. Example model used in the experiment. The green link represents positive influence, while the red link represents negative influence within our CIM-enabled interface.

In the second task, participants can manipulate the causal links and post evidence to see how changing the strength and directionality of the links between the nodes, and evidence about the state of the nodes, affects beliefs about whether the nodes are true or false. They will respond to similar sets of questions as provided in the first task. Finally, in the third task, participants will be asked to construct models from scratch using the interface based on several different vignettes, such as the following:

The headlight system on a car is dependent on two components: a battery, which stores energy to power the lights, and an alternator, which converts mechanical energy from the car’s engine into stored energy in the battery. When the car is running, the alternator “recharges” the battery. This process only works if the alternator is working, and the battery is new.

Four models/vignettes have been constructed for each task (a total of 12). Within each task, the four models have the following structures: 1 child/1 parent, 2 children/1 parent, 1 child/2 parents, 2 children/2 parents. In all cases, all children are linked to all parents. Also, in all but the 1 child/1 parent case, one parent-child link is negative. This simplification provides the basis for the initial study. We expect to expand upon this simple representation with later empirical work.

2.4 INDEPENDENT VARIABLE

Two stimuli sets are created based on the 12 models. Either the nodes in the models (or phrases in the vignette) are phrased positively, or they include at least one node that uses negative phrasing (e.g., “battery is not new”). This difference allows us to investigate how semantic properties of the model or situation affect task performance. This condition has been inspired by our experience in domain expert interaction with CIM modeling interfaces, where we observed the articulation of variable names as a source of common confusion. The use of negatives in the variable name (e.g., “not raining”) or logical antonyms (e.g., “happy” and “sad”) tends to lead to later confusion in expressing causal relationships (e.g., “if it is not not-raining, then it is unlikely that Rakesh will not bring his umbrella”). By including this specific independent variable, we will be able to assess which specific patterns of reasoning are most difficult for users. Participants are randomly assigned to one of the two stimuli sets (up to 10 participants per condition). This sample size is consistent with those used in usability-type tests, and will allow us to analyze verbal protocols of participants to look for patterns across conditions.

2.5 DEPENDENT MEASURES AND ANALYSIS

Throughout all three tasks, participants are asked to “talk aloud” while performing the task to describe how they are thinking about or creating the models. Screen capture software is used to record participants’ interaction with and construction of models. Participants are also fitted with a view point eye tracker (lightweight glasses that have an attached camera that tracks the corneal movements of the participant’s eye to assess gaze relative to the computer screen they are working on). The eye tracking system is used to record aspects of gaze position and dwell time at a screen location. Time to complete the tasks is also being recorded.

Data from the audio, eye track, and screen capture processes are combined to create a “process trace” of each participant’s behavior describing and creating CIMs (Woods, 1993). Verbalizations and actions are coded and analyzed (Bainbridge & Sanderson, 1995; Sanderson & Fisher, 1994; Woods, 1993) to identify the correctness and completeness of the descriptions and answers provided by participants in the first task, the processes with which participants constructed the models in the second task, and the form and content of the models produced in the third task.

3. ANTICIPATED RESULTS AND DISCUSSION

The purpose of this study is to provide baseline information regarding how people construct and describe CIM models presented and created within the BNet.Builder interface. There is continued interest in simplifying the manner in which domain expertise is elicited, and the creation and presentation of Bayesian network models through direct manipulation and visualization. However, information on how these tools are used by practitioners, how they affect the models that people produce, and how they affect the way that people interpret models or predict outcomes is missing. We anticipate that users will have more difficulty explaining and constructing models with more parent-child connections. We also anticipate users having more difficulty explaining and constructing models when there are more nodes with negative causal links because of the increase in complexity of the models.

In this study, we intend to measure reasoning patterns involving negative quantities that give users the most trouble. We anticipate that users will have the most difficulty interpreting and creating models when nodes are presented with “negatively phrased” labels (e.g., assessing the influence of a node labeled “battery is not new” on a node labeled “headlights are dim”). If this is the case, it suggests a need for developers of CIMs (and BNs in general) to encourage users to employ certain modeling patterns, possibly by constraining the description of nodes. These constraints, in turn, can be accomplished through prior training or interface wizards, or through intelligent, automatic processing of user entries, and provision of suggested alternatives (e.g., pop-up suggestions). These interventions could be tested in further studies.

The primary contribution of this paper will be process- and product-oriented descriptions of how this graphical tool is used to interpret and create CIMs. Future research could compare how models created within the CIM framework compare to those using more traditional BN structures, from the point of view of the user. This study used simple Bayesian models, with constrained parameters and interaction capabilities, and used only Boolean nodes. Future studies, guided by these initial findings, can be conducted using more complex models and a greater variety of node types (e.g., categorical, ordinal), and can allow subjects greater flexibility in manipulating CPTs and posting evidence. Other issues for investigation include measuring and mitigating user tendencies to confuse “evidence” and “belief” (both as terms, and in the values these terms represent), measuring tendencies to disregard parental independence when constructing CIMs, and further observation of user reaction to non-intuitive but correct behavior (e.g., becoming confused when particular variables appear overly sensitive or insensitive to posted evidence).

The CIM interface provides a user-friendly way to express causal influences between nodes, vastly decreasing the number of parameters needed to construct causal models and providing the capability for a much broader base of users to perform Bayesian modeling. Within the experimental interface, participants express relative degrees of influence over a range of 11 steps (from positive 5 to negative 5, with a neutral intermediate value). Additional studies are necessary to clarify the appropriate level of granularity of influence assignment (e.g., 3 steps? 11 steps? 51 steps?) as well as whether other methods of assigning strengths across sets of links (e.g., normalized strengths, rank ordered strengths) have merit. Finally, detailed studies with real-world models, situations, and domain experts are required.

Acknowledgements

We would like to thank David Koelle, Geoffrey Catto, Joseph Campolongo, Sam Mahoney, Sean Guarino, and Eric Carlson for their contributions in the development of the CIM and identifying hypotheses to investigate. We also extend our deepest gratitude to Greg Zacharias for his continued funding and support of our work with Bayesian networks.

References

Bainbridge, L., & Sanderson, P. (2005). Verbal protocol analysis. In J. R. Wilson & E. N. Corlett (Eds.), Evaluation of Human Work (pp. 159-184). Boca Raton: Taylor and Francis.

Cox, Z. & Pfautz, J. (2007). Causal Influence Models: A Method for Simplifying Construction of Bayesian Networks. (Rep. No. R-BN07-01). Cambridge, MA: Charles River Analytics Inc.

Diez, F. J. (1993). Parameter Adjustment in Bayes Networks: The Generalized Noisy OR-Gate. In Proceedings of the 9th Conference of Uncertainty in Artificial Intelligence, (pp. 99-105). San Mateo, CA: Morgan Kaufmann.

Diez, F. J. & Druzdzel, M. J. (2001). Fundamentals of Canonical Models. In Proceedings of Ponencia Congreso: IX Conferencia De La Asociacion Espanola Para La Inteligencia Artificial (CAEPIA-TTIA 2001), (pp. 1125-1134).

Diez, F. J. & Galan, S. F. (2003). An Efficient Factorization for the Noisy MAX. International Journal of Intelligent Systems, 18, 165-177.

Glymour, C. (2001). The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology. Cambridge, MA: The MIT Press.

Guarino, S., Pfautz, J., Cox, Z., & Roth, E. (2006). Modeling Human Reasoning About Meta-Information. In Proceedings of 4th Bayesian Modeling Applications Workshop at the 22nd Annual Conference on Uncertainty in AI: UAI '06. Cambridge, Massachusetts.

Hastie, R. & Dawes, R. M. (2001). Rational Choice in an Uncertain World: The Psychology of Judgment and Decision-Making. London, UK: Sage Publications.

Henrion, M. (1989). Some Practical Issues in Constructing Belief Networks. In L. Kanal, T. Levitt, & J. Lemmer (Eds.), Uncertainty in Artificial Intelligence 3 (pp. 161-173). North Holland: Elsevier Science Publishers.

Jensen, F. V. (1996). An Introduction to Bayesian Networks. London: University College London Press.

Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. New York: Springer-Verlag.

Koelle, D., Pfautz, J., Farry, M., Cox, Z., Catto, G., & Campolongo, J. (2006). Applications of Bayesian Belief Networks in Social Network Analysis. In Proceedings of 4th Bayesian Modeling Applications Workshop at the 22nd Annual Conference on Uncertainty in AI: UAI '06. Cambridge, Massachusetts.

Kraaijeveld, P., Druzdzel, M., Onisko, A., & Wasyluk, H. (2005). GeNIeRate: An Interactive Generator of Diagnostic Bayesian Network Models. In Proceedings of Working Notes of the 16th International Workshop on Principles of Diagnosis (DX-05), (pp. 175-180).

O'Hagan, A., Buck, C., Daneshkhah, A., Eiser, R., Garthwaite, P., Jenkinson, D. et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. New York: Wiley & Sons.

Onisko, A., Druzdzel, M., & Wasyluk, H. (2001). Learning Bayesian Network Parameters From Small Data Sets: Application of Noisy-OR Gates. International Journal of Approximate Reasoning, 27(2), 165-182.

Pang, B., Zhang, D., Li, N., & Wang, K. (2004). Computerized Tongue Diagnosis Based on Bayesian Networks. IEEE Transactions on Biomedical Engineering, 51(10), 1803-1810.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.

Pfautz, J., Cox, Z., Koelle, D., Catto, G., Campolongo, J., & Roth, E. (2007). User-Centered Methods for Rapid Creation and Validation of Bayesian Networks. In Proceedings of 5th Bayesian Applications Workshop at Uncertainty in Artificial Intelligence (UAI '07). Vancouver, British Columbia.

Rimey, R. & Brown, C. (1994). Control of Selective Perception Using Bayes Nets and Decision Theory. International Journal of Computer Vision, 12(2-3), 173-207.

Rosen, J. & Smith, W. (1996a). Influence Net Modeling With Causal Strengths: An Evolutionary Approach. In Proceedings of Command and Control Research and Technology Symposium.

Rosen, J. A. & Smith, W. L. (1996b). Influencing Global Situations: A Collaborative Approach. US Air Force Air Chronicles.

Sanderson, P. M., & Fisher, C. (1994). Exploratory sequential data analysis. Human Computer Interaction, 9(3), 251-317.

Van der Gaag, L.C., Geenen, P.L., & Tabachneck-Schijf, H.J.M. (2006). Verifying Monotonicity of Bayesian Networks with Domain Experts. In Proceedings of 4th Bayesian Modeling Applications Workshop at the 22nd Annual Conference on Uncertainty in AI: UAI '06. Cambridge, Massachusetts.

Wellman, M. P. (1990). Fundamental Concepts of Qualitative Probabilistic Networks. Artificial Intelligence, 44(3), 257-303.

Woods, D. D. (1993). Process tracing methods for the study of cognition outside of the experimental psychology laboratory. In G. A. Klein, J. Orasanu, R. Calderwood & C. E. Zsambok (Eds.), Decision-making in action: Models and Methods (pp. 228-251). Norwood NJ: Ablex Publishers.
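
Illustrative sketch: parameter reduction in a canonical model

Section 1.1 notes that canonical models let a linear number of parameters stand in for a CPT whose size grows exponentially with the number of parents. The sketch below makes that concrete for the standard leaky Noisy-OR gate, one of the canonical models discussed in the paper. It is not the CIM conversion, whose formulas are given in the cited technical report and are not reproduced here. The function name `noisy_or_cpt`, its arguments, and the numeric values are hypothetical and chosen only for illustration.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Expand Noisy-OR parameters into a full CPT for a Boolean child.

    link_probs[i] is the probability that parent i, when true, causes
    the child to be true on its own. leak is the probability that the
    child is true when every parent is false. The result maps each
    tuple of parent states to P(child = true | parents).
    """
    cpt = {}
    for states in product((False, True), repeat=len(link_probs)):
        # The child is false only if the leak and every active parent
        # all fail to cause it, and those failures are independent.
        p_child_false = 1.0 - leak
        for is_true, p in zip(states, link_probs):
            if is_true:
                p_child_false *= 1.0 - p
        cpt[states] = 1.0 - p_child_false
    return cpt


# Hypothetical values for a node with three parents (not from the study).
link_probs = [0.8, 0.6, 0.3]
cpt = noisy_or_cpt(link_probs)

print("parameters elicited:", len(link_probs))
print("CPT rows generated:", len(cpt))
for states, p_true in cpt.items():
    print(states, round(p_true, 3))
```

With three parents, three elicited numbers generate all eight CPT rows; each additional parent adds one parameter but doubles the number of rows. This is the independence-of-parents assumption that Section 1.1 attributes to most canonical models.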