The Why Agent: Enhancing user trust in automation through explanation dialog

Rob Cole, Raytheon Company, Network Centric Systems, Ft. Wayne, IN, U.S.A.
Jim Jacobs, Raytheon Company, Intelligence and Information Systems, State College, PA, U.S.A.
Robert L. Sedlmeyer, Indiana University – Purdue University, Department of Computer Science, Ft. Wayne, IN, U.S.A.
Michael J. Hirsch, Raytheon Company, Intelligence and Information Systems, Orlando, FL, U.S.A.

Abstract— Lack of trust in autonomy is a recurrent issue that is becoming more and more acute as manpower reduction pressures increase. We address the socio-technical form of this trust problem through a novel decision explanation approach. Our approach employs a semantic representation to capture decision-relevant concepts as well as other mission-relevant knowledge, along with a reasoning approach that allows users to pose queries and get system responses that expose decision rationale to users. This representation enables a natural, dialog-based approach to decision explanation. It is our hypothesis that the transparency achieved through this dialog process will increase user trust in autonomous decisions. We tested our hypothesis in an experimental scenario set in the maritime autonomy domain. Participant responses on psychometric trust constructs were found to be significantly higher in the experimental group for the majority of constructs, supporting our hypothesis. Our results suggest the efficacy of incorporating a decision explanation facility in systems for which a socio-technical trust problem exists or might be expected to develop.

Keywords—Semantic modeling; Maritime Autonomy; Trust in Autonomy; Decision Explanation.

This research was supported by Raytheon Corporate IR&D.

I. INTRODUCTION

Large organizations such as the Department of Defense rely heavily on automation as a means of ensuring high-quality product, as well as cost control through manpower reduction. However, lack of user trust has repeatedly stood in the way of widespread deployment.
We have observed two fundamental forms of the problem: the technical and the socio-technical form. The technical form is characterized by user reservations regarding the ability of a system to perform its mission due to known or suspected technical defects. For example, an automated detection process might have a very high false positive rate, conditioning operators to simply ignore its output. Trust in such a situation can only be achieved by addressing the issue of excessive false detections, a technical problem suggesting a purely technical solution. As another example, consider a situation in which automation is introduced into a purely manual process characterized by decision making in high-pressure situations. In such a situation, operators might reject automation in favor of the trusted, manual process for purely non-technical reasons. In other words, in the absence of any specific evidence of limitations of the automation, the automation could nonetheless be rejected for reasons stemming from the social milieu in which the system operates. This is the socio-technical form of the problem.

One might address the socio-technical problem through education: train the operators with sufficient knowledge of system specifications and design detail to erase doubts they may have regarding the automation. Such an approach is costly since every operator would have to be trained to a high degree. Operators would essentially have to be system specialists. Instead, we propose an approach intended for non-specialist operators, stemming from the insight that the socio-technical trust problem results from a lack of insight into system decision rationale. If an operator can be made to understand the why of system behavior, that operator can be expected to trust the system in the future to a greater degree, if the rationale given to the operator makes sense in the current mission context.

Explanation mechanisms in expert systems have focused on the use of explicit representations of design logic and problem-solving strategies [1]. The early history of explanation in expert systems saw the emergence of three types of approaches, as described in Chandrasekaran, Tanner, and Josephson [2]. Type 1 systems explain how data matches local goals. Type 2 systems explain how knowledge can be justified [3]. Type 3 systems explain how control strategy can be justified [4]. A more detailed description of these types is given by Saunders and Dobbs [5, p. 1102]:

Type 1 explanations are concerned with explaining why certain decisions were or were not made during the execution (runtime) of the system. These explanations use information about the relationships that exist between pieces of data and the knowledge (sets of rules, for example) available for making specific decisions or choices based on this data. For example, Rule X fired because Data Y was found to be true.

Type 2 explanations are concerned with explaining the knowledge base elements themselves. In order to do this, explanations of this type must look at knowledge about knowledge. For example, knowledge may exist about a rule that identifies this rule (this piece of knowledge) as being applicable ninety percent of the time. A type 2 explanation could use this information (this knowledge about knowledge) to justify the use of this rule. Other knowledge used in providing this type of explanation consists of knowledge that is used to develop the ES but which does not affect the operation of the system. This type of knowledge is referred to as deep knowledge.

Type 3 explanations are concerned with explaining the runtime control strategy used to solve a particular problem. For example, explaining why one particular rule (or set of rules) was fired before some other rule is an explanation about the control strategy of the system. Explaining why a certain question (or type of question) was asked of the user in lieu of some other logical or related choice is another example.
Therefore, type 3 explanations are concerned with explaining how and why the system uses its knowledge the way it does, a task that also requires the use of deep knowledge in many cases.

Design considerations for explanations with dialog are discussed in a number of papers by Moore and colleagues ([6], [7], [8] and [9]). These papers describe the Explainable Expert Systems (EES) project, which incorporates a representation for problem-solving principles, a representation for domain knowledge, and a method to link between them. In Moore and Swartout [6], hypertext is used to avoid the referential problems inherent in natural language analysis. To support dialog with hypertext, a planning approach to explanation was developed that allowed the system to understand what part of the explanation a user is pointing at when making further queries. Moore and Paris [8] and Carenini and Moore [9] discuss architectures for text planners that allow for explanations that take into account the context created by prior utterances. In Moore [10], an approach to handling badly formulated follow-up questions (such as a novice might produce after receiving an incomprehensible explanation from an expert) is presented that enables the production of clarifying explanations. Tanner and Keuneke [11] describe an explanation approach based on a large number of agents with well-defined roles. A particular agent produces an explanation of its conclusion by ordering a set of text strings in a sequence that depends on the decision's runtime context. Based on an explanation from one agent, users can request elaboration from other agents.
Weiner [12] focuses on the structure of explanations, with the goal of making explanations easy to understand by avoiding complexity. Features identified as important for this goal include syntactic form and how the focus of attention is located and shifted. Eriksson [13] examines answers generated through transformation of a proof tree, with pruning of paths such as non-informative ones. Millet and Gilloux [14] describe the approach in Wallis and Shortliffe [15] as employing a user model in order to provide users with explanations tailored to their level of understanding. The natural language aspect of explanation is the focus of Papamichail and French [16], which uses a library of text plans to structure the explanations.

In Carenini and Moore [17], a comprehensive approach toward the generation of evaluative arguments (called GEA) is presented. GEA focuses on the generation of text-based arguments expressed in natural language. The initial step of GEA's processing consists of a text planner selecting content from a domain model by applying a communicative strategy to achieve a communication goal (e.g., make a user feel more positively toward an entity). The selected content is packaged into sentences through the use of a computational grammar. The underlying knowledge base consists of a domain model with entities and their relationships and an additive multi-attribute value function (a decision-theoretic model of the user's preferences).

In Gruber and Gautier [18] and Gautier and Gruber [19], an approach to explaining the behavior of engineering models is presented. Rather than relying on causal influences that are hard-coded [20], this approach is based on the inference of causal influences, inferences which are made at run time. Using a previously developed causal ordering procedure, an influence graph is built from which causal influences are determined. At any point in the influence graph, an explanation can be built based on the adjacent nodes, and users can traverse the graph, obtaining explanations at any node.

Approaches to producing explanations in Markov decision processes (MDPs) are proposed in Elizalde et al. [21] and Khan, Poupart, and Black [22]. Two strategies exist for producing explanations in Bayesian networks (BNs). One involves transforming the network into a qualitative representation [23]. The other approach focuses on the graphical representation of the network; a software tool called Elvira is presented which allows for the simultaneous display of probabilities of different evidence cases, along with a monitor and editor of cases, allowing the user to enter evidence and select the information they want to see [24].

An explanation application for Java debugging is presented in Ko and Myers [25]. This work describes a tool called Whyline which supports programmer investigation of program behavior. Users can pose "why did" and "why didn't" questions about program code and execution. Explanations are derived using static and dynamic slicing, precise call graphs, reachability analysis, and algorithms for determining potential sources of values.

Explanations in case-based reasoning systems are examined as well. Sørmo, Cassens, and Aamodt [26] present a framework for explanation and consider specific goals that explanations can satisfy, which include transparency, justification, relevance, conceptualization, and learning. Kofod-Petersen and Cassens [27] consider the importance of context and show how context and explanations can be combined to deal with the different types of explanation needed for meaningful user interaction.

Explanation of decisions made via decision trees is considered in Langlotz, Shortliffe, and Fagan [28]. An explanation technique is selected and applied to the most significant variables, creating a symbolic expression that is converted to English text. The resulting explanation contains no mathematical formulas, probability values, or utility values.

Lieberman and Kumar [29] consider the problem of mismatch between the specialized knowledge of experts providing help and the naiveté of users seeking help. Here, the problem consists of providing explanations of the expert decisions in terms the users can understand. The SuggestDesk system is described, which advises online help personnel. Using a knowledgebase, analogies are found between technical problem-solution pairs and everyday life events that can be used to explain them.

Bader et al. [30] use explanation facilities in recommender systems to convince users of the relevance of recommended items and to enable fast decision making. In previous work, Bader found that recommendations lack user acceptance if the rationale is not presented. This work follows the approach of Carenini and Moore [17].
In Pu and Chen [31], a "Why?" form of explanation was evaluated against what the researchers termed an Organized View (OV) form of explanation in the context of explanations of product recommendations. The OV approach attempts to group decision alternatives and provide group-level summary explanations, e.g., "these are cheaper than the recommendation but heavier." A trust model was used to conduct a user evaluation in which trust-related constructs were assessed through a Likert scale instrument. The OV approach was found to be associated with higher levels of user trust than the alternative approach.

The importance of the use of context in explaining the recommendations of a recommender system was investigated in Baltrunas et al. [32]. In this study of point-of-interest recommendation, customized explanation messages are provided for a set of 54 possible contextual conditions (e.g., "this place is good to visit with family"). Even where more than one contextual condition holds and is factored into the system's decision, only one can be utilized for the explanation (the most influential one in the predictive model is used). Only a single explanatory statement is provided to the user.

Explanation capabilities have also been shown to aid in increasing user satisfaction with, and establishing trust in, complex systems [34, 35, 36]. The key insight revealed by this research is the need for transparency in system decision-making. As noted by Glass et al. [37], "users identified explanations of system behavior, providing transparency into its reasoning and execution, as a key way of understanding answers and thus establishing trust." Dijkstra [38] studied the persuasiveness of decision aids for novices and experts. In one experiment, lawyers examined the results of nine legal cases supported by one of two expert systems. Both systems had incomplete knowledge models. Because of the incomplete models, the expert systems routinely gave opposite advice on each legal case. This resulted in the lawyers being easily misled. Therefore, adequate explanation facilities and a good user interface must provide the user with the transparency needed to make the decision of trusting the system. Rieh and Danielson [39] outline four different explanation types for decision aids. Line-of-reasoning explanations provide the logical justification of the decision; justification explanations provide extensive reference material to support the decision; control explanations provide the problem-solving strategy used to arrive at the decision; and terminological explanations provide definition information on the decision. In each case, the amount of transparency in the decision-making process is a factor in the trust of the user.

Our approach to providing transparency, the Why Agent, is a decision explanation approach incorporating dialog between the user and the system. Rather than attempting to provide monolithic explanations to individual questions, our dialog-based approach allows the user to pose a series of questions, the responses to which may prompt additional questions. Imitative of natural discourse, our dialog approach allows a user to understand the behavior of the system by asking questions about its goals, actions, or observables and receiving responses couched in similar terms. We implemented our approach and conducted an evaluation in a maritime autonomy scenario. The evaluation consisted of an experiment in which two versions of an interface were shown to participants who then answered questions related to trust. Results of the experiment show response scores statistically consistent with our expectations for the majority of psychometric constructs tested, supporting our overall hypothesis that transparency fosters trust.

The rest of this paper is organized as follows. Section II describes the problem domain and the technical approach. Experiments and results are presented in Section III. In Section IV, we provide some concluding remarks and future research directions.

II. TECHNICAL APPROACH

A. Domain Overview

Our approach to demonstrating the Why Agent functionality and evaluating its effectiveness consisted of a simulation-based environment centered on a maritime scenario defined in consultation with maritime autonomy subject matter experts (SMEs). The notional autonomous system in our scenario was the X3 autonomous unmanned surface vehicle (AUSV) by Harbor Wing Technologies (http://www.harborwingtech.com). Raytheon presently has a business relationship with this vendor in which we provide intelligence, surveillance, and reconnaissance (ISR) packages for their AUSVs.

The X3 was of necessity a notional AUSV for our demonstration because the actual prototype was not operational at the time of the Why Agent project. For this reason, a live, on-system demonstration was not considered. Instead, our demonstration environment was entirely simulation-based. An existing route planning engine developed under Raytheon research was modified to serve as the AUSV planner. Additional code was developed to support the simulation environment and Why Agent functionality, as described below.

B. Software Architecture

Our software architecture consists of four components interacting in a service-oriented architecture, as shown in Figure 1.

Figure 1: SW architecture for Why Agent.

The Planner component performed route planning functions based on a plan of intended movement. A plan of intended movement is input in the form of a series of waypoints. These waypoints, along with environmental factors such as weather forecast data, are used in the planning algorithm to determine an actual over-ocean route. The planner was a pre-existing component, developed under prior R&D, that the Why Agent leveraged for the demonstration. Modifications made to the planner to support the Why Agent project include changes to expose route change rationale to the controller and to inform the controller of weather report information.
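The planner's input and output formats are not specified here, so the following is only an illustrative sketch of how a plan of intended movement (a waypoint series plus a weather input) might map to a route whose legs carry the change rationale exposed to the Controller. All names in the sketch (Waypoint, RouteLeg, plan_route, storm_warning) are hypothetical and not part of the described system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Waypoint:
    """A single point in the operator's plan of intended movement (hypothetical type)."""
    lat: float
    lon: float

@dataclass
class RouteLeg:
    """One leg of the computed over-ocean route, carrying the planner's rationale
    so that the Controller (and ultimately the Why Agent) can expose it later."""
    start: Waypoint
    end: Waypoint
    rationale: str

def plan_route(waypoints: List[Waypoint], weather_forecast: dict) -> List[RouteLeg]:
    """Toy stand-in for the route planner: connects waypoints directly and attaches
    a rationale string derived from the (invented) weather input."""
    legs = []
    for start, end in zip(waypoints, waypoints[1:]):
        reason = "direct transit between planned waypoints"
        if weather_forecast.get("storm_warning"):
            reason = "leg adjusted for forecast storm cell"
        legs.append(RouteLeg(start, end, rationale=reason))
    return legs

if __name__ == "__main__":
    pim = [Waypoint(23.1, -161.9), Waypoint(23.4, -162.3), Waypoint(23.8, -162.0)]
    for leg in plan_route(pim, {"storm_warning": False}):
        print((leg.start.lat, leg.start.lon), "->", (leg.end.lat, leg.end.lon), ":", leg.rationale)
```

Whatever the real planner computes, the essential design point is the same: the reason for each routing decision travels with the result so that the semantic service can later surface it in explanation dialog.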
The Controller represents the embodiment of the majority of the simulated AUSV decision logic and simulation control logic. Because we did not employ an actual AUSV for the Why Agent project, much of the decision logic of an actual AUSV had to be simulated for our demonstration; this logic is implemented in the Controller. The input to the Controller consisted of a test control file that defined the event timeline for the simulation. In addition to orchestrating simulation events defined in the control file, the Controller mediated queries and responses between the user interface and the semantic service.

The graphical user interface was implemented as a web application. Two versions of the GUI were developed, one with and one without the Why Agent explanation facility. The Why Agent version is shown in Figure 2. It has four screen regions: a map, a status panel, a log data panel, and an explanation panel. The map, implemented with Google Map technology, shows the current location and route of the AUSV. The status panel shows various AUSV status values, such as location, speed, current mode, etc. The log panel shows a time-stamped series of event descriptions. Various items in the log panel are user-selectable and have context-sensitive menus to support the user interface functionality of the Why Agent facility. When a user makes a selection, the response from the semantic service is shown in the bottom (explanation) panel. Additionally, responses in the explanation panel are themselves selectable for further queries. In this manner, the user can engage in a dialog with the system.

Figure 2: General GUI for Why Agent interface.

The semantic service contains the knowledgebase underlying the decision rationale exposed by the Why Agent. The knowledge consists of event and domain ontology models represented in Web Ontology Language (OWL) format. The semantic service provides responses to queries from the controller through queries against its underlying models.

An example of a domain model is shown in Figure 3. Relationships in this figure encode potential queries linking concepts and events that can be displayed in the user interface. For example, the activity ConductPatrol relates to the function MissionExecution through the relationship servesPurpose. This relationship is statically associated with the query why? at the user level. Thus, the existence of this link connected with the node ConductPatrol implies a why? option being made available to the user in the context-sensitive menu for the ConductPatrol item. When the user selects the ConductPatrol item and the associated why? option, a query is generated that contains IDs associated with the ConductPatrol node and the servesPurpose link. The linked node, in this case MissionExecution, is then returned to the user as the result of a query against the associated OWL model.

Figure 3: Example domain model.
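The paper does not state which query language or toolkit the semantic service uses; one plausible realization of the why? lookup described above is a SPARQL query over an RDF/OWL graph, sketched below in Python with the rdflib library. The namespace, graph contents, and function names are illustrative assumptions; only the ConductPatrol, servesPurpose, and MissionExecution terms come from the Figure 3 example.

```python
from rdflib import Graph, Namespace

# Hypothetical namespace; the paper does not publish its OWL vocabulary.
EX = Namespace("http://example.org/whyagent#")

def build_example_model() -> Graph:
    """Tiny stand-in for the Figure 3 domain model: the activity ConductPatrol
    servesPurpose the function MissionExecution."""
    g = Graph()
    g.bind("ex", EX)
    g.add((EX.ConductPatrol, EX.servesPurpose, EX.MissionExecution))
    return g

def answer_why(graph: Graph, activity_uri) -> list:
    """Resolve a why? selection: follow the servesPurpose link from the selected
    activity node and return the linked purpose node(s)."""
    results = graph.query("""
        PREFIX ex: <http://example.org/whyagent#>
        SELECT ?activity ?purpose
        WHERE { ?activity ex:servesPurpose ?purpose . }
    """)
    return [row.purpose for row in results if row.activity == activity_uri]

if __name__ == "__main__":
    model = build_example_model()
    # Simulates the user selecting the ConductPatrol log item and choosing why?.
    print(answer_why(model, EX.ConductPatrol))  # returns the MissionExecution node
```

Because each returned node is itself a concept in the model, the same lookup can be applied to the response, which is what allows the explanation panel to support the chained, dialog-style queries described above.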
III. EXPERIMENTATION

Our evaluation approach consisted of an experiment in which the Why Agent was the treatment. Two versions of a prototype operator interface were developed. One version incorporated the Why Agent functionality and the second did not. The two versions were otherwise identical. Screenshots of the two interface versions are presented in Figures 4 and 5.

A. Demonstration Scenario

The demonstration scenario consisted of autonomous fishing law enforcement in the Northwestern Hawaiian Islands Marine National Monument. The concept of operations (CONOP) for this mission is as follows:

- The AUSV operator selects waypoints corresponding to a patrol area.
- The AUSV route planner finds a route through the waypoints and a patrol is conducted.
- RADAR is used to detect potential illegal fishing vessels (targets).
- Targets are investigated visually after the AUSV closes to an adequate proximity.
- Automated analysis of the visual data is used to confirm the target is engaged in illegal fishing.
- Targets engaged in illegal activity are visually identified for subsequent manned enforcement action.
- Non-lethal self-defensive actions can be taken by the AUSV in the presence of hostile targets.

To support this demonstration, a software-based simulation environment was developed. The demonstration consisted of capturing video of user interactions with the baseline and Why Agent versions of the operator interface while a scripted series of events unfolded over a pre-determined timeline.

Figure 4: Operator interface without the Why Agent functionality.

Figure 5: Operator interface with the Why Agent functionality.
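The contents of the Controller's test control file and the scripted timeline are not published; purely as an illustration of what such a scripted event sequence might look like for the CONOP above, consider the following sketch. Every event name, time, and description here is invented.

```python
from dataclasses import dataclass

@dataclass
class ScriptedEvent:
    """One entry in a hypothetical test control file; the Controller replays these
    at the given simulation times to drive the demonstration."""
    time_s: int        # seconds into the scenario (invented values)
    event_type: str    # hypothetical event vocabulary
    detail: str

# Toy timeline loosely following the CONOP above.
EVENT_TIMELINE = [
    ScriptedEvent(0,   "ConductPatrol",      "begin patrol of operator-selected area"),
    ScriptedEvent(120, "RouteChange",        "replan around forecast weather cell"),
    ScriptedEvent(300, "RadarContact",       "potential illegal fishing vessel detected"),
    ScriptedEvent(480, "CloseToInvestigate", "maneuver to visual identification range"),
    ScriptedEvent(660, "VisualConfirm",      "automated analysis confirms illegal fishing"),
    ScriptedEvent(780, "NonLethalDefense",   "hostile approach; issue audio warning"),
]

def replay(timeline):
    """Stand-in for the Controller loop: emit each scripted event in time order."""
    for ev in sorted(timeline, key=lambda e: e.time_s):
        print(f"t+{ev.time_s:>4}s  {ev.event_type}: {ev.detail}")

if __name__ == "__main__":
    replay(EVENT_TIMELINE)
```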
B. Experimental Design

Our experiment consisted of a single-factor, randomized design. The factor is interface type and has two levels: baseline (control) and Why Agent (experimental). Thus, we have two treatment levels, corresponding to the two factor levels. The experimental subjects were Raytheon employees recruited across multiple Raytheon locations during the project.

Our general hypothesis is that the Why Agent fosters a more appropriate level of trust in users than the baseline system. By utilizing the information provided by the Why Agent, users will be more able to calibrate their trust [33]. To test this hypothesis, we needed to operationalize the concept of "more appropriate level of trust" and thereby derive one or more testable hypotheses. We accomplished this through the following operationalization.

Trust in a particular system, being an unobservable mental aspect of a user, necessitates the use of psychometric readings of constructs related to the overall concept of trust. Given the broad nature of this concept, multiple constructs should be defined. Using our domain insight and engineering judgment, we selected the following set of five psychometric constructs: 1) General Competence, 2) Self-Defense, 3) Navigation, 4) Environmental Conservation, and 5) Mission. Each construct is intended to capture the users' belief regarding the system's ability to effectively perform in regard to that construct, i.e., the user's level of trust for that construct. For example, the construct Mission attempts to encompass user attitudes toward the ability of the system to successfully execute its mission. The Environmental Conservation construct was included as an example of a construct under which we would not expect to see a difference in psychometric responses.

For each construct, we have a set of possible trust levels and a set of psychometric participant response scores. Define these as follows (for this study, k = 5):

- Set of k constructs: C = {c_j : 1 ≤ j ≤ k}
- Set of trust levels: L = {low, high}
- Psychometric participant response scores for each construct:
  - Control: R^C = {r_j^C : 1 ≤ j ≤ k}
  - Experimental: R^E = {r_j^E : 1 ≤ j ≤ k}

Here, we take the simplest possible approach, a binary trust level set. We simply assume that the trust level for a particular construct should be either low or high, with nothing in between. Clearly, many other trust models are possible. To operationalize the notion of "more appropriate level of trust", we need to define, for each construct, a ground-truth assignment of trust level. Thus, we need to define the following mapping T:

- Mapping of construct to trust level: T(j) ∈ L
  - T(j) = low: people should not trust the system regarding construct j.
  - T(j) = high: people should trust the system regarding construct j.

Additionally, we need to map the elements of the trust level set to psychometric scale values. In other words, we need to normalize the scale as follows:

- Mapping of trust level to psychometric scale values S: S(low) = 1; S(high) = 5.

At this point, we can define the concept of "appropriate level of trust" in terms of the psychometric scale through a composition of the above mappings S and T. In other words, for each construct, the appropriate level of trust is the psychometric value associated with the trust level assigned to that construct:

- Appropriate level of trust with respect to design intent: A = {a_j : 1 ≤ j ≤ k}

For each construct c_j, the appropriate level of trust a_j for that construct is given by

    a_j = S(T(j)), 1 ≤ j ≤ k    (1)

A key aspect of the above definition is the qualifier "with respect to design intent." We assume the system functions without defects. With respect to design intent simply means "it should be trusted to accomplish X if it is designed to accomplish X." We make this assumption for simplification purposes, fully acknowledging that no real system is defect-free. In the presence of defects, the notion of appropriate level of trust becomes more complex.
Having defined appropriate level of trust, we are finally in a position to define the key concept, more appropriate level of trust. The intuition underlying this notion is the observation that if one's trust level is not appropriate to begin with, any intervention that moves the trust level toward the appropriate score by a greater amount than some other intervention can be said to provide a "more" appropriate level of trust. The Why Agent specifically exposes information associated with the purpose of AUSV actions. Such additional information serves to build trust [33]. If the psychometric score for the experimental group is closer to the appropriate trust level than the score for the control group, then we can say that the experimental treatment provided a more appropriate level of trust for that construct. Formally, we define this concept as follows:

More appropriate level of trust: given observed response scores r_j^C and r_j^E for construct j, the experimental response r_j^E reflects a more appropriate level of trust when the following holds:

    r_j^E − r_j^C < 0 if a_j = 1    (2)
    r_j^E − r_j^C > 0 if a_j = 5    (3)

We expect the Why Agent to affect observed trust levels only for those constructs for which relevant decision criteria are exposed during the scenario. In these cases, we expect Equations (2)-(3) to hold. In all other cases, we do not. For example, since the AUSV is not designed to protect marine life, we assert that the appropriate level of trust for the Environmental Conservation construct is "low." However, we do not expect to observe response levels consistent with Equations (2)-(3) unless dialog exposing decision rationale relevant to this concept is included in the scenario.

Based on this reasoning, we expect the effect of decision explanation to be one of pushing response scores up or down, toward the appropriate trust level, but only in cases where explanation dialog related to the construct under test is exposed. In other cases, we expect no difference in the response scores, as indicated in Table 1. We note that the null hypotheses are derived as the complements of the equations in Table 1; e.g., the 'low, with relevant dialog' null hypothesis equation would be r_j^E − r_j^C ≥ 0.

Table 1: Expected responses as a result of decision explanation.

  Low construct trust level, with relevant dialog: experimental response less than control response (r_j^E − r_j^C < 0).
  Low construct trust level, without relevant dialog: experimental response indistinguishable from control response (r_j^E − r_j^C = 0).
  High construct trust level, with relevant dialog: experimental response greater than control response (r_j^E − r_j^C > 0).
  High construct trust level, without relevant dialog: experimental response indistinguishable from control response (r_j^E − r_j^C = 0).
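To make the operationalization concrete, the following sketch (ours, not the study's analysis code) encodes the mappings S and T and the expected response differences of Table 1 in Python. Only the Environmental Conservation assignment (trust level low, no relevant dialog) is stated explicitly in the text; the remaining T(j) values and dialog flags are inferred from the directions tested in Tables 1 and 5 and should be read as assumptions.

```python
# Mapping S: trust level -> psychometric scale value.
LOW, HIGH = "low", "high"
SCALE = {LOW: 1, HIGH: 5}

constructs = {
    # j: (name, assumed T(j), relevant dialog exposed during the scenario?)
    1: ("General Competence",        HIGH, True),   # inferred from Table 5
    2: ("Self-Defense",              HIGH, True),   # inferred from Table 5
    3: ("Navigation",                HIGH, True),   # inferred from Table 5
    4: ("Environmental Conservation", LOW, False),  # stated explicitly in the text
    5: ("Mission",                   HIGH, True),   # inferred from Table 5
}

def appropriate_trust(j: int) -> int:
    """Equation (1): a_j = S(T(j))."""
    _, level, _ = constructs[j]
    return SCALE[level]

def expected_difference(j: int) -> str:
    """Expected sign of r_j^E - r_j^C under Table 1."""
    _, level, dialog = constructs[j]
    if not dialog:
        return "= 0"   # no relevant dialog: groups expected to be indistinguishable
    return "< 0" if appropriate_trust(j) == 1 else "> 0"   # Equations (2)-(3)

if __name__ == "__main__":
    for j, (name, _, _) in constructs.items():
        print(f"construct {j} ({name}): a_j = {appropriate_trust(j)}, "
              f"expect r^E - r^C {expected_difference(j)}")
```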
A total of 44 control and 50 experimental subjects were recruited for the Why Agent study. The experiment was designed to be completed in one hour. Following a short orientation, a pre-study questionnaire was presented to the participants. The pre-study questionnaire contained questions regarding participant demographics and technology attitudes. The purpose of the pre-study questionnaire was to determine whether any significant differences existed between the experimental and control groups. Following the pre-study questionnaire, participants were given a short training regarding the autonomous system and their role in the study. Participants were asked to play the role of a Coast Guard commander considering use of the autonomous system for a drug smuggling interdiction mission. Following the training, participants were shown the scenario video, which consisted of several minutes of user interaction with either the baseline or Why Agent interface. Following the video, participants completed the main study questionnaire. The system training was provided in a series of PowerPoint slides. Screenshots taken from the study video were provided to the participants in hardcopy form, along with hardcopies of the training material. This was done to minimize any dependence on memory for participants when completing the study questionnaire.

C. Experimental Results

To investigate whether significant differences exist between the control and experimental groups in terms of responses to the technology attitudes questions, ANOVA was performed. The results are shown in Table 2. Cronbach reliability coefficients, construct variances, and mean total response scores are shown for the control and experimental groups in Tables 3 and 4.

To investigate whether significant differences exist between the control and experimental groups in terms of responses to the study questions, ANOVA was performed. For this study, we focused our analysis on individual constructs. Thus, we do not present any statistics on, for example, correlations among responses related to multiple constructs for either the control or experimental group. The results are shown in Table 6.

Table 2: ANOVA computations analyzing differences between control and experimental groups, for technology attitude questions.

Table 3: Cronbach reliability coefficients, construct variances, and means for the control group.

  Construct   Var(Q1)   Var(Q2)   Var(Q3)   Total Var   Cronbach Alpha   Mean
  1           0.492     0.306     0.348     2.20        0.72             11.11
  2           0.710     0.517     NA        1.79        0.63             6.43
  3           0.720     0.319     NA        1.05        0.02             7.30
  4           0.911     0.670     NA        2.02        0.43             6.73
  5           0.953     0.586     NA        2.23        0.62             7.34

Table 4: Cronbach reliability coefficients, construct variances, and means for the experimental group.

  Construct   Var(Q1)   Var(Q2)   Var(Q3)   Total Var   Cronbach Alpha   Mean
  1           0.286     0.262     0.449     1.94        0.73             12.06
  2           0.689     0.694     NA        2.18        0.73             7.22
  3           0.480     0.367     NA        1.17        0.56             7.64
  4           0.571     0.621     NA        1.92        0.76             7.14
  5           0.898     0.629     NA        2.05        0.51             7.46

T-test results for each construct are shown in Table 5. Two p-values are shown for each construct; p1 represents the p-value resulting from use of the pooled variance, while p2 represents the p-value resulting from use of separate variances.

Table 5: T-test computations for each construct.

  Construct   p1      p2      Null hypothesis                                  Result
  1           0.001   0.001   Experimental score is not greater than control   Reject null hypothesis
  2           0.004   0.004   Experimental score is not greater than control   Reject null hypothesis
  3           0.058   0.059   Experimental score is not greater than control   Accept null hypothesis
  4           0.158   0.159   Experimental score is equal to control           Accept null hypothesis
  5           0.348   0.347   Experimental score is not greater than control   Accept null hypothesis

Table 6: ANOVA computations analyzing differences between control and experimental groups, for study questions.

The ANOVA results shown in Table 2 indicate that the experimental and control groups did not significantly differ across any attribute in terms of their responses to the technology attitudes questions. In other words, we do not see any evidence of a technology attitude bias in the study participants.

For constructs one and two, the experimental response was greater than the control response (p = 0.001 and 0.004, respectively), consistent with our expectations. For construct four, Environmental Conservation, we see no significant difference between the experimental and control responses (p = 0.16), which is also consistent with our expectations, as this construct had no associated decision explanation content exposed to the experimental group. The experimental response for construct 3 was not significantly higher than the control response, which is inconsistent with our expectations, although the difference is only marginally outside the significance threshold (p = 0.059).
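The raw questionnaire responses are not published, so the following is only a sketch of how the quantities reported in Tables 3-5 (pooled-variance and separate-variance one-sided t-tests, and Cronbach's alpha per construct) could be computed from per-participant scores; the synthetic data here are placeholders, not the study data.

```python
import numpy as np
from scipy import stats

def one_sided_ttests(experimental: np.ndarray, control: np.ndarray):
    """p1: pooled-variance t-test; p2: separate-variance (Welch) t-test.
    Both test H1: experimental mean > control mean, matching the one-sided
    nulls in Table 5. (For construct 4 the null is equality, so a two-sided
    test, alternative='two-sided', would be used instead.)"""
    _, p1 = stats.ttest_ind(experimental, control, equal_var=True, alternative="greater")
    _, p2 = stats.ttest_ind(experimental, control, equal_var=False, alternative="greater")
    return p1, p2

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants x n_questions) item-score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic total scores for one construct: 44 control and 50 experimental subjects.
    control = rng.normal(11.1, 1.5, size=44)
    experimental = rng.normal(12.1, 1.4, size=50)
    print("p1, p2 =", one_sided_ttests(experimental, control))
    # Synthetic 3-question item matrix (1-5 Likert responses) for one construct.
    print("alpha  =", cronbach_alpha(rng.integers(1, 6, size=(50, 3))))
```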
While the test results indicate moderate support for the efficacy of the Why Agent approach, they are decidedly mixed, so it is not possible to draw any definitive conclusions. As discussed below, we recognize that a number of significant limitations also hinder the application of our results. A pilot study would have helped to create a stronger experimental design and recruit a more representative sample population, but this was not possible due to budget and schedule constraints. Nevertheless, the study has provided initial evidence for how and to what extent the Why Agent approach might influence trust behavior in autonomous systems, and it has given impetus for continued investigations.

Construct Reliability: Referring to Table 4, we see that reliability coefficients for some constructs are not above the commonly accepted value of 0.7. Had schedule permitted, a pilot study could have uncovered this issue, providing an opportunity to revise the questionnaire.

Experiment Limitations: Clearly a variety of limitations apply to our experiment. One is that participants did not interact directly with the system interface; instead, entire groups of participants were shown a video of someone else interacting with the system. Also, the participants were not drawn from the population of interest. Consequently, our results may not apply to that target group. Additionally, subjects were asked to play a role with much less information than a real person in that role would have. Also, as noted by a reviewer, the experimental design does not allow us to determine whether decision correctness is related to trust, when clearly it should be; an intervention that raises trust regardless of correctness is not desirable. Finally, execution of the experiment could have been improved. In particular, our maritime autonomy SME noted that the Mode should have reflected the simulation events, that the LRAD light should have illuminated during the approach phase with an audio warning, and that the subjects should have been trained on the non-lethal defense functions.

Semantic Modeling: A potentially significant drawback to our approach is the manually intensive nature of the semantic modeling effort needed to populate our knowledgebase. Identifying ways to automate this process is a key area of potential future work related to this effort.
IV. CONCLUDING REMARKS

We draw the following specific conclusions based on the quantitative results reported above. First, the experimental and control groups do not significantly differ across any attribute in terms of their responses to the technology attitudes questions. The experimental and control groups do not significantly differ across any non-Group attribute in terms of their responses to the study questions, with the exception of gender differences for one construct. Construct reliability is low in some cases, indicating the need for a prior pilot study to tune the psychometric instrument. We accept the null hypothesis for construct 4 and reject it for constructs 1 and 2, as predicted under our assumptions. We cannot reject the null hypothesis associated with construct 3, although this is a very marginal case. The results for construct 5 are contrary to our expectations. Overall, we conclude that the Why Agent approach does increase user trust levels through decision transparency.

REFERENCES

[1] B. Chandrasekaran and W. Swartout, "Explanations in knowledge systems: the role of explicit representation of design knowledge," IEEE Expert, vol. 6, no. 3, 1991.
[2] B. Chandrasekaran, M. C. Tanner, and J. R. Josephson, "Explaining control strategies in problem solving," IEEE Expert, vol. 4, no. 1, pp. 9-15, 1989.
[3] W. R. Swartout, "XPLAIN: a system for creating and explaining expert consulting programs," Artificial Intelligence, vol. 21, no. 3, pp. 285-325, 1983.
[4] W. J. Clancey, "The epistemology of a rule-based expert system — a framework for explanation," Artificial Intelligence, vol. 20, no. 3, pp. 215-251, 1983.
[5] V. M. Saunders and V. S. Dobbs, "Explanation generation in expert systems," in Proceedings of the IEEE 1990 National Aerospace and Electronics Conference, vol. 3, pp. 1101-1106, 1990.
[6] J. D. Moore and W. R. Swartout, "Pointing: a way toward explanation dialog," in AAAI Proceedings, pp. 457-464, 1990.
[7] W. Swartout et al., "Explanations in knowledge systems: design for explainable expert systems," IEEE Expert, vol. 6, no. 3, pp. 58-64, 1991.
[8] J. D. Moore and C. L. Paris, "Planning text for advisory dialogues," in Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 1989.
[9] G. Carenini and J. D. Moore, "Generating explanations in context," in Proceedings of the 1st International Conference on Intelligent User Interfaces, 1993.
[10] J. D. Moore, "Responding to 'HUH?': answering vaguely articulated follow-up questions," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Wings for the Mind, 1989.
[11] M. C. Tanner and A. M. Keuneke, "Explanations in knowledge systems: the roles of the task structure and domain functional models," IEEE Expert, vol. 6, no. 3, 1991.
[12] J. L. Weiner, "BLAH, a system which explains its reasoning," Artificial Intelligence, vol. 15, no. 1-2, pp. 19-48, 1980.
[13] A. Eriksson, "Neat explanation of proof trees," in Proceedings of the 9th International Joint Conference on Artificial Intelligence, vol. 1, 1985.
[14] C. Millet and M. Gilloux, "A study of the knowledge required for explanation in expert systems," in Proceedings of Artificial Intelligence Applications, 1989.
[15] J. W. Wallis and E. H. Shortliffe, "Customized explanations using causal knowledge," in Rule-Based Expert Systems, Addison-Wesley, 1984.
[16] K. N. Papamichail and S. French, "Explaining and justifying the advice of a decision support system: a natural language generation approach," Expert Systems with Applications, vol. 24, no. 1, pp. 35-48, 2003.
[17] G. Carenini and J. D. Moore, "Generating and evaluating evaluative arguments," Artificial Intelligence, vol. 170, no. 11, pp. 925-952, 2006.
[18] T. R. Gruber and P. O. Gautier, "Machine-generated explanations of engineering models: a compositional modeling approach," in Proceedings of IJCAI, 1993.
[19] P. O. Gautier and T. R. Gruber, "Generating explanations of device behavior using compositional modeling and causal ordering," in Proceedings of AAAI, 1993.
[20] B. White and J. Frederiksen, "Causal model progressions as a foundation for intelligent learning," Artificial Intelligence, vol. 42, no. 1, pp. 99-155, 1990.
[21] F. Elizalde et al., "An MDP approach for explanation generation," in Workshop on Explanation-Aware Computing at AAAI, 2007.
[22] O. Z. Khan et al., "Explaining recommendations generated by MDPs," in Workshop on Explanation Aware Computing, 2008.
[23] S. Renooij and L. van der Gaag, "Decision making in qualitative influence diagrams," in Proceedings of the Eleventh International FLAIRS Conference, pp. 410-414, 1998.
[24] C. Lacave et al., "Graphical explanations in Bayesian networks," in Lecture Notes in Computer Science, vol. 1933, pp. 122-129, Springer-Verlag, 2000.
[25] A. Ko and B. Myers, "Extracting and answering why and why not questions about Java program output," ACM Transactions on Software Engineering and Methodology, vol. 20, no. 2, 2010.
[26] F. Sørmo, J. Cassens, and A. Aamodt, "Explanation in case-based reasoning – perspectives and goals," Artificial Intelligence Review, vol. 24, pp. 109-143, 2005.
[27] A. Kofod-Petersen and J. Cassens, "Explanations and context in ambient intelligent systems," in Proceedings of the 6th International and Interdisciplinary Conference on Modeling and Using Context, 2007.
[28] C. P. Langlotz et al., "A methodology for generating computer-based explanations of decision-theoretic advice," Medical Decision Making, vol. 8, no. 4, pp. 290-303, 1988.
[29] H. Lieberman and A. Kumar, "Providing expert advice by analogy for on-line help," in Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 26-32, 2005.
[30] Bader et al., "Explanations in proactive recommender systems in automotive scenarios," in Workshop on Decision Making and Recommendation Acceptance Issues in Recommender Systems, 2011.
[31] P. Pu and L. Chen, "Trust building with explanation interfaces," in Proceedings of the 11th International Conference on Intelligent User Interfaces, pp. 93-100, 2006.
[32] Baltrunas et al., "Context-aware places of interest recommendations and explanations," in 1st Workshop on Decision Making and Recommendation Acceptance Issues in Recommender Systems (DEMRA 2011), 2011.
[33] J. D. Lee and K. A. See, "Trust in automation: designing for appropriate reliance," Human Factors, vol. 46, no. 1, pp. 50-80, 2004.
[34] D. L. McGuinness et al., "Investigations into trust for collaborative information repositories: a Wikipedia case study," in Workshop on the Models of Trust for the Web, 2006.
[35] I. Zaihrayeu, P. Pinheiro da Silva, and D. L. McGuinness, "IWTrust: improving user trust in answers from the Web," in Proceedings of the 3rd International Conference on Trust Management, pp. 384-392, 2005.
[36] B. Y. Lim, A. K. Dey, and D. Avrahami, "Why and why not explanations improve the intelligibility of context-aware intelligent systems," in Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 2119-2128, 2009.
[37] A. Glass, D. L. McGuinness, and M. Wolverton, "Toward establishing trust in adaptive agents," in Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 227-236, 2008.
[38] J. J. Dijkstra, "On the use of computerised decision aids: an investigation into the expert system as persuasive communicator," Ph.D. dissertation, 1998.
[39] S. Y. Rieh and D. R. Danielson, "Credibility: a multidisciplinary framework," in Annual Review of Information Science and Technology, B. Cronin, Ed., vol. 41, pp. 307-364, 2007.