The Why Agent
Enhancing user trust in automation through explanation dialog

Rob Cole, Raytheon Company, Intelligence and Information Systems, State College, PA, U.S.A.
Michael J. Hirsch, Raytheon Company, Intelligence and Information Systems, Orlando, FL, U.S.A.
Jim Jacobs, Raytheon Company, Network Centric Systems, Ft. Wayne, IN, U.S.A.
Robert L. Sedlmeyer, Indiana University – Purdue University, Department of Computer Science, Ft. Wayne, IN, U.S.A.
Abstract— Lack of trust in autonomy is a recurrent issue that is becoming more and more acute as manpower reduction pressures increase. We address the socio-technical form of this trust problem through a novel decision explanation approach. Our approach employs a semantic representation to capture decision-relevant concepts as well as other mission-relevant knowledge, along with a reasoning approach that allows users to pose queries and get system responses that expose decision rationale to users. This representation enables a natural, dialog-based approach to decision explanation. It is our hypothesis that the transparency achieved through this dialog process will increase user trust in autonomous decisions. We tested our hypothesis in an experimental scenario set in the maritime autonomy domain. Participant responses on psychometric trust constructs were found to be significantly higher in the experimental group for the majority of constructs, supporting our hypothesis. Our results suggest the efficacy of incorporating a decision explanation facility in systems for which a socio-technical trust problem exists or might be expected to develop.

Keywords— Semantic modeling; Maritime Autonomy; Trust in Autonomy; Decision Explanation.

This research was supported by Raytheon Corporate IR&D.
I. INTRODUCTION

Large organizations such as the Department of Defense rely heavily on automation as a means of ensuring high-quality product, as well as cost control through manpower reduction. However, lack of user trust has repeatedly stood in the way of widespread deployment. We have observed two fundamental forms of the problem: the technical and the socio-technical form. The technical form is characterized by user reservations regarding the ability of a system to perform its mission due to known or suspected technical defects. For example, an automated detection process might have a very high false positive rate, conditioning operators to simply ignore its output. Trust in such a situation can only be achieved by addressing the issue of excessive false detections, a technical problem suggesting a purely technical solution. As another example, consider a situation in which automation is introduced into a purely manual process characterized by decision making in high-pressure situations. In such a situation, operators might reject automation in favor of the trusted, manual process for purely non-technical reasons. In other words, in the absence of any specific evidence of limitations of the automation, the automation could nonetheless be rejected for reasons stemming from the social milieu in which the system operates. This is the socio-technical form of the problem.

One might address the socio-technical problem through education: train the operators with sufficient knowledge of system specifications and design detail to erase doubts they may have regarding the automation. Such an approach is costly, since every operator would have to be trained to a high degree; operators would essentially have to be system specialists. Instead, we propose an approach intended for non-specialist operators, stemming from the insight that the socio-technical trust problem results from a lack of insight into system decision rationale. If an operator can be made to understand the why of system behavior, that operator can be expected to trust the system in the future to a greater degree, provided the rationale given to the operator makes sense in the current mission context.

Explanation mechanisms in expert systems have focused on the use of explicit representations of design logic and problem-solving strategies [1]. The early history of explanation in expert systems saw the emergence of three types of approaches, as described in Chandrasekaran, Tanner, and Josephson [2]. Type 1 systems explain how data matches local goals. Type 2 systems explain how knowledge can be justified [3]. Type 3 systems explain how control strategy can be justified [4]. A more detailed description of these types is given by Saunders and Dobbs [5, p. 1102]:

    Type 1 explanations are concerned with explaining why certain decisions were or were not made during the execution (runtime) of the system. These explanations use information about the relationships that exist between pieces of data and the knowledge (sets of rules, for example) available for making specific decisions or choices based on this data. For example, Rule X fired because Data Y was found to be true.
    Type 2 explanations are concerned with explaining the knowledge base elements themselves. In order to do this, explanations of this type must look at knowledge about knowledge. For example, knowledge may exist about a rule that identifies this rule (this piece of knowledge) as being applicable ninety percent of the time. A type 2 explanation could use this information (this knowledge about knowledge) to justify the use of this rule. Other knowledge used in providing this type of explanation consists of knowledge that is used to develop the ES but which does not affect the operation of the system. This type of knowledge is referred to as deep knowledge.

    Type 3 explanations are concerned with explaining the runtime control strategy used to solve a particular problem. For example, explaining why one particular rule (or set of rules) was fired before some other rule is an explanation about the control strategy of the system. Explaining why a certain question (or type of question) was asked of the user in lieu of some other logical or related choice is another example. Therefore, type 3 explanations are concerned with explaining how and why the system uses its knowledge the way it does, a task that also requires the use of deep knowledge in many cases.

Design considerations for explanations with dialog are discussed in a number of papers by Moore and colleagues ([6], [7], [8], and [9]). These papers describe the explainable expert systems (EES) project, which incorporates a representation for problem-solving principles, a representation for domain knowledge, and a method to link between them. In Moore and Swartout [6], hypertext is used to avoid the referential problems inherent in natural language analysis. To support dialog with hypertext, a planning approach to explanation was developed that allowed the system to understand what part of the explanation a user is pointing at when making further queries. Moore and Paris [8] and Carenini and Moore [9] discuss architectures for text planners that allow for explanations that take into account the context created by prior utterances. In Moore [10], an approach to handling badly-formulated follow-up questions (such as a novice might produce after receiving an incomprehensible explanation from an expert) is presented that enables the production of clarifying explanations. Tanner and Keuneke [11] describe an explanation approach based on a large number of agents with well-defined roles. A particular agent produces an explanation of its conclusion by ordering a set of text strings in a sequence that depends on the decision's runtime context. Based on an explanation from one agent, users can request elaboration from other agents.
Weiner [12] focuses on the structure of explanations with the goal of making explanations easy to understand by avoiding complexity. Features identified as important for this goal include syntactic form and how the focus of attention is located and shifted. Eriksson [13] examines answers generated through transformation of a proof tree, with pruning of paths, such as non-informative ones. Millet and Gilloux [14] describe the approach in Wallis and Shortliffe [15] as employing a user model in order to provide users with explanations tailored to their level of understanding. The natural language aspect of explanation is the focus of Papamichail and French [16], which uses a library of text plans to structure the explanations.

In Carenini and Moore [17], a comprehensive approach toward the generation of evaluative arguments (called GEA) is presented. GEA focuses on the generation of text-based arguments expressed in natural language. The initial step of GEA's processing consists of a text planner selecting content from a domain model by applying a communicative strategy to achieve a communication goal (e.g., make a user feel more positively toward an entity). The selected content is packaged into sentences through the use of a computational grammar. The underlying knowledge base consists of a domain model with entities and their relationships and an additive multi-attribute value function (a decision-theoretic model of the user's preferences).

In Gruber and Gautier [18] and Gautier and Gruber [19], an approach to explaining the behavior of engineering models is presented. Rather than relying on causal influences that are hard-coded [20], this approach is based on the inference of causal influences, inferences which are made at run time. Using a previously developed causal ordering procedure, an influence graph is built from which causal influences are determined. At any point in the influence graph, an explanation can be built based on the adjacent nodes, and users can traverse the graph, obtaining explanations at any node.

Approaches to producing explanations in MDPs are proposed in Elizalde et al. [21] and Khan, Poupart, and Black [22]. Two strategies exist for producing explanations in BNs. One involves transforming the network into a qualitative representation [23]. The other approach focuses on the graphical representation of the network. A software tool called Elvira is presented which allows for the simultaneous display of probabilities of different evidence cases, along with a monitor and editor of cases, allowing the user to enter evidence and select the information they want to see [24].

An explanation application for Java debugging is presented in Ko and Myers [25]. This work describes a tool called Whyline which supports programmer investigation of program behavior. Users can pose "why did" and "why didn't" questions about program code and execution. Explanations are derived using static and dynamic slicing, precise call graphs, reachability analysis, and algorithms for determining potential sources of values.

Explanations in case-based reasoning systems are examined as well. Sørmo, Cassens, and Aamodt [26] present a framework for explanation and consider specific goals that explanations can satisfy, which include transparency, justification, relevance, conceptualization, and learning. Kofod-Petersen and Cassens [27] consider the importance of context and show how context and explanations can be combined to deal with the different types of explanation needed for meaningful user interaction.

Explanation of decisions made via decision trees is considered in Langlotz, Shortliffe, and Fagan [28]. An explanation technique is selected and applied to the most significant variables, creating a symbolic expression that is converted to English text. The resulting explanation contains no mathematical formulas, probability values, or utility values.

Lieberman and Kumar [29] discuss the problem of mismatch between the specialized knowledge of experts providing help and the naiveté of users seeking help. Here, the problem consists of providing explanations of the expert decisions in terms the users can understand. The SuggestDesk system is described, which advises online help personnel. Using a knowledgebase, analogies are found between technical problem-solution pairs and everyday life events that can be used to explain them.

Bader et al. [30] use explanation facilities in recommender systems to convince users of the relevance of recommended items and to enable fast decision making. In previous work, Bader found that recommendations lack user acceptance if the rationale is not presented. This work follows the approach of Carenini and Moore [17].

In Pu and Chen [31], a "Why?" form of explanation was evaluated against what the researchers termed an Organized View (OV) form of explanation in the context of explanations of product recommendations. The OV approach attempts to group decision alternatives and provide group-level summary explanations, e.g., "these are cheaper than the recommendation but heavier." A trust model was used to conduct a user evaluation in which trust-related constructs were assessed through a Likert scale instrument. The OV approach was found to be associated with higher levels of user trust than the alternative approach.
The importance of context in explaining the recommendations of a recommender system was investigated in Baltrunas et al. [32]. In this study of point-of-interest recommendation, customized explanation messages are provided for a set of 54 possible contextual conditions (e.g., "this place is good to visit with family"). Even where more than one contextual condition holds and is factored into the system's decision, only one can be utilized for the explanation (the most influential one in the predictive model is used). Only a single explanatory statement is provided to the user.

Explanation capabilities have also been shown to aid in increasing user satisfaction with, and establishing trust in, complex systems [34, 35, 36]. The key insight revealed by this research is the need for transparency in system decision-making. As noted by Glass et al., "users identified explanations of system behavior, providing transparency into its reasoning and execution, as a key way of understanding answers and thus establishing trust" [37]. Dijkstra [38] studied the persuasiveness of decision aids for novices and experts. In one experiment, lawyers examined the results of nine legal cases supported by one of two expert systems. Both systems had incomplete knowledge models. Because of the incomplete models, the expert systems routinely gave opposite advice on each legal case. This resulted in the lawyers being easily misled. Therefore, adequate explanation facilities and a good user interface must provide the user with the transparency needed to make the decision of trusting the system. Rieh and Danielson [39] outline four different explanation types for decision aids. Line-of-reasoning explanations provide the logical justification of the decision; justification explanations provide extensive reference material to support the decision; control explanations provide the problem-solving strategy used to arrive at the decision; and terminological explanations provide definition information on the decision. In each case, the amount of transparency in the decision-making process is a factor in the trust of the user.

Our approach to providing transparency, the Why Agent, is a decision explanation approach incorporating dialog between the user and the system. Rather than attempting to provide monolithic explanations to individual questions, our dialog-based approach allows the user to pose a series of questions, the responses to which may prompt additional questions. Imitative of natural discourse, our dialog approach allows a user to understand the behavior of the system by asking questions about its goals, actions, or observables and receiving responses couched in similar terms. We implemented our approach and conducted an evaluation in a maritime autonomy scenario. The evaluation consisted of an experiment in which two versions of an interface were shown to participants who then answered questions related to trust. Results of the experiment show response scores statistically consistent with our expectations for the majority of psychometric constructs tested, supporting our overall hypothesis that transparency fosters trust. The rest of this paper is organized as follows. Section II describes the problem domain and the technical approach. Experiments and results are presented in Section III. In Section IV, we provide some concluding remarks and future research directions.

II. TECHNICAL APPROACH

A. Domain Overview

Our approach to demonstrating the Why Agent functionality and evaluating its effectiveness consisted of a simulation-based environment centered on a maritime scenario defined in consultation with maritime autonomy SMEs. The notional autonomous system in our scenario was the X3 autonomous unmanned surface vehicle (AUSV) by Harbor Wing Technologies (http://www.harborwingtech.com). Raytheon presently has a business relationship with this vendor in which we provide ISR packages for their AUSVs.

The X3 was of necessity a notional AUSV for our demonstration because the actual prototype was not operational at the time of the Why Agent project. For this reason, a live, on-system demonstration was not considered. Instead, our demonstration environment was entirely simulation-based. An existing route planning engine developed under Raytheon research was modified to serve as the AUSV planner. Additional code was developed to support the simulation environment and Why Agent functionality, as described below.

B. Software Architecture

Our software architecture consists of four components interacting in a service-oriented architecture, as shown in Figure 1.
The Planner component performed route planning functions based on a plan of intended movement. A plan of intended movement is input in the form of a series of waypoints. These waypoints, along with environmental factors, such as weather forecast data, are used in the planning algorithm to determine an actual over-ocean route. The planner was a pre-existing component developed on R&D that the Why Agent leveraged for the demonstration. Modifications made to the planner to support the Why Agent project include changes to expose route change rationale to the controller and to inform the controller of weather report information.

Figure 1: SW architecture for Why Agent.

The Controller represents the embodiment of the majority of the simulated AUSV decision logic and simulation control logic. Because we did not employ an actual AUSV for the Why Agent project, much of the decision logic of an actual AUSV had to be simulated for our demonstration; this logic was implemented in the Controller. The input to the Controller consisted of a test control file that defined the event timeline for the simulation. In addition to orchestrating simulation events defined in the control file, the Controller mediated queries and responses between the user interface and the semantic service.
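The paper does not specify the Controller's interfaces; the following is a minimal, hypothetical sketch (all class, method, and message names are our own assumptions, not the project's code) of the mediation role just described: the Controller accepts a (node ID, link ID) selection from the user interface, forwards it to the semantic service, and returns the linked concept for display. The ConductPatrol example anticipates the domain model discussed below.

```python
# Hypothetical sketch of the Controller's query mediation; names and message
# shapes are illustrative assumptions, not the project's actual code.
from dataclasses import dataclass


@dataclass
class WhyQuery:
    node_id: str  # selected log item's concept node, e.g. "ConductPatrol"
    link_id: str  # relationship behind the menu option, e.g. "servesPurpose"


class FakeSemanticService:
    """Stand-in for the OWL-backed semantic service (see the rdflib sketch below)."""
    MODEL = {("ConductPatrol", "servesPurpose"): "MissionExecution"}

    def resolve(self, node_id: str, link_id: str) -> str:
        return self.MODEL[(node_id, link_id)]


class Controller:
    def __init__(self, semantic_service):
        self.semantic_service = semantic_service

    def handle_selection(self, query: WhyQuery) -> str:
        # Forward the GUI selection to the semantic service and hand the
        # linked node back for display in the explanation panel.
        linked = self.semantic_service.resolve(query.node_id, query.link_id)
        return f"{query.node_id} {query.link_id} {linked}"


controller = Controller(FakeSemanticService())
print(controller.handle_selection(WhyQuery("ConductPatrol", "servesPurpose")))
# -> ConductPatrol servesPurpose MissionExecution
```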
The graphical user interface was implemented as a web application. Two versions of the GUI were developed, one with and one without the Why Agent explanation facility. The Why Agent version is shown in Figure 2. It has four screen regions: a map, a status panel, a log data panel, and an explanation panel. The map, implemented with Google Map technology, shows the current location and route of the AUSV. The status panel shows various AUSV status values, such as location, speed, current mode, etc. The log panel shows a time-stamped series of event descriptions. Various items in the log panel are user-selectable and have context-sensitive menus to support the user interface functionality of the Why Agent facility. When a user makes a selection, the response from the semantic service is shown in the bottom (explanation) panel. Additionally, responses in the explanation panel are also selectable for further queries. In this manner, the user can engage in a dialog with the system.

Figure 2: General GUI for Why Agent interface.

The semantic service contains the knowledgebase underlying the decision rationale exposed by the Why Agent. The knowledge consists of event and domain ontology models represented in Web Ontology Language (OWL) format. The semantic service provides responses to queries from the controller through queries against its underlying models.

An example of a domain model is shown in Figure 3. Relationships in this figure encode potential queries linking concepts and events that can be displayed in the user interface. For example, the activity ConductPatrol relates to the function MissionExecution through the relationship servesPurpose. This relationship is statically associated with the query why? at the user level. Thus, the existence of this link connected with the node ConductPatrol implies a why? option being made available to the user in the context-sensitive menu for the ConductPatrol item. When the user selects the ConductPatrol item and the associated why? option, a query is generated that contains IDs associated with the ConductPatrol node and the servesPurpose link. The linked node, in this case MissionExecution, is then returned to the user as the result of a query against the associated OWL model.

Figure 3: Example domain model.
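The paper states only that the linked node is returned "as the result of a query against the associated OWL model"; it does not give the query mechanism. As a minimal sketch, assuming Python's rdflib library and a hypothetical ex: namespace for the domain model, such a lookup might be expressed as a SPARQL query over the servesPurpose link:

```python
# Minimal sketch (not the paper's implementation) of answering a why? query
# by following the servesPurpose link in an RDF/OWL domain model.
# The ex: namespace and triple layout are hypothetical.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/whyagent#")

g = Graph()
# Stand-in for the domain ontology loaded by the semantic service.
g.add((EX.ConductPatrol, EX.servesPurpose, EX.MissionExecution))

def why(activity):
    """Return the purpose node(s) linked to an activity via servesPurpose."""
    results = g.query(
        "SELECT ?purpose WHERE { ?activity ex:servesPurpose ?purpose . }",
        initNs={"ex": EX},
        initBindings={"activity": activity},
    )
    return [str(row.purpose) for row in results]

print(why(EX.ConductPatrol))
# -> ['http://example.org/whyagent#MissionExecution']
```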
III. EXPERIMENTATION

Our evaluation approach consisted of an experiment in which the Why Agent was the treatment. Two versions of a prototype operator interface were developed. One version incorporated the Why Agent functionality and the second did not. The two versions were otherwise identical. Screenshots of the two interface versions are presented in Figures 4 and 5.

A. Demonstration Scenario

The demonstration scenario consisted of autonomous fishing law enforcement in the Northwestern Hawaiian Islands Marine National Monument. The CONOP for this mission is as follows:

- The AUSV operator selects waypoints corresponding to a patrol area.
- The AUSV route planner finds a route through the waypoints and a patrol is conducted.
- RADAR is used to detect potential illegal fishing vessels (targets).
- Targets are investigated visually after the AUSV closes to an adequate proximity.
- Automated analysis of the visual data is used to confirm the target is engaged in illegal fishing.
- Targets engaged in illegal activity are visually identified for subsequent manned enforcement action.

Non-lethal self-defensive actions can be taken by the AUSV in the presence of hostile targets.

To support this demonstration, a software-based simulation environment was developed. The demonstration consisted of capturing video of user interactions with the baseline and Why Agent versions of the operator interface while a scripted series of events unfolded over a pre-determined timeline.

Figure 4: Operator interface without the Why Agent functionality.

Figure 5: Operator interface with the Why Agent functionality.

B. Experimental Design

Our experiment consisted of a single-factor, randomized design. The factor is interface type and has two levels: baseline (control) and Why Agent (experimental). Thus, we have two treatment levels, corresponding to the two factor levels. The experimental subjects were Raytheon employees, recruited across multiple Raytheon locations, during the project.

Our general hypothesis is that the Why Agent fosters a more appropriate level of trust in users than the baseline system. By utilizing the information provided by the Why Agent, users will be more able to calibrate their trust [33]. To test this hypothesis, we needed to operationalize the concept of "more appropriate level of trust" and thereby derive one or more testable hypotheses. We accomplished this through the following operationalization.

Trust in a particular system, being an unobservable mental aspect of a user, necessitates the use of psychometric readings of constructs related to the overall concept of trust. Given the broad nature of this concept, multiple constructs should be defined. Using our domain insight and engineering judgment, we selected the following set of five psychometric constructs: 1) General Competence, 2) Self-Defense, 3) Navigation, 4) Environmental Conservation, and 5) Mission. Each construct is intended to capture the users' belief regarding the system's ability to effectively perform in regard to that construct, i.e., the user's level of trust for that construct. For example, the construct Mission attempts to encompass user attitudes toward the ability of the system to successfully execute its mission. The Environmental Conservation construct was included as an example of a construct under which we would not expect to see a difference in psychometric responses.
For each construct, we have a set of possible trust levels and a set of psychometric participant response scores. Define these as follows (for this study, k = 5):

- Set of k constructs: C = {c_j : 1 ≤ j ≤ k}
- Set of trust levels: L = {low, high}
- Psychometric participant response scores for each construct: control R^C = {r_j^C : 1 ≤ j ≤ k} and experimental R^E = {r_j^E : 1 ≤ j ≤ k}

Here, we take the simplest possible approach, a binary trust level set. We simply assume that the trust level for a particular construct should either be low or high, with nothing in between. Clearly, many other trust models are possible. To operationalize the notion of "more appropriate level of trust", we need to define, for each construct, a ground-truth assignment of trust level. Thus, we need to define the following mapping T:

- Mapping of construct to trust level: T(j) ∈ L
  - T(j) = low: people should not trust the system regarding construct j
  - T(j) = high: people should trust the system regarding construct j

Additionally, we need to map the elements of the trust set to psychometric scale values. In other words, we need to normalize the scale as follows:

- Mapping of trust level to psychometric scale values S: S(low) = 1; S(high) = 5

At this point, we can define the concept of "appropriate level of trust" in terms of the psychometric scale through a composition of the above mappings S and T. In other words, for each construct, the appropriate level of trust is the psychometric value associated with the trust level assigned to that construct:

- Appropriate level of trust with respect to design intent: A = {a_j : 1 ≤ j ≤ k}

For each construct c_j, the appropriate level of trust a_j for that construct is given by

    a_j = S(T(j)), 1 ≤ j ≤ k.    (1)

A key aspect of the above definition is the qualifier with respect to design intent. We assume the system functions without defects. With respect to design intent simply means "it should be trusted to accomplish X if it is designed to accomplish X." We make this assumption for simplification purposes, fully acknowledging that no real system is defect-free. In the presence of defects, the notion of appropriate level of trust becomes more complex.

Having defined appropriate level of trust, we are finally in a position to define the key concept, more appropriate level of trust. The intuition underlying this notion is the observation that if one's trust level is not appropriate to begin with, any intervention that moves the trust level toward the appropriate score by a greater amount than some other intervention can be said to provide a "more" appropriate level of trust. The Why Agent specifically exposes information associated with the purpose of AUSV actions. Such additional information serves to build trust [33]. If the psychometric score for the experimental group is closer to the appropriate trust level than the score for the control group, then we can say that the experimental treatment provided a more appropriate level of trust for that construct. Formally, we define this concept as follows:

- More appropriate level of trust: given observed response scores r_j^C and r_j^E for construct j, the experimental response r_j^E reflects a more appropriate level of trust when the following holds:

    r_j^E - r_j^C < 0 if a_j = 1    (2)
    r_j^E - r_j^C > 0 if a_j = 5    (3)

We expect the Why Agent to affect observed trust levels only for those constructs for which relevant decision criteria are exposed during the scenario. In these cases, we expect Equations (2)-(3) to hold. In all other cases, we do not. For example, since the AUSV is not designed to protect marine life, we assert that the appropriate level of trust for the Environmental Conservation construct is "low." However, we do not expect to observe response levels consistent with Equations (2)-(3) unless dialog exposing decision rationale relevant to this concept is included in the scenario.

Based on this reasoning, we expect the effect of decision explanation to be one of pushing response scores up or down, toward the appropriate trust level, but only in cases where explanation dialog related to the construct under test is exposed. In other cases, we expect no difference in the response scores, as indicated in Table 1. We note that the null hypotheses are derived as the complements of the equations in Table 1; e.g., the 'low, with relevant dialog' null hypothesis would be r_j^E - r_j^C ≥ 0.

Table 1: Expected responses as a result of decision explanation.

Construct trust level | With relevant dialog | Without relevant dialog
Low  | Experimental response less than control response (r_j^E - r_j^C < 0)    | Experimental response indistinguishable from control response (r_j^E - r_j^C = 0)
High | Experimental response greater than control response (r_j^E - r_j^C > 0) | Experimental response indistinguishable from control response (r_j^E - r_j^C = 0)
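For concreteness, the operationalization can be restated in a few lines of code. The sketch below is our summary, not the authors' implementation; the T and dialog assignments for constructs 1, 2, 3, and 5 are inferred from the one-sided null hypotheses reported later in Table 5, while only the Environmental Conservation assignment (low, with no relevant dialog exposed) is stated explicitly in the text.

```python
# Sketch of the operationalization: mappings T and S, Equation (1), and the
# expected response-score differences of Table 1. The T and dialog entries
# for constructs 1, 2, 3, and 5 are inferred from Table 5's null hypotheses.
CONSTRUCTS = {1: "General Competence", 2: "Self-Defense", 3: "Navigation",
              4: "Environmental Conservation", 5: "Mission"}
T = {1: "high", 2: "high", 3: "high", 4: "low", 5: "high"}  # ground-truth trust level
S = {"low": 1, "high": 5}                                   # psychometric scale mapping
RELEVANT_DIALOG = {1: True, 2: True, 3: True, 4: False, 5: True}

def appropriate_trust(j):
    """Equation (1): a_j = S(T(j))."""
    return S[T[j]]

def expected_difference(j):
    """Predicted sign of r_j^E - r_j^C under Table 1."""
    if not RELEVANT_DIALOG[j]:
        return "= 0"  # no relevant explanation dialog: no effect expected
    return "> 0" if appropriate_trust(j) == 5 else "< 0"

for j, name in CONSTRUCTS.items():
    print(f"{name}: a_{j} = {appropriate_trust(j)}, "
          f"expect r_{j}^E - r_{j}^C {expected_difference(j)}")
```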
A total of 44 control and 50 experimental subjects were recruited for the Why Agent study. The experiment was designed to be completed in one hour. Following a short orientation, a pre-study questionnaire was presented to the participants. The pre-study questionnaire contained questions regarding participant demographics and technology attitudes. The purpose of the pre-study questionnaire was to determine whether any significant differences existed between the experimental and control groups. Following the pre-study questionnaire, participants were given a short training regarding the autonomous system and their role in the study. Participants were asked to play the role of a Coast Guard commander considering use of the autonomous system for a drug smuggling interdiction mission. Following the training, participants were shown the scenario video, which consisted of several minutes of user interaction with either the baseline or Why Agent interface. Following the video, participants completed the main study questionnaire. The system training was provided in a series of PowerPoint slides. Screenshots taken from the study video were provided to the participants in hardcopy form, along with hardcopies of the training material. This was done to minimize any dependence on memory for participants when completing the study questionnaire.

C. Experimental Results

To investigate whether significant differences exist between the control and experimental groups in terms of responses to the technology attitudes questions, ANOVA was performed. The results are shown in Table 2. Cronbach reliability coefficients, construct variances, and mean total response scores are shown for the control and experimental groups in Tables 3 and 4.

Table 2: ANOVA computations analyzing differences between control and experimental groups, for technology attitude questions.

The ANOVA results shown in Table 2 indicate that the experimental and control groups did not significantly differ across any attribute in terms of their responses to the technology attitudes questions. In other words, we do not see any evidence of a technology attitude bias in the study participants.

To investigate whether significant differences exist between the control and experimental groups in terms of responses to the study questions, ANOVA was performed. For this study, we focused our analysis on individual constructs. Thus, we do not present any statistics on, for example, correlations among responses related to multiple constructs for either the control or experimental group. The results are shown in Table 6.

Table 6: ANOVA computations analyzing differences between control and experimental groups, for study questions.
Table 3: Cronbach reliability coefficients, construct variances, and means for control group.

Control Results
Construct   Var(Q1)   Var(Q2)   Var(Q3)   Var(Total)   Cronbach Alpha   Mean
1           0.492     0.306     0.348     2.20         0.72             11.11
2           0.710     0.517     NA        1.79         0.63             6.43
3           0.720     0.319     NA        1.05         0.02             7.30
4           0.911     0.670     NA        2.02         0.43             6.73
5           0.953     0.586     NA        2.23         0.62             7.34

Table 4: Cronbach reliability coefficients, construct variances, and means for experimental group.

Experimental Results
Construct   Var(Q1)   Var(Q2)   Var(Q3)   Var(Total)   Cronbach Alpha   Mean
1           0.286     0.262     0.449     1.94         0.73             12.06
2           0.689     0.694     NA        2.18         0.73             7.22
3           0.480     0.367     NA        1.17         0.56             7.64
4           0.571     0.621     NA        1.92         0.76             7.14
5           0.898     0.629     NA        2.05         0.51             7.46
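The reported coefficients are consistent with the variance form of Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / total variance), computed from the tabulated per-question and total variances. A short check (ours, for illustration) reproduces two of the reported values:

```python
# Cronbach's alpha from item and total variances, as tabulated above:
# alpha = k/(k-1) * (1 - sum(item variances) / total variance).
def cronbach_alpha(item_variances, total_variance):
    k = len(item_variances)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Control group, construct 1 (three questions): reproduces the reported 0.72.
print(round(cronbach_alpha([0.492, 0.306, 0.348], 2.20), 2))  # 0.72
# Control group, construct 2 (two questions): reproduces the reported 0.63.
print(round(cronbach_alpha([0.710, 0.517], 1.79), 2))         # 0.63
```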
                                                                                                                intervention that raises trust regardless of correctness is not
                     Table 5: T-test computations for each construct.
                                                                                                                desirable. Finally, execution of the experiment could have been
                                     Construct Hypothesis Tests                                                 improved. In particular, our maritime autonomy SME noted:
                  p-values                                                                                      The Mode should have reflected the simulation events; The
     Construct p1        p2                      Null Hypothesis                               Result           LRAD light should have illuminated during the approach phase
        1      0.001    0.001   Experimental score is not greater than Control score   Reject Null Hypothesis
        2      0.004    0.004   Experimental score is not greater than Control score   Reject Null Hypothesis   with an audio warning; The subjects should have been trained
        3
        4
               0.058
               0.158
                        0.059
                        0.159
                                Experimental score is not greater than Control score
                                   Experimental score is equal to Control score
                                                                                       Accept Null Hypothesis
                                                                                       Accept Null Hypothesis
                                                                                                                on the nonlethal defense functions.
        5      0.348    0.347   Experimental score is not greater than Control score   Accept Null Hypothesis
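The paper does not name its statistics package. Assuming standard two-sample t-tests, the sketch below (with synthetic placeholder data, not the study's response scores) shows how a pooled-variance p1 and a separate-variance (Welch) p2 could be computed against the one-sided null hypothesis "experimental score is not greater than control score":

```python
# Illustrative sketch of the two t-test variants behind Table 5's p1 and p2.
# The data are synthetic placeholders, not the study's responses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(6.4, 1.3, size=44)       # hypothetical control scores
experimental = rng.normal(7.2, 1.5, size=50)  # hypothetical experimental scores

# One-sided alternative "experimental greater than control", matching the
# null "experimental score is not greater than control score".
t1, p1 = stats.ttest_ind(experimental, control, equal_var=True,
                         alternative="greater")   # p1: pooled variance
t2, p2 = stats.ttest_ind(experimental, control, equal_var=False,
                         alternative="greater")   # p2: separate variances (Welch)
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")
```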
Table 5: T-test computations for each construct.

Construct   p1      p2      Null Hypothesis                                         Result
1           0.001   0.001   Experimental score is not greater than control score    Reject null hypothesis
2           0.004   0.004   Experimental score is not greater than control score    Reject null hypothesis
3           0.058   0.059   Experimental score is not greater than control score    Accept null hypothesis
4           0.158   0.159   Experimental score is equal to control score            Accept null hypothesis
5           0.348   0.347   Experimental score is not greater than control score    Accept null hypothesis

For constructs one and two, the experimental response was greater than the control response (p = 0.001 and 0.004, respectively), consistent with our expectations. For construct four, Environmental Conservation, we see no significant difference between the experimental and control responses (p = 0.16), which is also consistent with our expectations, as this construct had no associated decision explanation content exposed to the experimental group. The experimental response for construct 3 was not significantly higher than the control response, which is inconsistent with our expectations, although the difference is only marginally outside the significance threshold (p = 0.059).
While the test results indicate moderate support for the efficacy of the Why Agent approach, they are decidedly mixed, so it is not possible to draw any definitive conclusions. As discussed below, we recognize that a number of significant limitations also hinder the application of our results. A pilot study would have helped to create a stronger experimental design and recruit a more representative sample population, but this was not possible due to budget and schedule constraints. Nevertheless, the study has provided initial evidence for how and to what extent the Why Agent approach might influence trust behavior in autonomous systems, and given impetus for continued investigations.

Construct Reliability: Referring to Table 4, we see that reliability coefficients for some constructs are not above the commonly-accepted value of 0.7. Had schedule permitted, a pilot study could have uncovered this issue, providing an opportunity to revise the questionnaire.

Experiment Limitations: Clearly, a variety of limitations apply to our experiment. One is that participants did not interact directly with the system interface; instead, entire groups of participants were shown a video of someone else interacting with the system. Also, the participants were not drawn from the population of interest; consequently, our results may not apply to that target group. Additionally, subjects were asked to play a role with much less information than a real person in that role would have. Also, as noted by a reviewer, the experimental design does not allow us to determine whether decision correctness is related to trust, when clearly it should be; an intervention that raises trust regardless of correctness is not desirable. Finally, execution of the experiment could have been improved. In particular, our maritime autonomy SME noted that the Mode display should have reflected the simulation events, that the LRAD light should have illuminated during the approach phase with an audio warning, and that the subjects should have been trained on the nonlethal defense functions.

Semantic Modeling: A potentially significant drawback to our approach is the manually-intensive nature of the semantic modeling effort needed to populate our knowledgebase. Identifying ways to automate this process is a key area of potential future work related to this effort.

IV. CONCLUDING REMARKS

We draw the following specific conclusions based on the quantitative results reported above. First, the experimental and control groups do not significantly differ across any attribute in terms of their responses to the technology attitudes questions. The experimental and control groups do not significantly differ across any non-Group attribute in terms of their responses to the study questions, with the exception of gender differences for one construct. Construct reliability is low in some cases, indicating the need for a prior pilot study to tune the psychometric instrument. We accept the null hypothesis for construct 4 and reject it for constructs 1 and 2, as predicted under our assumptions. We cannot reject the null hypothesis associated with construct 3, although this is a very marginal case. The results for construct 5 are contradictory to our expectations. Overall, we conclude that the Why Agent approach does increase user trust levels through decision transparency.
REFERENCES

[1] B. Chandrasekaran and W. Swartout, "Explanations in knowledge systems: the role of explicit representation of design knowledge," IEEE Expert, vol. 6, no. 3, pp. 47-49, 1991.
[2] B. Chandrasekaran, M. C. Tanner, and J. R. Josephson, "Explaining control strategies in problem solving," IEEE Expert, vol. 4, no. 1, pp. 9-15, 1989.
[3] W. R. Swartout, "XPLAIN: a system for creating and explaining expert consulting programs," Artificial Intelligence, vol. 21, no. 3, pp. 285-325, 1983.
[4] W. J. Clancey, "The epistemology of a rule-based expert system — a framework for explanation," Artificial Intelligence, vol. 20, no. 3, pp. 215-251, 1983.
[5] V. M. Saunders and V. S. Dobbs, "Explanation generation in expert systems," in Proceedings of the IEEE 1990 National Aerospace and Electronics Conference, vol. 3, pp. 1101-1106, 1990.
[6] J. Moore and W. Swartout, "Pointing: a way toward explanation dialog," in AAAI Proceedings, pp. 457-464, 1990.
[7] W. Swartout et al., "Explanations in knowledge systems: design for explainable expert systems," IEEE Expert, vol. 6, no. 3, pp. 58-64, 1991.
[8] J. D. Moore and C. L. Paris, "Planning text for advisory dialogues," in Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 1989.
[9] G. Carenini and J. D. Moore, "Generating explanations in context," in Proceedings of the 1st International Conference on Intelligent User Interfaces, 1993.
[10] J. D. Moore, "Responding to 'HUH?': answering vaguely articulated follow-up questions," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Wings for the Mind, 1989.
[11] M. C. Tanner and A. M. Keuneke, "Explanations in knowledge systems: the roles of the task structure and domain functional models," IEEE Expert, vol. 6, no. 3, 1991.
[12] J. L. Weiner, "BLAH, a system which explains its reasoning," Artificial Intelligence, vol. 15, no. 1-2, pp. 19-48, 1980.
[13] A. Eriksson, "Neat explanation of proof trees," in Proceedings of the 9th International Joint Conference on Artificial Intelligence, vol. 1, 1985.
[14] C. Millet and M. Gilloux, "A study of the knowledge required for explanation in expert systems," in Proceedings of Artificial Intelligence Applications, 1989.
[15] J. W. Wallis and E. H. Shortliffe, "Customized explanations using causal knowledge," in Rule-Based Expert Systems, Addison-Wesley, 1984.
[16] K. N. Papamichail and S. French, "Explaining and justifying the advice of a decision support system: a natural language generation approach," Expert Systems with Applications, vol. 24, no. 1, pp. 35-48, 2003.
[17] G. Carenini and J. D. Moore, "Generating and evaluating evaluative arguments," Artificial Intelligence, vol. 170, no. 11, pp. 925-952, 2006.
[18] T. R. Gruber and P. O. Gautier, "Machine-generated explanations of engineering models: a compositional modeling approach," in IJCAI, 1993.
[19] P. O. Gautier and T. R. Gruber, "Generating explanations of device behavior using compositional modeling and causal ordering," in AAAI, 1993.
[20] B. White and J. Frederiksen, "Causal model progressions as a foundation for intelligent learning environments," Artificial Intelligence, vol. 42, no. 1, pp. 99-155, 1990.
[21] F. Elizalde et al., "An MDP approach for explanation generation," in Workshop on Explanation-Aware Computing at AAAI, 2007.
[22] O. Z. Khan et al., "Explaining recommendations generated by MDPs," in Workshop on Explanation-Aware Computing, 2008.
[23] S. Renooij and L. van der Gaag, "Decision making in qualitative influence diagrams," in Proceedings of the Eleventh International FLAIRS Conference, pp. 410-414, 1998.
[24] C. Lacave et al., "Graphical explanations in Bayesian networks," in Lecture Notes in Computer Science, vol. 1933, pp. 122-129, Springer-Verlag, 2000.
[25] A. Ko and B. Myers, "Extracting and answering why and why not questions about Java program output," ACM Transactions on Software Engineering and Methodology, vol. 20, no. 2, 2010.
[26] F. Sørmo, J. Cassens, and A. Aamodt, "Explanation in case-based reasoning — perspectives and goals," Artificial Intelligence Review, vol. 24, pp. 109-143, 2005.
[27] A. Kofod-Petersen and J. Cassens, "Explanations and context in ambient intelligent systems," in Proceedings of the 6th International and Interdisciplinary Conference on Modeling and Using Context, 2007.
[28] C. P. Langlotz et al., "A methodology for generating computer-based explanations of decision-theoretic advice," Medical Decision Making, vol. 8, no. 4, pp. 290-303, 1988.
[29] H. Lieberman and A. Kumar, "Providing expert advice by analogy for on-line help," in Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 26-32, 2005.
[30] Bader et al., "Explanations in proactive recommender systems in automotive scenarios," in Workshop on Decision Making and Recommendation Acceptance Issues in Recommender Systems, 2011.
[31] P. Pu and L. Chen, "Trust building with explanation interfaces," in Proceedings of the 11th International Conference on Intelligent User Interfaces, pp. 93-100, 2006.
[32] Baltrunas et al., "Context-aware places of interest recommendations and explanations," in 1st Workshop on Decision Making and Recommendation Acceptance Issues in Recommender Systems (DEMRA 2011), 2011.
[33] J. D. Lee and K. A. See, "Trust in automation: designing for appropriate reliance," Human Factors, vol. 46, no. 1, pp. 50-80, 2004.
[34] D. L. McGuinness et al., "Investigations into trust for collaborative information repositories: a Wikipedia case study," in Workshop on the Models of Trust for the Web, 2006.
[35] I. Zaihrayeu, P. Pinheiro da Silva, and D. L. McGuinness, "IWTrust: improving user trust in answers from the Web," in Proceedings of the 3rd International Conference on Trust Management, pp. 384-392, 2005.
[36] B. Y. Lim, A. K. Dey, and D. Avrahami, "Why and why not explanations improve the intelligibility of context-aware intelligent systems," in Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 2119-2128, 2009.
[37] A. Glass, D. L. McGuinness, and M. Wolverton, "Toward establishing trust in adaptive agents," in Proceedings of the 13th International Conference on Intelligent User Interfaces, pp. 227-236, 2008.
[38] J. J. Dijkstra, "On the use of computerised decision aids: an investigation into the expert system as persuasive communicator," Ph.D. dissertation, 1998.
[39] S. Y. Rieh and D. R. Danielson, "Credibility: a multidisciplinary framework," in Annual Review of Information Science and Technology, B. Cronin (Ed.), vol. 41, pp. 307-364, 2007.