  What Should Be in an XAI Explanation? What IFT Reveals

                           Jonathan Dodge, Sean Penney, Andrew Anderson, Margaret Burnett
                                                  Oregon State University
                                                    Corvallis, OR; USA
                                { dodgej, penneys, anderan2, burnett }@eecs.oregonstate.edu


ABSTRACT
This workshop’s call for participation poses the question: What should be in an explanation? One route toward answering this question is to turn to theories of how humans try to obtain information they seek. Information Foraging Theory (IFT) is one such theory. In this paper, we present lessons we have learned about how IFT informs Explainable Artificial Intelligence (XAI), and also what XAI contributes back to IFT.

CCS Concepts
• Human-centered computing → User studies; • Computing methodologies → Intelligent agents;

Author Keywords
Intelligent Agents; Explainable AI; Intelligibility; Content Analysis; Video Games; StarCraft; Information Foraging

INTRODUCTION
Explainable AI (XAI) is burgeoning to help ordinary users understand their intelligent agents’ behavior – but many fundamental questions remain in order to achieve this goal. This paper describes our recent progress toward one such question: What should be in an explanation?

We have been working to answer this question in a domain often used for AI research, namely Real-Time Strategy (RTS) games, from two sides. First, to understand what a high-quality supply of explanations might contain, we conducted a qualitative analysis of the utterances of expert explainers [2]. Second, to understand demand for explanations in the same domain, we conducted a user study [10] to understand the questions participants formulated when assessing an intelligent agent playing the popular RTS game StarCraft II [9]. Here, we focus on the latter study.

There have been previous explorations into what should be in an XAI explanation [1, 6, 7, 8, 14, 15], but few such explorations draw upon theories of how humans problem-solve. We used Information Foraging Theory (IFT) [12] to help fill this gap and approach our investigation. IFT is based on a predator-prey model [12]. Grounded in prior work about how people seek information [3, 11], we used StarCraft II to investigate how both the expert explainers (suppliers) and our participants (“demanders”) would navigate the information environment as they sought to make sense of a game while it unfolded.

In the RTS domain, players compete for control of territory by fighting for it. Each player raises an army to fight their opponents, which takes resources and leads players to build Expansions (new bases) to gain more resources. Players also can use resources to create Scouting units, which lets them learn about their enemies’ movements to enable Fighting in a strategic way. For a more in-depth explanation of the domain, refer to [9].

In our user study [10] investigating this domain, we gave 20 experienced StarCraft II players a game replay file¹ to analyze and asked them to record whatever they thought were key decision points (i.e., any “event which is critically important to the outcome of the game”) during the match. Participants worked in pairs, allowing us to keep them talking by leveraging the social convention of conversing about their collaborative task. Because we wanted to understand how the participants go about assessing an intelligent agent’s decisions, we told them that one of the players in the game was under AI control. However, this was not true; both players were human professionals.

¹ We used game 3 of this match (http://lotv.spawningtool.com/23979/) from the IEM Season XI - Gyeonggi tournament.

Figure 1. A screenshot from our study, with participants anonymized (bottom right corner). Superimposed red boxes point out: (1, bottom left) the Minimap, a birds-eye view enabling participants to navigate around the game map; (2, top left) a drop-down menu to display the Production tab for a summary of build actions in progress; (3, middle right) Time Controls to rewind/forward or change speed.

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes.
ExSS 2018, March 11, 2018, Tokyo, Japan.
The participants’ main task was to assess the AI’s capabilities. To do so, the participants replayed the game using the built-in StarCraft tool, shown in Figure 1, which offers the ability to observe the previously recorded events. The tool provided functionality to freely navigate with the camera, pause/rewind with time controls, and drill down into various aspects of the game state, helping participants decide how the AI was doing.

After participants finished the main task, we conducted a retrospective interview in two parts. In both parts, we asked participants questions about things they had said and done, while pointing them out in the video we had just made of those participants working on the task. In the first part, we navigated to each decision point they identified and asked why it was so important. In the second part, we asked about selected navigations using questions based on previous work [11], such as “What about that point in time made you stop there?”. A more detailed methodology can be found in [10].

WHAT WE’VE LEARNED SO FAR: IFT → XAI

Things we’ve learned from studying Prey
In IFT, predators seek prey, which are the pieces of information in the environment that they think they need. In the context of XAI, such prey are evidence of the agent’s decision process, which is then used to create explanations for the agent’s actions.

To investigate the information participants were trying to obtain, we analyzed the questions that they asked each other. We categorized their questions according to the Lim-Dey intelligibility types [7], which separate questions into What, What-Could-Happen, Why-Did, Why-Didn’t, and How-To. We also added a Judgment intelligibility type to capture when participants sought a quality judgment.

Although most previous XAI research has found Why to be highly demanded information, our participants rarely sought Why or Why-Didn’t information. Instead, our participants showed a strong preference for asking What questions.

What was so interesting about What? The participants’ What information seeking was about finding out more about state than they currently knew. Our participants did so primarily in three categories: drill down, higher level, and temporal. Drill-down Whats usually involved participants spatially navigating around the map, sometimes opening up objects or menus to access more detailed game state information. E.g., “Is the human building any new stuff now?” The second category, higher-level Whats, involved trying to abstract a little above the details, to gain a higher-level understanding of the game state. E.g., “What’s going on over there?” The third category, temporal Whats, involved finding out more about differences or similarities in state over time. E.g., “When did he start building...?”
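To make the coding scheme and the three What subcategories concrete, here is a minimal sketch of them in Python. It is purely illustrative: the identifiers (IntelligibilityType, coded_examples) are ours, not from a released analysis tool, and the example codings apply the subcategories to the utterances quoted above.

    from enum import Enum

    # The Lim-Dey intelligibility types [7], plus our added Judgment type.
    class IntelligibilityType(Enum):
        WHAT = "What"
        WHAT_COULD_HAPPEN = "What-Could-Happen"
        WHY_DID = "Why-Did"
        WHY_DIDNT = "Why-Didn't"
        HOW_TO = "How-To"
        JUDGMENT = "Judgment"  # captures requests for a quality judgment

    # (utterance, intelligibility type, What subcategory) for the examples above.
    coded_examples = [
        ("Is the human building any new stuff now?", IntelligibilityType.WHAT, "drill down"),
        ("What's going on over there?", IntelligibilityType.WHAT, "higher level"),
        ("When did he start building...?", IntelligibilityType.WHAT, "temporal"),
    ]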
Finally, to investigate whether our distribution of What vs. Why results was reasonably representative for this domain, we compared our participants’ questioning (i.e., explanation demand) against the answers (explanation supply) produced by professional explainers in this domain, namely shoutcasters² [2]. The results showed that the shoutcasters’ commentaries in StarCraft games [2] matched well with the above explanation demands. In particular, shoutcasters’ utterances were mostly about the What intelligibility type, with very few utterances of the Why or Why-Didn’t types. Further, the shoutcasters were remarkably consistent with each other in frequency of using each intelligibility type. The consistency between the supply-side and demand-side results offers evidence that in the RTS domain, What explanation content is more in demand than Why or Why-Didn’t.

² Shoutcasters are sportscasters for e-sports. They perform a similar sort of analysis as our participants were doing, but with the added constraint that they must analyze and explain the game in real time, so they cannot pause or rewind.

Implications: Taken together, these results show that in this domain, participants placed very high value on state information — but not always at the same granularity, and not always restricted to a single moment in time. How an XAI system can satisfy these explanation needs may not be straightforward, but one of our findings suggests a way forward: Shoutcasters may be usable as a gold standard. That is, the remarkable similarity between the frequency of shoutcasters’ utterances (supply) and participants’ desired prey (demand) for most intelligibility types suggests that XAI explanation systems in the RTS domain may be able to model their explanation content, timing, and construction around shoutcasters’ explanations.

Things we’ve learned from studying Paths
In IFT, prey exists within some patch(es), and the forager navigates between patches by following paths, made up of one or more links. Investigating the paths participants used revealed a great deal of information about the kinds of costs they can incur in the RTS domain when seeking information.

Traditionally, IFT looks at the navigation cost to get to a patch (here, an explanation), usually in number of clicks, and the cognitive cost of absorbing the necessary information in the patch once there. These costs are relevant to XAI too, but our investigation discovered participants incurred significant cognitive effort in both path discovery and path triage.
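For reference, this cost accounting can be stated in IFT’s conventional rate-of-gain form (our paraphrase of Pirolli’s formalization [12], not a result of our study): the forager is modeled as maximizing

    R = G / (T_B + T_W)

where G is the value of the information gained, T_B is the between-patch time spent navigating to patches, and T_W is the within-patch time spent absorbing information once there. The navigation and cognitive costs above map onto T_B and T_W, respectively.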
Why so expensive? Professional RTS players perform several hundred actions per minute (APM), and each such action potentially destroys or updates the available foraging paths. This produces an information environment in which foraging paths are numerous, rapidly updating, and short-lived. Thus, our participants were faced with many more potentially useful foraging paths than they could possibly follow, and had to spend significant effort just choosing a path.

Some coped with these costs by adhering to a single foraging path throughout the task, rewinding rarely. These participants minimized their cognitive costs of choosing, but paid a high information cost, because by not following other paths, they missed out on potentially explanatory information. Others chose not to pay this information cost, and instead paid a navigation cost by often rewinding and pausing to spatially explore. Rewinding also incurs substantial cognitive cost, as more context information must be tracked — but that extra context may provide useful explanatory power.

Figure 2. Dots show the Building-Expansion decision points each participant pair (y-axis) identified over time (x-axis). Red lines show when Expansion events actually occurred. (Participants noticed most of them.) The red box shows where Pair 4 failed to notice an event they likely wanted to note, based on their previous and subsequent behavior.

Interestingly, when costs of choice were low, participants’ explanation seeking followed fairly traditional foraging patterns. For example, early in the game, participants scrutinized the game objects carefully and in detail — a sharp contrast to late in the game, when many more game objects and foraging paths were present. This could suggest that as the information environment grows in complexity, users in this domain will seek explanations at a higher level of abstraction (i.e., a group of units as opposed to a single unit).

Implications: An XAI explanation system would benefit from incorporating an explanation recommender. Such a recommender could take into account both the human cognitive cost of considering too many paths when few can be followed, and the information cost of neglecting some path too long. For example, if the domain is well known to an explanation system a priori, such a recommender may help guide users (reducing their cognitive cost of choosing) to the explanations that are the most important (reducing the information cost of missing important explanatory information). In this case, it appears we know Expansions are important before any analysis occurs.
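To sketch what such a recommender’s triage rule might look like, consider the following Python fragment. Everything in it is hypothetical: the names (CandidatePath, recommend), the scoring rule, and the constants are ours for illustration, not a design from our studies.

    from dataclasses import dataclass

    @dataclass
    class CandidatePath:
        label: str          # e.g., "inspect the new Expansion" (hypothetical)
        est_value: float    # estimated explanatory value of the prey at its end
        nav_cost: float     # navigation cost to reach it (e.g., clicks, rewinds)
        lifespan: float     # seconds until game events invalidate this path

    def recommend(paths, top_k=3):
        # Prefer valuable, cheap, soon-to-expire paths. Returning only top_k
        # caps the user's cognitive cost of choosing, while weighting by
        # urgency guards against the information cost of neglecting a path.
        def score(p):
            urgency = 1.0 / max(p.lifespan, 1.0)
            return p.est_value * urgency / (1.0 + p.nav_cost)
        return sorted(paths, key=score, reverse=True)[:top_k]

A domain-aware instantiation could, for example, assign a high est_value to Expansion-related paths a priori, reflecting the observation above.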
Things we’ve learned from studying Scent and Cues
Recall that we requested that participants write down key decision points. To forage for these, they followed cues, which are information features connected to links in the environment. In our study, cues were the same across sessions, because everyone replayed the same game. Unlike cues, scent is “in the head” – it is the forager’s assessment of a cue’s meaning.

Unfortunately, participants missed information that we suspect they would have found key, because some cues distracted them. Our videos showed that what participants were looking at when they were distracted — the “distractor cues” — tended to be combat-oriented and affected even simple game states.

For example, in Figure 2, a nearly full decision column suggests that participants tended to agree that this decision was key. A nearly full participant pair row suggests that this pair consistently found this type of decision to be key. Thus, missing dots (e.g., see the red box) correspond to times when a participant pair was distracted from a key event.

In fact, the scents emanating from some types of cues seemed to consistently overpower others. Consider the example in Figure 3, which shows how Fighting tended to overpower Scouting. The top image in the figure shows all of the Scouting decision points our participants identified, while the bottom image shows all of the Fighting decision points. The red line going through both images is the point at which combat first begins – and also the time when scouting is usually last noticed, despite being ongoing throughout the game.

Figure 3. (Top:) The Scouting decision points identified by our participant pairs (y-axis), with game time on the x-axis. (Bottom:) The Fighting decision points identified, plotted on the same axes. After Fighting events begin (red line), Scouting decision points are no longer noticed often — despite important Scouting actions continuing to occur.

Implications: Distractions abound, and may be systematic. Some facets of the environment may elicit an emotional response and receive undue attention as a result. In this case, participants preferred investigating Fighting over Scouting.

Another implication concerns how an XAI explanation system’s user interface supports human workflow. Paths are easily forgotten in the presence of interruptions. Each new action may interrupt the current foraging path, which leads to people forgetting things – made worse by sheer path quantity. Previous research has found that To-Do Listing [5] is an effective strategy to help prevent users from forgetting so much.

WHAT WE’VE LEARNED SO FAR: XAI → IFT
In the previous sections, we focused on things we learned about XAI by applying IFT to our data set. Now, we turn the other direction, since this study is the first to apply IFT to XAI and the RTS domain. The RTS domain presents an extremely complex and rapidly changing environment, more so than other IFT environments in the literature, such as Integrated Development Environments (IDEs) and web sites [3, 4, 11, 13]. In the RTS domain, hundreds of actions happen each minute. Further, the environment is continually affected by actions which do not originate from the forager.

As discussed, our participants were faced with many paths, and had to rapidly triage which paths to follow. This presents an interesting IFT challenge. Previous research [11] identified a “scaling up problem” in IFT — a difficulty estimating the value/cost of distant prey as the path to the prey became long. In our case, we observed that foraging paths were short, but since so many paths are available, not much time is available to make an accurate path value/cost estimate. The current study reveals a “breadth version” of this scaling problem (Figure 4).
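To illustrate this breadth problem with deliberately rough, hypothetical numbers: a forager who pauses the replay for 10 seconds while 50 foraging paths are live has, on average, 10/50 = 0.2 seconds per path to estimate its value and cost; at several hundred APM, the set of live paths may well have changed before those estimates are finished.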
Figure 4. Conceptual drawing of foraging in the RTS domain vs. previously studied foraging. (Left) “Foraging in other environments”: information environments in prior IFT literature, where the predator considers few paths, but the paths are sometimes very deep. This panel was inspired by an IDE foraging situation in [11, Fig. 5]. (Right) “Foraging in RTS environments”, where most navigation paths are shallow, but with numerous paths to choose from at the top level.

Turning to prey: in the XAI setting, the prey is evidence of the agent’s decision process. Establishing trust in an XAI system requires the user to know how it behaves in many circumstances. Thus, the prey is “in pieces” – meaning that bits of it are scattered over many patches. As previous work [11] has shown, “prey in pieces” creates foraging challenges, because finding and assembling all the bits can be tedious and error-prone. In the model-agnostic XAI setting, IFT’s “prey in pieces” problem becomes even more pronounced, because of the uncertain relationships between causes/effects, or even whether the agent will ever behave the same way again.
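To make “prey in pieces” concrete, the sketch below (Python; the structure and names, such as EvidenceFragment, are ours for illustration, not from any existing tool) shows evidence fragments gathered from different patches being assembled into one account of an agent’s decision process:

    from dataclasses import dataclass

    @dataclass
    class EvidenceFragment:
        game_time: float   # when the evidence was observable
        patch: str         # where it was found, e.g., "Minimap" or "Production tab"
        observation: str   # what the forager saw there

    def assemble(fragments):
        # One possible assembly step: order the scattered bits chronologically,
        # so they read as a single account of the agent's decision process.
        return [f.observation for f in sorted(fragments, key=lambda f: f.game_time)]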
CONCLUSION
This paper summarizes the first investigation into information foraging behaviors shown by participants tasked with assessing an RTS intelligent agent. Our formative studies used IFT to inform XAI and vice versa, by examining both supply (expert explanations) and demand (users’ questions).

Our use of the IFT lens allows us to leverage results obtained from applying IFT to non-XAI domains, while also improving the ability to transport/generalize findings among XAI domains. By connecting XAI to IFT foundations, we can bring to XAI a real theoretical foundation based on what information humans want and how they look for it.

ACKNOWLEDGMENTS
This work was supported by DARPA #N66001-17-2-4030 and NSF #1314384. Any opinions, findings and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of NSF, DARPA, the Army Research Office, or the US government.

REFERENCES
1. S. Amershi, M. Cakmak, W. Knox, and T. Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.
2. J. Dodge, S. Penney, C. Hilderbrand, A. Anderson, L. Simpson, and M. Burnett. 2018. How the Experts Do It: Assessing and Explaining Agent Behaviors in Real-Time Strategy Games. In ACM Conference on Human Factors in Computing Systems. To Appear.
3. S. Fleming, C. Scaffidi, D. Piorkowski, M. Burnett, R. Bellamy, J. Lawrance, and I. Kwan. 2013. An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM) 22, 2 (2013), 14.
4. W. Fu and P. Pirolli. 2007. SNIF-ACT: A cognitive model of user navigation on the world wide web. Human-Computer Interaction 22, 4 (2007), 355–412.
5. V. Grigoreanu, M. Burnett, and G. Robertson. 2010. A strategy-centric approach to the design of end-user debugging tools. In ACM Conference on Human Factors in Computing Systems. ACM, 713–722.
6. T. Kulesza, M. Burnett, W. Wong, and S. Stumpf. 2015. Principles of explanatory debugging to personalize interactive machine learning. In ACM International Conference on Intelligent User Interfaces. ACM, 126–137.
7. B. Lim and A. Dey. 2009. Assessing demand for intelligibility in context-aware applications. In ACM International Conference on Ubiquitous Computing. ACM, 195–204.
8. B. Lim, A. Dey, and D. Avrahami. 2009. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In ACM Conference on Human Factors in Computing Systems. ACM, 2119–2128.
9. S. Ontañón, G. Synnaeve, A. Uriarte, F. Richoux, D. Churchill, and M. Preuss. 2013. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5, 4 (2013), 293–311.
10. S. Penney, J. Dodge, C. Hilderbrand, A. Anderson, L. Simpson, and M. Burnett. 2018. Toward Foraging for Understanding of StarCraft Agents: An Empirical Study. In ACM Conference on Intelligent User Interfaces. To Appear.
11. D. Piorkowski, A. Henley, T. Nabi, S. Fleming, C. Scaffidi, and M. Burnett. 2016. Foraging and navigations, fundamentally: Developers’ predictions of value and cost. In ACM International Symposium on Foundations of Software Engineering. ACM, 97–108.
12. P. Pirolli. 2007. Information Foraging Theory: Adaptive Interaction with Information. Oxford Univ. Press.
13. S.S. Ragavan, S. Kuttal, C. Hill, A. Sarma, D. Piorkowski, and M. Burnett. 2016. Foraging among an overabundance of similar variants. In ACM Conference on Human Factors in Computing Systems. ACM, 3509–3521.
14. S. Stumpf, E. Sullivan, E. Fitzhenry, I. Oberst, W. Wong, and M. Burnett. 2008. Integrating rich user feedback into intelligent user interfaces. In ACM International Conference on Intelligent User Interfaces. ACM, 50–59.
15. J. Vermeulen, G. Vanderhulst, K. Luyten, and K. Coninx. 2010. PervasiveCrystal: Asking and answering why and why not questions about pervasive computing applications. In IEEE International Conference on Intelligent Environments (IE). IEEE, 271–276.