Twelfth International Workshop Modelling and Reasoning in Context (MRC) @IJCAI 2021

Intrinsic, Dialogic, and Impact Measures of Success for Explainable AI

Jörg Cassens¹, Rebekah Wegener²
¹ University of Hildesheim, 31141 Hildesheim, Germany
² Paris Lodron University, 5020 Salzburg, Austria
cassens@cs.uni-hildesheim.de, rebekah.wegener@sbg.ac.at

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
This paper presents a brief overview of requirements for the development and evaluation of human-centred explainable systems. We propose three perspectives on evaluation models for explainable AI that include intrinsic measures, dialogic measures and impact measures. The paper outlines these different perspectives and looks at how the separation might be used for explanation evaluation benchmarking and integration into design and development. We propose several avenues for future work.

1 Explanations
Explanations are foundational to social interaction [Lombrozo, 2006], and numerous different approaches to achieving explainability have been proposed recently [Adadi and Berrada, 2018; Arrieta et al., 2019; Doran et al., 2017]. Criticisms of current research trends include that "accounts of explanation typically define explanation (the product) rather than explaining (the process)" [Edwards et al., 2019]. Another criticism is that explanations are currently largely seen as a relatively uniform and definable concept, and even systems that take user goals for explanation into account treat it largely on the system side of development [Biran and Cotton, 2017]. Despite this, a human-centred [Ehsan and Riedl, 2020] perspective on explanation in artificial intelligence is not new [Shortliffe, 1976; Swartout, 1983; Schank, 1986; Leake, 1992, 1995; Mao and Benbasat, 2000]. For example, Gregor and Benbasat [1999] point out that different user groups have different explanation needs.

We have earlier construed contextualised explanations based on user goals [Sørmo et al., 2005]. This has been used to integrate explanatory needs into the system design process [Roth-Berghofer and Cassens, 2005; Cassens and Kofod-Petersen, 2007]. However, we have represented explanation as a static object rather than a dialogic process. This includes the ability of the technical system to make use of explanations as well, at least as part of the theoretical model, even if not in practical applications.

In our understanding, both human and non-human actors in heterogeneous socio-technical systems (or socio-cognitive ones [Noriega et al., 2015]) can be senders and receivers of explanations [Cassens and Wegener, 2019]. For example, a human should be able to "explain away" recommendations made by a diagnostic system in order to enhance its future performance. While we currently focus on the opposite situation, e.g. an artificial actor explaining its choice of recommendations to the human user, frameworks for designing explanation-aware systems should be able to account for different flows of explanations, at least in principle and by extension.

In order to distinguish this from views that see the machine only as the explainer, not the explainee, we make use of the established term explanation awareness [Roth-Berghofer et al., 2007; Roth-Berghofer and Richter, 2008]. Our working definition is as follows:
• Internal View: explanation as part of the reasoning process itself.
  – Example: a recommender system can use domain knowledge to explain the absence or variation of feature values, e.g. relations between countries.
• External View: giving explanations of the found solution, its application, or the reasoning process to the other actors.
  – Example: the user tells said recommender system why they choose an apartment in Norway despite the system suggesting one in Sweden.
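To make the two views concrete, the following is a minimal, purely illustrative Python sketch: the class and function names, the apartment data and the simple price-based reasoner are assumptions made only for this example and are not part of the framework itself. The internal view attaches explanations produced during reasoning; the external view lets a user's explanation flow back into the system.

```python
# Hypothetical sketch of the two views of explanation awareness.
# All names are illustrative inventions, not an existing API.

from dataclasses import dataclass, field


@dataclass
class Recommendation:
    item: str
    features: dict
    # Internal view: explanations generated as part of the reasoning process,
    # e.g. domain knowledge accounting for a missing feature value.
    internal_explanations: list = field(default_factory=list)


def recommend(apartments, domain_knowledge):
    """Toy reasoner: picks the cheapest apartment and records internal
    explanations for absent feature values using domain knowledge."""
    best = min(apartments, key=lambda a: a["price"])
    rec = Recommendation(item=best["name"], features=best)
    for feature, value in best.items():
        if value is None and feature in domain_knowledge:
            rec.internal_explanations.append(
                f"'{feature}' is missing because {domain_knowledge[feature]}"
            )
    return rec


def receive_user_explanation(rec, user_explanation, preferences):
    """External view with reversed flow: the user explains rejecting the
    recommendation, and the system folds that back into its preferences."""
    preferences.setdefault("user_feedback", []).append(
        {"rejected": rec.item, "because": user_explanation}
    )
    return preferences


if __name__ == "__main__":
    apartments = [
        {"name": "Stockholm flat", "price": 900, "balcony": None},
        {"name": "Oslo flat", "price": 1100, "balcony": True},
    ]
    knowledge = {"balcony": "the Swedish listing portal does not record balconies"}
    rec = recommend(apartments, knowledge)
    print(rec.item, rec.internal_explanations)
    print(receive_user_explanation(rec, "I work in Norway", {}))
```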
Fraassen van used to integrate explanatory needs in the system design pro- Fraassen [1980], comprised of cess [Roth-Berghofer and Cassens, 2005; Cassens and Kofod- – Context Awareness (knowing the situation the sys- Petersen, 2007]. However, we have represented explanation tem is in) and as a static object rather than a dialogic process. This includes – Context Sensitivity (acting according to such situ- the ability of the technical system to make use of explanations ation) Kofod-Petersen and Aamodt [2006]; Kofod- as well, at least as part of the theoretical model, even if not in Petersen and Cassens [2011] practical applications. In our understanding, both human and non-human actors • Multimodal, as argued for by e.g. Halliday Halliday in heterogeneous socio-technical systems (or socio-cognitive, [1978] and being [Noriega et al., 2015]) can be senders and receivers of expla- • Construed by user interest, as noted by e.g. Achinstein nations [Cassens and Wegener, 2019]. For example, a human Achinstein [1983]. Copyright c 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Twelfth International Workshop Modelling and Reasoning in Context (MRC) @IJCAI 2021 42 Given these foundations, can a semiotic model of explanation De Ruyter and Aarts, 2010; Holtzblatt and Beyer, 2016], to as a form of multi-modal dialogic language behaviour in con- name a few). We should consider principles and methods text be used to generate contextually appropriate explanations for (designing and evaluating) explainability as additions to by computational systems? There is an extensive body of re- existing tool kits, agnostic to their use in established design search focusing on generating and using explanations in AI. processes whenever possible (limited by different ontological Currently, what is lacking is: commitments). 1. A theory of the dialogic process rather than a monologic Evaluation is central to Human-Computer Interaction, or product rather: evaluations are central since they typically form a cy- cle and cover a system at various stages. While (formative 2. A cohesive theory of explanation that is: and summative) evaluations are a cornerstone for human cen- • contextually appropriate (e.g. fitting people, topic, tred design, “it is far from being a solved problem” [MacDon- mode and place), ald and Atwood, 2013]. We are generally in need for evalu- • semantically appropriate (e.g. recognised as an ex- ation processes that are suited for emerging types of applica- planation) tions [Poppe et al., 2007] and for sustainable and responsible • lexicogrammatically optimal (best possible multi- systems development [Remy et al., 2018]. modal realisation) But even if current (usability) evaluation methods [Dumas and Salzman, 2006] may ultimately fall short in the con- 3. A framework for integrating explanatory capabilities in text of XAI, they can at least inform first iterations of eval- the whole software development life-cycle, from re- uation standards. In particular when used in combination quirements elicitation over design and implementation with theories and models from other areas, such as linguistics through to its use [Cassens and Wegener, 2008; Halliday, 1978; Wegener et al., 4. A framework for evaluation measures. 2008], psychology [Kaptelinin, 1996], the cognitive sciences [Keil and Wilson, 2000], or philosophy [Achinstein, 1983; We will focus on the last aspect in the remainder of this pa- van Fraassen, 1980]. per. 
Research still seems fragmented, in particular when it comes to measuring the actual effectiveness and efficiency of explanations given to users. We propose to measure explainability along three lines of inquiry. Intrinsic measures deal with the question of whether the system at hand can generate explanations at all. Dialogic measures look at whether the system's output is seen as an explanation by the users. Finally, impact measures ask whether the explanation generated is of any use. These questions should help to elicit and formalise requirements for explanations as well as to find ways to evaluate solutions that are operationalised sufficiently to enable making claims of explainability that can be tested against and to further comparisons between systems and iterations of systems.

Explanations are needed during the whole life cycle of applications, from initial requirements elicitation over design and development processes to using the final system. Therefore, it makes sense to look at frameworks for measuring the efficiency and effectiveness of explanations in the context of whole development and life cycle management processes. While quality measurements for explanation could eventually enable a final system score (for benchmarking purposes [Zhan et al., 2019]), development is a cycle and it is contextual, and the goal is to be able to build "better" systems through "better" development processes, where explanatory success is part of the success metrics. Given existing requirements for transparency, such a perspective on evaluating explanations can also be part of a regulatory framework for ethical AI [Cath, 2018; Coeckelbergh, 2020; Erdélyi and Goldsmith, 2018].

2 Evaluations
Within HCI, a plethora of different instantiations of human-centred development processes exist (e.g. [Beyer and Holtzblatt, 1997; Carroll, 2000; Cooper et al., 2014; De Ruyter and Aarts, 2010; Holtzblatt and Beyer, 2016], to name a few). We should consider principles and methods for (designing and evaluating) explainability as additions to existing tool kits, agnostic to their use in established design processes whenever possible (limited by different ontological commitments).

Evaluation is central to Human-Computer Interaction, or rather: evaluations are central, since they typically form a cycle and cover a system at various stages. While (formative and summative) evaluations are a cornerstone of human-centred design, "it is far from being a solved problem" [MacDonald and Atwood, 2013]. We are generally in need of evaluation processes that are suited to emerging types of applications [Poppe et al., 2007] and to sustainable and responsible systems development [Remy et al., 2018].

But even if current (usability) evaluation methods [Dumas and Salzman, 2006] may ultimately fall short in the context of XAI, they can at least inform first iterations of evaluation standards, in particular when used in combination with theories and models from other areas, such as linguistics [Cassens and Wegener, 2008; Halliday, 1978; Wegener et al., 2008], psychology [Kaptelinin, 1996], the cognitive sciences [Keil and Wilson, 2000], or philosophy [Achinstein, 1983; van Fraassen, 1980].

In this short paper, we cannot explore these contributions in detail, but we will briefly outline a tripartite model for capturing explanatory effectiveness that includes:
• Intrinsic measures: measures that pertain to the ability of a system to generate explanations. Can the system generate explanations?
• Dialogic measures: measures that pertain to the interaction between the system and its users. Does the system's output work as an explanation for its users?
• Impact measures: measures that pertain to the potential, anticipated or actual impact of explanations. Is the explanation generated of any use?

We have separated these measures because each of the three types has different methods for testing, and they cover distinct aspects of what "explanatory success" can mean. It is only by combining these different perspectives that we can get a full picture of the explanatory performance of a system and of the explanations that are a part of that system. While we can think of more perspectives, it is important to keep in mind that quality measures have to have a well-defined scope and they need to be, indeed, measurable [Carvalho et al., 2017]. Furthermore, for them to be able to improve processes in practice, they need to be sufficiently simple to apply.
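As one possible, non-prescriptive way of operationalising the three perspectives across development iterations, the following sketch records scores per perspective instead of collapsing them into a single number; the field names, example metrics and the per-perspective averaging are our own assumptions for illustration, not something the framework prescribes.

```python
# Hedged sketch: one way to record scores along the three proposed perspectives.
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ExplanatoryEvaluation:
    system: str
    iteration: int
    # Intrinsic: can the system generate explanations at all?
    intrinsic: dict = field(default_factory=dict)
    # Dialogic: does the output work as an explanation for its users?
    dialogic: dict = field(default_factory=dict)
    # Impact: is the generated explanation of any use?
    impact: dict = field(default_factory=dict)

    def summary(self):
        """Per-perspective means, kept separate so a single benchmark number
        never hides which perspective a system fails on."""
        groups = {"intrinsic": self.intrinsic,
                  "dialogic": self.dialogic,
                  "impact": self.impact}
        return {name: (mean(scores.values()) if scores else None)
                for name, scores in groups.items()}


if __name__ == "__main__":
    e = ExplanatoryEvaluation(
        system="demo-recommender", iteration=3,
        intrinsic={"traceable_components": 0.8, "model_interpretability": 0.6},
        dialogic={"recognised_as_explanation": 0.7},
        impact={"decision_quality_gain": 0.15},
    )
    print(e.summary())
```

Keeping the three score groups separate mirrors the argument above: a benchmark score is only useful if it can be decomposed again into the perspective on which a system falls short.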
2.1 Intrinsic Measures
These measure the ability of the system to generate explanations, both generally for the given context of use and specifically with respect to the transparency and interpretability of the system itself or of aspects of the system, such as the ML models and data used as well as algorithmic and other design choices.

If a system or parts of a system are not transparent, then it is unlikely to perform well on either dialogic or impact measures. We can think of intrinsic measures as a baseline for explainable AI: they are a necessary, but not sufficient, condition. From a design process perspective, we will need to look at which components are necessary for explanation generation [Roth-Berghofer and Cassens, 2005]. When evaluating, we might explore the structure, modality and semantic characteristics of the different explanations to ensure that they are optimised for the situation. There are different specific methods that might be useful for intrinsic measures.
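Purely as an illustration of what such a method could look like, the sketch below frames an intrinsic measure as a simple audit checklist. The individual checklist items are examples we derived from the aspects named above (components, model transparency, data provenance, design choices); they are not a validated instrument and would need to be defined per project.

```python
# Illustrative sketch only: a checklist-style intrinsic audit.
INTRINSIC_CHECKLIST = [
    "every user-facing recommendation can be traced to a reasoning step",
    "the ML model (or a surrogate) exposes feature-level attributions",
    "training and input data provenance is documented",
    "algorithmic and design choices are recorded with rationales",
]


def intrinsic_audit(answers):
    """answers maps checklist items to True/False; returns the share of
    satisfied items as a crude baseline score plus the remaining gaps."""
    satisfied = [item for item in INTRINSIC_CHECKLIST if answers.get(item)]
    gaps = [item for item in INTRINSIC_CHECKLIST if not answers.get(item)]
    return len(satisfied) / len(INTRINSIC_CHECKLIST), gaps


if __name__ == "__main__":
    score, gaps = intrinsic_audit({INTRINSIC_CHECKLIST[0]: True,
                                   INTRINSIC_CHECKLIST[1]: True})
    print(score, gaps)
```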
2.2 Dialogic Measures
Here we look at the question of whether that which has been generated actually works as an explanation for the user, in various conditions, situations and contexts. Under investigation is the shared semiotic process of explanation generator and explanation consumer. Different methods are going to be useful for dialogic measures, including user studies, reaction studies, experimental studies, and qualitative and quantitative methods in general. Explanations are inherently dialogic, so we are always going to want to know who is requesting the explanation, who is providing the explanation, and how and why they are providing it. Tracking the exchange of information itself is a way to evaluate because it lets us see the reaction to the explanation.

Trustworthy AI could be an outcome of systems that score highly on dialogic measures. This does not mean that trustworthy systems will score well on impact measures; indeed, human and non-human agents are quite prepared to trust a system that may have negative impacts on their wellbeing. Trust can be engendered through a dialogically well-performing malicious system, and this is what makes impact measures so essential.
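As a hedged sketch of how such tracking of the exchange could be operationalised, the following assumes a hypothetical per-episode log format and derives three simple ratios from it; the event fields and the ratios are illustrative assumptions, not validated instruments.

```python
# Hypothetical sketch of scoring a dialogic measure from a user-study log.
def dialogic_scores(exchange_log):
    """exchange_log: list of dicts, one per explanation episode, e.g.
    {"requested_by": "user", "recognised_as_explanation": True,
     "follow_up_questions": 1, "comprehension_check_passed": True}"""
    n = len(exchange_log)
    if n == 0:
        return {}
    return {
        "recognition_rate": sum(e.get("recognised_as_explanation", False)
                                for e in exchange_log) / n,
        "comprehension_rate": sum(e.get("comprehension_check_passed", False)
                                  for e in exchange_log) / n,
        "mean_follow_ups": sum(e.get("follow_up_questions", 0)
                               for e in exchange_log) / n,
    }


if __name__ == "__main__":
    log = [
        {"requested_by": "user", "recognised_as_explanation": True,
         "follow_up_questions": 0, "comprehension_check_passed": True},
        {"requested_by": "system", "recognised_as_explanation": False,
         "follow_up_questions": 2, "comprehension_check_passed": False},
    ]
    print(dialogic_scores(log))
```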
2.3 Impact Measures
Impact measures look at whether providing explanations offers benefits over the use of the system itself. These can be used both on an individual level and for larger systems.

For example, on the individual level, we might consider an adaptive learning system that offers explanations to further the learning goal [Sørmo et al., 2005] a user might have. While dialogic measures can be used to evaluate whether such an explanation can function as an explanation to the student, it would remain unclear whether the explanation did actually improve learning outcomes.

These measures also look at the impact that the system can have in the world. How can it impact decisions, diagnoses, legal and access outcomes? The impact measures examine the potential, anticipated or actual impact of the system and the ability of the system to explain these repercussions to users in context. Here the concept of contextual AI is important because, as Ehsan and Riedl argue, "if we ignore the socially situated nature of our technical systems, we will only get a partial and unsatisfying picture" [Ehsan and Riedl, 2020]. A good model of context is crucial for evaluating explanatory success [Kofod-Petersen and Cassens, 2007; Wegener et al., 2008]. Ethical AI would be the outcome of a system that scores highly on impact measures. We would of course aim for beneficial and equitable AI, but ethical is at least a good baseline outcome. Here we might expect to see methods such as impact studies and hypothetical, scenario and risk modelling. It would be beneficial to know what the anticipated consequences of the explanation are for everyone involved.
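For the adaptive learning example above, an impact measure could, for instance, compare outcomes between learners who did and did not receive explanations. The sketch below assumes such a between-groups setup with made-up scores; a real study would add significance testing and contextual variables.

```python
# Sketch under assumptions: a between-groups impact measure for the
# adaptive learning example. Data and the plain mean difference are
# illustrative only.
from statistics import mean


def impact_effect(outcomes_with_explanations, outcomes_without):
    """Difference in mean learning outcome between learners who received
    explanations and those who did not (positive = explanations helped)."""
    return mean(outcomes_with_explanations) - mean(outcomes_without)


if __name__ == "__main__":
    with_expl = [0.72, 0.81, 0.69, 0.77]      # e.g. post-test scores
    without_expl = [0.70, 0.74, 0.66, 0.71]
    print(f"estimated impact of explanations: "
          f"{impact_effect(with_expl, without_expl):+.3f}")
```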
3 Related Work
Mohseni et al. [2018] argue that the interdisciplinary nature of explainable artificial intelligence (XAI) "poses challenges for identifying appropriate design and evaluation methodology and consolidating knowledge across efforts". At the same time, this interdisciplinary approach is essential to the success of XAI. We view our suggestion as a way to complement, further consolidate, and operationalise their classification system for different goals in XAI.

Hoffman et al. [2018] propose a process model of explaining and suggest measures that are applicable in the different phases of their conceptual model. This complements our (more abstract) notions of dialogic and (to a lesser degree) impact measures, whereas we see our notion of intrinsic measures as a prerequisite for their model. Both models can be systematically combined, depending on the need for granularity and the aspects covered. Mueller et al. [2021] present some helpful higher-level psychological considerations that can serve as general templates for effective explanations.

Sokol and Flach [2020] introduce fact sheets with an extensive list of properties for different explanatory methods. This is complementary to our approach and could be used to select methods supporting the measures chosen. A survey by Carvalho et al. [2019] on interpretability in machine learning is orthogonal to our model, with their results being useful for the operationalisation of the intrinsic measures (e.g. their comparison of different methods) and the dialogic measures (e.g. the notion of explanation properties).

4 Conclusion
We propose a tripartite perspective on explanation in intelligent systems that aligns with (iterative and contextual) design and development processes of systems such that there is space for formative and summative evaluations. While it enables a final system score (which we propose for benchmarking purposes [Zhan et al., 2019]), development is a cycle and it is contextual, and the goal is to be able to build "better" systems, where explanatory success is part of the success metrics.

We have previously discussed the potential for Ambient Intelligence to be useful for creating explainable AI [Cassens and Wegener, 2019], particularly on the architecture level and with regard to the capabilities subsumed [De Ruyter and Aarts, 2010]. We propose that the core characteristics and general architecture of ambient intelligent systems make them a good framework for developing XAI and that AmI systems themselves have the potential to become explanatory agents that can be mediators between humans and other systems. The concept of mediating explanatory instances has also been explored in the context of virtual explanatory agents [Weitz et al., 2020] or as a user-specific "memory" of explanations [Chaput et al., 2021].

Development of such mediators, concentrating explanatory capabilities in specialised agents that are contextually embedded in our surroundings and have the potential for personalisation and anticipatory interaction, could greatly benefit from a cohesive framework for measuring explanatory success from different perspectives.
References
Peter Achinstein. The Nature of Explanation. Oxford University Press, Oxford, 1983.
Amina Adadi and Mohammed Berrada. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160, 2018.
Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, Raja Chatila, and Francisco Herrera. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. arXiv preprint: 1910.10045, 2019.
Hugh Beyer and Karen Holtzblatt. Contextual Design: Defining Customer-Centered Systems. Elsevier, 1997.
Or Biran and Courtenay Cotton. Explanation and justification in machine learning: A survey. In IJCAI-17 Workshop on Explainable AI (XAI), 2017.
John M. Carroll. Making Use: Scenario-Based Design of Human-Computer Interactions. MIT Press, 2000.
Rainara Maia Carvalho, Rossana Maria de Castro Andrade, Káthia Marçal de Oliveira, Ismayle de Sousa Santos, and Carla Ilane Moreira Bezerra. Quality characteristics and measures for human–computer interaction evaluation in ubiquitous systems. Software Quality Journal, 25(3):743–795, 2017.
Diogo V. Carvalho, Eduardo M. Pereira, and Jaime S. Cardoso. Machine learning interpretability: A survey on methods and metrics. Electronics, 8(8):832, 2019.
Jörg Cassens and Anders Kofod-Petersen. Explanations and case-based reasoning in ambient intelligent systems. In David C. Wilson and Deepak Khemani, editors, ICCBR-07 Workshop Proceedings, pages 167–176, Belfast, Northern Ireland, 2007.
Jörg Cassens and Rebekah Wegener. Making use of abstract concepts – systemic-functional linguistics and ambient intelligence. In Max Bramer, editor, Artificial Intelligence in Theory and Practice II – IFIP 20th World Computer Congress, IFIP AI Stream, volume 276 of IFIP, pages 205–214, Milano, Italy, 2008. Springer.
Jörg Cassens and Rebekah Wegener. Ambient explanations: Ambient intelligence and explainable AI. In Ioannis Chatzigiannakis, Boris De Ruyter, and Irene Mavrommati, editors, Proceedings of AmI 2019 – European Conference on Ambient Intelligence, LNCS, Rome, Italy, November 2019. Springer.
Corinne Cath. Governing artificial intelligence: ethical, legal and technical opportunities and challenges, 2018.
Rémy Chaput, Amélie Cordier, and Alain Mille. Explanation for humans, for machines, for human-machine interactions? In WS Explainable Agency in Artificial Intelligence at AAAI 2021, pages 145–152, 2021.
Mark Coeckelbergh. AI Ethics. MIT Press, 2020.
Alan Cooper, Robert Reimann, David Cronin, and Christopher Noessel. About Face (fourth edition): The Essentials of Interaction Design. John Wiley & Sons, 2014.
Boris De Ruyter and Emile Aarts. Experience research: a methodology for developing human-centered interfaces. In Handbook of Ambient Intelligence and Smart Environments, pages 1039–1067. Springer, 2010.
Derek Doran, Sarah Schulz, and Tarek R. Besold. What does explainable AI really mean? A new conceptualization of perspectives. arXiv preprint: 1710.00794, 2017.
Joseph S. Dumas and Marilyn C. Salzman. Usability assessment methods. Reviews of Human Factors and Ergonomics, 2(1):109–140, 2006.
Brian J. Edwards, Joseph J. Williams, Dedre Gentner, and Tania Lombrozo. Explanation recruits comparison in a category-learning task. Cognition, 185:21–38, 2019.
Upol Ehsan and Mark O. Riedl. Human-centered explainable AI: Towards a reflective sociotechnical approach. arXiv preprint: 2002.01092, 2020.
Olivia J. Erdélyi and Judy Goldsmith. Regulating artificial intelligence: Proposal for a global solution. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES '18, pages 95–101, New York, NY, USA, 2018. Association for Computing Machinery.
Shirley Gregor and Izak Benbasat. Explanations from intelligent systems: Theoretical foundations and implications for practice. MIS Quarterly, 23(4):497–530, 1999.
Michael A. K. Halliday. Language as a Social Semiotic: The Social Interpretation of Language and Meaning. University Park Press, 1978.
Robert R. Hoffman, Shane T. Mueller, Gary Klein, and Jordan Litman. Metrics for explainable AI: Challenges and prospects. arXiv preprint: 1812.04608, 2018.
Karen Holtzblatt and Hugh Beyer. Contextual Design: Design for Life. Morgan Kaufmann, 2016.
Viktor Kaptelinin. Activity theory: Implications for human-computer interaction. In Bonnie A. Nardi, editor, Context and Consciousness, pages 103–116. MIT Press, Cambridge, MA, 1996.
Frank C. Keil and Robert A. Wilson. Explaining explanation. In Explanation and Cognition, pages 1–18. Bradford Books, 2000.
Anders Kofod-Petersen and Agnar Aamodt. Contextualised ambient intelligence through case-based reasoning. In Thomas R. Roth-Berghofer, Mehmet H. Göker, and H. Altay Güvenir, editors, Proceedings of the Eighth European Conference on Case-Based Reasoning (ECCBR 2006), volume 4106 of LNCS, pages 211–225, Berlin, September 2006. Springer.
Anders Kofod-Petersen and Jörg Cassens. Explanations and context in ambient intelligent systems. In Boicho Kokinov, Daniel C. Richardson, Thomas R. Roth-Berghofer, and Laure Vieu, editors, Modeling and Using Context – CONTEXT 2007, volume 4635 of LNCS, pages 303–316, Roskilde, Denmark, 2007. Springer.
Anders Kofod-Petersen and Jörg Cassens. Modelling with problem frames: Explanations and context in ambient intelligent systems. In Michael Beigl, Henning Christiansen, Thomas R. Roth-Berghofer, Kenny R. Coventry, Anders Kofod-Petersen, and Hedda R. Schmidtke, editors, Modeling and Using Context – Proceedings of CONTEXT 2011, volume 6967 of LNCS, pages 145–158, Karlsruhe, Germany, 2011. Springer.
David B. Leake. Evaluating Explanations: A Content Theory. Lawrence Erlbaum Associates, New York, 1992.
David B. Leake. Goal-based explanation evaluation. In Goal-Driven Learning, pages 251–285. MIT Press, Cambridge, 1995.
Tania Lombrozo. The structure and function of explanations. Trends in Cognitive Sciences, 10(10):464–470, 2006.
Craig M. MacDonald and Michael E. Atwood. Changing perspectives on evaluation in HCI: Past, present, and future. In CHI '13 Extended Abstracts on Human Factors in Computing Systems, CHI EA '13, pages 1969–1978, New York, NY, USA, 2013. Association for Computing Machinery.
Ji-Ye Mao and Izak Benbasat. The use of explanations in knowledge-based systems: Cognitive perspectives and a process-tracing analysis. Journal of Management Information Systems, 17(2):153–179, 2000.
Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 2018.
Sina Mohseni, Niloofar Zarei, and Eric D. Ragan. A multidisciplinary survey and framework for design and evaluation of explainable AI systems. arXiv preprint: 1811.11839, 2018.
Shane T. Mueller, Elizabeth S. Veinott, Robert R. Hoffman, Gary Klein, Lamia Alam, Tauseef Mamun, and William J. Clancey. Principles of explanation in human-AI systems. In WS Explainable Agency in Artificial Intelligence at AAAI 2021, pages 153–162, 2021.
Pablo Noriega, Julian Padget, Harko Verhagen, and Mark D'Inverno. Towards a framework for socio-cognitive technical systems. In A. Ghose, N. Oren, P. Telang, and J. Thangarajah, editors, Coordination, Organizations, Institutions, and Norms in Agent Systems X, LNCS, pages 164–181. Springer, 2015.
Ronald Poppe, Rutger Rienks, and Betsy Dijk. Evaluating the future of HCI: Challenges for the evaluation of emerging applications. Volume 4451 of LNCS, pages 234–250, 2007.
Christian Remy, Oliver Bates, Jennifer Mankoff, and Adrian Friday. Evaluating HCI research beyond usability. In Extended Abstracts of the 2018 CHI Conference, pages 1–4, 2018.
Thomas R. Roth-Berghofer and Jörg Cassens. Mapping goals and kinds of explanations to the knowledge containers of case-based reasoning systems. In Héctor Muñoz-Avila and Francesco Ricci, editors, Case-Based Reasoning Research and Development – ICCBR 2005, volume 3630 of LNAI, pages 451–464, Chicago, 2005. Springer.
Thomas Roth-Berghofer and Michael M. Richter. On explanation. Künstliche Intelligenz, 22(2):5–7, 2008.
Thomas Roth-Berghofer, Stefan Schulz, David B. Leake, and Daniel Bahls. Explanation-aware computing. AI Magazine, 28(4):122, 2007.
Roger C. Schank. Explanation Patterns – Understanding Mechanically and Creatively. Lawrence Erlbaum, New York, 1986.
Edward H. Shortliffe. Computer-Based Medical Consultations: MYCIN. New York, 1976.
Kacper Sokol and Peter Flach. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 56–67, 2020.
William R. Swartout. What kind of expert should a system be? XPLAIN: A system for creating and explaining expert consulting programs. Artificial Intelligence, 21:285–325, 1983.
Frode Sørmo, Jörg Cassens, and Agnar Aamodt. Explanation in case-based reasoning – perspectives and goals. Artificial Intelligence Review, 24(2):109–143, October 2005.
Bas C. van Fraassen. The Scientific Image. Clarendon Press, Oxford, 1980.
Rebekah Wegener, Jörg Cassens, and David Butt. Start making sense: Systemic functional linguistics and ambient intelligence. Revue d'Intelligence Artificielle, 22(5):629–645, 2008.
Katharina Weitz, Dominik Schiller, Ruben Schlagowski, Tobias Huber, and Elisabeth André. "Let me explain!": exploring the potential of virtual agents in explainable AI interaction design. Journal on Multimodal User Interfaces, pages 1–12, 2020.
Jianfeng Zhan, Lei Wang, Wanling Gao, and Rui Ren. BenchCouncil's view on benchmarking AI and other emerging workloads. arXiv preprint: 1912.00572, 2019.