<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Intrinsic, Dialogic, and Impact Measures of Success for Explainable AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jörg Cassens</string-name>
          <email>cassens@cs.uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rebekah Wegener</string-name>
          <email>rebekah.wegener@sbg.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Paris Lodron University</institution>
          ,
          <addr-line>5020 Salzburg</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hildesheim</institution>
          ,
          <addr-line>31141 Hildesheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>41</fpage>
      <lpage>45</lpage>
      <abstract>
        <p>This paper presents a brief overview of requirements for the development and evaluation of human centred explainable systems. We propose three perspectives on evaluation models for explainable AI: intrinsic measures, dialogic measures and impact measures. The paper outlines these perspectives and looks at how the separation might be used for benchmarking explanation evaluation and for integration into design and development. We propose several avenues for future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Explanations are foundational to social interaction
[Lombrozo, 2006], and numerous different approaches to
achieving explainability have been proposed recently [Adadi and
Berrada, 2018; Arrieta et al., 2019;
        <xref ref-type="bibr" rid="ref17">Doran et al., 2017</xref>
        ].
      </p>
      <p>Criticisms of current research trends include that
“accounts of explanation typically define explanation (the
product) rather than explaining (the process)” [Edwards et al.,
2019]. Another criticism is that explanations are currently
largely seen as a relatively uniform and definable concept,
and even systems that take user goals with explanation into
account treat it largely on the system side of development
[Biran and Cotton, 2017]. Despite this, a human centred [Ehsan
and Riedl, 2020] perspective on explanation in artificial
intelligence is not new [Shortliffe, 1976; Swartout, 1983; Schank,
1986; Leake, 1992, 1995; Mao and Benbasat, 2000]. For
example, Gregor and Benbasat [1999] point out that different
user groups have different explanation needs.</p>
      <p>We have earlier construed contextualised explanations
based on user goals [Sørmo et al., 2005]. This has been
used to integrate explanatory needs in the system design
process [Roth-Berghofer and Cassens, 2005; Cassens and
Kofod-Petersen, 2007]. However, in that work we represented
explanation as a static object rather than a dialogic process.
A dialogic view includes the ability of the technical system to
make use of explanations as well, at least as part of the
theoretical model, even if not in practical applications.</p>
      <p>
        In our understanding, both human and non-human actors
in heterogeneous socio-technical systems
        <xref ref-type="bibr" rid="ref39">(or socio-cognitive,
[Noriega et al., 2015])</xref>
        can be senders and receivers of
explanations [Cassens and Wegener, 2019]. For example, a human
should be able to “explain away” recommendations made by a
diagnostic system in order to enhance its future performance.
While we currently focus on the opposite situation, i.e. an
artificial actor explaining its choice of recommendations to
the human user, frameworks for designing explanation-aware
systems should be able to account for different flows of
explanations, at least in principle and by extension.
      </p>
      <p>In order to distinguish this from views that see the machine
as only the explainer, not the explainee, we make use of the
established term explanation awareness [Roth-Berghofer et
al., 2007; Roth-Berghofer and Richter, 2008]. Our working
definition is as follows:</p>
      <p>• Internal View: explanation as part of the reasoning
process itself.</p>
      <p>– Example: a recommender system can use domain
knowledge to explain the absence or variation of
feature values, e.g. relations between countries.</p>
      <p>• External View: giving explanations of the found
solution, its application, or the reasoning process to the other
actors.</p>
      <p>– Example: the user tells said recommender system
why he chooses an apartment in Norway despite
the system suggesting one in Sweden.</p>
      <p>Semiotics and philosophy as well as the human and social
sciences provide a rich basis for applications in explainable
AI [Miller, 2018]. There is sufficient empirical and
theoretical evidence that explanations are generated, communicated,
understood and used in ways that are:</p>
      <p>• Dialogic, as suggested e.g. by Leake [1995],</p>
      <p>• Contextualised, as required by e.g. van Fraassen [1980],
comprised of</p>
      <p>– Context Awareness (knowing the situation the
system is in) and</p>
      <p>– Context Sensitivity (acting according to such
situation), Kofod-Petersen and Aamodt [2006];
Kofod-Petersen and Cassens [2011],</p>
      <p>• Multimodal, as argued for by e.g. Halliday
[1978], and being</p>
      <p>• Construed by user interest, as noted by e.g.
Achinstein [1983].</p>
      <p>Given these foundations, can a semiotic model of explanation
as a form of multi-modal dialogic language behaviour in
context be used to generate contextually appropriate explanations
by computational systems? There is an extensive body of
research focusing on generating and using explanations in AI.
Currently, what is lacking is:</p>
      <p>1. A theory of the dialogic process rather than a monologic
product</p>
      <p>2. A cohesive theory of explanation that is:</p>
      <p>• contextually appropriate (e.g. fitting people, topic,
mode and place),</p>
      <p>• semantically appropriate (e.g. recognised as an
explanation),</p>
      <p>• lexicogrammatically optimal (best possible
multimodal realisation)</p>
      <p>3. A framework for integrating explanatory capabilities in
the whole software development life-cycle, from
requirements elicitation over design and implementation
through to its use</p>
      <p>4. A framework for evaluation measures.</p>
    </sec>
    <sec id="sec-2">
      <title>A framework for evaluation measures</title>
      <p>We will focus on the last aspect in the remainder of this
paper. Research still seems fragmented, in particular when it
comes to measuring the actual effectiveness and efficiency of
explanations given to users. We propose to measure
explainability along three lines of inquiry. Intrinsic measures deal
with the question of whether the system at hand can
generate explanations at all. Dialogic measures look at whether
the system’s output is seen as an explanation by the users.
Finally, impact measures ask whether the explanation
generated is of any use. These questions should help to elicit
and formalise requirements for explanations, and to find
ways of evaluating solutions that are operationalised sufficiently
to support testable claims of explainability and to enable
comparisons between systems and between iterations of a system.</p>
      <p>
        Explanations are needed during the whole life cycle of
applications, from initial requirements elicitation through design
and development processes to using the final system.
Therefore, it makes sense to look at frameworks for measuring
efficiency and effectiveness of explanations in the context of
the whole development and life cycle management process.
While quality measurements for explanation could
eventually enable a final system score
        <xref ref-type="bibr" rid="ref53">(for benchmarking purposes
[Zhan et al., 2019])</xref>
        , development is a cycle and it is
contextual, and the goal is to be able to build “better” systems
through “better” development processes, where explanatory
success is part of the success metrics. Given existing
requirements for transparency, such a perspective on evaluating
explanations can also be part of a regulatory framework for ethical
AI [Cath, 2018; Coeckelbergh, 2020; Erdélyi and Goldsmith,
2018].
      </p>
      <sec id="sec-2-1">
        <title>Evaluations</title>
        <p>
          Within HCI, a plethora of different instantiations of
human centred development processes exist
          <xref ref-type="bibr" rid="ref15 ref16 ref25 ref4 ref6">(e.g. [Beyer
and Holtzblatt, 1997; Carroll, 2000; Cooper et al., 2014;
De Ruyter and Aarts, 2010; Holtzblatt and Beyer, 2016], to
name a few)</xref>
          . We should consider principles and methods
for (designing and evaluating) explainability as additions to
existing tool kits, agnostic to their use in established design
processes whenever possible (limited by different ontological
commitments).
        </p>
        <p>Evaluation is central to Human-Computer Interaction, or
rather: evaluations are central since they typically form a
cycle and cover a system at various stages. While (formative
and summative) evaluations are a cornerstone for human
centred design, “it is far from being a solved problem”
[MacDonald and Atwood, 2013]. We are generally in need of
evaluation processes that are suited for emerging types of
applications [Poppe et al., 2007] and for sustainable and responsible
systems development [Remy et al., 2018].</p>
        <p>But even if current (usability) evaluation methods [Dumas
and Salzman, 2006] may ultimately fall short in the
context of XAI, they can at least inform first iterations of
evaluation standards, in particular when used in combination
with theories and models from other areas, such as linguistics
[Cassens and Wegener, 2008; Halliday, 1978; Wegener et al.,
2008], psychology [Kaptelinin, 1996], the cognitive sciences
[Keil and Wilson, 2000], or philosophy [Achinstein, 1983;
van Fraassen, 1980].</p>
        <p>In this short paper, we cannot explore these contributions
in detail, but we will briefly outline a tripartite model for
capturing explanatory effectiveness that includes:</p>
        <p>• Intrinsic measures: measures that pertain to the ability
of a system to generate explanations.
Can the system generate explanations?</p>
        <p>• Dialogic measures: measures that pertain to interaction
between the system and its users.
Does the system’s output work as an explanation for its
users?</p>
        <p>• Impact measures: measures that pertain to the
potential, anticipated or actual impact of explanations.
Is the explanation generated of any use?</p>
        <p>We have separated these measures because each of these
three types of measures has different methods for testing and
they cover distinct aspects of what “explanatory success” can
mean. It is only by combining these different perspectives
that we can get a full picture of the explanatory performance
of a system and the explanations that are a part of that
system. While we can think of more perspectives, it is important
to keep in mind that quality measures have to have a well
defined scope and they need to be, indeed, measurable
[Carvalho et al., 2017]. Furthermore, for them to be able to
improve processes in practice, they need to be sufficiently
simple to apply.</p>
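        <p>As a purely illustrative sketch of why the three perspectives are reported separately rather than collapsed into one number, the record below keeps one score per measure and lists shortfalls individually. The class name ExplanationEvaluation, the 0 to 1 score scale, the threshold of 0.5 and the wording of the findings are assumptions made for this example only; the model itself does not prescribe any particular operationalisation.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExplanationEvaluation:
    """One evaluation round; all three scores are assumed to lie in [0, 1]."""

    intrinsic: float  # can the system generate explanations at all?
    dialogic: float   # does the output work as an explanation for its users?
    impact: float     # is the generated explanation of any use?

    def findings(self, threshold: float = 0.5) -> List[str]:
        """List shortfalls per perspective instead of averaging them away."""
        notes: List[str] = []
        if threshold > self.intrinsic:
            notes.append("intrinsic baseline not met")
        if threshold > self.dialogic:
            notes.append("output not accepted as an explanation")
        if threshold > self.impact:
            if self.dialogic >= threshold:
                # Convincing to users but not shown to help them: the case
                # that only the impact perspective can reveal.
                notes.append("accepted as an explanation, but benefit not shown")
            else:
                notes.append("benefit not shown")
        return notes


# Hypothetical scores for a single system iteration.
report = ExplanationEvaluation(intrinsic=0.9, dialogic=0.8, impact=0.2)
print(report.findings())
```
        </preformat>
        <p>A report of this kind would flag a system whose explanations are accepted by users but have no demonstrated benefit, a situation that a single averaged score could hide.</p>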
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Intrinsic Measures</title>
      <p>These measure the ability of the system to generate
explanations in general for the given context of use and, more
specifically, the transparency and interpretability of the system itself
or of aspects of the system such as ML models and data used
as well as algorithmic and other design choices.</p>
      <p>If a system or parts of a system are not transparent, then it
is unlikely to perform well on either dialogic or impact
measures. We can think of intrinsic measures as a baseline for
explainable AI: they are a necessary, but not sufficient, condition.
From a design process perspective, we will need to look at
which components are necessary for explanation generation
[Roth-Berghofer and Cassens, 2005]. When evaluating, we might
explore the structure, modality and semantic characteristics
of the different explanations to ensure that they are optimised
for the situation. There are different specific methods that
might be useful for intrinsic measures.</p>
    </sec>
    <sec id="sec-4">
      <title>Dialogic Measures</title>
      <p>Here we look at the question of whether that which has been
generated actually works as an explanation to the user, in
various conditions, situations and contexts. Under investigation is
the shared semiotic process of explanation generator and
explanation consumer. Different methods are going to be useful
for dialogic measures including user studies, reaction studies,
experimental studies and qualitative and quantitative
methods in general. Explanations are inherently dialogic, so we
are always going to want to know who is requesting the
explanation, who is providing the explanation and how and why
they are providing it. Tracking the exchange of information
itself is a way to evaluate because it lets us see the reaction to
the explanation.</p>
      <p>Trustworthy AI could be an outcome of systems that score
highly on dialogic measures. This does not mean that
trustworthy systems will score well on impact measures. Indeed,
human and non-human agents are quite prepared to trust a
system that may have negative impacts on their wellbeing.
Trust can be engendered through a dialogically well-performing
malicious system, and this is what makes impact measures
so essential.</p>
    </sec>
    <sec id="sec-5">
      <title>Impact Measures</title>
      <p>Impact measures look at whether providing explanations
offers benefits over the use of the system itself. These can be
used both on an individual level and for larger systems.</p>
      <p>For example, on the individual level, we might consider
an adaptive learning system that offers explanations to
further the learning goal [Sørmo et al., 2005] a user might have.
While dialogic measures can be used to evaluate whether such
an explanation can function as an explanation to the student,
it would remain unclear whether the explanation did actually
improve learning outcomes.</p>
      <p>These measures also look at the impact that the system can
have in the world. How can it impact decisions, diagnoses,
legal and access outcomes? The impact measures examine the
potential, anticipated or actual impact of the system and the
ability of the system to explain these repercussions to users
in context. Here the concept of contextual AI is important
because, as Ehsan and Riedl argue, “if we ignore the socially
situated nature of our technical systems, we will only get a
partial and unsatisfying picture” [Ehsan and Riedl, 2020]. A
good model of context is crucial for evaluating explanatory
success [Kofod-Petersen and Cassens, 2007; Wegener et al.,
2008]. Ethical AI would be the outcome of a system that
scores highly on impact measures. We would of course aim
for beneficial and equitable AI, but ethical is at least a good
baseline outcome. Here we might expect to see methods such
as impact studies and hypothetical, scenario and risk
modelling. It would be beneficial to know what the anticipated
consequences of the explanation are for everyone involved.</p>
      <sec id="sec-5-1">
        <title>Related Work</title>
        <p>Mohseni et al. [2018] argue that the interdisciplinary nature
of explainable artificial intelligence (XAI) “poses challenges
for identifying appropriate design and evaluation
methodology and consolidating knowledge across efforts”. At the same
time, this interdisciplinary approach is essential to the success
of XAI. We view our suggestion as a way to complement,
further consolidate, and operationalise their classification
system for different goals in XAI.</p>
        <p>
          Hoffman et al. [2018] propose a process model of
explaining and suggest measures that are applicable in the
different phases of their conceptual model. This complements our
(more abstract) notions of dialogic and (to a lesser degree)
impact measures, whereas we see our notion of intrinsic
measures as a prerequisite for their model. Both models can be
systematically combined, depending on the need for
granularity and aspects covered.
          <xref ref-type="bibr" rid="ref13">Mueller et al. [2021]</xref>
          present
some helpful higher-level psychological considerations that
can serve as general templates for effective explanations.
        </p>
        <p>Sokol and Flach [2020] introduce fact sheets with an
extensive list of properties for different explanatory methods.
This is complementary to our approach and could be used to
select methods supporting the measures chosen. A survey by
Carvalho et al. [2019] on interpretability in machine learning
is orthogonal to our model, with their results being useful for
operationalisation of the intrinsic (e.g. their comparison of
different methods) and the dialogic measures (e.g. the notion
of explanation properties).</p>
      </sec>
      <sec id="sec-5-2">
        <title>Conclusion</title>
        <p>
          We propose a tripartite perspective on explanation in
intelligent systems that aligns with (iterative and contextual) design
and development processes of systems such that there is space
for formative and summative evaluations. While it enables a
final system score
          <xref ref-type="bibr" rid="ref53">(which we propose for benchmarking
purposes [Zhan et al., 2019])</xref>
          , development is a cycle and it is
contextual, and the goal is to be able to build “better”
systems, where explanatory success is part of success metrics.
        </p>
        <p>We have previously discussed the potential for Ambient
Intelligence to be useful for creating explainable AI [Cassens
and Wegener, 2019], particularly on the architecture level and
with regard to capabilities subsumed [De Ruyter and Aarts,
2010]. We propose that the core characteristics and general
architecture of ambient intelligent systems make them a good
framework for developing XAI and that AmI systems
themselves have the potential to become explanatory agents that
can be mediators between humans and other systems. The
concept of mediating explanatory instances has also been
explored in the context of virtual explanatory agents [Weitz et
al., 2020] or as a user-specific “memory” of explanations
[Chaput et al., 2021].</p>
        <p>Development of such mediators, concentrating explanatory
capabilities in specialised agents that are contextually
embedded in our surroundings and have the potential for
personalisation and anticipatory interaction, could greatly benefit
from a cohesive framework for measuring explanatory
success from different perspectives.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Peter</given-names>
            <surname>Achinstein</surname>
          </string-name>
          .
          <source>The Nature of Explanation</source>
          . Oxford University Press, Oxford,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Amina</given-names>
            <surname>Adadi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mohammed</given-names>
            <surname>Berrada</surname>
          </string-name>
          .
          <article-title>Peeking inside the black-box: a survey on explainable artificial intelligence (xai)</article-title>
          .
          <source>IEEE access</source>
          ,
          <volume>6</volume>
          :
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Alejandro</given-names>
            <surname>Barredo Arrieta</surname>
          </string-name>
          , Natalia Díaz-Rodríguez, Javier Del Ser,
          <string-name>
            <given-names>Adrien</given-names>
            <surname>Bennetot</surname>
          </string-name>
          , Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, Raja Chatila, and Francisco Herrera.
          <article-title>Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai</article-title>
          .
          <source>arXiv preprint: 1910.10045</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Hugh</given-names>
            <surname>Beyer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Karen</given-names>
            <surname>Holtzblatt</surname>
          </string-name>
          .
          <article-title>Contextual design: defining customer-centered systems</article-title>
          . Elsevier,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Or</given-names>
            <surname>Biran</surname>
          </string-name>
          and
          <string-name>
            <given-names>Courtenay</given-names>
            <surname>Cotton</surname>
          </string-name>
          .
          <article-title>Explanation and justification in machine learning: A survey</article-title>
          .
          <source>In IJCAI-17 Workshop on Explainable AI (XAI)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>John M</given-names>
            <surname>Carroll</surname>
          </string-name>
          .
          <article-title>Making use: scenario-based design of human-computer interactions</article-title>
          . MIT press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Rainara Maia</given-names>
            <surname>Carvalho</surname>
          </string-name>
          , Rossana Maria de Castro Andrade, Káthia Marçal de Oliveira, Ismayle de Sousa Santos, and Carla Ilane Moreira Bezerra.
          <article-title>Quality characteristics and measures for human-computer interaction evaluation in ubiquitous systems</article-title>
          .
          <source>Software Quality Journal</source>
          ,
          <volume>25</volume>
          (
          <issue>3</issue>
          ):
          <fpage>743</fpage>
          -
          <lpage>795</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Diogo V.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eduardo M.</given-names>
            <surname>Pereira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jaime S.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          .
          <article-title>Machine learning interpretability: A survey on methods and metrics</article-title>
          .
          <source>Electronics</source>
          ,
          <volume>8</volume>
          (
          <issue>8</issue>
          ):
          <fpage>832</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
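          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anders</given-names>
            <surname>Kofod-Petersen</surname>
          </string-name>
          .
          <article-title>Explanations and case-based reasoning in ambient intelligent systems</article-title>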
          . In David C. Wilson and Deepak Khemani, editors,
          <source>ICCBR-07 Workshop Proceedings</source>
          , pages
          <fpage>167</fpage>
          -
          <lpage>176</lpage>
          , Belfast, Northern Ireland,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rebekah</given-names>
            <surname>Wegener</surname>
          </string-name>
          .
          <article-title>Making use of abstract concepts - systemic-functional linguistics and ambient intelligence</article-title>
          . In Max Bramer, editor,
          <source>Artificial Intelligence in Theory and Practice II - IFIP 20th World Computer Congress, IFIP AI Stream</source>
          , volume
          <volume>276</volume>
          of IFIP
          , pages
          <fpage>205</fpage>
          -
          <lpage>214</lpage>
          , Milano, Italy,
          <year>2008</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rebekah</given-names>
            <surname>Wegener</surname>
          </string-name>
          .
          <article-title>Ambient explanations: Ambient intelligence and explainable ai</article-title>
          . In Ioannis Chatzigiannakis, Boris De Ruyter, and Irene Mavrommati, editors,
          <source>Proceedings of AmI 2019 - European Conference on Ambient Intelligence</source>
          , volume LNCS, Rome, Italy, November
          <year>2019</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Corinne</given-names>
            <surname>Cath</surname>
          </string-name>
          .
          <source>Governing artificial intelligence: ethical, legal and technical opportunities and challenges</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
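          <string-name>
            <given-names>Rémy</given-names>
            <surname>Chaput</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amélie</given-names>
            <surname>Cordier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alain</given-names>
            <surname>Mille</surname>
          </string-name>
          .
          <article-title>Explanation for humans, for machines, for human-machine interactions?</article-title>
          <source>In WS Explainable Agency in Artificial Intelligence at AAAI 2021</source>
          , pages
          <fpage>145</fpage>
          -
          <lpage>152</lpage>
          ,
          <year>2021</year>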
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Coeckelbergh</surname>
          </string-name>
          .
          <article-title>AI ethics</article-title>
          . MIT Press,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Alan</given-names>
            <surname>Cooper</surname>
          </string-name>
          , Robert Reimann, David Cronin, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Noessel</surname>
          </string-name>
          .
          <article-title>About Face (fourth edition): the essentials of interaction design</article-title>
          . John Wiley &amp; Sons,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
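          <string-name>
            <given-names>Boris</given-names>
            <surname>De Ruyter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Emile</given-names>
            <surname>Aarts</surname>
          </string-name>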
          .
          <article-title>Experience research: a methodology for developing human-centered interfaces</article-title>
          .
          <source>In Handbook of ambient intelligence and smart environments</source>
          , pages
          <fpage>1039</fpage>
          -
          <lpage>1067</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Derek</given-names>
            <surname>Doran</surname>
          </string-name>
          , Sarah Schulz, and
          <string-name>
            <given-names>Tarek R.</given-names>
            <surname>Besold</surname>
          </string-name>
          .
          <article-title>What does explainable ai really mean? a new conceptualization of perspectives</article-title>
          .
          <source>arXiv preprint: 1710.00794</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Joseph S.</given-names>
            <surname>Dumas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marilyn C.</given-names>
            <surname>Salzman</surname>
          </string-name>
          .
          <article-title>Usability assessment methods</article-title>
          .
          <source>Reviews of Human Factors and Ergonomics</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>109</fpage>
          -
          <lpage>140</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Brian J</given-names>
            <surname>Edwards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joseph J</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dedre</given-names>
            <surname>Gentner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tania</given-names>
            <surname>Lombrozo</surname>
          </string-name>
          .
          <article-title>Explanation recruits comparison in a category-learning task</article-title>
          .
          <source>Cognition</source>
          ,
          <volume>185</volume>
          :
          <fpage>21</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Upol</given-names>
            <surname>Ehsan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Human-centered explainable ai: Towards a reflective sociotechnical approach</article-title>
          .
          <source>arXiv preprint: 2002.01092</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Olivia J.</given-names>
            <surname>Erdélyi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Judy</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          .
          <article-title>Regulating artificial intelligence: Proposal for a global solution</article-title>
          .
          <source>In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</source>
          , AIES '18, pages
          <fpage>95</fpage>
          -
          <lpage>101</lpage>
          , New York, NY, USA,
          <year>2018</year>
          . Association for Computing Machinery
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Shirley</given-names>
            <surname>Gregor</surname>
          </string-name>
          and
          <string-name>
            <given-names>Izak</given-names>
            <surname>Benbasat</surname>
          </string-name>
          .
          <article-title>Explanations from intelligent systems: Theoretical foundations and implications for practice</article-title>
          .
          <source>MIS Quarterly</source>
          ,
          <volume>23</volume>
          (
          <issue>4</issue>
          ):
          <fpage>497</fpage>
          -
          <lpage>530</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Michael A.K.</given-names>
            <surname>Halliday</surname>
          </string-name>
          .
          <article-title>Language as a Social Semiotic: the social interpretation of language and meaning</article-title>
          . University Park Press,
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Robert R</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shane T</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gary</given-names>
            <surname>Klein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jordan</given-names>
            <surname>Litman</surname>
          </string-name>
          .
          <article-title>Metrics for explainable AI: Challenges and prospects</article-title>
          . arXiv preprint arXiv:1812.04608
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Karen</given-names>
            <surname>Holtzblatt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugh</given-names>
            <surname>Beyer</surname>
          </string-name>
          .
          <article-title>Contextual design: Design for life</article-title>
          . Morgan Kaufmann,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Viktor</given-names>
            <surname>Kaptelinin</surname>
          </string-name>
          .
          <article-title>Activity theory: Implications for human-computer interaction</article-title>
          . In Bonnie A. Nardi, editor,
          <source>Context and Consciousness</source>
          , pages
          <fpage>103</fpage>
          -
          <lpage>116</lpage>
          . MIT Press, Cambridge, MA,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Frank C.</given-names>
            <surname>Keil</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert A.</given-names>
            <surname>Wilson</surname>
          </string-name>
          .
          <article-title>Explaining explanation</article-title>
          .
          <source>In Explanation and Cognition</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . Bradford Books,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Anders</given-names>
            <surname>Kofod-Petersen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Agnar</given-names>
            <surname>Aamodt</surname>
          </string-name>
          .
          <article-title>Contextualised ambient intelligence through case-based reasoning</article-title>
          . In Thomas R. Roth-Berghofer, Mehmet H. Göker, and H. Altay Güvenir, editors,
          <source>Proceedings of the Eighth European Conference on Case-Based Reasoning (ECCBR 2006)</source>
          , volume
          <volume>4106</volume>
          of LNCS, pages
          <fpage>211</fpage>
          -
          <lpage>225</lpage>
          , Berlin, September
          <year>2006</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Anders</given-names>
            <surname>Kofod-Petersen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          .
          <article-title>Explanations and context in ambient intelligent systems</article-title>
          . In Boicho Kokinov, Daniel C. Richardson, Thomas R. Roth-Berghofer, and Laure Vieu, editors,
          <source>Modeling and Using Context - CONTEXT 2007</source>
          , volume
          <volume>4635</volume>
          of LNCS
          , pages
          <fpage>303</fpage>
          -
          <lpage>316</lpage>
          , Roskilde, Denmark,
          <year>2007</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Anders</given-names>
            <surname>Kofod-Petersen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          .
          <article-title>Modelling with problem frames: Explanations and context in ambient intelligent systems</article-title>
          . In Michael Beigl, Henning Christiansen, Thomas R. Roth-Berghofer, Kenny R. Coventry, Anders Kofod-Petersen, and Hedda R. Schmidtke, editors,
          <source>Modeling and Using Context - Proceedings of CONTEXT 2011</source>
          , volume
          <volume>6967</volume>
          of LNCS, pages
          <fpage>145</fpage>
          -
          <lpage>158</lpage>
          , Karlsruhe, Germany,
          <year>2011</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>David B.</given-names>
            <surname>Leake</surname>
          </string-name>
          .
          <article-title>Evaluating Explanations: A Content Theory</article-title>
          . Lawrence Erlbaum Associates, New York,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>David B.</given-names>
            <surname>Leake</surname>
          </string-name>
          .
          <article-title>Goal-based explanation evaluation</article-title>
          .
          <source>In Goal-Driven Learning</source>
          , pages
          <fpage>251</fpage>
          -
          <lpage>285</lpage>
          . MIT Press, Cambridge,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>Tania</given-names>
            <surname>Lombrozo</surname>
          </string-name>
          .
          <article-title>The structure and function of explanations</article-title>
          .
          <source>Trends in cognitive sciences</source>
          ,
          <volume>10</volume>
          (
          <issue>10</issue>
          ):
          <fpage>464</fpage>
          -
          <lpage>470</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <given-names>Craig M.</given-names>
            <surname>MacDonald</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael E.</given-names>
            <surname>Atwood</surname>
          </string-name>
          .
          <article-title>Changing perspectives on evaluation in HCI: Past, present, and future</article-title>
          .
          <source>In CHI '13 Extended Abstracts on Human Factors in Computing Systems</source>
          , CHI EA '13, pages
          <fpage>1969</fpage>
          -
          <lpage>1978</lpage>
          , New York, NY, USA,
          <year>2013</year>
          . Association for Computing Machinery
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <given-names>Ji-Ye</given-names>
            <surname>Mao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Izak</given-names>
            <surname>Benbasat</surname>
          </string-name>
          .
          <article-title>The use of explanations in knowledge-based systems: Cognitive perspectives and a process-tracing analysis</article-title>
          .
          <source>Journal of Management Information Systems</source>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>153</fpage>
          -
          <lpage>179</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <given-names>Tim</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <given-names>Sina</given-names>
            <surname>Mohseni</surname>
          </string-name>
          , Niloofar Zarei, and
          <string-name>
            <given-names>Eric D.</given-names>
            <surname>Ragan</surname>
          </string-name>
          .
          <article-title>A multidisciplinary survey and framework for design and evaluation of explainable AI systems</article-title>
          . arXiv preprint arXiv:1811.11839,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <given-names>Shane T.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Elizabeth S.</given-names>
            <surname>Veinott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert R.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gary</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lamia</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tauseef</given-names>
            <surname>Mamun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>William J.</given-names>
            <surname>Clancey</surname>
          </string-name>
          .
          <article-title>Principles of explanation in human-AI systems</article-title>
          .
          <source>In WS Explainable Agency in Artificial Intelligence at AAAI 2021</source>
          , pages
          <fpage>153</fpage>
          -
          <lpage>162</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Noriega</surname>
          </string-name>
          , Julian Padget, Harko Verhagen, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>D'Inverno</surname>
          </string-name>
          .
          <article-title>Towards a framework for socio-cognitive technical systems</article-title>
          . In A. Ghose, N. Oren, P. Telang, and J. Thangarajah, editors,
          <source>Coordination, Organizations, Institutions, and Norms in Agent Systems X</source>
          , LNCS
          , pages
          <fpage>164</fpage>
          -
          <lpage>181</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <given-names>Ronald</given-names>
            <surname>Poppe</surname>
          </string-name>
          , Rutger Rienks, and
          <string-name>
            <given-names>Betsy</given-names>
            <surname>Dijk</surname>
          </string-name>
          .
          <article-title>Evaluating the future of HCI: Challenges for the evaluation of emerging applications</article-title>
          . Volume
          <volume>4451</volume>
          of LNCS, pages
          <fpage>234</fpage>
          -
          <lpage>250</lpage>
          , January
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <given-names>Christian</given-names>
            <surname>Remy</surname>
          </string-name>
          , Oliver Bates, Jennifer Mankoff, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Friday</surname>
          </string-name>
          .
          <article-title>Evaluating HCI research beyond usability</article-title>
          .
          <source>In Extended Abstracts of the 2018 CHI Conference</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          , April
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name>
            <given-names>Thomas R.</given-names>
            <surname>Roth-Berghofer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          .
          <article-title>Mapping goals and kinds of explanations to the knowledge containers of case-based reasoning systems</article-title>
          . In Héctor Muñoz-Avila and Francesco Ricci, editors,
          <source>Case Based Reasoning Research and Development - ICCBR 2005</source>
          , volume
          <volume>3630</volume>
          of LNAI
          , pages
          <fpage>451</fpage>
          -
          <lpage>464</lpage>
          , Chicago,
          <year>2005</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Roth-Berghofer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael M</given-names>
            <surname>Richter</surname>
          </string-name>
          .
          <article-title>On explanation</article-title>
          .
          <source>Künstliche Intelligenz</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Roth-Berghofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Schulz</surname>
          </string-name>
          , David B Leake, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Bahls</surname>
          </string-name>
          .
          <article-title>Explanation-aware computing</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>28</volume>
          (
          <issue>4</issue>
          ):
          <fpage>122</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <given-names>Roger C.</given-names>
            <surname>Schank</surname>
          </string-name>
          .
          <article-title>Explanation Patterns - Understanding Mechanically and Creatively</article-title>
          . Lawrence Erlbaum, New York,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <given-names>Edward H</given-names>
            <surname>Shortliffe</surname>
          </string-name>
          .
          <article-title>Computer-based medical consultations: MYCIN</article-title>
          . New York,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name>
            <given-names>Kacper</given-names>
            <surname>Sokol</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Flach</surname>
          </string-name>
          .
          <article-title>Explainability fact sheets: a framework for systematic assessment of explainable approaches</article-title>
          .
          <source>In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency</source>
          , pages
          <fpage>56</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <given-names>William R.</given-names>
            <surname>Swartout</surname>
          </string-name>
          .
          <article-title>What kind of expert should a system be? XPLAIN: A system for creating and explaining expert consulting programs</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>21</volume>
          :
          <fpage>285</fpage>
          -
          <lpage>325</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name>
            <given-names>Frode</given-names>
            <surname>Sørmo</surname>
          </string-name>
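          ,
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Cassens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Agnar</given-names>
            <surname>Aamodt</surname>
          </string-name>
          .
          <article-title>Explanation in case-based reasoning - perspectives and goals</article-title>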
          .
          <source>Artificial Intelligence Review</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ):
          <fpage>109</fpage>
          -
          <lpage>143</lpage>
          ,
          October
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <given-names>Bas C.</given-names>
            <surname>van Fraassen</surname>
          </string-name>
          .
          <article-title>The Scientific Image</article-title>
          . Clarendon Press, Oxford,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <given-names>Rebekah</given-names>
            <surname>Wegener</surname>
          </string-name>
          , Jörg Cassens, and David Butt.
          <article-title>Start making sense: Systemic functional linguistics and ambient intelligence</article-title>
          .
          <source>Revue d'Intelligence Artificielle</source>
          ,
          <volume>22</volume>
          (
          <issue>5</issue>
          ):
          <fpage>629</fpage>
          -
          <lpage>645</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name>
            <given-names>Katharina</given-names>
            <surname>Weitz</surname>
          </string-name>
          , Dominik Schiller, Ruben Schlagowski, Tobias Huber, and Elisabeth André.
          <article-title>“Let me explain!”: Exploring the potential of virtual agents in explainable AI interaction design</article-title>
          .
          <source>Journal on Multimodal User Interfaces</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <string-name>
            <given-names>Jianfeng</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lei</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wanling</given-names>
            <surname>Gao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rui</given-names>
            <surname>Ren</surname>
          </string-name>
          .
          <article-title>BenchCouncil's view on benchmarking AI and other emerging workloads</article-title>
          . arXiv preprint arXiv:1912.00572,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>