<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XAI and philosophical work on explanation: A roadmap</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleks Knoks</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Raleigh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Luxembourg</institution>
          ,
          <addr-line>Maison du Nombre, 6, Avenue de la Fonte, L-4364, Esch-sur-Alzette</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Philosophy, University of Luxembourg, Maison des Sciences Humaines</institution>
          ,
          <addr-line>11, Porte des Sciences, L-4366, Esch-sur-Alzette</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>What Deep Neural Networks (DNNs) can do is impressive, yet they are notoriously opaque. Responding to the worries associated with this opacity, the field of XAI has produced a plethora of methods purporting to explain the workings of DNNs. Unsurprisingly, a whole host of questions revolves around the notion of explanation central to this field. This note provides a roadmap of the recent work that tackles these questions from the perspective of philosophical ideas on explanations and models in science.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Neural Networks</kwd>
        <kwd>Black Box Problem</kwd>
        <kwd>Explainable Artificial Intelligence</kwd>
        <kwd>explanation</kwd>
        <kwd>understanding</kwd>
        <kwd>scientific models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        What makes one explanation better than another? When is an explanation a valuable
simplification and when an oversimplification? Since these types of questions have long been
pursued by philosophers – especially in the philosophy of science and epistemology – there’s
scope for fruitful interaction between philosophy and computer science in the context of XAI.
And indeed, several recent
publications have applied ideas from contemporary philosophy – more specifically, the literature
on scientific models and modelling – to the particular case of opaque DNNs. It pays to distinguish
this literature from Miller’s seminal paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Where Miller called for supplementing XAI with
insights from social sciences (within which he includes philosophy) and, in particular, insights
about the way “people define, generate, select, evaluate, and present explanations” [3, p. 1], the
literature in question debates the viability of the whole project of XAI.
      </p>
      <p>The main goal of this note is to provide a roadmap through this literature. The views we
discuss can be arranged on a spectrum. At one end, we have the Optimists who think that
the moral to draw from philosophical work on explanation, understanding, and models is that
there’s no barrier to the use of XAI models in providing genuine explanation and understanding
of DNNs. At the other, we have the Pessimists who argue that there are principled reasons that
make these methods suspect, and that there are better alternatives to securing the reliability of
DNNs. Our discussion moves from the optimist to the pessimist end (Sections 2–5), concluding
with some remarks on promising directions for future research (Section 6).</p>
    </sec>
    <sec id="sec-2">
      <title>2. The optimist end: Páez and Fleisher</title>
      <p>
        Authors who are most optimistic about XAI are Páez [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Fleisher [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Páez suggests that
the concept of understanding is superior to that of explanation when interpreting DNNs. In
fact, he claims that there can be no explanations for black-box systems, at least, not in the
traditional sense of the term. This owes to the fact that explanations are factive, or that “both
explanans and explanandum must be true” [4, p. 445]. Páez argues that XAI models are bound
to be false in the same way that an explanation of why a bridge collapsed using Newtonian
(as opposed to relativistic) physics is false, since they are only coarse approximations of how
the system behaves over a restricted domain. Thus, he writes, “Machine learning is the kind of
context in which one can say that, in principle, it is impossible to satisfy the factivity condition”
[4, p. 454] on either explaining-why or understanding-why a DNN has produced a specific
output for specific input. On the other hand, Páez also thinks that XAI models can provide a
non-factive form of “objectual understanding” of a DNN’s internal “mechanisms” equivalent to
an engineer’s objectual understanding of a bridge collapsing using a Newtonian model.
      </p>
      <p>Fleisher also focuses on understanding, explicating it as follows: A subject understands
why p if she grasps an explanation E of p, where E consists in information about the causal
patterns relevant to why p obtains; and where grasping E means accepting E and having
the abilities to exploit and manipulate the causal pattern information that E contains. He
emphasizes the fact that scientific models are often highly idealized and suggests that XAI
methods are relevantly similar. Just like scientific models, such XAI models as LIME represent
the causal patterns within a particular DNN and make predictions that are imperfectly faithful
to it. In view of this, Fleisher concludes that there is no principled reason for not accepting
imperfect XAI models.</p>
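Fleisher’s comparison can be made concrete with a toy sketch of a LIME-style surrogate: perturb an input, query the black box, and fit a proximity-weighted linear model whose coefficients stand in for a local explanation. This is a minimal illustration under our own assumptions, not the LIME library itself; the stand-in black-box function and all parameter choices are hypothetical:

```python
import numpy as np

def opaque_model(x):
    # Stand-in for a black-box DNN: a nonlinear decision function.
    return 1.0 if 2.0 * x[0] + np.sin(x[1]) > 1.0 else 0.0

def local_surrogate(x0, n_samples=500, width=0.5, seed=0):
    """Fit a LIME-style weighted linear surrogate around input x0."""
    rng = np.random.default_rng(seed)
    # Perturb the input of interest and query the black box.
    X = x0 + rng.normal(scale=width, size=(n_samples, len(x0)))
    y = np.array([opaque_model(x) for x in X])
    # Proximity kernel: perturbations near x0 count for more.
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * width ** 2))
    # Weighted least squares via the sqrt-weight trick.
    A = np.hstack([X, np.ones((n_samples, 1))]) * np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
    return coef[:-1]  # per-feature importances near x0

weights = local_surrogate(np.array([0.4, 0.1]))
```

The surrogate is faithful only near x0, and only imperfectly so — exactly the kind of idealized, partially accurate model that Fleisher argues can nonetheless support understanding.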
    </sec>
    <sec id="sec-3">
      <title>3. Sullivan</title>
      <p>
        Sullivan’s [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] focus is not on the prospects of explanations of opaque DNNs, but rather on
whether and when these DNNs themselves – despite their opacity – can lead to explanations
and understanding of target phenomena. She thinks, for instance, that the melanoma-detection
model of Esteva et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] can further one’s understanding of mole classification, while the
sexual orientation-classification model of Wang and Kosinski [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] cannot provide understanding
of the relation between sexual orientation and appearance.
      </p>
      <p>
        Sullivan suggests that neither the opacity nor the complexity of DNNs is a barrier to their
providing understanding of the target domain. By extension, she is (at least) not opposed to
the use of XAI methods in providing understanding of the way DNNs work. She does think,
however, that more is needed for an explanation and understanding of the target phenomenon,
namely, evidence supporting the link between the model and the target phenomenon. This
conclusion is motivated by the use of models in science – in particular, Schelling’s model
of segregation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Sullivan argues that a model leads to genuine understanding just in case
it provides not a mere “how-possibly”, but a “how-actually” explanation, and that a model
provides the latter only if there is evidence that the features of the target phenomenon the
model represents do really behave in the way the model has them behave. Thus, Schelling’s
model is well-situated to provide understanding because there is some (limited) empirical
evidence that people’s preferences for their neighbours’ appearance do indeed cause them to
move house. The situation with opaque DNNs is quite different: we do not know what the
internal features of a DNN represent about the target phenomenon. Until we have identified
the internal representational components within a DNN, it’s not even possible to use evidence
to reduce what Sullivan calls “link uncertainty”, and so we cannot take the DNN to provide
any kind of explanation of its input-output classification. Schelling’s model is constructed with
clearly labelled representational parts from the start. But with trained DNNs (without XAI)
we have no idea which features of the inputs are being grouped together and what sorts of
inferences are being performed to reach the output.
      </p>
    </sec>
    <sec id="sec-3b">
      <title>4. Durán</title>
      <p>
Durán urges the need for a “top-down” approach to XAI which starts by trying to identify
what counts as a bona fide “scientific XAI (sXAI)”, rather than adopting a piecemeal “bottom-up”
approach of creating a range of purported XAI technologies depending on whatever computational
technology or method happens to be conveniently available. He emphasizes that scientific
explanations are meant to grow our understanding of why something is the case, whereas much
of contemporary XAI in fact only offers mere classifications and predictions. Durán claims
that post-hoc XAI methods are “transparency-conditional”: any explanations, or predictions
that the method produces are mediated via the XAI system, rather than engaging directly with
the DNN itself. This implies, Durán suggests, that for an XAI model to explain, there must be
a formal connection (isomorphism, similarity, or some such) between the DNN and the XAI
model. Without such a connection there is no basis for claims that an explanation based on the
XAI model applies to the DNN. Durán laments that the form of this connection is never spelled
out. Durán is surely correct here concerning post-hoc model-agnostic techniques – and we
would add that unless more is said constraining the nature of this connection, isomorphisms and
similarities between the XAI and the DNN will simply be too cheap and abundant. Durán also
warns that the surveyable and straightforward nature of XAI algorithms makes them susceptible
to providing a “false sense of explainability because classifications are not explanations” [10,
p. 3]. He criticizes the tendency to confuse the “analysis of the structure of explanation” with
the “pragmatics of giving explanations”. The fact that different information must be delivered
to different audiences has no bearing on the structure of a bona fide explanation. We agree
with Durán here and suggest that such a criticism could justly be applied to Miller’s influential
paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Drawing on discussions in philosophy of science, Durán compares the explanations
provided by typical post-hoc XAI with an “explanation” of the apparent retrograde motion of
planets using the Ptolemaic model of planetary motion. Genuine explanation, Durán points out,
is a success term, meaning that it must come with genuine knowledge and understanding of the
world. Just as the Ptolemaic model cannot produce knowledge of this kind, neither, according
to Durán, can typical post-hoc XAI models produce genuine understanding of the DNN.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. The pessimist end: Durán and Jongsma, and Babic et al.</title>
      <p>
        Durán and Jongsma [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Babic et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] focus on the applications of DNNs in healthcare
and express skepticism about the use of XAI in this context. Durán and Jongsma hold that a
typical XAI model doesn’t offer sufficient reason to believe that we can reliably trust the DNN it
aims to explicate. On their view, when a layperson sees the appealing visual outputs produced
by a post-hoc XAI method (such as saliency maps or heatmaps), she acquires only an unjustified
belief that they really represent the way the DNN produced the output. The problem is supposed
to be that, for all she knows, the post-hoc XAI is as opaque as the original DNN. XAI is said
to “induce” the belief that one knows why the DNN produced the output without offering
a “genuine reason” to believe that XAI has interpreted the DNN. As an alternative, Durán
and Jongsma propose computational reliabilism. On this view, one is justified in believing the
predictions of a given AI system just in case “there is a reliable process... that yields, most of
the time, trustworthy results”. In spelling out the notion of a reliable process, four “reliability
indicators” are identified: verification methods, robustness analysis, a history of (un)successful
implementations, and expert knowledge. Jointly these are said to offer a justification to believe
that the results of medical AI systems are epistemically trustworthy. Also, such trustworthiness
is taken to be necessary, but not sufficient, for permissibly acting on an output of a medical AI.
      </p>
      <p>
        Babic et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] argue against the suggestion that providing XAI should be a legal requirement
on using DNNs in a healthcare setting. In their view, XAI outputs are not necessarily the actual
reasons behind the outputs of DNNs, nor causally related to them. Babic et al. hold that they
provide only “ersatz understanding”, that is, XAI outputs can leave one with a false impression
that one understands the workings of a given DNN better than one actually does; see also [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] here. They also criticize
post-hoc XAI for failing to be robust, for failing to provide genuine accountability, and for
threatening to limit the performance and complexity of DNNs that can be used in healthcare.
They conclude that, instead of emphasizing explainability, regulators should focus on ensuring
and requiring reliable performance of DNNs.
      </p>
    </sec>
    <sec id="sec-5">
      <title>6. Lessons for future research</title>
      <p>Having surveyed the literature, we close by identifying two promising directions for future
research. But first, some methodological advice: When reading the literature, it is
important to keep in mind the distinction between (i) considering whether an opaque DNN
trained to predict or classify phenomena in some target domain might also provide us with
explanations of these phenomena, and (ii) considering whether some XAI method can provide
us with explanations of the opaque DNN. Some theorists (e.g. Sullivan) are primarily concerned
with the former, whilst others (Fleisher, Páez) are concerned with the latter. Often both of
these topics will be discussed at different points within a single paper. Furthermore, the term
model is sometimes used to refer to the full DNN itself (which is said to be a model of the target
phenomena) and sometimes to refer to an XAI model of the DNN (thus, a model of a model).
The moral here is that this literature doesn’t always mean the same thing by explanation.</p>
      <p>
        (1) Siding with the Optimists, we tend to think that there is no in principle reason why a
simplified, model-agnostic XAI model of a DNN cannot provide (at least some degree of) genuine
understanding. The claims made by the Pessimists that there is some kind of fundamental
problem with the very idea of such XAI techniques are too strong. However, the Pessimists’ core
worry that simplifying XAI models may be providing only pseudo-explanations and
pseudo-understanding will remain a pressing concern until we have a better grip on when exactly the
simplifications / idealizations made by a model are legitimate and useful and when they are not.
This is especially challenging in the case of modelling DNNs compared with other examples of
scientific modelling. When we employ a simplified model of a physical process or a simplified
economic model, we are perfectly aware of how the simplified model is a simplification: we
choose what the model represents and which features are being left out of the model, and so
we can have a reasonable idea about the importance and relevance of these features. In the
case of DNNs we lack an independent grip on the target phenomenon: we don’t know how
the DNN is transforming the input to obtain the output, and so we can form no clear idea
concerning the respects in which a simplified XAI model of a given DNN is a simplification and
no reasonable idea as to when the features of the DNN that the XAI model doesn’t track might
become important. Thus, one crucial topic for future research is to identify some principled
basis for deciding when an XAI model is a useful simplification and when it oversimplifies.
(And unlike Miller [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we don’t think that layperson reports could serve as such a basis.)
(2) Agreeing with the Pessimists, we think that, at least sometimes, a reliable track record of
accuracy should suffice for trusting an opaque DNN. The pressing question, then, is when
such a record is enough. In part, this is an ethical issue: when are users and stakeholders owed an
explanation for a decision made by a DNN? But there is an epistemological issue here too: under
what circumstances is it reasonable to think that future inputs will resemble past inputs, so that
past track record of reliability can serve as the basis for trust? How can we estimate variation
in future inputs compared with past inputs and the training data? For example, when we think
of a DNN trained on a set of standardized photos or scans of one specific organ or anatomical
feature, the risk of the system responding in unforeseen ways to new inputs that differ in some
crucial way from the training data distribution seems small. But if we think of cases in which
the training data and potential inputs allow for more variation, the risk that the system might
encounter novel, off-distribution inputs for which it is no longer reliable seems much higher.
One of the four “reliability indicators” Durán and Jongsma identify is robustness analysis – a
term taken from engineering, where it refers to an analysis of a system’s performance under
a range of different conditions. They comment, “Robustness analysis... allows researchers to
learn about the results of a given model, and whether they are an artefact of it (eg, due to a poor
idealisation) or whether they are related to core features of the model” [11, p. 332]. This points
in the direction of a possible solution to the epistemological issue. However, for now it is also
only a promissory note, since it is not immediately clear what a satisfactory robustness analysis
of a DNN would amount to, and what sorts of “conditions” we would vary and test.
      </p>
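What a robustness analysis of a DNN might amount to is, as just noted, underspecified; one minimal reading is an empirical check of output stability under varied conditions. The sketch below is a hypothetical illustration of that reading, with toy stand-ins for the classifier and the perturbed condition:

```python
import random

def robustness_score(model, inputs, perturb, n_trials=100, seed=0):
    """Fraction of perturbed queries on which the model's output
    matches its output on the unperturbed input."""
    rng = random.Random(seed)
    stable, total = 0, 0
    for x in inputs:
        base = model(x)
        for _ in range(n_trials):
            stable += (model(perturb(x, rng)) == base)
            total += 1
    return stable / total

# Toy stand-ins: a threshold "classifier" and Gaussian input noise.
model = lambda x: x > 0.5
perturb = lambda x, rng: x + rng.gauss(0.0, 0.05)

# Inputs far from the decision boundary stay stable; 0.52 does not.
score = robustness_score(model, [0.1, 0.9, 0.52], perturb)
```

Even this toy version exposes the open question: the choice of “conditions” (here, the noise model) drives the verdict, and nothing in the score itself tells us whether those conditions resemble the deployment distribution.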
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Knoks benefited from funding of the Luxembourg National Research Fund (FNR) under the
OPEN programme within the project Deontic Logic for Epistemic Rights (DELIGHT).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burrell</surname>
          </string-name>
          ,
          <article-title>How the machine 'thinks': Understanding opacity in machine learning algorithms</article-title>
          ,
          <source>Big Data &amp; Society</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <source>Principles of Explainable Artificial Intelligence</source>
          , Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>267</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Páez</surname>
          </string-name>
          ,
          <article-title>The pragmatic turn in explainable artificial intelligence (XAI)</article-title>
          ,
          <source>Minds and Machines</source>
          <volume>29</volume>
          (
          <year>2019</year>
          )
          <fpage>441</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fleisher</surname>
          </string-name>
          ,
          <article-title>Understanding, idealization, and explainable AI</article-title>
          ,
          <source>Episteme</source>
          (forthcoming).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          ,
          <article-title>Understanding from machine learning models</article-title>
          ,
          <source>The British Journal for the Philosophy of Science</source>
          <volume>73</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Esteva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuprel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Swetter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Blau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          ,
          <article-title>Dermatologistlevel classification of skin cancer with deep neural networks</article-title>
          ,
          <source>Nature</source>
          <volume>542</volume>
          (
          <year>2017</year>
          )
          <fpage>115</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kosinski</surname>
          </string-name>
          ,
          <article-title>Deep neural networks are more accurate than humans at detecting sexual orientation from facial images</article-title>
          ,
          <source>Journal of Personality and Social Psychology</source>
          <volume>114</volume>
          (
          <year>2018</year>
          )
          <fpage>246</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Schelling</surname>
          </string-name>
          ,
          <article-title>Dynamic models of segregation</article-title>
          ,
          <source>The Journal of Mathematical Sociology</source>
          <volume>1</volume>
          (
          <year>1971</year>
          )
          <fpage>143</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Durán</surname>
          </string-name>
          ,
          <article-title>Dissecting scientific explanation in AI (sXAI): A case for medicine and healthcare</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>297</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Durán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Jongsma</surname>
          </string-name>
          ,
          <article-title>Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI</article-title>
          ,
          <source>Journal of Medical Ethics</source>
          <volume>47</volume>
          (
          <year>2021</year>
          )
          <fpage>329</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Babic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gerke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Evgeniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. G.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <article-title>Beware explanations from AI in health care</article-title>
          ,
          <source>Science</source>
          <volume>373</volume>
          (
          <year>2021</year>
          )
          <fpage>284</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <article-title>The mythos of model interpretability</article-title>
          ,
          <source>Queue</source>
          <volume>16</volume>
          (
          <year>2018</year>
          )
          <fpage>31</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>