XAI and philosophical work on explanation: A roadmap

Aleks Knoks¹, Thomas Raleigh²

¹ Department of Computer Science, University of Luxembourg, Maison du Nombre, 6, Avenue de la Fonte, L-4364, Esch-sur-Alzette, Luxembourg
² Department of Philosophy, University of Luxembourg, Maison des Sciences Humaines, 11, Porte des Sciences, L-4366, Esch-sur-Alzette, Luxembourg

1st Workshop on Bias, Ethical AI, Explainability and the role of Logic and Logic Programming (BEWARE-22), co-located with AIxIA 2022, University of Udine, Udine, Italy, 2022
Email: aleks.knoks@uni.lu (A. Knoks); thomas.raleigh@uni.lu (T. Raleigh)
URL: https://aleksknoks.com (A. Knoks); https://thomasraleigh.weebly.com (T. Raleigh)
ORCID: 0000-0001-8384-0328 (A. Knoks); 0000-0001-5056-0039 (T. Raleigh)

Abstract
What Deep Neural Networks (DNNs) can do is impressive, yet they are notoriously opaque. Responding to the worries associated with this opacity, the field of XAI has produced a plethora of methods purporting to explain the workings of DNNs. Unsurprisingly, a whole host of questions revolves around the notion of explanation central to this field. This note provides a roadmap of the recent work that tackles these questions from the perspective of philosophical ideas on explanations and models in science.

Keywords
Deep Neural Networks, Black Box Problem, Explainable Artificial Intelligence, explanation, understanding, scientific models

1. Introduction

The last decade has seen an explosion of impressive applications of Deep Neural Networks (DNNs) and other techniques from Machine Learning: systems using these techniques can classify objects from images, diagnose diseases based on medical records, predict protein folds, and do much more. However, these systems are also rightly characterized as opaque, meaning that even the engineers of a given DNN can't always understand and explain why it produces a specific output in response to a specific input.¹ This issue – known as the Black Box Problem – has given rise to a flourishing research field of Explainable Artificial Intelligence (XAI) and its range of methods purporting to explain why a given DNN produces a given output, including LIME, SHAP, counterfactual explanations, layerwise relevance propagation, and many others.²

¹ For an important early discussion of different forms of opacity, see Burrell [1]. Our focus is on the opacity that Burrell links to the "way algorithms operate at the scale of application".
² We cannot survey these methods for reasons of space. For a detailed survey, see, for instance, Guidotti et al. [2].

The wide variety of existing XAI methods naturally leads one to wonder if some of them are better than others. This, in turn, raises questions of how candidate explanations are to be evaluated: What makes for a correct / good / acceptable explanation? When is one explanation better than another? When is an explanation a valuable simplification, and when an over-simplification? Since these types of questions have long been pursued by philosophers – especially in the philosophy of science and epistemology – there's scope for fruitful interaction between philosophy and computer science in the context of XAI.
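To fix ideas before turning to the philosophical debate, the sketch below illustrates, in deliberately toy form, one of the methods just mentioned: counterfactual explanation. Everything in it is an illustrative assumption rather than any particular XAI tool: a small scikit-learn network stands in for an opaque DNN, and a crude greedy search stands in for the far more careful optimization that real counterfactual methods use. The point is only to show the kind of artefact such methods return, namely a nearby input on which the model's decision flips.

```python
# Toy sketch of a "counterfactual explanation" for an opaque classifier.
# The classifier, data, and search procedure are all hypothetical stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                          random_state=0).fit(X, y)

def counterfactual(x, model, step=0.05, max_iter=200):
    """Greedily nudge one feature at a time until the model's prediction flips."""
    x_cf = x.copy()
    original = model.predict(x.reshape(1, -1))[0]
    for _ in range(max_iter):
        if model.predict(x_cf.reshape(1, -1))[0] != original:
            return x_cf  # a (rough) minimal change that flips the decision
        candidates = []
        for i in range(x_cf.shape[0]):
            for delta in (step, -step):
                trial = x_cf.copy()
                trial[i] += delta
                prob_orig = model.predict_proba(trial.reshape(1, -1))[0][original]
                candidates.append((prob_orig, trial))
        # take the single-feature change that most undermines the original class
        x_cf = min(candidates, key=lambda c: c[0])[1]
    return None  # no counterfactual found within the search budget

x0 = X[0]
cf = counterfactual(x0, black_box)
if cf is not None:
    print("feature changes that flip the prediction:", np.round(cf - x0, 3))
```

Whether an artefact of this kind amounts to an explanation in any philosophically demanding sense is precisely what is at issue in the literature surveyed below.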
And indeed, several recent publications have applied ideas from contemporary philosophy – more specifically, the literature on scientific models and modelling – to the particular case of opaque DNNs. It pays to distinguish this literature from Miller's seminal paper [3]. Where Miller called for supplementing XAI with insights from the social sciences (within which he includes philosophy) and, in particular, insights about the way "people define, generate, select, evaluate, and present explanation" [3, p. 1], the literature in question debates the viability of the whole project of XAI. The main goal of this note is to provide a roadmap through this literature. The views we discuss can be arranged on a spectrum. At one end, we have the Optimists, who think that the moral to draw from philosophical work on explanation, understanding, and models is that there's no barrier to the use of XAI models in providing genuine explanation and understanding of DNNs. At the other, we have the Pessimists, who argue that there are principled reasons that make these methods suspect, and that there are better alternatives for securing the reliability of DNNs. Our discussion moves from the optimist to the pessimist end (Sections 2–5), concluding with some remarks on promising directions for future research (Section 6).

2. The optimist end: Páez and Fleisher

The authors most optimistic about XAI are Páez [4] and Fleisher [5]. Páez suggests that the concept of understanding is superior to that of explanation when interpreting DNNs. In fact, he claims that there can be no explanations for black-box systems, at least not in the traditional sense of the term. This owes to the fact that explanations are factive, or that "both explanans and explanandum must be true" [4, p. 445]. Páez argues that XAI models are bound to be false in the same way that an explanation of why a bridge collapsed that appeals to Newtonian (as opposed to relativistic) physics is false, since they are only coarse approximations of how the system behaves over a restricted domain. Thus, he writes, "Machine learning is the kind of context in which one can say that, in principle, it is impossible to satisfy the factivity condition" [4, p. 454] on either explaining-why or understanding-why a DNN has produced a specific output for a specific input. On the other hand, Páez also thinks that XAI models can provide a non-factive form of "objectual understanding" of a DNN's internal "mechanisms", equivalent to the objectual understanding an engineer gains of a collapsing bridge by using a Newtonian model.

Fleisher also focuses on understanding, explicating it as follows: a subject understands why 𝑋 if she grasps an explanation 𝐸 of 𝑋, where 𝐸 consists in information about the causal patterns relevant to why 𝑋 obtains, and where grasping 𝐸 means accepting 𝐸 and having the abilities to exploit and manipulate the causal pattern information that 𝐸 contains. He emphasizes the fact that scientific models are often highly idealized and suggests that XAI methods are relevantly similar. Just like scientific models, such XAI models as LIME represent the causal patterns within a particular DNN and make predictions that are imperfectly faithful to it. In view of this, Fleisher concludes that there is no principled reason for not accepting imperfect XAI models.
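To make Fleisher's comparison concrete, here is a minimal sketch of the core idea behind LIME-style surrogate models: fit a small, interpretable model to an opaque classifier's behaviour in the neighbourhood of a single input. This is not the LIME library itself, and the network, locality kernel, and parameters are illustrative assumptions; the point is only that the resulting linear model is deliberately simpler than, and imperfectly faithful to, the system it models.

```python
# Minimal sketch of a LIME-style local surrogate (not the lime package itself).
# All models and parameter choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=1)
black_box = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                          random_state=1).fit(X, y)

def local_surrogate(x, model, n_samples=1000, scale=0.3):
    """Fit a simple weighted linear model to the black box's outputs near x."""
    rng = np.random.default_rng(0)
    perturbations = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    targets = model.predict_proba(perturbations)[:, 1]  # black-box outputs
    # weight perturbed points by their proximity to x (a crude locality kernel)
    weights = np.exp(-np.linalg.norm(perturbations - x, axis=1) ** 2)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbations, targets, sample_weight=weights)
    return surrogate

x0 = X[0]
surrogate = local_surrogate(x0, black_box)
# The surrogate's coefficients play the role of the "explanation": a simplified,
# idealized picture of which features matter for the black box's output near x0.
print("local feature attributions:", np.round(surrogate.coef_, 3))
```

The surrogate's coefficients are readable in a way the network's weights are not, but they answer for the network's behaviour only locally and approximately. On Fleisher's picture, this is just the kind of idealization familiar from scientific modelling, and no bar to understanding.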
3. Sullivan

Sullivan's [6] focus is not on the prospects of explanations of opaque DNNs, but rather on whether and when these DNNs themselves – despite their opacity – can lead to explanations and understanding of target phenomena. She thinks, for instance, that the melanoma-detection model of Esteva et al. [7] can further one's understanding of mole classification, while the sexual orientation-classification model of Wang and Kosinski [8] cannot provide understanding of the relation between sexual orientation and appearance. Sullivan suggests that neither the opacity nor the complexity of DNNs is a barrier to their providing understanding of the target domain. By extension, she is (at least) not opposed to the use of XAI methods in providing understanding of the way DNNs work. She does think, however, that more is needed for an explanation and understanding of the target phenomenon, namely, evidence supporting the link between the model and the target phenomenon. This conclusion is motivated by the use of models in science – in particular, Schelling's model of segregation [9]. Sullivan argues that a model leads to genuine understanding just in case it provides not a mere "how-possibly" but a "how-actually" explanation, and that a model provides the latter only if there is evidence that the features of the target phenomenon the model represents really do behave in the way the model has them behave. Thus, Schelling's model is well-situated to provide understanding because there is some (limited) empirical evidence that people's preferences concerning their neighbours' appearance do indeed cause them to move house. The situation with opaque DNNs is quite different: we do not know what the internal features of a DNN represent about the target phenomenon. Until we have identified the internal representational components within a DNN, it's not even possible to use evidence to reduce what Sullivan calls "link uncertainty", and so we cannot take the DNN to provide any kind of explanation of its input-output classification. Schelling's model is constructed with clearly labelled representational parts from the start. But with trained DNNs (without XAI) we have no idea which features of the inputs are being grouped together and what sorts of inferences are being performed to reach the output.

4. Durán

Durán [10] urges the need for a "top-down" approach to XAI, which starts by trying to identify what counts as a bona fide "scientific XAI (sXAI)", rather than adopting a piecemeal "bottom-up" approach of creating a range of purported XAI technologies depending on what computational technology / method happens to be conveniently available. He emphasizes that scientific explanations are meant to advance our understanding of why something is the case, whereas much of contemporary XAI in fact offers mere classifications and predictions. Durán claims that post-hoc XAI methods are "transparency-conditional": any explanations or predictions that the method produces are mediated via the XAI system, rather than engaging directly with the DNN itself. This implies, Durán suggests, that for an XAI model to explain, there must be a formal connection (isomorphism, similarity, or some such) between the DNN and the XAI model. Without such a connection, there is no basis for claims that an explanation based on the XAI model applies to the DNN. Durán laments that the form of this connection is never spelled out.
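One candidate way of spelling out such a connection, offered here purely as an illustration, is a fidelity measure: quantify how closely a simple surrogate reproduces the DNN's outputs over some set of inputs. In the sketch below, a hypothetical network stands in for the DNN and a shallow decision tree stands in for an XAI model; the measure computed is the simplest one available, the rate of agreement on held-out inputs.

```python
# Illustrative sketch of a global surrogate and a simple "fidelity" measure of
# the connection between it and a black-box network. All models are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=8, random_state=2)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=2)

black_box = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                          random_state=2).fit(X_train, y_train)

# The surrogate is trained to mimic the black box's *predictions*, not the true
# labels: it is a model of the model.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=2)
surrogate.fit(X_train, black_box.predict(X_train))

# "Fidelity": how often the surrogate agrees with the black box on held-out inputs.
fidelity = np.mean(surrogate.predict(X_test) == black_box.predict(X_test))
print(f"surrogate-DNN agreement on held-out inputs: {fidelity:.3f}")
```

A score like this still leaves open exactly what Durán presses for: over which inputs, and to what degree, the surrogate must agree with the DNN before the connection counts as explanatory.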
Durán is surely correct here concerning post-hoc, model-agnostic techniques – and we would add that unless more is said constraining the nature of this connection, isomorphisms and similarities between the XAI model and the DNN will simply be too cheap and abundant. Durán also warns that the surveyable and straightforward nature of XAI algorithms makes them susceptible to providing a "false sense of explainability because classifications are not explanations" [10, p. 3]. He criticizes the tendency to confuse the "analysis of the structure of explanation" with the "pragmatics of giving explanations". The fact that different information must be delivered to different audiences has no bearing on the structure of a bona fide explanation. We agree with Durán here and suggest that such a criticism could justly be applied to Miller's influential paper [3]. Drawing on discussions in the philosophy of science, Durán compares the explanations provided by typical post-hoc XAI with an "explanation" of the apparent retrograde motion of planets using the Ptolemaic model of planetary motion. Genuine explanation, Durán points out, is a success term, meaning that it must come with genuine knowledge and understanding of the world. Just as the Ptolemaic model cannot produce knowledge of this kind, neither, according to Durán, can typical post-hoc XAI models produce genuine understanding of the DNN.

5. The pessimist end: Durán and Jongsma, and Babic et al.

Durán and Jongsma [11] and Babic et al. [12] focus on the applications of DNNs in healthcare and express skepticism about the use of XAI in this context. Durán and Jongsma hold that a typical XAI model doesn't offer sufficient reason to believe that we can reliably trust the DNN it aims to explicate. On their view, when a layperson sees the appealing visual outputs produced by a post-hoc XAI method (such as saliency maps or heatmaps), she acquires only an unjustified belief that these outputs really represent the way the DNN produced its output. The problem is supposed to be that, for all she knows, the post-hoc XAI is as opaque as the original DNN. XAI is said to "induce" the belief that one knows why the DNN produced the output without offering a "genuine reason" to believe that the XAI method has interpreted the DNN. As an alternative, Durán and Jongsma propose computational reliabilism. On this view, one is justified in believing the predictions of a given AI system just in case "there is a reliable process... that yields, most of the time, trustworthy results". In spelling out the notion of a reliable process, four "reliability indicators" are identified: verification methods, robustness analysis, a history of (un)successful implementations, and expert knowledge. Jointly, these are said to justify the belief that the results of medical AI systems are epistemically trustworthy. Moreover, such trustworthiness is taken to be necessary, but not sufficient, for permissibly acting on an output of a medical AI.

Babic et al. [12] argue against the suggestion that providing XAI should be a legal requirement for using DNNs in a healthcare setting. In their view, XAI outputs are not necessarily the actual reasons behind the outputs of DNNs, nor causally related to them. Babic et al. hold that they provide only "ersatz understanding": XAI outputs can leave one with the false impression that one understands the workings of a given DNN better than one actually does (see also [13]).
They also criticize post-hoc XAI for failing to be robust, for failing to provide genuine accountability, and for threatening to limit the performance and complexity of DNNs that can be used in healthcare. They conclude that, instead of emphasizing explainability, regulators should focus on ensuring and requiring reliable performance of DNNs.

6. Lessons for future research

Having surveyed the literature, we close by identifying two promising directions for future research, labelled (1) and (2) below. But first, some methodological advice: when reading the literature, it is important to keep in mind the distinction between (i) considering whether an opaque DNN trained to predict or classify phenomena in some target domain might also provide us with explanations of these phenomena, and (ii) considering whether some XAI method can provide us with explanations of the opaque DNN. Some theorists (e.g. Sullivan) are primarily concerned with the former, whilst others (Fleisher, Páez) are concerned with the latter. Often both of these topics will be discussed at different points within a single paper. Furthermore, the term model is sometimes used to refer to the full DNN itself (which is said to be a model of the target phenomena) and sometimes to refer to an XAI model of the DNN (thus, a model of a model). The moral here is that this literature doesn't always mean the same thing by explanation.

(1) Siding with the Optimists, we tend to think that there is no in-principle reason why a simplified, model-agnostic XAI model of a DNN cannot provide (at least some degree of) genuine understanding. The claims made by the Pessimists that there is some kind of fundamental problem with the very idea of such XAI techniques are too strong. However, the Pessimists' core worry that simplifying XAI models may be providing only pseudo-explanations and pseudo-understanding will remain a pressing concern until we have a better grip on when exactly the simplifications / idealizations made by a model are legitimate and useful and when they are not. This is especially challenging in the case of modelling DNNs compared with other examples of scientific modelling. When we employ a simplified model of a physical process or a simplified economic model, we are perfectly aware of how the simplified model is a simplification: we choose what the model represents and which features are left out of the model, and so we can have a reasonable idea about the importance and relevance of these features. In the case of DNNs, we lack an independent grip on the target phenomenon: we don't know how the DNN is transforming the input to obtain the output, and so we can form no clear idea concerning the respects in which a simplified XAI model of a given DNN is a simplification, and no reasonable idea as to when the features of the DNN that the XAI model doesn't track might become important. Thus, one crucial topic for future research is to identify some principled basis for deciding when an XAI model is a useful simplification and when it oversimplifies. (And, unlike Miller [3], we don't think that laypeople's reports could serve as such a basis.)

(2) Agreeing with the Pessimists, we think that, at least sometimes, a reliable track record of accuracy should suffice for trusting an opaque DNN. The pressing question, then, is when such a record is enough. In part, this is an ethical issue: when are users / stakeholders owed an explanation for a decision made by a DNN?
But there is an epistemological issue here too: under what circumstances is it reasonable to think that future inputs will resemble past inputs, so that a past track record of reliability can serve as the basis for trust? How can we estimate variation in future inputs compared with past inputs and the training data? For example, when we think of a DNN trained on a set of standardized photos or scans of one specific organ or anatomical feature, the risk of the system responding in unforeseen ways to new inputs that differ in some crucial way from the training data distribution seems small. But if we think of cases in which the training data and potential inputs allow for more variation, the risk that the system might encounter novel, off-distribution inputs for which it is no longer reliable seems much higher. One of the four "reliability indicators" Durán and Jongsma identify is robustness analysis – a term taken from engineering, where it refers to an analysis of a system's performance under a range of different conditions. They comment, "Robustness analysis... allows researchers to learn about the results of a given model, and whether they are an artefact of it (eg, due to a poor idealisation) or whether they are related to core features of the model" [11, p. 332]. This points in the direction of a possible solution to the epistemological issue. However, for now it is only a promissory note, since it is not immediately clear what a satisfactory robustness analysis of a DNN would amount to, and what sorts of "conditions" we would vary and test.

Acknowledgments

Knoks benefited from funding from the Luxembourg National Research Fund (FNR) under the OPEN programme, within the project Deontic Logic for Epistemic Rights (DELIGHT).

References

[1] J. Burrell, How the machine 'thinks': Understanding opacity in machine learning algorithms, Big Data & Society 3 (2016).
[2] R. Guidotti, A. Monreale, D. Pedreschi, F. Giannotti, Principles of Explainable Artificial Intelligence, Springer International Publishing, 2021, pp. 9–31.
[3] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence 267 (2019) 1–38.
[4] A. Páez, The pragmatic turn in explainable artificial intelligence (XAI), Minds and Machines 29 (2019) 441–459.
[5] W. Fleisher, Understanding, idealization, and explainable AI, Episteme (forthcoming).
[6] E. Sullivan, Understanding from machine learning models, The British Journal for the Philosophy of Science 73 (2022).
[7] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (2017) 115–118.
[8] Y. Wang, M. Kosinski, Deep neural networks are more accurate than humans at detecting sexual orientation from facial images, Journal of Personality and Social Psychology 114 (2018) 246–257.
[9] T. C. Schelling, Dynamic models of segregation, The Journal of Mathematical Sociology 1 (1971) 143–186.
[10] J. M. Durán, Dissecting scientific explanation in AI (sXAI): A case for medicine and healthcare, Artificial Intelligence 297 (2021).
[11] J. M. Durán, K. R. Jongsma, Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI, Journal of Medical Ethics 47 (2021) 329–335.
[12] B. Babic, S. Gerke, T. Evgeniou, I. G. Cohen, Beware explanations from AI in health care, Science 373 (2021) 284–286.
[13] Z. Lipton, The mythos of model interpretability, Queue 16 (2018) 31–57.