<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Explainable AI methods for Classification Tasks in Visual Inspection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Björn Forcher</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Menold</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Weixler</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jörg Schmitt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Wagner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Robert Bosch GmbH</institution>
          ,
          <addr-line>Wernerstraße 51, 70469 Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Methods of eXplainable Artificial Intelligence (XAI) are gaining more and more interest in the machine learning (ML) community. For explaining neural networks, many methods have been proposed, especially in the context of computer vision (CV). These approaches aim at explaining decisions by means of the sensitivity or importance of input features. In this paper, an application in the field of visual inspection (VI) in the manufacturing domain is analyzed. As different XAI methods produce interpretations of varying quality, we propose a metrics bundle to assess the quality of those algorithms, e.g. Gradient or Guided Backpropagation. The bundle includes a new approach to measuring the correctness of an explanation and enables developers to rely on the most appropriate method for their use case.</p>
      </abstract>
      <kwd-group>
        <kwd>Evaluation</kwd>
        <kwd>Metrics</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Visual inspection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As the scientific field of artificial intelligence has been evolving rapidly over the past few years, concerns
about the safety, security, reliability, and resiliency of these systems are growing. There have
been increasing efforts to understand ML models in order to detect weaknesses (e.g. correlated
features) at an early stage. These methods are known collectively under the term XAI [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
are used to explain the decisions of an AI system.
      </p>
      <p>In this paper, an application of visual inspection in the manufacturing of fuel injection
equipment (FIE) systems is in focus. We implemented a neural network which is used to detect
coating failures in FIE components. To improve the model, we applied various XAI methods
producing explanations of varying quality. In order to evaluate these methods objectively,
a metric composition is proposed. It quantifies the ability of an XAI method to describe
the sensitivity of the model at a given sample as correctly as possible, its ability
to compensate for noise in the model function, and the computational speed of the XAI
algorithm.</p>
      <p>This paper is structured as follows. In the next section we present the applied XAI methods
and the proposed metric composition. Section 3 shows the coating failure use case, the applied
XAI methods and our metrics bundle. The final section summarizes our findings and gives an
outlook on further investigations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. XAI methods and metrics</title>
      <p>
        There are many XAI methods which are used to provide interpretations for ML models (see
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). They can be classified by various characteristics such as agnosticism or
locality [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Local approaches are mainly based on one example of the data set and reveal how
features contribute to the output. Global approaches, on the other hand, take the whole data set
into account [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The agnosticism characteristic describes whether the approach works only
for specific ML models (model-specific) or if it can be applied to any model (model-agnostic)
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this work, we focus on model-specific methods for neural networks which provide
local interpretations, namely Gradient Backpropagation (GrB) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Deconvolutional Networks
(DeCon) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and Guided Backpropagation (GuB) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
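      <p>As an illustration, the following minimal sketch shows how such attribution maps could be obtained for a PyTorch model, here using the Captum library; the model, image tensor and target class are assumed to be given, and this is not necessarily the implementation used in our pipeline.</p>
      <preformat>
# A minimal sketch, assuming a PyTorch classifier and the Captum library.
import torch
from captum.attr import Saliency, GuidedBackprop, Deconvolution

def compute_attributions(model, image, target_class):
    """Return attribution maps of the three gradient-based methods for one image."""
    model.eval()
    image = image.clone().requires_grad_(True)       # input tensor of shape (1, 3, H, W)
    return {
        "GrB":   Saliency(model).attribute(image, target=target_class, abs=False),
        "DeCon": Deconvolution(model).attribute(image, target=target_class),
        "GuB":   GuidedBackprop(model).attribute(image, target=target_class),
    }
      </preformat>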
      <p>
        The main question is how to decide which method provides good explanations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Regarding
our use case, explanations should be correct and not susceptible to model noise. In addition, fast
computation time is important due to the real-time application in assembly lines (compare also
to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). An explanation is correct if it directly represents how the model made its decision
(compare to [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]). This aspect can be determined by the sensitivity of a model, which
characterizes its behavior under infinitesimal changes in the input. Gradient methods, such as the
above-mentioned ones, can be leveraged for that. In this context, the term feature sensitivity
describes the ability of an XAI method to determine the sensitivity of a model as correctly as
possible. The second property is to be free of model-induced noise. The ability of a method to
reduce the influence of model-induced noise is called noise susceptibility. In the following we
provide a brief description of the metrics (see [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for more details).
      </p>
      <p>As mentioned above, Feature Sensitivity takes only the local context of the model function
into account. Let f(x) be the prediction probability for the testing input x and let Δx be a
small deviation vector. We define the feature sensitivity metric s as follows:</p>
      <disp-formula><tex-math>s = \frac{f(x + \Delta x) - f(x)}{\|\Delta x\|_2}</tex-math></disp-formula>
      <p>
        The deviation vector Δx is based on the attributions of x. The attributions A are a general
measure of the contribution of a single input value to the prediction (compare to [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]). The
deviation vector Δx is now chosen sample-wise as a percentage p using the L2 norm and is defined
as follows:
      </p>
      <disp-formula><tex-math>\Delta x = -p \cdot \|x\|_2 \cdot \frac{A}{\|A\|_2}</tex-math></disp-formula>
      <p>
        There are many methods to calculate a sensitivity score (see [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). In our approach we use
the gradient-based attributions A to derive Δx and scale s to a range from −1 to 1. For
this purpose the score is divided by the score of the exact gradient calculated using GrB. This
automatically gives GrB a score of 1 as the ideal algorithm. To remove the influence of
arbitrarily chosen samples, the score can be calculated over as many samples as possible. For
any method the scores of all samples form a stochastic distribution. As a single score, the mean
of all sample scores is used.
      </p>
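      <p>The following minimal sketch illustrates how these scores can be computed. It assumes NumPy arrays, a predict function returning the prediction probability f(x), and callables attr_fn and grad_fn returning the attributions of the method under test and the exact gradient (GrB), respectively; the function names are illustrative.</p>
      <preformat>
# A minimal sketch of the feature sensitivity score, following the definitions above.
import numpy as np

def raw_sensitivity(predict, x, attributions, p=1e-5):
    """Finite-difference sensitivity of the model along the attribution direction."""
    a = attributions / (np.linalg.norm(attributions) + 1e-12)
    dx = -p * np.linalg.norm(x) * a                  # deviation vector, scaled per sample
    return (predict(x + dx) - predict(x)) / np.linalg.norm(dx)

def feature_sensitivity(predict, samples, attr_fn, grad_fn, p=1e-5):
    """Mean sensitivity score over samples, scaled so that GrB obtains 1.0."""
    scores = []
    for x in samples:
        s_method = raw_sensitivity(predict, x, attr_fn(x), p)
        s_grad = raw_sensitivity(predict, x, grad_fn(x), p)
        scores.append(s_method / s_grad)
    return float(np.mean(scores))
      </preformat>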
      <p>To find a metric to account for Noise Susceptibility, a definition of noise is needed. For
any model trained on a finite amount of samples, the model function can only classify these
samples exactly. The difference between an imaginary ideal model function trained on infinite
samples and the present model function can be considered as an error. For this metric the error is
assumed to be dominated by high-frequency noise terms. For any XAI method, the attributions A
are likewise assumed to consist of an ideal part and an error introduced by the noise.
According to Balduzzi et al. [19], the gradient of the model function is even
more susceptible to noise than the model function itself. The noise of the gradients is usually
high frequency, especially for deep networks. The attribution error is therefore also expected
to be large. This contrasts with the low-frequency ideal attributions. By applying a low-pass
image filter on the attribution map A, the ideal attributions are reconstructed. For the
present application a Gaussian filter is used. The attribution error is assumed to be low for
a method that is not susceptible to model noise and high for a method vulnerable to model-induced
noise. To quantitatively evaluate the model-induced noise, the structural similarity
(SSIM) [20] is used to compare the attribution map A before and after filtering. For the SSIM,
two images are compared based on their difference in luminance, contrast
and structure. The SSIM yields a maximum score of 1 if (and only if) the
attributions are identical. The SSIM is calculated window-wise for a number of local windows
of the images. The final noise score is the mean SSIM over all windows.</p>
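      <p>A minimal sketch of the noise score is given below; it assumes a two-dimensional attribution map and uses the Gaussian filter and windowed SSIM implementations from scikit-image (the filter width sigma is an illustrative choice).</p>
      <preformat>
# A minimal sketch of the noise susceptibility score, assuming scikit-image.
import numpy as np
from skimage.filters import gaussian
from skimage.metrics import structural_similarity as ssim

def noise_score(attribution_map, sigma=2.0):
    """Mean windowed SSIM between an attribution map and its low-pass filtered version."""
    a = attribution_map.astype(np.float64)
    a_filtered = gaussian(a, sigma=sigma)            # low-pass (Gaussian) filtering
    data_range = a.max() - a.min() or 1.0
    return ssim(a, a_filtered, data_range=data_range)
      </preformat>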
      <p>The metric Computational Speed is determined by the number of computed samples per second
and is normalized by means of the fastest algorithm. Hence, the speed score ranges between 0 and 1,
where the value 1 identifies the fastest algorithm.</p>
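      <p>A minimal sketch of the speed metric, assuming each method is available as a callable computing attributions for a single sample:</p>
      <preformat>
# A minimal sketch of the computational speed score.
import time

def speed_scores(methods, samples):
    """methods: dict mapping method name to a callable that computes attributions."""
    throughput = {}
    for name, fn in methods.items():
        start = time.perf_counter()
        for x in samples:
            fn(x)
        throughput[name] = len(samples) / (time.perf_counter() - start)
    fastest = max(throughput.values())
    return {name: t / fastest for name, t in throughput.items()}   # fastest method scores 1.0
      </preformat>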
    </sec>
    <sec id="sec-3">
      <title>3. Applying XAI Methods and Metrics</title>
      <p>Visual inspection is an important application in the manufacturing domain. Pictures are taken in
certain production steps in order to detect defective parts. In our use case we need to recognize
coating failures on a cylindrical part. The ring showing the coating is extracted from the raw
image, unrolled, straightened and stacked in lines showing 45∘ each. The model aims to identify
not only good parts (OK), but also to distinguish between four different failure types (BLACK,
DAMAGE, SCRATCH and SILVER).</p>
      <p>The ML model applies a modified ResNet50 architecture (see [21]). The original ResNet50
network is scaled up by a factor of 4 in width and height, respectively. Instead of 224 × 224 × 3
the network uses input images of size 896 × 896 × 3. All convolutional layers are also scaled
up by the same factor. To be able to use the original fully connected layers at the end of the
network without re-scaling, an average pooling layer is introduced.</p>
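      <p>The following sketch illustrates this idea with the torchvision ResNet50 implementation; it is an illustration under these assumptions, not the exact production model. The adaptive average pooling reduces the enlarged feature map so that the original classifier head can be reused and only resized to the five classes.</p>
      <preformat>
# A minimal sketch, assuming PyTorch and torchvision (illustrative, not the production model).
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None)
model.avgpool = nn.AdaptiveAvgPool2d((1, 1))         # pools the enlarged feature map to 1x1
model.fc = nn.Linear(model.fc.in_features, 5)        # OK, BLACK, DAMAGE, SCRATCH, SILVER

x = torch.randn(1, 3, 896, 896)                      # scaled-up input size
logits = model(x)
      </preformat>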
      <p>The samples for the bad classes are chosen in equal numbers. Here, the scores are calculated per class, and
for every considered XAI method an average is built. For the feature sensitivity, the deviation
percentage p is chosen empirically as 0.00001. The result can be seen in the following table.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Method</th><th>Feature sensitivity s</th></tr>
          </thead>
          <tbody>
            <tr><td>Gradient Backpropagation (GrB)</td><td>1.000</td></tr>
            <tr><td>Deconvolutional Network (DeCon)</td><td>-0.050</td></tr>
            <tr><td>Guided Backpropagation (GuB)</td><td>0.000</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The generated sensitivity scores show that DeCon and GuB lose quite a lot of their correctness
when identifying sensitive features. In addition, DeCon shows bad results regarding noise and
speed and thus, it could be excluded for the coating failure use case. Regarding GrB and GuB,
a clear preference could not be derived. GrB is the fastest algorithm, but GuB shows a better
performance with respect to noise. The figure below illustrates our findings. The leftmost
image represents a faulty case (FC) containing three problematic areas. The figure reveals that
DeCon provides bad results for this use case. GrB and GuB, on the contrary, highlight these
areas correctly.</p>
      <p>[Figure: attribution maps for (a) the faulty case (FC), (b) GrB, (c) DeCon, (d) GuB]</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>GrB and GuB are useful for interpreting the coating failure use case. All anomalies are
highlighted. DeCon does not work properly for this use case and should not be used here.</p>
      <p>However, the sensitivity score can only be applied to gradient-based explanation methods.
To include other XAI methods such as LRP [22], DeepLift [23], GradCam [24] or GradCam++
[25], which do not compute sensitivity scores but rather relevance scores, a clearer definition of
correctness is needed with respect to all applicable XAI methods. Simply describing correctness
as the ability to choose important attributions is not sufficient. Possibly, a combination of
existing and new metrics could lead to clearer results. Also, more metrics are needed to account
for problems of different methods. For example, Guided Backpropagation is invariant to model
randomization (see Adebayo et al. [26]).</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Thanks to the developers of ACM consolidated LaTeX styles https://github.com/borisveytsman/
acmart and to the developers of Elsevier updated LaTeX templates https://www.ctan.org/
tex-archive/macros/latex/contrib/els-cas-templates.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>M. van Lent</surname>
            , W. Fisher,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Mancuso</surname>
          </string-name>
          ,
          <article-title>An explainable artificial intelligence system for smallunit tactical behavior</article-title>
          ,
          <source>in: Proceedings of the 16th Conference on Innovative Applications of Artifical Intelligence</source>
          , IAAI'04, AAAI Press,
          <year>2004</year>
          , p.
          <fpage>900</fpage>
          -
          <lpage>907</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saranti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Biecek</surname>
          </string-name>
          , W. Samek,
          <string-name>
            <surname>Explainable AI Methods - A Brief Overview</surname>
          </string-name>
          , Springer International Publishing,
          <year>2022</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Linardatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Papastefanopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          ,
          <article-title>Explainable ai: A review of machine learning interpretability methods</article-title>
          ,
          <source>Entropy</source>
          <volume>23</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Iwana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kuroki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchida</surname>
          </string-name>
          ,
          <article-title>Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation</article-title>
          , in: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW),
          <year>2019</year>
          , pp.
          <fpage>4176</fpage>
          -
          <lpage>4185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tjoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <article-title>A survey on explainable artificial intelligence (XAI): Toward medical XAI</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>32</volume>
          (
          <year>2021</year>
          )
          <fpage>4793</fpage>
          -
          <lpage>4813</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>C. Molnar,</surname>
          </string-name>
          <article-title>Interpretable Machine Learning - A Guide for Making Black Box AI Explainable</article-title>
          , Independently published,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Puri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M. F.</given-names>
            <surname>Moura</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. Eckersley,</surname>
          </string-name>
          <article-title>Explainable machine learning in deployment</article-title>
          ,
          <source>in: In Proceedings of the 2020 Conference onFairness, Accountability, and Transparency</source>
          ,
          <year>2020</year>
          . arXiv:
          <year>1909</year>
          .06342.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Carrillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Cantú</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Noriega</surname>
          </string-name>
          ,
          <article-title>Individual explanations in machine learning models: A survey for practitioners</article-title>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2104</volume>
          .
          <fpage>04144</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Deep inside convolutional networks: Visualising image classification models and saliency maps</article-title>
          , in: Y. Bengio, Y. LeCun (Eds.),
          <source>2nd International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2014</year>
          . arXiv:
          <volume>1312</volume>
          .
          <fpage>6034</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>M. D. Zeiler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Fergus</surname>
          </string-name>
          ,
          <article-title>Visualizing and understanding convolutional networks</article-title>
          ,
          <year>2013</year>
          . arXiv:
          <volume>1311</volume>
          .
          <fpage>2901</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <article-title>Striving for simplicity: The all convolutional net</article-title>
          ,
          <year>2015</year>
          . arXiv:
          <volume>1412</volume>
          .
          <fpage>6806</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Mueller</surname>
          </string-name>
          , G. Klein,
          <string-name>
            <given-names>J.</given-names>
            <surname>Litman</surname>
          </string-name>
          ,
          <article-title>Metrics for explainable ai: Challenges and prospects</article-title>
          , ArXiv abs/
          <year>1812</year>
          .04608 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Kakogeorgiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Karantzalos</surname>
          </string-name>
          ,
          <article-title>Evaluating explainable artificial intelligence methods for multi-label deep learning classification tasks in remote sensing</article-title>
          ,
          <source>International Journal of Applied Earth Observation and Geoinformation</source>
          <volume>103</volume>
          (
          <year>2021</year>
          )
          <fpage>102520</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Santhanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alami-Idrissi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schumann</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Giurgiu</surname>
          </string-name>
          ,
          <source>On evaluating explainability algorithms</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nauta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trienes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          , E. Nguyen,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schlötterer</surname>
          </string-name>
          , M. van
          <string-name>
            <surname>Keulen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Seifert</surname>
          </string-name>
          ,
          <article-title>From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable ai</article-title>
          ,
          <source>ACM Comput. Surv</source>
          . (
          <year>2023</year>
          ). Just Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weixler</surname>
          </string-name>
          ,
          <article-title>Validation of Machine Learning Models with Algorithms from the Area of Explainable AI for Classification and Regression Tasks</article-title>
          ,
          <source>Master's thesis</source>
          , University Stuttgart,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Axiomatic attribution for deep networks</article-title>
          ,
          <year>2017</year>
          . arXiv:
          <volume>1703</volume>
          .
          <fpage>01365</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Suggala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. I.</given-names>
            <surname>Inouye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ravikumar</surname>
          </string-name>
          , How sensitive are sensitivity-based explanations?, CoRR abs/1901.09392 (2019). URL: http://arxiv.org/abs/1901.09392. arXiv:1901.09392.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] D. Balduzzi, M. Frean, L. Leary, J. Lewis, K. W.-D. Ma, B. McWilliams, The shattered gradients problem: If resnets are the answer, then what is the question?, 2018. arXiv:1702.08591.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13 (2004) 600–612.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, 2015. arXiv:1502.01852.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLOS ONE 10 (2015) 1–46. URL: https://doi.org/10.1371/journal.pone.0130140. doi:10.1371/journal.pone.0130140.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating activation differences, 2019. arXiv:1704.02685.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, International Journal of Computer Vision 128 (2019) 336–359.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] A. Chattopadhay, A. Sarkar, P. Howlader, V. N. Balasubramanian, Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 839–847.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, B. Kim, Sanity checks for saliency maps, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>