<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study on the Faithfulness of Feature Attribution Explanations in Pruned Vision-Based Multi-Task Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xenia Demetriou</string-name>
          <email>x.demetriou3@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Sananikone</string-name>
          <email>sophie.sananikone12@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vojislav Tobias Westmoreland</string-name>
          <email>vojo.westmoreland@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthia Sabatelli</string-name>
          <email>m.sabatelli@rug.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Zullich</string-name>
          <email>marco.zullich@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of AI, Faculty of Science and Engineering, University of Groningen</institution>
          ,
          <addr-line>Nijenborg 9, 9747 AG, Groningen</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>As Artificial Neural Networks (ANNs) continue growing in size to enhance predictive accuracy, model compression techniques, such as pruning, counteract the increase in energy and computational costs. Multi-Task Learning (MTL) consists in training a single model on multiple tasks, providing regularization and increased generalization, especially in applications like robotics and autonomous driving. While compressed models often match the accuracy of their larger counterparts, they are usually evaluated on metrics related to task completion or efficiency, overlooking critical aspects such as fairness and transparency, key requirements for high-stakes applications. Specifically, Explainable AI offers tools for making model predictions more transparent, for instance, by means of feature importance. However, especially when AI models are complex, these tools can lead to unfaithful or unreliable explanations, potentially undermining trust in these models. In the present work, we investigate whether pruning applied to vision-based MTL significantly affects the faithfulness of the explanations generated for the tasks the models are trained on. We train and prune different models on the benchmark datasets NYUv2 and CityScapes. Despite the hurdle of generating feature importance for tasks such as surface normal prediction and depth estimation, our results show that unstructured pruning maintains faithfulness across different sparsity percentages. Structured pruning with milder sparsity percentages also preserves faithfulness, but faithfulness can decrease more rapidly at higher sparsity percentages. Overall, sparsity up to a certain threshold (90%) does not compromise explanation faithfulness. Beyond this point, both faithfulness and performance decline significantly, making such models unfit for deployment and rendering the faithfulness risk negligible.</p>
      </abstract>
      <kwd-group>
<kwd>Faithfulness</kwd>
        <kwd>Explainability</kwd>
        <kwd>Multi-Task Learning</kwd>
        <kwd>Model Compression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The deployment of artificial vision systems through Artificial Intelligence (AI) has transformed various
domains such as autonomous driving and medical imaging, resulting in remarkable performance
increases over the past decade. In Computer Vision (CV) tasks, Convolutional Neural Networks (CNNs)
have emerged as one of the cornerstones of this technology. However, they are characterized by high
computational costs and a lack of interpretability due to their complexity. Their computational footprint
can however be tackled by Model Compression (MC) techniques. One of the main techniques for MC
is pruning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which aims at removing (groups of) synapses in an Artificial Neural Network (ANN)
according to specific criteria. Another viable technique consists in training models in a Multi-Task
Learning (MTL) setting [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where one ANN can generate predictions for multiple tasks given the same
input, instead of having task-specific ANNs, each responsible for generating one prediction per task.
MTL is useful in areas such as robotics and autonomous driving, where a machine learning model
may be tasked to perform multiple tasks at the same time (e.g., Depth Estimation—DE—and semantic
segmentation) in a resource-constrained environment. It can additionally be considered a MC tool,
since MTL models are typically less computationally demanding than using separate models for each
individual task [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Explainable AI (XAI) is crucial for understanding complex model predictions, particularly in
applications requiring transparency [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Feature attribution, a key XAI tool, assigns “saliency scores” to
input features, highlighting their importance. The main model-agnostic methods are LIME [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which
relies on surrogate models, and SHAP [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which has solid game theoretical foundations. Additionally,
ANN-specific tools, such as Layerwise Relevance Propagation (LRP) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which tracks down components
of an ANN responsible for a given activation, have been developed. In ANNs, feature attribution can be
performed by backpropagating gradients of the output with respect to the input features, as in Guided
Backpropagation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, these gradient-based methods have often been found to be unreliable,
acting similarly to edge detectors [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Grad-CAM [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a feature attribution tool specific for CNNs,
combines gradients and activations, being more class-discriminative and faithful to the underlying
model than other gradient-based tools. Compared to LIME, SHAP, and LRP, gradient-based methods
are often much more efficient [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which is a reason why Grad-CAM is often the de facto choice for
feature attributions in CNNs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The present work utilizes and expands Grad-CAM to explain DE
and Surface Normal (SN) prediction tasks. One of the paramount issues is that all these methods are
approximations of the complex decision-making process of ANNs, and can hence produce unfaithful
outputs, which highlight irrelevant features [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
The effect of pruning on the quality of explanations has been researched previously:
Abbasi-Asl and Yu [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] studied the features learned by CNNs on image classification, noticing that
redundancy in CNN filters coalesced when applying filter pruning. Weber et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] provided a
human-grounded evaluation of saliency maps generated on CNNs for image classification. Their findings
suggest that explanations produced by moderately pruned models are evaluated as superior by humans
with respect to their dense counterparts. Finally, Khakzar et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] introduced a custom pruning
method, termed PruneGrad, shown to quantitatively produce more faithful explanations; however, this
is an input-specific method, which reduces its real-world practicality and scalability.
      </p>
      <p>
In the present work, we aim at answering the question: “How is the faithfulness of multi-task
model explanations affected by pruning?”. We train three different CNNs for MTL on the NYUv2
and CityScapes datasets. We then apply different pruning methods to obtain several pruned
models at different sparsity levels. Subsequently, we generate task-level saliency maps using variations
of the popular image-based XAI tool Grad-CAM [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Finally, we proceed to evaluate the faithfulness of
these maps with respect to the models using the Iterative Removal of Features (IROF) technique [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
Our findings highlight how sparsity up to a certain threshold (dependent upon the specific pruning
technique) does not compromise the faithfulness of the explanations. The main consequence is that, as far
as explainability is concerned, well-performing, moderately sparse models will likely yield similarly
faithful explanations.
      </p>
<p>Our contributions are twofold: (i) we carry out an analysis of the effect of pruning on the faithfulness
of models trained for vision-based MTL, and (ii) we provide an adaptation of Grad-CAM and its
assessment for monocular DE and SN, tasks which Grad-CAM was not originally designed for.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>In this section, we provide the materials and methods used in the present work, alongside a summary
of the experimental setup.</p>
      <sec id="sec-2-1">
        <title>2.1. Datasets</title>
        <p>
We employed two datasets to train and evaluate our models: the NYU-Depth V2 (NYUv2) dataset [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ],
and the Cityscapes dataset [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], both commonly used benchmarks for MTL and scene understanding in
indoor and urban settings. NYUv2 contains annotations for Semantic Segmentation, SN estimation, and
DE, while Cityscapes only provides labels for Semantic Segmentation and DE. We provide an example
of one image and the corresponding annotations for NYUv2 in Figure 1: (a) the original image; (b) Semantic
Segmentation, where each colour represents a different class; (c) SN, where RGB values represent the angle
components; and (d) Depth, where colours represent different depths. Semantic Segmentation is a
pixel-level classification task. We assess its quality according to the mean Intersection-over-Union
(mIoU) metric. The SN task is a pixel-level three-dimensional regression task, where surface angles
are described by their 𝑥, 𝑦, and 𝑧 coordinates in 3D space. We evaluate it using the median angle
error over all pixels in an image. DE is a pixel-level regression task where the distance of each pixel
relative to the camera is measured. We assess it using the relative error between the ground truth and
the predicted depth.</p>
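<p>The three metrics can be sketched as follows; this is a minimal NumPy sketch, and the helper names are ours rather than taken from the benchmark toolkits:</p>

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union for pixel-level class labels."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

def median_angle_error(pred, target):
    """Median angle (degrees) between predicted and true normals, shape (H, W, 3)."""
    p = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    t = target / np.linalg.norm(target, axis=-1, keepdims=True)
    cos = np.clip((p * t).sum(-1), -1.0, 1.0)
    return float(np.median(np.degrees(np.arccos(cos))))

def relative_depth_error(pred, target):
    """Mean relative error between predicted and ground-truth depth maps."""
    return float(np.mean(np.abs(pred - target) / target))
```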
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model Architectures</title>
        <p>
We made use of two CNN-based model architectures specifically crafted for MTL: DeepLab [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and
SegNetMTAN [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
]. Both architectures can be used in combination with the FAMO weighting method
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] to correctly balance the contribution of the various tasks.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Pruning</title>
        <p>
          Pruning compresses ANNs by removing synapses based on saliency. Unstructured pruning removes
individual connections, while structured pruning removes groups (neurons, filters, etc.). Structured
pruning offers hardware-agnostic speed-ups via tensor dimension reduction, whereas unstructured
pruning requires specialized hardware. Pruning can be dynamic (allowing synapse regrowth) or static
(no regrowth). This work employs DiSparse [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] (unstructured, gradient-based MTL), Network Slimming
(NetSlim) [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] (structured, channel-level L1-norm), and HRank [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] (structured, filter-rank-based).
DiSparse avoids task-conflicting pruning, NetSlim uses L1-norm for channel importance, while HRank
prunes low-rank filters.
        </p>
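<p>To illustrate the two pruning families, the following NumPy sketch implements plain magnitude-based criteria; these are simpler stand-ins for DiSparse's gradient-based saliency and NetSlim's BN-scale criterion, and the function names are ours:</p>

```python
import numpy as np

def global_magnitude_prune(weights, sparsity):
    """Unstructured, static pruning: zero out the `sparsity` fraction of
    weights with the smallest magnitude, pooled globally across all layers."""
    flat = np.concatenate([w.ravel() for w in weights])
    k = int(sparsity * flat.size)
    if k == 0:
        return [w.copy() for w in weights]
    threshold = np.partition(np.abs(flat), k - 1)[k - 1]  # k-th smallest magnitude
    return [np.where(np.abs(w) > threshold, w, 0.0) for w in weights]

def l1_channel_prune(w, sparsity):
    """Structured pruning: drop whole output channels of a conv weight
    (out, in, kH, kW) with the smallest L1 norms; NetSlim ranks channels by
    BN scale factors instead, so the L1-over-weights criterion is a stand-in."""
    norms = np.abs(w).sum(axis=(1, 2, 3))          # one L1 norm per channel
    k = int(sparsity * w.shape[0])                 # number of channels to drop
    keep = np.sort(np.argsort(norms)[k:])          # indices of surviving channels
    return w[keep], keep
```

Structured pruning shrinks the tensor itself (hence the hardware-agnostic speed-up), while the unstructured variant only zeroes entries and leaves the shape unchanged.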
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Explanations</title>
        <p>
For an RGB image 𝑋, we generate a saliency map 𝑆 indicating pixel importance. Grad-CAM, using
CNN layer activations and output gradients, produces a matrix 𝑆′, normalized and rescaled to the
dimensions of 𝑋, so each pixel is assigned a saliency from 0 to 1. Following Selvaraju et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], we apply
it to the last convolutional layer. For semantic segmentation, we use SegGrad-CAM [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], explaining
per-class pixel groups. An example of the explanations generated using SegGrad-CAM is present in
Figure 2. In addition, we operate the following methodology to adapt Grad-CAM to DE and SN. We
group predictions according to their values into specific categories: for DE, we divide the [0, 1] range
of depth prediction into quintiles and compute the gradients according to each of these groups; in
this way, we aim at explaining the feature importance in determining a given quintile of the depth
predictions. Similarly, for SNs, we group the predictions in the eight octants of the 3d plane and generate
explanations accordingly. Despite the problems of DE and SN estimation being pixel-level regression
tasks, the proposed methodology restructures the output into a classification problem, thus allowing us
to use Grad-CAM.
        </p>
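<p>The grouping step of this adaptation can be sketched as follows in NumPy; equal-width bins over the [0, 1] depth range are our reading of the quintile split, and the gradient computation is omitted (in practice each boolean mask selects the outputs whose gradients Grad-CAM backpropagates):</p>

```python
import numpy as np

def quintile_masks(depth, num_groups=5):
    """Split a normalized depth map (values in [0, 1]) into equal-width groups;
    each boolean mask marks the pixels explained as one group."""
    edges = np.linspace(0.0, 1.0, num_groups + 1)
    group = np.clip(np.digitize(depth, edges[1:-1]), 0, num_groups - 1)
    return [group == g for g in range(num_groups)]

def octant_masks(normals):
    """Group SN predictions (H, W, 3) into the eight octants of 3D space
    according to the signs of their x, y, z components."""
    signs = (normals >= 0).astype(int)                  # 0/1 per axis
    octant = signs[..., 0] * 4 + signs[..., 1] * 2 + signs[..., 2]
    return [octant == o for o in range(8)]
```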
<p>Figure 2 reports SegGrad-CAM explanations for the classes (a) Wall, (b) Floor, (c) Table, and (d) Ceiling;
the saliency, representing feature importance, is shown on the scale at the bottom.</p>
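<p>The Grad-CAM computation underlying all these variants can be sketched as follows (NumPy; upsampling the map to the input resolution is omitted):</p>

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM core: activations and gradients of the chosen conv layer,
    both shaped (K, H, W). Channel weights are global-average-pooled
    gradients; the map is the ReLU of the weighted activation sum,
    normalized to [0, 1]."""
    alpha = gradients.mean(axis=(1, 2))               # one weight per channel
    cam = np.maximum((alpha[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                         # rescale to [0, 1]
    return cam
```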
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Assessing XAI</title>
        <p>
          As highlighted by Nauta et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], faithfulness is paramount in XAI evaluation. While other explanation
properties exist, unfaithful explanations highlight irrelevant features, potentially causing loss of trust
in AI systems by stakeholders. The assessment of faithfulness boils down to determining whether the
features marked as salient are effectively important for the model. One way of evaluating this is by
progressively masking out features according to their importance and measuring the change in the
model prediction. The expectation is that highly important features should cause a large change in
the prediction. This is the rationale behind IROF: after the process of iterative masking, the change in
prediction is plotted against the proportion of features removed, and faithfulness is computed as the
area-over-the-curve (AOC). A detailed visualization of the IROF procedure is shown in Figure 3.
        </p>
        <p>
Formally, every input 𝑥 is segmented into 𝐿 superpixels using SLIC [
          <xref ref-type="bibr" rid="ref29">29</xref>
]. Each segment is given a
saliency score according to the mean importance provided by Grad-CAM. Next, we compute the output
of the model 𝐹 on the perturbed input 𝑥′𝑙, where 𝑙 is the number of segments removed, and divide
it by the output on the unperturbed input 𝑥′0. By repeating this procedure over all segments
𝑙 ∈ {1, . . . , 𝐿} in order of saliency, the aforementioned curve can be plotted and the AOC computed as a
faithfulness score. This metric is then averaged over a whole set of 𝑁 datapoints to produce the final
estimate:
<p>IROF(𝑒, 𝐹) = (1/𝑁) ∑𝑛=1,...,𝑁 AOC( 𝐹(𝑥′𝑛,𝑙) / 𝐹(𝑥′𝑛,0), 𝑙 = 0, . . . , 𝐿 ).</p>
        <p>
          In the present study, we used the IROF implementation from the Quantus library by Hedström et al.
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Since IROF requires scalar outputs to be compared, for our three tasks, we adapted the procedure
as follows: (a) for Semantic Segmentation, we considered the mean of the predicted logits for each
specific class of interest; (b) for DE, we considered the relative error between unperturbed and perturbed
outputs, while (c) for SN, we considered the cosine similarity between the unperturbed and perturbed
outputs.
        </p>
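<p>Schematically, the per-input IROF computation can be sketched as follows in NumPy; the caller supplies the SLIC segmentation and a scalar-output model 𝐹 (in our experiments, the Quantus implementation is used instead), and zero-imputation of removed segments is a simplifying assumption:</p>

```python
import numpy as np

def irof_single(x, saliency, segments, F, fill_value=0.0):
    """IROF for one input: remove segments from most to least salient,
    track F(x_perturbed)/F(x), and return the area over the curve."""
    seg_ids = np.unique(segments)
    # rank segments by their mean saliency, most salient first
    order = sorted(seg_ids, key=lambda s: -saliency[segments == s].mean())
    base = F(x)
    x_pert = x.copy()
    ratios = [1.0]                        # l = 0: unperturbed input
    for s in order:
        x_pert[segments == s] = fill_value
        ratios.append(F(x_pert) / base)
    ratios = np.asarray(ratios)
    # area over the curve: faster output decay means a more faithful map
    return float(np.mean(1.0 - ratios))
```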
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Experimental Setup</title>
        <p>
The present study investigates the effect of two independent variables (sparsity and pruning method)
on two dependent variables (model performance and IROF scores). We tested this on three model
configurations to improve the robustness of the conclusions. Different models addressed different
combinations of tasks, and we applied a different pruning method to each model architecture. Table 1 shows an
overview of the model architectures used, with their respective pruning methods, sparsity
percentages, tasks, and datasets. The specific optimizers, training hyperparameters, and data augmentation
strategy are instead presented in Appendix A. We point out that the aim of our experiments is to train models of quality comparable to the original implementations
[
          <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
], not to beat the state-of-the-art on the specific tasks. The goal of the present paper is indeed
to assess the faithfulness of the explanations.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
<p>Results, as shown in Figure 4 following the setup in Table 1, reveal diverse trends across tasks and
pruning methods. For DE, the Relative Error rises exponentially with sparsity for both model/dataset
combinations. IROF scores remain stable until 75% sparsity; HRank maintains better IROF at higher
sparsity compared to NetSlim. For SN, DiSparse (DeepLab) achieves significantly higher performance
and IROF than HRank (SegNetMTAN), even with substantial pruning, owing to DiSparse’s unstructured
nature allowing for aggressive pruning. IROF scores show a similar difference between DiSparse and
HRank, with sparsity minimally affecting the score until 90%. In Semantic Segmentation, NetSlim
(SegNetMTAN) achieves stable, high performance (around 70% mIoU), with a drop at 90% sparsity.
HRank (SegNetMTAN) and DiSparse (DeepLab) perform worse (35-45% mIoU). IROF scores remain
largely stable across pruning rates for all methods, with declines at high sparsity (∼ 90%) for structured
pruning.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusions</title>
<p>In the present work, we investigated the ties between pruning and the faithfulness of explanations in
the case of MTL. We trained three different model architectures on the datasets CityScapes and NYUv2.
We then pruned these models using iterative structured and unstructured pruning techniques, reaching
sparsity levels above 90%. We then proceeded to generate per-task input attribution explanations
using variations of Grad-CAM. Finally, we assessed the faithfulness of these explanations, faithfulness
being the degree to which the explanations capture the complex predictive process of the underlying
black-box model. We evaluated the faithfulness using the popular Iterative Removal of Features
(IROF) metric, which iteratively deletes progressively more important (as measured by the
explanation) information from the input, expecting a variation in the model’s output in the process.
The results indicate that the DiSparse unstructured pruning technique does not seem to particularly affect
the faithfulness of the explanations; the structured pruning techniques, however, generally show a
decrease in faithfulness as the pruning rate increases, a trend which seems loosely connected to the
drop in performance that comes with excessive pruning, measured across the various tasks.</p>
      <p>Our findings highlight several positive impacts. Firstly, it encourages the implementation of DNNs
with unstructured pruning in embedded systems and resource-constrained environments without
compromising explanation faithfulness. In turn, this also helps foster user trust and regulatory compliance
in such systems. We suggest positive findings in terms of AI Democratization through the decreased
computational demands that come as a result of compression, making these models more accessible to
a broader audience. For structured pruning, the results show that sparsity up to a certain threshold
maintains explanation faithfulness. Beyond this point, both faithfulness and performance decline
significantly, making such highly pruned models unlikely to be deployed. Thus, the overall risk from
reduced faithfulness is minimal.</p>
      <p>However, our work comes with several limitations: first of all, it is strictly tied to vision-based MTL.
In addition, we assess faithfulness only according to IROF, while other methodologies exist for the
evaluation. Moreover, our analysis lacks considerations on other facets of evaluation of explanations,
such as robustness and coherence, which still should heavily be subject to considerations on faithfulness.
Our analysis is also strictly tied to functional assessment, while several XAI studies operate
humangrounded evaluation as a step for validating the explanations. These limitations also guide our next
steps in extending the project: (i) extend the study to non-vision tasks, (ii) include other metrics for
faithfulness and other facets of explanations, (iii) and carry out a human-grounded evaluation step.
Considering the paramount importance of having faithful feature importance explanations, we still
believe our findings to be of use for practitioners willing to implement MC on MTL.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgements</title>
      <p>We thank the Center for Information Technology of the University of Groningen for their support and
for providing access to the Hábrók high performance computing cluster.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
<p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Training hyperparameters</title>
      <p>
        We report here the optimizer and hyperparameters we used for training our models. The settings are as
similar as possible to the original implementations found in the source references [
        <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
        ].
      </p>
<p>All three model/pruning combinations (DeepLab+DiSparse, SegNetMTAN+NetSlim, and SegNetMTAN+HRank)
were trained with the Adam optimizer, using a learning rate of 0.0001 annealed by a factor of 0.5 during
training. No data augmentation was applied to DeepLab+DiSparse, while the SegNetMTAN models were
trained with random scaling (factors 1.0, 1.2, 1.5×) and random horizontal flips.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hoefler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alistarh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ben-Nun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dryden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peste</surname>
          </string-name>
          ,
<article-title>Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>22</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <article-title>Multitask learning</article-title>
          ,
          <source>Machine learning 28</source>
          (
          <year>1997</year>
          )
          <fpage>41</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Disparse: Disentangled sparsification for multitask model compression</article-title>
          ,
          <source>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>2022</year>
          )
          <fpage>12372</fpage>
          -
          <lpage>12382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A survey on multi-task learning</article-title>
          ,
          <source>IEEE transactions on knowledge and data engineering 34</source>
          (
          <year>2021</year>
          )
          <fpage>5586</fpage>
          -
          <lpage>5609</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
<string-name>
<given-names>M. E.</given-names>
<surname>Kaminski</surname>
</string-name>
,
          <article-title>The right to explanation, explained, in: Research handbook on information law and governance</article-title>
          ,
          <source>Edward Elgar Publishing</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , “
          <article-title>Why Should I Trust You?”: Explaining the Predictions of Any Classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
<article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Binder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Montavon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Klauschen</surname>
          </string-name>
          ,
<string-name>
<given-names>K.-R.</given-names>
<surname>Müller</surname>
</string-name>
,
<string-name>
<given-names>W.</given-names>
<surname>Samek</surname>
</string-name>
,
          <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</article-title>
          ,
          <source>PloS one 10</source>
          (
          <year>2015</year>
          )
          <article-title>e0130140</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <article-title>Striving for simplicity: The all convolutional net</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6806</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>A theoretical explanation for perplexing behaviors of backpropagation-based visualizations</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Dy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krause</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 35th International Conference on Machine Learning</source>
          , volume
          <volume>80</volume>
          of
          <source>Proceedings of Machine Learning Research</source>
          , PMLR,
          <year>2018</year>
          , pp.
          <fpage>3809</fpage>
          -
          <lpage>3818</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedantam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <article-title>Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          <volume>128</volume>
          (
          <year>2020</year>
          )
          <fpage>336</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miró-Nicolau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>i Capó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moyà-Alcover</surname>
          </string-name>
          ,
          <article-title>A comprehensive study on fidelity metrics for XAI</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>62</volume>
          (
          <year>2025</year>
          )
          <fpage>103900</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Arrighi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>de Moraes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zullich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simonato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Barbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Junior</surname>
          </string-name>
          ,
          <article-title>Explainable Artificial Intelligence Techniques for Interpretation of Food Datasets: a Review</article-title>
          ,
          <source>arXiv preprint arXiv:2504.10527</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Arrighi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barbon Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Pellegrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simonato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zullich</surname>
          </string-name>
          ,
          <article-title>Explainable Automated Anomaly Recognition in Failure Analysis: is Deep Learning Doing it Correctly?</article-title>
          ,
          in:
          <source>World Conference on Explainable Artificial Intelligence</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>420</fpage>
          -
          <lpage>432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Abbasi-Asl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Structural compression of convolutional neural networks with applications in interpretability</article-title>
          ,
          <source>Frontiers in Big Data</source>
          <volume>4</volume>
          (
          <year>2021</year>
          )
          <fpage>704182</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Merkle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schöttle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schlögl</surname>
          </string-name>
          ,
          <article-title>Less is more: The influence of pruning on the explainability of CNNs</article-title>
          ,
          <source>arXiv preprint arXiv:2302.08878</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khakzar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baselizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanduja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rupprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Navab</surname>
          </string-name>
          ,
          <article-title>Improving feature attribution through input-specific network pruning</article-title>
          ,
          <source>arXiv preprint arXiv:1911.11081</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <article-title>IROF: a low resource evaluation metric for explanation methods</article-title>
          ,
          in:
          <source>Proceedings of the Workshop AI for Affordable Healthcare at ICLR 2020</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Arbelaez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <article-title>Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images</article-title>
          , in:
          <source>2013 IEEE Conference on Computer Vision and Pattern Recognition</source>
          , IEEE, Portland, OR, USA,
          <year>2013</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>571</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cordts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Omran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rehfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Enzweiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Franke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schiele</surname>
          </string-name>
          ,
          <article-title>The Cityscapes dataset for semantic urban scene understanding</article-title>
          ,
          in:
          <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>3213</fpage>
          -
          <lpage>3223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Papandreou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kokkinos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Yuille</surname>
          </string-name>
          ,
          <article-title>DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>40</volume>
          (
          <year>2017</year>
          )
          <fpage>834</fpage>
          -
          <lpage>848</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Badrinarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kendall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cipolla</surname>
          </string-name>
          ,
          <article-title>SegNet: A deep convolutional encoder-decoder architecture for image segmentation</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>39</volume>
          (
          <year>2017</year>
          )
          <fpage>2481</fpage>
          -
          <lpage>2495</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Johns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <article-title>End-to-end multi-task learning with attention</article-title>
          ,
          in:
          <source>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1871</fpage>
          -
          <lpage>1880</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>FAMO: Fast adaptive multitask optimization</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Learning efficient convolutional networks through network slimming</article-title>
          ,
          in:
          <source>Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2736</fpage>
          -
          <lpage>2744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <article-title>HRank: Filter pruning using high-rank feature map</article-title>
          ,
          in:
          <source>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1529</fpage>
          -
          <lpage>1538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Vinogradova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dibrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <article-title>Towards Interpretable Semantic Segmentation via Gradient-weighted Class Activation Mapping</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>13943</fpage>
          -
          <lpage>13944</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nauta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trienes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schlötterer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Keulen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Seifert</surname>
          </string-name>
          ,
          <article-title>From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Achanta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lucchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Süsstrunk</surname>
          </string-name>
          ,
          <article-title>SLIC Superpixels Compared to State-of-the-Art Superpixel Methods</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>34</volume>
          (
          <year>2012</year>
          )
          <fpage>2274</fpage>
          -
          <lpage>2282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hedström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krakowczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bareeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Motzkus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Samek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lapuschkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.-C.</given-names>
            <surname>Höhne</surname>
          </string-name>
          ,
          <article-title>Quantus: An explainable AI toolkit for responsible evaluation of neural network explanations and beyond</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>