<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IUI Workshops’19, March 20, 2019, Los Angeles, USA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Interactive Naming for Explaining Deep Neural Networks: A Formative Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mandana Hamidi-Haines</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongang Qi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Fern</string-name>
          <email>Alan.Fern@oregonstate.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fuxin Li</string-name>
          <email>Fuxin.Li@oregonstate.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasad Tadepalli</string-name>
          <email>Prasad.Tadepalli@oregonstate.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Electrical Engineering and Computer Science, Oregon State University</institution>
          ,
          <addr-line>Corvallis, Oregon</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>We consider the problem of explaining the decisions of deep neural networks for image recognition in terms of human-recognizable visual concepts. In particular, given a test set of images, we aim to explain each classification in terms of a small number of image regions, or activation maps, which have been associated with semantic concepts by a human annotator. The main contribution of this paper is a systematic study of the visual concepts produced by five human annotators using an interactive naming interface in terms of the adequacy of the concepts for explaining the test images and the inter-annotator agreement of visual concepts. Our work is an exploratory study of the interplay between machine learning and human recognition mediated by visualizations of the results of learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Human-centered computing → Human-computer
interaction (HCI); • Computing methodologies → Neural networks.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Deep neural networks (DNNs) are powerful learning models that
achieve excellent performance on many problems ranging from
object recognition to machine translation. However, the potential
utility of DNNs is limited by the lack of human interpretability
of their decisions, which can lead to a lack of trust. The goal of
this paper is to study an approach, called interactive naming, for
improving our understanding of the decision-making process of
DNNs. In particular, this approach allows a human annotator to
visualize and organize activation maps of critical neurons into
meaningful visual concepts, which can then be used to explain
decisions made over the test data.</p>
      <p>
        Interpreting the role of neurons in the decisions of DNNs has
been a long-standing problem in artificial intelligence [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Much
recent work on interpretability is based on the following methods:
1) heatmap-based methods, which focus on visualizing activation
maps that highlight parts of the input that are most important to
the final decision of the DNN or the output of an individual
neuron [
        <xref ref-type="bibr" rid="ref1 ref11 ref12 ref12 ref13 ref14 ref15 ref17 ref18">1, 11, 12, 12–15, 17, 18</xref>
        ]. 2) perturbation-based methods, which
perturb parts of the input to see which ones are most important to
preserve the final decision [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. 3) concept-based methods, which
analyze the alignment between individual hidden neurons and a
set of semantic concepts [
        <xref ref-type="bibr" rid="ref19 ref2 ref7">2, 7, 19</xref>
        ]. While these methods provide additional
insight into the semantics of neurons, they require large sets of
data labeled with the semantic concepts and are limited to the semantic
concepts in that data. Importantly, none of the current approaches
support human interaction in recognizing, clustering, and naming
the concepts implicitly employed by the neural network in making
its decisions. While some methods do employ human-recognizable
concepts, they are learned by the system offline from a large amount
of labeled data that may or may not be relevant to the task at hand.
      </p>
      <p>In this work, we make progress toward this goal by building an
interface for interactive naming, and conducting a formative study
on a set of non-trivial image classification tasks. In particular, our
approach is based on the idea that the final decision of a DNN is
dominated by the most highly-weighted neuron activations (the
significant activations) in the penultimate network layer. Explanations
of the decisions can thus be formed by 1) identifying the significant
activations for each decision, and 2) attaching meaningful concepts
to the significant activations. Since DNNs typically have thousands
of units in the penultimate layer, (1) can result in an overwhelming
number of activations. To address this issue we draw on recent
work that augments the original DNN with a learned explanation
Neural Network (xNN ), which mimics the predictions of the DNN
using a much smaller penultimate layer of X-features. Since the
xNN is effectively equivalent to the original DNN, we can use it to
make predictions on test instances with no loss in accuracy, but
with a dramatic reduction in the number of significant activations
to be considered for explanations.</p>
      <p>
        To deal with (2), our interface displays the (significant) activation
maps of X-features for decisions made on a test set and allows
an annotator to cluster the activations into meaningful groups
called “visual concepts.” Even though there are a small number of
significant activations that sufficiently explain the final decisions,
there may not be a one-to-one correspondence between them and
human-recognizable visual concepts. Indeed, unlike in the standard
supervised learning setting, where the number of classes/concepts
is typically fixed beforehand, the number of visual concepts covered
by the set of all significant activations is unknown. To make matters
more interesting, the set of visual concepts might be different for
different annotators. Finally, the annotators may not be able to label
a map in isolation, and might need to see multiple images and find
similarities and differences before labeling them. Indeed, this last
problem has been studied under the name of “structured labeling,”
in the context of active learning and provides an inspiration for
our work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Drawing from the lessons of previous work, our
interface provides maximum flexibility to the human annotators
by presenting them with the activation maps of all X-features of all
test images that belong to each category. Unlike previous work
on supervised and active learning, which seeks labels from a fixed
label set, our annotators are asked to cluster the maps in a way that
makes the most sense to them and to give the clusters meaningful names.
      </p>
      <p>The result of interactive naming is a set of explanations of test set
predictions in terms of visual concepts. This enables summarizing
the types of predictions that are made to gain confidence in the
predictor and/or identify potential flaws in the predictor.
Importantly, this type of summary is dependent on the human annotator,
which raises interesting questions about differences in explanations
that might result from different annotators. Specifically, we seek
answers to the following research questions (RQs) through our
study:</p>
      <sec id="sec-2-1">
        <title>RQ1 (Coverage of Interactive Naming)</title>
        <p>What fraction of the examples are explainable using human-recognizable visual
concepts? If a significant fraction of the examples are not explainable
via visual concepts, it might mean that the X-features are not
properly aligned with human concepts and will have to be retrained
from human data.</p>
      </sec>
      <sec id="sec-2-2">
        <title>RQ2 (Inter-annotator Agreement)</title>
        <p>How much overlap exists between the annotated sets of activations of different
subjects? How much do the clusters of different subjects overlap?
The existence of significant overlap might suggest that we can move
toward building a standardized ontology of visual concepts for
explanations. A lack of significant agreement might mean that we will
have to personalize explanations to different annotators.</p>
        <p>
          We explore the above questions through empirical experiments
and annotator studies based on data from 5 annotators on a bird
species classification dataset [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The studies reveal that a
significant fraction of the images can be explained via human-recognizable
visual concepts, with some individual differences among annotators.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>INTERACTIVE NAMING FOR TEST SET</title>
    </sec>
    <sec id="sec-4">
      <title>EXPLANATIONS</title>
      <p>We first give an overview of the overall approach and then describe
each component of the system.
2.1</p>
    </sec>
    <sec id="sec-5">
      <title>Overview</title>
      <p>Our overall goal is to develop tools to help understand the decisions
of DNNs that are trained for image recognition via supervised
learning. In particular, we aim to generate meaningful explanations
for decisions made over a representative set of test images. This can
provide insight into the strengths and weaknesses of the learned
DNN that may not be apparent by just observing test set accuracy.
For example, one might hope to discover situations where the DNN
is making the right decision, but for the wrong reason, which would
identify potential future failure modes.</p>
      <p>Figure 1 shows an overview of our interactive naming approach
for producing test set explanations. At a high-level, each DNN
decision for a test image is dominated by a set of the most significant
activations of neurons in the penultimate layer. Thus, attaching
meaningful concepts to those activations is one way to explain
decisions. However, typical DNNs use very large penultimate
layers, which makes training easier, but can result in less compact
explanations due to the large numbers of significant activations.
For this reason we attach an xNN to the penultimate layer of the
DNN, which is trained to reproduce the decisions of the DNN, but
dramatically reduces the number of activations. Thus, explanations
can be formed in terms of a much smaller number of activations.</p>
      <p>In order to attach meaning to the significant xNN activations
we developed an interactive naming interface which displays
visualizations of the significant activations to a human annotator.
The annotator is then able to cluster the activations into
meaningful groups, called visual concepts, and attach linguistic labels
to the groups if desired. Given a test instance, we can then form
an explanation by producing the significant xNN activations and
displaying the group identities/names of those activations.
Qualitatively different decisions will tend to have different explanations.
A key functionality of the system is to allow for investigation
of the different qualitative decision types over the test set. The
rest of this section explains the above steps in more detail.</p>
    </sec>
    <sec id="sec-6">
      <title>Explanation Neural Networks (xNNs)</title>
      <p>
        An xNN [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is an additional network module that can be attached
to any intermediate layer of an original DNN, which typically has
thousands of neurons. The xNN learns a lower dimensional
embedding for the DNN layer, resulting in a vector of X-features, and
then linearly maps the X-features to the output ŷ in order to mimic
the output y of the original DNN model. In our work, we apply
xNNs to a convolutional DNN trained on the available multi-class
data. The DNN outputs p(ci|I) for each given image I and category
ci, i ∈ {1, . . . , C}. The penultimate layer of the DNN can be
considered as scoring functions s(ci|I) for each category, where a softmax
unit p(ci|I) = s(ci|I) / ∑j s(cj|I) serves as the final layer of the DNN that
computes the class-conditional probability from the scores. An xNN
is trained starting from the first fully-connected layer in the DNN
for each class, aiming at being faithful to the scoring function
s(ci|I) for each category. The xNNs can then be used for multi-class
prediction by computing the scores produced by each xNN and
returning the highest-scoring class.
      </p>
      <p>It is desirable for X-features to have the following 3 properties:
1) faithfulness, the DNN predictions can be faithfully approximated
from a simple linear transform of the X-features; 2) sparsity, a
relatively small number of X-features are active per image, and 3)
orthogonality, the X-features are as independent from each other
as possible.</p>
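      <p>For illustration, the following is a minimal sketch of an xNN head with these three desiderata expressed as training-loss terms, assuming PyTorch; the layer sizes, loss weights, and all names are our own, and the actual architecture and training procedure are given in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].</p>
      <preformat>
import torch
import torch.nn as nn

class XNN(nn.Module):
    """Sketch: embed a high-dimensional DNN layer into a few
    X-features and linearly map them to a class score."""
    def __init__(self, in_dim=4096, n_xfeatures=5):
        super().__init__()
        self.embed = nn.Linear(in_dim, n_xfeatures)  # low-dimensional embedding
        self.score = nn.Linear(n_xfeatures, 1)       # mimics s(c|I)

    def forward(self, h):
        x = torch.relu(self.embed(h))  # X-features
        return self.score(x), x

def xnn_loss(pred_score, dnn_score, x, lam_sparse=0.01, lam_orth=0.01):
    # 1) faithfulness: match the DNN's class score
    faithful = ((pred_score.squeeze(-1) - dnn_score) ** 2).mean()
    # 2) sparsity: few active X-features per image
    sparse = x.abs().mean()
    # 3) orthogonality: penalize off-diagonal correlation of X-features
    gram = x.t() @ x
    off_diag = gram - torch.diag(torch.diagonal(gram))
    orth = off_diag.pow(2).mean()
    return faithful + lam_sparse * sparse + lam_orth * orth
      </preformat>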
    </sec>
    <sec id="sec-7">
      <title>Explanations via Interactive Naming</title>
      <p>Given a test image and a class c, we can use the xNN for c to produce
a class score. This score is a linear combination ∑i wi · xi of the
X-features xi and their associated weights. The positive terms (i.e.,
X-features with positive weights) in the linear combination sum to
provide a positive score that can be viewed as providing positive
evidence for c. Typically only a subset of the positive terms is
significant. Thus, we define the significant X-features for the image
to be the minimum subset of X-features that accounts for at least 90%
of the positive score. The significant X-features can be viewed as a
type of explanation of why the image might be assigned to class c.
However, they do not have associated semantics, so the explanation
is not very useful for human consumption.</p>
      <p>[Figure 1: Overview of the approach. The DNN is augmented with an explanation neural network (xNN) that performs dimensionality reduction to X-features; the significant activations of the xNN are grouped and named in the interactive naming interface, yielding test-set explanations based on visual concepts, e.g. “Eye” and “Crown”.]</p>
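      <p>As a minimal sketch of the selection rule above (the function name and array layout are our own), the significant X-features can be found greedily by accumulating the largest positive terms until 90% of the positive score is covered:</p>
      <preformat>
import numpy as np

def significant_xfeatures(weights, activations, threshold=0.9):
    """Indices of the minimum subset of positively weighted X-features
    whose terms w_i * x_i account for at least `threshold` of the
    positive score."""
    terms = weights * activations
    pos = np.where(terms > 0)[0]               # positive evidence only
    order = pos[np.argsort(terms[pos])[::-1]]  # largest terms first
    total = terms[order].sum()
    kept, acc = [], 0.0
    for i in order:
        kept.append(int(i))
        acc += terms[i]
        if acc >= threshold * total:
            break
    return kept

# e.g. 5 X-features, as in our xNNs
print(significant_xfeatures(np.array([0.8, 0.5, -0.3, 0.2, 0.1]),
                            np.array([1.0, 0.2, 0.9, 0.6, 0.05])))
      </preformat>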
      <p>
        To assign semantics to explanations, we can first produce an
activation map for each significant X-feature in an image for the class
under consideration, which identifies the “salient” image region
that is responsible for the X-feature activation. In this work, we
use the ExcitationBP algorithm for computing activation maps [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
We call these maps the significant activation maps or simply the
significant activations. While one can gain insight into a prediction
by simply viewing the significant activations, it is difficult to obtain
a general understanding of the core semantic concepts and
combinations of those concepts used for predictions across an entire test
set, which is our goal. Figure 2 (left) shows an example of a bird’s
original image followed by its 5 X-feature activations, which are
superimposed on the original image.
      </p>
      <p>Our interface is designed to attach semantics to all the significant
activations across a test set. In particular, the interface allows an
annotator to cluster the significant activations, where each group
is intended to represent a semantically meaningful visual concept
to the annotator. Activations that are assigned to a visual concept
are considered to be named, while other activations are considered
to be unnamed. The complete set of named activations resulting
from interactive naming is called a naming of the test set. Given a
naming of a test set, we can now generate an explanation for each
test image by generating the significant activations of the image
and outputting the visual concept names for those activations. Thus,
an explanation is just a set of names.</p>
    </sec>
    <sec id="sec-8">
      <title>Interactive Naming Interface</title>
      <p>
        One of the key aspects of interactive naming is that the set of visual
concepts is not known beforehand and varies from person to person.
Moreover, the visual concepts in an image are not immediately
apparent until the annotator sees multiple images. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], it was shown
that human labelers are more efficient when they are presented
with multiple instances at once and are allowed to choose the ones
they want to label. In another study [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], it was demonstrated that
not only is labeling multiple images more efficient, but it also elicits
more consistent labels.
      </p>
      <p>Following previous work, we designed a flexible user
interface (Figure 2, right) to group the significant activations into
different visual concepts and give them textual labels/names. The
set of X-feature activations is shown to the annotator in the
“Unlabeled Examples” section of the interface. The annotator can freely
cluster activations into visual concepts and give them names. The
interface allows the annotators to compare all instances, and create
new visual concepts when they are confident. If the annotator is
not comfortable with grouping or labeling some activations, they
can leave them in the unlabeled section. The subjects can move
images across clusters, and merge clusters. They are also allowed
to discard the activations that they consider noisy.</p>
    </sec>
    <sec id="sec-9">
      <title>Data Preparation</title>
      <p>
        All our experiments were conducted on 12 categories of the
Caltech-UCSD Birds-200-2011 dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The first row of Table 1 shows
the number of images in each category. Given a convolutional DNN
trained on the available multi-class data, we train an xNN starting
from the first fully-connected layer in the DNN for each category.
This approach reduces the dimensionality from 4,096 features in
the DNN to 5 X-features in the xNN without significant loss of
accuracy. The fifth and sixth rows of Table 1 show the multi-class
classification accuracy of the xNN on the original 200 categories
after replacing the DNN score with the one generated by xNN for
each respective category, as well as the original DNN accuracy
on those categories. It can be seen that the xNN has almost identical
accuracy to the DNN, even performing better than the DNN in one
case. Also, the last row shows the Root Mean Squared Error (RMSE)
of the xNN, which approximates the exact scores of the DNN well: the
xNN has an RMSE between 0.2 and 0.5, while the range of the scoring
function is usually 0 to 50.
      </p>
      <p>Since the X-features with negative weights do not provide
positive evidence to the class at hand, their activation maps are not
used for annotation. We further filter the activation maps to only
those maps that contribute to 90% of the total positive weight for
the final decision. We call these the significant activations. The second
row of Table 1 shows the total number of significant activations in
each category. The third row shows the average number of significant
activations per image.</p>
    </sec>
    <sec id="sec-10">
      <title>HUMAN SUBJECT STUDY</title>
      <p>We had the activation maps of the different images annotated by
5 different subjects using the annotation interface. The activation
maps were separated by class, but not by X-feature. The
annotators were instructed to not introduce visual concepts that
only applied to one or two images, but were otherwise free to cluster
and label as many images as made sense to them. However, not
all subjects followed these instructions, and some left clusters with fewer
than 3 images. In the following analysis, we first cleaned the data
by removing the small number of clusters with fewer than 3 images.</p>
    </sec>
    <sec id="sec-11">
      <title>RQ1: Coverage of Interactive Naming</title>
      <p>Since the annotators are not forced to assign visual concepts to,
or name, all significant activations, some of the activations in the data
are unnamed and treated as noise/outliers. Here we are interested
in how well the annotations cover the activations and explanations
and how this coverage varies across annotators.</p>
      <p>Figure 3 shows the fraction of significant activations that are
named by each annotator for each bird category. In addition, the last
bar for each category, labeled “Any Annotator”, shows the fraction
of significant activations that were assigned to a visual concept
by at least one annotator. We see that within a particular class,
there is relatively small variation among users and that the “Any
Annotator” bar is not much higher than that of the typical individual
annotator. This indicates that there is some consistency in the set of
activations that users consider to be noise. Also, for most categories
there is a relatively significant amount of activations not labeled
by users, ranging from approximately 20% to 40%.</p>
      <p>We now consider how well the annotations cover explanations,
which gives a better sense of how useful they will be for analyzing
explanations. In particular, we consider an explanation for an image
to be completely (partially) covered by an annotation if all (at least
one) of the significant activations for that image are named. Figures
4 and 5 show the partial and complete coverage for each annotator
and the “Any Annotator”. We see that for most annotators the
fraction of explanations that are at least partially covered is quite
high. This means that at least partial semantics will be available
for explanations in the vast majority of cases. We also see that
the “Any Annotator” bar is similar to the individual annotators,
which indicates that the sets of partially covered explanations across
annotators are similar. The complete coverage percentages drop
substantially, which is not surprising given the results for activation
coverage from Figure 3. Once again the “Any Annotator” bar is not
significantly different from the rest.</p>
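      <p>For concreteness, these coverage measures can be computed directly from a naming. The sketch below assumes our own data layout: each test image maps to the set of its significant activation ids, and named is the set of activation ids the annotator assigned to some visual concept.</p>
      <preformat>
def coverage(images, named):
    """images: dict mapping each image id to the set of its significant
    activation ids. named: set of activation ids that the annotator
    assigned to some visual concept."""
    all_acts = set().union(*images.values())
    # fraction of significant activations that are named (Figure 3)
    activation_cov = len(all_acts.intersection(named)) / len(all_acts)
    # fraction of images with at least one named activation (Figure 4)
    partial = sum(not named.isdisjoint(a) for a in images.values()) / len(images)
    # fraction of images with all significant activations named (Figure 5)
    complete = sum(a.issubset(named) for a in images.values()) / len(images)
    return activation_cov, partial, complete
      </preformat>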
      <p>We performed a qualitative analysis to understand some of the
reasons that annotators were not able to assign names to activations.
One major reason was that activations were difficult
to interpret and appeared to be noise, for example when
activations highlighted the edge of the image or fell on background with
unclear semantics. Such activations are potential warning
indicators about a classifier; thus, uncovering these examples through
interactive naming has value. In other cases, the activation map was
interpretable to the annotator, but there were not enough similar
activation maps to form a cluster. This case may be resolved by
using a larger test set.</p>
    </sec>
    <sec id="sec-12">
      <title>RQ2: Inter-annotator Agreement</title>
      <p>In general we can expect different annotators to produce different
namings for a test set, where at least some of the visual concepts
differ. Here we consider the extent to which these different namings agree
and, in turn, whether explanations produced by different namings
are semantically similar. Understanding this issue is important for
understanding the extent to which explanations are fundamentally
annotator-specific.</p>
      <p>First, we consider annotator agreement about which significant
activations should be named. Figure 6 shows, for each bird category,
the fraction of significant activations that were named by different
numbers of annotators, from 0 through 5. Interestingly, the largest fraction
of activations are annotated by all 5 annotators and the second
largest are annotated by 0 annotators. This confirms, once again,
that for most significant activations, either all annotators choose
to assign a name or none of them do. There is strong agreement
about the set of activations that should be named.</p>
      <p>[Figure: Example D-family matchings between the visual concepts of two annotators, showing cluster names, cluster sizes, and edge weights, e.g. ‘Wing’ (size=34) matched with ‘Open Wing’ (size=28), and, for D = 1, ‘Eye’, ‘Nose’, and ‘Beak’.]</p>
      <p>Since not all significant activations are labeled by all annotators,
we first try to characterize the fraction of common annotations
between pairs of annotators. Thus, we use the Jaccard index, which is
the ratio of the intersection to the union of the two sets of
significant activations labeled by two annotators, to measure the
overlap in what the two annotators chose to annotate. This is shown in the last
column of Table 2, averaged over different pairs of annotators. The
Jaccard index is fairly high for all categories, indicating that there is
a good overlap between the sets of activations chosen by diferent
annotators to annotate.</p>
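      <p>A minimal sketch of this measure, assuming each annotator’s named activations are represented as a set of ids:</p>
      <preformat>
def jaccard(named_a, named_b):
    """Jaccard index of the sets of significant activations that two
    annotators chose to name (last column of Table 2)."""
    inter = named_a.intersection(named_b)
    union = named_a.union(named_b)
    return len(inter) / len(union)
      </preformat>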
      <p>To compare two namings, we use a D-family matching between their clusters of activations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], in which matched clusters are connected by edges weighted by the number of activations they share. We compute the agreement between the two annotators as the
total weight of all edges in the D-family matching as a fraction of
the number of activations labeled by both annotators. If we interpret
the matchings as translations between namings, then the agreement
is the fraction of activations that are translatable between namings.
The columns labeled “Agreement” in Table 2 show the statistics of
1-family and 2-family agreements for each category over the set of
all annotator pairs. The agreement numbers are fairly high across
most categories, although the minimum values for some categories
for D = 1 are low. Since 2-family matching is more permissive than
1-family matching, the agreement numbers are higher for D = 2,
as expected. Even for D = 1 the agreement in most categories is
reasonably high, which shows that there is reason to be optimistic
about developing a common ontology for explanations.</p>
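      <p>For D = 1, if one takes the matching to be a one-to-one correspondence between the two annotators’ clusters that maximizes the total number of shared activations (our reading of the 1-family case; [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] gives the general construction), the agreement can be sketched with SciPy’s assignment solver:</p>
      <preformat>
import numpy as np
from scipy.optimize import linear_sum_assignment

def agreement_d1(clusters_a, clusters_b):
    """clusters_a, clusters_b: lists of sets of activation ids, one set
    per visual concept. Returns matched weight as a fraction of the
    activations labeled by both annotators."""
    # edge weight = number of activations shared by a pair of clusters
    w = np.array([[len(a.intersection(b)) for b in clusters_b]
                  for a in clusters_a])
    rows, cols = linear_sum_assignment(-w)  # maximize total overlap
    matched = w[rows, cols].sum()
    both = set().union(*clusters_a).intersection(set().union(*clusters_b))
    return matched / len(both)
      </preformat>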
      <p>Test Set Explanation Summaries. One of the motivations for
naming a test set is to produce summaries of the explanation types
used to predict test images. For example, for category ‘a’ over 56%
of the test set predictions had the explanation (‘eye’, ‘close wing’),
which indicates that the network was focusing on the bird’s eye
and closed-wing area for those examples. As another example, for
category ‘l’ over 88% of the predictions had the explanation (‘eye’),
which means the network only looked at the eye area to make
the prediction. This type of insight may cause a practitioner to
question the robustness of the classifier if they have reason
to believe the eye alone is not discriminative enough. Alternatively,
an expert may gain insight from this explanation and realize that
the eye is discriminative enough for the task.</p>
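      <p>Producing such a summary amounts to counting how often each explanation, viewed as a set of concept names, occurs among a category’s test predictions. A minimal sketch, with our own data layout:</p>
      <preformat>
from collections import Counter

def summarize_explanations(explanations):
    """explanations: one frozenset of visual-concept names per test
    image. Returns (explanation, fraction) pairs, most common first."""
    counts = Counter(explanations)
    total = len(explanations)
    return [(tuple(sorted(names)), n / total)
            for names, n in counts.most_common()]

# e.g. the ('close wing', 'eye') pattern dominating a category
demo = [frozenset({'eye', 'close wing'})] * 5 + [frozenset({'eye'})] * 4
print(summarize_explanations(demo))
      </preformat>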
    </sec>
    <sec id="sec-13">
      <title>DISCUSSION AND CONCLUSIONS</title>
      <p>In this paper we studied the problem of understanding the
decisions of DNNs in terms of human-recognizable visual concepts.
Our interactive-naming approach involved augmenting the
original DNN with a sparser xNN, visualizing the significant activation
maps for each decision of the xNN on a test set, and then allowing
annotators to flexibly group the activations into recognizable visual
concepts, while attaching names to the concepts if desired. The
visual concepts can then be used as the basis for producing
concise meaningful explanations for test set images. We reported on
our experience of having 5 annotators use our interface for DNNs
trained to recognize different bird species. Our results showed that:
1) annotators were able to assign names to a non-trivial fraction of
activations, which allows for at least partial semantic explanations
for most test images; 2) the annotators had strong agreement about
which activations should and should not be named; and 3) there was a
non-trivial amount of agreement between the namings produced
by different annotators.</p>
      <p>This formative study has set the stage for a variety of future
work. Our current interactive naming interface is flexible, but does
not attempt to actively reduce annotator effort. Thus, there is
potential to speed up the naming of a test set via active
learning techniques. We are also interested in interactively training
the system based on named concepts, which might reduce the
number of activations that cannot be named. In addition, investigations
on other datasets with even larger varieties of visual concepts are
important for understanding the general characteristics of
annotator-produced namings.</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors acknowledge the support of grants from NSF (grant
no. IIS-1619433), ONR (grant no. N00014-11-1-0106), and DARPA
(grant no. DARPA N66001-17-2-4030).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Bach</surname>
          </string-name>
          , Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Samek</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</article-title>
          .
          <source>PLoS One</source>
          <volume>10</volume>
          ,
          <issue>7</issue>
          (
          <year>2015</year>
          ). https://doi.org/10.1371/journal.pone.0130140
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>David</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bolei</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Aditya Khosla, Aude Oliva, and Antonio Torralba.
          <year>2017</year>
          .
          <article-title>Network Dissection: Quantifying Interpretability of Deep Visual Representations</article-title>
          . In Computer Vision and Pattern Recognition.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Frédéric</given-names>
            <surname>Cazals</surname>
          </string-name>
          , Dorian Mazauric, Romain Tetley, and
          <string-name>
            <given-names>Rémi</given-names>
            <surname>Watrigant</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Comparing two clusterings using matchings between clusters of clusters</article-title>
          .
          <source>Research Report RR-9063. INRIA Sophia Antipolis - Méditerranée; Université Côte d'Azur</source>
          . https://hal.inria.fr/hal-01514872
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Dabkowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Real Time Image Saliency for Black Box Classifiers</article-title>
          .
          <source>In NIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Fong</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Interpretable Explanations of Black Boxes by Meaningful Perturbation</article-title>
          .
          <source>In 2017 IEEE International Conference on Computer Vision</source>
          (ICCV).
          <volume>3449</volume>
          -
          <fpage>3457</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>McClelland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          .
          <year>1986</year>
          .
          <article-title>Distributed representations</article-title>
          .
          <source>In Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations</source>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>McClelland</surname>
          </string-name>
          , and the PDP Research Group (Eds.). MIT Press, Cambridge, MA, 77-
          <fpage>109</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Been</given-names>
            <surname>Kim</surname>
          </string-name>
          , Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda B. Viégas, and
          <string-name>
            <given-names>Rory</given-names>
            <surname>Sayres</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)</article-title>
          .
          <source>In Proceedings of the 35th International Conference on Machine Learning</source>
          , ICML
          <year>2018</year>
          , Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018.
          <fpage>2673</fpage>
          -
          <lpage>2682</lpage>
          . http://proceedings.mlr.press/v80/kim18d.html
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Todd</given-names>
            <surname>Kulesza</surname>
          </string-name>
          , Saleema Amershi, Rich Caruana, Danyel Fisher, and
          <string-name>
            <given-names>Denis</given-names>
            <surname>Charles</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Structured Labeling for Facilitating Concept Evolution in Machine Learning</article-title>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Zhongang</given-names>
            <surname>Qi</surname>
          </string-name>
          , Saeed Khorram, and
          <string-name>
            <given-names>Fuxin</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Embedding Deep Networks into Visual Explanations</article-title>
          .
          <source>CoRR abs/1709.05360</source>
          (
          <year>2017</year>
          ). arXiv:1709.05360
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Advait</given-names>
            <surname>Sarkar</surname>
          </string-name>
          , Cecily Morrison, Jonas F. Dorn, Rishi Bedi, Saskia Steinheimer, Jacques Boisvert, Jessica Burggraaf, Marcus D'Souza,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Kontschieder</surname>
          </string-name>
          , Samuel Rota Bulò, Lorcan Walsh,
          <string-name>
            <given-names>Christian P.</given-names>
            <surname>Kamm</surname>
          </string-name>
          , Yordan Zaykov, Abigail Sellen, and
          <string-name>
            <given-names>Siân</given-names>
            <surname>Lindley</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Setwise Comparison: Consistent, Scalable, Continuum Labels for Computer Vision</article-title>
          .
          <source>In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16)</source>
          . ACM, New York, NY, USA,
          <fpage>261</fpage>
          -
          <lpage>271</lpage>
          . https://doi.org/10.1145/2858036.2858199
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedantam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization</article-title>
          .
          <source>In 2017 IEEE International Conference on Computer Vision</source>
          (ICCV).
          <volume>618</volume>
          -
          <fpage>626</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Avanti</given-names>
            <surname>Shrikumar</surname>
          </string-name>
          , Peyton Greenside, Anna Shcherbina, and
          <string-name>
            <given-names>Anshul</given-names>
            <surname>Kundaje</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Not Just a Black Box: Learning Important Features Through Propagating Activation Differences</article-title>
          .
          <source>CoRR abs/1605.01713</source>
          (
          <year>2016</year>
          ). http://arxiv.org/abs/1605.01713
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          , Andrea Vedaldi, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps</article-title>
          .
          <source>ICLR Workshop</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Striving for Simplicity: The All Convolutional Net</article-title>
          . In ICLR Workshop. http://lmb.informatik.uni-freiburg.de/Publications/2015/DB15a
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Mukund</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          , Ankur Taly, and
          <string-name>
            <given-names>Qiqi</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Axiomatic Attribution for Deep Networks</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning</source>
          ,
          <source>Doina Precup and Yee Whye Teh (Eds.)</source>
          .
          <source>PMLR</source>
          ,
          <fpage>3319</fpage>
          -
          <lpage>3328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Branson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          .
          <year>2011</year>
          . The Caltech-UCSD Birds-200-2011
          <source>Dataset. Technical Report CNS-TR-2011-001</source>
          . California Institute of Technology.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Matthew D.</given-names>
            <surname>Zeiler</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rob</given-names>
            <surname>Fergus</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Visualizing and Understanding Convolutional Networks</article-title>
          .
          <source>In Computer Vision - ECCV</source>
          <year>2014</year>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Fleet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Pajdla</surname>
          </string-name>
          , Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham,
          <fpage>818</fpage>
          -
          <lpage>833</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Jianming</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Zhe Lin, Jonathan Brandt,
          <string-name>
            <given-names>Xiaohui</given-names>
            <surname>Shen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stan</given-names>
            <surname>Sclaroff</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Top-down neural attention by excitation backprop</article-title>
          .
          <source>In European Conference on Computer Vision</source>
          . Springer,
          <fpage>543</fpage>
          -
          <lpage>559</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Bolei</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Yiyou Sun, David Bau, and Antonio Torralba.
          <year>2018</year>
          .
          <article-title>Interpretable Basis Decomposition for Visual Explanation</article-title>
          . In Computer Vision - ECCV
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>