<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep neural network loses attention to adversarial images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shashank Kotyan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Vasconcellos Vargas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <email>vargas@inf.kyushu-u.ac.jp</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kyushu University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Adversarial algorithms have been shown to be effective against neural networks for a variety of tasks. In image classification, some adversarial algorithms perturb all the pixels of the image minimally, while others perturb a few pixels strongly. However, little is known about why such diverse adversarial samples exist. Recently, [Vargas and Su, 2020] showed that the existence of these adversarial samples might be due to conflicting saliency within the neural network. We test this hypothesis of conflicting saliency by analysing the Saliency Maps (SM) and Gradient-weighted Class Activation Maps (Grad-CAM) of original samples and of a few different types of adversarial samples. We also analyse how different adversarial samples distort the attention of the neural network compared to original samples. We show that, in the case of Pixel Attack, perturbed pixels either call the network’s attention to themselves or divert the attention from them. Meanwhile, the Projected Gradient Descent Attack perturbs pixels so that intermediate layers inside the neural network lose attention for the correct class. We also show that the two attacks affect the saliency maps and activation maps differently, shedding light on why some defences that are successful against some attacks remain vulnerable to others. We hope that this analysis will improve the understanding of the existence and effect of adversarial samples and enable the community to develop more robust neural networks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Since adversarial samples for neural networks were discovered in
[Szegedy, 2014], the variety of adversarial samples and corresponding
adversarial algorithms has grown in both number and type. Examples
range from universal perturbations [Moosavi-Dezfooli et al., 2017],
which can be added to almost any image to generate an adversarial
sample, to the addition of crafted patches [Brown et al., 2017]; even
changing a single pixel [Su et al., 2019] was shown to be enough to
cause networks to misclassify. Some approaches mitigate the adverse
effects of adversarial algorithms by detecting adversarial samples,
while others rely on defensive algorithms. However, most defences rely
on obfuscated gradients [Athalye et al., 2018], which can be broken by
black-box and stronger attacks.</p>
      <p>It was shown in [Vargas and Su, 2020] that changes in the pixels
of an image propagate and expand through the layers, either
disappearing or causing significant changes in the classification. It
was also shown that perturbing the pixels near a successful one-pixel
attack yields high attack accuracy. This suggests that changing a
pixel may increase or decrease the influence of a receptive field (a
small group of nearby pixels). This follows directly from convolution
being a linear operation.</p>
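      <p>The linearity argument can be checked on a toy case. The sketch below is illustrative only and is not the authors’ code: it assumes a single-channel 8×8 image, one 3×3 kernel with random weights, and a hypothetical helper <monospace>conv2d_valid</monospace>. It perturbs one interior pixel and checks that only the nine outputs whose receptive field contains that pixel change, each by the perturbation times the kernel weight aligned with the pixel.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation (what CNN layers call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for s in range(out_w):
            # Each output is a weighted sum over one receptive field.
            out[r, s] = np.sum(image[r:r + kh, s:s + kw] * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.random((8, 8))            # hypothetical single-channel image
kernel = rng.standard_normal((3, 3))  # hypothetical 3x3 kernel

# One-pixel perturbation at an interior location (i, j).
i, j, delta = 4, 4, 0.5
perturbed = image.copy()
perturbed[i, j] += delta

diff = conv2d_valid(perturbed, kernel) - conv2d_valid(image, kernel)

# Linearity predicts the change exactly: output (i - a, j - b) sees pixel (i, j)
# through kernel weight (a, b), so it shifts by delta * kernel[a, b].
expected = np.zeros_like(diff)
for a in range(kernel.shape[0]):
    for b in range(kernel.shape[1]):
        expected[i - a, j - b] = delta * kernel[a, b]

print(np.count_nonzero(diff))       # 9: only the 3x3 block of outputs covering (i, j)
print(np.allclose(diff, expected))  # True
```
      </preformat>
      <p>Whether such a local change later disappears or grows depends on the layers that follow; the sketch only covers the first, linear step.</p>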
      <p>In the adversarial setting, analysing the spatial distribution of
saliency helps to interpret why changing a few pixels [Su et al., 2019]
causes the network to misclassify. It was hypothesised in [Vargas and
Su, 2020] that adversarial samples exist because of conflicting
saliency, which disturbs the neural network enough to force it to
misclassify. Under this hypothesis, adversarial samples do not naively
fool neural networks but divert their attention towards another part
of the image.</p>
      <p>Contributions: In this article, we analyse how this spatial
distribution changes in response to perturbed pixels. Figure 1
illustrates the methodology for generating saliency maps and the
difference in saliency caused by adversarial perturbations.
Specifically, we evaluate Saliency Maps and Gradient-weighted Class
Activation Maps of a ResNet trained on CIFAR-10 for adversarial
samples generated by Pixel Attack (an L0 norm black-box attack) [Su et
al., 2019] and Projected Gradient Descent Attack (an L∞ norm white-box
attack) [Madry et al., 2018].</p>
      <p>We evaluate the Saliency Maps and Gradient-weighted Class
Activation Maps with respect to the predicted class, the true class and
the misclassified class of both original and adversarial images. Our
experiments reveal that Pixel Attack and Projected Gradient Descent
Attack distort the saliency maps and activation maps differently.
Pixel Attack calls the network’s attention towards the perturbed
pixels, effectively changing the neural network’s region of interest,
whereas Projected Gradient Descent Attack diffuses the image’s saliency
and effectively destroys the class activation maps for the true class
of the adversarial images.</p>
      <p>We also investigate the hypothesis, raised in [Vargas and Su,
2020], that adversarial samples exist due to conflicting image
saliency. Our experiments demonstrate that pixels define both the
strength and the type of a feature (i.e., changing pixels changes the
relationship between the pixels of a receptive field and, therefore,
the feature). Consequently, it is possible to either destroy a feature
or decrease its intensity by modifying the pixels slightly. The same
should hold when pixels are modified slightly all over the image,
although in this case the relative modification naturally matters
more.</p>
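      <p>To make the claim about feature strength and type concrete, the sketch below uses two hand-set 3×3 edge detectors (hypothetical weights chosen for illustration, not taken from the trained ResNet) applied to a single receptive field. Scaling the pixels changes only the strength of the vertical-edge response, whereas changing three of the nine pixels flips which detector responds most, i.e. the type of feature.</p>
      <preformat preformat-type="code">
```python
import numpy as np

# Two hand-set 3x3 edge detectors (hypothetical weights, for illustration only).
vertical_edge = np.array([[1.0, 0.0, -1.0]] * 3)
horizontal_edge = vertical_edge.T


def responses(patch):
    """Linear response of each detector to one 3x3 receptive field."""
    return {
        "vertical": float(np.sum(patch * vertical_edge)),
        "horizontal": float(np.sum(patch * horizontal_edge)),
    }


patch = np.zeros((3, 3))
patch[:, 0] = 1.0                  # bright left column: a vertical edge
base = responses(patch)            # vertical 3.0, horizontal 0.0

dimmer = responses(0.5 * patch)    # same pattern, half intensity: weaker vertical feature

changed = patch.copy()
changed[0, 1:] = 1.0               # brighten the rest of the top row
changed[2, 0] = 0.0                # darken the bottom-left pixel
flipped = responses(changed)       # vertical 1.0, horizontal 3.0: the feature type flips

for name, r in [("original", base), ("dimmer", dimmer), ("changed", flipped)]:
    print(name, r, "dominant:", max(r, key=r.get))
```
      </preformat>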
    </sec>
    <sec id="sec-2">
      <title>2 Related Works</title>
      <sec id="sec-2-1">
        <title>2.1 Adversarial Machine Learning</title>
        <p>It was shown in [Szegedy, 2014] that neural networks can
classify almost identical images very differently. Afterwards, in
[Nguyen et al., 2015], the authors demonstrated that neural networks
show high confidence when presented with textures and random noise.
This led to the discovery of a series of vulnerabilities in neural
networks, which were then exploited by adversarial attacks.</p>
        <p>Defensive distillation [Papernot et al., 2016] was proposed as a
defence in which a smaller neural network squeezes the content learned
by the original one. However, it was shown not to be robust enough in
[Carlini and Wagner, 2017]. Adversarial training, in which adversarial
samples are used to augment the training dataset, was also proposed
[Goodfellow et al., 2014], [Madry et al., 2018]. The dataset is
augmented so that the neural network learns to classify adversarial
samples correctly, thus increasing its robustness. Although
adversarial training can increase robustness slightly, the resulting
neural network is still vulnerable to attacks [Tramèr et al., 2018].</p>
        <p>Regarding the understanding of the phenomenon, it is argued in
[Goodfellow et al., 2014] that the linearity of neural networks is one
of the main reasons. Another investigation proposes the conflicting
saliency added by adversarial samples as the reason for
misclassification [Vargas and Su, 2020]. A geometric perspective is
analysed in [Moosavi-Dezfooli et al., 2018], where it is shown that
adversarial samples lie in a shared subspace along which the decision
boundary of a classifier is positively curved. Further, [Fawzi et al.,
2018] show a relationship between the sensitivity to additive
perturbations of the inputs and the curvature of the decision boundary
of deep networks. Another aspect of robustness is discussed in [Madry
et al., 2018], where the authors suggest that the capacity of the
neural network’s architecture is relevant to robustness.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Visualisation and understanding of neural networks</title>
        <p>We can visualise the attention of neural networks to understand
which part of the image a neural network focuses on. It is well known
that some parts of the image affect the output more than others; in
fact, the most influential areas can be seen by plotting the saliency
maps of a given sample.</p>
        <p>In [Itti et al., 1998], saliency maps are defined as maps that
represent the conspicuity at every location in the visual field by a
scalar quantity. These saliency maps also help guide the selection of
attended locations based on the spatial distribution of saliency.
Therefore, for neural networks, saliency maps help find the pixels of
the image that contribute more than other pixels to categorising the
image.</p>
        <p>Two variations of backpropagation are used for saliency maps in
the literature. One is Rectified/Deconv Backpropagation [Zeiler and
Fergus, 2014], where only positive gradient information, which
corresponds to an increase in the output, is propagated through the
layers. In a classification problem, an increase in the output can be
interpreted as an increase in the output probability of a class. The
other is Guided Backpropagation [Springenberg et al., 2014], where
gradient information is propagated through a layer only where it is
positive and the corresponding forward activation is also positive.</p>
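        <p>The backward rules differ only in how a ReLU gates the gradient. The following minimal NumPy sketch, with a hypothetical function name <monospace>relu_backward</monospace>, shows the gating for a single ReLU: ordinary backpropagation gates by the forward activation, Rectified/Deconv backpropagation by the sign of the incoming gradient, and Guided Backpropagation by both.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def relu_backward(grad_out, forward_in, rule="standard"):
    """Gradient passed back through one ReLU under three backpropagation rules.

    grad_out:   gradient arriving from the layer above
    forward_in: input the ReLU received during the forward pass
    """
    active = forward_in > 0          # units that were active in the forward pass
    positive = grad_out > 0          # units receiving a positive gradient
    if rule == "standard":           # ordinary backpropagation
        return grad_out * active
    if rule == "deconv":             # Rectified/Deconv backpropagation
        return grad_out * positive
    if rule == "guided":             # Guided backpropagation: both conditions
        return grad_out * np.logical_and(active, positive)
    raise ValueError(f"unknown rule: {rule}")


forward_in = np.array([1.0, -2.0, 3.0, -4.0])
grad_out = np.array([0.5, 0.5, -0.5, -0.5])
for rule in ("standard", "deconv", "guided"):
    print(rule, relu_backward(grad_out, forward_in, rule))
```
        </preformat>
        <p>Applying the guided rule at every ReLU of a network, instead of the standard rule, is in effect what replacing ReLU backpropagation with Guided ReLU backpropagation amounts to.</p>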
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 Adversarial Machine Learning and Saliency Maps</title>
      <p>Let us suppose that, for the image classification problem,
$x \in \mathbb{R}^{m \times n \times c}$ is the image to be classified,
where $m$ and $n$ are the width and the height of the image, and $c$ is
the number of colour channels. A neural network comprises several
neural layers, each composed of a set of perceptrons (artificial
neurons) linked together. Each of these perceptrons maps a set of
inputs to an output value through an activation function.</p>
      <p>Thus, the function of the neural network (formed by a chain of
layers) can be defined as:</p>
      <p><disp-formula><tex-math>g(x) = f^{(k)}(\dots f^{(2)}(f^{(1)}(x)))</tex-math></disp-formula>
where $f^{(i)}$ is the function of the $i$-th layer of the network,
$i = 1, 2, 3, \dots, k$, and $k$ is the last layer of the neural
network. In the image classification problem, $g(x) \in \mathbb{R}^N$
gives the probabilities (confidences) for all $N$ available classes.</p>
      <p>Also, in adversarial machine learning, one type of adversarial
sample $\hat{x}$ is defined as:
<disp-formula><tex-math>\hat{x} = x + \delta x</tex-math></disp-formula>
<disp-formula><tex-math>\left\{ \hat{x} \in \mathbb{R}^{m \times n \times c} \mid \operatorname{argmax}[g(x)] \neq \operatorname{argmax}[g(\hat{x})] \right\}</tex-math></disp-formula>
in which $\delta x$ is the perturbation added to the input. A wide
variety of norm constraints can be imposed upon $\delta x$, such as L0,
L1, L2, and L∞, which allow for different adversarial attacks. The L0
norm allows attacks to perturb a few pixels strongly, the L∞ norm
allows all pixels to change slightly, and both L1 and L2 allow for a
mix of both strategies.</p>
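      <p>As a small numeric illustration of these regimes, the sketch below compares a perturbation that changes one pixel of a CIFAR-10-sized image strongly with one that changes every value by 8/255 (a budget commonly used for CIFAR-10, assumed here). It assumes that L0 counts perturbed pixels, a pixel counting once if any of its channels changes; some works instead count individual values. The helper <monospace>perturbation_norms</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def perturbation_norms(delta):
    """Norms of a perturbation delta of shape (m, n, c).

    L0 counts perturbed pixels (a pixel counts if any of its channels changed);
    L1, L2 and L-infinity are taken over all m*n*c values.
    """
    flat = delta.ravel()
    return {
        "L0": int(np.count_nonzero(np.any(delta != 0, axis=-1))),
        "L1": float(np.abs(flat).sum()),
        "L2": float(np.linalg.norm(flat)),
        "Linf": float(np.abs(flat).max()),
    }


shape = (32, 32, 3)                    # CIFAR-10 image size

few_pixels = np.zeros(shape)           # few pixels, strongly: the L0 regime
few_pixels[10, 12, :] = 0.8            # one pixel changed in all three channels

all_pixels = np.full(shape, 8 / 255)   # all pixels, slightly: the L-infinity regime

print(perturbation_norms(few_pixels))
print(perturbation_norms(all_pixels))
```
      </preformat>
      <p>The sparse perturbation has an L0 norm of 1 but an L∞ norm of 0.8, while the dense one has an L0 norm of 1024 but an L∞ norm of 8/255 (about 0.031), which is why each constraint admits a different family of attacks.</p>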
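      <p>Making use of the definition of adversarial samples, adversarial
machine learning can thus be formally defined as the following
optimization problem for untargeted attacks:
<disp-formula><tex-math>\underset{\delta x}{\text{minimize}} \; g(x + \delta x)_C \quad \text{subject to} \quad \lVert \delta x \rVert_p \leq th</tex-math></disp-formula>
Similarly, the optimization problem for targeted attacks can be defined
as:
<disp-formula><tex-math>\underset{\delta x}{\text{maximize}} \; g(x + \delta x)_T \quad \text{subject to} \quad \lVert \delta x \rVert_p \leq th</tex-math></disp-formula>
where $g(\cdot)_C$ is the soft label for the correct class, $g(\cdot)_T$
is the soft label for the target class, $p \in \{0, 1, 2, \infty\}$ is
the constraint norm on $\delta x$, and $th$ is the threshold value for
the constraint norm.</p>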
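      <p>For the L∞ case, the untargeted problem is commonly approached by projected gradient ascent on the classification loss, as in Projected Gradient Descent Attack [Madry et al., 2018]. The following is a minimal sketch, not the Adversarial Robustness Toolbox implementation used in the experiments. It assumes a toy linear softmax classifier with random, hypothetical weights so that the gradient can be written by hand, uses the cross-entropy of the correct class as a surrogate for minimising g(x + δx)_C, sets th = eps, and omits the random start used by Madry et al. The function name <monospace>pgd_linf_untargeted</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def softmax(z):
    z = z - z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()


def pgd_linf_untargeted(x, label, W, b, eps, alpha, steps):
    """Untargeted L-infinity PGD against a toy model g(x) = softmax(W x + b).

    Each step ascends the cross-entropy of the correct class (pushing g(x)_C down),
    then projects back onto the eps-ball around x and onto the pixel range [0, 1].
    """
    x_adv = x.copy()                      # starts at x (no random start)
    onehot = np.zeros(W.shape[0])
    onehot[label] = 1.0
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        grad = W.T @ (p - onehot)         # gradient of -log g(x_adv)_C for this model
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # enforce the L-infinity threshold
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv


rng = np.random.default_rng(0)
num_classes, dim = 10, 32 * 32 * 3
W = 0.01 * rng.standard_normal((num_classes, dim))   # hypothetical weights
b = np.zeros(num_classes)
x = rng.random(dim)
C = int(np.argmax(softmax(W @ x + b)))              # treat the clean prediction as C

x_adv = pgd_linf_untargeted(x, C, W, b, eps=8 / 255, alpha=2 / 255, steps=10)
p_clean = softmax(W @ x + b)[C]
p_adv = softmax(W @ x_adv + b)[C]
print(f"g(x)_C = {p_clean:.3f}, g(x_adv)_C = {p_adv:.3f}")
print("max |delta x|:", np.abs(x_adv - x).max())
```
      </preformat>
      <p>A targeted variant would instead descend the cross-entropy of the target class T (use the one-hot vector of T and subtract the step), which corresponds to maximising g(x + δx)_T.</p>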
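      <p>Saliency Maps (also called pixel-attribution maps, attribution maps
or sensitivity maps) [Simonyan et al., 2013] are used to assess how the
output changes with respect to a change in the input. They can be used
to determine the region of interest in the image for neural networks.
Mathematically, the saliency map with respect to a label $l$,
$SM_l \in \mathbb{R}^{m \times n}$, can be defined as:</p>
      <p><disp-formula><tex-math>SM_l = \max_{c} \left| \frac{\partial g(x)_l}{\partial x} \right|</tex-math></disp-formula>
Hence, we can evaluate the saliency map of an image with respect to the
predicted class of the image $C$ as:
<disp-formula><tex-math>SM_C = \max_{c} \left| \frac{\partial g(x)_C}{\partial x} \right|</tex-math></disp-formula>
Similarly, we can evaluate the saliency map of the adversarial image
with respect to the predicted class of the adversarial image
$\hat{C}$ as:
<disp-formula><tex-math>\widehat{SM}_{\hat{C}} = \max_{c} \left| \frac{\partial g(\hat{x})_{\hat{C}}}{\partial \hat{x}} \right|</tex-math></disp-formula></p>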
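      <p>A minimal sketch of a gradient saliency map, under stated assumptions: the model is a toy softmax over linear logits with random, hypothetical weights, so the input gradient has the closed form $p_l (W_l - \sum_j p_j W_j)$, and the absolute value and channel-wise maximum follow Simonyan et al. Because the toy model has no ReLUs, the guided backpropagation used in the experiments reduces here to the plain gradient. Passing the adversarial image and its predicted class instead of the clean ones gives the adversarial map. The function <monospace>saliency_map</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def saliency_map(x, label, W, b):
    """SM_label = max over colour channels of |d g(x)_label / d x|.

    Toy model (an assumption for illustration): g(x) = softmax(W x.ravel() + b),
    with x of shape (m, n, c) and W of shape (N, m*n*c).
    """
    p = softmax(W @ x.ravel() + b)
    # For a softmax over linear logits: d p_l / d x = p_l * (W_l - sum_j p_j W_j)
    grad = p[label] * (W[label] - p @ W)
    return np.abs(grad.reshape(x.shape)).max(axis=-1)   # shape (m, n)


rng = np.random.default_rng(1)
m, n, c, N = 4, 4, 3, 5
W = rng.standard_normal((N, m * n * c))   # hypothetical weights
b = rng.standard_normal(N)
x = rng.random((m, n, c))
sm = saliency_map(x, label=2, W=W, b=b)
print(sm.shape)   # (4, 4): one value per pixel position
```
      </preformat>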
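      <p>Gradient-weighted Class Activation Map (Grad-CAM) [Selvaraju et
al., 2017] is another type of saliency map evaluation, which assesses
the attention of a convolution layer with respect to a label. It can be
used to determine the region of the image that a convolution layer of
the neural network attends to. Mathematically, the Gradient-weighted
Class Activation Map with respect to label $l$ for the $n$-th layer,
$AM_l \in \mathbb{R}^{m \times n}$, can be defined as:</p>
      <p><disp-formula><tex-math>AM_l = f^{(n)}(x) \cdot \operatorname{mean}_c \left( \frac{\partial g(x)_l}{\partial f^{(n)}(x)} \right)</tex-math></disp-formula>
Based on this, we can evaluate the activation maps of the image with
respect to the predicted class $C$ as:
<disp-formula><tex-math>AM_C = f^{(n)}(x) \cdot \operatorname{mean}_c \left( \frac{\partial g(x)_C}{\partial f^{(n)}(x)} \right)</tex-math></disp-formula>
and the activation maps of the adversarial image with respect to the
predicted class $\hat{C}$ as:
<disp-formula><tex-math>\widehat{AM}_{\hat{C}} = f^{(n)}(\hat{x}) \cdot \operatorname{mean}_c \left( \frac{\partial g(\hat{x})_{\hat{C}}}{\partial f^{(n)}(\hat{x})} \right)</tex-math></disp-formula></p>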
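      <p>The sketch below follows the standard Grad-CAM recipe of Selvaraju et al.: average the gradient of the class score over spatial positions to obtain one weight per feature channel, take the weighted sum of the layer’s channels, and keep the positive part. Mapping the gradient average and product in the Grad-CAM definition onto this recipe is our reading (an assumption), as is differentiating the pre-softmax class score. The layer activations and the global-average-pooling plus linear head <monospace>V</monospace> are hypothetical, chosen so that the gradient has a closed form.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def grad_cam(feature_map, grad_wrt_features):
    """Grad-CAM for one convolution layer, following Selvaraju et al. (2017).

    feature_map:       activations f^(n)(x), shape (h, w, K)
    grad_wrt_features: d(class score) / d f^(n)(x), same shape
    """
    weights = grad_wrt_features.mean(axis=(0, 1))   # spatial mean: one weight per channel
    cam = (feature_map * weights).sum(axis=-1)      # weighted sum of channels -> (h, w)
    return np.maximum(cam, 0.0)                     # keep only positive evidence


# Hypothetical head: global average pooling followed by a linear layer V, so
# score_l = V[l] . mean_ij A[i, j, :] and d score_l / d A[i, j, k] = V[l, k] / (h * w).
rng = np.random.default_rng(2)
h, w, K, N = 4, 4, 8, 10
A = np.maximum(rng.standard_normal((h, w, K)), 0.0)   # ReLU-like activations
V = rng.standard_normal((N, K))
label = 3
grad = np.broadcast_to(V[label] / (h * w), A.shape)

am = grad_cam(A, grad)
print(am.shape)   # (4, 4): one attention value per spatial position of the layer
```
      </preformat>
      <p>With a global-average-pooling head like this one, the result equals ReLU(A · V_l) / (h · w), i.e. the classic class activation map up to a constant factor.</p>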
      <p>This article assesses Saliency Maps ($SM$) and Gradient-weighted
Class Activation Maps ($AM$) with respect to the correctly predicted
class (the true class, $C$) and the misclassified class (the
adversarial class, $\hat{C}$) of both adversarial and original images.
Analysing Saliency Maps and Activation Maps for other, unrelated
classes is left as future work. We also focus on the Activation Maps of
the last convolution layer in the neural network and leave the
activation maps of other layers as future work.</p>
    </sec>
    <sec id="sec-5">
      <title>4 Experimental results and analysis</title>
      <sec id="sec-5-1">
        <title>Experimental settings</title>
        <p>We use a ResNet [He et al., 2016] trained on the CIFAR-10
dataset [Krizhevsky et al., 2009]. For adversarial attacks, we evaluate
Pixel Attack (an L0 norm black-box attack) [Su et al., 2019] and
Projected Gradient Descent Attack (an L∞ norm white-box attack) [Madry
et al., 2018] using the Adversarial Robustness Toolbox library
[Nicolae et al., 2018]. For evaluating saliency maps, we replace the
traditional ReLU backpropagation with Guided ReLU backpropagation
[Springenberg et al., 2014].</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.1 Saliency Maps</title>
        <p>We visualise the saliency maps of original samples ($SM$) and of
adversarial samples ($\widehat{SM}$) with respect to the class
predicted by the model (Figure 2), the true/correct class $C$ (Figure
3), and the adversarial/incorrect class $\hat{C}$ (Figure 4) for
different attacks. For an original sample, the predicted class is the
true/correct class $C$, whereas for an adversarial sample, the
predicted class is the adversarial/incorrect class $\hat{C}$.
Intuitively, adversarial perturbations should distort the saliency of
the image as seen by the neural network to induce misclassification.
The figures show that Pixel Attack and Projected Gradient Descent
Attack distort the saliency differently.</p>
        <p>[Figure panels (images not recovered): $x$, $\hat{x}$, $SM_C$ and $\widehat{SM}_{\hat{C}}$ for Pixel Attack and Projected Gradient Descent Attack.]</p>
        <p>We observe that, in the case of PGD Attack, the saliency of the
adversarial image, created by a minimal change in pixels, is diffused
(Figure 2). PGD Attack effectively diffuses the attention around the
region of interest, and Figures 3 and 4 show that this diffusion occurs
for both the true and the adversarial class. This shows that PGD
Attack does not distort the region of interest determined by the
neural network but only manipulates the activations inside the layers
such that the adversarial perturbation induces misclassification.</p>
        <p>We also observe some peculiar characteristics of the pixel
perturbations generated by Pixel Attack. Unlike with Projected Gradient
Descent Attack, the saliency maps (Figure 2) show that the attention of
the neural network shifts to the perturbed region rather than
diffusing. Since only the perturbed pixels differ from the original
image, this diversion of attention is a consequence of introducing
them. Figure 3 shows that the saliency map with respect to the true
class changes minimally, as the saliency in the adversarial image lies
in the same region as in the original image. However, Figure 4 shows
that the saliency map with respect to the adversarial class changes,
and this distorted saliency dominates the attention with respect to the
correct class. This shows that perturbed pixels created by Pixel Attack
call the neural network’s attention towards themselves and divert it
towards the adversarial class. Thus, Pixel Attack induces
misclassification by changing the neural network’s region of interest
through adversarial perturbations.</p>
        <p>Further, this diversion of attention caused by the perturbed
pixels in Pixel Attack propagates through the network and intensifies,
causing misclassification, as shown by [Vargas and Su, 2020]. This
further supports the conflicting saliency hypothesis, as adversarial
perturbations severely affect the saliency of the image as determined
by the neural network. The distortion in saliency also shows that
adversarial perturbations do not fool the neural network naively but
divert its attention towards the perturbations.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.2 Gradient-weighted Class Activation Maps</title>
        <p>We visualise the gradient-weighted class activation maps of
original samples ($AM$) and of adversarial samples ($\widehat{AM}$)
with respect to the class predicted by the model (Figure 5), the
true/correct class $C$ (Figure 6), and the adversarial/incorrect class
$\hat{C}$ (Figure 7) for different attacks. For an original sample, the
predicted class is the true/correct class $C$, whereas for an
adversarial sample, the predicted class is the adversarial/incorrect
class $\hat{C}$.</p>
        <p>[Figure panels (images not recovered): $\hat{x}$ with $\widehat{SM}_{\hat{C}}$, $SM_{\hat{C}}$ and $AM_C$, for Pixel Attack and Projected Gradient Descent Attack.]</p>
        <p>Figure 5 shows that Projected Gradient Descent, a white-box
attack, uses the gradients to keep the attention of the convolution
layer in the same region. However, Figure 6 shows that the attention
for the correct class is strongly distorted in the adversarial image
compared to the original, whereas the attention for the misclassified
class in the adversarial image is distorted to bring it close to the
attention for the correct class in the original image. This points out
that white-box attacks find perturbations that keep the region of
interest intact and thus only manipulate the activations inside the
neural network (convolution layers) to cause misclassification.</p>
        <p>From the activation maps of the last convolution layer for Pixel
Attack (Figure 5), we observe that adversarial perturbations distort
the attention of the convolution layer. This distortion either calls
the network’s attention towards the perturbed pixels or diverts it from
them. As with the Saliency Maps, Figures 6 and 7 show that this
distortion in the activation maps mainly affects the attention for the
adversarial class, while the attention for the true class changes
minimally.</p>
        <p>An interesting observation is that the activation map of one class
can retract from the perturbed pixel while the activation map of
another class becomes intensely focused on it. This sheds light on the
fact that some of the learned features are complementary, and when the
network focuses on one feature, it loses attention on the
complementary feature. Further, the influence of perturbed pixels can
be visualised through propagation maps [Vargas and Su, 2020]. As the
perturbed pixels call attention towards another part of the image,
another texture is analysed and weighted more.</p>
        <p>We believe each adversarial attack has its own characteristic
strategy for distorting the saliency maps and activation maps of
adversarial samples, as Projected Gradient Descent Attack and Pixel
Attack change the saliency maps and activation maps differently. This
sheds light on why some defences are successful against some attacks
while remaining vulnerable to others. It also calls for a deeper
understanding of how adversarial samples, and attacks in general,
affect the attention of neural networks, in order to propose effective
defences.</p>
        <p>[Figure panels (images not recovered): $x$ with $AM_C$; $\hat{x}$ with $AM_{\hat{C}}$ and $\widehat{AM}_C$, for Pixel Attack and Projected Gradient Descent Attack.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This paper analysed Saliency Maps (SM) and Gradient-weighted Class
Activation Maps (Grad-CAM) of original and adversarial images. We
analysed the distortion in the saliency of adversarial images compared
to original images to verify the hypothesis of conflicting saliency,
using adversarial images created by Pixel Attack and Projected Gradient
Descent Attack. The two attacks differ both in how adversarial
perturbations are found (Pixel Attack is black-box, while PGD is
white-box) and in how they are added to the image (Pixel Attack is an
L0 norm attack, while PGD is an L∞ norm attack).</p>
      <p>Experimental results show that both the black-box and the
white-box attack, despite their different norm constraints, distort the
neural network’s saliency to cause adversarial images to be
misclassified; adversarial attacks do not naively fool neural networks.
Moreover, the results reveal that Pixel Attack and Projected Gradient
Descent Attack distort the saliency maps and activation maps
differently. Pixel Attack, a black-box attack, either calls the image’s
saliency towards the perturbed pixels or diverts it from them,
effectively changing the region of attention of the intermediate
convolution layers of the neural network. The Projected Gradient
Descent Attack, a white-box attack, on the other hand, diffuses the
saliency around the region of interest to induce misclassification.
Further, this diffusion of saliency causes intermediate convolution
layers to lose attention for the correct class around the region of
interest while gaining attention for the misclassified class on the
intended region.</p>
      <p>Thus, this paper analysed saliency for adversarial images and shed
light on the conflicting saliency hypothesis raised in [Vargas and Su,
2020]. It also deepens the understanding of adversarial attacks by
analysing the effect of adversarial samples on neural networks. As we
show, the two attacks evaluated differ in their strategy to distort the
attention of the neural network. We believe this sheds light on why
some adversarial defences mitigate some adversarial attacks while
remaining vulnerable to others. We hope this analysis will help the
community understand the existence of adversarial samples and their
effect on neural networks, and develop more robust neural networks and
better adversarial defences.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by JST, ACT-I Grant Number
JP50243 and JSPS KAKENHI Grant Number JP20241216.</p>
    </sec>
    <sec id="sec-8">
      <title>Extra Images of Saliency Maps For Pixel Attack</title>
      <p>Extra Images of Gradient-weighted Class Activation Maps For Pixel Attack</p>
      <p>Extra Images of Saliency Maps For Projected Gradient Descent Attack</p>
      <p>Extra Images of Gradient-weighted Class Activation Maps For Projected Gradient Descent Attack</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Athalye et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Anish</given-names>
            <surname>Athalye</surname>
          </string-name>
          , Nicholas Carlini, and
          <string-name>
            <given-names>David</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <article-title>Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples</article-title>
          .
          <source>In ICML</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Brown et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Tom B</given-names>
            <surname>Brown</surname>
          </string-name>
          , Dandelion Mané, Aurko Roy, Martín Abadi, and
          <string-name>
            <given-names>Justin</given-names>
            <surname>Gilmer</surname>
          </string-name>
          .
          <article-title>Adversarial patch</article-title>
          .
          <source>arXiv preprint arXiv:1712.09665</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Carlini and Wagner,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Nicholas</given-names>
            <surname>Carlini</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <article-title>Towards evaluating the robustness of neural networks</article-title>
          .
          <source>In 2017 IEEE Symposium on Security and Privacy (SP)</source>
          , pages
          <fpage>39</fpage>
          -
          <lpage>57</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Fawzi et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Alhussein</given-names>
            <surname>Fawzi</surname>
          </string-name>
          , Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Soatto</surname>
          </string-name>
          .
          <article-title>Empirical study of the topology and geometry of deep networks</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>3762</fpage>
          -
          <lpage>3770</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Goodfellow et al.,
          <year>2014</year>
          ] Ian J Goodfellow, Jonathon Shlens, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          .
          <article-title>Explaining and harnessing adversarial examples</article-title>
          .
          <source>arXiv preprint arXiv:1412.6572</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [He et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Itti et al.,
          <year>1998</year>
          ]
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Itti</surname>
          </string-name>
          , Christof Koch, and
          <string-name>
            <given-names>Ernst</given-names>
            <surname>Niebur</surname>
          </string-name>
          .
          <article-title>A model of saliency-based visual attention for rapid scene analysis</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          ,
          <volume>20</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1254</fpage>
          -
          <lpage>1259</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Krizhevsky et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          , et al.
          <article-title>Learning multiple layers of features from tiny images</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Madry et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Aleksander</given-names>
            <surname>Madry</surname>
          </string-name>
          , Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Vladu</surname>
          </string-name>
          .
          <article-title>Towards deep learning models resistant to adversarial attacks</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Moosavi-Dezfooli et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Seyed-Mohsen</given-names>
            <surname>Moosavi-Dezfooli</surname>
          </string-name>
          , Alhussein Fawzi, Omar Fawzi, and
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Frossard</surname>
          </string-name>
          .
          <article-title>Universal adversarial perturbations</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>1765</fpage>
          -
          <lpage>1773</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Moosavi-Dezfooli et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Seyed-Mohsen</given-names>
            <surname>Moosavi-Dezfooli</surname>
          </string-name>
          , Alhussein Fawzi, Omar Fawzi, Pascal Frossard, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Soatto</surname>
          </string-name>
          .
          <article-title>Robustness of classifiers to universal perturbations: A geometric perspective</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Nguyen et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Anh</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Jason Yosinski, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Clune</surname>
          </string-name>
          .
          <article-title>Deep neural networks are easily fooled: High confidence predictions for unrecognizable images</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>427</fpage>
          -
          <lpage>436</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Nicolae et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Maria-Irina</given-names>
            <surname>Nicolae</surname>
          </string-name>
          , Mathieu Sinn, Minh Ngoc Tran, Beat Buesser, Ambrish Rawat, Martin Wistuba, Valentina Zantedeschi, Nathalie Baracaldo, Bryant Chen, Heiko Ludwig, Ian Molloy, and
          <string-name>
            <given-names>Ben</given-names>
            <surname>Edwards</surname>
          </string-name>
          .
          <article-title>Adversarial robustness toolbox v1.1.0</article-title>
          .
          <source>CoRR</source>
          , abs/1807.01069,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Papernot et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Papernot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Patrick</given-names>
            <surname>McDaniel</surname>
          </string-name>
          , Xi Wu, Somesh Jha, and
          <string-name>
            <given-names>Ananthram</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Distillation as a defense to adversarial perturbations against deep neural networks</article-title>
          .
          <source>In 2016 IEEE Symposium on Security and Privacy (SP)</source>
          , pages
          <fpage>582</fpage>
          -
          <lpage>597</lpage>
          . IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Selvaraju et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Ramprasaath R</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          , Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and
          <string-name>
            <given-names>Dhruv</given-names>
            <surname>Batra</surname>
          </string-name>
          .
          <article-title>Grad-CAM: Visual explanations from deep networks via gradient-based localization</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          , pages
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Simonyan et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          , Andrea Vedaldi, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Deep inside convolutional networks: Visualising image classification models and saliency maps</article-title>
          .
          <source>arXiv preprint arXiv:1312.6034</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Springenberg et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Jost Tobias</given-names>
            <surname>Springenberg</surname>
          </string-name>
          , Alexey Dosovitskiy, Thomas Brox, and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          .
          <article-title>Striving for simplicity: The all convolutional net</article-title>
          .
          <source>arXiv preprint arXiv:1412.6806</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Su et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Su</surname>
          </string-name>
          , Danilo Vasconcellos Vargas, and
          <string-name>
            <given-names>Kouichi</given-names>
            <surname>Sakurai</surname>
          </string-name>
          .
          <article-title>One pixel attack for fooling deep neural networks</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>828</fpage>
          -
          <lpage>841</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
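          [Szegedy,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          et al.
          <article-title>Intriguing properties of neural networks</article-title>
          .
          <source>In International Conference on Learning Representations</source>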
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Tramèr et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Florian</given-names>
            <surname>Tramèr</surname>
          </string-name>
          , Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>McDaniel</surname>
          </string-name>
          .
          <article-title>Ensemble adversarial training: Attacks and defenses</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Vargas and Su,
          <year>2020</year>
          ]
          <string-name>
            <given-names>Danilo Vasconcellos</given-names>
            <surname>Vargas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>Understanding the one-pixel attack: Propagation maps and locality analysis</article-title>
          .
          <source>In Workshop on Artificial Intelligence Safety (AISafety 2020)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Zeiler and Fergus,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Matthew D</given-names>
            <surname>Zeiler</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rob</given-names>
            <surname>Fergus</surname>
          </string-name>
          .
          <article-title>Visualizing and understanding convolutional networks</article-title>
          .
          <source>In European Conference on Computer Vision</source>
          , pages
          <fpage>818</fpage>
          -
          <lpage>833</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>