<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bamboo: Ball-Shape Data Augmentation Against Adversarial Attacks from All Directions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Huanrui Yang</string-name>
          <email>huanrui.yang@duke.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenhan Wang</string-name>
          <email>wenhanw@microsoft.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jingchi Zhang</string-name>
          <email>jingchi.zhang@duke.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yiran Chen</string-name>
          <email>yiran.chen@duke.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hsin-Pai Cheng</string-name>
          <email>hc218@duke.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hai Li</string-name>
          <email>hai.li@duke.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Duke University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Microsoft</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The robustness of deep neural networks (DNNs) has recently been challenged by adversarial attacks. State-of-the-art defense algorithms improve DNN robustness at a high computational cost. Moreover, these approaches are usually designed against only one or a few known attacking techniques, so their effectiveness against other types of attacks cannot be guaranteed. In this work, we propose Bamboo, the first data augmentation method designed to improve the general robustness of DNNs without any hypothesis on the attacking algorithm. Our experiments show that Bamboo substantially improves the general robustness against arbitrary types of attacks and noise, achieving better results than previous adversarial training methods, robust optimization methods, and other data augmentation methods with the same amount of data points.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, deep neural network (DNN) models (e.g.,
CNNs) have been widely used in many real-world
applications
        <xref ref-type="bibr" rid="ref15 ref4 ref7">(LeCun et al. 1998; Simonyan and Zisserman 2014)</xref>
        .
However, they expose a high sensitivity to input data
samples and therefore are vulnerable to adversarial attacks: a
“small” perturbation applied to an input sample, visually
imperceptible to humans, can result in
the misclassification of DNN models
        <xref ref-type="bibr" rid="ref1 ref17 ref18 ref6 ref9">(Szegedy et al. 2013;
Carlini and Wagner 2017; Madry et al. 2018)</xref>
        , indicating a
serious threat against the systems using DNN models.
      </p>
      <p>
        Many approaches have also been proposed to defend
against adversarial attacks. However, adversarial training
methods
        <xref ref-type="bibr" rid="ref15 ref4 ref9">(Goodfellow, Shlens, and Szegedy 2014; Madry
et al. 2018)</xref>
do not guarantee performance against
previously unseen attacks
        <xref ref-type="bibr" rid="ref1 ref17 ref6">(Carlini and Wagner 2017)</xref>
        . Meanwhile,
solving the min-max problem used in optimization-based
methods
        <xref ref-type="bibr" rid="ref1 ref17 ref19 ref5 ref6">(Sinha, Namkoong, and Duchi 2017; Yan, Guo, and
Zhang 2018)</xref>
often incurs a high computational load.
      </p>
      <p>
        Generally speaking, defending against adversarial
attacks can be considered as a special case of increasing the
generalizability of DNNs to unseen data points. Therefore,
data augmentation methods may also be effective.
Previous studies show that training with additional data sampled
from a Gaussian distribution centered at the original
training data can enhance the model robustness against natural
noise
        <xref ref-type="bibr" rid="ref2">(Chapelle et al. 2001)</xref>
        . The recently proposed Mixup
method
        <xref ref-type="bibr" rid="ref12 ref21">(Zhang et al. 2017)</xref>
        surprisingly improved the DNN
robustness against adversarial attacks. However, these data
augmentation methods may not offer the most efficient way to
enhance the adversarial robustness of DNNs, as they are not
designed against adversarial attacks.
      </p>
      <p>In this work, we propose Bamboo, a ball-shape data
augmentation technique aimed at improving the general
robustness of DNNs against adversarial attacks from all
directions. Without requiring any prior knowledge of the
attacking algorithm, Bamboo can effectively enhance the general
robustness of DNN models against adversarial noise.
Bamboo offers significantly enhanced model robustness
compared to previous robust optimization methods, without
suffering from their high computational complexity.
Compared to other data augmentation methods, Bamboo also
achieves further improvement in model robustness using
the same amount of augmented data. Most importantly, as
our method makes no assumption on the distribution of
adversarial examples, it can work against all kinds of noise.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Measurement of DNN robustness</title>
        <p>
          A metric for measuring the robustness of a DNN is
necessary.
          <xref ref-type="bibr" rid="ref4">(Goodfellow, Shlens, and Szegedy 2014)</xref>
          propose the fast gradient sign
method (FGSM), one of the most efficient and
most commonly applied attacking methods. FGSM generates
an adversarial example x′ using the sign of the local
gradient of the loss function J at a data point x with label y as:
x′ = x + ε · sign(∇_x J(θ; x, y)), where ε controls the strength
of the FGSM attack. Because of its high efficiency in noise generation,
the classification accuracy under the FGSM attack with a
given ε has been taken as a metric of model robustness.
        </p>
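        <p>As a concrete illustration, a minimal PyTorch-style sketch of this single-step update is shown below; the classifier model, the [0, 1] input range, and all identifiers are illustrative assumptions rather than details specified in this paper.</p>
        <preformat>
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    # FGSM: x' = x + eps * sign(grad_x J(theta; x, y)); eps controls the attack strength
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
        </preformat>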
        <p>
          As the FGSM attack leverages only the local gradient for
perturbing the input, it has been found that even if a DNN model
achieves high accuracy under the FGSM attack, it may still
be vulnerable to other attacking methods
          <xref ref-type="bibr" rid="ref10">(Papernot et al.
2016)</xref>
          .
          <xref ref-type="bibr" rid="ref9">(Madry et al. 2018)</xref>
          propose projected gradient
descent (PGD), which attacks the input with a multi-step
variant of FGSM in which each step is projected back onto a set x + S in
the vicinity of the data point x. A single step of
PGD noise generation can be formulated as: x_{t+1} =
Π_{x+S}(x_t + α · sign(∇_x J(θ; x_t, y))), where α is the step size and Π_{x+S} denotes projection onto x + S. Their work shows that,
compared to FGSM, adversarial training using PGD
adversarial examples is more likely to lead to a universally robust model.
Therefore the classification accuracy under the PGD attack
is also an effective metric of model robustness.
        </p>
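        <p>The iteration above can be sketched as follows, reusing the imports from the FGSM sketch and assuming, purely for illustration, that S is the L-infinity ball of radius eps, that alpha is the per-step size, and that inputs lie in [0, 1]:</p>
        <preformat>
def pgd_attack(model, x, y, eps, alpha, steps):
    # Multi-step FGSM; every step is projected back onto x + S (here an L-infinity ball of radius eps)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x + (x_adv + alpha * x_adv.grad.sign() - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
        </preformat>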
        <p>
          Besides these gradient-based methods, the generation of
adversarial examples can also be viewed as an optimization
process.
          <xref ref-type="bibr" rid="ref18">(Szegedy et al. 2013)</xref>
          describe the general objective
of untargeted attacks as: minimize D(x, x + δ) s.t. C(x + δ) ≠ C(x),
where D is the distance measure (we use the L2 distance here), C is the classification result of
the DNN, and x′ = x + δ is the adversarial example to
be found. The CW attack
          <xref ref-type="bibr" rid="ref1 ref17 ref6">(Carlini and Wagner 2017)</xref>
          defines an
objective function f such that C(x + δ) ≠ C(x) if and only
if f(x + δ) ≤ 0. With the use of f, the optimization can be
formulated as: minimize D(x, x + δ) + c · f(x + δ). Such an
objective leads to a higher chance of finding the optimal δ
efficiently
          <xref ref-type="bibr" rid="ref1 ref17 ref6">(Carlini and Wagner 2017)</xref>
          . Since the objective
of the CW attack is to find the minimal perturbation
strength of a successful attack, the average strength required
for a successful CW attack can be considered a reasonable
measurement of model robustness.
        </p>
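        <p>A simplified sketch of this optimization is shown below (reusing the imports from the FGSM sketch). It minimizes ||δ||₂² + c · f(x + δ) by plain gradient descent with a fixed c; the original CW attack additionally applies a change of variables and a binary search over c, which are omitted here, and all names are illustrative.</p>
        <preformat>
def cw_l2_attack(model, x, y, c=1.0, steps=200, lr=0.01):
    # Minimize ||delta||_2^2 + c * f(x + delta), where f reaches zero exactly when x + delta is misclassified
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # largest logit among all classes other than the true one
        other_logit = logits.scatter(1, y.unsqueeze(1), float("-inf")).max(dim=1).values
        f = torch.clamp(true_logit - other_logit, min=0.0)
        loss = (delta ** 2).flatten(1).sum(dim=1).mean() + c * f.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()
        </preformat>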
      </sec>
      <sec id="sec-2-2">
        <title>Previous works increasing network robustness</title>
        <p>
          There are previous attempts to derive a bound of the DNN
robustness theoretically
          <xref ref-type="bibr" rid="ref1 ref12 ref17 ref21 ref6">(Peck et al. 2017; Hein and
Andriushchenko 2017)</xref>
          , but the obtained bounds are often too
loose or too complicated to serve as a guideline for robust
training. A more practical approach is adversarial training.
For example, we can generate adversarial examples from
the training data and then include their classification loss in
the loss function
          <xref ref-type="bibr" rid="ref15 ref4 ref9">(Goodfellow, Shlens, and Szegedy 2014;
Madry et al. 2018)</xref>
          . This method can be efficiently
optimized for a limited set of known adversarial attacks.
However, it may not guarantee robustness against other
attacking methods, especially newly proposed ones.
Alternatively, the defender may generate the worst-case
adversarial examples of the training data online and minimize their
loss by solving a min-max
optimization problem during training. For instance,
the distributional robustness method
          <xref ref-type="bibr" rid="ref1 ref17 ref6">(Sinha, Namkoong,
and Duchi 2017)</xref>
          uses the objective: minimize F(θ) :=
E[sup_{x′} {L(θ; x′) − γ · D(x′, x)}], to train the weights θ of
a DNN model that minimize the loss L of an adversarial
example x′ that is near the original data point x but attains
the supremum loss. This method can achieve some robustness
improvement, but suffers from a high computational cost for
optimizing both the network weights and the potential
adversarial example. Also, this work focuses on small-perturbation
attacks, so the robustness guarantee may not
hold under large
attacking strength
          <xref ref-type="bibr" rid="ref1 ref17 ref6">(Sinha, Namkoong, and Duchi 2017)</xref>
          .
        </p>
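        <p>For intuition, the inner supremum can be approximated by a few gradient-ascent steps on the penalized loss, as in the sketch below (reusing the imports from the FGSM sketch, and assuming the distance D is the squared L2 norm and L is the cross-entropy loss; the exact procedure and guarantees of the cited work are not reproduced here). The outer training step then minimizes the loss of the model on the returned examples.</p>
        <preformat>
def inner_maximize(model, x, y, gamma, steps=15, lr=0.1):
    # Approximate sup_{x'} { L(theta; x') - gamma * D(x', x) } by gradient ascent on x'
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_adv], lr=lr)
    for _ in range(steps):
        penalty = gamma * ((x_adv - x) ** 2).flatten(1).sum(dim=1).mean()
        objective = F.cross_entropy(model(x_adv), y) - penalty
        opt.zero_grad()
        (-objective).backward()  # ascend the objective by descending its negative
        opt.step()
    return x_adv.detach()
        </preformat>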
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <sec id="sec-3-1">
        <title>Vicinity risk minimization for robustness</title>
        <p>
          Most of the supervised machine learning algorithms follow
the principle of empirical risk minimization (ERM), which
is based on the hypothesis that the testing data has a similar
distribution to the training data, so minimizing the loss on
the training data would naturally lead to the minimum
testing loss. However, the distribution of adversarial examples
generated by attacking algorithms may differ from that of the
original training data. Thus DNN models trained with
ERM can have unsatisfactory performance on adversarial
examples
          <xref ref-type="bibr" rid="ref15 ref4">(Goodfellow, Shlens, and Szegedy 2014)</xref>
          .
        </p>
        <p>
          Instead of ERM, the vicinity risk minimization (VRM)
principle aims to minimize the vicinity risk R̂ on
virtual data pairs (x̂, ŷ) sampled from a vicinity distribution
P̂(x̂, ŷ | x, y) generated from the original training set
distribution P(x, y)
          <xref ref-type="bibr" rid="ref2">(Chapelle et al. 2001)</xref>
          . Consequently, the
optimization objective of VRM-based training can be
described as: minimize R̂(θ) := E_{(x̂, ŷ)} L(f(x̂; θ), ŷ).
        </p>
        <p>
          For most attacking algorithms, there is a
constraint on the strength of the perturbation, so the
adversarial example x̂ can be considered to lie within an r-radius ball
around the original data point x. Without any prior knowledge
of the attacking algorithm, we can consider the adversarial
examples as uniformly distributed within the r-radius ball:
x̂ ~ Uniform(||x̂ − x||₂ ≤ r). However, directly sampling
the virtual data point x̂ within the ball may be data
inefficient. Here we propose to further improve the data
efficiency by utilizing a geometric analysis of the DNN model.
Previous research shows that the curvature of the DNN's
decision boundary near a training data point is most likely
very small
          <xref ref-type="bibr" rid="ref15 ref3 ref4">(Goodfellow, Shlens, and Szegedy 2014;
Fawzi, Moosavi-Dezfooli, and Frossard 2016)</xref>
          . These
observations suggest that minimizing the loss of data points sampled
within the ball can be approximated by minimizing the loss
of data points sampled on the surface of the ball. Formally, the
vicinity distribution can be modified to:
        </p>
        <p>P̂(x̂, ŷ | x, y) = Uniform(||x̂ − x||₂ = r) · δ(ŷ, y).   (1)</p>
        <p>By optimizing the VRM objective with this vicinity
distribution, we can improve the robustness of the DNN against
adversarial attacks with higher data efficiency in sampling the
virtual data points for augmentation.</p>
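        <p>To make the sampling in Eq. (1) concrete, a minimal NumPy sketch is given below; drawing an isotropic Gaussian vector and normalizing it yields a direction uniform on the unit sphere, and the function name and arguments are illustrative.</p>
        <preformat>
import numpy as np

def sample_on_sphere(x, r, rng=None):
    # Draw one virtual point x_hat uniformly from the sphere ||x_hat - x||_2 = r, as in Eq. (1)
    rng = rng if rng is not None else np.random.default_rng()
    direction = rng.standard_normal(np.shape(x))
    direction = direction / np.linalg.norm(direction)
    return np.asarray(x) + r * direction
        </preformat>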
      </sec>
      <sec id="sec-3-2">
        <title>Bamboo and its intuitive explanation</title>
        <p>We propose Bamboo, a ball-shape data augmentation
scheme that augments the training set with N virtual data
points uniformly sampled from an r-radius ball centered at
each original training data point. Algorithm 1 provides a
formal description of the proposed method.</p>
        <p>
          Since the decision boundary of the DNN model tends
to have small curvature around training data points
          <xref ref-type="bibr" rid="ref3">(Fawzi,
Moosavi-Dezfooli, and Frossard 2016)</xref>
          , including the
augmented data on the ball naturally pushes the decision
boundary further away from the original training data, and therefore
increases the robustness of the learned model. Figure 1 shows
the effect of Bamboo on a simple classification problem.
Here we classify 100 data points sampled from each of the MNIST
classes of digit “3” and digit “7”, using a multi-layer
perceptron with one hidden layer; PCA is used for
visualization. Figure 1a shows the decision boundary without data
augmentation, where the decision boundary is more curved
and overfits the training data. In Figure 1b, the
decision boundary after applying our data augmentation
becomes smoother and lies further away from the original training
points, implying a more robust model when the training set is
augmented with our proposed Bamboo method.
        </p>
        <p>Algorithm 1: Bamboo: Ball-shape data augmentation.
Input: Augmentation ratio N, Ball radius r, Original training set (X, Y).
Output: Augmented training set (X̂, Ŷ).
1: n := length(X);
2: X̂ := X, Ŷ := Y (initialization with the training set data);
3: count := n;
4: for i = 1 : n do
5: x := X[i], y := Y[i];
6: for j = 1 : N do
7: count := count + 1;
8: sample a unit-norm direction ε uniformly at random,
X̂[count] := x + r · ε, Ŷ[count] := y.</p>
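        <p>A compact NumPy sketch of Algorithm 1 follows (assuming array-valued inputs and integer class labels; the function and variable names are illustrative, and the subsequent training procedure is unchanged):</p>
        <preformat>
import numpy as np

def bamboo_augment(X, Y, N, r, seed=0):
    # Ball-shape augmentation (Algorithm 1): for every training example, add N virtual
    # points at distance r along uniformly random directions, keeping the original label.
    rng = np.random.default_rng(seed)
    X, Y = np.asarray(X), np.asarray(Y)
    X_aug, Y_aug = [X], [Y]
    for x, y in zip(X, Y):
        eps = rng.standard_normal((N,) + x.shape)            # N random directions
        norms = np.linalg.norm(eps.reshape(N, -1), axis=1)
        eps = eps / norms.reshape((N,) + (1,) * x.ndim)      # normalize to unit length
        X_aug.append(x + r * eps)
        Y_aug.append(np.full(N, y))
    return np.concatenate(X_aug), np.concatenate(Y_aug)
        </preformat>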
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiment</title>
      <sec id="sec-4-1">
        <title>Experiment setup</title>
        <p>
          To evaluate the effect of the parameters r and N on the
performance of our model, we use the average strength of a
successful CW attack
          <xref ref-type="bibr" rid="ref1 ref17 ref6">(Carlini and Wagner 2017)</xref>
          as the
metric of robustness. When comparing with previous work, we
use both the CW attack strength (marked as CW rob in Table 1)
and the testing accuracy under the FGSM attack
          <xref ref-type="bibr" rid="ref18">(Szegedy et
al. 2013)</xref>
          with ε = 0.1, 0.3, 0.5 respectively (marked as
FGSM1, FGSM3 and FGSM5 in Table 1). The accuracy
under 50 iterations of the PGD attack
          <xref ref-type="bibr" rid="ref9">(Madry et al. 2018)</xref>
          with
ε = 0.3 is also evaluated (marked as PGD3 in Table 1).
We also test the accuracy under Gaussian noise with
variance 0.5 (marked as GAU5 in Table 1), which
demonstrates the robustness against attacks from all directions.
        </p>
        <p>
          To visualize the effect on the decision boundary, we follow
the setting used in
          <xref ref-type="bibr" rid="ref19 ref5">(He, Li, and Song 2018)</xref>
          , where
we use 784 random orthogonal directions for MNIST and
1000 random orthogonal directions for CIFAR-10, and linearly
search along each direction for the decision boundary. For each testing data point we
find its distance to the decision boundary along each direction, and we report
the average of the top 20 smallest distances across
all the testing data points, which indicates the overall effectiveness
of different methods in increasing the robustness.
        </p>
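        <p>A minimal sketch of this per-direction linear search is shown below (assuming a classifier model, a single input x with a batch dimension, a unit-norm direction of the same shape, and illustrative search limits):</p>
        <preformat>
import torch

def boundary_distance(model, x, direction, max_dist=2.0, step=0.01):
    # Walk along one direction and return the smallest distance at which the
    # predicted class changes, as an estimate of the decision-boundary distance
    base_label = model(x).argmax(dim=1)
    for k in range(1, int(max_dist / step) + 1):
        dist = k * step
        moved_label = model(x + dist * direction).argmax(dim=1)
        if not torch.equal(moved_label, base_label):
            return dist
    return max_dist
        </preformat>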
        <p>Figure 2: (a) Testing accuracy; (b) CW robustness.</p>
        <p>
Bamboo augmentation has two hyper-parameters: the ball
radius r and the augmentation ratio N. In Figure 2a,
when we fix the radius r, the testing accuracy increases as
the number of augmented points grows; adjusting the
radius has little impact on the testing accuracy. Figure 2b
shows that when r is fixed, the robustness improves as
N increases, although the benefit of further increasing N
diminishes as N gets larger. With the same amount of data,
increasing the radius r can also enhance the robustness, while
the effect of increasing r saturates as r gets larger.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Boundary visualization</title>
        <p>Figure 3 shows the top 20 smallest decision-boundary distances along
random orthogonal directions, averaged across the MNIST and
CIFAR-10 testing points respectively. Compared to
previous methods, our Bamboo data augmentation provides the
largest gain in robustness along the most vulnerable directions.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Performance comparison</title>
        <p>Table 1 summarizes the performance of the DNN model
trained with Bamboo compared to other methods.
Bamboo achieves the highest robustness under the CW attack in
both the MNIST and CIFAR-10 experiments, and the smallest
accuracy drop under Gaussian noise. Bamboo
demonstrates higher robustness against a wide range of attacking
methods, and the performance of our method is less
sensitive to changes in the attacking strength. Also, the overall
performance of Bamboo is better than Mixup with the same
amount of augmented data. All these observations lead to
the conclusion that our proposed Bamboo method can
effectively improve the overall robustness of DNN models, no
matter which kind of attack is applied or which direction of
noise is added. The ImageNet experiment results shown in
Table 2 show the same trend as well.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>In this work we propose Bamboo, the first data augmentation
method that is specially designed for improving the overall
robustness of DNNs. Without making any assumption on the
distribution of adversarial examples, Bamboo is able to
effectively improve the robustness of DNN models against
different kinds of attacks, and can achieve stable performance
on large DNN models or when facing strong adversarial attacks.</p>
      <p>In future work we will study the theoretical relationship
between the resulting DNN robustness and the parameters
of our method, and how changes in the scale of the
classification problem affect this relationship. We will also
develop new training techniques better suited for training with the
augmented dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Carlini and Wagner</source>
          <year>2017</year>
          ] Carlini,
          <string-name>
            <given-names>N.</given-names>
            , and
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Towards evaluating the robustness of neural networks</article-title>
          .
          <source>In Security and Privacy (SP)</source>
          ,
          <source>2017 IEEE Symposium on</source>
          ,
          <fpage>39</fpage>
          -
          <lpage>57</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Chapelle et al. 2001]
          <string-name>
            <surname>Chapelle</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Vicinal risk minimization</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <volume>416</volume>
          -
          <fpage>422</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Fawzi,
          <string-name>
            <surname>Moosavi-Dezfooli</surname>
          </string-name>
          , and Frossard 2016]
          <string-name>
            <surname>Fawzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Moosavi-Dezfooli</surname>
            , S.-M.; and Frossard,
            <given-names>P.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Robustness of classifiers: from adversarial to random noise</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <volume>1632</volume>
          -
          <fpage>1640</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Goodfellow, Shlens, and Szegedy 2014]
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shlens</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Explaining and harnessing adversarial examples</article-title>
          .
          <source>arXiv preprint arXiv:1412</source>
          .
          <fpage>6572</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [He,
          <string-name>
            <surname>Li</surname>
          </string-name>
          , and Song 2018]
          <string-name>
            <surname>He</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Decision boundary analysis of adversarial examples</article-title>
          .
          <source>In International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Hein and Andriushchenko</source>
          <year>2017</year>
          ] Hein,
          <string-name>
            <given-names>M.</given-names>
            , and
            <surname>Andriushchenko</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Formal guarantees on the robustness of a classifier against adversarial manipulation</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <volume>2263</volume>
          -
          <fpage>2273</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [LeCun et al.
          <year>1998</year>
          ] LeCun, Y.;
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and Haffner,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <year>1998</year>
          .
          <article-title>Gradient-based learning applied to document recognition.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Madry et al. 2018]
          <string-name>
            <surname>Madry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Makelov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tsipras</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vladu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Towards deep learning models resistant to adversarial attacks</article-title>
          .
          <source>In International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Papernot et al. 2016]
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McDaniel</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ; Jha,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Celik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. B.</given-names>
            ; and
            <surname>Swami</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Practical black-box attacks against deep learning systems using adversarial examples</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>CoRR abs/1602</source>
          .02697.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Peck et al. 2017]
          <string-name>
            <surname>Peck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Roels</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goossens</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and Saeys,
          <string-name>
            <surname>Y.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          2017.
          <article-title>Lower bounds on the robustness to adversarial perturbations.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <volume>804</volume>
          -
          <fpage>813</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Simonyan and Zisserman</source>
          <year>2014</year>
          ]
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zisserman</surname>
          </string-name>
          , A.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          2014.
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>arXiv preprint arXiv:1409</source>
          .
          <fpage>1556</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Sinha, Namkoong, and Duchi 2017]
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Namkoong</surname>
          </string-name>
          , H.; and
          <string-name>
            <surname>Duchi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Certifiable distributional robustness with principled adversarial training</article-title>
          .
          <source>arXiv preprint arXiv:1710</source>
          .
          <fpage>10571</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Szegedy et al. 2013]
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zaremba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Sutskever,
          <string-name>
            <surname>I.</surname>
          </string-name>
          ; Bruna,
          <string-name>
            <given-names>J.</given-names>
            ;
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ;
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. J</surname>
          </string-name>
          .; and Fergus,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2013</year>
          .
          <article-title>Intriguing properties of neural networks</article-title>
          .
          <source>CoRR abs/1312</source>
          .6199.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Yan, Guo, and Zhang 2018]
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and Zhang,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>Deepdefense: Training deep neural networks with improved robustness</article-title>
          . arXiv preprint arXiv:
          <year>1803</year>
          .00404.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Zhang et al.
          <year>2017</year>
          ] Zhang, H.; Cisse,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            ; and
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>mixup: Beyond empirical risk minimization</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>arXiv preprint arXiv:1710</source>
          .
          <fpage>09412</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>