<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adversarial Examples Through Deep Neural Network's Classification Boundary and Uncertainty Regions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Shu</string-name>
          <email>shu30@purdue.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bowei Xi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles Kamhoua</string-name>
          <email>charles.a.kamhoua.civ@army.mil</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Purdue University</institution>
          ,
          <addr-line>West Lafayette, IN 47907</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>US Army Research Laboratory</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Although AI is developing rapidly, AI's vulnerability under adversarial attacks remains an extraordinarily difficult problem. In this paper we study the root cause of adversarial examples through studying the deep neural network's (DNN) classification boundary. The existing attack algorithms can generate from a handful to a few hundred adversarial examples given one clean sample. We show there are a lot more adversarial examples given one clean sample, all within a small neighborhood of the clean sample. We then define DNN uncertainty regions and show the transferability of adversarial examples is not universal. The results lead to two conjectures regarding the size of the DNN uncertainty regions and where the DNN function becomes discontinuous. The conjectures offer a potential explanation for why the generalization error bound - the theoretical guarantee established for DNN - cannot adequately capture the phenomenon of adversarial examples.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Neural Network</kwd>
        <kwd>Adversarial Machine Learning</kwd>
        <kwd>Classification Boundary</kwd>
        <kwd>Uncertainty Region</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        After DNN gained popularity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], researchers noticed that adding targeted minor perturbations to a clean image can cause a DNN to misclassify the perturbed image [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. And often it is used in critical applications without fully understanding its vulnerabilities.
      </p>
      <p>Despite decades of theoretical research on DNN, there
are still many unanswered questions regarding DNN’s
properties. For example, we do not know the shape of
DNN classification boundary. There is also a discrepancy
between the established generalization error bounds for
DNN and the existence of adversarial examples.</p>
      <p>
        We know the shape of the decision boundary of many well known models, such as linear regression, generalized linear regression, non-parametric regression, and support vector machine (SVM), to name a few. Despite much work on building robust DNNs and evaluating DNN robustness, we are yet to know the shape of the DNN classification boundary. A lack of understanding of DNN's classification boundary naturally leads to the fact that we do not know where the regions containing the adversarial examples are. There are conflicting conjectures about the regions containing the adversarial examples. [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] believed adversarial examples lie in “dense pockets” in a lower dimensional manifold, caused by DNN's non-linearity. On the other hand, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] believed it is DNN's linear nature and the very high dimensional input space that give rise to adversarial examples.
      </p>
      <p>The major contributions of this paper are the following.
1. We show DNN classification boundary is highly fractured, unlike other classifiers. There are lower dimensional regions containing adversarial examples within a small neighborhood surrounding every clean image.
2. Our first conjecture is that the union of these lower dimensional bounded regions containing adversarial examples has zero probability mass. Our second conjecture is that a DNN function is discontinuous at the boundary of these lower dimensional bounded regions, and may be discontinuous inside some of these bounded regions. The two conjectures could be the reason that the theoretical guarantees established for DNN, such as the generalization error bounds, co-exist with the adversarial examples. Hence new theory is needed to evaluate DNN robustness.</p>
      <p>3. We show that transferability of adversarial examples is not universal, contrary to [
        <xref ref-type="bibr" rid="ref2 ref4 ref5">2, 4, 5</xref>
        ], which suggested that adversarial examples generated against one DNN are misclassified by other DNNs, even if they have different model structures or are trained on different subsets of the training data.</p>
      <p>We show that adversarial examples against one DNN can be correctly classified by some other DNN models, simply by using different initial random seeds in the training process. This leads to our definition of DNN uncertainty regions.</p>
      <p>Besides the three major contributions, additional contributions of this paper are the following.</p>
      <p>1. Given one clean image, existing attack algorithms generate up to a few hundred adversarial examples. Sampling from the lower dimensional region leads to a stronger attack, generating a lot more adversarial examples given one clean image.
2. Far fewer pixels are perturbed to form these hyper-rectangles compared to the existing attack algorithms. Therefore we reduce the total amount of perturbations added to a clean image to create adversarial examples.</p>
      <p>The paper is organized as follows. Section 1.1 discusses the related work. Section 2 conducts experiments to establish the shape of the DNN classification boundary and introduces the concept of DNN uncertainty regions. Section 3 discusses the discrepancy between the theoretically proven DNN large sample property, its generalization error bound, and the existence of adversarial examples. Section 4 concludes this paper.</p>
      <sec id="sec-1-1">
        <title>1.1. Related Work</title>
          <p>There are two broad categories of attacks, poisoning attacks and evasion attacks [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Poisoning attacks inject malicious samples into the training data, to cause the resulting learning model to make a mistake with certain test samples. Assuming there is no easy access to the training process, evasion attacks generate test samples that the learning model cannot handle correctly. The adversarial examples generated to attack DNN belong to evasion attacks. Depending on adversaries' knowledge of a DNN model, there are white-box attacks and black-box attacks. For white-box attacks, adversaries know the true DNN model, including model structure and parameter values. For black-box attacks, adversaries don't know the true model. Instead, adversaries query the true model, build a local substitute model based on the queries, and attack the local model. A targeted attack generates adversarial examples that are misclassified into a pre-determined class, while an untargeted attack simply generates misclassified samples. Several survey papers are published, introducing the current state and the timeline of attacks and defenses, e.g., [
            <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
            ]. In general, the attack algorithms follow an optimization approach, i.e., generating adversarial examples through minimizing a loss function.
          </p>
          <p>
            Adversarial evasion attacks against DNN are the
earliest attacks. Recently there are attacks designed to break
graph neural network (e.g., [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]), recurrent neural network
(e.g., [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]) etc. In this paper, we examine the classification
boundary and uncertainty regions of CNN and MLP. In
our experiments we use Foolbox [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], which implements
a large collection of adversarial attack algorithms.
          </p>
          <p>Let x be a clean image and x′ be an adversarial example. Let F be a trained DNN model that assigns a class label to x, with F(x′) ≠ F(x). x is a matrix for a gray-scale image, and a tensor for a color image. The size of the matrix/tensor is determined by the image resolution. The individual elements (pixels) in x represent the light level, having integer values ranging from 0 (no light) to 255 (maximum light). The pixels are rescaled to [0, 1] by dividing the pixel value by 255. x can be vectorized. Assume a vectorized x is d-dimensional, i.e., x ∈ [0, 1]^d. Attack algorithms that generate a single x′ or only a handful of x′s are not used in our experiments, because there are not enough adversarial examples to locate the region containing these x′s. We also exclude attack algorithms that need large perturbations to generate x′. Here are the attack algorithms that are used in our experiments: (1) Pointwise (PW) Attack; (2) Carlini &amp; Wagner L2 (CW2) Attack; (3) NewtonFool (NF) Attack; (4) Fast Gradient Sign Method (FGSM); (5) Basic Iterative Method (BIM) L1, L2, L∞ attacks; (6) Momentum Iterative (MI) Attack.</p>
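          <p>As an illustration of how such attacks can be run in practice, the sketch below uses Foolbox [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] with a PyTorch classifier. The stand-in model, the batch of random images, and the particular attack classes are assumptions for illustration only, not the exact code used in our experiments.</p>
          <preformat>
# Sketch: generating adversarial examples with Foolbox (assumed Foolbox 3.x API).
import torch
import torch.nn as nn
import foolbox as fb

# Stand-in classifier (the paper re-trains LeNet/MLP/MobileNet models instead).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()
fmodel = fb.PyTorchModel(model, bounds=(0, 1))   # pixels rescaled to [0, 1]

images = torch.rand(8, 1, 28, 28)                # stand-in batch of clean images
labels = model(images).argmax(dim=1)             # labels the model currently assigns

attacks = {
    "CW2":    fb.attacks.L2CarliniWagnerAttack(),
    "FGSM":   fb.attacks.LinfFastGradientAttack(),
    "BIM-L2": fb.attacks.L2BasicIterativeAttack(),
}
for name, attack in attacks.items():
    # raw: perturbed images; clipped: perturbations clipped to the epsilon budget;
    # success: whether each perturbed image fools the target model.
    raw, clipped, success = attack(fmodel, images, labels, epsilons=1.5)
    print(name, "success rate:", success.float().mean().item())
</preformat>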
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. DNN’s Uncertainty Regions</title>
      <p>A DNN function is described by F(x) = y, where y is the object class assigned to image x by a trained DNN model used to classify the samples. In this section we use the existing attack algorithms to locate lower dimensional regions containing adversarial examples within a small neighborhood of a clean image, and we discover there exist multiple uncertainty regions inside a ball B(x, ε).</p>
      <sec id="sec-2-1">
        <title>DNN Model Structure</title>
        <p>Because we focus on studying the classification boundary of DNN, here the DNN model structure must strictly remain the same. We discover that even a minor change to the model structure, such as adding or removing a batch normalization layer, will lead to a different classification boundary.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Uncertainty Region Construction</title>
        <p>We compare an adversarial example x′ with the corresponding clean x. If a pixel value in x′ is different than that in x, it is perturbed by the attack. Given a clean x, we use one attack algorithm and generate a sufficient amount of adversarial examples that are all mis-classified into the same wrong object class. We examine how many pixels are perturbed by the attack. Then we compute the interval for each perturbed pixel (the original x has a single value for this pixel). The perturbed pixels are ordered by the interval sizes from the largest to the smallest. We then construct a hyper-rectangle starting from the largest interval, and stop where the subsequent intervals can be considered as nearly constant (which may not equal the original pixel value for clean x). The detailed procedure is described as follows.</p>
        <p>Assume F1 is the model under attack. For a given attack algorithm a and an object class j, j ≠ i, where i is the true object class of x, we combine the adversarial examples x′ from both the targeted attack and the untargeted attack, s.t. F1(x′) = j. We then construct the subspace spanned by the x′s. This step requires an attack algorithm to generate a sufficient amount of perturbed images, at least 80-100 images, given one clean image. We notice that different attack algorithms discover different regions containing the adversarial examples for one clean image. Only a handful of adversarial examples is not enough to locate a region containing these adversarial examples. The more adversarial examples an attack algorithm can generate, the better we can locate such a region. Let ℳ denote the collection of DNN models with the same structure, trained with different initial random seeds. An uncertainty region is a region that cannot be separated into disjoint regions, where at least two DNN models Fi and Fj disagree on the hard label of x.</p>
        <p>Definition 1. An uncertainty region is defined as U := { x : ∃ Fi, Fj ∈ ℳ, s.t. Fi(x) ≠ Fj(x) }, where U cannot be separated into disjoint regions in [0, 1]^d.</p>
        <p>We use the L2 distance between a clean image x and an adversarial image x′, d(x, x′) = ||x − x′||2. Let ε be the radius of an ε-ball with the clean image x at the center, ε &gt; 0.</p>
        <p>Denote the ε-ball by B(x, ε) := { x′ : d(x, x′) ≤ ε }. When ε is sufficiently small, the points in B(x, ε) are noisy versions of x and should be labeled to the same object class as x. Given a clean image x, we can determine the value of ε based on the amount of adversarial perturbations. We choose ε to be slightly larger than the minimum amount of adversarial perturbations calculated from a number of attack algorithms. Figure 2 (b) conceptually shows two separate uncertainty regions in B(x, ε).</p>
        <p>Clean natural sample: We consider a clean natural image as the result of taking a photo using a camera. Regarding how many clean images we can have, let's consider the volume of a d-dimensional ball B(x, ε), |B(x, ε)| = π^(d/2) ε^d / Γ(d/2 + 1). The volume of the feature space [0, 1]^d is 1. For a fixed ε and d, there are only a finite number of non-overlapping ε-balls in the feature space [0, 1]^d. However, as d ⟶ ∞, we have |B(x, ε)| ⟶ 0. Hence the feature space for higher resolution color images can contain increasingly more clean images.</p>
        <p>The intervals of the perturbed pixels are ranked by interval size as I_(j,1) ≥ I_(j,2) ≥ ⋯ ≥ I_(j,q). We construct a hyper-rectangle R_j^(a) in B(x, ε) using the k largest intervals, k ≤ q, as R_j^(a) = ⊗_{l=1}^{k} [min_j^(a)(x_(l)), max_j^(a)(x_(l))], where min_j^(a)(x_(l)) and max_j^(a)(x_(l)) are the smallest and the largest perturbed values of the l-th ranked pixel among the adversarial examples. R_j^(a) is the subspace based on the adversarial examples generated by attack a and misclassified to class j. We choose the number of intervals k such that the remaining interval sizes are very small and the perturbations added can be considered as approximately constant.</p>
        <p>We use PyTorch 1.5.0 and Cuda 10.2 to run all the experiments. The CPU is an Intel Xeon Silver 4114 and the GPU is an Nvidia Tesla P100. The code is posted on GitHub (https://github.com/juanshu30/Understanding-AdversarialExamples-Through-DNNs-Classification-Boundary-andUncertainty-Regions).</p>
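        <p>The construction can be summarized by the following sketch: given adversarial examples that the target model misclassifies into the same class j, compute the per-pixel [min, max] intervals, rank them by width, and keep the k widest as the hyper-rectangle R_j^(a). Function and variable names here are illustrative assumptions, not our released code.</p>
        <preformat>
import numpy as np

def build_hyper_rectangle(clean, advs, k):
    """clean: (d,) clean image; advs: (m, d) adversarial examples, all
    misclassified into the same wrong class; k: number of intervals kept."""
    lo, hi = advs.min(axis=0), advs.max(axis=0)       # per-pixel interval [min, max]
    perturbed = np.flatnonzero(np.any(advs != clean, axis=0))
    widths = hi[perturbed] - lo[perturbed]
    order = perturbed[np.argsort(-widths)]            # rank intervals, widest first
    keep = order[:k]                                  # the k largest intervals span R_j^(a)
    return keep, lo[keep], hi[keep]

def sample_from_rectangle(clean, keep, lo, hi, n):
    """Sample n images inside the hyper-rectangle; unperturbed pixels keep
    their clean values."""
    samples = np.tile(clean, (n, 1))
    samples[:, keep] = np.random.uniform(lo, hi, size=(n, keep.size))
    return samples
</preformat>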
        <sec id="sec-2-2-1">
          <title>2.1. MNIST CNN Experiment</title>
          <p>Here we conduct an experiment with the task to classify the MNIST dataset of 10 handwritten digits. MNIST has 60,000 training images and 10,000 test images. Each image has 28x28 gray-scale pixels. Our model structure is the PyTorch implementation [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] of LeNet [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], which has two convolutional layers. The model structure has been published previously. We re-train LeNet on MNIST to optimize the parameter values. The optimizer is SGD with learning rate 0.01. x is a vectorized MNIST image with pixels rescaled to [0, 1] in the PyTorch implementation. We have x ∈ [0, 1]^784. Table 1 shows the accuracy of 10 re-trained LeNet models on the MNIST test data using different initial seeds. F1 to F10 have similar performance on clean test data.
          </p>
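          <p>A minimal sketch of this re-training setup is shown below, assuming the published LeNet structure is available as a PyTorch module LeNet; the optimizer (SGD with learning rate 0.01) and the varying initial seeds follow the text, while the number of epochs and the batch size are illustrative.</p>
          <preformat>
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_lenet(seed, model_cls, epochs=10):
    torch.manual_seed(seed)                      # only the initial seed changes
    model = model_cls()                          # same structure for F1, ..., F10
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    train = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())   # pixels in [0, 1]
    loader = DataLoader(train, batch_size=64, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# models = [train_lenet(seed, LeNet) for seed in range(10)]   # F1 to F10
</preformat>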
          <p>
            Intuitively, the ten handwritten digits have distinct
features that facilitate the classification task. Hence LeNet
can achieve nearly 99% accuracy. We visualize the digits using t-Distributed Stochastic Neighbor Embedding (t-SNE) [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], a nonlinear dimension reduction
technique. Figure 3 provides a 2D projection of the ten
digits, based on 2000 sampled images. We observe 10
clusters of digits though some clusters overlap slightly. We
would expect a classifier to divide up the feature space,
and allow a digit class to occupy a portion of the
feature space. Then the points away from the classification
boundary and their surrounding neighborhoods would
all belong to the same object class. Unfortunately this
is not what we see from DNN. We need to draw DNN’s
classification boundary around every clean image, not
along the border between two object classes.
          </p>
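          <p>The 2D projection in Figure 3 can be reproduced along the following lines with scikit-learn's t-SNE; the sample size of 2000 follows the text, and the remaining settings are illustrative defaults.</p>
          <preformat>
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=False, download=True,
                       transform=transforms.ToTensor())
idx = np.random.choice(len(mnist), 2000, replace=False)      # 2000 sampled images
X = np.stack([mnist[i][0].numpy().ravel() for i in idx])     # vectorized, in [0, 1]^784
y = np.array([mnist[i][1] for i in idx])

emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.show()
</preformat>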
          <p>We choose a clean image x, and generate adversarial examples using the attack algorithms listed in Section 1.1. We then construct the hyper-rectangles R_j^(a). We have studied many test images and training images, and have obtained similar results. Due to the limited space, here we show the results for a digit 1 from the test data. We run the attacks against F1. Table 2 shows the following information.
1. The number of intervals used to construct the hyper-rectangles. For example, CW2 1 → 6 means the digit 1 is mis-classified as 6. 150d means R_6^(CW2) is spanned by the largest 150 intervals.
2. The smallest interval size in R_j^(a), shown in column I^(a). For the PW attack, we use [0, 1] for the selected pixels, since the measured interval sizes are all close to 1. For all other attacks, the interval size is measured from the added perturbations.
3. We sample 1000 images from each R_j^(a), and report the misclassification rates by F1 to F10.</p>
          <p>Table 3 shows the 10 re-trained LeNet models' mis-classification rates against the original adversarial images generated by the attacks, and the number of perturbed pixels. The left three columns in Table 4 show the minimum amount of perturbations (min ||x − x′||2), the maximum amount of perturbations (max ||x − x′||2), and the average amount of perturbations (mean ||x − x′||2) of the 1000 sampled images in each hyper-rectangle R_j^(a). The right three columns in Table 4 show the same information for the adversarial examples generated by the corresponding attacks.</p>
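          <p>The sampling and misclassification-rate computation in item 3 of the Table 2 description can be sketched as follows, re-using the hypothetical sample_from_rectangle helper from Section 2; F1 to F10 stand for the re-trained LeNet models.</p>
          <preformat>
import torch

def misclassification_rates(models, samples, true_label):
    """samples: (n, 784) numpy array drawn from one hyper-rectangle R_j^(a);
    returns the fraction of samples each re-trained model labels incorrectly."""
    x = torch.from_numpy(samples).float().view(-1, 1, 28, 28)
    rates = []
    for model in models:                      # F1, ..., F10
        with torch.no_grad():
            pred = model(x).argmax(dim=1)
        rates.append((pred != true_label).float().mean().item())
    return rates

# Example usage (1000 samples per hyper-rectangle, as in the text):
# samples = sample_from_rectangle(clean, keep, lo, hi, n=1000)
# print(misclassification_rates([f1, f2, f3], samples, true_label=1))
</preformat>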
          <p>Figure 4 shows the adversarial examples generated by the attacks, and the adversarial images generated through sampling in the hyper-rectangles R_j^(a). Figure 5 is the corresponding clean image 1. Except for the PointWise attack, which changes a pixel value to 0 or 1, the rest of the intervals I^(a) have the maximum value 0.032, as shown in Table 2.</p>
          <p>This translates to 8 consecutive integer values on the
original 0 – 255 scale. They are very similar light
levels, and can be considered as approximately constant.</p>
          <p>If we add more dimensions to R_j^(a), the additional dimensions can be considered as moving the additional pixels to different values. Adding more dimensions does not change the shape and size of R_j^(a). Instead that moves a hyper-rectangle to a different location, increasing the amount of perturbation and away from the clean image x. The hyper-rectangles R_j^(a) in Table 2 perturbed far fewer pixels than the original attacks. From Table 4, we see that this leads to smaller perturbations to create adversarial examples. There are more such hyper-rectangles with the same shape and size, as we add more pixels identified by the attacks. Adding more pixels does not necessarily increase the mis-classification rates by all DNNs. For the Carlini &amp; Wagner L2 attack and FGSM, eventually the hyper-rectangle is moved to a place where the F1 mis-classification rate is close to 100% and F2 to F10 see near 0% mis-classification rate. This is the effect of the optimization approach used in the attack algorithms against F1. We observe three types of R_j^(a) in Table 2.
1. The target DNN mis-classifies most of the adversarial examples while there exists another DNN which correctly classifies the adversarial examples;
2. The target model correctly classifies the adversarial examples while another DNN mis-classifies most of the adversarial examples;
3. The transferable adversarial regions where all DNNs mis-classify a significant proportion of the adversarial examples.</p>
          <p>This phenomenon occurs for attacks adding both small and large perturbations. The first two types of R_j^(a) belong to DNN uncertainty regions. The existence of DNN uncertainty regions shows transferability of adversarial examples is not universal, contrary to [
            <xref ref-type="bibr" rid="ref2 ref4 ref5">2, 4, 5</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2. MNIST MLP Experiment</title>
          <p>Here we conduct an experiment with a MLP trained on MNIST. It is a fully connected network with 3 layers, 3x512 hidden neurons and ReLU activation. We vary the initial seeds and train 5 MLPs. The optimizer is SGD with learning rate 0.01. Table 5 shows the MLP mis-classification rates on the clean MNIST test data. In the interest of space, here we show two examples, a digit 5 and a digit 7, under the Carlini &amp; Wagner L2 attack. Table 6 shows the 5 MLPs' mis-classification rates in the hyper-rectangles. Table 7 shows mis-classification rates against the original adversarial images generated by the attack. Figure 6 shows an adversarial example generated by the Carlini &amp; Wagner L2 attack, and an adversarial example generated through sampling from the hyper-rectangle, based on the same clean image 5. Figure 9 (a) is the corresponding clean image 5. The hyper-rectangle for 7 → 2 lies in one DNN uncertainty region. Again the Carlini &amp; Wagner L2 attack has great success with the target model F1 but can be correctly classified by some other MLPs.</p>
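          <p>For reference, reading "3x512 hidden neurons" as three hidden layers of 512 units each, the network corresponds to a PyTorch module of the following form (a sketch, not the exact published structure).</p>
          <preformat>
import torch.nn as nn

# Three hidden layers of 512 units each with ReLU, for 28x28 MNIST inputs and 10 classes.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
# Five such MLPs are trained with SGD (lr = 0.01), varying only the initial seed.
</preformat>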
        </sec>
        <sec id="sec-2-2-3">
          <title>2.3. CIFAR10 MobileNet Experiment</title>
          <p>CIFAR10 [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] has 60,000 32x32 color images in 10 classes, with 50,000 training images and 10,000 test images. A vectorized CIFAR10 image is in [0, 1]^3072, combining three color channels. The dimensionality of a CIFAR10 image is almost 4 times that of a MNIST image. We use the MobileNet in this experiment. Similar to Section 2.1, the MobileNet model structure has been published previously [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], which has an initial convolution layer followed by 19 residual bottleneck layers. We re-train the MobileNet on CIFAR10 to optimize the parameter values. The optimizer is SGD with learning rate 0.01; momentum is 0.9; weight decay is 5e-4. The mis-classification rates of five re-trained MobileNet models on the clean CIFAR10 test data by varying initial seeds are in Table 8. In the interest of space, here we show an example with an airplane image under the BIM L2 attack. The attack success rates on the five re-trained MobileNet models are in Table 9. Note the BIM L2 attack perturbed 3071 dimensions and left 1 dimension untouched. The images are shown in Figure 8 and Figure 9 (b).
          </p>
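          <p>A sketch of the corresponding training configuration is given below; torchvision's mobilenet_v2 is used only as a stand-in for the published CIFAR10 MobileNetV2 structure [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], while the optimizer settings (SGD, learning rate 0.01, momentum 0.9, weight decay 5e-4) follow the text.</p>
          <preformat>
import torch
from torchvision import datasets, transforms
from torchvision.models import mobilenet_v2

# Stand-in for the published CIFAR10 MobileNetV2 structure; the initial seed is
# varied over the five re-trained models.
torch.manual_seed(0)
model = mobilenet_v2(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())  # images in [0, 1]^3072
</preformat>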
          <p>Figure 7 shows the mis-classification rates as we increase the dimensions of the hyper-rectangle. The largest interval size is 0.2 and the 2000th largest interval size is 0.017. F1 misclassifies all the sampled images starting from around 200 perturbed dimensions. F5 correctly classifies all the sampled images. We see F2 and F4 misclassification rates increase as more effective dimensions are included, then decrease as we include additional irrelevant dimensions. The 2000-dimensional hyper-rectangle lies in one MobileNet uncertainty region. As noted in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], the direction of adversarial perturbation is important. Adversarial examples cannot be generated by randomly sampling in the 3072-dimensional ball B(x, ε). The lower dimensional hyper-rectangles R_j^(a) containing infinitely many adversarial examples are discovered through the attack algorithms. Table 10 shows that the sampled adversarial images from the hyper-rectangle have much smaller perturbations than the original attack on CIFAR10.
          </p>
        </sec>
        <sec id="sec-2-2-4">
          <title>2.4. Uncertainty Regions vs. Transferable Adversarial Regions</title>
          <p>Due to the nature of the uncertainty regions, we have to
train multiple models. However the classification
boundary is established for one model – the model under attack
– not the ensemble of all the trained models. The output
of a DNN ensemble is either based on the majority vote,
or we take the average of the softmax layer outputs from
the DNN models in the ensemble. Hence a DNN ensemble has a different classification boundary compared with that of a single model used in the ensemble.</p>
          <p>For the MNIST CNN experiment, we use Figure 10 as a conceptual plot to show the classification boundary for F1. F1 is the model under attack. Let x be the digit 1 used in Section 2.1. Let ε = 6. Hence the hyper-rectangles with larger perturbations are excluded from B(x, ε).</p>
          <p>The blue dot in the center is the clean image. Inside the black circle, the solid line segments are part of the classification boundary for F1. There are two types, illustrated using two different colors. Type 1 regions are where the x′s are misclassified by F1 but can be correctly classified by some other model Fi; type 2 regions are where the x′s are misclassified by all the models, F1 to F10.</p>
          <p>The dashed lines inside the black circle are not part of F1's classification boundary, but they are model F1's uncertainty regions, because inside these regions the x′s are correctly classified by F1 but misclassified by some other model Fi. We call them the type 3 regions.</p>
          <p>Type 1 and 3 are the uncertainty regions. Type 2 are the transferable adversarial regions, which are more difficult to handle. Both the uncertainty regions and the transferable adversarial regions are lower dimensional small “cracks” inside the small neighborhood of a clean image. Only type 1 and type 2 regions, where F1 misclassifies the samples in the ε-ball B(x, ε), are part of model F1's classification boundary around the clean image x.</p>
          <p>The Shape and Size of Uncertainty Regions: In Table 10 we see a significant reduction of perturbation for the BIM L2 attack and the airplane image, because our hyper-rectangle perturbed far fewer pixels than the original attack (2000d vs. 3071d). On the other hand, in Table 3, we see only minor reduction for FGSM and the digit 1, because the dimension of the hyper-rectangle is close to the original attack (375d vs 403d). For NF and the digit 1, although our hyper-rectangle used 60d compared with 403d for the original attack, there is only a minor reduction in the total amount of perturbation. Since our approach relies on the existing attack algorithms to locate the regions, the dimensionality of the regions is related to the original attacks and the clean image itself. Furthermore, although we construct hyper-rectangles, the exact shape of an uncertainty/transferable region may not be a hyper-rectangle. It is important to further investigate how many uncertainty regions and transferable adversarial regions exist in the feature space [0, 1]^d, and the exact shape and dimensionality of such regions.</p>
          <p>Strategy for Robust Classification: If at least one DNN assigns a label that is different from another DNN, the image triggers an alert and requires additional screening, either involving a human operator or alternative classifiers. This strategy will improve the accuracy over the adversarial examples in DNN uncertainty regions, but won't solve the problem for transferable adversarial examples. Notice although an ensemble can achieve high predictive performance [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ], a DNN ensemble can be attacked too. Meanwhile there is no guarantee about the number of DNNs that can make a correct decision over each uncertainty region. We also need to understand how to measure the size of DNN uncertainty regions vs. DNN transferable adversarial regions. We leave it to the future work.
          </p>
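          <p>A minimal sketch of this alert strategy follows: an input is flagged whenever at least two of the re-trained models disagree on its hard label (model names are placeholders).</p>
          <preformat>
import torch

def flag_uncertain(models, x):
    """Return True when at least two models disagree on the hard label of x,
    i.e., x falls in an uncertainty region and needs additional screening."""
    with torch.no_grad():
        labels = [model(x).argmax(dim=1) for model in models]
    labels = torch.stack(labels)                 # (num_models, batch)
    return (labels != labels[0]).any(dim=0)      # per-image disagreement flag

# flagged = flag_uncertain([f1, f2, f3], batch_of_images)
# Flagged images go to a human operator or alternative classifiers; transferable
# adversarial examples (all models agree on a wrong label) are not caught.
</preformat>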
        </sec>
      </sec>
    </sec>
    <sec id="sec-2-3">
      <title>3. Generalization Error Bound and Adversarial Examples</title>
      <p>The accuracy on clean test data is often used to measure a classifier's performance. However, in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], the authors argued the test data accuracy is not the most appropriate performance measure, because the variability due to the randomness of the training data needs to be taken into consideration besides that due to the test data. Let z = (x, y) denote a sample, where x ∈ [0, 1]^d is a d-dimensional vectorized image and y ∈ {1, ⋯, K} is the true object class. z is generated independently and identically from a distribution D over [0, 1]^d. We denote a training dataset with n sample points by Z_n = (z_1, ⋯, z_n). [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] defined generalization error as E(ℓ(f_{Z_n}, z_{n+1})), where z_{n+1} is a test sample, and ℓ(f_{Z_n}, z_{n+1}) is the loss of applying a classifier f trained on Z_n to z_{n+1}. If ℓ is a 0-1 loss, the generalization error is defined as the error probability P(f(x) ≠ y) as in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>There is an extensive literature on the theoretical generalization error bound for different types of classifiers including DNN. The generalization error bound for DNN is proven to be O(C(h, w_h)/√n), where C(h, w_h) refers to a constant based on the width and depth of a DNN model, e.g., [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ].
      </p>
      <p>We observe there is a discrepancy between the theoretically proven generalization error bound for DNN and the existence of adversarial examples. Following the theory, the generalization error on test data should decrease to 0 at a rate proportional to n^{-1/2}, where n is the training sample size. However, for every clean image we show there exists a large number of adversarial images in its neighborhood B(x, ε), for different network structures and datasets. Adversarial examples also exist for large DNN models trained on ImageNet with millions of training data, where the theoretical asymptotic behavior of DNN should already kick in. Here we have two conjectures. So far we see adversarial examples exist in much lower dimensional regions, leading to Conjecture 1.</p>
      <p>Conjecture 1: The union of these lower dimensional bounded uncertainty regions and transferable adversarial regions has zero probability mass.</p>
      <p>Conjecture 2: A DNN function is discontinuous at the boundary of these lower dimensional bounded regions, and may be discontinuous inside some of these bounded regions. Note Lipschitz continuity is an important assumption for proving the generalization error bound.</p>
      <p>The two conjectures with Theorem 1 offer a potential explanation for why such a discrepancy exists. Let R_i be a d_i-dimensional region in [0, 1]^d with d_i &lt; d. Let ℒ = ∪_{i=1}^{∞} R_i be the union of countably infinite non-overlapping lower dimensional regions R_i in [0, 1]^d, with all d_i &lt; d.</p>
      <p>Theorem 1. Let F1 and F2 be two DNN models trained on Z_n. Assume ∀ x ∈ [0, 1]^d − ℒ, F1(x) = F2(x), and assume ∃ x ∈ ℒ, s.t. F1(x) ≠ F2(x). We have E(ℓ(F1, z_{n+1})) = E(ℓ(F2, z_{n+1})).</p>
      <p>Proof: For any continuous distribution D on [0, 1]^d, D(ℒ) = 0, i.e., the lower dimensional ℒ has 0 probability mass. For two functions that differ only on a 0 probability region, we have E(ℓ(F1, z_{n+1})) = E(ℓ(F2, z_{n+1})). ■</p>
      <p>Remark 1: Theorem 1 means the definition of generalization error cannot tell the difference between a trained classifier that assigns correct labels to all the points in [0, 1]^d and a different classifier that assigns wrong labels only to countably infinite lower dimensional bounded regions. For example, let c_i, i = 1, 2, ..., s.t. c_i ≠ c_j if i ≠ j. Assume ℒ = ∪_{i=1}^{∞} [0, 1]^{d−1} ⊗ c_i is the union of countably infinite non-overlapping [0, 1]^{d−1} regions. A classifier can assign wrong labels to ℒ without any impact on its generalization error.</p>
      <p>Remark 2: Another definition of generalization error involves the empirical error on the training data. Let R̂(f_{Z_n}) = (1/n) Σ_{i=1}^{n} ℓ(f_{Z_n}, z_i) be the empirical risk estimated from the training data Z_n. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] defined generalization error as R*(f) = E(ℓ(f_{Z_n}, z_{n+1})) − R̂(f_{Z_n}), which is also used in some recent papers to establish DNN theoretical guarantees. Corollary 1 in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] states there exists a neural network with ReLU activations, depth 2, and 2n + d weights, that can fit exactly any function on Z_n in d-dimensional space. Assume F1 and F2 are two such models trained on Z_n. Hence R̂(F1) = R̂(F2) = 0. Consequently, by Theorem 1, we have R*(F1) = R*(F2).
      </p>
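      <p>As a quick numerical illustration of Theorem 1 (not part of the proof), the sketch below compares two classifiers that differ only on a lower dimensional set, here the hyperplane x_1 = 0.5; under a continuous sampling distribution that set is never hit, so the two estimated risks coincide.</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 1_000_000
x = rng.random((n, d))                       # test points from a continuous D on [0, 1]^d
y = (x.sum(axis=1) > d / 2).astype(int)      # ground-truth labels

f1 = (x.sum(axis=1) > d / 2).astype(int)     # classifier F1: correct everywhere
f2 = f1.copy()
on_L = x[:, 0] == 0.5                        # lower dimensional region L (a hyperplane)
f2[on_L] = 1 - f2[on_L]                      # F2 differs from F1 only on L

print("fraction of samples falling in L:", on_L.mean())   # 0.0 with probability one
print("0-1 risk of F1:", (f1 != y).mean())
print("0-1 risk of F2:", (f2 != y).mean())                 # identical to F1
</preformat>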
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>A limitation of our work is that we rely on the existing
attack algorithms to locate these hyper-rectangles. Also
our approach works with low resolution images. Again
we leave it to the future work to capture the shape of the
DNN classification boundary in very high dimensional
feature space.</p>
      <p>
        We gain important insights from this study. A DNN
model draws the classification boundary around every
image instead of along the border between the object
classes. This helps a DNN model to achieve high accuracy
and low generalization error for complex tasks but leaves
space for it to be attacked. How to seal these small cracks surrounding every image is a very difficult problem, as we witness the success of the adaptive attacks [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The insights gained from this study point to the problem that a robust DNN model should work on.
      </p>
      <p>
        Understanding the shape of DNN’s classification
boundary also provides insights to defend against the
backdoor attacks [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. As with many other classifiers,
we need to understand how the change in the training
data moves the classification boundary, in order to firmly
close the backdoor.
      </p>
      <p>We conclude that the adversarial examples stem from
a structural problem of DNN. DNN’s classification
boundary is unlike that of any other classifier. Current defense
strategies do not address this structural problem. We also
need new theory to describe the phenomenon of
adversarial examples and measure the robustness of DNN.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work is supported in part by US Army Research Office award W911NF-17-1-0356 and US Army Research Laboratory.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems (NIPS) 25</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaremba</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bruna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <article-title>Intriguing properties of neural networks</article-title>
          ,
          <source>in: The 2nd International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Crecchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bacciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          ,
          <article-title>Detecting adversarial examples through nonlinear dimensionality reduction</article-title>
          ,
          <source>in: 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>488</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shlens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <article-title>Explaining and harnessing adversarial examples</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6572</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Papernot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Boneh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McDaniel</surname>
          </string-name>
          ,
          <article-title>The space of transferable adversarial examples</article-title>
          ,
          <source>arXiv preprint arXiv:1704.03453</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Tesla</surname>
            <given-names>AutopilotAI</given-names>
          </string-name>
          ,
          <source>Tesla Artificial Intelligence &amp; Autopilot</source>
          , https://www.tesla.com/autopilotAI,
          <year>2022</year>
          . Last accessed Feb.
          <volume>01</volume>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>B. Xi,</surname>
          </string-name>
          <article-title>Adversarial machine learning for cybersecurity and computer vision: Current developments and challenges</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Computational Statistics</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <article-title>e1511</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roli</surname>
          </string-name>
          ,
          <article-title>Wild patterns: Ten years after the rise of adversarial machine learning</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>84</volume>
          (
          <year>2018</year>
          )
          <fpage>317</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Transferring robustness for graph neural network against poisoning attacks</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Web Search and Data Mining</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>600</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lanchantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Soffa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>Black-box generation of adversarial text sequences to evade deep learning classifiers</article-title>
          ,
          <source>in: 2018 IEEE Security and Privacy Workshops (SPW)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bethge</surname>
          </string-name>
          , W. Brendel,
          <article-title>Foolbox native: Fast adversarial attacks to benchmark the robustness of machine learning models in pytorch, tensorflow, and jax</article-title>
          ,
          <source>Journal of Open Source Software</source>
          <volume>5</volume>
          (
          <year>2020</year>
          )
          <fpage>2607</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Voichita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Khatri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Draghici</surname>
          </string-name>
          ,
          <article-title>Identifying uncertainty regions in support vector machines using geometric margin and convex hulls</article-title>
          ,
          <source>in: IEEE International Joint Conference on Neural Networks</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>3319</fpage>
          -
          <lpage>3324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tuia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ratle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pacifici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Kanevski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Emery</surname>
          </string-name>
          ,
          <article-title>Active learning methods for remote sensing image classification</article-title>
          ,
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>47</volume>
          (
          <year>2009</year>
          )
          <fpage>2218</fpage>
          -
          <lpage>2232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] PyTorch, PyTorch: Adversarial Example Generation, https://pytorch.org/tutorials/beginner/fgsm_ tutorial.html,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , L. Bottou,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          ,
          <article-title>Gradientbased learning applied to document recognition</article-title>
          ,
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          (
          <year>1998</year>
          )
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Van der Maaten</surname>
          </string-name>
          , G. Hinton,
          <article-title>Visualizing data using t-sne.</article-title>
          ,
          <source>Journal of machine learning research 9</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <article-title>Learning multiple layers of features from tiny images</article-title>
          ,
          <source>Master's thesis</source>
          , University of Toronto (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhmoginov</surname>
          </string-name>
          , L.-C. Chen,
          <article-title>MobilenetV2: Inverted residuals and linear bottlenecks</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>4510</fpage>
          -
          <lpage>4520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>A survey on ensemble learning</article-title>
          ,
          <source>Frontiers of Computer Science</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>241</fpage>
          -
          <lpage>258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nadeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Inference for the generalization error</article-title>
          ,
          <source>Machine Learning</source>
          <volume>52</volume>
          (
          <year>2003</year>
          )
          <fpage>239</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaariainen</surname>
          </string-name>
          ,
          <article-title>Generalization error bounds using unlabeled data</article-title>
          ,
          <source>in: International Conference on Computational Learning Theory</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Telgarsky</surname>
          </string-name>
          ,
          <article-title>Spectrallynormalized margin bounds for neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Golowich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rakhlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <article-title>Sizeindependent sample complexity of neural networks</article-title>
          ,
          <source>in: Conference On Learning Theory, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Bengio,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Recht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Understanding deep learning requires rethinking generalization</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>107</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Brendel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madry</surname>
          </string-name>
          ,
          <article-title>On adaptive attacks to adversarial example defenses</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1633</fpage>
          -
          <lpage>1645</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Targeted backdoor attacks on deep learning systems using data poisoning</article-title>
          ,
          <source>arXiv preprint arXiv:1712.05526</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>