<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adversarial Robustness for Face Recognition: How to Introduce Ensemble Diversity among Feature Extractors?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Takuma Amada</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kazuya Kakizaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toshinori Araki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seng Pei Liew</string-name>
          <email>sengpei.liew@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joseph Keshet</string-name>
          <email>jkeshet@cs.biu.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Furukawa</string-name>
          <email>jun.furukawa1971g@nec.com</email>
          <email>jun.furukawa@necam.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bar-Ilan University</institution>
          ,
          <addr-line>Ramat Gan, 52900</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NEC Corporation</institution>
          ,
          <addr-line>7-1, Shiba, 5-chome Minato-ku, Tokyo 108-8001</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NEC Israel Research Center</institution>
          ,
          <addr-line>2 Maskit Street, Herzliya Pituach</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>An adversarial example (AX) is a maliciously crafted input that humans recognize correctly while machine learning models do not. This paper considers how to make deep learning-based face recognition systems robust against AXs. A large number of studies have proposed methods for protecting machine learning classifiers from AXs. One of the most successful methods among them is to prepare an ensemble of classifiers and promote diversity among them. Face recognition, however, typically relies on feature extractors instead of classifiers. We found that directly applying this successful method to feature extractors leads to failure. We show that this failure is due to a lack of true diversity among the feature extractors and fix it by synchronizing the direction of features among the models. Our method significantly enhances the robustness against AXs under both white-box and black-box settings while slightly increasing the accuracy. We also compared our method with adversarial training.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Deep neural networks (DNNs) have become core
components of many essential services, as their performance
has surpassed human recognition capability in
many tasks
        <xref ref-type="bibr" rid="ref15 ref15 ref23 ref23 ref23 ref36 ref43">(Parkhi, Vedaldi, and Zisserman 2015; Schroff,
Kalenichenko, and Philbin 2015; Szegedy et al. 2015; He
et al. 2016)</xref>
        . Face recognition is one of the most widely used
services that rely on DNNs
        <xref ref-type="bibr" rid="ref49">(Sun et al. 2014)</xref>
        , ranging from
immigration inspection to smartphone authentication.
However, the vulnerability of deep learning under adversarial
examples has begun to threaten its promises
        <xref ref-type="bibr" rid="ref52">(Szegedy et al.
2014)</xref>
        .
        A wide variety of such attacks is surveyed in
        <xref ref-type="bibr" rid="ref46">(Singh et al. 2020)</xref>
        .
      </p>
      <p>An adversarial example (AX) is an imperceptibly
perturbed input that deceives a machine learning model into
misclassifying it, while a human can still correctly
classify it. Some attack methods need full access to the
model's internals (the white-box setting), while others need only oracle
access to the model (the black-box setting). The threat became
very plausible when Sharif et al. (2016) showed that physical
glasses could deceive a machine learning model in the
black-box setting. As a result of such an attack, a person
on a blacklist may wear slightly fancy glasses to evade
immigration inspection performed by machines.</p>
      <p>
        Many works have proposed ways to prevent AXs, and many
of them have been broken in turn, leading to an arms race between
defenders and attackers. Model ensembling
        <xref ref-type="bibr" rid="ref1 ref20 ref21 ref25 ref4 ref5 ref62 ref8 ref9">(Abbasi and
Gagné 2017; Dabouei et al. 2020; Kariyappa and Qureshi
2019)</xref>
        is one of the most successful prevention methods
among them. In particular, the adaptive diversity promoting
(ADP) method
        <xref ref-type="bibr" rid="ref34">(Pang et al. 2019)</xref>
        that promotes the
diversity of models in the ensemble has been the most successful.
Although, like all other defenses, it can be weakened when attacked
adaptively
        <xref ref-type="bibr" rid="ref46">(Tramèr et al. 2020)</xref>
        , such a strategy has an
advantage. It is orthogonal to other defensive approaches that
focus on enhancing single-model adversarial robustness and
can be used in tandem to achieve further adversarial
robustness. Adversarial training
        <xref ref-type="bibr" rid="ref10 ref25 ref62 ref8">(Zhong and Deng 2019)</xref>
        , another
successful AX prevention method, is an example of such a
tandem method.
      </p>
      <p>We are interested in applying the ADP method to enhance
the robustness of face recognition against AXs. Face
recognition commonly relies on a machine learning feature
extractor since it enables the service to register a huge
number of new faces without retraining the network. We
experimented with ADP’s direct application and found that it does
not improve feature extractors’ robustness against AXs.</p>
      <p>We attribute this failure to the fact that the directions of
different models' features are not comparable in a
meaningful way, and thus their diversity is insignificant. We
propose letting all the models in the ensemble share the weight
vectors of their final layers so that features of different models
can be compared in a common coordinate system. We also
propose promoting the diversity of the feature vectors directly
rather than the diversity of the classifiers. Figure 1 illustrates
our method.</p>
      <p>
        We have experimented with ADP, our methods, and
several variants. We trained them with various losses such as
ArcFace
        <xref ref-type="bibr" rid="ref10 ref62">(Deng et al. 2019)</xref>
        and CosFace
        <xref ref-type="bibr" rid="ref56">(Wang et al. 2018)</xref>
        ,
with the MS1MV2 dataset
        <xref ref-type="bibr" rid="ref10 ref62">(Deng et al. 2019)</xref>
        , the refined
version of the MS-Celeb-1M, and evaluated on VGG2 (Cao
et al. 2018). We measured the robustness by AXs adaptively
generated by I-FGSM, an iterative variant of the Fast Gradient
Sign Method (FGSM)
        <xref ref-type="bibr" rid="ref14 ref15 ref23">(Goodfellow, Shlens, and Szegedy
2015)</xref>
        , Basic Iterative Method (BIM)
        <xref ref-type="bibr" rid="ref1 ref20 ref21 ref27 ref29 ref32 ref33 ref39 ref4 ref5">(Kurakin, Goodfellow,
and Bengio 2017a; Madry et al. 2018)</xref>
        , and Carlini &amp;
Wagner (CW) attack
        <xref ref-type="bibr" rid="ref1 ref20 ref21 ref4 ref5">(Carlini and Wagner 2017b)</xref>
        . We confirmed
that ADP affects neither accuracy nor robustness against
AXs. On the other hand, our method significantly enhanced
the robustness to AXs in both white-box and black-box
settings without harming its accuracy at all.
      </p>
      <p>Despite this enhancement, our model's robustness is not
absolute: in the white-box setting, given a
sufficiently large perturbation, one can still generate successful AXs
for all legitimate samples. Although our method does show
significantly lower transferability than others in the black-box
setting, we found that well-forged AXs with
sufficiently large perturbations can again deceive the machine
recognition with very high probability.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>Face Recognition by DNN Feature Extractor</title>
        <p>Face recognition tasks typically fall into either face
identification or face verification. The former is also called closed-set
face classification and assumes the probed image belongs to
one of the objects enrolled in the gallery. The latter is also
called open-set face classification and can reject a probed
image with no corresponding object in the gallery. We focus
on open-set face recognition.</p>
        <p>
          Modern open-set face recognition systems commonly
consist of a DNN-based feature extractor that maps an input
image into a low dimensional feature space
          <xref ref-type="bibr" rid="ref49">(Sun et al. 2014)</xref>
          .
Then we can measure the similarity of the two images by
the distance between their two corresponding feature
vectors. The commonly used distances are the Euclidean
distance and the cosine distance. A feature extractor excels in a
face recognition service since, unlike a classifier, it requires no
retraining when a new face is registered. 1
        </p>
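        <p>As a concrete illustration of verification by feature distance, the following minimal NumPy sketch compares two feature vectors by cosine distance. It is illustrative only; the threshold value is an assumption for the example, not a value from this paper.</p>

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance 1 - cos(angle) between two feature vectors."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return 1.0 - float(np.dot(u, v))

def same_identity(feat_probe, feat_registered, threshold=0.6):
    """Declare a match when the cosine distance is below a threshold.
    The threshold 0.6 is an illustrative value, not one from the paper."""
    return cosine_distance(feat_probe, feat_registered) < threshold

rng = np.random.default_rng(0)
f = rng.standard_normal(512)
assert same_identity(f, 2.0 * f)   # same direction: distance 0, match
assert not same_identity(f, -f)    # opposite direction: distance 2, no match
```

        <p>Registering a new identity then amounts to storing its feature vector; no retraining of the network is involved.</p>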
        <p>
          There are two major ways of training the DNN-based
feature extractors. One way is to train a normal multiclass
DNN classifier initially and then regard the output of the
penultimate layer of the DNN as the feature vector (i.e.,
the DNN without the last fully connected layer is a
feature extractor)
          <xref ref-type="bibr" rid="ref15 ref23 ref36 ref49">(Parkhi, Vedaldi, and Zisserman 2015; Sun
et al. 2014)</xref>
          . Another approach is to train the feature
extractor directly using the triplet loss (this is also called
metric learning)
          <xref ref-type="bibr" rid="ref15 ref23 ref43">(Schroff, Kalenichenko, and Philbin 2015)</xref>
          . A
triplet consists of two matching face thumbnails and a
nonmatching face thumbnail, and the loss aims to separate the
positive pair from the negative by a distance margin. As
the number of triplet combinations increases exponentially,
triplet loss tends to be harder to scale. We focus on the
former approach in this work.
        </p>
        <p>
          Feature extractors trained through simple softmax outputs
are effective for closed-set classification tasks but are not
discriminative enough for open-set verification. An angular
margin penalty such as ArcFace
          <xref ref-type="bibr" rid="ref10 ref62">(Deng et al. 2019)</xref>
          ,
CosFace
          <xref ref-type="bibr" rid="ref56">(Wang et al. 2018)</xref>
          , and Sphereface
          <xref ref-type="bibr" rid="ref30">(Liu et al. 2017)</xref>
          ,
and other approaches such as
          <xref ref-type="bibr" rid="ref55">(Wan et al. 2018)</xref>
           are forms of
the loss function that have successfully improved the
discriminative power of face verification by feature vectors. The
angular margin penalty modifies the final prediction layer to
enforce that the true label's prediction be more restrictive and
discriminative than the vanilla softmax's. In this
way, the loss transfers the penalty on the prediction into
the distance between features, yielding features with high inter-class
variance and low intra-class variance.
        </p>
        <p>We briefly review the angular margin penalty. We start
from the cross-entropy loss $L_{CE}(x, y)$ for a trainable
parameter $\theta$, an input $x$, and its true label $y$. We assume $g(x; \theta)$
is the classification over $n$ classes, namely the fully
connected final layer's softmax output, whose input is the output
$f(x; \theta) \in \mathbb{R}^d$ of the $d$-dimensional penultimate layer. Then,
letting $W_j \in \mathbb{R}^d$ and $b_j \in \mathbb{R}$ for $j = 1, \ldots, n$ be, respectively,
the final weight vector and the final bias for the $j$-th prediction,
$$L_{CE}(x, y) := -\mathbf{1}_y^\top \log g(x) = -\log \frac{e^{W_y^\top f(x) + b_y}}{\sum_{\ell=1}^{n} e^{W_\ell^\top f(x) + b_\ell}}.$$</p>
        <p>The loss function of ArcFace, $L_{ARC; s, m}(x, y)$,
L2-normalizes $f$ and $W_j$, sets $b_\ell = 0$, and introduces a penalty
$m$ and a smoothness (scale) hyperparameter $s$. With $\cos \theta(x, \ell) =
\frac{W_\ell^\top f(x)}{\lVert W_\ell \rVert\,\lVert f(x) \rVert}$, it is
$$L_{ARC; s, m}(x, y)
= -\log \frac{e^{s \cos(\theta(x, y) + m)}}{e^{s \cos(\theta(x, y) + m)} + \sum_{\ell \in \{1, \ldots, n\} \setminus \{y\}} e^{s \cos \theta(x, \ell)}}.$$</p>
        <p>1Because of this property, we need to evaluate the accuracy
of feature-extractor-based face recognition on datasets that do not
share labels (identities of the owners of the faces) with the training
dataset.</p>
        <p>The loss function of CosFace takes a slightly different form
of the penalty. We call each model of both ArcFace and
CosFace a single model and compare it with other models in
our experiments. The feature extractor trained by this
network is $\tilde{f}(\cdot; \theta) \in \mathbb{R}^d$, the L2-normalization of $f(\cdot; \theta)$.</p>
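        <p>To make the formula concrete, the following is a minimal NumPy sketch of the ArcFace loss for a single sample. The weights and features are random toy stand-ins, and the default $s$ and $m$ follow the values used later in our experiments; it is not our training implementation.</p>

```python
import numpy as np

def arcface_loss(f, W, y, s=64.0, m=0.5):
    """ArcFace loss for one sample.
    f: (d,) feature; W: (n, d) final-layer weights; y: true label index."""
    f = f / np.linalg.norm(f)
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = W @ f                              # cos(theta(x, l)) for each class l
    logits = s * cos
    # replace the true-class logit using the additive angular margin m
    theta_y = np.arccos(np.clip(cos[y], -1.0, 1.0))
    logits[y] = s * np.cos(theta_y + m)
    # negative log-softmax of the true class
    logits -= logits.max()                   # numerical stability
    return float(-logits[y] + np.log(np.exp(logits).sum()))

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 512))
f_easy = W[3] * 5.0                          # feature aligned with class 3
f_hard = rng.standard_normal(512)            # random feature
assert arcface_loss(f_easy, W, y=3) < arcface_loss(f_hard, W, y=3)
```

        <p>A well-aligned feature incurs a small loss even after the margin is added, while a misaligned one is penalized heavily, which is how the margin sharpens the learned features.</p>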
      </sec>
      <sec id="sec-2-2">
        <title>Adversarial Examples</title>
        <p>
          While deep neural networks show high
accuracy in their tasks, they are vulnerable to AXs
          <xref ref-type="bibr" rid="ref1 ref14 ref15 ref20 ref21 ref23 ref4 ref5">(Goodfellow, Shlens, and Szegedy 2015; Carlini and Wagner 2017b)</xref>
          .
An AX is a crafted input $x_{adv}$ that differs from
a source image $x_s$ by $\delta$, i.e., $x_{adv} = x_s + \delta$, such that
the target classifier misclassifies it while humans do not.
Many works search for a small $\delta$ that humans cannot
perceive, while some works such as
          <xref ref-type="bibr" rid="ref24">(Kakizaki and Yoshida
2020)</xref>
          search for a large $\delta$ that humans consider natural.
I-FGSM, an iterative variant of the Fast Gradient Sign
Method (FGSM)
          <xref ref-type="bibr" rid="ref14 ref15 ref23 ref32">(Goodfellow, Shlens, and Szegedy 2015;
Madry et al. 2018)</xref>
          , searches for $\delta$ by moving in the negative
gradient direction with respect to the target label. The Basic Iterative Method
(BIM)
          <xref ref-type="bibr" rid="ref1 ref20 ref21 ref27 ref29 ref33 ref39 ref4 ref5">(Kurakin, Goodfellow, and Bengio 2017a)</xref>
          is an
extension of I-FGSM where the search is within a given
boundary. Carlini &amp; Wagner (CW) attack
          <xref ref-type="bibr" rid="ref1 ref20 ref21 ref4 ref5">(Carlini and Wagner
2017b)</xref>
          searches for the best $\delta$ by formalizing the problem as an
optimization. The above I-FGSM, BIM, and CW methods are
among the major strong attacks available today for generating AXs.
        </p>
        <p>
          <xref ref-type="bibr" rid="ref1 ref20 ref21 ref39 ref4 ref5">(Rozsa, Günther, and Boult 2017)</xref>
          proposed a general
method, called LOTS, to generate AXs such that an internal
layer representation is close to that of a target by iteratively
adding perturbations to the source input. It uses a Euclidean
loss defined on the internal layer representations of the
origin and the target. It applies its gradient to the source input to
manipulate the source input’s internal layer representation.
We can generate an AX for feature extractors by applying
LOTS to the feature layer, which is also an internal
representation of the classifier from which the feature extractor is derived.
LOTS is sufficiently general to employ I-FGSM, BIM, and
CW for the underlying perturbation method.
        </p>
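        <p>The following toy NumPy sketch illustrates the LOTS idea under a strong simplifying assumption: the "internal layer" is a fixed linear map, so the gradient of the Euclidean loss is available in closed form. In a real attack the gradient would be obtained by backpropagation through the network.</p>

```python
import numpy as np

# Toy sketch of LOTS: drive the internal representation f(x) of a
# source input toward a target representation f_t by gradient descent
# on the Euclidean loss L(x) = 0.5 * ||f(x) - f_t||^2.  Here f is a
# fixed linear map A, so grad L = A^T (f(x) - f_t).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))        # toy feature extractor f(x) = A x
f = lambda x: A @ x

x_source = rng.standard_normal(16)
x_target = rng.standard_normal(16)
f_t = f(x_target)                       # target internal representation

x_adv = x_source.copy()
step = 0.01
for _ in range(2000):
    grad = A.T @ (f(x_adv) - f_t)       # gradient of the Euclidean loss
    x_adv -= step * grad                # move the representation toward f_t

# the adversarial representation ends up much closer to the target's
assert np.linalg.norm(f(x_adv) - f_t) < np.linalg.norm(f(x_source) - f_t)
```

        <p>Swapping the fixed step for an I-FGSM, BIM, or CW update rule yields the corresponding LOTS variants discussed above.</p>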
      </sec>
      <sec id="sec-2-3">
        <title>Defenses against Adversarial Examples</title>
        <p>
          Various studies have proposed methods to make DNNs
robust against AXs. They include adversarial
training
          <xref ref-type="bibr" rid="ref14 ref15 ref23 ref32">(Goodfellow, Shlens, and Szegedy 2015; Madry et al.
2018)</xref>
          , feature transformation
          <xref ref-type="bibr" rid="ref57 ref59 ref61">(Xu, Evans, and Qi 2018)</xref>
          ,
statistical analysis
          <xref ref-type="bibr" rid="ref57 ref61">(Zheng and Hong 2018)</xref>
          , manifold
learning
          <xref ref-type="bibr" rid="ref41 ref57 ref61">(Samangouei, Kabkab, and Chellappa 2018)</xref>
          ,
knowledge distillation
          <xref ref-type="bibr" rid="ref35">(Papernot et al. 2016)</xref>
          , selective
dropout
          <xref ref-type="bibr" rid="ref16 ref19">(Goswami et al. 2018, 2019)</xref>
          , an ensemble of
models, and more. Despite their partial success, none of them
completely prevents AXs from deceiving DNNs with
cleverer attack techniques such as
          <xref ref-type="bibr" rid="ref1 ref2 ref20 ref21 ref27 ref29 ref33 ref39 ref4 ref5 ref57 ref61">(Carlini and Wagner 2017a;
Athalye, Carlini, and Wagner 2018)</xref>
          . Adversarial
training, a relatively successful defense, trains DNNs by
generating AXs and including them in the training data
          <xref ref-type="bibr" rid="ref1 ref14 ref15 ref20 ref21 ref23 ref27 ref29 ref4 ref5">(Goodfellow, Shlens, and Szegedy 2015; Kurakin, Goodfellow, and
Bengio 2017b)</xref>
          . However, it cannot sufficiently defend the
model against some of the AXs that have not appeared
during the training
          <xref ref-type="bibr" rid="ref12">(Gilmer et al. 2019)</xref>
          . There have also been
studies on certified defenses, whose aim is to train
DNNs in a provably robust fashion
          <xref ref-type="bibr" rid="ref1 ref20 ref21 ref25 ref37 ref4 ref5 ref57 ref57 ref57 ref61 ref61 ref62 ref7 ref8">(Cohen, Rosenfeld, and
Kolter 2019; Hein and Andriushchenko 2017; Wong et al.
2018; Raghunathan, Steinhardt, and Liang 2018; Wong and
Kolter 2018)</xref>
          . However, this approach’s successes are largely
limited to simple DNN architectures and datasets with low
resolution.
        </p>
        <p>
          Some works such as
          <xref ref-type="bibr" rid="ref40">(Russakovsky et al. 2015)</xref>
          showed
that an ensemble of different models improves
generalizability in image classification tasks. It has since turned out that
ensemble models are also a successful defense against AXs.
          <xref ref-type="bibr" rid="ref34">(Pang et al. 2019)</xref>
          proposed adaptive diversity promoting
(ADP), which trains an ensemble of models so that their
non-maximal predictions vary greatly. Since the non-maximal
predictions differ greatly, it is hard to align them into the maximal
prediction of an AX.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Direct Integration of the Adaptive Diversity Promoting (ADP) Method into Feature Extractors</title>
        <p>Despite the relative success of the ensemble model, all of these
works are validated only for classification. We can directly
integrate the adaptive diversity promoting (ADP) method
into the angular margin penalty for feature extraction. We
assume that our ensemble is composed of $K$ models, and
thus all parameters have $K$ duplicates. Each model predicts
over $n$ labels in training. Let the output of the $k$-th model in the ensemble that
represents the $j$-th label be $g_{k,j}$. The
ensemble prediction $G_j$ for the $j$-th label is the average of all the
models' predictions, that is, $G_j(x) = \frac{1}{K} \sum_{k=1}^{K} g_{k,j}(x)$.
Then, the Shannon entropy of the distribution $\{G_j\}_{j=1,\ldots,n}$ is
$$H(G(x)) = -\sum_{j=1}^{n} G_j(x) \log(G_j(x)).$$</p>
        <p>Let $g_{k,\setminus y} \in \mathbb{R}^{n-1}$ be the $(n-1)$-dimensional
vector that is $g_k \in \mathbb{R}^n$ without the $y$-th element,
i.e., the true prediction. Let $\tilde{g}_{k,\setminus y} \in \mathbb{R}^{n-1}$ be the L2-normalized
$g_{k,\setminus y}$, and let the $(n-1) \times K$ matrix be $M(x, y) =
(\tilde{g}_{1,\setminus y}(x), \ldots, \tilde{g}_{K,\setminus y}(x)) \in \mathbb{R}^{(n-1) \times K}$. Then, the ensemble
diversity of the non-maximal (non-$y$) predictions at $x$ is
$$ED(x, y) = \det(M(x, y)^\top M(x, y)).$$
Here, the product is of a $K \times (n-1)$ matrix and an $(n-1) \times K$ matrix, whose result is a
$K \times K$ matrix. Geometrically, it is the (squared) volume of the parallelotope
that $\{\tilde{g}_{1,\setminus y}(x), \ldots, \tilde{g}_{K,\setminus y}(x)\}$ spans. With hyperparameters
$\alpha$ and $\beta$, the regularizer for promoting adaptive diversity is
$$ADP_{\alpha,\beta}(x, y) = \alpha \cdot H(G(x)) + \beta \cdot \log(ED(x, y)).$$</p>
        <p>With the per-model losses $\{L^k_{M; s, m}(x, y)\}_{k=1,\ldots,K}$, where $M$ stands for either
ArcFace (ARC) or CosFace (COS), the ADP method
trains the ensemble by optimizing the parameters to minimize
$$L_{E,M,ADP; s, m, \alpha, \beta} := \sum_{k=1}^{K} L^k_{M; s, m}(x, y) - ADP_{\alpha,\beta}(x, y).$$</p>
        <p>The feature extractor is the normalized penultimate-layer
output $(\tilde{f}_k \in \mathbb{R}^d)_{k=1,\ldots,K}$. We let the ensemble feature $F$
be the average of all the feature extractors' outputs:
$$F(x) = \frac{1}{K} \sum_{k=1}^{K} \tilde{f}_k(x).$$
Here, we omitted $\theta$. We use the Euclidean distance between
the ensemble feature $F(x)$ of an input $x$ and the
registered feature $F(x')$ of another input $x'$ as the measure of
similarity for our verification task.</p>
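        <p>The ensemble-diversity term above can be sketched in a few lines of NumPy. This toy version operates on given prediction vectors and only illustrates the Gram-determinant construction; it is not the training code, where the term is differentiated together with the loss.</p>

```python
import numpy as np

def ensemble_diversity(preds, y):
    """ED(x, y): stack the L2-normalized non-maximal prediction vectors
    of K models as columns of M and return det(M^T M), the squared
    volume of the parallelotope they span.
    preds: (K, n) per-model predictions; y: index of the true label."""
    M = np.delete(preds, y, axis=1)                   # drop the y-th entry
    M = M / np.linalg.norm(M, axis=1, keepdims=True)  # L2-normalize each row
    M = M.T                                           # shape (n-1, K)
    return float(np.linalg.det(M.T @ M))              # K x K Gram determinant

K, n, y = 3, 5, 2
rng = np.random.default_rng(1)
aligned = np.tile(rng.random(n), (K, 1))              # identical models
diverse = rng.random((K, n))                          # differing models
assert ensemble_diversity(aligned, y) < 1e-9          # no diversity: det ~ 0
assert ensemble_diversity(diverse, y) > ensemble_diversity(aligned, y)
```

        <p>When all models agree, the columns of $M$ coincide, the Gram matrix is rank one, and the diversity collapses to zero, which is exactly what the regularizer penalizes.</p>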
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Our Approach</title>
      <sec id="sec-3-1">
        <title>Problem and Challenge</title>
        <p>We will later see in our experiments that ADP scarcely
enhances the robustness of the network against AXs. We see
its cause as follows. Let the normalized weight vectors of the
final layers of all the models be $\{\tilde{W}^k_\ell\}_{(k,\ell) \in \{1,\ldots,K\} \times \{1,\ldots,n\}}$.
Here, $k$ runs over models in the ensemble, and $\ell$ runs over
labels. Let $\{\tilde{f}_k\}_k$ be the feature extractors; again, $k$ runs over
models in the ensemble. The diversity promotion in the
ensemble aims to enforce that any perturbation of an input $x$ brings
it to an $x'$ whose features $\{\tilde{f}_k(x')\}_k$ in different models are
close to $\tilde{W}^k_\ell$ for different $\ell$'s. ADP is likely to achieve
such a property. However, a difference in the directions of
features does not imply a difference in the directions relative to the
learned features, i.e., the weight vectors. For example,
suppose that $\tilde{f}_1(x)$ near $\tilde{W}^1_1$ moves in a very different direction
from the direction in which $\tilde{f}_2(x)$ near $\tilde{W}^2_1$ moves when we
perturb $x$ by $\delta$. If these directions are nevertheless such that
$\tilde{f}_1(x)$ moves toward $\tilde{W}^1_2$ and $\tilde{f}_2(x)$ moves toward $\tilde{W}^2_2$, the
difference in directions does not prevent AXs. Since the weight
vectors are independent among different models, the
previous approaches do not prevent such situations from
occurring. Therefore, we need to promote the features
of all the ensemble models to be diverse relative to the
respective weight vectors' positions.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Our Solution</title>
      </sec>
      <sec id="sec-3-3">
        <title>Shared Representative Vector</title>
        <p>We propose to share the
weight vectors $\{\tilde{W}^k_\ell\}_{(k,\ell) \in \{1,\ldots,K\} \times \{1,\ldots,n\}}$ of the final
layer among all the models; that is, $\tilde{W}^k_\ell$ is the same for all $k$,
and we let $\tilde{W}_\ell$ denote this shared vector. The change promotes all
features of label $y$ to be close to the same $\tilde{W}_y$ independently of the
model. Hence, the adversary needs to find a perturbation
of $x$ that moves all features $\{\tilde{f}_k(x)\}_{k=1,\ldots,K}$ from $\tilde{W}_y$ to
$\tilde{W}_{y'}$. Suppose the directions of the susceptibility of features
to perturbation differ among models. Then,
it is hard for an adversary to find a perturbation that moves all the
features in the same direction. If we promote diversity of
features, we can expect the chart of features around a weight
vector to differ among models, and hence that the
ensemble increases its robustness to AXs.</p>
        <p>We call the weight vectors $\tilde{W}_\ell$ that all models share the
shared representative vectors (SRV). Let $\theta_k(x, \ell)$ be
the angle, i.e., the arc, between $\tilde{W}_\ell$ and $\tilde{f}_k(x)$ in $\mathbb{R}^d$. That is,
$\theta_k(x, \ell)$ is such that
$$\cos \theta_k(x, \ell) = \frac{W_\ell^\top f_k(x)}{\lVert W_\ell \rVert\,\lVert f_k(x) \rVert} = \tilde{W}_\ell^\top \tilde{f}_k(x).$$
We have the loss function with the shared representative
vectors as
$$L_{E,ARC,SRV; s, m}(x, y)
= -\sum_{k=1}^{K} \log \frac{e^{s \cos(\theta_k(x, y) + m)}}{e^{s \cos(\theta_k(x, y) + m)} + \sum_{\ell \in \{1, \ldots, n\} \setminus \{y\}} e^{s \cos \theta_k(x, \ell)}}.$$
We expect feature extractors trained by the following
loss function to be robust to AXs.</p>
        <p>$$L_{E,ARC,SRV,FDP; s, m, \gamma}(x, y)
= L_{E,ARC,SRV; s, m}(x, y) - FDP_\gamma(x).$$
We define $L_{E,COS,SRV,FDP; s, m, \gamma}(x, y)$ similarly.</p>
        <p>Feature Diversity Promotion: In ADP, we promote the
non-maximal predictions of the models in the ensemble to be
diverse. However, it is the features, rather than the predictions,
that we want to make hard for adversaries to manipulate. Hence, we
choose to promote the diversity of the ensemble features
directly. We can measure the diversity $ED_{feat}$ of the ensemble
features at $x$ in the same manner as before. Let $\tilde{F}(x) =
(\tilde{f}_1(x), \ldots, \tilde{f}_K(x)) \in \mathbb{R}^{d \times K}$ be a $d \times K$ matrix. Then,
$$ED_{feat}(x) = \det(\tilde{F}(x)^\top \tilde{F}(x)).$$
Here, the determinant is of a $K \times K$ matrix. We promote
the ensemble feature extractors to be such that their features
are diverse by the following regularizer:
$$FDP_\gamma(x) = \gamma \cdot \log(ED_{feat}(x)).$$
The weighting coefficient $\gamma$ is a hyperparameter. The
regularizer has no Shannon entropy term, unlike $ADP_{\alpha,\beta}$,
since we do not need to balance the features against values such as the
maximal prediction in ADP.</p>
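        <p>A minimal NumPy sketch of the FDP regularizer follows, assuming the $K$ models' features arrive as rows of an array. It is illustrative only; in training, the term is differentiated together with the loss rather than evaluated in isolation.</p>

```python
import numpy as np

def fdp(features, gamma=10.0):
    """FDP regularizer: with the K L2-normalized feature vectors as
    columns of F (d x K), return gamma * log det(F^T F).
    features: (K, d) array, one row per model in the ensemble."""
    F = (features / np.linalg.norm(features, axis=1, keepdims=True)).T
    ed_feat = np.linalg.det(F.T @ F)     # K x K Gram determinant
    return gamma * np.log(ed_feat)

rng = np.random.default_rng(0)
orthogonal = np.eye(3, 512)              # maximally diverse features
similar = np.tile(rng.standard_normal(512), (3, 1)) \
          + 0.01 * rng.standard_normal((3, 512))
assert fdp(orthogonal) > fdp(similar)    # nearly parallel features are penalized
```

        <p>Since the regularizer is subtracted from the loss, maximizing it pushes the $K$ features toward mutually distinct directions around the shared representative vectors.</p>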
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Implementation Details</title>
        <p>
          We followed the same training process as in
          <xref ref-type="bibr" rid="ref10 ref62">(Deng et al.
2019)</xref>
          . We adopt the MS1MV2 dataset
          <xref ref-type="bibr" rid="ref10 ref62">(Deng et al. 2019)</xref>
          , the
refined version of the MS-Celeb-1M dataset, for the
training, and VGG2 for the verification. The training set of
the MS1MV2 dataset includes 5.8M face images of 85K
identities. For data preprocessing, we crop face images to
the size of 112×112 and align them by utilizing
MTCNN
          <xref ref-type="bibr" rid="ref60">(Zhang et al. 2016)</xref>
          .
        </p>
        <p>
          For the embedding network, we employ the widely used
CNN architecture, MobileFacenet
          <xref ref-type="bibr" rid="ref6">(Chen et al. 2018)</xref>
          .
After the last convolutional layer, we explore the BN
          <xref ref-type="bibr" rid="ref15 ref23">(Ioffe
and Szegedy 2015)</xref>
          -Dropout
          <xref ref-type="bibr" rid="ref48">(Srivastava et al. 2014)</xref>
          -FC-BN
structure to get the final 512-dimensional embedding
feature.
        </p>
        <p>
          We follow
          <xref ref-type="bibr" rid="ref56">(Wang et al. 2018)</xref>
          to set the feature scale
$s = 64$ and choose the angular margin of ArcFace as $m = 0.5$.
We choose the angular margin of CosFace as $m = 0.35$.
        </p>
        <p>We set the batch size to 256 and train the ensemble
consisting of three feature extractors on one NVIDIA Tesla V100
(32GB) GPU. We set the initial learning rate to $10^{-3}$ and
divide it by 10 at epochs 12, 15, and 18. The training process
finishes at epoch 20.</p>
        <p>
          We have experimented with the effectiveness of
regularizers $ADP_{\alpha,\beta}(x, y)$ and $FDP_\gamma(x)$ on the robustness of
feature extractors trained by the ensemble model of
ArcFace and CosFace. The regularizer hyperparameters we
tested are $(\alpha, \beta) = (2.0, 0.5), (2.0, 10.0)$, and $(2.0, 50.0)$
for ADP, and $\gamma = 1.0, 10.0$, and $50.0$ for FDP. We also
compared our method with one of the best adversarial training,
which exploits margin-based triplet embedding
regularization
          <xref ref-type="bibr" rid="ref10 ref25 ref62 ref8">(Zhong and Deng 2019)</xref>
          . They did not experiment
with the MobileFacenet architecture that we adopted. They also observed
that the robustness varies depending on the margin in the
triplet embedding regularization term. Hence, we tried
several hyperparameter values, $m = 0.2, 0.6, 1.4$, and $3.0$, to
find the best one for MobileFacenet.
        </p>
        <p>We applied attacks such as I-FGSM, BIM, and CW to
the following models in the LOTS framework: (1) "single
model," which is the original model; (2) "baseline," which
is a simple ensemble of original models; (3) "ADP," which
is a simple ensemble model with the ADP regularizer; (4)
"FDP," which is a simple ensemble model with the FDP
regularizer; (5) "SRV," which is an ensemble with a shared
representative vector; (6) "SRV+ADP," which is SRV with the ADP
regularizer; (7) "AdvT for $m = 0.2, 0.6, 1.4, 3.0$," which is
adversarial training; and (8) "Our method," which is our full
model combining SRV with the FDP regularizer. The number of
models in each ensemble is three ($K = 3$).</p>
        <p>All the attacks are adaptive, meaning we applied the
attack methods to the robustified networks themselves. We could
compare our method only with those previous methods that we
experimented with ourselves, since their reported results are under
different conditions.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Performance on Legitimate Samples</title>
        <p>We verified the accuracy of verification on the VGG2
dataset. Table 1 shows the ROC-AUCs of the single model,
ADP, AdvT, and our method (SRV+FDP) for ArcFace and
CosFace. The hyperparameters yielding the best
ROC-AUC are $(\alpha, \beta) = (2.0, 0.5)$, $m = 0.6$, and $\gamma = 10.0$. Our
proposed SRV+FDP is not only no worse than the original
single model but performs best.</p>
        <p>We also show the accuracy of verification on
several legitimate datasets of different conditions in Table 2.</p>
        <p>[Tables 1 and 2 appeared here, listing ROC-AUC and accuracy values for the single model, Baseline, ADP, FDP, SRV, SRV+ADP, AdvT, and Our method under the ArcFace and CosFace losses; the tabular layout could not be recovered from the extraction.]</p>
        <p>
          These datasets are LFW (Huang et al. 2007),
AgeDB-30
          <xref ref-type="bibr" rid="ref33">(Moschoglou et al. 2017)</xref>
          , and CFP-FP
          <xref ref-type="bibr" rid="ref44">(Sengupta et al.
2016)</xref>
          , which provide face data in unconstrained settings.
We can see that our SRV+FDP is often slightly better
than the single model. On the other hand, ADP and
SRV+ADP are often worse than the single model. We can
use our method in tandem with other methods. The
experiments are with ArcFace and CosFace.
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>Robustness against Adversarial Examples in the White-Box Setting</title>
        <p>We generate AXs by LOTS with I-FGSM, BIM, and CW in
the white-box setting. We did not limit the number of
iterations in any attack; it swells beyond 1,000 in some
cases. We chose a rather large boundary ($\epsilon = 0.1$) for BIM.
We randomly sampled 1000 pairs of different-identity images
from the VGG2 test dataset and generated 500 AXs. As we
imposed few constraints on the AXs, all the attacks were successful
with their different perturbation sizes. We set the learning
rate of the underlying SGD to 0.1. We note that we can still
recognize the images with the largest perturbation with our own
eyes.</p>
        <p>We compared the robustness of the various methods through
the size of the perturbation. We define the ε-attack success rate
as</p>
        <p>Acc = |{x_adv ∈ AX : ||x_adv − x_s||_2 &lt; ε}| / |AX|.   (1)</p>
        <p>Here, x_s is a legitimate sample, and x_adv is the AX created
from x_s. AX is the set of all successful AXs generated
in the white-box setting. This measurement represents the
proportion of adversarial samples whose perturbation size
is less than ε among all legitimate images. We can say that the
larger the perturbation needed to fool the feature extractors, the
more robust they are.</p>
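        <p>Equation (1) can be computed directly from the stored perturbations; a minimal sketch (function and variable names are ours, not the paper's):</p>

```python
import numpy as np

def attack_success_rate(x_advs, x_srcs, eps):
    """Eq. (1): fraction of successful AXs whose L2 perturbation is below eps."""
    dists = [np.linalg.norm(x_adv - x_src) for x_adv, x_src in zip(x_advs, x_srcs)]
    return sum(d < eps for d in dists) / len(dists)
```

        <p>Sweeping eps over a range then yields curves like those in Figure 2: a curve that rises only at larger eps indicates a more robust model.</p>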
        <p>Graphs in the upper row of Figure 2 show the Acc of
LOTS via I-FGSM, BIM, and CW, respectively, for
ArcFace. We observed that (1) the baseline, ADP, and the FDP
do not enhance the robustness against AXs in a noticeable
manner, (2) SRV alone enhances robustness, (3) the
addition of ADP to SRV does not enhance robustness, (4) Our</p>
        <p>FDP+SRV has the best robustness. We see the same trends
for the other common loss function, CosFace, in the graphs in
the lower row of Figure 2 (LOTS via I-FGSM, BIM, and CW).</p>
        <p>Graphs in Figure 3 compare our method with
adversarial training by the Acc of LOTS via I-FGSM, BIM, and
CW, respectively, for ArcFace. The adversarial training is
performed with several values of the hyperparameter "m". Differences in
robustness among them are marginal. We see that our method
enjoys stronger robustness in all cases.</p>
      </sec>
      <sec id="sec-4-8">
        <title>Robustness against Adversarial Examples in Black-Box Setting</title>
        <p>We first generate AXs by LOTS with I-FGSM, BIM, and
CW against the single model in a white-box setting. We
generated several sets of AXs with different distances between
their features and the target features. All of these distances
are sufficiently small that all AXs are successful attacks.
Although an AX with a smaller distance is harder to generate, we
successfully generated AXs for all input images by searching
for a longer time. On average, it took 10.97 seconds to
generate an AX whose distance from the target is 0.2. As in the
white-box experiments, we did not restrict the
number of iterations, and we chose a rather large perturbation
boundary. We applied these AXs to each model for evaluation;
the results are shown in Figure 4.</p>
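        <p>The transferability evaluation described above could be sketched as follows; `models` maps names to feature extractors, and an AX counts as transferred when its features on a held-out model fall within a match threshold of the target identity's features (all names and the threshold are illustrative assumptions, not the paper's code):</p>

```python
import numpy as np

def transfer_rates(x_advs, f_targets, models, tol=0.5):
    """Per-model fraction of AXs that still match the target identity's features."""
    rates = {}
    for name, extract in models.items():
        hits = [np.linalg.norm(extract(x) - f) < tol
                for x, f in zip(x_advs, f_targets)]
        rates[name] = sum(hits) / len(hits)
    return rates
```

        <p>Repeating this for AX sets crafted at different feature distances gives transferability as a function of that distance, as plotted in Figure 4.</p>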
        <p>The results show that our method significantly
suppresses the transferability of AXs. We can also see a clear
relation between the transferability of the AXs and the
distance between features. We consider this is because every
member of our ensemble is derived from the same single
model. The transferability surprisingly reaches 1 when the
distance between the AX and the target in feature space
becomes sufficiently small. However, this comes at the cost of a
much larger perturbation in the input images, as shown in
Figure 5. The transferability for all models approaches zero
as the distance between features becomes large, because an
AX is no longer a successful attack at such a large distance.
That we eventually reach complete transferability at a small
distance in feature space indicates that an essential
improvement within the single model is necessary, unless we can
strictly forbid small perturbation sizes by some means. We
consider this a clear limitation of the ensemble-model method.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Introducing ensemble diversity among feature extractors for
adversarial robustness was not straightforward. We proposed a novel
method that effectively increases the robustness of
feature extractors. Our method might not completely prevent
adversarial examples when the perturbation is large. However,
it can be used in tandem with other methods to
increase their robustness. Our method also showed stronger
robustness than one of the adversarial training methods.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abbasi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Gagné</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Robustness to Adversarial Examples through an Ensemble of Specialists</article-title>
          .
          <source>In 5th International Conference on Learning Representations, ICLR</source>
          <year>2017</year>
          , Toulon, France,
          <source>April 24-26</source>
          ,
          <year>2017</year>
          , Workshop Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Athalye</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Carlini</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples</article-title>
          .
          <source>In Proceedings of the 35th International Conference on Machine Learning</source>
          ,
          <string-name>
            <surname>ICML</surname>
          </string-name>
          <year>2018</year>
          , Stockholmsmässan, Stockholm, Sweden,
          <source>July 10-15</source>
          ,
          <year>2018</year>
          ,
          <fpage>274</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          2018.
          <article-title>VGGFace2: A Dataset for Recognising Faces across Pose and Age</article-title>
          . In FG,
          <fpage>67</fpage>
          -
          <lpage>74</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Carlini</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          <year>2017a</year>
          .
          <article-title>Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods</article-title>
          .
          <source>In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security</source>
          ,
          <source>AISec@CCS</source>
          <year>2017</year>
          , Dallas, TX, USA, November 3,
          <year>2017</year>
          ,
          <fpage>3</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Carlini</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          <year>2017b</year>
          .
          <article-title>Towards Evaluating the Robustness of Neural Networks</article-title>
          .
          <source>In IEEE Symposium on Security and Privacy</source>
          ,
          <volume>39</volume>
          -
          <fpage>57</fpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Liu,
          <string-name>
            <given-names>Y.</given-names>
            ;
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          ; and Han,
          <string-name>
            <surname>Z.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices</article-title>
          . CoRR abs/1804.07573.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rosenfeld</surname>
          </string-name>
          , E.; and
          <string-name>
            <surname>Kolter</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          <year>2019</year>
          . Certified Adversarial Robustness via Randomized Smoothing.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          In
          <string-name>
            <surname>Chaudhuri</surname>
          </string-name>
          , K.; and Salakhutdinov, R., eds.,
          <source>Proceedings of the 36th International Conference on Machine Learning</source>
          ,
          <string-name>
            <surname>ICML</surname>
          </string-name>
          <year>2019</year>
          ,
          <volume>9</volume>
          -
          <fpage>15</fpage>
          June 2019, Long Beach, California, USA, volume
          <volume>97</volume>
          <source>of Proceedings of Machine Learning Research</source>
          ,
          <volume>1310</volume>
          -
          <fpage>1320</fpage>
          . PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Dabouei</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Soleymani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Taherkhani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dawson</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Nasrabadi</surname>
            ,
            <given-names>N. M.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Exploiting Joint Robustness to Adversarial Perturbations</article-title>
          .
          <source>In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Xue, N.; and
          <string-name>
            <surname>Zafeiriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>ArcFace: Additive Angular Margin Loss for Deep Face Recognition</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          In CVPR
          ,
          <fpage>4690</fpage>
          -
          <lpage>4699</lpage>
          . Computer Vision Foundation / IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Gilmer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ford</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Carlini</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Cubuk</surname>
            ,
            <given-names>E. D.</given-names>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>Adversarial Examples Are a Natural Consequence of Test Error in Noise</article-title>
          .
          <source>In Proceedings of the 36th International Conference on Machine Learning</source>
          ,
          <string-name>
            <surname>ICML</surname>
          </string-name>
          <year>2019</year>
          ,
          <volume>9</volume>
          -
          <fpage>15</fpage>
          June 2019, Long Beach, California, USA,
          <fpage>2280</fpage>
          -
          <lpage>2289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shlens</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>Explaining and Harnessing Adversarial Examples</article-title>
          .
          <source>In 3rd International Conference on Learning Representations, ICLR</source>
          <year>2015</year>
          , San Diego, CA, USA, May 7-
          <issue>9</issue>
          ,
          <year>2015</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Goswami</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ratha</surname>
            ,
            <given-names>N. K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and Vatsa,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Detecting and Mitigating Adversarial Perturbations for Robust Face Recognition</article-title>
          .
          <source>Int. J. Comput.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          Vis.
          <volume>127</volume>
          (
          <issue>6-7</issue>
          ):
          <fpage>719</fpage>
          -
          <lpage>742</lpage>
          . doi:
          <volume>10</volume>
          .1007/s11263-019-01160-w.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>URL https://doi.org/10.1007/s11263-019-01160-w.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Goswami</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; Ratha,
          <string-name>
            <given-names>N. K.</given-names>
            ;
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ;
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; and Vatsa,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>Unravelling Robustness of Deep Learning Based Face Recognition Against Adversarial Attacks</article-title>
          . In
          <string-name>
            <surname>McIlraith</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Weinberger</surname>
          </string-name>
          , K. Q., eds.,
          <source>Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)</source>
          ,
          <fpage>6829</fpage>
          -
          <lpage>6836</lpage>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref19b">
        <mixed-citation>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>In 2016 IEEE Conference on Computer Vision</source>
          and Pattern Recognition,
          <string-name>
            <surname>CVPR</surname>
          </string-name>
          <year>2016</year>
          ,
          <string-name>
            <surname>Las</surname>
            <given-names>Vegas</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NV</surname>
          </string-name>
          , USA, June 27-30,
          <year>2016</year>
          ,
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Hein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and Andriushchenko,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation</article-title>
          . In Guyon, I.; von Luxburg, U.;
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Wallach,
          <string-name>
            <given-names>H. M.</given-names>
            ;
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Vishwanathan,
          <string-name>
            <surname>S.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>V. N.</surname>
          </string-name>
          ; and Garnett, R., eds.,
          <source>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems</source>
          <year>2017</year>
          ,
          <fpage>4</fpage>
          -9
          <source>December</source>
          <year>2017</year>
          , Long Beach, CA, USA,
          <fpage>2266</fpage>
          -
          <lpage>2276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>G. B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ramesh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Learned-Miller</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments</article-title>
          .
          <source>Technical Report 07-49</source>
          , University of Massachusetts, Amherst.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Ioffe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift</article-title>
          . In
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>F. R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Blei</surname>
          </string-name>
          , D. M., eds.,
          <source>Proceedings of the 32nd International Conference on Machine Learning</source>
          ,
          <string-name>
            <surname>ICML</surname>
          </string-name>
          <year>2015</year>
          , Lille, France,
          <fpage>6</fpage>
          -
          <issue>11</issue>
          <year>July 2015</year>
          , volume
          <volume>37</volume>
          <source>of JMLR Workshop and Conference Proceedings</source>
          ,
          <fpage>448</fpage>
          -
          <lpage>456</lpage>
          . JMLR.org.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Kakizaki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Yoshida</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Adversarial Image Translation: Unrestricted Adversarial Examples in Face Recognition Systems</article-title>
          . In SafeAI@AAAI, volume
          <volume>2560</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <fpage>6</fpage>
          -
          <lpage>13</lpage>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Kariyappa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and Qureshi,
          <string-name>
            <surname>M. K.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Improving Adversarial Robustness of Ensembles with Diversity Training</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          CoRR abs/1901.09981.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Kurakin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J</given-names>
          </string-name>
          .; and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017a</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <article-title>Adversarial examples in the physical world</article-title>
          .
          <source>In 5th International Conference on Learning Representations, ICLR</source>
          <year>2017</year>
          , Toulon, France,
          <source>April 24-26</source>
          ,
          <year>2017</year>
          , Workshop Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Kurakin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J</given-names>
          </string-name>
          .; and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017b</year>
          .
          <source>Adversarial Machine Learning at Scale. In 5th International Conference on Learning Representations, ICLR</source>
          <year>2017</year>
          , Toulon, France,
          <source>April 24-26</source>
          ,
          <year>2017</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Wen,
          <string-name>
            <given-names>Y.</given-names>
            ;
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ;
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ; and
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <article-title>SphereFace: Deep Hypersphere Embedding for Face Recognition</article-title>
          .
          <source>In CVPR</source>
          ,
          <fpage>6738</fpage>
          -
          <lpage>6746</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Madry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Makelov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tsipras</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vladu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Towards Deep Learning Models Resistant to Adversarial Attacks</article-title>
          .
          <source>In 6th International Conference on Learning Representations, ICLR</source>
          <year>2018</year>
          , Vancouver, BC, Canada, April 30 - May 3,
          <year>2018</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Moschoglou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Papaioannou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sagonas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kotsia</surname>
            ,
            <given-names>I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zafeiriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>AgeDB: The First Manually Collected, In-the-Wild Age Database</article-title>
          .
          <source>In CVPR Workshops</source>
          ,
          <year>1997</year>
          -
          <fpage>2005</fpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Improving Adversarial Robustness via Promoting Ensemble Diversity</article-title>
          . In ICML, volume
          <volume>97</volume>
          <source>of Proceedings of Machine Learning Research</source>
          ,
          <volume>4970</volume>
          -
          <fpage>4979</fpage>
          . PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McDaniel</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks</article-title>
          . In
          <source>IEEE Symposium on Security and Privacy, SP 2016</source>
          , San Jose, CA, USA, May 22-26,
          <year>2016</year>
          ,
          <fpage>582</fpage>
          -
          <lpage>597</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Parkhi</surname>
            ,
            <given-names>O. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Deep Face Recognition</article-title>
          . In
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>M. W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tam</surname>
            ,
            <given-names>G. K. L.</given-names>
          </string-name>
          , eds.,
          <source>Proceedings of the British Machine Vision Conference 2015, BMVC 2015</source>
          , Swansea, UK, September 7-10,
          <year>2015</year>
          ,
          <fpage>41.1</fpage>
          -
          <lpage>41.12</lpage>
          . BMVA Press.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Raghunathan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Steinhardt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <article-title>Semidefinite relaxations for certifying robustness to adversarial examples</article-title>
          . In
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grauman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cesa-Bianchi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , eds.,
          <source>Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems</source>
          <year>2018</year>
          , NeurIPS 2018, 3-8 December
          <year>2018</year>
          , Montréal, Canada,
          <fpage>10900</fpage>
          -
          <lpage>10910</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <surname>Rozsa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Günther</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Boult</surname>
            ,
            <given-names>T. E.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>LOTS about attacking deep features</article-title>
          .
          <source>In IJCB</source>
          ,
          <fpage>168</fpage>
          -
          <lpage>176</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <surname>Russakovsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Satheesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Karpathy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>Int. J. Comput. Vis.</source>
          <volume>115</volume>
          (
          <issue>3</issue>
          ):
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>Samangouei</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kabkab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Chellappa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models</article-title>
          .
          <source>In 6th International Conference on Learning Representations, ICLR</source>
          <year>2018</year>
          , Vancouver, BC, Canada, April 30 - May 3,
          <year>2018</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>Schroff</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kalenichenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Philbin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>FaceNet: A unified embedding for face recognition and clustering</article-title>
          .
          <source>In CVPR</source>
          ,
          <fpage>815</fpage>
          -
          <lpage>823</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <surname>Sengupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>V. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chellappa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Frontal to profile face verification in the wild</article-title>
          .
          <source>In WACV</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <surname>Sharif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bhagavatula</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bauer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>M. K.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition</article-title>
          . In
          <string-name>
            <surname>Weippl</surname>
            ,
            <given-names>E. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Katzenbeisser</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kruegel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Halevi</surname>
          </string-name>
          , S., eds.,
          <source>Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security</source>
          , Vienna, Austria, October 24-28,
          <year>2016</year>
          ,
          <fpage>1528</fpage>
          -
          <lpage>1540</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nagpal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vatsa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>On the Robustness of Face Recognition Algorithms Against Attacks and Bias</article-title>
          .
          <source>In The Thirty-Fourth AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2020</year>
          , The Thirty-Second
          <source>Innovative Applications of Artificial Intelligence Conference</source>
          ,
          <source>IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI</source>
          <year>2020</year>
          , New York, NY, USA, February 7-12,
          <year>2020</year>
          ,
          <fpage>13583</fpage>
          -
          <lpage>13589</lpage>
          . AAAI Press. URL https://aaai.org/ojs/index.php/AAAI/article/view/7085.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <given-names>Deep</given-names>
            <surname>Learning Face Representation by Joint IdentificationVerification. In</surname>
          </string-name>
          <string-name>
            <surname>NIPS</surname>
          </string-name>
          ,
          <year>1988</year>
          -
          <fpage>1996</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sermanet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Anguelov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Rabinovich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Going deeper with convolutions</article-title>
          . In
          <source>IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015</source>
          , Boston, MA, USA, June 7-12,
          <year>2015</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zaremba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bruna</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Intriguing properties of neural networks</article-title>
          . In
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , eds.,
          <source>2nd International Conference on Learning Representations, ICLR 2014</source>
          , Banff, AB, Canada, April 14-16,
          <year>2014</year>
          , Conference Track Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <string-name>
            <surname>Tramèr</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Carlini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brendel</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Madry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>On Adaptive Attacks to Adversarial Example Defenses</article-title>
          . In
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Balcan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , eds.,
          <source>Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems</source>
          <year>2020</year>
          , NeurIPS 2020, December 6-12,
          <year>2020</year>
          , virtual. URL https://proceedings.neurips.cc/paper/2020/hash/11f38f8ecd71867b42433548d1078e38-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Rethinking Feature Distribution for Loss Functions in Image Classification</article-title>
          .
          <source>In CVPR</source>
          ,
          <fpage>9117</fpage>
          -
          <lpage>9126</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>CosFace: Large Margin Cosine Loss for Deep Face Recognition</article-title>
          .
          <source>In CVPR</source>
          ,
          <fpage>5265</fpage>
          -
          <lpage>5274</lpage>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kolter</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope</article-title>
          . In
          <string-name>
            <surname>Dy</surname>
            ,
            <given-names>J. G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , eds.,
          <source>Proceedings of the 35th International Conference on Machine Learning</source>
          ,
          <source>ICML 2018</source>
          , Stockholmsmässan, Stockholm, Sweden, July 10-15,
          <year>2018</year>
          , volume
          <volume>80</volume>
          of Proceedings of Machine Learning Research,
          <fpage>5283</fpage>
          -
          <lpage>5292</lpage>
          . PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>F. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Metzen</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kolter</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Scaling provable adversarial defenses</article-title>
          . In
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grauman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cesa-Bianchi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , eds.,
          <source>Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems</source>
          <year>2018</year>
          , NeurIPS 2018, 3-8 December
          <year>2018</year>
          , Montréal, Canada,
          <fpage>8410</fpage>
          -
          <lpage>8419</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks</article-title>
          .
          In
          <source>25th Annual Network and Distributed System Security Symposium, NDSS 2018</source>
          , San Diego, California, USA, February 18-21,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks</article-title>
          .
          <source>CoRR abs/1604.02878</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks</article-title>
          .
          <source>In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems</source>
          <year>2018</year>
          , NeurIPS 2018, 3-8 December
          <year>2018</year>
          , Montréal, Canada,
          <fpage>7924</fpage>
          -
          <lpage>7933</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Adversarial Learning With Margin-Based Triplet Embedding Regularization</article-title>
          . In ICCV,
          <fpage>6548</fpage>
          -
          <lpage>6557</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>