Error-Silenced Quantization: Bridging Robustness and Compactness∗

Zhicong Tang, Yinpeng Dong and Hang Su
Tsinghua University
{tzc17, dyp17}@mails.tsinghua.edu.cn, suhangss@mail.tsinghua.edu.cn

∗ Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

As deep neural networks (DNNs) advance rapidly, quantization has become a widely used standard for deployment on resource-limited hardware. However, DNNs are widely accepted to be vulnerable to adversarial attacks, and quantization is found to further weaken their robustness. Adversarial training is a proven defense, but it depends on a larger network capacity, which contradicts quantization. In this work, we propose Error-silenced Quantization, a novel method that relaxes this capacity requirement and achieves both robustness and compactness. We first observe the Error Amplification Effect, i.e., that small perturbations on adversarial samples are amplified through the layers, and then design a pairing that directly silences the error. Comprehensive experimental results on CIFAR-10 and CIFAR-100 show that our method fixes the robustness drop under alternative threat models and even outperforms full-precision models. Finally, we study different pairing schemes and secure our method against the obfuscated gradient problem that undermines many previous defenses.

1 Introduction

Deep neural networks (DNNs) have demonstrated extraordinary performance in a wide range of applications, including visual understanding [Krizhevsky et al., 2012; He et al., 2016], speech recognition [Graves et al., 2013], and natural language processing [Devlin et al., 2019]. As these applications develop, DNNs are increasingly deployed on embedded and edge devices such as mobile phones, IoT devices, and autonomous driving systems. To facilitate such deployment, quantization [Wu et al., 2016; Jacob et al., 2018] has been proposed and has become an industry standard for deep learning hardware and an accelerator for inference in real-time applications [Rastegari et al., 2016].

However, DNNs are known to be vulnerable to adversarial attacks [Szegedy et al., 2014; Goodfellow et al., 2015]: maliciously generated, hardly noticeable noise can easily deceive a model into erroneous predictions. This may lead to disastrous consequences and raises concerns about applications in security-critical domains. For example, in autonomous driving, a stop sign can be mistakenly detected by a model as a permission signal [Eykholt et al., 2018]; in face recognition, an adversary can fool the model, bypass authentication and gain full access to the system [Sharif et al., 2016]. These potential risks are one of the key hindrances to deploying DNNs in safety-critical scenarios.

Furthermore, the commonly used vanilla quantization approaches concentrate on classification accuracy on clean inputs and may be even more severely threatened by adversarial attacks (Table 1). It is therefore imperative to develop a quantization algorithm that jointly optimizes robustness and compactness. Adversarial training [Goodfellow et al., 2015; Kurakin et al., 2017; Madry et al., 2018], i.e., augmenting the training set with adversarial samples, is recognized as one of the best defenses. Nevertheless, it generally requires a significantly larger network capacity than predicting only clean inputs, which contradicts quantization.

To address this issue, we equip quantization with adversarial training and relax the capacity requirement by extracting a pairing objective. Pairing the clean and perturbed activations diminishes the error between them and is added to the training loss. A model concurrently trained and quantized with this loss learns to infer similarly on clean and adversarial inputs and thus achieves both strong robustness and high compactness. Though previous works [Galloway et al., 2018; Gui et al., 2019] are aware of the robustness drop and attempt to fix it, their settings are limited.
We thoroughly verify the robustness of our method against four threat models: white-box attacks, in which attackers have full access to the target model; score-based and decision-based black-box attacks, in which attackers have access to detailed or final predictions; and transfer attacks, in which attackers know only the data distribution.

Experiments demonstrate our contributions: (i) We plot the precise error in the activations of attacked models. (ii) We propose a novel quantization method that directly regulates the perturbed activations. (iii) With this method we silence the error and bridge robustness with model compactness. (iv) We further confirm the superiority and security of our method. The method is called Error-silenced Quantization (EQ), since it is inspired by the Error Amplification Effect and aims at silencing the error in both activations and predictions.

ε            1      2      4      8      16
NAT-Full     36.19  27.96  20.76  14.53  7.79
NAT-VQ-BWN   35.38  21.07  11.79  7.59   4.99
ADV-Full     47.22  43.65  36.63  24.60  11.16
ADV-VQ-BWN   40.84  28.34  19.00  12.74  7.74

Table 1: Results on CIFAR-100 with ResNet-152 support that quantization undermines robustness: the accuracy (in %) of quantized models under FGSM attacks drops rapidly as ε increases. Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -VQ- for vanilla quantization, -Full for full precision, -BWN for binary weight.

2 Background

2.1 Compress with quantization

In this section, we briefly introduce two typical quantized networks: the Binary Weight Network (BWN) [Rastegari et al., 2016] and the Ternary Weight Network (TWN) [Li and Liu, 2016].

First, the weights of the l-th layer of a DNN can be denoted by W_l = {W_1, · · · , W_i, · · · , W_m}, where the layer has m output channels and W_i ∈ R^d is the weight of the i-th filter. Quantization converts each weight matrix W_i into Q_i ∈ S^d, where S consists of at most 2^n sparse values in an n-bit quantization.

BWN takes a scaling factor α ∈ R+ and S = {−α, +α}. Solving the optimization J = min ||W_i − αB_i|| yields

$$B_i^j = \operatorname{sign}(W_i^j), \qquad \alpha = \frac{1}{d}\sum_{j=1}^{d}\left|W_i^j\right|. \tag{1}$$

TWN introduces a 0 state over BWN with S = {−α, 0, +α} to approximate the real-valued weight W_i more precisely. It solves the optimization J = min ||W_i − αT_i|| as

$$T_i^j = \begin{cases} -1, & W_i^j < -\Delta \\ 0, & \left|W_i^j\right| \le \Delta \\ +1, & W_i^j > \Delta \end{cases}, \qquad \alpha = \frac{1}{\left|I_\Delta\right|}\sum_{j\in I_\Delta}\left|W_i^j\right|, \tag{2}$$

where Δ = (0.7/d) Σ_{j=1}^d |W_i^j| and I_Δ = {j : |W_i^j| > Δ}.

Then αB_i and αT_i are the 1-bit and 2-bit quantized Q_i that form the space-efficient weight Q. Since the factor α requires little storage, BWN compresses a full-precision model by 32× and TWN by 16×.
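To make Eqs. (1) and (2) concrete, the sketch below applies the two closed-form solutions to a single filter. This is a minimal illustration under our reading of the formulas, not the authors' released code; the function names `bwn_quantize` and `twn_quantize` are hypothetical, and PyTorch is assumed.

```python
import torch

def bwn_quantize(w):
    """BWN (Eq. 1): approximate one filter W_i by alpha * sign(W_i)."""
    alpha = w.abs().mean()                 # alpha = (1/d) * sum_j |W_i^j|
    return alpha * torch.sign(w)           # values in S = {-alpha, +alpha}

def twn_quantize(w):
    """TWN (Eq. 2): approximate one filter W_i by alpha * T_i, T_i in {-1, 0, +1}."""
    delta = 0.7 * w.abs().mean()           # Delta = (0.7/d) * sum_j |W_i^j|
    mask = w.abs() > delta                 # I_Delta = {j : |W_i^j| > Delta}
    alpha = w.abs()[mask].mean() if mask.any() else w.new_tensor(0.0)
    t = torch.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    return alpha * t                       # values in S = {-alpha, 0, +alpha}
```

Applied filter by filter, only the sign/ternary codes and one α per filter need to be stored, which is where the 32× and 16× compression rates quoted above come from.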
−α, Wi < −∆  1 X Tji = 0, Wij ≤ ∆ and α = Wij , (2) 3 The Error Amplification Effect  |I ∆ | +α, Wij > ∆ i∈I∆ The conventionally quantized DNN is counter-intuitively  Pd j j more vulnerable [Lin et al., 2019] under the threat of adver- where ∆ = 0.7 d j=1 Wi and I∆ = {j| Wi > ∆}. sarial attacks. One convincing explanation is the Error Ampli- Then Bi and Ti are the 1-bit and 2-bit quantized Qi that fication Effect discovered by [Liao et al., 2018]. Specifically, forms the space-efficient weight Q. Since the factor α re- tiny perturbations can be amplified when fed through layers, quires little storage, BWN compresses a full-precision model become sizable enough to deceive the network and eventually by 32× and TWN compresses by 16×. push the classification result into an incorrect bucket. More- over, the quantization of a DNN worsens its robustness com- 2.2 Adversarial attacks and defenses paring with the original full-precision one by enlarging the Given an image x, adversarial attacks is to find the noise δ granularity of the weights, making its response more suscep- that the classifier’s prediction of input xadv = x + δ is wrong. tible to the input. As shown in Table 1, quantized models And defenses aim to maintain the robustness of the classifier, yield constantly inferior robustness under FGSM attacks of i.e. the prediction accuracy on input xadv . Here we list some varied perturbation strength. attacks and defenses used in experiments. To in detail investigate the effect, we conducted pre- experiments on CIFAR-100 [Krizhevsky and Hinton, 2009] 2.2.1 Attacks and ResNet-152 [He et al., 2016]. Adversarial samples are Fast Gradient Sign Method (FGSM) is a L∞ bounded one- generated untargeted by a 10-step PGD attacker with other step attack forwarded by [Goodfellow et al., 2015] that cal- parameters  = 8/255 and step size 2/255 corresponding culates the adversarial samples by following the direction of to [Madry et al., 2018]. In Figure 1, we test four settings the gradient of loss function L at step size . with the attack, evaluate and plot the distance Dl between the Projected Gradient Descend (PGD) proposed by [Madry clean and perturbed activation of each layer as et al., 2018] repeats FGSM and starts with a random step to escape the sharp curvature near the original input, and is Fl (x) − Fl (xadv ) 2 thought to be the strongest first-order attack. Dl (x, xadv ) = , (4) C&W Attack [Carlini and Wagner, 2017] chooses tanh kFl (x)k2 function instead of box-constrained methods and optimizes where Fl denotes the activation after the l-th ResNet module. the difference between logits instead of the logit itself. It is For convenience, we note training scheme with prefix NAT- an iterative attack and among the strongest L2 attacks. and ADV-, quantization scheme with infix -VQ- and -EQ- Decoupling Direction and Norm Attack (DDN) [Rony , weight precision with suffix -Full, -BWN, -TWN and use et al., 2019] is a newly proposed L2 attack that outperforms acronyms in all tables. C&W. It iterates FGSM with the  adjusted in each round, In the left zone of the illustration 1, the adversarial noise leading to a finer-grained search for adversarial images. 
3 The Error Amplification Effect

A conventionally quantized DNN is counter-intuitively more vulnerable [Lin et al., 2019] under the threat of adversarial attacks. One convincing explanation is the Error Amplification Effect discovered by [Liao et al., 2018]: tiny perturbations can be amplified as they pass through the layers, become sizable enough to deceive the network, and eventually push the classification result into an incorrect bucket. Moreover, quantizing a DNN worsens its robustness compared with the original full-precision model by enlarging the granularity of the weights, making its response more susceptible to the input. As shown in Table 1, quantized models yield consistently inferior robustness under FGSM attacks of varied perturbation strength.

To investigate the effect in detail, we conducted pre-experiments on CIFAR-100 [Krizhevsky and Hinton, 2009] and ResNet-152 [He et al., 2016]. Adversarial samples are generated untargeted by a 10-step PGD attacker with ε = 8/255 and step size 2/255, following [Madry et al., 2018]. In Figure 1, we test four settings against this attack and plot the relative distance D_l between the clean and perturbed activations of each layer,

$$D_l(x, x_{adv}) = \frac{\left\|F_l(x) - F_l(x_{adv})\right\|_2}{\left\|F_l(x)\right\|_2}, \tag{4}$$

where F_l denotes the activation after the l-th ResNet module. For convenience, we denote the training scheme with the prefix NAT- or ADV-, the quantization scheme with the infix -VQ- or -EQ-, and the weight precision with the suffix -Full, -BWN or -TWN, and use these acronyms in all tables.

Figure 1: Small perturbations are amplified throughout the layers, and the two quantized BWN models predict the same level of error as the undefended naturally trained model. Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -VQ- for vanilla quantization, -Full for full precision, -BWN for binary weight.

In the left zone of Figure 1, the adversarial noise applied to the input image is relatively small compared to the image itself (±8 versus 255 in this setting). However, as inference proceeds, the magnitude of the initial perturbation is amplified through the latter part of the network. Once the perturbation grows large enough, the model is misled into a wrong bucket and accuracy suffers a harsh drop.

From these results we make the following observations: (i) The error in the activations eventually accumulates enough to push the prediction into a misleading bucket. (ii) All models suffer from the effect, while quantization reduces robustness by a wide margin. (iii) With vanilla quantization methods, the robustness gain of adversarial training is drastically degraded.

Therefore, the currently used vanilla quantization methods prove practically limited, and the Error Amplification Effect may be a key to a robustness-aware quantization.
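The layer-wise curves of Figure 1 can be reproduced, in spirit, by recording the activation after every residual module and evaluating Eq. (4). The sketch below is one way to do this with forward hooks; the helper names are ours and the choice of `modules` (e.g., every ResNet block) is an assumption.

```python
import torch

def relative_error(f_clean, f_adv):
    """D_l of Eq. (4): relative L2 distance between clean and perturbed activations."""
    diff = (f_clean - f_adv).flatten(1).norm(dim=1)
    return diff / f_clean.flatten(1).norm(dim=1)

def layerwise_errors(model, modules, x, x_adv):
    """Capture activations after each chosen module for x and x_adv, then return
    the mean relative error per layer (the curve plotted in Figure 1)."""
    acts = {}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out, k=k: acts.setdefault(k, []).append(out.detach()))
             for k, m in enumerate(modules)]
    with torch.no_grad():
        model(x)
        model(x_adv)
    for h in hooks:
        h.remove()
    return [relative_error(clean, adv).mean().item() for clean, adv in acts.values()]
```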
4 Method

Motivated by the Error Amplification Effect above, we introduce a quantization scheme that simultaneously preserves the robustness of the original full-capacity model and the compactness of low-bandwidth quantization. The concurrent training and quantizing procedure is described in Algorithm 1.

We first follow the commonly used min-max robustness optimization and formulate the overall robustness and compactness target as

$$\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\Delta} L(\theta, x+\delta, y)\right] \quad \text{s.t.} \quad \frac{\operatorname{size}(\theta_{full})}{\operatorname{size}(\theta)} = c, \tag{5}$$

where θ_full is the original full-precision weight (W), θ is the finally quantized weight (Q), size(·) is the memory size required to store the weight and c is the target compression rate.

Equation (5) can be divided into two parts: (i) minimize the loss on adversarially perturbed inputs for robustness; (ii) compress the model weight to meet the target rate for compactness. In our method, the latter is handled by a quantization algorithm that allows simultaneous training, and the former is handled by directly controlling the amplified error, i.e., pairing activations.

Algorithm 1 Error-silenced Quantization
Input: dataset D, full-precision weight θ_full, selected layers S and loss function L
Parameter: quantization iterations K, PGD perturbation strength ε, PGD iterations T, sensitivity parameters λ_l and distance functions D_l for each layer l
Output: quantized weight θ
1: for k = 1, 2, · · · , K do
2:   Sample a batch (x, y) from D
3:   Partially quantize θ_full into θ
4:   for t = 1, 2, · · · , T do
5:     Solve the inner max of Eq. (6) to obtain δ
6:   end for
7:   L := L(θ, x, y)
8:   for layer l in S do
9:     L := L + λ_l D_l(x, x + δ)
10:  end for
11:  Backward and update θ_full with loss L
12: end for
13: return θ

4.1 Pairing activation

Since the activation of an adversarial input deviates largely from that of its original image, a natural solution to control the error is training the network to diminish this deviation. Let D_l(x, x') be a function that calculates the relative distance between the activations of the l-th layer when the model is fed with x and x' respectively, which can be a normalized L2 or L∞ norm. With a set S of layers to control, the robustness regularization that optimizes the former part of (5) is

$$\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[L(\theta, x, y) + \max_{\delta\in\Delta} P(x, x+\delta)\right]. \tag{6}$$

Here L is the loss function and P is the pairing defined as

$$P(x, x_{adv}) = \sum_{l\in S}\lambda_l D_l(x, x_{adv}), \tag{7}$$

where the λ_l are sensitivity parameters that determine the threshold of the amplified error between clean and adversarial samples. The model is forced to infer close activations on the l-th layer if λ_l is large and is allowed to tolerate sizable differences if λ_l is small.

With the pairing objective, we train the model on clean samples and pair the activations of particular layers, rather than training directly on adversarial samples. Equation (6) can also be divided into two parts that separately tackle the classification accuracy on clean and adversarial images. The first part is designed to maintain the performance of the model, because it is known that robustness often comes at the cost of prediction accuracy [Su et al., 2018]. With the second part, we train the model to diminish the deviation and infer close activations. A model that behaves closely on clean and adversarial inputs is expected to reach close prediction accuracy on both.

As a special case, the pairing can be applied only to the final output layer of the network, on which the following experiments focus. The pairing then simplifies to the distance between the logits on clean and adversarial samples.

4.2 Solving adversarial perturbations

In the optimization (6), the perturbations δ should be generated to maximize the error of the selected activations. However, in this work we generate them with untargeted white-box attacks, because they are believed to be the strongest attacks and, so far, no attack explicitly studies and magnifies the activation error.

Previous work [Madry et al., 2018] has shown that PGD is the most powerful first-order attack. We follow this conclusion and solve the adversarial perturbations δ with PGD attacks, using settings consistent with [Madry et al., 2018] except for a modified iteration number and step size.

4.3 Progressive quantization

Our method upholds and improves the robustness of quantized models by concurrently updating and quantizing their weights. Accordingly, we adopt the Stochastic Quantization method introduced in [Dong et al., 2019]. In our method, a model with partially quantized weights is fed clean and adversarial inputs, and the full-precision weights are updated with the estimated gradients. In comparison, vanilla Stochastic Quantization trains models on clean inputs only.
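To connect Algorithm 1 with Eqs. (6) and (7), here is a minimal sketch of one train-and-quantize iteration in PyTorch, specialized to the logit-pairing case used in our experiments. It is an illustrative reading of the procedure, not the released implementation: `quantize_partial` is a hypothetical helper standing in for the stochastic (partial) quantization step of [Dong et al., 2019] and its gradient bookkeeping, `pgd_attack` is the sketch from Section 2.2, and `lam` plays the role of λ.

```python
import torch
import torch.nn.functional as F

def eq_training_step(model, quantize_partial, x, y, optimizer, lam=1.0,
                     eps=8/255, step_size=2/255, steps=10):
    """One iteration of concurrent training and quantization (cf. Algorithm 1)."""
    quantize_partial(model)                    # line 3: partially quantize theta_full into theta
    x_adv = pgd_attack(model, x, y, eps, step_size, steps)   # lines 4-6: solve the inner max

    logits_clean = model(x)
    logits_adv = model(x_adv)
    # Pairing on the final layer: normalized L2 distance between clean and adversarial logits
    pair = ((logits_clean - logits_adv).norm(dim=1)
            / logits_clean.norm(dim=1).clamp_min(1e-12)).mean()
    loss = F.cross_entropy(logits_clean, y) + lam * pair      # lines 7-10

    optimizer.zero_grad()
    loss.backward()                            # line 11: gradients update the full-precision weights
    optimizer.step()
    return loss.item()
```

Note that only the clean cross-entropy and the pairing term are minimized; the adversarial samples never enter the classification loss directly, which is what relaxes the capacity requirement of standalone adversarial training.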
       NF     NEB    AF     AVB    AEB    AET
Clean  93.33  79.35  80.10  90.84  82.19  81.31
FGSM   7.24   26.47  29.47  22.81  29.49  26.72
PGD    0.00   41.84  47.06  7.08   41.62  41.02
DDN    0.00   29.11  28.18  2.43   28.04  24.81
C&W    0.04   38.58  40.49  8.45   38.24  36.84

(a) Natural test and white-box attack accuracy (in %). Underlining marks the best and second-best result in each row.

       NF     NEB    AF     AVB    AEB    AET
NF     0.00   77.68  78.06  77.74  81.11  79.62
NEB    69.10  41.82  60.84  65.58  64.19  64.08
AF     67.44  57.33  47.71  54.49  61.20  60.89
AVB    24.82  73.51  72.75  7.11   76.09  75.31
AEB    75.79  62.74  63.12  64.98  41.36  60.66
AET    77.20  63.31  63.70  67.79  61.11  41.12

(b) Transfer attack accuracy (in %). Attacks are generated by row and applied by column; for example, the AF model reaches an accuracy of 60.84% on adversarial inputs generated with the NEB model.

Table 2: Test results on CIFAR-10. Abbreviations: N- for naturally trained, A- for adversarially trained, -V- for vanilla quantization, -E- for Error-silenced Quantization, -F for full precision, -B for binary weight, and -T for ternary weight.

5 Experiments

In this section, our experiments demonstrate that the proposed method can effectively retain and further improve robustness when a model is quantized to low bandwidth. The method also diminishes the aforementioned Error Amplification Effect by a large margin compared with both full-precision and vanilla quantized models. Finally, we show that the method provides more convincing performance than two baselines: adversarial training before quantization and adversarial training during quantization.

5.1 Settings

We apply Wide ResNet 28-10 [Zagoruyko and Komodakis, 2016] to CIFAR-10 [Krizhevsky and Hinton, 2009] and ResNet-152 to CIFAR-100. Six models in each setting are tested on clean inputs, white-box attacks and transfer attacks.

During training, we augment the training set with the same PGD attacker as above and train models with an Adam optimizer [Kingma and Ba, 2015] for 150 epochs. The hyper-parameters are left at their defaults without fine-tuning.

During quantization, we pair the activation after the final layer (the logits) with the L2 norm and use an SGD optimizer with learning rate 0.1, momentum 0.9 and weight decay 10^-4 to train for 120 epochs, consistent with [Dong et al., 2019]. However, the quantization ratio is updated by a uniform scheme, i.e., it begins at 0.2 and is increased by 0.2 every 25 epochs.
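The uniform ratio schedule can be written down directly; the following sketch (with a hypothetical helper name) reproduces the 0.2-per-25-epochs schedule over the 120-epoch quantization stage described above.

```python
def quantization_ratio(epoch, start=0.2, step=0.2, interval=25):
    """Fraction of weights to quantize at a given epoch:
    0.2 for epochs 0-24, 0.4 for 25-49, ..., 1.0 from epoch 100 onward."""
    return min(1.0, start + step * (epoch // interval))
```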
5.2 Retaining robustness of quantized models

For white-box attack tests, we use a 20-step PGD attacker with step size 0.1, which is slightly stronger than the one used for training. We also analyze robustness against other adversarial attacks, using ε = 16/255 FGSM to study one-step attacks, and 100-step ε = 1 DDN and 20-step ε = 1 C&W to study L2-bounded attacks.

For transfer attack tests, all adversarial samples are generated by the same PGD attacker as in the white-box stage. We train and quantize alternative models from scratch when the model setting generating the attack is the same as the one being attacked.

5.2.1 Results

As shown in Tables 2a and 3a, the vanilla quantized models exhibit weak robustness, and adversarial training before quantization helps little: with conventional methods, the robustness gained by adversarial training is drastically degraded to nearly none. With our method, the accuracy consistently floats around or above that of the full-precision models on both datasets. Compared with the gap left by vanilla quantization, our proposed method proves feasible in keeping the drop to a reasonably small level and works for both naturally and adversarially trained models.

In the cross transfer attack scenario (Tables 2b and 3b), our robustly quantized models achieve sound results. For adversarial attacks generated from NF models, which is often the practical situation, the proposed method helps quantized models steadily beat the AF model. Our method also establishes solid defenses against other attacks; for example, in Table 3b the -EQ- models exceed the AF model under attacks generated by other quantized models.

We also notice that the NEB and AEB models perform almost the same, which further demonstrates an advantage of our method: adversarial training before quantization is not required. Lastly, the method manages to maintain and even improve accuracy on clean data.

       NF     NEB    AF     AVB    AEB    AET
Clean  73.20  55.54  50.80  65.84  54.09  50.74
FGSM   7.77   12.05  11.15  7.59   13.36  10.78
PGD    0.03   19.17  22.15  0.65   20.49  19.03
DDN    0.01   12.35  17.37  0.24   13.74  12.22
C&W    0.34   18.48  20.62  1.21   19.83  17.02

(a) Natural test and white-box attack accuracy (in %). Underlining marks the best and second-best result in each row.

       NF     NEB    AF     AVB    AEB    AET
NF     0.09   52.87  48.63  33.14  52.07  48.57
NEB    49.88  18.88  36.80  41.44  37.69  36.52
AF     44.78  37.27  22.38  33.68  35.75  34.62
AVB    13.77  51.94  46.64  0.57   50.56  47.18
AEB    51.14  37.99  36.31  41.40  20.71  36.45
AET    56.34  40.05  37.50  46.31  39.13  18.67

(b) Transfer attack accuracy (in %). Attacks are generated by row and applied by column; for example, the AF model reaches an accuracy of 36.80% on adversarial inputs generated with the NEB model.

Table 3: Test results on CIFAR-100. Abbreviations: N- for naturally trained, A- for adversarially trained, -V- for vanilla quantization, -E- for Error-silenced Quantization, -F for full precision, -B for binary weight, and -T for ternary weight.

5.3 Silencing the Error Amplification Effect

We re-evaluate the error in latent layers to investigate whether the method manages to silence it. The relative distance is defined in (4) and sampled after every ResNet module. The experiment is conducted on ResNet-152 and CIFAR-100.

Figure 2: Our quantized models diminish the Error Amplification Effect by a large margin and even outperform full-precision models. Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -VQ- for vanilla quantization, -EQ- for Error-silenced Quantization, -Full for full precision, -BWN for binary weight, and -TWN for ternary weight.

5.3.1 Results

Though the input is perturbed by the same magnitude, the error is amplified quite differently in Figure 2. With conventional quantization, the error of the ADV-VQ-BWN model increases to up to 4 times that of the ADV-Full model, which is a possible explanation of the large robustness drop. With our method, the models keep the error lower than their full-precision counterparts throughout inference.

[Xu et al., 2018] conclude that image quantization, i.e., reduction in color bit depth, is an effective defense. Quantization of network weights, however, weakens robustness instead. [Lin et al., 2019] showed that it tends to intensify the Error Amplification Effect when ε > 3/255, which in our experiments starts even from ε = 1/255 (Table 1). Our method obtains significant results, overcomes this threshold and pushes it beyond ε = 8/255, as shown in Figure 2.

Training      Clean  FGSM   PGD    DDN    C&W
Natural       56.39  11.26  19.53  12.55  18.33
Adversarial   49.64  10.07  16.66  10.80  16.10
Natural       54.88  10.16  17.99  10.23  16.30
Adversarial   50.51  9.87   17.63  11.70  16.40

Table 4: Robustness of adversarial training in vanilla quantization. Test accuracy in %. Models are quantized to 1-bit in the upper part and 2-bit in the lower part.

5.4 Beyond standalone adversarial training

To prove the necessity of pairing, we add experiments on adversarial training in vanilla quantization with ResNet-152 and CIFAR-100.

For adversarial training in vanilla quantization, models are fed with perturbed samples only and updated by the original min-max optimization. All adversarial samples are generated with the same PGD attacker as in the white-box section, and all models are quantized for 120 epochs.

5.4.1 Results

As shown in the upper part of Table 4, adversarial training in vanilla quantization retains limited robustness and is not comparable to our method. For naturally trained models, adversarial training promotes robustness to 19% against PGD but lags 1% behind our method. For adversarially trained models, adversarial training fails to maintain the robustness and leaves a drop of 5.5%, which is triple that of ours.

We hold that the following hypotheses may explain the inconsistent performance of adversarial training in the contexts of ordinary training and quantization: (i) Quantization limits the capacity of the model, while adversarial training requires a significantly larger capacity. (ii) With limited capacity, the model has difficulty learning and therefore suffers lower accuracy on both clean and adversarial inputs. In contrast, with our method the model only learns to predict clean inputs and to infer close activations on adversarial inputs.
We ran additional experiments on 2-bit quantization to examine the hypotheses above. Though TWN models reach higher accuracy on the training set, which confirms our hypothesis that adversarial training is hindered by limited network capacity, they attain the same or even inferior results on the test set compared to BWN models. We conclude that while higher bandwidth enables adversarial training, it in itself undermines robustness [Lin et al., 2019]. In contrast, our method better balances the trade-off between adversarial training and low-bandwidth weights.

Pairing      Clean  FGSM   PGD    DDN    C&W
Logit        54.09  13.18  20.31  12.20  19.70
Activation   49.65  11.80  18.01  13.10  19.70
Logit        50.74  10.78  19.03  11.20  16.90
Activation   49.54  10.26  18.37  11.04  16.18

Table 5: Robustness of EQ with different pairing targets. Test accuracy in %. Models are quantized to 1-bit in the upper part and 2-bit in the lower part.

6 Discussions

In this section, we discuss the equivalence of different pairing schemes and adopt pairing logits as a universal pairing. We also discuss the obfuscated gradients problem, which undermines many previous defenses, and further secure the robustness of our method.

6.1 Equivalence of different pairing

While we offer a general pairing objective in (6) and (7) that can involve any layers, only the output logits are paired in our experiments. Here we show that though pairing intermediate activations may produce lower errors, pairing the logits achieves the same accuracy and better balances training cost and performance. We investigate ResNet-152 on CIFAR-100 and pair the activations after the 4th, 12th and 48th ResNet modules.

In Table 5, the close accuracy of the two pairing schemes confirms that pairing more activations provides only minor improvements while requiring considerable additional computation and storage of intermediate results. It brings a large memory cost, especially when training on GPUs. Furthermore, pairing activations may introduce unnecessary requirements on network capacity, as in the case of adversarial training; the smaller gap between the two pairing settings on TWN is also an indication of this.
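For completeness, the general pairing term of Eq. (7) over an arbitrary layer set S can be sketched as follows; it reuses the forward-hook idea from the D_l sketch in Section 3 but keeps gradients so the term can be minimized during training. The helper name and the idea of selecting modules such as the 4th, 12th and 48th ResNet blocks follow the setting of this subsection; the code itself is an assumption, not the authors' implementation.

```python
def pairing_loss(model, modules, lambdas, x, x_adv):
    """P(x, x_adv) = sum_l lambda_l * D_l(x, x_adv) over the selected modules."""
    acts = {}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out, k=k: acts.setdefault(k, []).append(out))
             for k, m in enumerate(modules)]
    model(x)
    model(x_adv)
    for h in hooks:
        h.remove()
    loss = x.new_zeros(())
    for (clean, adv), lam in zip(acts.values(), lambdas):
        d_l = ((clean - adv).flatten(1).norm(dim=1)
               / clean.flatten(1).norm(dim=1).clamp_min(1e-12)).mean()
        loss = loss + lam * d_l
    return loss
```

Pairing only the logits (the final output) avoids storing these intermediate activations altogether, which is exactly the memory argument made above.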
6.2 Secure the sense of robustness

Figure 3: Black-box attack test results on CIFAR-100. (a) Decision-based Boundary attack test accuracy (in %). (b) Score-based NAttack test accuracy (in %). Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -EQ- for Error-silenced Quantization, -Full for full precision, -BWN for binary weight.

A noticeable coincidence is that our simplified activation pairing scheme, pairing logits, is considerably similar to the Adversarial Logit Pairing proposed in [Kannan et al., 2018], with which the authors claim state-of-the-art robustness on ImageNet. However, that method was found [Athalye et al., 2018] to suffer severely from obfuscated gradients and to provide a false sense of security that can be easily circumvented with non-gradient-based attacks.

In [Athalye et al., 2018], it is reported that defenses suffering from obfuscated gradients are vulnerable to black-box attacks that operate by estimating gradients instead of directly computing them. To thoroughly examine whether our method is truly secure, we test it with the L2-bounded Boundary attack [Brendel et al., 2018] and NAttack [Li et al., 2019] for decision-based and score-based black-box attacks, respectively. We vary the perturbation strength from ε = 0 to ε = 4 and compare the accuracy of quantized models with their full-precision counterparts.

As shown in Figures 3a and 3b, our quantized models achieve accuracy consistently close to or better than the ADV-Full model as the strength varies. All results confirm that our method suffers from no obfuscated gradient problem and provides a secured sense of robustness. A possible explanation is that we use untargeted attacks for training, while [Kannan et al., 2018] use targeted attacks.

7 Conclusion

This paper tackles the issue of achieving both robustness and compactness in DNNs. Inspired by the Error Amplification Effect, we relax the capacity requirement of adversarial training by pairing, and propose a quantization that simultaneously optimizes accuracy on benign and adversarial inputs. Extensive experiments across four threat models, two datasets and two networks endorse the superior robustness of the proposed method over vanilla approaches and even full-precision counterparts, while still reaching high compression rates. Together with a guarded notion of security against obfuscated gradients, our method manages to bridge robustness and compactness for DNNs and their further applications.

References

[Athalye et al., 2018] Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, volume 80, 2018.
[Brendel et al., 2018] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR. OpenReview.net, 2018.
[Carlini and Wagner, 2017] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
[Devlin et al., 2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.
[Dong et al., 2019] Yinpeng Dong, Renkun Ni, Jianguo Li, Yurong Chen, Hang Su, and Jun Zhu. Stochastic quantization for learning accurate low-bit deep neural networks. International Journal of Computer Vision, 2019.
[Eykholt et al., 2018] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In CVPR, 2018.
[Galloway et al., 2018] Angus Galloway, Graham W. Taylor, and Medhat Moussa. Attacking binarized neural networks. In ICLR. OpenReview.net, 2018.
[Goodfellow et al., 2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[Graves et al., 2013] Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP, 2013.
[Gui et al., 2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, and Ji Liu. Model compression with adversarial robustness: A unified optimization framework. In NeurIPS, 2019.
[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[Jacob et al., 2018] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR, 2018.
[Kannan et al., 2018] Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018.
[Kingma and Ba, 2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[Krizhevsky and Hinton, 2009] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. University of Toronto, Tech. Rep., 2009.
[Krizhevsky et al., 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[Kurakin et al., 2017] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In ICLR. OpenReview.net, 2017.
[Li and Liu, 2016] Fengfu Li and Bin Liu. Ternary weight networks. In NIPS Workshop on EMDNN, 2016.
[Li et al., 2019] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In ICML, 2019.
[Liao et al., 2018] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, 2018.
[Lin et al., 2019] Ji Lin, Chuang Gan, and Song Han. Defensive quantization: When efficiency meets robustness. In ICLR. OpenReview.net, 2019.
[Madry et al., 2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR. OpenReview.net, 2018.
[Rastegari et al., 2016] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In ECCV, 2016.
[Rony et al., 2019] Jérôme Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In CVPR, 2019.
[Sharif et al., 2016] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM CCS, 2016.
[Su et al., 2018] Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In ECCV, 2018.
[Szegedy et al., 2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[Wu et al., 2016] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. Quantized convolutional neural networks for mobile devices. In CVPR, 2016.
[Xu et al., 2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In NDSS, 2018.
[Zagoruyko and Komodakis, 2016] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.