Error-Silenced Quantization: Bridging Robustness and Compactness∗

Zhicong Tang, Yinpeng Dong and Hang Su
Tsinghua University
{tzc17, dyp17}@mails.tsinghua.edu.cn, suhangss@mail.tsinghua.edu.cn

∗ Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

As deep neural networks (DNNs) advance rapidly, quantization has become a widely used standard for deployment on resource-limited hardware. However, DNNs are widely accepted to be vulnerable to adversarial attacks, and quantization is found to further weaken their robustness. Adversarial training is a proven defense, but it depends on a larger network capacity, which contradicts quantization. In this work, we propose Error-silenced Quantization, a novel method that relaxes this capacity requirement and achieves both robustness and compactness. We first observe the Error Amplification Effect, i.e., that small perturbations on adversarial samples are amplified through the layers, and then design a pairing that directly silences the error. Comprehensive experimental results on CIFAR-10 and CIFAR-100 show that our method fixes the robustness drop under alternative threat models and even outperforms full-precision models. Finally, we study different pairing schemes and secure our method against the obfuscated gradient problem that undermines many previous defenses.

1 Introduction

Deep neural networks (DNNs) have demonstrated extraordinary performance in a wide range of applications, including visual understanding [Krizhevsky et al., 2012; He et al., 2016], speech recognition [Graves et al., 2013], and natural language processing [Devlin et al., 2019]. As these applications develop, DNNs are increasingly deployed on embedded and edge devices such as mobile phones, IoT devices, and autonomous driving systems. To facilitate such deployment, quantization [Wu et al., 2016; Jacob et al., 2018] has been proposed and has become an industry standard for deep learning hardware and an accelerator for inference in real-time applications [Rastegari et al., 2016].

However, DNNs are known to be vulnerable to adversarial attacks [Szegedy et al., 2014; Goodfellow et al., 2015]: maliciously generated, hardly noticeable noise can easily deceive a model into erroneous predictions. This may lead to disastrous consequences and raises concerns about applications in security-critical domains. For example, in autonomous driving, a stop sign can be mistakenly detected by a model as a permission signal [Eykholt et al., 2018]; in face recognition, an adversary can fool the model, bypass authentication and gain full access to the system [Sharif et al., 2016]. These potential risks are one of the key hindrances to deploying DNNs in safety-critical scenarios.

Furthermore, the commonly used vanilla quantization approaches concentrate on classification accuracy on clean inputs and may be even more severely threatened by adversarial attacks (Table 1). It is therefore imperative to develop a quantization algorithm that jointly optimizes robustness and compactness. Adversarial training [Goodfellow et al., 2015; Kurakin et al., 2017; Madry et al., 2018], i.e., augmenting the training set with adversarial samples, is recognized as one of the best defenses. Nevertheless, it generally requires a significantly larger network capacity than predicting only clean inputs, which contradicts quantization.

To address this issue, we equip quantization with adversarial training and relax the capacity requirement by extracting a pairing objective. Pairing the clean and perturbed activations diminishes the error between them and is added to the training loss. A model concurrently trained and quantized with this loss learns to infer similarly on clean and adversarial inputs and thus achieves both strong robustness and high compactness. Though previous works [Galloway et al., 2018; Gui et al., 2019] are aware of the robustness drop and attempt to fix it, their settings are limited.
We thoroughly verify the robustness of our method against four threat models: white-box attacks, in which attackers have full access to the target model; score-based and decision-based black-box attacks, in which attackers have access to detailed or final predictions; and transfer attacks, in which attackers know only the data distribution.

Experiments demonstrate our contributions: (i) We plot the precise error in the activations of attacked models. (ii) We propose a novel quantization method that directly regulates the perturbed activations. (iii) With this method we silence the error and bridge robustness with model compactness. (iv) We further confirm the superiority and security of our method. The method is called Error-silenced Quantization (EQ), since it is inspired by the Error Amplification Effect and aims at silencing the error in both activations and predictions.

ε            1      2      4      8      16
NAT-Full     36.19  27.96  20.76  14.53  7.79
NAT-VQ-BWN   35.38  21.07  11.79  7.59   4.99
ADV-Full     47.22  43.65  36.63  24.60  11.16
ADV-VQ-BWN   40.84  28.34  19.00  12.74  7.74

Table 1: Results on CIFAR-100 with ResNet-152 support that quantization undermines robustness: the accuracy (in %) of quantized models under FGSM attacks drops rapidly as ε increases. Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -VQ- for vanilla quantization, -Full for full precision, -BWN for binary weight.

2 Background

2.1 Compress with quantization

In this section, we briefly introduce two typical quantized networks: the Binary Weight Network (BWN) [Rastegari et al., 2016] and the Ternary Weight Network (TWN) [Li and Liu, 2016].

First, the weights of the l-th layer of a DNN can be denoted by W_l = {W_1, · · · , W_i, · · · , W_m}, where the layer has m output channels and W_i ∈ R^d is the weight of the i-th filter. Quantization converts each weight matrix W_i into Q_i ∈ S^d, where S consists of at most 2^n sparse values in an n-bit quantization.

BWN takes a scaling factor α ∈ R+ and S = {−α, +α}. Solving the optimization J = min ||W_i − αB_i|| yields

$$B_i^j = \operatorname{sign}(W_i^j), \qquad \alpha = \frac{1}{d}\sum_{j=1}^{d}\left|W_i^j\right|. \tag{1}$$

TWN introduces a 0 state over BWN with S = {−α, 0, +α} to approximate the real-valued weight W_i more precisely. It solves the optimization J = min ||W_i − αT_i|| as

$$T_i^j = \begin{cases} -1, & W_i^j < -\Delta \\ 0, & \left|W_i^j\right| \le \Delta \\ +1, & W_i^j > \Delta \end{cases}, \qquad \alpha = \frac{1}{\left|I_\Delta\right|}\sum_{j\in I_\Delta}\left|W_i^j\right|, \tag{2}$$

where Δ = (0.7/d) Σ_{j=1}^d |W_i^j| and I_Δ = {j : |W_i^j| > Δ}.

Then αB_i and αT_i are the 1-bit and 2-bit quantized Q_i that form the space-efficient weight Q. Since the factor α requires little storage, BWN compresses a full-precision model by 32× and TWN by 16×.
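To make Eqs. (1) and (2) concrete, the sketch below applies the two closed-form solutions to a single filter. This is a minimal illustration under our reading of the formulas, not the authors' released code; the function names `bwn_quantize` and `twn_quantize` are hypothetical, and PyTorch is assumed.

```python
import torch

def bwn_quantize(w):
    """BWN (Eq. 1): approximate one filter W_i by alpha * sign(W_i)."""
    alpha = w.abs().mean()                 # alpha = (1/d) * sum_j |W_i^j|
    return alpha * torch.sign(w)           # values in S = {-alpha, +alpha}

def twn_quantize(w):
    """TWN (Eq. 2): approximate one filter W_i by alpha * T_i, T_i in {-1, 0, +1}."""
    delta = 0.7 * w.abs().mean()           # Delta = (0.7/d) * sum_j |W_i^j|
    mask = w.abs() > delta                 # I_Delta = {j : |W_i^j| > Delta}
    alpha = w.abs()[mask].mean() if mask.any() else w.new_tensor(0.0)
    t = torch.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    return alpha * t                       # values in S = {-alpha, 0, +alpha}
```

Applied filter by filter, only the sign/ternary codes and one α per filter need to be stored, which is where the 32× and 16× compression rates quoted above come from.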
−α, Wi < −∆  1 X Tji = 0, Wij ≤ ∆ and α = Wij , (2) 3 The Error Amplification Effect  |I ∆ | +α, Wij > ∆ i∈I∆ The conventionally quantized DNN is counter-intuitively  Pd j j more vulnerable [Lin et al., 2019] under the threat of adver- where ∆ = 0.7 d j=1 Wi and I∆ = {j| Wi > ∆}. sarial attacks. One convincing explanation is the Error Ampli- Then Bi and Ti are the 1-bit and 2-bit quantized Qi that fication Effect discovered by [Liao et al., 2018]. Specifically, forms the space-efficient weight Q. Since the factor α re- tiny perturbations can be amplified when fed through layers, quires little storage, BWN compresses a full-precision model become sizable enough to deceive the network and eventually by 32× and TWN compresses by 16×. push the classification result into an incorrect bucket. More- over, the quantization of a DNN worsens its robustness com- 2.2 Adversarial attacks and defenses paring with the original full-precision one by enlarging the Given an image x, adversarial attacks is to find the noise δ granularity of the weights, making its response more suscep- that the classifier’s prediction of input xadv = x + δ is wrong. tible to the input. As shown in Table 1, quantized models And defenses aim to maintain the robustness of the classifier, yield constantly inferior robustness under FGSM attacks of i.e. the prediction accuracy on input xadv . Here we list some varied perturbation strength. attacks and defenses used in experiments. To in detail investigate the effect, we conducted pre- experiments on CIFAR-100 [Krizhevsky and Hinton, 2009] 2.2.1 Attacks and ResNet-152 [He et al., 2016]. Adversarial samples are Fast Gradient Sign Method (FGSM) is a L∞ bounded one- generated untargeted by a 10-step PGD attacker with other step attack forwarded by [Goodfellow et al., 2015] that cal- parameters  = 8/255 and step size 2/255 corresponding culates the adversarial samples by following the direction of to [Madry et al., 2018]. In Figure 1, we test four settings the gradient of loss function L at step size . with the attack, evaluate and plot the distance Dl between the Projected Gradient Descend (PGD) proposed by [Madry clean and perturbed activation of each layer as et al., 2018] repeats FGSM and starts with a random step to escape the sharp curvature near the original input, and is Fl (x) − Fl (xadv ) 2 thought to be the strongest first-order attack. Dl (x, xadv ) = , (4) C&W Attack [Carlini and Wagner, 2017] chooses tanh kFl (x)k2 function instead of box-constrained methods and optimizes where Fl denotes the activation after the l-th ResNet module. the difference between logits instead of the logit itself. It is For convenience, we note training scheme with prefix NAT- an iterative attack and among the strongest L2 attacks. and ADV-, quantization scheme with infix -VQ- and -EQ- Decoupling Direction and Norm Attack (DDN) [Rony , weight precision with suffix -Full, -BWN, -TWN and use et al., 2019] is a newly proposed L2 attack that outperforms acronyms in all tables. C&W. It iterates FGSM with the  adjusted in each round, In the left zone of the illustration 1, the adversarial noise leading to a finer-grained search for adversarial images. 
3 The Error Amplification Effect

A conventionally quantized DNN is counter-intuitively more vulnerable [Lin et al., 2019] under the threat of adversarial attacks. One convincing explanation is the Error Amplification Effect discovered by [Liao et al., 2018]: tiny perturbations can be amplified as they pass through the layers, become sizable enough to deceive the network, and eventually push the classification result into an incorrect bucket. Moreover, quantizing a DNN worsens its robustness compared with the original full-precision model by enlarging the granularity of the weights, making its response more susceptible to the input. As shown in Table 1, quantized models yield consistently inferior robustness under FGSM attacks of varied perturbation strength.

To investigate the effect in detail, we conducted pre-experiments on CIFAR-100 [Krizhevsky and Hinton, 2009] and ResNet-152 [He et al., 2016]. Adversarial samples are generated untargeted by a 10-step PGD attacker with ε = 8/255 and step size 2/255, following [Madry et al., 2018]. In Figure 1, we test four settings against this attack and plot the relative distance D_l between the clean and perturbed activations of each layer,

$$D_l(x, x_{adv}) = \frac{\left\|F_l(x) - F_l(x_{adv})\right\|_2}{\left\|F_l(x)\right\|_2}, \tag{4}$$

where F_l denotes the activation after the l-th ResNet module. For convenience, we denote the training scheme with the prefix NAT- or ADV-, the quantization scheme with the infix -VQ- or -EQ-, and the weight precision with the suffix -Full, -BWN or -TWN, and use these acronyms in all tables.

Figure 1: Small perturbations are amplified throughout the layers, and the two quantized BWN models predict the same level of error as the undefended naturally trained model. Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -VQ- for vanilla quantization, -Full for full precision, -BWN for binary weight.

In the left zone of Figure 1, the adversarial noise applied to the input image is relatively small compared to the image itself (±8 versus 255 in this setting). However, as inference proceeds, the magnitude of the initial perturbation is amplified through the latter part of the network. Once the perturbation grows large enough, the model is misled into a wrong bucket and accuracy suffers a harsh drop.

From these results we make the following observations: (i) The error in the activations eventually accumulates enough to push the prediction into a misleading bucket. (ii) All models suffer from the effect, while quantization reduces robustness by a wide margin. (iii) With vanilla quantization methods, the robustness gain of adversarial training is drastically degraded.

Therefore, the currently used vanilla quantization methods prove practically limited, and the Error Amplification Effect may be a key to a robustness-aware quantization.
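The layer-wise curves of Figure 1 can be reproduced, in spirit, by recording the activation after every residual module and evaluating Eq. (4). The sketch below is one way to do this with forward hooks; the helper names are ours and the choice of `modules` (e.g., every ResNet block) is an assumption.

```python
import torch

def relative_error(f_clean, f_adv):
    """D_l of Eq. (4): relative L2 distance between clean and perturbed activations."""
    diff = (f_clean - f_adv).flatten(1).norm(dim=1)
    return diff / f_clean.flatten(1).norm(dim=1)

def layerwise_errors(model, modules, x, x_adv):
    """Capture activations after each chosen module for x and x_adv, then return
    the mean relative error per layer (the curve plotted in Figure 1)."""
    acts = {}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out, k=k: acts.setdefault(k, []).append(out.detach()))
             for k, m in enumerate(modules)]
    with torch.no_grad():
        model(x)
        model(x_adv)
    for h in hooks:
        h.remove()
    return [relative_error(clean, adv).mean().item() for clean, adv in acts.values()]
```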
4 Method

Motivated by the Error Amplification Effect above, we introduce a quantization scheme that simultaneously preserves the robustness of the original full-capacity model and the compactness of low-bandwidth quantization. The concurrent training and quantizing procedure is described in Algorithm 1.

We first follow the commonly used min-max robustness optimization and formulate the overall robustness and compactness target as

$$\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\Delta} L(\theta, x+\delta, y)\right] \quad \text{s.t.} \quad \frac{\operatorname{size}(\theta_{full})}{\operatorname{size}(\theta)} = c, \tag{5}$$

where θ_full is the original full-precision weight (W), θ is the finally quantized weight (Q), size(·) is the memory size required to store the weight and c is the target compression rate.

Equation (5) can be divided into two parts: (i) minimize the loss on adversarially perturbed inputs for robustness; (ii) compress the model weight to meet the target rate for compactness. In our method, the latter is handled by a quantization algorithm that allows simultaneous training, and the former is handled by directly controlling the amplified error, i.e., pairing activations.

Algorithm 1 Error-silenced Quantization
Input: dataset D, full-precision weight θ_full, selected layers S and loss function L
Parameter: quantization iterations K, PGD perturbation strength ε, PGD iterations T, sensitivity parameters λ_l and distance functions D_l for each layer l
Output: quantized weight θ
1: for k = 1, 2, · · · , K do
2:   Sample a batch (x, y) from D
3:   Partially quantize θ_full into θ
4:   for t = 1, 2, · · · , T do
5:     Solve the inner max of Eq. (6) to obtain δ
6:   end for
7:   L := L(θ, x, y)
8:   for layer l in S do
9:     L := L + λ_l D_l(x, x + δ)
10:  end for
11:  Backward and update θ_full with loss L
12: end for
13: return θ

4.1 Pairing activation

Since the activation of an adversarial input deviates largely from that of its original image, a natural solution to control the error is training the network to diminish this deviation. Let D_l(x, x') be a function that calculates the relative distance between the activations of the l-th layer when the model is fed with x and x' respectively, which can be a normalized L2 or L∞ norm. With a set S of layers to control, the robustness regularization that optimizes the former part of (5) is

$$\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[L(\theta, x, y) + \max_{\delta\in\Delta} P(x, x+\delta)\right]. \tag{6}$$

Here L is the loss function and P is the pairing defined as

$$P(x, x_{adv}) = \sum_{l\in S}\lambda_l D_l(x, x_{adv}), \tag{7}$$

where the λ_l are sensitivity parameters that determine the threshold of the amplified error between clean and adversarial samples. The model is forced to infer close activations on the l-th layer if λ_l is large and is allowed to tolerate sizable differences if λ_l is small.

With the pairing objective, we train the model on clean samples and pair the activations of particular layers, rather than training directly on adversarial samples. Equation (6) can also be divided into two parts that separately tackle the classification accuracy on clean and adversarial images. The first part is designed to maintain the performance of the model, because it is known that robustness often comes at the cost of prediction accuracy [Su et al., 2018]. With the second part, we train the model to diminish the deviation and infer close activations. A model that behaves closely on clean and adversarial inputs is expected to reach close prediction accuracy on both.

As a special case, the pairing can be applied only to the final output layer of the network, on which the following experiments focus. The pairing then simplifies to the distance between the logits on clean and adversarial samples.

4.2 Solving adversarial perturbations

In the optimization (6), the perturbations δ should be generated to maximize the error of the selected activations. However, in this work we generate them with untargeted white-box attacks, because they are believed to be the strongest attacks and, so far, no attack explicitly studies and magnifies the activation error.

Previous work [Madry et al., 2018] has shown that PGD is the most powerful first-order attack. We follow this conclusion and solve the adversarial perturbations δ with PGD attacks, using settings consistent with [Madry et al., 2018] except for a modified iteration number and step size.

4.3 Progressive quantization

Our method upholds and improves the robustness of quantized models by concurrently updating and quantizing their weights. Accordingly, we adopt the Stochastic Quantization method introduced in [Dong et al., 2019]. In our method, a model with partially quantized weights is fed clean and adversarial inputs, and the full-precision weights are updated with the estimated gradients. In comparison, vanilla Stochastic Quantization trains models on clean inputs only.
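To connect Algorithm 1 with Eqs. (6) and (7), here is a minimal sketch of one train-and-quantize iteration in PyTorch, specialized to the logit-pairing case used in our experiments. It is an illustrative reading of the procedure, not the released implementation: `quantize_partial` is a hypothetical helper standing in for the stochastic (partial) quantization step of [Dong et al., 2019] and its gradient bookkeeping, `pgd_attack` is the sketch from Section 2.2, and `lam` plays the role of λ.

```python
import torch
import torch.nn.functional as F

def eq_training_step(model, quantize_partial, x, y, optimizer, lam=1.0,
                     eps=8/255, step_size=2/255, steps=10):
    """One iteration of concurrent training and quantization (cf. Algorithm 1)."""
    quantize_partial(model)                    # line 3: partially quantize theta_full into theta
    x_adv = pgd_attack(model, x, y, eps, step_size, steps)   # lines 4-6: solve the inner max

    logits_clean = model(x)
    logits_adv = model(x_adv)
    # Pairing on the final layer: normalized L2 distance between clean and adversarial logits
    pair = ((logits_clean - logits_adv).norm(dim=1)
            / logits_clean.norm(dim=1).clamp_min(1e-12)).mean()
    loss = F.cross_entropy(logits_clean, y) + lam * pair      # lines 7-10

    optimizer.zero_grad()
    loss.backward()                            # line 11: gradients update the full-precision weights
    optimizer.step()
    return loss.item()
```

Note that only the clean cross-entropy and the pairing term are minimized; the adversarial samples never enter the classification loss directly, which is what relaxes the capacity requirement of standalone adversarial training.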
       NF     NEB    AF     AVB    AEB    AET
Clean  93.33  79.35  80.10  90.84  82.19  81.31
FGSM   7.24   26.47  29.47  22.81  29.49  26.72
PGD    0.00   41.84  47.06  7.08   41.62  41.02
DDN    0.00   29.11  28.18  2.43   28.04  24.81
C&W    0.04   38.58  40.49  8.45   38.24  36.84

(a) Natural test and white-box attack accuracy (in %). Underlining marks the best and second-best result in each row.

       NF     NEB    AF     AVB    AEB    AET
NF     0.00   77.68  78.06  77.74  81.11  79.62
NEB    69.10  41.82  60.84  65.58  64.19  64.08
AF     67.44  57.33  47.71  54.49  61.20  60.89
AVB    24.82  73.51  72.75  7.11   76.09  75.31
AEB    75.79  62.74  63.12  64.98  41.36  60.66
AET    77.20  63.31  63.70  67.79  61.11  41.12

(b) Transfer attack accuracy (in %). Attacks are generated by row and applied by column; for example, the AF model reaches an accuracy of 60.84% on adversarial inputs generated with the NEB model.

Table 2: Test results on CIFAR-10. Abbreviations: N- for naturally trained, A- for adversarially trained, -V- for vanilla quantization, -E- for Error-silenced Quantization, -F for full precision, -B for binary weight, and -T for ternary weight.

5 Experiments

In this section, our experiments demonstrate that the proposed method can effectively retain and further improve robustness when a model is quantized to low bandwidth. The method also diminishes the aforementioned Error Amplification Effect by a large margin compared with both full-precision and vanilla quantized models. Finally, we show that the method provides more convincing performance than two baselines: adversarial training before quantization and adversarial training during quantization.

5.1 Settings

We apply Wide ResNet 28-10 [Zagoruyko and Komodakis, 2016] to CIFAR-10 [Krizhevsky and Hinton, 2009] and ResNet-152 to CIFAR-100. Six models in each setting are tested on clean inputs, white-box attacks and transfer attacks.

During training, we augment the training set with the same PGD attacker as above and train models with an Adam optimizer [Kingma and Ba, 2015] for 150 epochs. The hyper-parameters are left at their defaults without fine-tuning.

During quantization, we pair the activation after the final layer (the logits) with the L2 norm and use an SGD optimizer with learning rate 0.1, momentum 0.9 and weight decay 10^-4 to train for 120 epochs, consistent with [Dong et al., 2019]. However, the quantization ratio is updated by a uniform scheme, i.e., it begins at 0.2 and is increased by 0.2 every 25 epochs.
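The uniform ratio schedule can be written down directly; the following sketch (with a hypothetical helper name) reproduces the 0.2-per-25-epochs schedule over the 120-epoch quantization stage described above.

```python
def quantization_ratio(epoch, start=0.2, step=0.2, interval=25):
    """Fraction of weights to quantize at a given epoch:
    0.2 for epochs 0-24, 0.4 for 25-49, ..., 1.0 from epoch 100 onward."""
    return min(1.0, start + step * (epoch // interval))
```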
5.2 Retaining robustness of quantized models

For white-box attack tests, we use a 20-step PGD attacker with step size 0.1, which is slightly stronger than the one used for training. We also analyze robustness against other adversarial attacks, using ε = 16/255 FGSM to study one-step attacks, and 100-step ε = 1 DDN and 20-step ε = 1 C&W to study L2-bounded attacks.

For transfer attack tests, all adversarial samples are generated by the same PGD attacker as in the white-box stage. We train and quantize alternative models from scratch when the model setting generating the attack is the same as the one being attacked.

5.2.1 Results

As shown in Tables 2a and 3a, the vanilla quantized models exhibit weak robustness, and adversarial training before quantization helps little: with conventional methods, the robustness gained by adversarial training is drastically degraded to nearly none. With our method, the accuracy consistently floats around or above that of the full-precision models on both datasets. Compared with the gap left by vanilla quantization, our proposed method proves feasible in keeping the drop to a reasonably small level and works for both naturally and adversarially trained models.

In the cross transfer attack scenario (Tables 2b and 3b), our robustly quantized models achieve sound results. For adversarial attacks generated from NF models, which is often the practical situation, the proposed method helps quantized models steadily beat the AF model. Our method also establishes solid defenses against other attacks; for example, in Table 3b the -EQ- models exceed the AF model under attacks generated by other quantized models.

We also notice that the NEB and AEB models perform almost the same, which further demonstrates an advantage of our method: adversarial training before quantization is not required. Lastly, the method manages to maintain and even improve accuracy on clean data.

       NF     NEB    AF     AVB    AEB    AET
Clean  73.20  55.54  50.80  65.84  54.09  50.74
FGSM   7.77   12.05  11.15  7.59   13.36  10.78
PGD    0.03   19.17  22.15  0.65   20.49  19.03
DDN    0.01   12.35  17.37  0.24   13.74  12.22
C&W    0.34   18.48  20.62  1.21   19.83  17.02

(a) Natural test and white-box attack accuracy (in %). Underlining marks the best and second-best result in each row.

       NF     NEB    AF     AVB    AEB    AET
NF     0.09   52.87  48.63  33.14  52.07  48.57
NEB    49.88  18.88  36.80  41.44  37.69  36.52
AF     44.78  37.27  22.38  33.68  35.75  34.62
AVB    13.77  51.94  46.64  0.57   50.56  47.18
AEB    51.14  37.99  36.31  41.40  20.71  36.45
AET    56.34  40.05  37.50  46.31  39.13  18.67

(b) Transfer attack accuracy (in %). Attacks are generated by row and applied by column; for example, the AF model reaches an accuracy of 36.80% on adversarial inputs generated with the NEB model.

Table 3: Test results on CIFAR-100. Abbreviations: N- for naturally trained, A- for adversarially trained, -V- for vanilla quantization, -E- for Error-silenced Quantization, -F for full precision, -B for binary weight, and -T for ternary weight.

5.3 Silencing the Error Amplification Effect

We re-evaluate the error in latent layers to investigate whether the method manages to silence it. The relative distance is defined in (4) and sampled after every ResNet module. The experiment is conducted on ResNet-152 and CIFAR-100.

Figure 2: Our quantized models diminish the Error Amplification Effect by a large margin and even outperform full-precision models. Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -VQ- for vanilla quantization, -EQ- for Error-silenced Quantization, -Full for full precision, -BWN for binary weight, and -TWN for ternary weight.

5.3.1 Results

Though the input is perturbed by the same magnitude, the error is amplified quite differently in Figure 2. With conventional quantization, the error of the ADV-VQ-BWN model increases to up to 4 times that of the ADV-Full model, which is a possible explanation of the large robustness drop. With our method, the models keep the error lower than their full-precision counterparts throughout inference.

[Xu et al., 2018] conclude that image quantization, i.e., reduction in color bit depth, is an effective defense. Quantization of network weights, however, weakens robustness instead. [Lin et al., 2019] showed that it tends to intensify the Error Amplification Effect when ε > 3/255, which in our experiments starts even from ε = 1/255 (Table 1). Our method obtains significant results, overcomes this threshold and pushes it beyond ε = 8/255, as shown in Figure 2.

Training      Clean  FGSM   PGD    DDN    C&W
Natural       56.39  11.26  19.53  12.55  18.33
Adversarial   49.64  10.07  16.66  10.80  16.10
Natural       54.88  10.16  17.99  10.23  16.30
Adversarial   50.51  9.87   17.63  11.70  16.40

Table 4: Robustness of adversarial training in vanilla quantization. Test accuracy in %. Models are quantized to 1-bit in the upper part and 2-bit in the lower part.

5.4 Beyond standalone adversarial training

To prove the necessity of pairing, we add experiments on adversarial training in vanilla quantization with ResNet-152 and CIFAR-100.

For adversarial training in vanilla quantization, models are fed with perturbed samples only and updated by the original min-max optimization. All adversarial samples are generated with the same PGD attacker as in the white-box section, and all models are quantized for 120 epochs.

5.4.1 Results

As shown in the upper part of Table 4, adversarial training in vanilla quantization retains limited robustness and is not comparable to our method. For naturally trained models, adversarial training promotes robustness to 19% against PGD but lags 1% behind our method. For adversarially trained models, adversarial training fails to maintain the robustness and leaves a drop of 5.5%, which is triple that of ours.

We hold that the following hypotheses may explain the inconsistent performance of adversarial training in the contexts of ordinary training and quantization: (i) Quantization limits the capacity of the model, while adversarial training requires a significantly larger capacity. (ii) With limited capacity, the model has difficulty learning and therefore suffers lower accuracy on both clean and adversarial inputs. In contrast, with our method the model only learns to predict clean inputs and to infer close activations on adversarial inputs.
We ran additional experiments on 2-bit quantization to examine the hypotheses above. Though TWN models reach higher accuracy on the training set, which confirms our hypothesis that adversarial training is hindered by limited network capacity, they attain the same or even inferior results on the test set compared to BWN models. We conclude that while higher bandwidth enables adversarial training, it in itself undermines robustness [Lin et al., 2019]. In contrast, our method better balances the trade-off between adversarial training and low-bandwidth weights.

Pairing      Clean  FGSM   PGD    DDN    C&W
Logit        54.09  13.18  20.31  12.20  19.70
Activation   49.65  11.80  18.01  13.10  19.70
Logit        50.74  10.78  19.03  11.20  16.90
Activation   49.54  10.26  18.37  11.04  16.18

Table 5: Robustness of EQ with different pairing targets. Test accuracy in %. Models are quantized to 1-bit in the upper part and 2-bit in the lower part.

6 Discussions

In this section, we discuss the equivalence of different pairing schemes and adopt pairing logits as a universal pairing. We also discuss the obfuscated gradients problem, which undermines many previous defenses, and further secure the robustness of our method.

6.1 Equivalence of different pairing

While we offer a general pairing objective in (6) and (7) that can involve any layers, only the output logits are paired in our experiments. Here we show that though pairing intermediate activations may produce lower errors, pairing the logits achieves the same accuracy and better balances training cost and performance. We investigate ResNet-152 on CIFAR-100 and pair the activations after the 4th, 12th and 48th ResNet modules.

In Table 5, the close accuracy of the two pairing schemes confirms that pairing more activations provides only minor improvements while requiring considerable additional computation and storage of intermediate results. It brings a large memory cost, especially when training on GPUs. Furthermore, pairing activations may introduce unnecessary requirements on network capacity, as in the case of adversarial training; the smaller gap between the two pairing settings on TWN is also an indication of this.
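For completeness, the general pairing term of Eq. (7) over an arbitrary layer set S can be sketched as follows; it reuses the forward-hook idea from the D_l sketch in Section 3 but keeps gradients so the term can be minimized during training. The helper name and the idea of selecting modules such as the 4th, 12th and 48th ResNet blocks follow the setting of this subsection; the code itself is an assumption, not the authors' implementation.

```python
def pairing_loss(model, modules, lambdas, x, x_adv):
    """P(x, x_adv) = sum_l lambda_l * D_l(x, x_adv) over the selected modules."""
    acts = {}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out, k=k: acts.setdefault(k, []).append(out))
             for k, m in enumerate(modules)]
    model(x)
    model(x_adv)
    for h in hooks:
        h.remove()
    loss = x.new_zeros(())
    for (clean, adv), lam in zip(acts.values(), lambdas):
        d_l = ((clean - adv).flatten(1).norm(dim=1)
               / clean.flatten(1).norm(dim=1).clamp_min(1e-12)).mean()
        loss = loss + lam * d_l
    return loss
```

Pairing only the logits (the final output) avoids storing these intermediate activations altogether, which is exactly the memory argument made above.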
6.2 Secure the sense of robustness

Figure 3: Black-box attack test results on CIFAR-100. (a) Decision-based Boundary attack test accuracy (in %). (b) Score-based NAttack test accuracy (in %). Abbreviations: NAT- for naturally trained, ADV- for adversarially trained, -EQ- for Error-silenced Quantization, -Full for full precision, -BWN for binary weight.

A noticeable coincidence is that our simplified activation pairing scheme, pairing logits, is considerably similar to the Adversarial Logit Pairing proposed in [Kannan et al., 2018], with which the authors claim state-of-the-art robustness on ImageNet. However, that method was found [Athalye et al., 2018] to suffer severely from obfuscated gradients and to provide a false sense of security that can be easily circumvented with non-gradient-based attacks.

In [Athalye et al., 2018], it is reported that defenses suffering from obfuscated gradients are vulnerable to black-box attacks that operate by estimating gradients instead of directly computing them. To thoroughly examine whether our method is truly secure, we test it with the L2-bounded Boundary attack [Brendel et al., 2018] and NAttack [Li et al., 2019] for decision-based and score-based black-box attacks, respectively. We vary the perturbation strength from ε = 0 to ε = 4 and compare the accuracy of quantized models with their full-precision counterparts.

As shown in Figures 3a and 3b, our quantized models achieve accuracy consistently close to or better than the ADV-Full model as the strength varies. All results confirm that our method suffers from no obfuscated gradient problem and provides a secured sense of robustness. A possible explanation is that we use untargeted attacks for training, while [Kannan et al., 2018] use targeted attacks.

7 Conclusion

This paper tackles the issue of achieving both robustness and compactness in DNNs. Inspired by the Error Amplification Effect, we relax the capacity requirement of adversarial training by pairing, and propose a quantization that simultaneously optimizes accuracy on benign and adversarial inputs. Extensive experiments across four threat models, two datasets and two networks endorse the superior robustness of the proposed method over vanilla approaches and even full-precision counterparts, while still reaching high compression rates. Together with a guarded notion of security against obfuscated gradients, our method manages to bridge robustness and compactness for DNNs and their further applications.

References

[Athalye et al., 2018] Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, volume 80, 2018.
[Brendel et al., 2018] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR. OpenReview.net, 2018.
[Carlini and Wagner, 2017] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
[Devlin et al., 2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.
[Dong et al., 2019] Yinpeng Dong, Renkun Ni, Jianguo Li, Yurong Chen, Hang Su, and Jun Zhu. Stochastic quantization for learning accurate low-bit deep neural networks. International Journal of Computer Vision, 2019.
[Eykholt et al., 2018] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In CVPR, 2018.
[Galloway et al., 2018] Angus Galloway, Graham W. Taylor, and Medhat Moussa. Attacking binarized neural networks. In ICLR. OpenReview.net, 2018.
[Goodfellow et al., 2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[Graves et al., 2013] Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP, 2013.
[Gui et al., 2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, and Ji Liu. Model compression with adversarial robustness: A unified optimization framework. In NeurIPS, 2019.
[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[Jacob et al., 2018] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR, 2018.
[Kannan et al., 2018] Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. CoRR, abs/1803.06373, 2018.
[Kingma and Ba, 2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[Krizhevsky and Hinton, 2009] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. University of Toronto, Tech. Rep., 2009.
[Krizhevsky et al., 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[Kurakin et al., 2017] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In ICLR. OpenReview.net, 2017.
[Li and Liu, 2016] Fengfu Li and Bin Liu. Ternary weight networks. In NIPS Workshop on EMDNN, 2016.
[Li et al., 2019] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In ICML, 2019.
[Liao et al., 2018] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, 2018.
[Lin et al., 2019] Ji Lin, Chuang Gan, and Song Han. Defensive quantization: When efficiency meets robustness. In ICLR. OpenReview.net, 2019.
[Madry et al., 2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR. OpenReview.net, 2018.
[Rastegari et al., 2016] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In ECCV, 2016.
[Rony et al., 2019] Jérôme Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In CVPR, 2019.
[Sharif et al., 2016] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM CCS, 2016.
[Su et al., 2018] Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In ECCV, 2018.
[Szegedy et al., 2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[Wu et al., 2016] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. Quantized convolutional neural networks for mobile devices. In CVPR, 2016.
[Xu et al., 2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In NDSS, 2018.
[Zagoruyko and Komodakis, 2016] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.