Fool Object Detectors with L0-Norm Patch Attack Honglin Li* Yunqing Zhao Westlake University Singapore University of Technology and Design Hangzhou, Zhejiang, China Singapore lihonglin@westlake.edu.cn yunqing_zhao@mymail.sutd.edu.sg ABSTRACT 2.1 Competition Understanding Deep Neural Networks based Object Detection algorithms have The competition adopts two object detectors known by the competi- shown their remarkable performance and been widely applied in tors (known as white-box attacks) and another two black-box object various aspects in recent years. However, for those areas which draw detectors for evaluation. The competitors are asked to generate ad- much attention to the robustness and security of the model, like versarial examples by adding a small number of patches (less than Autonomous Driving and Biomedical Image Analysis, there are still 10) to each image (also known as L0 attacks) offline. challenges to make domain users look at these detectors as reliable methods. In this paper, we mainly focus on object detection and 2.1.1 White-box Models: The 2 white-box model are the famous propose our method to fool object detectors with with L0 -Norm Faster R-CNN [11] and YOLOv4 [1] respectively. Faster R-CNN patch attack. is known as a 2-stage detectors. The first stage, or RPN [11], will classify a coarse fore/back-ground binary result for each anchor, CCS CONCEPTS then most of the background anchors will be neglected by threshold, and the second stage will make final prediction based on remaining • Computing methodologies → Object detection; • Security and anchors. YOLOv4 is an 1-stage detector, and both of this 2 white- privacy; boxes will predict on two aspects: the object’s bounding box size and location, which is a regressor, and the object’s category, which KEYWORDS is a classifier. object detection, adversarial attack, projected gradient descent 2.1.2 Data, Constrains and Evaluation Metric: The competi- tion provides 1000 COCO test samples without annotation, where 1 INTRODUCTION object detectors will predict the 81 (80 types of foreground object, Deep Neural Networks based computer vision algorithms have and 1 for background) categories’ confidence and the bounding-box’ shown their remarkable performance but also faced with vulner- size and location. ability and security problems. Adversarial attack on Image Clas- First limitation is the maximum limit of changed pixels rate. Second sification has made significant progress [4], also spurred the ad- is the maximum limit of patches’ number. The goal of the adver- vancement of the robustness of the classification models [12], and saries is to make all bounding boxes failed to be returned, by adding some attack research on object detection has also been made [16]. To the patches to images. As for evaluation score, on one hand the less make it deeper, CIKM-2020 and Aliyun-Tianchi host the AnalytiCup bounding boxes given by the adversarial example, the higher the workshop competition [3] about generating adversarial sample from score, on the other hand the less changed pixel rate, the higher the COCO [8] to attack 4 mainstream object detection models. In this score. More specific definition can be found in [3]. paper, we propose our solution based on Projected Gradient Descent [10] with l0 -norm towards this competition. To meet the rule of the 2.2 Related Work competition that the modified areas should be as smaller as better and must be constructed as connected domain, we transfer such rule A number of attacks for object detectors have been developed re- into solvable problem, and make our efforts by k-means and Prim cently [16]. [15] extends the attack method from classification to for higher score. Eventually, we get a fine result and rank 10th out detection and demonstrates that it is possible to attack objectors us- of 1701 teams. Code has been made publicly available at our github ing a designed classification loss. [9] generate adversarial examples code repo. that fool detectors for stop sign and face detections. [7] proposes to attack the RPN with a specially designed hybrid loss incorporating 2 BACKGROUND both classification and localization terms. Apart from the full images, [6] attack detectors by restricting the attacks to be within a local In this section, we first make description and understanding of the region. competition, then provide some background knowledge and review the related works about adversarial attack on Object Detection. 2.3 Projected Gradient Descent(PGD) * Corresponding author The PGD [10] attack aims at maximizing the loss: max{Jytarget , y pred } from the viewpoint of robustness optimization. In each iteration, the Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons PGD first modifies x by ∇x L, then it will take projection to norm License Attribution 4.0 International (CC BY 4.0). ball. In: Dimitar Dimitrov, Xiaofei Zhu (eds.): Proceedings of the CIKM AnalytiCup 2020, 22 October, 2020, Gawlay (Virtual Event), Ireland, 2020, published at http://ceur-ws.org. For some attack problems where the ground-truth label is not given, we can propose a loss function towards bad prediction by 17 trade-off between the attack ability and the transferability[4]. While there are K models, we should fuse the output of these models, e.g. for binary classification logits output, we can use weighted averaging Lx = wk lk x. The momentum iterative-FGSM[4] also trade-off between the white- box attacks and the transferability. Intuitively, the adversarial exam- ple can easily drop into poor local optima in searching landscape and ‘overfit’ the specific model. So, using the momentum of gradients can smooth the above problem and makes it more transferable. 2.6 Minimum Spanning Tree Figure 1: PGD iteration: X is the original input, also initial state An edge-weighted graph is a graph where we associate weights of adversarial sample X0atk , in each gradient descent iter step or costs with each edge. A minimum spanning tree (MST) of an atk will be modified into X tmp . Then, if X tmp lies i = 1, 2, ..., n, Xi−1 edge-weighted graph is a spanning tree whose weight (the sum of i i outside of the norm box, it will be projected onto the norm box’ the weights of its edges) is no larger than the weight of any other tmp edge as Xiatk , else Xiatk = Xi . spanning tree.[13] There are two mainly algorithms to solve the prblem: Prim’s algo- rithm, and Kruskal’s algorithm.[13] setting synthetic bad label, e.g. in classification problem we may set all labels into a specific category, like 0 in MNIST. Then the original PGD can be used by min{Jysynthetic , y pred }. 2.4 Sparse l0 -attack In an l0 -attack one is interested in finding the smallest number of pixels which need to be changed so that the decision changes. We show an adversarial image with l0-attack in Fig.1. From a practical point of view the l0-attack tests basically how vulnerable the model is to failure of pixels or large localized changes on an object e.g. a sticker on a refrigerator or dirt/dust on a windshield [2], or stained points on cell microscopy image. Figure 3: A MST sample, V: vertex, E: edge 3 METHODOLOGY In this section, we elaborate the details of the proposed method. We first illustrate how to employ the PGD to attack the detectors with l0 -norm. Then we describe our approach to meet the constrains, and tell how to trade off the dilemma that fewer modified pixels results in higher score coefficient but lower attack performance. 3.1 Loss Function Considering the fact that only if the classifier output as foreground Figure 2: Left: original image with box for zoom. Right: l0 - and the class confidence is bigger than NMS threshold, the final attack, only 0.04% are changed, but the final classification result predict result will be given, our method can be quite straight forward: is quite different[2]. only attack the classifier of detectors. Even though the competition does not provide any annotation, thanks for the NMS threshold we can make attack. The attack loss function is defined as follow: 2.5 Better Black-box Attack, or More 1 N Transferable Jx = smlx · 1smlx>τ , (1) N i=1 There are some techniques proposed for better transferable attack Where N is the number of predicted objects, and sml is the softmax on black-box. And in this paper, we only discuss 2 gradient-based output, and τ is a threshold more strict than NMS threshold. 1 is methods. a indicator function that return 1 if smlx > τ else 0. The target of Ensemble of Models has been used to improve traditional prediction. this function is trying to make all foreground prediction can not pass When attacking ensemble of models, the adversarial sample can NMS. 18 3.2 Attacking Procedure Our attacking framework is summarized in Algorithm 1. Input sam- ple x to the given detectors, then the softmax probability of a specific category can be obtained. Since background will be neglected by NMS, we only store the foreground’s probability. Above input to store process can be denoted as smlk x. By calculate the loss function, we can make back propagation the gradient on x is got.(We just give a simple description here about how to iterate the momentum, more details like its cold start problem and various variants can be seen in [5].) At last, if the attack result is outside of the box, we should projected it onto the norm bound like Fig.1. Algorithm 1 PGD l0 -attack on ensemble of models Input: A sample x, the foreground’s softmax function of K models: Figure 4: Group pixels to 10 areas with k-means clustering, the sml1 , sml2 , ..., smlK and weights: w1 , w2 , ..., wK . dots are the pixels, the stars are centers of each group. Hyper-params: Training epochs T , gradient momentum update factor µ, learning rate lr. Output: An adversarial example x∗ . x0∗ = x, momentum g0 = 0; for t = 0 to T − 1 do Input xt∗ , get sml0 xt∗ , sml1 xt∗ , ..., smlK xt∗ ; Fuse the K logits as lxt∗ = K k=1 wk smlk ; Get loss Jxt∗ based on lxt∗ and Eq.(1); Obtain current gradient ∇x Jxt∗ gt1 = gt ∗ 1 − µ ∇x Jxt∗ ∗ µ ∗ = x∗ − lr ∗ g xt1 t t1 update xt1∗ by projection onto l -norm box 0 end for return x∗ = xT∗ 3.3 Construct Connected Domains We solve the connected domains rule of this competition by K-means Figure 5: The training process for a sample. Due to the fixed and then convert sub-problem to a MST. We first set a specific pixels’ form of mask after epoch 30, the pre-train with more top K pix- number as β for l0 -attack, and train the adversarial sample with els is useful. Algorithm.1 for certain epochs. Then, we group these pixels into K areas by K-means, which can be visualized in Fig.4. The distance metric of two pixels is given by Chessboard Distance, or known value are 0 − 255 while the gradients are quite small. The momen- as Chebyshev Distance[14]. After grouping, we connect the pixels tum update factor µ = 0.5 is soft, and weights relatively high on the within their group by method in 2.1.4. During connecting process, gradients of current iteration compared with the usage other task, we use these pixels’ value before l0 -norm projection, and in fact this like traditional image classification. We set the first 30 epochs to is also a projection. If the value is same to original input, we make train sample with 8000 pixels l0 limitation, then use 200 epochs to slight modification on that value to keep connectivity. train sample with less pixels with connecting operation, finally we train 30 epochs to make sure the the sample quantized into uint8 type 4 EXPERIMENTS which helps maintain most adversarial ability. The first 30 ‘pre-train’ In this section, we show our experiments based on the methodol- epochs’ improvement can be visualized in Fig.5. ogy in section 3. In our experiments, we only attack the 2 provided We show our result in Table.1 by adding the methods during the white-box models due to the computational resources limitation. competition, the overall score list are recorded at the our subscription YOLOv4 make binary classification on whether the object is fore- result on the competition entrance. And we visualized a perturbed ground, and its NMS threshold (0.4) is different from Faster R- image and its patch in Fig.6. CNN(0.3). We set the threshold τ in Eq.1 as 0.15 for YOLO and 0.25 for Faster R-CNN. The latter output 81 class while the former 5 DISCUSSION AND CONCLUSION is binary, so the corresponding τ is relatively bigger. The value also In this paper, we show our PGD l0 -attack adversarial solution to- take the output’s unbalanced distribution into consideration. wards the competition’s target. We find that there is no need to make We set our learning rate as 60000, because the attack on image’s hand-crafted patches into the image, because by PGD searching 19 Score [5] Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Opti- A YOLO Overall Black Module mization. International Conference on Learning Representations (12 2014). [6] Yuezun Li, Xian Bian, and Siwei Lyu. 2018. Attacking Object Detectors via handcraft patch, A - - 61 - Imperceptible Patches on Background. ArXiv abs/1809.05966 (2018). http: k-means 250 31 304 23 //arxiv.org/abs/1809.05966 YOLO only 49 683 783 51 [7] Yuezun Li, Daniel Tian, Ming-Ching Chang, Xiao Bian, and Siwei Lyu. 2018. Robust Adversarial Perturbation on Deep Proposal-based Models. CoRR A + YOLO 231 651 1036 154 abs/1809.05962 (2018). http://arxiv.org/abs/1809.05962 momentum 281 997 1635 357 [8] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and Larry Zitnick. 2014. Microsoft COCO: Common Prims, 1-stage A 820 1255 2547 472 Objects in Context. In ECCV (eccv ed.). European Conference on Computer quantized training 955 1335 2836 546 Vision. https://www.microsoft.com/en-us/research/publication/microsoft-coco- common-objects-in-context/ Table 1: The ‘A’ means Faster R-CNN. Firstly we use hand- [9] Jiajun Lu, Hussein Sibai, and Evan Fabry. 2017. Adversarial Examples that Fool crafted patches and only attack A, the pixels side patches will Detectors. CoRR abs/1712.02494 (2017). http://arxiv.org/abs/1712.02494 be modified. Then we use K-means to get 10 proposal areas, and [10] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial the score gets better. We also attack YOLO only, and find that Attacks. (06 2017). YOLO is much easier to be destroyed. By training on ensemble [11] Shaoqing Ren, Kaiming He, Ross. Girshick, and Jian. Sun. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE of the 2 models, we get higher black box score, even though the Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137– white-box score shows a little decrease. By adding momentum, 1149. https://doi.org/10.1109/TPAMI.2016.2577031 the increase is promising. To lower the number of pixels, we op- [12] Andrew Ross and Finale Doshi velez. 2017. Improving the Adversarial Robust- ness and Interpretability of Deep Neural Networks by Regularizing their Input timized the connecting method by Prims together with hybrid Gradients. (11 2017). stage-1 of A into loss function, and we get a notable improve- [13] Robert Sedgewick and Kevin Wayne. 2000. Algorithms, 4th Edition. Retrieved ment. Finally, we quantize the adversarial sample after every 3 January 12, 2021 from https://algs4.cs.princeton.edu/home/ [14] F. Van der Heijden, Robert Duin, D. Ridder, and David Tax. 2004. Classification, epochs, which lowers the train/test error. Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB. (01 2004). https://doi.org/10.1002/0470090154 [15] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. 2017. Adversarial Examples for Semantic Segmentation and Object De- tection. In 2017 IEEE International Conference on Computer Vision (ICCV). 1378–1387. https://doi.org/10.1109/ICCV.2017.153 [16] H. Zhang and J. Wang. 2019. Towards Adversarially Robust Object Detection. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 421–430. https://doi.org/10.1109/ICCV.2019.00051 (a) perturbed image (b) modified areas Figure 6: (a) is a perturbed image result; (b) is the difference of original image and perturbed image. Most of the perturbed pixels lie near to the 3 zebra objects. this can be done automatically. We also use K-means and Prims to handle the game’s constrain. For l0 attack, the top-K selection is intuitive because the amplitude of gradients on the input interpret how important they are for the prediction target. For the reason that the connecting process is too slow, we just connect once and keep patches fixed, which may hinder the searching result and would be optimized in future work. REFERENCES [1] Alexey Bochkovskiy, Chien Yao Wang, and Hong Yuan Mark Liao. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. (2020). https: //arxiv.org/abs/2004.10934 [2] Francesco Croce and Matthias Hein. 2019. Sparse and Imperceivable Adversarial Attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). [3] Dimitar Dimitrov and Xiaofei Zhu. 2020. CIKM2020 Analyticup: Alibaba- Tsinghua Adversarial Challenge on Object Detection. Retrieved January 12, 2021 from https://www.cikm2020.org/adversarial-challenge-on-object-detection/ [4] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. 2018. Boosting Adversarial Attacks With Momentum. CVPR (2018), 9185–9193. 20