=Paper=
{{Paper
|id=Vol-2881/paper9
|storemode=property
|title=Fooling Object Detectors: Adversarial Attacks by Half-Neighbor Masks
|pdfUrl=https://ceur-ws.org/Vol-2881/paper9.pdf
|volume=Vol-2881
|authors=Yanghao Zhang,Fu Wang,Wenjie Ruan
}}
==Fooling Object Detectors: Adversarial Attacks by Half-Neighbor Masks==
Yanghao Zhang∗, University of Exeter, Exeter, EX4 4QF, UK, yanghao.zhang@exeter.ac.uk
Fu Wang∗, Guilin University of Electronic Technology, Guilin, Guangxi, 541004, China, fuu.wanng@gmail.com
Wenjie Ruan†, University of Exeter, Exeter, EX4 4QF, UK, w.ruan@exeter.ac.uk

∗ Both authors contributed equally to this research. This work was done while Fu Wang was visiting the Trustworthy AI Lab at the University of Exeter.
† Corresponding author. This work is supported by the Partnership Resource Fund (PRF) on Towards the Accountable and Explainable Learning-enabled Autonomous Robotic Systems from the UK EPSRC project on Offshore Robotics for Certification of Assets (ORCA) [EP/R026173/1], and the UK Dstl project on Test Coverage Metrics for Artificial Intelligence.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: Dimitar Dimitrov, Xiaofei Zhu (eds.): Proceedings of the CIKM AnalytiCup 2020, 22 October 2020, Galway (Virtual Event), Ireland, published at http://ceur-ws.org.

ABSTRACT

Although there are a great number of adversarial attacks on deep learning based classifiers, how to attack object detection systems has rarely been studied. In this paper, we propose a Half-Neighbor Masked Projected Gradient Descent (HNM-PGD) based attack, which can generate strong perturbations to fool different kinds of detectors under strict constraints. We also applied the proposed HNM-PGD attack in the CIKM 2020 AnalytiCup Competition, where it was ranked within the top 1% on the leaderboard. We release the code at https://github.com/YanghaoZYH/HNM-PGD.

CCS CONCEPTS

• Computing methodologies → Neural networks; • Security and privacy → Software and application security.

KEYWORDS

deep learning, object detector, adversarial attack, ℓ0 constraint

1 INTRODUCTION

Object detection is one of the most fundamental computer vision tasks: it not only performs image classification [17, 19] but also identifies the locations of the objects in an image. Object detection is now widely applied as an essential component in many applications that require high-level security, such as identity authentication [11], autonomous driving [5], and intrusion detection [6]. In recent years, significant progress has been made in object detection, especially by taking advantage of deep learning models. However, deep learning based object detection systems have also been shown to be vulnerable to adversarial examples [6]. Adversarial examples were first identified by Szegedy et al. [13], primarily on classification tasks; they showed that maliciously perturbed examples can fool a well-trained Deep Neural Network (DNN) into producing wrong predictions. Since then, a great number of methods have been proposed to generate adversarial examples [3, 18], notably the Fast Gradient Sign Method [2] and Projected Gradient Descent (PGD) [7]. At the same time, some studies show that DNN based object detection models face the same threat [4, 6, 16].

In this paper, we introduce an adversarial attack framework, called HNM-PGD, which can fool different types of object detectors under two strict constraints concurrently. Our method first identifies a mask that meets the constraints, and then generates an adversarial example by perturbing only the area delimited by the mask. Adversarial examples generated in this way are guaranteed to satisfy the limits on the number of perturbed pixels and connectivity regions, while remaining efficient to compute. One key novelty of HNM-PGD is that it is a fully automatic process without handcrafted operations, which makes it a practical solution for many real-world applications. As a by-product of this attack strategy, we found that some perturbations contain clear semantic information, which has rarely been reported in previous studies and provides some insight into the internal mechanisms of object detectors.

2 BACKGROUND

2.1 Object Detection Models

Given an input example x, an object detector can be described as f(x) = z, where z represents the output vector of the detector. Considering YOLOv4 [1] and Faster RCNN [8] as our target models, the information contained in z is slightly different. As an adversary in the white-box setting, our goal is to make the target models fail to detect the objects in the given examples, so we focus on the target models' confidence about the existence of objects in x. For each pre-defined box, YOLOv4 directly outputs a confidence z^conf ∈ R that there is an object inside this box. If z^conf is above the given threshold, the YOLOv4 model views this box as a potential object container, i.e. an area that may include objects. Faster RCNN does not output z^conf; instead, it introduces an extra background class and makes predictions based on its classification result z^cls ∈ R^(C+1), where C is the number of classes. Suppose z^cls_i is the maximal item in z^cls; if z^cls_i is greater than a given threshold and i ≠ C + 1, then the corresponding box is viewed as a potential container.
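As a minimal sketch of the potential-container test described above, assuming a vector of per-box objectness confidences for YOLOv4 and an m×(C+1) score matrix for Faster RCNN whose last column is the background class, the decision could look as follows (the softmax and the default threshold values are assumptions rather than the provided models' interface):

```python
import torch

def yolo_potential_boxes(conf: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """YOLOv4-style test: a box is a potential object container when its
    objectness confidence exceeds the threshold. `conf` is assumed to hold
    one confidence per pre-defined box."""
    return conf > threshold

def frcnn_potential_boxes(cls_scores: torch.Tensor, threshold: float = 0.3) -> torch.Tensor:
    """Faster RCNN-style test: the maximal class score must exceed the
    threshold AND must not be the extra background class (assumed here to
    be the last of the C+1 columns)."""
    probs = cls_scores.softmax(dim=-1)        # [m, C+1] class probabilities
    max_prob, max_idx = probs.max(dim=-1)     # per-box winning class
    background = probs.shape[-1] - 1
    return (max_prob > threshold) & (max_idx != background)
```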
2.2 Constraints of Perturbation

In this paper, the restrictions on the adversary are: i) the number of perturbed pixels is no more than 2% of the whole image; ii) the number of 8-connectivity regions is no greater than 10. Apart from these two constraints, there is no limit on the magnitude of the adversarial perturbation. Because both constraints concern the number of perturbed pixels, this is an ℓ0 norm attack.

2.3 Salience Map

The salience map is a common tool for analyzing and interpreting the behavior of DNN models. Smilkov et al. [12] proposed the SmoothGrad method to generate stable salience maps. Given a loss function L, the salience map is given by

    S_x = (1/n) Σ_{i=1}^{n} ∇_x L(f(x + η_i)),    (1)

where the η_i are white-noise vectors sampled i.i.d. from a Gaussian distribution.
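A minimal sketch of Eq. (1), assuming a scalar loss wrapper loss_fn(model, x) and a noise standard deviation sigma (neither is specified above):

```python
import torch

def smoothgrad_salience(model, x, loss_fn, n=16, sigma=0.1):
    """SmoothGrad-style salience map (Eq. 1): average the gradient of the
    loss w.r.t. the input over n Gaussian-perturbed copies of x."""
    salience = torch.zeros_like(x)
    for _ in range(n):
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        loss = loss_fn(model, noisy)            # scalar attack loss, e.g. Eq. (2) or (3)
        grad, = torch.autograd.grad(loss, noisy)
        salience += grad
    return salience / n
```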
3 METHODOLOGY

Figure 1: An illustration of the workflow of the proposed HNM-PGD.

3.1 Mask Finding

We first propose a mask generation method to locate perturbation regions for any given example. The size and shape of the perturbation regions are clearly critical for conducting a successful adversarial attack under the constraints. Therefore, we begin by using the salience map to capture the model's response to each pixel in x. After computing an example's salience map, we initialize the mask by keeping only the pixels whose response is larger than a threshold z^resp. To carry out this initialization automatically, we borrow the idea of standard deviation and coverage from the Gaussian distribution and compute z^resp from the mean and standard deviation of S_x, i.e. z^resp = Mean(S_x) + φ Std(S_x), where φ is a control parameter.

To meet the pixel constraints, we follow the spirit of the K Nearest Neighbor algorithm to refine the mask. Specifically, if half of a pixel's neighbors have been chosen by the current mask, then this pixel is also chosen; otherwise it is discarded. We employ two convolution kernels whose parameters are all 1 to conduct this Half-Neighbor (HN) procedure. The first kernel aims to reduce the number of pixels in the mask, and its size is gradually reduced over the iterations. The second kernel is fixed to 3×3; it guarantees that there are no isolated pixels in the mask and reduces the number of connectivity regions (see lines 5–9 in Algorithm 1; a code sketch follows the algorithm). If the resulting mask still does not meet the constraints, the algorithm adjusts φ accordingly and searches again.

Algorithm 1: Half-Neighbor Masked PGD (HNM-PGD)

Input: a given example x, number of noise samples n, control parameter φ, HN kernel size k and adjustment step s, number of PGD iterations P, PGD step size α
Output: perturbation δ

 1: S_x = (1/n) Σ_{i=1}^{n} ∇_x L(f(x + η_i))
 2: repeat
 3:     z^resp = Mean(S_x) + φ Std(S_x)
 4:     Initialize mask M via z^resp
 5:     while k > 3 do
 6:         M = HN(M, k)
 7:         M = HN(M, 3)
 8:         k = k − s
 9:     end while
10:     φ = φ + 0.1
11: until M meets the constraints
12: Randomly initialize δ
13: δ = δ × M                  ⊲ ×: element-wise product
14: for i = 1 ... P do
15:     δ = δ + α · sign(∇_δ L(f(x + δ)))
16:     δ = δ × M
17:     δ = max(min(δ, 1 − x), 0 − x)
18: end for
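A sketch of one possible reading of lines 5–9 of Algorithm 1 and of the constraint check in line 11: an all-ones convolution counts how many pixels of the k×k neighborhood are already in the mask, a pixel is kept when at least half of them are, and the 2% pixel budget and the 8-connectivity region count are then verified with scipy.ndimage.label. The tensor shapes and the odd kernel size are assumptions.

```python
import torch
import torch.nn.functional as F
import numpy as np
from scipy import ndimage

def hn_refine(mask: torch.Tensor, k: int) -> torch.Tensor:
    """One Half-Neighbor step on a binary {0,1} float mask of shape [H, W]:
    keep a pixel iff at least half of its k*k neighborhood (all-ones kernel)
    is already in the mask. Assumes k is odd."""
    kernel = torch.ones(1, 1, k, k, dtype=mask.dtype, device=mask.device)
    counts = F.conv2d(mask[None, None], kernel, padding=k // 2)[0, 0]
    return (counts >= (k * k) / 2).to(mask.dtype)

def meets_constraints(mask: torch.Tensor, max_ratio=0.02, max_regions=10) -> bool:
    """Competition constraints: at most 2% perturbed pixels and at most
    10 regions under 8-connectivity."""
    m = mask.detach().cpu().numpy().astype(bool)
    if m.sum() > max_ratio * m.size:
        return False
    eight_conn = np.ones((3, 3), dtype=int)     # 8-connectivity structuring element
    _, n_regions = ndimage.label(m, structure=eight_conn)
    return n_regions <= max_regions
```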
3.2 Masked PGD Attack

Once the perturbation regions are located, we generate adversarial examples via PGD iterations. The basic idea is summarized in Algorithm 1: the selected regions are perturbed by a PGD adversary through element-wise products between the perturbation δ and the mask M. The workflow of the proposed attack is shown in Fig. 1.

Due to the difference in the object detectors' output z, we need to consider YOLOv4 and Faster RCNN separately. As discussed in Section 2.1, YOLOv4 directly outputs its confidence, so the Binary Cross-Entropy (BCE) loss is a suitable option for conducting the adversarial attack. Suppose there are m pre-defined boxes and the maximal confidence is 1; the BCE loss can then be simplified as

    L_yolo(z) = Σ_{i=1}^{m} log z_i^conf,    (2)

where z is the output of the detector and z_i^conf ∈ z. Different from YOLOv4, there are C + 1 classes in Faster RCNN's classification result, including C foreground classes and 1 background class. To force the detector to classify an adversarial example into the background class, we conduct a targeted adversarial attack with a negative Cross-Entropy (CE) loss, which can be written as

    L_frcnn(z) = z_{C+1}^cls − log Σ_j exp(z_j^cls),    (3)

where z^cls is the classification output of the Faster RCNN detector and z^cls ∈ z. Note that we can attack YOLOv4 and Faster RCNN simultaneously by simply using HN-masked PGD to maximize L_yolo + L_frcnn.
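A minimal sketch of the masked PGD loop in lines 12–18 of Algorithm 1, with the attack loss passed in so that Eq. (2), Eq. (3), or their sum can be plugged in; the uniform random initialization and the assumption that the detector is a single differentiable callable are simplifications.

```python
import torch

def masked_pgd(model, x, mask, loss_fn, steps=40, alpha=16 / 255):
    """Masked PGD: ascend the attack loss while keeping the perturbation
    inside the mask and x + delta inside [0, 1]."""
    delta = torch.empty_like(x).uniform_(-1, 1) * mask   # random init, restricted to the mask
    for _ in range(steps):
        delta = delta.detach().requires_grad_(True)
        loss = loss_fn(model(x + delta))                 # scalar attack loss to maximize
        grad, = torch.autograd.grad(loss, delta)
        delta = delta + alpha * grad.sign()              # gradient ascent step (line 15)
        delta = delta * mask                             # re-apply the HN mask (line 16)
        delta = torch.max(torch.min(delta, 1 - x), -x)   # keep x + delta in [0, 1] (line 17)
    return delta.detach()
```

In this form, the same loop can ascend a combined L_yolo + L_frcnn loss to attack both white-box detectors at once, as noted above.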
4 EXPERIMENTS

To demonstrate our method, we select 100 images from the MS COCO dataset as a toy dataset and conduct experiments for comparison on two white-box models, i.e. YOLOv4 and Faster RCNN.

4.1 Implementation Details

YOLOv4. For the provided YOLOv4 model, the input size is set to 608×608 while the original images are 500×500. To keep the pipeline differentiable, we employ torch.nn.Upsample with bilinear interpolation to resolve the resizing problem. Due to the approximate computation of torch.nn.Upsample, we need to allow more boxes to be detected to stabilize the adversarial perturbation; as YOLOv4 only outputs foreground objects with z^conf > 0.5, we lower the confidence threshold from 0.5 to 0.3 during the attack.

Faster RCNN. Similar to the configuration for YOLOv4, we resize the input to 800×800 with bilinear interpolation. The permitted threshold for Faster RCNN is 0.3, which is relatively low compared with YOLOv4; in practice we assign an even smaller threshold of 0.1 when calculating the loss, to allow more boxes to appear.

PGD settings. In this paper, HNM-PGD is carried out with 40 steps, and the step size is 16/255.

4.2 Experimental Results

In this part, we employ the formula in the AnalytiCup description to calculate the score, which provides a criterion to evaluate the performance of the proposed method. Our code is available on GitHub (https://github.com/YanghaoZYH/HNM-PGD).

We produce 100 adversarial examples on the white-box models with the two losses together, L_yolo + L_frcnn. Figure 2 gives several successful examples for the targeted models. We can observe that the proposed method does locate the objects correctly, and the added patches normally target their sensitive parts.

Figure 2: Examples of adversarial patches generated by the proposed HNM-PGD; the patches are mostly located on semantic parts of the object, such as the horse's eye, the skateboard, and the human body.

Figure 3 illustrates the overall score over the 100 selected images as the upper bound on the number of perturbed pixels increases, together with the scores achieved against YOLOv4 and Faster RCNN, respectively. We find that as the pixel budget increases, the attack performs better against Faster RCNN, while this is not the case for YOLOv4. This is because the provided white-box Faster RCNN uses a low tolerance threshold, so a sufficient number of pixels is needed for a successful attack. For YOLOv4, the performance fluctuates at about 53. Therefore, there is a trade-off when choosing the pixel budget.

Figure 3: Performance on the toy dataset with an increasing pixel budget.

We apply the same strategy and perform the adversarial attack with more steps (800) and a smaller step size (4/255) for all 1000 images under different pixel budgets, and then pick the best result obtained on the white-box models as our solution. In the final stage of the AnalytiCup competition, we also tried to improve the generalization of the attack to the unseen (black-box) model. In detail, we add some transformations (such as flipping and cropping) to the input image, which is expected to keep the perturbation from overfitting the known white-box models too much; a sketch of this preprocessing is given below. Our final score is 2414.87, ranked 17th in the competition.
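A minimal sketch of the preprocessing described in Section 4.1 and the input transformation mentioned at the end of Section 4.2: differentiable bilinear upsampling of the perturbed image to the detector's input size (via torch.nn.functional.interpolate, the functional counterpart of torch.nn.Upsample) plus an optional random horizontal flip. The sizes, flip probability, and function name are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

def preprocess(x_adv: torch.Tensor, size: int = 608, flip_p: float = 0.5) -> torch.Tensor:
    """Differentiably resize a perturbed NCHW image batch (e.g. 500x500 ->
    608x608 for YOLOv4, 800x800 for Faster RCNN) with bilinear interpolation,
    and optionally flip it to reduce overfitting to the white-box models."""
    out = F.interpolate(x_adv, size=(size, size), mode="bilinear", align_corners=False)
    if random.random() < flip_p:
        out = torch.flip(out, dims=[3])   # flip the width dimension (NCHW)
    return out
```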
5 BEYOND COMPETITION

This competition suggests a few interesting research directions. Intuitively, due to the ℓ0 norm constraint, both the location and the shape of the perturbation are critical to attack performance. We have reviewed other top contestants' solutions and found that linear adversarial patches have a higher impact on the target model's output and use fewer pixels than blocky ones, while location is less important. This seems to be because a blocky perturbation can only influence a relatively small range of a convolution kernel's output, while a linear perturbation can cross a wider area. To verify this conjecture, we wish to adapt verification technologies for neural networks [9, 10, 15] to object detectors and quantify the worst-case effect of adversarial patches on them. Besides, on top of our HNM-PGD, we can also extend the evaluation of existing adversarial attacks and defenses, such as [14, 18], to object detection tasks.

6 CONCLUSION

In conclusion, we propose a PGD-based approach to attacking object detectors using Half-Neighbor masks. The automatic pipeline of the proposed HNM-PGD lets it craft adversarial examples and patches under the ℓ0 constraint without manual intervention, so it can be applied in many settings, even physical-world attacks. On the other hand, this end-to-end attack framework also benefits further studies on defending object detectors against adversarial attacks and on verifying their robustness.

ACKNOWLEDGMENTS

The work was partially supported by the Guangxi Science and Technology Plan Project under Grant AD18281065 and the Guangxi Key Laboratory of Cryptography and Information Security under Grant GCIS201817. Fu Wang is supported by the study abroad program for graduate students of Guilin University of Electronic Technology under Grant GDYX2019025, and the Innovation Project of GUET Graduate Education under Grant 2020YCXS042.

REFERENCES

[1] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934.
[2] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR).
[3] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, et al. 2020. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review 37 (2020), 100270.
[4] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. 2020. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review 37 (2020), 100270.
[5] Yann LeCun, Urs Muller, Jan Ben, Eric Cosatto, and Beat Flepp. 2005. Off-Road Obstacle Avoidance through End-to-End Learning. In Advances in Neural Information Processing Systems (NeurIPS).
[6] Jiajun Lu, Hussein Sibai, and Evan Fabry. 2017. Adversarial Examples that Fool Detectors. arXiv:1712.02494.
[7] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations (ICLR).
[8] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems (NeurIPS).
[9] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. 2018. Reachability Analysis of Deep Neural Networks with Provable Guarantees. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13-19 July.
[10] Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, and Marta Kwiatkowska. 2019. Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for the Hamming Distance. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10-16 August.
[11] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. 2016. Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition. In ACM Conference on Computer and Communications Security (SIGSAC).
[12] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 2017. SmoothGrad: removing noise by adding noise. arXiv:1706.03825.
[13] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
[14] F. Wang, L. He, W. Liu, and Y. Zheng. 2020. Harden Deep Convolutional Classifiers via K-Means Reconstruction. IEEE Access 8 (2020), 168210–168218.
[15] Min Wu, Matthew Wicker, Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. 2020. A game-based approximate verification of deep neural networks with provable guarantees. Theoretical Computer Science 807 (2020), 298–329.
[16] Bin Yan, Dong Wang, Huchuan Lu, and Xiaoyun Yang. 2020. Cooling-Shrinking Attack: Blinding the tracker with imperceptible noises. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Shaoning Zeng, Bob Zhang, Yanghao Zhang, and Jianping Gou. 2018. Collaboratively weighting deep and classic representation via l2 regularization for image classification. arXiv:1802.07589.
[18] Yanghao Zhang, Wenjie Ruan, Fu Wang, and Xiaowei Huang. 2020. Generalizing Universal Adversarial Attacks Beyond Additive Perturbations. arXiv:2010.07788.
[19] Yanghao Zhang, Shaoning Zeng, Wei Zeng, and Jianping Gou. 2018. GNN-CRC: discriminative collaborative representation-based classification via Gabor wavelet transformation and nearest neighbor. Journal of Shanghai Jiaotong University (Science) 23, 5 (2018), 657–665.