=Paper= {{Paper |id=Vol-2881/paper9 |storemode=property |title=Fooling Object Detectors:Adversarial Attacks by Half-Neighbor Masks |pdfUrl=https://ceur-ws.org/Vol-2881/paper9.pdf |volume=Vol-2881 |authors=Yanghao Zhang,Fu Wang,Wenjie Ruan }} ==Fooling Object Detectors:Adversarial Attacks by Half-Neighbor Masks== https://ceur-ws.org/Vol-2881/paper9.pdf
                                    Fooling Object Detectors:
                           Adversarial Attacks by Half-Neighbor Masks

                 Yanghao Zhang∗                                                   Fu Wang∗                                       Wenjie Ruan†
              University of Exeter                             Guilin Univ. of Electronic Technology                           University of Exeter
             Exeter, EX4 4QF, UK                                 Guilin, Guangxi, 541004, China                                Exeter, EX4 4QF, UK
          yanghao.zhang@exeter.ac.uk                                  fuu.wanng@gmail.com                                      w.ruan@exeter.ac.uk

ABSTRACT                                                                                      Method [2] and Projected Gradient Descent (PGD) [7]. At the same
Although there are a great number of adversarial attacks on deep                              time, some studies show that DNN based object detection models
learning based classifiers, how to attack object detection systems                            are also facing the same threat [4, 6, 16].
has been rarely studied. In this paper, we propose a Half-Neighbor                               In this paper, we introduce an adversarial attack framework,
Masked Projected Gradient Descent (HNM-PGD) based attack,                                     called HNM-PGD, which can fool different types of object detec-
which can generate strong perturbation to fool different kinds of de-                         tors under two strict constraints concurrently. Our method first
tectors under strict constraints. We also applied the proposed HNM-                           identifies a mask that meets the constraints, and then generates an
PGD attack in the CIKM 2020 AnalytiCup Competition, which was                                 adversarial example by perturbing a specific area that is constrained
ranked within the top 1% on the leaderboard. We release the code                              by the mask. Adversarial examples generated in this way are guar-
at https://github.com/YanghaoZYH/HNM-PGD.                                                     anteed to satisfy the limitation in terms of the number of perturbed
                                                                                              pixels and connectivity regions, while remaining a high efficiency.
CCS CONCEPTS                                                                                  One key novelty in HNM-PGD lies on that it enables an automatic
                                                                                              process without handcraft operation, which provides a practical
• Computing methodologies → Neural networks; • Security and
                                                                                              solution for many real-world applications. As a by-product of this
privacy → Software and application security.
                                                                                              attack strategy, we found that some perturbations contain clear se-
                                                                                              mantic information, which are rarely identified by previous studies
KEYWORDS
                                                                                              and provide some insights regarding the internal mechanisms of
deep learning, object detector, adversarial attack, ℓ0 constraint                             object detectors.

1    INTRODUCTION                                                                             2 BACKGROUND
Object detection is one of the most fundamental computer vision
tasks, which not only performs image classification [17, 19] but                              2.1 Object Detection Models
also identifies the locations of the objects in an image. Now ob-                             Given an input example 𝑥, an object detector can described as
ject detection has been widely applied as an essential component                              𝑓 (𝑥) = 𝑧, where 𝑧 represents the output vector of the detector. Con-
in many applications that requires a high-level security, such as                             sidering YOLOv4 [1] and Faster RCNN [8] as our target models, the
identity authentication [11], autonomous driving [5], and intrusion                           information contained in 𝑧 is slightly different, and as an adversary
detection [6]. In recent years, we witness the significant progress                           under white-box setting, our goal is to make target models fail to
has been made in object detection, especially by taking the ad-                               detect the objects in the given examples. Thus we focus on the tar-
vantage of deep learning models. However, deep learning based                                 get models’ confidence about the existence of objects in 𝑥. For each
object detection systems are also demonstrated to be vulnerable to                            pre-defined box, YOLOv4 directly outputs its confidence 𝑧 conf : R
adversarial examples [6]. The adversarial example was first identi-                           about there is an object inside this box. If 𝑧 conf is above the given
fied by Szegedy et al. [13], primarily on classification tasks, they                          threshold, YOLOv4 model views this box as a potential object con-
showed that maliciously perturbed examples can fool a well-trained                            tainer, i.e. the area that may include objects. Faster RCNN does not
Deep Neural Network (DNN) to output wrong predictions. After                                  output 𝑧 conf , nevertheless, it introduces an extra background class
that, a great number of methods have been proposed to generate                                and make predictions based on its classification result 𝑧 cls : R𝐶+1 ,
adversarial examples [3, 18], notably such as First Gradient Sign                             where 𝐶 is the number of classes. Suppose 𝑧𝑖cls is the maximal item
∗ Both authors contributed equally to this research. This work is done when Fu Wang           in 𝑧 cls , if 𝑧𝑖cls is greater than a given threshold and 𝑖 ≠ 𝐶 + 1, then
was visiting the Trustworthy AI Lab at University of Exeter.                                  the corresponding box will be viewed as the potential container.
† Corresponding author. This work is supported by Partnership Resource Fund (PRF)
on Towards the Accountable and Explainable Learning-enabled Autonomous Robotic
Systems from UK EPSRC project on Offshore Robotics for Certification of Assets                2.2    Constraints of Perturbation
(ORCA) [EP/R026173/1], and the UK Dstl project on Test Coverage Metrics for Artificial        In this paper, the restrictions of the adversary are i) the number of
Intelligence.
                                                                                              perturbed pixels is not more than 2% of the whole; ii) the number
 Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons         of 8-connectivity regions is not greater than 10. Except for these
License Attribution 4.0 International (CC BY 4.0).                                            two constraints, there are no limitations on the magnitude of the
In: Dimitar Dimitrov, Xiaofei Zhu (eds.): Proceedings of the CIKM AnalytiCup 2020, 22
October, 2020, Gawlay (Virtual Event), Ireland, 2020, published at http://ceur-ws.org.        adversarial perturbation. Because both constraints are related to
                                                                                              the number of pixels, this belongs to the ℓ0 norm attack.
                                                                                         32
                                Figure 1: An illustration for the workflow of the proposed HNM-PGD.


Algorithm 1 Half-Neighbor Masked PGD (HNM-PGD)                              3 METHODOLOGY
Input: A given example 𝑥, number of random initialization 𝑛, con-           3.1 Mask Finding
     trol parameter 𝜙, HN kernel size 𝑘 and adjust step 𝑠, number
                                                                            We first propose a mask generation method to locate perturbation
     of PGD iterations 𝑃, PGD step size 𝛼
                                                                            regions for any given examples. Apparently, the size and shape of
Output: 𝛿
            1 Í                                                             perturbation regions are critical to conduct a successful adversarial
  1: 𝑆𝑥 = 𝑛 𝑛   𝑖=1 ∇𝑥 𝐿 (𝑓 (𝑥 + 𝜂𝑖 ))                                      attack under the constraints. Therefore, we use salience map to
  2: repeat
                                                                            capture the model’s response toward each pixel in 𝑥 at beginning.
  3:     𝑧 resp = Mean(𝑆𝑥 ) + 𝜙 Std(𝑆𝑥 )
                                                                            After compute an example’s salience map, we initialize the mask via
  4:     Initialize mask 𝑀 via 𝑧 resp
                                                                            only keep pixels that the model’s respond is larger than a threshold
  5:     while 𝑘 > 3 do
                                                                            𝑧 resp . To automatically carry out this initialization, we borrow the
  6:          𝑀 = HN(𝑀, 𝑘)
                                                                            idea of standard deviation and coverage from Gaussian distribution,
  7:          𝑀 = HN(𝑀, 3)
                                                                            and compute 𝑧 resp via the mean and standard deviation of 𝑆𝑥 , which
  8:          𝑘 = 𝑘 −𝑠
                                                                            can be described as 𝑧 resp = Mean(𝑆𝑥 ) + 𝜙 Std(𝑆𝑥 ), where 𝜙 is a
  9:     end while
                                                                            control parameter.
 10:     𝜙 = 𝜙 + 0.1
                                                                                To meet the pixel constraints, we follow the spirit of K Nearest
 11: until 𝑀 meets constraints
                                                                            Neighbor algorithm to refine the mask. Specifically, if half of a
 12: Random initialize 𝛿
                                                                            pixel’s neighbors have been chosen by the current mask, then this
 13: 𝛿 = 𝛿 × 𝑀                            ⊲ × : element-wise product
                                                                            pixel would also be chosen, otherwise it will be discarded. We
 14: for 𝑖 = 1 . . . 𝑃 do
                                                                            employ two convolution kernels whose parameters are all 1 to
 15:     𝛿 = 𝛿 + 𝛼 · sign (∇𝛿 𝐿 (𝑓 (𝑥 + 𝛿)))
                                                                            conduct this Half-Neighbor (HN) procedure. The first kernel aims
 16:     𝛿 =𝛿 ×𝑀
                                                                            to reduce the number of pixels in a mask, and its size is gradually
 17:     𝛿 = max(min(𝛿, 0 − 𝑥), 1 − 𝑥)
                                                                            changed during iterations. The second kernel is fixed to 3×3, it can
 18: end for
                                                                            guarantee that there are no isolated pixels in the mask and reduce
                                                                            the number of connectivity regions (See lines 5–9 in Algorithm 1).
                                                                            If the mask still does not meet the constraints, the algorithm will
                                                                            adjust 𝜙 accordingly and search again.
2.3    Salience Map
Salience map is a common tool to analyze and interpret DNN mod-
                                                                            3.2    Masked PGD Attack
els’ behaviors. Smilkov et al. [12] proposed SmoothGrad method to
generate stable salience maps. Given a loss function 𝐿, the salience        Once the perturbation regions are located, we generate adversarial
map is given by                                                             examples via PGD iterations. The basic idea here is summarized
                                                                            in Algorithm 1, where the selected regions are perturbed by a
                            𝑛                                               PGD adversary via conducting element-wise products between
                         1Õ
                  𝑆𝑥 =         ∇𝑥 𝐿 (𝑓 (𝑥 + 𝜂𝑖 )) ,              (1)        perturbation 𝛿 and mask 𝑀. The workflow of the proposed defense
                         𝑛 𝑖=1                                              is shown in Fig. 1.
                                                                                Due to the difference in the object detectors’ output 𝑧, we need
where 𝜂𝑖 are white noise vectors that sampled i.i.d. from a Gaussian        to consider YOLOv4 and Faster RCNN separately. As we discussed
distribution.                                                               in section 2.1, YOLOv4 directly outputs its confidence, so Binary
                                                                       33
                         (a)                                             (b)                                             (c)

Figure 2: Adversarial patches examples generated by the proposed HNM-PGD, the generated patches are mostly located in the
semantic part of the object, such as the horse’s eye, skateboard and human’s body.



Cross-Entropy (BCE) loss is a suitable option to conduct adversarial
attack. Suppose there are 𝑚 pre-defined boxes, and the maximal
confidence is 1, BCE loss can be simplified as
                                   𝑚
                                   Õ
                     𝐿𝑦𝑜𝑙𝑜 (𝑧) =         log 𝑧𝑖conf,              (2)
                                   𝑖=1

where 𝑧 is the output of a detector and 𝑧𝑖conf ∈ 𝑧.
   Different from YOLOv4, there are 𝐶 + 1 classes in Faster RCNN’s
classification result, including 𝐶 foreground objects and 1 back-
ground class. To force the detector to classify an adversarial example
into the background class, we conduct a targeted adversarial attack
with a negative Cross Entropy (CE) loss, which can be written as
                                         Õ                
                            cls
            𝐿 𝑓 𝑟𝑐𝑛𝑛 (𝑧) = 𝑧𝐶+1 − log            exp(𝑧 cls
                                                       𝑗 ) ,      (3)
                                             𝑗
                                                                               Figure 3: Performance on the toy dataset with the increasing
where 𝑧 cls is the classification output of Faster RCNN detector,              amount of pixel.
and 𝑧 cls ∈ 𝑧. Note that we can attack YOLOv4 and Faster RCNN
simultaneously by simply using HN masked PGD mixmize 𝐿𝑦𝑜𝑙𝑜 +
𝐿 𝑓 𝑟𝑐𝑛𝑛 .                                                                     Faster RCNN Similar to the configuration of YOLOv4, we resize
                                                                               the input to 800×800 with bilinear interpolation. As the permitted
4     EXPERIMENTS                                                              threshold for Faster RCNN is 0.3, which is relatively lower than
                                                                               YOLOv4. In practice, we assign a smaller threshold 0.1 when calcu-
To demonstrate our method, we select 100 images from MS COCO
                                                                               lating the loss to enable more boxes to appear.
dataset as a toy dataset and conduct experiments for comparison
                                                                               PGD Settings In this paper, the HNM-PGD is carried out with 40
on two white-box models, i.e. YOLOv4 and Faster RCNN.
                                                                               steps, and the step size is 16/255.
4.1    Implementation Details
                                                                               4.2    Experimental Results
YOLOv4 Regarding the provided model YOLOv4, the input size is
                                                                               In this part, we employ the formula in the description of AnalytiCup
set to 608×608 while the original image has 500×500, to allow differ-
                                                                               to calculate the score, which provides a criteria to evaluate the
ential, we employ the function torch.nn.Upsample with bilinear
                                                                               performance of the proposed method. Our code is available on
interpolation to resolve the resize problem. Due to the approx-
                                                                               Github1 .
imation computation of torch.nn.Upsample, we need to allow
                                                                                  We produce 100 adversarial examples on the white-box models
more boxes to be detected to stabilize the adversarial perturba-
                                                                               with two loss together: 𝐿𝑦𝑜𝑙𝑜 + 𝐿 𝑓 𝑟𝑐𝑛𝑛 . Figure 2 gives several suc-
tion. As YOLOv4 method only outputs the foreground objects with
                                                                               cessful examples for the targeted models. We can observe that the
𝑧 conf > 0.5, we adjust the confidence threshold from 0.5 to 0.3
during attack.                                                                 1 https://github.com/YanghaoZYH/HNM-PGD

                                                                          34
proposed method does locate the object correctly, and the added               Technology under Grant GDYX2019025, and the Innovation Project
patches normally target on their sensitive parts. Figure 3 illustrates        of GUET Graduate Education under Grant 2020YCXS042.
the overall score with the increasing upper bound of the number
of pixel among 100 selected images, and the scores achieved by                REFERENCES
YOLOv4 and Faster RCNN, respectively. We find that with the in-                [1] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4:
                                                                                   Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934
crease of the quantity of pixel, Faster RCNN performs better, while            [2] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining
this is not the case for YOLOv4. This is because the provided white-               and Harnessing Adversarial Examples. In International Conference on Learning
box Faster RCNN uses a low tolerate threshold, where sufficient                    Representations (ICLR).
                                                                               [3] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, and et al. 2020. A survey of safety
pixel is needed for successful attack. In terms of YOLOv4, the per-                and trustworthiness of deep neural networks: Verification, testing, adversarial
formance fluctuates at about 53. Therefore, there is a trade-off when              attack and defence, and interpretability. Computer Science Review 37 (2020),
choosing the amount of the pixel.                                                  100270.
                                                                               [4] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun,
    We apply the same strategy and perform the adversarial attack                  Emese Thamo, Min Wu, and Xinping Yi. 2020. A survey of safety and trust-
with more steps (800) and smaller step size (4/255) for all 1000                   worthiness of deep neural networks: Verification, testing, adversarial attack and
                                                                                   defence, and interpretability. Computer Science Review 37 (2020), 100270.
images under the different quantity of pixel, then we pick the best            [5] Yann LeCun, Urs Muller, Jan Ben, Eric Cosatto, and Beat Flepp. 2005. Off-Road
result obtained on the white-box models as our solution. In the                    Obstacle Avoidance through End-to-End Learning. In the Advances in Neural
final stage of the AnalytiCup competition, we had also tried to                    Information Processing Systems (NeurIPS).
                                                                               [6] Jiajun Lu, Hussein Sibai, and Evan Fabry. 2017. Adversarial Examples that Fool
improve the generalization of the attacking approach on the unseen                 Detectors. arXiv:1712.02494
model, i.e. black-box. In detail, we add some transformations (like            [7] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
flipping/cropping) to the input image, which is expected to not                    Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial
                                                                                   Attacks. In International Conference on Learning Representations (ICLR).
overfit the known white-box models too much. Our final score is                [8] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn:
2414.87, ranked 17 in the competition.                                             Towards real-time object detection with region proposal networks. In Advances
                                                                                   in neural information processing systems (NeurIPS).
                                                                               [9] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. 2018. Reachability
                                                                                   Analysis of Deep Neural Networks with Provable Guarantees. In In Proceedings
5   BEYOND COMPETITION                                                             of theInternational Joint Conference on Artificial Intelligence, Stockholm, Sweden,
This competition leads to a few interesting research directions. In-               13-19 July.
                                                                              [10] Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, and
tuitively, due to the ℓ0 norm constraint, both location and shape of               Marta Kwiatkowska. 2019. Global Robustness Evaluation of Deep Neural Net-
the perturbation are critical to the attacking performance. We have                works with Provable Guarantees for the Hamming Distance. In In Proceedings of
reviewed other top contestants’ solutions and found that linear ad-                the International Joint Conference on Artificial Intelligence, Macao, China, 10-16
                                                                                   August.
versarial patches have a higher impact on the target model’s output           [11] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. 2016.
and use less number of pixels than blocky ones, while location is                  Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recog-
less important. This seems because blocky perturbation can only                    nition. In ACM Conference on Computer and Communications Security (SIGSAC).
                                                                              [12] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Watten-
influence a relatively small range of a convolution kernel’s output,               berg. 2017. SmoothGrad: removing noise by adding noise. arXiv:1706.03825
while linear perturbation can cross a wider area. To verify such              [13] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan,
                                                                                   Ian J. Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks.
conjecture, we wish to adopt verification technologies on neural                   In International Conference on Learning Representations (ICLR).
networks [9, 10, 15] into the object detectors and quantify the worst-        [14] F. Wang, L. He, W. Liu, and Y. Zheng. 2020. Harden Deep Convolutional Classifiers
case scenario of adversarial patches on object detectors. Besides,                 via K-Means Reconstruction. IEEE Access 8 (2020), 168210–168218.
                                                                              [15] Min Wu, Matthew Wicker, Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska.
on the top of our HNM-PGD, we can also expand evaluation of                        2020. A game-based approximate verification of deep neural networks with
existing adversarial attacks and defenses, such as [14, 18], onto                  provable guarantees. Theoretical Computer Science 807 (2020), 298–329.
object detection tasks.                                                       [16] Bin Yan, Dong Wang, Huchuan Lu, and Xiaoyun Yang. 2020. Cooling-Shrinking
                                                                                   Attack: Blinding the tracker with imperceptible noises. In IEEE Conference on
                                                                                   Computer Vision and Pattern Recognition (CVPR).
                                                                              [17] Shaoning Zeng, Bob Zhang, Yanghao Zhang, and Jianping Gou. 2018. Collabora-
6   CONCLUSION                                                                     tively weighting deep and classic representation via l2 regularization for image
                                                                                   classification. arXiv preprint arXiv:1802.07589 (2018).
In conclusion, we propose a PGD-based approach to attack object de-           [18] Yanghao Zhang, Wenjie Ruan, Fu Wang, and Xiaowei Huang. 2020. Generalizing
tectors using Half-Neighbor masks. In the proposed HNM-PGD, the                    Universal Adversarial Attacks Beyond Additive Perturbations. arXiv:2010.07788
automatic pipeline allows it to craft adversarial examples/patches            [19] Yanghao Zhang, Shaoning Zeng, Wei Zeng, and Jianping Gou. 2018. GNN-CRC:
                                                                                   discriminative collaborative representation-based classification via Gabor wavelet
automatically under ℓ0 constraint, which can be applied in many                    transformation and nearest neighbor. Journal of Shanghai Jiaotong University
applications, even physical-world attacks. On the other hand, this                 (Science) 23, 5 (2018), 657–665.
end-to-end attack framework also benefits further studies on de-
fending object detectors against adversarial attacks and verifying
their robustness.

ACKNOWLEDGMENTS
The work was partially supported by the Guangxi Science and
Technology Plan Project under Grant AD18281065, the Guangxi
Key Laboratory of Cryptography and Information Security under
Grant GCIS201817. Fu Wang is supported by the study abroad
program for graduate student of Guilin University of Electronic
                                                                         35