Fool Object Detectors with L0-Norm Patch Attack
                                     Honglin Li*                                                                       Yunqing Zhao
                               Westlake University                                                     Singapore University of Technology and Design
                           Hangzhou, Zhejiang, China                                                                     Singapore
                           lihonglin@westlake.edu.cn                                                        yunqing_zhao@mymail.sutd.edu.sg
ABSTRACT                                                                                         2.1     Competition Understanding
Deep Neural Networks based Object Detection algorithms have                                      The competition adopts two object detectors known by the competi-
shown their remarkable performance and been widely applied in                                    tors (known as white-box attacks) and another two black-box object
various aspects in recent years. However, for those areas which draw                             detectors for evaluation. The competitors are asked to generate ad-
much attention to the robustness and security of the model, like                                 versarial examples by adding a small number of patches (less than
Autonomous Driving and Biomedical Image Analysis, there are still                                10) to each image (also known as L0 attacks) offline.
challenges to make domain users look at these detectors as reliable
methods. In this paper, we mainly focus on object detection and                                  2.1.1 White-box Models: The 2 white-box model are the famous
propose our method to fool object detectors with with L0 -Norm                                   Faster R-CNN [11] and YOLOv4 [1] respectively. Faster R-CNN
patch attack.                                                                                    is known as a 2-stage detectors. The first stage, or RPN [11], will
                                                                                                 classify a coarse fore/back-ground binary result for each anchor,
CCS CONCEPTS                                                                                     then most of the background anchors will be neglected by threshold,
                                                                                                 and the second stage will make final prediction based on remaining
• Computing methodologies → Object detection; • Security and                                     anchors. YOLOv4 is an 1-stage detector, and both of this 2 white-
privacy;                                                                                         boxes will predict on two aspects: the object’s bounding box size
                                                                                                 and location, which is a regressor, and the object’s category, which
KEYWORDS                                                                                         is a classifier.
object detection, adversarial attack, projected gradient descent
                                                                                                 2.1.2 Data, Constrains and Evaluation Metric: The competi-
                                                                                                 tion provides 1000 COCO test samples without annotation, where
1     INTRODUCTION
                                                                                                 object detectors will predict the 81 (80 types of foreground object,
Deep Neural Networks based computer vision algorithms have                                       and 1 for background) categories’ confidence and the bounding-box’
shown their remarkable performance but also faced with vulner-                                   size and location.
ability and security problems. Adversarial attack on Image Clas-                                 First limitation is the maximum limit of changed pixels rate. Second
sification has made significant progress [4], also spurred the ad-                               is the maximum limit of patches’ number. The goal of the adver-
vancement of the robustness of the classification models [12], and                               saries is to make all bounding boxes failed to be returned, by adding
some attack research on object detection has also been made [16]. To                             the patches to images. As for evaluation score, on one hand the less
make it deeper, CIKM-2020 and Aliyun-Tianchi host the AnalytiCup                                 bounding boxes given by the adversarial example, the higher the
workshop competition [3] about generating adversarial sample from                                score, on the other hand the less changed pixel rate, the higher the
COCO [8] to attack 4 mainstream object detection models. In this                                 score. More specific definition can be found in [3].
paper, we propose our solution based on Projected Gradient Descent
[10] with l0 -norm towards this competition. To meet the rule of the                             2.2     Related Work
competition that the modified areas should be as smaller as better
and must be constructed as connected domain, we transfer such rule                               A number of attacks for object detectors have been developed re-
into solvable problem, and make our efforts by k-means and Prim                                  cently [16]. [15] extends the attack method from classification to
for higher score. Eventually, we get a fine result and rank 10th out                             detection and demonstrates that it is possible to attack objectors us-
of 1701 teams. Code has been made publicly available at our github                               ing a designed classification loss. [9] generate adversarial examples
code repo.                                                                                       that fool detectors for stop sign and face detections. [7] proposes to
                                                                                                 attack the RPN with a specially designed hybrid loss incorporating
2     BACKGROUND                                                                                 both classification and localization terms. Apart from the full images,
                                                                                                 [6] attack detectors by restricting the attacks to be within a local
In this section, we first make description and understanding of the                              region.
competition, then provide some background knowledge and review
the related works about adversarial attack on Object Detection.
                                                                                                 2.3     Projected Gradient Descent(PGD)
* Corresponding author                                                                           The PGD [10] attack aims at maximizing the loss: max{Jytarget , y pred }
                                                                                                 from the viewpoint of robustness optimization. In each iteration, the
 Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons            PGD first modifies x by ∇x L, then it will take projection to norm
License Attribution 4.0 International (CC BY 4.0).                                               ball.
In: Dimitar Dimitrov, Xiaofei Zhu (eds.): Proceedings of the CIKM AnalytiCup 2020,
22 October, 2020, Gawlay (Virtual Event), Ireland, 2020, published at http://ceur-ws.org.           For some attack problems where the ground-truth label is not
                                                                                                 given, we can propose a loss function towards bad prediction by
                                                                                            17
                                                                                 trade-off between the attack ability and the transferability[4]. While
                                                                                 there are K models, we should fuse the output of these models, e.g.
                                                                                 for binary classification logits output, we can use weighted averaging
                                                                                 Lx = wk lk x.
                                                                                 The momentum iterative-FGSM[4] also trade-off between the white-
                                                                                 box attacks and the transferability. Intuitively, the adversarial exam-
                                                                                 ple can easily drop into poor local optima in searching landscape and
                                                                                 ‘overfit’ the specific model. So, using the momentum of gradients
                                                                                 can smooth the above problem and makes it more transferable.

                                                                                 2.6    Minimum Spanning Tree
Figure 1: PGD iteration: X is the original input, also initial state             An edge-weighted graph is a graph where we associate weights
of adversarial sample X0atk , in each gradient descent iter step                 or costs with each edge. A minimum spanning tree (MST) of an
                    atk will be modified into X tmp . Then, if X tmp lies
i = 1, 2, ..., n, Xi−1                                                           edge-weighted graph is a spanning tree whose weight (the sum of
                                               i                i
outside of the norm box, it will be projected onto the norm box’                 the weights of its edges) is no larger than the weight of any other
                                tmp
edge as Xiatk , else Xiatk = Xi .                                                spanning tree.[13]
                                                                                 There are two mainly algorithms to solve the prblem: Prim’s algo-
                                                                                 rithm, and Kruskal’s algorithm.[13]
setting synthetic bad label, e.g. in classification problem we may set
all labels into a specific category, like 0 in MNIST. Then the original
PGD can be used by min{Jysynthetic , y pred }.

2.4    Sparse l0 -attack
In an l0 -attack one is interested in finding the smallest number of
pixels which need to be changed so that the decision changes. We
show an adversarial image with l0-attack in Fig.1. From a practical
point of view the l0-attack tests basically how vulnerable the model
is to failure of pixels or large localized changes on an object e.g. a
sticker on a refrigerator or dirt/dust on a windshield [2], or stained
points on cell microscopy image.


                                                                                            Figure 3: A MST sample, V: vertex, E: edge


                                                                                 3     METHODOLOGY
                                                                                 In this section, we elaborate the details of the proposed method. We
                                                                                 first illustrate how to employ the PGD to attack the detectors with
                                                                                 l0 -norm. Then we describe our approach to meet the constrains, and
                                                                                 tell how to trade off the dilemma that fewer modified pixels results
                                                                                 in higher score coefficient but lower attack performance.

                                                                                 3.1    Loss Function
                                                                                 Considering the fact that only if the classifier output as foreground
Figure 2: Left: original image with box for zoom. Right: l0 -
                                                                                 and the class confidence is bigger than NMS threshold, the final
attack, only 0.04% are changed, but the final classification result
                                                                                 predict result will be given, our method can be quite straight forward:
is quite different[2].
                                                                                 only attack the classifier of detectors. Even though the competition
                                                                                 does not provide any annotation, thanks for the NMS threshold we
                                                                                 can make attack. The attack loss function is defined as follow:
2.5    Better Black-box Attack, or More                                                                       1 N
       Transferable                                                                                     Jx =       smlx · 1smlx>τ ,                  (1)
                                                                                                             N i=1
There are some techniques proposed for better transferable attack                Where N is the number of predicted objects, and sml is the softmax
on black-box. And in this paper, we only discuss 2 gradient-based                output, and τ is a threshold more strict than NMS threshold. 1 is
methods.                                                                         a indicator function that return 1 if smlx > τ else 0. The target of
Ensemble of Models has been used to improve traditional prediction.              this function is trying to make all foreground prediction can not pass
When attacking ensemble of models, the adversarial sample can                    NMS.
                                                                            18
3.2    Attacking Procedure
Our attacking framework is summarized in Algorithm 1. Input sam-
ple x to the given detectors, then the softmax probability of a specific
category can be obtained. Since background will be neglected by
NMS, we only store the foreground’s probability. Above input to
store process can be denoted as smlk x. By calculate the loss function,
we can make back propagation the gradient on x is got.(We just give
a simple description here about how to iterate the momentum, more
details like its cold start problem and various variants can be seen
in [5].) At last, if the attack result is outside of the box, we should
projected it onto the norm bound like Fig.1.

Algorithm 1 PGD l0 -attack on ensemble of models
Input: A sample x, the foreground’s softmax function of K models:               Figure 4: Group pixels to 10 areas with k-means clustering, the
sml1 , sml2 , ..., smlK and weights: w1 , w2 , ..., wK .                        dots are the pixels, the stars are centers of each group.
Hyper-params: Training epochs T , gradient momentum update
factor µ, learning rate lr.
Output: An adversarial example x∗ .
   x0∗ = x, momentum g0 = 0;
   for t = 0 to T − 1 do
       Input xt∗ , get sml0 xt∗ , sml1 xt∗ , ..., smlK xt∗ ;
       Fuse the K logits as lxt∗ = K    k=1 wk smlk ;
       Get loss Jxt∗ based on lxt∗ and Eq.(1);
       Obtain current gradient ∇x Jxt∗
       gt1 = gt ∗ 1 − µ ∇x Jxt∗ ∗ µ
        ∗ = x∗ − lr ∗ g
       xt1     t         t1
       update xt1∗ by projection onto l -norm box
                                                0
   end for
   return x∗ = xT∗


3.3    Construct Connected Domains
We solve the connected domains rule of this competition by K-means              Figure 5: The training process for a sample. Due to the fixed
and then convert sub-problem to a MST. We first set a specific pixels’          form of mask after epoch 30, the pre-train with more top K pix-
number as β for l0 -attack, and train the adversarial sample with               els is useful.
Algorithm.1 for certain epochs. Then, we group these pixels into K
areas by K-means, which can be visualized in Fig.4. The distance
metric of two pixels is given by Chessboard Distance, or known                  value are 0 − 255 while the gradients are quite small. The momen-
as Chebyshev Distance[14]. After grouping, we connect the pixels                tum update factor µ = 0.5 is soft, and weights relatively high on the
within their group by method in 2.1.4. During connecting process,               gradients of current iteration compared with the usage other task,
we use these pixels’ value before l0 -norm projection, and in fact this         like traditional image classification. We set the first 30 epochs to
is also a projection. If the value is same to original input, we make           train sample with 8000 pixels l0 limitation, then use 200 epochs to
slight modification on that value to keep connectivity.                         train sample with less pixels with connecting operation, finally we
                                                                                train 30 epochs to make sure the the sample quantized into uint8 type
4     EXPERIMENTS                                                               which helps maintain most adversarial ability. The first 30 ‘pre-train’
In this section, we show our experiments based on the methodol-                 epochs’ improvement can be visualized in Fig.5.
ogy in section 3. In our experiments, we only attack the 2 provided             We show our result in Table.1 by adding the methods during the
white-box models due to the computational resources limitation.                 competition, the overall score list are recorded at the our subscription
YOLOv4 make binary classification on whether the object is fore-                result on the competition entrance. And we visualized a perturbed
ground, and its NMS threshold (0.4) is different from Faster R-                 image and its patch in Fig.6.
CNN(0.3). We set the threshold τ in Eq.1 as 0.15 for YOLO and
0.25 for Faster R-CNN. The latter output 81 class while the former              5   DISCUSSION AND CONCLUSION
is binary, so the corresponding τ is relatively bigger. The value also          In this paper, we show our PGD l0 -attack adversarial solution to-
take the output’s unbalanced distribution into consideration.                   wards the competition’s target. We find that there is no need to make
We set our learning rate as 60000, because the attack on image’s                hand-crafted patches into the image, because by PGD searching
                                                                           19
                    Score                                                                  [5] Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Opti-
                            A     YOLO Overall Black
       Module                                                                                  mization. International Conference on Learning Representations (12 2014).
                                                                                           [6] Yuezun Li, Xian Bian, and Siwei Lyu. 2018. Attacking Object Detectors via
       handcraft patch, A    -       -       61        -                                       Imperceptible Patches on Background. ArXiv abs/1809.05966 (2018). http:
       k-means             250      31      304       23                                       //arxiv.org/abs/1809.05966
       YOLO only            49      683     783       51                                   [7] Yuezun Li, Daniel Tian, Ming-Ching Chang, Xiao Bian, and Siwei Lyu. 2018.
                                                                                               Robust Adversarial Perturbation on Deep Proposal-based Models. CoRR
       A + YOLO            231      651    1036      154                                       abs/1809.05962 (2018). http://arxiv.org/abs/1809.05962
       momentum            281      997    1635      357                                   [8] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva
                                                                                               Ramanan, Piotr Dollar, and Larry Zitnick. 2014. Microsoft COCO: Common
       Prims, 1-stage A    820     1255    2547      472                                       Objects in Context. In ECCV (eccv ed.). European Conference on Computer
       quantized training 955      1335    2836      546                                       Vision. https://www.microsoft.com/en-us/research/publication/microsoft-coco-
                                                                                               common-objects-in-context/
Table 1: The ‘A’ means Faster R-CNN. Firstly we use hand-                                  [9] Jiajun Lu, Hussein Sibai, and Evan Fabry. 2017. Adversarial Examples that Fool
crafted patches and only attack A, the pixels side patches will                                Detectors. CoRR abs/1712.02494 (2017). http://arxiv.org/abs/1712.02494
be modified. Then we use K-means to get 10 proposal areas, and                            [10] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
                                                                                               Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial
the score gets better. We also attack YOLO only, and find that                                 Attacks. (06 2017).
YOLO is much easier to be destroyed. By training on ensemble                              [11] Shaoqing Ren, Kaiming He, Ross. Girshick, and Jian. Sun. 2017. Faster R-CNN:
                                                                                               Towards Real-Time Object Detection with Region Proposal Networks. IEEE
of the 2 models, we get higher black box score, even though the                                Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–
white-box score shows a little decrease. By adding momentum,                                   1149. https://doi.org/10.1109/TPAMI.2016.2577031
the increase is promising. To lower the number of pixels, we op-                          [12] Andrew Ross and Finale Doshi velez. 2017. Improving the Adversarial Robust-
                                                                                               ness and Interpretability of Deep Neural Networks by Regularizing their Input
timized the connecting method by Prims together with hybrid                                    Gradients. (11 2017).
stage-1 of A into loss function, and we get a notable improve-                            [13] Robert Sedgewick and Kevin Wayne. 2000. Algorithms, 4th Edition. Retrieved
ment. Finally, we quantize the adversarial sample after every 3                                January 12, 2021 from https://algs4.cs.princeton.edu/home/
                                                                                          [14] F. Van der Heijden, Robert Duin, D. Ridder, and David Tax. 2004. Classification,
epochs, which lowers the train/test error.                                                     Parameter Estimation and State Estimation: An Engineering Approach Using
                                                                                               MATLAB. (01 2004). https://doi.org/10.1002/0470090154
                                                                                          [15] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan
                                                                                               Yuille. 2017. Adversarial Examples for Semantic Segmentation and Object De-
                                                                                               tection. In 2017 IEEE International Conference on Computer Vision (ICCV).
                                                                                               1378–1387. https://doi.org/10.1109/ICCV.2017.153
                                                                                          [16] H. Zhang and J. Wang. 2019. Towards Adversarially Robust Object Detection. In
                                                                                               2019 IEEE/CVF International Conference on Computer Vision (ICCV). 421–430.
                                                                                               https://doi.org/10.1109/ICCV.2019.00051


          (a) perturbed image                      (b) modified areas


Figure 6: (a) is a perturbed image result; (b) is the difference
of original image and perturbed image. Most of the perturbed
pixels lie near to the 3 zebra objects.


this can be done automatically. We also use K-means and Prims to
handle the game’s constrain. For l0 attack, the top-K selection is
intuitive because the amplitude of gradients on the input interpret
how important they are for the prediction target. For the reason that
the connecting process is too slow, we just connect once and keep
patches fixed, which may hinder the searching result and would be
optimized in future work.

REFERENCES
 [1] Alexey Bochkovskiy, Chien Yao Wang, and Hong Yuan Mark Liao. 2020.
     YOLOv4: Optimal Speed and Accuracy of Object Detection. (2020). https:
     //arxiv.org/abs/2004.10934
 [2] Francesco Croce and Matthias Hein. 2019. Sparse and Imperceivable Adversarial
     Attacks. In Proceedings of the IEEE/CVF International Conference on Computer
     Vision (ICCV).
 [3] Dimitar Dimitrov and Xiaofei Zhu. 2020. CIKM2020 Analyticup: Alibaba-
     Tsinghua Adversarial Challenge on Object Detection. Retrieved January 12,
     2021 from https://www.cikm2020.org/adversarial-challenge-on-object-detection/
 [4] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and
     Jianguo Li. 2018. Boosting Adversarial Attacks With Momentum. CVPR (2018),
     9185–9193.
                                                                                     20