<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Warm Restart Adversarial Attack for Object Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ye Liu</string-name>
          <email>liuye_ly94@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaofei Zhu</string-name>
          <email>zxf@cqut.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xianying Huang∗</string-name>
          <email>hxy@cqut.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computer Science and Engineering, Chongqing University of Technology</institution>
          ,
          <addr-line>Chongqing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>21</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>This article introduces the solution of the champion team "green hand" for the CIKM2020 AnalytiCup: Alibaba-Tsinghua Adversarial Challenge on Object Detection. In this work, we propose a new adversarial attack method called Efficient Warm Restart Adversarial Attack for Object Detection. It consists of three modules: 1) Efficient Warm Restart Adversarial Attack, which is designed to select proper top-k pixels; 2) Connecting Top-k Pixels with Lines, which specifies the strategy on how to connect two top-k pixels to reduce the patch number and minimize the number of changed pixels; 3) Adaptive Black Box Optimization, which is used to achieve a better performance of the black box adversarial attack by adjusting only the white box models. The final results show that our model, which only uses two white box models (i.e., YOLOv4 and Faster-RCNN), achieves an evaluation score of 3761 in this competition, which ranks first among all 1,701 teams. Our code will be available at https://github.com/liuye6666/EWR-PGD.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Object detection;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Deep neural networks have achieved great success in object
detection [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6–8</xref>
        ]. However, recent studies have shown that deep
neural networks are vulnerable to attacks from adversarial examples
[
        <xref ref-type="bibr" rid="ref1 ref10 ref5">1, 5, 10</xref>
        ]. In order to identify the fragility of object detection
models and better evaluate their adversarial robustness,
Alibaba and Tsinghua organized the CIKM2020 AnalytiCup Challenge,
i.e., the Alibaba-Tsinghua Adversarial Challenge on Object Detection.
The competition uses the MSCOCO dataset (https://cocodataset.org/),
and expects participants to make the models unable to detect objects
while adding as few adversarial patches as possible.
      </p>
      <p>To make the challenge more competitive, the challenge organizers
add two constraints: (1) adversarial perturbations can only be added
to limited regions (patches) rather than to the whole image; (2) the
number of adversarial patches is restricted.
∗Corresponding author</p>
      <p>
        Existing adversarial attack methods, such as FGSM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], PGD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
MultiTargeted-PGD [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and ODI-PGD [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], add adversarial
perturbations to the whole image. The shortcomings of these approaches
are: (1) due to Constraint 1, adding adversarial perturbations
to the whole image is not allowed; (2) all these adversarial attack
methods are mainly designed for the image classification scenario.
As there is a considerable difference between object detection and
image classification, directly applying the above methods in the
object detection scenario would lead to sub-optimal results; (3) these
methods do not control the number of adversarial patches, and thus
cannot satisfy Constraint 2.
      </p>
      <p>To address the above-mentioned problems of existing approaches,
in this work we propose a novel approach, named Efficient Warm
Restart Adversarial Attack for Object Detection. It consists of three
modules: (1) Efficient Warm Restart Adversarial Attack (EWR),
which performs multiple warm restarts during the process of
generating adversarial examples and selects the most important top-k
pixels based on the gradient value at each warm restart; (2)
Connecting Top-k Pixels with Lines (CTL), which connects these important
pixels together with lines to ensure fewer pixels are modified and
the patch number satisfies Constraint 2; (3) Adaptive Black Box
Optimization (ABBO), which adjusts the white box models to
implicitly improve the performance of the black box adversarial
attack.</p>
      <p>The main contributions of this work are summarized as follows:
1) We propose a novel approach which can effectively handle
the limitations of existing adversarial attack methods and
satisfy the two constraints given by the challenge.
2) Our method achieves the best performance among all 1,701
teams while utilizing only two white box models, i.e., YOLOv4
and Faster-RCNN.</p>
    </sec>
    <sec id="sec-3">
      <title>OUR APPROACH</title>
      <p>In order to solve the problem given in this competition, we propose
a novel method which contains three modules: (1) Efficient Warm
Restart Adversarial Attack; (2) Connecting Top-k Pixels with Lines;
and (3) Adaptive Black Box Optimization.</p>
      <p>In an image, there are multiple objects which can be detected. Based
on our preliminary analysis, we find that the losses of different
objects usually do not change in parallel. In particular, in the
beginning, some objects change their corresponding loss considerably,
while the loss change of the remaining objects is small. After that,
the objects with less loss change in the beginning change their
corresponding loss greatly. If we select the top-k pixels only in the
beginning stage, the selected top-k pixels will be biased towards the
objects with a high early loss change. This inevitably results in
selecting improper important pixels.</p>
      <p>
        Inspired by the work of PGD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and I-FGSM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we design a novel
module named Efficient Warm Restart Adversarial Attack. In the
first few restarts, modifying the selected top-k pixels increases
the loss of some objects. As the number of restarts increases, more
important pixels are selected, so that the loss of the remaining
objects also increases significantly. This method can effectively solve
the problem of selecting improper important pixels mentioned
above.
      </p>
      <p>Therefore, for a given original image, we use multiple warm
restarts. Each warm restart starts from the result of the last
warm restart: we feed the previous restart's adversarial example
into the YOLOv4 and Faster-RCNN models, compute the loss, and
obtain the gradient of the input image through back propagation.
At last, we select the pixel points according to the new gradient
values, and modify these pixels in the direction of raising the loss.
We stop when the number of restarts reaches a specified threshold
(e.g., 10), or when the evaluation score of subsequent restarts no
longer increases. Finally, we obtain the best adversarial example
with the highest score.</p>
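      <p>The warm-restart loop described above can be sketched as follows. This is a minimal illustration on a toy differentiable loss, not our released implementation: the real attack backpropagates through YOLOv4 and Faster-RCNN, and all function and parameter names here (toy_loss, ewr_attack, step, iters) are illustrative assumptions.</p>

```python
import numpy as np

def toy_loss(x, target):
    # Stand-in for the detection confidence loss; the attack wants to raise it.
    return float(np.sum((x - target) ** 2))

def grad(x, target):
    # Analytic gradient of the toy loss (a real attack would use
    # backpropagation through the detectors instead).
    return 2.0 * (x - target)

def ewr_attack(x0, target, k=2, step=0.25, restarts=4, iters=5):
    """Sketch of the warm-restart idea: each restart continues from the
    previous adversarial example and re-selects the top-k pixels by gradient
    magnitude, so later restarts can pick up objects whose loss changed
    little at the beginning."""
    x = x0.copy()
    best_x, best_loss = x.copy(), toy_loss(x, target)
    for _ in range(restarts):
        g = grad(x, target)
        topk = np.argsort(np.abs(g).ravel())[-k:]   # most important pixels now
        mask = np.zeros(x.size)
        mask[topk] = 1.0
        mask = mask.reshape(x.shape)
        for _ in range(iters):
            # modify only the selected pixels, in the direction of raising the loss
            x = x + step * np.sign(grad(x, target)) * mask
        cur = toy_loss(x, target)
        if cur > best_loss:
            best_x, best_loss = x.copy(), cur
    return best_x, best_loss
```

Keeping the best example across restarts mirrors the stopping rule in the text: once further restarts stop improving the score, the highest-scoring example is returned.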
    </sec>
    <sec id="sec-4">
      <title>Connecting Top-k pixels with Lines (CTL)</title>
      <p>In order to satisfy Constraint 2, we need to connect the important
top-k pixels together to reduce the patch number. In this work, we
propose a simple yet effective method, called Connecting Top-k
Pixels with Lines, to keep the number of changed pixels as small as
possible.</p>
      <p>Specifically, we iteratively connect pairs of top-k pixels to reduce
the patch number and minimize the number of changed pixels. First,
we randomly select a pixel from all top-k pixels and connect it to
its nearest pixel among the remaining top-k pixels with a line. Since
a straight line involves the minimum number of extra pixels, this
step minimizes the number of changed pixels. Then we remove the
selected pixel and run the above process again on the remaining
set of pixels. We conduct these two steps iteratively until all
important pixels are in the same connected set.</p>
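      <p>The two connection steps above can be sketched as follows, assuming the top-k pixels are given as (row, column) coordinates on the image grid; the helper names (line_pixels, connect_topk) are illustrative, not from our released code.</p>

```python
import numpy as np

def line_pixels(p, q):
    # Rasterise the straight segment from p to q; a line touches the fewest
    # extra pixels needed to join two points on the grid.
    n = max(abs(q[0] - p[0]), abs(q[1] - p[1])) + 1
    rows = np.rint(np.linspace(p[0], q[0], n)).astype(int)
    cols = np.rint(np.linspace(p[1], q[1], n)).astype(int)
    return set(zip(rows, cols))

def connect_topk(points):
    """Sketch of CTL: take a pixel, join it to its nearest remaining top-k
    pixel with a line, then repeat on the remaining set until all important
    pixels lie in one connected set."""
    remaining = list(points)
    changed = set(remaining)
    while len(remaining) > 1:
        p = remaining.pop(0)                       # select one pixel ...
        q = min(remaining,                         # ... and its nearest neighbour
                key=lambda r: (r[0] - p[0]) ** 2 + (r[1] - p[1]) ** 2)
        changed.update(line_pixels(p, q))
    return changed
```

The returned set is the full collection of changed pixels: the original top-k pixels plus the line pixels that glue them into a single patch.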
    </sec>
    <sec id="sec-5">
      <title>Adaptive Black Box Optimization (ABBO)</title>
      <p>For an adversarial attack, black box models are much harder to
attack than white box models. Since in our work we only make use
of two white box models for the adversarial attack, we improve
our model to achieve a better performance on the black box
adversarial attack. In particular, we adaptively adjust the strategy
of connecting top-k pixels as well as the parameter k of top-k. An
image that needs only a small number of changed pixels for the
white box models will be difficult for the black box attack. Thus,
we first select a small k for the top-k pixels, and then restrict the
number of changed pixels between two top-k pixels. Inversely, when
an image has a large number of changed pixels for the white box
models, we select a bigger k for the top-k pixels. In particular, we
proceed as follows:
1) When the white box score is &gt; 3.3, which indicates a small
number of changed pixels, we set k=10 for top-k, and do not
connect two top-k pixels if the number of changed pixels
between them is more than 100.
2) When the white box score is between 3 and 3.3, which indicates
a medium number of changed pixels, we set k=20 for top-k,
and do not connect two top-k pixels if the number of changed
pixels between them is more than 150.
3) When the white box score is &lt; 3, which indicates a larger
number of changed pixels, we set k=35 for top-k, and do not
connect two top-k pixels if the number of changed pixels
between them is more than 500.</p>
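      <p>The adaptive rule above amounts to a small lookup keyed on the white box score, returning k and the maximum allowed line length between two top-k pixels; the function name is illustrative.</p>

```python
def abbo_params(white_box_score):
    """Sketch of ABBO: choose k and the maximum number of changed pixels
    allowed between two connected top-k pixels, following the three
    score thresholds given in the text."""
    if white_box_score > 3.3:      # few changed pixels: small k, short lines
        return 10, 100
    if white_box_score >= 3.0:     # medium number of changed pixels
        return 20, 150
    return 35, 500                 # many changed pixels: bigger k, longer lines
```

CTL would then skip any pair of top-k pixels whose connecting line exceeds the returned length limit.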
    </sec>
    <sec id="sec-6">
      <title>Loss Function</title>
      <p>In the EWR module, the loss function directly affects the positions
of the selected important top-k pixels. Since the goal of this
competition is to make the model unable to identify the bounding
boxes, we only need to consider the loss related to the confidence
of the bounding boxes. In order to make the confidence of all
bounding boxes smaller than a given threshold, we set different
weights for different confidence intervals. Specifically, for bounding
boxes with higher confidence, we set a weight with larger magnitude
in order to make the confidence drop faster.</p>
      <p>In the YOLOv4 model, we set 4 confidence intervals and assign a
different weight to each interval as follows:
loss_yolov4 = −0.01 × conf   if conf ≤ 0.2
loss_yolov4 = −0.1 × conf    if 0.2 &lt; conf ≤ 0.3
loss_yolov4 = −1 × conf      if 0.3 &lt; conf ≤ 0.4
loss_yolov4 = −10 × conf     if 0.4 &lt; conf ≤ 0.5
where conf represents the confidence of the detection bounding
boxes.</p>
      <p>In the Faster-RCNN model, since the confidence threshold of the
boxes is 0.3, which is smaller than that in YOLOv4 (in YOLOv4, the
confidence threshold of the detection bounding boxes is 0.5), we
simply modify the loss function of Faster-RCNN as follows:
loss_faster-rcnn = −0.01 × conf   if conf ≤ 0.1
loss_faster-rcnn = −0.1 × conf    if 0.1 &lt; conf ≤ 0.15
loss_faster-rcnn = −1 × conf      if 0.15 &lt; conf ≤ 0.2
loss_faster-rcnn = −10 × conf     if 0.2 &lt; conf ≤ 0.3</p>
      <p>Finally, for the overall loss function, we combine the loss functions
of YOLOv4 and Faster-RCNN by simply adding them:
loss = loss_yolov4 + loss_faster-rcnn    (1)</p>
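      <p>As a minimal sketch, the piecewise-weighted confidence loss and the combined loss of Eq. (1) can be written as follows. The names are illustrative, and confidences above a model's detection threshold are not covered by the intervals in the text, so they contribute zero here by assumption.</p>

```python
def weighted_conf_loss(confs, intervals):
    """Piecewise-weighted confidence loss: boxes with higher confidence get
    a weight with larger magnitude so that their confidence drops faster."""
    total = 0.0
    for c in confs:
        for upper, w in intervals:
            if upper >= c:           # c falls into this confidence interval
                total += w * c
                break
    return total

# Interval upper bounds and weights, as given in the text.
YOLO_INTERVALS  = [(0.2, -0.01), (0.3, -0.1), (0.4, -1.0), (0.5, -10.0)]
FRCNN_INTERVALS = [(0.1, -0.01), (0.15, -0.1), (0.2, -1.0), (0.3, -10.0)]

def total_loss(yolo_confs, frcnn_confs):
    # Overall loss (Eq. 1): the sum of the two models' weighted losses.
    return (weighted_conf_loss(yolo_confs, YOLO_INTERVALS)
            + weighted_conf_loss(frcnn_confs, FRCNN_INTERVALS))
```

For example, a YOLOv4 box with confidence 0.45 falls into the highest interval and receives the weight −10, so it dominates the gradient and is pushed down fastest.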
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENTS</title>
      <p>Dataset: this competition selected about 1,000 images from the test
split of the MSCOCO 2017 dataset. Each image has been resized to
500 × 500.</p>
      <p>Model: we use only the two white box models, i.e., YOLOv4 and
Faster-RCNN.</p>
      <p>Evaluation Metrics: the goal of the adversarial attack is to make
all bounding boxes invisible by adding adversarial patches to the
images. Thus the following metric is adopted for evaluation:
Score(x, x∗, M_j) = (1 − min(B(x; M_j), B(x∗; M_j)) / B(x; M_j)) × (2 − Σ_i A_i / 5000)   (2)
where A_i is the i-th patch's area, x is the original image, x∗ is the
submitted adversarial image, and M_j is the j-th model (j ∈ {1, 2, 3, 4}).
B(x; M_j) returns the number of bounding boxes of image x given
by model M_j (a smaller number of bounding boxes on the adversarial
example indicates a higher score). At last, the final score is the sum
of the scores of all images over the 4 models:
Score_final = Σ_{j=1}^{4} Σ_x Score(x, x∗, M_j)   (3)</p>
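      <p>A minimal sketch of the per-image score and the final score, under the assumption that the bounding-box term is 1 − min(B(x; M_j), B(x∗; M_j))/B(x; M_j); the function and argument names are illustrative, not from the official evaluation code.</p>

```python
def image_score(n_boxes_orig, n_boxes_adv, patch_areas):
    """Per-image, per-model competition score: fewer surviving bounding
    boxes and a smaller total patch area give a higher score; 5000 pixels
    is the changed-pixel budget in the denominator."""
    bbox_term = 1.0 - min(n_boxes_orig, n_boxes_adv) / n_boxes_orig
    area_term = 2.0 - sum(patch_areas) / 5000.0
    return bbox_term * area_term

def final_score(per_image_boxes, patch_areas_per_image):
    # Sum over all images and the 4 models. per_image_boxes maps each image
    # to a list of (boxes_orig, boxes_adv) pairs, one pair per model.
    total = 0.0
    for img, model_boxes in per_image_boxes.items():
        for n_orig, n_adv in model_boxes:
            total += image_score(n_orig, n_adv, patch_areas_per_image[img])
    return total
```

For instance, suppressing all 10 boxes of an image with a 1000-pixel patch yields (1 − 0/10) × (2 − 1000/5000) = 1.8 for that image and model.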
    </sec>
    <sec id="sec-8">
      <title>Results</title>
      <p>Table 1 shows the evaluation scores of different combinations of
modules. The combination of EWR and CTL achieves evaluation scores
of 2500+ and 2600+ when attacking YOLOv4 and Faster-RCNN,
respectively. When attacking both YOLOv4 and Faster-RCNN, the
combination of EWR and CTL achieves an evaluation score of 3560+.
When we further combine all three modules (i.e., EWR, CTL and
ABBO), we obtain the highest evaluation score (i.e., 3761+), which
ranks first among all 1,701 teams in the CIKM2020 AnalytiCup:
Alibaba-Tsinghua Adversarial Challenge on Object Detection.</p>
      <p>[Table 1: evaluation scores of different combinations of the EWR,
CTL and ABBO modules.]</p>
    </sec>
    <sec id="sec-9">
      <title>Case Study</title>
      <p>Our case study compares four images: (a) the original image, (b)
the detection results of the Faster-RCNN model, (c) the detection
results of the YOLOv4 model, and (d) the adversarial example. We
can observe that our method has the following advantages:
• It changes only a small number of pixels.
• Most of the top-k pixels are at the key positions of the attacked
objects.</p>
    </sec>
    <sec id="sec-10">
      <title>CONCLUSION</title>
      <p>In this paper, we proposed an Efficient Warm Restart Adversarial
Attack method for object detection, which can modify fewer pixels
while maintaining a very high success rate of adversarial attack.
Our solution achieves the best performance among all 1,701 teams
in the CIKM2020 AnalytiCup: Alibaba-Tsinghua Adversarial
Challenge on Object Detection.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Yinpeng</given-names>
            <surname>Dong</surname>
          </string-name>
          , Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and
          <string-name>
            <given-names>Jianguo</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Boosting Adversarial Attacks with Momentum</article-title>
          .
          <source>In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>9185</fpage>
          -
          <lpage>9193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Sven</given-names>
            <surname>Gowal</surname>
          </string-name>
          , Jonathan Uesato, Chongli Qin,
          <string-name>
            <given-names>Po-Sen</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Timothy A.</given-names>
            <surname>Mann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pushmeet</given-names>
            <surname>Kohli</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>An Alternative Surrogate Loss for PGD-based Adversarial Testing</article-title>
          . arXiv preprint arXiv:1910.09338 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Alexey</given-names>
            <surname>Kurakin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ian J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Samy</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Adversarial examples in the physical world</article-title>
          .
          <source>In ICLR (Workshop).</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Aleksander</given-names>
            <surname>Madry</surname>
          </string-name>
          , Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Vladu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Towards Deep Learning Models Resistant to Adversarial Attacks</article-title>
          .
          <source>In ICLR 2018 : International Conference on Learning Representations</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Preetum</given-names>
            <surname>Nakkiran</surname>
          </string-name>
          , Gal Kaplun, Yamini Bansal, Tristan Yang,
          <string-name>
            <given-names>Boaz</given-names>
            <surname>Barak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Deep Double Descent: Where Bigger Models and More Data Hurt</article-title>
          . In ICLR 2020 : Eighth International Conference on Learning Representations.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Shaoqing</given-names>
            <surname>Ren</surname>
          </string-name>
          , Kaiming He, Ross Girshick, Xiangyu Zhang, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Object Detection Networks on Convolutional Feature Maps</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>39</volume>
          ,
          <issue>7</issue>
          (
          <year>2017</year>
          ),
          <fpage>1476</fpage>
          -
          <lpage>1481</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Evan</given-names>
            <surname>Shelhamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Long</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Fully Convolutional Networks for Semantic Segmentation</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>39</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ),
          <fpage>640</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Hirotaka</given-names>
            <surname>Suzuki</surname>
          </string-name>
          and
          <string-name>
            <given-names>Masato</given-names>
            <surname>Ito</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Information processing device, information processing method, and program</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yusuke</given-names>
            <surname>Tashiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Ermon</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Diversity can be Transferred: Output Diversification for White- and Black-box Attacks</article-title>
          .
          <source>arXiv: Learning</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Cihang</given-names>
            <surname>Xie</surname>
          </string-name>
          , Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and
          <string-name>
            <given-names>Alan L.</given-names>
            <surname>Yuille</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Improving Transferability of Adversarial Examples With Input Diversity</article-title>
          .
          <source>In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          .
          <fpage>2730</fpage>
          -
          <lpage>2739</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>