=Paper=
{{Paper
|id=Vol-3207/paper11
|storemode=property
|title=Task-Guided Denoising Network for Adversarial Defense of Remote Sensing Scene Classification
|pdfUrl=https://ceur-ws.org/Vol-3207/paper11.pdf
|volume=Vol-3207
|authors=Yonghao Xu,Weikang Yu,Pedram Ghamisi
|dblpUrl=https://dblp.org/rec/conf/cdceo/XuYG22
}}
==Task-Guided Denoising Network for Adversarial Defense of Remote Sensing Scene Classification==
Yonghao Xu¹, Weikang Yu² and Pedram Ghamisi¹,³

¹ Institute of Advanced Research in Artificial Intelligence (IARAI), 1030 Vienna, Austria
² The Chinese University of Hong Kong, Shenzhen, School of Science and Engineering, 518172 Shenzhen, China
³ Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Machine Learning Group, 09599 Freiberg, Germany

CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria
Contact: yonghao.xu@iarai.ac.at (Y. Xu); weikangyu@link.cuhk.edu.cn (W. Yu); pedram.ghamisi@iarai.ac.at (P. Ghamisi)
ORCID: 0000-0002-6857-0152 (Y. Xu); 0000-0003-1111-572X (W. Yu); 0000-0003-1203-741X (P. Ghamisi)

Abstract

Deep learning models have achieved state-of-the-art performance in the interpretation of geoscience and remote sensing data. However, their vulnerability to adversarial attacks should not be neglected. To address this challenge, we propose a task-guided denoising network to conduct adversarial defense for the remote sensing scene classification task in this study. Specifically, given an adversarial remote sensing image, we use a denoising network to transform it as close to its corresponding clean image as possible under the constraint of the appearance loss. Besides, to further correct the predicted logits, the perceptual loss and the classification loss are adopted with the aid of a pre-trained classification network with fixed weights. Despite its simplicity, extensive experiments on the UAE-RS (universal adversarial examples in remote sensing) dataset demonstrate that the proposed method can significantly improve the resistibility of different deep learning models against adversarial examples.

Keywords: Adversarial defense, adversarial attack, adversarial example, remote sensing, scene classification, deep learning

1. Introduction

Recent advances in deep learning algorithms have significantly boosted the interpretation of geoscience and remote sensing data [1, 2]. Nevertheless, the vulnerability of deep learning models to adversarial examples should not be neglected. Szegedy et al. first discovered that deep neural networks are very fragile to specific perturbations generated by adversarial attack methods [3]. By simply adding these mild perturbations to clean images, adversarial examples are generated that may be imperceptibly different from the original images for human observers but can mislead deep neural networks into making wrong predictions with high confidence. In fact, this phenomenon is not limited to computer vision tasks. Researchers have found that adversarial examples also exist in the geoscience and remote sensing field and can be generated from optical data [4], LiDAR point clouds [5], or even synthetic aperture radar (SAR) data [6]. Since most geoscience and remote sensing tasks are highly safety-critical, it is vitally important to develop adversarial defense methods and improve the resistibility of the deployed deep learning models against adversarial examples.

One possible way to conduct adversarial defense is adversarial training, where both the original clean samples and the generated adversarial examples are combined to train the model [7, 8]. Nevertheless, adversarial training can hardly improve the inherent robustness of deep neural networks; the trained model can thus be attacked again by newly generated adversarial examples [9]. Another type of defense is to design novel architectures or modules that are more robust against adversarial examples. For example, the self-attention mechanism and the context encoding module are utilized in [10] to improve the inherent resistibility of deep neural networks. Despite its effectiveness, this approach requires retraining the deployed models since it changes the architecture of the target models. Considering that retraining the deployed models may be infeasible in practical applications, it would be significant to develop adversarial defense methods that directly decrease the harmfulness of the input adversarial examples.

To this end, transformation-based methods have been developed, which aim to remove or weaken the perturbations that exist in the adversarial examples. In [11], Tabacof and Valle explored how different levels of Gaussian noise influence the classification performance on adversarial examples. Raff et al. further applied more complex transformations such as Gaussian blur, gray scale, and color jitter [12]. However, since these transformation-based methods may also introduce new noise (e.g., Gaussian noise) or style differences (e.g., color jitter) into the transformed image, their defense performance is limited.
Different from the aforementioned methods, this study addresses the adversarial defense problem from the perspective of denoising. Specifically, we propose a novel task-guided denoising network (TGDN) for the adversarial defense of remote sensing scene classification. The main idea of the proposed method is to train a denoising network using clean remote sensing images and the corresponding adversarial examples. Since it is usually infeasible to know which attack method the adversary would use in practice, we adopt the iterative fast gradient sign method (I-FGSM) [13] to generate the adversarial examples for simulation purposes in the training phase. Once the training is finished, the denoising network is expected to possess defense ability against unknown adversarial attacks. Despite its simplicity, extensive experiments on the UAE-RS (universal adversarial examples in remote sensing) dataset [14] demonstrate that the proposed TGDN can significantly improve the resistibility of different deep learning models against adversarial examples.

The rest of this paper is organized as follows. Section 2 describes the proposed TGDN in detail. Section 3 presents the experiments in this study. Conclusions and other discussions are given in Section 4.

2. Methodology

2.1. Overview of the Proposed TGDN

Figure 1: An illustration of the proposed adversarial defense framework with the task-guided denoising network (TGDN). (The figure depicts the data flow of the adversarial examples and of the clean images through the task-guided denoising network and the pre-trained classification network.)

As shown in Figure 1, there are two main components in the proposed adversarial defense framework: a task-guided denoising network T and a pre-trained classification network Φ (with fixed weights). Given a clean image x_clean from the training set, we first use I-FGSM [13] to generate the corresponding adversarial example x_adv (note that I-FGSM is used only to simulate the adversarial examples that may exist in the test set, since we have no access to the real attack method adopted by the adversary in practice). Then, we use T to denoise x_adv and obtain the transformed image T(x_adv). Specifically, T aims to alleviate the difference between x_adv and x_clean from three aspects: the visual appearance difference, the feature representation difference, and the probability distribution difference. Accordingly, the training of T is constrained by the appearance loss ℒ_app, the perceptual loss ℒ_per, and the classification loss ℒ_cls with the aid of Φ. Once the training is finished, we then use T to denoise samples in the adversarial test set.
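The attack-simulation step in Figure 1 can be made concrete with a short PyTorch sketch. The snippet below only illustrates an ℓ∞ I-FGSM loop against the frozen classifier Φ and is not the authors' released code; the function name ifgsm_attack, the per-iteration step size, and the default budget eps are assumptions (Section 3.2 only fixes the perturbation level to 1 and the number of iterations to 5).

```python
import torch
import torch.nn.functional as F

def ifgsm_attack(classifier, x_clean, labels, eps=1.0 / 255.0, n_iter=5):
    """l_inf I-FGSM used only to *simulate* attacks while training T.
    eps assumes images in [0, 1] and a perturbation level of 1 on the 0-255 scale;
    the per-step size eps / n_iter is an illustrative choice."""
    x_adv = x_clean.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(classifier(x_adv), labels)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + (eps / n_iter) * grad.sign()                # ascend the classification loss
            x_adv = x_clean + torch.clamp(x_adv - x_clean, -eps, eps)   # stay inside the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                        # keep a valid image
    return x_adv.detach()

# During training, each clean batch is paired with a simulated adversarial batch:
# x_adv = ifgsm_attack(classifier, x_clean, labels)   # classifier = frozen Phi
# x_denoised = denoiser(x_adv)                        # denoiser = task-guided network T
```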
2.2. Optimization

Since the adversarial perturbation is also a special type of noise, an intuitive idea is to conduct a transformation on the input adversarial example and remove the existing adversarial perturbation. To this end, we first adopt the ℓ1 norm to define the appearance loss ℒ_app:

\mathcal{L}_{app} = \frac{1}{n_{ir} n_{ic}} \sum_{r=1}^{n_{ir}} \sum_{c=1}^{n_{ic}} \left| T(x_{adv})^{(r,c)} - x_{clean}^{(r,c)} \right|,   (1)

where n_ir and n_ic denote the numbers of rows and columns in the image, respectively. The constraint in (1) encourages the transformed image T(x_adv) to possess a similar appearance to the original clean image x_clean.

Considering that the adversarial perturbation also influences the intermediate feature representation of the image in the deep neural network, we further define the perceptual loss ℒ_per:

\mathcal{L}_{per} = \frac{1}{n_{fr} n_{fc}} \sum_{r=1}^{n_{fr}} \sum_{c=1}^{n_{fc}} \left\| \Phi^{f}(T(x_{adv}))^{(r,c)} - \Phi^{f}(x_{clean})^{(r,c)} \right\|_{2},   (2)

where n_fr and n_fc denote the numbers of rows and columns in the intermediate feature map, respectively, and Φ^f(⋅) denotes the output of the intermediate feature extraction layer in the pre-trained classification network Φ (with fixed weights). With the constraint in (2), T(x_adv) will tend to possess an identical high-level feature representation to the original clean image [15].

Finally, we define the classification loss ℒ_cls to clean the wrong logits in the output space of Φ:

\mathcal{L}_{cls} = \left\| \sigma\bigl(\Phi(T(x_{adv}))\bigr) - \sigma\bigl(\Phi(x_{clean})\bigr) \right\|_{2},   (3)

where σ(⋅) denotes the softmax function and Φ(⋅) denotes the predicted logits of Φ. With the constraint in (3), T(x_adv) will tend to possess a similar probability distribution to the original clean image on the pre-trained network Φ.

The complete loss function ℒ for training the proposed framework is formulated as:

\mathcal{L} = \mathcal{L}_{app} + \lambda_{per} \mathcal{L}_{per} + \mathcal{L}_{cls},   (4)

where λ_per is a weighting factor.
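For reference, the three constraints in Eqs. (1)-(4) map almost line-for-line onto PyTorch. The sketch below is a minimal, illustrative implementation: the helper name tgdn_loss, the use of a separate feature_extractor module standing in for Φ^f, and the batch-averaging choices are assumptions rather than details given in the paper.

```python
import torch
import torch.nn.functional as F

def tgdn_loss(denoiser, classifier, feature_extractor, x_adv, x_clean, lambda_per=1e-3):
    """Illustrative PyTorch version of Eqs. (1)-(4); Phi (classifier) and Phi^f
    (feature_extractor) are frozen, only the denoiser T receives gradients."""
    t_x = denoiser(x_adv)                      # T(x_adv)

    # Eq. (1): mean absolute (l1) difference between T(x_adv) and x_clean
    l_app = F.l1_loss(t_x, x_clean)

    # Eq. (2): l2 distance between intermediate feature maps, averaged over positions
    with torch.no_grad():
        feat_clean = feature_extractor(x_clean)
    feat_adv = feature_extractor(t_x)
    l_per = torch.linalg.vector_norm(feat_adv - feat_clean, ord=2, dim=1).mean()

    # Eq. (3): l2 distance between the softmax outputs of the frozen classifier
    with torch.no_grad():
        prob_clean = F.softmax(classifier(x_clean), dim=1)
    prob_adv = F.softmax(classifier(t_x), dim=1)
    l_cls = torch.linalg.vector_norm(prob_adv - prob_clean, ord=2, dim=1).mean()

    # Eq. (4): weighted sum of the three terms
    return l_app + lambda_per * l_per + l_cls
```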
3. Experiments

3.1. Dataset

The UAE-RS (universal adversarial examples in remote sensing) dataset (https://github.com/YonghaoXu/UAE-RS) is utilized to evaluate the performance of the proposed method. UAE-RS provides high-resolution remote sensing adversarial examples for both the scene classification and semantic segmentation tasks [14]. For the scene classification task, UAE-RS contains 1050 adversarial test samples from the UCM dataset and 5000 adversarial test samples from the AID dataset, generated by the Mixcut-Attack method. Some example adversarial images are shown in the first columns of Figures 2 and 3.
3.2. Implementation Details

We adopt JPG compression [16], Downsampling [17], Color Jitter [12], DnCNN (Denoising CNN) [18], CBDNet (Convolutional Blind Denoising Network) [19], and HiNet (Half Instance Normalization Network) [20], along with the proposed TGDN, to conduct adversarial defenses. For the JPG method, we compress the adversarial examples with a quality of 25 [21]. The Downsampling method is implemented with a sampling rate of 0.5 using bilinear interpolation. For the Color Jitter method, we randomly change the brightness, contrast, saturation, and hue of the adversarial examples using a uniform distribution from 0.5 to 1.5.

I-FGSM [13] with the ℓ∞ norm is adopted to generate adversarial examples for training the denoising networks used in this study. The perturbation level in I-FGSM is fixed to 1 and the number of total iterations is set to 5. We use the same transform network as in HiNet to implement the task-guided denoising network T. The ResNet18 [22] pre-trained on the UCM dataset or the AID dataset is adopted as the classification network Φ. The weighting factor λ_per in (4) is set to 1e−3. We use the Adam optimizer [23] with a learning rate of 1e−3 and a weight decay of 5e−5 to train the denoising networks used in this study. The batch size is set to 32, and the numbers of training epochs are set to 100 and 30 for the UCM and AID datasets, respectively. All experiments in this study are implemented with the PyTorch platform [24] using two NVIDIA Tesla A100 (40GB) GPUs.
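Putting the pieces together, a training loop following the hyper-parameters listed above might look as follows. This is a sketch under stated assumptions: SmallDenoiser is only a toy stand-in for the HiNet-style transform network T, the checkpoint path and train_loader are hypothetical, and ifgsm_attack / tgdn_loss refer to the illustrative snippets in Section 2.

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class SmallDenoiser(nn.Module):
    """Toy residual denoiser standing in for the HiNet-based transform network T."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )
    def forward(self, x):
        return torch.clamp(x + self.body(x), 0.0, 1.0)  # predict a residual correction

classifier = models.resnet18(num_classes=21).to(device)          # Phi, pre-trained on UCM (21 classes)
# classifier.load_state_dict(torch.load("resnet18_ucm.pth"))     # hypothetical checkpoint path
classifier.eval()
for p in classifier.parameters():
    p.requires_grad_(False)                                      # Phi is kept fixed

# One way to expose the intermediate layer Phi^f: ResNet18 up to its last conv block
feature_extractor = nn.Sequential(*list(classifier.children())[:-2]).eval()

denoiser = SmallDenoiser().to(device)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3, weight_decay=5e-5)

for epoch in range(100):                       # 100 epochs for UCM (30 for AID)
    for x_clean, labels in train_loader:       # batch size 32; loader construction omitted
        x_clean, labels = x_clean.to(device), labels.to(device)
        x_adv = ifgsm_attack(classifier, x_clean, labels)        # simulate the attack (Section 2.1 sketch)
        loss = tgdn_loss(denoiser, classifier, feature_extractor,
                         x_adv, x_clean, lambda_per=1e-3)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```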
3.3. Experimental Results

To qualitatively evaluate how different transformations influence the input adversarial examples, we first visualize some example adversarial images in the UAE-RS dataset and the corresponding transformed images obtained with different methods in Figures 2 and 3. Compared to traditional transformation methods like JPG compression or Downsampling, the transformed images generated by denoising methods generally possess much more similar appearances to the original clean images. Besides, the visual appearance difference among the denoised images generated by DnCNN, CBDNet, HiNet, and TGDN is very difficult to perceive for human observers.

Figure 2: Example adversarial images in the UAE-RS UCM dataset and the corresponding transformed images using different methods (columns: Adversarial Image, JPG, Downsampling, Color Jitter, DnCNN, CBDNet, HiNet, TGDN, Clean Image).

Figure 3: Example adversarial images in the UAE-RS AID dataset and the corresponding transformed images using different methods (columns: Adversarial Image, JPG, Downsampling, Color Jitter, DnCNN, CBDNet, HiNet, TGDN, Clean Image).

Table 1: The overall accuracy (%) of different deep models on the UAE-RS UCM adversarial test set with different transforms.

Model | No Transform | JPG | Downsampling | Color Jitter | DnCNN | CBDNet | HiNet | TGDN (ours)
AlexNet | 30.86 | 33.90 | 32.29 | 30.57 | 65.62 | 62.57 | 67.05 | 73.90
VGG11 | 26.57 | 28.57 | 32.00 | 26.48 | 52.38 | 50.00 | 57.24 | 65.24
VGG16 | 19.52 | 38.10 | 39.24 | 22.48 | 58.57 | 49.81 | 55.43 | 68.67
VGG19 | 29.62 | 32.00 | 42.86 | 32.00 | 51.81 | 46.00 | 59.62 | 69.71
Inception-v3 | 30.19 | 49.71 | 52.29 | 34.10 | 60.48 | 58.48 | 64.76 | 74.48
ResNet18 | 2.95 | 7.05 | 7.14 | 4.86 | 11.52 | 5.62 | 4.10 | 9.90
ResNet50 | 25.52 | 37.62 | 39.71 | 26.57 | 53.05 | 47.05 | 52.38 | 65.81
ResNet101 | 28.10 | 39.52 | 45.33 | 28.38 | 53.24 | 50.67 | 56.48 | 69.43
ResNeXt50 | 26.76 | 40.10 | 41.90 | 28.10 | 47.81 | 41.52 | 49.33 | 63.24
ResNeXt101 | 33.52 | 40.67 | 48.48 | 30.67 | 59.05 | 56.67 | 62.10 | 74.67
DenseNet121 | 17.14 | 35.90 | 31.90 | 24.86 | 48.29 | 43.71 | 45.81 | 61.52
DenseNet169 | 25.90 | 37.14 | 40.48 | 28.86 | 47.24 | 41.43 | 46.67 | 59.81
DenseNet201 | 26.38 | 40.67 | 48.67 | 32.29 | 52.57 | 43.81 | 51.33 | 64.95
RegNetX-400MF | 27.33 | 32.29 | 40.29 | 27.05 | 51.81 | 49.81 | 56.67 | 66.38
RegNetX-8GF | 40.76 | 41.52 | 48.38 | 34.57 | 56.76 | 53.43 | 63.71 | 73.33
RegNetX-16GF | 34.86 | 54.67 | 55.14 | 34.95 | 68.19 | 64.19 | 69.05 | 78.67

Table 2: The overall accuracy (%) of different deep models on the UAE-RS AID adversarial test set with different transforms.

Model | No Transform | JPG | Downsampling | Color Jitter | DnCNN | CBDNet | HiNet | TGDN (ours)
AlexNet | 21.54 | 22.30 | 31.76 | 19.78 | 53.56 | 54.92 | 60.54 | 63.40
VGG11 | 15.40 | 16.26 | 21.46 | 14.08 | 39.60 | 41.14 | 48.14 | 49.78
VGG16 | 11.88 | 13.76 | 16.42 | 11.46 | 40.92 | 43.04 | 47.78 | 51.34
VGG19 | 12.64 | 18.78 | 20.44 | 15.66 | 35.10 | 37.88 | 43.00 | 54.02
Inception-v3 | 21.54 | 36.12 | 39.62 | 22.46 | 49.06 | 47.46 | 53.16 | 65.14
ResNet18 | 1.28 | 7.72 | 4.96 | 2.28 | 10.50 | 7.60 | 4.60 | 24.22
ResNet50 | 10.74 | 18.40 | 19.84 | 8.06 | 40.26 | 37.58 | 41.10 | 60.66
ResNet101 | 11.56 | 24.30 | 24.02 | 13.86 | 43.74 | 42.14 | 45.04 | 66.90
ResNeXt50 | 8.12 | 26.50 | 26.84 | 12.00 | 40.10 | 38.56 | 44.66 | 62.94
ResNeXt101 | 7.86 | 20.64 | 29.18 | 9.10 | 45.84 | 43.14 | 47.20 | 63.26
DenseNet121 | 10.30 | 16.52 | 21.74 | 14.70 | 36.82 | 35.48 | 39.28 | 58.16
DenseNet169 | 10.78 | 17.90 | 20.82 | 12.94 | 36.48 | 32.16 | 38.18 | 56.62
DenseNet201 | 14.22 | 24.28 | 29.14 | 16.00 | 42.90 | 38.90 | 44.12 | 65.60
RegNetX-400MF | 23.20 | 27.26 | 28.92 | 18.24 | 45.10 | 41.64 | 51.28 | 64.78
RegNetX-8GF | 18.92 | 30.22 | 31.26 | 16.28 | 47.76 | 47.86 | 55.02 | 66.88
RegNetX-16GF | 21.00 | 33.82 | 32.06 | 20.00 | 49.44 | 49.52 | 55.20 | 65.26

We further test the overall accuracy (OA) of different deep learning models on the UAE-RS dataset using different transforms to quantitatively evaluate how these defense methods influence the classification performance. As shown in Tables 1 and 2, due to the threat of adversarial attacks, the existing state-of-the-art deep learning models can hardly achieve satisfactory recognition results if no transform or defense method is used. On the UAE-RS AID adversarial test set, all models used in this study achieve an OA of less than 25% without any transform. Besides, the improvements obtained from traditional transform methods like JPG compression, Downsampling, and Color Jitter are limited and unstable. In some cases, they even decrease the accuracy, as these methods may introduce new noise or style differences that are harmful to the deployed models. Compared to traditional transform methods, the performance improvements obtained by denoising methods are more obvious. However, although all the denoising networks used in this study yield similar results from the perspective of visual appearance according to Figures 2 and 3, their quantitative defense performance varies a lot in different scenarios. Take the VGG16 model on the UCM adversarial test set as an example: while the proposed TGDN yields an OA of around 68%, CBDNet only yields an OA of around 49%. This phenomenon indicates that simply using traditional denoising networks may not defend against adversarial attacks effectively, since there is no specific design to tackle the adversarial perturbations in traditional denoising methods. Even though the denoised images are very similar to the original clean images, imperceptible adversarial perturbations that are harmful to the recognition performance may still remain. By contrast, the proposed TGDN achieves the highest OA in all defense scenarios except the case of ResNet18 on the UCM adversarial test set, where TGDN ranks second. These results demonstrate the effectiveness of the proposed method.
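As a complement to Tables 1 and 2, the overall accuracy (OA) under a given defense can be computed with a simple evaluation loop such as the sketch below; it is illustrative rather than the authors' evaluation script, and the loader is assumed to yield (adversarial image, label) pairs from the UAE-RS test set.

```python
import torch

@torch.no_grad()
def overall_accuracy(classifier, loader, denoiser=None, device="cuda"):
    """Overall accuracy (%) on an adversarial test set, optionally after denoising with T."""
    classifier.eval()
    correct, total = 0, 0
    for x_adv, labels in loader:
        x_adv, labels = x_adv.to(device), labels.to(device)
        x_in = denoiser(x_adv) if denoiser is not None else x_adv  # apply the defense if given
        pred = classifier(x_in).argmax(dim=1)
        correct += (pred == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

# Example: compare the undefended and TGDN-defended accuracy of the same classifier
# oa_no_defense = overall_accuracy(classifier, adv_test_loader)
# oa_tgdn = overall_accuracy(classifier, adv_test_loader, denoiser=denoiser)
```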
4. Conclusions and Discussions

Although deep learning-based methods have achieved state-of-the-art performance in the interpretation of geoscience and remote sensing data, their vulnerability to adversarial examples cannot be ignored in practical applications. To address the threat of adversarial examples for the remote sensing scene classification task, we propose a novel task-guided denoising network (TGDN) to conduct adversarial defense in this study. Specifically, the proposed TGDN aims to alleviate the difference between the adversarial examples and the original clean images from three aspects: the visual appearance difference, the feature representation difference, and the probability distribution difference. To further evaluate how TGDN influences the classification results of different deep learning models, the UAE-RS dataset is used in the experiments. Despite the simplicity of the proposed TGDN, extensive experiments demonstrate that TGDN can significantly improve the resistibility of different deep learning models against adversarial examples.

Since the proposed method only considers a single pre-trained network (ResNet18) when training the task-guided denoising network, whether ensemble learning with multiple pre-trained networks would improve the defense performance deserves further study. We will explore this in our future work.
References

[1] P. Ghamisi, J. Plaza, Y. Chen, J. Li, A. J. Plaza, Advanced spectral classifiers for hyperspectral images: A review, IEEE Geosci. Remote Sens. Mag. 5 (2017) 8–32.
[2] Y. Xu, B. Du, L. Zhang, D. Cerra, M. Pato, E. Carmona, S. Prasad, N. Yokoya, R. Hänsch, B. Le Saux, Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 12 (2019) 1709–1724.
[3] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).
[4] L. Chen, Z. Xu, Q. Li, J. Peng, S. Wang, H. Li, An empirical study of adversarial examples on remote sensing image scene classification, IEEE Trans. Geosci. Remote Sens. 59 (2021) 7419–7433.
[5] Y. Cao, C. Xiao, B. Cyr, Y. Zhou, W. Park, S. Rampazzi, Q. A. Chen, K. Fu, Z. M. Mao, Adversarial sensor attack on LiDAR-based perception in autonomous driving, in: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 2267–2281.
[6] H. Li, H. Huang, L. Chen, J. Peng, H. Huang, Z. Cui, X. Mei, G. Wu, Adversarial examples for CNN-based SAR image classification: An experience study, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 14 (2020) 1333–1347.
[7] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).
[8] Y. Xu, B. Du, L. Zhang, Assessing the threat of adversarial examples on deep neural networks for remote sensing scene classification: Attacks and defenses, IEEE Trans. Geosci. Remote Sens. 59 (2021) 1604–1617.
[9] N. Akhtar, A. Mian, N. Kardan, M. Shah, Advances in adversarial attacks and defenses in computer vision: A survey, IEEE Access (2021).
[10] Y. Xu, B. Du, L. Zhang, Self-attention context network: Addressing the threat of adversarial attacks for hyperspectral image classification, IEEE Trans. Image Process. 30 (2021) 8671–8685.
[11] P. Tabacof, E. Valle, Exploring the space of adversarial images, in: International Joint Conference on Neural Networks (IJCNN), IEEE, 2016, pp. 426–433.
[12] E. Raff, J. Sylvester, S. Forsyth, M. McLean, Barrage of random transforms for adversarially robust defense, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6528–6537.
[13] A. Kurakin, I. Goodfellow, S. Bengio, Adversarial machine learning at scale, arXiv preprint arXiv:1611.01236 (2016).
[14] Y. Xu, P. Ghamisi, Universal adversarial examples in remote sensing: Methodology and benchmark, IEEE Trans. Geosci. Remote Sens. 60 (2022) 1–15.
[15] J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in: European Conference on Computer Vision, Springer, 2016, pp. 694–711.
[16] G. K. Dziugaite, Z. Ghahramani, D. M. Roy, A study of the effect of JPG compression on adversarial images, arXiv preprint arXiv:1608.00853 (2016).
[17] C. Guo, M. Rana, M. Cisse, L. Van Der Maaten, Countering adversarial images using input transformations, arXiv preprint arXiv:1711.00117 (2017).
[18] K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising, IEEE Trans. Image Process. 26 (2017) 3142–3155.
[19] S. Guo, Z. Yan, K. Zhang, W. Zuo, L. Zhang, Toward convolutional blind denoising of real photographs, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 1712–1722.
[20] L. Chen, X. Lu, J. Zhang, X. Chu, C. Chen, HINet: Half instance normalization network for image restoration, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 182–192.
[21] G. K. Wallace, The JPEG still picture compression standard, IEEE Transactions on Consumer Electronics 38 (1992) xviii–xxxiv.
[22] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[23] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[24] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems 32 (2019).