Reversible Adversarial Attack Based on Reversible Image
Transformation
Zhaoxia Yin¹, Hua Wang¹, Li Chen¹, Jie Wang¹ and Weiming Zhang²
¹ Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601
² School of Information Science and Technology, University of Science and Technology of China, Hefei 230026

2021 International Workshop on Safety & Security of Deep Learning
yinzhaoxia@ahu.edu.cn (Z. Yin); 18395563070@163.com (H. Wang); 2516585284@qq.com (L. Chen); wangjie@stu.ahu.edu.cn (J. Wang); zhangwm@ustc.edu.cn (W. Zhang)
ORCID: 0000-0003-0387-4806 (Z. Yin); 0000-0001-8718-329X (H. Wang); 0000-0003-3400-9444 (L. Chen); 0000-0001-5500-2060 (J. Wang); 0000-0001-5576-6108 (W. Zhang)


Abstract
In order to prevent illegal or unauthorized access to image data such as human faces while ensuring that legitimate users can use authorization-protected data, the reversible adversarial attack technique has emerged. Reversible adversarial examples (RAE) possess both attack capability and reversibility. However, the existing technique cannot meet application requirements because of serious distortion and failure of image recovery when adversarial perturbations get strong. In this paper, we take advantage of the Reversible Image Transformation technique to generate RAE and achieve reversible adversarial attack. Experimental results show that the proposed RAE generation scheme ensures imperceptible image distortion and that the original image can be reconstructed error-free. Moreover, neither the attack ability nor the image quality is limited by the perturbation amplitude.

Keywords
deep neural networks, adversarial example, data protection, reversible image transformation



1. Introduction

To make the research significance and technical basis of the proposed work clear, we introduce it from the following four aspects. The first is the research background, which leads to the important value of adversarial examples that possess both attack capability and reversibility. Then comes the research status of adversarial attacks and adversarial examples. After discussing the parallels and differences between information hiding and adversarial examples, reversible adversarial attacks based on information hiding are put forward. Finally, the motivation and contribution of the proposed method are highlighted.

Figure 1: The generation process of an adversarial example.
1.1. Background

Deep learning [1] performance is getting more and more outstanding, especially in tasks such as autonomous driving [2] and face recognition [3]. As an important technique of Artificial Intelligence (AI), it has also been challenged by different kinds of attacks. In 2013, Szegedy et al. [4] first discovered that adding perturbations that are imperceptible to human vision to an image can mislead a neural network model into producing wrong results with high confidence. As shown in Fig. 1, images to which such specific noise has been added to mislead a deep neural network model are called Adversarial Examples [5], and the added noise is called an Adversarial Perturbation.

As a lethal attack technology in the AI security field, adversarial examples equipped with both attack capability and reversibility would undoubtedly have important application value, i.e., attacking unauthorized models while remaining harmless to authorized models thanks to lossless recovery capability [6].

Reversible adversarial attack aims to add adversarial perturbations to images in a reversible way to generate adversarial examples. On one hand, the generated Reversible Adversarial Examples (RAE) can attack unauthorized models and prevent illegal or unauthorized access to image data; on the other hand, an authorized intelligent system can completely restore the corresponding original images from RAE and safely avoid the interference. The emergence of RAE equips adversarial examples with new capabilities, which is of great significance for further expanding the attack-defense technology and applications of AI.
However, this research has just started, and the performance is not yet satisfactory. Many problems and questions, such as how to balance and optimize attack capability, reversibility and image visual quality, are still waiting to be solved and answered.

1.2. Adversarial Attack and Adversarial Examples

Attacks and defenses of adversarial examples have attracted more and more attention from researchers in the field of machine learning security, and have become a hot research topic in recent years. Here we briefly summarize the current research status of adversarial attacks and adversarial examples [7].

An adversarial attack designs algorithms that turn normal samples into adversarial examples to fool an AI system. According to the attacker's degree of knowledge about the target model, attacks can be divided into white-box and black-box attacks. A white-box attack constructs adversarial examples based on information such as the structural parameters of the target model, e.g., the Iterative Fast Gradient Sign Method (IFGSM) [8]. A black-box attack constructs adversarial examples without any information about the target model; such adversarial examples are usually generated by training substitute models, e.g., the one pixel attack [9]. Furthermore, taking image classification as an example, a non-targeted attack only needs to make the model misclassify a given adversarial example, and the perturbation is usually relatively small; the DeepFool attack [10] is one example. A targeted attack makes the model classify a given adversarial example into a specified category rather than just any incorrect category; the representative algorithm is the well-known C&W attack [11].

1.3. Reversible Adversarial Examples

So we can say that, by slightly modifying the input digital image signal, adversarial examples are generated to show different information to a machine or intelligent system, while for human vision the information and content of the image have not changed. Actually, there is another similar technique that also aims to achieve special goals by slightly modifying the input digital image signal, called Information Hiding, which consists of different research topics such as Watermarking, Steganography and Reversible Data Hiding (RDH) [6].

Quiring et al. [13] analyzed the similarities and differences between adversarial examples and watermarking. Both modify the target object to cross a decision boundary at the lowest cost. In watermarking, the watermark detector is regarded as a two-class classifier, and the watermark in a signal can be destroyed by watermarking attacks, so that the classification result changes from image-with-watermark to image-without-watermark. In machine learning, the boundary separates different categories, and the attacked signal, i.e., the adversarial example, will be misjudged by the model. Schöttle et al. [12] analyzed the similarities and differences between steganography and adversarial examples. Steganography modifies individual pixel values to embed secret information, so that it is difficult for steganalysts to detect the hidden information. Schöttle et al. argue that the detection of adversarial examples belongs to the category of steganalysis, and they develop a heuristic linear predictive adversarial detection method based on steganalysis technology. Zhang et al. [14] compared deep steganography and universal adversarial perturbations, and found that the success of both is attributed to the deep neural network's exceptional sensitivity to high-frequency content.

Given these interesting cross-cutting studies of adversarial examples and information hiding, one inevitably wonders: what would we get by combining adversarial examples with another information hiding technique, namely Reversible Data Hiding?

Liu et al. achieved the first reversible adversarial attack by combining Reversible Data Hiding with adversarial examples and proposed the concept of Reversible Adversarial Examples (RAE) [15]. Since RAE possess both attack capability and reversibility, illegal or unauthorized access to image data can be prevented, while legitimate use is guaranteed through original image recovery. As shown in Fig. 2, a Reversible Data Embedding (RDE) technique [16] is adopted to embed the adversarial perturbation into its adversarial image to get the reversible adversarial example image, from which the original image can be restored error-free. The framework consists of three steps: (1) adversarial example generation; (2) reversible adversarial example generation by reversible data embedding; (3) original image recovery. This is a truly creative work, even though its performance is far from satisfactory. We call it the RDE-based RAE method and discuss its details in the coming section.

1.4. Motivation and Contribution

As mentioned above, to obtain RAE, Liu et al. adopted the Reversible Data Embedding technique to embed the adversarial perturbation into its adversarial image, so that the original image can be restored without distortion.
Figure 2: The overall framework of the RDE-based RAE method [15].

However, no matter which RDE algorithm is adopted, the embedding capacity is always limited. That means the maximum amount of data that can be carried by the adversarial image is also limited. Therefore, when adversarial perturbations are strengthened, the amount of data that needs to be embedded increases, which results in the following three problems: (1) the generated adversarial perturbations cannot be embedded completely, so the original image cannot be restored completely, which leads to the failure of reversibility; (2) since too much data has to be embedded, the reversible adversarial image is severely distorted, which leads to unsatisfactory image quality; (3) due to the increased distortion of the RAE, the attack ability decreases accordingly.

To solve these problems, we propose to replace the idea of Reversible Data Embedding with the Reversible Image Transformation (RIT) technique. To verify the effectiveness of this strategy, we chose one RIT method [17] as an example to construct RAE and compared its performance with the method from [15]. Experiments show that the proposed scheme completely solves the problems analyzed above. Furthermore, in the proposed method, reversibility does not depend on embedding the signal difference between original images and adversarial examples, i.e., it is not limited by the strength of the adversarial perturbation. As is well known, the greater the adversarial perturbation, the stronger the attack ability. Therefore, the proposed method achieves better RAE performance in terms of reversibility, image quality and attack capability. We name it the RIT-based RAE method and describe it step by step in Section 2. Details of experiments and results are given in Section 3, followed by the Conclusion in Section 4.

Figure 3: The overall framework of the proposed RIT-based RAE method.
2. The Proposed Method

In order to achieve reversible adversarial attack, we propose a more effective method to generate reversible adversarial examples. As shown in Fig. 3, we replace reversible data hiding with the RIT strategy to obtain RAE. The original image restoration process is the inverse of RIT, i.e., reversible image recovery. In this section, we describe the implementation of our method in three steps: (1) adversarial example generation; (2) reversible adversarial example generation; (3) original image restoration.

2.1. Adversarial Examples Generation

Firstly, we need to generate adversarial examples for step (2). Adversarial attacks are mainly divided into white-box and black-box attacks. White-box attack algorithms have better performance, and black-box attacks usually rely on white-box attacks indirectly, so this paper generates adversarial examples under white-box settings. Next, we introduce several state-of-the-art white-box attack algorithms; a minimal sketch of the first one follows the list.

    • IFGSM [8] was proposed as an iterative version of FGSM [5]. It is a quick way to generate adversarial examples: it applies FGSM multiple times with a small step instead of adding one large perturbation.
    • DeepFool [10] is an untargeted attack algorithm that generates adversarial examples by exploring the nearest decision boundary; the image is slightly modified in each iteration to approach the boundary, and the algorithm does not stop until the modified image changes the classification result.
    • C&W [11] is an optimization-based attack that makes the perturbation undetectable by limiting its 𝐿0, 𝐿2, or 𝐿∞ norm.
2.2. Reversible Adversarial Examples Generation

Secondly, we use the Reversible Image Transformation (RIT) algorithm to generate protected resources with restricted access, i.e., reversible adversarial examples. Specifically, we take the adversarial example as the target image and use RIT to disguise the original image as its adversarial example, directly obtaining the reversible adversarial example. Next, we introduce the RIT algorithm in detail. In fact, the RIT algorithm is itself a kind of reversible data hiding technique for image content protection. It can reversibly transform an original image into an arbitrarily chosen target image of the same size, yielding a camouflage image that looks almost indistinguishable from the target image. The smaller the difference between the two images, the smaller the amount of auxiliary information needed to restore the original image; this makes RIT a perfect fit for RAE, since the difference between an original image and its adversarial example is usually very small.

2.2.1. Algorithm Implementation

To facilitate understanding of the RIT algorithm, the following takes a grayscale image (one channel) as an example to illustrate its implementation [18]. For color images, the R, G and B channels are transformed in the same way. RIT realizes a reversible transformation between two images and has two stages, transformation and restoration. In the transformation stage, the original image undergoes a series of pixel-value transformations to generate a camouflage image [18]. In the restoration stage, the hidden transformation information is extracted from the camouflage image and used for reversible restoration. Since restoration is the reverse of transformation, we only introduce the transformation process, which is divided into three steps: (1) Block Pairing; (2) Block Transformation; (3) Auxiliary Information Embedding. A small sketch of the first two steps follows the list.

    • Block Pairing: The original image and the target image are first divided into blocks in the same way. Then, the mean and standard deviation of the pixel values of each block of the original image and the target image are calculated. To restore the original image from the camouflage image, the receiver must know the Class Index Table of the original image. By matching each block of the original image with a block of the target image that has a similar standard deviation, the Class Index Tables of the original image and the target image are obtained.
    • Block Transformation: According to the block matching result, each pair of blocks has close standard deviation values. The standard deviation of the original block is left unchanged; only its mean value is changed through a mean shift. Then, to keep the transformed image as similar as possible to the target image, each transformed block is further rotated into one of four directions, 0°, 90°, 180° or 270°, choosing the direction that minimizes the root mean square error between the rotated block and the target block.
    • Auxiliary Information Embedding: To obtain the final camouflage image, auxiliary information must be embedded into the transformed image, including the compressed Class Index Table and the mean shift and rotation direction of each block of the original image. A suitable RDH algorithm embeds this auxiliary information into the transformed image to produce the final camouflage image.
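The sketch below illustrates the pairing and per-block transformation steps just described. It is a simplified rendition under assumptions stated here (sort-based standard-deviation matching, float grayscale arrays, image sides divisible by the block size); the full schemes in [17, 18] additionally build and compress a Class Index Table and handle pixel overflow, which we omit.

```python
import numpy as np

def split_blocks(img, b=4):
    """Return the b x b blocks of a grayscale image in raster order."""
    h, w = img.shape
    return [img[i:i+b, j:j+b] for i in range(0, h, b) for j in range(0, w, b)]

def pair_blocks(original, target, b=4):
    """Pair blocks with similar standard deviations by sorting both block lists
    on std; equal ranks form a pair (a stand-in for the Class Index Table)."""
    bo, bt = split_blocks(original, b), split_blocks(target, b)
    rank_o = sorted(range(len(bo)), key=lambda k: bo[k].std())
    rank_t = sorted(range(len(bt)), key=lambda k: bt[k].std())
    return list(zip(rank_o, rank_t)), bo, bt

def transform_block(orig_block, target_block):
    """Shift the block mean toward the target (std is untouched), then pick the
    quarter-turn rotation that minimizes RMSE against the target block."""
    shift = target_block.mean() - orig_block.mean()
    shifted = orig_block + shift
    rmse = lambda a: np.sqrt(np.mean((a - target_block) ** 2))
    best_k = min(range(4), key=lambda k: rmse(np.rot90(shifted, k)))
    # (shift, best_k) is the per-block auxiliary information that an RDH
    # algorithm would embed into the camouflage image for reversibility.
    return np.rot90(shifted, best_k), (shift, best_k)
```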
2.3. Original Image Restoration

Finally, the original image needs to be restored when an authorized model accesses it; the restoration process of RIT can be used directly to realize the reverse transformation from the reversible adversarial example back to the original image. Since our reversible adversarial examples are built on RIT, restoring them to the original image is exactly the RIT restoration process, which is the inverse of the RIT transformation. Therefore, given only a reversible adversarial example, we can extract the hidden transformation information and use it to invert the RIT transformation, restoring the original image without loss.
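For completeness, here is the matching inverse step for a single block, assuming the per-block auxiliary information (the mean shift and rotation index produced by transform_block above) has already been extracted by the RDH decoder. This mirrors the simplified sketch in Section 2.2.1 rather than the full algorithm of [17, 18].

```python
import numpy as np

def restore_block(camouflage_block, shift, rot_k):
    """Invert transform_block: undo the quarter-turn rotation, then the mean shift."""
    return np.rot90(camouflage_block, -rot_k) - shift
```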
3. Evaluation and Analysis

To verify the effectiveness and superiority of the proposed method, we introduce the experiment design, results and comparisons, followed by discussion and analysis.
3.1. Experimental Setup

    • Dataset: Since it is meaningless to attack images that are already misclassified by the model, we randomly chose 5000 images from ImageNet (the ILSVRC 2012 validation set) that are correctly classified by the model.
    • Deep Network: The pretrained Inception_v3 from torchvision.models, evaluated by Top-1 accuracy.
    • Attack Methods: IFGSM, C&W and DeepFool. To ensure visual quality, we set the learning rate of C&W_L2 to 0.005 and the perturbation amplitude 𝜖 of IFGSM to no more than 8/255. A short sketch of the model setup follows the list.
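As a concrete illustration of this setup, the snippet below loads the pretrained Inception_v3 and takes a Top-1 prediction; the file name and preprocessing sizes are illustrative assumptions, not details taken from the paper.

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.inception_v3(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),  # Inception_v3 expects 299 x 299 inputs
    transforms.ToTensor(),
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    top1 = model(img).argmax(dim=1)  # Top-1 predicted class index
```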
3.2. Performance Evaluation

To evaluate the performance of the proposed method, we measure the attack success rates as well as the image quality of our reversible adversarial examples, and compare our RIT-based RAE method with the RDE-based RAE method of Liu et al. [15].

To test the attack ability of the generated reversible adversarial examples, firstly, the three white-box attack algorithms are applied to the selected original images to get adversarial examples. Then, we use reversible image transformation to transform the original images into the target adversarial images and generate reversible adversarial examples. Finally, we use the generated reversible adversarial images to attack the model and record the attack success rates. In Table 1, the first data row shows the attack success rates of the generated adversarial examples (which are non-reversible); the second and third data rows show the attack success rates of Liu et al.'s and our reversible adversarial examples under different settings, respectively. On IFGSM, when 𝜖 is 4/255 and 8/255, the attack success rates of our RAEs are 70.80% and 94.55%, respectively; in the same cases, the attack success rates of Liu et al.'s RAEs are only 35.22% and 81.00%. On C&W_L2, when the confidence 𝜅 is 50 and 100, the attack success rates of our RAEs are 81.02% and 94.84%, while those of Liu et al.'s method are just 52.73% and 55.01%. From the results in Table 1, we observe that the attack ability of the RAEs obtained by our method is superior to that of Liu et al.'s method. On DeepFool, however, the adversarial perturbation is close to the theoretical minimum, so its robustness is relatively poor: once the amount of information embedded into a DeepFool adversarial example exceeds a certain level, the attack performance is seriously weakened. For this kind of minimal-perturbation, low-robustness attack, the amount of auxiliary information embedded in RIT-based RAEs is greater than the amount of perturbation signal embedded in the RDE-based RAEs of Liu et al., so the success rates of our RAEs are lower.

Furthermore, we found that when adversarial perturbations get stronger, the amount of data that needs to be embedded increases, which leads to the failure of reversibility for RDE-based RAEs. Take the Giant Panda image from Fig. 1 as an example: on C&W with confidence 𝜅 = 100, the amount of data that the RDE-based RAE needs to embed is 316311 bits, far beyond the corresponding maximum embedding capacity of 114986 bits. In contrast, to achieve a reversible attack with the proposed RIT-based RAE method, the amount of additional data that needs to be embedded is only 105966 bits.

Then, to quantitatively evaluate the image quality of RAEs, we measure three sets of PSNR values: between RAEs and original images, between RAEs and adversarial examples, and between original images and adversarial examples. The general benchmark for PSNR is 30 dB; image distortion below 30 dB can be perceived by human vision. To make a fair comparison with the method of Liu et al. [15], we keep the original images and adversarial examples consistent across experiments; the corresponding PSNR values are shown in the last column of Table 2. Comparing the RAEs on IFGSM and C&W with the original images, the PSNR values of our method are higher, which means the generated RAEs are less distorted than those of Liu et al. The comparison between the RAEs and the original adversarial examples shows that our PSNR values are basically greater than 30 dB, indicating that our RAEs are closer to the original adversarial examples. This result is consistent with the data in Table 1: the specific structure of the adversarial perturbation is better preserved in our method, so the final RAEs have almost the same attack effect as the original adversarial examples on IFGSM and C&W. Also consistent with Table 1, for attack algorithms like DeepFool the perturbation embedding amount in Liu et al.'s method is smaller than the auxiliary information embedding amount in our method, so the PSNR values of our RAEs are smaller.

In addition, Fig. 4 shows sample RAE images generated by Liu et al.'s method and by our method. After partial magnification, we can see that the image distortion of RDE-based RAEs significantly exceeds that of RIT-based RAEs. The amount of auxiliary information embedded in RIT-based RAEs is relatively stable, while the amount of perturbation data embedded in RDE-based RAEs depends on the perturbation signal: the greater the perturbation, the more information is embedded and the more the image is distorted.
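For reference, the PSNR between two 8-bit images is computed as follows; this is the standard definition with peak value 255, given as a minimal sketch rather than the authors' evaluation code.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shape uint8 images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Values above roughly 30 dB correspond to distortion that is hard for human vision to notice, which is the benchmark used in Table 2.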
Table 1
The attack success rates of original adversarial examples, RDE-based RAEs and RIT-based RAEs.

                          IFGSM (𝜖=4/255)   IFGSM (𝜖=8/255)   C&W_L2 (𝜅=50)   C&W_L2 (𝜅=100)   DeepFool
Generated AEs                  73.84%            95.34%           99.98%           100%          98.35%
Liu et al.'s RAEs [15]         35.22%            81.00%           52.73%          55.01%         84.19%
Our RAEs                       70.80%            94.55%           81.02%          94.84%         54.68%


Table 2
Comparison results of image quality with PSNR (dB).

Attacks              Methods                    RAEs/OIs   RAEs/AEs   OIs/AEs
IFGSM (𝜖=4/255)      Liu et al.'s method [15]   22.64      23.26      37.69
                     Proposed method            30.81      33.15
IFGSM (𝜖=8/255)      Liu et al.'s method [15]   21.93      23.55      32.31
                     Proposed method            27.59      32.13
C&W_L2 (𝜅=50)        Liu et al.'s method [15]   26.15      26.40      44.66
                     Proposed method            33.64      35.09
C&W_L2 (𝜅=100)       Liu et al.'s method [15]   22.57      23.07      38.83
                     Proposed method            32.13      34.87
DeepFool             Liu et al.'s method [15]   40.24      40.85      51.04
                     Proposed method            34.44      35.48


3.3. Discussion and Analysis

Both the RDE-based RAE and the RIT-based RAE use RDH technology to achieve reversible adversarial attacks. In the RDE-based RAE framework, Liu et al. use a reversible data embedding algorithm to hide the perturbation difference inside the adversarial example and obtain a reversible adversarial image. Constrained by the RDH payload, reversibility can only be achieved if the perturbation signal stays within the range of the payload. A slight increase in the perturbation amplitude causes serious visual distortion of the reversible adversarial example, severely weakens the attack ability, and may even make it impossible to fully embed the perturbation signal, so that the original image cannot be restored reversibly. In the proposed RIT-based RAE framework, the reversible image transformation does not need to consider the size of the adversarial perturbation, so the difficulty of embedding the adversarial perturbation disappears, which further improves the visual quality of the reversible adversarial example and promotes the overall attack success rates. In a sense, the attack effect of our reversible adversarial examples is affected to a certain extent by the amount of auxiliary information needed to restore the original image, and this amount is usually relatively stable. Generally speaking, we can reduce the impact of auxiliary information embedding by enhancing the adversarial perturbation; that is, when generating an adversarial image, the robustness of the adversarial example is improved by increasing the perturbation amplitude, and the attack success rate of the resulting reversible adversarial example rises accordingly. However, when facing an attack algorithm like DeepFool, with small perturbation and low robustness, RIT auxiliary information embedding has a greater impact on performance than perturbation signal embedding, so the attack success rate of our reversible adversarial examples is lower than that of Liu et al. Meanwhile, the proposed scheme is a special application of RIT in which the original image and its target adversarial image have a high degree of similarity. Our future work is to improve the reversible image transformation algorithm by exploiting the similarity between the original image and its adversarial example, so that the attack success rate of reversible adversarial examples is further improved.

Figure 4: Sample reversible adversarial examples generated by different methods: (A) original image; (B) adversarial example (C&W, confidence = 50); (C) Liu et al.'s reversible adversarial example; (D) our reversible adversarial example.

4. Conclusion
To solve the problems of the RAE technique and improve its performance in terms of reversibility, image quality and attack ability, we take advantage of reversible image transformation to construct reversible adversarial examples and thereby achieve reversible attack. In this work, we regard a generated adversarial example as the target image, so that its original image can be disguised as its adversarial example to get the RAE. The original image can then be recovered from its reversible adversarial example without distortion. Experimental results illustrate that our method overcomes the problems of perturbation information embedding. Moreover, the larger the adversarial perturbation, the better the RAE that can be generated. RAE can prevent illegal or unauthorized access to image data such as human faces and ensure that legitimate users can use authorization-protected data. Today, when deep learning and other artificial intelligence technologies are widely used, this technology is of great significance. In future work, it is worth trying to combine further reversible information hiding technologies to study RAE solutions that meet practical needs.

Acknowledgments

This research work is partly supported by the National Natural Science Foundation of China (61872003, U1636201).

References

 [1] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
 [2] S. Aradi, Survey of deep reinforcement learning for motion planning of autonomous vehicles, IEEE Transactions on Intelligent Transportation Systems (2020).
 [3] J. Y. Choi, B. Lee, Ensemble of deep convolutional neural networks with Gabor face representations for face recognition, IEEE Transactions on Image Processing 29 (2019) 3270–3281.
 [4] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, in: International Conference on Learning Representations (ICLR), 2014.
 [5] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: International Conference on Learning Representations (ICLR), 2015.
 [6] D. Hou, W. Zhang, J. Liu, S. Zhou, D. Chen, N. Yu, Emerging applications of reversible data hiding, in: International Conference on Image and Graphics Processing (ICIGP), ACM, 2019, pp. 105–109.
 [7] J. Zhang, C. Li, Adversarial examples: Opportunities and challenges, IEEE Transactions on Neural Networks and Learning Systems 31 (2019) 2578–2593.
 [8] A. Kurakin, I. Goodfellow, S. Bengio, Adversarial examples in the physical world, arXiv preprint arXiv:1607.02533 (2016).
 [9] J. Su, D. V. Vargas, K. Sakurai, One pixel attack for fooling deep neural networks, IEEE Transactions on Evolutionary Computation 23 (2019) 828–841.
[10] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2574–2582. doi:10.1109/CVPR.2016.282.
[11] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp. 39–57. doi:10.1109/SP.2017.49.
[12] P. Schöttle, A. Schlögl, C. Pasquini, R. Böhme, Detecting adversarial examples: a lesson from multimedia security, in: European Signal Processing Conference (EUSIPCO), IEEE, 2018, pp. 947–951.
[13] E. Quiring, D. Arp, K. Rieck, Forgotten siblings: Unifying attacks on machine learning and digital watermarking, in: IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, 2018, pp. 488–502.
[14] C. Zhang, P. Benz, A. Karjauv, I. S. Kweon, Universal adversarial perturbations through the lens of deep steganography: Towards a Fourier perspective, in: AAAI Conference on Artificial Intelligence, 2021, pp. 3296–3304.
[15] J. Liu, D. Hou, W. Zhang, N. Yu, Reversible adversarial examples, arXiv preprint arXiv:1811.00189 (2018).
[16] W. Zhang, X. Hu, X. Li, N. Yu, Recursive histogram modification: establishing equivalency between reversible data hiding and lossless data compression, IEEE Transactions on Image Processing 22 (2013) 2775–2785.
[17] D. Hou, C. Qin, N. Yu, W. Zhang, Reversible visual transformation via exploring the correlations within color images, Journal of Visual Communication and Image Representation 53 (2018) 134–145.
[18] W. Zhang, H. Wang, D. Hou, N. Yu, Reversible data hiding in encrypted images by reversible image transformation, IEEE Transactions on Multimedia 18 (2016) 1469–1479.