<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Image Enhancement and Adversarial Attack Pipeline for Scene Privacy Protection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Bilal Sakha</string-name>
          <email>mbilal.sakha@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Habib University</institution>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>In this paper, we propose approaches that prevent classifiers from automatically inferring the scene class of an image while also enhancing (or maintaining) its visual appeal. The task is part of the Pixel Privacy challenge of the MediaEval 2019 workshop. The fusion-based approaches we propose apply adversarial perturbations to images that have first been enhanced by image enhancement algorithms, rather than to the original images. They combine the benefits of image style transfer or contrast enhancement with those of white-box adversarial attack methods, and have not previously been used in the literature to fool a classifier and enhance images at the same time. We also propose the use of simple Euclidean transformations, namely image translation and rotation, and show their efficacy in fooling the classifier. We test the proposed approaches on a subset of the Places365-Standard dataset and obtain promising results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Social media users unintentionally expose private information
when sharing photos online [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], such as the locations they have visited,
which can be automatically inferred by state-of-the-art methods
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The focus of the Pixel Privacy task of the MediaEval 2019 workshop
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is to protect multimedia data that users upload online. The task
objective is to use image transformation algorithms to block
the automatic inference of scene class by a convolutional neural
network (ConvNet) based ResNet50 classifier [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] trained on the
Places365-Standard dataset [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The proposed methods should also
increase (or at least maintain) the visual appeal of an image. Additional
details of the task can be found in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        We propose to combine image style-transfer and image
enhancement with adversarial image perturbations to increase the visual
appeal of the images, in addition to blocking the automatic
inference of scene class information by the classifier. We also apply
white-box (where the attacker has access to the model’s
parameters) adversarial perturbations alone, to compare their performance
with that of the fusion-based approaches. Finally, we use simple Euclidean
operations such as image translation and rotation and show that they
are also able to fool the classifier. The proposed approaches are
evaluated on the basis of the reduction in top-1 classifier accuracy,
and the Neural Image Assessment (NIMA) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] score is used to
evaluate the image quality of the transformed images. The motivation
behind the fusion-based approaches is to give
social media users an incentive to use such methods, which not only protect the
privacy-sensitive information in their photos but also enhance
the photos as an added bonus.
      </p>
      <p>
        CartoonGAN style transfer and Iterative least-likely class
adversarial attack: In the first approach, we use an image style
transfer method based on Generative Adversarial Networks (GANs)
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] called CartoonGAN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which enhances the image by applying
cartoon-style effects. To this enhanced set of images, we then
apply a white-box targeted adversarial attack called the Iterative
least-likely class method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which is a variant of the Fast Gradient
Sign Method (FGSM) proposed by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The Iterative least-likely
class method builds an adversarial image by adding noise
to the clean image so that it is classified as the class with the
lowest confidence score for the clean image. Because a binary search for the optimal ϵ
(the limit on the perturbation size) on
each example is computationally expensive, we set
ϵ to 8/255 on the basis of experimental results on a
subset of the validation set images. When enhancing the images with
CartoonGAN, we choose the Hayao style because it gives the largest
increase in mean aesthetic score among the different CartoonGAN
styles on the validation images.
      </p>
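      <p>The update rule of the Iterative least-likely class method can be illustrated with a minimal sketch. It assumes a toy linear softmax classifier (the hypothetical functions predict and loss_gradient) in place of ResNet50, so that the input gradient has a closed form. The step size alpha and the number of steps are illustrative assumptions; only the value of ϵ (8/255) is taken from the text.</p>
      <preformat preformat-type="code">
```python
import math


def predict(W, b, x):
    """Class probabilities of a toy linear softmax classifier (hypothetical stand-in for ResNet50)."""
    logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_gradient(W, b, x, label):
    """Gradient of the cross-entropy loss J(x, label) with respect to the input x."""
    probs = predict(W, b, x)
    err = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(err[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]


def sign(v):
    return (v > 0) - (0 > v)


def iterative_least_likely(W, b, x, eps=8 / 255, alpha=1 / 255, steps=10):
    """Push x towards the class the classifier finds least likely for the clean input."""
    clean_probs = predict(W, b, x)
    target = clean_probs.index(min(clean_probs))
    adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(W, b, adv, target)
        # Targeted attack: step against the gradient to lower the loss for the target class.
        adv = [a - alpha * sign(g) for a, g in zip(adv, grad)]
        # Keep each pixel within eps of the clean value and inside the valid range [0, 1].
        adv = [min(max(a, c - eps, 0.0), c + eps, 1.0) for a, c in zip(adv, x)]
    return adv, target
```
</preformat>
      <p>In this sketch an image is a flat list of pixel intensities in [0, 1]. With a real network, the gradient would come from automatic differentiation rather than from the closed form used here.</p>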
      <p>
        CartoonGAN style transfer and PGD: In a slightly modified
version, we apply the Projected Gradient Descent (PGD) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
adversarial attack after enhancing the images with CartoonGAN style
transfer. Here the adversarial attack is untargeted, unlike in
the previous method, where the target class is the least-likely class
of the clean image. For the PGD adversarial attack, we set
ϵ to 2/255 and the step size to 1/ϵ on the basis of
empirical results on a subset of the validation images.
      </p>
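      <p>An untargeted PGD attack differs from the targeted method in two ways: it steps along the gradient of the loss for the true label, and it starts from a random point inside the ϵ-ball. The following minimal sketch again assumes a toy linear softmax classifier (hypothetical predict and loss_gradient functions) instead of ResNet50. The step size alpha, the number of steps, and the seed are illustrative assumptions and are not the settings used for the submitted runs.</p>
      <preformat preformat-type="code">
```python
import math
import random


def predict(W, b, x):
    """Class probabilities of a toy linear softmax classifier (hypothetical stand-in for ResNet50)."""
    logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_gradient(W, b, x, label):
    """Gradient of the cross-entropy loss J(x, label) with respect to the input x."""
    probs = predict(W, b, x)
    err = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(err[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]


def sign(v):
    return (v > 0) - (0 > v)


def pgd_untargeted(W, b, x, label, eps=2 / 255, alpha=0.5 / 255, steps=10, seed=0):
    """Move x away from its true label while staying within eps of the clean image."""
    rng = random.Random(seed)
    lower = [max(c - eps, 0.0) for c in x]
    upper = [min(c + eps, 1.0) for c in x]
    # Random start inside the eps-ball.
    adv = [min(max(c + rng.uniform(-eps, eps), lo), hi) for c, lo, hi in zip(x, lower, upper)]
    for _ in range(steps):
        grad = loss_gradient(W, b, adv, label)
        # Untargeted attack: step along the gradient to raise the loss for the true label.
        adv = [a + alpha * sign(g) for a, g in zip(adv, grad)]
        # Projection back onto the eps-ball intersected with the valid range [0, 1].
        adv = [min(max(a, lo), hi) for a, lo, hi in zip(adv, lower, upper)]
    return adv
```
</preformat>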
      <sec id="sec-1-1">
        <title>Image contrast enhancement &amp; Iterative least-likely class</title>
        <p>
          In this approach, we first enhance the contrast of the images using
the method proposed by [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and then perturb the enhanced images
using the Iterative least-likely class adversarial method [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We
enhance the images first because adversarial perturbation methods reduce the visual
appeal of the images. Enhancing the visual appeal
before applying adversarial perturbations should therefore not only give
better performance on image quality metrics, but may also give
users an incentive to prefer this method over adversarial perturbations alone. In
the image contrast enhancement approach of [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the input image
is fused with a synthetic image, which is obtained by finding the
exposure ratio that best exposes the under-exposed regions of
the original image. The two images are then fused according to
a weight matrix designed using illumination estimation
techniques, and the output is the contrast-enhanced image. To this
enhanced set of images, we then apply the Iterative least-likely
class method with the same parameter values as in the
first approach.
        </p>
        <p>
          In order to compare adversarial image perturbations alone with the
fusion-based approaches above, we use a more powerful variant of
the FGSM method, called Private-Fast Gradient Sign Method (P-FGSM),
recently proposed by [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The values of ϵ and σ used for this method
are set to 8/255 and 0.99, respectively.
        </p>
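        <p>The weighted fusion of the input image with a brighter synthetic exposure, described for the contrast enhancement step in this section, can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of the cited work: the exposure ratio k is fixed instead of being searched for, the synthetic image is a clipped multiple of the input, and the weight is derived from the pixel intensity itself rather than from an illumination estimate. The function name fuse_exposures and the parameters k and mu are hypothetical.</p>
        <preformat preformat-type="code">
```python
def fuse_exposures(image, k=2.0, mu=0.5):
    """Blend a grayscale image with a brightened copy, pixel by pixel.

    image: 2-D list of intensities in [0, 1].
    k:     assumed fixed exposure ratio for the synthetic image.
    mu:    assumed exponent that turns intensity into a fusion weight.
    """
    fused = []
    for row in image:
        out_row = []
        for p in row:
            synthetic = min(1.0, k * p)  # crude stand-in for the exposure-adjusted image
            w = p ** mu                  # crude stand-in for the illumination-based weight
            # Well-exposed pixels (w near 1) keep the input value;
            # dark pixels (w near 0) take most of their value from the synthetic image.
            out_row.append(w * p + (1.0 - w) * synthetic)
        fused.append(out_row)
    return fused
```
</preformat>
        <p>In this sketch, a dark pixel of intensity 0.04 is brightened to about 0.072, whereas a fully exposed pixel of intensity 1.0 is left unchanged.</p>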
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Euclidean transformations</title>
      <p>
        Inspired by the center crop and random crop operations used in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
to fool the classifier, we explore other simple geometric
operations on images, which are often overlooked in favor of
adversarial attacks. We consider two basic Euclidean
transformations, namely image translation and rotation. To choose the
translation and rotation values that best fool the classifier, we use
the robust optimization method proposed by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] instead of a
computationally expensive grid search. For the majority of the images,
we constrain the translation to within 20% of the image size in each
spatial direction and the rotation to at most 20°, and fill the resulting empty
image regions with zero pixel values.
      </p>
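      <p>The search over translations and rotations can be sketched as a random search that keeps the worst case for the classifier among k sampled transformations. This is one plausible reading of the procedure and is an assumption, since the text does not spell out the search. The function true_class_prob is a hypothetical callable that returns the classifier's probability for the correct scene class, images are 2-D lists of grayscale values, and nearest-neighbour sampling is used for brevity.</p>
      <preformat preformat-type="code">
```python
import math
import random


def transform(image, dx, dy, angle_deg):
    """Rotate about the image centre, then translate by (dx, dy) pixels; gaps become 0."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: undo the translation, then the rotation, to find the source pixel.
            u, v = x - dx - cx, y - dy - cy
            sx = int(round(c * u + s * v + cx))
            sy = int(round(-s * u + c * v + cy))
            if sx in range(w) and sy in range(h):
                out[y][x] = image[sy][sx]
    return out


def worst_of_k(image, true_class_prob, k=10, max_shift=0.2, max_angle=20.0, seed=0):
    """Sample k random transformations and keep the one the classifier handles worst."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    best_img, best_p = image, true_class_prob(image)
    for _ in range(k):
        dx = rng.uniform(-max_shift, max_shift) * w   # at most 20% of the width by default
        dy = rng.uniform(-max_shift, max_shift) * h   # at most 20% of the height by default
        angle = rng.uniform(-max_angle, max_angle)    # at most 20 degrees by default
        candidate = transform(image, dx, dy, angle)
        p = true_class_prob(candidate)
        if best_p > p:
            best_img, best_p = candidate, p
    return best_img, best_p
```
</preformat>
      <p>Each candidate requires one forward pass of the classifier, so the cost grows linearly with k.</p>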
    </sec>
    <sec id="sec-3">
      <title>RESULTS AND EVALUATION</title>
      <p>
        In the Pixel Privacy task of the MediaEval 2019 workshop, the
participants are allowed to submit five runs for the task, which
are evaluated on the basis of top-1 classification accuracy (lower is
better) and NIMA score [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (higher is better), as shown in Table 1.
Figure 1 shows the original image, the images transformed by the
different approaches, and the corresponding top-5 class predictions.
      </p>
      <p>
        Fusion-based approaches: CartoonGAN + the
Iterative least-likely class adversarial method performs well in terms of
top-1 accuracy, but it has the worst NIMA score (4.37)
among all runs. CartoonGAN + the PGD adversarial method has the
best NIMA score (4.77) among all runs, but a considerably higher
classifier accuracy of 14%, although this is still below 50%.
      </p>
      <p>
        For the Contrast Enhancement + Iterative least-likely class run, we
obtain the lowest top-1 accuracy (0%) and a NIMA score of 4.47. The images
enhanced using the contrast enhancement method [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] look visually
more appealing to the naked eye; however, the NIMA score after
applying contrast enhancement alone is still slightly lower than that
of the clean images, which is unexpected.
      </p>
      <sec id="sec-3-1">
        <title>Private-FGSM adversarial attack</title>
        <p>The Private-FGSM attack reduces
the top-1 classifier accuracy to 0%, at the cost of added noise in the
submitted images, which is reflected in the reduced NIMA score of
4.49. The Private-FGSM attack and the Iterative least-likely
class method used earlier are both bounded in the l∞ norm, which results in small
noise spread evenly over the image, as can be seen by zooming in on
the transformed images in Figure 1.</p>
        <p>Euclidean transformations: The final run, which consists of
Euclidean transformations (translation and rotation operations),
achieves 6.667% top-1 classifier accuracy with a reasonable NIMA
score of 4.42. Finding the translation and
rotation values that best fool the classifier is computationally expensive for each image
because of the number of random transformations, so we test this
approach on a smaller subset of the test dataset, called test_manual, which consists of 60 images
and is provided by the task organizers.</p>
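        <p>The two evaluation measures used in this section can be computed as in the following sketch. Top-1 accuracy is the percentage of images whose predicted class matches the true class, and the NIMA score is the mean of a predicted distribution over the ratings 1 to 10. The function names and the example inputs are hypothetical.</p>
        <preformat preformat-type="code">
```python
def top1_accuracy(predicted, actual):
    """Percentage of images whose top-1 predicted class equals the true class."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * hits / len(actual)


def nima_mean_score(distribution):
    """Mean rating of a 10-bin distribution over the scores 1..10."""
    total = sum(distribution)
    return sum(score * prob for score, prob in enumerate(distribution, start=1)) / total
```
</preformat>
        <p>For example, 4 correctly classified images out of 60 give a top-1 accuracy of 6.667%, and a uniform rating distribution gives a mean score of 5.5.</p>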
      </sec>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION AND OUTLOOK</title>
      <p>In this paper, different approaches have been proposed for the
Pixel Privacy task of the MediaEval 2019 workshop. The fusion-based
approaches, which combine style transfer or image enhancement with
adversarial attacks, are designed to raise the image appeal score
beforehand, because reducing the classifier accuracy through adversarial
perturbations lowers the image appeal score through the addition of noise.</p>
      <p>In future work, we expect that increasing image appeal with state-of-the-art
deep learning based image enhancement methods for image
denoising and color, contrast, and exposure adjustment, and then applying
adversarial perturbations, will yield better results.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Yang</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yu-Kun</given-names>
            <surname>Lai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yong-Jin</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>CartoonGAN: Generative adversarial networks for photo cartoonization</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>9465</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jaeyoung</given-names>
            <surname>Choi</surname>
          </string-name>
          , Martha Larson,
          <string-name>
            <given-names>Xinchao</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gerald</given-names>
            <surname>Friedland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alan</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The geo-privacy bonus of popular photo enhancements</article-title>
          .
          <source>In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM</source>
          ,
          <fpage>84</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Logan</given-names>
            <surname>Engstrom</surname>
          </string-name>
          , Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and
          <string-name>
            <given-names>Aleksander</given-names>
            <surname>Madry</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Exploring the Landscape of Spatial Robustness</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          .
          <fpage>1802</fpage>
          -
          <lpage>1811</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Generative adversarial nets</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>2672</fpage>
          -
          <lpage>2680</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ian J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Jonathon Shlens, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Explaining and harnessing adversarial examples</article-title>
          .
          <source>arXiv preprint arXiv:1412.6572</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Alexey</given-names>
            <surname>Kurakin</surname>
          </string-name>
          , Ian Goodfellow, and
          <string-name>
            <given-names>Samy</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Adversarial examples in the physical world</article-title>
          .
          <source>arXiv preprint arXiv:1607.02533</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Shamsabadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sanchez-Matilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mazzon</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Cavallaro</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Scene Privacy Protection</article-title>
          .
          <source>In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing</source>
          . Brighton, UK.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Zhuoran</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>First Steps in Pixel Privacy: Exploring Deep Learning-based Image Enhancement against Large-Scale Image Inference</article-title>
          .
          <source>In MediaEval.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Zhuoran</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Martha</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Pixel Privacy 2019: Protecting Sensitive Scene Information in Images</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2019 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Aleksander</given-names>
            <surname>Madry</surname>
          </string-name>
          , Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Vladu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Towards deep learning models resistant to adversarial attacks</article-title>
          .
          <source>arXiv preprint arXiv:1706.06083</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Tribhuvanesh</given-names>
            <surname>Orekondy</surname>
          </string-name>
          , Bernt Schiele, and
          <string-name>
            <given-names>Mario</given-names>
            <surname>Fritz</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Towards a visual privacy advisor: Understanding and predicting privacy risks in images</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <fpage>3686</fpage>
          -
          <lpage>3695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Hossein</given-names>
            <surname>Talebi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peyman</given-names>
            <surname>Milanfar</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>NIMA: Neural image assessment</article-title>
          .
          <source>IEEE Transactions on Image Processing 27</source>
          ,
          <issue>8</issue>
          (
          <year>2018</year>
          ),
          <fpage>3998</fpage>
          -
          <lpage>4011</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Zhenqiang</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ge</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yurui</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ronggang</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wenmin</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A new image contrast enhancement algorithm using exposure fusion framework</article-title>
          .
          <source>In International Conference on Computer Analysis of Images and Patterns</source>
          . Springer,
          <fpage>36</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Bolei</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba.
          <year>2017</year>
          .
          <article-title>Places: A 10 million Image Database for Scene Recognition</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>