<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HCMUS at Pixel Privacy 2020: Quality Camouflage with Back Propagation and Image Enhancement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Minh-Khoi Pham</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hai-Tuan Ho-Nguyen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Trong-Thang Pham</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hung Vinh Tran</string-name>
          <email>tvhung@selab.hcmus.edu.vn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hai-Dang Nguyen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>John von Neumann Institute</institution>
          ,
          <addr-line>VNU-HCM</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Science</institution>
          ,
          <addr-line>VNU-HCM</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh city</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>As the need to share moments grows, more high-quality photos appear on the Internet. It therefore becomes more likely that shared photos will be used by someone else for purposes the owner does not want. If the target photos are high-quality, the attacker may use some criterion to assess image quality, such as a Blind Image Quality Assessment (BIQA) classifier. Pixel Privacy 2020 aims to tackle this problem. For this challenge, we implemented methods that combine image enhancement with an end-to-end attack. The final results show that all of our approaches successfully fool the BIQA model. In particular, our best run results in 84 photos being chosen as top-3 most attractive while maintaining 100% attack accuracy on the assessment model, and it outperforms all other submissions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>With the recent rapid development of social networks, the need
for sharing images has also increased. In response, smartphones come with more
high-end cameras, which leads to higher image quality. These
images, intended to be shared with friends, could be exploited
by attackers to obtain private data. For example, an attacker could use
image quality as a filter to pick out honeymoon photos. Therefore,
we consider the Pixel Privacy task vital in this age of connection.</p>
      <p>
        In Pixel Privacy 2020 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we are given a set of images that were
rated as high quality by the BIQA [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] model. Our goal is to fool
the BIQA model so that it rates the modified image
as low quality while the image remains attractive to the human eye.
More specifically, this BIQA model was trained on the KonIQ-10k
data set [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and the given images come from the Places365 validation
set. Each output image is JPEG-compressed
(ratio = 90%) before being evaluated.
      </p>
      <p>We propose three approaches. The first is a vanilla end-to-end
approach based on an image-to-image network. In this approach, we aim to learn a single
network that both enhances image quality and protects that quality
from being assessed by the BIQA model. To stay flexible in
the choice of image enhancement method, we also propose a two-stage
approach. The first stage enhances image quality, and we
experiment with multiple enhancement methods. The
second stage camouflages the enhanced image’s quality. Three
of our runs, namely Pillow, Cartoonization, and Retouch, follow this
approach. For the last approach, we assume that a sufficiently good enhancement
model preserves the attributes of its input image, which in
this case include its protection from the BIQA model. The End-to-End with
I-FGSM run applies this approach.</p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
    </sec>
    <sec id="sec-3">
      <title>Vanilla End-to-End</title>
      <p>In this run, we use an Image-to-Image network to reconstruct the
input image; then, we forward the reconstructed result to the BIQA
regressor.</p>
      <p>
        We simply choose the U-Net[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] model as our main network
because it is one of the most popular baseline models for image-to-image
problems and is simple to implement.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Two-Stage Approaches</title>
      <sec id="sec-4-1">
        <title>2.2.1 Attack Algorithm.</title>
        <p>
          In these approaches, we utilize the Iterative Fast Gradient Sign
Method (I-FGSM) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to perform a white-box attack on the BIQA
model after the images are enhanced. Since the BIQA model is a regression
model, we use an L2 loss function instead of the cross-entropy loss, as
in Section 2.1.
        </p>
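        <p>Our modified I-FGSM is described as follows, where <inline-formula><tex-math>X</tex-math></inline-formula> is the input
image, <inline-formula><tex-math>y</tex-math></inline-formula> is the BIQA score of <inline-formula><tex-math>X</tex-math></inline-formula>, <inline-formula><tex-math>y'</tex-math></inline-formula> is the attacking score, and <inline-formula><tex-math>J(X, y, y')</tex-math></inline-formula>
is the L2 cost function of the neural network, given image <inline-formula><tex-math>X</tex-math></inline-formula>, score <inline-formula><tex-math>y</tex-math></inline-formula>
and attacking score <inline-formula><tex-math>y'</tex-math></inline-formula>, measuring the distance between <inline-formula><tex-math>y</tex-math></inline-formula> and <inline-formula><tex-math>y'</tex-math></inline-formula>:
<disp-formula><label>(1)</label><tex-math>X_{0} = X</tex-math></disp-formula>
<disp-formula><label>(2)</label><tex-math>X_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X_{N} + \alpha \, \mathrm{sign}\left( \nabla_{X} J(X_{N}, y, y') \right) \right\}</tex-math></disp-formula>
Given <inline-formula><tex-math>y_{N}</tex-math></inline-formula>, the predicted score on <inline-formula><tex-math>X_{N}</tex-math></inline-formula>, we iteratively add the
perturbation to <inline-formula><tex-math>X</tex-math></inline-formula> until <inline-formula><tex-math>y_{N}</tex-math></inline-formula> becomes smaller than the score <inline-formula><tex-math>y'</tex-math></inline-formula>. For all the runs
below, we find that setting <inline-formula><tex-math>\epsilon = 0.05</tex-math></inline-formula>, <inline-formula><tex-math>\alpha = 0.05</tex-math></inline-formula> and <inline-formula><tex-math>y' = 30</tex-math></inline-formula> gives
desirable results in most cases.</p>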
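        <p>The iterative update described above can be illustrated on a toy problem. The following is a minimal sketch, not our actual implementation: the linear scorer, its weights, and the four-pixel image are hypothetical stand-ins for the BIQA model and a real photo. The sketch also assumes that each step is taken in the direction that lowers the L2 cost, so that the predicted score moves toward the attacking score; how this sign is handled inside the cost function or the step size is an assumption here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Toy I-FGSM against a regression "quality" model (pure Python, no dependencies).
# The linear scorer and its weights are hypothetical stand-ins for the BIQA model.


def predict(pixels, weights, bias):
    """Hypothetical linear quality regressor: score = w . x + b."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def ifgsm(pixels, weights, bias, target=30.0, alpha=0.05, eps=0.05, max_iter=100):
    """Perturb pixels (in [0, 1]) until the predicted score drops below target."""
    adv = list(pixels)
    for _ in range(max_iter):
        score = predict(adv, weights, bias)
        if score < target:  # stop once the score is smaller than the attacking score
            break
        # Gradient of the L2 cost J = (score - target)^2 with respect to each pixel.
        grad = [2.0 * (score - target) * w for w in weights]
        # Assumed sign convention: step so that J decreases.
        stepped = [a - alpha * sign(g) for a, g in zip(adv, grad)]
        # Clip to the eps-ball around the original image and to the range [0, 1].
        adv = [
            min(max(s, p - eps, 0.0), p + eps, 1.0)
            for s, p in zip(stepped, pixels)
        ]
    return adv


image = [0.8, 0.6, 0.9, 0.7]  # hypothetical 4-pixel image
weights, bias = [60.0, 50.0, 40.0, 30.0], -100.0
adversarial = ifgsm(image, weights, bias)
print(predict(image, weights, bias), predict(adversarial, weights, bias))
```
]]></preformat>
        <p>With the step size equal to the clipping radius, as in our setting, a single iteration already moves every pixel to the boundary of the allowed perturbation; a smaller step size would spread the same budget over several iterations.</p>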
      </sec>
      <sec id="sec-4-2">
        <title>2.2.2 Image Enhancement algorithm.</title>
        <p>Pillow: We use several image enhancement operations
provided by Pillow: we adjust color balance by a factor of 1.5<sup>1</sup>,
sharpness by 3.0<sup>1</sup>, brightness by 1.0<sup>1</sup> and contrast by 1.5<sup>1</sup>. We apply the
same configuration for all images in the data set.</p>
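        <p>To make the meaning of these factors concrete: Pillow's ImageEnhance classes (Color, Sharpness, Brightness and Contrast) each take a factor through their enhance method, and a factor of 1.0 returns a copy of the original image. The sketch below is a simplified pure-Python illustration of that idea for brightness and contrast on a hypothetical grayscale pixel list. It is not Pillow's code, and its rounding and clipping details are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Simplified view of an enhancement factor: blend or extrapolate between a
# "degenerate" image and the original. The exact rounding used by Pillow is
# not reproduced here; this is an illustrative assumption.


def apply_factor(original, degenerate, factor):
    """Return degenerate + factor * (original - degenerate), clipped to 0..255."""
    return [
        min(255.0, max(0.0, d + factor * (o - d)))
        for o, d in zip(original, degenerate)
    ]


def brightness(pixels, factor):
    """The degenerate image is black, so the factor scales intensities."""
    return apply_factor(pixels, [0.0] * len(pixels), factor)


def contrast(pixels, factor):
    """The degenerate image is uniform gray at the mean intensity."""
    mean = sum(pixels) / len(pixels)
    return apply_factor(pixels, [mean] * len(pixels), factor)


gray = [100.0, 200.0]  # hypothetical grayscale pixels
print(brightness(gray, 1.0))  # factor 1.0 leaves the image unchanged
print(contrast(gray, 1.5))    # pixels move away from the mean intensity
```
]]></preformat>
        <p>Under this reading, the brightness factor of 1.0 leaves brightness untouched, while the factors above 1.0 push color, sharpness and contrast beyond those of the original image.</p>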
        <p>
          Cartoonization: For this run, we apply a GAN-based White-box
Cartoonization method[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] to convert input images to cartoon
images with styles from Shinkai Makoto, Miyazaki Hayao, and Hosoda
Mamoru films.
        </p>
        <p>
          Retouch: In this run, we also compare a deep learning
"white-box" approach [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] with natural enhancement and a
"black-box" method. This method applies deep reinforcement learning and
a GAN model to produce parameters for traditional image processing
operations that improve image quality.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>End-to-End with I-FGSM</title>
      <p>
        Unlike the previous approaches, this approach
integrates I-FGSM with the enhancement model. In each iteration, we
first feed the image forward through a deep learning enhancement model and the BIQA model,
then backpropagate from the L2 loss to compute the gradient and apply
I-FGSM to the input image. For this particular experiment, we choose
EnlightenGAN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as the enhancement model.
      </p>
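      <p>The forward and backward pass through both models can be sketched on a toy example. This is only an illustration under stated assumptions: a gamma curve stands in for EnlightenGAN, a linear scorer stands in for the BIQA model, the gradient is written out by hand with the chain rule instead of automatic differentiation, and the step is taken in the direction that lowers the L2 loss. All names and numbers are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Toy end-to-end attack: the gradient flows through an enhancement step and a
# quality regressor back to the input image. Both models are hypothetical
# stand-ins (a gamma curve for the enhancer, a linear scorer for BIQA).

GAMMA = 0.5  # assumed brightening curve: enhance(x) = x ** GAMMA


def enhance(pixels):
    """Hypothetical enhancement model applied pixel by pixel."""
    return [p ** GAMMA for p in pixels]


def score(enhanced, weights, bias):
    """Hypothetical linear quality regressor on the enhanced image."""
    return sum(w * e for w, e in zip(weights, enhanced)) + bias


def attack(pixels, weights, bias, target=30.0, alpha=0.05, eps=0.05, max_iter=100):
    """Perturb the input image; return it together with its enhanced version."""
    adv = list(pixels)
    for _ in range(max_iter):
        s = score(enhance(adv), weights, bias)  # forward pass through both models
        if s < target:
            break
        # Backward pass by the chain rule: dJ/dx = dJ/ds * ds/de * de/dx.
        grad = [
            2.0 * (s - target) * w * GAMMA * a ** (GAMMA - 1.0)
            for w, a in zip(weights, adv)
        ]
        stepped = [a - alpha * ((g > 0) - (g < 0)) for a, g in zip(adv, grad)]
        # Keep the input within eps of the original and inside [0.01, 1].
        adv = [
            min(max(t, p - eps, 0.01), p + eps, 1.0)
            for t, p in zip(stepped, pixels)
        ]
    return adv, enhance(adv)


image = [0.64, 0.36, 0.81, 0.49]  # hypothetical 4-pixel input
weights, bias = [60.0, 50.0, 40.0, 30.0], -100.0
adv_input, adv_output = attack(image, weights, bias)
print(score(enhance(image), weights, bias), score(adv_output, weights, bias))
```
]]></preformat>
      <p>The perturbation is applied to the input image, while the score is measured on the enhanced output, which mirrors the assumption that the enhancement model preserves the protection carried by its input.</p>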
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS AND RESULTS</title>
      <p>In Table 1, accuracy (after JPEG 90) is the accuracy
of the BIQA model on the data set after 90% JPEG compression (lower is better).
The number of times selected as “Best” (max. 140) is based on human
expert evaluation. More specifically, the 20 images with the largest BIQA
variance are selected, and 7 human experts choose the best three
runs out of all runs, so a run can be selected at most 20 × 7 = 140 times.
<fn><label>1</label><p>See the Pillow documentation for an explanation of these numbers.</p></fn></p>
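      <p>The bound of 140 and the share obtained by a run follow from simple arithmetic. The short sketch below assumes that each expert can pick a given run at most once per image; the function names are illustrative only, and the value 84 is the count reported for our best run in the abstract.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Upper bound on "Best" selections and the share obtained by one run.
NUM_IMAGES = 20   # images with the largest BIQA variance
NUM_EXPERTS = 7   # human experts


def max_selections(num_images=NUM_IMAGES, num_experts=NUM_EXPERTS):
    """A run can be picked at most once per (image, expert) pair."""
    return num_images * num_experts


def selection_rate(times_selected, num_images=NUM_IMAGES, num_experts=NUM_EXPERTS):
    """Fraction of the possible selections that a run actually received."""
    return times_selected / max_selections(num_images, num_experts)


print(max_selections())
print(selection_rate(84))
```
]]></preformat>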
      <p>
        With the above results, we have successfully fooled BIQA by more
than 50% in all runs. EnlightenGAN proves to be the best method among
our runs, followed by the enhancement based on Pillow’s
traditional image processing approach. This is an interesting result
compared to our result in the previous Pixel Privacy task [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Last year,
traditional methods still outperformed GAN-based approaches, but
this year, our proposed approach combined with a GAN method has
proven to be better than a traditional image processing method.
A possible explanation is that a GAN can adapt its
enhancement to each image, case by case, whereas traditional
image processing algorithms use hard-coded parameters.
All of our approaches are simple but effective enough to fool the BIQA
model while maintaining the high quality of the images.
      </p>
      <p>The image-to-image based method has the benefit that it does not
require paired images for training, and it performs attacks at the
feature level rather than on raw images, unlike the other methods.
Although it gives the worst result among our runs, we still believe
that it can be further investigated and improved.</p>
      <p>The two-stage approaches, whose results are better, still leave
clearly visible noise on the images. I-FGSM also shows its strength
in white-box attacks, on regression models as well as classification models.</p>
      <p>Our newly proposed approach, which combines
End-to-End and I-FGSM, shows both efficiency and effectiveness in
camouflaging and visually enhancing the images.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research is supported by the Vingroup Innovation Foundation (VINIF)
under project code VINIF.2019.DA19.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Vlad</given-names>
            <surname>Hosu</surname>
          </string-name>
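          ,
          <string-name>
            <given-names>Hanhe</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tamas</given-names>
            <surname>Sziranyi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dietmar</given-names>
            <surname>Saupe</surname>
          </string-name>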
          .
          <year>2020</year>
          .
          <article-title>KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          <volume>29</volume>
          (
          <year>2020</year>
          ),
          <fpage>4041</fpage>
          -
          <lpage>4056</lpage>
          . https://doi.org/10.1109/tip.2020.2967829
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Yuanming</given-names>
            <surname>Hu</surname>
          </string-name>
          , Hao He, Chenxi Xu,
          <string-name>
            <given-names>Baoyuan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Exposure: A White-Box Photo Post-Processing Framework</article-title>
          .
          <source>CoRR</source>
          abs/1709.09602 (
          <year>2017</year>
          ). arXiv:1709.09602 http://arxiv.org/abs/1709.09602
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yifan</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen,
          <string-name>
            <given-names>Jianchao</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pan</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhangyang</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>EnlightenGAN: Deep light enhancement without paired supervision</article-title>
          . arXiv preprint arXiv:1906.06972 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Alexey</given-names>
            <surname>Kurakin</surname>
          </string-name>
          , Ian Goodfellow, and
          <string-name>
            <given-names>Samy</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Adversarial examples in the physical world</article-title>
          . (
          <year>2017</year>
          ).
          <source>arXiv:cs.CV/1607.02533</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Zhuoran</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martha</given-names>
            <surname>Larson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Amsaleg</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Exploring Quality Camouflage for Social Images</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Olaf</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Brox</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>U-Net: Convolutional Networks for Biomedical Image Segmentation</article-title>
          . (
          <year>2015</year>
          ).
          <source>arXiv:cs.CV/1505.04597</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Hung Vinh</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Trong-Thang</given-names>
            <surname>Pham</surname>
          </string-name>
          , Hai-Tuan Ho-Nguyen, Hoai-Lam Nguyen-Hy, Xuan-Vy Nguyen, Thang-Long Nguyen-Ho, and
          <string-name>
            <given-names>Minh-Triet</given-names>
            <surname>Tran</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>HCMUS at Pixel Privacy 2019: Scene Category Protection with Back Propagation and Image Enhancement</article-title>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Xinrui</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jinze</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Learning to Cartoonize Using White-Box Cartoon Representations</article-title>
          .
          <source>In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Xin</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Blind image quality assessment</article-title>
          .
          <source>In Proceedings. International Conference on Image Processing</source>
          , Vol.
          <volume>1</volume>
          .
          I-I. https://doi.org/10.1109/ICIP.2002.1038057
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Hang</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Orazio</given-names>
            <surname>Gallo</surname>
          </string-name>
          , Iuri Frosio, and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Kautz</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Loss functions for image restoration with neural networks</article-title>
          .
          <source>IEEE Transactions on Computational Imaging</source>
          <volume>3</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>47</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>