    HCMUS at Pixel Privacy 2019: Scene Category Protection with
           Back Propagation and Image Enhancement
              Hung Vinh Tran∗ , Trong-Thang Pham∗ , Hai-Tuan Ho-Nguyen∗ , Hoai-Lam Nguyen-Hy∗ ,
                        Xuan-Vy Nguyen∗ , Thang-Long Nguyen-Ho∗ , Minh-Triet Tran
                            Faculty of Information Technology, University of Science, VNU-HCM, Vietnam
                    {tvhung,ptthang,nxvy,hnhtuan,nhhlam,nhtlong}@selab.hcmus.edu.vn,tmtriet@fit.hcmus.edu.vn

ABSTRACT
Personal privacy is one of the essential problems in modern society. In some cases, people may not want smart computing systems to automatically identify and reveal their personal information, such as places or habits. This motivates our proposal to protect scene category recognition from photos by back-propagation. To further improve the visual quality and appeal of the output photos, we study and propose various strategies for image enhancement, from traditional approaches to novel GAN-based methods. Our solution successfully fools the Places365 scene classification across the 60 categories while achieving an average NIMA score of up to 5.36.

1 INTRODUCTION
With the rapid development of computer vision and new machine learning methods, computers can now understand the content of images: which objects an image contains, who appears in it, what is happening, or the scene's category and attributes. This provides the foundation for intelligent interactive systems such as smart homes and self-driving cars. It can, however, also raise potential risks of personal privacy violation. A person might not want other people to know where he or she is, and important facilities such as hospitals or military camps should not be automatically recognizable either. This motivates the Pixel Privacy problem: preventing automatic systems from recognizing scene categories in images [6, 7].

In the Pixel Privacy 2019 [7] task, we are given images from 60 privacy-sensitive categories, such as hospital or bedroom. Our goal is to modify the given images so that the given automatic system (a ResNet50 [3] trained on the Places365-Standard [10] dataset) can no longer correctly recognize them. Such modification, however, may degrade the output images significantly. To measure this, we use Neural Image Assessment (NIMA) to automatically evaluate our output images.

We propose a method that attacks the ResNet50 model using back-propagation. For each input image, we consider the logit of its target class as a function of the input image's pixels, which allows back-propagation to minimize this logit by modifying the input image. We also use several methods to enhance the input images' quality before the protection phase: natural enhancement with Dynamic Histogram Equalization [1] combined with a saliency mask generated by Cascaded Partial Decoder [9], and GAN-based approaches such as CartoonGAN [2], DPED [5], Retouch [4], and Texture. Our experiments successfully fool the Places365 scene classification while achieving an average NIMA score of up to 5.36.

The content of our report is as follows. In Section 2, we present our method for protecting the scene category and enhancing images with different approaches. Experimental results are given in Section 3. Conclusions and future work are discussed in Section 4.

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'19, 29-31 October 2019, Sophia Antipolis, France

2 METHOD
Our goal is to protect the scene category of the input image while producing an output of higher quality as measured by the NIMA [8] score. To meet both goals, we first enhance the input image with one of several methods and then apply our proposed protection method to the enhanced image. We run the protection method last to guarantee our main objective, the privacy of users.

2.1 Protection algorithm
To protect the image's location information, we need to modify the image so that the predicted class becomes different from the ground truth. In other words, we need to reduce the output probability of the ground-truth class.

With this intuition in mind, we treat the output probability of the ground-truth class as the target function f to minimize with gradient descent, and the input image as the parameter θ to optimize. Formally, let θ be the input image, θ′ the modified image, and f the model we want to fool. Our main goal is to modify the input image:

    θ′ = θ + ϵ

so that

    f(θ′) ≠ f(θ),

where ϵ is obtained by back-propagating through the network encoding in Figure 1.

[Figure 1 shows the input image passed through the frozen network encoding to obtain category scores; the loss on the ground-truth class is backpropagated to the input image, which is then modified accordingly.]

Figure 1: Protection Algorithm
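The optimization above can be sketched with a small numpy-only toy, where a linear classifier stands in for the frozen ResNet50, and the step size and iteration count are illustrative assumptions rather than the values used in our runs:

```python
import numpy as np

def protect(theta, W, true_class, step=0.1, iters=50):
    """Treat the true-class logit as a function of the input pixels and
    minimize it by gradient descent on theta; the weights W stay frozen."""
    theta = theta.copy()
    for _ in range(iters):
        # For logits = W @ theta, d(logit of true_class)/d(theta) = W[true_class],
        # so descending on that logit means subtracting the gradient from the image.
        theta -= step * W[true_class]
    return theta

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16))          # toy model: 5 "scene categories", 16 "pixels"
x = rng.normal(size=16)               # toy input "image"
true_class = int(np.argmax(W @ x))    # class the frozen model currently predicts
x_adv = protect(x, W, true_class)
assert int(np.argmax(W @ x_adv)) != true_class   # prediction no longer matches
```

On real images, f is the frozen ResNet50 and the gradient with respect to the pixels is obtained by back-propagation through the network; clipping θ′ back to the valid pixel range is also needed.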


2.2 Image Enhancement algorithm
   2.2.1 Natural Enhancement. For the first run, we improve image quality with traditional computer graphics methods. We start with the Dynamic Histogram Equalization [1] algorithm to enhance each image's contrast and brightness. We also adjust the saturation of the images to emphasize the main objects. To do so, we use Cascaded Partial Decoder [9] to generate a saliency mask, which marks important objects with high values, and then smooth this mask with a Gaussian blur. Finally, we modify the saturation value of each pixel:

    y = (x − α)/β + γ,  if x ≥ α
    y = ω,              otherwise

In this formula, y is the resulting saturation value and x is the blurred saliency value at the same pixel.
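A minimal numpy sketch of this saliency-guided saturation mapping (a 3x3 box blur stands in for the Gaussian blur, and the constants alpha, beta, gamma, omega are illustrative assumptions, not our tuned values):

```python
import numpy as np

def box_blur3(mask):
    """3x3 box blur: a simple stand-in for the Gaussian blur of the mask."""
    p = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def new_saturation(saliency, alpha=0.5, beta=2.0, gamma=0.6, omega=0.4):
    """Piecewise mapping from blurred saliency x to saturation y:
    y = (x - alpha) / beta + gamma  if x >= alpha  (salient: boosted),
    y = omega                       otherwise      (background: muted)."""
    x = box_blur3(saliency)
    return np.clip(np.where(x >= alpha, (x - alpha) / beta + gamma, omega), 0.0, 1.0)

saliency = np.zeros((8, 8))
saliency[2:6, 2:6] = 1.0                 # a salient square in the middle
sat = new_saturation(saliency)
assert sat[4, 4] > sat[0, 0]             # salient region gets higher saturation
```

The resulting channel would replace the S channel of the HSV image before converting back to RGB.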
   2.2.2 Style Transfer. For this run, we apply a style transfer-based approach to improve the images' appeal. In this particular case, we run CartoonGAN [2] to render images in certain styles, such as Hayao, Hosoda, and Shinkai.
   2.2.3 GAN Enhancement. In this run, we improve image quality with the GAN-based method proposed in [5]. This model converts a photo taken by a phone into a DSLR-quality photo and, in consequence, improves its color and texture quality.
   2.2.4 Retouch. In this run, we assume that evaluation with the NIMA score behaves like the human visual system. We enhance the images with [4], which applies deep reinforcement learning and a GAN model to obtain the best image enhancement an AI agent can create.
   2.2.5 Texture. In this run, we hypothesize that any natural image has an average NIMA score between 3 and 4.5. We validated this hypothesis by computing the NIMA score of several randomly created images, which indeed obtain fairly high scores. With that in mind, we blend noise images into the original images to emphasize details usually ignored by the NIMA model, while keeping the scores above 3. We try two ways to blend the noise images:
   Way 1: x′ = x ⊙ (αϵ), where x and x′ are the input and output images, respectively; ϵ is the crafted noise image; and α is the coefficient with which the crafted image is blended into the original.
   Way 2: x′ = x + αϵ, where x and x′ are the V channels (as in HSV) of the original image and the result image, respectively; ϵ is the crafted noise image; and α is the coefficient with which the crafted image is blended into the original.
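Both blending schemes can be sketched in numpy. The noise distributions and coefficients below are illustrative assumptions; for Way 1 we draw ϵ near 1 so the elementwise product x ⊙ (αϵ) stays in a sensible range:

```python
import numpy as np

rng = np.random.default_rng(0)

def blend_way1(x, eps, alpha=1.0):
    """Way 1: x' = x * (alpha * eps), an elementwise (Hadamard) blend."""
    return np.clip(x * (alpha * eps), 0.0, 1.0)

def blend_way2(v, eps, alpha=0.1):
    """Way 2: x' = x + alpha * eps, applied to the V (value) channel of HSV."""
    return np.clip(v + alpha * eps, 0.0, 1.0)

img = rng.uniform(size=(4, 4, 3))                 # toy RGB image in [0, 1]
speckle = rng.normal(1.0, 0.05, size=img.shape)   # multiplicative noise near 1
out1 = blend_way1(img, speckle)

v = img.max(axis=2)                   # V channel of HSV is the max over R, G, B
out2 = blend_way2(v, rng.uniform(size=v.shape))
```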
3 EXPERIMENTS AND RESULTS
Most of our experiments were conducted on Google Colab.

Table 1: Official evaluation result (provided by organizers)

    Method                      Top-1 Accuracy    NIMA Score
    Original Image                                   4.64
    hcmus_naturalenhancement          0              5.14
    hcmus_retouch                     0              4.71
    hcmus_ganenhancement              0              4.84
    hcmus_cartoongan                  0              4.96
    hcmus_texture                     0              5.36

In the table above, the "Top-1 Accuracy" column shows the prediction accuracy of the attack model (ResNet50 [3] trained on the Places365-Standard data set), so lower is better. The "NIMA Score" column reports the mean aesthetics score of our runs; a higher NIMA score is better.

As the results show, all five of our runs perfectly protected the scene category of the images from the attack model. From our observations, most GAN-based methods cannot reach the same level of naturalness as the traditional computer graphics approach. The Texture method currently has the highest score (5.36); moreover, even without the protection method, the Texture method alone protects more than two thirds of the dataset.

However, sophisticated approaches such as Texture or CartoonGAN transform images excessively, and the output images no longer have a natural appearance. This is why we also propose traditional enhancement methods such as saturation modification and the DHE algorithm: Natural Enhancement not only preserves the original look of the images but also achieves the second-highest score.

[Figure 2 shows sample outputs with the category predicted by the attack model: (a) Original, predicted Hospital; (b) Natural, predicted Beer Hall; (c) GAN, predicted Pub; (d) Retouch, predicted Pub; (e) Cartoon GAN, predicted Nursing home; (f) Texture, predicted Catacomb.]

Figure 2: Sample output

4 CONCLUSION AND FUTURE WORKS
We propose a simple yet effective approach for the Pixel Privacy problem. Our method uses back-propagation to modify certain pixels of the input image while freezing all intermediate modules of the attack model. This method could be extended to categories beyond scene recognition, for example vehicle detection or human tracking. We also propose and apply five methods to enhance the input images, from new methods, namely CartoonGAN (style transfer) and the Texture effect, to traditional image enhancement methods. All of them produce images with a higher average NIMA score than the original images.

ACKNOWLEDGMENTS
Research is supported by Vingroup Innovation Foundation (VINIF) in project code VINIF.2019.DA19. We would like to thank AIOZ Pte Ltd for supporting our team with computing infrastructure.


REFERENCES
 [1] Mohammad Abdullah-Al-Wadud, Md Hasanul Kabir, M Ali Akber Dewan, and Oksam Chae. 2007. A dynamic histogram equalization for image contrast enhancement. IEEE Transactions on Consumer Electronics 53, 2 (2007), 593–600.
 [2] Yang Chen, Yu-Kun Lai, and Yong-Jin Liu. 2018. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 [3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
 [4] Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, and Stephen Lin. 2017. Exposure: A White-Box Photo Post-Processing Framework. CoRR abs/1709.09602 (2017). arXiv:1709.09602 http://arxiv.org/abs/1709.09602
 [5] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. 2017. DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision.
 [6] Martha Larson, Zhuoran Liu, Simon Brugman, and Zhengyu Zhao. 2018. Pixel Privacy: Increasing Image Appeal while Blocking Automatic Inference of Sensitive Scene Information. In Working Notes Proceedings of the MediaEval 2018 Workshop.
 [7] Zhuoran Liu, Zhengyu Zhao, and Martha Larson. 2019. Pixel Privacy 2019: Protecting Sensitive Scene Information in Images. In Working Notes Proceedings of the MediaEval 2019 Workshop.
 [8] Hossein Talebi and Peyman Milanfar. 2018. NIMA: Neural Image Assessment. IEEE Transactions on Image Processing 27, 8 (2018), 3998–4011.
 [9] Zhe Wu, Li Su, and Qingming Huang. 2019. Cascaded Partial Decoder for Fast and Accurate Salient Object Detection. CoRR abs/1904.08739 (2019). arXiv:1904.08739 http://arxiv.org/abs/1904.08739
[10] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 Million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).