=Paper=
{{Paper
|id=Vol-2670/MediaEval_19_paper_16
|storemode=property
|title=HCMUS at Pixel Privacy 2019: Scene Category Protection with Back Propagation and Image Enhancement
|pdfUrl=https://ceur-ws.org/Vol-2670/MediaEval_19_paper_16.pdf
|volume=Vol-2670
|authors=Hung Vinh Tran,Trong-Thang Pham,Hai-Tuan Ho-Nguyen,Hoai-Lam Nguyen-Hy,Xuan-Vy Nguyen,Thang-Long Nguyen-Ho,Minh-Triet Tran
|dblpUrl=https://dblp.org/rec/conf/mediaeval/TranPHNNNT19
}}
==HCMUS at Pixel Privacy 2019: Scene Category Protection with Back Propagation and Image Enhancement==
HCMUS at Pixel Privacy 2019: Scene Category Protection with Back Propagation and Image Enhancement

Hung Vinh Tran, Trong-Thang Pham, Hai-Tuan Ho-Nguyen, Hoai-Lam Nguyen-Hy, Xuan-Vy Nguyen, Thang-Long Nguyen-Ho, Minh-Triet Tran
Faculty of Information Technology, University of Science, VNU-HCM, Vietnam
{tvhung,ptthang,nxvy,hnhtuan,nhhlam,nhtlong}@selab.hcmus.edu.vn, tmtriet@fit.hcmus.edu.vn

ABSTRACT

Personal privacy is one of the essential problems in modern society. In some cases, people may not want smart computing systems to automatically identify and reveal their personal information, such as places or habits. This motivates our proposal to protect against scene category recognition in photos by back-propagation. To further improve the visual quality and appeal of the output photos, we study and propose various strategies for image enhancement, from traditional approaches to novel GAN-based methods. Our solution successfully fools the Places365 scene classifier across 60 categories while achieving an average NIMA score of up to 5.36.

1 INTRODUCTION

With the rapid development of computer vision and new machine learning methods, computers can now understand the content of images: which objects are in an image, who is in it, what is happening, or the scene's category and attributes. This provides the foundation for intelligent interactive systems such as smart homes and self-driving cars. It also, however, raises potential risks of personal privacy violation. A person might not want other people to know where he or she is, and important facilities like hospitals or military camps should not be automatically recognizable either. This motivates the Pixel Privacy problem: preventing automatic systems from recognizing scene categories in images [6, 7].

In the Pixel Privacy 2019 [7] task, we are given images in 60 privacy-sensitive categories such as hospital or bedroom. Our goal is to modify the given images so that the given automatic system (a ResNet50 [3] trained on the Places365-Standard [10] dataset) can no longer correctly recognize them. This, however, may degrade the output images significantly. To prevent this, we use Neural Image Assessment (NIMA) [8] to automatically evaluate our output images.

We propose a method to attack the ResNet50 model using back-propagation. For each input image, we treat the logit of its target class as a function of the input image's pixels, which allows back-propagation to minimize this logit by modifying the input image. We also use several methods to enhance the input images' quality before feeding them through the protection phase: natural enhancement with Dynamic Histogram Equalization [1] combined with a saliency mask generated by Cascaded Partial Decoder [9], and GAN-based approaches such as CartoonGAN [2], DPED [5], Retouch [4], and Texture. Our experiments successfully fool the Places365 scene classifier while achieving an average NIMA score of up to 5.36.

The rest of this report is organized as follows. In Section 2, we present our methods for protecting the scene category and enhancing the image. Experimental results are in Section 3. Conclusions and future work are discussed in Section 4.

2 METHOD

Our goal is to protect the scene category of the input image while producing output of higher quality according to the NIMA [8] score. To meet both goals, we first enhance the input image with one of several methods and then apply our proposed protection method to the enhanced image. We put the protection method last to ensure the success of our main objective, the privacy of users.

2.1 Protection algorithm

To protect the image's location information, we need to modify the image so that the output class differs from the ground truth. In other words, we need to reduce the output probability of the ground-truth class.

With this intuition in mind, we take the output probability of the ground-truth class as the target function f to minimize with gradient descent, and the input image as the parameter θ to optimize. Formally, let θ be the input image, θ′ the modified image, and f the model we want to fool. Our main goal is to modify the input image:

θ′ = θ + ϵ

so that

f(θ′) ≠ f(θ)

where ϵ is obtained by back-propagating through the network encoding, as illustrated in Fig. 1.

[Figure 1: Protection Algorithm. The input image passes through the frozen network encoding to produce the categories' scores; the loss at the ground-truth label is back-propagated to modify the input image while all network modules stay frozen.]
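To make the protection step concrete, the following is a minimal PyTorch sketch of the idea, not the authors' exact code: the checkpoint loading, step count, learning rate, and early-stopping rule are our own assumptions, since the paper does not specify them.

```python
import torch
from torchvision import models

def protect(image, true_class, model, steps=50, lr=0.01):
    """Minimize the ground-truth logit by back-propagating to the pixels.

    image:      input tensor of shape (1, 3, H, W), values in [0, 1]
    true_class: index of the ground-truth scene category
    model:      frozen classifier f to fool
    NOTE: input normalization (mean/std) is omitted for brevity.
    """
    theta = image.clone().requires_grad_(True)       # pixels are the parameters
    optimizer = torch.optim.SGD([theta], lr=lr)

    for _ in range(steps):
        logits = model(theta)
        if logits.argmax(dim=1).item() != true_class:
            break                                    # f(theta') != f(theta): done
        optimizer.zero_grad()
        logits[0, true_class].backward()             # target function: ground-truth logit
        optimizer.step()                             # theta' = theta + epsilon
        theta.data.clamp_(0.0, 1.0)                  # keep a valid image

    return theta.detach()

# The attack model stays frozen: only the input pixels are modified.
model = models.resnet50(num_classes=365)             # Places365 weights assumed available
model.eval()
for p in model.parameters():
    p.requires_grad_(False)
```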
2.2 Image Enhancement algorithms

2.2.1 Natural Enhancement. For the first run, we improve image quality with traditional computer graphics methods. We start with the Dynamic Histogram Equalization [1] algorithm to enhance each image's contrast and brightness. We also adjust the saturation of images to emphasize the main objects. We use Cascaded Partial Decoder [9] to generate a saliency mask that assigns high values to important objects, and then feed the saliency mask through a Gaussian blur to get a new mask. Finally, we modify the saturation value of each pixel:

y = x^β − α + γ,  if x ≥ α
y = ω,            otherwise

In this formula, y is the saturation value and x is the new (blurred) saliency value at the same pixel.
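Here is an illustrative OpenCV/NumPy sketch of this saturation step. It assumes the saliency mask from Cascaded Partial Decoder has already been computed and normalized to [0, 1]; the constants α, β, γ, ω and the exact way y is applied to the S channel are not given in the paper, so the values and the scaling interpretation below are assumptions.

```python
import cv2
import numpy as np

def natural_saturation(image_bgr, saliency, alpha=0.3, beta=1.2, gamma=0.5, omega=0.2):
    """Saliency-guided saturation adjustment (illustrative constants)."""
    # Smooth the raw saliency mask with a Gaussian blur to obtain the new mask x.
    x = cv2.GaussianBlur(saliency.astype(np.float32), (15, 15), 0)

    # Piecewise rule from the text:
    #   y = x**beta - alpha + gamma  if x >= alpha   (salient regions)
    #   y = omega                    otherwise       (background)
    y = np.where(x >= alpha, np.power(x, beta) - alpha + gamma, omega)

    # Apply y as a per-pixel scale on the S channel in HSV (one possible reading).
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * y, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```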
2.2.2 Style Transfer. For this run, we apply a style-transfer-based approach to improve the images' appeal. In particular, we run CartoonGAN [2] to render images in certain styles, such as Hayao, Hosoda, and Shinkai.

2.2.3 GAN Enhancement. In this run, we try to improve image quality with the method proposed in [5], a GAN-based algorithm whose purpose is to convert a photo taken by a phone into a DSLR-quality photo, consequently improving its color and texture quality.

2.2.4 Retouch. In this run, we assume that evaluation by NIMA score behaves like the human visual system. We enhance the image with [4], which applies deep reinforcement learning and a GAN model to produce the best-quality image enhancement created by an AI agent.

2.2.5 Texture. In this run, we hypothesize that any natural image has an average NIMA score between 3 and 4.5. We attempted to validate this hypothesis by computing NIMA scores for several randomly created images, which yielded fairly high scores. With that in mind, we blend noise images into the original images to emphasize details usually ignored by the NIMA model, while keeping the scores above 3. We try two ways to blend the noise images (see the sketch below):

Way 1: x′ = x ⊙ (αϵ), where x and x′ are the input and output images, respectively; ϵ is the crafted noise image; α is the coefficient with which the crafted image is blended into the original.

Way 2: x′ = x + αϵ, where x and x′ are the V (as in HSV) channels of the original image and the result image, respectively; ϵ is the crafted noise image; α is the coefficient with which the crafted image is blended into the original.
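The following NumPy/OpenCV sketch illustrates the two blending variants. How the noise image ϵ is crafted and the value of α are not specified in the paper, so the uniform random noise and coefficients below are assumptions.

```python
import cv2
import numpy as np

def blend_way1(image_bgr, eps, alpha=1.0):
    """Way 1: x' = x (.) (alpha * eps), an element-wise (Hadamard) product."""
    out = image_bgr.astype(np.float32) * (alpha * eps)
    return np.clip(out, 0, 255).astype(np.uint8)

def blend_way2(image_bgr, eps, alpha=0.1):
    """Way 2: x' = x + alpha * eps, applied to the V channel in HSV."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 2] = np.clip(hsv[..., 2] + alpha * eps * 255.0, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

# Illustrative noise, uniform in [0, 1] (the paper's crafting method is not given).
h, w = 480, 640
eps_rgb = np.random.rand(h, w, 3)   # three-channel noise for Way 1
eps_v = np.random.rand(h, w)        # single-channel noise for Way 2
```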
3 EXPERIMENTS AND RESULTS

Most of our experiments were conducted on Google Colab.

Table 1: Official evaluation results (provided by the organizers)

Method                      Top-1 Accuracy    NIMA Score
Original Image              –                 4.64
hcmus_naturalenhancement    0                 5.14
hcmus_retouch               0                 4.71
hcmus_ganenhancement        0                 4.84
hcmus_cartoongan            0                 4.96
hcmus_texture               0                 5.36

In the table, the "Top-1 Accuracy" column shows the prediction accuracy of the attack model (ResNet50 [3] trained on the Places365-Standard dataset), so lower is better. The "NIMA Score" column gives the mean aesthetics score of our runs; a higher NIMA score is better.

As the results show, all five of our runs perfectly protected the scene category of the images from the attack model. From our observation, most GAN-based methods cannot reach the same level of naturalness as traditional computer graphics approaches. The Texture method currently has the highest score (5.36); moreover, even without the protection method, the Texture method alone protects more than two thirds of the dataset. However, sophisticated approaches like Texture or CartoonGAN transform images excessively, and the output images no longer look natural. That is why we also propose traditional enhancement methods such as saturation modification and the DHE algorithm: not only does Natural Enhancement preserve the original look, it also achieves the second-highest score. Sample outputs are shown in Fig. 2.

[Figure 2: Sample output images with predicted categories: (a) Original: Hospital; (b) Natural: Beer Hall; (c) GAN: Pub; (d) Retouch: Pub; (e) Cartoon GAN: Nursing Home; (f) Texture: Catacomb.]

4 CONCLUSION AND FUTURE WORKS

We propose a simple yet effective approach for the Pixel Privacy problem. Our method uses back-propagation to modify certain pixels of the input image while freezing all intermediate modules in the attack model. The method could be extended beyond scene categories, for example to vehicle detection or human tracking. We also propose and apply five methods to enhance the input image, from new methods, namely CartoonGAN (style transfer) and the Texture effect, to traditional image enhancement methods. All of them produce images with a higher average NIMA score than the originals.

ACKNOWLEDGMENTS

Research is supported by Vingroup Innovation Foundation (VINIF) in project code VINIF.2019.DA19. We would like to thank AIOZ Pte Ltd for supporting our team with computing infrastructure.

REFERENCES

[1] Mohammad Abdullah-Al-Wadud, Md Hasanul Kabir, M Ali Akber Dewan, and Oksam Chae. 2007. A dynamic histogram equalization for image contrast enhancement. IEEE Transactions on Consumer Electronics 53, 2 (2007), 593–600.
[2] Yang Chen, Yu-Kun Lai, and Yong-Jin Liu. 2018. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[4] Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, and Stephen Lin. 2017. Exposure: A White-Box Photo Post-Processing Framework. CoRR abs/1709.09602 (2017). arXiv:1709.09602 http://arxiv.org/abs/1709.09602
[5] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. 2017. DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision.
[6] Martha Larson, Zhuoran Liu, Simon Brugman, and Zhengyu Zhao. 2018. Pixel Privacy: Increasing Image Appeal while Blocking Automatic Inference of Sensitive Scene Information. In Working Notes Proceedings of the MediaEval 2018 Workshop.
[7] Zhuoran Liu, Zhengyu Zhao, and Martha Larson. 2019. Pixel Privacy 2019: Protecting Sensitive Scene Information in Images. In Working Notes Proceedings of the MediaEval 2019 Workshop.
[8] Hossein Talebi and Peyman Milanfar. 2018. NIMA: Neural Image Assessment. IEEE Transactions on Image Processing 27, 8 (2018), 3998–4011.
[9] Zhe Wu, Li Su, and Qingming Huang. 2019. Cascaded Partial Decoder for Fast and Accurate Salient Object Detection. CoRR abs/1904.08739 (2019). arXiv:1904.08739 http://arxiv.org/abs/1904.08739
[10] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 Million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).