HCMUS at Pixel Privacy 2020: Quality Camouflage with Back Propagation and Image Enhancement

Minh-Khoi Pham*1,3, Hai-Tuan Ho-Nguyen*1,3, Trong-Thang Pham1,3, Hung Vinh Tran1,3, Hai-Dang Nguyen1,3, Minh-Triet Tran1,2,3
1 University of Science, VNU-HCM
2 John von Neumann Institute, VNU-HCM
3 Vietnam National University, Ho Chi Minh City, Vietnam
{pmkhoi,hnhtuan,ptthang,tvhung,nhdang}@selab.hcmus.edu.vn, tmtriet@fit.hcmus.edu.vn

*These authors contributed equally.
Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'20, December 14-15 2020, Online.

ABSTRACT
As the need to share our moments grows, more and more high-quality photos appear on the Internet, and it becomes increasingly likely that shared photos will be used for purposes their owners never intended. If the target photos are of high quality, an attacker may select them with an automatic quality criterion such as a Blind Image Quality Assessment (BIQA) model. Pixel Privacy 2020 aims to tackle this problem. For this challenge, we implement methods that combine image enhancement with an end-to-end attack. The final results show that all of our approaches successfully fool the BIQA model. In particular, our best run has 84 photos chosen among the top-3 most attractive while maintaining 100% attack accuracy on the assessment model, outperforming all other submissions.

1 INTRODUCTION
With the recent rapid development of social networks, the need to share images keeps increasing. Smartphones now ship with high-end cameras, which raises the quality of the images people post. These images, intended to be shared with friends, can be exploited by attackers for private data; for example, an attacker could use image quality as a filter to single out your honeymoon photos. We therefore consider the Pixel Privacy task vital in this age of connection.

In Pixel Privacy 2020 [5], we are given a set of images rated as high quality by a BIQA [9] model. Our goal is to fool the BIQA model so that it considers the modified images low quality, while the images remain attractive to human eyes. Specifically, this BIQA model was trained on the KonIQ-10k dataset [1], and the given images come from the Places365 validation set. The output images are JPEG-compressed (quality 90) before being evaluated.

We propose three approaches. The first is a vanilla end-to-end, image-to-image approach, in which we aim to learn a single network that both enhances image quality and protects that quality from being detected by the BIQA model. To stay flexible in the choice of enhancement method, we also propose a two-stage approach: the first stage enhances image quality, for which we experiment with several methods, and the second stage camouflages the enhanced image's quality. Three of our runs, namely Pillow, Cartoonization, and Retouch, follow this approach. For the last approach, we assume that if the enhancement model is good enough, it will preserve the image's attributes while the image is being protected from the BIQA model; the End-to-End with I-FGSM run applies this idea.

2 APPROACH

2.1 Vanilla End-to-End
In this run, we use an image-to-image network to reconstruct the input image and then forward the reconstructed result to the BIQA regressor. We choose U-Net [6] as the main network because it is one of the most popular baselines for image-to-image problems and is simple to implement.

[Figure 1: Vanilla End-to-End method]

In Fig. 1, the image x is fed to the U-Net, which outputs the image y. Simultaneously, x is enhanced to x' using simple transformations from standard computer vision libraries (the same as in Section 2.2.2). We then use the trained, frozen BIQA model to predict a score for y and generate a pseudo target score to attack the true score of y. The network minimizes two objectives: the reconstruction loss between x' and y, and the regression loss between the pseudo score and the true score.

Reconstruction loss: We experiment with both L2 loss and SSIM loss [10] and find that the model trained with L2 produces more visually appealing images.

Regression loss: We use L2 loss to measure the distance between the two scores. The pseudo score is generated by subtracting a margin A from the original score B. We experiment with A set to 30, 50, and B itself, and choose A = 30 for the submitted run.

We sum both losses and back-propagate through the U-Net so that it can learn. The network is trained end to end on the pp2020_dev dataset, with the U-Net initialized from scratch.
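To make the training objective concrete, the following is a minimal sketch of one training step in PyTorch-style code. It reflects our own assumptions rather than the exact implementation: unet, biqa, and enhance are hypothetical stand-ins for the U-Net, the frozen BIQA regressor, and a tensor-level version of the classical enhancement transform; images are assumed to be batched tensors; and the margin of 30 is the value chosen for this run.

    import torch
    import torch.nn.functional as F

    def training_step(x, unet, biqa, enhance, optimizer, margin=30.0):
        # One end-to-end update: reconstruct x, score the reconstruction with the
        # frozen BIQA regressor, and pull that score towards (original score - margin).
        x_enh = enhance(x)                     # enhanced target x' (classical transforms)
        y = unet(x)                            # reconstructed image
        score = biqa(y)                        # BIQA score of the reconstruction
        with torch.no_grad():
            pseudo = biqa(x) - margin          # pseudo target score (assumed derived from the input's score)

        recon_loss = F.mse_loss(y, x_enh)      # L2 reconstruction loss between y and x'
        reg_loss = F.mse_loss(score, pseudo)   # L2 regression loss between the two scores
        loss = recon_loss + reg_loss           # sum of both objectives

        optimizer.zero_grad()
        loss.backward()                        # gradients reach only the U-Net; BIQA stays frozen
        optimizer.step()
        return loss.item()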
2.2 Two-Stage Approaches

2.2.1 Attack Algorithm. In these approaches, we use the Iterative Fast Gradient Sign Method (I-FGSM) [4] to perform a white-box attack on the BIQA model after the images have been enhanced. Since BIQA is a regression model, we use the L2 loss function instead of cross-entropy, as in Section 2.1.

Our modified I-FGSM is described as follows, with X the input image, y the BIQA score of X^adv_N, and y' the attacking score. J(X, y, y') is the L2 cost function of the neural network, given image X, score y, and attacking score y', measuring the distance between y and y':

    X^adv_0 = X                                                              (1)
    X^adv_{N+1} = Clip_{X,ε}{ X^adv_N + α · sign(∇_X J(X^adv_N, y, y')) }    (2)

Given y, the predicted score on X^adv_N, we iteratively add the perturbation to X until y becomes smaller than y'. For all the runs below, we find that setting α = 0.05, ε = 0.05, and y' = 30 gives desirable results in most cases.

2.2.2 Image Enhancement Algorithms.
Pillow: We use several image enhancement operations provided by Pillow: adjusting color balance by a factor of 1.5, sharpness by 3.0, brightness by 1.0, and contrast by 1.5 (see the Pillow documentation for the meaning of these factors). We apply the same configuration to all images in the dataset.

Cartoonization: For this run, we apply the GAN-based White-box Cartoonization method [8] to convert input images into cartoon images in the styles of Shinkai Makoto, Miyazaki Hayao, and Hosoda Mamoru films.

Retouch: In this run, we also want to compare a deep learning "white-box" approach [2], which produces natural-looking enhancement, with the "black-box" methods. This method applies deep reinforcement learning and a GAN model to produce parameters for traditional image processing operations that improve image quality.

2.3 End-to-End with I-FGSM
Different from the previous approaches, here we integrate I-FGSM with an enhancement model. In each iteration, we first feed the image forward through the enhancement model and the BIQA model, then back-propagate the L2 loss to compute the gradient and apply the I-FGSM update to the input image. For this particular experiment, we choose EnlightenGAN [3] as the enhancement model.
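The update of Eqs. (1)-(2) can be sketched as follows. This is a simplified illustration under our own assumptions, not the exact submitted code: biqa is a hypothetical frozen BIQA regressor returning one score per image, pixel values are assumed to lie in [0, 1], and the step is written as a descent on J (the targeted form of the iterative attack) so that the predicted score is pulled down towards the attacking score y'. The optional enhance argument covers the End-to-End variant of Section 2.3, where EnlightenGAN sits inside the loop.

    import torch
    import torch.nn.functional as F

    def ifgsm_attack(x, biqa, target_score=30.0, alpha=0.05, eps=0.05,
                     max_iters=50, enhance=None):
        # Iteratively perturb x until the BIQA score drops below the attacking
        # score y'. Gradients are taken with respect to the input image only.
        x_adv = x.detach().clone()
        target = torch.full((x.size(0), 1), target_score, device=x.device)

        for _ in range(max_iters):
            x_adv.requires_grad_(True)
            out = enhance(x_adv) if enhance is not None else x_adv  # optional enhancement in the loop
            score = biqa(out)                                       # predicted quality score
            if (score <= target).all():                             # stop once every score is below y'
                break
            loss = F.mse_loss(score, target)                        # L2 cost J(X, y, y')
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                # descent step so the score moves towards the target, then clip
                # to the eps-ball around the original image and to valid pixels
                x_adv = x_adv - alpha * grad.sign()
                x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
        return x_adv.detach()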
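The Pillow run of Section 2.2.2 can be reproduced, up to implementation details such as the exact order of operations, with a few lines of PIL.ImageEnhance; the factors below are the ones listed above, applied identically to every image.

    from PIL import Image, ImageEnhance

    def pillow_enhance(in_path, out_path):
        # Fixed enhancement configuration used in the Pillow run.
        img = Image.open(in_path).convert("RGB")
        img = ImageEnhance.Color(img).enhance(1.5)       # color balance
        img = ImageEnhance.Sharpness(img).enhance(3.0)   # sharpness
        img = ImageEnhance.Brightness(img).enhance(1.0)  # brightness (1.0 = unchanged)
        img = ImageEnhance.Contrast(img).enhance(1.5)    # contrast
        img.save(out_path)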
3 EXPERIMENTS AND RESULTS
As can be seen in Table 1, "Accuracy (after JPEG 90)" is the accuracy of the BIQA model on the dataset after JPEG compression at quality 90 (lower is better). "Number of times selected as Best" (max. 140) is based on a human expert evaluation: the 20 images with the largest BIQA variance are shown to 7 human experts, who pick the best three out of all runs.

Table 1: Official evaluation result (provided by the organizers)

    Method                                Accuracy after JPEG 90 (%)   Times selected as "Best" (max. 140)
    End-to-End (U-Net)                    13.27                        11
    Retouch + I-FGSM                       0.18                        34
    Cartoonization + I-FGSM                1.27                        34
    Pillow + I-FGSM                       48.18                        60
    End-to-End (EnlightenGAN) + I-FGSM     0.00                        84

With these results, we successfully fool BIQA by more than 50% in all runs. EnlightenGAN proves to be the best method among our runs, followed by the enhancement based on Pillow's traditional image processing. This is an interesting result compared to our submission to the previous Pixel Privacy task [7]: last year, traditional methods still outperformed GAN-based approaches, but this year our proposed approach combined with a GAN method has proven better than a traditional image processing method. A possible explanation is that a GAN can perform more flexible, case-by-case image enhancement than traditional algorithms with hard-coded parameters.

[Figure 2: Sample outputs with BIQA scores — (a) Original (73.79), (b) EnlightenGAN (35.46), (c) Pillow + I-FGSM (49.9), (d) U-Net (57.88), (e) White-box Cartoonization (14.85), (f) Retouch (41.9)]
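For reference, the "Accuracy (after JPEG 90)" metric can be approximated offline with a sketch like the following; biqa_score and threshold are hypothetical placeholders, since the official scoring code and high-quality cut-off are not restated here.

    import io
    from PIL import Image

    def jpeg90_accuracy(image_paths, biqa_score, threshold):
        # Fraction of protected images that the BIQA model still rates as
        # high quality after JPEG compression at quality 90 (lower is better).
        survived = 0
        for path in image_paths:
            img = Image.open(path).convert("RGB")
            buf = io.BytesIO()
            img.save(buf, format="JPEG", quality=90)   # JPEG-90 step applied before scoring
            buf.seek(0)
            compressed = Image.open(buf)
            if biqa_score(compressed) >= threshold:    # still classified as high quality
                survived += 1
        return 100.0 * survived / len(image_paths)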
4 CONCLUSION
All of our approaches are simple yet effective enough to fool the BIQA model while maintaining high image quality. The image-to-image method has the benefit that it does not require paired images for training, and it attacks at the feature level rather than on raw pixels, in contrast to the other methods; although it gives the worst result among our runs, we believe it can be further investigated and improved. The two-stage approaches, whose results are better, still leave clearly visible noise on the images. I-FGSM also shows its strength in white-box attacks, in both classification and regression settings. Our newly proposed approach, the combination of end-to-end enhancement and I-FGSM, is both efficient and effective at camouflaging and visually enhancing the images.

ACKNOWLEDGMENTS
This research is supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.

REFERENCES
[1] Vlad Hosu, Hanhe Lin, Tamas Sziranyi, and Dietmar Saupe. 2020. KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment. IEEE Transactions on Image Processing 29 (2020), 4041–4056. https://doi.org/10.1109/tip.2020.2967829
[2] Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, and Stephen Lin. 2017. Exposure: A White-Box Photo Post-Processing Framework. CoRR abs/1709.09602 (2017). arXiv:1709.09602 http://arxiv.org/abs/1709.09602
[3] Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang. 2019. EnlightenGAN: Deep Light Enhancement without Paired Supervision. arXiv preprint arXiv:1906.06972 (2019).
[4] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2017. Adversarial Examples in the Physical World. (2017). arXiv:cs.CV/1607.02533
[5] Zhuoran Liu, Zhengyu Zhao, Martha Larson, and Laurent Amsaleg. 2020. Exploring Quality Camouflage for Social Images. In Working Notes Proceedings of the MediaEval Workshop.
[6] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. (2015). arXiv:cs.CV/1505.04597
[7] Hung Vinh Tran, Trong-Thang Pham, Hai-Tuan Ho-Nguyen, Hoai-Lam Nguyen-Hy, Xuan-Vy Nguyen, Thang-Long Nguyen-Ho, and Minh-Triet Tran. 2019. HCMUS at Pixel Privacy 2019: Scene Category Protection with Back Propagation and Image Enhancement. (2019).
[8] Xinrui Wang and Jinze Yu. 2020. Learning to Cartoonize Using White-Box Cartoon Representations. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Xin Li. 2002. Blind Image Quality Assessment. In Proceedings. International Conference on Image Processing, Vol. 1. I–I. https://doi.org/10.1109/ICIP.2002.1038057
[10] Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. 2016. Loss Functions for Image Restoration with Neural Networks. IEEE Transactions on Computational Imaging 3, 1 (2016), 47–57.