Imcube @ MediaEval 2015 DroneProtect Task: Reversible Masking using Steganography

Sebastian Schmiedeke, Pascal Kelm, and Lutz Goldmann
Imcube Labs GmbH, Berlin, Germany
{schmiedeke, kelm, goldmann}@imcube.de

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany

ABSTRACT

This paper describes Imcube’s participation in the DroneProtect Task of MediaEval 2015, which aims to obscure privacy-sensitive image regions in video sequences captured with drones. As a result, persons and vehicles should be unrecognisable, but the semantic meaning of the scene should remain understandable to the viewer. We use an approach which replaces the privacy-sensitive region with an automatically computed composite of inpainted background and foreground contour. Before obfuscation, the image region to be hidden is extracted and steganographically embedded into the processed frame, leading to a reversible solution. The evaluation shows that the developed solution achieves good privacy protection while preserving the intelligibility and aesthetic pleasantness of the original video.

1. INTRODUCTION

Since drones have become affordable, they are increasingly used for security applications. Due to their flexibility, videos captured by a drone may contain highly sensitive personal data. Consequently, individuals are increasingly concerned about the “invasiveness” of such ubiquitous surveillance and fear that their privacy is at risk. The demands of stakeholders to prevent criminal activities are often seen to be in conflict with the privacy requirements of individuals. The DroneProtect Task of MediaEval 2015 deals with the problem of privacy protection in dynamic surveillance videos [1].

A common way to protect privacy in images and videos is to apply techniques such as blurring or masking, as shown in [3]. Since these techniques are irreversible, steganography [4] can be used to preserve the original information. A typical steganography algorithm for JPEG images sits in the processing chain between quantisation of the DCT coefficients and Huffman coding. The information to be hidden is embedded within the least significant bits of the non-zero AC coefficients of each DCT block.

2. APPROACH

Our approach combines masking and steganography to obtain a visually appealing obfuscated video while retaining the possibility to recover the original frame. An example is shown in Figure 1.

Figure 1: Exemplary frame to visualise our approach: Our masking algorithm obfuscates privacy-sensitive regions while preserving the semantics. The steganography algorithm guarantees that obfuscated areas can be recovered, if needed. (Stages labelled in the figure: inpainting and edge detection; DCT-based steganography.)

2.1 Masking

Since the videos captured by the mini-drones [3] are highly dynamic compared to those of static surveillance cameras, traditional background subtraction techniques [5] cannot be applied for foreground object extraction. Instead, we use edge detection to extract the outline of the foreground objects within the region of interest. Each frame of the sequence is converted to grey scale and then smoothed with a 5 × 5 Gaussian kernel. Edges are detected by adaptive thresholding: the area around each pixel is cross-correlated with a Gaussian window of sufficiently large kernel width. Pixels exceeding this weighted sum become edge pixels, which are subsequently used to form the outline. The binary outline is enhanced by applying morphological operations.

In order to remove the original foreground object, the region of interest needs to be filled with plausible background information. We rely on the inpainting algorithm by Telea [6], which is able to rapidly reconstruct missing image parts and works as follows. Starting at the boundary, the colour information is propagated to the inside of the region by smoothly interpolating along pixel intensity lines. For this purpose, the “image smoothness information” [2], which is estimated by a weighted sum of Laplacians of the known neighbourhood, is propagated along these intensity lines. The direction of the intensity lines, also called “isophotes”, is estimated from discretised intensity gradient vectors. Based on the assumption that isophotes have the smallest changes along their direction and the largest changes perpendicular to their direction, it is estimated by finding the largest gradient which is orthogonal to the isophote belonging to that pixel. To improve the temporal stability of the inpainting result, the inpainted areas are temporally filtered, e.g. for each pixel the median value over its temporal neighbours is computed.

The obfuscated region of interest is obtained by blending the extracted foreground contour with the reconstructed background texture. Depending on the application, the contour may be emphasised with different colours.

2.2 Steganography

Since the masking algorithm itself is irreversible, the original image data contained within the region of interest must be embedded into the obfuscated image. Therefore, we make use of a steganography library which is based on the F5 algorithm [7]. This algorithm embeds binary information into the DCT coefficients of a JPEG image. The least significant bits of non-zero AC coefficients are replaced by the bits to be embedded in such a way that the statistical distribution of coefficients remains unchanged. Since only non-zero coefficients can carry steganographic values and these coefficients occur less frequently than zero-valued coefficients, only a limited amount of data can be embedded. Depending on the number and size of the regions of interest that shall be hidden, the capacity is often insufficient. Therefore, each region is treated as a rectangular region which is JPEG-compressed with an adjustable compression parameter. All compressed regions together with their bounding boxes are then concatenated, encrypted and embedded in the obfuscated JPEG-encoded image. Since the embedded information may be destroyed if the image sequence is transcoded, the individual frames are simply combined into a Motion JPEG video.

3. EXPERIMENTS & RESULTS

The video sequences of the DroneProtect dataset [3] are obscured by replacing foreground objects with their outlines. We are sure that individuals can be identified not only by their face but also by their clothes or accessories. Therefore, the colours of the objects within each region are replaced by an estimated background. Since the regions of interest are provided with the dataset, we apply our masking algorithm only to the provided areas.

The obscured videos were evaluated using subjective procedures. Two groups of subjects with different experience in surveillance applications were asked to watch the videos and respond to questions concerning the content. Based on the answers to these questions, three different metrics (privacy, pleasantness, intelligibility) and the deviation between them were computed. The average scores of the different subject groups (experts, novices, overall) for all 38 videos are summarised in Table 1. In the following we analyse the results for the different metrics and subject groups.

Table 1: Subjective evaluation metrics assigned by different subject groups.

Subject group          Privacy  Intelligibility  Pleasantness  Deviation
Category 1 (experts)   0.46     0.65             0.70          0.12
Category 2 (novices)   0.53     0.57             0.63          0.06
Overall                0.50     0.61             0.67          0.09

The privacy metric measures how well the identity of the persons and vehicles was protected through the obfuscation, or in other words how difficult the obfuscation made the identification of a person or vehicle by hiding relevant visual information. The proposed method achieves a medium overall score (0.50), which suggests that even though only the contour of the object of interest is preserved, in some cases it can still be identified. A deeper analysis of these cases is needed to identify potential improvements.

Intelligibility stands for the ability to classify objects and actions within a video sequence and evaluates how well the activities within a scene are preserved even if the object of interest is obfuscated to prevent its identification. The proposed method achieves a good overall score (0.61), which shows that contour information alone is enough to understand most of the semantics of a scene from a surveillance perspective. A deeper analysis of the individual videos is needed to understand what additional information would improve the intelligibility further.

Pleasantness evaluates the influence of the obfuscation method on the visual quality of the video, i.e. by how much the quality of the video is degraded by distortions and artefacts within the region of interest. The subjective score is based on the level of user acceptance. Here, the proposed method achieves a good overall score (0.67), since the black foreground contours composed over an inpainted background blend well with the original content outside the region of interest. This score may be further improved by using a more sophisticated inpainting algorithm which reconstructs a more plausible background texture.

Since the three metrics mentioned above evaluate quite contrary requirements, the deviation evaluates the difference between these metrics by computing the standard deviation. As can be expected from the similar scores for the different metrics, the proposed approach has a very good deviation score (0.09). This shows that it strikes a good balance between the different criteria (privacy, intelligibility and pleasantness).

Comparing the results between the different subject groups shows that the scores of the novices are more equal across the different metrics, while the experts rate the intelligibility and pleasantness higher and the privacy lower. This follows the intuition that experts will be able to recognise actions and identities better than novices. The higher pleasantness score suggests that experts value the content of a video higher than its quality.

4. CONCLUSION

A reversible approach for protecting privacy based on masking and steganographic embedding of the region of interest has been proposed and evaluated on the DroneProtect dataset. The results show that the approach strikes a good balance between privacy, intelligibility and pleasantness. For developing potential improvements, a more detailed analysis of the scores for the individual videos is needed.

5. REFERENCES

[1] A. Badii, P. Korshunov, H. Oudi, T. Ebrahimi, T. Piatrik, V. Eiselein, N. Ruchaud, C. Fedorczak, J.-L. Dugelay, and D. F. Vazquez. Overview of the MediaEval 2015 Drone Protect Task. In MediaEval 2015 Workshop, Wurzen, Germany, September 14-15, 2015.
[2] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In Annual Conference on Computer Graphics, pages 417–424, 2000.
[3] M. Bonetto, P. Korshunov, G. Ramponi, and T. Ebrahimi. Privacy in mini-drone based video surveillance. In Workshop on De-identification for Privacy Protection in Multimedia, 2015.
[4] T. Morkel, J. H. P. Eloff, and M. S. Olivier. An Overview of Image Steganography. In H. S. Venter, J. H. P. Eloff, L. Labuschagne, and M. M. Eloff, editors, Proceedings of the Fifth Annual Information Security South Africa Conference (ISSA2005), Sandton, South Africa, June 2005. Published electronically.
[5] S. Schmiedeke, P. Kelm, L. Goldmann, and T. Sikora. TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks. In Proceedings of the MediaEval 2014 Multimedia Benchmark Workshop, pages 73–74. CEUR-WS, 2014.
[6] A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.
[7] A. Westfeld. F5 - a steganographic algorithm. In I. S. Moskowitz, editor, Information Hiding, volume 2137 of Lecture Notes in Computer Science, pages 289–302. Springer Berlin Heidelberg, 2001.
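As an illustration of the outline extraction described in Section 2.1 (5 × 5 Gaussian smoothing, adaptive thresholding against a Gaussian-weighted neighbourhood, morphological enhancement), the following is a minimal dependency-free sketch. It is not the authors' implementation: the function names, the window size of 7, the offset of 5.0, the sigma values and the choice of a 3 × 3 dilation as the morphological step are all assumptions made for the example.

```python
import math

def gaussian_kernel(size, sigma):
    """Normalised 1-D Gaussian weights for an odd window length `size`."""
    half = size // 2
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_filter(img, size, sigma):
    """Separable Gaussian filter; border pixels are replicated."""
    k, half = gaussian_kernel(size, sigma), size // 2
    h, w = len(img), len(img[0])

    def clamp(v, upper):
        return max(0, min(upper, v))

    horiz = [[sum(k[j] * img[y][clamp(x + j - half, w - 1)] for j in range(size))
              for x in range(w)] for y in range(h)]
    return [[sum(k[j] * horiz[clamp(y + j - half, h - 1)][x] for j in range(size))
             for x in range(w)] for y in range(h)]

def dilate(mask):
    """3 x 3 binary dilation, an assumed stand-in for the morphological step."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]

def extract_outline(gray, window=7, offset=5.0):
    """Binary outline of a grey-scale image given as a list of rows."""
    smoothed = gaussian_filter(gray, 5, 1.0)                 # 5 x 5 pre-smoothing
    local = gaussian_filter(smoothed, window, window / 4.0)  # Gaussian-weighted local mean
    # A pixel becomes an edge pixel if it exceeds its local weighted mean plus an offset.
    edges = [[int(s > m + offset) for s, m in zip(srow, mrow)]
             for srow, mrow in zip(smoothed, local)]
    return dilate(edges)

# Toy frame: a bright 13 x 13 square on a dark background.
frame = [[200 if 4 <= x <= 16 and 4 <= y <= 16 else 0 for x in range(21)]
         for y in range(21)]
outline = extract_outline(frame)
```

On the toy frame only a band along the border of the square is marked, while the flat interior and the background stay empty. In practice an image-processing library would replace the hand-written filters.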
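The temporal median filtering of the inpainted areas and the final blending of contour and background (Section 2.1) can be sketched as follows. The inpainting itself is not reproduced here. Frames are assumed to be small grey-scale images stored as lists of rows, and the names `temporal_median`, `composite`, `radius` and `contour_value` are hypothetical.

```python
from statistics import median

def temporal_median(frames, t, radius=1):
    """Per-pixel median of frame `t` and its temporal neighbours."""
    lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
    window = frames[lo:hi]
    h, w = len(window[0]), len(window[0][0])
    return [[median(f[y][x] for f in window) for x in range(w)] for y in range(h)]

def composite(background, outline, contour_value=0):
    """Draw the binary outline over the reconstructed background."""
    return [[contour_value if o else b for b, o in zip(brow, orow)]
            for brow, orow in zip(background, outline)]

# Three consecutive 1 x 2 "inpainted" frames; the middle one has a flickering pixel.
inpainted = [[[10, 10]], [[200, 12]], [[11, 14]]]
stable = temporal_median(inpainted, t=1)
obfuscated = composite(stable, outline=[[0, 1]])
```

The median suppresses the single-frame outlier (200) without blurring the values, which is why it is a common choice for this kind of stabilisation. A black contour (`contour_value=0`) matches the black foreground contours mentioned in Section 3.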
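To make the embedding principle from Section 2.2 concrete, the sketch below hides bits in the least significant bit of the magnitude of non-zero AC coefficients. It is a deliberate simplification and not F5: there is no matrix encoding or coefficient permutation, and a magnitude of 1 is raised to 2 rather than shrunk to zero, so the receiver always sees the same set of non-zero coefficients. Blocks are assumed to be lists of quantised coefficients with the DC term at index 0; all function names are hypothetical.

```python
def capacity(blocks):
    """Number of bits that fit: one per non-zero AC coefficient."""
    return sum(1 for block in blocks for c in block[1:] if c != 0)

def embed_bits(blocks, bits):
    """Return copies of `blocks` carrying `bits` in their non-zero AC coefficients."""
    out = [list(block) for block in blocks]
    pending = iter(bits)
    bit = next(pending, None)
    for block in out:
        for i in range(1, len(block)):        # index 0 is the DC coefficient
            if bit is None:
                return out                    # payload fully embedded
            c = block[i]
            if c == 0:
                continue                      # zero coefficients carry nothing
            mag = abs(c)
            if (mag & 1) != bit:
                # Change the magnitude by one, but never create a zero.
                mag = 2 if mag == 1 else mag - 1
            block[i] = mag if c > 0 else -mag
            bit = next(pending, None)
    if bit is not None:
        raise ValueError("payload exceeds embedding capacity")
    return out

def extract_bits(blocks, count):
    """Read back the first `count` embedded bits."""
    bits = []
    for block in blocks:
        for c in block[1:]:
            if c != 0 and len(bits) < count:
                bits.append(abs(c) & 1)
    return bits

# Shortened toy blocks (real JPEG blocks hold 64 coefficients).
blocks = [[50, 3, 0, -1, 2, 0, 0, 1], [48, 0, -4, 0, 1, 0, 0, 0]]
payload = [0, 1, 0, 1, 1, 0]
stego = embed_bits(blocks, payload)
recovered = extract_bits(stego, len(payload))
```

The `capacity` helper shows why the paper compresses each region before embedding: the payload is bounded by the number of non-zero AC coefficients in the cover frame.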
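Section 2.2 states that the compressed regions are concatenated together with their bounding boxes before encryption and embedding. The paper does not specify the container layout, so the following sketch uses an assumed one: a fixed big-endian header (x, y, width, height as 16-bit values and the payload length as a 32-bit value) followed by the JPEG bytes of the region. The encryption step is omitted, and `pack_regions` and `unpack_regions` are hypothetical names.

```python
import struct

# Assumed header layout: x, y, width, height (uint16 each), payload length (uint32).
HEADER = struct.Struct(">HHHHI")

def pack_regions(regions):
    """Concatenate (bounding_box, jpeg_bytes) pairs into one byte string."""
    out = bytearray()
    for (x, y, w, h), data in regions:
        out += HEADER.pack(x, y, w, h, len(data))
        out += data
    return bytes(out)

def unpack_regions(blob):
    """Inverse of pack_regions: recover the bounding boxes and their payloads."""
    regions, pos = [], 0
    while pos < len(blob):
        x, y, w, h, n = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        regions.append(((x, y, w, h), blob[pos:pos + n]))
        pos += n
    return regions

# Placeholder bytes stand in for a real JPEG-compressed region.
regions = [((10, 20, 64, 48), b"\xff\xd8demo\xff\xd9")]
blob = pack_regions(regions)
restored = unpack_regions(blob)
```

Storing the length in front of each payload lets the decoder split the concatenated stream again without parsing the JPEG data itself.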
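The deviation column of Table 1 is described in Section 3 as the standard deviation of the three metrics. As a rough check, the sketch below recomputes it from the rounded group averages, assuming the sample standard deviation (the paper does not state which form was used). Because the table values are themselves rounded averages over 38 videos, the recomputed numbers are close to, but not always identical to, the reported ones.

```python
from statistics import stdev

# (privacy, intelligibility, pleasantness) per subject group, taken from Table 1.
scores = {
    "experts": (0.46, 0.65, 0.70),
    "novices": (0.53, 0.57, 0.63),
    "overall": (0.50, 0.61, 0.67),
}

# Assumption: sample standard deviation across the three metrics.
deviation = {group: stdev(values) for group, values in scores.items()}

for group, value in deviation.items():
    print(f"{group}: {value:.2f}")
```

A lower value means the three scores lie closer together, which is how the paper interprets a good balance between privacy, intelligibility and pleasantness.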