=Paper=
{{Paper
|id=Vol-1263/paper42
|storemode=property
|title=TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_42.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/SchmiedekeKGS14
}}
==TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks==
Sebastian Schmiedeke<sup>1,2</sup>, Pascal Kelm<sup>1,2</sup>, Lutz Goldmann<sup>2</sup> and Thomas Sikora<sup>1</sup>

<sup>1</sup> Communication Systems Group, Technische Universität Berlin, Germany

<sup>2</sup> Imcube Labs GmbH, Berlin, Germany

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain

===ABSTRACT===
This paper describes our participation in the Visual Privacy Task of MediaEval 2014, which aims to obscure the appearance of humans in image sequences. The recorded persons should become unrecognisable, but the obscured areas should be recoverable if needed. Our approach models the background and pseudo-randomly scrambles the pixels within disjoint foreground areas. This technique is reversible and preserves the colour characteristics of each area, so colour-based approaches can still automatically distinguish between differently dressed individuals. The evaluation of our results shows that the privacy aspect received a high score in all three evaluation streams. The intelligibility and pleasantness of our approach are below average, since scrambling results in less 'aesthetic' images.

Figure 1: Original frame and its scrambled version.

===1. INTRODUCTION===
Video surveillance of public spaces is expanding. Consequently, individuals are increasingly concerned about the 'invasiveness' of such ubiquitous surveillance and fear that their privacy is at risk. The demands of stakeholders to prevent criminal activities are often seen to be in conflict with the privacy requirements of individuals. The main challenge is to preserve the anonymity of the surveyed individuals while also fulfilling the stakeholders' needs. The problem of privacy protection in video surveillance is addressed in this year's MediaEval Visual Privacy Task [1]. A typical way to protect privacy in images and videos is to apply techniques such as blurring or masking. Since these techniques are irreversible, scrambling was introduced in [2]: a transform-domain scrambling technique in which the pixels of the respective regions are pseudo-randomly scrambled based on a secret key. Our approach is quite similar, but it is applied to the pixels of disjoint foreground masks in order to preserve the less invasive image background. An exemplary frame is shown in Fig. 1.

===2. METHODOLOGY===
Our proposed privacy-protection approach consists of a background modelling module and a scrambling module that obfuscates foreground masks. Since the PEViD videos [4] depict static scenes with a low number of occurring and moving people, the scrambled foreground still allows persons' movements and actions to be identified. Details such as faces are only recognisable in the recovered (de-scrambled) images.

====2.1 Background Modelling====
We use background subtraction to generate a foreground mask for each frame. In order to compensate for slight camera movements, each frame is subsampled by a factor of two and the resulting masks are interpolated back to the original resolution. Our background modelling module relies on an improved background subtraction scheme [5] based on Gaussian mixture models (GMM). This algorithm automatically selects the required number of Gaussian components per pixel. The mixture of these components tries to reflect the desired background colour by incorporating the most recent 300 frames, which is sufficient because the video content is static. The number of components is controlled by a Mahalanobis distance threshold: if the squared Mahalanobis distance of a pixel colour to every existing component exceeds this threshold (th = 15), a new Gaussian is generated. Foreground pixels are those that belong to components with small weights. We apply erosion and further morphological operations to the foreground masks to eliminate outliers. Our aim was to expose the silhouettes of persons perfectly, but that target was not always achieved (see Fig. 2 for examples of a good and a bad foreground estimation).

Figure 2: Example of a good foreground mask (left) and a bad mask (right) [image section].

====2.2 Reversible Scrambling====
The foreground areas are then obfuscated by shuffling their pixels, so an obfuscated area differs from its original version only in the order of its pixels.

The shuffle algorithm is based on a modified variant of the Fisher-Yates method [3], which generates 'random' permutations. The original sequence consists of M disjoint areas to be obfuscated. Each area a is represented by a vector containing its N pixels, scanned line by line. An area is obfuscated by changing the order of its pixels and mapping the pixels back to its original shape. The new pixel order of each area is determined by swapping each i-th pixel with the j-th pixel, where j is defined by a pseudo-random number generator and the constraint that j ≤ i + 1.

Hence, the permutation of the pixels of each foreground area is determined by the order generated by a pseudo-random sequence. This sequence is repeatable due to the characteristics of the pseudo-random number generator (PRNG): for a given fixed seed, the PRNG produces a random-looking but repeatable sequence of integers. The seed is generated from the hash value of a chosen password and is the same for all regions in each frame and video sequence. Since the pseudo-random sequence is repeatable for the given seed, the permutation of pixels is reversible. The scrambled image regions can therefore be recovered by knowing the password and the shape of each disjoint scrambled area. We chose scrambling over cryptography in order to be robust against image compression artefacts and transmission errors. Such errors still affect the recovered frame in the form of distorted pixels, but they do not break the de-scrambling scheme.

===3. EXPERIMENTS===
The video sequences of the VPT dataset [1] [4] are obscured by scrambling foreground objects within each frame. Since the face areas are provided with the data set, we include these areas in our foreground masks. This ensures that the faces are obscured even if they are not part of our estimated foreground mask. We are sure that individuals can be identified not only by their faces but also by their clothes or accessories. Hence, the individuals are anonymised as far as possible, while a colour-based clustering algorithm may still be able to group areas depicting the same person.

The obscured videos were evaluated using subjective procedures. Three different groups were asked to survey the videos and respond to questions concerning the content (number of persons, actions, etc.). Three metrics are generated from these surveys: pleasantness, intelligibility, and privacy. The groups consist of crowdsourced workers and two focus groups; the scores based on their opinions are shown in Table 1.

{| class="wikitable"
|+ Table 1: Evaluation results for the three streams (median values of the task are in brackets)
! !! Stream 1 !! Stream 2 !! Stream 3
|-
| Intelligibility || 73.6 % (74.9 %) || 75.2 % (79.3 %) || 66.5 % (69.6 %)
|-
| Privacy || 59.0 % (50.2 %) || 62.6 % (46.5 %) || 60.7 % (40.7 %)
|-
| Pleasantness || 21.9 % (24.8 %) || 60.8 % (69.6 %) || 58.1 % (59.7 %)
|}

Pleasantness stands for the influence of the obscuring filter on the human perception of the image distortion. The subjective score is based on the level of user acceptance. Here our score is below the median value, which results from the distraction of the users.

Intelligibility stands for the ability to identify actions and objects within video frames. All three groups rate our filter with high scores that are close to the median. Since the full masks of persons are retained, their actions should be recognisable.

The privacy metric concerns the identification of individuals through their faces, ethnicity or personal accessories. Our score is much higher than the average. A high subjective score was expected, since it is very hard for the human eye to recognise structures within scrambled areas.

We expect higher scores in all three categories when applying a more accurate background subtraction algorithm.

===4. CONCLUSION===
We propose a reversible approach for scrambling foreground masks within images or videos to obscure their content. This approach ensures a high level of privacy and achieves a standard level in the other aspects, pleasantness and intelligibility. In future we will investigate the effect of more accurate foreground masks on these scores. The key point is that the scrambled areas can be recovered for further analysis if the foreground mask and the password which generated the seed for the pseudo-random number generator are known.

===5. ACKNOWLEDGMENTS===
The research leading to these results has received funding from the European Community's FP7 under grant agreement number FP7-261743 (VideoSense).

===6. REFERENCES===
* [1] A. Badii, T. Ebrahimi, C. Fedorczak, P. Korshunov, T. Piatrik, V. Eiselein, and A. Al-Obaidi. Overview of the MediaEval 2014 Visual Privacy Task. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17 2014.
* [2] F. Dufaux and T. Ebrahimi. Video surveillance using JPEG 2000. In Optical Science and Technology, the SPIE 49th Annual Meeting, pages 268–275. International Society for Optics and Photonics, 2004.
* [3] R. Durstenfeld. Algorithm 235: Random permutation. Commun. ACM, 7(7):420–, July 1964.
* [4] P. Korshunov and T. Ebrahimi. PEViD: privacy evaluation video dataset. Applications of Digital Image Processing XXXVI, 25-29 August 2013.
* [5] Z. Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 2, pages 28–31 Vol.2, Aug 2004.
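
The reversible scrambling of Section 2.2 can be illustrated with a minimal sketch. It is not the authors' implementation: the paper does not spell out its modified Fisher-Yates variant, so the sketch assumes the standard Durstenfeld shuffle [3], SHA-256 as the password hash, and Python's built-in generator as the PRNG. All function names (<code>seed_from_password</code>, <code>swap_targets</code>, <code>scramble</code>, <code>descramble</code>, <code>scramble_area</code>) are hypothetical, and the mask is treated as a single area, whereas the paper scrambles each of the M disjoint areas separately.

```python
import hashlib
import random


def seed_from_password(password):
    """Derive an integer seed from a password (SHA-256 is an assumption)."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def swap_targets(n, seed):
    """Repeatable swap partners j for i = n-1 .. 1, with 0 <= j <= i."""
    rng = random.Random(seed)  # same seed gives the same sequence
    return [rng.randint(0, i) for i in range(n - 1, 0, -1)]


def scramble(pixels, password):
    """Permute a pixel vector with a seeded Durstenfeld shuffle."""
    out = list(pixels)
    n = len(out)
    targets = swap_targets(n, seed_from_password(password))
    for i, j in zip(range(n - 1, 0, -1), targets):
        out[i], out[j] = out[j], out[i]
    return out


def descramble(pixels, password):
    """Undo scramble() by replaying the same swaps in reverse order."""
    out = list(pixels)
    n = len(out)
    targets = swap_targets(n, seed_from_password(password))
    for i, j in zip(range(1, n), reversed(targets)):
        out[i], out[j] = out[j], out[i]
    return out


def scramble_area(frame, mask, password, inverse=False):
    """Scramble (or recover) the masked pixels of a frame.

    frame and mask are lists of rows with identical shape. Masked pixels
    are scanned line by line, permuted, and mapped back to the same
    positions; unmasked pixels are left untouched.
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, flag in enumerate(row) if flag]
    values = [frame[r][c] for r, c in coords]
    shuffle = descramble if inverse else scramble
    values = shuffle(values, password)
    result = [list(row) for row in frame]
    for (r, c), value in zip(coords, values):
        result[r][c] = value
    return result
```

Calling <code>scramble_area(frame, mask, password)</code> and then <code>scramble_area(scrambled, mask, password, inverse=True)</code> with the same mask and password returns the original frame, which mirrors the recovery condition stated in the conclusion. A deployed system would need a PRNG whose output is specified independently of the language version, since de-scrambling depends on reproducing exactly the same sequence.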