=Paper=
{{Paper
|id=Vol-1263/paper42
|storemode=property
|title=TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_42.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/SchmiedekeKGS14
}}
==TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks==
<pdf width="1500px">https://ceur-ws.org/Vol-1263/mediaeval2014_submission_42.pdf</pdf>
<pre>
TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks

Sebastian Schmiedeke (1,2), Pascal Kelm (1,2), Lutz Goldmann (2) and Thomas Sikora (1)

(1) Communication Systems Group, Technische Universität Berlin, Germany
(2) Imcube Labs GmbH, Berlin, Germany


ABSTRACT
This paper describes our participation in the Visual Privacy Task of
MediaEval 2014, which aims to obscure the appearance of persons in image
sequences. As a result, the recorded persons should be unrecognisable, but the
obscured areas can be recovered if needed. We use an approach which models the
background and pseudo-randomly scrambles the pixels within disjoint foreground
areas. This technique is reversible and preserves the colour characteristics
of each area, so colour-based approaches will still be able to automatically
distinguish between differently dressed individuals. The evaluation of our
results shows that the privacy aspect received a high score in all three
evaluation streams. The intelligibility and the pleasantness of our approach
are below the average, since scrambling results in less ‘aesthetic’ images.

1.   INTRODUCTION
Video surveillance of public spaces is expanding. Consequently, individuals
are increasingly concerned about the ‘invasiveness’ of such ubiquitous
surveillance and fear that their privacy is at risk. The demands of
stakeholders to prevent criminal activities are often seen to be in conflict
with the privacy requirements of individuals. The main challenge is to
preserve the anonymity of the surveyed individuals while also fulfilling the
stakeholders’ needs. The problem of privacy protection in video surveillance
is addressed in this year’s MediaEval Visual Privacy Task [1]. A typical way
to protect privacy in images and videos is to apply techniques such as
blurring or masking. Since these techniques are irreversible, scrambling was
introduced in [2]: a transform-domain scrambling technique in which the pixels
in the respective regions are pseudo-randomly scrambled based on a secret key.
Our approach is quite similar, but it is applied to the pixels of disjoint
foreground masks in order to preserve the less invasive image background. An
exemplary frame is shown in Fig. 1.

Figure 1: Original frame and its scrambled version.

2.   METHODOLOGY
Our proposed privacy-protection approach consists of a background modelling
module and a scrambling module that obfuscates foreground masks. Since the
PEViD videos [4] depict static scenes with a low number of occurring and
moving people, the scrambled foreground still allows persons’ movements and
actions to be identified. Details such as faces are only recognisable in the
recovered (de-scrambled) images.

2.1   Background Modelling
We use background subtraction to generate a foreground mask for each frame. In
order to compensate for slight camera movements, each frame is subsampled by a
factor of two and the resulting masks are interpolated back to the original
frame size. Our background modelling module relies on an improved background
subtraction scheme [5] based on Gaussian mixture models (GMM). This algorithm
automatically selects the needed number of Gaussian components per pixel. The
mixture of these components tries to reflect the desired background colour by
incorporating the most recent 300 frames, which is feasible because the video
content is static. The number of components is controlled by a Mahalanobis
distance threshold: if the squared Mahalanobis distance of a pixel colour to
every existing component exceeds this threshold (th = 15), a new Gaussian is
generated. Foreground pixels are those that belong to components with small
weights. We apply erosion and further morphological operations to the
foreground masks to eliminate outliers. Our aim was to expose the silhouettes
of persons perfectly, but that target was not always achieved (see Fig. 2 for
examples of a good and a bad foreground estimation).

Figure 2: Example of a good foreground mask (left) and a bad mask (right)
[image section].

2.2   Reversible Scrambling
These foreground areas are then obfuscated by shuffling their pixels. An
obfuscated area thus differs from its original version only in the order of
its pixels.

The shuffle algorithm is based on a modified variant of the Fisher-Yates
method [3], which generates ‘random’ permutations. The original sequence
contains M disjoint areas to be obfuscated. Each area a is then represented by
a vector containing its N pixels, scanned line by line. The areas are
obfuscated by changing the order of their pixels and mapping the pixels back
to the original shape. The new pixel order of each area is determined by
swapping each i-th pixel with the j-th pixel, where j is defined by a
pseudo-random number generator and the constraint that j ≤ i + 1.

So, the permutation of the pixels of each foreground area is determined by the
order generated by a pseudo-random sequence. The pseudo-random sequence is
repeatable due to the characteristics of the pseudo-random number generator
(PRNG): given a certain, fixed seed, the PRNG produces a random-looking but
repeatable sequence of integers. This seed is generated from the hash value of
a chosen password and is fixed for all regions in each frame and video
sequence. Since the pseudo-random sequence is repeatable through the given
seed, the permutation of pixels is reversible. So, the scrambled image regions
can be recovered by knowing the password and the shape of each disjoint
scrambled area. We chose scrambling instead of cryptography in order to be
robust against image compression artefacts and transmission errors. Such
errors will also affect the recovered frame in terms of distorted pixels, but
they will not break the de-scrambling scheme.

3.   EXPERIMENTS
The video sequences of the VPT dataset [1, 4] are obscured by scrambling
foreground objects within each frame. Since the face areas are provided with
the data set, we include these areas in our foreground masks. This ensures
that the faces are obscured even if they are not part of our estimated
foreground mask. We are sure that individuals can be identified not only by
their faces but also by their clothes or accessories. So, the individuals are
anonymised as well as possible, while a colour-based clustering algorithm may
still be able to group areas depicting the same person.

The obscured videos were evaluated using subjective procedures. Three
different groups were asked to survey the videos and respond to questions
concerning the content (number of persons, actions, etc.). Three metrics are
generated from these surveys: pleasantness, intelligibility, and privacy. The
groups consist of crowdsourced workers and two focus groups; the scores based
on their opinions are shown in Table 1.

Table 1: Evaluation results for the different streams (task median values in
brackets)

                   stream 1          stream 2          stream 3
Intelligibility    73.6 % (74.9 %)   75.2 % (79.3 %)   66.5 % (69.6 %)
Privacy            59.0 % (50.2 %)   62.6 % (46.5 %)   60.7 % (40.7 %)
Pleasantness       21.9 % (24.8 %)   60.8 % (69.6 %)   58.1 % (59.7 %)

Pleasantness stands for the influence of the obscuring filter on the human
perception of the image distortion. The subjective score is based on the level
of user acceptance. Here the score is below the median value, which we
attribute to the scrambled areas distracting the users.

Intelligibility stands for the ability to identify actions and objects within
video frames. All three groups rate our filter with high scores that are close
to, although slightly below, the median. Since the full masks of the persons
are retained, their actions should be recognisable.

The privacy metric concerns the identification of individuals through their
faces, ethnicity or personal accessories. This score is well above the task
median in all three streams (59.0 % vs. 50.2 %, 62.6 % vs. 46.5 % and 60.7 %
vs. 40.7 %). A high subjective score was expected, since it is very hard for
the human eye to recognise structures within scrambled areas.

We expect higher scores in all three categories when applying a more accurate
background subtraction algorithm.

4.   CONCLUSION
We propose a reversible approach for scrambling foreground masks within images
or videos to obscure their content. This approach ensures a high level of
privacy and achieves a standard level in the other aspects, namely
pleasantness and intelligibility. In future we will investigate the effect of
more accurate foreground masks on these scores. The key point is that the
scrambled areas can be recovered for further analysis if the foreground mask
and the password which generated the seed for the pseudo-random number
generation are known.

5.   ACKNOWLEDGMENTS
The research leading to these results has received funding from the European
Community’s FP7 under grant agreement number FP7-261743 (VideoSense).

6.   REFERENCES
[1] A. Badii, T. Ebrahimi, C. Fedorczak, P. Korshunov, T. Piatrik, V. Eiselein,
    and A. Al-Obaidi. Overview of the MediaEval 2014 Visual Privacy Task. In
    MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014.
[2] F. Dufaux and T. Ebrahimi. Video surveillance using JPEG 2000. In Optical
    Science and Technology, the SPIE 49th Annual Meeting, pages 268–275.
    International Society for Optics and Photonics, 2004.
[3] R. Durstenfeld. Algorithm 235: Random permutation. Commun. ACM, 7(7):420,
    July 1964.
[4] P. Korshunov and T. Ebrahimi. PEViD: privacy evaluation video dataset. In
    Applications of Digital Image Processing XXXVI, 25-29 August 2013.
[5] Z. Zivkovic. Improved adaptive Gaussian mixture model for background
    subtraction. In Proceedings of the 17th International Conference on
    Pattern Recognition (ICPR 2004), volume 2, pages 28–31, August 2004.

Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain

</pre>
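The reversible scrambling of Section 2.2 can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the seed is taken from a SHA-256 hash of the password, Python's `random.Random` serves as the PRNG, and the standard Durstenfeld form of the Fisher-Yates shuffle (swap partner j ≤ i) stands in for the modified variant the paper mentions. The function names are hypothetical, and a single mask is treated as one area, so the separation into M disjoint areas is left out.

```python
import hashlib
import random


def seed_from_password(password):
    """Derive an integer PRNG seed from a password (SHA-256 is an assumption)."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def swap_partners(n, seed):
    """Swap partner j (0 <= j <= i) for each position i = n-1, ..., 1."""
    rng = random.Random(seed)
    return [rng.randint(0, i) for i in range(n - 1, 0, -1)]


def scramble(pixels, password):
    """Shuffle a pixel vector with a seeded Fisher-Yates (Durstenfeld) pass."""
    out = list(pixels)
    partners = swap_partners(len(out), seed_from_password(password))
    for i, j in zip(range(len(out) - 1, 0, -1), partners):
        out[i], out[j] = out[j], out[i]
    return out


def descramble(pixels, password):
    """Undo scramble() by replaying the same swaps in reverse order."""
    out = list(pixels)
    partners = swap_partners(len(out), seed_from_password(password))
    for i, j in zip(range(1, len(out)), reversed(partners)):
        out[i], out[j] = out[j], out[i]
    return out


def apply_to_area(frame, mask, password, inverse=False):
    """Scramble (or recover) the masked pixels of a frame, scanned line by line.

    frame: list of rows of pixel values; mask: list of rows of booleans.
    Pixels outside the mask (the background) are copied unchanged.
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, inside in enumerate(row) if inside]
    values = [frame[r][c] for r, c in coords]
    if inverse:
        values = descramble(values, password)
    else:
        values = scramble(values, password)
    result = [list(row) for row in frame]
    for (r, c), value in zip(coords, values):
        result[r][c] = value
    return result


if __name__ == "__main__":
    # Toy 3x4 "frame" of RGB tuples with a small foreground mask.
    frame = [[(r, c, 0) for c in range(4)] for r in range(3)]
    mask = [[False, True, True, False],
            [True, True, True, True],
            [False, True, True, False]]
    hidden = apply_to_area(frame, mask, "secret")
    restored = apply_to_area(hidden, mask, "secret", inverse=True)
    print(restored == frame)  # prints True
```

Because every step is a plain swap, the scrambled area contains exactly the same pixel values as the original, which is why the colour characteristics of each area survive. A pixel corrupted after scrambling ends up as one distorted pixel in the recovered area and leaves all other positions intact, which matches the robustness argument made in the text.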