<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TUB @ MediaEval 2014 Visual Privacy Task: Reversible Scrambling on Foreground Masks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Schmiedeke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Kelm</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lutz Goldmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Sikora</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Imcube Labs GmbH Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper describes our participation in the Visual Privacy Task of MediaEval 2014, which aims to obscure the appearance of humans in image sequences. As a result, the recorded person should be unrecognisable, but, if needed, the obscured areas can be recovered. We use an approach that models the background and pseudo-randomly scrambles the pixels within disjoint foreground areas. This technique is reversible and preserves the colour characteristics of each area, so colour-based approaches will still be able to automatically distinguish between differently dressed individuals. The evaluation of our results shows that the privacy aspect received a high score in all three evaluation streams. The intelligibility and pleasantness of our approach are below average, since scrambling results in less `aesthetic' images.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>1 Communication Systems Group, Technische Universität Berlin, Germany</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        Video surveillance of public spaces is expanding.
Consequently, individuals are increasingly concerned about the
`invasiveness' of such ubiquitous surveillance and fear that
their privacy is at risk. The demands of stakeholders to
prevent criminal activities are often seen to be in conflict with
the privacy requirements of individuals. The main challenge
is to preserve the anonymity of the surveyed individuals while
also fulfilling the stakeholders' needs. The problem of privacy
protection in video surveillance is addressed in this year's
MediaEval Visual Privacy Task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A typical way to protect
privacy in images and videos is to apply techniques such as
blurring or masking. Since these techniques are irreversible,
scrambling was introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: a transform-domain
scrambling technique in which pixels in the respective regions are
pseudo-randomly scrambled based on a secret key. Our
approach is quite similar, but it is applied to the pixels of disjoint
foreground masks to preserve the less invasive image
background. An exemplary frame is shown in Fig. 1.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. METHODOLOGY</title>
      <p>
        Our proposed privacy-protection approach consists of a
background modelling module and a scrambling module that
obfuscates foreground masks. Since the PEViD videos [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
depict static scenes with a low number of appearing and
moving people, the scrambled foreground still allows the
identification of persons' movements and actions. Details such as faces
are only recognisable in the recovered (de-scrambled)
images.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.1 Background Modelling</title>
      <p>
        We use background subtraction to generate a foreground
mask for each frame. In order to compensate for slight camera
movements, each frame is subsampled by a factor of two and the
resulting masks are interpolated accordingly. Our background
modelling module relies on an improved background
subtraction scheme [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] based on Gaussian mixture models (GMM).
This algorithm automatically selects the required number of
Gaussian components per pixel. The mixture of these
components aims to reflect the desired background colour by
incorporating the most recent 300 frames, exploiting the static video
content. The number of components is controlled by a
Mahalanobis distance threshold: if the squared Mahalanobis
distance of a pixel colour to every existing component exceeds
this threshold (th = 15), a new Gaussian is generated.
Foreground pixels are determined by their membership in
components with small weights. We apply erosion and other
morphological operations to the foreground masks to eliminate outliers.
Our aim was to perfectly expose the silhouettes of persons,
but this target was not always achieved (see Fig. 2 for
examples of a good and a bad foreground estimation).
      </p>
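      <p>To make the thresholding step concrete, the following is a minimal single-Gaussian sketch of the foreground test; the actual module uses Zivkovic's adaptive GMM [5] with multiple components per pixel, but the squared Mahalanobis test against th = 15 and the averaging over the recent 300 frames are taken from the text, while the function name and the toy frame are illustrative assumptions.</p>
      <preformat>
```python
import numpy as np

def foreground_step(mean, var, frame, lr=1.0 / 300, th=15.0):
    """One step of a single-Gaussian background model (illustrative).

    A pixel is marked as foreground when its squared Mahalanobis
    distance to the background Gaussian exceeds th; background
    statistics are updated with learning rate lr ~ 1/300, mimicking
    the incorporation of the most recent 300 frames.
    """
    d2 = (frame - mean) ** 2 / np.maximum(var, 1e-6)  # squared Mahalanobis distance
    fg = d2 > th                                      # foreground mask
    bg = ~fg                                          # update only matching (background) pixels
    mean[bg] += lr * (frame[bg] - mean[bg])
    var[bg] += lr * ((frame[bg] - mean[bg]) ** 2 - var[bg])
    return fg

# toy frame: a static background of value 100 and one bright "person" pixel
mean = np.full((4, 4), 100.0)
var = np.full((4, 4), 4.0)
frame = mean.copy()
frame[1, 2] = 200.0
mask = foreground_step(mean, var, frame)  # only pixel (1, 2) is foreground
```
      </preformat>
      <p>In the full scheme the per-pixel mask produced this way is then cleaned with erosion and other morphological operations before scrambling.</p>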
    </sec>
    <sec id="sec-5">
      <title>2.2 Reversible Scrambling</title>
      <p>These foreground areas are then obfuscated by shuffling
their pixels, so an obfuscated area differs from its original
version only in the order of its pixels.</p>
      <p>
        The shuffle algorithm is based on a modified variant of the
Fisher-Yates method [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which generates `random'
permutations. The original sequence consists of M disjoint areas to
be obfuscated. Each area a is represented by a vector
containing its N line-by-line scanned pixels. These areas are
obfuscated by changing the order of their pixels and mapping
the pixels back to their original shape. The new pixel order of
each area is determined by swapping each i-th pixel with the
j-th pixel, where j is defined by a pseudo-random number
generator under the constraint that j ≥ i + 1.
      </p>
      <p>So, the permutation of the pixels of each foreground area
is determined by the order generated by a pseudo-random
sequence. This sequence is repeatable due to
the characteristics of the pseudo-random number generator
(PRNG): the PRNG produces a random, but repeatable,
sequence of integers once a certain fixed
seed is specified. This seed is generated from the hash value of a chosen
password and is fixed for all regions in each frame
and video sequence. Since the pseudo-random sequence is
repeatable through the given seed, the permutation of the pixels
is reversible, so the scrambled image regions can be
recovered given the password and the shape of each disjoint
scrambled area. We chose scrambling instead of
cryptography to be robust against image compression artefacts
and transmission errors. Such errors will also affect the
recovered frame in terms of distorted pixels, but they
will not break the de-scrambling scheme.</p>
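      <p>The scramble/de-scramble pair described above can be sketched as follows: the seed is derived from the hash of a password and drives a Durstenfeld-style shuffle [3] with the constraint j ≥ i + 1. SHA-256 and Python's random.Random are illustrative assumptions, since the paper does not name a specific hash function or PRNG.</p>
      <preformat>
```python
import hashlib
import random

def permutation(n, password):
    """Repeatable pseudo-random permutation: the PRNG seed is derived
    from the hash value of the chosen password."""
    seed = int.from_bytes(hashlib.sha256(password.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    perm = list(range(n))
    # Fisher-Yates / Durstenfeld shuffle, modified so that j >= i + 1
    for i in range(n - 1):
        j = rng.randint(i + 1, n - 1)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def scramble(pixels, password):
    """Shuffle the line-by-line scanned pixel vector of one foreground area."""
    perm = permutation(len(pixels), password)
    return [pixels[p] for p in perm]

def unscramble(pixels, password):
    """Invert the permutation; possible because the seed is repeatable."""
    perm = permutation(len(pixels), password)
    out = [None] * len(pixels)
    for i, p in enumerate(perm):
        out[p] = pixels[i]
    return out

area = list(range(20))                      # one foreground area as a pixel vector
scrambled = scramble(area, "secret")        # order changed, pixel values preserved
restored = unscramble(scrambled, "secret")  # equals the original area
```
      </preformat>
      <p>Because only the pixel order changes, each scrambled area keeps its colour histogram, which is why colour-based clustering still works on the obfuscated regions.</p>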
    </sec>
    <sec id="sec-6">
      <title>3. EXPERIMENTS</title>
      <p>
        The video sequences of the VPT dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are
obscured by scrambling the foreground objects within each frame.
Since face regions are provided with the data set, we
include these regions in our foreground masks. In this way we ensure
that the faces are obscured even if they are not part of our
foreground mask. We are aware that individuals can be identified
not only by their faces but also by their clothes or accessories.
So, the individuals are anonymised as far as possible, while a
colour-based clustering algorithm may still be able to group areas
depicting the same person.
      </p>
      <p>The evaluation of the obscured videos was carried out using
subjective procedures. Three different groups were asked to
watch the videos and respond to questions concerning the
content (number of persons, actions, etc.). Three metrics
were derived from these surveys: pleasantness,
intelligibility, and privacy. The groups consisted of crowdsourced
workers and two focus groups; the scores based on their
opinions are shown in Table 1.</p>
      <p>Pleasantness stands for the influence of the obscuring
filter on the human perception of the image distortion. The
subjective score is based on the level of user acceptance.
Here our score is below the median value, resulting from the
distraction the scrambled regions cause for the users.</p>
      <p>Intelligibility stands for the ability to identify actions
and objects within video frames. All three groups evaluated
our filter with high scores that are close to the median. Since
the full masks of persons are retained, their actions should be
recognisable.</p>
      <p>The privacy metric concerns the identification of
individuals through their faces, ethnicity, or personal accessories.
This score is much higher than the average. A high subjective
score was expected, since it is very hard for the human eye
to recognise structures within scrambled areas.</p>
      <p>We expect higher scores in all three categories when
applying a more accurate background subtraction algorithm.</p>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSION</title>
      <p>We propose a reversible approach for scrambling
foreground masks within images or videos to obscure their content.
This approach ensures a high level of privacy and achieves
a standard level in the other aspects, namely pleasantness and
intelligibility. In the future we will investigate the effect of more
accurate foreground masks on these scores. The
key property is that the scrambled areas can be recovered for further
analysis if the foreground mask and the password from which
the seed for the pseudo-random number generation is derived are
known.</p>
    </sec>
    <sec id="sec-8">
      <title>5. ACKNOWLEDGMENTS</title>
      <p>The research leading to these results has received funding
from the European Community's FP7 under grant
agreement number FP7-261743 (VideoSense).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fedorczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Piatrik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Eiselein</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Obaidi</surname>
          </string-name>
          .
          <article-title>Overview of the MediaEval 2014 Visual Privacy Task</article-title>
          . In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Dufaux</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>Video surveillance using JPEG 2000</article-title>
          .
          <source>In Optical Science and Technology, the SPIE 49th Annual Meeting</source>
          , pages
          <fpage>268</fpage>
          -
          <lpage>275</lpage>
          .
          <source>International Society for Optics and Photonics</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Durstenfeld</surname>
          </string-name>
          .
          <article-title>Algorithm 235: Random permutation</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>7</volume>
          (
          <issue>7</issue>
          ):
          <fpage>420</fpage>
          ,
          <year>July 1964</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>PEViD: privacy evaluation video dataset</article-title>
          .
          <source>Applications of Digital Image Processing XXXVI</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          August
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zivkovic</surname>
          </string-name>
          .
          <article-title>Improved adaptive Gaussian mixture model for background subtraction</article-title>
          .
          <source>In Pattern Recognition</source>
          ,
          <year>2004</year>
          .
          <article-title>ICPR 2004</article-title>
          .
          <source>Proceedings of the 17th International Conference on</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>28</fpage>
          -
          <lpage>31</lpage>
          Vol.
          <volume>2</volume>
          , Aug
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>