<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shape and Color-aware Privacy Protection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Melle</string-name>
          <email>andrea.melle@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Luc Dugelay</string-name>
          <email>jean-luc.dugelay@eurecom.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>450 Route Des Chappes, Biot</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>We introduce a novel content-independent filter to protect privacy-sensitive Regions Of Interest (ROI) in video surveillance sequences. An abstracted version of the original image is rendered such that the general appearance of shapes and colors is preserved, while fine details carrying personal visual information are obfuscated. We use a shape- and color-aware, temporally coherent segmentation algorithm, combined with a color quantization and patch rendering step.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The increasing adoption of video surveillance systems has
led to growing research interest in privacy protection
methods. A review of principles for privacy protection in video
surveillance can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], while an evaluation of
several existing protection filters is reported in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. One
persistent challenge in privacy protection remains to find the
correct balance between obfuscation of personal visual
information, intelligibility of the source, and pleasantness.
      </p>
      <p>
        Non-photorealistic rendering techniques described in the
literature achieve artistic effects such as tooning, painting, or
sketching. For example, in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] the authors propose a video
abstraction pipeline based on a bilateral filter and color
quantization, and subjectively evaluate both visual pleasantness
and intelligibility, concluding that abstracted
images favor general content understanding. The use of
segmentation to obtain a pixelized result resembling pixel art
has been proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, applied to
privacy protection, this method would carry the same drawbacks as the
commonly adopted pixelization filter [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>We propose a new privacy protection filter inspired by
results in the image abstraction and non-photorealistic rendering
fields. Our method is based on a boundary- and
region-aware segmentation algorithm, combined with a color
quantization and patch rendering step, which transforms the
original privacy-sensitive ROI into a stylized and simplified
version. While the general appearance of shapes and colors is
preserved, to allow for people and action detection tasks,
identifying details, such as faces and clothing traits, are
obfuscated to make identification impossible.</p>
    </sec>
    <sec id="sec-2">
      <title>2. PROPOSED APPROACH</title>
      <p>Given a video sequence together with bounding boxes
defining the privacy-sensitive ROIs, our algorithm proceeds
in three steps. First, a segmentation algorithm divides the
image into boundary-aware patches. Second, the image is
abstracted by replacing the pixels in each patch with a single
color chosen from a palette. Finally, the abstracted image
is rendered on top of the original frame to produce the final
output. If additional region annotations or background
subtraction maps are available, the final result can be further
refined by binary masking. Figure 1 shows an example of an
original and a filtered frame.</p>
      <p>The algorithm allows adaptation to the desired strength of
privacy protection. By varying the number of patches, either
globally or independently in certain regions, we can obtain
different levels of abstraction. Our C++ implementation
takes on average less than 0.5 seconds/frame for
segmentation and about 0.3 seconds/frame for color quantization and
rendering.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Segmentation</title>
      <p>
        The intuition behind our privacy protection algorithm is
to render an abstracted version of the image by replacing
patches of pixels with a single color chosen from a palette.
To preserve intelligibility and visual pleasantness, we aim
for a region- and boundary-aware process. Accordingly, we
adopt a segmentation procedure which divides the image into a
user-specified number N of arbitrarily shaped patches,
maximizing both their spatial and color consistency. A good
review of patch, or superpixel, segmentation methods, together
with the original description of the algorithm we adopted in
our work (SLIC), can be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The SLIC segmentation algorithm is based on K-means
clustering [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] performed in a 5D space which includes both
spatial coordinates (x, y) and color values in the
perceptually uniform (L, a, b) space. While the original formulation
of SLIC works best for still images, when applied to video
sequences jittery artifacts appear, due to temporal
inconsistencies in the color and shape of patches over successive frames.
Therefore, we adopt an extension of the algorithm which
enforces temporal consistency by including the temporal
dimension t in the clustering distance metric: a video can be
represented as a 3D volume by stacking up its frames over
the time dimension, and therefore segmented into supervoxels.
A combined distance metric is obtained as a linear
combination of the two L2 norms on space-time (x, y, t) coordinates
and color (L, a, b) values:
      </p>
      <p>D = d_Lab + c √(N / (R C)) d_xyt
(1)
where N is the desired number of patches, (R, C) are the
height and width of the ROI, and c is a compactness
parameter balancing the trade-off between spatial proximity and
color similarity of the resulting clusters.</p>
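      <p>In outline, the combined distance of Eq. (1) between a voxel and a cluster center can be computed as follows (a minimal numpy sketch under our reading of the equation; function and variable names are ours):</p>

```python
import numpy as np

def slic_supervoxel_distance(p, q, N, R, C, c):
    """Combined SLIC-style distance between a voxel p and a cluster
    center q, each given as a 6-tuple (x, y, t, L, a, b).

    d_Lab is the Euclidean distance in CIELab color space and d_xyt
    the Euclidean distance in space-time; the spatial term is scaled
    by c * sqrt(N / (R * C)), i.e. the compactness c divided by the
    expected patch spacing, as in Eq. (1).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d_xyt = np.linalg.norm(p[:3] - q[:3])   # space-time proximity
    d_lab = np.linalg.norm(p[3:] - q[3:])   # color similarity
    return d_lab + c * np.sqrt(N / (R * C)) * d_xyt
```

A larger compactness c yields more regular, compact patches; a smaller c lets patch boundaries follow color edges more closely.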
      <p>The output of the segmentation algorithm is a
segmentation label map, where each patch is identified by a unique
label. When additional annotation corresponding to a
specific region, such as a face, is available, we enforce a higher
level of privacy protection by merging all the patches
substantially overlapping with that region, to ensure proper
obfuscation of shape and color details.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Color quantization and patch rendering</title>
      <p>
        We keep a palette of a small fixed number of colors (e.g.
8), progressively updated from the incoming frames as
follows: we first compute the average color for each patch
and subsequently build the palette with a K-medoids
quantization [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] over all the color occurrences in the current frame and the
n = 5 most recent previous frames. Each patch is then filled
with the closest color in the palette. The resulting filtered
image still resembles the original one in general shape and
color appearance, but the fine details are destroyed.
      </p>
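      <p>The palette step above can be sketched as follows (our own minimal illustration; the exact K-medoids variant of [5] used by the authors may differ, and the helper names are hypothetical):</p>

```python
import numpy as np

def patch_mean_colors(image, labels):
    """Average color of each labeled patch (labels 0..K-1)."""
    k = labels.max() + 1
    return np.array([image[labels == i].mean(axis=0) for i in range(k)])

def kmedoids_palette(colors, n_colors=8, n_iter=10, seed=0):
    """Tiny K-medoids: medoids are actual observed colors, so the
    palette never contains a color absent from the recent frames.
    Initialized from the distinct observed colors (clamped if fewer
    than n_colors of them exist)."""
    rng = np.random.default_rng(seed)
    uniq = np.unique(colors, axis=0)
    k = min(n_colors, len(uniq))
    medoids = uniq[rng.choice(len(uniq), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(colors[:, None] - medoids[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = colors[assign == j]
            if len(members) == 0:
                continue
            # new medoid: the member minimizing total distance to the rest
            dd = np.linalg.norm(members[:, None] - members[None], axis=2)
            medoids[j] = members[dd.sum(axis=1).argmin()]
    return medoids

def quantize(image, labels, palette):
    """Fill each patch with the palette color closest to its mean."""
    means = patch_mean_colors(image, labels)
    d = np.linalg.norm(means[:, None] - palette[None], axis=2)
    return palette[d.argmin(axis=1)][labels]
```

In the paper's setting, colors would accumulate patch averages from the current and the five previous frames before re-running kmedoids_palette.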
    </sec>
    <sec id="sec-5">
      <title>2.3 Masking</title>
      <p>To make the result visually more appealing and avoid
filtering non-sensitive regions, we crop the abstract image with
a foreground mask, inferred from the annotations and
background subtraction maps, when available. Very sensitive
regions such as the face and skin are represented with an ellipse in
the mask, to enforce maximum protection. The final frame
is computed as:</p>
      <p>I_out = I_a · [S(L, M_f) ≥ T] + I_in · [S(L, M_f) &lt; T]
(2)
where I_out is the final rendered image, I_a the abstract image,
I_in the input image, L the segmentation label map, M_f
the foreground mask, and T a threshold. S is a support operator which counts
the number of foreground pixels for each given patch label.
In this way, each patch is either rendered fully abstracted
or fully original in the final image.</p>
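      <p>The per-patch compositing described above can be sketched as follows (a minimal illustration of the patch-count rule; function name ours):</p>

```python
import numpy as np

def patch_mask_composite(i_in, i_abs, labels, fg_mask, T):
    """Per-patch compositing in the spirit of Eq. (2): a patch is
    rendered fully abstracted when its foreground-pixel count
    S(L, Mf) reaches the threshold T, and fully original otherwise."""
    k = labels.max() + 1
    # S: number of foreground pixels under each patch label
    S = np.bincount(labels[fg_mask > 0].ravel(), minlength=k)
    use_abstract = (S >= T)[labels]            # per-pixel decision map
    return np.where(use_abstract[..., None], i_abs, i_in)
```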
    </sec>
    <sec id="sec-6">
      <title>3. RESULTS</title>
      <p>
        We applied the proposed method to selected sequences
from the PEViD dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Evaluation has been performed
according to the MediaEval 2013 Visual Privacy Task
guidelines, as described in great detail in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Table 1 reports our
scores, together with the average score of all participants in
the challenge.
      </p>
      <sec id="sec-6-1">
        <title>Objective</title>
        <p>Score / Average:
0.563 / 0.502
0.576 / 0.665
0.385 / 0.56</p>
      </sec>
      <sec id="sec-6-2">
        <title>Subjective</title>
        <p>Score / Average:
0.728 / 0.656
0.607 / 0.684
0.514 / 0.492</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSIONS</title>
      <p>In this paper we have proposed a novel privacy filter based
on a region-aware segmentation algorithm combined with a
color quantization and abstract rendering step. The result
is a stylized image where the general intelligibility of shape
and color is preserved, but the fine details of visual features
are destroyed.</p>
    </sec>
    <sec id="sec-8">
      <title>5. ACKNOWLEDGMENTS</title>
      <p>This work has been conducted within the framework of
the EC-funded Network of Excellence VideoSense.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Achanta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lucchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fua</surname>
          </string-name>
          , and
          <string-name>
            <surname>S. SuILsstrunk.</surname>
          </string-name>
          <article-title>Slic superpixels compared to state-of-the-art superpixel methods</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>34</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2274</fpage>
          -
          <lpage>2282</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Einig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Piatrik</surname>
          </string-name>
          .
          <article-title>Overview of the mediaeval 2013 visual privacy task</article-title>
          .
          <source>MediaEval 2013 Workshop, October 18-19</source>
          ,
          <year>2013</year>
          , Barcelona, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Dufaux</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>A Framework for the Validation of Privacy Protection Solutions in Video Surveillance</article-title>
          .
          <source>In Proc. of IEEE International Conference on Multimedia &amp; Expo</source>
          . IEEE,
          <year>2010</year>
          . Special session on "Privacy-aware Multimedia Surveillance".
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gerstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>DeCarlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alexa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Finkelstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gingold</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nealen</surname>
          </string-name>
          .
          <article-title>Pixelated image abstraction</article-title>
          .
          <source>In Proceedings of the International Symposium on Non-Photorealistic Animation and Rendering (NPAR)</source>
          ,
          <year>June 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaufman</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          .
          <article-title>Clustering by means of medoids</article-title>
          .
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>PEViD: privacy evaluation video dataset</article-title>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Melle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Dugelay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>A framework for objective evaluation of privacy filters in video surveillance</article-title>
          .
          <source>In Proceedings of SPIE</source>
          , volume
          <volume>8856</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lloyd</surname>
          </string-name>
          .
          <article-title>Least squares quantization in PCM</article-title>
          .
          <source>IEEE Trans. Inf. Theor.</source>
          ,
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <fpage>129</fpage>
          -
          <lpage>137</lpage>
          , Sept.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Senior</surname>
          </string-name>
          .
          <article-title>Privacy protection in a video surveillance system</article-title>
          . In A. Senior, editor,
          <source>Protecting Privacy in Video Surveillance</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>47</lpage>
          . Springer London,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Winnemöller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Olsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gooch</surname>
          </string-name>
          .
          <article-title>Real-time video abstraction</article-title>
          .
          <source>In ACM SIGGRAPH 2006 Papers, SIGGRAPH '06</source>
          , pages
          <fpage>1221</fpage>
          -
          <lpage>1226</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>