<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNIZA@Mediaeval 2014 Visual Privacy Task: Object Transparency Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Paralič</string-name>
          <email>martin.paralic@fel.uniza.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Jarina</string-name>
          <email>roman.jarina@fel.uniza.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Žilina, Faculty of Electrical Engineering, Department of Telecommunications and Multimedia</institution>
          ,
          <addr-line>Univerzitná 1, 01026 Žilina</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Žilina, Faculty of Electrical Engineering, Department of Telecommunications and Multimedia</institution>
          ,
          <addr-line>Univerzitná 1, 01026 Žilina</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper describes our approach for the Visual Privacy Task (VPT) of MediaEval 2014. Video privacy filtering based on privacy-sensitive-object transparency is proposed. The background (hidden behind the object) is estimated by median filtering over the time sequence of pixel values. We focus only on the areas labeled as highly privacy sensitive (i.e., the face). Low- and medium-privacy areas were left largely untouched to preserve as much information as possible about the person's activities. Despite its simplicity, the proposed method gives promising performance, at or slightly above the average among the VPT participants.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The problem of privacy protection in video surveillance is again addressed in this year's MediaEval Visual Privacy Task (VPT) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The PEViD dataset [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is used for the impact assessment of alternative solutions. Recently, a variety of image-processing methods have been developed to protect privacy in multimedia content. A common approach is based on replacing the sensitive information with colored boxes or distorting the pixels. More sophisticated methods use person-silhouette detection followed by blurring of the whole person [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other methods are based on encrypting sensitive regions, where the process is reversible only for authorized persons who know the encryption key [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The disadvantage of covering private information is that the person's activities, the detection of which is crucial for surveillance purposes, are often also hidden or altered. The aim of this research activity is the development of a visual filtering method that keeps as much information as possible about the person's activities in the video while keeping the person's privacy intact.</p>
      <p>We propose filtering that renders the privacy-sensitive objects transparent. The background (hidden behind the object) is estimated by computing the median over the time sequence of values for each pixel. We focused only on the areas labeled as highly privacy sensitive (i.e., the face). Low- and medium-privacy areas were left largely untouched to preserve most of the information about the person's activities. This method aims to minimize the discomfort of watching the filtered video. Thus we utilized only the face-position labels from the provided XML metadata.</p>
    </sec>
    <sec id="sec-2">
      <title>2. OBJECT SEGMENTATION</title>
      <p>
        One of the requirements for privacy filtering is object segmentation. In this task, two kinds of segmentation were available. The first one, automatic segmentation as described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], was utilized for background estimation. The second one is manual annotation of the video stream in XML form [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We extracted the face bounding-box information from the provided XML metadata and used it for filtering. The box was fitted with a masking ellipse.
      </p>
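      <p>Fitting the bounding box with a masking ellipse can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name, the (x, y, w, h) box layout, and the inscribed-ellipse formulation are assumptions:</p>

```python
import numpy as np

def ellipse_mask(box, frame_shape):
    """Boolean mask of the ellipse inscribed in a face bounding box.

    box: (x, y, w, h) face bounding box (assumed layout of the XML annotation).
    frame_shape: (height, width) of the video frame.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0   # ellipse centre
    ax, ay = w / 2.0, h / 2.0           # semi-axes
    rows, cols = np.ogrid[:frame_shape[0], :frame_shape[1]]
    # A point lies inside the inscribed ellipse iff the canonical
    # ellipse inequality ((col-cx)/ax)^2 + ((row-cy)/ay)^2 is at most 1.
    dist = ((cols - cx) / ax) ** 2 + ((rows - cy) / ay) ** 2
    return np.less_equal(dist, 1.0)

# Usage: pixels where the mask is True are replaced by background pixels.
mask = ellipse_mask((40, 20, 30, 40), (120, 160))
```
      <p>The ellipse removes the box corners, which in a face bounding box mostly contain background rather than privacy-sensitive pixels.</p>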
    </sec>
    <sec id="sec-3">
      <title>3. THE PROPOSED METHOD</title>
      <p>
        We examined a simple and straightforward method based on filtering that replaces highly privacy-sensitive pixel areas with background pixels, as depicted in Figure 1. The key part of the proposed algorithm is a proper estimation of the whole background scene, with the aim of uncovering the background parts that are highly privacy sensitive (e.g., the face). However, in some scenes the background can be partially invisible, as depicted in Figure 2. The background-estimation procedure is as follows. The time sequence of RGB values of each pixel is transformed into a time sequence of grayscale values for sorting purposes. Then the median over each time sequence is computed. In addition, the foreground objects detected by automatic segmentation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are obscured with black pixels [R; G; B] = [0; 0; 0]. This step is applied to avoid including the foreground objects in the background-pixel estimation. For a sufficient background estimate, it is crucial that each background pixel be visible in at least one video frame. The background image is built from the RGB values at the middle of the sorted time sequences of pixel values.
      </p>
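      <p>The background-estimation procedure above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation; it assumes the clip is given as a (T, H, W, 3) RGB array in which the automatic segmentation has already blacked out the foreground:</p>

```python
import numpy as np

def estimate_background(frames):
    """Estimate the static background of a video clip.

    frames: array of shape (T, H, W, 3), RGB, with foreground objects
    already masked to black [0, 0, 0] by the automatic segmentation.
    Returns an (H, W, 3) background image.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Grayscale value of each pixel over time, used only as the sort key.
    gray = frames.mean(axis=3)                         # (T, H, W)
    # Per pixel, sort the frames by grayscale value; blacked-out
    # foreground pixels sink to the bottom of the sorted sequence.
    order = np.argsort(gray, axis=0)                   # (T, H, W)
    # Index of the frame at the middle of each sorted sequence (the median).
    mid = order[order.shape[0] // 2]                   # (H, W)
    # Pick the RGB triple of that median frame for every pixel.
    h_idx, w_idx = np.indices(mid.shape)
    return frames[mid, h_idx, w_idx].astype(np.uint8)  # (H, W, 3)
```
      <p>Sorting by a scalar grayscale key while copying back the full RGB triple keeps the three channels of each output pixel consistent, which a per-channel median would not guarantee.</p>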
    </sec>
    <sec id="sec-4">
      <title>4. THE EVALUATION FRAMEWORK</title>
      <p>
        The video sequences were evaluated to fulfill the UI-REF privacy protection requirements. Overall results of the crowd evaluation for the submitted entries were qualified in terms of the following criteria [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]:
      </p>
      <p>The Privacy Protection Level – the average level of privacy protection across all clips.</p>
      <p>Level of Intelligibility – the amount of useful information retained after filtering.</p>
      <p>The Appropriateness – the aesthetic perceptual appeal to human viewers.</p>
      <p>The evaluation scenario comprised the following three streams:
Stream 1: For the crowdsourcing evaluation, about 290 workers answered several privacy-, intelligibility-, and pleasantness-related questions for 6 pre-selected videos from the test videos submitted by the participants.</p>
      <p>Stream 2: A focus group comprising 65 participants (15 females) from Thales, France took part in this evaluation. The majority of the participants were staff from the R&amp;D departments, while the rest were from the Management, Security, and other departments.</p>
      <p>Stream 3: A focus group comprising 59 participants (22 females) from sectors including R&amp;D, data protection, and law enforcement, from around the world, took part in this study.</p>
    </sec>
    <sec id="sec-5">
      <title>5. EVALUATION RESULTS</title>
      <p>The evaluation results of the proposed filter's performance were obtained in the three streams according to the VPT 2014 evaluation scenario. The reported results of the test among the VPT 2014 participants, which were evaluated in terms of the defined criteria, are presented as the median score over ten teams. This median serves as a baseline against which to compare our results. The performance of the proposed approach compared to the median results is depicted in Figure 3. Despite its simplicity, the proposed method gives surprisingly promising performance. The obtained score is at, or slightly above, the average in terms of all three criteria.</p>
      <p>The challenging problem is the detection and estimation of the partially invisible background, as shown in Figure 2.</p>
      <p>[Figure 3: intelligibility, privacy, and pleasantness scores (0–70%) of the UNIZA runs 1–3 compared with the corresponding median scores over the participants.]</p>
      <p>In our future work, we will focus on automatic detection of the face position and use more precise filtering tight around the face contours, as well as more sophisticated translucency techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Obaidi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Einig</surname>
          </string-name>
          .
          <article-title>MediaEval 2013 visual privacy task: Holistic evaluation framework for privacy by co-design impact assessment</article-title>
          .
          <source>In MediaEval 2013</source>
          , pages 1–1. CEUR-WS.org,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fedorczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Piatrik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Eiselein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Obaidi</surname>
          </string-name>
          .
          <article-title>Overview of the MediaEval 2014 visual privacy task</article-title>
          .
          <source>In MediaEval 2014</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Einig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tiemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thiemert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lallah</surname>
          </string-name>
          .
          <article-title>Visual context identification for privacy-respecting video analytics</article-title>
          .
          <source>In 14th IEEE MMSP International Workshop on Multimedia Signal Processing</source>
          , pages 366–371,
          <year>September 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          .
          <article-title>PICO: Privacy through invertible cryptographic obscuration</article-title>
          .
          <source>In Computer Vision for Interactive and Intelligent Environments</source>
          , pages 27–38,
          <year>November 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Fradi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Eiselein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Keller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Dugelay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          .
          <article-title>Crowd context-dependent privacy protection filters</article-title>
          .
          <source>In 18th International Conference on Digital Signal Processing</source>
          , pages 1–6,
          <year>July 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>Crowdsourcing approach for evaluation of privacy filters in video surveillance</article-title>
          .
          <source>In 2013 18th International Conference on Digital Signal Processing (DSP)</source>
          , pages 1–6. ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>PEViD: Privacy evaluation video dataset</article-title>
          .
          <source>In Applications of Digital Image Processing XXXVI, SPIE International Society for Optics and Photonics</source>
          ,
          <year>August 2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>