Overview of the MediaEval 2014 Visual Privacy Task

Atta Badii (1), Touradj Ebrahimi (2), Christian Fedorczak (3), Pavel Korshunov (4), Tomas Piatrik (5), Volker Eiselein (6), Ahmed Al-Obaidi (7)
(1, 7) {atta.badii, a.al-obaidi}@reading.ac.uk
(2, 4) {touradj.ebrahimi, pavel.korshunov}@epfl.ch
(3) christian.fedorczak@thalesgroup.com
(5) t.piatrik@qmul.ac.uk
(6) eiselein@nue.tu-berlin.de

ABSTRACT
This paper presents an overview of the Visual Privacy Task (VPT) of MediaEval 2014, its objectives, related dataset, and evaluation approaches. Participants in this task were required to implement a privacy filter, or a combination of filters, to protect various personal information regions in the video sequences provided. The challenge was to achieve an adequate balance between the degree of privacy protection, intelligibility (how much useful information is retained after privacy filtering), and pleasantness (how minimal the adverse effects of filtering were on the appearance of the video frames). The submissions from the eight teams who participated in this task were evaluated subjectively by surveillance experts, practitioners, and data protection experts, and by naïve viewers using a crowdsourcing approach.

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

1. INTRODUCTION
Advances in artificial intelligence and video surveillance have led to increasingly complex surveillance systems of rising scale and capability. The ubiquity and enhanced capability of such surveillance can pose significant threats to citizens' privacy, and new mitigation technologies are therefore needed to ensure an appropriate level of privacy protection. The Visual Privacy Task (VPT) of MediaEval 2014 thus provided an opportunity for experimentation to explore how video-analytic techniques may arrive at enhanced solutions to some visual privacy problems [1]. The task focuses on privacy protection techniques that are responsive to the context-specific privacy needs of persons. The evaluation was performed using three distinct user studies aimed at developing a deeper understanding of users' perceptions of the effects and side-effects of privacy filtering, so as to ensure the validity and user-acceptability of the evaluation results.

2. VPT 2014 DATASET
The PEViD dataset [2] was specifically created for impact assessment of privacy protection technologies. The dataset consists of two subsets, a training set and a testing set, comprising 21 videos captured with both standard- and high-resolution cameras. The video clips are in MPEG format at full HD resolution (1920x1080 pixels), at a rate of 25 frames per second, and each is approximately 16 seconds long.

The video data includes various scenarios featuring one or several human subjects walking or interacting. The actors may also carry specific items which could potentially reveal their identity and may therefore need to be privacy-filtered appropriately. For example, the actors are featured carrying backpacks and umbrellas, wearing scarves, and performing various actions such as fighting, pickpocketing, dropping a bag, or simply walking. Actors may be at a distance from the camera or near it, making their faces appear with varying pixel size and quality. The ambient lighting conditions of the videos also varied widely, as they recorded a range of indoor, outdoor, and day/night-time scenes.

The ground truth was created manually by the task organisers and consisted of annotations of the bounding boxes containing regions of High (H), Medium (M), or Low (L) sensitivity Personally Identifiable Information elements (PIIs), including persons' faces and accessories. In order to simulate context-aware privacy protection solutions [3], unusual events occurring within the video dataset, such as fighting, stealing, and dropping a bag, were also annotated. The annotations were provided in XML format, alongside a foreground mask in the form of binary sequences. These included annotations that distinguished the relative privacy sensitivity of PIIs, namely Skin (M), Face (H), Hair (L), Accessories (M), and Person's body (L). The dataset was provided in accordance with European Data Protection and ethical compliance guidelines, including informed consent and access control as required. Figure 1 depicts a sample frame from the dataset with annotated regions as rectangles.

Figure 1: Sample annotated frame from the VPT Dataset

3. MOTIVATION AND OBJECTIVES
The MediaEval 2014 Visual Privacy Task was motivated by application domains such as privacy filtering of videos taken in public spaces, by smartphones, web-cams, and surveillance CCTV, and of videos stored on social websites. For this task, participants were encouraged to implement a combination of several privacy filters to protect various personal information regions in videos, optimising the privacy filtering so as to: i) obscure such personal information effectively, whilst ii) keeping as much as possible of the 'useful' information that would enable a human viewer to form some 'useful' interpretation of the obscured video frame at some level of abstraction, without compromising the privacy protection level required by the person(s) featured in the video frame.

Personal visual information is subjectively human-perceived information that can expose a person's identity to a human viewer. This can include richly detailed image regions, such as distinctive facial features or personal jewellery, as well as less rich uniform regions, e.g. skin regions (that expose racial identity) or a body silhouette showing a person's gait (which generally helps to differentiate women from men and in some cases may even enable a close friend or a spouse to identify the person). Therefore, to satisfy both of the above-mentioned criteria i) and ii), privacy protection solutions were required to take into account different types of visual personal information. Participants were encouraged to exploit the annotation information to achieve the appropriate level of privacy filtering for each person, object, and Low/Medium/High information region, and accordingly to select the best-fit filtering. It was anticipated that a single privacy filter applied to all parts of an image would result in a sub-optimal solution, and that a combination of several privacy filters would provide more effective filtering.
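Such region-sensitive filtering can be sketched as follows. This is a minimal illustration only: the pixelation filter, the block sizes, and the sensitivity-to-strength mapping are assumed for the example and do not correspond to any particular team's submission.

```python
# Sketch: apply coarser pixelation to regions annotated with higher
# privacy sensitivity (H/M/L). Frames are modelled as 2D lists of
# grayscale values; block sizes are illustrative assumptions.

def pixelate(frame, box, block):
    """Pixelate region box = (x, y, w, h) of `frame` in place by
    replacing each block x block tile with its mean value."""
    x, y, w, h = box
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            tile = [(r, c)
                    for r in range(by, min(by + block, y + h))
                    for c in range(bx, min(bx + block, x + w))]
            mean = sum(frame[r][c] for r, c in tile) // len(tile)
            for r, c in tile:
                frame[r][c] = mean

# Assumed mapping: stronger obscuration for more sensitive regions.
BLOCK_FOR_SENSITIVITY = {"H": 8, "M": 4, "L": 2}

def filter_frame(frame, annotations):
    """Apply a per-region filter for each (sensitivity, box) pair,
    e.g. as parsed from the XML ground-truth annotations."""
    for sensitivity, box in annotations:
        pixelate(frame, box, BLOCK_FOR_SENSITIVITY[sensitivity])
    return frame
```

In practice a submission would combine several distinct filters (e.g. blurring, masking, scrambling) rather than a single pixelation at varying strengths; the sketch only illustrates selecting filter strength per annotated region.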
4. SUBMISSIONS EVALUATIONS
The submitted video clips were methodically evaluated using UI-REF-based privacy protection requirements. Accordingly, the evaluations attempted to assess the perceived efficacy, as well as the side-effects and affective impacts, arising from a proposed privacy filtering solution, as described in [4, 5]. Eight research teams submitted privacy-filtered video sequences for the evaluation. In the context of surveillance scenarios, three distinct user studies were conducted to ensure the validity of the evaluation results. The subjective evaluations comprised:

a) Stream 1: crowdsourcing evaluations by the general public from online communities ("naïve subjects"), in accordance with the methodology in [6];
b) Stream 2: subjective evaluations by security system manufacturers and developers of video-analytics technology and privacy protection solutions;
c) Stream 3: online subjective evaluations by a focus group comprising trained CCTV monitoring professionals and law enforcement personnel.

For consistency in the analysis of evaluation results from all streams for all participants' solutions, the same six video clips were pre-selected from each submission and evaluated in all three evaluation streams. A questionnaire consisting of 12 questions had been carefully designed to examine aspects related to privacy, intelligibility, and pleasantness; this was used in Streams 2 and 3. The first five questions were aimed at eliciting the opinions of the evaluators regarding the contents of the viewed videos. The responses to these questions were assessed against the ground truth.
The rest of the questions were aimed at eliciting the subjective opinions of the evaluators regarding the viewed videos. Stream 1 used a shortened version of the questionnaire, with seven questions in total, due to crowdsourcing constraints. Some 290 workers responded to the crowd-sourced evaluations. In the design of the crowdsourcing campaign, special care was taken to ensure that a worker would not see the same content with different filters (only one filter per content) and would not see different contents with the same filter (only one content per filter). Also, only the answers from reliable crowdsourcing workers were taken into account. Reliability was ensured via honeypots, mean and deviation metrics of the time per response to a question, and the total time per campaign. Out of the total of 290 workers, 230 were found to have provided reliable responses across all eight evaluation batches, which resulted in 230/8 ≈ 29 sets of workers' evaluations for each filter submitted by each participant.
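The balancing constraint described above (one filter per content and one content per filter for any given worker) can be sketched as a simple rotation scheme. This is an illustration under assumed names only; the actual assignment mechanism of the crowdsourcing campaign is not detailed in this paper.

```python
# Sketch: Latin-square style assignment so that no worker rates the
# same content under two filters, nor two contents under one filter.

def assign(workers, contents, filters):
    """Return {worker: [(content, filter), ...]}. Assumes equally
    many contents and filters; worker i sees content j paired with
    filter (i + j) mod n, rotating the pairings across workers."""
    n = len(filters)
    assert len(contents) == n, "needs one filter per content"
    return {
        w: [(contents[j], filters[(i + j) % n]) for j in range(n)]
        for i, w in enumerate(workers)
    }
```

With n workers per batch, this rotation also covers every (content, filter) pair exactly once, which supports comparing filters across contents.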
In the Stream 2 evaluations, the focus group consisted of 65 participants, 15 of them female; staff from Thales, France took part in this evaluation. The majority of the participants were from the R&D departments, while the rest were from Management, Security, and other departments. The submissions were evaluated via paper-based responses to the questions.

In the Stream 3 evaluations, the focus group comprised 59 participants, including 22 females. This group included some key stakeholder types, such as people from R&D, data protection, and law enforcement, who took part in this study from around the world. The participants streamed the videos and answered the questionnaire using online forms.

As a result of the described evaluations, VPT participants received a set of 3-by-3 matrices comprising the results of each participant for each tier of evaluation, quantified in terms of the following criteria:

1) The Privacy Protection Level: the average level of privacy protection across all testing video clips.
2) The Level of Intelligibility: the amount of 'useful' information that was retained in the video frames after privacy filtering had been applied.
3) The Pleasantness of the resulting privacy-filtered video frames in terms of their 'aesthetic' perceptual appeal to human viewers.

Figure 2 depicts an overview of the results from the three evaluation streams, represented by the median values of the submissions for each criterion.

Figure 2: Median values of the results from the 3 streams

5. ACKNOWLEDGMENTS
The Visual Privacy Task at MediaEval 2014 was supported by the European Commission under contract FP7-261743 VideoSense.

6. REFERENCES
[1] Senior, A., Pankanti, S., Hampapur, A., Brown, L., Tian, Y.L., Ekin, A., Connell, J., Shu, C.F., and Lu, M., Enabling video privacy through computer vision; IEEE Security and Privacy 3, no. 3, pp. 50-57, 2005.
[2] Korshunov, P., and Ebrahimi, T., PEViD: privacy evaluation video dataset; Applications of Digital Image Processing XXXVI, San Diego, USA, August 2013.
[3] Badii, A., Einig, M., Tiemann, M., Thiemert, D., and Lallah, C., Visual context identification for privacy-respecting video analytics; in IEEE 14th International Workshop on Multimedia Signal Processing (MMSP 2012), pp. 366-371, Banff, Canada, September 2012.
[4] Badii, A., "UI-REF Methodology", articles online.
[5] Badii, A., Al-Obaidi, A., and Einig, M., MediaEval 2013 Visual Privacy Task: Holistic Evaluation Framework for Privacy by Co-Design Impact Assessment; Proceedings of the MediaEval 2013 Workshop, Barcelona, Spain, 18-19 October 2013.
[6] Korshunov, P., Nemoto, H., Skodras, A., and Ebrahimi, T., Crowdsourcing-based Evaluation of Privacy in HDR Images; SPIE Photonics Europe 2014, Optics, Photonics and Digital Technologies for Multimedia Applications, Brussels, Belgium, Proceedings of SPIE, 2014.