Overview of the MediaEval 2014 Visual Privacy Task

Atta Badii (1), Touradj Ebrahimi (2), Christian Fedorczak (3), Pavel Korshunov (4), Tomas Piatrik (5), Volker Eiselein (6), Ahmed Al-Obaidi (7)
(1, 7) {atta.badii, a.al-obaidi}@reading.ac.uk
(2, 4) {touradj.ebrahimi, pavel.korshunov}@epfl.ch
(3) christian.fedorczak@thalesgroup.com
(5) t.piatrik@qmul.ac.uk
(6) eiselein@nue.tu-berlin.de

ABSTRACT
This paper presents an overview of the Visual Privacy Task (VPT) of MediaEval 2014, its objectives, related dataset, and evaluation approaches. Participants in this task were required to implement a privacy filter, or a combination of filters, to protect various personal information regions in the video sequences provided. The challenge was to achieve an adequate balance between the degree of privacy protection, intelligibility (how much useful information is retained after privacy filtering), and pleasantness (how minimal the adverse effects of filtering were on the appearance of the video frames). The submissions from the eight teams who participated in this task were evaluated subjectively by surveillance experts, practitioners, and data protection experts, and by naïve viewers using a crowdsourcing approach.

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

1. INTRODUCTION
Advances in artificial intelligence and video surveillance have led to increasingly complex surveillance systems of rising scale and capability. The ubiquity and enhanced capability of such surveillance can pose significant threats to citizens' privacy, and new mitigation technologies are therefore needed to ensure an appropriate level of privacy protection. The Visual Privacy Task (VPT) of MediaEval 2014 thus provided an opportunity for experimentation to explore how video-analytic techniques may arrive at enhanced solutions to some visual privacy problems [1]. The task focuses on privacy protection techniques that are responsive to the context-specific privacy needs of persons. The evaluation was performed using three distinct user studies aimed at developing a deeper understanding of users' perceptions of the effects and side-effects of privacy filtering, so as to ensure the validity and user-acceptability of the evaluation results.

2. VPT 2014 DATASET
The PEViD dataset [2] was specifically created for impact assessment of privacy protection technologies. The dataset consists of two subsets, a training set and a testing set, comprising 21 videos captured with both standard- and high-resolution cameras. The video clips are in MPEG format at full HD resolution (1920x1080 pixels), at a rate of 25 frames per second, and each is approximately 16 seconds long.

The video data includes various scenarios featuring one or several human subjects walking or interacting. The actors may also carry specific items which could potentially reveal their identity and may therefore need to be privacy-filtered appropriately. For example, the actors are featured carrying backpacks and umbrellas, wearing scarves, and performing various actions such as fighting, pickpocketing, dropping a bag, or simply walking. Actors may be at a distance from the camera or near it, making their faces appear with varying pixel size and quality. The ambient lighting conditions of the videos also varied widely, as they recorded a range of indoor, outdoor, and day/night-time scenes.

The ground truth was created manually by the task organisers and consisted of annotations of the bounding boxes containing regions of High (H), Medium (M), or Low (L) sensitivity Personally Identifiable Information elements (PIIs), including persons' faces and accessories. In order to simulate context-aware privacy protection solutions [3], unusual events occurring within the video dataset, such as fighting, stealing, and dropping a bag, were also annotated. The annotations were provided in XML format, alongside a foreground mask in the form of binary sequences. These included annotations that distinguished the relative privacy sensitivity of PIIs, namely Skin (M), Face (H), Hair (L), Accessories (M), and Person's body (L). The dataset was provided in accordance with European Data Protection and ethical compliance guidelines, including informed consent and access control as required. Figure 1 depicts a sample frame from the dataset with annotated regions as rectangles.

Figure 1: Sample annotated frame from the VPT Dataset

3. MOTIVATION AND OBJECTIVES
The MediaEval 2014 Visual Privacy Task was motivated by application domains such as privacy filtering of videos taken in public spaces, by smartphones, web-cams, and surveillance CCTV, and of videos stored on social websites. For this task, participants were encouraged to implement a combination of several privacy filters to protect various personal information regions in videos, optimising the privacy filtering so as to: i) obscure such personal information effectively, whilst ii) keeping as much as possible of the 'useful' information that would enable a human viewer to form some 'useful' interpretation of the obscured video frame at some level of abstraction, without compromising the privacy protection level required by the person(s) featured in the video frame.

Personal visual information is subjectively human-perceived information that can expose a person's identity to a human viewer. This can include richly detailed image regions, such as distinctive facial features or personal jewellery, as well as less rich uniform regions, e.g. skin regions (that expose racial identity) or a body silhouette showing a person's gait (which generally helps to differentiate women from men and in some cases may even enable a close friend or a spouse to identify the person). Therefore, to satisfy both of the above-mentioned criteria i) and ii), privacy protection solutions were required to take into account different types of visual personal information. Participants were encouraged to exploit the annotation information to achieve the appropriate level of privacy filtering for each person, object, and Low/Medium/High information region, and accordingly to select the best-fit filtering. It was anticipated that a single privacy filter applied to all parts of an image would result in a sub-optimal solution, and that a combination of several privacy filters would provide more effective filtering.
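Such region-sensitive filtering can be sketched as follows. This is a minimal illustration only: the pixelation filter, the block sizes, and the sensitivity-to-strength mapping are assumed for the example and do not correspond to any particular team's submission.

```python
# Sketch: apply coarser pixelation to regions annotated with higher
# privacy sensitivity (H/M/L). Frames are modelled as 2D lists of
# grayscale values; block sizes are illustrative assumptions.

def pixelate(frame, box, block):
    """Pixelate region box = (x, y, w, h) of `frame` in place by
    replacing each block x block tile with its mean value."""
    x, y, w, h = box
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            tile = [(r, c)
                    for r in range(by, min(by + block, y + h))
                    for c in range(bx, min(bx + block, x + w))]
            mean = sum(frame[r][c] for r, c in tile) // len(tile)
            for r, c in tile:
                frame[r][c] = mean

# Assumed mapping: stronger obscuration for more sensitive regions.
BLOCK_FOR_SENSITIVITY = {"H": 8, "M": 4, "L": 2}

def filter_frame(frame, annotations):
    """Apply a per-region filter for each (sensitivity, box) pair,
    e.g. as parsed from the XML ground-truth annotations."""
    for sensitivity, box in annotations:
        pixelate(frame, box, BLOCK_FOR_SENSITIVITY[sensitivity])
    return frame
```

In practice a submission would combine several distinct filters (e.g. blurring, masking, scrambling) rather than a single pixelation at varying strengths; the sketch only illustrates selecting filter strength per annotated region.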
4. SUBMISSIONS EVALUATIONS
The submitted video clips were methodically evaluated using UI-REF-based privacy protection requirements. Accordingly, the evaluations attempted to assess the perceived efficacy, as well as the side-effects and affective impacts, arising from a proposed privacy filtering solution, as described in [4, 5]. Eight research teams submitted privacy-filtered video sequences for the evaluation. In the context of surveillance scenarios, three distinct user studies were conducted to ensure the validity of the evaluation results. The subjective evaluations comprised:

a) Stream 1: crowdsourcing evaluations by the general public from online communities ("naïve subjects"), in accordance with the methodology in [6];
b) Stream 2: subjective evaluations by security system manufacturers and developers of video-analytics technology and privacy protection solutions;
c) Stream 3: online subjective evaluations by a focus group comprising trained CCTV monitoring professionals and law enforcement personnel.

For consistency in the analysis of evaluation results from all streams for all participants' solutions, the same six video clips were pre-selected from each submission and evaluated in all three evaluation streams. A questionnaire consisting of 12 questions had been carefully designed to examine aspects related to privacy, intelligibility, and pleasantness; this was used in Streams 2 and 3. The first five questions were aimed at eliciting the opinions of the evaluators regarding the contents of the viewed videos. The responses to these questions were assessed against the ground truth.
The rest of the questions were aimed at eliciting the subjective opinions of the evaluators regarding the viewed videos. Stream 1 used a shortened version of the questionnaire, with seven questions in total, due to crowdsourcing constraints. Some 290 workers responded to the crowd-sourced evaluations. In the design of the crowdsourcing campaign, special care was taken to ensure that a worker would not see the same content with different filters (only one filter per content) and would not see different contents with the same filter (only one content per filter). Also, only the answers from reliable crowdsourcing workers were taken into account. Reliability was ensured via honeypots, mean and deviation metrics of the time per response to a question, and the total time per campaign. Out of the total of 290 workers, 230 were found to have provided reliable responses across all eight evaluation batches, which resulted in 230/8 ≈ 29 sets of workers' evaluations for each filter submitted by each participant.
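The balancing constraint described above (one filter per content and one content per filter for any given worker) can be sketched as a simple rotation scheme. This is an illustration under assumed names only; the actual assignment mechanism of the crowdsourcing campaign is not detailed in this paper.

```python
# Sketch: Latin-square style assignment so that no worker rates the
# same content under two filters, nor two contents under one filter.

def assign(workers, contents, filters):
    """Return {worker: [(content, filter), ...]}. Assumes equally
    many contents and filters; worker i sees content j paired with
    filter (i + j) mod n, rotating the pairings across workers."""
    n = len(filters)
    assert len(contents) == n, "needs one filter per content"
    return {
        w: [(contents[j], filters[(i + j) % n]) for j in range(n)]
        for i, w in enumerate(workers)
    }
```

With n workers per batch, this rotation also covers every (content, filter) pair exactly once, which supports comparing filters across contents.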
In the Stream 2 evaluations, the focus group consisted of 65 participants, 15 of them female; staff from Thales, France took part in this evaluation. The majority of the participants were from the R&D departments, while the rest were from Management, Security, and other departments. The submissions were evaluated via paper-based responses to the questions.

In the Stream 3 evaluations, the focus group comprised 59 participants, including 22 females. This group included some key stakeholder types, such as people from R&D, data protection, and law enforcement, who took part in this study from around the world. The participants streamed the videos and answered the questionnaire using online forms.

As a result of the described evaluations, VPT participants received a set of 3-by-3 matrices comprising the results of each participant for each tier of evaluation, quantified in terms of the following criteria:

1) The Privacy Protection Level: the average level of privacy protection across all testing video clips.
2) The Level of Intelligibility: the amount of 'useful' information that was retained in the video frames after privacy filtering had been applied.
3) The Pleasantness of the resulting privacy-filtered video frames in terms of their 'aesthetic' perceptual appeal to human viewers.

Figure 2 depicts an overview of the results from the three evaluation streams, represented by the median values of the submissions for each criterion.

Figure 2: Median values of the results from the 3 streams

5. ACKNOWLEDGMENTS
The Visual Privacy Task at MediaEval 2014 was supported by the European Commission under contract FP7-261743 VideoSense.

6. REFERENCES
[1] Senior, A., Pankanti, S., Hampapur, A., Brown, L., Tian, Y.L., Ekin, A., Connell, J., Shu, C.F., and Lu, M., Enabling video privacy through computer vision; IEEE Security and Privacy 3, no. 3, pp. 50-57, 2005.
[2] Korshunov, P., and Ebrahimi, T., PEViD: privacy evaluation video dataset; Applications of Digital Image Processing XXXVI, San Diego, USA, August 2013.
[3] Badii, A., Einig, M., Tiemann, M., Thiemert, D., and Lallah, C., Visual context identification for privacy-respecting video analytics; in IEEE 14th International Workshop on Multimedia Signal Processing (MMSP 2012), pp. 366-371, Banff, Canada, September 2012.
[4] Badii, A., "UI-REF Methodology", articles online.
[5] Badii, A., Al-Obaidi, A., and Einig, M., MediaEval 2013 Visual Privacy Task: Holistic Evaluation Framework for Privacy by Co-Design Impact Assessment; Proceedings of the MediaEval 2013 Workshop, Barcelona, Spain, 18-19 October 2013.
[6] Korshunov, P., Nemoto, H., Skodras, A., and Ebrahimi, T., Crowdsourcing-based Evaluation of Privacy in HDR Images; SPIE Photonics Europe 2014, Optics, Photonics and Digital Technologies for Multimedia Applications, Brussels, Belgium, Proceedings of SPIE, 2014.