<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNIZA@Mediaeval 2014 Visual Privacy Task: Object Transparency Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Paralič</string-name>
          <email>martin.paralic@fel.uniza.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Jarina</string-name>
          <email>roman.jarina@fel.uniza.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Žilina, Faculty of Electrical Engineering, Department of Telecommunications and Multimedia</institution>
          ,
          <addr-line>Univerzitná 1, 01026 Žilina</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Žilina, Faculty of Electrical Engineering, Department of Telecommunications and Multimedia</institution>
          ,
          <addr-line>Univerzitná 1, 01026 Žilina</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper describes our approach for the Visual Privacy Task (VPT) of MediaEval 2014. Video privacy filtering based on privacy-sensitive-object transparency is proposed. The background (hidden behind the object) is estimated by median filtering over the time sequence of pixel values. We focus only on the areas labeled as highly privacy sensitive (i.e., the face). Low- and medium-privacy areas were left largely untouched to preserve as much information as possible about the person's activities. Despite its simplicity, the proposed method gives promising performance, at or slightly above the average among the VPT participants.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The problem of privacy protection in video surveillance is again addressed in this year's MediaEval Visual Privacy Task (VPT) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The PEViD dataset [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is used for the impact assessment of alternative solutions. Recently, a variety of image-processing methods have been developed to protect privacy in multimedia content. A common approach is based on replacing the sensitive information with colored boxes or distorting the pixels. More sophisticated methods use person-silhouette detection followed by blurring of the whole person [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other methods are based on encrypting sensitive regions, where the process is reversible only for authorized persons who know the encryption key [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The disadvantage of covering private information is that the person's activities, the detection of which is crucial for surveillance purposes, are often also hidden or altered. The aim of this research activity is the development of a visual filtering method that keeps as much information as possible about the person's activities in the video while keeping the person's privacy intact.</p>
      <p>We propose filtering that renders the privacy-sensitive objects transparent. The background (hidden behind the object) is estimated by computing the median over the time sequence of values for each pixel. We focused only on the areas labeled as highly privacy sensitive (i.e., the face). Low- and medium-privacy areas were left largely untouched to preserve most of the information about the person's activities. This method aims to minimize the discomfort of watching the filtered video. Thus we utilized only the face-position labels from the provided XML metadata.</p>
    </sec>
    <sec id="sec-2">
      <title>2. OBJECT SEGMENTATION</title>
      <p>
        One of the requirements for privacy filtering is object segmentation. In this task, two kinds of segmentation were available. The first one, automatic segmentation as described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], was utilized for background estimation. The second one is manual annotation of the video stream in XML form [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We extracted the face bounding-box information from the provided XML metadata and used it for filtering. The box was fitted with a masking ellipse.
      </p>
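      <p>Fitting the bounding box with a masking ellipse can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name, the (x, y, w, h) box layout, and the inscribed-ellipse formulation are assumptions:</p>

```python
import numpy as np

def ellipse_mask(box, frame_shape):
    """Boolean mask of the ellipse inscribed in a face bounding box.

    box: (x, y, w, h) face bounding box (assumed layout of the XML annotation).
    frame_shape: (height, width) of the video frame.
    """
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0   # ellipse centre
    ax, ay = w / 2.0, h / 2.0           # semi-axes
    rows, cols = np.ogrid[:frame_shape[0], :frame_shape[1]]
    # A point lies inside the inscribed ellipse iff the canonical
    # ellipse inequality ((col-cx)/ax)^2 + ((row-cy)/ay)^2 is at most 1.
    dist = ((cols - cx) / ax) ** 2 + ((rows - cy) / ay) ** 2
    return np.less_equal(dist, 1.0)

# Usage: pixels where the mask is True are replaced by background pixels.
mask = ellipse_mask((40, 20, 30, 40), (120, 160))
```
      <p>The ellipse removes the box corners, which in a face bounding box mostly contain background rather than privacy-sensitive pixels.</p>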
    </sec>
    <sec id="sec-3">
      <title>3. THE PROPOSED METHOD</title>
      <p>
        We examined a simple and straightforward method based on filtering that replaces highly privacy-sensitive pixel areas with background pixels, as depicted in Figure 1. The key part of the proposed algorithm is a proper estimation of the whole background scene, with the aim of uncovering the background parts that are highly privacy sensitive (e.g., the face). However, in some scenes the background can be partially invisible, as depicted in Figure 2. The background-estimation procedure is as follows. The time sequence of RGB values of each pixel is transformed into a time sequence of grayscale values for sorting purposes. Then the median over each time sequence is computed. In addition, the foreground objects detected by automatic segmentation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are obscured with black pixels [R; G; B] = [0; 0; 0]. This step is applied to avoid including the foreground objects in the background-pixel estimation. For a sufficient background estimate, it is crucial that each background pixel be visible in at least one video frame. The background image is built from the RGB values at the middle of the sorted time sequences of pixel values.
      </p>
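      <p>The background-estimation procedure above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation; it assumes the clip is given as a (T, H, W, 3) RGB array in which the automatic segmentation has already blacked out the foreground:</p>

```python
import numpy as np

def estimate_background(frames):
    """Estimate the static background of a video clip.

    frames: array of shape (T, H, W, 3), RGB, with foreground objects
    already masked to black [0, 0, 0] by the automatic segmentation.
    Returns an (H, W, 3) background image.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Grayscale value of each pixel over time, used only as the sort key.
    gray = frames.mean(axis=3)                         # (T, H, W)
    # Per pixel, sort the frames by grayscale value; blacked-out
    # foreground pixels sink to the bottom of the sorted sequence.
    order = np.argsort(gray, axis=0)                   # (T, H, W)
    # Index of the frame at the middle of each sorted sequence (the median).
    mid = order[order.shape[0] // 2]                   # (H, W)
    # Pick the RGB triple of that median frame for every pixel.
    h_idx, w_idx = np.indices(mid.shape)
    return frames[mid, h_idx, w_idx].astype(np.uint8)  # (H, W, 3)
```
      <p>Sorting by a scalar grayscale key while copying back the full RGB triple keeps the three channels of each output pixel consistent, which a per-channel median would not guarantee.</p>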
    </sec>
    <sec id="sec-4">
      <title>4. THE EVALUATION FRAMEWORK</title>
      <p>
        The video sequences were evaluated to fulfill the UI-REF privacy protection requirements. Overall results of the crowd evaluation for the submitted entries were qualified in terms of the following criteria [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]:
      </p>
      <p>The Privacy Protection Level – the average level of privacy protection across all clips.</p>
      <p>Level of Intelligibility – the amount of useful information retained after filtering.</p>
      <p>The Appropriateness – the aesthetic perceptual appeal to human viewers.</p>
      <p>The evaluation scenario comprised the following three streams:
Stream 1: For the crowdsourcing evaluation, about 290 workers answered several privacy-, intelligibility-, and pleasantness-related questions for 6 pre-selected videos from the test videos submitted by the participants.</p>
      <p>Stream 2: A focus group comprising 65 participants (15 females) from Thales, France took part in this evaluation. The majority of the participants were staff from the R&amp;D departments, while the rest were from the Management, Security, and other departments.</p>
      <p>Stream 3: A focus group comprising 59 participants (22 females) from sectors including R&amp;D, data protection, and law enforcement, from around the world, took part in this study.</p>
    </sec>
    <sec id="sec-5">
      <title>5. EVALUATION RESULTS</title>
      <p>The evaluation results of the proposed filter's performance were obtained in the three streams according to the VPT 2014 evaluation scenario. The reported results of the test among the VPT 2014 participants, which were evaluated in terms of the defined criteria, are presented as the median score over ten teams. This median serves as a baseline against which to compare our results. The performance of the proposed approach compared to the median results is depicted in Figure 3. Despite its simplicity, the proposed method gives surprisingly promising performance. The obtained score is at, or slightly above, the average in terms of all three criteria.</p>
      <p>The challenging problem is the detection and estimation of the partially invisible background, as shown in Figure 2.</p>
      <p>[Figure 3: intelligibility, privacy, and pleasantness scores (0–70%) of the UNIZA runs 1–3 compared with the corresponding median scores over the participants.]</p>
      <p>In our future work, we will focus on automatic detection of the face position and use more precise filtering tight around the face contours, as well as more sophisticated translucency techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Obaidi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Einig</surname>
          </string-name>
          .
          <article-title>MediaEval 2013 visual privacy task: Holistic evaluation framework for privacy by co-design impact assessment</article-title>
          .
          <source>In MediaEval 2013</source>
          , pages 1–1. CEUR-WS.org,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fedorczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Piatrik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Eiselein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Obaidi</surname>
          </string-name>
          .
          <article-title>Overview of the MediaEval 2014 visual privacy task</article-title>
          .
          <source>In MediaEval 2014</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Einig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tiemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thiemert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lallah</surname>
          </string-name>
          .
          <article-title>Visual context identification for privacy-respecting video analytics</article-title>
          .
          <source>In 14th IEEE MMSP International Workshop on Multimedia Signal Processing</source>
          , pages 366–371,
          <year>September 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          .
          <article-title>PICO: Privacy through invertible cryptographic obscuration</article-title>
          .
          <source>In Computer Vision for Interactive and Intelligent Environments</source>
          , pages 27–38,
          <year>November 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Fradi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Eiselein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Keller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Dugelay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          .
          <article-title>Crowd context-dependent privacy protection filters</article-title>
          .
          <source>In 18th International Conference on Digital Signal Processing</source>
          , pages 1–6,
          <year>July 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>Crowdsourcing approach for evaluation of privacy filters in video surveillance</article-title>
          .
          <source>In 2013 18th International Conference on Digital Signal Processing (DSP)</source>
          , pages 1–6. ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          .
          <article-title>PEViD: Privacy evaluation video dataset</article-title>
          .
          <source>In Applications of Digital Image Processing XXXVI, SPIE International Society for Optics and Photonics</source>
          ,
          <year>August 2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>