<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">MediaEval 2013 Visual Privacy Task: Using Adaptive Edge Detection for Privacy in Surveillance Videos</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Volker</forename><surname>Eiselein</surname></persName>
							<email>eiselein@nue.tu-berlin.de</email>
							<affiliation key="aff0">
								<orgName type="department">Communications Systems Group</orgName>
								<orgName type="institution">Technische Universität Berlin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tobias</forename><surname>Senst</surname></persName>
							<email>senst@nue.tu-berlin.de</email>
							<affiliation key="aff1">
								<orgName type="department">Communications Systems Group</orgName>
								<orgName type="institution">Technische Universität Berlin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ivo</forename><surname>Keller</surname></persName>
							<email>keller@nue.tu-berlin.de</email>
							<affiliation key="aff2">
								<orgName type="department">Communications Systems Group</orgName>
								<orgName type="institution">Technische Universität Berlin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Sikora</surname></persName>
							<email>sikora@nue.tu-berlin.de</email>
							<affiliation key="aff3">
								<orgName type="department">Communications Systems Group</orgName>
								<orgName type="institution">Technische Universität Berlin</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">MediaEval 2013 Visual Privacy Task: Using Adaptive Edge Detection for Privacy in Surveillance Videos</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">52C44084868835D23B61A4F8F1570F09</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T17:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Privacy preservation</term>
					<term>video analysis</term>
					<term>obfuscation</term>
					<term>adaptive edge detection</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present a system for preserving the privacy of individuals in a video surveillance scenario. While a person's identity should not be revealed to viewers of the video without a special need, it is still important that the actions in a scene, as the semantic content of the video, remain perceivable by a human observer.</p><p>The proposed system uses edge detection and adaptive thresholding to estimate the persons' silhouettes in a video scene and thus render most of their actions visible while hiding sensitive personal information. In order to obtain a more complete contour around a person, an adaptive thresholding scheme based on edge histograms is used, together with background subtraction, which limits the edge extraction to foreground masks and thus avoids distracting the viewer's eyes with background structures.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>With the increasing usage of video surveillance systems and a continuously growing amount of CCTV data being recorded and reviewed, the need to apply privacy protection techniques in this area rises as well. In order to increase the acceptance of CCTV cameras among the people being observed, it is of special importance to ensure that their personal rights are not violated, while the surveillance system is still able to work and security staff are able to identify critical events in a video stream. The MediaEval 2013 Visual Privacy Task <ref type="bibr" target="#b1">[1]</ref> addresses this issue and provides an evaluation based on the PEViD dataset <ref type="bibr" target="#b5">[5]</ref>.</p><p>Copyright is held by the author/owner(s).</p><p>MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain</p><p>For obfuscation of personal data, we propose to show only the contour of a person in the video. With a low number of persons in the scene, their movements and actions can still be identified, while it remains impossible to discern personal details such as a person's face, the color of their clothes, or their skin color. In this paper, we present a system which uses adaptive edge detection to determine the contour of a person and, in the privacy-filtered output video, replaces the interior of a person's silhouette with background information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SYSTEM DESCRIPTION</head><p>The proposed system uses a background model in order to replace the information within a person's contour with recent background information (see Fig. <ref type="figure" target="#fig_0">1</ref>). The bounding boxes, provided as manually-annotated ground truth for the purpose of MediaEval, are used to identify where the privacy filter needs to be applied in the image. In these regions, edge detection is performed, and the result is blended into the background image. As the bounding boxes for objects could also be provided automatically (e.g. by human detection algorithms or trackers such as <ref type="bibr" target="#b3">[3]</ref>), our system does not necessarily depend on manual annotations.</p></div>
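The per-frame filtering step described above can be sketched as follows. This is an illustrative numpy mock-up only: the edge detection and background modelling are stubbed out, and all function and variable names are our own assumptions, not the authors' implementation.

```python
import numpy as np

def privacy_filter(frame, background, bbox, edge_mask):
    """Replace a person's bounding-box region with background pixels,
    then blend the detected silhouette edges back in on top."""
    x0, y0, x1, y1 = bbox
    out = frame.copy()
    roi = background[y0:y1, x0:x1].copy()
    roi[edge_mask] = 255.0            # draw the contour in white
    out[y0:y1, x0:x1] = roi
    return out

# Toy frame: bright "person" pixels on a dark background model.
frame = np.full((10, 10), 200.0)
background = np.zeros((10, 10))
edge_mask = np.zeros((4, 4), dtype=bool)
edge_mask[0, :] = True                # pretend the top row is an edge
out = privacy_filter(frame, background, (2, 2, 6, 6), edge_mask)
# Inside the box, only edge pixels survive; outside it, the frame is untouched.
```

In the real system the `edge_mask` would come from the adaptive Canny step and `background` from the on-line background model.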
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Double Background Model</head><p>Our algorithm uses two background models: a) a standard Gaussian Mixture Model (GMM) similar to <ref type="bibr" target="#b4">[4]</ref>, used to obtain accurate foreground masks, and b) a strongly simplified background model which essentially consists of a single frame. The latter is used to maintain very recent background information. This allows the viewer (or an analytics algorithm) to identify, e.g., objects left in the scene or graffiti sprayed on walls. As one of the aims of our system is to give the user a clear impression of what is going on in the scene, it is crucial to keep this background information up to date. We therefore use a simple on-line learning method which adapts the background pixels in regions without objects as follows:</p><formula xml:id="formula_0">BGi(x, y) = (1 − α) • BGi−1(x, y) + α • I(x, y)<label>(1)</label></formula><p>In order to adapt the background model quickly to new frames, we use rather high values for α (e.g. α = 0.35) and update this model every 5th frame. This also has the advantage that the system can cope quickly with slight camera movements. In some of the PEViD videos, the camera shakes and the image is shifted by a few pixels. While the background image in our system initially becomes slightly blurry in these situations, after a few frames the model reflects the current background and is no longer disturbed by older values. The GMM uses a much slower learning rate and is only used to mask out background parts within the given region of interest (ROI), a task for which the previously described background model b) alone would be too simple.</p></div>
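The update rule in Eq. (1) is a simple exponential moving average. A minimal numpy sketch, where the α = 0.35 value and the 5-frame update interval follow the text and everything else (names, the toy data) is our own assumption:

```python
import numpy as np

def update_background(bg, frame, fg_mask, alpha=0.35):
    """Blend the current frame into the background model as in Eq. (1).

    Pixels covered by the foreground mask keep their old background
    value, so moving persons do not bleed into the model."""
    blended = (1.0 - alpha) * bg + alpha * frame
    return np.where(fg_mask, bg, blended)

# Toy example: a static grey scene brightens from 100 to 200.
bg = np.full((4, 4), 100.0)
frame = np.full((4, 4), 200.0)
no_fg = np.zeros((4, 4), dtype=bool)
for _ in range(5):            # five update steps (one every 5th frame)
    bg = update_background(bg, frame, no_fg)
# bg converges quickly towards the new scene brightness of 200
```

The high α makes the model converge within a handful of updates, which is what lets the system recover quickly from the camera shakes mentioned above.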
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Adaptive Edge Detection</head><p>We use Canny edge detection <ref type="bibr" target="#b2">[2]</ref> to extract the silhouettes of the persons in the image. After a noise reduction step, this algorithm computes the gradients in the image and performs thresholding with hysteresis. A common problem for this algorithm is the choice of the two thresholds T1, T2 used in the hysteresis step. While T1 sets the minimum edge level accepted for the starting points, T2 determines which edge level must be kept during the hysteresis. It is usually hard to set general values which give satisfying results for arbitrary video sequences.</p><p>As our system has to work under varying conditions (indoor, outdoor, different lighting, changing weather...), we propose to adapt the thresholds according to the gradient histogram in the ROI. For the given ROI, we compute the absolute values of the gradients and build a histogram over each ROI. Under the assumption that most of a human's silhouette can be recovered from the highest 15 % of the gradient values, we can thus find T1 for the Canny algorithm. In order to close holes in the silhouette, T2 is set to T2 = 0.9 • T1. This automatic choice of T1 and T2 adapts well to most scenes in the data set and yields good contour information for a walking person in most cases.</p></div>
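The percentile rule for T1 and T2 can be sketched as follows. This is a hedged illustration, not the paper's code: simple finite differences stand in for the Canny-internal gradient filtering, and the histogram lookup is expressed directly as a percentile.

```python
import numpy as np

def adaptive_canny_thresholds(roi, top_fraction=0.15):
    """Derive the Canny hysteresis thresholds from the ROI's gradient
    magnitudes: T1 admits the strongest 15 % of gradients as edge
    starting points, and T2 = 0.9 * T1 closes holes during hysteresis."""
    gy, gx = np.gradient(roi.astype(float))
    gmag = np.hypot(gx, gy)                       # absolute gradient values
    t1 = np.percentile(gmag, 100.0 * (1.0 - top_fraction))
    t2 = 0.9 * t1
    return t1, t2

# Toy ROI: a bright square on a dark background yields strong edges
# along its boundary, so T1 lands above zero.
roi = np.zeros((32, 32))
roi[4:28, 4:28] = 255.0
t1, t2 = adaptive_canny_thresholds(roi)
```

The resulting `t1` and `t2` would then be passed to a Canny implementation (e.g. OpenCV's `cv2.Canny(roi, t2, t1)`, whose second argument is the lower hysteresis threshold).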
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">EVALUATION RESULTS</head><p>The evaluation of the proposed filter is based on the objective and subjective metrics described in <ref type="bibr" target="#b1">[1]</ref>. The results are shown in Table <ref type="table" target="#tab_0">1</ref>. The score of the proposed filter is compared with the average score of all 9 participants of the MediaEval Visual Privacy Task challenge. The results are very promising in terms of privacy, where the adaptive edge filter is competitive with the other participants.</p><p>Regarding intelligibility and appropriateness, the subjective and objective evaluations, relative to the average of all participants, point in opposite directions: for these categories the objective scores are below the average, while the subjective scores are above it. As the proposed filter extracts and reprojects the edge information into the image and fills the interior of a person with background information, it is obvious that a standard person detection and tracking method, which is usually based on extracting edges by HoG or wavelet filtering and is trained on unfiltered videos, cannot achieve the same performance as on unfiltered images. To apply the filtered videos in automated video analytics, the respective preprocessing of these methods has to be adapted; we thus recommend a new training step based on feature extraction from the filtered images. However, based on the results of the subjective evaluation, the filter is especially appropriate for display in video surveillance systems that are guided and evaluated by human operators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">CONCLUSIONS</head><p>In this paper we showed how adaptive edge detection can be used to preserve the privacy of people in CCTV videos. The subjective test demonstrates that the proposed filter outperforms most of the other privacy filters proposed in the MediaEval 2013 challenge. Thus, our method is of special interest for semi-automatic video surveillance systems which need to hide sensitive personal information while preserving the context of actions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: original PEViD frame (left) and privacy-filtered result (right)</figDesc><graphic coords="1,316.82,382.33,117.04,65.83" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Evaluation results for the proposed method compared to 9 other methods in the workshop.</figDesc><table><row><cell></cell><cell>Objective evaluation</cell><cell></cell></row><row><cell></cell><cell cols="2">Adaptive edge filter Average (9)</cell></row><row><cell>Intelligibility</cell><cell>0.358355</cell><cell>0.502378</cell></row><row><cell>Privacy</cell><cell>0.693835</cell><cell>0.664903</cell></row><row><cell>Appropriateness</cell><cell>0.368627</cell><cell>0.56048</cell></row><row><cell cols="2">Subjective evaluation</cell><cell></cell></row><row><cell></cell><cell cols="2">Adaptive edge filter Average (9)</cell></row><row><cell>Intelligibility</cell><cell>0.678333</cell><cell>0.655741</cell></row><row><cell>Privacy</cell><cell>0.683750</cell><cell>0.683843</cell></row><row><cell>Appropriateness</cell><cell>0.532500</cell><cell>0.492130</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">ACKNOWLEDGMENTS</head><p>This work was supported by the European Commission under contract FP7-261743 VideoSense.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Overview of the mediaeval 2013 visual privacy task</title>
		<author>
			<persName><forename type="first">A</forename><surname>Badii</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Einig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Piatrik</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A computational approach to edge detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Canny</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Pattern Anal. Mach. Intell</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="679" to="698" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A motion-enhanced hybrid probability hypothesis density filter for real-time multi-human tracking in video surveillance scenarios</title>
		<author>
			<persName><forename type="first">V</forename><surname>Eiselein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Senst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Keller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sikora</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">15th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="6" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Detection of static objects for the task of video surveillance</title>
		<author>
			<persName><forename type="first">R</forename><surname>Heras Evangelio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Senst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sikora</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Workshop on Applications of Computer Vision (WACV)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="534" to="540" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">PEViD: privacy evaluation video dataset</title>
		<author>
			<persName><forename type="first">P</forename><surname>Korshunov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ebrahimi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SPIE</title>
				<meeting>SPIE</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">8856</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
