Imcube @ MediaEval 2015 DroneProtect Task: Reversible
            Masking using Steganography

                          Sebastian Schmiedeke, Pascal Kelm, and Lutz Goldmann
                                                       Imcube Labs GmbH
                                                         Berlin, Germany
                                    {schmiedeke, kelm, goldmann}@imcube.de


ABSTRACT
This paper describes Imcube’s participation in the DroneProtect Task of
MediaEval 2015, which aims to obscure privacy-sensitive image regions in
video sequences captured with drones. As a result, persons and vehicles
should be unrecognisable, while the semantic meaning of the scene should
remain understandable to the viewer. We use an approach which replaces the
privacy-sensitive region with an automatically computed composite of
inpainted background and foreground contour. Before obfuscation, the image
region to be hidden is extracted and steganographically embedded into the
processed frame, leading to a reversible solution. The evaluation shows
that the developed solution achieves good privacy protection while
preserving the intelligibility and aesthetic pleasantness of the original
video.

[Figure 1 panels: "Inpainting, edge detection"; "DCT-based steganography"]
Figure 1: Exemplary frame to visualise our approach: our masking algorithm
obfuscates privacy-sensitive regions while preserving the semantics. The
steganography algorithm guarantees that obfuscated areas can be recovered
if needed.

1.   INTRODUCTION
  Since drones have become affordable, these devices are increasingly used
for security applications. Due to their flexibility, videos captured by a
drone may contain highly sensitive personal data. Consequently, individuals
are increasingly concerned about the “invasiveness” of such ubiquitous
surveillance and fear that their privacy is at risk. The demands of
stakeholders to prevent criminal activities are often seen to be in
conflict with the privacy requirements of individuals.
  The DroneProtect Task of MediaEval 2015 deals with the problem of privacy
protection in dynamic surveillance videos [1].
  A common way to protect privacy in images and videos is to apply
techniques such as blurring or masking, as shown in [3]. Since these
techniques are irreversible, steganography [4] can be used to preserve the
original information. A typical steganography algorithm is located in the
process chain between quantisation of the DCT coefficients and Huffman
coding. The information to be hidden is embedded within the least
significant bits of non-zero AC coefficients of each DCT block.

2.   APPROACH
  Our approach combines masking and steganography to obtain a visually
appealing obfuscated video while retaining the possibility to recover the
original frame. An example is shown in Figure 1.

2.1    Masking
  Since the videos captured by the mini-drones [3] are highly dynamic
compared to those of static surveillance cameras, traditional background
subtraction techniques [5] cannot be applied for foreground object
extraction. Instead, we use edge detection to extract the outline of the
foreground objects within the region of interest. To this end, each frame
of the sequence is converted to grey scale and then smoothed by applying a
Gaussian kernel (5 × 5). Edges are detected by applying adaptive
thresholding: the area around each pixel is cross-correlated with Gaussian
windows of a sufficiently large kernel width. Pixels exceeding the weighted
sum of cross-correlation become edge pixels, which are subsequently used to
form the outline. The binary outline is enhanced by applying morphological
operations.
  In order to remove the original foreground object, the region of interest
needs to be filled with reasonable background information. We rely on the
inpainting algorithm by Telea [6], which is able to rapidly reconstruct
missing image parts and works as follows. Starting from the boundaries, the
colour information is propagated to the inside of the region by smoothly
interpolating along pixel intensity lines. For this purpose, the “image
smoothness information” [2], which is estimated by a weighted sum of
Laplacians of the known neighbourhood, is propagated along these intensity
lines. The direction of the intensity lines, also called “isophotes”, is
estimated by discretised intensity gradient vectors. Based
on the assumption that isophotes have the smallest changes
along their direction and the largest changes perpendicular to their
direction, their direction is estimated by finding the largest gradient,
which is orthogonal to the isophote belonging to that pixel. To improve the
temporal stability of the inpainting result, the inpainted areas are
temporally filtered, e.g. for each pixel the median value over its temporal
neighbours is computed.
  The obfuscated region of interest is obtained by blending the extracted
foreground contour with the reconstructed background texture. Depending on
the application, the contour may be emphasised with different colours.

2.2    Steganography
  Since the masking algorithm itself is irreversible, the original image
data contained within the region of interest must be embedded into the
obfuscated image. To this end, we make use of a steganography library based
on the F5 algorithm [7]. This algorithm embeds binary information into the
DCT coefficients of a JPEG image. The least significant bits of non-zero AC
coefficients are replaced by the bits to be embedded in such a way that the
statistical distribution of coefficients remains unchanged. Since only
non-zero coefficients can carry steganographic values and these
coefficients occur less frequently than zero-valued coefficients, only a
limited amount of data can be embedded. Depending on the number and size of
the regions of interest to be hidden, the capacity is often insufficient.
Therefore, each region is treated as a rectangular region which is
JPEG-compressed with an adjustable compression parameter. All compressed
regions together with their bounding boxes are then concatenated, encrypted
and embedded in the obfuscated JPEG-encoded image. Since the embedded
information may be destroyed if the image sequence is transcoded, the
individual frames are simply combined into a Motion JPEG video.

3.    EXPERIMENTS & RESULTS
  The video sequences of the DroneProtect dataset [3] are obscured by
replacing foreground objects with their outlines. We are sure that
individuals can be identified not only by their faces but also by their
clothes or accessories. Therefore, the colours of the objects within each
region are replaced by an estimated background. Since the regions of
interest are provided with the dataset, we apply our masking algorithm only
to the provided areas.
  The evaluation of the obscured videos was carried out using subjective
procedures. Two groups of subjects with different experience in
surveillance applications were asked to survey the videos and respond to
questions concerning the content. Based on the answers to these questions,
three different metrics (privacy, intelligibility, pleasantness) and the
deviation between them were computed. The average scores of the different
subject groups (experts, novices, overall) for all 38 videos are summarised
in Table 1. In the following we analyse the results for the different
metrics and different subject groups.

Table 1: Subjective evaluation metrics assigned by different subject groups.
Subject group          Privacy  Intelligibility  Pleasantness  Deviation
Category 1 (experts)   0.46     0.65             0.70          0.12
Category 2 (novices)   0.53     0.57             0.63          0.06
Overall                0.50     0.61             0.67          0.09

  The privacy metric measures how well the identity of persons and vehicles
was protected through the obfuscation, or in other words, how difficult the
obfuscation made the identification of a person or vehicle by hiding
relevant visual information. The proposed method achieves a medium overall
score (0.50), which suggests that even though only the contour of the
object of interest is preserved, in some cases it can still be identified.
A deeper analysis of these cases is needed to identify potential
improvements.
  Intelligibility stands for the ability to classify objects and actions
within a video sequence and evaluates how well the activities within a
scene are preserved even if the object of interest is obfuscated to prevent
its identification. The proposed method achieves a good overall score
(0.61), which shows that contour information alone is enough to understand
most of the semantics of a scene from a surveillance perspective. A deeper
analysis of the individual videos is needed to understand what additional
information is required to improve the intelligibility further.
  Pleasantness evaluates the influence of the obfuscation method on the
visual quality of the video, i.e. by how much the quality of the video is
degraded by distortions and artefacts within the region of interest. The
subjective score is based on the level of user acceptance. Here, the
proposed method achieves a good overall score (0.67), since the black
foreground contours composited over an inpainted background blend well with
the original content outside the region of interest. This score may be
further improved by using a more sophisticated inpainting algorithm which
reconstructs a more plausible background texture.
  Since the three metrics mentioned above evaluate quite contrary
requirements, the deviation evaluates the difference between these metrics
by computing their standard deviation. As can be expected from the similar
scores for the different metrics, the proposed approach has a very good
deviation score (0.09). This shows that it strikes a good balance between
the different criteria (privacy, intelligibility and pleasantness).
  Comparing the results between the different subject groups shows that the
scores of the novices are more uniform across the different metrics, while
the experts rate the intelligibility and pleasantness higher and the
privacy lower. This follows the intuition that experts will be able to
recognise actions and identities better than novices. The higher
pleasantness score suggests that experts value the content of a video
higher than its quality.

4.   CONCLUSION
  A reversible approach for protecting privacy based on masking and
steganographic embedding of the region of interest has been proposed and
evaluated on the DroneProtect dataset. The results show that the approach
strikes a good balance between privacy, intelligibility and pleasantness.
For developing potential improvements, a more detailed analysis of the
scores for the individual videos is needed.

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany

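  As an illustration of the temporal filtering described in Section 2.1,
the following minimal sketch computes, for each pixel, the median over a
small window of neighbouring frames. It is not the code used for the
submission: the function name temporal_median, the window radius and the
nested-list frame layout are assumptions made for this example, and the
paper does not state the window size that was actually used.

```python
from statistics import median
from typing import List, Sequence

# One grey-scale frame as rows of pixel values (layout assumed for this sketch).
Frame = List[List[float]]


def temporal_median(frames: Sequence[Frame], radius: int = 1) -> List[Frame]:
    """Replace each pixel by the median of its values in neighbouring frames.

    The window covers frames t - radius .. t + radius and is clipped at the
    start and end of the sequence (an assumption of this sketch).
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    count = len(frames)
    filtered: List[Frame] = []
    for t in range(count):
        window = frames[max(0, t - radius):min(count, t + radius + 1)]
        rows, cols = len(frames[t]), len(frames[t][0])
        filtered.append(
            [[median(f[y][x] for f in window) for x in range(cols)]
             for y in range(rows)]
        )
    return filtered


# A single flickering pixel: the outlier in the second frame is suppressed.
sequence = [[[10]], [[200]], [[12]], [[11]]]
smoothed = temporal_median(sequence, radius=1)
```

A median is used here rather than a mean because a single badly inpainted
frame then does not leak into its neighbours.
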
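  The payload construction of Section 2.2 (compressed regions concatenated
together with their bounding boxes) can be sketched as follows. The byte
layout is hypothetical, since the paper does not specify one: each region
is stored as four unsigned 16-bit values (x, y, width, height), a 32-bit
length and the compressed bytes. JPEG compression of the region and
encryption of the resulting payload are deliberately left out; the region
data is treated as opaque bytes.

```python
import struct
from typing import List, Tuple

# (x, y, width, height, compressed region data); field order is an assumption.
Region = Tuple[int, int, int, int, bytes]

# Hypothetical header: 4 x uint16 bounding box + uint32 data length, big-endian.
HEADER = struct.Struct(">HHHHI")


def pack_regions(regions: List[Region]) -> bytes:
    """Concatenate all regions, each preceded by its bounding-box header."""
    parts = []
    for x, y, w, h, data in regions:
        parts.append(HEADER.pack(x, y, w, h, len(data)))
        parts.append(data)
    return b"".join(parts)


def unpack_regions(blob: bytes) -> List[Region]:
    """Inverse of pack_regions: recover bounding boxes and region data."""
    regions: List[Region] = []
    offset = 0
    while offset < len(blob):
        x, y, w, h, size = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        data = blob[offset:offset + size]
        if len(data) != size:
            raise ValueError("truncated payload")
        regions.append((x, y, w, h, data))
        offset += size
    return regions


example = [(10, 20, 64, 48, b"\xff\xd8abc"), (300, 5, 16, 16, b"")]
payload = pack_regions(example)
restored = unpack_regions(payload)
```

Storing the length explicitly lets the decoder split the payload without
parsing the compressed data itself.
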
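  To make the embedding principle from Sections 1 and 2.2 concrete, the
sketch below hides bits in the least significant bits of the AC
coefficients of one quantised DCT block. It is a plain LSB replacement and
explicitly not the F5 algorithm [7], which additionally preserves the
coefficient statistics. As a simplifying assumption of this sketch,
coefficients with magnitude 0 or 1 are skipped, so that embedding never
produces a zero coefficient that the decoder would then miss. All function
names are illustrative.

```python
from typing import List


def usable(coeff: int) -> bool:
    """Only coefficients with |c| >= 2 carry a bit (assumption of this sketch)."""
    return abs(coeff) >= 2


def capacity(block: List[int]) -> int:
    """Number of bits one block can hold; block[0] is the DC coefficient."""
    return sum(1 for c in block[1:] if usable(c))


def embed_bits(block: List[int], bits: List[int]) -> List[int]:
    """Write bits into the magnitude LSBs of the usable AC coefficients."""
    if len(bits) > capacity(block):
        raise ValueError("payload exceeds block capacity")
    out = list(block)
    remaining = list(bits)
    for i in range(1, len(out)):  # index 0 (DC) is never modified
        if remaining and usable(out[i]):
            bit = remaining.pop(0)
            sign = 1 if out[i] > 0 else -1
            # Clear the LSB of the magnitude, then set it to the payload bit.
            out[i] = sign * ((abs(out[i]) & ~1) | bit)
    return out


def extract_bits(block: List[int], count: int) -> List[int]:
    """Read back the first `count` embedded bits."""
    return [abs(c) & 1 for c in block[1:] if usable(c)][:count]


block = [50, 3, -2, 0, 1, 7, 0, -5]
stego = embed_bits(block, [0, 1, 1, 0])
recovered = extract_bits(stego, 4)
```

Because only a few coefficients per block are usable, the capacity is
small, which is the reason given in Section 2.2 for compressing the hidden
regions before embedding.
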
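  The deviation metric discussed in Section 3 can be illustrated by
recomputing a standard deviation from the rounded averages in Table 1. This
is only a plausibility check under stated assumptions: the sample standard
deviation is used here, whereas the paper does not say whether the sample
or population form was applied, nor whether the deviation was computed per
video before averaging. Exact agreement with the Deviation column should
therefore not be expected.

```python
from statistics import stdev

# Rounded average scores from Table 1: (privacy, intelligibility, pleasantness).
scores = {
    "experts": (0.46, 0.65, 0.70),
    "novices": (0.53, 0.57, 0.63),
    "overall": (0.50, 0.61, 0.67),
}

# Deviation values as reported in Table 1, for comparison.
reported = {"experts": 0.12, "novices": 0.06, "overall": 0.09}

# Sample standard deviation across the three metrics (assumed formula).
recomputed = {group: stdev(values) for group, values in scores.items()}

for group, value in recomputed.items():
    print(f"{group}: recomputed {value:.3f}, reported {reported[group]:.2f}")
```

A smaller value means that the three criteria are scored more evenly,
which is how the paper interprets the metric.
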
5.   REFERENCES
[1] A. Badii, P. Korshunov, H. Oudi, T. Ebrahimi,
    T. Piatrik, V. Eiselein, N. Ruchaud, C. Fedorczak, J.-L.
    Dugelay, and D. F. Vazquez. Overview of the
    MediaEval 2015 Drone Protect Task. In MediaEval 2015
    Workshop, Wurzen, Germany, September 14-15 2015.
[2] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester.
    Image inpainting. In Annual Conference on Computer
    Graphics, pages 417–424, 2000.
[3] M. Bonetto, P. Korshunov, G. Ramponi, and
    T. Ebrahimi. Privacy in mini-drone based video
    surveillance. In Workshop on De-identification for
    privacy protection in multimedia, 2015.
[4] T. Morkel, J. H. P. Eloff, and M. S. Olivier. An
    Overview of Image Steganography. In H. S. Venter,
    J. H. P. Eloff, L. Labuschagne, and M. M. Eloff,
    editors, Proceedings of the Fifth Annual Information
    Security South Africa Conference (ISSA2005), Sandton,
    South Africa, 6 2005. Published electronically.
[5] S. Schmiedeke, P. Kelm, L. Goldmann, and T. Sikora.
    TUB @ MediaEval 2014 Visual Privacy Task:
    Reversible Scrambling on Foreground Masks. In
    Proceedings of the MediaEval 2014 Multimedia
    Benchmark Workshop, pages 73–74. CEUR-WS, 2014.
[6] A. Telea. An image inpainting technique based on the
    fast marching method. Journal of graphics tools,
    9(1):23–34, 2004.
[7] A. Westfeld. F5-a steganographic algorithm. In I. S.
    Moskowitz, editor, Information Hiding, volume 2137 of
    Lecture Notes in Computer Science, pages 289–302.
    Springer Berlin Heidelberg, 2001.