                    Verifying Multimedia Use at MediaEval 2015

Christina Boididou¹, Katerina Andreadou¹, Symeon Papadopoulos¹, Duc-Tien Dang-Nguyen²,
Giulia Boato², Michael Riegler³, and Yiannis Kompatsiaris¹

¹ Information Technologies Institute, CERTH, Greece. [boididou,kandreadou,papadop,ikom]@iti.gr
² University of Trento, Italy. [dangnguyen,boato]@disi.unitn.it
³ Simula Research Laboratory, Norway. michael@simula.no


ABSTRACT
This paper provides an overview of the Verifying Multimedia Use task that takes place as part of the 2015 MediaEval Benchmark. The task deals with the automatic detection of manipulation and misuse of Web multimedia content. Its aim is to lay the basis for a future generation of tools that could assist media professionals in the process of verification. Examples of manipulation include maliciously tampering with images and videos, e.g., splicing or removal/addition of elements, while other kinds of misuse include the reposting of previously captured multimedia content in a different context (e.g., a new event) claiming that it was captured there. For the 2015 edition of the task, we have generated and made available a large corpus of real-world cases of images that were distributed through tweets, along with manually assigned labels regarding their use, i.e. misleading (fake) versus appropriate (real).

[Figure 1 near here: two image panels, (a) Manipulation and (b) Reposting]
Figure 1: Examples of fake web multimedia: a) digitally manipulated image of an IAF F-16 deploying a flare over Southern Lebanon; the flare was digitally duplicated; b) an image posted during Hurricane Sandy that is a repost from a 2009 art installation.
1. INTRODUCTION
Modern Online Social Networks (OSNs), such as Twitter, Instagram and Facebook, are nowadays the primary sources of information and news for millions of users and the major means of publishing user-generated content. With the growing number of people participating in and contributing to these communities, analyzing and verifying the massive amounts of such content has emerged as a major challenge. Veracity is a crucial aspect of media content, especially in cases of breaking news stories and incidents related to public safety, ranging from natural disasters and plane crashes to terrorist attacks. Popular stories have such a profound impact on public attention that content gets immediately retransmitted by millions of users, and it is often found to be misleading, resulting in misinformation of the public audience and even of the authorities.

In this setting, there is an increasing need for automated real-time verification and cross-checking tools. Several techniques for evaluating tweets have already been proposed. Gupta et al. [4] used the Hurricane Sandy natural disaster to highlight the role of Twitter in spreading fake content during the event, and proposed classification models to distinguish between fake and real tweets. Ito et al. [5] proposed a method to assess tweet credibility using "tweet-" and "user"-topic features derived from the Latent Dirichlet Allocation (LDA) model. They also introduced the user's "expertness" and "bias" features, demonstrating that the bias features work better. Given the importance of the problem, as attested by the increasing number of works in the area [3], this task aspires to foster the development of new Web multimedia verification approaches.

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany

2. TASK OVERVIEW
The definition of the task is the following: "Given a tweet and the accompanying multimedia item (image or video) from an event that has the profile to be of interest in the international news, return a binary decision representing verification of whether the multimedia item reflects the reality of the event in the way purported by the tweet." In practice, participants received a list of tweets that include images and were required to automatically predict, for each tweet, whether it is trustworthy or deceptive (real or fake respectively). In addition to fully automated approaches, the task also considered human-assisted approaches, provided that they are practical (i.e., fast enough) in real-world settings. The following considerations should be made in addition to the above definition:

  • A tweet is considered fake when it shares multimedia content that does not represent the event that it refers to. Figure 1 presents examples of such content.
  • A tweet is considered real when it shares multimedia that legitimately represents the event it refers to.

  • A tweet that shares multimedia content that does not represent the event it refers to, but reports the false information or refers to it with a sense of humour, is considered neither fake nor real (and hence is not included in the datasets released by the task).

The task also asked participants to optionally return an explanation (which can be a text string, or URLs pointing to resources online) that supports the verification decision. The explanation was not used for quantitative evaluation, but rather for gaining qualitative insights into the results. A minimal sketch of a possible run record is given below.
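To make the expected output concrete, the following is a minimal sketch of a run entry. The field names and the tab-separated file layout are illustrative assumptions, not a format prescribed by the task.

    # Minimal sketch of a run entry; the field names and the tab-separated
    # layout are illustrative assumptions, not a format prescribed by the task.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RunEntry:
        tweet_id: str                       # ID of the tweet being verified
        label: str                          # "fake", "real", or "unknown"
        explanation: Optional[str] = None   # optional text or URLs supporting the decision

    def write_run(entries, path):
        with open(path, "w", encoding="utf-8") as f:
            for e in entries:
                f.write(f"{e.tweet_id}\t{e.label}\t{e.explanation or ''}\n")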
Table 1: devset events: For each event, we report the numbers of unique real (if available) and fake images (I_R, I_F respectively), unique tweets that shared those images (T_R, T_F) and unique Twitter accounts that posted those tweets (U_R, U_F).

  Name                       I_R    T_R     U_R     I_F    T_F     U_F
  Hurricane Sandy            148    4,664   4,446   62     5,559   5,432
  Boston Marathon bombing    28     344     310     35     189     187
  Sochi Olympics             -      -       -       26     274     252
  MA flight 370              -      -       -       29     501     493
  Bring Back Our Girls       -      -       -       7      131     126
  Columbian Chemicals        -      -       -       15     185     87
  Passport hoax              -      -       -       2      44      44
  Rock Elephant              -      -       -       1      13      13
  Underwater bedroom         -      -       -       3      113     112
  Livr mobile app            -      -       -       4      9       9
  Pig fish                   -      -       -       1      14      14
  Total                      176    5,008   4,756   185    7,032   6,769

3. VERIFICATION CORPUS

Development dataset (devset): This was provided together with ground truth and was used by participants to develop their approaches. It contains tweets related to the 11 events of Table 1, comprising in total 176 cases of real and 185 cases of misused images, associated with 5,008 real and 7,032 fake tweets posted by 4,756 and 6,769 unique users respectively. Note that several of the events, e.g., Columbian Chemicals, Passport hoax and Rock Elephant, were actually hoaxes, hence all multimedia content associated with them was misused. For several real events (e.g., MA flight 370) no real images (and hence no real tweets) were included in the dataset, since none came up as a result of the data collection process that is described below.

Test dataset (testset): This was used for evaluation. It comprises 17 cases of real images, 33 cases of misused images and 2 cases of misused videos, in total associated with 1,217 real and 2,564 fake tweets that were posted by 1,139 and 2,447 unique users respectively.
The tweet IDs and image URLs for both datasets are publicly available¹. Both consist of tweets collected around a number of widely known events or news stories. The tweets contain fake and real multimedia content that has been manually verified by cross-checking online sources (articles and blogs). The data were retrieved with the help of the Topsy and Twitter APIs, using keywords and hashtags around these specific events. Having defined a set of keywords K for each event of Table 1, we collected a set of tweets T. Afterwards, with the help of online resources, we identified a set of unique fake and real pictures around these events, and created the fake and real image sets I_F and I_R respectively. We then used the image sets as seeds to create our reference verification corpus T_C ⊂ T. This corpus includes only those tweets that contain at least one image from the predefined sets I_F, I_R. However, in order not to restrict the tweets to only those that point to the exact seed image URLs, we also employed a scalable visual near-duplicate search strategy, as described in [6]. More specifically, we used the sets of fake and real images as visual queries, and for each query we checked whether each image tweet from the set T exists as an image item or a near-duplicate image item of the I_F or I_R set. To ensure that matches were true near-duplicates, we empirically set a minimum similarity threshold tuned for high precision. However, a small number of the images exceeding the threshold turned out to be irrelevant to the ones in the seed set. To remove those, we conducted a manual verification step on the extended set of images. A sketch of this seed-based expansion is given below.

¹ https://github.com/MKLab-ITI/image-verification-corpus/tree/master/mediaeval2015
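The sketch below illustrates that expansion step under stated assumptions: embed() stands in for the VLAD-based visual features of [6], and the threshold value is a placeholder for the empirically tuned, high-precision setting; neither reflects the task's actual implementation details.

    # Sketch of the seed-based corpus construction described above. The
    # embed() callable is a stand-in for the visual features of [6]; the
    # threshold is a placeholder for the empirically tuned value.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def build_corpus(image_tweets, seed_vectors, embed, threshold=0.9):
        """Keep tweets whose image (near-)duplicates a seed image."""
        corpus = []
        for tweet, image in image_tweets:        # pairs of (tweet, image)
            vec = embed(image)
            if any(cosine(vec, s) >= threshold for s in seed_vectors):
                corpus.append(tweet)             # candidates then go through manual checks
        return corpus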
For every item of the aforementioned datasets, we extracted and made available three types of features (a toy extraction example follows the list):

  • Features extracted from the tweet itself, for instance the number of terms, the number of URLs, the number of hashtags, the number of mentions, etc. [1].

  • User-based features, which are based on the Twitter user profile, for instance the number of friends and followers, the number of times the user is included in a Twitter list, whether the user is verified, etc. [1].

  • Forensic features extracted from the visual content of the tweet image, for instance the probability map of the aligned double JPEG compression, the potential primary quantization steps for the first six DCT coefficients of the non-aligned JPEG compression, and the PRNU (Photo-Response Non-Uniformity) [2].
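As a toy illustration of the first feature type, the sketch below computes a few tweet-based counts; the function name and the exact regular expressions are illustrative, and the released feature set of [1] is considerably richer.

    # Toy sketch of tweet-based features; the released features of [1] are
    # richer, this only illustrates the kind of surface signals involved.
    import re

    def tweet_features(text: str) -> dict:
        tokens = text.split()
        return {
            "num_terms":    len(tokens),
            "num_urls":     len(re.findall(r"https?://\S+", text)),
            "num_hashtags": len(re.findall(r"#\w+", text)),
            "num_mentions": len(re.findall(r"@\w+", text)),
        }

    # e.g. tweet_features("Shark on the highway! #Sandy http://t.co/x @nyc")
    # -> {'num_terms': 7, 'num_urls': 1, 'num_hashtags': 1, 'num_mentions': 1}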
4. EVALUATION
Overall, the task is interested in the accuracy with which an automatic method can distinguish between uses of multimedia in tweets that faithfully reflect reality and uses that spread false impressions. Hence, given a set of labelled instances (tweet + image + label) and a set of predicted labels (included in the submitted runs) for these instances, the classic IR measures (i.e., Precision P, Recall R, and F-score) were used to quantify the classification performance, where the target class is the class of fake tweets. Since the two classes (fake/real) are represented in a relatively balanced way in the testset, the classic IR measures are good proxies of classifier accuracy. Note that task participants were allowed to classify a tweet as unknown. Obviously, if a system produces many unknown outputs, its precision is likely to benefit, assuming that the selection of unknown was done wisely, i.e. successfully avoiding erroneous classifications. However, the recall of such a system would suffer if the tweets labelled as unknown turned out to be fake (the target class). A sketch of the computation is given below.
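The following minimal sketch assumes gold labels and predictions are dictionaries keyed by tweet ID; it is an illustration of the measures described above, not the task's official scoring script. An unknown prediction is simply never counted as fake, which is how it can raise precision while depressing recall.

    # Sketch of the evaluation described above: Precision/Recall/F-score with
    # "fake" as the target class. A prediction of "unknown" (or a missing
    # prediction) never counts as a fake prediction.
    def evaluate(gold, pred):
        """gold: {tweet_id: 'fake'|'real'}; pred: {tweet_id: 'fake'|'real'|'unknown'}"""
        tp = sum(1 for t, y in gold.items() if y == "fake" and pred.get(t) == "fake")
        fp = sum(1 for t, y in gold.items() if y == "real" and pred.get(t) == "fake")
        fn = sum(1 for t, y in gold.items() if y == "fake" and pred.get(t) != "fake")
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        return precision, recall, f_score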
5. ACKNOWLEDGEMENTS
We would like to thank Martha Larson for her valuable feedback in shaping the task and writing the overview paper. This work is supported by the REVEAL project, partially funded by the European Commission (FP7-610928).

6. REFERENCES
[1] C. Boididou, S. Papadopoulos, Y. Kompatsiaris, S. Schifferes, and N. Newman. Challenges of computational verification in social multimedia. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web, pages 743–748, 2014.
[2] V. Conotter, D.-T. Dang-Nguyen, M. Riegler, G. Boato, and M. Larson. A crowdsourced data set of edited images online. In Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia, CrowdMM '14, pages 49–52, New York, NY, USA, 2014. ACM.
[3] N. Diakopoulos, M. De Choudhury, and M. Naaman. Finding and assessing social media information sources in the context of journalism. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, pages 2451–2460, New York, NY, USA, 2012. ACM.
[4] A. Gupta, H. Lamba, P. Kumaraguru, and A. Joshi. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In Proceedings of the 22nd International Conference on World Wide Web Companion, pages 729–736, 2013.
[5] J. Ito, J. Song, H. Toda, Y. Koike, and S. Oyama. Assessment of tweet credibility with LDA features. In Proceedings of the 24th International Conference on World Wide Web Companion, pages 953–958, 2015.
[6] E. Spyromitros-Xioufis, S. Papadopoulos, I. Kompatsiaris, G. Tsoumakas, and I. Vlahavas. A comprehensive study over VLAD and Product Quantization in large-scale image retrieval. IEEE Transactions on Multimedia, 16(6):1713–1728, 2014.