=Paper=
{{Paper
|id=Vol-1436/Paper4
|storemode=property
|title=Verifying Multimedia Use at MediaEval 2015
|pdfUrl=https://ceur-ws.org/Vol-1436/Paper4.pdf
|volume=Vol-1436
|dblpUrl=https://dblp.org/rec/conf/mediaeval/BoididouAPDBRK15
}}
==Verifying Multimedia Use at MediaEval 2015==
Christina Boididou<sup>1</sup>, Katerina Andreadou<sup>1</sup>, Symeon Papadopoulos<sup>1</sup>, Duc-Tien Dang-Nguyen<sup>2</sup>, Giulia Boato<sup>2</sup>, Michael Riegler<sup>3</sup>, and Yiannis Kompatsiaris<sup>1</sup>

<sup>1</sup> Information Technologies Institute, CERTH, Greece. [boididou,kandreadou,papadop,ikom]@iti.gr
<sup>2</sup> University of Trento, Italy. [dangnguyen,boato]@disi.unitn.it
<sup>3</sup> Simula Research Laboratory, Norway. michael@simula.no

MediaEval 2015 Workshop, Sept. 14–15, 2015, Wurzen, Germany. Copyright is held by the author/owner(s).

===Abstract===
This paper provides an overview of the Verifying Multimedia Use task that takes place as part of the 2015 MediaEval Benchmark. The task deals with the automatic detection of manipulation and misuse of Web multimedia content. Its aim is to lay the basis for a future generation of tools that could assist media professionals in the process of verification. Examples of manipulation include maliciously tampering with images and videos, e.g., splicing or the removal/addition of elements, while other kinds of misuse include the reposting of previously captured multimedia content in a different context (e.g., a new event) claiming that it was captured there. For the 2015 edition of the task, we have generated and made available a large corpus of real-world cases of images that were distributed through tweets, along with manually assigned labels regarding their use, i.e. misleading (fake) versus appropriate (real).

Figure 1 (not reproduced here): Examples of fake web multimedia: (a) "manipulation": a digitally manipulated image of an IAF F-16 deploying a flare over Southern Lebanon, in which the flare was digitally duplicated; (b) "reposting": an image posted during Hurricane Sandy that is a repost from a 2009 art installation.

===1. Introduction===
Modern Online Social Networks (OSNs), such as Twitter, Instagram and Facebook, are nowadays the primary sources of information and news for millions of users and the major means of publishing user-generated content. With the growing number of people participating in and contributing to these communities, analyzing and verifying the massive amounts of such content has emerged as a major challenge. Veracity is a crucial aspect of media content, especially in cases of breaking news stories and incidents related to public safety, ranging from natural disasters and plane crashes to terrorist attacks. Popular stories have such a profound impact on public attention that content gets immediately retransmitted by millions of users, and often it is found to be misleading, resulting in misinformation of the public audience and even of the authorities.

In this setting, there is an increasing need for automated real-time verification and cross-checking tools. Work has been done in this field and techniques for evaluating tweets have been proposed. Gupta et al. [4] used the Hurricane Sandy natural disaster as a case study to highlight the role of Twitter in spreading fake content during the event, and proposed classification models to distinguish between fake and real tweets. Ito et al. [5] proposed a method to assess tweet credibility using "tweet"- and "user"-topic features derived from the Latent Dirichlet Allocation (LDA) model; they also introduced the user's "expertness" and "bias" features, demonstrating that the bias features work better. Given the importance of the problem, as attested by the increasing number of works in the area [3], this task aspires to foster the development of new Web multimedia verification approaches.

===2. Task Overview===
The definition of the task is the following: "Given a tweet and the accompanying multimedia item (image or video) from an event that has the profile to be of interest in the international news, return a binary decision representing verification of whether the multimedia item reflects the reality of the event in the way purported by the tweet." In practice, participants received a list of tweets that include images and were required to automatically predict, for each tweet, whether it is trustworthy or deceptive (real or fake respectively). In addition to fully automated approaches, the task also considered human-assisted approaches, provided that they are practical (i.e., fast enough) in real-world settings. The following considerations apply in addition to the above definition:

* A tweet is considered fake when it shares multimedia content that does not represent the event that it refers to. Figure 1 presents examples of such content.
* A tweet is considered real when it shares multimedia that legitimately represents the event it refers to.
* A tweet that shares multimedia content that does not represent the event it refers to, but reports the false information or refers to it with a sense of humour, is considered neither fake nor real (and hence is not included in the datasets released by the task).

The task also asked participants to optionally return an explanation (which can be a text string, or URLs pointing to resources online) that supports the verification decision. The explanation was not used for quantitative evaluation, but rather for gaining qualitative insights into the results.

===3. Verification Corpus===
'''Development dataset (devset):''' This was provided together with ground truth and was used by participants to develop their approaches. It contains tweets related to the 11 events of Table 1, comprising in total 176 cases of real and 185 cases of misused images, associated with 5,008 real and 7,032 fake tweets posted by 4,756 and 6,769 unique users respectively. Note that several of the events, e.g., Columbian Chemicals, Passport Hoax and Rock Elephant, were actually hoaxes, hence all multimedia content associated with them was misused. For several events (e.g., MA flight 370), no real images (and hence no real tweets) were included in the dataset, since none came up as a result of the data collection process described below.

Table 1: devset events. For each event, we report the numbers of unique real (if available) and fake images (I<sub>R</sub>, I<sub>F</sub> respectively), unique tweets that shared those images (T<sub>R</sub>, T<sub>F</sub>) and unique Twitter accounts that posted those tweets (U<sub>R</sub>, U<sub>F</sub>).

{| class="wikitable"
! Name !! I<sub>R</sub> !! T<sub>R</sub> !! U<sub>R</sub> !! I<sub>F</sub> !! T<sub>F</sub> !! U<sub>F</sub>
|-
| Hurricane Sandy || 148 || 4,664 || 4,446 || 62 || 5,559 || 5,432
|-
| Boston Marathon bombing || 28 || 344 || 310 || 35 || 189 || 187
|-
| Sochi Olympics || - || - || - || 26 || 274 || 252
|-
| MA flight 370 || - || - || - || 29 || 501 || 493
|-
| Bring Back Our Girls || - || - || - || 7 || 131 || 126
|-
| Columbian Chemicals || - || - || - || 15 || 185 || 87
|-
| Passport hoax || - || - || - || 2 || 44 || 44
|-
| Rock Elephant || - || - || - || 1 || 13 || 13
|-
| Underwater bedroom || - || - || - || 3 || 113 || 112
|-
| Livr mobile app || - || - || - || 4 || 9 || 9
|-
| Pig fish || - || - || - || 1 || 14 || 14
|-
| Total || 176 || 5,008 || 4,756 || 185 || 7,032 || 6,769
|}

'''Test dataset (testset):''' This was used for evaluation. It comprises 17 cases of real images, 33 cases of misused images and 2 cases of misused videos, in total associated with 1,217 real and 2,564 fake tweets that were posted by 1,139 and 2,447 unique users respectively.

The tweet IDs and image URLs for both datasets are publicly available at https://github.com/MKLab-ITI/image-verification-corpus/tree/master/mediaeval2015. Both consist of tweets collected around a number of widely known events or news stories. The tweets contain fake and real multimedia content that has been manually verified by cross-checking online sources (articles and blogs). The data were retrieved with the help of the Topsy and Twitter APIs using keywords and hashtags around these specific events. Having defined a set of keywords K for each event of Table 1, we collected a set of tweets T. Afterwards, with the help of online resources, we identified a set of unique fake and real pictures around these events, and created the fake and real image sets I<sub>F</sub> and I<sub>R</sub> respectively. We then used the image sets as seeds to create our reference verification corpus T<sub>C</sub> ⊂ T. This corpus includes only those tweets that contain at least one image of the predefined image sets I<sub>F</sub>, I<sub>R</sub>. However, in order not to restrict the tweets to only those that point to the exact seed image URLs, we also employed a scalable visual near-duplicate search strategy, as described in [6]. More specifically, we used the sets of fake and real images as visual queries, and for each query we checked whether each image tweet from the set T exists as an image item or a near-duplicate of an item in the I<sub>F</sub> or I<sub>R</sub> set. To qualify a match as a near-duplicate, we empirically set a minimum similarity threshold tuned for high precision. However, a small number of images exceeding the threshold turned out to be irrelevant to the seed set; to remove those, we conducted a manual verification step on the extended set of images.
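To make the seed-based expansion step concrete, the following is a minimal sketch of matching candidate tweet images against the seed sets. It is not the pipeline actually used (a scalable VLAD/Product Quantization index [6]); a simple 64-bit difference hash (dHash) stands in for the visual descriptor, and the file paths and the Hamming-distance threshold are illustrative assumptions rather than the task's tuned values.

<pre>
# Sketch only: dHash-based near-duplicate expansion of a seed image set.
from PIL import Image

def dhash(path, hash_size=8):
    """64-bit difference hash: compare adjacent pixels of a tiny grayscale image."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    px = list(img.getdata())  # row-major, row width is hash_size + 1
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = px[row * (hash_size + 1) + col]
            right = px[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | int(left > right)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def expand_corpus(seed_paths, candidate_paths, max_distance=10):
    """Keep candidates that are near-duplicates of any seed image
    (threshold assumed; in the task it was tuned for high precision)."""
    seed_hashes = [dhash(p) for p in seed_paths]
    kept = []
    for cand in candidate_paths:
        h = dhash(cand)
        if any(hamming(h, s) <= max_distance for s in seed_hashes):
            kept.append(cand)
    return kept
</pre>

As in the corpus construction described above, anything retained by such a threshold would still pass through a manual verification step to weed out irrelevant matches.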
For every item of the aforementioned datasets, we extracted and made available three types of features:

* Features extracted from the tweet itself, for instance the number of terms, the number of URLs, the number of hashtags, the number of mentions, etc. [1] (a toy extraction sketch follows this list).
* User-based features derived from the Twitter user profile, for instance the number of friends and followers, the number of times the user is included in a Twitter list, whether the user is verified, etc. [1].
* Forensic features extracted from the visual content of the tweet image, for instance the probability map of the aligned double JPEG compression, the potential primary quantization steps for the first six DCT coefficients of the non-aligned JPEG compression, and the PRNU (Photo-Response Non-Uniformity) [2].
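As a toy illustration of the first feature type, the snippet below derives term, URL, hashtag and mention counts from raw tweet text with simple regular expressions. The released feature set [1] is richer, and the organizers' exact tokenization is not specified here, so treat the patterns as assumptions.

<pre>
# Sketch only: tweet-level surface features, approximated with regexes.
import re

def tweet_features(text):
    urls = re.findall(r"https?://\S+", text)
    hashtags = re.findall(r"#\w+", text)
    mentions = re.findall(r"@\w+", text)
    return {
        "num_terms": len(text.split()),  # naive whitespace tokenization
        "num_urls": len(urls),
        "num_hashtags": len(hashtags),
        "num_mentions": len(mentions),
    }

print(tweet_features("Shark on the highway! #sandy http://t.co/abc @cnn"))
# {'num_terms': 7, 'num_urls': 1, 'num_hashtags': 1, 'num_mentions': 1}
</pre>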
===4. Evaluation===
Overall, the task is interested in the accuracy with which an automatic method can distinguish between uses of multimedia in tweets that faithfully reflect reality and uses that spread false impressions. Hence, given a set of labelled instances (tweet + image + label) and a set of predicted labels (included in the submitted runs) for these instances, the classic IR measures (Precision P, Recall R, and F-score) were used to quantify classification performance, with the class of fake tweets as the target class. Since the two classes (fake/real) are represented in a relatively balanced way in the testset, the classic IR measures are good proxies for classifier accuracy. Note that task participants were allowed to classify a tweet as unknown. Obviously, if a system produces many unknown outputs, its precision is likely to benefit, assuming that the selection of unknown was done wisely, i.e. successfully avoiding erroneous classifications. However, the recall of such a system would suffer whenever the tweets labelled as unknown turn out to be fake (the target class).
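The scoring logic above can be made concrete with a short sketch, assuming predictions take the values fake, real or unknown and that fake is the target class. An unknown prediction is simply never counted as fake, which is why it cannot hurt precision but does depress recall when the true label is fake.

<pre>
# Sketch only: P/R/F with "fake" as the target class, allowing "unknown".
def score(gold, pred):
    """gold: list of 'fake'/'real'; pred: list of 'fake'/'real'/'unknown'."""
    tp = sum(g == "fake" and p == "fake" for g, p in zip(gold, pred))
    fp = sum(g == "real" and p == "fake" for g, p in zip(gold, pred))
    fn = sum(g == "fake" and p != "fake" for g, p in zip(gold, pred))  # incl. unknown
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

gold = ["fake", "fake", "real", "fake"]
pred = ["fake", "unknown", "real", "fake"]
print(score(gold, pred))  # (1.0, 0.666..., 0.8): unknown on a fake tweet costs recall
</pre>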
===5. Acknowledgements===
We would like to thank Martha Larson for her valuable feedback in shaping the task and writing the overview paper. This work is supported by the REVEAL project, partially funded by the European Commission (FP7-610928).

===6. References===
[1] C. Boididou, S. Papadopoulos, Y. Kompatsiaris, S. Schifferes, and N. Newman. Challenges of computational verification in social multimedia. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web, pages 743–748, 2014.

[2] V. Conotter, D.-T. Dang-Nguyen, M. Riegler, G. Boato, and M. Larson. A crowdsourced data set of edited images online. In Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia, CrowdMM '14, pages 49–52, New York, NY, USA, 2014. ACM.

[3] N. Diakopoulos, M. De Choudhury, and M. Naaman. Finding and assessing social media information sources in the context of journalism. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, pages 2451–2460, New York, NY, USA, 2012. ACM.

[4] A. Gupta, H. Lamba, P. Kumaraguru, and A. Joshi. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In Proceedings of the 22nd International Conference on World Wide Web Companion, pages 729–736, 2013.

[5] J. Ito, J. Song, H. Toda, Y. Koike, and S. Oyama. Assessment of tweet credibility with LDA features. In Proceedings of the 24th International Conference on World Wide Web Companion, pages 953–958, 2015.

[6] E. Spyromitros-Xioufis, S. Papadopoulos, I. Kompatsiaris, G. Tsoumakas, and I. Vlahavas. A comprehensive study over VLAD and Product Quantization in large-scale image retrieval. IEEE Transactions on Multimedia, 16(6):1713–1728, 2014.