The VMU Participation @ Verifying Multimedia Use 2016

Christina Boididou1, Stuart E. Middleton5, Symeon Papadopoulos1, Duc-Tien Dang-Nguyen2,3, Michael Riegler4, Giulia Boato2, Andreas Petlund4, and Yiannis Kompatsiaris1

1 Information Technologies Institute, CERTH, Greece. [boididou,papadop,ikom]@iti.gr
2 University of Trento, Italy. [dangnguyen,boato]@disi.unitn.it
3 Insight Centre for Data Analytics at Dublin City University, Ireland. duc-tien.dang-nguyen@dcu.ie
4 Simula Research Laboratory, Norway. michael@simula.no, apetlund@ifi.uio.no
5 University of Southampton IT Innovation Centre, UK. sem@it-innovation.soton.ac.uk

ABSTRACT
The participating approach predicts whether a tweet accompanied by multimedia content (image/video) is trustworthy (real) or deceptive (fake). We combine two different methods: a) a semi-supervised learning scheme that leverages the decisions of two independent classifiers to produce a decision, and b) a method that uses textual patterns to extract claims about whether a post is fake or real, together with attribution statements about the content source. The experiments, carried out on the Verifying Multimedia Use dataset, used different combinations of content quality and trust-oriented features, namely tweet-based, user-based and forensics features.

1. INTRODUCTION
After high-impact events, large amounts of unverified information usually start spreading in social media. Misleading information often goes viral, affecting public opinion and sentiment [4]. Motivated by this problem, the Verifying Multimedia Use task highlights the need for verification and addresses the challenging problem of establishing automated approaches to classify social media posts as containing misleading (fake) or trustworthy (real) content [2].

To tackle this challenge, we present a method combining two approaches. The first is an extension of [3], which introduces an agreement-retraining method that uses part of its own predictions as new training samples, with the goal of adapting to posts from a new event (method ARM). The second uses textual patterns to extract claims about whether a post is fake or real, as well as attribution statements about the source of the content [7] (method ATT).

The conducted experiments use various sets of features. The classifiers are built using three types of features: a) tweet-based (TB), which use the post's metadata; b) user-based (UB), which use the user's metadata; and c) multimedia forensics features (FOR), which are computed on the image that accompanies the post. In addition to the features shared by the task, we extract and use additional ones for each set.

TB: Binary features, such as the presence of a word, symbol or external link, are added to the list. We also use language-specific binary features that correspond to the presence of specific terms; for languages in which we cannot define such terms, we treat these values as missing. We perform language detection with a publicly available library^1. We add a feature for the number of slang words in a text, using slang lists in English^2 and Spanish^3. For the number of nouns, we use the Stanford parser^4 to assign parts of speech to each word (supported only in English), and for text readability the Flesch Reading Ease method^5, which scores the complexity of a piece of text on a scale from 0 (hard to read) to 100 (easy to read).

UB: We extract user-specific features, such as the number of media items and the account age, along with others that summarize information shared by the user. For example, we check whether the user shares a location and whether it can be matched to a city name from the Geonames dataset^6.

For both TB and UB features, we adopt trust-oriented features for the links shared through the post itself (TB) or the user profile (UB). The WOT metric^7 is a score indicating how trustworthy a website is, based on reputation ratings by Web users. We also include the in-degree and harmonic centralities, rankings computed from the links of the Web forming a graph^8. Trust analysis of the links is also performed using Web metrics provided by the Alexa API.

FOR: Following the method in [1], forensics features are extracted as descriptive statistics (maximum, minimum, mean, median, most frequent value, standard deviation, and variance) computed from the BAG values. In this work, we also extract an additional feature that measures image quality as a single score (from 0 to 100) by exploiting the method in [6]. The forensics feature extraction is performed as follows: for each image, a binary map is created by thresholding the AJPG map; the largest region is then considered the object and the rest the background. For both regions, the seven descriptive statistics are computed from the BAG values and concatenated into a 14-dimensional vector. The same process is applied to the NAJPG map. To measure image quality, the discrete cosine transform (DCT) is applied to the whole image, and a support vector machine predicts the quality from the spectral and spatial entropies (computed from the block DCT coefficients). In the end, all the forensics features are concatenated into a 29-dimensional vector (14 from AJPG, 14 from NAJPG, and 1 for image quality).

2. SYSTEM DESCRIPTION

2.1 ARM: Agreement-based Retraining

Figure 1: Overview of agreement-based retraining.

Being an extension of [3], the proposed method uses an agreement-based retraining step that aims to adapt to posts from new events and improve the prediction accuracy on them. This is motivated by a similar approach implemented in [8] (for the problem of polarity classification). Figure 1 illustrates the adopted process. In step (a), using the training set, we build two independent classifiers CL1, CL2 and combine their predictions for the test set. We compare the two predictions and, depending on their agreement, divide the test set into agreed and disagreed subsets, which are treated differently by the classification framework. Assuming that the agreed predictions are correct with high likelihood, we use them as training samples, along with our initial training set, to build a new model for classifying the disagreed samples. To this end, in step (b), we add the agreed samples to the better performing of the two initial models CL1, CL2 (compared on the basis of their cross-validation performance on the training set). The goal of this method is to make the model adaptable to the specific characteristics of a new event.

2.2 ATT: Attribution-based claim extraction

This approach is motivated by the human verification process employed by journalists, where attributed sources are key to the trustworthiness of claims. A classic natural language processing pipeline is employed, involving text tokenization, Parts of Speech (POS) tagging^9 and a permissive named entity recognition pattern focusing on noun phrases. A number of regex patterns were created to extract typical linguistic constructs around image and video content, such as debunking reports, claims of being real, or attribution to a third-party source such as a news provider.

Our approach is semi-automated, using a list of a priori known trusted and untrusted sources. We can either learn an entity list automatically using information-theoretic weightings (i.e., TF-IDF) or create a list manually (i.e., using a journalist's trusted source list). All news providers have long lists of trusted sources for different regions around the world, so this information is readily available. For this task we created a list of candidate named entities by first running the regex patterns on the dataset. We then manually checked each entity via Google search (e.g., looking at Twitter profile pages) to determine whether they were obvious news providers or journalists.

We assign a confidence value to each matched pattern based on the trustworthiness level of its source. Evidence from trusted authors is more trusted than evidence attributed to other authors, which in turn is more trusted than unattributed evidence. In a cross-check step we choose the most trustworthy claims to use for each image URI. If there is evidence for both a fake and a genuine claim with equal confidence, we assume the content is fake (i.e., any doubt = fake). Our approach provides a very high-precision, low-recall output.

3. SUBMITTED RUNS AND RESULTS

We submitted five runs that explore different combinations of features (TB, UB, FOR) and methods (ARM, ATT). Table 1 shows the specific run configurations and performance.

Table 1: Run description and results.
        Learning                          Precision  Recall  F-score
RUN-1   ARM (TB,UB)                       0.981      0.851   0.912
RUN-2   ARM (TB+FOR,UB)                   0.771      0.906   0.833
RUN-3   ARM (TB,UB) and ATT (pr.)         0.988      0.887   0.935
RUN-4   ARM (TB,UB) and ATT (ret.)        0.980      0.874   0.924
RUN-5   TB,UB,FOR                         0.587      0.995   0.739

In RUN-1 and RUN-2, we apply ARM, building CL1 and CL2 (Figure 1) with the sets of features specified in Table 1. For example, in RUN-2 we use the concatenation of TB + FOR for CL1 and UB for CL2. RUN-3 is a combination of the ARM and ATT methods, in which, for each post, we consider the result of ATT as correct if available; otherwise we use the output of ARM. Similarly, in RUN-4, we use the results of ATT as samples for retraining (step (b) in Figure 1) along with the agreed samples of ARM. All models built in ARM use the Random Forest implementation of WEKA [5]. Finally, RUN-5 is a plain classification method built with the full set of available features. In terms of performance (F-score is the evaluation metric of the task), RUN-3 achieved the best score, using the combination of the two methods. As RUN-1 and RUN-2 show, the presence of FOR features reduced the system's performance. Comparing RUN-3 and RUN-4, one may notice the considerable performance benefit derived from the combined use of ARM and ATT.

4. ACKNOWLEDGMENTS

This work is supported by the REVEAL project, partially funded by the European Commission (FP7-610928).

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.

Footnotes:
1 https://code.google.com/p/language-detection/
2 http://onlineslangdictionary.com/word-list/0-a/
3 http://www.languagerealm.com/spanish/spanishslang.php
4 http://nlp.stanford.edu/software/lex-parser.shtml
5 http://simple.wikipedia.org/wiki/Flesch_Reading_Ease
6 http://download.geonames.org/export/dump/cities1000.zip
7 https://www.mywot.com/
8 http://wwwranking.webdatacommons.org/more.html
9 http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger

5. REFERENCES
[1] C. Boididou, S. Papadopoulos, D.-T. Dang-Nguyen, G. Boato, and Y. Kompatsiaris. The CERTH-UNITN participation @ Verifying Multimedia Use 2015. MediaEval Benchmarking Initiative for Multimedia Evaluation (MediaEval'15), 2015.
[2] C. Boididou, S. Papadopoulos, D.-T. Dang-Nguyen, G. Boato, M. Riegler, S. E. Middleton, A. Petlund, and Y. Kompatsiaris. Verifying multimedia use at MediaEval 2016. In MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands, 2016.
[3] C. Boididou, S. Papadopoulos, Y. Kompatsiaris, S. Schifferes, and N. Newman. Challenges of computational verification in social multimedia. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pages 743-748, 2014.
[4] V. Conotter, D.-T. Dang-Nguyen, G. Boato, M. Menéndez, and M. Larson. Assessing the impact of image manipulation on users' perceptions of deception. Proceedings of SPIE - The International Society for Optical Engineering, 2014.
[5] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations, 11(1):10-18, 2009.
[6] L. Liu, B. Liu, H. Huang, and A. Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 29(8):856-863, 2014.
[7] S. E. Middleton. Extracting attributed verification and debunking reports from social media: MediaEval-2015 trust and credibility analysis of image and video. 2015.
[8] A. Tsakalidis, S. Papadopoulos, and I. Kompatsiaris. An ensemble model for cross-domain polarity classification on Twitter. In Web Information Systems Engineering - WISE 2014, pages 168-177. Springer, 2014.
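As an illustrative sketch, the agreement-based retraining control flow of Section 2.1 can be written as follows. The classifier objects and their method names (predict, cv_score, fit) are hypothetical stand-ins, not the WEKA Random Forest models actually used; the sketch only shows how the agreed/disagreed partition drives the retraining step.

```python
# Minimal sketch of ARM (agreement-based retraining). The classifier
# interface (predict/cv_score/fit) is a placeholder assumption; the paper
# uses two WEKA Random Forest models over TB/UB feature sets.

def agreement_retraining(cl1, cl2, train, test):
    """Partition test items by classifier agreement, then retrain the
    better model on train + agreed items to label the disagreed ones."""
    # Step (a): independent predictions of the two classifiers.
    pred1 = {x["id"]: cl1.predict(x) for x in test}
    pred2 = {x["id"]: cl2.predict(x) for x in test}
    agreed = [x for x in test if pred1[x["id"]] == pred2[x["id"]]]
    disagreed = [x for x in test if pred1[x["id"]] != pred2[x["id"]]]

    # Agreed predictions are assumed correct and become pseudo-labels.
    pseudo = [dict(x, label=pred1[x["id"]]) for x in agreed]

    # Step (b): keep the model with the better cross-validation score on
    # the training set; retrain it on train + pseudo-labelled samples.
    best = cl1 if cl1.cv_score(train) >= cl2.cv_score(train) else cl2
    best.fit(train + pseudo)

    labels = {x["id"]: pred1[x["id"]] for x in agreed}
    labels.update({x["id"]: best.predict(x) for x in disagreed})
    return labels
```

The key design choice is that only the disagreed subset is re-classified; agreed items keep their (assumed-correct) joint prediction and additionally enlarge the training set for the adapted model.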
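The Flesch Reading Ease score used among the TB features follows a standard published formula; a minimal sketch is shown below. The syllable counter here is a naive vowel-group heuristic of our own (an assumption, not the paper's implementation, which relies on an external method).

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Low scores mean hard to read, scores near 100 mean easy to read."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def syllables(word):
        # Naive heuristic: count vowel groups; a real system would use a
        # pronunciation dictionary or hyphenation rules instead.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syll = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)
```

Short, monosyllabic sentences score above 100, while long words drive the score far down, which is why the paper treats the value as a single readability feature rather than an absolute measure.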
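The FOR feature layout (seven descriptive statistics per region, concatenated per map) can be sketched as below. The BAG values, thresholding, and region segmentation are out of scope here, so the input lists are assumed to be the per-region values already extracted; "most frequent value" is rendered with the statistics module's mode, which is one plausible reading.

```python
# Sketch of the FOR descriptive-statistics vector (Section 1). The BAG
# value extraction and object/background segmentation are assumed done
# upstream; this only shows the 7-stats-per-region concatenation.
from statistics import mean, median, mode, pstdev, pvariance

def region_stats(values):
    """The seven descriptive statistics: max, min, mean, median,
    most frequent value, standard deviation, variance."""
    return [max(values), min(values), mean(values), median(values),
            mode(values), pstdev(values), pvariance(values)]

def map_vector(object_vals, background_vals):
    """14-dimensional vector for one map (AJPG or NAJPG): 7 stats for
    the object region followed by 7 for the background region."""
    return region_stats(object_vals) + region_stats(background_vals)
```

Two such 14-dimensional vectors (AJPG and NAJPG) plus the single image-quality score give the 29-dimensional forensics vector described in the paper.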
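The ATT cross-check rule of Section 2.2 and the RUN-3 fusion rule of Section 3 can be sketched together. The confidence encoding (a single number per claim, higher for more trusted sources) is an assumption for illustration; the paper only specifies the ordering trusted > attributed > unattributed and the "any doubt = fake" tie-break.

```python
def cross_check(claims):
    """Resolve the claims for one image URI. Each claim is a
    (label, confidence) pair with label 'fake' or 'real'; confidence is
    a hypothetical numeric encoding of source trust. Among the most
    trustworthy claims, any conflicting evidence resolves to 'fake'
    (the paper's "any doubt = fake" rule). Returns None if ATT found
    no claims for this item."""
    if not claims:
        return None
    top = max(conf for _, conf in claims)
    top_labels = {label for label, conf in claims if conf == top}
    return "fake" if "fake" in top_labels else "real"

def fuse(att_label, arm_label):
    """RUN-3 fusion: prefer the high-precision ATT output when it
    exists, otherwise fall back to the ARM prediction."""
    return att_label if att_label is not None else arm_label
```

Because ATT fires only when an attribution pattern matches, fuse() reproduces the high-precision/low-recall trade-off reported for RUN-3: ATT overrides ARM on the few posts it covers.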