The VMU Participation @ Verifying Multimedia Use 2016

Christina Boididou1, Stuart E. Middleton5, Symeon Papadopoulos1, Duc-Tien Dang-Nguyen2,3, Michael Riegler4, Giulia Boato2, Andreas Petlund4, and Yiannis Kompatsiaris1

1 Information Technologies Institute, CERTH, Greece. [boididou,papadop,ikom]@iti.gr
2 University of Trento, Italy. [dangnguyen,boato]@disi.unitn.it
3 Insight Centre for Data Analytics at Dublin City University, Ireland. duc-tien.dang-nguyen@dcu.ie
4 Simula Research Laboratory, Norway. michael@simula.no, apetlund@ifi.uio.no
5 University of Southampton IT Innovation Centre, UK. sem@it-innovation.soton.ac.uk

ABSTRACT
The participating approach predicts whether a tweet accompanied by multimedia content (image/video) is trustworthy (real) or deceptive (fake). We combine two different methods: a) a semi-supervised learning scheme that leverages the decisions of two independent classifiers to produce a decision, and b) a method that uses textual patterns to extract claims about whether a post is fake or real, together with attribution statements about the content source. The experiments, carried out on the Verifying Multimedia Use dataset, used different combinations of content quality and trust-oriented features, namely tweet-based, user-based and forensics features.

1. INTRODUCTION
After high-impact events, large amounts of unverified information usually start spreading in social media. Misleading information often goes viral, affecting public opinion and sentiment [4]. Motivated by this problem, the Verifying Multimedia Use task highlights the need for verification and addresses the challenging problem of establishing automated approaches to classify social media posts as containing misleading (fake) or trustworthy (real) content [2].

To tackle this challenge, we present a method combining two approaches. The first is an extension of [3], which introduces an agreement-retraining method that uses part of its own predictions as new training samples, with the goal of adapting to posts from a new event (method ARM). The second uses textual patterns to extract claims about whether a post is fake or real, as well as attribution statements about the source of the content [7] (method ATT).

The conducted experiments use various sets of features. The classifiers are built using three types of features: a) tweet-based (TB), which use the post's metadata; b) user-based (UB), which use the user's metadata; and c) multimedia forensics features (FOR), which are computed on the image that accompanies the post. In addition to the features shared by the task, we extract and use additional ones for each set.

TB: Binary features, such as the presence of a word, symbol or external link, are added to the list. We also use language-specific binary features that correspond to the presence of specific terms; for languages in which we cannot define such terms, we treat these values as missing. We perform language detection with a publicly available library^1. We add a feature for the number of slang words in a text, using slang lists in English^2 and Spanish^3. For the number of nouns, we use the Stanford parser^4 to assign parts of speech to each word (supported only in English), and for text readability the Flesch Reading Ease method^5, which scores the complexity of a piece of text on a scale from 0 (hard to read) to 100 (easy to read).

UB: We extract user-specific features, such as the number of media items and the account age, along with others that summarize information shared by the user. For example, we check whether the user shares a location and whether it can be matched to a city name from the Geonames dataset^6.

For both TB and UB features, we adopt trust-oriented features for the links shared through the post itself (TB) or the user profile (UB). The WOT metric^7 is a score indicating how trustworthy a website is, based on reputation ratings by Web users. We also include the in-degree and harmonic centralities, rankings computed from the links of the Web forming a graph^8. Trust analysis of the links is also performed using Web metrics provided by the Alexa API.

FOR: Following the method in [1], forensics features are extracted as descriptive statistics (maximum, minimum, mean, median, most frequent value, standard deviation, and variance) computed from the BAG values. In this work, we also extract an additional feature that measures image quality as a single score (from 0 to 100) by exploiting the method in [6]. The forensics feature extraction is performed as follows: for each image, a binary map is created by thresholding the AJPG map; the largest region is then considered the object and the rest the background. For both regions, the seven descriptive statistics are computed from the BAG values and concatenated into a 14-dimensional vector. The same process is applied to the NAJPG map. To measure image quality, the discrete cosine transform (DCT) is applied to the whole image, and a support vector machine predicts the quality from the spectral and spatial entropies (computed from the block DCT coefficients). In the end, all the forensics features are concatenated into a 29-dimensional vector (14 from AJPG, 14 from NAJPG, and 1 for image quality).

2. SYSTEM DESCRIPTION

2.1 ARM: Agreement-based Retraining

Figure 1: Overview of agreement-based retraining.

Being an extension of [3], the proposed method uses an agreement-based retraining step that aims to adapt to posts from new events and improve the prediction accuracy on them. This is motivated by a similar approach implemented in [8] (for the problem of polarity classification). Figure 1 illustrates the adopted process. In step (a), using the training set, we build two independent classifiers CL1, CL2 and combine their predictions for the test set. We compare the two predictions and, depending on their agreement, divide the test set into agreed and disagreed subsets, which are treated differently by the classification framework. Assuming that the agreed predictions are correct with high likelihood, we use them as training samples, along with our initial training set, to build a new model for classifying the disagreed samples. To this end, in step (b), we add the agreed samples to the better performing of the two initial models CL1, CL2 (compared on the basis of their cross-validation performance on the training set). The goal of this method is to make the model adaptable to the specific characteristics of a new event.

2.2 ATT: Attribution-based claim extraction

This approach is motivated by the human verification process employed by journalists, where attributed sources are key to the trustworthiness of claims. A classic natural language processing pipeline is employed, involving text tokenization, Parts of Speech (POS) tagging^9 and a permissive named entity recognition pattern focusing on noun phrases. A number of regex patterns were created to extract typical linguistic constructs around image and video content, such as debunking reports, claims of being real, or attribution to a third-party source such as a news provider.

Our approach is semi-automated, using a list of a priori known trusted and untrusted sources. We can either learn an entity list automatically using information-theoretic weightings (i.e., TF-IDF) or create a list manually (i.e., using a journalist's trusted source list). All news providers have long lists of trusted sources for different regions around the world, so this information is readily available. For this task we created a list of candidate named entities by first running the regex patterns on the dataset. We then manually checked each entity via Google search (e.g., looking at Twitter profile pages) to determine whether they were obvious news providers or journalists.

We assign a confidence value to each matched pattern based on the trustworthiness level of its source. Evidence from trusted authors is more trusted than evidence attributed to other authors, which in turn is more trusted than unattributed evidence. In a cross-check step we choose the most trustworthy claims to use for each image URI. If there is evidence for both a fake and a genuine claim with equal confidence, we assume the content is fake (i.e., any doubt = fake). Our approach provides a very high-precision, low-recall output.

3. SUBMITTED RUNS AND RESULTS

We submitted five runs that explore different combinations of features (TB, UB, FOR) and methods (ARM, ATT). Table 1 shows the specific run configurations and performance.

Table 1: Run description and results.
        Learning                          Precision  Recall  F-score
RUN-1   ARM (TB,UB)                       0.981      0.851   0.912
RUN-2   ARM (TB+FOR,UB)                   0.771      0.906   0.833
RUN-3   ARM (TB,UB) and ATT (pr.)         0.988      0.887   0.935
RUN-4   ARM (TB,UB) and ATT (ret.)        0.980      0.874   0.924
RUN-5   TB,UB,FOR                         0.587      0.995   0.739

In RUN-1 and RUN-2, we apply ARM, building CL1 and CL2 (Figure 1) with the sets of features specified in Table 1. For example, in RUN-2 we use the concatenation of TB + FOR for CL1 and UB for CL2. RUN-3 is a combination of the ARM and ATT methods, in which, for each post, we consider the result of ATT as correct if available; otherwise we use the output of ARM. Similarly, in RUN-4, we use the results of ATT as samples for retraining (step (b) in Figure 1) along with the agreed samples of ARM. All models built in ARM use the Random Forest implementation of WEKA [5]. Finally, RUN-5 is a plain classification method built with the full set of available features. In terms of performance (F-score is the evaluation metric of the task), RUN-3 achieved the best score, using the combination of the two methods. As RUN-1 and RUN-2 show, the presence of FOR features reduced the system's performance. Comparing RUN-3 and RUN-4, one may notice the considerable performance benefit derived from the combined use of ARM and ATT.

4. ACKNOWLEDGMENTS

This work is supported by the REVEAL project, partially funded by the European Commission (FP7-610928).

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.

Footnotes:
1 https://code.google.com/p/language-detection/
2 http://onlineslangdictionary.com/word-list/0-a/
3 http://www.languagerealm.com/spanish/spanishslang.php
4 http://nlp.stanford.edu/software/lex-parser.shtml
5 http://simple.wikipedia.org/wiki/Flesch_Reading_Ease
6 http://download.geonames.org/export/dump/cities1000.zip
7 https://www.mywot.com/
8 http://wwwranking.webdatacommons.org/more.html
9 http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger

5. REFERENCES
[1] C. Boididou, S. Papadopoulos, D.-T. Dang-Nguyen, G. Boato, and Y. Kompatsiaris. The CERTH-UNITN participation @ Verifying Multimedia Use 2015. MediaEval Benchmarking Initiative for Multimedia Evaluation (MediaEval'15), 2015.
[2] C. Boididou, S. Papadopoulos, D.-T. Dang-Nguyen, G. Boato, M. Riegler, S. E. Middleton, A. Petlund, and Y. Kompatsiaris. Verifying multimedia use at MediaEval 2016. In MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands, 2016.
[3] C. Boididou, S. Papadopoulos, Y. Kompatsiaris, S. Schifferes, and N. Newman. Challenges of computational verification in social multimedia. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pages 743-748, 2014.
[4] V. Conotter, D.-T. Dang-Nguyen, G. Boato, M. Menéndez, and M. Larson. Assessing the impact of image manipulation on users' perceptions of deception. Proceedings of SPIE - The International Society for Optical Engineering, 2014.
[5] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations, 11(1):10-18, 2009.
[6] L. Liu, B. Liu, H. Huang, and A. Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 29(8):856-863, 2014.
[7] S. E. Middleton. Extracting attributed verification and debunking reports from social media: MediaEval-2015 trust and credibility analysis of image and video. 2015.
[8] A. Tsakalidis, S. Papadopoulos, and I. Kompatsiaris. An ensemble model for cross-domain polarity classification on Twitter. In Web Information Systems Engineering - WISE 2014, pages 168-177. Springer, 2014.
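As an illustrative sketch, the agreement-based retraining control flow of Section 2.1 can be written as follows. The classifier objects and their method names (predict, cv_score, fit) are hypothetical stand-ins, not the WEKA Random Forest models actually used; the sketch only shows how the agreed/disagreed partition drives the retraining step.

```python
# Minimal sketch of ARM (agreement-based retraining). The classifier
# interface (predict/cv_score/fit) is a placeholder assumption; the paper
# uses two WEKA Random Forest models over TB/UB feature sets.

def agreement_retraining(cl1, cl2, train, test):
    """Partition test items by classifier agreement, then retrain the
    better model on train + agreed items to label the disagreed ones."""
    # Step (a): independent predictions of the two classifiers.
    pred1 = {x["id"]: cl1.predict(x) for x in test}
    pred2 = {x["id"]: cl2.predict(x) for x in test}
    agreed = [x for x in test if pred1[x["id"]] == pred2[x["id"]]]
    disagreed = [x for x in test if pred1[x["id"]] != pred2[x["id"]]]

    # Agreed predictions are assumed correct and become pseudo-labels.
    pseudo = [dict(x, label=pred1[x["id"]]) for x in agreed]

    # Step (b): keep the model with the better cross-validation score on
    # the training set; retrain it on train + pseudo-labelled samples.
    best = cl1 if cl1.cv_score(train) >= cl2.cv_score(train) else cl2
    best.fit(train + pseudo)

    labels = {x["id"]: pred1[x["id"]] for x in agreed}
    labels.update({x["id"]: best.predict(x) for x in disagreed})
    return labels
```

The key design choice is that only the disagreed subset is re-classified; agreed items keep their (assumed-correct) joint prediction and additionally enlarge the training set for the adapted model.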
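The Flesch Reading Ease score used among the TB features follows a standard published formula; a minimal sketch is shown below. The syllable counter here is a naive vowel-group heuristic of our own (an assumption, not the paper's implementation, which relies on an external method).

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Low scores mean hard to read, scores near 100 mean easy to read."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def syllables(word):
        # Naive heuristic: count vowel groups; a real system would use a
        # pronunciation dictionary or hyphenation rules instead.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syll = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)
```

Short, monosyllabic sentences score above 100, while long words drive the score far down, which is why the paper treats the value as a single readability feature rather than an absolute measure.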
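The FOR feature layout (seven descriptive statistics per region, concatenated per map) can be sketched as below. The BAG values, thresholding, and region segmentation are out of scope here, so the input lists are assumed to be the per-region values already extracted; "most frequent value" is rendered with the statistics module's mode, which is one plausible reading.

```python
# Sketch of the FOR descriptive-statistics vector (Section 1). The BAG
# value extraction and object/background segmentation are assumed done
# upstream; this only shows the 7-stats-per-region concatenation.
from statistics import mean, median, mode, pstdev, pvariance

def region_stats(values):
    """The seven descriptive statistics: max, min, mean, median,
    most frequent value, standard deviation, variance."""
    return [max(values), min(values), mean(values), median(values),
            mode(values), pstdev(values), pvariance(values)]

def map_vector(object_vals, background_vals):
    """14-dimensional vector for one map (AJPG or NAJPG): 7 stats for
    the object region followed by 7 for the background region."""
    return region_stats(object_vals) + region_stats(background_vals)
```

Two such 14-dimensional vectors (AJPG and NAJPG) plus the single image-quality score give the 29-dimensional forensics vector described in the paper.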
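The ATT cross-check rule of Section 2.2 and the RUN-3 fusion rule of Section 3 can be sketched together. The confidence encoding (a single number per claim, higher for more trusted sources) is an assumption for illustration; the paper only specifies the ordering trusted > attributed > unattributed and the "any doubt = fake" tie-break.

```python
def cross_check(claims):
    """Resolve the claims for one image URI. Each claim is a
    (label, confidence) pair with label 'fake' or 'real'; confidence is
    a hypothetical numeric encoding of source trust. Among the most
    trustworthy claims, any conflicting evidence resolves to 'fake'
    (the paper's "any doubt = fake" rule). Returns None if ATT found
    no claims for this item."""
    if not claims:
        return None
    top = max(conf for _, conf in claims)
    top_labels = {label for label, conf in claims if conf == top}
    return "fake" if "fake" in top_labels else "real"

def fuse(att_label, arm_label):
    """RUN-3 fusion: prefer the high-precision ATT output when it
    exists, otherwise fall back to the ARM prediction."""
    return att_label if att_label is not None else arm_label
```

Because ATT fires only when an attribution pattern matches, fuse() reproduces the high-precision/low-recall trade-off reported for RUN-3: ATT overrides ARM on the few posts it covers.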