Extracting Attributed Verification and Debunking Reports from Social Media: MediaEval-2015 Trust and Credibility Analysis of Image and Video

Stuart E. Middleton
University of Southampton IT Innovation Centre
Southampton, UK
sem@it-innovation.soton.ac.uk

ABSTRACT
Journalists are increasingly turning to technology for pre-filtering and automation of the simpler parts of the verification process. We present results from our semi-automated approach to trust and credibility analysis of tweets referencing suspicious images and videos. We use natural language processing to extract evidence from tweets in the form of fake and genuine claims attributed to trusted and untrusted sources. Results for team UoS-ITI in the MediaEval 2015 Verifying Multimedia Use task are reported. Our 'fake' tweet classifier precision scores range from 0.94 to 1.0 (recall 0.43 to 0.72), and our 'real' tweet classifier precision scores range from 0.74 to 0.78 (recall 0.51 to 0.74). Image classification precision scores range from 0.62 to 1.0 (recall 0.04 to 0.23). Our approach can automatically alert journalists in real-time to trustworthy claims verifying or debunking viral images or videos.

Copyright is held by the author/owner(s).
MediaEval-2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.

1. INTRODUCTION
Content from social media sites such as Twitter, YouTube, Facebook and Instagram is becoming an important part of modern journalism. Of particular importance to real-time breaking news are amateur on-the-spot incident reports and eyewitness images and videos. With breaking news having tight reporting deadlines, measured in minutes not days, the need to quickly verify suspicious content is paramount [5] [7]. Journalists are increasingly looking to pre-filter and automate the simpler parts of the verification process.

Current tools available to journalists can be broadly categorized as dashboard and in-depth analytic tools. Dashboard tools display filtered traffic volumes, trending hashtags and maps of content by topic, author and/or location. In-depth analysis tools use techniques such as sentiment analysis, social network graph visualization and topic tracking. These tools help journalists manage social media content, but unverified rumours and fake news stories on social media are becoming both increasingly common [6] and increasingly difficult to spot. The current best practice for journalistic user generated content (UGC) verification [5] follows a hard-to-scale manual process: journalists review content from trusted sources, with the ultimate goal of phoning up authors to verify specific images/videos and then asking permission to use that content for publication.

In the REVEAL project we are developing ways to automate the simpler verification steps, empowering journalists and helping them to focus on the cross-checking tasks that most need human expertise. We are creating a trust and credibility model able to process real-time evidence extracted using a combination of natural language processing, image analysis, social network analysis and semantic analysis. This paper describes our work on text analysis, extracting and processing fake and genuine claims from tweets referencing suspicious images and videos. Our central hypothesis is that the 'wisdom of the crowd' is not really wisdom at all when it comes to verifying suspicious images and videos. Instead it is better to rank evidence from Twitter according to the most trusted and credible sources, in a way similar to human journalists. We describe a semi-automated approach, automatically extracting claims about real or fake content and their source attributions and comparing them to a manually created list of trusted sources. A cross-checking step ranks conflicting claims and selects the most trustworthy evidence on which to base a final fake/real decision.

Figure 1: Verification Linguistic Patterns. These patterns are encoded as regex patterns matching on both phrases in content and their associated POS tags (e.g. NN = noun, NNP = proper noun).
[Figure 1 lists four regex pattern families with example matches:
Named Entity Patterns, e.g. "@ (NNP|NN)" matching CNN; "# (NNP|NN)" matching BBC News; "(NNP|NN) (NNP|NN)" matching @bbcnews.
Attribution Patterns, e.g. "... FBI has released prime suspect photos ...", "... pic - BBC News", "... image released via CNN ...", "... RT: BBC News".
Faked Patterns, e.g. "... what a fake! ...", "... is it real? ...", "... thats not real ...".
Genuine Patterns, e.g. "... this image is totally genuine ...", "... its real ...".
Key: placeholder classes used in the patterns include named entities (e.g. trusted sources), RT variants (e.g. RT, MT), image variants (e.g. pic, image, video), separator variants (e.g. : - =), from variants (e.g. via, from, attributed), is | its | thats, real variants (e.g. real, genuine) and negative variants (e.g. not, isn't).]

2. APPROACH
Our trust and credibility model is based on a classic natural language processing pipeline involving tokenization, Parts of Speech (POS) tagging, named entity recognition and relational extraction. The innovation in our approach lies with our choice of regex patterns, which are modelled on how journalists verify fake and genuine claims by looking at the source attribution for each claim. This allows us to provide a novel conflict resolution approach based on ranking claims in order of trustworthiness. We use the Python NLTK toolkit [1], weak stemming, the Punkt sentence tokenizer and the TreeTagger POS tagger. To extract fake and genuine claims we use a set of regex patterns (see Figure 1) matching both terms and POS tags. To discover attribution we use a combination of named entity matching and regex patterns.
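The following minimal Python sketch illustrates the shape of this extraction step. It is illustrative only, not the code used in our runs: it substitutes NLTK's default POS tagger for TreeTagger, and the pattern strings and function names are simplified assumptions based on the examples in Figure 1.

    import re
    import nltk

    # Illustrative fake/genuine claim patterns in the spirit of Figure 1;
    # the real pattern set also matches POS tags and wildcard spans.
    FAKED_PATTERNS = [
        re.compile(r"\bwhat a fake\b"),
        re.compile(r"\bis it real\?"),
        re.compile(r"\b(?:is|its|it's|thats|that's)\s+not\s+(?:real|genuine)\b"),
    ]
    GENUINE_PATTERNS = [
        re.compile(r"\bthis (?:pic|image|video) is (?:totally )?(?:real|genuine)\b"),
        re.compile(r"\b(?:its|it's|thats|that's)\s+(?:real|genuine)\b"),
    ]

    def extract_claim(tweet_text):
        """Return 'fake', 'genuine' or None, plus named entity candidates."""
        tokens = nltk.word_tokenize(tweet_text)
        tagged = nltk.pos_tag(tokens)  # stand-in for the TreeTagger POS tagger
        # Candidate named entities: @mentions, #hashtags and proper noun tokens,
        # loosely following the Named Entity Patterns of Figure 1.
        entities = re.findall(r"[@#]\w+", tweet_text)
        entities += [tok for tok, tag in tagged if tag in ("NNP", "NNPS")]
        text = tweet_text.lower()
        if any(p.search(text) for p in FAKED_PATTERNS):
            return "fake", entities
        if any(p.search(text) for p in GENUINE_PATTERNS):
            return "genuine", entities
        return None, entities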
Our semi-automated approach to named entity matching is based on a list of a priori known trusted and untrusted sources. We can either learn an entity list automatically using information-theoretic weightings (i.e. TF-IDF) or create a list manually (i.e. using a journalist's trusted source list). All news providers maintain long lists of trusted sources for different regions around the world, so this information is readily available. For the MediaEval 2015 Verifying Multimedia Use task we created a list of candidate named entities by first running the regex patterns on the dataset. We then manually checked each entity via Google search (e.g. looking at Twitter profile pages). We removed any named entities which we considered a journalist would not have in a list of trusted or untrusted sources. We kept news organizations, respected journalists and well-cited bloggers and experts. Creating these lists took under two hours (570 named entities checked, 60 accepted).

We chose our regex patterns based on the frequency of text patterns for source attribution, fake claims and genuine claims in the MediaEval-2015 devset. Other researchers have published linguistic patterns used to detect rumours [3] [8] [4], but our combination of fake/genuine claims and source attribution is novel, using insights from the well-established journalistic verification processes for User Generated Content (UGC).
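To make the attribution step concrete, here is a minimal sketch, again illustrative rather than our actual implementation: the attribution cue regex, the example list entries and the trust levels returned are all assumptions.

    import re

    # Hypothetical example entries; the real lists were manually curated
    # (570 candidate entities checked, 60 accepted).
    TRUSTED_SOURCES = {"cnn", "bbc news", "@bbcnews", "fbi"}
    UNTRUSTED_SOURCES = {"@example_hoaxer"}

    # Rough stand-in for the Attribution Patterns of Figure 1
    # (e.g. "... image released via CNN ...", "... RT: BBC News").
    ATTRIBUTION_CUE = re.compile(r"\b(?:via|from|released by|attributed to|rt)\b")

    def attribute_claim(tweet_text, author_screen_name):
        """Classify a claim's source: trusted author, trusted or untrusted
        attribution, or unattributed."""
        if author_screen_name.lower() in TRUSTED_SOURCES:
            return "trusted_author"
        text = tweet_text.lower()
        if ATTRIBUTION_CUE.search(text):
            if any(source in text for source in TRUSTED_SOURCES):
                return "trusted_attribution"
            if any(source in text for source in UNTRUSTED_SOURCES):
                return "untrusted_attribution"
        return "unattributed"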
We assign a confidence value to each matched pattern based on its source trustworthiness level. Evidence from trusted authors is more trusted than evidence attributed to trusted authors, which in turn is more trusted than other, unattributed evidence. In a cross-check step we choose the most trustworthy claims to use for each image URI. If there is evidence for both a fake and a genuine claim with equal confidence we assume it is fake (i.e. any doubt = fake).
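The following sketch illustrates this cross-check logic. The numeric confidence values and names are illustrative assumptions, but the ranking order and the 'any doubt = fake' tie-break follow the description above.

    # Illustrative confidence levels for the source trustworthiness ordering:
    # trusted author > trusted attribution > unattributed evidence. Giving
    # untrusted attributions zero weight is an assumption of this sketch.
    CONFIDENCE = {
        "trusted_author": 3,
        "trusted_attribution": 2,
        "unattributed": 1,
        "untrusted_attribution": 0,
    }

    def cross_check(evidence):
        """Decide fake/genuine/unknown for one image URI.

        evidence: list of (claim, source_level) pairs pooled from all tweets
        referring to the image, e.g. [("fake", "trusted_author"), ...].
        """
        best_fake = max((CONFIDENCE[s] for c, s in evidence if c == "fake"), default=0)
        best_genuine = max((CONFIDENCE[s] for c, s in evidence if c == "genuine"), default=0)
        if best_fake == 0 and best_genuine == 0:
            return "unknown"
        # Equal confidence for conflicting claims -> assume fake (any doubt = fake).
        return "genuine" if best_genuine > best_fake else "fake"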
3. RESULTS
The MediaEval 2015 Verifying Multimedia Use task is to classify tweets about images and videos as real, fake or unknown. Details of the task datasets, ground truth and evaluation methodology can be found in [2]. Results in Table 1 and Table 2 show fake and real classification performance for the devset; Table 3 and Table 4 show the testset. Journalists ultimately want to find verified genuine content that they can use in breaking news stories. As such, whilst the MediaEval-2015 Verifying Multimedia Use task is focused on classifying fake content, we also report results for the harder problem of classifying real content. We report image classification accuracy as well as the classification accuracy of tweets referring to these images.

Our first, fully automated run used the 'faked & genuine' regex patterns applied to each tweet independently, without lists of trusted sources. The second, semi-automated run used in addition the source attribution regex patterns, matching attributed named entities to a manually created list of trusted and untrusted sources. The final semi-automated run added the cross-check step, making a decision not on the basis of each tweet alone but rather using the most trustworthy evidence available after cross-checking all tweets referring to a specific image or video. This final approach is the most realistic one for our journalistic use case; eyewitness images and videos going viral during a breaking news story will typically have hundreds of comments on Twitter before journalists discover them and attempt verification.

Table 1: Fake and Real Tweet Classification for Devset

                                                    fake classification    real classification
  Run                                               P     R      F1        P     R       F1
  faked & genuine patterns                          0.89  0.007  0.01      1.0   0.0007  0.001
  faked & genuine & attribution patterns            0.89  0.007  0.01      0.99  0.05    0.11
  faked & genuine & attribution patterns
    & cross-check                                   0.94  0.43   0.59      0.78  0.51    0.61

Table 2: Fake and Real Image Classification for Devset

                                                    fake classification    real classification
  Run                                               P     R      F1        P     R       F1
  faked & genuine & attribution patterns
    & cross-check                                   0.96  0.10   0.19      0.95  0.19    0.32

Table 3: Fake and Real Tweet Classification for Testset

                                                    fake classification    real classification
  Run                                               P     R      F1        P     R       F1
  faked & genuine patterns (run-1)                  1.0   0.03   0.06      0.75  0.001   0.003
  faked & genuine & attribution patterns (run-3)    1.0   0.03   0.06      0.43  0.03    0.06
  faked & genuine & attribution patterns
    & cross-check (run-4)                           1.0   0.72   0.83      0.74  0.74    0.74

Table 4: Fake and Real Image Classification for Testset

                                                    fake classification    real classification
  Run                                               P     R      F1        P     R       F1
  faked & genuine & attribution patterns
    & cross-check                                   1.0   0.04   0.09      0.62  0.23    0.33

4. CONCLUSION
When it comes to verifying claims about suspicious images and videos, our hypothesis is that the 'wisdom of the crowd' is not really wisdom at all, and that it is better to rank evidence from Twitter in order of the most trusted and credible sources. We have developed a semi-automated trust and credibility model based on this intuition and on well-known journalistic verification principles.

When applied to classifying tweets in isolation, our approach has high precision but very low recall, making it of limited value. When we cross-check tweets, ranking by trustworthiness and picking only the most trusted claims, our approach is much more useful, with high precision (0.94+) and moderate recall (0.43+). The ultimate goal, of course, is to classify the images themselves as fake (including use of an image in the wrong context) or real, not just the tweets that refer to them. Our classifier was able to classify 4-10% of fake images, getting it right 96-100% of the time. For the harder problem of classifying real images our approach was able to classify 19-23% of images, getting it right 62-95% of the time.

In the context of journalistic verification these results are promising. Given enough tweeted claims about an image or video, we can rank the most trustworthy and provide a highly accurate classification result. This means that once images and videos, such as eyewitness content, go viral on Twitter we will be able to provide a real-time view of their verification status. Our approach does not replace manual verification techniques - someone still needs to actually verify the content - but it can rapidly alert journalists to trustworthy reports of verification and/or debunking. This in turn should speed up the verification cycle and allow the 'time to publish' to be shortened.

5. ACKNOWLEDGEMENTS
This work is part of the research and development in the REVEAL project (grant agreement 610928), supported by the 7th Framework Programme of the European Commission. The authors would like to thank journalists at Deutsche Welle for their valuable insights into the journalistic verification process.

6. REFERENCES
[1] Bird, S., Klein, E., Loper, E. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.
[2] Boididou, C., Andreadou, K., Papadopoulos, S., Dang-Nguyen, D., Boato, G., Riegler, M., Kompatsiaris, Y. 2015. Verifying Multimedia Use at MediaEval 2015. In Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany.
[3] Boididou, C., Papadopoulos, S., Kompatsiaris, Y., Schifferes, S., Newman, N. 2014. Challenges of Computational Verification in Social Multimedia. In Proceedings of the 23rd International Conference on World Wide Web (WWW '14 Companion). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland.
[4] Carton, S., Park, S., Zeffer, N., Adar, E., Mei, Q., Resnick, P. 2015. Audience Analysis for Competing Memes in Social Media. In Proceedings of the Ninth International AAAI Conference on Web and Social Media (ICWSM-15), Oxford, UK.
[5] Silverman, C. (Ed.). 2013. Verification Handbook. European Journalism Centre.
[6] Silverman, C. 2015. Lies, Damn Lies, and Viral Content: How News Websites Spread (and Debunk) Online Rumors, Unverified Claims, and Misinformation. Tow Center for Digital Journalism, Columbia Journalism School.
[7] Spangenberg, J., Heise, N. 2014. News from the Crowd: Grassroots and Collaborative Journalism in the Digital Age. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web (WWW 2014 Companion), Seoul, Korea, 765-768.
[8] Zhao, Z., Resnick, P., Mei, Q. 2015. Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts. In Proceedings of the 24th International Conference on World Wide Web (WWW 2015), Florence, Italy.