MCG-ICT at MediaEval 2016: Verifying Tweets From Both
                   Text and Visual Content

                          Juan Cao1 , Zhiwei Jin1,2 , Yazi Zhang1,2 , Yongdong Zhang1
           1
               Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS),
                                 Institute of Computing Technology, CAS, Beijing, China
                              2
                                University of Chinese Academy of Sciences, Beijing, China
                                {jinzhiwei, caojuan, zhangyazi, zhyd }@ict.ac.cn

ABSTRACT
The Verifying Multimedia Use Task aims to automatically
detect manipulated and fake web multimedia content. We
have two important improvements this year: On the one
hand, considering that the prediction based on a short tweet
is unreliable, we propose a topic-level credibility prediction
framework. This framework exploits the internal relations
of tweets belonging to same topic. Besides, we enhance the
prediction precision of the framework by sampling topics and
exploring topic-level features. On the other hand, motivated
by the idea that manually edited or low-quality videos tend
to be fake, we reference the handbook[1] about detecting
manual editions and build a decision tree on videos.

1.    PROPOSED APPROACH
  We treat the task as a binary classification problem: real
or fake. Generally, a tweet contains two kinds of content:
text content and visual content. So, we build two classifi-
cation models respectively: for text content, the task pays
more attention to small events than breaking news this year.     Figure 1: The framework of the text analysis ap-
More than 59% events contain less than 10 tweets and 95%         proach
are less than 50 tweets. Compared with last year’s 42.5
tweets per event, the small event verification is more chal-
lenging. We propose a topic-level verification framework,        our data, and tweets containing the same videos/images are
and improve its performance by exploring topic-level fea-        rather independent but have strong relations with each oth-
tures and sampling on topics. For visual content, we refer-      er. More specifically, they attend to have same credibilities.
ence the handbook[1] about detecting manual editions and         In order to exploit their inner-relations, we take the tweets
build a decision tree on videos.                                 of the same video/image as a topic, and build a topic-level
  However, the task focuses only on detecting fake tweets        classifier. Compared with an independent tweet, a topic can
while we put efforts to identify on both categories. Finally,    maintain principal information and also eliminate random
we propose a method performing relatively well on both real      noise. As the primary contribution in the text analysis, the
and fake tweets. More details about the task can be found        topic-level improves the F1 value of 4% on fake tweets and
in [2].                                                          more than 8% on real tweets. Two main innovations of the
                                                                 topic-level part are as follows:
1.1   Text Analysis Approach                                        Topic-level Features Extracting: For each topic, we
   As illustrated in Figure 1, the framework of our text anal-   compute the average of its tweets’ features as its features.
ysis approach consists of three parts: a message-level clas-     Besides, we propose several statistic features which are listed
sifier, a topic-level classifier and a fusing part. Like many    in Table 1. Combining the two kinds of features above, we
traditional text analysis methods, we firstly build a message-   finally get the whole topic-level features. It turns out that
level classifier based on the given content features and user    these statistic features are quite effective for identifying fake
features. However, a tweet is very short (no more than 140       tweets. They boost the topic-level classifier’s F1 value on
words) and its meaning is incomplete. The credibility pre-       fake tweets by more than 14%.
diction on message-level is unreliable.                             Topics Sampling: In our dataset, More than 59% top-
   We observe that each tweet contains videos/images in          ics contain less than 10 tweets and 95% are less than 50
                                                                 tweets, which means there are quite a few small topics. To
                                                                 remove the noise brought by these small topics, we sample
Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Nether-    topics with high confidence in a 10-fold cross validation pro-
lands.                                                           cess. The sampling keeps the balance between fake and real
    Table 1: Topic Layer New Statistic Features
 Feature                Explanation
 num tweets             the number of tweets
 num distinct tweets/   the number of distinct tweets
 hashtags               /hashtags
 distinct tweets index  the ratio of distinct tweets
                        the ratio of tweets containing
 contain url/mention
                        urls/metions(namely, @)
                        the ratio of tweets containing
 contain urls/mentions/
                        multiple urls/mentions
 hashtags/questionmarks
                        /hashtags/questionmarks


topics. By this technology, we largely improve the model’s
performance on real tweets.
   The topic-level classifier classifies each topic and gives a
corresponding probability value indicating how likely it is to            Figure 2: The video classification decision tree.
be fake. In the fusing part, this probability, as the topic-
level result, is added to its tweets’ message-level features.
The final classifier is built on the fused features above.           quite variations. Ignoring this video, our model reaches a
                                                                     F1 score of 0.918 on real tweets and 0.763 on fake tweets.
1.2    Visual Analysis Approach
   In the testset, nearly half tweets contain videos while the       2.     RESULTS AND DISCUSSION
other half contains images. Observing this, we build two               We submitted 3 results which are listed in Table 2. Run 1
visual classifiers respectively: for those tweets containing         only uses text analysis approach while Run 2 only uses visual
images, we propose the given 7 types of forensics features           analysis approach. Run 3 is a hybrid of text and visual ap-
[3] [4] [5] [6] to build a image classifier model. For tweet-        proach: if a sample of testing tweet contains videos/images,
s containing videos, we build a decision tree which is the           we use the visual model to classify it, otherwise we choose
primary innovation of the visual analysis approach. Details          the text model.
about the tree are as follows:
   The basic principle is that low-quality and manually edit-
ed videos(except the professional-edited news videos) are                  Table 2: Topic Layer New Statistic Features
more likely to be fake. To detect manual editions, we ref-                            Recall Precision F1-Score
erence the handbook[1] written by experienced journalists                     Run 1    0.629    0.747      0.683
on recognizing manipulated videos, and summarize several                      Run 2    0.514    0.698      0.592
features indicating whether the video is edited. The features                 Run 3    0.610    0.764      0.678
include logos, video length, video size, shot number, resolu-
tion ratio, contrast ratio. Basing on these features, we build
                                                                        From the results we can observe that: (1)Both two models
a decision tree which is illustrated in Figure 2.
                                                                     reaches very promising results. (2) The text model is more
   The tree is intuitive. For a video containing logos, if it
                                                                     effective than the visual model. We assume it’s probably
has a high quality, it’s judged as professional-edited and the
                                                                     because of lacking sufficient videos and our video model is
label is real; if it has a low quality, its label depends on
                                                                     under-fitting. But the idea to detect manual editions in
the length: if it’s long, it’s more likely to be the kind of
                                                                     videos inspires us to explore more videos to validate and
videos which is produced by original people and edited by
                                                                     improve our model in the future.
professional journalists. So, the label is real, or otherwise it’s
fake. For a video lacking logos, if it has many shots which
also suggests manual editions, it’s fake. Otherwise it’s real.       3.     ACKNOWLEDGMENTS
More details about these features are as follows:                      This work was supported by National Nature Science Foun-
   Logo Detecting: The basic idea to detect logos is that            dation of China (61571424, 61172153) and the National High
they are invariant compared with other parts of videos: we           Technology Research and Development Program of China
divide videos into frames, and detect color-fixed pixels in          (2014AA015202).
each frame. If a certain area’s pixels keeps unchange for
most frames, it is determined as a logo. To reduce random
errors caused by steady dispersed pixels like short lines, we
                                                                     4.     REFERENCES
perform a median filter and we only keep logos that pass the         [1] Craig Silverman. Verification Handbook: An Ultimate
filter.                                                                  Guideline on Digital Age Sourcing for Emergency
   Quality: We use the average value of video size, resolu-              Coverage. Number 121. The European Journalism
tion ratio and contrast ratio to represent a video’s quality.            Centre, 2014.
   Our video classification model reaches a F1 score of 0.702        [2] Christina Boididou, Symeon Papadopoulos, Duc-Tien
on real tweets and 0.429 on fake tweets. However, there’s                Dang-Nguyen, Giulia Boato, Michael Riegler, StuartE.
a super video which contains 334 tweets while the other 25               Middleton, Andreas Petlund, and Yiannis
videos contains only 777 in total. This super video brings               Kompatsiaris. Verifying multimedia use at mediaeval
    2016. In Proceedings of the MediaEval 2016 Workshop,
    Hilversum, Netherlands,Oct. 20-21, 2016.
[3] C.Pasquini, F.Perez-Gonzalez, and G. Boato. A
    benford-fourier jpeg compression detector. In IEEE The
    International Conference on Image Processing, pp,
    pages 5322–5326. IEEE, 2014.
[4] T.Bianchi and A.Piva. Image forgery localization via
    block-grained analysis of jpeg artifacts. In IEEE
    Transactions on Information Forensics and Security,
    vol. 7, no. 3, 2012., pages 1003–1017. IEEE, 2012.
[5] M.Goljan, J.Fridrich, and M.Chen. Defending against
    fingerprint-copy attack in sensor-based camera
    identification. In IEEE Transactions on Information
    Security and Forensics, vol. 6, no. 1, 2010., pages
    227–236. IEEE, 2010.
[6] W.Li, Y.Yuan, and N.Yu. Passive detection of doctored
    jpeg image via block artifact grid extraction. In ACM
    Signal Processing, vol. 89, no. 9, 2009., pages
    1821–1829. IEEE, 2009.