MCG-ICT at MediaEval 2016: Verifying Tweets From Both Text and Visual Content Juan Cao1 , Zhiwei Jin1,2 , Yazi Zhang1,2 , Yongdong Zhang1 1 Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 2 University of Chinese Academy of Sciences, Beijing, China {jinzhiwei, caojuan, zhangyazi, zhyd }@ict.ac.cn ABSTRACT The Verifying Multimedia Use Task aims to automatically detect manipulated and fake web multimedia content. We have two important improvements this year: On the one hand, considering that the prediction based on a short tweet is unreliable, we propose a topic-level credibility prediction framework. This framework exploits the internal relations of tweets belonging to same topic. Besides, we enhance the prediction precision of the framework by sampling topics and exploring topic-level features. On the other hand, motivated by the idea that manually edited or low-quality videos tend to be fake, we reference the handbook[1] about detecting manual editions and build a decision tree on videos. 1. PROPOSED APPROACH We treat the task as a binary classification problem: real or fake. Generally, a tweet contains two kinds of content: text content and visual content. So, we build two classifi- cation models respectively: for text content, the task pays more attention to small events than breaking news this year. Figure 1: The framework of the text analysis ap- More than 59% events contain less than 10 tweets and 95% proach are less than 50 tweets. Compared with last year’s 42.5 tweets per event, the small event verification is more chal- lenging. We propose a topic-level verification framework, our data, and tweets containing the same videos/images are and improve its performance by exploring topic-level fea- rather independent but have strong relations with each oth- tures and sampling on topics. For visual content, we refer- er. More specifically, they attend to have same credibilities. ence the handbook[1] about detecting manual editions and In order to exploit their inner-relations, we take the tweets build a decision tree on videos. of the same video/image as a topic, and build a topic-level However, the task focuses only on detecting fake tweets classifier. Compared with an independent tweet, a topic can while we put efforts to identify on both categories. Finally, maintain principal information and also eliminate random we propose a method performing relatively well on both real noise. As the primary contribution in the text analysis, the and fake tweets. More details about the task can be found topic-level improves the F1 value of 4% on fake tweets and in [2]. more than 8% on real tweets. Two main innovations of the topic-level part are as follows: 1.1 Text Analysis Approach Topic-level Features Extracting: For each topic, we As illustrated in Figure 1, the framework of our text anal- compute the average of its tweets’ features as its features. ysis approach consists of three parts: a message-level clas- Besides, we propose several statistic features which are listed sifier, a topic-level classifier and a fusing part. Like many in Table 1. Combining the two kinds of features above, we traditional text analysis methods, we firstly build a message- finally get the whole topic-level features. It turns out that level classifier based on the given content features and user these statistic features are quite effective for identifying fake features. However, a tweet is very short (no more than 140 tweets. They boost the topic-level classifier’s F1 value on words) and its meaning is incomplete. The credibility pre- fake tweets by more than 14%. diction on message-level is unreliable. Topics Sampling: In our dataset, More than 59% top- We observe that each tweet contains videos/images in ics contain less than 10 tweets and 95% are less than 50 tweets, which means there are quite a few small topics. To remove the noise brought by these small topics, we sample Copyright is held by the author/owner(s). MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Nether- topics with high confidence in a 10-fold cross validation pro- lands. cess. The sampling keeps the balance between fake and real Table 1: Topic Layer New Statistic Features Feature Explanation num tweets the number of tweets num distinct tweets/ the number of distinct tweets hashtags /hashtags distinct tweets index the ratio of distinct tweets the ratio of tweets containing contain url/mention urls/metions(namely, @) the ratio of tweets containing contain urls/mentions/ multiple urls/mentions hashtags/questionmarks /hashtags/questionmarks topics. By this technology, we largely improve the model’s performance on real tweets. The topic-level classifier classifies each topic and gives a corresponding probability value indicating how likely it is to Figure 2: The video classification decision tree. be fake. In the fusing part, this probability, as the topic- level result, is added to its tweets’ message-level features. The final classifier is built on the fused features above. quite variations. Ignoring this video, our model reaches a F1 score of 0.918 on real tweets and 0.763 on fake tweets. 1.2 Visual Analysis Approach In the testset, nearly half tweets contain videos while the 2. RESULTS AND DISCUSSION other half contains images. Observing this, we build two We submitted 3 results which are listed in Table 2. Run 1 visual classifiers respectively: for those tweets containing only uses text analysis approach while Run 2 only uses visual images, we propose the given 7 types of forensics features analysis approach. Run 3 is a hybrid of text and visual ap- [3] [4] [5] [6] to build a image classifier model. For tweet- proach: if a sample of testing tweet contains videos/images, s containing videos, we build a decision tree which is the we use the visual model to classify it, otherwise we choose primary innovation of the visual analysis approach. Details the text model. about the tree are as follows: The basic principle is that low-quality and manually edit- ed videos(except the professional-edited news videos) are Table 2: Topic Layer New Statistic Features more likely to be fake. To detect manual editions, we ref- Recall Precision F1-Score erence the handbook[1] written by experienced journalists Run 1 0.629 0.747 0.683 on recognizing manipulated videos, and summarize several Run 2 0.514 0.698 0.592 features indicating whether the video is edited. The features Run 3 0.610 0.764 0.678 include logos, video length, video size, shot number, resolu- tion ratio, contrast ratio. Basing on these features, we build From the results we can observe that: (1)Both two models a decision tree which is illustrated in Figure 2. reaches very promising results. (2) The text model is more The tree is intuitive. For a video containing logos, if it effective than the visual model. We assume it’s probably has a high quality, it’s judged as professional-edited and the because of lacking sufficient videos and our video model is label is real; if it has a low quality, its label depends on under-fitting. But the idea to detect manual editions in the length: if it’s long, it’s more likely to be the kind of videos inspires us to explore more videos to validate and videos which is produced by original people and edited by improve our model in the future. professional journalists. So, the label is real, or otherwise it’s fake. For a video lacking logos, if it has many shots which also suggests manual editions, it’s fake. Otherwise it’s real. 3. ACKNOWLEDGMENTS More details about these features are as follows: This work was supported by National Nature Science Foun- Logo Detecting: The basic idea to detect logos is that dation of China (61571424, 61172153) and the National High they are invariant compared with other parts of videos: we Technology Research and Development Program of China divide videos into frames, and detect color-fixed pixels in (2014AA015202). each frame. If a certain area’s pixels keeps unchange for most frames, it is determined as a logo. To reduce random errors caused by steady dispersed pixels like short lines, we 4. REFERENCES perform a median filter and we only keep logos that pass the [1] Craig Silverman. Verification Handbook: An Ultimate filter. Guideline on Digital Age Sourcing for Emergency Quality: We use the average value of video size, resolu- Coverage. Number 121. The European Journalism tion ratio and contrast ratio to represent a video’s quality. Centre, 2014. Our video classification model reaches a F1 score of 0.702 [2] Christina Boididou, Symeon Papadopoulos, Duc-Tien on real tweets and 0.429 on fake tweets. However, there’s Dang-Nguyen, Giulia Boato, Michael Riegler, StuartE. a super video which contains 334 tweets while the other 25 Middleton, Andreas Petlund, and Yiannis videos contains only 777 in total. This super video brings Kompatsiaris. Verifying multimedia use at mediaeval 2016. In Proceedings of the MediaEval 2016 Workshop, Hilversum, Netherlands,Oct. 20-21, 2016. [3] C.Pasquini, F.Perez-Gonzalez, and G. Boato. A benford-fourier jpeg compression detector. In IEEE The International Conference on Image Processing, pp, pages 5322–5326. IEEE, 2014. [4] T.Bianchi and A.Piva. Image forgery localization via block-grained analysis of jpeg artifacts. In IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, 2012., pages 1003–1017. IEEE, 2012. [5] M.Goljan, J.Fridrich, and M.Chen. Defending against fingerprint-copy attack in sensor-based camera identification. In IEEE Transactions on Information Security and Forensics, vol. 6, no. 1, 2010., pages 227–236. IEEE, 2010. [6] W.Li, Y.Yuan, and N.Yu. Passive detection of doctored jpeg image via block artifact grid extraction. In ACM Signal Processing, vol. 89, no. 9, 2009., pages 1821–1829. IEEE, 2009.