FakeNews Detection Using Pre-trained Language Models and Graph Convolutional Networks

Nguyen Manh Duc Tuan
Toyo University, Japan
ductuan024@gmail.com

Pham Quang Nhat Minh
Aimesoft JSC., Vietnam
minhpham@aimesoft.com

ABSTRACT
We introduce methods for detecting FakeNews related to coronavirus and the 5G conspiracy based on textual data and graph data. For the Text-Based Fake News Detection subtask, we proposed a neural network that combines textual features encoded by a pre-trained BERT model with metadata of tweets encoded by a multi-layer perceptron model. For the Structure-Based Fake News Detection subtask, we applied Graph Convolutional Networks (GCN) and proposed features for each node of the GCN. Experimental results show that textual data contains more useful information for detecting FakeNews than graph data, and that using metadata of tweets improves the results of the text-based model.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'20, December 14-15 2020, Online

1 INTRODUCTION
In this paper, we present our methods for two subtasks of the FakeNews Detection Task at MediaEval 2020 [9, 10]. We formalize FakeNews detection as a classification problem. For the text-based subtask, we applied the BERT model [2], which has been shown to be effective in many NLP tasks, including text classification. Specifically, we used Covid-Twitter-BERT [8] (CT-BERT), which was trained on a corpus of 160M tweets about the coronavirus. The data used to train CT-BERT comes from the same domain as the data provided for the FakeNews detection task, so we expect to obtain better results with CT-BERT than with general BERT models trained on open-domain data. We combined metadata-based features with the textual features obtained by CT-BERT and fine-tuned CT-BERT on our task-specific data. Experimental results show that combining metadata with textual features is better than using textual features only. For the structure-based subtask, we adopted Graph Convolutional Networks (GCN) [12] to capture the relations between nodes in retweet graphs.

2 RELATED WORK
One approach to fake news detection uses the content of the news. Content-based features are extracted from textual and visual aspects. Textual information can be extracted by layers of a CNN [4]. From textual information, we can observe features that are specific to fake news, such as writing style or emotions [3, 13, 16]. Furthermore, textual and visual information can be used together to detect fake news [5, 16, 17].

We can also use social network information to detect fake news by analyzing user-based features and network-based features. User-based features are extracted from users' profiles [7, 11]. Network-based features can be extracted from the propagation of posts or tweets on the graph [18].

3 APPROACH
In this section, we describe our methods for the two subtasks: text-based misinformation detection and structure-based misinformation detection.

3.1 Text-Based Misinformation Detection

[Figure 1: Text-based Fake News Detection Model.]

Since tweet data is very noisy, we performed the following pre-processing steps before feeding the data into the CT-BERT model.
• We deleted mentions and emojis with tweet-preprocessor, a pre-processing library for tweet data.
• We lowercased all words.
• We changed emoticons written in text form, such as ":)" and ":(", into the sentiment words "happy" or "sad".
• We deleted punctuation characters that are not useful, such as ";", ":", "-", and "=".
• We performed tokenization, word normalization, and word segmentation with ekphrasis [1], a text analysis tool for social media.
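For illustration, the pre-processing steps above can be approximated with a few standard-library rules. This is a simplified sketch: the actual pipeline uses tweet-preprocessor and ekphrasis, and the exact patterns below are our assumptions.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Approximate the pre-processing steps with stdlib regexes
    (a stand-in for the tweet-preprocessor / ekphrasis pipeline)."""
    text = re.sub(r"@\w+", "", text)          # delete @mentions
    text = text.lower()                        # lowercase all words
    text = text.replace(":)", " happy ")       # map text emoticons to sentiment words
    text = text.replace(":(", " sad ")
    text = re.sub(r"[;:\-=]", "", text)        # drop uninformative punctuation
    return re.sub(r"\s+", " ", text).strip()   # normalize whitespace

print(preprocess_tweet("Check this :) @user 5G causes covid - really?"))
```

A real system would keep tweet-preprocessor for mention/emoji removal and ekphrasis for tokenization and word normalization; the sketch only mirrors the order of the listed steps.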
The FakeNews detection data is unbalanced: the number of tweets labeled as conspiracy is much smaller than the number labeled as non-conspiracy. Therefore, we balanced the dataset with the Easy Data Augmentation (EDA) method [14].

The pre-processed and augmented data was then fed into neural networks. In our work, we conducted experiments with the following two models.

In the first model, we simply passed the tweet text into CT-BERT and used the hidden vector at the [CLS] token as the representation of the tweet. The hidden state at [CLS] is then fed into a sigmoid layer for 2-class classification or a softmax layer for 3-class classification.

In the second model, we combined text-based features with metadata-based features in the neural network shown in Figure 1. First, we obtain the embedding vectors of the tweet text using CT-BERT. We then applied a 1D-CNN [6] with different filter sizes, which lets us use information from various sources for prediction. We passed the metadata-based features through a fully-connected layer with batch normalization. Finally, we concatenated the metadata features with all outputs of the 1D-CNN and passed them into a sigmoid layer for 2-class classification or a softmax layer for 3-class classification. In addition to the provided metadata, we extracted other features: the number of retweets, favorites, characters, words, question marks, hashtags, mentions, and URLs in the tweet; the posting time of the tweet; and a binary feature indicating whether or not it is a sensitive tweet. From users' profiles, we extracted the number of friends, followers, groups, favorites, and statuses the user has posted. We also used the account creation time, whether or not the user's profile has been edited, and whether or not the account is verified. In total, we extracted 22 features, including the metadata features. In our experiments, we used the implementation of BERT in the HuggingFace Transformers library [15].

3.2 Structure-Based Misinformation Detection
We applied Graph Convolutional Networks (GCN) [12] to the structure-based subtask. The model applies a traditional GCN to the first-order proximity matrix and the second-order proximity matrix. The first-order proximity matrix is created by adding edges to the original adjacency matrix in order to turn the directed graph into an undirected one. The second-order proximity matrix also represents an undirected graph and is created by taking into account the shared neighbors of each pair of nodes.

We passed the three resulting graphs into two GCN layers with a filter size of 64. After that, we concatenated the three output graphs horizontally and applied global max pooling to obtain the embedding vector of the entire graph. Finally, we passed this vector into a fully-connected layer of 512 nodes with dropout, followed by a sigmoid layer for 2-class classification or a softmax layer for 3-class classification.

For each node of the input graph, we created nine features using the networkx library¹: PageRank, in-degree, out-degree, hub and authority scores, betweenness centrality, closeness centrality, number of triangles, and eigenvector centrality. For the first run, we used only these nine extracted features as node features. For the second run, we also included the provided metadata features as node features.

4 RESULTS AND ANALYSIS

4.1 Text-Based Misinformation Detection
We submitted two runs for each of the two-class and three-class classifiers.
• Run-1: We used the first model presented in Section 3.1 to generate results.
• Run-2: We used the second model presented in Section 3.1.

Table 1: Evaluation Results for Text-based Subtask

  Run                              2-class   3-class
  Run-1: Tweet only                0.361     0.412
  Run-2: Tweet + other features    0.396     0.419

Table 1 shows the results of our submitted runs. For the first run, with tweets only, we obtained a Matthews correlation coefficient (MCC) of 0.361 for 2-class classification and 0.412 for 3-class classification. In the second run, using tweets and other features, we obtained an MCC of 0.396 and 0.419 for 2-class and 3-class classification, respectively.

4.2 Structure-Based Misinformation Detection
We submitted two runs for the structure-based subtask.
• Run-1: We used the 9 extracted features as node features in the graphs.
• Run-2: We included metadata-based features along with the 9 extracted features as node features.

Table 2: Evaluation Results for Structure-based Subtask

  Run                                     2-class   3-class
  Graph + extracted features              0.151     0.088
  Graph + metadata + extracted features   -0.081    0.151

Table 2 shows the results for the two runs. For the first run, we obtained an MCC of 0.151 for 2-class classification and 0.088 for 3-class classification. For the second run, we obtained an MCC of -0.081 for 2-class classification and 0.151 for 3-class classification. We can see that the metadata-based features did not show any benefit in our GCN model. On the development set, however, the second run was better than the first in 2-class classification, with an MCC of 0.30 and 0.31, respectively. The performance gap might be due to the way we standardized the features or split the training data.

5 CONCLUSIONS AND FUTURE WORK
We have presented our proposed methods for the two subtasks of the MediaEval 2020 FakeNews Detection Task. In the text-based subtask, we have shown that using metadata-based features and other proposed features outperformed the model with only text features. The MCC scores of our proposed models are still low, especially in the structure-based subtask. In future work, we plan to use external resources to compare different information sources and calculate the probability that a piece of information is false. We believe that this is a natural way to detect misinformation.
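As a concrete illustration of the nine node features described in Section 3.2, the following sketch computes them with networkx on a toy retweet graph. The edge convention (u -> v meaning u retweeted v), the feature ordering, and the convergence parameters are our assumptions, not the authors' exact code.

```python
import networkx as nx

def node_features(G: nx.DiGraph) -> dict:
    """Nine per-node features: PageRank, in/out-degree, hub and
    authority scores, betweenness, closeness, triangle count, and
    eigenvector centrality (illustrative sketch)."""
    und = G.to_undirected()                          # triangle/eigenvector need an undirected view
    pagerank = nx.pagerank(G)
    hubs, authorities = nx.hits(G, max_iter=1000)    # HITS hub/authority scores
    betweenness = nx.betweenness_centrality(G)
    closeness = nx.closeness_centrality(G)
    triangles = nx.triangles(und)
    eigenvector = nx.eigenvector_centrality(und, max_iter=1000)
    return {
        n: [pagerank[n], G.in_degree(n), G.out_degree(n),
            hubs[n], authorities[n], betweenness[n],
            closeness[n], triangles[n], eigenvector[n]]
        for n in G.nodes
    }

# Toy retweet graph: an edge u -> v means u retweeted v (assumption).
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (4, 2)])
feats = node_features(G)
```

In practice these vectors would be standardized and fed to the GCN layers as the initial node representations.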
¹https://networkx.org

REFERENCES
[1] Christos Baziotis, Nikos Pelekis, and Christos Doulkeridis. 2017. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Association for Computational Linguistics, Vancouver, Canada, 747–754.
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
[3] Souvick Ghosh and Chirag Shah. 2018. Towards automatic fake news classification. Proceedings of the Association for Information Science and Technology 55, 1 (2018), 805–807.
[4] Rohit Kumar Kaliyar, Anurag Goswami, Pratik Narang, and Soumendu Sinha. 2020. FNDNet–A deep convolutional neural network for fake news detection. Cognitive Systems Research 61 (2020), 32–44.
[5] Dhruv Khattar, Jaipal Singh Goud, Manish Gupta, and Vasudeva Varma. 2019. MVAE: Multimodal Variational Autoencoder for Fake News Detection. In The World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 2915–2921. https://doi.org/10.1145/3308558.3313552
[6] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. (2014). arXiv:cs.CL/1408.5882
[7] S. Krishnan and M. Chen. 2018. Identifying Tweets with Fake News. In 2018 IEEE International Conference on Information Reuse and Integration (IRI). 460–464. https://doi.org/10.1109/IRI.2018.00073
[8] Martin Müller, Marcel Salathé, and Per E Kummervold. 2020. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter. (2020). arXiv:cs.CL/2005.07503
[9] Konstantin Pogorelov, Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and Johannes Langguth. 2020. FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In MediaEval 2020 Workshop.
[10] Daniel Thilo Schroeder, Konstantin Pogorelov, and J. Langguth. 2019. FACT: a Framework for Analysis and Capture of Twitter Graphs. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). 134–141.
[11] Kai Shu, Xinyi Zhou, Suhang Wang, Reza Zafarani, and Huan Liu. 2019. The Role of User Profile for Fake News Detection. (2019). arXiv:cs.SI/1904.13355
[12] Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, and Andrew Lim. 2020. Directed Graph Convolutional Network. (2020). arXiv:cs.LG/2004.13970
[13] Yaqing Wang, Fenglong Ma, Z. Jin, Ye Yuan, G. Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
[14] Jason Wei and Kai Zou. 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 6383–6389. https://www.aclweb.org/anthology/D19-1670
[15] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and others. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv (2019), arXiv–1910.
[16] Yang Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, and Philip S. Yu. 2018. TI-CNN: Convolutional Neural Networks for Fake News Detection. (2018). arXiv:cs.CL/1806.00749
[17] Xinyi Zhou, Jindi Wu, and Reza Zafarani. 2020. SAFE: Similarity-Aware Multi-Modal Fake News Detection. (2020). arXiv:cs.CL/2003.04981
[18] Xinyi Zhou and Reza Zafarani. 2019. Network-based Fake News Detection: A Pattern-driven Approach. (2019). arXiv:cs.SI/1906.04210