FakeNews Detection Using Pre-trained Language Models and Graph Convolutional Networks

Nguyen Manh Duc Tuan
Toyo University, Japan
ductuan024@gmail.com

Pham Quang Nhat Minh
Aimesoft JSC., Vietnam
minhpham@aimesoft.com

ABSTRACT
We introduce methods for detecting FakeNews related to coronavirus and the 5G conspiracy based on textual data and graph data. For the Text-Based Fake News Detection subtask, we proposed a neural network that combines textual features encoded by a pre-trained BERT model with metadata of tweets encoded by a multi-layer perceptron model. For the Structure-Based Fake News Detection subtask, we applied Graph Convolutional Networks (GCN) and proposed features for each node of the GCN. Experimental results show that textual data contains more useful information for detecting FakeNews than graph data, and that using metadata of tweets improves the results of the text-based model.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'20, December 14-15 2020, Online

1 INTRODUCTION
In this paper, we present our methods for two subtasks of the FakeNews Detection Task at MediaEval 2020 [9, 10]. We formalize FakeNews detection as a classification problem. For the text-based subtask, we applied the BERT model [2], which has been shown to be effective in many NLP tasks, including text classification. Specifically, we used Covid-Twitter-BERT [8] (CT-BERT), which was trained on a corpus of 160M tweets about the coronavirus. The data used to train CT-BERT comes from the same domain as the data provided for the FakeNews detection task, so we expect to obtain better results with CT-BERT than with general BERT models trained on open-domain data. We combined metadata-based features with the textual features obtained by CT-BERT and fine-tuned CT-BERT on our task-specific data. Experimental results show that combining metadata with textual features is better than using textual features only. For the structure-based subtask, we adopted Graph Convolutional Networks (GCN) [12] to capture the relations between nodes in retweet graphs.

2 RELATED WORK
One approach to fake news detection uses the content of the news. Content-based features are extracted from textual and visual aspects. Textual information can be extracted by layers of a CNN [4]. From textual information, we can observe features that are specific to fake news, such as writing style or emotions [3, 13, 16]. Furthermore, textual and visual information can be used together to detect fake news [5, 16, 17].

We can also use social network information to detect fake news by analyzing user-based features and network-based features. User-based features are extracted from users' profiles [7, 11]. Network-based features can be extracted from the propagation of posts or tweets on the graph [18].

3 APPROACH
In this section, we describe our methods for the two subtasks: text-based misinformation detection and structure-based misinformation detection.

3.1 Text-Based Misinformation Detection

[Figure 1: Text-based Fake News Detection Model.]

Since tweet data is very noisy, we performed the following pre-processing steps before feeding the data into the CT-BERT model.
• We deleted mentions and emojis with tweet-preprocessor, a pre-processing library for tweet data.
• We lowercased all words.
• We changed emoticons written in text form, such as ":)" and ":(", into the sentiment words "happy" or "sad".
• We deleted punctuation characters that are not useful, such as ";", ":", "-", and "=".
• We performed tokenization, word normalization, and word segmentation with ekphrasis [1], a text analysis tool for social media.
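For illustration, the pre-processing steps above can be approximated with a few standard-library rules. This is a simplified sketch: the actual pipeline uses tweet-preprocessor and ekphrasis, and the exact patterns below are our assumptions.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Approximate the pre-processing steps with stdlib regexes
    (a stand-in for the tweet-preprocessor / ekphrasis pipeline)."""
    text = re.sub(r"@\w+", "", text)          # delete @mentions
    text = text.lower()                        # lowercase all words
    text = text.replace(":)", " happy ")       # map text emoticons to sentiment words
    text = text.replace(":(", " sad ")
    text = re.sub(r"[;:\-=]", "", text)        # drop uninformative punctuation
    return re.sub(r"\s+", " ", text).strip()   # normalize whitespace

print(preprocess_tweet("Check this :) @user 5G causes covid - really?"))
```

A real system would keep tweet-preprocessor for mention/emoji removal and ekphrasis for tokenization and word normalization; the sketch only mirrors the order of the listed steps.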
The FakeNews detection data is unbalanced: the number of tweets labeled as conspiracy is much smaller than the number labeled as non-conspiracy. Therefore, we balanced the dataset with the Easy Data Augmentation (EDA) method [14].

The pre-processed and augmented data was then fed into neural networks. In our work, we conducted experiments with the following two models.

In the first model, we simply passed the tweet text into CT-BERT and used the hidden vector at the [CLS] token as the representation of the tweet. The hidden state at [CLS] is then fed into a sigmoid layer for 2-class classification or a softmax layer for 3-class classification.

In the second model, we combined text-based features with metadata-based features in the neural network shown in Figure 1. First, we obtain the embedding vectors of the tweet text using CT-BERT. We then applied a 1D-CNN [6] with different filter sizes, which lets us use information from various sources for prediction. We passed the metadata-based features through a fully-connected layer with batch normalization. Finally, we concatenated the metadata features with all outputs of the 1D-CNN and passed them into a sigmoid layer for 2-class classification or a softmax layer for 3-class classification. In addition to the provided metadata, we extracted other features: the number of retweets, favorites, characters, words, question marks, hashtags, mentions, and URLs in the tweet; the posting time of the tweet; and a binary feature indicating whether or not it is a sensitive tweet. From users' profiles, we extracted the number of friends, followers, groups, favorites, and statuses the user has posted. We also used the account creation time, whether or not the user's profile has been edited, and whether or not the account is verified. In total, we extracted 22 features, including the metadata features. In our experiments, we used the implementation of BERT in the HuggingFace Transformers library [15].

3.2 Structure-Based Misinformation Detection
We applied Graph Convolutional Networks (GCN) [12] to the structure-based subtask. The model applies a traditional GCN to the first-order proximity matrix and the second-order proximity matrix. The first-order proximity matrix is created by adding edges to the original adjacency matrix in order to turn the directed graph into an undirected one. The second-order proximity matrix also represents an undirected graph and is created by taking into account the shared neighbors of each pair of nodes.

We passed the three resulting graphs into two GCN layers with a filter size of 64. After that, we concatenated the three output graphs horizontally and applied global max pooling to obtain the embedding vector of the entire graph. Finally, we passed this vector into a fully-connected layer of 512 nodes with dropout, followed by a sigmoid layer for 2-class classification or a softmax layer for 3-class classification.

For each node of the input graph, we created nine features using the networkx library¹: PageRank, in-degree, out-degree, hub and authority scores, betweenness centrality, closeness centrality, number of triangles, and eigenvector centrality. For the first run, we used only these nine extracted features as node features. For the second run, we also included the provided metadata features as node features.

4 RESULTS AND ANALYSIS

4.1 Text-Based Misinformation Detection
We submitted two runs for each of the two-class and three-class classifiers.
• Run-1: We used the first model presented in Section 3.1 to generate results.
• Run-2: We used the second model presented in Section 3.1.

Table 1: Evaluation Results for Text-based Subtask

  Run                              2-class   3-class
  Run-1: Tweet only                0.361     0.412
  Run-2: Tweet + other features    0.396     0.419

Table 1 shows the results of our submitted runs. For the first run, with tweets only, we obtained a Matthews correlation coefficient (MCC) of 0.361 for 2-class classification and 0.412 for 3-class classification. In the second run, using tweets and other features, we obtained an MCC of 0.396 and 0.419 for 2-class and 3-class classification, respectively.

4.2 Structure-Based Misinformation Detection
We submitted two runs for the structure-based subtask.
• Run-1: We used the 9 extracted features as node features in the graphs.
• Run-2: We included metadata-based features along with the 9 extracted features as node features.

Table 2: Evaluation Results for Structure-based Subtask

  Run                                     2-class   3-class
  Graph + extracted features              0.151     0.088
  Graph + metadata + extracted features   -0.081    0.151

Table 2 shows the results for the two runs. For the first run, we obtained an MCC of 0.151 for 2-class classification and 0.088 for 3-class classification. For the second run, we obtained an MCC of -0.081 for 2-class classification and 0.151 for 3-class classification. We can see that the metadata-based features did not show any benefit in our GCN model. On the development set, however, the second run was better than the first in 2-class classification, with an MCC of 0.30 and 0.31, respectively. The performance gap might be due to the way we standardized the features or split the training data.

5 CONCLUSIONS AND FUTURE WORK
We have presented our proposed methods for the two subtasks of the MediaEval 2020 FakeNews Detection Task. In the text-based subtask, we have shown that using metadata-based features and other proposed features outperformed the model with only text features. The MCC scores of our proposed models are still low, especially in the structure-based subtask. In future work, we plan to use external resources to compare different information sources and calculate the probability that a piece of information is false. We believe that this is a natural way to detect misinformation.
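As a concrete illustration of the nine node features described in Section 3.2, the following sketch computes them with networkx on a toy retweet graph. The edge convention (u -> v meaning u retweeted v), the feature ordering, and the convergence parameters are our assumptions, not the authors' exact code.

```python
import networkx as nx

def node_features(G: nx.DiGraph) -> dict:
    """Nine per-node features: PageRank, in/out-degree, hub and
    authority scores, betweenness, closeness, triangle count, and
    eigenvector centrality (illustrative sketch)."""
    und = G.to_undirected()                          # triangle/eigenvector need an undirected view
    pagerank = nx.pagerank(G)
    hubs, authorities = nx.hits(G, max_iter=1000)    # HITS hub/authority scores
    betweenness = nx.betweenness_centrality(G)
    closeness = nx.closeness_centrality(G)
    triangles = nx.triangles(und)
    eigenvector = nx.eigenvector_centrality(und, max_iter=1000)
    return {
        n: [pagerank[n], G.in_degree(n), G.out_degree(n),
            hubs[n], authorities[n], betweenness[n],
            closeness[n], triangles[n], eigenvector[n]]
        for n in G.nodes
    }

# Toy retweet graph: an edge u -> v means u retweeted v (assumption).
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (4, 2)])
feats = node_features(G)
```

In practice these vectors would be standardized and fed to the GCN layers as the initial node representations.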
¹https://networkx.org

REFERENCES
[1] Christos Baziotis, Nikos Pelekis, and Christos Doulkeridis. 2017. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Association for Computational Linguistics, Vancouver, Canada, 747–754.
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
[3] Souvick Ghosh and Chirag Shah. 2018. Towards automatic fake news classification. Proceedings of the Association for Information Science and Technology 55, 1 (2018), 805–807.
[4] Rohit Kumar Kaliyar, Anurag Goswami, Pratik Narang, and Soumendu Sinha. 2020. FNDNet–A deep convolutional neural network for fake news detection. Cognitive Systems Research 61 (2020), 32–44.
[5] Dhruv Khattar, Jaipal Singh Goud, Manish Gupta, and Vasudeva Varma. 2019. MVAE: Multimodal Variational Autoencoder for Fake News Detection. In The World Wide Web Conference (WWW '19). Association for Computing Machinery, New York, NY, USA, 2915–2921. https://doi.org/10.1145/3308558.3313552
[6] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. (2014). arXiv:cs.CL/1408.5882
[7] S. Krishnan and M. Chen. 2018. Identifying Tweets with Fake News. In 2018 IEEE International Conference on Information Reuse and Integration (IRI). 460–464. https://doi.org/10.1109/IRI.2018.00073
[8] Martin Müller, Marcel Salathé, and Per E Kummervold. 2020. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter. (2020). arXiv:cs.CL/2005.07503
[9] Konstantin Pogorelov, Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and Johannes Langguth. 2020. FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In MediaEval 2020 Workshop.
[10] Daniel Thilo Schroeder, Konstantin Pogorelov, and J. Langguth. 2019. FACT: a Framework for Analysis and Capture of Twitter Graphs. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). 134–141.
[11] Kai Shu, Xinyi Zhou, Suhang Wang, Reza Zafarani, and Huan Liu. 2019. The Role of User Profile for Fake News Detection. (2019). arXiv:cs.SI/1904.13355
[12] Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, and Andrew Lim. 2020. Directed Graph Convolutional Network. (2020). arXiv:cs.LG/2004.13970
[13] Yaqing Wang, Fenglong Ma, Z. Jin, Ye Yuan, G. Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
[14] Jason Wei and Kai Zou. 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 6383–6389. https://www.aclweb.org/anthology/D19-1670
[15] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and others. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv (2019), arXiv–1910.
[16] Yang Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, and Philip S. Yu. 2018. TI-CNN: Convolutional Neural Networks for Fake News Detection. (2018). arXiv:cs.CL/1806.00749
[17] Xinyi Zhou, Jindi Wu, and Reza Zafarani. 2020. SAFE: Similarity-Aware Multi-Modal Fake News Detection. (2020). arXiv:cs.CL/2003.04981
[18] Xinyi Zhou and Reza Zafarani. 2019. Network-based Fake News Detection: A Pattern-driven Approach. (2019). arXiv:cs.SI/1906.04210