On The Pursuit of Fake News: From Graph Convolutional Networks to Time Series

Zeynep Pehlivan
Institut national de l’audiovisuel, France
zpehlivan@ina.fr

ABSTRACT
This paper presents the methods proposed by the INAFake team for MediaEval 2020 FakeNews: Corona virus and 5G conspiracy. We concentrate our work on the sub-task of structure-based fake news detection. Our aim is to test existing methods by relying on temporal features of networks without taking any textual features into account. We applied two well-known supervised graph classification approaches, graph convolutional networks (GCN) and the Deep Graph Convolutional Neural Network (DGCNN). We also present the problem as a multivariate time series classification problem and tested the multivariate long short term memory fully convolutional network method.

1 INTRODUCTION
Social media, which provides instant textual and visual information exchange, plays an important role in information propagation, but it also plays a crucial role in the propagation of fake information. One study [1] estimates that 42 percent of visits to fake news websites came through social media. In particular, when fake news distorts real-world information by tweaking it or mixing it with true information, it spreads faster on social media [10].

The aim of the Fake News task [9] is to detect misinformation spreaders by analysing tweets related to the Coronavirus and 5G conspiracy, that is, the idea that the COVID-19 outbreak is somehow connected to the introduction of the 5G wireless technology. The challenge of this task is not only to detect fake news but also to distinguish fake news related to Corona virus-5G from other fake news subjects. This work addresses the sub-task of structure-based fake news detection, thus it does not take the tweet content into account.

2 RELATED WORK
Fake news detection focuses on using news contents and social contexts. For social context based approaches, the features are mainly user-based, post-based and network-based. For this challenge, we focus on network-based features. Two graph learning problems have been well studied: node classification and graph classification. Node classification is to predict the class label of nodes in a graph, while graph classification aims to predict the class label of graphs, for which various graph kernels and deep learning approaches have been designed.

We first apply two different graph classification algorithms to this challenge’s dataset. The first one is based on the graph convolutional classification model, GCN, proposed in [8], which uses the graph convolutional layers from [7]. Although [7] was developed for node classification, it can be extended to graph classification by adding a global pooling layer as the last layer, which performs some form of pooling operation (see footnote 1).

The second approach is the Deep Graph Convolutional Neural Network (DGCNN) [13] algorithm. It uses the graph convolutional layers from [7] and proposes "SortPooling", which sorts nodes according to their roles within the graph [2]: the concatenation of the node embeddings of all layers is treated as the continuous equivalent of node coloring algorithms, and these "colors" define a lexicographic ordering of nodes across graphs. The top ordered nodes are then selected and fed (as a sequence) to a one-dimensional convolutional layer that computes the aggregated graph encoding.

As studied in [10], fake news spreads significantly faster and deeper than the truth. Thus, we would like to take the temporal dimension into account for this challenge, which places the problem in the time series classification category. Instead of creating univariate time series from tweet publication/retweet dates, we create multivariate time series (MLTS) from graph features changing over time. Recently, most approaches to MLTS have used neural networks, and in particular convolutional neural networks [6, 12]. We use MLSTM-FCN [6], which is a combination of long short term memory (LSTM) [5] and one-dimensional fully convolutional networks (FCN) [11] joined by a concatenation layer, followed by a shared dense layer for predictions. In [6], the authors propose two versions, one with an attention layer (MALSTM-FCN) and one without. We choose to use the version with the attention layer.

3 APPROACHES
In this section, we give details of our implementations of GCN, DGCNN and MALSTM-FCN.

3.1 GCN
Our deep learning model is represented in Figure 1. The input is the graph represented by its adjacency and node feature matrices. The first three layers are graph convolutional layers as in [7], each with 128 units, relu activations and an orthogonal kernel initializer. The next layer is a mean pooling layer where the learned node representations are summarized to create a graph representation. The graph representation is input to three fully connected layers with 128, 32 and 16 units respectively, with relu activations and an orthogonal kernel initializer. The model is trained using a batch size of 128.

Figure 1: GCN architecture from [3]

3.2 DGCNN
The model is represented in Figure 2. The model’s input is also the graph represented by its adjacency and node feature matrices. The first four layers are graph convolutional layers with 128, 128, 128 and 3 units respectively and tanh activations. These layers are followed by a one-dimensional convolutional layer, Conv1D, followed by a max pooling layer, MaxPool1D. Next is a second Conv1D layer that is followed by two Dense layers, the first one with relu activation and followed by a dropout layer (0.2), and the second one with softmax activation for classification.

Figure 2: DGCNN architecture from [3]

For GCN and DGCNN, categorical cross-entropy loss is used to train the neural network. The models are trained using a batch size of 128 and 256 respectively, the Adam optimizer with initial learning rate 0.001 and decay 0.01, and dropout (0.2). We also reduced the learning rate by a factor of 1/10 and applied early stopping. The StellarGraph [3] and networkx [4] packages are used for the implementation.

3.3 MALSTM-FCN
This model is represented in Figure 3. The model’s input is the time series generated from the graph features explained below. The model is implemented using the source code of [6] (see footnote 2). For the MALSTM-FCN network, the optimal number of LSTM hidden states for each dataset was found via grid search over 8, 16 and 32. The FCN block consists of three blocks of 128-256-128 filters. The models are trained using a batch size of 128. The He uniform initializer is used for the convolution kernels. The activation function is set to relu.

Figure 3: MALSTM-FCN architecture from [6]

3.4 Input generation
For GCN and DGCNN, the same input is used. As explained in the challenge, the provided retweet graphs contain sub-graphs of the Twitter follower graph. As suggested by the organizers, since each sub-graph must contain the trajectories of the real world spreading, we pre-processed the provided graphs and discarded all edges that point against the time. Then, for each node in each graph, the following features are calculated using the networkx package: degree centrality, closeness centrality, betweenness centrality, load centrality, harmonic centrality, number of cliques, clustering coefficient, square clustering coefficient and average neighbor degree.

For MALSTM-FCN, for each graph, time series are created using the following graph features: average clustering coefficient, graph clique number, number of connected components, local efficiency, number of isolates and normalized time distance to the source tweet. We also discarded all edges that point against the time.

4 RESULTS AND DISCUSSIONS
Stratified K-Fold cross validation (with k=10) is used to measure the performance. For each fold, the dataset is split into training (90%), validation (3% of training) and test (10%) sets. Table 1 shows the K-Fold CV results in terms of categorical accuracy, ROC AUC and Matthews correlation coefficient (MCC), together with the official results on the test dataset (T-MCC).

Table 1: Stratified K-Fold CV and submission results

Model         Accuracy       ROC AUC        MCC             T-MCC
GCN           71.1 ± 0.2%    82.6 ± 1.1%    0.56 ± 0.003    0.020
DGCNN         71.2 ± 0.2%    81.5 ± 1.0%    0.56 ± 0.004    0.023
MALSTM-FCN    71.5 ± 1.7%    82.1 ± 1.7%    0.54 ± 0.004    0.035

The results are not promising at all. What went wrong? As the results are really bad, we cannot conclude that it was just a problem of tuning. There is probably a bug between the code where we train and the code where we generate results. This could explain the huge difference between the cross-validation MCC values (0.54 to 0.56) and the official test MCC values (0.020 to 0.035). We are investigating this.

As future work, it would be interesting to propose a two-step classifier for this task: first detect fake versus not fake by using [8], which should give around 92% ROC AUC, and then try to distinguish corona conspiracy from other conspiracy. For the time series part, we would like to continue with this approach using different features in the future.

1 https://github.com/tkipf/gcn/issues/4
2 https://github.com/titu1994/MLSTM-FCN

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, December 14-15 2020, Online

REFERENCES
[1] Hunt Allcott and Matthew Gentzkow. 2017.
Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives 31 (May 2017), 211–236. https://doi.org/10.1257/jep.31.2.211
[2] Davide Bacciu, Federico Errica, Alessio Micheli, and Marco Podda. 2020. A gentle introduction to deep learning for graphs. Neural Networks 129 (Sept. 2020), 203–221. https://doi.org/10.1016/j.neunet.2020.06.006
[3] CSIRO’s Data61. 2018. StellarGraph Machine Learning Library. https://github.com/stellargraph/stellargraph. (2018).
[4] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference, Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA USA, 11–15.
[5] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
[6] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. 2019. Multivariate LSTM-FCNs for time series classification. Neural Networks 116 (Aug. 2019), 237–245. https://doi.org/10.1016/j.neunet.2019.04.014
[7] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs, stat] (Feb. 2017). http://arxiv.org/abs/1609.02907
[8] Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. 2019. Fake News Detection on Social Media using Geometric Deep Learning. arXiv:1902.06673 [cs, stat] (Feb. 2019). http://arxiv.org/abs/1902.06673
[9] Konstantin Pogorelov, Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and Johannes Langguth. 2020. FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In MediaEval 2020 Workshop.
[10] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (March 2018), 1146–1151. https://doi.org/10.1126/science.aap9559
[11] Zhiguang Wang, Weizhong Yan, and Tim Oates. 2016. Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline. (2016). arXiv:cs.LG/1611.06455
[12] Sung Whan Yoon, Jun Seo, and Jaekyun Moon. 2019. TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning. (2019). arXiv:cs.LG/1905.06549
[13] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An End-to-End Deep Learning Architecture for Graph Classification. AAAI (2018).
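
As an illustration of the input generation described in Section 3.4, the following sketch drops edges that point against time and then builds a small multivariate time series from cumulative snapshots of one retweet graph. It is a minimal sketch using only the Python standard library, not the code used for the submission. Several things are assumptions made for the example: an edge (u, v) is taken to mean that information flows from u to v, the snapshots are evenly spaced, the "normalized time distance to source tweet" is read as the elapsed time of the latest tweet in the snapshot divided by the total time span, and the helper names (drop_edges_against_time, count_components, snapshot_series) are hypothetical. Only three of the listed features are computed.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (u, v): information is ASSUMED to flow from u to v


def drop_edges_against_time(edges: List[Edge], time: Dict[str, float]) -> List[Edge]:
    """Keep an edge only if its source posted no later than its target."""
    return [(u, v) for u, v in edges if time[u] <= time[v]]


def count_components(nodes: List[str], edges: List[Edge]) -> int:
    """Number of (weakly) connected components, computed with union-find."""
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def snapshot_series(edges: List[Edge], time: Dict[str, float],
                    steps: int = 4) -> List[List[float]]:
    """One row per snapshot: [components, isolates, normalized time distance]."""
    t0, t1 = min(time.values()), max(time.values())
    span = (t1 - t0) or 1.0  # avoid division by zero for single-instant graphs
    series = []
    for i in range(1, steps + 1):
        cutoff = t0 + span * i / steps
        # Cumulative snapshot: every node that has appeared up to the cutoff.
        nodes = [n for n, t in time.items() if t <= cutoff]
        present = set(nodes)
        sub = [(u, v) for u, v in edges if u in present and v in present]
        touched = {n for edge in sub for n in edge}
        latest = max(time[n] for n in nodes)
        series.append([
            float(count_components(nodes, sub)),
            float(len(present - touched)),  # nodes without any edge
            (latest - t0) / span,           # assumed definition, see lead-in
        ])
    return series


# Hypothetical toy graph: node -> posting time, plus follower-style edges.
example_time = {"a": 0.0, "b": 10.0, "c": 20.0, "d": 40.0}
example_edges = [("a", "b"), ("c", "b"), ("c", "d")]
kept = drop_edges_against_time(example_edges, example_time)
for row in snapshot_series(kept, example_time):
    print(row)
```

Each graph then yields a matrix of shape (steps, number of features), which is the kind of input a multivariate time series classifier such as MALSTM-FCN expects. The remaining features of Section 3.4 would add further columns.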
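
Since Table 1 reports the Matthews correlation coefficient (MCC) for a task with more than two classes, it may help to spell out how a multiclass MCC can be computed. The sketch below implements the standard multiclass generalization from label counts: with s samples, c correct predictions, t_k true occurrences and p_k predicted occurrences of class k, MCC = (c·s − Σ t_k·p_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)). It is a minimal illustration, not the official evaluation script of the task. The function name multiclass_mcc and the example labels are hypothetical, and returning 0.0 when the denominator vanishes is a convention chosen here.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient computed from label counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                       # number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                            # t_k: true count per class
    p_counts = Counter(y_pred)                            # p_k: predicted count per class
    # Counter returns 0 for classes that were never predicted.
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in t_counts)
    var_true = s * s - sum(v * v for v in t_counts.values())
    var_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(var_true * var_pred)
    # Convention: a constant prediction (zero denominator) scores 0.0.
    return numerator / denominator if denominator else 0.0


# Hypothetical labels for a three-class problem.
y_true = ["5g", "5g", "other", "other", "none", "none"]
y_pred = ["5g", "other", "other", "other", "none", "5g"]
print(round(multiclass_mcc(y_true, y_pred), 3))
```

A value near 0 means the predictions are no better than guessing from the class frequencies. This is why the gap between the cross-validation MCC and the official T-MCC in Table 1 points to a pipeline problem rather than a tuning problem.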