On The Pursuit of Fake News: From Graph Convolutional Networks to Time Series
Zeynep Pehlivan
Institut national de l’audiovisuel, France
zpehlivan@ina.fr

ABSTRACT
This paper presents the methods proposed by the INAFake team for the MediaEval 2020 task FakeNews: Corona Virus and 5G Conspiracy. We concentrate our work on the sub-task of structure-based fake news detection. Our aim is to test existing methods by relying on temporal features of networks, without taking any textual features into account. We apply two well-known supervised graph classification approaches, graph convolutional networks (GCN) and the Deep Graph Convolutional Neural Network (DGCNN). We also frame the problem as a multivariate time series classification problem and test the multivariate long short-term memory fully convolutional network method.

1    INTRODUCTION
Social media, which provides instant textual and visual information exchange, plays an important role in information propagation, but it also plays a crucial role in the propagation of fake information. One study [1] estimates that 42 percent of visits to fake news websites came through social media. In particular, when fake news distorts real-world information by tweaking it or mixing it with true information, it spreads faster on social media [10].
   The aim of the Fake News task [9] is to detect misinformation spreaders by analysing tweets related to the Coronavirus and 5G conspiracy, that is, the idea that the COVID-19 outbreak is somehow connected to the introduction of the 5G wireless technology. The challenge of this task is not only to detect fake news but also to distinguish fake news related to Corona virus-5G from fake news on other subjects. This work addresses the sub-task of structure-based fake news detection, so it does not take the content of the tweets into account.

2    RELATED WORK
Fake news detection focuses on using news contents and social contexts. For social context based approaches, the features are mainly user-based, post-based and network-based. For this challenge, we focus on network-based features. Two graph learning problems have been well studied: node classification and graph classification. Node classification predicts the class label of nodes in a graph, while graph classification predicts the class label of whole graphs, for which various graph kernels and deep learning approaches have been designed.
   We first apply two different graph classification algorithms to this challenge’s dataset. The first one is based on the graph convolutional classification model, GCN, proposed in [8], which uses the graph convolutional layers from [7]. Although [7] was developed for node classification, it can be extended to graph classification by adding a global pooling layer as the last layer, which performs some form of pooling operation over the nodes (footnote 1).
   The second approach is the Deep Graph Convolutional Neural Network (DGCNN) [13] algorithm. It uses the graph convolutional layers from [7] and proposes "SortPooling", which sorts nodes according to the concatenation of the node embeddings of all layers, used as the continuous equivalent of node coloring algorithms. Such "colors" then define a lexicographic ordering of nodes across graphs. The top ordered nodes are selected and fed (as a sequence) to a one-dimensional convolutional layer that computes the aggregated graph encoding. roles within the graph [2].
   As studied in [10], fake news spreads significantly faster and deeper than the truth. We therefore want to take the temporal dimension into account for this challenge, which turns the problem into a time series classification problem. Instead of creating univariate time series from the publication and retweet dates of tweets, we create multivariate time series (MLTS) from graph features that change over time. Recently, most approaches to MLTS have used neural networks, and in particular convolutional neural networks [6, 12]. We use MLSTM-FCN [6], which combines long short-term memory (LSTM) [5] and one-dimensional fully convolutional networks (FCN) [11], joined by a concatenation layer and followed by a shared dense layer for predictions. In [6], the authors propose two versions, one with an attention layer (MALSTM-FCN) and one without. We choose the version with the attention layer.

3     APPROACHES
In this section, we give the details of our implementations of GCN, DGCNN and MALSTM-FCN.

3.1     GCN
The GCN model is shown in Figure 1. The input is the graph, represented by its adjacency matrix and its node feature matrix. The first three layers are graph convolutional layers as in [7], each with 128 units, relu activations and an orthogonal kernel initializer. The next layer is a mean pooling layer, in which the learned node representations are summarized to create a graph representation. The graph representation is the input to three fully connected layers with 128, 32 and 16 units respectively, with relu activations and an orthogonal kernel initializer. The model is trained using a batch size of 128.
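The convolution and pooling steps of Section 3.1 can be sketched in a few lines of NumPy. This is only an illustrative forward pass, not the StellarGraph implementation used for the experiments: the function names are hypothetical, the adjacency matrix is assumed to be symmetric, and small random weights stand in for trained, orthogonally initialized kernels.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the propagation matrix used in [7]."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_graph_embedding(adj, features, weights):
    """Stack graph convolutions with ReLU, then mean-pool nodes into one graph vector."""
    a_norm = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)     # one graph convolutional layer + ReLU
    return h.mean(axis=0)                       # global mean pooling over nodes

# Toy example: a 4-node path graph with 9 node features (random placeholder values).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 9))
layer_sizes = [9, 128, 128, 128]                # three 128-unit layers, as in Section 3.1
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
graph_vector = gcn_graph_embedding(adj, features, weights)
print(graph_vector.shape)                       # (128,)
```

Because the readout is a mean over nodes, the resulting vector does not depend on the order in which the nodes are listed, which is what allows a node-level model to be reused for graph classification. In the full model this vector would be passed on to the fully connected layers.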
Copyright 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, December 14-15 2020, Online
1 https://github.com/tkipf/gcn/issues/4
Figure 1: GCN architecture from [3]

Figure 2: DGCNN architecture from [3]

Figure 3: MALSTM-FCN architecture from [6]

3.2    DGCNN
The model is shown in Figure 2. Its input is also the graph, represented by its adjacency matrix and its node feature matrix. The first four layers are graph convolutional layers with 128, 128, 128 and 3 units respectively and tanh activations. These layers are followed by a one-dimensional convolutional layer (Conv1D) and a max pooling layer (MaxPool1D). Next comes a second Conv1D layer, followed by two Dense layers: the first one with relu activation, followed by a dropout layer (0.2), and the second one with softmax activation for classification.
   For GCN and DGCNN, categorical cross-entropy loss is used to train the neural network. The models are trained using batch sizes of 128 and 256 respectively, the Adam optimizer with an initial learning rate of 0.001 and a decay of 0.01, and dropout (0.2). We also reduced the learning rate by a factor of 1/10 and applied early stopping. The StellarGraph [3] and networkx [4] packages are used for the implementation.

3.3    MALSTM-FCN
This model is shown in Figure 3. Its input is the time series generated from the graph features described in Section 3.4. The model is implemented using the source code of [6] (footnote 2). For the MALSTM-FCN network, the optimal number of LSTM hidden states for each dataset was found via grid search over 8, 16 and 32. The FCN block consists of three blocks with 128, 256 and 128 filters. The models are trained using a batch size of 128. The He uniform initializer is used for the convolution kernels. The activation function is set to relu.

2 https://github.com/titu1994/MLSTM-FCN

3.4    Input generation
For GCN and DGCNN, the same input is used. As explained in the challenge, the provided retweet graphs contain sub-graphs of the Twitter follower graph. Since each sub-graph must contain the trajectories of the real-world spreading, we followed the suggestion of the organizers, pre-processed the provided graphs and discarded all edges that point against time. Then, for each node in each graph, the following features are calculated with the networkx package: degree centrality, closeness centrality, betweenness centrality, load centrality, harmonic centrality, number of cliques, clustering coefficient, square clustering coefficient and average neighbor degree.
   For MALSTM-FCN, time series are created for each graph from the following graph features: average clustering coefficient, graph clique number, number of connected components, local efficiency, number of isolates and normalized time distance to the source tweet. Here too, we discarded all edges that point against time.

4    RESULTS AND DISCUSSIONS
Stratified K-Fold cross validation (with k=10) is used to measure the performance. For each fold, the dataset is split into training (90%), validation (3% of training) and test (10%) sets. Table 1 shows the K-Fold CV results in terms of categorical accuracy, ROC AUC and Matthews correlation coefficient (MCC), together with the official results on the test dataset (T-MCC).

Table 1: Stratified K-Fold CV and submission results

Model         Accuracy       ROC AUC        MCC             T-MCC
GCN           71.1 ± 0.2%    82.6 ± 1.1%    0.56 ± 0.003    0.020
DGCNN         71.2 ± 0.2%    81.5 ± 1.0%    0.56 ± 0.004    0.023
MALSTM-FCN    71.5 ± 1.7%    82.1 ± 1.7%    0.54 ± 0.004    0.035

   The official results are not promising at all. What went wrong? The cross-validation MCC (0.54 to 0.56) and the official test MCC (0.020 to 0.035) differ so much that we cannot conclude that it was just a problem of tuning. Most probably, there is a bug between the code that trains the models and the code that generates the submitted results, which would explain the huge difference between the MCC values. We are investigating this.
   As future work, it would be interesting to propose a two-step classifier for this task: first detect fake versus not fake by using [8], which should give around 92% ROC AUC, and then try to distinguish corona-related conspiracy from other conspiracies. For the time series part, we would like to pursue this approach with different features in the future.
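One way to investigate the gap between the cross-validation MCC and the test MCC is to recompute the metric independently on the labels written to a submission file for a held-out fold. The sketch below is a plain-Python version of the multi-class MCC, computed from label counts. The function name and the class labels in the example are hypothetical, and it is not claimed to be the scoring code used by the task organizers.

```python
from collections import Counter
from math import sqrt

def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient computed from label counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # how often each class truly occurs
    p_k = Counter(y_pred)                             # how often each class is predicted
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denominator = sqrt((s * s - sum(v * v for v in p_k.values())) *
                       (s * s - sum(v * v for v in t_k.values())))
    return numerator / denominator if denominator else 0.0  # 0.0 if one side is constant

# Hypothetical labels for the three classes of the task.
y_true = ["5g", "5g", "other", "other", "non", "non"]
print(multiclass_mcc(y_true, y_true))                 # 1.0 for a perfect prediction
```

If the value obtained this way on a held-out fold matches the cross-validation MCC but the submitted predictions score close to zero, the problem is more likely in the export step (for example label encoding or row order) than in the model itself.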
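The pre-processing of Section 3.4 (discarding edges that point against time, and the normalized time distance to the source tweet) can be illustrated with a small plain-Python sketch. Several points here are assumptions, not details taken from the challenge data: an edge (u, v) is read as information flowing from u to v, every node carries a single posting timestamp, the source tweet is the earliest one, and distances are normalized by the longest delay in the graph. The function names are hypothetical.

```python
def drop_edges_against_time(edges, tweet_time):
    """Keep only edges (u, v) whose source was posted no later than its target."""
    return [(u, v) for u, v in edges if tweet_time[u] <= tweet_time[v]]

def normalized_time_distance(tweet_time, source):
    """Scale each node's delay after the source tweet to the range [0, 1]."""
    delays = {node: t - tweet_time[source] for node, t in tweet_time.items()}
    longest = max(delays.values())
    return {node: (d / longest if longest > 0 else 0.0) for node, d in delays.items()}

# Toy cascade: posting times in seconds after the source tweet "a".
tweet_time = {"a": 0, "b": 60, "c": 30, "d": 120}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

kept = drop_edges_against_time(edges, tweet_time)
print(kept)       # [('a', 'b'), ('a', 'c'), ('c', 'd')]
distances = normalized_time_distance(tweet_time, "a")
print(distances)  # {'a': 0.0, 'b': 0.5, 'c': 0.25, 'd': 1.0}
```

The edge ("b", "c") is dropped because "c" was posted before "b", so it cannot be part of a real spreading trajectory under the assumed edge direction.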


REFERENCES
 [1] Hunt Allcott and Matthew Gentzkow. 2017. Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives 31 (May 2017), 211–236. https://doi.org/10.1257/jep.31.2.211
 [2] Davide Bacciu, Federico Errica, Alessio Micheli, and Marco Podda. 2020. A gentle introduction to deep learning for graphs. Neural Networks 129 (Sept. 2020), 203–221. https://doi.org/10.1016/j.neunet.2020.06.006
 [3] CSIRO’s Data61. 2018. StellarGraph Machine Learning Library. https://github.com/stellargraph/stellargraph. (2018).
 [4] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference, Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA USA, 11–15.
 [5] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
 [6] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. 2019. Multivariate LSTM-FCNs for time series classification. Neural Networks 116 (Aug 2019), 237–245. https://doi.org/10.1016/j.neunet.2019.04.014
 [7] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs, stat] (Feb. 2017). http://arxiv.org/abs/1609.02907 arXiv: 1609.02907.
 [8] Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. 2019. Fake News Detection on Social Media using Geometric Deep Learning. arXiv:1902.06673 [cs, stat] (Feb. 2019). http://arxiv.org/abs/1902.06673 arXiv: 1902.06673.
 [9] Konstantin Pogorelov, Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and Johannes Langguth. 2020. FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In MediaEval 2020 Workshop.
[10] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (March 2018), 1146–1151. https://doi.org/10.1126/science.aap9559
[11] Zhiguang Wang, Weizhong Yan, and Tim Oates. 2016. Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline. (2016). arXiv:cs.LG/1611.06455
[12] Sung Whan Yoon, Jun Seo, and Jaekyun Moon. 2019. TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning. (2019). arXiv:cs.LG/1905.06549
[13] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An End-to-End Deep Learning Architecture for Graph Classification. AAAI (2018).