=Paper= {{Paper |id=Vol-2699/paper35 |storemode=property |title=Embedding Partial Propagation Network for Fake News Early Detection |pdfUrl=https://ceur-ws.org/Vol-2699/paper35.pdf |volume=Vol-2699 |authors=Amila Silva,Yi Han,Ling Luo,Shanika Karunasekera,Christopher Leckie |dblpUrl=https://dblp.org/rec/conf/cikm/Silva0LKL20 }} ==Embedding Partial Propagation Network for Fake News Early Detection== https://ceur-ws.org/Vol-2699/paper35.pdf
Embedding Partial Propagation Network for Fake News
Early Detection
Amila Silvaᵃ, Yi Hanᵃ, Ling Luoᵃ, Shanika Karunasekeraᵃ and Christopher Leckieᵃ
ᵃ School of Computing and Information Systems, The University of Melbourne



Abstract
Detecting fake news as early as possible has attracted growing attention due to its fast-spreading nature and the significant harm it can cause. As demonstrated in recent studies, the propagation pattern of fake news on social media differs from that of real news, and a number of works have extracted different types of features from the propagation pattern for detection. However, a major limitation of this approach is that the propagation network is not fully available in the early stages, and may take a long time to complete. As a result, existing network-based fake news detection methods yield low accuracy during the early stages of propagation. To bridge the research gap, in this work we: (1) propose a novel network embedding algorithm, based on the investigation of a wide range of features obtained from the propagation network, which are not well studied in previous work; and (2) design an autoencoder-based neural architecture to predict the embedding of the complete propagation network using the partially available network in the early stages of propagation. Our experiments show that with the predicted embedding for the complete propagation network, our model can achieve state-of-the-art performance while only having access to the early stage propagation network.

Keywords
Fake News Detection, News Propagation Networks, Network Embedding



1. Introduction

While the growing popularity of social media has greatly facilitated the exchange of information, it also provides an ideal platform to spread fake news, especially intentional disinformation, which has already caused and will continue to cause significant damage.
   Even though a number of independent fact-checking organisations have emerged globally over recent years, the sheer volume of fake news makes it infeasible to rely entirely on human investigation. In addition, what makes the task even more challenging is that fake news needs to be detected at an early stage before it becomes widespread, since it is difficult to correct people's perception towards an issue once it is formed, even if the previous impression is inaccurate [1]. Therefore, in our work we focus on fake news early detection: verifying the validity of a news item within a certain time limit from when it is published online. Here we use the definition in [2] that fake news is intentionally and verifiably false news published by a news outlet; similar definitions have also been used in previous studies on fake news detection [3, 4, 5, 6].
   It has been demonstrated that the propagation pattern of news on social media, e.g., tweets and retweets of news on Twitter, can facilitate the detection of fake news [7, 8, 9, 10, 11], since the propagation pattern of fake news exhibits distinctive characteristics. However, instead of relying on the entire propagation network, which may take days or even weeks to complete, we only use the initial network that, for instance, belongs to the first 100 tweets, or tweets posted within the first few hours, to verify a news item. Specifically, the main contributions of this work include (Figure 1 provides an overview):
• We investigate a range of local and global features of the propagation network, including temporal-based, text-based and user-based, and compare their contributions to the detection of fake news. Based on the observations, we propose a novel network representation learning algorithm to embed the propagation network;
• We train an autoencoder that takes as input the partial propagation network corresponding to the tweets posted within the detection deadline, and predicts the embedding of the complete propagation network;
• We perform extensive experiments to demonstrate that
the predicted embedding of the complete propagation network can be used to achieve state-of-the-art performance in fake news early detection.
   The remainder of this paper is organised as follows: Section 2 defines the problem of fake news early detection; Section 3 describes how to embed the propagation network; Section 4 introduces the network embedding-based detection algorithm; Section 5 provides the experimental verification of the designed algorithm; Section 6

Title of the Proceedings: “Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland”
Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi
email: amila.silva@student.unimelb.edu.au (A. Silva); yi.han@unimelb.edu.au (Y. Han); ling.luo@unimelb.edu.au (L. Luo); karus@unimelb.edu.au (S. Karunasekera); caleckie@unimelb.edu.au (C. Leckie)
orcid: 0000-0003-2042-9050 (A. Silva); 0000-0001-6530-4564 (Y. Han); 0000-0002-1363-8308 (L. Luo); 0000-0001-7080-5064 (S. Karunasekera); 0000-0002-4388-0517 (C. Leckie)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
[Figure 1 (diagram). Legend: detection deadline; tweets posted within the detection deadline; tweets posted after the detection deadline; source node/news record. Blocks: local network embedding (node-level aggregation); global network embedding; embedding of the initial propagation network; autoencoder; predicted embedding of the complete propagation network; fake news detector (binary classifier).]

Figure 1: Overview of the Proposed Framework. Node-level and global attributes are extracted from the initial propagation
network to generate the network embedding, which is used to train an autoencoder to predict the embedding for the complete
propagation network.



reviews previous work on fake news detection; and finally Section 7 concludes the paper and offers directions for future work.

2. Problem Definition

We define the problem of fake news early detection as follows: let $R_L$ be a set of labelled news records. Each record $r \in R_L$ is represented as a tuple $\langle t^r, W^r, G_t^r, H^r, y^r \rangle$, where (1) $t^r$ is the timestamp when $r$ is published online; (2) $W^r$ is the text content of $r$; (3) $G_t^r$ is the propagation network of $r$ at timestamp $t^r + t$ (further explained below); (4) $H^r$ is the set of timeline tweets posted by the users involved in $G_t^r$, i.e., it provides background information of the news spreaders. Note that $H^r$ does not necessarily always have to contain the latest timeline tweets; and (5) $y^r$ is the label: $y^r$ is 1 if $r$ is false and 0 otherwise.
   Each propagation network $G_t^r$ is an attributed directed graph $(V_t^r, E_t^r, X_t^r)$, where:
• $V_t^r$ is the set of vertices/nodes, and each node is a tweet/retweet with the corresponding user. A special case is that an extra node representing the news is added to link the network together; it is called the source node hereafter.
• $E_t^r$ is the set of edges. Here, edges represent how a news item spreads from one person to another as shown in Figure 1. However, Twitter APIs do not provide the immediate source of a retweet. To solve this problem, within each cascade we first sort the tweets by their timestamps, and then search for the potential source of a retweet from all the tweets that are published earlier. Specifically, there is an edge from node $i$ to node $j$ if (1) the user of tweet $i$ mentions the user of tweet $j$; or (2) tweet $i$ is public and tweet $j$ is posted within a certain period of time after tweet $i$, e.g., five hours. The follower and following relations are not included when constructing the edges of the propagation network, as they may not be available in real time due to the much stricter rate limit of the corresponding Twitter APIs, which prohibits the timely detection of fake news.
• $X_t^r$ is the set of node-level and network-level features for $G_t^r$, which are explained in detail in Section 3.
   The problem is to predict the label $y^r$ for unlabelled news records $r \in R_U$ as false or real news records within a detection deadline $\Delta t$, where $G_t^r$ for $r \in R_U$ is only available for $t \le \Delta t$.

3. Representation Learning for Propagation Network

In this section, we propose a simple yet effective unsupervised network representation learning method to embed the propagation network. Formally, for a given propagation network $G_t^r$ of news record $r$ at timestamp $t^r + t$, representation learning aims to learn a mapping function $f : G_t^r \to h_t^r \in \mathbb{R}^d$ such that the obtained embedding $h_t^r$ is useful for predicting the label $y^r$ of the news record. Moreover, we analyse the informativeness of the learned embeddings for the initial propagation network at the detection deadline $G_{\Delta t}^r$ ¹ and for the complete propagation network $G_T^r$ ($T \gg \Delta t$).
   Datasets. We conduct all our experiments on two publicly available datasets introduced in [12], which are collected from two fact-checking websites: (1) PolitiFact ²; and (2) GossipCop ³. Both datasets consist of labelled

¹ We denote the propagation network at the detection deadline $G_{\Delta t}^r$ as the initial propagation network.
² https://www.politifact.com/
³ https://www.gossipcop.com/
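The edge-construction rule of Section 2 can be illustrated with a minimal Python sketch. The `Tweet` record, its field names, the `build_edges` helper and the choice to only link an earlier tweet to a later one are assumptions made for illustration; the five-hour window is the example value mentioned in the text, and the source node is left out because its linking rule is not spelled out.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Hypothetical tweet record; the field names are illustrative only."""
    tweet_id: str
    user: str
    timestamp: float                      # seconds since the record was published
    mentions: set = field(default_factory=set)
    is_public: bool = True


def build_edges(tweets, window_seconds=5 * 3600):
    """Return directed edges (i, j) between tweets of one cascade.

    An edge i -> j is added when tweet i was posted before tweet j and either
    (1) the user of tweet i mentions the user of tweet j, or
    (2) tweet i is public and tweet j follows it within `window_seconds`.
    """
    ordered = sorted(tweets, key=lambda tw: tw.timestamp)
    edges = []
    for pos, earlier in enumerate(ordered):
        for later in ordered[pos + 1:]:
            mentioned = later.user in earlier.mentions
            in_window = (earlier.is_public
                         and later.timestamp - earlier.timestamp <= window_seconds)
            if mentioned or in_window:
                edges.append((earlier.tweet_id, later.tweet_id))
    return edges


cascade = [
    Tweet("a", "u1", 0, {"u3"}),
    Tweet("b", "u2", 3600),
    Tweet("c", "u3", 10 * 3600),
]
edges = build_edges(cascade)
```

In this toy cascade, tweet `a` links to `b` through the time window and to `c` through the mention, while `b` and `c` are more than five hours apart and stay unconnected.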
Figure 2: Correlation of the news labels with the source node representations using node-level user-based features ($n_1 - n_6$), at different iterations with the proposed label propagation scheme (X-axis: number of iterations, Y-axis: correlation values). [Plot panels: Politifact, Gossipcop.]

Algorithm 1: Local Network Representation
  Input: propagation network $G_t^r = (V_t^r, E_t^r, X_t^r)$
         source node of $r$: $v_s \in V_t^r$
         gamma $\gamma \in [0, 1]$
  Output: the local representation $f_{local}(G_t^r)$
1 $h_v^0 \leftarrow x_v \;\; \forall v \in V_t^r$
2 for $t$ in $1, 2, \ldots, k$ do
3     for $v$ in $V$ do
4         $h_v^t \leftarrow \gamma h_v^{t-1} + (1 - \gamma) \dfrac{\sum_{\forall (v,u) \in E_t^r} h_u^{t-1}}{\sum_{\forall (v,u) \in E_t^r} 1}$
5     end
6 end
7 $f_{local}(G_t^r) \leftarrow h_{v_s}^k$
8 return $f_{local}(G_t^r)$
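As an illustration, the node-level aggregation of Algorithm 1 could be sketched in plain Python as follows. The function name `local_representation`, the dictionary-based graph encoding, the default values of `gamma` and `k`, and the rule that a node without successors keeps its previous embedding (Algorithm 1 does not say how to handle an empty neighbour set) are assumptions of this sketch.

```python
def local_representation(features, successors, source, gamma=0.5, k=3):
    """Sketch of Algorithm 1.

    features:   dict node -> list of floats (x_v)
    successors: dict node -> list of nodes u such that (v, u) is an edge
    source:     the source node v_s
    """
    # Line 1: initialise every embedding with the node's own features.
    h = {v: list(x) for v, x in features.items()}
    for _ in range(k):                                   # Line 2
        new_h = {}
        for v, h_v in h.items():                         # Line 3
            nbrs = successors.get(v, [])
            if not nbrs:
                # Assumption: no successors, so the embedding is unchanged.
                new_h[v] = list(h_v)
                continue
            # Line 4: mean of the successors' previous embeddings ...
            mean = [sum(h[u][d] for u in nbrs) / len(nbrs)
                    for d in range(len(h_v))]
            # ... blended with the node's own previous embedding.
            new_h[v] = [gamma * own + (1 - gamma) * m
                        for own, m in zip(h_v, mean)]
        h = new_h
    return h[source]                                     # Lines 7-8


features = {"s": [0.0], "a": [1.0], "b": [3.0], "c": [5.0]}
successors = {"s": ["a", "b"], "a": ["c"]}
rep = local_representation(features, successors, "s", gamma=0.5, k=2)
```

With two iterations the value stored at node `c`, two hops away, reaches the source node, which is the k-hop summarisation described in Section 3.1.1.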

news records and all the tweets and retweets for each news item. Please refer to [4] for the descriptive statistics of the datasets.

3.1. Local Representation
In this subsection, we introduce how to embed node-level/local features.

3.1.1. Node-level Feature Aggregation
The nodes in a propagation network typically have complex multi-modal attributes e.g., temporal-based, text-based and user-based, which are useful to characterize the propagation network. Previous work [11, 13] mainly adopts simple averaging techniques to aggregate such node-level features, e.g., the average time between tweets, or the average sentiment score of the tweets. The main limitation of these approaches is that they mostly ignore the structure of the network. To solve this problem, we propose an aggregation technique to summarise node-level attributes while preserving the structural properties of the network, which is elaborated in Algorithm 1.
   The proposed approach iteratively updates the embedding of the nodes based on their one-hop neighbours. Specifically, the embedding of each node $h_v^0$ in the network is initialized using its features (Line 1 in Algorithm 1). Then for each iteration, the embeddings of one-hop neighbours (i.e., immediate successors in the directed graph) are propagated to the node following Line 4 in Algorithm 1. Here, $\gamma$ controls the weight assigned to the propagated embeddings from the neighbours and the scale of the updated embedding. By running $k$ iterations of the aforementioned label propagation scheme, each node can summarize its $k$-hop network based on the node-level features. Finally, the embedding of the source node $v_s$ in the network $G_t^r$ is returned as the local representation of the graph, i.e., $f_{local}(G_t^r)$. In contrast to sophisticated neural architectures such as graph neural networks [14] that use data-driven trainable kernels to perform node-level aggregation, our approach is more straightforward and hence easier to interpret.

3.1.2. Node-level Features
We investigate three categories of node-level features: (1) user-based features; (2) text-based features; and (3) temporal features.
Node-level User-based Features. The following user-based features are studied in our experiments: whether the user is verified ($n_1$); the number of followers ($n_2$); the number of lists ($n_3$); the number of favourites ($n_4$); the number of tweets ($n_5$); and the number of friends mentioned per timeline tweet divided by the number of friends ($n_6$).
   Such node-level user features can be useful to identify the differences in the way users engage with false news and real news. For example, less credible users are more likely to spread fake news as shown in [5]. $n_1$ and $n_2$ can be good indicators to identify less credible users. In addition, the finding in [13] shows that fake news spreaders tend to form larger clusters by their actions. $n_6$ can be useful to identify such user behaviours.
   The correlations of features $n_1 - n_6$ with the news labels are shown in Figure 2. A positive (or negative) correlation in Figure 2 means that the corresponding feature values are higher (or lower) for fake news compared with real news. As can be seen, almost all the features show moderate correlation for at least one dataset. Specifically, $n_1$ and $n_6$ exhibit the highest correlation for PolitiFact and GossipCop, respectively, such that they are consistent with the aforementioned theory-driven explanations. However, feature $n_1$ shows opposite relations with the labels for PolitiFact (negative) and Gossipcop (positive),
Figure 3: Correlation of the news labels with the source node representations using node-level text-based features ($n_7 - n_{14}$), at different iterations with the proposed label propagation scheme (X-axis: number of iterations, Y-axis: correlation values). [Plot panels: Politifact, Gossipcop; plotted series: $n_7$ to $n_{13}$.]

Figure 4: Correlation of the news labels with the source node representations using node-level temporal-based features ($n_{15} - n_{18}$), at different iterations with the proposed label propagation scheme (X-axis: number of iterations, Y-axis: correlation values). [Plot panels: Politifact, Gossipcop; plotted series: $n_{15}$ to $n_{18}$.]

which may be due to the domain difference of the two datasets.
   Another interesting observation is that the correlation of each feature converges after a few iterations ($\approx 8$) using the proposed node-level aggregation approach. This observation indicates that the nodes that are close to the source node are more informative compared to the rest in the propagation networks.

Node-level Text-based Features. We further study text-based features as listed below: the sentiment scores computed with VADER ⁴ from the text content of the tweets ($n_7$); the frequency of positive words ($n_8$); the frequency of negative words ($n_9$); the number of emojis ($n_{10}$); the number of mentions ($n_{11}$); the number of hashtags ($n_{12}$); and the percentage of tweets related to the target topic ($n_{13}$). For $n_{13}$, we collect the timeline tweets for each user, and run tweet topic classification. For the dataset of PolitiFact (or GossipCop), we calculate the percentage of tweets whose topic is classified as "politics" (or "entertainment").
   The node-level text features can be helpful to understand the linguistic differences of the text contents generated by the users engaging with fake news and real news. As shown in Figure 3, for both datasets a subset of the above features show relatively high correlation with the news labels, e.g., features $n_9$, $n_{11}$ for PolitiFact, and features $n_{11}$, $n_{13}$ for GossipCop. This aligns with well-defined theories. For example, it has been demonstrated that user-bias is a useful indicator to identify fake news spreaders [13]. A feature like $n_{13}$ can help understand user bias to a particular domain, thus a user with a higher percentage of domain-specific posts (i.e., users with high $n_{13}$) is more likely to be a fake news spreader. In addition, correlation values in Figure 3 also converge after a few iterations of label propagation as seen in Figure 2.

Node-level Temporal Features. Moreover, we analyse the following node-level temporal features to further capture the difference in the dissemination between fake and real news: the time difference with the source node ($n_{15}$); the time difference with the immediate predecessor ($n_{16}$); the average time difference with the immediate successors ($n_{17}$); user account timestamp ($n_{18}$).
   According to the correlation analysis in Figure 4, the selected features show moderate correlations with the news labels for both datasets, and the results are also more consistent over different values of $k$, compared with the other node-level features.
   In summary, the proposed label propagation scheme can capture up to $k$-hop neighbour information to generate the embedding for the source node based on the node-level features. Our empirical analysis shows that the nodes in close vicinity to the source node are the most informative for generating useful local representations. Thus the proposed label propagation scheme with a limited $k$ value is sufficient for performing node-level feature aggregation.

3.2. Global Representation
In addition to local features, the following network-level features are also extracted to represent the structural properties of each network $G_t^r$, which is denoted as the global representation $f_{global}(G_t^r)$ of the network.
• Wiener Index ($g_1$): The Wiener Index of a network is the sum of the lengths of the shortest paths between all pairs of vertices, which is a measure of the structural virality of a propagation network.
• Number of nodes ($g_2$): The number of nodes in a propagation network can be useful to understand the differences in the scale of user engagements for false and real news pieces.

⁴ https://github.com/cjhutto/vaderSentiment
Table 1
Correlation analysis of $g_1 - g_4$, where $g_i^{\Delta t}$ and $g_i^T$ are the $i$-th global feature computed using the propagation networks at the detection deadline ($\Delta t = 5$ hours) and using the complete propagation networks respectively. The statistically significant figures under correlation test are shown in bold.

              Politifact                                                            Gossipcop
   Attribute  $corr(g_i^{\Delta t}, y)$  $corr(g_i^T, y)$  $corr(g_i^{\Delta t}, g_i^T)$    $corr(g_i^{\Delta t}, y)$  $corr(g_i^T, y)$  $corr(g_i^{\Delta t}, g_i^T)$
   $g_1$      -0.0217                    -0.2825           0.2046                           0.0102                     0.2341            0.2341
   $g_2$      -0.0194                    -0.2589           0.2859                           -0.0998                    0.4099            0.2715
   $g_3$      0.0523                     0.0107            0.2959                           -0.2573                    0.2120            0.2808
   $g_4$      -0.0402                    -0.2151           0.3414                           -0.0981                    0.4249            0.2780
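The global features $g_1$ to $g_6$ of Section 3.2 could be computed along the following lines. This is a dependency-free sketch: the function names, the dictionary-based graph encoding, the choice to ignore edge directions when computing the Wiener Index, the reading of network depth as the largest hop distance from the source node, and counting the source node itself as hop 0 are all assumptions made for illustration.

```python
from collections import deque


def bfs_levels(adjacency, start):
    """Hop distance of every node reachable from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adjacency.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def global_features(successors, source):
    nodes = {source} | set(successors) | {u for ns in successors.values() for u in ns}
    # Assumption: shortest paths for the Wiener Index ignore edge direction.
    undirected = {v: set() for v in nodes}
    for v, ns in successors.items():
        for u in ns:
            undirected[v].add(u)
            undirected[u].add(v)
    # Each unordered pair is visited twice, hence the division by two.
    wiener = sum(sum(bfs_levels(undirected, v).values()) for v in nodes) // 2

    levels = bfs_levels(successors, source)
    depth = max(levels.values())
    nodes_per_hop = [0] * (depth + 1)
    for d in levels.values():
        nodes_per_hop[d] += 1
    branching = [nodes_per_hop[l + 1] / nodes_per_hop[l] for l in range(depth)]
    max_outdegree = max((len(ns) for ns in successors.values()), default=0)
    return {
        "g1_wiener_index": wiener,
        "g2_num_nodes": len(nodes),
        "g3_depth": depth,
        "g4_max_outdegree": max_outdegree,
        "g5_nodes_per_hop": nodes_per_hop,
        "g6_branching_factor": branching,
    }


feats = global_features({"s": ["a", "b"], "a": ["c"]}, "s")
```

For this four-node toy network the six pairwise distances are 1, 1, 2, 2, 1 and 3, so the Wiener Index is 10.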


Figure 5: Correlation analysis of $g_5$ and $g_6$ at different network levels. [Plot panels: Politifact, Gossipcop; X-axis: level; plotted series: $corr(g_5^{\Delta t}, y)$, $corr(g_5^T, y)$, $corr(g_5^{\Delta t}, g_5^T)$, $corr(g_6^{\Delta t}, y)$, $corr(g_6^T, y)$, $corr(g_6^{\Delta t}, g_6^T)$.]

• Network depth ($g_3$): This measure captures how far the information is propagated via tweets and retweets.
• Maximum outdegree ($g_4$): This characterizes the most influential node in a propagation network.
• Number of nodes at different hops ($g_5$): This measure counts the number of $k$-hop neighbours with respect to the source node in a propagation network.
• Branching factor at different levels ($g_6$): For a given level $l$ in a propagation network with respect to the source node, the branching factor at $l$ is calculated as the ratio of the nodes at $l + 1$ and the nodes at $l$.
   Several observations can be made from the correlation analysis of $g_1 - g_6$ in Table 1 and Figure 5: (1) the global embeddings extracted from the initial propagation network do not show an obvious correlation with the news label; (2) the global embeddings from the complete propagation network, however, show much stronger correlation, which demonstrates the importance of having access to the complete network; and (3) there is a moderate correlation between the global embeddings generated from the initial network and from the corresponding complete network, which indicates the feasibility of using the initial network to predict the future embedding.
   In addition, Figure 5 shows that the correlations of features $g_5 - g_6$ are stronger in close proximity to the source nodes, which is consistent with the observation in the node-level features. Hence, it further signifies the ability of the proposed label propagation scheme to preserve the network-level information.
   After the local and global representations are obtained, we concatenate them to create the final embedding of the propagation network: $f(G_t^r) = f_{local}(G_t^r) \oplus f_{global}(G_t^r)$, where $\oplus$ is the concatenation operation.

4. Network-based Fake News Early Detection

As shown in Section 3, the embedding of the complete propagation network of news records has a relatively strong correlation with the labels. However, only the initial part of the propagation network is available at the early detection deadlines. Hence, we propose to train an autoencoder that takes the partial propagation network as input, and generates the embedding of the complete propagation network.
   Formally, for a given news record $r$ in the training set, denote the embedding of the initial network $f(G_{\Delta t}^r)$ and the complete network $f(G_T^r)$ as $f_{local}(G_{\Delta t}^r) \oplus f_{global}(G_{\Delta t}^r)$ and $f_{local}(G_T^r) \oplus f_{global}(G_T^r)$, respectively. The autoencoder is trained using the following reconstruction loss.

$$L_{recon} = \lVert Dec(Enc(f(G_{\Delta t}^r))) - f(G_T^r) \rVert^2 \qquad (1)$$

where $Enc$ is the encoder: $f(G_{\Delta t}^r) \to l^r \in \mathbb{R}^{d'}$, $Dec$ is the decoder: $l^r \to \hat{f}(G_T^r) \in \mathbb{R}^d$, $d'$ is the latent dimension of the autoencoder, and $\hat{f}(G_T^r)$ is the predicted embedding for the complete propagation network. Both $Enc()$ and $Dec()$ mappings are modelled as 2-layer feedforward neural networks followed by a Sigmoid activation function ($\sigma$), which can be formally defined as follows:
                                                                                𝐸𝑛𝑐( 𝑓 (𝐺 𝑇𝑟 )) = 𝜎( 𝐴2 (𝜎( 𝐴1 ) 𝑓 (𝐺 𝑇𝑟 ) + 𝑏 1 ) + 𝑏 2 )     (2)
        𝐷𝑒𝑐(𝑙 𝑟 ) = 𝜎( 𝐴4 (𝜎( 𝐴3 )𝑙 𝑟 + 𝑏 3 ) + 𝑏 4 )         (3)      0.9
                            ′                        ′   ′
where { 𝐴1 , 𝐴𝑇4 } ∈ R (2𝑑 ,𝑑) , {𝐴2 , 𝐴𝑇3 } ∈ R (𝑑 ,2𝑑 ) , {𝑏 1 ,   0.85                          0.88
           ′                       ′
𝑏 4 } ∈ R2𝑑 , and {𝑏 2 , 𝑏 3 } ∈ R𝑑 are trainable parameters.          0.8
We leave the optimal neural architecture search for 𝐸𝑛𝑐()                                          0.86
and 𝐷𝑒𝑐() in our model as future work.                               0.75
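As a minimal sketch of Eqs. 1 to 3, the following Python code implements the forward pass of a 2-layer Sigmoid encoder and decoder and the resulting reconstruction loss. It is illustrative only and not the authors' implementation: the function names, the random initialisation, and the use of plain Python lists are assumptions, the norm in Eq. 1 is taken to be the unsquared Euclidean norm, and the bias shapes are chosen so that every matrix product is well-defined. Training (gradient updates) is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(A, v, b):
    """One feedforward layer: sigmoid(A v + b), with A stored as a list of rows."""
    return [sigmoid(sum(a * x for a, x in zip(row, v)) + bi)
            for row, bi in zip(A, b)]


def init_matrix(rows, cols, rng):
    """Hypothetical initialisation: small uniform random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(d, d_latent, seed=0):
    """Create the parameters of Eqs. 2-3 for embedding size d and latent size d'."""
    rng = random.Random(seed)
    h = 2 * d_latent  # hidden width 2d'
    return {
        "A1": init_matrix(h, d, rng), "b1": [0.0] * h,
        "A2": init_matrix(d_latent, h, rng), "b2": [0.0] * d_latent,
        "A3": init_matrix(h, d_latent, rng), "b3": [0.0] * h,
        "A4": init_matrix(d, h, rng), "b4": [0.0] * d,
    }


def encode(p, f_partial):
    """Eq. 2: map the partial-network embedding to the latent vector l_r."""
    return layer(p["A2"], layer(p["A1"], f_partial, p["b1"]), p["b2"])


def decode(p, latent):
    """Eq. 3: map l_r to the predicted embedding of the complete network."""
    return layer(p["A4"], layer(p["A3"], latent, p["b3"]), p["b4"])


def recon_loss(p, f_partial, f_complete):
    """Eq. 1: Euclidean distance between predicted and true complete embedding."""
    pred = decode(p, encode(p, f_partial))
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(pred, f_complete)))
```

Because the decoder ends in a Sigmoid, every component of the predicted embedding lies in (0, 1), so this sketch implicitly assumes that the target embeddings are scaled to that range.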
    Subsequently, the generated embedding of the complete network, $\hat{f}(G^r_T) = Dec(Enc(f(G^r_{\Delta t})))$, is used to classify the news record:

$$L_{predict} = BCE(\sigma(W * \hat{f}(G^r_T) + b), y_r) \tag{4}$$

where $BCE()$ is the standard binary cross entropy loss function and $W$, $b$ are the trainable parameters of the fake news classifier.
   The final loss function jointly optimises $L_{recon}$ and $L_{predict}$:

$$L = L_{recon} + L_{predict} \tag{5}$$

Figure 6: Accuracy of the proposed approach with different detection deadlines ($\Delta t$, in hours) and with different thresholds ($thresh$) for the maximum number of nodes in the initial network. [Plot omitted; x-axes: $\Delta t$ and $thresh$; series: PolitiFact, GossipCop, PolitiFact without $L_{recon}$, GossipCop without $L_{recon}$.]
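As a minimal sketch of Eqs. 4 and 5, the following Python code computes the prediction loss for one news record and adds it to the reconstruction loss. The function names and argument layout are hypothetical, the predicted embedding is passed in as a plain list rather than produced by a trained decoder, and the reconstruction term is assumed to be the unsquared Euclidean norm from Eq. 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(p, y, eps=1e-12):
    """Binary cross entropy for a predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def predict_loss(W, b, f_hat, y):
    """Eq. 4: BCE between sigmoid(W . f_hat + b) and the label y."""
    score = sum(w * x for w, x in zip(W, f_hat)) + b
    return bce(sigmoid(score), y)


def recon_loss(f_hat, f_complete):
    """Eq. 1 (assumed unsquared): distance between predicted and true embedding."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(f_hat, f_complete)))


def joint_loss(W, b, f_hat, f_complete, y):
    """Eq. 5: unweighted sum of the reconstruction and prediction losses."""
    return recon_loss(f_hat, f_complete) + predict_loss(W, b, f_hat, y)
```

With a zero weight vector, a zero bias, and a perfectly reconstructed embedding, `joint_loss` reduces to ln 2 (about 0.693), the cross entropy of an uninformed prediction of 0.5.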
5. Experimental Verification
In this section, we present our experimental results to demonstrate the effectiveness of the proposed algorithm.

5.1. Experimental Setup

Baselines. We compare our approach with four widely used text-based methods: (1) RST [15], (2) LIWC⁵, (3) text-CNN [16], and (4) HAN [17]; one propagation network-based algorithm, HSA-BLSTM [18]; and three mixed approaches: (1) TCNN-URG [19], (2) CSI [6], and (3) dEFEND [4].
   Parameter Settings. The proposed approach has three model-specific parameters: (1) the latent embedding dimension $d'$, whose default value is set to 10, as we have empirically observed that the performance of the model plateaus for $d' \geq 10$; (2) the detection deadline $\Delta t$, whose default value is set to 5 hours (we also analyse the model performance under other values of $\Delta t$); and (3) $\gamma$, which is set to 0.5. As for the baselines, please refer to [4] for the hyper-parameter settings. Note that all the propagation network-based baselines use the complete propagation network.
   To evaluate the performance of the proposed approach, we adopt the commonly used metrics: (1) Accuracy (Acc); (2) Precision (Prec); (3) Recall (Rec); and (4) F1 Score (F1). Following previous work [11, 4], we randomly choose 75% of the news records for training and the remaining 25% for testing. This process is repeated for 5 different training and test splits, and the average performance is reported.

    ⁵ https://liwc.wpengine.com/

5.2. Results for Fake News Detection

Table 2 shows the results for fake news detection. The proposed approach yields substantially better results for the GossipCop dataset, outperforming the best baseline by as much as 10% in accuracy.
   For the PolitiFact dataset, the proposed approach outperforms all the baselines except dEFEND. However, the result for dEFEND is obtained using the complete propagation network for each news item, while our approach only requires the initial propagation network at the detection deadline. In other words, our method is more suitable for fake news early detection. In addition, dEFEND also extracts rich latent features from the news content, which may be manipulated by intelligent fake news generators to bypass detection, similar to the well-known adversarial attacks against machine learning models.
   Ablation Study. Table 2 shows that without the reconstruction loss proposed in Eq. 1 (i.e., when the model classifies based only on the embedding of the initial propagation network, which is less informative, as shown in Section 3), the accuracy drops by around 3% for both datasets. This clearly indicates the importance of predicting the embedding of the complete propagation network.
   Furthermore, we analyse the contribution of different types of features. It can be seen that Node-level User-based Features are the most important among all Node-level Features, which is due to the high correlation of features like $n_1$ and $n_6$ with the actual news label. In addition, it is clear that Global Features are the least useful, as removing them has the smallest impact on the final result. A possible reason is that most global features adopt a simple sum operation and ignore the network structure. Overall, the removal of each type of feature in the ablation study decreases the final performance of the model, which verifies the positive contribution of the facets of the proposed model.

Table 2
Results for fake news detection of different methods, which are classified under three categories: (1) news record content-based approaches (N); (2) propagation network-based approaches (P); and (3) suitability for early fake news detection (i.e., ability to yield better performance with initial propagation networks) (E).

   Method                            Type     PolitiFact                  GossipCop
                                     N  P  E  Acc    Prec   Rec    F1     Acc    Prec   Rec    F1
   RST                               ✓     ✓  0.607  0.625  0.523  0.569  0.531  0.534  0.492  0.512
   LIWC                              ✓     ✓  0.769  0.843  0.794  0.818  0.736  0.756  0.461  0.572
   text-CNN                          ✓     ✓  0.653  0.678  0.863  0.760  0.739  0.707  0.477  0.569
   HAN                               ✓     ✓  0.837  0.824  0.896  0.860  0.742  0.655  0.689  0.672
   HSA-BLSTM                            ✓     0.846  0.894  0.868  0.881  0.753  0.684  0.662  0.673
   TCNN-URG                          ✓  ✓     0.712  0.711  0.941  0.810  0.736  0.715  0.521  0.603
   CSI                               ✓  ✓     0.827  0.847  0.897  0.871  0.772  0.732  0.638  0.682
   dEFEND                            ✓  ✓     0.904  0.902  0.956  0.928  0.808  0.729  0.782  0.755
   Our Approach                         ✓  ✓  0.874  0.878  0.870  0.871  0.889  0.859  0.805  0.828
   Ablation Study
   (-) Reconstruction Loss                    0.851  0.854  0.855  0.854  0.865  0.846  0.791  0.810
   (-) Global Features                        0.871  0.837  0.867  0.852  0.876  0.851  0.769  0.802
   (-) Node-level Features                    0.722  0.686  0.840  0.751  0.779  0.679  0.669  0.673
   (-) Node-level Text Features               0.840  0.834  0.852  0.841  0.863  0.808  0.772  0.781
   (-) Node-level Temporal Features           0.862  0.854  0.879  0.864  0.878  0.844  0.784  0.804
   (-) Node-level User Features               0.782  0.772  0.815  0.791  0.857  0.806  0.768  0.778

   Parameter Sensitivity. In Figure 6, we have checked
the performance of the proposed model with different configurations for the initial network. As can be seen, if the initial network is too small due to a low $thresh$ or $\Delta t$, the performance drops drastically if the predictions are made using the embedding of the initial networks (i.e., without $L_{recon}$). In contrast, the model performs reasonably well with the predicted embedding of the complete propagation network even with a small initial network size.

6. Related Work

Existing work on fake news detection mainly relies on two sources of information: news content and social context. Based on this criterion, we classify prior work into two categories: content-based and context-based.

6.1. Content-based Fake News Detection
This type of method uses news headlines and body content to detect fake news. The content here is not limited to text, but can also include visual information. For example, Wang et al. [20] extract both text and visual features from posts to train a fake news detector and an event discriminator simultaneously. Other work that applies multi-modal techniques includes [21, 22, 23]. In addition to content, writing style can help differentiate between fake and real news, since fake news aims to mislead the public and often exhibits distinct writing styles [24]. Furthermore, the idea of knowledge-based detection is discussed in [2].

6.2. Context-based Fake News Detection
Social context here refers to the interactions between users. These engagements can be transformed into different types of graphs to facilitate fake news detection. For example, a range of models have been applied to study the propagation patterns, including Propagation Tree Kernel [7], LSTM cells incorporated with RNNs [8], and GNNs [3]. Other methods that fall into this category include [25, 11, 10]. Our method is also context-based, although it only relies on the partial propagation network for fake news early detection.
   In addition to the above two categories, a number of methods use a mixed strategy and rely on both news content and associated user interactions over social media to detect fake news [6, 26, 27].

7. Conclusions and Future Work
In this work, we have designed a novel representation learning framework for fake news early detection, by embedding news propagation networks using both global-level and node-level attributes. Subsequently, we propose to train an autoencoder to predict the embedding of the complete propagation network using the partial network at an early stage. We demonstrate that the predicted embedding for the complete propagation network can achieve better results for fake news early detection.
   For future work, we intend to work on the following directions: (1) Our empirical studies show that some network attributes carry domain-specific relations with the news labels. Therefore, a model trained on the dataset
from one domain using these features may perform very poorly for data from other domains. In order to solve this problem, we will study how to extend the proposed approach in a domain-agnostic manner. (2) We have not considered other attributes of news records such as the textual and image content. Hence, the integration of features from these modalities with the proposed framework can be another direction to explore.

References

 [1] J. D. keersmaecker, A. Roets, ‘Fake news’: Incorrect, but hard to correct. The role of cognitive ability on the impact of false information on social impressions, Intelligence (2017).
 [2] X. Zhou, R. Zafarani, Fake news: A survey of research, detection methods, and opportunities, arXiv:1812.00315 (2018).
 [3] F. Monti, F. Frasca, D. Eynard, D. Mannion, M. M. Bronstein, Fake news detection on social media using geometric deep learning, arXiv:1902.06673 (2019).
 [4] K. Shu, L. Cui, S. Wang, D. Lee, H. Liu, DEFEND: Explainable fake news detection, in: Proc. of KDD, 2019.
 [5] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: A data mining perspective, SIGKDD Explor (2017).
 [6] N. Ruchansky, S. Seo, Y. Liu, CSI: A hybrid deep model for fake news detection, in: Proc. of CIKM, 2017.
 [7] J. Ma, W. Gao, K.-F. Wong, Detect rumors in microblog posts using propagation structure via kernel learning, in: Proc. of ACL, 2017.
 [8] L. Wu, H. Liu, Tracing fake-news footprints: Characterizing social media messages by how they propagate, in: Proc. of WSDM, 2018.
 [9] Y. Liu, Y.-f. B. Wu, Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks, in: Proc. of AAAI, 2018.
[10] X. Zhou, R. Zafarani, Network-based fake news detection: A pattern-driven approach, arXiv:1906.04210 (2019).
[11] K. Shu, D. Mahudeswaran, S. Wang, H. Liu, Hierarchical propagation networks for fake news detection: Investigation and exploitation, arXiv:1903.09196 (2019).
[12] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, H. Liu, FakeNewsNet: A data repository with news content, social context and spatialtemporal information for studying fake news on social media, arXiv:1809.01286 (2018).
[13] K. Shu, G. Zheng, Y. Li, S. Mukherjee, A. Hassan Awadallah, S. Ruston, H. Liu, Leveraging multi-source weak social supervision for early detection of fake news, arXiv:2004.01732 (2020).
[14] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, P. S. Yu, A comprehensive survey on graph neural networks, arXiv:1901.00596 (2019).
[15] V. Rubin, N. Conroy, Y. Chen, Towards news verification: Deception detection methods for news discourse, in: Proc. of HICSS, 2015.
[16] Y. Kim, Convolutional neural networks for sentence classification, in: Proc. of EMNLP, 2014.
[17] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, E. Hovy, Hierarchical attention networks for document classification, in: Proc. of NAACL-HLT, 2016.
[18] H. Guo, J. Cao, Y. Zhang, J. Guo, J. Li, Rumor detection with hierarchical social attention network, in: Proc. of CIKM, 2018.
[19] F. Qian, C. Gong, K. Sharma, Y. Liu, Neural user response generator: Fake news detection with collective user intelligence, in: Proc. of IJCAI, 2018.
[20] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, J. Gao, EANN: Event adversarial neural networks for multi-modal fake news detection, in: Proc. of KDD, 2018.
[21] Z. Jin, J. Cao, Y. Zhang, J. Zhou, Q. Tian, Novel visual and statistical image features for microblogs news verification, IEEE Trans. on Multimedia (2017).
[22] Y. Yang, L. Zheng, J. Zhang, Q. Cui, Z. Li, P. S. Yu, TI-CNN: Convolutional neural networks for fake news detection, arXiv:1806.00749 (2018).
[23] X. Zhou, J. Wu, R. Zafarani, SAFE: Similarity-aware multi-modal fake news detection, in: Proc. of PAKDD, 2020.
[24] S. Volkova, K. Shaffer, J. Y. Jang, N. Hodas, Separating facts from fiction: Linguistic models to classify suspicious and trusted news posts on twitter, in: Proc. of ACL, 2017.
[25] S. Yang, K. Shu, S. Wang, R. Gu, F. Wu, H. Liu, Unsupervised fake news detection on social media: A generative approach, AAAI (2019).
[26] J. Zhang, B. Dong, P. S. Yu, FAKEDETECTOR: Effective fake news detection with deep diffusive neural network, arXiv:1805.08751 (2018).
[27] K. Shu, S. Wang, H. Liu, Beyond news contents: The role of social context for fake news detection, in: Proc. of WSDM, 2019.