False News Classification and Dissemination: The Case of the 2019 Indonesian Presidential Election

Rayan Suryadikara, Suzan Verberne, Frank W. Takes
r.suryadikara@umail.leidenuniv.nl, s.verberne@liacs.leidenuniv.nl, takes@liacs.nl

Abstract

In this paper we investigate automated methods for understanding false news dissemination on Twitter in relation to one particular event: the 2019 Indonesian presidential election. We collected a sample of 2,360 tweets related to topics addressed by fact-checking websites. The tweets were hand-labeled according to their trustworthiness. We trained several classification models on the human-labelled data, using three groups of text features. The word n-gram features appeared to be the most effective, reaching a recall of 85% for true news and 62% for false news. With this classifier we labeled a larger sample of tweets related to fact-checking topics in the context of the 2019 Indonesian presidential elections. We then analysed the dissemination of true news and false news in the underlying Twitter network using community detection and centrality measures. The top influential users in the network disseminate more false news, including a government institution account and a verified politician's account. Our results show that the combination of text features and social network analysis can provide valuable insights in detecting and preventing the dissemination of false news. Moreover, we make the dataset used in this research available for reuse by the community.

Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org). Title of the Proceedings: "Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland". Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi.

1 Introduction

A recent study strictly defined fake news as news articles that are intentionally and verifiably false and could therefore mislead readers [2]. In a political context the definition can be considered somewhat wider. One study argues that politicians tend to label any news source that does not support their position as fake news [23]. This is especially common in the context of a large political event, e.g., an election. For example, there was an allegation that Joko Widodo was both a communist and Chinese in the 2014 Indonesian presidential election [10]. In this paper, we focus on the 2019 presidential election in Indonesia.

Social media flourish as an alternative information source, in particular during elections, when many politicians use social media as a means to reach out to the public more directly. Politicians prefer Twitter because of its efficiency in spreading messages, sparking conversations, building public opinion, and gaining support [19]. Especially in volatile political times, there are so-called buzzer teams that attempt to amplify messages and create a "buzz" on social networks to spread positive content about one side of the political spectrum, while disseminating negative content about the other [11]. Hashtags are often used to increase the visibility of such messages to Indonesian Twitter users, and these hashtags often become trending topics that then gain even more attention [11].

Because of these problems and their political impact, there is an urgent need to automatically identify and analyze false news in social media. This process could then result in the identification of the actors involved, as well as the networks through which they disseminated the false news. This research studies how false news can be detected based on the content of the messages posted, and then analyses its dissemination using social network analysis. The particular case that is considered is the 2019 Indonesian presidential election on Twitter, for which data was manually gathered and labeled for this study.

The contributions of this paper are:

• A new hand-labeled dataset of 2,360 tweets for the detection of false news in the Indonesian language;

• A method based on word features that can reasonably distinguish true news and false news in this data;

• An analysis of how true news and false news disseminate in the Twitter network related to the 2019 Indonesian elections, and what role particular communities, accounts, and hashtags play in the dissemination of false news.

The remainder of the paper is organized as follows. In Section 2 we discuss related work. In Section 3 we introduce the data and the annotation process. In Section 4 we present the methods we use, followed by experimental results in Section 5. Finally, the conclusions of the research are outlined in Section 6.

2 Related Work

In this section, we discuss work on false news on social media as well as methods for identifying this false news.

A recent study examined fake news from a political perspective, inspired by the 2016 US presidential elections [2]. They differentiated fake news from its close cousins in the political domain: unintentional reporting mistakes, rumors, conspiracy theories, satires, false statements by politicians, and slanted or misleading reports. The nature of the political world itself, in which a great number of critical reports have been discredited as fake news, has led to a redefinition of the fake news that spreads on social media [2]. A relevant study by Vosoughi et al. [23] focused on the veracity of Twitter posts that turned out to be true or false. In addition, they defined news (either true or false) as any story or claim with an assertion in it, especially on social media. This extends the scope of the definition of false news beyond its 'intentional' character, allowing the aforementioned close cousins of fake news [2] to be incorporated into a single term. Therefore, the term 'false news' is used throughout this paper to cover fake news and its close cousins.

In the field of text classification for the Indonesian language, most research focuses on hate speech identification. One of the first studies on Indonesian hate speech used multiple text features (character n-grams and negative sentiment) and classifiers (Naive Bayes, SVM, and Random Forest) [1]. This research and dataset were later expanded with abusive language and with the target and level of hate speech [8]. However, no research has yet been conducted on detecting false news in the Indonesian language, even though false news is usually associated with hate speech.

A study analyzed Australia's Department of Immigration and Citizenship (DIAC) Twitter data to identify topics in the DIAC Twitter account and the spread of tweets, particularly the most retweeted ones [26]. Another study extended this type of analysis by taking the mention feature into account and performing term co-occurrence analysis for the Korean presidential election on Twitter [18], showing that the actual political situation can be analysed from the social network. Other work built a hashtag co-occurrence graph [24] to discover semantic relations between words in a tweet.

Another study [7] used community detection to investigate filter bubble effects, which tend to be generated by recommender systems that personalize and filter tweets. Regarding influential actors in a network, a recent study on the 2014 Malaysian floods [14] used betweenness centrality to identify potentially key Twitter users during information dissemination. Other studies analyse false news based on the impact of emotion [5] or the profiling of Twitter users [4].

While these works analyse filter bubbles or influential users, our study uses actual true news and false news labels of news messages to assess which type of news circulates inside certain communities and/or is spread by particular influential actors.

3 Data

3.1 Data collection

For crawling tweets we use the GetOldTweets library^1 to bypass the limitations of the official Twitter API. This allows us to download historical Twitter data within a specific date range for a particular query. The queries we used for crawling Twitter data are built on topics that were published by two Indonesian fact-checking websites^2. The tweets are in the Indonesian language. We gathered data from the first day of the 2019 Indonesian presidential campaign (September 23, 2018) to a week after the election result was publicized (May 28, 2019).

We selected 281 topics related to the presidential elections from the above-referenced fact-checking websites, with their corresponding supporting URLs. For each topic we created a query. For example, for the supporting URL that examines whether the 23 European Union ambassadors support Prabowo-Sandi or not^3, we used the topic "European Ambassadors Support Prabowo" as the query to extract the relevant tweets.

To ensure alignment between the extracted tweets and the supporting URL, tweets from the first time the news aired on social media until its seventh day are selected. After removal of duplicate tweets, this resulted in a set of 8,784 tweets for the 281 topics. For annotation, tweets with one retweet, one like, and one reply or fewer are removed, resulting in a set of 2,360 tweets that we use for annotation.

Footnotes:
^1 https://github.com/Jefferson-Henrique/GetOldTweets-python
^2 https://cekfakta.tempo.co/ (Cek Fakta Tempo from Tempo) and https://turnbackhoax.id/ (Turn Back Hoax from Mafindo)
^3 https://cekfakta.tempo.co/fakta/111/fakta-atau-hoax-benarkah-23-dubes-uni-eropa-dukung-prabowo-sandi, determined to be false news
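To illustrate the crawling setup, below is a minimal sketch of how historical tweets could be collected for one fact-checking topic. It assumes the GetOldTweets-python interface referenced in footnote 1 (a manager module with TweetCriteria and TweetManager); the query string and date range follow the description above, but the exact API and tweet fields may differ from this sketch depending on the library version.

```python
# Sketch: collecting historical tweets for one fact-checking topic with
# GetOldTweets-python (footnote 1). The manager/TweetCriteria interface is
# assumed from that library's documentation; adjust to the installed version.
import csv
import got  # GetOldTweets-python exposes its API through the `got` package

def crawl_topic(topic_query, since="2018-09-23", until="2019-05-28"):
    criteria = (got.manager.TweetCriteria()
                .setQuerySearch(topic_query)   # e.g. "European Ambassadors Support Prabowo"
                .setSince(since)               # first day of the campaign
                .setUntil(until))              # a week after the result was publicized
    return got.manager.TweetManager.getTweets(criteria)

if __name__ == "__main__":
    tweets = crawl_topic("European Ambassadors Support Prabowo")
    with open("tweets_topic_001.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "date", "text", "retweets", "favorites"])
        for t in tweets:
            # attribute names follow the library's Tweet model (assumed)
            writer.writerow([t.username, t.date, t.text, t.retweets, t.favorites])
```

In this sketch, one CSV file per topic keeps the link between each tweet and its supporting fact-checking URL, which is needed for the seven-day filtering and the annotation described below.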
3.2 Annotation

We recruited 10 native Indonesian speakers to annotate the data. To facilitate impartiality, none of them has a political job or a political affiliation, or belongs to a political party. With 2,360 tweets in the original data set and two annotators per tweet, each annotator had to label 472 tweets.

The information provided to the annotators was the topic, the supporting URL, and the tweet text. One topic is linked to one supporting URL and to multiple tweets. We wrote an extensive annotation guideline for Indonesian false news and validated it in several short iterations before starting the actual annotation process.^4

Annotators were asked to assign one of four classes to each tweet:

• True: Tweets that relate to the topic and are true or accurate according to the supporting URLs;

• False: Tweets that relate to the topic and are false or inaccurate according to the supporting URLs;

• Misleading: Tweets that relate to the topic and have accurate information according to the supporting URLs but lead to wrong conclusions;

• Other: Tweets that do not relate to the topic or are not discussed within the supporting URLs.

While misleading news is sometimes considered a subset of false news, we decided to treat it as a separate class for text classification. According to [21], misleading news tends to use correct facts and data, but the way the news is delivered or the way conclusions are drawn is false and therefore leads to the wrong interpretation. This is consistent with other definitions in which misleading news conveys false impressions through topic changes, irrelevant information, and equivocations that mislead the audience [22].

The annotation process was conducted in two stages. In the first stage, two annotators annotated the data. In the second stage, a third annotator (the first author of this paper) acted as a final judge for any tweet on which the two previous annotators disagreed.

We analyzed the inter-rater reliability of the annotated data using Cohen's κ. Out of 10 annotator pairs, there are five pairs with moderate agreement (κ = 0.41–0.60), four pairs with fair agreement (κ = 0.21–0.40), and one pair with slight agreement (κ = 0.01–0.20). The highest κ score is 0.52 and the lowest is 0.07. As a whole, we obtain fair agreement with a mean κ of 0.33. The statistics of the annotated data are outlined in Table 1.

Table 1: Size of the 2019 Indonesian presidential election news data set for annotation.

Class             Count
True News           896
False News          648
Misleading News     189
Other               627
Total             2,360

Footnotes:
^4 The annotation guideline can be found here: https://github.com/rayansuryadikara/false_news_detection_and_dissemination_analysis
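As an illustration of the agreement computation, the sketch below shows how Cohen's κ could be computed per annotator pair with scikit-learn; the file name and column names are hypothetical placeholders for however the two annotations of each tweet are stored.

```python
# Sketch: per-pair inter-rater agreement with Cohen's kappa.
# The CSV layout (columns "pair_id", "annotator_a", "annotator_b") is a
# hypothetical placeholder; one row corresponds to one doubly-annotated tweet.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

annotations = pd.read_csv("annotations.csv")

kappas = {}
for pair_id, group in annotations.groupby("pair_id"):
    kappas[pair_id] = cohen_kappa_score(group["annotator_a"], group["annotator_b"])

for pair_id, kappa in sorted(kappas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"pair {pair_id}: kappa = {kappa:.2f}")
print(f"mean kappa = {sum(kappas.values()) / len(kappas):.2f}")
```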
3.3 Network Data

We extract two different networks from our Twitter collection of 8,784 tweets. The first is the mention network. In the literature, it is suggested that mentioning other usernames in a tweet represents a more direct form of communication than what is obtained from a network based on follower connections [18]. The second network that we create is the hashtag co-occurrence network. The frequency of use of a hashtag indicates its popularity. In the 2019 Indonesian presidential election, certain hashtags were created to support or oppose certain figures, such as #jokowiamin to support Joko Widodo, the incumbent, and #2019gantipresiden ("2019 change the president") to support Prabowo, the challenger.

The mention network is a weighted directed network in which posting usernames are defined as the source and mentioned usernames as the target of a directed link. Link weight is determined by how many times the source username mentions the target username. The hashtag co-occurrence network is a weighted undirected network in which two hashtags are connected if they occur together in a tweet. Link weight is determined by counting how many times the tags co-occur.

In our experiments, we visualize the two networks to analyse how true news and false news spread in a presidential election setting. For the network data, misleading news is merged under false news to keep the analysis straightforward and to simplify the contrasting visualization between true news and false news. In doing so, we actually model both networks as a multigraph in which two nodes can be connected based on how often they communicate or co-occur in both true and false news.
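To make the two network definitions concrete, here is a minimal sketch of how they could be built with NetworkX from the labelled tweets. The tweet fields (user, mentions, hashtags, label) are hypothetical placeholders for whatever structure the crawled data actually has; misleading tweets are assumed to already be merged under the false label.

```python
# Sketch: building the mention multigraph and hashtag co-occurrence multigraph
# described above. Each tweet is assumed to be a dict with the (hypothetical)
# keys "user", "mentions", "hashtags", and "label" ("true" or "false").
from itertools import combinations
import networkx as nx

def build_networks(tweets):
    mention_net = nx.MultiDiGraph()   # directed: poster -> mentioned user
    hashtag_net = nx.MultiGraph()     # undirected: hashtag -- hashtag

    for tweet in tweets:
        for mentioned in tweet["mentions"]:
            # one edge per mention, carrying the news label; parallel edges let
            # true and false news between the same pair of nodes coexist
            mention_net.add_edge(tweet["user"], mentioned, label=tweet["label"])
        for tag_a, tag_b in combinations(sorted(set(tweet["hashtags"])), 2):
            hashtag_net.add_edge(tag_a, tag_b, label=tweet["label"])

    return mention_net, hashtag_net
```

The link weights from the definitions above then correspond to the number of parallel edges between two nodes, split by label when the true and false views of the network are contrasted.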
4 Methods

In this section, we first present our text classification methods, using three different content-based feature sets (Section 4.1) and voting ensembles to combine the feature representations. Next, we present the network analysis methods that we use to analyze the dissemination of true and false news in the Twitter network (Section 4.2).

4.1 Text classification

Features. For the content-based classification, we compare three types of features: orthography features, sentiment lexicon features, and word n-grams.

Social media such as Twitter are a common example of text in which the conventions of orthography are sometimes lacking [6]. Therefore, orthography patterns are commonly used for social media analysis [8, 17]. We define five orthography features: counts of exclamation marks (E), question marks (Q), uppercase letters (U), lowercase letters (L), and emojis (M).

For sentiment features, we use the Indonesian Sentiment Lexicon (InSet) [9], which comprises 3,609 positive words and 6,609 negative words^5. The sentiment scores range from -5 to 5, where negative scores indicate negative words and positive scores indicate positive words. Words with score 0 are disregarded since the lexicon excludes a neutral category. Along with InSet, we use an Indonesian abusive lexicon [8], which comprises 126 words that are considered abusive.^6 Thus, we have three sentiment lexicon features: the positive word count (P), the negative word count (N), and the abusive word count (A). Before applying the sentiment lexicons, we apply stop word removal and text normalization^7. The stop words dictionary is adopted from [20].^8 The text normalization dictionary comprises 11,034 terms which are mapped to a normalized form. The dictionary is a continuous, collective effort of research [1, 8, 16] on the Indonesian language. In addition to lemmatization, the dictionary also covers Indonesian abbreviations, slang, misspelled words, and even political figures' names. Therefore, the normalized form often consists of more than one word.

For the word n-gram features, the text was lowercased, and URLs and punctuation were removed. For mentioned usernames and hashtags, we removed the @ and # symbols while keeping the usernames and the hashtag words themselves, because both are instrumental parts of tweets for identifying and distinguishing them [13, 15]. Some of the usernames and hashtags are also included in the text normalization dictionary and are therefore normalized as well. We used six subsets of word n-grams to create vocabularies: unigram, bigram, trigram, uni-bigram, bi-trigram, and uni-bi-trigram. In all n-gram feature sets we use tf-idf as term weight.

Footnotes:
^5 https://github.com/fajri91/InSet
^6 https://github.com/okkyibrohim/id-multi-label-hate-speech-and-abusive-language-detection
^7 https://github.com/okkyibrohim/id-multi-label-hate-speech-and-abusive-language-detection/blob/master/new_kamusalay.csv
^8 https://github.com/stopwords-iso/
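The three feature groups can be extracted roughly as in the sketch below. The lexicon variables and the emoji pattern are simplified placeholders for the InSet and abusive lexicons (footnotes 5-7), and the n-gram range shown corresponds to the uni-bigram variant.

```python
# Sketch: the three feature groups described above. The lexicons
# (positive_words, negative_words, abusive_words) and the emoji regex are
# simplified stand-ins for the resources referenced in footnotes 5-7.
import re
from sklearn.feature_extraction.text import TfidfVectorizer

EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def orthography_features(text):
    # counts of exclamation marks (E), question marks (Q), uppercase (U),
    # lowercase (L), and emojis (M)
    return [text.count("!"), text.count("?"),
            sum(c.isupper() for c in text), sum(c.islower() for c in text),
            len(EMOJI.findall(text))]

def lexicon_features(tokens, positive_words, negative_words, abusive_words):
    # P, N, A counts on normalized, stop-word-filtered tokens
    return [sum(t in positive_words for t in tokens),
            sum(t in negative_words for t in tokens),
            sum(t in abusive_words for t in tokens)]

# Word n-grams with tf-idf weighting; ngram_range=(1, 2) is the uni-bigram set.
ngram_vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
# X = ngram_vectorizer.fit_transform(cleaned_tweet_texts)
```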
Classification models. We used the same classifiers as prior work on Indonesian text classification [1, 8]: Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) with SGD optimization [25], and Random Forest (RF), all implemented in Scikit-learn. We used the default hyperparameter settings for each classifier. For SVM, this means that C = 1. For RF, the number of estimators is 100 with no maximum depth for the trees. The final precision and recall scores of each text feature set are the average scores of these three classifiers. F1 scores are then calculated from the averaged precision and recall scores.

Voting ensembles. We combined the results of the experiments with the different text features. The final precision, recall, and F1 scores of each ensemble follow the same approach as for the individual text feature sets after the voting ensemble is applied. We use majority voting: the numbers of votes for each label are compared and the label with the most votes is selected. If there is no single label with the most votes, the class is determined by the text feature that has the best individual performance. We construct two different ensembles: Ensemble I is built from all combinations of each feature, and Ensemble II is built from the best combination of each feature.
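A minimal sketch of this setup is shown below. It assumes a feature matrix and labels produced by the feature extraction step, and the tie-breaking rule (falling back to the best-performing feature set) is simplified; the dictionary keys are illustrative names, not the exact identifiers used in the experiments.

```python
# Sketch: the three scikit-learn classifiers with default settings, plus a
# simple majority vote over per-feature-set predictions.
from collections import Counter
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "mnb": MultinomialNB(),
    "svm_sgd": SGDClassifier(),   # default hinge loss, i.e. a linear SVM trained with SGD
    "rf": RandomForestClassifier(n_estimators=100, max_depth=None),
}

def majority_vote(predictions_per_feature_set, best_feature_set):
    """predictions_per_feature_set: dict feature_set -> list of predicted labels."""
    n_items = len(next(iter(predictions_per_feature_set.values())))
    voted = []
    for i in range(n_items):
        votes = Counter(preds[i] for preds in predictions_per_feature_set.values())
        (top_label, top_count), *rest = votes.most_common()
        if rest and rest[0][1] == top_count:
            # tie: fall back to the prediction of the best-performing feature set
            voted.append(predictions_per_feature_set[best_feature_set][i])
        else:
            voted.append(top_label)
    return voted
```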
4.2 Social network analysis

We aim to analyse how true news and false news spread between actors in the two networks described in Section 3.3. For visualization, we use Gephi [3], an open-source tool for social network analysis. While we do not directly model the precise diffusion of the news as the network evolves, we believe that the two methods below provide crucial insights into the reach of different types of news and the network effects involved in the process.

Community detection is a method for partitioning the network into communities: groups that are more tightly connected internally, with fewer connections to other communities. Here, we use the well-known Louvain modularity maximization algorithm, and use the resulting communities to assess the potential of filter bubble effects [7]. Filter bubbles are a phenomenon in which a person is exposed to ideas, people, facts, or news that adhere to or are consistent with a particular political or social ideology, leaving alternative ideas unconsidered and in some cases outright rejected [12]. We systematically inspect every community to see which type of news circulates in that community.

Centrality measures assign a ranking to nodes in a network based on their topological position in the network. Here, we choose betweenness centrality to identify the most influential nodes. Betweenness centrality measures, for a particular node, how many shortest paths between other pairs of nodes run through that node. For the mention network, a node (username) with high betweenness therefore acts as an important hub in receiving and spreading information to other nodes [14]. On an individual node level, betweenness centrality thus captures information from neighboring users who both consume and generate false news. For the hashtag co-occurrence network, a high-betweenness hashtag is likewise an important hub that frequently co-occurs with many other hashtags.
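A sketch of both analyses with NetworkX is given below. It assumes the multigraph built in the earlier network-construction sketch, and a recent NetworkX version that ships a Louvain implementation (louvain_communities, available from NetworkX 2.8 onward); with older versions the python-louvain package offers an equivalent partitioning function.

```python
# Sketch: Louvain community detection and betweenness centrality on the
# mention network built earlier. The multigraph is first collapsed into a
# simple weighted graph, where an edge weight is the number of mentions.
import networkx as nx
from networkx.algorithms import community

def communities_and_hubs(mention_net, top_k=10):
    simple = nx.DiGraph()
    for u, v in mention_net.edges():
        current = simple.get_edge_data(u, v, default={"weight": 0})["weight"]
        simple.add_edge(u, v, weight=current + 1)

    # Louvain modularity maximization on the undirected view of the network.
    parts = community.louvain_communities(simple.to_undirected(), weight="weight")

    # Betweenness centrality to rank the most influential usernames.
    betweenness = nx.betweenness_centrality(simple)
    hubs = sorted(betweenness, key=betweenness.get, reverse=True)[:top_k]
    return parts, hubs
```

Per community, the true and false edge labels of its members can then be counted to produce the distributions reported in Section 5.2.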
5 Results and analysis

We first present results on the comparison of the effectiveness of the three different text feature types (Section 5.1). After finding the most effective text features, we investigate the dissemination of true and false news using the network analysis methods (Section 5.2).

5.1 Results – text classification

Experimental settings. We evaluate our classifiers in two different experimental settings. The first setting is the data set with three classes, namely True News, False News, and Misleading News. The second setting is the data set with five classes: True, False, Misleading, Other, and Unclear (the latter for tweets where the three annotators all assigned a different label). While the 3-class setting is easier for the classifier to learn, the 5-class setting is more realistic because it also includes the tweets that are irrelevant but will occur in a real Twitter stream. We used a fixed random train–test split of the data for evaluation of the models, with 20% of the data for testing.

Comparison of feature sets. We find that in the 3-class classification, the best n-gram feature set is the combination of unigrams and bigrams; in the 5-class classification the best n-gram feature set is the use of bigrams alone. The best orthography feature set for the 3-class classification is the set with counts of exclamation marks, question marks, lowercase letters, and emojis; for the 5-class classification, using the uppercase letter count instead of the question mark count is most effective. Of the sentiment lexicons, the combination of positive and negative sentiment word counts gives the best results in both settings. The ensemble of the best feature combinations performed best in the 3-class setting, while the ensemble of all feature combinations performed best in the 5-class setting. We compare the best feature combination for each feature type in Table 2.

Table 2: Comparison of all text feature sets plus the ensemble methods. For each text feature type in each classification setting, only the most effective feature combination is shown. The evaluation scores are average scores over the three classifiers (NB, SVM, RF).

                            True News            False News           Misleading News
            Features        P     R     F1       P     R     F1       P     R     F1
3 Classes   Uni-bigram      0.730 0.903 0.807    0.811 0.692 0.747    0.830 0.246 0.380
            EQLM            0.374 0.512 0.432    0.437 0.523 0.476    0.133 0.079 0.099
            PN              0.552 0.836 0.665    0.299 0.221 0.254    0.064 0.044 0.052
            Ensemble II     0.671 0.899 0.768    0.796 0.569 0.664    0.643 0.237 0.346
5 Classes   Bigram          0.562 0.790 0.657    0.707 0.621 0.661    0.683 0.263 0.380
            EULM            0.354 0.285 0.316    0.308 0.528 0.389    0.051 0.035 0.042
            PN              0.414 0.786 0.542    0.455 0.179 0.257    0.077 0.070 0.074
            Ensemble I      0.551 0.849 0.668    0.638 0.569 0.602    0.471 0.211 0.291

The table shows that the n-gram features outperform the orthography and sentiment lexicon features in each setting and for each class. The ensemble methods are also not able to improve over the n-gram features alone. Nevertheless, the ensembling method makes it possible to include orthography and sentiment lexicon features in the text classification with better performance than when they are used independently, which is especially relevant for social media text.

Final quality of text classification. With the best text features in the 5-class setting (which is more difficult, but also more realistic than the 3-class setting), we obtain precision scores of 55% for true news, 71% for false news, and 68% for misleading news. Recall is 85% for true news, 62% for false news, and 26% for misleading news. The low recall for misleading news is caused by the small number of items in this category.

We then labelled the full collection of 8,784 tweets: the unannotated part of the data (6,424 tweets) is labelled by the SVM classifier with SGD optimization in the 5-class setting with the best-performing feature set (word bigrams). We perform the social network analysis on this automatically labelled dataset, which we discuss in the next section.

5.2 Results – social network analysis

Table 3 shows the counts of nodes and edges (for the full network, and for true and false news) in the labelled Twitter networks. The last line of the table shows the number of communities. For the 10 largest communities, the distribution of true and false news by community as well as the top 10 influential actors are shown in Figures 1 and 2 for the mention network and in Figures 5 and 6 for the hashtag co-occurrence network.

Table 3: Network data properties.

Statistics            Mention Network   Hashtag Co-occurrence
# Nodes                         1,891                   1,302
# Edges                         2,582                   4,315
# True news edges                 841                   2,213
# False news edges              1,043                   1,655
# Communities                     165                     133

The distributions are stacked columns of true news and false news, listing the number of nodes and edges in each discovered community or for each actor (usernames for the mention network and hashtags for the hashtag co-occurrence network). True news is shown in blue and false news in orange.

Figure 1: Distribution of true news and false news - top 10 communities of the mention network.
Figure 2: Distribution of true news and false news - top 10 influential usernames of the mention network.

In the network visualizations, communities are represented by colours and betweenness centrality determines node size, as shown in Figures 3 and 4 for the mention network and Figures 7 and 8 for the hashtag co-occurrence network. Each visualization is created by taking the ego network of the selected username or hashtag at level 1, i.e., restricted to its direct connections.

Figure 3: Network of bawaslu_ri - true news dissemination.
Figure 4: Network of bawaslu_ri - false news dissemination.
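As an illustration, the sketch below extracts such a level-1 ego network for one account and one news label before exporting it for visualization in Gephi. The account name matches the example in Figures 3 and 4, and the label attribute follows the network-construction sketch given earlier; the export file name is a hypothetical placeholder.

```python
# Sketch: level-1 ego network of one account, restricted to edges with a
# given news label. Assumes the MultiDiGraph built in the earlier
# network-construction sketch, with a "label" attribute on every edge.
import networkx as nx

def ego_by_label(mention_net, ego="bawaslu_ri", label="false"):
    # Keep only the edges that carry the requested label.
    keep = [(u, v, k) for u, v, k, d in mention_net.edges(keys=True, data=True)
            if d.get("label") == label]
    labelled = mention_net.edge_subgraph(keep)
    # Level-1 ego network: the ego and its direct connections (both directions).
    return nx.ego_graph(labelled, ego, radius=1, undirected=True)

# Example export for Gephi:
# nx.write_gexf(ego_by_label(mention_net, "bawaslu_ri", "false"), "bawaslu_ri_false.gexf")
```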
Mention network. Based on the analysis of the mention network for the 2019 Indonesian presidential elections on Twitter, we find that:

• False news is more prevalent in the largest communities and is also disseminated and received more by the top influential usernames. However, there are still more communities with a balanced proportion of true news and false news. Many news source accounts are found in these balanced communities.

• While the proportions of true news and false news are quite balanced in general, some usernames show a very strong tendency towards false news over true news, in particular a verified government institution account, bawaslu_ri (shown in Figures 3 and 4), and two unverified accounts, caknur14 and hamaro_id. One predominantly "true news" username is cnnindonesia, which is a verified news source account.

• Verified accounts tend to spread more false news than true news: three of the top four influential usernames disseminate more false news than true news. The two largest, bawaslu_ri^9 (shown in Figures 3 and 4) and gunromli^10, are verified and politically related accounts.

• One of the top "true news" influential usernames is divhumas_polri^11. This is to be expected, since the police have a cyber division dedicated to fighting back against hoaxes.

Footnotes:
^9 The official account of an Indonesian government institution.
^10 The official account of an Indonesian politician.
^11 The official account of the police force of the Republic of Indonesia.

Figure 5: Distribution of true news and false news - top 10 communities (by size) of the hashtag co-occurrence network.
Figure 6: Distribution of true news and false news - top 10 influential hashtags of the hashtag co-occurrence network.

Hashtag co-occurrence network. Based on the analysis of the hashtag co-occurrence network for the 2019 Indonesian presidential elections on Twitter, the most interesting findings are:

• True news is more strongly associated with the top influential hashtags.

• False news is more strongly associated with sentiment-induced hashtags than with hashtags about events or occurrences. Examples are 2019gantipresiden ("2019 change the president", shown in Figures 7 and 8), indonesianeedsprabowo, and 01jokowilagi ("01 Jokowi again"), which express support for either of the two candidates. These results confirm the finding of previous work [5] that emotions are important in detecting false information.

• There is one community (Community 3) in which only false news circulates. This community is filled with many slandering hashtags aimed at the incumbent Jokowi, such as jaekingoflies ("Jae" being a derogatory title for Jokowi), jaengibuldimanalagi ("Where does Jae lie again"), and uninstalljaenow. However, none of them is a hashtag with enough influence.

• The influential hashtags that lean towards "true news" are very general terms that are not directly about the presidential election, such as hoax and Indonesia. The hashtag hoax is especially noteworthy because tweets that include this hashtag mostly warn that the topic is a hoax, thereby fighting back against the hoax, and are therefore categorized as true news. This particular hashtag was also discussed in the annotation guideline.

Figure 7: Network of 2019gantipresiden - true news dissemination.
Figure 8: Network of 2019gantipresiden - false news dissemination.

The mention-based network shows that the influential users not only receive more false news, but also spread it. These usernames include both unverified and verified accounts, with the top two influential usernames being verified and "false news" inclined. This indicates that accounts with a verification mark are not necessarily free of hoaxes.

Meanwhile, the hashtag-based network shows that supportive or sentiment-induced hashtags tend to relate more to false news than hashtags about more general events or terms. This indicates that these hashtags are more prone to information bias. This holds especially for the supportive hashtags for each candidate, where users show fanatic support and also attack the opposing candidate, often with false information.

As a reminder, these results illustrate the circumstances of the 2019 Indonesian presidential election on Twitter. Furthermore, the news topics were selected based on fact-checking websites, which confirm whether circulating, trending topics on social media are true or false.

6 Conclusions

In this paper we trained classifiers for detecting false news on Twitter and we analysed its dissemination related to the 2019 Indonesian presidential elections. We created a labelled dataset for true, false, and misleading news that we publish for use by other researchers.^12

We found that the most prominent text feature for detecting and distinguishing true news, false news, and misleading news is word n-grams, in particular unigrams and bigrams. We also experimented with orthography features and sentiment features, but those did not improve over the n-gram baseline. Nevertheless, the ensemble method offers the possibility to include and further refine these two feature types in future research.

From the social network analysis perspective, we found that the largest communities with top influential usernames tend to have more false news circulating than true news. Some of these influential users are also verified accounts. Regarding the hashtags, the hashtags that relate to explicit support of an election candidate occur more in false news messages than hashtags related to general events. These supportive or favouring hashtags tend to contain names or have strong sentiments.

In the case of the 2019 Indonesian presidential election, our results show that the combination of text features with social network analysis can provide valuable insights for the study of false news on social media. We hope that these findings pave the way for not only detecting but also preventing the dissemination of false news in elections.

Footnotes:
^12 The URL of the data repository will be added after anonymous peer review.
References

[1] I. Alfina et al. Hate Speech Detection in The Indonesian Language: A Dataset and Preliminary Study. 2017 International Conference on Advanced Computer Science and Information Systems, 233–238, October 2017.

[2] H. Allcott and M. Gentzkow. Social Media and Fake News in The 2016 Election. Journal of Economic Perspectives, 31(2):211–236, May 2017.

[3] M. Bastian, S. Heymann, and M. Jacomy. Gephi: An Open Source Software for Exploring and Manipulating Networks. Third International AAAI Conference on Weblogs and Social Media, May 2009.

[4] X. Duan et al. RMIT at PAN-CLEF 2020: Profiling Fake News Spreaders on Twitter. CLEF, 2020.

[5] B. Ghanem, P. Rosso, and F. Rangel. An Emotional Analysis of False Information in Social Media and News Articles. ACM Transactions on Internet Technology (TOIT), 20(2):1–18, April 2020.

[6] K. Gimpel et al. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 42–47, June 2011.

[7] Q. Grossetti, C. Du Mouza, and N. Travers. Community-Based Recommendations on Twitter: Avoiding the Filter Bubble. International Conference on Web Information Systems Engineering, 212–227, November 2019.

[8] M. O. Ibrohim and I. Budi. Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter. Proceedings of The Third Workshop on Abusive Language Online, 46–57, August 2019.

[9] F. Koto and G. Y. Rahmaningtyas. InSet Lexicon: Evaluation of A Word List for Indonesian Sentiment Analysis in Microblogs. 2017 International Conference on Asian Language Processing, 391–394, December 2017.

[10] K. Lamb. Fake News Spikes in Indonesia ahead of Elections. www.theguardian.com/world/2019/mar/20/fake-news-spikes-in-indonesia-ahead-of-elections.

[11] K. Lamb. 'I felt disgusted': Inside Indonesia's Fake Twitter Account Factories. www.theguardian.com/world/2018/jul/23/indonesias-fake-twitter-account-factories-jakarta-politic.

[12] N. Lum. The Surprising Difference between Filter Bubble and Echo Chamber. www.medium.com/@nicklum/the-surprising-difference-between-filter-bubble-and-echo-chamber-b909ef2542cc.

[13] N. Naveed et al. Bad News Travel Fast: A Content-based Analysis of Interestingness on Twitter. Proceedings of the 3rd International Web Science Conference, 1–7, June 2011.

[14] A. T. Olanrewaju and A. Rahayu. Examining The Information Dissemination Process on Social Media during The Malaysia 2014 Floods Using Social Network Analysis. Journal of Information and Communication Technology, 17(1):141–166, January 2020.

[15] Y. Ruan et al. Prediction of Topic Volume on Twitter. Proceedings of the 4th International ACM Conference on Web Science, 397–402, 2012.

[16] N. A. Salsabila et al. Colloquial Indonesian Lexicon. 2018 International Conference on Asian Language Processing, 226–229, November 2018.

[17] M. S. Saputri, R. Mahendra, and M. Adriani. Emotion Classification on Indonesian Twitter Dataset. 2018 International Conference on Asian Language Processing, 90–95, November 2018.

[18] M. Song, M. C. Kim, and Y. K. Jeong. Analyzing The Political Landscape of 2012 Korean Presidential Election in Twitter. IEEE Intelligent Systems, 29(2):18–26, June 2014.

[19] S. Suraya and F. E. D. Kadju. Jokowi Versus Prabowo Presidential Race for 2019 General Election on Twitter. Saudi Journal of Humanities and Social Sciences, 4(3):198–212, April 2019.

[20] F. Z. Tala. A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia. Institute for Logic, Language and Computation, Universiteit van Amsterdam, December 2003.

[21] Tempo. Metodologi. www.cekfakta.tempo.co/metodologi.

[22] S. Volkova and J. Y. Jang. Misleading or Falsification: Inferring Deceptive Strategies and Types in Online News and Social Media. Companion Proceedings of The Web Conference 2018, 575–583, April 2018.

[23] S. Vosoughi, D. Roy, and S. Aral. The Spread of True and False News Online. Science, 359(6380):1146–1151, March 2018.

[24] Y. Wang et al. Using Hashtag Graph-based Topic Model to Connect Semantically-related Words without Co-occurrence in Microblogs. IEEE Transactions on Knowledge and Data Engineering, 28(7):1919–1933, February 2016.

[25] R. G. J. Wijnhoven and P. H. N. de With. Fast Training of Object Detection using Stochastic Gradient Descent. 20th International Conference on Pattern Recognition, 424–427, August 2010.

[26] Y. Zhao. Analysing Twitter Data with Text Mining and Social Network Analysis. Proceedings of The 11th Australasian Data Mining and Analytics Conference, 23–29, November 2013.