=Paper=
{{Paper
|id=Vol-2696/paper_148
|storemode=property
|title=Fake News Spreader Detection using Neural Tweet Aggregation
|pdfUrl=https://ceur-ws.org/Vol-2696/paper_148.pdf
|volume=Vol-2696
|authors=Oleg Bakhteev,Aleksandr Ogaltsov,Petr Ostroukhov
|dblpUrl=https://dblp.org/rec/conf/clef/BakhteevOO20
}}
==Fake News Spreader Detection using Neural Tweet Aggregation==
<pdf width="1500px">https://ceur-ws.org/Vol-2696/paper_148.pdf</pdf>
<pre>
       Fake news spreader detection using neural tweet
                        aggregation
                         Notebook for PAN at CLEF 2020

                Oleg Bakhteev, Aleksandr Ogaltsov, and Petr Ostroukhov

                                 Antiplagiat, Moscow, Russia;
             Moscow Institute of Physics and Technology (MIPT), Moscow, Russia
           Higher School of Economics, Moscow Institute of Physics and Technology
             bakhteev@ap-team.ru, ogaltsov@ap-team.ru, ostroukhov@ap-team.ru


        Abstract. The paper describes a neural network-based approach to the Profiling
        Fake News Spreaders on Twitter task at PAN 2020. The problem is reduced to
        binary classification, where the object to classify is the set of tweets of a user
        and the class labels distinguish users that are likely to spread fake news from
        ordinary users. To deal with a set of tweets we employ two neural network
        architectures, based on either recurrent or convolutional neural networks. We try
        to aggregate all the information obtained from the tweets to decide whether the
        user is likely to spread fake news. We also present an ensemble of models that
        consists of a neural network and a classification model that works on each tweet
        separately.


1     Introduction

The author profiling task focuses on investigating different aspects of an author's style.
This year the task considers the problem of fake news spreader detection [10]: given a
set of tweets written by an author, one should decide whether the author is keen to spread
fake news or not. The tweets are written in two languages: English and Spanish. The
training dataset contains 300 sets of tweets for each language, with 150 sets for users
that are keen to spread fake news and 150 sets for ordinary users. The performance
metric for the task is accuracy.
    The problem of detecting the spread of fake news in social media is becoming more
and more important. There are plenty of works devoted to classifying individual
social media messages according to whether they contain fake news or not. The methods
of fake news detection vary significantly, from the use of classical linguistic features [7]
to the use of contemporary deep learning-based models [13,4]. In [8] the authors
propose an end-to-end deep learning-based approach to detect fakes using external
resources, such as news datasets, which can improve the performance of the existing
methods of fake news detection.

    This work was supported by RFBR project No. 18-07-01441.
    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020,
    Thessaloniki, Greece.

    Despite the significant number of works devoted to fake news detection, many
of them cannot be applied to this task straightforwardly: the analysed object for
this task is not a single text, but a collection of short texts. Therefore, even if the
tweet collection contains fake news, we do not know exactly which tweet in the
collection contains it. Moreover, we have no guarantee that the set of tweets for
the target fake news spreader contains any fake news at all: we only know that
the author spreads them. Therefore the use of external resources becomes less
important for this task. In this respect fake news spreader detection is similar to previous
author profiling tasks, such as gender profiling [12] or bot detection [11]. The key idea
of many approaches to such tasks is to aggregate the information contained in all the
texts written by the author. For such an aggregation we employ a neural network-based
approach. Inspired by the idea described in [14], we consider two architectures based on
recurrent and convolutional neural networks. Both architectures interpret the input
object as a set, without any knowledge about the tweet order. We also analyse the
performance of an ensemble of two models that work on different hierarchy levels of the
dataset: the level of the whole tweet collection and the level of individual tweets, the
latter with a classifier trained on an external corpus.


2     Methodology

The following section describes the proposed method of fake news spreader detection.
Formally we consider the problem as a classification problem, where the classified
object is a set of tweets. Given a labeled dataset $D = \{(x_i, y_i)\}$:

    $$x_i = \{x_i^1, \dots, x_i^m\}, \quad x_i^j \in W^+, \quad j \in \{1, \dots, m\}, \quad y_i \in \{0, 1\},$$

where $W^+$ is the set of all possible strings written in the given language, $y_i = 1$
corresponds to users that are likely to spread fake news, and $y_i = 0$ corresponds to
ordinary users. The task is to find the binary classifier that minimizes the empirical
risk on the dataset $D$:

    $$f = \arg\min_{f \in F} \sum_{(x_i, y_i) \in D} [f(x_i) \neq y_i],$$

where $F$ is the set of all considered classification models.
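
As a minimal illustration of this objective, the sketch below counts the misclassified
objects for one candidate classifier. The names `empirical_risk` and `toy_classifier`, as
well as the toy data, are assumptions introduced for illustration only; they are not part
of the described system.

```python
from typing import Callable, Sequence, Tuple

TweetSet = Sequence[str]


def empirical_risk(f: Callable[[TweetSet], int],
                   dataset: Sequence[Tuple[TweetSet, int]]) -> int:
    """Number of pairs (x_i, y_i) in D for which f(x_i) != y_i."""
    return sum(1 for x_i, y_i in dataset if f(x_i) != y_i)


# Hypothetical toy classifier: flags a user if any tweet contains "shocking".
def toy_classifier(tweets: TweetSet) -> int:
    return int(any("shocking" in t.lower() for t in tweets))


# Made-up data: each object is a set of tweets with a binary label.
toy_dataset = [
    (["Shocking truth revealed", "hello world"], 1),
    (["nice weather today"], 0),
    (["Shocking goal last night"], 0),
]
risk = empirical_risk(toy_classifier, toy_dataset)
```

Selecting the model with the smallest such count over a family of candidates corresponds
to the arg min in the formula above.
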
    The following sections give a brief overview of the models used.


2.1   Recurrent network-based architecture

The proposed architecture is illustrated in Figure 1. It consists of 3 main components:

 1. A recurrent network $f_{RNN}$ that processes the word embeddings of the current tweet.
 2. A weighting layer $f_{WL}$, which determines the impact of each tweet on the final
    decision.
 3. A feedforward network with a softmax layer $f_{SM}$ that makes the final decision on
    the class of the author of the tweet collection.

    Once the recurrent network $f_{RNN}$ has processed the tweet $x_i^j$, producing the
hidden state $h_i^j$, we apply the weighting layer to determine its weight in the total sum:

    $$w_i^j = f_{WL}(h_i^j), \quad w_i^j \in [0, 1].$$

After that we sum all the hidden states with their weights normalized by softmax:

    $$h_i = \sum_{j=1}^{m} \hat{w}_i^j h_i^j, \quad \hat{w}_i = \mathrm{softmax}(w_i).$$

The final stage of the classification is done by the feedforward network $f_{SM}$ with a
softmax layer:

    $$f(x_i) = \arg\max_{j \in \{0, 1\}} f_{SM}(h_i)_j, \qquad (1)$$

where $f_{SM}(h_i)_j$ represents the component $j$ of the softmax output $f_{SM}(h_i)$.
    We use a GRU for the $f_{RNN}$ component and a one-layer neural network with sigmoid
activation as the weighting layer $f_{WL}$.


Figure 1. The scheme of the proposed Recurrent network-based architecture. All the RNNs and
Weighting layers share their weights.
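
The aggregation step of this architecture can be sketched as follows, assuming the hidden
states of the tweets have already been produced by the recurrent network. The weighting
layer is modelled as a single linear unit with sigmoid activation; its parameters `v` and
`b`, and all numbers below, are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(values: Sequence[float]) -> List[float]:
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(val - shift) for val in values]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(hidden_states: Sequence[Vector], v: Vector, b: float) -> Vector:
    """Weighted sum of hidden states with softmax-normalized sigmoid weights."""
    # One-layer weighting unit: a weight in [0, 1] for every tweet.
    raw = [sigmoid(sum(vk * hk for vk, hk in zip(v, h)) + b) for h in hidden_states]
    weights = softmax(raw)
    dim = len(hidden_states[0])
    return [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]


# Toy hidden states for m = 3 tweets with hidden dimension 2 (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_i = aggregate(states, v=[0.5, -0.5], b=0.0)
h_reversed = aggregate(list(reversed(states)), v=[0.5, -0.5], b=0.0)
```

Because every tweet passes through the same weighting unit and the results are summed,
reordering the tweets leaves the aggregated vector unchanged up to rounding, which is the
set-like behaviour the architecture relies on.
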


2.2   Convolutional network-based architecture

Another approach to aggregating the collection of tweets of a given author is a
convolutional network. CNNs are known for their powerful hierarchical representations of
matrix data, so we need to convert the tweet collection into a matrix. We do it in the
following way:

 1. Take the top-k most frequent words from all tweets.
 2. Form a matrix from their word embeddings, preserving the order of the words in the
    top-k list.
 3. Apply convolutions to the obtained matrix and get a probability distribution via a
    feedforward neural network.

More formally, we perform a “matrization” of each collection:

    $$x_i \longrightarrow M_i^{d \times k},$$

where $d$ is the embedding size and $k$ is the number of most frequent words to take.
Then we get a vector of high-level features from the filters:

    $$u_i = f_{CNN}(M_i^{d \times k}).$$

Finally, we obtain the class label with a feedforward neural network:

    $$f(x_i) = \arg\max_{j \in \{0, 1\}} f_{SM}(u_i)_j. \qquad (2)$$

    The intuition behind this conversion is, on the one hand, to make the representation
independent of the order of tweets in the collection $x_i$ and, on the other hand, to
construct an abstract representation of the topics and sentiments of the collection. We
apply common filters and pooling, with a fully-connected network with one hidden layer on
top to predict the final class label. The scheme of this aggregation method is shown in
Figure 2.


               Figure 2. The scheme of the proposed CNN-based architecture.
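
The conversion of a tweet collection into a matrix can be sketched as below. This is a
minimal sketch under several assumptions that the text does not specify: whitespace
tokenization, alphabetical tie-breaking between equally frequent words, and zero vectors
for unknown words and for missing columns. The function name `matrize` and the toy
embeddings are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def matrize(tweets: Sequence[str],
            embeddings: Dict[str, List[float]],
            k: int, d: int) -> List[List[float]]:
    """Build a d x k matrix whose columns are embeddings of the top-k words."""
    counts = Counter(word for tweet in tweets for word in tweet.split())
    # Sort by decreasing frequency; break ties alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    top_words = [word for word, _ in ranked[:k]]
    # Assumption: unknown words and unused columns are filled with zeros.
    columns = [embeddings.get(word, [0.0] * d) for word in top_words]
    columns += [[0.0] * d for _ in range(k - len(columns))]
    # Transpose the k column vectors into d rows.
    return [list(row) for row in zip(*columns)]


# Made-up 2-dimensional embeddings for three words.
toy_embeddings = {"news": [1.0, 0.0], "fake": [0.0, 1.0], "the": [0.5, 0.5]}
tweets_a = ["fake news the news", "the news"]
tweets_b = list(reversed(tweets_a))
M_a = matrize(tweets_a, toy_embeddings, k=4, d=2)
M_b = matrize(tweets_b, toy_embeddings, k=4, d=2)
```

Since only word frequencies over the whole collection are used, both orderings of the
tweets yield the same matrix.
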


2.3   Model ensembling

In order to analyse the tweet collection on both hierarchical levels, the level of the
whole tweet set and the level of individual tweets, we employ an ensemble of two
models: a neural network that aggregates all the information about the author and a
per-tweet classification model. For the per-tweet model we train a classification model
$f_{LR}$ on the dataset from [15]. This is a binary classification dataset, where the
objects are tweets about events from 2016 and the labels indicate whether a particular
tweet is a rumour or not. The percentage of rumour tweets in this collection is about 37%.
    As the classification model we use an $\ell_2$-regularized logistic regression model
with TF-IDF representations as tweet features.
    As a final decision rule we use the following formula:

    $$f(x_i) = \begin{cases}
        0, & \text{if } \min_{x_i^j \in x_i} f_{LR}(x_i^j) \le \alpha_0, \\
        1, & \text{if } \min_{x_i^j \in x_i} f_{LR}(x_i^j) \ge \alpha_1, \\
        f_{NN}(x_i), & \text{otherwise},
    \end{cases} \qquad (3)$$

where $f_{LR}$ is the rumour probability predicted by the logistic regression model,
$f_{NN}$ is either the RNN or the CNN described in (1) and (2), and $\alpha_0$, $\alpha_1$
are hyperparameters tuned on the validation set. The described scheme gives us an
opportunity to use tweet-level information directly, without neural networks, in two cases:

 1. if all the tweets of the user seem to be very usual and not suspicious;
 2. if any of the tweets is likely to be fake or a rumour with high probability, so that we
    do not need to use the neural network for the tweet collection.
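
A minimal sketch of decision rule (3) is given below. The per-tweet scorer `f_lr` and the
collection-level model `f_nn` are passed in as plain callables, and the toy scores and
thresholds are made-up values. The aggregator defaults to `min`, as written in (3); making
it a parameter is an assumption of this sketch, so that other aggregators such as `max`
can be tried as well.

```python
from typing import Callable, Sequence


def ensemble_predict(tweets: Sequence[str],
                     f_lr: Callable[[str], float],
                     f_nn: Callable[[Sequence[str]], int],
                     alpha_0: float, alpha_1: float,
                     agg: Callable[..., float] = min) -> int:
    """Threshold the aggregated per-tweet score; otherwise defer to f_nn."""
    score = agg(f_lr(tweet) for tweet in tweets)
    if score <= alpha_0:
        return 0
    if score >= alpha_1:
        return 1
    return f_nn(tweets)


# Hypothetical rumour probabilities standing in for the logistic regression.
toy_scores = {"calm": 0.05, "odd": 0.5, "wild": 0.95}

label_low = ensemble_predict(["calm", "odd"], toy_scores.get, lambda t: 1, 0.1, 0.9)
label_high = ensemble_predict(["wild"], toy_scores.get, lambda t: 0, 0.1, 0.9)
label_deferred = ensemble_predict(["odd", "wild"], toy_scores.get, lambda t: 0, 0.1, 0.9)
```

In the third call neither threshold is reached by the minimum score, so the label comes
from the stub that stands in for the neural network.
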


3     Experiment Details

3.1   Preliminary dataset analysis

For the preliminary dataset analysis we vectorized all the tweets from the English part of
the dataset using the Universal Sentence Encoder [2]. We clustered them using DBSCAN [3]
from the scikit-learn package [6]. The t-SNE projection [5] of the clustering
results is shown in Figure 3. Although the cluster structure is not
very clear, we can see that the clusters of messages with a high share of users that are
likely to spread fake news are concentrated in the right part of the projection. Although
we do not have any ground truth for individual tweets, we believe that clusters with a
high share of messages from users that are likely to spread fake news contain our
point of interest. A brief analysis of these messages showed that most of them
are devoted to three topics:

 1. news about politics;
 2. news about pop stars and actors;
 3. news about sport.

A significant part of such messages also mention celebrities famous in one of
these areas. We tried to use this information in the following experiments.
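
The per-cluster statistic used in this analysis, the share of messages written by users
labelled as likely fake news spreaders, can be computed as in the sketch below. It assumes
that cluster labels are already available (for example from DBSCAN, which marks noise
points with the label -1 in scikit-learn) and that each tweet inherits the label of its
author. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def spreader_share_per_cluster(cluster_ids: Sequence[int],
                               author_labels: Sequence[int]) -> Dict[int, float]:
    """Fraction of tweets per cluster whose author is labelled as a spreader (1)."""
    totals: Dict[int, int] = defaultdict(int)
    spreaders: Dict[int, int] = defaultdict(int)
    for cluster, label in zip(cluster_ids, author_labels):
        if cluster == -1:  # skip noise points
            continue
        totals[cluster] += 1
        spreaders[cluster] += label
    return {cluster: spreaders[cluster] / totals[cluster] for cluster in totals}


# Made-up cluster assignments and author labels for six tweets.
clusters = [0, 0, 0, 1, 1, -1]
labels = [1, 1, 0, 0, 0, 1]
shares = spreader_share_per_cluster(clusters, labels)
```

Clusters whose share is close to 1 are the ones worth inspecting manually for recurring
topics.
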


3.2   Experiment results

In order to validate our models we conducted a computational experiment. As a
preprocessing step we lowercased the tweets and removed stop words and punctuation. We
did not use any other special preprocessing.
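
The preprocessing step can be sketched as follows. The stop-word list below is a tiny
hypothetical placeholder, since the list actually used is not specified, and only ASCII
punctuation is stripped, which is an assumption that would need extending for Spanish
marks such as "¿" and "¡".

```python
import string
from typing import List

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweet: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    cleaned = tweet.lower().translate(PUNCT_TABLE)
    return [token for token in cleaned.split() if token not in STOP_WORDS]


tokens = preprocess("The President is in Paris, and the crowd LOVES it!")
```
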
    For the word embeddings we used fastText [1] trained on Common Crawl and
Wikipedia with the dimension set to 100. For hyperparameter tuning we used 5-fold
cross-validation both for the neural networks (1), (2) and for the tweet classification
model $f_{LR}$. We tuned the following hyperparameters:

 1. the number of layers and the hidden dimension for the RNN model;
 2. the number k of most frequent words, the number of filters and the padding size for
    the CNN model;
 3. the learning rate, the $\ell_2$ regularization coefficient and the dropout rate;
 4. $\alpha_0$ and $\alpha_1$ for the model ensemble (3).

Figure 3. t-SNE projection of the vectorized tweets. The color of a point corresponds to
the percentage of messages from users that are likely to spread fake news in its cluster:
from green (only ordinary users) to red (all the tweet authors in the cluster are users
that are keen to spread fake news). Grey points correspond to clusters with only one point
per cluster.


    Based on the preliminary analysis from subsection 3.1 we also considered the
proposed models with an additional indicator: whether the token corresponds to a celebrity
or not. The celebrity list was mined from web resources devoted to political,
sports, music and cinema news.
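
One possible way to feed such an indicator to the models is to append a 0/1 flag to every
token embedding. The text does not state how the indicator was wired in, so the sketch
below is an assumption for illustration; the function name, the zero vector for unknown
tokens and the toy data are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Set


def embed_with_indicator(tokens: Sequence[str],
                         embeddings: Dict[str, List[float]],
                         celebrities: Set[str], d: int) -> List[List[float]]:
    """Append a 0/1 celebrity flag to each token embedding (dimension d + 1)."""
    rows = []
    for token in tokens:
        vector = embeddings.get(token, [0.0] * d)  # assumption: zeros if unknown
        flag = 1.0 if token in celebrities else 0.0
        rows.append(vector + [flag])
    return rows


# Made-up embeddings and celebrity list.
toy_vectors = {"obama": [0.1, 0.2], "said": [0.3, 0.4]}
toy_celebrities = {"obama"}
features = embed_with_indicator(["obama", "said", "xyz"], toy_vectors, toy_celebrities, d=2)
```
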
    The model evaluation was done using the TIRA environment [9]. The cross-validation
results for our models are shown in Table 1. As we can see, the
RNN-based architecture performs slightly better than the CNN-based
architecture. The use of the ensemble also allowed us to slightly increase the resulting
accuracy. The additional celebrity indicator for tokens did not give any improvement and
lowered the predictive performance.
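
The way a mean accuracy with a spread can be obtained from k-fold cross-validation is
sketched below. It assumes that the "±" values denote the standard deviation of the
per-fold accuracies and that folds are formed by a plain shuffled split without
stratification; neither detail is stated in the text. The training routine is passed in as
a callable, and the identity "model" used in the example is a stub.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(data: Sequence, labels: Sequence[int],
                   fit: Callable, k: int = 5) -> Tuple[float, float]:
    """Return the mean and standard deviation of the per-fold accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [i for i in range(len(data)) if i not in held]
        model = fit([data[i] for i in train], [labels[i] for i in train])
        correct = sum(model(data[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Stub training routine: ignores the training data and returns the identity map.
def fit_identity(train_x, train_y):
    return lambda x: x


toy = [i % 2 for i in range(300)]  # objects equal to their labels
mean_acc, std_acc = cross_validate(toy, toy, fit_identity, k=5)
```
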


Language   CNN           RNN           CNN with      RNN with      CNN with      RNN with
                                       celebrity     celebrity     ensemble      ensemble
                                       indicator     indicator
English    0.74 ± 0.05   0.76 ± 0.05   0.7 ± 0.05    0.7 ± 0.05    0.75 ± 0.05   0.77 ± 0.05
Spanish    0.77 ± 0.06   0.78 ± 0.05   0.74 ± 0.06   0.75 ± 0.04   -             -

                          Table 1. The results for the proposed models.


4     Conclusion

The paper describes a neural network-based approach to the fake news spreader
detection task. We proposed two neural network architectures based on RNN and CNN. The
resulting accuracy was about 77% for the English dataset and 78%
for the Spanish dataset. We believe that the proposed approach can be considered as one
of the baselines for further development of the problem. Future work includes a detailed
analysis of the dataset and more advanced use of external resources, such as lists of
celebrities mentioned in tweets or external datasets of fake news in tweet messages.


References

 1. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword
    information. Transactions of the Association for Computational Linguistics 5, 135–146
    (2017)
 2. Cer, D., Yang, Y., Kong, S.y., Hua, N., Limtiaco, N., John, R.S., Constant, N.,
    Guajardo-Cespedes, M., Yuan, S., Tar, C., et al.: Universal sentence encoder for English. In:
    Proceedings of the 2018 Conference on Empirical Methods in Natural Language
    Processing: System Demonstrations. pp. 169–174 (2018)
 3. Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al.: A density-based algorithm for discovering
    clusters in large spatial databases with noise. In: KDD. vol. 96, pp. 226–231 (1996)
 4. Jiang, Y., Petrak, J., Song, X., Bontcheva, K., Maynard, D.: Team Bertha von Suttner at
    SemEval-2019 Task 4: Hyperpartisan news detection using ELMo sentence representation
    convolutional network. In: Proceedings of the 13th International Workshop on Semantic
    Evaluation. pp. 840–844 (2019)
 5. Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning
    Research 9(Nov), 2579–2605 (2008)
 6. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
    Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D.,
    Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal
    of Machine Learning Research 12, 2825–2830 (2011)
 7. Pérez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake
    news. In: Proceedings of the 27th International Conference on Computational Linguistics.
    pp. 3391–3401 (2018)
 8. Popat, K., Mukherjee, S., Yates, A., Weikum, G.: Declare: Debunking fake news and false
    claims using evidence-aware deep learning. In: Proceedings of the 2018 Conference on
    Empirical Methods in Natural Language Processing. pp. 22–32 (2018)
 9. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA Integrated Research Architecture.
    In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World.
    Springer (Sep 2019)
10. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th Author Profiling
    Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: Cappellato, L., Eickhoff,
    C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR
    Workshop Proceedings (Sep 2020), CEUR-WS.org
11. Rangel, F., Rosso, P.: Overview of the 7th Author Profiling Task at PAN 2019: Bots and
    Gender Profiling in Twitter. In: Proceedings of the CEUR Workshop, Lugano, Switzerland.
    pp. 1–36 (2019)
12. Rangel, F., Rosso, P., Potthast, M., Stein, B.: Overview of the 5th Author Profiling Task at
    PAN 2017: Gender and Language Variety Identification in Twitter. Working notes papers of the
    CLEF pp. 1613–0073 (2017)
13. Wang, Y., Ma, F., Jin, Z., Yuan, Y., Xun, G., Jha, K., Su, L., Gao, J.: EANN: Event adversarial
    neural networks for multi-modal fake news detection. In: Proceedings of the 24th ACM
    SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 849–857
    (2018)
14. Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.:
    Deep sets. In: Advances in neural information processing systems. pp. 3391–3401 (2017)
15. Zubiaga, A., Hoi, G.W.S., Liakata, M., Procter, R.: Pheme dataset of rumours and
    non-rumours (2016). https://doi.org/10.6084/M9.FIGSHARE.4010619,
    https://figshare.com/articles/PHEME_dataset_of_rumours_and_non-rumours/4010619

</pre>