A BERT-based Two-stage Fake News Spreaders Profiling System
Notebook for PAN at CLEF 2020

Shih-Hung Wu and Sheng-Lun Chien
Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung, Taiwan (R.O.C.)
shwu@cyut.edu.tw, s10727614@cyut.edu.tw

Abstract. This paper describes our two-stage classification approach to the CLEF 2020 lab "Profiling Fake News Spreaders on Twitter". The task can be briefly defined as: given a Twitter feed, determine whether its author is keen to be a spreader of fake news. Our approach adopts the pretrained model BERT as a tweet classifier and spots potential spreaders whose tweets are strongly suspected to be fake news. The accuracy of our approach reaches 0.71 on the English data set during system development. However, the performance dropped to 0.56 in the final PAN at CLEF 2020 shared task.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

1 Introduction

A great amount of fake news and rumors is propagated in online social networks. According to experience in developing anti-spam techniques, it is a good approach to spot the source instead of trying to check the content one by one. The aim of the Profiling Fake News Spreaders task at PAN 2020 is to find out whether it is possible to discriminate authors who have posted some fake news in the past from those who have never done so [1]. The organizers propose the task from a multilingual perspective, provide data sets in English and Spanish, and recommend that participants take part in both languages. The uncompressed dataset consists of one folder per language (en, es). Each folder contains one XML file per author (Twitter user) with 100 tweets, and the filenames of these XML files correspond to the unique author IDs. There is also a separate truth.txt file with the list of authors and the ground truth of whether they are fake news spreaders or not. Systems are ranked by accuracy in discriminating between the two classes. However, due to limitations of time and resources, we build a system only for the English tweets based on content analysis and skip the Spanish tweets.

The decision process of our system is a two-stage classification approach to the Profiling Fake News Spreaders on Twitter task. Our system adopts the pre-trained bidirectional transformer language model known as BERT [2] as our NLP tool for content analysis. During the training phase, we fine-tune the pretrained BERT model as a tweet classifier and use it to classify each tweet as potential fake news or not. Then our system spots a spreader by checking the percentage of each author's tweets that are classified as fake news. If the percentage is higher than a threshold, we consider the author a fake news spreader.

2 The BERT Pre-trained Model

The system flow is shown in the following figures. Figure 1 shows the BERT model and classifier architecture. The core of our system is the pretrained language model BERT, which stands for Bidirectional Encoder Representations from Transformers. The BERT model is a bidirectional transformer pre-trained with a combination of a masked language modeling (MLM) objective and next sentence prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia. BERT is designed to pre-train deep bidirectional representations from unlabeled text. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create models for new tasks.

The implementation of BERT that we use is the BERT-for-sequence-classification model from the Hugging Face transformers library (https://huggingface.co/transformers/model_doc/bert.html). The pretrained model is "bert-base-uncased", which requires all English text to be lower-cased. The hyperparameters in the training phase are: hidden size = 768, learning rate = 6.0e-5, and vocabulary size = 30,522. We train the model for 10 epochs in each experiment setting.

Figure 1: BERT model and classifier (a linear classifier on top of the [CLS] representation of the input tokens W1 ... Wn).

Figure 2 shows how our system performs training. In the training phase, we preprocess the training data: we extract every tweet from each XML file and attach the author's truth label to it. Each XML file contains 100 tweets, all of which are associated with the same label. All non-English characters are filtered out; only English characters are kept in the training data. We then combine all data into one training dataset so that the model can learn which tweets may be telling fake news and which may not.

Figure 2: System architecture of the training phase (input all XML files, extract every tweet and associate the truth label, keep only English characters, convert to CSV, transform the CSV file into the training dataset, and run the BERT classification).
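The following is a minimal sketch, not the authors' released code, of the training pipeline described above. It assumes the PAN 2020 data layout (one XML file of 100 tweets per author plus a truth.txt file with author_id:::label lines); the helper names, character filter, batch size, and maximum sequence length are our own placeholders, since the paper does not report them.

```python
# Sketch of the training phase: extract tweets from the per-author XML files,
# attach the author-level truth label to every tweet, keep only English
# characters, and fine-tune "bert-base-uncased" with BertForSequenceClassification.
import glob
import os
import re
import xml.etree.ElementTree as ET

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizer

def load_truth(path):
    """Parse truth.txt lines assumed to be of the form 'author_id:::label'."""
    truth = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            author_id, label = line.strip().split(":::")
            truth[author_id] = int(label)
    return truth

def load_tweets(folder, truth):
    """Return (texts, labels); every tweet inherits its author's label."""
    texts, labels = [], []
    for xml_path in glob.glob(os.path.join(folder, "*.xml")):
        author_id = os.path.splitext(os.path.basename(xml_path))[0]
        if author_id not in truth:
            continue
        for doc in ET.parse(xml_path).getroot().iter("document"):
            text = doc.text or ""
            # Rough stand-in for the paper's filter: keep English letters,
            # digits, whitespace and basic punctuation only.
            text = re.sub(r"[^A-Za-z0-9\s.,!?#@:'\"/]", " ", text)
            texts.append(text)
            labels.append(truth[author_id])
    return texts, labels

def fine_tune(texts, labels, epochs=10, lr=6.0e-5, batch_size=32):
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    enc = tokenizer(texts, truncation=True, padding=True,
                    max_length=64, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                            torch.tensor(labels))
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    for _ in range(epochs):
        for input_ids, attention_mask, y in loader:
            optimizer.zero_grad()
            out = model(input_ids=input_ids.to(device),
                        attention_mask=attention_mask.to(device),
                        labels=y.to(device))
            out.loss.backward()
            optimizer.step()
    return tokenizer, model
```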
3 System Development

There are 300 authors in the training set, and each has 100 tweets. Half of them are spreaders; however, we do not know whether each individual tweet is fake news or not. We therefore assume that all tweets belonging to the spreaders are potential fake news and all tweets belonging to the non-spreaders are real news, and under this assumption we train a classifier that separates tweets into potential fake ones and real ones. We know this assumption is imprecise, so the classifier cannot spot fake news well. Therefore, when we use it to spot spreaders, we apply a threshold mechanism to prevent the system from identifying too many authors as spreaders. Only if the percentage of an author's tweets classified as fake news passes the threshold is he/she labelled as a spreader; an author with only a few tweets classified as fake news will not be labelled as a spreader. The decision is made by an empirical threshold.

We divide the training set into two parts and use the resulting development set to find the best threshold, with 70% of the data used for training and 30% for testing. Figure 3 shows accuracy versus threshold, where the threshold ranges from 60% to 90%. The system reaches an accuracy of 0.71 with a threshold of 74%. The threshold is selected manually and used in our system.

Figure 3: Threshold vs. accuracy on the training set.

To estimate how the system might perform, we conduct several similar experiments on the training set. Table 1 shows the results. The accuracy is around 0.65 to 0.71 given enough training data, i.e., 60% to 80% of the data in the training set. We expected our system to obtain a similar result in the formal test.

Table 1: System performance during development; the English training set is divided into a training part and a test part, with the threshold set to 74%.

Training/test split        Accuracy
50% training / 50% test    0.47
60% training / 40% test    0.66
70% training / 30% test    0.71
80% training / 20% test    0.65
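As a rough illustration of how such a threshold can be searched on the development split, the sketch below classifies each held-out author's tweets, computes the fraction predicted as fake, and sweeps candidate thresholds over the same 60% to 90% range. The search itself is ours (the paper reports selecting the 74% threshold manually from a plot like Figure 3), and predict_fake_fraction is a hypothetical helper.

```python
# Sketch of threshold selection on the development split.
# predict_fake_fraction is a hypothetical helper that runs the fine-tuned
# BERT classifier over one author's tweets and returns the fraction of
# tweets predicted as fake news.

def best_threshold(dev_authors, truth, predict_fake_fraction):
    """dev_authors: author ids in the development split; truth: id -> 0/1."""
    fractions = {a: predict_fake_fraction(a) for a in dev_authors}
    best_t, best_acc = 0.0, -1.0
    for t in (x / 100 for x in range(60, 91)):      # thresholds 0.60 ... 0.90
        preds = {a: int(fractions[a] >= t) for a in dev_authors}
        acc = sum(preds[a] == truth[a] for a in dev_authors) / len(dev_authors)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```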
Figure 4 shows how our system performs the test. Before testing, we also preprocess the data. We extract every tweet from each author's XML file and use the model to predict a label for every tweet. After all tweets of one author have been labelled 1 or 0, a threshold mechanism decides whether the author is a spreader or not by checking whether the percentage of 1s exceeds 74%. Our system then writes the final answer, together with the author id, to an XML file and finishes the task.

Figure 4: Our system flow in the test phase (input the test dataset, extract every tweet, transform the data into a test-set CSV file, run the BERT classifier on every tweet, calculate the average class label of the tweets for each author, check whether the average passes the threshold, and transform the final answer into an XML file).
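A minimal sketch of this per-author decision is given below. classify_tweet is a hypothetical wrapper around the fine-tuned BERT classifier, and the exact <author .../> output schema shown is our assumption rather than something specified in the paper.

```python
# Sketch of the test-phase decision: predict a 0/1 label for every tweet of an
# author, check whether the fraction of tweets labelled 1 exceeds the 74%
# threshold, and write one answer file per author.
import os
import xml.etree.ElementTree as ET

THRESHOLD = 0.74

def label_author(author_id, tweets, classify_tweet, out_dir, lang="en"):
    predictions = [classify_tweet(t) for t in tweets]      # list of 0/1 labels
    fake_fraction = sum(predictions) / len(predictions)
    is_spreader = int(fake_fraction >= THRESHOLD)
    # Assumed output format: one XML file per author with a single element.
    answer = ET.Element("author", id=author_id, lang=lang, type=str(is_spreader))
    ET.ElementTree(answer).write(os.path.join(out_dir, author_id + ".xml"))
    return is_spreader
```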
4 Test Result and Discussion

The final test was run on the virtual machine provided by the organizers, where we encountered some technical errors. In the training phase, all non-English characters were filtered out and only English characters were kept in the training data, but we omitted this step in the test phase. This is one of the reasons why our system's performance decreased. Table 2 shows our official final test result compared with several benchmarks. The accuracy of our system is 0.560, which is equal to the LSTM benchmark but lower than our best result on the development set.

Table 2: Our system's performance in the final evaluation vs. benchmarks.

Test run                                          Accuracy
SYMANTO (LDSE) [5]                                0.745
EIN [6]                                           0.640
LSTM                                              0.560
Pan20-author-profiling-test-dataset-2020-02-23    0.560
RANDOM                                            0.510

5 Conclusion

This paper describes our two-stage classification approach to the Profiling Fake News Spreaders on Twitter task. The performance of our approach reaches 0.71 on the development set. However, the performance dropped to 0.56 in the final PAN evaluation at CLEF 2020. As future work, we intend to investigate what other information might help to detect fake news spreaders [3]. For example, in addition to news content and labels, fake news articles in some datasets also provide information on the Twitter social network, which contains Twitter users and their following relationships (user-user relationships), as well as how the news has been propagated (tweeted/retweeted) by users (news-user relationships) [4].

Acknowledgements

This study was supported by the Ministry of Science and Technology under grant number MOST 109-2221-E-324-024.

References

1. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, CEUR-WS.org (2020)
2. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805v2 [cs.CL] (2018)
3. Ruchansky, N., Seo, S., Liu, Y.: CSI: A Hybrid Deep Model for Fake News Detection. In: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, pp. 797-806. ACM (2017)
4. Shu, K., Wang, S., Liu, H.: Beyond News Contents: The Role of Social Context for Fake News Detection. In: WSDM (2019)
5. Rangel, F., Franco-Salvador, M., Rosso, P.: A Low Dimensionality Representation for Language Variety Identification. In: Postproc. 17th Int. Conf. on Computational Linguistics and Intelligent Text Processing, CICLing 2016, Revised Selected Papers, Part II, LNCS 9624, pp. 156-169. Springer (2016). arXiv:1705.10754
6. Ghanem, B., Rosso, P., Rangel, F.: An Emotional Analysis of False Information in Social Media and News Articles. ACM Transactions on Internet Technology (TOIT) 20(2), pp. 1-18 (2020)