Automatic Detection of Fake News Spreaders Using BERT
Notebook for PAN at CLEF 2020

Arup Baruah1, Kaushik Amar Das1, Ferdous Ahmed Barbhuiya1, and Kuntal Dey2,*
1 IIIT Guwahati, India
2 Accenture Technology Labs, Bangalore
arup.baruah@gmail.com, kaushikamardas@gmail.com, ferdous@iiitg.ac.in, kuntal.dey@accenture.com

* This work was done while the author was affiliated with IBM Research India, New Delhi.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

Abstract. This paper discusses the approach we used to detect fake news spreaders. We used the pre-trained large cased BERT model to perform the classification. In one experiment, we concatenated all the tweets of an author and classified the author using the vector obtained by max-pooling the 1024-dimensional vectors of the sub-strings of the concatenated string. In a second experiment, we processed each tweet of an author separately. We found that concatenating the tweets yields better performance. This model obtained an accuracy of 0.6900 on the test set.

1 Introduction

The shared task on "Profiling Fake News Spreaders" was held as part of PAN at CLEF 2020. It is a binary classification task: given the past tweets of an author, determine whether the author has spread fake news. Detecting fake news spreaders is an important step towards preventing fake news from propagating through social media. The task was held for the English and Spanish languages. The details of the shared task are available in Rangel et al. [8].

In this paper, we describe the work we performed for this shared task. We participated for the English language only. We used the pre-trained large cased BERT [1] model to classify authors as fake news spreaders or not.

The rest of this paper is structured as follows: Section 2 discusses related work on author profiling and fake news detection, Section 3 describes the dataset used for the shared task, Section 4 presents our methodology, and Section 5 discusses the results we obtained.

2 Related Work

The task of detecting fake news spreaders falls in the category of author profiling. As opposed to author attribution, where it is required to determine the identity of the author of a particular piece of text, author profiling is about categorizing authors into classes such as gender, age, occupation, or bot/human, based on a given text as the evidence.

PAN has been conducting shared tasks on author profiling since 2013. Rangel and Rosso [4] summarize the author profiling task of PAN 2019, which required determining whether the author of a particular piece of text is a human or a bot and, if human, the gender of the author. The best performing system for detecting bots in the English language obtained an accuracy of 0.96 using a random forest classifier [3]. Its features included tweet length, the number of capital and lowercase letters, mentions, retweets, the edit distance between consecutive tweets, and tf-idf of unigrams and bigrams. For gender classification, the best performance was obtained by a logistic regression classifier [10]. Its features included word n-grams (sizes 1 to 3) and character n-grams (sizes 3 to 5); instead of being removed, emoji and special characters were converted to text. The system obtained an accuracy of 84% in detecting gender.
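As an illustration of this family of feature-based profilers, the following scikit-learn sketch combines tf-idf weighted word n-grams (sizes 1 to 3) and character n-grams (sizes 3 to 5) with a logistic regression classifier. It is our own illustrative reconstruction, not the cited authors' code, and the toy texts and labels are hypothetical.

# Illustrative sketch, not the cited authors' code: tf-idf word and
# character n-gram features with a logistic regression classifier,
# similar in spirit to the feature set reported in [10].
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

pipeline = Pipeline([
    ("features", FeatureUnion([
        # word n-grams of size 1 to 3
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
        # character n-grams of size 3 to 5
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical usage: texts holds one concatenated document per author,
# labels holds the 0/1 gold classes.
texts = ["example tweet text one", "another example tweet"]
labels = [0, 1]
pipeline.fit(texts, labels)
print(pipeline.predict(["a new unseen tweet"]))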
Polignano et al. [5] used a deep learning approach to detect bots and gender. A combination of convolutional neural networks and dense neural networks was used to perform the classification. GloVe, word2vec, and FastText word embeddings were used in the study. Their system obtained accuracy scores of 0.9182 and 0.7973 in detecting bots and gender, respectively.

With regard to fake news detection in social media texts, Shu et al. [9] discuss knowledge-based, style-based, stance-based, and propagation-based approaches for detecting fake news. They also list the different types of features that can be used for fake news detection, which include content-based features (source, headline, lexical features, syntactic features, and visual features) and social context features (user-based, post-based, and network-based).

3 Dataset

The shared task "Profiling Fake News Spreaders on Twitter" was conducted for the English and Spanish languages. The training data provided for the English language consisted of tweets from 300 different authors, with 100 tweets per author. The dataset was balanced, with 150 positive and 150 negative instances. The number of tokens per tweet varied from 6 to 30.

4 Methodology

In our work, we used the pre-trained large cased BERT model. This version of BERT has 24 layers and 16 attention heads, and it produces 1024-dimensional vectors to represent the words. BERT generates contextualized word embeddings, as opposed to the static embeddings produced by word2vec or GloVe.

The details of our approach are depicted in Figure 1. As mentioned in Section 3, for each author, a list of 100 tweets is provided in the dataset. The tweets of each author were first concatenated. The concatenated string was then tokenized using BERT's WordPiece tokenizer, and the tokens were split into chunks of 500 tokens each. If the last chunk had fewer than 500 tokens, it was padded with zeroes to a length of 500 tokens. Each token sub-list was then provided as input to the pre-trained BERT model.

[Figure 1. Architecture of our classifier]

The 1024-dimensional vector from the Extract layer of the BERT model was used as the representation of the sub-string. Max-pooling was then performed over the 1024-dimensional vectors of the token sub-lists, and the resultant 1024-dimensional vector was provided as input to the classification layer. The classification layer consisted of a single Dense layer with a single unit and a sigmoid activation function. The Adam optimizer was used for training, and the loss function was binary cross-entropy.

We also performed a second experiment in which, instead of concatenating the 100 tweets of an author, a 1024-dimensional vector was generated for each tweet using the pre-trained BERT model. Max-pooling was performed over the 100 vectors thus obtained, and the classification was performed using the resultant vector. A maximum sequence length of 40 tokens was used for this experiment. A sketch of the chunk-and-pool pipeline is given below.
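The following is a minimal illustrative sketch of the chunk-and-pool pipeline of Experiment 1, written with the HuggingFace transformers library. The paper's text does not prescribe a particular framework, so the library choice, the use of the first-token vector as a stand-in for the Extract layer output, and the helper names are assumptions rather than our exact implementation.

# Minimal sketch of the chunk-and-pool approach (Experiment 1); the
# framework and helper names are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
bert = BertModel.from_pretrained("bert-large-cased")
bert.eval()

CHUNK_LEN = 500  # tokens per sub-list (Section 4)

def author_vector(tweets):
    # Concatenate the author's 100 tweets and tokenize with WordPiece.
    ids = tokenizer.encode(" ".join(tweets), add_special_tokens=False)
    chunk_vectors = []
    for start in range(0, len(ids), CHUNK_LEN):
        chunk = ids[start:start + CHUNK_LEN]
        chunk += [0] * (CHUNK_LEN - len(chunk))  # zero-pad the last chunk
        input_ids = torch.tensor([chunk])
        mask = (input_ids != 0).long()           # ignore the padding
        with torch.no_grad():
            output = bert(input_ids, attention_mask=mask)
        # 1024-dimensional chunk representation; the first-token vector is
        # used here as a stand-in for the Extract layer of our model.
        chunk_vectors.append(output.last_hidden_state[0, 0])
    # Max-pool over the chunk vectors to get one 1024-dim author vector.
    return torch.stack(chunk_vectors).max(dim=0).values

# Classification layer: a single unit with sigmoid activation, trained
# with the Adam optimizer and binary cross-entropy loss.
classifier = torch.nn.Sequential(torch.nn.Linear(1024, 1), torch.nn.Sigmoid())
optimizer = torch.optim.Adam(classifier.parameters())
loss_fn = torch.nn.BCELoss()

# For Experiment 2, each tweet is embedded separately (maximum sequence
# length 40) and the 100 tweet vectors are max-pooled instead.

Max-pooling keeps, for each of the 1024 dimensions, the strongest activation observed in any chunk, so evidence of fake news spreading is retained regardless of where it occurs in the concatenated tweets.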
5 Results

In this section, we discuss the results obtained on the development set and the test set. The development set was created from the dataset by a stratified split: 80% of the dataset was used for training and 20% as the development set. Table 1 shows the results obtained on the development set. As mentioned in Section 4, Experiment 1 refers to the experiment where the tweets of each author were concatenated and then split into sub-lists of 500 tokens each (after tokenization), while Experiment 2 refers to the experiment where each tweet was processed separately. As can be seen from the table, concatenating the tweets resulted in better performance than processing each tweet separately. Based on this observation, the model from Experiment 1 was used to make the submission for the shared task.

Experiment     Precision  Recall  F1      Accuracy
Experiment 1   0.7229     0.7167  0.7147  0.7167
Experiment 2   0.6435     0.6333  0.6267  0.6333

Table 1. Dev set results

Table 2 shows the confusion matrices for Experiment 1 and Experiment 2 on the development set. As can be seen, Experiment 1 performed better than Experiment 2 in detecting the fake news spreader (FAKE) category.

        Experiment 1           Experiment 2
        Pred NOT   Pred FAKE   Pred NOT   Pred FAKE
NOT     24         6           23         7
FAKE    11         19          15         15

Table 2. Confusion matrices for the dev set
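For completeness, a short scikit-learn sketch of the split and the dev-set scoring is given below. The toy author list and the stand-in predictions are hypothetical, and the weighted averaging scheme is our assumption (it is consistent with recall equalling accuracy in Table 1).

# Illustrative sketch of the stratified split and dev-set evaluation;
# the author ids, labels, and predictions below are hypothetical.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split

authors = [f"author{i}" for i in range(10)]  # hypothetical author ids
labels = [0, 1] * 5                          # hypothetical gold labels

# Stratified 80/20 split, as described above.
train_authors, dev_authors, y_train, y_dev = train_test_split(
    authors, labels, test_size=0.2, stratify=labels, random_state=0)

y_pred = [1 for _ in y_dev]                  # stand-in for model predictions
precision, recall, f1, _ = precision_recall_fscore_support(
    y_dev, y_pred, average="weighted", zero_division=0)
print(precision, recall, f1, accuracy_score(y_dev, y_pred))
print(confusion_matrix(y_dev, y_pred))       # rows: true NOT/FAKE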
System          Method                             Accuracy
Our System      BERT                               0.6900
Best System     -                                  0.7500
Baseline 1 [7]  Low Dimensionality Representation  0.7450
Baseline 2      NN + word n-grams                  0.6900
Baseline 3      SVM + char n-grams                 0.6800
Baseline 4 [2]  LSTM + Emotional features          0.6400
Baseline 5      LSTM                               0.5600
Baseline 6      Random                             0.5100

Table 3. Test set results (English language)

Our model obtained an accuracy score of 0.6900 on the test set for the English language. The evaluation on the test set was performed on the TIRA platform [6]. Table 3 shows the performance of our system in comparison to the best performing system of the shared task and other baseline systems. As can be seen, our system performed better than the random, LSTM, emotionally infused LSTM [2], and character n-gram based SVM baselines. It had the same accuracy as the word n-gram based NN baseline and performed worse than the baseline that used the low dimensionality representation technique [7].

The final rank in the shared task was determined by averaging the accuracy scores obtained for the English and Spanish languages. As we did not make a submission for Spanish, our system was ranked 58 out of 66 participants. When considering the scores for the English language only, we ranked 29 out of 66 participants.

6 Conclusion

Detecting fake news spreaders is an important step towards controlling the spread of fake news through social media. In our work, we used a classifier based on the pre-trained large cased BERT model to detect fake news spreaders. We found that concatenating all the tweets of an author yielded better performance than processing each tweet separately. Our model obtained an accuracy score of 0.6900 on the test data. It performed better than the character n-gram based SVM, LSTM, emotionally infused LSTM, and random baseline systems.

References

1. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. pp. 4171-4186. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019). https://doi.org/10.18653/v1/N19-1423, https://www.aclweb.org/anthology/N19-1423
2. Ghanem, B., Rosso, P., Rangel, F.: An emotional analysis of false information in social media and news articles. ACM Transactions on Internet Technology (TOIT) 20(2), 1-18 (2020)
3. Johansson, F.: Supervised classification of Twitter accounts based on textual content of tweets. In: Cappellato, L., Ferro, N., Losada, D.E., Müller, H. (eds.) Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, September 9-12, 2019. CEUR Workshop Proceedings, vol. 2380. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2380/paper_154.pdf
4. Pardo, F.M.R., Rosso, P.: Overview of the 7th author profiling task at PAN 2019: Bots and gender profiling in Twitter. In: Cappellato, L., Ferro, N., Losada, D.E., Müller, H. (eds.) Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, September 9-12, 2019. CEUR Workshop Proceedings, vol. 2380. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2380/paper_263.pdf
5. Polignano, M., de Pinto, M.G., Lops, P., Semeraro, G.: Identification of bot accounts in Twitter using 2D CNNs on user-generated contents. In: Cappellato, L., Ferro, N., Losada, D.E., Müller, H. (eds.) Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, September 9-12, 2019. CEUR Workshop Proceedings, vol. 2380. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2380/paper_95.pdf
6. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA integrated research architecture. In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World. Springer (Sep 2019)
7. Rangel, F., Franco-Salvador, M., Rosso, P.: A low dimensionality representation for language variety identification. In: International Conference on Intelligent Text Processing and Computational Linguistics. pp. 156-169. Springer (2016)
8. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th author profiling task at PAN 2020: Profiling fake news spreaders on Twitter. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR-WS.org (Sep 2020)
9. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: A data mining perspective. SIGKDD Explorations 19(1), 22-36 (2017), https://doi.org/10.1145/3137597.3137600
10. Valencia-Valencia, A.I., Gómez-Adorno, H., Rhodes, C.S., Pineda, G.F.: Bots and gender identification based on stylometry of tweet minimal structure and n-grams model. In: Cappellato, L., Ferro, N., Losada, D.E., Müller, H. (eds.) Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, September 9-12, 2019. CEUR Workshop Proceedings, vol. 2380. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2380/paper_216.pdf