Approaches to the Profiling Fake News Spreaders on Twitter Task in English and Spanish

Jacobo López Fernández1 and Juan Antonio López Ramírez2
Universitat Politècnica de València
jalofer1@posgrado.upv.es, jualora1@inf.upv.es

Abstract. This paper discusses the decisions made approaching PAN's Profiling Fake News Spreaders on Twitter task at CLEF 2020. We briefly describe how we combined each author's tweets to create samples that do or do not represent a fake news spreader. We decided to handle both languages proposed for this task, Spanish and English, and the methodologies that we suggest are Linear Support Vector Machines (SVMs) and Gradient Boosting, respectively. Other approaches, such as Long Short-Term Memory (LSTM) networks, were also considered in the process of finding the model with the best accuracy, and their results are reported in this paper as well. We used a cross-validation scenario to obtain accuracy estimates due to the reduced amount of data. We achieved average accuracy scores of 0.735 for the Spanish task and 0.685 for the English task.

Keywords: author profiling, fake news, multilingual, social media, spreaders

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

1 Introduction

The trust in information read through social media has steadily increased in the last few years. However, social media platforms also allow their users to publish and propagate misinformation, with severe consequences for our society. First of all, we should make it clear that there are different types of misinformation and disinformation, such as fake news, satire or rumours that go viral in online social networks [5]. In addition, psycho-linguistic information such as emotion, sentiment or informal language should be analysed beforehand. By exploiting information extracted from user profiles and user interactions, we should be able to classify users depending on the information obtained.

A great amount of fake news and rumours is propagated in online social networks, usually with the aim of deceiving users and shaping specific opinions [15]. Users play a critical role in the creation and propagation of fake news online by consuming and sharing articles with inaccurate information, either intentionally or unintentionally.

To prevent the dissemination of misinformation and disinformation, we may profile fake news spreaders automatically. Author profiling is based on the detection of certain characteristics in profiles, making use of linguistic pattern recognition techniques.

This paper discusses the decisions made approaching PAN's Profiling Fake News Spreaders on Twitter task at CLEF 2020 [11]. The task consists in, given a Twitter feed, determining whether its author is keen to be a fake news spreader. It thus focuses on identifying possible fake news spreaders on social media as a first step towards preventing fake news from being propagated among online users. The task has a multilingual perspective, including tweets in English and Spanish, and is defined as a binary classification task.

2 Related Work

Fake news detection has attracted a lot of research attention in recent years. Guess et al. [7] approached this field by doing research during elections, in particular the 2016 US election; in that paper, they propose a system that obtained features from polls published on Facebook.
Popat et al. [9] suggested an end-to-end model to evaluate the trustworthiness of arbitrary texts without human supervision. Subsequently, they presented a biLSTM neural network model which aggregates signals from external evidence articles, the language of these articles and the trustworthiness of their sources. Shu et al. [14] pointed out that fake news spreaders cannot be profiled precisely based only on text content; instead, we should understand the correlation between user profiles on social media and fake news. They state that social engagements should be used as auxiliary information to improve fake news detection systems. In addition, Sliva et al. [13] observed that, approaching content from a data mining perspective, we can identify patterns that mark a text as fake. These patterns, such as clarity, make the text more readable and can convince the reader even when the content is fake. Collecting this kind of information produces huge, unstructured, incomplete and noisy data, which is difficult and expensive to manage. Giachanou et al. [6] proposed EmoCred, which incorporates the emotions expressed in claims into an LSTM network to differentiate between fake and real claims.

3 Fake News Spreaders Detection Systems

First, we apply the same preprocessing to the data of both the English and the Spanish task, following these steps:

– Load the tweets from the XML files.
– Concatenate the tweets of every author into a single chain, separating them with a blank space.

We apply this technique to both the English and the Spanish dataset. With the concatenated data, we vectorized our samples. The vectorizers used to perform this task were CountVectorizer and TfidfVectorizer [12]. CountVectorizer builds features by counting word occurrences in the samples, while TfidfVectorizer additionally down-weights terms that are common across samples in favour of more distinctive ones. The tokenizer selected was casual_tokenize, an implementation of TweetTokenizer from NLTK, due to its suitability for handling the characters and expressions commonly used on the Twitter social network. After all these transformations, we obtained a feature matrix for each of the two languages, English and Spanish.

3.1 First approaches

The first method applied to classify our samples employed Recurrent Neural Networks (RNNs), making use of pretrained word embeddings to represent words as real-valued vectors and improve the performance of our neural network system. For the Spanish language task, the embeddings loaded by our system were the Spanish Billion Words Corpus and Embeddings [1], trained with word2vec and consisting of nearly 1 million words, each represented as a vector of size 300. For the English language task, the embeddings had been trained using GloVe from Stanford [8] and collected by Laurence Moroney; they cover a corpus of roughly 6 billion tokens, and each word is represented as a vector of size 100. Once the embeddings were loaded, we trained our RNN model with LSTM and the following topology:

Fig. 1. Topology of convolutional RNN and LSTM.
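As an illustration of this first approach, the sketch below shows how such a topology can be assembled with TensorFlow/Keras. It is a minimal sketch rather than our exact configuration: the vocabulary size, filter count and layer widths are illustrative assumptions, and the zero-initialised embedding matrix stands in for the pretrained GloVe or SBWC vectors that would be loaded in practice.

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

# Placeholder sizes; in practice the matrix is filled row by row from the
# pretrained vectors (GloVe 100-d for English, SBWC 300-d for Spanish).
vocab_size, embedding_dim = 20000, 100
embedding_matrix = np.zeros((vocab_size, embedding_dim))

model = Sequential([
    # Embedding layer initialised with (and frozen to) the pretrained vectors
    Embedding(vocab_size, embedding_dim,
              embeddings_initializer=Constant(embedding_matrix),
              trainable=False),
    # Convolutional block preceding the recurrent part of the network (Fig. 1)
    Conv1D(filters=64, kernel_size=5, activation='relu'),
    MaxPooling1D(pool_size=4),
    # Recurrent layer
    LSTM(64),
    # Binary output: fake news spreader or not
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])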
3.2 Final systems

Despite the fact that the results obtained with the RNN were close to those reported in the state of the art for this kind of task, we did not reach promising results, as we will explain later in this paper.

At that point, we made use of the classifiers provided by the scikit-learn framework and chose the Gradient Boosting algorithm and the linear SVM algorithm for the English and Spanish tasks, respectively.

The main core of Gradient Boosting [4] consists of a predictive model based on decision trees, built step by step, allowing the optimization of a differentiable loss function. For this function we made use of the logistic (sigmoid) loss, called 'deviance' in the scikit-learn framework, whose mathematical expression is the following:

\sigma(z) = \frac{1}{1 + e^{-z}}    (1)

We also experimented with the AdaBoost [3] algorithm together with the loss function called 'exponential' in the scikit-learn framework. This algorithm focuses on classification problems and aims to convert a set of weak classifiers into a strong one. The final classification function can be represented as:

F(x) = \mathrm{sign}\left( \sum_{m=1}^{M} \theta_m f_m(x) \right)    (2)

where f_m stands for the m-th weak classifier and \theta_m is its corresponding weight, so that F is exactly the weighted combination of the M weak classifiers. The weight of the m-th weak classifier is given by:

\theta_m = \frac{1}{2} \ln\left( \frac{1 - \epsilon_m}{\epsilon_m} \right)    (3)

where \epsilon_m is the lowest weighted classification error, i.e. the weighted error of the chosen m-th weak classifier.

From another point of view, the main core of SVM [2] is based on the concept of separating a group of points (samples) into two different categories; as a consequence, our model had to classify each sample correctly into its category. SVM looks for the hyperplane which optimally separates the points belonging to the two classes, that is, the hyperplane with the longest distance (margin) to the points closest to it. The equation of the hyperplane in M dimensions can be given as:

y = b + \sum_{i=1}^{M} w_i x_i    (4)

where w_i are the components of the weight vector, b is the bias term and x_i are the input variables. Furthermore, given a group of points S = (x_1, c_1), ..., (x_N, c_N) and a constant C > 0, we should obtain weights θ ∈