        Approaches to the Profiling Fake News
       Spreaders on Twitter Task in English and
                       Spanish

           Jacobo López Fernández and Juan Antonio López Ramírez

                           Universitat Politècnica de València
                     jalofer1@posgrado.upv.es, jualora1@inf.upv.es



        Abstract. This paper discusses the decisions made in approaching PAN's
        Profiling Fake News Spreaders on Twitter task at CLEF 2020. We briefly
        describe how we combined each author's tweets to create samples that do
        or do not represent a fake news spreader. We decided to handle both lan-
        guages proposed for this task, Spanish and English, and the methodolo-
        gies we propose are Linear Support Vector Machines (SVMs) and Gradient
        Boosting, respectively. Other approaches, such as Long Short-Term Memory
        (LSTM) networks, were also considered while searching for the model with
        the best accuracy, and these are reported in this paper as well. Because
        of the reduced amount of data, we used cross-validation to obtain the
        accuracy results. We achieved average accuracy scores of 0.735 for the
        Spanish subtask and 0.685 for the English subtask.

        Keywords: author profiling, fake news, multilingual, social media, spreaders


1     Introduction
The trust placed in information read on social media has steadily increased in
the last few years. However, these platforms also allow their users to publish
and propagate misinformation, with severe consequences for our society. First of
all, we should make it clear that there are different types of misinformation
and disinformation, such as fake news, satire or rumours that go viral in online
social networks [5]. In addition, psycho-linguistic information such as emotion,
sentiment or informal language should be analysed beforehand. By exploiting
information extracted from user profiles and user interactions, we should be
able to classify users depending on the information obtained.
    A great amount of fake news and rumours is propagated in online social
networks, usually with the aim of deceiving users and shaping specific opinions
[15]. Users play a critical role in the creation and propagation of fake news
online by consuming and sharing articles with inaccurate information either
intentionally or unintentionally.
     To prevent the dissemination of misinformation and disinformation, we may
profile fake news spreaders automatically. Author profiling is based on the
detection of certain characteristics in user profiles by means of linguistic
pattern recognition techniques.
     This paper discusses the decisions made in approaching PAN's Profiling Fake
News Spreaders on Twitter task at CLEF 2020 [11]. The task consists in
determining, given a Twitter feed, whether its author is likely to be a fake
news spreader. It therefore focuses on identifying possible fake news spreaders
on social media as a first step towards preventing fake news from being
propagated among online users. The task has a multilingual perspective and
includes tweets in English and Spanish, and it is defined as a binary
classification task.

2   Related Work
Fake news detection has attracted a lot of research attention in recent years.
Guess et al. [7] approached this field by carrying out research during elections,
in particular the 2016 US election process. In that paper, they propose a system
that obtains features from polls published on Facebook. Popat et al. [9]
suggested an end-to-end model to evaluate the credibility of arbitrary texts
without human supervision. Subsequently, they presented a biLSTM neural network
model which aggregates signals from external evidence articles, the language of
these articles and the trustworthiness of their sources.
    Shu et al. [14] pointed out that fake news spreaders cannot be profiled
precisely based only on text content; instead, we should understand the
correlation between user profiles on social media and fake news. They state that
social engagements should be used as auxiliary information to improve fake news
detection systems. In addition, Sliva et al. [13] observed that, by approaching
content from a data mining perspective, we can identify patterns that mark a
text as fake, such as a clarity that makes the text more readable and can
convince the receiver even when it is fake. Collecting this kind of information
produces huge, unstructured, incomplete and noisy data that are difficult and
expensive to manage. Giachanou et al. [6] proposed EmoCred, which incorporates
the emotions expressed in claims into an LSTM network to differentiate between
fake and real claims.

3   Fake News Spreaders Detection Systems
First, we apply the same type of preprocessing to the data of both the English
and the Spanish task, following these steps:
 – Load tweets from XML files.
 – Concatenate the tweets of every author into a single chain, separating them
   with a blank space. We apply this technique to both the English and the
   Spanish dataset (a sketch of this step is shown below).
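
The following is a minimal sketch of this preprocessing, assuming the PAN layout
of one XML file per author in which every tweet is a <document> element; the
paths, tag names and function names are illustrative assumptions rather than the
exact code we used.

```python
# Hypothetical sketch of the preprocessing step (PAN-style XML layout assumed).
import glob
import os
import xml.etree.ElementTree as ET


def load_author_chains(folder):
    """Return a dict mapping author id -> all of its tweets joined by a blank space."""
    chains = {}
    for path in glob.glob(os.path.join(folder, "*.xml")):
        author_id = os.path.splitext(os.path.basename(path))[0]
        tree = ET.parse(path)
        tweets = [doc.text or "" for doc in tree.iter("document")]
        chains[author_id] = " ".join(tweets)  # one concatenated chain per author
    return chains
```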
    With the concatenated data, we vectorized our samples. The vectorizers used
to perform this task were CountVectorizer and TfidfVectorizer [12].
CountVectorizer builds features simply by counting the words in each sample,
while TfidfVectorizer additionally down-weights words that are frequent across
the whole collection in favour of rarer, more discriminative ones.
    At the same time, the tokenizer selected was casual_tokenize, NLTK's
functional interface to TweetTokenizer, due to its suitability for handling the
characters and expressions commonly used on the Twitter social network.
    After all these transformations, we ended up with a feature matrix for each
of the two languages proposed, English and Spanish.
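
As an illustration, the vectorization step can be set up as in the following
sketch, where casual_tokenize is plugged into the scikit-learn vectorizers; the
sample texts and parameter values shown are assumptions for illustration only.

```python
# Illustrative set-up of the vectorizers with NLTK's casual_tokenize.
from nltk.tokenize import casual_tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# token_pattern=None silences the warning issued when a custom tokenizer is given.
count_vec = CountVectorizer(tokenizer=casual_tokenize, token_pattern=None)
tfidf_vec = TfidfVectorizer(tokenizer=casual_tokenize, token_pattern=None)

# chains would be the concatenated tweet chains built in the previous step.
chains = ["RT @user this is real news :) https://t.co/x",
          "BREAKING!!! #hoax spreading fast"]
X_counts = count_vec.fit_transform(chains)  # raw word counts per author chain
X_tfidf = tfidf_vec.fit_transform(chains)   # tf-idf weighted feature matrix
```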


3.1   First approaches

The first method applied to classify our samples employed Recurrent Neural
Networks (RNNs), making use of pretrained word embeddings in order to help
represent words as real-valued vectors and lead to a better performance of our
neural network system.
    For the Spanish language task, the embeddings loaded by our system were the
Spanish Billion Words Corpus and Embeddings [1], which had been trained using
word2vec and consist of nearly one million words, each of them represented as a
vector of size 300.
    For the English language task, the embeddings loaded by our system had been
trained using GloVe from Stanford [8] and collected by Laurence Moroney; they
were trained on a corpus of around 6 billion tokens, and each word is
represented as a vector of size 100.
    Once the embeddings were loaded, we trained our RNN model with LSTM
and the following topology:




                Fig. 1. Topology of convolutional RNN and LSTM.
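
A rough, hedged sketch of this kind of topology is given below: an embedding
layer initialised with the pretrained vectors, followed by a convolution and an
LSTM layer, as suggested by Fig. 1. The layer sizes, vocabulary size and the way
the embedding matrix is built are illustrative assumptions, not our exact
configuration.

```python
# Illustrative Keras sketch of a convolutional RNN with LSTM on top of
# pretrained, frozen word embeddings (all sizes are assumptions).
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Conv1D, Dense, Embedding, LSTM, MaxPooling1D

vocab_size, embedding_dim = 10000, 300                     # e.g. 300 for the Spanish embeddings
embedding_matrix = np.zeros((vocab_size, embedding_dim))   # would be filled from the pretrained vectors

model = Sequential([
    Embedding(vocab_size, embedding_dim,
              embeddings_initializer=Constant(embedding_matrix),
              trainable=False),                             # keep the pretrained vectors frozen
    Conv1D(64, 5, activation="relu"),
    MaxPooling1D(4),
    LSTM(64),
    Dense(1, activation="sigmoid"),                         # spreader / non-spreader decision
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```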
3.2   Final systems

Although the results obtained with the RNN were close to those reported in the
state of the art for this kind of task, we did not reach promising results, as
we explain later in this paper. At that point, we turned to the classifiers
provided by the scikit-learn framework and chose the Gradient Boosting algorithm
for the English task and the linear SVM algorithm for the Spanish task.
    The core of Gradient Boosting [4] is a predictive model based on decision
trees, built step by step so as to optimize a differentiable loss function. For
this function we used the logistic-regression ('sigmoid') loss, called
'deviance' in the scikit-learn framework, whose sigmoid function is expressed as
follows:

\[
  \sigma(z) = \frac{1}{1 + e^{-z}} \tag{1}
\]
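
A minimal scikit-learn sketch of this classifier follows; the hyper-parameter
values are illustrative assumptions, and only the choice of the 'deviance' loss
comes from the description above (recent scikit-learn versions rename this loss
to 'log_loss').

```python
# Gradient Boosting with the logistic loss; parameter values are illustrative.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

gb_clf = GradientBoostingClassifier(loss="log_loss",  # named 'deviance' in older scikit-learn
                                    n_estimators=100, learning_rate=0.1, max_depth=3)
# Accuracy estimated with cross-validation because of the reduced amount of data:
# scores = cross_val_score(gb_clf, X_tfidf, y, cv=5, scoring="accuracy")
```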
    We also experimented with the AdaBoost [3] algorithm, which corresponds to
the loss function called 'exponential' in the scikit-learn framework. This
algorithm focuses on classification problems and aims to convert a set of weak
classifiers into a strong one. The final classification rule can be written as:
\[
  F(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \theta_m f_m(x)\right) \tag{2}
\]

    where f_m stands for the m-th weak classifier and θ_m is the corresponding
weight, so F(x) is exactly the weighted combination of the M weak classifiers.
The function which gives the weight of the m-th weak classifier is the
following:

\[
  \theta_m = \frac{1}{2}\ln\left(\frac{1-\epsilon_m}{\epsilon_m}\right) \tag{3}
\]
    where ε_m is the weighted classification error of the m-th weak classifier.
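
A hedged sketch of this experiment is given below, using either
GradientBoostingClassifier with the 'exponential' loss (which scikit-learn
documents as recovering AdaBoost) or AdaBoostClassifier directly; the
hyper-parameter values are illustrative assumptions.

```python
# Exponential-loss boosting vs. plain AdaBoost; parameter values are illustrative.
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

exp_boost = GradientBoostingClassifier(loss="exponential", n_estimators=100)
ada_boost = AdaBoostClassifier(n_estimators=100, learning_rate=1.0)
```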
    From another point of view, the core of the SVM [2] is based on the concept
of separating a group of points (samples) into two different categories. As a
consequence, our model had to be able to classify each sample correctly into its
category. The SVM looks for a hyperplane which optimally separates the points
belonging to the two classes; that is, the hyperplane with the largest distance
(margin) to the points closest to it.
    The equation of the hyperplane in M dimensions can be given as:
\[
  y = b + \sum_{i=1}^{M} w_i x_i \tag{4}
\]

   where w_i are the components of the weight vector, b is the bias term and
x_i are the input variables.
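
A minimal sketch of the linear SVM used for the Spanish task is shown below; the
value of C and the use of LinearSVC are illustrative assumptions, consistent
with the soft-margin formulation introduced next.

```python
# Linear SVM for the Spanish task; C is the soft-margin constant (illustrative value).
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

svm_clf = LinearSVC(C=1.0)
# Accuracy estimated with cross-validation, as for the Gradient Boosting model:
# scores = cross_val_score(svm_clf, X_tfidf, y, cv=5, scoring="accuracy")
```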
    Furthermore, given a group of points S = {(x_1, c_1), ..., (x_N, c_N)} and a
constant C > 0, we should obtain weights θ ∈