                         Detecting fake news using Twitter social information
                         Jesús M. Fraile-Hernández* , Álvaro Rodrigo and Roberto Centeno
                         NLP & IR Group at UNED (Spain)


                                     Abstract
                                     In this paper, we study whether social information can be useful when classifying news. For this
                                     purpose, a set of news items in Spanish has been extended with social information. Subsequently, a
                                     classifier model has been proposed to carry out this task, combining the previously extracted social
                                     information with the textual information of the news items. Finally, we have studied which social
                                     features are the most relevant to this task.

                                     Keywords
                                     Social information, Classifying news, Classifier model, Social features, Fake news detection




                         1. Introduction
                         Due to the increase in communication channels in recent decades, users have access to an immense
                         amount of information almost instantaneously. However, it is relatively easy to fall for hoaxes or
                         misinformation on social media.
                         Traditional models of fake news detection focus on the linguistic characteristics of the news.
                         Later, pre-trained embeddings were used along with LSTMs [1]. Finally, with the emergence of
                         contextual models, [2] leveraged the pre-trained BERT model to perform transfer learning and
                         identify the veracity of news.
                            However, since even humans find it difficult to discern between true and false news, the textual
                         information of a news item is sometimes not enough. In [3], a hybrid approach is proposed at a
                         theoretical level that combines the linguistic characteristics of the news with an analysis of the
                         networks that form around it. In [4] the authors use different features to identify fake news in
                         popular Twitter threads. In [5] fake news is detected using only the extracted textual information.
                         Regarding hybrid models, the CSI model proposed in [6] performs a characterisation in three
                         modules: capturing, scoring and integrating. In [7], a news detection model is proposed that
                         considers the association of user interactions, the editor’s bias and the users’ stance towards the news.
                            The aim of this work is to study whether social information can provide useful information for
                         the detection of fake news. To this end, social information has been collected from Twitter to extend
                         FakeDeS, a relevant corpus of news in Spanish, and a model has been designed to include textual and
                         social information. Furthermore, we intend to study which social features are the most relevant for
                         news classification.
                            The rest of this paper is structured as follows: Section 2 describes the datasets to be used along with
                         the task to be solved. Section 3 describes the methodology followed including the extraction of social
                         information from Twitter along with the models proposed based on the data they use. Section 4 includes
                         the evaluation metrics used. Section 5 then presents the results, which will be discussed in Section 6.
                         Finally, conclusions and future work are given in Section 7.




                         Proceedings of the 1st Workshop on COuntering Disinformation with Artificial Intelligence (CODAI), co-located with the 27th
                         European Conference on Artificial Intelligence (ECAI), pages 19–28, October 20, 2024, Santiago de Compostela, Spain
                         *
                           Corresponding author.
                         $ jfraile@lsi.uned.es (J. M. Fraile-Hernández)
                          0009-0001-5474-4844 (J. M. Fraile-Hernández)
                                    © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


Jesús M. Fraile-Hernández et al. CODAI Workshop Proceedings                                           19–28




        Figure 1: Results of the IberLEF 2021 shared task on the test set.


2. Dataset and task
The dataset we will work with is the Spanish Fake News Corpus (FakeDeS) [8], which contains publications
in Spanish about different events that were collected from November 2020 to March 2021. Each of
these publications is labelled as true or false. Newspaper websites and fact-checking websites were
mainly used to collect the information.
    The dataset is divided into 3 files with a total of 1543 news items. Because of the methodology used,
it has been decided to merge the training and development files to obtain what we will call the training
set. Each of the news items contains information such as the topic, the name of the source, the headline,
the text and the link to the news item.
    The training set has a total of 971 news items, of which 480 are false and 491 are true. On the other
hand, the test set consists of 572 news items, half of which are true and half of which are false. Therefore,
we are dealing with balanced data sets.
    The topics covered in the training corpus are: politics, entertainment, sport, society, science, health,
economy, security and education.
    It should be noted that the test set has news related to Covid-19, while the training set does not
present any news related to this topic (the most similar are the health news, but in no case do they
mention Covid-19). Therefore, the models that are proposed will have to correctly classify this topic
without having seen it in the training.
    In IberLEF 2021, a shared task was proposed whose objective was to classify a series of news items
as true or false. To do so, the FakeDeS corpus described above was used. A report was published in
[9], which collected the most important characteristics of the best-performing models. The results of
this task by the different participants can be seen in Figure 1. Among the approaches used to solve
it, the participants of the GDUFS team, the team that achieved the best accuracy, used a BERT model
and sample memory with an attention mechanism. The method consisted of taking the first and last
segments of the texts and feeding them into a BERT system, obtaining two embeddings (head and tail).
In addition, there is a matrix called ‘sample memory’, which is obtained by taking a random sample
of the head and tail embeddings; this matrix is used in an attention mechanism with the rest of the
texts. In contrast to the GDUFS_DM approach, the participants of team Haha, the second-placed team,
employed feature selection with a weighted tf-idf and a multilayer perceptron. This model not only
analysed the content of the news item, but also combined information such as its publisher or its topic.

Figure 2: Violin diagram of the number of tweets collected.


3. Methodology
This section describes the methodology used to extract social information from Twitter users. In
addition, the models trained according to the type of data they use are presented.

3.1. Social information extraction
The main objective of this work is to study the contribution of social information to the detection of
fake news and, as mentioned in Section 1, there is no corpus in Spanish that contains this information.
For this reason, we decided to extract it from the social network Twitter, using the API provided by
the platform.
   For each news item, we searched for those tweets that contained the headline of the news item or
the link to it. To solve the problem of the maximum length of the queries, special characters have been
eliminated from the news headlines.
   According to [4] and [5], certain tweet metadata make it possible to infer whether a user may be
prone to propagating fake news or whether a tweet may contain untruthful information. Therefore, it
has been decided to extract the following metadata from each of the tweets:

    • Tweet: text of the tweet, author id, tweet id, number of retweets, number of replies, number of
      likes, number of quotes.
    • User: username (str), user creation date (ISO 8601 date), verified user (bool), number of followers
      (int), number of followed accounts (int), number of tweets (int), number of times listed (int).
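As an illustrative sketch (not the exact code used in this work), the query construction and metadata flattening described above could look as follows; the helper names are our own, and the field names assume the JSON layout of the Twitter API v2:

```python
import re

def build_query(headline: str, url: str, max_len: int = 512) -> str:
    """Search for tweets containing the news headline or the link to the item.
    Special characters are stripped from the headline to keep the query valid
    and within the API's maximum query length."""
    clean = re.sub(r"[^\w\s]", "", headline)
    return f'"{clean}" OR url:"{url}"'[:max_len]

def extract_metadata(tweet: dict, user: dict) -> dict:
    """Flatten one tweet and its author's record into a feature row."""
    m, u = tweet["public_metrics"], user["public_metrics"]
    return {
        "retweet_count": m["retweet_count"],
        "reply_count": m["reply_count"],
        "like_count": m["like_count"],
        "quote_count": m["quote_count"],
        "verified": int(user["verified"]),
        "followers_count": u["followers_count"],
        "following_count": u["following_count"],
        "tweet_count": u["tweet_count"],
        "listed_count": u["listed_count"],
    }
```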

We have managed to extract posts from 41.67% of the total number of news items. Of these, the
distribution of the number of tweets collected per news item shows a high concentration in the (0, 200)
interval, representing 86% of the news items. Within this interval, it is observed that true news tends
to receive more interaction. However, as the number of tweets about a news item increases, it is evident
that fake news receives a greater number of interactions. This trend can be seen in the violin diagram
presented in Figure 2.
   It is worth noting that, although the news is written in Spanish, there are tweets in English or French
that talk about the news. This is especially true for news related to Covid-19.






3.2. Textual models
In this section, the textual methods used for the binary classification of the news items are presented.
The full text of each news item has been used, so it had to be preprocessed. For the non-contextual
models, URLs, emoticons and other non-textual expressions, and stopwords have been removed; the
text has been converted to lowercase; and lemmatisation and stemming have been applied. For the
contextual models, only the URLs have been removed.
   Subsequently, five different approaches have been used:
   1. Vector space model based on bags of words (BoW).
   2. Vector space model using a weighted tf-idf.
   3. Bigram counting.
   4. Neural Networks and deep learning.
   5. Contextual models.
For approaches 1, 2 and 3, Naive Bayes, SVM, Logistic Regression, Decision Tree and Random Forest
models have been trained. For approach 4, we have trained multilayer perceptrons taking the tf-idf
weight vector as input; multilayer perceptrons and convolutional networks with a trainable embedding
layer; and multilayer perceptrons, convolutional networks, LSTMs, GRUs and bidirectional networks
with a pre-trained embedding layer. Finally, for approach 5, the BETO model has been selected: a
Spanish BERT [10] with a final classification layer of two neurons. BETO is a BERT model trained with
the whole-word masking technique on a large corpus of more than three billion Spanish words.
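As a sketch of how such a textual pipeline can be put together with scikit-learn, the following combines a weighted tf-idf (approach 2) with a Random Forest; the hyperparameters and toy data are illustrative, not the tuned values used in the experiments:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

def make_tfidf_rf(n_estimators: int = 100, seed: int = 0) -> Pipeline:
    # tf-idf weighting of the (preprocessed) news text, fed to a Random Forest
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, strip_accents="unicode")),
        ("rf", RandomForestClassifier(n_estimators=n_estimators, random_state=seed)),
    ])

# Toy usage (1 = true news, 0 = fake news); real training uses the FakeDeS texts.
texts = ["el gobierno anuncia nuevas medidas",
         "cura milagrosa revelada por un famoso",
         "resultados oficiales publicados hoy",
         "secreto increible que los medios ocultan"]
labels = [1, 0, 1, 0]
model = make_tfidf_rf().fit(texts, labels)
pred = model.predict(["medidas oficiales del gobierno"])
```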

3.3. Models with social information
The methods that use only social information rely on the following metadata for each collected tweet:
number of retweets, number of replies, number of likes, number of quotes, whether the author is
verified, the author’s number of followers, number of followed accounts, number of tweets and number
of times listed. Then, in order to capture the impact of the news item on social networks, the number
of tweets collected for that news item is added.
   To represent all the tweets that discuss a given news item, the average of the previous characteristics
over its tweets has been calculated. Finally, the standard deviation of each characteristic was added.
In this way, a data matrix with 20 columns is obtained (where the column for the deviation of the
number of tweets of the news item is always 0).
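This aggregation can be sketched as follows, assuming the nine per-tweet metadata columns listed in Section 3.1 plus the tweet count (the function name is ours):

```python
import numpy as np

def social_features(tweet_rows: np.ndarray) -> np.ndarray:
    """tweet_rows: (n_tweets, 9) matrix of per-tweet metadata for one news item.
    Returns the 20-dim vector: the mean and standard deviation of each of the
    nine metadata columns plus the tweet count (whose deviation is always 0)."""
    counts = np.full((len(tweet_rows), 1), float(len(tweet_rows)))
    feats = np.hstack([tweet_rows, counts])                         # (n_tweets, 10)
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])  # (20,)
```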
   Once the feature matrix has been obtained, different learning models have been trained with different
hyperparameter explorations, such as Decision Trees, Random Forest, SVM, Gradient Boosting, Adaptive
Boosting and MLP, among others.

3.4. Hybrid model
A hybrid model has been developed that seeks to take advantage of both the textual information
provided by the text of the news item and the social information extracted from Twitter (both the
non-textual information described in the previous section and the text of the collected tweets).
   In this model, for each news item, a specialised model is used to classify the news using social
information. For this purpose, the best model from the previous subsection (Random Forest) is selected.
With this model, for each news item, the probabilities of being true or false are obtained using as input
the corresponding row of the matrix of social characteristics with standard deviation described in that
section. If no tweets could be extracted for a news item, the output is a vector of two zeros.
   In parallel, the text of the news item is processed using the BETO: Spanish BERT model [11]. The
output is a vector of dimension 768.
   In parallel to these two processes, for each news item with tweets collected, the text of each tweet is
pre-processed (eliminating URLs and tokenising) and subsequently processed using the pre-trained
XLM-roBERTa-base model [12]. This transformer model has been trained on a corpus of about 198







Figure 3: Workflow of the hybrid model.


million tweets in 8 different languages (Spanish, Arabic, English, French, German, Hindi, Portuguese
and Italian) and is specialised in sentiment classification (positive, negative or neutral). In our case,
the last layer of the model is removed, obtaining as output a vector of length 768 that represents the
most relevant features of the text of the tweet.
   For each available tweet, this process has been carried out, obtaining a vector of length 768. Finally,
the vectors of all the tweets of a news item have been averaged to obtain a single vector representing
them. If the news item has no social information, a vector of zeros is used.
   Then, the three vectors are concatenated to obtain a vector of dimensionality 1538 (2 + 768 + 768).
This workflow can be seen in Figure 3.
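The assembly of the 1538-dimensional vector (2 Random Forest probabilities + 768 BETO features + 768 averaged tweet features) can be sketched as follows; the function name is ours:

```python
import numpy as np

def hybrid_vector(rf_probs: np.ndarray, beto_vec: np.ndarray,
                  tweet_vecs: list) -> np.ndarray:
    """rf_probs: (2,) true/fake probabilities from the social Random Forest
    (zeros when no tweets were collected); beto_vec: (768,) BETO embedding of
    the news text; tweet_vecs: list of (768,) XLM-roBERTa tweet embeddings."""
    tweets = np.mean(tweet_vecs, axis=0) if len(tweet_vecs) else np.zeros(768)
    return np.concatenate([rf_probs, beto_vec, tweets])  # shape (1538,)
```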
   Once all the news items have been processed following the previous diagram, several models have
been trained, such as Decision Trees, Random Forest, SVM, Gradient Boosting, Adaptive Boosting and
MLP, among others.


4. Evaluation
Two different methodologies have been used to evaluate the models: cross-validation and an evaluation
on the test set.

4.1. 𝑘-fold cross-validation
Cross-validation is one of the most widely used methods to estimate the prediction error of a model
with a given set of hyperparameters. Here, 𝑘-fold cross-validation has been used. This method divides
the data set, in our case the training set together with the development set, into 𝑘 equal parts
𝑃1 , . . . , 𝑃𝑘 . For each 𝑃𝑛 , the model is trained using the other 𝑘 − 1 parts and the error in predicting the
𝑃𝑛 data (never seen by this model) is calculated. Doing this for all 𝑘 parts yields 𝑘 errors, whose mean
and variance provide a measure of the average error of that model with those hyperparameters.
   It should be noted that this method has a fairly large computational cost, since 𝑘-fold cross-validation
requires training 𝑘 models. As a general rule, a value of 5 or 10 is





                                          Textual models       F1
                                            TF-IDF (RF)       0.849
                                             BoW (RF)         0.825
                                            Bigrams (RF)      0.822
                                          MLP (Embedding)     0.786
                                           MLP (TF-IDF)       0.751
                                               CNN            0.740
                                               BETO           0.727
                                               GRU            0.678
Table 1
Cross-validation results of the textual models.


usually chosen as a good compromise between bias and variance. In our case a 5-fold cross-validation
has been used.
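The procedure described above can be sketched with scikit-learn; the model and data below are placeholders, not the actual feature matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))        # stand-in for the feature matrix
y = rng.integers(0, 2, size=100)      # stand-in for true/fake labels

# 5-fold cross-validation: each fold is held out once while the model is
# trained on the remaining 4 folds.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring="f1_macro")
mean_score, var_score = scores.mean(), scores.var()
```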

4.2. Test set evaluation
Finally, the model that performed best in the previous cross-validations is evaluated on the test set.
This set is never seen during training and provides a measure of the generalisation ability of the
model.

4.3. Evaluation metrics
To evaluate the performance of our classification models, we use the F1 metric. The F1 value is
calculated for both the true and the fake class. The Macro-F1, or simply F1, is then computed as the
average of these two values.


5. Results
In this section the results of the various trained models are presented. For each approach in Section 3,
the following results are shown:

    • Within the training of a particular approach, the Macro-F1 value of the best algorithms used is
      shown, averaged over 5-fold cross-validation.
    • For each approach, the model with the best Macro-F1 during training is selected. Subsequently,
      it is retrained with all the data and evaluated on the test set, reporting its F1-Fake, F1-True,
      Macro-F1 and Accuracy.

5.1. Textual models
The training results of the methods described in section 3.2 are listed in Table 1.
   It can be seen that the non-neural models outperform those using neural networks. This could be
due to the fact that the neural models have a large number of parameters to optimise while our data set
is rather limited. It is worth noting that using pre-trained embeddings resulted in lower performance
than training the embeddings from scratch. Also noteworthy is the poor performance of the recurrent
networks, models that required a large amount of training time and are commonly used for language
processing problems. The best performing approach was a weighted tf-idf together with a Random
Forest model.
   The results of the evaluation of this model on the test set and the results of the teams participating in
IberLEF 2021 are shown in Table 4.






                                    Social information models           F1
                                           Random Forest               0.845
                                         Gradient Boosting             0.834
                                         Adaptive Boosting             0.826
                                    Extremely Randomized Trees         0.817
                                           Decision Trees              0.797
                                       K-Nearest Neighbours            0.788
                                    Multilayer Perceptron (MLP)        0.787
                                                SVM                    0.785
                                    Passive-Aggressive Classifier      0.785
                                 Perceptron with two hidden layers     0.783
                                 Linear Discriminant Analysis (LDA)    0.781
                                      Multinomial Naive Bayes          0.781
                                  Perceptron with one hidden layer     0.781
                                       Bernoulli Naive Bayes           0.779
                                  Quadratic Discriminant Analysis      0.776
                                         Logistic Regression           0.703
Table 2
Cross-validation results of the social information models.


5.2. Social information models
The training results of the methods described in section 3.3 are collected in Table 2.
   We can see that the F1 of the models is quite high. Tree-based models occupy the top 5 positions in
the list, and tree ensembles clearly outperform individual decision trees. The best performing approach
was a Random Forest model. It should be remembered that this model has only been trained and
evaluated on the news items for which social information could be extracted, so its training and test
sets are smaller than in the other cases.
   Given these results, the Random Forest classifier was chosen as the social-information component
of the hybrid model, as indicated in Section 3.4.

5.3. Hybrid model
The training results of the methods described in section 3.4 are listed in Table 3.
   In view of the training results, either of the top two models would be a valid choice, and the rest of
the models perform very similarly. Logistic regression was selected over decision trees since it is a
simpler algorithm, with fewer hyperparameters and a lower computational cost.
   The results of the evaluation of this model on the test set and the results of the teams participating in
IberLEF 2021 are shown in Table 4.


6. Discussion
This section presents a discussion of the results obtained.
   In view of the results shown in Tables 1 and 3, it can be seen that the approach that obtains the best
F1 during training is a model that uses only textual information, more specifically a Random Forest
with a weighted tf-idf. This approach obtains a higher F1 than the models that include social
information, so a priori it could be thought that social information is not relevant.
   However, Table 4 shows that on the test set the model using only textual information obtains worse
results than the hybrid model. This is because, with tf-idf weighting, words weighted on the training
news corpus may not appear in the test set. This is why models such as transformer networks pre-trained






                                           Hybrid Model                      F1
                                            Decision Trees                  0.818
                                         Logistic Regression                0.818
                                                SVM                         0.809
                                    Linear Discriminant Analysis            0.809
                                           Random Forest                    0.809
                                          Gradient Boosting                 0.809
                                     Passive-Aggressive Classifier          0.809
                                    Adaptive Boosting (AdaBoost)            0.809
                                    Extremely Randomized Trees              0.809
                                   Quadratic Discriminant Analysis          0.809
                                    Multilayer Perceptron (MLP)             0.809
                                       K-Nearest Neighbours                 0.809
                                 Perceptron with three hidden layers        0.809
                                  Perceptron with two hidden layers         0.808
                                  Perceptron with one hidden layer          0.808
                                      Multinomial Naive Bayes               0.631
                                        Bernoulli Naive Bayes               0.607
Table 3
Cross-validation results of the hybrid model.

                        Model               F1-Fake   F1-True   Macro-F1   Accuracy
                        Textual Models       0.7140    0.7488    0.7314     0.7325
                        Hybrid Model         0.7900    0.7352    0.7626     0.7657
                        GDUFS_DM             0.7666    0.7649    0.7666     0.7657
                        Haha                 0.7548    0.7522    0.7548     0.7535
                        Chats_               0.7514    0.7690    0.7514     0.7605
                        SINAI                0.7385    0.7821    0.7385     0.7622
                        baseline-BERT        0.7321    0.7432    0.7321     0.7378
                        baseline-BOW-SVM     0.7217    0.7359    0.7217     0.7290
Table 4
Results on the test set, including the best participants of IberLEF 2021.


on large corpora will have more generalisation capacity and, therefore, will be able to obtain better
results.
   Once social information is introduced into the model, a significant increase in results can be seen.
This is because, on the one hand, the text is processed using transformer models with a very high
generalisation capacity and, on the other, the non-textual social information extracted from Twitter is
independent of the subject matter of the news.
   Comparing our models with the best-ranked systems of IberLEF 2021 (Figure 1), the hybrid model is
the one that best classifies fake news, and it obtains the same Accuracy as the first-ranked team.
   In addition, a study has been carried out on which social features are the most relevant for the
model. For this purpose, permutation importance, as set out in [13], has been used. It can be seen that
8 of the 9 most relevant features depend only on the author’s information and not on the content or
information of the tweet. These 9 features are, in order of importance: listed_count,
following_count_std, followers_count, tweet_count_std, followers_count_std, quote_count_std, verified,
verified_std, tweet_count. Moreover, among these features, those obtained from the standard deviation
over the set of tweets collected for each news item stand out.
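As an illustration of the technique on placeholder data (not our feature matrix), permutation importance measures how much the score drops when one feature's column is shuffled:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)          # only feature 0 is informative

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]   # most important first
```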
   The percentage importance of the most relevant features used in the logistic regression of the
hybrid model has also been calculated. To compute the importance of each feature, the regression
coefficients 𝑤𝑖 have been extracted and transformed as 𝑓𝑖 = exp(𝑤𝑖 ). Finally, the percentage of each
of them has been calculated. With this, the most relevant characteristic





for the model, with 10 times the importance of the rest, was the variable corresponding to the
probability, returned by the Random Forest, that a news item is true given the social information of
the news item.
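This computation can be sketched as follows (the coefficient values are illustrative):

```python
import numpy as np

def coef_importance_pct(weights: np.ndarray) -> np.ndarray:
    """Importance of each feature as f_i = exp(w_i), normalised to percentages."""
    f = np.exp(weights)
    return 100.0 * f / f.sum()

# A large coefficient (here the first one, standing in for the Random Forest
# probability feature) dominates the importance percentages.
pct = coef_importance_pct(np.array([2.3, 0.1, -0.5, 0.0]))
```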


7. Conclusions and Future Work
Throughout this work, it has been observed that introducing social information, combined with
textual information, improves the performance of news classification models. This suggests that, when
tackling this problem, it is useful to add social information to the dataset. However, obtaining this
information is quite costly, both economically and in terms of time.
   Additionally, the importance of social features in classifier models has been studied, concluding that
author-related features are more important than tweet-related features. The development of a model
that combines all textual and social features achieves similar or better results than models that use only
textual information.
   However, it is crucial to acknowledge several important limitations:
    • Impractical Approach: Many of the social signals being harvested are post-facto. While disin-
      formation might actually be spreading, many features (such as the number of reposts) would
      not have stabilized. Thus, while the current approach of augmenting these signals might work
      post-facto, it is unlikely to work with live data. Even post-facto, it is unclear whether the approach
      will scale.
    • Flawed Methodology: The use of balanced training data, and a small set of data at that, is not
      meaningful. Particularly, it is unclear how learning from such a small corpus would generalize
      when new kinds of disinformation arise. In practice, the distribution of disinformation-carrying
      articles compared to genuine ones is far from balanced. Therefore, any realistic methodology
      needs to incorporate the ability to handle imbalance and transferability from the learning phase.
      Moreover, adversary behavior might change to emulate the features of good articles or at least
      stray away from its current behavior, rendering the specific features used for classification
      obsolete.
    • Too Static and Small Dataset: The dataset used is too static and small, and lacks adequate diversity
      to consider any results conclusive. A variety of distinct datasets ought to be used to determine if
      the ideas actually work in a more general setting.
   As a line of future work, a promising approach would be not only to study the individual social
metadata of each user, but also to build a social graph of followers and followees to capture the social
relationships between them. Additionally, the dataset should be expanded and diversified, and methods
should be developed to handle imbalanced data and adapt to changing adversary behavior.
   We acknowledge that this work, while preliminary, can trigger useful discussions and provides a
foundation upon which more robust and scalable approaches can be built in the future.


Acknowledgments
This work was supported by the HAMiSoN project grant CHIST-ERA-21-OSNEM-002, AEI PCI2022-
135026-2 (MCIN/AEI/10.13039/501100011033 and EU “NextGenerationEU”/PRTR).


References
 [1] P. Bharadwaj, Z. Shao, Fake news detection with semantic features and text mining, International
     Journal on Natural Language Computing (IJNLC) Vol 8 (2019).
 [2] R. K. Kaliyar, A. Goswami, P. Narang, FakeBERT: Fake news detection in social media with a
     BERT-based deep learning approach, Multimedia Tools and Applications 80 (2021) 11765–11788.





 [3] N. K. Conroy, V. L. Rubin, Y. Chen, Automatic deception detection: Methods for finding fake news,
     Proceedings of the association for information science and technology 52 (2015) 1–4.
 [4] C. Buntain, J. Golbeck, Automatically identifying fake news in popular twitter threads, in: 2017
     IEEE International Conference on Smart Cloud (SmartCloud), IEEE, 2017, pp. 208–215.
 [5] M. Albahar, A hybrid model for fake news detection: Leveraging news content and user comments
     in fake news, IET Information Security 15 (2021) 169–177.
 [6] N. Ruchansky, S. Seo, Y. Liu, Csi: A hybrid deep model for fake news detection, in: Proceedings of
     the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 797–806.
 [7] K. Shu, S. Wang, H. Liu, Exploiting tri-relationship for fake news detection, arXiv preprint
     arXiv:1712.07709 8 (2017).
 [8] J.-P. Posadas-Durán, H. Gómez-Adorno, G. Sidorov, J. J. M. Escobar, Detection of fake news in a
     new corpus for the spanish language, Journal of Intelligent & Fuzzy Systems 36 (2019) 4869–4876.
 [9] H. Gómez-Adorno, J. P. Posadas-Durán, G. B. Enguix, C. P. Capetillo, Overview of FakeDeS at
     IberLEF 2021: Fake news detection in Spanish shared task, Procesamiento del Lenguaje Natural 67
     (2021) 223–231.
[10] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model
     and evaluation data, PML4DC at ICLR 2020 (2020) 1–10.
[11] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained bert model and
     evaluation data, in: PML4DC at ICLR 2020, 2020.
[12] F. Barbieri, L. Espinosa-Anke, J. Camacho-Collados, XLM-T: Multilingual language models in
     Twitter for sentiment analysis and beyond, in: Proceedings of LREC, Marseille, France, 2022,
     pp. 20–25.
[13] A. Altmann, L. Toloşi, O. Sander, T. Lengauer, Permutation importance: a corrected feature
     importance measure, Bioinformatics 26 (2010) 1340–1347.



