<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TSIA team at FakeDeS 2021: Fake News Detection in Spanish Using Multi-Model Ensemble Learning</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
<institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Yunnan</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Fake news has become a hotly debated topic in journalism. This paper describes the contribution of the TSIA team to the Fake News Detection in Spanish Shared Task of IberLEF 2021. We regard the task as binary classification and propose three model architectures based on the pre-trained models BETO and XLM-RoBERTa-Large. We first fine-tuned the Spanish pre-trained model BETO; we then replaced BETO with the multilingual pre-trained model XLM-RoBERTa-Large and fine-tuned it, including an added CNN for feature extraction. Our system achieves its best F1-score of 0.6860 by hard voting, which ranks 10th out of 21 teams on the final leaderboard, only 0.0806 below the best score.</p>
      </abstract>
      <kwd-group>
<kwd>Fake News Classification</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>XLM-RoBERTa-Large</kwd>
        <kwd>Ensemble</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        The Fake News Detection in Spanish Shared Task at IberLEF 2021 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
aims to help users detect and filter out potentially deceptive news in social
networks. As is well known, social networks offer platforms on which information
and articles may be shared without fact-checking or moderation. Moderating
user-generated content on social media is challenging because of both the volume
and the variety of the information posted. In particular, highly partisan
fabricated material on social media, i.e. fake news, is believed to have been an
influencing factor in recent elections [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Misinformation spread through fake news has recently attracted significant
media attention, and current approaches rely on manual annotation by
third parties [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to notify users that shared content may be untrue. Social media posts not
only express many negative sentiments (about terrorism, political elections,
advertisements, satire, among others) but also have the particularity that
people can decide to show or hide their identity. The task of detecting fake
news is defined as predicting the probability that a particular news article is
deceptive [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The conventional solution is to ask professionals such as journalists to
check claims against evidence based on previously spoken or written facts.
However, this is time-consuming and expensive: it is hard even for editors to
judge whether a piece of news is real or not. As the Internet community and the
speed at which information spreads grow rapidly, automated fake news detection
on Internet content has gained interest in the Artificial Intelligence research
community. The goal of automatic fake news detection is to reduce the human time
and effort needed to detect fake news and to help stop its spread. The task has
been studied from various perspectives alongside developments in subareas of
Computer Science such as Machine Learning (ML), Data Mining (DM), and NLP [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Although most previous work on the two related tasks of aggressiveness
detection and fake-news detection targets English, little research has been done
for Spanish using the most recent NLP techniques, such as deep learning
approaches [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In this paper, we use popular natural language processing techniques to
solve the problem of identifying fake news in Spanish.
      </p>
      <p>The remainder of the paper is structured as follows: a brief analysis of
related work is given in Section 2, followed by a description of the datasets
and of the methods employed for fake news detection in Section 3. Section 4
outlines the evaluation process and results, and conclusions and future work are
drawn in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Datasets in different languages bring their own challenges to fake news
detection. In recent years, researchers have done a great deal of work on fake
news detection for English datasets, and, due to the impact of Covid-19, many
competitions have issued related tasks: the SemEval 2021 Toxic Spans task
(https://sites.google.com/view/toxicspans) addressed the detection of toxic text
spans, HASOC 2020 (https://hasocfire.github.io/hasoc/2020/) issued the challenge
of hate speech and offensive content identification in Indo-European languages,
and CONSTRAINT 2021 (http://lcs2.iiitd.edu.in/CONSTRAINT-2021/) included a task
on hostility detection in Hindi. All of this shows that fake news detection
remains a challenging problem. Hence, research on fake news detection in Spanish
social media is also valuable, and it is likewise helpful for detecting Covid-19
misinformation in Spanish social media.</p>
      <p>
        Fake news detection resembles other text classification problems in natural
language processing: the key is to find suitable features to represent
sentences, and the task is to assign predefined categories to a given text
sequence. Much work has shown that models pre-trained on large corpora are
beneficial for text classification and other NLP tasks, since they avoid
training new models from scratch. Since 2013, word embedding approaches such as
word2vec [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and GloVe [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have been proposed. However, because their word embeddings all live in the
same space, they cannot express polysemy. In other words, they are
non-contextual embeddings and cannot capture high-level properties of sentences
such as semantics and context [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Later, the ELMo [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] model was proposed to solve this problem. Compared with word2vec and GloVe,
ELMo captures contextual information and not just the individual information of
words: in word2vec, the vector representation of a word is identical in
different contexts, whereas ELMo accounts for context [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
More recently, pre-trained language models such as OpenAI GPT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have been shown to be useful for learning common language representations
from large amounts of unlabeled data. BERT is based on a multi-layer
bidirectional Transformer [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and is trained on plain text for masked word prediction and next sentence
prediction. Since BERT targets English while the dataset of this competition is
in Spanish (with Covid-19-related data also added), we finally chose BETO
(https://github.com/dccuchile/beto) and the multilingual pre-trained model
XLM-RoBERTa-Large (https://huggingface.co/xlm-roberta-large) as our pre-trained
models. We fine-tuned these two pre-trained models, submitted three runs, and
finally applied hard voting over the three runs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data and Methods</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>
          The dataset used in our models was provided entirely by the organizers. The
training set contains 676 news items and the development set 295. The corpus
consists of news compiled mainly from Mexican web sources: established newspaper
websites, media company websites, websites dedicated to validating fake news,
and websites identified by different journalists as sites that regularly publish
fake news. Each entry in the corpus contains the following fields [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]:
- Category: Fake / True.
- Topic: Science / Sport / Economy / Education / Entertainment / Politics / Health / Security / Society.
- Source: The name of the source medium.
- Headline: The title of the news item.
- Text: The complete text of the news item.
- Link: The URL where the news item was published.
        </p>
        <p>Since the corpus contains several different fields, in order to increase
the learning ability of the model we appended the "Category" and "Topic" columns
to the "Text" column; we did not use the "Link" field. This does improve the
learning ability of the model, but it also hurts its generalization ability. In
addition, we performed simple data preprocessing: for example, we stripped
emojis from the training set and removed website links.</p>
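The preprocessing described above can be sketched as follows. This is only an illustration: the exact regular expressions, the field order, and the function name are assumptions, not the authors' code, and for brevity it combines only the Topic, Headline, and Text fields.

```python
import re

def preprocess(headline: str, text: str, topic: str) -> str:
    """Build one training string from corpus fields (illustrative sketch)."""
    combined = f"{topic} {headline} {text}"
    # Remove website links.
    combined = re.sub(r"https?://\S+|www\.\S+", "", combined)
    # Strip emojis and related pictographic symbols.
    combined = re.sub(r"[\U0001F000-\U0001FAFF\u2600-\u27BF]", "", combined)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", combined).strip()

example = preprocess("Noticia falsa 😱", "Lee más en https://ejemplo.mx ahora", "Health")
print(example)  # Health Noticia falsa Lee más en ahora
```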
      </sec>
      <sec id="sec-3-2">
        <title>Fine-tuning BETO and XLM-RoBERTa-Large</title>
        <p>The pre-train-and-fine-tune architecture is now a standard approach to
text classification. Our system used BETO and XLM-RoBERTa-Large as pre-trained
models, and we submitted three runs plus an ensemble:
- Run 1: fine-tuned BETO
- Run 2: XLM-RoBERTa-Large
- Run 3: XLM-RoBERTa-Large + CNN
BETO is similar to BERT: both have 12 hidden layers. BETO is a BERT model
trained on a large Spanish corpus; it is similar in size to BERT-Base and was
trained with the whole-word masking technique. Each word in a sentence is
represented as a vector composed of a word embedding and a character embedding.
The character embedding is initialized randomly, while the word embedding is
usually imported from a pre-trained word embedding file; all embeddings are
fine-tuned during training. For Run 1, as shown in Fig. 1, PO is the pooler
output of BETO, and HO is the hidden state of the first token of the sequence
(the CLS token) at the output of a hidden layer of the model. After obtaining
PO, we concatenate PO with the HO of the last three hidden layers and feed the
result into the classifier.</p>
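The Run 1 head described above can be sketched in PyTorch as below. This is a minimal illustration, assuming a BERT-Base-sized hidden dimension of 768 and using random tensors in place of real BETO outputs; the class name and layer choices are ours, not the authors'.

```python
import torch
import torch.nn as nn

class BetoConcatHead(nn.Module):
    """Concatenate the pooler output (PO) with the CLS hidden state (HO)
    of the last three encoder layers, then classify (Run 1 sketch)."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 2):
        super().__init__()
        # PO (768) + HO from 3 layers (3 * 768) -> 4 * 768 input features.
        self.classifier = nn.Linear(4 * hidden_size, num_labels)

    def forward(self, pooler_output, hidden_states):
        # hidden_states: tuple of [batch, seq_len, hidden] tensors, one per
        # layer, as returned when output_hidden_states=True.
        cls_last3 = [h[:, 0, :] for h in hidden_states[-3:]]  # CLS of last 3 layers
        features = torch.cat([pooler_output] + cls_last3, dim=-1)
        return self.classifier(features)

# Smoke test with random tensors standing in for BETO outputs.
batch, seq_len, hidden = 4, 16, 768
pooler = torch.randn(batch, hidden)
states = tuple(torch.randn(batch, seq_len, hidden) for _ in range(13))  # embeddings + 12 layers
logits = BetoConcatHead()(pooler, states)
print(logits.shape)  # torch.Size([4, 2])
```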
        <p>The Facebook AI team released XLM-RoBERTa in November 2019 as an update of
its original XLM-100 model. Both are transformer-based language models, both
rely on the masked language model objective, and both can handle text in 100
different languages. Compared with the original version, the biggest change in
XLM-RoBERTa is a significant increase in the amount of training data: the
cleaned crawled corpus it was trained on occupies up to 2.5 TB of storage,
several orders of magnitude more than the Wiki-100 corpus used to train its
predecessor, and this expansion is especially noticeable for lower-resource
languages. XLM-RoBERTa-Large adds 12 hidden layers on top of the base
XLM-RoBERTa, so its network structure is considerably more complex and its
pre-trained stack deeper. For Run 2, we simply add a classifier after
XLM-RoBERTa-Large (we do not show the architecture of Run 2). For Run 3, as
shown in Fig. 2, we add a CNN before PO is sent to the classifier. First, we
obtain the pooler output PO of XLM-RoBERTa-Large: the last-layer hidden state of
the first token of the sequence (the CLS token), further processed by a linear
layer and a tanh activation function. Then we pass PO through a three-layer CNN
(convolution and pooling). Finally, the resulting vector is fed into a linear
classifier for binary classification.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Ensemble learning</title>
        <p>We use a multi-model ensemble learning approach to obtain a stable system
that performs well overall. We use hard voting to determine the final category:
each model votes on the classification of a sample, and the majority wins. Our
final prediction therefore combines the models of Run 1, Run 2, and Run 3 by
ensemble learning. The experimental results in the next section verify the
effectiveness of ensemble learning.</p>
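The hard-voting step can be sketched in a few lines (the function name and list-of-lists input layout are our assumptions). With three voters and two classes, no tie is possible:

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over per-model prediction lists (one list per run)."""
    per_sample = zip(*predictions)  # regroup: one tuple of model votes per sample
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]

run1 = ["Fake", "True", "Fake", "True"]
run2 = ["Fake", "Fake", "Fake", "True"]
run3 = ["True", "Fake", "Fake", "True"]
print(hard_vote([run1, run2, run3]))  # ['Fake', 'Fake', 'Fake', 'True']
```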
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Hyper-parameter settings</title>
        <p>In this work, our models were implemented in PyTorch
(https://pytorch.org/) and run on Google Colab with a Tesla P4 GPU. The batch
size is 32. We obtained the hidden-layer states of BETO and XLM-RoBERTa-Large by
setting output_hidden_states to True. We used the Adam optimizer with a learning
rate of 5e-5 for all three runs, and each model was trained for 30 epochs. For
Run 3, we used three convolutional layers with 256 convolution kernels each,
ReLU activations, and max pooling.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation criteria and results</title>
        <p>We mainly used the F1-score to evaluate our models. It is computed as
follows:</p>
        <p>Precision = TP / (TP + FP); Recall = TP / (TP + FN); F1 = 2 · Precision · Recall / (Precision + Recall)</p>
        <p>The results are shown in Table 1.</p>
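The metric above follows directly from the raw confusion counts; a minimal sketch (the counts used below are arbitrary examples, not task results):

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw counts, matching the formulas above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf1(tp=60, fp=20, fn=20)
print(round(f1, 4))  # 0.75
```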
        <sec id="sec-4-2-1">
          <p>From the data in Table 1, it can be seen that all three runs obtain good
results on the development set, with Run 3 achieving the best F1-score. This
shows that the CNN is helpful for this task, so we chose the XLM-RoBERTa-Large +
CNN architecture to predict the final test set, obtaining 0.6252. Finally, we
submitted the result of ensembling the three runs by hard voting. The final best
result on the test set is 0.6860, which shows that ensemble learning strengthens
the combined ability of multiple classifiers.</p>
          <p>However, our model's results on the test set are not the most
competitive. This may be because we did not do enough data augmentation (DA),
which hurts the model's generalization: we need our limited data to yield value
equivalent to more data without substantially increasing the data itself. We
therefore need to put more effort into data processing and augmentation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and future work</title>
      <p>In this paper, we describe our strategy for classifying fake and real
Spanish documents. In our three systems, we used the transformer-based
pre-trained models BETO and XLM-RoBERTa-Large, the latter also with an added
CNN. Our proposals prove competitive for this specific task. However, we must
further test and improve our model, because our result is 0.0806 worse than the
best F1-score, so we still have much work to do in the future.</p>
      <p>
        In the future, we should first try to tune the model's hyper-parameters
more thoroughly, since we made few such attempts here. Future directions also
include exploring other related datasets in the fake-news field. Moreover, we
only applied ensemble learning to the prediction results of three models; we
need to try more ensembling methods and to explore more models that are
similarly competitive. In addition, advanced error analysis techniques, such as
feature importance or model explainability, could also be used to improve the
model's performance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Allcott</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gentzkow</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Social media and fake news in the 2016 election</article-title>
          .
          <source>Journal of Economic Perspectives</source>
          <volume>31</volume>
          (
          <issue>2</issue>
          ),
          <volume>211</volume>
          –
          <fpage>236</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brown</surname>
          </string-name>
          , T.B.,
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subbiah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Language models are few-shot learners (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          . CoRR abs/
          <year>1810</year>
          .04805 (
          <year>2018</year>
          ), http://arxiv.org/abs/
          <year>1810</year>
          .04805
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gomez-Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Posadas-Duran</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bel-Enguix</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Porto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of FakeDeS task at IberLEF 2020: Fake News Detection in Spanish</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Facebook is going to use snopes and other fact-checkers to combat and bury 'fake news' (</article-title>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>Computer Science</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mellado</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Albornoz</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adorno</surname>
            ,
            <given-names>H.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafra</surname>
            ,
            <given-names>S.M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
          </string-name>
          , S.,
          <string-name>
            <surname>de Arco</surname>
            ,
            <given-names>F.M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taule</surname>
          </string-name>
          , M. (eds.):
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2021</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , 2021
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Oshikawa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
          </string-name>
          , W.Y.:
          <article-title>A survey on natural language processing for fake news detection (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Glove:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In: Conference on Empirical Methods in Natural Language Processing</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers)
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Posadas-Duran</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escobar</surname>
            ,
            <given-names>J.J.M.:</given-names>
          </string-name>
          <article-title>Detection of fake news in a new corpus for the spanish language</article-title>
          .
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          <volume>36</volume>
          (
          <issue>5</issue>
          ),
          <volume>4869</volume>
          –
          <fpage>4876</fpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>V.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conroy</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Towards news verification: Deception detection methods for news discourse</article-title>
          .
          <source>In: Hawaii International Conference on System Sciences</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>How to fine-tune BERT for text classification? (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Tanase</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaharia</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cercel</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dascalu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Detecting aggressiveness in Mexican Spanish social media content by fine-tuning transformer-based models</article-title>
          .
          <source>In: MEX-A3T at IberLEF</source>
          <year>2020</year>
          :
          <article-title>Authorship and aggressiveness analysis in Twitter: case study in Mexican Spanish (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          <source>CoRR abs/1706</source>
          .03762 (
          <year>2017</year>
          ), http://arxiv.org/abs/1706.03762
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Villatoro-Tello</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramírez-De-La-Rosa</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parida</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motlicek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Idiap and uam participation at mex-a3t evaluation campaign</article-title>
          .
          <source>In: IberLEF2020</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deconvolutional paragraph representation learning</article-title>
          .
          <source>In: NIPS</source>
          (
          <year>2017</year>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>