<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GSI-UPM at IberLEF2021: Emotion Analysis of Spanish Tweets by Fine-tuning the XLM-RoBERTa Language Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Carlos A. Iglesias</string-name>
          <email>carlosangel.iglesiasg@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Politecnica de Madrid, Intelligent Systems Group</institution>
          ,
          <addr-line>28040 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This work presents the participation of the Intelligent Systems Group (GSI) at Universidad Politecnica de Madrid (UPM) in the EmoEvalEs emotion analysis competition, part of the IberLEF 2021 Conference. The addressed challenge proposes an emotion classification task on Spanish tweets, categorizing each message into seven emotions. We propose the design and development of a fine-tuned neural language model (XLM-RoBERTa) to tackle this challenge. We have obtained excellent results with this approach, achieving first place in the competition with a macro-averaged F1 score of 71.70%. Additionally, we explore the application of several ensemble methods built on top of the neural language model.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion Analysis</kwd>
        <kwd>Transformers</kwd>
<kwd>XLM-RoBERTa</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Recent advances in machine learning research are rapidly pushing the sentiment
analysis field forward. Neural architectures that improve on earlier
models are now well established, and current state-of-the-art systems rely heavily on
these techniques [
        <xref ref-type="bibr" rid="ref1 ref21">1,21</xref>
]. Although sentiment analysis still represents a challenging
task and requires further study, research in emotion analysis is also relevant. In
this sense, estimating emotions from text is currently less studied and opens a
new range of potential applications. Since sentiment and emotion analysis share
many subproblems, the approaches that tackle these disciplines are frequently
similar.
      </p>
      <p>
        This paper presents our participation in IberLEF 2021 [
        <xref ref-type="bibr" rid="ref12">12</xref>
], describing our
efforts towards EmoEvalEs, an emotion classification task [
        <xref ref-type="bibr" rid="ref14">14</xref>
]. The task presents
an emotion classification challenge in the form of a multiclass classification task,
where the emotions considered are anger, disgust, fear, joy, sadness, surprise, and
other, a category that represents emotions not included in the Ekman
emotion model or the absence of any emotion. The data has been extracted from
Twitter, and its contents address several domains: entertainment, catastrophe,
politics, global commemoration, and global strike. For more information on the
dataset, please consult [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
To address this task, we use the fine-tuned XLM-Twitter
(XLM-T) language model [
        <xref ref-type="bibr" rid="ref5">5</xref>
] as the primary emotion estimator. Additionally,
intending to improve the results obtained by XLM-T, we combine its predictions
in an ensemble system that uses several types of features and models. The final
result indicates that our efforts are oriented in the right direction, since we
obtained first place in the competition with a final macro-averaged F1 score
of 71.70%.
      </p>
<p>For replication purposes, we provide the source code used to generate the
models and their respective submissions. It is available online at
https://github.com/gsi-upm/emoevales-iberlef2021.</p>
<p>The rest of the paper is organized as follows. Section 2 provides the
background of the methods used in this work. Next, the proposed approach and
architecture for emotion classification are described in Section 3 and evaluated
in Section 4. Finally, Section 5 states our conclusions and proposes future lines
of work.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Deep learning approaches are common in sentiment analysis and have proved
helpful in emotion analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. The adoption of deep learning models began
with the popularization of word embedding models, such as word2vec [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or
GloVe [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Word embedding models have allowed researchers and practitioners
to develop new deep learning models that use these distributed representations.
One relevant example is Sentiment-Specific Word Embedding (SSWE), which
computes sentiment-oriented word embeddings that can later be used to predict
sentiment in texts [
        <xref ref-type="bibr" rid="ref17">17</xref>
]. Another model that makes efficient use of word embeddings
is presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. This model uses a straightforward word-vector aggregation
method that extracts a unified document representation from a word
embedding model. The aggregation of word vectors has proven effective in
sentiment analysis, obtaining consistent performance across different data domains.
      </p>
      <p>
The use of word embeddings to elicit sentiment and emotion represents a broad
field. In this work, we incorporate previous models into our ensemble
model. One of those is the SIMilarity-based sentiment projectiON (SIMON)
model, which computes the representation of a particular word in a document
by considering its projection onto a set of domain words [
        <xref ref-type="bibr" rid="ref4">4</xref>
]. Such projection is
computed using the semantic similarity between words, as obtained from a word
embedding model. Thus, a document word is represented by its similarity to
the selection of domain words. However, as previously studied, this selection of
words can vary, and the choice of component words can strongly affect the
final prediction performance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
In recent years, an approach that has proved successful in most Natural
Language Processing (NLP) tasks is the use of large pretrained language models based
on a transformer architecture. Transformers [
        <xref ref-type="bibr" rid="ref18">18</xref>
] are an attention-based
architecture that computes complex representations of information without using
Recurrent Neural Networks, which has made it possible to parallelize the
training of large language models efficiently. After the release of BERT [
        <xref ref-type="bibr" rid="ref8">8</xref>
] in 2018,
the NLP community has created new, improved language models. One of these
language models is RoBERTa [
        <xref ref-type="bibr" rid="ref10">10</xref>
], an optimized BERT pretraining approach
that achieves significantly better results than the original BERT
implementation. Furthermore, RoBERTa outperformed state-of-the-art results, becoming
the baseline for many subsequent works on different NLP tasks, such as cross-lingual
language understanding (XLU).
      </p>
      <p>
        In this domain, XLM-RoBERTa (XLM-R) [
        <xref ref-type="bibr" rid="ref7">7</xref>
] stands out as a model
pretrained on 100 different languages, achieving state-of-the-art performance on
cross-lingual classification, sequence labeling, and question answering. The lack
of pretrained language models in languages other than English has steered
researchers' interest towards multilingual models, which have demonstrated that
it is possible to have a single large model for all languages without sacrificing
too much performance for each language. However, previous research shows
that multilingual models tend to underperform monolingual models in
language-specific tasks [
        <xref ref-type="bibr" rid="ref16">16</xref>
]. This context framed the pretrained language model we have
used for this work, XLM-T, an XLM-R model that achieves better results in the Twitter
domain than its XLM-R baseline and has been pretrained on millions of tweets
in over 30 different languages.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Architecture</title>
      <sec id="sec-3-1">
        <title>Fine-tuning XLM-T</title>
        <p>
We have fine-tuned the Twitter-specific pretrained language model on the
downstream task of emotion classification following parameter-efficient transfer
learning techniques [
          <xref ref-type="bibr" rid="ref9">9</xref>
]. In short, the language model parameters remain unchanged
while the weights of a neural network classification head on top of the language
model are trained.
        </p>
        <p>
We have implemented this architecture and run the training process using the
modules and the Trainer API from the HuggingFace Transformers library [
          <xref ref-type="bibr" rid="ref19">19</xref>
],
which is optimized and provides a wide range of training options and built-in
features. We have tested three different approaches to solve the problem:
– Multi-label classification problem: We have trained the model to predict
the class with the highest probability among the seven possibilities.
– Binary classification problem: Frame the multiclass problem as a
one-vs-all problem where seven different models are trained. Ties are resolved by
selecting the output from the model with the highest confidence score (see the
sketch after this list).
– Additional Features: We extend the classification head to use the
additional features, event and offensive, available in the dataset as new inputs
encoded as one-hot vectors.
        </p>
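        <p>As an illustration, the confidence-based tie-breaking of the one-vs-all setup
can be sketched as follows (a minimal sketch; the score array is a hypothetical
placeholder for the seven binary models' outputs):</p>
        <preformat>
# Each of the seven binary models emits a positive-class score for a tweet;
# the predicted label is the one whose model is most confident.
import numpy as np

LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "others"]

def one_vs_all_predict(scores):
    # scores: array of shape (n_samples, 7), one column per binary model
    return [LABELS[i] for i in np.argmax(scores, axis=1)]
        </preformat>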
<p>The classification head consists of a dense layer at the output of the language
model, followed by a dropout layer with the default dropout probability of the
language model and a final projection layer with the number of labels. For the
Additional Features model, we have added new inputs to the first dense layer.</p>
<p>We have followed the training process as described in the Transformers
documentation. For the sake of reproducibility, the source code is available at
https://github.com/gsi-upm/emoevales-iberlef2021, and the fine-tuned model is
available at the HuggingFace model hub.</p>
<p>The hyper-parameters used are a batch size of 16 per GPU, a maximum
tokenizer length of 200, and training for 5 epochs. The rest of the parameters are the
defaults in the Trainer API. The Trainer API also saves the checkpoint of the best
epoch, which usually occurs at epoch 3. We ran the training process for 1 hour on
two NVIDIA Titan X Pascal GPUs.</p>
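        <p>The following sketch illustrates this setup with the Trainer API. The checkpoint
name refers to the public XLM-T model on the HuggingFace hub and may differ from
the exact checkpoint used in this work; the dataset variables are placeholders:</p>
        <preformat>
# Minimal fine-tuning sketch using the hyper-parameters stated above.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "cardiffnlp/twitter-xlm-roberta-base"  # public XLM-T model

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT,
                                                           num_labels=7)

args = TrainingArguments(
    output_dir="xlmt-emoevales",
    per_device_train_batch_size=16,   # batch size of 16 per GPU
    num_train_epochs=5,               # 5 epochs
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # keep the best-epoch checkpoint
)

# train_ds and dev_ds are placeholder tokenized datasets
# (texts tokenized with truncation=True, max_length=200).
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=dev_ds)
# trainer.train()
        </preformat>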
        <p>
Before tokenizing, we have lightly preprocessed the tweets with the Twitter
preprocessing module of the GSITK library [
          <xref ref-type="bibr" rid="ref4">4</xref>
]. We have found this helpful to
achieve slightly better results.
        </p>
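        <p>For illustration only, the kind of normalization such Twitter preprocessing
performs can be approximated as below; the actual GSITK module should be consulted
for its exact behavior, and the placeholder tokens here merely mirror the
convention used by the cardiffnlp Twitter models:</p>
        <preformat>
# Rough stand-in for tweet preprocessing: mask user mentions and URLs
# with placeholder tokens before tokenization.
import re

def preprocess_tweet(text):
    text = re.sub(r"https?://\S+", "http", text)  # mask links
    text = re.sub(r"@\w+", "@user", text)         # mask user mentions
    return text.strip()

print(preprocess_tweet("@maria mira esto https://t.co/xyz"))
# prints: "@user mira esto http"
        </preformat>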
      </sec>
      <sec id="sec-3-2">
        <title>Ensemble model</title>
        <p>
To improve the final prediction scores, we have developed an ensemble model
that combines, at the prediction level, different models trained on varied data.
In this sense, previous works [
          <xref ref-type="bibr" rid="ref1 ref20">1,20</xref>
] have successfully used ensemble models to
boost prediction performance. Furthermore, as outlined in previous works,
an ensemble model improves its performance when using varied models trained
with different feature types. We have used the following features in our ensemble
model:
        </p>
        <p>
          SIMON [
          <xref ref-type="bibr" rid="ref4">4</xref>
]. As mentioned, this model computes the representation of a
document by measuring the similarity of its component words to those of a
predefined domain word set. The main component of this model is the domain
word set, through which the model adapts to the different language uses of a specific
domain. Following previous work [
          <xref ref-type="bibr" rid="ref3">3</xref>
], we have extracted a custom word set from
the dataset, selecting words by their frequency of appearance.
        </p>
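        <p>A simplified sketch of this similarity projection follows. The real SIMON
implementation is available in the GSITK library, so this is only an
assumption-laden approximation:</p>
        <preformat>
# Each document word is represented by its cosine similarity to a fixed
# domain word set; a max-pooling step yields a fixed-size document vector.
import numpy as np

def simon_features(doc_tokens, domain_words, wv):
    # wv maps a word to a unit-normalised embedding vector; this sketch
    # assumes every domain word is in the vocabulary.
    rows = [[float(np.dot(wv[w], wv[d])) for d in domain_words]
            for w in doc_tokens if w in wv]
    if not rows:
        return np.zeros(len(domain_words))
    return np.max(np.array(rows), axis=0)
        </preformat>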
      </sec>
      <sec id="sec-3-3">
<title>Word embedding combination (word2vec)</title>
        <p>As described in Sect. 2, this
model aggregates the component word vectors of a document, obtaining a
fixed document representation. Previous works have found that this method is
reliable across different domains. In this work, we aggregate the word vectors
using the average aggregation operation.</p>
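        <p>This aggregation reduces to the following minimal sketch, assuming a
gensim-style keyed-vector object:</p>
        <preformat>
import numpy as np

def average_embedding(tokens, wv, dim=300):
    # average the vectors of in-vocabulary tokens; zero vector if none match
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
        </preformat>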
      </sec>
      <sec id="sec-3-4">
<title>Term Frequency–Inverse Document Frequency (TF-IDF)</title>
        <p>We use
TF-IDF as a simple text representation method. Our model instance
considers both unigrams and bigrams.</p>
<p>N-grams. Similarly, we use this feature to enhance the variety of
the ensemble model. As before, we consider both unigrams and bigrams.</p>
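        <p>Both feature types can be produced with standard scikit-learn vectorizers, as
in this sketch (variable names are placeholders):</p>
        <preformat>
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

tfidf = TfidfVectorizer(ngram_range=(1, 2))   # unigrams and bigrams
ngrams = CountVectorizer(ngram_range=(1, 2))  # raw n-gram counts

# X_tfidf = tfidf.fit_transform(train_texts)
# X_ngrams = ngrams.fit_transform(train_texts)
        </preformat>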
<p>Additional features. The challenge dataset contains additional information
that can easily be used as features. Concretely, we used the event and offensive
data fields. The event category specifies the general event from which the message
has been extracted, while the offensive category specifies whether the
message contains offensive language, which may aid in the task at hand. Please
note that we do not train a learning model on these features alone; instead,
we add them to other feature sets.</p>
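        <p>Encoding these categorical fields is straightforward; a sketch with
scikit-learn follows (the dataframe and its column names are assumptions that
mirror the dataset fields):</p>
        <preformat>
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(handle_unknown="ignore")
# add_features = encoder.fit_transform(df[["event", "offensive"]])
        </preformat>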
<p>MeaningCloud. Since sentiment and emotion are intimately related, we
have incorporated a new feature: the sentiment estimation that MeaningCloud
(https://www.meaningcloud.com/) offers. MeaningCloud offers a professional
sentiment analysis service that can be accessed via a web API
(https://www.meaningcloud.com/products/sentiment-analysis). We extract sentiment
estimations for all messages using this service. This information is included as an
additional feature.</p>
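        <p>A hedged sketch of such a query follows; the endpoint and parameter names
reflect MeaningCloud's public documentation and should be verified before use:</p>
        <preformat>
import requests

def meaningcloud_sentiment(text, api_key):
    # Query the sentiment-2.1 endpoint for a Spanish text.
    response = requests.post(
        "https://api.meaningcloud.com/sentiment-2.1",
        data={"key": api_key, "lang": "es", "txt": text},
    )
    return response.json().get("score_tag")  # e.g. "P", "N", "NEU"
        </preformat>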
<p>Considering the described feature types, we train different learning models
on the mentioned features. We select a simple algorithm, logistic
regression, for the base learners, since all predictions are combined in an ensemble
fashion. For feature combination, we concatenate the feature vectors. For the
ensemble learners that train on the base predictions, we have selected logistic
regression and random forests.</p>
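        <p>The prediction-level combination can be sketched as follows; the array names
are placeholders, and the scikit-learn estimators stand in for the actual
implementation:</p>
        <preformat>
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def stack_predictions(prediction_blocks, extra_features=None):
    # prediction_blocks: list of (n_samples, n_classes) probability arrays
    blocks = list(prediction_blocks)
    if extra_features is not None:
        blocks.append(extra_features)  # e.g. one-hot event/offensive columns
    return np.hstack(blocks)

ensemble_lr = LogisticRegression(max_iter=1000)  # "Ensemble LR"
ensemble_rf = RandomForestClassifier()           # "Ensemble RF"
# X_meta = stack_predictions([p_xlmt, p_simon, p_tfidf, p_ngrams, p_w2v])
# ensemble_lr.fit(X_meta, y_train)
        </preformat>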
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <sec id="sec-4-1">
        <title>XLM-T evaluation</title>
<p>This section describes the performance of the different approaches we have
followed during the fine-tuning of the pretrained model. Table 1 shows the accuracy
and weighted F1 scores of the different approaches on the development set, where
the model fine-tuned on the multiclass classification problem has achieved the
best results. Table 2 shows the same information for the test set, where the best
model is again the multiclass estimator.</p>
<p>[Tables 1 and 2: accuracy and weighted F1 scores on the development and
test sets, respectively, for the XLM-T multi-label classification, XLM-T binary
one-vs-all classification, and Additional Features approaches.]</p>
<p>These results show the superior performance of the multiclass classifier over
the combination of various binary classifiers, although better combination
strategies could improve the results of the latter. Moreover, including additional
features without any preprocessing, at the same level as the output produced
by the pretrained language model, decreases the classifier's performance.</p>
<p>Figure 1 depicts the confusion matrix produced by the XLM-T multi-label
classifier on the test set. We observe the evident imbalance of the dataset,
where almost half of the records belong to the others class. Moreover, this class
is frequently confused with the joy class. Additionally, this matrix shows the
difficulty of distinguishing between emotions that share similar features, such as
anger and disgust. Finally, the low number of records in some classes (anger,
disgust, and surprise) is an additional challenge, since the models tend to fail on
those classes.</p>
<p>Fig. 1: Confusion matrix on the test set produced by the XLM-T multi-label
classifier.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Ensemble evaluation</title>
<p>As explained, we have designed an ensemble methodology to combine several
base models and raise the final classification performance. Our primary model is
the one we have obtained through the XLM-T transformer fine-tuned
for the multiclass classification problem, and thus we include this model in
the ensemble. Table 3 details the models used in the ensemble, along with the
features they train on. To evaluate the different models, the accuracy and
weighted averaged F1 score have been used.</p>
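        <p>Both metrics correspond to the standard scikit-learn implementations
(variable names are placeholders):</p>
        <preformat>
from sklearn.metrics import accuracy_score, f1_score

# accuracy = accuracy_score(y_true, y_pred)
# weighted_f1 = f1_score(y_true, y_pred, average="weighted")
        </preformat>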
<p>Table 3 (model: features used) — word2vec: word embedding combination,
averaged word vectors; n-grams: uni- and bi-gram representations; TF-IDF:
TF-IDF features, considering both unigrams and bigrams; SIMON: SIMON features
using an extracted word set; n-grams + add. features: n-gram representations in
combination with the dataset's additional features; Ensemble LR: ensemble that
uses a logistic regression model to learn from the base classifiers' predictions;
Ensemble RF: ensemble that uses a random forest model to learn from the base
classifiers' predictions; Ensemble LR + add. features: Ensemble LR combined
with the dataset's additional features; Ensemble RF + add. features: Ensemble RF
combined with the dataset's additional features; Ensemble LR + add. features +
MeaningCloud: the previous ensemble combined with the sentiment estimation
obtained from MeaningCloud.</p>
        <p>
          We have evaluated the models detailed in Table 3 on the development dataset [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
The obtained metrics can be seen in Table 4. As described above, the XLM-T
model obtains high accuracy and averaged F1 scores. As expected, the rest of
the base models achieve lower metrics in comparison, since they do not consider
such complex relations in the analyzed text. Therefore, we can consider the
average of word vectors (word2vec in Table 3), n-grams, and TF-IDF as baseline
approaches for this task. Next, the SIMON model achieves a higher score than the
rest of the base methods, which can be explained by its increased complexity
in comparison. The SIMON model uses both a word embedding
model and a selected word set to compute the text representation.
        </p>
<p>Next, we can observe that adding the additional features (event and
offensive categories) to the n-gram approach improves over the plain n-gram features.
This indicates that these additional features can be leveraged to improve
classification performance.</p>
<p>Regarding the ensemble methods, Table 4 shows that combining all base
models through an ensemble generally improves classification performance.
Nonetheless, the metrics do not improve over the XLM-T model alone,
even though this model is included in the ensemble.</p>
<p>This situation changes when adding the additional features and the
MeaningCloud sentiment analysis to the ensemble. The ensemble using all features gets
a lower accuracy, but its weighted F1 score is slightly higher on the
development set, although this does not represent a relevant improvement. Please note
that in this last case, the ensemble learner is trained on a combination of the
predictions from the base classifiers, the additional features, and MeaningCloud's
sentiment analysis results.</p>
<p>Turning to the test set results, we have observed a different situation:
the last ensemble, with additional features and sentiment analysis, does not
improve on the XLM-T final performance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>This paper has described our participation in the EmoEvalEs competition framed
in the IberLEF 2021 Conference. Our proposal relies on large pretrained
language models, outperforming previous methods with little effort thanks to the
HuggingFace library, which provides a straightforward implementation of these
pretrained language models. The pretrained model we have used is a RoBERTa
transformer trained on a multilingual corpus of tweets, XLM-T. We have
evaluated different strategies to approach the problem, finding that the model
fine-tuned for a multiclass classification task obtains better results than the
combination of various binary classifiers or the model with additional features. We have
achieved first place in the EmoEvalEs competition with this model, obtaining a
macro-averaged F1 score of 71.70%.</p>
<p>This work also presents an ensemble method that combines several base
classifiers with the XLM-T model to improve the final performance by adding
more knowledge to the system. Although we have found a slight improvement in
the overall classification metrics on the development set, this enhancement has
not continued on the test set. The obtained results suggest that combining such a
transformer architecture with classical machine learning methods is challenging
and must be done carefully.</p>
        <p>
We propose several lines of future research to improve this work. Firstly, the effective
combination of additional features that carry new information could enhance the
classifier's overall performance. Secondly, we suggest using a weighted validation
loss during the fine-tuning of the language model to deal with the imbalanced
dataset problem. Moreover, using a monolingual pretrained model for the specific
language of the task could improve the obtained results. In this sense, using the
Spanish language model BETO [
          <xref ref-type="bibr" rid="ref6">6</xref>
] seems promising, since it has demonstrated
great results in similar tasks such as sentiment analysis. Finally, this same method
could be applied to emotion classification in other languages with the same
pretrained language model, XLM-T.
        </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
<p>The authors want to thank the Catedra Cabify (Cabify Chair) at the ETSI
Telecomunicacion of the Universidad Politecnica de Madrid for its help and support.
Moreover, the authors would like to gratefully acknowledge
MeaningCloud's support in facilitating this research work. Finally, the authors would
like to acknowledge the support of NVIDIA Corporation with the donation of
the Titan X Pascal GPU used in this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Araque</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcuera-Platas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez-Rada</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          :
          <article-title>Enhancing deep learning sentiment analysis with ensemble techniques in social applications</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>77</volume>
, 236–
          <fpage>246</fpage>
          (
          <year>2017</year>
). https://doi.org/10.1016/j.eswa.2017.02.002, https://www.sciencedirect.com/science/article/pii/S0957417417300751
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Araque</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatti</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staiano</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
:
<article-title>DepecheMood++: a bilingual emotion lexicon built through simple yet powerful techniques</article-title>
          .
<source>IEEE Transactions on Affective Computing</source>
pp. 1–1 (
<year>2019</year>
). https://doi.org/10.1109/TAFFC.2019.2934444
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Araque</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          :
          <article-title>An approach for radicalization detection based on emotion signals and semantic similarity</article-title>
          .
          <source>IEEE Access 8</source>
          ,
17877–17891 (
<year>2020</year>
). https://doi.org/10.1109/ACCESS.2020.2967219
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Araque</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          :
<article-title>A semantic similarity-based perspective of affect lexicons for sentiment analysis</article-title>
          .
          <source>Knowledge-Based Systems 165</source>
          ,
          <fpage>346</fpage>
–
          <fpage>359</fpage>
          (
          <year>2019</year>
). https://doi.org/10.1016/j.knosys.2018.12.005, https://www.sciencedirect.com/science/article/pii/S0950705118305926
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anke</surname>
            ,
            <given-names>L.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camacho-Collados</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>XLM-T: A multilingual language model toolkit for twitter</article-title>
          .
<source>CoRR abs/2104.12250</source>
(
<year>2021</year>
), https://arxiv.org/abs/2104.12250
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
6. Cañete, J.,
          <string-name>
            <surname>Chaperon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuentes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
          </string-name>
          , J.:
          <article-title>Spanish pretrained bert model and evaluation data</article-title>
          .
          <source>In: PML4DC at ICLR</source>
          <year>2020</year>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khandelwal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhary</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wenzek</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzman</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <volume>8440</volume>
–
          <fpage>8451</fpage>
. Association for Computational Linguistics, Online (Jul
<year>2020</year>
). https://doi.org/10.18653/v1/2020.acl-main.747, https://www.aclweb.org/anthology/2020.acl-main.747
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
. CoRR abs/1810.04805 (
<year>2018</year>
), http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Houlsby</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giurgiu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jastrzebski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morrone</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Laroussilhe</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gesmundo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Attariyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelly</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
<article-title>Parameter-efficient transfer learning for NLP</article-title>
. CoRR abs/1902.00751 (
<year>2019</year>
), http://arxiv.org/abs/1902.00751
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
<article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
. CoRR abs/1907.11692 (
<year>2019</year>
), http://arxiv.org/abs/1907.11692
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
<article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez Mellado</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            <given-names>Adorno</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Jimenez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.M.</given-names>
            ,
            <surname>Lima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Plaza-de Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.M.</given-names>
            ,
            <surname>Taule</surname>
          </string-name>
          , M. (eds.):
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2021</year>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
, C.D.: GloVe:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <volume>1532</volume>
–
          <issue>1543</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
<string-name>
<surname>Plaza-del-Arco</surname>
,
<given-names>F.M.</given-names>
</string-name>
          ,
          <string-name>
            <surname>Jimenez-Zafra</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molina-Gonzalez</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
<string-name>
<surname>Ureña-López</surname>
,
<given-names>L.A.</given-names>
</string-name>
          ,
<string-name>
<surname>Martín-Valdivia</surname>
,
<given-names>M.T.</given-names>
</string-name>
:
          <article-title>Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
<string-name>
<surname>Plaza-del-Arco</surname>
,
<given-names>F.</given-names>
</string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
<string-name>
<surname>Ureña-López</surname>
,
<given-names>L.A.</given-names>
</string-name>
          ,
<string-name>
<surname>Martín-Valdivia</surname>
,
<given-names>M.T.</given-names>
</string-name>
:
<article-title>EmoEvent: A Multilingual Emotion Corpus based on different Events</article-title>
          .
          <source>In: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          . pp.
          <volume>1492</volume>
–
          <fpage>1498</fpage>
          .
          <string-name>
            <surname>European Language Resources Association</surname>
          </string-name>
          , Marseille, France (May
          <year>2020</year>
), https://www.aclweb.org/anthology/2020.lrec-1.186
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rust</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
<string-name>
<surname>Pfeiffer</surname>
,
<given-names>J.</given-names>
</string-name>
,
          <string-name>
            <surname>Vulic</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>How good is your tokenizer? on the monolingual performance of multilingual language models</article-title>
. CoRR abs/2012.15613 (
<year>2020</year>
), https://arxiv.org/abs/2012.15613
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
<article-title>Learning sentiment-specific word embedding for twitter sentiment classification</article-title>
          .
          <source>In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <volume>1555</volume>
–
          <issue>1565</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
<source>CoRR abs/1706.03762</source>
(
<year>2017</year>
), http://arxiv.org/abs/1706.03762
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delangue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cistac</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rault</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funtowicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brew</surname>
          </string-name>
          , J.:
<article-title>HuggingFace's Transformers: State-of-the-art natural language processing</article-title>
. CoRR abs/1910.03771 (
<year>2019</year>
), http://arxiv.org/abs/1910.03771
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zong</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
<article-title>Ensemble of feature sets and classification algorithms for sentiment classification</article-title>
          .
          <source>Information sciences 181(6)</source>
          ,
          <volume>1138</volume>
–
          <fpage>1152</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Yadav</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishwakarma</surname>
            ,
            <given-names>D.K.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis using deep learning architectures: a review</article-title>
          .
<source>Artificial Intelligence Review</source>
          <volume>53</volume>
          (
          <issue>6</issue>
          ),
          <volume>4335</volume>
–
          <fpage>4385</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>