<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>URJC-Team at EmoEvalEs 2021: BERT for Emotion Classification in Spanish Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jorge Alberto Flores Sanchez</string-name>
          <email>jorgeflores8185@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soto Montalvo Herranz</string-name>
          <email>soto.montalvo@urjc.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raquel Martínez Unanue</string-name>
          <email>raquel@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Educacion a Distancia</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Rey Juan Carlos</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the URJC-Team in the EmoEvalEs 2021 task of the IberLEF evaluation campaign. The task consists of classifying the emotion expressed in a tweet into one of seven different emotion classes. Our proposal is based on transfer learning using BERT language modeling. We trained three fine-tuned BERT models, finally selecting two of them for the submitted runs, along with a system that combines all the models by means of an ensemble method. We obtained competitive results in the challenge, ranking fifth. Additional work needs to be done to improve the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion Classification</kwd>
        <kwd>Tweets</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Transformer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Material and Methods</title>
      <p>
        <bold>Data.</bold> The EmoEvalEs dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is based on events that took place in April 2019, related to different domains: entertainment, catastrophe, politics, global commemoration, and global strike. For the task, the data were divided into training, development, and test partitions. The distribution of emotions for each partition is shown in Table 1.
      </p>
      <p>To develop our proposal we used the training and development partitions, since the test partition was only provided later by the organizers to evaluate the participating systems and determine the winner of the challenge. We merged the training and development partitions into a larger training set, hereafter referred to as the training data.</p>
      <p>
        We randomly selected 90% of each emotion class to train the model and kept the remaining 10% to test it. Table 2 shows the final distribution of these data. We explore the use of Bidirectional Encoder Representations from Transformers (BERT) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a deep learning approach that has proven very successful across several NLP tasks. In particular, we experimented with a pre-trained BERT model, BETO [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], as the core for the semantic representation of the input tokens. BETO is a BERT model trained on over 300M lines of a Spanish corpus, and it is similar in size to a BERT-Base model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        BETO has 12 self-attention layers with 16 attention heads each and a hidden size of 1024. In total, the model has 110M parameters. Two versions of BETO were trained: one on cased data and one on uncased data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The proposed system has been implemented in Python 3.7 with HuggingFace's transformers library [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Three models were trained with different data and configuration parameters. First, a basic pre-processing step was carried out, eliminating the special character '#'. The text was then tokenized by mapping words to the subwords found in the 32k-token vocabulary. The Adam optimizer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] was used with its standard parameters (β<sub>1</sub> = 0.9, β<sub>2</sub> = 0.999). We applied a linear decay function to decrease the initial learning rate to 0. Finally, the maximum sequence length was set to 128 tokens.
      </p>
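      <p>A minimal sketch of the pre-processing and tokenization step. The BETO checkpoint name is an assumption (the paper only says a pre-trained BETO model was used), and loading it requires the transformers library and a network connection, so that part is kept behind a main guard.</p>

```python
def preprocess(tweet: str) -> str:
    # Basic pre-processing described in the paper: remove the '#' character.
    return tweet.replace("#", "")

MAX_LEN = 128  # maximum sequence length used in the paper

if __name__ == "__main__":
    from transformers import AutoTokenizer

    # Checkpoint name is an assumption on our part; the tokenizer splits
    # words into subwords from BETO's 32k-token vocabulary.
    tokenizer = AutoTokenizer.from_pretrained(
        "dccuchile/bert-base-spanish-wwm-cased"
    )
    encoded = tokenizer(
        preprocess("#EmoEvalEs clasificar emociones en tuits"),
        truncation=True,
        max_length=MAX_LEN,
    )
    print(encoded["input_ids"])
```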
      <p>We fixed some hyper-parameters for the different models:</p>
      <list list-type="simple">
        <list-item>
          <p>Model 1. The cased model was trained with all the data; batch size = 32, learning rate = 2e-5, epochs = 4, and weight decay = 0.1.</p>
        </list-item>
        <list-item>
          <p>Model 2. The uncased model was trained with all the data; batch size = 32, learning rate = 5e-5, epochs = 3, and weight decay = 0.1.</p>
        </list-item>
        <list-item>
          <p>Model 3. The cased model was trained with all the data except 30% of the "others" class, which was removed because it was the majority class; batch size = 32, learning rate = 2e-5, epochs = 4, and weight decay = 0.1.</p>
        </list-item>
      </list>
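      <p>The three configurations above can be captured as plain dictionaries; the key names are ours, and "drop_others_frac" encodes Model 3's removal of 30% of the "others" class.</p>

```python
# Hyper-parameters of the three fine-tuned BETO models, as reported in the
# paper. Key names are illustrative, not the authors' actual code.
MODEL_CONFIGS = {
    "model_1": {"beto_variant": "cased", "batch_size": 32,
                "learning_rate": 2e-5, "epochs": 4,
                "weight_decay": 0.1, "drop_others_frac": 0.0},
    "model_2": {"beto_variant": "uncased", "batch_size": 32,
                "learning_rate": 5e-5, "epochs": 3,
                "weight_decay": 0.1, "drop_others_frac": 0.0},
    # Model 3: 30% of the majority "others" class removed from training data.
    "model_3": {"beto_variant": "cased", "batch_size": 32,
                "learning_rate": 2e-5, "epochs": 4,
                "weight_decay": 0.1, "drop_others_frac": 0.3},
}
```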
      <p>We submitted three different runs. One combines the results of the three previous models by means of a voting system: the final prediction is the class with the most votes and, in the event of a tie, the prediction of Model 1. The other two runs contain the results of Models 2 and 3, respectively.</p>
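      <p>The voting rule with the Model 1 tie-break can be sketched as follows (a minimal illustration, not the submitted implementation):</p>

```python
from collections import Counter

def ensemble_vote(preds):
    """Majority vote over the per-model predictions for one tweet.

    `preds[0]` must be Model 1's prediction: when all three models
    disagree (a tie), the fallback is Model 1, as described in the paper.
    """
    label, count = Counter(preds).most_common(1)[0]
    return label if count > 1 else preds[0]

assert ensemble_vote(["joy", "joy", "anger"]) == "joy"    # clear majority
assert ensemble_vote(["fear", "joy", "anger"]) == "fear"  # tie -> Model 1
```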
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The evaluation measures used by the organizers are the following: accuracy and the weighted-averaged versions of Precision, Recall, and F1. The participating systems are ranked by the weighted-averaged F1 and accuracy measures in a multi-class evaluation scenario.</p>
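      <p>For reference, the weighted-averaged F1 used for ranking averages the per-class F1 scores with weights proportional to each class's support; scikit-learn exposes it as f1_score(..., average="weighted"). A dependency-free sketch:</p>

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support
    (the support-weighted average used to rank the participant systems)."""
    total = len(y_true)
    score = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += f1 * sum(t == c for t in y_true) / total
    return score
```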
      <p>Table 3 shows the results obtained by the three runs in the challenge. The best results correspond to the ensemble method, because the predictions of the multiple models are combined, taking advantage of the performance of each of them. Table 4 contains the results of the three submitted runs for each emotion. The system achieves its best results for the others, sadness, joy, fear, and anger classes. However, for the disgust and surprise classes it performs poorly, because the system confuses these emotions with similar ones, such as disgust with anger, and surprise with joy or others. In addition, the small number of samples available for these classes may be a factor: the system does not have enough data to train the model so that it can differentiate between these classes.</p>
      <p>Comparing Model 2 and Model 3, it can be seen that when the model is trained with less data for the others class, its performance increases for the joy class and decreases for the others class. With the new data distribution, the model is better able to differentiate the joy class from the others class, thus classifying a greater number of tweets correctly. Conversely, when this reduction of the others class is not performed, the performance on the joy class decreases, since several tweets belonging to that class are predicted by the model as others.</p>
      <p>Moreover, although the best overall results were obtained with the voting system, there are classes such as fear and disgust where this is not the case: for the fear class, Model 2 reaches an F1 of 0.7 while the voting system reaches 0.6, and likewise for the disgust class, where Model 2 reaches 0.11 and the voting system 0.09.</p>
      <p>Finally, it is important to note that although the results are slightly worse
in some classes, overall robustness is gained.</p>
      <p>
        On the other hand, a comprehensive comparison and ranking of the results of all the shared task participants can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Table 5 summarizes these results. Our system reached position number four among the fifteen participants. Compared with the best system on the accuracy metric, the difference is 2.475%, which is equivalent to that system correctly classifying 41 more tweets than ours (the evaluation set is composed of 1656 tweets).
      </p>
      <p>This paper describes the system presented by the URJC-Team at the EmoEvalEs 2021 task of the IberLEF evaluation campaign. Several deep-learning models were trained and ensembled to automatically detect and classify the emotions expressed in Spanish tweets about associated events. Although this is a complex task, our system achieves good results for certain emotions and is competitive with the systems of the other participants, with a difference of 2.475% from the best system of the workshop.</p>
      <p>As future work, we intend to carry out further experiments with BETO and other pre-trained language models in order to improve the results in the task, paying particular attention to the classes that were more difficult to detect, disgust and surprise. In addition, it might be interesting to apply some preprocessing to deal with the unbalanced data.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work was supported by MCI/AEI/FEDER, UE DOTT-HEALTH Project
(MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Perez, J.: Spanish pre-trained BERT model and evaluation data. In: Proceedings of the Practical ML for Developing Countries Workshop at the Eighth International Conference on Learning Representations (ICLR 2020). Addis Ababa, Ethiopia (Apr 2020)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171-4186. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019). https://doi.org/10.18653/v1/N19-1423, https://www.aclweb.org/anthology/N19-1423</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Montes, M., Rosso, P., Gonzalo, J., Aragon, E., Agerri, R., Alvarez-Carmona, M.A., Alvarez Mellado, E., Carrillo-de Albornoz, J., Chiruzzo, L., Freitas, L., Gomez Adorno, H., Gutierrez, Y., Jimenez-Zafra, S.M., Lima, S., Plaza-de Arco, F.M., Taule, M. (eds.): Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Plaza-del-Arco, F.M., Jimenez-Zafra, S.M., Montejo-Raez, A., Molina-Gonzalez, M.D., Ureña-Lopez, L.A., Martin-Valdivia, M.T.: Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Procesamiento del Lenguaje Natural 67 (2021)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Plaza-del-Arco, F., Strapparava, C., Ureña-Lopez, L.A., Martin-Valdivia, M.T.: EmoEvent: A multilingual emotion corpus based on different events. In: Proceedings of the 12th Language Resources and Evaluation Conference. pp. 1492-1498. European Language Resources Association, Marseille, France (May 2020), https://www.aclweb.org/anthology/2020.lrec-1.186</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Brew, J.: HuggingFace's Transformers: State-of-the-art natural language processing. CoRR abs/1910.03771 (2019), http://arxiv.org/abs/1910.03771</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>