<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emotion Detection for Spanish with Data Augmentation and Transformer-Based Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hongxin Luo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Yunnan</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we describe the participation of the Yeti team in the IberLEF EmoEvalEs task, which builds on the Spanish sentiment analysis tasks of TASS 2020 and was proposed as a separate task at IberLEF 2021. We introduce the methods we used in the emotion detection task and the results obtained. First, we used back-translation data augmentation to address the problems of data scarcity and data imbalance. Our method is based on transfer learning using the BETO language model for sentiment classification in Spanish. The system showed excellent performance, finally achieving an accuracy score of 0.7125. We won third place in the final ranking, only 0.0151 points away from the best result.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformers</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Sentiment Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Sentiment analysis in tweets is a challenging task because a large amount of subjective
information is generated every day. It is very difficult to deal with these messages
and their potential language phenomena [6], and this subjective language can be
used to express private states beyond opinions [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. People have been looking
for efficient sentiment analysis algorithms for tweets [15]. In the past few
years, most work on sentiment analysis has combined neural network
models with word embedding techniques to achieve better results [
        <xref ref-type="bibr" rid="ref4">4</xref>
][11]. This
work aims to promote the development of a Twitter sentiment classification system
for Spanish.
      </p>
      <p>
The Iberian Languages Evaluation Forum (IberLEF) is a comparative evaluation
campaign for Natural Language Processing systems in Spanish and other Iberian
languages [12]. The main content of the EmoEvalEs task [13] is to classify the
emotion expressed in a tweet as the one of Ekman's six basic emotions [
        <xref ref-type="bibr" rid="ref5">5</xref>
] that best
represents the mental state of the tweet's author: Anger, Disgust, Fear, Joy,
Sadness, Surprise, or Others. In the task, the dataset is divided into a training
set, a development set, and a test set.</p>
      <p>
This article mainly summarizes our participation in the Emotion Detection and
Evaluation task [13]. From the results of TASS 2020 [6], we can see that
the performance of BERT-based models [
        <xref ref-type="bibr" rid="ref3">3</xref>
] on this task is very competitive [6].
We considered a number of state-of-the-art neural network models, and finally
settled on adaptively fine-tuning a Transformer architecture based on a
pre-trained language model. We used ALBERT as the baseline for comparison.
      </p>
      <p>The rest of this paper is organized as follows. Chapter 2 describes the task
and the corpus. Chapter 3 introduces our system in detail. Chapter 4
introduces the experimental setup. Chapter 5 outlines the evaluation process, and
the conclusions are in Chapter 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Corpus description</title>
<p>The organizers proposed an emotion detection task: a single-label, multi-class
classification task that assigns each tweet one of seven different emotion labels:
Anger, Disgust, Fear, Joy, Sadness, Surprise, and Others. The dataset was mainly
collected from events in different domains in April 2019, including entertainment,
catastrophes, politics, global commemorations, and global strikes [14]. The corpus
is divided into three parts (training, development, and testing) with a total of
8223 items. The items in the training and development sets have five attributes:
id, event, tweet, offensive, and emotion (a hypothetical loading sketch is given
after the list below). The test set does not contain the emotion label. To prevent
classifiers from relying on hashtags to classify the sentiment of tweets, the
organizers replaced the hashtags in the dataset with the keyword "HASHTAG" [6].
The challenges we have to face are as follows [14]:
– Lack of context: tweets are short (up to 240 characters).
– Informal language: misspellings, emojis, and onomatopoeias are common.
– Multi-class classification: the dataset is labeled with seven different classes.</p>
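<p>As a concrete illustration, a hypothetical loader for this corpus could look as follows; the file name and the tab separator are assumptions about the distribution format, while the five columns are as described above:</p>
<preformat>
# Hypothetical loader for the EmoEvalEs training file (file name and
# separator are assumptions; the five columns follow the task description).
import pandas as pd

train = pd.read_csv('emoevales_train.tsv', sep='\t')
print(train.columns.tolist())            # ['id', 'event', 'tweet', 'offensive', 'emotion']
print(train['emotion'].value_counts())   # reveals the strong class imbalance
</preformat>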
    </sec>
    <sec id="sec-3">
      <title>Materials and methods</title>
      <sec id="sec-3-1">
        <title>Pre-processing</title>
<p>Data preprocessing is particularly important for reducing the noisy
information in tweets: high-quality input data can improve the output performance
of the model [8]. Before conducting our experiments, we applied the following
preprocessing steps to the data. First, we deleted the URLs and punctuation marks
from the text content. To remove unnecessary semantic information, we removed
stop words with the NLTK toolkit and converted the content of the tweets to
lowercase. Finally, we used the emoji library to convert the emojis in the
tweets into text. At the same time, we kept the original version of the dataset,
and in our experiments we compared the results of the various preprocessing
configurations.</p>
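<p>A minimal sketch of this preprocessing pipeline is shown below, assuming NLTK's Spanish stop word list and the emoji package; the exact punctuation handling in our script may differ in detail:</p>
<preformat>
# Minimal sketch of the preprocessing steps described above.
import re
import string

import emoji                       # pip install emoji
from nltk.corpus import stopwords  # requires nltk.download('stopwords')

SPANISH_STOPWORDS = set(stopwords.words('spanish'))

def preprocess(tweet):
    # 1. Delete URLs and punctuation marks.
    tweet = re.sub(r'https?://\S+', '', tweet)
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    # 2. Lowercase and drop Spanish stop words.
    tokens = [t for t in tweet.lower().split() if t not in SPANISH_STOPWORDS]
    # 3. Convert emojis into their textual description.
    return emoji.demojize(' '.join(tokens))
</preformat>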
      </sec>
      <sec id="sec-3-2">
        <title>Data augmentation</title>
<p>Due to the extremely unbalanced distribution of the dataset, the model tends
to overfit and predict the most frequent category. We decided to use data
augmentation to mitigate this problem. A simple and effective method is
back-translation [16]: a sentence is translated into another language (e.g.,
Spanish to English) and then translated back into Spanish. If the newly generated
sentence differs from the original sentence, it is used as a data augmentation
version of the original text. Running back-translation through several different
languages generates more variants. This augmentation technique helps to introduce
variation in the vocabulary and syntax of tweets while, most of the time,
maintaining the original meaning [10]. We used two representative pivot languages
(Chinese and English) to expand the training data, because we found during our
experiments that using more languages does not significantly improve the results.
To obtain the translations, we used the Baidu Translation API service.*</p>
<p>* Baidu Translation API available at https://api.fanyi.baidu.com/</p>
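<p>The following sketch illustrates the back-translation procedure; translate() is a hypothetical placeholder for the Baidu Translation API call (the real service requires an app ID, a secret key, and a signed HTTP request):</p>
<preformat>
def translate(text, src, dst):
    # Hypothetical wrapper around the Baidu Translation API.
    raise NotImplementedError('call the Baidu Translation API here')

def back_translate(text, pivot):
    # Spanish -> pivot language -> Spanish; keep the result only if it differs.
    pivoted = translate(text, src='es', dst=pivot)
    restored = translate(pivoted, src=pivot, dst='es')
    return restored if restored != text else None

def augment(minority_tweets):
    # Pivot through English and Chinese, as in our experiments.
    for tweet in minority_tweets:
        for pivot in ('en', 'zh'):
            variant = back_translate(tweet, pivot)
            if variant is not None:
                yield variant
</preformat>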
      </sec>
      <sec id="sec-3-3">
        <title>ALBERT</title>
<p>We used the ALBERT model as the baseline of our work, because ALBERT [9]
is a recently released model with excellent performance on various Natural
Language Processing (NLP) tasks. ALBERT addresses memory and training-speed
problems with a "Lite BERT" architecture that has fewer parameters than the
traditional BERT architecture [9]. The structure of ALBERT is basically the same
as BERT's, with three specific improvements: factorized embedding
parameterization, cross-layer parameter sharing, and replacing the Next Sentence
Prediction (NSP) task with a Sentence Order Prediction (SOP) task. The
hyperparameter settings of the model are as follows (settings found to perform
well over several fine-tuning runs; parameters not mentioned keep their default
values; a training sketch follows the list):
– albert model: albert-base-v2
– max seq length: 128
– optimizer: AdamW
– warmup steps: 200
– learning rate: 3e-5
– train steps: 800
– train batch size: 64</p>
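<p>A sketch of this baseline with the Hugging Face transformers library, mirroring the hyperparameters listed above, could look as follows; train_dataset stands for the tokenized training split (not shown), and seven labels are assumed:</p>
<preformat>
# ALBERT baseline fine-tuning sketch (Hugging Face transformers).
from transformers import (AlbertForSequenceClassification, AlbertTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForSequenceClassification.from_pretrained('albert-base-v2',
                                                        num_labels=7)
args = TrainingArguments(
    output_dir='albert-emoevales',
    max_steps=800,                    # train steps
    warmup_steps=200,                 # warmup steps
    learning_rate=3e-5,               # AdamW is the Trainer default optimizer
    per_device_train_batch_size=64,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
</preformat>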
      </sec>
      <sec id="sec-3-4">
        <title>BETO</title>
        <p>
          Inspired by the results of the TASS 2020 seminar and our emotion classi cation
task, we decided to use the BETO model to complete this challenging Emotion
Detection and Evaluation for Spanish task. BETO is a BERT model trained on
a large Spanish corpus. BETO model combines the pre-training model with the
downstream task model, which means that the BETO model is still used when
doing downstream tasks, and it naturally supports text classi cation tasks, and
there is no need to modify the model when doing text classi cation tasks. BETO
is trained on a large Spanish corpus, which can more accurately represent the
text features of Spanish, and can solve the problem of the dependence of the
task on the Spanish language. It has BETO-uncased and BETO-cased. We used
BETO-cased as our Language Model (LM). The size of BETO is similar to
BERT-base, according to the guidelines presented by Can~ete et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
], BETO
was trained with Whole Word Masking (WWM), uses a vocabulary of about 31k
Byte Pair Encoding (BPE) subwords constructed with SentencePiece, and was
trained for 2M steps [
          <xref ref-type="bibr" rid="ref2">2</xref>
]. In the training process, dynamic
masking is introduced: 10 different masks are used for the same sentence in the
corpus. When WWM masks a specific token, if the token corresponds to a subword
of a word in a sentence, all consecutive tokens that make up the same word are
masked. We used the Adam optimizer [7] for optimization. We use BETO as the
initial LM to construct a robust method for this challenge and to achieve
excellent performance in the final result. The optimal hyperparameter settings
in our experiments are as follows (settings that performed well over several
fine-tuning runs; parameters not mentioned keep their default values; a
fine-tuning sketch follows the list):
– beto model: BETO-cased
– max seq length: 128
– train batch size: 32
– learning rate: 2e-5
– num train epochs: 3.0
        </p>
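<p>Analogously, a fine-tuning sketch for BETO-cased (published on the Hugging Face hub as dccuchile/bert-base-spanish-wwm-cased) with the hyperparameters above; again, train_dataset is assumed to be the tokenized training split:</p>
<preformat>
# BETO-cased fine-tuning sketch (Hugging Face transformers).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = 'dccuchile/bert-base-spanish-wwm-cased'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=7)

def encode(batch):
    # max_seq_length = 128, as listed above.
    return tokenizer(batch['tweet'], truncation=True, max_length=128,
                     padding='max_length')

args = TrainingArguments(
    output_dir='beto-emoevales',
    num_train_epochs=3.0,
    learning_rate=2e-5,
    per_device_train_batch_size=32,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
</preformat>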
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
<p>In this section we introduce our experimental procedure. To compare results,
we used ALBERT as the baseline against the best BETO model obtained by
fine-tuning during the experiments (the hyperparameter settings are given in
Sections 3.3 and 3.4); both are pre-trained deep models. The IberLEF
organization released three corpora for training, development, and testing.
The label of each tweet corresponds to one of the six emotions or Others.</p>
<p>In an exhaustive search with the BETO model as the main research object, we
determined the model configuration parameters (as shown in Section 3.4).
Inspired by TASS 2020 Task 1 and by observing the data, we found that each
tweet also carries an offensive label, so, starting from the nearly final model
configuration, we tried feeding the offensive label together with the tweet
content into the model for prediction and compared it with the result of
inputting the tweet alone, as sketched below.</p>
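<p>A minimal sketch of this input variant; the column names follow the corpus description, and whether the label is prepended as plain text or passed as a sentence pair is an implementation choice:</p>
<preformat>
# 'Offensive + Tweet' input construction sketch.
def build_input(row):
    # Prepend the offensive label to the tweet text ...
    return '{} {}'.format(row['offensive'], row['tweet'])

# ... or, alternatively, pass the two fields as a sentence pair so that
# the tokenizer separates them with [SEP]:
# tokenizer(row['offensive'], row['tweet'], truncation=True, max_length=128)
</preformat>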
<p>Then we addressed the unbalanced data in the corpus with back-translation
data augmentation, mainly using Chinese and English as intermediate languages.
We mainly augmented the two rarest categories (Fear and Disgust) to expand
their data volume and prevent the model from overfitting to the most frequent
categories.</p>
<p>Finally, we also tried converting the emojis in the data into the
corresponding text to explore better model performance. We fed the processed
data into the two models for comparison. All experiments were performed on a
machine equipped with an Nvidia Tesla V100 GPU.</p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
<p>The results of our models on the validation set of the Emotion Detection and
Evaluation for Spanish task are shown in Table 2. Our final submission results
and rankings on the official test set are shown in Table 5. The final result of
our system is quite competitive: in the final submitted results it placed fourth
overall with a score of 0.7125, only 0.0151 behind the best result.</p>
<p>The results given in Table 4 show that the results obtained with the BETO
model are better than the baseline and, at the same time, far better than those
of the multilingual BERT model. This shows that a model pre-trained on the
specific target language outperforms a multilingual pre-trained model. From
Table 2 we also observe that our data preprocessing does not improve the
performance of the model: compared with the unprocessed raw data, the effect is
reduced. After discussion, we concluded that the drop in results may be related
to the model's pre-training. The original BERT model is pre-trained to learn
context and semantic connections, and its input is raw data without any
preprocessing; adding preprocessing when fine-tuning the downstream task may
destroy contextual relationships in the text and thus lead to poorer results.
Finally, it can be seen from Table 3 that the back-translation data augmentation
we used helps improve model performance, and that converting emojis into text
also slightly improves the results.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
<p>We proposed a BETO-cased sentiment classification system for the IberLEF 2021
EmoEvalEs task. The method is based on transfer learning with BETO, is applied
to the sentiment analysis of Spanish tweets, includes an additional data
augmentation step, and achieved good results in the task. We are very satisfied
with the results of our first participation in the IberLEF workshop. Although
the method is relatively simple, we achieved very good results by exploring the
model's hyperparameters and configuring our model reasonably. Careful selection
of language models and data augmentation techniques plays an important role in
sentiment analysis on small datasets. However, sentiment analysis of tweet
content still poses huge challenges, and our system still has a lot of room for
improvement. In future work, we hope to use more powerful data augmentation
techniques to address data scarcity, and we look forward to exploring more
advanced techniques for the sentiment analysis of Spanish tweets.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>First of all, we thank the organizers for the valuable opportunity provided to us. I would also like to thank my teacher for supporting my research work, and the future reviewers for their patience.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>6. García-Vega, M., Díaz-Galiano, M.C., García-Cumbreras, M.A., Plaza-del-Arco, F.M., Montejo-Ráez, A., Jiménez-Zafra, S.M., Martínez Cámara, E., Aguilar, C.A., Sobrevilla Cabezudo, M.A., et al.: Overview of TASS 2020: Introducing emotion detection (2020)
7. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
8. Kotsiantis, S., Kanellopoulos, D., Pintelas, P.: Data preprocessing for supervised learning. International Journal of Computer, Electrical, Automation, Control and Information Engineering 1, 4104–4109 (2007)
9. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)
10. Luque, F.M.: Atalaya at TASS 2019: Data augmentation and robust embeddings for sentiment analysis. arXiv preprint arXiv:1909.11241 (2019)
11. Martínez Cámara, E., Almeida-Cruz, Y., Díaz Galiano, M.C., Estévez-Velarde, S., García Cumbreras, M.A., García Vega, M., Gutiérrez, Y., Montejo Ráez, A., Montoyo, A., Muñoz, R., et al.: Overview of TASS 2018: Opinions, health and emotions (2018)
12. Montes, M., Rosso, P., Gonzalo, J., Aragón, E., Agerri, R., Álvarez Carmona, M., Álvarez Mellado, E., Carrillo-de-Albornoz, J., Chiruzzo, L., Freitas, L., Gómez Adorno, H., Gutiérrez, Y., Jiménez-Zafra, S.M., Lima, S., Plaza-del-Arco, F.M., Taulé, M. (eds.): Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
13. Plaza-del-Arco, F.M., Jiménez-Zafra, S.M., Montejo-Ráez, A., Molina-González, M.D., Ureña-López, L.A., Martín-Valdivia, M.T.: Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Procesamiento del Lenguaje Natural 67(0) (2021)
14. Plaza-del-Arco, F., Strapparava, C., Ureña-López, L.A., Martín-Valdivia, M.T.: EmoEvent: A multilingual emotion corpus based on different events. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 1492–1498. European Language Resources Association, Marseille, France (May 2020), https://www.aclweb.org/anthology/2020.lrec-1.186
15. Villena Román, J., Lana Serrano, S., Martínez Cámara, E., González Cristóbal, J.C.: TASS: Workshop on sentiment analysis at SEPLN (2013)
16. Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848 (2019)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>1. Algeo, J.: A comprehensive grammar of the English language. By Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. London: Longman. 1985. x+1779. Journal of English Linguistics 20(1), 122–136 (1987)</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>2. Cañete, J., Chaperon, G., Fuentes, R., Pérez, J.: Spanish pre-trained BERT model and evaluation data. PML4DC at ICLR 2020 (2020)</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>4. Díaz Galiano, M.C., Martínez Cámara, E., García Cumbreras, M.A., García Vega, M., Villena Román, J.: The democratization of deep learning in TASS 2017 (2018)</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>5. Ekman, P.: Are there basic emotions? (1992)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>