Emotion Analysis for Spanish Tweets: A Model Based on XLM-RoBERTa and Bi-GRU

Yuanchi Qu[0000-0002-0971-1795], Shuangjun Jia[0000-0001-8315-5662], and Yanjie Zhang[0000-0001-7356-6791]

Yunnan University, Yunnan, P.R. China
qychi@foxmail.com, 2858044698@qq.com, jyzhangisme@163.com

IberLEF 2021, September 2021, Málaga, Spain. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Our team (team name: qu) participated in the classification task of EmoEvalEs@IberLEF 2021. The task requires classifying the emotion of Spanish tweets into seven categories (anger, disgust, fear, joy, sadness, surprise, others). To address this problem, we propose a model that combines XLM-RoBERTa with Bidirectional Gated Recurrent Units (Bi-GRU). Our model achieves an accuracy of 0.449879 and an averaged precision of 0.618833, and our team ranked 15th in the final evaluation.

Keywords: Emotion Analysis · Bi-GRU · Spanish · Classification

1 Introduction

With the development of society and technology, the number of Internet users has increased sharply, and communication over the Internet has become an important part of modern society [1]. Comments posted on social media have a large impact on users' moods. Classifying the emotions expressed in social media posts can help governments monitor changes in public mood and trends in public opinion, and may even help prevent malicious or false incidents [2]. Detecting and classifying the emotions users express on social media is therefore of great significance. Since most tweets carry no intonation or facial expression, analyzing the emotions of social media users is a difficult task.

To encourage the development of this field, EmoEvalEs@IberLEF 2021 [3] released a new task: emotion detection and evaluation for Spanish. The goal is to classify the emotion of tweets into seven categories (anger, disgust, fear, joy, sadness, surprise, others). This task poses three main challenges: (1) most tweets are very short, so context cannot be exploited effectively; (2) tweets are not formal text and often contain noise such as misspellings and emoticons; (3) unlike a binary classification task, this is a seven-class task, which places more stringent requirements on model performance.

This article uses a model combining XLM-RoBERTa [4] with Bi-GRU [5] to perform emotion classification for Spanish. Experimental results show that this model can effectively improve the performance of emotion classification. For model training, we use the training data provided by the competition organizers.

The rest of the paper is structured as follows. Section 2 reviews related work on emotion analysis. Section 3 describes the processing of the dataset and the model (XLM-RoBERTa and Bi-GRU). Section 4 analyzes the experimental results and summarizes the shortcomings. Section 5 concludes the paper, followed by the acknowledgements.

2 Related Work

In recent years, emotion analysis has been an active research topic in Natural Language Processing (NLP) and Data Mining (DM).
Researchers at home and abroad have mainly approached the problem with emotion dictionaries, Machine Learning, and Deep Learning. Methods based on an emotion dictionary judge users' emotional tendency through a series of calculations over the dictionary (e.g., semantic correlation). They require features to be extracted manually, so in the era of "Big Data" they can no longer meet practical needs.

To make up for the shortcomings of emotion dictionaries, prior work compared SVM (Support Vector Machines), CRF (Conditional Random Fields), and other methods [6], and concluded that SVM is generally more advantageous. Zampieri et al. [7] proposed an emotion classification method based on matrix projection, a normalized vector algorithm, and the KNN (K-Nearest Neighbors) algorithm; experiments show that it achieves high accuracy while greatly reducing classification time. Compared with dictionary-based methods, Machine Learning no longer requires manually built emotion dictionaries and is more refined in feature extraction and semantic analysis. However, some textual features still need to be annotated manually, so these methods are not fully automated.

Deep Learning methods address these shortcomings well, and Deep Learning has proved more effective than traditional Machine Learning for emotion analysis. Kim [8] proposed using a CNN (Convolutional Neural Network) to perform emotion analysis. Zhu et al. [9] proposed an LSTM-based model for emotion classification. The papers [10, 11] proposed the BERT pre-training model and used the Transformer to train on a massive corpus. The papers [12, 13] used the BERT model to extract semantic features from comments and then fed the acquired features into an LSTM model for tendency classification, which also improved accuracy compared with the plain BERT model. To further improve performance, we propose a model that combines XLM-R with Bi-GRU.

3 Methodology and Data

3.1 Data description

In this task, we use the official dataset provided by EmoEvalEs@IberLEF 2021 [14]. It contains a total of 8223 tweets written in Spanish: 5723 in the training set, 844 in the development set, and 1656 in the test set. The labels in the training set are anger, disgust, fear, joy, sadness, surprise, and others [15], and their distribution is uneven.
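To illustrate this imbalance, the following is a minimal sketch for inspecting the label distribution of the training split with pandas. The file name and column name are illustrative assumptions, not the organizers' official format.

```python
# Minimal sketch for checking the label imbalance of the training split.
# "emoevales_train.tsv" and the "emotion" column are illustrative assumptions.
import pandas as pd

train_df = pd.read_csv("emoevales_train.tsv", sep="\t")
counts = train_df["emotion"].value_counts()      # tweets per emotion label
print(counts)
print((counts / len(train_df)).round(3))         # relative frequencies
```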
3.2 Model Description

In order to choose a better model, we evaluated three different models on the development set under the same conditions, as shown in Table 1. We observe that the XLM-R combined with Bi-GRU model has an advantage over the other models. LSTM and GRU networks can effectively handle long-range dependencies in sequences and are widely used in natural language processing. However, both networks only consider correlations between sequence elements in one direction and ignore possible correlations with future information, so they are limited in applications with strong bidirectional dependencies.

Table 1. Evaluation results of different models on the development set.

Model            Optimizer  Accuracy  Averaged F1 score
XLM-R            sgd        0.4452    0.4535
XLM-R            adam       0.4746    0.4059
XLM-R + LSTM     sgd        0.5156    0.5342
XLM-R + LSTM     adam       0.5556    0.5725
XLM-R + Bi-GRU   sgd        0.5812    0.6031
XLM-R + Bi-GRU   adam       0.5949    0.6034

This paper uses the XLM-R model to extract features from the training set and then feeds the acquired features into the Bi-GRU model to extract the emotional features of the comments. Finally, we classify the emotional tendency with the Softmax method [16]. The overall process is shown in Figure 1.

Fig. 1. Flow chart of emotion classification based on the XLM-RoBERTa combined with Bi-GRU model.

The GRU has only two gates: an update gate and a reset gate. Compared with LSTM, the GRU reduces the number of parameters and improves the efficiency of the model. Its structure is shown in Figure 2.

Fig. 2. The GRU model.

In Figure 2, $z_t$ denotes the update gate and $r_t$ the reset gate. $z_t$ determines how the cell state is renewed, and $r_t$ decides how much information is written into the candidate state $\tilde{h}_t$: the smaller $r_t$ is, the less information is written into the candidate set, which means less of the previously useful information is kept. The update gate $z_t$ determines whether to update the previous information; it can discard useless information and thus alleviates the long-term dependence problem. It is computed as

$$ z_t = \sigma(\omega_z \cdot [h_{t-1}, x_t]) \tag{1} $$

In Equation (1), $\sigma$ is the Sigmoid activation function and $\omega_z$ is a weight matrix. The reset gate $r_t$ determines whether the previous information needs to be reset, i.e., whether to discard the previous activation when the candidate activation is computed:

$$ r_t = \sigma(\omega_r \cdot [h_{t-1}, x_t]) \tag{2} $$

$$ \tilde{h}_t = \tanh(\omega \cdot [r_t \cdot h_{t-1}, x_t]) \tag{3} $$

In Equation (3), $\tanh$ is the activation function and $\tilde{h}_t$ is the candidate hidden state. The hidden state $h_t$ is then obtained from $h_{t-1}$ and $\tilde{h}_t$ as

$$ h_t = (1 - z_t) \cdot h_{t-1} + z_t \cdot \tilde{h}_t \tag{4} $$

In the model (XLM-R combined with Bi-GRU), the XLM-R layer serves as the embedding layer for the Bi-GRU layer, and the Bi-GRU layer is the hidden layer. The input sequence is processed in both directions and the two-way information is stored; the final output is $h_t^{(i)}$, the bidirectional GRU representation of the $i$-th text, where $\overrightarrow{h_t}$ denotes the forward GRU information of the $i$-th text and $\overleftarrow{h_t}$ the reverse GRU information:

$$ \overrightarrow{h_t} = \overrightarrow{\mathrm{GRU}}(x_t, \overrightarrow{h}_{t-1}) \tag{5} $$

$$ \overleftarrow{h_t} = \overleftarrow{\mathrm{GRU}}(x_t, \overleftarrow{h}_{t-1}) \tag{6} $$

Figure 3 shows the structure of the XLM-R model. XLM-R adopts a bidirectional Transformer encoder, which converts text into vector representations that the machine can process [17]. XLM-R uses a dynamic mask and removes the NSP (Next Sentence Prediction) task.

Fig. 3. The XLM-RoBERTa model.

In Figure 3, $E_1, E_2, E_3, \ldots, E_{n-1}, E_n$ represent the input (language embeddings + position embeddings + token embeddings). After the bidirectional Transformer encoder, we obtain the vector representation of the tweet text $(T_1, T_2, \ldots, T_{n-1}, T_n)$. The vector obtained through XLM-R training is expressed as $W_i = (w_1^{(i)}, w_2^{(i)}, w_3^{(i)}, \ldots, w_{n-1}^{(i)}, w_n^{(i)})$, where $W_i$ is the vector matrix of the $i$-th sentence, $w_j^{(i)}$ is the feature of the $j$-th word, and $n$ is the maximum sentence length.

As shown in Figure 1, the entire XLM-R combined with Bi-GRU model is divided into five layers. The first layer is the input layer, where the officially provided data are fed into the model. In the second layer, the XLM-R model vectorizes the tweets. In the third layer, the obtained word features are fed into the Bi-GRU network to carry out the emotion analysis of the text. In the fourth layer, the Softmax function classifies the features. In the fifth layer, the model outputs the emotion label of the tweet (anger, disgust, fear, joy, sadness, surprise, others).
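As a concrete illustration of this five-layer pipeline, the following is a minimal PyTorch sketch of an XLM-R + Bi-GRU classifier built with the HuggingFace transformers library. It is our reconstruction for illustration only: the class name, the GRU hidden size, and the use of the xlm-roberta-base checkpoint are assumptions, not the authors' released code.

```python
# Minimal sketch of the XLM-R + Bi-GRU classifier described above (assumption,
# not the official implementation).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class XlmrBiGru(nn.Module):
    """XLM-R embedding layer -> Bi-GRU hidden layer -> linear classifier."""

    def __init__(self, num_labels=7, gru_hidden=256, dropout=0.4):
        super().__init__()
        # Layer 2: XLM-R vectorizes the tweet into 768-dim token features.
        self.encoder = AutoModel.from_pretrained("xlm-roberta-base")
        # Layer 3: bidirectional GRU over the token features.
        self.bigru = nn.GRU(self.encoder.config.hidden_size, gru_hidden,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Layer 4: linear projection over the concatenated final states.
        self.classifier = nn.Linear(2 * gru_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        _, h_n = self.bigru(tokens)              # h_n: (2, batch, gru_hidden)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)  # forward and backward states
        return self.classifier(self.dropout(h))  # logits over the 7 emotions

# Usage: layer 1 is the raw tweet, layer 5 the predicted emotion probabilities.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = XlmrBiGru()
batch = tokenizer(["HASHTAG qué día tan feliz"], return_tensors="pt",
                  padding=True, truncation=True, max_length=80)
probs = torch.softmax(model(batch["input_ids"], batch["attention_mask"]), dim=-1)
```

During training, the logits would typically be passed to a cross-entropy loss, which applies the softmax internally; the explicit softmax in the usage line above corresponds to the fourth layer described in the text.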
4 Experiment and Results

4.1 Data preprocessing

The data come from the official competition and are based on events related to entertainment, disasters, politics, commemorations, and global strikes that occurred in April 2019. Since the dataset consists of tweets, the content is not formal text, so we process the data before training and testing: (1) replace the hashtags in the data with a unified "HASHTAG" token; (2) because usernames have no effect on emotion analysis and may mislead the model, adopt a unified standard and replace each username with "#@USER"; (3) replace all emoticons in the text with their textual descriptions; (4) unify the letter case.
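The following is a minimal sketch of these four steps. The regular expressions and the third-party emoji package are our own assumptions; the paper does not specify the exact implementation.

```python
# Minimal sketch of preprocessing steps (1)-(4); regexes and the `emoji`
# package are assumptions, not the paper's exact implementation.
import re
import emoji

def preprocess(tweet: str) -> str:
    tweet = re.sub(r"#\w+", "HASHTAG", tweet)             # (1) unify hashtags
    tweet = re.sub(r"@\w+", "#@USER", tweet)              # (2) unify usernames
    tweet = emoji.demojize(tweet, delimiters=(" ", " "))  # (3) emoji -> text
    return tweet.lower()                                  # (4) unify case

print(preprocess("@maria ¡Qué alegría! 😀 #felicidad"))
```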
4.2 Experiment setting

The experimental environment is as follows: an Intel Core i7 CPU with 16 GB of memory, a 1 TB hard disk, and an NVIDIA RTX 3080 Ti GPU, running Windows 10. The code editor is PyCharm 2020 and the Deep Learning framework is PyTorch. The optimizer is Adam with a learning rate of 2e-5. The specific parameters of the model are shown in Table 2.

Table 2. Parameter settings.

Parameter                    Value
Optimizer                    Adam
Max seq length               80
Dimension                    512
Batch size                   32
Dropout                      0.4
Epochs                       70
Gradient accumulation steps  8
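A minimal training-loop sketch consistent with Table 2 (Adam, learning rate 2e-5, batch size 32, 70 epochs, 8 gradient-accumulation steps) is shown below. The toy data, the label indices, and the XlmrBiGru class from the earlier sketch are illustrative assumptions rather than the actual competition pipeline.

```python
# Minimal training-loop sketch following Table 2; toy data and XlmrBiGru are
# assumptions, not the official pipeline.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
texts = ["HASHTAG qué alegría", "#@USER esto es terrible"]   # toy training data
labels = torch.tensor([3, 0])                                # e.g. joy, anger
enc = tokenizer(texts, padding="max_length", truncation=True,
                max_length=80, return_tensors="pt")
train_loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"],
                                        labels), batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = XlmrBiGru().to(device)                 # class from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
criterion = nn.CrossEntropyLoss()              # applies softmax internally
accum_steps = 8

for epoch in range(70):
    optimizer.zero_grad()   # discard any partial accumulation between epochs
    for step, (ids, mask, y) in enumerate(train_loader):
        logits = model(ids.to(device), mask.to(device))
        loss = criterion(logits, y.to(device)) / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:      # update every 8 mini-batches
            optimizer.step()
            optimizer.zero_grad()
```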
4.3 Results

The evaluation metrics for the Spanish emotion multi-class model are the F1 score, averaged recall, averaged precision, and accuracy. Partial results of the competition are shown in Table 3.

Table 3. Evaluation results.

Team     Accuracy  Averaged precision  Averaged recall  F1 score
GSI-UPM  0.727657  0.709411            0.727657         0.717028
qu       0.449879  0.618833            0.449879         0.446947

The team GSI-UPM won first place in the competition, while our team, qu, placed 15th. In subtask 3 of the competition, we did not achieve the ideal results. We see three main reasons for this outcome: (1) the hyper-parameter values are unreasonable; the number of epochs is too large, which increases the number of weight updates in the network and causes the model to over-fit; (2) the training data are unevenly distributed; the amount of data per label is too uneven, resulting in poor performance of the trained model, because the linear classifier used in this model is biased toward the majority classes, which skews the model; (3) the training set contains only 5723 tweets, so the generalization ability of the model is poor.

5 Conclusion

In this paper, we propose a model (XLM-R combined with Bi-GRU) for emotion detection in Spanish tweets. The model performs well on the development set, but its performance on the test set is not satisfactory. We therefore believe there is still considerable room for improvement. In future work, we will re-tune the hyper-parameters and use a K-fold ensemble method to improve the generalization ability of the model.

Acknowledgements

The completion of this paper owes much to the support of many people. First and foremost, I want to extend my heartfelt gratitude to my supervisor, Yanhua Yang, who gave me much help and advice throughout the writing process. My thanks also go to the authors whose books and articles have inspired my writing. Last but not least, I would like to thank the organizers for their hard work.

References

1. Marquardt, J., Farnadi, G., Vasudevan, G., Moens, M.F., Davalos, S., Teredesai, A., De Cock, M.: Age and gender identification in social media. Proceedings of CLEF 2014 Evaluation Labs 1180, 1129–1136 (2014)
2. Meina, M., Brodzinska, K., Celmer, B., Czoków, M., Patera, M., Pezacki, J., Wilk, M.: Ensemble-based classification for author profiling using various features. Notebook Papers of CLEF (2013)
3. Montes, M., Rosso, P., Gonzalo, J., Aragón, E., Agerri, R., Álvarez-Carmona, M.Á., Álvarez Mellado, E., Carrillo-de Albornoz, J., Chiruzzo, L., Freitas, L., Gómez Adorno, H., Gutiérrez, Y., Jiménez-Zafra, S.M., Lima, S., Plaza-de Arco, F.M., Taulé, M. (eds.): Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
4. Agarwal, S., Sureka, A.: Characterizing linguistic attributes for automatic classification of intent based racist/radicalized posts on Tumblr micro-blogging website. arXiv preprint arXiv:1701.04931 (2017)
5. Choe, D.E., Kim, H.C., Kim, M.H.: Sequence-based modeling of deep learning with LSTM and GRU networks for structural damage detection of floating offshore wind turbine blades. Renewable Energy 174, 218–235 (2021)
6. Korhonen, A., Traum, D., Màrquez, L. (eds.): Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019)
7. Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., Kumar, R.: SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval). arXiv preprint arXiv:1903.08983 (2019)
8. Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised recognition of sarcasm in Twitter and Amazon. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning. pp. 107–116 (2010)
9. Zhang, Z., Robinson, D., Tepper, J.: Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In: European Semantic Web Conference. pp. 745–760. Springer (2018)
10. Poria, S., Cambria, E., Hazarika, D., Vij, P.: A deeper look into sarcastic tweets using deep convolutional neural networks. arXiv preprint arXiv:1610.08815 (2016)
11. Ghosh, A., Veale, T.: Fracking sarcasm using neural network. In: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. pp. 161–169 (2016)
12. Tay, Y., Tuan, L.A., Hui, S.C., Su, J.: Reasoning with sarcasm by reading in-between. arXiv preprint arXiv:1805.02856 (2018)
13. Zhang, M., Zhang, Y., Fu, G.: Tweet sarcasm detection using deep neural network. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. pp. 2449–2460 (2016)
14. Plaza-del-Arco, F.M., Jiménez-Zafra, S.M., Montejo-Ráez, A., Molina-González, M.D., Ureña-López, L.A., Martín-Valdivia, M.T.: Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Procesamiento del Lenguaje Natural 67(0) (2021)
15. Plaza-del-Arco, F., Strapparava, C., Ureña-López, L.A., Martín-Valdivia, M.T.: EmoEvent: A multilingual emotion corpus based on different events. In: Proceedings of the 12th Language Resources and Evaluation Conference. pp. 1492–1498. European Language Resources Association, Marseille, France (May 2020), https://www.aclweb.org/anthology/2020.lrec-1.186
16. Golubev, A., Loukachevitch, N.: Use of BERT neural network models for sentiment analysis in Russian. Automatic Documentation and Mathematical Linguistics 55(1), 17–25 (2021)
17. Yu, H., Ji, Y., Li, Q.: Student sentiment classification model based on GRU neural network and TF-IDF algorithm. Journal of Intelligent & Fuzzy Systems (Preprint), 1–11