<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emotion Classification for Spanish with XLM-RoBERTa and TextCNN</article-title>
      </title-group>
      <abstract>
        <p>On social networking platforms, users usually cannot perceive each other's tone of voice or facial expressions, so the emotions conveyed online are often unclear. This task is Spanish emotion detection and evaluation from EmoEvalEs@IberLEF 2021. Specifically, the task requires classifying the emotion expressed in a tweet into one of seven categories: Anger, Disgust, Fear, Joy, Sadness, Surprise, or Others. Our team (team name: Dong) first uses XLM-RoBERTa for embedding. We then feed the word vectors into a Transformer Encoder for secondary feature extraction and pass the result to a TextCNN. Using TextCNN's ability to capture local features, our model extracts high-level features such as text semantics, word order, and context. Finally, the output of the model is fed into a fully connected layer for classification. Our model ranked 14th in this task, with a weighted-averaged F1 of 0.5570 and an accuracy of 0.5368.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion classification</kwd>
        <kwd>TextCNN</kwd>
        <kwd>XLM-RoBERTa</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>1 Yunnan University,Yunnan,P.R.China</p>
      <p>icat@mail.ynu.edu.cn
2 Yunnan University,Yunnan,P.R.China</p>
      <p>thomasduke2008@gmail.com
3 Yunnan University,Yunnan,P.R.China</p>
      <p>
        K1ky0@pm.me
With the popularization of the mobile Internet, it has become easier for people to express their opinions and emotions on online social media platforms. Because of Twitter's popularity, the short, emotionally charged texts that people publish there spread widely and quickly, and can rapidly have an important and far-reaching impact on social development. Over time, the amount of emotionally inclined text on the mobile Internet has grown explosively, and these data have made emotion classification of text a hot research topic. Emotion classification is an important task among the basic tasks of emotion analysis. It requires finding emotional texts in subjective texts and analyzing them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Different from traditional classification methods that analyze the objective content of a text, emotion classification extracts viewpoint information from the text. So far, a large number of researchers have proposed deep-learning-based natural language processing models for emotion classification. This task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is Spanish emotion detection and evaluation from EmoEvalEs@IberLEF 2021 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In the past ten years, text-based emotion analysis has been a hot research field that has attracted much attention from researchers in psychology, sociology, and computer science. Emotion analysis of text studies how to extract the subjective sentiment tendency of the user from the text [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In 2013, Mikolov et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed the Word2vec model, which maps high-dimensional vectors to a low-dimensional space and introduced a word-vector-based method of natural language analysis. In 2014, Kim et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a model for text classification tasks in natural language processing. Because the Word2vec model lacks the ability to detect local features of text, this model uses a convolutional neural network (TextCNN) that can extract such local features. In 2014, Jeffrey Pennington et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a model (GloVe) that combines the advantages of global matrix factorization and local context windows, learning word vectors by mapping words to global vectors. In 2018, Peters et al. [8] proposed an embedded language model (ELMo) based on a bidirectional LSTM structure, which can learn the usage and expression of words in text. Also in 2018, OpenAI proposed a generative pre-trained Transformer model (GPT) based on a one-way Transformer structure [9]. This model uses the self-attention mechanism of the Transformer to extract sentence features. Thanks to the multi-head attention mechanism of the Transformer model, the pre-training methods of many natural language processing models have been improved accordingly. In 2018, Google researchers proposed BERT, a deep bidirectional encoder based on the Transformer encoding structure [10]. This model uses the MLM (Masked Language Modeling) strategy to achieve bidirectional language learning during pre-training. Later, the RoBERTa model [11] released by Facebook improved on BERT's original pre-training method and further improved the training effect of related models. In 2019, researchers from the Google Brain team proposed the XLNet model, based on an autoregressive language-model pre-training method [12]. The XLNet model can learn bidirectional semantic information in text and is competent for many natural language processing tasks. We participate in this task for Spanish and propose a method that combines XLM-RoBERTa [13], a Transformer Encoder, and TextCNN. It integrates the advantages of each model to enhance the effectiveness of emotion classification.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data and Resources</title>
      <p>The data set [14] used in this task is provided by EmoEvalEs@IberLEF 2021 and is derived from tweets about global events in different fields in April 2019. The data set is divided into training, development, and testing parts, in which each hashtag is replaced by the keyword "HASHTAG". Our task is to classify each tweet into one of Anger, Disgust, Fear, Joy, Sadness, Surprise, or Others, where Others represents neutral or no emotion.</p>
    </sec>
    <sec id="sec-4">
      <title>System</title>
    </sec>
    <sec id="sec-5">
      <title>System Description</title>
      <sec id="sec-5-1">
        <title>Data Preprocessing</title>
        <p>We remove punctuation marks, emojis, empty characters, and other special symbols from the data set. Removing them reduces the difficulty of training the model without affecting the classification results. Considering that this is a seven-way text classification task, our model inserts a [CLS] identifier before the sentence for text classification. The model then predicts the masked words based on the remaining words. As training progresses, the tokens masked in a sentence are not exactly the same each time, so the model gradually adapts to different mask positions and thereby learns multiple semantic features.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Model Description</title>
        <p>Our system includes XLM-RoBERTa, Transformer Encoder, and TextCNN models. During training, we train the weights of XLM-RoBERTa, TextCNN, the Transformer Encoder, and the final classification layer. The RoBERTa model inherits some characteristics of the BERT model and represents the input sentence as a word vector, a sentence vector, and a position vector. RoBERTa then optimizes the pre-training method in terms of model structure and data processing, using more training resources, more training data, a larger batch size, and longer training time. The RoBERTa model uses continuous full-sentences and doc-sentences as input and removes the NSP loss. In this task, we use Hugging Face's implementation of the XLM-RoBERTa model, which inherits the XLM training method and draws on the ideas of RoBERTa, making our model more suitable for the cross-language training in this task. In this layer, the network has 12 layers, a hidden size of 768, and 12 self-attention heads.</p>
        <p>The XLM-RoBERTa model converts the input labeled corpus into corresponding feature maps that integrate global feature information. We note that this layer not only has a huge number of parameters, but those parameters also change very little during training. This situation is likely to cause the model to overfit and make the final classification unsatisfactory. Therefore, we use the output of the XLM-RoBERTa model as the input of a Transformer Encoder layer, using the encoder to perform secondary feature extraction on the information from the previous layer. Since the Transformer Encoder layer has far fewer parameters than the previous model, we find that its parameters change a lot during training and are more sensitive to changes in the input data. This method effectively alleviates over-fitting and enhances the generalization ability of the entire model. The Transformer Encoder layer we designed contains only one Transformer encoding block.</p>
        <p>The output of the Transformer Encoder layer carries global features of the text. We use it as the input of TextCNN to reduce parameters and capture local features. After obtaining the local features of the text, we feed them into a max-pooling layer and a fully connected (softmax) layer to obtain the classification results. The TextCNN in this method includes a one-dimensional convolutional layer. We implement our model in PyTorch, using Hugging Face's implementation of XLM-RoBERTa. The model is optimized with the Adam algorithm and a cross-entropy loss function, and the activation function in the convolutional layer is ReLU. We fine-tuned the model many times, and the hyperparameter settings of the final model are shown in Table 1.</p>
        <p>The evaluation metrics of this classification task are accuracy and the weighted-averaged versions of precision, recall, and F1. The final results of our model are shown in Table 2. Our model ranks 14th in this task; the weighted-averaged F1 is 0.5570 (the top team achieved 0.7170) and the accuracy is 0.5368 (top team: 0.7276).</p>
        <p>The results of other models are shown in Table 3. We use accuracy and F1 to evaluate the performance of our model. Compared with XLM-RoBERTa, our model improves accuracy by nearly 1% and F1 by nearly 3% on the development set. Although our model uses the TextCNN layer to capture local features, the final improvement is limited. We believe there may be two reasons for this result. On the one hand, the training data distribution is unbalanced. In particular, the number of "fear" examples is very small compared to other labels, while "others" accounts for nearly half of the total data, which biases the model toward the classes with more data during training. On the other hand, the total amount of training data is relatively small. We train the weights of the model by increasing the number of training epochs, but to prevent over-fitting we can only choose a compromise.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this article, we propose a classification model for the task of Spanish emotion detection and evaluation. In our experiments, we considered the problems of limited training data and imbalanced data distribution. Because of the huge number of parameters in the XLM-RoBERTa model, different downstream tasks require different fine-tuning strategies, and this requires a lot of data. We tried to use convolutional neural networks to capture the local features of the text and to reduce the model parameters to improve classification, but the results are not ideal. Our weighted-averaged F1 is 0.5570 and our accuracy is 0.5368. This result shows that our model has certain limitations: its performance on this data set is not outstanding, and it does not perform well on the more refined data set. In view of these disadvantages, we can consider weighting the loss of each category. In addition, we plan to use K-fold cross-validation for model fine-tuning to find the hyperparameter values that give the best generalization performance.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We would like to thank the organizers for organizing this task and providing data support, and we thank the reviewers for their patience. Finally, we would like to thank our university for supporting our research.</p>
      <p>8. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). vol. 1, pp. 2227–2237 (2018)
9. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. vol. 30, pp. 5998–6008 (2017)
10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018)
11. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach (2019)
12. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: Generalized autoregressive pretraining for language understanding (2019)
13. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzman, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale (2019)
14. Plaza del Arco, F.M., Strapparava, C., Urena Lopez, L.A., Martin, M.: EmoEvent: A multilingual emotion corpus based on different events. In: Proceedings of the 12th Language Resources and Evaluation Conference. pp. 1492–1498. European Language Resources Association, Marseille, France (May 2020), https://www.aclweb.org/anthology/2020.lrec-1.186</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Routray</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swain</surname>
            ,
            <given-names>C.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.P.:</given-names>
          </string-name>
          <article-title>A survey on sentiment analysis</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>76</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Plaza-del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Zafra</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Raez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molina-Gonzalez</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ureña-López</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martín-Valdivia</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Montes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvarez Mellado</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez-Zafra</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza-del-Arco</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taule</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.):
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2021</year>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>From classification to quantification in tweet sentiment analysis</article-title>
          .
          <source>Social Network Analysis and Mining</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>22</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Church</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          :
          <article-title>Word2vec</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>23</volume>
          (
          <issue>01</issue>
          ),
          <fpage>155</fpage>
          –
          <lpage>162</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jaderberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Reading text in the wild with convolutional neural networks</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>116</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>20</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>