<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentiment Analysis for Spanish Tweets based on Continual Pre-training and Data Augmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yingwen Fu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ziyu Yang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nankai Lin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lianxi Wang</string-name>
          <email>wanglianxi@gdufs.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Feng Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies</institution>
          ,
          <addr-line>Guangzhou</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Science and Technology, Guangdong University of Foreign Studies</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we report the solution of the team BERT4EVER for the sentiment analysis task for Spanish tweets in EmoEvalEs@IberLEF 2021, which aims to classify Spanish tweets into one of the following emotional categories: Anger, Disgust, Fear, Joy, Sadness, Surprise or Others. We adopt the monolingual Spanish BERT model to tackle the problem. In addition, we leverage two augmented strategies to enhance the classic fine-tuned model, namely continual pre-training and data augmentation to improve the generalization capability. Experimental results demonstrate the effectiveness of the BERT model and two augmented strategies.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>BERT</kwd>
        <kwd>Continual Pre-training</kwd>
        <kwd>Back Translation</kwd>
        <kwd>Mix up</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Sentiment analysis is an important task in the field of natural language processing
(NLP). It is often used to determine which type of emotion a text conveys [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
However, due to the lack of vocal modulation and facial expressions, understanding the
emotions expressed by users on social media such as Twitter is difficult [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Researchers are constantly pursuing more effective algorithms to achieve better classification
results [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>Therefore, EmoEvalEs@IberLEF 2021 [14] proposes a sentiment analysis task
[15] requiring participants to perform sentiment analysis on tweets in
Spanish and classify them into one of the following emotional categories: Anger,
Disgust, Fear, Joy, Sadness, Surprise or Others. This track provides Spanish tweets and the
corresponding categories for participants to conduct sentiment classification
experiments. However, this task poses two main challenges:</p>
      <p>
        1) The dataset is relatively small, far less than the amount of data typically
required to train commonly used classification models such as BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Bi-LSTM
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>2) The class distribution is extremely imbalanced: in the provided dataset, the
proportions of Fear and Disgust are much smaller than those of Others and Joy.</p>
      <p>To tackle the issues above, we, the BERT4EVER team, leveraged two
strategies to boost classification performance: continual pre-training and data
augmentation. These two strategies effectively compensate for the small
data size and the imbalanced class proportions, so that the trained model yields
better performance.</p>
      <p>The remainder of the article is structured as follows. Section 2 describes the
task and the dataset given by the organizers in detail. Section 3 presents our specific
implementation. The experimental results and conclusions are given in
Sections 4 and 5, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>The aim of the task is to classify the sentiment conveyed in a Spanish tweet. The task
is challenging because tweets lack facial expressions and intonation. The sentiment is
divided into the following classes: Anger, Disgust, Fear, Joy, Sadness,
Surprise or Others (the sentiment conveyed in the tweet is 'neutral' or there is no sentiment).</p>
      <p>
        The datasets [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] involved in this task were provided by the organizers via Codalab.
The training set contains about 18,000 tweets. In addition to the tweet text, the
labels also indicate whether the tweet is offensive and which event it is about.
Some statistics about the training set are shown in Table 1.
      </p>
      <p>In our experiments, in order to fairly compare the effectiveness of different
strategies, we used 5-fold cross-validation: we divided the full dataset
into 5 parts, with 4 parts used for training and the remaining part for validation,
which also yields an ensemble of models with better generalization performance. We then
use the average result of the 5 fold models as an estimate of the effectiveness of
each strategy.</p>
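      <p>The protocol above can be sketched as follows; a toy classifier and synthetic data stand in for the actual tweet features and BERT model, which are not part of this illustration:</p>
      <preformat>
```python
# 5-fold cross-validation: train 5 fold models, keep them for a later
# ensemble, and average the validation scores to estimate a strategy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Illustrative stand-in for the real tweet features and labels.
X, y = make_classification(n_samples=500, n_classes=3,
                           n_informative=6, random_state=0)

scores, models = [], []
for train_idx, val_idx in StratifiedKFold(
        n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    models.append(clf)  # the 5 fold models are later ensembled by soft voting
    scores.append(f1_score(y[val_idx], clf.predict(X[val_idx]),
                           average="weighted"))

# The average validation score estimates the strategy's effectiveness.
mean_score = float(np.mean(scores))
```
      </preformat>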
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <sec id="sec-3-1">
        <title>Base Model</title>
        <p>
          BERT (Bidirectional Encoder Representations from Transformers) model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a
pretrained language model (PLM) that shows excellent performance on multiple
downstream NLP tasks. The model architecture is shown in Fig. 1. It reads the input sequence
all at once and learns via two objectives, i.e., masked language modeling (MLM) and next
sentence prediction (NSP). MLM randomly selects 15 percent of the input tokens,
replaces most of them with other tokens, and then predicts the original tokens. NSP
predicts whether two input sentences are consecutive in the text, to capture the
relationship between sentences.
        </p>
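        <p>The MLM corruption step can be sketched as follows, following the 80/10/10 replacement split of the original BERT paper; token ids and the vocabulary size are illustrative:</p>
        <preformat>
```python
# MLM corruption sketch: 15% of positions are selected; of those, 80% become
# [MASK], 10% a random token, and 10% are left unchanged. The model is
# trained to predict the original token at the selected positions.
import random

MASK_ID = 103       # illustrative id for the [MASK] token
VOCAB_SIZE = 31000  # roughly BETO's 31k-subword vocabulary

def mask_tokens(token_ids, rng):
    corrupted, targets = list(token_ids), [None] * len(token_ids)
    for i in range(len(token_ids)):
        if rng.random() < 0.15:          # select 15% of positions
            targets[i] = token_ids[i]    # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_ID                    # 80%: [MASK]
            elif r < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else 10%: keep the original token unchanged
    return corrupted, targets

rng = random.Random(0)
corrupted, targets = mask_tokens(list(range(1000, 2000)), rng)
```
        </preformat>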
        <p>
          In this paper, we leverage BETO [13] as our base model. BETO is a BERT model
trained on a big Spanish corpus hosted on Zenodo. BETO is similar in size to BERT-Base and
was trained with the Whole Word Masking technique. It uses a vocabulary of about 31k
BPE [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] subwords constructed using SentencePiece and was trained for 2M steps.
        </p>
        <p>However, since our dataset consists of Spanish tweets, a general pre-trained model
applied directly to it may be limited by insufficient domain knowledge. At
the same time, the class imbalance discussed in the Introduction is another
problem we need to solve. Therefore, we propose two strategies, continual
pre-training and data augmentation, to alleviate these problems.</p>
        <p>Inspired by [11], our continual pre-training approach to domain adaptation is
straightforward: we continue pretraining BETO on a large corpus of unlabeled domain-specific
text. Specifically, we try two domain corpora: (1) the training set of
EmoEvalEs@IberLEF 2021, where we ignore the labels and only use the raw text for
continual pre-training; and (2) a general Spanish tweet corpus plus the training set of
EmoEvalEs@IberLEF 2021, where, in addition to the unlabeled training data of this track, we also
leverage a large general Spanish tweet corpus [12] for domain-adaptive pretraining.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Data Augmentation</title>
        <p>Data augmentation tackles over-fitting at the data level and improves the
generalization of the model. By increasing the diversity of training samples, the model can
learn more essential features of the data, enhancing its robustness to subtle
variations in the samples.</p>
        <p>Back Translation. In order to generate more training data, we use back translation
to generate a paraphrase x′ of a sentence x. The paraphrase
x′, obtained by translating x into an intermediate language and then translating it
back, describes the same content as x and should be semantically close to it. In terms
of labels, x and the corresponding back-translated sample x′ share the
same label. We use English as the intermediate language in back translation.</p>
        <p>By inspecting the Spanish dataset, we find that three categories, Disgust,
Fear, and Surprise, account for the lowest proportions. Therefore, we only perform back
translation on these three categories. Increasing the proportion of the low-proportion
categories not only enriches the amount of training data but also reduces the model's
misjudgment rate on these three low-proportion labels.</p>
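        <p>A minimal sketch of this augmentation step, where the translate function is a toy stand-in (hypothetical, for illustration only) for a real es→en→es machine translation round trip:</p>
        <preformat>
```python
# Back translation for the low-proportion classes: each tweet from a rare
# class is round-tripped through English and the paraphrase inherits the
# original label.
def translate(text, src, tgt):
    # Toy stand-in for an MT system; a real implementation would return a
    # fluent translation, and the round trip would yield a paraphrase.
    return f"[{src}->{tgt}] {text}"

def back_translate(text, pivot="en"):
    # es -> en -> es round trip: same meaning, different surface form.
    return translate(translate(text, "es", pivot), pivot, "es")

LOW_PROPORTION = {"disgust", "fear", "surprise"}

def augment(dataset):
    # Only tweets from the three rarest classes are paraphrased; the
    # back-translated copy keeps the original label.
    extra = [(back_translate(text), label)
             for text, label in dataset if label in LOW_PROPORTION]
    return dataset + extra

data = [("me da miedo", "fear"), ("qué alegría", "joy")]
augmented = augment(data)
```
        </preformat>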
        <p>
          Mix Up. Mix up [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a simple and fast data augmentation method. It
randomly draws two samples from the training set and takes a
random weighted sum of their inputs; the labels of the two samples are
combined with the same weights. The loss between the prediction and the
weighted label is then computed, and the parameters are updated
through backpropagation.
        </p>
        <p>x̃ = λx_i + (1 − λ)x_j,  ỹ = λy_i + (1 − λ)y_j
(1)
where x_i, x_j are raw input vectors and y_i, y_j are one-hot label encodings. In this task,
we simply set λ to 0.5 and obtain more stable predictions.</p>
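        <p>The mix up operation above can be sketched as follows, with illustrative input vectors, one-hot labels, and λ = 0.5:</p>
        <preformat>
```python
# Equation (1) in code: mix two training examples with weight lam,
# applying the same weighted sum to inputs and one-hot labels.
import numpy as np

def mixup(x_i, x_j, y_i, y_j, lam=0.5):
    x_tilde = lam * x_i + (1.0 - lam) * x_j
    y_tilde = lam * y_i + (1.0 - lam) * y_j
    return x_tilde, y_tilde

x_i, x_j = np.array([1.0, 0.0]), np.array([0.0, 1.0])
y_i = np.array([1.0, 0.0, 0.0])  # one-hot label of sample i
y_j = np.array([0.0, 0.0, 1.0])  # one-hot label of sample j
x_t, y_t = mixup(x_i, x_j, y_i, y_j)
# x_t is the midpoint of the inputs; y_t is a soft label over both classes.
```
        </preformat>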
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiment</title>
      <sec id="sec-4-1">
        <title>Experiment Settings</title>
        <p>We use the Transformers library with PyTorch as the backend to construct the BERT-based
models, and scikit-learn to construct the machine learning models. The hyperparameters are
shown in Table 2. For evaluation, we use the weighted averaged F1 score as
our metric.</p>
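        <p>The metric can be computed directly with scikit-learn's weighted average, in which each class's F1 is weighted by its support; the labels below are illustrative:</p>
        <preformat>
```python
# Weighted-averaged F1: per-class F1 scores averaged with weights
# proportional to each class's number of true instances (its support).
from sklearn.metrics import f1_score

y_true = ["joy", "joy", "others", "fear", "others", "others"]
y_pred = ["joy", "others", "others", "fear", "others", "joy"]

score = f1_score(y_true, y_pred, average="weighted")
# joy: F1 = 0.5 (support 2); others: F1 = 2/3 (support 3); fear: F1 = 1.0
# (support 1), so the weighted average is (2*0.5 + 3*(2/3) + 1*1)/6 = 2/3.
```
        </preformat>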
        <sec id="sec-4-1-6">
          <title>Results</title>
          <p>
            We first report the offline results of several machine learning methods, such as
Support Vector Machine (SVM), Logistic Regression (LR) and Random Forest (RF),
and of recent neural methods, such as fine-tuned XLM [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] and fine-tuned BETO, as well as
of the augmented strategies, including continual pre-training and back translation. The
results are shown in Table 3 and Table 4.
          </p>
          <p>From the tables, we can see that SVM works best among the machine
learning methods, outperforming LR and RF by 0.0407 and 0.0245, respectively. In addition, the neural
methods are far superior to the machine learning methods, indicating the superiority of
neural methods, especially BERT-based ones. Among the BERT-based methods, the
monolingual BETO achieves better performance than the multilingual
XLM, with an improvement of almost 0.1, demonstrating the effectiveness of monolingual
BETO for this task. Besides, the two augmented strategies leveraged in this paper both
improve on the base model, among which Mix up augmentation
achieves the best effect, reaching an average accuracy of 0.7266. In addition, continual
pre-training on the training set and back translation of the low-proportion data respectively
outperform continual pre-training on the general corpus and back translation of the whole data.</p>
          <p>Based on the offline results, we use the models (soft voting over the 5 cross-validation models) of
ID 9, ID 10 and the combination of ID 9 and ID 10 (in Table 3) as our final submissions.
The online results are shown in Table 5. We achieved second place in the
competition.</p>
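          <p>Soft voting over the fold models can be sketched as follows; the probability matrices are illustrative stand-ins for the outputs of the 5 fine-tuned models:</p>
          <preformat>
```python
# Soft voting: average the class-probability outputs of the 5 fold models
# and take the argmax as the ensemble prediction.
import numpy as np

def soft_vote(prob_list):
    # prob_list: one (n_samples, n_classes) probability matrix per model.
    avg = np.mean(prob_list, axis=0)
    return avg.argmax(axis=1)

probs = [np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3]]) for _ in range(5)]
preds = soft_vote(probs)
```
          </preformat>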
          <p>It can be seen from Table 5 that Fine-tuned BETO + Training set pre-training + Low
proportion data back translation achieves the best result of 0.7222 in accuracy. It is
worth noting that the offline performance of Fine-tuned BETO + Training set
pre-training + Mix up is excellent, but its online performance is not as good. This
is also why the combination of the two models performs worse than
the single model. We believe that this model over-fitted during training, resulting
in poor generalization and thus a degraded result
on the test set.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>For the sentiment analysis task for Spanish tweets in EmoEvalEs@IberLEF 2021,
we adopt a monolingual pre-trained Spanish BERT model as our base model and
fine-tune it with the labeled tweets. In addition, to address the two problems of small data size
and class imbalance in the original training set, we leverage two augmented strategies
to enhance the classic fine-tuned model, namely continual pre-training and data
augmentation. Specifically, we try two data augmentation methods: back translation and
mix up. Experimental results demonstrate the effectiveness of the two augmented
strategies. In the future, we will try further data augmentation methods to achieve better
results on the sentiment analysis task for Spanish tweets.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work was supported by the National Social Science Foundation of China (No.
17CTQ045), the Soft Science Research Project of Guangdong Province
(No.2019A101002108), the Science and Technology Program of Guangzhou
(No.202002030227), the National Natural Science Foundation of China (No.
61572145) and the Key Field Project for Universities of Guangdong Province (No.
2019KZDZX1016). The authors would like to thank the anonymous reviewers for their
valuable comments and suggestions.</p>
      <p>11. Gururangan, S., Marasović, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D. and Smith,
N. A.: Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. In:
Proceedings of ACL, pp. 8342-8360. Online (2020).</p>
      <p>12. González, J. Á., Hurtado, L. F. and Pla, F.: TWilBert: Pre-trained Deep Bidirectional
Transformers for Spanish Twitter. Neurocomputing 426, 58-69 (2021).</p>
      <p>13. Cañete, J., Chaperon, G., Fuentes, R., Ho, J., Kang, H. and Pérez, J.: Spanish Pre-Trained
BERT Model and Evaluation Data. In: Proceedings of ICLR 2020. (2020).</p>
      <p>14. Montes, M., Rosso, P., Gonzalo, J., et al.: Proceedings of the Iberian Languages Evaluation
Forum (IberLEF 2021). CEUR Workshop Proceedings. (2021).</p>
      <p>15. Plaza-del-Arco, F. M., Jiménez Zafra, S. M., Montejo Ráez, A., Molina González, M. D.,
Ureña López, L. A., Martín Valdivia, M. T.: Overview of the EmoEvalEs task on emotion
detection for Spanish at IberLEF 2021. Procesamiento del Lenguaje Natural 67(0) (2021).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis and opinion mining</article-title>
          .
          <source>Synthesis Lectures on Human Language Technologies</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>167</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Nakov</surname>
          </string-name>
          , P.:
          <article-title>SemEval-2017 task 4: Sentiment analysis in Twitter</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval2017)</source>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>518</lpage>
          . Vancouver, Canada (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cliche</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>BB twtr at SemEval-2017 task 4: Twitter sentiment analysis with CNNs and LSTMs</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          , pp.
          <fpage>573</fpage>
          -
          <lpage>580</lpage>
          . Vancouver, Canada (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Arasteh</surname>
            ,
            <given-names>S. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monajem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christlein</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinrich</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Evert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies</article-title>
          .
          <source>In: 2021 IEEE 15th International Conference on Semantic Computing (ICSC)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Toutanova</surname>
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of NAACL-HLT 2019</source>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Plaza-del-Arco</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urena</surname>
            <given-names>Lopez</given-names>
          </string-name>
          ,
          <string-name>
            <surname>L. A.</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Valdivia</surname>
          </string-name>
          , M. T.:
          <article-title>EmoEvent: A Multilingual Emotion Corpus based on different Events</article-title>
          .
          <source>In: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          , pp.
          <fpage>1492</fpage>
          -
          <lpage>1498</lpage>
          . European Language Resources Association, Marseille, France (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norouzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macherey</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krikun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macherey</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.:
          <article-title>Google's neural machine translation system: Bridging the gap between human and machine translation</article-title>
          .
          <source>In: CoRR</source>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Zhang, H.,
          <string-name>
            <surname>Cisse</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dauphin</surname>
            ,
            <given-names>Y. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Paz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>mixup: Beyond Empirical Risk Minimization</article-title>
          .
          <source>In: Proceedings of ICLR 2018</source>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khandelwal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhary</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wenzek</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzmán</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stoyanov</surname>
          </string-name>
          , V.:
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          .
          <source>In: Proceedings of ACL</source>
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>