<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emotion Detection for Spanish with Data Augmentation and Transformer-Based Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hongxin Luo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Yunnan</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we describe the participation of the Yeti team in the IberLEF EmoEvalEs task, which builds on the Spanish sentiment analysis tasks of TASS 2020 and was proposed as a separate task at IberLEF 2021. We introduce the methods we used in the emotion detection task and the results obtained. First, we used back-translation data augmentation to address the problems of data scarcity and data imbalance. Our method is based on transfer learning using the BETO language model for sentiment classification in Spanish. The system showed excellent performance, finally achieving an accuracy score of 0.7125. We won third place in the final ranking, only 0.0151 points away from the best result.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformers</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Sentiment Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Sentiment analysis in tweets is a challenging task because a large amount of subjective
information is generated every day. It is very difficult to deal with these messages
and their potential language phenomena [6], and this subjective language can be
used to express private states beyond opinions [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. People have been looking
for efficient sentiment analysis algorithms for tweets [15]. In the past few
years, most work on sentiment analysis has combined neural network
models with word embedding techniques to achieve better results [
        <xref ref-type="bibr" rid="ref4">4</xref>
][11]. This
work aims to promote the development of a Twitter sentiment classification system
for Spanish.
      </p>
      <p>
The Iberian Languages Evaluation Forum (IberLEF) is a comparative evaluation
campaign for Natural Language Processing systems in Spanish and other Iberian
languages [12]. The main content of the EmoEvalEs task [13] is to classify the
emotion expressed in a tweet as the one of Ekman's six basic emotions [
        <xref ref-type="bibr" rid="ref5">5</xref>
] that best
represents the mental state of the tweet's author: Anger, Disgust, Fear, Joy,
Sadness, Surprise, or Others. In the task, the dataset is divided into a training
set, a development set, and a test set.</p>
      <p>
This article mainly summarizes our participation in the Emotion Detection and
Evaluation task [13]. From the results of TASS 2020 [6], we can see that
the performance of BERT-based models [
        <xref ref-type="bibr" rid="ref3">3</xref>
] on this task is very competitive [6].
We considered a number of state-of-the-art neural network models, and finally
settled on adaptively fine-tuning a Transformer architecture based on a
pre-trained language model. We used ALBERT as the baseline for comparison.
      </p>
      <p>The rest of this paper is organized as follows. Chapter 2 describes the task
and the corpus. Chapter 3 introduces our system in detail. Chapter 4
introduces the experimental setup. Chapter 5 outlines the evaluation process, and
the conclusions are in Chapter 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Corpus description</title>
<p>The organizers proposed an emotion detection task: a single-label, multi-class
classification task that assigns each tweet one of seven different emotion labels:
Anger, Disgust, Fear, Joy, Sadness, Surprise, and Others. The dataset was mainly
collected from events in different domains in April 2019, including entertainment,
catastrophes, politics, global commemorations, and global strikes [14]. The corpus
is divided into three parts (training, development, and testing) with a total of
8223 items. The items in the training and development sets have five attributes:
id, event, tweet, offensive, and emotion (a hypothetical loading sketch is given
after the list below). The test set does not contain the emotion label. To prevent
classifiers from relying on hashtags to classify the sentiment of tweets, the
organizers replaced the hashtags in the dataset with the keyword "HASHTAG" [6].
The challenges we have to face are as follows [14]:
– Lack of context: tweets are short (up to 240 characters).
– Informal language: misspellings, emojis, and onomatopoeias are common.
– Multi-class classification: the dataset is labeled with seven different classes.</p>
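<p>As a concrete illustration, a hypothetical loader for this corpus could look as follows; the file name and the tab separator are assumptions about the distribution format, while the five columns are as described above:</p>
<preformat>
# Hypothetical loader for the EmoEvalEs training file (file name and
# separator are assumptions; the five columns follow the task description).
import pandas as pd

train = pd.read_csv('emoevales_train.tsv', sep='\t')
print(train.columns.tolist())            # ['id', 'event', 'tweet', 'offensive', 'emotion']
print(train['emotion'].value_counts())   # reveals the strong class imbalance
</preformat>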
    </sec>
    <sec id="sec-3">
      <title>Materials and methods</title>
      <sec id="sec-3-1">
        <title>Pre-processing</title>
<p>Data preprocessing is particularly important for reducing the noisy
information in tweets: high-quality input data can improve the output performance
of the model [8]. Before conducting our experiments, we applied the following
preprocessing steps to the data. First, we deleted the URLs and punctuation marks
from the text content. To remove unnecessary semantic information, we removed
stop words with the NLTK toolkit and converted the content of the tweets to
lowercase. Finally, we used the emoji library to convert the emojis in the
tweets into text. At the same time, we kept the original version of the dataset,
and in our experiments we compared the results of the various preprocessing
configurations.</p>
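<p>A minimal sketch of this preprocessing pipeline is shown below, assuming NLTK's Spanish stop word list and the emoji package; the exact punctuation handling in our script may differ in detail:</p>
<preformat>
# Minimal sketch of the preprocessing steps described above.
import re
import string

import emoji                       # pip install emoji
from nltk.corpus import stopwords  # requires nltk.download('stopwords')

SPANISH_STOPWORDS = set(stopwords.words('spanish'))

def preprocess(tweet):
    # 1. Delete URLs and punctuation marks.
    tweet = re.sub(r'https?://\S+', '', tweet)
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    # 2. Lowercase and drop Spanish stop words.
    tokens = [t for t in tweet.lower().split() if t not in SPANISH_STOPWORDS]
    # 3. Convert emojis into their textual description.
    return emoji.demojize(' '.join(tokens))
</preformat>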
      </sec>
      <sec id="sec-3-2">
        <title>Data augmentation</title>
<p>Due to the extremely unbalanced distribution of the dataset, the model tends
to overfit and predict the most frequent category. We decided to use data
augmentation to mitigate this problem. A simple and effective method is
back-translation [16]: a sentence is translated into another language (e.g.,
Spanish to English) and then translated back into Spanish. If the newly generated
sentence differs from the original sentence, it is used as a data augmentation
version of the original text. Running back-translation through several different
languages generates more variants. This augmentation technique helps to introduce
variation in the vocabulary and syntax of tweets while, most of the time,
maintaining the original meaning [10]. We used two representative pivot languages
(Chinese and English) to expand the training data, because we found during our
experiments that using more languages does not significantly improve the results.
To obtain the translations, we used the Baidu Translation API service.*</p>
<p>* Baidu Translation API available at https://api.fanyi.baidu.com/</p>
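<p>The following sketch illustrates the back-translation procedure; translate() is a hypothetical placeholder for the Baidu Translation API call (the real service requires an app ID, a secret key, and a signed HTTP request):</p>
<preformat>
def translate(text, src, dst):
    # Hypothetical wrapper around the Baidu Translation API.
    raise NotImplementedError('call the Baidu Translation API here')

def back_translate(text, pivot):
    # Spanish -> pivot language -> Spanish; keep the result only if it differs.
    pivoted = translate(text, src='es', dst=pivot)
    restored = translate(pivoted, src=pivot, dst='es')
    return restored if restored != text else None

def augment(minority_tweets):
    # Pivot through English and Chinese, as in our experiments.
    for tweet in minority_tweets:
        for pivot in ('en', 'zh'):
            variant = back_translate(tweet, pivot)
            if variant is not None:
                yield variant
</preformat>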
      </sec>
      <sec id="sec-3-3">
        <title>ALBERT</title>
<p>We used the ALBERT model as the baseline of our work, because ALBERT [9]
is a recently released model with excellent performance on various Natural
Language Processing (NLP) tasks. ALBERT addresses memory and training-speed
problems with a "Lite BERT" architecture that has fewer parameters than the
traditional BERT architecture [9]. The structure of ALBERT is basically the same
as BERT's, with three specific improvements: factorized embedding
parameterization, cross-layer parameter sharing, and replacing the Next Sentence
Prediction (NSP) task with a Sentence Order Prediction (SOP) task. The
hyperparameter settings of the model are as follows (settings found to perform
well over several fine-tuning runs; parameters not mentioned keep their default
values; a training sketch follows the list):
– albert model: albert-base-v2
– max seq length: 128
– optimizer: AdamW
– warmup steps: 200
– learning rate: 3e-5
– train steps: 800
– train batch size: 64</p>
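<p>A sketch of this baseline with the Hugging Face transformers library, mirroring the hyperparameters listed above, could look as follows; train_dataset stands for the tokenized training split (not shown), and seven labels are assumed:</p>
<preformat>
# ALBERT baseline fine-tuning sketch (Hugging Face transformers).
from transformers import (AlbertForSequenceClassification, AlbertTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForSequenceClassification.from_pretrained('albert-base-v2',
                                                        num_labels=7)
args = TrainingArguments(
    output_dir='albert-emoevales',
    max_steps=800,                    # train steps
    warmup_steps=200,                 # warmup steps
    learning_rate=3e-5,               # AdamW is the Trainer default optimizer
    per_device_train_batch_size=64,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
</preformat>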
      </sec>
      <sec id="sec-3-4">
        <title>BETO</title>
        <p>
          Inspired by the results of the TASS 2020 seminar and our emotion classi cation
task, we decided to use the BETO model to complete this challenging Emotion
Detection and Evaluation for Spanish task. BETO is a BERT model trained on
a large Spanish corpus. BETO model combines the pre-training model with the
downstream task model, which means that the BETO model is still used when
doing downstream tasks, and it naturally supports text classi cation tasks, and
there is no need to modify the model when doing text classi cation tasks. BETO
is trained on a large Spanish corpus, which can more accurately represent the
text features of Spanish, and can solve the problem of the dependence of the
task on the Spanish language. It has BETO-uncased and BETO-cased. We used
BETO-cased as our Language Model (LM). The size of BETO is similar to
BERT-base, according to the guidelines presented by Can~ete et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
], BETO
was trained with Whole Word Masking (WWM), uses a vocabulary of about 31k
Byte Pair Encoding (BPE) subwords constructed with SentencePiece, and was
trained for 2M steps [
          <xref ref-type="bibr" rid="ref2">2</xref>
]. In the training process, dynamic
masking is introduced: 10 different masks are used for the same sentence in the
corpus. When WWM masks a specific token, if the token corresponds to a subword
of a word in a sentence, all consecutive tokens that make up the same word are
masked. We used the Adam optimizer [7] for optimization. We use BETO as the
initial LM to construct a robust method for this challenge and to achieve
excellent performance in the final result. The optimal hyperparameter settings
in our experiments are as follows (settings that performed well over several
fine-tuning runs; parameters not mentioned keep their default values; a
fine-tuning sketch follows the list):
– beto model: BETO-cased
– max seq length: 128
– train batch size: 32
– learning rate: 2e-5
– num train epochs: 3.0
        </p>
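<p>Analogously, a fine-tuning sketch for BETO-cased (published on the Hugging Face hub as dccuchile/bert-base-spanish-wwm-cased) with the hyperparameters above; again, train_dataset is assumed to be the tokenized training split:</p>
<preformat>
# BETO-cased fine-tuning sketch (Hugging Face transformers).
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = 'dccuchile/bert-base-spanish-wwm-cased'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=7)

def encode(batch):
    # max_seq_length = 128, as listed above.
    return tokenizer(batch['tweet'], truncation=True, max_length=128,
                     padding='max_length')

args = TrainingArguments(
    output_dir='beto-emoevales',
    num_train_epochs=3.0,
    learning_rate=2e-5,
    per_device_train_batch_size=32,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
</preformat>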
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
<p>In this section we introduce our experimental procedure. To compare results,
we used ALBERT as the baseline against the best BETO model obtained by
fine-tuning during the experiments (the hyperparameter settings are given in
Sections 3.3 and 3.4); both are pre-trained deep models. The IberLEF
organization released three corpora for training, development, and testing.
The label of each tweet corresponds to one of the six emotions or Others.</p>
<p>In an exhaustive search with the BETO model as the main research object, we
determined the model configuration parameters (as shown in Section 3.4).
Inspired by TASS 2020 Task 1 and by observing the data, we found that each
tweet also carries an offensive label, so, starting from the nearly final model
configuration, we tried feeding the offensive label together with the tweet
content into the model for prediction and compared it with the result of
inputting the tweet alone, as sketched below.</p>
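<p>A minimal sketch of this input variant; the column names follow the corpus description, and whether the label is prepended as plain text or passed as a sentence pair is an implementation choice:</p>
<preformat>
# 'Offensive + Tweet' input construction sketch.
def build_input(row):
    # Prepend the offensive label to the tweet text ...
    return '{} {}'.format(row['offensive'], row['tweet'])

# ... or, alternatively, pass the two fields as a sentence pair so that
# the tokenizer separates them with [SEP]:
# tokenizer(row['offensive'], row['tweet'], truncation=True, max_length=128)
</preformat>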
<p>Then we addressed the unbalanced data in the corpus with back-translation
data augmentation, mainly using Chinese and English as intermediate languages.
We mainly augmented the two rarest categories (Fear and Disgust) to expand
their data volume and prevent the model from overfitting to the most frequent
categories.</p>
<p>Finally, we also tried converting the emojis in the data into the
corresponding text to explore better model performance. We fed the processed
data into the two models for comparison. All experiments were performed on a
machine equipped with an Nvidia Tesla V100 GPU.</p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
<p>The results of our models on the validation set of the Emotion Detection and
Evaluation for Spanish task are shown in Table 2. Our final submission results
and rankings on the official test set are shown in Table 5. The final result of
our system is quite competitive: in the final submitted results it placed fourth
overall with a score of 0.7125, only 0.0151 behind the best result.</p>
<p>The results given in Table 4 show that the results obtained with the BETO
model are better than the baseline and, at the same time, far better than those
of the multilingual BERT model. This shows that a model pre-trained on the
specific target language outperforms a multilingual pre-trained model. From
Table 2 we also observe that our data preprocessing does not improve the
performance of the model: compared with the unprocessed raw data, the effect is
reduced. After discussion, we concluded that the drop in results may be related
to the model's pre-training. The original BERT model is pre-trained to learn
context and semantic connections, and its input is raw data without any
preprocessing; adding preprocessing when fine-tuning the downstream task may
destroy contextual relationships in the text and thus lead to poorer results.
Finally, it can be seen from Table 3 that the back-translation data augmentation
we used helps improve model performance, and that converting emojis into text
also slightly improves the results.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
<p>We proposed a BETO-cased sentiment classification system for the IberLEF 2021
EmoEvalEs task. The method is based on transfer learning with BETO, is applied
to the sentiment analysis of Spanish tweets, includes an additional data
augmentation step, and achieved good results in the task. We are very satisfied
with the results of our first participation in the IberLEF workshop. Although
the method is relatively simple, we achieved very good results by exploring the
model's hyperparameters and configuring our model reasonably. Careful selection
of language models and data augmentation techniques plays an important role in
sentiment analysis on small datasets. However, sentiment analysis of tweet
content still poses huge challenges, and our system still has a lot of room for
improvement. In future work, we hope to use more powerful data augmentation
techniques to address data scarcity, and we look forward to exploring more
advanced techniques for the sentiment analysis of Spanish tweets.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>First of all, we thank the organizers for the valuable opportunity provided to us. I would also like to thank my teacher for supporting my research work, and the future reviewers for their patience.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>6. García-Vega, M., Díaz-Galiano, M.C., García-Cumbreras, M.A., Plaza-del-Arco, F.M., Montejo-Ráez, A., Jiménez-Zafra, S.M., Martínez Cámara, E., Aguilar, C.A., Sobrevilla Cabezudo, M.A., et al.: Overview of TASS 2020: Introducing emotion detection (2020)
7. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
8. Kotsiantis, S., Kanellopoulos, D., Pintelas, P.: Data preprocessing for supervised learning. International Journal of Computer, Electrical, Automation, Control and Information Engineering 1, 4104–4109 (2007)
9. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)
10. Luque, F.M.: Atalaya at TASS 2019: Data augmentation and robust embeddings for sentiment analysis. arXiv preprint arXiv:1909.11241 (2019)
11. Martínez Cámara, E., Almeida-Cruz, Y., Díaz Galiano, M.C., Estévez-Velarde, S., García Cumbreras, M.A., García Vega, M., Gutiérrez, Y., Montejo Ráez, A., Montoyo, A., Muñoz, R., et al.: Overview of TASS 2018: Opinions, health and emotions (2018)
12. Montes, M., Rosso, P., Gonzalo, J., Aragón, E., Agerri, R., Álvarez Carmona, M., Álvarez Mellado, E., Carrillo-de-Albornoz, J., Chiruzzo, L., Freitas, L., Gómez Adorno, H., Gutiérrez, Y., Jiménez-Zafra, S.M., Lima, S., Plaza-del-Arco, F.M., Taulé, M. (eds.): Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
13. Plaza-del-Arco, F.M., Jiménez-Zafra, S.M., Montejo-Ráez, A., Molina-González, M.D., Ureña-López, L.A., Martín-Valdivia, M.T.: Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Procesamiento del Lenguaje Natural 67(0) (2021)
14. Plaza-del-Arco, F., Strapparava, C., Ureña-López, L.A., Martín-Valdivia, M.T.: EmoEvent: A multilingual emotion corpus based on different events. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 1492–1498. European Language Resources Association, Marseille, France (May 2020), https://www.aclweb.org/anthology/2020.lrec-1.186
15. Villena Román, J., Lana Serrano, S., Martínez Cámara, E., González Cristóbal, J.C.: TASS: Workshop on sentiment analysis at SEPLN (2013)
16. Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848 (2019)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>1. Algeo, J.: A comprehensive grammar of the English language. By Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. London: Longman. 1985. x+1779. Journal of English Linguistics 20(1), 122–136 (1987)</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>2. Cañete, J., Chaperon, G., Fuentes, R., Pérez, J.: Spanish pre-trained BERT model and evaluation data. PML4DC at ICLR 2020 (2020)</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>4. Díaz Galiano, M.C., Martínez Cámara, E., García Cumbreras, M.A., García Vega, M., Villena Román, J.: The democratization of deep learning in TASS 2017 (2018)</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>5. Ekman, P.: Are there basic emotions? (1992)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>