Irony Detection in the Portuguese Language using BERT

Shengyi Jiang 1,2, Chuwei Chen 1, Nankai Lin 1,*, Zhuolin Chen 1 and Jinyi Chen 1

1 School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
2 Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou, China
neakail@outlook.com

Abstract. In this article, we report the solution of team BERT 4EVER for the Irony Detection in the Portuguese Language task at IberLEF 2021, which aims to identify ironic news articles and tweets in Portuguese spread via digital media. We tackle the problem with the BERT (Bidirectional Encoder Representations from Transformers) model, and additionally adopt a weighted loss and ensemble learning to improve generalization. Experimental results, as well as our team's leading position on the task leaderboard, demonstrate the effectiveness of our method in the news field.

Keywords: BERT, Irony Detection, Portuguese.

1 Introduction

Irony is the use of words to express a meaning contrary to their literal sense, a form of figurative language with strong emotional color. Irony detection is a key challenge in various tasks involving Natural Language Processing (NLP). In the field of Opinion Mining, for instance, Sarmento et al. [1] noted the role of irony in minimizing the error when discriminating negative from positive opinions.

Table 1. The statistics of the dataset used in this task.

Field      News    Tweets
Irony      7222    12736
No Irony   17272   2476
All        18494   15212

IberLEF 2021 proposes the task "Irony Detection in the Portuguese language" [2], which aims at encouraging more work on identifying the presence of irony in texts (tweets and news) written in Portuguese. The distribution of the dataset is shown in Table 1. Our team, BERT 4EVER, also participated in this task and achieved first rank in the news field. In this report, we review our solution to this task, namely, a BERT model aided by a weighted loss and ensemble learning.

IberLEF 2021, September 2021, Málaga, Spain. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

2 Related Work

Most research on irony detection focuses on the English language [3]. Baziotis et al. [4] presented an ensemble of two different deep learning models, a word-level and a character-level deep LSTM, capturing the semantic and syntactic information of tweets respectively; it ranked 1st in both subtasks of SemEval-2018 Task 3, "Irony detection in English tweets". Barbieri et al. [5] proposed a new evaluation framework named TWEETEVAL, consisting of seven heterogeneous Twitter-specific classification tasks. In particular, they provided a strong set of baselines as a starting point and compared different language-model pre-training strategies, establishing a comparatively complete evaluation benchmark for English irony detection.

Recently, the NLP community has also turned to other languages, driven by the need to develop linguistic and computational resources; this has spawned a multitude of irony detection competitions for languages such as Arabic [6], Spanish [7], and Italian [8]. Portuguese is a low-resource language, which limits the amount of research done for it. de Freitas et al. [9] proposed a set of patterns that might suggest ironic/sarcastic statements by observing a corpus of tweets; in particular, they developed special clues for irony detection through the implementation and evaluation of a set of patterns.
Da Silva [10] proposed a Convolutional Neural Network (CNN) adapted for automatically detecting irony/sarcasm in Brazilian Portuguese, trained and tested on Twitter datasets collected by the author and by third parties. Because there was no Portuguese corpus built from websites (only corpora from the social network Twitter), Marten et al. [11] developed a Portuguese-language corpus for the sarcasm and irony detection task.

3 Method

As shown in Figure 1, we train three kinds of models with different strategies, all based on the BERT model. In the prediction stage, we fuse the prediction results of the three models for each field (news/tweets). The three strategies are as follows:
(1) We fine-tune the BERT model separately on the training set of each field.
(2) On the basis of (1), we adopt a loss-weighting strategy on the training set of each field to address data imbalance.
(3) We combine the data from the two fields and fine-tune the BERT model on the union, so as to use information from the other field to assist classification and improve the generalization ability of the model.

Fig. 1. The framework of our method.

3.1 BERT

Fig. 2. BERT Model.

BERT (Bidirectional Encoder Representations from Transformers) is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers [12]. The BERT model structure is shown in Figure 2. Pre-training consists of two tasks, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP): (1) MLM masks some words in the input sequence and then predicts the masked words from the context; (2) NSP predicts whether the second sentence is the follow-up (next sentence) of the first.
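The per-field pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration under assumed names (class_weights and fuse are hypothetical helpers, not the authors' code): the probability vectors stand in for the outputs of the three fine-tuned BERT models, and the weighting rule is a natural-log inverse-frequency heuristic in the spirit of the class-weight adjustment of Sect. 3.2.

```python
import math

def class_weights(counts, mu=math.e):
    # Class-weight heuristic (cf. Sect. 3.2): rarer classes receive
    # larger weights; with mu = e this is a natural logarithm.
    total = sum(counts)
    return [math.log(total / c, mu) for c in counts]

def fuse(prob_lists):
    # Prediction-stage fusion (cf. Sect. 3.3): average the per-class
    # probabilities predicted by the individual models.
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[k] for p in prob_lists) / n_models
            for k in range(n_classes)]

# Class counts for the news field from Table 1 (irony, no irony).
w_irony, w_no_irony = class_weights([7222, 17272])
assert w_irony > w_no_irony  # the minority class is upweighted

# Hypothetical [no irony, irony] probabilities from three models.
fused = fuse([[0.62, 0.38], [0.55, 0.45], [0.60, 0.40]])
```

In training, such weights would multiply the per-class terms of the cross-entropy loss; at prediction time the averaged vector plays the role of the fused output y described in Sect. 3.3.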
3.2 Weight Loss

As Table 1 shows, non-ironic samples dominate in the news field, while ironic samples dominate in the tweets field. We adopt a class-weight adjustment method [13] to tackle this class imbalance. Given the label set c = (c_1, c_2, ..., c_n), where c_i is the number of samples of the i-th class, the class weight w_i of the i-th class is calculated as

    w_i = log_μ( (∑_{j=1}^{n} c_j) / c_i )    (1)

where μ = e.

3.3 Model Fusion

We train multiple models through the strategies above, and each model predicts the test data separately. For each sample x, the fused prediction is

    y = [ (pt_1 + pt_2 + pt_3) / 3 , (pn_1 + pn_2 + pn_3) / 3 ]    (2)

in which pt_i is the "no irony" probability and pn_i the "irony" probability output by the i-th BERT model.

4 Result

Based on five-fold cross-validation, we report the results of the BERT models trained with the three strategies and of other machine learning algorithms. We used the Transformers2 and PyTorch3 libraries as backend to construct the BERT-based models and scikit-learn4 to construct the machine learning models. The BERT5 model we used was pre-trained by Souza et al. [14]. For the other machine learning algorithms, we extracted text features with TF-IDF. We used balanced accuracy (Bacc) as the evaluation metric; the results are shown in Table 2. Most algorithms performed very well, and the BERT models with the three strategies exceeded 0.99 Bacc on the validation set.

2 https://github.com/huggingface/transformers
3 https://github.com/pytorch/pytorch
4 https://github.com/scikit-learn/scikit-learn
5 https://huggingface.co/neuralmind/bert-base-portuguese-cased

Table 2. The results of our model based on five-fold cross-validation.
Field    Model                Bacc
Tweets   BERT (Strategy 1)    0.9992
         BERT (Strategy 2)    0.9992
         BERT (Strategy 3)    0.9977
         KNN                  0.9599
         Random Forest        0.9497
         Decision Tree        0.9947
         SVM                  0.9959
         Naive Bayes          0.8849
News     BERT (Strategy 1)    0.9906
         BERT (Strategy 2)    0.9900
         BERT (Strategy 3)    0.9883
         KNN                  0.9393
         Random Forest        0.9497
         Decision Tree        0.8923
         SVM                  0.9822
         Naive Bayes          0.8397

Owing to the limit on the number of contest submissions, we submitted three results; the test-set results are shown in Tables 3 and 4. In the news field, we submitted the result of BERT (Strategy 1), the result of BERT (Strategy 3), and the fusion of the three BERT models. When the models were fused, performance dropped instead. Among the three strategies, Strategy 3, which performed worst on the validation set, performed best on the test set; this is because the model was trained on more data, which increased its generalization capability to some extent. This result was the best in the evaluation competition. In the tweets field, however, our models overfit the training data: although the fusion strategy brought a certain improvement, the test-set result was still very low, with a Bacc of only 0.4975.

Table 3. The results of our model on the final news test set.

Method             Bacc    Acc     F1      Precision  Recall
BERT (Strategy 1)  0.9107  0.9000  0.8800  0.8148     0.9565
BERT (Strategy 3)  0.9215  0.9133  0.8943  0.8397     0.9565
BERT (Merge)       0.9063  0.8967  0.8755  0.8134     0.9478

Table 4. The results of our model on the final tweets test set.

Method             Bacc    Acc     F1      Precision  Recall
BERT (Strategy 1)  0.4959  0.4067  0.5782  0.4080     0.9919
BERT (Merge)       0.4975  0.4100  0.5776  0.4088     0.9837

5 Conclusion

For the Irony Detection in the Portuguese Language task at IberLEF 2021, we train three kinds of models with different strategies based on the BERT model.
Experimental results, as well as our team's leading position on the news-field leaderboard, demonstrate the effectiveness of our method. In the tweets field, however, our models overfit the training data. In future work, we will try to address this overfitting in the tweets field in order to achieve better results on the Irony Detection in the Portuguese Language task.

Acknowledgements

This work was supported by the Key Field Project for Universities of Guangdong Province (No. 2019KZDZX1016), the National Natural Science Foundation of China (No. 61572145) and the National Social Science Foundation of China (No. 17CTQ045). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

References

1. Sarmento, L., Carvalho, P., Silva, M. J., et al.: Automatic creation of a reference corpus for political opinion mining in user-generated content. In: Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pp. 29-36 (2009).
2. Corrêa, U. B., dos Santos, L. P., Coelho, L., de Freitas, L. A.: Overview of the IDPT Task on Irony Detection in Portuguese at IberLEF 2021. Procesamiento del Lenguaje Natural, vol. 67 (2021).
3. Joshi, A., Bhattacharyya, P., Carman, M. J.: Automatic sarcasm detection: A survey. ACM Computing Surveys (CSUR) 50(5), 1-22 (2017).
4. Baziotis, C., Athanasiou, N., Papalampidi, P., et al.: NTUA-SLP at SemEval-2018 Task 3: Tracking Ironic Tweets using Ensembles of Word and Character Level Attentive RNNs. CoRR (2018).
5. Barbieri, F., Camacho-Collados, J., Neves, L., et al.: TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification. CoRR (2020).
6. Ghanem, B., Karoui, J., Benamara, F., et al.: IDAT at FIRE2019: Overview of the Track on Irony Detection in Arabic Tweets. In: Proceedings of the 11th Forum for Information Retrieval Evaluation, pp. 10-13 (2019).
7. Ortega-Bueno, R., Rangel, F., Hernández, F. D., et al.: Overview of the task on irony detection in Spanish variants. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), co-located with the 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), CEUR, vol. 2421, pp. 229-256 (2019).
8. Cignarella, A. T., Frenda, S., Basile, V., et al.: Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA). In: Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2018), CEUR, vol. 2263, pp. 1-6 (2018).
9. de Freitas, L. A., Vanin, A. A., Hogetop, D. N., Bochernitsan, M. N., Vieira, R.: Pathways for irony detection in tweets. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing, pp. 628-633 (2014).
10. da Silva, F. R. A.: Detecção de ironia e sarcasmo em língua portuguesa: Uma abordagem utilizando deep learning [Irony and sarcasm detection in Portuguese: An approach using deep learning] (2018).
11. Marten, G. S., de Freitas, L. A.: The Construction of a Corpus for Detecting Irony and Sarcasm in Portuguese. Brazilian Journal of Development 7(5), 47973-47984 (2021).
12. Devlin, J., Chang, M. W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of NAACL-HLT 2019, pp. 4171-4186 (2019).
13. Wang, L., Lin, X., Lin, N.: Research on pseudo-label technology for multi-label classification. In: 16th International Conference on Document Analysis and Recognition (2021).
14. Souza, F., Nogueira, R., Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Brazilian Conference on Intelligent Systems, pp. 403-417 (2020).