UO-CERPAMID at IroSvA: Impostor Method Adaptation for Irony Detection
IroSvA@IberLEF 2019

Daniel Castro¹ and Llinet Benavides²
¹ Center for Pattern Recognition and Data Mining, Universidad de Oriente, Cuba, daniel.castro@cerpamid.co.cu
² Universidad de Oriente, Cuba, llinet@uo.edu.cu

Abstract. Irony in text makes it possible to express implicit negative opinions through figurative and humorous language. IroSvA is a challenging task proposed this year for evaluating irony detection algorithms on short texts in three Spanish language variants (Cuba, Spain and Mexico). Our proposal studies three different representations of textual information and similarity measures that use a weighted combination of these representations. We use an adaptation of the impostors method to classify texts as ironic or non-ironic, taking the non-ironic texts of the training dataset as the list of impostors. The results achieved are encouraging, and the best were obtained for the Cuban variant.

Keywords: Irony detection · Impostor method · Text similarity

1 Introduction

Irony is a fundamental rhetorical device. It is a uniquely human mode of communication, curious in that the speaker says something other than what he or she intends [16]. Irony is an active part of how web users express their opinions (in blogs, forums, social networks and specialized sites). Hence the importance of detecting it computationally, for example for companies with access to user-generated data or for organizations interested in sentiment analysis. The detection of irony is defined as “a set of characteristics and techniques that allow you to decide whether a text is ironic or not” [12].
Another view is that “Irony detection is an interesting machine learning problem, because, in contrast to most text classification tasks, it requires a semantics that cannot be inferred directly from word counts over documents alone” [16]. As such, modeling irony has a large potential for applications in various research areas, including text mining, author profiling, detecting online harassment and, perhaps one of the most investigated applications at present, automatic sentiment analysis [4].

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 September 2019, Bilbao, Spain. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)

Detecting irony therefore touches some of the most challenging areas of natural language processing. Research on user-generated content is itself very challenging: although social media texts are an invaluable source of information, they are difficult to process because they are noisy, informal, short on context, and full of grammatical mistakes [3].

Early work on irony detection focused mostly on English [14,1,3]. There is now work on other languages such as Dutch [8], Catalan [9] and Arabic [7], but still little on Spanish [12,6].

The SemEval 2018 shared task [5] on “irony detection in tweets” and IronITA 2018 [2] were the first shared tasks on irony detection. In SemEval 2018, “The systems that were submitted represent a variety of neural-network-based approaches (i.e. CNNs, RNNs and (bi-)LSTMs) exploiting word and character embedding as well as handcrafted features. Other popular classification algorithms include Support Vector Machines, Maximum Entropy, Random Forest, and Naive Bayes. While most approaches were based on one algorithm, some participants experimented with ensemble learners (e.g.
SVM + LR, CNN + bi-LSTM, stacked LSTMs), implemented a voting system or built a cascaded architecture (for Task B) that first distinguished ironic from nonironic tweets and subsequently differentiated between the fine-grained irony categories.” [5].

IroSvA (Irony Detection in Spanish Variants) is the first shared task fully dedicated to identifying the presence of irony in short messages (tweets and news comments) written in Spanish [10]. The task is structured into three subtasks, each of which consists of predicting whether messages are ironic or not in one of the three Spanish variants. The three subtasks share the same goal: participants must determine whether a message is ironic or not according to a specified context.

Recent advances in irony detection have shown that supervised classification with extensive feature engineering produces satisfactory indicators of irony or sarcasm. This methodology has been tested on short texts such as product reviews, news commentaries and tweets [6]. Supervised classification determines the class of an object based on a set of known objects grouped by class.

2 Proposal presented

Our approach is based on representing each text with three different feature vectors. To classify a new text as ironic or non-ironic, we used the General Impostor Method (GIM) proposed in [15], with a simplified variant of the Impostor Method (IM) presented in the same work. The similarity between texts is a weighted similarity over the three vector representations. In the next sections we explain the representation, the weighted similarity and the GIM. The IM uses a set of non-ironic documents related to the ironic sample and analyzes the similarity of an unknown document with that set and with the ironic sample.
The unknown document is classified as ironic if it is more similar to the ironic one than to a randomly chosen subset of the non-ironic documents.

2.1 Document representation

A text (document) is modeled as an object with three vector representations, which share some features. The idea behind this design is to adjust which representations carry weight for different genres of text.

The first representation, denoted R1, is the classical Bag of Words (BoW) vector, built either from the tokens produced by a natural language tokenizer or from the lemmas produced by a natural language lemmatizer. The second representation, R2, is based only on the frequency of punctuation signs, considering all punctuation identified by a part-of-speech tagger. The third representation, R3, consists of different stylistic features and their frequencies. Some of the stylistic features are: fully capitalized words (QUE BUENO), character flooding (oooooohhhhhh), repetition of the closing exclamation mark (!!!!!!!), etc.

2.2 Similarity measures

The similarity measure used to compare two documents needs to consider all the proposed representations, but also to be flexible enough that we can use only some of them. For that reason we use the following measure:

$$\beta(D_1, D_2) = \alpha \cdot \beta^{1}(D_1^{R_1}, D_2^{R_1}) + \gamma \cdot \beta^{2}(D_1^{R_2}, D_2^{R_2}) + \delta \cdot \beta^{3}(D_1^{R_3}, D_2^{R_3}) \qquad (1)$$

For each similarity function $\beta^{*}$ we implemented different similarities proposed in the literature, for example cosine, Jaccard, Dice and Tanimoto, as well as distances such as Euclidean or MinMax. In the evaluation phase we tested all of them and used the one that obtained the best result for each representation. The parameters $\alpha$, $\gamma$ and $\delta$ weight the importance of each representation; if one of them is 0, the corresponding representation is not considered. The weights satisfy $\alpha + \gamma + \delta = 1$.
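The three representations of Section 2.1 and the weighted measure of Eq. (1) can be illustrated with a short Python sketch. It is not the implementation used in the submitted runs. The function names (`represent`, `euclidean_similarity`, `weighted_similarity`) are hypothetical, a regular expression stands in for the NLP tokenizer and part-of-speech tagger, only three stylistic features are counted, and the mapping 1/(1 + d) from Euclidean distance to a similarity is an assumption.

```python
import math
import re
from collections import Counter

# Assumed punctuation inventory; the paper takes punctuation from a POS tagger.
PUNCT = set("!¡?¿.,;:\"'()-")


def represent(text):
    """Return the three feature-count vectors (R1, R2, R3) for one text."""
    # R1: bag of words; a plain regex tokenizer is an assumption.
    r1 = Counter(re.findall(r"\w+", text.lower()))
    # R2: frequency of each punctuation sign.
    r2 = Counter(ch for ch in text if ch in PUNCT)
    # R3: counts of a few stylistic cues named in Section 2.1.
    words = re.findall(r"\w+", text)
    r3 = Counter({
        "all_caps": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "flooding": len(re.findall(r"(\w)\1{2,}", text)),  # 3+ repeated chars
        "excl_run": len(re.findall(r"!{2,}", text)),       # runs of "!!"
    })
    return r1, r2, r3


def euclidean_similarity(a, b):
    """Turn the Euclidean distance between two count vectors into (0, 1]."""
    keys = set(a) | set(b)
    dist = math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
    return 1.0 / (1.0 + dist)  # assumed distance-to-similarity mapping


def weighted_similarity(d1, d2, alpha, gamma, delta, sim=euclidean_similarity):
    """Eq. (1): weighted sum of per-representation similarities."""
    if not math.isclose(alpha + gamma + delta, 1.0):
        raise ValueError("alpha + gamma + delta must equal 1")
    weights = (alpha, gamma, delta)
    # A zero weight drops the corresponding representation entirely.
    return sum(w * sim(x, y) for w, x, y in zip(weights, d1, d2) if w > 0)


if __name__ == "__main__":
    a = represent("QUE BUENO!!!! oooooohhhhhh")
    b = represent("que bueno.")
    print(weighted_similarity(a, b, alpha=0.2, gamma=0.0, delta=0.8))
```

Passing `sim` as an argument mirrors the paper's setup, in which several similarity functions were tried and the best one was kept for each representation.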
2.3 Impostor method

We used the General Impostor Method (GIM) proposed in [15], but our IM omits the step that randomly chooses a subset of features from the text representation. The set of impostors S is the set of non-ironic documents provided in the training dataset. D1 is the document to be classified and D2 is an ironic document. ∆* is a threshold that must be tuned on the training dataset: D1 is judged ironic if it is more similar to the ironic document D2 than to a given percentage of the non-ironic impostors.

The non-ironic texts are the impostors, and to classify the evaluation dataset we used all the non-ironic texts provided in the training dataset. All parameters were optimized, and the best values, used in the evaluation phase, are summarized in the Evaluation section. Next we present the pseudo-code of IM and GIM.

Impostors Method (IM)
Input: (D1, D2), a pair of documents; S, a set of impostor documents
Output: (ironic) or (non-ironic)
1. Score = 0
2. Repeat k times
   a. Randomly choose n impostors from S: I1, ..., In
   b. Score += 1/k if β(D1, D2) ∗ β(D2, D1) > β(D1, Ii) ∗ β(D2, Ii) for each i ∈ {1, ..., n}
3. Return (ironic) if Score > ∆*; else (non-ironic)

General Impostors Method (GIM)
Input: D, a document to be classified; Y = (D1, ..., Dn), the ironic documents
Output: (ironic) or (non-ironic)
1. For each pair of documents (D, Di)
   a. Run the original IM to obtain a binary similarity score S(D, Di)
2. Score = average over the similarity scores [S(D, D1), ..., S(D, Dn)]
3. Return (ironic) if Score > θ*; else (non-ironic)

3 Evaluation results

The evaluation was carried out for the three variants of Spanish covered by the task, and for each of them we needed to optimize the parameters of the GIM and of the weighted similarity measure.
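Before turning to the parameter values, the IM and GIM pseudo-code of Section 2.3 can be sketched in Python. This is a minimal sketch under stated assumptions, not the submitted system. The function names are hypothetical, the similarity β is passed in as `sim`, and the defaults k = 5 and n = 5 assume that rows K and S of Table 1 correspond to the number of repetitions and the number of sampled impostors. Wins are counted as integers and divided by k at the end, which avoids floating-point drift from repeatedly adding 1/k.

```python
import random


def impostors_method(d1, d2, impostors, sim, k=5, n=5, delta=0.6, rng=random):
    """IM: True if d1 is closer to the ironic d2 than to sampled impostors."""
    pair_score = sim(d1, d2) * sim(d2, d1)
    wins = 0
    for _ in range(k):
        # Step 2a: draw n impostors (or all of them if fewer are available).
        sample = rng.sample(impostors, min(n, len(impostors)))
        # Step 2b: the pair must beat every sampled impostor to score.
        if all(pair_score > sim(d1, imp) * sim(d2, imp) for imp in sample):
            wins += 1
    # Step 3: Score = wins / k, compared with the threshold delta.
    return wins / k > delta


def general_impostors_method(d, ironic_docs, impostors, sim, theta=0.5, **im_args):
    """GIM: average the binary IM outcomes over all ironic documents."""
    if not ironic_docs:
        raise ValueError("at least one ironic document is required")
    votes = [impostors_method(d, di, impostors, sim, **im_args)
             for di in ironic_docs]
    return sum(votes) / len(votes) > theta


if __name__ == "__main__":
    # Toy data (an assumption): documents are numbers, similarity decays
    # with their absolute difference.
    def toy_sim(a, b):
        return 1.0 / (1.0 + abs(a - b))

    ironic = [0.9, 1.1]
    non_ironic = [5.0, 6.0, 7.0, 8.0, 9.0]
    print(general_impostors_method(1.0, ironic, non_ironic, toy_sim))
    print(general_impostors_method(7.0, ironic, non_ironic, toy_sim))
```

In the toy run, n equals the number of impostors, so every round compares against the full impostor set and the outcome does not depend on the random draw.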
The optimization was carried out with 10-fold cross-validation over the provided training dataset. A range of values was evaluated for each parameter, and the best values found are shown in Table 1 (parameters of UO-run2).

Table 1. Parameter optimization for the three Spanish variants over the training dataset

parameter   ES    MX    CU
K           5.0   5.0   5.0
S           5.0   5.0   5.0
∆           0.6   0.6   0.7
θ           0.5   0.5   0.5
α           0.2   0.8   0.2
γ           0.8   0.2   0.0
δ           0.0   0.0   0.8

The structure and data distribution of the training and test datasets, as well as the evaluation metrics, are described in [10]. The similarity between documents was computed with the Euclidean distance function for each of the three representations, because it obtained better results in the cross-validation than the other comparison functions implemented.

3.1 Discussion

Table 2 shows, in rows 1 and 2, the results of the two runs we submitted to the task, together with all the baselines provided by the organizers and the highest value, obtained by [13]. The main difference between UO-run1 and UO-run2 is that in run1 the BoW representation uses as features the tokens extracted by the FreeLing [11] tokenizer, whereas in run2 it uses the lemmas extracted by the FreeLing lemmatizer. In addition, the parameters ∆ and θ were both 0.5 for UO-run1.

Table 2. F1-macro on the test corpus for the three Spanish variants

approach      ES       MX       CU       AVG
UO-run1       0.5110   0.4890   0.4996   0.4999
UO-run2       0.5445   0.5353   0.5930   0.5576
LDSE          0.6795   0.6608   0.6335   0.6579
W2V           0.6823   0.6271   0.6033   0.6376
Word nGrams   0.6696   0.6196   0.5684   0.6192
Majority      0.4000   0.4000   0.4000   0.4000

Our best result was achieved by UO-run2 on the Cuban variant (0.5930), a value close to those of two of the baselines, W2V (0.6033) and Word nGrams (0.5684).
It is important to note that the lemma-based representation always gave better results than the representation based only on lexical tokens, because lemmatization reduces the lexical variety of words that share the same lemma.

4 Conclusions and Future Work

In general, based on the cross-validation results on the training data for the Spain and Mexico datasets (Spanish tweets), the R3 representation obtains the worst results. We attribute this to the stylistic variety among ironic tweets and to the similarity in stylistic features between ironic and non-ironic tweets. As future directions, we will introduce representations based on word n-grams and feature selection methods based on the importance of features per class.

References

1. Barbieri, F., Saggion, H.: Modelling irony in twitter: Feature analysis and evaluation. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland, May 26-31, 2014. pp. 4258–4264 (2014), http://www.lrec-conf.org/proceedings/lrec2014/summaries/231.html
2. Cignarella, A.T., Frenda, S., Basile, V., Bosco, C., Patti, V., Rosso, P.: Overview of the EVALITA 2018 task on irony detection in italian tweets (ironita). In: Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018. (2018), http://ceur-ws.org/Vol-2263/paper005.pdf
3. Farías, D.I.H.: Irony and Sarcasm Detection in Twitter: The Role of Affective Content. PhD thesis, Universidad Politécnica de Valencia (2017)
4. Hee, C.V.: Exploring automatic irony detection on social media. PhD thesis, Universidad de Gent (2017)
5. Hee, C.V., Lefever, E., Hoste, V.: Semeval-2018 task 3: Irony detection in english tweets. In: Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, New Orleans, Louisiana, USA, June 5-6, 2018. pp. 39–50 (2018), https://aclanthology.info/papers/S18-1005/s18-1005
6. Jasso, G., Meza-Ruíz, I.V.: Character and word baselines systems for irony detection in spanish short texts. Procesamiento del Lenguaje Natural 56, 41–48 (2016), http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5285
7. Karoui, J., Zitoune, F.B., Moriceau, V.: SOUKHRIA: towards an irony detection system for arabic in social media. In: Third International Conference On Arabic Computational Linguistics, ACLING 2017, November 5-6, 2017, Dubai, United Arab Emirates. pp. 161–168 (2017). https://doi.org/10.1016/j.procs.2017.10.105
8. Liebrecht, C., Kunneman, F., van den Bosch, A.: The perfect solution for detecting sarcasm in tweets #not. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@NAACL-HLT 2013, 14 June 2013, Atlanta, Georgia, USA. pp. 29–37 (2013), http://aclweb.org/anthology/W/W13/W13-1605.pdf
9. Muñoz, J.R.: TwIrony: Identificación de la ironía en Tweets en Catalán. Candidate thesis, Universitat Pompeu Fabra (2015)
10. Ortega-Bueno, R., Rangel, F., Hernández Farías, D.I., Rosso, P., Montes-y-Gómez, M., Medina Pagola, J.E.: Overview of the Task on Irony Detection in Spanish Variants. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019). CEUR-WS.org (2019)
11. Padró, L., Stanilovsky, E.: Freeling 3.0: Towards wider multilinguality. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 23-25, 2012. pp. 2473–2479 (2012), http://www.lrec-conf.org/proceedings/lrec2012/summaries/430.html
12. Pinto, M.: Modelo de Detección Automática de Ironía en Textos en Español. Master's thesis (2017)
13. Rangel, F., Rosso, P., Franco-Salvador, M.: A low dimensionality representation for language variety identification. In: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing'16. Springer-Verlag, LNCS 9624, pp. 156–169 (2018)
14. Reyes, A., Rosso, P., Veale, T.: A multidimensional approach for detecting irony in twitter. Language Resources and Evaluation 47(1), 239–268 (2013). https://doi.org/10.1007/s10579-012-9196-x
15. Seidman, S.: Authorship verification using the impostors method notebook for PAN at CLEF 2013. In: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. (2013), http://ceur-ws.org/Vol-1179/CLEF2013wn-PAN-Seidman2013.pdf
16. Wallace, B.C.: Computational irony: A survey and new perspectives. Artif. Intell. Rev. 43(4), 467–483 (2015). https://doi.org/10.1007/s10462-012-9392-5