<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IberLEF</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Emotion-Based Cross-Variety Irony Detection?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiram Calvo</string-name>
          <email>hcalvo@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omar Juarez Gambino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Computing Research</institution>
          ,
          <addr-line>CIC</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Escuela Superior de Computo (ESCOM) Instituto Politecnico Nacional J.D. Batiz e/ M.O. de Mendizabal</institution>
          ,
          <addr-line>07738, Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>24</volume>
      <fpage>264</fpage>
      <lpage>271</lpage>
      <abstract>
        <p>This work is centered on the data made available for the IroSvA challenge, consisting of three variants of the Spanish language from three different countries. We propose a simple model for identifying irony, based on tweet embeddings, refraining from using additional NLP techniques. We aim to find cues that are able to generalize the knowledge obtained from one language variant, and evaluate the ability to detect irony in different combinations of variants, from different countries and topics. For this purpose, we propose using six features based on the degree of emotion present in each tweet. These automatically tagged features include 5 levels of strength, ranging from none to very high, for six emotions: love, joy, surprise, sadness, anger, and fear. Experiments were carried out with different combinations of language variants. The results show that exclusively using the information of the emotion levels (discarding the embeddings) can improve irony detection in a language variant different from the one used for training.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Several resources have been used as features for detecting irony: from lexical,
syntactic features, to polarity, or changes in polarity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other works pay special
attention to the role of affective information involved in tweets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and have
experimented with several emotion lexicons such as EMOLEX, EmoSN, SentiSense,
LIWC, etc., obtaining state-of-the-art results. In this work, we experiment with
the use of similar information, particularly automatically emotion-tagged tweets
within the framework of the 6 main emotions described by Shaver (love, joy,
surprise, sadness, anger and fear) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], with the particularity of considering
intensities of such emotions learned from text, ranging from N (none) to VH (very
high), as described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Our main goal is to determine to what extent the use of these tags allows irony
identification in different corpora (with unrelated topics) of the same language
(in this case, Spanish, with some regional variants). An F-measure around 70%
has been reported for tests performed on the same kind of trained text [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and some works report up to 90% using affective content [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These tests have
been carried out on the same language variant and the same topics; however, are there
more general cues of irony that would allow classifying irony by learning from
one language variant and testing on another? This is called cross-variety irony
detection. For this purpose, a general feature representation that allows domain
generalization is needed. A common solution is to use embeddings for
representing each tweet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this work, we propose adding emotion labels as
part of these features. Two main questions arise: (1) Can emotion intensity labels
alone work as features for cross-variety irony detection? (2) When used as a
complement to an embedding representation, do emotion-based features
improve cross-variety irony classification?
      </p>
      <p>
        To answer these questions, we focus on the corpora provided by the IroSvA
challenge [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Within this context, our definition of what is ironic and what is
not is given by the examples provided in the training datasets of this challenge.
      </p>
      <p>
        The IroSvA challenge aims at investigating whether a short message, written
in Spanish, is ironic or not with respect to a given context. In
particular, this challenge aims at studying the way irony changes in distinct Spanish
variants. Concretely, it is focused on Spanish from Spain, Mexico and Cuba.
Further details are given in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>In the next section we describe our classification scheme, along with a
description of the features used. In Section 3 we provide details on our experiments and
results, and finally in Section 4 we draw our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Classification scheme</title>
      <p>
        The same strategy was followed for the three subtasks (although some variants
had improvements for particular subtasks, we opted for using the same
method). We performed a standard preprocessing step consisting of fixing CR/LF
line endings, tweets spanning several lines, and topic names (removing numbers and
multi-word names). Then we converted the representation to one-hot (word space
model, WSM) with no lemmatization, no stopword handling, and without
filtering words by a minimum number of occurrences. Approximately 12,000
tokens were identified for each corpus. Finally, the WSM was converted to
embeddings using FastText embeddings [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] from the SBWC (Spanish Billion Word
Corpus, crscardellino.github.io/SBWCE/). The number of dimensions was 300 and a
total of 855,380 vectors were used (github.com/dccuchile/spanish-word-embeddings).
      </p>
      <p>The IroSvA challenge has three corpora of distinct Spanish variants:
Spanish from Spain, Mexico, and Cuba. Each corpus was manually labeled
and includes 1,600 examples of non-ironic texts and 800 ironic texts. We
randomly sampled 800 of the non-ironic texts to obtain balanced training
data with 800 ironic and 800 non-ironic texts; the remaining 800 non-ironic texts were
discarded.</p>
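      <p>The text above does not specify how a whole tweet is mapped onto the
300-dimensional FastText space; a common choice, sketched below with toy random
vectors standing in for the real SBWC FastText file, is to average the vectors of
the words in each tweet. The vocabulary and vector values here are placeholders,
not the published embeddings.</p>

```python
import numpy as np

# Toy stand-in for the pretrained 300-dimensional SBWC FastText vectors;
# in practice these would be loaded from the published embeddings file.
EMBED_DIM = 300
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=EMBED_DIM) for w in ["no", "puede", "ser"]}

def tweet_embedding(tweet: str) -> np.ndarray:
    """Average the word vectors of a tweet; out-of-vocabulary words are skipped."""
    words = [vectors[w] for w in tweet.lower().split() if w in vectors]
    if not words:
        return np.zeros(EMBED_DIM)
    return np.mean(words, axis=0)

emb = tweet_embedding("No puede ser")   # one 300-dimensional feature vector
```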
      <p>
        Models were trained using the AdaBoost M1 function [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] on Random Forest
classifiers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] with parameters Bag Size = 100%, Batch Size = 100, and unlimited-depth
trees.
      </p>
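      <p>A rough scikit-learn analogue of this setup is sketched below; Bag Size and
Batch Size are Weka options with no direct scikit-learn equivalent, and the data
here is a tiny synthetic stand-in, so this is an illustrative sketch rather than the
exact configuration used:</p>

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# AdaBoost boosting Random Forest classifiers with unlimited-depth trees.
# The base estimator is passed positionally for compatibility across
# scikit-learn versions.
clf = AdaBoostClassifier(
    RandomForestClassifier(n_estimators=100, max_depth=None),
    n_estimators=10,
)

# Tiny synthetic stand-in for the balanced 1,600-tweet training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))            # e.g., 6 emotion-level features
y = (X[:, 0] > 0).astype(int)           # 0 = non-ironic, 1 = ironic
clf.fit(X, y)
train_acc = clf.score(X, y)
```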
    </sec>
    <sec id="sec-3">
      <title>Experiments and results</title>
      <p>The effect of considering topics is different for each language variant: for
the es variant, there were no changes in performance, while for the mx
variant, removing topics resulted in a small performance decrease. Finally, for the
cu variant, not using topics brought a small performance increase.
Therefore, we cannot conclude that adding or removing topic information is of
general benefit for this task. However, for the next series of experiments, topic
information had to be removed, as topics differ completely among language
variants. The results in Table 1 suggest that removing this information does not
harm general performance for this task, so in the following experiments only
embedding features are used.</p>
      <p>For this series of experiments, we considered the previously balanced corpora
of 1,600 tweets each. Additionally, we built three new corpora by combining
two language varieties in order to observe the capability of generalizing irony
characterization from one language variant to a different one, as well as from two
amalgamated language varieties to a different one. The new corpora were
named esmx, which combined the es and mx corpora; escu (es + cu); and mxcu
(mx + cu). Table 2 shows the accuracy of all possible combinations, including those
which were tested against a subset of the training data. For example, for the third row
(esmx) tested on the first column (es), the result was significantly higher (89.09%)
because es was a subset of esmx, and its cases had therefore already been seen
in the training set.</p>
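      <p>The corpus-combination protocol above can be sketched as follows; the feature
matrices, labels, and classifier here are illustrative placeholders (a plain logistic
regression instead of the boosted Random Forests), not the actual challenge data or
our trained models:</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_corpus(seed, n=20, dim=6):
    """Toy stand-in for one balanced, featurized language-variant corpus."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    y = (X[:, 0] > 0).astype(int)       # placeholder ironic/non-ironic labels
    return X, y

corpora = {v: make_corpus(i) for i, v in enumerate(["es", "mx", "cu"])}

# Build a combined corpus, e.g. esmx = es + mx, ...
X_esmx = np.vstack([corpora["es"][0], corpora["mx"][0]])
y_esmx = np.concatenate([corpora["es"][1], corpora["mx"][1]])

# ... then train on it and evaluate on the unseen cu variant.
clf = LogisticRegression().fit(X_esmx, y_esmx)
acc_cu = clf.score(*corpora["cu"])
```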
      <p>From Table 2, more interesting values can be observed: for example, in the
first quadrant (top-left), which compares simple (not combined) corpora, the
best value was obtained when training on the cu variant and testing on the es
variant. The inverse situation also yielded the best result (training on es
and testing on cu) compared with training on mx (and testing on cu). A
similar situation occurred for the mx variant: training on es yielded better
results than training on cu. The best results for each language variant (per row)
are shown in italics for this quadrant.</p>
      <p>For the second quadrant (top-right), when using the es variant for training
and the combined mxcu corpus for evaluation, results were very similar to evaluating only on
the mx corpus (56.32% vs. 56.31%). However, when training on mx and evaluating on the unseen
varieties together (escu), results were lower than the previous best result (55.78%
vs. 57.01% with es). The same happened for the cu variant evaluated on esmx
and es, respectively (56.06% vs. 57.89%).</p>
      <p>Finally, for the third quadrant (bottom-left), combined corpora were used for
training, and they were evaluated on single corpora. Combinations not including
the evaluation set in the training set are shown in italics. Compared with the
first quadrant (training with simple corpora), all varieties benefited. For
example, cu increased from 55.25% with es to 55.88% with esmx; mx increased
from 56.31% with es to 58.78% with escu; and es increased from 57.89% with cu
to 58.84% with mxcu. This may suggest that, despite being different language
varieties with different topics and ways of expression, amalgamating two corpora
helped to predict irony on a different corpus.</p>
      <p>The last quadrant (bottom-right) is also shown in Table 2; however, as all
training sets are partially contained in all evaluation subsets, these results are
less interesting to discuss.</p>
      <p>
        <bold>Cross-variety irony detection using emotion levels.</bold>
As mentioned in Section 1, a different set of features is proposed for this task:
the use of 5 levels of emotion (None, Very Low, Low, High, and Very High) for a
6-tuple of emotions (love, joy, surprise, anger, sadness, and fear), corresponding
to the top level of emotions proposed by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Another application of an automatic
tagger for this kind of emotion levels can be found in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>As an example of the obtained features, consider Table 3. The first tweet
has a value of None for love, joy, and surprise, Very High anger, and Low
values of sadness and fear. The example tweets and their emotion tuples
(love, joy, surprise, anger, sadness, fear) are:</p>
      <p>"Como cuando cambias de personal porque hacen mal el trabajo
encomendado y resulta que los reemplazos son piores": N,N,N,VH,L,L</p>
      <p>"El cine es subjetivo... creo q es muy buena para los que
vivimos en CDMX en esa época, es nostálgica... sí le faltó un
poco más de historia... pero sí me gustó... pienso que dirigir
una película sin actores profesionales es un gran mérito!!
Felicidades @alfonsocuaron": L,VH,VL,N,N,N</p>
      <p>"Muy bien, &lt;&lt;&lt;a comprar!!! Bueno si abre la página primero": N,VH,N,N,VL,N</p>
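      <p>Turning such a tuple into classifier input reduces each tweet to six ordinal
features; a minimal sketch of this encoding follows (the numeric mapping is our
illustrative choice, not a value specified in the text):</p>

```python
# Map each of the six emotion levels (love, joy, surprise, anger, sadness,
# fear) onto one ordinal feature, from N (none) to VH (very high).
LEVELS = {"N": 0, "VL": 1, "L": 2, "H": 3, "VH": 4}

def encode(tuple_str: str) -> list:
    """Map a tuple string such as 'N,N,N,VH,L,L' to [0, 0, 0, 4, 2, 2]."""
    return [LEVELS[level] for level in tuple_str.split(",")]

features = encode("N,N,N,VH,L,L")   # first example tweet from Table 3
```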
      <p>Results for the set of experiments using only emotion features are shown
in Table 4. As can be seen, this time experiments that included the test set
in the training set did not reach high accuracy; compare esmx with mx
(embeddings: 88.70%, emotions: 57.68%). Yet interestingly, when evaluating on the
cu variant, whether training on es or mx, results are higher than their embedding
counterparts (shown in bold). Overall, results using emotion levels alone are only
1.91% below their embedding counterparts for single-to-single corpora (es, mx,
cu; first quadrant, top-left), which is interesting considering the reduction from
300 to only 6 features.</p>
      <p>A general comparison of accuracies using embeddings or emotions as features
is shown in Table 5. Quadrants are numbered as (1) top-left, (2) top-right, and
(3) bottom-left. The first quadrant represents single vs. single varieties, i.e., no
variant combinations were used. The second quadrant represents training on
single varieties evaluated on their unseen combined variant, i.e., es vs. mxcu, mx
vs. escu, and cu vs. esmx. The third quadrant represents training on combined
corpora, evaluated on their unseen single variant, i.e., esmx vs. cu, escu vs. mx,
and mxcu vs. es. For calculating these averages, no overlapping combinations
were considered (e.g., es vs. esmx).</p>
      <p>As can be seen from Table 5, cross-variety irony detection performs
better on average when using embeddings as features; however, the difference is
relatively small, suggesting that emotion features could be used to improve or
aid sentiment-related tasks such as irony detection.</p>
      <p>
        Finally, to answer the second question posed in Section 1, we experimented
with using emotions and embeddings together, obtaining only a slight increase
for the cu dataset. Accuracies went from embeddings only to embeddings + emotions
as follows: es: 82.32% to 82.13%; mx: 80.37% to 78.79%; cu: 79.10% to 79.36%. From these
results, we cannot conclude that using both embeddings and emotions
simultaneously is of general benefit, at least for the language varieties
and topics addressed in this task.
      </p>
      <p>
        Lastly, we compare our results with other works. In particular,
we were provided with four different baseline results: majority voting, word
n-grams, Word2Vec features (no specific details provided), and LDSE, as
described in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Accuracy results are shown in Table 6. For one language variant
(es), our model was able to surpass the provided results, but on average both
the LDSE and Word2Vec systems obtained better results.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>For this task, a relatively simple model was proposed to classify tweets as ironic
or not ironic for three different language varieties. This model was mainly based
on embeddings as features. This representation allowed our model to learn
features from a different language variant or varieties, and to classify tweets
from an unseen variant as ironic or not ironic.</p>
      <p>A particular contribution of this work consisted in using emotion levels as
features to perform the same task. Interestingly, the classifiers were still able to
classify tweets with performance similar to that obtained with tweet embeddings
(less than 3% overall average difference in accuracy); and for some variant pairs (es
vs. cu and mx vs. cu) performance improved compared to using embeddings
alone. This evidence suggests that emotion levels could be used as features
to aid sentiment-related classification tasks such as irony detection.</p>
      <p>For this work, no information other than the embeddings and the
emotion-level tagger was used. As future work, we plan to include contextual
information, as well as the possibility of performing opinion-object identification
along with sentiment analysis to improve performance on this task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Van Hee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Can machines sense irony?: exploring automatic irony detection on social media</article-title>
          .
          <source>PhD thesis</source>
          , Ghent University (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Farías,
          <string-name>
            <surname>D.I.H.</surname>
          </string-name>
          :
          <article-title>Irony and sarcasm detection in Twitter: the role of affective content</article-title>
          .
          <source>PhD thesis</source>
          , Universitat Politecnica de Valencia (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Shaver</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , O'Connor, C.:
          <article-title>Emotion Knowledge: Further Exploration of a Prototype Approach</article-title>
          .
          <source>Journal of personality and social psychology 52</source>
          (
          <year>1987</year>
          )
          <fpage>1061</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gambino</surname>
            ,
            <given-names>O.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvo</surname>
          </string-name>
          , H.:
          <article-title>Predicting emotional reactions to news articles in social networks</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          <volume>58</volume>
          (
          <year>2019</year>
          )
          <fpage>280</fpage>
          <lpage>303</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Van Hee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefever</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoste</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>SemEval-2018 task 3: Irony detection in English tweets</article-title>
          .
          <source>In: Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          . (
          <year>2018</year>
          )
          <fpage>39</fpage>
          <lpage>50</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cignarella</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frenda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of the Evalita 2018 task on irony detection in italian tweets (IronITA)</article-title>
          .
          <source>In: Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA</source>
          <year>2018</year>
          ). Volume
          <volume>2263</volume>
          ., CEUR-WS (
          <year>2018</year>
          )
          <fpage>1</fpage>
          <lpage>6</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Farías,
          <string-name>
            <given-names>D.I.H.</given-names>
            ,
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Irony detection in Twitter: The role of affective content</article-title>
          .
          <source>ACM Transactions on Internet Technology (TOIT) 16</source>
          (
          <year>2016</year>
          )
          <fpage>19</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dhingra</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fitzpatrick</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muehl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.W.:</given-names>
          </string-name>
          <article-title>Tweet2vec: Character-based distributed representations for social media</article-title>
          .
          <source>arXiv preprint arXiv:1605.03481</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ortega-Bueno</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , Hernández Farías,
          <string-name>
            <given-names>D.I.</given-names>
            ,
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Montes-</surname>
          </string-name>
          y-Gomez,
          <string-name>
            <surname>M.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Medina</given-names>
            <surname>Pagola</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.E.</surname>
          </string-name>
          :
          <article-title>Overview of the Task on Irony Detection in Spanish Variants</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), co-located with the 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019), CEUR-WS.org</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>FastText.zip: Compressing text classification models</article-title>
          .
          <source>arXiv preprint arXiv:1612.03651</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubio</surname>
            ,
            <given-names>N.G.</given-names>
          </string-name>
          :
          <article-title>Multiclass corporate failure prediction by AdaBoost.M1</article-title>
          .
          <source>International Advances in Economic Research</source>
          <volume>13</volume>
          (
          <year>2007</year>
          )
          <fpage>301</fpage>
          <lpage>312</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Machine learning 45</source>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franco-Salvador</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A low dimensionality representation for language variety identification</article-title>
          .
          <source>In: International Conference on Intelligent Text Processing and Computational Linguistics, LNCS 9624</source>
          , Springer (
          <year>2018</year>
          )
          <fpage>156</fpage>
          <lpage>169</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>