UO IRO: Linguistic informed deep-learning model for irony detection

Reynier Ortega-Bueno
Center for Pattern Recognition and Data Mining, Santiago de Cuba, Cuba
Computer Science Department, University of Oriente
reynier.ortega@cerpamid.co.cu, reynier@uo.edu.cu

José E. Medina Pagola
University of Informatics Sciences, Havana, Cuba
jmedinap@uci.cu

Abstract

English. This paper describes our UO IRO system developed for participating in the shared task IronITA, organized within the EVALITA 2018 Workshop. Our approach is based on a deep learning model informed with linguistic knowledge. Specifically, a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) neural network are ensembled; also, the model is informed with linguistic information incorporated through its second-to-last hidden layer. The results achieved by our system are encouraging; however, a more fine-tuned hyper-parameter setting is required to improve the model's effectiveness.

Italiano. This paper describes our UO IRO system, developed for participation in the IronITA shared task at EVALITA 2018. Our approach is based on a deep learning model with linguistic knowledge: in particular, a Convolutional Neural Network (CNN) and a Long Short Term Memory neural network (LSTM). Moreover, the model is enriched with linguistic knowledge, incorporated in its second-to-last hidden layer. Although fine-grained parameter tuning is needed to improve the model's performance, the results obtained are encouraging.

1 Introduction

Computers interacting with humans through language, in a natural way, continues to be one of the most salient challenges for Artificial Intelligence researchers and practitioners. Nowadays, several basic tasks related to natural language comprehension have been effectively resolved. Notwithstanding, only slight advances have been achieved by machines when figurative devices and creativity are used in language with communicative purposes. Irony is a peculiar case of figurative devices frequently used in real-life communication. As human beings, we appeal to irony to express in an implicit way a meaning opposite to the literal sense of the utterance (Attardo, 2000; Wilson and Sperber, 1992). Thus, understanding irony requires a more complex set of cognitive and linguistic abilities than literal meaning. Due to its nature, irony has important implications for sentiment analysis and other related tasks which aim at recognizing feelings and emotions in texts. Considering that, detecting irony automatically in textual messages is an important issue for enhancing sentiment analysis, and it is still an open research problem (Gupta and Yang, 2017; Maynard and Greenwood, 2014; Reyes et al., 2013).

In this work we address the fascinating problem of automatic irony detection in tweets written in the Italian language. Particularly, we describe our irony detection system (UO IRO) developed for participating in IronITA 2018: Irony Detection in Italian Tweets (Cignarella et al., 2018a). Our proposed model is based on a deep learning model informed with linguistic information. Specifically, a CNN and an attention-based LSTM neural network are ensembled; moreover, the model is informed with linguistic information incorporated through its second-to-last hidden layer. We only participated in Task A (irony detection), for which two constrained runs and two unconstrained runs were submitted. The official results show that our system obtains interesting results; our best run was ranked in 12th position out of 17 submissions.

The paper is organized as follows. In Section 2, we introduce our UO IRO system for irony detection. Experimental results are subsequently discussed in Section 3.
Finally, in Section 4, we present our conclusions and attractive directions for future work.

2 UO IRO system for irony detection

The motivation for this work comes from two directions. In the first place, the recent and promising results obtained by several authors (Deriu and Cieliebak, 2016; Cimino and Dell'Orletta, 2016; González et al., 2018; Rangwani et al., 2018; Wu et al., 2018; Peng et al., 2018) with convolutional networks and recurrent networks, as well as their hybridization, for dealing with figurative language. The second direction is motivated by the wide use of manually encoded linguistic features, which have been shown to be good indicators for discriminating between ironic and non-ironic content (Reyes et al., 2012; Reyes and Rosso, 2014; Barbieri et al., 2014; Farías et al., 2016; Farías et al., 2018).

Our proposal learns a representation of the tweets in three ways. First, we learn a representation based on a recurrent network, with the purpose of capturing long dependencies among terms in the tweets. Moreover, a representation based on a convolutional network is considered; it tries to encode local and partial relations between words which are near each other. The last representation is based on linguistic features which are computed for the tweets. All the linguistic features previously computed are concatenated into a one-dimensional vector, which is passed through a dense hidden layer that encodes the linguistic knowledge and feeds this information into the model.

Finally, the three neural-network-based outputs are combined in a merge layer. The integrated representation is passed to a dense hidden layer, and the final classification is performed by the output layer, which uses a softmax activation function to predict the ironic or not ironic label. For training the complete model we use categorical cross-entropy as the loss function and the Adam method (Kingma and Ba, 2014) as the optimizer; we use a batch size of 64 and train the model for 20 epochs. Our proposal was implemented using the Keras framework[1]. The architecture of UO IRO is shown in Figure 1 and described below.

[1] https://keras.io/

Figure 1: Overall Architecture of UO IRO: Irony Detection System.

2.1 Preprocessing

In the preprocessing step, the tweets are cleaned. Firstly, emoticons, urls, hashtags, mentions and Twitter-specific tokens (RT for retweet and FAV for favorite) are recognized and replaced by a corresponding wild-card which encodes the meaning of these special tokens. Afterwards, the tweets are morphologically analyzed with FreeLing (Padró and Stanilovsky, 2012), and each resulting token is assigned its lemma. Then, the words in the tweets are represented as vectors using a word embedding model. In this work we use the publicly available Italian pre-trained vectors[2] (Bojanowski et al., 2017).

[2] https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.it.zip

2.2 Attention Based LSTM

We use a model that consists of a Bidirectional LSTM neural network (Bi-LSTM) at the word level. At each time step t, the Bi-LSTM receives as input a word vector w_t carrying syntactic and semantic information, known as a word embedding. The idea behind this Bi-LSTM is to capture long-range and backwards dependencies in the tweets. Afterwards, an attention layer is applied over each hidden state h_t. The attention weights are learned using the concatenation of the current hidden state h_t of the Bi-LSTM and the past hidden state s_{t-1}. The goal of this layer is to derive a context vector c_t that captures relevant information to feed as input to the next level. Finally, an LSTM layer is stacked on top; at each time step this network receives the context vector c_t, which is propagated until the final hidden state s_{Tx}.
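The attention step just described can be sketched with plain NumPy. This is an illustrative reimplementation, not the system's actual code: the additive scoring function (a tanh projection followed by a learned vector) is a common choice that we assume here, and the parameter names (`w`, `v`) are our own.

```python
import numpy as np

def attention_context(hidden_states, query, w, v):
    """Additive attention sketch: score each Bi-LSTM hidden state h_t against
    the previous state of the upper LSTM (the query s_{t-1}), softmax the
    scores over time steps, and return the weighted sum as the context c_t.

    hidden_states: (T, d) array of h_1..h_T
    query:         (d,)   previous state s_{t-1}
    w:             (2d, k) and v: (k,) are assumed learned parameters
    """
    T = hidden_states.shape[0]
    # Concatenate each h_t with the query, as in the description above.
    concat = np.hstack([hidden_states, np.tile(query, (T, 1))])  # (T, 2d)
    scores = np.tanh(concat @ w) @ v                             # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax: attention weights sum to 1
    return weights @ hidden_states    # context vector c_t, shape (d,)
```

With zero projection weights the scores are uniform, so the context vector degenerates to the mean of the hidden states, which is a quick sanity check of the softmax step.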
This vector (sTx ) of the word with major number of synsets and can be considered as a high level representation of the average number of synsets. the tweet. For more details, please see (Ortega- • Domain Ambiguity: Three different features Bueno et al., 2018). were considered using WordNet: the first one is the mean of the number of domains of 2.3 Convolutional Neural Network each word. The second one is the greatest We use a CNN model that consists in 3 pairs of number of domains that a single word has convolutional layers and pooling layers in this ar- in the tweet. The last one is the difference chitecture. Filters of size three, four and five were between the number of domains of the word defined for the convolutional layers. In case of with major number of domains and the av- pooling layer, the maxpooling strategy was used. erage number of domains. It is important to We also use the Rectified Linear Unit (ReLU), clarify that the resources WordNet Domains3 Normalization and Dropout methods to improve and SUMO4 were separately used. the accuracy and generalizability of the model. • Persons: This feature tries to capture verbs conjugated in the first, second, third person 2.4 Linguistic Features and nouns and adjectives which agree with In our work, we explored some linguistic features such conjugations. useful for irony detection in texts which can be • Tenses: This feature tries to capture the dif- grouped in three main categories: Stylistic, Struc- ferent verbal tenses used in the tweet. tural and Content, and Polarity Contrast. We de- • Questions-answers: Occurrences of ques- fine a set of features distributed as follows: tions and answers pattern in the tweet. • Part of Speech: The number of nouns, verbs, Stylistic Features adverbs and adjectives in the tweet are quan- • Length: Three different features were consi- tified. dered: number of words, number of charac- • Negation: The amount of negation words. 
ters, and the means of the length of the words Polarity Contrast Features in the tweet. • Hashtags: The amount of hashtags. With the purpose of capturing some types of ex- • Urls: The number of url. plicit polarity contrast we consider the set of fea- • Emoticons: The number of emoticons. tures proposed in (Peña et al., 2018). The Ital- • Exclamations: Occurrences of exclamation ian polarity lexicon (Basile and Nissim, 2013) was marks. used to determine the contrast between different • Emphasized Words: Four different features parts of the tweet. were considered: word emphasized through • WordPolarityContrast: It is the polarity dif- repetition, capitalization, character flooding ference between the most positive and the and exclamation marks. most negative word in the tweet. This fea- • Punctuation Marks: The frequency of dots, ture, also consider the distance, in terms of commas, semicolons, and question marks. tokens, between the words. • Quotations: The number of expressions be- • EmotiTextPolarityContrast: It is the pola- tween quotation marks. rity contrast between the emoticons and the words in the tweet. Structural and Content Features • AntecedentConsequentPolarityContrast: • Antonyms: This feature considers the num- This considers the polarity contrast between ber of pairs of antonyms existing in the two parts of the tweet, when it is split tweet. WordNet (Miller, 1995) antonym re- by a delimiter. In this case, adverbs and lation was used for that. punctuation marks were used as delimiters. • Lexical Ambiguity: Three different features • MeanPolarityPhrase: It is the mean of the were considered using WordNet: the first one polarities of the words that belong to quotes. is the mean of the number of synsets of each • PolarityStandardDeviation: It is the standard word. The second one is the greatest number deviation of the polarities of the words that of synsets that has a single word. 
The last is 3 http://wndomains.fbk.eu/hierarchy.html 4 the difference between the number of synsets http://www.adampease.org/OP/ belong to quotes. SVM) are a modification of the CNN-LSTM • PresentPastPolarityContrast: It computes model, in this case we change the softmax layer the polarity contrast between the parts of the at the output of the model and use a Linear Sup- tweet written in present and past tense. port Vector Machine (SVM) with default parame- • SkipGramPolarityRate:It computes the rate ters as final classifier. Run3-c and run3-u (CNN- among skip-grams with polarity contrast and LSTM-LING) represent the original introduced all valid skip-grams. The valid skip-grams model without any variations. Finally, for run4-c are those composed by two words (nouns, and run4-u (CNN-LSTM-LING-SVM) we change adjectives, verbs, adverbs) with skip=1. the softmax layer by a linear SVM as final classi- The skip-grams with polarity opposition are fier. For unconstrained runs, we include the ironic those that match with the patterns positive- tweets provided by the corpus Twittirò (Cignarella negative, positive-neutral, negative-neutral, et al., 2018b), to the official training set releases and vise versa. by the IronITA organizers. • CapitalLetterTextPolarityContrast: It com- putes the polarity contrast between capital- Analyzing Table 1, several observations can be ized words and the rest of the words in the made. Firstly, unconstrained runs achieved bet- tweets. ter results than constrained ones. These results reveal that introducing more ironic examples im- 3 Experiments and Results proves the performance of the UO IRO. Secondly, In this section we show the results of the pro- the results achieved with the variants that consider posed model in the shared task of “Irony Detec- the linguistic knowledge (run3-c, run4-c, run3-u tion” and discuss them. In a first experiment we and run4-u) obtain an increase in the effectiveness. 
analyze the performance of four variants of our With respect to the strategy used for the final clas- model using 10 fold cross-validation strategy on sification of the tweets, generally, those variants the training set. Also, each variant was running that use SVM obtain a slight drop in the AVG-F1 . in unconstrained and constrained setting, respec- tively. In Table 1, we summarize the obtained Regarding the official results, we submitted results in terms of F1 measure macro averaged four runs, two for constrained setting (RUN1-c (F1-AVG). Specifically, we rely on the macro for and RUN2-c) and two for unconstrained setting preventing systems biased towards the most popu- (RUN3-u and RUN4-u). For the unconstrained lated classes. variants of the UO IRO, the tweets provided by the corpus Twittirò were also used with the train- ing set. Taking into account the results of the Table Table 1: Results obtained by UO IRO on the train- 1 we select to CNN-LSTM-LING (RUN1-c and ing set by using 10-fold cross-validation. RUN3-u) and CNN-LSTM-LING-SVM (RUN2-c Run Model AVG-F1 and RUN4-u) as the most promising variants of the Constrained model for evaluating in the official test set. run1-c CNN-LSTM 0.7019 run2-c CNN-LSTM-SVM 0.6927 As can be observed in Table 2, our four runs run3-c CNN-LSTM-LING 0.7124 were ranked 12th , 13th , 14th and 15th from a to- run4-c CNN-LSTM-LING-SVM 0.7040 tal of 17 submissions. The unconstrained variants Unconstrained of the UO IRO achieved better results than con- run1-u CNN-LSTM 0.7860 strained ones. Contrary to the results shown in the run2-u CNN-LSTM-SVM 0.7900 Table 1, the runs that use SVM as final classifica- run3-u CNN-LSTM-LING 0.8226 tion strategy (RUN2-c and RUN4-u) were better run4-u CNN-LSTM-LING-SVM 0.8207 ranked than the other ones. 
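To make the polarity-contrast family concrete, here is a small sketch of the WordPolarityContrast feature in the spirit of the description above. The toy lexicon, the function name, and the choice to return the token distance alongside the polarity difference are our own assumptions; the system itself relies on the Italian polarity lexicon of Basile and Nissim (2013) and the exact formulation of Peña et al. (2018).

```python
# Toy polarity lexicon for illustration only; the real feature uses the
# Italian polarity lexicon of Basile and Nissim (2013), scores in [-1, 1].
LEXICON = {"adoro": 0.9, "bello": 0.7, "odio": -0.9, "terribile": -0.8}

def word_polarity_contrast(tokens):
    """Return (polarity difference, token distance) between the most positive
    and the most negative word of the tweet, or (0.0, 0) if no word is
    covered by the lexicon. How the two values are combined into a single
    feature value is left open here, since the paper does not specify it."""
    scored = [(i, LEXICON[t]) for i, t in enumerate(tokens) if t in LEXICON]
    if not scored:
        return (0.0, 0)
    i_max, p_max = max(scored, key=lambda s: s[1])
    i_min, p_min = min(scored, key=lambda s: s[1])
    return (p_max - p_min, abs(i_max - i_min))
```

For a tweet like "adoro questo tempo terribile", the feature pairs a strong positive and a strong negative word, which is exactly the kind of explicit contrast this feature family is meant to surface.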
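The macro-averaged F1 reported in the tables is the unweighted mean of the per-class F1 scores, which is why it does not reward systems biased towards the majority class. A minimal two-class implementation (function names are ours):

```python
def f1_per_class(gold, pred, label):
    """F1 of one class: harmonic mean of its precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, labels=("ironic", "not_ironic")):
    """Unweighted mean of per-class F1 (the AVG-F1 of Tables 1 and 2)."""
    return sum(f1_per_class(gold, pred, l) for l in labels) / len(labels)
```

Because both classes weigh equally, a classifier that always predicts the majority class scores 0 on the minority class and is capped at half of that class's F1, regardless of the class distribution.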
We think that the better ranking of the SVM-based runs may be explained by the softmax classifier (the last layer of UO IRO) being more sensitive to over-fitting than Support Vector Machines. Notice that, in all cases, our model surpasses the two baseline methods established by the organizers.

Table 2: Official results for the Irony Detection subtask.

Rank   Runs    F1-I   F1-noI  Avg-F1
12/17  RUN4-u  0.700  0.603   0.651
13/17  RUN3-u  0.665  0.626   0.646
14/17  RUN2-c  0.678  0.579   0.629
15/17  RUN1-c  0.577  0.652   0.614

4 Conclusions

In this paper we presented the UO IRO system for the task of Irony Detection in Italian Tweets (IronITA) at EVALITA 2018. We participated in the "Irony classification" subtask, and our best submission ranked 12th out of 17. Our proposal combines an attention-based Long Short-Term Memory network, a Convolutional Neural Network, and linguistic information which is incorporated through the second-to-last hidden layer of the model. The results show that considering linguistic features in combination with the deep representations learned by the neural network models yields better effectiveness in terms of F1-measure. The results achieved by our system are interesting; however, a more fine-tuned hyper-parameter setting is required to improve the model's effectiveness. We think that including the linguistic features of irony in the first layers of the model could be a way to increase the effectiveness, and we would like to explore this approach in future work. We also plan to analyze how affective information flows through tweets, and how it impacts the realization of irony.

References

Salvatore Attardo. 2000. Irony as relevant inappropriateness. Journal of Pragmatics, 32(6):793–826.

Francesco Barbieri, Horacio Saggion, and Francesco Ronzano. 2014. Modelling Sarcasm in Twitter, a Novel Approach. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 136–141.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the ACL, 5:135–146.

Alessandra Teresa Cignarella, Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, and Paolo Rosso. 2018a. Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA). In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Alessandra Teresa Cignarella, Cristina Bosco, Viviana Patti, and Mirko Lai. 2018b. Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pages 4204–4211.

Andrea Cimino and Felice Dell'Orletta. 2016. Tandem LSTM-SVM Approach for Sentiment Analysis. In CLiC-it/EVALITA 2016, pages 1–6. CEUR-WS.org.

Jan Deriu and Mark Cieliebak. 2016. Sentiment Analysis using Convolutional Neural Networks with Multi-Task Training and Distant Supervision on Italian Tweets. In CLiC-it/EVALITA 2016, pages 1–5. CEUR-WS.org.

Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. 2016. Irony Detection in Twitter. ACM Transactions on Internet Technology, 16(3):1–24.

Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. 2018. ValenTO at SemEval-2018 Task 3: Exploring the Role of Affective Content for Detecting Irony in English Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 643–648. Association for Computational Linguistics.

José Ángel González, Lluís-F. Hurtado, and Ferran Pla. 2018. ELiRF-UPV at SemEval-2018 Tasks 1 and 3: Affect and Irony Detection in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 565–569.

Raj Kumar Gupta and Yinping Yang. 2017. CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 626–633, Vancouver, Canada. Association for Computational Linguistics.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Diana Maynard and Mark A. Greenwood. 2014. Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Reynier Ortega-Bueno, Carlos E. Mu, and Paolo Rosso. 2018. UO UPV: Deep Linguistic Humor Detection in Spanish Social Media. In Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), pages 1–11.

Lluís Padró and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards Wider Multilinguality. In Proceedings of LREC 2012.

Anakarla Sotolongo Peña, Leticia Arco García, and Adrián Rodríguez Dosina. 2018. Detección de ironía en textos cortos enfocada a la minería de opinión. In IV Conferencia Internacional en Ciencias Computacionales e Informáticas (CICCI 2018), pages 1–10, Havana, Cuba.

Bo Peng, Jin Wang, and Xuejie Zhang. 2018. YNU-HPCC at SemEval-2018 Task 3: Ensemble Neural Network Models for Irony Detection on Twitter. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 622–627. Association for Computational Linguistics.

Harsh Rangwani, Devang Kulshreshtha, and Anil Kumar Singh. 2018. NLPRL-IITBHU at SemEval-2018 Task 3: Combining Linguistic Features and Emoji Pre-trained CNN for Irony Detection in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 638–642. Association for Computational Linguistics.

Antonio Reyes and Paolo Rosso. 2014. On the difficulty of automatically detecting irony: beyond a simple case of negation. Knowledge and Information Systems, 40(3):595–614.

Antonio Reyes, Paolo Rosso, and Davide Buscaldi. 2012. From humor recognition to irony detection: The figurative language of social media. Data and Knowledge Engineering, 74:1–12.

Antonio Reyes, Paolo Rosso, and Tony Veale. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1):239–268.

Deirdre Wilson and Dan Sperber. 1992. On verbal irony. Lingua, 87(1):53–76.

Chuhan Wu, Fangzhao Wu, Sixing Wu, Junxin Liu, Zhigang Yuan, and Yongfeng Huang. 2018. THU NGN at SemEval-2018 Task 3: Tweet Irony Detection with Densely Connected LSTM and Multi-task Learning. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 51–56. Association for Computational Linguistics.