UO IRO: Linguistic informed deep-learning model for irony detection

Reynier Ortega-Bueno
Center for Pattern Recognition and Data Mining, Santiago de Cuba, Cuba
Computer Science Department, University of Oriente
reynier.ortega@cerpamid.co.cu, reynier@uo.edu.cu

José E. Medina Pagola
University of Informatics Sciences, Havana, Cuba
jmedinap@uci.cu

Abstract

English. This paper describes our UO IRO system developed for participating in the shared task IronITA, organized within the EVALITA 2018 Workshop. Our approach is based on a deep learning model informed with linguistic knowledge. Specifically, a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) neural network are ensembled; also, the model is informed with linguistic information incorporated through its second-to-last hidden layer. The results achieved by our system are encouraging; however, a more fine-tuned hyper-parameter setting is required to improve the model's effectiveness.

Italiano. This paper describes our UO IRO system, developed for participation in the IronITA shared task at EVALITA 2018. Our approach is based on a deep learning model with linguistic knowledge: in particular, a Convolutional Neural Network (CNN) and a Long Short Term Memory neural network (LSTM). Moreover, the model is enriched with linguistic knowledge, incorporated in its second-to-last hidden layer. Although fine-grained parameter tuning is needed to improve the model's performance, the results obtained are encouraging.

1 Introduction

Computers interacting with humans through language, in a natural way, continues to be one of the most salient challenges for Artificial Intelligence researchers and practitioners. Nowadays, several basic tasks related to natural language comprehension have been effectively resolved. Notwithstanding, only slight advances have been achieved by machines when figurative devices and creativity are used in language with communicative purposes. Irony is a peculiar case of figurative devices frequently used in real-life communication. As human beings, we appeal to irony to express in an implicit way a meaning opposite to the literal sense of the utterance (Attardo, 2000; Wilson and Sperber, 1992). Thus, understanding irony requires a more complex set of cognitive and linguistic abilities than literal meaning. Due to its nature, irony has important implications for sentiment analysis and other related tasks which aim at recognizing feelings and emotions in texts. Considering that, detecting irony automatically in textual messages is an important issue for enhancing sentiment analysis, and it is still an open research problem (Gupta and Yang, 2017; Maynard and Greenwood, 2014; Reyes et al., 2013).

In this work we address the fascinating problem of automatic irony detection in tweets written in the Italian language. Particularly, we describe our irony detection system (UO IRO) developed for participating in IronITA 2018: Irony Detection in Italian Tweets (Cignarella et al., 2018a). Our proposed model is based on a deep learning model informed with linguistic information. Specifically, a CNN and an attention-based LSTM neural network are ensembled; moreover, the model is informed with linguistic information incorporated through its second-to-last hidden layer. We only participated in Task A (irony detection), for which two constrained runs and two unconstrained runs were submitted. The official results show that our system obtains interesting results; our best run was ranked in 12th position out of 17 submissions.

The paper is organized as follows. In Section 2, we introduce our UO IRO system for irony detection. Experimental results are subsequently discussed in Section 3.
Finally, in Section 4, we present our conclusions and attractive directions for future work.

2 UO IRO system for irony detection

The motivation for this work comes from two directions. In the first place, the recent and promising results obtained by several authors (Deriu and Cieliebak, 2016; Cimino and Dell'Orletta, 2016; González et al., 2018; Rangwani et al., 2018; Wu et al., 2018; Peng et al., 2018) with convolutional networks and recurrent networks, as well as their hybridization, for dealing with figurative language. The second direction is motivated by the wide use of manually encoded linguistic features, which have been shown to be good indicators for discriminating between ironic and non-ironic content (Reyes et al., 2012; Reyes and Rosso, 2014; Barbieri et al., 2014; Farías et al., 2016; Farías et al., 2018).

Our proposal learns a representation of the tweets in three ways. First, we learn a representation based on a recurrent network, with the purpose of capturing long dependencies among terms in the tweets. Moreover, a representation based on a convolutional network is considered; it tries to encode local and partial relations between words which are near each other. The last representation is based on linguistic features which are computed for the tweets. All the linguistic features previously computed are concatenated into a one-dimensional vector, which is passed through a dense hidden layer that encodes the linguistic knowledge and feeds this information into the model.

Finally, the three neural-network-based outputs are combined in a merge layer. The integrated representation is passed to a dense hidden layer, and the final classification is performed by the output layer, which uses a softmax activation function to predict the ironic or not ironic label. For training the complete model we use categorical cross-entropy as the loss function and the Adam method (Kingma and Ba, 2014) as the optimizer; we use a batch size of 64 and train the model for 20 epochs. Our proposal was implemented using the Keras framework[1]. The architecture of UO IRO is shown in Figure 1 and described below.

[1] https://keras.io/

Figure 1: Overall Architecture of UO IRO: Irony Detection System.

2.1 Preprocessing

In the preprocessing step, the tweets are cleaned. Firstly, emoticons, urls, hashtags, mentions and Twitter-specific tokens (RT for retweet and FAV for favorite) are recognized and replaced by a corresponding wild-card which encodes the meaning of these special tokens. Afterwards, the tweets are morphologically analyzed with FreeLing (Padró and Stanilovsky, 2012), and each resulting token is assigned its lemma. Then, the words in the tweets are represented as vectors using a word embedding model. In this work we use the publicly available Italian pre-trained vectors[2] (Bojanowski et al., 2017).

[2] https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.it.zip

2.2 Attention Based LSTM

We use a model that consists of a Bidirectional LSTM neural network (Bi-LSTM) at the word level. At each time step t, the Bi-LSTM receives as input a word vector w_t carrying syntactic and semantic information, known as a word embedding. The idea behind this Bi-LSTM is to capture long-range and backwards dependencies in the tweets. Afterwards, an attention layer is applied over each hidden state h_t. The attention weights are learned using the concatenation of the current hidden state h_t of the Bi-LSTM and the past hidden state s_{t-1}. The goal of this layer is to derive a context vector c_t that captures relevant information to feed as input to the next level. Finally, an LSTM layer is stacked on top; at each time step this network receives the context vector c_t, which is propagated until the final hidden state s_{Tx}.
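The attention step just described can be sketched with plain NumPy. This is an illustrative reimplementation, not the system's actual code: the additive scoring function (a tanh projection followed by a learned vector) is a common choice that we assume here, and the parameter names (`w`, `v`) are our own.

```python
import numpy as np

def attention_context(hidden_states, query, w, v):
    """Additive attention sketch: score each Bi-LSTM hidden state h_t against
    the previous state of the upper LSTM (the query s_{t-1}), softmax the
    scores over time steps, and return the weighted sum as the context c_t.

    hidden_states: (T, d) array of h_1..h_T
    query:         (d,)   previous state s_{t-1}
    w:             (2d, k) and v: (k,) are assumed learned parameters
    """
    T = hidden_states.shape[0]
    # Concatenate each h_t with the query, as in the description above.
    concat = np.hstack([hidden_states, np.tile(query, (T, 1))])  # (T, 2d)
    scores = np.tanh(concat @ w) @ v                             # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax: attention weights sum to 1
    return weights @ hidden_states    # context vector c_t, shape (d,)
```

With zero projection weights the scores are uniform, so the context vector degenerates to the mean of the hidden states, which is a quick sanity check of the softmax step.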
This vector (sTx ) of the word with major number of synsets and can be considered as a high level representation of the average number of synsets. the tweet. For more details, please see (Ortega- • Domain Ambiguity: Three different features Bueno et al., 2018). were considered using WordNet: the first one is the mean of the number of domains of 2.3 Convolutional Neural Network each word. The second one is the greatest We use a CNN model that consists in 3 pairs of number of domains that a single word has convolutional layers and pooling layers in this ar- in the tweet. The last one is the difference chitecture. Filters of size three, four and five were between the number of domains of the word defined for the convolutional layers. In case of with major number of domains and the av- pooling layer, the maxpooling strategy was used. erage number of domains. It is important to We also use the Rectified Linear Unit (ReLU), clarify that the resources WordNet Domains3 Normalization and Dropout methods to improve and SUMO4 were separately used. the accuracy and generalizability of the model. • Persons: This feature tries to capture verbs conjugated in the first, second, third person 2.4 Linguistic Features and nouns and adjectives which agree with In our work, we explored some linguistic features such conjugations. useful for irony detection in texts which can be • Tenses: This feature tries to capture the dif- grouped in three main categories: Stylistic, Struc- ferent verbal tenses used in the tweet. tural and Content, and Polarity Contrast. We de- • Questions-answers: Occurrences of ques- fine a set of features distributed as follows: tions and answers pattern in the tweet. • Part of Speech: The number of nouns, verbs, Stylistic Features adverbs and adjectives in the tweet are quan- • Length: Three different features were consi- tified. dered: number of words, number of charac- • Negation: The amount of negation words. 
ters, and the means of the length of the words Polarity Contrast Features in the tweet. • Hashtags: The amount of hashtags. With the purpose of capturing some types of ex- • Urls: The number of url. plicit polarity contrast we consider the set of fea- • Emoticons: The number of emoticons. tures proposed in (Peña et al., 2018). The Ital- • Exclamations: Occurrences of exclamation ian polarity lexicon (Basile and Nissim, 2013) was marks. used to determine the contrast between different • Emphasized Words: Four different features parts of the tweet. were considered: word emphasized through • WordPolarityContrast: It is the polarity dif- repetition, capitalization, character flooding ference between the most positive and the and exclamation marks. most negative word in the tweet. This fea- • Punctuation Marks: The frequency of dots, ture, also consider the distance, in terms of commas, semicolons, and question marks. tokens, between the words. • Quotations: The number of expressions be- • EmotiTextPolarityContrast: It is the pola- tween quotation marks. rity contrast between the emoticons and the words in the tweet. Structural and Content Features • AntecedentConsequentPolarityContrast: • Antonyms: This feature considers the num- This considers the polarity contrast between ber of pairs of antonyms existing in the two parts of the tweet, when it is split tweet. WordNet (Miller, 1995) antonym re- by a delimiter. In this case, adverbs and lation was used for that. punctuation marks were used as delimiters. • Lexical Ambiguity: Three different features • MeanPolarityPhrase: It is the mean of the were considered using WordNet: the first one polarities of the words that belong to quotes. is the mean of the number of synsets of each • PolarityStandardDeviation: It is the standard word. The second one is the greatest number deviation of the polarities of the words that of synsets that has a single word. 
The last is 3 http://wndomains.fbk.eu/hierarchy.html 4 the difference between the number of synsets http://www.adampease.org/OP/ belong to quotes. SVM) are a modification of the CNN-LSTM • PresentPastPolarityContrast: It computes model, in this case we change the softmax layer the polarity contrast between the parts of the at the output of the model and use a Linear Sup- tweet written in present and past tense. port Vector Machine (SVM) with default parame- • SkipGramPolarityRate:It computes the rate ters as final classifier. Run3-c and run3-u (CNN- among skip-grams with polarity contrast and LSTM-LING) represent the original introduced all valid skip-grams. The valid skip-grams model without any variations. Finally, for run4-c are those composed by two words (nouns, and run4-u (CNN-LSTM-LING-SVM) we change adjectives, verbs, adverbs) with skip=1. the softmax layer by a linear SVM as final classi- The skip-grams with polarity opposition are fier. For unconstrained runs, we include the ironic those that match with the patterns positive- tweets provided by the corpus Twittirò (Cignarella negative, positive-neutral, negative-neutral, et al., 2018b), to the official training set releases and vise versa. by the IronITA organizers. • CapitalLetterTextPolarityContrast: It com- putes the polarity contrast between capital- Analyzing Table 1, several observations can be ized words and the rest of the words in the made. Firstly, unconstrained runs achieved bet- tweets. ter results than constrained ones. These results reveal that introducing more ironic examples im- 3 Experiments and Results proves the performance of the UO IRO. Secondly, In this section we show the results of the pro- the results achieved with the variants that consider posed model in the shared task of “Irony Detec- the linguistic knowledge (run3-c, run4-c, run3-u tion” and discuss them. In a first experiment we and run4-u) obtain an increase in the effectiveness. 
analyze the performance of four variants of our With respect to the strategy used for the final clas- model using 10 fold cross-validation strategy on sification of the tweets, generally, those variants the training set. Also, each variant was running that use SVM obtain a slight drop in the AVG-F1 . in unconstrained and constrained setting, respec- tively. In Table 1, we summarize the obtained Regarding the official results, we submitted results in terms of F1 measure macro averaged four runs, two for constrained setting (RUN1-c (F1-AVG). Specifically, we rely on the macro for and RUN2-c) and two for unconstrained setting preventing systems biased towards the most popu- (RUN3-u and RUN4-u). For the unconstrained lated classes. variants of the UO IRO, the tweets provided by the corpus Twittirò were also used with the train- ing set. Taking into account the results of the Table Table 1: Results obtained by UO IRO on the train- 1 we select to CNN-LSTM-LING (RUN1-c and ing set by using 10-fold cross-validation. RUN3-u) and CNN-LSTM-LING-SVM (RUN2-c Run Model AVG-F1 and RUN4-u) as the most promising variants of the Constrained model for evaluating in the official test set. run1-c CNN-LSTM 0.7019 run2-c CNN-LSTM-SVM 0.6927 As can be observed in Table 2, our four runs run3-c CNN-LSTM-LING 0.7124 were ranked 12th , 13th , 14th and 15th from a to- run4-c CNN-LSTM-LING-SVM 0.7040 tal of 17 submissions. The unconstrained variants Unconstrained of the UO IRO achieved better results than con- run1-u CNN-LSTM 0.7860 strained ones. Contrary to the results shown in the run2-u CNN-LSTM-SVM 0.7900 Table 1, the runs that use SVM as final classifica- run3-u CNN-LSTM-LING 0.8226 tion strategy (RUN2-c and RUN4-u) were better run4-u CNN-LSTM-LING-SVM 0.8207 ranked than the other ones. 
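To make the polarity-contrast family concrete, here is a small sketch of the WordPolarityContrast feature in the spirit of the description above. The toy lexicon, the function name, and the choice to return the token distance alongside the polarity difference are our own assumptions; the system itself relies on the Italian polarity lexicon of Basile and Nissim (2013) and the exact formulation of Peña et al. (2018).

```python
# Toy polarity lexicon for illustration only; the real feature uses the
# Italian polarity lexicon of Basile and Nissim (2013), scores in [-1, 1].
LEXICON = {"adoro": 0.9, "bello": 0.7, "odio": -0.9, "terribile": -0.8}

def word_polarity_contrast(tokens):
    """Return (polarity difference, token distance) between the most positive
    and the most negative word of the tweet, or (0.0, 0) if no word is
    covered by the lexicon. How the two values are combined into a single
    feature value is left open here, since the paper does not specify it."""
    scored = [(i, LEXICON[t]) for i, t in enumerate(tokens) if t in LEXICON]
    if not scored:
        return (0.0, 0)
    i_max, p_max = max(scored, key=lambda s: s[1])
    i_min, p_min = min(scored, key=lambda s: s[1])
    return (p_max - p_min, abs(i_max - i_min))
```

For a tweet like "adoro questo tempo terribile", the feature pairs a strong positive and a strong negative word, which is exactly the kind of explicit contrast this feature family is meant to surface.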
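The macro-averaged F1 reported in the tables is the unweighted mean of the per-class F1 scores, which is why it does not reward systems biased towards the majority class. A minimal two-class implementation (function names are ours):

```python
def f1_per_class(gold, pred, label):
    """F1 of one class: harmonic mean of its precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, labels=("ironic", "not_ironic")):
    """Unweighted mean of per-class F1 (the AVG-F1 of Tables 1 and 2)."""
    return sum(f1_per_class(gold, pred, l) for l in labels) / len(labels)
```

Because both classes weigh equally, a classifier that always predicts the majority class scores 0 on the minority class and is capped at half of that class's F1, regardless of the class distribution.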
We think that the better ranking of the SVM-based runs may be explained by the softmax classifier (the last layer of UO IRO) being more sensitive to over-fitting than Support Vector Machines. Notice that, in all cases, our model surpasses the two baseline methods established by the organizers.

Table 2: Official results for the Irony Detection subtask.

Rank   Runs    F1-I   F1-noI  Avg-F1
12/17  RUN4-u  0.700  0.603   0.651
13/17  RUN3-u  0.665  0.626   0.646
14/17  RUN2-c  0.678  0.579   0.629
15/17  RUN1-c  0.577  0.652   0.614

4 Conclusions

In this paper we presented the UO IRO system for the task of Irony Detection in Italian Tweets (IronITA) at EVALITA 2018. We participated in the "Irony classification" subtask, and our best submission ranked 12th out of 17. Our proposal combines an attention-based Long Short-Term Memory network, a Convolutional Neural Network, and linguistic information which is incorporated through the second-to-last hidden layer of the model. The results show that considering linguistic features in combination with the deep representations learned by the neural network models yields better effectiveness in terms of F1-measure. The results achieved by our system are interesting; however, a more fine-tuned hyper-parameter setting is required to improve the model's effectiveness. We think that including the linguistic features of irony in the first layers of the model could be a way to increase the effectiveness, and we would like to explore this approach in future work. We also plan to analyze how affective information flows through tweets, and how it impacts the realization of irony.

References

Salvatore Attardo. 2000. Irony as relevant inappropriateness. Journal of Pragmatics, 32(6):793–826.

Francesco Barbieri, Horacio Saggion, and Francesco Ronzano. 2014. Modelling Sarcasm in Twitter, a Novel Approach. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 136–141.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the ACL, 5:135–146.

Alessandra Teresa Cignarella, Simona Frenda, Valerio Basile, Cristina Bosco, Viviana Patti, and Paolo Rosso. 2018a. Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA). In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Alessandra Teresa Cignarella, Cristina Bosco, Viviana Patti, and Mirko Lai. 2018b. Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pages 4204–4211.

Andrea Cimino and Felice Dell'Orletta. 2016. Tandem LSTM-SVM Approach for Sentiment Analysis. In CLiC-it/EVALITA 2016, pages 1–6. CEUR-WS.org.

Jan Deriu and Mark Cieliebak. 2016. Sentiment Analysis using Convolutional Neural Networks with Multi-Task Training and Distant Supervision on Italian Tweets. In CLiC-it/EVALITA 2016, pages 1–5. CEUR-WS.org.

Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. 2016. Irony Detection in Twitter. ACM Transactions on Internet Technology, 16(3):1–24.

Delia Irazú Hernández Farías, Viviana Patti, and Paolo Rosso. 2018. ValenTO at SemEval-2018 Task 3: Exploring the Role of Affective Content for Detecting Irony in English Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 643–648. Association for Computational Linguistics.

José Ángel González, Lluís-F. Hurtado, and Ferran Pla. 2018. ELiRF-UPV at SemEval-2018 Tasks 1 and 3: Affect and Irony Detection in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 565–569.

Raj Kumar Gupta and Yinping Yang. 2017. CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 626–633, Vancouver, Canada. Association for Computational Linguistics.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Diana Maynard and Mark A. Greenwood. 2014. Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Reynier Ortega-Bueno, Carlos E. Mu, and Paolo Rosso. 2018. UO UPV: Deep Linguistic Humor Detection in Spanish Social Media. In Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), pages 1–11.

Lluís Padró and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards Wider Multilinguality. In Proceedings of LREC 2012.

Anakarla Sotolongo Peña, Leticia Arco García, and Adrián Rodríguez Dosina. 2018. Detección de ironía en textos cortos enfocada a la minería de opinión. In IV Conferencia Internacional en Ciencias Computacionales e Informáticas (CICCI 2018), pages 1–10, Havana, Cuba.

Bo Peng, Jin Wang, and Xuejie Zhang. 2018. YNU-HPCC at SemEval-2018 Task 3: Ensemble Neural Network Models for Irony Detection on Twitter. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 622–627. Association for Computational Linguistics.

Harsh Rangwani, Devang Kulshreshtha, and Anil Kumar Singh. 2018. NLPRL-IITBHU at SemEval-2018 Task 3: Combining Linguistic Features and Emoji Pre-trained CNN for Irony Detection in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 638–642. Association for Computational Linguistics.

Antonio Reyes and Paolo Rosso. 2014. On the difficulty of automatically detecting irony: beyond a simple case of negation. Knowledge and Information Systems, 40(3):595–614.

Antonio Reyes, Paolo Rosso, and Davide Buscaldi. 2012. From humor recognition to irony detection: The figurative language of social media. Data and Knowledge Engineering, 74:1–12.

Antonio Reyes, Paolo Rosso, and Tony Veale. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1):239–268.

Deirdre Wilson and Dan Sperber. 1992. On verbal irony. Lingua, 87(1):53–76.

Chuhan Wu, Fangzhao Wu, Sixing Wu, Junxin Liu, Zhigang Yuan, and Yongfeng Huang. 2018. THU NGN at SemEval-2018 Task 3: Tweet Irony Detection with Densely Connected LSTM and Multi-task Learning. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), pages 51–56. Association for Computational Linguistics.