Computational rule-based model for Irony Detection in Italian Tweets

                                              Simona Frenda
                                    FICLIT - University of Bologna, Italy
                                     simona.frenda@gmail.com


Abstract

English. In the domain of Natural Language Processing (NLP), interest in figurative language has grown, especially in the last few years, thanks to the amount of linguistic data provided by the web and social networks. Figurative language gives words a non-literal sense, so utterances require several interpretations that disclose the play of signification. Identifying the different levels of meaning involved in detecting ironic texts requires a computational model suited to the complexity of this rhetorical device. In this paper we describe our rule-based system for irony detection as it was presented at the SENTIPOLC task of EVALITA 2016, where we ranked third out of twelve participants.

Italiano. Nell’ambito del Natural Language Processing (NLP) l’interesse per il linguaggio figurativo è particolarmente aumentato negli ultimi anni, grazie alla quantità d’informazione linguistica messa a disposizione dal web e dai social network. Il linguaggio figurativo conferisce alle parole un senso che va oltre quello letterale, pertanto gli enunciati richiedono interpretazioni plurivoche che possano svelare i giochi di significato del discorso. Nel caso specifico del riconoscimento automatico di un testo ironico, infatti, determinare la presenza di diversi gradi di significazione esige un modello computazionale adeguato alla complessità dell’artificio retorico. In questo articolo descriviamo il nostro sistema “rule-based” dedito al riconoscimento dell’ironia che ha partecipato al task SENTIPOLC di EVALITA 2016, nel quale ci siamo classificati terzi su dodici partecipanti.

1    Introduction

The amount of text available on the web, and especially on social networks, has become a rich source of linguistic information, particularly for Sentiment Analysis. For instance, on Twitter, where the length of tweets is limited (140 characters), users are encouraged to use creative devices to communicate their opinions. In particular, they express their emotions or feelings through morphosyntactic elements or conventional expedients such as emoticons, hashtags and heavy punctuation. These elements seem to act as substitutes for the gestures and tones typical of oral communication. In this research we used linguistic features frequently found in ironic tweets as reference points for creating the rules of our irony detection system for Italian tweets.

The results we obtained are promising and show that the features considered can be good clues for identifying ironic texts.

In the following section we briefly describe the state of the art in irony detection. In the third and fourth sections we present our approach, describing the linguistic resources used and the data processing. The fifth section describes the linguistic features, and finally the sixth section presents the results obtained in the SENTIPOLC evaluation.

2    Related Work

Despite the difficulties of this research, the literature shows a clear effort to understand this linguistic phenomenon and to develop computational models to detect or generate irony.

In the 1990s, Lessard and Levison (1992, 1993)¹ and Binsted and Ritchie (1994, 1997)² developed the first joke generators, and more recently Stock and Strapparava (2006) built HAHAcronym, a system designed to generate and re-analyze acronyms according to criteria of semantic opposition and rhythm.

¹ Ritchie (2009: 73).
² Ritchie (2009: 73).

The research described by Utsumi (1996) was one of the first approaches to automatic irony processing, even though it was too abstract for a computational framework. In 2009, Veale and Hao noted that English figurative comparisons
(as X as Y) are often used to express ironic opinions, especially when the marker “about” is present (about as X as Y). More recently, Reyes et al. (2013) produced a multidimensional model for detecting irony on Twitter based on four conceptual features: signatures (pointedness, counter-factuality, and temporal compression), unexpectedness (temporal imbalance and contextual imbalance), style, and emotional scenarios (activation, imagery, and pleasantness, as described by Whissel, 2009³). Barbieri and Saggion (2014) proposed a model based on seven sets of lexical and semantic features of the words in a tweet: frequency, written-spoken style, intensity of adverbs and adjectives, structure (punctuation, length, emoticons), sentiments, synonyms and ambiguity.

Karoui et al. (2015) focused on the presence of negation markers as well as on both implicit and explicit opposition in French ironic tweets. Moreover, their research highlights the importance of surface traits in ironic texts, such as punctuation marks (González-Ibáñez et al., 2011), sequences or combinations of exclamation and question marks (Carvalho et al., 2009; Buschmeier et al., 2014), tweet length (Davidov et al., 2010), interjections (González-Ibáñez et al., 2011), words in capital letters (Reyes et al., 2013), emoticons (Buschmeier et al., 2014), quotations (Tsur et al., 2010)⁴, slang words (Burfoot and Baldwin, 2009)⁵ and opposition words such as “but” or “although” (Utsumi, 2004)⁶.

Carvalho et al. (2009) distinguished eight “clues” for irony detection in comments (each consisting of about four sentences) from a Portuguese online newspaper. They focused on positive comments because in previous research they had shown that positive sentences are more subject to irony and that their true polarity is more difficult to recognize. The idea is therefore to identify irony in apparently positive sentences, which are required to contain at least one positive adjective or noun in a window of four words. Carvalho et al. (2009) based their model, on the one hand, on oral and gestural “clues” of irony, such as emoticons, heavy punctuation, quotation marks, onomatopoeic expressions for laughter and positive interjections, and, on the other hand, on specific morphosyntactic constructions, such as the diminutive form of a named entity (NE), demonstrative determiners before an NE, and the pronoun “tu” explicitly used or embedded in the morphology of the verb “ser”.

Our work proposes an adaptation of some of these clues, supplemented by other surface features, to irony detection in Italian tweets.

³ Reyes et al. (2013: 249).
⁴ Karoui et al. (2015).
⁵ Karoui et al. (2015).
⁶ Karoui et al. (2015).

3    Methodology

Approaching the detection of irony in tweets means understanding how people, especially net users, produce irony. We approach this difficult task by analyzing the corpus of tweets and identifying possible ironic clues. Once identified, the surface features common to ironic tweets are inserted as binary rules in our system.

Our rule-based system, written in Perl, finds ironic features (described in section 5) in tweets and consequently distinguishes ironic tweets from non-ironic ones.

In the following sections we describe the resources used, the data processing, the ironic clues and the results obtained in the EVALITA 2016 SENTIPOLC task.

4    Analysis of corpus

For this research we used a corpus of tweets provided by the SENTIPOLC organizers (Barbieri et al., 2016). This training set is composed of 7410 tweets labeled according to the criteria of subjectivity, overall and literal polarity (positive/neutral/negative/mixed), irony and political topic.

4.1    Resources

For the analysis and processing of Italian tweets we used some linguistic resources available online, such as:

      •    Sentiment Lexicon LOD (Linked Open Data). Developed by the Institute for Computational Linguistics “A. Zampolli”, it contains 24,293 lexical entries annotated with positive/negative/neutral polarity.

      •    Morph-it! (Zanchetta and Baroni, 2005). It is a lexicon of the inflected forms of 34,968 lemmas (extracted from the corpus of “La Repubblica”) with their morphological features.

A tweet contains several elements that are essential for linguistic analysis, such as interjections and emoticons. We therefore developed a lexicon of interjections and a list of emoticons, briefly described below:
      •    The interjections, extracted from Morph-it! and Treccani⁷, are manually annotated with their polarity. The annotation was carried out with the support of the Vocabolario Treccani, while the sentiment lexicon was used to label improper interjections (see Table 1).

      •    The emoticons, extracted from Wikipedia, are subdivided into EMOPOS, EMONEG and EMOIRO, according to the classification of Di Gennaro et al. (2014) and the Wikipedia description⁸, the latter especially for the ironic annotation (see Table 2).

Positive           Negative    Neutral
evviva             mah         boh
urrà               macché      mhm
complimenti        bah         chissà
congratulazioni    puah        beh

Table 1: Example of annotated lexicon of interjections.

Label     Emoticon
EMOPOS    =) =] :D (-: [-: (-; [-; :-> :) :-) (; ;)
EMONEG    :[ =( :-( :'( :-/ :/ :-> :\> :/ =/ =\ :L =L :S
EMOIRO    ^^ ^.^ :P xP ^3^ ^L^ ^_^ ^-^ ^w^

Table 2: Example of annotated list of emoticons.

⁷ http://www.treccani.it
⁸ Wikipedia version of the 6th of June.

4.2    Data Processing

The input file processed by our system is first lemmatized and annotated with part-of-speech tags by TreeTagger (Schmid, 1994), using the Italian tagset provided by Baroni.

Before tagging, however, we applied rules that substitute or eliminate some textual elements, in order to clean up the texts and avoid hampering TreeTagger's POS tagging and lemmatization. In particular:

      •    the label EMOPOS replaces positive emoticons;

      •    the label EMONEG replaces negative emoticons;

      •    the label EMOIRO replaces ironic emoticons;

      •    the characters of URLs are removed.

This method allows us to clean the texts of characters that may hinder the analysis of the data and the retrieval of ironic clues.

5    Features

In section 2 we presented the research of Carvalho et al. (2009), which showed that the most productive patterns (with a precision from 45% to 85%) are those related to orality and gesture, such as emoticons or expressions for laughter. Based on this analysis, we try to recognize ironic tweets with a system designed to find ironic clues in the texts. Some of these clues are adapted from Portuguese to Italian, while other features were identified during the analysis of the tweets.

All of these features are used as binary rules in our system to classify the texts as ironic or non-ironic.

5.1    Positive Interjections

Ameka (1992)⁹ describes interjections as “relatively conventionalized vocal gestures which express a speaker’s mental state, action or attitude or reaction to a situation”. These linguistic elements are used as simple ways to communicate the user’s feelings or moods.

⁹ Lindbladh (2015: 1).

In previous research, interjections have been presented as good clues for humor. Kreuz and Caucci (2007) tried to determine whether specific lexical factors might suggest the interpretation of a statement as sarcastic. They showed experimentally that the presence of interjections is a good predictor for readers. They provided a group of students with extracts from various works, some of which originally contained the word “sarcastically”. Thanks to the interjections, students were able to correctly classify the extracts from which the word “sarcastically” had been deleted.

Carvalho et al. (2009) noted that positive interjections are very often used ironically in apparently positive utterances.

Taking this previous research into consideration, we consider improper and proper interjections annotated with positive polarity (see Table 1 in section 4.1). Improper interjections are
usually followed by exclamation or question marks, which suggest a rising intonation (“sicuro!”, “sure!”), whereas proper ones (or onomatopoeic expressions) are sometimes added to the phrase without any punctuation (“ah dimenticavo”, “ah comunque”; “ah, I forgot”, “ah, anyway”).

5.2    Expressions with “che”

The adjective or pronoun “che” can be used with exclamatory intention in expressions such as “che ridere”, “che educato”, “che sorpresa” (“what a laugh”, “how polite”, “what a surprise”). Like interjections, these expressions are used as markers of the user’s emotions and ironic intent.

5.3    Pronoun “tu” and Verb Morphology

The use of the pronoun “tu” and of the corresponding inflected forms of the verb “essere” expresses a high degree of proximity between the user and the person referred to (Carvalho et al., 2009). For instance, if this person is a popular politician, this degree of familiarity is fake or artificial and is usually used ironically in tweets.

5.4    Disjunctive Conjunction

In the training set we noted that disjunctive conjunctions (“o”, “oppure”) are used to introduce an alternative between two propositions or concepts which may belong to very different semantic domains (for example: In televisione stamattina: i cartoni animati o Mario Monti.[…], “On television this morning: cartoons or Mario Monti”). This strange combination of ideas surprises readers and suggests to them a possible ironic interpretation of the message.

5.5    Onomatopoeic Expressions for laughter

Onomatopoeic expressions for laughter (the most common are “ahah”, “hehe” and “ihih”) are usually used in humorous texts (Carvalho et al., 2009; Buschmeier et al., 2014), along with their variants (in capital letters or with repetitions). They act as markers that inform the reader about the user’s mood and also suggest that the tweet must be interpreted in a figurative sense.

5.6    Ironic Emoticons

Users employ emoticons to show their facial expressions as well as their emotions in texts. Tavosanis (2010) presents a macro-classification of emoticons: expressive, decorative/pleasant, and emoticons of morphosyntactic substitution, which stand for a word or a whole phrase.

In our research we only consider expressive emoticons, which add information about the user’s mood. In particular, we focus on ironic emoticons, those which express a joking or ironic intention (see section 4.1). We have distinguished EMOIRO from EMOPOS because positive emoticons (considered in Carvalho et al., 2009 and González-Ibáñez et al., 2011) are frequently used to express a humorous intention that is not specifically ironic.

5.7    Hashtag

The hashtag is a special element in the syntax of tweets, used to connect tweets containing the same keywords (which may be a part of speech) or phrases, such as #mobbastaveramenteperò.

Through hashtags, users communicate information about events, the people they refer to and the topic of the message. We focus on hashtags that may suggest an ironic connotation of the message to readers, such as #lol and #ironia, and on others that we extracted from ironic tweets in the training set: #stranezze, #Ahahahahah, #benecosì, etc.

5.8    Regional Expressions

Regional expressions seem to be used in ironic texts to underline the user’s own mood and emotions. In particular, common constructions deriving from local use include “annamo bene”, “namo bene” and “ce” followed by a verb (e.g. “ce vuole”, “ce sta”, “ce potrebbe”), as in this ironic tweet: “@zdizoro t'appassionerà sapè che nel prossimo governo #Monti ce potrebbe rimanè MaryStar Gelmini, come n'incrostazione”.

5.9    Quotation Marks

We focus on the use of quotation marks as a sign for readers to interpret the content of the text non-literally. In fact, on social networks these elements are frequently used to underline the possible different meanings of the word between quotation marks and to emphasize the ironic content.

5.10    Heavy Punctuation

In web communication, punctuation plays an important role in the expression of emotions and feelings. Several studies (González-Ibáñez et al., 2011; Kreuz and Caucci, 2007; Carvalho et al., 2009; Buschmeier et al., 2014; Davidov et al., 2010; Karoui et al., 2015) considered punctuation as a surface feature that signals humorous texts. In particular, we focus on combinations of question and exclamation marks for irony detection.

6    Results

Our system was evaluated on the official SENTIPOLC test data, composed of 3000 tweets, and the values of precision, recall and average F-score were calculated using the evaluation tool provided by the organizers (Barbieri et al., 2016). As Table 3 shows, the official results of our system are promising, although our research in this domain needs further improvement.

Rank    F-score
1       0.548
2       0.5412
3       0.5251
4       0.5162
5       0.5133
6       0.4992
7       0.4961
8       0.4872
9       0.481
10      0.4761
11      0.4728
12      0.4725

Table 3: Official results and ranking of Irony Detection sub-task.

7    Conclusion

In this paper we have described our computational model based on linguistic features which have proven to be good clues for the identification of ironic texts. Nonetheless, in future work we plan to examine in depth semantic inconsistencies and ambiguities, amusing wordplay and rhymes that may surprise the reader. In conclusion, we think that good irony detection is possible if all the levels of linguistic analysis are considered.

References

Francesco Barbieri and Horacio Saggion. 2014. Modelling Irony in Twitter, Features Analysis and Evaluation. Language Resources and Evaluation conference, LREC. Reykjavik, Iceland.
Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTIment POLarity Classification Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro and Rachele Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).
Konstantin Buschmeier, Philipp Cimiano, and Roman Klinger. 2014. An impact analysis of features in a classification approach to irony detection in product reviews. Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Baltimore, Maryland, USA. 42–49.
Paula Carvalho, Luís Sarmento, Mário J. Silva and Eugénio De Oliveira. 2009. Clues for detecting irony in user-generated contents: Oh...!! it’s “so easy” ;-). Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion. ACM. 53–56.
Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. Proceedings of the Fourteenth Conference on Computational Natural Language Learning, CoNLL ’10. Stroudsburg, PA, USA. Association for Computational Linguistics. 107–116.
Pierluigi Di Gennaro, Arianna Rossi and Fabio Tamburini. 2014. The FICLIT+CS@UniBO System at the EVALITA 2014 Sentiment Polarity Classification Task. Proceedings of the Fourth International Workshop EVALITA 2014. Pisa University Press.
Roberto González-Ibáñez, Smaranda Muresan and Nina Wacholder. 2011. Identifying Sarcasm in Twitter: A Closer Look. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: short papers. Portland, Oregon. 581–586.
Jihen Karoui, Farah Benamara Zitoune, Véronique Moriceau, Nathalie Aussenac-Gilles and Lamia Hadrich Belguith. 2015. Détection automatique de l’ironie dans les tweets en français. 22ème Traitement Automatique des Langues Naturelles. Caen.
Roger J. Kreuz and Gina M. Caucci. 2007. Lexical Influences on the Perception of Sarcasm. Proceedings of the Workshop on Computational Approaches to Figurative Language. Rochester, NY. 1–4.
Sara Lindbladh. 2015. La semantica e pragmatica dei segnali discorsivi italiani – un confronto tra bene,
  va bene, be’ e va be’. Seminarium 27 oktober. Università di Uppsala, Sweden.
Antonio Reyes, Paolo Rosso and Tony Veale. 2013. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation. 47: 239–268.
Graeme Ritchie. 2009. Can computers create humor? AI Magazine. Volume 30, No. 3. 71–81.
Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of International Conference on New Methods in Language Processing, Manchester, UK.
Oliviero Stock and Carlo Strapparava. 2006. Laughing with HAHAcronym, a computational humor system. Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), Boston, Massachusetts.
Mirko Tavosanis. 2010. L’italiano del web. Carocci. Roma.
Akira Utsumi. 1996. A unified theory of irony and its computational formalization. Proceedings of the 16th conference on computational linguistics. Association for Computational Linguistics. Morristown, NJ. 962–967.
Tony Veale and Yanfen Hao. 2009. Support structures for linguistic creativity: A computational analysis of creative irony in similes. Proceedings of CogSci 2009, the 31st annual meeting of the cognitive science society. 1376–1381.
Eros Zanchetta and Marco Baroni. 2005. Morph-it! A free corpus-based morphological resource for the Italian language. Proceedings of Corpus Linguistics 2005. University of Birmingham, Birmingham, UK.
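
As an illustration of the preprocessing in section 4.2 and of how a few of the surface features in section 5 could be expressed as binary rules, the following is a minimal Python sketch. It is not the system described in this paper, which is written in Perl and whose rules are not reproduced here. The function names, the regular expressions, the shortened emoticon and hashtag lists (taken from Table 2 and section 5.7), the choice to remove URLs before replacing emoticons, and the decision that a single matching rule marks a tweet as ironic are all assumptions made for the example.

```python
import re
from typing import Dict

# Short excerpts from Table 2; the complete lists are not given in the paper.
EMOTICONS = {
    "EMOPOS": ["=)", "=]", ":D", ":)", ":-)", ";)"],
    "EMONEG": [":[", "=(", ":-(", ":'(", ":-/", ":S"],
    "EMOIRO": ["^^", "^.^", ":P", "xP", "^_^", "^-^"],
}

# Hashtags named in section 5.7, lowercased for comparison.
IRONIC_HASHTAGS = {"#lol", "#ironia", "#stranezze", "#ahahahahah", "#benecosì"}

# Hypothetical patterns: the paper does not publish its actual rules.
URL_RE = re.compile(r"https?://\S+")
LAUGHTER_RE = re.compile(
    r"\b(?:(?:ah){2,}a?|(?:ha){2,}h?|(?:he){2,}h?|(?:eh){2,}e?|(?:ih){2,}i?|(?:hi){2,}h?)\b",
    re.IGNORECASE,
)
HEAVY_PUNCT_RE = re.compile(r"\?!|!\?")            # mixed question/exclamation marks
QUOTE_RE = re.compile(r'["“«][^"“”«»]+["”»]')      # a quoted word or phrase
CHE_RE = re.compile(r"\bche\s+(?:ridere|educato|sorpresa)\b", re.IGNORECASE)


def preprocess(tweet: str) -> str:
    """Remove URLs, then replace emoticons with their class label."""
    # URLs go first (an assumption): "http://" contains ":/", which Table 2
    # lists as a negative emoticon.
    text = URL_RE.sub("", tweet)
    pairs = [(emo, label) for label, emos in EMOTICONS.items() for emo in emos]
    # Longest emoticons first, so a shorter one never splits a longer one.
    for emo, label in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(emo, f" {label} ")
    return " ".join(text.split())


def ironic_clues(tweet: str) -> Dict[str, bool]:
    """Evaluate each surface feature as a binary rule."""
    text = preprocess(tweet)
    hashtags = set(re.findall(r"#\w+", text.lower()))
    return {
        "ironic_emoticon": "EMOIRO" in text.split(),
        "laughter": bool(LAUGHTER_RE.search(text)),
        "heavy_punctuation": bool(HEAVY_PUNCT_RE.search(text)),
        "quotation_marks": bool(QUOTE_RE.search(text)),
        "che_expression": bool(CHE_RE.search(text)),
        "ironic_hashtag": bool(hashtags & IRONIC_HASHTAGS),
    }


def is_ironic(tweet: str) -> bool:
    """Assumed combination: a tweet is ironic if any rule fires."""
    return any(ironic_clues(tweet).values())


if __name__ == "__main__":
    print(preprocess("Che sorpresa :P http://t.co/abc"))  # Che sorpresa EMOIRO
    print(is_ironic("Oggi piove a Bologna."))             # False
```

The sketch covers only six of the ten features. The remaining ones (positive interjections, the pronoun “tu” with verb morphology, disjunctive conjunctions and regional expressions) depend on the interjection lexicon or on the TreeTagger annotation, and are left out rather than guessed at.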