Exploiting Emotive Features for the Sentiment Polarity Classification of Tweets

Lucia C. Passaro, Alessandro Bondielli and Alessandro Lenci
CoLing Lab, Dipartimento di Filologia, Letteratura e Linguistica, University of Pisa (Italy)
lucia.passaro@for.unipi.it, alessandro.bondielli@gmail.com, alessandro.lenci@unipi.it

Abstract

English. This paper describes the CoLing Lab system for the participation in the constrained run of the EVALITA 2016 SENTIment POLarity Classification Task (Barbieri et al., 2016). The system extends the approach in (Passaro et al., 2014) with emotive features extracted from ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) and FB-NEWS15 (Passaro et al., 2016).

Italiano. Questo articolo descrive il sistema sviluppato all'interno del CoLing Lab per la partecipazione al task di EVALITA 2016 SENTIment POLarity Classification (Barbieri et al., 2016). Il sistema estende l'approccio descritto in (Passaro et al., 2014) con una serie di feature emotive estratte da ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) e FB-NEWS15 (Passaro et al., 2016).

1 Introduction

Social media and microblogging services are extensively used for rather different purposes, from news reading to news spreading, from entertainment to marketing. As a consequence, the study of how sentiments and emotions are expressed in such platforms, and the development of methods to automatically identify them, has emerged as a great area of interest in the Natural Language Processing community. Twitter presents many linguistic and communicative peculiarities. A tweet, in fact, is a short informal text (140 characters) in which the frequency of creative punctuation, emoticons, slang, specific terminology, abbreviations, links and hashtags is higher than in other domains and platforms. Twitter users post messages from many different media, including their smartphones, and they "tweet" about a great variety of topics, unlike what can be observed in other sites, which appear to be tailored to a specific group of topics (Go et al., 2009).

The paper is organized as follows: Section 2 describes the architecture of the system, as well as the pre-processing and the features designed in (Passaro et al., 2014). Section 3 presents the additional features extracted from the emotive VSMs and from LDA. Section 4 describes the classification paradigm, and the last sections are left for results and conclusions.

2 Description of the system

The system extends the approach in (Passaro et al., 2014) with emotive features extracted from ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) and FB-NEWS15 (Passaro et al., 2016). The main goal of the work is to evaluate the contribution of a distributional affective resource to estimate the valence of words. The CoLing Lab system for polarity classification includes the following basic steps: (i) a preprocessing phase, to separate linguistic and nonlinguistic elements in the target tweets; (ii) a feature extraction phase, in which the relevant characteristics of the tweets are identified; (iii) a classification phase, based on a Support Vector Machine (SVM) classifier with a linear kernel.
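The three steps above can be sketched as a simple pipeline. The snippet below is a minimal illustration only: the `preprocess`, `extract_features` and `classify` helpers are hypothetical stand-ins (the actual system uses the rule-based normalizer, the full feature inventory described below, and Weka's SMO linear-kernel SVM, not this toy decision rule):

```python
import re

def preprocess(tweet):
    """Hypothetical stand-in for the rule-based normalization phase:
    removes links, strips '#' and '@' symbols, lowercases the rest."""
    tweet = re.sub(r"https?://\S+", "", tweet)  # links are removed
    tweet = re.sub(r"[#@]", "", tweet)          # hashtags/usernames normalized
    return tweet.lower().strip()

def extract_features(text):
    """Hypothetical feature vector; the real system combines lexical,
    negation, morphological, shallow, Twitter and emotive features."""
    return [len(text.split()), text.count("!"), text.count("?")]

def classify(features, weights, bias=0.0):
    """Placeholder linear decision rule standing in for the linear-kernel
    SVM (Weka SMO) used by the system."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if score > 0 else 0

feats = extract_features(preprocess("Che bella giornata! #felice http://t.co/x"))
label = classify(feats, weights=[0.1, 0.5, -0.5])
```

The point of the sketch is the separation of concerns: normalization, feature extraction and classification are independent stages, so each feature group described below can be added or ablated without touching the rest.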
2.1 Preprocessing

The aim of the preprocessing phase is the identification of the linguistic and nonlinguistic elements in the tweets and their annotation. While the preprocessing of nonlinguistic elements such as links and emoticons is limited to their identification and classification (cf. Section 2.2.4), the treatment of the linguistic material required the development of a dedicated rule-based procedure, whose output is a normalized text that is subsequently fed to a pipeline of general-purpose linguistic annotation tools. The following rules have been applied in the linguistic preprocessing phase:

• Emphasis: tokens presenting repeated characters like bastaaaa "stooooop" are replaced by their most probable standardized forms (i.e. basta "stop");
• Links and emoticons: they are identified and removed;
• Punctuation: linguistically irrelevant punctuation marks are removed;
• Usernames: the users cited in a tweet are identified and normalized by removing the @ symbol and capitalizing the entity name;
• Hashtags: they are identified and normalized by simply removing the # symbol.

The output of this phase are linguistically standardized tweets, which are subsequently POS-tagged with the Part-of-Speech tagger described in (Dell'Orletta, 2009) and dependency-parsed with the DeSR parser (Attardi et al., 2009).

2.2 Feature extraction

The inventory of features can be organized into six classes. The five classes of features described in this section were designed in 2014; the sixth class, described in the next section, covers the emotive and LDA features.

2.2.1 Lexical features

Lexical features represent the occurrence of bad words or of words that are either highly emotional or highly polarized. Relevant lemmas were identified from two in-house built lexicons (cf. below) and from Sentix (Basile and Nissim, 2013), a lexicon of sentiment-annotated Italian words. Lexical features include:

ItEM seeds: a lexicon of 347 highly emotional Italian words built by exploiting an online feature elicitation paradigm (Passaro et al., 2015). The features are, for each emotion, the total count of strongly emotional tokens in each tweet.

Bad words lexicon: by exploiting an in-house built lexicon of common Italian bad words, we reported, for each tweet, the frequency of bad words belonging to a selected list, as well as the total amount of these lemmas.

Sentix: Sentix (Sentiment Italian Lexicon: (Basile and Nissim, 2013)) is a lexicon for Sentiment Analysis in which 59,742 lemmas are annotated for their polarity and intensity, among other information. Polarity scores range from −1 (totally negative) to 1 (totally positive), while Intensity scores range from 0 (totally neutral) to 1 (totally polarized). Both these scores appear informative for the classification, so we derived, for each lemma, a combined score Cscore calculated as follows:

    Cscore = Intensity × Polarity    (1)

Depending on their Cscore, the selected lemmas have been organized into several groups:

• strongly positive: 1 ≥ Cscore > 0.25
• weakly positive: 0.25 ≥ Cscore > 0.125
• neutral: 0.125 ≥ Cscore ≥ −0.125
• weakly negative: −0.125 > Cscore ≥ −0.25
• strongly negative: −0.25 > Cscore ≥ −1

Since Sentix relies on WordNet sense distinctions, it is not uncommon for a lemma to be associated with more than one ⟨Intensity, Polarity⟩ pair, and consequently with more than one Cscore. In order to handle this phenomenon, the lemmas have been split into three ambiguity classes: lemmas with only one entry, or whose entries are all associated with the same Cscore value, are marked as "Unambiguous" and associated with their Cscore. Ambiguous cases were treated by inspecting, for each lemma, the distribution of the associated Cscores: lemmas which had a Majority Vote (MV) were marked as "Inferable" and associated with the Cscore of the MV. If there was no MV, lemmas were marked as "Ambiguous" and associated with the mean of the Cscores. To isolate a reliable set of polarized words, we focused only on the Unambiguous or Inferable lemmas and selected only the 250 topmost frequent according to the PAISÀ corpus (Lyding et al., 2014), a large collection of Italian web texts.

Other Sentix-based features in the CoLing Lab model are: the number of tokens for each Cscore group, the Cscore of the first token in the tweet, the Cscore of the last token in the tweet, and the count of lemmas that are represented in Sentix.

2.2.2 Negation

Negation features have been developed to encode the presence of a negation and the morphosyntactic characteristics of its scope. The inventory of negative lemmas (e.g. "non") and patterns (e.g. "non ... mai") has been extracted from (Renzi et al., 2001). The occurrences of these lemmas and structures have been counted and inserted as features to feed the classifier.

In order to characterize the scope of each negation, we used the dependency-parsed tweets produced by DeSR (Attardi et al., 2009). The scope of a negative element is assumed to be its syntactic head, or the predicative complement of its head in case the latter is a copula. Although this is clearly a simplifying assumption, preliminary experiments show that it can be a rather cost-effective strategy in the analysis of linguistically simple texts like tweets. This information has been included in the model by counting the number of negation patterns encountered in each tweet, where a negation pattern is composed of the PoS of the negated element plus the number of negative tokens depending on it and, in case it is covered by Sentix, its Polarity, Intensity and Cscore values.

2.2.3 Morphological features

The linguistic annotation produced in the preprocessing phase has also been exploited in the population of the following morphological statistics: (i) number of sentences in the tweet; (ii) number of linguistic tokens; (iii) proportion of content words (nouns, adjectives, verbs and adverbs); (iv) number of tokens for each Part of Speech.

2.2.4 Shallow features

This group of features has been developed to describe distinctive characteristics of web communication. The group includes:

Emoticons: we used the lexicon LexEmo to mark the most common emoticons, such as :-( and :-), annotated with their polarity score: 1 (positive), −1 (negative), 0 (neutral). LexEmo is used both to identify emoticons and to annotate their polarity. Emoticon-related features are the total amount of emoticons in the tweet, the polarity of each emoticon in sequential order, and the polarity of each emoticon in reversed order. For instance, in the tweet :-( quando ci vediamo? mi manchi anche tu! :*:* ":-( when are we going to meet up? I miss you, too :*:*" there are three emoticons, the first of which (:-() is negative while the others are positive (:*, :*). Accordingly, the classifier has been fed with the information that the polarity of the first emoticon is −1, that of the second emoticon is 1, and the same goes for the third emoticon. In the same way, another group of features specifies that the polarity of the last emoticon is 1, as it goes for the last but one, while the last but two has a polarity score of −1.

Links: these features contain a shallow classification of links performed using simple regular expressions applied to URLs, classifying them as follows: video, images, social and other. We also use as a feature the absolute number of links in each tweet.

Emphasis: the features report the number of emphasized tokens presenting repeated characters like bastaaaa, the average number of repeated characters in the tweet, and the cumulative number of repeated characters in the tweet.

Creative punctuation: sequences of contiguous punctuation characters, like !!!, !?!?!?!!?!????! or ......., are identified and classified as sequences of dots, exclamation marks, question marks or mixed. For each tweet, the features correspond to the number of sequences belonging to each group and their average length in characters.

Quotes: the number of quotations in the tweet.

2.2.5 Twitter features

This group of features describes some Twitter-specific characteristics of the target tweets.

Topic: this information marks whether a tweet has been retrieved via a specific political hashtag or keywords. It is provided by the organizers as an attribute of the tweet;

Usernames: the number of @usernames in the tweet;

Hashtags: hashtags play the role of organizing the tweets around a single topic, so they are useful to consider in determining tweet polarity: a tweet containing hashtags like #amore "#love" and #felice "#happy" is expected to be positive, while a tweet containing hashtags like #ansia "#anxiety" and #stressato "#stressedout" is expected to be negative. This group of features registers the presence of a hashtag belonging to the list of hashtags with a frequency higher than 1 in the training corpus.
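As an illustration of the shallow features described above, the emphasis and creative-punctuation statistics could be computed along the following lines. This is a sketch under our own assumptions, not the actual CoLing Lab code; the function names and the exact repetition threshold (three or more identical characters) are ours:

```python
import re

def emphasis_features(tweet):
    """Counts emphasized tokens containing a run of 3+ repeated word
    characters (e.g. 'bastaaaa'), and the number of such runs overall."""
    runs = re.findall(r"(\w)\1{2,}", tweet)  # one group match per run
    tokens = [t for t in tweet.split() if re.search(r"(\w)\1{2,}", t)]
    return {"emphasized_tokens": len(tokens), "repeated_runs": len(runs)}

def punctuation_sequences(tweet):
    """Classifies contiguous punctuation sequences as dots, exclamations,
    question marks or mixed, as in Section 2.2.4."""
    counts = {"dots": 0, "exclamations": 0, "questions": 0, "mixed": 0}
    for seq in re.findall(r"[.!?]{2,}", tweet):
        if set(seq) == {"."}:
            counts["dots"] += 1
        elif set(seq) == {"!"}:
            counts["exclamations"] += 1
        elif set(seq) == {"?"}:
            counts["questions"] += 1
        else:
            counts["mixed"] += 1
    return counts

emph = emphasis_features("bastaaaa con questo rumore!!!")
punct = punctuation_sequences("davvero?!?! incredibile... si!!!")
```

Each sequence type then contributes two features per tweet, a count and an average length, mirroring the description in Section 2.2.4.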
3 Introducing emotive and LDA features

In order to add emotive features to the CoLing Lab model, we created an emotive lexicon from the corpus FB-NEWS15 (Passaro et al., 2016) following the strategy illustrated in (Passaro et al., 2015; Passaro and Lenci, 2016). The starting point is a set of seeds strongly associated with one or more emotions of a given taxonomy, which are used to build centroid distributional vectors representing the various emotions.

In order to build the distributional profiles of the words, we extracted the list T of the 30,000 most frequent nouns, verbs and adjectives from FB-NEWS15. The lemmas in T were subsequently used as targets and contexts in a square co-occurrence matrix extracted within a five-word window (±2 words, centered on the target lemma). In addition, we extended the matrix to the nouns, adjectives and verbs in the corpus of tweets (i.e. lemmas not belonging to T). For each ⟨emotion, PoS⟩ pair we built a centroid vector from the vectors of the seeds belonging to that emotion and PoS, obtaining 24 centroids in total.¹ Starting from these spaces, several groups of features have been extracted. The simplest ones include general statistics such as the number of emotive words and the emotive score of a tweet. More sophisticated features are aimed at inferring the degree of distinctiveness of a word, as well as its polarity, from its emotive profile.

¹ Following the configuration in (Passaro et al., 2015; Passaro and Lenci, 2016), the co-occurrence matrix has been re-weighted using Pointwise Mutual Information (Church and Hanks, 1990), and in particular Positive PMI (PPMI), in which negative scores are changed to zero (Niwa and Nitta, 1994). We constructed different word spaces according to PoS because the context that best captures the meaning of a word differs depending on the word to be represented (Rothenhäusler and Schütze, 2007).

Number of emotive words: words belonging to the emotive Facebook spaces;

Emotive/words ratio: the ratio between the number of emotive words and the total number of words in the tweet;

Strongly emotive words: the number of words having a high (greater than 0.4) emotive score for at least one emotion;

Tweet emotive score: a score calculated as the ratio between the number of strongly polarized words and the number of content words in the tweet (Eq. 2). The feature assumes values in the interval [0, 1]; in the absence of strongly emotive words, the default value is 0.

    E(Tweet) = Count(Strongly emotive words) / Count(Content words)    (2)

Maximum values: the maximum emotive value for each emotion (8 features);

Quartiles: these features take into account the distribution of the emotive words in the tweet. For each emotion, the list of emotive words has been ordered according to the emotive scores and divided into quartiles (e.g. the fourth quartile contains the most emotive words and the first quartile the least emotive ones). Each feature registers the count of the words belonging to the pair ⟨emotion, quartile⟩ (32 features in total);

ItEM seeds: Boolean features registering the presence of words belonging to the set used as seeds to build the vector space models. In particular, the features include the top 4 most frequent words for each emotion (32 Boolean features in total);

Distinctive words: 32 features corresponding to the top 4 distinctive words for each emotion. The degree of distinctiveness of a word for a given emotion is calculated starting from the VSM normalized using Z-scores. In particular, the feature corresponds to the proportion of the emotion ⟨emotion_i⟩ against the sum of the total emotion scores [e1, ..., e8];

Polarity (count): the number of positive and negative words. The polarity of a word is calculated by applying Eq. 3, in which positive emotions are assumed to be JOY and TRUST, and negative emotions are assumed to be DISGUST, FEAR, ANGER and SADNESS.

    Polarity(w) = (JOY + TRUST) / 2 − (DISGUST + FEAR + ANGER + SADNESS) / 4    (3)
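Eqs. 2 and 3 amount to the following computation (a sketch; the lowercase emotion keys and the function names are ours, and the per-word emotion scores are assumed to come from the centroid-based VSMs described above):

```python
def tweet_emotive_score(n_strongly_emotive, n_content_words):
    """Eq. 2: ratio of strongly emotive words to content words,
    defaulting to 0 when no strongly emotive word occurs."""
    if n_strongly_emotive == 0 or n_content_words == 0:
        return 0.0
    return n_strongly_emotive / n_content_words

def polarity(emotions):
    """Eq. 3: mean of the positive emotions (JOY, TRUST) minus the
    mean of the negative ones (DISGUST, FEAR, ANGER, SADNESS)."""
    pos = (emotions["joy"] + emotions["trust"]) / 2
    neg = (emotions["disgust"] + emotions["fear"]
           + emotions["anger"] + emotions["sadness"]) / 4
    return pos - neg

# Hypothetical emotion profile of a word, one score per emotion.
scores = {"joy": 0.8, "trust": 0.6, "disgust": 0.1,
          "fear": 0.1, "anger": 0.1, "sadness": 0.1}
p = polarity(scores)  # ≈ 0.6: clearly positive
```

Note that ANTICIPATION and SURPRISE, the remaining emotions of the 8-emotion taxonomy, contribute to neither side of Eq. 3.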
Polarity (values): the polarity (calculated using Eq. 3) of the emotive words in the tweet. The maximum number of emotive words is assumed to be 20;

LDA features: this group includes 50 features referring to the topic distribution of the tweet. The LDA model has been built on the FB-NEWS15 corpus (Passaro et al., 2016), which is organized into 50 clusters of thematically related news created with LDA (Blei et al., 2003) (Mallet implementation (McCallum, 2002)). Each feature refers to the association between the text of the tweet and a topic extracted from FB-NEWS15.

4 Classification

We used the same paradigm as in (Passaro et al., 2014). In particular, we chose to base the CoLing Lab system for polarity classification on the SVM classifier with a linear kernel implementation available in Weka (Witten and Frank, 2011), trained with the Sequential Minimal Optimization (SMO) algorithm introduced by Platt (Platt, 1999).

The classification task proposed by the organizers could be approached either by building two separate binary classifiers relying on two different models (one judging the positiveness of the tweet, the other judging its negativeness), or by developing a single multiclass classifier where the possible outcomes are Positive Polarity (Task POS: 1, Task NEG: 0), Negative Polarity (Task POS: 0, Task NEG: 1), Mixed Polarity (Task POS: 1, Task NEG: 1) and No Polarity (Task POS: 0, Task NEG: 0). In EVALITA 2014 (Passaro et al., 2014) we tried both approaches in our development phase and found no significant difference, so we opted for the more economical setting, i.e. the multiclass one.

5 Results

Although this model is not optimal according to the global ranking, if we focus on the recognition of negative tweets (i.e. the NEG task), it ranks fifth (F1-score), and first if we consider class 1 of the NEG task (i.e. NEG, F-score 1). This trend is reversed for the POS task, which is the worst performing class of this system.

Task    Class  Precision  Recall    F-score
POS     0      0.8548     0.7682    0.8092
POS     1      0.2640     0.3892    0.3146
POS     task   0.5594     0.5787    0.5619
NEG     0      0.7688     0.6488    0.7037
NEG     1      0.5509     0.6883    0.6120
NEG     task   0.65985    0.66855   0.6579
GLOBAL         0.609625   0.623625  0.6099

Table 1: System results.

Due to the great difference in performance with respect to the results obtained with a 10-fold cross validation, we suspected that the system was overfitting the training data, so we performed several feature ablation experiments in which we included only the lexical information derived from ItEM and FB-NEWS15 (i.e. we removed the features relying on Sentix, Negation and Hashtags; cf. Table 2). The results demonstrate on the one hand that significant improvements can be obtained by using lexical information, especially to recognize negative texts. On the other hand, the results highlight the overfitting of the submitted model, probably due to the overlap between Sentix and the emotive features.

Task    Class  Precision  Recall    F-score
POS     0      0.8518     0.8999    0.8752
POS     1      0.3629     0.2670    0.3077
POS     task   0.60735    0.58345   0.59145
NEG     0      0.8082     0.6065    0.6930
NEG     1      0.5506     0.7701    0.6421
NEG     task   0.6794     0.6883    0.66755
GLOBAL         0.643375   0.635875  0.6295

Table 2: System results for a filtered model.

The advantages of using only the lexical features derived from ItEM are the following: (i) the emotional values of the words can be easily updated; (ii) the VSM can be extended to increase the lexical coverage of the resource; (iii) the system is "lean" (it can do more with less).
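The correspondence between the four multiclass outcomes and the two binary task labels described in Section 4 can be made explicit as follows (a sketch; the outcome names are ours):

```python
# Mapping from the four multiclass outcomes to the (POS, NEG) task labels,
# as described in Section 4.
OUTCOME_TO_TASKS = {
    "positive": (1, 0),  # Positive Polarity
    "negative": (0, 1),  # Negative Polarity
    "mixed":    (1, 1),  # Mixed Polarity
    "none":     (0, 0),  # No Polarity
}

def to_task_labels(outcome):
    """Converts a single multiclass prediction into the two binary
    labels expected by the POS and NEG evaluation tasks."""
    pos, neg = OUTCOME_TO_TASKS[outcome]
    return {"POS": pos, "NEG": neg}

labels = to_task_labels("mixed")
```

Because the mapping is a bijection, a single multiclass classifier covers both tasks at once, which is what makes the multiclass setting the more economical choice.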
6 Conclusions

The CoLing Lab system presented in 2014 (Passaro et al., 2014) has been enriched with emotive features derived from a distributional, corpus-based resource built from the social media corpus FB-NEWS15 (Passaro et al., 2016). In addition, the system exploits LDA features extracted from the same corpus. Additional experiments demonstrated that by removing most of the non-distributional lexical features derived from Sentix, the performance can be improved. As a consequence, with a relatively low number of features the system reaches satisfactory performance, with top scores in recognizing negative tweets.

References

Giuseppe Attardi, Felice Dell'Orletta, Maria Simi, and Joseph Turian. 2009. Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA 2009, Evaluation of NLP and Speech Tools for Italian, Reggio Emilia (Italy). Springer.

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Napoli (Italy). Academia University Press.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107, Atlanta.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022.

Kenneth W. Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16:22–29.

Felice Dell'Orletta. 2009. Ensemble system for part-of-speech tagging. In Proceedings of EVALITA 2009, Evaluation of NLP and Speech Tools for Italian, Reggio Emilia (Italy). Springer.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. Processing, pages 1–6.

Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, and Vito Pirrelli. 2014. The PAISÀ Corpus of Italian Web Texts. In Proceedings of the 9th Web as Corpus Workshop (WaC-9), pages 36–43, Gothenburg (Sweden). Association for Computational Linguistics.

Andrew K. McCallum. 2002. MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu.

Yoshiki Niwa and Yoshihiko Nitta. 1994. Co-occurrence vectors from corpora vs. distance vectors from dictionaries. In Proceedings of the 15th International Conference on Computational Linguistics, pages 304–309, Kyoto (Japan).

Lucia C. Passaro and Alessandro Lenci. 2016. Evaluating context selection strategies to build emotive vector space models. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož (Slovenia). European Language Resources Association (ELRA).

Lucia C. Passaro, Gianluca E. Lebani, Emmanuele Chersoni, and Alessandro Lenci. 2014. The CoLing Lab system for sentiment polarity classification of tweets. In Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 & the Fourth International Workshop EVALITA 2014, pages 87–92, Pisa (Italy).

Lucia C. Passaro, Laura Pollacci, and Alessandro Lenci. 2015. ItEM: A vector space model to bootstrap an Italian emotive lexicon. In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015, pages 215–220, Trento (Italy).

Lucia C. Passaro, Alessandro Bondielli, and Alessandro Lenci. 2016. FB-NEWS15: A topic-annotated Facebook corpus for emotion detection and sentiment analysis. In Proceedings of the Third Italian Conference on Computational Linguistics CLiC-it 2016, Napoli (Italy). To appear.

John C. Platt. 1999. Fast training of support vector machines using Sequential Minimal Optimization. In Advances in Kernel Methods, pages 185–208. MIT Press, Cambridge, MA, USA.

Lorenzo Renzi, Giampaolo Salvi, and Anna Cardinaletti. 2001. Grande grammatica italiana di consultazione, volume 1. Il Mulino.

Klaus Rothenhäusler and Hinrich Schütze. 2007. Part of speech filtered word spaces. In Sixth International and Interdisciplinary Conference on Modeling and Using Context.

Ian H. Witten and Eibe Frank. 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition.