Exploiting Emotive Features for the Sentiment Polarity Classification of Tweets

Lucia C. Passaro, Alessandro Bondielli and Alessandro Lenci
CoLing Lab, Dipartimento di Filologia, Letteratura e Linguistica, University of Pisa (Italy)
lucia.passaro@for.unipi.it, alessandro.bondielli@gmail.com, alessandro.lenci@unipi.it

Abstract

English. This paper describes the CoLing Lab system for the participation in the constrained run of the EVALITA 2016 SENTIment POLarity Classification Task (Barbieri et al., 2016). The system extends the approach in (Passaro et al., 2014) with emotive features extracted from ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) and FB-NEWS15 (Passaro et al., 2016).

Italiano. Questo articolo descrive il sistema sviluppato all'interno del CoLing Lab per la partecipazione al task di EVALITA 2016 SENTIment POLarity Classification (Barbieri et al., 2016). Il sistema estende l'approccio descritto in (Passaro et al., 2014) con una serie di feature emotive estratte da ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) e FB-NEWS15 (Passaro et al., 2016).

1 Introduction

Social media and microblogging services are extensively used for rather different purposes, from news reading to news spreading, from entertainment to marketing. As a consequence, the study of how sentiments and emotions are expressed in such platforms, and the development of methods to automatically identify them, has emerged as a great area of interest in the Natural Language Processing community. Twitter presents many linguistic and communicative peculiarities. A tweet, in fact, is a short informal text (140 characters) in which the frequency of creative punctuation, emoticons, slang, specific terminology, abbreviations, links and hashtags is higher than in other domains and platforms. Twitter users post messages from many different media, including their smartphones, and they "tweet" about a great variety of topics, unlike what can be observed in other sites, which appear to be tailored to a specific group of topics (Go et al., 2009).

The paper is organized as follows: Section 2 describes the architecture of the system, as well as the pre-processing and the features designed in (Passaro et al., 2014). Section 3 presents the additional features extracted from the emotive VSMs and from LDA. Section 4 describes the classification paradigm, and the last sections are left for results and conclusions.

2 Description of the system

The system extends the approach in (Passaro et al., 2014) with emotive features extracted from ItEM (Passaro et al., 2015; Passaro and Lenci, 2016) and FB-NEWS15 (Passaro et al., 2016). The main goal of the work is to evaluate the contribution of a distributional affective resource to estimate the valence of words. The CoLing Lab system for polarity classification includes the following basic steps: (i) a preprocessing phase, to separate linguistic and nonlinguistic elements in the target tweets; (ii) a feature extraction phase, in which the relevant characteristics of the tweets are identified; (iii) a classification phase, based on a Support Vector Machine (SVM) classifier with a linear kernel.
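The three steps above can be sketched as a simple pipeline. The snippet below is a minimal illustration only: the `preprocess`, `extract_features` and `classify` helpers are hypothetical stand-ins (the actual system uses the rule-based normalizer, the full feature inventory described below, and Weka's SMO linear-kernel SVM, not this toy decision rule):

```python
import re

def preprocess(tweet):
    """Hypothetical stand-in for the rule-based normalization phase:
    removes links, strips '#' and '@' symbols, lowercases the rest."""
    tweet = re.sub(r"https?://\S+", "", tweet)  # links are removed
    tweet = re.sub(r"[#@]", "", tweet)          # hashtags/usernames normalized
    return tweet.lower().strip()

def extract_features(text):
    """Hypothetical feature vector; the real system combines lexical,
    negation, morphological, shallow, Twitter and emotive features."""
    return [len(text.split()), text.count("!"), text.count("?")]

def classify(features, weights, bias=0.0):
    """Placeholder linear decision rule standing in for the linear-kernel
    SVM (Weka SMO) used by the system."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if score > 0 else 0

feats = extract_features(preprocess("Che bella giornata! #felice http://t.co/x"))
label = classify(feats, weights=[0.1, 0.5, -0.5])
```

The point of the sketch is the separation of concerns: normalization, feature extraction and classification are independent stages, so each feature group described below can be added or ablated without touching the rest.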
2.1 Preprocessing

The aim of the preprocessing phase is the identification of the linguistic and nonlinguistic elements in the tweets and their annotation. While the preprocessing of nonlinguistic elements such as links and emoticons is limited to their identification and classification (cf. Section 2.2.4), the treatment of the linguistic material required the development of a dedicated rule-based procedure, whose output is a normalized text that is subsequently fed to a pipeline of general-purpose linguistic annotation tools. The following rules have been applied in the linguistic preprocessing phase:

• Emphasis: tokens presenting repeated characters like bastaaaa "stooooop" are replaced by their most probable standardized forms (i.e. basta "stop");
• Links and emoticons: they are identified and removed;
• Punctuation: linguistically irrelevant punctuation marks are removed;
• Usernames: the users cited in a tweet are identified and normalized by removing the @ symbol and capitalizing the entity name;
• Hashtags: they are identified and normalized by simply removing the # symbol.

The output of this phase are linguistically standardized tweets, which are subsequently POS-tagged with the Part-of-Speech tagger described in (Dell'Orletta, 2009) and dependency-parsed with the DeSR parser (Attardi et al., 2009).

2.2 Feature extraction

The inventory of features can be organized into six classes. The five classes of features described in this section were designed in 2014; the sixth class, described in the next section, covers the emotive and LDA features.

2.2.1 Lexical features

Lexical features represent the occurrence of bad words or of words that are either highly emotional or highly polarized. Relevant lemmas were identified from two in-house built lexicons (cf. below) and from Sentix (Basile and Nissim, 2013), a lexicon of sentiment-annotated Italian words. Lexical features include:

ItEM seeds: a lexicon of 347 highly emotional Italian words built by exploiting an online feature elicitation paradigm (Passaro et al., 2015). The features are, for each emotion, the total count of strongly emotional tokens in each tweet.

Bad words lexicon: by exploiting an in-house built lexicon of common Italian bad words, we reported, for each tweet, the frequency of bad words belonging to a selected list, as well as the total amount of these lemmas.

Sentix: Sentix (Sentiment Italian Lexicon: (Basile and Nissim, 2013)) is a lexicon for Sentiment Analysis in which 59,742 lemmas are annotated for their polarity and intensity, among other information. Polarity scores range from −1 (totally negative) to 1 (totally positive), while Intensity scores range from 0 (totally neutral) to 1 (totally polarized). Both these scores appear informative for the classification, so we derived, for each lemma, a combined score Cscore calculated as follows:

    Cscore = Intensity × Polarity    (1)

Depending on their Cscore, the selected lemmas have been organized into several groups:

• strongly positive: 1 ≥ Cscore > 0.25
• weakly positive: 0.25 ≥ Cscore > 0.125
• neutral: 0.125 ≥ Cscore ≥ −0.125
• weakly negative: −0.125 > Cscore ≥ −0.25
• strongly negative: −0.25 > Cscore ≥ −1

Since Sentix relies on WordNet sense distinctions, it is not uncommon for a lemma to be associated with more than one ⟨Intensity, Polarity⟩ pair, and consequently with more than one Cscore. In order to handle this phenomenon, the lemmas have been split into three ambiguity classes: lemmas with only one entry, or whose entries are all associated with the same Cscore value, are marked as "Unambiguous" and associated with their Cscore. Ambiguous cases were treated by inspecting, for each lemma, the distribution of the associated Cscores: lemmas which had a Majority Vote (MV) were marked as "Inferable" and associated with the Cscore of the MV. If there was no MV, lemmas were marked as "Ambiguous" and associated with the mean of the Cscores. To isolate a reliable set of polarized words, we focused only on the Unambiguous or Inferable lemmas and selected only the 250 topmost frequent according to the PAISÀ corpus (Lyding et al., 2014), a large collection of Italian web texts.

Other Sentix-based features in the CoLing Lab model are: the number of tokens for each Cscore group, the Cscore of the first token in the tweet, the Cscore of the last token in the tweet, and the count of lemmas that are represented in Sentix.

2.2.2 Negation

Negation features have been developed to encode the presence of a negation and the morphosyntactic characteristics of its scope. The inventory of negative lemmas (e.g. "non") and patterns (e.g. "non ... mai") has been extracted from (Renzi et al., 2001). The occurrences of these lemmas and structures have been counted and inserted as features to feed the classifier.

In order to characterize the scope of each negation, we used the dependency-parsed tweets produced by DeSR (Attardi et al., 2009). The scope of a negative element is assumed to be its syntactic head, or the predicative complement of its head in case the latter is a copula. Although this is clearly a simplifying assumption, preliminary experiments show that it can be a rather cost-effective strategy in the analysis of linguistically simple texts like tweets. This information has been included in the model by counting the number of negation patterns encountered in each tweet, where a negation pattern is composed of the PoS of the negated element plus the number of negative tokens depending on it and, in case it is covered by Sentix, its Polarity, Intensity and Cscore values.

2.2.3 Morphological features

The linguistic annotation produced in the preprocessing phase has also been exploited in the population of the following morphological statistics: (i) number of sentences in the tweet; (ii) number of linguistic tokens; (iii) proportion of content words (nouns, adjectives, verbs and adverbs); (iv) number of tokens for each Part of Speech.

2.2.4 Shallow features

This group of features has been developed to describe distinctive characteristics of web communication. The group includes:

Emoticons: we used the lexicon LexEmo to mark the most common emoticons, such as :-( and :-), annotated with their polarity score: 1 (positive), −1 (negative), 0 (neutral). LexEmo is used both to identify emoticons and to annotate their polarity. Emoticon-related features are the total amount of emoticons in the tweet, the polarity of each emoticon in sequential order, and the polarity of each emoticon in reversed order. For instance, in the tweet :-( quando ci vediamo? mi manchi anche tu! :*:* ":-( when are we going to meet up? I miss you, too :*:*" there are three emoticons, the first of which (:-() is negative while the others are positive (:*, :*). Accordingly, the classifier has been fed with the information that the polarity of the first emoticon is −1, that of the second emoticon is 1, and the same goes for the third emoticon. In the same way, another group of features specifies that the polarity of the last emoticon is 1, as it goes for the last but one, while the last but two has a polarity score of −1.

Links: these features contain a shallow classification of links performed using simple regular expressions applied to URLs, classifying them as follows: video, images, social and other. We also use as a feature the absolute number of links in each tweet.

Emphasis: the features report the number of emphasized tokens presenting repeated characters like bastaaaa, the average number of repeated characters in the tweet, and the cumulative number of repeated characters in the tweet.

Creative punctuation: sequences of contiguous punctuation characters, like !!!, !?!?!?!!?!????! or ......., are identified and classified as sequences of dots, exclamation marks, question marks or mixed. For each tweet, the features correspond to the number of sequences belonging to each group and their average length in characters.

Quotes: the number of quotations in the tweet.

2.2.5 Twitter features

This group of features describes some Twitter-specific characteristics of the target tweets.

Topic: this information marks whether a tweet has been retrieved via a specific political hashtag or keywords. It is provided by the organizers as an attribute of the tweet;

Usernames: the number of @usernames in the tweet;

Hashtags: hashtags play the role of organizing the tweets around a single topic, so they are useful to consider in determining tweet polarity: a tweet containing hashtags like #amore "#love" and #felice "#happy" is expected to be positive, while a tweet containing hashtags like #ansia "#anxiety" and #stressato "#stressedout" is expected to be negative. This group of features registers the presence of a hashtag belonging to the list of hashtags with a frequency higher than 1 in the training corpus.
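As an illustration of the shallow features described above, the emphasis and creative-punctuation statistics could be computed along the following lines. This is a sketch under our own assumptions, not the actual CoLing Lab code; the function names and the exact repetition threshold (three or more identical characters) are ours:

```python
import re

def emphasis_features(tweet):
    """Counts emphasized tokens containing a run of 3+ repeated word
    characters (e.g. 'bastaaaa'), and the number of such runs overall."""
    runs = re.findall(r"(\w)\1{2,}", tweet)  # one group match per run
    tokens = [t for t in tweet.split() if re.search(r"(\w)\1{2,}", t)]
    return {"emphasized_tokens": len(tokens), "repeated_runs": len(runs)}

def punctuation_sequences(tweet):
    """Classifies contiguous punctuation sequences as dots, exclamations,
    question marks or mixed, as in Section 2.2.4."""
    counts = {"dots": 0, "exclamations": 0, "questions": 0, "mixed": 0}
    for seq in re.findall(r"[.!?]{2,}", tweet):
        if set(seq) == {"."}:
            counts["dots"] += 1
        elif set(seq) == {"!"}:
            counts["exclamations"] += 1
        elif set(seq) == {"?"}:
            counts["questions"] += 1
        else:
            counts["mixed"] += 1
    return counts

emph = emphasis_features("bastaaaa con questo rumore!!!")
punct = punctuation_sequences("davvero?!?! incredibile... si!!!")
```

Each sequence type then contributes two features per tweet, a count and an average length, mirroring the description in Section 2.2.4.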
3 Introducing emotive and LDA features

In order to add emotive features to the CoLing Lab model, we created an emotive lexicon from the corpus FB-NEWS15 (Passaro et al., 2016) following the strategy illustrated in (Passaro et al., 2015; Passaro and Lenci, 2016). The starting point is a set of seeds strongly associated with one or more emotions of a given taxonomy, which are used to build centroid distributional vectors representing the various emotions.

In order to build the distributional profiles of the words, we extracted the list T of the 30,000 most frequent nouns, verbs and adjectives from FB-NEWS15. The lemmas in T were subsequently used as targets and contexts in a square co-occurrence matrix extracted within a five-word window (±2 words, centered on the target lemma). In addition, we extended the matrix to the nouns, adjectives and verbs in the corpus of tweets (i.e. lemmas not belonging to T). For each ⟨emotion, PoS⟩ pair we built a centroid vector from the vectors of the seeds belonging to that emotion and PoS, obtaining 24 centroids in total.¹ Starting from these spaces, several groups of features have been extracted. The simplest ones include general statistics such as the number of emotive words and the emotive score of a tweet. More sophisticated features are aimed at inferring the degree of distinctiveness of a word, as well as its polarity, from its emotive profile.

¹ Following the configuration in (Passaro et al., 2015; Passaro and Lenci, 2016), the co-occurrence matrix has been re-weighted using Pointwise Mutual Information (Church and Hanks, 1990), and in particular Positive PMI (PPMI), in which negative scores are changed to zero (Niwa and Nitta, 1994). We constructed different word spaces according to PoS because the context that best captures the meaning of a word differs depending on the word to be represented (Rothenhäusler and Schütze, 2007).

Number of emotive words: words belonging to the emotive Facebook spaces;

Emotive/words ratio: the ratio between the number of emotive words and the total number of words in the tweet;

Strongly emotive words: the number of words having a high (greater than 0.4) emotive score for at least one emotion;

Tweet emotive score: a score calculated as the ratio between the number of strongly polarized words and the number of content words in the tweet (Eq. 2). The feature assumes values in the interval [0, 1]; in the absence of strongly emotive words, the default value is 0.

    E(Tweet) = Count(Strongly emotive words) / Count(Content words)    (2)

Maximum values: the maximum emotive value for each emotion (8 features);

Quartiles: these features take into account the distribution of the emotive words in the tweet. For each emotion, the list of emotive words has been ordered according to the emotive scores and divided into quartiles (e.g. the fourth quartile contains the most emotive words and the first quartile the least emotive ones). Each feature registers the count of the words belonging to the pair ⟨emotion, quartile⟩ (32 features in total);

ItEM seeds: Boolean features registering the presence of words belonging to the set used as seeds to build the vector space models. In particular, the features include the top 4 most frequent words for each emotion (32 Boolean features in total);

Distinctive words: 32 features corresponding to the top 4 distinctive words for each emotion. The degree of distinctiveness of a word for a given emotion is calculated starting from the VSM normalized using Z-scores. In particular, the feature corresponds to the proportion of the emotion ⟨emotion_i⟩ against the sum of the total emotion scores [e1, ..., e8];

Polarity (count): the number of positive and negative words. The polarity of a word is calculated by applying Eq. 3, in which positive emotions are assumed to be JOY and TRUST, and negative emotions are assumed to be DISGUST, FEAR, ANGER and SADNESS.

    Polarity(w) = (JOY + TRUST) / 2 − (DISGUST + FEAR + ANGER + SADNESS) / 4    (3)
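Eqs. 2 and 3 amount to the following computation (a sketch; the lowercase emotion keys and the function names are ours, and the per-word emotion scores are assumed to come from the centroid-based VSMs described above):

```python
def tweet_emotive_score(n_strongly_emotive, n_content_words):
    """Eq. 2: ratio of strongly emotive words to content words,
    defaulting to 0 when no strongly emotive word occurs."""
    if n_strongly_emotive == 0 or n_content_words == 0:
        return 0.0
    return n_strongly_emotive / n_content_words

def polarity(emotions):
    """Eq. 3: mean of the positive emotions (JOY, TRUST) minus the
    mean of the negative ones (DISGUST, FEAR, ANGER, SADNESS)."""
    pos = (emotions["joy"] + emotions["trust"]) / 2
    neg = (emotions["disgust"] + emotions["fear"]
           + emotions["anger"] + emotions["sadness"]) / 4
    return pos - neg

# Hypothetical emotion profile of a word, one score per emotion.
scores = {"joy": 0.8, "trust": 0.6, "disgust": 0.1,
          "fear": 0.1, "anger": 0.1, "sadness": 0.1}
p = polarity(scores)  # ≈ 0.6: clearly positive
```

Note that ANTICIPATION and SURPRISE, the remaining emotions of the 8-emotion taxonomy, contribute to neither side of Eq. 3.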
Polarity (values): the polarity (calculated using Eq. 3) of the emotive words in the tweet. The maximum number of emotive words is assumed to be 20;

LDA features: this group includes 50 features referring to the topic distribution of the tweet. The LDA model has been built on the FB-NEWS15 corpus (Passaro et al., 2016), which is organized into 50 clusters of thematically related news created with LDA (Blei et al., 2003) (Mallet implementation (McCallum, 2002)). Each feature refers to the association between the text of the tweet and a topic extracted from FB-NEWS15.

4 Classification

We used the same paradigm as in (Passaro et al., 2014). In particular, we chose to base the CoLing Lab system for polarity classification on the SVM classifier with a linear kernel implementation available in Weka (Witten and Frank, 2011), trained with the Sequential Minimal Optimization (SMO) algorithm introduced by Platt (Platt, 1999).

The classification task proposed by the organizers could be approached either by building two separate binary classifiers relying on two different models (one judging the positiveness of the tweet, the other judging its negativeness), or by developing a single multiclass classifier where the possible outcomes are Positive Polarity (Task POS: 1, Task NEG: 0), Negative Polarity (Task POS: 0, Task NEG: 1), Mixed Polarity (Task POS: 1, Task NEG: 1) and No Polarity (Task POS: 0, Task NEG: 0). In EVALITA 2014 (Passaro et al., 2014) we tried both approaches in our development phase and found no significant difference, so we opted for the more economical setting, i.e. the multiclass one.

5 Results

Although this model is not optimal according to the global ranking, if we focus on the recognition of negative tweets (i.e. the NEG task), it ranks fifth (F1-score), and first if we consider class 1 of the NEG task (i.e. NEG, F-score 1). This trend is reversed for the POS task, which is the worst performing class of this system.

Task    Class  Precision  Recall    F-score
POS     0      0.8548     0.7682    0.8092
POS     1      0.2640     0.3892    0.3146
POS     task   0.5594     0.5787    0.5619
NEG     0      0.7688     0.6488    0.7037
NEG     1      0.5509     0.6883    0.6120
NEG     task   0.65985    0.66855   0.6579
GLOBAL         0.609625   0.623625  0.6099

Table 1: System results.

Due to the great difference in performance with respect to the results obtained with a 10-fold cross validation, we suspected that the system was overfitting the training data, so we performed several feature ablation experiments in which we included only the lexical information derived from ItEM and FB-NEWS15 (i.e. we removed the features relying on Sentix, Negation and Hashtags; cf. Table 2). The results demonstrate on the one hand that significant improvements can be obtained by using lexical information, especially to recognize negative texts. On the other hand, the results highlight the overfitting of the submitted model, probably due to the overlap between Sentix and the emotive features.

Task    Class  Precision  Recall    F-score
POS     0      0.8518     0.8999    0.8752
POS     1      0.3629     0.2670    0.3077
POS     task   0.60735    0.58345   0.59145
NEG     0      0.8082     0.6065    0.6930
NEG     1      0.5506     0.7701    0.6421
NEG     task   0.6794     0.6883    0.66755
GLOBAL         0.643375   0.635875  0.6295

Table 2: System results for a filtered model.

The advantages of using only the lexical features derived from ItEM are the following: (i) the emotional values of the words can be easily updated; (ii) the VSM can be extended to increase the lexical coverage of the resource; (iii) the system is "lean" (it can do more with less).
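The correspondence between the four multiclass outcomes and the two binary task labels described in Section 4 can be made explicit as follows (a sketch; the outcome names are ours):

```python
# Mapping from the four multiclass outcomes to the (POS, NEG) task labels,
# as described in Section 4.
OUTCOME_TO_TASKS = {
    "positive": (1, 0),  # Positive Polarity
    "negative": (0, 1),  # Negative Polarity
    "mixed":    (1, 1),  # Mixed Polarity
    "none":     (0, 0),  # No Polarity
}

def to_task_labels(outcome):
    """Converts a single multiclass prediction into the two binary
    labels expected by the POS and NEG evaluation tasks."""
    pos, neg = OUTCOME_TO_TASKS[outcome]
    return {"POS": pos, "NEG": neg}

labels = to_task_labels("mixed")
```

Because the mapping is a bijection, a single multiclass classifier covers both tasks at once, which is what makes the multiclass setting the more economical choice.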
6 Conclusions

The CoLing Lab system presented in 2014 (Passaro et al., 2014) has been enriched with emotive features derived from a distributional, corpus-based resource built from the social media corpus FB-NEWS15 (Passaro et al., 2016). In addition, the system exploits LDA features extracted from the same corpus. Additional experiments demonstrated that by removing most of the non-distributional lexical features derived from Sentix, the performance can be improved. As a consequence, with a relatively low number of features the system reaches satisfactory performance, with top scores in recognizing negative tweets.

References

Giuseppe Attardi, Felice Dell'Orletta, Maria Simi, and Joseph Turian. 2009. Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA 2009, Evaluation of NLP and Speech Tools for Italian, Reggio Emilia (Italy). Springer.

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Napoli (Italy). Academia University Press.

Valerio Basile and Malvina Nissim. 2013. Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107, Atlanta.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022.

Kenneth W. Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16:22–29.

Felice Dell'Orletta. 2009. Ensemble system for part-of-speech tagging. In Proceedings of EVALITA 2009, Evaluation of NLP and Speech Tools for Italian, Reggio Emilia (Italy). Springer.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. Processing, pages 1–6.

Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, and Vito Pirrelli. 2014. The PAISÀ Corpus of Italian Web Texts. In Proceedings of the 9th Web as Corpus Workshop (WaC-9), pages 36–43, Gothenburg (Sweden). Association for Computational Linguistics.

Andrew K. McCallum. 2002. MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu.

Yoshiki Niwa and Yoshihiko Nitta. 1994. Co-occurrence vectors from corpora vs. distance vectors from dictionaries. In Proceedings of the 15th International Conference on Computational Linguistics, pages 304–309, Kyoto (Japan).

Lucia C. Passaro and Alessandro Lenci. 2016. Evaluating context selection strategies to build emotive vector space models. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož (Slovenia). European Language Resources Association (ELRA).

Lucia C. Passaro, Gianluca E. Lebani, Emmanuele Chersoni, and Alessandro Lenci. 2014. The CoLing Lab system for sentiment polarity classification of tweets. In Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 & the Fourth International Workshop EVALITA 2014, pages 87–92, Pisa (Italy).

Lucia C. Passaro, Laura Pollacci, and Alessandro Lenci. 2015. ItEM: A vector space model to bootstrap an Italian emotive lexicon. In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015, pages 215–220, Trento (Italy).

Lucia C. Passaro, Alessandro Bondielli, and Alessandro Lenci. 2016. FB-NEWS15: A topic-annotated Facebook corpus for emotion detection and sentiment analysis. In Proceedings of the Third Italian Conference on Computational Linguistics CLiC-it 2016, Napoli (Italy). To appear.

John C. Platt. 1999. Fast training of support vector machines using Sequential Minimal Optimization. In Advances in Kernel Methods, pages 185–208. MIT Press, Cambridge, MA, USA.

Lorenzo Renzi, Giampaolo Salvi, and Anna Cardinaletti. 2001. Grande grammatica italiana di consultazione, volume 1. Il Mulino.

Klaus Rothenhäusler and Hinrich Schütze. 2007. Part of speech filtered word spaces. In Sixth International and Interdisciplinary Conference on Modeling and Using Context.

Ian H. Witten and Eibe Frank. 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition.