Samskara
Minimal structural features for detecting subjectivity and polarity in Italian tweets

Irene Russo, Monica Monachini
Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC CNR)
Lari Lab
firstname.secondname@ilc.cnr.it

Abstract
English. Sentiment analysis classification tasks strongly depend on the properties of the medium that is used to communicate opinionated content. There are some limitations in Twitter that force the user to exploit structural properties of this social network with features that have pragmatic and communicative functions. Samskara is a system that uses minimal structural features to classify Italian tweets as instantiations of a textual genre, obtaining good results for subjectivity classification, while polarity classification needs substantial improvements.

Italian (translated). Sentiment analysis classification tasks depend heavily on the properties of the medium used to communicate opinionated content. Twitter has objective limits that force users to exploit the structural properties of the medium, assigning pragmatic and communicative functions to some elements. Samskara is a system that classifies Italian tweets as if they belonged to a textual genre, interpreting them as items characterized by minimal structures; it obtains good results in subjectivity classification, while polarity classification needs substantial improvements.

1 Introduction
After 15 years of NLP work on the topic, Sentiment Analysis is still a relevant task, mainly because every day we witness an exponential growth of opinionated content on the web that requires computational systems to be managed. Once detected and classified, opinionated content can also be labeled as positive or negative, but additional categories (ambiguous, neutral etc.) are possible. Resources and methodologies created for the detection and classification of subjectivity and polarity in reviews do not give good results when applied to different data, such as tweets or comments about news from online fora.
There are several reasons behind this. First and foremost, opinions can be expressed more or less explicitly depending on the context; lexical cues from lexical resources such as SentiWordNet (Baccianella et al., 2010) or General Inquirer (Stone, 1966) can be useless when people express their points of view in complex and subtle ways. Secondly, different media and platforms impose different constraints on the structure of the content expressed.
Twitter's limits in terms of characters force the use of abbreviations and the omission of syntactic elements. But users try to exploit these limitations creatively, for example adding pragmatic functions with emoticons.
Features and functionalities anchoring the text to extra-linguistic dimensions (such as mentions and pictures in tweets, or like/agree from other users in online debates) should be considered in Sentiment Analysis classification tasks because of their communicative functions.
In this paper we present Samskara, a Lari Lab system for the classification of Italian tweets that took part in two tasks at Sentipolc2016 (Task 1, subjectivity classification, and Task 2, polarity classification). The system is described in par. 2, with results presented in par. 2.2, where we also discuss the limitations of the system.

2 System description
Samskara is a classification system based on a minimal set of features that addresses subjectivity and polarity classification of Italian tweets. Tweets are considered as instantiations of a textual genre, namely they have specific structural properties with communicative and pragmatic functions. In our approach, focusing on the structural properties means:
• abstracting the task from the lexical values of single words, which can be a deceptive cue because of lexical sparseness, ambiguity of words, use of jargon and ironic exploitation of words;
• taking into account features used in authorship attribution to represent abstract patterns characterizing different styles, e.g. PoS tag n-gram frequencies (Stamatatos, 2009)¹;
• choosing a PoS tagset that includes tags peculiar to tweets as a textual genre, i.e. interjection and emoticon.
¹ For the moment we think that sequences of syntactic relations are not useful because of the poor performance of Italian syntactic parsers on tweets.
More generally, we want to capture high-level linguistic and extra-linguistic properties of tweets, also considering basic sequential structures in the form of bigram sequences.

2.1 Data analysis, data preprocessing and feature selection
Before starting with the selection of features, data analysis of the training set helped in the investigation of several hypotheses.
Polarised lexical items have been widely used in sentiment analysis classification (Liu and Zhang, 2012), but resources in this field either list values at sense level (such as SentiWordNet) or conflate the senses in a single entry (such as General Inquirer and LIWC). Without an efficient word sense disambiguation module, using SentiWordNet is difficult. One strategy is to sum all the values and to select a threshold for words that are tagged as polarised in text. This leads to overestimating positive/negative content, without finding a clear boundary between, for example, positive and negative tweets.
Considering the Italian version of LIWC2015 (Pennebaker et al., 2015), we see that frequencies are unable to distinguish between positive and negative tweets in the Sentipolc2016 training data (see Table 1).

class | tokens | LIWC+ | LIWC-
pos | 92295 | 234 (0.26%) | 225 (0.25%)
neg | 114435 | 78 (0.07%) | 683 (0.6%)
Table 1: Absolute and relative frequencies of Italian LIWC2015 lemmas in positive and negative tweets (Sentipolc2016 training set).

To avoid this, we defined for internal use a subset of SentiWordNet 3.0 (Baccianella et al., 2010) that we call SWN Core, selecting:
• all the words corresponding to senses that are polarised;
• from the set above, all the words corresponding to senses that display single-valued polarity (i.e. they are always positive or always negative);
• from the set above, we delete all the words that also have a neutral sense;
• we sum polarity values for every lemma in order to have a single value, for example, for lemmas listed in SWN with two different positive values or three different negative values.
The English SWN Core is composed of 6640 exclusively positive lemmas and 7603 exclusively negative lemmas. Since the items in these lists have polarity values ranging from 0.125 to 3.25, and we wanted to select strongly polarised lemmas, we set 0.5 as the threshold; as a consequence, we obtain 1844 very positive and 3272 very negative lemmas. After deletion of multiword expressions, these strongly opinionated words have been translated into Italian using Google Translate, manually checked and annotated with PoS and polarity.
We clean the lists, deleting lemmas that appear twice, lemmas that have been translated as multiword expressions and lemmas that do not have polarity in Italian. In the end we have 890 positive and 1224 negative Italian lemmas. Considering their frequencies in the training set (see Table 2), we find that only negative items are distinctive. Because of the presence of ironic tweets, positive lemmas tend to occur in tweets that have been tagged as negative.
The exploitation of TreeTagger PoSTWITA positive words in ironic communication is a well- AUX [A-Z a-z]+ AUX known phenomenon (Dews and Winner, 1995) - DET [A-Z a-z]+ DET the positive literal meaning is subverted by the PRO [A-Z a-z]+ PRON negative intended meaning - and neglecting this NPR [A-Z a-z]+ PROPN aspect of the Sentipolc2016 training set could im- PUN PUNCT ply lower classification performances. If we al- SENT PUNCT low positive items from SWN Core in the system VER[A-Z a-z]+cli VERB CLIT the classification of negative tweets is made diffi- VER [A-Z a-z]+ VERB cult. As we mention above, structural properties Table 3: Comparison between TreeTagger and SWN Core+ SWN Core- PoSTWITA tagsets. obj 536 (0.76%) 264 (0.37%) subj 2307 (1.4%) 1608 (1%) by the tag VERYNEG. At this point, with the in- pos 1055 (4.8%) 200 (0.9%) tention to have a minimal sequence of significant neg 839 (2%) 1096 (2.6%) tags, we created 4 version of the training set ac- Table 2: Absolute and relative frequencies of cording to 4 minimal structures, deleting all lem- SWN Core lemmas in Sentipolc2016 training set. mas and leaving only PoS tags: • minimal structure 1 (MSTRU1): EMO, of tweets can be treated as sequences of PoS. To MENTION, HASHTAG, URL, EMAIL; reduce data sparseness and to include dedicated tags for Twitter we choose the tagset proposed • minimal structure 2 (MSTRU2): EMO, by PoSTWITA, an Evalita2016 task (Bosco et al., MENTION, HASHTAG, URL, EMAIL, 2016). It looks promising because it contains cat- PROPN, INTJ; egories that: • minimal structure 3 (MSTRU3): EMO, • could be easily tagged as preprocessing step MENTION, HASHTAG, URL, EMAIL, with regular expressions (for example MEN- PROPN, INTJ, ADJ, ADV; TION and LINK); • minimal structure 4 (MSTRU4): EMOTI- • are suitable for noisy data, tagging uniformly CON, MENTION, HASHTAG, URL, items that can be written in several, non- EMAIL, PROPN, INTJ, VERYNEG. 
predictable ways (ahahahha, haha as INTJ); We performed classification experiments with • contains tags that have communicative and these features and we get better results with pragmatic functions, such as emoticon and MSTRU4 (see par. 2.2). interjection (see Table 4). For Samskara each tweet is represented as a se- We preprocessed all the tweets in the training set quence including its EMO, MENTION, HASH- substituting elements that are easy to find, such as TAG, URL, EMAIL, PROPN (Proper Noun), mention, hashtags, email, link, emoticon (all tags INTJ and VERYNEG lemmas from SWN Core included in PoSTWITA). (see tweet in example 1 represented in example After that, Sentipolc2016 training set has been 2). This minimal, very compact way to repre- tagged with TreeTagger (Schmid, 1997); TreeTag- sent a tweet is very convenient because partially ger tags have been converted to PostTWITA tagset avoids any noise introduced by PoS tagger (con- (see Table 3) and additional tags from PosTWITA taining only VERYNEG and PROPN as elements have been added, building dedicated lists for them that should be properly tagged with this tool). that include items from PoSTWITA training set (1) @FGoria Mario Monti Premier! #Italiare- plus additional items selected by the authors (see siste. Table 4). (2) MENTION PROPN HASHTAG. Thanks to TreeTagger we have all the words lem- matized and so all the lemmas included in the neg- Additional features for the classification of subjec- ative counterpart of SWN Core can be substituted tive and positive or negative tweets are listed in new tag type examples PART particle ’s EMO emoticon :DD, :-)))), u u INTJ interjection ah, boh, oddioo SYM symbol %, &, < CONJ coordinating conjunction ebbene, ma, oppure SCONJ subordinating conjunction nonostante, mentre, come Table 4: Examples of lemmas tagged according to Twitter-specific PoSTWITA tags. 
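A minimal Python sketch of the MSTRU4 representation, and of the PoS unigram and bigram frequencies that can be derived from it, is given below. It assumes the tweet has already been tokenized, tagged and lemmatized (the tuples in the example stand in for TreeTagger output converted to PoSTWITA tags), and the regular expressions for Twitter-specific items are hypothetical simplifications, not the patterns used by Samskara; names such as `minimal_structure` and `ngram_features` are illustrative. Example (2) shows a single PROPN for “Mario Monti”; since the paper does not state whether adjacent proper nouns were merged, this sketch keeps one tag per token.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical patterns for Twitter-specific items (simplified; not Samskara's own).
TWITTER_PATTERNS = [
    ("URL", re.compile(r"^(https?://|www\.)\S+$")),
    ("EMAIL", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("MENTION", re.compile(r"^@\w+$")),
    ("HASHTAG", re.compile(r"^#\w+$")),
    ("EMO", re.compile(r"^[:;=][-']?[()DPp]+$")),
]

# Tags kept by minimal structure 4 (MSTRU4).
MSTRU4 = {"EMO", "MENTION", "HASHTAG", "URL", "EMAIL", "PROPN", "INTJ", "VERYNEG"}

Token = Tuple[str, str, str]  # (word form, PoSTWITA tag, lemma)


def twitter_tag(form: str) -> Optional[str]:
    """Return a Twitter-specific tag if one of the patterns matches, else None."""
    for tag, pattern in TWITTER_PATTERNS:
        if pattern.match(form):
            return tag
    return None


def minimal_structure(tokens: Iterable[Token], veryneg_lemmas: Set[str],
                      keep: Set[str] = MSTRU4) -> List[str]:
    """Replace words by tags and drop every tag outside the minimal structure."""
    sequence = []
    for form, pos, lemma in tokens:
        tag = twitter_tag(form) or pos
        if lemma in veryneg_lemmas:  # lemma from the negative SWN Core
            tag = "VERYNEG"
        if tag in keep:
            sequence.append(tag)
    return sequence


def ngram_features(sequence: List[str]) -> Counter:
    """Frequencies of tag unigrams (str keys) and bigrams (tuple keys)."""
    features = Counter(sequence)
    features.update(zip(sequence, sequence[1:]))
    return features


if __name__ == "__main__":
    # Example (1) with hypothetical tagger output.
    tweet = [("@FGoria", "NOUN", "@FGoria"), ("Mario", "PROPN", "Mario"),
             ("Monti", "PROPN", "Monti"), ("Premier", "NOUN", "premier"),
             ("!", "PUNCT", "!"), ("#Italiaresiste", "NOUN", "#Italiaresiste"),
             (".", "PUNCT", ".")]
    seq = minimal_structure(tweet, veryneg_lemmas=set())
    print(seq)  # ['MENTION', 'PROPN', 'PROPN', 'HASHTAG']
    print(ngram_features(seq))
```

Passing a different `keep` set (for example the MSTRU1 tags) yields the other minimal structures from the same tagged input, which is one way to compare them as in the experiments above.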
Additional features for the classification of subjective and of positive or negative tweets are listed in Table 5, where BOOL marks a boolean feature and NUM a numeric feature (corresponding to absolute frequencies). The features have been selected by considering their communicative function. For example, a1 is useful because there is a tendency to communicate opinionated content in discussions with other users, while we chose a2 because neutral tweets often advertise newspaper articles in a non-opinionated way, including the link at the end of the tweet; the URL is also significant in other positions (a6, a6_1). Together with emoticons, interjections are items that signal the presence of opinionated content. Because of the asynchronous communication that characterizes them, tweets can contain questions that do not expect an answer, i.e. rhetorical questions (a8_1), which make the tweet opinionated.

feature | description | type
a1 | the tweet starts with MENTION | BOOL
a2 | the tweet ends with a LINK | BOOL
a3 | the tweet has PoS of type PUNCT | BOOL
a3_1 | number of PoS of type PUNCT in each tweet | NUM
a4 | the tweet has PoS of type VERYNEG | BOOL
a4_1 | number of PoS of type VERYNEG in each tweet | NUM
a5 | the tweet has PoS of type INTJ | BOOL
a5_1 | number of PoS of type INTJ in each tweet | NUM
a6 | the tweet has PoS of type URL | BOOL
a6_1 | number of PoS of type URL in each tweet | NUM
a7 | the tweet has PoS of type EMOTICON | BOOL
a7_1 | number of PoS of type EMOTICON in each tweet | NUM
a8_1 | the tweet contains a question | BOOL
a8_2 | the tweet contains a question at the end | BOOL
a9 | the tweet contains two consecutive exclamation marks (’!!’) | BOOL
a10 | the tweet contains connectives such as anzitutto, comunque, dapprima, del resto | BOOL
Table 5: Additional features for subjectivity and polarity classification of tweets.

2.2 Results and Discussion
The system adopts the Weka² library, which allows experiments with different classifiers. Because Naive Bayes (default settings, 10-fold cross-validation) performed better than Support Vector Machines, we chose the former; the best performance was obtained with MSTRU4, using the frequencies of PoS unigrams and bigrams as features. We took part in Sentipolc2016 only with a constrained run, choosing slightly different feature sets for subjectivity and polarity evaluation.
² http://www.cs.waikato.ac.nz/ml/weka/
From the additional features in Table 5, we selected a subset for Task 1 after an ablation test. More specifically, feature set 1 (FS1 in Table 6) is composed of features a1, a2, a4, a4_1, a6, a6_1, a7, a7_1, a8_1 and a9. System performance is reported in terms of F-score, the measure adopted by the task organizers (Barbieri et al., 2016). Results on the training data look promising for Task 1 and less promising for Task 2 (see Tables 6 and 7).

MSTRU4 + FS1 | F-score
obj | 0.532
subj | 0.811
Avg | 0.724
Table 6: Classification results for Task 1 obtained on Sentipolc2016 training set.

MSTRU4 + AllF | F-score
pos | 0.424
neg | 0.539
both | 0.047
neu | 0.526
Avg | 0.48
Table 7: Classification results for Task 2 obtained on Sentipolc2016 training set.

We did not succeed in optimising features for the polarity detection sub-task. The performance on the training set was not satisfying, but we nevertheless decided to submit results for Task 2 on the test set using all the features. Table 8 reports the official results submitted for the competition. Samskara was first among the constrained systems for subjectivity classification, while, not surprisingly, the performance in Task 2 was poor (rank 13). The results in Task 2 can be explained either by the absence in the system of structural features that are meaningful for the positive-negative distinction, or by the unsuitability of such a minimal approach for the task. Richer semantic features may be necessary for the detection and classification of polarity, and polarised lexical items should be revised, for example by representing each lemma as a sentiment-specific word embedding (SSWE) that encodes sentiment information (Tang et al., 2014).

 | F-score | Rank
Task 1 | 0.7184 | 1
Task 2 | 0.5683 | 13
Table 8: Classification results for Task 1 and Task 2 on Sentipolc2016 test set.

With Samskara we show that classification of tweets should take into account the structural properties of content on social media, especially properties that have communicative and pragmatic functions. The minimal features we selected for Samskara were successful for the classification of subjective Italian tweets. The system is based on a minimal set of features that are easy to retrieve and tag; the classification system is efficient and fast for Task 1 and, as such, it is promising for real-time processing of big data streams.

References
Stefano Baccianella, Andrea Esuli and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro and Rachele Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).
Cristina Bosco, Fabio Tamburini, Andrea Bolioli and Alessandro Mazzei. 2016. Overview of the EVALITA 2016 Part Of Speech on TWitter for ITAlian Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro and Rachele Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).
Shelly Dews and Ellen Winner. 1995. Muting the meaning: A social function of irony. Metaphor and Symbolic Activity, 10(1):3–19.
Bing Liu and Lei Zhang. 2012. A Survey of Opinion Mining and Sentiment Analysis. In C. C. Aggarwal and C. Zhai (Eds.), Mining Text Data, pp. 415–463. Springer, US.
James W. Pennebaker, Ryan L. Boyd, Kayla Jordan and Kate Blackburn. 2015. The Development and Psychometric Properties of LIWC2015.
Helmut Schmid. 1997. Probabilistic Part-of-Speech Tagging Using Decision Trees. In New Methods in Language Processing, UCL Press, pp. 154–164.
Efstathios Stamatatos. 2009. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology.
Philip J. Stone. 1966. The General Inquirer: A Computer Approach to Content Analysis. The MIT Press.
Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu and Bing Qin. 2014. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.