
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts

Cataldo Musto

cataldo.musto@uniba.it

Giovanni Semeraro

giovanni.semeraro@uniba.it

Marco Polignano

marco.polignano@uniba.it

Department of Computer Science, University of Bari Aldo Moro, Italy

The exponential growth of available online information provides computer scientists with many new challenges and opportunities. A recent trend is to analyze people's feelings, opinions and orientation about facts and brands: this is done by exploiting Sentiment Analysis techniques, whose goal is to classify the polarity of a piece of text according to the opinion of the writer. In this paper we propose a lexicon-based approach for sentiment classification of Twitter posts. Our approach is based on the exploitation of widespread lexical resources such as SentiWordNet, WordNet-Affect, MPQA and SenticNet. In the experimental session the effectiveness of the approach was evaluated against two state-of-the-art datasets. Preliminary results provide interesting outcomes and pave the way for future research in the area.

Keywords: Sentiment Analysis, Opinion Mining, Semantics, Lexicons

1 Introduction

Thanks to the exponential growth of available online information, many new challenges and opportunities arise for computer scientists. A recent trend is to analyze people's feelings, opinions and orientation about facts and brands: this is done by exploiting Sentiment Analysis [13, 8] techniques, whose goal is to classify the polarity of a piece of text according to the opinion of the writer.

State-of-the-art approaches for sentiment analysis are broadly classified in two categories: supervised approaches [6, 12] learn a classification model on the basis of a set of labeled data, while unsupervised (or lexicon-based) ones [18, 4] infer the sentiment conveyed by a piece of text on the basis of the polarity of the words (or the phrases) which compose it. Even if recent work in the area showed that supervised approaches tend to outperform unsupervised ones (see the recent SemEval 2013 and 2014 challenges [10, 15]), the latter have the advantage of avoiding the laborious step of labeling training data.

However, these techniques rely on (external) lexical resources which map words to a categorical (positive, negative, neutral) or numerical sentiment score, which is used by the algorithm to obtain the overall sentiment conveyed by the text. Clearly, the effectiveness of the whole approach strongly depends on the quality of the lexical resource it relies on. As a consequence, in this work we investigated the effectiveness of some widely available lexical resources in the task of sentiment classification of microblog posts.

2 State-of-the-art Resources for Lexicon-based Sentiment Analysis

SentiWordNet: SentiWordNet [1] is a lexical resource devised to support Sentiment Analysis applications. It provides an annotation based on three numerical sentiment scores (positivity, negativity, neutrality) for each WordNet synset [9]. Clearly, given that this lexical resource provides a synset-based sentiment representation, different senses of the same term may have different sentiment scores. As shown in Figure 1, the term terrible is provided with two different sentiment associations. In this case, SentiWordNet needs to be coupled with a Word Sense Disambiguation (WSD) algorithm to identify the most promising meaning.

WordNet-Affect: WordNet-Affect [17] is a linguistic resource for a lexical representation of affective knowledge. It is an extension of WordNet which labels affective-related synsets with affective concepts defined as A-Labels (e.g. the term euphoria is labeled with the concept positive-emotion, the noun illness is labeled with physical state, and so on). The mapping is performed on the basis of a domain-independent hierarchy (a fragment is provided in Figure 2) of affective labels automatically built relying on WordNet relationships.

MPQA: MPQA Subjectivity Lexicon [19] provides a lexicon of 8,222 terms (labeled as subjective expressions), gathered from several sources. This lexicon contains a list of words, along with their POS tags, labeled with polarity (positive, negative, neutral) and intensity (strong, weak).

SenticNet: SenticNet [3] is a lexical resource for concept-level sentiment analysis. It relies on Sentic Computing [2], a novel multi-disciplinary paradigm for Sentiment Analysis. Differently from the previously mentioned resources, SenticNet is able to associate polarity and affective information also to complex concepts such as accomplishing goal, celebrate special occasion and so on. At present, SenticNet provides sentiment scores (in a range between -1 and 1) for 14,000 common sense concepts. The sentiment conveyed by each term is defined on the basis of the intensity of sixteen basic emotions, defined in a model called Hourglass of Emotions (see Figure 3).

3 Methodology

Typically, lexicon-based approaches for sentiment classification are based on the insight that the polarity of a piece of text can be obtained on the basis of the polarity of the words which compose it. However, due to the complexity of natural languages, such a simple approach is likely to fail since many facets of the language (e.g., the presence of negation) are not taken into account. As a consequence, we propose a more fine-grained approach: given a Tweet $T$, we split it in several micro-phrases $m_1, \ldots, m_n$ according to the splitting cues occurring in the content. As splitting cues we used punctuation, adverbs and conjunctions. Whenever a splitting cue is found in the text, a new micro-phrase is built.

3.1 Description of the approach

Given such a representation, we define the sentiment $S$ conveyed by a Tweet $T$ as the sum of the polarity conveyed by each of the micro-phrases $m_i$ which compose it. In turn, the polarity of each micro-phrase depends on the sentiment score of each term in the micro-phrase, labeled as $score(t_j)$, which is obtained from one of the above described lexical resources. In this preliminary formulation of the approach we did not take into account any valence shifters [7] except for negation. When a negation is found in the text, the polarity of the whole micro-phrase is inverted. No heuristics have been adopted to deal with language intensifiers and downtoners, or to detect irony [14].
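The splitting into micro-phrases and the negation rule described above can be illustrated with a minimal Python sketch. The cue words, the negation list and the tokenizer are hypothetical choices made only for this example (the text names the cue categories, not the actual words), and `score` stands for any term-level lookup into one of the lexical resources.

```python
import re

# Hypothetical cue and negation lists: the text names only the cue categories
# (punctuation, adverbs, conjunctions), not the actual words.
PUNCTUATION = set(".,;:!?")
WORD_CUES = {"and", "but", "or", "because", "however", "then"}
NEGATIONS = {"not", "no", "never"}

# Simplistic tokenizer (an assumption): words, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z']*|[.,;:!?]")


def split_micro_phrases(tweet):
    """Start a new micro-phrase whenever a splitting cue is found."""
    phrases, current = [], []
    for token in TOKEN_RE.findall(tweet.lower()):
        if token in PUNCTUATION or token in WORD_CUES:
            if current:
                phrases.append(current)
            # Assumption: punctuation is dropped, a word cue opens the next phrase.
            current = [] if token in PUNCTUATION else [token]
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases


def micro_phrase_polarity(phrase, score):
    """Sum the term scores and invert the sign if a negation occurs."""
    total = sum(score(term) for term in phrase)
    return -total if any(term in NEGATIONS for term in phrase) else total


def tweet_sentiment(tweet, score):
    """Sentiment of the Tweet as the sum of the micro-phrase polarities."""
    return sum(micro_phrase_polarity(p, score) for p in split_micro_phrases(tweet))
```

With a toy lexicon in which "love" scores 1.0 and "like" scores 0.5, the Tweet "I love the screen, but I do not like the battery!" yields two micro-phrases, and the second one is inverted because it contains "not".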

Fig. 3. The Hourglass of Emotions

We defined four different implementations of such approach: basic, normalized, emphasized and emphasized-normalized. In the basic formulation, the sentiment of the Tweet is obtained by first summing the polarity of each micro-phrase. Then, the score is normalized through the length of the whole Tweet. In this case the micro-phrases are just exploited to invert the polarity when a negation is found in text.

$$S_{\mathit{basic}}(T) = \frac{\sum_{i=1}^{n} \mathit{pol}_{\mathit{basic}}(m_i)}{|T|} \tag{1}$$

$$\mathit{pol}_{\mathit{basic}}(m_i) = \sum_{j=1}^{k} \mathit{score}(t_j) \tag{2}$$

In the normalized formulation, the micro-phrase-level scores are normalized by using the length of the single micro-phrase, in order to weigh differently the micro-phrases according to their length.

$$S_{\mathit{norm}}(T) = \sum_{i=1}^{n} \mathit{pol}_{\mathit{norm}}(m_i) \tag{3}$$

$$\mathit{pol}_{\mathit{norm}}(m_i) = \frac{\sum_{j=1}^{k} \mathit{score}(t_j)}{|m_i|} \tag{4}$$

The emphasized version is an extension of the basic formulation which gives a bigger weight to the terms $t_j$ belonging to specific POS categories:

$$S_{\mathit{emph}}(T) = \frac{\sum_{i=1}^{n} \mathit{pol}_{\mathit{emph}}(m_i)}{|T|} \tag{5}$$

$$\mathit{pol}_{\mathit{emph}}(m_i) = \sum_{j=1}^{k} \mathit{score}(t_j) \cdot w_{\mathit{pos}(t_j)} \tag{6}$$

where $w_{\mathit{pos}(t_j)}$ is greater than 1 if $\mathit{pos}(t_j) \in \{\text{adverbs}, \text{verbs}, \text{adjectives}\}$, otherwise 1.

Finally, the emphasized-normalized is just a combination of the second and third version of the approach:

$$S_{\mathit{emphNorm}}(T) = \sum_{i=1}^{n} \mathit{pol}_{\mathit{emphNorm}}(m_i) \tag{7}$$

$$\mathit{pol}_{\mathit{emphNorm}}(m_i) = \frac{\sum_{j=1}^{k} \mathit{score}(t_j) \cdot w_{\mathit{pos}(t_j)}}{|m_i|} \tag{8}$$

3.2 Lexicon-based Score Determination

Regardless of the variant which is adopted, the effectiveness of the whole approach strictly depends on the way $score(t_j)$ is calculated. For each lexical resource, a different way to determine the sentiment score is adopted.
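Since the four variants differ only in weighting and normalization, they can be sketched as one function that takes the term-level score as a pluggable argument. This is an illustrative sketch, not the authors' implementation: the POS tag names, the negation list and the choice of measuring the Tweet length as the total number of terms are assumptions, and the default boost of 1.5 is the value reported in the experimental design.

```python
NEGATIONS = {"not", "no", "never"}        # hypothetical negation list
EMPHASIZED_POS = {"ADV", "VERB", "ADJ"}   # tag names are an assumption


def phrase_polarity(phrase, score, emphasize, normalize, boost):
    """Polarity of one micro-phrase, given as a list of (term, pos) pairs."""
    total = 0.0
    for term, pos in phrase:
        weight = boost if emphasize and pos in EMPHASIZED_POS else 1.0
        total += score(term) * weight
    if any(term in NEGATIONS for term, _ in phrase):
        total = -total                    # negation inverts the whole micro-phrase
    return total / len(phrase) if normalize else total


def tweet_sentiment(phrases, score, variant="basic", boost=1.5):
    """Score a Tweet under one of the four variants of the approach."""
    emphasize = variant in ("emphasized", "emphasized-normalized")
    normalize = variant in ("normalized", "emphasized-normalized")
    phrases = [p for p in phrases if p]   # skip empty micro-phrases
    if not phrases:
        return 0.0
    total = sum(phrase_polarity(p, score, emphasize, normalize, boost) for p in phrases)
    if normalize:
        return total                      # already normalized per micro-phrase
    # Assumption: the Tweet length is the total number of terms.
    return total / sum(len(p) for p in phrases)
```

Keeping `score` as a parameter mirrors the structure of the approach: the same four variants can be run unchanged on top of any of the lexical resources.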

As regards SentiWordNet, $t_j$ is processed through an NLP pipeline to get its POS tag. Next, all the synsets of the term mapped to that POS are extracted. Finally, $score(t_j)$ is calculated as the weighted average of the sentiment scores of those synsets.
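A minimal sketch of this step is given below. The weighting scheme is not specified in the text, so the sketch assumes weights that decay with the sense rank (1, 1/2, 1/3, ...) and takes positivity minus negativity as the polarity of a synset; both are assumptions. The lexicon entries are hypothetical values used only for illustration.

```python
# Hypothetical entries: (term, POS) -> [(positivity, negativity), ...],
# ordered by sense rank (most frequent sense first).
TOY_SENTIWORDNET = {
    ("terrible", "a"): [(0.0, 0.625), (0.125, 0.25)],
}


def sentiwordnet_score(term, pos, lexicon=TOY_SENTIWORDNET):
    """Rank-weighted average polarity over the synsets matching the POS."""
    synsets = lexicon.get((term, pos), [])
    if not synsets:
        return 0.0  # term not covered for this POS
    # Assumed weighting: the r-th sense gets weight 1/r.
    weights = [1.0 / rank for rank in range(1, len(synsets) + 1)]
    polarities = [positive - negative for positive, negative in synsets]
    return sum(w * s for w, s in zip(weights, polarities)) / sum(weights)
```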

If WordNet-Affect is chosen as lexical resource, the algorithm tries to map the term $t_j$ to one of the nodes of the affective hierarchy. The hierarchy is climbed until a match is found. In that case, the term inherits the sentiment score (extracted from SentiWordNet) of the A-Label it matches. Otherwise, it is ignored.
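One possible reading of this procedure is sketched below: the term is looked up in a parent-pointer hierarchy and its ancestors are visited until a node with a known score is reached. The hierarchy fragment, the intermediate node and the numeric scores are hypothetical; only the euphoria / positive-emotion pairing comes from the resource description above.

```python
# Hypothetical fragment of the hierarchy: child -> parent.
PARENT = {
    "euphoria": "joy",
    "joy": "positive-emotion",
    "positive-emotion": "emotion",
}

# Hypothetical scores for some A-Labels (taken from SentiWordNet in the approach).
A_LABEL_SCORE = {"positive-emotion": 0.6, "negative-emotion": -0.6}


def wordnet_affect_score(term):
    """Climb the hierarchy until a scored A-Label is found; None if the term is ignored."""
    node, seen = term, set()
    while node is not None and node not in seen:
        if node in A_LABEL_SCORE:
            return A_LABEL_SCORE[node]
        seen.add(node)            # guard against cycles in malformed data
        node = PARENT.get(node)   # move one level up, or stop at the root
    return None
```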

The determination of the score with MPQA is quite straightforward, since the algorithm first associates the correct POS tag with the term $t_j$, then looks for it in the lexicon. If found, the term is assigned a different score according to its categorical label.
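The lookup can be sketched as a dictionary keyed by term and POS tag. The numeric values attached to the categorical labels are not reported in the text, so the mapping below (sign from the polarity, magnitude from the intensity) is purely an assumption, as are the toy lexicon entries.

```python
# Hypothetical entries: (term, POS) -> (polarity, intensity).
TOY_MPQA = {
    ("good", "adj"): ("positive", "strong"),
    ("dislike", "verb"): ("negative", "weak"),
    ("feel", "verb"): ("neutral", "weak"),
}

# Assumed numeric mapping of the categorical labels.
POLARITY_SIGN = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
INTENSITY_WEIGHT = {"strong": 1.0, "weak": 0.5}


def mpqa_score(term, pos, lexicon=TOY_MPQA):
    """Return the score of a POS-tagged term, or 0.0 if it is not in the lexicon."""
    entry = lexicon.get((term.lower(), pos))
    if entry is None:
        return 0.0
    polarity, intensity = entry
    return POLARITY_SIGN[polarity] * INTENSITY_WEIGHT[intensity]
```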

A similar approach is performed for SenticNet, since the knowledge base is queried and the polarity associated with the term is obtained. However, given that SenticNet also models common sense concepts, the algorithm tries to match more complex expressions (such as bigrams and trigrams) before looking for simple unigrams.
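The longest-match-first lookup can be sketched as a greedy left-to-right scan that tries trigrams, then bigrams, then unigrams at each position. The sketch uses a local dictionary with hypothetical scores in place of the actual knowledge base, and the greedy, non-overlapping matching policy is an assumption.

```python
# Hypothetical concept scores in [-1, 1]; real values would come from SenticNet.
TOY_SENTICNET = {
    "celebrate special occasion": 0.7,
    "special": 0.3,
    "occasion": 0.1,
}


def senticnet_matches(tokens, lexicon=TOY_SENTICNET, max_n=3):
    """Match the longest known concept at each position, longer n-grams first."""
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            concept = " ".join(tokens[i:i + n])
            if concept in lexicon:
                matches.append((concept, lexicon[concept]))
                i += n          # consume the matched tokens
                break
        else:
            i += 1              # no concept starts here; skip the token
    return matches
```

Matching the trigram first prevents "special" and "occasion" from being scored separately when they are part of the larger concept.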

4 Experimental Evaluation

In the experimental session we evaluated the effectiveness of the above described lexical resources in the task of sentiment classification of microblog posts. Specifically, we evaluated the accuracy of our lexicon-based approach by varying both the lexical resource (four options) and the version of the algorithm (four options).

Dataset and Experimental Design: experiments were performed by exploiting the SemEval-2013 [10] and Stanford Twitter Sentiment (STS) [5] datasets. The SemEval-2013¹ dataset consists of 14,435 Tweets already split in training (8,180 Tweets) and test data (3,255). Tweets have been manually annotated and are classified as positive, neutral and negative. The STS dataset contains more than 1,600,000 Tweets, already split in training and test set, but the test set is considerably smaller than the training set (only 359 Tweets). In this case Tweets have been collected through Twitter APIs² and automatically labeled according to the emoticons they contained.

Even if our approach can work in a totally unsupervised manner, we used training data to learn positive and negative classification thresholds through a simple greedy strategy. For SemEval-2013 all the data were used to learn the thresholds, while for STS only 10,000 random Tweets were exploited, due to computational issues. As regards the emphasis-based approach, the boosting factor $w$ is set to 1.5 after a rough tuning (the score of adjectives, adverbs and nouns is increased by 50%). As regards the lexical resources, the latest versions of MPQA, SentiWordNet and WordNet-Affect were downloaded, while SenticNet was invoked through the available REST APIs³. Some statistics about the coverage of the lexical resources are provided in Table 1. For POS-tagging of Tweets, we adopted TwitterNLP⁴ [11], a resource specifically developed for POS-tagging of microblog posts. Finally, the effectiveness of the approaches was evaluated by calculating both accuracy and F1-measure [16] on test sets, while statistical significance was assessed through McNemar's test⁵.

¹ www.cs.york.ac.uk/semeval-2013/task2/
² https://dev.twitter.com/
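The threshold-learning step is only named in the text, so the following is one possible greedy strategy, offered as a sketch under stated assumptions: a Tweet is positive above the positive threshold, negative below the negative one and neutral otherwise, and each threshold is moved to a candidate value whenever that strictly improves training accuracy. The candidate grid and the starting point of 0.0 are hypothetical.

```python
def classify(score, neg_threshold, pos_threshold):
    """Map a numeric sentiment score to one of the three classes."""
    if score > pos_threshold:
        return "positive"
    if score < neg_threshold:
        return "negative"
    return "neutral"


def accuracy(scores, labels, neg_threshold, pos_threshold):
    hits = sum(classify(s, neg_threshold, pos_threshold) == y
               for s, y in zip(scores, labels))
    return hits / len(labels)


def learn_thresholds(scores, labels, candidates):
    """Greedy search: accept any single-threshold move that improves accuracy."""
    neg_t, pos_t = 0.0, 0.0
    best = accuracy(scores, labels, neg_t, pos_t)
    improved = True
    while improved:
        improved = False
        for c in candidates:
            # Try c as positive threshold (must stay at or above the negative one).
            if c >= neg_t and accuracy(scores, labels, neg_t, c) > best:
                best, pos_t, improved = accuracy(scores, labels, neg_t, c), c, True
            # Try c as negative threshold (must stay at or below the positive one).
            if c <= pos_t and accuracy(scores, labels, c, pos_t) > best:
                best, neg_t, improved = accuracy(scores, labels, c, pos_t), c, True
    return neg_t, pos_t, best
```

The loop terminates because accuracy can take only finitely many values and every accepted move strictly increases it.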

Discussion of the Results: results of the experiments on SemEval-2013 data are provided in Figure 4. Due to space reasons, we only report accuracy scores. Results show that the best-performing configuration is the one based on SentiWordNet which exploits both emphasis and normalization. By comparing all the variants, it emerges that the introduction of emphasis leads to an improvement in 7 out of 8 comparisons (0.4% on average). Differences are statistically significant only by considering the introduction of emphasis on the normalized approach with SenticNet (p < 0.0001) and SentiWordNet (p < 0.0008). On the other side, the introduction of normalization leads to an improvement only in 1 out of 4 comparisons, by using the WordNet-Affect resource (p < 0.04). By comparing the effectiveness of the different lexical resources, it emerges that SentiWordNet performs significantly better than both SenticNet and WordNet-Affect (p < 0.0001). However, even if the gap with MPQA is quite large (0.7%, from 58.24 to 58.98), the difference is not statistically significant (p < 0.5). To sum up, the analysis performed on SemEval-2013 showed that SentiWordNet and MPQA are the best-performing lexical resources on such data.
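For readers who want to reproduce this kind of comparison, McNemar's test on the paired predictions of two configurations can be sketched as follows. The sketch uses the chi-square form with continuity correction, which is an assumption since the text does not say which form of the test was applied; the p-value relies on the fact that the survival function of a chi-square variable with one degree of freedom is erfc(sqrt(x / 2)).

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's chi-square test (continuity-corrected) for two classifiers."""
    triples = list(zip(y_true, pred_a, pred_b))
    only_a = sum(a == y and b != y for y, a, b in triples)  # A right, B wrong
    only_b = sum(a != y and b == y for y, a, b in triples)  # A wrong, B right
    if only_a + only_b == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    statistic = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

Only the Tweets on which exactly one of the two configurations is correct contribute to the statistic, which is why two configurations with a visible accuracy gap can still fail to differ significantly on a small test set.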

Figure 5 shows the results of the approaches on the STS dataset. Due to the small number of Tweets in the test set, results have a smaller statistical significance. In this case, the best-performing lexical resource is SenticNet, which obtained 74.65% accuracy, greater than those obtained by the other lexical resources. However, the gap is statistically significant only if compared to WordNet-Affect (p < 0.00001) and almost significant with respect to MPQA (p < 0.11). Finally, even if the gap with SentiWordNet is around 2% (72.42% accuracy), the difference does not seem statistically significant (p < 0.42). Differently from SemEval-2013 data, it emerges that the introduction of emphasis leads to an improvement only in 2 comparisons (+0.28% only on MPQA and WordNet-Affect), while in all the other cases no improvement was noted. The introduction of normalization produced an improvement in 3 out of 4 comparisons (average improvement of 0.6%, peak of 1.2% on MPQA). In all these cases, no statistical differences emerged on varying the approaches on the same lexical resource.

³ http://sentic.net/api/
⁴ http://www.ark.cs.cmu.edu/TweetNLP/
⁵ http://en.wikipedia.org/wiki/McNemar's_test

5 Conclusions and Future Work

In this paper we provided a thorough comparison of lexicon-based approaches for sentiment classification of microblog posts. Specifically, four widespread lexical resources and four different variants of our algorithm have been evaluated against two state-of-the-art datasets.

Even if the results are mixed, some interesting behavioral patterns were noted: MPQA and SentiWordNet emerged as the best-performing lexical resources on those data. This is an interesting outcome since even a resource with a smaller coverage such as MPQA can produce results which are comparable to a general-purpose lexicon such as SentiWordNet. This is probably due to the fact that subjective terms, which MPQA strongly relies on, play a key role for sentiment classification. On the other side, results obtained by WordNet-Affect were not good. This is partially due to the very small coverage of the lexicon, but it is likely that the choice of basing sentiment classification only on affective features filters out a lot of relevant terms. Finally, results obtained by SenticNet were really interesting since it was the best-performing configuration on STS and the worst-performing one on SemEval data. Further analysis of the results showed that this behavior was due to the fact that SenticNet can hardly classify neutral Tweets (only 20% accuracy on that data), and this negatively affected the overall results on a three-class classification task. Further analyses are needed to investigate this behavior.

As future work, we will extend the analysis by evaluating more lexical resources as well as more datasets. Moreover, we will refine our technique for threshold learning and we will try to improve our algorithm by modeling more complex syntactic structures as well as by introducing a word-sense disambiguation strategy to make our approach semantics-aware.

Acknowledgments. This work fulfils the research objectives of the project "VINCENTE - A Virtual collective INtelligenCe ENvironment to develop sustainable Technology Entrepreneurship ecosystems" funded by the Italian Ministry of University and Research (MIUR).

References

1. Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of LREC, volume 10, pages 2200-2204, 2010.
2. Erik Cambria and Amir Hussain. Sentic computing. Springer, 2012.
3. Erik Cambria, Daniel Olsher, and Dheeraj Rajagopal. SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. AAAI, Quebec City, pages 1515-1521, 2014.
4. Xiaowen Ding, Bing Liu, and Philip S. Yu. A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 231-240. ACM, 2008.
5. Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pages 1-12, 2009.
6. Xia Hu, Lei Tang, Jiliang Tang, and Huan Liu. Exploiting social relations for sentiment analysis in microblogging. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 537-546. ACM, 2013.
7. Alistair Kennedy and Diana Inkpen. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110-125, 2006.
8. Bing Liu and Lei Zhang. A survey of opinion mining and sentiment analysis. In Mining Text Data, pages 415-463. Springer, 2012.
9. George Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41, 1995.
10. Preslav Nakov, Zornitsa Kozareva, Alan Ritter, Sara Rosenthal, Veselin Stoyanov, and Theresa Wilson. SemEval-2013 task 2: Sentiment analysis in Twitter. 2013.
11. Olutobi Owoputi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah Smith. Improved part-of-speech tagging for online conversational text with word clusters. In HLT-NAACL, pages 380-390, 2013.
12. Alexander Pak and Patrick Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In LREC, 2010.
13. Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2008.
14. Antonio Reyes, Paolo Rosso, and Tony Veale. A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1):239-268, 2013.
15. Sara Rosenthal, Preslav Nakov, Alan Ritter, and Veselin Stoyanov. SemEval-2014 task 9: Sentiment analysis in Twitter. Proc. SemEval, 2014.
16. Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1-47, 2002.
17. Carlo Strapparava and Alessandro Valitutti. WordNet-Affect: an affective extension of WordNet. In LREC, volume 4, pages 1083-1086, 2004.
18. Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267-307, 2011.
19. Janyce Wiebe, Theresa Wilson, and Claire Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165-210, 2005.