
Sentiment Analysis for Real-time Applications

Javi Fernández

javifm@ua.es, University of Alicante

In this paper we present a supervised hybrid approach for Sentiment Analysis in Real-time Applications. The main goal of this work is to design an approach which employs very few resources but obtains near state-of-the-art results.

Recent years have seen the birth of Social Networks and Web 2.0, which have made it easier for people to share aspects of, and opinions about, their everyday life. This subjective information can be interesting for general users, brands and organisations. However, the vast amount of information (for example, over 500 million messages per day on Twitter¹) makes it difficult for traditional sentiment analysis systems to process this subjective information in real time. As a result, the performance of sentiment analysis tools has become increasingly critical.

The main goal of our work is to design a sentiment analysis approach oriented to real-time applications: an approach that balances efficiency and quality. It must employ very few resources, in order to be able to process as many texts as possible. This will also make sentiment analysis more accessible for everybody. In addition, the quality of the approach should be near the state-of-the-art results.

In the following sections we explain our approach in detail. Section 2 briefly describes the related work in the field and introduces our work. In Section 3 we detail the approach we propose. Finally, Section 4 concludes the paper and outlines the future work.

This research work has been partially funded by the University of Alicante, Generalitat Valenciana, Spanish Government, Ministerio de Educación, Cultura y Deporte and Ayudas Fundación BBVA a equipos de investigación científica 2016 through the projects TIN2015-65100-R, TIN2015-65136-C2-2-R, PROMETEOII/2014/001, GRE1601: Plataforma inteligente para recuperación, análisis y representación de la información generada por usuarios en Internet and Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP).

¹ www.internetlivestats.com/twitter-statistics

2 Related Work

Two main approaches to sentiment analysis can be followed: machine learning and lexicon-based (Taboada et al., 2011; Medhat, Hassan, and Korashy, 2014; Mohammad, 2015; Ravi and Ravi, 2015).

Machine learning approaches treat polarity classification as a text categorisation problem. Texts are usually represented as vectors of features, and the results depend heavily on the features chosen. If a labelled training set of documents is needed, the approach is defined as supervised learning; if not, it is defined as unsupervised learning. These approaches perform very well in the domain they are trained on, but their performance drops when the same classifier is used in a different domain (Pang and Lee, 2008; Tan et al., 2009). In addition, if the number of features is large, efficiency drops dramatically.

Lexicon-based approaches make use of dictionaries of opinionated words and phrases to discern the polarity of a text. In these approaches, each word in the dictionary is assigned a score for each sentiment (e.g. positivity and negativity). To detect the polarity of a text, the scores of its words are combined, and the polarity with the greatest score is chosen. These dictionaries can be generated manually (Tong, 2001), semi-automatically from an initial seed of opinionated words (Kim, Rey, and Hovy, 2004; Baccianella, Esuli, and Sebastiani, 2010), or automatically from a labelled dataset (Jijkoun, de Rijke, and Weerkamp, 2010; Cruz et al., 2013). The major disadvantage of the manual and semi-automatic approaches is their inability to find opinion words with domain- and context-specific orientations, whereas generating the dictionary automatically from a labelled dataset helps to solve this problem (Medhat, Hassan, and Korashy, 2014). Lexicon-based approaches are usually faster than machine learning ones, as the combination of scores is normally a predefined mathematical function.

We decided to use a hybrid approach, trying to take advantage of both the categorisation quality of machine learning and the speed of the lexicon-based approach.

Most current sentiment analysis approaches employ words, n-grams and phrases as information units for their models, either as features for machine learning approaches or as dictionary entries in lexicon-based approaches. However, words and n-grams struggle to represent the flexibility and sequentiality of human language. This is the reason why we decided to use skipgrams. Skipgram modelling is a technique whereby n-grams are formed (bigrams, trigrams, etc.), but in addition to using adjacent sequences of words, it also allows some words to be skipped (Guthrie et al., 2006). In this way, skipgrams are new terms that retain part of the sequentiality of the original terms, but in a more flexible way than n-grams (Fernández et al., 2014). Note that an n-gram can be defined as a 0-skip-n-gram, that is, a skipgram where k = 0. For example, the sentence "I love healthy food" has two word-level trigrams: "I love healthy" and "love healthy food". However, there is one important trigram implied by the sentence that is not captured: "I love food". The use of skipgrams allows the word "healthy" to be skipped, providing the mentioned trigram.
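The example above can be reproduced with a short Python sketch. It is only an illustration of k-skip-n-grams as described by Guthrie et al. (2006), not the implementation used in this work; the function name `skipgrams` is hypothetical, and we assume that the number of skips of a term is the total number of words jumped over between its first and last word.

```python
from itertools import combinations


def skipgrams(words, n, k):
    """Return every k-skip-n-gram: n words kept in order, at most k words skipped."""
    result = []
    for idx in combinations(range(len(words)), n):
        # Words inside the covered span that were left out (assumed skip count).
        skips = idx[-1] - idx[0] + 1 - n
        if skips <= k:
            result.append(tuple(words[i] for i in idx))
    return result


sentence = "I love healthy food".split()
trigrams = skipgrams(sentence, n=3, k=0)       # plain trigrams (0 skips)
skip_trigrams = skipgrams(sentence, n=3, k=1)  # trigrams with up to 1 skip
```

With k = 0 only the two adjacent trigrams are produced, while k = 1 additionally yields "I love food". Enumerating all index combinations is simple but costly for long texts, so a real-time system would need a windowed variant.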

3 Methodology

Our contribution consists of a hybrid approach which creates a lexicon from a labelled dataset and builds a polarity classifier from the dataset and the generated lexicon with machine learning techniques. Its architecture can be seen in Figure 1. In the following subsections we explain the different parts of our approach in detail.

[Figure 1. Components: Dataset, Tokenisation, Lexicon Generation, Supervised Learning, Lexicon, Classifier.]

3.1 Tokenisation

We tried to employ the minimum number of external linguistic tools, to minimise the possible propagation of external errors as well as the extra time they can consume. The tokenisation process starts by obtaining all the words in the text. We only extract words containing alphabetic characters. Numbers, punctuation symbols and emoticons are not considered at this moment, but we are studying the best way to include them in the future. The only external resource we employ for the tokenisation process is a stemmer, used to obtain the most general form of the words we extracted. We preferred a stemmer over a lemmatiser because stemmers are much faster (Balakrishnan and Lloyd-Yemoh, 2014) and require fewer resources, which is one of the goals of our approach. Specifically, we used the Snowball² implementation for each language.

Once we have the words in the text, we combine them using skipgram modelling to obtain multiword terms. We will use two variables in this work: n is the maximum number of words when building a new term with skipgram modelling, and k is the maximum number of skips. Note that n = 3 includes all the terms with 1, 2 and 3 words, and k = 3 includes the terms with 1, 2 and 3 skips in addition to the terms whose words are adjacent (0 skips).
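The tokenisation and term-building steps can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not stated in the text: words are lower-cased, the stemmer is passed in as a function (an identity placeholder stands in for Snowball here), and when a term occurs several times in a document its smallest skip count is kept. The names `tokenise` and `extract_terms` are hypothetical.

```python
import re
from itertools import combinations


def tokenise(text, stem=lambda word: word):
    """Keep only alphabetic words, lower-case them (assumption) and stem them.

    `stem` is a placeholder; a Snowball stemmer would be plugged in here.
    """
    return [stem(word.lower()) for word in re.findall(r"[^\W\d_]+", text)]


def extract_terms(words, n, k):
    """Map every term of 1..n words with at most k skips to its skip count."""
    terms = {}
    for size in range(1, n + 1):
        for idx in combinations(range(len(words)), size):
            skips = idx[-1] - idx[0] + 1 - size
            if skips <= k:
                term = " ".join(words[i] for i in idx)
                # Assumption: keep the smallest skip count seen for the term.
                terms[term] = min(skips, terms.get(term, skips))
    return terms


terms = extract_terms(tokenise("Worst movie ever!!! 0/10"), n=2, k=10)
```

For this input the digits and punctuation are discarded, and the resulting terms are the three single words plus "worst movie", "movie ever" (0 skips each) and "worst ever" (1 skip).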

3.2 Lexicon generation

In summary, our sentiment lexicon consists of a list of terms for each polarity, each with a score indicating how strongly that term is related to that polarity. To build this lexicon, we need a polarity-labelled dataset, which will provide both the terms in the lexicon and their scores. There exist many term scoring techniques (Yang and Pedersen, 1997; Chandrashekar and Sahin, 2014), and the majority of them employ probabilities to calculate the scores. However, they do not take full advantage of skipgram modelling, because they give the same importance to terms whose words were adjacent as to terms whose words were not adjacent (where some words were skipped). Because of this, we created our own scoring formula.

² snowball.tartarus.org

First, we describe our counting formulas. In general, to count the number of documents in which the term t occurs, we loop over the dataset and add 1 each time we find that term in a document. Instead, we add a value that decreases as the number of skips grows. This is what Equations 1 and 2 do, where $D$ is the labelled dataset, $|D|$ is the number of documents in $D$, $d$ is a document in $D$, $D_p$ is the subset of documents in $D$ labelled with polarity $p$, $|t|$ is the number of words in term $t$, $\operatorname{skips}(t, d)$ is the number of skips of term $t$ in document $d$, and $[t \in d]$ is 1 if $t$ occurs in $d$ and 0 otherwise.

$$C(t) = \sum_{d \in D} [t \in d] \, \frac{|t|}{|t| + \operatorname{skips}(t, d)} \tag{1}$$

$$C(t, p) = \sum_{d \in D_p} [t \in d] \, \frac{|t|}{|t| + \operatorname{skips}(t, d)} \tag{2}$$
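A possible reading of these two counting formulas in Python is sketched below. It is an illustration only: the function name `weighted_counts` and the input layout (each document already reduced to a mapping from term to skip count, paired with its polarity label) are assumptions made for the example, and the toy dataset is invented.

```python
from collections import defaultdict


def weighted_counts(dataset):
    """Accumulate C(t) and C(t, p) over (terms, polarity) pairs.

    `terms` maps each term found in a document to its number of skips there.
    """
    total = defaultdict(float)        # C(t)
    by_polarity = defaultdict(float)  # C(t, p), keyed by (term, polarity)
    for terms, polarity in dataset:
        for term, skips in terms.items():
            size = len(term.split())        # |t|
            value = size / (size + skips)   # 1 when the words are adjacent
            total[term] += value
            by_polarity[term, polarity] += value
    return total, by_polarity


# Invented toy dataset, for illustration only.
dataset = [
    ({"worst": 0, "ever": 0, "worst ever": 1}, "negative"),
    ({"worst": 0, "ever": 0, "worst ever": 0}, "negative"),
    ({"ever": 0}, "positive"),
]
total, by_polarity = weighted_counts(dataset)
```

Here "worst ever" contributes 2/3 in the first document (two words, one skip) and 1 in the second, so its count is 5/3 rather than 2.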

With these counting formulas the number of skips is taken into account, and we can build our final scoring formula, shown in Equation 3, where $s(t, p)$ is the score of term $t$ for polarity $p$, and $\theta$ is a factor that gives more relevance to terms that appear a larger number of times. This factor depends on the size and the domain of the dataset.

$$s(t, p) = \frac{C(t, p)}{C(t)} \cdot \frac{C(t, p)}{C(t, p) + \theta} \tag{3}$$
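The effect of the frequency factor in the scoring formula can be seen in a small Python sketch. The function name `score`, the parameter name `theta` and the value 5 are assumptions chosen for illustration; the text only states that the factor depends on the size and domain of the dataset.

```python
def score(c_tp, c_t, theta):
    """Proportion of the term's weighted count in polarity p, damped for rare terms."""
    if c_t == 0 or c_tp == 0:
        return 0.0
    return (c_tp / c_t) * (c_tp / (c_tp + theta))


# Both terms occur only in polarity p, but one is rare and the other frequent.
rare = score(c_tp=1.0, c_t=1.0, theta=5.0)
frequent = score(c_tp=50.0, c_t=50.0, theta=5.0)
```

Although both terms appear exclusively with the same polarity, the rare term scores about 0.17 and the frequent one about 0.91, which is the behaviour the second factor is meant to provide.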

At the end of this process we have a list of skipgrams with a score for each polarity: our sentiment lexicon. Table 1 shows an example of a dictionary built using the Movie Reviews dataset (Pang, Lee, and Vaithyanathan, 2002), with n = 2 and k = 10. In this example, we show only the best five terms for each polarity.

Table 1:
Negative: this mess, worst movie, is terrible, ludicrous, waste
Positive: outstanding, is terrific, finest, breathtaking, is excellent
Score: .862, .826, .823, .803, .795

3.3 Supervised learning

We use machine learning techniques to create a model able to classify the polarity of new texts. The documents in the dataset are employed as training instances, and the labelled polarities are used as categories. However, in contrast with usual text classification approaches, we do not create one feature per term but one feature per polarity. In other words, we have the same number of features and categories. Our hypothesis is that this number of features is enough to obtain a decent system quality with a low latency. The weight of each feature is calculated as specified in Equation 4, where $w(d, p)$ is the weight of the feature for polarity $p$ in document $d$.

$$w(d, p) = \sum_{t \in d} s(t, p) \cdot \frac{|t|}{|t| + \operatorname{skips}(t, d)} \tag{4}$$

Table 2 shows an example of feature weighting for the text "worst movie ever", again using the scores of a dictionary built from the Movie Reviews dataset with n = 2 and k = 10. The final weights (positive = 1.48, negative = 3.40) are employed as feature weights for the machine learning process.
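The feature weighting step can be sketched in Python as below. The function name `feature_weights` and the lexicon layout are hypothetical, and the lexicon scores are invented for the example; they are not the values obtained from the Movie Reviews dataset.

```python
def feature_weights(terms, lexicon, polarities):
    """Compute one weight per polarity for a document.

    `terms` maps each term of the document to its number of skips;
    `lexicon` maps (term, polarity) to the lexicon score s(t, p).
    """
    weights = {p: 0.0 for p in polarities}
    for term, skips in terms.items():
        size = len(term.split())
        closeness = size / (size + skips)  # smaller when words were skipped
        for p in polarities:
            # Terms missing from the lexicon contribute nothing.
            weights[p] += lexicon.get((term, p), 0.0) * closeness
    return weights


# Invented scores, for illustration only.
lexicon = {
    ("worst", "negative"): 0.8,
    ("worst ever", "negative"): 0.6,
    ("worst", "positive"): 0.1,
}
terms = {"worst": 0, "worst ever": 1}
weights = feature_weights(terms, lexicon, ["positive", "negative"])
```

Whatever the size of the lexicon, each document is reduced to as many numbers as there are polarities (two here), which keeps the classifier input small.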

To build our model we employed Support Vector Machines (SVM), as they have proved effective on text categorisation tasks (Sebastiani, 2002; Mohammad, Kiritchenko, and Zhu, 2013). Specifically, we used the Weka³ (Hall et al., 2009) default implementation with the default parameters (linear kernel, C = 1, ε = 0.1).

[Table 2. Rows: worst, movie, ever, worst movie, worst ever, movie ever, weight (w).]

³ www.cs.waikato.ac.nz/ml/weka

4 Conclusions

In this paper we presented a supervised hybrid approach for Sentiment Analysis in Twitter. We built a sentiment lexicon from a polarity dataset using statistical measures. We employed skipgrams as information units, to enrich the sentiment lexicon with combinations of words that do not appear explicitly in the text. The lexicon created was used in conjunction with machine learning techniques to create a polarity classifier.

Preliminary performance experiments have shown a speed acceptable for real-time applications⁴. Processing speeds range from 1,000 documents per second in the worst cases (long texts, large values of n and k) to 10,000 in the best cases (short texts, low values of n and k). These numbers are good enough to work with extensively used platforms like Twitter, where users generate over 500 million tweets per day (almost 6,000 tweets per second)⁵.
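The arithmetic behind these figures can be checked with a few lines of Python. The last two values rest on an assumption that is not made in the text, namely that the workload could be split evenly across independent instances of the system running in parallel.

```python
import math

TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average load: about 5,787 tweets per second ("almost 6,000").
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

WORST_CASE_SPEED = 1_000   # documents per second (long texts, large n and k)
BEST_CASE_SPEED = 10_000   # documents per second (short texts, low n and k)

# Assumption: the stream can be divided evenly among parallel instances.
instances_worst = math.ceil(tweets_per_second / WORST_CASE_SPEED)
instances_best = math.ceil(tweets_per_second / BEST_CASE_SPEED)
```

Under that assumption, a single instance at the best-case speed covers the average load, while the worst-case speed would call for about six instances.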

Moreover, experiments with different datasets have also obtained promising results (Fernández et al., 2013; Fernández, Gómez, and Martínez-Barco, 2014; Fernández et al., 2014; Gutiérrez, Tomás, and Fernández, 2015; Fernández et al., 2015). Experiments with the Movie Reviews dataset (Pang, Lee, and Vaithyanathan, 2002) obtained an accuracy of 86.7%, with long texts in English and 2-level polarity, and 64.7% with the TASS 2012 dataset (Villena-Román and García-Morera, 2013) for Spanish tweets and 6-level polarity.

As future work, we plan to study new methods to calculate and combine the weights of the skipgrams. We also want to add more features to the machine learning algorithm, while keeping their number small in order to avoid increasing the latency. In addition, we want to include external resources and tools, such as knowledge from existing sentiment lexicons, but always focused on real-time applications. We will also extend our study to different corpora and domains, to confirm the robustness of the approach.

⁴ Using a Macbook Pro 2.4 GHz i5 with 8GB RAM

⁵ www.internetlivestats.com/twitter-statistics

References

Baccianella, S., A. Esuli, and F. Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200-2204.

Balakrishnan, V. and E. Lloyd-Yemoh. 2014. Stemming and lemmatization: a comparison of retrieval performances. Lecture Notes on Software Engineering, 2(3):262.

Chandrashekar, G. and F. Sahin. 2014. A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16-28.

Cruz, F. L., J. A. Troyano, F. Enríquez, F. J. Ortega, and C. G. Vallejo. 2013. Long autonomy or long delay? The importance of domain in opinion mining. Expert Systems with Applications, 40(8):3174-3184.

Fernández, J., J. M. Gómez, and P. Martínez-Barco. 2014. A supervised approach for sentiment analysis using skipgrams. In 11th International Workshop on Natural Language Processing and Cognitive Science (NAACL).

Fernández, J., Y. Gutiérrez, J. M. Gómez, and P. Martínez-Barco. 2014. GPLSI: Supervised Sentiment Analysis in Twitter using Skipgrams. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 294-299.

Fernández, J., Y. Gutiérrez, J. M. Gómez, P. Martínez-Barco, A. Montoyo, and R. Muñoz. 2013. Sentiment analysis of Spanish tweets using a ranking algorithm and skipgrams. In XXIX Congreso de la Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN 2013), pages 133-142.

Fernández, J., Y. Gutiérrez, D. Tomás, J. M. Gómez, and P. Martínez-Barco. 2015. Evaluating a sentiment analysis approach from a business point of view.

Guthrie, D., B. Allison, W. Liu, L. Guthrie, and Y. Wilks. 2006. A Closer Look at Skip-gram Modelling. In 5th International Conference on Language Resources and Evaluation (LREC 2006), pages 1-4.

Gutiérrez, Y., D. Tomás, and J. Fernández. 2015. Benefits of using ranking skip-gram techniques for opinion mining approaches. In eChallenges e-2015 Conference, 2015, pages 1-10. IEEE.

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10-18.

Jijkoun, V., M. de Rijke, and W. Weerkamp. 2010. Generating focused topic-specific sentiment lexicons. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 585-594. Association for Computational Linguistics.

Kim, S.-M., M. Rey, and E. Hovy. 2004. Determining the Sentiment of Opinions. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), page 1367.

Medhat, W., A. Hassan, and H. Korashy. 2014. Sentiment Analysis Algorithms and Applications: a Survey. Ain Shams Engineering Journal.

Mohammad, S. M. 2015. Sentiment analysis: Detecting valence, emotions, and other affectual states from text. Emotion Measurement, pages 201-238.

Mohammad, S. M., S. Kiritchenko, and X. Zhu. 2013. NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets. In Proceedings of the International Workshop on Semantic Evaluation (SemEval-2013).

Pang, B. and L. Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135.

Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), number July, pages 79-86.

Ravi, K. and V. Ravi. 2015. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems, 89:14-46.

Sebastiani, F. 2002. Machine Learning in Automated Text Categorization. ACM Computing Surveys (CSUR), 34(1):1-47, 3.

Taboada, M., J. Brooke, M. Tofiloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267-307.

Tan, S., X. Cheng, Y. Wang, and H. Xu. 2009. Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis. Advances in Information Retrieval, pages 337-349.

Tong, R. M. 2001. An operational system for detecting and tracking opinions in on-line discussion. In Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification, volume 1, page 6.

Villena-Román, J. and J. García-Morera. 2013. TASS 2013 - Workshop on Sentiment Analysis at SEPLN 2013: An overview. In XXIX Congreso de la Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN 2013).

Yang, Y. and J. O. Pedersen. 1997. A comparative study on feature selection in text categorization. In ICML, volume 97, pages 412-420.