TASS 2015, September 2015, pp. 65-70. Received 09-07-15, revised 24-07-15, accepted 29-07-15.
Published at http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with recognized ISSN 1613-0073.

Sentiment Analysis for Twitter: TASS 2015*

Oscar S. Siordia, Daniela Moctezuma
CentroGEO, Col. Lomas de Padierna, Delegación Tlalpan, CP. 14240, México D.F.
osanchez;dmoctezuma@centrogeo.edu.mx

Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Elio-Atenógenes Villaseñor
INFOTEC, Ave. San Fernando 37, Tlalpan, Toriello Guerra, 14050 Ciudad de México, D.F.
mario.graff;sabino.miranda;eric.tellez;elio.villasenor@infotec.com.mx

* This work was partially supported by the Cátedras CONACYT program.

Abstract: In this paper we present our experiments for the global polarity classification task of Spanish tweets in the TASS 2015 challenge. In our methodology, the tweet representation is based on linguistic and polarity features such as lemmatized words, content-word filtering, and negation rules, among others. In addition, different transformations (LDA, LSI, and TF-IDF) are used and combined with an SVM classifier. The results show that the LSI and TF-IDF representations improve the performance of the SVM classifier.

Keywords: Sentiment analysis, Opinion mining, Twitter.

1 Introduction

In recent years, the production of textual documents in social media has increased exponentially. This ever-growing amount of available information promotes research and business activities around opinion mining and sentiment analysis. In social media, people share their opinions about events, other people, and organizations; this is the main reason why text mining is becoming an important research topic. Automatic sentiment analysis of text is one of the most important tasks in text mining. Sentiment classification determines whether a document expresses a positive, negative, or neutral opinion, or some degree of each of them.

Determining whether a text document carries a positive or negative opinion is becoming an essential tool for both public and private organizations (Peng, Zuo, y He, 2008). Such a tool is useful to know "what people think", which is important information for any decision-making process (at any level of government, in marketing, etc.) (Pang y Lee, 2008). With this purpose, in this paper we describe the methodology employed for the TASS 2015 workshop (Taller de Análisis de Sentimientos de la SEPLN). The TASS workshop is an event of the SEPLN conference, a conference on Natural Language Processing for the Spanish language. The purpose of TASS is to provide a forum for discussing and sharing the latest research on sentiment analysis in social media (specifically, Twitter in Spanish). In the TASS workshop, several challenge tasks are proposed, and a benchmark dataset is provided to compare the algorithms and systems of the participants (for more details, see (Villena-Román et al., 2015)).
Several methodologies to classify the tweets of Task 1, sentiment analysis at global level, of the TASS 2015 workshop are presented in this work. The task consists in performing an automatic sentiment classification to determine the global polarity (six polarity levels: P, P+, NEU, N, N+, and NONE) of each tweet in the provided dataset. With this purpose, several solutions are proposed in this work.

The paper is organized as follows: a brief overview of related work is given in Section 2, and the proposed methodology is described in Section 3. Section 4 presents the experimental results and their analysis, and finally, Section 5 concludes.

2 Related work

Nowadays, several methods have been proposed in the opinion mining and sentiment analysis community. Most of these works use Twitter as the principal source of data, and they aim to classify entire documents into overall positive or negative polarity levels (sentiment) or rating scores (e.g., 1 to 5 stars).

Such is the case of the work presented in (da Silva, Hruschka, y Hruschka Jr., 2014), which proposes an approach to classify the sentiment of tweets by using classifier ensembles and lexicons, where tweets are classified as positive or negative. This work concludes that classifier ensembles formed by several diverse components are promising for tweet sentiment classification. Moreover, several state-of-the-art techniques were compared on four databases; the best reported accuracy was around 75%.

In (Hurtado y Pla, 2014), the participation of the ELiRF research group in the TASS 2014 workshop (the winners of TASS 2014) is described. The winning approaches used for the four tasks are detailed. The proposed methodology uses SVM (Support Vector Machines) with a 1-vs-all approach. Moreover, Freeling (Padró y Stanilovsky, 2012) was used as lemmatizer, and Tweetmotif (http://tweetmotif.com/about) as tokenizer, for the Spanish language. The reported accuracies for Task 1 are 64.32% (six labels) and 70.89% (four labels); F1 (F-measure) is 70.48% in Task 2 and 90% in Task 3.

Another method for sentiment extraction and classification on unstructured text is proposed in (Shahbaz, Guergachi, y ur Rehman, 2014). Here, five labels were used for sentiment classification: Strongly Positive, Positive, Neutral, Negative, and Strongly Negative. The proposed solution combines natural language processing techniques at the sentence level with opinion mining algorithms. The accuracy results were 61% for five levels and 75% when reducing to three levels (positive, negative, and neutral).

In (Antunes et al., 2011), an ensemble based on SVM and AIS (Artificial Immune Systems) is proposed. The main idea is that SVM can be enhanced with AIS approaches, which can capture dynamic models. Experiments were carried out on the Reuters-21578 benchmark dataset; the reported results show an F1 of 95.52%.

An approach for multi-label sentiment classification is proposed in (Liu y Chen, 2015). This approach has three main components: text segmentation, feature extraction, and multi-label classification. The features used include raw segmented words and sentiment features based on three sentiment dictionaries: DUTSD, NTUSD, and HD. Moreover, a detailed study of several multi-label classification methods is conducted; in total, 11 state-of-the-art methods are considered: BR, CC, CLR, HOMER, RAkEL, ECC, MLkNN, RF-PCT, BRkNN, BRkNN-a, and BRkNN-b. These methods were compared on two microblog datasets, and the reported results of all methods are around 0.50 of F1.

In summary, most of the works analyzed classify documents mainly into three polarities: positive, neutral, and negative. Moreover, most of them use social media (mainly Twitter) as the source of documents. In this work, several methods to classify sentiment in tweets are described. These methods were implemented, according to the TASS workshop specifications, with the purpose of classifying tweets into six polarity levels: P+, P, Neutral, N+, N, and None.
More- shop specifications, with the purpose of clas- over, Freeling (Padró y Stanilovsky, 2012) sify tweets in six polarity levels: P+, P, Neu- was used as lemmatizer and Tweetmotif1 to tral, N+, N and None. The proposed method tokenizer to Spanish language. The accuracy are based on several standard techniques as LDA (Latent Dirichlet Allocation), LSI (La- 1 http://tweetmotif.com/about tent Semantic Indexing), TF-IDF matrix in 66 Sentiment Analysis for Twitter: TASS 2015 combination with the well-known SVM clas- tom): sifier. cs|xc → x 3 Proposed solution qu → k In this section the proposed solution is de- gue|ge → je tailed. First, a preprocessing step was car- gui|gi → ji ried out, later a Pseudo-phonetic transforma- sh|ch → x tion was done and finally the generation of Q-gram expansion was employed. ll → y z→s 3.1 Preprocessing step h→ Preprocessing focuses on the task of find- c[a|o|u] → k ing a good representation for tweets. Since c[e|i] → s tweets are full of slang and misspellings, we normalize the text using procedures such as w→u error correction, usage of special tags, part v→b of speech (POS) tagging, and negation pro- ΨΨ → Ψ cessing. Error correction consists on reduc- Ψ∆Ψ∆ → Ψ∆ ing words/tokens with invalid duplicate vow- els and consonants to valid/standard Span- In our transformation notation, square ish words (ruidoooo → ruido; jajajaaa → ja; brackets do not consume symbols and Ψ, ∆ jijijji → ja). Error correction uses an ap- means for any valid symbols. The idea is proach based on a Spanish dictionary, statis- not to produce a pure phonetic transforma- tical model for common double letters, and tion as in Soundex (Donald, 1999) like al- heuristic rules for common interjections. In gorithms, but try to reduce the number of the case of the usage of special tags, twitter’s possible errors in the text. 
Notice that the users (i.e., @user) and urls are removed us- last two transformation rules are partially ing regular expressions; in addition, we clas- covered by the statistical modeling used for sify 512 popular emoticons into four classes correcting words (explained in preprocess- (P, N, NEU, NONE), which are replaced by a ing step). Nonetheless, this pseudo-phonetic polarity tag in the text, e.g., positive emoti- transformation does not follow the statistical cons such as :), :D are replaced by POS, rules of the previous preprocessing step. and negative emoticons such as :(, :S are re- placed by NEG. In the POS-tagging step, all 3.3 Q-gram expansion words are tagged and lemmatized using the Along with the placing bag of words repre- Freeling tool for Spanish language (Padró y sentation (of the normalized text) we added Stanilovsky, 2012), stop words are removed, the 4 and 5 gram of characters of the nor- and only content words (nouns, verbs, ad- malized text. Blank spaces were normalized jetives, adverbs), interjections, hashtags, and and taken into account to the q-gram expan- polarity tags are used for data representation. sion; so, some q-grams will be over more than In negation step, Spanish negation markers one word. In addition of these previous steps, are attached to the nearest content word, e.g., several transformations (LSI, LDA and TF- ‘no seguir’ is replaced by ‘no seguir’, ‘no es IDF matrix) were conducted to generate sev- bueno’ is replaced by ‘no bueno’, ‘sin comida’ eral data models for testing phase. is replaced by ‘no comida’; we use a set of heuristic rules for negations. Finally, all di- 4 Results and analysis acritic and punctuation symbols are also re- The classifier submitted to the competition moved. was selected using the following procedure. The 7, 218 tweets with 6 polarity levels were 3.2 Psudo-phonetic split in two sets. 
Firstly, the tweets pro- transformation vided were shuffled and then the first set, With the purpose of reducing typos and hereafter the training set, was created with slangs we applied a semi-phonetic transfor- the first 6, 496 tweets (approximately 90% of mation. First, we applied the following trans- dataset), and, the second set, hereafter the formations (with precedence from top to bot- validation set, was composed by the rest 722 67 Oscar S. Siordia, Daniela Moctezuna, Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Elio-Atenógenes Villaseñor tweets (approximately 10% of dataset). The Table 1 complements the information pre- training set was used to fit a Support Vector sented on Figure 1. Machine (SVM) using a linear kernel2 with The table presents the score F1 per polar- C = 1, weights inversely proportional to the ity and the average (Macro-F1) for different class frequencies, and using one vs rest multi- configurations. The table is divided in five class strategy. The validation set was used to blocks, the first and second correspond to a select the best classifier using as performance SVM with LSI (400 topics) and TF-IDF, re- the score F1. spectively. It is observed that TF-IDF out- The first step was to model the data us- performed LSI; within LSI and TF-IDF it ing different transformations, namely Latent can be seen that 5-gram and 4-gram got the Dirichlet Allocation (LDA) using an online best performance in LSI and TF-IDF, respec- learning proposed by (Hoffman, Bach, y Blei, tively. 
2010), Latent Semantic Indexing (LSI), and The third row block presents the perfor- TF-IDF.3 Figure 1 presents the score F1, in mance when the features are a direct addition the validation set, of a SVM using either LSI of LSI and TF-IDF; here it is observed that or LDA with normalized text, different lev- the best performance is with 4-gram further- els of Q-gram (4-gram and 5-gram), and the more it had the best overall performance in number of topics is varied from 10 to 500 as N+. The forth row block complements the well. It is observed that LSI outperformed previous results by presenting the best per- LDA in all the configurations tested. Com- formance of LSI and TF-IDF, that is, LSI paring the performance between normalized with 5-gram and TF-IDF with 4-gram. It text, 4-gram, and 5-gram, it is observed an is observed that this configuration has the equivalent performance. Given that the im- best overall performance in P+, N, None and plemented LSI depends on the order of the average (Macro-F1). Finally, the last row documents more experiments are needed to block gives an indicated of whether the pho- know whether any particular configuration is netic transformation is making any improve- statistically better than other. Even though ment. The conclusion is that the phonetic the best configuration is LSI with 400 topics transformation is making a difference; how- and 5-gram, this system is not competitive ever, more experiments are needed in order enough compared with the performance pre- to know whether this difference is statistically sented by the best algorithm in TASS 2014. significant. Based on the score F1 presented on Table 1 the classifier submitted to the competition is a SVM with a direct addition of LSI using 400 topics and 4-gram and LDA with 5-gram. This classifier is identified as INGEOTEC- M14 in the competition. 
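To make the feature-extraction steps of Section 3 concrete, the following is a minimal sketch (not the authors' implementation) of the pseudo-phonetic rules and the character q-gram expansion. It assumes one reading of the notation: the non-consuming brackets c[a|o|u] and c[e|i] are rendered as regex lookaheads, ΨΨ → Ψ as collapsing any run of a repeated symbol, and Ψ∆Ψ∆ → Ψ∆ as collapsing a repeated pair; emoticon, negation, and lemmatization handling are omitted.

```python
import re

# Ordered pseudo-phonetic rules (precedence from top to bottom), following
# the rule listing of Section 3.2. Lookaheads keep the bracketed context
# unconsumed, mirroring the paper's notation.
RULES = [
    (r"cs|xc", "x"),
    (r"qu", "k"),
    (r"gue|ge", "je"),
    (r"gui|gi", "ji"),
    (r"sh|ch", "x"),
    (r"ll", "y"),
    (r"z", "s"),
    (r"h", ""),
    (r"c(?=[aou])", "k"),
    (r"c(?=[ei])", "s"),
    (r"w", "u"),
    (r"v", "b"),
]

def pseudo_phonetic(text: str) -> str:
    """Apply the ordered rewrite rules, then collapse repeated symbols
    (one interpretation of the rules ΨΨ → Ψ and Ψ∆Ψ∆ → Ψ∆)."""
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text)
    text = re.sub(r"(.)\1+", r"\1", text)   # ΨΨ → Ψ: aaa -> a
    text = re.sub(r"(..)\1+", r"\1", text)  # Ψ∆Ψ∆ → Ψ∆: jaja -> ja
    return text

def qgrams(text: str, q: int):
    """Character q-grams over whitespace-normalized text; q-grams may span
    more than one word, as in Section 3.3."""
    text = re.sub(r"\s+", " ", text.strip())
    return [text[i:i + q] for i in range(len(text) - q + 1)]
```

For instance, pseudo_phonetic("chico") yields "xiko" (ch → x, then c before o → k), and qgrams("no bueno", 4) produces grams such as "o bu" that cross the word boundary.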
The SVM, LSI, and LDA were trained with the 7,218 tweets, and then this instance was used to predict the 6 polarity levels of the competition tweets. This procedure was replicated for the 4-polarity-levels competition.

Table 2 presents the accuracy, average recall, average precision, and average F1 of the INGEOTEC-M1 run using the validation set created, a 10-fold cross-validation on the 7,218 tweets, and the 1k tweets evaluated by the competition's system. This performance corresponds to the 5-polarity-levels challenge. It is observed from the table that the 10-fold cross-validation gives a much better estimation of the performance of the classifier when tested on the 1k tweets of the competition than the split of 90% training and 10% validation.

Figure 1: Performance in terms of the F1 score on the validation set for different numbers of topics, using LSI and LDA with different levels of q-grams.

2 The SVM was the LinearSVC class implemented in (Pedregosa et al., 2011).
3 The implementations used for LDA, LSI, and TF-IDF were provided by (Řehůřek y Sojka, 2010).
4 We also submitted another classifier, identified as INGEOTEC-E1; however, the algorithm presented a bug that could not be found in time for the competition.

                P      P+     N      N+     Neutral  None   Average
SVM + LSI
Text            0.238  0.549  0.403  0.348  0.025    0.492  0.343
4-gram          0.246  0.543  0.404  0.333  0.048    0.533  0.351
5-gram          0.246  0.552  0.462  0.356  0.000    0.575  0.365
SVM + TF-IDF
Text            0.271  0.574  0.414  0.407  0.103    0.511  0.380
4-gram          0.290  0.577  0.477  0.393  0.130    0.589  0.409
5-gram          0.302  0.577  0.476  0.379  0.040    0.586  0.393
SVM + {LSI + TF-IDF}
4-gram          0.297  0.578  0.471  0.421  0.142    0.578  0.415
5-gram          0.307  0.567  0.474  0.391  0.040    0.579  0.393
SVM + {LSI with 4-gram + TF-IDF with 5-gram}
4-5-gram        0.282  0.596  0.481  0.407  0.144    0.595  0.417
SVM + {LSI + TF-IDF without phonetic transformation}
4-5-gram        0.324  0.577  0.459  0.395  0.150    0.593  0.416

Table 1: F1 score per polarity level and average (Macro-F1) on the validation set for LSI (with 400 topics) and TF-IDF with different levels of q-grams. The best performance in each polarity is indicated in boldface.
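The Macro-F1 reported in Table 1 is the unweighted mean of the per-polarity F1 scores. A minimal sketch of that computation follows (an illustration with hypothetical labels, not the official competition scorer):

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one polarity label: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:  # convention: no true positives -> F1 of 0 for this class
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels):
    """Unweighted average of per-class F1 over all polarity labels."""
    return sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)
```

Note that, because every class contributes equally to the average, a rare class such as Neutral can dominate the difference between configurations, as seen in Table 1.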
In summary, the best result reached in this work was an F1 (Macro-F1) of 0.404. This result was achieved with a combination of LSI with 4-grams + TF-IDF with 5-grams, using an SVM classifier (one-vs-one approach).

          Acc.   Recall  Precision  F1
Val.      0.471  0.428   0.421      0.417
10-fold   0.443  0.397   0.395      0.393
Comp.     0.431  0.411   0.398      0.404

Table 2: Accuracy (Acc.), average recall, average precision, and average F1 of the classifier on the validation set (Val.), using a 10-fold cross-validation (7,218 tweets), and as reported by the competition (Comp.) on 1k tweets.

5 Conclusions

In this contribution, we presented the approach used to tackle the polarity classification task of Spanish tweets in TASS 2015. From the results, it is observed that a combination of different data models, in this case LSI and TF-IDF, improves the performance of an SVM classifier. It is also noted that the phonetic transformation yields an improvement; however, more experiments are needed to know whether this improvement is statistically significant. As a result, we obtained an F1 (Macro-F1) of 0.404 in the sentiment classification task at five levels with the proposed solution. This proposed solution uses a combination of LSI with 4-grams + TF-IDF with 5-grams, and an SVM classifier (one-vs-one approach).

Acknowledgements

This research was partially supported by the Cátedras CONACyT project. Furthermore, the authors would like to thank CONACyT for supporting this work through project 247356 (PN2014).

Bibliography

Antunes, Mário, Catarina Silva, Bernardete Ribeiro, y Manuel Correia. 2011. A hybrid AIS-SVM ensemble approach for text classification. En Adaptive and Natural Computing Algorithms, volumen 6594 de Lecture Notes in Computer Science, Springer Berlin Heidelberg, páginas 342-352.

da Silva, Nádia F. F., Eduardo R. Hruschka, y Estevam R. Hruschka Jr. 2014. Tweet sentiment analysis with classifier ensembles. Decision Support Systems, 66:170-179.

Knuth, Donald E. 1999. The Art of Computer Programming, volumen 3: Sorting and Searching, páginas 426-458.

Hoffman, Matthew, Francis R. Bach, y David M. Blei. 2010. Online learning for Latent Dirichlet Allocation. En Advances in Neural Information Processing Systems, páginas 856-864.

Liu, Shuhua Monica y Jiun-Hung Chen. 2015. A multi-label classification based approach for sentiment classification. Expert Systems with Applications, 42(3):1083-1093.

Hurtado, Lluís F. y Ferran Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter. En Proc. of the TASS workshop at SEPLN 2014.

Padró, Lluís y Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. En Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul, Turkey, May. ELRA.

Pang, Bo y Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, y E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830.

Peng, Tao, Wanli Zuo, y Fengling He. 2008. SVM based adaptive learning method for text classification from positive and unlabeled documents. Knowledge and Information Systems, 16(3):281-301.

Řehůřek, Radim y Petr Sojka. 2010. Software framework for topic modelling with large corpora. En Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, páginas 45-50, Valletta, Malta, Mayo. ELRA.

Shahbaz, M., A. Guergachi, y R. T. ur Rehman. 2014. Sentiment miner: A prototype for sentiment analysis of unstructured data and text. En Electrical and Computer Engineering (CCECE), 2014 IEEE 27th Canadian Conference on, páginas 1-7, May.

Villena-Román, Julio, Janine García-Morera, Miguel A. García-Cumbreras, Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, y L. Alfonso Ureña-López. 2015. Overview of TASS 2015.