<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Sentiment Analysis for Twitter: TASS 2015</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Graff</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabino Miranda-Jimenez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric S. Tellez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elio-Atenogenes Villaseñor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INFOTEC</institution>
          ,
          <addr-line>Ave. San Fernando 37, Tlalpan, Toriello Guerra, 14050 Ciudad de Mexico, D.F.; dmoctezuma@centrogeo.edu.mx</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1397</volume>
      <fpage>65</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>In this paper we present experiments for the global polarity classification task of Spanish tweets in the TASS 2015 challenge. In our methodology, the tweet representation focuses on linguistic and polarity features such as lemmatized words, filtering of content words, and rules of negation, among others. In addition, different transformations (LDA, LSI, and TF-IDF) are used and combined with an SVM classifier. The results show that the LSI and TF-IDF representations improve the performance of the SVM classifier.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, the production of textual documents in social media has increased exponentially. This ever-growing amount of available information promotes research and business activities around opinion mining and sentiment analysis. In social media, people share their opinions about events, other people, and organizations; this is the main reason why text mining is becoming an important research topic. Automatic sentiment analysis of text is one of the most important tasks in text mining. The sentiment classification task determines whether a document carries a positive, negative, or neutral opinion, or any level of each of them. Determining whether a text document has a positive or negative opinion is becoming an essential tool for both public and private companies
        <xref ref-type="bibr" rid="ref7 ref9">(Peng, Zuo, and He, 2008)</xref>
        . This tool is useful to know "what people think", which is important information for any decision-making process (at any level of government, in marketing, etc.)
        <xref ref-type="bibr" rid="ref7 ref9">(Pang and Lee,
2008)</xref>
        . With this purpose, in this paper we describe the methodology employed for the TASS 2015 workshop (Taller de Analisis de Sentimientos de la SEPLN). The TASS workshop is an event of the SEPLN conference, a conference on Natural Language Processing for the Spanish language. The purpose of TASS is to provide a point of discussion and sharing about the latest research work in the field of sentiment analysis in social media (specifically Twitter in the Spanish language). In the TASS workshop, several challenge tasks are proposed, and a benchmark dataset is provided to compare the algorithms and systems of the participants (for more details see
        <xref ref-type="bibr" rid="ref12">(Villena-Roman et al., 2015)</xref>
        ).
      </p>
      <p>Several methodologies to classify tweets from Task 1, Sentiment Analysis at the global level, of the TASS 2015 workshop are presented in this work. The task is to perform automatic sentiment classification to determine the global polarity (six polarity levels: P, P+, NEU, N, N+, and NONE) of each tweet in the provided dataset. With this purpose, several solutions have been proposed in this work.</p>
      <p>The paper is organized as follows: a brief overview of related work is given in Section 2, and the proposed methodology is described in Section 3. Section 4 shows the experimental results and analysis, and finally, Section 5 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Nowadays, several methods have been proposed in the opinion mining and sentiment analysis community. Most of these works use Twitter as the principal source of data, and they aim to classify entire documents into overall positive or negative polarity levels (sentiment) or rating scores (e.g., 1 to 5 stars).</p>
      <p>
        Such is the case of the work presented in
        <xref ref-type="bibr" rid="ref11 ref2">(da
Silva, Hruschka, and Jr., 2014)</xref>
        , which proposes an approach to classify the sentiment of tweets using classifier ensembles and lexicons, where tweets are classified as positive or negative. This work concludes that classifier ensembles formed by several diverse components are promising for tweet sentiment classification. Moreover, several state-of-the-art techniques were compared on four databases. The best accuracy reported was around 75%.
      </p>
      <p>
        In
        <xref ref-type="bibr" rid="ref2 ref5">(Lluís F. Hurtado, 2014)</xref>
        the participation of the ELiRF research group in the TASS 2014 workshop (winners of TASS 2014) is described. Here, the winning approaches used for the four tasks are detailed. The proposed methodology uses SVM (Support Vector Machines) with a 1-vs-all approach. Moreover, Freeling
        <xref ref-type="bibr" rid="ref6">(Padro and Stanilovsky, 2012)</xref>
        was used as the lemmatizer and Tweetmotif as the tokenizer for the Spanish language. The classification accuracy results for Task 1 are 64.32% (six labels) and 70.89% (four labels). The F1 (F-Measure) is 70.48% in Task 2 and 90% in Task 3.
      </p>
      <p>
        Another method for sentiment extraction and classification on unstructured text is proposed in
        <xref ref-type="bibr" rid="ref11">(Shahbaz, Guergachi, and ur Rehman,
2014)</xref>
        . Here, five labels were used for sentiment classification: Strongly Positive, Positive, Neutral, Negative, and Strongly Negative. The proposed solution combines natural language processing techniques at the sentence level with opinion mining algorithms. The accuracy results were 61% for five levels and 75% when reducing to three levels (positive, negative, and neutral).
      </p>
      <p>
        In
        <xref ref-type="bibr" rid="ref1">(Antunes et al., 2011)</xref>
        an ensemble based on SVM and AIS (Artificial Immune Systems) is proposed. Here, the main idea is that SVM can be enhanced with AIS approaches, which can capture dynamic models. Experiments were carried out on the Reuters-21578 benchmark dataset. The reported results show an F1 of 95.52%.
      </p>
      <p>
        An approach to multi-label sentiment classification is proposed in
        <xref ref-type="bibr" rid="ref4">(Liu and Chen,
2015)</xref>
        . This approach has three main components: text segmentation, feature extraction, and multi-label classification. The features used include raw segmented words and sentiment features based on three sentiment dictionaries: DUTSD, NTUSD, and HD. Moreover, a detailed study of several multi-label classification methods is conducted; in total, 11 state-of-the-art methods are considered: BR, CC, CLR, HOMER, RAkEL, ECC, MLkNN, RFPCT, BRkNN, BRkNN-a, and BRkNN-b. These methods were compared on two microblog datasets, and the reported results of all methods are around 0.50 of F1.
      </p>
      <p>In summary, most of the works analyzed classify documents into three main polarities: positive, neutral, and negative. Moreover, most of them use social media (mainly Twitter) as the source of analyzed documents. In this work, several methods to classify sentiment in tweets are described. These methods were implemented, according to the TASS workshop specifications, with the purpose of classifying tweets into six polarity levels: P+, P, Neutral, N+, N, and None. The proposed methods are based on several standard techniques, such as LDA (Latent Dirichlet Allocation), LSI (Latent Semantic Indexing), and the TF-IDF matrix, in combination with the well-known SVM classifier.</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed solution</title>
      <p>In this section the proposed solution is detailed. First, a preprocessing step was carried out; later, a pseudo-phonetic transformation was applied; and finally, a q-gram expansion was employed.</p>
      <sec id="sec-3-1">
        <title>Preprocessing step</title>
        <p>
          Preprocessing focuses on the task of finding a good representation for tweets. Since tweets are full of slang and misspellings, we normalize the text using procedures such as error correction, usage of special tags, part-of-speech (POS) tagging, and negation processing. Error correction consists of reducing words/tokens with invalid duplicate vowels and consonants to valid/standard Spanish words (ruidoooo → ruido; jajajaaa → ja; jijijji → ja). Error correction uses an approach based on a Spanish dictionary, a statistical model for common double letters, and heuristic rules for common interjections. In the case of the usage of special tags, Twitter users (i.e., @user) and URLs are removed using regular expressions; in addition, we classify 512 popular emoticons into four classes (P, N, NEU, NONE), which are replaced by a polarity tag in the text, e.g., positive emoticons such as :), :D are replaced by POS, and negative emoticons such as :(, :S are replaced by NEG. In the POS-tagging step, all words are tagged and lemmatized using the Freeling tool for the Spanish language
          <xref ref-type="bibr" rid="ref6">(Padro and
Stanilovsky, 2012)</xref>
          ; stop words are removed, and only content words (nouns, verbs, adjectives, adverbs), interjections, hashtags, and polarity tags are used for the data representation. In the negation step, Spanish negation markers are attached to the nearest content word, e.g., `no seguir' is replaced by `no_seguir', `no es bueno' is replaced by `no_bueno', and `sin comida' is replaced by `no_comida'; we use a set of heuristic rules for negations. Finally, all diacritic and punctuation symbols are also removed.
        </p>
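<p>The normalization steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the emoticon table is reduced to four toy entries (the paper maps 512 emoticons), the dictionary-based interjection handling (jajajaaa → ja) is omitted, stop words are assumed to be already removed before negation attachment, and the underscore notation for attached negations is likewise an assumption.</p>

```python
import re

# Toy emoticon lexicon; the paper classifies 512 emoticons into four classes.
EMOTICONS = {":)": "POS", ":D": "POS", ":(": "NEG", ":S": "NEG"}

# Spanish negation markers handled by the heuristic rules.
NEGATORS = {"no", "sin"}

def normalize(text):
    # Remove @user mentions and URLs with regular expressions.
    text = re.sub(r"@\w+|https?://\S+", "", text)
    # Replace emoticons by a polarity tag.
    for emo, tag in EMOTICONS.items():
        text = text.replace(emo, " %s " % tag)
    # Reduce runs of 3+ repeated letters (ruidoooo -> ruido).
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    return " ".join(text.split())

def attach_negations(tokens):
    # Attach a negation marker to the nearest content word; stop words
    # are assumed to be already removed, so ['no', 'bueno'] -> ['no_bueno'].
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif negate:
            out.append("no_" + tok)
            negate = False
        else:
            out.append(tok)
    return out
```

For example, normalize("ruidoooo @user") returns "ruido", and attach_negations(["sin", "comida"]) returns ["no_comida"].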
      </sec>
      <sec id="sec-3-2">
        <title>Pseudo-phonetic transformation</title>
        <p>With the purpose of reducing typos and slang we applied a semi-phonetic transformation. First, we applied the following transformations (with precedence from top to bottom):</p>
        <p>cs|xc → x
qu → k
gue|ge → je
gui|gi → ji
sh|ch → x
ll → y
z → s
h → ∅
c[a|o|u] → k
c[e|i] → s
w → u
v → b
ααα → α
αα → α</p>
        <p>
          In our transformation notation, square brackets do not consume symbols, α stands for any valid symbol, and ∅ denotes the empty string. The idea is not to produce a pure phonetic transformation as in Soundex-like algorithms
          <xref ref-type="bibr" rid="ref3">(Knuth, 1999)</xref>
          , but to try to reduce the number of possible errors in the text. Notice that the last two transformation rules are partially covered by the statistical model used for correcting words (explained in the preprocessing step). Nonetheless, this pseudo-phonetic transformation does not follow the statistical rules of the previous preprocessing step.
        </p>
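<p>Such an ordered rule list can be applied with regular expressions. The sketch below is an assumption-laden transcription of the table above: lookaheads stand in for the non-consuming square brackets, the first rule of the table is omitted, and the final rule collapses runs of repeated symbols.</p>

```python
import re

# Ordered substitution rules (precedence from top to bottom); a subset of
# the transformation table, written as regexes.
RULES = [
    (r"qu", "k"),
    (r"gue|ge", "je"),
    (r"gui|gi", "ji"),
    (r"sh|ch", "x"),
    (r"ll", "y"),
    (r"z", "s"),
    (r"h", ""),              # h is dropped
    (r"c(?=[aou])", "k"),    # lookahead: the bracket does not consume symbols
    (r"c(?=[ei])", "s"),
    (r"w", "u"),
    (r"v", "b"),
    (r"(.)\1+", r"\1"),      # collapse runs of repeated symbols
]

def pseudo_phonetic(word):
    # Apply each rule in order over the whole word.
    for pattern, repl in RULES:
        word = re.sub(pattern, repl, word)
    return word
```

For example, pseudo_phonetic("vaca") returns "baka" and pseudo_phonetic("carrooo") returns "karo".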
      </sec>
      <sec id="sec-3-3">
        <title>Q-gram expansion</title>
        <p>Along with the bag-of-words representation (of the normalized text), we added the character 4-grams and 5-grams of the normalized text. Blank spaces were normalized and taken into account in the q-gram expansion, so some q-grams span more than one word. In addition to these previous steps, several transformations (LSI, LDA, and the TF-IDF matrix) were applied to generate several data models for the testing phase.</p>
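<p>The q-gram expansion above can be sketched as follows; spaces are normalized but kept, so q-grams may span word boundaries.</p>

```python
def char_qgrams(text, q):
    # Character q-grams over the normalized text; blank spaces are kept,
    # so some q-grams cover more than one word.
    text = " ".join(text.split())
    return [text[i:i + q] for i in range(len(text) - q + 1)]
```

For example, char_qgrams("no bueno", 4) returns ['no b', 'o bu', ' bue', 'buen', 'ueno']; the q-gram ' bue' spans the space between the two words.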
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and analysis</title>
      <p>The classifier submitted to the competition was selected using the following procedure.</p>
      <p>The 7,218 tweets with 6 polarity levels were split into two sets. First, the provided tweets were shuffled; then the first set, hereafter the training set, was created with the first 6,496 tweets (approximately 90% of the dataset), and the second set, hereafter the validation set, was composed of the remaining 722 tweets (approximately 10% of the dataset). The training set was used to fit a Support Vector Machine (SVM) using a linear kernel with C = 1, weights inversely proportional to the class frequencies, and a one-vs-rest multiclass strategy. The validation set was used to select the best classifier using the F1 score as the performance measure.</p>
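<p>The split-and-fit procedure can be sketched with scikit-learn, which the paper cites; the features below are random stand-ins for the real tweet representations, so only the mechanics (shuffled 90/10 split, linear kernel with C = 1, balanced class weights, one-vs-rest strategy) are illustrated.</p>

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Stand-in features and labels; the paper uses 7,218 tweets with 6 levels.
X = rng.normal(size=(200, 20))
y = rng.integers(0, 6, size=200)

# Shuffle, then take the first 90% for training and the rest for validation.
idx = rng.permutation(len(X))
cut = int(0.9 * len(X))
train, val = idx[:cut], idx[cut:]

# Linear kernel, C = 1, class weights inversely proportional to the class
# frequencies, one-vs-rest multiclass strategy.
clf = OneVsRestClassifier(SVC(kernel="linear", C=1, class_weight="balanced"))
clf.fit(X[train], y[train])

# Classifier selection uses the F1 score on the validation set.
macro_f1 = f1_score(y[val], clf.predict(X[val]), average="macro")
```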
      <p>
        The first step was to model the data using different transformations, namely Latent Dirichlet Allocation (LDA) using the online learning proposed by
        <xref ref-type="bibr" rid="ref10 ref4">(Hoffman, Bach, and Blei,
2010)</xref>
        , Latent Semantic Indexing (LSI), and TF-IDF. Figure 1 presents the F1 score, on the validation set, of an SVM using either LSI or LDA with the normalized text and different levels of q-grams (4-gram and 5-gram); the number of topics is varied from 10 to 500 as well. It is observed that LSI outperformed LDA in all the configurations tested. Comparing the performance between the normalized text, 4-gram, and 5-gram, an equivalent performance is observed. Given that the implemented LSI depends on the order of the documents, more experiments are needed to know whether any particular configuration is statistically better than another. Even though the best configuration is LSI with 400 topics and 5-gram, this system is not competitive enough compared with the performance presented by the best algorithm in TASS 2014.
      </p>
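<p>The three transformations can be sketched with scikit-learn as a stand-in tool: LSI is computed as a truncated SVD of the TF-IDF matrix, and LatentDirichletAllocation with learning_method="online" follows the online scheme of Hoffman, Bach, and Blei. The toy documents and topic counts are illustrative only.</p>

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD

# Toy normalized tweets (the real input is the q-gram expanded text).
docs = ["no_bueno dia", "feliz dia POS", "no_seguir NEG triste"]

# TF-IDF matrix over the normalized tweets.
tfidf = TfidfVectorizer().fit_transform(docs)

# LSI as a truncated SVD of the TF-IDF matrix; the paper varies the
# number of topics from 10 to 500.
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)

# Online LDA over raw term counts.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(
    n_components=2, learning_method="online", random_state=0
).fit_transform(counts)
```

Each transformation maps every tweet to a fixed-length vector, which is what the SVM consumes.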
      <p>Table 1 complements the information presented in Figure 1.</p>
      <p>The table presents the F1 score per polarity and the average (Macro-F1) for different configurations. The table is divided into five blocks; the first and second correspond to an SVM with LSI (400 topics) and TF-IDF, respectively. It is observed that TF-IDF outperformed LSI; within LSI and TF-IDF, it can be seen that 5-gram and 4-gram obtained the best performance in LSI and TF-IDF, respectively.</p>
      <p>The third row block presents the performance when the features are a direct addition of LSI and TF-IDF; here it is observed that the best performance is obtained with 4-gram; furthermore, it had the best overall performance in N+. The fourth row block complements the previous results by presenting the combination of the best configurations of LSI and TF-IDF, that is, LSI with 5-gram and TF-IDF with 4-gram. It is observed that this configuration has the best overall performance in P+, N, None, and average (Macro-F1). Finally, the last row block gives an indication of whether the phonetic transformation makes any improvement. The conclusion is that the phonetic transformation makes a difference; however, more experiments are needed in order to know whether this difference is statistically significant.</p>
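<p>Reading the "direct addition" of LSI and TF-IDF as feature concatenation (an assumption; the paper does not spell out the operation), the combined representation can be sketched as:</p>

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Toy normalized tweets; the real features come from the q-gram expanded text.
docs = ["no_bueno dia", "feliz dia POS", "no_seguir NEG triste"]

tfidf_matrix = TfidfVectorizer().fit_transform(docs)
lsi_topics = TruncatedSVD(n_components=2).fit_transform(tfidf_matrix)

# Concatenate so the SVM sees the LSI topic vector and the TF-IDF vector
# side by side for every tweet.
X = np.hstack([tfidf_matrix.toarray(), lsi_topics])
```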
      <p>Based on the F1 scores presented in Table 1, the classifier submitted to the competition is an SVM with a direct addition of LSI using 400 topics and 4-gram, and LDA with 5-gram.</p>
      <p>This classifier is identified as INGEOTEC-M1 in the competition. The SVM, LSI, and LDA were trained with the 7,218 tweets, and then this instance was used to predict the 6 polarity levels of the competition tweets.</p>
      <p>This procedure was replicated for the 4
polarity levels competition.</p>
      <p>Table 4 presents the accuracy, average recall, precision, and F1 of the INGEOTEC-M1 run using the created validation set, a 10-fold cross-validation on the 7,218 tweets, and the 1k tweets evaluated by the competition system. This performance was on the 5 polarity levels challenge. It is observed from the table that the 10-fold cross-validation gives a much better estimation of the performance of the classifier when tested on the 1k tweets of the competition (90% of training and 10% of validation).</p>
      <p>We also submitted another classifier, identified as INGEOTEC-E1; however, the algorithm presented a bug that could not be found in time for the competition.</p>
      <p>Table 1:
                                               P      P+     N      N+     Neutral  None   Macro-F1
SVM + LSI
  Text                                         0.238  0.549  0.403  0.348  0.025    0.492  0.343
  4-gram                                       0.246  0.543  0.404  0.333  0.048    0.533  0.351
  5-gram                                       0.246  0.552  0.462  0.356  0.000    0.575  0.365
SVM + TF-IDF
  Text                                         0.271  0.574  0.414  0.407  0.103    0.511
  4-gram                                       0.290  0.577  0.477  0.393  0.130    0.589
  5-gram                                       0.302  0.577  0.476  0.379  0.040    0.586
SVM + {LSI + TF-IDF}
  4-gram                                       0.297  0.578  0.471  0.421  0.142    0.578
  5-gram                                       0.307  0.567  0.474  0.391  0.040    0.579
SVM + {LSI with 4-gram + TF-IDF with 5-gram}
  4-5-gram                                     0.282  0.596  0.481  0.407  0.144    0.595
SVM + {LSI + TF-IDF without phonetic transformation}
  4-5-gram                                     0.324  0.577  0.459  0.395  0.150    0.593</p>
      <p>In summary, the best result reached in this work was an F1 of 0.404. This result was achieved with a combination of LSI with 4-gram + TF-IDF with 5-gram, using an SVM classifier (one-vs-one approach).</p>
      <p>Table 4:
           Acc.   Recall  Precision  F1
Val.       0.471  0.428   0.421      0.417
10-fold    0.443  0.397   0.395      0.393
Comp.      0.431  0.411   0.398      0.404</p>
      <p>In this contribution, we presented the approach used to tackle the polarity classification task of Spanish tweets at TASS 2015.</p>
      <p>From the results, it is observed that a combination of different data models, in this case LSI and TF-IDF, improves the performance of an SVM classifier. It is also noted that the phonetic transformation yields an improvement; however, more experiments are needed to know whether this improvement is statistically significant. As a result, we obtained a Macro-F1 of 0.404 in the sentiment classification task at five levels with the proposed solution. This solution uses a combination of LSI with 4-gram + TF-IDF with 5-gram, and an SVM classifier (one-vs-one approach).</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This research is partially supported by the Cátedras CONACyT project. Furthermore, the authors would like to thank CONACyT for supporting this work through project 247356 (PN2014).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>Antunes, Mario, Catarina Silva, Bernardete Ribeiro, and Manuel Correia. 2011. A hybrid AIS-SVM ensemble approach for text classification. In Adaptive and Natural Computing Algorithms, volume 6594 of Lecture Notes in Computer Science, pages 342-352. Springer Berlin Heidelberg.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>da Silva, Nadia F.F., Eduardo R. Hruschka, and Estevam R. Hruschka Jr. 2014. Tweet sentiment analysis with classifier ensembles. Decision Support Systems, 66(0):170-179.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Knuth, Donald E. 1999. The Art of Computer Programming, volume 3: Sorting and Searching, pages 426-458.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Hoffman, Matthew, Francis R. Bach, and David M. Blei. 2010. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, pages 856-864.</mixed-citation>
        <mixed-citation>Liu, Shuhua Monica and Jiun-Hung Chen. 2015. A multi-label classification based approach for sentiment classification. Expert Systems with Applications, 42(3):1083-1093.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Hurtado, Lluís F. and Ferran Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter. In Proc. of the TASS workshop at SEPLN 2014.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>Padró, Lluís and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul, Turkey, May. ELRA.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>Pang, Bo and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>Peng, Tao, Wanli Zuo, and Fengling He. 2008. SVM based adaptive learning method for text classification from positive and unlabeled documents. Knowledge and Information Systems, 16(3):281-301.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>Rehurek, Radim and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45-50, Valletta, Malta, May. ELRA.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>Shahbaz, M., A. Guergachi, and R. T. ur Rehman. 2014. Sentiment miner: A prototype for sentiment analysis of unstructured data and text. In Electrical and Computer Engineering (CCECE), 2014 IEEE 27th Canadian Conference on, pages 1-7, May.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>Villena-Román, Julio, Janine García-Morera, Miguel A. García-Cumbreras, Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, and L. Alfonso Ureña-López. 2015. Overview of TASS 2015.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>