TASS 2015, September 2015, pp. 35-40. Received 14-07-15; revised 24-07-15; accepted 29-07-15.

GTI-Gradiant at TASS 2015: A Hybrid Approach for Sentiment Analysis in Twitter∗
GTI-Gradiant en TASS 2015: Una aproximación híbrida para el análisis de sentimiento en Twitter

Tamara Álvarez-López, Jonathan Juncal-Martínez, Milagros Fernández-Gavilanes, Enrique Costa-Montenegro, Francisco Javier González-Castaño
GTI Research Group, AtlantTIC, University of Vigo, 36310 Vigo, Spain
{talvarez,jonijm,milagros.fernandez,kike}@gti.uvigo.es, javier@det.uvigo.es

Hector Cerezo-Costas, Diego Celix-Salgado
Gradiant, 36310 Vigo, Spain
{hcerezo,dcelix}@gradiant.org

Abstract: This paper describes the participation of the GTI research group of AtlantTIC, University of Vigo, and Gradiant (Galician Research and Development Centre in Advanced Telecommunications) in the TASS 2015 workshop. Both groups have worked together on a hybrid approach for global-level sentiment analysis in Twitter, submitted to task 1 of TASS. The system combines classifiers with unsupervised approaches built on polarity lexicons and syntactic structures. The combination of both approaches has provided highly competitive results on the given datasets.
Keywords: Polarity lexicon, sentiment analysis, dependency parsing.

∗ This work was supported by the Spanish Government, co-financed by the European Regional Development Fund (ERDF) under project TACTICA, and RedTEIC (R2014/037).

Published at http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with recognized ISSN 1613-0073.

1 Introduction

In recent years, research in the field of Sentiment Analysis (SA) has increased considerably, due to the growth of user-generated content in social networks, blogs and other platforms on the Internet. This content is valuable information for companies, which seek to know or even predict the acceptance of their products in order to design their marketing campaigns more efficiently. One of these sources of information is Twitter, where users can write about any topic, using colloquial and compact language. As a consequence, SA in Twitter is especially challenging, as opinions are expressed in one or two short sentences. Moreover, tweets include special elements such as hashtags or mentions. Hence, additional treatments must be applied when analyzing a tweet.

Numerous contributions on this subject can be found in the literature. Most of them are supervised machine learning approaches, although unsupervised semantic approaches can also be found in this field. The former are usually classifiers built from features of a "bag of words" representation (Pak and Paroubek, 2010), whilst the latter try to model linguistic knowledge by using polarity dictionaries (Brooke, Tofiloski, and Taboada, 2009), which contain words tagged with their semantic orientation. These strategies involve lexical, syntactic or semantic analysis (Quinn et al., 2010) with a final aggregation of their values.

The TASS evaluation workshop aims at providing a benchmark forum for comparing the latest approaches in this field. Our team took part only in task 1, devoted to SA in Twitter. This task encompasses four experiments. The first consists of evaluating tweet polarities over a big dataset of tweets, with only 4 tags: positive (P), negative (N), neutral (NEU) or no opinion expressed (NONE). In the second experiment, the same evaluation is requested over a smaller selection of tweets. The third and fourth experiments use the same two datasets, respectively, but with 6 possible tags, adding strong positive (P+) and strong negative (N+). In addition, a training set has been provided in order to build the models (Villena-Román et al., 2015).

The rest of this article is structured as follows: Section 2 presents the proposed system in detail. Section 3 describes the results obtained and some experiments performed over the target datasets. Finally, Section 4 summarizes the main findings and conclusions.

2 System overview

Our system is a combination of two different approaches. The first is an unsupervised approach, based on sentiment dictionaries, which are automatically generated from the set of tweets to analyze (a set of positive and negative seeds, created manually, is necessary to start the process). The second is a supervised approach, which employs Conditional Random Fields (CRFs) (Sutton and McCallum, 2011) to detect the scope of potential polarity shifters (e.g. intensification, reversal verbs and negation particles). This information is combined to form high-level features, which are fed to a statistical classifier to finally obtain the polarity of the message.

Both approaches were previously adapted to the English language and submitted to the SemEval-2015 sentiment analysis task, achieving good rankings and results separately (Fernández-Gavilanes et al., 2015; Cerezo-Costas and Celix-Salgado, 2015). Because both have shown particular advantages, we decided to build a hybrid system. The following subsections explain the two approaches, as well as the strategy followed to combine them.

2.1 Previous steps

The first treatments applied to the set of tweets rely on natural language processing (NLP) and are common to both approaches. They include preprocessing, lexical and syntactic analysis, and generation of sentiment lexicons.

2.1.1 Preprocessing

The language used on Twitter contains words that are not found in any dictionary, because of orthographic modifications. The aim here is to normalize the texts to get closer to formal language. The actions executed in this stage are: the substitution of emoticons, which are divided into several categories, by equivalent Spanish words (for example, :) is replaced by e feliz); the substitution of frequent abbreviations; the removal of repeated characters; and the replacement of Twitter-specific elements such as hashtags, mentions and URLs by hashtag, mencion and url tags, respectively.

2.1.2 Lexical and syntactic analysis

After the preprocessing, the input text is morphologically tagged to obtain the part of speech (PoS) associated with each word, such as adjective, adverb, noun or verb. Finally, a dependency tree is created with the syntactic functions annotated. These steps are performed with the Freeling tool (Padró and Stanilovsky, 2012).

2.1.3 Sentiment lexicons

Sentiment lexicons have been used in many supervised and unsupervised approaches for sentiment detection. They are not as common in Spanish as in English, although some are available, such as SOCAL (Brooke, Tofiloski, and Taboada, 2009), Spanish DAL (Dell' Amerlina Ríos and Gravano, 2013) and the eSOL lexicon (Molina-González et al., 2013). Some of them are lists of words with an associated number, which represents the polarity, and others are just lists of negative and positive words.
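The two lexicon formats described above (scored entries and plain positive/negative word lists) can be brought to a common representation before they are combined. The following Python sketch is illustrative only: the function names, the score of ±1 given to list-type entries and the zero threshold are assumptions made here, not values taken from the system. The weighted average follows the idea, mentioned in Section 2.4, of averaging the scores of a word that appears in several dictionaries.

```python
from typing import Dict, Iterable, List, Tuple

Lexicon = Dict[str, float]  # word -> polarity score


def from_word_lists(positive: Iterable[str], negative: Iterable[str],
                    magnitude: float = 1.0) -> Lexicon:
    """Turn plain positive/negative lists into a scored lexicon.

    The +/- magnitude is an assumed value, not one from the paper.
    """
    lexicon = {word: magnitude for word in positive}
    lexicon.update({word: -magnitude for word in negative})
    return lexicon


def merge_lexicons(weighted: List[Tuple[Lexicon, float]]) -> Lexicon:
    """Weighted average of each word's score over the lexicons containing it.

    Weights are assumed to be strictly positive.
    """
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for lexicon, weight in weighted:
        for word, score in lexicon.items():
            totals[word] = totals.get(word, 0.0) + weight * score
            weights[word] = weights.get(word, 0.0) + weight
    return {word: totals[word] / weights[word] for word in totals}


def to_word_lists(lexicon: Lexicon,
                  threshold: float = 0.0) -> Tuple[List[str], List[str]]:
    """Reduce a scored lexicon to sorted positive and negative word lists."""
    positive = sorted(w for w, s in lexicon.items() if s > threshold)
    negative = sorted(w for w, s in lexicon.items() if s < -threshold)
    return positive, negative


# Toy data, invented for the example.
scored = {"bueno": 3.0, "malo": -4.0}
listed = from_word_lists(["bueno", "feliz"], ["triste"])
merged = merge_lexicons([(scored, 3.0), (listed, 1.0)])
positive_words, negative_words = to_word_lists(merged)
```

Varying the weight given to each lexicon in `merge_lexicons` yields different merged dictionaries from the same inputs.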
However, the general-purpose lexicons above are not contextualized, so we generate additional ones automatically from the words in the syntactic dependencies of each tweet, considering verbs, nouns and adjectives. Then, we apply a graph-based polarity expansion algorithm (Cruz et al., 2011). The starting point of this algorithm is a set of positive and negative words, used as seeds, extracted from the most negative and positive words in the general lexicons. The resulting dictionary contains a list of words with their associated polarity, which is a real number in [-5, 5]. Finally, we merge each general lexicon with the automatically created ones, obtaining several dictionaries, depending on the combination applied, to feed our system.

As explained in the next sections, the dictionaries obtained must be adapted before they can be used in the supervised approach. In this case, only a list of positive and negative words is required, with no associated values.

2.2 Supervised approach

This subsection presents the supervised approach for the tagging of Spanish tweets. After the previous steps, lexical, PoS and CRF labels are jointly combined to build the final features that define the input to a logistic regression classifier.

The system works in two phases. First, a learning phase is applied in which the system learns the parameters of the supervised model using manually tagged data. Second, the supervised model is trained only with the training vector provided by the organization.

2.2.1 Strategy initialization

This strategy uses several dictionaries as input for different steps of the feature extraction process. Hence, a polarity dictionary, previously created and adapted, containing positive and negative words, is provided as input in this step. Certain polarity shifters play an important role in the detection of the polarity of a sentence. Previous attempts in the academic literature followed different approaches, like hand-crafted rules (Sidorov et al., 2013) or CRFs (Lapponi et al., 2012). We employ CRFs to detect the scope of polarity shifters such as denial particles (e.g. sin (without), no (no)) and reversal verbs (e.g. evitar (avoid), solucionar (solve)). In order to obtain the list of reversal verbs and denial particles, basic syntactic rules and manual supervision were applied to the final system. A similar approach can be found in Choi and Cardie (2008).

Additional dictionaries are used in the system (e.g. adversative particles or superlatives), but their main purpose is to support the learning steps with the polarity shifters.

2.2.2 Polarity modifiers

Polarity shifters are specific particles (e.g. no (no)), words (e.g. evitar (avoid)), or constructions (e.g. fuera de (out of)) that modify the polarity of the words under their influence. Detecting these scopes of influence, which are closely related to the syntactic graphs, is difficult due to the unreliability of dependency and syntactic parsers on Twitter. To solve this problem we trained sequential CRFs for each problem we wanted to solve. CRFs are supervised techniques that assign a label to each component of a sequence (in our case, the words of a sentence).

Our system follows a similar approach to Lapponi, Read and Ovrelid (2012), but it has been enhanced to track intensification, comparisons within a sentence, and the effect of adversative clauses (e.g. sentences with pero (but) particles). We refer the reader to Cerezo-Costas and Celix-Salgado (2015) for the input features employed by the CRFs.

2.2.3 Classifier

All the characteristics from previous steps are included as input features of a statistical classifier. The lexical features (word, stem, word and stem bigrams, and flags extracted from the polar dictionaries) are included together with the PoS and the labels from the CRFs. The learning algorithm employed was a logistic regressor. Due to the size of the feature space and its sparsity, L1 (0.000001) and L2 (0.00005) regularization was applied to learn the most important features and discard the least relevant for the task.

2.3 Unsupervised approach

The unsupervised approach is based on the generated polarity lexicons applied to the syntactic structures previously obtained. The final sentiment result of each tweet is expressed as a real number, calculated as follows: first, the words in the dependency tree are assigned a polarity from the sentiment dictionary; second, a polarity value propagation based on Caro and Grella (2013) is performed on each dependency tree from the lower nodes to the root, by means of the propagation rules explained below. The final real value is classified as P, N, NEU or NONE, according to defined intervals.

2.3.1 Intensification rules

Usually, adverbs act as intensifiers or diminishers of the word that follows them. For example, there is a difference between bonito (beautiful) and muy bonito (very beautiful). The first one has a positive connotation, whose polarity is increased by the adverb muy (very).
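The propagation outlined at the beginning of Section 2.3, together with the effect of an intensifier such as muy, can be sketched in Python as follows. This is a simplified illustration, not the actual rule set: the `Node` structure, the summing of the children's values and the 50% figures attached to the modifiers are assumptions introduced here. The [-1, 1] interval and the use of the number of polar words to separate NEU from NONE are those reported in Section 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One word of a dependency tree (hypothetical structure)."""
    word: str
    children: List["Node"] = field(default_factory=list)


# Assumed percentages: +50% for an intensifier, -50% for a diminisher.
MODIFIERS: Dict[str, float] = {"muy": 0.5, "poco": -0.5}


def propagate(node: Node, lexicon: Dict[str, float]) -> float:
    """Propagate polarity from the lower nodes up to `node`.

    Children are summed into their head (an assumption); a modifier
    child scales the accumulated value of its head instead.
    """
    score = lexicon.get(node.word, 0.0)
    factor = 1.0
    for child in node.children:
        if child.word in MODIFIERS:
            factor += MODIFIERS[child.word]
        else:
            score += propagate(child, lexicon)
    return score * factor


def count_polar(node: Node, lexicon: Dict[str, float]) -> int:
    """Number of words in the tree that carry a non-zero polarity."""
    own = 1 if lexicon.get(node.word, 0.0) != 0.0 else 0
    return own + sum(count_polar(child, lexicon) for child in node.children)


def classify(score: float, polar_words: int) -> str:
    """Map the final value to a tag using the [-1, 1] interval."""
    if score > 1.0:
        return "P"
    if score < -1.0:
        return "N"
    # Inside the interval: neutral if polar words cancel out, else no opinion.
    return "NEU" if polar_words > 0 else "NONE"


# "es muy bonito": 'bonito' depends on 'es', 'muy' depends on 'bonito'.
lexicon = {"bonito": 2.0}  # toy score, invented for the example
tree = Node("es", [Node("bonito", [Node("muy")])])
score = propagate(tree, lexicon)
label = classify(score, count_polar(tree, lexicon))
```

In this toy tree the adverb raises the value of bonito before it is passed on to the root, so the tweet-level value falls outside the no-opinion interval.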
The semantic orientation of the modified word is thus altered: intensification is implemented by assigning a positive or negative percentage to each intensifier or diminisher (Zhang, Ferrari, and Enjalbert, 2012).

2.3.2 Negation rules

If words that imply denial appear in the text, such as no (no), nunca (never) or ni (neither) (Zhang, Ferrari, and Enjalbert, 2012), the meaning is completely altered. For example, there is a difference between Yo soy inteligente (I am intelligent) and Yo no soy inteligente (I am not intelligent). The meaning of the text changes from positive to negative, due to the negator nexus. Therefore, negation is handled by detecting the affected scope in the dependency tree and subsequently applying a negative factor to all affected nodes.

2.3.3 Polarity conflict rules

Sometimes, two words appearing together express opposite sentiments. The aim here is to detect these cases, known as polarity conflicts (Moilanen and Pulman, 2007). For example, in fiesta aburrida (boring party), fiesta (party) has a positive polarity, which is reduced by the negative polarity of aburrida (boring). Moreover, in náufrago ileso (unharmed castaway), náufrago (castaway) has a negative polarity, which is reduced by the positive polarity of ileso (unharmed), yielding a new positive connotation.

2.3.4 Adversative/concessive rules

Adversative and concessive sentences have a point in common: in both cases, one part of the sentence is in contrast with the other. While the former express an objection in compliance with what is said in the main clause, the latter express a difficulty in fulfilling the main clause. We can assume that both constructions will restrict, exclude, amplify or diminish the sentiment reflected in them. Some adversative nexus are pero (but) or sin embargo (however) (Poria et al., 2014), whereas concessive ones are aunque (although) or a pesar de (in spite of) (Rudolph, 1996). For example, in the adversative sentence Lo había prometido, pero me ha sido imposible (I had promised it, but it has been impossible), the most important part is the one with the nexus, whereas in the concessive sentence A pesar de su talento, han sido despedidos (In spite of their talent, they have been fired), it is the part without the nexus.

2.4 Combination strategy: the hybrid approach

In order to decide the final polarity of each tweet, we combine both approaches as follows. Applying the supervised approach, 15 different outputs are generated, randomizing the training vector and selecting a subset of the records for training (leaving out 1500 records in each iteration). Then, another 15 outputs are generated applying the unsupervised approach, using 15 different lexicons, created by combining each general lexicon (SDAL, SOCAL, eSOL) with the automatically generated one, and also by combining 3 or 4 of them. During this process, when a word appears in several dictionaries, we apply a weighted average, varying the relevance assigned to each dictionary, thus providing more output combinations. Afterwards, we apply a majority voting method among the 30 outputs obtained to decide the final tweet polarity. This strategy has shown better performance than either approach by itself, making the combination of both a good choice for the experiments, as explained in the next section.

3 Experimental results

The performance in task 1 was measured by means of the accuracy (the proportion of tweets tagged correctly according to the gold standard). Table 1 shows the accuracy obtained in each experiment, together with the results of the top ranking systems, out of 16 participants for the 6-tag subtasks and 15 participants for the 4-tag subtasks.

The results for 6 tags are considerably worse than those for 4 tags. It appears to be more difficult for our system, and for any system in general, to detect positive or negative intensities than just to distinguish positive from negative. Furthermore, the results for the smaller dataset show that accuracy diminishes notably in both experiments.
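The majority voting of Section 2.4 and the accuracy measure used in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the tie-breaking rule (the first label, in run order, that reaches the highest count) is a choice made here, since the text does not specify how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most voted label; ties go to the earliest such label."""
    if not labels:
        raise ValueError("at least one vote is required")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:  # assumed tie-break: order of the runs
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def combine(supervised_runs: Sequence[Sequence[str]],
            unsupervised_runs: Sequence[Sequence[str]]) -> List[str]:
    """Vote tweet by tweet over all runs (each run has one label per tweet)."""
    runs = list(supervised_runs) + list(unsupervised_runs)
    return [majority_vote(votes) for votes in zip(*runs)]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tweets whose tag matches the gold standard."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and aligned")
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


# Toy example with 2 supervised runs, 1 unsupervised run and 2 tweets.
supervised = [["P", "N"], ["P", "NEU"]]
unsupervised = [["N", "NEU"]]
combined = combine(supervised, unsupervised)
score = accuracy(combined, ["P", "N"])
```

With 15 runs per approach, `combine` would receive two lists of 15 label sequences each and vote over the 30 labels available for every tweet.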
Team       6          6 (1k)     4          4 (1k)
LIF        67.2 (2)   51.6 (1)   72.6 (1)   69.2 (1)
GTI-GRAD   59.2 (5)   50.9 (2)   69.5 (3)   67.4 (2)
ELIRF      67.3 (1)   48.8 (3)   72.5 (2)   64.5 (5)
GSI        61.8 (3)   48.7 (4)   69.0 (4)   65.8 (3)
LYS        56.8 (6)   43.4 (5)   66.4 (5)   63.4 (9)

Table 1: GTI-Gradiant accuracy (%) obtained for each experiment, compared to the top ranking systems. The number in parentheses is the position in the ranking.

As previously said, in order to obtain our results, we combined both approaches by means of a majority voting method. On the one hand, the outputs of the supervised approach were generated by applying classifiers trained with different training records. On the other hand, the unsupervised approach requires the use of several dictionaries, yielding a real-number polarity for each tweet, and then applies an interval to determine whether a tweet carries an opinion or not. This interval is fixed to [-1, 1] for no opinion. In addition, the number of words containing a polarity is taken into account to decide the neutrality of a tweet. That is, if it contains polar words but the total result lies in [-1, 1], there is a contraposition of opinions, so the tweet is tagged as neutral. However, our combined system did not work so well for neutral texts, especially in the bigger datasets. This may be due to the small proportion of neutral tweets throughout the whole dataset, as they represent only 2.15% of the total number of tweets, rising to 6.3% for the small datasets.

For the 6-tag experiments, P+ and N+ tags were determined with the supervised approach. This decision was taken because the unsupervised approach was not able to discriminate efficiently between P and P+ or between N and N+.

Table 2 shows several experiments with the supervised and unsupervised models, as well as with the combined one, where the improvement obtained in the last case can be appreciated. These results were obtained by applying a majority voting method to each approach separately, with 15 outputs, and then to the 30 outputs of the combined result.

Approach       6      6 (1k)   4      4 (1k)
Supervised     58.4   48.3     66.4   63.8
Unsupervised   47.8   41.8     66.3   65.1
Combined       59.2   50.9     69.5   67.4

Table 2: Comparative accuracy (%) analysis. Both approaches and combined output.

4 Conclusions

This paper describes the participation of the GTI Research Group (AtlantTIC, University of Vigo) and Gradiant (Galician Research and Development Centre in Advanced Telecommunications) in TASS 2015 Task 1: Sentiment Analysis at global level. We have presented a hybrid system, combining supervised and unsupervised approaches, which has obtained competitive results and a good position in the final ranking.

The unsupervised approach consists of sentiment propagation rules on dependencies, whilst the supervised one is based on classifiers. This combination seems to work considerably well in this task.

There is still margin for improvement, mostly in the detection of neutral tweets and in a more refined distinction of degrees of positivity and negativity.

References

Brooke, J., M. Tofiloski, and M. Taboada. 2009. Cross-linguistic sentiment analysis: From English to Spanish. In Proc. of the Int. Conf. RANLP-2009, pages 50–54, Borovets, Bulgaria. ACL.

Caro, L. Di and M. Grella. 2013. Sentiment analysis via dependency parsing. Computer Standards & Interfaces, 35(5):442–453.

Cerezo-Costas, H. and D. Celix-Salgado. 2015. Gradiant-analytics: Training polarity shifters with CRFs for message level polarity detection. In Proc. of the 9th Int. Workshop on Semantic Evaluation (SemEval 2015), pages 539–544, Denver, Colorado. ACL.

Choi, Y. and C. Cardie. 2008. Learning with Compositional Semantics as Structural Inference for Subsentential Sentiment Analysis. In Proc. of the Conf. on Empirical Methods in Natural Language Processing, pages 793–801.

Cruz, F. L., J. A. Troyano, F. J. Ortega, and F. Enríquez. 2011. Automatic expansion of feature-level opinion lexicons. In Proc. of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 125–131, Stroudsburg, PA, USA. ACL.

Dell' Amerlina Ríos, M. and A. Gravano. 2013. Spanish DAL: A Spanish dictionary of affect in language. In Proc. of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 21–28, Atlanta, Georgia. ACL.

Fernández-Gavilanes, M., T. Álvarez-López, J. Juncal-Martínez, E. Costa-Montenegro, and F. J. González-Castaño. 2015. GTI: An unsupervised approach for sentiment analysis in Twitter. In Proc. of the 9th Int. Workshop on Semantic Evaluation (SemEval 2015), pages 533–538, Denver, Colorado. ACL.

Lapponi, E., J. Read, and L. Ovrelid. 2012. Representing and Resolving Negation for Sentiment Analysis. In IEEE 12th Int. Conf. on Data Mining Workshops (ICDMW), pages 687–692.

Lapponi, E., E. Velldal, L. Øvrelid, and J. Read. 2012. Uio 2: Sequence-Labeling Negation Using Dependency Features. In Proc. of the 1st Conf. on Lexical and Computational Semantics, volume 1, pages 319–327.

Moilanen, K. and S. Pulman. 2007. Sentiment composition. In Proc. of RANLP 2007, Borovets, Bulgaria.

Molina-González, M. D., E. Martínez-Cámara, M. T. Martín-Valdivia, and J. M. Perea-Ortega. 2013. Semantic orientation for polarity classification in Spanish reviews. Expert Syst. Appl., 40(18):7250–7257.

Padró, L. and E. Stanilovsky. 2012. Freeling 3.0: Towards wider multilinguality. In Proc. of the Language Resources and Evaluation Conf. (LREC 2012), Istanbul, Turkey. ELRA.

Pak, A. and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In Proc. of the Int. Conf. on Language Resources and Evaluation, LREC 2010, Valletta, Malta.

Poria, S., E. Cambria, G. Winterstein, and G. Huang. 2014. Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69(0):45–63.

Quinn, K. M., B. L. Monroe, M. Colaresi, M. H. Crespin, and D. R. Radev. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science, 54(1):209–228.

Rudolph, E. 1996. Contrast: Adversative and Concessive Relations and Their Expressions in English, German, Spanish, Portuguese on Sentence and Text Level. Research in text theory. Walter de Gruyter.

Sidorov, G., S. Miranda-Jiménez, F. Viveros-Jiménez, A. Gelbukh, N. Castro-Sánchez, F. Velásquez, I. Díaz-Rangel, S. Suárez-Guerra, A. Treviño, and J. Gordon. 2013. Empirical Study of Machine Learning Based Approach for Opinion Mining in Tweets. In Ildar Batyrshin and Miguel González Mendoza, editors, Advances in Artificial Intelligence, volume 7629 of LNCS. Springer Berlin Heidelberg, pages 1–14.

Sutton, C. and A. McCallum. 2011. An Introduction to Conditional Random Fields. Machine Learning, 4(4):267–373.

Villena-Román, J., J. García-Morera, M. A. García-Cumbreras, E. Martínez-Cámara, M. T. Martín-Valdivia, and L. A. Ureña-López. 2015. Overview of TASS 2015. In TASS 2015: Workshop on Sentiment Analysis at SEPLN.

Zhang, L., S. Ferrari, and P. Enjalbert. 2012. Opinion analysis: The effect of negation on polarity and intensity. In Jeremy Jancsary, editor, Proc. of KONVENS 2012, pages 282–290. ÖGAI, September. PATHOS 2012 workshop.