Aspect based Sentiment Analysis of Spanish Tweets

Análisis de Sentimientos de Tweets en Español basado en Aspectos

Oscar Araque, Ignacio Corcuera, Constantino Román, Carlos A. Iglesias y J. Fernando Sánchez-Rada
Grupo de Sistemas Inteligentes, Departamento de Ingeniería de Sistemas Telemáticos,
Universidad Politécnica de Madrid (UPM), España
Avenida Complutense, nº 30, 28040 Madrid, España
{oscar.aiborra, ignacio.cplatas, c.romang}@alumnos.upm.es
{cif, jfernando}@dit.upm.es

Resumen: En este artículo se presenta la participación del Grupo de Sistemas Inteligentes (GSI) de la Universidad Politécnica de Madrid (UPM) en el taller de Análisis de Sentimientos centrado en tweets en español: el TASS 2015. Este año se han propuesto dos tareas que hemos abordado con el diseño y desarrollo de un sistema modular adaptable a distintos contextos. Este sistema emplea tecnologías de Procesado de Lenguaje Natural (NLP) así como de aprendizaje automático, dependiendo además de tecnologías desarrolladas previamente en nuestro grupo de investigación. En particular, hemos combinado un amplio número de rasgos y léxicos de polaridad para la detección de sentimiento, junto con un algoritmo basado en grafos para la detección de contextos. Los resultados experimentales obtenidos tras la consecución del concurso resultan prometedores.

Palabras clave: Aprendizaje automático, Procesado de lenguaje natural, Análisis de sentimientos, Detección de aspectos

Abstract: This article presents the participation of the Intelligent Systems Group (GSI) at Universidad Politécnica de Madrid (UPM) in the Sentiment Analysis workshop focused on Spanish tweets, TASS 2015. This year two challenges have been proposed, which we have addressed with the design and development of a modular system that is adaptable to different contexts. This system employs Natural Language Processing (NLP) and machine-learning technologies, relying also on technologies previously developed in our research group. In particular, we have used a wide number of features and polarity lexicons for sentiment detection. With regard to aspect detection, we have relied on a graph-based algorithm. Once the challenge has come to an end, the experimental results are promising.

Keywords: Machine learning, Natural Language Processing, Sentiment analysis, Aspect detection

1 Introduction

In this article we present our participation in the TASS 2015 challenge (Villena-Román et al., 2015a). This work deals with two different tasks, which are described next.

The first task of this challenge, Task 1 (Villena-Román et al., 2015b), consists of determining the global polarity at message level. Within this task there are two evaluations: one in which 6 polarity labels are considered (P+, P, NEU, N, N+, NONE), and another one with 4 polarity labels (P, N, NEU, NONE). P stands for positive, N means negative and NEU is neutral. The "+" symbol is used for intensification of the polarity, and NONE means absence of sentiment polarity. This task provides a corpus (Villena-Román et al., 2015b) which contains a total of 68,000 tweets written in Spanish, covering a diversity of subjects.

The second and last task, Task 2 (Villena-Román et al., 2015b), aims to detect the sentiment polarity at aspect level using three labels (P, N and NEU). Within this task, two corpora (Villena-Román et al., 2015b) are provided: the SocialTV and STOMPOL corpora. We have restricted ourselves to the SocialTV corpus in this edition. This corpus contains 2,773 tweets captured during the 2014 Final of the Copa del Rey championship¹. Along with the corpus, a set of aspects that appear in the tweets is given. This list is essentially composed of football players, coaches, teams, referees, and other football-related concepts such as crowd, authorities, match and broadcast.

¹ www.en.wikipedia.org/wiki/2014_Copa_del_Rey_Final

The complexity presented by the challenge has led us to develop a modular system in which each component can work separately. We have developed and experimented with each module independently, and later combined them depending on the task (1 or 2) we want to solve.

The rest of the paper is organized as follows. Section 2 reviews the research on sentiment analysis in the Twitter domain. Section 3 briefly describes the general architecture of the developed system. Section 4 describes the module developed to address Task 1 of this challenge, and Section 5 explains the additional modules necessary to address Task 2. Finally, Section 6 concludes the paper and presents some conclusions regarding our participation in this challenge, as well as future work.

2 Related Work

Focusing on the scope of TASS, many researchers have experimented on the TASS corpora with different approaches to evaluate the performance of their systems. Vilares et al. (2014) present a system relying on machine-learning classification for the sentiment analysis tasks, and on a heuristics-based approach for aspect-based sentiment analysis. Another example of classification through machine learning is the work of Hurtado and Pla (2014), in which they use Support Vector Machines (SVM) with remarkable results. It is common to incorporate linguistic knowledge into these systems, as proposed by Urizar and Roncal (2013), who also employ lexicons in their work. Balahur and Perea-Ortega (2013) deal with this problem using dictionaries and data translated from English to Spanish, as well as machine-learning techniques. An interesting procedure is performed by Vilares, Alonso, and Gómez-Rodríguez (2013): they combine semantic information with psychological knowledge extracted from dictionaries and use these features to train a machine-learning algorithm. Fernández et al. (2013) employ a ranking algorithm using bigrams together with a skipgram scorer, which allows them to create sentiment lexicons that retain the context of the terms. A different approach relies on the Word2Vec model, used by Montejo-Ráez, García-Cumbreras, and Díaz-Galiano (2014), in which each word is represented in a 200-dimensional space, without using any lexical or syntactical analysis: this allows them to develop a fairly simple system with reasonable results.

3 System architecture

One of our main goals is to design and develop an adaptable system that can function in a variety of situations. As already mentioned, this has led us to a system composed of several modules that can work separately. Since the challenge proposes two different tasks (Villena-Román et al., 2015b), we use each module where necessary.

Our system is divided into three modules:

• Named Entity Recognizer (NER) module. The NER module detects the entities within a text and classifies them as one of the possible entities. A more detailed description of this module and the set of given entities is presented in Section 5, as it is used in Task 2.

• Aspect and Context detection module. This module is in charge of detecting the remaining aspects (aspects that are not entities and therefore cannot be detected as such) and the contexts of all aspects. This module is described in greater detail in Section 5, since it is only used for tackling Task 2.

• Sentiment Analysis module. As the name suggests, the goal of this module is to classify the given texts using sentiment polarity labels. This module is based on combining NLP and machine-learning techniques and is used in both Task 1 and Task 2. It is explained in more detail next.

3.1 Sentiment Analysis module

The sentiment analysis module relies on an SVM machine-learning model that is trained with data composed of features extracted from the TASS datasets: the General corpus for Task 1 and the SocialTV corpus for Task 2 (Villena-Román et al., 2015b).
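As an illustration of how such a module can be assembled (not the exact implementation used in our system), the following minimal sketch assumes scikit-learn; the extract_features function, the training examples and the parameter values are hypothetical placeholders for the feature set described in Section 3.1.1.

```python
# Minimal sketch of an SVM-based polarity classifier, assuming scikit-learn.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def extract_features(tweet):
    # Illustrative only: the real feature set adds n-grams, POS tags,
    # lexicon scores, negation marks, etc. (see Section 3.1.1).
    tokens = tweet.split()
    return {
        "n_tokens": len(tokens),
        "n_allcaps": sum(1 for t in tokens if t.isupper()),
        "n_hashtags": sum(1 for t in tokens if t.startswith("#")),
    }

pipeline = Pipeline([
    ("vectorizer", DictVectorizer()),  # feature dicts -> sparse vectors
    ("svm", LinearSVC(C=1.0)),         # linear SVM polarity classifier
])

# Hypothetical training examples standing in for the TASS General corpus.
tweets = ["Qué gran partido!!", "Vaya arbitraje tan malo...", "Mañana hay final"]
labels = ["P", "N", "NONE"]

pipeline.fit([extract_features(t) for t in tweets], labels)
print(pipeline.predict([extract_features("GOLAZO en el último minuto!!")]))
```

Keeping the vectorizer and the classifier in a single pipeline makes it straightforward to retrain the same model on different corpora, which is how the module is reused for Task 2 (Section 5.3).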
3.1.1 Feature Extraction

We have used different approaches to design the feature extraction. The reference document for the development of the feature extraction was the work by Mohammad, Kiritchenko, and Zhu (2013). With this in mind, the features extracted from each tweet to form a feature vector are:

• N-grams: combinations of contiguous sequences of one, two and three tokens, consisting of words, lemmas and stems. As this information can be difficult to handle due to the huge number of N-grams that can be formed, we set a minimum frequency of three occurrences for an N-gram to be considered.

• All-caps: the number of words with all characters in upper case that appear in the tweet.

• POS information: the frequency of each part-of-speech tag.

• Hashtags: the number of hashtag terms.

• Punctuation marks: these marks are frequently used to increase the sentiment of a sentence, especially in the Twitter domain. The presence or absence of these marks (? and !) is extracted as a feature, as well as their relative position within the document.

• Elongated words: the number of words that have one character repeated more than two times.

• Emoticons: the system uses an Emoticon Sentiment Lexicon, which has been developed by Hogenboom et al. (2013).

• Lexicon resources: for each token w, we used the sentiment score score(w) to determine:
  1. The number of words with score(w) ≠ 0.
  2. The polarity of each word with score(w) ≠ 0.
  3. The total score of all the polarities of the words with score(w) ≠ 0.
  The best way to increase the coverage with respect to the detection of words with polarity is to combine several lexicon resources. The lexicons used are: Elhuyar Polar Lexicon (Urizar and Roncal, 2013), ISOL (Martínez-Cámara et al., 2013), Sentiment Spanish Lexicon (SSL) (Veronica Perez Rosas, 2012), SOCAL (Taboada et al., 2011) and ML-SentiCON (Cruz et al., 2014).

• Intensifiers: an intensifier dictionary (Cruz et al., 2014) has been used when calculating the polarity of a word, increasing or decreasing its value.

• Negation: explained in Section 3.1.2.

• Global polarity: this score is the sum of the scores from the emoticon analysis and from the lexicon resources.
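A few of the surface and lexicon-based features above can be sketched as follows; this is an illustration under our own simplifying assumptions, where `lexicon` stands for a hypothetical combined {word: polarity score} dictionary built from the resources listed in the last bullet.

```python
# Sketch of some surface and lexicon features from the list above (illustrative).
import re

def surface_and_lexicon_features(tokens, lexicon):
    # Scores of the tokens with score(w) != 0 in the combined lexicon.
    polar = [lexicon[t.lower()] for t in tokens if lexicon.get(t.lower(), 0) != 0]
    return {
        "all_caps": sum(1 for t in tokens if t.isupper() and len(t) > 1),
        "hashtags": sum(1 for t in tokens if t.startswith("#")),
        "elongated": sum(1 for t in tokens if re.search(r"(.)\1{2,}", t)),
        "punctuation": sum(t.count("!") + t.count("?") for t in tokens),
        "n_polar_words": len(polar),            # words with score(w) != 0
        "lexicon_score": sum(polar),            # total polarity score
        "positive_words": sum(1 for s in polar if s > 0),
        "negative_words": sum(1 for s in polar if s < 0),
    }

tokens = "GOLAAAZO de Bale !! #FinalCopa".split()
print(surface_and_lexicon_features(tokens, {"golaaazo": 2.0, "fatal": -2.0}))
```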
3.1.2 Negation

An important feature that has been used to develop the classifier is the treatment of negations. This approach takes into account the role of negation words or phrases, as they can alter the polarity value of the words or phrases they precede.

The polarity of a word changes if it is included in a negated context. To detect a negated context we have used a set of negation words, which has been manually compiled by us. Besides, detecting the context requires deciding how many tokens are affected by the negation. For this, we have followed the proposal by Pang, Lee, and Vaithyanathan (2002).

Once the negated context is defined, two features are affected by it: N-grams and lexicon scores. A negation mark is added to these features, implying that their value is negated (e.g., positive becomes negative, +1 becomes -1). This approximation is based on the work by Saurí and Pustejovsky (2012).
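A simplified sketch of this kind of negated-context marking, in the spirit of Pang, Lee, and Vaithyanathan (2002), is shown below: tokens following a negation cue receive a negation mark until the next punctuation sign. The cue list here is only illustrative; the system uses a manually compiled Spanish negation list.

```python
# Simplified negation-scope marking (illustrative cue list, not the actual one).
NEGATION_CUES = {"no", "nunca", "jamás", "ni", "tampoco"}
SCOPE_END = {".", ",", ";", ":", "!", "?"}

def mark_negation(tokens):
    marked, in_scope = [], False
    for tok in tokens:
        if tok.lower() in NEGATION_CUES:
            in_scope = True          # open a negated context
            marked.append(tok)
        elif tok in SCOPE_END:
            in_scope = False         # punctuation closes the context
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if in_scope else tok)
    return marked

print(mark_negation("no me gusta nada este partido , pero el gol fue bueno".split()))
# ['no', 'me_NEG', 'gusta_NEG', 'nada_NEG', 'este_NEG', 'partido_NEG', ',', 'pero', ...]
```

The marked tokens then feed the N-gram features, while the lexicon score of a marked token is sign-flipped.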
4 Task 1: Sentiment analysis at global level

4.1 Experiments and results

The competition allows the submission of up to three experiments (runs) for each corpus. With this in mind, three experiments have been developed in this task, attending to the lexicons that fit the corpus best:

• RUN-1: one lexicon adapts particularly well to the corpus, the ElhPolar lexicon. We decided to use only this dictionary in the first run.

• RUN-2: in this run the two lexicons with the best results in our experiments, ElhPolar and ISOL, are combined.

• RUN-3: the last run uses a mix of all the lexicons employed in our experiments.

Experiment    Accuracy    F1-Score
6labels       61.8        50.0
6labels-1k    48.7        44.6
4labels       69.0        55.0
4labels-1k    65.8        53.1

Table 1: Results of RUN-1 in Task 1

Experiment    Accuracy    F1-Score
6labels       61.0        49.5
6labels-1k    48.0        44.0
4labels       67.9        54.6
4labels-1k    64.6        53.1

Table 2: Results of RUN-2 in Task 1

Experiment    Accuracy    F1-Score
6labels       60.8        49.3
6labels-1k    47.9        43.7
4labels       67.8        54.5
4labels-1k    64.6        48.7

Table 3: Results of RUN-3 in Task 1

5 Task 2: Aspect-based sentiment analysis

This task is an extension of Task 1 in which sentiment analysis is performed at the aspect level. The goal in this task is to detect the different aspects that can appear in a tweet and afterwards analyze the sentiment associated with each aspect.

For this, we used a pipeline that takes the provided corpus as input and produces the sentiment-annotated corpus as output. This pipeline can be divided into three major modules that work in a sequential manner: first the NER, second the Aspect and Context detection, and third the Sentiment Analysis, as described below.

5.1 NER

The goal of this module is to detect the words that represent a certain entity from the given set of entities, which can be identified as a person (players and coaches) or an organization (teams).

For this module we used the Stanford CRF NER (Finkel, Grenager, and Manning, 2005). It includes a Spanish model trained on news data. To adapt the model, we trained it instead with the training dataset (Villena-Román et al., 2015b) and a gazette. The model is trained with two labels: Person (PER) and Organization (ORG). The gazette entries were collected from the training dataset, resulting in a list of all the ways the entities (players, teams or coaches) were named. We verified the performance of the Stanford NER by means of cross-validation on the training data, obtaining an average F1-Score of 91.05%.

As the goal of the NER module is to detect the words that represent a specific entity, we used a list of all the ways these entities were named. In this way, once the Stanford NER detects a generic entity, our extended NER module searches this list and decides on the particular entity by matching the pattern of the entity words.
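The resolution step can be sketched as a simple gazette lookup over the spans returned by the CRF tagger; the gazette entries and names below are illustrative examples, not the actual lists collected from the training data, and the Stanford NER call itself is abstracted away.

```python
# Sketch of gazette-based resolution of particular entities (illustrative data).
GAZETTE = {
    "cristiano": ("Cristiano_Ronaldo", "player"),
    "cr7": ("Cristiano_Ronaldo", "player"),
    "ancelotti": ("Ancelotti", "coach"),
    "real madrid": ("Real_Madrid", "team"),
    "barça": ("FC_Barcelona", "team"),
}

def resolve_entities(ner_spans):
    """ner_spans: list of (surface_text, generic_label) pairs, e.g. PER/ORG spans."""
    resolved = []
    for surface, generic in ner_spans:
        entry = GAZETTE.get(surface.lower())
        if entry is not None:  # map the surface form to a particular entity
            resolved.append({"surface": surface, "entity": entry[0],
                             "type": entry[1], "ner_label": generic})
    return resolved

print(resolve_entities([("CR7", "PER"), ("Real Madrid", "ORG")]))
```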
5.2 Aspect and Context detection

This module aims to detect the aspects that are not entities, and thus have not been detected by the NER module. To achieve this, we have composed a dictionary using the training dataset (Villena-Román et al., 2015b) which contains all the ways in which the aspects (including the entities formerly detected) are named. Using this dictionary, this module can detect words that are related to a specific aspect. Although the NER module already detects entities such as players, coaches or teams, this module can detect them too: it treats the entities detected by the NER module as more relevant than its own recognitions, thus combining the aspect/entity detection capacity of both modules.

As for the context detection, we have implemented a graph-based algorithm (Mukherjee and Bhattacharyya, 2012) that allows us to extract the set of words related to an aspect from a sentence, even if the sentence mentions different aspects and mixed emotions. The context of an aspect is the set of words related to that aspect. Besides, we have extended this algorithm in such a way that it allows us to configure the scope of the context detection.

Combining these two approaches (aspect and context detection), this module is able to detect the word or words that identify an aspect and to extract the context of that aspect. This context allows us to isolate the sentiment associated with the aspect, which is very useful for sentiment analysis at the aspect level.

We have obtained an accuracy of 93.21% in this second step of the pipeline with the training dataset (Villena-Román et al., 2015b). As for the test dataset (Villena-Román et al., 2015b), we obtained an accuracy of 89.27%².

² We calculated this metric using the output granted by the TASS uploading page www.daedalus.es/TASS2015/private/evaluate.php.
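As a rough illustration of the idea (the actual module uses the graph-based algorithm of Mukherjee and Bhattacharyya (2012), which this sketch does not reproduce), aspect detection can be approximated by a dictionary lookup and context extraction by a configurable token window around each matched aspect term; the aspect dictionary below is hypothetical.

```python
# Simplified aspect matching plus windowed context extraction (illustrative only;
# the real system derives the context from a word graph, not a fixed window).
ASPECT_DICT = {
    "casillas": "Casillas",
    "árbitro": "Referee",
    "arbitro": "Referee",
    "afición": "Crowd",
}

def detect_aspects_with_context(tokens, window=3):
    results = []
    for i, tok in enumerate(tokens):
        aspect = ASPECT_DICT.get(tok.lower())
        if aspect is not None:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            results.append({"aspect": aspect, "term": tok,
                            "context": tokens[lo:hi]})  # words passed to the classifier
    return results

tweet = "Gran parada de Casillas pero el árbitro estuvo fatal".split()
for hit in detect_aspects_with_context(tweet):
    print(hit["aspect"], "->", " ".join(hit["context"]))
```

The `window` parameter plays the role of the configurable context scope mentioned above: a wider window captures more sentiment-bearing words at the risk of mixing the contexts of nearby aspects.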
5.3 Sentiment analysis

The sentiment analysis module is the end of the processing pipeline. This module is in charge of assigning polarity values to the detected aspects using the context of each aspect. We have used the same model as in Task 1 to analyse every aspect detected in Task 2, given that the aspect contexts detected in Task 2 are similar to the texts analysed in Task 1.

Nevertheless, even though we use the same model, it needs to be trained with the proper data. For this, we extracted the aspects and contexts from the training dataset, computed the corresponding features (explained in Section 3), and then trained the model with them. In this way, the trained classifier is fed aspect contexts, which it classifies into one of the three labels (as mentioned: positive, negative and neutral).

5.4 Results

By connecting these three modules together, we obtain a system that is able to recognize entities and aspects, detect the context in which they are enclosed, and classify them at the aspect level. The performance of this system is shown in Table 4. The different RUNs represent separate adjustments of the same experiment, in which several parameters are tuned in order to obtain the best performance.

Experiment    Accuracy    F1-Score
RUN-1         63.5        60.6
RUN-2         62.1        58.4
RUN-3         55.7        55.8

Table 4: Results of each run in Task 2

As can be seen in Table 4, the global performance obtained is fairly positive, as our system ranked first in F1-Score and second in Accuracy.

6 Conclusions and future work

In this paper we have described the participation of the GSI in the TASS 2015 challenge (Villena-Román et al., 2015a). Our proposal relies on both NLP and machine-learning techniques, applying them jointly to obtain a satisfactory result in the rankings of the challenge. We have designed and developed a modular system that relies on technologies previously developed in our group (Sánchez-Rada, Iglesias, and Gil, 2015). These characteristics make the system adaptable to different conditions and contexts, a feature that proves very useful in this competition given the diversity of tasks (Villena-Román et al., 2015b).

As future work, our aim is to improve aspect detection by including semantic similarity based on the lexical resources available in the Linguistic Linked Open Data Cloud. To this aim, we will also integrate vocabularies such as Marl (Westerski, Iglesias, and Tapia, 2011). In addition, we are working on improving sentiment detection based on the social context of users within the MixedEmotions project.

Acknowledgement

This research has been partially funded by the EC through the H2020 project MixedEmotions (Grant Agreement no: 141111) and by the Spanish Ministry of Industry, Tourism and Trade through the project Calista (TEC2012-32457). We would like to thank Maite Taboada as well as the rest of the researchers for providing us with their valuable lexical resources.

References

Balahur, A. and José M. Perea-Ortega. 2013. Experiments using varying sizes and machine translated data for sentiment analysis in Twitter.

Cruz, Fermín L., José A. Troyano, Beatriz Pontes, and F. Javier Ortega. 2014. Building layered, multilingual sentiment lexicons at synset and lemma levels. Expert Systems with Applications, 41(13):5984–5994.

Fernández, J., Y. Gutiérrez, J. M. Gómez, P. Martínez-Barco, A. Montoyo, and R. Muñoz. 2013. Sentiment analysis of Spanish tweets using a ranking algorithm and skipgrams.

Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. pages 363–370.

Hogenboom, A., D. Bal, F. Franciscar, M. Bal, F. De Jong, and U. Kaymak. 2013. Exploiting emoticons in polarity classification of text.

Hurtado, Ll. and F. Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter.
Martínez-Cámara, E., M. Martín-Valdivia, M. D. Molina-González, and L. Ureña López. 2013. Bilingual experiments on an opinion comparable corpus. WASSA 2013, 87.

Mohammad, Saif M., Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Second Joint Conference on Lexical and Computational Semantics (*SEM), volume 2, pages 321–327.

Montejo-Ráez, A., M. A. García-Cumbreras, and M. C. Díaz-Galiano. 2014. Participación de SINAI Word2Vec en TASS 2014.

Mukherjee, Subhabrata and Pushpak Bhattacharyya. 2012. Feature specific sentiment analysis for product reviews. Volume 7181 of Lecture Notes in Computer Science, pages 475–487. Springer.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proc. of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, pages 79–86. Association for Computational Linguistics.

Sánchez-Rada, J. Fernando, Carlos A. Iglesias, and Ronald Gil. 2015. A Linked Data Model for Multimodal Sentiment and Emotion Analysis. In 4th Workshop on Linked Data in Linguistics: Resources and Applications.

Saurí, Roser and James Pustejovsky. 2012. Are you sure that this happened? Assessing the factuality degree of events in text. Computational Linguistics, 38(2):261–299.

Taboada, Maite, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Urizar, Xabier Saralegi and Iñaki San Vicente Roncal. 2013. Elhuyar at TASS 2013.

Perez Rosas, Veronica, Carmen Banea, and Rada Mihalcea. 2012. Learning sentiment lexicons in Spanish. In Proc. of the International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey.

Vilares, D., M. A. Alonso, and C. Gómez-Rodríguez. 2013. LyS at TASS 2013: Analysing Spanish tweets by means of dependency parsing, semantic-oriented lexicons and psychometric word-properties.

Vilares, David, Yerai Doval, Miguel A. Alonso, and Carlos Gómez-Rodríguez. 2014. LyS at TASS 2014: A prototype for extracting and analysing aspects from Spanish tweets.

Villena-Román, Julio, Janine García-Morera, Miguel A. García-Cumbreras, Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, and L. Alfonso Ureña-López, editors. 2015a. Proc. of TASS 2015: Workshop on Sentiment Analysis at SEPLN, number 1397 in CEUR Workshop Proceedings, Aachen.

Villena-Román, Julio, Janine García-Morera, Miguel A. García-Cumbreras, Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, and L. Alfonso Ureña-López. 2015b. Overview of TASS 2015.

Westerski, Adam, Carlos A. Iglesias, and Fernando Tapia. 2011. Linked Opinions: Describing Sentiments on the Structured Web of Data. In Proc. of the 4th International Workshop on Social Data on the Web.