TASS 2017: Workshop on Semantic Analysis at SEPLN, September 2017

Tecnolengua Lingmotif at TASS 2017: Spanish Twitter Dataset Classification Combining Wide-Coverage Lexical Resources and Text Features

Tecnolengua Lingmotif en TASS 2017: Clasificación de polaridad de tuits en español combinando recursos léxicos de amplia cobertura con rasgos textuales

Antonio Moreno-Ortiz & Chantal Pérez Hernández
University of Málaga, Spain
{amo, mph}@uma.es

Abstract: In this paper we describe our participation in the TASS 2017 shared task on polarity classification of Spanish tweets. For this task we built a classification model based on the Lingmotif Spanish lexicon and combined it with a number of formal text features, both general and CMC-specific, as well as single-word and n-gram keywords, achieving above-average results across all three datasets. We report the results of our experiments with different combinations of these feature sets and machine learning algorithms (logistic regression and SVM).

Keywords: sentiment analysis, Twitter, polarity classification

Resumen: En este artículo describimos nuestra participación en la tarea de clasificación de polaridad de tweets en español del TASS 2017. Para esta tarea hemos desarrollado un modelo de clasificación basado en el lexicón español de Lingmotif, combinado con una serie de rasgos formales de los textos, tanto generales como específicos de la comunicación mediada por ordenador (CMC), junto con palabras y unidades fraseológicas clave, lo que nos ha permitido obtener unos resultados por encima de la media en los tres conjuntos de la prueba. Mostramos los resultados de nuestros experimentos con diferentes combinaciones de conjuntos de funciones y algoritmos de aprendizaje automático (regresión logística y SVM).

Palabras clave: análisis de sentimiento, twitter, clasificación de polaridad

1 Introduction

The use of microblogging sites in general, and Twitter in particular, has become so well-established that it is now a common source to poll user opinion and even social happiness (Abdullah et al., 2015). Its relevance as a social hub can hardly be overestimated, and it is now common for traditional media to reference Twitter trending topics as an indicator of social concerns and interests.

It is not surprising, then, that Twitter datasets are increasingly being used for sentiment analysis shared tasks. The SemEval series of shared tasks included sentiment analysis of English Twitter content in 2013 (Nakov et al., 2013), and included other languages in later editions. The TASS Workshop on Sentiment Analysis at SEPLN series started in 2012 and has continued on a yearly basis, thus being a milestone not only for Spanish Twitter content, but for sentiment analysis in general.

The General Corpus of TASS was published for TASS 2013 (Villena Román et al., 2013), which introduced aspect-based sentiment analysis; it consists of over 68,000 polarity-annotated tweets. Its creation followed certain design criteria in terms of topics (politics, football, literature, and entertainment) and users.

TASS 2017 (Martínez-Cámara et al., 2017) keeps the Spain-only General Corpus of TASS and introduces a new international corpus of Spanish tweets, named InterTASS. The InterTASS corpus adds considerable difficulty to the tasks, not only because of its multi-varietal nature, but also because, unlike the General Corpus of TASS, its content has not been filtered and its users have not been selected, which introduces many and varied decoding issues.
1.1 Classification tasks

TASS 2017 proposes two classification tasks. Task 1 focuses on sentiment analysis at the tweet level, while Task 2 deals with aspect-based sentiment classification. We took part in Task 1, since we have not yet tackled aspect-based sentiment analysis. The aim of this task is the automatic classification of tweets into one of four levels: positive, negative, neutral, and none.

The neutral/none distinction introduces added difficulty to the classification task. Tweets annotated as none are supposed to express no sentiment whatsoever, as in informative or declarative texts, whereas the neutral category is meant to cover tweets where both positive and negative opinion is expressed, but they cancel each other out, resulting in a neutral overall message.

We believe this distinction is too fuzzy to be annotated reliably. First, a precise balance of polarity is hardly ever found in any message where sentiment is expressed: the message is usually "negative/positive situation x, somehow counterbalanced by positive/negative situation y", with an entailment that the result is tilted to either side. The following are examples of tweets tagged as neutral in the training set:

• 768547351443169284 Parece que las cosas no te van muy bien, espero que todo mejore, que todo el mundo merece ser feliz.
• 770417499317895168 No hay nada más bonito q separarse d una persona y q al tiempo t diga q t echa de menos... pero a mi no m va a pasar

We also found a number of examples where tweets that clearly fell into the none category were wrongly annotated as neutral:

• 768588061496209408 Estas palabras, del Poema, INSTANTES, son de Nadine Stair. Escritora norteamericana, a la q le gustan los helados.
• 767846757996847104 pues imaginate en una casa muy grande
• 769993102442524674 Ninguno de los clubes lo hizo oficial pero se dice que sí

These annotation issues are to be expected, given the added cognitive load that is placed on the annotators, as other researchers have pointed out (Mohammad and Bravo-Marquez, 2017a). Also, the presence of this distinction makes it more difficult to compare results with those of other sentiment classification shared tasks, where the none class is not considered.

1.2 Lexicon-based Sentiment Analysis

Within sentiment analysis it is common to distinguish corpus-based approaches from lexicon-based approaches. Although a combination of both methods can be found in the literature (Riloff, Patwardhan, and Wiebe, 2006), lexicon-based approaches are usually preferred for sentence-level classification (Andreevskaia and Bergler, 2007), whereas corpus-based, statistical approaches are preferred for document-level classification.

Using sentiment dictionaries has a long tradition in the field. WordNet (Fellbaum, 1998) has been a recurrent source of lexical information (Kim and Hovy, 2004; Hu and Liu, 2004; Adreevskaia and Bergler, 2006), used either directly or for sentiment lexicon construction. Other common lexicons used in English sentiment analysis research include the General Inquirer (Stone and Hunt, 1963), MPQA (Wilson, Wiebe, and Hoffmann, 2005), and Bing Liu's Opinion Lexicon (Hu and Liu, 2004). Yet other researchers have used a combination of existing lexicons or created their own (Hatzivassiloglou and McKeown, 1997; Turney, 2002). The use of lexicons has sometimes been straightforward, where the mere presence of a sentiment word determines a given polarity. However, negation and intensification can alter the valence or polarity of that word.¹ Modification of sentiment in context has also been widely recognized and dealt with by some researchers (Kennedy and Inkpen, 2006; Polanyi and Zaenen, 2006; Choi and Cardie, 2008; Taboada et al., 2011).

¹ The terms valence and polarity are used inconsistently in the literature. We use polarity to refer to the binary distinction positive/negative sentiment, and valence to refer to a value of intensity on a scale.

However, the valence of a given word may vary greatly from one domain to another, a fact well recognized in the literature (Aue and Gamon, 2005; Pang and Lee, 2008; Choi, Kim, and Myaeng, 2009), which causes problems when a sentiment lexicon is the only source of knowledge. A number of solutions have been proposed, mostly using ad hoc dictionaries, sometimes created automatically from a domain-specific corpus (Tai and Kao, 2013; Lu et al., 2011).
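To make the kind of lexicon lookup and contextual valence modification discussed above concrete, the following is a minimal, purely illustrative Python sketch; the word lists, weights, and two-token look-back window are invented for this example and do not reflect how Lingmotif or any of the cited systems actually work.

```python
# Toy illustration (not the Lingmotif engine): lexicon-based scoring in which
# negators and intensifiers in the immediate context modify an item's valence.
NEGATORS = {"no", "nunca", "jamás"}                       # hypothetical shifter list
INTENSIFIERS = {"muy": 1.5, "poco": 0.5}                  # hypothetical intensifier weights
LEXICON = {"feliz": 2.0, "bonito": 1.5, "triste": -2.0}   # valence on an arbitrary scale

def score(tokens):
    """Sum lexicon valences, flipping for negation and scaling for intensifiers."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        valence = LEXICON[tok]
        window = tokens[max(0, i - 2):i]        # look two tokens back
        if any(w in NEGATORS for w in window):
            valence = -valence                   # negation flips polarity
        for w in window:
            valence *= INTENSIFIERS.get(w, 1.0)  # intensifiers scale valence
        total += valence
    return total

print(score("no es bonito".split()))   # negated positive item -> -1.5
print(score("muy feliz".split()))      # intensified positive item -> 3.0
```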
Our approach to using a lexicon takes some ideas from the aforementioned approaches. We describe it in the next section.

2 System description

Our system for this polarity classification task relies on the availability of rich sets of lexical, sentiment, and (formal) text features, rather than on highly sophisticated algorithms. We basically used a logistic regression classifier trained on the optimal set of features after many feature combinations were tried on the training set. We also tried an SVM classifier on the same feature sets, but we consistently obtained poorer results compared to the logistic regression classifier. Parameter fine-tuning on each classifier was very limited; we simply performed a grid search on the C parameter, which yielded 100 as the optimal value. For the SVM classifier we found the RBF kernel to perform better than the linear kernel.² We mostly focused on feature selection and combination.

² For the RBF kernel we used gamma=0.001, C=100. For the linear kernel we used C=1000.

We obtained good results on the three test datasets, with some important differences between the InterTASS and General datasets. Results, however, were not as good as we had anticipated based on our experiments on the training datasets. We discuss this in section 3 below. Here we describe our general system architecture and feature sets.

This TASS shared task is our first experience with Twitter sentiment classification proper, although we had related experience from our recent participation in the WASSA-2017 Shared Task on Emotion Intensity (Mohammad and Bravo-Marquez, 2017b). From that shared task we learnt the relevance and impact that other, non-lexical text features can have in microblogging texts.

Since our focus was on identifying the predictive power of classification features, and we intended to perform many experiments with feature combinations, we designed a simple tool to facilitate this.

This tool, Lingmotif Learn, is a GUI-enabled convenience tool that manages datasets and uses the Python-based scikit-learn machine learning toolkit (Pedregosa et al., 2011). It facilitates loading and preprocessing datasets, getting the text run through the Lingmotif SA engine, and feeding the resulting data into one of several machine learning algorithms. Lingmotif Learn is able to extract both sentiment features and non-sentiment features, such as raw text metrics and keywords, and it makes it easy to experiment with different feature set combinations.

[Figure 1: Lingmotif Learn]
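As a rough sketch of the classifier setup described above (assuming tweets have already been encoded as a numeric feature matrix; the placeholder data and variable names are ours, not Lingmotif Learn's, and the tuning metric is an assumption), the grid search over C and the SVM baseline could be set up with scikit-learn along these lines:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# X_train: tweets encoded as numeric feature vectors (sentiment, text and
# keyword features); y_train: P/N/NEU/NONE labels. Placeholder data here.
rng = np.random.default_rng(0)
X_train = rng.random((200, 30))
y_train = rng.integers(0, 4, size=200)

# Grid search on the regularization parameter C (the paper reports C=100 as optimal).
logreg = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100, 1000]},
                      scoring="f1_macro", cv=5)
logreg.fit(X_train, y_train)
print("best C:", logreg.best_params_["C"])

# SVM baseline with the RBF-kernel settings reported in footnote 2.
svm_rbf = SVC(kernel="rbf", gamma=0.001, C=100)
print("SVM macro-F1 (5-fold CV):",
      cross_val_score(svm_rbf, X_train, y_train, cv=5, scoring="f1_macro").mean())
```

Note that GridSearchCV refits the best configuration on the full training data by default, which makes it convenient to apply the tuned model directly to held-out test data.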
For rience with Twitter data sentiment classifi- each of these, the tool offers different outputs cation proper, although we had the related and metrics. experience from our recent participation in For large collections of short texts, such WASSA-2017 Shared Task on Emotion In- as Twitter datasets, it provides a multi- tensity (Mohammad and Bravo-Marquez, document mode whose default output is clas- 2017b). From this shared task we learnt the sification. In the current publicly available relevance and impact that other, non-lexical version this classification is entirely based on text features can have in microblogging texts. the Text Sentiment Score (TSS), which at- Since our focus was on identifying the pre- tempts to summarize the text’s overall po- dictive power of classification features, and larity on a 0-100 scale. TSS is calculated as intended to perform many experiments with a function of the text’s positive and nega- features combinations, we designed a simple tive scores and the sentiment intensity, which tool to facilitate this. reflects the proportion of sentiment to non- This tool, Lingmotif Learn, is a GUI- sentiment lexical items in the text. Spe- enabled convenience tool that manages cific details on TSS calculation can be found datasets and uses the Python-based scikit- in Moreno-Ortiz (2017a). A description of learn (Pedregosa et al., 2011) machine learn- its applications is found in Moreno-Ortiz ing toolkit. It facilitates loading and prepro- (2017b). 2 For the RBF kernel we used gamma=0.001, Lingmotif results are generated as a C=100. For the linear kernel we used C=1000. HTML/Javascript document, which is saved 37 Antonio Moreno-Ortiz, Chantal Pérez Hernéndez Name Description Name Description tss Text Sentiment Score sentences Number of sentences tsi Text Sentiment Intensity tt.ratio Type/Token ratio sent.it Number of lexical Items lex.items Number of lexical items pos.sc Positive score gram.items Number of grammatical items neg.sc Negative score vb.items Number of verbs pos.it Number of positive items nn.items Number of nouns neg.it Number of negative items nnp.items Number of proper nouns neu.it Number of neutral items jj.items Number of adjectives split1.tss TSS for split 1 of text rb.items Number of adverbs split2.tss TSS for split 2 of text chars Number of characters sentences Number of sentences intensifiers Number of intensifiers shifters Number of sentiment shifters contrasters Number of contrast words emoticons Number of emoticons/emojis Table 1: Sentiment feature set all.caps Number of upper case words char.ngrams Number of character ngrams x.marks Number of exclamation marks locally to a predefined location and auto- q.marks Number of question marks matically sent to the user’s default browser quote.marks Number of quotation marks susp.marks Number of suspension marks for immediate display. Internally, the ap- x.marks.seqs Number of x.marks sequences plication generates results as an XML doc- q.marks.seqs Number of q.marks sequences ument containing all the relevant data; this xq.marks.seqs Number of x/q marks sequences XML document is then parsed against one of handles Number of Twitter handles several available XSL templates, and trans- hashtags Number of hashtags urls Number of URL’s formed into the final HTML. Lingmotif Learn simply plugs into the in- Table 2: Text feature set ternally generated XML document to retrieve the desired sentiment analysis data, and ap- 2.3 Text features pends the data to each tweet as features. 
2.2 Sentiment features

Table 1 summarizes the sentiment-related feature set generated by the Lingmotif engine.

Most of these features are included in the original Lingmotif engine, but for this occasion we experimented with text splits to test the relevance of the position of sentiment words in the tweet. The features split1.tss and split2.tss are the combined sentiment scores for each half of the tweet. The assumption was that sentiment words used towards the end of the tweet may have more weight on the overall tweet polarity. This might be especially helpful for the P/N/NEU distinction, since neutral tweets are supposed to have some balance between positivity and negativity. In our tests with the training set, however, adding these features did not improve results. We also experimented with three splits, with the same results. These features were thus discarded for test set classification.

Some of these features are in fact redundant. Notably, tss already encapsulates pos.sc, neg.sc, and neu.it. In our tests, the classifier performed better using just the pos.sc and neg.sc values than our calculated tss, so we only used these two features.

2.3 Text features

Raw text features have been used successfully in sentiment analysis shared tasks (e.g. Mohammad, Kiritchenko, and Zhu (2013); Kiritchenko et al. (2014)), including previous editions of TASS (Cerón-Guzmán, 2016). The role of some of them is rather obvious; the presence of emoticons or exclamation marks, for example, usually signals (strong) sentiment or opinion, which makes them good candidate predictors for the none-vs-rest distinction. The role of others, however, is not as clear. For example, we consistently obtained better results using the gram.items feature, whereas the number of lexical items was not a good predictor. The number of verbs, adjectives and adverbs also proved to be useful, whereas the number of nouns did not.

Table 2 contains the full list of text features we experimented with.

2.4 Keyword features

In order to account for words and expressions that convey sentiment but may not be included in the sentiment lexicon, we experimented with automatic keyword extraction for each of the classes in the training set. Automatic keyword and keyphrase extraction is a well-developed field, and a number of tools and methodologies have been proposed. Hasan and Ng (2014) provide a good overview of the state-of-the-art techniques for keyphrase extraction.

We used a very simple approach that consisted of comparing frequencies of single words and n-grams (2 to 4 words) on a one-vs-rest basis for each of our four classes, considering only words and n-grams with a minimum frequency of 2. We calculated and ranked keyness based on the chi-square statistic, and then manually removed irrelevant results. We ended up with a list of 100 keywords and 100 keyphrases for each class. We did the same for Twitter handles.

Name           Description
p.kw           Positive keywords
p.ng.kw        Positive ngram keywords
p.handles      Positive handles
n.kw           Negative keywords
n.ng.kw        Negative ngram keywords
n.handles      Negative handles
neu.kw         Neutral keywords
neu.ng.kw      Neutral ngram keywords
neu.handles    Neutral handles
none.kw        None keywords
none.ng.kw     None ngram keywords
none.handles   None handles

Table 3: Keywords feature set
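The one-vs-rest chi-square keyness ranking described above can be sketched as follows; this is our own simplified illustration, limited to single words (no n-grams) and toy data, and it is not the exact procedure used for the shared task:

```python
from collections import Counter

def keyness_ranking(class_tweets, rest_tweets, min_freq=2, top_n=100):
    """Rank candidate keywords for one class against the rest using chi-square."""
    target = Counter(w for t in class_tweets for w in t.lower().split())
    rest = Counter(w for t in rest_tweets for w in t.lower().split())
    n_target, n_rest = sum(target.values()), sum(rest.values())
    scores = {}
    for word, a in target.items():
        if a < min_freq:
            continue
        b = rest.get(word, 0)                  # frequency in the "rest" classes
        c, d = n_target - a, n_rest - b        # remaining token counts
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom:
            scores[word] = n * (a * d - b * c) ** 2 / denom  # 2x2 chi-square
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

positive = ["que todo el mundo merece ser feliz", "muy feliz hoy"]
others = ["ninguno de los clubes lo hizo oficial", "pues imaginate en una casa muy grande"]
print(keyness_ranking(positive, others, min_freq=1, top_n=5))
```

When candidate terms are already vectorized per class, scikit-learn's SelectKBest with the chi2 scoring function can be used for the same kind of ranking.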
Using the keywords feature set improved results considerably in our tests with the training set. However, this improvement did not transfer well to the test sets, especially in the case of the InterTASS dataset. We further discuss this issue in section 3.

3 Experiments and Results

Tables 4, 5, and 6 show our results for each of the test sets. Although performance is strong across all three, there is clearly a difference between the General TASS datasets, on the one hand, and the InterTASS dataset on the other.

Experiment        Macro-F1   Accuracy
sent-only         0.456      0.582
run3              0.441      0.576
sent-only-fixed   0.441      0.595

Table 4: Official results for the InterTASS test set

Experiment    Macro-F1   Accuracy
run3          0.528      0.657
final         0.517      0.632
no ngrams     0.508      0.652

Table 5: Official results for the General TASS test set

Experiment    Macro-F1   Accuracy
run3          0.521      0.638
final         0.488      0.618
run4          0.483      0.612

Table 6: Official results for the General TASS-1k test set

We believe this is due to two main reasons. First, the General training set (7,218 tweets) is much larger than the InterTASS training set (1,514 tweets, using both the training and development datasets). This of course provides a much more solid training base for the former than the latter. All our models were trained on one dataset in which both training datasets (General and InterTASS) were merged. Perhaps better results would have been obtained by training on each dataset separately.

The other reason for the poorer performance on the InterTASS test set concerns the very different nature of the datasets. The General Corpus of TASS consists of tweets generated by public figures (artists, politicians, journalists) with a large number of followers. Such Twitter users are more predictable both in terms of the content of their tweets and the language they use. They are also all Castilian Spanish speakers. Most of these tweets contain very compact but carefully chosen language, expressing the users' opinion or evaluation of politically or socially relevant events. The InterTASS corpus, on the other hand, shows much more variability. First, the tweets were collected not only from Spain but from several Latin American countries, which introduces important lexical variability. Second, no user selection is apparent: tweets were randomly collected from the whole Spanish-speaking user base. This introduces spelling errors and a much more colloquial and chatty language. Non-lexical linguistic features, such as exclamation marks, emojis or emoticons, are recurrent, as are user-to-user messages, which are of course hard to decode, since they presuppose privately shared knowledge. These issues have obviously affected the performance of all TASS participants, as is clear from the final leaderboard.

We obtained the best results for the General datasets with our run3 experiment, where we combined a selection of features from the three feature sets listed in Tables 1, 2, and 3. This selection was in fact the optimal one we found during our cross-validation tests on the training dataset. Table 7 lists the feature set used in this experiment.

Features:
pos.sc, neg.sc, vb.items, jj.items, rb.items, gram.items, n.chars, intensifiers,
contrasters, p.kw, p.ng.kw, p.handles, n.kw, n.ng.kw, n.handles, neu.kw,
neu.ng.kw, neu.handles, none.kw, none.ng.kw, none.handles, emoticons, all.caps,
char.ngrams, x.marks, q.marks, susp.marks, hashtags, handles, urls

Table 7: run3 experiment feature set

Concerning the InterTASS test set, the best results were obtained with the sent-only experiment, where a reduced set of features was used. We list these features in Table 8.

Features:
pos.sc, neg.sc, vb.items, jj.items, rb.items, gram.items, n.chars, intensifiers,
contrasters, handles, emoticons, all.caps, char.ngrams, x.marks, q.marks,
susp.marks, urls, hashtags

Table 8: sent-only experiment feature set

We obtained better results for the InterTASS test set using this reduced set of features because the keyword sets were causing noise, since they were extracted using the whole training set, which contained a much larger proportion of tweets from the General TASS dataset.

Another important aspect is the large difference that we encountered between our own tests on the training datasets and our final (official) results. For the General Corpus of TASS, we consistently obtained very high F1 scores (upwards of 0.73) using the keyword set, but results much closer to the official ones without them. This is a clear indication of model overfitting, with an obvious negative impact on the classification of the test set. After this became apparent with our first results upload, we corrected it by reducing the sets of keywords, keyphrases and user handles, which resulted in better overall results.
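For reference, the kind of cross-validation comparison of feature subsets mentioned above, together with the Macro-F1 and Accuracy metrics reported in Tables 4-6, can be computed with scikit-learn along these lines (a sketch with placeholder data; the column indices standing in for the run3 and sent-only subsets are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.random((400, 40))              # placeholder feature matrix
y = rng.integers(0, 4, size=400)       # P / N / NEU / NONE labels

subsets = {"run3": list(range(30)),    # hypothetical column indices per subset
           "sent-only": list(range(18))}

for name, cols in subsets.items():
    clf = LogisticRegression(C=100, max_iter=1000)
    cv_f1 = cross_val_score(clf, X[:, cols], y, cv=5, scoring="f1_macro").mean()
    X_tr, X_te, y_tr, y_te = train_test_split(X[:, cols], y, test_size=0.2, random_state=0)
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: CV macro-F1={cv_f1:.3f}  "
          f"test macro-F1={f1_score(y_te, pred, average='macro'):.3f}  "
          f"accuracy={accuracy_score(y_te, pred):.3f}")
```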
4 Conclusions

This shared task has allowed us to assess the usefulness of many different features as predictors for polarity classification of Spanish tweets. The differing sizes and characteristics of the training and test datasets determined our results to some extent, but we also felt we overfitted our model with too large a selection of keywords, which yielded overoptimistic results in our own tests.

Our results are on par with those of participants who used technically more sophisticated systems, which is also an indication of the salient role that curated, high-quality lexical resources play in sentiment analysis.

We also experienced the negative impact of model overfitting and learnt how to limit its effects. We plan to use this knowledge in future versions of Lingmotif, which currently uses sentiment features exclusively. It is obvious that combining those with other formal features can improve results considerably.

Acknowledgments

This research was supported by Spain's MINECO through the funding of project Lingmotif2 (FFI2016-78141-P).

References

Abdullah, S., E. L. Murnane, J. M. Costa, and T. Choudhury. 2015. Collective smile: Measuring societal happiness from geolocated images. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, CSCW '15, pages 361–374, New York, NY, USA. ACM.

Adreevskaia, A. and S. Bergler. 2006. Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 209–216.

Andreevskaia, A. and S. Bergler. 2007. CLaC and CLaC-NB: Knowledge-based and corpus-based approaches to sentiment tagging. In Proceedings of the 4th International Workshop on Semantic Evaluations, SemEval '07, pages 117–120, Stroudsburg, PA, USA. Association for Computational Linguistics.
Aue, A. and M. Gamon. 2005. Customizing sentiment classifiers to new domains: A case study. Borovets, Bulgaria.

Cerón-Guzmán, J. A. 2016. JACERONG at TASS 2016: An ensemble classifier for sentiment analysis of Spanish tweets at global level. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with 32nd SEPLN Conference (SEPLN 2016), pages 35–39, Salamanca, Spain. SEPLN.

Choi, Y. and C. Cardie. 2008. Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pages 793–801, Stroudsburg, PA, USA.

Choi, Y., Y. Kim, and S.-H. Myaeng. 2009. Domain-specific sentiment analysis using contextual feature generation. In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pages 37–44, Hong Kong, China. ACM.

Fellbaum, C., editor. 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA; London, May.

Hasan, K. S. and V. Ng. 2014. Automatic keyphrase extraction: A survey of the state of the art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1262–1273.

Hatzivassiloglou, V. and K. R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics, pages 174–181, Madrid, Spain. Association for Computational Linguistics.

Hu, M. and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177, Seattle, WA, USA. ACM.

Kennedy, A. and D. Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110–125.
In Proceed- tional Approaches to Subjectivity, Sen- ings of the tenth ACM SIGKDD interna- timent, and Social Media, Copenhagen, tional conference on Knowledge discovery Denmark, September. 41 Antonio Moreno-Ortiz, Chantal Pérez Hernéndez Mohammad, S. M., S. Kiritchenko, and Stone, P. J. and E. B. Hunt. 1963. A X. Zhu. 2013. Nrc-canada: Building the computer approach to content analysis: state-of-the-art in sentiment analysis of Studies using the general inquirer sys- tweets. In Proceedings of the seventh in- tem. In Proceedings of the May 21-23, ternational workshop on Semantic Evalu- 1963, Spring Joint Computer Conference, ation Exercises (SemEval-2013), Atlanta, AFIPS ’63 (Spring), pages 241–256, New Georgia, USA, June. York, NY, USA. ACM. Moreno-Ortiz, A. 2017a. Lingmotif: A user- Taboada, M., J. Brooks, M. Tofiloski, focused sentiment analysis tool. Proce- K. Voll, and M. Stede. 2011. Lexicon- samiento del Lenguaje Natural, 58(0):133– based methods for sentiment analysis. 140, March. Computational Linguistics, 37(2):267– 307. Moreno-Ortiz, A. 2017b. Lingmotif: Senti- ment analysis for the digital humanities. Tai, Y.-J. and H.-Y. Kao. 2013. Automatic In Proceedings of the 15th Conference of domain-specific sentiment lexicon genera- the European Chapter of the Association tion with label propagation. In Proceed- for Computational Linguistics, pages 73– ings of International Conference on In- 76, Valencia, Spain, April. Association for formation Integration and Web-based Ap- Computational Linguistics. plications & Services, IIWAS ’13, pages 53:53–53:62, New York, NY, USA. ACM. Nakov, P., Z. Kozareva, A. Ritter, S. Rosen- Turney, P. D. 2002. Thumbs up or thumbs thal, V. Stoyanov, and T. Wilson. 2013. down? semantic orientation applied to Semeval-2013 task 2: Sentiment analy- unsupervised classification of reviews. In sis in twitter. In Proceedings of the Sev- Proceedings of the 40th Annual Meeting enth International Workshop on Seman- of the Association for Computational Lin- tic Evaluation (SemEval 2013), Atlanta, guistics (ACL), pages 417–424, Philadel- Georgia, USA, June. phia, USA. Pang, B. and L. Lee. 2008. Opinion mining Villena Román, J., S. Lana Serrano, and sentiment analysis. Foundations and E. Martı́nez Cámara, and J. C. Trends in Information Retrieval, 2(1-2):1– González Cristóbal. 2013. Tass - 135. workshop on sentiment analysis at sepln. Pedregosa, F., G. Varoquaux, A. Gram- Procesamiento del Lenguaje Natural, fort, V. Michel, B. Thirion, O. Grisel, 50:37–44. M. Blondel, P. Prettenhofer, R. Weiss, Wilson, T., J. Wiebe, and P. Hoffmann. V. Dubourg, J. Vanderplas, A. Passos, 2005. Recognizing contextual polarity D. Cournapeau, M. Brucher, M. Perrot, in phrase-level sentiment analysis. In and É. Duchesnay. 2011. Scikit-learn: Proceedings of the Conference on Hu- Machine learning in python. J. Mach. man Language Technology and Empirical Learn. Res., 12:2825–2830, November. Methods in Natural Language Processing, HLT ’05, pages 347–354, Stroudsburg, PA, Polanyi, L. and A. Zaenen. 2006. Contex- USA. Association for Computational Lin- tual valence shifters. In Computing Atti- guistics. tude and Affect in Text: Theory and Ap- plications, volume 20 of The Information Retrieval Series. Springer, Dordrecht, The Netherlands, shanahan, james g., qu, yan, wiebe, janyce edition, pages 1–10. Riloff, E., S. Patwardhan, and J. Wiebe. 2006. Feature subsumption for opinion analysis. 
Stone, P. J. and E. B. Hunt. 1963. A computer approach to content analysis: Studies using the General Inquirer system. In Proceedings of the May 21-23, 1963, Spring Joint Computer Conference, AFIPS '63 (Spring), pages 241–256, New York, NY, USA. ACM.

Taboada, M., J. Brooke, M. Tofiloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Tai, Y.-J. and H.-Y. Kao. 2013. Automatic domain-specific sentiment lexicon generation with label propagation. In Proceedings of the International Conference on Information Integration and Web-based Applications & Services, IIWAS '13, pages 53:53–53:62, New York, NY, USA. ACM.

Turney, P. D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 417–424, Philadelphia, USA.

Villena Román, J., S. Lana Serrano, E. Martínez Cámara, and J. C. González Cristóbal. 2013. TASS - Workshop on sentiment analysis at SEPLN. Procesamiento del Lenguaje Natural, 50:37–44.

Wilson, T., J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, pages 347–354, Stroudsburg, PA, USA. Association for Computational Linguistics.