TASS 2017: Workshop on Semantic Analysis at SEPLN, September 2017, pages 35-42




  Tecnolengua Lingmotif at TASS 2017: Spanish
     Twitter Dataset Classification Combining
 Wide-coverage Lexical Resources and Text Features
    Tecnolengua Lingmotif en TASS 2017: Clasificación de
 polaridad de tuits en español combinando recursos léxicos de
            amplia cobertura con rasgos textuales.
Antonio Moreno-Ortiz & Chantal Pérez Hernández
                               University of Málaga
                                      Spain
                                {amo, mph}@uma.es

Abstract: In this paper we describe our participation in the TASS 2017 shared task on polarity classification of Spanish tweets. For this task we built a classification model based on the Lingmotif Spanish lexicon and combined it with a number of formal text features, both general and CMC-specific, as well as single-word and n-gram keywords, achieving above-average results across all three datasets. We report the results of our experiments with different combinations of these feature sets and machine learning algorithms (logistic regression and SVM).
Keywords: sentiment analysis, Twitter, polarity classification
Resumen: En este artículo describimos nuestra participación en la tarea de clasificación de polaridad de tweets en español del TASS 2017. Para esta tarea hemos desarrollado un modelo de clasificación basado en el lexicón español de Lingmotif, combinado con una serie de rasgos formales de los textos, tanto generales como específicos de la comunicación mediada por ordenador (CMC), junto con palabras y unidades fraseológicas clave, lo que nos ha permitido obtener unos resultados por encima de la media en los tres conjuntos de la prueba. Mostramos los resultados de nuestros experimentos con diferentes combinaciones de conjuntos de funciones y algoritmos de aprendizaje automático (regresión logística y SVM).
Palabras clave: análisis de sentimiento, twitter, clasificación de polaridad


1 Introduction

The use of microblogging sites in general, and Twitter in particular, has become so well established that it is now a common source for polling user opinion and even social happiness (Abdullah et al., 2015). Its relevance as a social hub can hardly be overestimated, and it is now common for traditional media to reference Twitter trending topics as an indicator of social concerns and interests.

It is not surprising, then, that Twitter datasets are increasingly being used for sentiment analysis shared tasks. The SemEval series of shared tasks included Sentiment Analysis of English Twitter content in 2013 (Nakov et al., 2013), and added other languages in later editions. The TASS Workshop on Sentiment Analysis at SEPLN series started in 2012 and has continued on a yearly basis, thus being a milestone not only for Spanish Twitter content, but for sentiment analysis in general.

The General Corpus of TASS was published for TASS 2013 (Villena Román et al., 2013), introducing aspect-based sentiment analysis, and consists of over 68,000 polarity-annotated tweets. Its creation followed certain design criteria in terms of topics (politics, football, literature, and entertainment) and users.

TASS 2017 (Martínez-Cámara et al., 2017) keeps the Spain-only General Corpus of TASS and introduces a new international corpus of Spanish tweets, named InterTASS. The InterTASS corpus adds considerable difficulty to the tasks, not only because of its multi-varietal nature, but also because, unlike the General Corpus of TASS, its content has not been filtered and its users have not been selected, which introduces many and varied decoding issues.
1.1 Classification tasks

TASS 2017 proposes two classification tasks. Task 1 focuses on sentiment analysis at the tweet level, while Task 2 deals with aspect-based sentiment classification. We took part in Task 1, since we have not yet tackled aspect-based sentiment analysis. The aim of this task is the automatic classification of tweets into one of four classes: positive, negative, neutral, and none.

The neutral/none distinction introduces added difficulty to the classification task. Tweets annotated as none are supposed to express no sentiment whatsoever, as in informative or declarative texts, whereas the neutral category is meant to qualify tweets where both positive and negative opinion is expressed, but they cancel each other out, resulting in a neutral overall message.

We believe this distinction is too fuzzy to be annotated reliably. First, a precise balance of polarity is hardly ever found in any message where sentiment is expressed: the message is usually "negative/positive situation x, somehow counterbalanced by positive/negative situation y", with an entailment that the result is tilted to either side. The following are examples of tweets tagged as neutral in the training set:

• 768547351443169284 Parece que las cosas no te van muy bien, espero que todo mejore, que todo el mundo merece ser feliz.
• 770417499317895168 No hay nada más bonito q separarse d una persona y q al tiempo t diga q t echa de menos... pero a mi no m va a pasar

We also found a number of examples where tweets that clearly fell into the none category were wrongly annotated as neutral:

• 768588061496209408 Estas palabras, del Poema, INSTANTES, son de Nadine Stair. Escritora norteamericana, a la q le gustan los helados.
• 767846757996847104 pues imaginate en una casa muy grande
• 769993102442524674 Ninguno de los clubes lo hizo oficial pero se dice que sí

These annotation issues are to be expected, given the added cognitive load placed on the annotators, as other researchers have pointed out (Mohammad and Bravo-Marquez, 2017a). Their presence also makes it more difficult to compare results with those of other sentiment classification shared tasks, where the none class is not considered.

1.2 Lexicon-based Sentiment Analysis

Within sentiment analysis it is common to distinguish corpus-based approaches from lexicon-based approaches. Although a combination of both methods can be found in the literature (Riloff, Patwardhan, and Wiebe, 2006), lexicon-based approaches are usually preferred for sentence-level classification (Andreevskaia and Bergler, 2007), whereas corpus-based, statistical approaches are preferred for document-level classification.

Using sentiment dictionaries has a long tradition in the field. WordNet (Fellbaum, 1998) has been a recurrent source of lexical information (Kim and Hovy, 2004; Hu and Liu, 2004; Adreevskaia and Bergler, 2006), either directly, as a source of lexical information, or for sentiment lexicon construction. Other common lexicons used in English sentiment analysis research include the General Inquirer (Stone and Hunt, 1963), MPQA (Wilson, Wiebe, and Hoffmann, 2005), and Bing Liu's Opinion Lexicon (Hu and Liu, 2004). Yet other researchers have used a combination of existing lexicons or created their own (Hatzivassiloglou and McKeown, 1997; Turney, 2002). The use of lexicons has sometimes been straightforward, where the mere presence of a sentiment word determines a given polarity. However, negation and intensification can alter the valence or polarity of that word. (The terms valence and polarity are used inconsistently in the literature; we use polarity to refer to the binary positive/negative distinction, and valence to refer to a value of intensity on a scale.) Modification of sentiment in context has also been widely recognized and dealt with by some researchers (Kennedy and Inkpen, 2006; Polanyi and Zaenen, 2006; Choi and Cardie, 2008; Taboada et al., 2011).

However, the valence of a given word may vary greatly from one domain to another, a fact well recognized in the literature (Aue and Gamon, 2005; Pang and Lee, 2008; Choi, Kim, and Myaeng, 2009), which causes problems when a sentiment lexicon is the only source of knowledge. A number of solutions have been proposed, mostly using ad hoc dictionaries, sometimes created automatically from a domain-specific corpus (Tai and Kao, 2013; Lu et al., 2011).
Our approach to using a lexicon takes some ideas from the aforementioned approaches. We describe it in the next section.

2 System description

Our system for this polarity classification task relies on the availability of rich sets of lexical, sentiment, and (formal) text features, rather than on highly sophisticated algorithms. We basically used a logistic regression classifier trained on the optimal set of features after many feature combinations were tried on the training set. We also tried an SVM classifier on the same feature sets, but we consistently obtained poorer results compared to the logistic regression classifier. Parameter fine-tuning on each classifier was very limited; we simply performed a grid search on the C parameter, which yielded 100 as the optimal value. For the SVM classifier we found the RBF kernel to perform better than the linear kernel (for the RBF kernel we used gamma=0.001 and C=100; for the linear kernel, C=1000). We mostly focused on feature selection and combination.
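As a rough illustration of this setup, the sketch below shows how such a grid search could be run with scikit-learn; the feature matrix, labels, and parameter grid are placeholders for illustration and do not reproduce the actual Lingmotif Learn code.

    # Minimal sketch (not the authors' code): grid-searching C for a logistic
    # regression classifier and an RBF-kernel SVM on a precomputed feature matrix.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X = np.random.rand(100, 20)                                 # placeholder feature matrix
    y = np.random.choice(["P", "N", "NEU", "NONE"], size=100)   # placeholder labels

    # Logistic regression: search C; the paper reports C=100 as optimal.
    lr_search = GridSearchCV(LogisticRegression(max_iter=1000),
                             {"C": [0.01, 0.1, 1, 10, 100, 1000]},
                             scoring="f1_macro", cv=5)
    lr_search.fit(X, y)

    # SVM: the paper reports the RBF kernel (gamma=0.001, C=100) outperforming
    # a linear kernel (C=1000).
    svm_rbf = SVC(kernel="rbf", gamma=0.001, C=100).fit(X, y)

    print(lr_search.best_params_, lr_search.best_score_)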
We obtained good results on the three test datasets, with some important differences between the InterTASS and General datasets. Results, however, were not as good as we had anticipated based on our experiments on the training datasets. We discuss this in section 3 below. Here we describe our general system architecture and feature sets.

This TASS shared task is our first experience with Twitter data sentiment classification proper, although we had related experience from our recent participation in the WASSA-2017 Shared Task on Emotion Intensity (Mohammad and Bravo-Marquez, 2017b). From that shared task we learnt the relevance and impact that other, non-lexical text features can have in microblogging texts.

Since our focus was on identifying the predictive power of classification features, and we intended to perform many experiments with feature combinations, we designed a simple tool to facilitate this.

This tool, Lingmotif Learn, is a GUI-enabled convenience tool that manages datasets and uses the Python-based scikit-learn (Pedregosa et al., 2011) machine learning toolkit. It facilitates loading and preprocessing datasets, running the text through the Lingmotif SA engine, and feeding the resulting data into one of several machine learning algorithms. Lingmotif Learn is able to extract both sentiment and non-sentiment features, such as raw text metrics and keywords, and it makes it easy to experiment with different feature set combinations.

Figure 1: Lingmotif Learn

2.1 The Lingmotif tool

Sentiment features are returned by the Lingmotif SA engine. Lingmotif (Moreno-Ortiz, 2017a) is a user-friendly, multilingual text analysis application with a focus on sentiment analysis that offers several modes of text analysis. It is not specifically geared towards any particular type of text or domain. It can analyze long documents, such as narratives, medium-sized ones, such as political speeches and debates, and short to very short texts, such as user reviews and tweets. For each of these, the tool offers different outputs and metrics.

For large collections of short texts, such as Twitter datasets, it provides a multi-document mode whose default output is classification. In the current publicly available version this classification is entirely based on the Text Sentiment Score (TSS), which attempts to summarize the text's overall polarity on a 0-100 scale. TSS is calculated as a function of the text's positive and negative scores and the sentiment intensity, which reflects the proportion of sentiment to non-sentiment lexical items in the text. Specific details on TSS calculation can be found in Moreno-Ortiz (2017a). A description of its applications is found in Moreno-Ortiz (2017b).
Lingmotif results are generated as an HTML/Javascript document, which is saved locally to a predefined location and automatically sent to the user's default browser for immediate display. Internally, the application generates results as an XML document containing all the relevant data; this XML document is then parsed against one of several available XSL templates and transformed into the final HTML.

Lingmotif Learn simply plugs into the internally generated XML document to retrieve the desired sentiment analysis data, and appends the data to each tweet as features.
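The structure of that internal XML document is not detailed here, so the element names in the following sketch are hypothetical; it only illustrates the general idea of reading named sentiment metrics from an XML result and attaching them to a tweet as a feature dictionary.

    # Hypothetical sketch: the real element names in Lingmotif's XML output may
    # differ; this only illustrates turning an XML result into per-tweet features.
    import xml.etree.ElementTree as ET

    SENTIMENT_FIELDS = ["tss", "tsi", "pos.sc", "neg.sc", "pos.it", "neg.it", "neu.it"]

    def features_from_xml(xml_string):
        """Return a {feature_name: value} dict from one Lingmotif-style result."""
        root = ET.fromstring(xml_string)
        feats = {}
        for name in SENTIMENT_FIELDS:
            node = root.find(name.replace(".", "_"))   # assumed element naming
            feats[name] = float(node.text) if node is not None else 0.0
        return feats

    example = "<result><tss>72</tss><pos_sc>5</pos_sc><neg_sc>1</neg_sc></result>"
    print(features_from_xml(example))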
2.2 Sentiment features

Table 1 summarizes the sentiment-related feature set generated by the Lingmotif engine.

Name         Description
tss          Text Sentiment Score
tsi          Text Sentiment Intensity
sent.it      Number of lexical items
pos.sc       Positive score
neg.sc       Negative score
pos.it       Number of positive items
neg.it       Number of negative items
neu.it       Number of neutral items
split1.tss   TSS for split 1 of text
split2.tss   TSS for split 2 of text
sentences    Number of sentences
shifters     Number of sentiment shifters

Table 1: Sentiment feature set

Most of these features are included in the original Lingmotif engine, but for this occasion we experimented with text splits to test the relevance of the position of the sentiment words in the tweet. The features split1.tss and split2.tss are the combined sentiment scores for each half of the tweet. The assumption was that sentiment words used towards the end of the tweet may have more weight on the overall tweet polarity. This might be helpful especially for the P/N/NEU distinction: neutral tweets are supposed to have some balance between positivity and negativity. In our tests with the training set, however, adding these features did not improve results. We also experimented with three splits, with the same results. These features were thus discarded for test set classification.
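As an illustration of the split features (not the engine's actual implementation), a tweet's tokens can be divided into n parts and each part scored separately; the toy scorer below stands in for the Lingmotif SA engine.

    # Illustrative sketch only: splitting a tweet into n parts and scoring each
    # part separately. The scorer passed in stands in for the Lingmotif SA engine.
    def split_scores(tokens, score_tokens, n_splits=2):
        """Return one sentiment score per split of the token list."""
        size = max(1, -(-len(tokens) // n_splits))          # ceiling division
        parts = [tokens[i:i + size] for i in range(0, len(tokens), size)]
        return [score_tokens(part) for part in parts]

    # Toy scorer: +1 for positive words, -1 for negative words (placeholder lexicon).
    toy_lexicon = {"bonito": 1, "feliz": 1, "triste": -1}
    score = lambda toks: sum(toy_lexicon.get(t.lower(), 0) for t in toks)

    print(split_scores("no hay nada más bonito que esto".split(), score, n_splits=2))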
Some of these features are in fact redundant. Notably, tss already encapsulates pos.sc, neg.sc, and neu.it. In our tests, the classifier performed better using just the pos.sc and neg.sc values than using our calculated tss, so we only used these two features.

2.3 Text features

Raw text features have been used successfully in sentiment analysis shared tasks (e.g. Mohammad, Kiritchenko, and Zhu (2013), Kiritchenko et al. (2014)), including previous editions of TASS (Cerón-Guzmán, 2016). The role of some of them is rather obvious; the presence of emoticons or exclamation marks, for example, usually indicates (strong) sentiment or opinion, thus being a good candidate predictor for the none-vs-rest distinction. The role of others, however, is not as clear. For example, we consistently obtained better results using the gram.items feature, whereas the number of lexical items was not a good predictor. The number of verbs, adjectives and adverbs also proved to be useful, whereas the number of nouns did not.

Table 2 contains the full list of text features we experimented with.

Name             Description
sentences        Number of sentences
tt.ratio         Type/Token ratio
lex.items        Number of lexical items
gram.items       Number of grammatical items
vb.items         Number of verbs
nn.items         Number of nouns
nnp.items        Number of proper nouns
jj.items         Number of adjectives
rb.items         Number of adverbs
chars            Number of characters
intensifiers     Number of intensifiers
contrasters      Number of contrast words
emoticons        Number of emoticons/emojis
all.caps         Number of upper case words
char.ngrams      Number of character ngrams
x.marks          Number of exclamation marks
q.marks          Number of question marks
quote.marks      Number of quotation marks
susp.marks       Number of suspension marks
x.marks.seqs     Number of x.marks sequences
q.marks.seqs     Number of q.marks sequences
xq.marks.seqs    Number of x/q marks sequences
handles          Number of Twitter handles
hashtags         Number of hashtags
urls             Number of URLs

Table 2: Text feature set
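Many of the surface features in Table 2 can be approximated with simple regular expressions; the following sketch covers only a few of them and is an illustrative approximation, not the extractor actually used.

    # A rough sketch (not the authors' extractor) of a few of the surface features
    # in Table 2, computed with regular expressions.
    import re

    def surface_features(tweet):
        tokens = tweet.split()
        return {
            "chars": len(tweet),
            "x.marks": tweet.count("!"),
            "q.marks": tweet.count("?"),
            "susp.marks": len(re.findall(r"\.{3}|…", tweet)),
            "all.caps": sum(1 for t in tokens if t.isupper() and len(t) > 1),
            "handles": len(re.findall(r"@\w+", tweet)),
            "hashtags": len(re.findall(r"#\w+", tweet)),
            "urls": len(re.findall(r"https?://\S+", tweet)),
            "xq.marks.seqs": len(re.findall(r"[!?]{2,}", tweet)),
        }

    print(surface_features("Qué GOLAZO!!! #futbol @amigo mira https://example.com ..."))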
2.4 Keyword features

In order to account for words and expressions that convey sentiment but may not be included in the sentiment lexicon, we experimented with automatic keyword extraction for each of the classes in the training set. Automatic keyword and keyphrase extraction is a well-developed field and a number of tools and methodologies have been proposed. Hasan and Ng (2014) provide a good overview of the state-of-the-art techniques for keyphrase extraction.

We used a very simple approach that consisted in comparing frequencies of single words and n-grams (2 to 4 words) on a one-vs-rest basis for each of our four classes, for words and n-grams with a minimum frequency of 2. We calculated and ranked keyness based on the chi-square statistic, and then manually removed irrelevant results. We ended up with a list of 100 keywords and 100 keyphrases for each class. We did the same for Twitter handles. Table 3 lists the resulting feature set.

Name           Description
p.kw           Positive keywords
p.ng.kw        Positive ngram keywords
p.handles      Positive handles
n.kw           Negative keywords
n.ng.kw        Negative ngram keywords
n.handles      Negative handles
neu.kw         Neutral keywords
neu.ng.kw      Neutral ngram keywords
neu.handles    Neutral handles
none.kw        None keywords
none.ng.kw     None ngram keywords
none.handles   None handles

Table 3: Keywords feature set
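A minimal sketch of this one-vs-rest chi-square keyness ranking is given below for single words; the whitespace tokenization and cut-offs are simplified assumptions, and n-grams and handles would be handled analogously.

    # Sketch of one-vs-rest chi-square keyness for single words (assumptions:
    # whitespace tokenization, minimum frequency 2); n-grams would be analogous.
    from collections import Counter

    def keyness_ranking(target_tweets, rest_tweets, min_freq=2):
        tgt = Counter(w for t in target_tweets for w in t.lower().split())
        rest = Counter(w for t in rest_tweets for w in t.lower().split())
        n_tgt, n_rest = sum(tgt.values()), sum(rest.values())
        scores = {}
        for word, a in tgt.items():              # a: frequency in the target class
            if a < min_freq:
                continue
            b = rest.get(word, 0)                # b: frequency in the other classes
            c, d = n_tgt - a, n_rest - b         # remaining token counts
            n = a + b + c + d
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    pos = ["qué día tan feliz", "muy feliz hoy"]
    neg = ["qué día tan triste", "todo mal hoy"]
    print(keyness_ranking(pos, neg)[:5])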
Using the keywords feature set improved results considerably in our tests with the training set. However, this improvement did not transfer well to the test sets, especially in the case of the InterTASS dataset. We further discuss this issue in section 3.

3 Experiments and Results

Tables 4, 5, and 6 show our results for each of the test sets. Although performance is strong across all three, there clearly is a difference between the General TASS datasets, on the one hand, and the InterTASS dataset on the other.

Experiment        Macro-F1    Accuracy
sent-only            0.456       0.582
run3                 0.441       0.576
sent-only-fixed      0.441       0.595

Table 4: Official results for the InterTASS test set

Experiment        Macro-F1    Accuracy
run3                 0.528       0.657
final                0.517       0.632
no ngrams            0.508       0.652

Table 5: Official results for the General TASS test set

Experiment        Macro-F1    Accuracy
run3                 0.521       0.638
final                0.488       0.618
run4                 0.483       0.612

Table 6: Official results for the General TASS-1k test set

We believe this is due to two main reasons. First, the General training set (7,218 tweets) is much larger than the InterTASS training set (1,514 tweets, using both the training and development datasets). This of course provides a much more solid training base for the former than the latter. All our models were trained on one dataset in which both training datasets (General and InterTASS) were merged. Perhaps better results would have been obtained by training on each dataset separately.

The other reason for the poorer performance on the InterTASS test set concerns the very different nature of the datasets. The General Corpus of TASS consists of tweets generated by public figures (artists, politicians, journalists) with a large number of followers. Such Twitter users are more predictable both in terms of the content of their tweets and the language they use. They are also entirely Castilian Spanish speakers. Most of these tweets contain very compact but carefully chosen language, expressing users' opinion or evaluation of politically or socially relevant events. On the other hand, the InterTASS corpus shows much more variability: first, the tweets were collected not only from Spain but from several Latin American countries, which introduces important lexical variability. Second, no user selection is apparent; tweets were randomly collected from the whole Spanish-speaking user base. This introduces spelling errors and much more colloquial and chatty language. Non-lexical linguistic features, such as exclamation marks, emojis or emoticons, are recurrent, as are user-to-user messages, which are of course hard to decode, since they presuppose certain privately shared knowledge. These issues have obviously affected the performance of all TASS participants, as is clear from the final leaderboard.
We obtained the best results for the General datasets with our run3 experiment, where we combined a selection of features from the three feature sets listed in Tables 1, 2, and 3. This selection was in fact the optimal one we found during our cross-validation tests on the training dataset. Table 7 lists the feature set used in this experiment.

Features
pos.sc        neu.kw
neg.sc        neu.ng.kw
vb.items      neu.handles
jj.items      none.kw
rb.items      none.ng.kw
gram.items    none.handles
n.chars       emoticons
intensifiers  all.caps
contrasters   char.ngrams
p.kw          x.marks
p.ng.kw       q.marks
p.handles     susp.marks
n.kw          hashtags
n.ng.kw       handles
n.handles     urls

Table 7: run3 experiment feature set
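The following sketch illustrates how such feature-subset comparisons can be run with cross-validated macro-F1 in scikit-learn; the feature names, placeholder data, and subsets shown are assumptions for illustration, not the exact experimental code.

    # Sketch (assumed setup, not Lingmotif Learn itself): comparing feature
    # subsets by cross-validated macro-F1 with a logistic regression classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    feature_names = ["pos.sc", "neg.sc", "vb.items", "jj.items", "gram.items",
                     "emoticons", "x.marks", "hashtags"]
    X = np.random.rand(200, len(feature_names))      # placeholder training matrix
    y = np.random.choice(["P", "N", "NEU", "NONE"], size=200)

    subsets = {
        "sent-only-like": ["pos.sc", "neg.sc", "emoticons", "x.marks"],
        "run3-like": feature_names,                  # full selection
    }
    for name, cols in subsets.items():
        idx = [feature_names.index(c) for c in cols]
        f1 = cross_val_score(LogisticRegression(C=100, max_iter=1000),
                             X[:, idx], y, scoring="f1_macro", cv=5).mean()
        print(f"{name}: macro-F1 = {f1:.3f}")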
Concerning the InterTASS test set, the best results were obtained with the sent-only experiment, where a reduced set of features was used. We list these features in Table 8.

Features
pos.sc         handles
neg.sc         emoticons
vb.items       all.caps
jj.items       char.ngrams
rb.items       x.marks
gram.items     q.marks
n.chars        susp.marks
intensifiers   urls
contrasters    hashtags

Table 8: sent-only experiment feature set

We obtained better results for the InterTASS test set using this reduced set of features because the keyword sets were causing noise, since they were extracted using the whole training set, which contained a much larger proportion of tweets from the General TASS dataset.

Another important aspect is the large difference that we encountered between our own tests on the training datasets and our final (official) results. For the General Corpus of TASS, we consistently obtained very high F1 scores (upwards of 0.73) using the keyword set, but scores much closer to the official results without them. This is a clear indication of model overfitting, with an obvious negative impact on the classification of the test set. After this became apparent on our first results upload, we corrected it by reducing the sets of keywords, keyphrases and user handles, which resulted in better overall results.

4 Conclusions

This shared task has allowed us to assess the usefulness of many different features as predictors for polarity classification of Spanish tweets. The differing sizes and characteristics of the training and test datasets determined our results to some extent, but we also felt we overfitted our model with too large a selection of keywords, which yielded overoptimistic results in our tests.

Our results are on par with those of participants who used more technically sophisticated systems, which is also an indication of the salient role that curated, high-quality lexical resources play in sentiment analysis.

We also experienced the negative impact of model overfitting and learnt how to limit its effects. We plan to use this knowledge in future versions of Lingmotif, which currently uses sentiment features exclusively. It is obvious that combining those with other formal features can improve results considerably.

Acknowledgments

This research was supported by Spain's MINECO through the funding of project Lingmotif2 (FFI2016-78141-P).

References

Abdullah, S., E. L. Murnane, J. M. Costa, and T. Choudhury. 2015. Collective smile: Measuring societal happiness from geolocated images. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, CSCW '15, pages 361–374, New York, NY, USA. ACM.

Adreevskaia, A. and S. Bergler. 2006. Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 209–216.
Andreevskaia, A. and S. Bergler. 2007. CLaC and CLaC-NB: Knowledge-based and corpus-based approaches to sentiment tagging. In Proceedings of the 4th International Workshop on Semantic Evaluations, SemEval '07, pages 117–120, Stroudsburg, PA, USA. Association for Computational Linguistics.

Aue, A. and M. Gamon. 2005. Customizing sentiment classifiers to new domains: A case study. Borovets, Bulgaria.

Cerón-Guzmán, J. A. 2016. Jacerong at TASS 2016: An ensemble classifier for sentiment analysis of Spanish tweets at global level. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with 32nd SEPLN Conference (SEPLN 2016), pages 35–39, Salamanca, Spain. SEPLN.

Choi, Y. and C. Cardie. 2008. Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pages 793–801, Stroudsburg, PA, USA.

Choi, Y., Y. Kim, and S.-H. Myaeng. 2009. Domain-specific sentiment analysis using contextual feature generation. In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pages 37–44, Hong Kong, China. ACM.

Fellbaum, C., editor. 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA; London, May.

Hasan, K. S. and V. Ng. 2014. Automatic keyphrase extraction: A survey of the state of the art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1262–1273.

Hatzivassiloglou, V. and K. R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics, pages 174–181, Madrid, Spain. Association for Computational Linguistics.

Hu, M. and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177, Seattle, WA, USA. ACM.

Kennedy, A. and D. Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110–125.

Kim, S.-M. and E. Hovy. 2004. Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics, page 1367, Geneva, Switzerland. Association for Computational Linguistics.

Kiritchenko, S., X. Zhu, C. Cherry, and S. Mohammad. 2014. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 437–442, Dublin, Ireland, August. Association for Computational Linguistics and Dublin City University.

Lu, Y., M. Castellanos, U. Dayal, and C. Zhai. 2011. Automatic construction of a context-aware sentiment lexicon: An optimization approach. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 347–356, New York, NY, USA. ACM.

Martínez-Cámara, E., M. C. Díaz-Galiano, M. Á. García-Cumbreras, M. García-Vega, and J. Villena-Román. 2017. Overview of TASS 2017. In J. Villena Román, M. Á. García Cumbreras, E. Martínez-Cámara, M. C. Díaz Galiano, and M. García Vega, editors, Proceedings of TASS 2017: Workshop on Semantic Analysis at SEPLN (TASS 2017), volume 1896 of CEUR Workshop Proceedings, Murcia, Spain, September. CEUR-WS.

Mohammad, S. and F. Bravo-Marquez. 2017a. Emotion intensities in tweets. In Proceedings of the Sixth Joint Conference on Lexical and Computational Semantics (*SEM), Vancouver, Canada.

Mohammad, S. and F. Bravo-Marquez. 2017b. WASSA-2017 shared task on emotion intensity. In Proceedings of the EMNLP 2017 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media, Copenhagen, Denmark, September.
Mohammad, S. M., S. Kiritchenko, and X. Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval-2013), Atlanta, Georgia, USA, June.

Moreno-Ortiz, A. 2017a. Lingmotif: A user-focused sentiment analysis tool. Procesamiento del Lenguaje Natural, 58:133–140, March.

Moreno-Ortiz, A. 2017b. Lingmotif: Sentiment analysis for the digital humanities. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 73–76, Valencia, Spain, April. Association for Computational Linguistics.

Nakov, P., Z. Kozareva, A. Ritter, S. Rosenthal, V. Stoyanov, and T. Wilson. 2013. SemEval-2013 Task 2: Sentiment analysis in Twitter. In Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, Georgia, USA, June.

Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, November.

Polanyi, L. and A. Zaenen. 2006. Contextual valence shifters. In J. G. Shanahan, Y. Qu, and J. Wiebe, editors, Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series, pages 1–10. Springer, Dordrecht, The Netherlands.

Riloff, E., S. Patwardhan, and J. Wiebe. 2006. Feature subsumption for opinion analysis. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP '06, pages 440–448, Stroudsburg, PA, USA. Association for Computational Linguistics.

Stone, P. J. and E. B. Hunt. 1963. A computer approach to content analysis: Studies using the General Inquirer system. In Proceedings of the May 21-23, 1963, Spring Joint Computer Conference, AFIPS '63 (Spring), pages 241–256, New York, NY, USA. ACM.

Taboada, M., J. Brooke, M. Tofiloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Tai, Y.-J. and H.-Y. Kao. 2013. Automatic domain-specific sentiment lexicon generation with label propagation. In Proceedings of the International Conference on Information Integration and Web-based Applications & Services, IIWAS '13, pages 53:53–53:62, New York, NY, USA. ACM.

Turney, P. D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 417–424, Philadelphia, USA.

Villena Román, J., S. Lana Serrano, E. Martínez Cámara, and J. C. González Cristóbal. 2013. TASS - Workshop on sentiment analysis at SEPLN. Procesamiento del Lenguaje Natural, 50:37–44.

Wilson, T., J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, pages 347–354, Stroudsburg, PA, USA. Association for Computational Linguistics.