TASS 2016


                                                                                                     CEUR Workshop Proceedings

                                                                                                                           ISSN: 1613-0073


    Articles

    Overview of TASS 2016
    Miguel Ángel García Cumbreras, Julio Villena Román, Eugenio Martínez Cámara, M. Carlos Díaz Galiano, M. Teresa Martín Valdivia, L. Alfonso Ureña López
    Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento
    Edgar Casasola Murillo
    LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task
    Antonio Quirós, Isabel Segura-Bedmar, Paloma Martínez
    JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Tweets at Global Level
    Jhon Adrián Cerón-Guzmán
    Participación de SINAI en TASS 2016
    A. Montejo-Ráez, M. C. Díaz-Galiano
    ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter
    Lluís-F. Hurtado, Ferran Pla
    GTI at TASS 2016: Supervised Approach for Aspect Based Sentiment Analysis in Twitter
    Tamara Álvarez-López, Milagros Fernández-Gavilanes, Silvia García-Méndez, Jonathan Juncal-Martínez, Francisco Javier González-Castaño


Published at http://ceur-ws.org/. CEUR-WS.org is a serial publication with a recognized ISSN (ISSN 1613-0073).

    Organization
    Organizing committee
    Julio Villena-Román                          Sngular                julio.villena@sngular.team
    Miguel Á. García Cumbreras                   Universidad de Jaén    magc@ujaen.es
    Eugenio Martínez Cámara                      TU Darmstadt           camara@ukp.informatik.tu-darmstadt.de
    Manuel C. Díaz Galiano                       Universidad de Jaén    mcdiaz@ujaen.es
    M. Teresa Martín Valdivia                    Universidad de Jaén    maite@ujaen.es
    L. Alfonso Ureña López                       Universidad de Jaén    laurena@ujaen.es


    ISSN:         1613-0073
    Edited at:    Universidad de Jaén
    Year:         2016
    Editors:      Julio Villena-Román            Sngular                julio.villena@sngular.team
                  Miguel Á. García Cumbreras     Universidad de Jaén    magc@ujaen.es
                  Eugenio Martínez Cámara        TU Darmstadt           camara@ukp.informatik.tu-darmstadt.de
                  Manuel C. Díaz Galiano         Universidad de Jaén    mcdiaz@ujaen.es
                  M. Teresa Martín Valdivia      Universidad de Jaén    maite@ujaen.es
                  L. Alfonso Ureña López         Universidad de Jaén    laurena@ujaen.es
    Published by: CEUR Workshop Proceedings


    Program committee
    Alexandra Balahur                             EC-Joint Research Centre (Italy)
    José Carlos Cortizo                           Universidad Europea de Madrid (Spain)
    Jose María Gómez Hidalgo                      Optenet (Spain)
    José Carlos González-Cristobal                Universidad Politécnica de Madrid (Spain)
    Lluís F. Hurtado                              Universidad de Valencia (Spain)
    Carlos A. Iglesias Fernández                  Universidad Politécnica de Madrid (Spain)
    Zornitsa Kozareva                             Information Sciences Institute (USA)
    Sara Lana Serrano                             Universidad Politécnica de Madrid (Spain)
    Ruslan Mitkov                                 University of Wolverhampton (United Kingdom)
    Andrés Montoyo                                Universidad de Alicante (Spain)
    Rafael Muñoz                                  Universidad de Alicante (Spain)
    Constantine Orasan                            University of Wolverhampton (United Kingdom)
    Jose Manuel Perea Ortega                      Universidad de Extremadura (Spain)
    Ferran Pla Santamaría                         Universidad de Valencia (Spain)
    María Teresa Taboada Gómez                    Simon Fraser University (Canada)
    Mike Thelwall                                 University of Wolverhampton (United Kingdom)
    José Antonio Troyano Jiménez                  Universidad de Sevilla (Spain)



    Acknowledgements
    The organization of TASS has been supported by researchers who participate in the
    following research project:
    • REDES (TIN2015-65136-C2-1-R)



    Preamble

    Spanish is currently the second native language in the world by number of speakers, after Mandarin Chinese, and the second language worldwide by total number of speakers. This position means that 6.7% of the world population can be considered Spanish-speaking. The presence of Spanish in the world does not correspond directly to the volume of research on Spanish in Natural Language Processing, and especially in the field of Sentiment Analysis. Therefore, the Workshop on Sentiment Analysis at SEPLN (TASS) aims to promote research on the treatment of Spanish texts in Sentiment Analysis systems by means of the competitive assessment of opinion processing systems.

    Seven teams participated in the 2016 edition of the workshop, and six of them submitted a paper describing their system. The organizing committee reviewed the papers with the intention of publishing only those with a minimum level of scientific quality, and accepted all six.

    The 2016 edition will be held at the 32nd International Conference of the Spanish Society for Natural Language Processing (SEPLN 2016), which will take place in Salamanca (Spain) in September within the 5th Spanish Conference of Computer Science (CEDI 2016).


    September 2016
    The editors


Articles
                    TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 13-21


                                    Overview of TASS 2016

 Miguel Ángel García Cumbreras (1), Julio Villena Román (2), Eugenio Martínez Cámara (1),
 Manuel Carlos Díaz Galiano (1), M. Teresa Martín Valdivia (1), L. Alfonso Ureña López (1)

 (1) Universidad de Jaén, 23071 Jaén, Spain
 (2) Sngular, 28034 Madrid, Spain

 (1) {magc, emcamara, mcdiaz, laurena, maite}@ujaen.es  (2) {julio.villena}@sngular.team

        Abstract: This paper describes TASS 2016, the fifth edition of the Workshop on Sentiment
        Analysis at SEPLN, held within the SEPLN 2016 International Conference. The main aim of
        TASS is to promote research on, and the development of, new algorithms, resources and
        techniques for sentiment analysis in social media (specifically Twitter), focused on the
        Spanish language. This paper presents the tasks proposed in TASS 2016, the corpora used,
        the participating groups in each task, the overall results and their analysis.
        Keywords: TASS 2016, sentiment analysis, social media.


1 Introduction

TASS is an experimental evaluation workshop, a satellite event of the annual SEPLN Conference, with the aim of promoting research on Sentiment Analysis in social media focused on the Spanish language. The fifth edition will be held on September 13th, 2016 at the University of Salamanca, Spain.

Sentiment Analysis (SA) is traditionally defined as the computational treatment of opinion, sentiment and subjectivity in texts (Pang & Lee, 2008). However, Cambria and Hussain (2012) offer a more updated definition: computational techniques for the extraction, classification, understanding and evaluation of opinions and comments published on the Internet and in other kinds of user-generated content. It is a hard task because even humans often disagree on the polarity of a given text. It is even harder when the text has only 140 characters (Twitter messages or tweets).

Although SA is not a new task, it is still challenging, because the state of the art has not yet resolved some problems related to multilingualism, domain adaptation, text genre adaptation and polarity classification at a fine-grained level. Polarity classification has usually been tackled following two main approaches. The first one applies machine learning algorithms in order to train a polarity classifier using a labelled corpus (Pang et al., 2002). This approach is also known as the supervised approach. The second one is known as semantic orientation, or the unsupervised approach, and it integrates linguistic resources in a model in order to identify the valence of the opinions (Turney, 2002).

The aim of TASS is to provide a competitive forum where the newest research works in the field of SA in social media, specifically focused on Spanish tweets, are described and discussed by the scientific and business communities.

The rest of the paper is organized as follows. Section 2 describes the different corpora provided to participants. Section 3 shows the different tasks of TASS 2016. Section 4 describes the participants and the overall results are presented in Section 5. Finally, the last section shows some conclusions and future directions.

2 Corpus

TASS 2016 experiments are based on two corpora, specifically built for the different editions of the workshop.

The two corpora will be made freely available to the community after the workshop. Please send an email to tass@sngularmeaning.team filling in the TASS Corpus License agreement with your email, affiliation (institution, company or any kind of organization) and a brief description of your research objectives, and you will be given a password to download the files in the password protected area. The only requirement is to include a citation to a relevant paper and/or the TASS website.

2.1 General corpus

The General Corpus contains over 68,000 tweets, written in Spanish by about 150 well-known personalities and celebrities of the world of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the context of extraction has a Spanish-focused bias, the diverse nationality of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, makes the corpus reach a global coverage in the Spanish-speaking world.

Each tweet includes its ID (tweetid), the creation date (date) and the user ID (user). Due to restrictions in the Twitter API Terms of Service (https://dev.twitter.com/terms/api-terms), it is forbidden to redistribute a corpus that includes text contents or information about users. However, it is valid if those fields are removed and instead IDs (including Tweet IDs and user IDs) are provided. The actual message content can be easily obtained by making queries to the Twitter API using the tweetid.

The general corpus has been divided into a training set (about 10%) and a test set (90%). The training set was released, so the participants could train and validate their models. The test corpus was provided without any tagging and has been used to evaluate the results. Obviously, it was not allowed to use the test data from previous years to train the systems.

Each tweet was tagged with its global polarity (positive, negative or neutral sentiment) or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no sentiment tag (NONE).

In addition, there is also an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to make out whether a neutral sentiment comes from neutral keywords or else the text contains positive and negative sentiments at the same time.

Moreover, the polarity values related to the entities that are mentioned in the text are also included when applicable. These values are similarly tagged with 6 possible values and include the level of agreement as related to each entity.

The corpus is based on a selection of topics, that is, thematic areas such as “política” (“politics”), “fútbol” (“soccer”), “literatura” (“literature”) or “entretenimiento” (“entertainment”). Each tweet in the training and test set has been assigned to one or several of these topics (most messages are associated to just one topic, due to the short length of the text).

The annotation has been done semi-automatically: a baseline machine learning model is first run and then all tags are checked by human experts. In the case of the polarity at entity level, due to the high volume of data to check, the human annotation has only been done for the training set.

Table 1 shows a summary of the training and test corpora provided to participants.

Attribute             Value
Tweets                68,017
Tweets (test)         60,798 (89%)
Tweets (train)        7,219 (11%)
Topics                10
Users                 154
Date start (train)    2011-12-02
Date end (train)      2012-04-10
Date start (test)     2011-12-02
Date end (test)       2012-04-10

Table 1: Corpus statistics
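
The labelling scheme of the General corpus described above can be captured in a few lines of Python. This is only an illustrative sketch: the names `TweetAnnotation` and `to_four_labels` are hypothetical and not part of the TASS distribution, the tweet ID is made up, and collapsing P+ into P and N+ into N is an assumed way of obtaining a coarser four-value label set (P, N, NEU, NONE), not a rule stated by the organizers.

```python
from dataclasses import dataclass

# The six global polarity labels and the two agreement values named in the text.
SIX_LABELS = ("P+", "P", "NEU", "N", "N+", "NONE")
AGREEMENT_VALUES = ("AGREEMENT", "DISAGREEMENT")


@dataclass(frozen=True)
class TweetAnnotation:
    """Hypothetical record for one annotated tweet (field names are illustrative)."""
    tweetid: str
    polarity: str
    agreement: str

    def __post_init__(self):
        # Reject values outside the label sets described in the corpus section.
        if self.polarity not in SIX_LABELS:
            raise ValueError(f"unknown polarity label: {self.polarity!r}")
        if self.agreement not in AGREEMENT_VALUES:
            raise ValueError(f"unknown agreement value: {self.agreement!r}")


def to_four_labels(label: str) -> str:
    """Collapse the strong labels into their plain counterparts (assumed mapping)."""
    if label not in SIX_LABELS:
        raise ValueError(f"unknown polarity label: {label!r}")
    return {"P+": "P", "N+": "N"}.get(label, label)


example = TweetAnnotation("0001", "P+", "AGREEMENT")  # made-up tweet ID
print(example.polarity, "->", to_four_labels(example.polarity))
```

Validating labels when a record is built keeps typos such as "POS" or "NEUTRAL" from silently entering an experiment.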



Users were journalists (periodistas), politicians (políticos) or celebrities (famosos). The only language involved was Spanish (es).

The list of topics that have been selected is the following:
   • Politics (política)
   • Entertainment (entretenimiento)
   • Economy (economía)
   • Music (música)
   • Soccer (fútbol)
   • Films (películas)
   • Technology (tecnología)
   • Sports (deportes)
   • Literature (literatura)
   • Other (otros)

The corpus is encoded in XML. Figure 1 shows the information of two tweets. The first tweet is only annotated with the polarity at tweet level because no entity is mentioned in the text. However, the second one is annotated with the global polarity of the message and the polarity associated to each of the entities that appear in the text (UPyD and Foro Asturias).

Figure 1: Sample tweets (General corpus)

2.2 STOMPOL corpus

STOMPOL (corpus of Spanish Tweets for Opinion Mining at aspect level about POLitics) is a corpus of Spanish tweets prepared for the research on the challenging task of opinion mining at aspect level. The tweets were gathered from 23rd to 24th of April 2015, and are related to one of the following political aspects that appear in political campaigns:
   • Economics (Economía): taxes, infrastructure, markets, labour policy...
   • Health System (Sanidad): hospitals, public/private health system, drugs, doctors...
   • Education (Educación): state school, private school, scholarships...
   • Political party (Propio_partido): anything good (speeches, electoral programme...) or bad (corruption, criticism) related to the entity
   • Other aspects (Otros_aspectos): electoral system, environmental policy...

Each aspect is related to one or several entities that correspond to one of the main political parties in Spain, which are:
   • Partido_Popular (PP)
   • Partido_Socialista_Obrero_Español (PSOE)
   • Izquierda_Unida (IU)
   • Podemos
   • Ciudadanos (C’s)
   • Unión_Progreso_y_Democracia (UPyD)

Each tweet in the corpus has been manually annotated by two annotators, and a third one in case of disagreement, with the sentiment polarity at aspect level. Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. Again, no difference is made between no sentiment and a neutral sentiment (neither positive nor negative). Each political aspect is linked to its corresponding political party and its polarity.

Figure 2 shows the information of two sample tweets.

Figure 2: Sample tweets (STOMPOL corpus)

The number of tweets per entity is shown in Table 2.
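
The STOMPOL annotation procedure (two annotators per tweet, with a third consulted on disagreement) can be sketched as below. The text does not state how the third judgement is combined with the first two, so this sketch assumes the third annotator's label is simply taken as final; the function name `adjudicate` is hypothetical.

```python
from typing import Optional

# The three polarity levels used for STOMPOL aspect-level annotation.
STOMPOL_LABELS = ("P", "NEU", "N")


def adjudicate(first: str, second: str, third: Optional[str] = None) -> str:
    """Return the final label for one aspect from up to three annotations.

    Assumption: when the first two annotators disagree, the third
    annotator's label decides. The source does not specify this rule.
    """
    for label in (first, second):
        if label not in STOMPOL_LABELS:
            raise ValueError(f"unknown polarity label: {label!r}")
    if first == second:
        # Agreement: no third annotation is needed.
        return first
    if third is None:
        raise ValueError("annotators disagree and no third annotation was given")
    if third not in STOMPOL_LABELS:
        raise ValueError(f"unknown polarity label: {third!r}")
    return third


print(adjudicate("P", "P"))         # both annotators agree
print(adjudicate("P", "N", "NEU"))  # disagreement resolved by the third label
```

A majority vote over the three labels would be an equally plausible reading; the two rules differ only when all three annotators give different labels.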



Entity        Train    Test
PP              205     125
PSOE            136      70
C’s             119      87
Podemos          98      80
IU              111      43
UPyD             97     124
Total           766     529

Table 2: Number of tweets per entity and per corpus subset

3 Description of tasks

Since the first edition of TASS, a new task and a new corpus have been published. However, one of the aims of TASS is the evaluation of the progress of the research on SA. Thus, the edition of 2016 was focused on the analysis and the comparison of the systems with the submissions of previous editions.

The edition of 2016 was focused on two tasks: polarity classification at tweet level and polarity classification at entity level. The polarity classification task has been proposed with the same corpus since the first edition of TASS, but the polarity classification at aspect level has been proposed with a different corpus each edition. In the edition of 2016 the classification at aspect level uses the STOMPOL corpus, which was first published in the edition of 2015.

Participants are expected to submit up to 3 results of different experiments for one or both of these tasks, in the appropriate format described below.

Along with the submission of experiments, participants have been invited to submit a paper to the workshop in order to describe their experiments and discuss the results with the audience in a regular workshop session.

The two proposed tasks are described next.

3.1 Task 1: Sentiment Analysis at Global Level

This task consists of performing an automatic polarity classification to determine the global

N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE).

Participants are expected to submit (up to 3) experiments for the 6-label evaluation, and they are also allowed to submit (up to 3) specific experiments for the 4-label scenario.

Results must be submitted in a plain text file with the following format:

          tweetid \t polarity

where polarity can be:
 • P+, P, NEU, N, N+ and NONE for the 6-label case
 • P, NEU, N and NONE for the 4-label case.

The same test corpus of previous years was used for the evaluation in order to allow a comparison among the systems. Accuracy is one of the measures used to evaluate the systems; however, because the training corpus is not fully balanced, the systems were also assessed by macro-averaged precision, macro-averaged recall and macro-averaged F1-measure.

3.2 Task 2: Aspect-based sentiment analysis

A corpus with the entities and the aspects identified was provided to the participants, so the goal of the systems is the inference of the polarity at the aspect level. As in 2015, the STOMPOL corpus was used in this task. STOMPOL was divided into a training set and a test set, the first one for the development and validation of the systems, and the second for evaluation.

Participants are expected to submit up to 3 experiments for each corpus, each in a plain text file with the following format:

          tweetid \t aspect-entity \t polarity

Allowed polarity values are: P, N and NEU. For the evaluation, a single label combining “aspect-polarity” has been considered. As in the first task, accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-
                                                                       measure have been calculated for the global
polarity of each message in the test set of the
                                                                       result.
General Corpus. The training set of the corpus
was provided to the participants with the aim
they could train and validate their models with
                                                                             4     Participants and Results
it. There were two different evaluations: one                          This year 7 (7 last year) groups submitted their
based on 6 different polarity labels (P+, P, NEU,                      systems The list of active participant groups is


                                                                 16
                                             Overview of TASS 2016


shown in Table 3, including the tasks in which they have participated.
    Six of the seven participant groups sent a report describing their experiments and the results achieved. Papers were reviewed and included in the workshop proceedings. References are listed in Table 4.

            Group             Task 1   Task 2
            jacerong          X
            ELiRF-UPV         X        X
            LABDA             X
            INGEOTEC          X
            GASUCR            X
            GTI                        X
            SINAI_w2v         X
            Total             6        2

             Table 3: Participant groups

Group        Report
ELiRF        ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter
GTI          GTI at TASS 2016: Supervised Approach for Aspect Based Sentiment Analysis in Twitter
jacerong     JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level
LABDA        LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task
SINAI        Participación de SINAI en TASS 2016

             Table 4: Participant reports

5     Results
This section focuses on the description and analysis of the results and of the systems submitted by the participants.

5.1       Task 1: Sentiment Analysis at Global Level
Submitted runs and results for Task 1, with the evaluation based on 5 polarity levels over the whole General test Corpus, are shown in Table 5. Accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been used to evaluate each individual label and to rank the systems.

                 Run Id           M-F1
                 ELiRF-UPV_1      0.518
                 jacerong_2       0.504
                 jacerong_3       0.503
                 jacerong_1       0.499
                 ELiRF-UPV_2      0.496
                 INGEOTEC         0.464
                 LABDA_1          0.429
                 LABDA_2          0.429
                 LABDA_3          0.418
                 GASURC_3         0.254
                 GASURC_1         0.232
                 GASURC_2         0.227

          Table 5: Results for Task 1, 5 levels

    In order to perform a more in-depth evaluation, results are also calculated considering the classification in only 3 levels (POS, NEU, NEG) plus no sentiment (NONE), merging P and P+ into a single category and N and N+ into another. The results reached by the submitted systems are shown in Table 6.

                 Run Id           M-F1
                 jacerong_3       0.568
                 jacerong_2       0.567
                 jacerong_1       0.564
                 ELiRF-UPV_1      0.549
                 ELiRF-UPV_2      0.548
                 INGEOTEC         0.524
                 LABDA_3          0.511
                 LABDA_2          0.508
                 LABDA_1          0.508
                 SINAI_w2v_1      0.504
                 SINAI_w2v_3      0.486
                 SINAI_w2v_4      0.469
                 SINAI_w2v_2      0.440
                 GASURC_1         0.250
                 GASURC_2         0.152

          Table 6: Results for Task 1, 3 levels
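The measures used in this section can be illustrated with a short Python sketch. It reads runs in the `tweetid \t polarity` format, optionally merges P+ with P and N+ with N, and computes accuracy and the macro-averaged measures. This is not the official TASS scorer: the function names are hypothetical, tweets missing from a run are assumed to count as errors, and macro-F1 is assumed to be the mean of the per-label F1 values (a scorer could instead combine macro-precision and macro-recall).

```python
from collections import Counter

# Collapse the fine-grained labels: P+ merges into P and N+ into N.
COLLAPSE = {"P+": "P", "P": "P", "NEU": "NEU", "N": "N", "N+": "N", "NONE": "NONE"}


def parse_run(lines):
    """Parse 'tweetid<TAB>polarity' lines into a {tweetid: polarity} dict."""
    run = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet_id, polarity = line.split("\t")
        run[tweet_id] = polarity
    return run


def evaluate(gold, pred, collapse=False):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumptions: a tweet missing from `pred` is an error, and the macro
    average runs over the labels that occur in the gold standard.
    """
    if collapse:
        gold = {t: COLLAPSE[p] for t, p in gold.items()}
        pred = {t: COLLAPSE[p] for t, p in pred.items()}
    tp, fp, fn = Counter(), Counter(), Counter()
    for tweet_id, gold_label in gold.items():
        pred_label = pred.get(tweet_id)
        if pred_label == gold_label:
            tp[gold_label] += 1
        else:
            fn[gold_label] += 1
            if pred_label is not None:
                fp[pred_label] += 1
    labels = sorted(set(gold.values()))
    precision, recall, f1 = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
        "macro_f1": sum(f1) / n,
    }
```

Calling `evaluate(gold, pred)` corresponds to the fine-grained evaluation, and `evaluate(gold, pred, collapse=True)` to the evaluation with merged categories. Because the macro average weights every label equally, a system that ignores a rare label is penalised more than accuracy alone would show.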



5.2     Task 2: Aspect-based Sentiment Analysis
Submitted runs and results for Task 2, with the STOMPOL corpus, are shown in Table 7. Accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been used to evaluate each individual label and to rank the systems.

                 Run Id               M-F1
                 ELiRF-UPV_1          0.526
                 GTI                  0.463

              Table 7: Results for Task 2

5.3     Description of the systems
The systems submitted in the 2016 edition represent the next step from the ones submitted in the previous edition. The systems may be clustered into two groups: those that rely on the classification power of an ensemble of several base classifiers, and those that replace the traditional Bag-of-Words model with vectors of word embeddings in order to represent the meaning of each word. The main features of the submitted systems are described in the following paragraphs.
    Hurtado and Pla (2016) describe the participation of the team ELiRF-UPV in the two tasks of TASS 2016. The only difference between the systems submitted for the two tasks is that the one for the second task has a module for identifying the context of each of the entities and aspects annotated in the tweets. The polarity classification system relies on an ensemble of 192 configurations of an SVM classifier. To combine the set of classifiers, they evaluate the performance of one approach based on voting and another based on stacking.
    The system described in (Cerón-Guzmán, 2016) is also based on an ensemble of classifiers. In this case the base classifiers are logistic regression classifiers, and they are combined by voting.
    Alvarez et al. (2016) describe the participation of the team GTI in Task 2. The system is similar to that of the team ELiRF-UPV in the sense that it is composed of two layers: context identification and polarity classification. For the identification of the context, the authors design a heuristic method based on lexical markers. The polarity classification system is an SVM classifier that uses different types of features in order to represent the contexts of the entities and the aspects.
    Montejo-Ráez and Díaz-Galiano (2016) introduce a system based on a supervised learning algorithm over vectors resulting from a weighted vector. This vector is computed using a Word2Vec algorithm. This method, which is inspired by neural-network language modelling, was run on a collection of tweets written in Spanish and on the Spanish Wikipedia in order to generate a set of word embeddings that represent the words of the General Corpus of TASS as dense vectors. The collection of tweets written in Spanish was created following a distant supervision approach, under the assumption that tweets with happy and sad emoticons express emotions or opinions. Their experiments show that massive data from Twitter can lead to a slight improvement in classification accuracy.
    The system presented by the team LABDA (Quirós, Segura-Bedmar and Martínez, 2016) is similar to the one submitted by SINAI (Montejo-Ráez and Díaz-Galiano, 2016) because it also uses word embeddings as the scheme for representing the meaning of the words of the tweets. Quirós, Segura-Bedmar and Martínez (2016) assessed the performance of SVM and Logistic Regression as classifiers.
    Casasola Murillo and Marín Raventós (2016) submitted an unsupervised system based on the system described in Turney (2002), but with a specific adaptation to the classification of tweets written in Spanish.

5.4     Analysis
Table 5 and Table 6 show the results of each system, ranked by the F1-score reached, so it is not hard to identify the best system of the 2016 edition.
    On the other hand, how many tweets were correctly classified by the submitted systems? Is there a set of tweets that were not correctly classified by any system? What are the most difficult tweets to classify? These questions are answered in the following paragraphs.
    Table 8 shows the rate of tweets that are correctly classified by a given number of systems. There
are about 6% of tweets whose polarity is not correctly inferred by any of the submitted systems. In other words, the systems submitted in the 2016 edition are able to classify about 94% of the test set. So, what are the main features of that 6% of tweets whose polarity no system inferred?

Number of systems        Rate of tweets
0                        0.056
1                        0.065
2                        0.063
3                        0.067
4                        0.059
5                        0.061
6                        0.074
7                        0.078
8                        0.081
9                        0.112
10                       0.122
11                       0.082
12                       0.062
13                       0.011

  Table 8: Rate of tweets rightly classified (6 classes) by a number of systems

      Id: 171304000392663040

      Sacarle 17 puntos en la final de Copa al Barça CB en el Palau Sant Jordi es una pasada.

      Beating Barça CB by 17 points in the Copa final at the Palau Sant Jordi is amazing.

      Polarity: P+

 Figure 3: Tweet not rightly classified by any system

      Id: 177439342497767424

      hahahahahaha “@Absolutexe: ¿Le han cambiado ya el nombre a la Junta de Andalucía por la Banda de Andalucía o aún no?”

      hahahahahaha “@Absolutexe: Has the Junta de Andalucía been renamed the Gang of Andalucía yet, or not?”

      Polarity: N+

 Figure 4: Tweet not rightly classified by any system

      Id: 177439342497767424

      Rubalcaba pide a Rajoy que presente ya los Presupuestos y dice que no lo hace porque espera a las elecciones andaluzas

      Rubalcaba asks Rajoy to present the Budget now and says that he is not doing so because he is waiting for the Andalusian elections

      Polarity: NONE

 Figure 5: Tweet not rightly classified by any system

    Figures 3, 4 and 5 are three examples of tweets that were not correctly classified by any system. The common feature of the three tweets is that they do not contain any lexical marker that expresses emotion or opinion. Moreover, the tweet in Figure 4 is sarcastic, which poses an additional challenge for SA because it requires a deep understanding of the language.
    All the submitted systems are based on linear classifiers that do not take into account the context of each word, which is a major drawback for understanding the meaning of a span of text.
    The tweets in Figures 3, 4 and 5 show that opinions and emotions are not only expressed by lexical markers, so future participants should take into account the challenging tasks of implicit opinion analysis and of irony and sarcasm detection. These new problems may be framed at the semantic level of Natural Language Processing and should be tackled by the research community in order to go a step further in the understanding of the subjective information that is continuously published on the Internet.
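The breakdown reported in Table 8 can be reproduced from the gold standard and the submitted runs. The following Python sketch shows one way to do it, assuming each run and the gold standard are dictionaries from tweet id to polarity label; the function names are hypothetical and the sketch is not the script used by the organisers.

```python
from collections import Counter


def agreement_rates(gold, runs):
    """Map k to the fraction of gold tweets classified correctly by exactly k runs."""
    counts = Counter(
        sum(1 for run in runs if run.get(tweet_id) == label)
        for tweet_id, label in gold.items()
    )
    total = len(gold)
    return {k: counts[k] / total for k in range(len(runs) + 1)}


def never_correct(gold, runs):
    """Return the ids of tweets that no run classified correctly (the k = 0 row)."""
    return [
        tweet_id
        for tweet_id, label in gold.items()
        if all(run.get(tweet_id) != label for run in runs)
    ]
```

The list returned by `never_correct` is the set from which examples such as those in Figures 3, 4 and 5 would be drawn for manual inspection.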




6     Conclusions and Future Work
TASS was the first workshop about SA focused on the processing of texts written in Spanish. In the first three editions of TASS, the research community was mainly formed by Spanish researchers; however, since the last edition, the number of researchers from South America has been growing, which is evidence that the research community of Sentiment Analysis in Spanish is not only located in Spain but is formed by the Spanish-speaking countries.
    In any case, the developed corpus and gold standards, and the reports from participants, will surely be helpful for knowing the state of the art in SA in Spanish.
    Future work will mainly focus on the definition of a new General Corpus, for the following reasons:
1. The language used on Twitter changes faster than the language used in traditional genres of text, so the corpus must be updated in order to cover the real use of language on Twitter.
2. After several editions of the workshop, we realize that the quality of the annotation is not extremely good, so a new corpus with high-quality annotation must be defined in order to provide a real gold standard for Spanish SA on Twitter.
3. The research community knows the General Corpus of TASS deeply and wants a new challenge.
    A significant number of new tasks are currently being defined in Natural Language Processing, so some of them, such as stance classification, will be studied as proposals for the next edition of TASS.

Acknowledgements
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER) and the REDES project (TIN2015-65136-C2-1-R) from the Spanish Government.

References
Cambria, E. and A. Hussain. 2012. Sentic Computing. Techniques, Tools and Applications. Springer Briefs in Cognitive Computation, volume 2. Springer Netherlands. ISBN 978-94-007-5069-2. doi:10.1007/978-94-007-5070-8.
Casasola Murillo, E. and G. Marín Raventós. 2016. Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Cerón-Guzmán, J. A. 2016. JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Hurtado, Ll.-F. and F. Pla. 2016. ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Montejo-Ráez, A. and M. C. Díaz-Galiano. 2016. Participación de SINAI en TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Pang, B., L. Lee and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP ’02, pages 79–86. Association for Computational Linguistics, Stroudsburg, PA, USA. doi:10.3115/1118693.1118704.
Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135. ISSN 1554-0669. doi:10.1561/1500000011.
Quirós, A., I. Segura-Bedmar and P. Martínez. 2016. LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.



Turney, P. D. 2002. Thumbs up or thumbs
   down?: Semantic orientation applied to
   unsupervised classification of reviews. In
   Proceedings of the 40th Annual Meeting on
   Association for Computational Linguistics,
   ACL ’02, pp: 417–424. Association for
   Computational Linguistics, Stroudsburg,
   PA, USA. doi:10.3115/1073083.1073153.
Villena-Román, J., S. Lana-Serrano, E. Martínez-Cámara and J. C. González-Cristóbal. 2013. TASS - Workshop on Sentiment Analysis at SEPLN. Revista de Procesamiento del Lenguaje Natural, 50, pp. 37-44.
Villena-Román, J., J. García-Morera, S. Lana-Serrano and J. C. González-Cristóbal. 2014. TASS 2013 - A Second Step in Reputation Analysis in Spanish. Revista de Procesamiento del Lenguaje Natural, 52, pp. 37-44.


                     TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 23-28


      Evaluación de Modelos de Representación del Texto con
    Vectores de Dimensión Reducida para Análisis de Sentimiento∗
    Evaluation of Reduced Dimension Vector Text Representation Models for
                             Sentiment Analysis

              Edgar Casasola Murillo                               Gabriela Marín Raventós
              Universidad de Costa Rica                             Universidad de Costa Rica
                 San José, Costa Rica                                 San José, Costa Rica
               edgar.casasola@ucr.ac.cr                              gabriela.marin@ucr.ac.cr

       Abstract: This paper describes the sentiment analysis system developed by the GAS-UCR sentiment analysis group of the University of Costa Rica for Task 1 of the TASS 2016 workshop, together with preliminary evaluation results. The system is based on low-dimensional feature vectors for text representation. The proposed model is a simple one built on text normalization with identification of emphasis markers, the use of language models to represent the local and global characteristics of the text, and features such as emoticons and negation terms. Initial experiments show the improvements in precision obtained when identifying the polarity of whole texts as these features are progressively incorporated.
       Keywords: sentiment analysis, polarity-based text classification, short texts.


∗ This work was carried out thanks to the financial support of the University of Costa Rica and the Government of the Republic of Costa Rica through MICITT. We thank the assistants of the GAS-UCR research group for their work.

1     Introduction
The purpose of this work is to describe the system used by the sentiment analysis research group of the University of Costa Rica in its participation in the TASS 2016 workshop (García-Cumbreras et al., 2016). The group's work has focused on studying the factors that contribute to improvements in the precision obtained when classifying the polarity of tweets in Spanish. Our system rests on three basic elements: text normalization in the preprocessing stage, identifying the potential emphasis markers present in the text; the creation of reduced-dimension feature vectors to lessen the effect of data sparsity; and the exploration of the impact of using polarity dictionaries generated with different language representation models associated with both the local and the global context of the data. For this we are using our own adaptation of Turney's algorithm (Turney, 2002) on a corpus of 5 million tweets in Spanish. These models are stored as polarity dictionaries for later reuse. We are particularly interested in research in this field because, although an important gap between the amount of research and language technology developed for English and for Spanish was identified as early as 2013 (Cambria et al., 2013) (Melero et al., 2012), we must likewise keep in mind that solutions for Peninsular Spanish will not necessarily give the same results when applied to American varieties of Spanish. The resources and methods we use are therefore intended to contribute to research in Spanish and to support its later application in other Spanish-speaking contexts.

2   Background
Among the results obtained with systems based on machine learning approaches, the use of support vector machines (SVM) has given good results both in English (Kiritchenko, Zhu, and Mohammad, 2014) and (Batista and Ribeiro, 2013) and in Spanish, where 9 of the 14 systems for Spanish presented at TASS 2015 (Villena-Román et al., 2015) used this type of classifier. However, language dependence means that these classifiers depend on the feature vectors with which the text comments are represented. This feature extraction has been the focus of multiple works such as (Cabanlit and Junshean Espinosa, 2014), (Feldman, 2013), (Guo and Wan, 2012), (Sharma and Dey, 2012) and (Wang et al., 2011). In recent work on sentiment analysis in Spanish, such as that of (Martínez-Cámara et al., 2015), several polarity dictionaries are used and represented with a vector space model (VSM). The dictionary itself becomes a language model that serves as a resource for achieving efficient representations of the vectors used for classification.

seek the vector representation of words in continuous space, as is the case with the use of Word2Vect (Díaz-Galiano and Montejo-Ráez, 2015).

3     Description of the system
Our system rests on four elements that we consider important to mention. First we address the way in which we build our dictionary with the polarity of terms and the reasons for having built our own. We then turn to our preprocessing step and the identification of potential emphasis markers during this initial stage. In the following subsection we explain how we build low-dimensional vectors with information and make use of the dictionary. Finally, we mention how the feature vectors are intended to capture local aspects, with respect to the training data, and global aspects, derived from general language representation models.

3.1      Creation of the polarized dictionary
We decided to develop our own polarity dictionaries, instead of using existing ones, because we consider that, from the point of view of traditional natural language processing (Indurkhya and Damerau, 2010), each of these polarity dictionaries can be seen as a particular language model. For this reason we set out to develop and evaluate an adaptation of the traditional method for generating these linguistic resources from (Turney, 2002). The above decision was not due to the non-existence
   En los últimos años la representación vec-            de diccionarios polarizados ya que claramen-
torial basada en modelos de lenguaje como                   te en trabajos como (Martı́nez-Cámara et al.,
unigramas y bigramas se movió hacia repre-                 2015) se hace uso de varios de ellos, sino con
sentaciones de caracterı́sticas ya que la canti-            el fin de incorporar la etapa de creación de
dad de términos introduce un problema aso-                 diccionario dentro de la metodologı́a de tra-
ciado a su alta dispersión en el vector (Cam-              bajo para que posteriores investigaciones en
bria et al., 2013). Si los vectores contienen               otros paı́ses de habla hispana puedan replicar
un alto número de atributos diferentes, uno                el trabajo y disminuir la barrera inicial aso-
por término, los conjuntos de datos para en-               ciada a la falta de recursos lingüı́sticos pro-
trenamiento deben contener una mayor can-                   pios y el efecto del uso del diccionario pola-
tidad de textos anotados que atributos para                 rizado sobre la calidad de los resultados de
un buen entrenamiento de los clasificadores.                clasificación.
Es por esto que los modelos de representación                  El diccionario de polaridad creado utiliza
del lenguaje basados en unigramas, bigramas                 un corpus recolectado durante el año 2013,
o bien skipgramas requiren de una represen-                 con 5 millones de tweets en español. La va-
tación vectorial eficiente. Trabajos recientes             riante con respecto al algoritmo propuesto


by Turney (Turney, 2002) is as follows. To compute the semantic orientation of a term, as Turney defines it in his original article, we used groups of seed words instead of a single term. In addition, instead of querying web search engines to obtain the number of texts in which the analyzed words appear near the positive or negative words, we used a search engine implemented with the free software Solr (http://lucene.apache.org/solr/). The 5 million tweets were indexed with this engine, so the queries were executed locally. An advantage of this method is that the semantic orientation of a term can be computed on demand or stored in a dictionary. In our case we precomputed the polarity and stored it as a dictionary. So far the computations have been carried out only for individual terms.

Figure 1: Text normalization process
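The seed-group variant of semantic orientation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper does not give the exact formula for seed groups, so the code below assumes Turney-style log-ratio scoring, approximates "near" as co-occurrence within the same tweet, counts over an in-memory list instead of a Solr index, and uses a hypothetical smoothing constant to avoid division by zero. The function name, seed sets and toy tweets are illustrative only, and the mapping of scores to the [-1.0, 1.0] range used later in the paper is not shown because the source does not describe it.

```python
import math


def semantic_orientation(term, tweets, pos_seeds, neg_seeds, smoothing=0.01):
    """Turney-style semantic orientation of `term` using groups of seed words.

    tweets: iterable of token lists (stand-in for the indexed corpus).
    pos_seeds / neg_seeds: sets of positive / negative seed words.
    smoothing: assumed constant added to every count to avoid division by zero.
    """
    near_pos = near_neg = pos_hits = neg_hits = 0
    for tokens in tweets:
        present = set(tokens)
        has_pos = not present.isdisjoint(pos_seeds)
        has_neg = not present.isdisjoint(neg_seeds)
        pos_hits += has_pos          # tweets containing any positive seed
        neg_hits += has_neg          # tweets containing any negative seed
        if term in present:
            near_pos += has_pos      # term co-occurs with a positive seed
            near_neg += has_neg      # term co-occurs with a negative seed
    ratio = ((near_pos + smoothing) * (neg_hits + smoothing)) / (
        (near_neg + smoothing) * (pos_hits + smoothing)
    )
    return math.log2(ratio)


# Toy corpus, purely illustrative.
TWEETS = [
    ["servicio", "excelente"],
    ["buen", "servicio"],
    ["retraso", "pésimo"],
    ["mal", "retraso"],
]
POS_SEEDS = {"buen", "excelente"}
NEG_SEEDS = {"mal", "pésimo"}

# Precompute a small polarity dictionary, as the paper does for single terms.
dictionary = {
    t: semantic_orientation(t, TWEETS, POS_SEEDS, NEG_SEEDS)
    for t in ("servicio", "retraso")
}
print(dictionary)
```

A positive score means the term appears more often alongside positive seeds than negative ones; precomputing the scores into a dictionary mirrors the choice reported in the text.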

3.2 Text normalizer with emphasis markers
After analyzing the characteristics present in the text, we developed a text normalization system. In this preprocessing step, potential terms, punctuation marks and emoticons are segmented. The terms are then marked and converted. The process we follow removes the terms that are identified in the dictionary. This process is shown in Figure 1.

Letter repetitions, syllable repetitions and capitalization are identified and removed, but the affected terms are marked as potential emphasis indicators. Examples are: EXCELENTE, graciassss, buenisísimo. In this phase, the tweets that contain positive words with emphasis are identified for later use.

3.3 Low-dimensional vector representation
Two features represented in the vectors concern the presence and polarity of emoticons and the presence of negation particles. In addition, in the course of this research we observed that positive terms with emphasis markers are a potential indicator of the positive polarity of the texts that contain them, so this feature was also incorporated. The presence of emphasis markers such as character repetition, syllable repetition or capitalization on terms that appear as negative in some context is recorded as an important feature in the vector.

The generated vectors use the polarity of the terms to determine the position in the feature vector. Depending on the data model, the terms may be unigrams, bigrams or skip-grams. In the unigram case, for example, if a vector is built with the frequency of the terms according to their polarity, with polarity values from -1.0 to 1.0, the resulting vector would look like the one shown in Figure 2. That vector shows, for example, two terms whose dictionary polarity lies between -0.8 and -0.9, one term with polarity between 0.1 and 0.2, and another with polarity greater than 0.9. In our dictionary, polarity is represented by values ranging from the most negative to the most positive, with values between -1.0 and 0 for negative terms and between 0 and 1.0 for positive ones.

For the TASS 2016 workshop we initially wanted to evaluate vectors with the smallest possible dimension, so instead of 20-cell vectors we used only 5-cell vectors for each group of features, and instead of steps of 0.1 the range used is 0.5.

Figure 2: Feature vector

3.4 Local and global language representation models
Our proposal aims to represent in the feature vectors information obtained during training, as well as data representing information obtained from language models of Spanish in general. In our case, we initially used the dictionary generated from the collected corpus as the source of general information about Spanish. At training time, the polarity of the terms in each tweet is known for that data set. The global information is what has been computed beforehand and is stored in the form of dictionaries. What our proposal seeks is to represent in the vector the frequencies of the terms of each tweet distributed according to their polarity, while using different language representation models to carry out this computation. The dictionary used in these experiments was our unigram version. We intend to use representations with bigrams and a version of skip-grams that includes only the terms preceding the word to be represented. During training, the polarity obtained locally is stored, as are the frequencies taken from global polarity dictionaries. The vectors therefore have entries for the local polarity distributions and for the global polarity distributions. This is where we incorporate the different language models. Initially we worked with unigrams to obtain baseline results for later experiments. Subsequently, one dictionary is generated for bigrams and another for what we define as preceding skip-grams. For now, these variants were not submitted as experiments to TASS 2016; only the initial versions were.

4 Methodology
Using the dictionary, the normalizer and the vector representation model, we created representation vectors with different configurations. First, we built a version with vectors of dimension 20, distributing the polarity of the terms according to the polarity stored for unigrams in the local dictionary. The aim here is to evaluate only the use of the dictionary and of emphasis markers such as repetitions and capitalization. This first experiment is named GASUCR-01. The second experiment evaluated a somewhat more robust local model with bigrams, falling back to the unigram polarity in the dictionary if the bigram is not present during evaluation. In this case, lower-dimensional vectors with only five fields were created for the local data. This run was identified as experiment GASUCR-01-noEMO-noPartNeg. It is the base implementation for later evaluating the use of bigrams taken from the global context. This base version was also submitted to the 4-category task. For this, the P+ and P categories were merged into one, as were N+ and N. The third experiment added to the previous one the use of emoticons, the occurrence of positive terms with emphasis, and negation particles. In the results this version is identified as GASUCR-04. In this edition of TASS we did not have time to run the versions with global bigrams or skip-grams.
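The vector configurations described in this section can be sketched as a histogram of term polarities with one extra cell for emphasized positive terms. This is a hedged illustration, not the authors' code. The toy lexicon, function names and feature layout are hypothetical. Emphasis detection covers only capitalization and character elongation, not syllable repetition. The bins are assumed to be equal-width over [-1.0, 1.0], so the 5-cell variant uses steps of 0.4 here, whereas the paper reports a range of 0.5 without giving exact bin edges.

```python
import re

import numpy as np

# Hypothetical toy lexicon; the paper's dictionary is built from 5 million tweets.
LEXICON = {"pésimo": -0.85, "horrible": -0.82, "aceptable": 0.15, "excelente": 0.95}

# The same character three or more times in a row (keeps legitimate "ll", "rr").
_REPEAT = re.compile(r"(.)\1{2,}")


def normalize_token(token):
    """Return (normalized token, emphasis flag) for capitals or elongation."""
    emphasized = len(token) > 1 and token.isupper()
    collapsed = _REPEAT.sub(r"\1", token)
    emphasized = emphasized or collapsed != token
    return collapsed.lower(), emphasized


def polarity_vector(tokens, lexicon, n_bins=20):
    """Histogram of term polarities over [-1, 1] plus a count of
    emphasized positive terms as the last cell."""
    scores, pos_emphasis = [], 0
    for raw in tokens:
        term, emphasized = normalize_token(raw)
        if term in lexicon:
            scores.append(lexicon[term])
            if emphasized and lexicon[term] > 0:
                pos_emphasis += 1
    counts, _ = np.histogram(scores, bins=n_bins, range=(-1.0, 1.0))
    return np.append(counts, pos_emphasis)


tweet = ["pésimo", "horrible", "aceptable", "EXCELENTE"]
print(polarity_vector(tweet, LEXICON, n_bins=20))  # 20 polarity cells + 1
print(polarity_vector(tweet, LEXICON, n_bins=5))   # 5 polarity cells + 1
```

With the 20-cell setting, the toy tweet reproduces the situation described for Figure 2: two terms fall in the cell for polarities between -0.9 and -0.8, one in the cell between 0.1 and 0.2, and one in the cell above 0.9. A local and a global distribution could be concatenated in the same way, using one lexicon per language model.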


5 Results
The official results obtained for the runs described above are shown in Tables 1 and 2. In these tables, the column Ac. shows accuracy, P refers to macro precision, R to macro recall and F1 to macro F1. In the general TASS results, the group's results appear with the id listed under the group name GASUCR. In our case, experiment 01 gives us the baselines for the use of global unigrams with vectors of dimension 20 and local bigrams with dimension 5. It is important to note that local bigrams with dimension 5, together with the positive-emphasis, negation-particle and emoticon features, produce a slight increase in accuracy, from 0.326 to 0.410. Another aspect worth highlighting is the increase in accuracy when moving to the 3-category task.

Table 1: Results for Task 1 with 5 levels and full corpus
 id            Ac.    P      R      F1
 01            0.342  0.217  0.237  0.227
 01-noEmNeg    0.326  0.334  0.258  0.291
 04            0.410  0.268  0.242  0.254

Table 2: Results for Task 1 with 3 levels and full corpus
 id            Ac.    P      R      F1
 01-noEmNeg    0.373  0.212  0.303  0.250

These cases were selected in order to evaluate incrementally each aspect of our proposal. With each new feature we try to determine its impact on accuracy, precision and recall.

6 Conclusions and future work
The TASS evaluation framework is valuable for groups that are starting research on sentiment analysis in Spanish with the aim of extending it to other regions. In our case we were able to evaluate and compare the quality of the results of the first baselines of our work. We observed the first results with a system that uses a normalization method with identification of potential emphasis markers, a representation model based on low-dimensional vectors, and text representation models with local and global features. The work also uses features shared with other systems, such as emoticons and negation particles. As future work, we still have to evaluate, using 3 categories, the data that use local context with bigrams and additional features such as emoticons, positive words with emphasis, and negation particles. We expect the best results to come from incorporating the new language models we are computing for bigrams and preceding skip-grams and combining them with our low-dimensional vector representation method. We also want to study the effect of reducing the vector size, as well as techniques for extrapolating polarity in the models for terms that do not appear in the training data.

Bibliography
Batista, F. and R. Ribeiro. 2013. Sentiment analysis and topic classification based on binary maximum entropy classifiers. Procesamiento de Lenguaje Natural, 50:77–84.
Cabanlit, M. A. and K. Junshean Espinosa. 2014. Optimizing n-gram based text feature selection in sentiment analysis for commercial products in twitter through polarity lexicons. In Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on, pages 94–97. IEEE.
Cambria, E., B. Schuller, Y. Xia, and C. Havasi. 2013. New avenues in opinion mining and sentiment analysis. Intelligent Systems, IEEE, PP(99):1–1.
Díaz-Galiano, M. and A. Montejo-Ráez. 2015. Participación de sinai dw2vec en tass 2015. In Proceedings del Taller TASS 2015 en Análisis de Sentimiento de la XXXI Conferencia SEPLN 2015, pages 59–64.
Feldman, R. 2013. Techniques and applications for sentiment analysis. Commun. ACM, 56(4):82–89, April.
García-Cumbreras, M., J. Villena-Román, E. Martínez Cámara, M. C. Díaz-Galiano, M. T. Martín Valdivia, and L. A. Ureña López. 2016. Overview of


tass 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.
Guo, L. and X. Wan. 2012. Exploiting syntactic and semantic relationships between terms for opinion retrieval. Journal of the american society for information science and technology, 63(11):2269–2282, November.
Indurkhya, N. and F. J. Damerau. 2010. Handbook of natural language processing, volume 2. CRC Press.
Kiritchenko, S., X. Zhu, and S. M. Mohammad. 2014. Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, pages 723–762.
Martínez-Cámara, E., M. Á. García-Cumbreras, M. T. Martín-Valdivia, and L. A. Ureña-López. 2015. Sinai-emma: Vectores de palabras para el análisis de opiniones en twitter. In Proceedings del Taller TASS 2015 en Análisis de Sentimiento de la XXXI Conferencia SEPLN 2015, pages 41–46.
Melero, M., A.-B. Cardús, A. Moreno, G. Rehm, K. de Smedt, and H. Uszkoreit. 2012. The Spanish language in the digital age. Springer.
Sharma, A. and S. Dey. 2012. A comparative study of feature selection and machine learning techniques for sentiment analysis. In Proceedings of the 2012 ACM Research in Applied Computation Symposium, pages 1–7. ACM.
Turney, P. D. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 417–424. Association for Computational Linguistics.
Villena-Román, J., J. García Morera, M. Á. García-Cumbreras, E. M. Cámara, M. T. M. Valdivia, and L. A. U. López. 2015. Overview of tass 2015. In Proceedings del Taller TASS 2015 en Análisis de Sentimiento de la XXXI Conferencia SEPLN 2015, pages 13–21.
Wang, X., F. Wei, X. Liu, M. Zhou, and M. Zhang. 2011. Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 1031–1040. ACM.
                     TASS 2016: Workshop on Sentiment Analysis at SEPLN, septiembre 2016, pág. 29-33


        LABDA at the 2016 TASS challenge task: using word
           embeddings for the sentiment analysis task∗
LABDA en la competición TASS 2016: utilizando vectores de palabras para
                 la tarea de análisis de sentimiento

             Antonio Quirós1,2 , Isabel Segura-Bedmar1 , and Paloma Martı́nez1
              1
                  Departamento de Informática, Universidad Carlos III de Madrid
                   Avd. de la Universidad, 30, 28911, Leganés, Madrid, España
                     100342879@alumnos.uc3m.es, isegura,pmf@inf.uc3m.es
                                          2
                                   Sngular Data&Analytics
                   Av. LLano Castellano 13, Planta 5, 28034 Madrid, España
                                antonio.quiros@sngular.team

      Resumen: Este artı́culo describe la participación del grupo LABDA en la tarea
      1 (Sentiment Analysis at global level) de la competición TASS 2016. En nuestro
      enfoque, los tweets son representados por medio de vectores de palabras y son cla-
      sificados utilizando algoritmos como SVM y regresión logı́stica.
      Palabras clave: Análisis de Sentimiento, Vectores de palabras
      Abstract: This paper describes the participation of the LABDA group in Task
      1 (Sentiment Analysis at global level). Our approach exploits word embedding
      representations for tweets and machine learning algorithms such as SVM and
      logistic regression.
      Keywords: Sentiment Analysis, Word embeddings

∗ This work was supported by the eGovernAbility-Access project (TIN2014-52665-C2-2-R).

1 Introduction
Knowing the opinion of customers or users has become a priority for companies and organizations that want to improve the quality of their services and products. The ongoing explosion of social media affords a significant opportunity to poll the opinion of many Internet users by processing their comments. However, sentiment analysis, which can be defined as the automatic analysis of opinion in texts (Pang and Lee, 2008), is a challenging task because different people often assign different polarities to the same text. On Twitter the task is even more difficult, because the texts are short (only 140 characters) and are characterized by informal language, many grammatical errors and spelling mistakes, slang and vulgar vocabulary, and abbreviations.

Since their introduction in 2013, the main goal of the TASS shared task editions has been to promote the development of methods and resources for sentiment analysis of tweets in Spanish. This paper describes the participation of the LABDA group in Task 1 (Sentiment Analysis at global level). In this task, the participating systems have to determine the global polarity of each tweet in the test dataset. There are two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE). A detailed description of the task can be found in the overview paper of TASS 2016 (García-Cumbreras et al., 2016). Our approach exploits word embedding representations for tweets and machine learning algorithms such as SVM and logistic regression. The word embedding model can yield a significant dimensionality reduction compared to the classical Bag-of-Words (BoW) model. This dimensionality reduction can have several positive effects on our algorithms, such as faster training, less overfitting and better performance.

The paper is organized as follows. Section 2 describes our approach. The experimental results are presented and discussed in Section


3. We conclude in Section 4 with a summary of our findings and some directions for future work.

2 System
In this paper, we study the use of word embeddings (also known as word vectors) to represent tweets and then examine several machine learning algorithms to classify them. Word embeddings have shown promising results in NLP tasks such as named entity recognition (Segura-Bedmar, Suárez-Paniagua, and Martínez, 2015), relation extraction (Alam et al., 2016), sentiment analysis (Socher et al., 2013b) and parsing (Socher et al., 2013a). A word embedding is a function that maps words to low-dimensional vectors, which are learned from a large collection of texts. At present, neural networks are one of the most widely used learning techniques for generating word embeddings (Mikolov and Dean, 2013). The essential assumption of this model is that semantically close words will have similar vectors (in terms of cosine similarity). Word embeddings can help to capture semantic and syntactic relationships between the corresponding words.

While the well-known Bag-of-Words (BoW) model involves a very large number of features (as many as the number of non-stopword words with at least a minimum number of occurrences in the training data), the word embedding representation allows a significant reduction in the feature set size (in our case, from millions to just 300). Dimensionality reduction is a desirable goal because it helps to avoid overfitting and reduces training and classification times without any loss of performance.

As a preprocessing step, tweets must be cleaned. First, we remove all links and URLs. We then remove usernames, which are easily recognized because their first character is the symbol @. We then transform hashtags into words by removing their first character (that is, the symbol #). Using regular expressions, the emoticons are detected and classified in order to count the number of positive and negative emoticons in each tweet, and they are then removed from the text. Table 1 shows the list of positive and negative emoticons, which were taken from the Wikipedia page https://en.wikipedia.org/wiki/List_of_emoticons. We convert the tweets to lowercase and replace misspelled accented letters with the correct ones (for instance, “à” with “á”). We also treat elongations (that is, the repetition of a character) by removing the repetition of a character after its second occurrence (for example, “hoooolaaaa” would be translated to “hola”). We then decided to take laughs into account (for instance, “jajaja”), which turned out to be challenging because of the diverse ways in which they are expressed (e.g. expressions like “jajajaja” or “jejeje” and even misspelled ones like “jajjajaaj”). We addressed this by using regular expressions to standardize the different forms (e.g. “jajjjaaj” to “jajaja”) and then replacing them with the word “risas”. Finally, we remove all non-letter characters and all stopwords present in the tweets (stopword list: http://snowball.tartarus.org/algorithms/spanish/stop.txt).

Orientation   Emoticons
Positive      :-), :), :D, :o), :], D:3, :c), :>, =], 8), =), :}, :^), :-D, 8-D, 8D, x-D, xD, X-D, XD, =-D, =D, =-3, =3, B^D, :'), :'), :*, :-*, :^*, ;-), ;), *-), *), ;-], ;], ;D, ;^), >:P, :-P, :P, X-P, x-p, xp, XP, :-p, :p, =p, :-b, :b
Negative      >:[, :-(, :(, :-c, :-<, :<, :-[, :[, :{, ;(, :-||, >:(, :'-(, :'(, D:<, D=, v.v

Table 1: List of positive and negative emoticons

Once the tweets are preprocessed, they are tokenized using the NLTK toolkit (a Python package for NLP); we also experimented with lemmatizing each tweet using the MeaningCloud text analytics software (https://www.meaningcloud.com/) in order to compare both approaches. Then, for each token, we look up its vector in the word embedding model. We use a pretrained model (Cardellino, 2016), which was generated by applying the word2vec algorithm (Mikolov and Dean, 2013) to a collection of Spanish texts with approximately 1.5 billion words. The dimension of the word embeddings is 300. It


should be noted that these texts were taken from different resources such as
Spanish Wikipedia, WikiSource and Wikibooks, but none of them contains tweets.
Therefore, it is possible that the main characteristics of social media texts
(such as an informal and noisy style, many grammatical errors and spelling
mistakes, slang and vulgar vocabulary, abbreviations, etc.) are not correctly
represented in this model. One of the main problems is that a significant number
of words (almost 13% of the vocabulary, representing 6% of word occurrences) are
not found in the model. We reviewed a small sample of these words, which showed
that most of them were hashtags.
    In our approach, a tweet of n tokens (T = w_1, w_2, ..., w_n) is represented
as the centroid of the word vectors \vec{w}_i of its tokens, as shown in the
following equation:

    \vec{T} = \frac{1}{n} \sum_{i=1}^{n} \vec{w}_i
            = \frac{\sum_{j=1}^{N} \vec{w}_j \cdot TF(w_j, t)}
                   {\sum_{j=1}^{N} TF(w_j, t)}                               (1)

   where N is the vocabulary size, that is, the total number of distinct words,
while TF(w_j, t) refers to the number of occurrences of the j-th vocabulary word
in the tweet T.
   We also explore the effect of including the inverse document frequencies
(IDF) to represent tweets (see Equation 2). This helps to increase the weight of
words that occur often, but only in a few documents, while it reduces the
relevance of words that occur very frequently in a larger number of texts.

    \vec{T} = \frac{1}{n} \sum_{i=1}^{n} \vec{w}_i
            = \frac{\sum_{j=1}^{N} \vec{w}_j \cdot TF(w_j, t) \cdot IDF(w_j)}
                   {\sum_{j=1}^{N} TF(w_j, t) \cdot IDF(w_j)}                (2)

   having IDF(w_j) = \frac{\log |D|}{|\{tw \in D : w_j \in tw\}|}, where |D|
refers to the number of tweets.
   In addition to using the centroid, we assess the impact of complementing the
tweet model with the following additional features:
     posWords: number of positive words present in the tweet.
     negWords: number of negative words present in the tweet.
     posEmo: number of positive emoticons present in the tweet.
     negEmo: number of negative emoticons present in the tweet.

    For the posWords and negWords features we used the iSOL lexicon
(Molina-González et al., 2013), a list composed of 2,509 positive words and
5,626 negative words. As described before, for the emoticons we used those
listed in Table 1, but we also added to the positive ones the number of laughs
detected, as well as the number of recommendations present in the form of a
“Follow Friday” hashtag (#FF), due to its ease of detection and its positive
bias.
    Classification is performed using scikit-learn, a Python module for machine
learning. This package provides many algorithms such as Random Forest, Support
Vector Machine (SVM) and so on. One of its main advantages is that it is
supported by extensive documentation. Moreover, it is robust, fast and easy to
use.
    As stated before, we have two main training models: averaged centroids, and
averaged centroids including the inverse document frequency, for both the
lemmatized and the non-lemmatized texts. We performed experiments using three
different classifiers: Random Forests, Support Vector Machines and Logistic
Regression, because these classifiers have often achieved the best results for
text classification and sentiment analysis.
    We also evaluated the impact of applying a set of emoticon rules as a
pre-classification stage, similar to (Chikersal et al., 2015), in which we
determine a first-stage polarity for each tweet as follows:
      If posEmo is greater than zero and negEmo is equal to zero, the tweet is
      marked as “P”.
      If negEmo is greater than zero and posEmo is equal to zero, the tweet is
      marked as “N”.
      If both posEmo and negEmo are greater than zero, the tweet is marked as
      “NEU”.
      If both posEmo and negEmo are equal to zero, the tweet is marked as
      “NONE”.
    Then, after the classification takes place, we ran three tests: i) applying
no rule; ii) honoring the polarity defined by the rule, which means that we keep
the predefined polarity


if the tweet was marked as “P” or “N”, and otherwise we take the value estimated
by the classifier; and iii) a mixed approach where we give each polarity a value
(N+: -2; N: -1; NEU, NONE: 0; P: 1; P+: 2) and perform an arithmetic sum of the
predefined and the estimated polarity if and only if they are not equal. For
instance, if the classifier marked a tweet as “N” and the rules marked it as
“P”, the tweet will be classified as “NEU”.

3       Results
In order to choose the best-performing classifiers, we use 10-fold
cross-validation because there is no development dataset and this strategy has
become the standard method in practical terms. Our experiments showed that,
although the results were similar (see footnote 3), the best settings for the
5-levels task are:

        RUN-1: Support Vector Machine, over the averaged centroids without
        applying any rules for pre-defining polarities.
        RUN-2: Support Vector Machine, over the averaged centroids and applying
        the mixed rules approach.
        RUN-3: Logistic Regression, over the centroids with inverse document
        frequency and applying the mixed rules approach.

    and for the 3-levels task are:

        RUN-1: Support Vector Machine, over the averaged centroids and applying
        the mixed rules approach.
        RUN-2: Logistic Regression, over the centroids with inverse document
        frequency and applying the mixed rules approach.
        RUN-3: Logistic Regression, over the averaged centroids and applying the
        mixed rules approach.

   Tables 2 and 3 show the results for these settings provided by the TASS
submission system. For each run, accuracy is provided as well as the
macro-averaged precision, recall and F1-measure. As expected, the results for
3 levels are higher than for 5 levels because the training dataset is larger.

        Run       Precision   Recall   F1      Accuracy
        RUN-1     0.411       0.449    0.429   0.527
        RUN-2     0.412       0.448    0.429   0.527
        RUN-3     0.402       0.436    0.418   0.549

Table 2: Results for Sentiment Analysis at global level (5 levels, Full test
corpus)

        Run       Precision   Recall   F1      Accuracy
        RUN-1     0.506       0.510    0.508   0.652
        RUN-2     0.508       0.508    0.508   0.652
        RUN-3     0.512       0.511    0.511   0.653

Table 3: Results for Sentiment Analysis at global level (3 levels, Full test
corpus)

   With the settings mentioned above, the obtained results are extremely
similar, but we can state that, in terms of accuracy, Logistic Regression
reports the best results. Although it is not measured in this work, it is worth
mentioning that Logistic Regression was also noticeably faster.

    3 Experiments showed that non-lemmatized text performed better in all
    settings; hence the best settings reported here use the non-lemmatized
    model.

4     Conclusions and future work
This paper explores the use of word embeddings for the task of sentiment
analysis. Instead of using the bag-of-words model to represent tweets, these are
represented as word vectors taken from a pre-trained model of word embeddings.
An important advantage of the word embedding model compared to the bag-of-words
representation is that it achieves a significant dimensionality reduction of the
feature set needed to represent tweets and leads, therefore, to a reduction of
the training and testing time of the algorithms.
    In order to use word embedding models properly, a preprocessing stage had to
be completed before training a classifier. Due to the unstructured nature of the
tweets, this preprocessing proved to be a very important step in order to
standardize the input data to some degree. The experimentation showed that the
three tested classifiers obtained very similar results, with Random Forest
having slightly worse performance and Logistic Regression being slightly better
and much faster.
    One of the main drawbacks of our approach is that many words do not have a
word vector in the word embedding model used for our experiments. An analysis
showed that many


of these words come from hashtags, which are usually short phrases. Therefore,
we should apply a more sophisticated method in order to extract the words that
form a hashtag.
    As future work, we also plan to use a word embedding model trained on a
collection of texts from Spanish social media. We think that this will have a
positive effect on the performance of our system in identifying the polarity of
tweets, because this model will be generated from documents characterized by the
main features that describe social media texts (for example, informal style
language, plenty of grammatical errors and spelling mistakes, slang and vulgar
vocabulary).

Acknowledgments
This work was supported by the eGovernAbility-Access project
(TIN2014-52665-C2-2-R).

References
Alam, F., A. Corazza, A. Lavelli, and R. Zanoli. 2016. A knowledge-poor approach
   to chemical-disease relation extraction. Database, 2016:baw071.
Cardellino, C. 2016. Spanish Billion Words Corpus and Embeddings, March.
Chikersal, P., S. Poria, E. Cambria, A. Gelbukh, and C. E. Siong. 2015.
   Modelling public sentiment in Twitter: using linguistic patterns to enhance
   supervised learning. In International Conference on Intelligent Text
   Processing and Computational Linguistics, pages 49–65. Springer.
García-Cumbreras, M. A., J. Villena-Román, E. Martínez-Cámara, M. C.
   Díaz-Galiano, M. T. Martín-Valdivia, and L. A. Ureña López. 2016. Overview
   of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at
   SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca,
   Spain, September.
Mikolov, T. and J. Dean. 2013. Distributed representations of words and phrases
   and their compositionality. Advances in neural information processing
   systems.
Molina-González, M. D., E. Martínez-Cámara, M.-T. Martín-Valdivia, and J. M.
   Perea-Ortega. 2013. Semantic orientation for polarity classification in
   Spanish reviews. Expert Systems with Applications, 40(18):7250–7257.
Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations
   and trends in information retrieval, 2(1-2):1–135.
Segura-Bedmar, I., V. Suárez-Paniagua, and P. Martínez. 2015. Exploring word
   embedding for drug name recognition. In Sixth International Workshop on
   Health Text Mining and Information Analysis (LOUHI), page 64.
Socher, R., J. Bauer, C. D. Manning, and A. Y. Ng. 2013a. Parsing with
   compositional vector grammars. In ACL (1), pages 455–465.
Socher, R., A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and
   C. Potts. 2013b. Recursive deep models for semantic compositionality over a
   sentiment treebank. In Proceedings of the conference on empirical methods in
   natural language processing (EMNLP), pages 1631–1642. Citeseer.
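
Illustrative sketch. The tweet representation of Equations 1 and 2 (the
centroid of the word vectors, optionally weighted by IDF) can be sketched in a
few lines of Python. This is only a minimal illustration under stated
assumptions, not the code used for the submitted runs: the function name
tweet_centroid, the toy two-dimensional vectors and the toy IDF values are all
hypothetical, and skipping tokens that have no vector in the model is an
assumption, since the paper does not say how such tokens are treated.

```python
from collections import Counter

import numpy as np


def tweet_centroid(tokens, vectors, idf=None):
    """Weighted centroid of the word vectors of a tweet.

    tokens  : list of str, the preprocessed tweet
    vectors : dict word -> 1-D array (stand-in for the embedding model)
    idf     : optional dict word -> IDF weight; None gives the plain
              TF-weighted centroid of Equation 1
    """
    # Term frequencies, restricted to words that have a vector (assumption).
    counts = Counter(tok for tok in tokens if tok in vectors)
    numerator = None
    denominator = 0.0
    for word, tf in counts.items():
        weight = float(tf) if idf is None else tf * idf.get(word, 0.0)
        term = weight * np.asarray(vectors[word], dtype=float)
        numerator = term if numerator is None else numerator + term
        denominator += weight
    if numerator is None or denominator == 0.0:
        return None  # no usable token: the caller decides what to do
    return numerator / denominator


# Toy data (hypothetical values, for illustration only).
toy_vectors = {"bueno": np.array([1.0, 0.0]), "malo": np.array([0.0, 1.0])}
toy_tweet = ["bueno", "bueno", "malo", "#ff"]  # "#ff" has no vector
toy_idf = {"bueno": 1.0, "malo": 2.0}

plain = tweet_centroid(toy_tweet, toy_vectors)              # Equation 1
weighted = tweet_centroid(toy_tweet, toy_vectors, toy_idf)  # Equation 2
```

In this toy case the plain centroid leans towards "bueno", which occurs twice,
while the IDF weights give the rarer "malo" the same total weight. The extra
features (posWords, negWords, posEmo, negEmo) would simply be appended to the
resulting vector before classification.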
                   TASS 2016: Workshop on Sentiment Analysis at SEPLN, septiembre 2016, pág. 35-39


      JACERONG at TASS 2016: An Ensemble Classifier for
       Sentiment Analysis of Spanish Tweets at Global Level
    JACERONG en TASS 2016: Combinación de clasificadores para el
       análisis de sentimientos de tuits en español a nivel global

                                Jhon Adrián Cerón-Guzmán
                          Santiago de Cali, Valle del Cauca, Colombia
                                   jadrian.ceron@gmail.com

      Resumen: Este artı́culo describe un enfoque basado en conjuntos de clasificadores
      que se ha desarrollado para participar en la Tarea 1 del taller TASS sobre análisis de
      sentimientos de tuits en español a nivel global. Los conjuntos se construyen sobre
      la combinación de sistemas con la correlación absoluta más baja entre sı́. Estos
      sistemas son capaces de tratar con formas léxicas no estándar en los tweets, con el fin
      de mejorar la calidad del análisis de lenguaje natural. Para realizar la clasificación
      de polaridad, el enfoque utiliza caracterı́sticas básicas que han probado su poder
      discriminativo, ası́ como caracterı́sticas de n-gramas de palabras y caracteres. Luego,
      las salidas de clasificadores de Regresión logı́stica, que pueden ser etiquetas de clase o
      probabilidades para cada clase, se utilizan para construir conjuntos de clasificadores.
      Los resultados experimentales muestran que la combinación menos correlacionada
      de 25 sistemas, la cual elige la clase con la probabilidad promedio no ponderada más
      alta, es la configuración que mejor se adapta a la tarea, alcanzando una precisión
      global de 62.0% en la evaluación de seis etiquetas, y de 70.5% en la evaluación de
      cuatro etiquetas.
      Palabras clave: Análisis de sentimientos, clasificación de polaridad, combinación
      de clasificadores, normalización léxica, tuits en español, Twitter
      Abstract: This paper describes an ensemble-based approach developed to partic-
      ipate in TASS-2016 Task 1 on sentiment analysis of Spanish tweets at global level.
      Ensembles are built on the combination of systems with the lowest absolute correla-
      tion with each other. The systems are able to deal with non-standard lexical forms
      in tweets, in order to improve the quality of natural language analysis. To support
      the polarity classification, the approach uses basic features that have proved their
      discriminative power, as well as word and character n-gram features. Then, outputs
      from Logistic Regression classifiers, which may be either class labels or probabilities
      for each class, are used to build ensembles. Experimental results show that the
      less-correlated combination of 25 systems, which chooses the class with the highest
      unweighted average probability, is the setting that best suits the task, achieving
      an overall accuracy of 62.0% in the six-labels evaluation, and of 70.5% in the four-
      labels evaluation.
      Keywords: Ensemble classifier, lexical normalization, polarity classification, senti-
      ment analysis, Spanish tweets, Twitter
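
      Illustrative sketch. The combination rule named in the abstract (choosing
      the class with the highest unweighted average probability across the
      ensemble members) can be sketched as follows. This is a minimal example
      under stated assumptions, not the submitted system: the function name
      average_probability_vote and the probability values are hypothetical, and
      each member is assumed to output one probability per class in the label
      order given by LABELS.

```python
import numpy as np

# Label order assumed for the columns of every probability matrix.
LABELS = ["P+", "P", "NEU", "N", "N+", "NONE"]


def average_probability_vote(system_probs, labels=LABELS):
    """Pick, for each tweet, the class with the highest mean probability.

    system_probs : list with one (n_tweets, n_classes) array-like per system
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in system_probs])
    mean_probs = stacked.mean(axis=0)  # unweighted average over the systems
    return [labels[i] for i in mean_probs.argmax(axis=1)]


# Three hypothetical systems scoring the same two tweets.
system_a = [[0.10, 0.50, 0.10, 0.10, 0.10, 0.10],
            [0.05, 0.05, 0.10, 0.60, 0.10, 0.10]]
system_b = [[0.45, 0.35, 0.05, 0.05, 0.05, 0.05],
            [0.05, 0.05, 0.10, 0.30, 0.40, 0.10]]
system_c = [[0.10, 0.40, 0.30, 0.10, 0.05, 0.05],
            [0.05, 0.05, 0.20, 0.40, 0.20, 0.10]]

predicted = average_probability_vote([system_a, system_b, system_c])
```

      Note that system_b alone would label the first tweet as P+ and the second
      as N+, but the averaged probabilities favor P and N.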

1   Introduction
What people say on social media about issues of their everyday life, society,
and the world in general has turned into a rich source of information for
understanding social behavior. Twitter content, in particular, has caught the
attention of researchers who have investigated its potential for conducting
studies on human subjectivity at large scale, which was not feasible using
traditional methods. Around election time, sentiment analysis of political
tweets has been widely used to capture trends in public opinion regarding
important issues such as voting intention (Gayo-Avello, 2013). However,
analyzing this content also presents several challenges, including the
development of text analysis approaches based on Natural Language Processing
techniques, which properly adapt to the informal genre and the free writing
style of Twitter (Han and Baldwin, 2011; Cerón-Guzmán and León-Guzmán, 2016).
    TASS is a workshop aimed at fostering research on sentiment analysis of
Spanish Twitter data, which provides a benchmark evaluation to compare the
latest advances in the field (García-Cumbreras et al., 2016). One of the
proposed tasks is to determine the opinion orientation expressed in tweets at
global level. Task 1 consists of assigning one of six labels (P+, P, NEU, N, N+,
NONE) to a tweet in the six-labels evaluation, or one of four labels (P, NEU, N,
NONE) in the four-labels evaluation. Here, P, N, and NEU stand for positive,
negative, and neutral, respectively; NONE, instead, means no sentiment. The “+”
symbol is used as an intensifier.
    This paper presents an ensemble-based approach to polarity classification of
Spanish tweets, developed to participate in Task 1 proposed by the organizing
committee of the TASS workshop. The ensemble members are (relatively) highly
correct classifiers with the lowest absolute correlation with each other. The
output from each classifier, which may be either a class label or probabilities
for each class, is used to assign the polarity to a tweet based on a majority
rule or on the highest unweighted average probability. Moreover, classifiers are
adapted to deal with non-standard lexical forms in tweets, in order to improve
the quality of natural language analysis.
    The remainder of this paper is organized as follows. Section 2 describes the
common architecture of the ensemble members (i.e., classifiers). Next, the
submitted experiments, as well as the obtained results, are discussed in
Section 3. Finally, Section 4 concludes the paper.

2   The System Architecture

The tweet text is passed through the pipeline of each system in order to assign
it a class label or a probability of belonging to a certain class. The pipeline,
which goes from text preprocessing to machine learning classification, is
described below. Note that the term system is preferred over the term
classifier, because a machine learning classifier receives a feature vector and
produces a class label or probabilities for each class; the term system,
instead, covers the whole process, from preprocessing to machine learning
classification.

2.1     Preprocessing
The process of text cleaning and normalization is performed in two phases: basic
preprocessing and advanced preprocessing.

2.1.1 Basic Preprocessing
The following simple rules are implemented as regular expressions:

   • URLs and email addresses are removed.
   • HTML entities are mapped to textual representations (e.g., “&lt;” → “<”).
   • Specific Twitter terms such as mentions (@user) and hashtags (#topic) are
     replaced by placeholders.
   • Unknown characters are mapped to their closest ASCII variant, using the
     Python Unidecode module for the mapping.
   • Consecutive repetitions of the same character are reduced to one
     occurrence.
   • Emoticons are recognized and then classified into positive and negative,
     according to the sentiment they convey (e.g., “:)” → “EMO_POS”, “:(” →
     “EMO_NEG”).
   • Punctuation marks are unified (Vilares, Alonso, and Gómez-Rodríguez, 2014).

2.1.2 Advanced Preprocessing
Once the set of simple rules has been applied, the tweet text is tokenized and
morphologically analyzed by FreeLing (Padró and Stanilovsky, 2012). In this way,
each resulting token is assigned its lemma and Part-of-Speech (POS) tag. Taking
these data as input, the following advanced preprocessing is applied:

   • Lexical normalization. Each token is passed through a set of basic
     modules of FreeLing (e.g., dictionary lookup, suffixes check, detection of
     numbers and dates, and named entity recognition) for identifying standard
     word forms and other valid constructions. If a token is not recognized by
     any of the modules, it is marked as an out-of-vocabulary (OOV) word. Then,
     a confusion set is formed by normalization candidates which are identical
     or similar to the graphemes or phonemes that make the
     OOV word. These candidates are elements of the union of a dictionary of
     Spanish standard word forms and a gazetteer of proper nouns. The best
     normalization candidate for the OOV word is the one that best fits a
     statistical language model. The language model was estimated from the
     Spanish Wikipedia corpus. Lastly, the selected candidate is capitalized
     according to the capitalization rules of the Spanish language. Extensive
     research on lexical normalization of Spanish tweets can be read in
     (Cerón-Guzmán and León-Guzmán, 2016).

   • Negation handling. Inspired by the approach proposed by Pang et al.
     (Pang, Lee, and Vaithyanathan, 2002), this research defined a negated
     context as a segment of the tweet that starts with a (Spanish) negation
     word and ends with a punctuation mark (i.e., “!”, “,”, “:”, “?”, “.”, “;”),
     but only the first n ∈ [0, 3] tokens, or all tokens, labeled with any POS
     tag or with a specific one (i.e., verb, adjective, adverb, or common noun)
     are affected, by appending the “_NEG” suffix to them. Note that when n = 0,
     no token is affected.

2.2    Feature Extraction
In this stage, the normalized tweet text is transformed into a feature vector
that feeds the machine learning classifier. The features are grouped into basic
features and n-gram features.

2.2.1 Basic Features
Some of these features are computed before the process of text cleaning and
normalization is performed.

   • The number of words completely in uppercase.
   • The number of words with more than two consecutive repetitions of the same
     character.
   • The number of consecutive repetitions of exclamation marks, question marks,
     and both punctuation marks (e.g., “!!”, “??”, “?!”) and whether the text
     ends with an exclamation or question mark.
   • The number of occurrences of each class of emoticons (i.e., positive and
     negative) and whether the last token of the tweet is an emoticon.
   • The number of positive and negative words, relative to the ElhPolar lexicon
     (Saralegi and Vicente, 2013), the AFINN lexicon (Nielsen, 2011), or a union
     of both lexicons. In a negated context, the label of a polarity word is
     inverted (i.e., positive words become negative words, and vice versa).
     Additionally, a third feature labels the tweet with the class whose number
     of polarity words in the text is the highest.
   • The number of negated contexts.
   • The number of occurrences of each Part-of-Speech tag.

2.2.2 N-gram Features
The fixed-length set of basic features is always extracted from tweets. However,
tweet texts vary from one another in terms of length, number of tokens, and
vocabulary used. For that reason, a process that transforms textual data into
numerical feature vectors of fixed length is required. This process, known as
vectorization, is performed by applying the tf-idf weighting scheme (Manning,
Raghavan, and Schütze, 2008). Thus, each document (i.e., a tweet text) is
represented as a vector d = {t_1, ..., t_n} ∈ R^V, where V is the size of the
vocabulary that was built by considering word n-grams with n ∈ [1, 4], or
character n-grams with n ∈ [3, 5], in the collection (i.e., the training set).
The vector is, hence, formed by word n-grams, character n-grams, or a
concatenation of word and character n-grams.

2.3      Machine Learning Classification
At the last stage, the sentiment analysis system classifies a given tweet as
either P+, P, NEU, N, N+, or NONE, or assigns probabilities for each class.
After receiving the feature vector as input, an L2-regularized Logistic
Regression classifier assigns a class label to the tweet or a probability of
belonging to a certain class. The classifier was trained on the training set,
using the Scikit-learn (Pedregosa et al., 2011) implementation of the Logistic
Regression algorithm.

3     Experiments
1,720 different sentiment analysis systems were trained on the training set via
5-fold cross validation, in order to find the best parameter settings, namely:
negation handling,
                                          J. A. Cerón-Guzmán


polarity lexicon, order of word and character n-grams, and other parameters related to the
vectorization process (e.g., lowercasing, frequency thresholds, etc.). The systems were
sorted by their mean cross-validation score, and the top 50 were selected to build the
ensembles. The training set is a collection of 7,219 tweets, each of which is tagged with
one of six labels (i.e., P+, P, NEU, N, N+, and NONE). Note that the systems were trained
for the six-labels evaluation; therefore, the P+ and P labels were merged into P, and the
N+ and N labels were merged into N, to produce an output in accordance with the four-labels
evaluation. Further description of the provided corpus, as well as of the training and test
sets, can be found in (García-Cumbreras et al., 2016).
   Next, the top 50 systems assigned a class label to each tweet in a collection of 1,000,
which was drawn from the untagged test set with a class distribution similar to that of the
training set. In this stage, the objective was to find the systems with the lowest absolute
correlation with each other; therefore, the performance was not evaluated. Then, the
least-correlated combinations of 5, 10, and 25 systems were used to build the ensembles,
whose outputs correspond to the submitted experiments. These experiments are described
below:

  • run-1: the least-correlated combination of 5 systems, which chooses the class label
    that represents the majority in the predictions made by the ensemble members.
  • run-2: the least-correlated combination of 10 systems, which chooses the class with
    the highest unweighted average probability.
  • run-3: the least-correlated combination of 25 systems, which chooses the class with
    the highest unweighted average probability.

   Tables 1 and 2 show the performance evaluation on the test set (i.e., a collection of
60,798 tweets) for six and four labels, respectively. Accuracy has been defined as the
official metric for ranking the systems. In summary, the main gain occurs between the
“run-1” and “run-2” experiments, with an increment in accuracy of 0.5 percentage points in
the six-labels evaluation and of 0.2 percentage points in the four-labels evaluation; in
contrast, only a negligible gain occurs between the “run-2” and “run-3” experiments,
especially considering the computational cost of running the latter.

Experiment    Accuracy   Macro-Precision   Macro-Recall   Macro-F1
run-1         0.614      0.471             0.531          0.499
run-2         0.619      0.476             0.535          0.504
run-3         0.620      0.477             0.532          0.503

Table 1: Performance on the test set in the six-labels evaluation

Experiment    Accuracy   Macro-Precision   Macro-Recall   Macro-F1
run-1         0.702      0.564             0.565          0.564
run-2         0.704      0.567             0.568          0.567
run-3         0.705      0.568             0.567          0.568

Table 2: Performance on the test set in the four-labels evaluation

   As a final point, Table 3 shows how the overall performance is affected by the low
discriminative power of the ensembles (in this case, the one that corresponds to “run-3”)
for the NEU class. With this in mind, it is proposed as future work to deal with the low
representativeness of the NEU class in the training data (i.e., 9.28% of tweets), in order
to properly characterize this kind of tweet.

Class        Precision      Recall       F1-score
P            0.755          0.786        0.770
NEU          0.128          0.093        0.107
N            0.631          0.812        0.710
NONE         0.758          0.578        0.656

Table 3: Discriminative power for each class in the four-labels evaluation

4       Conclusion
This paper has described an ensemble-based approach for sentiment analysis of Spanish
Twitter data at global level, developed in order to participate in Task 1 proposed by the
organization of the TASS workshop. Three ensembles were built on the combination of
sentiment analysis systems with the lowest absolute correlation with each other. The
systems were adapted to the informal genre and the free writing style that characterize
Twitter, in order to improve the quality of natural language analysis. In this way, the
predicted class label for a particular tweet
was based on a majority rule or on the highest average probability. Experimental results
showed that the least-correlated combination of 25 systems, which chose the class with the
highest unweighted average probability, was the setting best suited to the task. However,
there is considerable room for improvement in learning a proper characterization of neutral
tweets.

References
Cerón-Guzmán, J. A. and E. León-Guzmán. 2016. Lexical normalization of Spanish tweets. In
  Proceedings of the 25th International Conference Companion on World Wide Web, WWW’16
  Companion, pages 605–610. International World Wide Web Conferences Steering Committee.
García-Cumbreras, M. A., J. Villena-Román, E. Martínez-Cámara, M. C. Díaz-Galiano,
  M. T. Martín-Valdivia, and L. A. Ureña-López. 2016. Overview of TASS 2016. In Proceedings
  of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN
  Conference (SEPLN 2016), Salamanca, Spain, September.
Gayo-Avello, D. 2013. A meta-analysis of state-of-the-art electoral prediction from Twitter
  data. Soc. Sci. Comput. Rev., 31(6):649–679.
Han, B. and T. Baldwin. 2011. Lexical normalisation of short text messages: Makn sens a
  #Twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational
  Linguistics: Human Language Technologies - Volume 1, HLT’11, pages 368–378, Stroudsburg,
  PA, USA. Association for Computational Linguistics.
Manning, C. D., P. Raghavan, and H. Schütze. 2008. Scoring, term weighting and the vector
  space model. In An Introduction to Information Retrieval. Cambridge University Press,
  New York, NY, USA.
Nielsen, F. Å. 2011. A new ANEW: evaluation of a word list for sentiment analysis in
  microblogs. In Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: Big
  things come in small packages, pages 93–98.
Padró, L. and E. Stanilovsky. 2012. Freeling 3.0: Towards wider multilinguality. In
  Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul,
  Turkey, May. ELRA.
Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using
  machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods
  in Natural Language Processing - Volume 10, EMNLP ’02, pages 79–86. Association for
  Computational Linguistics.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
  P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
  M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python.
  Journal of Machine Learning Research, 12:2825–2830.
Saralegi, X. and I. S. Vicente. 2013. Elhuyar at TASS 2013. In Proceedings of the Sentiment
  Analysis Workshop at SEPLN (TASS2013), September.
Vilares, D., M. A. Alonso, and C. Gómez-Rodríguez. 2014. On the usefulness of lexical and
  syntactic processing in polarity classification of Twitter messages. Journal of the
  Association for Information Science and Technology.
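
Illustrative sketch for the JACERONG paper above (Section 3, selection step). The paper states that the least-correlated combinations of 5, 10, and 25 systems were chosen from the top 50, but it does not say how the correlation was computed or how the combinations were searched. The integer encoding of labels, the Pearson coefficient, the greedy search, and the names `pearson` and `least_correlated` are therefore assumptions made only for illustration, not the author's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (assumption for this sketch).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def least_correlated(predictions, k):
    """Greedily pick k systems with low absolute pairwise correlation.

    predictions: dict mapping a (hypothetical) system name to its list of
    integer-encoded predicted labels on the same tweets.
    """
    names = sorted(predictions)
    if not 2 <= k <= len(names):
        raise ValueError("k must be between 2 and the number of systems")
    corr = {
        frozenset((a, b)): abs(pearson(predictions[a], predictions[b]))
        for a, b in combinations(names, 2)
    }
    # Seed with the least-correlated pair (ties broken alphabetically).
    seed = min(corr, key=lambda pair: (corr[pair], sorted(pair)))
    chosen = sorted(seed)
    # Repeatedly add the system least correlated with those already chosen.
    while len(chosen) < k:
        rest = [name for name in names if name not in chosen]
        best = min(
            rest,
            key=lambda name: (sum(corr[frozenset((name, c))] for c in chosen), name),
        )
        chosen.append(best)
    return chosen
```

A greedy search is only one option; an exhaustive search over all combinations of 25 out of 50 systems would be far more expensive, which is why a heuristic is assumed here.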
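
Illustrative sketch for the JACERONG paper above (Section 3, combination rules). It shows the two rules named for run-1 (majority label) and for run-2 and run-3 (highest unweighted average probability), plus the merge from six to four labels. The function names, the dictionary format of the probabilities, and the tie-breaking rule are assumptions; the paper does not specify them.

```python
from collections import Counter

SIX_LABELS = ["P+", "P", "NEU", "N", "N+", "NONE"]

# Merge used for the four-labels evaluation: P+ into P, and N+ into N.
TO_FOUR = {"P+": "P", "P": "P", "NEU": "NEU", "N": "N", "N+": "N", "NONE": "NONE"}


def majority_vote(labels):
    """run-1 style: the label predicted by most ensemble members.

    Ties go to the label that appears first in `labels` (assumption).
    """
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def average_probability(prob_rows):
    """run-2 / run-3 style: the class with the highest unweighted mean probability.

    prob_rows: one dict per ensemble member, mapping label -> probability.
    """
    n = len(prob_rows)
    means = {
        label: sum(row.get(label, 0.0) for row in prob_rows) / n
        for label in SIX_LABELS
    }
    return max(SIX_LABELS, key=lambda label: means[label])


def to_four_labels(label):
    """Map a six-labels prediction onto the four-labels scheme."""
    return TO_FOUR[label]


members = [{"P": 0.6, "N": 0.4}, {"P": 0.2, "N": 0.8}]
print(majority_vote(["P", "N", "P", "NEU", "P"]))   # P
print(average_probability(members))                 # N
print(to_four_labels("N+"))                         # N
```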
                    TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 41-45


                     Participación de SINAI en TASS 2016∗
                             SINAI participation in TASS 2016

                 A. Montejo-Ráez                                       M.C. Dı́az-Galiano
                  University of Jaén                                    University of Jaén
                  23071 Jaén (Spain)                                   23071 Jaén (Spain)
                 amontejo@ujaen.es                                       mcdiaz@ujaen.es

      Abstract: This paper describes the polarity classification system used by the SINAI
      team for Task 1 at the TASS 2016 workshop. As in previous participations, our
      approach is based on a supervised learning method (SVM) over word vectors. These
      vectors are computed with the deep-learning technique Word2Vec, using models
      generated from a collection of tweets built specifically for this task and from a
      dump of the Spanish Wikipedia. Our experiments show that massive data collections
      from Twitter can lead to a slight but appreciable improvement in classifier
      performance.
      Keywords: Sentiment analysis, polarity classification, deep learning, Word2Vec,
      Doc2Vec

1    Introduction
In this paper we describe our contributions to Task 1 of the TASS workshop (Sentiment
Analysis at global level) in its 2016 edition (García-Cumbreras et al., 2016). Our solution
continues with the techniques applied in TASS 2014 (Montejo-Ráez, García-Cumbreras, and
Díaz-Galiano, 2014) and 2015 (Díaz-Galiano and Montejo-Ráez, 2015), using deep learning to
represent the text and a training collection built from tweets containing emoticons that
express happiness or sadness. To this end we use the Word2Vec method, since it obtained the
best results in previous years. We therefore generate a weight vector for each word of the
tweet with Word2Vec and average these vectors to obtain a single vector representation.
Our results show that the performance of the classification system can be appreciably
improved by introducing these data in the generation of the word model, but not in the
training of the final polarity classifier.
    The 2016 TASS task named Sentiment Analysis at global level consists of developing and
evaluating systems that determine the global polarity of each tweet in the general corpus.
Submitted systems must predict the polarity of each tweet using 6 or 4 class labels (fine
and coarse granularity, respectively).
    The rest of the paper is organized as follows. Section 2 describes the state of the art
in polarity classification systems for Spanish. Section 3 then describes the collection of
tweets with emoticons used to train the classifier. Section 4 describes the system
developed, and Section 5 the experiments carried out, the results obtained, and their
analysis. Finally, the last section presents the conclusions and future work.

∗ This study is partially funded by project TIN2015-65136-C2-1-R, granted by the Ministry
of Economy and Competitiveness of the Government of Spain.

2   Polarity classification in Spanish
Most polarity classification systems focus on English texts. For Spanish texts, the most
complete system in terms of the linguistic techniques applied is possibly The Spanish SO
Calculator (Brooke, Tofiloski, and Taboada, 2009), which, besides resolving the polarity of
the classic components (adjectives, nouns, verbs, and adverbs), works with modifiers such
as negation detection and intensifiers.
    Deep-learning algorithms are giving good results in tasks where the state of the art
seemed to have stalled (Bengio, 2009). These techniques are also applicable to natural
language processing (Collobert and Weston, 2008), and there are already systems oriented to
sentiment analysis, such as that of Socher et al. (Socher et al., 2011). Machine learning
algorithms are not new, but they are resurging thanks to improved techniques and the
availability of the large volumes of data needed to train them effectively.
    In the 2012 edition of TASS, the team with the best results (Saralegi Urizar and San
Vicente Roncal, 2012) presented a complete tweet pre-processing system and applied a
lexicon derived from English to assign polarity to the tweets. Their results were robust in
both fine granularity (65% accuracy) and coarse granularity (71% accuracy).
    In the 2013 edition of TASS, the best team (Fernández et al., 2013) had all of its
experiments in the top 10 of the results, and their combination reached first place. They
presented a system with two variants: a modified version of the ranking algorithm (RA-SR)
using bigrams, and a new proposal based on skipgrams. With these two variants they built
sentiment lexicons and used them together with machine learning (SVM) to detect the
polarity of the tweets.
    In 2014 the team with the best results in TASS was ELiRF-UPV (Hurtado and Pla, 2014).
They approached the task as a classification problem, using SVM. They used a one-vs-all
strategy in which a binary system is trained for each polarity. The tweets were tokenized
in order to use the words or the lemmas as features, and the value of each feature was its
tf-idf coefficient. They then ran a cross-validation to determine the best set of features
and parameters.
    The ELiRF-UPV team (Hurtado, Pla, and Buscaldi, 2015) again obtained the best results in
the TASS 2015 edition with a technique very similar to that of the previous edition (SVM,
tokenization, binary classifiers, and tf-idf coefficients). In this case they used a simple
voting scheme among a larger number of classifiers with different parameters. Their best
results were obtained with a system that combined 192 SVM systems with different
configurations, using a further SVM system to perform the combination.

3     Tweet collection with emoticons
Deep-learning algorithms need large volumes of data for training. For this reason, a tweet
collection specific to polarity detection was created. To build it, tweets with the
following characteristics were retrieved:

  • They contain emoticons that express the polarity of the tweet. The following emoticons
    were used:
      • Positive: :) :-) :D :-D
      • Negative: :( :-(
  • They contain no URLs, to avoid tweets whose main content is in the link.
  • They are not retweets, to reduce the number of repeated tweets.

    The tweets were captured over 22 days, from 18 July 2016 to 9 August 2016, retrieving
approximately 100,000 tweets per day. As Figure 1 shows, retrieval was very homogeneous and
more than 2,000,000 tweets were obtained.
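
The three selection criteria above can be sketched as a small filter. This is a minimal illustration, not the authors' capture code: the regular expressions for URLs and retweets, the function names, and the decision to discard tweets that mix positive and negative emoticons are all assumptions, since the paper does not describe them.

```python
import re

# Emoticons listed in Section 3.
POSITIVE = (":)", ":-)", ":D", ":-D")
NEGATIVE = (":(", ":-(")

# Assumed patterns: the paper does not say how URLs or retweets were detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
RETWEET_RE = re.compile(r"^\s*RT\s+@", re.IGNORECASE)


def emoticon_label(text):
    """Return "P" or "N" from the emoticons in the text, or None.

    None is returned when there is no listed emoticon or when both
    polarities appear (assumption: ambiguous tweets are discarded).
    """
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "P" if has_pos else "N"


def select_tweet(text):
    """Return the polarity label if the tweet meets all criteria, else None."""
    if URL_RE.search(text) or RETWEET_RE.search(text):
        return None
    return emoticon_label(text)


for tweet in ["Qué buen día :)", "Vaya lunes :-(", "RT @ana: genial :D"]:
    print(tweet, "->", select_tweet(tweet))
```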
Figure 1: Number of tweets retrieved every 12 hours

    Next, these tweets were filtered by removing those with fewer than 5 words, where a
word is any term containing only letters (no numbers or special characters).
    In the end, 1,777,279 tweets remained, classified by the emoticon they contain as
follows:

  • Positive: 869,339 tweets
  • Negative: 907,940 tweets

    Finally, the tweets are cleaned as follows:

  • Convert the text to lowercase.
  • Remove mentions (user names starting with the @ character).
  • Replace accented letters with their unaccented versions.
  • Remove stopwords.
  • Normalize words so that they do not contain repeated letters, replacing runs of
    contiguous repeated letters so that only 3 repetitions remain.

4       System description
Word2Vec¹ is an implementation of the architecture for representing words as vectors in a
continuous space, based on bags of words or n-grams, devised by Tomas Mikolov et al.
(Mikolov et al., 2013). Its ability to capture the semantics of words is demonstrated by
its applicability to problems such as analogy between terms or word clustering. The method
consists of projecting the words onto an n-dimensional space whose weights are determined
from a neural network structure by means of a recurrent algorithm. The model can be
configured to use a bag-of-words topology (CBOW) or skip-gram, which is very similar to the
former but tries to predict the accompanying terms from a given term. With these
topologies, if a sufficient volume of text is available, this representation can come to
capture the semantics of each word. The number of dimensions (the length of each word
vector) can be chosen freely. To compute the Word2Vec model we used the software indicated,
created by the authors of the method themselves.
    As stated above, to obtain representative Word2Vec vectors for each word we have to
generate a model from a large volume of text. For this we used the parameters that gave the
best results in our 2014 participation (Montejo-Ráez, García-Cumbreras, and Díaz-Galiano,
2014). Thus, starting from an XML dump of the Spanish Wikipedia² articles, we extracted
their text. This yields about 2.2 GB of plain text, which is fed to the word2vec program
with the following parameters: a window of 5 terms, the skip-gram model, and an expected
number of dimensions of 300, producing a model with more than 1.2 million words in its
vocabulary.
    As can be seen in Figure 2, our system classifies the tweets using two learning phases.
In the first, we train the Word2Vec model using a dump of the on-line encyclopedia
Wikipedia in its Spanish version, as indicated above. In this way, we represent each tweet
with the vector that results from computing the mean of the Word2Vec vectors of each word
in the tweet together with their standard deviation (so each word vector per model has 600
dimensions). A simple normalization is first applied to the tweet, removing repeated
letters and converting everything to lowercase. The second training phase uses the SVM
algorithm and is trained with the collection of tweets with emoticons described in
Section 3. The SVM implementation used is the one based on a linear kernel with SGD
(Stochastic Gradient Descent) training provided by the Scikit-learn³ library (Pedregosa
et al., 2011).
    This solution is the one used in the two variants of TASS Task 1 with 4-class
prediction: the one that uses the complete tweet corpus (full test corpus) and the one that
uses the balanced corpus (1k test corpus).

¹ https://code.google.com/p/word2vec/
² http://dumps.wikimedia.org/eswiki
³ http://scikit-learn.org/

Table 1: Results obtained on the full test set

w2v     SVM       Accuracy   Macro-F1
W       TASS      61.31%     48.55%
W+T     TASS      62.39%     50.44%
W       TASS+T    49.28%     40.20%
W+T     TASS+T    53.72%     44.10%
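
The filtering, cleaning, and tweet representation described in Sections 3 and 4 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the tiny stopword set, the dictionary standing in for a trained Word2Vec model, the use of the population standard deviation, the zero vector for tweets with no known words, and all function names are illustrative choices.

```python
import re
import unicodedata
from math import sqrt

# Tiny illustrative stopword list (assumption; the paper does not list its own).
STOPWORDS = {"de", "la", "el", "que", "y", "en", "un", "una", "es"}


def has_min_words(text, minimum=5):
    """Keep tweets with at least `minimum` words; a word contains only letters."""
    return sum(1 for token in text.split() if token.isalpha()) >= minimum


def clean_tweet(text):
    """Apply the cleaning steps of Section 3 and return the remaining tokens."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)  # remove mentions
    # Strip accents: decompose characters and drop combining marks.
    # Note: this also turns "ñ" into "n" (simplification).
    text = "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )
    text = re.sub(r"([a-z])\1{3,}", r"\1\1\1", text)  # keep at most 3 repeats
    tokens = re.findall(r"[a-z]+", text)  # letters only (simplification)
    return [t for t in tokens if t not in STOPWORDS]


def tweet_vector(tokens, model, dim):
    """Concatenate the per-dimension mean and standard deviation of word vectors.

    `model` is any mapping word -> list of `dim` floats; with dim=300 the
    result has 600 dimensions, as in the paper.
    """
    vectors = [model[t] for t in tokens if t in model]
    if not vectors:
        return [0.0] * (2 * dim)  # assumption: zero vector when no word is known
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    std = [sqrt(sum((v[i] - mean[i]) ** 2 for v in vectors) / n) for i in range(dim)]
    return mean + std


toy_model = {"buen": [1.0, 0.0], "dia": [3.0, 4.0]}  # hypothetical 2-d vectors
print(clean_tweet("@ana Qué BUEN díaaaaa"))
print(tweet_vector(["buen", "dia"], toy_model, dim=2))
```

The resulting vectors could then be passed to a linear classifier trained with stochastic gradient descent, for example scikit-learn's `SGDClassifier` with hinge loss; the hyperparameters used by the authors are not given in the paper.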

Figure 2: Data flow of the complete system

5   Results
We experimented with the effect on system performance of using a data collection generated
by capturing tweets and labelling them according to the emoticons they contain, as
described above. The collection of more than 1.7 million tweets was used in full to
generate a word-vector model, and its combination with the Wikipedia model was analysed. We
also checked how this tweet collection affects results when it is used to train the
polarity classification model. To do so, 500,000 tweets were randomly selected from the
collection, with their corresponding P (positive) or N (negative) labels, and combined with
the TASS training collection.
    The results for the Accuracy and Macro-F1 measures are shown in Table 1. The first
column indicates the data from which the word-vector models were generated: either
Wikipedia alone (W) or Wikipedia combined with the tweets of the constructed corpus (W+T).
The second column indicates how the polarity classifier was trained from the labelled texts
vectorized with the models generated in the previous step: either using only the training
data provided by the organizers (TASS) or also incorporating the tweets labelled from
emoticons (TASS+T).
    As can be seen, using a tweet collection to extend the representational capacity of a
word-vector model yields an appreciable improvement over the model generated from Wikipedia
alone, going from 61.31% accuracy to 62.39%. In contrast, using the captured tweets in the
supervised training phase only leads to a drop in system performance.
    This raises the question of what would happen if only the collected tweets were used to
generate a word-vector model. The results obtained are 59.05% accuracy and 44.43% F1. There
is no doubt that it is worth exploring the use of feature generation models based on word
vectors.
    These results improve on our figures from last year, when we obtained an accuracy of
61.19% by combining word vectors (Word2Vec) and document vectors (Doc2Vec).

6     Conclusions and future work
From the results obtained, we find that incorporating informal text (tweets) in the
generation of the word models is worthwhile, which makes sense in a classification task
that works precisely on informal texts from the same social network. In contrast, the
assumption that the emoticons in a tweet can help a classifier such as SVM to better
determine polarity turned out to be a failed hypothesis. This can be understood by looking
at some of the tweets captured by the system, which show how difficult it is, even for a
person, to put the meaning of the tweet in context and to judge it as positive or negative
without an associated emoticon.
    As future work we plan to design a more elaborate deep neural network that also starts
from both formal and informal training texts, while taking into account more advanced
linguistic information such as syntax, instead of working with simple bags of words. We
also want to explore the use


of networks of this kind in the classification process itself, and not only in feature
generation. One possibility is to use a DBN (Deep Belief Network) (Hinton and
Salakhutdinov, 2006) with an added final stage in which the examples are labelled.

References

Bengio, Yoshua. 2009. Learning deep architectures for AI. Foundations and Trends in
  Machine Learning, 2(1):1–127.
Brooke, Julian, Milan Tofiloski, and Maite Taboada. 2009. Cross-linguistic sentiment
  analysis: From English to Spanish. In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov,
  Nicolas Nicolov, and Nikolai Nikolov, editors, RANLP, pages 50–54. RANLP 2009
  Organising Committee / ACL.
Collobert, Ronan and Jason Weston. 2008. A unified architecture for natural language
  processing: Deep neural networks with multitask learning. In Proceedings of the 25th
  International Conference on Machine Learning, ICML '08, pages 160–167, New York, NY,
  USA. ACM.
Díaz-Galiano, M.C. and A. Montejo-Ráez. 2015. Participación de SINAI DW2Vec en TASS
  2015. In Proc. of TASS 2015: Workshop on Sentiment Analysis at SEPLN. CEUR-WS.org,
  volume 1397.
Fernández, Javi, Yoan Gutiérrez, José M. Gómez, Patricio Martínez-Barco, Andrés Montoyo,
  and Rafael Muñoz. 2013. Sentiment analysis of Spanish tweets using a ranking algorithm
  and skipgrams. In Proc. of the TASS workshop at SEPLN 2013.
García-Cumbreras, Miguel Ángel, Julio Villena-Román, Eugenio Martínez-Cámara, Manuel
  Carlos Díaz-Galiano, Mª Teresa Martín-Valdivia, and L. Alfonso Ureña-López. 2016.
  Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at
  SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain,
  September.
Hinton, Geoffrey E. and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of
  data with neural networks. Science, 313(5786):504–507.
Hurtado, Lluís-F. and Ferran Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de
  sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter.
  In Proc. of the TASS workshop at SEPLN 2014.
Hurtado, Lluís-F., Ferran Pla, and Davide Buscaldi. 2015. ELiRF-UPV en TASS 2015:
  Análisis de sentimientos en Twitter. In Proc. of TASS 2015: Workshop on Sentiment
  Analysis at SEPLN. CEUR-WS.org, volume 1397, pages 35–40.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of
  word representations in vector space. CoRR, abs/1301.3781.
Montejo-Ráez, A., M.A. García-Cumbreras, and M.C. Díaz-Galiano. 2014. Participación de
  SINAI Word2Vec en TASS 2014. In Proc. of the TASS workshop at SEPLN 2014.
Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion,
  Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and
  others. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine
  Learning Research, 12:2825–2830.
Saralegi Urizar, Xabier and Iñaki San Vicente Roncal. 2012. TASS: Detecting sentiments
  in Spanish tweets. In TASS 2012 Working Notes.
Socher, Richard, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D.
  Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment
  distributions. In Proceedings of the Conference on Empirical Methods in Natural
  Language Processing, EMNLP '11, pages 151–161, Stroudsburg, PA, USA. Association for
  Computational Linguistics.
                   TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 47-51


ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter
          ELiRF-UPV at TASS 2016: Sentiment Analysis in Twitter

                                 Lluís-F. Hurtado and Ferran Pla
                                 Universitat Politècnica de València
                                          Camí de Vera s/n
                                           46022 València
                                    {lhurtado, fpla}@dsic.upv.es

      Abstract: This paper describes the participation of the ELiRF research group of
      the Universitat Politècnica de València in the TASS 2016 workshop, a satellite
      event of the XXXII edition of the Annual Conference of the Spanish Society for
      Natural Language Processing. It presents the approaches used for the two tasks of
      the workshop, the results obtained, and a discussion of those results. Our
      participation focused primarily on exploring different approaches for combining a
      set of systems; with these combinations we achieved the best results in both tasks.
      Keywords: Twitter, Sentiment Analysis.


1.   Introduction

   Over its five editions, the Sentiment Analysis Workshop (TASS) has proposed tasks
related to sentiment analysis in Twitter. Its main goal is to compare and evaluate
different approaches to these tasks. It also develops freely available resources,
essentially corpora annotated with polarity, topic, political tendency and aspects, which
are very useful for comparing different approaches to the proposed tasks.
   This fifth edition of TASS proposes two tasks from previous editions
(García-Cumbreras et al., 2016): 1) polarity classification of tweets, with different
degrees of polarity intensity (6 labels and 4 labels), and 2) polarity classification of
the aspects in the STOMPOL corpus. This corpus consists of a set of tweets about
different aspects belonging to the political domain.
   This paper summarizes the participation of the ELiRF-UPV team of the Universitat
Politècnica de València in all the tasks proposed in the workshop. We first describe the
approaches and resources used in each task. We then present the experimental evaluation
and the results obtained. Finally, we give our conclusions and possible future work.

2.   System description

   The systems submitted to TASS 2016 are based on the system developed for the previous
edition, TASS 2015 (Hurtado, Pla, and Buscaldi, 2015). Many of the features and resources
of that system were also used in the earlier editions in which our team took part (Pla
and Hurtado, 2013) (Hurtado and Pla, 2014). The preprocessing of the


tweets follows the strategy described in our TASS 2013 paper (Pla and Hurtado, 2013),
which is essentially an adaptation to Spanish of the Tweetmotif tweet tokenizer (Connor,
Krieger, and Ahn, 2010). Freeling (Padró and Stanilovsky, 2012) [1] was also used as
lemmatizer, named entity detector and part-of-speech tagger, with the corresponding
modifications for the Twitter domain. With this approach, tokenization consisted of
grouping all dates, punctuation marks, numbers and web addresses. Hashtags and user
mentions were kept. We considered and evaluated the use of words and of lemmas as tokens,
as well as named entity detection.
   All tasks were addressed as classification problems. We used Support Vector Machines
(SVM) because of their ability to handle large numbers of features successfully.
Specifically, we used two libraries (LibSVM [2] and LibLinear [3]) that have proven to be
efficient SVM implementations matching the state of the art. The software is written in
Python, and the SVM libraries are accessed through the scikit-learn toolkit [4]
(Pedregosa et al., 2011).
   In this work we exploited the combination of different classifier configurations to
take advantage of their complementarity. We used the simple voting technique from our
previous work (Pla and Hurtado, 2013) (Pla and Hurtado, 2014b), here extended to a larger
number of classifiers with different parameters and features (words, lemmas, n-grams of
words and of lemmas), together with alternative combination strategies.
   Each tweet is represented as a vector containing the tf-idf coefficients of the
features considered. In all the experiments, the features and the classifier parameters
were chosen by 10-fold cross-validation on the training set.

  [1] http://nlp.lsi.upc.edu/freeling/
  [2] http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  [3] http://www.csie.ntu.edu.tw/~cjlin/liblinear/
  [4] http://scikit-learn.org/stable/

3.   Task 1: Sentiment analysis in tweets

   This task consists of determining the polarity of tweets, and the organizers defined
two subtasks. The first distinguishes six polarity labels: N and N+ for negative polarity
with different intensity, P and P+ for positive polarity with different intensity, NEU
for neutral polarity, and NONE for the absence of polarity. The second distinguishes only
four polarity labels: N, P, NEU and NONE.
   The corpus provided by the TASS organizers consists of a training set of 7219 tweets
labelled with polarity using the six labels, and a test set of 60798 tweets to which a
polarity must be assigned. Table 1 shows the distribution of the training tweets by
polarity.

                        Polarity    # tweets       %
                        N               1335   18.49
                        N+               847   11.73
                        NEU              670    9.28
                        NONE            1483   20.54
                        P               1232   17.07
                        P+              1652   22.88
                        TOTAL           7219     100

Table 1: Distribution of tweets in the training set by polarity.

   Starting from the proposed tokenization, we ran a 10-fold cross-validation to determine
the best feature set and model parameters. As features we tested word and lemma n-grams
of different sizes. We also explored combining the models through different voting
techniques to exploit their complementarity and improve the final performance. Some of
these techniques gave significant improvements on the same data set, as shown in (Pla and
Hurtado, 2014b). In all cases we used polarity dictionaries, both of lemmas (Saralegi and
San Vicente, 2013) and of words (Martínez-Cámara et al., 2013), as well as the Afinn
dictionary (Hansen et al., 2011), automatically translated from English into Spanish.
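
The tokenization outlined in Section 2 (dates, punctuation, numbers and web addresses grouped; hashtags and user mentions kept) might look roughly like the sketch below. It is only an illustration under stated assumptions: the paper does not list the rules of the adapted Tweetmotif tokenizer, so the regular expressions, the placeholder names (`<URL>`, `<DATE>`, `<NUM>`, `<PUNCT>`) and the function `normalize_tweet` are all hypothetical, and "grouping" is read here as mapping every member of a class to one shared token.

```python
import re

# Hypothetical patterns and placeholders (assumptions, not the authors' rules).
# Order matters: URLs first, then dates, then any remaining numbers.
PATTERNS = [
    ("<URL>", re.compile(r"https?://\S+|www\.\S+")),
    ("<DATE>", re.compile(r"\b\d{1,2}[/-]\d{1,2}(?:[/-]\d{2,4})?\b")),
    ("<NUM>", re.compile(r"\b\d+(?:[.,]\d+)*\b")),
]
# Placeholders, hashtags/mentions, words, and runs of punctuation.
TOKEN_RE = re.compile(r"<[A-Z]+>|[#@]\w+|\w+|[^\w\s]+")
PUNCT_RE = re.compile(r"[^\w\s]+")


def normalize_tweet(text):
    """Tokenize a tweet, collapsing URLs, dates, numbers and punctuation."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if PUNCT_RE.fullmatch(tok):
            tokens.append("<PUNCT>")  # every punctuation run becomes one class
        else:
            tokens.append(tok)  # words, hashtags, mentions and placeholders
    return tokens


print(normalize_tweet("@ana mira http://t.co/abc el 12/05/2016 gané 3 entradas!!! #TASS"))
```

Lemmatization and named entity detection, which the paper delegates to Freeling, are outside the scope of this sketch.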


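Section 2 says that each tweet is represented as a vector of tf-idf coefficients over the selected features (words, lemmas and their n-grams). The sketch below shows one way such vectors could be built in plain Python. It is an assumption-laden illustration: the paper relies on scikit-learn and does not state which tf-idf variant is used, so the length-normalized term frequency and the `log(N / df)` weighting, as well as the helper names `ngrams`, `features` and `tfidf_vectors`, are choices made only for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens, max_n=2):
    """All n-grams from unigrams up to max_n-grams (max_n=2: words and bigrams)."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats


def tfidf_vectors(docs, max_n=2):
    """Map each tokenized tweet to a sparse {feature: tf-idf weight} dict.

    Assumed variant: tf = count / total features in the tweet,
    idf = log(number of tweets / number of tweets containing the feature).
    """
    counts = [Counter(features(doc, max_n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            feat: (tf / total) * math.log(n_docs / doc_freq[feat])
            for feat, tf in c.items()
        })
    return vectors


docs = [["buen", "día"], ["mal", "día"]]
for vec in tfidf_vectors(docs):
    print(vec)
```

In this toy corpus the word "día" occurs in both tweets, so its weight is zero, while "buen" and "mal" carry the discriminative weight. Sparse vectors of this kind would then be passed to a linear SVM.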
   Two alternatives were considered to address the task:

    run1 The first alternative combines, through a weighted voting scheme, the output
    of 192 SVM-based classifiers. The classifiers differ in the preprocessing and
    tokenization used, the features selected, and the parameter values of the SVM model
    itself.
    Specifically, we built every possible combination of 8 tokenizations (lemmas or
    words, detecting NE or not, detecting user mentions and hashtags, ...), 4 different
    feature sets (words or bigrams, with and without polarity dictionaries) and 6
    different values of the parameter c of the linear-kernel SVM model (8 x 4 x 6 = 192).
    The class assigned to each tweet t is given by the following formula:

        \hat{c} = \underset{c \in C}{\operatorname{argmax}} \bigl( N_t(c) \cdot P(c) \bigr)        (1)

    where C is the set of all classes, N_t(c) is the number of classifiers that assign
    class c to tweet t, and P(c) is the prior probability of class c estimated on the
    training corpus.

    run2 The second alternative explores model combination by learning a
    meta-classifier. Using the outputs of the same 192 classifiers as in the previous
    run, a second SVM model was trained to produce the new combined output. Part of the
    training corpus was set aside to tune the parameters of the meta-model. This approach
    is the same as the one used in TASS 2015.

   For the 4-label subtask, run1 was trained on the training corpus with 4 labels. For
run2, given the complexity of tuning the meta-model parameters, we chose instead to adapt
the output of the 6-label subtask by merging P and P+ into P, and N and N+ into N.
   Table 2 shows the Accuracy values obtained for the two subtasks. The submitted systems
obtained the top two positions in both subtasks.

                                   Run    Accuracy
             6 LABELS              run1      0.662
                                   run2      0.673
             4 LABELS              run1      0.707
                                   run2      0.721

Table 2: Official results of the ELiRF-UPV team in Task 1 of the TASS-2016 competition
on the test set, for 6 and 4 labels.

4.   Task 2: Aspect polarity analysis in Twitter

   This task consists of assigning a polarity to the aspects marked in the corpus. One
of its difficulties is defining which context is assigned to each aspect in order to
establish its polarity. For a similar problem, entity-level polarity detection in TASS
2013, we proposed a segmentation of tweets based on a set of heuristics (Pla and Hurtado,
2013). That approach was also used to detect the political tendency of Twitter users (Pla
and Hurtado, 2014a), where it gave good results. In this work we propose a simpler
approach, which determines the context of each aspect through a fixed window defined to
the left and right of the aspect instance. This is the approach used in our TASS 2015
system, which uses windows of different lengths. The optimal window length was determined
experimentally on the training set by cross-validation. To train our system we used only
the training set, determined the segments for each aspect, and followed an approach
similar to that of Task 1.
   The corpus for this task, STOMPOL, is a set of tweets related to a series of political
aspects (such as economy, health, etc.) in the context of the campaign for the 2015
Andalusian elections. Each aspect is related to one or more entities, which correspond


to one of the main political parties in Spain (PP, PSOE, IU, UPyD, Cs and Podemos). The
corpus consists of 1,284 tweets and is divided into a training set (784 tweets) and an
evaluation set (500 tweets).

4.1.    Approach and results

    We now give a short description of the features of our system and of the process
followed in the training phase. The system uses an SVM-based classifier. Only the
training set provided for the task and the polarity dictionaries described above are used
to learn the models. Before training, we determine the tweet segments that form the
context of each aspect present. Three window sizes were considered, of 5, 7 and 10 words
to the left and right of the aspect. Each segment is tokenized, and Freeling is used to
obtain its lemmas and certain entities. Different models are then learned by combining
window sizes, model parameters and different features (words, lemmas, NE, etc.). The best
model is chosen by cross-validation. We submitted only one model for this task.

                      Run     Accuracy
        STOMPOL       run1       0.633

Table 3: Official results of the ELiRF-UPV team in Task 2 of the TASS-2016 competition
for the STOMPOL corpus.

   Table 3 presents the results obtained for Task 2, with which our approach achieved
the first position in that task.

5.     Conclusions and future work

    This paper has presented the participation of the ELiRF-UPV group in the 2 tasks
proposed at TASS 2016. Our team used approaches based on support vector machines and
focused mainly on combining different systems.
    Looking at the number of participants and the results obtained in the last two
editions of TASS, we believe that the best possible results are close to being reached
for the sentiment analysis task as it has been posed so far.
    In view of the good results obtained by combining systems, as future work we plan to
develop new, more sophisticated combination methods and to include other, more
heterogeneous classification paradigms (other than SVM) in order to increase the
complementarity of the combined systems.
    We also intend to extend the system to other languages. The system described here
has already been used, with slight modifications, in sentiment analysis tasks for English
in the SemEval competition (Martínez, Pla, and Hurtado, 2016), although with less
satisfactory results than in the TASS tasks.

Acknowledgements

    This work has been partially funded by MINECO through the project ASLP-MULAN: Audio,
Speech and Language Processing for Multimedia Analytics (TIN2014-54288-C4-3-R).

References

Connor, Brendan O, Michel Krieger, and David Ahn. 2010. Tweetmotif: Exploratory search
  and topic summarization for Twitter. In William W. Cohen and Samuel Gosling, editors,
  Proceedings of the Fourth International Conference on Weblogs and Social Media, ICWSM
  2010, Washington, DC, USA, May 23-26, 2010. The AAAI Press.
García-Cumbreras, Miguel Ángel, Julio Villena-Román, Eugenio Martínez-Cámara, Manuel
  Carlos Díaz-Galiano, Mª Teresa Martín-Valdivia, and L. Alfonso Ureña-López. 2016.
  Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at
  SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain,
  September.
Hansen, Lars Kai, Adam Arvidsson, Finn Årup Nielsen, Elanor Colleoni, and Michael Etter.
  2011. Good friends, bad news-affect and virality in Twitter. In


  Future Information Technology. Springer, pages 34–43.
Hurtado, Lluís-F., Ferran Pla, and Davide Buscaldi. 2015. ELiRF-UPV en TASS 2015:
  Análisis de sentimientos en Twitter. In SEPLN.
Hurtado, Lluís-F. and Ferran Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de
  sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter.
  In TASS2014.
Martínez, Víctor, Ferran Pla, and Lluís-F. Hurtado. 2016. DSIC-ELIRF at SemEval-2016
  Task 4: Message polarity classification in Twitter using a support vector machine
  approach.
Martínez-Cámara, E., M. T. Martín-Valdivia, M. D. Molina-González, and L. A. Ureña-López.
  2013. Bilingual Experiments on an Opinion Comparable Corpus. In Proceedings of the 4th
  Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media
  Analysis, pages 87–93.
Padró, Lluís and Evgeny Stanilovsky. 2012. Freeling 3.0: Towards wider multilinguality.
  In Proceedings of the Language Resources and Evaluation Conference (LREC 2012),
  Istanbul, Turkey, May. ELRA.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
  P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
  M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in
  Python. Journal of Machine Learning Research, 12:2825–2830.
Pla, Ferran and Lluís-F. Hurtado. 2013. TASS-2013: Análisis de sentimientos en Twitter.
  In Proceedings of the TASS workshop at SEPLN 2013. IV Congreso Español de Informática.
Pla, Ferran and Lluís-F. Hurtado. 2014a. Political tendency identification in Twitter
  using sentiment analysis techniques. In Proceedings of COLING 2014, the 25th
  International Conference on Computational Linguistics: Technical Papers, pages
  183–192, Dublin, Ireland, August. Dublin City University and Association for
  Computational Linguistics.
Pla, Ferran and Lluís-F. Hurtado. 2014b. Sentiment analysis in Twitter for Spanish. In
  Elisabeth Métais, Mathieu Roche, and Maguelonne Teisseire, editors, Natural Language
  Processing and Information Systems, volume 8455 of Lecture Notes in Computer Science.
  Springer International Publishing, pages 208–213.
Saralegi, Xabier and Iñaki San Vicente. 2013. Elhuyar at TASS 2013. In Proceedings of
  the TASS workshop at SEPLN 2013. IV Congreso Español de Informática.
                   TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 53-57


    GTI at TASS 2016: Supervised Approach for Aspect Based
                Sentiment Analysis in Twitter∗
    GTI en TASS 2016: Una aproximación supervisada para el análisis de
               sentimiento basado en aspectos en Twitter

Tamara Álvarez-López, Milagros Fernández-Gavilanes, Silvia García-Méndez,
       Jonathan Juncal-Martínez, Francisco Javier González-Castaño
                        GTI Research Group, AtlantTIC
                      University of Vigo, 36310 Vigo, Spain
  {talvarez,mfgavilanes,sgarcia,jonijm}@gti.uvigo.es, javier@det.uvigo.es

      Abstract: This paper describes the participation of the GTI research group of
      AtlantTIC, University of Vigo, in TASS 2016. This workshop is part of the XXXII
      edition of the Annual Conference of the Spanish Society for Natural Language
      Processing. We propose a supervised approach, based on classifiers, for the
      aspect-based sentiment analysis task. With this technique we improved on the
      performance of previous editions, obtaining a solution in line with the current
      state of the art.
      Keywords: Sentiment analysis, aspects, SVM, machine learning, Twitter


∗ This work was partially supported by the Ministerio de Economía y
  Competitividad under project COINS (TEC2013-47016-C2-1-R) and by Xunta de
  Galicia (GRC2014/046).
ISSN 1613-0073

1    Introduction
Social media activity has grown rapidly in recent years: users post opinions
and comments on Twitter and on other social platforms. As a result, a huge
amount of information is available that could be useful for business, for
example to design marketing campaigns or to carry out other kinds of business
analysis.
    As a consequence, research on text mining and on Sentiment Analysis (SA)
has grown considerably. SA is the part of Natural Language Processing (NLP)
responsible for determining the polarity of a text or a whole sentence. SA
applied to Twitter has to be conducted in a restricted scenario due to the
maximum length of a post. Moreover, tweets contain other elements that have to
be considered, such as hashtags, mentions and retweets. More specifically,
aspect-based sentiment analysis (ABSA) consists of extracting opinions, i.e.
determining the sentiment polarity, about specific entities in the text (Liu,
2012). This task is therefore a challenge in the field of NLP.
    The TASS Workshop (García-Cumbreras et al., 2016) and the SEPLN conference
offer participants an opportunity to learn about the latest advances in the
field of NLP for the Spanish language.
    Many approaches applied to SA can be found in the literature, where it is
possible to distinguish between knowledge-based approaches (Brooke, Tofiloski,
and Taboada, 2009; Fernández-Gavilanes et al., 2016), which use grammars and
thesauri, and machine learning approaches (Mohammad, Kiritchenko, and Zhu,
2013). In recent years, deep learning approaches (Bengio, 2009) have also been
applied to this task.
    We present our supervised machine learning (ML) system, which consists of a
Support Vector Machine (SVM) classifier. Our objective is to perform SA at the
aspect level (Task 2), that is, to determine the polarity of a specific given
part of a sentence.
    The article is structured as follows. Section 2 reviews the research
involving SA in the Twitter domain. Section 3 describes the applied approach
and the implemented system. Section 4 shows the experimental results of our
system. Finally, Section 5 presents the conclusions and future work.

2    Related work
A large amount of literature related to Opinion Mining (OM) and SA can be found
(Pang and Lee, 2008; Martínez-Cámara et al., 2016). Most of the systems are
applied to Twitter, although others are applied to other social media platforms
within the micro-blog context. As a result, the approaches vary both
technically and in purpose.
    Two main approaches exist in SA: supervised and unsupervised learning.
Supervised systems implement classification methods such as SVM, Logistic
Regression (LR), Conditional Random Fields (CRF), K-Nearest Neighbors (KNN),
etc. Cui, Mittal, and Datar (2006) stated that SVMs are more appropriate for
sentiment classification than generative models, due to their ability to handle
ambiguity, that is, to deal with mixed feelings. Supervised algorithms are used
when the number of classes, as well as representative members of each class,
are known.
    Unsupervised systems are based on linguistic knowledge, such as lexicons
and syntactic features, in order to infer the polarity (Paltoglou and Thelwall,
2012). These techniques are more effective in the cross-domain context and for
multilingual applications. Unsupervised classification algorithms do not work
with a training set; instead, some of them use clustering algorithms in order
to distinguish groups (Li and Liu, 2010).
    As noted earlier, the special case of applying SA to Twitter has been
widely addressed (Pak and Paroubek, 2010; Han and Baldwin, 2011). Among the
proposed solutions, we highlight the text normalization approach (Fabo,
Cuadros, and Etchegoyhen, 2013) and the use of key elements in the
classification approach (Wang et al., 2011). Others advocate the advantages of
using deep learning techniques for this task (dos Santos and Gatti, 2014).
    Regarding the purpose of the developed systems, it is possible to find
applications such as the classification of product reviews and of political
sentiment, and the prediction of election results (Bermingham and Smeaton,
2011), among others.

3    System Overview
In this section we briefly describe the system submitted for Task 2:
Aspect-based sentiment analysis. We developed a supervised system, based on an
SVM classifier that uses different features. In the next subsections we explain
the different steps required.

3.1    Preprocessing
Before applying any supervised approach to our corpus, some preprocessing is
needed. First of all, we have to normalize the text, since in Twitter language
we can find abbreviations, mentions, hashtags, URLs or misspellings. To do so,
we replace the URLs with the “URL” tag and we replace the abbreviations or
misspellings with the correct full word. Mentions and hashtags are kept
unchanged except that the “@” or “#” symbol is deleted. Moreover, when a
hashtag is composed of several words, we split it and treat the words as
different tokens.
    After this, a lexical analysis is carried out. It consists of lemmatization
and POS tagging, which are performed by means of the Freeling tool (Atserias et
al., 2006).
    Once the texts have been lexically analysed, we split the sentences
according to the different aspects. To do so, the scope of each aspect is
determined by applying the following rules, which are adapted from our English
aspect-based sentiment analysis system (Álvarez-López et al., 2016):

    • If there is only one aspect in the sentence, we keep the sentence
      unchanged, and introduce it entirely as input for the next step.

    • If there are multiple aspects, we separate the sentences by punctuation
      marks, conjunctions or other aspects found.

    • If there are several aspects with no words between them, we consider that
      they belong to the same context, and assign the same polarity to all of
      them.

3.2    SVM classifier
In this section we describe the strategy followed to determine the sentiment
(positive, negative or neutral) for each aspect predefined in the corpus.
    We develop an SVM classifier, using the LIBSVM library (Chang and Lin,
2011). The inputs for the SVM are the sentences separated by contexts, as
explained in the previous subsection. The features extracted are the following:

    • Word tokens of nouns, adjectives and verbs in the sentence.
    • Lemmas of verbs, nouns and adjectives that appear in each sentence.
    • POS tags of nouns, adjectives and verbs.
    • N-grams of different lengths, grouping the words in each sentence.
    • Aspects appearing in the sentence. We join “aspect”-“entity”, as defined
      in each target, into a single feature.
    • Negations. We create a negation dictionary, which contains several
      particles indicating negation, such as “no”, “nunca”, etc.

    All the previous features are binary: they take the value 1 if the feature
is present in the tweet and the value 0 if not.

4    Experimental Results
Task 2, Sentiment Analysis at the aspect level, consists of assigning a
polarity label to each aspect previously marked in the STOMPOL corpus
(Martínez-Cámara et al., 2016), provided by the TASS organization. The corpus
thus provides both the polarity labels and the identification of the aspects
that appear in each tweet. The aim is to correctly assign a positive, negative
or neutral polarity to each aspect.
    The STOMPOL corpus consists of a set of Spanish tweets related to a number
of political issues, such as health or the economy, among others. These issues
are framed in the political campaign of the 2015 Andalusian elections, where
each aspect relates to one or several entities that correspond to one of the
main political parties in Spain (PP, PSOE, IU, UPyD, Cs and Podemos). The
corpus is composed of 1,284 tweets, and has been divided into a training set
(784 tweets) and an evaluation set (500 tweets).
    In order to evaluate the contribution of the various features to polarity
classification at the aspect level, we perform a series of ablation experiments
as shown in Table 1. We start with the word token baseline classifier, and then
add, one at a time, the four sets of features that help to increase
performance, as measured by accuracy. As we might expect, including the aspect
feature has the most marked effect on the performance of polarity
classification, although all the features contribute to improving overall
performance on the STOMPOL corpus.

  Type           Accuracy (%)    Improvement (points)
  Word token        56.12
  +Lemmas           57.64              +1.52
  +POS tags         58.26              +0.62
  +Aspects          59.94              +1.68
  +Negations        60.60              +0.66

Table 1: Results for polarity feature ablation experiments on the STOMPOL
corpus

    Due to the low participation of research teams in Task 2 this year, we
decided to compare our proposal with the systems presented this year and also
with those of last year, since they use the same dataset.
    For this reason, Table 2 compares the results of our approach with
different official ones submitted to the 2015 and 2016 TASS editions. In this
way, we compare our results with an ML approach based on the well-known
squared-regularised logistic regression with a snippet of length 4 (LyS-2)
described in Vilares et al. (2015), a clustering method focused on grouping
authors with similar sociolinguistic insights (TID-spark) described in Park
(2015), a recurrent neural network composed of a single long short-term memory
and a logistic function (LyS-1) described in Vilares et al. (2015), an ML
approach based on an
SVM with a snippet of length 5, 7 and 10 (ELiRF) described in Hurtado, Plà, and
Buscaldi (2015), and the best performing run of the current Task 2 TASS edition
(ELiRF-UPV).

  Experiment    Task edition    Accuracy (%)
  ELiRF-UPV         2016            63.3
  ELiRF             2015            63.3
  GTI               2016            60.6
  LyS-1             2015            59.9
  TID-spark         2015            55.7
  LyS-2             2015            54.0

Table 2: Results of different approaches in the 2015/2016 TASS editions on the
STOMPOL corpus

    Comparing the results, the performance of our current model is close to
that of the top-ranking systems of this year and last year: its accuracy of
60.6 is 2.7 points below the best result of 63.3.

5    Conclusions and future work
This paper describes the participation of the GTI group in TASS 2016, Task 2:
Aspect-Based Sentiment Analysis. We developed a supervised system based on an
SVM classifier for aspect-based sentiment analysis. The performance of our
approach has been compared with that of the systems submitted this year and
also with that of the systems submitted last year. Experimental results suggest
that we need to explore new features, such as word embedding representations or
paraphrase (Zhao and Lan, 2015), in order to improve the performance.
    As future work we plan to include the new features mentioned above and to
develop a new system which combines different ML classification methods. We are
also interested in considering different paradigms of heterogeneous
classification, such as deep learning, to increase the performance.

References
Álvarez-López, T., J. Juncal-Martínez, M. Fernández-Gavilanes,
  E. Costa-Montenegro, and F. J. González-Castaño. 2016. GTI at SemEval-2016
  Task 5: SVM and CRF for aspect detection and unsupervised aspect-based
  sentiment analysis. Proceedings of SemEval, pages 306–311.
Atserias, J., B. Casas, E. Comelles, M. González, L. Padró, and M. Padró. 2006.
  FreeLing 1.3: Syntactic and semantic services in an open-source NLP library.
  In Proceedings of LREC, volume 6, pages 48–55.
Bengio, Y. 2009. Learning deep architectures for AI. Found. Trends Mach.
  Learn., 2(1):1–127, January.
Bermingham, A. and A. F. Smeaton. 2011. On using Twitter to monitor political
  sentiment and predict election results.
Brooke, J., M. Tofiloski, and M. Taboada. 2009. Cross-linguistic sentiment
  analysis: From English to Spanish. In G. Angelova, K. Bontcheva, R. Mitkov,
  N. Nicolov, and N. Nikolov, editors, RANLP, pages 50–54. RANLP 2009
  Organising Committee / ACL.
Chang, C.-C. and C.-J. Lin. 2011. LIBSVM: A library for support vector
  machines. ACM Transactions on Intelligent Systems and Technology (TIST),
  2(3):27.
Cui, H., V. Mittal, and M. Datar. 2006. Comparative experiments on sentiment
  classification for online product reviews. In Proceedings of the 21st
  National Conference on Artificial Intelligence - Volume 2, AAAI’06, pages
  1265–1270. AAAI Press.
dos Santos, C. N. and M. Gatti. 2014. Deep convolutional neural networks for
  sentiment analysis of short texts. In COLING, pages 69–78.
Fabo, P. R., M. Cuadros, and T. Etchegoyhen. 2013. Lexical normalization of
  Spanish tweets with preprocessing rules, domain-specific edit distances, and
  language models. In Proceedings of the Tweet Normalization Workshop
  co-located with 29th Conference of the Spanish Society for Natural Language
  Processing (SEPLN 2013), Madrid, Spain, September 20th, 2013, pages 59–63.
Fernández-Gavilanes, M., T. Álvarez-López, J. Juncal-Martínez,
  E. Costa-Montenegro, and F. J. González-Castaño. 2016. Unsupervised method
  for sentiment analysis in online texts. Expert Systems with Applications,
  58:57–75.
García-Cumbreras, M. A., J. Villena-Román, E. Martínez-Cámara,
  M. C. Díaz-Galiano, M. T. Martín-Valdivia, and L. A. Ureña-López. 2016.
  Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on
  Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN
  2016), Salamanca, Spain, September.
Han, B. and T. Baldwin. 2011. Lexical normalisation of short text messages:
  Makn sens a #twitter. In Proceedings of the 49th Annual Meeting of the
  Association for Computational Linguistics: Human Language Technologies -
  Volume 1, HLT ’11, pages 368–378, Stroudsburg, PA, USA. Association for
  Computational Linguistics.
Hurtado, L. F., F. Plà, and D. Buscaldi. 2015. ELiRF-UPV en TASS 2015: Análisis
  de sentimientos en Twitter. In Proceedings of TASS 2015: Workshop on
  Sentiment Analysis at SEPLN co-located with 31st SEPLN Conference (SEPLN
  2015), Alicante, Spain, September 15, 2015, pages 75–79.
Li, G. and F. Liu. 2010. A clustering-based approach on sentiment analysis. In
  Intelligent Systems and Knowledge Engineering (ISKE), 2010 International
  Conference on, pages 331–337. IEEE.
Liu, B. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on
  Human Language Technologies. Morgan & Claypool Publishers.
Martínez-Cámara, E., M. A. García-Cumbreras, J. Villena-Román, and
  J. García-Morera. 2016. TASS 2015 - The evolution of the Spanish opinion
  mining systems. Procesamiento del Lenguaje Natural, 56:33–40.
Mohammad, S. M., S. Kiritchenko, and X. Zhu. 2013. NRC-Canada: Building the
  state-of-the-art in sentiment analysis of tweets. In Proceedings of the
  seventh international workshop on Semantic Evaluation Exercises
  (SemEval-2013), Atlanta, Georgia, USA, June.
Pak, A. and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and
  opinion mining. In N. Calzolari (Conference Chair), K. Choukri, B. Maegaard,
  J. Mariani, J. Odijk, S. Piperidis, M. Rosner, and D. Tapias, editors,
  Proceedings of the Seventh International Conference on Language Resources
  and Evaluation (LREC’10), Valletta, Malta, May. European Language Resources
  Association (ELRA).
Paltoglou, G. and M. Thelwall. 2012. Twitter, MySpace, Digg: Unsupervised
  sentiment analysis in social media. ACM Transactions on Intelligent Systems
  and Technology (TIST), 3(4):66.
Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Found. Trends
  Inf. Retr., 2(1-2):1–135, January.
Park, S. 2015. Sentiment classification using sociolinguistic clusters. In
  Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located
  with 31st SEPLN Conference (SEPLN 2015), Alicante, Spain, September 15,
  2015, pages 99–104.
Vilares, D., Y. Doval, M. A. Alonso, and C. Gómez-Rodríguez. 2015. LyS at TASS
  2015: Deep learning experiments for sentiment analysis on Spanish tweets. In
  Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located
  with 31st SEPLN Conference (SEPLN 2015), Alicante, Spain, September 15,
  2015, pages 47–52.
Wang, X., F. Wei, X. Liu, M. Zhou, and M. Zhang. 2011. Topic sentiment analysis
  in Twitter: A graph-based hashtag sentiment classification approach. In
  Proceedings of the 20th ACM International Conference on Information and
  Knowledge Management, CIKM ’11, pages 1031–1040, New York, NY, USA. ACM.
Zhao, J. and M. Lan. 2015. ECNU: Leveraging word embeddings to boost
  performance for paraphrase in Twitter. In Proceedings of the 9th
  International Workshop on Semantic Evaluation (SemEval 2015), pages 34–39,
  Denver, Colorado, June. Association for Computational Linguistics.
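
The tweet normalization step of Section 3.1 (URLs replaced by the “URL” tag,
abbreviations expanded, “@” and “#” removed, multi-word hashtags split) could
be sketched roughly as follows. This is only an illustration under our own
assumptions, not the code of the submitted system: the abbreviation table, the
regular expressions, the CamelCase splitting heuristic and the function names
normalize_tweet and split_hashtag are all hypothetical.

```python
import re

# Hypothetical abbreviation table; the paper does not list the entries it used.
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también"}

# Assumed URL pattern: anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumed heuristic: split a hashtag at each lower-case to upper-case transition.
CAMEL_RE = re.compile(r"(?<=[a-záéíóúñ])(?=[A-ZÁÉÍÓÚÑ])")


def split_hashtag(body):
    """Split a multi-word hashtag body such as 'SanidadPublica' into words."""
    return CAMEL_RE.sub(" ", body).split()


def normalize_tweet(text):
    """Return the list of normalized tokens for one tweet."""
    text = URL_RE.sub("URL", text)            # URLs become the "URL" tag
    tokens = []
    for raw in text.split():
        if raw.startswith("#") and len(raw) > 1:
            tokens.extend(split_hashtag(raw[1:]))   # drop "#", split the words
        elif raw.startswith("@") and len(raw) > 1:
            tokens.append(raw[1:])                  # drop "@", keep the name
        else:
            # Expand known abbreviations, leave every other token untouched.
            tokens.append(ABBREVIATIONS.get(raw.lower(), raw))
    return tokens


print(normalize_tweet("@PSOE xq no habla de #SanidadPublica http://t.co/abc"))
# ['PSOE', 'porque', 'no', 'habla', 'de', 'Sanidad', 'Publica', 'URL']
```

Hashtags written entirely in lower case would need a dictionary-based
segmenter instead of the CamelCase heuristic assumed here, and punctuation
attached to a token is not handled in this sketch.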
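
The binary feature representation of Section 3.2 could be sketched as below.
This is a minimal sketch under stated assumptions, not the submitted
implementation: the input is assumed to be a list of (token, lemma, POS tag)
triples for one aspect context, the POS tags are assumed to be EAGLES-style
tags whose first letter is N, A or V for nouns, adjectives and verbs, the
n-gram length is limited to bigrams, the negation list contains only the two
particles named in the paper, and every function and feature name is
hypothetical.

```python
# Only the two particles named in the paper; the real dictionary is larger.
NEGATIONS = {"no", "nunca"}
# Assumed tag initials for noun, adjective and verb (EAGLES-style tags).
CONTENT_POS = ("N", "A", "V")


def extract_features(analysis, target, max_n=2):
    """Collect the features present in one aspect context.

    analysis: list of (token, lemma, pos_tag) triples.
    target:   the joined "aspect"-"entity" string, e.g. "Sanidad-PSOE".
    """
    feats = set()
    words = [tok.lower() for tok, _, _ in analysis]
    for tok, lemma, pos in analysis:
        if pos.startswith(CONTENT_POS):        # nouns, adjectives and verbs only
            feats.add("tok=" + tok.lower())
            feats.add("lem=" + lemma)
            feats.add("pos=" + pos)
    for n in range(2, max_n + 1):              # word n-grams over the context
        for i in range(len(words) - n + 1):
            feats.add("ng=" + " ".join(words[i:i + n]))
    feats.add("asp=" + target)                 # aspect-entity feature
    if NEGATIONS.intersection(words):          # negation particle present
        feats.add("neg")
    return feats


def build_vocabulary(feature_sets):
    """Map every feature seen in training to a fixed vector position."""
    index = {}
    for feats in feature_sets:
        for f in sorted(feats):
            index.setdefault(f, len(index))
    return index


def to_binary_vector(feats, vocabulary):
    """Return 1 where a known feature is present and 0 elsewhere."""
    vec = [0] * len(vocabulary)
    for f in feats:
        if f in vocabulary:
            vec[vocabulary[f]] = 1
    return vec
```

The resulting 0/1 vectors, paired with the polarity labels of the training
set, are the kind of input an SVM library such as LIBSVM expects; the training
call itself is not shown here.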