=Paper= {{Paper |id=Vol-1702/complete-proceedings |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-1702/tass2016_proceedings.pdf |volume=Vol-1702 }} ==None== https://ceur-ws.org/Vol-1702/tass2016_proceedings.pdf

TASS 2016

CEUR Workshop Proceedings

ISSN: 1613-0073

Articles

Overview of TASS 2016
Miguel Ángel García Cumbreras, Julio Villena Román, Eugenio Martínez Cámara, M. Carlos Díaz Galiano, M. Teresa Martín Valdivia, L. Alfonso Ureña López
Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento
Edgar Casasola Murillo
LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task
Antonio Quirós, Isabel Segura-Bedmar, Paloma Martínez
JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level
Jhon Adrián Cerón-Guzmán
Participación de SINAI en TASS 2016
A. Montejo-Ráez, M. C. Díaz-Galiano
ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter
Lluís-F. Hurtado, Ferran Pla
GTI at TASS 2016: Supervised Approach for Aspect Based Sentiment Analysis in Twitter
Tamara Álvarez-López, Milagros Fernández-Gavilanes, Silvia García-Méndez, Jonathan Juncal-Martínez, Francisco Javier González-Castaño

Published at http://ceur-ws.org/. CEUR-WS.org is a serial publication with recognized ISSN 1613-0073

Organization
Organizing committee
Julio Villena-Román Sngular julio.villena@sngular.team
Miguel Á. García Cumbreras Universidad de Jaén magc@ujaen.es
Eugenio Martínez Cámara TU Darmstadt camara@ukp.informatik.tu-darmstadt.de
Manuel C. Díaz Galiano Universidad de Jaén mcdiaz@ujaen.es
M. Teresa Martín Valdivia Universidad de Jaén maite@ujaen.es
L. Alfonso Ureña López Universidad de Jaén laurena@ujaen.es

ISSN: 1613-0073
Edited at: Universidad de Jaén
Year: 2016
Editors: Julio Villena-Román Sngular julio.villena@sngular.team
Miguel Á. García Cumbreras Universidad de Jaén magc@ujaen.es
Eugenio Martínez Cámara TU Darmstadt camara@ukp.informatik.tu-darmstadt.de
Manuel C. Díaz Galiano Universidad de Jaén mcdiaz@ujaen.es
M. Teresa Martín Valdivia Universidad de Jaén maite@ujaen.es
L. Alfonso Ureña López Universidad de Jaén laurena@ujaen.es
Published by: CEUR Workshop Proceedings

Program committee
Alexandra Balahur EC-Joint Research Centre (Italy)
José Carlos Cortizo Universidad Europea de Madrid (Spain)
Jose María Gómez Hidalgo Optenet (Spain)
José Carlos González-Cristobal Universidad Politécnica de Madrid (Spain)
Lluís F. Hurtado Universidad de Valencia (Spain)
Carlos A. Iglesias Fernández Universidad Politécnica de Madrid (Spain)
Zornitsa Kozareva Information Sciences Institute (USA)
Sara Lana Serrano Universidad Politécnica de Madrid (Spain)
Ruslan Mitkov University of Wolverhampton (United Kingdom)
Andrés Montoyo Universidad de Alicante (Spain)
Rafael Muñoz Universidad de Alicante (Spain)
Constantine Orasan University of Wolverhampton (United Kingdom)
Jose Manuel Perea Ortega Universidad de Extremadura (Spain)
Ferran Pla Santamaría Universidad de Valencia (Spain)
María Teresa Taboada Gómez Simon Fraser University (Canada)
Mike Thelwall University of Wolverhampton (United Kingdom)
José Antonio Troyano Jiménez Universidad de Sevilla (Spain)


Acknowledgements
The organization of TASS has had the collaboration of researchers who participate in the following research projects:
• REDES (TIN2015-65136-C2-1-R)


Preamble

Spanish is currently the second native language in the world by number of speakers, after Mandarin Chinese, and the second language worldwide by total number of speakers. This position means that 6.7% of the world population can be considered Spanish-speaking. The presence of Spanish in the world does not correspond directly to the amount of research on the treatment of Spanish in Natural Language Processing, and especially in the field that concerns us here, Sentiment Analysis. Therefore, the Workshop on Sentiment Analysis at SEPLN (TASS) aims to promote research on the treatment of texts written in Spanish in Sentiment Analysis systems by means of the competitive evaluation of opinion processing systems.

Seven teams participated in the 2016 edition of the workshop, and six of them submitted a paper describing their system. All six papers were accepted after review by the organizing committee, whose aim was to publish only those papers that reached a minimum level of scientific quality.

The 2016 edition will be held at the 32nd International Conference of the Spanish Society for Natural Language Processing (SEPLN 2016), which will take place in Salamanca (Spain) in September, within the 5th Spanish Conference on Computer Science (CEDI 2016).

September 2016
The editors


Articles

TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 13-21

Overview of TASS 2016
Miguel Ángel García Cumbreras (1), Julio Villena Román (2), Eugenio Martínez Cámara (1), Manuel Carlos Díaz Galiano (1), M. Teresa Martín Valdivia (1), L. Alfonso Ureña López (1)
(1) Universidad de Jaén, 23071 Jaén, Spain
(2) Sngular, 28034 Madrid, Spain
(1) {magc, emcamara, mcdiaz, laurena, maite}@ujaen.es
(2) {julio.villena}@sngular.team

Abstract: This paper describes TASS 2016, the fifth edition of the Workshop on Sentiment Analysis at SEPLN, held within the SEPLN 2016 International Conference. The main aim of TASS is to promote research on, and the development of, new algorithms, resources and techniques for sentiment analysis in social media (specifically Twitter), focused on the Spanish language. The paper presents the tasks proposed in TASS 2016, the corpora used, the participant groups, the overall results and their analysis.
Keywords: TASS 2016, sentiment analysis, social media.

1 Introduction

TASS is an experimental evaluation workshop, a satellite event of the annual SEPLN Conference, with the aim of promoting research on Sentiment Analysis in social media focused on the Spanish language. The fifth edition will be held on September 13th, 2016 at the University of Salamanca, Spain.

Sentiment Analysis (SA) is traditionally defined as the computational treatment of opinion, sentiment and subjectivity in texts (Pang & Lee, 2008). However, Cambria and Hussain (2012) offer a more updated definition: computational techniques for the extraction, classification, understanding and evaluation of opinions and comments published on the Internet and in other kinds of user-generated content. It is a hard task because even humans often disagree on the polarity of a given text, and it is harder still when the text has only 140 characters (Twitter messages, or tweets).

Although SA is not a new task, it is still challenging, because the state of the art has not yet resolved some problems related to multilingualism, domain adaptation, text genre adaptation and fine-grained polarity classification. Polarity classification has usually been tackled following two main approaches. The first one applies machine learning algorithms in order to train a polarity classifier using a labelled corpus (Pang et al. 2002). This approach is also known as the supervised approach. The second one is known as semantic orientation, or the unsupervised approach, and it integrates linguistic resources in a model in order to identify the valence of the opinions (Turney 2002).

The aim of TASS is to provide a competitive forum where the newest research works in the field of SA in social media, specifically focused on Spanish tweets, are described and discussed by the scientific and business communities.

The rest of the paper is organized as follows. Section 2 describes the different corpora
provided to participants. Section 3 presents the different tasks of TASS 2016. Section 4 describes the participants, and the overall results are presented in Section 5. Finally, the last section gives some conclusions and future directions.

2 Corpus

TASS 2016 experiments are based on two corpora, specifically built for the different editions of the workshop.

The two corpora will be made freely available to the community after the workshop. Please send an email to tass@sngularmeaning.team filling in the TASS Corpus License agreement with your email, affiliation (institution, company or any kind of organization) and a brief description of your research objectives, and you will be given a password to download the files in the password protected area. The only requirement is to include a citation to a relevant paper and/or the TASS website.

2.1 General corpus

The General Corpus contains over 68,000 tweets, written in Spanish by about 150 well-known personalities and celebrities of the world of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the context of extraction has a Spanish-focused bias, the diverse nationality of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, gives the corpus global coverage of the Spanish-speaking world.

Each tweet includes its ID (tweetid), the creation date (date) and the user ID (user). Due to restrictions in the Twitter API Terms of Service (https://dev.twitter.com/terms/api-terms), it is forbidden to redistribute a corpus that includes text contents or information about users. However, it is valid if those fields are removed and instead IDs (including Tweet IDs and user IDs) are provided. The actual message content can be easily obtained by making queries to the Twitter API using the tweetid.

The general corpus has been divided into a training set (about 10%) and a test set (90%). The training set was released, so the participants could train and validate their models. The test corpus was provided without any tagging and has been used to evaluate the results. Obviously, it was not allowed to use the test data from previous years to train the systems.

Each tweet was tagged with its global polarity (positive, negative or neutral sentiment) or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no sentiment tag (NONE).

In addition, there is also an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to make out whether a neutral sentiment comes from neutral keywords or whether the text contains positive and negative sentiments at the same time.

Moreover, the polarity values related to the entities that are mentioned in the text are also included where applicable. These values are similarly tagged with 6 possible values and include the level of agreement as related to each entity.

This corpus is based on a selection of a set of topics: thematic areas such as “política” (“politics”), “fútbol” (“soccer”), “literatura” (“literature”) or “entretenimiento” (“entertainment”). Each tweet in the training and test set has been assigned to one or several of these topics (most messages are associated with just one topic, due to the short length of the text).

The annotation has been done semi-automatically: a baseline machine learning model is first run and then all tags are checked by human experts. In the case of the polarity at entity level, due to the high volume of data to check, the human annotation has only been done for the training set.

Table 1 shows a summary of the training and test corpora provided to participants.

Attribute           Value
Tweets              68,017
Tweets (test)       60,798 (89%)
Tweets (train)      7,219 (11%)
Topics              10
Users               154
Date start (train)  2011-12-02
Date end (train)    2012-04-10
Date start (test)   2011-12-02
Date end (test)     2012-04-10

Table 1: Corpus statistics
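
As a rough illustration of the annotation scheme described in this section, the following Python sketch models one annotated tweet. Only tweetid, date and user are field names given in the text; the class itself and the other attribute names (polarity, agreement, topics, entities) are hypothetical and do not reproduce the actual XML schema of the corpus.

```python
from dataclasses import dataclass, field

# Label sets taken from the corpus description.
POLARITY_LABELS = {"P+", "P", "NEU", "N", "N+", "NONE"}
AGREEMENT_LABELS = {"AGREEMENT", "DISAGREEMENT"}


@dataclass
class AnnotatedTweet:
    """Hypothetical in-memory view of one General Corpus entry."""
    tweetid: str
    user: str
    date: str
    polarity: str    # global polarity of the tweet
    agreement: str   # agreement level of the expressed sentiment
    topics: list = field(default_factory=list)
    # entity name -> (polarity, agreement); empty when no entity is mentioned
    entities: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject labels outside the sets defined for the corpus.
        if self.polarity not in POLARITY_LABELS:
            raise ValueError(f"unknown polarity: {self.polarity}")
        if self.agreement not in AGREEMENT_LABELS:
            raise ValueError(f"unknown agreement: {self.agreement}")
        for name, (pol, agr) in self.entities.items():
            if pol not in POLARITY_LABELS or agr not in AGREEMENT_LABELS:
                raise ValueError(f"bad annotation for entity {name}")


# Example with made-up values (not a real corpus entry).
tweet = AnnotatedTweet(
    tweetid="0000000001",
    user="example_user",
    date="2011-12-02",
    polarity="NEU",
    agreement="DISAGREEMENT",
    topics=["política"],
    entities={"UPyD": ("P", "AGREEMENT")},
)
```

In this made-up example, a NEU tweet tagged with DISAGREEMENT corresponds to the case mentioned above in which the text contains positive and negative sentiments at the same time.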


Users were journalists (periodistas), politicians (políticos) or celebrities (famosos). The only language involved was Spanish (es).

The list of topics that have been selected is the following:
• Politics (política)
• Entertainment (entretenimiento)
• Economy (economía)
• Music (música)
• Soccer (fútbol)
• Films (películas)
• Technology (tecnología)
• Sports (deportes)
• Literature (literatura)
• Other (otros)

The corpus is encoded in XML. Figure 1 shows the information of two tweets. The first tweet is only annotated with the polarity at tweet level because there is no entity in the text. However, the second one is annotated with the global polarity of the message and the polarity associated with each of the entities that appear in the text (UPyD and Foro Asturias).

Figure 1: Sample tweets (General corpus)

2.2 STOMPOL corpus

STOMPOL (corpus of Spanish Tweets for Opinion Mining at aspect level about POLitics) is a corpus of Spanish tweets prepared for research on the challenging task of opinion mining at aspect level. The tweets were gathered from 23rd to 24th of April 2015, and are related to one of the following political aspects that appear in political campaigns:
• Economics (Economía): taxes, infrastructure, markets, labour policy...
• Health System (Sanidad): hospitals, public/private health system, drugs, doctors...
• Education (Educación): state school, private school, scholarships...
• Political party (Propio_partido): anything good (speeches, electoral programme...) or bad (corruption, criticism) related to the entity
• Other aspects (Otros_aspectos): electoral system, environmental policy...

Each aspect is related to one or several entities that correspond to one of the main political parties in Spain, which are:
• Partido_Popular (PP)
• Partido_Socialista_Obrero_Español (PSOE)
• Izquierda_Unida (IU)
• Podemos
• Ciudadanos (C’s)
• Unión_Progreso_y_Democracia (UPyD)

Each tweet in the corpus has been manually annotated by two annotators, and a third one in case of disagreement, with the sentiment polarity at aspect level. Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. Again, no difference is made between no sentiment and a neutral sentiment (neither positive nor negative). Each political aspect is linked to its corresponding political party and its polarity.

Figure 2 shows the information of two sample tweets.

Figure 2: Sample tweets (STOMPOL corpus)

The number of tweets per entity is shown in Table 2.

Entity    Train  Test
PP        205    125
PSOE      136    70
C’s       119    87
Podemos   98     80
IU        111    43
UPyD      97     124
Total     766    529

Table 2: Number of tweets per entity and per corpus subset

3 Description of tasks

Since the first edition of TASS, a new task and a new corpus have been published. However, one of the aims of TASS is the evaluation of the progress of the research on SA. Thus, the edition of 2016 was focused on the analysis and the comparison of the systems with the submissions of previous editions.

The edition of 2016 was focused on two tasks: polarity classification at tweet level and polarity classification at entity level. The tweet-level polarity classification task has been proposed with the same corpus since the first edition of TASS, but the polarity classification at aspect level has been proposed with a different corpus each edition. In the edition of 2016 the classification at aspect level uses the STOMPOL corpus, which was published for the first time in the edition of 2015.

Participants are expected to submit up to 3 results of different experiments for one or both of these tasks, in the appropriate format described below.

Along with the submission of experiments, participants have been invited to submit a paper to the workshop in order to describe their experiments and discuss the results with the audience in a regular workshop session.

The two proposed tasks are described next.

3.1 Task 1: Sentiment Analysis at Global Level

This task consists of performing an automatic polarity classification to determine the global polarity of each message in the test set of the General Corpus. The training set of the corpus was provided to the participants so that they could train and validate their models with it. There were two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE).

Participants are expected to submit up to 3 experiments for the 6-labels evaluation, and they are also allowed to submit up to 3 specific experiments for the 4-labels scenario.

Results must be submitted in a plain text file with the following format:

tweetid \t polarity

where polarity can be:
• P+, P, NEU, N, N+ and NONE for the 6-labels case
• P, NEU, N and NONE for the 4-labels case.

The same test corpus of previous years was used for the evaluation in order to allow a comparison among the systems. Accuracy is one of the measures used to evaluate the systems; however, because the training corpus is not totally balanced, the systems were also assessed by macro-averaged precision, macro-averaged recall and macro-averaged F1-measure.

3.2 Task 2: Aspect-based sentiment analysis

A corpus with the entities and the aspects identified was provided to the participants, so the goal of the systems is the inference of the polarity at the aspect level. As in 2015, the STOMPOL corpus was used in this task. STOMPOL was divided into a training set and a test set, the first one for the development and validation of the systems, and the second for evaluation.

Participants are expected to submit up to 3 experiments for each corpus, each in a plain text file with the following format:

tweetid \t aspect-entity \t polarity

Allowed polarity values are: P, N and NEU.

For the evaluation, a single label combining “aspect-polarity” has been considered. As in the first task, accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been calculated for the global result.

4 Participants and Results

This year 7 groups (7 last year) submitted their systems. The list of active participant groups is
shown in Table 3, including the tasks in which they have participated.

Six of the seven participant groups sent a report describing their experiments and the results achieved. Papers were reviewed and included in the workshop proceedings. References are listed in Table 4.

Group       Task 1  Task 2
jacerong    X
ELiRF-UPV   X       X
LABDA       X
INGEOTEC    X
GASUCR      X
GTI                 X
SINAI_w2v   X
Total       6       2

Table 3: Participant groups

Group     Report
ELiRF     ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter
GTI       GTI at TASS 2016: Supervised Approach for Aspect Based Sentiment Analysis in Twitter
jacerong  JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level
LABDA     LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task
SINAI     Participación de SINAI en TASS 2016

Table 4: Participant reports

5 Results

This section focuses on the description and the analysis of the results and of the systems submitted by the participants.

5.1 Task 1: Sentiment Analysis at Global Level

Submitted runs and results for Task 1, with the evaluation based on 5 polarity levels and the whole General test Corpus, are shown in Table 5. Accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been used to evaluate each individual label and to rank the systems.

Run Id        M-F1
ELiRF-UPV_1   0.518
jacerong_2    0.504
jacerong_3    0.503
jacerong_1    0.499
ELiRF-UPV_2   0.496
INGEOTEC      0.464
LABDA_1       0.429
LABDA_2       0.429
LABDA_3       0.418
GASURC_3      0.254
GASURC_1      0.232
GASURC_2      0.227

Table 5: Results for Task 1, 5 levels

In order to perform a more in-depth evaluation, results are also calculated considering the classification in only 3 levels (POS, NEU, NEG) plus no sentiment (NONE), merging P and P+ into one category and N and N+ into another. The results reached by the submitted systems are shown in Table 6.

Run Id        M-F1
jacerong_3    0.568
jacerong_2    0.567
jacerong_1    0.564
ELiRF-UPV_1   0.549
ELiRF-UPV_2   0.548
INGEOTEC      0.524
LABDA_3       0.511
LABDA_2       0.508
LABDA_1       0.508
SINAI_w2v_1   0.504
SINAI_w2v_3   0.486
SINAI_w2v_4   0.469
SINAI_w2v_2   0.440
GASURC_1      0.250
GASURC_2      0.152

Table 6: Results for Task 1, 3 levels
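
The two evaluation settings above (the full label set, and the merged setting in which P+ joins P and N+ joins N) can be sketched in a few lines of Python. This is a minimal sketch, not the official TASS evaluation script: the function names are hypothetical, and the way the macro average is taken (a plain mean of per-label scores over the labels present in the gold standard) is an assumption that may differ from the organizers' implementation.

```python
from collections import Counter

# Merge the strong labels into their plain counterparts.
COLLAPSE = {"P+": "P", "P": "P", "NEU": "NEU", "N": "N", "N+": "N", "NONE": "NONE"}


def collapse(labels):
    """Map a sequence of fine-grained tags onto the merged scheme."""
    return [COLLAPSE[label] for label in labels]


def macro_scores(gold, pred):
    """Return (accuracy, macro-precision, macro-recall, macro-F1).

    Assumption: the macro average runs over the labels present in the gold
    standard, and a label that is never predicted gets precision 0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = sorted(set(gold))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        prec = tp[label] / p_den if p_den else 0.0
        rec = tp[label] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(tp.values()) / len(gold)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with made-up labels, only to exercise the functions.
gold = ["P+", "P", "N", "N+", "NEU", "NONE"]
pred = ["P", "P", "N+", "N+", "NONE", "NONE"]
fine = macro_scores(gold, pred)
merged = macro_scores(collapse(gold), collapse(pred))
```

In the toy example, confusions between P and P+ or between N and N+ count as errors in the fine-grained setting but disappear after merging, which is one reason a system can score higher in the merged evaluation.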


5.2 Task 2: Aspect-based Sentiment Analysis

Submitted runs and results for Task 2, with the STOMPOL corpus, are shown in Table 7. Accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been used to evaluate each individual label and to rank the systems.

Run Id        M-F1
ELiRF-UPV_1   0.526
GTI           0.463

Table 7: Results for Task 2

5.3 Description of the systems

The systems submitted in the edition of 2016 represent the next step of the ones submitted in the previous edition. The systems may be clustered into two groups: those that rely on the classification power of an ensemble of several base classifiers, and those that replace the traditional Bag-of-Words model with vectors of word embeddings in order to represent the meaning of each word. The following paragraphs describe the main features of the submitted systems.

Hurtado and Pla (2016) describe the participation of the team ELiRF-UPV in the two tasks of TASS 2016. The only difference between the systems submitted for the two tasks is that the one focused on the second task has a module for identifying the context of each of the entities and aspects annotated in the tweets. The polarity classification system relies on an ensemble of 192 configurations of an SVM classifier. To combine the set of classifiers, they evaluate the performance of one approach based on voting and another based on stacking.

The system described in (Cerón-Guzmán, 2016) is also based on an ensemble of classifiers. In this case the base classifiers are based on logistic regression, and they are combined by voting.

Alvarez et al. (2016) present the participation of the team GTI in Task 2. The system is similar to the system of the team ELiRF-UPV in the sense that it is composed of two layers: context identification and polarity classification. For the identification of the context, the authors design a heuristic method based on lexical markers. The polarity classification system is an SVM classifier that uses different types of features in order to represent the contexts of the entities and the aspects.

Montejo-Ráez and Díaz-Galiano (2016) introduce a system based on a supervised learning algorithm over vectors resulting from a weighted vector. This vector is computed using a Word2Vec algorithm. This method, which is inspired by neural-network language modelling, was run on a collection of tweets written in Spanish and on the Spanish Wikipedia in order to generate a set of word embeddings for the representation of the words of the General Corpus of TASS as dense vectors. The collection of tweets written in Spanish was created following a distant supervision approach, under the assumption that tweets with happy and sad emoticons express emotions or opinions. Their experiments show that massive data from Twitter can lead to a slight improvement in classification accuracy.

The system presented by the team LABDA (Quirós, Segura-Bedmar and Paloma Martínez, 2016) is similar to the one submitted by SINAI (Montejo-Ráez and Díaz-Galiano, 2016) because it also uses word embeddings as the schema of representation of the meaning of the words of the tweets. Quirós, Segura-Bedmar and Paloma Martínez (2016) assessed the performance of SVM and Logistic Regression as classifiers.

Casasola Murillo and Marín Reventós (2016) submitted an unsupervised system based on the system described in Turney (2002), but with a specific adaptation to the classification of tweets written in Spanish.

5.4 Analysis

Table 5 and Table 6 show the results of each system, ranked by the F1-score reached, so it is not hard to identify the best system in the edition of 2016.

On the other hand, how many tweets were rightly classified by the submitted systems? Is there a set of tweets that were not rightly classified by any system? What are the most difficult tweets to classify? These questions are answered in the following paragraphs.

Table 8 shows the rate of tweets that are rightly classified by a given number of systems. There
are about 6% of tweets whose polarity is not inferred by any of the submitted systems. In other words, the submitted systems in the edition of 2016 are able to classify about 94% of the test set. So, what are the main features of that 6% of tweets whose polarity no system inferred?

Number of systems  Rate of tweets
0                  5.6%
1                  6.5%
2                  6.3%
3                  6.7%
4                  5.9%
5                  6.1%
6                  7.4%
7                  7.8%
8                  8.1%
9                  11.2%
10                 12.2%
11                 8.2%
12                 6.2%
13                 1.1%

Table 8: Rate of tweets rightly classified (6 classes) by a number of systems

Id: 171304000392663040
Sacarle 17 puntos en la final de Copa al Barça CB en el Palau Sant Jordi es una pasada.
Beating Barça by 17 points in the Copa is amazing
Polarity: P+

Figure 3: Tweet not rightly classified by any system

Id: 177439342497767424
hahahahahaha “@Absolutexe: ¿Le han cambiado ya el nombre a la Junta de Andalucía por la Banda de Andalucía o aún no?”
hahahahahaha “@Absolutexe: Has the Junta de Andalucía been renamed the Gang of Andalucía yet, or not?”
Polarity: N+

Figure 4: Tweet not rightly classified by any system

Id: 177439342497767424
Rubalcaba pide a Rajoy que presente ya los Presupuestos y dice que no lo hace porque espera a las elecciones andaluzas
Rubalcaba requires Rajoy to submit the Budget and says that he has not done so because he is waiting for the elections in Andalucía
Polarity: NONE

Figure 5: Tweet not rightly classified by any system

Figures 3, 4 and 5 are three examples of tweets that were not rightly classified by any system. The common feature of the three tweets is that they do not have any lexical marker that expresses emotion or opinion. Moreover, the tweet in Figure 4 is sarcastic, which is an additional challenge for SA because it requires a deep understanding of the language.

All the systems submitted are based on linear classifiers that do not take into account the context of each word, which is a big drawback for understanding the meaning of a span of text.

The tweets in Figures 3, 4 and 5 show that opinions and emotions are not only expressed by lexical markers, so future participants should take into account the challenging tasks of implicit opinion analysis and of irony and sarcasm detection. These new problems may be framed at the semantic level of Natural Language Processing and should be tackled by the research community in order to go a step further in the understanding of the subjective information that is continuously published on the Internet.
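
The kind of analysis behind Table 8 can be approximated with a short Python sketch: for each tweet, count how many submitted runs predicted the gold label, then report the share of tweets for each count. The function names and the toy data are hypothetical, and the runs are assumed to be in the tab-separated "tweetid, polarity" format of Task 1; this is an illustration under those assumptions, not the organizers' code.

```python
from collections import Counter


def parse_run(lines):
    """Parse 'tweetid<TAB>polarity' lines into a dict tweetid -> polarity."""
    run = {}
    for line in lines:
        line = line.strip()
        if line:
            tweetid, polarity = line.split("\t")
            run[tweetid] = polarity
    return run


def correct_counts(gold, runs):
    """For each tweet, count how many runs predicted its gold label.

    A tweet missing from a run counts as not rightly classified by that run.
    """
    return {
        tweetid: sum(1 for run in runs if run.get(tweetid) == label)
        for tweetid, label in gold.items()
    }


def rate_by_system_count(gold, runs):
    """Share of tweets rightly classified by exactly k runs, k = 0..len(runs)."""
    histogram = Counter(correct_counts(gold, runs).values())
    total = len(gold)
    return {k: histogram[k] / total for k in range(len(runs) + 1)}


# Toy data with made-up ids and labels, only to exercise the functions.
gold = {"t1": "P+", "t2": "N+", "t3": "NONE", "t4": "P"}
runs = [
    parse_run(["t1\tP", "t2\tN+", "t3\tNEU", "t4\tP"]),
    parse_run(["t1\tP", "t2\tN", "t3\tNEU", "t4\tP"]),
]
rates = rate_by_system_count(gold, runs)
hardest = [t for t, k in correct_counts(gold, runs).items() if k == 0]
```

The `hardest` list plays the role of the row for 0 systems in Table 8: it collects the tweets that no run classified rightly, which are the candidates for a manual inspection like the one in Figures 3, 4 and 5.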


6 Conclusions and Future Work

TASS was the first workshop on SA focused on the processing of texts written in Spanish. In the first three editions of TASS, the research community was mainly formed by Spanish researchers; however, since the last edition, the number of researchers from South America has been growing, which is evidence that the research community of Sentiment Analysis in Spanish is not only located in Spain but spans the Spanish-speaking countries. In any case, the developed corpora and gold standards, together with the participants' reports, should be helpful for understanding the state of the art in SA in Spanish.

Future work will mainly focus on the definition of a new General Corpus, for the following reasons:
1. The language used on Twitter changes faster than the language used in traditional genres of text, so the corpus must be updated in order to cover the real use of language on Twitter.
2. After several editions of the workshop, we realize that the quality of the annotation is not extremely good, so a new corpus with high-quality annotation is required in order to provide a real gold standard for Spanish SA on Twitter.
3. The research community knows the General Corpus of TASS in depth and wants a new challenge.

A significant number of new tasks are currently being defined in Natural Language Processing, so some of them, such as stance classification, will be studied as proposals for the next edition of TASS.

Acknowledgements
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER) and the REDES project (TIN2015-65136-C2-1-R) from the Spanish Government.

References
Cambria, E. and A. Hussain. 2012. Sentic Computing. Techniques, Tools and Applications. Springer Briefs in Cognitive Computation, volume 2. Springer Netherlands. ISBN 978-94-007-5069-2. doi:10.1007/978-94-007-5070-8.
Cerón-Guzmán, J. A. 2016. JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Casasola Murillo, E. and G. Marín Raventós. 2016. Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Hurtado, Ll.-F. and F. Pla. 2016. ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Montejo-Ráez, A. and M. C. Díaz-Galiano. 2016. Participación de SINAI en TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.
Pang, B., L. Lee and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP ’02, pages 79–86. Association for Computational Linguistics, Stroudsburg, PA, USA. doi:10.3115/1118693.1118704.
Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135. ISSN 1554-0669. doi:10.1561/1500000011.
Quirós, A., I. Segura-Bedmar and P. Martínez. 2016. LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, September.


Turney, P. D. 2002. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 417–424. Association for Computational Linguistics, Stroudsburg, PA, USA. doi:10.3115/1073083.1073153.
Villena-Román, J., Sara L. S., Eugenio M. C. and José Carlos G. C. 2013. TASS - Workshop on Sentiment Analysis at SEPLN. Revista de Procesamiento del Lenguaje Natural, 50, pp. 37-44.
Villena-Román, J., Janine G. M., Sara L. S. and José Carlos G. C. 2014. TASS 2013 - A Second Step in Reputation Analysis in Spanish. Revista de Procesamiento del Lenguaje Natural, 52, pp. 37-44.

TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 23-28

Evaluación de Modelos de Representación del Texto con
Vectores de Dimensión Reducida para Análisis de Sentimiento∗
Evaluation of Reduced Dimension Vector Text Representation Models for
Sentiment Analysis

Edgar Casasola Murillo, Universidad de Costa Rica, San José, Costa Rica, edgar.casasola@ucr.ac.cr
Gabriela Marín Raventós, Universidad de Costa Rica, San José, Costa Rica, gabriela.marin@ucr.ac.cr

Abstract: This paper describes the sentiment analysis system developed by the GAS-UCR sentiment analysis group of the University of Costa Rica for Task 1 of the TASS 2016 workshop, together with preliminary evaluation results. The system is based on low-dimension feature vectors for text representation. The proposed model is simple: it relies on text normalization with identification of emphasis markers, on the use of language models to represent local and global characteristics of the text, and on additional features such as emoticons and negation particles. Initial experiments show the improvements in precision obtained when identifying the polarity of whole texts as the features described here are incorporated.
Keywords: sentiment analysis, polarity-based text classification, short texts.

1 Introduction

The purpose of this work is to describe the system used by the sentiment analysis research group of the University of Costa Rica in its participation in the TASS 2016 workshop (García-Cumbreras et al., 2016). The group's work has focused on studying the factors that improve the precision obtained when classifying the polarity of tweets written in Spanish. Our system rests on three basic elements: text normalization during preprocessing, identifying the potential emphasis markers present in the text; the creation of reduced-dimension feature vectors to lessen the effect of data sparsity; and an exploration of the impact of polarity dictionaries generated with different language representation models associated with both the local and the global context of the data. For this we use our own adaptation of Turney's algorithm (Turney, 2002) on a corpus of 5 million tweets in Spanish. These models are stored as polarity dictionaries for later reuse.

We are particularly interested in research in this field because, although a significant gap between the amount of research and language technology developed for English and for Spanish has been identified since 2013 (Cambria et al., 2013; Melero et al., 2012), we must also bear in mind that solutions for Peninsular Spanish will not necessarily achieve the same results when applied to American varieties of Spanish. The resources and methods we use are therefore intended to contribute to research on Spanish and to support their later application in other Spanish-speaking contexts.

∗ This work was carried out thanks to the financial support of the University of Costa Rica and the Government of the Republic of Costa Rica through MICITT. We thank the assistants of the GAS-UCR research group for their work.

2 Background

Among systems based on machine learning, support vector machines (SVM) have given good results both in English (Kiritchenko, Zhu, and Mohammad, 2014; Batista and Ribeiro, 2013) and in Spanish, where 9 of the 14 systems presented at TASS 2015 (Villena-Román et al., 2015) used this kind of classifier. However, because of language dependence, these classifiers depend on the feature vectors used to represent the text comments. Feature extraction has been the focus of multiple works, such as (Cabanlit and Junshean Espinosa, 2014), (Feldman, 2013), (Guo and Wan, 2012), (Sharma and Dey, 2012) and (Wang et al., 2011). Recent work on sentiment analysis in Spanish, such as (Martínez-Cámara et al., 2015), uses several polarity dictionaries and represents them with a vector space model (VSM). The dictionary itself becomes a language model that serves as a resource for obtaining efficient representations of the vectors used for classification.

In recent years, vector representations based on language models such as unigrams and bigrams have shifted toward feature-based representations, because the number of terms introduces a problem of high sparsity in the vector (Cambria et al., 2013). If the vectors contain a large number of different attributes, one per term, the training data sets must contain more annotated texts than attributes for the classifiers to be trained well. This is why language representation models based on unigrams, bigrams or skip-grams require an efficient vector representation. Recent work seeks vector representations of words in a continuous space, as in the use of Word2Vec (Díaz-Galiano and Montejo-Ráez, 2015).

3 System description

Our system rests on four elements that we consider worth describing. First, we explain how we built our dictionary of term polarities and why we built our own. Next, we describe our preprocessing and the identification of potential emphasis markers during this initial stage. The following subsection explains how we build low-dimension vectors and make use of the dictionary. Finally, we describe how the feature vectors are intended to capture local aspects, relative to the training data, and global aspects, drawn from general language representation models.

3.1 Creation of the polarity dictionary

We decided to develop our own polarity dictionaries instead of using existing ones because, from the point of view of traditional natural language processing (Indurkhya and Damerau, 2010), each polarity dictionary can be seen as a particular language model. For this reason we set out to develop and evaluate an adaptation of the traditional method of (Turney, 2002) for generating these linguistic resources. This decision was not due to a lack of polarity dictionaries, since works such as (Martínez-Cámara et al., 2015) clearly use several of them. Rather, the aim was to make dictionary creation part of the working methodology, so that later research in other Spanish-speaking countries can replicate the work, lower the initial barrier associated with the lack of local linguistic resources, and study the effect of the polarity dictionary on the quality of the classification results.

The polarity dictionary was built from a corpus of 5 million tweets in Spanish collected during 2013. Our variant of the algorithm proposed by Turney (Turney, 2002) is as follows. To compute the semantic orientation of a term, as defined by Turney in his original article, we used groups of seed words instead of a single term. Also, instead of querying web search engines to obtain the number of texts in which the analyzed words appear near the positive or negative words, we used a search engine implemented with the free software Solr (http://lucene.apache.org/solr/). The 5 million tweets were indexed with this engine, so the queries were executed locally. An advantage of this method is that the semantic orientation of a term can be computed directly or stored in a dictionary. In our case we precompute the polarity and store it as a dictionary. So far the calculations have only been carried out for individual terms.

Figure 1: Text normalization process
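
The semantic-orientation computation of Section 3.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: tweet-level co-occurrence over an in-memory list stands in for the local Solr proximity queries, and the function name, the seed groups and the smoothing constant are assumptions made for illustration.

```python
import math


def semantic_orientation(term, tweets, pos_seeds, neg_seeds, smoothing=0.01):
    """Turney-style semantic orientation of `term` using groups of seed words.

    Assumption: "near" is approximated by co-occurrence in the same tweet,
    and `smoothing` only avoids division by zero.
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    near_pos = near_neg = hits_pos = hits_neg = 0
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        has_pos = bool(tokens & pos_seeds)   # tweet contains a positive seed
        has_neg = bool(tokens & neg_seeds)   # tweet contains a negative seed
        hits_pos += has_pos
        hits_neg += has_neg
        if term in tokens:
            near_pos += has_pos
            near_neg += has_neg
    numerator = (near_pos + smoothing) * (hits_neg + smoothing)
    denominator = (near_neg + smoothing) * (hits_pos + smoothing)
    return math.log2(numerator / denominator)


# Toy corpus and hypothetical seed groups, for illustration only.
tweets = [
    "que buen dia excelente",
    "excelente servicio genial",
    "pesimo servicio horrible",
    "horrible dia malo",
]
pos_seeds = {"buen", "genial"}
neg_seeds = {"malo", "pesimo"}
so_excelente = semantic_orientation("excelente", tweets, pos_seeds, neg_seeds)
so_horrible = semantic_orientation("horrible", tweets, pos_seeds, neg_seeds)
```

Precomputing this value for every term in the vocabulary gives a dictionary of the kind described above. How the scores are scaled to the [-1.0, 1.0] range used in the following sections is not specified in the paper.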

3.2 Text normalizer with emphasis markers

After analyzing the characteristics present in the text, we developed a text normalization system. For this preprocessing, potential terms, punctuation marks and emoticons are segmented. The terms are then marked and converted. The process we follow removes the terms that are identified in the dictionary. This process is shown in Figure 1.

Repeated letters, repeated syllables and capitalization are identified and removed, but the affected terms are marked as potential emphasis indicators. Examples are: EXCELENTE, graciassss, buenisísimo. In this phase, tweets containing positive words with emphasis are identified for later use.

3.3 Low-dimension vector representation

Two features represented in the vectors concern the presence and polarity of emoticons and the presence of negation particles. In addition, during this research we observed that positive terms with emphasis markers are a potential indicator of the positive polarity of the texts that contain them, so this feature was also incorporated. The presence of emphasis markers, such as repeated characters, repeated syllables or capitalization, on terms that appear as negative in some context is recorded as an important feature in the vector.

The generated vectors use the polarity of the terms to determine their position in the feature vector. Note that, depending on the data model, the terms can be unigrams, bigrams or skip-grams. In the case of unigrams, for example, if a vector is built with the frequency of the terms according to their polarity, with polarity values from -1.0 to 1.0, the resulting vector looks like the one shown in Figure 2. This example vector contains two terms whose dictionary polarity lies between -0.9 and -0.8, one term with polarity between 0.1 and 0.2, and another with polarity greater than 0.9. In our dictionary, polarity is represented by values ranging from the most negative to the most positive, between -1.0 and 0 for negative terms and between 0 and 1.0 for positive ones.

For the TASS 2016 workshop we initially wanted to evaluate vectors with the smallest possible dimension, so instead of vectors of 20 cells we used vectors of only 5 cells for each group of features; instead of steps of 0.1, the range used is 0.5.

Figure 2: Feature vector

3.4 Local and global language representation models

Our proposal aims to represent in the feature vectors both information obtained during the training process and information obtained from language models of Spanish in general. In our case, the dictionary generated from the collected corpus was initially used as the source of general information about Spanish. At training time, the polarity of the terms in each tweet is known for that data set. The global information is what has been computed beforehand and is stored in the form of dictionaries. In our proposal, we want the vector to represent the frequencies of the terms of each tweet distributed according to their polarity, but using different language representation models to carry out this calculation. The dictionary used in these experiments was our unigram version. We intend to use representations with bigrams and a version of skip-grams that includes only the terms preceding the word to be represented. During training, the polarity obtained locally is stored, as are the frequencies taken from the global polarity dictionaries. The vectors therefore have entries for the local polarity distributions and for the global polarity distributions. This is where we incorporate the different language models. We initially worked with unigrams to obtain baseline results for later experiments. Subsequently, one dictionary is generated for bigrams and another for what we define as preceding skip-grams. For now, these variants were not submitted as experiments to TASS 2016; only the initial versions were.

4 Methodology

Using the dictionary, the normalizer and the vector representation model, we created representation vectors with different configurations. First, we built a version with vectors of dimension 20, distributing the terms according to the polarity stored for unigrams in the local dictionary. This case is meant to evaluate only the use of the dictionary and of emphasis markers such as repetitions and capitalization. This first experiment is called GASUCR-01. The second experiment evaluated a somewhat more robust local model with bigrams, falling back to the polarity of the unigram in the dictionary when the bigram is not present during evaluation. In this case, lower-dimension vectors with only five fields were created for the local data. This run was identified as experiment GASUCR-01-noEMO-noPartNeg. It is the base implementation for later evaluating the use of bigrams taken from the global context. This base version was also submitted to the 4-category task. In that case, the categories +P and P were merged into one, as were the categories +N and N. The third experiment added to the previous one the use of emoticons, the occurrence of positive terms with emphasis, and negation particles. In the results this version was identified as GASUCR-04. For this edition of TASS we did not have time to run the versions with global bigrams or skip-grams.
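
The vector construction used in these experiments can be sketched as follows. This is an illustrative reading of Sections 3.2 and 3.3, not the authors' code: the function names, the rule that collapses runs of three or more repeated characters, the equal-width binning over [-1.0, 1.0] and the toy lexicon are all assumptions. Syllable repetition (as in buenisísimo) is not handled, and the paper does not say how its 0.5 step maps onto five cells.

```python
import re

# Assumption: a run of three or more identical characters marks an elongation.
REPEATED = re.compile(r"(.)\1{2,}")


def normalize_token(token):
    """Return (normalized token, emphasis flag) for elongations and all-caps."""
    emphasized = (len(token) > 1 and token.isupper()) or bool(REPEATED.search(token))
    normalized = REPEATED.sub(r"\1", token.lower())
    return normalized, emphasized


def polarity_histogram(tokens, lexicon, n_bins=20):
    """Count tokens per polarity bin; polarities are assumed to lie in [-1.0, 1.0]."""
    width = 2.0 / n_bins
    vector = [0] * n_bins
    for token in tokens:
        polarity = lexicon.get(token)
        if polarity is None:
            continue  # term not found in the polarity dictionary
        index = int((polarity + 1.0) / width)
        index = max(0, min(index, n_bins - 1))  # keep boundary values in range
        vector[index] += 1
    return vector


# Hypothetical lexicon reproducing the situation described for Figure 2.
lexicon = {"pesimo": -0.85, "horrible": -0.82, "bien": 0.15, "excelente": 0.95}
tokens = [normalize_token(t)[0] for t in ["pesimo", "horrible", "bien", "EXCELENTE", "mesa"]]
vector_20 = polarity_histogram(tokens, lexicon, n_bins=20)
vector_5 = polarity_histogram(tokens, lexicon, n_bins=5)
```

With 20 bins the four known terms fall into three separate cells, while with 5 bins the same tweet is summarized by a much shorter vector, which is the trade-off the experiments explore.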

5 Results

The official results obtained for the runs described above are shown in Tables 1 and 2. In these tables, the column Ac. shows the accuracy, P refers to macro precision, R to macro recall and F1 to macro F1. In the general TASS results, the group's results appear with the given id under the group name GASUCR. In our case, experiment 01 gives the baselines for global unigrams with vectors of dimension 20 and for local bigrams with dimension 5. It is important to note that local bigrams with dimension 5 together with the positive-emphasis, negation-particle and emoticon features produce a slight increase in accuracy, from 0.326 to 0.410 (Table 1). Another aspect worth highlighting is the increase in accuracy when moving to the 3-category task.

Table 1: Results for Task 1 with 5 levels and the full corpus
id           Ac.    P      R      F1
01           0.342  0.217  0.237  0.227
01-noEmNeg   0.326  0.334  0.258  0.291
04           0.410  0.268  0.242  0.254

Table 2: Results for Task 1 with 3 levels and the full corpus
id           Ac.    P      R      F1
01-noEmNeg   0.373  0.212  0.303  0.250

These cases were selected to evaluate incrementally each of the aspects of our proposal. With each new feature we try to determine its impact on accuracy, precision and recall.

6 Conclusions and future work

The TASS evaluation framework is valuable for groups that are starting research on sentiment analysis in Spanish with the aim of extending it to other regions. In our case, we were able to evaluate and compare the quality of the results of the first baselines of our work. We observed the first results of a system that uses a normalization method with identification of potential emphasis markers, a representation model based on low-dimension vectors, and text representation models with local and global features. The work also uses features shared with other systems, such as emoticons and negation particles. As future work, we still have to evaluate, using 3 categories, the data that use local context with bigrams and additional features such as emoticons, positive words with emphasis, and negation particles. We expect the best results to be obtained by combining the new language models we are computing for bigrams and preceding skip-grams with our low-dimension vector representation method. We also want to study the effect of reducing the vector size, as well as techniques for extrapolating polarity in the models for terms that do not appear in the training data.

References
Batista, F. and R. Ribeiro. 2013. Sentiment analysis and topic classification based on binary maximum entropy classifiers. Procesamiento de Lenguaje Natural, 50:77–84.
Cabanlit, M. A. and K. Junshean Espinosa. 2014. Optimizing n-gram based text feature selection in sentiment analysis for commercial products in twitter through polarity lexicons. In Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on, pages 94–97. IEEE.
Cambria, E., B. Schuller, Y. Xia, and C. Havasi. 2013. New avenues in opinion mining and sentiment analysis. Intelligent Systems, IEEE, PP(99):1–1.
Díaz-Galiano, M. and A. Montejo-Ráez. 2015. Participación de SINAI DW2Vec en TASS 2015. In Proceedings del Taller TASS 2015 en Análisis de Sentimiento de la XXXI Conferencia SEPLN 2015, pages 59–64.
Feldman, R. 2013. Techniques and applications for sentiment analysis. Commun. ACM, 56(4):82–89, April.
García-Cumbreras, M., J. Villena-Román, E. Martínez Cámara, M. C. Díaz-Galiano, M. T. Martín Valdivia, and L. A. Ureña López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.
Guo, L. and X. Wan. 2012. Exploiting syntactic and semantic relationships between terms for opinion retrieval. Journal of the American Society for Information Science and Technology, 63(11):2269–2282, November.
Indurkhya, N. and F. J. Damerau. 2010. Handbook of natural language processing, volume 2. CRC Press.
Kiritchenko, S., X. Zhu, and S. M. Mohammad. 2014. Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, pages 723–762.
Martínez-Cámara, E., M. Á. García-Cumbreras, M. T. Martín-Valdivia, and L. A. Ureña-López. 2015. SINAI-EMMA: Vectores de palabras para el análisis de opiniones en Twitter. In Proceedings del Taller TASS 2015 en Análisis de Sentimiento de la XXXI Conferencia SEPLN 2015, pages 41–46.
Melero, M., A.-B. Cardús, A. Moreno, G. Rehm, K. de Smedt, and H. Uszkoreit. 2012. The Spanish language in the digital age. Springer.
Sharma, A. and S. Dey. 2012. A comparative study of feature selection and machine learning techniques for sentiment analysis. In Proceedings of the 2012 ACM Research in Applied Computation Symposium, pages 1–7. ACM.
Turney, P. D. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 417–424. Association for Computational Linguistics.
Villena-Román, J., J. García Morera, M. Á. García-Cumbreras, E. M. Cámara, M. T. M. Valdivia, and L. A. U. López. 2015. Overview of TASS 2015. In Proceedings del Taller TASS 2015 en Análisis de Sentimiento de la XXXI Conferencia SEPLN 2015, pages 13–21.
Wang, X., F. Wei, X. Liu, M. Zhou, and M. Zhang. 2011. Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 1031–1040. ACM.
TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 29-33

LABDA at the 2016 TASS challenge task: using word
embeddings for the sentiment analysis task∗
LABDA en la competición TASS 2016: utilizando vectores de palabras para
la tarea de análisis de sentimiento

Antonio Quirós (1,2), Isabel Segura-Bedmar (1), and Paloma Martínez (1)
(1) Departamento de Informática, Universidad Carlos III de Madrid, Avd. de la Universidad, 30, 28911, Leganés, Madrid, España; 100342879@alumnos.uc3m.es, isegura,pmf@inf.uc3m.es
(2) Sngular Data&Analytics, Av. Llano Castellano 13, Planta 5, 28034 Madrid, España; antonio.quiros@sngular.team

Abstract: This paper describes the participation of the LABDA group in Task 1 (Sentiment Analysis at global level) of the TASS 2016 competition. In our approach, tweets are represented by word embeddings and classified with machine learning algorithms such as SVM and logistic regression.
Keywords: Sentiment Analysis, Word embeddings

1 Introduction resources for sentiment analysis of tweets in
Knowing the opinion of customers or users Spanish. This paper describes the participa-
has become a priority for companies and or- tion of the LABDA group at the Task 1 (Sen-
ganizations in order to improve the quality of timent Analysis at global level). In this task,
their services and products. With the ongoing the participating systems have to determine
explosion of social media, it affords a signifi- the global polarity of each tweet in the test
cant opportunity to poll the opinion of many dataset. There are two different evaluations:
Internet users by processing their comments. one based on 6 different polarity labels (P+,
However, it should be noted that sentiment P, NEU, N, N+, NONE) and another based
analysis, which can be defined as the auto- on just 4 labels (P, N, NEU, NONE). A de-
matic analysis of opinion in texts (Pang and tailed description of the task can be found
Lee, 2008), is a challenging task because it is in the overview paper of TASS 2016 (Garcı́a-
not strange that different people assign dif- Cumbreras et al., 2016). Our approach ex-
ferent polarities to a given text. On Twitter, ploits word embedding representations for
the task is even more difficult, because the tweets and machine learning algorithms such
texts are small (only 140 characters) and are as SVM and logistics regression. The word
charectized by their informal style language, embedding model can yield significant dimen-
many grammatical errors and spelling mista- sionality reduction compared to the classical
kes, slang and vulgar vocabulary and abbre- Bag-Of-Word (BoW) model. The dimensio-
viations. nality redution can have several positive ef-
fects on our algorithms such as faster trai-
Since their introduction in 2013, the TASS
ning, avoiding overfitting and better perfor-
shared task editions have had as main goal
mance.
to promote the development of methods and
The paper is organized as follows. Section
∗
This work was supported by eGovernAbility-Access 2 describes our approach. The experimental
project (TIN2014-52665-C2-2-R). results are presented and discussed in Section
ISSN 1613-0073
A. Quirós, I. Segura-Bedmar, P. Martínez

3. We conclude in Section 4 with a summary vert the tweets to lowercase and replace miss-
of our findings and some directions for future pelled accented letters with the correct one
work. (for instance “à” with “á”). We also treat
elongations (that is, the repetition of a cha-
2 System racter) by removing the repetition of a cha-
In this paper, we study the use of word em- racter after its second occurrence (for exam-
beddings (also known as word vectors) in or- ple, “hoooolaaaa” would be translated to
der to represent tweets and then examine se- “hola”). We then decided to take into account
veral machine learning algorithms to classify laughs (for instance “jajaja”) which turned
them. Word embeddings have shown promi- out to be challenging because of the diverse
sing results in NLP tasks, such as named ways they are expressed (i.e. expressions li-
entity recognition (Segura-Bedmar, Suárez- ke “jajajaja” or “jejeje” and even misspelled
Paniagua, and Martínez, 2015), relation extraction (Alam et al., 2016), sentiment analysis (Socher et al., 2013b) or parsing (Socher et al., 2013a). A word embedding is a function that maps words to low-dimensional vectors, which are learned from a large collection of texts. At present, neural networks are among the most widely used learning techniques for generating word embeddings (Mikolov and Dean, 2013). The essential assumption of this model is that semantically close words will have similar vectors (in terms of cosine similarity). Word embeddings can help to capture semantic and syntactic relationships between the corresponding words.

While the well-known Bag-of-Words (BoW) model involves a very large number of features (as many as the number of non-stopword words with at least a minimum number of occurrences in the training data), the word embedding representation allows a significant reduction in the feature set size (in our case, from millions to just 300). Dimensionality reduction is a desirable goal, because it helps to avoid overfitting and reduces training and classification times, without any performance loss.

As a preprocessing step, tweets must be cleaned. First, we remove all links and URLs. We then remove usernames, which can be easily recognized because their first character is the symbol @. We then transform hashtags into words by removing their first character (that is, the symbol #). Using regular expressions, the emoticons are detected and classified in order to count the number of positive and negative emoticons in each tweet, and then we remove them from the text. Table 1 shows the list of positive and negative emoticons, which were taken from the Wikipedia page https://en.wikipedia.org/wiki/List_of_emoticons. We con-

ones like “jajjajaaj”). We addressed this using regular expressions to standardize the different forms (e.g. “jajjjaaj” to “jajaja”) and then replace them with the word “risas”. Finally, we remove all non-letter characters and all stopwords present in tweets (stopword list: http://snowball.tartarus.org/algorithms/spanish/stop.txt).

Orientation  Emoticons
Positive     :-), :), :D, :o), :], D:3, :c), :>, =], 8), =), :}, :^), :-D, 8-D, 8D, x-D, xD, X-D, XD, =-D, =D, =-3, =3, B^D, :'), :'), :*, :-*, :^*, ;-), ;), *-), *), ;-], ;], ;D, ;^), >:P, :-P, :P, X-P, x-p, xp, XP, :-p, :p, =p, :-b, :b
Negative     >:[, :-(, :(, :-c, :-<, :<, :-[, :[, :{, ;(, :-||, >:(, :'-(, :'(, D:<, D=, v.v

Table 1: List of positive and negative emoticons

Once the tweets are preprocessed, they are tokenized using the NLTK toolkit (a Python package for NLP); we also experimented with lemmatizing each tweet using the MeaningCloud Text Analytics software (https://www.meaningcloud.com/) to compare both approaches. Then, for each token, we search for its vector in the word embedding model. We use a pretrained model (Cardellino, 2016), which was generated using the word2vec algorithm (Mikolov and Dean, 2013) from a collection of Spanish texts with approximately 1.5 billion words. The dimension of the word embedding is 300. It
should be noted that these texts were taken from different resources such as Spanish Wikipedia, WikiSource and Wikibooks, but none of them contains tweets. Therefore, it is possible that the main characteristics of social media texts (such as informal and noisy language, many grammatical errors and spelling mistakes, slang and vulgar vocabulary, abbreviations, etc.) are not correctly represented in this model. One of the main problems is that a significant number of words (almost 13% of the vocabulary, representing 6% of word occurrences) are not found in the model. A review of a small sample of these words showed that most of them were hashtags.

In our approach, a tweet of $n$ tokens ($T = w_1, w_2, \ldots, w_n$) is represented as the centroid of the word vectors $\vec{w}_i$ of its tokens, as shown in the following equation:

\[ \vec{T} = \frac{1}{n}\sum_{i=1}^{n} \vec{w}_i = \frac{\sum_{j=1}^{N} \vec{w}_j \cdot TF(w_j, T)}{\sum_{j=1}^{N} TF(w_j, T)} \tag{1} \]

where $N$ is the vocabulary size, that is, the total number of distinct words, while $TF(w_j, T)$ refers to the number of occurrences of the $j$-th vocabulary word in the tweet $T$.

We also explore the effect of including the inverse document frequencies (IDF) to represent tweets (see Equation 2). This helps to increase the weight of words that occur often, but only in a few documents, while it reduces the relevance of words that occur very frequently in a larger number of texts.

\[ \vec{T} = \frac{1}{n}\sum_{i=1}^{n} \vec{w}_i = \frac{\sum_{j=1}^{N} \vec{w}_j \cdot TF(w_j, T) \cdot IDF(w_j)}{\sum_{j=1}^{N} TF(w_j, T) \cdot IDF(w_j)} \tag{2} \]

having $IDF(w_j) = \frac{\log |D|}{|\{tw \in D : w_j \in tw\}|}$, where $|D|$ refers to the number of tweets.

In addition to using the centroid, we assess the impact of complementing the tweet model with the following additional features:

posWords: number of positive words present in the tweet.
negWords: number of negative words present in the tweet.
posEmo: number of positive emoticons present in the tweet.
negEmo: number of negative emoticons present in the tweet.

For the posWords and negWords features we used the iSOL lexicon (Molina-González et al., 2013), a list composed of 2,509 positive words and 5,626 negative words. As described before, for the emoticons we used those listed in Table 1, but we also added the number of laughs detected to the positive ones, as well as the number of recommendations present in the form of a “Follow Friday” hashtag (#FF), due to its ease of detection and its positive bias.

Classification is performed using scikit-learn, a Python module for machine learning. This package provides many algorithms, such as Random Forest, Support Vector Machine (SVM) and so on. One of its main advantages is that it is supported by extensive documentation. Moreover, it is robust, fast and easy to use.

As stated before, we have two main training models: averaged centroids, and averaged centroids including the inverse document frequency, for both the lemmatized and non-lemmatized texts. We performed experiments using three different classifiers: Random Forests, Support Vector Machines and Logistic Regression, because these classifiers have often achieved the best results for text classification and sentiment analysis.

We also evaluated the impact of applying a set of emoticon rules as a pre-classification stage, similar to (Chikersal et al., 2015), in which we determine a first-stage polarity for each tweet as follows:

If posEmo is greater than zero and negEmo is equal to zero, the tweet is marked as “P”.
If negEmo is greater than zero and posEmo is equal to zero, the tweet is marked as “N”.
If both posEmo and negEmo are greater than zero, the tweet is marked as “NEU”.
If both posEmo and negEmo are equal to zero, the tweet is marked as “NONE”.

Then, after the classification takes place, we ran three tests: i) applying no rule; ii) honoring the polarity defined by the rule, which means that we keep the predefined polarity
if the tweet was marked as “P” or “N”, otherwise we take the value estimated by the classifier; and iii) a mixed approach where we give each polarity a value (N+: -2; N: -1; NEU, NONE: 0; P: 1; P+: 2) and compute the arithmetic sum of the predefined and estimated polarities if and only if they are not equal. For instance, if the classifier marked a tweet as “N” and the rules marked it as “P”, the tweet will be classified as “NEU”.

3 Results

In order to choose the best-performing classifiers, we use 10-fold cross-validation, because there is no development dataset and this strategy has become the standard method in practice. Non-lemmatized text performed better in all settings, so the best settings reported here use the non-lemmatized model. Our experiments showed that, although the results were similar, the best settings for the 5-levels task are:

RUN-1: Support Vector Machine, over the averaged centroids without applying any rules for pre-defining polarities.
RUN-2: Support Vector Machine, over the averaged centroids and applying the mixed rules approach.
RUN-3: Logistic Regression, over the centroids with inverse document frequency and applying the mixed rules approach.

and for the 3-levels task are:

RUN-1: Support Vector Machine, over the averaged centroids and applying the mixed rules approach.
RUN-2: Logistic Regression, over the centroids with inverse document frequency and applying the mixed rules approach.
RUN-3: Logistic Regression, over the averaged centroids and applying the mixed rules approach.

Tables 2 and 3 show the results for these settings provided by the TASS submission system. For each run, accuracy is provided as well as the macro-averaged precision, recall and F1-measure. As expected, the results for 3 levels are higher than for 5 levels because the training dataset is larger.

Run     P      R      F1     Acc
RUN-1   0.411  0.449  0.429  0.527
RUN-2   0.412  0.448  0.429  0.527
RUN-3   0.402  0.436  0.418  0.549

Table 2: Results for Sentiment Analysis at global level (5 levels, Full test corpus)

Run     P      R      F1     Acc
RUN-1   0.506  0.510  0.508  0.652
RUN-2   0.508  0.508  0.508  0.652
RUN-3   0.512  0.511  0.511  0.653

Table 3: Results for Sentiment Analysis at global level (3 levels, Full test corpus)

With the settings mentioned above, the obtained results are extremely similar, but we can state that, in terms of accuracy, Logistic Regression reports the best results. Although it is not measured in this work, it is worth mentioning that Logistic Regression was observably faster.

4 Conclusions and future work

This paper explores the use of word embeddings for the task of sentiment analysis. Instead of using the bag-of-words model to represent tweets, these are represented as word vectors taken from a pre-trained model of word embeddings. An important advantage of the word embedding model compared to the bag-of-words representation is that it achieves a significant dimensional reduction of the feature set needed to represent tweets and leads, therefore, to a reduction of the training and testing time of the algorithms.

In order to use word embedding models properly, a preprocessing stage had to be completed before training a classifier. Due to the unstructured nature of the tweets, this preprocessing proved to be a very important step in order to standardize the input data to some degree. The experimentation showed that the three tested classifiers obtained very similar results, with Random Forest performing slightly worse and Logistic Regression being slightly better and much faster.

One of the main drawbacks of our approach is that many words do not have a word vector in the word embedding model used for our experiments. An analysis showed that many
of these words come from hashtags, which are usually short phrases. Therefore, we should apply a more sophisticated method in order to extract the words forming a hashtag.

As future work, we also plan to use a word embedding model trained on a collection of text from Spanish social media. We think that this will have a positive effect on the performance of our system in identifying the polarity of tweets, because this model will be generated from documents characterized by the main features that describe social media texts (for example, informal language, many grammatical errors and spelling mistakes, slang and vulgar vocabulary).

Acknowledgments

This work was supported by eGovernAbility-Access project (TIN2014-52665-C2-2-R).

References

Alam, F., A. Corazza, A. Lavelli, and R. Zanoli. 2016. A knowledge-poor approach to chemical-disease relation extraction. Database, 2016:baw071.

Cardellino, C. 2016. Spanish Billion Words Corpus and Embeddings, March.

Chikersal, P., S. Poria, E. Cambria, A. Gelbukh, and C. E. Siong. 2015. Modelling public sentiment in Twitter: using linguistic patterns to enhance supervised learning. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 49–65. Springer.

García-Cumbreras, M. A., J. Villena-Román, E. Martínez-Cámara, M. C. Díaz-Galiano, M. T. Martín-Valdivia, and L. A. Ureña López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.

Mikolov, T. and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems.

Molina-González, M. D., E. Martínez-Cámara, M.-T. Martín-Valdivia, and J. M. Perea-Ortega. 2013. Semantic orientation for polarity classification in Spanish reviews. Expert Systems with Applications, 40(18):7250–7257.

Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.

Segura-Bedmar, I., V. Suárez-Paniagua, and P. Martínez. 2015. Exploring word embedding for drug name recognition. In Sixth International Workshop on Health Text Mining and Information Analysis (LOUHI), page 64.

Socher, R., J. Bauer, C. D. Manning, and A. Y. Ng. 2013a. Parsing with compositional vector grammars. In ACL (1), pages 455–465.

Socher, R., A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. 2013b. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1631–1642.
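
As an illustration of the tweet representation described in Section 2 of the LABDA paper (the centroid of the word vectors, optionally weighted by IDF), the following minimal Python sketch may help. The function name `tweet_vector`, the toy two-dimensional embeddings, the choice to skip tokens that have no vector, and the default weight of 1 for words missing from the IDF table are all assumptions made for illustration; they are not taken from the authors' implementation. The IDF weights are passed in as a precomputed dictionary rather than derived here.

```python
from collections import Counter


def tweet_vector(tokens, embeddings, idf=None):
    """TF-weighted (optionally TF-IDF-weighted) centroid of a tweet's word vectors."""
    # TF(w, T): occurrences of each word in the tweet. Words without a
    # vector are skipped (assumption, not stated in the paper).
    counts = Counter(tok for tok in tokens if tok in embeddings)
    if not counts:
        return None
    dim = len(next(iter(embeddings.values())))
    weighted_sum = [0.0] * dim
    weight_total = 0.0
    for word, tf in counts.items():
        # Without IDF each occurrence weighs 1, which yields the plain
        # average of the vectors of the known tokens.
        weight = tf * (idf.get(word, 1.0) if idf is not None else 1.0)
        weight_total += weight
        for k, component in enumerate(embeddings[word]):
            weighted_sum[k] += weight * component
    if weight_total == 0:
        return None
    return [value / weight_total for value in weighted_sum]


# Hypothetical 2-dimensional embeddings; the paper uses 300 dimensions.
embeddings = {"buen": [1.0, 0.0], "dia": [0.0, 1.0]}
tokens = ["buen", "buen", "dia", "ff"]  # "ff" has no vector and is ignored

plain = tweet_vector(tokens, embeddings)
weighted = tweet_vector(tokens, embeddings, idf={"buen": 1.0, "dia": 4.0})
print(plain, weighted)
```

In this toy case the IDF weights shift the centroid toward the rarer word "dia", which is the effect the text attributes to the IDF-weighted variant.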
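
The emoticon pre-classification rules and the "mixed" combination described in the LABDA paper can be sketched in a few lines of Python. This is only an illustrative reading of the text: the function names are hypothetical, and two details the paper does not specify are filled in as labeled assumptions, namely that a sum of zero maps to “NEU” (as in the paper's own example) and that sums outside the range [-2, 2] are clamped.

```python
# Polarity values as given in the paper.
VALUE = {"N+": -2, "N": -1, "NEU": 0, "NONE": 0, "P": 1, "P+": 2}
# Assumption: a sum of 0 is mapped back to "NEU".
LABEL = {-2: "N+", -1: "N", 0: "NEU", 1: "P", 2: "P+"}


def rule_polarity(pos_emo, neg_emo):
    """First-stage polarity from the emoticon counts of a tweet."""
    if pos_emo > 0 and neg_emo == 0:
        return "P"
    if neg_emo > 0 and pos_emo == 0:
        return "N"
    if pos_emo > 0 and neg_emo > 0:
        return "NEU"
    return "NONE"


def honor_rule(rule, estimated):
    """Test ii): keep the rule's polarity only when it is P or N."""
    return rule if rule in ("P", "N") else estimated


def mixed(rule, estimated):
    """Test iii): sum both polarities if and only if they differ."""
    if rule == estimated:
        return estimated
    total = VALUE[rule] + VALUE[estimated]
    total = max(-2, min(2, total))  # assumption: clamp out-of-range sums
    return LABEL[total]


print(mixed(rule_polarity(1, 0), "N"))
```

The printed call reproduces the example from the text: the rules mark the tweet as “P”, the classifier as “N”, and the combined label is “NEU”.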
TASS 2016: Workshop on Sentiment Analysis at SEPLN, septiembre 2016, pág. 35-39

JACERONG at TASS 2016: An Ensemble Classifier for
Sentiment Analysis of Spanish Tweets at Global Level
JACERONG en TASS 2016: Combinación de clasificadores para el
análisis de sentimientos de tuits en español a nivel global

Jhon Adrián Cerón-Guzmán
Santiago de Cali, Valle del Cauca, Colombia
jadrian.ceron@gmail.com

Resumen: Este artı́culo describe un enfoque basado en conjuntos de clasificadores
que se ha desarrollado para participar en la Tarea 1 del taller TASS sobre análisis de
sentimientos de tuits en español a nivel global. Los conjuntos se construyen sobre
la combinación de sistemas con la correlación absoluta más baja entre sı́. Estos
sistemas son capaces de tratar con formas léxicas no estándar en los tweets, con el fin
de mejorar la calidad del análisis de lenguaje natural. Para realizar la clasificación
de polaridad, el enfoque utiliza caracterı́sticas básicas que han probado su poder
discriminativo, ası́ como caracterı́sticas de n-gramas de palabras y caracteres. Luego,
las salidas de clasificadores de Regresión logı́stica, que pueden ser etiquetas de clase o
probabilidades para cada clase, se utilizan para construir conjuntos de clasificadores.
Los resultados experimentales muestran que la combinación menos correlacionada
de 25 sistemas, la cual elige la clase con la probabilidad promedio no ponderada más
alta, es la configuración que mejor se adapta a la tarea, alcanzando una precisión
global de 62.0% en la evaluación de seis etiquetas, y de 70.5% en la evaluación de
cuatro etiquetas.
Palabras clave: Análisis de sentimientos, clasificación de polaridad, combinación
de clasificadores, normalización léxica, tuits en español, Twitter
Abstract: This paper describes an ensemble-based approach developed to participate in TASS-2016 Task 1 on sentiment analysis of Spanish tweets at global level. Ensembles are built on the combination of systems with the lowest absolute correlation with each other. The systems are able to deal with non-standard lexical forms in tweets, in order to improve the quality of natural language analysis. To support the polarity classification, the approach uses basic features that have proved their discriminative power, as well as word and character n-gram features. Then, outputs from Logistic Regression classifiers, which may be either class labels or probabilities for each class, are used to build ensembles. Experimental results show that the less-correlated combination of 25 systems, which chooses the class with the highest unweighted average probability, is the setting that best suits the task, achieving an overall accuracy of 62.0% in the six-labels evaluation, and of 70.5% in the four-labels evaluation.
Keywords: Ensemble classifier, lexical normalization, polarity classification, sentiment analysis, Spanish tweets, Twitter

1 Introduction

What people say on social media about issues of their everyday life, society, and the world in general has turned into a rich source of information for understanding social behavior. Twitter content, in particular, has caught the attention of researchers who have investigated its potential for conducting studies on human subjectivity at large scale, which was not feasible using traditional methods. Around election time, sentiment analysis of political tweets has been widely used to capture trends in public opinion regarding important issues such as voting intention (Gayo-Avello, 2013). However, analyzing this content also presents several challenges, including the development of text analysis approaches based on Natural Language Processing techniques that properly adapt to the informal genre and the free writing style of Twitter (Han and Baldwin, 2011; Cerón-Guzmán and León-Guzmán, 2016).

TASS is a workshop aimed at fostering research on sentiment analysis of Spanish Twitter data, which provides a benchmark evaluation to compare the latest advances in the field (García-Cumbreras et al., 2016). One of the proposed tasks is to determine the opinion orientation expressed in tweets at global level. Task 1 consists of assigning one of six labels (P+, P, NEU, N, N+, NONE) to a tweet in the six-labels evaluation, or one of four labels (P, NEU, N, NONE) in the four-labels evaluation. Here, P, N, and NEU stand for positive, negative, and neutral, respectively; NONE, instead, means no sentiment. The “+” symbol is used as an intensifier.

This paper presents an ensemble-based approach to polarity classification of Spanish tweets, developed to participate in Task 1 proposed by the organizing committee of the TASS workshop. The ensemble members are (relatively) highly correct classifiers with the lowest absolute correlation with each other. The output from each classifier, which may be either a class label or probabilities for each class, is used to assign the polarity to a tweet based on a majority rule or on the highest unweighted average probability. Moreover, classifiers are adapted to deal with non-standard lexical forms in tweets, in order to improve the quality of natural language analysis.

The remainder of this paper is organized as follows. Section 2 describes the common architecture of the ensemble members (i.e., classifiers). Next, the submitted experiments, as well as the obtained results, are discussed in Section 3. Finally, Section 4 concludes the paper.

2 The System Architecture

The tweet text is passed through the pipeline of each system in order to assign it a class label or a probability of belonging to a certain class. The pipeline, which goes from text preprocessing to machine learning classification, is described below. Note that the term “system” is preferred over the term “classifier”, because a machine learning classifier receives a feature vector and produces a class label or probabilities for each class; the term “system”, instead, covers the whole process, from preprocessing to machine learning classification.

2.1 Preprocessing

The process of text cleaning and normalization is performed in two phases: basic preprocessing and advanced preprocessing.

2.1.1 Basic Preprocessing

The following simple rules are implemented as regular expressions:

• URLs and emails are removed.
• HTML entities are mapped to textual representations (e.g., “&lt;” → “<”).
• Specific Twitter terms such as mentions (@user) and hashtags (#topic) are replaced by placeholders.
• Unknown characters are mapped to their closest ASCII variant, using the Python Unidecode module for the mapping.
• Consecutive repetitions of the same character are reduced to one occurrence.
• Emoticons are recognized and then classified into positive and negative, according to the sentiment they convey (e.g., “:)” → “EMO_POS”, “:(” → “EMO_NEG”).
• Punctuation marks are unified (Vilares, Alonso, and Gómez-Rodríguez, 2014).

2.1.2 Advanced Preprocessing

Once the set of simple rules has been applied, the tweet text is tokenized and morphologically analyzed by FreeLing (Padró and Stanilovsky, 2012). In this way, for each resulting token, its lemma and Part-of-Speech (POS) tag are assigned. Taking these data as input, the following advanced preprocessing is applied:

• Lexical normalization. Each token is passed through a set of basic modules of FreeLing (e.g., dictionary lookup, suffixes check, detection of numbers and dates, and named entity recognition) for identifying standard word forms and other valid constructions. If a token is not recognized by any of the modules, it is marked as an out-of-vocabulary (OOV) word. Then, a confusion set is formed by normalization candidates which are identical or similar to the graphemes or phonemes that make the
OOV word. These candidates are elements of the union of a dictionary of Spanish standard word forms and a gazetteer of proper nouns. The best normalization candidate for the OOV word is the one that best fits a statistical language model. The language model was estimated from the Spanish Wikipedia corpus. Lastly, the selected candidate is capitalized according to the capitalization rules of the Spanish language. Extensive research on lexical normalization of Spanish tweets can be found in (Cerón-Guzmán and León-Guzmán, 2016).

• Negation handling. Inspired by the approach proposed by Pang et al. (Pang, Lee, and Vaithyanathan, 2002), this research defined a negated context as a segment of the tweet that starts with a (Spanish) negation word and ends with a punctuation mark (i.e., “!”, “,”, “:”, “?”, “.”, “;”), but only the first $n \in [0, 3]$ or all tokens labeled with any or a specific POS tag (i.e., verb, adjective, adverb, and common noun) are affected by adding the “_NEG” suffix to them. Note that when $n = 0$, no token is affected.

2.2 Feature Extraction

In this stage, the normalized tweet text is transformed into a feature vector that feeds the machine learning classifier. The features are grouped into basic features and n-gram features.

2.2.1 Basic Features

Some of these features are computed before the process of text cleaning and normalization is performed.

• The number of words completely in uppercase.
• The number of words with more than two consecutive repetitions of the same character.
• The number of consecutive repetitions of exclamation marks, question marks, and both punctuation marks (e.g., “!!”, “??”, “?!”) and whether the text ends with an exclamation or question mark.
• The number of occurrences of each class of emoticons (i.e., positive and negative) and whether the last token of the tweet is an emoticon.
• The number of positive and negative words, relative to the ElhPolar lexicon (Saralegi and Vicente, 2013), the AFINN lexicon (Nielsen, 2011), or a union of both lexicons. In a negated context, the label of a polarity word is inverted (i.e., positive words become negative words, and vice versa). Additionally, a third feature labels the tweet with the class whose number of polarity words in the text is the highest.
• The number of negated contexts.
• The number of occurrences of each Part-of-Speech tag.

2.2.2 N-gram Features

The fixed-length set of basic features is always extracted from tweets. However, tweet texts vary from one another in terms of length, number of tokens, and vocabulary used. For that reason, a process that transforms textual data into numerical feature vectors of fixed length is required. This process, known as vectorization, is performed by applying the tf-idf weighting scheme (Manning, Raghavan, and Schütze, 2008). Thus, each document (i.e., a tweet text) is represented as a vector $d = \{t_1, \ldots, t_n\} \in \mathbb{R}^V$, where $V$ is the size of the vocabulary that was built by considering word n-grams with $n \in [1, 4]$, or character n-grams with $n \in [3, 5]$ in the collection (i.e., the training set). The vector is, hence, formed by word n-grams, character n-grams, or a concatenation of word and character n-grams.

2.3 Machine Learning Classification

At the last stage, the sentiment analysis system classifies a given tweet as either P+, P, NEU, N, N+, or NONE, or assigns probabilities for each class. After receiving the feature vector as input, an L2-regularized Logistic Regression classifier assigns a class label to the tweet or a probability of belonging to a certain class. The classifier was trained on the training set, using the Scikit-learn (Pedregosa et al., 2011) implementation of the Logistic Regression algorithm.

3 Experiments

1,720 different sentiment analysis systems were trained on the training set via 5-fold cross validation, in order to find the best parameter settings, namely: negation handling,
polarity lexicon, order of word and character n-grams, and other parameters related to the vectorization process (e.g., lowercasing, frequency thresholds, etc.). The systems were sorted by their mean cross-validation score, and the top 50 ranked were then selected to build the ensemble. The training set is a collection of 7,219 tweets, each of which is tagged with one of six labels (i.e., P+, P, NEU, N, N+, and NONE). Note that the systems were trained for the six-labels evaluation, and therefore the P+ and P labels were merged into P, and the N+ and N labels were merged into N, to produce an output in accordance with the four-labels evaluation. Further description of the provided corpus, as well as of the training and test sets, can be found in (García-Cumbreras et al., 2016).

Next, the top 50 systems assigned a class label to each tweet in a collection of 1,000, which was drawn from the untagged test set with a similar class distribution to the training set. In this stage, the objective was to find the systems with the lowest absolute correlation with each other; therefore, the performance was not evaluated. Then, the less-correlated combinations of 5, 10, and 25 systems were used to build the ensembles, whose outputs correspond to the submitted experiments. These experiments are described below:

• run-1: the less-correlated combination of 5 systems, which chooses the class label that represents the majority in the predictions made by the ensemble members.
• run-2: the less-correlated combination of 10 systems, which chooses the class with the highest unweighted average probability.
• run-3: the less-correlated combination of 25 systems, which chooses the class with the highest unweighted average probability.

Tables 1 and 2 show the performance evaluation on the test set (i.e., a collection of 60,798 tweets) for six and four labels, respectively. Accuracy has been defined as the official metric for ranking the systems. In summary, the main gain occurs between the “run-1” and “run-2” experiments, with an increase of 0.5 percentage points in accuracy in the six-labels evaluation, and of 0.2 in the four-labels evaluation; instead, a negligible gain occurs between the “run-2” and “run-3” experiments, taking additionally into account the computational cost of running the latter.

Experiment  Accuracy  Macro-Precision  Macro-Recall  Macro-F1
run-1       0.614     0.471            0.531         0.499
run-2       0.619     0.476            0.535         0.504
run-3       0.620     0.477            0.532         0.503

Table 1: Performance on the test set in the six-labels evaluation

Experiment  Accuracy  Macro-Precision  Macro-Recall  Macro-F1
run-1       0.702     0.564            0.565         0.564
run-2       0.704     0.567            0.568         0.567
run-3       0.705     0.568            0.567         0.568

Table 2: Performance on the test set in the four-labels evaluation

As a final point, Table 3 shows how the overall performance is affected by the low discriminative power of the ensembles (in this case, the one that corresponds to “run-3”) for the NEU class. With this in mind, it is proposed as future work to deal with the low representativeness of the NEU class in the training data (i.e., 9.28% of tweets), in order to properly characterize this kind of tweets.

Class  Precision  Recall  F1-score
P      0.755      0.786   0.770
NEU    0.128      0.093   0.107
N      0.631      0.812   0.710
NONE   0.758      0.578   0.656

Table 3: Discriminative power for each class in the four-labels evaluation

4 Conclusion

This paper has described an ensemble-based approach for sentiment analysis of Spanish Twitter data at global level, developed in order to participate in Task 1 proposed by the organization of the TASS workshop. Three ensembles were built on the combination of sentiment analysis systems with the lowest absolute correlation with each other. The systems were adapted to the informal genre and the free writing style that characterize Twitter, in order to improve the quality of natural language analysis. In this way, the predicted class label for a particular tweet
was based on a majority rule or on the highest average probability. Experimental results showed that the less-correlated combination of 25 systems, which chose the class with the highest unweighted average probability, was the setting that best suited the task. However, there is great room for improvement in learning a proper characterization of neutral tweets.

References

Cerón-Guzmán, J. A. and E. León-Guzmán. 2016. Lexical normalization of Spanish tweets. In Proceedings of the 25th International Conference Companion on World Wide Web, WWW’16 Companion, pages 605–610. International World Wide Web Conferences Steering Committee.

García-Cumbreras, M. A., J. Villena-Román, E. Martínez-Cámara, M. C. Díaz-Galiano, M. T. Martín-Valdivia, and L. A. Ureña-López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.

Gayo-Avello, D. 2013. A meta-analysis of state-of-the-art electoral prediction from Twitter data. Soc. Sci. Comput. Rev., 31(6):649–679.

Han, B. and T. Baldwin. 2011. Lexical normalisation of short text messages: Makn sens a #Twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT’11, pages 368–378, Stroudsburg, PA, USA. Association for Computational Linguistics.

Manning, C. D., P. Raghavan, and H. Schütze. 2008. Scoring, term weighting and the vector space model. In An Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA.

Nielsen, F. Å. 2011. A new ANEW: evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: Big things come in small packages, pages 93–98.

Padró, L. and E. Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul, Turkey, May. ELRA.

Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP ’02, pages 79–86. Association for Computational Linguistics.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Saralegi, X. and I. S. Vicente. 2013. Elhuyar at TASS 2013. In Proceedings of the Sentiment Analysis Workshop at SEPLN (TASS2013), September.

Vilares, D., M. A. Alonso, and C. Gómez-Rodríguez. 2014. On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages. Journal of the Association for Information Science and Technology.
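
The two combination schemes used by the JACERONG ensembles (a majority rule over class labels, and the highest unweighted average probability) can be illustrated with the following minimal Python sketch. The function names, the tie-breaking order, and the example probabilities are assumptions introduced for illustration; the paper does not describe how ties are resolved. The last helper shows the label merging used to produce the four-labels output.

```python
from collections import Counter

LABELS = ["P+", "P", "NEU", "N", "N+", "NONE"]


def majority_vote(predictions):
    """Return the label predicted by most ensemble members."""
    counts = Counter(predictions)
    best = max(counts.values())
    # Assumption: ties are broken by the order of LABELS.
    for label in LABELS:
        if counts.get(label, 0) == best:
            return label


def average_probability(prob_rows):
    """Return the label with the highest unweighted mean probability.

    prob_rows holds one dict (label -> probability) per ensemble member.
    """
    n = len(prob_rows)
    means = {
        label: sum(row.get(label, 0.0) for row in prob_rows) / n
        for label in LABELS
    }
    return max(LABELS, key=lambda label: means[label])


def to_four_labels(label):
    """Merge P+ into P and N+ into N for the four-labels evaluation."""
    return {"P+": "P", "N+": "N"}.get(label, label)


# Hypothetical outputs of three ensemble members for one tweet.
rows = [
    {"P+": 0.10, "P": 0.50, "NEU": 0.10, "N": 0.10, "N+": 0.10, "NONE": 0.10},
    {"P+": 0.40, "P": 0.20, "NEU": 0.10, "N": 0.10, "N+": 0.10, "NONE": 0.10},
    {"P+": 0.35, "P": 0.25, "NEU": 0.10, "N": 0.10, "N+": 0.10, "NONE": 0.10},
]
print(majority_vote(["P", "N", "P", "NEU", "P+"]), average_probability(rows))
```

Note that averaging probabilities can pick a class that is not the top choice of most members: in the toy data two of three members prefer P+, yet P has the higher mean probability.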
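
The negation handling described in Section 2.1.2 of the JACERONG paper (a negated context starts at a negation word, ends at a punctuation mark, and marks the affected tokens with the “_NEG” suffix) can be sketched as follows. This is a simplified illustration under stated assumptions: the list of Spanish negation words is hypothetical, the POS-tag filter mentioned in the paper is omitted, and `n=None` is used here to mean "all tokens in the context".

```python
# Hypothetical list of Spanish negation words (not given in the paper).
NEGATION_WORDS = {"no", "nunca", "jamás", "ni", "tampoco"}
# Punctuation marks that close a negated context, as listed in the paper.
PUNCTUATION = {"!", ",", ":", "?", ".", ";"}


def mark_negation(tokens, n=None):
    """Append '_NEG' to tokens inside negated contexts.

    n=None marks every token in the context; n=k marks only the first k.
    The POS-tag restriction described in the paper is omitted here.
    """
    marked = []
    in_context = False
    affected = 0
    for token in tokens:
        if token in PUNCTUATION:
            in_context = False  # punctuation closes the context
            marked.append(token)
        elif token.lower() in NEGATION_WORDS:
            in_context = True   # a negation word opens a new context
            affected = 0
            marked.append(token)
        elif in_context and (n is None or affected < n):
            marked.append(token + "_NEG")
            affected += 1
        else:
            marked.append(token)
    return marked


tweet = ["no", "me", "gusta", ".", "genial"]
print(mark_negation(tweet))
print(mark_negation(tweet, n=1))
```

With `n=0` the function returns the tokens unchanged, which matches the remark in the text that no token is affected in that case.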
TASS 2016: Workshop on Sentiment Analysis at SEPLN, septiembre 2016, pág. 41-45

Participación de SINAI en TASS 2016∗
SINAI participation in TASS 2016

A. Montejo-Ráez M.C. Dı́az-Galiano
University of Jaén University of Jaén
23071 Jaén (Spain) 23071 Jaén (Spain)
amontejo@ujaen.es mcdiaz@ujaen.es

Resumen: Este artı́culo describe el sistema de clasificación de la polaridad utilizado
por el equipo SINAI en la tarea 1 del taller TASS 2016. Como en participaciones
anteriores, nuestro sistema se basa en un método supervisado con SVM a partir
de vectores de palabras. Dichos vectores se calculan utilizando la técnicas de deep-
learning Word2Vec, usando modelos generados a partir de una colección de tweets
expresamente generada para esta tarea y el volcado de la Wikipedia en español. Nues-
tros experimentos muestran que el uso de colecciones de datos masivos de Twitter
pueden ayudar a mejorar sensiblemente el rendimiento del clasificador.
Palabras clave: Análisis de sentimientos, clasificación de la polaridad, deep-
learning, Word2Vec
Abstract: This paper introduces the polarity classification system used by the SINAI team for Task 1 at the TASS 2016 workshop. Our approach is based on a supervised learning algorithm over vectors resulting from a weighted vector. This vector is computed using a deep-learning algorithm called Word2Vec. The algorithm is applied so as to generate a word vector from a deep neural net trained over a specific tweets collection and the Spanish Wikipedia. Our experiments show that massive data from Twitter can lead to a slight improvement in classification accuracy.
Keywords: Sentiment analysis, polarity classification, deep learning, Word2Vec, Doc2Vec

1 Introduction

This paper describes our contributions to Task 1 of the TASS workshop (Sentiment Analysis at global level), 2016 edition (García-Cumbreras et al., 2016). Our solution continues the techniques applied in TASS 2014 (Montejo-Ráez, García-Cumbreras, and Díaz-Galiano, 2014) and 2015 (Díaz-Galiano and Montejo-Ráez, 2015): deep learning to represent the text, plus a training collection built from tweets containing emoticons that express happiness or sadness. We use the Word2Vec method because it obtained the best results in previous years. We generate a weight vector for each word of the tweet with Word2Vec and average these vectors to obtain a single vector representation. Our results show that classification performance can be noticeably improved by introducing these data when generating the word model, but not when training the final polarity classifier.

The TASS 2016 task called Sentiment Analysis at global level consists of developing and evaluating systems that determine the global polarity of each tweet in the general corpus. Submitted systems must predict the polarity of each tweet using 6 or 4 class labels (fine and coarse granularity, respectively).

The rest of the paper is organised as follows. Section 2 describes the state of the art in polarity classification systems for Spanish. Section 3 describes the collection of tweets with emoticons used to train the classifier. Section 4 describes the system developed, and Section 5 the experiments carried out, the results obtained, and their analysis. Finally, the last section presents the conclusions and future work.

∗ This study is partially funded by project TIN2015-65136-C2-1-R, granted by the Ministry of Economy and Competitiveness of the Government of Spain.

2 Polarity classification in Spanish

Most polarity classification systems focus on English texts. For Spanish texts, the most complete system in terms of the linguistic techniques applied is possibly The Spanish SO Calculator (Brooke, Tofiloski, and Taboada, 2009), which, besides resolving the polarity of the classic components (adjectives, nouns, verbs, and adverbs), handles modifiers such as negation detection and intensifiers.

Deep-learning algorithms are giving good results in tasks where the state of the art seemed to have stalled (Bengio, 2009). These techniques are also applicable to natural language processing (Collobert and Weston, 2008), and there are already systems oriented to sentiment analysis, such as that of Socher et al. (2011). Machine learning algorithms are not new, but they are resurging thanks to improved techniques and the availability of the large volumes of data needed to train them effectively.

In TASS 2012, the team with the best results (Saralegi Urizar and San Vicente Roncal, 2012) presented a complete tweet pre-processing system and applied a lexicon derived from English to assign polarity to the tweets. Their results were robust for both fine granularity (65 % accuracy) and coarse granularity (71 % accuracy).

In TASS 2013, the best team (Fernández et al., 2013) placed all of its experiments in the top 10 of the results, and their combination reached first position. They presented a system with two variants: a modified version of the ranking algorithm (RA-SR) using bigrams, and a new proposal based on skipgrams. With these two variants they built sentiment lexicons and used them together with machine learning (SVM) to detect the polarity of the tweets.

In 2014, the team with the best results in TASS was ELiRF-UPV (Hurtado and Pla, 2014). They approached the task as a classification problem using SVM, with a one-vs-all strategy in which a binary system is trained for each polarity. Tweets were tokenized so that words or lemmas could be used as features, and the value of each feature was its tf-idf coefficient. They then ran a cross-validation to determine the best set of features and parameters.

The ELiRF-UPV team (Hurtado, Pla, and Buscaldi, 2015) again obtained the best results in TASS 2015 with a technique very similar to that of the previous edition (SVM, tokenization, binary classifiers, and tf-idf coefficients). In this case they used a simple voting system among a larger number of classifiers with different parameters. Their best results came from a system that combined 192 SVM systems with different configurations, using a further SVM system to perform the combination.

3 Collection of tweets with emoticons

Deep-learning algorithms need large volumes of data for training. For this reason, a collection of tweets was created specifically for polarity detection. To build it, we retrieved tweets with the following characteristics:

• They contain emoticons that express the polarity of the tweet. The following emoticons were used:
    • Positive: :) :-) :D :-D
    • Negative: :( :-(
• They contain no URLs, to avoid tweets whose main content is in the link.
• They are not retweets, to reduce the number of repeated tweets.

The tweets were captured over 22 days, from 18 July 2016 to 9 August 2016, retrieving approximately 100,000 tweets per day. As Figure 1 shows, retrieval was very homogeneous, and more than 2,000,000 tweets were obtained.
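The three retrieval criteria above can be illustrated with a minimal sketch. It is not the authors' collection code: the function names, the URL pattern, the "RT @" prefix heuristic for retweets, and the decision to discard tweets that mix positive and negative emoticons are all assumptions made for illustration.

```python
import re

# Emoticon sets as listed in Section 3.
POSITIVE = (":)", ":-)", ":D", ":-D")
NEGATIVE = (":(", ":-(")

# Hypothetical URL pattern; the paper does not say how URLs were detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def emoticon_label(text):
    """Return 'P' or 'N' according to the emoticons found, else None."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        # No emoticon at all, or both polarities present.
        # Assumption: such tweets are discarded as unlabelled or ambiguous.
        return None
    return "P" if has_pos else "N"


def keep_tweet(text, is_retweet=False):
    """Apply the three criteria; return the label, or None if rejected."""
    # Assumption: retweets are flagged by the caller or start with "RT @".
    if is_retweet or text.startswith("RT @"):
        return None
    if URL_RE.search(text):
        return None
    return emoticon_label(text)
```

For example, `keep_tweet("Qué buen día :)")` yields `"P"`, while a tweet containing a link or starting with `RT @` is rejected regardless of its emoticons.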

Figure 1: Number of tweets retrieved every 12 hours

The tweets were then filtered, removing those with fewer than 5 words, where a word is any term that contains only letters (no numbers or special characters).

In the end 1,777,279 tweets remained, classified according to the emoticon they contain as follows:

• Positive: 869,339 tweets
• Negative: 907,940 tweets

Finally, the tweets are cleaned as follows:

• Convert the text to lowercase.
• Remove mentions (user names that start with the @ character).
• Replace accented letters with their unaccented versions.
• Remove stopwords.
• Normalise words so that they do not contain long letter runs, replacing repetitions of contiguous letters so that only 3 repetitions remain.

4 System description

Word2Vec (https://code.google.com/p/word2vec/) is an implementation of the architecture for representing words as vectors in a continuous space, based on bags of words or n-grams, conceived by Tomas Mikolov et al. (Mikolov et al., 2013). Its ability to capture the semantics of words is borne out by its applicability to problems such as analogy between terms or word clustering. The method projects words into an n-dimensional space whose weights are determined by a neural network structure through an iterative training algorithm. The model can be configured to use a bag-of-words topology (CBOW) or skip-gram, which is very similar to the former but tries to predict the accompanying terms from a given term. With these topologies, given a sufficient volume of text, the representation can come to capture the semantics of each word. The number of dimensions (the length of each word vector) can be chosen freely. To compute the Word2Vec model we used the software cited above, created by the authors of the method.

As indicated, to obtain representative Word2Vec vectors for each word we have to generate a model from a large volume of text. We used the parameters that gave the best results in our 2014 participation (Montejo-Ráez, García-Cumbreras, and Díaz-Galiano, 2014). Starting from an XML dump of the articles of the Spanish Wikipedia (http://dumps.wikimedia.org/eswiki), we extracted their text. This yields about 2.2 GB of plain text, which is fed to the word2vec program with the following parameters: a window of 5 terms, the skip-gram model, and 300 dimensions, producing a model with a vocabulary of more than 1.2 million words.

As Figure 2 shows, our system classifies tweets using two learning phases. In the first, we train the Word2Vec model on a dump of the on-line encyclopedia Wikipedia, in its Spanish version, as described above. Each tweet is then represented by the vector obtained by computing the mean of the Word2Vec vectors of the words in the tweet together with their standard deviation (so each tweet vector has 600 dimensions per model). A simple normalisation is applied to the tweet beforehand, removing repeated letters and converting everything to lowercase. The second training phase uses the SVM algorithm and is trained with the collection of tweets with emoticons described in Section 3. The SVM implementation used is the linear-kernel one with SGD (Stochastic Gradient Descent) training provided by the scikit-learn library (http://scikit-learn.org/) (Pedregosa et al., 2011).
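The cleaning steps of Section 3 and the mean-plus-standard-deviation tweet representation described above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the function names, the tiny stopword list, the use of population standard deviation, the all-zero vector for tweets with no known words, and the toy `embeddings` dictionary (standing in for a trained Word2Vec model) are all hypothetical.

```python
import re
import unicodedata
from statistics import fmean, pstdev

# Hypothetical, deliberately tiny stopword list; a real one would be far larger.
STOPWORDS = {"de", "la", "el", "que", "y", "en", "un", "una"}


def clean_tweet(text):
    """Apply the cleaning steps listed in Section 3 and return the tokens."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)  # remove user mentions
    # Strip diacritics (assumption: NFD decomposition; this also maps ñ to n).
    text = "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )
    # Runs of four or more identical characters are cut down to three.
    text = re.sub(r"(\w)\1{3,}", r"\1\1\1", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tweet_vector(tokens, embeddings, dim):
    """Concatenate per-dimension mean and standard deviation (2 * dim values)."""
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not vectors:
        return [0.0] * (2 * dim)  # assumption: all-zero vector for empty tweets
    columns = list(zip(*vectors))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std (assumption)
    return means + stds
```

With 300-dimensional word vectors, `tweet_vector` returns 600 values, matching the dimensionality stated above. The resulting vectors could then be passed to a linear SVM trained with SGD, for instance scikit-learn's `SGDClassifier` with hinge loss, although the paper does not name the exact class it used.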

This solution is the one used in both variants of TASS Task 1 with 4-class prediction: the one that uses the complete tweet corpus (full test corpus) and the one that uses the balanced corpus (1k test corpus).

Figure 2: Data flow of the complete system

5 Results

We experimented with the effect on system performance of using a data collection generated by capturing tweets and labelling them according to the emoticons they contain, as described above. The full collection of more than 1.7 million tweets was used to generate a word-vector model, and we analysed its combination with the Wikipedia model. We also checked how the tweet collection affects performance when it is used to train the polarity classification model. For this purpose, 500,000 tweets were randomly selected from the collection, with their corresponding labels P (positive) or N (negative), and combined with the TASS training collection.

Table 1 shows the results in terms of Accuracy and Macro-F1. The first column indicates the data from which the word-vector models were generated: Wikipedia only (W) or Wikipedia combined with the tweets of the constructed corpus (W+T). The second column indicates how the polarity classifier was trained on the labelled texts vectorised with the models from the previous step: using only the training data provided by the organisers (TASS) or also incorporating the tweets labelled through emoticons (TASS+T).

Table 1: Results obtained on the full test set

w2v    SVM       Accuracy   Macro-F1
W      TASS      61.31 %    48.55 %
W+T    TASS      62.39 %    50.44 %
W      TASS+T    49.28 %    40.20 %
W+T    TASS+T    53.72 %    44.10 %

As can be seen, using a collection of tweets to extend the representational capacity of a word-vector model noticeably improves on the model generated from Wikipedia alone, raising accuracy from 61.31 % to 62.39 %. In contrast, using the captured tweets in the supervised training phase only leads to a drop in system performance.

This raises the question of what would happen if only the collected tweets were used to generate the word-vector model. The results are 59.05 % accuracy and 44.43 % F1. Clearly, feature-generation models based on word vectors are worth exploring.

These results improve on our figures from last year, when we obtained an accuracy of 61.19 % by combining word vectors (Word2Vec) and document vectors (Doc2Vec).

6 Conclusions and future work

From the results obtained, we find that incorporating informal text (tweets) when generating the word models is worthwhile, which makes sense in a classification task that works precisely on informal texts from the same social network. In contrast, the assumption that the emoticons in a tweet can help a classifier such as SVM to improve polarity detection turned out to be a failed hypothesis. This can be understood by looking at some of the tweets captured by the system, which show how difficult it is, even for a person, to put the meaning of a tweet in context and judge it as positive or negative when no associated emoticon is available.

As future work we plan to design a more elaborate deep neural network that also starts from both formal and informal training texts, while taking into account more advanced linguistic information such as syntax, instead of working with simple bags of words.

We also want to explore the use of networks of this kind in the classification process itself, and not only in feature generation. One possibility is to use a Deep Belief Network (DBN) (Hinton and Salakhutdinov, 2006) with an added final stage in which the examples are labelled.

References

Bengio, Yoshua. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127.

Brooke, Julian, Milan Tofiloski, and Maite Taboada. 2009. Cross-linguistic sentiment analysis: From English to Spanish. In Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, and Nikolai Nikolov, editors, RANLP, pages 50–54. RANLP 2009 Organising Committee / ACL.

Collobert, Ronan and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 160–167, New York, NY, USA. ACM.

Díaz-Galiano, M.C. and A. Montejo-Ráez. 2015. Participación de SINAI DW2Vec en TASS 2015. In Proc. of TASS 2015: Workshop on Sentiment Analysis at SEPLN. CEUR-WS.org, volume 1397.

Fernández, Javi, Yoan Gutiérrez, José M. Gómez, Patricio Martínez-Barco, Andrés Montoyo, and Rafael Muñoz. 2013. Sentiment analysis of Spanish tweets using a ranking algorithm and skipgrams. In Proc. of the TASS workshop at SEPLN 2013.

García-Cumbreras, Miguel Ángel, Julio Villena-Román, Eugenio Martínez-Cámara, Manuel Carlos Díaz-Galiano, M. Teresa Martín-Valdivia, and L. Alfonso Ureña-López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.

Hinton, Geoffrey E. and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

Hurtado, Lluís-F. and Ferran Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter. In Proc. of the TASS workshop at SEPLN 2014.

Hurtado, Lluís-F., Ferran Pla, and Davide Buscaldi. 2015. ELiRF-UPV en TASS 2015: Análisis de sentimientos en Twitter. In Proc. of TASS 2015: Workshop on Sentiment Analysis at SEPLN. CEUR-WS.org, volume 1397, pages 35–40.

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Montejo-Ráez, A., M.A. García-Cumbreras, and M.C. Díaz-Galiano. 2014. Participación de SINAI Word2Vec en TASS 2014. In Proc. of the TASS workshop at SEPLN 2014.

Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825–2830.

Saralegi Urizar, Xabier and Iñaki San Vicente Roncal. 2012. TASS: Detecting sentiments in Spanish tweets. In TASS 2012 Working Notes.

Socher, Richard, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 151–161, Stroudsburg, PA, USA. Association for Computational Linguistics.
TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 47-51

ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter
ELiRF-UPV at TASS 2016: Sentiment Analysis in Twitter

Lluís-F. Hurtado and Ferran Pla
Universitat Politècnica de València
Camí de Vera s/n
46022 València
{lhurtado, fpla}@dsic.upv.es

Abstract: This paper describes the participation of the ELiRF research group of the Universitat Politècnica de València in the TASS2016 Workshop. This workshop is a satellite event of the XXXII edition of the Annual Conference of the Spanish Society for Natural Language Processing. We describe the approaches used for the two tasks of the workshop, the results obtained, and a discussion of these results. Our participation focused primarily on exploring different approaches for combining a set of systems. Using these approaches we achieved the best results in both tasks.
Keywords: Twitter, Sentiment Analysis.

1. Introduction

Over its five editions, the Sentiment Analysis Workshop (TASS) has proposed tasks related to sentiment analysis in Twitter. Its main goal is to compare and evaluate different approaches to these tasks. It also develops freely available resources, mainly corpora annotated with polarity, topic, political tendency, and aspects, which are very useful for comparing different approaches to the proposed tasks.

This fifth edition of TASS proposes two tasks from previous editions (García-Cumbreras et al., 2016): 1) polarity determination in tweets, with different degrees of polarity intensity (6 labels and 4 labels), and 2) polarity determination for aspects in the STOMPOL corpus. This corpus consists of a set of tweets about different aspects of the political domain.

This paper summarises the participation of the ELiRF-UPV team of the Universitat Politècnica de València in all the tasks proposed in this workshop. First, we describe the approaches and resources used in each task. Next, we present the experimental evaluation and the results obtained. Finally, we present the conclusions and possible future work.

2. System description

The systems presented at TASS 2016 are based on the system developed for the previous edition, TASS 2015 (Hurtado, Pla, and Buscaldi, 2015). Many of the features and resources of that system were used in the earlier editions in which our team participated (Pla and Hurtado, 2013) (Hurtado and Pla, 2014). Tweet preprocessing follows the strategy described in our TASS 2013 work (Pla and Hurtado, 2013), which basically consists of adapting the Tweetmotif tweet tokenizer (Connor, Krieger, and Ahn, 2010) to Spanish. We also used Freeling (Padró and Stanilovsky, 2012; http://nlp.lsi.upc.edu/freeling/) as lemmatizer, named entity detector, and morphosyntactic tagger, with the corresponding modifications for the Twitter domain. With this approach, tokenization consisted of grouping all dates, punctuation marks, numbers, and web addresses. Hashtags and user mentions were kept. We considered and evaluated the use of words and lemmas as tokens, as well as named entity detection.

All tasks were approached as classification problems. We used Support Vector Machines (SVM) because of their ability to handle large numbers of features successfully. Specifically, we used two libraries, LibSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and LibLinear (http://www.csie.ntu.edu.tw/~cjlin/liblinear/), which have proven to be efficient SVM implementations that match the state of the art. The software is developed in Python, and the SVM libraries are accessed through the scikit-learn toolkit (http://scikit-learn.org/stable/) (Pedregosa et al., 2011).

In this work we exploited the combination of different classifier configurations to take advantage of their complementarity. We used the simple voting technique from previous work (Pla and Hurtado, 2013) (Pla and Hurtado, 2014b), here extended to a larger number of classifiers with different parameters and features (words, lemmas, word and lemma n-grams), as well as alternative combination strategies.

Each tweet is represented as a vector containing the tf-idf coefficients of the features considered. In all experiments, the features and classifier parameters were chosen by 10-fold cross-validation on the training set.

3. Task 1: Sentiment analysis in tweets

This task consists of determining the polarity of tweets, and the organisers defined two subtasks. The first distinguishes six polarity labels: N and N+, which express negative polarity with different intensity; P and P+, for positive polarity with different intensity; NEU for neutral polarity; and NONE for absence of polarity. The second distinguishes only 4 polarity labels: N, P, NEU, and NONE.

The corpus provided by the TASS organisers consists of a training set of 7219 tweets labelled with polarity using six labels, and a test set of 60798 tweets to which polarity must be assigned. Table 1 shows the distribution of tweets by polarity in the training set.

Polarity   # tweets   %
N          1335       18.49
N+         847        11.73
NEU        670        9.28
NONE       1483       20.54
P          1232       17.07
P+         1652       22.88
TOTAL      7219       100

Table 1: Distribution of tweets in the training set by polarity.

Starting from the proposed tokenization, we ran a 10-fold cross-validation to determine the best set of features and the model parameters. As features we tried different sizes of word and lemma n-grams. We also explored combining the models through different voting techniques to exploit their complementarity and improve final performance. Some of these techniques gave significant improvements on the same data set, as shown in (Pla and Hurtado, 2014b). In all cases we used polarity dictionaries, both of lemmas (Saralegi and San Vicente, 2013) and of words (Martínez-Cámara et al., 2013), together with the Afinn dictionary (Hansen et al., 2011) automatically translated from English to Spanish.
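The simple voting scheme mentioned in Section 2 can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, ties are broken alphabetically by assumption, and the optional `weights` argument is a hypothetical extension that scales each class's vote count, for instance by the class frequencies of Table 1.

```python
from collections import Counter

# Training-set class counts taken from Table 1.
TRAIN_COUNTS = {
    "N": 1335, "N+": 847, "NEU": 670,
    "NONE": 1483, "P": 1232, "P+": 1652,
}


def class_priors(counts):
    """Relative frequency of each class in the training set."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def vote(predictions, weights=None):
    """Return the winning label among the classifiers' predictions.

    predictions: one predicted label per classifier (at least one).
    weights: optional per-class factor applied to the vote counts
             (hypothetical extension; plain majority vote if None).
    """
    tally = Counter(predictions)

    def score(label):
        factor = 1.0 if weights is None else weights[label]
        return tally[label] * factor

    # Sorting the labels makes tie-breaking deterministic (assumption).
    return max(sorted(tally), key=score)
```

With 90 votes for P, 80 for P+, and 22 for N, the plain vote returns P, whereas weighting by the Table 1 frequencies returns P+, because P+ is the more frequent class in the training set.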

Two alternatives were considered for the task:

• run1: The first alternative combines, through a weighted voting system, the output of 192 SVM-based classifiers. The classifiers differ in the preprocessing and tokenization used, the features selected, and the values of the parameters of the SVM model itself.
  Specifically, we built all possible combinations of 8 tokenizations (lemmas or words, detecting NE or not, detecting user mentions and hashtags, ...), 4 different feature sets (words or bigrams, with and without polarity dictionaries), and 6 different values of the parameter c of the linear-kernel SVM model, that is, 8 × 4 × 6 = 192 classifiers.
  The class assigned to each tweet t is given by the following formula:

  $$\hat{c} = \operatorname*{argmax}_{c \in C} \bigl( N_t(c) \cdot P(c) \bigr) \qquad (1)$$

  where C is the set of all classes, N_t(c) is the number of classifiers that assign class c to tweet t, and P(c) is the prior probability of class c computed on the training corpus.

• run2: The second alternative explores combining the models by learning a metaclassifier. Using the outputs of the same 192 classifiers as in the previous run, a second SVM model is trained to produce the new combined output. Part of the training corpus was set aside to tune the parameters of the metamodel. This approach is the same as the one used in TASS 2015.

For the 4-label subtask, run1 was trained on the training corpus with 4 labels, whereas for run2, given the complexity of tuning the metamodel parameters, we chose to adapt the output of the 6-label subtask by merging P and P+ into P, and N and N+ into N.

Table 2 shows the Accuracy values obtained for the two subtasks. The submitted systems obtained the top two positions in both subtasks.

Subtask     Run    Accuracy
6 labels    run1   0.662
6 labels    run2   0.673
4 labels    run1   0.707
4 labels    run2   0.721

Table 2: Official results of the ELiRF-UPV team in Task 1 of the TASS-2016 competition on the test set, for 6 and 4 labels.

4. Task 2: Aspect polarity analysis in Twitter

This task consists of assigning polarity to the aspects marked in the corpus. One of its difficulties is defining the context assigned to each aspect in order to establish its polarity. For a similar problem, entity-level polarity detection in TASS 2013, we proposed a segmentation of the tweets based on a set of heuristics (Pla and Hurtado, 2013). That approach was also used for detecting the political tendency of Twitter users (Pla and Hurtado, 2014a), where it gave good results. In this work we propose a simpler approach that determines the context of each aspect through a fixed window defined to the left and right of the aspect instance. This is the approach used in our TASS 2015 system, which uses windows of different lengths. The optimal window length was determined experimentally on the training set by cross-validation. To train our system we considered only the training set, determined the segments for each aspect, and followed an approach similar to that of Task 1.

The corpus of the task, the STOMPOL corpus, is composed of a set of tweets related to a series of political aspects (such as economy, health, etc.) framed in the political campaign of the 2015 Andalusian elections.

Each aspect is related to one or more entities, each corresponding to one of the main political parties in Spain (PP, PSOE, IU, UPyD, Cs, and Podemos). The corpus consists of 1,284 tweets, split into a training set (784 tweets) and an evaluation set (500 tweets).

4.1. Approach and results

We now briefly describe the characteristics of our system and the process followed in the training phase. The system uses an SVM-based classifier. Only the training set provided for the task and the polarity dictionaries described above are used to learn the models. Before training, we determine the tweet segments that form the context of each aspect present. Three window sizes were considered: 5, 7, and 10 words to the left and right of the aspect. Each segment is tokenized, and Freeling is used to determine its lemmas and certain entities. Different models are then learned by combining window sizes, model parameters, and different features (words, lemmas, NE, etc.). The best model is chosen by cross-validation. For this task we submitted a single model.

Corpus     Run    Accuracy
STOMPOL    run1   0.633

Table 3: Official results of the ELiRF-UPV team in Task 2 of the TASS-2016 competition for the STOMPOL corpus.

Table 3 presents the results obtained for Task 2, with which our approach obtained first position in the task.

5. Conclusions and future work

This paper has presented the participation of the ELiRF-UPV group in the 2 tasks proposed at TASS 2016. Our team used approaches based on support vector machines and focused mainly on combining different systems.

Looking at the number of participants and the results obtained in the last two editions of TASS, we believe that we are close to the best results achievable in the Sentiment Analysis task as it has been posed so far.

In view of the good results obtained by combining systems, as future work we plan to develop new, more sophisticated system combination methods and to include other, more heterogeneous classification paradigms (other than SVM) to increase the complementarity of the combined systems.

We also intend to extend the system to other languages. The system described here has already been used, with slight modifications, in sentiment analysis tasks for English in the Semeval competition (Martínez, Pla, and Hurtado, 2016), although with results not as satisfactory as in the TASS tasks.

Acknowledgements

This work has been partially funded by MINECO through the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (TIN2014-54288-C4-3-R).

References

Connor, Brendan O, Michel Krieger, and David Ahn. 2010. Tweetmotif: Exploratory search and topic summarization for Twitter. In William W. Cohen and Samuel Gosling, editors, Proceedings of the Fourth International Conference on Weblogs and Social Media, ICWSM 2010, Washington, DC, USA, May 23-26, 2010. The AAAI Press.

García-Cumbreras, Miguel Ángel, Julio Villena-Román, Eugenio Martínez-Cámara, Manuel Carlos Díaz-Galiano, M. Teresa Martín-Valdivia, and L. Alfonso Ureña-López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.

Hansen, Lars Kai, Adam Arvidsson, Finn Årup Nielsen, Elanor Colleoni, and Michael Etter. 2011. Good friends, bad news-affect and virality in Twitter. In
50
ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter

Future information technology. Springer, Pla, Ferran y Lluı́s-F. Hurtado. 2014b. Sen-
páginas 34–43. timent analysis in twitter for spanish. En
Elisabeth Métais Mathieu Roche, y Ma-
Hurtado, Lluı́s F., Ferran Pla, y Davide Bus-
guelonne Teisseire, editores, Natural Lan-
caldi. 2015. Elirf-upv en tass 2015: Análi-
guage Processing and Information Sys-
sis de sentimientos en twitter. En SEPLN.
tems, volumen 8455 de Lecture Notes in
Hurtado, LLuı́s F y Ferran Pla. 2014. Elirf- Computer Science. Springer International
upv en tass 2014: Análisis de sentimien- Publishing, páginas 208–213.
tos, detección de tópicos y análisis de Saralegi, Xabier y Iñaki San Vicente. 2013.
sentimientos de aspectos en twitter. En Elhuyar at tass 2013. En Proceedings of
TASS2014. the TASS workshop at SEPLN 2013. IV
Martı́nez, Vı́ctor, Ferran Pla, y Lluı́s-F Hur- Congreso Español de Informática.
tado. 2016. Dsic-elirf at semeval-2016
task 4: Message polarity classification in
twitter using a support vector machine ap-
proach.
Martı́nez-Cámara, E., M. T. Martı́n-
Valdivia, M. D. Molina-gonzález, y
L. A. Ureña-lópez. 2013. Bilingual
Experiments on an Opinion Comparable
Corpus. En Proceedings of the 4th Works-
hop on Computational Approaches to
Subjectivity, Sentiment and Social Media
Analysis, página 87–93.
Padró, Lluı́s y Evgeny Stanilovsky. 2012.
Freeling 3.0: Towards wider multilingua-
lity. En Proceedings of the Langua-
ge Resources and Evaluation Conference
(LREC 2012), Istanbul, Turkey, May. EL-
RA.
Pedregosa, F., G. Varoquaux, A. Gramfort,
V. Michel, B. Thirion, O. Grisel, M. Blon-
del, P. Prettenhofer, R. Weiss, V. Du-
bourg, J. Vanderplas, A. Passos, D. Cour-
napeau, M. Brucher, M. Perrot, y E. Du-
chesnay. 2011. Scikit-learn: Machine lear-
ning in Python. Journal of Machine Lear-
ning Research, 12:2825–2830.
Pla, Ferran y Lluı́s-F Hurtado. 2013. Tass-
2013: Análisis de sentimientos en twitter.
En Proceedings of the TASS workshop at
SEPLN 2013. IV Congreso Español de In-
formática.
Pla, Ferran y Lluı́s-F. Hurtado. 2014a. Po-
litical tendency identification in twitter
using sentiment analysis techniques. En
Proceedings of COLING 2014, the 25th
International Conference on Computatio-
nal Linguistics: Technical Papers, pági-
nas 183–192, Dublin, Ireland, August. Du-
blin City University and Association for
Computational Linguistics.
51
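As an illustration of the context-window step described in Section 4.1 of the ELiRF-UPV paper above, the following minimal Python sketch extracts the words within a fixed distance of an aspect for the three window sizes mentioned (5, 7 and 10). It is not the authors' implementation: the function name `aspect_context`, the whitespace tokenization and the example tweet are assumptions made for illustration, and the aspect is assumed to be a single token, whereas the paper tokenizes with Freeling.

```python
def aspect_context(tokens, aspect_index, window):
    """Return the aspect token plus up to `window` tokens on each side."""
    start = max(0, aspect_index - window)
    end = min(len(tokens), aspect_index + window + 1)
    return tokens[start:end]


# Hypothetical example tweet, tokenized on whitespace for simplicity.
tweet = "el gobierno sube los impuestos y la sanidad empeora cada dia mas".split()
aspect_index = tweet.index("sanidad")

# One candidate segment per window size considered in the paper.
contexts = {w: aspect_context(tweet, aspect_index, w) for w in (5, 7, 10)}

for window, segment in contexts.items():
    print(window, " ".join(segment))
```

Each segment could then be turned into features and a separate model trained per window size, with cross-validation used to choose among them, as the text describes.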
TASS 2016: Workshop on Sentiment Analysis at SEPLN, September 2016, pp. 53-57

GTI at TASS 2016: Supervised Approach for Aspect Based
Sentiment Analysis in Twitter∗
GTI en TASS 2016: Una aproximación supervisada para el análisis de
sentimiento basado en aspectos en Twitter

Tamara Álvarez-López, Milagros Fernández-Gavilanes, Silvia Garcı́a-Méndez,
Jonathan Juncal-Martı́nez, Francisco Javier González-Castaño
GTI Research Group, AtlantTIC
University of Vigo, 36310 Vigo, Spain
{talvarez,mfgavilanes,sgarcia,jonijm}@gti.uvigo.es, javier@det.uvigo.es

Resumen: Este artı́culo describe la participación del grupo de investigación GTI,
del centro AtlantTIC, perteneciente a la Universidad de Vigo, en el TASS 2016. Este
taller es un evento enmarcado dentro de la XXXII edición del Congreso Anual de
la Sociedad Española para el Procesamiento del Lenguaje Natural. En este trabajo
se propone una aproximación supervisada, basada en clasificadores, para la tarea de
análisis de sentimiento basado en aspectos. Mediante esta técnica hemos conseguido
mejorar las prestaciones de ediciones anteriores, obteniendo una solución acorde con
el estado del arte actual.
Palabras clave: Análisis de sentimiento, aspectos, SVM, aprendizaje automático,
Twitter
Abstract: This paper describes the participation of the GTI research group of
AtlantTIC, University of Vigo, in TASS 2016. This workshop is held within the
XXXII edition of the Annual Congress of the Spanish Society for Natural Language
Processing. In this work we propose a supervised, classifier-based approach to
the aspect-based sentiment analysis task. Using this technique we managed to
improve on the performance of previous editions, obtaining a solution in line with
the current state of the art.
Keywords: Sentiment analysis, aspects, SVM, machine learning, Twitter

∗ This work was partially supported by the Ministerio de Economía y Competitividad under project COINS (TEC2013-47016-C2-1-R) and by Xunta de Galicia (GRC2014/046).

1 Introduction

Social media activity has grown profusely in recent years: users post opinions and comments on Twitter and other social platforms. As a result, a huge amount of information is available that could be useful for business, for example to design marketing campaigns or to carry out any kind of business analysis.

As a consequence, research on text mining and on Sentiment Analysis (SA) has grown considerably. SA is the part of Natural Language Processing (NLP) responsible for determining the polarity of a text or a whole sentence. SA applied to Twitter has to be conducted in a restricted scenario due to the maximum length of a post. However, tweets have other elements that must be considered, such as hashtags, mentions and retweets. More specifically, aspect-based sentiment analysis (ABSA) consists of extracting opinions, i.e. determining the sentiment polarity, about specific entities in the text (Liu, 2012). This task is therefore a challenge in the field of NLP.

The TASS Workshop (García-Cumbreras et al., 2016) and the SEPLN conference offer participants an opportunity to learn about the latest advances in NLP for the Spanish language.

Many approaches to SA can be found in the literature, where it is possible to distinguish between knowledge-based approaches (Brooke, Tofiloski, and Taboada, 2009; Fernández-Gavilanes et al., 2016), which use grammars and thesauri, and approaches based on machine learning (Mohammad, Kiritchenko, and Zhu, 2013). In recent years deep learning approaches (Bengio, 2009) have also been applied to this task.

We present our supervised machine learning (ML) system, which consists of a Support Vector Machine (SVM) classifier. Our objective is to conduct the SA process at the aspect level (Task 2), determining the polarity of a specific given part of a sentence.

The article is structured as follows. Section 2 reviews the research on SA in the Twitter domain. Section 3 describes the applied approach and the implemented system. Section 4 shows the experimental results of our system. Finally, Section 5 presents the conclusions and future work.

2 Related work

A large body of literature on Opinion Mining (OM) and SA can be found (Pang and Lee, 2008; Martínez-Cámara et al., 2016). Most of the systems are applied to Twitter, although others are applied to other social media platforms within the micro-blog context. As a result, the approaches vary both technically and in purpose.

Two main approaches exist in SA: supervised and unsupervised learning. Supervised systems implement classification methods such as SVM, Logistic Regression (LR), Conditional Random Fields (CRF), K-Nearest Neighbors (KNN), etc. Cui, Mittal, and Datar (2006) affirmed that SVMs are more appropriate for sentiment classification than generative models, due to their capability for working with ambiguity, that is, dealing with mixed feelings. Supervised algorithms are used when the number of classes, as well as representative members of each class, are known.

Unsupervised systems rely on linguistic knowledge, such as lexicons and syntactic features, to infer the polarity (Paltoglou and Thelwall, 2012). These techniques are a more effective approach in the cross-domain context and for multilingual applications. Unsupervised classification algorithms do not work with a training set; instead, some of them use clustering algorithms to distinguish groups (Li and Liu, 2010).

As noted earlier, the special case of applying SA to Twitter has been thoroughly addressed (Pak and Paroubek, 2010; Han and Baldwin, 2011). Among the chosen solutions, we highlight the text normalization approach (Fabo, Cuadros, and Etchegoyhen, 2013) and the use of key elements in the classification approach (Wang et al., 2011). Others advocate the advantages of using deep learning techniques in this task (dos Santos and Gatti, 2014).

Regarding the purpose of the developed systems, applications include the classification of product reviews, political sentiment analysis and the prediction of election results (Bermingham and Smeaton, 2011), among others.

3 System Overview

In this section we briefly describe the system submitted for Task 2: Aspect-based sentiment analysis. We developed a supervised system based on an SVM classifier using different features. The following subsections explain the steps required.

3.1 Preprocessing

Before applying any supervised approach to our corpus, some preprocessing is needed. First of all, we have to normalize the text, since Twitter language contains abbreviations, mentions, hashtags, URLs and misspellings. To do so, we replace URLs with the “URL” tag, and we replace abbreviations and misspellings with the correct full word. Mentions and hashtags are kept unchanged, except that the “@” or “#” symbol is deleted. Moreover, when a hashtag is composed of several words, we split it and treat the words as different tokens.

After this, a lexical analysis is carried out. It consists of lemmatization and POS tagging, which are performed by means of the Freeling tool (Atserias et al., 2006).

Once the texts have been lexically analysed, we separate the sentences by their different aspects. To do so, the scope of each aspect is determined by applying the following rules, which are adapted from our English aspect-based sentiment analysis system (Alvarez-López et al., 2016):

• If there is only one aspect in the sentence, we keep the sentence unchanged and use it in its entirety as input for the next step.

• If there are multiple aspects, we separate the sentences by punctuation marks, conjunctions or other aspects found.

• If there are several aspects with no words between them, we consider that they belong to the same context, and assign the same polarity to all of them.

3.2 SVM classifier

In this section we describe the strategy followed to determine the sentiment (positive, negative or neutral) for each aspect predefined in the corpus.

We develop an SVM classifier using the libsvm library (Chang and Lin, 2011). The inputs for the SVM are the sentences separated by contexts, as explained in the previous subsection. The features extracted are the following:

• Word tokens of nouns, adjectives and verbs in the sentence.

• Lemmas of verbs, nouns and adjectives that appear in each sentence.

• POS tags of nouns, adjectives and verbs.

• N-grams of different length, grouping the words in each sentence.

• Aspects appearing in the sentence. We join “aspect”-“entity”, defined in each target, as a feature.

• Negations. We create a negation dictionary, which contains several particles indicating negation, such as “no”, “nunca”, etc.

The previous features are all binary: the value is 1 if the feature is present in the tweet and 0 otherwise.

4 Experimental Results

Task 2: Sentiment Analysis at the aspect level consists of assigning a polarity label to each aspect, where the aspects were marked in advance in the STOMPOL corpus (Martínez-Cámara et al., 2016) provided by the TASS organization. This corpus thus provides both polarity labels and the identification of the aspects that appear in each tweet. The aim is to correctly assign to each aspect a positive, negative or neutral polarity.

The STOMPOL corpus consists of a set of Spanish tweets related to a number of political issues, such as health or economy, among others. These issues are framed in the political campaign of the 2015 Andalusian elections, where each aspect relates to one or several entities that correspond to one of the main political parties in Spain (PP, PSOE, IU, UPyD, Cs and Podemos). The corpus is composed of 1,284 tweets, and has been divided into a training set (784 tweets) and an evaluation set (500 tweets).

In order to evaluate the performance of the various features for polarity classification at the aspect level, we perform a series of ablation experiments, as shown in Table 1. We start with the word token baseline classifier, and then add all four sets of features that help to increase performance as measured by accuracy. As we might expect, including the aspect feature has the most marked effect on the performance of polarity classification, although all the features contributed to improving overall performance on the STOMPOL corpus.

Type          Accuracy   Improvement
Word token    56.12
+Lemmas       57.64      +1.52%
+POS tags     58.26      +0.62%
+Aspects      59.94      +1.68%
+Negations    60.60      +0.66%

Table 1: Results for polarity feature ablation experiments on STOMPOL corpus

Due to the low participation of research teams in Task 2 this year, we decided to compare our proposal with the systems presented this year and also with those of last year, since they use the same dataset.

For this reason, Table 2 compares the results of our approach with different official ones submitted to the 2015 and 2016 TASS editions. In this way, we compare our results with an ML approach based on the well-known squared-regularised logistic regression with a snippet of length 4 (LyS-2) described in Vilares et al. (2015), a clustering method focused on grouping authors with similar sociolinguistic insights (TID-spark) described in Park (2015), a recurrent neural network composed of a single long short-term memory and a logistic function (LyS-1) described in Vilares et al. (2015), an ML approach based on an

SVM with snippets of length 5, 7 and 10 (ELiRF) described in Hurtado, Plà, and Buscaldi (2015), and the best performing run of the current Task 2 TASS edition (ELiRF-UPV).

Experiment    Task edition   Accuracy
ELiRF-UPV     2016           63.3
ELiRF         2015           63.3
GTI           2016           60.6
LyS-1         2015           59.9
TID-spark     2015           55.7
LyS-2         2015           54.0

Table 2: Results of different approaches in 2015/2016 TASS editions on STOMPOL corpus

Comparing the results, the performance of our current model is close to that of the top-ranking systems of this year and last year.

5 Conclusions and future works

This paper describes the participation of the GTI group in TASS 2016, Task 2: Aspect-Based Sentiment Analysis. We developed a supervised system based on an SVM classifier for aspect-based sentiment analysis. The performance of our approach has been compared with the systems submitted this year and also with those submitted last year. Experimental results suggest that we need to explore new features, such as word embedding representations or paraphrase (Zhao and Lan, 2015), in order to improve performance.

As future work we plan to include the new features mentioned above and to develop a new system that combines different ML classification methods. We are also interested in considering different paradigms of heterogeneous classification, such as deep learning, to increase performance.

References

Alvarez-López, T., J. Juncal-Martínez, M. Fernández-Gavilanes, E. Costa-Montenegro, and F. J. González-Castaño. 2016. GTI at SemEval-2016 Task 5: SVM and CRF for aspect detection and unsupervised aspect-based sentiment analysis. Proceedings of SemEval, pages 306–311.

Atserias, J., B. Casas, E. Comelles, M. González, L. Padró, and M. Padró. 2006. Freeling 1.3: Syntactic and semantic services in an open-source NLP library. In Proceedings of LREC, volume 6, pages 48–55.

Bengio, Y. 2009. Learning deep architectures for AI. Found. Trends Mach. Learn., 2(1):1–127, January.

Bermingham, A. and A. F. Smeaton. 2011. On using Twitter to monitor political sentiment and predict election results.

Brooke, J., M. Tofiloski, and M. Taboada. 2009. Cross-linguistic sentiment analysis: From English to Spanish. In G. Angelova, K. Bontcheva, R. Mitkov, N. Nicolov, and N. Nikolov, editors, RANLP, pages 50–54. RANLP 2009 Organising Committee / ACL.

Chang, C.-C. and C.-J. Lin. 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27.

Cui, H., V. Mittal, and M. Datar. 2006. Comparative experiments on sentiment classification for online product reviews. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, AAAI’06, pages 1265–1270. AAAI Press.

dos Santos, C. N. and M. Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In COLING, pages 69–78.

Fabo, P. R., M. Cuadros, and T. Etchegoyhen. 2013. Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit distances, and language models. In Proceedings of the Tweet Normalization Workshop co-located with 29th Conference of the Spanish Society for Natural Language Processing (SEPLN 2013), Madrid, Spain, September 20th, 2013, pages 59–63.

Fernández-Gavilanes, M., T. Álvarez-López, J. Juncal-Martínez, E. Costa-Montenegro, and F. J. González-Castaño. 2016. Unsupervised method for sentiment analysis in online texts. Expert Systems with Applications, 58:57–75.

García-Cumbreras, M. A., J. Villena-Román, E. Martínez-Cámara, M. C. Díaz-Galiano, M. T. Martín-Valdivia, and L. A. Ureña-López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016), Salamanca, Spain, September.

Han, B. and T. Baldwin. 2011. Lexical normalisation of short text messages: Makn sens a #twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, pages 368–378, Stroudsburg, PA, USA. Association for Computational Linguistics.

Hurtado, L. F., F. Plà, and D. Buscaldi. 2015. ELiRF-UPV en TASS 2015: Análisis de sentimientos en Twitter. In Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located with 31st SEPLN Conference (SEPLN 2015), Alicante, Spain, September 15, 2015, pages 75–79.

Li, G. and F. Liu. 2010. A clustering-based approach on sentiment analysis. In Intelligent Systems and Knowledge Engineering (ISKE), 2010 International Conference on, pages 331–337. IEEE.

Liu, B. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Martínez-Cámara, E., M. A. García-Cumbreras, J. Villena-Román, and J. García-Morera. 2016. TASS 2015 - the evolution of the Spanish opinion mining systems. Procesamiento del Lenguaje Natural, 56:33–40.

Mohammad, S. M., S. Kiritchenko, and X. Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013), Atlanta, Georgia, USA, June.

Pak, A. and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In N. Calzolari (Conference Chair), K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, and D. Tapias, editors, Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta, May. European Language Resources Association (ELRA).

Paltoglou, G. and M. Thelwall. 2012. Twitter, MySpace, Digg: Unsupervised sentiment analysis in social media. ACM Transactions on Intelligent Systems and Technology (TIST), 3(4):66.

Pang, B. and L. Lee. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2(1-2):1–135, January.

Park, S. 2015. Sentiment classification using sociolinguistic clusters. In Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located with 31st SEPLN Conference (SEPLN 2015), Alicante, Spain, September 15, 2015, pages 99–104.

Vilares, D., Y. Doval, M. A. Alonso, and C. Gómez-Rodríguez. 2015. LyS at TASS 2015: Deep learning experiments for sentiment analysis on Spanish tweets. In Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located with 31st SEPLN Conference (SEPLN 2015), Alicante, Spain, September 15, 2015, pages 47–52.

Wang, X., F. Wei, X. Liu, M. Zhou, and M. Zhang. 2011. Topic sentiment analysis in Twitter: A graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 1031–1040, New York, NY, USA. ACM.

Zhao, J. and M. Lan. 2015. ECNU: Leveraging word embeddings to boost performance for paraphrase in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 34–39, Denver, Colorado, June. Association for Computational Linguistics.
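To make the normalization step of Section 3.1 of the GTI paper more concrete, here is a minimal Python sketch of URL replacement, removal of the “@” and “#” symbols, hashtag splitting and abbreviation expansion. It is only an illustration under stated assumptions, not the authors' code: the abbreviation table `ABBREVIATIONS` is hypothetical (the paper does not list its entries), and splitting hashtags on capital letters is an assumed heuristic, since the paper does not say how multi-word hashtags are segmented.

```python
import re

# Hypothetical abbreviation table; the paper does not publish its entries.
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumed heuristic: split a hashtag body at capital letters and digit runs.
CAMEL_RE = re.compile(
    r"[A-ZÁÉÍÓÚÑ]?[a-záéíóúñü]+|[A-ZÁÉÍÓÚÑ]+(?![a-záéíóúñü])|\d+"
)


def split_hashtag(body):
    """Split 'SanidadPublica' into ['Sanidad', 'Publica']."""
    parts = CAMEL_RE.findall(body)
    return parts if parts else [body]


def normalize(tweet):
    """Return the normalized tokens of a tweet."""
    tokens = []
    for raw in URL_RE.sub("URL", tweet).split():
        if raw.startswith("#"):
            # Drop '#' and treat each word of the hashtag as its own token.
            tokens.extend(p for p in split_hashtag(raw[1:]) if p)
        elif raw.startswith("@"):
            # Keep the mention but delete the '@' symbol.
            if raw[1:]:
                tokens.append(raw[1:])
        else:
            # Expand known abbreviations; leave other words unchanged.
            tokens.append(ABBREVIATIONS.get(raw.lower(), raw))
    return tokens


print(normalize("@PSOE dice q la #SanidadPublica mejora http://t.co/abc"))
```

Lemmatization and POS tagging, which the paper performs with Freeling, would follow on the tokens returned by `normalize` and are deliberately left out of this sketch.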
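The binary feature representation of Section 3.2 of the GTI paper can be sketched in a similar way. The Python fragment below builds a set of features for each aspect context (word tokens, n-grams, the joined “aspect”-“entity” pair and negation particles) and turns the sets into 0/1 vectors. Several points are assumptions rather than details from the paper: the feature name prefixes, the use of bigrams only, the emission of one feature per negation particle, and the particles beyond “no” and “nunca”. Lemma and POS features, and the restriction to nouns, adjectives and verbs, require a tagger such as Freeling and are omitted here.

```python
# "no" and "nunca" come from the paper; the other particles are assumed.
NEGATIONS = {"no", "nunca", "ni", "jamás"}


def extract_features(tokens, aspect, entity, max_n=2):
    """Return the set of features present in one aspect context."""
    lowered = [t.lower() for t in tokens]
    feats = {"tok=" + t for t in lowered}
    # N-grams of length 2..max_n over the words of the context.
    for n in range(2, max_n + 1):
        for i in range(len(lowered) - n + 1):
            feats.add("ngram=" + "_".join(lowered[i:i + n]))
    # Aspect and entity joined into a single feature.
    feats.add("aspect=" + aspect + "-" + entity)
    # One feature per negation particle found in the context.
    for particle in NEGATIONS.intersection(lowered):
        feats.add("neg=" + particle)
    return feats


def vectorize(feature_sets):
    """Map feature sets to binary vectors over a shared vocabulary."""
    vocab = sorted(set().union(*feature_sets))
    index = {f: i for i, f in enumerate(vocab)}
    rows = []
    for feats in feature_sets:
        row = [0] * len(vocab)
        for f in feats:
            row[index[f]] = 1  # 1 if the feature is present, 0 otherwise
        rows.append(row)
    return vocab, rows


a = extract_features(["La", "sanidad", "no", "mejora"], "Sanidad", "PSOE")
b = extract_features(["Buena", "economia"], "Economia", "PP")
vocab, rows = vectorize([a, b])
```

The resulting rows, together with the polarity labels, could then be passed to an SVM implementation such as libsvm, which is the library the paper reports using.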