<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julio Villena Román</string-name>
          <email>jvillena@daedalus.es</email>
          <email>{jvillena, jgarcia, cdepablo}@daedalus.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janine García Morera</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Ángel García Cumbreras</string-name>
          <email>{magc, emcamara, laurena, maite}@uja.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugenio Martínez Cámara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Teresa Martín Valdivia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L. Alfonso Ureña López</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de Jaén</institution>
          ,
          <addr-line>23071 Jaén</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Daedalus, S.A.</institution>
          ,
          <addr-line>28031 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This paper describes TASS 2015, the fourth edition of the Workshop on Sentiment Analysis at SEPLN. Its main objective is to promote research on, and development of, new algorithms, resources and techniques for sentiment analysis in social media (specifically Twitter), focused on the Spanish language. This paper presents the tasks proposed for TASS 2015, the contents of the generated corpora, the participant groups, and the results together with their analysis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>TASS is an experimental evaluation workshop, a satellite event of the annual SEPLN Conference, that aims to promote research on sentiment analysis systems in social media, focused on the Spanish language. The fourth edition will be held on September 15th, 2015 at the University of Alicante, Spain.</p>
      <p>Sentiment analysis (SA) can be defined as the computational treatment of opinion, sentiment and subjectivity in texts (Pang &amp; Lee, 2002). It is a hard task because even humans often disagree on the sentiment of a given text, and it is harder still when the text has only 140 characters, as in Twitter messages (tweets).</p>
      <p>Text classification techniques, although they have been studied and improved for longer, still need more research effort and resources in order to build better models and improve current results. Polarity classification has usually been tackled following two main approaches. The first, known as the supervised approach, applies machine learning algorithms to train a polarity classifier on a labelled corpus (Pang et al. 2002). The second, known as semantic orientation or the unsupervised approach, integrates linguistic resources into a model in order to identify the valence of the opinions (Turney 2002).</p>
      <p>The aim of TASS is to provide a competitive forum where the newest research in the field of SA in social media, specifically focused on Spanish tweets, is presented and discussed by the scientific and business communities.</p>
      <p>The rest of the paper is organized as follows. Section 2 describes the corpora provided to participants. Section 3 presents the tasks of TASS 2015. Section 4 describes the participants, and the overall results are presented in Section 5. Finally, the last section draws some conclusions and outlines future directions.</p>
      <p>Published at http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with a recognized ISSN.</p>
    </sec>
    <sec id="sec-2">
      <title>Corpus</title>
      <p>TASS 2015 experiments are based on three corpora, specifically built for the different editions of the workshop.</p>
      <sec id="sec-2-1">
        <title>2.1 General corpus</title>
        <p>The general corpus contains over 68,000 tweets, written in Spanish by about 150 well-known personalities and celebrities from the worlds of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the context of extraction has a Spain-focused bias, the diverse nationality of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, gives the corpus global coverage of the Spanish-speaking world. Each tweet includes its ID (tweetid), the creation date (date) and the user ID (user). Due to restrictions in the Twitter API Terms of Service (https://dev.twitter.com/terms/api-terms), it is forbidden to redistribute a corpus that includes text contents or information about users. However, redistribution is allowed if those fields are removed and only IDs (tweet IDs and user IDs) are provided. The actual message content can be easily obtained by querying the Twitter API with the tweetid.</p>
        <p>The general corpus has been divided into a training set (about 10%) and a test set (90%). The training set was released so that participants could train and validate their models. The test corpus was provided without any tagging and has been used to evaluate the results. Naturally, participants were not allowed to use the test data from previous years to train their systems. Each tweet was tagged with its global polarity (positive, negative or neutral sentiment) or with no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no-sentiment tag (NONE).</p>
        <p>In addition, each tweet carries an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to determine whether a neutral sentiment comes from neutral keywords or whether the text contains positive and negative sentiments at the same time.</p>
        <p>Moreover, the polarity values related to the
entities that are mentioned in the text are also
included for those cases when applicable. These
values are similarly tagged with 6 possible
values and include the level of agreement as
related to each entity.</p>
        <p>This corpus is based on a selection of topics, that is, thematic areas such as "política" ("politics"), "fútbol" ("soccer"), "literatura" ("literature") or "entretenimiento" ("entertainment"). Each tweet in both the training and test sets has been assigned to one or several of these topics (most messages are associated with just one topic, due to the short length of the text).</p>
        <p>All tagging has been done
semiautomatically: a baseline machine learning
model is first run and then all tags are manually
checked by human experts. In the case of the
polarity at entity level, due to the high volume
of data to check, this tagging has just been done
for the training set.</p>
        <p>Users were journalists (periodistas),
politicians (políticos) or celebrities (famosos).
The only language involved this year was
Spanish (es).</p>
        <p>The list of topics that have been selected is
the following:
• Politics (política)
• Entertainment (entretenimiento)
• Economy (economía)
• Music (música)
• Soccer (fútbol)
• Films (películas)
• Technology (tecnología)
• Sports (deportes)
• Literature (literatura)
• Other (otros)</p>
        <p>The corpus is encoded in XML. Figure 1 shows the information of two sample tweets. The first tweet is only tagged with the global polarity, as the text contains no mention of any entity, whereas the second one is tagged with both the global polarity of the message and the polarity associated with each of the entities that appear in the text (UPyD and Foro Asturias).</p>
        <p>The aspects tagged in the Social-TV corpus include:
• Equipo - Real Madrid (Team - Real Madrid)
• Equipo (any other team)
• Jugador - Alexis Sánchez (Player - Alexis Sánchez)
• Jugador - Álvaro Arbeloa (Player - Álvaro Arbeloa)
• Jugador - Andrés Iniesta (Player - Andrés Iniesta)
• Jugador - Ángel Di María (Player - Ángel Di María)
• Jugador - Asier Ilarramendi (Player - Asier Ilarramendi)
• Jugador - Carles Puyol (Player - Carles Puyol)
• Jugador - Cesc Fábregas (Player - Cesc Fábregas)
• Jugador - Cristiano Ronaldo (Player - Cristiano Ronaldo)
• Jugador - Dani Alves (Player - Dani Alves)
• Jugador - Dani Carvajal (Player - Dani Carvajal)
• Jugador - Fábio Coentrão (Player - Fábio Coentrão)
• Jugador - Gareth Bale (Player - Gareth Bale)
• Jugador - Iker Casillas (Player - Iker Casillas)
• Jugador - Isco (Player - Isco)
• Jugador - Javier Mascherano (Player - Javier Mascherano)
• Jugador - Jesé Rodríguez (Player - Jesé Rodríguez)
• Jugador - José Manuel Pinto (Player - José Manuel Pinto)
• Jugador - Karim Benzema (Player - Karim Benzema)
• Jugador - Lionel Messi (Player - Lionel Messi)
• Jugador - Luka Modric (Player - Luka Modric)
• Jugador - Marc Bartra (Player - Marc Bartra)
• Jugador - Neymar Jr. (Player - Neymar Jr.)
• Jugador - Pedro Rodríguez (Player - Pedro Rodríguez)
• Jugador - Pepe (Player - Pepe)
• Jugador - Sergio Busquets (Player - Sergio Busquets)
• Jugador - Sergio Ramos (Player - Sergio Ramos)
• Jugador - Xabi Alonso (Player - Xabi Alonso)
• Jugador - Xavi Hernández (Player - Xavi Hernández)
• Jugador (any other player)
• Partido (Football match)
• Retransmisión (broadcast)</p>
        <p>Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. No distinction is made between cases in which the author does not express any sentiment and cases in which he/she expresses a sentiment that is neither positive nor negative.</p>
        <p>The Social-TV corpus was randomly divided into a training set (1,773 tweets) and a test set (1,000 tweets), with a similar distribution of both aspects and sentiments. The training set was released previously, and the test corpus was provided without any tagging and has been used to evaluate the results provided by the different systems.</p>
        <p>The following figure shows the information
of three sample tweets in the training set.</p>
        <p>STOMPOL (corpus of Spanish Tweets for Opinion Mining at aspect level about POLitics) is a corpus of Spanish tweets prepared for research on the challenging task of opinion mining at aspect level. The tweets were gathered from the 23rd to the 24th of April 2015, and are related to one of the following political aspects that appear in political campaigns:
• Economics (Economía): taxes, infrastructure, markets, labor policy...
• Health System (Sanidad): hospitals, public/private health system, drugs, doctors...
• Education (Educacion): state school, private school, scholarships...
• Political party (Propio_partido): anything good (speeches, electoral programme...) or bad (corruption, criticism) related to the entity
• Other aspects (Otros_aspectos): electoral system, environmental policy...</p>
        <p>Each aspect is related to one or several entities, each of which corresponds to one of the main political parties in Spain:
• Partido_Popular (PP)
• Partido_Socialista_Obrero_Español (PSOE)
• Izquierda_Unida (IU)
• Podemos
• Ciudadanos (Cs)
• Unión_Progreso_y_Democracia (UPyD)</p>
        <p>Each tweet in the corpus has been manually tagged with the sentiment polarity at aspect level by two annotators, and by a third one in case of disagreement. Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. Again, no distinction is made between no sentiment and a neutral sentiment (neither positive nor negative). Each political aspect is linked to its corresponding political party and its polarity.</p>
        <p>These three corpora will be made freely
available to the community after the workshop.
Please send an email to tass@daedalus.es filling
in the TASS Corpus License agreement with
your email, affiliation (institution, company or
any kind of organization) and a brief
description of your research objectives, and you
will be given a password to download the files
in the password protected area. The only
requirement is to include a citation to a relevant
paper and/or the TASS website.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Description of tasks</title>
      <p>First of all, we are interested in evaluating how the different approaches to SA and text classification in Spanish have evolved over these years. Therefore, the traditional task of SA at global level is repeated, reusing the same corpus, so that results can be compared. Moreover, we want to foster research on fine-grained polarity analysis at aspect level (aspect-based SA, one of the new requirements of the natural language processing market in these areas). Thus, two legacy tasks are repeated to allow comparison of results, and a new corpus has been created for the second task.</p>
      <p>Participants are expected to submit up to 3
results of different experiments for one or both
of these tasks, in the appropriate format
described below.</p>
      <p>Along with the submission of experiments, participants have been invited to submit a paper to the workshop in order to describe their experiments and discuss the results with the audience in a regular workshop session.</p>
      <p>The two proposed tasks are described next.</p>
      <p>3.1 (legacy) Task 1: Sentiment Analysis at Global Level</p>
      <p>This is the same task as in previous editions. It consists of performing an automatic polarity classification to determine the global polarity of each message in the test set of the General corpus. Participants have been provided with the training set of the General corpus so that they may train and validate their models. There are two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE).</p>
      <p>Participants are expected to submit (up to 3)
experiments for the 6-labels evaluation, but are
also allowed to submit (up to 3) specific
experiments for the 4-labels scenario.</p>
      <p>Results must be submitted in a plain text file with the following format:</p>
      <p>tweetid \t polarity</p>
      <p>where polarity can be:
• P+, P, NEU, N, N+ and NONE for the 6-labels case
• P, NEU, N and NONE for the 4-labels case.</p>
      <p>The same test corpus as in previous years is used for the evaluation, to allow for comparison among systems. Accuracy, macroaveraged precision, macroaveraged recall and macroaveraged F1-measure have been used to evaluate each run.</p>
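      <p>As an illustration of these measures, the following Python sketch computes accuracy and macroaveraged precision, recall and F1 from two aligned lists of labels. It is not the official TASS scorer: the function name macro_scores is hypothetical, and it assumes that macroaveraged F1 is the mean of the per-label F1 values (an alternative definition takes the harmonic mean of macroaveraged precision and recall).</p>
      <preformat><![CDATA[
```python
from collections import Counter


def macro_scores(gold, predicted):
    """Return accuracy and macroaveraged precision, recall and F1.

    gold and predicted are equally long lists of polarity labels,
    aligned by tweet. Assumption: macro-F1 is the mean of per-label F1.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is wrong
            false_neg[g] += 1  # g was expected but missed
    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted_n = true_pos[label] + false_pos[label]
        gold_n = true_pos[label] + false_neg[label]
        precision = true_pos[label] / predicted_n if predicted_n else 0.0
        recall = true_pos[label] / gold_n if gold_n else 0.0
        total = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / total if total else 0.0)
    return {
        "accuracy": sum(true_pos.values()) / len(gold),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }


scores = macro_scores(["P", "P", "N", "NONE"], ["P", "N", "N", "NONE"])
print(scores["accuracy"])  # 0.75
```
]]></preformat>
      <p>Macroaveraging gives every label the same weight regardless of its frequency, which matters here because the label distribution differs between the training and test sets.</p>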
      <p>Notice that there are two test sets: the complete set and the 1k set, a subset of the first one. The reason is that, to deal with the imbalanced distribution of labels between the training and test sets, a test subset of 1,000 tweets with a label distribution similar to that of the training corpus was extracted, to be used for an alternate evaluation of the performance of systems.</p>
      <p>3.2 (legacy) Task 2: Aspect-based sentiment analysis</p>
      <p>Participants have been provided with a corpus tagged with a series of aspects, and systems must identify the polarity at the aspect level. Two corpora have been provided: the Social-TV corpus, used in TASS 2014, and the new STOMPOL corpus, collected in 2015 (described above). Both corpora have been split into a training set, for building and validating the systems, and a test set, for evaluation.</p>
      <p>Participants are expected to submit up to 3
experiments for each corpus, each in a plain
text file with the following format:
tweetid \t aspect \t polarity
[for the Social-TV corpus]
tweetid \t aspect-entity \t polarity
[for the STOMPOL corpus]
Allowed polarity values are P, N and NEU.</p>
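      <p>A submission line in this three-column format can be checked with a few lines of Python. The sketch below is only illustrative: parse_task2_line is a hypothetical helper, the tweet ID in the usage line is made up, and whitespace around the tab separators is assumed to be optional.</p>
      <preformat><![CDATA[
```python
ALLOWED_POLARITIES = {"P", "N", "NEU"}


def parse_task2_line(line):
    """Parse one tab-separated submission line into (tweetid, aspect, polarity)."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    tweetid, aspect, polarity = fields
    if polarity not in ALLOWED_POLARITIES:
        raise ValueError(f"polarity must be one of P, N, NEU, got {polarity!r}")
    return tweetid, aspect, polarity


# Hypothetical tweet ID; the aspect name is taken from the Social-TV list.
print(parse_task2_line("123456\tPartido\tNEU\n"))
```
]]></preformat>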
      <p>For evaluation, a single label combining
"aspect-polarity" has been considered. Similarly
to the first task, accuracy, macroaveraged
precision, macroaveraged recall and
macroaveraged F1-measure have been
calculated for the global result.</p>
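      <p>One possible reading of the combined label is sketched below in Python: each (aspect, polarity) pair is joined into a single string, and a prediction counts as correct only if both parts match. The separator, the function names and the assumption that gold and predicted pairs are aligned one to one are illustrative choices, not details given by the task definition.</p>
      <preformat><![CDATA[
```python
def combine(aspect, polarity):
    """Join aspect and polarity into one label, e.g. 'Partido|P'."""
    return f"{aspect}|{polarity}"


def combined_accuracy(gold_pairs, predicted_pairs):
    """Share of positions where both the aspect and the polarity match."""
    if len(gold_pairs) != len(predicted_pairs) or not gold_pairs:
        raise ValueError("inputs must be non-empty and aligned")
    hits = sum(
        combine(*gold) == combine(*pred)
        for gold, pred in zip(gold_pairs, predicted_pairs)
    )
    return hits / len(gold_pairs)


gold = [("Partido", "P"), ("Equipo", "N")]
predicted = [("Partido", "P"), ("Equipo", "NEU")]
print(combined_accuracy(gold, predicted))  # 0.5
```
]]></preformat>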
    </sec>
    <sec id="sec-4">
      <title>Participants and Results</title>
      <p>This year 35 groups registered (as compared to 31 groups last year) but unfortunately only 17 groups (14 last year) sent their submissions. The list of active participant groups is shown in Table 2, including the tasks in which they have participated.</p>
      <p>Fourteen of the seventeen participant groups sent a report describing their experiments and the results achieved. Papers were reviewed and included in the workshop proceedings. References are listed in Table 3.</p>
      <p>Table 2. Participant groups: LIF, ELiRF, GSI, LyS, DLSI, GTI-GRAD, ITAINNOVA, SINAI-ESMA, CU, TID-spark, BittenPotato, SINAI_wd2v, DT, GAS-UCR, UCSP, SEDEMO, INGEOTEC (17 groups in total).</p>
      <p>Submitted runs and results for Task 1, evaluation based on 5 polarity levels with the whole General test corpus, are shown in Table 4. Accuracy, macroaveraged precision, macroaveraged recall and macroaveraged F1-measure have been used to evaluate each individual label and to rank the systems.</p>
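      <table-wrap>
        <label>Table 4</label>
        <caption>
          <p>Results for Task 1, 5 polarity levels, whole General test corpus</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run Id</th><th>Acc</th></tr>
          </thead>
          <tbody>
            <tr><td>LIF-Run-3</td><td>0.672</td></tr>
            <tr><td>LIF-Run-2</td><td>0.654</td></tr>
            <tr><td>ELiRF-run3</td><td>0.659</td></tr>
            <tr><td>LIF-Run-1</td><td>0.628</td></tr>
            <tr><td>ELiRF-run1</td><td>0.648</td></tr>
            <tr><td>ELiRF-run2</td><td>0.658</td></tr>
            <tr><td>GSI-RUN-1</td><td>0.618</td></tr>
            <tr><td>run_out_of_date</td><td>0.673</td></tr>
            <tr><td>GSI-RUN-2</td><td>0.610</td></tr>
            <tr><td>GSI-RUN-3</td><td>0.608</td></tr>
            <tr><td>LyS-run-1</td><td>0.552</td></tr>
            <tr><td>DLSI-Run1</td><td>0.595</td></tr>
            <tr><td>Lys-run-2</td><td>0.568</td></tr>
            <tr><td>GTI-GRAD-Run1</td><td>0.592</td></tr>
            <tr><td>Ensemble exp1.1</td><td>0.535</td></tr>
            <tr><td>SINAI-EMMA-1</td><td>0.502</td></tr>
            <tr><td>INGEOTEC-M1</td><td>0.488</td></tr>
            <tr><td>Ensemble exp3_emotions</td><td>0.549</td></tr>
            <tr><td>CU-Run-1</td><td>0.495</td></tr>
            <tr><td>TID-spark-1</td><td>0.462</td></tr>
            <tr><td>BP-wvoted-v2_1</td><td>0.534</td></tr>
            <tr><td>Ensemble exp2_emotions</td><td>0.524</td></tr>
          </tbody>
        </table>
      </table-wrap>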
      <p>As previously described, an alternate evaluation of the performance of systems was done using a selected test subset containing 1,000 tweets with a distribution similar to that of the training corpus. Results are shown in Table 5.</p>
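      <p>The exact selection procedure for this subset is not detailed here. One way to obtain a subset whose label distribution follows the training corpus is proportional sampling per label, sketched below in Python; sample_like, its parameters and the fixed seed are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import random
from collections import Counter, defaultdict


def sample_like(train_labels, test_items, size, seed=0):
    """Pick about `size` tweet IDs from test_items, matching train label shares.

    train_labels: list of labels in the training set.
    test_items: list of (tweetid, label) pairs from the test set.
    """
    rng = random.Random(seed)
    train_counts = Counter(train_labels)
    total = sum(train_counts.values())
    pools = defaultdict(list)
    for tweetid, label in test_items:
        pools[label].append(tweetid)
    subset = []
    for label in sorted(train_counts):
        # Rounded quota; the final size can differ slightly from `size`.
        quota = round(size * train_counts[label] / total)
        pool = pools[label]
        subset.extend(rng.sample(pool, min(quota, len(pool))))
    return subset
```
]]></preformat>
      <p>Because quotas are rounded per label, and a label may have fewer test tweets than its quota, the returned subset can be slightly smaller or larger than the requested size.</p>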
      <p>In order to perform a more in-depth evaluation, results are also calculated considering only 3 polarity levels (POS, NEU, NEG) plus no sentiment (NONE), merging P and P+ into one category and N and N+ into another. The same double evaluation, using the whole test corpus and the selected subset, has been carried out; results are shown in Tables 8 and 9.</p>
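      <p>The merging step amounts to a fixed mapping from the 6 labels to 4. A minimal Python sketch follows; the names SIX_TO_FOUR and collapse are hypothetical, and the merged categories are written P and N, as in the submission format, rather than POS and NEG.</p>
      <preformat><![CDATA[
```python
SIX_TO_FOUR = {
    "P+": "P", "P": "P",
    "NEU": "NEU",
    "N": "N", "N+": "N",
    "NONE": "NONE",
}


def collapse(labels):
    """Map 6-level polarity labels onto the 4-level scheme."""
    return [SIX_TO_FOUR[label] for label in labels]


print(collapse(["P+", "N+", "NEU", "NONE"]))  # ['P', 'N', 'NEU', 'NONE']
```
]]></preformat>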
      <p>Task 2: Aspect-based Sentiment Analysis</p>
      <p>Submitted runs and results for Task 2, with the Social-TV and STOMPOL corpora, are shown in Tables 10 and 11. Accuracy, macroaveraged precision, macroaveraged recall and macroaveraged F1-measure have been used to evaluate each individual label and to rank the systems.</p>
      <p>Run Id: GSI-RUN-1, GSI-RUN-2, GSI-RUN-3</p>
      <p>TASS was the first workshop about SA focused on the processing of texts written in Spanish. Clearly this area attracts great interest from research groups and companies: this fourth edition has had a greater impact in terms of registered groups, and the number of participants that submitted experiments to the 2015 tasks has increased.</p>
      <p>In any case, the developed corpora and gold standards, together with the reports from participants, will surely be helpful for other research groups approaching these tasks.</p>
      <p>TASS corpora will be released after the workshop for free use by the research community. As of 2014, the corpora had been downloaded by more than 60 research groups, 25 of them outside Spain, coming both from academia and from private companies that use the corpus as part of their product development. We expect to reach a similar impact with this year's corpus.</p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the ATTOS (TIN2012-38536C03-0) and Ciudad2020 (INNPRONTA IPT20111006) projects from the Spanish Government, and the AORESCU project (P11-TIC7684 MO).</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>