<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Evaluating a Sentiment Analysis Approach from a Business Point of View</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Javi Fernandez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yoan Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Tomas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose M. Gomez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patricio Martínez-Barco</string-name>
          <email>patriciog@dlsi.ua.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software and Computing Systems, University of Alicante</institution>
        </aff>
      </contrib-group>
      <fpage>93</fpage>
      <lpage>98</lpage>
      <abstract>
        <p>In this paper, we describe our contribution to Task 1 (Sentiment Analysis at global level) of the TASS 2015 competition. This work presents our approach and the results obtained, focusing the evaluation and discussion on the context of business enterprises.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, with the explosion of Web 2.0, textual information has become one of the most important sources of knowledge from which to extract useful data. Texts can provide factual information, but also opinion-based information, such as reviews, emotions, and feelings. Blogs, forums and social networks, as well as second screen scenarios, offer a place for people to share information in real time. Second screen refers to the use of devices (commonly mobile devices) to provide interactive features on streaming content (such as television programmes) within a software application, or real-time video on social networking applications. These facts have motivated recent research on the identification and extraction of opinions and sentiments in user comments (UC), providing invaluable information, especially for companies willing to understand customers' perceptions of their products or services in order to take appropriate business decisions. In addition, users can find opinions about a product they are interested in, and companies and personalities can monitor their online reputation.</p>
      <p>We would like to express our gratitude for the financial support given by the Department of Software and Computer Systems at the University of Alicante, the Spanish Ministry of Economy and Competitiveness (Spanish Government) through the project grants ATTOS (TIN2012-38536-C03-03) and LEGOLANG (TIN2012-31224), the European Commission through the project grant SAM (FP7-611312), and the University of Alicante through the project "Explotacion y tratamiento de la informacion disponible en Internet para la anotacion y generacion de textos adaptados al usuario" (GRE13-15).</p>
      <p>However, processing this kind of information brings different technological challenges. The large amount of available data, its unstructured nature, and the need to avoid the loss of relevant information make its manual processing almost impossible. Nevertheless, Natural Language Processing (NLP) technologies can help to analyse these large amounts of UC automatically. Nowadays, Sentiment Analysis (SA), as an NLP task, has become a popular discipline due to its close relation to social media behaviour studies. SA is commonly used to analyse the comments that people post on social networks. It also makes it possible to identify the preferences and criteria of users about situations, events, products, brands, etc.</p>
      <p>In this work we apply SA to the social context, specifically to address Task 1 (Sentiment Analysis at global level) of the TASS 2015 challenge (www.daedalus.es/TASS2015).</p>
    </sec>
    <sec id="sec-2">
      <title>The TASS 2015 Task</title>
      <p>Published in http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with a recognised ISSN.</p>
      <p>
        This task consists of determining the global polarity of each message over the provided general-purpose test sets. A detailed description of the workshop and the mentioned task can be found in
        <xref ref-type="bibr" rid="ref3">(Villena-Roman et al., 2015)</xref>
        . The context of the workshop is also part of the second screen phenomenon, in which users generate feedback on their experiences by posting it on social media. Our approach goes in that direction, being part of the SAM (Socialising Around Media) platform (www.socialisingaroundmedia.com), where "[...] users are interacting with media: from passive and one-way to proactive and interactive. Users now comment on or recommend a TV programme and search for related information with both friends and the wider social community."
      </p>
      <p>In this paper we present our SA system. This approach builds its own sentiment resource based on annotated samples and, based on the information collected, generates a machine learning classifier to deal with the SA challenges. The paper is structured as follows: the next section reviews related work, outlining the main insights of each approach. The classification system is described in Section 3. Subsequently, Section 4 details the evaluation, focusing not just on the guidelines of the TASS competition, but also on those aspects of interest for companies. Finally, the conclusions and future work are presented in Section 5.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          Different techniques have been used for both product reviews and social content analysis to obtain lexicons of subjective words with their associated polarity. We can start by mentioning the strategy defined by Hu and Liu (2004), which starts with a set of seed adjectives ("good" and "bad") and reinforces the semantic knowledge by applying and expanding the lexicon with the synonymy and antonymy relations provided by WordNet (wordnet.princeton.edu) (Miller, 1993). As a result, an opinion lexicon composed of a list of positive and negative opinion words for English (around 6,800 words) was obtained. A similar approach was used for building WordNet-Affect
          <xref ref-type="bibr" rid="ref2">(Strapparava and Valitutti, 2004)</xref>
          , in which six basic categories of emotions (joy, sadness, fear, surprise, anger and disgust) were expanded using WordNet. Another widely used resource in SA is SentiWordNet (Esuli and Sebastiani, 2006). It was built using a set of seed words whose polarity was previously known, and expanded using similarities between glosses. The main assumption behind this approach was that "terms with similar glosses in WordNet tend to have similar polarity". The main problem of using these kinds of resources is that they do not consider the context in which the words appear. Some methods tried to overcome this issue by building sentiment lexicons using the local context of words.
        </p>
        <p>Balahur and Montoyo (2008b) built a recommender system which computed the polarity of new words using "polarity anchors" (words whose polarity is known beforehand) and Normalised Google Distance scores. The authors used as training examples opinion words extracted from "pros and cons reviews" from the same domain, using the clue that opinion words appearing in the "pros" section are positive and those appearing in the "cons" section are negative. Research carried out by these authors employed the lexical resource Emotion Triggers (Balahur and Montoyo, 2008a). Another interesting work, presented by Popescu and Etzioni (2007), extracts the polarity from the local context to compute word polarity. To this end, it uses a weighting function of the words around the context to be classified.</p>
        <p>In our approach, the context of the words is kept using skipgrams. Skipgrams are a technique whereby n-grams are formed, but in addition to allowing adjacent sequences of words, some tokens can be "skipped". The next section describes our approach in detail.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Methodology</title>
        <p>Our approach is based on the one described in (Fernandez et al., 2013). In this approach, the knowledge is extracted from a training dataset, where each document/sentence/tweet is labelled with respect to its overall polarity. A sentiment lexicon is created using the words, word n-grams and word skipgrams (Guthrie et al., 2006) extracted from the dataset (Section 3.1). In this lexicon, terms are statistically scored according to their appearance within each polarity (Section 3.2). Finally, a machine learning model is generated using the mentioned sentiment resource (Section 3.3). In the following sections this process is explained in detail.</p>
        <sec id="sec-2-2-1">
          <title>Term Extraction</title>
          <p>Each text in the dataset is processed by removing accents and converting it to lower case. Then, each text is tokenised into words, Twitter mentions (starting with @) and Twitter hashtags (starting with #). We also include combinations of punctuation symbols as terms, in order to discover some polarity-specific emoticons.</p>
          <p>To improve the recall of our system, we perform a basic normalisation of the extracted words by removing all character repetitions. In addition, we use the stems of the extracted words, obtained with the Snowball stemmer implementation.</p>
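          <p>As a rough illustration, the normalisation step described above (lower-casing, accent removal and collapsing character repetitions) can be sketched as follows; the function name and exact rules are our own, and stemming would still be applied afterwards:</p>
          <preformat>
```python
import re
import unicodedata

def normalise(token):
    """Lower-case, strip accents, and collapse character repetitions."""
    token = token.lower()
    # drop combining marks left after canonical decomposition (accents)
    token = "".join(c for c in unicodedata.normalize("NFD", token)
                    if unicodedata.category(c) != "Mn")
    # collapse any run of a repeated character to a single one
    return re.sub(r"(.)\1+", r"\1", token)

print(normalise("Graciaaaas"))  # gracias
print(normalise(":)))"))       # :)
```
          </preformat>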
          <p>Afterwards, we obtain all the possible word skipgrams from those terms by making combinations of adjacent terms and skipping some of them. Specifically, we extract k-skip-n-grams, where the maximum number of terms in the skipgram is defined by the variable n and the maximum number of skipped terms is determined by the variable k. Note that words and word n-grams are subsets of the skipgrams extracted. Figure 1 shows an example of this process.</p>
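          <p>A compact sketch of k-skip-n-gram extraction as just described (the helper below is our own; the paper's actual implementation may differ). For the stems of Figure 1 it yields exactly the fifteen 2-skip-2-grams listed there, plus the seven unigrams:</p>
          <preformat>
```python
from itertools import combinations

def skipgrams(tokens, n=2, k=2):
    """All m-grams (m = 1..n) that keep token order and skip
    at most k intermediate tokens in total."""
    grams = []
    for size in range(1, n + 1):
        for idx in combinations(range(len(tokens)), size):
            skipped = (idx[-1] - idx[0] + 1) - size  # tokens jumped over
            if skipped > k:
                continue
            grams.append(" ".join(tokens[i] for i in idx))
    return grams

stems = ["graci", "por", "tu", "apoy", "@usuario", "!", ":)"]
bigrams = [g for g in skipgrams(stems) if " " in g]
print(len(bigrams))  # 15, matching Figure 1
```
          </preformat>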
          <p>We must clarify the difference between two concepts: skipgram and skipgram occurrence. For example, the sentences "I hit the tennis ball" and "I hit the ball" both contain the skipgram "hit the ball", but there are two occurrences of that skipgram: the first one in the first example with 1 skipped term, and the second one in the second example with no skipped terms. In other words, we will consider a skipgram as a group of terms that appear near each other in the same order, allowing some other terms between them, and a skipgram occurrence as the actual appearance of that skipgram in a text.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Term Scoring</title>
          <p>In this step, we calculate a global score for each skipgram. This score is calculated using the formula in Equation 1, where T represents the set of texts in the dataset, t is a text from the dataset T, o_{s,t} represents an occurrence of skipgram s in text t, and k is a function that returns the number of skipped terms of the input skipgram occurrence.</p>
          <p>score(s) = Σ_{t ∈ T} Σ_{o_{s,t} ∈ t} 1 / (k(o_{s,t}) + 1)    (1)</p>
          <p>Figure 1: Term extraction example. Original text: "Graciaaaas por tu apoyo @usuario!! :)))". Tokenisation: Graciaaaas, por, tu, apoyo, @usuario, !!, :))). Normalisation: gracias, por, tu, apoyo, @usuario, !, :). Stemming: graci, por, tu, apoy, @usuario, !, :). Skipgrams (2-skip-2-grams): graci por, graci tu, graci apoy, por tu, por apoy, por @usuario, tu apoy, tu @usuario, tu !, apoy @usuario, apoy !, apoy :), @usuario !, @usuario :), ! :)</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Term Scoring (continued)</title>
      <p>We also calculate a polarity score for each skipgram and polarity. It is similar to the previous score, but it only takes into account the texts with a specific polarity. The formula is presented in Equation 2, very similar to Equation 1, but where p represents a specific polarity and T_p is the set of texts in the training corpus annotated with polarity p.</p>
      <p>score(s, p) = Σ_{t ∈ T_p} Σ_{o_{s,t} ∈ t} 1 / (k(o_{s,t}) + 1)    (2)</p>
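      <p>Equations 1 and 2 share the same shape: every occurrence of a skipgram contributes 1/(k+1), where k is its number of skipped terms. A small sketch (our own toy encoding of occurrences as their skipped-term counts) makes the weighting concrete, using the "hit the ball" example from the previous section:</p>
      <preformat>
```python
def occurrence_score(skipped_counts):
    """Equations 1 and 2: each occurrence of a skipgram contributes
    1 / (k + 1), where k is its number of skipped terms."""
    return sum(1.0 / (k + 1) for k in skipped_counts)

# "hit the ball": one occurrence with 1 skipped term
# ("I hit the tennis ball") and one with none ("I hit the ball").
print(occurrence_score([1, 0]))  # 0.5 + 1.0 = 1.5
```
      </preformat>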
      <p>At the end of this process we have a list of skipgrams, each with a global score and a polarity score, which forms our sentiment resource.</p>
      <sec id="sec-6-1">
        <title>Learning</title>
        <p>Once we have created our statistical sentiment resource, we generate a machine learning model. We consider each polarity as a category and each text as a training instance to build our model. For each text, we define one feature per polarity. For example, if we are categorising into positive, negative or neutral (3 categories), there will be 3 features for each document, called positive, negative, and neutral respectively.</p>
        <p>The values for these features are calculated using the sentiment resource, combining the previously calculated scores of all the skipgram occurrences in the text, to finally have one value for each feature. The formula used can be seen in Equation 3, where p represents a specific polarity, t is a text from the dataset, o_{s,t} represents an occurrence of skipgram s in text t, and k is a function that returns the number of skipped terms of the input skipgram occurrence. This formula gives more importance to occurrences with a low number of skipped terms, with a high number of occurrences in the dataset in general, and with a high number of occurrences within a specific polarity.</p>
        <p>value(p, t) = Σ_{o_{s,t} ∈ t} ( 1 / (k(o_{s,t}) + 1) · score(s, p) / (score(s, p) + 1) · score(s, p) / score(s) )    (3)</p>
        <p>
          Finally, a model is generated using the features specified and their values obtained as explained above. The machine learning algorithm selected is Support Vector Machines (SVM), due to its good performance in text categorisation tasks
          <xref ref-type="bibr" rid="ref1">(Sebastiani, 2002)</xref>
          and in previous works (Fernandez et al., 2013).
        </p>
        <sec id="sec-6-1-1">
          <title>Evaluation</title>
          <p>We performed additional experiments using different category configurations. These are the configurations chosen:</p>
          <p>Default. In this configuration, we used the categories specified in the workshop: NONE, NEU, P+, P, N+ and N.</p>
          <p>Subjectivity. In this configuration, we used only two categories: SUBJECTIVE and OBJECTIVE. The SUBJECTIVE category includes the texts that express opinions (positive, neutral and negative), and the OBJECTIVE category represents non-opinionated texts. The goal of this configuration is to discover users' messages that involve opinions.</p>
          <p>Polarity. In this experiment, we used only two categories: POSITIVE and NEGATIVE, independently of their intensity. The rest of the texts were discarded. This kind of categorisation makes it possible to simplify an analysis report into only two main points of view.</p>
          <p>Polarity+Neutral. In these experiments, only the opinionated categories were used: POSITIVE, NEUTRAL and NEGATIVE. In this case, the NEUTRAL category includes both non-opinionated and neutral texts. Business companies in some cases need to consider neutral feedback, since neutral mentions can also be considered positive for their reputation.</p>
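          <p>The four configurations amount to relabelling the six workshop categories. A sketch of that mapping (the function is ours, and the handling of NEU and NONE follows our reading of the descriptions above):</p>
          <preformat>
```python
def map_label(label, configuration):
    """Relabel a TASS category (NONE, NEU, P+, P, N+, N) for one of
    the four evaluation configurations; None means 'discarded'."""
    if configuration == "Default":
        return label
    if configuration == "Subjectivity":
        return "OBJECTIVE" if label == "NONE" else "SUBJECTIVE"
    if configuration == "Polarity":
        if label in ("P+", "P"):
            return "POSITIVE"
        if label in ("N+", "N"):
            return "NEGATIVE"
        return None  # NEU and NONE discarded
    if configuration == "Polarity+Neutral":
        if label in ("P+", "P"):
            return "POSITIVE"
        if label in ("N+", "N"):
            return "NEGATIVE"
        return "NEUTRAL"  # both NONE and NEU
    raise ValueError(configuration)

print(map_label("NEU", "Polarity+Neutral"))  # NEUTRAL
```
          </preformat>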
          <p>For the experiments, we also employed additional datasets, so we can extrapolate our conclusions to other domains. Their distribution can be seen in Table 2. These are the datasets chosen:</p>
          <p>TASS-Train and TASS-Test. These are the official train and test datasets of the TASS 2015 Workshop, respectively.</p>
          <p>Sanders. This is the Sanders Dataset (www.sananalytics.com/lab/twitter-sentiment). It consists of hand-classified tweets labelled as positive, negative or neutral.</p>
          <p>
            MR-P. This is the well-known Movie Reviews Polarity Dataset 2.0
            <xref ref-type="bibr" rid="ref2">(Pang and Lee, 2004)</xref>
            . It contains reviews of movies labelled with respect to their overall sentiment polarity (positive and negative).
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>MR-PS</title>
      <p>The Movie Reviews Sentence Polarity Dataset 1.0 (Pang and Lee, 2005). It has sentences from movie reviews labelled with respect to their polarity (positive and negative).</p>
    </sec>
    <sec id="sec-9">
      <title>MR-SS</title>
      <p>
        The Movie Reviews Subjectivity Dataset 1.0
        <xref ref-type="bibr" rid="ref2">(Pang and Lee, 2004)</xref>
        . It has sentences from movie reviews labelled with respect to their subjectivity status (subjective or objective).
      </p>
      <p>These experiments were performed combining the datasets and the configurations, using 10-fold cross-validation, as these corpora do not have a default division into train and test sets. Note that not all the datasets can be used in all configurations. For example, the Sanders dataset can be used to evaluate Polarity and Polarity+Neutral, but not Subjectivity, as its texts are not explicitly divided into not opinionated (NONE) and neutral (NEU). Table 3 shows the results obtained.</p>
      <p>First of all, it should be noted that our model does not use information outside the training dataset. Thus, it will work very well with datasets in a specific domain with similar topics; however, in small and heterogeneous datasets the results will be lower. We consider MR-SS, MR-P and MR-PS homogeneous datasets (only within the movies domain) and TASS-Train, TASS-Test and Sanders heterogeneous datasets.</p>
      <p>
        As we can see in Table 3, the best results were obtained in subjectivity detection in closed domains (MR-SS), with an F-score of 0.92. In open domains the results are noticeably worse. In our opinion, the results obtained are good enough for business purposes, as studies like
        <xref ref-type="bibr" rid="ref4">Wilson et al. (2005)</xref>
        report a human agreement of 0.82 when working with the Polarity+Neutral configuration.
      </p>
    </sec>
    <sec id="sec-10">
      <title>Discussion</title>
      <p>In addition, when evaluating subjectivity the results are significantly better when the corpus belongs to a closed domain (movies in this case), and worse in open domains. However, polarity evaluation does not seem to be as domain-dependent as subjectivity evaluation: results evaluating polarity are very similar independently of the type of dataset employed (the Movie Reviews datasets are available at www.cs.cornell.edu/people/pabo/movie-review-data).</p>
      <sec id="sec-10-1">
        <title>Conclusions</title>
        <p>In this paper, we presented our contribution to Task 1 (Sentiment Analysis at global level) of the TASS 2015 competition. The approach presented is a hybrid approach, which builds its own sentiment resource based on annotated samples and generates a machine learning model based on the information collected.</p>
        <p>Different category configurations and different datasets were evaluated to assess the performance of our approach, considering the interests of business enterprises regarding the analysis of user feedback. The results obtained are promising and encourage us to continue with this line of research.</p>
        <p>As future work we plan to train our system with different datasets, in terms of size and domain, and to combine our sentiment lexicon with existing ones (such as SentiWordNet or WordNet-Affect) to improve the recall of our approach.</p>
      </sec>
      <sec id="sec-10-2">
        <title>References</title>
        <p>Balahur, Alexandra and Andres Montoyo. 2008a. Applying a culture dependent emotion triggers database for text valence and emotion classification. Procesamiento del Lenguaje Natural, 40:107-114.</p>
        <p>Balahur, Alexandra and Andres Montoyo. 2008b. Building a Recommender System using Community Level Social Filtering. In NLPCS, pages 32-41.</p>
        <p>Esuli, Andrea and Fabrizio Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417-422.</p>
        <p>Fernandez, Javi, Yoan Gutierrez, Jose M. Gomez, Patricio Martínez-Barco, Andres Montoyo, and Rafael Muñoz. 2013. Sentiment Analysis of Spanish Tweets Using a Ranking Algorithm and Skipgrams. In XXIX Congreso de la Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN 2013), pages 133-142.</p>
        <p>Guthrie, David, Ben Allison, Wei Liu, Louise Guthrie, and Yorick Wilks. 2006. A closer look at skip-gram modelling. In Proceedings of LREC-2006, pages 1-4.</p>
        <p>Hu, Minqing and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD, pages 168-177. ACM.</p>
        <p>Miller, George A. 1993. Five papers on WordNet. Technical Report CLS-Rep-43, Cognitive Science Laboratory, Princeton University.</p>
        <p>Pang, Bo and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 271.</p>
        <p>Pang, Bo and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 115-124.</p>
        <p>Popescu, Ana-Maria and Oren Etzioni. 2007. Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining, pages 9-28. Springer.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Tables 2 and 3</title>
      <p>[Tables 2 and 3: dataset distribution and results. Recoverable fragments: datasets TASS-Train, TASS-Test, Sanders, MR-P, MR-PS, MR-SS; configurations Subjectivity, Polarity+Neutral, Polarity; NONE counts 1,483, 21,416, 2,223, 5,331.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Sebastiani</surname>
          </string-name>
          , Fabrizio.
          <year>2002</year>
          .
          <article-title>Machine learning in automated text categorization</article-title>
          .
          <source>ACM computing surveys (CSUR)</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):1-
          <fpage>47</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Strapparava</surname>
          </string-name>
          , Carlo y Alessandro Valitutti.
          <year>2004</year>
          .
          <article-title>WordNet Affect: an Affective Extension of WordNet</article-title>
          .
          <source>In LREC, volume 4, pages 1083-1086.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><surname>Villena-Román</surname>, <given-names>Julio</given-names></string-name>
          , Janine García-Morera, Miguel Á. García-Cumbreras, Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, and L. Alfonso Ureña-López.
          <year>2015</year>
          .
          <article-title>Overview of TASS 2015</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , P. Hoffmann, S. Somasundaran,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kessler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cardie</surname>
          </string-name>
          , E. Riloff, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Patwardhan</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>OpinionFinder: A system for subjectivity analysis</article-title>
          .
          <source>En Proceedings of HLT/EMNLP on Interactive Demonstrations.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>