<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using an Emotion-based Model and Sentiment Analysis Techniques to Classify Polarity for Reputation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Jorge Carrillo de Albornoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Chugur</string-name>
          <email>irina@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrique Amigo</string-name>
          <email>enrique@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Natural Language Processing and Information Retrieval Group, UNED</institution>
          ,
          <addr-line>Juan del Rosal, 16 (Ciudad Universitaria), 28040 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <abstract>
<p>Online Reputation Management is a novel and active area in Computational Linguistics. Closely related to opinion mining and sentiment analysis, it adds new features to traditional tasks like polarity detection. In this paper, we study the feasibility of applying complex sentiment analysis methods to classifying polarity for reputation. We adapt an existing emotional concept-based system for sentiment analysis to determine the polarity of tweets containing reputational information about companies. The original system has been extended to work with texts in English and in Spanish, and to include a module for filtering tweets according to their relevance to each company. The resulting UNED system for the profiling task participated in the first RepLab campaign. The experimental results show that sentiment analysis techniques are a good starting point for building systems for the automatic detection of polarity for reputation.</p>
      </abstract>
      <kwd-group>
        <kwd>Online Reputation Management</kwd>
        <kwd>Polarity for Reputation</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Emotions</kwd>
        <kwd>Word Sense Disambiguation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>
        Part of eMarketing, Online Reputation Management (ORM) has already become
an essential component of corporate communication for public figures and large
companies [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Vital for maintaining a good name and preserving
"reputational capital", ORM comprises activities that aim at building,
protecting and repairing the image of people, organizations, products, or services.
      </p>
      <p>In order to study a brand image, reputation management consultancies
perform two main tasks: monitoring and profiling. As its name suggests, the former
consists in constant (daily) monitoring of online media, seeking and analysing
information related to the entity, with the objective of detecting any topic that
might damage its image. In contrast, profiling refers to a single or periodic (e.g.,
monthly) revision of a company's reputation as it distils from news, opinions and
comments expressed in social media or the online press. Unlike monitoring, which is
essentially a real-time problem, profiling is a static study of opinions and polar
facts concerning a certain entity, extracted for a given period. Normally, this
information is contrasted with what has been said in the same period of time
about the company's potential competitors, and with the opinions about the
entity in earlier periods of time. These practical scenarios have been adopted as
tasks for RepLab 2012, the first evaluation campaign for ORM systems
(http://www.limosine-project.eu/events/replab2012). In this
paper, we will focus exclusively on the profiling task. (This research was supported
by the European Union's FP7-ICT-2011-7 Language Technologies programme,
grant nr 288024, LiMoSINe.)</p>
      <p>Although for reputation analysts profiling implies a complex of subtasks, such
as identifying the dimension of the entity's activity affected by a given content,
detecting opinion targets and determining the type of opinion holder, the basis
of adequate profiling is undoubtedly effective named entity disambiguation
and detection of polarity for reputation. The system described in the present
paper centres precisely on the problem of classifying tweets as related or
unrelated with respect to a given company (filtering) and on determining the
polarity of tweets based on the emotion concepts they contain.</p>
      <p>
        Some tasks considered in profiling are similar to research problems in
opinion mining and sentiment analysis, which comprise subjectivity detection [
        <xref ref-type="bibr" rid="ref16 ref19 ref23">23,
19, 16</xref>
        ], polarity classification [
        <xref ref-type="bibr" rid="ref10 ref18 ref21 ref24">18, 21, 10, 24</xref>
        ] and intensity classification [
        <xref ref-type="bibr" rid="ref10 ref25 ref4">10, 4,
25</xref>
        ], among others. Although opinion mining has made significant advances in
recent years, most of the work has focused on products. However, mining
and interpreting opinions about companies and individuals is generally a harder
and less understood problem, since, unlike products or services, opinions about
people and organizations cannot be structured around any fixed set of features
or aspects, requiring a more complex modelling of these entities.
      </p>
      <p>When identifying the polarity of a given content, an ORM system should assess
whether it has a negative, positive or neutral effect on the company's image. Again, this problem
is related to sentiment analysis and opinion mining, but significantly differs from
the mainstream research in these areas. First of all, what is analysed is not only
opinions, i.e. subjective content, but also facts and, more to the point, polar facts,
i.e. objective information that might have negative or positive implications for
the company's reputation. This means that ORM systems have to be able to
detect polarity also in non-opinionated texts. Secondly, focus or
perspective sometimes plays a decisive role, since the same information may be
negative from the point of view of clients and positive from the point of view of
investors. We will refer to this complex notion of polarity with the term polarity
for reputation.</p>
      <p>In this paper, we study the feasibility of using complex sentiment analysis
approaches to classify polarity for reputation. To this end, as described in
Section 2, we adapt an existing emotional concept-based system for sentiment
analysis to classify tweets with reputational information about companies. The
system is also extended to filter tweets according to their relevance to each
company.</p>
    </sec>
    <sec id="sec-2">
      <title>An emotional concept-based approach for polarity for reputation</title>
      <p>(Both companies' dimensions and the types of opinion holder are standard parameters
of the RepTrak System: http://www.reputationinstitute.com/thought-leadership/
the-reptrak-system.) The system works with texts both in English and in Spanish. It is worth
mentioning that one of the applied approaches combines our runs with the algorithm
provided by Barcelona Media (see Section 3). Our experimental results, described
in detail in Section 4, demonstrate that sentiment analysis techniques are a good
starting point to process and classify online reputation information. Finally, after
a brief discussion of the obtained results (see Section 5), we outline some lines
for future work in Section 6.</p>
      <p>
        As mentioned above, our main concern is to analyze the applicability of sentiment
analysis techniques for classifying the polarity of reputation information. To this
aim, we have adapted the approach presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for polarity and intensity
classification of opinionated texts. The main idea of this method is to extract
the WordNet concepts in a sentence that entail an emotional meaning, assign
them an emotion within a set of categories from an affective lexicon, and use
this information as the input of a machine learning algorithm. The strengths
of this approach, in contrast to other simpler strategies, are: (1) the use of
WordNet and a word sense disambiguation algorithm, which allows the system
to work with concepts rather than terms; (2) the use of emotions instead of terms
as classification attributes; and (3) the processing of negations and intensifiers to
invert, increase or decrease the intensity of the expressed emotions. This system
has been shown to outperform previous systems designed for the same task. For
filtering, we have implemented a simple approach based on a vote mechanism
and contextual information.
      </p>
      <sec id="sec-2-1">
        <title>Filtering</title>
        <p>In order to determine if a text is related to a given entity, we have implemented
a simple vote mechanism that calculates a score depending on how many entity
context words are found in the input text. The entity context is obtained from the
entity website and from the entity entry in Wikipedia, as provided by RepLab.
An input text is classified as related or not to a given entity depending on whether
the final score exceeds a certain threshold. Different thresholds have been evaluated,
ranging from the simple presence of the search query to mentions of the complete entity
name or the presence of additional entity context words. Our main objective for this
approach is to determine whether a simple word-presence method is able to correctly
determine the relatedness of the text to a given entity. It is also important to
highlight the complexity of the task, since even for humans it is often difficult to
decide whether a text is ambiguous or not with respect to a given entity, due to the
lack of context in the input text. This problem is more evident in microblogging
systems such as Twitter, where the text of the post is limited to 140 characters.</p>
        <p>
          As a first step, the system pre-processes each input text, splitting it into
sentences and isolating the tokens of each sentence using the GATE architecture [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
for English and the FreeLing library for Spanish [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In the same way, the entity
contexts, i.e. the entity website and the Wikipedia entry, are also pre-processed.
Besides, in this pre-processing step all stop words and symbols included in the
input text and the entity contexts are removed to reduce noise. Finally, the
search query and the complete entity name from the RepLab 2012 data set are
retrieved and pre-processed. The score for an input is calculated using four rules:
– Rule I: If a text contains the complete entity name, the highest score, 0.75,
is added to the input score. This decision is based on the idea that a text
that includes the complete name of a company will rarely not refer to the
entity, as complete company names are usually the most meaningful
and distinctive (e.g., Banco Santander, S.A., Bank of America Corporation,
etc.).
– Rule II: However, the most frequent way of referring to companies is by
means of short names, which are frequently used as queries, such as
"Santander" for Banco Santander, S.A. or "Bank of America" for Bank of
America Corporation. That is why, when the input text contains a sequence of
tokens identical to the search query, the system adds to the total score 2/3 of
the maximum score, i.e. 0.5. The reason for using a lower value is that we have
found the search queries to be highly ambiguous. For example, "Santander"
could be interpreted as a region of Spain or as a bank, depending on the
context. Note that in this case we use token matching rather than string
matching.
– Rule III: Due to the limited length of Twitter posts, in many cases the
string of the search query is not tokenized, but included in a hashtag or
written omitting blanks (e.g., #bankofamerica or BankofAmerica instead of
"Bank of America"), so different tokens cannot be correctly identified by
GATE and FreeLing. To solve this, we have included a further rule that
checks if the input string contains the search query after removing blanks,
and assigns 1/3 of the maximum score, i.e. 0.25.
– Rule IV: Finally, we assume that an input text that contains words from
the entity context is more probably related to the entity than one that does not:
the more words in common, the higher the probability. However, as the website and
the Wikipedia entry usually include not only domain-specific terms, but also
many generic words, for each token in the input text that matches a token
in the entity context the score of the input is increased by only 0.25.
        </p>
        <p>The score of each input text is then compared to the threshold to determine
whether the text is related to the entity or not, filtering out all input texts that
are not related to the entity.</p>
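        <p>The four scoring rules and the threshold test above can be sketched as follows.
This is an illustrative Python sketch under simplifying assumptions, not the authors'
implementation: tokenisation is reduced to whitespace splitting, all function names
are our own, and only the weights (0.75, 0.5, 0.25) and the rule logic come from
the description above. Note that, as discussed later in the paper, several rules may
fire on the same tweet.</p>

```python
def filtering_score(text_tokens, query_tokens, entity_name_tokens, context_tokens):
    """Return the relatedness score of one tweet for one entity."""
    score = 0.0
    text = " ".join(text_tokens)
    # Rule I: the complete entity name is present (+0.75).
    if " ".join(entity_name_tokens) in text:
        score += 0.75
    # Rule II: the exact token sequence of the search query is present (+0.5).
    query = " ".join(query_tokens)
    if query in text:
        score += 0.5
    # Rule III: the query appears with blanks removed, e.g. in a hashtag
    # such as #bankofamerica or in camel case (+0.25).
    if query.replace(" ", "").lower() in text.replace(" ", "").lower():
        score += 0.25
    # Rule IV: +0.25 for each input token that matches the entity context.
    score += 0.25 * sum(1 for tok in text_tokens if tok in context_tokens)
    return score

def is_related(score, threshold=0.75):
    """Threshold test; 0.5 and 0.75 are the values selected for RepLab 2012."""
    return score >= threshold
```

        <p>For example, a tweet containing only the hashtag #bankofamerica triggers
Rule III alone and scores 0.25, which falls below both selected thresholds.</p>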
      </sec>
      <sec id="sec-2-2">
        <title>Polarity for reputation</title>
        <p>
          The original method presented in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] has been modified to improve the scope
detection approach for negation and intensifiers, in order to deal with the effect of
subordinate sentences and special punctuation marks. Besides, the list and weights of
the intensifiers have been adjusted to the most frequent uses in English.
Moreover, the presented approach uses the SentiSense affective lexicon [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which was
specifically designed for opinionated texts. SentiSense attaches an emotional
category from a set of 14 emotions to WordNet concepts. It also includes the antonym
relationship between emotional categories, which makes it possible to capture the effect of
some linguistic modifiers such as negation. We have also adapted the system to
work with Spanish texts, as the original system was conceived only for English.
The method comprises four steps that are described below.
Pre-processing: POS Tagging and Concept Identification. The objective
of the first step is to translate each text into its conceptual representation, in order
to work at the concept level in the next steps and avoid word ambiguity. To
this aim, the input text is split into sentences and the tokens are tagged with
their POS using GATE for English texts and FreeLing for Spanish texts. At this
stage, the syntax tree of each sentence is also retrieved using the Stanford Parser
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] for the English texts and the FreeLing library for the Spanish texts. With
this information, the system next maps each token to its appropriate WordNet
concept using the Lesk word sense disambiguation algorithm as implemented
in the WordNet::SenseRelate package [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] for English, and the UKB algorithm
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] as included in the FreeLing library for Spanish. Besides, to enrich the
emotion identification step, the hypernyms of each concept are also extracted from
WordNet.
        </p>
        <p>
          Emotion Identification. Once the concepts are identified, the next step maps
each WordNet synset to its corresponding emotional category in the SentiSense
affective lexicon, if any. The emotional categories of the hypernyms are also
retrieved. We hypothesize that the hypernyms of a concept entail the same
emotions as the concept itself, but with the intensity of the emotion decreasing as we
move up in the hierarchy. So, when no entry is found in the SentiSense lexicon
for a given concept, the system retrieves the emotional category associated with its
nearest hypernym, if any. However, only a certain level of hypernymy is accepted,
since an excessive generalization introduces noise into the emotion
identification. This parameter has been empirically set to 3. In order to accomplish this
step for Spanish texts, we have automatically translated the SentiSense lexicon
into Spanish. To do this, we first automatically updated the synsets
in SentiSense to their WordNet 3.0 versions using the WordNet mappings. In
particular, for nouns and verbs we used the mappings provided by the WordNet
team [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and, for adjectives and adverbs, the UPC mappings [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. In this
automatic process we found only 15 labeled synsets without a direct mapping,
which were removed from the new SentiSense version. Finally, in order to
translate the English SentiSense version into Spanish, we used the Multilingual Central
Repository (MCR) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The MCR is an open source database that integrates
WordNet versions for five different languages: English, Spanish, Catalan, Basque
and Galician. Its Inter-Lingual-Index (ILI) allows the automatic translation of
synsets from one language to another.
Post-processing: Negation and Intensifiers. In this step, the system has
to detect and resolve the effect of negations and intensifiers on the emotions
discovered in the previous step. This process is important, since these linguistic
modifiers can change the polarity and intensity of the emotional meaning of
the text. It is apparent that the text "#Barclays bank may not achieve 13% on
equity target by 2013" entails a different polarity than the text "#Barclays bank may
achieve 13% on equity target by 2013", and reputation systems must be aware of
this fact.
        </p>
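        <p>The hypernym fall-back used in the Emotion Identification step can be sketched
as follows. This is a hedged illustration, not the authors' code: the lexicon and the
WordNet hierarchy are reduced to toy dictionaries and all identifiers are hypothetical;
only the fall-back logic and the maximum depth of 3 come from the text.</p>

```python
MAX_HYPERNYM_DEPTH = 3  # empirically set level of accepted generalization

def emotion_for(concept, sentisense, hypernyms):
    """Return (emotion, depth) for a concept, or (None, None) if unlabeled.

    depth is 0 when the concept itself is labeled in the lexicon, otherwise
    the number of hypernym steps climbed to reach the nearest labeled
    ancestor; the climb stops after MAX_HYPERNYM_DEPTH steps.
    """
    node = concept
    for depth in range(MAX_HYPERNYM_DEPTH + 1):
        if node is None:
            break
        if node in sentisense:
            return sentisense[node], depth
        node = hypernyms.get(node)  # parent in the WordNet hierarchy
    return None, None
```

        <p>The returned depth is what later feeds the weight formula of the
classification step, so that more general ancestors contribute weaker emotions.</p>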
        <p>To this end, our system first identifies the presence of modifiers using a list
of common negation and intensification tokens. In this list, each intensifier
is assigned a value that represents its weight or strength. The scope of each
modifier is determined using the syntax tree of the sentence in which the modifier
appears. We take as scope all descendant leaf nodes of the common ancestor
of the modifier and the word immediately after it that lie to the right of
the modifier. However, this process may introduce errors in special cases, such
as subordinate sentences or sentences containing punctuation marks. In order to
avoid this, our method includes a set of rules to delimit the scope in such cases.
These rules are based on specific tokens that usually mark the beginning of
a different clause (e.g., because, until, why, which, etc.). Since some of these
delimiters are ambiguous, their POS is used to disambiguate them. Once the
modifiers and their scope are identified, the system resolves their effect on the
emotions that they affect in the text. The effect of negation is addressed by
substituting the emotions assigned to the concepts with their antonyms. In the
case of the intensifiers, the concepts that fall within the scope of an intensifier are
tagged with the corresponding percentage weight in order to increase or diminish
the intensity of the emotions assigned to the concepts.</p>
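        <p>The basic scope rule (before the clause-delimiter refinements) can be sketched on
a toy parse tree. This is our own simplified illustration: trees are nested tuples of
the form (label, children...) rather than real Stanford Parser or FreeLing output, and
the helper names are hypothetical. It assumes the modifier is not the last leaf of the
sentence.</p>

```python
def leaves(tree):
    """All leaf words of a (label, children...) tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(leaves(child))
    return out

def path_to(tree, word):
    """Root-to-leaf path (list of nodes) to the first occurrence of word."""
    if isinstance(tree, str):
        return [tree] if tree == word else None
    for child in tree[1:]:
        p = path_to(child, word)
        if p is not None:
            return [tree] + p
    return None

def scope(tree, modifier):
    """Leaves under the lowest common ancestor of the modifier and the
    following word, restricted to those to the right of the modifier."""
    sent = leaves(tree)
    nxt = sent[sent.index(modifier) + 1]  # word immediately after the modifier
    p_mod, p_nxt = path_to(tree, modifier), path_to(tree, nxt)
    lca = None
    for a, b in zip(p_mod, p_nxt):  # last shared node on both root paths
        if a is b:
            lca = a
    lca_leaves = leaves(lca)
    return lca_leaves[lca_leaves.index(modifier) + 1:]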
        <p>
          In particular, for English texts we use the original list of negation signals
from [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and an adapted list of intensifiers reflecting the most frequent uses in
English. The percentage of each intensifier has been set empirically. In order to
determine the scope for English texts, we use the syntax tree generated by
the Stanford Parser. The same process is replicated for the Spanish texts, for which
a list of common negation tokens in Spanish (such as no, nunca, nada, nadie, etc.)
and common intensifiers (más, menos, bastante, un poco, etc.) was developed.
In order to determine the scope of each modifier, the syntax tree generated
by the FreeLing library is used.
        </p>
        <p>Classification. In the last step, all the information generated in the previous
steps is used to translate each text into a Vector of Emotional Intensities (VEI),
which will be the input to a machine learning algorithm. The VEI is a vector
of 14 positions, each of them representing one of the emotional categories of the
SentiSense affective lexicon. The values of the vector are generated as follows:
– For each concept, Ci, labeled with an emotional category, Ej, the weight of
the concept for that emotional category, weight(Ci, Ej), is set to 1.0.
– If no emotional category was found for the concept, and it was assigned the
category of its first labeled hypernym, hyperi, then the weight of the concept
is computed as:
weight(Ci, Ej) = 1 / (depth(hyperi) + 1)    (1)
– If the concept is affected by a negation and the antonym emotional category,
Eantonj, was used to label the concept, then the weight of the concept is
multiplied by a factor empirically set to 0.6 in previous studies. It is worth
mentioning that the experiments have shown that values below 0.5 decrease
performance sharply, while performance drops only gradually for values
above 0.6.
– If the concept is affected by an intensifier, then the weight of the concept is
increased or decreased by the intensifier percentage, as shown in Equation 2:
weight(Ci, Ej) = weight(Ci, Ej) × (100 + intensifier_percentage) / 100    (2)
– Finally, the position in the VEI of the emotional category assigned to the
concept is incremented by the weight previously calculated.</p>
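        <p>The VEI construction can be sketched as follows. This is a minimal
illustration under our own data layout (a list of per-concept annotation dicts);
only Equations 1 and 2, the negation factor 0.6 and the 14-category vector come
from the description above, and we assume, as in the text, that a negated
concept has already been relabeled with the antonym category.</p>

```python
NEGATION_FACTOR = 0.6  # empirically set in previous studies

def build_vei(annotated_concepts, categories):
    """annotated_concepts: list of dicts with keys 'emotion' (category name,
    already the antonym when negated), 'hypernym_depth' (0 if the concept
    itself was labeled), 'negated' (bool) and 'intensifier' (percentage,
    0 if none). Returns a category -> intensity mapping (the VEI)."""
    vei = {cat: 0.0 for cat in categories}
    for c in annotated_concepts:
        # Equation (1); depth 0 reduces to the base weight 1.0.
        weight = 1.0 / (c["hypernym_depth"] + 1)
        if c["negated"]:
            weight *= NEGATION_FACTOR
        # Equation (2): scale by the intensifier percentage.
        weight *= (100 + c["intensifier"]) / 100
        vei[c["emotion"]] += weight
    return vei
```

        <p>In the real system the 14 categories are those of SentiSense and the
resulting vector is handed to the Weka classifiers described in the evaluation.</p>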
        <p>
          Heterogeneity-Based Ranking for Polarity for Reputation.
Our last approach consists in combining our runs (Uned 1..4) with the runs
provided by Barcelona Media (BMedia 1..5) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Given that the RepLab 2012 trial
data set is not big enough for training purposes, we opted for a voting method. In
[
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], several voting algorithms for combining classifiers are described. They are
focused on the multiplicity of classifiers that support a certain decision.
However, they do not consider the diversity of the classifiers, which is strong evidence
of accuracy when combining classifiers [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. There are works in which classifiers
are selected to be combined while trying to maximize their diversity [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          We propose a voting algorithm that directly considers the diversity of
classifiers instead of the number of classifiers that corroborate a certain decision. As
far as we know, this perspective has not been applied before, due to the lack of a
diversity measure applicable to classifier sets rather than pairwise measures.
Our approach is inspired by the Heterogeneity-Based Ranking [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>We define the heterogeneity of a set of classifiers F = {f1..fn} as the
probability, over decision cases C, that there exist at least two classifiers
contradicting each other:</p>
        <p>H(F) = P_C(∃ fi, fj ∈ F : fi(c) ≠ fj(c))</p>
        <p>The approach basically consists in selecting the label (e.g., positive or neutral)
that maximizes the heterogeneity of the classifiers corroborating this decision.</p>
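        <p>The voting rule can be sketched as follows. This is our own simplified
reading of the scheme, with hypothetical names and toy data structures: H(F) is
estimated as the fraction of past decision cases on which at least two classifiers
in F disagree, and for the current case the label whose supporting subset is most
heterogeneous wins.</p>

```python
from itertools import combinations

def heterogeneity(classifier_outputs):
    """classifier_outputs: one list of past predictions per classifier.
    Returns the fraction of cases where at least two classifiers disagree."""
    if len(classifier_outputs) >= 2:
        n_cases = len(classifier_outputs[0])
        disagreements = sum(
            1 for c in range(n_cases)
            if any(a[c] != b[c] for a, b in combinations(classifier_outputs, 2))
        )
        return disagreements / n_cases
    return 0.0  # a single classifier never contradicts itself

def heterogeneity_vote(votes, history):
    """votes: {classifier_name: label for the current case};
    history: {classifier_name: list of past predictions}.
    Picks the label backed by the most heterogeneous set of supporters."""
    best_label, best_h = None, -1.0
    for label in set(votes.values()):
        supporters = [history[name] for name, lab in votes.items() if lab == label]
        h = heterogeneity(supporters)
        if h > best_h:
            best_label, best_h = label, h
    return best_label
```

        <p>Intuitively, a label endorsed by classifiers that usually disagree with each
other is stronger evidence than one endorsed by near-duplicates.</p>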
        <sec id="sec-2-2-1">
          <title>Evaluation</title>
          <p>The data set used for evaluation consists of a training set of 1800 tweets crawled
for six companies (300 per company) and a test set of 6243 tweets for 31
companies. However, since Twitter's Terms of Service do not allow redistribution of
tweets, some of them have been removed from the released version of the data
set. So, the training set that has finally been used contains 1662 tweets manually
labeled by experts, 1287 for English and 375 for Spanish. For the filtering task,
the training set comprises 1504 tweets related to one of the companies and 158
that were not related to any company. In polarity for reputation, the
distribution of tweets between classes is: positive=885, neutral=550, and negative=81.
The test set contains tweets manually labeled by experts, 3521 of them in
English and 2722 in Spanish. Besides, this set contains 4354 related tweets and
1889 unrelated tweets, while the distribution of tweets among polarity for reputation
classes is 1625, 1488 and 1241 tweets for positive, neutral and negative,
respectively. For the profiling task (i.e., combining filtering and polarity systems),
performance is evaluated as accuracy on a four-class classification problem:
irrelevant, relevant negative, relevant neutral and relevant positive. For the
individual evaluation of the filtering and polarity tasks, performance is evaluated in
terms of reliability (R) and sensitivity (S). Accuracy (% of correctly annotated
cases) is also included for comparison purposes. Overall scores (R, S, F(R,S),
accuracy) are computed as the average of individual scores per entity, assigning
the same weight to all entities.</p>
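          <p>The entity-averaged scoring described above can be sketched as follows, here
for accuracy only. This is an illustrative sketch with our own data layout: the
point is that each entity's score is computed separately and the overall score is
their unweighted mean, so every entity counts equally regardless of how many
tweets it has.</p>

```python
from collections import defaultdict

def macro_accuracy(rows):
    """rows: list of (entity, gold_label, predicted_label) triples.
    Returns accuracy averaged over entities with equal weight."""
    per_entity = defaultdict(lambda: [0, 0])  # entity -> [correct, total]
    for entity, gold, pred in rows:
        per_entity[entity][0] += int(gold == pred)
        per_entity[entity][1] += 1
    accs = [correct / total for correct, total in per_entity.values()]
    return sum(accs) / len(accs)
```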
          <p>
            As the system for polarity for reputation is a supervised method, we have
tested different machine learning algorithms with the training set in order to select
the best classifier for the task. To this end, 20 classifiers currently implemented in Weka [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] were compared.
We only show the results of the best-performing classifiers: a logistic regression
model (Logistic) and a classifier using decision trees (RandomForest). The best
outcomes for the two algorithms were obtained when using their default
parameters in Weka. For the filtering task, we have evaluated different thresholds
over the training set. In particular, we have evaluated the values 0.25 (just the
presence of the search query or the presence of an entity context word), 0.5 (the
presence of the search query plus some entity context word, or the exact match
of tokens with the search query), 0.75 (the presence of the company name, or
different combinations of the search query and entity context words) and 1.0
(multiple combinations of the search query or the company name and the entity
context words). We have found that, above this last threshold, the performance of
the system decreases sharply. For RepLab 2012, we have selected the two thresholds
that produce the best results, i.e. 0.5 and 0.75.
          </p>
          <p>Based on these considerations, five systems have been presented to RepLab
2012: Uned 1 (Logistic + threshold=0.5), Uned 2 (RandomForest + threshold=0.5),
Uned 3 (Logistic + threshold=0.75), Uned 4 (RandomForest + threshold=0.75),
and Uned 5 or Uned-BMedia, which is the system described in Section 3.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and discussion</title>
      <p>Uned 5 (Uned-BMedia) combines the outputs of the algorithms Uned 1..4 and
BMedia 1..5 using the heterogeneity-based ranking approach.</p>
      <p>
        Table 1 shows the results of our systems for the filtering task when evaluated
over the test set. As can be seen, the best performance in terms of F(R,S) is
obtained by the two approaches that use the 0.75 threshold (Uned 3 and Uned 4),
followed by the combined version Uned-BMedia and the two approaches that use
the 0.5 threshold (Uned 1 and Uned 2). In terms of accuracy, the best result is
obtained by the combined version Uned-BMedia, only 2.0 percentage points above
the two approaches that use the 0.75 threshold and 3.1 percentage points above
the 0.5 threshold approaches. Comparing these results with the All
relevant baseline, it is evident that the performance of our approaches is quite
acceptable but may be easily improved. It is important to recall the difficulty of
the task, even for humans. In fact, the best results obtained in the challenge by
our systems, Uned 3 and Uned 4, are placed 13th and 14th, respectively, out of
33 participants [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Table 2 summarizes the results obtained by our systems in terms of
accuracy, reliability and sensitivity over the test set. The best performance in terms
of F(R,S) is achieved by the combined version Uned-BMedia, while the systems
Uned 2 and Uned 4, which use the RandomForest classifier, are only 4.0
percentage points lower. The systems Uned 1 and Uned 3, which use the Logistic classifier,
achieve 8.0 percentage points less than Uned-BMedia, which is an important
difference in performance. In contrast, in terms of accuracy the best performance
is obtained by the RandomForest classifier (Uned 2 and Uned 4), followed by the
combined version Uned-BMedia and the Logistic classifier. As can be seen in the
table, all the systems improve over the All positives baseline. It is worth
noticing that the results achieved in the polarity for reputation task are much better
than those obtained in the filtering task. This is evidenced by the fact that the
combined system Uned-BMedia obtained the 2nd position among the 35 participating
systems in terms of F(R,S), and the two approaches that use the
RandomForest classifier obtained the 5th and 6th positions, respectively. Moreover, in terms
of accuracy the results reported by our systems are even better: the two
approaches that use the RandomForest classifier achieved the 1st and 2nd
positions in the ranking, while the combined version Uned-BMedia achieved
the 4th position.</p>
      <p>The results of our systems for the profiling task, i.e., combining filtering and polarity for reputation, are shown in Table 3. The best result is achieved by the system Uned 4. However, the difference with respect to Uned 3 and the combined version Uned-BMedia is not very marked: 0.6 and 1.9 percentage points of accuracy, respectively. The difference is larger when comparing to the systems Uned 2 and Uned 1: 5.6 and 7.2 percentage points of accuracy. Moreover, all systems considerably improve over the All relevant and positives baseline. As in the polarity for reputation task, the results achieved by our systems compare favorably to those of other participants. In particular, three of our systems are among the six best of the 28 systems that participated in the task (the 3rd, the 5th and the 6th for Uned 4, Uned 3 and Uned-BMedia, respectively), which shows that the proposed approaches are a good starting point for a more complex profiling system.</p>
      <p>Analysing the results obtained by the systems proposed for RepLab 2012, and starting with the filtering task, our system correctly classified 3352 out of 4354 related tweets (76.9%) and 1019 out of 1831 non-related tweets (55.6%). These results suggest that the 0.75 threshold in the vote mechanism is a good choice for discriminating between related and unrelated tweets: most of the related tweets scored above this threshold, and over half of the unrelated tweets scored below it. However, a number of unrelated tweets obtained a score far above 0.75, which suggests that the filtering method should take into account not only positive terms but also negative ones, i.e., terms that decrease the score. We have also analyzed which combinations of rules for assigning scores are the most frequent, as well as their adequacy. We found that rule II (the one that considers the presence in the input of the individual tokens within the search query) and rule III (the one that looks for the search query without blanks) are frequently triggered together, producing satisfactory results. The combination of rule III and rule IV (the presence of entity context words) is the second most frequent one; however, it is also the one that introduces the most noise. Finally, the presence of the complete entity name (rule I) is the least frequently triggered rule, but the one that produces the best performance, as expected.</p>
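      <p>As an illustration, the vote mechanism discussed above can be sketched in code. This is a minimal sketch only: the four rule types and the 0.75 acceptance threshold come from the text, but the equal rule weights and the simple substring matching are our own illustrative assumptions, not the exact system.</p>

```python
# Illustrative sketch of the rule-based filtering vote (not the exact system).
# Rules I-IV follow the description in the text; each rule casts one equal
# vote, and the normalized score is compared against the 0.75 threshold.

def filtering_score(tweet, entity_name, query_tokens, context_words):
    text = tweet.lower()
    tokens = [t.lower() for t in query_tokens]
    votes = [
        entity_name.lower() in text,                    # Rule I: complete entity name
        all(t in text for t in tokens),                 # Rule II: individual query tokens
        "".join(tokens) in text,                        # Rule III: query without blanks
        any(w.lower() in text for w in context_words),  # Rule IV: entity context words
    ]
    return sum(votes) / len(votes)

def is_related(tweet, entity_name, query_tokens, context_words, threshold=0.75):
    # Tweets scoring at or above the threshold are kept as related.
    return filtering_score(tweet, entity_name, query_tokens, context_words) >= threshold
```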
      <p>Regarding the polarity for reputation task, our best systems in terms of accuracy (Uned 2 and Uned 4) correctly classified 822 tweets as positive, 810 tweets as neutral, and only 179 tweets as negative. These results show that the systems perform well at identifying the positive and neutral classes. In contrast, they correctly classify only 15% of the negative tweets. This difference may be due to the fact that the number of negative instances in the training set is not large enough for the classifiers to correctly learn this class.</p>
      <p>Even if the systems achieve a good performance in polarity for reputation, it is worth mentioning that these results are lower than those reported by the same systems when evaluated on other datasets containing product reviews. In order to determine the reason for this, we analyzed the datasets and found that a considerable number of tweets are not labeled with any emotion. In particular, the coverage of the SentiSense vocabulary in the training set is 12.5% for English and 12.6% for Spanish, while the coverage in the test set is 11.3% and 11.2% for English and Spanish, respectively. This was expected, since SentiSense is specifically designed for processing product reviews. Therefore, given this low coverage, we expect that expanding SentiSense with reputation-related vocabulary will significantly improve the classification results.</p>
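      <p>The coverage figures above can be computed with a simple token-level measure. The sketch below rests on an illustrative assumption: it treats coverage as the fraction of lowercased word tokens found in the lexicon, whereas the actual system works over disambiguated WordNet concepts after lemmatization.</p>

```python
import re

# Illustrative sketch of lexicon coverage: the fraction of word tokens in a
# tweet collection that appear in an affective lexicon such as SentiSense.
# Real coverage would be measured over disambiguated concepts, not raw words.

def lexicon_coverage(tweets, lexicon):
    lexicon = {w.lower() for w in lexicon}
    tokens = [t for tweet in tweets for t in re.findall(r"[a-z']+", tweet.lower())]
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in lexicon)
    return covered / len(tokens)
```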
      <p>However, as we suspected, another important source of classification errors is that many of the tweets labeled with polarity in the dataset (positive and negative) do not carry any emotional meaning per se. These are factual expressions that, from a reputation perspective, entail a polarity, and must therefore be classified as negative or positive. For example, the tweet "I bought a Lufthansa ticket to Berlin yesterday" does not express any clear emotion, but is considered positive for reputation purposes. Reputation management knowledge must therefore be incorporated so that the system can correctly interpret these expressions.</p>
      <p>The importance of the opinion holder and, especially in microblogging, the importance of external links in the input text are two other important findings of our analysis. On the one hand, we found examples of tweets that, from our point of view, should be classified as neutral, but were labeled as polar by the experts due to the relevance or popularity of the tweet's author. We can therefore conclude that correctly identifying the opinion holder is important for weighting the final polarity for reputation. On the other hand, we found that many of the tweets that contain external links derive their polarity from the linked documents. This suggests that studying polarity for reputation in social media, especially in microblogging services such as Twitter, requires more context than the posts themselves.</p>
      <p>Finally, the results obtained in the profiling task show that complex sentiment analysis approaches are a suitable starting point for analyzing texts according to the reputation of a given entity. Even if the performance of the combined system, filtering + polarity for reputation, is quite acceptable, there is still room for improvement. It is important to note that the evaluation of this task involves, first, filtering relevant tweets and, next, classifying the polarity of these relevant tweets, so that the error of the filtering step is propagated to the polarity for reputation classification.</p>
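      <p>The propagation effect noted above can be made concrete with a back-of-the-envelope calculation: if the two stages made independent errors, the expected profiling accuracy would be roughly the product of the stage accuracies. The numbers below are illustrative placeholders, not the official RepLab 2012 scores.</p>

```python
# Toy sketch of error propagation in a two-stage pipeline: a tweet is
# profiled correctly only if it survives filtering and is then assigned
# the right polarity. Accuracies here are illustrative placeholders.

def pipeline_accuracy(filtering_acc, polarity_acc):
    # Independence assumption: stage errors do not interact.
    return filtering_acc * polarity_acc

combined = pipeline_accuracy(0.80, 0.60)  # about 0.48, worse than either stage alone
```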
      <sec id="sec-3-1">
        <title>Conclusion and Future Work</title>
        <p>In this paper, we have presented a system based on sentiment analysis techniques to address the task of classifying polarity for reputation. We have also faced the problem of determining whether a text is related or not to a given entity (e.g., a company or brand). The results showed that, even if the task is more complex than classifying polarity in opinionated texts, complex sentiment analysis approaches are a good basis for classifying polarity for reputation. However, it is still necessary to incorporate a high level of expert knowledge to correctly analyze the reputation of a company.</p>
        <p>As future work, we plan to develop a system for separating objective from subjective statements. This separation will allow us to study objective texts from a reputation perspective, while subjective opinions will be analyzed using sentiment analysis approaches. Our analysis has also revealed the importance of detecting the opinion holder, which may influence the polarity of the text. To address this, we plan to study existing approaches in the area and evaluate them for the polarity for reputation task. Finally, we would like to extend the SentiSense affective lexicon to cover more domain-specific vocabulary.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Personalizing PageRank for Word Sense Disambiguation</article-title>
          .
          <source>In: proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of RepLab 2012: Evaluating Online Reputation Management Systems</article-title>
          .
          <source>In: proceedings of CLEF 2012 Labs and Workshop Notebook Papers</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimenez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>UNED: Improving Text Similarity Measures without Human Assessments</article-title>
          .
          <source>In: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)</source>
          , pp.
          <fpage>454</fpage>
          -
          <lpage>460</lpage>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Brooke</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A Semantic approach to automated text sentiment analysis</article-title>
          .
          <source>Unpublished doctoral dissertation</source>
          , Simon Fraser University, Canada. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Carreras</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chao</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>Ll.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>FreeLing: An Open-Source Suite of Language Analyzers</article-title>
          .
          <source>In: proceedings of the 4th International Conference on Language Resources and Evaluation</source>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Carrillo de Albornoz, J.,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gervás</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A Hybrid Approach to Emotional Sentence Polarity and Intensity Classification</article-title>
          .
          <source>In: proceedings of the 14th Conference on Computational Natural Language Learning</source>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>161</lpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Carrillo de Albornoz, J.,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gervás</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>SentiSense: An easily scalable concept-based affective lexicon for Sentiment Analysis</article-title>
          .
          <source>In: proceedings of the 8th International Conference on Language Resources and Evaluation</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chenlo</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atserias</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>FBM-Yahoo! at RepLab 2012</article-title>
          .
          <source>In: proceedings of CLEF 2012 Labs and Workshop Notebook Papers</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>GATE, a General Architecture for Text Engineering</article-title>
          .
          <source>Computers and the Humanities</source>
          ,
          <volume>36</volume>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>254</lpage>
          . (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Esuli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Determining term subjectivity and term orientation for opinion mining</article-title>
          .
          <source>In: proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , pp.
          <fpage>193</fpage>
          -
          <lpage>200</lpage>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Giacinto</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>An Approach to the Automatic Design of Multiple Classifier Systems</article-title>
          .
          <source>Pattern Recognition Letters</source>
          ,
          <volume>22</volume>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>33</lpage>
          . (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laparra</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base</article-title>
          .
          <source>In: proceedings of the Sixth International Global WordNet Conference</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutemann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I. H.</given-names>
          </string-name>
          :
          <article-title>The WEKA data mining software: an update</article-title>
          .
          <source>SIGKDD Explor. Newsl., 11</source>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Online reputation management is hot? But is it ethical?</article-title>
          .
          <source>In: Computerworld</source>
          ,
          <volume>44</volume>
          , February (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          :
          <article-title>Accurate Unlexicalized Parsing</article-title>
          .
          <source>In: proceedings of the 41st Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>423</fpage>
          -
          <lpage>430</lpage>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S-M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Determining the sentiment of opinions</article-title>
          .
          <source>In: proceedings of the 20th Conference on Computational Linguistics</source>
          , pp.
          <fpage>1367</fpage>
          -
          <lpage>1373</lpage>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kuncheva</surname>
            ,
            <given-names>L. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whitaker</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shipp</surname>
            ,
            <given-names>C. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duin</surname>
            ,
            <given-names>R. P. W.</given-names>
          </string-name>
          :
          <article-title>Is Independence Good For Combining Classifiers?</article-title>
          .
          <source>In: proceedings of the 15th International Conference on Pattern Recognition</source>
          , pp.
          <fpage>168</fpage>
          -
          <lpage>171</lpage>
          . (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Thumbs up? Sentiment classification using Machine Learning techniques</article-title>
          .
          <source>In: proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts</article-title>
          .
          <source>In: proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>278</lpage>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Patwardhan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>SenseRelate::TargetWord - A generalized framework for word sense disambiguation</article-title>
          .
          <source>In: proceedings of the Association for Computational Linguistics on Interactive Poster and Demonstration Sessions</source>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>76</lpage>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          :
          <article-title>Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews</article-title>
          .
          <source>In: proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>417</fpage>
          -
          <lpage>424</lpage>
          . (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <article-title>Universidad Politécnica de Cataluña mappings</article-title>
          . http://nlp.lsi.upc.edu/web/index.php?option=com_content&amp;task=view&amp;id=21&amp;Itemid=57
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruce</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Hara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Development and use of a gold-standard data set for subjectivity classification</article-title>
          .
          <source>In: proceedings of the 37th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>246</fpage>
          -
          <lpage>253</lpage>
          . (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klakow</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Bootstrapping supervised machine-learning polarity classifiers with rule-based classification</article-title>
          .
          <source>In: proceedings of the 1st Workshop on Computational Approaches to Subjectivity and Sentiment Analysis</source>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoffmann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>35</volume>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>433</lpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <article-title>WordNet mappings</article-title>
          . http://wordnet.princeton.edu/wordnet/download/.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krzyzak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suen</surname>
            ,
            <given-names>C. Y.</given-names>
          </string-name>
          :
          <article-title>Methods for combining multiple classifiers and their applications to handwriting recognition</article-title>
          .
          <source>IEEE Transactions on Systems, Man and Cybernetics</source>
          , Part A: Systems and Humans,
          <volume>22</volume>
          , pp.
          <fpage>418</fpage>
          -
          <lpage>435</lpage>
          . (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>