<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>POPSTAR at RepLab 2013: Polarity for Reputation Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>jfilgueiras</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>samir</string-name>
        </contrib>
        <aff>INESC-ID, Technical University of Lisbon ({jfilgueiras, samir}@inesc-id.pt)</aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <abstract>
        <p>This paper describes our participation in the Polarity for Reputation classification task of RepLab 2013. Our system leveraged a set of components previously developed for a Twitter message polarity classifier. Following a supervised approach, a Logistic Regression classifier is trained on annotated data. A refined language model represents tweets in terms of a vocabulary consisting only of the most informative terms, with word features weighted using a measure from the Information Retrieval field. To reduce the sparseness of the feature vector, the model is enriched with another, more compact, representation of the words. Finally, we extract features to capture the use of informal and affective language. Our approach ranked in the top three for all the metrics, showing that strategies for Twitter Sentiment Analysis are useful for the task of Polarity for Reputation classification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Twitter has become a vast repository of user-generated content over the years. A current
research trend is to use this repository, and others like it, as a source of indicators of
public opinion on a number of topics. Some popular examples are the prediction of
elections or the prediction of box-office revenues for movies.</p>
      <p>The RepLab Polarity for Reputation Classification task aims to use tweets as an
indicator of the reputation of entities, such as companies, brands or artists. To accomplish
this goal the reputation analysis problem is divided into four tasks: filtering, polarity,
clustering and priority. The filtering step removes tweets that are not related to an entity.
The polarity step rates the impact of the tweet on the reputation of the entity as positive,
negative or neutral. The clustering step aims to group tweets pertaining to an entity
into topics and, finally, the priority step assigns a level of importance to each topic.
RepLab participants were given an annotated dataset and asked to implement one or more
components of this system.</p>
      <p>
        It is clear that reputation analysis and sentiment analysis are different tasks.
Objective facts can still have a negative impact on the reputation of an entity, and tweets
expressing negative sentiment may still be positive for a reputation or vice-versa.
However, from a technical perspective, how different are the underlying problems? And to
what extent are the techniques suitable for one task effective on the other? Motivated
by these questions, we participated in RepLab 2013 using an effective classifier we had
previously developed for Twitter message polarity classification. This classifier was tailored to
participate in the task proposed in the Sentiment Analysis in Twitter track of SemEval 2013,
a workshop focused on the evaluation of semantic analysis systems
        <xref ref-type="bibr" rid="ref3">(Nakov et al., 2013)</xref>
        and achieved state-of-the-art results.
      </p>
      <p>The remainder of the document is organized as follows: the annotated dataset is
presented next, Section 2 describes our approach in detail, Section 3 reports the results
of our experiments, and Section 4 briefly concludes.</p>
      <sec id="sec-1-1">
        <title>Dataset</title>
        <p>The RepLab 2013 organization provided an annotated dataset that contained a total of
33,191 tweets annotated for polarity. Of these, 7,073 were written in Spanish and 26,118
in English. The content of webpages linked by URLs in each tweet was also provided.
The tweet distribution across classes is shown in Table 1.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption><p>Tweet distribution across polarity classes.</p></caption>
          <table>
            <thead>
              <tr><th>Language</th><th>Positive</th><th>Negative</th><th>Neutral</th></tr>
            </thead>
            <tbody>
              <tr><td>English</td><td>15,545 (59.5%)</td><td>3,429 (13.1%)</td><td>7,144 (27.4%)</td></tr>
              <tr><td>Spanish</td><td>3,195 (45.2%)</td><td>2,143 (30.3%)</td><td>1,735 (24.5%)</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <sec id="sec-2-1">
        <title>Overview</title>
        <p>
          Following a supervised approach, we used the annotated data to train a Logistic
Regression classifier. Each tweet was modelled as a feature vector consisting of a
bag-of-words vocabulary representation. However, the lexical variation introduced by typos,
abbreviations, slang and unconventional spelling found in Twitter data, leads to very
large vocabularies. The resulting sparse vector representations with few non-zero
values hamper the learning process. To overcome this problem, words with high entropy
(not discriminative of any of the classes) were discarded and Brown Clusters
          <xref ref-type="bibr" rid="ref2">(Brown
et al., 1992)</xref>
          were used as a complementary representation of the tweet to enrich the
feature vector.
        </p>
        <p>We used sentiment lexicons to extract features based on the prior polarity of words,
taking the presence of negation particles into consideration. Some microblog oriented
features were included to capture particular aspects of this type of text (e.g. presence of
emoticons). Finally, the title of the web pages referred in the messages was included to
provide additional context.</p>
        <p>Since the dataset contained two different languages, we chose to train a
separate classifier for each language. The Spanish classifier was a simplified version of
the English classifier, without Brown cluster and lexicon features.</p>
        <p>Each major component of our system is described in detail in the following sections.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Preprocessing</title>
        <p>To cope with the noisy content and reduce the vocabulary size, the following
preprocessing steps were taken: stop words were discarded, user mentions (@username)
were replaced with a fixed tag &lt;USER&gt;, and URLs with the tag &lt;URL&gt;. Then,
messages were normalized by converting them to lower-case and reducing character
repetitions to at most 3 characters (e.g. “helloooooo!” was normalized to “hellooo!”). Finally,
words were stemmed using the Snowball Stemmer implementation of NLTK
(http://nltk.org/).</p>
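        <p>A minimal Python sketch of this normalization pipeline; the stop-word list shown is illustrative, and the final stemming step (which used NLTK’s Snowball stemmer) is omitted to keep the example self-contained:</p>

```python
import re

# Illustrative stop-word list; the paper does not specify the one used.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and"}

def preprocess(tweet):
    # Lower-case first so the fixed tags below keep their upper-case form.
    tweet = tweet.lower()
    # Replace user mentions and URLs with fixed tags.
    tweet = re.sub(r"@\w+", "<USER>", tweet)
    tweet = re.sub(r"https?://\S+", "<URL>", tweet)
    # Reduce character repetitions to at most 3 ("helloooooo!" -> "hellooo!").
    tweet = re.sub(r"(.)\1{3,}", r"\1\1\1", tweet)
    # Discard stop words (stemming of the remaining words is omitted here).
    return [t for t in tweet.split() if t not in STOP_WORDS]
```

        <p>Note that lower-casing is applied before the tag substitution so that the fixed &lt;USER&gt; and &lt;URL&gt; tags survive the normalization.</p>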
      </sec>
      <sec id="sec-2-3">
        <title>Document Representation</title>
        <p>
          Negation The presence of a negation word can have a great impact on the meaning of
a sentence: the expressions “very good” and “not very good” convey opposite
sentiments. Therefore, following the work of
          <xref ref-type="bibr" rid="ref6">Pang et al. (2002)</xref>
          , negation was directly
integrated into the word representation: all the words between a negation word and the
first punctuation mark were suffixed with a NEG tag. The list of negation words was
compiled manually.
        </p>
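        <p>The scope rule described above can be sketched as follows; the negation list and the exact suffix format (_NEG here) are illustrative, since the paper’s manually compiled list is not reproduced:</p>

```python
# Illustrative negation words; the paper's list was compiled manually.
NEGATION_WORDS = {"not", "no", "never", "n't"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}

def tag_negation(tokens):
    """Suffix every token between a negation word and the next
    punctuation mark with a _NEG tag (after Pang et al., 2002)."""
    tagged, in_scope = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            # Punctuation closes the negation scope.
            in_scope = False
            tagged.append(tok)
        elif in_scope:
            tagged.append(tok + "_NEG")
        else:
            tagged.append(tok)
            if tok in NEGATION_WORDS:
                in_scope = True
    return tagged
```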
        <p>
          Weighting Schemes for Word Features Typical schemes for weighting
word features in text classification tasks are binary weighting, term frequency and tf.idf.
However,
          <xref ref-type="bibr" rid="ref5">Paltoglou and Thelwall (2010)</xref>
          showed that more advanced weighting schemes used
in Information Retrieval can enhance sentiment classification accuracy. Because these
measures capture the relative importance of a term in the two classes (positive and negative in
this case), they provide more informative word weights for the task at hand. We found
experimentally that the delta-tf.idf weighting function yields the best results.
        </p>
        <p>Brown Clusters The Brown algorithm is a hierarchical clustering algorithm that
groups words so as to maximize the mutual information of bi-grams. The hierarchical nature of
the clustering allows words to be represented at several levels of the hierarchy, which
can compensate for poor clusters with a small number of words. Brown clusters are
created by applying the Brown algorithm to a large corpus, capturing relations between
bi-grams to form a denser representation of a vocabulary. Using these clusters we also
represented documents in terms of a more compact vocabulary, where each word was
mapped to its corresponding cluster. Plugging these clusters as extra features into the
document model can alleviate the problems of feature vector sparseness and unseen
words.</p>
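        <p>As an illustration, a delta-tf.idf weight can be computed from per-class document frequencies; the smoothing constant and the sign convention below are our own choices for the sketch, not necessarily those of the system:</p>

```python
import math

def delta_tfidf(tf, df_pos, df_neg, n_pos, n_neg, smooth=0.5):
    """Delta tf-idf: term frequency scaled by the difference between the
    idf of the term in the negative and positive training subsets.
    `smooth` avoids division by zero for terms unseen in one class."""
    idf_pos = math.log2(n_pos / (df_pos + smooth))
    idf_neg = math.log2(n_neg / (df_neg + smooth))
    # Terms frequent in positive documents have a low idf_pos, so they
    # receive a positive weight under this sign convention.
    return tf * (idf_neg - idf_pos)
```

        <p>Terms that occur evenly in both classes get weights near zero, while class-discriminative terms are pushed toward large positive or negative values.</p>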
        <p>
          Word Entropy Previous work has shown that, when using bag-of-words, classification
performance improves if words with high entropy, i.e., words that do not contribute
strongly to any of the classes, are discarded
          <xref ref-type="bibr" rid="ref4">(Pak and Paroubek, 2010)</xref>
          . The
entropy of the probability distribution of a word over the different classes was
computed using the
          <xref ref-type="bibr" rid="ref7">Shannon and Weaver (1948)</xref>
          definition.
        </p>
        <p>A high entropy value indicates that a word appears evenly across all the classes, whereas
a low entropy value means that the word is concentrated in one of the classes and hence
more discriminative of a given sentiment. After computing the entropy value of
each term, a threshold was defined and words with entropy above it were discarded,
reducing the vocabulary size.</p>
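        <p>The entropy-based pruning can be sketched as follows; the threshold value is a free parameter that was tuned experimentally:</p>

```python
import math

def class_entropy(class_counts):
    """Shannon entropy of a word's distribution over the polarity classes."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def prune_vocabulary(word_class_counts, threshold):
    """Keep only words whose class entropy does not exceed the threshold."""
    return {word for word, counts in word_class_counts.items()
            if class_entropy(counts) <= threshold}
```

        <p>A word occurring evenly in three classes has the maximum entropy of log2(3) ≈ 1.58 bits and is pruned; a word confined to one class has entropy 0 and is kept.</p>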
      </sec>
      <sec id="sec-2-4">
        <title>Features</title>
        <p>The document representation model was enriched with features that account for
the presence of words with a prior polarity, such as “happy” (positive) or “sad”
(negative). In the spirit of previous approaches, features that aim at capturing
the creative and informal use of language in tweets were also extracted. In summary,
the following features were employed:</p>
        <list list-type="bullet">
          <list-item><p>Sentiment Lexicons: number of words with positive/negative prior polarity and
a score obtained by summing both. Negation was taken into account by detecting
the presence of a negation token in a window of two words. Bing Liu’s Opinion
Lexicon (http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html) was employed for this feature.</p></list-item>
          <list-item><p>Heavy Punctuation: number of sequences of exclamation marks, question marks
and combinations of both.</p></list-item>
          <list-item><p>Upper-case Words: number of words written all in upper case.</p></list-item>
          <list-item><p>Emoticons: number of positive and negative emoticons. The polarity of emoticons
was assessed with custom regular expressions and a list of polar emoticons used in
SentiStrength.</p></list-item>
          <list-item><p>Emphasized Words: number of words emphasized with more than 2 character
repetitions, e.g., “awesoooome”.</p></list-item>
        </list>
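        <p>A rough sketch of these surface features; the regular expressions are illustrative stand-ins for the custom patterns and the SentiStrength emoticon list used by the system:</p>

```python
import re

# Illustrative emoticon patterns, standing in for the system's custom
# regular expressions and the SentiStrength emoticon list.
POSITIVE_EMOTICONS = re.compile(r"[:;=]-?[)D]")
NEGATIVE_EMOTICONS = re.compile(r"[:;=]-?[(\\/]")

def microblog_features(text):
    return {
        # Sequences of two or more !/? marks, alone or combined.
        "heavy_punct": len(re.findall(r"[!?]{2,}", text)),
        # Words written entirely in upper case (single letters ignored).
        "upper_words": sum(1 for w in text.split() if len(w) > 1 and w.isupper()),
        "pos_emoticons": len(POSITIVE_EMOTICONS.findall(text)),
        "neg_emoticons": len(NEGATIVE_EMOTICONS.findall(text)),
        # Words emphasized with more than 2 repetitions of a character.
        "emphasized": len(re.findall(r"(\w)\1{2,}", text)),
    }
```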
      </sec>
      <sec id="sec-2-5">
        <title>Context</title>
        <p>As mentioned in Section 1.1, the training and test datasets also contained the contents of
web pages linked in tweets. To take advantage of this extra information we employed a
simple strategy: we extracted the HTML title tag of each page and appended it to the
text of the associated tweet.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>To test the impact of each feature and determine the best settings for this problem, we
used a 70%-30% train-test split to evaluate different versions of the classifier. Table 2 reports
the results for the English classifier. The baseline used only a binary bag-of-words and
Brown clusters representation of each tweet, without any vocabulary reduction. We then
added the features described in Section 2.4. The binary weighting scheme
was afterwards replaced by Delta-tf.idf, followed by a vocabulary reduction using
entropy and, finally, the addition of the HTML title tags.</p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption><p>Results for the English classifier.</p></caption>
        <table>
          <thead>
            <tr><th>Classifier</th><th>Accuracy</th><th>Polar F1</th></tr>
          </thead>
          <tbody>
            <tr><td>Baseline</td><td>71.7%</td><td>69.5%</td></tr>
            <tr><td>+ Features</td><td>72.4%</td><td>71.0%</td></tr>
            <tr><td>+ Delta TF.IDF</td><td>72.8%</td><td>71.0%</td></tr>
            <tr><td>+ Vocabulary Reduction</td><td>73.5%</td><td>71.5%</td></tr>
            <tr><td>+ HTML Title</td><td>73.7%</td><td>72.0%</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 3 shows a simplified comparison for the Spanish classifier. In this case the baseline
was simply a binary bag-of-words. As mentioned previously, the full classifier
employed every strategy used in the English classifier except Brown clusters and lexicons.</p>
      <table-wrap id="tab-3">
        <label>Table 3</label>
        <caption><p>Results for the Spanish classifier.</p></caption>
        <table>
          <thead>
            <tr><th>Classifier</th><th>Accuracy</th><th>Polar F1</th></tr>
          </thead>
          <tbody>
            <tr><td>Baseline</td><td>71.6%</td><td>74.5%</td></tr>
            <tr><td>Full classifier</td><td>72.3%</td><td>75.5%</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Finally, Table 4 shows our results in the official evaluation run by the
RepLab 2013 organizers on a held-out test dataset. Each participant could submit up to ten different
runs, and we opted to submit different variations of the vocabulary reduction and of the
use of the context HTML title tags. Several vocabulary sizes were tested experimentally
and the best performing ones were submitted. The reported metrics of reliability and sensitivity, related
to precision and recall over the positive and negative classes, are described by
        <xref ref-type="bibr" rid="ref1">Amigo et al.
(2012)</xref>
        .
      </p>
      <p>Our submissions ranked third in the accuracy metric, first in Pearson correlation and
second in the F-measure of sensitivity and reliability.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Although evaluating the polarity of sentiment expressed in a tweet and analyzing its polarity
for the reputation of an entity are different problems, we found that the same
principles and techniques can be used for both.</p>
      <p>Our classifier achieved good results even in a second language, Spanish,
without any language-specific features. The experiments and evaluation show that the
proposed approach is robust and indicate that the underlying problem of reputation
analysis is not very different from open-domain sentiment analysis.</p>
      <p>We also note that the use of context, even in such a simplistic way as the one we
used, improves the overall results. It becomes clear that more sophisticated approaches
should be explored in the future.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was partially supported by FCT (Portuguese research funding agency) under
project grants UTA-Est/MAI/0006/2009 (REACTION) and PTDC/CPJ-CPO/116888/2010
(POPSTAR). FCT also supported scholarship SFRH/BD/89020/2012. This research
was also funded by FCT under contract Pest-OE/EEI/LA0021/2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Reliability and sensitivity: Generic evaluation measures for document organization tasks</article-title>
          .
          <source>Technical report, UNED.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>P. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desouza</surname>
            ,
            <given-names>P. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercer</surname>
            ,
            <given-names>R. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietra</surname>
            ,
            <given-names>V. J. D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          (
          <year>1992</year>
          ).
          <article-title>Classbased n-gram models of natural language</article-title>
          .
          <source>Computational linguistics</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):
          <fpage>467</fpage>
          -
          <lpage>479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozareva</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>SemEval-2013 task 2: Sentiment analysis in Twitter</article-title>
          .
          <source>Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013)</source>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Pak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Paroubek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Twitter as a corpus for sentiment analysis and opinion mining</article-title>
          .
          <source>Proceedings of LREC</source>
          , pages
          <fpage>1320</fpage>
          -
          <lpage>1326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Paltoglou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Thelwall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>A study of information retrieval weighting schemes for sentiment analysis</article-title>
          .
          <source>In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>1386</fpage>
          -
          <lpage>1395</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Thumbs up?: sentiment classification using machine learning techniques</article-title>
          .
          <source>In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-</source>
          Volume
          <volume>10</volume>
          , pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Shannon</surname>
            ,
            <given-names>C. E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Weaver</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>1948</year>
          ).
          <article-title>A mathematical theory of communication.</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>