<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FBM-Yahoo! at RepLab 2012</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jose M. Chenlo</string-name>
          <email>josemanuel.gonzalez@usc.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jordi Atserias</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Rodriguez</string-name>
          <email>carlos.rodriguezg@barcelonamedia.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roi Blanco</string-name>
          <email>roi@yahoo-inc.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>This paper describes FBM-Yahoo!'s participation in the profiling task of RepLab 2012, which aims at determining whether a given tweet is related to a specific company and, if this is the case, whether it contains a positive or negative statement related to the company's reputation. We addressed both problems (ambiguity and polarity for reputation) using Support Vector Machine (SVM) classifiers and lexicon-based techniques, automatically building company profiles and bootstrapping background data. Concretely, for the ambiguity task we employed a linear SVM classifier with a token-based representation of relevant and irrelevant information extracted from the tweets and Freebase resources. With respect to polarity classification, we combined SVM and lexicon-based approaches with bootstrapping in order to determine the final polarity label of a tweet.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        RepLab [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] addresses the problem of reputation analysis, i.e. mining and
understanding opinions about companies and individuals, a harder and still not well
understood problem. FBM-Yahoo! participates in the RepLab profiling task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
where systems are asked to annotate two kinds of information on tweets: whether
the tweet is related to the company (ambiguity) and whether it has positive or
negative implications for the company's reputation (polarity).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Ambiguity task</title>
      <p>
        2.1 Company Representation
Twitter messages are short (up to 140 characters); hence, measures that account
for the textual overlap between tweets and company names are in general not
enough to classify a given tweet as relevant or irrelevant [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], mostly due to data
sparsity and lack of context [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In order to alleviate this problem, we turned
to using the Freebase (http://www.freebase.com) graph and Wikipedia
(http://www.wikipedia.org) as reliable sources of information
for building expanded term-based representations of the different companies.
      </p>
      <p>From the Freebase/Wikipedia pages of the companies we automatically extracted
two sets of entities, namely related concepts and non-related concepts:
- Related Concepts (RC): the set of entities that are connected with
the company in Freebase through the incoming (outgoing) links of
the company's Freebase page. For example, in the case of Apple Inc., the
related concepts set includes iPhoto, iChat, iBook and iTunes Store.
- Non-Related Concepts (NRC): the set of common entities with
which the current company could cause spurious term matches. This set is
comprised of all Freebase entities with a name similar to that of the
company. It is built automatically by querying Freebase with the query
that identifies the company in the training data. From this set we remove
the target company (if it was found), all the entities that are already
included in RC, and all entities that share at least one non-common
category with the target company (a minimal sketch follows this list). As an example of this process, in the case
of Apple Inc. some of the non-related entities selected were "big apple" or
"pine apple".</p>
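      <p>A minimal Python sketch of this construction is shown below. It is not the authors' code: the Freebase lookups are replaced by a toy in-memory graph for the Apple Inc. example, and the "non-common category" rule is simplified to any shared category.</p>
      <preformat><![CDATA[
# Illustrative sketch of building the RC and NRC concept sets for a company.
# The three lookup functions stand in for calls to the (now retired) Freebase API.
FREEBASE = {
    'Apple Inc.':   {'links': {'iPhoto', 'iTunes Store'}, 'categories': {'company', 'technology'}},
    'iPhoto':       {'links': {'Apple Inc.'}, 'categories': {'software'}},
    'iTunes Store': {'links': {'Apple Inc.'}, 'categories': {'service'}},
    'big apple':    {'links': set(), 'categories': {'nickname'}},
    'pine apple':   {'links': set(), 'categories': {'fruit'}},
}

def get_linked_entities(entity):
    return FREEBASE[entity]['links']

def get_categories(entity):
    return FREEBASE[entity]['categories']

def search_freebase(query):
    # Entities whose name is similar to the company query (toy: substring match).
    return [e for e in FREEBASE if query.lower() in e.lower()]

def build_concept_sets(company_query, company):
    rc = set(get_linked_entities(company))            # Related Concepts
    nrc = set()
    company_categories = get_categories(company)
    for entity in search_freebase(company_query):
        if entity == company or entity in rc:
            continue                                  # drop company and RC members
        # drop entities sharing a category with the company
        # (the "non-common category" refinement is simplified away here)
        if get_categories(entity) & company_categories:
            continue
        nrc.add(entity)
    return rc, nrc

print(build_concept_sets('apple', 'Apple Inc.'))      # RC vs. NRC for the example
]]></preformat>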
      <p>Then, for each entity obtained following the previous method, we crawled
its Wikipedia page (in the data set tweets are written in either English or
Spanish, so both language versions were downloaded and stored when possible)
and used the Lucene (http://lucene.apache.org) software to compute the
following lists of keywords for each set of entities (RC, NRC):
- entity names: names of the entities related (non-related) to the company.
- named entities in text: all named entities extracted by the Stanford Named
Entity Recognizer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
- ngrams: unigrams and bigrams (applying stemming and removing
stopwords).
      </p>
      <p>A weight w is associated with each of the obtained keywords (entity names,
named entities in text, ngrams). In the case of the entity names, the weight is always
1. For named entities in text and ngrams, the weight is the ratio of documents
that contain the given keyword.</p>
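      <p>Under our reading of this scheme, entity names receive weight 1 and the remaining keywords are weighted by the fraction of crawled pages containing them. The following function is an illustrative sketch only.</p>
      <preformat><![CDATA[
from collections import Counter

def keyword_weights(entity_names, docs_keywords):
    """docs_keywords: one set of extracted keywords per crawled Wikipedia page."""
    weights = {name: 1.0 for name in entity_names}     # entity names: weight 1
    df = Counter()
    for keywords in docs_keywords:
        df.update(set(keywords))                       # document frequency
    n_docs = len(docs_keywords)
    for kw, count in df.items():
        # named entities in text and ngrams: ratio of documents containing them
        weights.setdefault(kw, count / n_docs)
    return weights

# e.g. keyword_weights(['iphoto'], [{'apple', 'photo'}, {'apple', 'music'}])
# -> {'iphoto': 1.0, 'apple': 1.0, 'photo': 0.5, 'music': 0.5}
]]></preformat>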
      <p>These lists of keywords represent our profile for a given company as a bag-of-words
model. We note that tweets could be written in English or Spanish, and
accordingly we computed two different profiles for each company: one with
the English version of Wikipedia and the other with the Spanish version.</p>
      <p>
        2.2 Training Process
In recent years, machine learning techniques have been widely applied to
Twitter data with relative success in many classification problems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ].
Concretely, the best system in the WePS-3 Evaluation Campaign [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where the main
task consisted in identifying whether a tweet that contains a company name is
related to the company or not, employed a linear SVM classifier. Following this
approach, we trained a linear SVM classifier using the LibLinear package
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Table 1 lists the features used to represent the data, which
are broken down into matches of terms in the tweet against the company's
profile (profile), features related to the company name in the tweet (company) and
company-independent (tweet-only) features.
      </p>
      <p>[Table 1: features used to represent tweets for the ambiguity classifier, with columns Scope and Description.]</p>
      <p>Note that the last six features compare a given tweet with the profile
computed for the company. The first six features are tweet-dependent, and they only
need the text of the tweet and the query that represents the company. Using this
representation we were able to learn a classifier over the trial set (six companies)
that can be directly applied to the test data.</p>
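      <p>A sketch of such a classifier is shown below. It uses scikit-learn's LinearSVC, which wraps the LIBLINEAR library used in the paper; the features are a simplified, hypothetical stand-in for those of Table 1.</p>
      <preformat><![CDATA[
from sklearn.svm import LinearSVC

def featurize(tweet, company_query, profile_weights):
    """A few hypothetical features in the spirit of Table 1: tweet-only,
    company and profile-match scopes."""
    tokens = tweet.lower().split()
    return [
        len(tokens),                                              # tweet length
        tweet.count('#'),                                         # hashtags
        1.0 if company_query.lower() in tweet.lower() else 0.0,   # company name match
        sum(profile_weights.get(t, 0.0) for t in tokens),         # profile overlap
    ]

# Toy profile and trial data standing in for the real company profiles.
profile = {'iphoto': 1.0, 'itunes': 1.0, 'store': 0.4}
trial = [('New iTunes store update from Apple', 1),
         ('Eating an apple pie #yum', 0)]

X = [featurize(t, 'apple', profile) for t, _ in trial]
y = [label for _, label in trial]

clf = LinearSVC(C=1.0).fit(X, y)                                  # linear SVM
print(clf.predict([featurize('Apple iPhoto crashed again', 'apple', profile)]))
]]></preformat>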
    </sec>
    <sec id="sec-3">
      <title>Polarity for Reputation task</title>
      <p>The following sections explain three different approaches (lexicon-based, and
distant supervision using hashtags and lexicons) that we explored in order to determine
whether a tweet has positive or negative implications for the company's
reputation.</p>
      <p>3.1 Lexicon-Based Approaches
The most straightforward approaches employ an ensemble of several lexicons
created with different methodologies in order to broaden coverage, especially
across domains, since some sentiment cues are used differently depending on the
subject being commented on.</p>
      <p>In order to aggregate the lexicon scores into a final polarity measure, several
formulas can be used, for instance:

polScore(t, lan, q_t) = \sum_{l_i \in lan} polLex(t, l_i, q_t)    (1)

where t is a tweet, lan is the language of the tweet, q_t is a query, l_i is one of the
lexicons associated to lan and polLex(t, l_i, q_t) is a matching function between the
lexicon l_i and the tweet t. We have developed two different matching functions,
polLex_raw and polLex_smooth. polLex_raw is a simple aggregation measure that
takes into account just the matches between tweets and lexicons to compute
the final polarity:

polLex_raw(t, l, q_t) = \sum_{w_l \in l} tf_{w_l,t} \cdot priorPol(w_l)    (2)

where t represents a single tweet, l is one of the lexicons associated to the
language of the tweet, w_l is an opinionated word from lexicon l, tf_{w_l,t} is the
frequency of w_l in tweet t, and priorPol(w_l) is the polarity score of word w_l in
lexicon l (this score can be positive or negative depending on the orientation of w_l).</p>
      <p>On the other hand, polLex_smooth is an aggregation measure that takes into
account the matches between tweets and lexicons and the distance of these
matches to the company name in order to smooth the polarity score of each word:

polLex_smooth(t, l, q_t) = \frac{1}{|q_t|} \sum_{q_i \in q_t} \sum_{w_l \in l \cap t} \frac{1}{d_{w_l,q_i}} \cdot priorPol(w_l)    (3)

where d_{w_l,q_i} is the distance of the tweet term w_l to the query term q_i.</p>
      <p>Finally, we decide the final classification of each tweet using the following
simple thresholding:

pol(t) = positive if polScore(t, lan, q_t) &gt; 0; neutral if polScore(t, lan, q_t) = 0; negative if polScore(t, lan, q_t) &lt; 0    (4)</p>
      <p>Note that it is possible to compute two different values for polScore(t, lan, q_t)
by applying either Equation 2 or Equation 3 as the matching function in Equation 1. Full
details about which methods have been used in the submitted runs can be found
in Section 4.</p>
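      <p>The following Python sketch is one possible reading of Equations 1-4. It assumes each lexicon is a dictionary mapping an opinionated word to its prior polarity (a positive or negative number) and that distances are measured in token positions; neither assumption is stated explicitly above.</p>
      <preformat><![CDATA[
def pol_lex_raw(tokens, lexicon):
    # Equation 2: term frequency times prior polarity, summed over matches.
    return sum(tokens.count(w) * prior for w, prior in lexicon.items())

def pol_lex_smooth(tokens, lexicon, query_terms):
    # Equation 3: polarity smoothed by the distance to each query term.
    score = 0.0
    for qi in query_terms:
        if qi not in tokens:
            continue
        q_pos = tokens.index(qi)
        for pos, w in enumerate(tokens):
            if w in lexicon and pos != q_pos:
                score += lexicon[w] / abs(pos - q_pos)
    return score / max(len(query_terms), 1)

def pol_score(tokens, lexicons, query_terms, smooth=False):
    # Equation 1: aggregate over all lexicons of the tweet's language.
    if smooth:
        return sum(pol_lex_smooth(tokens, lex, query_terms) for lex in lexicons)
    return sum(pol_lex_raw(tokens, lex) for lex in lexicons)

def polarity(tokens, lexicons, query_terms, smooth=False):
    # Equation 4: simple thresholding at zero.
    s = pol_score(tokens, lexicons, query_terms, smooth)
    return 'positive' if s > 0 else ('negative' if s < 0 else 'neutral')
]]></preformat>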
      <p>
        3.2 Distant Supervision
Traditional opinion mining methods proposed in the literature are often based on
machine learning techniques, using as primary features a vocabulary of unigrams
and bigrams collected from training data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Following this approach, we used a linear SVM to classify tweets
as positive, neutral or negative. Table 2 lists the features employed to represent
the data, which are broken down into tweet-based features, part-of-speech-based
features and lexicon-based features.</p>
      <p>Table 2. Features used to represent tweets for polarity classification (Scope / Description); a sketch of the tweet-scope features follows the table:
- Voc.: vocabulary features: unigrams and bigrams from training examples.
- Tweet: size of the tweet; number of links; number of hashtags; whether the tweet could be spam (a single word appears more than three times); number of exclamations and interrogations; number of uppercase letters; number of lengthening phenomena.
- POS: number of verbs; number of proper names; number of adjectives; number of pronouns.
- Lexicon: number of positive emoticons.</p>
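      <p>The sketch below illustrates how the tweet-scope features of Table 2 could be extracted; the exact definitions (e.g. of lengthening phenomena) are our assumptions, and the POS and lexicon features are omitted since they require a tagger and the lexicons.</p>
      <preformat><![CDATA[
import re

def tweet_scope_features(text):
    tokens = text.split()
    return {
        'size': len(tokens),                                       # size of tweet
        'links': sum(1 for t in tokens if t.startswith('http')),   # number of links
        'hashtags': sum(1 for t in tokens if t.startswith('#')),   # number of hashtags
        'maybe_spam': int(any(tokens.count(t) > 3 for t in set(tokens))),
        'exclam_interrog': text.count('!') + text.count('?'),
        'uppercase': sum(1 for c in text if c.isupper()),
        'lengthening': len(re.findall(r'(\w)\1{2,}', text)),       # e.g. "goooal"
    }
]]></preformat>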
      <p>The lexicon-based approaches previously described do not require training
and can be directly applied to test data. However, the proposed data
representation requires some amount of training data to compute the vocabulary
features for each tweet, which was not available at training time. Moreover, because
the companies in the test set belong to different domains (e.g.
banks vs. technology), the terms (and even their senses) used to express opinions
may change from one company to another.</p>
      <p>
        For that reason, we learnt a different model for each company, in which we
automatically generated a set of labelled examples from its background model.
Other recent work in this area has focused on distantly supervised methods
which learn the polarity classifiers from data with noisy labels such as emoticons
and hashtags [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        3.3 Distant Supervision using Hashtags
Similarly to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], for each polarity class (i.e. positive, negative and neutral) we
performed the following process to automatically generate positive, neutral
and negative labelled examples (a minimal sketch follows this list):
1. Selecting all hashtags that were used in more than 5 tweets in the background
model of the company.
2. Removing the noisy content (spam, repeated tweets, retweets, etc.) for each
hashtag.
3. Using Equation 1 in conjunction with Equation 2 as the matching function to
select the top 5 positive/negative/neutral hashtags, according to the ratio
of tweets of each hashtag that were classified as positive/negative/neutral.
4. Selecting the top 20 tweets of each polarity from the top hashtags.
      </p>
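      <p>A minimal sketch of this bootstrapping, under the assumption that background tweets are available as (text, hashtags) pairs and that classify_with_lexicons implements the Equation 1 + Equation 2 scoring; the spam/retweet filtering of step 2 is omitted.</p>
      <preformat><![CDATA[
from collections import Counter, defaultdict

def bootstrap_from_hashtags(background_tweets, classify_with_lexicons,
                            min_tweets=5, top_hashtags=5, top_tweets=20):
    # Step 1: hashtags used in more than `min_tweets` background tweets.
    counts = Counter(h for _, tags in background_tweets for h in set(tags))
    frequent = {h for h, c in counts.items() if c > min_tweets}

    # Label every background tweet with the lexicon-based scorer (Eq. 1 + 2).
    per_tag = defaultdict(Counter)
    tweets_by_tag = defaultdict(list)
    for text, tags in background_tweets:
        label = classify_with_lexicons(text)   # 'positive'/'negative'/'neutral'
        for h in set(tags) & frequent:
            per_tag[h][label] += 1
            tweets_by_tag[h].append((text, label))

    examples = {'positive': [], 'negative': [], 'neutral': []}
    for label in examples:
        # Step 3: top hashtags by the ratio of their tweets in this class.
        ranked = sorted(frequent, reverse=True,
                        key=lambda h: per_tag[h][label] / sum(per_tag[h].values()))
        # Step 4: top tweets of this polarity from each of the top hashtags.
        for h in ranked[:top_hashtags]:
            examples[label] += [t for t, l in tweets_by_tag[h]
                                if l == label][:top_tweets]
    return examples
]]></preformat>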
      <p>This bootstrapping process makes it possible to obtain up to 100 positive, negative
and neutral labelled examples (i.e. up to 300 examples in total) to train different
classifiers.</p>
      <p>Once we had generated our labelled examples, we trained a positive
classifier (positive examples against negative plus neutral examples) and a
negative classifier (negative examples against positive plus neutral examples) for each
company in the test set. We also trained the best thresholds that separate
the positive and the negative examples for each classifier. Finally, we combined
the two classifiers and the learned thresholds to decide whether a given tweet had to
be tagged as positive, neutral or negative.</p>
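      <p>The exact combination rule is not detailed above; the following sketch shows one plausible reading, in which each classifier's decision score is compared against its learned threshold.</p>
      <preformat><![CDATA[
def combined_polarity(features, pos_clf, neg_clf, pos_threshold, neg_threshold):
    # Scores of the two one-vs-rest SVMs for a single tweet representation.
    pos_score = pos_clf.decision_function([features])[0]
    neg_score = neg_clf.decision_function([features])[0]
    if pos_score > pos_threshold and pos_score >= neg_score:
        return 'positive'
    if neg_score > neg_threshold:
        return 'negative'
    return 'neutral'
]]></preformat>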
      <p>Learning the Best Threshold. In the previously described approach, we
selected the class decision threshold for a classifier using data which could
potentially contain noisy labels, and consequently could harm the performance of our
system. To alleviate this problem, we manually assessed 50 randomly sampled examples from the
background data of each company and selected the positive/negative
thresholds for each classifier according to the class distribution found in the data.
Full details about which submitted runs were built with this kind of training can
be found in Section 4.</p>
      <p>3.4 Distant Supervision using Lexicons
This distant supervision method is similar to the one explained in Section 3.3,
with the difference that it makes use of the polarity lexicons instead of the tweet
hashtags.</p>
      <p>The following process is undertaken for each polarity class (i.e. positive,
negative and neutral), in order to automatically generate positive, neutral and
negative labelled examples for each company (a sketch follows this list):
1. Select as positive examples tweets that only have positive matches, sorted by
the number of matches in the lexicon.
2. Select as neutral examples tweets that have no matches, ordered by tweet
length.
3. Select as negative examples tweets that only have negative matches, sorted
by the number of matches in the lexicon.</p>
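      <p>A sketch of this selection, assuming the lexicons are plain sets of positive and negative words; the direction of the length ordering for neutral examples is our assumption.</p>
      <preformat><![CDATA[
def bootstrap_from_lexicons(background_tweets, positive_words, negative_words,
                            per_class=100):
    pos, neg, neu = [], [], []
    for text in background_tweets:
        tokens = set(text.lower().split())
        n_pos = len(tokens & positive_words)
        n_neg = len(tokens & negative_words)
        if n_pos and not n_neg:
            pos.append((n_pos, text))        # only positive matches
        elif n_neg and not n_pos:
            neg.append((n_neg, text))        # only negative matches
        elif not n_pos and not n_neg:
            neu.append((len(text), text))    # no matches, ranked by length
    pos.sort(reverse=True)                   # most matches first
    neg.sort(reverse=True)
    neu.sort()                               # shortest first (our assumption)
    return {'positive': [t for _, t in pos[:per_class]],
            'negative': [t for _, t in neg[:per_class]],
            'neutral':  [t for _, t in neu[:per_class]]}
]]></preformat>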
      <p>Similarly to the distant supervision method using hashtags, this
bootstrapping process selects up to 100 positive, negative and neutral labelled
examples (i.e. up to 300 examples in total) in order to train different
classifiers for each company. These examples are selected in order of their number of
matches.</p>
      <p>The final classifier is built using the thresholded ensemble described in
Section 3.3.</p>
    </sec>
    <sec id="sec-4">
      <title>Submitted Runs</title>
      <p>FBM-Yahoo! participated in the profiling task of the RepLab 2012 competition with
five different runs. The particular details on how the five FBM-Yahoo! runs were
built can be found in Table 3. All runs use the method explained in Section 2.1 to
classify a tweet as relevant or irrelevant, but they differ in the polarity method
used to compute the final label of a tweet (i.e. positive, negative or neutral).</p>
      <p>
        Regarding the polarity lexicon-based method described in Section 3.1, we
employed a total of six different polarity lexicons for English (including OpinionFinder [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
AFINN [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Q-WordNet [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], dictionaries from the Linguistic Inquiry and Word
Count (LIWC) text analysis system [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], mapping positive and negative sentiments to numeric polarities and expanding the lexicon to possible morphological variants) and five polarity lexicons for Spanish.
Following [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] we also combine these lexicons with a lexicon based on emoticons.
      </p>
      <p>
        Since the resources available for Spanish are scarce, we translated some of
the resources available for English, for instance, some baseline lexicons like the
one used by OpinionFinder (the MPQA Subjectivity Lexicon), or AFINN [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
In order to resolve ambiguities in this bilingual dictionary and to adapt it to
micro-blogging usage, we selected the translation alternative that occurred most
frequently in a separate large Spanish Twitter corpus (100,000 tweets, different
from the one provided by RepLab).
      </p>
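      <p>A hypothetical sketch of this disambiguation step: for each English entry, keep the Spanish translation alternative that is most frequent in the auxiliary Twitter corpus.</p>
      <preformat><![CDATA[
from collections import Counter

def pick_translations(bilingual_dict, spanish_corpus_tokens):
    """bilingual_dict: English word -> list of candidate Spanish translations."""
    freq = Counter(spanish_corpus_tokens)
    return {en: max(alternatives, key=lambda w: freq[w])
            for en, alternatives in bilingual_dict.items()}

# Example (toy data): 'bueno' wins over 'bondadoso' if it is more frequent.
# pick_translations({'good': ['bueno', 'bondadoso']}, corpus_tokens)
]]></preformat>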
      <p>
        As an additional approach, we used author-assessed datasets to create polarity
lexicons from customer reviews, in this case from 100,000 good vs. bad comments
sent to Hotels.com and similar sites, as well as movie comments from volunteer
reviewers and professionals. A Naive Bayes classifier was trained, from which a
list of class-discriminative unigrams and bigrams was extracted. Only adjectives
and adverbs from that list were kept to create a data-driven polarity lexicon,
similar to the method of Banea and Mihalcea [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] that employs an automatically
translated corpus. Finally, starting from a small, manually crafted dictionary, we
expanded its polar entries via WordNet synsets. Another run (UNED 5) was submitted in
collaboration with UNED; it combines all FBM-Yahoo! and UNED runs, and the
details of the combination are described in Section 3 of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>Polarity for reputation task. According to the official measures (R and
S), the runs that take into account just the overlap between tweets and
lexicons (i.e. BMedia2 and BMedia3) performed best for polarity classification.
Nonetheless, the bootstrapping approaches were very competitive in terms
of accuracy. In fact, the performance they achieved is very close to that of the
lexicon-based approaches, and therefore the first conclusion we can draw from
this evaluation is that the distantly supervised approaches take limited advantage of
the training data in this benchmark. This could be due to the fact that the lexicons
contribute most of the model signal and might make it difficult to learn anything
from other sources of features. Moreover, the noise introduced by misclassified
data in the training process could harm the learning
process more than improve it.</p>
      <p>Profiling task. In this task, all methods behave similarly in terms of
performance, with BMedia4 being the best run. This method combines the hashtag
bootstrapping approach with the selection of a threshold for each classifier, learnt
from hand-classified tweets from the background models. It is worth remarking that
we selected the best threshold for each classifier using data which contains
noisy labels and consequently could harm the overall performance of the system.
In order to overcome this problem, we set a different threshold for each classifier
using background data. The results indicate that setting this threshold alleviates the
score noise coming from the lexicon-bootstrapped examples.</p>
      <p>
        Finally, as future work, we would like to explore how sentiment in Twitter
streams is affected by real-world events, which severely affect Twitter topic
trends. For example, if a football team loses a match, the next day the
overall opinion about this team will probably tend to be negative. We would also like to study
how to detect polarity changes over time and how to adapt our
classification models to these new scenarios. More concretely, we would like to apply
propensity scoring techniques [
        <xref ref-type="bibr" rid="ref19 ref20">19,20</xref>
        ] to deal with the fact that training instances
are governed by a distribution that di ers greatly from the test distribution.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work is partially funded by the Holopedia Project (TIN2010-21128-C02-02),
Ministerio de Ciencia e Innovación.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Overview of RepLab 2012:
          <article-title>Evaluating online reputation management systems</article-title>
          .
          <source>In: CLEF 2012 Labs and Workshop</source>
          Notebook Papers. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Yerva</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miklos</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>It was easy, when apples and blackberries were only fruits</article-title>
          .
          <source>In: CLEF (Notebook Papers)</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
          </string-name>
          , H.:
          <article-title>Finding support sentences for entities</article-title>
          . In Crestani, F.,
          <string-name>
            <surname>Marchand-Maillet</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efthimiadis</surname>
            ,
            <given-names>E.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
          </string-name>
          , J., eds.: SIGIR,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2010</year>
          )
          <volume>339</volume>
          -
          <fpage>346</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Feature-rich part-of-speech tagging with a cyclic dependency network</article-title>
          .
          <source>In: Proceedings of HLT-NAACL 2003</source>
          . (
          <year>2003</year>
          )
          <volume>252</volume>
          -
          <fpage>259</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bermingham</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smeaton</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          :
          <article-title>Classifying sentiment in microblogs: is brevity an advantage? In: CIKM</article-title>
          . (
          <year>2010</year>
          )
          <year>1833</year>
          -
          <fpage>1836</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Go</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhayani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
          <source>Processing</source>
          (
          <year>2009</year>
          ) 1-
          <fpage>6</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vovsha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rambow</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passonneau</surname>
          </string-name>
          , R.:
          <article-title>Sentiment analysis of twitter data</article-title>
          .
          <source>In: Proceedings of the Workshop on Language in Social Media (LSM</source>
          <year>2011</year>
          ), Portland, Oregon, Association for Computational Linguistics (
          <year>June 2011</year>
          )
          <volume>30</volume>
          -
          <fpage>38</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borthwick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sekine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amigo</surname>
          </string-name>
          , E.:
          <article-title>Weps-3 evaluation campaign: Overview of the web people search clustering and attribute extraction tasks</article-title>
          . In: CLEF (Notebook Papers/LABs/Workshops). (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBLINEAR: A library for large linear classification</article-title>
          .
          <source>In: Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <year>1871</year>
          -
          <fpage>1874</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Thumbs up? sentiment classification using machine learning techniques</article-title>
          .
          <source>In: Proceedings of EMNLP</source>
          . (
          <year>2002</year>
          )
          <volume>79</volume>
          -
          <fpage>86</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kouloumpis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          :
          <article-title>Twitter sentiment analysis: The good the bad and the omg</article-title>
          ! In: ICWSM. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Carrillo de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chugur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Using an emotion-based model and sentiment analysis techniques to classify polarity for reputation</article-title>
          .
          <source>In: CLEF 2012 Labs and Workshop</source>
          Notebook Papers. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Wilson,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Wiebe</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          , Hoffmann, P.:
          <article-title>Recognizing contextual polarity in phraselevel sentiment analysis</article-title>
          .
          <source>In: HLT/EMNLP</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>F.A.:</given-names>
          </string-name>
          <article-title>A new ANEW: Evaluation of a word list for sentiment analysis in microblogs</article-title>
          .
          <source>CoRR</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>García-Serrano, A.: Q-WordNet: Extracting polarity from WordNet senses</article-title>
          . In Chair),
          <string-name>
            <given-names>N.C.C.</given-names>
            ,
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Piperidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Rosner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tapias</surname>
          </string-name>
          , D., eds.
          <source>: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)</source>
          , Valletta, Malta, European Language Resources Association (ELRA) (may
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Francis</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Booth</surname>
          </string-name>
          , R.J.:
          <article-title>Linguistic inquiry and word count: Liwc 2001</article-title>
          . Mahway: Lawrence Erlbaum Associates (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kun-Lin</surname>
            <given-names>Liu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu-Jun Li</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Emoticon smoothed language models for twitter sentiment analysis</article-title>
          .
          <source>In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI)</source>
          .
          <article-title>(</article-title>
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Learning multilingual subjective language via crosslingual projections</article-title>
          .
          <source>In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Bickel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Bruckner,
          <string-name>
            <surname>M.</surname>
          </string-name>
          , Scheffer, T.:
          <article-title>Discriminative Learning Under Covariate Shift</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>10</volume>
          (
          <year>September 2009</year>
          )
          <volume>2137</volume>
          -
          <fpage>2155</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          :
          <article-title>Linear-time estimators for propensity scores</article-title>
          .
          <source>Journal of Machine Learning Research - Proceedings Track</source>
          <volume>15</volume>
          (
          <year>2011</year>
          )
          <volume>93</volume>
          -
          <fpage>100</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>