<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Feature Selection Metrics for Polarity Analysis in RepLab 2012</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hogyeong Jeong</string-name>
          <email>hogyeong.jeong@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hyunjong Lee</string-name>
          <email>hyunjong.lee.s@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Seoul</institution>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this working notes paper for RepLab 2012, we describe our method of using feature selection metrics for polarity analysis. We use the correlation coefficient, a one-sided metric, to assign polarity scores to the relevant words within a tweet; we then use the aggregate of these scores to determine the polarity of the tweet. Our results show a reasonable level of performance compared to other methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Correlation coefficient</kwd>
        <kwd>feature selection</kwd>
        <kwd>polarity analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
This paper describes a correlation-coefficient-based procedure for determining the
polarity of a tweet. Correlation coefficients have been used successfully for text
categorization, and also for sentiment analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
]. In this paper, we describe
how the correlation coefficient can be used to perform polarity analysis, and
compare our results with other participants in RepLab 2012 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>Tasks Performed</title>
        <p>Among the many tasks for RepLab 2012, we concentrated on the polarity analysis
subtask of the profiling task. Also, while there were tweets in both English and Spanish,
we chose to focus on only the English tweets.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Main Objectives of Experiments</title>
        <p>
          The main objective of the experiment was to determine the polarity (positive,
neutral, or negative) of a single tweet. Irrelevant tweets were excluded from the
polarity analysis. Although this task is closely related to sentiment analysis
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], it is somewhat different as it focuses on reputation instead of sentiment.
Our method uses some of the feature selection metrics that are described in
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] to perform polarity analysis. In particular, we use the one-sided correlation
coefficient, as there are three polarity classes to consider: positive, neutral, and
negative. Although exactly how it is applied can vary, the correlation coefficient has
been used in the closely related task of sentiment analysis [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
We submitted four runs for the task: one using the basic method, two using the
basic method with modified thresholds, and one that incorporated human
input for borderline cases.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>Preprocessing</title>
        <p>
          First, we had to preprocess the tweets to extract relevant keywords that we
would use to determine polarity. To do this, we used the Stanford part-of-speech
tagger, and extracted nouns and adjectives [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
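        <p>As an illustration of this filtering step, the sketch below keeps only nouns and adjectives from a tokenized tweet. The toy tag table is a hypothetical stand-in for the output of the Stanford tagger that the paper actually used; only the keep-nouns-and-adjectives rule is the point here.</p>

```python
# Sketch of the preprocessing step: tag each token, keep nouns and adjectives.
# TOY_TAGS is a made-up stand-in for a real POS tagger's output (Penn
# Treebank tags: NN* for nouns, JJ* for adjectives).
TOY_TAGS = {
    "great": "JJ", "service": "NN", "bank": "NN",
    "was": "VBD", "the": "DT", "terrible": "JJ",
}

def extract_keywords(tweet):
    """Return the nouns (NN*) and adjectives (JJ*) of a tokenized tweet."""
    tokens = tweet.lower().split()
    return [t for t in tokens
            if TOY_TAGS.get(t, "").startswith(("NN", "JJ"))]

print(extract_keywords("The service was great"))  # ['service', 'great']
```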
      </sec>
      <sec id="sec-1-4">
        <title>Correlation Coefficients</title>
        <p>As part of the RepLab 2012 task, we were given a small set of labeled files
that we could use for training. Given the terms that we extracted above in the
preprocessing phase, and the three polarity categories (positive, neutral, and
negative), we calculated the correlation coefficient for a term t on class ci as
CC(t, c_i) = √N [P(t, c_i)P(t̄, c̄_i) − P(t, c̄_i)P(t̄, c_i)] / √( P(t) P(t̄) P(c_i) P(c̄_i) )   (1)
where t̄ denotes the other terms and c̄_i denotes the other classes.</p>
      </sec>
      <sec id="sec-1-5">
        <title>Basic Method</title>
        <p>Using the correlation coefficients that we calculated above (for each term and
each polarity class), we can sum the correlation coefficients of a class for all the
relevant terms within a tweet. After the summation of coefficients is done for
each polarity class (positive, negative, and neutral), we assign the polarity of
the tweet to that of the class with the largest sum.</p>
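        <p>This scoring step can be sketched as follows; the coefficient values in the example are made up purely for illustration.</p>

```python
# cc[(term, cls)] holds precomputed correlation coefficients for each
# (term, polarity class) pair; the values below are illustrative only.
cc = {
    ("great", "positive"): 1.2, ("great", "neutral"): 0.1,
    ("great", "negative"): -0.9,
    ("service", "positive"): 0.3, ("service", "neutral"): 0.4,
    ("service", "negative"): -0.2,
}
CLASSES = ("positive", "neutral", "negative")

def classify(terms, cc):
    """Sum each class's coefficients over the tweet's terms, take the max."""
    scores = {c: sum(cc.get((t, c), 0.0) for t in terms) for c in CLASSES}
    return max(scores, key=scores.get), scores

label, scores = classify(["great", "service"], cc)
print(label)  # positive
```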
        <p>While this represents the most basic usage of the correlation coefficients, we
can modify the thresholds somewhat to try to achieve better performance.
</p>
      </sec>
      <sec id="sec-1-6">
        <title>Modified Threshold 1 - Same Class Proportions as the Training Set</title>
        <p>
One approach is to set the thresholds so that the resulting class proportions
on the test set equal the class proportions on the training set. This method
usually works best when the training set is bigger than the test set, and the test
set is similar to the training set. Unfortunately for RepLab 2012, the test set
was much larger than the training set and it was also quite different from the
training set [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
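        <p>The paper does not spell out the exact thresholding procedure; one simple way to realize "same class proportions as the training set" is a quota-based assignment, sketched below under that assumption.</p>

```python
def proportional_assign(scores, train_props):
    """Assign labels so class proportions roughly match the training set.

    scores: list of {class: summed CC} dicts, one per tweet.
    train_props: {class: fraction observed in the training set}.
    Each class gets a quota; the most confident tweets choose first,
    taking their best-scoring class whose quota is not yet exhausted.
    """
    n = len(scores)
    quota = {c: round(p * n) for c, p in train_props.items()}
    labels = [None] * n
    order = sorted(range(n),
                   key=lambda i: max(scores[i].values()), reverse=True)
    for i in order:
        ranked = sorted(scores[i], key=scores[i].get, reverse=True)
        pick = next((c for c in ranked if quota[c] > 0), ranked[0])
        quota[pick] -= 1
        labels[i] = pick
    return labels
```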
      </sec>
      <sec id="sec-1-7">
        <title>Modified Threshold 2 - Best Performance on the Training Set</title>
        <p>Another approach is to set the thresholds to those that achieved the best
performance on the training set (via 5-fold cross-validation within the training set).
Again, we would expect better performance if the training set were large and similar
to the test set; neither condition was met for RepLab 2012.</p>
      </sec>
      <sec id="sec-1-8">
        <title>Basic Method with Human Input</title>
        <p>One nice property of the correlation coefficient approach is that we get a
measure of confidence in our classifications. For example, our confidence that
a tweet is positive is much higher for a classification with scores (positive=3.8,
neutral=0.2, negative=-2.5) than for one with scores (positive=0.3, neutral=0.2,
negative=-0.7).</p>
        <p>We can take advantage of this additional information by introducing human
input for the borderline cases where the difference between the top two classes is
small (we used 0.5 as a threshold). These are the cases where the automatically
generated categorizations would have a high risk of being incorrect.
</p>
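        <p>A minimal sketch of this borderline test, using the 0.5 margin mentioned above:</p>

```python
def needs_human(scores, margin=0.5):
    """True if the gap between the top two class scores is below `margin`,
    i.e. the automatic classification is borderline and worth a human look."""
    best, second = sorted(scores.values(), reverse=True)[:2]
    return margin > best - second

# The confident example from the text is not flagged; the close one is.
print(needs_human({"positive": 3.8, "neutral": 0.2, "negative": -2.5}))  # False
print(needs_human({"positive": 0.3, "neutral": 0.2, "negative": -0.7}))  # True
```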
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>
The evaluation of classifications was performed using reliability, sensitivity, and
the corresponding F measure, which are modified precision and recall measures
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. There were a total of 42 runs submitted for the task, of which 3 served
as baselines. Evaluations were done separately for the English and the Spanish
tweets, and the results that we provide below correspond to the English results
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
      </p>
      <table-wrap id="tab-1">
        <table>
          <thead>
            <tr>
              <th>Method (rank of 42)</th>
              <th>Reliability</th>
              <th>Sensitivity</th>
              <th>F(R,S)</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Best (ranked 1st)</td><td>.369</td><td>.350</td><td>.348</td></tr>
            <tr><td>Human Input (ranked 10th)</td><td>.364</td><td>.275</td><td>.285</td></tr>
            <tr><td>Basic (ranked 15th)</td><td>.265</td><td>.280</td><td>.260</td></tr>
            <tr><td>Modified Threshold 1 (ranked 26th)</td><td>.230</td><td>.194</td><td>.198</td></tr>
            <tr><td>Modified Threshold 2 (ranked 28th)</td><td>.241</td><td>.184</td><td>.194</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>As we feared, modifying the thresholds resulted in much worse performance,
because the training set was small and dissimilar to the test set. Meanwhile, the basic
correlation-coefficient-based method performed reasonably well, ranking 15th of the 42
submitted runs. As expected, our run that integrated human input on just the borderline
cases led to a marked improvement over our other methods.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion and Future Directions</title>
      <p>We were able to achieve reasonable results using a relatively simple approach based
on correlation coefficients. Further, we showed that we can markedly improve these
results by incorporating human input on cases deemed borderline by the
correlation coefficients.</p>
      <p>As a future direction, we can try exploiting the massive background data that
we did not use for our current results. Because the training set in this case was so
small, we can expect better results if we can exploit the background data to help
expand our training set. Once we have expanded the training set in such a way,
we may be able to expect better results from the modified threshold approaches
that were not able to perform well with a small training set.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Opinion mining and sentiment analysis</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          -2) (
          <year>2007</year>
          )
          <fpage>1</fpage>
          -
          <lpage>135</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Feature selection for text categorization on imbalanced data</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>6</volume>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Marchetti-Bowick</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Learning for microblogs with distant supervision: Political forecasting with Twitter</article-title>
          .
          <source>In: EACL</source>
          . (
          <year>2012</year>
          )
          <fpage>603</fpage>
          -
          <lpage>612</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Amigó</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of RepLab 2012: Evaluating online reputation management systems</article-title>
          .
          <source>In: CLEF 2012 Labs and Workshop Notebook Papers</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>Enriching the knowledge sources used in a maximum entropy part-of-speech tagger</article-title>
          .
          <source>In: EMNLP/VLC</source>
          . (
          <year>2000</year>
          )
          <fpage>63</fpage>
          -
          <lpage>70</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The impact of evaluation on multilingual text retrieval</article-title>
          .
          <source>In: SIGIR</source>
          . (
          <year>2005</year>
          )
          <fpage>603</fpage>
          -
          <lpage>604</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>