<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>BittenPotato: Tweet sentiment analysis by combining multiple classifiers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iosu Mendizabal Borda</string-name>
          <email>iosu@iiia.csic.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeroni Carandell Saladich</string-name>
          <email>jeroni.carandell@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>(IIIA) Artificial Intelligence Research Institute, (CSIC) Spanish Council for Scientific Research</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>(UPC) Universitat Politecnica de Catalunya, (URV) Universitat Rovira i Virgili, (UB) Universitat de Barcelona</institution>
        </aff>
      </contrib-group>
      <fpage>71</fpage>
      <lpage>74</lpage>
      <abstract>
        <p>In this paper, we use a bag-of-words model over n-grams to build a dictionary containing the most used "words", which we use as features. We then classify using four different classifiers and combine their results by applying a simple voting, a weighted voting, and a further classifier to obtain the real polarity of a phrase.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Sentiment analysis is the branch of natural language processing used to determine the subjective polarity of a text. It has many applications, ranging from measuring the popularity of a certain product to gauging the general opinion about an event or a politician, among many others.</p>
      <p>In the particular case of Twitter texts, these have the misfortune, or great advantage, of consisting of at most 140 characters. The disadvantage is that short texts are not described very accurately by the bag-of-words model we will use; on the other hand, the limit also forces the author of the tweet to be concise in their opinion, so noise and irrelevant statements are usually left out.</p>
      <p>
        In this workshop on sentiment analysis focused on Spanish, a data set of tweets tagged according to their sentiment is provided, along with a description of the evaluation measures and of the different tasks
        <xref ref-type="bibr" rid="ref4">(Villena-Roman et al., 2015)</xref>
        .
      </p>
      <p>The rest of the article is laid out as follows: Section 2 introduces the architecture and components of the system, namely the pre-processing, the extraction of features, the algorithms used, and the process applied to their results to obtain our final tag. Section 3 analyses the results obtained in this workshop. Finally, Section 4 draws some conclusions and proposes some future work.</p>
    </sec>
    <sec id="sec-arch">
      <title>Architecture and components of the system</title>
      <p>Our system contains four main phases: data pre-processing, feature extraction (vectorization), the use of classifiers from which we extract a new set of features, and finally a combined classifier which uses the latter to predict the polarity of the text.</p>
    </sec>
    <sec id="sec-2">
      <title>Pre-processing</title>
      <p>This step, crucial to any natural language processing task, consists of removing noise from the text. Many of the steps, such as the removal of URLs, emails, punctuation, emoticons, and spaced-out words, are general and we will not dwell on them; yet some are particular to the language at hand, such as the removal of letters repeated more than twice in Spanish.</p>
      <p>Published at http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with a recognized ISSN.</p>
    </sec>
    <sec id="sec-3">
      <title>Vectorization: Bag of words</title>
      <p>
        In order to be able to apply a classi er, we
need to turn each tweet into a vector with
the same features. To do this, one of the
most common approach is to use the
Bagof-Words model with which given a corpus
of documents, it nds the N most relevant
words (or n-grams in our case). Each
feature, therefore represents the appearance of
a di erent relevant "word". Although the
relevance of a word can be de ned as the
number of times it appears in the text, this has
the disadvantage of considering words that
appear largely throughout the whole
document and lack semantic relevance. In order
to counter this e ect a more sophisticated
approach called tf-idf (term frequency -
inverse term frequency) is used. In our project
we used the Scikit-Learn T dfVectorizer
        <xref ref-type="bibr" rid="ref3">(Pedregosa et al., 2011)</xref>
        to convert each tweet to
a length N feature vector.
2.3
      </p>
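      <p>Conceptually, the tf-idf weighting can be sketched as follows; this is a bare-bones version of what TfidfVectorizer computes (scikit-learn additionally smooths the idf term and L2-normalizes each vector):</p>

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a vector of tf * idf weights,
    one feature per vocabulary word (unsmoothed, unnormalized)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # document frequency: in how many documents each word appears
    df = {w: sum(w in toks for toks in tokenized) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vocab, vectors
```

Words that appear in every document get idf = log(1) = 0, so ubiquitous but semantically empty terms stop dominating the representation.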
    </sec>
    <sec id="sec-4">
      <title>Algorithms: Classifiers</title>
      <p>
        Once we have a way of converting sentences into a representation with the same features, we can use any classification algorithm. For all of the following algorithms we used the implementations in the Scikit-Learn Python package
        <xref ref-type="bibr" rid="ref3">(Pedregosa et al., 2011)</xref>
        .
      </p>
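      <p>A sketch of how the four single classifiers could be instantiated with scikit-learn; the specific parameters shown are illustrative assumptions, not the paper's tuned settings:</p>

```python
# Illustrative setup of the four single classifiers; parameter values
# are assumptions, not the paper's exact configuration.
from sklearn.svm import LinearSVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LinearRegression

classifiers = {
    "SVM": LinearSVC(),                              # SVM with a linear kernel
    "ADA": AdaBoostClassifier(n_estimators=100),     # rounds = only key parameter
    "RF":  RandomForestClassifier(n_estimators=100),
    "LR":  LinearRegression(),  # regression output rounded to the nearest class
}
```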
      <sec id="sec-4-0">
        <title>SVM</title>
        <p>The first, simple method we use to classify is a support vector machine with a linear kernel. This is generally the most promising in terms of all the measures used, both with the complete and with the reduced number of classes.</p>
      </sec>
      <sec id="sec-4-1">
        <title>AdaBoost (ADA)</title>
        <p>
          AdaBoost is also simple and easy to train, since the only relevant parameter is the number of rounds, and it has a strong theoretical basis assuring that the training error will be reduced. However, this only holds with enough data
          <xref ref-type="bibr" rid="ref2">(Freund and Schapire, 1999)</xref>
          ; given the large number of features (5000) compared to the number of training instances (around 4000, because of the cross-validation on the training data that we use for testing), this is the worst performing method, as can be seen in Tables 1 and 2.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Random Forest (RF)</title>
        <p>
          We decided to use this ensemble method as well because it has had very positive effects, with accuracies that at times surpass AdaBoost's, thanks to its robustness to noise and outliers
          <xref ref-type="bibr" rid="ref1">(Breiman, 2001)</xref>
          .
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Linear Regression (LR)</title>
        <p>Since the degrees of sentiment polarity are ordered, we decided that it would also be appropriate to consider the problem as a discrete regression problem. Although a very straightforward approach, it gives the second best results in general, at times surpassing the SVM (Tables 1 and 2).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Combining classifiers</title>
      <p>After computing the confusion matrices of the classifiers used, we reached the conclusion that certain algorithms were better at capturing some classes than others. These confusion matrices can be observed in Section 3. For this reason we decided to combine the results of different classifiers to obtain more accurate results. In other words, we use the results of the single classifiers as an encoding of the tweet into a lower dimension. We can interpret each single classifier as an expert that gives its diagnosis or opinion about the sentiment of a tweet. Since these different experts can be mistaken and disagree, we have to find the best result by combining them.</p>
      <p>We tried three different combining methods. The first is a simple voting over the different classifiers' results, where the most repeated label wins; in case of a draw, one of the drawing labels wins at random. The second proposal is a more sophisticated voting with a weight for each classifier's result; these weights are computed on a train set and are the normalized accuracies of the classification of this set.</p>
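      <p>The two voting schemes can be sketched as follows; this is our illustration of the rules described above, with the label names assumed:</p>

```python
import random
from collections import Counter

def majority_vote(labels, rng=random.Random(0)):
    """Simple voting: the most repeated label wins;
    draws are broken by picking one of the tied labels at random."""
    counts = Counter(labels)
    best = max(counts.values())
    tied = [lab for lab, c in counts.items() if c == best]
    return rng.choice(tied)

def weighted_vote(labels, weights):
    """Weighted voting: each classifier's vote counts in proportion to
    its normalized accuracy, measured on a held-out training split."""
    scores = Counter()
    for lab, w in zip(labels, weights):
        scores[lab] += w
    return max(scores, key=scores.get)
```

With weights, a minority label backed by the most reliable classifiers can outvote a majority of weak ones.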
      <p>Finally, the third method consists of another classification algorithm, this time applied to the results. The idea is that we treat each previous classifier as an expert that gives its own diagnosis of the tweet; given that we have the real labels, we decided to train a Radial Basis Function (RBF) classifier on all of the training dataset and afterwards use the RBF to classify the final test results, which were the results we uploaded to the workshop. All three of these methods enhanced our results by a few yet significant points. This can be thought of as a supervised technique for dimensionality reduction, since we convert a dataset of 5000 features into only 4.</p>
      <sec id="sec-5-1">
        <title>Empirical analysis</title>
        <p>We now analyse the results obtained in the workshop with the given testing tweet corpus. This section is separated in two subsections: firstly, we introduce the results obtained with the use of the four classifiers explained in Section 2.3; secondly, we focus on the usage of the three combining methods introduced in Section 2.4.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Single classifiers</title>
      <p>First of all, we discuss the results obtained with the simple use of the four single classifiers explained in Section 2.3. The analysis is done with two different data sets: on the one hand a set separated into four classes, and on the other hand a data set separated into six classes.</p>
      <p>As depicted in Tables 1 and 2, the SVM and the Linear Regression classifiers are the best performing ones in terms of the F1-measure, which is the harmonic mean of recall and precision.</p>
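      <p>For reference, the harmonic mean penalizes imbalance between precision and recall, so a classifier cannot score well by maximizing only one of the two:</p>

```python
def f1_score(precision, recall):
    """F1-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```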
      <table-wrap id="tab1">
        <table>
          <thead>
            <tr><th/><th>Acc</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><td>SVM</td><td>57.6667</td><td>0.4842</td></tr>
            <tr><td>AB</td><td>49.3333</td><td>0.4193</td></tr>
            <tr><td>RF</td><td>54.0000</td><td>0.5122</td></tr>
            <tr><td>LR</td><td>59.3333</td><td>0.4542</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Observing the confusion matrices of the previously mentioned techniques, Random Forest and Linear Regression, we can perhaps learn more about the data itself. For instance, the number of Neutral tweets is so low that tweets are rarely classified as such, as seen in the NEU columns of the confusion matrices in Figures 1 and 2. Another curious fact is that P+ labels are very separable for our classifiers. This could be because extremes might contain most of the key words that determine a positive review, as opposed to the more subtle class P.</p>
      <p>After applying the 4 previous single classifiers to each tweet, we obtain a data matrix where each feature corresponds to the label set by each classifier. We can interpret this as a sort of dimensionality reduction technique, where we now have each tweet transformed into an element of 4 attributes, each corresponding to a classifier's result.</p>
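      <p>This encoding step can be sketched as follows; here classifiers is assumed to be a list of fitted estimators exposing a scikit-learn-style predict method (a hypothetical interface for illustration):</p>

```python
def encode_with_experts(X, classifiers):
    """Map each tweet vector (5000 tf-idf features) to the list of labels
    predicted by the single classifiers: one column per expert. The result
    is a 4-feature dataset on which the combined classifier is trained."""
    return [[clf.predict([x])[0] for clf in classifiers] for x in X]
```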
      <p>In Tables 3 and 4 we can see the official results of the three combined classifiers on the Train data.</p>
      <p>We have to keep in mind that, when comparing the combined classifiers with the single classifiers, we are using two different test sets. For the single classifiers, we use 3-fold cross-validation exclusively on the train data to obtain average measures for each classifier. For the combined classifiers, we trained on the Train set and evaluated on the final Test set.</p>
      <p>Notice that the weighted voting outperforms the normal voting. This seems intuitive, because the weighted voting gives more importance to the most reliable classifiers. The RBF's results are not as promising as those of the previous two methods, but it still outperforms all of the single classifiers.</p>
      <table-wrap id="tab3">
        <table>
          <thead>
            <tr><th/><th>Acc</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><td>Voting</td><td>59.3</td><td>0.500</td></tr>
            <tr><td>Weighted Voting</td><td>59.3</td><td>0.508</td></tr>
            <tr><td>RBF</td><td>60.2</td><td>0.474</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In general we can see that these methods, with the exception of the SVM in terms of F1-measure, outperform the rest.</p>
      <sec id="sec-6-1">
        <title>Conclusions and future work</title>
        <p>In this paper we have described our approach to the TASS 2015 workshop at SEPLN for the global level, with relatively good results considering the number of classes and the general difficulty of the problem.</p>
        <p>We started by describing the initial preprocessing and the extraction of features using a bag of words over bigrams and trigrams. Then we described and compared four different classifiers that we later used as a way of translating the data into merely 4 dimensions, down from 5000.</p>
        <p>We can conclude that multiple classifiers are good at capturing different phenomena, and that by combining them we tend to obtain a better global result, as we have seen in most of the TASS 2015 results at the global level.</p>
        <p>In general we are satisfied with the results obtained in the TASS 2015 challenge. As future work, we propose to explore different classifiers that might capture different phenomena, so that the combined classifier has more diverse information. Different combined classifiers should also be trained.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Freund</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>A short introduction to boosting</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><surname>Villena-Román</surname>, <given-names>J.</given-names></string-name>, <string-name><given-names>J.</given-names> <surname>García-Morera</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>García-Cumbreras</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Martínez-Cámara</surname></string-name>, <string-name><given-names>M. T.</given-names> <surname>Martín-Valdivia</surname></string-name>, and <string-name><given-names>L. A.</given-names> <surname>Ureña-López</surname></string-name>.
          <year>2015</year>
          .
          <article-title>Overview of TASS 2015</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>