TASS 2015, septiembre 2015, pp. 87-92. recibido 20-07-15; revisado 24-07-15; aceptado 29-07-15

Comparing Supervised Learning Methods for Classifying Spanish Tweets

Comparación de Métodos de Aprendizaje Supervisado para la Clasificación de Tweets en Español

Jorge Valverde, Javier Tejada and Ernesto Cuadros
Universidad Católica San Pablo
Quinta Vivanco S/N, Urb. Campiña Paisajista, Arequipa - Perú
{andoni.valverde, jtejadac, ecuadros}@ucsp.edu.pe

Resumen: El presente artículo presenta un conjunto de experimentos para abordar la tarea de clasificación global de polaridad de tweets en español del TASS 2015. En este trabajo se hace una comparación entre los principales algoritmos de clasificación supervisados para el Análisis de Sentimientos: Support Vector Machines, Naive Bayes, Entropía Máxima y Árboles de Decisión. Se propone también mejorar el rendimiento de estos clasificadores utilizando una técnica de reducción de clases y luego un algoritmo de votación llamado Naive Voting. Los resultados muestran que nuestra propuesta supera los otros métodos de aprendizaje de máquina propuestos en este trabajo.

Palabras clave: Análisis de Sentimientos, Métodos Supervisados, Tweets Españoles

Abstract: This paper presents a set of experiments to address the global polarity classification task for Spanish tweets of TASS 2015. In this work, we compare the main supervised classification algorithms for Sentiment Analysis: Support Vector Machines, Naive Bayes, Maximum Entropy and Decision Trees. We propose to improve the performance of these classifiers using a class reduction technique and then a voting algorithm called Naive Voting. Results show that our proposal outperforms the other machine learning methods evaluated in this work.

Keywords: Sentiment Analysis, Supervised Methods, Spanish Tweets

Published at http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with recognized ISSN 1613-0073.

1 Introduction

Sentiment analysis is the computational study of opinions about entities, events, people, etc. Opinions are important because they are often taken into account in decision processes. Currently, people use different social networks to express their experiences with products or commercial services. Twitter is one of the biggest repositories of opinions, and it is also used as a communication channel between companies and customers. The data generated on Twitter is important for companies because, with that information, they can know what is being said about their products, services and competitors. In recent years, several NLP studies have developed different methods to address the sentiment analysis problem on Twitter.

The vast majority of works aim to classify a comment, according to the polarity expressed, into three categories: positive, negative or neutral (Koppel and Schler, 2006). Supervised classification algorithms are the methods most used to classify comments or opinions.

In this paper, we present a comparison of supervised learning methods which have achieved good results in other research works. Analyzing the errors of those methods, we propose a class reduction technique and a voting algorithm (which takes into account the results of the supervised classifiers) to improve the classification of opinions on Twitter.

The rest of the paper is organized as follows: Section 2 summarizes the main works in sentiment analysis. Section 3 describes our proposal, and Section 4 describes the results we have obtained. Finally, Section 5 presents the conclusions of this work.
2 Related Work

There are two general approaches to classifying comments or opinions as positive, negative or neutral: supervised and unsupervised algorithms. Supervised classification algorithms are used in problems where the number of classes and representative members of each class are known a priori. Unsupervised classification algorithms, unlike supervised ones, do not have a training set; they use clustering algorithms to try to create clusters or groups (Mohri, Rostamizadeh and Talwalkar, 2012).

The sentiment classification task can be formulated as a supervised learning problem with three classes: positive, negative and neutral. The supervised techniques most used in sentiment analysis are Naive Bayes (NB), Support Vector Machines (SVM), Maximum Entropy, etc. In most cases, SVM has shown great improvement over Naive Bayes. Cui, Mittal and Datar (2006) affirm that SVMs are more appropriate for sentiment classification than generative models because they can better differentiate mixed feelings. However, when the training data is small, a Naive Bayes classifier may be more appropriate.

One of the earliest studies on supervised algorithms for classifying opinions is presented in (Pang, Lee and Vaithyanathan, 2002). In that work, the authors use three machine learning techniques to classify the sentiment of movie comments. They test several features to find the optimal set; unigrams, bigrams, adjectives and word positions are used as features in those techniques. Ye, Zhang and Law (2009) used three supervised learning algorithms to classify comments: SVM, Naive Bayes and Decision Trees. They use word frequencies to represent a document.

Most research focuses on the English language, since it is the predominant language on the Internet. There are fewer works on sentiment analysis of Spanish opinions; however, Spanish is playing an increasingly important role. For Spanish comments, Perea-Ortega and Balahur (2014) present several experiments to address the global polarity classification task for Spanish tweets. Those experiments focused on different feature replacements, mainly based on repeated punctuation marks, emoticons and sentiment words. The proposal of Hernandez and Li (2014) is based on semantic approaches with linguistic rules for classifying the polarity of texts in Spanish. Montejo-Raez, Garcia-Cumbreras and Diaz-Galiano (2014) use supervised learning with SVM over the sum of word vectors in a model generated from the Spanish Wikipedia. Jimenez et al. (2014) developed an unsupervised classification system which uses an opinion lexicon and syntactic heuristics to identify the scope of Spanish negation words. San Vicente and Saralegi (2014) implement a Support Vector Machine (SVM) algorithm; that system combines information extracted from polarity lexicons with linguistic features. For Peruvian Spanish opinions, Lopez, Tejada and Thelwall (2012) proposed one of the first studies analyzing Peruvian opinions: a basic method based on lexical resources and a specialized dictionary with vocabulary of that country, used to classify Facebook comments.

3 Proposed Approach

This paper has two major objectives. First, we compare some of the main supervised classification algorithms for Sentiment Analysis: Support Vector Machines, Naive Bayes, Maximum Entropy and Decision Trees. Second, we use a class reduction technique and then a voting algorithm to improve the accuracy of the final results. The architecture of our system is shown in Figure 1.

Figure 1: Proposed Approach

3.1 Comparison of Methods

In this paper we compare several classification methods in order to determine their performance on a set of opinions written by Spanish-speaking users. For the experiments, we used the four supervised classifiers described previously. The comparison of methods has a Training Phase and a Classification Phase, explained below.

3.1.1 Training

For each supervised classification method used in this work, we identified three steps in the training phase: comment preprocessing, vectorization and learning.

Preprocessing: To preprocess the comments correctly, we apply the following techniques:

- Elimination of symbols and special characters.
- Elimination of articles, adverbs, pronouns and prepositions (stopwords).
- Processing of hashtags.
- Correction of words with repeated letters.
- Filtering of words starting with the "@" symbol.
- Elimination of the characters "RT".
- URL removal.
- Stemming of comments (opinions).

Vectorization: Each comment in the training data must be represented mathematically. There are different mathematical models to represent information; the most popular are the boolean model, term frequency (TF), term frequency-inverse document frequency (TF-IDF) and Latent Semantic Analysis (LSA) (Codina, 2005). In this work, we decided to use the TF-IDF model to represent the comments of the corpus because it is more accurate and has shown better results than the other models (Salton and McGill, 1986). Figure 2 shows an example of the corpus of tweets and its TF-IDF representation.

Figure 2: Vector Model Representation of Tweets

Learning: In this step, the classification algorithm receives as parameters the representative vectors of the comments with their class labels. The class labels are: positive (P), negative (N), neutral (NEU) and none (NONE).

3.1.2 Classification

A classifier is a function that assigns a discrete output, often called a class, to a particular input (Mohri, Rostamizadeh and Talwalkar, 2012). In this phase, the classifier receives a set of comments (the test data) and evaluates this input to predict the corresponding class.
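The paper does not give its preprocessing or vectorization at code level; the following is a minimal Python sketch of the preprocessing rules of Section 3.1.1 and a plain TF-IDF weighting. The regexes and the tiny stopword list are illustrative assumptions, and the stemming step the authors apply is omitted here.

```python
import math
import re
from collections import Counter

# Illustrative subset of Spanish stopwords, not a full list.
STOPWORDS = {"el", "la", "los", "las", "un", "una", "de", "en", "y", "que"}

def preprocess(tweet: str) -> list[str]:
    """Apply the cleaning steps listed in Section 3.1.1 to one tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)    # URL removal
    text = re.sub(r"\brt\b", " ", text)          # drop the "RT" marker
    text = re.sub(r"@\w+", " ", text)            # filter @user mentions
    text = re.sub(r"#", "", text)                # hashtags: keep the word
    text = re.sub(r"(\w)\1{2,}", r"\1", text)    # "gustaaaa" -> "gusta"
    text = re.sub(r"[^a-záéíóúñü ]", " ", text)  # symbols, special chars
    return [w for w in text.split() if w not in STOPWORDS]

def tfidf(corpus_tokens):
    """TF-IDF vectors (as dicts) for a list of tokenized comments."""
    n = len(corpus_tokens)
    df = Counter(w for toks in corpus_tokens for w in set(toks))
    vectors = []
    for toks in corpus_tokens:
        tf = Counter(toks)
        vectors.append({w: (c / len(toks)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors
```

A term that occurs in every comment gets weight zero under this scheme, which is the usual TF-IDF behavior; a production system would normally use a library vectorizer instead of this hand-rolled one.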
3.2 Our Proposal

In the first evaluation of the machine learning methods, the accuracy results obtained were somewhat low. For this reason, we propose two techniques to improve the results of the classifiers. The first technique, called class reduction, removes one class label (NEU or NONE) with the aim of reducing the classifiers' margin of error and the number of classes to evaluate. The second technique, called naive voting, receives as input the optimized classifiers of the first technique. A more detailed description of these techniques is given below.

3.2.1 Class Reduction

The basic idea of this technique was proposed in (Koppel and Schler, 2006). It works as follows.

Training and evaluation for three classes: We decided to train the classifiers considering three classes: Positive-Negative-Neutral and Positive-Negative-None. The classifiers were trained and tested in this way, and the results obtained with this simplification were better. Due to this improvement, we decided to join the partial results of these classifiers; with this union, we can classify the comments considering the four classes defined initially.

Union of partial results: We proposed to merge the partial results into a single result. We established a set of rules to address the union of the partial results of the class reduction technique; these rules are shown in Table 1.

Rule | P-N-NEU output | P-N-NONE output | Final Result
  1  | P              | P               | P
  2  | P              | N               | NEU
  3  | P              | NONE            | NONE
  4  | N              | P               | NEU
  5  | N              | N               | N
  6  | N              | NONE            | NONE
  7  | NEU            | P               | NEU
  8  | NEU            | N               | NEU
  9  | NEU            | NONE            | NONE

Table 1: Rules for the Class Reduction Technique

3.2.2 Voting System

The final technique presented in this paper is a voting system. We chose this method because all classifiers have a margin of error and could therefore classify a comment incorrectly; a voting system tries to reduce this margin of error. Voting systems are based on different classification methods, and many studies have used voting systems to classify text. Kittler, Hatef and Matas (1998) and Kuncheva (2004) describe some of these methods. Rahman, Alam and Fairhurst (2002) show that in many cases majority vote techniques are most efficient when classifiers are combined. Platie et al. (2009) and Tsutsumi, Shimada and Endo (2008) state that the best voting systems for classification include Naive Voting, Weighted Voting, Maximum Choice Voting and F-Score/Recall/Precision Voting.

We proposed the Naive Voting technique, which takes as input the four classifiers proposed in this paper. Naive Voting is one of the simplest voting algorithms: the comment is classified according to majority agreement, i.e., the class with the most votes across the classifiers is the winning class. The rules we applied for Naive Voting are described in Table 2.

Rule | P   | N   | NEU | NONE | Voting
  1  | 4   | 0   | 0   | 0    | P
  2  | 3   | 0/1 | 0/1 | 0/1  | P
  3  | 2   | 0/1 | 0/1 | 0/1  | P
  4  | 0   | 4   | 0   | 0    | N
  5  | 0/1 | 3   | 0/1 | 0/1  | N
  6  | 0/1 | 2   | 0/1 | 0/1  | N
  7  | 0   | 0   | 4   | 0    | NEU
  8  | 0/1 | 0/1 | 3   | 0/1  | NEU
  9  | 0/1 | 0/1 | 2   | 0/1  | NEU
 10  | 0   | 0   | 0   | 4    | NONE
 11  | 0/1 | 0/1 | 0/1 | 3    | NONE
 12  | 0/1 | 0/1 | 0/1 | 2    | NONE
 13  | 2   | 2   | 0   | 0    | P/N
 14  | 2   | 0   | 2   | 0    | NEU
 15  | 2   | 0   | 0   | 2    | NONE
 16  | 0   | 2   | 2   | 0    | NEU
 17  | 0   | 2   | 0   | 2    | NONE
 18  | 0   | 0   | 2   | 2    | NONE
 19  | 2   | 0   | 2   | 0    | NEU
 20  | 1   | 1   | 1   | 1    | NEU

Table 2: Naive Voting Rules

Each row of Table 2 shows the votes obtained by each polarity (P, N, NEU, NONE) according to the outputs of the proposed classifiers. Since we have four classifiers, the largest possible vote is 4 and the minimum is 0. The class with the highest vote count is the winning class. In the event of a tie, a set of rules determines the winner: for example, in the case of a draw at 2 between the positive and negative classes, the winner is chosen by lottery; in the other tie cases, the NEU or NONE class is chosen as the winner.
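The merge rules of Table 1 and the voting rules of Table 2 can be sketched compactly in Python. This is our own illustrative encoding, not the authors' code; the function names are assumptions, and the 2-2 tie between P and N is broken at random, as the lottery described above.

```python
import random

# Table 1: merge the outputs of the P-N-NEU and P-N-NONE classifiers.
MERGE_RULES = {
    ("P", "P"): "P",      ("P", "N"): "NEU",   ("P", "NONE"): "NONE",
    ("N", "P"): "NEU",    ("N", "N"): "N",     ("N", "NONE"): "NONE",
    ("NEU", "P"): "NEU",  ("NEU", "N"): "NEU", ("NEU", "NONE"): "NONE",
}

def merge_partial(label_pnneu: str, label_pnnone: str) -> str:
    """Combine the two reduced classifiers into one 4-class decision."""
    return MERGE_RULES[(label_pnneu, label_pnnone)]

def naive_vote(labels: list[str]) -> str:
    """Majority vote over the four classifiers' outputs (Table 2)."""
    counts = {c: labels.count(c) for c in ("P", "N", "NEU", "NONE")}
    best = max(counts.values())
    winners = [c for c, n in counts.items() if n == best]
    if len(winners) == 1:
        return winners[0]                      # clear majority
    if set(winners) == {"P", "N"}:
        return random.choice(["P", "N"])       # rule 13: lottery
    if len(winners) == 4:
        return "NEU"                           # rule 20: one vote each
    if "NEU" in winners and "NONE" in winners:
        return "NONE"                          # rule 18
    return "NEU" if "NEU" in winners else "NONE"   # remaining 2-2 ties
```

Under this encoding, for example, the outputs ["P", "NEU", "P", "NEU"] resolve to NEU (rule 14) and ["NEU", "NONE", "NEU", "NONE"] to NONE (rule 18).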
4 Experimental Results

4.1 Training and Test Data

We used the corpora provided by the organization of TASS 2015. For our purposes, we used the General Corpus and the Balanced General Corpus. The first is composed of a training set and a test set containing 7,219 and 60,798 tweets, respectively. The Balanced General Corpus is a test subset containing 1,000 tweets, used only for testing. A complete description of these corpora is given in (Villena-Román et al., 2015).

4.2 Evaluation of Classifiers

We performed a series of tests to address Task 1 of TASS 2015, focusing on finding the global polarity of the tweet corpora for four class labels (P, N, NEU, NONE). A general description of the runs that we submitted to TASS 2015 is given in Table 3.

Tech.  | Run-Id (60,798 test) | Run-Id (1,000 test) | Description
SVM    | UCSP-RUN-2           | UCSP-RUN-2          | Support Vector Machine
NB     | TestNB60000          | UCSP-RUN-2-NB       | Naive Bayes
ME     | UCP-RUN-2-ME         | TestME1000          | Max. Entropy
DT     | TestDT60000          | TestDT10000         | Decision Tree
SVM II | UCSP-RUN-1           | UCSP-RUN-1          | SVM + Class Reduction
NB II  | UCSP-RUN-1-NB        | UCSP-RUN-1-NB       | NB + Class Reduction
ME II  | UCSP-RUN-1-ME        | UCSP-RUN-1-ME       | ME + Class Reduction
DT II  | UCSP-RUN-1-DT        | UCSP-RUN-1-DR       | DT + Class Reduction
Voting | UCSP-RUN-3           | UCSP-RUN-3          | Naive Voting

Table 3: Proposed Techniques

The results obtained for the evaluation of our proposal are shown in Table 4 (evaluation on the full test corpus) and Table 5 (evaluation on the 1k test corpus). It can be seen that the class reduction technique and our voting algorithm improve the accuracy of the original supervised classification algorithms.

Approach    | Method | Accuracy
Comparative | SVM    | 0.594
Comparative | NB     | 0.560
Comparative | ME     | 0.479
Comparative | DT     | 0.494
Proposal    | SVM II | 0.602
Proposal    | NB II  | 0.560
Proposal    | ME II  | 0.600
Proposal    | DT II  | 0.536
Proposal    | Voting | 0.613

Table 4: Results of evaluating the Full Test Corpus

Approach    | Method | Accuracy
Comparative | SVM    | 0.586
Comparative | NB     | 0.559
Comparative | ME     | 0.618
Comparative | DT     | 0.459
Proposal    | SVM II | 0.582
Proposal    | NB II  | 0.636
Proposal    | ME II  | 0.626
Proposal    | DT II  | 0.495
Proposal    | Voting | 0.626

Table 5: Results of evaluating the 1k-Test Corpus

Class reduction improves the results because the classifier has to decide between fewer options and can therefore reduce its margin of error. The voting algorithm gives good results because it takes into account the decisions of all the classifiers; it tries to reach a single decision that might be the best, like a consensus among the classifiers. It is important to note, however, that a voting algorithm is only as good as the majority of its voters (classifiers); if most classifiers perform poorly, the voting algorithm will not deliver the expected results.

5 Conclusion

One of the main goals of this paper was to evaluate supervised classification algorithms on the task of sentiment analysis. The results of evaluating the classifiers in the initial experiments were not satisfactory. Using an optimization stage (class reduction and voting systems), accuracy improved slightly compared to the original techniques. It was shown that adequate voting algorithms improve the accuracy of classifiers. For the proper operation of a voting system, multiple classifiers with a relatively high rate of efficiency are required: if one classifier fails, the others can still give the correct prediction, but if most classifiers give low results, the voting system does not ensure correct performance.
Acknowledgments

The research leading to these results has been funded by the Programa Nacional de Innovación para la Competitividad y Productividad (Innóvate Perú).

References

Codina, L. 2005. Teoría de la recuperación de información: modelos fundamentales y aplicaciones a la gestión documental. Revista internacional científica y profesional.

Cui, H., V. Mittal and M. Datar. 2006. Comparative experiments on sentiment classification for online product reviews. Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Massachusetts.

Hernandez, R. and Xiaoou Li. 2014. Sentiment analysis of texts in Spanish based on semantic approaches with linguistic rules. Proceedings of the TASS workshop at SEPLN 2014.

Jimenez, S., E. Martinez, M. Valdivia and L. Lopez. 2014. SINAI-ESMA: An unsupervised approach for Sentiment Analysis in Twitter. Proceedings of the TASS workshop at SEPLN 2014.

Kittler, J., M. Hatef and J. Matas. 1998. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Koppel, M. and J. Schler. 2006. The Importance of Neutral Examples for Learning Sentiment. Dept. of Computer Science, Ramat Gan, Israel.

Kuncheva, L. I. 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons.

Lopez, R., J. Tejada and M. Thelwall. 2012. Spanish SentiStrength as a Tool for Opinion Mining Peruvian Facebook and Twitter. Artificial Intelligence Driven Solutions to Business and Engineering Problems.

Mohri, M., A. Rostamizadeh and A. Talwalkar. 2012. Foundations of Machine Learning. The MIT Press.

Montejo-Raez, A., M. A. Garcia-Cumbreras and M. C. Diaz-Galiano. 2014. SINAI Word2Vec participation in TASS 2014. Proceedings of the TASS workshop at SEPLN 2014.

Pang, B., L. Lee and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 10.

Perea-Ortega, J. and A. Balahur. 2014. Experiments on feature replacements for polarity classification of Spanish tweets. Proceedings of the TASS workshop at SEPLN 2014.

Platie, M., M. Rouche, G. Dray and P. Poncelet. 2009. Is a voting approach accurate for opinion mining? Centre de Recherche LGI2P.

Rahman, A., H. Alam and M. Fairhurst. 2002. Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation.

Salton, G. and M. McGill. 1986. Introduction to Modern Information Retrieval.

San Vicente, I. and X. Saralegi. 2014. Looking for Features for Supervised Tweet Polarity Classification. Proceedings of the TASS workshop at SEPLN 2014.

Tsutsumi, K., K. Shimada and T. Endo. 2008. Movie Review Classification Based on a Multiple Classifier. Department of Artificial Intelligence.

Villena-Román, J., J. García-Morera, M. A. García-Cumbreras, E. Martínez-Cámara, M. T. Martín-Valdivia and L. A. Ureña-López. 2015. Overview of TASS 2015.

Ye, Q., Z. Zhang and R. Law. 2009. Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications, 36:6527-6535.