TASS 2015, septiembre 2015, pp. 87-92. recibido 20-07-15; revisado 24-07-15; aceptado 29-07-15

Comparing Supervised Learning Methods for Classifying Spanish Tweets

Comparación de Métodos de Aprendizaje Supervisado para la Clasificación de Tweets en Español

Jorge Valverde, Javier Tejada and Ernesto Cuadros
Universidad Católica San Pablo
Quinta Vivanco S/N, Urb. Campiña Paisajista, Arequipa - Perú
{andoni.valverde, jtejadac, ecuadros}@ucsp.edu.pe

Resumen: El presente artículo presenta un conjunto de experimentos para abordar la tarea de clasificación global de polaridad de tweets en español del TASS 2015. En este trabajo se hace una comparación entre los principales algoritmos de clasificación supervisados para el Análisis de Sentimientos: Support Vector Machines, Naive Bayes, Entropía Máxima y Árboles de Decisión. Se propone también mejorar el rendimiento de estos clasificadores utilizando una técnica de reducción de clases y luego un algoritmo de votación llamado Naive Voting. Los resultados muestran que nuestra propuesta supera los otros métodos de aprendizaje de máquina propuestos en este trabajo.

Palabras clave: Análisis de Sentimientos, Métodos Supervisados, Tweets Españoles

Abstract: This paper presents a set of experiments to address the global polarity classification task for Spanish tweets of TASS 2015. In this work, we compare the main supervised classification algorithms for Sentiment Analysis: Support Vector Machines, Naive Bayes, Maximum Entropy and Decision Trees. We propose to improve the performance of these classifiers using a class reduction technique and then a voting algorithm called Naive Voting. Results show that our proposal outperforms the other machine learning methods evaluated in this work.

Keywords: Sentiment Analysis, Supervised Methods, Spanish Tweets

Published at http://ceur-ws.org/Vol-1397/. CEUR-WS.org is a serial publication with recognized ISSN 1613-0073.

1 Introduction

Sentiment analysis is the computational study of opinions about entities, events, people, etc. Opinions are important because they are often taken into account in decision processes. Currently, people use different social networks to express their experiences with products or commercial services. Twitter is one of the biggest repositories of opinions, and it is also used as a communication channel between companies and customers. The data generated on Twitter is important for companies because, with that information, they can know what is being said about their products, services and competitors. In recent years, several NLP studies have developed different methods to address the sentiment analysis problem on Twitter.

The vast majority of works aim to classify a comment, according to the polarity expressed, into three categories: positive, negative or neutral (Koppel and Schler, 2006). Supervised classification algorithms are the methods most used to classify comments or opinions.

In this paper, we present a comparison of supervised learning methods which have achieved good results in other research works. Analyzing the errors of those methods, we propose a class reduction technique and a voting algorithm (which takes into account the results of the supervised classifiers) to improve the classification of opinions on Twitter.

The rest of the paper is organized as follows: Section 2 summarizes the main works in sentiment analysis. Section 3 describes our proposal, and Section 4 describes the results we have obtained. Finally, Section 5 presents the conclusions of this work.
2 Related Work

There are two general approaches to classifying comments or opinions as positive, negative or neutral: supervised and unsupervised algorithms. Supervised classification algorithms are used in problems where the number of classes and representative members of each class are known a priori. Unsupervised classification algorithms, unlike supervised ones, do not have a training set; they use clustering algorithms to try to create clusters or groups (Mohri, Rostamizadeh and Talwalkar, 2012).

The sentiment classification task can be formulated as a supervised learning problem with three classes: positive, negative and neutral. The supervised techniques most used in sentiment analysis are Naive Bayes (NB), Support Vector Machines (SVM), Maximum Entropy, etc. In most cases, SVM has shown great improvement over Naive Bayes. Cui, Mittal and Datar (2006) affirm that SVMs are more appropriate for sentiment classification than generative models because they can better differentiate mixed feelings. However, when the training data is small, a Naive Bayes classifier may be more appropriate.

One of the earliest studies on supervised algorithms for classifying opinions is presented in (Pang, Lee and Vaithyanathan, 2002). In that work, the authors use three machine learning techniques to classify the sentiment of movie comments. They test several features to find the optimal set; unigrams, bigrams, adjectives and word positions are used as features in those techniques. Ye, Zhang and Law (2009) used three supervised learning algorithms to classify comments: SVM, Naive Bayes and Decision Trees. They use word frequencies to represent a document.

Most research focuses on the English language, since it is the predominant language on the Internet. There are fewer works on sentiment analysis of Spanish opinions; however, Spanish is playing an increasingly important role. For Spanish comments, Perea-Ortega and Balahur (2014) present several experiments to address the global polarity classification task for Spanish tweets. Those experiments focused on different feature replacements, mainly based on repeated punctuation marks, emoticons and sentiment words. The proposal of Hernandez and Li (2014) is based on semantic approaches with linguistic rules for classifying the polarity of texts in Spanish. Montejo-Raez, Garcia-Cumbreras and Diaz-Galiano (2014) use supervised learning with SVM over the sum of word vectors in a model generated from the Spanish Wikipedia. Jimenez et al. (2014) developed an unsupervised classification system which uses an opinion lexicon and syntactic heuristics to identify the scope of Spanish negation words. San Vicente and Saralegi (2014) implement a Support Vector Machine (SVM) algorithm; that system combines information extracted from polarity lexicons with linguistic features. For Peruvian Spanish opinions, Lopez, Tejada and Thelwall (2012) proposed one of the first studies analyzing Peruvian opinions: a basic method based on lexical resources and a specialized dictionary with vocabulary of that country, used to classify Facebook comments.

3 Proposed Approach

This paper has two major objectives. First, we compare some of the main supervised classification algorithms for Sentiment Analysis: Support Vector Machines, Naive Bayes, Maximum Entropy and Decision Trees. Second, we use a class reduction technique and then a voting algorithm to improve the accuracy of the final results. The architecture of our system is shown in Figure 1.

Figure 1: Proposed Approach

3.1 Comparison of Methods

In this paper we compare several classification methods in order to determine their performance on a set of opinions written by Spanish-speaking users. For the experiments, we used the four supervised classifiers described previously. The comparison of methods has a Training Phase and a Classification Phase, explained below.

3.1.1 Training

For each supervised classification method used in this work, we identified three steps in the training phase: comment preprocessing, vectorization and learning.

Preprocessing: To preprocess the comments correctly, we apply the following techniques:

- Elimination of symbols and special characters.
- Elimination of articles, adverbs, pronouns and prepositions (stopwords).
- Processing of hashtags.
- Correction of words with repeated letters.
- Filtering of words starting with the "@" symbol.
- Elimination of the characters "RT".
- URL removal.
- Stemming of comments (opinions).

Vectorization: Each comment in the training data must be represented mathematically. There are different mathematical models to represent information; the most popular are the boolean model, term frequency (TF), term frequency-inverse document frequency (TF-IDF) and Latent Semantic Analysis (LSA) (Codina, 2005). In this work, we decided to use the TF-IDF model to represent the comments of the corpus because it is more accurate and has shown better results than the other models (Salton and McGill, 1986). Figure 2 shows an example of the corpus of tweets and its TF-IDF representation.

Figure 2: Vector Model Representation of Tweets

Learning: In this step, the classification algorithm receives as parameters the representative vectors of the comments with their class labels. The class labels are: positive (P), negative (N), neutral (NEU) and none (NONE).

3.1.2 Classification

A classifier is a function that assigns a discrete output, often called a class, to a particular input (Mohri, Rostamizadeh and Talwalkar, 2012). In this phase, the classifier receives a set of comments (the test data) and evaluates this input to predict the corresponding class.
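The paper does not give its preprocessing or vectorization at code level; the following is a minimal Python sketch of the preprocessing rules of Section 3.1.1 and a plain TF-IDF weighting. The regexes and the tiny stopword list are illustrative assumptions, and the stemming step the authors apply is omitted here.

```python
import math
import re
from collections import Counter

# Illustrative subset of Spanish stopwords, not a full list.
STOPWORDS = {"el", "la", "los", "las", "un", "una", "de", "en", "y", "que"}

def preprocess(tweet: str) -> list[str]:
    """Apply the cleaning steps listed in Section 3.1.1 to one tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)    # URL removal
    text = re.sub(r"\brt\b", " ", text)          # drop the "RT" marker
    text = re.sub(r"@\w+", " ", text)            # filter @user mentions
    text = re.sub(r"#", "", text)                # hashtags: keep the word
    text = re.sub(r"(\w)\1{2,}", r"\1", text)    # "gustaaaa" -> "gusta"
    text = re.sub(r"[^a-záéíóúñü ]", " ", text)  # symbols, special chars
    return [w for w in text.split() if w not in STOPWORDS]

def tfidf(corpus_tokens):
    """TF-IDF vectors (as dicts) for a list of tokenized comments."""
    n = len(corpus_tokens)
    df = Counter(w for toks in corpus_tokens for w in set(toks))
    vectors = []
    for toks in corpus_tokens:
        tf = Counter(toks)
        vectors.append({w: (c / len(toks)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors
```

A term that occurs in every comment gets weight zero under this scheme, which is the usual TF-IDF behavior; a production system would normally use a library vectorizer instead of this hand-rolled one.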
3.2 Our Proposal

In the first evaluation of the machine learning methods, the accuracy results obtained were somewhat low. For this reason, we propose two techniques to improve the results of the classifiers. The first technique, called class reduction, removes one class label (NEU or NONE) with the aim of reducing the classifiers' margin of error and the number of classes to evaluate. The second technique, called naive voting, receives as input the optimized classifiers of the first technique. A more detailed description of these techniques is given below.

3.2.1 Class Reduction

The basic idea of this technique was proposed in (Koppel and Schler, 2006). It works as follows.

Training and evaluation for three classes: We decided to train the classifiers considering three classes: Positive-Negative-Neutral and Positive-Negative-None. The classifiers were trained and tested in this way, and the results obtained with this simplification were better. Due to this improvement, we decided to join the partial results of these classifiers; with this union, we can classify the comments considering the four classes defined initially.

Union of partial results: We proposed to merge the partial results into a single result. We established a set of rules to address the union of the partial results of the class reduction technique; these rules are shown in Table 1.

Rule | P-N-NEU output | P-N-NONE output | Final Result
  1  | P              | P               | P
  2  | P              | N               | NEU
  3  | P              | NONE            | NONE
  4  | N              | P               | NEU
  5  | N              | N               | N
  6  | N              | NONE            | NONE
  7  | NEU            | P               | NEU
  8  | NEU            | N               | NEU
  9  | NEU            | NONE            | NONE

Table 1: Rules for the Class Reduction Technique

3.2.2 Voting System

The final technique presented in this paper is a voting system. We chose this method because all classifiers have a margin of error and could therefore classify a comment incorrectly; a voting system tries to reduce this margin of error. Voting systems are based on different classification methods, and many studies have used voting systems to classify text. Kittler, Hatef and Matas (1998) and Kuncheva (2004) describe some of these methods. Rahman, Alam and Fairhurst (2002) show that in many cases majority vote techniques are most efficient when classifiers are combined. Platie et al. (2009) and Tsutsumi, Shimada and Endo (2008) state that the best voting systems for classification include Naive Voting, Weighted Voting, Maximum Choice Voting and F-Score/Recall/Precision Voting.

We proposed the Naive Voting technique, which takes as input the four classifiers proposed in this paper. Naive Voting is one of the simplest voting algorithms: the comment is classified according to majority agreement, i.e., the class with the most votes across the classifiers is the winning class. The rules we applied for Naive Voting are described in Table 2.

Rule | P   | N   | NEU | NONE | Voting
  1  | 4   | 0   | 0   | 0    | P
  2  | 3   | 0/1 | 0/1 | 0/1  | P
  3  | 2   | 0/1 | 0/1 | 0/1  | P
  4  | 0   | 4   | 0   | 0    | N
  5  | 0/1 | 3   | 0/1 | 0/1  | N
  6  | 0/1 | 2   | 0/1 | 0/1  | N
  7  | 0   | 0   | 4   | 0    | NEU
  8  | 0/1 | 0/1 | 3   | 0/1  | NEU
  9  | 0/1 | 0/1 | 2   | 0/1  | NEU
 10  | 0   | 0   | 0   | 4    | NONE
 11  | 0/1 | 0/1 | 0/1 | 3    | NONE
 12  | 0/1 | 0/1 | 0/1 | 2    | NONE
 13  | 2   | 2   | 0   | 0    | P/N
 14  | 2   | 0   | 2   | 0    | NEU
 15  | 2   | 0   | 0   | 2    | NONE
 16  | 0   | 2   | 2   | 0    | NEU
 17  | 0   | 2   | 0   | 2    | NONE
 18  | 0   | 0   | 2   | 2    | NONE
 19  | 2   | 0   | 2   | 0    | NEU
 20  | 1   | 1   | 1   | 1    | NEU

Table 2: Naive Voting Rules

Each row of Table 2 shows the votes obtained by each polarity (P, N, NEU, NONE) according to the outputs of the proposed classifiers. Since we have four classifiers, the largest possible vote is 4 and the minimum is 0. The class with the highest vote count is the winning class. In the event of a tie, a set of rules determines the winner: for example, in the case of a draw at 2 between the positive and negative classes, the winner is chosen by lottery; in the other tie cases, the NEU or NONE class is chosen as the winner.
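The merge rules of Table 1 and the voting rules of Table 2 can be sketched compactly in Python. This is our own illustrative encoding, not the authors' code; the function names are assumptions, and the 2-2 tie between P and N is broken at random, as the lottery described above.

```python
import random

# Table 1: merge the outputs of the P-N-NEU and P-N-NONE classifiers.
MERGE_RULES = {
    ("P", "P"): "P",      ("P", "N"): "NEU",   ("P", "NONE"): "NONE",
    ("N", "P"): "NEU",    ("N", "N"): "N",     ("N", "NONE"): "NONE",
    ("NEU", "P"): "NEU",  ("NEU", "N"): "NEU", ("NEU", "NONE"): "NONE",
}

def merge_partial(label_pnneu: str, label_pnnone: str) -> str:
    """Combine the two reduced classifiers into one 4-class decision."""
    return MERGE_RULES[(label_pnneu, label_pnnone)]

def naive_vote(labels: list[str]) -> str:
    """Majority vote over the four classifiers' outputs (Table 2)."""
    counts = {c: labels.count(c) for c in ("P", "N", "NEU", "NONE")}
    best = max(counts.values())
    winners = [c for c, n in counts.items() if n == best]
    if len(winners) == 1:
        return winners[0]                      # clear majority
    if set(winners) == {"P", "N"}:
        return random.choice(["P", "N"])       # rule 13: lottery
    if len(winners) == 4:
        return "NEU"                           # rule 20: one vote each
    if "NEU" in winners and "NONE" in winners:
        return "NONE"                          # rule 18
    return "NEU" if "NEU" in winners else "NONE"   # remaining 2-2 ties
```

Under this encoding, for example, the outputs ["P", "NEU", "P", "NEU"] resolve to NEU (rule 14) and ["NEU", "NONE", "NEU", "NONE"] to NONE (rule 18).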
4 Experimental Results

4.1 Training and Test Data

We used the corpora provided by the organization of TASS 2015. For our purposes, we used the General Corpus and the Balanced General Corpus. The first is composed of a training set and a test set containing 7,219 and 60,798 tweets, respectively. The Balanced General Corpus is a test subset containing 1,000 tweets, used only for testing. A complete description of these corpora is given in (Villena-Román et al., 2015).

4.2 Evaluation of Classifiers

We performed a series of tests to address Task 1 of TASS 2015, focusing on finding the global polarity of the tweet corpora for four class labels (P, N, NEU, NONE). A general description of the runs that we submitted to TASS 2015 is given in Table 3.

Tech.  | Run-Id (60,798 test) | Run-Id (1,000 test) | Description
SVM    | UCSP-RUN-2           | UCSP-RUN-2          | Support Vector Machine
NB     | TestNB60000          | UCSP-RUN-2-NB       | Naive Bayes
ME     | UCP-RUN-2-ME         | TestME1000          | Max. Entropy
DT     | TestDT60000          | TestDT10000         | Decision Tree
SVM II | UCSP-RUN-1           | UCSP-RUN-1          | SVM + Class Reduction
NB II  | UCSP-RUN-1-NB        | UCSP-RUN-1-NB       | NB + Class Reduction
ME II  | UCSP-RUN-1-ME        | UCSP-RUN-1-ME       | ME + Class Reduction
DT II  | UCSP-RUN-1-DT        | UCSP-RUN-1-DR       | DT + Class Reduction
Voting | UCSP-RUN-3           | UCSP-RUN-3          | Naive Voting

Table 3: Proposed Techniques

The results obtained for the evaluation of our proposal are shown in Table 4 (evaluation on the full test corpus) and Table 5 (evaluation on the 1k test corpus). It can be seen that the class reduction technique and our voting algorithm improve the accuracy of the original supervised classification algorithms.

Approach    | Method | Accuracy
Comparative | SVM    | 0.594
Comparative | NB     | 0.560
Comparative | ME     | 0.479
Comparative | DT     | 0.494
Proposal    | SVM II | 0.602
Proposal    | NB II  | 0.560
Proposal    | ME II  | 0.600
Proposal    | DT II  | 0.536
Proposal    | Voting | 0.613

Table 4: Results of evaluating the Full Test Corpus

Approach    | Method | Accuracy
Comparative | SVM    | 0.586
Comparative | NB     | 0.559
Comparative | ME     | 0.618
Comparative | DT     | 0.459
Proposal    | SVM II | 0.582
Proposal    | NB II  | 0.636
Proposal    | ME II  | 0.626
Proposal    | DT II  | 0.495
Proposal    | Voting | 0.626

Table 5: Results of evaluating the 1k-Test Corpus

Class reduction improves the results because the classifier has to decide between fewer options and can therefore reduce its margin of error. The voting algorithm gives good results because it takes into account the decisions of all the classifiers; it tries to reach a single decision that might be the best, like a consensus among the classifiers. It is important to note, however, that a voting algorithm is only as good as the majority of its voters (classifiers); if most classifiers perform poorly, the voting algorithm will not deliver the expected results.

5 Conclusion

One of the main goals of this paper was to evaluate supervised classification algorithms on the task of sentiment analysis. The results of evaluating the classifiers in the initial experiments were not satisfactory. Using an optimization stage (class reduction and voting systems), accuracy improved slightly compared to the original techniques. It was shown that adequate voting algorithms improve the accuracy of classifiers. For the proper operation of a voting system, multiple classifiers with a relatively high rate of efficiency are required: if one classifier fails, the others can still give the correct prediction, but if most classifiers give low results, the voting system does not ensure correct performance.
Acknowledgments

The research leading to these results has been funded by the Programa Nacional de Innovación para la Competitividad y Productividad (Innóvate Perú).

References

Codina, L. 2005. Teoría de la recuperación de información: modelos fundamentales y aplicaciones a la gestión documental. Revista internacional científica y profesional.

Cui, H., V. Mittal and M. Datar. 2006. Comparative experiments on sentiment classification for online product reviews. Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Massachusetts.

Hernandez, R. and Xiaoou Li. 2014. Sentiment analysis of texts in Spanish based on semantic approaches with linguistic rules. Proceedings of the TASS workshop at SEPLN 2014.

Jimenez, S., E. Martinez, M. Valdivia and L. Lopez. 2014. SINAI-ESMA: An unsupervised approach for Sentiment Analysis in Twitter. Proceedings of the TASS workshop at SEPLN 2014.

Kittler, J., M. Hatef and J. Matas. 1998. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Koppel, M. and J. Schler. 2006. The Importance of Neutral Examples for Learning Sentiment. Dept. of Computer Science, Ramat Gan, Israel.

Kuncheva, L. I. 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons.

Lopez, R., J. Tejada and M. Thelwall. 2012. Spanish SentiStrength as a Tool for Opinion Mining Peruvian Facebook and Twitter. Artificial Intelligence Driven Solutions to Business and Engineering Problems.

Mohri, M., A. Rostamizadeh and A. Talwalkar. 2012. Foundations of Machine Learning. The MIT Press.

Montejo-Raez, A., M. A. Garcia-Cumbreras and M. C. Diaz-Galiano. 2014. SINAI Word2Vec participation in TASS 2014. Proceedings of the TASS workshop at SEPLN 2014.

Pang, B., L. Lee and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 10.

Perea-Ortega, J. and A. Balahur. 2014. Experiments on feature replacements for polarity classification of Spanish tweets. Proceedings of the TASS workshop at SEPLN 2014.

Platie, M., M. Rouche, G. Dray and P. Poncelet. 2009. Is a voting approach accurate for opinion mining? Centre de Recherche LGI2P.

Rahman, A., H. Alam and M. Fairhurst. 2002. Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation.

Salton, G. and M. McGill. 1986. Introduction to Modern Information Retrieval.

San Vicente, I. and X. Saralegi. 2014. Looking for Features for Supervised Tweet Polarity Classification. Proceedings of the TASS workshop at SEPLN 2014.

Tsutsumi, K., K. Shimada and T. Endo. 2008. Movie Review Classification Based on a Multiple Classifier. Department of Artificial Intelligence.

Villena-Román, J., J. García-Morera, M. A. García-Cumbreras, E. Martínez-Cámara, M. T. Martín-Valdivia and L. A. Ureña-López. 2015. Overview of TASS 2015.

Ye, Q., Z. Zhang and R. Law. 2009. Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications, 36:6527-6535.