TASS 2015, septiembre 2015, pp 65-70                                            recibido 09-07-15 revisado 24-07-15 aceptado 29-07-15


                     Sentiment Analysis for Twitter: TASS 2015∗
                       Análisis de sentimientos en Twitter: TASS 2015

                Oscar S. Siordia                                                 Mario Graff
                Daniela Moctezuma                                         Sabino Miranda-Jimenez
                    CentroGEO                                                    Eric S. Tellez
             Col. Lomas de Padierna,                                     Elio-Atenógenes Villaseñor
                Delegación Tlalpan,                                               INFOTEC
              CP. 14240, México D.F.                                      Ave. San Fernando 37,
      osanchez;dmoctezuma@centrogeo.edu.mx                                Tlalpan, Toriello Guerra,
                                                                       14050 Ciudad de México, D.F.
                                                                         mario.graff;sabino.miranda;
                                                                 eric.tellez;elio.villasenor@infotec.com.mx

        Resumen: En este artı́culo se presentan los resultados obtenidos en la tarea 1:
        clasificación global de cinco niveles de polaridad para un conjunto de tweets en
        español, del reto TASS 2015. En nuestra metodologı́a, la representación de los
        tweets estuvo basada en caracterı́sticas lingüisticas y de polaridad como lematizado
        de palabras, filtros de palabras, reglas de negación, entre otros. Además, se uti-
        lizaron diferentes transformaciones como LDA, LSI y la matriz TF-IDF, todas estas
        representaciones se combinaron con el clasificador SVM. Los resultados muestran
        que LSI y la matriz TF-IDF mejoran el rendimiento del clasificador SVM utilizado.
        Palabras clave: Análisis de sentimiento, Minerı́a de opinión, Twitter
        Abstract: In this paper we present experiments for global polarity classification
        task of Spanish tweets for TASS 2015 challenge. In our methodology, tweets rep-
        resentation is focused on linguistic and polarity features such as lemmatized words,
        filter of content words, rules of negation, among others. In addition, different trans-
        formations are used (LDA, LSI, and TF-IDF) and combined with a SVM classifier.
        The results show that LSI and TF-IDF representations improve the performance of
        the SVM classifier applied.
        Keywords: Sentiment analysis, Opinion mining, Twitter.


1     Introduction                                                     ion or any level of each of them. Determin-
                                                                       ing whether a text document has a positive
In last years the production of textual doc-
                                                                       or negative opinion is turning to an essential
uments in social media has increased ex-
                                                                       tool for both public and private companies
ponentially. This ever-growing amount of
                                                                       (Peng, Zuo, y He, 2008). This tool is use-
available information promotes the research
                                                                       ful to know “What people think”, which is
and business activities around opinion min-
                                                                       an important information in order to help to
ing and sentiment analysis areas. In so-
                                                                       any decision-making process (for any level of
cial media, people share their opinions about
                                                                       government, marketing, etc.) (Pang y Lee,
events, other people and organizations. This
                                                                       2008). With this purpose, in this paper we
is the main reason why text mining is becom-
                                                                       describe the methodology employed for the
ing an important research topic. Automatic
                                                                       workshop TASS 2015 (Taller de Análisis de
sentiment analysis in text is one of most im-
                                                                       Sentimientos de la SEPLN). The TASS work-
portant task in text mining. The task of sen-
                                                                       shop is an event of SEPLN conference, which
timent classification determines if one docu-
                                                                       is a conference in Natural Language Process-
ment has positive, negative or neutral opin-
                                                                       ing for Spanish language. The purpose of
∗
 This work was partially supported by Cátedras                        TASS is to provide a discussion and a point of
CONACYT program                                                        sharing about latest research work in the field
Publicado en http://ceur-ws.org/Vol-1397/. CEUR-WS.org es una publicación en serie con ISSN reconocido               ISSN 1613-0073
           Oscar S. Siordia, Daniela Moctezuna, Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Elio-Atenógenes Villaseñor


of sentiment analysis in social media (specifi-                          results of classification for task 1 are 64.32%
cally Twitter in Spanish language). In TASS                              (six labels) and 70.89% (four labels). F1 (F-
workshop, several challenge tasks are pro-                               Measure) is 70.48% in task 2 and 90% in task
posed, and furthermore a benchmark dataset                               3.
is proposed to compare the algorithms and                                    Another method to sentiment extraction
systems of participants (for more details see                            and classification on unstructured text is pro-
(Villena-Román et al., 2015)).                                          posed in (Shahbaz, Guergachi, y ur Rehman,
    Several methodologies to classify tweets                             2014). Here, five labels were used to senti-
from Task 1, Sentiment Analysis at global                                ment classification: Strongly Positive, Posi-
level of TASS workshop 2015, are presented                               tive, Neutral, Negative and Strongly Nega-
in this work. This task is to perform an au-                             tive. The solution proposed combines tech-
tomatic sentiment classification to determine                            niques of Natural language processing at sen-
the global polarity (six polarity levels P, P+,                          tence level and algorithms of opinion mining.
NEU, N, N+ and NONE) of each tweet in the                                The accuracy results were 61% for five levels
provided dataset. With this purpose, several                             and 75% by reducing to three levels (Positive,
solutions have been proposed in this work.                               negative and neutral).
    The paper is organized as follows, a brief                               In (Antunes et al., 2011) an ensemble
overview of related works is shown in Section                            based on SVM and AIS (Artificial Immune
2, the proposed methodology is describe in                               Systems) is proposed. Here, the main idea
Section 3. Section 4 shows the experimen-                                is that SVM can be enhanced with AIS ap-
tal results and analysis, and finally, Section                           proaches which can capture dynamic mod-
5 concludes.                                                             els. Experiments were carried out with the
                                                                         Reuters-21578 benchmark dataset. The re-
2        Related work                                                    ported results show a 95.52% of F1.
Nowadays, several methods have been pro-                                     An approach of multi-label sentiment
posed in the community of opinion mining                                 classification is proposed in (Liu y Chen,
and sentiment analysis. Most of these works                              2015). This approach has three main com-
employ twitter as a principal input of data                              ponents: text segmentation, feature extrac-
and they aimed to classify entire documents                              tion and multi-label classification. The fea-
as overall positive or negative polarity levels                          tures used included raw segmented words
(sentiment) or rating scores (i.e. 1 to 5 stars).                        and sentiment features based on three sen-
    Such is a case of work presented in (da                              timent dictionaries: DUTSD, NTUSD and
Silva, Hruschka, y Jr., 2014), which proposes                            HD. Moreover, here, a detailed study of sev-
an approach to classify sentiment of tweets by                           eral multi-label classification methods is con-
using classifier ensembles and lexicons; where                           ducted, in total 11 state-of-the-art meth-
tweets are classified as positive or negative.                           ods have been considered: BR, CC, CLR,
As a result, this work concludes that classi-                            HOMER, RAkEL, ECC, MLkNN, and RF-
fier ensembles formed by several and diverse                             PCT, BRkNN, BRkNN-a and BRkNN-b.
components are promising for tweet senti-                                These methods were compared in two mi-
ment classification. Moreover, several state-                            croblog datasets and the reported results of
of-the-art techniques were compared in four                              all methods are around of 0.50 of F1.
databases. The best accuracy result reported                                 In summary, most of works analyzed clas-
was around 75%.                                                          sify the documents mainly in three polari-
    In (Lluı́s F. Hurtado, 2014) is described                            ties: positive, neutral and negative. More-
the participation of ELiRF research group in                             over, most of works use social media (mainly
TASS 2014 workshop (winners of TASS work-                                Twitter) as analyzed documents. In this
shop 2014). Here, the winner approaches                                  work, several methods to classify sentiment
used for four tasks are detailed. The pro-                               in tweets are described. These methods were
posed methodology uses SVM (Support Vec-                                 implemented, according with TASS work-
tor Machines) with 1-vs-all approach. More-                              shop specifications, with the purpose of clas-
over, Freeling (Padró y Stanilovsky, 2012)                              sify tweets in six polarity levels: P+, P, Neu-
was used as lemmatizer and Tweetmotif1 to                                tral, N+, N and None. The proposed method
tokenizer to Spanish language. The accuracy                              are based on several standard techniques as
                                                                         LDA (Latent Dirichlet Allocation), LSI (La-
    1
        http://tweetmotif.com/about                                      tent Semantic Indexing), TF-IDF matrix in
                                                                   66
                                   Sentiment Analysis for Twitter: TASS 2015


combination with the well-known SVM clas-                  tom):
sifier.
                                                                                 cs|xc → x
3     Proposed solution                                                              qu → k
In this section the proposed solution is de-                                    gue|ge → je
tailed. First, a preprocessing step was car-                                    gui|gi → ji
ried out, later a Pseudo-phonetic transforma-                                    sh|ch → x
tion was done and finally the generation of
Q-gram expansion was employed.                                                        ll → y
                                                                                      z→s
3.1    Preprocessing step                                                             h→
Preprocessing focuses on the task of find-                                     c[a|o|u] → k
ing a good representation for tweets. Since
                                                                                  c[e|i] → s
tweets are full of slang and misspellings, we
normalize the text using procedures such as                                           w→u
error correction, usage of special tags, part                                         v→b
of speech (POS) tagging, and negation pro-                                         ΨΨ → Ψ
cessing. Error correction consists on reduc-
                                                                               Ψ∆Ψ∆ → Ψ∆
ing words/tokens with invalid duplicate vow-
els and consonants to valid/standard Span-                    In our transformation notation, square
ish words (ruidoooo → ruido; jajajaaa → ja;                brackets do not consume symbols and Ψ, ∆
jijijji → ja). Error correction uses an ap-                means for any valid symbols. The idea is
proach based on a Spanish dictionary, statis-              not to produce a pure phonetic transforma-
tical model for common double letters, and                 tion as in Soundex (Donald, 1999) like al-
heuristic rules for common interjections. In               gorithms, but try to reduce the number of
the case of the usage of special tags, twitter’s           possible errors in the text. Notice that the
users (i.e., @user) and urls are removed us-               last two transformation rules are partially
ing regular expressions; in addition, we clas-             covered by the statistical modeling used for
sify 512 popular emoticons into four classes               correcting words (explained in preprocess-
(P, N, NEU, NONE), which are replaced by a                 ing step). Nonetheless, this pseudo-phonetic
polarity tag in the text, e.g., positive emoti-            transformation does not follow the statistical
cons such as :), :D are replaced by POS,                   rules of the previous preprocessing step.
and negative emoticons such as :(, :S are re-
placed by NEG. In the POS-tagging step, all                3.3      Q-gram expansion
words are tagged and lemmatized using the                  Along with the placing bag of words repre-
Freeling tool for Spanish language (Padró y               sentation (of the normalized text) we added
Stanilovsky, 2012), stop words are removed,                the 4 and 5 gram of characters of the nor-
and only content words (nouns, verbs, ad-                  malized text. Blank spaces were normalized
jetives, adverbs), interjections, hashtags, and            and taken into account to the q-gram expan-
polarity tags are used for data representation.            sion; so, some q-grams will be over more than
In negation step, Spanish negation markers                 one word. In addition of these previous steps,
are attached to the nearest content word, e.g.,            several transformations (LSI, LDA and TF-
‘no seguir’ is replaced by ‘no seguir’, ‘no es             IDF matrix) were conducted to generate sev-
bueno’ is replaced by ‘no bueno’, ‘sin comida’             eral data models for testing phase.
is replaced by ‘no comida’; we use a set of
heuristic rules for negations. Finally, all di-            4      Results and analysis
acritic and punctuation symbols are also re-               The classifier submitted to the competition
moved.                                                     was selected using the following procedure.
                                                           The 7, 218 tweets with 6 polarity levels were
3.2    Psudo-phonetic                                      split in two sets. Firstly, the tweets pro-
       transformation                                      vided were shuffled and then the first set,
With the purpose of reducing typos and                     hereafter the training set, was created with
slangs we applied a semi-phonetic transfor-                the first 6, 496 tweets (approximately 90% of
mation. First, we applied the following trans-             dataset), and, the second set, hereafter the
formations (with precedence from top to bot-               validation set, was composed by the rest 722
                                                      67
       Oscar S. Siordia, Daniela Moctezuna, Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Elio-Atenógenes Villaseñor


tweets (approximately 10% of dataset). The                               Table 1 complements the information pre-
training set was used to fit a Support Vector                        sented on Figure 1.
Machine (SVM) using a linear kernel2 with                                The table presents the score F1 per polar-
C = 1, weights inversely proportional to the                         ity and the average (Macro-F1) for different
class frequencies, and using one vs rest multi-                      configurations. The table is divided in five
class strategy. The validation set was used to                       blocks, the first and second correspond to a
select the best classifier using as performance                      SVM with LSI (400 topics) and TF-IDF, re-
the score F1.                                                        spectively. It is observed that TF-IDF out-
    The first step was to model the data us-                         performed LSI; within LSI and TF-IDF it
ing different transformations, namely Latent                         can be seen that 5-gram and 4-gram got the
Dirichlet Allocation (LDA) using an online                           best performance in LSI and TF-IDF, respec-
learning proposed by (Hoffman, Bach, y Blei,                         tively.
2010), Latent Semantic Indexing (LSI), and                               The third row block presents the perfor-
TF-IDF.3 Figure 1 presents the score F1, in                          mance when the features are a direct addition
the validation set, of a SVM using either LSI                        of LSI and TF-IDF; here it is observed that
or LDA with normalized text, different lev-                          the best performance is with 4-gram further-
els of Q-gram (4-gram and 5-gram), and the                           more it had the best overall performance in
number of topics is varied from 10 to 500 as                         N+. The forth row block complements the
well. It is observed that LSI outperformed                           previous results by presenting the best per-
LDA in all the configurations tested. Com-                           formance of LSI and TF-IDF, that is, LSI
paring the performance between normalized                            with 5-gram and TF-IDF with 4-gram. It
text, 4-gram, and 5-gram, it is observed an                          is observed that this configuration has the
equivalent performance. Given that the im-                           best overall performance in P+, N, None and
plemented LSI depends on the order of the                            average (Macro-F1). Finally, the last row
documents more experiments are needed to                             block gives an indicated of whether the pho-
know whether any particular configuration is                         netic transformation is making any improve-
statistically better than other. Even though                         ment. The conclusion is that the phonetic
the best configuration is LSI with 400 topics                        transformation is making a difference; how-
and 5-gram, this system is not competitive                           ever, more experiments are needed in order
enough compared with the performance pre-                            to know whether this difference is statistically
sented by the best algorithm in TASS 2014.                           significant.
                                                                         Based on the score F1 presented on Table
                                                                     1 the classifier submitted to the competition
                                                                     is a SVM with a direct addition of LSI using
                                                                     400 topics and 4-gram and LDA with 5-gram.
                                                                     This classifier is identified as INGEOTEC-
                                                                     M14 in the competition. The SVM, LSI and
                                                                     LDA were trained with the 7218 tweets and
                                                                     then this instance was used to predict the
                                                                     6 polarity levels of the competition tweets.
                                                                     This procedure was replicated for the 4 po-
                                                                     larity levels competition.
                                                                         Table 4 presents the accuracy, average re-
                                                                     call, precision, and F1 of INGEOTEC-M1
                                                                     run using the validation set created, a 10-
                                                                     fold crossvalidation on the 7218 tweets and
                                                                     the 1k tweets evaluated by the system’s com-
Figure 1: Performance in terms of the score                          petition. This performance was on the 5 po-
F1 on the validation set for different number                        larity levels challenge. It is observed from the
of topics using LSI and LDA with different                           table that the 10-fold crossvalidation gives a
Q-gram.                                                              much better estimation of the performance
  2                                                                      4
     The SVM was the class LinearSVC implemented                         We also submitted another classifier identified as
in (Pedregosa et al., 2011)                                          INGEOTEC-E1; however, the algorithm presented a
   3
     The implementations used for LDA, LSI, and TF-                  bug that could not be find out on time for the com-
IDF were provided by (Řehůřek y Sojka, 2010).                     petition.
                                                               68
                                     Sentiment Analysis for Twitter: TASS 2015


                            P        P+   N       N+    Neutral None Average
                                          SVM + LSI
               Text     0.238 0.549 0.403 0.348          0.025    0.492     0.343
             4-gram     0.246 0.543 0.404 0.333          0.048    0.533     0.351
             5-gram     0.246 0.552 0.462 0.356          0.000    0.575     0.365
                                       SVM + TF-IDF
               Text     0.271 0.574 0.414 0.407          0.103    0.511     0.380
             4-gram     0.290 0.577 0.477 0.393          0.130    0.589     0.409
             5-gram     0.302 0.577 0.476 0.379          0.040    0.586     0.393
                                   SVM + {LSI + TF-IDF}
             4-gram     0.297 0.578 0.471 0.421          0.142    0.578     0.415
             5-gram     0.307 0.567 0.474 0.391          0.040    0.579     0.393
                      SVM + {LSI with 4-gram + TF-IDF with 5-gram}
            4-5-gram 0.282 0.596 0.481 0.407             0.144   0.595      0.417
                   SVM + {LSI + TF-IDF without phonetic transformation}
            4-5-gram 0.324 0.577 0.459 0.395            0.150     0.593     0.416
Table 1: Score F1 per polarity level and average (Macro-F1) on the validation set for LSI (with
400 topics) and TF-IDF with different levels of Q-gram. The best performance in each polarity
is indicated in boldface.

of the classifier when tested on 1k tweets of                solution. This proposed solution uses a com-
the competition (90% of training and 10% of                  bination of LSI with 4-gram + TF-IDF with
validation).                                                 5-gram, and a SVM classifier (one-vs-one ap-
   In summary, in this work the best result                  proach).
reached was a 0.404 of F1. This result was
achieved with a combination of LSI with 4-                   Acknowledgements
gram + TF-IDF with 5-gram, using a SVM
                                                             This research is partially supported by the
classifier (one-vs-one approach).
                                                             Cátedras CONACyT project. Furthermore,
           Acc.    Recall       Precision     F1             the authors would like to thank CONACyT
  Val.     0.471   0.428         0.421       0.417           for supporting this work through the project
 10-fold   0.443   0.397         0.395       0.393           247356 (PN2014).
 Comp.     0.431   0.411         0.398       0.404
                                                             Bibliography
Table 2: Accuracy (Acc.), average recall, av-
erage precision and average F1 of the classi-                Antunes, Mário, Catarina Silva, Bernardete
fier in the validation set (Val.), using a 10-                 Ribeiro, y Manuel Correia. 2011. A hy-
fold cross-validation (7, 218 tweets), and as                  brid ais-svm ensemble approach for text
reported by the competition (comp.) on 1k                      classification. En Adaptive and Natu-
tweets.                                                        ral Computing Algorithms, volumen 6594
                                                               de Lecture Notes in Computer Science.
                                                               Springer Berlin Heidelberg, páginas 342–
5   Conclusions                                                352.
In this contribution, we presented the ap-
proach used to tackle the polarity classifica-               da Silva, Nádia F.F., Eduardo R. Hruschka,
tion task of Spanish tweets of TASS 2015.                       y Estevam R. Hruschka Jr. 2014. Tweet
From the results, it is observed that a combi-                  sentiment analysis with classifier ensem-
nation of different data models, in this case                   bles. Decision Support Systems, 66(0):170
LSI and TF-IDF, improves the performance                        – 179.
of a SVM classifier. It also noted that the                  Donald, E Knuth. 1999. The art of com-
phonetic transformation makes an improve-                      puter programming. Sorting and search-
ment; however, more experiments are needed                     ing, 3:426–458.
to know whether this improvement is statis-
tically significant. As a result, we obtained a              Hoffman, Matthew, Francis R Bach, y
0.404 of F1 (macro-F1) in sentiment classifi-                  David M Blei. 2010. Online learning for
cation task at five levels, with the proposed                  latent dirichlet allocation. En advances
                                                        69
       Oscar S. Siordia, Daniela Moctezuna, Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Elio-Atenógenes Villaseñor


   in neural information processing systems,                             Martı́nez-Cámara, M. Teresa Martı́n-
   páginas 856–864.                                                     Valdivia, y L. Alfonso Ureña-López. 2015.
                                                                         Overview of TASS 2015.
Liu, Shuhua Monica y Jiun-Hung Chen.
   2015. A multi-label classification based
   approach for sentiment classification. Ex-
   pert Systems with Applications, 42(3):1083
   – 1093.
Lluı́s F. Hurtado, Ferran Pla. 2014. Elirf-
   upv en TASS 2014: Análisis de sentimien-
   tos, detección de tópicos y análisis de sen-
   timientos de aspectos en twitter. Proc. of
   the TASS workshop at SEPLN 2014.
Padró, Lluı́s y Evgeny Stanilovsky. 2012.
  Freeling 3.0: Towards wider multilin-
  guality.     En Proceedings of the Lan-
  guage Resources and Evaluation Confer-
  ence (LREC 2012), Istanbul, Turkey,
  May. ELRA.
Pang, Bo y Lillian Lee. 2008. Opinion min-
  ing and sentiment analysis. Foundations
  and Trends in Information Retrieval, 2(1-
  2):1–135.
Pedregosa, F., G. Varoquaux, A. Gram-
  fort, V. Michel, B. Thirion, O. Grisel,
  M. Blondel, P. Prettenhofer, R. Weiss,
  V. Dubourg, J. Vanderplas, A. Passos,
  D. Cournapeau, M. Brucher, M. Perrot,
  y E. Duchesnay. 2011. Scikit-learn: Ma-
  chine learning in Python. Journal of Ma-
  chine Learning Research, 12:2825–2830.
Peng, Tao, Wanli Zuo, y Fengling He. 2008.
  Svm based adaptive learning method for
  text classification from positive and unla-
  beled documents. Knowledge and Infor-
  mation Systems, 16(3):281–301.
Řehůřek, Radim y Petr Sojka. 2010. Soft-
   ware Framework for Topic Modelling with
   Large Corpora. En Proceedings of the
   LREC 2010 Workshop on New Challenges
   for NLP Frameworks, páginas 45–50, Val-
   letta, Malta, Mayo. ELRA.
Shahbaz, M., A. Guergachi, y R.T.
  ur Rehman. 2014. Sentiment miner: A
  prototype for sentiment analysis of un-
  structured data and text. En Electri-
  cal and Computer Engineering (CCECE),
  2014 IEEE 27th Canadian Conference on,
  páginas 1–7, May.
Villena-Román, Julio, Janine Garcı́a-Morera,
   Miguel A. Garcı́a-Cumbreras, Eugenio
                                                               70