=Paper= {{Paper |id=Vol-1749/paper_035 |storemode=property |title=On the performance of B4MSA on SENTIPOLC'16 |pdfUrl=https://ceur-ws.org/Vol-1749/paper_035.pdf |volume=Vol-1749 |authors=Daniela Moctezuma,Eric S. Tellez,Mario Graff,Sabino Miranda–Jiménez |dblpUrl=https://dblp.org/rec/conf/clic-it/MoctezumaTGM16 }} ==On the performance of B4MSA on SENTIPOLC'16== https://ceur-ws.org/Vol-1749/paper_035.pdf
On the performance of B4MSA on SENTIPOLC'16

Daniela Moctezuma
CONACyT-CentroGEO
Circuito Tecnopolo Norte No. 117, Col. Tecnopolo Pocitos II, C.P. 20313, Ags, México
dmoctezuma@centrogeo.edu.mx

Eric S. Tellez, Mario Graff, Sabino Miranda-Jiménez
CONACyT-INFOTEC
Circuito Tecnopolo Sur No. 112, Fracc. Tecnopolo Pocitos II, C.P. 20313, Ags, México
eric.tellez@infotec.mx, mario.graff@infotec.mx, sabino.miranda@infotec.mx

Abstract

This document describes the participation of the INGEOTEC team in the SENTIPOLC 2016 contest. Two approaches are presented, B4MSA and B4MSA + EvoDAG, tested on Task 1 (subjectivity classification) and Task 2 (polarity classification). For polarity classification, one constrained run and one unconstrained run were submitted; for subjectivity classification, only a constrained run was submitted. Our methodology explores a set of techniques, such as lemmatization, stemming, entity removal, character-based q-grams, and word-based n-grams, among others, to prepare different text representations, applied in this case to the Italian language. We report the official competition measures as well as other well-known performance measures, such as macro and micro F1 scores.

1   Introduction

Nowadays, sentiment analysis has become a problem of interest for governments, companies, and institutions due to the possibility of massively sensing the mood of people through social networks in order to gain an advantage in decision-making processes. This new way of learning what people think about a topic poses challenges to the natural language processing and machine learning areas. First of all, people using social networks largely ignore formal writing: a typical Twitter user does not follow formal writing rules and introduces new lexical variations indiscriminately, and the use of emoticons and the mixing of languages are also part of the common lingo. These characteristics produce high-dimensional representations, where the curse of dimensionality makes it hard to learn from examples.

There exist a number of strategies to cope with sentiment analysis of Twitter messages. Some of them rely on the fact that the core problem is fixed: we are looking for evidence of some sentiment in the text. Under this scheme, a number of dictionaries have been compiled by psychologists, and other resources, such as SentiWordNet, have been created by adapting well-known linguistic resources with machine learning. There is a lot of work around this approach; however, all this knowledge is language dependent and requires a deep understanding of the language being analyzed.
Our approach is mostly independent of this kind of external resources; it focuses instead on tackling misspellings and other common errors in the text.

In this manuscript we detail our approach to sentiment analysis from a language-agnostic perspective; for instance, no one on our team knows the Italian language. We use neither external knowledge nor specialized parsers. Our aim is to create a solid baseline from a multilingual perspective, one that can serve as a real baseline for challenges such as SENTIPOLC'16 and as a basic initial approximation for sentiment analysis systems.

The rest of the paper is organized as follows. Section 2 describes our approach, Section 3 describes our experimental results, and Section 4 concludes.

2   Our participation

Our participation is based on two approaches. The first, the B4MSA method, is a simple approach that starts by applying text transformations to the tweets; the transformed tweets are then represented in a vector space model, and finally a Support Vector Machine (with a linear kernel) is used as the classifier. The second, B4MSA + EvoDAG, combines this simple approach with a genetic programming scheme.

2.1   Text modeling with B4MSA

B4MSA is a system for multilingual polarity classification that, due to its simplicity, can serve as a baseline as well as a framework to build sophisticated sentiment analysis systems. The source code of B4MSA can be downloaded freely1.

We used our previous work, B4MSA, to tackle the SENTIPOLC challenge. Our approach learns from training examples, avoiding any pre-digested knowledge such as dictionaries or ontologies. This scheme allows us to address the problem without caring about the particular language being tackled.

The dataset is converted to a vector space using a standard procedure: the text is normalized, tokenized, and weighted. The weighting is fixed to TFIDF (Baeza-Yates and Ribeiro-Neto, 2011). After that, a linear SVM (Support Vector Machine) is trained using 10-fold cross-validation (Burges, 1998). In the end, this classifier is applied to the test set to obtain the final predictions.

At a glance, our goal is to find the best-performing normalization and tokenization pipeline. We state the modeling as a combinatorial optimization problem; then, given a performance measure, we try to find the best-performing configuration in a large parameter space.

The transformations and tokenizers considered are listed below. All of them are either simple to implement or available in an open-source library (e.g., (Bird et al., 2009; Řehůřek and Sojka, 2010)).

2.2   Set of Features

In order to find the best-performing configuration, we use two sorts of features, which we treat as parameters: cross-language and language-dependent features.

Cross-language features can be applied to most similar languages sharing similar surface features: removing or keeping punctuation (question marks, periods, etc.) and diacritics from the original source; applying or not applying case normalization (lowercasing the text) and symbol reduction (collapsing repeated symbols into a single occurrence). The word-based n-grams (n-words) feature produces sequences of words according to a defined window size: the text is tokenized and the tokens are combined, so that 1-words (unigrams) are single words, 2-words (bigrams) are sequences of two words, and so on (Jurafsky and Martin, 2009). Character-based q-grams (q-grams) are sequences of characters: 1-grams are single symbols, 3-grams are sequences of three symbols, and in general, given a text of m characters, we obtain a set of at most m − q + 1 elements (Navarro and Raffinot, 2002). Finally, the emoticon (emo) feature consists in keeping, removing, or grouping the emoticons that appear in the text; popular emoticons were hand-classified as positive, negative, or neutral, including text emoticons and the set of Unicode emoticons (Unicode, 2016).

Language-dependent features. We considered three language-dependent features: stopwords, stemming, and negation; each process is either applied or not applied to the text. The stopword and stemming processes use, respectively, the stopword data and the Snowball stemmer for Italian from the NLTK Python package (Bird et al., 2009).

1   https://github.com/INGEOTEC/b4msa
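As an illustration, the word-based n-grams and character-based q-grams described above can be sketched in plain Python (a minimal sketch; these helper functions are ours and not part of the B4MSA API):

```python
def n_words(text, n):
    """Word-based n-grams: join every window of n consecutive tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def q_grams(text, q):
    """Character-based q-grams: a text of m characters yields
    at most m - q + 1 substrings."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]
```

For example, n_words("il gatto nero", 2) gives ['il gatto', 'gatto nero'], and q_grams("ciao", 3) gives ['cia', 'iao'], i.e., m − q + 1 = 4 − 3 + 1 = 2 elements.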
Negation markers may change the polarity of the message. We used a set of language-dependent rules for common negation structures that attach the negation clue to the nearest word, similar to the approach used in (Sidorov et al., 2013).

2.3   Model Selection

Model selection, sometimes called hyper-parameter optimization, is the key to our approach. The default search space of B4MSA contains more than 331 thousand configurations when limited to multilingual, language-independent parameters, and close to 4 million configurations when our three language-dependent parameters are added. Depending on the size of the training set, each configuration needs several minutes on a commodity server to be evaluated; thus, an exhaustive exploration of the parameter space can be expensive enough to make the approach useless.

To reduce the selection time, we perform a stochastic search with two algorithms: random search and hill climbing. First, we apply random search (Bergstra and Bengio, 2012), which consists in randomly sampling the parameter space and selecting the best configuration in the sample. The second algorithm is hill climbing (Burke et al., 2005; Battiti et al., 2008), implemented with a memory to avoid testing a configuration twice. The main idea behind hill climbing is to take a pivot configuration (in our case, the best one found by random search), explore the configuration's neighborhood, and greedily move to the best neighbor; the process is repeated until no improvement is possible. The neighborhood of a configuration is the set of configurations that differ from it in exactly one parameter's value.

Finally, the performance of the final configuration is obtained by applying the above procedure with cross-validation over the training data.

2.4   B4MSA + EvoDAG

In the polarity task, besides submitting B4MSA, which is a constrained approach, we decided to generate an unconstrained submission with the following approach. The idea is to build an additional dataset that is automatically labeled with positive and negative polarity using the Distant Supervision approach (Snow et al., 2005; Morgan et al., 2004).

We started by collecting tweets written in Italian from the Twitter stream; in total, we collected more than 10,000,000 tweets. From these, we kept only those consistent with the polarity of the emoticons they use, e.g., tweets that contain only emoticons of positive polarity. The polarity of the whole tweet was then set to the polarity of its emoticons, and we used only positive and negative polarities. Furthermore, we decided to balance the set, which required removing many positive tweets. In the end, this external dataset contains 4,550,000 tweets, half positive and half negative.

Once this external dataset was created, we split it into batches of 50,000 tweets, half positive and half negative. This decision was taken to optimize the time needed to train an SVM; moreover, around this number of tweets the macro F1 metric is close to its maximum value. That is, this batch size gives a good trade-off between training time and classifier performance. In total there are 91 batches.

For each batch we train an SVM, so at the end of this process we have 91 predictions per tweet (taken from the decision function). Besides these 91 predictions, each tweet is also scored with B4MSA (again using the decision function). In the end we have 94 values for each tweet, that is, a matrix of 7,410 rows and 94 columns for the training set and of 3,000 rows and 94 columns for the test set. Moreover, for the training matrix we also know the class of each row. It is important to note that all the values in these matrices are predictions; in the case of B4MSA, for example, we used 10-fold cross-validation on the training set so that its values are also predictions.

Clearly, at this point the problem is how to make the final prediction; however, we have built a classification problem from the decision functions and the classes provided by the competition. Thus, it is straightforward to tackle this classification problem with EvoDAG (Evolving Directed Acyclic Graph)2 (Graff et al., 2017), a genetic programming classifier that uses semantic crossover operators based on orthogonal projections in the phenotype space. In a nutshell, EvoDAG was used to ensemble the outputs of the 91 SVMs trained on the automatically labeled dataset together with B4MSA's decision functions.

2   https://github.com/mgraffg/EvoDAG
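To make the labeling rule of Section 2.4 concrete, here is a minimal sketch of emoticon-based distant labeling in plain Python (the tiny emoticon sets and the function name are ours for illustration; the actual system used a large hand-classified list of text and Unicode emoticons):

```python
# Hand-picked emoticon sets for illustration only; the real pipeline
# used a much larger hand-classified list plus Unicode emoticons.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-(", ":'("}


def distant_label(tweet):
    """Label a tweet by its emoticons; return None when the emoticons
    are absent or inconsistent (such tweets are discarded)."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # mixed or no emoticons: not usable for training
```

Tweets labeled None (mixed or missing emoticons) are dropped, which is how the consistency requirement described above is enforced.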
3   Results and Discussion

This section presents the results of the INGEOTEC team. In this participation we submitted constrained and unconstrained runs: the constrained runs use the B4MSA system, and the unconstrained run uses B4MSA + EvoDAG. The constrained runs were conducted using only the dataset provided by the SENTIPOLC'16 competition. For more technical details about the dataset and the competition in general, see (Barbieri et al., 2016).

The unconstrained run used an additional dataset of 4,550,000 tweets labeled with the Distant Supervision approach. Distant Supervision is an extension of the paradigm used in (Snow et al., 2005), and is closest to the use of weakly labeled data in (Morgan et al., 2004). In our case, emoticons are the key for automatic labeling: a tweet with a clear presence of positive emoticons is labeled with the positive class, and a tweet with a clear presence of negative emoticons is labeled with the negative class. This gives us a much larger number of training samples.

With the constrained runs we participated in two tasks, subjectivity and polarity classification; with the unconstrained run we participated only in the polarity classification task. Table 1 shows the results of the subjectivity classification task (B4MSA method); here Prec0, Rec0, and FSc0 are the precision, recall, and F-score for class 0, Prec1, Rec1, and FSc1 are the corresponding values for class 1, and FScavg is the average of the F-scores. The evaluation measures are explained in (Barbieri et al., 2016).

Table 2 shows the results on the polarity classification task. Our B4MSA method achieves an average F-score of 0.6054, and the combination B4MSA + EvoDAG reaches an average F-score of 0.6075. These results place us at positions 18 (unconstrained run) and 19 (constrained run) out of a total of 26 entries.

It is important to mention that the difference between our two approaches is very small, even though B4MSA + EvoDAG is computationally much more expensive; we had therefore expected a considerable improvement in performance. These results should clearly be investigated further; our first impression is that our Distant Supervision approach needs finer tuning, that is, the polarity of the emoticons and the complexity of the tweets need to be verified.

Finally, Table 3 presents the measures used in our internal evaluation, macro F1 and micro F1 (for more details see (Sebastiani, 2002)). The values correspond to the polarity unconstrained run (B4MSA + EvoDAG), the polarity constrained run (B4MSA), the subjectivity constrained run (B4MSA), and irony classification (B4MSA). We did not participate in the irony classification task, but we report the result obtained by our B4MSA approach on it.
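The macro and micro F1 measures used in our internal evaluation can be sketched in plain Python (helper names are ours): macro F1 averages the per-class F1 scores, while micro F1 is computed from the pooled per-class counts and, for single-label predictions, coincides with plain accuracy. This is one reason the two measures can diverge strongly on class-imbalanced tasks.

```python
from collections import Counter


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall (0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_micro_f1(gold, pred):
    """Macro F1 averages per-class F1; micro F1 pools the counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    classes = set(gold) | set(pred)
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```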
             Prec0   Rec0   FSc0   Prec1   Rec1   FSc1   FScavg
              0.56   0.80   0.66    0.86   0.67   0.75     0.70

Table 1: Results on Subjectivity Classification

                                         FScore_pos   FScore_neg   Combined FScore
     Constrained run (B4MSA)                 0.6414       0.5694            0.6054
     Unconstrained run (B4MSA + EvoDAG)      0.5944       0.6205            0.6075

Table 2: Results on Polarity Classification

     Run                        Macro F1   Micro F1
     Polarity Unconstrained       0.5078     0.5395
     Polarity Constrained         0.5075     0.5760
     Subjectivity Constrained     0.7137     0.721
     Irony Constrained            0.4687     0.8825

Table 3: Macro F1 and micro F1 results of our approaches

4   Conclusions

In this work we described the participation of the INGEOTEC team in the SENTIPOLC'16 contest. Two approaches were used: first, the B4MSA method, which combines several text transformations of the tweets; second, B4MSA + EvoDAG, which combines the B4MSA method with a genetic programming approach. In the subjectivity classification task, the obtained results place us seventh out of 21 entries. In the polarity classification task, our results place us at positions 18 and 19 out of 26 entries. Since our approach is simple and easy to implement, we consider these results valuable, given that we use no affective lexicons or other complex linguistic resources. Moreover, our B4MSA approach was tested internally on the irony classification task, obtaining a macro F1 of 0.4687 and a micro F1 of 0.8825.

References

Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. 2011. Modern Information Retrieval. Addison-Wesley, 2nd edition.

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Roberto Battiti, Mauro Brunato, and Franco Mascia. 2008. Reactive Search and Intelligent Optimization, volume 45. Springer Science & Business Media.

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media.

Christopher J.C. Burges. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167.

Edmund K. Burke, Graham Kendall, et al. 2005. Search Methodologies. Springer.

Mario Graff, Eric S. Tellez, Hugo Jair Escalante, and Sabino Miranda-Jiménez. 2017. Semantic Genetic Programming for Sentiment Analysis. In Oliver Schütze, Leonardo Trujillo, Pierrick Legrand, and Yazmin Maldonado, editors, NEO 2015, number 663 in Studies in Computational Intelligence, pages 43–65. Springer International Publishing. DOI: 10.1007/978-3-319-44003-3_2.

Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing (2nd edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Alexander A. Morgan, Lynette Hirschman, Marc Colosimo, Alexander S. Yeh, and Jeff B. Colombe. 2004. Gene name identification and normalization using a model organism database. Journal of Biomedical Informatics, 37(6):396–410.

G. Navarro and M. Raffinot. 2002. Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences. Cambridge University Press. ISBN 0-521-81307-7.

Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May. ELRA. http://is.muni.cz/publication/884893/en.

Fabrizio Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, March.

Grigori Sidorov, Sabino Miranda-Jiménez, Francisco Viveros-Jiménez, Alexander Gelbukh, Noé Castro-Sánchez, Francisco Velásquez, Ismael Díaz-Rangel, Sergio Suárez-Guerra, Alejandro Treviño, and Juan Gordon. 2013. Empirical study of machine learning based approach for opinion mining in tweets. In Proceedings of the 11th Mexican International Conference on Advances in Artificial Intelligence - Volume Part I, MICAI'12, pages 1–14, Berlin, Heidelberg. Springer-Verlag.

Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1297–1304. MIT Press.

Unicode. 2016. Unicode emoji chart. http://unicode.org/emoji/charts/full-emoji-list.html. Accessed 20 May 2016.