                   TASS 2017: Workshop on Semantic Analysis at SEPLN, septiembre 2017, págs. 85-90




  C100TPUCP at TASS 2017: Word Embedding
Experiments for Aspect-Based Sentiment Analysis in
                 Spanish Tweets
   C100TPUCP en TASS 2017: Experimentos con Word
 Embeddings para Análisis de Sentimiento basado en Aspectos
                  sobre Tweets en Español
                   Franco Tume Fiestas
         Grupo de Reconocimiento de Patrones
          e Inteligencia Artificial Aplicada
        Pontificia Universidad Católica del Perú
                      Lima, Perú
                  a20110060@pucp.pe

               Marco A. Sobrevilla Cabezudo
         Grupo de Reconocimiento de Patrones
          e Inteligencia Artificial Aplicada
        Pontificia Universidad Católica del Perú
                      Lima, Perú
                msobrevilla@pucp.edu.pe

       Abstract: Aspect-Based Sentiment Analysis studies people's opinions about the
       different aspects of a given entity. This task is challenging and highly relevant for
       the Natural Language Processing community. In this paper, we report the partici-
       pation of the C100T-PUCP team in the second task of TASS 2017, on aspect-based
       sentiment analysis. In this edition, we used word embeddings to obtain the similarity
       between words selected from a training set of tweets about Spanish political parties,
       and built a model to classify the polarity of each aspect of each tweet. The results
       showed that training the model with more examples is advantageous under this
       approach. Moreover, the proposed approach avoids the problem of classical methods,
       which are tied to a specific training data set.
       Keywords: Word Embeddings, Sentiment Analysis, Twitter
       Resumen: El Análisis de Sentimientos basado en Aspectos está encargado del es-
       tudio de las opiniones de las personas sobre diferentes aspectos de cierta entidad.
       Esta tarea es desafiante y muy importante para la comunidad de Procesamiento de
       Lenguaje Natural. En este trabajo se describe la participación del equipo C100T-
       PUCP en el TASS 2017 para la segunda tarea sobre análisis de sentimientos. En esta
       edición, usamos word embeddings para conseguir la similitud entre diferentes pala-
       bras seleccionadas del conjunto de datos de entrenamiento, el cual contiene tweets
       sobre grupos políticos de España, y construimos un modelo para clasificar la polari-
       dad de los sentimientos expresados sobre cada aspecto en cada tweet. Los resultados
       indicaron que utilizar una mayor cantidad de ejemplos para entrenar el modelo con
       este método es conveniente. Además, el enfoque propuesto evita los problemas de los
       métodos clásicos, que están orientados a un conjunto de datos de entrenamiento específico.
       Palabras clave: Word Embeddings, Análisis de Sentimiento, Twitter

ISSN 1613-0073                     Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes.

1   Introduction

The 6th edition of the TASS workshop consists of two tasks in sentiment analysis focusing on Spanish tweets: (1) polarity classification at the global level and (2) aspect-based sentiment analysis, in which the goal is to predict the polarity of tweets in relation to a set of identified aspects (Martínez-Cámara et al., 2017).

The task of polarity classification has been tackled with many different approaches. One of them consists in representing a word as a vector and using it to compute similarity with other words. Word embeddings are a well-known natural language processing technique for obtaining such vector representations. Although this method is widely used for English, there are few implementations of this approach for Spanish.

In this sense, many studies use deep learning as the main approach to tackle sentiment analysis, computing the similarity between bags of words that represent each sentiment (Alvarez-López et al., 2016). From this comparison a feature vector can be built and used with classical machine learning algorithms. Our system uses this kind of approach to classify the sentiment of each aspect of each tweet in the task.

This paper summarizes the participation of the C100T-PUCP team from Pontificia Universidad Católica del Perú in the second task of the workshop. In this edition, we propose a word embedding-based approach to tackle the problem of aspect-level polarity classification. Firstly, we build a word embedding set from a politics corpus to compute similarity between tweets. Then, we explore a feature selection method for unbalanced data. Finally, we run experiments with several classifiers using the word embeddings and the obtained features.

This paper is organized as follows: an overview of related work is given in Section 2. Section 3 presents the analyzed corpus and its class distribution. The system is described in Section 4. Section 5 shows the experimentation and results, and finally, Section 6 presents some conclusions and future work.

2   Related Work

In TASS 2014, an Aspect Detection task and an Aspect-based Sentiment Analysis task were proposed (Román et al., 2015). The corpus, called Social-TV, was composed of tweets in Spanish related to the final game of the "Copa del Rey". Two works were submitted to these tasks. The first proposed a method to detect aspects by matching tweet content against a pre-specified set of features related to the football domain (Vilares et al., 2014); it obtained an F1-measure of 0.854. To identify the polarity of each aspect, the authors used a supervised method with syntactic features, which obtained an F1-measure of 0.546. The second work proposed an aspect detection method based on a list of features and a set of regular expressions (Hurtado and Pla, 2014); it obtained an F1-measure of 0.909. For polarity detection, these authors proposed a supervised method whose features were a list of positive and negative terms and a list of words obtained from the training corpus, sorted by TF-IDF; it obtained an F1-measure of 0.587.

In TASS 2015, only the Aspect-based Sentiment Analysis task was proposed, but a new corpus was added to the evaluation (Villena-Román et al., 2015): STOMPOL, composed of tweets in Spanish related to politics. In this edition, a method similar to that of TASS 2014 was proposed by Hurtado, Pla, and Buscaldi (2015), adding an extra dictionary and an SVM algorithm; it reached an accuracy of 65.50% on the Social-TV corpus and 63.3% on the STOMPOL corpus (no F-measure was reported). Another method used a set of lexical and morphosyntactic features in a supervised learning algorithm (Araque et al., 2015). The problem was tackled in three steps: (1) identifying entities, (2) getting the context (using a graph-based algorithm) and (3) running the supervised learning algorithm. This method obtained an accuracy of 63.5% and an F-measure of 0.606 on the Social-TV corpus. Finally, a third work proposed a deep learning-based approach (Vilares et al., 2015), using an LSTM neural network for polarity detection; it obtained an accuracy of 61.00% on the Social-TV corpus and 59.9% on the STOMPOL corpus (again, no F-measure was reported).

In TASS 2016, two proposals were submitted to the Aspect-based Sentiment Analysis task (Villena-Román et al., 2016) on the STOMPOL corpus. The first applied a supervised algorithm using the aspect, lemmas, POS tags, negation and word tokens from the training corpus as features (Alvarez-López et al., 2016), obtaining an F1-measure of 0.463. The other explored different supervised algorithms with the same features as in TASS 2015 (Hurtado and Pla, 2016); its best run obtained an F1-measure of 0.526.

3   STOMPOL Corpus

The STOMPOL corpus is composed of tweets in Spanish about the Spanish elections of 2015, introduced in TASS 2015 (Villena-Román et al., 2015). Each tweet is related to one of the following aspects: Economics, Health System, Education, Political Party and Others. Each aspect is also labeled with one of three sentiments: positive, negative or neutral. The distribution of aspects in the training data, and of sentiments per aspect, is shown in Table 1. As shown, the dataset presents unbalanced classes and aspects; for example, there are far more samples for the Political Party aspect and for negative sentiment.

4   System Description

The system presented in this edition of TASS preprocesses each tweet by removing stopwords with the NLTK toolkit; URLs and special characters are also removed. After this, all words are passed through the Freeling lemmatizer, version 4.0, while hashtags and user mentions are kept. This preprocessing also completes the tokenization of all tweets.

Aspect-based Sentiment Analysis was tackled as a classification problem. Support vector machine (SVM) and adaptive boosting (AdaBoost) classifiers were used, since previous work showed that they behave well when classifying long feature vectors. The scikit-learn implementations were used: sklearn.ensemble.AdaBoostClassifier for adaptive boosting and sklearn.svm.SVC with a polynomial kernel for the SVM.

Each vector was also filled with the cosine similarity between each feature and the top 50 most important words for each sentiment, selected with a probabilistic appearance metric (Liu, Loh, and Sun, 2009), to give context to the aspect. This similarity was calculated over the vector representations of the words, obtained with Mikolov's Word2Vec model. Finally, each model was verified with 10-fold cross-validation, using a learning curve to check that our models are not over-fitted.

4.1   Word Embeddings Generation

A word embedding model was created using the GenSim implementation of Mikolov's Word2Vec model (Mikolov et al., 2013). The sentences used to train the model came from the same domain as the corpus, i.e., politics. Tweets were selected using search queries related to political parties from South America and Spain, in a range from 2012 to 2015. Even though many tweets were collected, much more data was needed; online news websites were therefore also scraped. Specifically, we obtained texts from the Spanish newspapers "El país" (https://elpais.com/), "ABC" (http://www.abc.es/) and "20 Minutos" (http://www.20minutos.es/). From these sites only political news was selected, with no time restriction.

After this, we had 400MB of data, comprising about 5 million sentences, 30 million words and 1.5 million unique words. The corpus was preprocessed with the following steps:

   • removing stopwords and special characters that are common in tweets, such as "..." and URLs

   • removing words of only one character

   • removing numbers

   • lemmatizing words using Freeling (http://nlp.lsi.upc.edu/freeling/), keeping all user mentions and hashtags

Additionally, mentions of particular entities in the news data were replaced with their Twitter user ids. This was done to make the news data as similar as possible to the Twitter data, so that the embeddings capture the same relations between words. For example, if a news item mentions Pablo Iglesias, we replace the mention with his Twitter id, and we do the same for other well-known political figures.

To create the word embeddings from the corpus, we used the Word2vec model (Mikolov et al., 2013), which offers two types of neural network to generate the embeddings. In this case, the Skip-gram model was used, since it is recommended when the training data is small. We then tested different values for the model parameters. The hyperparameters tested were: minimum word count, context window and size of the



                    Aspects             Tweets        Negative       Neutral       Positive
                    Economics           117           78             20            19
                    Health              21            7              10            4
                    Education           30            16             12            2
                    Political Party     777           431            176           170
                    Others              98            54             22            22
                    Total               1043          586            240           217
                            Table 1: STOMPOL Corpus distribution

vector. The values were selected based on experiments; the chosen values are shown in Table 2.

       Hyperparameter             Value
       Minimum Word Count         100
       Vector Size                300
       Context Window             6

      Table 2: Hyperparameter Values

4.2   Aspect-based Sentiment Analysis

For this task, a window of three words around the previously identified aspect was selected from the training corpus to extract the features. Each tweet in the training corpus was preprocessed in the same way as for the word embedding model: removing stopwords, special characters and URLs, and then lemmatizing while keeping user tags and hashtags. The analysis was based on the cosine similarity between these words and the words selected as features for detecting the sentiment.

In this step, we first needed to build a dictionary of the words that best represent each sentiment, using the sentiment labels in the training data. We used two metrics, TF-IDF and probabilistic occurrence (Liu, Loh, and Sun, 2009), to measure how relevant a word is for a sentiment, so that these words could serve as a dictionary for that sentiment. This provided more words against which to compare the words in the context window of the detected aspect. These metrics were also applied to build a second dictionary, per aspect, relating each aspect to each sentiment, yielding a more versatile feature set.

After selecting the features, two vectors were created: (1) based on probabilistic occurrence and (2) based on probabilistic occurrence per aspect. Each vector was filled in two ways: one was the traditional approach, i.e., a bag-of-words vector filled with 1 if the word feature occurs in the window; the other filled each feature with the highest similarity value to that feature. Each vector was then used to train SVM and AdaBoost models. In total, we submitted three runs, described below:

   • Run 1: The first run used a vector filled with the cosine similarity between each word and the words in the dictionary of polarities. This vector was used to train an SVM model with a gamma value of 0.31622776601683794, C of 1 and degree of 2.

   • Run 2: The second run used the same vector, where the hyperparameters were: gamma of 0.0316, C of 1122.018 and degree of 2.

   • Run 3: The last run used an AdaBoost classifier with a modified vector containing the most representative words of each sentiment, but per aspect. This vector also includes a one-hot encoding of the aspect. The model was trained with a Naive Bayes classifier as the weak learner. The AdaBoost model was created with the following hyperparameters, selected by experimentation: a learning rate of 0.000001 and 100 estimators.

5   Results and Discussions

For the experiments, an SVM classifier and an AdaBoost classifier were tested on each set of features prepared with the previously created dictionaries. To evaluate these approaches, F1-score (F1), precision (P) and recall (R) were used. These metrics evaluate how well the model predicts, based on how many true positives and, conversely, how many false positives it produces.
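The similarity-filled vectors described in Section 4.2 can be illustrated with a small sketch. Everything below is hypothetical: the toy embedding table and the Spanish words stand in for the team's actual word2vec model and sentiment dictionaries, which are not published with the paper. For each dictionary word (one feature per entry), the vector keeps the best cosine similarity found in the context window, instead of the 0/1 fill of a plain bag-of-words.

```python
import numpy as np

# Toy embedding table standing in for the trained word2vec model
# (illustrative vectors only, not the paper's actual embeddings).
EMB = {
    "bueno":     np.array([0.9, 0.1, 0.0]),
    "excelente": np.array([0.8, 0.2, 0.0]),
    "malo":      np.array([0.0, 0.1, 0.9]),
    "crisis":    np.array([0.1, 0.0, 0.8]),
    "reforma":   np.array([0.4, 0.5, 0.3]),
}

def cos(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_features(window, lexicon):
    """One feature per dictionary word: the highest cosine similarity
    against any word of the context window around the aspect."""
    feats = []
    for w_lex in lexicon:
        best = 0.0
        for w in window:
            if w in EMB and w_lex in EMB:
                best = max(best, cos(EMB[w], EMB[w_lex]))
        feats.append(best)
    return feats

lexicon = ["excelente", "crisis"]   # per-sentiment dictionary words
window = ["bueno", "reforma"]       # 3-word context around the aspect
print(similarity_features(window, lexicon))
```

The resulting dense vector can then be fed to an SVM or AdaBoost classifier exactly as a bag-of-words vector would be, which is what makes the two fill strategies directly comparable.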
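Macro-averaged scores of the kind reported here can be reproduced in a few lines of plain Python. This is a generic sketch of macro precision, recall and F1 (not the official TASS evaluation script), with made-up gold and predicted polarity labels:

```python
def macro_prf(gold, pred):
    """Macro-averaged precision, recall and F1: compute each metric
    per polarity label, then average over the labels."""
    labels = sorted(set(gold))
    ps, rs, fs = [], [], []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(labels)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

gold = ["N", "N", "P", "NEU", "P", "N"]
pred = ["N", "P", "P", "NEU", "N", "N"]
print(macro_prf(gold, pred))  # macro P = R = F1 = 13/18 ≈ 0.722 here
```

Macro averaging weights every class equally, which is why it is the preferred aggregate for an unbalanced corpus like STOMPOL: a classifier that only predicts the majority (negative) class gets a high accuracy but a low macro F1.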



Accuracy, on the other hand, only measures how many predictions are hits. Furthermore, to test the model's sentiment polarity detection, the TASS evaluation page was used. This page measures macro F1-score, recall, precision and accuracy based on how well the system predicts each aspect within a specific entity.

The results for polarity classification are presented in Table 3. In this case, an SVM classifier and an AdaBoost classifier with a Naive Bayes base model were tested. As discussed earlier, the data is not well balanced, so SMOTE oversampling was applied for these methods. For these results, the SVM-run2 model was built on features using probabilistic weights of the words. In this sense, the model predicts the sentiment based on the similarity between the words selected as sentiment representatives and each word representing a feature in the vector, keeping the best score of these words. This vector was also used for the other models, except for the ADA-run3 model.

SVM-run1 assumes that every word may have a different meaning in each context, so it uses a vector containing words that represent all sentiments for each aspect, plus a feature indicating which entity the tweet refers to. This model performs better than the similar SVM that only uses top words over the whole training set, and has better accuracy than ADA-run3, but not a better F1-score.

 Execution        P         R        F1         Ac.
 run1           0.418     0.413     0.415      0.563
 run2           0.412     0.426     0.416      0.517
 run3           0.452     0.438     0.445      0.528

Table 3: Results of the different experiments

As seen, the model was not trained with a large amount of data, but it still had almost all the words in its vocabulary. This suggests that the sentiment polarity errors were probably caused by the unbalanced data per sentiment and aspect: the sentiments were mostly negative, with few neutral or positive examples. Even so, this approach gave results close to those of other participants that use domain-specific systems.

6   Conclusions and Future Works

In the 6th edition of TASS, we tried a word embedding approach to classify the sentiment of an aspect. With this approach, we generate a feature vector expressing how similar each selected word is to the features in the vector. The performance of our models has been compared with the models of the other TASS participants.

Although the model was not trained with as much data as required, it obtained similarities for most of the words in the training set, and the results were close to those of the other participants. These results may give an idea of how well word embeddings could perform in sentiment analysis, since word embeddings are not tied to a specific set of words like traditional (bag-of-words) methods.

Given these results, we propose to test this approach with training data that is better balanced across aspects and sentiments. Also, gathering more sentences to train the word2vec model could improve the differentiation of the words used as features. This could lead to better results because, as we saw, the models can learn to predict a class very well when they have plenty of information about it.

References

Alvarez-López, T., M. F. Gavilanes, S. García-Méndez, J. Juncal-Martínez, and F. J. González-Castano. 2016. GTI at TASS 2016: Supervised approach for aspect based sentiment analysis in Twitter. In Proceedings of TASS 2016: Workshop on Semantic Analysis at SEPLN (TASS 2016), pages 53–57.

Araque, O., I. Corcuera, C. Román, C. A. Iglesias, and J. F. Sánchez-Rada. 2015. Aspect based sentiment analysis of Spanish tweets. In Proceedings of TASS 2015: Workshop on Semantic Analysis at SEPLN (TASS 2015), pages 29–34.

Hurtado, L.-F. and F. Pla. 2014. ELiRF-UPV en TASS 2014: Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en Twitter. Procesamiento del Lenguaje Natural.

Hurtado, L.-F. and F. Pla. 2016. ELiRF-UPV en TASS 2016: Análisis de sentimientos en Twitter. In Proceedings of TASS 2016:



Workshop on Semantic Analysis at SEPLN (TASS 2016), pages 47–51.

Hurtado, L.-F., F. Pla, and D. Buscaldi. 2015. ELiRF-UPV en TASS 2015: Análisis de sentimientos en Twitter. In Proceedings of TASS 2015: Workshop on Semantic Analysis at SEPLN (TASS 2015), pages 75–79.

Liu, Y., H. T. Loh, and A. Sun. 2009. Imbalanced text classification: A term weighting approach. Expert Systems with Applications, 36(1):690–701.

Martínez-Cámara, E., M. C. Díaz-Galiano, M. A. García-Cumbreras, M. García-Vega, and J. Villena-Román. 2017. Overview of TASS 2017. In J. Villena Román, M. A. García Cumbreras, E. Martínez-Cámara, M. C. Díaz-Galiano, and M. García Vega, editors, Proceedings of TASS 2017: Workshop on Semantic Analysis at SEPLN (TASS 2017), volume 1896 of CEUR Workshop Proceedings, Murcia, Spain, September. CEUR-WS.

Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Román, J. V., E. M. Cámara, J. G. Morera, and S. M. J. Zafra. 2015. TASS 2014 - the challenge of aspect-based sentiment analysis. Procesamiento del Lenguaje Natural, 54:61–68.

Vilares, D., Y. Doval, M. A. Alonso, and C. Gómez-Rodríguez. 2014. LyS at TASS 2014: A prototype for extracting and analysing aspects from Spanish tweets. In Proceedings of TASS 2014: Workshop on Semantic Analysis at SEPLN (TASS 2014).

Vilares, D., Y. Doval, M. A. Alonso, and C. Gómez-Rodríguez. 2015. LyS at TASS 2015: Deep learning experiments for sentiment analysis on Spanish tweets. In Proceedings of TASS 2015: Workshop on Semantic Analysis at SEPLN (TASS 2015), pages 47–52.

Villena-Román, J., M. Á. G. Cumbreras, E. M. Cámara, M. C. Díaz-Galiano, M. T. Martín-Valdivia, and L. A. U. López. 2016. Overview of TASS 2016. In Proceedings of TASS 2016: Workshop on Semantic Analysis at SEPLN (TASS 2016), volume 1702 of CEUR Workshop Proceedings. CEUR-WS.org.

Villena-Román, J., J. García-Morera, M. A. G. Cumbreras, E. Martínez-Cámara, M. T. Martín-Valdivia, and L. A. U. López. 2015. Overview of TASS 2015. In Proceedings of TASS 2015: Workshop on Semantic Analysis at SEPLN (TASS 2015), volume 1397 of CEUR Workshop Proceedings, pages 13–21. CEUR-WS.org.