=Paper=
{{Paper
|id=Vol-2765/111
|storemode=property
|title=SentNA @ ATE_ABSITA: Sentiment Analysis of Customer Reviews Using Boosted Trees with Lexical and Lexicon-based Features (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2765/paper111.pdf
|volume=Vol-2765
|authors=Francesco Mele,Antonio Sorgente,Giuseppe Vettigli
|dblpUrl=https://dblp.org/rec/conf/evalita/MeleSV20
}}
==SentNA @ ATE_ABSITA: Sentiment Analysis of Customer Reviews Using Boosted Trees with Lexical and Lexicon-based Features (short paper)==
Francesco Mele (Institute of Applied Sciences and Intelligent Systems, National Research Council (CNR), f.mele@isasi.cnr.it), Antonio Sorgente (Institute of Applied Sciences and Intelligent Systems, National Research Council (CNR), a.sorgente@isasi.cnr.it), Giuseppe Vettigli (Centrica plc, and Institute of Applied Sciences and Intelligent Systems, giuseppe.vettigli@centrica.com)

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

This paper describes our submission to the Sentiment Analysis tasks of ATE ABSITA (Aspect Term Extraction and Aspect-Based Sentiment Analysis). In particular, we focused on Task 3, using an approach that combines word frequencies with lexicon-based polarities and uses Boosted Trees to predict the sentiment score. This approach achieved a competitive error and, thanks to the interpretability of its building blocks, allows us to show which elements are considered when making the prediction. We also joined Task 1, proposing a hybrid model that joins rule-based and machine learning methodologies in order to combine the advantages of both. The model proposed for Task 1 is only preliminary.

1 Introduction

User feedback has become essential for companies to improve their services and products. Nowadays, user feedback can be found in textual form as online reviews, posts on social media and so on. These resources can express overall opinions, but also opinions about specific details (aspects) of the subject. In this scenario, the tools provided by Sentiment Analysis are crucial to process user feedback; the ongoing research in this field is focused on creating models that are more and more accurate and that can also extract fine-grained information from the data. As part of this research, the ATE ABSITA tasks (de Mattei et al., 2020; http://www.di.uniba.it/~swap/ate_absita/index.html), part of the EVALITA campaign (Basile et al., 2020), challenge the participants to extract the aspects (Task 1), predict the sentiment towards each aspect (Task 2) and predict the overall sentiment expressed (Task 3) on a dataset of reviews of items from an online shop.

It is important to notice that the dataset released for the task is one of the few resources for the Italian language annotated with aspects and sentiment at the same time. Other Italian resources that take into account sentiment with respect to aspects are (Sorgente et al., 2014) and (Croce et al., 2013). The first contains reviews of movies with 8 domain-specific aspects and 5 different polarity values, while the second contains opinions about wines considering 5 aspects and 3 possible polarity values.

This paper describes our approaches to solving Task 1 and Task 3. The approach for Task 1 is still preliminary.
In the last decade, top performing approaches to Sentiment Analysis have shifted from using classifiers on hand-crafted features, often based on lexicons (Zhu et al., 2014), to complex models based on deep Neural Networks and advanced word embeddings (Liu et al., 2020). While the latest models require special hardware and significant work to be trained, older approaches are built on top of well understood classification techniques that can be trained on commodity hardware, which makes them easy to adapt to new applications. The approach proposed for Task 3 revisits the old-fashioned style of doing Sentiment Analysis to see how it performs against the more modern methodologies used in the competition.

Regarding Task 1, we follow the latest trend of exploiting linguistic patterns (Poria et al., 2016; Liu et al., 2015; Poria et al., 2014; Rana and Cheah, 2019). What distinguishes our approach from others is that we use automatically generated patterns based on POS tags (Part-of-Speech tags), following the assumption that they are more robust to bad grammar than linguistic dependencies.

In Section 2 we describe our approach for Task 3 and in Section 2.4 we discuss its results. In Section 3 we briefly discuss the preliminary model we built for Task 1 and its results.

2 Our approach for Task 3

The idea behind our approach is to achieve competitive results using well-known tools that can run on commodity hardware. We build the features representing the text using n-grams and add a set of characteristics annotated in SenticNet (Cambria et al., 2010). Given the large number of features, we decided to use Boosted Trees as the regression model, given their ability to sub-sample the features dynamically. For textual preprocessing, the libraries spaCy (Honnibal and Montani, 2017) and Scikit-Learn (Pedregosa et al., 2011) were used. We chose XGBoost (Chen and Guestrin, 2016) as the implementation of Boosted Trees for regression.

2.1 Lexical features

Before extracting the lexical features, we remove stop words (apart from words that can be used as negative adverbs) and lemmatize each word. Finally, we extract a set of n-grams from each review. We consider uni-grams, bi-grams and tri-grams at the same time.

2.2 Lexicon-based features

To build the polarity features of our model, we adopted SenticNet, a resource used for concept-level sentiment analysis. It contains a collection of concepts, including common-sense concepts, provided with values for polarity, attention, pleasantness and sensitivity. These are numerical features that are available for a subset of the words in each review. We take into account the average, the minimum and the maximum of all the values available in each review. We also consider the mood tags provided by SenticNet. These are sets of tags such as #tristezza, #rabbia, #felicità (in English: #sadness, #anger, #happiness) attached to each word; we treat them as binary features.

2.3 Regressor

Our final regressor is composed of 800 Decision Trees with a maximum depth of 4 layers. The model was trained using Gradient Boosting with a learning rate of 0.3. The final prediction is computed by averaging the output of each tree. The rationale behind our choice is that we have a high number of features that are easy to use with tree-based methods for specific cases; ensembling thus allows us to learn a set of shallow trees, each of which can work well for specific cases.

2.4 Results and discussion

To build our model we initially focused on the training set, using cross-validation to optimize the parameters and achieving a root mean square error of 0.852 (the prediction target is on a scale from 1 to 5); we then tested the optimized model on the development set, reaching an error of 0.805. We finally achieved an error of 0.795 on the final test set. The behaviour of the error across the different stages of validation suggests that the model is well trained, as the error does not increase when new data is presented. However, it also suggests that the estimation of the error has a wide confidence interval: the standard deviation estimated during cross-validation is 0.049.

In Figure 1 we compare the predicted scores against the annotated scores on the development set. The chart shows that the model has a tendency to overestimate the score, especially in cases annotated with a low score.

[Figure 1: Scatter plot that shows the annotated score against the predicted score on the development set, together with the perfection line.]

Table 1: Important terms highlighted by the model. The column importance reports the importance score of the term, while coverage is the cumulative sum of the importance scores.

term               importance   coverage %
pessimo            0.057123     5.712323
purtroppo          0.038088     9.521134
rimborsare         0.037871     13.308205
non consigliare    0.033299     16.638059
purtroppo essere   0.027965     19.434580
cattivo            0.025690     22.003609
dispiacere         0.024986     24.502171
pensare            0.018631     26.365243
sconsigliare       0.016331     27.998360
dopo               0.016239     29.622279
non funzionare     0.015425     31.164802
delusione          0.015227     32.687547
non riconoscere    0.014809     34.168431
restituire         0.014615     35.629894
bruciare           0.014250     37.054852

We will now examine the two reviews for which our regressor has the highest error. The text of the first review is:

“si autospenge proprio quando si necessita di usarla contelecomando” (in English: “It turns off on its own when you need to use it with the remote control.”; the original sentence contains two typos).

This review was annotated with a score of 2, but the score assigned by our system is 4.75. This highlights a tendency of the system to give higher scores in uncertain cases. In this specific case we have no adjectives and two typing mistakes, which result in no information from the lexicon and in most of the words being disregarded as rare by our preprocessing pipeline. This suggests that a special treatment is needed for cases where the classifier has fewer elements on which to base a decision.

The text of the second review is:

“Per questo prezzo c’è di meglio.. restituita. Gli accessori sono ottimi.” (in English: “There’s a better choice for the same price.. I returned it. The accessories are great.”).

This sentence was annotated with a score of 2, but the score assigned by our system is 3.36. We have again a case of overestimation of the score. This time the review has two contrasting sentences: a very negative one, where the user states having returned the item, and a very positive one regarding the accessories. This ambivalence makes the review a borderline case for our model.

We attribute this tendency to overestimate the target to the fact that the model is optimized to minimize the root-mean-square error, which makes the model predict values closer to the average annotated score. While this is acceptable in an academic competition, it is less than ideal in an industrial setting. One way to mitigate the overestimation, without changing the formulation of the error to minimize, would be to balance the data so as to have a similar number of occurrences for each score. Sub-sampling the data is impractical, as it would reduce the sample size too drastically; this leaves open only the option of adding more samples.

In Table 1 we show the 15 terms most influential on the model. Most of the terms have a negative connotation. Interestingly, all the bi-grams in the list contain the word non (not). Taking into account that the terms reported in the table add up to 37% of the importance of all the features, this highlights that the regressor pays particular attention to the prediction of reviews with a low score, even though they are a minority.
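The Task 3 pipeline described above can be sketched end to end with commodity tools. The snippet below is a minimal illustration, not the submitted system: the four reviews, their scores and the word-polarity lexicon are invented stand-ins for the dataset and for SenticNet, and scikit-learn's GradientBoostingRegressor replaces the XGBoost implementation used in the paper, with the reported settings (800 trees, depth 4, learning rate 0.3).

```python
# Minimal sketch of the Task 3 pipeline: uni- to tri-gram counts combined
# with lexicon statistics, fed to a boosted tree regressor.
# Assumptions: the reviews, scores and LEXICON below are invented stand-ins
# for the dataset and for SenticNet; GradientBoostingRegressor stands in
# for the XGBoost implementation used in the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical word -> polarity values (SenticNet provides such scores).
LEXICON = {"pessimo": -0.8, "ottimo": 0.9, "delusione": -0.7, "perfetto": 0.8}

def lexicon_stats(text):
    """Average, minimum and maximum polarity of the lexicon words in a review."""
    values = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not values:
        return [0.0, 0.0, 0.0]
    return [float(np.mean(values)), min(values), max(values)]

reviews = [
    "prodotto pessimo una delusione",
    "prodotto ottimo davvero perfetto",
    "ottimo acquisto",
    "pessimo acquisto da restituire",
]
scores = [1.0, 5.0, 4.0, 2.0]  # target on the 1-5 scale used by the task

# Lexical features (Section 2.1): uni-, bi- and tri-gram counts.
vectorizer = CountVectorizer(ngram_range=(1, 3))
ngrams = vectorizer.fit_transform(reviews).toarray()

# Concatenate n-gram counts with the lexicon-based statistics (Section 2.2).
X = np.hstack([ngrams, np.array([lexicon_stats(r) for r in reviews])])

# Regressor (Section 2.3): 800 shallow trees trained with Gradient Boosting.
model = GradientBoostingRegressor(n_estimators=800, max_depth=4,
                                  learning_rate=0.3, random_state=0)
model.fit(X, scores)
pred = model.predict(X)
```

On real data the hyper-parameters would be tuned with cross-validation, as described in Section 2.4.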
3 Preliminary results on Task 1

Task 1 asks to identify terms and phrases that contain an aspect of the customer review when it co-occurs with opinion words that bring information about the sentiment polarity (a detailed description of the task is available at http://www.di.uniba.it/~swap/ate_absita/task.html).

For this task we designed a hybrid model that joins a rule-based approach with machine learning. The main idea is to identify a set of plausible aspects via some pre-defined rules, then use a classifier to filter out the wrong candidates. The rules are defined on POS-tagging patterns. For example, the review

“Ottimo rasoio dal semplice utilizzo.” (in English: “Excellent razor, simple to use.”)

with “semplice” annotated as aspect matches the rule defined by the pattern ADJ NOUN PROPN '''ADJ''' NOUN, where the tag in bold indicates the position of the plausible aspect. We defined a set of about 3000 rules. The rules were discovered by picking the most common POS-tagging patterns that match the annotated aspects. In particular, we located the position of the aspects in the sentence and selected the POS of the close words (three on each side), taking into account the punctuation.

Each aspect found can match one or more rules. The activation of each rule is used as a binary feature for the final classifier. The final classifier is implemented using Logistic Regression (Hastie et al., 2001); its target is to predict whether each candidate found by the rules is an actual aspect or a false positive.

This preliminary effort achieves an F1-score of 0.340, which is above the baseline (0.255) but below the average score of the submissions (0.504).

4 Conclusions

The submission confirmed the effectiveness of using a simple approach to predict the sentiment score of customer reviews in Italian (Task 3). The approach consists in combining simple word features, specifically n-grams up to tri-grams, with a lexicon such as SenticNet to build features for Boosted Trees. Our system achieved a competitive error, which is lower than the baseline by 0.209 points and higher than that of the best model by 0.131 points. The error achieved is above the average official score by 0.067 points (the estimate includes baseline models).

The submission also highlights that we were able to beat the baseline for Task 1 with a rudimentary approach. We will build upon this approach in our future work.

References

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Erik Cambria, Robert Speer, Catherine Havasi, and Amir Hussain. 2010. SenticNet: A publicly available semantic resource for opinion mining. In AAAI Fall Symposium: Commonsense Knowledge, volume 10. Citeseer.

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. ACM.

Danilo Croce, Francesco Garzoli, Marco Montesi, Diego De Cao, and Roberto Basili. 2013. Enabling advanced business intelligence in divino. In DART@AI*IA, pages 61–72.

Lorenzo de Mattei, Graziella de Martino, Andrea Iovine, Alessio Miaschi, Marco Polignano, and Giulia Rambelli. 2020. ATE ABSITA@EVALITA2020: Overview of the Aspect Term Extraction and Aspect-based Sentiment Analysis Task. In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA.

Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.

Qian Liu, Zhiqiang Gao, Bing Liu, and Yuanlin Zhang. 2015. Automated rule selection for aspect extraction in opinion mining. In Twenty-Fourth International Joint Conference on Artificial Intelligence.

Jiaxiang Liu, Xuyi Chen, Shikun Feng, Shuohuan Wang, Xuan Ouyang, Yu Sun, Zhengjie Huang, and Weiyue Su. 2020. kk2018 at SemEval-2020 Task 9: Adversarial training for code-mixing sentiment classification. arXiv preprint arXiv:2009.03673.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Soujanya Poria, Erik Cambria, Lun-Wei Ku, Chen Gui, and Alexander Gelbukh. 2014. A rule-based approach to aspect extraction from product reviews. In Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP), pages 28–37.

Soujanya Poria, Erik Cambria, and Alexander Gelbukh. 2016. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108:42–49.

Toqir A. Rana and Yu-N Cheah. 2019. Sequential patterns rule-based approach for opinion target extraction from customer reviews. Journal of Information Science, 45(5):643–655.

Antonio Sorgente, Giuseppe Vettigli, and Francesco Mele. 2014. An Italian corpus for aspect based sentiment analysis of movie reviews. In First Italian Conference on Computational Linguistics CLiC-it.

Xiaodan Zhu, Svetlana Kiritchenko, and Saif Mohammad. 2014. NRC-Canada-2014: Recent improvements in the sentiment analysis of tweets. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 443–447.
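As a concrete illustration of the rule mechanism described in Section 3, the sketch below builds a window rule from one annotated example and applies it to a new sentence. It is a toy sketch under assumptions: tokens arrive pre-tagged (the paper runs the spaCy tagger), the example sentences and tags are invented, and the Logistic Regression filtering step that follows rule matching in the paper is omitted.

```python
# Sketch of the POS-pattern rules of Section 3: a rule is the POS tag of an
# annotated aspect plus the tags of the three words on each side; new
# sentences are scanned for positions whose window matches a learned rule.
# Assumptions: pre-tagged tokens instead of spaCy output, invented examples,
# and no Logistic Regression filter on the candidates.
PAD = "PAD"  # padding tag for windows that run past the sentence edges

def window_pattern(tags, i, size=3):
    """POS tags of `size` words on each side of position i, plus position i itself."""
    padded = [PAD] * size + list(tags) + [PAD] * size
    return tuple(padded[i:i + 2 * size + 1])

def learn_rules(tagged_sentences, aspect_positions):
    """Collect one rule per annotated aspect occurrence."""
    return {window_pattern(tags, pos)
            for tags, pos in zip(tagged_sentences, aspect_positions)}

def candidate_aspects(tokens, tags, rules):
    """Return the tokens whose POS window matches a learned rule."""
    return [tokens[i] for i in range(len(tags))
            if window_pattern(tags, i) in rules]

# "Ottimo rasoio dal semplice utilizzo ." with "semplice" (index 3) annotated
# as aspect; the tags follow the pattern printed in the paper.
train_tags = [["ADJ", "NOUN", "PROPN", "ADJ", "NOUN", "PUNCT"]]
rules = learn_rules(train_tags, [3])

# A new review with the same local POS context around "comodo".
tokens = ["Buono", "schermo", "dal", "comodo", "supporto", "."]
tags = ["ADJ", "NOUN", "PROPN", "ADJ", "NOUN", "PUNCT"]
candidates = candidate_aspects(tokens, tags, rules)
```

In the full model the activation of each rule becomes a binary feature and Logistic Regression decides whether each candidate is an actual aspect or a false positive.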