=Paper=
{{Paper
|id=Vol-2765/111
|storemode=property
|title=SentNA @ ATE_ABSITA: Sentiment Analysis of Customer Reviews Using Boosted Trees with Lexical and Lexicon-based Features (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2765/paper111.pdf
|volume=Vol-2765
|authors=Francesco Mele,Antonio Sorgente,Giuseppe Vettigli
|dblpUrl=https://dblp.org/rec/conf/evalita/MeleSV20
}}
==SentNA @ ATE_ABSITA: Sentiment Analysis of Customer Reviews Using Boosted Trees with Lexical and Lexicon-based Features (short paper)==
Francesco Mele (Institute of Applied Sciences and Intelligent Systems, National Research Council (CNR), f.mele@isasi.cnr.it), Antonio Sorgente (Institute of Applied Sciences and Intelligent Systems, National Research Council (CNR), a.sorgente@isasi.cnr.it), Giuseppe Vettigli (Centrica plc, and Institute of Applied Sciences and Intelligent Systems, giuseppe.vettigli@centrica.com)

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

This paper describes our submission to the Sentiment Analysis tasks of ATE ABSITA (Aspect Term Extraction and Aspect-Based Sentiment Analysis). In particular, we focused on Task 3, using an approach that combines word frequencies with lexicon-based polarities and uses Boosted Trees to predict the sentiment score. This approach achieved a competitive error and, thanks to the interpretability of its building blocks, allows us to show which elements are considered when making the prediction. We also joined Task 1, proposing a hybrid model that joins rule-based and machine learning methodologies in order to combine the advantages of both. The model proposed for Task 1 is only preliminary.

1 Introduction

User feedback has become essential for companies to improve their services and products. Nowadays, user feedback can be found in textual form as online reviews, posts on social media and so on. These resources can express overall opinions, but also opinions about specific details (aspects) of the subject. In this scenario, the tools provided by Sentiment Analysis are crucial to process user feedback; the ongoing research in this field is focused on creating models that are more and more accurate and that can also extract fine-grained information from the data. As part of this research, the ATE ABSITA tasks (de Mattei et al., 2020; http://www.di.uniba.it/~swap/ate_absita/index.html), part of the EVALITA campaign (Basile et al., 2020), challenge the participants to extract the aspects (Task 1), predict the sentiment towards each aspect (Task 2) and predict the overall sentiment expressed (Task 3) on a dataset of reviews of items from an online shop.

It is important to notice that the dataset released for the task is one of the few resources for the Italian language annotated with aspects and sentiment at the same time. Other Italian resources that take into account sentiment with respect to aspects are (Sorgente et al., 2014) and (Croce et al., 2013). The first contains reviews of movies with 8 domain-specific aspects and 5 different polarity values, while the second contains opinions about wines considering 5 aspects and 3 possible polarity values.

This paper describes our approaches to solving Task 1 and Task 3. The approach for Task 1 is still preliminary.
In the last decade, top performing approaches to Sentiment Analysis have shifted from using classifiers on hand-crafted features, often based on lexicons (Zhu et al., 2014), to complex models based on deep Neural Networks and advanced word embeddings (Liu et al., 2020). While the latest models require special hardware and significant work to be trained, older approaches are built on top of well understood classification techniques that can be trained on commodity hardware, which makes them easy to adapt to new applications. The approach proposed for Task 3 revisits the old-fashioned style of doing Sentiment Analysis to see how it performs against the more modern methodologies used in the competition.

Regarding Task 1, we follow the latest trend of exploiting linguistic patterns (Poria et al., 2016; Liu et al., 2015; Poria et al., 2014; Rana and Cheah, 2019). What distinguishes our approach from others is that we use automatically generated patterns based on POS tags (Part-of-Speech tags), following the assumption that they are more robust to bad grammar than linguistic dependencies.

In Section 2 we describe our approach for Task 3 and in Section 2.4 we discuss its results. In Section 3 we briefly discuss the preliminary model we built for Task 1 and its results.

2 Our approach for Task 3

The idea behind our approach is to achieve competitive results using well-known tools that can run on commodity hardware. We build the features representing the text using n-grams and add a set of characteristics annotated in SenticNet (Cambria et al., 2010). Given the large number of features, we decided to use Boosted Trees as the regression model, given their ability to sub-sample the features dynamically. For textual preprocessing, the libraries spaCy (Honnibal and Montani, 2017) and Scikit-Learn (Pedregosa et al., 2011) were used. We chose XGBoost (Chen and Guestrin, 2016) as the implementation of Boosted Trees for regression.

2.1 Lexical features

Before extracting the lexical features, we remove stop words (apart from words that can be used as negative adverbs) and lemmatize each word. Finally, we extract a set of n-grams from each review. We consider uni-grams, bi-grams and tri-grams at the same time.

2.2 Lexicon-based features

To build the polarity features of our model, we adopted SenticNet, a resource used for concept-level sentiment analysis. It contains a collection of concepts, including common-sense concepts, provided with values for polarity, attention, pleasantness and sensitivity. These are numerical features that are available for a subset of the words in each review. We take into account the average, the minimum and the maximum of all the values available in each review. We also consider the mood tags provided by SenticNet. These are sets of tags such as #tristezza, #rabbia, #felicità (in English: #sadness, #anger, #happiness) attached to each word; we treat them as binary features.

2.3 Regressor

Our final regressor is composed of 800 Decision Trees with a maximum depth of 4 layers. The model was trained using Gradient Boosting with a learning rate of 0.3. The final prediction is computed by averaging the output of each tree. The rationale behind our choice is that we have a high number of features that are easy to use with tree-based methods for specific cases; ensembling thus allows us to learn a set of shallow trees, each of which can work well for specific cases.

2.4 Results and discussion

To build our model we initially focused on the training set, using cross-validation to optimize the parameters and achieving a root mean square error of 0.852 (the prediction target is on a scale from 1 to 5); we then tested the optimized model on the development set, reaching an error of 0.805. We finally achieved an error of 0.795 on the final test set. The behaviour of the error across the different stages of validation suggests that the model is well trained, as the error does not increase when new data is presented. However, it also suggests that the estimation of the error has a wide confidence interval: the standard deviation estimated during cross-validation is 0.049.

In Figure 1 we compare the predicted scores against the annotated scores on the development set. The chart shows that the model has a tendency to overestimate the score, especially in cases annotated with a low score.

[Figure 1: Scatter plot that shows the annotated score against the predicted score on the development set, together with the perfection line.]

Table 1: Important terms highlighted by the model. The column importance reports the importance score of the term, while coverage is the cumulative sum of the importance scores.

term               importance   coverage %
pessimo            0.057123     5.712323
purtroppo          0.038088     9.521134
rimborsare         0.037871     13.308205
non consigliare    0.033299     16.638059
purtroppo essere   0.027965     19.434580
cattivo            0.025690     22.003609
dispiacere         0.024986     24.502171
pensare            0.018631     26.365243
sconsigliare       0.016331     27.998360
dopo               0.016239     29.622279
non funzionare     0.015425     31.164802
delusione          0.015227     32.687547
non riconoscere    0.014809     34.168431
restituire         0.014615     35.629894
bruciare           0.014250     37.054852

We will now examine the two reviews for which our regressor has the highest error. The text of the first review is:

“si autospenge proprio quando si necessita di usarla contelecomando” (in English: “It turns off on its own when you need to use it with the remote control.”; the original sentence contains two typos).

This review was annotated with a score of 2, but the score assigned by our system is 4.75. This highlights a tendency of the system to give higher scores in uncertain cases. In this specific case we have no adjectives and two typing mistakes, which result in no information from the lexicon and in most of the words being disregarded as rare by our preprocessing pipeline. This suggests that a special treatment is needed for cases where the classifier has fewer elements on which to base a decision.

The text of the second review is:

“Per questo prezzo c’è di meglio.. restituita. Gli accessori sono ottimi.” (in English: “There’s a better choice for the same price.. I returned it. The accessories are great.”).

This sentence was annotated with a score of 2, but the score assigned by our system is 3.36. We have again a case of overestimation of the score. This time the review has two contrasting sentences: a very negative one, where the user states having returned the item, and a very positive one regarding the accessories. This ambivalence makes the review a borderline case for our model.

We attribute this tendency to overestimate the target to the fact that the model is optimized to minimize the root-mean-square error, which makes the model predict values closer to the average annotated score. While this is acceptable in an academic competition, it is less than ideal in an industrial setting. One way to mitigate the overestimation, without changing the formulation of the error to minimize, would be to balance the data so as to have a similar number of occurrences for each score. Sub-sampling the data is impractical, as it would reduce the sample size too drastically; this leaves open only the option of adding more samples.

In Table 1 we show the 15 terms most influential on the model. Most of the terms have a negative connotation. Interestingly, all the bi-grams in the list contain the word non (not). Taking into account that the terms reported in the table add up to 37% of the importance of all the features, this highlights that the regressor pays particular attention to the prediction of reviews with a low score, even though they are a minority.
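The Task 3 pipeline described above can be sketched end to end with commodity tools. The snippet below is a minimal illustration, not the submitted system: the four reviews, their scores and the word-polarity lexicon are invented stand-ins for the dataset and for SenticNet, and scikit-learn's GradientBoostingRegressor replaces the XGBoost implementation used in the paper, with the reported settings (800 trees, depth 4, learning rate 0.3).

```python
# Minimal sketch of the Task 3 pipeline: uni- to tri-gram counts combined
# with lexicon statistics, fed to a boosted tree regressor.
# Assumptions: the reviews, scores and LEXICON below are invented stand-ins
# for the dataset and for SenticNet; GradientBoostingRegressor stands in
# for the XGBoost implementation used in the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical word -> polarity values (SenticNet provides such scores).
LEXICON = {"pessimo": -0.8, "ottimo": 0.9, "delusione": -0.7, "perfetto": 0.8}

def lexicon_stats(text):
    """Average, minimum and maximum polarity of the lexicon words in a review."""
    values = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not values:
        return [0.0, 0.0, 0.0]
    return [float(np.mean(values)), min(values), max(values)]

reviews = [
    "prodotto pessimo una delusione",
    "prodotto ottimo davvero perfetto",
    "ottimo acquisto",
    "pessimo acquisto da restituire",
]
scores = [1.0, 5.0, 4.0, 2.0]  # target on the 1-5 scale used by the task

# Lexical features (Section 2.1): uni-, bi- and tri-gram counts.
vectorizer = CountVectorizer(ngram_range=(1, 3))
ngrams = vectorizer.fit_transform(reviews).toarray()

# Concatenate n-gram counts with the lexicon-based statistics (Section 2.2).
X = np.hstack([ngrams, np.array([lexicon_stats(r) for r in reviews])])

# Regressor (Section 2.3): 800 shallow trees trained with Gradient Boosting.
model = GradientBoostingRegressor(n_estimators=800, max_depth=4,
                                  learning_rate=0.3, random_state=0)
model.fit(X, scores)
pred = model.predict(X)
```

On real data the hyper-parameters would be tuned with cross-validation, as described in Section 2.4.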
3 Preliminary results on Task 1

Task 1 asks to identify terms and phrases that contain an aspect of the customer review when it co-occurs with opinion words that bring information about the sentiment polarity (a detailed description of the task is available at http://www.di.uniba.it/~swap/ate_absita/task.html).

For this task we designed a hybrid model that joins a rule-based approach with machine learning. The main idea is to identify a set of plausible aspects via some pre-defined rules, then use a classifier to filter out the wrong candidates. The rules are defined on POS-tagging patterns. For example, the review

“Ottimo rasoio dal semplice utilizzo.” (in English: “Excellent razor, simple to use.”)

with “semplice” annotated as aspect matches the rule defined by the pattern ADJ NOUN PROPN '''ADJ''' NOUN, where the tag in bold indicates the position of the plausible aspect. We defined a set of about 3000 rules. The rules were discovered by picking the most common POS-tagging patterns that match the annotated aspects. In particular, we located the position of the aspects in the sentence and selected the POS of the close words (three on each side), taking into account the punctuation.

Each aspect found can match one or more rules. The activation of each rule is used as a binary feature for the final classifier. The final classifier is implemented using Logistic Regression (Hastie et al., 2001); its target is to predict whether each candidate found by the rules is an actual aspect or a false positive.

This preliminary effort achieves an F1-score of 0.340, which is above the baseline (0.255) but below the average score of the submissions (0.504).

4 Conclusions

The submission confirmed the effectiveness of using a simple approach to predict the sentiment score of customer reviews in Italian (Task 3). The approach consists in combining simple word features, specifically n-grams up to tri-grams, with a lexicon such as SenticNet to build features for Boosted Trees. Our system achieved a competitive error, which is lower than the baseline by 0.209 points and higher than that of the best model by 0.131 points. The error achieved is above the average official score by 0.067 points (the estimate includes baseline models).

The submission also highlights that we were able to beat the baseline for Task 1 with a rudimentary approach. We will build upon this approach in our future work.

References

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Erik Cambria, Robert Speer, Catherine Havasi, and Amir Hussain. 2010. SenticNet: A publicly available semantic resource for opinion mining. In AAAI Fall Symposium: Commonsense Knowledge, volume 10. Citeseer.

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. ACM.

Danilo Croce, Francesco Garzoli, Marco Montesi, Diego De Cao, and Roberto Basili. 2013. Enabling advanced business intelligence in divino. In DART@AI*IA, pages 61–72.

Lorenzo de Mattei, Graziella de Martino, Andrea Iovine, Alessio Miaschi, Marco Polignano, and Giulia Rambelli. 2020. ATE ABSITA@EVALITA2020: Overview of the Aspect Term Extraction and Aspect-based Sentiment Analysis Task. In Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online. CEUR.org.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA.

Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.

Qian Liu, Zhiqiang Gao, Bing Liu, and Yuanlin Zhang. 2015. Automated rule selection for aspect extraction in opinion mining. In Twenty-Fourth International Joint Conference on Artificial Intelligence.

Jiaxiang Liu, Xuyi Chen, Shikun Feng, Shuohuan Wang, Xuan Ouyang, Yu Sun, Zhengjie Huang, and Weiyue Su. 2020. kk2018 at SemEval-2020 Task 9: Adversarial training for code-mixing sentiment classification. arXiv preprint arXiv:2009.03673.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Soujanya Poria, Erik Cambria, Lun-Wei Ku, Chen Gui, and Alexander Gelbukh. 2014. A rule-based approach to aspect extraction from product reviews. In Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP), pages 28–37.

Soujanya Poria, Erik Cambria, and Alexander Gelbukh. 2016. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108:42–49.

Toqir A. Rana and Yu-N Cheah. 2019. Sequential patterns rule-based approach for opinion target extraction from customer reviews. Journal of Information Science, 45(5):643–655.

Antonio Sorgente, Giuseppe Vettigli, and Francesco Mele. 2014. An Italian corpus for aspect based sentiment analysis of movie reviews. In First Italian Conference on Computational Linguistics CLiC-it.

Xiaodan Zhu, Svetlana Kiritchenko, and Saif Mohammad. 2014. NRC-Canada-2014: Recent improvements in the sentiment analysis of tweets. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 443–447.
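As a concrete illustration of the rule mechanism described in Section 3, the sketch below builds a window rule from one annotated example and applies it to a new sentence. It is a toy sketch under assumptions: tokens arrive pre-tagged (the paper runs the spaCy tagger), the example sentences and tags are invented, and the Logistic Regression filtering step that follows rule matching in the paper is omitted.

```python
# Sketch of the POS-pattern rules of Section 3: a rule is the POS tag of an
# annotated aspect plus the tags of the three words on each side; new
# sentences are scanned for positions whose window matches a learned rule.
# Assumptions: pre-tagged tokens instead of spaCy output, invented examples,
# and no Logistic Regression filter on the candidates.
PAD = "PAD"  # padding tag for windows that run past the sentence edges

def window_pattern(tags, i, size=3):
    """POS tags of `size` words on each side of position i, plus position i itself."""
    padded = [PAD] * size + list(tags) + [PAD] * size
    return tuple(padded[i:i + 2 * size + 1])

def learn_rules(tagged_sentences, aspect_positions):
    """Collect one rule per annotated aspect occurrence."""
    return {window_pattern(tags, pos)
            for tags, pos in zip(tagged_sentences, aspect_positions)}

def candidate_aspects(tokens, tags, rules):
    """Return the tokens whose POS window matches a learned rule."""
    return [tokens[i] for i in range(len(tags))
            if window_pattern(tags, i) in rules]

# "Ottimo rasoio dal semplice utilizzo ." with "semplice" (index 3) annotated
# as aspect; the tags follow the pattern printed in the paper.
train_tags = [["ADJ", "NOUN", "PROPN", "ADJ", "NOUN", "PUNCT"]]
rules = learn_rules(train_tags, [3])

# A new review with the same local POS context around "comodo".
tokens = ["Buono", "schermo", "dal", "comodo", "supporto", "."]
tags = ["ADJ", "NOUN", "PROPN", "ADJ", "NOUN", "PUNCT"]
candidates = candidate_aspects(tokens, tags, rules)
```

In the full model the activation of each rule becomes a binary feature and Logistic Regression decides whether each candidate is an actual aspect or a false positive.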