=Paper=
{{Paper
|id=Vol-2263/paper017
|storemode=property
|title=Ensemble of LSTMs for EVALITA 2018 Aspect-based Sentiment Analysis Task (ABSITA) (Short Paper)
|pdfUrl=https://ceur-ws.org/Vol-2263/paper017.pdf
|volume=Vol-2263
|authors=Mauro Bennici,Xileny Seijas Portocarrero
|dblpUrl=https://dblp.org/rec/conf/evalita/BenniciP18
}}
==Ensemble of LSTMs for EVALITA 2018 Aspect-based Sentiment Analysis Task (ABSITA) (Short Paper)==
Mauro Bennici, You Are My GUide, mauro@youaremyguide.com

Xileny Seijas Portocarrero, You Are My GUide, xileny@youaremyguide.com

===Abstract===
English. In identifying the different emotions present in a review, it is necessary to distinguish the single entities present and the specific semantic relations. The number of reviews needed to have a complete dataset for every single possible option is not predictable. The approach described starts from the possibility of studying first the aspect and then the polarity, and of creating an ensemble of the two models to provide a better understanding of the dataset.

Italiano. Nell'identificazione delle diverse emozioni presenti in una recensione è necessario distinguere le singole entità presenti e le singole relazioni semantiche. Il numero di recensioni necessarie per avere un dataset completo per ogni singola opzione possibile non è predicibile. L'approccio descritto parte dalla possibilità di creare due modelli diversi, uno per la parte di categorizzazione, e l'altro per la parte di polarità. E di unire i due modelli per ottenere una maggiore comprensione del dataset.

===1 Introduction===
With the increase in interactions between users and businesses across different channels and different languages, it becomes increasingly difficult for businesses to respond promptly and effectively. Not all businesses can have a team dedicated to public relations, and they often rely on external agencies that do not know the internal operations of the company.

Automating the correct recognition of the various problems can lead to routing them promptly to the persons appointed to solve them.

The research was carried out with the dataset provided within the task called ABSITA, Aspect-based Sentiment Analysis at EVALITA 2018 [1] (Basile et al., 2018). The task was a combination of two tasks, Aspect Category Detection (ACD) and Aspect Category Polarity (ACP).

The dataset is a selection of hotel reviews in Italian taken from the portal Booking.com.

===2 Description of the system===
Each review has been cleaned of special characters, lemmatized and brought to lowercase with the SpaCy [2] framework.

To generate the word vectors in fastText [3], generic Italian texts have been used instead of reviews from the accommodation context, so that the model remains suitable for more business models. The best embedding has a dimension of 200, with character n-grams of length 5, a window of size 5 and 10 negatives.

The system is the ensemble of two different models to improve the ability to discover hidden properties (Akhtar et al., 2018).

The first model is a bi-directional Long Short-Term Memory (BI-LSTM). This model is used for the discernment of the ASPECT.

{| class="wikitable"
! Layer (type) !! Output Shape !! Param #
|-
| e (Embedding) || (None, 100, 200) || 1420400
|-
| b (Bidirectional) || (None, 512) || 935936
|-
| d (Dense) || (None, 7) || 3591
|}

A second BI-LSTM model is used for the discernment of POLARITY.

{| class="wikitable"
! Layer (type) !! Output Shape !! Param #
|-
| e (Embedding) || (None, 100, 200) || 1420400
|-
| b (Bidirectional) || (None, 512) || 935936
|-
| d (Dense) || (None, 14) || 7182
|}

Both models use a dropout and a recurrent_dropout of 0.1. The optimizer for both is RMSProp. The loaded embedding is trainable. Both systems use Keras [4] to create the RNN models.

The models were trained and tested with a 5-fold cross-validation with a ratio of 80% training and 20% testing. The best model was automatically saved at each iteration.

A threshold of 0.5 was used on the first model to activate the result of the last layer. In the second model, the threshold was 0.43.

{| class="wikitable"
|+ Table 1: micro precision, micro recall and micro F1 score with the gold dataset, Aspect Category Detection (ACD).
! micro precision !! micro recall !! micro F1 score
|-
| 0.8397 || 0.8050 || 0.8204
|}

{| class="wikitable"
|+ Table 2: micro precision, micro recall and micro F1 score with the gold dataset, Aspect Category Polarity (ACP).
! micro precision !! micro recall !! micro F1 score
|-
| 0.8138 || 0.6593 || 0.7172
|}

The results show that the models are better at identifying the category of a review than its polarity.

After that, we ensemble the two models (Choi et al., 2018) to obtain a system that outperforms each single model on the ACP task, at the cost of a lower result on the ACD task (Table 3).

The ensemble has been created as a cascade, so that one system acts as attention for the underlying system. The threshold of activation was a range between 0.45 and 0.55.

A third model, a LightGBM [5] (Bennici and Portocarrero, 2018), was also tested, where the following properties are extracted from the review text:

* the length of the review
* the percentage of special characters
* the number of exclamation points
* the number of question marks
* the number of words
* the number of characters
* the number of spaces
* the number of stop words
* the ratio between words and stop words
* the ratio between words and spaces

These properties are joined to the vector created from the bigrams and trigrams of the text itself at word and character level. The number of leaves is 250, the learner is set to 'Feature', and the learning rate is 0.04.

The result of the union of the three models could not be submitted to the final evaluation, due to the limit of 2 possible submissions. In the tests carried out after the release of the complete dataset, it reported results higher than 83% for ASPECT and 75% for POLARITY. Also, the inference is faster than with the RNN models.

===3 Results===
{| class="wikitable"
|+ Table 3: micro precision, micro recall and micro F1 score for the submitted ACD subtask, Aspect Category Detection (ACD).
! Runs !! micro precision !! micro recall !! micro F1
|-
| Run 1 || 0.8713 || 0.7504 || 0.8063
|-
| Run 2 || 0.8697 || 0.7481 || 0.8043
|}

{| class="wikitable"
|+ Table 4: micro precision, micro recall and micro F1 score for the submitted ACP subtask, Aspect Category Polarity (ACP).
! Runs !! micro precision !! micro recall !! micro F1
|-
| Run 1 || 0.7387 || 0.7206 || 0.7295
|-
| Run 2 || 0.7472 || 0.7186 || 0.7326
|}

In the evaluation phase, the results support the choice of ensembling the two models. The ACP task (Table 4) is the beneficiary of this process, whereas the ACD task (Table 3) lost more than one point with respect to the single model.

The study of the dataset is influenced by the small size of the training dataset and by the specificity of some terms that could refer to different categories, such as the comfort of the room and the quality/price ratio.

Various types of data preparation have also been tried, including the preservation of special characters, the shape of words (to better identify cities or places written in capital letters), and some SMOTE functions to increase the number of entries, but with poor results and noticeable overfitting.

===4 Conclusion===
Creating an ensemble of models to bring out various properties of a review gave better results than using a single model in the polarity identification.

The terms used in a review are sometimes misleading: they can be used both positively and negatively, and they can refer to different categories of the hotel.

In the near future, we plan to create a system that splits the text of the review in order to categorize a single sentence, or even a single subject or object. In this way, we will also be able to evaluate the polarity of the single object or subject, using only the terms related to it, to improve the result of the ACP task.

The performance of the system will also be evaluated by replacing all the possible entities with variables such as:

* City
* Museum
* Panoramic Point
* Railway station
* Street

and with a pre-category known a priori, such as Breakfast for words like Coffee, Cornetto, and Jam.

The expected result is to reduce the variance of the dataset, to improve the ACD result, and to be able to use the system in production.

Finally, we will evaluate the speed and effectiveness of a CNN model in which the two tasks, ASPECT and POLARITY, can be studied separately and then merged.

===Notes===
* [1] http://sag.art.uniroma2.it/absita/
* [2] https://spacy.io
* [3] https://fasttext.cc/
* [4] https://keras.io
* [5] https://github.com/Microsoft/LightGBM

===References===
* Basile, P., Basile, V., Croce, D., & Polignano, M. (2018). Overview of the EVALITA 2018 Aspect-based Sentiment Analysis task (ABSITA). Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18).
* Akhtar, M., Ghosal, D., Ekbal, A., Bhattacharyya, P., & Kurohashi, S. (2018, October 15). A Multi-task Ensemble Framework for Emotion, Sentiment and Intensity Prediction. Retrieved from https://arxiv.org/abs/1808.01216
* Choi, J. Y. and Bumshik, L. (2018). "Combining LSTM Network Ensemble via Adaptive Weighting for Improved Time Series Forecasting," Mathematical Problems in Engineering, vol. 2018, Article ID 2470171, 8 pages. doi: https://doi.org/10.1155/2018/2470171.
* Bennici, M. and Seijas Portocarrero, X. (2018). The validity of dictionaries over the time in Emoji prediction. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.
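
The layer summaries reported in Section 2 can be cross-checked with a short calculation. The sketch below is not taken from the paper: the vocabulary size (7102) and the number of LSTM units per direction (256) are values inferred here from the reported parameter counts and output shapes, and the formulas are the standard ones for an embedding, an LSTM with a single bias vector per gate, and a dense layer.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    # A bidirectional wrapper holds one forward and one backward LSTM.
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, outputs):
    # Weight matrix plus one bias per output.
    return input_dim * outputs + outputs


EMBED_DIM = 200                      # stated in the paper
VOCAB_SIZE = 1420400 // EMBED_DIM    # inferred from the summary: 7102
LSTM_UNITS = 512 // 2                # inferred: 256 units per direction

counts = {
    "embedding": embedding_params(VOCAB_SIZE, EMBED_DIM),
    "bilstm": bilstm_params(EMBED_DIM, LSTM_UNITS),
    "dense_aspect": dense_params(2 * LSTM_UNITS, 7),
    "dense_polarity": dense_params(2 * LSTM_UNITS, 14),
}

for name, value in counts.items():
    print(name, value)
```

Under these assumptions the computed counts agree with the "Param #" column of both summaries, which suggests that the two networks differ only in the size of the output layer (7 versus 14 units).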
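
The thresholding step and the micro-averaged scores used in Tables 1 to 4 can be illustrated on toy data. This is a minimal sketch, not the official ABSITA evaluation script: the function names, the toy labels and the choice of "greater than or equal to" at the threshold are assumptions made here for illustration.

```python
def binarize(probabilities, threshold):
    # Turn per-label probabilities into 0/1 decisions.
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def micro_scores(y_true, y_pred):
    # Pool true positives, false positives and false negatives
    # over all reviews and all labels before computing the scores.
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: two reviews, three hypothetical labels.
y_true = [[1, 0, 1], [0, 1, 0]]
probs = [[0.9, 0.6, 0.4], [0.2, 0.7, 0.1]]

y_pred = binarize(probs, threshold=0.5)   # 0.5 is the ASPECT threshold in the paper
precision, recall, f1 = micro_scores(y_true, y_pred)
print(y_pred, precision, recall, f1)
```

Lowering the threshold, as the paper does for the polarity model (0.43), trades precision for recall because more labels are activated.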
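
The surface properties listed in Section 2 for the LightGBM model can be sketched as a plain feature extractor. The paper does not define the features precisely, so several choices below are assumptions: a "special character" is taken to be any character that is neither alphanumeric nor whitespace, the length of the review is treated as the number of characters, and `STOP_WORDS` is a tiny hypothetical list rather than the one the authors used.

```python
# Hypothetical, deliberately small Italian stop-word list.
STOP_WORDS = {"il", "la", "di", "e", "un", "una", "che", "per"}


def review_features(text):
    words = text.split()
    n_chars = len(text)
    n_spaces = text.count(" ")
    # Assumption: special = neither alphanumeric nor whitespace.
    n_special = sum(1 for c in text if not c.isalnum() and not c.isspace())
    # Strip trailing punctuation before the stop-word lookup.
    n_stop = sum(1 for w in words if w.lower().strip(".,!?;:") in STOP_WORDS)
    return {
        "n_chars": n_chars,
        "pct_special": n_special / n_chars if n_chars else 0.0,
        "n_exclamations": text.count("!"),
        "n_questions": text.count("?"),
        "n_words": len(words),
        "n_spaces": n_spaces,
        "n_stop_words": n_stop,
        "words_per_stop_word": len(words) / n_stop if n_stop else 0.0,
        "words_per_space": len(words) / n_spaces if n_spaces else 0.0,
    }


features = review_features("La colazione era ottima! E il personale?")
for name, value in features.items():
    print(name, value)
```

In the paper these values are joined to the word-level and character-level n-gram vector of the review before training; that step is left out here because the vectorizer settings are not reported.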