Neural-network Method for Determining Text Author's Sentiment to an Aspect Specified by the Named Entity

Aleksandr Naumov a,b, Roman Rybka a,b, Alexander Sboev a,b, Anton Selivanov a,b and Artem Gryaznov a,b
a National Research Centre "Kurchatov Institute", Moscow, Russia
b MEPhI National Research Nuclear University, Moscow, Russia

Abstract
This study presents an approach to aspect-based sentiment analysis in which a named entity of a certain category is considered as an aspect. Such a task formulation is novel and opens up the opportunity to determine writers' attitudes to the organizations and people mentioned in texts. The task required a dataset of Russian-language sentences in which sentiment with respect to certain named entities is labeled; we collected such a dataset using a crowdsourcing platform. Sentiment determination is based on a deep neural network with an attention mechanism and the ELMo language model for word vector representation. The proposed model is validated on available data for a similar task. The resulting performance (by the F1-micro metric) on the collected dataset is 0.72, which is the new state of the art for the Russian language.

Keywords
text analysis, natural language processing, aspect-based sentiment analysis, neural networks

1. Introduction
A relevant part of social monitoring is determining the sentiment of a text so as to identify its attitude to significant social events (aspects). Frequently, even one sentence contains several sentiment evaluations concerning various aspects. For example, in the sentence "Alex is an excellent worker, but the company Foo LLC, in which he works, poorly manages its staff", two named entities are mentioned with different sentiments: the entity "Alex", of the category "Person", is used with a positive sentiment, and the entity "Foo LLC", of the category "Organization", with a negative one.
This research proposes an approach to aspect-based sentiment determination in text with a named entity pre-assigned as an aspect. Such a task formulation is novel and opens up the opportunity to determine authors' attitudes to organizations and people considered in texts, which could be useful for social and political analysis. Several datasets in different languages are available for the aspect-based sentiment analysis task, including the SemEval 2015 competition dataset [1], containing 830 reviews on three topics (laptops, restaurants, hotels) in English, and SemEval 2016 [2], an extension of the previous one, containing more than 47000 sentences from reviews on seven topics (restaurants, laptops, mobile phones, telecommunications, digital cameras, hotels, museums) in eight languages, plus about 23000 more texts in six languages on three topics. The existing datasets in Russian are the ones from the SentiRuEval competitions of 2015 [3] and 2016 [4]. The first contains 822 reviews of cars and restaurants along with Twitter messages (23600 "tweets"). Overall, existing datasets are collected for specific topics and include aspect sets for particular domains. This research is based on a specially created dataset containing sentences with aspect-based sentiment labels.

Russian Advances in Artificial Intelligence: selected contributions to the Russian Conference on Artificial Intelligence (RCAI 2020), October 10-16, 2020, Moscow, Russia
Naumov-AV@nrcki.ru (A. Naumov); Rybka_RB@nrcki.ru (R. Rybka); Sboev_AG@nrcki.ru (A. Sboev); Selivanov_AA@nrcki.ru (A. Selivanov); Gryaznov_AV@nrcki.ru (A. Gryaznov)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
Crowdsourcing was used to extend the number of annotators, and their markup was validated with a special procedure to ensure annotation quality (see sec. 2). Analysis of related works of the last few years [5, 6, 7] shows that algorithms based on complex deep-learning topologies (including convolutional and recurrent layers and the attention mechanism) have a significant advantage over methods based on dictionaries, rules, and traditional machine learning. Therefore, this research uses neural-network components in the solution development (see sec. 3). Results are visualized with Sankey diagrams, which allow comparing the distribution of sentiment classes over different named entities and text sources (see sec. 5).

2. Dataset
2.1. Annotating Process
There are currently no Russian datasets for developing tools that solve aspect-analysis problems in which the aspect is a named entity, so one has been gathered and annotated. To form the corpus, we collected sentences in Russian from several sources: posts of the LiveJournal social network [8], texts of the online news agency Lenta.ru (https://github.com/yutkin/Lenta.Ru-News-Dataset), and Twitter microblog posts [9]. A crowdsourcing platform (https://toloka.yandex.ru/) was used to annotate the sentences. Only Russian-speaking users over 18 years old, belonging to the top 30% of all active users of the platform by internal rating, were allowed into the annotation process. Before a platform user became an annotator, they underwent a training task, after which they had to mark 25 test samples with more than 80% agreement with the annotation that we had performed ourselves. Upon successful completion of the training task, the user was allowed to complete the main tasks, each consisting of 10 sentences for annotation, one of which was a control sentence that we had labeled. For this additional control we labeled 200 sentences.
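The qualification step above reduces to a single agreement check against our reference annotation. The sketch below is illustrative: the function name, the data layout, and the threshold handling are our assumptions, not the crowdsourcing platform's actual API.

```python
def passes_qualification(user_labels, reference_labels, threshold=0.8):
    """Admit an annotator only if their labels on the 25 test samples
    agree with the reference annotation in more than 80% of cases."""
    assert len(user_labels) == len(reference_labels) == 25
    matches = sum(u == r for u, r in zip(user_labels, reference_labels))
    return matches / len(reference_labels) > threshold
```

Note the strict inequality: exactly 80% agreement (20 of 25) does not qualify under "more than 80%".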
If the accuracy of an annotator during the annotation process dropped below 70% (including test and control samples), or if fewer than 66% of the last six control samples were answered correctly, the annotator was blocked. Checks were also performed on the number of consecutive identical labels and on the time spent annotating a task. If a task was annotated too quickly (in less than 30 seconds) or contained long runs of identical labels (more than eight), it was checked manually and removed from the sample if unfair labels were detected. Thus, each sentence was annotated at least three times.

2.2. Aspect-Based Sentiment Annotation
Sentences do not always contain sentiment estimations and named entities, so a preliminary selection of sentences for the subsequent annotation process was carried out. The selection criterion is the presence in the sentence of a named entity and of at least one word from the sentiment word list. Named entities were extracted using a neural-network model from the DeepPavlov library (http://deeppavlov.ai/), a state-of-the-art solution for the Russian language with an accuracy of 98.1% (F1-score x 100%) obtained on the Collection3 dataset [10]. For filtering sentences, we formed a list of sentiment words based on dictionaries of opinionated words from the domain-oriented Russian sentiment vocabulary RuSentiLex [11]. About two thousand words were manually selected for positive sentiment, including "joy", "pleasure", "cheerful", etc., and about six thousand for negative sentiment, including "enmity", "ailment", "grieve", etc. The annotators were asked to determine with what sentiment the author uses the named entity in the selected sentences (the sentiment classes were "Positive", "Neutral" and "Negative"). A sentence could not be marked with multiple tags.
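The pre-selection criterion described above can be expressed as a short filter. Tokenization and named-entity extraction (DeepPavlov in the paper) are assumed to be done upstream; the function and argument names are illustrative.

```python
def preselect(tokens, named_entities, positive_words, negative_words):
    """Keep a sentence for annotation only if it mentions at least one
    named entity and contains at least one word from the sentiment lists.
    positive_words / negative_words are sets of lowercase lexicon entries."""
    if not named_entities:
        return False
    sentiment_words = positive_words | negative_words
    return any(t.lower() in sentiment_words for t in tokens)
```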
If the annotator was unable to unambiguously determine the sentiment class of the selected aspect, the example was marked with the label "I find it difficult to determine" and, in the absence of other annotations, was not included in the resulting dataset. If a selected aspect had been erroneously identified as a named entity, the example was marked as "Wrong aspect" and was likewise excluded from the final dataset. The final label for a sentence was selected by aggregating the annotators' labels by majority voting.

2.3. Summary of the Dataset
The aspect-based sentiment dataset contains 5552 unique sentences (1992 from Twitter, 2050 from the news site Lenta.ru, 1500 from the blog platform LiveJournal). The resulting number of sentences for every sentiment label, as well as the number of unique named entities, is presented in Table 1.

Table 1. Summary of the dataset.

          Number of sentences per label    Number of unique named entities
          Positive   Neutral   Negative    Person   Organisations   Total
Twitter   977        510       510         1432     275             1818
Lenta     478        1653      472         1244     573             1817
LJ        834        905       366         1307     285             1592
Total     2289       3068      1348        3761     1068            4829

The number of unique entities was counted based on normalized word forms, without spaces and punctuation. Agreement was calculated as the ratio of the number of answers for the selected sentiment label to the number of all answers, averaged over all entities; the agreement value was 0.84. The datasets most similar to the collected one, in terms of aspect-based sentiment annotation for the Russian language, are those of the SentiRuEval 2015-2016 competitions. However, the 2015 dataset contains 822 reviews (17000 particular entities) on two pre-defined topics (restaurants and cars) labeled with four sentiment classes (positive, negative, neutral, mixed). The 2016 dataset is more representative (approx.
23600 labeled entities); it contains labeling for a pre-defined aspect list, whose items may not be present in the sentence text. Therefore, the collected dataset is a significant extension of the data available for sentiment analysis of Russian-language texts.

3. Method for Aspect-Based Sentiment Analysis
The proposed method is based on a deep neural network with interactive attention (IAN) [7], which solves a classification task. The architecture of the model consists of two parts: one processes the context for the target aspect, the other processes the words of the aspect itself. In our model, the context (1) is all the words of the sentence containing a named entity, and the aspect (2) is the words belonging to the named entity for which sentiment is determined:

context = [w_1, w_2, ..., w_M],   (1)
aspect = [w_i, ..., w_k],   (2)

where M is the number of words in the sentence, and i and k are the indices of the start and the end of the named entity, respectively. At the first step, the sentence words are vectorized using the bi-directional language model ELMo (http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#elmo) [12], so that the representation of a word is the concatenation of representations from the hidden layers of the bidirectional language model. Then, the vectors corresponding to the words of the aspect, [w_i^elmo, ..., w_k^elmo], and of the context, [w_1^elmo, w_2^elmo, ..., w_M^elmo], are selected. The resulting word embeddings of the aspect and context are fed into a recurrent neural network based on LSTM layers [13] to extract their internal states ("aspect representation" and "context representation", respectively). After that, their average vectors are used to generate attention vectors. Next, the internal representations of the aspect and context are combined into the "Final Representation" vector, and the resulting vector is fed into a fully connected layer with the softmax activation function.
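The core of the attention step just described (the average of one sequence's hidden states attending over the other, then concatenation into the final representation) can be sketched in a few lines of numpy. This is an illustration of the idea only: it omits the ELMo and LSTM stages and uses plain dot-product scoring rather than the exact IAN attention parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interactive_attention(context_h, aspect_h):
    """context_h: (M, d) hidden states for the sentence words;
    aspect_h: (k - i + 1, d) hidden states for the named-entity words."""
    ctx_avg = context_h.mean(axis=0)   # average context vector
    asp_avg = aspect_h.mean(axis=0)    # average aspect vector
    # attention over context words, guided by the average aspect vector
    ctx_repr = softmax(context_h @ asp_avg) @ context_h
    # attention over aspect words, guided by the average context vector
    asp_repr = softmax(aspect_h @ ctx_avg) @ aspect_h
    # "Final Representation", fed to the softmax classification layer
    return np.concatenate([ctx_repr, asp_repr])
```

The two sequences thus influence each other's pooled representations, which is what distinguishes IAN from attending over the context alone.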
Such an implementation of the attention mechanism allows the target aspect and the context to influence the formation of each other's internal representations interactively. The scheme of the proposed model architecture is presented in Fig. 1. We evaluate the proposed model in terms of F1-macro and F1-micro scores (see section 4.1).

Figure 1: Overview of the architecture of the proposed model based on IAN.

4. Experiments
4.1. Metrics
To evaluate the performance of our models, we use the F1-measure as the evaluation score, as in the SentiRuEval 2015-2016 competitions:

Precision = TP / (TP + FP),   (3)
Recall = TP / (TP + FN),   (4)
F1-measure = 2 * (Precision * Recall) / (Precision + Recall),   (5)

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives. The score is calculated in two variations: for macro-averaging (F1-macro), the F1-measure is computed for each class separately and then averaged; for micro-averaging (F1-micro), it is computed over all examples together.

4.2. Experimental Results
In the experiments to determine the sentiment for a specified named entity, the deep neural network method proposed in this paper is first validated on the original SentiRuEval-2015 competition dataset, and then trained and tested on the dataset collected in the current work. The SentiRuEval-2015 competition dataset was originally split by the competition organizers into train and test sets containing 5974 and 6615 samples, respectively. Model training was based on examples of three classes: positive, negative, and mixed.
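The two averaging modes of Eqs. (3)-(5) can be computed as follows; this is a generic sketch of micro- vs. macro-averaging, not the competitions' official scorer.

```python
def f1_micro_macro(y_true, y_pred, classes):
    """F1 per Eqs. (3)-(5): macro averages per-class F1 scores,
    micro pools TP/FP/FN over all classes before computing F1."""
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class_f1.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    prec = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    rec = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    micro = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro = sum(per_class_f1) / len(classes)
    return micro, macro
```

In the single-label multiclass setting, every error contributes one FP and one FN, so micro-F1 coincides with accuracy, while macro-F1 weights rare classes (here, "negative") as heavily as frequent ones.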
The best performance was achieved with the following hyperparameters: batch size 4; dropout 0.3 (a dropout layer is added right before the recurrent LSTM layer); 150 neurons in the LSTM layers; learning rate 0.01; l2-regularization 0.001; cross-entropy loss function [14]. Table 2 presents the model performance in terms of F1-micro and F1-macro scores.

Table 2. Model performance on the SentiRuEval-2015 dataset.

                                   Automobiles           Restaurants
Model                              F1-micro   F1-macro   F1-micro   F1-macro
SentiRuEval-2015 - baseline [3]    0.62       0.26       0.71       0.27
SentiRuEval-2015 - best [15]       0.74       0.57       0.82       0.55
Our approach                       0.79       0.6        0.85       0.58

As a result, the method proposed in this work shows better accuracy than the other solutions from the SentiRuEval-2015 competition [3], where the best result was achieved by a gradient boosting model [15]. In that work, the model was given a feature vector formed for each aspect using an emotional lexicon compiled under rather complex rules; these lexicons were formed separately for each domain area of a particular dataset, using rules written manually by an expert. The dataset collected in the current study was split into training and testing sets of 80% and 20% of the samples, respectively. The training set included 1081 named entities with negative sentiment, 1829 entities with positive sentiment, and 2454 with neutral. The testing set included 460 positive entities, 267 negative entities, and 614 neutral entities. To evaluate the obtained model results, experiments were conducted with baseline methods that solve the usual problem of classifying sentences without paying any attention to the extracted named entity. These methods are built both on rules using a sentiment vocabulary and on a classifier analyzing the entire sentence. To train such a classifier and select its hyperparameters, the AutoML approach based on the TPOT library is used [16].
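A rule-based baseline of the kind just mentioned admits a direct implementation. Tokenization and the positive/negative word lists (RuSentiLex-derived in the paper) are assumed given; the function name is illustrative.

```python
def lexicon_baseline(tokens, positive_words, negative_words):
    """Label a sentence by the dictionary contributing the most words.
    Ties go to "positive" (the corpus's most representative sentiment
    among the two lists); no sentiment words at all means "neutral"."""
    pos = sum(t.lower() in positive_words for t in tokens)
    neg = sum(t.lower() in negative_words for t in tokens)
    if pos == 0 and neg == 0:
        return "neutral"
    return "positive" if pos >= neg else "negative"
```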
Thus, the following baselines are used for comparison:
1. Random: a random label is assigned to each aspect;
2. Lexicon: a classifier based on the positive and negative sentiment word lists used in Section 2 for pre-selecting sentences. The sentence is given the sentiment label of the dictionary from which the largest number of words is present in the sentence. If the numbers of words from the dictionaries of different sentiments are equal, the label of the most representative sentiment of the corpus is assigned, i.e., "positive" in this case. If the sentence contains no words from the sentiment lists, its sentiment is considered neutral;
3. TPOT (ELMo): a classifier built with the TPOT software library. The average vectors of the aspect words of the analyzed sentence, obtained from the ELMo model, are used as input features. The type of classifier and its parameters were selected automatically by the TPOT library.

The performance of the model on the proposed dataset, in comparison with the baseline methods, is presented in Table 3.

Table 3. Model performance on our dataset in terms of F1 scores.

              Twitter          LJ               Lenta.ru         All
Model         micro   macro    micro   macro    micro   macro    micro   macro
Random        0.35    0.27     0.27    0.22     0.21    0.2      0.27    0.23
Lexicon       0.3     0.27     0.33    0.27     0.49    0.36     0.38    0.31
TPOT (ELMo)   0.63    0.58     0.57    0.56     0.74    0.7      0.65    0.56
Our approach  0.64    0.58     0.67    0.66     0.78    0.72     0.72    0.7

Besides, we have analyzed the ability of the model to classify aspects that are not present in the training set. The average performance across all sources did not change, which confirms the effectiveness of the proposed approach when working with unseen named entities.

5. Results Visualization
In this section, we present an example of visualization of the results of aspect-based sentiment analysis.
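Such a visualization reduces to aggregating (entity, sentiment) mention counts into weighted links between entity nodes and sentiment nodes. The helper below builds the parallel label/link lists that, for example, plotly's `go.Sankey` accepts; the record format and names are our assumptions, not the paper's actual pipeline.

```python
from collections import Counter

def sankey_links(mentions):
    """mentions: iterable of (entity, sentiment) pairs, one per mention.
    Returns node labels and (source, target, value) link triples."""
    counts = Counter(mentions)
    entities = sorted({e for e, _ in counts})
    sentiments = sorted({s for _, s in counts})
    labels = entities + sentiments          # entity nodes, then sentiment nodes
    index = {name: i for i, name in enumerate(labels)}
    links = [(index[e], index[s], n) for (e, s), n in sorted(counts.items())]
    return labels, links
```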
Experiments were conducted using a Russian text corpus of LiveJournal posts and the SCTM-ru dataset [17], compiled from articles of the Russian Wikinews website. For analysis, 47 news texts and 40 blog texts on the topic "cinema, Oscar" were selected; these texts contain keywords such as "film", "role", "cinema", "Oscar", etc. The results are visualized as Sankey diagrams, which show the frequency with which named entities occur in contexts of different sentiment (see Fig. 2). Figure 2 shows that the authors of news articles take a more positive point of view in their publications, while the authors of LiveJournal posts more often create a negative context.

6. Future Work
The main issues requiring further research are:
1. Reproducibility of the results for other languages. This task is complicated by the absence of labeled datasets in which named entities are considered as aspects;
2. Evaluation and use of modern language models (for example, BERT [18]), as well as other implementations of attention mechanisms;
3. Further development of the collected dataset, both by increasing the number of examples and by expanding the sources and domain areas, which will make it possible to better assess the universality of the developed method;
4. Establishing identity between different spellings of the same entities, to more accurately determine the integral assessment of their sentiment.

Figure 2: The frequency of mentions of named entities in negative and positive contexts for blog texts (left) and news (right).

7. Conclusion
The paper presents a deep-neural-network-based method for aspect-based sentiment analysis of Russian textual data, where the aspect is expressed by a named entity (organization or person). To solve the problem, a dataset of annotated sentences from several sources (blogs, microblogs, and news) was collected.
The collected dataset is available to researchers upon request via the https://sagteam.ru/en website. The developed crowdsourcing-based method for building a dataset can be used to extend the dataset size and improve the performance of the proposed classifier; it can also be applied in other domain areas to create labeled examples. Evaluation of the model both on the open dataset of the SentiRuEval-2015 competition and on the collected annotated corpus shows the efficiency of the developed solution. The resulting performance is a baseline for this type of task in Russian and allows one to perform aspect-based analysis with clear visualization of the results, an example of which is presented in the paper.

Acknowledgments
The reported study was funded by an internal grant of the NRC "Kurchatov Institute" (Order No. 1359) and was carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC "Kurchatov Institute", http://ckp.nrcki.ru/.

References
[1] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, I. Androutsopoulos, SemEval-2015 task 12: Aspect based sentiment analysis, in: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), 2015, pp. 486–495.
[2] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. Al-Smadi, M. Al-Ayyoub, Y. Zhao, B. Qin, O. De Clercq, et al., SemEval-2016 task 5: Aspect based sentiment analysis, in: 10th International Workshop on Semantic Evaluation (SemEval 2016), 2016.
[3] N. Loukachevitch, P. Blinov, E. Kotelnikov, Y. Rubtsova, V. Ivanov, E. Tutubalina, SentiRuEval: testing object-oriented sentiment analysis systems in Russian, in: Proceedings of International Conference Dialog, volume 2, 2015, pp. 3–13.
[4] N. Lukashevich, Y. V.
Rubtsova, SentiRuEval-2016: overcoming time gap and data sparsity in tweet sentiment analysis, in: Komp'yuternaya lingvistika i intellektual'nyye tekhnologii, 2016, pp. 416–426.
[5] B. Huang, K. M. Carley, Parameterized convolutional neural networks for aspect level sentiment classification, arXiv preprint arXiv:1909.06276 (2019).
[6] P. Chen, Z. Sun, L. Bing, W. Yang, Recurrent attention network on memory for aspect sentiment analysis, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 452–461.
[7] D. Ma, S. Li, X. Zhang, H. Wang, Interactive attention networks for aspect-level sentiment classification, arXiv preprint arXiv:1709.00893 (2017).
[8] Rusprofiling Lab, RusProfiling corpus of Russian texts, 2017. http://rusprofilinglab.ru/rusprofiling-atpan/corpus/.
[9] Y. Rubtsova, Automatic construction and analysis of a corpus of short texts (microblog posts) for the task of developing and training a sentiment classifier [in Russian], Inzheneriya znanij i tekhnologii semanticheskogo veba 1 (2012) 109–116.
[10] V. Mozharova, N. Loukachevitch, Two-stage approach in Russian named entity recognition, in: 2016 International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT), IEEE, 2016, pp. 1–6.
[11] N. Loukachevitch, A. Levchik, Creating a general Russian sentiment lexicon, in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), 2016, pp. 1171–1176.
[12] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, arXiv preprint arXiv:1802.05365 (2018).
[13] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[14] R. Rubinstein, The cross-entropy method for combinatorial and continuous optimization, Methodology and Computing in Applied Probability 1 (1999) 127–190.
[15] J.
Trofimovich, Comparison of neural network architectures for sentiment analysis of Russian tweets, in: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference Dialogue, 2016, pp. 50–59.
[16] T. T. Le, W. Fu, J. H. Moore, Scaling tree-based automated machine learning to biomedical big data with a feature set selector, Bioinformatics 36 (2020) 250–256.
[17] S. Karpovich, The Russian language text corpus for testing algorithms of topic model, Intellektual'nyye tekhnologii na transporte (2018).
[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).