Extracting Sentiments towards COVID-19 Aspects

Eduard Nugamanov, Natalia Loukachevitch, and Boris Dobrov

Lomonosov Moscow State University, Leninskie Gory, 1, Moscow, Russia
ed.nugamanov@gmail.com

Abstract. In this paper, we introduce a specialized Russian dataset and study approaches for aspect-based sentiment analysis of Russian users' comments about COVID-19. We solve two tasks, namely Relevance Determination (RD), which aims to predict whether a sentence is relevant to an aspect of the pandemic, and Sentiment Classification (SC), which classifies the sentiment expressed towards an aspect in a sentence. We applied and tested various machine learning methods, including fine-tuning of the pre-trained RuBERT model. The best results in both tasks were obtained by the RuBERT model in the Natural Language Inference (NLI) formulation.

Keywords: Aspect-based sentiment analysis · BERT model · natural language inference.

1 Introduction

COVID-19 is a dangerous infectious disease caused by the SARS-CoV-2 virus. This infection has been declared a pandemic and is one of the main threats to humanity, endangering both the physical and mental health of people. COVID-related issues are widely discussed in social media. Such discussions offer great opportunities for psychologists and social scientists to study information dissemination in social networks and the influence of various sources on the formation of users' opinions [2, 1].

Extracting opinions related to the coronavirus can be considered an aspect-based sentiment analysis (ABSA) task [17], which allows identifying sentiment towards specific issues of the coronavirus epidemic. The ABSA task, intended for the extraction of sentiment towards specific aspects of an entity or a topic, has mainly been studied on users' reviews, for example, restaurant reviews with aspects such as food or service. Coronavirus-oriented discussions pose essentially the same ABSA task, and aspect-based sentiment analysis applied to COVID-related messages is one of the means to reveal the most frustrating aspects of the pandemic.

In this paper, we introduce a Russian dataset and an approach to aspect-based sentiment analysis of Russian users' comments about COVID-19. The dataset is large enough (about 10 thousand messages) to train modern machine learning methods to classify the flow of user opinions on the above and similar issues. We could not find a similar dataset in current publications: for the Russian language, there is no other manually annotated dataset of user messages related to the issues of the coronavirus infection.

We solve two tasks, namely Relevance Determination (RD), which aims to predict whether a sentence is relevant to an aspect of the pandemic, and Sentiment Classification (SC), which classifies the sentiment expressed towards an aspect in a sentence. We applied and tested various machine learning methods, including fine-tuning of the pre-trained Russian BERT model, RuBERT [12]. In addition, we reformulated the original tasks as Natural Language Inference (NLI) and Question Answering (QA) problems [23] and applied RuBERT to them, which led to a significant increase in classification quality.

2 Related Work

2.1 COVID-related Sentiment Analysis

During the last year, a lot of work has been devoted to users' posts concerning COVID-19.
In [21], the authors examined the propagation of misinformation, conducted sentiment analysis, and determined the main topics of discussion in a collection of tweets about the COVID-19 pandemic. The paper [3] studies people's reaction to the lockdown in India using Twitter. The authors of [20] use clustering and sentiment analysis to categorize tweets about masks.

Among the methods used for sentiment analysis of COVID-related texts, general sentiment analysis based on existing general-purpose classifiers prevails [1, 2, 4, 5]. The most commonly used systems are VADER [9] and TextBlob [13]. However, the authors of [8] showed that, for three-class classification of users' sentiment towards vaccines, the general-purpose VADER system achieves an accuracy of only about 0.51, and the TextBlob system performs slightly better at about 0.53. These low results can be explained by the fact that the above-mentioned systems were built and trained without taking into account the specifics of the COVID-19 topic.

There are very few new specialized datasets manually annotated with respect to the coronavirus or related aspects. The authors of [7] previously annotated a dataset of tweets about the attitudes of social media users towards influenza vaccines, and this set (the FVD dataset) is currently being used to study users' attitudes towards coronavirus vaccination. Hussain et al. [8] collected comments of Facebook and Twitter users regarding coronavirus vaccination and created the tagged UKCOVID dataset. They propose a combined approach using the VADER and TextBlob systems and retrain the BERT neural network model.

For the Russian language, 3903 tweets were extracted in [30], and a general sentiment analysis was carried out based on the Dostoevsky model (https://github.com/bureaucratic-labs/dostoevsky). It is important to note that the Dostoevsky model is trained on the RuSentiment dataset of VKontakte posts about Ukrainian-Russian relations [19] and does not concern medical topics. The work [15] examines the attitudes of physicians to the problems of the coronavirus epidemic on specialized medical forums in Russian.

2.2 Aspect-Based Sentiment Analysis Task

ABSA determines the sentiment expressed towards some aspect of an entity in a text. Typically, one aspect can be represented by several terms or may not be expressed in a text at all. Early approaches to the ABSA task relied on extensive feature engineering. For example, in [11] a sentiment score computed on a large unlabeled corpus of reviews is assigned to every word and used as an input to a Support Vector Machine classifier along with other textual and syntactic features.

Neural networks allowed researchers to avoid manual feature engineering. The first models were based on the LSTM architecture and the attention mechanism. For instance, TD-LSTM [24] uses two LSTM networks to model the left and right contexts of an aspect term. ATAE-LSTM [27] creates a representation of an aspect term to use in the attention mechanism along with the other tokens.

The introduction of transformers [25] improved the results further. Their basic building block, the Multi-Head Self-Attention (MHSA) layer, became a popular choice for extracting relations between tokens in texts. For example, AEN [22] uses MHSA layers to model both a context and an aspect term within the context.

The latest innovation in NLP is the utilization of pre-trained language models, such as ELMo [16] and BERT [6].
The latter is a bidirectional encoder based on the transformer architecture. It forms powerful context-aware representations of tokens that can be used as input to other architectures. BERT can also be fine-tuned by adding task-specific layers on top. For example, the LCF-ATEPC model [28] uses MHSA blocks on top of the BERT encodings to extract and classify target terms simultaneously. The SDGCN [29] architecture feeds BERT representations into a BiLSTM network with an attention mechanism, which models relations between a sentence and each of its targets, and applies graph convolutional networks to model relations between different targets.

One of the most important problems facing researchers is the lack of labeled data. There are different approaches to this problem. For instance, BERT-ADA [18] performs domain adaptation by pre-training BERT on unlabeled data. The BAT model [10] generates additional adversarial examples while training. The Snippext system [14] uses BERT for a variety of tasks: extraction and verification of (target, opinion) pairs from a text, sentiment classification, and determination of the target's aspect. The authors utilized such techniques as data augmentation and semi-supervised learning. Moreover, to perform effective and reliable augmentation, they adapted the MixUp [26] operation from computer vision.

3 COVID Aspect Sentiment Dataset

3.1 Dataset Annotation

For the dataset, users' comments on COVID-19-related news articles were collected from the VKontakte social network. We selected masks, quarantine (lockdown), and vaccines as aspects for sentiment annotation and extracted relevant comments using corresponding keywords. In addition, sentiment attitudes towards government measures were annotated for all selected comments. This government aspect is especially difficult for automatic analysis because mentions of the government can be implicit, as in the following sentence:

– In Germany, a permanent mask, etc. regime, shots from Russia are very surprising, when nothing is observed at all.

The total number of sentences is 10968. Each sentence was labeled by several experts (three on average). An annotator had to indicate the sentiment a sentence expresses towards each of the four above-mentioned aspects (or indicate that the sentence is not relevant to the aspect). The annotators' group included professional linguists and psychologists. We consider six types of sentiment labels, namely:

– irrelevant;
– positive;
– negative;
– neutral: this label is used for factual sentences without any visible sentiments;
– both positive and negative: for this label, evident positive and negative attitudes should both be seen in a message;
– relevant, but impossible to determine: in this case, the presence of a sentiment attitude is evident, but the context of the sentence does not make it possible to determine its polarity.

A sentence is considered relevant to an aspect if at least two annotators marked it as relevant. Sentences collected using keywords can also be irrelevant: for example, a sentence mentioning Elon Musk ("Mask" in Russian spelling) is not relevant to the mask aspect.
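A minimal sketch of such keyword-based pre-filtering (the keyword stems below are illustrative placeholders, not the exact lists used for the dataset):

```python
ASPECT_KEYWORD_STEMS = {
    # Illustrative Russian stems; the real lists are not reproduced here.
    "masks": ["маск"],                     # also matches "Маск" (Musk)
    "quarantine": ["карантин", "локдаун"],
    "vaccines": ["вакцин", "привив"],
}

def candidate_aspects(sentence: str) -> list:
    """Return aspects whose keyword stems occur in the sentence."""
    text = sentence.lower()
    return [aspect for aspect, stems in ASPECT_KEYWORD_STEMS.items()
            if any(stem in text for stem in stems)]

# Keyword matching over-generates: a comment about Elon Musk is caught by
# the "маск" stem, and such false positives are filtered out by annotators.
print(candidate_aspects("Илон Маск запустил ракету"))  # ['masks'], irrelevant
```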
Multiple annotations for a relevant sentence are translated to three sentiment classes, positive, negative, and other (comprising neutral, contradictory, and unclear cases), using the following rules:

– a sentence receives the positive label if the number of positive annotations is greater than the number of all other annotations for this sentence;
– a sentence receives the negative label if the number of negative annotations is greater than the number of all other annotations for this sentence;
– otherwise, the sentence is assigned to the other category.

For example, the sentence "the mask allows you not to maintain health, but to save your family budget" received three different labels from annotators: positive, negative, and impossible to determine. In fact, this sentence needs more context to determine its sentiment towards masks precisely; the attitude depends on interpretation. According to the rules described above, its resulting sentiment category is other.
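A minimal sketch of these aggregation rules (the function and label names are ours, for illustration only):

```python
from collections import Counter

def aggregate(annotations):
    """Collapse annotators' labels for a relevant sentence into
    positive / negative / other, following the majority rules above."""
    counts = Counter(annotations)
    n = len(annotations)
    if counts["positive"] > n - counts["positive"]:   # strict majority positive
        return "positive"
    if counts["negative"] > n - counts["negative"]:   # strict majority negative
        return "negative"
    return "other"  # neutral, contradictory, and unclear cases

# The example from the text: positive, negative, impossible to determine.
print(aggregate(["positive", "negative", "impossible to determine"]))  # other
```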
Table 1 provides the sizes of the resulting categories for each aspect. It can be seen that the attitudes to masks and quarantine are more often positive than negative, while the attitudes to government actions are mainly negative.

Table 1. Sizes of sentiment categories

Aspect       Relevant   Negative     Positive     Other
Masks        5097       861 (17%)    1011 (20%)   3225 (63%)
Vaccines     2604       601 (23%)    538 (21%)    1465 (56%)
Quarantine   3515       244 (7%)     868 (25%)    2403 (68%)
Government   1585       1027 (65%)   54 (3%)      504 (32%)

3.2 Analysing Annotators' Disagreement

The most significant disagreement between annotators concerns assigning positive and negative scores to the same user's post. We found the following main cases of positive-negative disagreement between annotators.

First case. The author of a comment describes another person's opinion, and the author's disagreement with this opinion can be seen. In such cases, some annotators assess the sentiment of the author; others give the label "positive and negative" (because two opinions are present) or "impossible to determine"; yet others select the described position because it takes up most of the sentence. For example (all examples are translated from Russian):

– My father is so ... He endlessly repeats that masks, like vaccination, are a way of enslavement and he has an eternal "they are watching us" in his mind, I endlessly tease him, they say, be careful.
– But, just they think, since they are already sick, they no longer need a mask, there is nothing to defend against and they sneeze at everyone.

Second case. The author tries to offend another participant of the dialogue using the aspect words:

– well, nothing, nothing, someday for people like you they will definitely come up with vaccinations - from stupidity.

Here one annotator considered this comment irrelevant to vaccines, while the other two annotators provided contradictory labels (positive and negative).

Third case. A comment describes some violations of mask or quarantine regimes. Some annotators consider such sentences factual and neutral; other annotators try to infer a positive or negative position. For example:

– Because few tourists comply with the quarantine measures.

Occasional annotation slips, which are difficult to explain, may also occur. Because of all the above-mentioned problems, we try to obtain at least three annotations for each comment.

4 Architecture and Methods for COVID Aspects Analysis

In the scope of this work, we use the RuBERT-conversational language model as a powerful feature extractor for classification. RuBERT-conversational is a BERT language model pre-trained on a large number of Russian tweets by the DeepPavlov project (https://huggingface.co/DeepPavlov/rubert-base-cased-conversational). It fits our needs well because it was tuned on spoken and informal language data.

As in the original BERT model, the input sequence is either one or two sentences framed with special tokens:

[CLS], A_1, ..., A_m, [SEP], B_1, ..., B_n, [SEP],

where A_1, ..., A_m are the tokens of the first sentence, B_1, ..., B_n are the tokens of the second sentence, [SEP] is a special separating token, and [CLS] is a special token that represents the whole input sequence in classification tasks. As output, BERT returns hidden representations of every input token. Furthermore, the representation of the [CLS] token is passed through a fully connected layer, pre-trained for the Next Sentence Prediction objective, with the tanh activation function.

For the relevance determination and sentiment classification tasks, we added two fully connected layers, containing 256 and K (the number of classes) outputs respectively, on top of the final representation of [CLS]. Each of these layers is preceded by a dropout layer with a rate of 0.5, and the first layer is followed by the ReLU activation function:

H_1^d = dropout(0.5)(H_[CLS]);
H_2 = ReLU(W_1 H_1^d + b_1);
H_2^d = dropout(0.5)(H_2);
output = W_2 H_2^d + b_2;

where H_[CLS] ∈ [−1, 1]^768 is the embedding vector of [CLS]; W_1 ∈ R^(256×768), b_1 ∈ R^256, W_2 ∈ R^(K×256), and b_2 ∈ R^K are the trainable parameters of the layers; and K is the number of outputs, equal to the number of classes in a task.
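A minimal PyTorch sketch of this classification head (our paraphrase of the equations above, not the authors' implementation; the module and variable names are ours):

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Two fully connected layers with dropout on top of the pooled [CLS]
    representation, mirroring the equations above."""
    def __init__(self, hidden_size: int = 768, num_classes: int = 2):
        super().__init__()
        self.dropout1 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(hidden_size, 256)   # W_1, b_1
        self.dropout2 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(256, num_classes)   # W_2, b_2

    def forward(self, h_cls: torch.Tensor) -> torch.Tensor:
        h = self.dropout1(h_cls)                 # H_1^d
        h = torch.relu(self.fc1(h))              # H_2
        h = self.dropout2(h)                     # H_2^d
        return self.fc2(h)                       # output logits

head = ClassificationHead(num_classes=3)         # e.g. positive/negative/other
logits = head(torch.randn(4, 768))               # batch of 4 pooled [CLS] vectors
```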
We formulate and solve the original classification tasks in different ways. First of all, we trained separate classifiers for each aspect. In that case, a document is the input to a classifier, and the output is either the document's relevance (0 or 1) or its sentiment (positive, negative, or other) with respect to the considered aspect; for sentiment classification, the input document must be relevant to the aspect.

Secondly, the relevance determination problem was also posed as a Natural Language Inference (NLI) problem. In that case, a single classifier operates with all the given aspects and can learn relevance relations for new aspects when new data arrives. The input of the classifier is a pair (s, h) of a sentence and an affirmative hypothesis about its relevance to an aspect, and the output is whether h is true (0 or 1). For example, h can state "Is relevant to masks" or "Is relevant to vaccines".

Thirdly, we formulate the sentiment classification problem as an NLI problem as well. In that case, for each pair (s, h) of a sentence and an affirmative hypothesis about its sentiment towards a relevant aspect, the classifier is trained to predict whether h holds (0 or 1). Here, h may be "Is positive to masks" or "Is negative to quarantine".

Finally, the sentiment classification problem was stated as a Question Answering (QA) problem. In this formulation, we train a classifier to predict the sentiment polarity given a pair (s, a) of an expression and an aspect. In that case, a is simply an aspect, such as "Masks" or "Quarantine", and the output is a sentiment category. We decided not to use the QA formulation for the relevance determination task because in that case it would be equivalent to the NLI formulation.
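A minimal sketch of how such NLI and QA inputs can be constructed with the Transformers tokenizer (the English hypothesis strings mirror the examples above; the exact templates used in the experiments, presumably Russian, are not reproduced here):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "DeepPavlov/rubert-base-cased-conversational")

sentence = "Маски реально помогают в транспорте."  # a user comment

# NLI formulation: pair (sentence, affirmative hypothesis) -> true / false.
nli_input = tokenizer(sentence, "Is positive to masks",
                      truncation=True, return_tensors="pt")

# QA formulation: pair (sentence, aspect) -> sentiment category.
qa_input = tokenizer(sentence, "Masks",
                     truncation=True, return_tensors="pt")

# Both yield a [CLS] sentence [SEP] second-segment [SEP] sequence, matching
# the two-sentence input format described at the beginning of this section.
print(tokenizer.decode(nli_input["input_ids"][0]))
```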
Table 2. Hyperparameters used by the neural networks for different tasks

Task                             Epochs   LR
Sentiment Classification (NLI)   4        5e-6
Sentiment Classification (QA)    7        1e-5
Relevance Determination (NLI)    3        5e-5
RD and SC (aspect-specific)      7        1e-5

5 Results of Experiments

In the experiments, we compare several variants of RuBERT-based models with classical machine learning methods, namely, Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB), Gradient Boosting (GB), and Random Forest (RF). Implementations of the classical algorithms were taken from the scikit-learn library (https://scikit-learn.ru/). These models receive tf-idf vectors as inputs. To obtain the vectors, we tokenized and lemmatized the texts and dropped stopwords, punctuation marks, and words seen in fewer than five documents. We tuned the hyperparameters of these models with the Bayesian optimization algorithm implemented in the tune-sklearn library.

We utilized the implementation of the RuBERT-conversational model from the Transformers library; other steps were performed with the PyTorch library (https://pytorch.org/). The models were trained with the standard back-propagation algorithm, with the batch size set to 64. We used cross-entropy loss as the loss function and AdamW as the optimizer. The OneCycleLR schedule with a maximum learning rate of 3e-5 was used for the aspect-specific SC and RD tasks (in the standard formulation); the other parameters were task-specific and are given in Table 2. In addition, we kept track of the current best (according to F1-score) model after each epoch.

All the models were tested with a random stratified train-test split with a test size of 0.3. More precisely, the original texts were split into train and test collections, and the task-specific datasets were formed based on this same stratified split.
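A minimal sketch of this training setup (a generic fine-tuning loop on dummy data under the stated hyperparameters, not the authors' code; in the real pipeline the features come from RuBERT and the annotated dataset):

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

# Dummy stand-ins: 768-dim pooled [CLS] vectors and 3-class labels.
X = torch.randn(256, 768)
y = torch.randint(0, 3, (256,))

# Stratified 70/30 train-test split, as in the evaluation protocol.
idx_train, idx_test = train_test_split(
    np.arange(len(y)), test_size=0.3, stratify=y.numpy(), random_state=0)
idx_train = torch.as_tensor(idx_train)
loader = DataLoader(TensorDataset(X[idx_train], y[idx_train]),
                    batch_size=64, shuffle=True)

model = nn.Linear(768, 3)               # placeholder for RuBERT + head
criterion = nn.CrossEntropyLoss()       # cross-entropy loss
optimizer = torch.optim.AdamW(model.parameters())
num_epochs = 3
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-5,
    steps_per_epoch=len(loader), epochs=num_epochs)

for epoch in range(num_epochs):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        scheduler.step()                # OneCycleLR steps once per batch
    # After each epoch: evaluate macro-F1 on the held-out part and keep
    # the checkpoint with the best score so far (omitted for brevity).
```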
Table 3 shows the performance of the classical machine learning methods in the relevance determination task. The low results of classification for the "government actions" aspect can be explained by the diversity of lexical expressions of this aspect in comments: some sentences do not contain direct mentions of the aspect but nevertheless express an opinion towards it.

Table 3. Performance of classical machine learning methods in the relevance determination task

Aspect       Model   Accuracy   Precision   Recall   F1
Vaccines     SVM     99.45      98.47       99.23    98.85
             MNB     98.60      96.93       97.18    97.06
             BNB     98.45      95.17       98.46    96.79
             RF      99.67      98.98       99.62    99.30
             GB      99.64      98.98       99.49    99.23
Quarantine   SVM     98.27      98.82       95.73    97.25
             MNB     97.54      96.64       95.64    96.14
             BNB     97.81      96.76       96.39    96.58
             RF      98.30      97.61       97.06    97.34
             GB      98.36      98.36       96.49    97.41
Masks        SVM     99.09      98.64       99.41    99.02
             MNB     98.12      97.48       98.50    97.98
             BNB     98.63      98.43       98.63    98.53
             RF      99.12      98.64       99.48    99.06
             GB      99.33      98.77       99.80    99.28
Government   SVM     85.41      49.68       49.06    49.37
             MNB     75.78      32.94       64.78    43.67
             BNB     82.83      42.03       48.64    45.09
             RF      86.17      52.96       41.30    46.41
             GB      87.21      65.05       25.37    36.50
Average      SVM     95.55      86.40       85.86    86.12
             MNB     92.51      81.00       89.03    83.71
             BNB     94.43      83.10       85.53    84.25
             RF      95.82      87.05       84.37    85.53
             GB      96.14      90.29       80.29    83.10

Table 4 provides the macro-averaged scores of the classical machine learning methods in the sentiment classification task.

Table 4. Performance of classical machine learning methods in the sentiment classification task

Aspect       Model   Accuracy   MacroPrec   MacroRecall   MacroF1
Vaccines     SVM     56.41      44.91       44.29         44.27
             MNB     54.49      42.52       44.02         43.10
             BNB     56.41      44.45       33.42         38.05
             RF      56.79      45.99       26.82         33.55
             GB      58.59      54.58       21.09         30.16
Quarantine   SVM     58.82      30.99       45.46         35.94
             MNB     60.82      29.54       39.94         33.92
             BNB     65.84      30.44       26.27         28.20
             RF      68.79      43.54       24.83         31.02
             GB      69.73      41.46       16.26         22.36
Masks        SVM     64.94      45.25       46.23         45.74
             MNB     61.81      41.60       51.73         45.98
             BNB     65.99      46.19       37.13         40.94
             RF      67.69      56.64       28.95         38.25
             GB      66.32      55.88       18.47         27.60
Government   SVM     62.89      88.21       35.68         40.82
             MNB     59.54      41.52       37.64         39.21
             BNB     62.89      36.51       36.16         36.33
             RF      61.22      42.08       40.73         40.47
             GB      61.84      33.60       41.04         36.95
Average      SVM     60.77      52.34       42.91         41.69
             MNB     59.16      38.80       43.33         40.55
             BNB     62.78      39.40       33.24         35.88
             RF      63.62      47.06       30.33         35.82
             GB      64.12      46.38       24.21         29.27

Table 5 and Table 6 compare the results of the best (according to F1-score) classical methods with the RuBERT-based models in the relevance determination and sentiment classification tasks, respectively.

Table 5. The results of RuBERT-based models and their comparison with the best classical methods in the relevance determination task

Aspect       Model    Accuracy   Precision   Recall   F1
Vaccines     NLI      99.70      98.98       99.74    99.36
             RuBERT   99.67      98.98       99.62    99.30
             RF       99.67      98.98       99.62    99.30
Quarantine   NLI      98.27      98.16       96.39    97.27
             RuBERT   98.51      97.99       97.34    97.67
             GB       98.36      98.36       96.49    97.41
Masks        NLI      99.27      98.77       99.67    99.22
             RuBERT   99.48      99.09       99.80    99.45
             GB       99.33      98.77       99.80    99.28
Government   NLI      88.45      65.59       42.77    51.78
             RuBERT   86.84      54.55       55.35    54.94
             SVM      85.41      49.68       49.06    49.37
Average      NLI      96.42      90.38       84.64    86.91
             RuBERT   96.13      87.65       88.03    87.84
             SVM      95.55      86.40       85.86    86.12

Table 6. The results of RuBERT-based models and their comparison with the best classical methods in the sentiment classification task

Aspect       Model    Accuracy   MacroPrec   MacroRecall   MacroF1
Vaccines     NLI      66.67      63.88       59.74         61.24
             QA       66.54      63.62       61.62         62.40
             RuBERT   62.56      58.86       59.76         59.25
             SVM      56.41      44.91       44.29         44.27
Quarantine   NLI      74.38      70.07       50.82         53.52
             QA       73.81      63.64       59.38         61.13
             RuBERT   73.72      57.74       54.64         55.85
             SVM      58.82      30.99       45.46         35.94
Masks        NLI      70.83      64.29       56.76         59.21
             QA       71.03      63.83       61.76         62.69
             RuBERT   65.27      57.84       56.14         56.14
             MNB      61.81      41.60       51.73         45.98
Government   NLI      69.81      44.79       41.51         41.04
             QA       68.97      43.19       42.17         41.91
             RuBERT   68.76      43.84       46.21         44.83
             SVM      62.89      88.21       35.68         40.82
Average      NLI      70.42      60.76       52.21         53.75
             QA       70.09      58.57       56.23         57.03
             RuBERT   67.58      54.57       54.19         54.02
             SVM      60.77      52.34       42.91         41.69

As we can see from the tables, both the classical methods and the neural networks determine the relevance of messages with high quality when the messages include direct mentions of an aspect. In more complex scenarios, the neural networks show better results. In the sentiment classification task, the neural networks also achieve higher scores, because they take into account the context and the order of words.

Finally, the NLI and QA formulations increased the scores in the sentiment classification task, with the QA formulation performing slightly better. This may be explained by the introduction of additional aspect-related features into the models' input and by the use of the whole collection of sentences for training. The lowest macro-averaged results are obtained for the government aspect, which can be explained by the small number of examples in the positive class. As for the RD task, the introduction of the new formulations did not increase the overall quality. This behavior may be caused by the fact that the task is too 'simple' for the model to improve further.

6 Conclusion

In this paper, we introduced a specialized dataset of Russian users' comments about COVID-19 aspects. The dataset contains sentences with sentiment scores towards four widely discussed topics: masks, vaccines, quarantine, and government measures. Each comment is scored by three annotators on average.

We studied approaches to aspect-based sentiment analysis on the created dataset. We solved two tasks, namely Relevance Determination (RD), which aims to predict whether a sentence is relevant to an aspect of the pandemic, and Sentiment Classification (SC), which classifies the sentiment expressed towards an aspect in a sentence.

We applied and tested various machine learning methods, including fine-tuning of the pre-trained RuBERT model. The best results were obtained by the RuBERT model in special settings called Natural Language Inference (NLI) and Question Answering (QA), in which an additional sentence indicating the target aspect is appended to the classified sentence. The created collection is publicly available (https://github.com/LAIR-RCC/RussianCovidDataset).

Acknowledgements. The reported study was funded by RFBR according to the research project № 20-04-60296.

References

1. Abd-Alrazaq, A., Alhuwail, D., Househ, M., Hamdi, M., Shah, Z.: Top concerns of tweeters during the COVID-19 pandemic: infoveillance study. Journal of Medical Internet Research 22(4), e19016 (2020)
2. Alamoodi, A., Zaidan, B., Zaidan, A., Albahri, O., Mohammed, K., Malik, R., Almahdi, E., Chyad, M., Tareq, Z., Albahri, A., et al.: Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. Expert Systems with Applications, 114155 (2020)
3. Barkur, G., Vibha, Kamath, G.: Sentiment analysis of nationwide lockdown due to COVID-19 outbreak: Evidence from India. Asian Journal of Psychiatry 51 (Jun 2020). https://doi.org/10.1016/j.ajp.2020.102089
4. Chandrasekaran, R., Mehta, V., Valkunde, T., Moustakas, E.: Topics, trends, and sentiments of tweets about the COVID-19 pandemic: Temporal infoveillance study. Journal of Medical Internet Research 22(10), e22624 (2020)
5. De Santis, E., Martino, A., Rizzi, A.: An infoveillance system for detecting and tracking relevant topics from Italian tweets during the COVID-19 event. IEEE Access 8, 132527–132538 (2020)
6. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018), http://arxiv.org/abs/1810.04805
7. Huang, X., Smith, M.C., Paul, M.J., Ryzhkov, D., Quinn, S.C., Broniatowski, D.A., Dredze, M.: Examining patterns of influenza vaccination in social media. In: Workshops at the Thirty-First AAAI Conference on Artificial Intelligence (2017)
8. Hussain, A., Tahir, A., Hussain, Z., Sheikh, Z., Gogate, M., Dashtipour, K., Ali, A., Sheikh, A.: Artificial intelligence–enabled analysis of public attitudes on Facebook and Twitter toward COVID-19 vaccines in the United Kingdom and the United States: Observational study. Journal of Medical Internet Research 23(4), e26627 (2021)
9. Hutto, C., Gilbert, E.: VADER: A parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 8 (2014)
10. Karimi, A., Rossi, L., Prati, A., Full, K.: Adversarial training for aspect-based sentiment analysis with BERT. CoRR abs/2001.11316 (2020), https://arxiv.org/abs/2001.11316
11. Kiritchenko, S., Zhu, X., Cherry, C., Mohammad, S.: NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 437–442. Association for Computational Linguistics, Dublin, Ireland (Aug 2014). https://doi.org/10.3115/v1/S14-2076, https://www.aclweb.org/anthology/S14-2076
12. Kuratov, Y., Arkhipov, M.: Adaptation of deep bidirectional multilingual transformers for Russian language. CoRR abs/1905.07213 (2019), http://arxiv.org/abs/1905.07213
13. Loria, S., Keen, P., Honnibal, M., Yankovsky, R., Karesh, D., Dempsey, E., et al.: TextBlob: simplified text processing (2014)
14. Miao, Z., Li, Y., Wang, X., Tan, W.: Snippext: Semi-supervised opinion mining with augmented data. CoRR abs/2002.03049 (2020), https://arxiv.org/abs/2002.03049
15. Ovchinnikova, I., Ermakova, L., Nurbakova, D.: Sentiments in Russian medical professional discourse during the COVID-19 pandemic. In: Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, pp. 99–108 (2020)
16. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. CoRR abs/1802.05365 (2018), http://arxiv.org/abs/1802.05365
17. Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., et al.: SemEval-2016 task 5: Aspect based sentiment analysis. In: International Workshop on Semantic Evaluation, pp. 19–30 (2016)
18. Rietzler, A., Stabinger, S., Opitz, P., Engl, S.: Adapt or get left behind: Domain adaptation through BERT language model finetuning for aspect-target sentiment classification. CoRR abs/1908.11860 (2019), http://arxiv.org/abs/1908.11860
19. Rogers, A., Romanov, A., Rumshisky, A., Volkova, S., Gronas, M., Gribov, A.: RuSentiment: An enriched sentiment analysis dataset for social media in Russian. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 755–763. Association for Computational Linguistics, Santa Fe, New Mexico, USA (Aug 2018), https://www.aclweb.org/anthology/C18-1064
20. Sanders, A.C., White, R.C., Severson, L.S., Ma, R., McQueen, R., Alcântara Paulo, H.C., Zhang, Y., Erickson, J.S., Bennett, K.P.: Unmasking the conversation on masks: Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse. medRxiv (2021). https://doi.org/10.1101/2020.08.28.20183863, https://www.medrxiv.org/content/early/2021/03/20/2020.08.28.20183863
21. Sharma, K., Seo, S., Meng, C., Rambhatla, S., Dua, A., Liu, Y.: Coronavirus on social media: Analyzing misinformation in Twitter conversations. CoRR abs/2003.12309 (2020), https://arxiv.org/abs/2003.12309
22. Song, Y., Wang, J., Jiang, T., Liu, Z., Rao, Y.: Attentional encoder network for targeted sentiment classification. CoRR abs/1902.09314 (2019), http://arxiv.org/abs/1902.09314
23. Sun, C., Huang, L., Qiu, X.: Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),
pp. 380–385 (2019)
24. Tang, D., Qin, B., Feng, X., Liu, T.: Effective LSTMs for target-dependent sentiment classification. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 3298–3307. The COLING 2016 Organizing Committee, Osaka, Japan (Dec 2016), https://www.aclweb.org/anthology/C16-1311
25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. CoRR abs/1706.03762 (2017), http://arxiv.org/abs/1706.03762
26. Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., Bengio, Y.: Manifold mixup: Better representations by interpolating hidden states. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 6438–6447. PMLR (09–15 Jun 2019), http://proceedings.mlr.press/v97/verma19a.html
27. Wang, Y., Huang, M., Zhu, X., Zhao, L.: Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 606–615. Association for Computational Linguistics, Austin, Texas (Nov 2016). https://doi.org/10.18653/v1/D16-1058, https://www.aclweb.org/anthology/D16-1058
28. Yang, H., Zeng, B., Yang, J., Song, Y., Xu, R.: A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction. CoRR abs/1912.07976 (2019), http://arxiv.org/abs/1912.07976
29. Zhao, P., Hou, L., Wu, O.: Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. CoRR abs/1906.04501 (2019), http://arxiv.org/abs/1906.04501
30. Zhuravlev, A., Kitova, D.: Emotional characteristics of the attitude of social network users towards the coronavirus infection. In: Proceedings of the 'V.M. Behterev and Modern Personality Psychology' Conference, pp. 208–211 (2020)