=Paper=
{{Paper
|id=Vol-2277/paper33
|storemode=property
|title=
Extracting Sentiment Attitudes from Analytical Texts via Piecewise Convolutional Neural Network
|pdfUrl=https://ceur-ws.org/Vol-2277/paper33.pdf
|volume=Vol-2277
|authors=Nicolay Rusnachenko,Natalia Loukachevitch
|dblpUrl=https://dblp.org/rec/conf/rcdl/RusnachenkoL18
}}
==
Extracting Sentiment Attitudes from Analytical Texts via Piecewise Convolutional Neural Network
==
Extracting Sentiment Attitudes from Analytical Texts via Piecewise Convolutional Neural Network

© N.L. Rusnachenko¹, © N.V. Loukachevitch²
¹ Bauman Moscow State Technical University, Moscow, Russia
² Lomonosov Moscow State University, Moscow, Russia
kolyarus@yandex.ru, louk_nat@mail.ru

Abstract. For deep text understanding, it is necessary to explore the connections between text units mentioning events, entities, etc. Depending on the further goals, this allows one to consider the text as a graph of task-specific relations. In this paper, we focus on the analysis of sentiment attitudes, where an attitude represents a sentiment relation from a subject towards an object. Given a mass media article and a list of mentioned named entities, the task is to extract sentiment attitudes between them. We propose a model based on convolutional neural networks (CNN) that is independent of handcrafted NLP features. For model evaluation, we use the RuSentRel 1.0 corpus, which consists of mass media articles written in Russian.

Keywords: sentiment analysis, convolutional neural networks, data intensive domains

1 Introduction

Automatic sentiment analysis, i.e. the identification of the author's opinion on the subject discussed in a text, has been one of the most popular applications of natural language processing in recent years.

One of the most popular directions is sentiment analysis of user posts. The Twitter [7] social network allows news to spread rapidly in the form of short text messages, some of which express user opinions. Such texts are limited in length and have only a single object for analysis – the author's opinion towards a service or product [1, 12]. These factors make this area well studied.

Large texts, such as analytical articles, represent a complicated genre of documents for sentiment analysis. Unlike short posts, large articles mention many entities, some of which are connected by relations. This connectivity allows us to represent an article as a graph; such a representation is essential for information extraction (IE) [6]. Analytical texts contain Subject-Object relations, or attitudes, conveyed by different subjects, including the author's attitudes, the positions of cited sources, and the relations of the mentioned entities towards each other.

Besides, an analytical text can have a complicated discourse structure. Consider an example: «Donald Trump_e1 accused China_e2 and Russia_e3 of "playing devaluation of currencies"». This sentence expresses an attitude from subject e1 towards multiple objects e2 and e3, while the objects have no attitudes between themselves. Additionally, statements of opinion can span several sentences, or refer to an entity mentioned several sentences earlier.

In this paper we introduce the problem of sentiment attitude extraction from analytical articles written in Russian. Here an attitude denotes a directed relation from a subject towards an object, where each end of such a relation is a mentioned named entity.

We propose a model based on a modified architecture of Convolutional Neural Networks (CNN). The model predicts a sentiment score for a given attitude in context. In the original CNN architecture, the max pooling operation reduces the information (the convolved attitude context) quite rapidly. The modified architecture slows this reduction down by pooling the attitude context in pieces, whose borders are determined by the positions of the attitude entities. We use the RuSentRel 1.0 corpus for model evaluation. Models based on both the original and the modified CNN architectures significantly outperform the baselines and perform better than classifiers based on handcrafted NLP features.

2 Related works

Relation extraction has become popular since the appearance of the relation classification track at the SemEval-2010 conference. In [6] the authors introduce a dataset for the task of semantic classification between pairs of common nominals. The classification is considered within the context of the nominals; this restriction was introduced for simplicity and meaning disambiguation. The resulting model allows composing a semantic network for a given text, with connections accompanied by the relation type (Part-Whole, Member-Collection, etc.).

In 2014, the TAC evaluation conference included a sentiment track within the Knowledge Base Population (KBP) track [5]. The task was to find all cases where a query entity (sentiment holder) holds a sentiment (positive or negative) about another entity (sentiment target). Thus, this task was formulated as a query-based retrieval of entity-sentiment pairs from relevant documents and focused only on query entities.

In [9] the authors study targeted sentiment detection towards named entities in text. Depending on context, this sentiment arises from a variety of factors, such as the writer's experience, attitudes from other entities towards the target, etc.: «So happy that [Kentucky lost to Tennessee]_event». In the latter example, Kentucky has a negative attitude towards Tennessee, but the writer has a positive one. The authors investigated how to detect a named entity (NE) and the sentiment expressed towards it. A variety of models based on conditional random fields (CRF) were implemented. All models were trained on a list of predefined features. The experiments were subdivided into three tasks (in order of growing complexity): NE recognition, subjectivity prediction (the fact of sentiment existence towards the target), and NE sentiment prediction (3-scale classification).

In [17] the authors continue the study of targeted sentiment detection. Modeling it as a sequence labeling problem, they exploit word embeddings with automatic feature training within neural network models. Influenced by CRF models, the authors experimented with models based on the conditional neural fields (CNF) architecture [11]. As in [9], the task was considered in the following parts: entity classification, and entity extraction with classification.

MPQA 3.0 [4] is a corpus of analytical articles with annotated opinion expressions (towards entities and events). The annotation is sentence-based. For example, in the sentence «When the Imam issued the fatwa against Salman Rushdie for insulting the Prophet ...», the Imam is negative towards Salman Rushdie, but positive towards the Prophet. The current corpus consists of 70 documents; in total, sentiments towards 4,459 targets are labeled.

The paper [3] studied an approach to the discovery of document-level attitudes between subjects mentioned in the text. The approach considers such features as relatedness between entities, frequency of a named entity in the text, direct-indirect speech, and others. The best quality of opinion extraction obtained in that work was only about 36% F-measure over two sentiment classes, which shows that improving the extraction of attitudes at the document level remains a significant challenge.

For the analysis of sentiments with multiple targets in a coherent text, the works [2] and [13] discuss the concept of sentiment relevance. In [2], the authors consider several types of thematic importance of the entities discussed in the text: the main entity, an entity from a list of similar entities, an accidental entity, etc. These types are treated differently in sentiment analysis of coherent texts.

For relation extraction, in [15] the task was modeled by a convolutional neural network over a context representation based on word embedding features. Convolving such embeddings with a set of different filters, the authors implemented and trained a Convolutional Neural Network (CNN) model for the relation classification task. Applied to the SemEval-2010 Task 8 dataset [6], the resulting model significantly outperformed the results of the other participants. However, for the relation classification task the original max pooling reduces information extremely rapidly and hence blurs significant relation aspects. This idea was developed further by the authors of [16] in terms of the max pooling operation, which is applied to the data convolved by the filters and extracts the maximal value within each convolution. The authors proposed to treat each convolution in parts, where the division into parts is related to the attitude ends: inner and outer segments. The resulting advanced CNN-based architecture was dubbed the Piecewise Convolutional Neural Network (PCNN).

In this paper, we present an application of the PCNN model [16] to sentiment attitude extraction. We use automatically trainable features instead of handcrafted NLP features. To illustrate its effectiveness, we compare our results with the original CNN implementation and with other approaches: baselines and classifiers based on handcrafted features.

3 Dataset

We use the RuSentRel 1.0 corpus¹ consisting of analytical articles from the Internet portal inosmi.ru [8]. These articles in the domain of international politics were obtained from foreign authoritative sources and translated into Russian. The collected articles contain both the author's opinion on the subject matter of the article and a large number of attitudes mentioned between the participants of the described situations.

For the documents, manual annotation of the sentiment attitudes towards the mentioned named entities has been carried out. The annotation can be subdivided into two subtypes:
- the author's relation to mentioned named entities;
- the relation of subjects expressed as named entities to other named entities.

Figure 1 illustrates the annotated attitudes of an article in graph format.

Figure 1 Opinion annotation example for article 4 (dashed: negative attitudes; solid: positive attitudes)

These opinions are of the Subject-Object relation type in terms of the related terminology [6] and are recorded as triplets: (Subject of opinion, Object of opinion, attitude). The attitude can be negative (neg) or positive (pos), for example (Author, USA, neg), (USA, Russia, neg). Neutral opinions are not recorded. The attitudes are described for whole documents, not for each sentence. In some texts, there were several opinions of different sentiment orientation from the same subject towards the same object. This, in particular, could be due to a comparison of the sentiment orientation of previous and current relations (for example, between Russia and Turkey), or the author of the article could mention his former attitude to some subject and indicate a change of this attitude at the current time. In such cases, it was assumed that the annotator should specify exactly the current state of the relationship. In total, 73 large analytical texts were labeled with about 2,000 relations.

¹ https://github.com/nicolay-r/RuSentRel/tree/v1.0
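The triplet annotation described above can be sketched as a small data structure; the `Attitude` class and the grouping helper below are hypothetical illustrations, not part of the RuSentRel tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attitude:
    """Document-level opinion triplet: (Subject, Object, label)."""
    source: str   # subject of opinion, e.g. "Author"
    target: str   # object of opinion, e.g. "USA"
    label: str    # "pos" or "neg"; neutral opinions are not recorded

# The attitudes of the example above, viewed as a small opinion graph.
opinions = [
    Attitude("Author", "USA", "neg"),
    Attitude("USA", "Russia", "neg"),
]

# Group edges by source to treat the document as a graph of attitudes.
graph = {}
for a in opinions:
    graph.setdefault(a.source, []).append((a.target, a.label))

print(graph)  # {'Author': [('USA', 'neg')], 'USA': [('Russia', 'neg')]}
```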
To prepare the documents for automatic analysis, the texts were processed by an automatic named entity recognizer based on the CRF method [10]. The program identified named entities categorized into four classes: Persons, Organizations, Places, and Geopolitical Entities (states and their capitals). In total, 15.5 thousand named entity mentions were found in the documents of the collection. An analytical document can refer to an entity with several variants of naming (Vladimir Putin – Putin), synonyms (Russia – Russian Federation), or lemma variants generated from different wordforms. Besides, annotators could use only one of the possible entity names when describing attitudes. For correct inference of attitudes between named entities in the whole document, the dataset provides a list of variant names for the same entity found in the corpus. The current list contains 83 sets of name variants. This allows separating the sentiment analysis task from the task of named entity coreference.

A preliminary version of the RuSentRel corpus was granted to the Summer school on Natural Language Processing and Data Analysis², organized in Moscow in 2017. The collection was divided into training and test parts. In the current experiments, we use the same division of the data. Table 1 contains statistics of the training and test parts of the RuSentRel corpus.

Table 1 Statistics of RuSentRel 1.0 corpus

| Parameter | Training collection | Test collection |
| Number of documents | 44 | 29 |
| Sentences (avg. per doc.) | 74.5 | 137 |
| Mentioned NE (avg. per doc.) | 194 | 300 |
| Unique NE (avg. per doc.) | 33.3 | 59.9 |
| Pos. pairs of NE (avg. per doc.) | 6.23 | 14.7 |
| Neg. pairs of NE (avg. per doc.) | 9.33 | 15.6 |
| Neu. pairs of NE (avg. per doc.) | 120 | 276 |
| Avg. dist. between NE within a sentence (in words) | 10.2 | 10.2 |
| Share of attitudes expressed in a single sentence | 76.5% | 73% |

Table 1 also shows the average number of named entity pairs mentioned in the same sentence without any indicated sentiment between them per document (neutral pairs). This number is much larger than the number of positive or negative sentiment pairs in the documents, which additionally stresses the complexity of the task.

4 Sentiment attitudes extraction

In this paper, the task of sentiment attitude extraction is treated as follows: given an attitude as a pair of named entities, we predict the sentiment label of the pair, which can be positive, negative, or neutral. The act of extraction is then to select only those pairs which were predicted as non-neutral. This leads to the following questions:
1. How to compose the set of all attitudes?
2. How to predict attitude labels?

4.1 Composing attitude sets

Given the list of synonym groups S provided by the RuSentRel dataset (see Section 3), let S(w) be a function which returns the synonym group of a given word³ or phrase w. A pair of attitudes a₁ = (e₁,ₗ, e₁,ᵣ) and a₂ = (e₂,ₗ, e₂,ᵣ) are equal up to synonyms, a₁ ≃ a₂, when both ends belong to the same synonym groups:

S(e₁,ₗ) = S(e₂,ₗ) and S(e₁,ᵣ) = S(e₂,ᵣ)    (1)

Using Formula 1, we define A as a set without synonym duplicates as follows:

A: ∄ aᵢ, aⱼ ∈ A: {aᵢ ≃ aⱼ, i ≠ j}    (2)

To complete the training set A_train, we first compose auxiliary sets without synonym duplicates: A_s, a set of sentiment attitudes, and A_n, a set of neutral attitudes. For A_s, the etalon opinions were used to find the related named entities and compose sentiment attitudes. A_n consists of attitudes composed between all available named entities of the training collection. In this paper, the context of an attitude is limited to a single sentence. Finally, the completed A_train is an expansion of A_s with A_n:

A_train = A_s ∪ A_n: ∄ i, j: {aᵢ ≃ aⱼ, aᵢ ∈ A_s, aⱼ ∈ A_n}    (3)

To evaluate the model, we compose the test set A_test of neutral attitudes without synonym duplicates. It consists of attitudes composed between all available named entities within a single sentence of the test collection. Table 2 shows the amount of attitudes for both the train and test collections.

Table 2 Context attitudes amount

| Attitudes count | A_train | A_test |
| Positive | 571 (7.2%) | – |
| Negative | 735 (9.3%) | – |
| Neutral | 6584 (83.5%) | 8024 |

² https://miem.hse.ru/clschool/
³ The case of synonym absence has been resolved by creating a new group with the single element {w}.
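The synonym-aware deduplication of Formulas 1-3 can be sketched as follows; the plain-dict synonym lookup and the tuple representation of attitudes are simplified assumptions, not the dataset's actual API.

```python
# Synonym groups modeled as word -> group id; unknown words get
# their own singleton group, as in footnote 3.
synonyms = {"russia": 0, "russian federation": 0, "usa": 1}

def group(word):
    """S(w): synonym group id of a word or phrase."""
    w = word.lower()
    if w not in synonyms:
        synonyms[w] = max(synonyms.values()) + 1  # new singleton group {w}
    return synonyms[w]

def dedup(attitudes):
    """Formulas 1-2: drop attitudes whose ends are equal up to synonyms."""
    seen, result = set(), []
    for left, right in attitudes:
        k = (group(left), group(right))
        if k not in seen:
            seen.add(k)
            result.append((left, right))
    return result

print(dedup([("USA", "Russia"), ("USA", "Russian Federation")]))
# [('USA', 'Russia')] - the second pair duplicates the first up to synonyms
```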
4.2 Labels prediction

For label prediction, we use an approach that exploits a word embedding model and automatically trainable features. We implemented an advanced CNN model, dubbed the Piecewise Convolutional Neural Network (PCNN), proposed in [16].

4.2.1 Attitude embedding

The attitude embedding is a representation of an attitude by its related context, where each word of the context is an embedding vector. Figure 2 illustrates a context for an attitude with "USA" and "Russia" as named entities: «…USA is considering the possibility of new sanctions against Russia…».

Figure 2 Attitude embedding matrix

Picking a context that includes the attitude entities and the inner part between them, we expand it with words on both sides equally, finally composing a text sample s = {w₁, ..., w_k} of size k. Additionally, each wᵢ is lowercased and lemmatized.

Let E_w be a precomputed embedding vocabulary, which we use to compose word embeddings e_wᵢ. Each wᵢ may be part of an attitude entity or of the text. In the latter case e_wᵢ = E_w(wᵢ)⁴. Attitude entities are treated as single words. Since some entities are phrases (for example, "Russian Federation"), their embedding is calculated as the sum over each component word wⱼ of the phrase:

e_wᵢ = Σⱼ E_w(wⱼ)    (4)

Given a sample s, for each word wᵢ we compose a vector wᵢ as the concatenation of the vector e_wᵢ (word) and a pair of distances (d₁, d₂) (position) related to each entity. Given one attitude entity e₁, we let d₁,ᵢ = pos(wᵢ) − pos(e₁), where pos(·) is the position index of the argument in the sample s. The same computation is applied for d₂,ᵢ with the other entity e₂. The composed E_a = {w₁, ..., w_k} represents the attitude embedding matrix.

⁴ In case of absence of a word wᵢ in E_w, the zero vector was used.

4.2.2 Convolution

This step of data transformation applies filters to the attitude embedding matrix (see Figure 3). Treating the matrix as a feature-based attitude representation, this approach implements feature merging by sliding a filter of a fixed size over the data and transforming the information within it.

According to Section 4.2.1, E_a ∈ ℝ^(k×m) is an attitude embedding matrix with a text segment of size k and vector size m. We regard E_a as a sequence of rows Q = {q₁, ..., q_k}, where qᵢ ∈ ℝ^m. We denote by q_{i:j} the concatenation of consecutive vectors from the i-th to the j-th position. An application of a filter 𝐰 ∈ ℝ^d (d = w·m, where w is the filter window size) to the concatenation q_{i:j} is a sequence convolution by filter 𝐰. Figure 3 illustrates w = 3. The convolution value cⱼ is computed by scalar multiplication:

cⱼ = 𝐰 · q_{j−w+1:j}    (5)

where j ∈ 1…k is the filter offset within the sequence Q. We let qᵢ be a zero vector of size m when i < 1 or i > k. As a result, 𝐜 = {c₁, ..., c_k}, 𝐜 ∈ ℝ^k, is the convolution of the sequence Q by the filter 𝐰.

Figure 3 Convolving embedding matrix example

To get multiple feature combinations, a set of different filters W = {𝐰₁, …, 𝐰_t} is applied to the sequence Q, where t is the number of filters. This modifies Formula 5 with an introduced layer index i:

c_{i,j} = 𝐰ᵢ · q_{j−w+1:j}    (6)

Denoting 𝐜ᵢ = {c_{i,1}, ..., c_{i,k}} in Formula 6, we reduce the latter by index j and compose a matrix C = {𝐜₁, 𝐜₂, ..., 𝐜_t}, which represents the convolution matrix with shape C ∈ ℝ^(k×t). Figure 3 illustrates an example of a convolution matrix with t = 3.
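The convolution step of Formulas 5-6 can be sketched with NumPy; the sizes and the random embedding matrix are illustrative assumptions standing in for a real attitude embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

k, m, w, t = 8, 4, 3, 5          # context length, embedding size, window, filter count
E_a = rng.normal(size=(k, m))    # attitude embedding matrix (Section 4.2.1)

# Zero-pad so that q_{j-w+1:j} is defined for every offset j (Formula 5
# with q_i taken as a zero vector for i < 1).
padded = np.vstack([np.zeros((w - 1, m)), E_a])

filters = rng.normal(size=(t, w * m))   # each filter is a vector in R^{w*m}

# Formula 6: c_{i,j} = w_i . q_{j-w+1:j}, giving convolution matrix C in R^{k x t}
C = np.empty((k, t))
for j in range(k):
    window = padded[j:j + w].reshape(-1)   # concatenation q_{j-w+1:j}
    C[j] = filters @ window

print(C.shape)  # (8, 5)
```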
4.2.3 Max pooling

Max pooling is an operation that reduces a set of values by keeping the maximum. In the original CNN architecture, max pooling is applied separately to each convolution {𝐜₁, …, 𝐜_t} of the t layers (see Figure 4, left). It reduces the convolved information quite rapidly and is therefore not well suited for the attitude classification task. To keep context aspects that are inside and outside of the attitude entities, the authors of [16] perform piecewise max pooling. Using the attitude entities as borders, each 𝐜ᵢ is divided into inner, left, and right segments {𝐜_{i,1}, 𝐜_{i,2}, 𝐜_{i,3}} (see Figure 4, right). Max pooling is then applied to each segment separately:

p_{i,j} = max(𝐜_{i,j}), i ∈ 1…t, j ∈ 1…3

Figure 4 Max pooling comparison (left: original CNN max pooling; right: piecewise version)

Thus, for each 𝐜ᵢ we obtain 𝐩ᵢ = {p_{i,1}, p_{i,2}, p_{i,3}}. The concatenation of these sets results in 𝐩 ∈ ℝ^(3t), which is the result of the piecewise max pooling operation. At the last step we apply the hyperbolic tangent activation function; the shape of the result 𝐝 remains unchanged:

𝐝 = tanh(𝐩), 𝐝 ∈ ℝ^(3t)    (7)

4.2.4 Sentiment Prediction

To obtain the neural network output, the result 𝐝 ∈ ℝ^(3t) of the previous step is passed through a fully connected hidden layer:

o = W₁𝐝 + b, W₁ ∈ ℝ^(c×3t), b ∈ ℝ^c    (8)

In Formula 8, c is the expected number of classes and o is the output vector. The elements of the latter are unscaled values; we use a softmax transformation to obtain probabilities for each output class. Figure 5 illustrates a 3-dimensional output vector. To prevent the model from overfitting, we employ dropout on the output neurons during the training process.

Figure 5 Max pooling transformation

4.2.5 Training

As a function, the implemented neural network model depends on parameters divided into the following groups: I represents the input for supervised learning, and H describes the hidden states that are trainable during network optimization. Formula 9 shows the dependencies of the network function θ:

θ = (I; H) = (T; W, W₁, b)    (9)

The group of input parameters I consists of m tuples T = {t₁, ..., t_m}, where tᵢ = (A_e, y) includes an attitude embedding A_e with the related label y ∈ ℝ^c. The group of hidden parameters H includes the set of convolution filters W, the hidden fully connected layer W₁, and the bias vector b.

The neural network training process includes the following steps:
1. Split T into a list of batches B = {b₁, …} of fixed size q;
2. Randomly choose a batch b_s from B, perform forward propagation through the network, and receive o_s = {o₁, …, o_q} ∈ ℝ^(q·c);
3. Given o_s, compute the cross-entropy loss as follows:

   J(θ) = Σ_{j=1}^{c} log p(yᵢ | o_{i,j}; θ), i ∈ 1…q    (10)

4. Update the hidden variables H of θ using the gradients calculated at the previous step;
5. Repeat steps 2-4 until the necessary epoch count is reached.
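The piecewise max pooling step above can be sketched with NumPy; the entity positions and matrix sizes are illustrative, not taken from the paper's experiments.

```python
import numpy as np

def piecewise_max_pooling(C, e1, e2):
    """Split each convolution layer at the attitude entity positions
    (left / inner / right segments) and max-pool each segment,
    then apply tanh as in Formula 7.

    C  : convolution matrix of shape (k, t)
    e1 : position of the first attitude entity (e1 < e2)
    e2 : position of the second attitude entity
    returns d of shape (3 * t,)
    """
    left, inner, right = C[:e1 + 1], C[e1:e2 + 1], C[e2:]
    pooled = [seg.max(axis=0) for seg in (left, inner, right)]
    p = np.concatenate(pooled)          # p in R^{3t}
    return np.tanh(p)                   # d = tanh(p), shape unchanged

C = np.arange(24, dtype=float).reshape(8, 3)   # k = 8 positions, t = 3 filters
d = piecewise_max_pooling(C, e1=2, e2=5)
print(d.shape)  # (9,)
```

Note that the segments overlap at the entity positions here; the exact border convention is a design choice of the sketch.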
5 Experiments

We consider attitudes as pairs of named entities within a single sentence (see Section 4.1). The distance in words within a pair was limited by the segment size k = 50. According to Table 1 (see "Share of attitudes expressed in a single sentence"), this allows us to cover up to 76.5% and 73% of the sentiment attitudes for the train and test collections respectively. Table 2 shows the amount of attitudes extracted from the train and test collections.

To select an embedding model E_w, the average distance between attitude entities was taken into account. According to Table 1 (see "Avg. dist. between NE within a sentence"), we were interested in a Skip-gram based model that covers this estimate. We use a precomputed and publicly available word2vec⁶ model⁷ trained on news articles with a window size of 20 and a vector size of 1000. To perform text lemmatization, we utilize Yandex Mystem⁸.

We use the AdaDelta optimizer for model training, with parameters chosen according to [14]. For the dropout probability, the statistically optimal value for most classification tasks was chosen.

For model evaluation, we use the F1(P,N)-macro measure, which combines recall and precision over both the positive (P) and negative (N) classes. We experimentally study the effectiveness of the model by varying the filters count.

Table 3 shows the results for both the implemented PCNN model⁹ and the original CNN model across runs, where each run varies in its settings. Since J(θ) has a non-convex shape with a large number of local minima, and the initial hidden state varies between runs, we provide multiple evaluation results during the training process at certain epochs, F1(e), where e is the number of epochs passed.

Table 3 F1(P,N) results through epochs for CNN and PCNN models

| model | filters count | F1(25) | F1(50) | F1(75) | F1(100) | F1(150) | F1(200) | F1(250) |
| CNN | 100 | 0.03 | 0.07 | 0.08 | 0.13 | 0.20 | 0.25 | 0.29 |
| CNN | 200 | 0.06 | 0.11 | 0.15 | 0.19 | 0.17 | 0.25 | 0.26 |
| CNN | 300 | 0.12 | 0.18 | 0.24 | 0.28 | 0.31 | 0.30 | 0.30 |
| PCNN | 100 | 0.06 | 0.13 | 0.23 | 0.21 | 0.28 | 0.29 | 0.30 |
| PCNN | 200 | 0.17 | 0.24 | 0.29 | 0.29 | 0.29 | 0.30 | 0.31 |
| PCNN | 300 | 0.19 | 0.27 | 0.29 | 0.29 | 0.29 | 0.29 | 0.29 |

According to the obtained results (see Table 3), we may conclude that a greater number of filters accelerates the training process for both models. Comparing the original CNN with the piecewise version, the latter architecture reaches top results (F1(P,N) ≥ 0.30) significantly faster. According to Table 4, the proposed approach significantly outperforms the baselines and performs better than conventional classifiers [8]. A manually implemented feature set was used to train KNN, SVM, Naive Bayes, and Random Forest classifiers [8]. For the same dataset, SVM and Naive Bayes achieved 16% F-measure, and the best result was obtained by the Random Forest classifier (27% F-measure).

⁶ https://code.google.com/p/word2vec/
⁷ http://rusvectores.org/static/models/rusvectores2/news_mystem_skipgram_1000_20_2015.bin.gz
⁸ https://tech.yandex.ru/mystem/
⁹ https://github.com/nicolay-r/sentiment-pcnn/tree/damdid-2018
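The F1(P,N)-macro measure can be sketched as follows; the toy label lists are illustrative, and as described above the neutral class is excluded from the average.

```python
def f1_pn_macro(gold, pred):
    """Macro-average F1 over the positive and negative classes only;
    the neutral class is ignored in the average."""
    scores = []
    for cls in ("pos", "neg"):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

gold = ["pos", "neg", "neu", "neg"]
pred = ["pos", "neu", "neu", "neg"]
print(round(f1_pn_macro(gold, pred), 3))  # 0.833
```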
To assess the upper bound for the experimented methods, expert agreement with the etalon labeling was estimated (Table 4, last row). Overall, we may conclude that this task still remains complicated and the results are quite low. It should be noted that the authors of [3], who worked with much smaller documents written in English, reported an F-measure of 36%.

Table 4 Experiment results

| Method | Precision | Recall | F1(P,N) |
| Neg | 0.03 | 0.39 | 0.05 |
| Pos | 0.02 | 0.40 | 0.04 |
| Distr | 0.05 | 0.23 | 0.08 |
| School | 0.13 | 0.10 | 0.12 |
| KNN | 0.18 | 0.06 | 0.09 |
| SVM (Grid) | 0.09 | 0.36 | 0.15 |
| Random forest | 0.41 | 0.21 | 0.27 |
| CNN | 0.41 | 0.23 | 0.31 |
| PCNN | 0.42 | 0.23 | 0.31 |
| Expert agreement | 0.62 | 0.49 | 0.55 |

6 Conclusion

This paper introduces the problem of sentiment attitude extraction from analytical articles. The key point of the proposed solution is that it does not depend on handcrafted feature implementation. Models based on the Convolutional Neural Network architecture were used.

In the current experiments, the problem of sentiment attitude extraction is considered as a three-class machine learning task. We experimented with CNN-based models by studying their effectiveness depending on the convolutional filters count. Increasing the latter parameter accelerates the training process. Comparing the original architecture with the piecewise modification, the model of the latter reaches better results faster. Both models significantly outperform the baselines and perform better than approaches based on handcrafted features.

Due to the dataset limitations and the complexity of manual annotation, in further work we plan to explore unsupervised pre-training techniques based on automatically annotated articles from external sources. In addition, the current attitude embedding format carries no information about the related article as a whole, which is another direction of further improvement.

References

[1] Alimova, I., Tutubalina, E.: Automated detection of adverse drug reactions from social media posts with machine learning. In: Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, pp. 1-12 (2017)
[2] Ben-Ami, Z., Feldman, R., Rosenfeld, B.: Entities' sentiment relevance. In: Proceedings of ACL, vol. 2, pp. 87-92 (2014)
[3] Choi, E., Rashkin, H., Zettlemoyer, L., Choi, Y.: Document-level sentiment inference with social, faction, and discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 333-343 (2016)
[4] Deng, L., Wiebe, J.: MPQA 3.0: An entity/event-level sentiment corpus. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1323-1328 (2015)
[5] Ellis, J., Getman, J., Strassel, S.M.: Overview of linguistic resources for the TAC KBP 2014 evaluations: Planning, execution, and results. In: Proceedings of the TAC KBP 2014 Workshop, National Institute of Standards and Technology, pp. 17-18 (2014)
[6] Hendrickx, I., et al.: SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Association for Computational Linguistics, pp. 94-99 (2009)
[7] Loukachevitch, N., Rubtsova, Y.: SentiRuEval-2016: Overcoming time gap and data sparsity in tweet sentiment analysis. In: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue", Moscow, RGGU, pp. 416-427 (2016)
[8] Loukachevitch, N., Rusnachenko, N.: Extracting sentiment attitudes from analytical texts. In: Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies Dialog-2018, pp. 455-464 (2018)
[9] Mitchell, M., Aguilar, J., Wilson, T., Van Durme, B.: Open domain targeted sentiment. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1643-1654 (2013)
[10] Mozharova, V., Loukachevitch, N.: Combining knowledge and CRF-based approach to named entity recognition in Russian. In: International Conference on Analysis of Images, Social Networks and Texts, pp. 185-195 (2016)
[11] Peng, J., Bo, L., Xu, J.: Conditional neural fields. In: Advances in Neural Information Processing Systems, pp. 1419-1427 (2009)
[12] Rosenthal, S., Farra, N., Nakov, P.: SemEval-2017 task 4: Sentiment analysis in Twitter. In: Proceedings of the SemEval-2017 Workshop, pp. 502-518 (2017)
[13] Scheible, C., Schütze, H.: Sentiment relevance. In: Proceedings of ACL, vol. 1, pp. 954-963 (2013)
[14] Zeiler, M.D.: ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)
[15] Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J.: Relation classification via convolutional deep neural network. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 2335-2344 (2014)
[16] Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1753-1762 (2015)
[17] Zhang, M., Zhang, Y., Vo, D.T.: Neural networks for open domain targeted sentiment. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 612-621 (2015)