=Paper= {{Paper |id=Vol-2277/paper33 |storemode=property |title= Extracting Sentiment Attitudes from Analytical Texts via Piecewise Convolutional Neural Network |pdfUrl=https://ceur-ws.org/Vol-2277/paper33.pdf |volume=Vol-2277 |authors=Nicolay Rusnachenko,Natalia Loukachevitch |dblpUrl=https://dblp.org/rec/conf/rcdl/RusnachenkoL18 }} == Extracting Sentiment Attitudes from Analytical Texts via Piecewise Convolutional Neural Network == https://ceur-ws.org/Vol-2277/paper33.pdf
       Extracting Sentiment Attitudes from Analytical Texts
           via Piecewise Convolutional Neural Network
                       © N.L. Rusnachenko 1            © N.V. Loukachevitch 2
                     1
                       Bauman Moscow State Technical University, Moscow, Russia
                        2
                          Lomonosov Moscow State University, Moscow, Russia
                           kolyarus@yandex.ru           louk_nat@mail.ru
      Abstract. For deep text understanding, it is necessary to explore the connections between text units
mentioning events, entities, etc. Depending on further goals, this allows one to consider the text as a graph of
task-specific relations. In this paper, we focus on the analysis of sentiment attitudes, where an attitude
represents a sentiment relation from a subject towards an object. Given a mass media article and a list of
mentioned named entities, the task is to extract sentiment attitudes between them. We propose a model based on
convolutional neural networks (CNN) that is independent of handcrafted NLP features. For model evaluation, we
use the RuSentRel 1.0 corpus, which consists of mass media articles written in Russian.
      Keywords: sentiment analysis, convolutional neural networks, data intensive domains

1 Introduction

Automatic sentiment analysis, i.e. the identification of the author's opinion on the subject discussed in a text, has been one of the most popular applications of natural language processing in recent years.
     One of the most popular directions is sentiment analysis of user posts. The Twitter [7] social network allows news to spread rapidly in the form of short text messages, some of which express user opinions. Such texts are limited in length and have only a single object of analysis: the author's opinion towards the quality of a service or product [1, 12]. These factors make this area well studied.
     Large texts, such as analytical articles, represent a complicated genre of documents for sentiment analysis. Unlike short posts, large articles mention many entities, some of which are connected by relations. This connectivity allows us to represent an article as a graph. Such a representation is necessary for information extraction (IE) [6]. Analytical texts contain Subject-Object relations, or attitudes, conveyed by different subjects, including the authors' attitudes, the positions of cited sources, and the relations of the mentioned entities to each other.
     Besides, an analytical text can have a complicated discourse structure. Consider an example: «Donald Trump(e1) accused China(e2) and Russia(e3) of "playing devaluation of currencies"». This sentence illustrates an attitude from subject e1 towards multiple objects e2 and e3, while the objects have no attitudes between themselves. Additionally, statements of opinion can span several sentences, or refer to an entity mentioned several sentences earlier.
     In this paper we introduce the problem of sentiment attitude extraction from analytical articles written in Russian. Here an attitude denotes a directed relation from a subject towards an object, where each end of such a relation is a mentioned named entity.
     We propose a model based on a modified architecture of Convolutional Neural Networks (CNN). The model predicts a sentiment score for a given attitude in context. In the original CNN architecture, the max pooling operation reduces information (the convolved attitude context) quite rapidly. The modified architecture slows this reduction by pooling the attitude context in pieces, whose borders are related to the positions of the attitude entities. We use the RuSentRel 1.0 corpus for model evaluation. Models based on both the original and the modified CNN architectures significantly outperform baselines and perform better than classifiers based on handcrafted NLP features.

2 Related works

Relation extraction has become popular since the appearance of the relation classification track at the SemEval-2010 conference. In [6] the authors introduce a dataset for the task of semantic classification between pairs of common nominals. The classification is considered in terms of the nominals' context; this restriction was introduced for simplicity and meaning disambiguation. The resulting model allows composing a semantic network for a given text, with connections accompanied by the relation type (Part-Whole, Member-Collection, etc.).
     In 2014, the TAC evaluation conference included a so-called sentiment track within the Knowledge Base Population (KBP) track [5]. The task was to find all cases where a query entity (sentiment holder) holds a sentiment (positive or negative) about another entity (sentiment target). Thus, this task was formulated as query-based retrieval of entity-sentiment from relevant documents and focused only on query entities.
     In [9] the authors address targeted sentiment detection towards named entities in text. Depending on context, this sentiment arises from a variety of factors, such as

Proceedings of the XX International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL'2018), Moscow, Russia, October 9-12, 2018

writer experience, attitudes from other entities towards the target, etc.: «So happy that [Kentucky lost to Tennessee](event)». In the latter example, Kentucky has a negative attitude towards Tennessee, but the writer has a positive one. The authors investigated how to detect a named entity (NE) and the sentiment expressed towards it. A variety of models based on conditional random fields (CRF) were implemented; all models were trained on a list of predefined features. The experiments were subdivided into three tasks (in order of increasing complexity): NE recognition, subjectivity prediction (whether sentiment exists towards the target), and sentiment NE prediction (3-scale classification).
     In [17] the authors continue the study of targeted sentiment detection. Modeling the task as a sequence labeling problem, they exploit word embeddings with features trained automatically within neural network models. Influenced by the CRF model, the authors experimented with models based on the conditional neural fields (CNF) architecture [11]. As in [9], the task was considered in the following parts: entity classification, and entity extraction combined with classification.
     MPQA 3.0 [4] is a corpus of analytical articles with annotated opinion expressions (towards entities and events). The annotation is sentence-based. For example, in the sentence «When the Imam issued the fatwa against Salman Rushdie for insulting the Prophet ...», the Imam is negative towards Salman Rushdie, but is positive towards the Prophet. The current corpus consists of 70 documents. In total, sentiments towards 4,459 targets are labeled.
     The paper [3] studied an approach to the discovery of document-level attitudes between subjects mentioned in the text. The approach considers such features as relatedness between entities, frequency of a named entity in the text, direct vs. indirect speech, and others. The best quality of opinion extraction obtained in that work was only about 36% F-measure over two sentiment classes, which illustrates that improving the extraction of attitudes at the document level remains a significant challenge.
     For the analysis of sentiments with multiple targets in a coherent text, the works [2] and [13] discuss the concept of sentiment relevance. In [2], the authors consider several types of thematic importance of the entities discussed in the text: the main entity, an entity from a list of similar entities, an accidental entity, etc. These types are treated differently in sentiment analysis of coherent texts.
     For relation extraction, in [15] the task was modeled by a convolutional neural network over a context representation based on word embedding features. Convolving the embeddings with a set of different filters, the authors implemented and trained a Convolutional Neural Network (CNN) model for the relation classification task. Applied to the SemEval-2010 Task 8 dataset [6], the resulting model significantly outperformed the results of the other participants.
     However, for the relation classification task, the original max pooling reduces information extremely rapidly and hence blurs significant relation aspects. This issue was addressed by the authors of [16] in terms of the max pooling operation, which is applied to the data convolved by the filters and extracts the maximal value within each convolution. The authors proposed to treat each convolution in parts, where the division is related to the attitude ends: inner and outer. This results in an advanced CNN architecture, dubbed the Piecewise Convolutional Neural Network (PCNN).
     In this paper, we present an application of the PCNN model [16] to sentiment attitude extraction. We use automatically trainable features instead of handcrafted NLP features. To illustrate its effectiveness, we compare our results with the original CNN implementation and other approaches: baselines, and classifiers based on handcrafted features.

3 Dataset

We use the RuSentRel 1.0 corpus1, which consists of analytical articles from the Internet portal inosmi.ru [8]. These articles in the domain of international politics were obtained from foreign authoritative sources and translated into Russian. The collected articles contain both the author's opinion on the subject matter of the article and a large number of attitudes between the participants of the described situations.

Figure 1 Opinion annotation example for article 4 (dashed: negative attitudes; solid: positive attitudes)

     For the documents, manual annotation of the sentiment attitudes towards the mentioned named entities has been carried out. The annotation can be subdivided into two subtypes:
 The author's relation to mentioned named entities;
 The relation of subjects expressed as named entities to other named entities.
     Figure 1 illustrates the annotated attitudes of an article in graph format. These opinions are Subject-Object relations in terms of the related terminology [6] and are recorded as triplets: (Subject of opinion, Object of opinion, attitude). An attitude can be negative (neg) or positive (pos), for example (Author, USA, neg), (USA, Russia, neg). Neutral opinions are not recorded. The attitudes are described for

1 https://github.com/nicolay-r/RuSentRel/tree/v1.0




the whole document, not for each sentence. In some texts, there were several opinions of different sentiment orientation from the same subject towards the same object. This, in particular, could be due to a comparison of the sentiment orientation of previous and current relations (for example, between Russia and Turkey). Or the author of the article could mention his former attitude towards some subject and indicate that this attitude has changed at the current time. In such cases, it was assumed that the annotator should specify exactly the current state of the relationship. In total, 73 large analytical texts were labeled with about 2000 relations.
     To prepare documents for automatic analysis, the texts were processed by an automatic named entity recognizer based on the CRF method [10]. The program identified named entities categorized into four classes: Persons, Organizations, Places, and Geopolitical Entities (states and their capitals). In total, 15.5 thousand named entity mentions were found in the documents of the collection. An analytical document can refer to an entity with several variants of naming (Vladimir Putin – Putin), synonyms (Russia – Russian Federation), or lemma variants generated from different wordforms. Besides, annotators could use only one of an entity's possible names when describing attitudes. For correct inference of attitudes between named entities in the whole document, the dataset provides lists of variant names for the same entity found in our corpus. The current list contains 83 sets of name variants. This allows separating the sentiment analysis task from the task of named entity coreference.
     A preliminary version of the RuSentRel corpus was granted to the Summer school on Natural Language Processing and Data Analysis2, organized in Moscow in 2017. The collection was divided into training and test parts. In the current experiments, we use the same division of the data. Table 1 contains statistics of the training and test parts of the RuSentRel corpus.

Table 1 Statistics of RuSentRel 1.0 corpus

Parameter                                      Training collection   Test collection
Number of documents                            44                    29
Sentences (avg. per doc.)                      74.5                  137
Mentioned NE (avg. per doc.)                   194                   300
Unique NE (avg. per doc.)                      33.3                  59.9
Pos. pairs of NE (avg. per doc.)               6.23                  14.7
Neg. pairs of NE (avg. per doc.)               9.33                  15.6
Neu. pairs of NE (avg. per doc.)               120                   276
Avg. dist. between NE within a sentence        10.2                  10.2
(in words)
Share of attitudes expressed in a single       76.5%                 73%
sentence

     The «Neu. pairs of NE» line of Table 1 shows the average number of named entity pairs mentioned in the same sentence without any indicated sentiment towards each other, per document. This number is much larger than the number of positive or negative sentiments in documents, which additionally stresses the complexity of the task.

4 Sentiment attitudes extraction

     In this paper, the task of sentiment attitude extraction is treated as follows: given an attitude as a pair of named entities, we predict a sentiment label for the pair, which can be positive, negative, or neutral. The act of extraction is to select only those pairs which were predicted as non-neutral. This leads to the following questions:
1. How to compose a set of all attitudes?
2. How to predict attitude labels?

4.1 Composing attitude sets

Given a list of synonym groups S provided by the RuSentRel dataset (see Section 3), let S(w) be a function which returns the synonym group of a given word3 or phrase w.
     A pair of attitudes a1 = (e1,l, e1,r) and a2 = (e2,l, e2,r) are equal up to synonyms, a1 ≃ a2, when both ends are related to the same synonym groups:

     S(e1,l) = S(e2,l) and S(e1,r) = S(e2,r)                              (1)

     Using Formula 1, we define A as a set without synonyms as follows:

     A: ∄ ai, aj ∈ A: {ai ≃ aj, i ≠ j}                                    (2)

     To complete a training set Atrain, we first compose auxiliary sets without synonyms: As, a set of sentiment attitudes, and An, a set of neutral attitudes. For As, the etalon opinions were used to find the related named entities and compose sentiment attitudes. An consists of attitudes composed between all available named entities of the train collection. In this paper, context attitudes were limited to a single sentence. Finally, the completed Atrain is an expansion of As with An:

     Atrain = As ∪ An: ∄ i, j: {ai ≃ aj, ai ∈ As, aj ∈ An}                (3)

     To estimate the model, we complete the test set Atest of neutral attitudes without synonyms. It consists of attitudes composed between all available named entities within a single sentence of the test collection. Table 2 shows the amount of attitudes for both the train and test collections.

Table 2 Context attitudes amount

Attitudes count      Atrain           Atest
Positive             571 (7.2%)       -
Negative             735 (9.3%)       -
Neutral              6584 (83.5%)     8024

4.2 Labels prediction

For label prediction, we use an approach that exploits a word embedding model and automatically trainable features. We implemented an advanced CNN model, dubbed the Piecewise Convolutional Neural Network

2 https://miem.hse.ru/clschool/
3 The case of synonym absence is resolved by creating a new group with the single element {w}.




(PCNN), proposed by [16].

4.2.1 Attitude embedding

The attitude embedding is a representation of an attitude by means of its related context, where each word of the context is an embedding vector. Figure 1 illustrates a context for an attitude with "USA" and "Russia" as named entities: «…USA is considering the possibility of new sanctions against Russia…».

Figure 1 Attitude embedding matrix

     Picking a context that includes the attitude entities with the inner part between them, we expand it with words on both sides equally, finally composing a text sample s = {w1, ..., wk} of size k. Additionally, each wi has been lowercased and lemmatized.
     Let Ew be a precomputed embedding vocabulary, which we use to compose word embeddings e_wi. Each wi might be a part of an attitude entity or of the text. In the latter case e_wi = Ew(wi)4. Attitude entities are treated as single words. Since some entities are phrases (for example, "Russian Federation"), their embedding is calculated as the sum over each component word wj of the phrase:

     e_wi = Σj Ew(wj)                                                     (4)

     Given a sample s, for each word wi of it, we compose a vector w_i as a concatenation of the vector e_wi (word) and a pair of distances (d1, d2) (position) related to each entity5. Given one attitude entity e1, we let d1,i = pos(wi) − pos(e1), where pos(⋅) is the position index of the given argument in sample s. The same computation is applied for d2,i with the other entity e2 respectively. The composed Ea = {w_1, …, w_k} represents an attitude embedding matrix.

4.2.2 Convolution

This step of data transformation applies filters to the attitude embedding matrix (see Figure 2). Treating the latter as a feature-based attitude representation, this approach implements feature merging by sliding a filter of a fixed size over the data and transforming the information in it.
     According to Section 4.2.1, Ea ∈ R^(k×m) is an attitude embedding matrix with a text segment of size k and vector size m. We regard Ea as a sequence of rows Q = {q1, …, qk}, where qi ∈ R^m. We denote by qi:j the concatenation of consequent vectors from the i-th till the j-th position. The application of a filter w ∈ R^d (d = w⋅m) to the concatenation qi:j is a sequence convolution, where w is the filter window size. Figure 2 illustrates w = 3. For the convolution value cj, we apply the scalar product as follows:

     cj = w q(j−w+1):j                                                    (5)

where j ∈ 1…k is the filter offset within the sequence Q. We let qi be a zero vector of size m when i < 1 or i > k. As a result, c = {c1, …, ck} with shape c ∈ R^k is the convolution of the sequence Q by filter w.

Figure 2 Convolving embedding matrix example

     To get multiple feature combinations, a set of different filters W = {w1, …, wt} is applied to the sequence Q, where t is the amount of filters. This modifies Formula 5 by an introduced filter index i as follows:

     ci,j = wi q(j−w+1):j                                                 (6)

     Denoting ci = {ci,1, …, ci,k}, we compose a matrix C = {c1, c2, …, ct}, which represents the convolution matrix with shape C ∈ R^(k×t). Figure 2 illustrates an example of a convolution matrix with t = 3.

4.2.3 Max pooling

Max pooling is an operation that reduces values by keeping the maximum. In the original CNN architecture, max pooling is applied separately per each convolution {c1, …, ct} of the t layers (see Figure 3, left).

Figure 3 Max pooling comparison (left: original CNN max pooling; right: piecewise version)

4 In case of absence of word wi in Ew, the zero vector is used.
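The construction of Section 4.2.1 can be sketched as follows. This is a minimal illustration under our own assumptions (a plain dict standing in for the embedding vocabulary Ew; names and sizes are ours, not the paper's implementation): each row concatenates the word embedding with the two distances (d1, d2), phrase entities are summed component-wise (Formula 4), and absent words fall back to the zero vector (footnote 4).

```python
import numpy as np

def attitude_embedding(words, e1_pos, e2_pos, vocab, m):
    """Compose the attitude embedding matrix E_a: one row per word,
    each row = word embedding (size m) + two position distances."""
    rows = []
    for i, w in enumerate(words):
        # Phrase entities: sum the embedding of each component word (Formula 4);
        # words absent from the vocabulary fall back to the zero vector.
        parts = [vocab.get(p, np.zeros(m)) for p in w.split()]
        e_w = np.sum(parts, axis=0)
        d1, d2 = i - e1_pos, i - e2_pos   # distances to both attitude entities
        rows.append(np.concatenate([e_w, [d1, d2]]))
    return np.vstack(rows)                # shape: (k, m + 2)

# Toy context with entities at positions 0 and 3 (illustrative vectors only)
vocab = {"usa": np.ones(4), "russia": np.full(4, 2.0)}
E_a = attitude_embedding(["usa", "is", "considering", "russia"], 0, 3, vocab, 4)
```

Here the two extra columns carry the distances to the attitude entities, so the resulting matrix has shape k × (m + 2); how the position information is encoded in the actual model may differ.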




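The convolution of Section 4.2.2 (Formulas 5 and 6) can be sketched in the same spirit; again a NumPy-based illustration under our own assumptions, with zero rows substituted for out-of-range positions as the text prescribes:

```python
import numpy as np

def convolve(Q, filters, w):
    """Convolve the sequence Q (k rows of size m) by each filter (Formula 6).
    A filter is a vector of size d = w * m applied to w concatenated rows;
    out-of-range rows are taken as zero vectors."""
    k, m = Q.shape
    padded = np.vstack([np.zeros((w - 1, m)), Q])   # zero rows for j - w + 1 < 1
    C = np.empty((k, len(filters)))
    for j in range(k):                              # filter offset within Q
        window = padded[j:j + w].reshape(-1)        # concatenation q_(j-w+1):j
        for i, f in enumerate(filters):
            C[j, i] = f @ window                    # scalar product (Formula 5)
    return C                                        # shape: (k, t)

Q = np.arange(8, dtype=float).reshape(4, 2)         # toy sequence: k = 4, m = 2
filters = [np.ones(6), np.zeros(6)]                 # t = 2 filters, w = 3, d = 6
C = convolve(Q, filters, 3)
```

Each column of C is one filter's convolution c_i; with t filters the result has shape k × t, matching C ∈ R^(k×t) above.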
     It reduces convolved information quite rapidly, and                  vector 𝑏.
therefore is not appropriate for attitude classification                       The neural network training process includes the
task. To keep context aspects that are inside and outside                 following steps:
of the attitude entities, authors [16] perform piecewise                  1. Split 𝑇 into list of batches 𝐵 = {𝑡1 , … , 𝑡𝑞 } with the
max pooling. Given attitude entities as borders, we divide                     fixed size of 𝑞, where 𝑡𝑖 ∈ 𝑇;
each 𝑐𝑖 into inner, left and right segments {𝐜𝑖,1 , 𝐜𝑖,2 , 𝐜𝑖,3 }         2. Randomly choose 𝑏𝑠 from list of batches 𝐵 to
(see Figure 3, right). Then max pooling applies per each                       perform a forward propagation through the network
segment separately:                                                            and receive 𝑜𝑠 = {𝑜1 , … , 𝑜𝑞 } ∈ ℝ𝑞⋅𝑐 ;
       𝑝𝑖,𝑗 = 𝑚𝑎𝑥(𝐜𝑖,𝑗 ), 𝑖 ∈ 1 … 𝑡 𝑗 ∈ 1 … 3             (6)             3. Given an 𝑜𝑠 we compute cross entropy loss as
     Thus, for each 𝐜𝑖 we have a 𝐩𝑖 = {𝑝𝑖,1 , 𝑝𝑖,2 , 𝑝𝑖,3 }.                   follows:
                                                                                              𝑐
Concatenation of these sets 𝐩𝑖:𝑗 results in 𝐩 ∈ ℝ3𝑡 and
that is a result of piecewise max pooling operation. At                              𝐽(𝜃) = ∑ log 𝑝(𝑦𝑖 |𝑜𝑖,𝑗 ; 𝜃) , 𝑖 ∈ 1 … 𝑞    (10)
the last step we apply the hyperbolic tangent activation                                     𝑗=1

function. The shape of resulted 𝑑 remains unchanged:                      4.     Update hidden variables 𝐻 of 𝜃 using the calculated
             𝒅 = tanh(𝐩), 𝒅 ∈ ℝ𝟑𝒕                          (7)                   gradients from the previous step;
                                                                          5.     Repeat steps 2-4 while the necessary epoch count
4.2.4 Sentiment Prediction                                                       will not be reached.
Before we receive a neural network output, the result
                                                                          5 Experiments
𝑑 ∈ ℝ3𝑡 of the previous step passed through the fully
connected hidden layer:                                                       We consider attitudes as a pair of named entities
       𝑜 = 𝑊1 𝑑 + 𝑏, 𝑊1 ∈ ℝ𝑐×3𝑡 , 𝑏 ∈ ℝ𝑐         (8)                      within a single sentence (see Section 4.1). The distance
                                                                          in words within pair was limited by segment size 𝑘 = 50.
Figure 4 Max pooling transformation                                       According to Table 1 (see “Share of attitudes expressed
                                                                          in a single sentence”) it allows us to cover up to 76.5%
                                                                          and 74% of sentiment attitudes for the train and test
                                                                          collections respectively. Table 2 illustrates an amount of
                                                                          extracted attitudes from train and test collections.
                                                                              To select an embedding model 𝐸𝑤 , the average
                                                                          distance between attitude entities was taken into account.
                                                                          According to Table 1 (see «avg. dist. between NE within
                                                                          a sentence in words»), we were interested in a Skip-gram
                                                                          based model which covers our estimation. We use a
                                                                          precomputed and publicly available word2vec6 model7
                                                                          based on news articles with window size of 20 and vector
     In Formula 8, 𝑐 is the expected number of classes, and 𝑜 is an output vector whose elements are unscaled values. We use a softmax transformation to obtain a probability per output class. Figure 4 illustrates a 3-dimensional output vector. To prevent the model from overfitting, we employ dropout for the output neurons during the training process.

4.2.5 Training

     As a function, the implemented neural network model depends on parameters divided into the following groups: 𝐼 represents the input for supervised learning, and 𝐻 describes hidden states that are trainable during network optimization. Formula 9 illustrates the network function 𝜃 dependencies:

               𝜃 = (𝐼; 𝐻) = (𝑇; 𝑊, 𝑊1 , 𝑏)            (9)

     The group of input parameters 𝐼 consists of 𝑚 tuples 𝑇 = {𝑡1 , … , 𝑡𝑚 }, where 𝑡𝑖 = (𝐴𝑒 , 𝑦) includes an attitude embedding 𝐴𝑒 with the related label 𝑦 ∈ ℝ𝑐 . The group of hidden parameters 𝐻 includes the set of convolution filters 𝑊, the hidden fully connected layer 𝑊1 , and the bias 𝑏.

     size of 1000. To perform text lemmatization, we utilize Yandex Mystem⁸.
     We use the Adadelta optimizer for model training, with parameters chosen according to [14]. For the dropout probability, we chose the value that is statistically optimal for most classification tasks.
     For model evaluation, we use the 𝐹1 (𝑃, 𝑁)-macro measure, which combines recall and precision over both the positive (P) and negative (N) classes. We experimentally study the effectiveness of the model by varying the 𝑓𝑖𝑙𝑡𝑒𝑟𝑠 𝑐𝑜𝑢𝑛𝑡.
     Table 3 illustrates the results for both the implemented PCNN model⁹ and the original CNN model across runs that differ in their settings. Since 𝐽(𝜃) has a non-convex shape with a large number of local minima, and the initial hidden state varies between runs, we report evaluation results at certain epochs of the training process, 𝐹1 (𝑒), where 𝑒 is the number of epochs passed. According to the obtained results (see Table 3), we may conclude that a greater number of filters accelerates the training process for


6 https://code.google.com/p/word2vec/
7 http://rusvectores.org/static/models/rusvectores2/news_mystem_skipgram_1000_20_2015.bin.gz
8 https://tech.yandex.ru/mystem/
9 github.com/nicolay-r/sentiment-pcnn/tree/damdid-2018
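The output transformation described above (a softmax over the unscaled output vector, with dropout applied only at training time) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation; the score values are hypothetical.

```python
import numpy as np

def softmax(o):
    # Turn the unscaled output vector o into class probabilities.
    e = np.exp(o - np.max(o))  # shift by the max for numerical stability
    return e / e.sum()

def dropout(x, p, training=True):
    # Inverted dropout: zero each neuron with probability p at training
    # time and rescale the rest, so no change is needed at test time.
    if not training:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

# Hypothetical 3-dimensional unscaled output (one score per attitude class).
o = np.array([2.0, 1.0, 0.1])
p = softmax(o)  # probabilities over the three classes, summing to 1
```

The max-shift inside `softmax` does not change the result but avoids overflow for large scores; the rescaling inside `dropout` keeps the expected activation unchanged between training and inference.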




 Table 3 F1(P,N) results through epochs for CNN and PCNN models

  model    filters count    F1(25)    F1(50)    F1(75)    F1(100)    F1(150)    F1(200)    F1(250)
  CNN      100              0.03      0.07      0.08      0.13       0.20       0.25       0.29
  CNN      200              0.06      0.11      0.15      0.19       0.17       0.25       0.26
  CNN      300              0.12      0.18      0.24      0.28       0.31       0.30       0.30
  PCNN     100              0.06      0.13      0.23      0.21       0.28       0.29       0.30
  PCNN     200              0.17      0.24      0.29      0.29       0.29       0.30       0.31
  PCNN     300              0.19      0.27      0.29      0.29       0.29       0.29       0.29
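The 𝐹1(𝑃, 𝑁)-macro measure reported in Table 3 can be read as the average of the per-class F1 scores over the positive and negative classes only; a minimal sketch under that assumption (the precision and recall values below are illustrative, not taken from the tables):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall for a single class.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f1_pn_macro(stats):
    # stats maps a class label ("P" or "N") to its (precision, recall).
    # The neutral class is deliberately excluded from the macro average.
    return (f1(*stats["P"]) + f1(*stats["N"])) / 2

# Illustrative per-class values (not the paper's results):
score = f1_pn_macro({"P": (0.40, 0.25), "N": (0.30, 0.20)})
```

Excluding the neutral class keeps the measure focused on sentiment decisions, so a model that predicts "neutral" everywhere scores zero.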

both models. Comparing the original CNN with the piecewise version, the latter architecture reaches the top results (𝐹1 (𝑃, 𝑁) ≥ 0.30) significantly faster. According to Table 4, the proposed approach significantly outperforms the baselines and performs better than conventional classifiers [8]. A manually implemented feature set was used to train the KNN, SVM, Naive Bayes, and Random Forest classifiers [8]. On the same dataset, SVM and Naive Bayes achieved 16% F-measure, and the best result was obtained by the Random Forest classifier (27% F-measure). To assess the upper bound for the experimented methods, the expert agreement with the gold-standard labeling was estimated (Table 4, last row). Overall, we may conclude that this task remains complicated and the results are still quite low. It should be noted that the authors of [3], who worked with much smaller documents written in English, reported an F-measure of 36%.

Table 4 Experiment results

Method              Precision    Recall    F1(P,N)
Neg                 0.03         0.39      0.05
Pos                 0.02         0.40      0.04
Distr               0.05         0.23      0.08
School              0.13         0.10      0.12
KNN                 0.18         0.06      0.09
SVM (Grid)          0.09         0.36      0.15
Random forest       0.41         0.21      0.27
CNN                 0.41         0.23      0.31
PCNN                0.42         0.23      0.31
Expert agreement    0.62         0.49      0.55

5 Conclusion

     This paper introduces the problem of sentiment attitude extraction from analytical articles. The key point of the proposed solution is that it does not depend on handcrafted feature implementation. Models based on the Convolutional Neural Network architecture were used.
     In the current experiments, the problem of sentiment attitude extraction is considered as a three-class machine learning task. We experimented with CNN-based models, studying their effectiveness depending on the convolutional filters count. Increasing this parameter accelerates the training process. Comparing the original architecture with the piecewise modification, the latter model reaches better results faster. Both models significantly outperform the baselines and perform better than approaches based on handcrafted features.
     Due to the limited size of the dataset and the complexity of manual annotation, in further work we plan to investigate unsupervised pre-training techniques based on automatically annotated articles from external sources. In addition, the current attitude embedding format carries no information about the related article as a whole, which is another direction of further improvement.

References

 [1] Alimova, I., Tutubalina, E.: Automated detection of adverse drug reactions from social media posts with machine learning. In: Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, pp. 1–12 (2017)
 [2] Ben-Ami, Z., Feldman, R., Rosenfeld, B.: Entities' sentiment relevance. In: Proceedings of ACL 2013, vol. 2, pp. 87–92 (2014)
 [3] Choi, E., Rashkin, H., Zettlemoyer, L., Choi, Y.: Document-level sentiment inference with social, faction, and discourse context. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 333–343 (2016)
 [4] Deng, L., Wiebe, J.: MPQA 3.0: An entity/event-level sentiment corpus. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1323–1328 (2015)
 [5] Ellis, J., Getman, J., Strassel, S.M.: Overview of linguistic resources for the TAC KBP 2014 evaluations: Planning, execution, and results. In: Proceedings of the TAC KBP 2014 Workshop, National Institute of Standards and Technology, pp. 17–18 (2014)
 [6] Hendrickx, I., et al.: SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Association for Computational Linguistics, pp. 94–99 (2009)
 [7] Loukachevitch, N., Rubtsova, Y.: SentiRuEval-2016: Overcoming time gap and data sparsity in tweet sentiment analysis. In: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference Dialogue, Moscow, RGGU, pp. 416–427 (2016)
 [8] Loukachevitch, N., Rusnachenko, N.: Extracting sentiment attitudes from analytical texts. In: Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies Dialog-2018, pp. 455–464 (2018)
 [9] Mitchell, M., Aguilar, J., Wilson, T., Van Durme, B.: Open domain targeted sentiment. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1643–1654 (2013)
[10] Mozharova, V., Loukachevitch, N.: Combining knowledge and CRF-based approach to named entity recognition in Russian. In: International Conference on Analysis of Images, Social Networks and Texts, pp. 185–195 (2016)
[11] Peng, J., Bo, L., Xu, J.: Conditional neural fields. In: Advances in Neural Information Processing Systems, pp. 1419–1427 (2009)
[12] Rosenthal, S., Farra, N., Nakov, P.: SemEval-2017 task 4: Sentiment analysis in Twitter. In: Proceedings of the SemEval-2017 Workshop, pp. 502–518 (2017)
[13] Scheible, C., Schütze, H.: Sentiment relevance. In: Proceedings of ACL 2013, vol. 1, pp. 954–963 (2013)
[14] Zeiler, M.D.: ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)
[15] Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J.: Relation classification via convolutional deep neural network. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 2335–2344 (2014)
[16] Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1753–1762 (2015)
[17] Zhang, M., Zhang, Y., Vo, D.T.: Neural networks for open domain targeted sentiment. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 612–621 (2015)



