<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Sentiment Attitudes from Analytical Texts via Piecewise Convolutional Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>N.L. Rusnachenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bauman Moscow State Technical University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lomonosov Moscow State University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Proceedings of the XX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL'2018)</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>186</fpage>
      <lpage>192</lpage>
      <abstract>
        <p>For deep text understanding, it is necessary to explore the connections between text units that mention events, entities, etc. Depending on the further goals, this allows the text to be considered as a graph of task-specific relations. In this paper, we focus on the analysis of sentiment attitudes, where an attitude represents a sentiment relation from a subject towards an object. Given a mass media article and a list of mentioned named entities, the task is to extract sentiment attitudes between them. We propose a specific model based on convolutional neural networks (CNN), independent of handcrafted NLP features. For model evaluation, we use the RuSentRel 1.0 corpus, which consists of mass media articles written in Russian.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Automatic sentiment analysis, i.e. the identification of
the author's opinion on the subject discussed in the text,
has been one of the most popular applications of natural
language processing in recent years.</p>
      <p>
        One of the most popular directions is the
sentiment analysis of user posts. The Twitter [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] social
network allows news to spread rapidly in the form of short
text messages, some of which express user
opinions. Such texts are limited in length and have only a
single object for analysis: the author's opinion towards
service or product quality [
        <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
        ]. These factors make this
area well studied.
      </p>
      <p>
        Large texts, such as analytical articles, represent a
complicated genre of documents for sentiment analysis.
Unlike short posts, large articles mention many entities,
some of which are connected by relations. This
connectivity allows us to represent an article as a graph.
This kind of representation is necessary for information
extraction (IE) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Analytical texts contain
Subject-Object relations, or attitudes conveyed by different
subjects, including the attitudes of the author(s), positions of
cited sources, and relations of the mentioned entities
to each other.
      </p>
      <p>Besides, an analytical text can have a complicated
discourse structure. Consider an example: «Donald Trump<sub>e1</sub>
accused China<sub>e2</sub> and Russia<sub>e3</sub> of “playing devaluation of
currencies”». This sentence illustrates an attitude from
subject <inline-formula><tex-math>e_1</tex-math></inline-formula> towards multiple objects <inline-formula><tex-math>e_2</tex-math></inline-formula> and <inline-formula><tex-math>e_3</tex-math></inline-formula>, where the
objects have no attitudes towards each other.
Additionally, a statement of opinion can span several
sentences or refer to an entity mentioned several
sentences earlier.</p>
      <p>In this paper, we introduce the problem of sentiment
attitude extraction from analytical articles written in
Russian. Here, an attitude denotes a directed relation from
a subject towards an object, where each end of the
relation is a mentioned named entity.</p>
      <p>We propose a model based on a modified
architecture of Convolutional Neural Networks
(CNN). The model predicts a sentiment score for a given
attitude in context. In the original CNN
architecture, the max pooling operation reduces information
(the convolved attitude context) quite rapidly. The modified
architecture slows this reduction by pooling the attitude
context in pieces, whose borders are determined by the
positions of the attitude entities. We use the RuSentRel 1.0 corpus
for model evaluation. Both models, based on the original and
the modified CNN architectures, significantly outperform the
baselines and perform better than classifiers based on
handcrafted NLP features.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related works</title>
      <p>
        Relation extraction has become popular since the
appearance of the relation classification track at the
SemEval-2010 conference. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the authors
introduce a dataset for the task of semantic relation classification
between pairs of common nominals. The classification is
considered in terms of the nominals' context. This restriction was
introduced for simplicity and meaning disambiguation.
The resulting model allows composing a semantic
network for a given text, with connections accompanied
by the relation type (Part-Whole, Member-Collection,
etc.).
      </p>
      <p>
        In 2014, the Knowledge Base Population (KBP) track of
the TAC evaluation conference included a
so-called sentiment track [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The task was to find all the
cases where a query entity (sentiment holder) holds a
sentiment (positive or negative) about another entity
(sentiment target). Thus, this task was formulated as a
query-based retrieval of entity sentiment from relevant
documents and focused only on query entities.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the authors study the detection of sentiment
targeted towards named entities in text. Depending on context,
this sentiment arises from a variety of factors, such as
the writer's experience, attitudes of other entities towards
the target, etc.: «So happy that [Kentucky lost to
Tennessee]<sub>event</sub>». In the latter example, Kentucky has a
negative attitude towards Tennessee, but the writer has a
positive one. The authors investigated how to detect a
named entity (NE) and the sentiment expressed towards it. A
variety of models based on conditional random fields
(CRF) were implemented. All models were trained on
a list of predefined features. The experiments were
subdivided into three tasks (in order of increasing
complexity): NE recognition, subjectivity prediction (whether a
sentiment towards the target exists), and NE sentiment
prediction (3-scale classification).
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] the authors continue the study of target sentiment
detection. Modeling the task as a sequence labeling
problem, the authors exploit word embeddings with
automatic feature training within neural network
models. Motivated by the CRF models, the authors
experimented with models based on the conditional neural
fields architecture (CNF) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the task was
considered in the following parts: entity classification, and
entity extraction with classification.
      </p>
      <p>
        MPQA 3.0 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a corpus of analytical articles with
annotated opinion expressions (towards entities and
events). The annotation is sentence-based. For example,
in the sentence «When the Imam issued the fatwa against
Salman Rushdie for insulting the Prophet ...», Imam is
negative to Salman Rushdie, but is positive to the
Prophet. The current corpus consists of 70 documents.
In total, sentiments towards 4,459 targets are labeled.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] studied an approach to discovering
document-level attitudes between subjects mentioned
in the text. The approach considers such features as
relatedness between entities, frequency of a named entity
in the text, direct-indirect speech, and other features. The
best quality of opinion extraction obtained in the work
was only about 36% F-measure over two sentiment classes,
which illustrates the need to improve the extraction of
attitudes at the document level.
      </p>
      <p>
        For the analysis of sentiments with multiple targets
in a coherent text, in the works [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] the concept
of sentiment relevance is discussed. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors
consider several types of thematic importance of the
entities discussed in the text: the main entity, an entity
from a list of similar entities, accidental entity, etc. These
types are treated differently in sentiment analysis of
coherent texts.
      </p>
      <p>
        For relation extraction, in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] the task was modeled
by a convolutional neural network applied to a context
representation based on word embedding features.
Convolving such embeddings with a set of different filters,
the authors implemented and trained a Convolutional
Neural Network (CNN) model for the relation
classification task. Applied to the SemEval-2010
Task 8 dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the resulting model significantly
outperforms the results of other participants.
      </p>
      <p>
        However, for the relation classification task, the
original max pooling reduces information extremely
rapidly and hence blurs significant relation aspects. This
idea was developed further by the authors of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] in terms
of the max pooling operation. This operation is applied to the
data convolved by filters and extracts the maximal value
within each convolution. The authors proposed to treat
each convolution in parts. The division into parts is
related to the attitude ends and is as follows: inner and
outer. This results in an advanced CNN architecture,
dubbed Piecewise Convolutional
Neural Network (PCNN).
        <fn><label>1</label><p>https://github.com/nicolay-r/RuSentRel/tree/v1.0</p></fn>
      </p>
      <p>
        In this paper, we present an application of the
PCNN model [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to sentiment attitude
extraction. We use automatically trainable features
instead of handcrafted NLP features. To illustrate its
effectiveness, we compare our results with the original
CNN implementation and with other approaches: baselines and
classifiers based on handcrafted features.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Dataset</title>
      <p>
        We use the RuSentRel 1.0 corpus<sup>1</sup>, which consists of analytical
articles from the Internet portal inosmi.ru [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These articles
in the domain of international politics were obtained
from foreign authoritative sources and translated into
Russian. The collected articles contain both the author's
opinion on the subject matter of the article and a large
number of references mentioned between the participants
of the described situations.
      </p>
      <p>For the documents, the manual annotation of the
sentiment attitudes towards the mentioned named entities
has been carried out. The annotation can be subdivided
into two subtypes:
        <list list-type="bullet">
          <list-item><p>The author's relation to mentioned named entities;</p></list-item>
          <list-item><p>The relation of subjects expressed as named entities
to other named entities.</p></list-item>
        </list>
      </p>
      <p>
        Figure 1 illustrates annotated article attitudes in graph
format. These opinions are of the Subject-Object relation
type in terms of related terminology [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and are recorded as
triplets: (Subject of opinion, Object of opinion, attitude).
The attitude can be negative (neg) or positive (pos), for
example (Author, USA, neg), (USA, Russia, neg). Neutral
opinions are not recorded. The attitudes are described for
whole documents, not for each sentence. In some
texts, there were several opinions of different
sentiment orientation from the same subject towards
the same object. This, in particular, could be due to a
comparison of the sentiment orientation of previous
relations and current relations (for example, between
Russia and Turkey). Alternatively, the author of the article could
mention a former attitude to some subject and indicate
the change of this attitude at the current time. In such
cases, it was assumed that the annotator should specify
exactly the current state of the relationship. In total, 73
large analytical texts were labeled with about 2000
relations.
      </p>
      <p>
        To prepare documents for automatic analysis, the
texts were processed by an automatic named entity
recognizer based on the CRF method [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The program
identified named entities that were categorized into four
classes: Persons, Organizations, Places and Geopolitical
Entities (states and capitals as states). In total, 15.5
thousand named entity mentions were found in the
documents of the collection. An analytical document can
refer to an entity with several variants of naming
(Vladimir Putin – Putin), synonyms (Russia – Russian
Federation), or lemma variants generated from different
wordforms. Besides, annotators could use only one of
the possible names of an entity when describing attitudes. For correct
inference of attitudes between named entities in the
whole document, the dataset provides the list of variant
names for the same entity found in our corpus. The
current list contains 83 sets of name variants. This allows
separating the sentiment analysis task from the task of
named entity coreference.
      </p>
      <p>A preliminary version of the RuSentRel corpus was
granted to the Summer school on Natural Language
Processing and Data Analysis<sup>2</sup>, organized in Moscow in
2017. The collection was divided into the training and
test parts. In the current experiments, we use the same
division of the data. Table 1 contains statistics of the
training and test parts of the RuSentRel corpus.
It also reports the average number of named entity pairs per
document that carry no indication of sentiment towards each other.
This number is much larger than the number of
positive or negative sentiments in documents, which
additionally stresses the complexity of the task.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Sentiment attitudes extraction</title>
      <p>In this paper, the task of sentiment attitude extraction
is treated as follows: given an attitude as a pair of its
named entities, we predict a sentiment label of the pair,
which could be positive, negative, or neutral.</p>
      <p>The act of extraction is to select only those pairs
that were predicted as non-neutral. This leads to the
following questions:
        <list list-type="order">
          <list-item><p>How to complete a set of all attitudes?</p></list-item>
          <list-item><p>How to predict attitude labels?</p></list-item>
        </list>
      </p>
      <sec id="sec-4-1">
        <title>4.1 Composing attitude sets</title>
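        <p>Given a list of synonym groups <inline-formula><tex-math>S</tex-math></inline-formula> provided by the RuSentRel
dataset (see Section 3), let <inline-formula><tex-math>S(w)</tex-math></inline-formula> be a function that
returns the synonym group of a given word<sup>3</sup> or phrase <inline-formula><tex-math>w</tex-math></inline-formula>.</p>
        <p>Two attitudes <inline-formula><tex-math>a_1 = (e_{1,l}, e_{1,r})</tex-math></inline-formula> and <inline-formula><tex-math>a_2 = (e_{2,l}, e_{2,r})</tex-math></inline-formula>
are equal up to synonyms, <inline-formula><tex-math>a_1 \simeq a_2</tex-math></inline-formula>, when both
ends belong to the same synonym groups:
          <disp-formula><label>(1)</label><tex-math>S(e_{1,l}) = S(e_{2,l}) \wedge S(e_{1,r}) = S(e_{2,r})</tex-math></disp-formula>
Using Formula 1, we define a set <inline-formula><tex-math>A</tex-math></inline-formula> without
synonyms as follows:
          <disp-formula><label>(2)</label><tex-math>A: \nexists\, a_i, a_j \in A: \{a_i \simeq a_j,\ i \neq j\}</tex-math></disp-formula>
        </p>
        <p>To complete a training set <inline-formula><tex-math>A_{train}</tex-math></inline-formula>, we first compose
auxiliary sets without synonyms: <inline-formula><tex-math>A_s</tex-math></inline-formula> is a set of sentiment
attitudes, and <inline-formula><tex-math>A_n</tex-math></inline-formula> is a set of neutral attitudes. For <inline-formula><tex-math>A_s</tex-math></inline-formula>, the
etalon opinions were used to find the related named entities
and compose sentiment attitudes. <inline-formula><tex-math>A_n</tex-math></inline-formula> consists of attitudes
composed between all available named entities of the
train collection. In this paper, attitude contexts were
limited to a single sentence. Finally, the completed <inline-formula><tex-math>A_{train}</tex-math></inline-formula> is
an expansion of <inline-formula><tex-math>A_s</tex-math></inline-formula> with <inline-formula><tex-math>A_n</tex-math></inline-formula>:
          <disp-formula><label>(3)</label><tex-math>A_{train} = A_s \cup A_n: \nexists\, i, j: \{a_i \simeq a_j,\ a_i \in A_s,\ a_j \in A_n\}</tex-math></disp-formula>
        </p>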
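        <p>The composition of attitude sets without synonyms can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the representation of an attitude as a pair of entity names, and the rule of keeping the first attitude among those equal up to synonyms are assumptions made for the example. An unknown name gets a fresh single-element group, as described in footnote 3.</p>
        <preformat>
```python
def build_synonym_index(groups):
    """Map every lowercased name variant to the id of its synonym group."""
    index = {}
    for group_id, names in enumerate(groups):
        for name in names:
            index[name.lower()] = group_id
    return index


def synonym_group(index, name):
    """Return the group id S(w); an unknown name gets a new singleton group."""
    key = name.lower()
    if key not in index:
        index[key] = max(index.values(), default=-1) + 1
    return index[key]


def attitude_key(index, attitude):
    """Two attitudes are equal up to synonyms iff their keys coincide (Formula 1)."""
    left, right = attitude
    return (synonym_group(index, left), synonym_group(index, right))


def without_synonyms(index, attitudes):
    """Keep the first attitude of every class of synonym-equal attitudes (Formula 2)."""
    seen = set()
    result = []
    for attitude in attitudes:
        key = attitude_key(index, attitude)
        if key not in seen:
            seen.add(key)
            result.append(attitude)
    return result


def compose_train_set(index, sentiment, neutral):
    """Return (A_s, A_n) such that no neutral pair duplicates a sentiment pair (Formula 3)."""
    a_s = without_synonyms(index, sentiment)
    sentiment_keys = {attitude_key(index, a) for a in a_s}
    a_n = [a for a in without_synonyms(index, neutral)
           if attitude_key(index, a) not in sentiment_keys]
    return a_s, a_n
```
        </preformat>
        <p>In this sketch the direction of an attitude matters: (USA, Russia) and (Russia, USA) have different keys and are kept as distinct attitudes.</p>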
        <p>
          To evaluate the model, we compose the test set <inline-formula><tex-math>A_{test}</tex-math></inline-formula>
of neutral attitudes without synonyms. It consists of
attitudes composed between all available named entities
within a single sentence of the test collection. Table 2
shows the number of attitudes for both the train and test
collections.
          <fn><label>3</label><p>The case of synonym absence is resolved by
creating a new group with the single element <inline-formula><tex-math>\{w\}</tex-math></inline-formula>.</p></fn>
        </p>
        <p>
          For label prediction, we use an approach that exploits a
word embedding model and automatically trainable
features. We implemented an advanced CNN model,
dubbed Piecewise Convolutional Neural Network
(PCNN), proposed in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2.1 Attitude embedding</title>
        <p>The attitude embedding is a representation of an
attitude by its related context, where each
word of the context is an embedding vector. Figure 1
illustrates a context for an attitude with “USA” and
“Russia” as named entities: «…USA is considering the
possibility of new sanctions against Russia…».</p>
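        <p>Picking a context that includes the attitude entities together with
the inner part between them, we expand it with words equally on both
sides and finally compose a text sample <inline-formula><tex-math>s = \{w_1, \ldots, w_k\}</tex-math></inline-formula>
of size <inline-formula><tex-math>k</tex-math></inline-formula>. Additionally, each <inline-formula><tex-math>w_i</tex-math></inline-formula> is
lowercased and lemmatized.</p>
        <p>Let <inline-formula><tex-math>E_w</tex-math></inline-formula> be a precomputed embedding vocabulary,
which we use to compose word embeddings <inline-formula><tex-math>e_{w_i}</tex-math></inline-formula>. Each
<inline-formula><tex-math>w_i</tex-math></inline-formula> might be a part of an attitude entity or of the remaining text. In the
latter case, <inline-formula><tex-math>e_{w_i} = E_w(w_i)</tex-math></inline-formula><sup>4</sup>. We treat attitude
entities as single words. Since some entities
are phrases (for example, “Russian Federation”), their
embedding <inline-formula><tex-math>e_{w_i}</tex-math></inline-formula> is calculated as a sum over each
component word <inline-formula><tex-math>w_j</tex-math></inline-formula> of the phrase:
          <disp-formula><label>(4)</label><tex-math>e_{w_i} = \sum_{j} E_w(w_j)</tex-math></disp-formula>
        </p>
        <p>Given a sample <inline-formula><tex-math>s</tex-math></inline-formula>, for each word <inline-formula><tex-math>w_i</tex-math></inline-formula> we
compose a vector <inline-formula><tex-math>x_i</tex-math></inline-formula> as a concatenation of the vector <inline-formula><tex-math>e_{w_i}</tex-math></inline-formula>
(word) and a pair of distances <inline-formula><tex-math>(d_1, d_2)</tex-math></inline-formula> (position) related
to each entity<sup>5</sup>. Given an attitude entity <inline-formula><tex-math>e_1</tex-math></inline-formula>, we let
<inline-formula><tex-math>d_{1,i} = pos(w_i) - pos(e_1)</tex-math></inline-formula>, where <inline-formula><tex-math>pos(\cdot)</tex-math></inline-formula> is the position
index of its argument in the sample <inline-formula><tex-math>s</tex-math></inline-formula>. The same
computation is applied for <inline-formula><tex-math>d_{2,i}</tex-math></inline-formula> with the other entity <inline-formula><tex-math>e_2</tex-math></inline-formula>.
The composed <inline-formula><tex-math>E_a = \{x_1, \ldots, x_k\}</tex-math></inline-formula> represents
the attitude embedding matrix.</p>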
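        <p>A minimal sketch of this construction is given below. It assumes that the sample is already lowercased, lemmatized and tokenized, that a multi-word entity is stored as one space-separated token, and that the vocabulary is a plain dictionary; the names <monospace>embed_token</monospace> and <monospace>attitude_embedding</monospace> are hypothetical.</p>
        <preformat>
```python
def embed_token(token, vocab, dim):
    """Sum the vocabulary vectors of the words in a token (Formula 4).

    A single-word token reduces to a plain lookup; a word that is
    missing from the vocabulary contributes the zero vector.
    """
    total = [0.0] * dim
    for word in token.split():
        vector = vocab.get(word, [0.0] * dim)
        total = [a + b for a, b in zip(total, vector)]
    return total


def attitude_embedding(sample, first_pos, second_pos, vocab, dim):
    """Build one row per token: word embedding plus distances to both entities."""
    rows = []
    for i, token in enumerate(sample):
        distances = [float(i - first_pos), float(i - second_pos)]
        rows.append(embed_token(token, vocab, dim) + distances)
    return rows
```
        </preformat>
        <p>Each row then has the embedding size plus two position features, so the matrix has one row per token of the sample.</p>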
      </sec>
      <sec id="sec-4-3">
        <title>4.2.2 Convolution</title>
        <p>This step of data transformation applies filters to
the attitude embedding matrix (see Figure 2). Treating
the latter as a feature-based attitude representation, this
approach implements feature merging by sliding a filter
of a fixed size over the data and transforming the information
in it.</p>
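        <p>According to Section 4.2.1, <inline-formula><tex-math>E_a \in \mathbb{R}^{k \times m}</tex-math></inline-formula> is an
attitude embedding matrix with a text segment of size <inline-formula><tex-math>k</tex-math></inline-formula>
and vector size <inline-formula><tex-math>m</tex-math></inline-formula>. We regard <inline-formula><tex-math>E_a</tex-math></inline-formula> as a sequence of rows
<inline-formula><tex-math>Q = \{q_1, \ldots, q_k\}</tex-math></inline-formula>, where <inline-formula><tex-math>q_i \in \mathbb{R}^{m}</tex-math></inline-formula>. We denote by <inline-formula><tex-math>q_{i:j}</tex-math></inline-formula>
the concatenation of consecutive vectors from the <inline-formula><tex-math>i</tex-math></inline-formula>-th to the <inline-formula><tex-math>j</tex-math></inline-formula>-th
position.
          <fn><label>4</label><p>If a word <inline-formula><tex-math>w_i</tex-math></inline-formula> is absent from <inline-formula><tex-math>E_w</tex-math></inline-formula>, the zero vector is used.</p></fn>
        </p>
        <p>An application of a filter <inline-formula><tex-math>\mathbf{w} \in \mathbb{R}^{n}</tex-math></inline-formula>
(<inline-formula><tex-math>n = w \cdot m</tex-math></inline-formula>) to the concatenation <inline-formula><tex-math>q_{i:j}</tex-math></inline-formula> is a sequence convolution
by the filter <inline-formula><tex-math>\mathbf{w}</tex-math></inline-formula>, where <inline-formula><tex-math>w</tex-math></inline-formula> is the filter window size. Figure 1
illustrates <inline-formula><tex-math>w = 3</tex-math></inline-formula>. To calculate the convolution value <inline-formula><tex-math>c_j</tex-math></inline-formula>, we
apply scalar multiplication as follows:
          <disp-formula><label>(5)</label><tex-math>c_j = \mathbf{w} \cdot q_{j-w+1:j}</tex-math></disp-formula>
        </p>
        <p>Here <inline-formula><tex-math>j \in 1 \ldots k</tex-math></inline-formula> is the filter offset within the sequence
<inline-formula><tex-math>Q</tex-math></inline-formula>. We let <inline-formula><tex-math>q_i</tex-math></inline-formula> be a zero vector of size <inline-formula><tex-math>m</tex-math></inline-formula>
when <inline-formula><tex-math>i &lt; 1</tex-math></inline-formula> or <inline-formula><tex-math>i &gt; k</tex-math></inline-formula>. As a result, <inline-formula><tex-math>c = \{c_1, \ldots, c_k\}</tex-math></inline-formula>
with shape <inline-formula><tex-math>c \in \mathbb{R}^{k}</tex-math></inline-formula> is the convolution of the sequence <inline-formula><tex-math>Q</tex-math></inline-formula> by
the filter <inline-formula><tex-math>\mathbf{w}</tex-math></inline-formula>.</p>
        <p>To get multiple feature combinations, a set of
different filters <inline-formula><tex-math>W = \{\mathbf{w}_1, \ldots, \mathbf{w}_t\}</tex-math></inline-formula> is applied
to the sequence <inline-formula><tex-math>Q</tex-math></inline-formula>, where <inline-formula><tex-math>t</tex-math></inline-formula> is the number of filters.
This modifies Formula 5 by introducing the layer
index <inline-formula><tex-math>i</tex-math></inline-formula> as follows:
          <disp-formula><label>(6)</label><tex-math>c_{i,j} = \mathbf{w}_i \cdot q_{j-w+1:j}</tex-math></disp-formula>
Denoting <inline-formula><tex-math>c_i = \{c_{i,1}, \ldots, c_{i,k}\}</tex-math></inline-formula> in Formula 6, we reduce the
latter by index <inline-formula><tex-math>j</tex-math></inline-formula> and compose a matrix
<inline-formula><tex-math>C = \{c_1, c_2, \ldots, c_t\}</tex-math></inline-formula>, which represents the convolution matrix with
shape <inline-formula><tex-math>C \in \mathbb{R}^{k \times t}</tex-math></inline-formula>. Figure 1 illustrates an example of a
convolution matrix with <inline-formula><tex-math>t = 3</tex-math></inline-formula>.</p>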
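        <p>The convolution step can be sketched in plain Python as follows. The sketch assumes zero-based indexing, so only positions before the start of the sequence need zero padding, and it stores the convolution matrix as one list per filter; the function names are hypothetical.</p>
        <preformat>
```python
def convolve(rows, filt, window):
    """Slide one filter over the rows of the attitude embedding matrix.

    For every offset j the rows j-window+1 .. j are concatenated
    (rows before the start of the sequence are zero vectors) and the
    dot product with the filter is taken.
    """
    k, m = len(rows), len(rows[0])
    assert len(filt) == window * m, "filter length must be window * vector size"
    zero = [0.0] * m
    out = []
    for j in range(k):
        concat = []
        for i in range(j - window + 1, j + 1):
            concat.extend(rows[i] if i >= 0 else zero)
        out.append(sum(f * q for f, q in zip(filt, concat)))
    return out


def convolution_matrix(rows, filters, window):
    """Apply every filter; the result holds one convolution per filter."""
    return [convolve(rows, filt, window) for filt in filters]
```
        </preformat>
        <p>Every convolution has as many values as the sample has tokens, which is what allows the pooling step to split it at the entity positions.</p>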
      </sec>
      <sec id="sec-4-4">
        <title>4.2.3 Max pooling</title>
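        <p>
          Max pooling is an operation that reduces values by
keeping the maximum. In the original CNN architecture, max
pooling is applied separately to each convolution
<inline-formula><tex-math>\{c_1, \ldots, c_t\}</tex-math></inline-formula> of the <inline-formula><tex-math>t</tex-math></inline-formula> layers (see Figure 3, left).
It reduces convolved information quite rapidly and
therefore is not appropriate for the attitude classification
task. To keep context aspects that are inside and outside
of the attitude entities, the authors of [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] perform piecewise
max pooling. Given attitude entities as borders, we divide
each <inline-formula><tex-math>c_i</tex-math></inline-formula> into inner, left and right segments <inline-formula><tex-math>\{c_{i,1}, c_{i,2}, c_{i,3}\}</tex-math></inline-formula>
(see Figure 3, right). Then max pooling is applied to each
segment separately:
          <disp-formula><label>(7)</label><tex-math>p_{i,j} = \max(c_{i,j}), \quad i \in 1 \ldots t, \quad j \in 1 \ldots 3</tex-math></disp-formula>
        </p>
        <p>Thus, for each <inline-formula><tex-math>c_i</tex-math></inline-formula> we have a set <inline-formula><tex-math>p_i = \{p_{i,1}, p_{i,2}, p_{i,3}\}</tex-math></inline-formula>.
The concatenation of these sets <inline-formula><tex-math>p_{i:j}</tex-math></inline-formula> results in <inline-formula><tex-math>p \in \mathbb{R}^{3t}</tex-math></inline-formula>,
which is the result of the piecewise max pooling operation. At
the last step, we apply the hyperbolic tangent activation
function. The shape of the result <inline-formula><tex-math>g</tex-math></inline-formula> remains unchanged:
          <disp-formula><tex-math>g = \tanh(p), \quad g \in \mathbb{R}^{3t}</tex-math></disp-formula>
        </p>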
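        <p>Piecewise max pooling followed by the hyperbolic tangent can be sketched as below. Several details are assumptions of the example rather than statements of the paper: each entity position is counted as the end of the piece it closes, the pieces are returned in left, inner, right order, and an empty piece is pooled to zero.</p>
        <preformat>
```python
import math


def piecewise_max_pool(conv, first_pos, second_pos):
    """Split one convolution at the two entity positions and pool every piece."""
    lo, hi = sorted((first_pos, second_pos))
    pieces = [conv[:lo + 1], conv[lo + 1:hi + 1], conv[hi + 1:]]
    # An empty piece (e.g. an entity at the very end) is pooled to 0.0.
    return [max(piece) if piece else 0.0 for piece in pieces]


def pcnn_features(conv_rows, first_pos, second_pos):
    """Pool every convolution in three pieces, concatenate, and apply tanh."""
    pooled = []
    for conv in conv_rows:
        pooled.extend(piecewise_max_pool(conv, first_pos, second_pos))
    return [math.tanh(value) for value in pooled]
```
        </preformat>
        <p>With ordinary max pooling each convolution would contribute a single value; here it contributes three, so the feature vector has three values per filter.</p>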
      </sec>
      <sec id="sec-4-5">
        <title>4.2.4 Sentiment Prediction</title>
        <p>Before we obtain the neural network output, the result
<inline-formula><tex-math>g \in \mathbb{R}^{3t}</tex-math></inline-formula> of the previous step is passed through a fully
connected hidden layer:
          <disp-formula><label>(8)</label><tex-math>o = W_1 g + b, \quad W_1 \in \mathbb{R}^{z \times 3t}, \quad b \in \mathbb{R}^{z}</tex-math></disp-formula>
        </p>
        <p>In Formula 8, <inline-formula><tex-math>z</tex-math></inline-formula> is the expected number of classes,
and <inline-formula><tex-math>o</tex-math></inline-formula> is the output vector. The elements of the latter
are unscaled values. We use a softmax
transformation to obtain probabilities for each output
class. Figure 4 illustrates a 3-dimensional output vector.
To prevent the model from overfitting, we employ dropout
for output neurons during the training process.</p>
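        <p>The output layer and the softmax transformation can be illustrated with a short sketch. The weights below are arbitrary toy values, dropout is omitted, and the function names are hypothetical.</p>
        <preformat>
```python
import math


def dense(weights, bias, features):
    """Fully connected layer: one unscaled output value per class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn unscaled values into probabilities; the shift avoids overflow."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]
```
        </preformat>
        <p>The predicted label is the class with the highest probability; an attitude is extracted only when that class is not neutral.</p>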
      </sec>
      <sec id="sec-4-6">
        <title>4.2.5 Training</title>
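        <p>As a function, the implemented neural network model
depends on parameters divided into the following
groups: <inline-formula><tex-math>I</tex-math></inline-formula> represents the input for supervised learning, and
<inline-formula><tex-math>H</tex-math></inline-formula> describes the hidden states that are trainable during
network optimization. Formula 9 illustrates the dependencies of the
network function <inline-formula><tex-math>\theta</tex-math></inline-formula>:
          <disp-formula><label>(9)</label><tex-math>\theta = (I; H) = (I; W, W_1, b)</tex-math></disp-formula>
        </p>
        <p>The group of input parameters <inline-formula><tex-math>I</tex-math></inline-formula> consists of <inline-formula><tex-math>N</tex-math></inline-formula> tuples
<inline-formula><tex-math>I = \{\tau_1, \ldots, \tau_N\}</tex-math></inline-formula>, where <inline-formula><tex-math>\tau_i = (E_a, y)</tex-math></inline-formula> includes an attitude
embedding <inline-formula><tex-math>E_a</tex-math></inline-formula> with the related label <inline-formula><tex-math>y \in \mathbb{R}^{z}</tex-math></inline-formula>. The group
of hidden parameters <inline-formula><tex-math>H</tex-math></inline-formula> includes a set of convolution
filters <inline-formula><tex-math>W</tex-math></inline-formula>, the hidden fully connected layer <inline-formula><tex-math>W_1</tex-math></inline-formula> and the bias <inline-formula><tex-math>b</tex-math></inline-formula>.
          <fn><label>6</label><p>https://code.google.com/p/word2vec/</p></fn>
        </p>
        <p>The neural network training process includes the
following steps:
          <list list-type="order">
            <list-item><p>Split <inline-formula><tex-math>I</tex-math></inline-formula> into a list of batches <inline-formula><tex-math>\mathcal{B} = \{B_1, \ldots, B_r\}</tex-math></inline-formula> with the
fixed size of <inline-formula><tex-math>q</tex-math></inline-formula>, where <inline-formula><tex-math>B_i \subset I</tex-math></inline-formula>;</p></list-item>
            <list-item><p>Randomly choose <inline-formula><tex-math>B_s</tex-math></inline-formula> from the list of batches <inline-formula><tex-math>\mathcal{B}</tex-math></inline-formula> to
perform a forward propagation through the network
and receive <inline-formula><tex-math>o_s = \{o_1, \ldots, o_q\} \in \mathbb{R}^{q \cdot z}</tex-math></inline-formula>;</p></list-item>
            <list-item><p>Given <inline-formula><tex-math>o_s</tex-math></inline-formula>, compute the cross entropy loss as
follows:
              <disp-formula><label>(10)</label><tex-math>J(\theta) = \sum_{j} \log p(y_i \mid o_{i,j}; \theta), \quad j \in 1 \ldots z</tex-math></disp-formula>
            </p></list-item>
            <list-item><p>Update the hidden variables <inline-formula><tex-math>H</tex-math></inline-formula> of <inline-formula><tex-math>\theta</tex-math></inline-formula> using the gradients
calculated at the previous step;</p></list-item>
            <list-item><p>Repeat steps 2-4 until the necessary epoch count
is reached.</p></list-item>
          </list>
        </p>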
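        <p>The batching and loss computation of the training steps can be sketched as follows. This is only an illustration under stated assumptions: labels are class indices, the loss is the usual mean negative log-likelihood of the correct class, the last batch may be shorter than the others, and the gradient update itself is left out because it depends on the chosen framework. All function names are hypothetical.</p>
        <preformat>
```python
import math
import random


def split_batches(samples, batch_size):
    """Split the input list into consecutive batches of at most batch_size items."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def pick_batch(batches, rng):
    """Randomly choose one batch for the next forward pass."""
    return rng.choice(batches)


def cross_entropy(probabilities, label):
    """Negative log-probability that the model assigns to the correct class."""
    return -math.log(probabilities[label])


def mean_batch_loss(batch_probabilities, labels):
    """Average the per-sample cross entropy over one batch."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probabilities, labels)]
    return sum(losses) / len(losses)
```
        </preformat>
        <p>A seeded <monospace>random.Random</monospace> instance passed as <monospace>rng</monospace> makes the batch order reproducible between runs.</p>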
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experiments</title>
      <p>We consider attitudes as pairs of named entities
within a single sentence (see Section 4.1). The distance
in words within a pair was limited by the segment size
<inline-formula><tex-math>k = 50</tex-math></inline-formula>.</p>
      <p>According to Table 1 (see “Share of attitudes expressed
in a single sentence”), this allows us to cover up to 76.5%
and 74% of sentiment attitudes for the train and test
collections, respectively. Table 2 shows the number of
attitudes extracted from the train and test collections.</p>
      <p>To select an embedding model <inline-formula><tex-math>E_w</tex-math></inline-formula>, the average
distance between attitude entities was taken into account.
According to Table 1 (see «avg. dist. between NE within
a sentence in words»), we were interested in a Skip-gram
based model that covers our estimation. We use a
precomputed and publicly available word2vec<sup>6</sup> model<sup>7</sup>
based on news articles with a window size of 20 and a vector
size of 1000. To perform text lemmatization, we utilize
Yandex Mystem<sup>8</sup>.
        <fn><label>8</label><p>https://tech.yandex.ru/mystem/</p></fn>
      </p>
      <p>
        We use the Adadelta optimizer for model training
with parameters chosen according to [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. For the
dropout probability, the statistically optimal value for
most classification tasks was chosen.
      </p>
      <p>For model evaluation, we use the
<inline-formula><tex-math>F_1(P, N)</tex-math></inline-formula>-macro measure. It combines recall and precision
over both the positive (P) and negative (N) classes. We experimentally
study the effectiveness of a model by varying the number of filters
<inline-formula><tex-math>t</tex-math></inline-formula>.</p>
      <p>
        Table 3 shows the results for both the implemented
PCNN model<sup>9</sup> and the original CNN model in runs,
where each run varies in terms of settings. Since
<inline-formula><tex-math>J(\theta)</tex-math></inline-formula> has a non-convex shape with a large number of local
minima, and the initial hidden state varies with each run, we
provide multiple evaluation results during the training
process at certain epochs, <inline-formula><tex-math>F_1(e)</tex-math></inline-formula>, where <inline-formula><tex-math>e</tex-math></inline-formula> is the number of
epochs passed. According to the obtained results
(see Table 3), we may conclude that using a greater
number of filters accelerates the training process for
both models. Comparing the original CNN with the
piecewise version, the model of the latter architecture
reaches top results (<inline-formula><tex-math>F_1(P, N) \geq 0.30</tex-math></inline-formula>) significantly
faster. According to Table 4, the proposed approach
significantly outperforms the baselines and performs
better than conventional classifiers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A manually
implemented feature set was used to train KNN, SVM,
Naive Bayes, and Random Forest classifiers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For the
same dataset, SVM and Naive Bayes achieved 16%
F-measure, and the best result was obtained by the
Random Forest classifier (27% F-measure). To assess the
upper bound for the evaluated methods, the expert
agreement with the etalon labeling was estimated (Table 4,
last row). Overall, we may conclude that this task
remains complicated and the results are quite low. It
should be noted that the authors of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], who worked
with much smaller documents written in English,
reported an F-measure of 36%.
      </p>
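      <p>A macro-averaged F-measure over the positive and negative classes can be computed as in the sketch below. It is one plausible reading of the measure, not the evaluation script used in the experiments: attitudes are assumed to be stored as dictionaries that map an entity pair to its label, and neutral pairs are assumed to be absent from both dictionaries.</p>
      <preformat>
```python
def f1_macro_pos_neg(expected, predicted):
    """Average the F1 scores of the 'pos' and 'neg' classes.

    expected  -- etalon attitudes: {(subject, object): 'pos' or 'neg'}
    predicted -- extracted attitudes in the same format
    """
    scores = []
    for label in ("pos", "neg"):
        true_positive = sum(1 for pair, value in predicted.items()
                            if value == label and expected.get(pair) == label)
        n_predicted = sum(1 for value in predicted.values() if value == label)
        n_expected = sum(1 for value in expected.values() if value == label)
        precision = true_positive / n_predicted if n_predicted else 0.0
        recall = true_positive / n_expected if n_expected else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)
```
      </preformat>
      <p>Because the two classes are averaged with equal weight, a model that handles only one sentiment class well cannot score above 0.5.</p>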
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>This paper introduces the problem of sentiment
attitude extraction from analytical articles. The key point
of the proposed solution is that it does not depend on
handcrafted feature implementation. Models based
on the Convolutional Neural Network architecture were
used.</p>
      <p>In the current experiments, the problem of sentiment
attitude extraction is considered as a three-class machine
learning task. We experimented with CNN-based models
by studying their effectiveness depending on
convolutional filters count. Increasing the latter
parameter accelerates training process. Comparing
original architecture with the piecewise modification, the
model of the latter reaches better results faster. Both
models significantly outperform the baselines and
perform better than approaches based on handcrafted
features.</p>
      <p>Due to the dataset limitation and the complexity of manual
annotation, in further work we plan to explore
unsupervised pre-training techniques based on
automatically annotated articles from external sources. In
addition, the current attitude embedding format carries no
information about the related article as a whole, which is
another direction for further improvement.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Alimova</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tutubalina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Automated detection of adverse drug reactions from social media posts with machine learning</article-title>
          .
          <source>In: Proceedings of International Conference on Analysis of Images, Social Networks and Texts</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          , (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ben-Ami</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feldman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenfeld</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Entities' sentiment relevance</article-title>
          .
          <source>ACL-2013</source>
          , 2, pp.
          <fpage>87</fpage>
          -
          <lpage>92</lpage>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashkin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Document-level sentiment inference with social, faction, and discourse context</article-title>
          . In:
          <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</source>
          . ACL
          , pp.
          <fpage>333</fpage>
          -
          <lpage>343</lpage>
          , (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>MPQA 3.0: An entity/event-level sentiment corpus</article-title>
          .
          <source>Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pp.
          <fpage>1323</fpage>
          -
          <lpage>1328</lpage>
          , (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strassel</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Overview of linguistic resources for the TAC KBP 2014 evaluations: Planning, execution, and results</article-title>
          .
          <source>Proceedings of TAC KBP 2014 Workshop, National Institute of Standards and Technology</source>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>18</lpage>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Hendrickx</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , et al.:
          <article-title>SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals</article-title>
          .
          <source>In: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Association for Computational Linguistics</source>
          , pp.
          <fpage>94</fpage>
          -
          <lpage>99</lpage>
          , (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubtsova</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>SentiRuEval-2016: Overcoming time gap and data sparsity in tweet sentiment analysis</article-title>
          .
          <source>In: Computational Linguistics and Intellectual Technologies Proceedings of the Annual International Conference Dialogue</source>
          , Moscow, RGGU, pp.
          <fpage>416</fpage>
          -
          <lpage>427</lpage>
          , (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rusnachenko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Extracting sentiment attitudes from analytical texts</article-title>
          .
          <source>In: Proceedings of International Conference of Computational Linguistics and Intellectual Technologies Dialog-2018</source>
          , pp.
          <fpage>455</fpage>
          -
          <lpage>464</lpage>
          , (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aguilar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Durme</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Open domain targeted sentiment</article-title>
          .
          <source>In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>1643</fpage>
          -
          <lpage>1654</lpage>
          , (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mozharova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Combining knowledge and CRF-based approach to named entity recognition in Russian</article-title>
          .
          <source>In: International Conference on Analysis of Images, Social Networks and Texts</source>
          , pp.
          <fpage>185</fpage>
          -
          <lpage>195</lpage>
          , (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Conditional neural fields</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          , pp.
          <fpage>1419</fpage>
          -
          <lpage>1427</lpage>
          , (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>SemEval-2017 task 4: Sentiment analysis in Twitter</article-title>
          .
          <source>In: Proceedings of SemEval-2017 workshop</source>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>518</lpage>
          , (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Scheible</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schutze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Sentiment relevance</article-title>
          .
          <source>In: Proceedings of ACL 2013</source>
          , 1, pp.
          <fpage>954</fpage>
          -
          <lpage>963</lpage>
          , (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Zeiler</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Adadelta: an adaptive learning rate method</article-title>
          .
          <source>arXiv preprint arXiv:1212.5701</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Relation classification via convolutional deep neural network</article-title>
          .
          <source>In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</source>
          , pp.
          <fpage>2335</fpage>
          -
          <lpage>2344</lpage>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distant supervision for relation extraction via piecewise convolutional neural networks</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>1753</fpage>
          -
          <lpage>1762</lpage>
          , (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vo</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          :
          <article-title>Neural networks for open domain targeted sentiment</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>612</fpage>
          -
          <lpage>621</lpage>
          , (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>