<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Development of a Model to Predict Intention Using Deep Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikolay Karpov</string-name>
          <email>nkarpov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Demidovskij</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey Malafeev</string-name>
          <email>amalafeev@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a method to analyze discussions from a social network using deep learning. We have prepared a new dataset by collecting discussions from a social network and annotating the remarks of each discussion. The annotation consists of two types of labels for each message: the intention type and the direction of the intention. Using this dataset and pre-trained word embeddings, we have evaluated two neural network architectures. On the basis of this evaluation, we chose a model to automatically predict the intention type and direction of intention of an arbitrary message from any social network.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>intention analysis</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>There is currently a growing interest in social network analysis due to the
expanded role of social networks. A large share of communication and collaboration
now happens inside them: as the majority of people have at least one social
network account and communicate there by exchanging text messages, it is extremely important
to be able to analyze this type of data and reveal its hidden properties.</p>
      <p>One of the most popular formats of communication in a social network is the
post: an arbitrary message expressing the thoughts and ideas of its author.
Such a post usually appeals to people's emotions, and the
audience starts to discuss it actively by adding more and more comments. A
significant peculiarity of such discussions is that their topic
usually changes very fast and, by the end, is often no longer connected with the subject
of the source post.</p>
      <p>Why is it so important to predict the intention of a given text? The idea of
manipulating a discussion and the message it carries is currently an active area
of research in the field of political linguistics. Indeed, it is extremely important
to make sure that a presidential candidate's dialog with
the audience conveys the right message and is not manipulated. Taking into account
the variability of discussion topics described earlier, it is vital to make sure that
the discussion stays on the path intended by the author of the post.
From our point of view, the first step in solving this task is to be able to
predict the intention of the speaker automatically.</p>
      <p>At the same time, a growing number of publications on the analysis of
Internet texts reveals quite interesting properties of modern texts. In
particular, every speech act has its own intention: the aim and will to express
some idea. In addition, each intention has its own direction, which means
that any phrase of a social network user can be directed towards the post's author,
towards the speaker himself, etc. More importantly, thanks to the huge amount of such texts, it is quite
easy to collect a large dataset, which is vital for modern methods of data analysis.</p>
      <p>While intent analysis seems to be gaining popularity, there is a
gap: mathematical models and applied instruments for automated
prediction of intention are lacking. Although there are existing studies on building text
classifiers, the task of predicting the intention of an arbitrary given text remains
unaccomplished.</p>
      <p>We consider building a machine learning algorithm to predict intentions in
a social network. Our main contributions are the following:
1. We built a specific dataset in which each remark of a discussion is annotated;
2. The annotation consists of two types of labels for each message: intention and
direction of intention;
3. We successfully applied a machine learning algorithm to predict intention and
direction of intention.</p>
      <p>The remaining part of the paper is organized as follows. In Section 2 we
give a detailed overview of existing approaches and significant theoretical
trends. In Section 3 we describe the experimental methodology we
elaborated. A detailed overview of the deep learning
architecture is given in Section 4. The results of the proposed approach
are presented in Section 5. Finally, we draw conclusions and discuss further
research directions in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>Our research has two foundations. First, it is based on research in the
psychology of communication. Second, we use modern methods for automatic
natural language processing.</p>
      <p>
        Let us start with psychology. Early works go back at least to the sixties of the last
century, when J. Austin and J. Searle [
        <xref ref-type="bibr" rid="ref1">1, 10</xref>
        ] developed the theory of speech acts.
J. Austin classifies speech acts into three types: locutionary, illocutionary and
perlocutionary acts. Our research deals only with illocutionary acts in the field
of political discourse. Like some other researchers [14], we use a finite set of
intention types. M. Yu. Oleshkov proposed to typify not individual speech acts, but
the communicative strategy as the general goal of a speech act [8]. We use his
set of intention types with the additions proposed by N. K. Radina [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The task of automatically identifying the type of communicative strategy from
a finite set of types can be formulated as a classification problem. The classification
of short messages is a well-known problem in the natural language
processing field, traditionally solved using machine learning
approaches. For instance, sentences can be classified according to their
readability using prebuilt features and SVM, Random Forest and other classifiers
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Short messages from social networks can be classified according to their
sentiment polarity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The recent success of neural network applications showed that transfer learning
can improve on classical machine learning methods [
        <xref ref-type="bibr" rid="ref3">12, 3</xref>
        ]. This is especially
important when the training dataset is small. That is why in our study we
decided to use neural networks as classifiers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experiment Methodology</title>
      <sec id="sec-3-1">
        <title>Dataset Creation</title>
        <p>One of the key elements of our research was the creation of an appropriate dataset.
It was decided to use the most popular Russian social network, VKontakte1, as
the source of discussions and texts. Raw data was downloaded using the VKMiner
program2. We then performed rigorous filtering of these discussions: in
particular, all discussions with fewer than 40 remarks were rejected, as
well as meaningless comments, e.g. empty messages or photos instead of texts.</p>
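        <p>To illustrate the filtering step, a minimal Python sketch is given below. The data layout (a discussion as a list of comment dictionaries with a "text" field) is our own assumption for the example and does not reflect the exact VKMiner export format.</p>
        <preformat>
# Minimal sketch of the discussion filtering described above.
# The data layout (list of comment dicts with a "text" field) is hypothetical.
MIN_COMMENTS = 40

def is_meaningful(comment):
    """Reject empty messages and attachment-only comments (e.g. photos)."""
    text = (comment.get("text") or "").strip()
    return len(text) > 0

def filter_discussions(discussions):
    kept = []
    for discussion in discussions:
        comments = [c for c in discussion if is_meaningful(c)]
        if len(comments) >= MIN_COMMENTS:   # keep only long discussions
            kept.append(comments)
    return kept
        </preformat>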
        <p>
          There is no doubt that expert labeling of each text according to the
intention class it represents is important [8]. Moreover, as we have already
mentioned, it is vital to also mark the direction of the intention [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. That is
why during the labeling process the experts used both a letter, which represents
the intention type, and a digit, which represents the direction of the given
intention. Such labeling shifts the focus from the traditional methodology of intention
analysis to a hybrid one, which makes it possible to classify a text
very precisely from the intentional point of view.
        </p>
        <p>How the given dataset was used to build the automatic intention
classifier is described in the next section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Classification Settings</title>
        <p>
          To predict intentions automatically from a finite set of intentions, we need to
create a classifier of user messages. M. Yu. Oleshkov proposed 25 intention types
[8]. N. K. Radina proposed to group these intention types into 5 supertypes [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
following Habermas, as shown in Table 2. Additionally, we have 4 directions
of the intention, shown in Table 1. Therefore, we have the following classification targets
(see the sketch below):
- 25 x 4 = 100 combined intention types and directions;
- 25 intention types;
- 5 intention supertypes;
- 4 directions of the intention.
        </p>
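        <p>The sketch below shows how the four classification targets can be derived from the two annotations of a single remark. The concrete intention letters, the grouping into supertypes and the direction codes are defined in Tables 1 and 2; the identifiers used here are placeholders.</p>
        <preformat>
# Sketch of the four classification targets derived from the annotation.
# The real intention letters, supertype grouping and direction codes come
# from Tables 1 and 2; placeholder values are used here.
INTENTION_TYPES = [chr(ord("A") + i) for i in range(25)]            # 25 intention types
DIRECTIONS = ["0", "1", "2", "3"]                                   # 4 directions
SUPERTYPE_OF = {t: i % 5 for i, t in enumerate(INTENTION_TYPES)}    # placeholder grouping into 5 supertypes

def targets(intention, direction):
    """Map one annotated remark to the four target label spaces."""
    return {
        "type_and_direction": intention + direction,  # 25 x 4 = 100 classes
        "type": intention,                            # 25 classes
        "supertype": SUPERTYPE_OF[intention],         # 5 classes
        "direction": direction,                       # 4 classes
    }
        </preformat>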
        <sec id="sec-3-2-1">
          <title>1 http://vk.com</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>2 https://linis.hse.ru/soft-linis</title>
          <p>As the classifier, we use two types of neural networks with architectures
traditional for the text classification task. To evaluate the classifiers we use
precision, recall, and F-measure, computed as sketched below.</p>
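          <p>A minimal evaluation sketch follows; the use of scikit-learn and macro averaging is our assumption for the example, not a prescribed part of the method.</p>
          <preformat>
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Macro-averaged precision, recall and F-measure plus accuracy."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy_score(y_true, y_pred)}
          </preformat>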
          <p>[Table 1. Directions of the intention. Table 2. Intention types grouped into supertypes according to Habermas.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Deep Learning Architecture</title>
      <p>In this section, we briefly describe our choice of architecture, the regularization
method and the training algorithm.</p>
      <sec id="sec-4-1">
        <title>Embeddings</title>
        <p>The idea of representing each word in a text, or whole texts, as vectors has
come to occupy a central place in modern methodologies of text analysis. Such
vectors representing words are called embeddings and can be trained
with word2vec, GloVe, etc. [7, 9]. In general, these vectors can be trained on the
given dataset; however, the dataset needs to be quite big to provide sensible results,
e.g. the Russian Wikipedia, which includes about 600 million words, can be used.
Although our final dataset contains 21192 texts, there are also 100 classes, which
means that there is not enough data to train our own word vectors. That is why it
was decided to use an existing embedding collection3 trained on the Ruscorpora
[6].</p>
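        <p>For illustration, such a pre-trained collection can be loaded, for example, with gensim; the file name below is only a placeholder, and the exact model must be taken from the embedding collection referenced above.</p>
        <preformat>
from gensim.models import KeyedVectors

# Load a pre-trained model in word2vec format; the file name is a placeholder.
w2v = KeyedVectors.load_word2vec_format("ruscorpora_model.bin.gz", binary=True)

# Keys in such models combine a lemma with its part-of-speech tag;
# the exact key format depends on the chosen model.
token = "человек_NOUN"
vector = w2v[token] if token in w2v else None
        </preformat>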
        <p>However, exploration of these embeddings revealed the need to
preprocess each word of the source dataset so that it has a matching
vector in the embedding collection. In particular, each word has to be in the form
"initial form + part-of-speech tag", e.g. "ran" should be translated to "run Verb".
This processing required an automatic tool capable of performing
morphological analysis of each word in a given text. As part of this research,
MyStem4, a tool developed by Yandex, let us produce the necessary word forms and
reuse the already trained embeddings.</p>
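        <p>A minimal preprocessing sketch using the pymystem3 wrapper for MyStem is shown below; the wrapper choice and the coarse tag extraction are our assumptions, and the resulting tag set must match the one expected by the chosen embedding collection.</p>
        <preformat>
from pymystem3 import Mystem

mystem = Mystem()

def to_embedding_tokens(text):
    """Lemmatize the text and append a coarse POS tag to each lemma,
    e.g. a verb lemma becomes "lemma_V" (MyStem tag set)."""
    tokens = []
    for item in mystem.analyze(text):
        analyses = item.get("analysis")
        if not analyses:
            continue  # whitespace, punctuation and unknown tokens
        lemma = analyses[0]["lex"]
        pos = analyses[0]["gr"].split("=")[0].split(",")[0]
        tokens.append(lemma + "_" + pos)
    return tokens
        </preformat>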
      </sec>
      <sec id="sec-4-2">
        <title>Convolution Layers</title>
        <p>Convolutional neural networks are state-of-the-art semantic composition models
for the text classification task [13]. We use three series-connected convolution cells
with max pooling, along the lines of the sketch below.</p>
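        <p>A sketch of such a network in Keras is given below; the library, filter sizes and filter counts are illustrative assumptions rather than the exact configuration we used.</p>
        <preformat>
from tensorflow.keras import layers, models

def build_cnn(vocab_size, embedding_dim, max_len, num_classes, embedding_matrix=None):
    """Three stacked convolution blocks with max pooling for text classification."""
    weights = [embedding_matrix] if embedding_matrix is not None else None
    return models.Sequential([
        layers.Embedding(vocab_size, embedding_dim, input_length=max_len, weights=weights),
        layers.Conv1D(128, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dropout(0.5),                              # see Section 4.4
        layers.Dense(num_classes, activation="softmax"),
    ])
        </preformat>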
      </sec>
      <sec id="sec-4-3">
        <title>Recurrent Layers</title>
        <p>Recurrent layers have proved useful for handling variable-length sequences
[13]. We use two series-connected long short-term memory (LSTM) cells to
compute continuous representations of messages with semantic composition; a sketch follows.</p>
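        <p>A corresponding Keras sketch with two stacked LSTM cells is shown below; again, the library and layer sizes are illustrative assumptions.</p>
        <preformat>
from tensorflow.keras import layers, models

def build_lstm(vocab_size, embedding_dim, max_len, num_classes, embedding_matrix=None):
    """Two stacked LSTM cells; the first returns the full sequence
    so that the second can consume it."""
    weights = [embedding_matrix] if embedding_matrix is not None else None
    return models.Sequential([
        layers.Embedding(vocab_size, embedding_dim, input_length=max_len, weights=weights),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        layers.Dropout(0.5),                              # see Section 4.4
        layers.Dense(num_classes, activation="softmax"),
    ])
        </preformat>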
      </sec>
      <sec id="sec-4-4">
        <title>Regularization</title>
        <p>We use dropout as the regularizer to prevent our network from overfitting [11].
Our dropout layer selects half of the hidden units at random and sets their
output to zero, which prevents co-adaptation of the features.</p>
        <sec id="sec-4-4-1">
          <title>3 http://rusvectores.org/en/models</title>
        </sec>
        <sec id="sec-4-4-2">
          <title>4 https://tech.yandex.ru/mystem/</title>
          <p>[Fig. 1: (a) structure of the LSTM network; (b) structure of the CNN network.]</p>
          <p>We initialize our embedding layer with vectors pre-trained on the
Russian National Corpus and the Russian Wikipedia. The other layers of our neural
networks were initialized randomly. We then trained the networks on the training subsets
using the Adam method for stochastic optimization of the objective function.</p>
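          <p>The sketch below illustrates this initialization and training scheme; the vocabulary handling, dimensions and hyperparameters are illustrative assumptions rather than the exact settings used in the experiments.</p>
          <preformat>
import numpy as np

def build_embedding_matrix(word_index, w2v, embedding_dim):
    """Copy rows for words found in the pre-trained model;
    all other rows keep their random initialization."""
    matrix = np.random.uniform(-0.05, 0.05, (len(word_index) + 1, embedding_dim))
    for word, idx in word_index.items():
        if word in w2v:
            matrix[idx] = w2v[word]
    return matrix

# matrix = build_embedding_matrix(word_index, w2v, 300)
# model = build_lstm(len(word_index) + 1, 300, max_len, num_classes, embedding_matrix=matrix)
# model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=32)
          </preformat>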
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>In this section, the experimental results are presented in Table 3 together
with detailed explanations.</p>
      <p>The experiments focus on solving the classification task with the two models
(CNN, Fig. 1b, and LSTM, Fig. 1a).
Firstly, both models were used to predict the combined class of the intention of a given
text, and the result is quite poor for both models (accuracy less than 0.05). This
supports the hypothesis that the dataset used is too small (21192 texts) for such
a huge number of classes (100 classes). Considering this fact, it was decided to
continue the experiments with fewer classes. We elaborated three strategies
to overcome this obstacle.</p>
      <p>The first strategy was to predict only the intention type. Instead of
trying to predict one of 100 classes, we can try to predict only 25, based on
the general classification of intention types (Table 2). This experiment brought
slightly better results, although the top accuracy was still lower than 0.1.
In the second strategy, we attempted to predict only the direction of the
intention; in this case, the models had to predict one of the 4 classes of intention
direction (Table 1). The validation of the second strategy confirmed that
the main reason for the low prediction performance in the previous attempts was the large
mismatch between the dataset size and the number of classes to predict. The
final strategy was to predict the intention by its supertype. Indeed,
the intention classification by Habermas (Table 2) provides a way of joining
the intention types into 5 supertypes. The results of this experiment showed that,
when trying to predict one of the 5 supertypes, both models can correctly classify
the intentions of roughly every third text.</p>
      <p>Taken together, the interim results lead us to the conclusion that the LSTM
model outperforms the CNN model in almost all cases, although it usually
takes much more time to train the LSTM model even for several epochs. This suggests
that LSTM is a more suitable architecture for text
classification tasks, but quite expensive from the computational standpoint.</p>
      <p>Finally, the creation of the first dataset in the Russian language labeled
according to specific intention types and directions is extremely important.
The dataset contains 21992 items, which represent 100 classes. The trained
model and the source code are also publicly available for use and enhancement5
under a business-friendly license (MIT).</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The aim of this research was to find a model suitable for predicting the intentions
that users express in discussions in social networks. We have prepared
a new dataset by collecting discussions from a social network and annotating
the remarks of each discussion. The annotation consists of two types of labels for
each message: intention and direction of intention. All discussions were dedicated
to political topics. Using this dataset and pre-trained word embeddings, we have
built two neural network models to automatically predict the intention of
an arbitrary message from any social network user. Experimental results showed
that the LSTM-based model obtains better results. Classification
by the direction of intention showed the best accuracy. We explain this not only
by the low number of classes, but also by the fact that directions are often
expressed with explicit words.</p>
      <sec id="sec-6-1">
        <title>5 https://github.com/demid5111/intentions-analysis</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The article was prepared within the framework of the Academic Fund Program
at the National Research University Higher School of Economics (HSE) in 2017
(grant N17-05-0007) and by the Russian Academic Excellence Project "5-100".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. John Langshaw Austin.
          <article-title>How to do things with words</article-title>
          . Oxford University Press,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Radina Nadezhda</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Intent analysis of online discussions (using examples from the internet portal inosmi.ru)</article-title>
          .
          <source>Mediascope, (4)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Karpov</surname>
          </string-name>
          .
          <article-title>NRU-HSE at SemEval-2017 Task 4: Tweet quantification using deep learning architecture</article-title>
          .
          <source>In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          , pages
          <fpage>681</fpage>
          {
          <fpage>686</fpage>
          , Vancouver, Canada,
          <year>August 2017</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Karpov</surname>
          </string-name>
          , Julia Baranova, and
          <string-name>
            <given-names>Fedor</given-names>
            <surname>Vitugin</surname>
          </string-name>
          .
          <article-title>Single-sentence readability prediction in Russian</article-title>
          .
          <source>In International Conference on Analysis of Images, Social Networks and Texts</source>
          , pages
          <fpage>91</fpage>
          {
          <fpage>100</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Karpov</surname>
          </string-name>
          , Alexander Porshnev, and
          <string-name>
            <given-names>Kirill</given-names>
            <surname>Rudakov</surname>
          </string-name>
          .
          <article-title>NRU-HSE at SemEval-2016 Task 4: Comparative Analysis of Two Iterative Methods Using Quantification Library</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          , pages
          <fpage>171</fpage>
          {
          <fpage>177</fpage>
          , San Diego, California, June 2016.
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Andrey Kutuzov and Elizaveta Kuzmenko. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models, pages 155-161. Springer International Publishing, Cham, 2017.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. M. Yu. Oleshkov. Simulation of the communication process: monograph. Nizhny Tagil gos. sots.-ped. akademiya (et al.), 2006.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In EMNLP, volume 14, pages 1532-1543, 2014.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. John R. Searle. Speech acts: An essay in the philosophy of language, volume 626. Cambridge University Press, 1969.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Nitish Srivastava. Improving neural networks with dropout. PhD thesis, University of Toronto, 2013.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Dario Stojanovski, Gjorgji Strezoski, Gjorgji Madjarov, and Ivica Dimitrovski. Finki at SemEval-2016 Task 4: Deep learning architecture for Twitter sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 149-154, San Diego, California, June 2016. Association for Computational Linguistics.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Duyu Tang, Bing Qin, and Ting Liu. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, pages 1422-1432, 2015.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Latynov V.V., Cepcov V.A., and Alexeev K.I. Words in action: Intent-analysis of political discourse. 2000.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>