<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Development of a Model to Predict Intention Using Deep Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikolay Karpov</string-name>
          <email>nkarpov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Demidovskij</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey Malafeev</string-name>
          <email>amalafeev@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a method to analyze discussions from a social network using deep learning. We have prepared a new dataset by collecting discussions from a social network and annotating the remarks of each discussion. The annotation consists of two types of labels for each message: the intention type and the direction of the intention. Using this dataset and pre-trained word embeddings, we have evaluated two neural network architectures. On the basis of this evaluation, we chose a model to automatically predict the intention type and direction of intention of an arbitrary message from any social network.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>intention analysis</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>There is currently a growing interest in social network analysis due to the
expanded role of social networks. A large share of communication and collaboration
now happens inside them: as the majority of people have at least one social
network account and communicate there by exchanging text messages, it is extremely important
to be able to analyze this type of data and reveal its hidden properties.</p>
      <p>One of the most popular formats of communication in a social network is the
post: an arbitrary message expressing the thoughts and ideas of its author.
Such a post usually appeals to people's emotions, and the
audience starts to discuss it actively by adding more and more comments. A
significant peculiarity of such discussions is that their topic
usually changes very fast and, by the end, is often no longer connected with the subject
of the source post.</p>
      <p>Why is it so important to predict the intention of a given text? The idea of
manipulating a discussion and the message it carries is currently an active area
of research in the field of political linguistics. Indeed, it is extremely important
to make sure that a presidential candidate's dialog with
the audience conveys the right message and is not manipulated. Taking into account
the variability of discussion topics described earlier, it is vital to make sure that
the discussion stays on the path intended by the author of the post.
From our point of view, the first step in solving this task is to be able to
predict the intention of the speaker automatically.</p>
      <p>At the same time, a growing number of publications on the analysis of
Internet texts reveals quite interesting properties of modern texts. In
particular, every speech act has its own intention: the aim and will to express
some idea. In addition, each intention has its own direction, which means
that any phrase of a social network user can be directed towards the post's author,
towards the speaker himself, etc. More importantly, thanks to the huge amount of such texts, it is quite
easy to collect a large dataset, which is vital for modern methods of data analysis.</p>
      <p>While intent analysis seems to be gaining popularity, there is a
gap: mathematical models and applied instruments for automated
prediction of intention are lacking. Although there are existing studies on building text
classifiers, the task of predicting the intention of an arbitrary given text remains
unaccomplished.</p>
      <p>We consider building a machine learning algorithm to predict intentions in
a social network. Our main contributions are the following:
1. We built a specific dataset in which each remark of a discussion is annotated;
2. The annotation consists of two types of labels for each message: intention and
direction of intention;
3. We successfully applied a machine learning algorithm to predict intention and
direction of intention.</p>
      <p>The remaining part of the paper is organized as follows. In Section 2 we
give a detailed overview of existing approaches and significant theoretical
trends. In Section 3 we describe the experimental methodology we
elaborated. A detailed overview of the deep learning
architecture is given in Section 4. The results of the proposed approach
are presented in Section 5. Finally, we draw conclusions and discuss further
research directions in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>Our research has two foundations. First, it is based on research in the
psychology of communication. Second, we use modern methods for automatic
natural language processing.</p>
      <p>
        Let us start with psychology. Early works go back at least to the sixties of the last
century, when J. Austin and J. Searle [
        <xref ref-type="bibr" rid="ref1">1, 10</xref>
        ] developed the theory of speech acts.
J. Austin classifies speech acts into three types: locutionary, illocutionary and
perlocutionary acts. Our research deals only with illocutionary acts in the field
of political discourse. Like some other researchers [14], we use a finite set of
intention types. M. Yu. Oleshkov proposed to typify not individual speech acts, but
the communicative strategy as the general goal of a speech act [8]. We use his
set of intention types with the additions proposed by N. K. Radina [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The task of automatically identifying the type of communicative strategy from
a finite set of types can be formulated as a classification problem. The classification
of short messages is a well-known problem in the natural language
processing field, traditionally solved using machine learning
approaches. For instance, sentences can be classified according to their
readability using prebuilt features and SVM, Random Forest and other classifiers
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Short messages from social networks can be classified according to their
sentiment polarity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The recent success of neural network applications showed that transfer learning
can improve on classical machine learning methods [
        <xref ref-type="bibr" rid="ref3">12, 3</xref>
        ]. This is especially
important when the training dataset is small. That is why in our study we
decided to use neural networks as classifiers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Experiment Methodology</title>
      <sec id="sec-3-1">
        <title>Dataset Creation</title>
        <p>One of the key elements of our research was the creation of an appropriate dataset.
It was decided to use the most popular Russian social network, VKontakte1, as
the source of discussions and texts. Raw data was downloaded using the VKMiner
program2. We then performed rigorous filtering of these discussions: in
particular, all discussions with fewer than 40 remarks were rejected, as
well as meaningless comments, e.g. empty messages or photos instead of texts.</p>
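        <p>To illustrate the filtering step, a minimal Python sketch is given below. The data layout (a discussion as a list of comment dictionaries with a "text" field) is our own assumption for the example and does not reflect the exact VKMiner export format.</p>
        <preformat>
# Minimal sketch of the discussion filtering described above.
# The data layout (list of comment dicts with a "text" field) is hypothetical.
MIN_COMMENTS = 40

def is_meaningful(comment):
    """Reject empty messages and attachment-only comments (e.g. photos)."""
    text = (comment.get("text") or "").strip()
    return len(text) > 0

def filter_discussions(discussions):
    kept = []
    for discussion in discussions:
        comments = [c for c in discussion if is_meaningful(c)]
        if len(comments) >= MIN_COMMENTS:   # keep only long discussions
            kept.append(comments)
    return kept
        </preformat>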
        <p>
          There is no doubt that expert labeling of each text according to the
intention class it represents is important [8]. Moreover, as we have already
mentioned, it is vital to also mark the direction of the intention [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. That is
why during the labeling process the experts used both a letter, which represents
the intention type, and a digit, which represents the direction of the given
intention. Such labeling shifts the focus from the traditional methodology of intention
analysis to a hybrid one, which makes it possible to classify a text
very precisely from the intentional point of view.
        </p>
        <p>How the given dataset was used to build the automatic intention
classifier is described in the next section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Classification Settings</title>
        <p>
          To predict intentions automatically from a finite set of intentions, we need to
create a classifier of user messages. M. Yu. Oleshkov proposed 25 intention types
[8]. N. K. Radina proposed to group these intention types into 5 supertypes [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
following Habermas, as shown in Table 2. Additionally, we have 4 directions
of the intention, shown in Table 1. Therefore, we have the following classification targets
(see the sketch below):
- 25 x 4 = 100 combined intention types and directions;
- 25 intention types;
- 5 intention supertypes;
- 4 directions of the intention.
        </p>
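        <p>The sketch below shows how the four classification targets can be derived from the two annotations of a single remark. The concrete intention letters, the grouping into supertypes and the direction codes are defined in Tables 1 and 2; the identifiers used here are placeholders.</p>
        <preformat>
# Sketch of the four classification targets derived from the annotation.
# The real intention letters, supertype grouping and direction codes come
# from Tables 1 and 2; placeholder values are used here.
INTENTION_TYPES = [chr(ord("A") + i) for i in range(25)]            # 25 intention types
DIRECTIONS = ["0", "1", "2", "3"]                                   # 4 directions
SUPERTYPE_OF = {t: i % 5 for i, t in enumerate(INTENTION_TYPES)}    # placeholder grouping into 5 supertypes

def targets(intention, direction):
    """Map one annotated remark to the four target label spaces."""
    return {
        "type_and_direction": intention + direction,  # 25 x 4 = 100 classes
        "type": intention,                            # 25 classes
        "supertype": SUPERTYPE_OF[intention],         # 5 classes
        "direction": direction,                       # 4 classes
    }
        </preformat>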
        <sec id="sec-3-2-1">
          <title>1 http://vk.com</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>2 https://linis.hse.ru/soft-linis</title>
          <p>As the classifier, we use two types of neural networks with architectures
traditional for the text classification task. To evaluate the classifiers we use
precision, recall, and F-measure, computed as sketched below.</p>
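          <p>A minimal evaluation sketch follows; the use of scikit-learn and macro averaging is our assumption for the example, not a prescribed part of the method.</p>
          <preformat>
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Macro-averaged precision, recall and F-measure plus accuracy."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy_score(y_true, y_pred)}
          </preformat>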
          <p>[Table 1. Directions of the intention. Table 2. Intention types grouped into supertypes according to Habermas.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Deep Learning Architecture</title>
      <p>In this section, we briefly describe our choice of architecture, the regularization
method and the training algorithm.</p>
      <sec id="sec-4-1">
        <title>Embeddings</title>
        <p>The idea of representing each word in a text, or whole texts, as vectors has
come to occupy a central place in modern methodologies of text analysis. Such
vectors representing words are called embeddings and can be trained
with word2vec, GloVe, etc. [7, 9]. In general, these vectors can be trained on the
given dataset; however, the dataset needs to be quite big to provide sensible results,
e.g. the Russian Wikipedia, which includes about 600 million words, can be used.
Although our final dataset contains 21192 texts, there are also 100 classes, which
means that there is not enough data to train our own word vectors. That is why it
was decided to use an existing embedding collection3 trained on the Ruscorpora
[6].</p>
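        <p>For illustration, such a pre-trained collection can be loaded, for example, with gensim; the file name below is only a placeholder, and the exact model must be taken from the embedding collection referenced above.</p>
        <preformat>
from gensim.models import KeyedVectors

# Load a pre-trained model in word2vec format; the file name is a placeholder.
w2v = KeyedVectors.load_word2vec_format("ruscorpora_model.bin.gz", binary=True)

# Keys in such models combine a lemma with its part-of-speech tag;
# the exact key format depends on the chosen model.
token = "человек_NOUN"
vector = w2v[token] if token in w2v else None
        </preformat>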
        <p>However, exploration of these embeddings revealed the need to
preprocess each word of the source dataset so that it has a matching
vector in the embedding collection. In particular, each word has to be in the form
"initial form + part-of-speech tag", e.g. "ran" should be translated to "run Verb".
This processing required an automatic tool capable of performing
morphological analysis of each word in a given text. As part of this research,
MyStem4, a tool developed by Yandex, let us produce the necessary word forms and
reuse the already trained embeddings.</p>
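        <p>A minimal preprocessing sketch using the pymystem3 wrapper for MyStem is shown below; the wrapper choice and the coarse tag extraction are our assumptions, and the resulting tag set must match the one expected by the chosen embedding collection.</p>
        <preformat>
from pymystem3 import Mystem

mystem = Mystem()

def to_embedding_tokens(text):
    """Lemmatize the text and append a coarse POS tag to each lemma,
    e.g. a verb lemma becomes "lemma_V" (MyStem tag set)."""
    tokens = []
    for item in mystem.analyze(text):
        analyses = item.get("analysis")
        if not analyses:
            continue  # whitespace, punctuation and unknown tokens
        lemma = analyses[0]["lex"]
        pos = analyses[0]["gr"].split("=")[0].split(",")[0]
        tokens.append(lemma + "_" + pos)
    return tokens
        </preformat>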
      </sec>
      <sec id="sec-4-2">
        <title>Convolution Layers</title>
        <p>Convolutional neural networks are state-of-the-art semantic composition models
for the text classification task [13]. We use three series-connected convolution cells
with max pooling, along the lines of the sketch below.</p>
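        <p>A sketch of such a network in Keras is given below; the library, filter sizes and filter counts are illustrative assumptions rather than the exact configuration we used.</p>
        <preformat>
from tensorflow.keras import layers, models

def build_cnn(vocab_size, embedding_dim, max_len, num_classes, embedding_matrix=None):
    """Three stacked convolution blocks with max pooling for text classification."""
    weights = [embedding_matrix] if embedding_matrix is not None else None
    return models.Sequential([
        layers.Embedding(vocab_size, embedding_dim, input_length=max_len, weights=weights),
        layers.Conv1D(128, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dropout(0.5),                              # see Section 4.4
        layers.Dense(num_classes, activation="softmax"),
    ])
        </preformat>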
      </sec>
      <sec id="sec-4-3">
        <title>Recurrent Layers</title>
        <p>Recurrent layers have proved useful for handling variable-length sequences
[13]. We use two series-connected long short-term memory (LSTM) cells to
compute continuous representations of messages with semantic composition; a sketch follows.</p>
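        <p>A corresponding Keras sketch with two stacked LSTM cells is shown below; again, the library and layer sizes are illustrative assumptions.</p>
        <preformat>
from tensorflow.keras import layers, models

def build_lstm(vocab_size, embedding_dim, max_len, num_classes, embedding_matrix=None):
    """Two stacked LSTM cells; the first returns the full sequence
    so that the second can consume it."""
    weights = [embedding_matrix] if embedding_matrix is not None else None
    return models.Sequential([
        layers.Embedding(vocab_size, embedding_dim, input_length=max_len, weights=weights),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        layers.Dropout(0.5),                              # see Section 4.4
        layers.Dense(num_classes, activation="softmax"),
    ])
        </preformat>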
      </sec>
      <sec id="sec-4-4">
        <title>Regularization</title>
        <p>We use dropout as the regularizer to prevent our network from overfitting [11].
Our dropout layer selects half of the hidden units at random and sets their
output to zero, which prevents co-adaptation of the features.</p>
        <sec id="sec-4-4-1">
          <title>3 http://rusvectores.org/en/models</title>
        </sec>
        <sec id="sec-4-4-2">
          <title>4 https://tech.yandex.ru/mystem/</title>
          <p>[Fig. 1: (a) structure of the LSTM network; (b) structure of the CNN network.]</p>
          <p>We initialize our embedding layer with vectors pre-trained on the
Russian National Corpus and the Russian Wikipedia. The other layers of our neural
networks were initialized randomly. We then trained the networks on the training subsets
using the Adam method for stochastic optimization of the objective function.</p>
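          <p>The sketch below illustrates this initialization and training scheme; the vocabulary handling, dimensions and hyperparameters are illustrative assumptions rather than the exact settings used in the experiments.</p>
          <preformat>
import numpy as np

def build_embedding_matrix(word_index, w2v, embedding_dim):
    """Copy rows for words found in the pre-trained model;
    all other rows keep their random initialization."""
    matrix = np.random.uniform(-0.05, 0.05, (len(word_index) + 1, embedding_dim))
    for word, idx in word_index.items():
        if word in w2v:
            matrix[idx] = w2v[word]
    return matrix

# matrix = build_embedding_matrix(word_index, w2v, 300)
# model = build_lstm(len(word_index) + 1, 300, max_len, num_classes, embedding_matrix=matrix)
# model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=32)
          </preformat>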
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>In this section, the experimental results are presented in Table 3 together
with detailed explanations.</p>
      <p>The experiments focus on solving the classification task with the two models
(CNN, Fig. 1b, and LSTM, Fig. 1a).
Firstly, both models were used to predict the combined class of the intention of a given
text, and the result is quite poor for both models (accuracy less than 0.05). This
supports the hypothesis that the dataset used is too small (21192 texts) for such
a huge number of classes (100 classes). Considering this fact, it was decided to
continue the experiments with fewer classes. We elaborated three strategies
to overcome this obstacle.</p>
      <p>The first strategy was to predict only the intention type. Instead of
trying to predict one of 100 classes, we can try to predict only 25, based on
the general classification of intention types (Table 2). This experiment brought
slightly better results, although the top accuracy was still lower than 0.1.
In the second strategy, we attempted to predict only the direction of the
intention; in this case, the models had to predict one of the 4 classes of intention
direction (Table 1). The validation of the second strategy confirmed that
the main reason for the low prediction performance in the previous attempts was the large
mismatch between the dataset size and the number of classes to predict. The
final strategy was to predict the intention by its supertype. Indeed,
the intention classification by Habermas (Table 2) provides a way of joining
the intention types into 5 supertypes. The results of this experiment showed that,
when trying to predict one of the 5 supertypes, both models can correctly classify
the intentions of roughly every third text.</p>
      <p>Taken together, the interim results lead us to the conclusion that the LSTM
model outperforms the CNN model in almost all cases, although it usually
takes much more time to train the LSTM model even for several epochs. This suggests
that LSTM is a more suitable architecture for text
classification tasks, but quite expensive from the computational standpoint.</p>
      <p>Finally, the creation of the first dataset in the Russian language labeled
according to specific intention types and directions is extremely important.
The dataset contains 21992 items, which represent 100 classes. The trained
model and the source code are also publicly available for use and enhancement5
under a business-friendly license (MIT).</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The aim of this research was to find a model suitable for predicting the intentions
that users express in discussions in social networks. We have prepared
a new dataset by collecting discussions from a social network and annotating
the remarks of each discussion. The annotation consists of two types of labels for
each message: intention and direction of intention. All discussions were dedicated
to political topics. Using this dataset and pre-trained word embeddings, we have
built two neural network models to automatically predict the intention of
an arbitrary message from any social network user. Experimental results showed
that the LSTM-based model obtains better results. Classification
by the direction of intention showed the best accuracy. We explain this not only
by the low number of classes, but also by the fact that directions are often
expressed with explicit words.</p>
      <sec id="sec-6-1">
        <title>5 https://github.com/demid5111/intentions-analysis</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The article was prepared within the framework of the Academic Fund Program
at the National Research University Higher School of Economics (HSE) in 2017
(grant N17-05-0007) and by the Russian Academic Excellence Project "5-100".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. John Langshaw Austin.
          <article-title>How to do things with words</article-title>
          . Oxford University Press,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Radina Nadezhda</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Intent analysis of online discussions (using examples from the internet portal inosmi.ru)</article-title>
          .
          <source>Mediascope, (4)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Karpov</surname>
          </string-name>
          .
          <article-title>NRU-HSE at SemEval-2017 Task 4: Tweet quantification using deep learning architecture</article-title>
          .
          <source>In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          , pages
          <fpage>681</fpage>
          {
          <fpage>686</fpage>
          , Vancouver, Canada,
          <year>August 2017</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Karpov</surname>
          </string-name>
          , Julia Baranova, and
          <string-name>
            <given-names>Fedor</given-names>
            <surname>Vitugin</surname>
          </string-name>
          .
          <article-title>Single-sentence readability prediction in Russian</article-title>
          .
          <source>In International Conference on Analysis of Images, Social Networks and Texts</source>
          , pages
          <fpage>91</fpage>
          {
          <fpage>100</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Karpov</surname>
          </string-name>
          , Alexander Porshnev, and
          <string-name>
            <given-names>Kirill</given-names>
            <surname>Rudakov</surname>
          </string-name>
          .
          <article-title>NRU-HSE at SemEval-2016 Task 4: Comparative Analysis of Two Iterative Methods Using Quantification Library</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          , pages
          <fpage>171</fpage>
          {
          <fpage>177</fpage>
          , San Diego, California, June 2016.
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Andrey Kutuzov and Elizaveta Kuzmenko. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models, pages 155-161. Springer International Publishing, Cham, 2017.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. M. Yu. Oleshkov. Simulation of the communication process: monograph. Nizhny Tagil gos. sots.-ped. akademiya (et al.), 2006.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In EMNLP, volume 14, pages 1532-1543, 2014.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. John R. Searle. Speech acts: An essay in the philosophy of language, volume 626. Cambridge University Press, 1969.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Nitish Srivastava. Improving neural networks with dropout. PhD thesis, University of Toronto, 2013.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Dario Stojanovski, Gjorgji Strezoski, Gjorgji Madjarov, and Ivica Dimitrovski. Finki at SemEval-2016 Task 4: Deep learning architecture for Twitter sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 149-154, San Diego, California, June 2016. Association for Computational Linguistics.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Duyu Tang, Bing Qin, and Ting Liu. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, pages 1422-1432, 2015.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Latynov V.V., Cepcov V.A., and Alexeev K.I. Words in action: Intent-analysis of political discourse. 2000.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>