<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analytical Review of Methods for Identifying Emotions in Text Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexey Karpov</string-name>
          <email>karpov@iias.spb.su</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oxana Verkholyak</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The sentiment analysis of text is one of the important tasks in the field of natural language processing. It is used in different areas. Despite the variety of existing methods, the systems of sentiment analysis of Russian-language texts give low accuracy compared to English-language ones. The article discusses basic methods for identifying emotions in text data and methods of text vectorization. The existing achievements in the field of computer sentiment analysis are analyzed. At the moment, there are many unsolved problems in the field of automatic sentiment analysis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Nowadays, a huge stream of information passes through the Internet, including the
communication of network users. There are many open text sources that display data on people’s
opinions on various issues. To get more complete statistics of opinions, an analysis of the
tonality of the text is necessary.</p>
      <p>Sentiment analysis is a field of computational linguistics and text mining
focused on extracting subjective opinions and emotions from text [Minakov, 2013].
Sentiment analysis finds practical application in many areas: assessing the quality of goods and services
based on customer reviews on the Internet, analyzing negative emotions in messages, and
forecasting stock markets and political situations based on news feeds [Romanov et al, 2018].
Sentiment analysis is also necessary in automated systems in which a person communicates with
a machine in natural language, for example, to analyze message histories [Bondareva and
Lagerev, 2018]. To analyze such volumes of information, various
methods for automatically determining the tonality of a text have been proposed in recent years;
they are considered in this article.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods of sentiment analysis</title>
      <p>There are several basic methods for determining the tonality of a text [Zvereva, 2014]. All of
them can be divided into several categories (Figure 1).</p>
      <p>A description of each method is provided below.</p>
      <sec id="sec-2-2">
        <title>1. Linguistic methods.</title>
        <p>(a) Method based on tonal dictionaries.</p>
        <p>A tonal dictionary is a set of words or bigrams, each assigned a weight
that expresses how strongly it belongs to the positive class (positive weight) or
the negative class (negative weight). The range of weights depends on the dictionary. When
analyzing a text, each word is looked up in the dictionary and its weight is recorded.
If the word is not in the dictionary, its class is considered neutral and its weight
is zero. Once all the weights are obtained, the membership of the text
in a tonality class is calculated. Most often the arithmetic mean of the weights is
used; less commonly, the sum of the weights or an artificial
neural network is used. The method based on tonal dictionaries was used in
[Tutubalina et al, 2015] [Posevkin and Bessmertniy, 2015].</p>
        <p>(b) Method based on rules.</p>
        <p>This method requires a large set of “if ... then” rules. For
example, if the particle “not” stands before an adjective with positive coloring, then
the construction can be classified as negative. This method also relies on
tonal dictionaries, in which each word belongs to a certain class (positive, negative,
neutral, etc.). Rule-based sentiment analysis was applied
in [Kan, 2011] [Panicheva, 2013].</p>
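        <p>The two linguistic methods above can be illustrated with a minimal sketch. It assumes a toy dictionary whose words and weights are invented for illustration, and a single hand-written rule in which a negation particle flips the sign of the next tonal word; real systems use far larger dictionaries and rule sets.</p>
        <preformat>
```python
# Hypothetical toy lexicon: the words and weights are illustrative only.
TONAL_DICTIONARY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "no"}


def word_weights(text):
    """Return one weight per token; unknown words are neutral (0.0)."""
    weights = []
    negate_next = False
    for token in text.lower().split():
        token = token.strip(".,!?")
        if token in NEGATIONS:
            # Rule (assumed for this sketch): a negation particle flips
            # the sign of the next tonal word.
            negate_next = True
            weights.append(0.0)
            continue
        weight = TONAL_DICTIONARY.get(token, 0.0)
        if negate_next and weight != 0.0:
            weight = -weight
            negate_next = False
        weights.append(weight)
    return weights


def classify(text):
    """Classify a text by the arithmetic mean of its word weights."""
    weights = word_weights(text)
    mean = sum(weights) / len(weights) if weights else 0.0
    if mean > 0:
        return "positive"
    if 0 > mean:
        return "negative"
    return "neutral"


print(classify("The film is good"))
print(classify("The film is not good"))
```
</preformat>
        <p>The arithmetic mean is used here because the text names it as the most common aggregation; replacing it with the sum of the weights changes only the scale of the score, not its sign.</p>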
      </sec>
      <sec id="sec-2-3">
        <title>2. Machine learning methods.</title>
        <p>(a) Supervised learning.</p>
        <p>The method is based on training the classifier on pre-annotated training text data
[Kormalev, 2004] [Kotel’nikov and Klekovkina, 2012]. The most common methods
in the field of tonal analysis are the naive Bayesian classifier and the support vector
machine method.</p>
        <p>The naive Bayes classifier (NB) is a probabilistic classifier based on the
application of Bayes' theorem with the assumption that the features are conditionally
independent given the class. It is used
in [Lewis,1998]. The support vector machine (SVM) is a linear
classifier. The main idea of the method is to construct a hyperplane that separates
the sample objects in the optimal way. The algorithm works under the
assumption that the greater the distance (margin) between the separating hyperplane
and the nearest objects of each class, the smaller the average error of the classifier
will be. The authors of [Zainuddin et al, 2014] used the support vector machine to
determine the tonality of text.</p>
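        <p>A minimal sketch of a naive Bayes text classifier follows. It assumes the multinomial variant with add-one (Laplace) smoothing and whitespace tokenization; the training texts and labels are invented for illustration and are not taken from any of the cited works.</p>
        <preformat>
```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Collect class frequencies, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        # Log prior of the class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training set, for illustration only.
train_texts = ["good great film", "great acting", "bad boring film", "terrible plot"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "great film"))
```
</preformat>
        <p>Summing logarithms instead of multiplying probabilities is a standard way to avoid numerical underflow on long texts.</p>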
        <p>Neural networks are sets of structured computational elements that mimic the
functioning of the human brain [Barskij, 2004]. They model the
relationships between input and output data. In sentiment analysis,
the most common architectures are convolutional neural networks (CNN)
[Spiros et al, 2018] and recurrent neural networks (RNN), such as
long short-term memory (LSTM) networks [Ben Amar et al, 2018] and gated recurrent
units (GRU) [Aken et al, 2018].</p>
        <p>(b) Unsupervised learning.</p>
        <p>In contrast to the above method, the unsupervised learning method determines
the relationship and patterns between objects without labeled data [Voronina and
Goncharov, 2015]. Such methods include the Gaussian mixture model and k-nearest
neighbors.</p>
        <p>k-nearest neighbors: the algorithm searches for the k training samples whose distance
to the given sample is minimal. The class that occurs most often among these
k objects is assigned to the object of interest. This method was used in the
article [Prabin Lama, 2013].</p>
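        <p>The neighbour-voting procedure described above can be sketched as follows. The sketch assumes Euclidean distance and numeric feature vectors; the function names are hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, sample, k=3):
    """Assign the class that is most frequent among the k closest samples."""
    # Sort training samples by distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], sample),
    )[:k]
    # The most frequent class among the k neighbours wins.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors, for illustration only.
vectors = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
classes = ["pos", "pos", "pos", "neg", "neg", "neg"]
print(knn_predict(vectors, classes, (2, 2)))
```
</preformat>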
        <p>The Gaussian mixture model (GMM) is a probabilistic model that assumes that all data
points are generated from a mixture of a finite number of Gaussian distributions
with unknown parameters. The authors of [Pribill et al, 2014] used the GMM
in their work.</p>
        <p>3. Hybrid methods.</p>
        <p>Hybrid methods combine several of the methods described above [Pazel’skaya
and Solov’ev, 2011], [Krasnikov and Nikulichev, 2013]. In the article [Konig and Brill,
2006] a hybrid method that combines
a method based on tonal dictionaries with a support vector machine was used for text classification. The authors
achieved 72% accuracy using this method.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Vector representation of documents</title>
      <p>Before a machine classifier can be used, the text must be represented in numerical form (feature
extraction). There are several ways to vectorize text [Hassaan Saeed, 2018], [Dhingra et al,
2017]; they are presented in Figure 2.</p>
      <p>Bag of Words (BoW) is a model that represents a text as an unordered set of words
[Soumya George et al, 2014]. Each word is assigned its own weight. A common choice is the
TF-IDF (Term Frequency - Inverse Document Frequency) weight, which grows with the
frequency of the word in the document and shrinks with the number of documents that contain the word [Qaise
and Ramsha Ali, 2018].</p>
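      <p>As a hedged illustration of the bag-of-words model with TF-IDF weights, the sketch below uses the plain formulation, term frequency multiplied by log(N / document frequency), with whitespace tokenization. Libraries often use smoothed variants, so their numbers may differ; the function names are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def bag_of_words(text):
    """Unordered word counts for one document."""
    return Counter(text.lower().split())


def tf_idf(documents):
    """Return one {word: weight} mapping per document."""
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for bag in bags for word in bag)
    weighted = []
    for bag in bags:
        total = sum(bag.values())
        weighted.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return weighted


# Hypothetical two-document corpus, for illustration only.
corpus = ["the film is good", "the film is bad"]
weights = tf_idf(corpus)
print(weights[0])
```
</preformat>
      <p>Words that occur in every document, such as "the" in this toy corpus, receive a weight of zero, which is why TF-IDF suppresses uninformative words.</p>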
      <p>One-hot encoding (direct encoding) is a method that converts words into vectors. The
size of each vector is equal to the number of distinct words in the text. Each element of the
word vector is binary: exactly one element is 1 and all the others are 0. Before encoding, all words that are present
in the text are arranged alphabetically [Kedar Potdar et al, 2017], [Chaubard, 2016].</p>
      <p>Singular Value Decomposition (SVD) is a method that converts text into a sparse matrix
<inline-formula><tex-math>A_{m \times n} = \{a_{i1}, a_{i2}, a_{i3}, \ldots, a_{in}\}</tex-math></inline-formula>, where <inline-formula><tex-math>a_{ij}</tex-math></inline-formula> is the weighted frequency of
term <inline-formula><tex-math>i</tex-math></inline-formula> in sentence <inline-formula><tex-math>j</tex-math></inline-formula> of the document. If the document contains m terms and n
sentences, then the output is a matrix of dimension <inline-formula><tex-math>m \times n</tex-math></inline-formula> [Jezek and Steinberger,
2004].</p>
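      <p>The one-hot encoding described above can be sketched in a few lines. The sketch assumes whitespace tokenization and an alphabetically sorted vocabulary; the function name is hypothetical.</p>
      <preformat>
```python
def one_hot_vectors(text):
    """Map every distinct word of the text to a binary indicator vector."""
    # Arrange the distinct words alphabetically, as described in the text.
    vocabulary = sorted(set(text.lower().split()))
    vectors = {}
    for position, word in enumerate(vocabulary):
        vector = [0] * len(vocabulary)
        vector[position] = 1  # exactly one element is set to 1
        vectors[word] = vector
    return vocabulary, vectors


vocabulary, vectors = one_hot_vectors("the cat saw the dog")
print(vocabulary)
print(vectors["dog"])
```
</preformat>
      <p>Because the vector length equals the vocabulary size, this representation becomes very sparse on real corpora, which motivates the dense vector models discussed next.</p>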
      <p>Word2Vec (a toolkit developed by Google) is a neural network model that generates word vectors.
It can be trained with two algorithms: CBOW (Continuous Bag of Words, which predicts a word from its context) and Skip-gram
(which predicts the context from a word). Word2Vec first builds a dictionary from the training
text corpus and then learns a vector representation of each word. In addition, Word2Vec
can calculate the cosine distance between word vectors [Ma and Zhang, 2015].</p>
      <p>GloVe is a method developed at Stanford University (USA). It is based on
word co-occurrence frequencies in the text corpus and consists of two main
stages. The first stage is the construction of a co-occurrence matrix from the training
corpus. The second stage is matrix factorization to obtain word vectors [Pennington et al,
2014].</p>
      <p>FastText is a model that converts into vectors not only words but also the character
n-grams of which words are composed. This makes it possible to compute vector
representations of unknown words [Armand et al, 2016].</p>
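      <p>The idea of splitting words into character n-grams can be illustrated with the sketch below. The boundary markers "^" and "$" and the default n-gram lengths are assumptions chosen for this example, not the settings of the FastText library itself.</p>
      <preformat>
```python
def char_ngrams(word, n_min=3, n_max=5, start="^", end="$"):
    """Character n-grams of a word padded with boundary markers."""
    padded = start + word + end
    ngrams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            ngrams.append(padded[i:i + n])
    return ngrams


def shared_ngrams(known_word, unknown_word, n=3):
    """N-grams that an unseen word has in common with a known word."""
    known = set(char_ngrams(known_word, n, n))
    return sorted(g for g in char_ngrams(unknown_word, n, n) if g in known)


print(char_ngrams("cat", 3, 3))
print(shared_ngrams("toxic", "toxicity"))
```
</preformat>
      <p>An unknown word that shares n-grams with known words can be assigned a vector built from the vectors of those shared n-grams, which is how representations of out-of-vocabulary words become possible.</p>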
    </sec>
    <sec id="sec-4">
      <title>Evaluation of classification results</title>
      <p>The classification procedure is followed by a quantitative assessment of the results, which can
be carried out using a set of the following indicators:</p>
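      <list list-type="bullet">
        <list-item><p>Accuracy</p></list-item>
        <list-item><p>Precision</p></list-item>
        <list-item><p>Recall</p></list-item>
        <list-item><p>F-score</p></list-item>
      </list>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math>\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}</tex-math>
      </disp-formula>
      <disp-formula id="eq-2">
        <label>(2)</label>
        <tex-math>\mathrm{Precision} = \frac{TP}{TP + FP}</tex-math>
      </disp-formula>
      <disp-formula id="eq-3">
        <label>(3)</label>
        <tex-math>\mathrm{Recall} = \frac{TP}{TP + FN}</tex-math>
      </disp-formula>
      <disp-formula id="eq-4">
        <label>(4)</label>
        <tex-math>F\text{-}\mathrm{score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}</tex-math>
      </disp-formula>
      <p>where TP is the number of true positives, FP of false positives, TN of true negatives, and FN of false negatives.</p>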
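      <p>The four indicators defined above can be computed directly from the confusion counts. The following sketch assumes binary labels, with 1 marking the positive class, and returns zero where a denominator would be zero; the function names are hypothetical.</p>
      <preformat>
```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(evaluate(y_true, y_pred))
```
</preformat>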
    </sec>
    <sec id="sec-5">
      <title>Experimental computer systems for detecting emotions in text</title>
      <p>In 2018, the Kaggle platform hosted an international competition on identifying
“toxic” comments, that is, comments containing negative emotions (Jigsaw Toxic Comment Classification
Challenge) [Kaggle, 2018]. In total, about 4,500 teams participated. The contestants had
to create a model that would detect different types of “toxicity” in text, such as threats,
obscenity, insults and hatred.</p>
      <p>The Google team introduced the Jigsaw text data set. The corpus consists of
English-language text comments from Wikipedia. Each text has one or more of six labels:
toxic, severe toxic, obscene, threatening, insulting, and hateful to the individual. The data
set is divided into a training set of 159571 comments and a test set of
63978 comments [Kaggle Data, 2018].</p>
      <p>The researchers in [Saif et al, 2018] tested logistic
regression and three neural network models on the Jigsaw data set: a convolutional neural network (CNN),
a long short-term memory network (LSTM), and a combined CNN+LSTM. All
comment texts were tokenized using the CountVectorizer from the scikit-learn Python library. The
best result was shown by the combined neural network (CNN+LSTM), which consists of 2
LSTM layers and 4 CNN layers. The classification accuracy for 6 classes was 96.45%.</p>
      <p>In [Spiros et al, 2018], traditional methods (the naive Bayes classifier,
k-nearest neighbors, the support vector machine, and linear discriminant analysis) are
compared with a neural network. The authors used a convolutional neural network with word2vec
embeddings (the word2vec dictionary contains about 3 million English words taken from
Google News). The traditional support vector machine approach showed 81.1% accuracy, but the
CNN surpassed all traditional approaches and showed 91.2% accuracy.</p>
      <p>The authors of the paper [Noever, 2018] suggested combining several
traditional methods to improve the accuracy of toxic comment recognition. When tested
on the Jigsaw data set, a method based on a random forest reaches an accuracy of 57.82%. If
regression trees, the support vector machine, and logistic regression are added to this method, the
accuracy increases to 62.82%.</p>
      <p>The article [Elnaggar et al, 2018] describes the idea of using a combined neural network
model. The authors used a network consisting of a word embedding layer based on the GloVe
method (which maps each word in the text data to a fixed-length
vector using statistical information about this word), 2 layers of a recurrent neural network
(GRU and LSTM), and 6 layers of a convolutional neural network. The result reported in this paper
is an F-score of 79%.</p>
      <p>In [Mai et al, 2018] the authors predicted the toxicity of a comment in 2 stages: first, they
determined whether the comment was toxic or not, and then, if it was toxic, they determined the type of toxicity. The work
used an ensemble of neural networks (CNN+LSTM+GRU), which showed an F-score of 87.2%.</p>
      <p>The authors of the work [Ben Amar et al, 2018] experimented not with the
search for the best model but with various methods of text preprocessing. They used an
LSTM neural network as the model. After the experiment, the researchers settled
on the preprocessing methods that were, in their opinion, the best. They removed stop words
and links from the text, normalized only bad words to increase their toxicity weight, and
translated bad words: if a word was not found in an English dictionary, it was
looked up in dictionaries of other languages. This approach showed an accuracy of 97.72%.</p>
      <p>The researchers in [Aken et al, 2018] compared different neural networks,
convolutional and recurrent. The best accuracy on the Jigsaw data was shown by a bidirectional
GRU network. Moreover, the accuracy was the same, 98.3%, with both the GloVe and the FastText
vector representations.</p>
      <p>The results of all studies in the Toxic Comment Classification Challenge are summarized
in the diagram shown in Figure 3.</p>
    </sec>
    <sec id="sec-6">
      <title>Computer systems of sentiment analysis of Russian texts</title>
      <p>Every year, the international conference "Dialogue" holds Dialogue Evaluation, a competition of automatic processing
systems for the Russian language [Dialogue Evaluation]. One
of the main topics of the competition has been sentiment analysis of text.</p>
      <p>In 2012, the ROMIP database [ROMIP] was provided for the competition. It
included people’s reviews of various films, books and digital cameras, about
50 thousand text fragments in total, each containing an opinion about a product. Tonality was annotated
on a 5-point scale. Participants were asked to classify reviews into 2, 3 and
5 classes. The work [Pak and Paroubek, 2012] showed the best result for classification into 5
classes. The authors used a support vector machine with n-gram features, used binary weights instead
of the traditional TF-IDF, and trained the model on a combined database consisting of reviews
of movies, books and cameras. With this approach, the average
F-score on the ROMIP database was 30.63%.</p>
      <p>The competition in 2013 used the same database as in 2012. In addition, the
organizers included a database of news feeds, which contained direct and indirect speech with
an assessment of the sentiment of the statement [ROMIP]. The score can take one of 4
values: positive, negative, mixed, or no score. This database contains about 5 thousand
news fragments. The authors of [Blinov et al, 2013] achieved the best F-score, 65.9%,
in binary classification, and 35.36% in classification into 5 classes. For the division
into 2 classes, the authors used the maximum entropy method, and for the division into 5
classes, the support vector machine.</p>
      <p>In 2015, Dialogue Evaluation set a broader task. Participants were provided
with the SentiRuEval-2015 database [SentiRuEval], which includes about 18 thousand reviews of restaurants and
cars. For each review, SentiRuEval-2015 contained
the text and the selected target aspects, i.e. components or characteristics of the
assessed object. For restaurants, the aspects are cuisine, interior, service, and price. For
cars, the list of aspects includes safety, comfort, reliability, appearance, prices, and road quality.
Participants had to perform several tasks: to identify the aspect terms (the set of terms in
which the target aspect is expressed), to determine their tonality, and to determine the tonality of the review
itself. The best result for this problem is described in [Tarasov, 2015]. The author used
recurrent neural networks and obtained F-scores of 61.9% and 64.7%
for restaurants and cars, respectively.</p>
      <p>In 2016, the competition was held on the SentiRuEval-2016 database. It includes
Twitter messages (tweets) about banks and mobile operators. In addition to the text of the tweet
itself, the database contains information about which object the tweet refers to and
a tonality score of -1, 0, or 1 (negative, neutral and positive). The task
of the participants was to determine the reputational attitude of the tweet towards a
company. The authors of [Arkhipenko et al, 2016] used a two-layer GRU neural network
and fed the input sequence in reverse order. With this method, F-scores of 55.17%
and 55.94% were achieved for banks and mobile operators, respectively.</p>
      <p>Several works on sentiment analysis of Russian-language texts outside the competition
are known. In [Mirzayanova, 2019], the author collected a database from the site
kinopoisk.ru, containing people’s reviews of various movies with scores ranging from 1 to 10.
The volume of the database is about 1 million words. For the experiment, all reviews were divided
first into 3 and then into 5 classes. Various methods were used for classification: the naive
Bayes classifier, the support vector machine, and neural networks (a multilayer perceptron
and LSTM networks). The naive Bayes classifier showed the best results.
The average F-score was 72% and 26.8% for classification into 3 and 5 classes,
respectively.</p>
      <p>The author of the work [Bob’yakova, 2017] used a combination of several approaches:
supervised machine learning, namely a naive Bayes classifier, and a dictionary approach. The
frequency dictionary initially included the 100 most frequent words of the Russian
language according to the "Frequency Dictionary of the Modern Russian Language". Since
most of the words in the compiled dictionary turned out to be pronouns, conjunctions, and
prepositions, the dictionary was reduced to 29 words. The experiments were conducted on
a database of 100,000 texts from Twitter. For preprocessing, the author removed stop
words, links, hashtags, and the words with the maximum frequency of occurrence.
Documents were represented as unigrams and bigrams. With
this approach, precision reached 86.6% and recall 89.1%.</p>
      <p>In the work [Barskij, 2004], sentiment analysis was performed on a database of posts from the social
network "Vkontakte", annotated with their emotional coloring. The
volume of the database was about 3000 texts. To classify the texts, the author applied a naive
Bayes classifier. This method showed an accuracy of 70%.</p>
    </sec>
    <sec id="sec-7">
      <title>Discussion</title>
      <p>Every year the number of conferences in the field of sentiment analysis increases, and the
number of publications on the analysis of text in Russian and in other languages
grows. According to Google Scholar [Google scholar], about 700 works on
sentiment analysis of Russian-language texts were published in 2018, while about 8,500 works on
English-language texts were published. The material reviewed above also shows
that the reported accuracy of tonal analysis is about
80% for Russian-language texts, whereas for English-language texts it reaches 96%.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>The presence of numerous works on sentiment analysis suggests that the topic is
relevant today and is in demand in many areas, such as the economy, politics, and
marketing. However, as can be seen from the analytical review, systems for sentiment analysis
of Russian-language texts are less developed than those for other languages. Russian-language
sentiment analysis also gives rather low accuracy compared to English. Therefore, the task of
improving the accuracy of tonal analysis of Russian-language texts remains relevant.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This research is supported by RFBR projects No. 18-07-01407 and 19-29-09081.</p>
      <p>[Minakov, 2013] Minakov I.A. (2013) Sentiment analysis of the text and its application to
improve the quality of transitions on relevant ads. Bulletin of Samara State Technical
University. Technical Science, 1(37), 58-63, Samara, Russia (In Rus.) = Analis emotional’noy
tonal’nosti texta i ego primenenie dlya povishenia kachestva perehodov po relevantnim
obyavleniyam. Vestnic Samarskogo gosudarstvennogo universiteta: Tehnicheskie nauki,
1(37), 58-63, 2013.</p>
      <p>[Bondareva and Lagerev, 2018] Bondareva I.V. and Lagerev D.G. (2018) Research of methods
of vector representation of textual information for solving the problem of tonality
analysis. All-Russian scientific conference "Information technologies of intellectual decision
support", 10-15, Ufa-Stavropol, Russia (In Rus.) = Issledovanie metodov vectornogo
predstavleniya textovoi informacii dlya reshenia zadachi analiza tonal’nosti.
"Informacionie tehnologii intelectual’noy podderzhki prinyatiya resheniy", 10-15, 2018.</p>
      <p>[Zvereva, 2014] Zvereva P.P. (2014) Sentiment analysis of the text (on the material of printed
texts of the newspaper "The New York Times" about Russia and Russians). Bulletin
of the Moscow State University. Series: Linguistics, 5, 32-37, Moscow, Russia (In Rus.)
= Sentient analiz texta (na materiale pechatnih textov gazeti “The New York Times” o
Rossii I rossianah). Vestnik MGOU, seria: Lingvostica, 5, 32-37, 2014.</p>
      <p>[Tutubalina et al, 2015] Tutubalina E.V., Ivanov V.V., Zagulova M.A., Mingazov N.R.,
Alimova I.S. and Malih V.A. (2015) Testing of methods of text tonality analysis based on
dictionaries. Electronic libraries, 18(3-4), 138-160 (In Rus.) = Testirovanie metodov
analiza tonal’nosti texta osnovanih na slovaryah. Electronnie biblioteki, 18(3-4), 138-160,
2015.</p>
      <p>[Posevkin and Bessmertniy, 2015] Posevkin R.V., Bessmertniy I.A. (2015) Application of
sentiment analysis of texts for evaluation of public opinion. Scientific and Technical Bulletin
of Information Technologies, Mechanics and Optics, 15(1), 169-171 (In Rus.) =
Prienenie sentiment analiza textov dlya ocenki obshestvennogo mnenia. Nauchno-tehnicheskiy
vestnic inforacionnih tehnologiy mehaniki i optiki, 15(1), 169-171, 2015.</p>
      <p>[Panicheva, 2013] Panicheva P. (2013) The system of sentiment analysis ATEX, based on
the rules, in the processing of texts of various subjects. Conference "Dialogue", 2,
101-113 (In Rus.) = Sistema sentiment analiza ATEX osnovanaya na pravilah pri obrabotke
textov razlichnih thematic. Komputernaya lingvistica i intelectual’nie tehnologii, 2,
101-113, 2013.</p>
      <p>[Kormalev, 2004] Kormalev D.A. (2004) Applications of machine learning methods in text
analysis problems. Software systems: theory and applications: proceedings of the
International conference, Pereslavl-Zalessky, 2, 35-48 (In Rus.) = Prilozhenia metodov
mashinnogo obuchenia v zadachax analiza texta. Trudi conferencii: programmnie sistemi:
teoria I prilozhenia, 2, 35-48, 2004.</p>
      <p>[Kotel’nikov and Klekovkina, 2012] Kotel’nikov E.V., Klekovkina M.V. (2012) Automatic
text tonality analysis based on machine learning methods. Conference "Dialogue", 11
(18), 27–36 (In Rus.) = Avtomaticheskij analiz tonal’nosti tekstov na osnove metodov
mashinnogo obucheniya. Komp’yuternaya lingvistika i intellektual’nye tekhnologii, 11
(18), 27–36, 2012.</p>
      <p>[Lewis,1998] Lewis D.D. (1998) Naive (Bayes) at forty: The independence assumption in
information retrieval. Proceedings of 10th European Conference on Machine Learning,
4–15.</p>
      <p>[Zainuddin et al, 2014] Zainuddin, Nurulhuda, Selamat (2014) Sentiment Analysis Using
Support Vector Machine. International Conference on Computer, Communication and
Control Technology, 333-337.</p>
      <p>[Prabin Lama, 2013] Prabin Lama (2013) Clustering system based on text mining using the
k-means algorithm. Available at https://core.ac.uk/download/pdf/38099883.pdf</p>
      <p>[Ma and Zhang, 2015] Long Ma, Yanqing Zhang (2015) Using Word2Vec to
Process Big Text Data. IEEE International Conference. Available at
https://www.researchgate.net/publication/291153115_Using_Word2Vec_to_process_big_text_data</p>
      <p>[Dialogue Evaluation] Competitions Dialogue Evaluation. Available at http://www.dialog-21.ru/evaluation/</p>
      <p>[Pak and Paroubek, 2012] Pak A., Paroubek P. (2012) Language Independent Approach to
Sentiment Analysis (LIMSI Participation in ROMIP’11). Conference "Dialogue", 11(18),
37-50.</p>
      <p>[ROMIP] ROMIP. A collection of quotes from the news stream with markup on the estimated
tonality. Available at http://romip.ru/ru/collections/sentiment-news-collection-2012.html</p>
      <p>[Blinov et al, 2013] Blinov P.D., Klekovkina M.V., Kotelnikov E.V., Pestov O.A. (2013)
Research of lexical approach and machine learning methods for sentiment analysis.
Proceedings of International Conference on Computational linguistics and intellectual
technologies, 2(12), 48-58.</p>
      <p>[Google scholar] Google Scholar. Available at https://scholar.google.ru/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Romanov et al, 2018]
          <string-name><surname>Romanov</surname> <given-names>A.S.</given-names></string-name>,
          <string-name><surname>Vasil'eva</surname> <given-names>M.I.</given-names></string-name>,
          <string-name><surname>Kurtukova</surname> <given-names>A.V.</given-names></string-name>
          and
          <string-name><surname>Mesheryakov</surname> <given-names>R.V.</given-names></string-name>
          (<year>2018</year>)
          <article-title>Text tonality analysis using machine learning techniques</article-title>.
          <source>Proceedings 2nd International Conference "R. PIOTROWSKI'S READINGS LE &amp; AL'2017"</source>,
          <fpage>86</fpage>-<lpage>95</lpage>, Saint Petersburg, Russia (In Rus.) =
          Analiz tonal'nosti texta s ispol'zovaniem metodov mashinogo obuchenia.
          Proceedings 2nd International Conference "R. PIOTROWSKI'S READINGS LE &amp; AL'2017", 86-95, 2018.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Kan, 2011]
          <string-name><surname>Kan</surname> <given-names>D.</given-names></string-name>
          (<year>2011</year>)
          <article-title>Rule-based approach to sentiment analysis at ROMIP</article-title>.
          Available at http://www.dialog-21.ru/digests/dialog2012/materials/pdf/Kan.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Barskij, 2004]
          <string-name><surname>Barskij</surname> <given-names>A.B.</given-names></string-name>
          (<year>2004</year>)
          <source>Neural networks: recognition, management, decision-making</source>.
          Finance and statistics, 176 (In Rus.) =
          Nejronnye seti: raspoznavanie, upravlenie, prinyatie reshenij.
          Izd-vo Moskva: Finansy i statistika, 176, 2004.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Spiros et al, 2018]
          <string-name><surname>Georgakopoulos</surname> <given-names>A. G.</given-names></string-name>,
          <string-name><surname>Tasoulis</surname> <given-names>S.K.</given-names></string-name>,
          <string-name><surname>Vrahatis</surname></string-name>,
          <string-name><surname>Plagianakos</surname> <given-names>V.</given-names></string-name>
          (<year>2018</year>)
          <article-title>Convolutional neural networks for toxic comment classification</article-title>.
          <source>Proceedings of the 10th Hellenic Conference on Artificial Intelligence</source>.
          Available at https://arxiv.org/abs/1802.09957
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Ben Amar et al, 2018]
          <string-name><surname>Ben Amar</surname> <given-names>I.</given-names></string-name>,
          <string-name><surname>Coppin</surname> <given-names>A.</given-names></string-name>,
          <string-name><surname>Lecomte</surname> <given-names>E.</given-names></string-name>
          (<year>2018</year>)
          <article-title>Final Report for the Toxic Comment Classification Challenge</article-title>.
          The Toxic Comment Classification Challenge. Available at
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Aken et al, 2018]
          <string-name><surname>van Aken</surname> <given-names>Betty</given-names></string-name>,
          <string-name><surname>Risch</surname> <given-names>J.</given-names></string-name>,
          <string-name><surname>Krestel</surname> <given-names>R.</given-names></string-name>,
          <string-name><surname>Löser</surname> <given-names>A.</given-names></string-name>
          (<year>2018</year>)
          <article-title>Challenges for Toxic Comment Classification: An In-Depth Error Analysis</article-title>.
          <source>EMNLP</source>,
          <fpage>33</fpage>-<lpage>42</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Voronina and Goncharov, 2015]
          <string-name>
            <surname>Voronina</surname>
            <given-names>I.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goncharov</surname>
            <given-names>V.A.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Analysis of emotional coloring of messages in social networks (on the example of the network "Vkontakte")</article-title>
          .
          <source>Vestnik VSU series: system analysis and information technologies</source>
          ,
          <volume>4</volume>
          ,
          <fpage>151</fpage>
          -
          <lpage>158</lpage>
          (In Rus.) =
          <article-title>Analiz emocional'noj okraski soobshchenij v social'nyh setyah (na primere seti "Vkontakte")</article-title>
          .
          <source>Vestnik VGU, seriya: sistemnyj analiz i informacionnye tekhnologii</source>
          ,
          <volume>4</volume>
          ,
          <fpage>151</fpage>
          -
          <lpage>158</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Pribill et al,
          <year>2014</year>
          ]
          <string-name>
            <surname>Pribill</surname>
            <given-names>Ju.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pribilova</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matousek</surname>
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>GMM Classification of text-to-speech synthesis: identification of original speaker's voice</article-title>
          . Available at https://www.researchgate.net/publication/265215872_GMM_Classification_of_Textto-Speech_Synthesis_Identification_of_Original_Speaker's_Voice
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Pazel'skaya and Solov'ev, 2011]
          <string-name>
            <surname>Pazel'skaya</surname>
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solov'ev</surname>
            <given-names>A.N.</given-names>
          </string-name>
          (
          <year>2011</year>
          )
          <article-title>The method of definition of emotions in Russian texts</article-title>
          . // Computational linguistics and intellectual technologies: collection of scientific articles,
          <volume>10</volume>
          (
          <issue>17</issue>
          ),
          <fpage>510</fpage>
          -
          <lpage>522</lpage>
          (In Rus.) =
          <article-title>Metod opredeleniya emocij v tekstah na russkom yazyke</article-title>
          . //
          <source>Komp'yuternaya lingvistika i intellektual'nye tekhnologii: sbornik nauchnyh statej</source>
          ,
          <volume>10</volume>
          (
          <issue>17</issue>
          ),
          <fpage>510</fpage>
          -
          <lpage>522</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Krasnikov and Nikulichev, 2013]
          <string-name>
            <surname>Krasnikov</surname>
            <given-names>I.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikulichev</surname>
            <given-names>N.N.</given-names>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Hybrid algorithm of classification of text documents based on analysis of internal connectedness of the text</article-title>
          . //Engineering Bulletin of Don.
          <volume>3</volume>
          (In Rus.) =
          <article-title>Gibridnyj algoritm klassifikacii tekstovyh dokumentov na osnove analiza vnutrennej svyaznosti teksta</article-title>
          .//Inzhenernyj vestnik Dona.
          <volume>3</volume>
          ,
          <year>2013</year>
          . Available at http://www.ivdon.ru/ru/magazine/archive/n3y2013/1773
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Konig and Brill, 2006]
          <string-name>
            <surname>Konig</surname>
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brill</surname>
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2006</year>
          )
          <article-title>Reducing the human overhead in text categorization</article-title>
          .
          <source>Proceedings of the 12th ACM SIGKDD conference on knowledge discovery and data mining</source>
          ,
          <fpage>596</fpage>
          -
          <lpage>603</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Hassaan Saeed,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Hafiz Hassaan</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Khurram</given-names>
            <surname>Shahzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Faisal</given-names>
            <surname>Kamiran</surname>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Overlapping Toxic Sentiment Classification Using Deep Neural Architectures</article-title>
          .
          <source>Proceedings of the 2018 IEEE International Conference on Data Mining Workshops</source>
          ,
          <fpage>1361</fpage>
          -
          <lpage>1366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Dhingra et al,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Bhuwan</given-names>
            <surname>Dhingra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hanxiao</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>William W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>A Comparative Study of Word Embeddings for Reading Comprehension</article-title>
          . Available at https://arxiv.org/pdf/1703.00993.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Soumya George et al,
          <year>2014</year>
          ]
          <string-name>Soumya George K</string-name>
          ,
          <string-name>
            <given-names>Shibily</given-names>
            <surname>Joseph</surname>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>Text Classification by Augmenting Bag of Words (BOW) Representation with Co-occurrence Feature</article-title>
          . //IOSR Journal of Computer Engineering,
          <volume>16</volume>
          (
          <issue>1</issue>
          ),
          <fpage>34</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Qaise and Ramsha Ali,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Shahzad</given-names>
            <surname>Qaise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ramsha</given-names>
            <surname>Ali</surname>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Text Mining: Use of TFIDF to Examine the Relevance of Words to Documents</article-title>
          . //International Journal of Computer Applications,
          <volume>181</volume>
          (
          <issue>1</issue>
          ),
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Kedar Potdar et al,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Kedar</given-names>
            <surname>Potdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Taher S.</given-names>
            <surname>Pardawala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chinmay D.</given-names>
            <surname>Pai</surname>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers</article-title>
          . //International Journal of Computer Applications,
          <volume>175</volume>
          (
          <issue>4</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Chaubard, 2016]
          <string-name>
            <given-names>Francois</given-names>
            <surname>Chaubard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rohit</given-names>
            <surname>Mundra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard</given-names>
            <surname>Socher</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Deep Learning for NLP</article-title>
          . //Lecture Notes: Part I. Available at https://tensorflowkorea.files.wordpress.com/2017/03/cs224n-2017winter-notes-all.pdf
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Jezek and Steinberger, 2004]
          <string-name>
            <given-names>Karel</given-names>
            <surname>Jezek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Josef</given-names>
            <surname>Steinberger</surname>
          </string-name>
          (
          <year>2004</year>
          )
          <article-title>Text Summarization and Singular Value Decomposition</article-title>
          . //
          <source>International Conference on Advances in Information Systems</source>
          ,
          <fpage>245</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Pennington et al,
          <year>2014</year>
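          ]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>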
          (
          <year>2014</year>
          )
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>EMNLP</source>
          ,
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Armand et al,
          <year>2016</year>
          ]
          <string-name>
            <surname>Joulin</surname>
            <given-names>Armand</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            <given-names>Edouard</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            <given-names>Piotr</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            <given-names>Tomas</given-names>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          . Available at https://www.researchgate.net/publication/319770220_Bag_of_Tricks_for_Efficient_Text_Classification
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Kaggle, 2018]
          <article-title>Kaggle: Toxic Comment Classification Challenge</article-title>
          . (
          <year>2018</year>
          ). Available at https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Kaggle Data,
          <year>2018</year>
          ] Kaggle.
          <article-title>Toxic Comment Classification Challenge</article-title>
          . Data. (
          <year>2018</year>
          ). Available at https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Saif et al,
          <year>2018</year>
          ]
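          <string-name>
            <given-names>Mujahed</given-names>
            <surname>Saif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Medvedev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Maxim</given-names>
            <surname>Medvedev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Todorka</given-names>
            <surname>Atanasova</surname>
          </string-name>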
          (
          <year>2018</year>
          )
          <article-title>Classification of online toxic comments using the logistic regression and neural networks models</article-title>
          .
          <source>Proceedings of the 44th international conference "applications of mathematics in engineering and economics"</source>
          ,
          <volume>2048</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Noever, 2018]
          <string-name>
            <surname>Noever</surname>
            <given-names>D.A.</given-names>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Machine Learning Suites for Online Toxicity Detection</article-title>
          . Available at https://www.researchgate.net/publication/328146083_Machine_Learning_Suites_for_Online_Toxicity_Detection
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Elnaggar et al,
          <year>2018</year>
          ]
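          <string-name>
            <given-names>Ahmed</given-names>
            <surname>Elnaggar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Waltl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ingo</given-names>
            <surname>Glaser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jorg</given-names>
            <surname>Landthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Elena</given-names>
            <surname>Scepankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Florian</given-names>
            <surname>Matthes</surname>
          </string-name>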
          (
          <year>2018</year>
          )
          <article-title>Stop Illegal Comments: A Multi-Task Deep Learning Approach</article-title>
          .
          <source>Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference</source>
          ,
          <fpage>41</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Mai et al,
          <year>2018</year>
          ]
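          <string-name>
            <given-names>Mai</given-names>
            <surname>Ibrahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marwan</given-names>
            <surname>Torki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nagwa</given-names>
            <surname>El-Makky</surname>
          </string-name>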
          (
          <year>2018</year>
          )
          <article-title>Imbalanced Toxic Comments Classification Using Data Augmentation and Deep Learning</article-title>
          .
          <source>IEEE International Conference on Machine Learning and Applications</source>
          ,
          <fpage>875</fpage>
          -
          <lpage>878</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [SentiRuEval] SentiRuEval-2015 database. Available at https://drive.google.com/drive/folders/0B7y8Oyhu03y_fjNIeEo3UFZObTVDQXBrSkNxOVlPaVAxNTJPR1Rpd2U1WEktUVNkcjd3Wms
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Tarasov, 2015]
          <string-name>
            <surname>Tarasov</surname>
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Deep recurrent neural networks for aspect-oriented analysis of user feedback tonality in different languages</article-title>
          .
          <source>International conference "Dialogue"</source>
          , V.
          <volume>2</volume>
          ,
          <issue>14</issue>
          (
          <issue>21</issue>
          ),
          <fpage>53</fpage>
          -
          <lpage>64</lpage>
          (In Rus.) =
          <article-title>Glubokie rekurrentnye nejronnye seti dlya aspektno-orientirovannogo analiza tonal'nosti otzyvov pol'zovatelej na razlichnyh yazykah</article-title>
          .
          <source>Komp'yuternaya lingvistika i intellektual'nye tekhnologii: po materialam ezhegodnoj mezhdunarodnoj konferencii "Dialog"</source>
          , V.
          <volume>2</volume>
          ,
          <issue>14</issue>
          (
          <issue>21</issue>
          ),
          <fpage>53</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Arkhipenko et al,
          <year>2016</year>
          ]
          <string-name>
            <surname>Arkhipenko</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlov</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trofimovich</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skorniakov</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomzin</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turdakov</surname>
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Comparison of neural network architectures for sentiment analysis of Russian tweets</article-title>
          .
          <source>Proceedings of International Conference on Computational linguistics and intellectual technologies</source>
          . Available at http://www.dialog21.ru/media/3380/arkhipenkoetal.pdf
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Mirzayanova, 2019]
          <string-name>
            <surname>Mirzayanova</surname>
            <given-names>S.V.</given-names>
          </string-name>
          (
          <year>2019</year>
          )
          <article-title>Sentiment analysis of text</article-title>
          . Final qualifying work, ITMO University, SPb, Russia, 45 p. (In Rus.) =
          <article-title>Analiz tonal'nosti teksta</article-title>
          , VKR, Universitet ITMO, 45 p.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Bob'yakova, 2017]
          <string-name>
            <surname>Bob'yakova</surname>
            <given-names>D.A.</given-names>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>Development of an application for analyzing the tonality of texts from social networks</article-title>
          . Final qualifying work, ITMO University, SPb, Russia, 74 p. (In Rus.) =
          <article-title>Razrabotka prilozheniya dlya analiza tonal'nosti tekstov iz social'nyh setej</article-title>
          , VKR, Universitet ITMO, 74 p.,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>