<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Approach to Processing News Text Messages Based on Markeme Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Sychev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Voronezh State University</institution>
          ,
          <addr-line>Voronezh</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>313</fpage>
      <lpage>324</lpage>
      <abstract>
        <p>The complexity problem of automatic filtering of messages retrieved from online media platforms and social networks is discussed. A review of approaches to document representation, feature weighting schemes and feature selection techniques is provided. In the paper, an approach to text message processing based on markeme analysis is suggested. Markeme identification is based on calculating the Index of Textual Markedness (InTeM). Markemes are the words most important for a particular text; they occur with a frequency higher than that of the words of the same length. Preliminary results of an exploratory study of the proposed approach, applied to news message classification and clustering, are presented and discussed.</p>
      </abstract>
      <kwd-group>
        <kwd>markeme</kwd>
        <kwd>index of textual markedness</kwd>
        <kwd>word form</kwd>
        <kwd>term</kwd>
        <kwd>message</kwd>
        <kwd>skewness coefficient</kwd>
        <kwd>classification</kwd>
        <kwd>clustering</kwd>
        <kwd>feature weighting</kwd>
        <kwd>feature selection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The problem of automatic message processing is usually considered a complex one
due to the following features of the content published on online media platforms and
in social networks:
─ the “fuzzy” subject matter of message texts and comments;
─ the small length of the text in a message or comment;
─ the heterogeneity of published texts in terms of style, the level of literacy of the
authors, etc.;
─ the large volume of messages and comments published per unit of time.
In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a method for classifying short text messages is proposed; the work also
describes a method for assessing the informational
significance of lexical units in natural language texts. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], quality estimates for several
methods of thematic classification (rubrication) of news messages, using various
numerical estimates of informational significance as features, were obtained
experimentally on the “20 newsgroups” data set.
      </p>
      <p>
        In general, the text classification pipeline includes the following steps: text feature
selection (extraction), dimensionality reduction, application of known classification
techniques or development of new ones, and evaluation of the classification model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Feature selection, which reduces dimensionality, removes irrelevant data,
and increases learning accuracy, is essential for tackling problems
such as the curse of dimensionality and model overfitting, which are caused by the high
dimensionality of the data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        For feature selection, different techniques exist, e.g. TF or TF-IDF [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
Word2Vec [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], GloVe [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], FastText [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Contextualized Word Representations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
and their modifications. All these techniques can be assigned to one of two general
feature selection approaches: weighted words and word embeddings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Word embedding techniques require a huge corpus of text data sets for training [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Moreover, this approach cannot handle words that are missing from these data sets.
      </p>
      <p>The weighted-words technique uses a simplified representation of the text, usually in the
form of a bag-of-words (BOW) vector model, which allows fairly simple
and fast algorithms to be used for processing text documents and messages. In the BOW model,
the text is represented as a set of words, usually without taking into account
grammar or word order, but using information about the frequency
of words in the text. When solving the problem of document classification, the
frequency of word occurrence is used as the decisive feature for training the classifier.
It is precisely the word frequency that is used in well-known methods for estimating the informational
significance of words in a text, for example, in TF-IDF based methods.</p>
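      <p>For concreteness, a minimal Python sketch of the weighted-words representation with
TF-IDF weighting follows; the toy documents, the tokenization, and the particular tf and
idf normalizations are assumptions of this illustration rather than a prescription from
the methods discussed here.</p>
      <preformat>
import math
from collections import Counter

def tfidf(docs):
    """Bag-of-words TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # df: the number of documents in which each word occurs.
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequencies in this document
        weights.append({w: (f / len(doc)) * math.log(n / df[w])
                        for w, f in tf.items()})
    return weights

docs = [["markeme", "analysis", "of", "news", "messages"],
        ["classification", "of", "news", "messages"]]
print(tfidf(docs)[0])
      </preformat>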
      <p>
        Some feature selection techniques may not be efficient for a specific application,
depending on its goal and data set. For example, GloVe does not
perform as well as TF-IDF when used for short text messages [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Several widely used unsupervised and supervised term weighting methods on
benchmark data collections in combination with SVM and k-NN algorithms were
considered in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As was stated in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] the term weighting assignment is combined
to improve both recall and precision measures by a multiplication operation from two
factors: term frequency factor (tf) and collection frequency factor (idf). Several
different collection frequency factors, namely, the multipliers of 1, a conventional
inverse collection frequency factor (idf), a probabilistic inverse collection frequency
(idf-prob), a  2 factor, an information gain (ig) factor, a gain ratio (gr) factor, an
Odds Ratio (OR) factor, and proposed by authors novel relevance frequency (rf)
factor were studied in experiments. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] five term scoring methods for automatic term
extraction on different types of text collections were evaluated to investigate the
influence of three factors in the success of a term scoring method in term extraction:
collection size, background collection and the importance of multi-word terms. One
important conclusion from [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is that all term scoring methods could not demonstrate
the high level of performance for collections smaller than 1,000 words due to the
prevailing of the frequency criterion in all methods.
      </p>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], dimensionality reduction techniques can be organized into
three groups: feature selection (FS), feature projection, and instance selection. While
the first two types of methods aim to reduce the dimensionality of the feature space,
the third aims to reduce the number of instances used for training.
      </p>
      <p>In FS methods, the resulting feature set is a subset of the initial feature set.
Feature projection, by contrast, produces a new group of features mapped from the original features.</p>
      <p>
        FS methods are usually classified into three categories: filter, wrapper, and
embedded [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Filter methods are executed independently of the classifier's learning activity.
Wrapper methods rely on the classifier's performance to assess the relevance of
features or to search for the most relevant subset of features. Embedded methods include
FS as part of the training process.
      </p>
      <p>A relevant advantage of feature selection is that the resulting feature set is a
subset of the original features, so each selected feature preserves the meaning of the
original one.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the InTeM index (the Index of Textual Markedness of a word form) is used
to assess the degree of subjective weight of a word form in a text. The authors of
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] assume that each word form in a text has two parameters: frequency and
length. At the same time, in their view, the frequency of a word form is a
complex subjective-objective indicator, whereas the length of a word form is a simple
objective, linguistic one. Hence, the subjective (i.e. meaningful) weight of a word form
can be obtained by subtracting the simple objective factor (i.e. the weight of the word
form according to its length) from the complex subjective-objective factor (i.e. the
weight of the word form according to its frequency). The resulting value, the Index
of Textual Markedness of a word form (InTeM), indicates the degree of
subjective (textual) weight of a given word form for a given text. Thus, it is in fact
proposed to calculate the following indicator to assess the informational significance of
a word form t_i from a text message m:
      <p>ITM_i = WF_i - WL_i,</p>
      <p>WF_i = \frac{\sum_{j=1}^{N_t} f_j - \sum_{j=1}^{i} f_j}{\sum_{j=1}^{N_t} f_j}, \quad i \le N_t,</p>
      <p>WL_i = \frac{\sum_{j=1}^{L_m} f_j^{(len)} - \sum_{j=1}^{l} f_j^{(len)}}{\sum_{j=1}^{L_m} f_j^{(len)}}, \quad i \le N_t,</p>
      <p>where word forms t_i of the text are ranked in descending order of their
frequency f_i in the overall list of all word forms of the text. The frequency f_j^{(len)} denotes
the number of occurrences of all word forms of length j in the text of
message m, N_t is the total number of distinct word forms in the message m, L_m is
the maximum word form length in the message m, and l denotes the length of
t_i.</p>
      <p>Word forms with the maximum values of ITM_i are called markemes; they form the
set of word forms most significant for the author of the text.</p>
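      <p>As an illustration, the following minimal Python sketch computes ITM_i for every
word form of an already tokenized text, following the definitions above; the
tokenization step and the helper name are assumptions of this illustration.</p>
      <preformat>
from collections import Counter

def intem_scores(tokens):
    """ITM_i = WF_i - WL_i for every distinct word form in a tokenized text."""
    freq = Counter(tokens)                             # f: word form -> frequency
    ranked = sorted(freq, key=freq.get, reverse=True)  # descending frequency order
    total = sum(freq.values())                         # sum of f_j over all N_t forms

    # f^(len)_j: the total frequency of all word forms of length j.
    len_freq = Counter()
    for form, f in freq.items():
        len_freq[len(form)] += f
    max_len = max(len_freq)                            # L_m

    # cum_len[l] = sum of f^(len)_j for j = 1..l; note cum_len[L_m] equals total.
    cum_len = [0] * (max_len + 1)
    for j in range(1, max_len + 1):
        cum_len[j] = cum_len[j - 1] + len_freq.get(j, 0)

    scores, cum_f = {}, 0
    for form in ranked:
        cum_f += freq[form]                            # sum of f_j for j = 1..i
        wf = (total - cum_f) / total                   # weight by frequency
        wl = (total - cum_len[len(form)]) / total      # weight by length
        scores[form] = wf - wl
    return scores

# As in the study below, word forms with a positive index can be kept as markemes:
# markemes = [w for w, s in intem_scores(tokens).items() if s > 0]
      </preformat>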
      <p>In this paper, the possibility of using the markeme model of texts for the standard
problems of classification, clustering and thematic categorization is considered, using
a collection of news text messages as an example, and the preliminary results of an
exploratory study are presented.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>For the purposes of the study, a set M containing 760 text messages on several topics
was formed. The set M included, in approximately equal proportions, messages of four
topics marked up manually by experts. The average message size was 145.6 words.
The total number of unique terms in the dictionary D, built from lemmas
extracted from the message texts, was about 12 thousand units; 600 terms
with a total frequency of occurrence over the entire set M of at least 28 were
selected for the study. The maximum total frequency of occurrence in the set M for a term
from the dictionary was 1281.</p>
      <p>Figure 1 shows the frequency distribution over M of the terms from the dictionary D
by term length. As can be seen, long terms (more than 10-12 characters in length)
occur in messages with a low frequency, which reflects the objective linguistic
realities. Within the framework of the markeme approach, an excess of the frequency
of occurrence in a text T of a specific term f_i of length l relative to the frequency
f_l^{(len)} typical for terms of length l gives grounds for including the term in the set of
markemes MK_T of the given text T. One should note that the calculation of ITM_i within the
framework of the markeme approach does not take into account the shape of the
frequency distribution function over the lengths of word forms.</p>
      <p>This kind of distribution can also be calculated for each text message individually.</p>
      <p>In this study, all terms whose ITM_i value exceeded zero were identified as
markemes.</p>
      <p>Table 1 shows the numbers of markemes identified from the text messages,
averaged by topic category. N_MK1 is the number of markemes identified using the
global (message collection) frequency distribution of terms by length, and N_MK2 is
the number of markemes identified using the local (i.e. inside an individual message)
frequency distribution of terms by length.</p>
      <p>Obviously, the markemes characterized by a relatively high
frequency f_M in messages and a relatively large value of the distribution asymmetry
index (skewness) Sk over the entire set of messages M will be the ones useful in the further study.</p>
      <sec id="sec-2-1">
        <title>Topic</title>
        <p>oAf ivdneifrafaemgreeensntsuatmegrebmesr freAquveenracgieessiunma omfetsesramge
Medicine 6.3 3.3 39.2 54.2
Accidents 8.8 5.1 54.0 74.1
Politics 10.8 6.2 58.6 90.0
Sports 8.1 4.2 50.3 75.3
Mean: 8.6 5.1 51.0 74.2</p>
      <p>In this study, two asymmetry indicators for markemes were considered:
─ Sk1 is the skewness of the markeme distribution across the four topics of the
message set M;
─ Sk2 is the skewness of the markeme distribution over the entire set of messages M
as a whole.</p>
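      <p>To make the two indicators concrete, a small sketch follows; the frequency
values are invented, and the bias-corrected skewness estimator is an assumption,
chosen because its upper bound for four samples is 2, which matches the limit
value discussed below.</p>
      <preformat>
import numpy as np
from scipy.stats import skew

# Sk1 for one markeme: skewness of its four per-topic frequency totals.
# For n = 4 samples the bias-corrected statistic is bounded by 2, and the
# bound is reached when the markeme is concentrated in a single topic.
topic_freq = np.array([46, 1, 1, 1])
print(skew(topic_freq, bias=False))    # 2.0, the limit value

# Sk2: the same statistic over the markeme's frequencies in all 760 messages.
msg_freq = np.zeros(760)
msg_freq[:12] = 4                      # hypothetical: occurs in 12 messages only
print(skew(msg_freq, bias=False))      # a large positive skewness
      </preformat>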
      <p>The Sk2 × Sk2 panel of the scatter matrix shows the histogram of the Sk2 value
distribution. The highest distribution density is observed near the value 10 of the Sk2
variable. The f × Sk2 panel confirms this observation. For the Sk1 variable, the
distribution density is concentrated in the vicinity of the value 2. Thus, the markemes
of interest are those for which:
─ the frequency of occurrence f_Mi noticeably differs from the minimum values;
─ the topic parameter Sk1 tends to the limit value 2 (good topic specificity), or the
Sk2 parameter value is in the vicinity of 10 (a good indicator of the markeme's
specificity in M).</p>
    </sec>
    <sec id="sec-3">
      <title>Experiment</title>
      <p>For the experiment, the filter strategy of feature selection was chosen to reduce the
dimensionality of the feature space. Feature filtering was performed using the f_Mi, Sk1,
and Sk2 parameters.</p>
      <p>Markemes Mk_i whose parameter values satisfied the two conditions f_Mi ≥ 6
and Sk1 ≥ 1.9 were selected from the total set of markemes identified from M. Table 2
provides the list of the 58 markemes selected from the set M in this way.
For the message classification, a naive Bayes classifier was used, with the quality of
classification assessed by 10-fold cross-validation.
The obtained quality estimates are given in Table 3, where rows correspond to the
true topics of the tested data and columns to the classifier's topic predictions.
The Accuracy value was 82%; Accuracy was calculated as the ratio (number of
correct classifier predictions) / (total number of testing examples). For comparison,
Table 4 provides the estimates for the same classification, except that all the terms
(600 units) from the dictionary D were used as attributes of the frequency vectors of
messages. The Accuracy value was 89%.</p>
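      <p>A minimal sketch of this filter-plus-classifier setup is given below, under stated
assumptions: X is a (messages x features) frequency matrix restricted to the identified
markemes, y holds the expert topic labels, the thresholds are those from the text, and
the bias-corrected skewness estimator and all names are illustrative.</p>
      <preformat>
import numpy as np
from scipy.stats import skew
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

def filter_features(X, y, min_freq=6, min_sk1=1.9):
    """Filter strategy: keep the features (markemes) whose total frequency is
    at least min_freq and whose per-topic skewness Sk1 is at least min_sk1."""
    total_freq = X.sum(axis=0)                       # f_M for every feature
    topics = np.unique(y)
    per_topic = np.array([X[y == t].sum(axis=0) for t in topics])
    sk1 = skew(per_topic, axis=0, bias=False)        # skewness across the topics
    return np.logical_and(total_freq >= min_freq, sk1 >= min_sk1)

# Hypothetical usage with a precomputed matrix X and label vector y:
# mask = filter_features(X, y)
# scores = cross_val_score(MultinomialNB(), X[:, mask], y, cv=10)
# print("Accuracy: %.1f%%" % (100 * scores.mean()))
      </preformat>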
      <sec id="sec-3-1">
        <title>True/Prediction</title>
        <sec id="sec-3-1-1">
          <title>Prediction topic 1</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Prediction topic 2</title>
        </sec>
        <sec id="sec-3-1-3">
          <title>Prediction topic 3</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>Prediction topic 4</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Class recall True topic 1</title>
      <p>It is noteworthy that although, overall, the markeme-list representation of
messages worsened the Accuracy value by about 7%, there was an improvement in recall
or precision for some topics. For example, the recall of topic 1 ("Medicine")
improved significantly (with a significant decrease in the precision value), while for topics 2 and 3
("Accidents", "Politics") there was an improvement in the precision value (while the
recall value decreased). The significant decrease of the classification Accuracy (Table 3)
is due to the decrease in the class precision for topic 1 and in the class recall for topics
2 and 3. This effect can be considered the price paid for the essential reduction of the
feature space dimensionality. One can see in Table 2 that the list of selected
markeme features is too short to provide a high level of per-class accuracy. Perhaps a more
flexible scheme for selecting the f_Mi, Sk1 and Sk2 parameter values could improve the
situation. The dimensionality reduction of the feature space achieved for the
message classification problem turned out to be more than tenfold.</p>
      <sec id="sec-3-3">
        <title>True/Prediction</title>
        <sec id="sec-3-3-1">
          <title>Prediction topic 1</title>
        </sec>
        <sec id="sec-3-3-2">
          <title>Prediction topic 2</title>
        </sec>
        <sec id="sec-3-3-3">
          <title>Prediction topic 3</title>
        </sec>
        <sec id="sec-3-3-4">
          <title>Prediction topic 4</title>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>Class recall</title>
      </sec>
      <sec id="sec-3-5">
        <title>True</title>
        <p>topic 1
130
9
1
20
True
topic 1</p>
        <p>8
190
10
0
81,3%
91,4%
True
topic 1
16
10
164</p>
        <p>3
85,0%
True
topic 1
2
0
5
192
96,5%</p>
        <p>Class
precision
83,3%
90,9%
82,4%
98,0%</p>
      </sec>
      <sec id="sec-3-6">
        <title>True/Prediction</title>
      <p>For comparison purposes, a message classification was carried out based on
a frequency vector with attribute terms selected using the Sk1 ≥ 1.9 filter (the Sk1
factor could, to some extent, be considered an analogue of the IDF factor in
TF-IDF based algorithms). In fact, a boolean conversion of the Sk1 factor was used as the
collection (topic) frequency factor (IDF). The total number of terms selected from D was
152. The results of this experiment are given in Table 5. The Accuracy value was
84.3%.</p>
      <sec id="sec-3-7">
        <title>Messages Clustering</title>
        <p>K-means clustering was carried out with the set of markemes (given in Table 2) used as
attributes of the message vectors. Table 6 summarizes the results of this experiment. As
can be seen from the table, the markeme set makes it possible to accurately identify the thematic core
of the set of messages for each topic, but at the same time most of the messages in
each topic subset remain thematically vague. An increase in the clustering recall
could be achieved by softening the constraints (on the parameters f_Mi and Sk1) when
selecting markemes. It is worth noting that the clustering result is quite sensitive to the
choice of the initial conditions of the clustering algorithm.</p>
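        <p>A sketch of this clustering setup under stated assumptions follows: the matrix X
stands in for the real (messages x markemes) frequency data, and the fixed seed and
restart count address the noted sensitivity to initial conditions.</p>
        <preformat>
import numpy as np
from sklearn.cluster import KMeans

# X: hypothetical (760 messages x 58 markemes) frequency matrix; in the study
# the selected markemes serve as the attributes of the message vectors.
rng = np.random.default_rng(0)
X = rng.poisson(0.7, size=(760, 58)).astype(float)

# Four clusters, one per expected topic; n_init restarts with a fixed seed
# mitigate the sensitivity to the initial conditions mentioned above.
km = KMeans(n_clusters=4, n_init=20, random_state=0).fit(X)
print(np.bincount(km.labels_))         # cluster sizes
        </preformat>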
        <p>When clustering with a markeme-list representation of messages in a situation where
the topics are not known in advance, the Sk1 parameter cannot be calculated. In this
case, one can instead calculate the Sk2 parameter, which is not
tied to specific topics. The topics of the clusters identified in this way could then be
determined by calculating the correlation coefficients between the frequency-dominant
markemes within the identified clusters. Table 7 shows a fragment of the table of
correlation coefficients for markeme pairs (for Sk1 ≥ 1.9); the coefficients in the
fragment range from 0.77 down to 0.40. When calculating a
correlation, the frequencies of occurrence of the two markemes in the messages (760 frequencies
in total) were used as the coordinates of the markeme vectors.</p>
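        <p>The pairwise correlation computation described above can be sketched as follows;
the frequency matrix F is a random stand-in for the study's data, and the listing of
the strongest pairs is illustrative.</p>
        <preformat>
import numpy as np

# F: hypothetical (markemes x messages) matrix, one row of 760 message
# frequencies per markeme, as described in the text.
rng = np.random.default_rng(1)
F = rng.poisson(0.5, size=(58, 760))

corr = np.corrcoef(F)                        # Pearson correlations of the rows
i, j = np.triu_indices_from(corr, k=1)       # each markeme pair once
top = sorted(zip(corr[i, j], i, j), reverse=True)[:5]
for c, a, b in top:
    print(f"markemes {a} and {b}: r = {c:.2f}")
        </preformat>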
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The preliminary results of the study presented in this paper allow us to draw several
conclusions regarding the possible use of the markeme approach for text message
classification, clustering and thematic categorization.
1. By the method of its identification, a word form marked as a markeme should
reflect the degree of subjective (author's) weight of this word form for a particular
text. Since news text messages come from many different online platforms, it is
essentially impossible to speak of a single authorship in a stream of news
messages. In this respect, the analysis of the text of a news message differs
significantly from the analysis of a large text, for example, a literary work. Markeme
analysis is, of course, better suited as a working tool for linguistic research on
texts.
2. From the point of view of classification and clustering performance,
representing a text message by a vector of markeme frequencies and representing it
by a vector of term frequencies based on the TF factor give quite comparable
results. Some degradation in classification accuracy can be considered the price
paid for the essential reduction of the feature space dimensionality. Perhaps a more
flexible scheme for selecting the f_Mi, Sk1 and Sk2
parameter values could improve the situation.
3. From the computational point of view, the markeme model of messages has an
advantage: to identify the markemes of a text, it is enough to have the body of the
text itself, rather than the entire set of texts, as is required, for example, when
computing the TF-IDF factor. Of course, the text should be large
enough for the term frequencies to be estimated.
4. Markemes as a basis of the feature space can be considered a good choice for the
filter strategy in the feature selection procedure, mitigating the effects of the curse of
dimensionality and model overfitting. The choice of markemes based on
threshold values of the f_Mi, Sk1 and Sk2 parameters can be used to construct an
"orthogonal" basis (in some sense) in the term feature space, for example, for
evaluating the degree of "blurring" of existing topic sections and the need to reorganize
their structure. Markemes can also be used for keyword generation and for annotating
news messages.
5. The threshold values for f_Mi, Sk1 and Sk2 are in fact tuning
parameters of the feature selection procedure for improving both recall and precision
measures. Accordingly, the choice of these values will depend on the
target recall and precision levels.</p>
      <p>Admittedly, a relatively small collection of news texts and only four topics were used in
the experiments; however, the paper presents the preliminary results of exploratory research.
Further experiments will extend both the size of the collection and the number of
topics. More experiments and comparisons with existing weighting schemes for
improving document representation are planned for the future.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lande</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morozov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmokhval</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An Approach to Identifying Duplicate Messages in News Information Streams</article-title>
          (
          <year>2006</year>
          ). URL: http://dwl.kiev.ua/art/rdcl/rcdl2006.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mbaykodzhi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dral</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sochenkov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Short Text Messages Classification Method</article-title>
          .
          <source>Journal of Information Technologies and Computing Systems, issue 3</source>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Zhebel</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zharikova</surname>
            ,
            <given-names>S.-N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sochenkov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Feature Selection for Text Classification of a News Flows Based on Topical Importance Characteristic</article-title>
          .
          <source>Artificial Intelligence and Decision Making, issue 3</source>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>59</lpage>
          (
          <year>2019</year>
          ). (in Russian). https://doi.org/10.14357/20718594190306.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kowsari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jafari Meimandi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heidarysafa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnes</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Text classification algorithms: a survey</article-title>
          .
          <source>Information (Switzerland)</source>
          .
          <volume>10</volume>
          . (
          <year>2019</year>
          ). https://doi.org/10.3390/info10040150.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pintas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Feature Selection Methods for Text Classification: a Systematic Literature Review</article-title>
          .
          <source>Artif.Intell.Rev</source>
          . (
          <year>2021</year>
          ). https://doi.org/10.1007/s10462-021-09970-6.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Inf. Process. Manag</source>
          ,
          <volume>24</volume>
          , pp.
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          . (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method</article-title>
          .
          <source>arXiv</source>
          , arXiv:1402.3722 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Doha, Qatar, 25-29 October 2014, vol.
          <volume>14</volume>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>arXiv</source>
          , arXiv:1607.04606 (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Melamud</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldberger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dagan</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>context2vec: Learning Generic Context Embedding with Bidirectional LSTM</article-title>
          .
          <source>In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning</source>
          , Berlin, Germany, 11-12 August 2016
          , pp.
          <fpage>51</fpage>
          -
          <lpage>61</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.18653/v1/K16-1006.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lu</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Supervised and Traditional Term Weighting Methods for Automatic Text Categorization</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          , vol.
          <volume>31</volume>
          , no.
          <issue>04</issue>
          , pp.
          <fpage>721</fpage>
          -
          <lpage>735</lpage>
          . (
          <year>2009</year>
          ). https://doi.org/10.1109/TPAMI.2008.110
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Verberne</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sappelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hiemstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kraaij</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Evaluation and Analysis of Term Scoring Methods for Term Extraction</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>19</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>510</fpage>
          -
          <lpage>545</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1007/s10791-016-9286-2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mirończuk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Protasiewicz</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A Recent Overview of the State-of-the-art Elements of Text Classification</article-title>
          .
          <source>Expert Systems with Applications</source>
          , vol.
          <volume>106</volume>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1016/j.eswa.2018.03.058.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Feature Selection: A literature Review</article-title>
          .
          <source>The Smart Computing Review</source>
          , vol.
          <volume>4</volume>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>229</lpage>
          . (
          <year>2014</year>
          ) https://doi.org/10.6029/smartcr.2014.03.007.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Faustov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kretov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Concept of Markeme and Interim Results of Markeme Analysis of Russian Literature</article-title>
          .
          <source>Proceedings of Voronezh State University. Series: Linguistics and Intercultural Communication, issue 4</source>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2017</year>
          ). (in Russian)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>