<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic Topic Modeling of Russian Prose of the First Third of the XXth Century by Means of Non-Negative Matrix Factorization</article-title>
      </title-group>
      <contrib-group>
<contrib contrib-type="author">
          <string-name>Ekaterina Zamiraylova</string-name>
          <email>e.zamiraylova@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Mitrofanova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Saint Petersburg State University</institution>,
          <addr-line>Saint Petersburg, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>14</lpage>
      <abstract>
<p>This paper describes automatic topic spotting in literary texts based on the Russian short stories corpus, comprising stories written in the first third of the XXth century. Non-negative matrix factorization (NMF) is a valuable alternative to existing approaches to dynamic topic modeling, and it can find niche topics and related vocabularies that are not captured by existing methods. The experiments were conducted on text samples extracted from the corpus; the samples contain texts by 300 different authors. This approach makes it possible to trace the topic dynamics of Russian prose over the 30 years from 1900 to 1930.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>In the last decade topic modeling has become one of the most popular research areas in
computational linguistics. Topic modeling is usually understood as building a model that shows which topics
appear in each document [Daud et al., 2010]. The topic model of a collection of text documents
determines to which topics each document belongs, and it generates the list of words
(terms) of which each topic is formed [Blei, Lafferty, 2006]. With this method it is possible
to process large amounts of data (fiction texts, magazine articles, news reports, social media,
reviews, etc.) and automatically obtain information about the topics of texts. Knowing what
people are talking about and understanding their concerns and opinions is very valuable for
science, business, political campaigns, etc.</p>
<sec id="sec-1-1">
        <p>Currently a large number of methods for topic modeling have been created. The most
common in modern applications are methods based on Bayesian networks, which are probabilistic
models on directed graphs. Probabilistic topic models belong to a relatively young field of
research in unsupervised learning. Probabilistic latent semantic analysis (PLSA), based on the
principle of maximum likelihood, was one of the first methods proposed as an alternative
to classical clustering methods based on the calculation of distance functions. After PLSA,
latent Dirichlet allocation (LDA) and its numerous generalizations were proposed.</p>
        <p>Following LDA, similar probabilistic approaches have been developed to track
the evolution of topics over time in a sequentially organized corpus of
documents, such as the dynamic topic model (DTM) [Blei, Lafferty, 2006]. Alternative
algorithms, such as non-negative matrix factorization (NMF) [Lee and Seung, 1999] considered
in this paper, have proven effective in finding underlying topics in text corpora [Wang et al.,
2012]. For this reason, this algorithm was chosen for the present study, which aims at automatic
detection of dynamic topics in the Russian short stories corpus of the first third of the XXth
century [see Sherstinova, Martynenko in this volume].</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Selection rationale</title>
<sec id="sec-2-1">
        <p>Non-negative matrix factorization (NMF) is an unsupervised machine learning algorithm
that aims to detect useful features [Müller and Guido, 2017]. It is utilized for dimensionality
reduction of non-negative matrices, because it decomposes the data into factors in such a way that there
are no negative values in them. Therefore, this method can be applied only to data
where features have non-negative values, as a non-negative sum of non-negative components
cannot become negative [ibid.].</p>
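<p>As a minimal illustration of this property, the sketch below implements the multiplicative update rules of [Lee and Seung, 1999] on a toy document-term matrix. This is illustrative NumPy code under stated assumptions, not the code used in the experiments; the toy matrix and the number of iterations are stand-ins. Both factors remain non-negative throughout, because each update only multiplies by ratios of non-negative quantities.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative document-term matrix: 4 documents x 6 terms (stand-in data).
A = np.array([
    [2., 1., 0., 0., 0., 1.],
    [3., 2., 0., 0., 1., 0.],
    [0., 0., 3., 2., 0., 1.],
    [0., 1., 2., 3., 0., 0.],
])

k = 2                                # number of topics
W = rng.random((4, k))               # document-topic weights
H = rng.random((k, 6))               # topic-term weights
eps = 1e-9                           # guards against division by zero

# Multiplicative updates [Lee and Seung, 1999]: W and H stay non-negative
# because each step multiplies the current factors by non-negative ratios.
for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

assert (W >= 0).all() and (H >= 0).all()   # no negative values in the factors
print(np.round(W @ H, 1))                  # approximate reconstruction of A
```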
      </sec>
<sec id="sec-2-2">
        <p>One of the advantages of NMF over existing LDA methods is that fewer parameter variants
are used in the modelling process [Darek and Cross, 2016]. In addition, another benefit
is that NMF can identify niche topics that tend to be under-reported in traditional LDA
approaches [O’Callaghan et al., 2015]. Niche topics are sub-topics that can be identified within
a dynamic topic.</p>
        <p>The ability of NMF to consider how significant a word is to a document in a text collection,
based on weighted term frequency values, is particularly useful. In particular, applying a
log-based TF-IDF weighting factor to the data before constructing the topic model
contributes to diverse but semantically coherent topics that are less likely to be represented by
the same high-frequency terms [Darek and Cross, 2016]. This makes NMF a suitable method
for identifying both broad groups of high-level documents and niche topics with specialized
dictionaries [O’Callaghan et al., 2015].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental design</title>
<sec id="sec-3-1">
        <p>The experiment is based on a two-level strategy of topic modeling within the framework of
non-negative matrix factorization applied to the Russian short stories corpus of the first third of the
XXth century. In the first stage, NMF topic modeling is applied to one set of texts from a fixed
period of time; in the second stage, the results of topic modeling from successive periods of time
are combined to detect a set of dynamic topics related to a particular time window or to the whole corpus.</p>
      </sec>
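<p>The two-stage strategy above can be sketched as follows. This is schematic code following the window/dynamic design of [Darek and Cross, 2016]; the window models, topic counts and vocabulary size here are random stand-ins, not the study's actual data.</p>

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Stage 1 (assumed already done): one NMF model per time window, each
# yielding a topic-term matrix over a shared vocabulary (stand-in values).
vocab_size = 30
window_topic_term = [rng.random((5, vocab_size)) for _ in range(3)]

# Stage 2: stack all window topics as "documents" and run NMF again;
# the resulting factors group window topics into dynamic topics.
B = np.vstack(window_topic_term)             # 15 window topics x terms
dyn = NMF(n_components=4, random_state=0)    # 4 dynamic topics
W = dyn.fit_transform(B)

# Each window topic is assigned to its strongest dynamic topic.
assignment = W.argmax(axis=1)
print(assignment)
```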
    </sec>
    <sec id="sec-4">
      <title>Linguistic data set</title>
<sec id="sec-4-1">
        <p>The material for this paper is a selection of data from the Russian short stories corpus of the
first third of the XXth century, which is being developed at the Philology Department of Saint
Petersburg State University in cooperation with the Philology Department of the National Research
University Higher School of Economics, Saint Petersburg [Martynenko et al., 2018a; Sherstinova,
Martynenko, 2019].</p>
      </sec>
      <sec id="sec-4-2">
        <p>The data set consists of 300 stories by 300 unique writers, both
world-famous and barely known. The corpus is a homogeneous resource focused on
one of the most common genres of fiction, the short story. This genre is the most popular
among prose writers, and it is present in the literary legacy of almost all writers.</p>
<p>The corpus under development covers one of the most dramatic periods in the development
of the Russian language and literature. The central point that divides the first third of the
twentieth century into different time periods is the October Revolution of 1917. All other
events are considered either as leading to it or as arising from it. This makes it possible to carry
out quantitative analysis of language changes within a rather wide chronological framework and to
estimate which of the language changes that arose became fixed in the language and began to be used
frequently by speakers, and which disappeared after the revolutionary epoch [Martynenko et al., 2018b].</p>
<p>The base of the corpus provides a means for exploring the language of the first third of the
twentieth century (1900–1930), divided into three main periods: 1) the beginning of the XXth
century and the prerevolutionary years, including the First World War; 2) the revolutionary
years, comprising the February and October revolutions and the Civil War; and 3) the postrevolutionary
years from the end of the Civil War to 1930. Each of these time periods will be analyzed
separately, and the results will be combined into an overall picture reflecting the development
of the Russian language in the first third of the XXth century [Martynenko et al., 2018b].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental procedure</title>
<sec id="sec-5-1">
        <p>The experimental setup began with pre-processing of the texts, which included removal of non-text symbols, abbreviations and stop words, as well as lemmatization. The volumes of the data sets are shown (in tokens) below:</p>
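<p>A minimal sketch of such a pre-processing step. The tokenizer and the tiny stop-word list here are toy stand-ins; the actual pipeline also lemmatized the tokens with a tool the paper does not name.</p>

```python
import re

# Toy stop-word list; a real run would use a full Russian stop-word list.
STOP_WORDS = {"и", "в", "не", "на", "он", "она", "это"}

def preprocess(text: str) -> list[str]:
    """Strip non-letter symbols, lowercase, and drop stop words."""
    tokens = re.findall(r"[а-яёa-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Он писал письмо, и ночь была тиха..."))
```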
      </sec>
<sec id="sec-5-2">
        <p>In the first step a document-term matrix is created, to which TF-IDF weighting and document-length
normalization are applied before each matrix is written. This includes marking the documents
and creating a document-term matrix for each time window, where a window topic model is built by
applying NMF to that window.</p>
<p>Determining the number of topics is a nontrivial task, because choosing too few of
them leads to overly generalized results, while choosing too many entails numerous small,
highly similar topics [Green and Cross, 2015]. For cases when this number is not known
in advance, there are different strategies for automatic or semi-automatic selection of the number
of topics. In particular, it is proposed to build a Word2Vec skip-gram model using the Gensim
library (https://radimrehurek.com/gensim/) from all documents in the corpus. The TC-W2V
measure is used to compare different topic models and then select a model with a suitable
number of topics. More details on TC-W2V are given in [O’Callaghan et al., 2015].</p>
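<p>The TC-W2V measure reduces to the mean pairwise cosine similarity between embeddings of a topic's top terms; a model's coherence is the average over its topics, and the number of topics maximizing it is selected. The sketch below uses random stand-in vectors; a real run would use skip-gram embeddings trained with Gensim as described above.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings; the paper builds these with a Word2Vec skip-gram
# model trained on the corpus (via the Gensim library).
embeddings = {w: rng.random(50) for w in
              ["солдат", "офицер", "рота", "река", "лес", "ночь"]}

def tc_w2v(topic_terms: list[str]) -> float:
    """Mean pairwise cosine similarity of a topic's top terms."""
    sims = []
    for i, a in enumerate(topic_terms):
        for b in topic_terms[i + 1:]:
            u, v = embeddings[a], embeddings[b]
            sims.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return float(np.mean(sims))

# Coherence of one candidate topic; repeat per topic and average per model.
score = tc_w2v(["солдат", "офицер", "рота"])
print(round(score, 3))
```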
      </sec>
<sec id="sec-5-3">
        <p>Applying the method mentioned in [Green and Cross, 2015] to determine the number of topics, the following results were obtained: the top recommendation for the number of topics is 10 for ’1900–1913’ (Table 2), 4 for ’1914–1922’ (Table 3) and 10 for ’1923–1930’ (Table 4); the top recommendation for the number of dynamic topics is 4 (Table 5).</p>
      </sec>
<sec id="sec-5-8">
        <p>The ability of NMF to apply TF-IDF weighting to the data before topic modeling produces
diverse but nonetheless coherent topics that are less likely to be represented by the same
high-frequency terms, allowing identification of both broad and niche topics with specialized
vocabularies [O’Callaghan et al., 2015].</p>
      </sec>
      <sec id="sec-5-9">
        <p>In the context of the study of the first third of the XXth century, the discovery of these
niche topics is an advantage that helps to examine the components of the topics and analyze
the realities of the period in more detail. To illustrate this idea, Table 5 shows the top 10 terms for
4 dynamic topics. Terms in bold are unique to a topic; terms in italics appear both in the overall
description of a topic and in time windows (or even in one time window); terms in bold italics are
found within time windows but not in the overall description of a topic.</p>
      </sec>
<sec id="sec-5-10">
        <p>The above lists of words created by NMF to describe topics are rich and varied; moreover, each time window has its own unique words. Compared with the most common LDA method, as the authors of the model do [Darek and Cross, 2016], NMF is more suitable for niche content analysis, while LDA offers only a good general description of broader topics.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Linguistic interpretation of experimental results</title>
<p>The content of the dynamic topics is of the highest interest for linguistic analysis. Table 5 lists the
top 4 dynamic topics penetrating all time periods (1900–1913, 1914–1922, 1923–1930). Table
6 shows the niche topics and vocabularies of each dynamic topic in a specific time period. For
instance, the first broad topic in the first time window is represented by 40 words with the
largest number of unique terms (писать (to write), любовь (love), любить (to love), сцена
(scene), роль (role), ребенок (child), муж (husband), жена (wife), счастье (happiness),
кабинет (room, office), деньга (money), русский (Russian), пароход (steamer), город (city,
town), etc.). It can be inferred that writers at the beginning of the century
wrote a lot about the mode of life: about family (муж (husband), ребенок (child), жена
(wife), мама (mom), сестра (sister), отец (father)), work (кабинет (room, office), деньга
(money)) and events that occurred to the main characters, who interacted with people of
different professions (купец (merchant), приказчик (manager), извозчик (horse-cab driver),
доктор (doctor)). The number of unique terms is small in the second time window
(1914–1922), which is due to the fact that this is a revolutionary time and the description
of everyday life is minimal. In the post-revolutionary period the vocabulary increases again; there are
unique words that reflect the «new life» (товарищ (comrade), завод (factory), гражданин
(citizen), рабочий (working), etc.).</p>
<sec id="sec-6-1">
        <p>There are more words describing nature in the second dynamic topic: in 1900–1913 (пруд
(pond), река (river), ночь (night), солнце (sun), куст (bush), лес (forest), волк (wolf)),
during 1914–1922 more abstract ones (ветер (wind), море (sea), небо (sky), солнце (sun), ночь
(night), берег (bank)), and in 1923–1930 (сосна (pine), птица (bird), зверь (beast), лес (forest),
болото (swamp)). The third dynamic topic is filled with words related to the military sphere.
However, if one looks through the niche topics and vocabularies of each period, at the beginning
of the century only a few words can be attributed to the military theme (солдат (soldier),
офицер (officer), пост (post)); the rest of the unique terms are more related to the usual way of
life (барин (lord), старик (old man), деревня (village), благородие (honour), etc.), which
may indicate the maintenance of order and the regulation of people's relations. In the second
time window (1914–1922) two unique words, «немецкий» (German) and «немец» (German),
appeared, and there are no abstract words; almost all content refers to the military (солдат
(soldier), офицер (officer), стрелять (to shoot), рота (troop), etc.), which fully reflects the
revolutionary period. There is a large number of unique words in the third time window,
where the following niche topics appear: movement by train (вагон (coach), пассажир
(passenger), поезд (train), станция (station), ход (motion), курс (course)), family (муж
(husband), мама (mom), ребенок (child), мальчик (boy)), house/home (дом (house, home),
кухня (kitchen)). Only the word «солдат» (soldier) can be attributed to the military topic.</p>
      </sec>
<sec id="sec-6-2">
        <p>The fourth dynamic topic has several niche topics: village (телега (telega, horse wagon),
народ (folk), изба (hut, house), etc.) and religion (поп (priest), батюшка (priest), церковь
(church)). It is worth paying attention to the dynamics of changes in the religious topic:
there are more words in the revolutionary time (батюшка (priest), Бог (God), святой (saint))
compared to the beginning of the century (батюшка (priest)) and the postrevolutionary period
(поп (priest), церковь (church)).</p>
      </sec>
<sec id="sec-6-3">
        <p>The above analysis shows that the internal organization of topics, described as a bundle of paradigmatic and syntagmatic connections between the words of the same topic which vary across different time intervals within the same dynamic topic, changes significantly over time and reflects external events.</p>
        <p>If we consider the components of topics from the linguistic viewpoint, the largest number of
words belongs to the nominative class, which is represented by common nouns. Proper names
(Александр (Alexander), Алексей (Alexey), Анна (Anna), Владимир (Vladimir), Володя
(Volodya), Мишка (Mishka), Вера (Vera), etc.) were deliberately removed, since there are
many dialogues in the data: the frequency of names is very high, and their topic distribution is
not conditioned by anything.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
<p>Most nouns refer to the description of people (ребенок (child), девушка (young lady),
женщина (woman), старик (old man), дед (grandfather, old man), баба (country woman,
peasant's wife), отец (father), мать (mother), муж (husband), жена (wife), батюшка
(father), матушка (mother), сестра (sister), дядя (uncle), etc.), professions (солдат
(soldier), барин (lord), офицер (officer), студент (student), крестьянин (peasant), доктор
(doctor), etc.) and body parts (рука (hand, arm), нога (leg), грудь (chest)). Other groups of
nouns denote everyday realities (комната (room), улица (street), пост (post), изба (hut, house),
дом (house, home), письмо (letter), etc.), nature and animals (лес (forest), куст (bush), волк
(wolf), солнце (sun), лошадь (horse), пруд (pond), река (river), ночь (night), снег (snow)),
abstract notions (жизнь (life), счастье (happiness), мысль (thought), смерть (death), etc.), as well
as collective nouns (толпа (crowd), рота (troop), народ (folk)).</p>
<p>The predicative class is represented by verbs: писать (to write), знать (to know),
любить (to love), хотеть (to want), стоять (to stand), кричать (to shout), бежать
(to run), думать (to think), обедать (to dine), прийти (to come), сидеть (to sit), жить
(to live), работать (to work), спать (to sleep), глянуть (to peep), стрелять (to shoot),
вскочить (to jump up), пойти (to go). From the point of view of the semantic classification of
verbs developed by V. V. Vinogradov and supplemented by G. A. Zolotova [Zolotova, 2004],
these verbs belong to the following main semantic classes:</p>
<p>1) verbs of movement: стоять (to stand), сидеть (to sit), глянуть (to peep), бежать
(to run), прийти (to come), вскочить (to jump up), идти (to go);
2) verbs of speech action: кричать (to shout);
3) verbs of mental action: знать (to know), думать (to think), казаться (to seem);</p>
      <sec id="sec-7-1">
        <p>4) verbs of emotional action: любить (to love);
5) verbs of physiological action: жить (to live), обедать (to dine), спать (to sleep);
6) verbs of activity or occupation: работать (to work), писать (to write), стрелять (to shoot);
7) modal verb: хотеть (to want).</p>
      </sec>
<sec id="sec-7-2">
        <p>The attributive class is the narrowest: it includes qualitative adjectives (хороший (good), большой
(big), черный (black), темный (dark), горбатый (humpbacked), старший (senior)) and
relative adjectives (рабочий (working), русский (Russian), немецкий (German)).</p>
      </sec>
<sec id="sec-7-3">
        <p>The correlation of words in the topics reflects the diversity of paradigmatic and syntagmatic
relations that organize the text [Mitrofanova et al., 2014; Mitrofanova, 2014]. The
language connections within the topics may be described with lexical functions in the
«Meaning &lt;=&gt; Text» model [Melchuk, 1974/1999], which makes it possible to cover the predictable, idiomatic
connections of a word and its lexical correlates.</p>
<p>Among the paradigmatic relations in the topics, synonymy (Syn), antonymy (Anti) and
derivational (Der) relations prevail. For example, Syn: мама (mom) –
мать (mother), друг (friend) – товарищ (comrade), фабрика (plant) – завод (factory),
черный (black) – темный (dark), поп (priest) – батюшка (priest), etc.; Anti: деревня
(village) – город (city, town); Der: работа (work) – рабочий (working) – работать (to
work), немец (German) – немецкий (German), любовь (love) – любить (to love), крик
(shout) – кричать (to shout), команда (command, team) – командир (commanding officer),
etc. Partitive relations: семья (family) – ребенок (child), мама (mom), отец (father),
дядя (uncle), сестра (sister), муж (husband), жена (wife), мать (mother),
тетка (aunt); армия (military) – офицер (officer), солдат (soldier), рота (troop), штаб
(headquarter), капитан (captain), команда (command, team), etc.; природа (nature) – пруд
(pond), река (river), берег (bank), лес (forest), снег (snow), солнце (sun), ветер (wind),
дерево (tree), куст (bush), болото (swamp), etc.; лес (forest) – дерево (tree), куст (bush),
болото (swamp); охота (hunt) – ружье (rifle), зверь (beast), лес (forest), огонь (fire);
деревня (village) – изба (hut, house), телега (telega, horse wagon), крестьянин (peasant),
барин (lord); дом (house, home) – комната (room), дверь (door), окно (window), лампа
(lamp), кухня (kitchen), кабинет (room, office); передвижение на поезде (go by train) –
вагон (coach), пассажир (passenger), поезд (train), станция (station), ход (motion); завод
(factory) – работа (work), рабочий (working), работать (to work), машина (machine), etc.</p>
<p>Syntagmatic relations are realized at the level of valence frames filled with words from
the topic. Among lexical functions, Oper1,2 may be singled out, which connect a verb, the name
of the first or the second actant in the role of the subject, and the name of the situation as a
complement: суп (soup) – обедать (to dine), письмо (letter) – писать (to write), винтовка (gun)
– стрелять (to shoot), ребенок (child) – кричать (to shout), etc. In addition, there are
a number of examples of the lexical function Cap: команда
(command, team) – командир (commanding officer); штаб (headquarter) – начальник (chief);
отряд (squad) – командир (commanding officer); церковь (church) – поп (priest); пароход
(steamer) – капитан (captain), etc. The lexical function Equip ("personnel, staff"): people
(folk) – man (country man, peasant man), etc.; the lexical function Doc (res) ("document that is
the result"): write (to write) – letter (letter); draw – drawing, etc.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
<sec id="sec-8-1">
        <p>The observations made during the experiments confirm the expediency of using non-negative matrix factorization for topic modeling tasks, including the evaluation of the content of texts as a result of semantic compression.</p>
      </sec>
      <sec id="sec-8-2">
        <p>The results obtained in processing the selected data from the Russian short stories corpus of the first third of the XXth century indicate the diversity of the realization of dynamic topics in different time periods.</p>
      </sec>
<sec id="sec-8-3">
        <p>The research data make it possible to interpret the obtained results from the perspective
of the theory of lexical functions, as well as to apply historical and literary approaches for this
purpose. The content of the topics allows conclusions to be drawn about the topic dynamics of
Russian prose over the 30 years from 1900 to 1930.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
<sec id="sec-9-1">
        <p>The research is supported by the Russian Foundation for Basic Research, project # 17-29-09173
“The Russian language on the edge of radical historical changes: the study of
language and style in prerevolutionary, revolutionary and post-revolutionary artistic prose by the
methods of mathematical and computer linguistics (a corpus-based research on Russian short
stories)”.</p>
        <p>[Melchuk, 1974/1999] Melchuk I. A. (1974/1999) Experience of the Theory of Linguistic
Models «Meaning &lt;=&gt; Text». Moscow, 1974/1999. (In Russ.) = Opyt teorii lingvistitcheskix
modelej «Smysl &lt;=&gt; Tekst», Moskva, 1974/1999.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
[Daud et al., 2010]
          <string-name><surname>Daud</surname> <given-names>A.</given-names></string-name>,
          <string-name><surname>Li</surname> <given-names>J.</given-names></string-name>,
          <string-name><surname>Zhou</surname> <given-names>L.</given-names></string-name>,
          <string-name><surname>Muhammad</surname> <given-names>F.</given-names></string-name>
          (<year>2010</year>)
          <article-title>Knowledge Discovery through Directed Probabilistic Topic Models: a Survey</article-title>.
          <source>Proceedings of Frontiers of Computer Science in China</source>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Blei, Lafferty, 2006]
<string-name><surname>Blei</surname> <given-names>D. M.</given-names></string-name>,
          <string-name><surname>Lafferty</surname> <given-names>J. D.</given-names></string-name>
          (
          <year>2006</year>
          )
          <article-title>Dynamic topic models</article-title>
          .
          <source>In Proc. 23rd International Conference on Machine Learning</source>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[Lee and Seung, 1999]
          <string-name><surname>Lee</surname> <given-names>D. D.</given-names></string-name>
          and
          <string-name><surname>Seung</surname> <given-names>H. S.</given-names></string-name>
          (<year>1999</year>)
          <article-title>Learning the parts of objects by non-negative matrix factorization</article-title>.
          <source>Nature</source>
          <volume>401</volume>,
          <fpage>788</fpage>-<lpage>791</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
[Wang et al., 2012]
          <string-name><surname>Wang</surname> <given-names>Q.</given-names></string-name>,
          <string-name><surname>Cao</surname> <given-names>Z.</given-names></string-name>,
          <string-name><surname>Xu</surname> <given-names>J.</given-names></string-name>
          and
          <string-name><surname>Li</surname> <given-names>H.</given-names></string-name>
          (<year>2012</year>)
          <article-title>Group matrix factorization for scalable topic modeling</article-title>.
          <source>In Proc. 35th SIGIR Conf. on Research and Development in Information Retrieval</source>,
          pp. <fpage>375</fpage>-<lpage>384</lpage>. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Sherstinova, Martynenko, 2019]
          <string-name>
            <surname>Sherstinova</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martynenko</surname>
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2019</year>
          )
          <article-title>Linguistic and Stylistic Parameters for the Study of Literary Language in the Corpus of Russian Short Stories of the First Third of the 20th Century</article-title>
          . This volume.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
[Müller and Guido, 2017]
          <string-name><surname>Müller</surname> <given-names>A.</given-names></string-name>
          and
          <string-name><surname>Guido</surname> <given-names>S.</given-names></string-name>
          (<year>2016</year>)
          <source>Introduction to Machine Learning with Python: A Guide for Data Scientists</source>.
          O'Reilly, 2016.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
[Darek and Cross, 2016]
          <string-name><surname>Greene</surname> <given-names>D.</given-names></string-name>
          and
          <string-name><surname>Cross</surname> <given-names>J. P.</given-names></string-name>
          (<year>2016</year>)
          <article-title>Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach</article-title>.
          <source>ArXiv abs/1607.03055</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
[O'Callaghan et al., 2015]
          <string-name><surname>O'Callaghan</surname> <given-names>D.</given-names></string-name>,
          <string-name><surname>Greene</surname> <given-names>D.</given-names></string-name>,
          <string-name><surname>Carthy</surname> <given-names>J.</given-names></string-name>
          and
          <string-name><surname>Cunningham</surname> <given-names>P.</given-names></string-name>
          (<year>2015</year>)
          <article-title>An analysis of the coherence of descriptors in topic modeling</article-title>.
          <source>Expert Systems with Applications (ESWA)</source>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Martynenko et al., 2018a]
          <string-name>
            <surname>Martynenko</surname>
            <given-names>G.Ya.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            <given-names>T.Yu.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnik</surname>
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popova</surname>
            <given-names>T.I.</given-names>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Methodological problems of creating a computer anthology of the Russian short story as a language resource for the study of the language and style of Russian prose in the era of revolutionary changes (the first third of the XX century)</article-title>
          .
          <source>Computational linguistics and computational ontologies. Issue 2 (Proceedings of the XXI International Joint Conference "Internet and Modern Society", IMS-2018, St. Petersburg, May 30 - June 2, 2018. Collection of scientific articles)</source>
          . St. Petersburg: ITMO University,
          <year>2018</year>
          . Pp.
          <fpage>99</fpage>
          -
          <lpage>104</lpage>
          . (In Rus.) =
          <article-title>Metodologicheskiye problemy sozdaniya Kompyuternoy antologii russkogo rasskaza kak yazykovogo resursa dlya issledovaniya yazyka i stilya russkoy khudozhestvennoy prozy v epokhu revolyutsionnykh peremen (pervoy treti XX veka)</article-title>
          .
          <source>Kompyuternaya lingvistika i vychislitelnyye ontologii. Vypusk 2 (Trudy XXI Mezhdunarodnoy obyedinennoy konferentsii "Internet i sovremennoye obshchestvo", IMS-2018, Sankt-Peterburg, 30 Maya - 2 Iyunya 2018 g. Sbornik nauchnykh statey)</source>
          . SPb: Universitet ITMO,
          <year>2018</year>
          . S.
          <fpage>99</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Martynenko et al., 2018b]
          <string-name>
            <surname>Martynenko</surname>
            <given-names>G.Ya.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherstinova</surname>
            <given-names>T.Yu.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popova</surname>
            <given-names>T.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnik</surname>
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamiraylova</surname>
            <given-names>E.V.</given-names>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>On the principles of creation of the Russian short stories corpus of the first third of the 20th century</article-title>
          .
          <source>Proceedings of the XV International Conference on Computer and Cognitive Linguistics "TEL 2018"</source>
          . Kazan,
          <year>2018</year>
          . Pp.
          <fpage>180</fpage>
          -
          <lpage>197</lpage>
          . (In Rus.) =
          <article-title>O printsipakh sozdaniya korpusa russkogo rasskaza pervoy treti XX veka</article-title>
          .
          <source>Trudy XV Mezhdunarodnoy konferentsii po kompyuternoy i kognitivnoy lingvistike "TEL 2018"</source>
          . Kazan,
          <year>2018</year>
          . S.
          <fpage>180</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Greene and Cross, 2015]
          <string-name>
            <surname>Greene</surname>
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Cross</surname>
            <given-names>J.P.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Unveiling the Political Agenda of the European Parliament Plenary: A Topical Analysis</article-title>
          .
          <source>ACM Web Science 2015</source>
          , 28 June - 1 July 2015, Oxford, UK.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Zolotova et al., 2004]
          <string-name>
            <surname>Zolotova</surname>
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Onipenko</surname>
            <given-names>N.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorova</surname>
            <given-names>M.Yu.</given-names>
          </string-name>
          (
          <year>2004</year>
          )
          <article-title>Communicative grammar of the Russian language</article-title>
          . Moscow: Nauka,
          <year>2004</year>
          . 544 p. (In Rus.) =
          <article-title>Kommunikativnaya grammatika russkogo yazyka</article-title>
          . M.: Nauka,
          <year>2004</year>
          . 544 s. ISBN 5-88744-050-3.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Mitrofanova et al., 2014]
          <string-name>
            <surname>Mitrofanova</surname>
            <given-names>O.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimorina</surname>
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koltsov</surname>
            <given-names>S.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koltsova</surname>
            <given-names>O.Yu.</given-names>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>Modeling semantic links in social media texts using the LDA algorithm (based on the Russian-language segment of the LiveJournal)</article-title>
          .
          <source>Structural and Applied Linguistics</source>
          , Vol.
          <volume>10</volume>
          ,
          <fpage>151</fpage>
          -
          <lpage>168</lpage>
          . (In Rus.) =
          <article-title>Modelirovaniye semanticheskikh svyazey v tekstakh sotsialnykh setey s pomoshchyu algoritma LDA (na materiale russkoyazychnogo segmenta Zhivogo Zhurnala)</article-title>
          .
          <source>Strukturnaya i prikladnaya lingvistika</source>
          . Vyp.
          <volume>10</volume>
          . S.
          <fpage>151</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Mitrofanova, 2014]
          <string-name>
            <surname>Mitrofanova</surname>
            <given-names>O.A.</given-names>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>Topic modeling of special texts based on LDA algorithm</article-title>
          .
          <source>XLII International Philological Conference, March 11-16, 2013. Selected works</source>
          . SPb. (In Rus.) =
          <article-title>Modelirovanije tematiki special'nyh tekstov na osnove algoritma LDA</article-title>
          .
          <source>XLII Mezhdunarodnaya filologicheskaya konferencija, 11-16 marta 2013 g. Izbrannyje trudy</source>
          . SPb.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>