<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text Attribution in Case of Sampling Imbalance by the Method of Constructing an Ensemble of Classifiers Based on Decision Trees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Rogov</string-name>
          <email>rogov@petrsu.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Abramov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Petrozavodsk State University</institution>
          ,
          <addr-line>Petrozavodsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>326</fpage>
      <lpage>335</lpage>
      <abstract>
        <p>When solving the attribution problem, the question arises of determining the style of a writer who created a smaller number of texts (both in quantity and in total number of words) than the other analyzed authors. In this paper we consider possible solutions to this problem using the example of determining the style of Apollon Grigoriev. As a method for constructing an ensemble of classifiers we use Bagging (Bootstrap aggregating). The SMALT information system ("Statistical methods for analyzing literary texts") was used to determine the frequency characteristics of the texts, and Python 3.6 was used to build decision trees. As a result of the calculations, we can assume that a relative frequency of the "particle-adjective" bigram greater than 6.5 is a distinctive feature of the journalistic style of Apollon Grigoriev. We also studied the article "Poems by A. S. Khomyakov"; the study confirms the earlier conclusion that there is no reason to consider it as belonging to Apollon Grigoriev.</p>
      </abstract>
      <kwd-group>
        <kwd>Text attribution</kwd>
        <kwd>F. M. Dostoevsky</kwd>
        <kwd>Apollon Grigoriev</kwd>
        <kwd>Poems by A. S. Khomyakov</kwd>
        <kwd>sampling imbalance</kwd>
        <kwd>decision tree</kwd>
        <kwd>software complex "SMALT"</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Authorship identification of anonymous texts (text attribution) is one of the most
urgent problems for the philological community; however, there are no universal
mechanisms for its solution [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To answer such questions, literary scholars use methods
that are often somewhat unusual for the humanities, including mathematical
methods of analysis. One of the issues that is far from being settled is the
attribution of anonymous articles published
in the magazines "Time" and "Epoch" (1861-1865). The authorship of some
of these articles has been established, while the authorship of other materials
causes a lot of controversy and discussion among philologists [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The solution
to this problem is additionally hampered by the uneven amount of available
textual material: there are many articles written by F. M. Dostoevsky, while the
other authors published in these journals (for example, A. Grigoriev, N. N.
Strakhov, Ya. P. Polonsky, etc.) have far fewer texts that are unambiguously
attributed to them.
      </p>
      <p>
        The following mathematical methods are used to establish the authorship of
works: neural networks, the QSUM method, decision trees, support vector machines
(SVM), the k-means method, the Bayesian classifier, Markov chains, principal
component analysis, discriminant analysis, genetic algorithms, statistical criteria (χ²
test, Student's t-test, Kolmogorov-Smirnov criterion), etc. Among other data mining
methods, decision trees stand out in that they are
easy to understand and interpret and do not require special preliminary
data processing. Authors who have used mathematical methods to solve
the problem of text attribution include Morton A. Q., Mendenhall T. C., Farringdon
J. M., Efron B., Thisted R., Teahan W. J., Chaski C. E., Stamatatos E., Juola
P., Peng R. D., Joachims T., Diederich J. J., Apte C., Lowe D., Matthews R.,
Tweedie F. J., de Vel O., Argamon S., Levitan S., Zheng R. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It
should be noted that the Russian language differs significantly from English, so
methods developed for analyzing English texts are often not suitable for Russian.
      </p>
      <p>
        When classifying into two classes, the problem of
sampling imbalance often arises, i.e. the number of objects of one class
significantly exceeds the number of objects of the other class. In this case the first
class is called the majority class and the second class is called the minority class.
On such samples classifiers are tuned to the objects of the majority class, i.e.
a high accuracy of the classifier can be obtained without correctly identifying any objects of the
minority class. In text attribution this takes the form of determining
the style of a writer who created a smaller number of texts (both
in quantity and in total number of words) than the
other analyzed authors. We consider possible solutions to this problem
using the example of determining the style of Apollon Grigoriev. The authors do
not know of any comparable studies of Russian-language texts except for the
works of G. Kjetsaa and M. A. Marusenko [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Construction and Analyzing Decision Trees
      </p>
      <p>
        An overview of the types of sampling imbalance and the methods used in such
cases can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this work we use resampling, namely Undersampling.
In this method the sample is balanced by removing
objects of the majority class. The authors think that this method is more
appropriate for the task than Oversampling (the sample is balanced by
duplicating objects of the minority class) or SMOTE (by generating new synthetic objects
of the minority class).
      </p>
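      <p>The difference between the first two balancing strategies can be illustrated with a minimal sketch that uses only the Python standard library. The function names and the toy fragment identifiers are assumptions made for illustration and are not part of the authors' code; the class sizes 899 and 118 are the training-set sizes reported in this paper.</p>
      <preformat>
```python
import random


def undersample(majority, minority, seed=0):
    """Drop random majority objects until both classes have equal size."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    return kept, list(minority)


def oversample(majority, minority, seed=0):
    """Duplicate random minority objects until both classes have equal size."""
    rng = random.Random(seed)
    missing = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(missing)]
    return list(majority), list(minority) + extra


# Hypothetical identifiers standing in for text fragments.
majority = [f"other_{i}" for i in range(899)]
minority = [f"grigoriev_{i}" for i in range(118)]

maj_u, min_u = undersample(majority, minority)
maj_o, min_o = oversample(majority, minority)
print(len(maj_u), len(min_u))  # 118 118
print(len(maj_o), len(min_o))  # 899 899
```
      </preformat>
      <p>Undersampling discards information from the majority class but never shows the classifier the same object twice, which is one reason it is easy to explain.</p>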
      <p>
        As a method for constructing an ensemble of classifiers we use Bagging
(Bootstrap aggregating) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The idea of this method is to train several models on random
subsamples of the original sample (obtained by bootstrap) and then average their predictions.
The authors believe that it suits the task better than Boosting.
In previous studies of the features of the journalistic style of F.
M. Dostoevsky we found that decision trees built on bigrams
reflect the author's style well. In those experiments the best results were shown
by decision trees with a fragment size of 1000 words. The optimal step between
the starting points of consecutive fragments was 100 words. The same parameters
were used in this work. The SMALT information system ("Statistical methods
for analyzing literary texts") developed at Petrozavodsk State University
was used to determine the frequency characteristics [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Philologists
carried out grammatical markup of the texts, which took into account 14 parts of
speech (noun, adjective, numeral, pronoun, adverb, category of state, verb,
participle, gerund, preposition, conjunction, particle, modal word, interjection) and
also made it possible to mark quotes, foreign words, introductory words, abbreviated
words and non-linguistic symbols. A training data set was compiled (118
fragments by Apollon Grigoriev and 899 by the other authors). The texts from which the data
were prepared are presented in Table 1. In this case the fragments of the texts of
Apollon Grigoriev are objects of the minority class and all the others belong to
the majority class. The texts are quite small (from 2000 to 7000 words).
      </p>
      <p>Table 1. Source texts for analysis.</p>
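      <p>The fragmentation scheme described above (fragments of 1000 words whose starting points are 100 words apart) can be sketched as follows. This is an illustration only: the helper names, the toy tag sequence and the use of raw bigram counts per fragment are assumptions, since the actual frequency characteristics were computed by the SMALT system.</p>
      <preformat>
```python
from collections import Counter


def fragments(tags, size=1000, step=100):
    """Return overlapping windows of `size` tags whose starts are `step` apart."""
    last_start = len(tags) - size
    return [tags[s:s + size] for s in range(0, last_start + 1, step)]


def bigram_counts(fragment):
    """Count adjacent part-of-speech pairs inside one fragment."""
    return Counter(zip(fragment, fragment[1:]))


# Hypothetical tagged text of 1250 tokens (not real corpus data).
tags = ["particle", "adjective", "noun"] * 400 + ["verb"] * 50
frags = fragments(tags)
counts = bigram_counts(frags[0])

print(len(frags))                         # 3 windows: starts 0, 100, 200
print(counts[("particle", "adjective")])  # 333 in the first fragment
```
      </preformat>
      <p>Because neighbouring fragments share 900 words, a short text of a few thousand words still yields a usable number of training objects.</p>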
      <p>Python 3.6 was used to build the decision trees (libraries: scikit-learn for the tree
implementation, pandas for data reading). The original data set was divided
into 7 parts. In each part all fragments of Apollon Grigoriev were taken as the
class with label "1", and the same number of fragments by other authors were
taken at random as the class with label "0". Repetitions of fragments of other
authors were not allowed.</p>
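      <p>A minimal sketch of this partitioning is given below. It assumes that "repetitions were not allowed" means that a fragment of another author appears in at most one part, which is feasible here because 7 x 118 = 826 does not exceed 899. The function name and the toy identifiers are hypothetical.</p>
      <preformat>
```python
import random


def balanced_parts(minority, majority, n_parts=7, seed=0):
    """Pair all minority items with a disjoint random majority subset per part."""
    k = len(minority)
    if n_parts * k > len(majority):
        raise ValueError("not enough majority items for disjoint parts")
    rng = random.Random(seed)
    shuffled = rng.sample(majority, len(majority))  # shuffled copy
    parts = []
    for p in range(n_parts):
        chunk = shuffled[p * k:(p + 1) * k]  # majority items used only once
        items = list(minority) + chunk
        labels = [1] * k + [0] * len(chunk)
        parts.append((items, labels))
    return parts


# Hypothetical identifiers standing in for the 118 and 899 fragments.
minority = [f"grigoriev_{i}" for i in range(118)]
majority = [f"other_{i}" for i in range(899)]

parts = balanced_parts(minority, majority)
print(len(parts))         # 7
print(len(parts[0][0]))   # 236 fragments per part
print(sum(parts[0][1]))   # 118 of them labelled 1
```
      </preformat>
      <p>Each part can then be passed to a separate decision tree, so every tree sees all minority fragments but a different slice of the majority class.</p>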
      <p>A decision tree was trained on each part of the data. Each tree was grown until
its accuracy reached 100%, which determined the tree depth. A fragment of one of the trained trees is
shown in Fig. 1. All trees formed an ensemble, and the decision was made by
majority vote. Accuracy was calculated on the entire data set using the following
formula:</p>
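      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},</tex-math>
        </disp-formula>
        where TP is the number of true-positive, TN of true-negative, FP of false-positive and
FN of false-negative predictions. The experimental results are presented in
Table 2.
      </p>
      <table-wrap id="tab-2">
        <label>Table 2.</label>
        <caption>
          <p>Classifier accuracy</p>
        </caption>
        <table>
          <thead>
            <tr><th>Depth</th><th>Accuracy</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>0.8628</td></tr>
            <tr><td>2</td><td>0.9592</td></tr>
            <tr><td>3</td><td>0.9841</td></tr>
            <tr><td>4</td><td>0.9891</td></tr>
            <tr><td>5</td><td>0.992</td></tr>
            <tr><td>6</td><td>0.9901</td></tr>
          </tbody>
        </table>
      </table-wrap>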
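      <p>The majority vote of the ensemble and the accuracy computation can be sketched in a few lines. The vote matrix below is invented purely for illustration and the function names are assumptions; the sketch is not the authors' implementation.</p>
      <preformat>
```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most trees (7 trees, 2 classes: no ties)."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN) for binary labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical votes of 7 trees for 4 fragments (one row per fragment).
tree_votes = [
    [1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0],
]
y_true = [1, 0, 1, 0]
y_pred = [majority_vote(v) for v in tree_votes]

print(y_pred)                    # [1, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 0.75
```
      </preformat>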
      <p>In total, 7 decision trees were built. In the tree fragment shown
in Fig. 1, the third level contains two leaves with a small
number of fragments (from 12 to 27 in total, on average less than 8%). The
possible inaccuracy of the source data should be taken into account: the texts
of Apollon Grigoriev could have been edited by F. M. Dostoevsky. In addition, the
parameters of an author's style vary slightly depending on external
factors (such as mood, state of health, etc.). Therefore, when solving the problem
of text attribution, one should use only the first level or at most the
first two levels of the decision trees. As Table 2 shows, the accuracy of
the ensemble at the second level (0.9592) already corresponds to an error below the generally accepted 5%
significance level. Analyzing the decision trees contained in the ensemble, we
note that in 4 of them the first attribute was the "particle-adjective" bigram
with the condition "less than or equal to 6.5". In two trees the same attribute is used, but with a
different threshold (less than or equal to 7.5). Only one tree had a different
first attribute: the "adjective-particle" bigram, less than or equal to 2.5. We can assume
that a relative frequency of the "particle-adjective" bigram greater than 6.5 is a
distinctive feature of the journalistic style of Apollon Grigoriev. The proposed
algorithm thus makes it possible to solve the problem of text attribution.</p>
      <p>Fig. 1. Fragment of one of the trained decision trees. Recoverable node: "Abbreviated word - Abbreviated word" bigram ≤ 2.5, gini = 0.139, samples = 120, value = [111, 9], class = Other.</p>
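      <p>The impurity value reported for the node with value = [111, 9] can be checked by hand. The short sketch below computes the Gini impurity, 1 minus the sum of squared class shares, which is the default split criterion of scikit-learn decision trees; the function name is an assumption made for illustration.</p>
      <preformat>
```python
def gini(class_counts):
    """Gini impurity 1 - sum(p_k ** 2) for the class counts in one node."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # an empty node is treated as pure
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


print(round(gini([111, 9]), 3))  # 0.139, as in the node with 120 samples
print(gini([5, 5]))              # 0.5, an evenly split node
print(gini([7, 0]))              # 0.0, a pure node
```
      </preformat>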
      <p>The influence of the commonly used methods for processing unbalanced
data ("UpSampling", "UnderSampling", "SMOTE") on the accuracy of classification
of works by Apollon Grigoriev was analyzed.</p>
      <p>The available data set was divided into a test sample (42 fragments by Apollon Grigoriev, 310
by other authors) and a training sample. The training sample was subjected to each of the
techniques listed above to counter class imbalance. Then the quality metrics (accuracy and
ROC AUC) were calculated on the test sample, which was the same for all three
techniques. The results of the experiment are shown in Table 3.</p>
      <p>This analysis showed approximately the same accuracy for all three methods,
with UpSampling looking somewhat worse. The advantage of UnderSampling is that it is easier to
explain. Therefore, the authors decided to focus on it.</p>
      <p>Table 3. Experimental results (column headings: Accuracy (test), roc-auc (test), Accuracy (training), roc-auc (training)).</p>
      <p>When discussing the attribution of particular articles to particular authors, it should be
noted that in some cases there is no unequivocal evidence linking an article
to a specific author. In particular, one of the controversial and still unresolved
issues is the article "Poems by A. S. Khomyakov", whose
authorship has been debated in literary criticism for the past twenty years.</p>
      <p>
        The article "Poems by A. S. Khomyakov" has long been attributed to Apollon
Grigoriev. Recently, however, it has been considered a text authored by F.
M. Dostoevsky [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. It was interesting to check how our classifier would classify it.
A text is attributed to the author to whom most of its fragments are
assigned. Fig. 2 shows one of the resulting decision trees. If we classify
at the first node, then 6 of the 7 decision trees classify the article as "Other", i.e. as not
a text of Apollon Grigoriev. Only one tree produced a tie (5 fragments
"for belonging" and 5 "against"). With the split at the second level, 3 trees vote
"for belonging", 3 "against", and one refuses to classify. Our study
confirms the earlier conclusion [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] that there is no reason to consider the article
"Poems by A. S. Khomyakov" as belonging to Apollon Grigoriev.
      </p>
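      <p>The decision rule for a whole text at the first node can be sketched as follows. The sketch assumes that a fragment is counted "for belonging" when its "particle-adjective" value exceeds the threshold of the root split; the fragment values and the second threshold are hypothetical and serve only to show a clear decision and a tie.</p>
      <preformat>
```python
def classify_text(bigram_values, threshold=6.5):
    """Label a text by the majority of its fragments at a single root split.

    A fragment votes "Grigoriev" when its value exceeds the threshold and
    "Other" otherwise; an equal number of votes is reported as "tie".
    """
    votes_for = sum(1 for v in bigram_values if v > threshold)
    votes_against = len(bigram_values) - votes_for
    if votes_for > votes_against:
        return "Grigoriev"
    if votes_against > votes_for:
        return "Other"
    return "tie"


# Hypothetical "particle-adjective" values for 10 fragments of one text.
values = [3, 4, 2, 7, 5, 6, 8, 4, 3, 5]

print(classify_text(values))                 # Other (2 for, 8 against)
print(classify_text(values, threshold=4.5))  # tie (5 for, 5 against)
```
      </preformat>
      <p>Applying such a rule once per tree and then counting the trees gives a summary of the kind reported above.</p>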
      <p>The part-of-speech combination "particle" + "adjective", which
occurs so often in two texts known to belong to Apollon Grigoriev (in
transliteration from Russian, "Lermontov i ego napravlenie. Statya vtoraya" and
"Oppoziciya zastoya. Cherty iz istorii mrakobesiya"), almost never appears in
the controversial article "Poems by A. S. Khomyakov". While the author
uses this combination repeatedly in the two articles mentioned, in the article under study
it occurs only 10 times (the text consists of 2031 words): in six cases
the particle is "ne" and in three cases it is "dazhe". Over large
parts of the text no such combinations of parts of speech are found at all, whereas
in other articles belonging to A. Grigoriev the combination occurs more
often and with a wider range of particles: not only "ne"
and "dazhe", but also "tolko", "to", "vse-taki" and "zhe" followed by an adjective.
Of course, this observation alone is not enough to call the attribution to A. Grigoriev
into doubt; however, methods based on decision trees can
support a comprehensive analysis of texts in general, and of the article "Poems by
A. S. Khomyakov" in particular, in the context of attributing journalistic
texts.</p>
      <p>Fig. 2. One of the resulting decision trees for the fragments of the article "Poems by A. S. Khomyakov" (root node: "Particle - Adjective" bigram ≤ 6.5, samples = 10).</p>
      <p>
        SMALT Information System
      </p>
      <p>
        Specialized software is required for research in the field of text attribution. As an
example, we note several software tools that are described in more detail in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
"Stileanalizator" (graphematic and statistical analysis, work with marked-up
texts);
"Avtoroved" (graphematic, morphological and statistical analysis);
"Atributor" (statistical analysis);
"Lingvoanalizator" (graphematic and statistical analysis).
      </p>
      <p>
        The SMALT information system developed at Petrozavodsk State University [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is designed for the collaborative work of various specialists with
texts. The information system can be divided into three sections (see Fig. 3):
import of new texts, verification of texts by philologists, and the application of various
analysis methods both to a single text and to a group of texts.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Text fragment Import</title>
    </sec>
    <sec id="sec-3">
      <title>Database</title>
      <p>As part of the text import process, the text is divided into sections,
paragraphs, sentences and words, and each word is matched with its morphological
analysis. While the task of splitting the text is routine, the task of matching
the morphological analysis is rather complicated. The difficulty lies both in the
wide variety of spellings of a word (pre-revolutionary orthography, a more
flexible dictionary allowing different spellings of the word) and in the need to take
into account the context in which the word is used. At different times, an algorithm
that takes the first possible variant, an algorithm that takes the most frequently used variant and an algorithm
based on n-grams were used to select the semantic analysis of the word. The
latter is very promising because it requires only a small number of subsequent corrections.</p>
      <p>As part of the text verification process, philologists correct the segmentation of the text
(for example, by combining or separating words), correct the morphological
analysis of a word, or create a new analysis. The web interface
allows several specialists to work on a text at the same time.</p>
      <p>
        During the analysis process, the SMALT information system provides
researchers with access to the accumulated database in various cross-sections. For
example, one of the popular statistical characteristics is the Kjetsaa metrics [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The
SMALT information system calculates the characteristics of both a single work
and a group of texts. Another objective of the analysis is to identify the causes
of the results, for example, the reasons why text
fragments are separated between different nodes of a decision tree. The SMALT
information system gives access to the source data of the required fragment for
subsequent linguistic analysis.
      </p>
      <p>Conclusion</p>
      <p>When determining the style of Apollon Grigoriev,
the problem of sampling imbalance arises, i.e. the number of objects
of one class significantly exceeds the number of objects of the other class (in this
case, the objects are the texts of the analyzed authors). As a method for
constructing an ensemble of classifiers we use Bagging (Bootstrap aggregating). The
idea of this method is to train several models on random subsamples of the
original sample (obtained by bootstrap) and then average their predictions. The authors believe that
it suits the task better than Boosting. Analyzing decision trees
built using Python 3.6 (libraries: scikit-learn for the tree implementation, pandas for data
reading), we can assume that a relative frequency of the "particle-adjective"
bigram greater than 6.5 is a distinctive feature of the journalistic style of Apollon
Grigoriev.</p>
      <p>The obtained knowledge was used to study the authorship of the article
"Poems by A. S. Khomyakov", whose authorship has been debated in literary
criticism for the past twenty years. If we classify at
the first node, then 6 of the 7 decision trees classify it as "Other", i.e. as not a
text of Apollon Grigoriev.</p>
      <p>The obtained results were presented for further consideration to the
specialists of the Department of Russian Language and the Department of Classical
Philology, Russian Literature and Journalism (Petrozavodsk State University).</p>
      <p>Acknowledgements. This work was supported by the Russian Foundation
for Basic Research, project no. 18-012-90026.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Batura</surname>
          </string-name>
          , T. V.:
          <article-title>Formal methods for determining the authorship of texts</article-title>
          . Novosibirsk State University Bulletin.
          <source>Series "Information Technology"</source>
          .
          <source>Novosibirsk</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bühlmann</surname>
          </string-name>
          , P.:
          <article-title>Bagging, Boosting and Ensemble Methods</article-title>
          . In: Gentle J.,
          <string-name>
            <surname>Härdle</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mori</surname>
            <given-names>Y.</given-names>
          </string-name>
          (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg (
          <year>2012</year>
          ). https://doi.org/10.1007/978-3-642-21551-3_33
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Calle-Martin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Miranda-Garcia, A.:
          <article-title>Stylometry and Authorship Attribution: Introduction to the Special Issue</article-title>
          .
          <source>English Studies</source>
          <volume>93</volume>
          (
          <issue>3</issue>
          ),
          <fpage>251</fpage>
          -
          <lpage>258</lpage>
          (
          <year>2012</year>
          ). https://doi.org/10.1080/0013838X.2012.668788
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gurova</surname>
          </string-name>
          , E. I.:
          <article-title>Methods of Authorship Attribution in Contemporary National Philology</article-title>
          .
          <source>The New Philological Bulletin</source>
          <volume>3</volume>
          (
          <issue>38</issue>
          ),
          <fpage>29</fpage>
          -
          <lpage>44</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Farringdon</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          :
          <source>Analyzing for Authorship</source>
          / J. M. Farringdon with contributions by Morton A. Q.,
          <string-name>
            <surname>Farringdon</surname>
            <given-names>M. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            <given-names>M. D.</given-names>
          </string-name>
          Cardiff, University of Wales Press (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kjetsaa</surname>
          </string-name>
          , G.:
          <article-title>Attributed to Dostoevsky: The Problem of attributing to Dostoevsky anonymous articles in Time and Epoch</article-title>
          . Oslo: Solum Forlag A. S. (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kotov</surname>
          </string-name>
          , A. A.,
          <string-name>
            <surname>Mineeva</surname>
            ,
            <given-names>Z. I.</given-names>
          </string-name>
          , Rogov, A. A.,
          <string-name>
            <surname>Sedov</surname>
            ,
            <given-names>A. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
          </string-name>
          , Y. V.: Linguistic Corpuses. Petrozavodsk: PetrSU Publ. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Krawczyk</surname>
          </string-name>
          , B.:
          <article-title>Learning from imbalanced data: open challenges and future directions</article-title>
          .
          <source>Progress in Artificial Intelligence 5(4)</source>
          ,
          <fpage>221</fpage>
          -
          <lpage>232</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1007/s13748-016-0094-0
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rogov</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>Kulakov</surname>
          </string-name>
          , K.,
          <string-name>
            <surname>Moskin</surname>
          </string-name>
          , N.:
          <article-title>Software support in solving the problem of text attribution</article-title>
          .
          <source>Software engineering 10(5)</source>
          ,
          <fpage>234</fpage>
          -
          <lpage>240</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.17587/prin.10.234-240
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rogov</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>Sedov</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>Sidorov</surname>
          </string-name>
          , Y.,
          <string-name>
            <surname>Surovceva</surname>
          </string-name>
          , T.:
          <article-title>Mathematical methods for text attribution</article-title>
          . Petrozavodsk, PetrSU Publ. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Romanov</surname>
          </string-name>
          , A. S.:
          <article-title>Methodology and software complex for identifying the author of an unknown text</article-title>
          .
          <source>Tomsk</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sidorov</surname>
          </string-name>
          , Y. V.:
          <article-title>Mathematical and informational support of literary text processing methods based on formal grammatical parameters</article-title>
          .
          <source>Petrozavodsk</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Stamatatos</surname>
          </string-name>
          , E.:
          <article-title>A Survey of Modern Authorship Attribution Methods</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>60</volume>
          (
          <issue>3</issue>
          ),
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          (
          <year>2009</year>
          ) https://doi.org/10.1002/asi.21001
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Zakharov</surname>
          </string-name>
          , V.:
          <article-title>Question about Khomyakov</article-title>
          . In: Zakharov,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <article-title>The name of the author is Dostoevsky. Essay on creativity</article-title>
          . Moscow, Indrik,
          <fpage>231</fpage>
          -
          <lpage>247</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zakharov</surname>
            ,
            <given-names>V.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogov</surname>
          </string-name>
          , A.A.,
          <string-name>
            <surname>Sidorov</surname>
          </string-name>
          , Y. V.:
          <article-title>The problem of Dostoevsky grammatical constants search and anonymous and pseudonymous articles, published in "Time" and "Epoch" magazines (1861-1865) attribution</article-title>
          .
          <source>Works and Materials of "Russian Language Historical Destiny and the Present" International Congress</source>
          . Moscow, MSU,
          <fpage>404</fpage>
          -
          <lpage>405</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>