<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Studying Text Complexity in Russian Academic Corpus with Multi-Level Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marina Solnyshkina</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valery Solovyev</string-name>
          <email>maki.solovyev@mail.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir Ivanov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Danilov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Innopolis University</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kazan Federal University</institution>
          ,
          <addr-line>Kazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The need for a large multi-level annotated corpus of Russian academic texts arose from the demand to measure the complexity (difficulty) of texts assigned to particular grade levels. In other words, the goal was to measure how well these texts meet the cognitive and linguistic needs of students at those levels. For this purpose we compiled a corpus of 20 textbooks on Social Studies and History written for Russian secondary and high school students. Measuring text complexity required linguistic annotation at several language levels, including POS tags, dependencies and word frequencies. As an example of using the corpus to study text complexity, we compare three readability formulas.</p>
      </abstract>
      <kwd-group>
        <kwd>multi-level</kwd>
        <kwd>annotated corpus</kwd>
        <kwd>Russian academic texts</kwd>
        <kwd>text complexity</kwd>
        <kwd>POS-tags</kwd>
        <kwd>dependencies</kwd>
        <kwd>word frequencies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Automatic multi-level analysis of language requires a large corpus or a number of
corpora, which are viewed as being of great value for many research tasks [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. In
this paper we present an ongoing project at Kazan Federal University
(Russia) aimed at compiling and annotating a corpus of Russian academic texts.
      </p>
      <p>
        To the best of our knowledge, no prior corpus-based research has specifically aimed
at estimating the text complexity of Russian educational materials on Social Studies. The
few existing studies of Russian text readability have been sporadic. They relied on small collections
of texts of a single type or genre: fiction (mostly for academic purposes) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], legal texts [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and academic texts (chemistry,
mathematics, economics) [
        <xref ref-type="bibr" rid="ref14 ref20 ref26 ref27">26, 14, 20, 27</xref>
        ]. Most of the research in the area
has been based on English and other Germanic languages, for native and/or non-native
readers [
        <xref ref-type="bibr" rid="ref10 ref16 ref22 ref23 ref3 ref6">3, 6, 10, 16, 22, 23</xref>
        ]. The shortage of corpus-based research on the text
complexity of modern Russian academic texts provides a strong justification for
the current study. Our objective is to introduce a multi-level annotated corpus of
Russian academic texts and to promote its use in Russian discourse research.
      </p>
      <p>The authors hope that the corpus will contribute to a detailed examination,
identification and measurement of Russian text features. The paper is organized as
follows. In Section 2 (Background) we introduce the problem of text complexity and
present the empirical approaches to it applied in modern multidisciplinary studies. In
Section 3 (Corpus Description) we describe the corpus: the type of texts collected, the
size of the corpus and the ultimate goal behind its compilation. In the same section we
also describe the preprocessing of the corpus and its multi-level annotation. In
Section 4 we briefly describe the experiments conducted with the compiled corpus.
In the concluding section we offer our insights into possible uses of the corpus and
the perspectives of the work.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        The earliest studies on readability, dating back to the late 19th century, were mostly aimed
at developing readability formulas and used a limited number of quantitative
features: average sentence length, average word length and word frequency [
        <xref ref-type="bibr" rid="ref13 ref4 ref5">13, 4, 5</xref>
        ].
Because of the simplicity of these models and of the variables they rely on, readability
formulas have been harshly criticized ever since they first appeared.
Modern advances in natural language processing (NLP) have made it possible to extract
lexical and syntactic features of a text automatically. They have also made it possible to train readability models using
machine-learning techniques [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. American researchers successfully conducted text readability studies based on n-gram models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These studies were later extended,
using syntactic simplicity/complexity and discourse characteristics (narrativity, abstractness,
referential and deep cohesion, etc.), to assessing the profile of a particular text and its
target audience (see [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]).
      </p>
      <p>
        Modern researchers of English are developing a new generation of NLP tools that provide
accurate and valid analyses of various text dimensions. These tools measure complex
discourse constructs using surface-level linguistic features such as text structure;
vocabulary (the number of unique words in a text); givenness (the number of determiners
and demonstratives in a text); anaphor (the number of all pronouns); lexical diversity;
connectives and conjuncts, which together with anaphor are indicators of text
coherence; future as an indicator of situational cohesion; syntactic complexity, measured
as the number of words per sentence; and the number of negations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Text
features rest on the systemic parameters of a language, so they have to be specified for each
language separately. Thus, every modern NLP tool, like every readability formula, is
applicable to one particular language. For example, parameters measured for English
cannot be applied directly to estimating the complexity of Russian texts, because Germanic
languages have limited morphology in comparison with Russian [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. All text features therefore need to be validated on a corpus
of considerable size.
      </p>
      <p>
        Owing to the lack of available corpora, Russian discourse studies are currently
viewed as underdeveloped [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Russian academic texts have been used
in readability studies only since the 1970s [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Apart from a short break in the 1990s,
research in the area has been quite extensive. Researchers now view the following
text features as cognitively significant for readability: the number of syllables, the number of
words, the sentence count, average sentence length, and the counts of abstract words,
homonyms, polysemous words, technical terms, etc. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. V. V. Ivanov
tested correlations for 49 factors. The strongest correlations were found for
the percentage of short adjectives, the percentage of finite verb forms, the
Flesch-Kincaid Grade Level score, the Flesch Reading Ease score [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the Coleman-Liau index, the average number of words per sentence, the percentage of complex sentences,
the percentage of compound sentences and the percentage of abstract words [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. N. Karpov et
al. [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] conducted a series of experiments using a number of machine-learning
models to rank Russian texts automatically by complexity. For this
purpose the authors compiled two subcorpora. The first consisted of texts created by teachers
for learners of Russian as a foreign language (http://texts.cie.ru). The second consisted of 50 original
news articles for native readers. They assessed 25 parameters of each text in the
corpora, such as sentence length, word length, vocabulary and part-of-speech
classification. Over the last fifteen years, the readability of Russian academic texts has been actively
discussed at conferences in Russia and abroad as well as in numerous publications
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. However, readability studies are still far from systematic, and inconsistent
reporting makes it difficult to draw firm conclusions [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This is mostly because of corpus
limitations.
      </p>
      <p>The features that define Russian text complexity can be studied on a
large corpus containing academic texts used in modern schools. The Russian National Corpus
and the Corpora of Russian (http://web-corpora.net/?l=en) are large and widely used in studies of
lexical, syntactic and discourse features. Unfortunately, neither can be used for the purposes of our research, because they do not
provide access to modern Russian academic texts.</p>
    </sec>
    <sec id="sec-3">
      <title>Corpus Description</title>
      <p>For the purposes of the study we compiled a corpus of two sets of textbooks on Social
Studies and History written for Russian secondary and high school students. The total
size of the corpus of 20 textbooks is more than 1 million tokens.</p>
      <p>The first collection comprises 14 Social Studies textbooks intended for Grades 5 to 11.
They were written by L. N. Bogolubov (marked “BOG”) and by A. F. Nikitin (marked “NIK”). In
our study, the grade level is the school grade (class number) for which a textbook is
intended. This collection was selected to train the predictive model and to define the
independent variables of text variation. The second collection comprises 6 History
textbooks by different authors, intended for Grades 10 and 11. Both sets of textbooks are from the “Federal
List of Textbooks Recommended by the Ministry of Education and Science of the
Russian Federation to Use in Secondary and High Schools”.</p>
      <p>To ensure reproducibility of results, we have made the corpus available online on a
website. Note, however, that the order of sentences in the published texts is
shuffled. The sizes of the BOG and NIK subcollections are presented in Table 1.
In Table 1, an asterisk (*) denotes the advanced version of a book for the corresponding
grade, and ‘-’ denotes the absence of a textbook for that grade.</p>
      <p>Table 1. Sizes of the BOG and NIK subcollections by grade (rows: 5-th, 6-th, 7-th, 8-th,
9-th, 10-th, 10-th*, 11-th, 11-th*).</p>
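      <p>Only the order of sentences is randomized. Per-sentence statistics such as the average sentence length (ASL) are therefore unchanged in the published version. Features that depend on sentence order, such as cohesion between adjacent sentences, are not preserved. The minimal Python sketch below illustrates this under stated assumptions. The helper names are hypothetical, and texts are represented as lists of tokenized sentences. A fixed random seed stands in for whatever procedure was actually used to shuffle the corpus.</p>
      <preformat>
```python
import random
from statistics import mean

Sentence = list[str]  # a tokenized sentence (list of words)


def shuffle_sentences(sentences: list[Sentence], seed: int = 0) -> list[Sentence]:
    """Return a copy of a text with its sentences in random order.

    Hypothetical helper: the fixed seed only makes this sketch reproducible;
    it is not the procedure used for the published corpus.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def average_sentence_length(sentences: list[Sentence]) -> float:
    """ASL: mean number of words per sentence."""
    return mean(len(s) for s in sentences)


# Toy text, not taken from the corpus.
text = [
    ["Общество", "постоянно", "развивается"],
    ["Право", "регулирует", "отношения", "между", "людьми"],
    ["Государство", "издаёт", "законы"],
    ["Граждане", "обязаны", "соблюдать", "законы", "своей", "страны"],
]

published = shuffle_sentences(text, seed=42)
# Same multiset of sentences, so ASL is identical (4.25 for this toy text).
print(average_sentence_length(text), average_sentence_length(published))
```
      </preformat>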
      <p>Data on the collection of History textbooks are presented in Table 2. Its first
column lists the textbook author and the grade.
For convenience, all texts in the corpus were preprocessed in the same way.
The common preprocessing included tokenization and sentence splitting.
During preprocessing we excluded extremely long sentences (longer than 120
words) and very short sentences (shorter than 5 words), which we consider
outliers. Such sentences need not be outliers in another domain. In
school textbooks on Social Studies, however, sentences shorter than 5 words are outliers.
Sentence-level and word-level properties of the preprocessed dataset are presented in
Tables 1 and 2.</p>
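      <p>The length filter described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. The regular-expression tokenizer treats a word as a run of letters or digits, optionally joined by hyphens. This tokenizer is an assumption, since the paper does not specify its own. Only the thresholds of 5 and 120 words come from the text.</p>
      <preformat>
```python
import re

# Assumed tokenizer: words are runs of letters/digits, optionally joined by
# internal hyphens (e.g. "Северо-западный" counts as one word).
WORD_RE = re.compile(r"\w+(?:-\w+)*")

MIN_WORDS = 5    # shorter sentences are treated as outliers
MAX_WORDS = 120  # longer sentences are treated as outliers


def count_words(sentence: str) -> int:
    """Number of words in a sentence under the assumed tokenizer."""
    return len(WORD_RE.findall(sentence))


def drop_outlier_sentences(sentences: list[str]) -> list[str]:
    """Keep sentences whose length lies in [MIN_WORDS, MAX_WORDS] words."""
    allowed = range(MIN_WORDS, MAX_WORDS + 1)
    return [s for s in sentences if count_words(s) in allowed]


sentences = [
    "Глава 1. Общество",  # a heading split off as a sentence
    "Общество представляет собой сложную развивающуюся систему.",
    " ".join(["слово"] * 121) + ".",  # an implausibly long "sentence"
]
kept = drop_outlier_sentences(sentences)
print(kept)
```
      </preformat>
      <p>Using a range membership test keeps both bounds inclusive, which matches the rule of removing only sentences shorter than 5 or longer than 120 words.</p>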
      <p>Extremely short sentences mostly appear as titles of chapters and sections of the
books or as a result of incorrect sentence splitting. We omit them because
the average sentence length is a very important feature in text complexity assessment
and should not be biased by splitting errors. At the same time, sentences
of five to seven words can still be viewed as short in Russian, because the
average sentence length in our corpus is greater than ten words.</p>
      <p>Table 1 shows that, as generally expected, the average number of words per sentence (ASL)
increases with the grade.</p>
      <sec id="sec-3-1">
        <title>Multi-level Annotations in Corpus</title>
        <p>All annotations in the corpus are made on three levels: text level, sentence level
and word level. At the text level, meta-annotations record the number of sentences,
the set of tokens, the author and the grade level of a given text. At the word level, each word
has a part-of-speech tag. POS tagging was performed with
TreeTagger for Russian (http://www.cis.uni-muenchen.de/schmid/tools/TreeTagger/).
The tagset is available from the project website. As an example, Table 3 shows the
distribution of major POS tags across the Social Studies texts. We also annotate
each lemma in the corpus with its relative frequency in a large corpus of
Russian texts, the Russian National Corpus.</p>
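        <p>A per-text summary like the one in Table 3 could be derived from the tagger output roughly as follows. This is a hedged sketch based on two assumptions. First, TreeTagger is assumed to have been run so that each output line holds a token, its tag and its lemma, separated by tab characters. Second, the first character of a tag is assumed to encode the major part of speech, as in MULTEXT-East-style tagsets (N for nouns, V for verbs, A for adjectives, R for adverbs). Both assumptions should be checked against the tagset published with the corpus. For each major part of speech, the hypothetical function pos_profile returns its share of tokens and its number of unique lemmas.</p>
        <preformat>
```python
from collections import Counter, defaultdict

# Assumed mapping from the first tag character to a major part of speech
# (MULTEXT-East-style tags); all other tags are pooled as "other".
MAJOR_POS = {"N": "noun", "V": "verb", "A": "adjective", "R": "adverb"}


def pos_profile(tagged_lines):
    """Return {major_pos: (share_of_tokens, number_of_unique_lemmas)}.

    Each input line is expected to contain a token, a tag and a lemma
    separated by tab characters; other lines are skipped.
    """
    token_counts = Counter()
    lemmas = defaultdict(set)
    total = 0
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _token, tag, lemma = parts
        pos = MAJOR_POS.get(tag[:1], "other")
        token_counts[pos] += 1
        lemmas[pos].add(lemma.lower())
        total += 1
    return {pos: (n / total, len(lemmas[pos])) for pos, n in token_counts.items()}


# Toy tagger output with made-up tags, not taken from the corpus.
sample = [
    "Общество\tNcnsnn\tобщество",
    "быстро\tR\tбыстро",
    "развивается\tVmip3s-m-e\tразвиваться",
    "",
    "сложное\tAfpnsnf\tсложный",
    "общество\tNcnsan\tобщество",
]
print(pos_profile(sample))
```
        </preformat>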
        <p>
          At the sentence level, the corpus contains annotations of sentence boundaries, the
assignment of tokens to sentences, and a dependency tree for each sentence. For
dependency parsing we use pretrained neural models
(https://github.com/MANASLU8/CoreNLPRusModels) for the Stanford Dependency
Parser for Russian (https://nlp.stanford.edu/software/stanford-dependencies.shtml).
Finally, we are currently adding semantic annotations to the corpus. The
semantic annotations are based on the very large Russian thesaurus RuThes [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
RuThes concepts are mapped to WordNet, which makes it possible to process
textual content at the semantic level.
        </p>
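        <p>The three annotation levels can be pictured as nested records. The Python sketch below is one possible in-memory representation. It is not the corpus's actual storage format, which this section does not describe. The class and field names are hypothetical. Dependency heads are assumed to be 1-based token indices, with 0 for the root, as in CoNLL-style output. The lemma frequency is stored as an optional number, because its unit is not specified here. The tree_depth method shows how a sentence-level feature could be read off the dependency tree.</p>
        <preformat>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    form: str
    lemma: str
    pos: str                      # TreeTagger tag (word level)
    lemma_freq: Optional[float]   # relative lemma frequency in the RNC, if known
    head: int                     # 1-based index of the syntactic head; 0 = root
    deprel: str                   # dependency relation label


@dataclass
class Sentence:
    tokens: list[Token]

    def tree_depth(self) -> int:
        """Nodes on the longest root-to-token path (assumes a well-formed tree)."""
        def depth(i: int) -> int:
            d = 0
            while i != 0:  # climb head links until the artificial root
                i = self.tokens[i - 1].head
                d += 1
            return d
        return max((depth(i) for i in range(1, len(self.tokens) + 1)), default=0)


@dataclass
class Text:
    author: str
    grade: str  # e.g. "10", or "10*" for an advanced version
    sentences: list[Sentence] = field(default_factory=list)

    @property
    def n_sentences(self) -> int:
        return len(self.sentences)

    @property
    def n_tokens(self) -> int:
        return sum(len(s.tokens) for s in self.sentences)


# Toy example with made-up annotations.
s = Sentence([
    Token("Общество", "общество", "Ncnsnn", None, 2, "nsubj"),
    Token("развивается", "развиваться", "Vmip3s-m-e", None, 0, "root"),
    Token("быстро", "быстро", "R", None, 2, "advmod"),
])
doc = Text(author="BOG", grade="10", sentences=[s])
print(doc.n_sentences, doc.n_tokens, s.tree_depth())
```
        </preformat>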
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Studies of Text Readability and Complexity</title>
      <p>First, the corpus can be used to adjust readability formulas for Russian. Second,
even the very simple statistics provided in Table 3 can be useful in text complexity
studies. For example, the average number of unique adjectives grows
as the grade level increases, while the average number of adverbs (as well as
verbs) decreases. Both observations agree with the idea that texts become more
descriptive. Moreover, the data make it possible to measure these
correlations.</p>
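      <p>One way to measure such a correlation is Pearson’s coefficient between the grade level of each text and a per-text feature, such as the average number of unique adjectives. The paper does not state which coefficient it uses. The following pure-Python sketch of Pearson’s r is therefore an illustrative choice, and the numbers in the usage lines are placeholders, not corpus data.</p>
      <preformat>
```python
from math import sqrt
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) in (0, 1):
        raise ValueError("need two samples of equal length with at least two values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Placeholder numbers (not corpus data): grades vs. a per-text feature.
grades = [5, 6, 7, 8, 9]
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(grades, feature))
```
      </preformat>
      <p>A positive coefficient would correspond to a feature that grows with the grade, such as unique adjectives. A negative coefficient would correspond to a feature that declines, such as adverbs.</p>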
      <p>
        In this study, three formulas were applied to 5 Social Studies and
6 History textbooks for Grades 10 and 11. These were our formula [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], Matskovskiy’s readability formula
[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] and Oborneva’s readability formula [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In the formulas below, GL denotes the grade
level.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] we proposed the readability formula GL = 0.36ASL + 5.76ASW −
11.97, where ASL is the average number of words per sentence and ASW is the
average number of syllables per word. Below, this formula is labeled RRF. In [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ],
M. S. Matskovskiy computed the first readability formula for the Russian language:
GL = 0.62ASL + 0.123X + 0.051, where X is the percentage of three-syllable words
in the text. In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], I. Oborneva introduced the readability formula
GL = 0.5ASL + 8.4ASW − 15.59.
      </p>
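      <p>The three formulas can be applied to raw text as sketched below. The sketch is illustrative and rests on several assumptions not stated in the paper. Syllables are counted as vowel letters, a standard approximation for Russian, where each syllable contains exactly one vowel. Words are extracted with a simple regular expression. X in Matskovskiy’s formula is computed as the percentage of words with at least three syllables. “Three-syllable words” could also be read as words with exactly three syllables, so the threshold is exposed as the parameter min_syllables.</p>
      <preformat>
```python
import re

RU_VOWELS = set("аеёиоуыэюя")
WORD_RE = re.compile(r"\w+(?:-\w+)*")  # assumed tokenizer


def count_syllables(word: str) -> int:
    """Approximate syllable count: number of vowel letters in the word."""
    return sum(ch in RU_VOWELS for ch in word.lower())


def text_statistics(sentences: list[str], min_syllables: int = 3):
    """Return (ASL, ASW, X) for a text given as a list of sentences.

    ASL: average number of words per sentence.
    ASW: average number of syllables per word.
    X:   percentage of words with at least min_syllables syllables.
    """
    words = [w for s in sentences for w in WORD_RE.findall(s)]
    syllables = [count_syllables(w) for w in words]
    asl = len(words) / len(sentences)
    asw = sum(syllables) / len(words)
    x = 100 * sum(n >= min_syllables for n in syllables) / len(words)
    return asl, asw, x


def rrf(asl: float, asw: float) -> float:
    return 0.36 * asl + 5.76 * asw - 11.97


def matskovskiy(asl: float, x: float) -> float:
    return 0.62 * asl + 0.123 * x + 0.051


def oborneva(asl: float, asw: float) -> float:
    return 0.5 * asl + 8.4 * asw - 15.59


asl, asw, x = text_statistics(["Мама мыла раму.", "Государство издаёт законы."])
print(f"RRF={rrf(asl, asw):.2f}, Matskovskiy={matskovskiy(asl, x):.2f}, "
      f"Oborneva={oborneva(asl, asw):.2f}")
```
      </preformat>
      <p>The three grade estimates produced for each textbook can then be compared with the textbook's actual grade.</p>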
      <p>Some features are believed to contribute to text readability but are not
measured by the existing readability formulas. To verify such features, we compared the 11 texts under study
to see which metrics correlate better with the grade level. The data are
presented in Table 4.</p>
      <p>As Fig. 1 shows, Oborneva’s formula places these textbooks at a level
comprehensible only to readers with at least 16 to 17 years of formal schooling, i.e.
holders of a Bachelor’s or Master’s degree. Table 4 shows that the grade levels
predicted by Oborneva’s regression equation do not coincide with the
actual grade levels. For the History textbooks the difference amounts to 6 years.
Matskovskiy’s readability formula was initially developed for media texts only.
Nevertheless, it proves quite reliable in assessing the readability of academic texts as well
(compare columns ‘Grade’ and ‘Matskovskiy’ in Table 4).</p>
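      <p>The comparison above is made by inspecting the columns of Table 4. The agreement could also be summarized numerically, for example as the mean absolute difference between predicted and actual grade levels, together with the largest difference. The sketch below does this. The metric is an illustrative choice, not one used in the paper. The sketch assumes that the actual grade can be read from the suffix of each book identifier, as in the Table 4 identifiers Klimov_10 and BOG_10*. The identifiers and values in the usage lines are hypothetical.</p>
      <preformat>
```python
from statistics import mean


def grade_from_id(book_id: str) -> int:
    """Grade encoded in a book identifier such as "Klimov_10" or "BOG_11*"."""
    return int(book_id.rsplit("_", 1)[1].rstrip("*"))


def prediction_errors(predicted: dict[str, float]) -> tuple[float, float]:
    """Mean and maximum absolute difference between predicted and actual grades."""
    if not predicted:
        raise ValueError("no predictions given")
    diffs = [abs(gl - grade_from_id(book)) for book, gl in predicted.items()]
    return mean(diffs), max(diffs)


# Hypothetical identifiers and predictions, for illustration only.
example = {"TextA_10": 12.5, "TextB_11*": 11.0}
mae, worst = prediction_errors(example)
print(f"mean absolute difference: {mae:.2f}, largest: {worst:.2f}")
```
      </preformat>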
      <sec id="sec-4-1">
        <title>Table 4</title>
        <p>Rows (Book): Guryanov_11, Klimov_10, Petrov_11, Plenko_11, Ponomarev_11,
Soboleva_10 (History); BOG_10, BOG_10*, BOG_11*, NIK_10, NIK_11 (Social Studies).
Columns include ASL, Grade and Matskovskiy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>
        There are thus two reasons that make further research into the readability of Russian texts
relevant. First, recent reports from educators call for improving reading
comprehension in secondary and high schools throughout the country [
        <xref ref-type="bibr" rid="ref1 ref2">2, 1</xref>
        ]. Second, researchers
report that Russian students lack interest in reading because of an inappropriate selection
of educational materials [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The corpus is a valuable instrument for discourse
studies. Its data and flexible search system provide a solid foundation for comparative
research on modern Russian texts and enable deep insights into the patterns of, and
dependencies between, different text features. The authors also view the corpus as a
powerful tool for discovering new aspects and regularities of Russian discourse.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was financially supported by the Russian Science Foundation, grant
#18-18-00436, the Russian Government Program of Competitive Growth of Kazan
Federal University, and the subsidy for the state assignment in the sphere of scientific
activity, grant agreement 34.5517.2017/6.7. The Russian Academic Corpus (section 3,
3.1 in the paper) was created without support from the Russian Science Foundation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Kompetentnostnyy podkhod v vysshem professionalnom obrazovanii [Competence-based approach in higher professional education]</article-title>
          (
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Orlova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Gracheva</surname>
          </string-name>
          , Eds.),
          <source>Tula</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berezhkovskaja</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>Problema psihologicheskoj negotovnosti k polucheniju vysshego obrazovanija u studentov mladshih kursov [The problem of psychological unreadiness for higher education among junior students]</article-title>
          . M.: Prospec. (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Britton</surname>
            ,
            <given-names>B.K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gulgoz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Using Kintsch's computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures</article-title>
          .
          <source>Journal of Educational Psychology</source>
          ,
          <volume>83</volume>
          , pp.
          <fpage>329</fpage>
          -
          <lpage>404</lpage>
          (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chall</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dale</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>Readability revisited: The new Dale-Chall readability formula</article-title>
          .
          <source>Brookline Books</source>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Coleman</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liau</surname>
            <given-names>T. L.</given-names>
          </string-name>
          <article-title>A computer readability formula designed for machine scoring</article-title>
          .
          <source>Journal of Applied Psychology</source>
          ,
          <volume>60</volume>
          :
          <fpage>283</fpage>
          -
          <lpage>284</lpage>
          (
          <year>1975</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cornoldi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Oakhill</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          . (Eds.).
          <article-title>Reading comprehension difficulties: Processes and intervention</article-title>
          . Hillsdale, NJ: Erlbaum (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Crossley</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>L. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyle</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          <article-title>Analyzing discourse processing using a simple natural language processing tool (SiNLP)</article-title>
          .
          <source>Discourse Processes</source>
          ,
          <volume>51</volume>
          , pp.
          <fpage>511</fpage>
          -
          <lpage>534</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dzmitryieva</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Iskusstvo yuridicheskogo pis'ma: kolichestvennyy analiz resheniy Konstitutsionnogo Suda Rossiyskoy Federatsii [The art of legal writing: a quantitative analysis of the Russian Constitutional Court rulings]</article-title>
          .
          <source>Sravnitel'noe konstitutsionnoe obozrenie</source>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>133</lpage>
          . (In Russian) (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Heilman</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            <given-names>K. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Eskenazi</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Combining lexical and grammatical features to improve readability measures for first and second language texts</article-title>
          .
          In
          <source>Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL-07)</source>
          , pp.
          <fpage>460</fpage>
          -
          <lpage>467</lpage>
          , Rochester, New York (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jackson</surname>
            ,
            <given-names>G. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guess</surname>
            ,
            <given-names>R. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          <article-title>Assessing cognitively complex strategy use in an untrained domain</article-title>
          . In
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Taatgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>van Rijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schomaker</surname>
          </string-name>
          , &amp;
          <string-name>
            <given-names>J.</given-names>
            <surname>Nerbonne</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 31st Annual Meeting of the Cognitive Science Society</source>
          . pp.
          <fpage>2164</fpage>
          -
          <lpage>2169</lpage>
          . Amsterdam, The Netherlands: Cognitive Science Society
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ivanov</surname>
            <given-names>V.</given-names>
          </string-name>
          <article-title>K voprosu o vozmožnosti ispolzovanija lingvističeskix xarakteristik složnosti teksta pri issledovanii okulomotornoj aktivnosti pri čtenii u podrostkov [Toward using linguistic profiles of text complexity for research of oculomotor activity during reading by teenagers]</article-title>
          .
          <source>Novye issledovanija [New studies]</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>42</fpage>
          -
          <lpage>50</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Karpov</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baranova</surname>
            <given-names>J.</given-names>
          </string-name>
          , and Vitugin F.
          <article-title>Single-sentence readability prediction in Russian</article-title>
          .
          <source>In Proceedings of Analysis of Images, Social Networks, and Texts conference</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kincaid</surname>
            <given-names>J. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fishburne</surname>
            <given-names>R. P.</given-names>
            <suffix>Jr.</suffix>
          </string-name>
          ,
          <string-name>
            <surname>Rogers</surname>
            <given-names>R. L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Chissom</surname>
            <given-names>B. S.</given-names>
          </string-name>
          <article-title>Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease formula) for Navy enlisted personnel</article-title>
          .
          <source>Research Branch Report 8-75</source>
          , Naval Technical Training Command, Millington, TN
          (
          <year>1975</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Krioni</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikin</surname>
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Fillipova</surname>
            <given-names>A.</given-names>
          </string-name>
          <article-title>Avtomatizirovannaja sistema analiza slozhnosti uchebnyh tekstov [The automated system of the analysis of educational texts complexity]</article-title>
          .
          <source>Vestnik UGATU (Ufa)</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <volume>28</volume>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          <article-title>Reading both high and low coherence texts: Effects of text sequence and prior knowledge</article-title>
          .
          <source>Canadian Journal of Experimental Psychology</source>
          ,
          <volume>55</volume>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kintsch</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Songer</surname>
            ,
            <given-names>N.B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kintsch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <article-title>Are good texts always better? Interactions of text coherence, background knowledge, and levels of understanding in learning from text</article-title>
          .
          <source>Cognition and Instruction</source>
          ,
          <volume>14</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Obobroneva</surname>
            <given-names>I.</given-names>
          </string-name>
          <article-title>Avtomatizirovannaya otsenka slozhnosti uchebnykh tekstov na osnove statisticheskikh parametrov [Automated assessment of the complexity of educational texts based on statistical parameters]</article-title>
          . Moscow: RAS Institut soderzhaniya i metodov obucheniya (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Okladnikova</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>Model' kompleksnoj ocenki chitabel'nosti testovyx materialov na etape razrabotki [A model of multidimensional evaluation of the readability of test materials at the development stage]</article-title>
          .
          <source>Prikaspijskij journal: upravlenie i vysokie texnologii</source>
          ,
          <volume>3</volume>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Popova</surname>
            <given-names>Ja.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shishkevich</surname>
            <given-names>E.V.</given-names>
          </string-name>
          <article-title>Standartizacija uchebnoj literatury srednej shkoly po kriteriju udobochitaemosti [Standardization of secondary school textbooks by the readability criterion]</article-title>
          .
          <source>Nauchnye vedomosti BelGU. Ser. Gumanitarnye nauki. 12. No. 6</source>
          . pp.
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Shpakovskiy</surname>
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <article-title>Otsenka trudnosti vospriyatiya i optimizatsiya slozhnosti uchebnogo teksta [Assessment of comprehension difficulty and optimization of educational text complexity]</article-title>
          .
          <source>PhD thesis</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Solnyshkina</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harkova</surname>
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kiselnikov</surname>
            <given-names>A.</given-names>
          </string-name>
          .
          <article-title>Comparative Coh-Metrix analysis of reading comprehension texts: Unified (Russian) state exam in English vs Cambridge first certificate in English</article-title>
          .
          <source>English Language Teaching</source>
          ,
          <volume>7</volume>
          (
          <issue>12</issue>
          ):
          <fpage>65</fpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ozuru</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Reilly</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          <article-title>Where's the difficulty in standardized reading tests: The passage or the question?</article-title>
          <source>Behavior Research Methods</source>
          ,
          <volume>40</volume>
          , pp.
          <fpage>1001</fpage>
          -
          <lpage>1015</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Reynolds</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories</article-title>
          .
          <source>In: Proceedings of the 11th Workshop on the Innovative Use of NLP for Building Educational Applications</source>
          , San Diego, CA, 16 June 2016, pp.
          <fpage>289</fpage>
          -
          <lpage>300</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Sinclair</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Corpus Evidence in Language Description</article-title>
          . In:
          <string-name>
            <given-names>A.</given-names>
            <surname>Wichmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fligelstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>McEnery</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Knowles</surname>
          </string-name>
          (eds.)
          <source>Teaching and Language Corpora</source>
          . London/New York: Longman, pp.
          <fpage>27</fpage>
          -
          <lpage>39</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Ivanov</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solnyshkina</surname>
            <given-names>M.I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Solovyev</surname>
            <given-names>V.D.</given-names>
          </string-name>
          <article-title>Efficiency of text readability features in Russian academic texts</article-title>
          . In:
          <source>Computational Linguistics and Intellectual Technologies</source>
          , vol.
          <volume>17</volume>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>287</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Karpov</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baranova</surname>
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vitugin</surname>
            <given-names>F.</given-names>
          </string-name>
          .
          <article-title>Single-sentence readability prediction in Russian</article-title>
          .
          <source>In International Conference on Analysis of Images, Social Networks and Texts</source>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>100</lpage>
          . Springer (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Ustinova</surname>
            ,
            <given-names>L. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fazylova</surname>
            <given-names>L. S.</given-names>
          </string-name>
          <article-title>Avtomatizacija ocenki slozhnosti uchebnyh tekstov na osnove statisticheskih parametrov [Automating the assessment of educational text complexity based on statistical parameters]</article-title>
          .
          <source>Vestnik Karagand. un-ta. Ser. Matematika. 1</source>
          . pp.
          <fpage>96</fpage>
          -
          <lpage>103</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Loukachevitch</surname>
            ,
            <given-names>N. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lashevich</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerasimova</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanov</surname>
            ,
            <given-names>V. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Dobrov</surname>
            ,
            <given-names>B. V.</given-names>
          </string-name>
          <article-title>Creating Russian WordNet by conversion</article-title>
          .
          <source>Kompjuternaja Lingvistika i Intellektualnye Tehnologii</source>
          ,
          <volume>15</volume>
          , pp.
          <fpage>405</fpage>
          -
          <lpage>415</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Solovyev</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanov</surname>
            <given-names>V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Solnyshkina</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Assessment of reading difficulty levels in Russian academic texts: Approaches and Metrics</article-title>
          ,
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          , vol.
          <volume>34</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>3049</fpage>
          -
          <lpage>3058</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Matskovskiy</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          <article-title>Problemy chitabelnosti pechatnogo materiala [Problems of printed material readability]</article-title>
          . In:
          <string-name>
            <surname>Dridze</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Leontev</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          (eds)
          <source>Smyslovoe vospriyatie rechevogo soobshcheniya v usloviyakh massovoy kommunikatsii [Semantic perception of verbal communication in the context of mass communication]</source>
          . Moscow: Nauka (
          <year>1976</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>