<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Morphemic Analysis of Russian Words</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics Nizhny Novgorod</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The paper considers the task of the morphemic analysis of Russian words and compares the efficiency of several proposed models. These models can be divided into three groups: derivational and inflectional rule-based models, probabilistic models, and hybrid models. The latter achieved a state-of-the-art result of 0.848 F-score on a test set of 500 Russian words. The models use dictionaries of morphs and words, as well as information about the part of speech and other morphological features of the word. Importantly, our solution takes into account synchronic word-formative relations between words. This allows for analyzing words in any grammatical form, as well as previously unseen words. Our system, which we make freely available to the community, also features morphemic annotation of entire texts and search for specified morphs.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>morphemic analysis for Russian</kwd>
        <kwd>derivational and inflectional rules</kwd>
        <kwd>probabilistic models</kwd>
        <kwd>morphemic annotation of texts</kwd>
        <kwd>search of morphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Systems that perform automatic morphemic analysis have a wide scope of application.
They can be used in machine translation to reduce the volume of dictionaries and to
recognize multi-morpheme out-of-vocabulary words [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. Automatic morphemic
analysis can be applied in morpheme notation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and speech recognition [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>
        ]. Also,
it can be used for morphemic and derivational annotation of corpora, which allows one
to study the functioning of the word-formation system, including the emergence of
neologisms and occasionalisms. This idea was put to use in the annotation of the Russian
National Corpus [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but a morphemic annotation / search tool can be useful for
analyzing any other corpus in Russian. Other areas of application include search engines
in which query expansion can be performed by finding words with the same root as the
query terms [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], as well as checking and self-checking morphemic analysis performed
by students [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ].
      </p>
      <p>
        In our study, we considered the existing approaches to automatic morphemic
analysis and developed our own models based on a set of rules and on a probabilistic model.
Our morphemic analysis tool [23], implemented in Python 3, also features functions for
text annotation and morph search. Our system uses a morph database derived from the
morpheme and spelling dictionary by A.N. Tikhonov [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and the 1980 Russian
grammar [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], as well as morph frequency and position data necessary for creating a
probabilistic model. This data, as well as gold standard word analyses used for testing, was
extracted from the same sources. We compared the performance of our system with one
of the very few available tools [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This choice was made because this system, unlike
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ], is able to analyze previously unseen words and words that are in non-initial
forms. Unfortunately, it is quite difficult to find a system for the morphemic analysis
of Russian words in the public domain.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review</title>
      <p>Approaches to automatic morpheme analysis differ depending on the application and
the databases used. The latter can be morph-only databases, or they can be
supplemented by databases of words or stems. Moreover, it is possible to automatically obtain
a morph list from a corpus or a word list.</p>
      <p>
        A.A. Karpov [
        <xref ref-type="bibr" rid="ref6">6, 16</xref>
        ] uses morphemic analysis for speech recognition, relying on
both morphological and morphemic dictionaries. If a word is not found in the
morphological dictionary, the entire word is assumed to be the root. If the word is
found, the ‘ending’ of the word (the part consisting of suffixes, inflections
and postfixes) is marked, and then the prefix is cut off.
      </p>
      <p>P.V. Dikiy and M.V. Edush [17] do not use large word or stem databases to increase
the processing speed of their solution; they rely on morpheme dictionaries only. Their
system searches for prefixes and inflections, and then for roots and suffixes, by iterating
over the characters of the word. On a complete match, the morph is marked in the
word; on a partial match, the search for morphs of this type continues; if
there are no matches, the search moves on to morphs of the next type.</p>
      <p>S.G. Fadeev and P.V. Zheltov [18] use only morph dictionaries for morphemic
segmentation of words. To optimize the work of the program, morph arrays are created;
these are sorted according to morph frequency and the position of the morph in the
group of morphs of this type. Because of ambiguity, it is necessary to continue matching
morphs even after finding the first match, but here, too, the number of checks can be
reduced. If the morphological features of the language prevent the appearance of a
given morph in some position (for example, one word cannot have two verbal endings),
it is necessary to start looking for morphs of the next type. If some morph in some
position does not occur in the database, but it is unknown whether in principle it can
appear in this position, before proceeding to the search for morphs of the next type, D
elements must be checked. The authors call the adjustable parameter D the
depth of morphemic analysis.</p>
      <p>
        M.G. Tagabileva and Yu.N. Berezutskaya [
        <xref ref-type="bibr" rid="ref8">8, 19</xref>
        ] used both affix and word
databases. The researchers applied morphemic analysis to annotate the Russian
National Corpus and to implement morpheme search functionality. Search in the main
subcorpus features the "derivation" option, which makes it possible to take into account
alternations within the same morpheme. When searching for a morpheme, the user can specify
its position and type (prefix, root, suffix or inflection) [19]. As far as the morphemic
analysis model is concerned, first prefixes and suffixes are marked, then roots are cut
off. To mark prefixes, the authors take advantage of the fact that most words with
prefixes have unprefixed pairs. An algorithm for extracting prefixes in words with related
roots was developed. It is possible to obtain several morphemic analyses of a word,
of which the correct variant is selected manually [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        O.V. Kukushkina [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] tackles the task of finding related words and uses both affix
and stem dictionaries to disambiguate word roots. The author’s system is based on the
principle that prefixes and roots are marked in a strictly morphemic fashion, while
suffixes are cut off formally, with no regard to their true morphemic boundaries. This is
due to the fact that the boundary between the prefix and the root is important for finding
the correct root, while finding boundaries within a suffix group does not affect root
boundaries.
      </p>
      <p>D. Bernhard [20] also solves the problem of finding related words. The author uses
unsupervised learning, that is, unannotated data is entered as input: a list of words
without morphemic boundaries or morph types indicated. This work is based on a
combination of three methods: considering the predictability of a word part, word comparison
and optimization. When detecting the most predictable word parts, transition
probabilities are applied. Word comparison is used to find substrings that discriminate
between words. Optimization consists in using the length and frequency of morphs to select the
right morphemic analysis.</p>
      <p>A.S. Sapin and E.I. Bolshakova [21] developed the morphological analyzer
CrossMorphy, one of the functions of which is automatic morphemic decomposition.
Considering this problem as a classification problem within the framework of machine
learning, the authors apply Conditional Random Fields. As the training set, ready-made
morphemic analyses were taken from the dictionaries of the CrossLexis system (23,400
words) and the Wiktionary (94,400 words). The accuracy for these resources was 0.79
and 0.69, respectively.</p>
      <p>As can be seen, the existing systems for morphemic analysis of Russian words use
a variety of methods and approaches. In many cases, unfortunately, accuracy is not
reported. We decided to develop our own system, with a few distinctive features in
mind. In particular, our system takes into account the derivational connections between
words, the part of speech of the analyzed word and its morphological features.
Additionally, the system is able to process word forms that are different from the lemma.
The program also allows for accurately analyzing out-of-vocabulary and complex
words, as well as performing morphemic annotation of arbitrary texts.</p>
    </sec>
    <sec id="sec-3">
      <title>Developing a System for Morphemic Analysis</title>
      <sec id="sec-3-1">
        <title>Basic Rule-Based Model (rules)</title>
        <p>
          Lists of prefixes, suffixes, postfixes and repeated elements of complex word formation
patterns were derived from the Russian grammar [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The morphs of the latter group
were designated as "recurring elements". The suffix and postfix lists were grouped
according to the parts of speech they are characteristic of.
        </p>
        <p>First, postfixes, inflections and form-building suffixes are found, then prefixes and
suffixes are marked in words formed by the prefix-suffix method. If prefixes have
not been marked by this stage, then prefixes and "recurring elements" are selected.
Then, depending on the part of speech of the analyzed word, other suffixes are found.</p>
        <p>Postfixes, inflections and form-building suffixes are successively cut off from the
word end. If the end parts of several morphs coincide, then the longest morph from this
group is chosen. First, the part of speech of the word is determined. Postfixes and
form-building suffixes are marked for the unchangeable parts of speech: the infinitive (INFN),
verbal participles (GRND), the comparative (COMP), adverbs (ADVB) and stative words
(PRED). Next, the inflections and form-building suffixes of the morphologically-rich
parts of speech are cut off. In the declinable parts of speech: noun-like pronouns
(NPROs), full adjectives (ADJF), numerals (NUMR), full participles (PRTF), nouns
(NOUN) - the marking of the inflection is based on letter-by-letter comparison of word
forms of different numbers and cases. For unchangeable words, no inflection is
marked, while other words may have a zero inflection. When comparing symbols,
the model takes into account the possible alternation of sounds. Then form-building
suffixes are marked for full participles, adjectives and comparative degree adverbs. The
next are the inflections of finite verbs (VERB). For verbs in the indicative mood of the
present and future tense, the inflection is marked by means of conjugating the verbs. In
order to find the inflection of past tense verbs and conditional mood forms, these words
are inflected for gender and number. Then, for verbs in the imperative mood, inflections
and form-building suffixes are marked. After that, the inflections and suffixes of short
adjectives (ADJS) and short participles (PRTS) are found.</p>
        <p>At the next stage, words formed with the prefix-suffix method are considered. For
these, possible formants (derivational affixes) and base words are established. If the
hypothetical base word is found in the dictionary, then the prefix and suffix are marked
in the analyzed word. For example, in the word собеседник (conversation partner), the
prefix со- and the suffix –ник- are properly found, because there is the base word
беседа (conversation) in the dictionary.</p>
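        <p>The prefix-suffix step can be illustrated with a simplified Python sketch. The prefix, suffix and base-word lists here are toy examples rather than the actual morph and lemma databases, and the handling of nominal endings is our simplifying assumption:</p>

```python
def prefix_suffix_parse(word, prefixes, suffixes, base_words):
    """Mark a prefix and a suffix together only if stripping both
    leaves a stem that corresponds to a base word in the dictionary."""
    for pre in prefixes:
        if not word.startswith(pre):
            continue
        rest = word[len(pre):]
        for suf in suffixes:
            if not rest.endswith(suf):
                continue
            stem = rest[:len(rest) - len(suf)]
            # hypothesized base word: try the bare stem plus a few
            # common nominal endings (a simplification of real inflection)
            for ending in ("", "а", "я", "о"):
                if stem + ending in base_words:
                    return pre, stem, suf
    return None

prefix_suffix_parse("собеседник", ["со"], ["ник"], {"беседа"})
# собеседник: prefix со-, stem -бесед-, suffix -ник, base word беседа
```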
        <p>The next step is the marking of prefixes. If an unprefixed pair is found for the word,
then the corresponding prefix is marked. For example, in the word принесём ([we] will
bring) the prefix при- will be marked, because for the lemma принести (to bring) the
unprefixed pair нести (to carry) is found. If the beginning of the analyzed wordform coincides with
the prefix, but the corresponding unprefixed pair is not found, then it is possible that
the word contains a bound base (for example, as in the word поднять – to lift).
Therefore, the system checks whether there are words in the dictionary, the first part of which
is a prefix, and the remaining part coincides with the unprefixed part of the analyzed
word (for the example word given above such a word is при-нять – to accept, which
has the same root as под-нять). If at least one such word is found, then the
corresponding prefix is marked. Similarly to prefixes, "recurring elements" are found.</p>
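        <p>A minimal sketch of this prefix-marking logic, including the bound-base check; the prefix list and lemma dictionary below are illustrative stand-ins for the system's actual data:</p>

```python
def find_prefix(word, prefixes, lemmas):
    """Mark a prefix if the remainder is a dictionary word (unprefixed
    pair), or if another prefix plus the same remainder is one (bound base)."""
    for pre in sorted(prefixes, key=len, reverse=True):
        if not word.startswith(pre):
            continue
        rest = word[len(pre):]
        if rest in lemmas:
            # unprefixed pair found, e.g. при-нести / нести
            return pre
        for other in prefixes:
            if other != pre and other + rest in lemmas:
                # bound base: *нять does not exist on its own, but
                # при-нять does, so под- is marked in под-нять
                return pre
    return None
```

Sorting the prefixes by length prevents a shorter prefix from shadowing a longer one that also matches the word beginning.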
        <p>Next, word-building suffixes are marked. Taking into account the derivational
connections between words makes it possible to correctly identify suffixes, even if
character sequences are the same. The rest of the word is considered the root.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Improved Rule-Based Model (rules_corrected)</title>
        <p>
          This model is a modification of the previous one. By removing prefixes, suffixes and
inflections from the list of all morphs extracted from the dictionary of A.N. Tikhonov
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], a list of roots was obtained. To prevent excessive marking of prefixes, the
following conditions were set: the prefix is not marked if the word is shorter than three
characters, or if the word starts with an element found in the list of roots and that root
is one character longer than the prefix originally found in the word.
        </p>
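        <p>The prefix-suppression conditions of rules_corrected can be expressed as a small guard function; the root list here is hypothetical:</p>

```python
def prefix_allowed(word, prefix, roots):
    """Return True if marking `prefix` in `word` passes the
    rules_corrected conditions sketched above."""
    # the word must be at least three characters long
    if len(word) >= 3:
        # reject the prefix if the word begins with a known root that is
        # exactly one character longer than the candidate prefix
        blocker = word[:len(prefix) + 1]
        return blocker not in roots
    return False
```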
      </sec>
      <sec id="sec-3-3">
        <title>Maximum Matching Model (maxmatch)</title>
        <p>For this and other probabilistic models below we used 100 614 lemmata from the
dictionary, which comprise 17 017 different morphs. In this model, a part of the word is
considered a morph if it is included in the list of morphs and is the longest possible
match. The function maxmatch(s) takes a sequence as input, which is split into morphs.
It uses the parameters i and j that specify the beginning and end of the morph,
respectively. First, it is assumed that the entire word is a morph, and if there is no
match, the position j is decreased by one. If the substring under consideration coincides
with some morph on the list, the morph is marked. The boundary i is moved and placed
at the end of the marked morph, and the boundary j is again placed at the end of the
word. The procedure continues until the boundary i reaches the end of the word.
</p>
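        <p>The maxmatch procedure admits a compact implementation. This sketch uses a toy morph set, and the single-character fallback for unmatched substrings is our assumption, since the behavior for out-of-dictionary material is not specified above:</p>

```python
def maxmatch(word, morphs):
    """Greedy longest-match segmentation: i is the start of the current
    morph, j its tentative end, shrinking from the word end."""
    segments = []
    i = 0
    while i != len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in morphs:
                segments.append(word[i:j])
                i = j
                break
        else:
            # assumption: an unmatched character becomes its own segment
            segments.append(word[i])
            i += 1
    return segments

maxmatch("принесём", {"при", "нес", "ём", "принес"})
# → ["принес", "ём"]: the longer match "принес" wins over "при"
```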
      </sec>
      <sec id="sec-3-3b">
        <title>Log-Likelihood Model (log_likelihood)</title>
        <p>
The model is based on finding the maximum likelihood for morphs. All possible
combinations of morpheme boundaries in the word under analysis are considered, then
those are selected in which the resulting word segments can occur at a given position
and are found in the list of morphs. Then the logarithms of the probabilities of the
candidate analyses are calculated. The analysis that has the maximum value is selected.</p>
        <p>
          By processing all morphemic analyses from Tikhonov's dictionary [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], an
associative array of morphs, positions and frequencies is created. Then a list of all possible
word segmentations is obtained. In total, 2<sup>x-1</sup> ways of segmenting the word are possible,
where x is the length of the word in characters (it is assumed that a boundary cannot
occur at the beginning of the word). For each of these segmentations, a bit sequence
mask is created that contains information about the presence of morphemic boundaries:
if there is no morpheme boundary after the symbol, then the value in the corresponding
position of the mask is 0, and if there is a boundary, then the value is 1. From all
segmentations, the system selects the ones in which the resulting word segments are all
found in the morph list.
        </p>
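        <p>The bit-mask enumeration of segmentations can be sketched as follows; each mask encodes the presence or absence of a boundary after every character:</p>

```python
def segmentations(word):
    """Yield every segmentation of `word`; bit k of the mask set to 1
    means a morpheme boundary after character k."""
    n = len(word)
    for mask in range(2 ** (n - 1)):
        parts, start = [], 0
        for k in range(n - 1):
            if mask >> k & 1:
                parts.append(word[start:k + 1])
                start = k + 1
        parts.append(word[start:])
        yield parts
```

In practice, only the segmentations whose parts all occur in the morph list at admissible positions are retained, as described above.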
        <p>Next, the most probable morphemic analysis of the word is selected. For each
candidate analysis, the system computes the product of the probabilities of the morphs
occurring in the word. The analysis with the highest value of the natural logarithm of
this product is chosen as the most probable one. Since the number of possible
segmentations is exponential in the length of the word, the maxmatch model is used instead of
log_likelihood for words longer than 18 characters.
</p>
      </sec>
      <sec id="sec-3-4">
        <title>Arithmetic Mean Model (mean)</title>
        <p>This is a slight modification of the log_likelihood model. The mean model also
considers all possible segmentations of the word under analysis, but computes the arithmetic
mean of morph probabilities for each candidate analysis. The one for which this
arithmetic mean is greatest is chosen as the best analysis.
</p>
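        <p>The difference between the log_likelihood and mean scoring schemes can be made concrete with a toy example; the probabilities below are invented for illustration and are not taken from the dictionary data:</p>

```python
import math

def score_log(parts, prob):
    # log_likelihood model: log of the product of morph probabilities
    return sum(math.log(prob[p]) for p in parts)

def score_mean(parts, prob):
    # mean model: arithmetic mean of morph probabilities
    return sum(prob[p] for p in parts) / len(parts)

def best_analysis(candidates, prob, score):
    # choose the candidate segmentation with the highest score
    return max(candidates, key=lambda parts: score(parts, prob))

# toy probabilities and two candidate analyses of принесём
prob = {"при": 0.2, "нес": 0.1, "ём": 0.3, "принес": 0.05}
candidates = [["принес", "ём"], ["при", "нес", "ём"]]
```

Because every probability is below 1, the log-product score favors the two-morph analysis, while the arithmetic mean favors the three-morph one, mirroring the contrast between the models discussed in the Results.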
      </sec>
      <sec id="sec-3-5">
        <title>Combined Models</title>
        <p>These models are combinations of the above ones: rules_corrected, maxmatch,
log_likelihood, and mean. First, the rules_corrected model extracts postfixes,
inflections, prefixes and suffixes. For finding the root and suffixes not found by
rules_corrected, one of the three other models (maxmatch, log_likelihood, or mean) is used.
</p>
      </sec>
      <sec id="sec-3-6">
        <title>Morphemic Annotation of Text</title>
        <p>All models allow for two modes of operation: analysis of individual words or text
annotation. When the text annotation mode is chosen, the system simply performs the
analysis of each successive token in the text. Function words and interjections are
skipped. For the maxmatch, log_likelihood, mean, rules_corrected+maxmatch,
rules_corrected+log_likelihood, and rules_corrected+mean models, annotation only
includes morphemic boundaries within every word. For the rules and rules_corrected
models, morph types are also part of the annotation. The rules and rules_corrected
models also make it possible to search for a morph by its type.</p>
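        <p>The annotation mode reduces to a loop over tokens. In this sketch, analyze and pos_of are placeholders for the system's word-level analyzer and part-of-speech tagger, which are not shown:</p>

```python
def annotate_text(text, analyze, skip_pos, pos_of):
    """Annotate each token with morphemic boundaries; tokens whose part
    of speech is in `skip_pos` (function words, interjections) are kept as-is."""
    annotated = []
    for token in text.split():
        word = token.strip('.,;:!?«»()').lower()
        if not word or pos_of(word) in skip_pos:
            annotated.append(token)
        else:
            annotated.append("-".join(analyze(word)))
    return " ".join(annotated)
```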
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <sec id="sec-4-1">
        <title>Evaluation Metrics</title>
        <p>We use the metrics precision, recall, and F-measure in the same form as they were
applied to morphemic analysis evaluation by K. Ak and O.T. Yildiz [22]. The values
of these metrics are calculated based on the following parameters: hits is the number of
correct boundaries (true positives), insertions is the number of unnecessary boundaries
(false positives), and deletions is the number of overlooked boundaries (false
negatives).</p>
        <p>Then precision, recall, and F-measure are calculated as follows:</p>
        <p>precision = hits / (hits + insertions)</p>
        <p>recall = hits / (hits + deletions)</p>
        <p>F-measure = 2 × hits / (2 × hits + insertions + deletions)</p>
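        <p>Given gold-standard and predicted boundary positions for a word, the three metrics can be computed directly from the hits, insertions, and deletions defined above:</p>

```python
def boundary_metrics(gold, predicted):
    """Precision, recall and F-measure over morpheme-boundary positions;
    `gold` and `predicted` are sets of boundary indices within one word."""
    hits = len(gold.intersection(predicted))      # true positives
    insertions = len(predicted.difference(gold))  # false positives
    deletions = len(gold.difference(predicted))   # false negatives
    precision = hits / (hits + insertions) if predicted else 0.0
    recall = hits / (hits + deletions) if gold else 0.0
    denom = 2 * hits + insertions + deletions
    f_measure = 2 * hits / denom if denom else 0.0
    return precision, recall, f_measure
```

For a whole test set, the three counts would be summed over all words before the ratios are taken (micro-averaging); whether the evaluation micro- or macro-averages is not stated above, so this is one plausible reading.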
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation Setting</title>
        <p>
          For evaluation, a random sample of 500 words from the Tikhonov dictionary was
obtained. It is important to note that these words had been removed from the dictionary
before any training of our models took place. Thus, the 500 words were completely
‘new’ to all our models. This test set was used to compare the performance of our
models described in Section 3 with one of the available morphemic analysis tools for
Russian [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] that we set as the baseline. The latter returns several candidate analyses. For
testing, we chose the analysis listed as the most probable.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Results and Discussion</title>
        <p>The rules model shows very high precision (0.905) since it takes into account word
forms and derivational connections, so it is very accurate at marking postfixes,
inflections, prefixes and form-building suffixes. The low recall value (0.639) can be
explained by the absence of capabilities for analyzing complex words, as well as by the
model ignoring some word-building suffixes and finding non-existent prefixes. The
rules_corrected model demonstrates even better precision (0.944) due to more accurate
prefix-finding.</p>
        <p>The maxmatch, log_likelihood and mean models yield lower results, since they do
not take into account form-building and derivational connections between words. The
low recall (0.567) of the maxmatch model is due to the fact that the model tries to match
the longest possible morphs, and the relatively high recall (0.795) of the mean model
can be explained by the high frequency of short morphs, which leads this model to
segmenting words into smaller parts. The maxmatch and log_likelihood models have
the same metric values: since each morph probability is below 1, the product of probabilities
grows as the number of factors decreases, so log_likelihood, like maxmatch, favors analyses with fewer, longer morphs.</p>
        <p>
          The F-measure values of the rules_corrected+maxmatch (0.848) and
rules_corrected+log_likelihood (0.847) models are quite high due to the fact that these models
produce morphemic segmentation for a larger number of derivational suffixes and
complex words. These models decisively outperform the existing morphemic analysis
system [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] set as the baseline (0.769).
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In an effort to solve the morphemic analysis task for the Russian language, we have
developed a few models: rule-based, probabilistic and combined. The rules model was
created based on the rules of form building and word formation. The improved version
of the model, rules_corrected, has better precision due to more accurate marking of
prefixes. The maxmatch, log_likelihood, and mean models use such characteristics of
morphs as length, frequency and position. By combining the rules_corrected and
maxmatch models we managed to achieve the best performance of 0.848 F-measure on a
gold standard set of 500 held-out words analyzed for morphemic structure.</p>
      <p>Our system, which is made available to the community [23], also features
morphemic annotation of arbitrary texts and morpheme search. The best-performing
models take into account the form-building patterns, derivational connections between
words, the part of speech of the analyzed word and its other morphological features.
The system can analyze previously unseen and complex words, as well as words in
non-initial forms.</p>
      <p>As is usually the case, there is still room for improvement. We believe that even
better quality of morphemic analysis is achievable by paying more attention to
word-formative suffixes and improving the model for analyzing complex words. In terms of
functionality, it is also possible to implement search for related words in a text.</p>
      <p>16. Ronzhin, A.L., Karpov, A.A.: Implementation of morphemic analysis for Russian speech recognition. In: 9th Conference Speech and Computer (2004). http://www.iscaspeech.org/archive_open/specom_04/spc4_291.pdf</p>
      <p>17. Edush, M.V., Dikiy, P.V.: The algorithm and the practical realization of the morphemic decomposition [Algoritm i prakticheskaya realizatsiya morfemnogo razbora]. http://taac.org.ua/files/a2011/proceedings/RU-1-Dikiy%20Petr%20Viktorovich-82.pdf</p>
      <p>18. Fadeev, S.G., Zheltov, P.V.: Optimization options of word forms morphemic analysis on the basis of statistical knowledge. In: Russian Linguistic Bulletin 3 (7), pp 15-17 (2016)</p>
      <p>19. The Russian National Corpus. The search in the corpus: the main corpus [Natsional'nyy korpus russkogo yazyka. Poisk v korpuse: osnovnoy korpus]. http://www.ruscorpora.ru/search-main.html</p>
      <p>20. Bernhard, D.: Unsupervised Morphological Segmentation Based on Segment Predictability and Word Segments Alignment. In: Kurimo, M., Creutz, M., Lagus, K. (eds.), Proceedings of the PASCAL Challenges Workshop on the Unsupervised Segmentation of Words into Morphemes, pp 19-23 (2006)</p>
      <p>21. Sapin, A.S., Bol'shakova, E.I.: Features of the construction of the morphoprocessor CrossMorphy for the Russian language [Osobennosti postroeniya morfoprotsessora russkogo yazyka CrossMorphy]. In: New Information Technologies in Automatic Systems: Materials of the 20th Workshop [Novye informatsionnye tekhnologii v avtomatizirovannykh sistemakh: materialy dvadtsatogo nauchno-prakticheskogo seminara]. Moscow: Keldysh Institute of Applied Mathematics [Institut prikladnoy matematiki imeni M. V. Keldysha], pp 73-81 (2017)</p>
      <p>22. Ak, K., Yildiz, O.T.: Unsupervised morphological analysis using tries. In: Computer and Information Sciences II. Springer, London, pp 69-75 (2011)</p>
      <p>23. Morphemic analysis system for Russian. https://github.com/LudmilaMaltina/morphemicanalysis-rus</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cotterell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
          </string-name>
          , H.:
          <article-title>Joint Semantic Synthesis and Morphological Analysis of the Derived Word</article-title>
          . In:
          <article-title>Transactions of the Association for Computational Linguistics</article-title>
          , vol
          <volume>6</volume>
          , pp
          <fpage>33</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Fritzinger</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing</article-title>
          .
          <source>In: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR</source>
          , pp
          <fpage>224</fpage>
          -
          <lpage>234</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sennrich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haddow</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birch</surname>
            ,
            <given-names>A.:</given-names>
          </string-name>
          <article-title>Neural Machine Translation of Rare Words with Subword Units</article-title>
          .
          <source>In: Proceedings of the 54th ACL</source>
          , pp
          <fpage>1715</fpage>
          -
          <lpage>1725</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Plungyan</surname>
          </string-name>
          , V.A.:
          <article-title>General Morphology: the introduction into subject [Obshchaya morfologiya: Vvedenie v problematiku]: textbook, 2nd edition</article-title>
          , 384 pp.
          <article-title>Moscow: Editorial URSS (</article-title>
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huckvale</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Experiments in Applying Morphological Analysis in Speech Recognition and Their Cognitive Explanation</article-title>
          .
          <source>In: IOA Conference on Speech and Hearing</source>
          . http://discovery.ucl.ac.uk/74330/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Karpov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Models and program realization of Russian speech recognition based on morphemic analysis [Modeli i programmnaya realizatsiya raspoznavaniya russkoy rechi na osnove morfemnogo analiza], a PhD thesis</article-title>
          . Saint-Petersburg, 129 pp (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tachbelie</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abate</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menzel</surname>
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Morpheme-based Automatic Speech Recognition for A Morphologically Rich Language Amharic</article-title>
          .
          <source>In: Proceedings of the 2nd International Workshop on Spoken Languages Technologies for Under-resourced Languages (SLTU'10)</source>
          , pp
          <fpage>68</fpage>
          -
          <lpage>73</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tagabileva</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berezutskaya</surname>
            ,
            <given-names>Yu.N.</given-names>
          </string-name>
          :
          <article-title>Word-formation annotation of the Russian National Corpus: aims and methods [Slovoobrazovatel'naya razmetka Natsional'nogo Korpusa russkogo yazyka: zadachi i metody]</article-title>
          . In:
          <source>Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” [Komp'yuternaya lingvistika i intellektual'nye tekhnologii: Po materialam ezhegodnoy Mezhdunarodnoy konferentsii «Dialog»], issue</source>
          <volume>9</volume>
          (
          <issue>16</issue>
          ), pp
          <fpage>499</fpage>
          -
          <lpage>507</lpage>
          . Moscow: Russian State University for the Humanities [Rossiyskiy gosudarstvennyy gumanitarnyy universitet] (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kukushkina</surname>
            ,
            <given-names>O.V.</given-names>
          </string-name>
          :
          <article-title>The problems of morphemic decomposition and automatization of the process of morphemic segmentation of the Russian word [Problemy morfemnogo chleneniya i avtomatizatsiya protsessa morfemnoy segmentatsii russkogo slova]</article-title>
          . In:
          <article-title>Russian computational and quantitative linguistics [Russkaya komp'yuternaya i kvantitativnaya lingvistika]</article-title>
          . http://philol.msu.ru&gt;~rlc2001...files/komp_linv/doc
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>Decomposition of words [Razbor slov po sostavu]</article-title>
          . http://www.morphemeonline
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <article-title>Search in the dictionaries. Morphemics [Poisk v slovaryakh. Morfemika]</article-title>
          . http://www.udarenieru.ru
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Online dictionaries [Slovari onlayn]. http://www.slovonline.ru</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <article-title>Dictionaries of the Russian language for downloading. The archives of the forum "Speak Russian" [Slovari russkogo yazyka dlya skachivaniya. Arkhivy foruma «Govorim po-russki»]</article-title>
          . http://www.speakrus.ru/dict/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <source>Russian Grammar [Russkaya grammatika]</source>
          . Vol.
          <volume>1</volume>
          : Phonetics. Phonology. Accent. Intonation. Word-formation. Morphology [Fonetika. Fonologiya. Udarenie. Intonatsiya. Slovoobrazovanie. Morfologiya] /
          <string-name>
            <given-names>N.Yu.</given-names>
            <surname>Shvedova</surname>
          </string-name>
          (main editor), 789 pp. Moscow: Science [Nauka] (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <article-title>Morphological, phonetic and morphemic analysis of the word [Morfologicheskiy, foneticheskiy i morfemnyy razbor slova]</article-title>
          . https://vnutrislova.net
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>