An Experimental Study of Neural Morpheme Segmentation Models for Russian Word Forms

Elena Bolshakova1[0000-0002-8659-5978] and Alexander Sapin2[0000-0002-9532-132X]

1 Lomonosov Moscow State University, National Research University Higher School of Economics, Moscow, Russia
eibolshakova@gmail.com
2 Lomonosov Moscow State University, Moscow, Russia
alesapin@gmail.com

Abstract. The morphemic structure of words is useful for various NLP problems, in particular, for deriving the meaning of unknown words in languages with rich morphology, such as Russian. Several machine learning models for automatic morpheme segmentation of Russian words have been built, but only for parsing lemmas. Meanwhile, texts contain significantly varying word forms, among which unknown words are often encountered, and their lemmas are not available. The paper reports on experiments comparing two ways to automatically segment Russian word forms; both involve splitting words into morphs and classifying the resulting morphs. The former is based on a neural model trained on a dataset automatically augmented with segmented word forms; the latter produces segmentation through a predicted lemma and a pre-trained neural morpheme segmentation model for lemmas. We show that the models have comparable quality in morpheme segmentation with classification, and that the model based on the augmented dataset slightly outperforms in word-level classification accuracy.

Keywords: morphological segmentation, morpheme analysis of Russian word forms, neural network models for morphology, morpheme segmentation with classification

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Morpheme segmentation, as a kind of morphological analysis, implies splitting words into constituent morphs, which are the surface forms of morphemes (roots and affixes), for example: без-вкус-н-ый, taste-less. Although the task of automatic morpheme segmentation was studied in the early years of natural language processing (NLP), significant progress in its solution has appeared in recent years, as various machine learning techniques began to be applied.

Since morphemes are the smallest meaningful language units, information about the morphemic structure of words is already used in various NLP applications and auxiliary tasks, including machine translation [2], recognition of semantically related words (cognates, paronyms, etc.), construction of derivational trees of words [10], and building word embeddings [3] for handling rare and out-of-vocabulary words (by deriving their meaning from distributional word vector representations).

Morpheme segmentation is especially topical, and at the same time more difficult, for languages with rich morphology (such as Russian or Finnish). For morphologically rich languages with many affixes of various types and meanings, a more complicated task is relevant, which involves, besides segmentation, classification of the segmented morphs. The main types of morphemes are prefix, root, suffix, and ending, for example: без:PREF/вкус:ROOT/н:SUFF/ый:END, taste:ROOT/less:SUFF.

The first works on morpheme segmentation were purely statistical and dictionary-based [10].
Since for a long time only a small number of words with labeled segmented morphemes was available for training, only unsupervised and semi-supervised machine learning techniques were applied; the best-known solutions are implemented in the Morfessor system [8, 12].

The task of morpheme segmentation with classification of segmented morphs remained almost unexplored until recent works [4, 5, 13] undertaken for Russian, which became possible due to powerful supervised machine learning techniques applied to relevant labeled data, first of all, the dataset from Tikhonov's derivation dictionary [15]. These works presented various supervised models with open-source code:

– a convolutional neural network (CNN) model [13] (https://github.com/AlexeySorokin/NeuralMorphemeSegmentation);
– a gradient boosted decision trees (GBDT) model [4] (https://github.com/alesapin/GBDTMorphParsing);
– a bidirectional long short-term memory (Bi-LSTM) neural model [5] (https://github.com/alesapin/RussianMorphParsing).

The implemented methods consider the task of morpheme segmentation with classification as sequence labeling [14] and classify the letters of words according to the main types of morphs. As shown by the comparative evaluation undertaken in [4, 5], all these models achieve an F-measure of about 98–99% for detecting morpheme boundaries, and they also show high accuracy of morpheme classification: up to 96–98% for letters and about 87–89% for whole words (depending on the training datasets and model hyperparameters). Therefore, these models present state-of-the-art (SOTA) methods for the task of morpheme segmentation with classification.

However, these SOTA models for Russian were developed only for morpheme segmentation of lemmas (normalized forms of words), since only lemmas are present in the existing labeled datasets. Meanwhile, in the morphologically rich and highly inflecting Russian language, significantly varying word forms are present in texts; in particular, for the verb успеть (to be in time), more than 15 of its forms may be used: успеют, успел, успели, and so on. Among various word forms, unknown ones are often encountered, and their lemmas are unknown. Since it turned out that the developed SOTA models work poorly on word forms, giving only about 30% classification accuracy, we aimed to research segmentation methods applicable to word forms.

In this paper we describe and experimentally compare two ways to automatically segment Russian word forms; both involve splitting words into morphs and classifying the resulting morphs. The former is based on a neural model trained on a dataset automatically augmented with segmented and labeled word forms; the latter produces segmentation through a predicted lemma and a pre-trained segmentation model for lemmas. It is unclear a priori which of the two ways is preferable. To evaluate them, we have chosen the CNN model as the core of both ways and have exploited an available dataset containing about 90,000 segmented words (lemmas) from Tikhonov's dictionary [15]. To train the model on word forms, we have extended this dataset with segmented word forms generated by an augmentation procedure we have developed.
Experimental evaluation has shown that the model trained on the augmented dataset (hereafter, the model on word forms) and the model trained on lemmas and supplemented with rules for segmenting a word form based on its segmented lemma (hereafter, the hybrid model) have comparable quality in morpheme segmentation with classification (also comparable with the quality of the SOTA methods), while the model on word forms slightly wins in word-level classification accuracy, with a score of 88%.

The paper starts with an overview of the main works on morpheme segmentation, followed by an explanation of our augmentation procedure and the resulting augmented dataset. Then our CNN model architecture and the key issues of training the model on word forms are described, and the results of experiments with the compared models are reported and discussed. Finally, we present some conclusions.

2 Related Work

The earliest method of morpheme segmentation was proposed by Z. Harris in [9]; it detects morpheme boundaries by letter variety statistics (LVS) [7]. Although the method showed only 61% precision (tested on a small English dictionary), the statistics proved useful in many subsequent studies of the task, in particular [4, 11].

In subsequent years, the best-known solutions for morpheme segmentation were implemented in the Morfessor system [8, 12], which exploits unsupervised machine learning methods trained on a large unlabeled text collection. The purely unsupervised method and its semi-supervised version, which uses some labeled data in addition to the text collection, give about 70–80% F-measure for detected morpheme boundaries (tested on English, Finnish, and Turkish words).

Another kind of semi-supervised machine learning for morpheme segmentation [11] was based on conditional random fields (CRF); the task was considered as sequentially classifying and labeling the letters of a given word. Besides LVS values and letter features, the developed CRF classifier exploits some data obtained by Morfessor, thus increasing the F-measure on morpheme boundaries to 84–91%.

A purely supervised method with significantly better quality for the twofold task of morpheme segmentation with classification was proposed in [13]; it proved effective due to applying a convolutional neural network (CNN) and training on the representative labeled data of Tikhonov's dictionary [15]. The task is considered as sequence labeling, classifying letters into 22 classes based on the BMES labeling scheme: the classes account for the beginning (B), middle (M), and ending (E) positions of a letter in the corresponding affix (prefix, root, suffix, postfix), as well as single-letter (S) variants of affixes, and also the hyphen and linking letter in multi-root and hyphenated words. The trained CNN model is supplemented with post-editing of the predicted classes by an auxiliary correcting procedure, which fixes some wrong sequences of classes according to their probabilities. The model outperforms all previous morpheme segmentation models, giving an F-measure of up to 98% on morpheme boundaries and also achieving a classification accuracy of 96% for letters and 88% for whole words.
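To make the sequence-labeling formulation concrete, the following minimal sketch (our illustration, not code from [13]; it omits the special classes for hyphens and linking letters) converts a word segmented into labeled morphs into a per-letter BMES labeling:

def bmes_labels(segmentation):
    """Convert 'без:PREF/вкус:ROOT/н:SUFF/ый:END' into per-letter BMES
    labels: [('б', 'B-PREF'), ('е', 'M-PREF'), ('з', 'E-PREF'), ...]."""
    labeled = []
    for morph in segmentation.split("/"):
        text, morph_type = morph.split(":")
        if len(text) == 1:
            labeled.append((text, "S-" + morph_type))    # single-letter morph
            continue
        labeled.append((text[0], "B-" + morph_type))     # beginning of morph
        for letter in text[1:-1]:
            labeled.append((letter, "M-" + morph_type))  # middle letters
        labeled.append((text[-1], "E-" + morph_type))    # end of morph
    return labeled

Under this representation, segmentation with classification reduces to predicting one such label per letter, which is exactly the sequence-labeling setting described above.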
Two more supervised machine learning models for morpheme segmentation with classification were developed for Russian words in recent works [4, 5]: the first is based on decision trees with gradient boosting (GBDT), while the second applies a Bi-LSTM neural network. In both models, unlike the CNN model, the number of letter classes was reduced to 10, since the set of BMES labels is redundant even for recognizing successive affixes and roots. The GBDT classifier takes into account features of the letter (in particular, its position in the word and LVS values), features of its word (some morphological tags), and also a window of 5 previous and 5 subsequent letters. The Bi-LSTM model [5] has three LSTM layers; the input includes one-hot encoded letters and also some morphological tags of the word being segmented. Both the GBDT and Bi-LSTM morpheme segmentation models were trained and evaluated on two different datasets of Russian words segmented into labeled morphs, including Tikhonov's dataset.

Evaluation of the CNN, GBDT, and Bi-LSTM models trained on the same Russian datasets has shown their comparable quality: about 98–99% F-measure on morpheme boundaries, 96–98% classification accuracy for letters, and about 87–89% for words [4, 5]. For now, they are SOTA methods outperforming the previously developed ones, both for morpheme segmentation and for segmentation with classification. However, they were developed for segmenting lemmas (normalized word forms), not for the various word forms encountered in texts. Therefore, it seems reasonable to study possible ways to build a broader supervised model, and for this purpose, a dataset with word forms split into morphs is needed.

3 Data Augmentation

In order to build a dataset augmented with segmented word forms, and thus suitable for training, we have developed a procedure that produces the necessary segmentation of word forms based on the known segmentation of the corresponding lemmas, along with grammatical information about Russian word-formation suffixes and about specific features of Russian inflection for words of various parts of speech [16]. The dataset based on Tikhonov's dictionary (https://github.com/AlexeySorokin/NeuralMorphemeSegmentation/tree/master/data) was the source of segmented and labeled lemmas, and the various word forms for a particular lemma were taken from the OpenCorpora dictionary [1] (http://opencorpora.org). The dataset encompasses 96,046 words (lemmas) of the main parts of speech: nouns, adjectives, verbs, adverbs. The segmented morphs of words are classified according to the main morpheme types of the Russian language (prefix, root, suffix, ending, postfix), and successive prefixes and suffixes (if any) are labeled; for example, the verb смазываться (to lubricate) is segmented and labeled as с:PREF/маз:ROOT/ыва:SUFF/ть:SUFF/ся:POSTFIX.

While applying our augmentation procedure, all lemmas from Tikhonov's dataset were considered and their corresponding word forms from OpenCorpora were processed, but those dataset elements that are absent in the OpenCorpora dictionary were discarded (approximately 5 thousand words, most of them very rare, such as гофмейстерский, яспис, спассеровать).

For a given word form to be segmented and its segmented lemma, the procedure applies segmenting rules depending on the part of speech and its subclass. For most nouns, adjectives, and participles, the rules are quite simple: in the general case, the given word form and lemma have a common beginning, and if the rest of the lemma is labeled as an ending, the rest of the word form is also annotated as an ending, whereas the common part copies the segmentation and labels of the lemma. The following word pair illustrates the rule:

Lemma: разрумяненный раз:PREF/румян:ROOT/енн:SUFF/ый:END
Word form: разрумяненному раз:PREF/румян:ROOT/енн:SUFF/ому:END
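A minimal sketch of this transfer rule follows (our illustration; the function name and the list-of-pairs representation of a segmentation are ours, and the real procedure falls back to more specific rules when the check fails):

def transfer_segmentation(lemma_morphs, word_form):
    """Transfer a lemma's segmentation to a word form that differs
    from the lemma only in its ending (nouns, adjectives, participles).
    lemma_morphs: list of (text, type) pairs, e.g.
        [("раз", "PREF"), ("румян", "ROOT"), ("енн", "SUFF"), ("ый", "END")]
    """
    stem_morphs = [m for m in lemma_morphs if m[1] != "END"]
    stem = "".join(text for text, _ in stem_morphs)
    if not word_form.startswith(stem):
        return None                   # no common beginning: the simple rule fails
    rest = word_form[len(stem):]      # the word form's own ending
    return stem_morphs + ([(rest, "END")] if rest else [])

For the pair above, transfer_segmentation returns [("раз", "PREF"), ("румян", "ROOT"), ("енн", "SUFF"), ("ому", "END")].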
However, for some subclasses of nouns and adjectives (words with a final yota: ковбой – ковбоя, соболий – собольего), for short adjectives (послушный – послушен), for nouns with fugitive vowels (день – дня, замочек – замочка), as well as for verbs, more complex segmenting rules were elaborated. Specifically, to segment personal verbal forms and gerunds (e.g., увидевши – у:PREF/вид:ROOT/е:SUFF/вши:SUFF), after detecting the common part with the infinitive form, the segmenting rules sequentially try to recognize and label word-formative suffixes (ова, ева, ыва, ива, вши, ев, ен, в, л, and so on) and the postfix (ся, сь) in the mismatching part of the given word form; the remaining part (if any) is classified as an ending. Here is an example, followed by a sketch of this rule:

Lemma: выходить вы:PREF/ход:ROOT/и:SUFF/ть:SUFF
Word form: выходила вы:PREF/ход:ROOT/и:SUFF/л:SUFF/а:END
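A simplified sketch of the verb-form rule (our illustration; the suffix inventory is abridged, and the real rules also use grammatical information [16] rather than purely greedy matching):

VERB_SUFFIXES = ["ова", "ева", "ыва", "ива", "вши", "ев", "ен", "в", "л"]  # abridged
POSTFIXES = ("ся", "сь")

def segment_verb_tail(tail):
    """Label the part of a verbal form that mismatches the infinitive:
    strip a postfix if present, greedily recognize word-formative
    suffixes, and classify the remaining part (if any) as an ending."""
    morphs, postfix = [], None
    if tail.endswith(POSTFIXES):
        tail, postfix = tail[:-2], tail[-2:]
    matched = True
    while matched:
        matched = False
        for suffix in sorted(VERB_SUFFIXES, key=len, reverse=True):
            if tail.startswith(suffix):
                morphs.append((suffix, "SUFF"))
                tail = tail[len(suffix):]
                matched = True
                break
    if tail:
        morphs.append((tail, "END"))
    if postfix:
        morphs.append((postfix, "POSTFIX"))
    return morphs

For выходила, whose common part with the infinitive выходить is выходи-, segment_verb_tail("ла") yields [("л", "SUFF"), ("а", "END")], matching the example above.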
In this way, our augmentation procedure has processed about 92% of the word forms. Some rare difficult cases, in particular those involving consonant alternation, were discarded, but such discarding does not affect the results of the comparison. The resulting dataset, augmented with segmented and classified word forms, has a total size of 1,130,359 elements: 34% nouns, 32.35% adjectives and participles, 33.56% verbal forms, and 0.07% words of other parts of speech.

The augmented dataset consists of inflectional paradigms for the processed lemmas (hereafter, inflectional groups); each group encompasses the word forms for a particular lemma. Groups for nouns and adjectives are relatively small, while for verbs a group includes all forms of the present, future, and past tenses and gerund forms, up to 31 elements. Here is a fragment of the inflectional group for the verb обсыпать (to strew):

обсыпать об:PREF/сып:ROOT/а:SUFF/ть:SUFF
обсыпал об:PREF/сып:ROOT/а:SUFF/л:SUFF
обсыпала об:PREF/сып:ROOT/а:SUFF/л:SUFF/а:END
обсыпало об:PREF/сып:ROOT/а:SUFF/л:SUFF/о:END
обсыпали об:PREF/сып:ROOT/а:SUFF/л:SUFF/и:END
обсыплю об:PREF/сып:ROOT/л:SUFF/ю:END
обсыпем об:PREF/сып:ROOT/ем:END
обсыплем об:PREF/сып:ROOT/л:SUFF/ем:END
обсыпешь об:PREF/сып:ROOT/ешь:END
обсыплешь об:PREF/сып:ROOT/л:SUFF/ешь:END
обсыпете об:PREF/сып:ROOT/ете:END

4 Model Architecture

For our study of segmenting word forms, among the three SOTA models for morpheme segmentation, namely CNN, GBDT, and Bi-LSTM, we have chosen the convolutional neural network (CNN), because the CNN trains much faster than the others while not losing in quality.

To simplify the experiments, in all our segmentation models we did not use the auxiliary correction procedure proposed for the original CNN model, nor ensembles of several models [13]. Although such techniques improve segmentation quality, the improvement is not significant (1–2%); moreover, their application is not necessary for a correct comparison of our model on word forms and the hybrid model, as both use the same neural architecture.

All our trained CNN models for segmenting words (word forms) were implemented with the Keras library [6] (based on TensorFlow). As model input we use letters in one-hot encoding, complemented with information about whether a particular letter is a vowel, and also with the POS tag of the word, which is taken from a morphological analyzer, one-hot encoded, and concatenated with the letter vectors. To align all words to the same fixed length (20 letters), we use padding, but with masking of the residual letters (excluding them when calculating errors) in order to avoid their influence on gradient descent. Thereby, one word is represented as a 1120-dimensional vector. The model has several layers; the last layer is fully connected and completed with a softmax activation function, which outputs a probability distribution over all possible letter classes. The resulting letter classes are obtained from the probability distribution with the argmax function. Similar to the works [4, 5], we apply a simplified (i.e., BE) labeling scheme for letters, with 11 classes.

Various hyperparameters of our CNN model were tested in preliminary experiments. The resulting model has four convolutional layers with 512 filters in each layer, dropout of 40%, the ReLU activation function, and a kernel size of 5. More filters per layer slightly improve the quality (by less than 0.5%), but make the model too heavy both for training and for evaluation. Additional layers also do not significantly improve quality: the model with three layers gives sufficient results, losing to the four-layer network only about 1–2%. Among the gradient descent algorithms tried (Adam, RMSprop, SGD), the best results were shown by Adam. A sketch of this architecture is given below.
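The following Keras sketch reproduces the architecture as described above (a minimal illustration; the per-letter input size of 56 follows from the 1120-dimensional word representation divided by 20 letters, while details such as the convolution padding and the placement of dropout are our assumptions):

from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN = 20      # all words aligned to 20 letters
FEATURES = 56     # one-hot letter + vowel flag + one-hot POS tag (20 * 56 = 1120)
N_CLASSES = 11    # simplified BE labeling scheme

inputs = keras.Input(shape=(MAX_LEN, FEATURES))
x = inputs
for _ in range(4):  # four convolutional layers, 512 filters, kernel size 5
    x = layers.Conv1D(512, kernel_size=5, padding="same", activation="relu")(x)
    x = layers.Dropout(0.4)(x)
# fully connected layer with softmax over the letter classes
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
# Padded positions can be masked by passing per-letter sample weights to
# model.fit(), so that residual letters do not contribute to the loss.

Predicted letter classes are then obtained with argmax over the last axis of the model's output.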
5 Models on the Augmented Dataset

For all our experiments, the datasets (the original Tikhonov dataset and the augmented one) were randomly divided in the proportion 70:10:20 into training, validation, and testing subsets, respectively; the training subset of the augmented dataset includes 791 thousand word forms. After tuning the model with random splits, we fixed the training, validation, and testing sets for reproducibility and correct evaluation of the models. All trained and evaluated models are freely available at https://github.com/alesapin/XMorphy.

In the experiments with training our CNN model on the augmented dataset, two different variants of randomly dividing the dataset were studied:

– random mixing of labeled word forms, followed by splitting them into training and testing subsets;
– random mixing of inflectional groups (each group consists of all word forms corresponding to the same lemma), followed by splitting into training and testing subsets (thus, the splitting does not divide the groups).

Thereby we obtained two trained models, namely the model on word forms with simple mixing and the model on word forms with group mixing; the results of their evaluation are presented in Tables 1 and 2. Table 1 shows the quality of segmentation alone, measured by precision, recall, and F-measure (the harmonic mean of recall and precision).

Table 1. Evaluation of morpheme segmentation for models on word forms

                    Word Forms                        Lemmas
Training set    Precision Recall F-measure    Precision Recall F-measure
Simple Mixing     99.40    99.65   99.52        98.82    99.32   99.07
Group Mixing      97.76    98.65   98.20        97.04    98.17   97.60
Only Lemmas       89.60    89.44   89.52        96.95    98.14   97.54

Table 2 reports the classification accuracy of the segmented morphs, for letters and for whole words, respectively. The former is the ratio of correctly recognized letter classes to the number of all letters; the latter is the ratio of completely correctly segmented words with correct classes for all their letters.

For comparison, the last rows of the tables give the scores of the CNN model trained only on the lemmas taken from the augmented dataset (more precisely, from its training subset). The scores show that this model loses significantly when applied to word forms: a much worse F-measure on morpheme boundaries (89.52%) and an even worse classification accuracy (81.19% for letters and 34.30% for words). At the same time, its almost identical scores on lemmas confirm the consistency of the experimental settings.

Table 2. Classification accuracy for models on word forms

                    Word Forms          Lemmas
Training set    Letters   Words     Letters   Words
Simple Mixing    99.26    96.75      98.53    94.46
Group Mixing     96.94    88.89      96.00    86.36
Only Lemmas      81.19    34.30      95.98    86.07

As for our models on word forms, the model with simple mixing outperforms its counterpart in all the scores (slightly on morph boundaries and significantly in word-level classification accuracy). The explanation is simple: since inflectional groups may be divided during mixing and splitting for the model with simple mixing, the testing subset can contain word forms from groups whose other elements are present in the training subset, which inflates the evaluation results. At the same time, the quality of the model with group mixing is comparable with that of the SOTA morpheme segmentation models built on lemmas. Therefore, it is not quite correct to compare the model with simple mixing against our hybrid model for segmenting word forms, and we compare only the model with group mixing.

6 Comparison with the Hybrid Model

The hybrid model implements another way to segment word forms, which consists of the following steps (a sketch of the pipeline is given after the list):

1. converting a given word form into its lemma;
2. segmenting the lemma with the model trained on lemmas (in our experiments, the model already trained and shown in the last rows of Tables 1 and 2);
3. transforming the resulting segmented lemma into a segmented word form with the aid of the procedure and segmenting rules described in Section 3.
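The sketch below illustrates the pipeline (our illustration: the paper does not fix a particular lemmatizer, so pymorphy2 is used here as an example, and lemma_model.segment is a hypothetical interface to the CNN model trained on lemmas):

import pymorphy2  # example lemmatizer; any analyzer producing lemmas would do

morph = pymorphy2.MorphAnalyzer()

def hybrid_segment(word_form, lemma_model):
    """Hybrid segmentation of a word form: lemmatize it, segment the
    lemma with the model trained on lemmas, and transfer the resulting
    segmentation to the word form with the rules from Section 3."""
    lemma = morph.parse(word_form)[0].normal_form           # step 1: predicted lemma
    lemma_morphs = lemma_model.segment(lemma)               # step 2: hypothetical model API
    return transfer_segmentation(lemma_morphs, word_form)   # step 3

Here transfer_segmentation stands for the full set of segmenting rules; the simple variant sketched in Section 3 covers only the easy nominal cases.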
Using the model already trained on lemmas, we have evaluated the proposed hybrid model, with precision, recall, and F-measure on morph boundaries (the segmentation scores are given in Table 3), and also with accuracy both for letters and for whole words (see Table 4). For comparison, in these tables we repeat the scores of the model on word forms (trained with group mixing). It is important that the CNN network of the hybrid model was trained on the lemmas taken from the training dataset of the model on word forms and evaluated on the same testing set.

Table 3. Evaluation of morpheme segmentation for the hybrid model and the model on word forms

                        Word Forms                        Lemmas
Model               Precision Recall F-measure    Precision Recall F-measure
Hybrid Model          97.37    98.44   97.90        96.95    98.14   97.54
Model on Word Forms   97.76    98.65   98.20        97.04    98.17   97.60

One can notice that the two evaluated models for segmenting Russian word forms have very close scores for morpheme segmentation, while in classification (Table 4) the model on word forms (group mixing) slightly wins, both for letters and for words.

Table 4. Classification accuracy for the hybrid model and the model on word forms

                        Word Forms          Lemmas
Model               Letters   Words     Letters   Words
Hybrid Model         96.51    87.28      95.98    86.07
Model on Word Forms  96.94    88.89      96.00    86.36

Additionally, we have evaluated the distribution of errors in morpheme segmentation by the types of wrongly placed boundaries between morphemes of various types; the results are presented in Table 5. In both models under comparison, the most frequent errors concern wrong boundaries between roots and suffixes, almost half of all errors (column ROOT-SUFF in Table 5). Other frequent error types are wrong recognition of the boundary between prefix and root (PREF-ROOT) and erroneous segmentation of successive roots (ROOT-ROOT) or suffixes (SUFF-SUFF). Below we present some examples of these types. In general, the error statistics of the two models are about the same, with rare errors in segmenting word endings.

– Root and suffix (ROOT-SUFF): for the verb перетлевать, the incorrectly segmented word form пере:PREF/тл:ROOT/е:SUFF/ва:SUFF/ешь:END instead of the correct пере:PREF/тле:ROOT/ва:SUFF/ешь:END;
– Prefix and root (PREF-ROOT): for the adjective подоблачный, the erroneous под:ROOT/о:PREF/блач:ROOT/н:SUFF/ою:END instead of the correct segmentation под:PREF/облач:ROOT/н:SUFF/ою:END;
– Successive roots and suffixes (ROOT-ROOT, SUFF-SUFF): for the adjective трегубный, instead of the correct тр:ROOT/е:LINK/губ:ROOT/н:SUFF/ого:END, the wrong segmentation variant тре:ROOT/губ:ROOT/н:SUFF/ого:END.

Table 5. Types of errors in morpheme segmentation (%)

Model          PREF-PREF PREF-ROOT ROOT-ROOT ROOT-SUFF SUFF-SUFF SUFF-END ROOT-END Other
Hybrid            0.06     26.52     10.46     51.36     10.3      0.61     0.15    0.54
On Word Forms     0.06     27.42      8.0      49.02     10.3      3.33     0.91    0.96

7 Conclusion and Future Work

We have developed and evaluated two models of morpheme segmentation with classification, proposed specifically for word forms, which are important for morphologically rich and highly inflective languages such as Russian. The first model is purely supervised and built on the augmented dataset with labeling of constituent morphs; the second is a hybrid one combining the supervised model based on lemmas with rules for segmenting word forms. To augment the existing dataset of labeled Russian lemmas, we have created a rule-based procedure generating segmented word forms.

The quality of the developed models turned out to be comparable, with the model based on the augmented dataset slightly better in word-level accuracy. This means that both models can be used in various NLP experiments with Russian texts. At the same time, the choice of model may depend on its computational complexity, which is important in particular applications. For some applied tasks, a three-layer CNN model instead of our four-layer CNN (as the core of the hybrid model) may be preferable, as it is faster to train and takes less memory.

Our future work includes:

– resolving some inconsistencies and errors in the original Tikhonov dataset, which we have observed while experimenting with it, in order to increase the quality of the models built for word forms;
– elaborating additional segmenting rules for some unconsidered cases of word forms; such an improvement of our augmentation procedure may be useful not only for improving the morpheme segmentation models, but also for other tasks.
References

1. Bocharov, V., Bichineva, S., Granovsky, D., Ostapuk, N., Stepanova, M.: Quality assurance tools in the OpenCorpora project. In: Computational Linguistics and Intelligent Technologies: Papers from the Annual Int. Conference "Dialogue 2011", Issue 10, 101–109, Moscow (2011).
2. Botha, J., Blunsom, P.: Compositional morphology for word representations and language modelling. In: Proceedings of the 31st International Conference on Machine Learning (ICML), 1899–1907 (2014).
3. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146 (2017).
4. Bolshakova, E., Sapin, A.: Comparing models of morpheme analysis for Russian words based on machine learning. In: Computational Linguistics and Intellectual Technologies: Proc. of the Int. Conference "Dialogue 2019", Moscow, RGGU (2019).
5. Bolshakova, E., Sapin, A.: Bi-LSTM model for morpheme segmentation of Russian words. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. Proceedings of the Int. Conference AINL 2019, CCIS, 1119, 151–160. Springer, Cham (2019).
6. Chollet, F.: Keras: Deep learning library for Theano and TensorFlow, https://keras.io/, last accessed 2020/12/9.
7. Çöltekin, Ç.: Improving successor variety for morphological segmentation. LOT Occasional Series, 16, 13–28. University of Groningen (2010).
8. Creutz, M., Lagus, K.: Unsupervised models for morpheme segmentation and morphology learning. ACM Transactions on Speech and Language Processing, 4(1), Article 3 (2007).
9. Harris, Z.S.: Morpheme boundaries within words: Report on a computer test. Transformations and Discourse Analysis Papers, 73, 68–77 (1967).
10. Lango, M., Žabokrtský, Z., Ševčíková, M.: Semi-automatic construction of word-formation networks. Language Resources & Evaluation (2020). https://doi.org/10.1007/s10579-019-09484-2
11. Ruokolainen, T., et al.: Painless semi-supervised morphological segmentation using conditional random fields. In: Proceedings of the 14th Conference of the European Chapter of the ACL, Short Papers, 84–89 (2014).
12. Smit, P., Virpioja, S., Grönroos, S., Kurimo, M.: Morfessor 2.0: Toolkit for statistical morphological segmentation. In: Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the ACL, Gothenburg, 21–24 (2014).
13. Sorokin, A., Kravtsova, A.: Deep convolutional networks for supervised morpheme segmentation of Russian language. In: Ustalov, D., et al. (eds) Artificial Intelligence and Natural Language. Proc. of the Int. Conference AINL 2018, CCIS, 930, 3–10. Springer, Cham (2018).
14. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Proceedings of the 27th Int. Conference on Neural Information Processing Systems, 2, 3104–3112 (2014).
15. Tikhonov, A.N.: Word Formation Dictionary of the Russian Language. Moscow, Russkij Yazyk Publ. (1990).
16. Zaliznjak, A.A.: Grammatical Dictionary of Russian: Inflection. Moscow, Russkij Yazyk Publ. (1977).