<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Normalization of Early Modern Ukrainian in GRAC: the Case of Lesia Ukrainka&apos;s Works</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Maria</forename><surname>Shvedova</surname></persName>
							<email>mariia.o.shvedova@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>Bandera Str., 12</addrLine>
									<postCode>79000</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nataliia</forename><surname>Prydvorova</surname></persName>
							<email>nataliia.prydvorova.mflpl.2021@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>Bandera Str., 12</addrLine>
									<postCode>79000</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ilona</forename><surname>Skibina</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>Bandera Str., 12</addrLine>
									<postCode>79000</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Normalization of Early Modern Ukrainian in GRAC: the Case of Lesia Ukrainka&apos;s Works</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C0C00058C12DDB0CB4E373FB83FDDA91</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Ukrainian language</term>
					<term>orthography</term>
					<term>normalization</term>
					<term>rule-based annotation</term>
					<term>idiolect</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper deals with the representation of the oeuvre of Lesia Ukrainka in the General Regionally Annotated Corpus of Ukrainian (GRAC). The poet's texts pose numerous challenges with regard to orthographic and linguistic variation, and existing editions follow distinct normalization strategies that are often contradictory and inconsistent. The authors propose rule-based patterns that enable more efficient processing of such texts within the corpus. The approach is relevant not only for Lesia Ukrainka's works but also for a wider range of Ukrainian texts of the period that share similar features.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>GRAC is a reference corpus of the modern Ukrainian language <ref type="bibr" target="#b0">[1]</ref>. By tradition, the history of the modern Ukrainian language begins with the poet and dramatist Ivan Kotlyarevsky, that is, in the late 18th and early 19th centuries. However, the majority of GRAC is composed of contemporary Ukrainian texts. One of the reasons for this imbalance is the imperfection of the available tools for processing Ukrainian texts in their historical spellings. Until 1928, Ukrainian orthography had no unified standard; in the 19th century in particular, many spelling variants coexisted. Because of these spelling differences, the system of morphological analysis used in GRAC, which is based on the VESUM electronic dictionary of Ukrainian <ref type="bibr" target="#b1">[2]</ref>, leaves many words and grammatical forms unrecognized when processing such texts.</p><p>For this reason, many 19th- and early 20th-century texts have been added to GRAC in a modernized orthography, following later publications rather than their original spelling. But using such editions in the corpus is also problematic, because publications of historical texts, particularly those of the Soviet period, were not only normalized in terms of spelling and grammar but often considerably edited and censored <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. Hence these editions are not a comprehensive source of information on the linguistic period when the text in question appeared. In addition, many texts published in the 19th and early 20th centuries have never been reprinted since (e.g., newspaper texts, the bulk of magazine publications, book editions). Therefore, additional tools need to be connected to the existing system of morphological analysis in order to process the old spellings. 
So far, VESUM has been supplemented with rules for the recognition of old and non-standard spellings that are common in the corpus regardless of the spelling system. In addition, a separate module was created for the recognition of zhelekhivka (the spelling used in Western Ukraine in the 1880s-1920s), in which several basic rules for processing texts written in this system are additionally applied <ref type="bibr" target="#b3">[4]</ref>.</p><p>Adding non-standard spellings and grammatical forms to the common dictionary is not always a good solution, because it can increase grammatical homonymy. For example, the token ма in older texts is usually a verb: Хто дба, той ма (Номис, Українські приказки, прислів'я і таке інше, 1864), whereas in modern texts it is most often a colloquial form of address to one's mother (a vocative): -Ма, вибач, я не знав… <ref type="bibr">(Ірен Роздобудько, Арсен, 2012)</ref>. To resolve such ambiguity, one needs separate modules within the grammar dictionary for separate groups of texts.</p><p>This paper presents the first experience of creating an additional VESUM module for processing the corpus of a single author's texts. This module preserves multiple cases of old and individual spelling. It consists of normalization algorithms that are used in the lemmatization of non-standard tokens and do not affect the appearance of the text in the corpus. Non-standard spelling and grammatical variants not covered by these algorithms were added to the general dictionary, provided that they were frequent in the corpus texts. Thus the accuracy of morphological annotation in the texts by Lesia Ukrainka (1871–1913) has been slightly improved.</p><p>The first part of the paper deals with the characteristic features of Lesia Ukrainka's spelling and with the textual principles of different editions of her oeuvre. 
The second part analyzes how her spelling is rendered in the latest complete collection of works <ref type="bibr" target="#b4">[5]</ref>. The third part is dedicated to the issues behind the morphological analysis of these texts and proposes rules to improve the recognition of non-standard words.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Characteristics of Lesia Ukrainka's spelling</head><p>The main critical editions of Lesia Ukrainka's works give different versions of the texts. The most standardized one, and also the closest to modern spelling norms, is the 12-volume academic edition of the 1970s (CW-12) <ref type="bibr" target="#b5">[6]</ref>, which was long the standard edition.</p><p>Many more specific features of Lesia Ukrainka's language are preserved in modern editions <ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b4">[5]</ref>, although the principles of textual normalization in these series differ somewhat. In particular, the 2016-2018 edition of Letters unified the spelling of borrowings by applying the "rule of nine" of the contemporary norm: сюрпріз – сюрприз, цітата – цитата, діктувати – диктувати, etc. <ref type="bibr" target="#b9">[10]</ref>. In the 2021 edition (CW-14) the author's (variant) spelling of such words is preserved. On the other hand, the editors of <ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref> preserve the orthographic variants that do not affect the pronunciation (e.g., пьять, моеї, такоі, тут таки, було-б, де-що, щож), while in CW-14 such cases are rendered in the modern spelling.</p><p>Figure 1 compares the text of Lesia Ukrainka's letter of 6 July 1889 to her mother in CW-12 and CW-14. We can see that CW-12 uses more heavily standardized spelling (такій – такий), grammatical forms (світа – світу), and lexical variants (куповать – купувать, нанять – найнять, завдатку – задатку, тілько – тільки). In the CW-12 edition multiple passages were omitted due to censorship. 
The editors of <ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref> restore these passages and mark them in italics <ref type="bibr" target="#b6">[7]</ref>. Lesia Ukrainka's own spelling was not well established: as noted, different spelling variants are attested for many lexemes. In addition, she used different spelling systems. For example, as noted in the commentary on CW-14, a draft of the poem "Одержима"/"Obsessed" was written in kulishivka, whereas zhelekhivka was used when the original text was rewritten <ref type="bibr" target="#b4">[5]</ref>. In her correspondence Lesia Ukrainka sometimes used drahomanivka <ref type="bibr" target="#b6">[7]</ref>.</p><p>Moreover, Lesia Ukrainka added individual features to her spelling. The correspondence of Lesia Ukrainka with Ivan Franko in 1892, before the publication of her poetic collection "На крилах пісень"/"On the wings of songs", shows that the attitude to spelling was quite free at that time (not unlike the attitude to an author's punctuation nowadays). Lesia Ukrainka writes: "The word чі let it be, because we got used to it, that's our pronunciation", "I do not stand for the letter ї, but it is right just like other iotated letters: ю, я, є etc; I stand for true phonetic spelling (a radical one), but as I cannot use this one in print, then I should deal with that one somehow. After all, I am also opposed to that ї in a way, you can replace it with йі, or just -і, such as своі, Украіна, etc.", "The word слізьми really should be written with a soft sign, because it is pronounced that way. The word вьяне can be written either so, or вйане, but in no way вяне, for that, in fact, will get the Russian pronunciation. Инший let remain so too" <ref type="bibr" target="#b4">[5]</ref>. 
From this we see that the orthographic standard was not very strict, that variations were allowed and discussed, and that it was very important for Lesia Ukrainka to reproduce everyday language accurately. The language of Lesia Ukrainka reflected some orthoepic, grammatical, and lexical features of the Western Polissia dialect <ref type="bibr" target="#b9">[10]</ref>, not only in her works of fiction, where she used them for stylistic purposes, but also, for example, in her letters. In many cases the texts of CW-14 preserve Lesia Ukrainka's original spelling variation, which reflected pronunciation variants. The texts of CW-14 attest the parallel use of the orthoepic variants чи/чі, квартира/квартіра, стаття/статя, завжди/завжді, теперешній/теперішній, матерьял/матеріал, настілько/настільки, отель/готель, подлий/підлий, поворіт/поворот, почта/пошта, etc. In the texts of CW-12, as indicated in the notes to the publication of the letters, such cases, where the author's spelling was not uniform, are rendered according to the modern norms <ref type="bibr" target="#b5">[6]</ref>.</p><p>There are also some pairs of lexical variants (absolute synonyms) in Lesia Ukrainka's texts. Some of these foci of variation did not survive into the contemporary literary language: бабушка/бабуся, іюнь/червень (and also other names of months), одкритка/листівка, случай/випадок, поводіння/поведінка, устроїтися/влаштуватися, etc. Only the second member of each pair is attested in the present-day standard.</p><p>Such features of Lesia Ukrainka's language are important not only for the study of her individual style but also as a sample of the Ukrainian language of this period. For instance, variants such as тілько, скілько, etc., which Lesia Ukrainka used but which are now non-normative, also occurred in the language of her contemporaries and of members of her family <ref type="bibr" target="#b10">[11]</ref>. 
The corpus of Lesia Ukrainka's texts marks a certain stage in the formation of the norm. It is therefore valuable for the historical part of a general linguistic corpus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Text normalization in the 2021 edition</head><p>The 2021 edition (CW-14) was chosen for the GRAC corpus because it reproduces the language of Lesia Ukrainka and her time more accurately, a feature that is important for a reference-type corpus. The texts of Lesia Ukrainka in the CW-12 version, which the corpus had contained up to version 13, were replaced in GRAC.v.14 (compiled 4.11.2021) by the texts from CW-14.</p><p>It is known that Lesia Ukrainka herself paid much attention to the written representation of her phonetics and morphology, such as the palatalized pronunciation [ч]/[ch] in the word чі, non-palatalized adjective endings in the nominative plural (весняниї, молодиї), non-palatalized pronunciation of final hushing sibilants (між, хоч, ще ж), etc. <ref type="bibr" target="#b11">[12]</ref>.</p><p>All texts published in CW-14 were added to GRAC.v.14. The Lesia Ukrainka corpus consists of 1,225 texts and has a total size of almost 1.5 million tokens, of which about 1.3 million tokens are original works and 0.2 million tokens are translations. By genre, 0.56 million tokens are letters, 0.33 million poetry, 0.23 million fiction, 0.15 million journalism, 0.12 million historical studies, and 0.08 million folklore records (plays in verse and prose are counted as part of poetry and fiction respectively).</p><p>According to the editors' explanations in the preface to CW-14, the texts are rendered linguistically according to the following basic principle. In many cases, it is necessary and possible to adapt the texts to the modern spelling (e.g., the yaryzhka and drahomanivka spelling systems are transliterated into the present-day conventions), but this procedure should not change the author's phonology, morphology, and lexicon. 
The lexicon was preserved, as were the inflectional, derivational and phonetic variants that were habitual for the literary standard of that time and/or for the language of Lesia Ukrainka, irrespective of their status in the contemporary Ukrainian language. If the writer herself allowed different spellings of a word even within a single text, then a uniform variant could be chosen: either the one that has a greater frequency in the text or the one attested later within the same work; such cases and the editor's choice of a variant are specially commented <ref type="bibr" target="#b12">[13]</ref>.</p><p>So, the texts of Lesia Ukrainka in the edition of 2021 are partially normalized. But the principles of normalization chosen by the editors are not always clear. In some cases the editors choose a variant at the level of a single text (either the more frequent one or the one attested later); consequently, at the level of the whole edition, the choice of such variants was not uniform and may have been made differently in different works.</p><p>We see that in the 2021 edition as a whole, the proclaimed normalization principles were not always implemented consistently. For example, according to the principles of the edition, the spelling of words as one word, as separate words, or with a hyphen should be normalized, since it does not affect pronunciation. However, spellings that do not conform to the modern standard are attested in the edition: Від змішаня двох різних критеріїв, релятивного й абсолютного, д. Ганкевич не може собі дати ради з псіхолоґією новітніх християн і нехристиян і де-далі все більше перемішує не тілько псіхолоґічні, а навіть хронолоґічні моменти. Так, з поводу того, що українці плачуть тепер (1903 р.)</p><p>The treatment of voiceless and voiced consonants, which, according to the editorial instructions, should follow the modern norm, is not always consistently normalized either:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>На душі стало лехко, так, як бувало після довгої молитви в нашій сільській церковці (Мендель Розенбаум • 1902 • Великдень у турмі • Леся Українка). Послухайте, сеньйоре де Маранья, я вас не встигла роспитати вчора (Леся Українка • 1911 • Камінний Господарь). Нехай ця безкрая надія непевна, але ж хиба роспач певніший? (Леся Українка • 1906 • Утопія в белетристиці).</p><p>The inconsistency revealed in Lesia Ukrainka's own spelling, as well as in the modern edition of the texts (2021), must be taken into account in the morphological analysis of the texts in the corpus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Rules proposed to improve the recognition of non-standard words</head><p>A system of morphological analysis based on the VESUM grammatical dictionary is used for the morphological markup of the GRAC corpus <ref type="bibr" target="#b1">[2]</ref>. This system is designed for the modern standard, although it uses separate tools to lemmatize texts written in other orthographic systems (zhelekhivka, skrypnykivka, the Soviet spelling of 1933, the spelling of 1992, the modern spelling of the diaspora, etc. <ref type="bibr" target="#b18">[18]</ref>) and some of their elements. Some of the frequent non-standard spellings are added to the dictionary, whereas others are recognized using dynamic tagging tools <ref type="bibr" target="#b3">[4]</ref>. When annotating texts of the 19th and early 20th centuries, GRAC does not apply transliteration rules to all the texts before morphological markup, as is done for the historical corpus of Polish <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b15">15]</ref> or for the historical corpus of Russian <ref type="bibr" target="#b16">[16,</ref><ref type="bibr" target="#b17">17]</ref>. Instead, the morphological analysis program annotates the texts in their original spelling and applies the rules of dynamic tagging to the unrecognized words. 
The available tools (a dictionary containing the most common non-standard variants, plus the rules of dynamic tagging) allow us to recognize some cases of non-standard spelling preserved in the texts of CW-14, such as the use of the letter ґ instead of the modern г (єґер, енерґія, орґія, реліґійний, телеґрама, фоноґраф, леґенда, траґедія, етноґрафічний, etc.), some words with the initial и- (инший, инакше), alternative forms of the nominal genitive singular with the ending -и (пам'яти, імени), phonetic treatment of borrowings that departs from the most frequent modern variant: with soft [l'] (сальон, кляса, плян, лямпа, клясовий, мельодія, пляц, Філярет, Голяндія, фільософічний, скарлятина, реклямувати, плятформа, парлямент, новеля, лявіна, кільо, кольоніст, демонольогія, деклямувати, баляда), words with -ія-, -'я- instead of normative -іа- (матеріял, матеріяльний, азіятський, азіят, варіянт, матер'ял, фіялка, спеціяльний, соціялізм, соціялістичний, соціяльний, індіянка, уніятський), with the final -тер instead of the normative -тр (міністер), derivational variants (роля, заля), as well as some other orthoepic, word-formation, grammatical and orthographic variants (ріжний, ріжниця, ратунок, житє, соняшний, претенсія, европейський, Европа, европеєць, струмент, мущина, сальоновий, бенькет, багацтво, альфабет; люде; нераз, неначеб, аби-який, їйбогу, etc.). Such cases are either added to the dictionary as alternatives (with the :alt tag) or fall under one of the rules of dynamic tagging (Andriy Rysin) described in <ref type="bibr" target="#b3">[4]</ref>. The system also recognizes and correctly lemmatizes abbreviations whose full forms are given in square brackets, as is traditional for critical editions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Се я пробую, чи не віддасть він мені тиї 200 p[ублів], що 5 літ тому взяв (Леся Українка • 1911 • Лист до О. П. Косач (сестри) 14 грудня 1911 р. Хоні. CW-14).</p><p>In addition, however, the texts of CW-14 retain some specific features of the language and spelling of Lesia Ukrainka that are not recognized by the program of morphological analysis based on VESUM. There are 55 593 unrecognized tokens, or 4.89% of the total size of the subcorpus, which far exceeds the share of words usually left unrecognized in the analysis of standard texts. An additional list of dynamic tagging rules, used only for the analysis of the Lesia Ukrainka corpus, was created to lemmatize some of these words. The rules were formulated based on the analysis of the corpus of Lesia Ukrainka's texts from the 2021 edition (CW-14). They cover orthographic and word-formation variants, grammatical variants (endings), and common irregular variants of some word forms. The VESUM-based morphological analysis system already recognizes non-standard long forms of adjectives with -i- endings (for example: дрібнії adj:p:v_naz:compb:long) <ref type="bibr" target="#b1">[2]</ref>, so the suggested replacement will be sufficient for lemmatizing adjectives of this type (зелениї, дорогиї, добриї).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Orthographic and word-formation variants</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">.*рь =&gt; .*р</head><p>The soft sign after р at the end of masculine nouns in the initial form (царь, лікарь, лицарь, владарь, господарь, олтарь, крамарь, вихорь, пузирь, кобзарь, звірь, шинкарь, Цезарь, писарь, ліхтарь, etc.) and in other words after the final р: бурь, вірь, матірь, теперь. </p></div>
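<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how a rule of this kind might operate, the following minimal Python sketch applies the rewrite of word-final рь to р only when the token itself is not found in the dictionary, and accepts the result only if the rewritten form is found. The names TOY_LEXICON and normalize_final_r are hypothetical, and the small word set merely stands in for a VESUM lookup; the actual GRAC tagging code is not reproduced here.</p>
```python
import re

# Toy stand-in for a dictionary lookup (assumption: the real system queries VESUM).
TOY_LEXICON = {"цар", "лікар", "кобзар", "звір", "тепер"}

# The rule from the text: a word-final "рь" is rewritten as "р".
FINAL_SOFT_R = re.compile(r"рь$")


def normalize_final_r(token, lexicon=TOY_LEXICON):
    """Return a lookup key for the token, or None if the rule does not help.

    The surface token is left untouched; only the key used for lookup changes.
    """
    key = token.lower()
    if key in lexicon:
        return key  # already recognized, no rule needed
    candidate = FINAL_SOFT_R.sub("р", key)
    if candidate != key and candidate in lexicon:
        return candidate
    return None  # the rule did not produce a known form


for token in ["царь", "Кобзарь", "теперь", "матірь"]:
    print(token, "->", normalize_final_r(token))
```
<p>In this sketch матірь stays unresolved simply because the toy word set does not contain its rewritten form; with a full dictionary the outcome would depend on the dictionary's coverage.</p></div>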
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Variant paradigms</head><p>In addition, the analysis of Lesia Ukrainka's texts revealed whole variant paradigms that lack the alternation in the root. They were added as variant paradigms to the main dictionary and are therefore not rules specific to Lesia Ukrainka's corpus. These are the following word forms:</p><p>[word="каміню|каміневі|камінем|каміні"] [word="річі|річей|річах|річами"] [word="зовуть|зовіть|зову|зови|зовеш"] [word="зоветься|зовуся|зовуться|зовусь"]</p><p>For a number of masculine nouns, the dictionary was supplemented with a non-standard genitive singular variant in -a that is recorded in the corpus of Lesia Ukrainka's texts: авторітета, виїзда, візіта, всесвіта, гонорара, діаґноза, діалоґа, дуета, журнала, закона, ідеала, клімата, конкурса, культа, курса, леґіона, луга, манускрипта, момента, мотіва, народа, овса, пансіона, потока, похода, престола, прецедента, приказа, прінціпа, проєкта, процеса, реферата, рода, романа, романса, сезона, сінедріона, скандала, сна, страха, суда, театра, текста, тома, трактата, тріклініума, урока, уступа, факта, фатума, хамсіна, хаоса, характера, храма, часа, шума.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Morphological variants of lexemes</head><p>A separate observation should be made on the nouns with the root -пис. Lesia Ukrainka used them in two variants, masculine (now the normative one) and feminine: часопис/часопись. The following feminine variants are recorded in the texts: часопись, рукопись, допись, літопись, правопись, запись, опись, житєпись. In the oblique cases, forms of both paradigms, masculine and feminine, are also recorded in the Lesia Ukrainka corpus. For example, the stem часопис- yields the following forms in the corpus of Lesia Ukrainka's texts (with their frequencies): часописі 23, часопись 20, часописях 4, часописів 3, часописи 3, часописей 1, часопис 1. These feminine variants of nouns with the root -пис, recorded in the corpus of Lesia Ukrainka's texts, are added to the dictionary as separate lemmas.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Other groups of words suggested for inclusion in the main dictionary</head><p>(Революційна українська партія). 3. All proper names used in the corpus of Lesia Ukrainka at least twice: 964 lexemes (the complete list of personal names, including those recorded once, exceeds 5,500).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Prospects</head><p>The corpus is constantly being updated, in particular with old texts written in different spelling systems. The morphological analysis of the corpus uses a system designed for modern spelling, which is also actively updated with non-standard variants used in many older texts. But such updates do not sufficiently cover the old spellings and their individual variants (as in the case of Lesia Ukrainka). The experience of processing the Lesia Ukrainka corpus has shown that for such subcorpora with different spellings it is advisable to include additional rules that capture individual spelling and grammatical variants. For a closed subcorpus that will not be updated in the future, such as the complete works of an author, it is even possible to analyze all the word forms morphologically by manual processing. However, we must bear in mind that in a large, expanding historical corpus the number of cases requiring additional rules steadily increases, which reduces the efficiency of this method.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Comparison of the text of Lesia Ukrainka's letter to her mother (July 6, 1889, Odesa) in CW-12 (underlined) and CW-14 (crossed out)</figDesc><graphic coords="3,94.25,72.00,406.13,264.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>1 .</head><label>1</label><figDesc>Old names of months of Latin origin: январь, февраль, март, апріль, май, іюнь, іюль, август, сентябрь, октябрь, ноябрь, декабрь 2. Old abbreviations: і т. и. (і таке инше), д. (добродій), гл. (глава), до Р. Х. (до Різдва Христового), до Хр. (до Христа), єв. (євангеліє), С.-Д. (соціал-демократи), Р. У. П.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>над недолею Бурів, д. Ганкевич питає, де вони були, коли царат давив геройську польську революцію, себ то 1863 р.? (Френсіс Артур Фегі • 1906 • Справа ірландськоі мови • пер. Леся Українка). Коли російські революціонери часом користають з імення, пророчих візий та де-яких афорізмів Толстого, то се така сама «іронія історії», як і те, що політик соціалдемократ бере собі motto у безполітичного анархіста часів першого християнства, та щей називає того «непротивленця» якобінцем! (Леся Українка • 1903 • Замітки з приводу статі «Політика і етика»). The spelling of the soft sign and apostrophe is not always normalized either: Та людина, що візьме на себе се завданє, почне з того, що відкине при перегляді все сьвітське, театральні твори, нереліґійну поезию, політичну історию і т. і. (Моріс Верн • 1894 • Біблія або книги Старого Завіта • пер. Леся Українка). Здумайте, до чого мало було часу -я тілько вчора вперше могла заграти як годиться! хоч фортепьяно вже з тиждень як привезене (Леся Українка • 1899 • Лист до О. Ю. Кобилянської 16-17 жовтня 1899 р. Київ).</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>The result of the morphological analysis system based on VESUM</figDesc><table><row><cell>Texts of Lesia Ukrainka from the 2021 edition</cell><cell>Modern standard texts</cell></row><row><cell>Known: 1084367, unknown: 55593, 4.9%</cell><cell>Known: 646616478, unknown: 12074958, 1.8%</cell></row><row><cell>Known unique: 95709, unknown unique: 23460, 19.7%</cell><cell>Known unique: 2791389, unknown unique: 3466095, 55.4%</cell></row></table></figure>
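<div xmlns="http://www.tei-c.org/ns/1.0"><p>The percentages in Table 1 appear to be the share of unknown items among all items, that is, unknown / (known + unknown). The short Python check below recomputes them from the counts given in the table; the function name unknown_share and the dictionary layout are illustrative choices, not part of the GRAC tooling.</p>
```python
def unknown_share(known, unknown):
    """Percentage of unrecognized items among all items, rounded to one decimal."""
    return round(100 * unknown / (known + unknown), 1)


# Counts copied from Table 1: (known, unknown) per subcorpus and unit.
TABLE_1 = {
    ("Lesia Ukrainka, 2021 edition", "tokens"): (1084367, 55593),
    ("Lesia Ukrainka, 2021 edition", "unique forms"): (95709, 23460),
    ("Modern standard texts", "tokens"): (646616478, 12074958),
    ("Modern standard texts", "unique forms"): (2791389, 3466095),
}

SHARES = {key: unknown_share(*counts) for key, counts in TABLE_1.items()}

for (subcorpus, unit), share in SHARES.items():
    print(subcorpus, "|", unit, "|", share, "% unknown")
```
<p>Under this reading the recomputed values agree with the rounded percentages printed in the table.</p></div>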
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>1. .*йі.* =&gt; .*ї.* Although the editors have generally normalized the old orthographic variants without affecting pronunciation, some texts retain spellings with -йі- (йіх, йім, йій, свойій, звичайів, Єврейі, etc.)</figDesc><table><row><cell>Similarly:</cell></row><row><cell>.*іі.* =&gt; .*ії.* (Сиріі, націі, історіі, Данііл, Єзекііл, Манассіін, etc.)</cell></row><row><cell>.*оі.* =&gt; .*ої.* (своіх, божоі, цілоі, римськоі, малоі, давньоі, історичноі, etc.)</cell></row><row><cell>2. .*і.* =&gt; .*и.* In many words the author's spelling is preserved with the letter і in place of the standard и (чі, завжді, назавжді, звідсі, почасті, чотирі, дінастія, стіль, асірійський, Тіфліс, цівілізація, історік, крітік, сімпатія, квартіра, тіф, збірання, режім, крівавий, вмірати, блакітний, кредітор, трівога, мотів, умірати, etc.).</cell></row><row><cell>3. .*иї =&gt; .*ії</cell></row></table><note>Endings of adjectives in the nominative plural (e.g., тиї, любиї, чорниї, білиї, ясниї, золотиї, молодиї, весняниї, німиї, темниї, срібниї, святиї, палкиї, малиї, крівавиї, зелениї, дорогиї, добриї, високиї, буйниї, широкиї, стариї, новиї, живиї, ворожиї, чудовиї, тихиї, таємниї, страшниї, смутниї, рясниї, нічниї, непевниї, мудриї, людськиї, жовтиї) and of some nouns (e.g., нациї, партиї, орґанізациї, Франциї, цівілізациї, фікциї, фантазиї, рациї, пунктуациї).</note></figure>
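<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pattern .*і.* =&gt; .*и.* does not say which occurrence of і should be rewritten when a token contains several. One possible reading, sketched below in Python, is to generate every variant in which some subset of the і letters is replaced by и and to keep only the variants found in the dictionary. This is an assumption made for illustration, not a description of the GRAC implementation; TOY_LEXICON, i_to_y_candidates and lookup_variants are hypothetical names.</p>
```python
from itertools import combinations

# Toy stand-in for a dictionary lookup (assumption: the real system queries VESUM).
TOY_LEXICON = {"цивілізація", "історик", "критик", "завжди"}


def i_to_y_candidates(token):
    """Yield every variant of the token with one or more 'і' replaced by 'и'."""
    positions = [index for index, char in enumerate(token) if char == "і"]
    for size in range(1, len(positions) + 1):
        for chosen in combinations(positions, size):
            chars = list(token)
            for position in chosen:
                chars[position] = "и"
            yield "".join(chars)


def lookup_variants(token, lexicon=TOY_LEXICON):
    """Return the sorted list of candidate forms that the lexicon recognizes."""
    return sorted({cand for cand in i_to_y_candidates(token.lower()) if cand in lexicon})


for token in ["цівілізація", "історік", "крітік", "завжді"]:
    print(token, "->", lookup_variants(token))
```
<p>The number of candidates doubles with each additional і in the token, so a practical implementation would probably cap the number of replaced positions; more than one dictionary hit would signal an ambiguity that the rule alone cannot resolve.</p></div>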
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>The result of the morphological analysis of Lesia Ukrainka texts from the 2021 edition</figDesc><table><row><cell>5. .*ійш.* =&gt; .*іш.* Adjectives and deadjectival adverbs with the suffix -ійш (пізнійше, ранійше, скорійше, найчастійше, міцнійше, труднійше, простійше, певнійше, найяснійший, докладнійше, ощаднійше, найстрашнійший, найсвятійший, найпотрібнійший, найвірнійший, цікавійший, пильнійший, новійший, найславнійший, найповнійший, найміцнійший, найвидатнійший, etc.).</cell></row><row><cell>6. .*стн.* =&gt; .*сн.* Words with the combination -стн- (перстні, первістний, користно, намістник, безучастно, устний, розпустний, пристрастно, зловістний, честний, хрестний, провістник, непричастний, ненавистний, напастник, etc.).</cell></row><row><cell>7. и.* =&gt; і.* Words with initial и- (именно, искра, иньший, имення, инак, император, инде, идолянин, испанський, играшки, etc.)</cell></row><row><cell>8. .*кілько =&gt; .*кільки; .*тілько =&gt; .*тільки Words кілько, тілько and their derivatives (тілько, скілько, стілько, наскілько, настілько, оскілько, остілько, ніскілько, хтозна-скілько, скілько-небудь)</cell></row><row><cell>Such variants with the ending -o were characteristic of Lesia Ukrainka; according to the research of S. Bohdan, variants in -o quantitatively prevail in the corpus of her letters: стілько - стільки 131:6, настілько - настільки 53:5, скілько - скільки 178:6, наскілько - наскільки 59:2 [11].</cell></row></table></figure>
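<div xmlns="http://www.tei-c.org/ns/1.0"><p>Rules 5 to 8 can be thought of as rows of a small rule table that is consulted only for tokens the dictionary does not know. The Python sketch below tries each rule on its own and reports which rule produced a known form, which makes the result easy to audit. The rule labels, the regular expressions and the names RULES, TOY_LEXICON and normalize are an illustrative reading of the patterns listed in the table, not the actual GRAC rule format.</p>
```python
import re

# (label, pattern, replacement): an illustrative encoding of rules 5-8.
RULES = [
    ("5", re.compile(r"ійш"), "іш"),                  # .*ійш.* to .*іш.*
    ("6", re.compile(r"стн"), "сн"),                  # .*стн.* to .*сн.*
    ("7", re.compile(r"^и"), "і"),                    # initial и to і
    ("8", re.compile(r"(кіль|тіль)ко$"), r"\1ки"),    # final -кілько/-тілько
]

# Toy stand-in for a dictionary lookup (assumption: the real system queries VESUM).
TOY_LEXICON = {"пізніше", "чесний", "іскра", "настільки", "скільки"}


def normalize(token, lexicon=TOY_LEXICON):
    """Return (known_form, rule_label), or None if no single rule helps."""
    key = token.lower()
    if key in lexicon:
        return key, None  # recognized without any rule
    for label, pattern, replacement in RULES:
        candidate = pattern.sub(replacement, key)
        if candidate != key and candidate in lexicon:
            return candidate, label
    return None


for token in ["пізнійше", "честний", "искра", "Настілько", "иньший"]:
    print(token, "->", normalize(token))
```
<p>In this sketch иньший remains unresolved because rewriting the initial letter alone does not yield a form in the toy word set; such tokens would need either a combination of rules or a dictionary entry.</p></div>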
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgements</head><p>We thank Andriy Rysin for technical guidance and support, and Yurii Hromyk and Dmytro Sichinava for their professional advice.</p><p>We also thank the master's students of Lviv Polytechnic National University who helped to prepare and annotate Lesia Ukrainka's texts for the corpus: Kateryna Sukhar, Andriana Hevalo, Anastasiia Karavan, Olena Horobets, Oksana Kurtiak, Yuliia Kucher, Maryna Lupain, Iryna Ortynska, Diana Rokytska.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Shvedova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Waldenfels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yarygin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rysin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Starko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nikolajenko</surname></persName>
		</author>
		<ptr target="http://uacorpus.org" />
		<title level="m">GRAC: General Regionally Annotated Corpus of Ukrainian</title>
				<meeting><address><addrLine>Kyiv; Lviv, Jena</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017-2022</date>
		</imprint>
	</monogr>
	<note>Electronic resource</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Velykyi elektronnyi slovnyk ukrainskoi movy (VESUM) yak zasib NLP dlia ukrainskoi movy [Large Dictionary of Ukrainian (VESUM) as an NLP Tool for the Ukrainian Language]</title>
		<author>
			<persName><forename type="first">V</forename><surname>Starko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rysin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Halaktyka Slova, Vydavnytstvo dim Dmytra Buraho</title>
				<imprint>
			<publisher>Dmytro Burago Publishing House</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="135" to="141" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Popravliuvanyi Franko [Corrected Franko]</title>
		<author>
			<persName><forename type="first">O</forename><surname>Drul</surname></persName>
		</author>
		<ptr target="https://zbruc.eu/node/35977" />
	</analytic>
	<monogr>
		<title level="j">Zbruch</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Handling of Nonstandard Spelling in GRAC</title>
		<author>
			<persName><forename type="first">M</forename><surname>Shvedova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rysin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Starko</surname></persName>
		</author>
		<idno type="DOI">10.1109/CSIT52700.2021.9648834</idno>
		<ptr target="https://ieeexplore.ieee.org/document/9648834" />
	</analytic>
	<monogr>
		<title level="m">IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT)</title>
				<meeting><address><addrLine>Lviv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">22-25 Sept. 2021</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="105" to="108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Povne akademichne zibrannia tvoriv: u chotyrnadtsiaty tomakh</title>
		<author>
			<persName><forename type="first">Lesia</forename><surname>Ukrainka</surname></persName>
		</author>
		<ptr target="https://ubi.org.ua/uk/activity/zibrannya-tvoriv-lesi-ukra-nki-u-14-i-tomah" />
	</analytic>
	<monogr>
		<title level="m">Volynskyi natsionalnyi universytet imeni Lesi Ukrainky</title>
				<meeting><address><addrLine>Lutsk</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
		<respStmt>
			<orgName>Lesya Ukrainka Volyn National University</orgName>
		</respStmt>
	</monogr>
	<note>Full Collection of Works in 14 volumes</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">Lesia</forename><surname>Ukrainka</surname></persName>
		</author>
		<title level="m">Collection of Works in 12 volumes</title>
		<imprint>
			<publisher>Naukova dumka</publisher>
			<pubPlace>Kyiv</pubPlace>
			<date type="published" when="1975">1975-1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Lysty: 1876-1897</title>
		<author>
			<persName><forename type="first">Lesia</forename><surname>Ukrainka</surname></persName>
		</author>
		<editor>V. A. Prokip (Savchuk)</editor>
		<imprint>
			<publisher>Komora</publisher>
			<pubPlace>Kyiv</pubPlace>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>Letters: 1876-1897</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Lysty: 1898-1902</title>
		<author>
			<persName><forename type="first">Lesia</forename><surname>Ukrainka</surname></persName>
		</author>
		<editor>V. A. Prokip (Savchuk)</editor>
		<imprint>
			<publisher>Komora</publisher>
			<pubPlace>Kyiv</pubPlace>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>Letters: 1898-1902</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Lysty: 1898-1902</title>
		<author>
			<persName><forename type="first">Lesia</forename><surname>Ukrainka</surname></persName>
		</author>
		<editor>V. A. Prokip (Savchuk)</editor>
		<imprint>
			<publisher>Komora</publisher>
			<pubPlace>Kyiv</pubPlace>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Letters: 1898-1902</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Varianty ukrainskoi literaturnoi movy [Variants of the Ukrainian literary language]</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Matviias</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Instytut ukrainskoi movy NAN Ukrainy</title>
				<meeting><address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<publisher>NASU Institute of Ukrainian Language</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page">162</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Pro «tilko» i ne tilky v movotvorchosti Lesi Ukrainky: u poshukakh idiostyliu [About &apos;til&apos;ko&apos; and not only in Lesia Ukrainka&apos;s linguistic creativity: in search of idiostyle]</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bohdan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Kultura slova</title>
		<imprint>
			<biblScope unit="volume">93</biblScope>
			<biblScope unit="page" from="100" to="114" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>The Culture of the Word</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Pro zberezhennia fonetychnoi systemy Lesi Ukrainky u maibutnikh publikatsiiakh yii tvoriv [Preserving Lesya Ukrainka&apos;s phonetic system in the upcoming publications of the poetess&apos; works]</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Miroshnychenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Spadshchyna: Literaturne dzhereloznavstvo, tekstolohiia</title>
		<imprint>
			<publisher>Laurus</publisher>
			<pubPlace>Kyiv</pubPlace>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="14" to="21" />
		</imprint>
	</monogr>
	<note>Heritage: Source Studies in Literature. Textology</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m">volume 1 of Lesia Ukrainka: Povne akademichne zibrannia tvoriv u chotyrnadtsiaty tomakh [Full Collection of Lesia Ukrainka&apos;s Works in 14 volumes]</title>
				<editor>
			<persName><forename type="first">V</forename><surname>Aheieva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Yu</forename><surname>Hromyk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Zabuzhko</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Konstankevych</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Moklytsia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Romanov</surname></persName>
		</editor>
		<meeting><address><addrLine>Lutsk</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">1896-1906. 2021</date>
			<biblScope unit="page" from="7" to="10" />
		</imprint>
		<respStmt>
			<orgName>Lesya Ukrainka Volyn National University</orgName>
		</respStmt>
	</monogr>
	<note>Volynskyi natsionalnyi universytet imeni Lesi Ukrainky</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Morphosyntactic Annotation of Historical Texts. The Making of the Baroque Corpus of Polish</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kieraś</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Komosińska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Modrzejewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Woliński</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-64206-2_35</idno>
		<ptr target="https://doi.org/10.1007/978-3-319-64206-2_35" />
	</analytic>
	<monogr>
		<title level="m">Text, Speech, and Dialogue. TSD 2017</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>Ekštein, K., Matoušek, V.</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">10415</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The Electronic Corpus of 17th and 18th-century Polish Texts</title>
		<author>
			<persName><forename type="first">W</forename><surname>Gruszczyński</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Adamiec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bronikowska</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10579-021-09549-1</idno>
		<ptr target="https://doi.org/10.1007/s10579-021-09549-1" />
	</analytic>
	<monogr>
		<title level="j">Language Resources and Evaluation</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="309" to="332" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Problemy i metody analiza russkih tekstov v doreformennoy orfografii [Problems and methods of analysis of Russian texts in pre-reform orthography]</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Poliakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Kompyuternaya lingvistika i intellektualnyie tehnologii: Po materialam ezhegodnoy Mezhdunarodnoy konferentsii «Dialog» [Computational Linguistics and Intelligent Technologies: based on the materials of the annual International Conference &quot;Dialogue&quot;]</title>
				<meeting><address><addrLine>Moscow</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="536" to="547" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Grammaticheskij slovar&apos; dlja avtomaticheskogo analiza tekstov XVIII-XIX veka: pervye rezul&apos;taty [A grammar dictionary for automatic analysis of the XVIII-XIXth century texts: first results]</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Polyakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">O</forename><surname>Savchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">V</forename><surname>Sitchinava</surname></persName>
		</author>
		<ptr target="https://www.dialog-21.ru/media/1308/dialog_2013_vol1web.pdf" />
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Technologies: Based on the materials of the annual International Conference &quot;Dialogue&quot;</title>
				<meeting><address><addrLine>Bekasovo; Moscow</addrLine></address></meeting>
		<imprint>
			<publisher>RSHU publishing house</publisher>
			<date type="published" when="2013">May 29 - June 2, 2013</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="644" to="666" />
		</imprint>
	</monogr>
	<note>Osnovnaja programma konferencii [The main conference program]</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Nimchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Puriaieva</surname></persName>
		</author>
		<title level="m">Istoriia ukrainskoho pravopysu: XVI-XX stolittia [The History of Ukrainian Spelling: 16th to 20th Century]</title>
		<imprint>
			<publisher>Naukova Dumka</publisher>
			<pubPlace>Kyiv</pubPlace>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
