<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Authorship Determination of Natural Language Texts by Several Classes of Indicators with Customizable Weights</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Viktor</forename><surname>Shynkarenko</surname></persName>
							<email>shinkarenko_vi@ua.fm</email>
							<affiliation key="aff0">
								<orgName type="institution">Dnipro National University of Railway Transport named after academician V</orgName>
								<address>
									<addrLine>Lazaryan 2, аcademician Lazaryan str</addrLine>
									<postCode>49010</postCode>
									<settlement>Dnipro</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Inna</forename><surname>Demidovich</surname></persName>
							<email>2019demidovichinn@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Dnipro National University of Railway Transport named after academician V</orgName>
								<address>
									<addrLine>Lazaryan 2, аcademician Lazaryan str</addrLine>
									<postCode>49010</postCode>
									<settlement>Dnipro</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Authorship Determination of Natural Language Texts by Several Classes of Indicators with Customizable Weights</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9DE9C6A0AC9B807F87E034159BB933E4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>natural language texts</term>
					<term>recurrence analysis</term>
					<term>frequency analysis</term>
					<term>text complexity</term>
					<term>text authorship</term>
					<term>classification</term>
					<term>genetic algorithm</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work we aim to improve the attribution of texts and their fragments by the minimum-distance classification method in the Euclidean space of images, by selecting a weight for each measure in the image. The weights were determined with a genetic algorithm. Images are formed from statistical analysis, modified recurrence analysis, and text complexity indicators, and we assess the effectiveness of each of these classes. We found that the weighting improves the efficiency of text attribution: the reliability of authorship determination for texts from the control sample reaches 80-91%.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The authorship determination of natural language texts is a relevant topic <ref type="bibr" target="#b0">[1]</ref>. A technique that determines the authorship of a particular text with sufficient reliability can be widely used in areas such as education, jurisprudence, and literary criticism. Despite a large body of research <ref type="bibr" target="#b0">[1]</ref>, there is no way to determine the authorship of even literary texts with a 100% guarantee.</p><p>This work aims to increase the reliability of authorship determination for natural language literary texts. Among indicators that, according to other studies, adequately reflect the author's manner and style, we plan to select the text attributes that carry the most information. In addition, we investigate how effectively recurrence analysis, modified for working with texts, captures the author's style and, as a consequence, the authorship of the text.</p><p>To determine the significance of each of the indicators, we use weights tuned by a genetic algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head><p>One of the problems of authorship determination methodologies for literary texts is the difficulty of choosing the text parameters that define the author's style <ref type="bibr" target="#b1">[2]</ref>.</p><p>Various methods of text attribution have been used, but the most accurate results are obtained using character frequencies <ref type="bibr" target="#b2">[3]</ref>, N-grams <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref> and their variations, as well as the frequencies of words (all of them or separate categories <ref type="bibr" target="#b5">[6]</ref>) and of word parts <ref type="bibr" target="#b6">[7]</ref>. These studies show that N-grams reflect the author's personal style. These techniques have been widely used to determine the authorship of texts written in various languages and on various topics <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b11">12]</ref> and have proved effective. Similar studies have also been carried out for texts in the Ukrainian language <ref type="bibr" target="#b10">[11]</ref>.</p><p>In this work, the most informative indicators from various classes that reflect the author's style are identified in order to determine the authorship of literary texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The classes of indicators in the text's image</head><p>The text's image is a vector of indicator measurements from the following classes: frequency analysis, text perception complexity, and modified recurrence analysis for natural language texts. It is used for authorship determination. Let us consider each class of indicators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Frequency analysis of texts</head><p>Frequency analysis is one of the most common text analysis methods. For many languages and a large number of authors, linguists have compiled frequency dictionaries of an author's language or of individual texts <ref type="bibr" target="#b11">[12]</ref><ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b14">[15]</ref>. Calculating the occurrence frequency of each single character in a specific text is the basis of such text processing <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>. Each text has its own individual frequency structure. However, analysis of sufficiently large texts showed that the occurrence frequency of a specific letter is very close to its frequency in texts by other authors and to its frequency in the language as a whole. It is therefore less a characteristic of the author's style or of the text than of the language the author uses <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>.</p><p>To solve this problem and obtain more reliable information that characterizes the text, it was decided to use larger structures than the letters of the alphabet: N-grams <ref type="bibr" target="#b19">[20]</ref>.</p><p>This method arose relatively recently and is often used to detect plagiarism <ref type="bibr" target="#b20">[21]</ref>. An N-gram is a sequence of N characters in a text. Depending on the value of N, the analysis covers the occurrence frequency of individual words or phrases in the text. Frequency analysis of this kind makes it possible to distinguish texts by the author's style, since the style is formed not least by function words, parenthetical words and constructions that are characteristic of the particular author <ref type="bibr" target="#b21">[22]</ref>.</p><p>In our previous study <ref type="bibr" target="#b31">[32]</ref>, the best result in authorship determination was obtained using 4-grams. In this work, to compare the effectiveness of the methods, both 1-gram and 4-gram frequency analysis were performed.</p><p>The text's image is formed from the frequency of each character in the text in the case of 1-grams (letters), and from the frequencies of the 100 most frequent 4-grams in the case of 4-grams.</p><p>As an example of the text's image formation, consider T. Shevchenko's "Saul", which contains 2148 analyzed symbols. The results of the text analysis are presented in the diagram (Figure <ref type="figure">1</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1: Text character frequency chart</head><p>The data are arranged in alphabetical order and make it possible to visually evaluate the occurrence frequency of each letter in the given text.</p><p>To assess the uniqueness of the obtained values and their applicability for attribution, we compare them with the average occurrence frequencies of letters in the Ukrainian language. According to a study <ref type="bibr" target="#b30">[31]</ref>, the most frequent letters of the Ukrainian language are О, А, Н; the letters И, Т, В, Е, Р, І, С, К, М come next in frequency.</p><p>In the studied text, "Saul" by T. Shevchenko, the letters А, О, И, Р, В, Н, І, Е, Л, С have the highest occurrence frequencies. There is thus a certain discrepancy between the average letter frequencies in the language and those in the studied work, which allows us to take them as one of the characteristics of the author's style.</p><p>The initial values in the text's image vector of "Saul" by T. Shevchenko are the frequency indicators (Figure <ref type="figure">1</ref>: Text character frequency chart): 𝑋 ′ =[0.1014, 0.0233, 0.0577, 0.0205, 0.0372, 0.0480, 0.0074, 0.00466, 0.0219, 0.0642, 0.0144, 0.0480, 0.0070, 0.0228, 0.0424, 0.0377, 0.0480, 0.0996, 0.0265, 0.0582, 0.0419, 0.0414, 0.0340, 0.0009, 0.0140, 0.0121, 0.0126, 0.0046, 0.0046, 0.0163, 0.0047, 0.0219, …]</p></div>
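<div xmlns="http://www.tei-c.org/ns/1.0"><p>The frequency part of the text's image described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the lower-casing of the text, and the normalization by the number of counted letters (or N-grams) are assumptions, and the alphabet is passed in as a parameter because the exact 32-letter ordering used in the paper is not given here.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter


def char_frequencies(text, alphabet):
    """Relative frequency of each alphabet letter among the letters of `text`.

    Assumption: only characters from `alphabet` are counted, case-insensitively.
    """
    letters = [ch for ch in text.lower() if ch in alphabet]
    counts = Counter(letters)
    total = len(letters)
    return [counts[ch] / total if total else 0.0 for ch in alphabet]


def top_ngram_frequencies(text, n=4, top=100):
    """Relative frequencies of the `top` most common character n-grams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return [(gram, count / total) for gram, count in counts.most_common(top)]
```
]]></code></p>
<p>For the 1-gram image the first function would be called with the alphabet in alphabetical order; for the 4-gram image the second function would be called with n=4 and top=100.</p></div>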
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">The text perception complexity indicators</head><p>The next characteristic of the author's text is its structural complexity and complexity of perception. A number of metrics help to determine the level of text difficulty. Among them are the numbers of sentences, words, syllables and letters in the text, as well as the average numbers of words, syllables and letters in sentences and words.</p><p>This analysis was originally carried out to assess the complexity of perception of English texts <ref type="bibr" target="#b22">[23]</ref>, but it is applicable to other languages as well.</p><p>In this method of text analysis, headings, subheadings and formulas are most often ignored, since they are not complete sentences.</p><p>These data also carry certain information about the author's writing style. However, such indicators describe the text complexity but do not reflect its content or word order. For this reason they are not sufficiently effective for analyzing the author's style on their own, but they can be used in conjunction with other indicators.</p><p>The values of these indicators for the poem "Saul" by T. Shevchenko are given in Table <ref type="table" target="#tab_0">1</ref>. The text complexity indicators also include the numbers of words of different lengths in the text. For the studied poem, these indicators are given in Table 2. In the studied text, the longest word consists of 15 letters, and there are no words of 14 letters (Table <ref type="table" target="#tab_1">2</ref>). Based on the analysis of all available texts, it was decided to consider words with a length of up to 22 characters.</p><p>The text's image vector of "Saul" by T. Shevchenko, supplemented with the values from Table <ref type="table" target="#tab_1">2</ref>, is: 𝑋 ′ =[… 16, 12, 7, 15, 0.2, 13, 9.6, 6.1, 3, 3, 0.8, 4, 0.4, 0, 0.2, 0, 0, 0, 0, 0, 0, 0,…]</p></div>
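<div xmlns="http://www.tei-c.org/ns/1.0"><p>A rough sketch of how complexity indicators of this kind could be computed is given below. It is an illustration under stated assumptions: sentences are split on ., !, ? and …, words are taken as runs of letters (so apostrophes and hyphens split words), syllable-based indicators are left out because syllable counting is language-specific, and the function and key names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import re


def complexity_indicators(text, max_len=22):
    """Average-length indicators plus counts of words of each length 1..max_len."""
    # Assumption: a sentence ends at one or more of . ! ? or an ellipsis character.
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    # Assumption: a word is a maximal run of Unicode letters.
    words = re.findall(r"[^\W\d_]+", text)
    letters = sum(len(w) for w in words)
    n_sent = max(len(sentences), 1)   # guard against division by zero
    n_words = max(len(words), 1)
    length_counts = [0] * max_len
    for w in words:
        if len(w) <= max_len:
            length_counts[len(w) - 1] += 1
    return {
        "words_per_sentence": len(words) / n_sent,
        "letters_per_sentence": letters / n_sent,
        "letters_per_word": letters / n_words,
        "length_counts": length_counts,
    }
```
]]></code></p>
<p>The list under "length_counts" has one position per word length from 1 to 22, which mirrors the 22 word-length positions of the text's image.</p></div>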
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Modified recurrence analysis</head><p>Recurrence analysis is used to study time series and processes. We modified this type of analysis for the processing of natural language texts. It is based on recurrence quantification analysis (RQA) <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26,</ref><ref type="bibr" target="#b26">27]</ref> of the recurrence plot, as used by Zbilut J. P. and Webber Jr. C. <ref type="bibr" target="#b23">[24]</ref>.</p><p>To apply recurrence analysis, the text is transformed into a time series. The value of each point in the series is the occurrence frequency of the N-gram, and the advancement to the next N-gram is taken as the time unit. Recurrence analysis makes it possible to take into account, to some extent, the microstructure of the text, its individual language <ref type="bibr" target="#b32">[33]</ref> and the author's style.</p><p>The resulting time series is characteristic exclusively of this text and allows further research based on these data, including the construction of the phase space and the recurrence plot. The time series of T. Shevchenko's "Saul" is shown in Figure <ref type="figure" target="#fig_0">2</ref>. According to the rules of constructing a recurrence plot, its size corresponds to the size of the text in characters, and it displays the repeated elements within the text. Based on the value of the threshold ε, symbols in the processed text whose frequencies are equal or differ by less than 0.002 are treated as the same symbol and displayed as a filled point in the recurrence plot.</p><p>This plot displays repeating states at different moments in time (respectively, places in the text). The following indicators are calculated from the recurrence plot:</p><p>• the recurrence rate, which in text analysis reflects the total number of repetitions of each statistically close sequence of characters,</p><formula xml:id="formula_0">RR = \frac{1}{K^{2}} \sum_{i,j=1}^{K} R_{i,j}^{n,\varepsilon},<label>(1)</label></formula><p>where 𝐾 is the number of considered states, 𝑅 𝑖,𝑗 is the (i, j)-th point of the recurrence plot, ε is the recurrence threshold at moment i, and 𝑛 is the phase space dimension;</p><p>• determinism, which is based on the frequency distribution 𝑃 𝜀 (𝑙) of the lengths 𝑙 of the diagonal lines in the plot, with 𝐾 the absolute number of such lines. This indicator reflects the number of all repeated sequences of statistically close N-grams of any length,</p><formula xml:id="formula_1">DET = \frac{\sum_{l=l_{\min}}^{K} l\,P^{\varepsilon}(l)}{\sum_{i,j}^{K} R_{i,j}^{n,\varepsilon}};<label>(2)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head></head><p>• the mean line length L, which in text analysis reflects the average length of repeated N-gram sequences,</p><formula xml:id="formula_2">L = \frac{\sum_{l=l_{\min}}^{K} l\,P^{\varepsilon}(l)}{\sum_{l=l_{\min}}^{K} P^{\varepsilon}(l)};<label>(3)</label></formula><p>• divergence, the reciprocal of the maximum length of the diagonal structures,</p><formula xml:id="formula_3">DIV = \frac{1}{\max(\{l_{i};\ i = 1..K_{l}\})};<label>(4)</label></formula><p>• entropy, an indicator of the frequency distribution of the diagonal structures, which in text analysis reflects the frequency distribution of repetitions of consecutive statistically close N-gram sequences,</p><formula xml:id="formula_4">ENTR = -\sum_{l=l_{\min}}^{K} p(l) \ln p(l),<label>(5)</label></formula><p>where</p><formula xml:id="formula_5">p(l) = \frac{P^{\varepsilon}(l)}{\sum_{l=l_{\min}}^{K} P^{\varepsilon}(l)};<label>(6)</label></formula><p>• laminarity, which reflects the frequency distribution of the lengths of the horizontal structures in the recurrence plot and, in text analysis, the frequency distribution of repetitions of statistically close N-gram sequences,</p><formula xml:id="formula_6">LAM = \frac{\sum_{v=v_{\min}}^{K} v\,P^{\varepsilon}(v)}{\sum_{i,j}^{K} R_{i,j}^{n,\varepsilon}},<label>(7)</label></formula><p>where 𝑣 is the length of a horizontal line in the plot and 𝑃 𝜀 (𝑣) is the frequency distribution of these lengths;</p><p>• trapping time, the average length of the horizontal structures; in text analysis, the average length of repetitions of statistically close N-gram sequences,</p><formula xml:id="formula_7">TT = \frac{\sum_{v=v_{\min}}^{K} v\,P^{\varepsilon}(v)}{\sum_{v=v_{\min}}^{K} P^{\varepsilon}(v)}.<label>(8)</label></formula><p>For T. Shevchenko's text "Saul", these indicators have the values given in Table <ref type="table" target="#tab_2">3</ref>. The text "Saul" by T. Shevchenko is thus put in correspondence with the text's image vector 𝑋 ′ = [0.1014, 0.0233, 0.0577, 0.0205, 0.0372, 0.0480, 0.0074, 0.00466, 0.0219, 0.0642, 0.0144, 0.0480, 0.0070, 0.0228, 0.0424, 0.0377, 0.0480, 0.0996, 0.0265, 0.0582, 0.0419, 0.0414, 0.0340, 0.0009, 0.0140, 0.0121, 0.0126, 0.0046, 0.0046, 0.0163, 0.0047, 0.0219, 12.72, 26.03, 59.67, 2.25, 4.69, 16, 12, 7, 15, 0.2, 13, 9.6, 6.1, 3, 3, 0.8, 4, 0.4, 0, 0.2, 0, 0, 0, 0, 0, 0, 0, 0.019, 0.002, 0.111, 2.236, 0.6, 8.2, 2.275].</p></div>
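<div xmlns="http://www.tei-c.org/ns/1.0"><p>The recurrence indicators of equations (1)-(8) can be sketched in code as follows. This is a simplified illustration rather than the authors' procedure, and it rests on several assumptions: the phase space dimension is taken as 1 (each state is a single N-gram frequency), two states recur when their frequencies differ by less than ε, the main diagonal is excluded when diagonal lines are collected (otherwise DIV would always equal 1/K), the minimum line lengths l_min and v_min default to 2, and all function names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math
from collections import Counter


def ngram_series(text, n=1):
    """Time series: the relative frequency of the N-gram found at each position."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return [counts[g] / total for g in grams]


def recurrence_matrix(series, eps=0.002):
    """R[i][j] = 1 when states i and j differ by less than eps."""
    return [[1 if abs(a - b) < eps else 0 for b in series] for a in series]


def run_lengths(bits):
    """Lengths of maximal runs of 1s in a 0/1 sequence."""
    runs, current = [], 0
    for b in bits:
        if b:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def rqa(matrix, l_min=2, v_min=2):
    """RR, DET, L, DIV, ENTR, LAM and TT of a symmetric recurrence matrix."""
    k = len(matrix)
    total = sum(map(sum, matrix))          # number of recurrence points
    diag = []
    for d in range(1, k):                  # diagonals above the main one
        line = [matrix[i][i + d] for i in range(k - d)]
        diag += run_lengths(line) * 2      # mirrored below by symmetry
    diag = [l for l in diag if l >= l_min]
    horiz = []
    for row in matrix:
        horiz += [v for v in run_lengths(row) if v >= v_min]
    n_diag = len(diag)
    hist = Counter(diag)                   # P(l): number of lines of length l
    entr = -sum((c / n_diag) * math.log(c / n_diag) for c in hist.values()) if n_diag else 0.0
    return {
        "RR": total / k ** 2,
        "DET": sum(diag) / total if total else 0.0,
        "L": sum(diag) / n_diag if n_diag else 0.0,
        "DIV": 1 / max(diag) if diag else 0.0,
        "ENTR": entr,
        "LAM": sum(horiz) / total if total else 0.0,
        "TT": sum(horiz) / len(horiz) if horiz else 0.0,
    }
```
]]></code></p>
<p>A text would be processed as rqa(recurrence_matrix(ngram_series(text, n), eps)). The matrix is built densely here, which is adequate for fragments of a few thousand characters but grows quadratically with the text length.</p></div>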
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Method of text authorship determination</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Minimum distance classification</head><p>To determine the authorship of a text, we use pattern recognition theory, specifically the recognition method based on the minimum distance to the standard <ref type="bibr" target="#b32">[33]</ref>.</p><p>The essence of this well-known method <ref type="bibr" target="#b27">[28]</ref> is as follows.</p><p>There are 𝑀 classes of images 𝜔 1 , 𝜔 2 , … , 𝜔 𝑀 , each associated with a specific author, and an image 𝑋 𝑙 of the text whose authorship must be established. This text is known to belong to one of these authors.</p><p>A standard image is determined for each class: 𝑍 1 , 𝑍 2 , … , 𝑍 𝑀 . The standard of a class is a vector containing the average value of each indicator over the author's texts in the training sample.</p><p>In the previous paper <ref type="bibr" target="#b32">[33]</ref>, it was assumed that the image 𝑋 𝑙 belongs to the class 𝜔 𝑖 (the text belongs to the i-th author) if 𝜌(𝑋 𝑙 , 𝑍 𝑖 ) &lt; 𝜌(𝑋 𝑙 , 𝑍 𝑗 ) for ∀𝑗 ≠ 𝑖, where 𝜌(𝑋, 𝑍) is the distance between the images 𝑋 and 𝑍 in Euclidean space.</p><p>Different indicators can have different units and scales. To solve this problem, min-max normalization of each indicator in the vectors 𝑋 and 𝑍 is used.</p><p>The image of the text includes 66 indicators, and the information content of each in the pattern recognition problem is different. Therefore, to increase the recognition efficiency, it was decided to weight the indicators. In this case, the weighted distance takes the form:</p><formula xml:id="formula_8">d_{lm} = \sum_{i=1}^{N} w_{i} (x_{il} - z_{im})^{2},<label>(9)</label></formula><p>where 𝑖 is the indicator's number in the vector; 𝑙 is the number of the investigated text, l = 1..L; 𝑚 is the number of the author's standard; 𝑤 𝑖 is the weight of the i-th indicator; 𝑁 is the number of indicators in the vector (the text's image); 𝑥 𝑖𝑙 and 𝑧 𝑖𝑚 are elements of the vectors 𝑋 𝑙 and 𝑍 𝑚 .</p><p>We assume that the image 𝑋 𝑙 belongs to the class 𝜔 𝑖 (the text belongs to the i-th author) if 𝑑(𝑋 𝑙 , 𝑍 𝑖 ) &lt; 𝑑(𝑋 𝑙 , 𝑍 𝑗 ) for ∀𝑗 ≠ 𝑖.</p><p>The task is to find weights 𝑤 𝑖 of the indicators such that the recognition accuracy is maximal. A genetic algorithm was used to solve this problem.</p></div>
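<div xmlns="http://www.tei-c.org/ns/1.0"><p>The classification rule with the weighted distance of equation (9) can be illustrated with a short sketch. The function names and the dictionary-based data layout are assumptions made for the example; how the authors pool texts and standards for normalization is not specified, so the normalization helper simply scales each indicator over whatever vectors it is given.</p>
<p><code lang="python"><![CDATA[
```python
def minmax_normalize(vectors):
    """Scale every indicator (column) to [0, 1] across the given vectors."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return [
        [(x - lo) / (hi - lo) if hi != lo else 0.0 for x, lo, hi in zip(v, lows, highs)]
        for v in vectors
    ]


def class_standards(training):
    """training: {author: [image, ...]} -> {author: mean image (the standard)}."""
    return {
        author: [sum(col) / len(images) for col in zip(*images)]
        for author, images in training.items()
    }


def weighted_distance(x, z, w):
    """Equation (9): sum over indicators of w_i * (x_i - z_i) ** 2."""
    return sum(wi * (xi - zi) ** 2 for wi, xi, zi in zip(w, x, z))


def classify(x, standards, w):
    """Return the author whose standard is closest to the image x."""
    return min(standards, key=lambda a: weighted_distance(x, standards[a], w))
```
]]></code></p>
<p>With all weights equal to 1 the rule reduces to the unweighted squared Euclidean distance of the earlier method; changing the weights can change which standard is nearest, which is what the weight search exploits.</p></div>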
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">A genetic algorithm for determining the text indicators weights</head><p>Genetic algorithms are used to optimize the values of multiparametric functions. Such tasks are stated as functions that depend on a number of parameters, the global maximum or minimum of which corresponds to the solution of the problem.</p><p>The idea of the genetic algorithm is to organize an evolutionary process that yields the final optimal solution <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b29">30,</ref><ref type="bibr" target="#b33">34]</ref>. It retains biological terminology. Thus, a chromosome is a vector, each position of which is called a gene. Each such vector (individual) is characterized by a certain health function (fitness function), which determines the quality of the presented solution. The optimization problem can be regarded as the problem of finding the individual with the best health function. The search is based on the mechanisms of heredity, variability and selection and is implemented using various genetic operations. Crossover is an operation in which two chromosomes exchange their parts. Mutation is a random change in one or more positions in a chromosome.</p><p>In a genetic algorithm, the initial population is usually generated randomly. The only criterion is a sufficient variety of individuals, so that the population does not fall into a local extremum.</p><p>After the first generation has been produced, the genetic algorithm imitates the evolutionary process as a repeated cycle of reproduction and mutation; the probability of an individual's participation in reproduction is directly proportional to its health. The result is a new population, and the old one dies; thus, the health function of the individuals improves, on average, from generation to generation. The process is repeated until the health function stops improving. As a result, the individuals with the best health function value in the last generation are selected.</p><p>In this work, the genetic algorithm has the following characteristics: a fixed population size; a fixed gene length; proportional selection; individuals for reproduction are selected among the best representatives of the population; single-point crossover; descendants take the place of the previous population; a fixed number of randomly generated individuals is added to each population to avoid population degeneration. The health (fitness) function is the number of texts from the training sample whose authorship is determined correctly. The initial population is randomly generated. During the simulation of the evolutionary process, individuals for the next generation were selected in the following proportions: 34% of the parent individuals, those with the best health function values, were crossed with each other; 60% of the remaining parent individuals were mutated randomly; and 6% of the offspring were generated randomly to prevent population degeneration. In the experiment, the population size is fixed at 100 individuals in each generation.</p></div>
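<div xmlns="http://www.tei-c.org/ns/1.0"><p>One generation step with the 34% / 60% / 6% split described above might look like the following sketch. Several details are assumptions made only for illustration: genes are drawn uniformly from [0, 1), parents are paired uniformly at random among the best 34% (a simplification of proportional selection), the 60% share is read as the next-ranked individuals, each gene mutates with a hypothetical probability p_mut, and the function names are not taken from the paper. The fitness argument stands for any function that scores a chromosome, for example the number of training texts attributed correctly with those weights.</p>
<p><code lang="python"><![CDATA[
```python
import random


def next_generation(population, fitness, rng, n_genes, p_mut=0.1):
    """Build a new population of the same size from the current one."""
    size = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    n_cross = round(0.34 * size)           # offspring of the best individuals
    n_rand = round(0.06 * size)            # freshly generated individuals
    n_mut = size - n_cross - n_rand        # mutated individuals (60% for size 100)
    best = ranked[:n_cross]
    children = []
    for _ in range(n_cross):
        a, b = rng.sample(best, 2)
        cut = rng.randrange(1, n_genes)    # single-point crossover
        children.append(a[:cut] + b[cut:])
    mutants = [
        [rng.random() if rng.random() < p_mut else gene for gene in parent]
        for parent in ranked[n_cross:n_cross + n_mut]
    ]
    randoms = [[rng.random() for _ in range(n_genes)] for _ in range(n_rand)]
    return children + mutants + randoms


def evolve(fitness, n_genes, size=100, generations=15, seed=0):
    """Run a fixed number of generations and return the fittest chromosome."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(size)]
    for _ in range(generations):
        population = next_generation(population, fitness, rng, n_genes)
    return max(population, key=fitness)
```
]]></code></p>
<p>The sketch runs for a fixed number of generations instead of stopping when the fitness stops improving, and it does not carry the best individual over unchanged, so the best fitness is not guaranteed to be monotone between generations.</p></div>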
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Training and control samples</head><p>In the experiment, both symbol-by-symbol analysis and 4-gram analysis were performed. Literary texts were selected for the training sample, because they clearly represent the author's style and personality and because their authorship is reliably known.</p><p>The training sample consists of 20 works by 11 different Ukrainian authors, and the control sample contains 3 works written by the same authors. We took only prose works of up to 10 thousand characters, or fragments of this size.</p><p>The following authors were included in the sample: 1 -I. Bahrianyi, 2 -A. Vyshnia, 3 -M. Vovchok, 4 -A. Dovzhenko, 5 -M. Kotsiubynskyi, 6 -H. Kvitka-Osnovianenko, 7 -P. Myrnyi, 8 -V. Nestaiko, 9 -V. Pidmohylnyi, 10 -I. Franko, 11 -M. Khvylovyi.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Chromosome formation</head><p>In the text's image in experiments with 1-grams, positions 1 to 32 are occupied by the letter frequencies, arranged in alphabetical order (Figure <ref type="figure">1</ref>), positions 33 to 37 by the text complexity data (Table <ref type="table" target="#tab_0">1</ref>), positions 38 to 59 by the frequencies of words of different lengths in the text (Table <ref type="table" target="#tab_1">2</ref>), and positions 60 to 66 by the indicators of recurrence analysis (Table <ref type="table" target="#tab_2">3</ref>).</p><p>For example, the image of T. Shevchenko's "Saul" is: 𝑋 ′ =[0.1014, 0.0233, 0.0577, 0.0205, 0.0372, 0.0480, 0.0074, 0.00466, 0.0219, 0.0642, 0.0144, 0.0480, 0.0070, 0.0228, 0.0424, 0.0377, 0.0480, 0.0996, 0.0265, 0.0582, 0.0419, 0.0414, 0.0340, 0.0009, 0.0140, 0.0121, 0.0126, 0.0046, 0.0046, 0.0163, 0.0047, 0.0219, 12.72, 26.03, 59.67, 2.25, 4.69, 71, 55, 30, 68, 61, 60, 44, 28, 15, 14, 4, 4, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.019, 0.002, 0.111, 2.236, 0.6, 8.2, 2.275].</p><p>Each chromosome contains 67 elements: a weight for each of the 66 elements of the text's image vector and a weight for the recurrence threshold ε. The initial chromosome values are set randomly.</p><p>In experiments with 4-grams, the 100 most common combinations were selected for analysis. In this case, the number of genes in the chromosome was increased to 135: the first 100 positions hold the weights of the most common 4-grams, positions 101-105 hold the weights of the text complexity parameters, positions 106-127 hold the weights of the numbers of words of different lengths in the text, positions 128-134 hold the weights of the recurrence indicators, and the last position holds the weight for ε.</p></div>
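<div xmlns="http://www.tei-c.org/ns/1.0"><p>The position ranges listed above follow from the block sizes alone, which the small sketch below reproduces. The block names and the function name are hypothetical labels introduced for the example; only the sizes (32 or 100 frequency entries, 5 complexity indicators, 22 word-length entries, 7 recurrence indicators, and 1 weight for ε) come from the text.</p>
<p><code lang="python"><![CDATA[
```python
def chromosome_layout(n_frequency):
    """Map each block of genes to its (first, last) 1-based position."""
    blocks = [
        ("frequency", n_frequency),   # 32 letters or 100 most common 4-grams
        ("complexity", 5),
        ("word_lengths", 22),
        ("recurrence", 7),
        ("epsilon", 1),
    ]
    layout, start = {}, 1
    for name, size in blocks:
        layout[name] = (start, start + size - 1)
        start += size
    return layout
```
]]></code></p>
<p>Calling chromosome_layout(32) gives a last position of 67 and chromosome_layout(100) gives 135, in agreement with the chromosome lengths stated for the 1-gram and 4-gram experiments.</p></div>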
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Experiment results with 1-gram</head><p>The results of authorship determination using 1-grams are shown in Table <ref type="table">4</ref>; shaded cells mark the cases in which the author of a work from the control sample was identified correctly.</p><p>With 1-grams, 15 generations of chromosomes were formed. The calculated weights improved the result of text authorship determination fourfold, from 6 to 24 correctly attributed works out of 33 in total, which was an improvement from 18% to 80%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>The results of character-by-character authorship determination with and without weights</p><p>[Table body not recovered; header: "The real author of the text"]</p><p>During the experiment, 64 chromosomes were obtained that gave the largest number of correctly identified authors. To give a more complete picture of the significance of the indicators, the intervals of the weights for each indicator and their average values are presented (Table <ref type="table" target="#tab_3">5</ref>-Table <ref type="table" target="#tab_6">8</ref>). According to the data in Table <ref type="table" target="#tab_3">5</ref>, among the weights of all chromosomes in the symbol frequency class, the letters Ш, Ф, Ї, Я, Є received the largest weights.</p><p>It should be noted that these are not the most frequent letters in the Ukrainian language, which allows us to consider their frequency an informative characteristic of a particular text and of the author's style in general. The weights of these indicators, except for the letter Я, also do not fluctuate significantly, which confirms this conclusion.</p><p>The frequencies of the letters Е, Л, Д, Т, М have the next largest weights. The letters Я, З, О, Т, І show the greatest spread in weight values over the entire experiment. According to the data in Table <ref type="table" target="#tab_5">7</ref>, the trapping time indicator TT has the greatest weight. This indicator represents the average length of repetitions of statistically close N-gram sequences in the text. However, its weight also fluctuates significantly. The second in weight is the divergence indicator DIV, which is the inverse of the maximum length of diagonal structures and, in the text, reflects repeated sequences of characters. The weight of this indicator is quite stable, which allows us to consider it an important parameter of the text.</p><p>The entropy indicator ENTR and the average length of diagonal lines L have smaller weights, but their fluctuations are also insignificant.</p><p>The laminarity indicator LAM, according to the obtained data, also has a significant weight, but because of its strong fluctuation it cannot serve as a reliable characteristic of the text. Analyzing the obtained data (Table <ref type="table" target="#tab_4">6</ref> and Table <ref type="table" target="#tab_6">8</ref>), we can assert that the largest weights in the text complexity class belong to the frequencies of words of length 5, 3, 1 and 4, as well as to the number of syllables in words and of letters in sentences.</p><p>In addition, these weights do not fluctuate significantly throughout the experiment, with the exception of the weight for words of length 3, which fluctuates significantly.</p><p>The last indicator in the chromosome, the weight for ε, varied from 0.053 to 2.842 with an average value of 1.095. From these data it can be concluded that a significant fluctuation in ε makes sequences of symbols that are close in frequency indistinguishable. As a result, the recurrence plot is distorted and cannot fully reflect the author's style.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Experiment results with 4-grams</head><p>An experiment was also conducted with 4-grams, the most effective option according to <ref type="bibr" target="#b31">[32]</ref>. The resulting data are shown in Table <ref type="table">9</ref>; shaded cells mark the cases in which the author of a work from the control sample was identified correctly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 9</head><p>The results of authorship determination using 4-grams with and without calculated weights</p><p>[Table body not recovered; header: "The real author of the text"]</p><p>With 4-grams, 28 generations were formed during the experiment. The obtained weights improved the result of text authorship determination only slightly, from 27 to 30 correctly attributed works out of 33 in total.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>According to our previous study <ref type="bibr" target="#b31">[32]</ref>, in which authorship was determined without differentiating the indicators by information content, 18% of the texts were attributed correctly with character-by-character analysis and 82% with 4-gram analysis. With the methods described in this work, the results for character-by-character analysis improved significantly (up to 80%). The results of the 4-gram analysis also improved, but to a lesser extent (up to 91%).</p><p>In recent years, there has been a great deal of research on text attribution using different methods and texts of different lengths and styles.</p><p>Vadim Moshkina, Ilya Andreeva, Nadezhda Yarushkina <ref type="bibr" target="#b34">[35]</ref> conducted a comparative analysis of various attribution methods. The architectures of a convolutional neural network, a multilayer perceptron, and an LSTM neural network were proposed to solve this problem. It should be noted that the study was conducted on English poetry and took into account the peculiarities of this particular language. The reliability of authorship determination for the methods studied in the article ranged from 74% to 83%.</p><p>Rahul Radhakrishnan Iyer and Carolyn Penstein Rosé <ref type="bibr" target="#b35">[36]</ref> used stylometric features and various algorithms for text attribution. This work was also carried out on English-language texts. The authors achieved 82% reliability in text authorship attribution.</p><p>The following studies were carried out for Ukrainian texts. In <ref type="bibr" target="#b36">[37]</ref>, on determining the authorship of journalistic articles, the authors achieved 92% reliability using neural networks. In <ref type="bibr" target="#b37">[38]</ref>, 79% reliability was achieved for scientific articles using a quantitative method for automated text authorship attribution based on the statistical analysis of N-gram distributions.</p><p>Despite the differences in the methods, languages and styles of the studied texts, the results obtained in our study are comparable in reliability to those of other similar works. It can be concluded that this method can be used in various fields to determine the authorship of Ukrainian-language texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>Using a genetic algorithm to find weights for the various indicator classes helps to improve the results of authorship determination for natural language texts. In the character-by-character analysis, the calculated weights improved the result fourfold (from 6 to 24 correctly attributed texts). In the analysis using 4-grams, the result also improved, but to a lesser extent, from 27 to 30 correctly attributed texts. The results obtained were 80% and 91%, respectively.</p><p>The application of this new technique improved the result of text authorship determination in both cases, which indicates its effectiveness. It also helps to identify the most valuable indicators among all those analyzed. The effectiveness of the different modified recurrence analysis indicators in authorship determination was also determined.</p><p>Among the presented classes of indicators, the most important for text authorship determination, according to the obtained weights, are the frequencies of the letters Ф, Ш, Ї, Є and Ч. Among the text complexity indicators, the most important are the numbers of words with a length of 5, 1 and 4 letters, the number of syllables in words and the number of letters in sentences. For recurrence analysis, the indicators of divergence, trapping time and entropy are the most informative.</p><p>In the future, to improve the result, it is planned to expand the number of analyzed indicators and to conduct research using the stems of words. It is also planned to select a short list of indicators of different nature whose combination gives the best result in determining the authorship of a natural language text.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Time series of the text of T. Shevchenko "Saul"</figDesc><graphic coords="4,72.00,406.23,450.70,168.00" type="bitmap" /></figure>
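<div xmlns="http://www.tei-c.org/ns/1.0"><p>The weight search summarized in the Conclusions can be illustrated with a minimal genetic algorithm sketch in Python. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names count_matches and evolve_weights, the weighted squared distance used for attribution, truncation selection, one-point crossover, uniform mutation and the weight range from 0 to 6. Only the default of 28 generations is taken from the 4-gram experiment reported in the text.</p>

```python
import random


def count_matches(weights, samples, profiles):
    """Fitness: number of samples attributed to their true author."""
    def dist(x, y):
        # Weighted squared distance between two indicator vectors (assumed measure).
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))

    hits = 0
    for vector, true_author in samples:
        guess = min(profiles, key=lambda name: dist(vector, profiles[name]))
        hits += guess == true_author
    return hits


def evolve_weights(samples, profiles, n_weights, generations=28,
                   pop_size=20, mutation_rate=0.2, max_weight=6.0, seed=0):
    """Search for indicator weights; needs n_weights of at least 2 and pop_size of at least 4."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, max_weight) for _ in range(n_weights)]
                  for _ in range(pop_size)]

    def fitness(w):
        return count_matches(w, samples, profiles)

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) != pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)  # one-point crossover position
            child = mother[:cut] + father[cut:]
            # Uniform mutation: each gene is redrawn with probability mutation_rate.
            child = [rng.uniform(0.0, max_weight) if mutation_rate > rng.random() else gene
                     for gene in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

<p>In this sketch, samples is a list of (indicator vector, true author) pairs and profiles maps each author to a reference vector of the same length. Because the better half of the population is carried over unchanged, the best match count never decreases from one generation to the next.</p></div>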
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="2,85.68,584.04,425.13,167.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Indicators of the text perception complexity in the work of T. Shevchenko "Saul"</figDesc><table><row><cell>Indicator</cell><cell>Value</cell></row><row><cell>Number of words</cell><cell>458</cell></row><row><cell>Number of syllables</cell><cell>937</cell></row><row><cell>Number of sentences</cell><cell>36</cell></row><row><cell>Number of characters</cell><cell>2148</cell></row><row><cell>Average number of words in sentences</cell><cell>12.72</cell></row><row><cell>Average number of syllables in sentences</cell><cell>26.03</cell></row><row><cell>Average number of letters in sentences</cell><cell>59.67</cell></row><row><cell>Average number of syllables in words</cell><cell>2.25</cell></row><row><cell>Average number of letters in words</cell><cell>4.69</cell></row><row><cell cols="2">The image vector of the text "Saul" by T. Shevchenko with the values of the text perception complexity indicators (Table 1): 𝑋′ = […, 12.72, 26.03, 59.67, 2.25, 4.69, …]</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Indicators of the word frequency with different lengths in the work of T. Shevchenko "Saul"</figDesc><table><row><cell>Word length</cell><cell>1</cell><cell>2</cell><cell>3</cell><cell>4</cell><cell>5</cell><cell>6</cell><cell>7</cell><cell>8</cell><cell>9</cell><cell>10</cell><cell>11</cell><cell>12</cell><cell>13</cell><cell>14</cell><cell>15</cell></row><row><cell>Frequency</cell><cell>16</cell><cell>12</cell><cell>7</cell><cell>15</cell><cell>0.2</cell><cell>13</cell><cell>9.6</cell><cell>6.1</cell><cell>3</cell><cell>3</cell><cell>0.8</cell><cell>4</cell><cell>0.4</cell><cell>0</cell><cell>0.2</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Recurrence analysis indicators for the text "Saul" by T. Shevchenko. The image vector of the text "Saul" by T. Shevchenko, supplemented with the recurrence analysis indicators (Table 3): 𝑋′ = […, 0.019, 0.002, 0.111, 2.236, 0.6, 8.2, 2.275].</figDesc><table><row><cell>Indicator</cell><cell>Value</cell></row><row><cell>RR</cell><cell>0.019</cell></row><row><cell>DET</cell><cell>0.002</cell></row><row><cell>DIV</cell><cell>0.111</cell></row><row><cell>L</cell><cell>2.236</cell></row><row><cell>ENTR</cell><cell>0.6</cell></row><row><cell>LAM</cell><cell>8.2</cell></row><row><cell>TT</cell><cell>2.275</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc>Calculated weights for the frequencies of the letters in Ukrainian alphabet</figDesc><table><row><cell>Letter</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell><cell>Letter</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell><cell>Letter</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell></row><row><cell>А</cell><cell>1.726</cell><cell>1.399</cell><cell>1.596</cell><cell>І</cell><cell>1.331</cell><cell>0.392</cell><cell>0.894</cell><cell>Т</cell><cell>0.984</cell><cell>0.000</cell><cell>0.692</cell></row><row><cell>Б</cell><cell>2.550</cell><cell>2.299</cell><cell>2.426</cell><cell>Ї</cell><cell>5.171</cell><cell>4.812</cell><cell>4.988</cell><cell>У</cell><cell>1.221</cell><cell>1.088</cell><cell>1.160</cell></row><row><cell>В</cell><cell>1.140</cell><cell>0.921</cell><cell>1.060</cell><cell>Й</cell><cell>3.881</cell><cell>3.545</cell><cell>3.703</cell><cell>Ф</cell><cell>5.509</cell><cell>5.165</cell><cell>5.385</cell></row><row><cell>Г</cell><cell>3.287</cell><cell>3.002</cell><cell>3.135</cell><cell>К</cell><cell>1.148</cell><cell>1.114</cell><cell>1.128</cell><cell>Х</cell><cell>2.032</cell><cell>1.517</cell><cell>1.781</cell></row><row><cell>Ґ</cell><cell>1.851</cell><cell>1.737</cell><cell>1.800</cell><cell>Л</cell><cell>0.215</cell><cell>0.203</cell><cell>0.208</cell><cell>Ц</cell><cell>3.313</cell><cell>3.070</cell><cell>3.180</cell></row><row><cell>Д</cell><cell>0.759</cell><cell>0.699</cell><cell>0.729</cell><cell>М</cell><cell>1.072</cell><cell>0.988</cell><cell>1.030</cell><cell>Ч</cell><cell>4.879</cell><cell>4.497</cell><cell>4.652</cell></row><row><cell>Е</cell><cell>0.000</cell><cell>0.000</cell><cell>0.000</cell><cell>Н</cell><cell>2.442</cell><cell>2.175</cell><cell>2.332</cell><cell>Ш</cell><cell>5.544</cell><cell>5.110</cell><cell>5.327</cell></row><row><cell>Є</cell><cell>5.049</cell><cell>4.599</cell><cell>4.819</cell><cell>О</cell><cell>3.842</cell><cell>2.309</cell><cell>3.261</cell><cell>Щ</cell><cell>3.850</cell><cell>3.549</cell><cell>3.705</cell></row><row><cell>Ж</cell><cell>2.643</cell><cell>2.493</cell><cell>2.564</cell><cell>П</cell><cell>3.590</cell><cell>3.270</cell><cell>3.432</cell><cell>Ь</cell><cell>3.850</cell><cell>3.567</cell><cell>3.738</cell></row><row><cell>З</cell><cell>4.842</cell><cell>2.830</cell><cell>3.728</cell><cell>Р</cell><cell>4.039</cell><cell>3.679</cell><cell>3.848</cell><cell>Ю</cell><cell>1.233</cell><cell>0.866</cell><cell>1.009</cell></row><row><cell>И</cell><cell>4.076</cell><cell>3.882</cell><cell>3.962</cell><cell>С</cell><cell>2.675</cell><cell>2.518</cell><cell>2.592</cell><cell>Я</cell><cell>5.088</cell><cell>0.805</cell><cell>1.587</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 6</head><label>6</label><figDesc>Calculated weights for words with different lengths</figDesc><table><row><cell>Word length</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell><cell>Word length</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell></row><row><cell>1</cell><cell>5.156</cell><cell>4.696</cell><cell>4.929</cell><cell>12</cell><cell>2.835</cell><cell>1.350</cell><cell>2.336</cell></row><row><cell>2</cell><cell>1.010</cell><cell>0.900</cell><cell>0.953</cell><cell>13</cell><cell>4.920</cell><cell>2.218</cell><cell>3.871</cell></row><row><cell>3</cell><cell>5.204</cell><cell>2.882</cell><cell>4.479</cell><cell>14</cell><cell>4.599</cell><cell>1.893</cell><cell>3.522</cell></row><row><cell>4</cell><cell>4.920</cell><cell>4.679</cell><cell>4.781</cell><cell>15</cell><cell>4.574</cell><cell>4.342</cell><cell>4.458</cell></row><row><cell>5</cell><cell>5.529</cell><cell>5.096</cell><cell>5.299</cell><cell>16</cell><cell>0.958</cell><cell>0.912</cell><cell>0.930</cell></row><row><cell>6</cell><cell>1.322</cell><cell>1.230</cell><cell>1.278</cell><cell>17</cell><cell>3.136</cell><cell>2.856</cell><cell>2.976</cell></row><row><cell>7</cell><cell>1.807</cell><cell>0.406</cell><cell>0.795</cell><cell>18</cell><cell>0.750</cell><cell>0.685</cell><cell>0.717</cell></row><row><cell>8</cell><cell>2.378</cell><cell>2.270</cell><cell>2.338</cell><cell>19</cell><cell>4.323</cell><cell>4.055</cell><cell>4.201</cell></row><row><cell>9</cell><cell>3.702</cell><cell>3.260</cell><cell>3.467</cell><cell>20</cell><cell>4.171</cell><cell>3.864</cell><cell>4.037</cell></row><row><cell>10</cell><cell>3.287</cell><cell>2.928</cell><cell>3.139</cell><cell>21</cell><cell>4.012</cell><cell>3.654</cell><cell>3.817</cell></row><row><cell>11</cell><cell>1.552</cell><cell>1.431</cell><cell>1.489</cell><cell>22</cell><cell>3.802</cell><cell>3.567</cell><cell>3.702</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 7</head><label>7</label><figDesc>Calculated weights for recurrence analysis indicators</figDesc><table><row><cell>Indicator</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell></row><row><cell>DET</cell><cell>3.458</cell><cell>1.053</cell><cell>2.124</cell></row><row><cell>DIV</cell><cell>4.394</cell><cell>3.914</cell><cell>4.167</cell></row><row><cell>ENTR</cell><cell>4.166</cell><cell>3.728</cell><cell>3.990</cell></row><row><cell>L</cell><cell>3.874</cell><cell>3.602</cell><cell>3.714</cell></row><row><cell>LAM</cell><cell>4.082</cell><cell>1.749</cell><cell>3.361</cell></row><row><cell>RR</cell><cell>1.684</cell><cell>1.566</cell><cell>1.628</cell></row><row><cell>TT</cell><cell>4.651</cell><cell>2.140</cell><cell>4.086</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 8</head><label>8</label><figDesc>Calculated weights for text complexity indicators</figDesc><table><row><cell>Indicator</cell><cell>Max</cell><cell>Min</cell><cell>Mean</cell></row><row><cell>letters in sentences</cell><cell>4.721</cell><cell>4.390</cell><cell>4.536</cell></row><row><cell>letters in words</cell><cell>4.309</cell><cell>3.041</cell><cell>3.446</cell></row><row><cell>words in sentences</cell><cell>0.319</cell><cell>0.297</cell><cell>0.306</cell></row><row><cell>syllables in sentences</cell><cell>0.862</cell><cell>0.793</cell><cell>0.829</cell></row><row><cell>syllables in words</cell><cell>4.848</cell><cell>4.509</cell><cell>4.670</cell></row></table></figure>
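<div xmlns="http://www.tei-c.org/ns/1.0"><p>How weights such as those in Tables 5 to 8 could enter an attribution decision is sketched below in Python, using the mean recurrence analysis weights from Table 7 and the indicator values of "Saul" from Table 3. The weighted Euclidean distance, the nearest-profile rule, the function names and the two author profiles are assumptions made for illustration. The distance measure actually used in the study and any normalization of the indicators are not restated here.</p>

```python
import math

# Mean weights from Table 7 (decimal commas written as points).
WEIGHTS = {"RR": 1.628, "DET": 2.124, "DIV": 4.167, "L": 3.714,
           "ENTR": 3.990, "LAM": 3.361, "TT": 4.086}

# Recurrence analysis indicators of "Saul" from Table 3.
SAUL = {"RR": 0.019, "DET": 0.002, "DIV": 0.111, "L": 2.236,
        "ENTR": 0.6, "LAM": 8.2, "TT": 2.275}


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance over the indicators named in weights (assumed measure)."""
    return math.sqrt(sum(w * (x[k] - y[k]) ** 2 for k, w in weights.items()))


def nearest_author(text_vector, author_profiles, weights):
    """Return the author whose profile is closest to the text vector."""
    return min(author_profiles,
               key=lambda name: weighted_distance(text_vector, author_profiles[name], weights))


# Hypothetical author profiles, invented only to exercise the functions.
profiles = {
    "author_a": dict(SAUL, LAM=8.0),
    "author_b": dict(SAUL, DIV=0.5, TT=3.0),
}
print(nearest_author(SAUL, profiles, WEIGHTS))  # prints: author_a
```

<p>A larger weight makes a difference in that indicator count more toward the distance, so under these assumptions a deviation in DIV or TT separates two texts more strongly than the same deviation in RR.</p></div>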
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">References</head></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Authorship attribution</title>
		<author>
			<persName><forename type="first">P</forename><surname>Juola</surname></persName>
		</author>
		<idno type="DOI">10.1561/1500000005</idno>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="233" to="334" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Computational methods in authorship attribution</title>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.20961</idno>
	</analytic>
	<monogr>
		<title level="j">J. Am. Soc. Inf. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="9" to="26" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Opredelenie avtorstva teksta po chastotnyim harakteristikam (determining the authorship of the text by frequency characteristics)</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">I</forename><surname>Drozdova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Obuhova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Tehnicheskie nauki v Rossii i za rubezhom: materialyi VII Mezhdunarodnoy nauchnoy konferentsii</title>
				<meeting><address><addrLine>Moskva</addrLine></address></meeting>
		<imprint>
			<publisher>Buki-Vedi</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="18" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Continuous N-gram Representations for Authorship Attribution</title>
		<author>
			<persName><forename type="first">Yunita</forename><surname>Sari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Stevenson</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/E17-2043</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</title>
		<title level="s">Short Papers</title>
		<meeting>the 15th Conference of the European Chapter of the Association for Computational Linguistics<address><addrLine>Valencia, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="267" to="273" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Authorship Attribution in Portuguese Using Character N-grams</title>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Baptista</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pichardo-Lagunas</surname></persName>
		</author>
		<idno type="DOI">10.12700/APH.14.3.2017.3.4</idno>
	</analytic>
	<monogr>
		<title level="j">Acta Polytechnica Hungarica</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="59" to="78" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Frequent word sequences and statistical stylistics</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Hoover</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/17.2.157</idno>
	</analytic>
	<monogr>
		<title level="j">Literary and Linguistic Computing</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="157" to="180" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Automatic Authorship Attribution Using Syllables as Classification Features</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">O</forename><surname>Sidorov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Rhema journal</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="62" to="81" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Are n-gram Categories Helpful in Text Classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kruczek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kruczek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kuta</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-50417-5_39</idno>
	</analytic>
	<monogr>
		<title level="m">Computational Science ICCS 2020</title>
				<meeting><address><addrLine>Amsterdam</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="524" to="537" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">On the robustness of authorship attribution based on character n-gram features</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Law &amp; Policy</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="427" to="439" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Document embeddings learned on various types of n-grams for cross-topic authorship attribution</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Posadas-Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00607-018-0587-8</idno>
	</analytic>
	<monogr>
		<title level="j">Computing</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<biblScope unit="page" from="741" to="756" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Authorship attribution system</title>
		<author>
			<persName><forename type="first">O</forename><surname>Marchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Anisimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nykonenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rossada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Melnikov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="77" to="85" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Wimmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Altmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hřebíček</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ondrejovič</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wimmerová</surname></persName>
		</author>
		<title level="m">Úvod do analýzy textov</title>
				<meeting><address><addrLine>Bratislava</addrLine></address></meeting>
		<imprint>
			<publisher>Univerzita Komenského v Bratislave</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Some aspects of word frequencies</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">I</forename><surname>Popescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Altmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Glottometrics</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="23" to="46" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Aims and Methods of Quantitative Linguistics</title>
		<author>
			<persName><forename type="first">R</forename><surname>Köhler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Altmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Problems of Quantitative Linguistics</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Altmann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Levickij</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Perebyinis</surname></persName>
		</editor>
		<meeting><address><addrLine>Ruta, Chernivtsi</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="12" to="41" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Perebyinis</surname></persName>
		</author>
		<title level="m">Statystychni metody dlia linhvistiv (statistical methods for linguists)</title>
				<meeting><address><addrLine>Vinnytsia</addrLine></address></meeting>
		<imprint>
			<publisher>Nova Knyha</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note>2nd. ed</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Frequency dictionaries</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Alekseev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Quantitative linguistics: an international handbook</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Köhler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Altmann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Piotrowski</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin</addrLine></address></meeting>
		<imprint>
			<publisher>Mouton de Gruyter</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="312" to="324" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Word frequency studies</title>
		<author>
			<persName><forename type="first">I</forename><surname>Popescu</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110218534</idno>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>Mouton de Gruyter</publisher>
			<pubPlace>Berlin-New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Sukhorolska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">I</forename><surname>Fedorenko</surname></persName>
		</author>
		<title level="m">Metody linhvistychnykh doslidzhen: navch. posibnyk dlia studentiv, aspirantiv i naukovtsiv (methods of linguistic research: textbook guide for students, graduate students and researchers)</title>
				<meeting><address><addrLine>Lviv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
	<note>Lvivskyi natsionalnyi universytet im. I.Franka</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Chatuev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Chepovskiy</surname></persName>
		</author>
		<title level="m">Chastotnyie metodyi v kompyuternoy lingvistike (frequency methods in computational linguistics)</title>
				<meeting><address><addrLine>Moskva</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>Moskovskiy gosudarstvennyiy universitet pechati</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">N-grammy v lingvistike (N-grams in linguistics)</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">Yu</forename><surname>Gudkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">F</forename><surname>Gudkova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Vestnik Chelyabinskogo gosudarstvennogo universiteta</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">239</biblScope>
			<biblScope unit="page" from="69" to="71" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">Yu</forename><surname>Taranukha</surname></persName>
		</author>
		<title level="m">Ispolzovanie kombinirovannykh kriteriev dlia avtomatizirovannoho opredelenyia zaimstvovaniy (using combined criteria for automated determination of borrowings)</title>
				<meeting><address><addrLine>SibAK, Novosibirsk</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="15" to="18" />
		</imprint>
	</monogr>
	<note>«Innovatsyy v nauke»: sbornik statei po materialam XXXII mezhdunarodnoi nauchno-prakticheskoi konferentsii</note>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Kozhyna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">R</forename><surname>Duskaeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Salimovskiy</surname></persName>
		</author>
		<title level="m">Stilistika russkoho yazyka (stylistics of the Russian language)</title>
				<meeting><address><addrLine>Moskva</addrLine></address></meeting>
		<imprint>
			<publisher>Nauka</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Ispolzovanie kriteriev otsenki udobochitaemosti teksta dlia poiska informatsii, sootvetstvuiushchei realnym potrebnostiam polzovatelia (the usage of criteria for evaluating the readability of the text to find information that meets the real needs of the user)</title>
		<author>
			<persName><forename type="first">Yu</forename><forename type="middle">V</forename><surname>Rohushyna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Problemy prohramiuvannia</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="76" to="88" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Embeddings and delays as derived from quantification of recurrence plots</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Zbilut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Webber</surname><genName>Jr</genName></persName>
		</author>
		<idno type="DOI">10.1016/0375-9601(92)90426-M</idno>
	</analytic>
	<monogr>
		<title level="j">Physics Letters A</title>
		<imprint>
			<biblScope unit="volume">171</biblScope>
			<biblScope unit="issue">3-4</biblScope>
			<biblScope unit="page" from="199" to="203" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Recurrence Plots for the Analysis of Complex Systems</title>
		<author>
			<persName><forename type="first">N</forename><surname>Marwan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Romano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kurths</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.physrep.2006.11.001</idno>
	</analytic>
	<monogr>
		<title level="j">Physics Reports</title>
		<imprint>
			<biblScope unit="volume">438</biblScope>
			<biblScope unit="issue">5-6</biblScope>
			<biblScope unit="page" from="237" to="329" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">How to avoid potential pitfalls in recurrence plot based data analysis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Marwan</surname></persName>
		</author>
		<idno type="DOI">10.1142/S0218127411029008</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Bifurcation and Chaos</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1003" to="1017" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Rekurrentnyi analiz - teoriya i praktika (recurrent analysis - theory and practice)</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">B</forename><surname>Kiselev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nauchno-tekhnicheskiy vestnik informatsionnykh tekhnolohiy, mekhaniki i optiki</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="118" to="127" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">A Comparative Analysis of Remote Sensing Image Classification Techniques</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Sisodia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tiwari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICACCI.2014.6968245</idno>
	</analytic>
	<monogr>
		<title level="m">International Conference on Advances in Computing, Communications and Informatics (ICACCI)</title>
				<meeting><address><addrLine>Delhi</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1418" to="1421" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">A Genetic Algorithm Tutorial</title>
		<author>
			<persName><forename type="first">D</forename><surname>Whitley</surname></persName>
		</author>
		<idno type="DOI">10.1007/BF00175354</idno>
	</analytic>
	<monogr>
		<title level="j">Statistics and Computing</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="65" to="85" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Vykorystannia henetychnykh alhorytmiv v zadachakh optymizatsii (genetic algorithms usage in optimization problems)</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">V</forename><surname>Kalinina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">I</forename><surname>Lisovychenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Adaptyvni systemy avtomatychnoho upravlinnia: mizhvidomchyi naukovo-tekhnichnyi zbirnyk</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="48" to="61" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Chastoty povtoriaemosti bukv i bihramm v otkrytykh tekstakh na ukrainskom yazyke (frequencies of letters recurrence and bigrams in plain texts in Ukrainian)</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">O</forename><surname>Sushko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">Ya</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Ye</forename><forename type="middle">S</forename><surname>Barsukov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Zakhyst informatsii</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="91" to="98" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Determination of the attributes of authorship of natural texts</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Demidovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="27" to="35" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Constructive Model of the Natural Language</title>
		<author>
			<persName><forename type="first">V</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kuropiatnyk</surname></persName>
		</author>
		<idno type="DOI">10.14232/actacyb.23.4.2018.2</idno>
	</analytic>
	<monogr>
		<title level="j">Acta Cybernetica</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="995" to="1015" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Tools of investigation of time and functional efficiency of bionic algorithms for function optimization problems</title>
		<author>
			<persName><forename type="first">V</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ilchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zabula</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CEUR Workshop Proceedings</title>
		<imprint>
			<biblScope unit="volume">2139</biblScope>
			<biblScope unit="page" from="270" to="280" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Solving the problem of determining the author of text data using a combined assessment</title>
		<author>
			<persName><forename type="first">V</forename><surname>Moshkina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Andreeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yarushkina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2782</biblScope>
			<biblScope unit="page" from="112" to="118" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">A Machine Learning Framework for Authorship Identification From Texts</title>
		<author>
			<persName><forename type="first">R</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rosé</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1912.10204</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Identification of authorship of Ukrainian-language texts of journalistic style using neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lupei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mitsa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Repariuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sharkan</surname></persName>
		</author>
		<idno type="DOI">10.15587/1729-4061.2020.195041</idno>
		<ptr target="https://doi.org/10.15587/1729-4061.2020.195041" />
	</analytic>
	<monogr>
		<title level="j">Eastern-European Journal of Enterprise Technologies</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="30" to="36" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Development of the Quantitative Method for Automated Text Content Authorship Attribution Based on the Statistical Analysis of N-grams Distribution</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lytvyn</surname></persName>
		</author>
		<idno type="DOI">10.15587/1729-4061.2019.186834</idno>
	</analytic>
	<monogr>
		<title level="j">Eastern-European Journal of Enterprise Technologies</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="28" to="51" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
