<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Natural Language Texts Authorship Establishing Based on the Sentences Structure</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Viktor</forename><surname>Shynkarenko</surname></persName>
							<email>shinkarenko_vi@ua.fm</email>
						</author>
						<author>
							<persName><forename type="first">Inna</forename><surname>Demidovich</surname></persName>
							<email>2019demidovichinn@gmail.com</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Ukrainian State University of Science and Technologies</orgName>
								<address>
									<addrLine>2, Academician Lazaryan str.</addrLine>
									<postCode>49010</postCode>
									<settlement>Dnipro</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">International Conference on Computational Linguistics and Intelligent Systems</orgName>
								<address>
									<addrLine>May 12-13</addrLine>
									<postCode>2022</postCode>
									<settlement>Gliwice</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Natural Language Texts Authorship Establishing Based on the Sentences Structure</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">81DEED4F03179039651BC44F973FDEA5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>natural language texts</term>
					<term>statistic analysis</term>
					<term>text structure</term>
					<term>text authorship</term>
					<term>classification</term>
					<term>parsing</term>
					<term>confidence interval</term>
					<term>formal stochastic grammar</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Authorship of natural language texts was established on the basis of the hypothesis that each author has a characteristic style of building sentences from different parts of speech. A natural language text was translated into a formal language generated by a formal stochastic grammar. For each work in the training sample, the corresponding stochastic formal grammar was restored. This method made it possible to reflect the author's characteristic style of sentence building. On the basis of a statistical sample of works, inference rules and a probabilistic measure of their application were formed. The effectiveness of the proposed method was evaluated experimentally. To establish authorship, a probabilistic measure of the text belonging to a formal stochastic grammar was determined. To assess the reliability of the obtained results, the confidence interval of the probability measure was calculated. In the studies with the control sample, the share of texts whose authorship was established correctly is 75-80%.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>This article addresses the problem of determining text authorship by analyzing sentence structure. The task of establishing the authorship of texts, as well as the task of text attribution, remains relevant today: it covers a wide range of goals and is of interest to specialists in various fields.</p><p>To determine the true author of a text, it is often necessary to turn to experts who can identify the author of an unknown text, or determine whether a work belongs to another author, using characteristic linguistic features and various stylistic devices. Expert text analysis is time-consuming and laborious. In this regard, formal methods of text attribution have great prospects for automating the analysis process.</p><p>Currently, various approaches such as the theory of pattern recognition, mathematical statistics and probability theory, neural network algorithms, cluster analysis, and many others are used for text attribution <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. However, none of the methods used is sufficiently effective. A particular difficulty is working with features that are characteristic of a particular language, which significantly complicates the task.</p><p>Working with Ukrainian, like other Slavic languages, is particularly difficult because of its structural complexity, the variability of word forms and the many possible ways of constructing sentences. The task is further complicated by the various styles of speech that are characteristic of a certain sphere of human activity, place of residence, age, education and subject of the text <ref type="bibr" target="#b2">[3]</ref>.</p><p>In this work, only literary works are used to determine authorship. 
The sentences in the text are analyzed in order to derive and formalize their structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head><p>Syntactic analysis of various parts of a text is a popular method for analyzing an author's work, its semantics, focus and main idea. However, this type of analysis, applied to texts of various kinds, faces the complexity of forming the syntactic model automatically <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. This is largely due to the complexity of the language structure itself, the variability of the word forms used and the structure of the sentences. Despite this, such a method of text research carries the greatest amount of information about the author's style: regardless of the subject of the text, the syntactic structure of the author's language clearly reflects the author's manner of writing.</p><p>Various studies of natural language formalization are known <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. One of the methods of working with natural language is the use of grammars <ref type="bibr" target="#b8">[9]</ref>. For example, similar studies were carried out for Italian <ref type="bibr" target="#b9">[10]</ref> and Ukrainian <ref type="bibr" target="#b2">[3]</ref>.</p><p>Unlike the problem of text categorization, the goal of which is to determine a topic or list of topics for a text based on its content, text parsing abstracts from a specific area and tries to capture content-independent features of a text that are "linguistic expressions" of individual authors <ref type="bibr" target="#b10">[11]</ref>. Such content-independent text properties are usually called stylometric features. 
For these purposes, various features have been proposed and applied in problems of authorship attribution, including: frequency of words <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>, character n-grams <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>, auxiliary words, syllables <ref type="bibr" target="#b16">[17]</ref> and part-of-speech tagging <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>.</p><p>The idea of using information about parts of speech is not new and has been successfully applied in a number of style classification problems, in particular for texts in English <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b1">2]</ref>. As a rule, repeating sequences were extracted on the basis of parts of speech.</p><p>The described approach is feasible for English texts, since the structure of the English language is quite strict and each position in a sentence is clearly assigned to a certain part of speech. In addition, English words do not have many different word forms, and when a prefix or suffix is added or removed, they pass into another part-of-speech category. However, applying such a method to Ukrainian may cause difficulties. Unlike English, Ukrainian is a more variable language, and the number of forms of one word arising from changes in case, gender and number significantly complicates the task. Moreover, the assignment of a word to one or another part of speech is ambiguous and difficult to perform automatically.</p><p>The problem of establishing authorship needs an individual approach in each case <ref type="bibr" target="#b19">[20]</ref>. As a general rule, problems where the number of potential authors is small and the data samples are large are considered easy, and high accuracy is expected. 
Complexity increases as the number of authors grows and the data volume shrinks <ref type="bibr" target="#b20">[21]</ref>, which leads to a decrease in recognition accuracy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Work structure rules formation</head><p>This paper explores a method for determining text authorship based on the sentence structure characteristic of the author's individual language.</p><p>A stochastic grammar is used to create rules that describe the structure of sentences in a text. For each rule, the probability of its application in a particular work is determined. The probability of deriving a whole sentence is defined as the product of the probabilities of the part-of-speech sequences used in it. The resulting rules generate a language that is specific to the explored works and to structurally similar works of a certain author.</p><p>To describe the main text structure, the part of speech was used as the characteristic of a word. Thus, each word in a sentence is replaced by its part of speech.</p><p>Each word in the text was matched against the parts of speech existing in the Ukrainian language. For service parts of speech (prepositions, pronouns, conjunctions and interjections), a list of all their possible forms was used, while verbs, nouns, adverbs, adjectives, participles and gerunds were determined by comparison with a list of word endings.</p><p>When a word or its ending was found in one of the lists, the word was automatically replaced with the corresponding part of speech. If the part of speech could not be determined automatically, the user was asked to add the entire word or its ending manually to one of the existing lists.</p><p>The following tags were used to tag words in Ukrainian text: verb (v), noun (n), pronoun (prn), adjective (adj), conjunction (cnj), adverb (adv), preposition (prp), participle (prtcpl), interjection (intrj), gerund (ger).</p><p>For each part of speech, the probability of its occurrence at a certain position of the sentence in the given text is calculated. 
The probability of a certain part of speech appearing in the studied sequence allows us to capture more accurately the individual writing style of each of the authors under study. Once the text is represented as a set of part-of-speech sequences, one per sentence, with the probability of each part of speech occurring at a particular position, the rules are formed.</p><p>To do this, all sentences starting with the same part of speech are grouped, the first word is discarded, and the probability calculation is repeated for the next word.</p><p>The sentences are then grouped again by the part of speech at the beginning, the first word is again discarded, the probability for the next element is calculated, and so on. The probability is calculated as the number of occurrences of a case in the text divided by the total number of cases. Thus, the substitution rules for some work T consist of an initial non-terminal, terminals corresponding to each word in the sentence, and the probability of applying the corresponding rule when parsing the text, and have the form:</p><formula xml:id="formula_0">\sigma \xrightarrow{p_{1,j}} b_{1,j} A_{1,j}, \quad A_{i,j} \xrightarrow{p_{i+1,k}} b_{i+1,k} A_{i+1,k}, \quad j = \overline{1, J_i}, \; k = \overline{1, K_i}</formula><p>where σ is the initial non-terminal; b_{i,j} are the terminals corresponding to the i-th word in the sentence (and to the i-th rule applied when parsing the sentence, or the i-th level of the rule); A_{i,j} is the j-th non-terminal in the i-th level rule; p_{i,k} is the probability of applying the corresponding rule when parsing this work; J_i and K_i are the numbers of different non-terminals in the right part of the rules of the (i-1)-th and i-th level, respectively.</p><p>The level corresponds to the ordinal number of the word in the sentence. Several alternative rules with the same non-terminal on the left side are allowed, but the terminals on the right side of such rules are different, which ensures deterministic parsing. 
Thus, the text is presented as a set of rules of the form described above that capture its structural features. The symbol ε stands for the empty string (end of rule).</p><p>An example of one automatically restored set of rules is presented in Table <ref type="table" target="#tab_0">1</ref>. The rules describe all 24 sentences of the text "Etude" by I. Bahrianyi that begin with a verb. As can be seen from the presented probabilities, 31% of the sentences in the studied work begin with a verb, and 17% of these sentences consist of a single word, a verb.</p><p>Examples of the first few rules according to the table are:</p><formula xml:id="formula_1">\sigma \xrightarrow{0.31} v A_{1,1}; \quad A_{1,1} \xrightarrow{0.17} \varepsilon; \quad A_{1,1} \xrightarrow{0.13} n A_{2,1}</formula><p>On the left side of each rule is a non-terminal, followed by the probability of applying the rule; on the right side is a terminal with a non-terminal that leads to the next rule.</p><p>With this method, a sentence from the work "Etude" by I. Bahrianyi, represented as the sequence of parts of speech it contains, takes the form shown in Table <ref type="table" target="#tab_1">2</ref>.</p></div>
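<div xmlns="http://www.tei-c.org/ns/1.0"><p>The grouping procedure described above can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: every distinct prefix of tags is treated as its own non-terminal (the grammar in Table 1 also reuses non-terminals, which is not modelled here), the end of a sentence is counted as an ε-rule and its probability is included in the product, and the names restore_rules, sentence_probability and END, as well as the toy corpus, are hypothetical.</p><p>
```python
from collections import defaultdict

END = "eps"  # hypothetical marker for the empty symbol (end of rule)


def restore_rules(sentences):
    """Estimate rule probabilities from part-of-speech sequences.

    Assumption: each distinct tag prefix acts as one non-terminal. The
    probability of a rule is the number of sentences that continue the prefix
    with a given tag divided by the number of sentences sharing the prefix.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tags in sentences:
        symbols = list(tags) + [END]
        for i, symbol in enumerate(symbols):
            prefix = tuple(symbols[:i])
            counts[prefix][symbol] += 1
    rules = {}
    for prefix, followers in counts.items():
        total = sum(followers.values())
        rules[prefix] = {tag: n / total for tag, n in followers.items()}
    return rules


def sentence_probability(rules, tags):
    """Product of the probabilities of the rules used to derive a sentence."""
    probability = 1.0
    symbols = list(tags) + [END]
    for i, symbol in enumerate(symbols):
        prefix = tuple(symbols[:i])
        # A rule that was never observed contributes probability zero.
        probability *= rules.get(prefix, {}).get(symbol, 0.0)
    return probability


# Toy corpus of tagged sentences (illustrative data, not taken from the paper).
toy_corpus = [
    ["v"],
    ["v", "n"],
    ["v", "n", "n"],
    ["adj", "n", "v", "n"],
]
toy_rules = restore_rules(toy_corpus)
print(toy_rules[()])  # probabilities of the first part of speech
print(sentence_probability(toy_rules, ["v", "n"]))
```
</p><p>In this sketch the rules stored under the empty prefix play the role of the rules for the initial non-terminal σ, and each longer prefix corresponds to one level of the grammar.</p></div>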
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Comparison of two works</head><p>To compare two works, each of them must be represented as a restored formal stochastic grammar whose rules are formed as described above. Each sequence of rules in one text is compared with each sequence of rules in the other text. Let the rules for some text Ti have the form:</p><formula xml:id="formula_2">\sigma \xrightarrow{p'_{1,j}} b'_{1,j} A'_{1,j}, \quad A'_{i,j} \xrightarrow{p'_{i+1,k}} b'_{i+1,k} A'_{i+1,k}, \quad j = \overline{1, J'_i}, \; k = \overline{1, K'_i}.</formula><p>Assume that the two texts under study contain sentences of a similar structure (S_i and S'_k); then the degree of their statistical structural similarity is determined as the product of the minimum difference between the probabilities of applying the corresponding rules.</p><p>Formal stochastic grammars were formed for all the works of each author in the training sample, generating the language specific to a particular author. To determine the similarity of a work according to (1), the formal stochastic grammar corresponding to the work from the control sample was used as T1, and the stochastic grammar for all works of the potential author taken together was used as T2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Calculation of the confidence interval boundary values using Student's t-test</head><p>To obtain more reliable results, we calculated confidence intervals for each of the authors in the sample. Student's t-test was applied <ref type="bibr" target="#b21">[22]</ref>.</p><p>To calculate the confidence interval for each author, the similarities of that author's texts in the training sample to each other were calculated. The data on the similarity of texts within the training sample for each author were divided into three parts with the same number of components.</p><p>The following formula was used to calculate the confidence interval:</p><formula xml:id="formula_3">t_{2,\beta} \sqrt{\frac{1}{6} \sum_{k=1}^{3} (\zeta_k - \theta_S)^2}</formula><p>where t_{2,β} is the Student's t coefficient, β is the confidence level, ζ_k is the average value of the k-th part of the sample, and θ_S is the average value over the entire sample.</p></div>
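<div xmlns="http://www.tei-c.org/ns/1.0"><p>The formula above can be evaluated with a few lines of Python. The sketch below rests on assumptions that the text does not state: the three parts are taken as consecutive slices of the similarity list, the subscript 2 of the Student coefficient is read as two degrees of freedom, and the default value 4.303 corresponds to a two-sided confidence level of 0.95. The function name confidence_half_width and the sample values are hypothetical.</p><p>
```python
import math
from statistics import mean


def confidence_half_width(similarities, t_value=4.303):
    """Evaluate t * sqrt(1/6 * sum((zeta_k - theta_s) ** 2)) over three parts.

    Assumptions: the parts are consecutive slices of equal size, and t_value
    is the Student coefficient for 2 degrees of freedom (4.303 for a
    two-sided confidence level of 0.95).
    """
    if len(similarities) == 0 or len(similarities) % 3 != 0:
        raise ValueError("the number of similarity values must be a positive multiple of 3")
    size = len(similarities) // 3
    parts = [similarities[i * size:(i + 1) * size] for i in range(3)]
    zetas = [mean(part) for part in parts]   # average of each part
    theta = mean(similarities)               # average over the entire sample
    return t_value * math.sqrt(sum((z - theta) ** 2 for z in zetas) / 6)


# Illustrative similarity values for one author (not data from the paper).
sample = [0.80, 0.82, 0.84, 0.86, 0.83, 0.83]
half_width = confidence_half_width(sample)
print(mean(sample) - half_width, mean(sample) + half_width)
```
</p><p>The factor 1/6 equals 1/(n(n-1)) for n = 3 parts, so the square root is the usual standard error of the mean estimated from the three part averages.</p></div>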
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Training and control samples formation</head><p>In the experiments, the authorship of natural language texts was determined on two samples.</p><p>For the first experiment, the training sample consisted of 20 literary works by each of 10 Ukrainian authors. The control sample consisted of 3 works by each author.</p><p>The works of the following authors are presented: IB -I. Bahrianyi, AV -A. Vyshnia, MV -M. Vovchok, AD -A. Dovzhenko, HK -H. Kvitka-Osnovianenko, PM -P. Myrnyi, VN -V. Nestaiko, VP -V. Pidmohylnyi, IF -I. Franko, MK -M. Khvylovyi.</p><p>For the second experiment, both samples were doubled: the training sample included 40 works by each of the same authors, and the new control sample consisted of 60 texts, 6 works by each author.</p><p>Literary texts were chosen because reliable information about their authorship is available and because each author has a specific style.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experiment results</head><p>Figure <ref type="figure">1</ref> and Figure <ref type="figure" target="#fig_1">2</ref> present the results of the experiments. Each bar in the chart represents the works of a particular author from the control sample. Each bar is divided into two zones: the blue part shows the number of texts with correctly identified authorship, and the orange part shows the number of texts with erroneously identified authorship.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1: Authorship establishing results with a sample 20x3</head><p>With the smaller of the two samples (200 works by 10 authors in the training sample, 20 for each author, and 30 works in the control sample), authorship was attributed correctly in 24 cases, which is 80%.</p><p>The best results were obtained for the works of A. Dovzhenko, P. Myrnyi, V. Nestaiko and V. Pidmohylnyi. I. Franko turned out to be the author whose style is the most difficult to identify.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: Authorship establishing results with a sample 40x6</head><p>With the larger sample (400 texts by 10 authors in the training sample and 60 works in the control sample), authorship was determined correctly in 45 cases, which is 75%.</p><p>The best result was obtained for the works of P. Myrnyi. The most difficult authors to detect are H. Kvitka-Osnovianenko and M. Vovchok: the authorship of only half of their works in the sample was determined correctly.</p><p>To obtain a better result, a confidence interval was introduced (Table <ref type="table">3</ref>). The confidence intervals of different authors differ significantly from each other. On average, the interval width ranges from 0.04 to 0.08; however, for A. Vyshnia and M. Vovchok it is much larger and amounts to 0.37 and 0.12, respectively. The minimum interval width was 0.02, for V. Pidmohylnyi.</p><p>Some of the authors, such as H. Kvitka-Osnovianenko and M. Vovchok, have a special style of narration that is difficult to classify and structure, which results in a low recognition rate for them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In contrast, the authors I. Bahrianyi, A. Vyshnia, A. Dovzhenko, P. Myrnyi and V. Nestaiko have a more individual style of writing, which is reflected in the sentence structure and allows their authorship to be established with high accuracy.</p></div>
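<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 3</head><label>3</label><figDesc>Confidence interval by authors</figDesc><table><row><cell></cell><cell>IB</cell><cell>AV</cell><cell>MV</cell><cell>AD</cell><cell>HK</cell><cell>PM</cell><cell>VN</cell><cell>VP</cell><cell>IF</cell><cell>MKH</cell></row><row><cell>Average</cell><cell>0,83</cell><cell>0,83</cell><cell>0,61</cell><cell>0,83</cell><cell>0,58</cell><cell>0,97</cell><cell>0,87</cell><cell>0,69</cell><cell>0,67</cell><cell>0,66</cell></row><row><cell>Max</cell><cell>0,86</cell><cell>1,00</cell><cell>0,67</cell><cell>0,86</cell><cell>0,60</cell><cell>0,99</cell><cell>0,89</cell><cell>0,70</cell><cell>0,71</cell><cell>0,69</cell></row><row><cell>Min</cell><cell>0,80</cell><cell>0,64</cell><cell>0,55</cell><cell>0,80</cell><cell>0,56</cell><cell>0,95</cell><cell>0,85</cell><cell>0,68</cell><cell>0,63</cell><cell>0,63</cell></row><row><cell>Range</cell><cell>0,05</cell><cell>0,37</cell><cell>0,12</cell><cell>0,06</cell><cell>0,05</cell><cell>0,04</cell><cell>0,03</cell><cell>0,02</cell><cell>0,07</cell><cell>0,07</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Taking the confidence intervals into account, the results shown in Table <ref type="table">4</ref> were obtained.</p></div>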
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Results of authorship determination working with the confidence interval</p><p>According to the results obtained, in 14 cases (23.3%) authorship determination produced a single candidate, who was the correct author. In 36 cases the program narrowed the number of candidates to 3: in 31 of them (51.7% of the total sample) the candidate with the greatest structural similarity was the correct author, and in the other 5 (8.3% of the total sample) the correct author was included in the list of candidates. In only 8 cases (13.3%) the authorship of the text was not determined, that is, none of the proposed candidates was the correct author.</p><p>The analysis of the obtained data suggests that the size of the confidence interval can also be considered a specific feature of the author's personal style. For some authors, the size of the confidence interval differs significantly from that of the others: it is much larger for A. Vyshnia and M. Vovchok. It is noteworthy that A. Vyshnia has a distinctive style of writing, which made it possible to identify his texts with sufficient accuracy, while the style of M. Vovchok is rather difficult to classify. The minimum interval was obtained for the works of V. Pidmohylnyi, which nevertheless leads to rather high classification results for him.</p><p>Thus, taking the confidence interval into account, the percentage of correct authorship identification improved to 83.33% (the author of the work was correctly recognized in 50 cases out of 60).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>In a study of the authorship of works by English poets <ref type="bibr" target="#b29">[30]</ref> that used a convolutional neural network, a multilayer perceptron and an LSTM neural network, the results ranged from 74% to 83%. In another study, which applied various stylometric features and algorithms, also to works by English authors, the success rate was 82% <ref type="bibr" target="#b19">[20]</ref>. For text corpora in English and Spanish analyzed at the level of syllables, the achieved result was 78.8%. For the Russian language, which is similar in complexity and structure to Ukrainian, a combination of a support vector machine and a genetic algorithm achieved 82.3% <ref type="bibr" target="#b26">[27]</ref>. Using N-grams to process the Portuguese language in <ref type="bibr" target="#b30">[31]</ref>, the result reached 72%.</p><p>For Ukrainian texts, predominantly of journalistic style, and for scientific articles, given the structural variability and complexity of the language, the authors who used neural networks <ref type="bibr" target="#b27">[28]</ref> and the Quantitative Method for Automated Text Authorship Attribution Based on the Statistical Analysis of N-grams Distribution <ref type="bibr" target="#b28">[29]</ref> obtained results of 92% and 79%, respectively.</p><p>We have not found the use of confidence intervals for determining text authorship in recent works, which allows us to assert the novelty of this approach.</p><p>Among the works related to text tagging, there are works devoted to N-grams <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b32">33,</ref><ref type="bibr" target="#b33">34]</ref>.</p><p>For example, a study <ref type="bibr" target="#b5">[6]</ref> of the Coptic language, which is the last phase of the Egyptian language family and a descendant of the ancient 
Egyptian script, was conducted to assess how well tagging supports the study of genre, style and authorship in Coptic. The results of the study show a relatively high accuracy of 94-95% correct automatic tagging for literary texts.</p><p>In <ref type="bibr" target="#b32">[33]</ref> the authors focused on the authorship attribution of Polish texts using stylometric features based on part-of-speech tags. The Polish language is characterized by a high level of inflection, so the authors were able to distinguish more than 1000 tags, which made it possible to build a fairly large feature space by processing texts and to classify them using machine learning methods. The authors consider the use of this method in highly inflected languages, including Polish, to be a promising direction in authorship attribution.</p><p>In <ref type="bibr" target="#b33">[34]</ref> the use of part-of-speech skip-grams and an in-house top-k sequential pattern mining algorithm is considered for the authorship attribution problem. The authors of the study report an accuracy of 86-97% for various authors in the training sample.</p><p>Given the differences between the languages analyzed in the presented studies, we can conclude that the method proposed here is a promising direction for working with Ukrainian literary texts and can significantly improve the results obtained.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>The paper proposes a new method of text attribution using stochastic formal grammars. Since the known methods do not give high accuracy and do not take into account the rules of sentence construction, there is a need to search for new additional methods and new attributes. The characteristic of the author's style in terms of sentence construction is a previously unexplored area. Its use, in combination with already known methods, can increase the efficiency of determining the authorship of natural language texts.</p><p>With the 20x3 sample, the author was matched correctly for 24 out of 30 works. With the doubled samples, the result is also positive, but to a lesser extent: 45 out of 60 matches. The results obtained were 80% and 75%, respectively.</p><p>Taking the confidence interval into account, the results were improved to 83.3%. The analysis suggests that the size of the confidence interval can also be considered a characteristic feature of the author's personal style. Thus, a large confidence interval may indicate a low level of differentiation of the author's style and, as a result, a poor result in determining the authorship of his works. 
Conversely, with a small confidence interval, the probability of confidently differentiating the author's style increases significantly.</p><p>The average value of the similarity of works in the training sample is also significant: the higher the value, the more clearly the style of the author is determined and, accordingly, the better the result of determining the authorship of his works.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>2 T</head><label>2</label><figDesc>And the degree of two works texts statistical structural similarity as the sum of the all its sentences similarity degrees.The degree of texts statistical structural similarity similar sentence to sentence i S according Nthe number of sentences in any of these works, if the text structurally similar sentence to sentence</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,191.15,594.96,459.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>The rules of the restored stochastic grammar according to "Etude" by I. Bahrianyi</figDesc><table><row><cell>Left part</cell><cell cols="3">Probability Terminal Nonterminal</cell><cell>Left part</cell><cell cols="3">Probability Terminal Nonterminal</cell></row><row><cell>σ</cell><cell>0,31</cell><cell>v</cell><cell>A1,1</cell><cell>A3,4</cell><cell>1,00</cell><cell>prp</cell><cell>A4,3</cell></row><row><cell>A1,1</cell><cell>0,17</cell><cell>ε</cell><cell></cell><cell>A3,5</cell><cell>1,00</cell><cell>adv</cell><cell>A3,3</cell></row><row><cell>A1,1</cell><cell>0,13</cell><cell>n</cell><cell>A2,1</cell><cell>A3,6</cell><cell>1,00</cell><cell>v</cell><cell>A4,4</cell></row><row><cell>A1,1</cell><cell>0,04</cell><cell>prp</cell><cell>A2,2</cell><cell>A3,7</cell><cell>0,80</cell><cell>adj</cell><cell>A3,7</cell></row><row><cell>A1,1</cell><cell>0,04</cell><cell>cnj</cell><cell>A2,3</cell><cell>A3,7</cell><cell>0,20</cell><cell>ε</cell><cell></cell></row><row><cell>A1,1</cell><cell>0,21</cell><cell>v</cell><cell>A2,4</cell><cell>A4,1</cell><cell>1,00</cell><cell>intrj</cell><cell>A5,1</cell></row><row><cell>A1,1</cell><cell>0,13</cell><cell>prn</cell><cell>A2,5</cell><cell>A4,2</cell><cell>1,00</cell><cell>adj+n</cell><cell>G3</cell></row><row><cell>A2,1</cell><cell>0,40</cell><cell>ε</cell><cell></cell><cell>A4,3</cell><cell>1,00</cell><cell>n</cell><cell>A5,2</cell></row><row><cell>A2,1</cell><cell>0,40</cell><cell>n</cell><cell>A3,1</cell><cell>A4,4</cell><cell>1,00</cell><cell>adv</cell><cell>A5,3</cell></row><row><cell>A2,1</cell><cell>0,20</cell><cell>prp</cell><cell>A3,2</cell><cell>A4,5</cell><cell>1,00</cell><cell>v</cell><cell>A5,4</cell></row><row><cell>A2,2</cell><cell>1,00</cell><cell>n</cell><cell>A3,3</cell><cell>A5,1</cell><cell>1,00</cell><cell>n</cell><cell>A6,1</cell></row><row><cell>A2,3</cell><cell>1,00</cell><cell>v</cell><cell>A3,1<
/cell><cell>A5,2</cell><cell>0,33</cell><cell>ε</cell><cell></cell></row><row><cell>A2,4</cell><cell>0,20</cell><cell>adj+n</cell><cell>A4,5</cell><cell>A5,2</cell><cell>0,33</cell><cell>v</cell><cell>A6,2</cell></row><row><cell>A2,4</cell><cell>0,20</cell><cell>ε</cell><cell></cell><cell>A5,2</cell><cell>0,33</cell><cell>adv</cell><cell>A6,1</cell></row><row><cell>A2,4</cell><cell>0,20</cell><cell>prp</cell><cell>A2,1</cell><cell>A5,3</cell><cell>1,00</cell><cell>n</cell><cell>A3,3</cell></row><row><cell>A2,4</cell><cell>0,20</cell><cell>v</cell><cell>A3,4</cell><cell>A5,4</cell><cell>1,00</cell><cell>prp</cell><cell>A6,3</cell></row><row><cell>A2,4</cell><cell>0,20</cell><cell>prn</cell><cell>A3,5</cell><cell>A6,1</cell><cell>1,00</cell><cell>v</cell><cell>A7,1</cell></row><row><cell>A2,5</cell><cell>0,33</cell><cell>n</cell><cell>A3,6</cell><cell>A6,2</cell><cell>1,00</cell><cell>v</cell><cell>A7,3</cell></row><row><cell>A2,5</cell><cell>0,33</cell><cell>adj</cell><cell>A3,7</cell><cell>A6,3</cell><cell>1,00</cell><cell>prtcpl</cell><cell>A7,2</cell></row><row><cell>A2,5</cell><cell>0,33</cell><cell>v</cell><cell>A3,4</cell><cell>A7,1</cell><cell>1,00</cell><cell>prp</cell><cell>A5,3</cell></row><row><cell>A3,1</cell><cell>0,50</cell><cell>ε</cell><cell></cell><cell>A7,2</cell><cell>0,33</cell><cell>adj+n</cell><cell>A7,2</cell></row><row><cell>A3,1</cell><cell>0,20</cell><cell>cnj</cell><cell>A4,1</cell><cell>A7,2</cell><cell>0,67</cell><cell>ε</cell><cell></cell></row><row><cell>A3,2</cell><cell>1,00</cell><cell>prn</cell><cell>A4,2</cell><cell>A7,3</cell><cell>1,00</cell><cell>n</cell><cell>A3,3</cell></row><row><cell>A3,3</cell><cell>1,00</cell><cell>ε</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Sentence tagging and corresponding probabilities of stochastic grammar rules</figDesc><table><row><cell>Word in a sentence</cell><cell>Tag</cell><cell>Probability</cell></row><row><cell>Чорні</cell><cell>adj</cell><cell>0,06</cell></row><row><cell>ґрати</cell><cell>n</cell><cell>0,6</cell></row><row><cell>розпанахали</cell><cell>v</cell><cell>0,6</cell></row><row><cell>небо</cell><cell>n</cell><cell>0,125</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">In our previous studies <ref type="bibr" target="#b2">[3,</ref> <ref type="bibr" target="#b22">23,</ref> <ref type="bibr" target="#b23">24,</ref> <ref type="bibr" target="#b24">25]</ref> on text authorship attribution, the share of correctly attributed texts initially ranged from 18% to 82% <ref type="bibr" target="#b22">[23]</ref> with character-by-character analysis. It later improved to between 80% and 91% with N-grams <ref type="bibr" target="#b23">[24]</ref>, a genetic algorithm <ref type="bibr" target="#b24">[25]</ref>, and work with stems and dictionaries <ref type="bibr" target="#b2">[3]</ref>. The method proposed in this paper was applied for the first time and yielded 75-80% correct attributions. Given the peculiarities of the Ukrainian language and the complexity of formalizing it for this task, this is consistent with the spread reported in other work on the topic, where between 74% and 92% of cases were correctly identified <ref type="bibr" target="#b25">[26]</ref><ref type="bibr" target="#b26">[27]</ref><ref type="bibr" target="#b27">[28]</ref><ref type="bibr" target="#b28">[29]</ref><ref type="bibr" target="#b29">[30]</ref><ref type="bibr" target="#b30">[31]</ref><ref type="bibr" target="#b31">[32]</ref>. Those results vary with the method used and with the language and style of the analyzed text: the authors of those works dealt with different languages and chose methods suited to their distinctive features.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">CUSUM: a credible method for the determination of authorship?</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Hardcastle</surname></persName>
		</author>
		<idno type="DOI">10.1016/s1355-0306(97)72158-0</idno>
	</analytic>
	<monogr>
		<title level="j">Science &amp; Justice: Journal of the Forensic Science Society</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="129" to="138" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A comparative assessment of the difficulty of authorship attribution in Greek and in English</title>
		<author>
			<persName><surname>Juola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Mikros</surname></persName>
		</author>
		<author>
			<persName><surname>Vinsick</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.24073</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="61" to="70" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Constructive Model of the Natural Language</title>
		<author>
			<persName><forename type="first">V</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kuropiatnyk</surname></persName>
		</author>
		<idno type="DOI">10.14232/actacyb.23.4.2018.2</idno>
	</analytic>
	<monogr>
		<title level="j">Acta Cybernetica</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="995" to="1015" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Language Model-based Generative Classifier for Sentence-level Discourse Parsing</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kamigaito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Okumura</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.emnlp-main.188</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Online and Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2432" to="2446" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A survey of discourse parsing</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11704-021-0500-z</idno>
	</analytic>
	<monogr>
		<title level="j">Front. Comput. Sci</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zeldes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Schroeder</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqv043</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="164" to="176" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Formalization of natural language requirements into temporal logics: a survey</title>
		<author>
			<persName><forename type="first">I</forename><surname>Buzhinsky</surname></persName>
		</author>
		<idno type="DOI">10.1109/INDIN41052.2019.8972130</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 17th INDIN</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="400" to="406" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">An introduction to formal languages and automata</title>
		<author>
			<persName><forename type="first">P</forename><surname>Linz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Rodger</surname></persName>
		</author>
		<imprint>
			<publisher>Jones &amp; Bartlett Learning</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A new linguistic engine for NooJ: Parsing context-sensitive grammars with finite-state machines</title>
		<author>
			<persName><forename type="first">M</forename><surname>Silberztein</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-73420-0_20</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International NooJ Conference Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications</title>
				<meeting>the 11th International NooJ Conference Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications<address><addrLine>Kenitra, Morocco</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="240" to="250" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Building a large grammar for Italian</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mazzei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lombardo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Linguistic correlates of style: authorship classification with deep linguistic analysis features</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gamon</surname></persName>
		</author>
		<idno type="DOI">10.3115/1220355.1220443</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004)</title>
				<meeting>the 20th International Conference on Computational Linguistics (COLING 2004)</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<pubPlace>Stroudsburg</pubPlace>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">On the role of words in the network structure of texts: Application to authorship attribution</title>
		<author>
			<persName><forename type="first">C</forename><surname>Akimushkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Amancio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">N</forename><surname>Oliveira</surname><genName>Jr</genName></persName>
		</author>
		<idno type="DOI">10.1016/j.physa.2017.12.054</idno>
	</analytic>
	<monogr>
		<title level="j">Physica A: Statistical Mechanics and its Applications</title>
		<imprint>
			<biblScope unit="volume">495</biblScope>
			<biblScope unit="page" from="49" to="58" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The microanalysis of style variation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Hoover</surname></persName>
		</author>
		<idno type="DOI">10.1093/llc/fqx022</idno>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="17" to="30" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Continuous N-gram Representations for Authorship Attribution</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Sari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stevenson</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/E17-2043</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</title>
		<title level="s">Short Papers</title>
		<meeting>the 15th Conference of the European Chapter of the Association for Computational Linguistics<address><addrLine>Valencia, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="267" to="273" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Authorship Attribution in Portuguese Using Character N-grams</title>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Baptista</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pichardo-Lagunas</surname></persName>
		</author>
		<idno type="DOI">10.12700/APH.14.3.2017.3.4</idno>
	</analytic>
	<monogr>
		<title level="j">Acta Polytechnica Hungarica</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="59" to="78" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Document embeddings learned on various types of n-grams for cross-topic authorship attribution</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Posadas-Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00607-018-0587-8</idno>
	</analytic>
	<monogr>
		<title level="j">Computing</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<biblScope unit="page" from="741" to="756" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Automatic Authorship Attribution Using Syllables as Classification Features</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">O</forename><surname>Sidorov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Rhema journal</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="62" to="81" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A survey of modern authorship attribution methods</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Am. Soc. Inf. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="538" to="556" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Computational methods in authorship attribution</title>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.20961</idno>
	</analytic>
	<monogr>
		<title level="j">J. Am. Soc. Inf. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="9" to="26" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Authorship attribution: what&apos;s easy and what&apos;s hard?</title>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-32790-2_34</idno>
	</analytic>
	<monogr>
		<title level="j">Lecture Notes in Computer Science</title>
		<imprint>
			<biblScope unit="volume">7499</biblScope>
			<biblScope unit="page" from="282" to="289" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The effect of author set size and data size in authorship attribution</title>
		<author>
			<persName><forename type="first">K</forename><surname>Luyckx</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lit. Linguist. Comput</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="35" to="55" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Application of student&apos;s t-test, analysis of variance, and covariance</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Pandey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pandey</surname></persName>
		</author>
		<idno type="DOI">10.4103/aca.ACA_94_19</idno>
	</analytic>
	<monogr>
		<title level="j">Annals of cardiac anaesthesia</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="407" to="411" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Determination of the attributes of authorship of natural texts</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Demidovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="27" to="35" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Authorship Determination of Natural Language Texts by Several Classes of Indicators with Customizable Weights</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Demidovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2021)</title>
				<meeting>the 5th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2021)<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">April 22-23, 2021</date>
			<biblScope unit="page" from="832" to="844" />
		</imprint>
	</monogr>
	<note>Main Conference</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task</title>
		<author>
			<persName><forename type="first">I</forename><surname>Demidovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Shynkarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kuropiatnyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kirichenko</surname></persName>
		</author>
		<idno type="DOI">10.1109/CSIT52700.2021.9648829</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the XVI International Scientific and Technical Conference (CSIT&apos;2021)</title>
				<meeting>the XVI International Scientific and Technical Conference (CSIT&apos;2021)<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">A Machine Learning Framework for Authorship Identification From Texts</title>
		<author>
			<persName><forename type="first">R</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rosé</surname></persName>
		</author>
		<idno>ArXiv abs/1912.10204</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Authorship Attribution: Comparison of Single-Layer and Double-Layer Machine Learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rygl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Horák</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-32790-2_34</idno>
	</analytic>
	<monogr>
		<title level="j">Lecture Notes in Computer Science</title>
		<imprint>
			<biblScope unit="volume">7499</biblScope>
			<biblScope unit="page" from="282" to="289" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Identification of authorship of Ukrainian-language texts of journalistic style using neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lupei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mitsa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Repariuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sharkan</surname></persName>
		</author>
		<idno type="DOI">10.15587/1729-4061.2020.195041</idno>
	</analytic>
	<monogr>
		<title level="j">Eastern-European Journal of Enterprise Technologies</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="30" to="36" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Development of the Quantitative Method for Automated Text Content Authorship Attribution Based on the Statistical Analysis of N-grams Distribution</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lytvyn</surname></persName>
		</author>
		<idno type="DOI">10.15587/1729-4061.2019.186834</idno>
	</analytic>
	<monogr>
		<title level="j">Eastern-European Journal of Enterprise Technologies</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="28" to="51" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Solving the problem of determining the author of text data using a combined assessment</title>
		<author>
			<persName><forename type="first">V</forename><surname>Moshkina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Andreeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yarushkina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CEUR Workshop Proceedings</title>
		<imprint>
			<biblScope unit="volume">2782</biblScope>
			<biblScope unit="page" from="112" to="118" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Authorship Attribution in Portuguese Using Character N-grams</title>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Baptista</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pichardo-Lagunas</surname></persName>
		</author>
		<idno type="DOI">10.12700/APH.14.3.2017.3.4</idno>
	</analytic>
	<monogr>
		<title level="j">Acta Polytechnica Hungarica</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="59" to="78" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Automatic Authorship Attribution Using Syllables as Classification Features</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">O</forename><surname>Sidorov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Rhema journal</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="62" to="81" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Authorship Attribution for Polish Texts Based on Part of Speech Tagging</title>
		<author>
			<persName><forename type="first">P</forename><surname>Szwed</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-58274-0_26</idno>
	</analytic>
	<monogr>
		<title level="m">Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation</title>
		<title level="s">Communications in Computer and Information Science</title>
		<editor>Kozielski S., Mrozek D., Kasprowski P., Małysiak-Mrozek B., Kostrzewa D.</editor>
		<imprint>
			<biblScope unit="volume">716</biblScope>
			<date type="published" when="2017">2017</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Using Frequent Fixed or Variable-Length POS Ngrams or Skip-Grams for Blog Authorship Attribution</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pokou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Fournier-Viger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ch</forename><surname>Moghrabi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CPCI-S indexed</title>
		<imprint>
			<biblScope unit="page" from="63" to="74" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
