<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Machine Learning and Classical Methods Combined for Text Differentiation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Iryna</forename><surname>Khomytska</surname></persName>
							<email>iryna.khomytska@ukr.net</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vasyl</forename><surname>Teslyuk</surname></persName>
							<email>vasyl.m.teslyuk@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iryna</forename><surname>Bazylevych</surname></persName>
							<email>i_bazylevych@yahoo.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Ivan Franko National University of Lviv</orgName>
								<address>
									<postCode>79000</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yuliia</forename><surname>Kordiiaka</surname></persName>
							<email>yuliia.m.kordiiaka@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">International Conference on Computational Linguistics and Intelligent Systems</orgName>
								<address>
									<addrLine>May 12-13</addrLine>
									<postCode>2022</postCode>
									<settlement>Gliwice</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Machine Learning and Classical Methods Combined for Text Differentiation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D3AE7240617225D0A625520A6AAEAFDC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data clustering</term>
					<term>Student&apos;s t-test</term>
					<term>Style factor effect</term>
					<term>Authorial style factor effect</term>
					<term>Authorship attribution</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The novelty of the research is the proposed combination of a machine learning method (data clustering) and a classical method (the Student's t-test) to differentiate English and Ukrainian texts. Both methods have proved highly efficient for determining the style factor effect and the authorial style factor effect. The research allows us to conclude that data clustering is a simpler method than the Student's t-test, but it establishes essential differences in fewer cases. The use of the Student's t-test is more complicated, as it can be performed only after Pearson's normality test. However, with the help of the Student's t-test, essential differences have been established in most cases with a test validity of 95%. The research shows that the proposed combination of methods ensures reliable results. The obtained results may be used for text analysis and authorship attribution.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The problem raised in the research is closely connected with text analysis. Text differentiation implies identifying the distinctive features of a text. There are different approaches to text analysis. They can be classified according to the language level (phonological, lexical, syntactic) and language units. All the approaches aim at characterizing the specificity of the researched functional style or authorial style. Machine learning methods are widely used for text analysis <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. However, classical methods also give good results <ref type="bibr" target="#b2">[3]</ref>. The distribution of language units on every language level has its particular character. It is different for every style and text. This particular distribution of language units has a differentiating capability. The established degree of similarity between the compared texts has a practical application: it allows us to attribute a text to an author, that is, to perform authorship attribution. The problem is not easy to solve, as several linguistic factors may overlap. These are the style factor, the topic-related factor and the authorial style factor. The texts of two different authors should have the same topic. Only in this case can the authorial style peculiarities be identified. Otherwise, the differences will be topic-related. Text differentiation is successfully done by a machine learning method, data clustering. This method consists in grouping language units according to some common feature. The language units of one cluster are different from those of the other cluster. The difference between the clusters reflects the difference between the authorial styles. Data clustering is used for the formation of psychological portraits of social network users <ref type="bibr" target="#b3">[4]</ref>. 
Emotional coloring of news headlines is also detected by data clustering <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>. The method of data clustering is widely used along with other methods for solving linguistic tasks on different language levels.</p><p>The quantitative approach is used for feminism studies in Ukraine <ref type="bibr" target="#b6">[7]</ref>, for researching the semantic nature of the community Reddit feed post <ref type="bibr" target="#b7">[8]</ref>, for mapping emotional dislocation of translational fiction <ref type="bibr" target="#b8">[9]</ref>, for characterizing peculiarities of Lucy Montgomery's literary style <ref type="bibr" target="#b9">[10]</ref>, for analyzing the distribution of meiosis and litotes in The Catcher in the Rye by Jerome David Salinger <ref type="bibr" target="#b10">[11]</ref>, and for studying anthropocentrism as implementation of a testator/testatrix's communicative goal <ref type="bibr" target="#b11">[12]</ref>. The analysis of the mentioned research allows us to state that the quantitative approach gives valuable results for linguistics. However, we recommend combining the machine learning methods with the classical ones.</p><p>The purpose of our research is to determine an efficient combination of machine learning and classical methods which ensures high test validity results for text differentiation. The novel approach consists in offering a combination of data clustering and the Student's t-test for differentiating English and Ukrainian texts. In our previous research, the Student's t-test proved to be efficient on the phonological level. The authors were differentiated by consonant phoneme groups <ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b14">[15]</ref>. This method was also successfully applied on the lexical level <ref type="bibr" target="#b15">[16]</ref>. 
The data clustering method is efficient on the same language levels, the phonological and lexical levels <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b16">17]</ref>. Consequently, the Student's t-test and the data clustering method can be combined for text differentiation. The combination of the two methods ensures more reliable results.</p><p>The latest methodologies and approaches aim at an optimal solution of the problem of text differentiation. The solution must be simple and it must ensure high accuracy. The problem is not easy to solve, as the authorial features are often not clear-cut. The degree of clarity of authorial style features must be sufficient. An author may use the vocabulary common to a certain sphere of communication. Because of this, the authorial style lacks the distinctive individual features by which the manner of writing of one author can be differentiated from that of another author. In fiction, the author's writing is peculiar and can be easily characterized. In scientific papers and formal documents, the author's manner of writing can hardly be noticed. In this case, different approaches are used to define the differentiating features of the piece of writing. Therefore, we propose a combination of machine learning and classical methods. Data clustering ensures a simple solution of text differentiation. The Student's t-test ensures reliable results.</p><p>The research is done on the lexical level (function words) and the phonological level (consonant phoneme groups) <ref type="bibr" target="#b17">[18]</ref>. The texts from Ukrainian emotive prose, English poetry and the colloquial style are researched with the help of the data clustering method and the Student's t-test.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Mathematical Support of the Software System</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">The Proposed Combination of Methods</head><p>A combination of machine learning and classical methods (data clustering and the Student's t-test) is proposed for text differentiation on the lexical level and the phonological level. The research is done according to the following algorithm.</p><p>1. Convert all letters in the researched Ukrainian and English texts of equal size to lowercase.</p><p>2. Remove all punctuation marks.</p><p>3. Leave only one space between the words.</p><p>4. Put a space at the beginning and at the end of the text.</p><p>5. Calculate the absolute frequency of occurrence of function words.</p><p>6. Apply the method of hierarchical clustering <ref type="bibr" target="#b18">[19]</ref>.</p><p>7. Transcribe the English texts.</p><p>8. Form samples of equal size for consonant phonemes.</p><p>9. Calculate the absolute and the mean frequency of occurrence for consonants.</p><p>10. Form eight consonant phoneme groups.</p><p>11. Perform Pearson's normality test for the eight consonant phoneme groups:</p><formula xml:id="formula_0">\chi^2 = \sum_{i=1}^{N} \frac{(n_i - n\hat{p}_i)^2}{n\hat{p}_i}, <label>(1)</label></formula><p>where N is the number of intervals, n_i is the observed frequency in the i-th interval, n is the sample size and p̂_i is the estimated probability of the i-th interval under the normal distribution <ref type="bibr">[20 -21]</ref>.</p><p>12. Perform the Student's t-test:</p><formula xml:id="formula_2">t = \frac{|\bar{x}_1 - \bar{x}_2|}{s \sqrt{(n + m)/(nm)}} > t_{\alpha;(n+m-2)}, <label>(2)</label></formula><p>where x̄₁ and x̄₂ are the mean frequencies of occurrence of consonant phoneme groups for the compared samples of sizes n and m, and s is the pooled standard deviation of the two samples <ref type="bibr">[22 -24]</ref>.</p></div>
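<div xmlns="http://www.tei-c.org/ns/1.0"><p>Steps 1-5 of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program described in the paper: the function names normalize and function_word_counts, the FUNCTION_WORDS list and the sample sentence are all hypothetical, and replacing each punctuation mark with a space (so that neighbouring words do not merge) is an assumed design choice.</p><p>
```python
import re
from collections import Counter

# Hypothetical function-word list; the paper does not enumerate the words it counts.
FUNCTION_WORDS = ("the", "of", "and", "in", "to")


def normalize(text: str) -> str:
    """Steps 1-4: lowercase, drop punctuation, collapse spaces, pad with one space."""
    lowered = text.lower()
    # \w is Unicode-aware in Python 3, so Ukrainian letters are kept as well.
    # Assumption: each punctuation mark becomes a space so that words stay separate.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    single_spaced = " ".join(no_punct.split())
    return " " + single_spaced + " "


def function_word_counts(text: str, words=FUNCTION_WORDS) -> dict:
    """Step 5: absolute frequency of each function word in the normalized text."""
    tokens = Counter(normalize(text).split())
    return {w: tokens[w] for w in words}


sample = "The cat, the DOG and the bird:  all of them ran to the house!"
print(repr(normalize(sample)))
print(function_word_counts(sample))
```
</p><p>One plausible reason for the leading and trailing space of step 4 is that every word, including the first and the last, is then delimited by spaces on both sides, so a word can also be counted by searching for it as a space-delimited substring. The sketch counts tokens instead, which gives the same frequencies for single-word items.</p></div>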
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">The Developed Software</head><p>A combination of data clustering and the Student's t-test is the basis of the program for text differentiation. The structure of the program includes the following modules <ref type="bibr" target="#b24">[25]</ref>. The software consists of the following classes: Main, SampleProcessor, TranscriptionProcessor, ConsonantProcessor, ConsonantUtils, StatisticProcessor.</p><formula xml:id="formula_3"></formula><p>In the class Main, the text files are loaded and the sequence of operations is controlled. In the class SampleProcessor, all unnecessary symbols are removed. In the class TranscriptionProcessor, the English texts are transcribed. In the class ConsonantProcessor, the samples of consonants are formed. In the class ConsonantUtils, the absolute and the mean frequencies for consonants are calculated. In the class StatisticProcessor, Pearson's test and the Student's t-test are performed. The program code for the hierarchical clustering, written in R, is the following:</p><p>library(readxl)
# Read the frequency table; a plain data frame keeps the row names set below.
x = as.data.frame(read_excel("C:/Users/Катя/Desktop/mag/clust.xlsx"))
# Labels of the compared texts.
z = c("London (Before Adam)","Henry(The Sea-Wolf)","Henry(The last leaf)","London(White fang)","Henry(The furnished room)","London(Advanture)")
rownames(x) = z
# Euclidean distances between the texts.
XDYST = dist(x, method="euclidean")
# Dendrogram with Single Linkage.
tree = hclust(XDYST, method="single")
plot(tree)
# Dendrogram with Complete Linkage.
tree = hclust(XDYST, method="complete")
plot(tree)</p><p>The Python program code for the literary work "Tsyklon" by O. Honchar is presented in Figure <ref type="figure" target="#fig_0">1</ref>. Single Linkage and Complete Linkage are used to measure the distance between clusters. Euclidean distance is used to measure the distance between the objects of the clusters. Complete Linkage is used for the texts of Ukrainian emotive prose when Single Linkage is not successful.</p><p>The algorithm of the program for text differentiation by data clustering and the Student's t-test is shown in Figure <ref type="figure">2</ref>.</p></div>
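<div xmlns="http://www.tei-c.org/ns/1.0"><p>The difference between Single Linkage and Complete Linkage can be illustrated with a small pure-Python sketch of agglomerative clustering over Euclidean distances. It is not the software described above: the function names euclidean and agglomerate, the text labels and the frequency vectors are hypothetical, and the quadratic search for the closest pair is chosen for readability rather than speed.</p><p>
```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, linkage="single"):
    """Merge clusters bottom-up; return the merges as (members_a, members_b, distance)."""
    # Single Linkage uses the closest pair of members, Complete Linkage the farthest pair.
    pick = min if linkage == "single" else max
    clusters = [(name,) for name in vectors]
    history = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):
            d = pick(euclidean(vectors[p], vectors[q]) for p in ca for q in cb)
            if best is None or best[0] > d:
                best = (d, ca, cb)
        d, ca, cb = best
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca + cb]
        history.append((ca, cb, round(d, 2)))
    return history


# Hypothetical function-word counts per text (not data from the paper).
texts = {
    "A1": [10, 4, 7],
    "A2": [11, 5, 7],
    "B1": [3, 9, 1],
    "B2": [5, 9, 2],
}

for method in ("single", "complete"):
    print(method, agglomerate(texts, method))
```
</p><p>In this toy input both linkages group the texts in the same order, but the final merge happens at a larger distance under Complete Linkage, because the distance between two clusters is then taken between their farthest members. With less clearly separated data, this difference can change which texts end up in one cluster.</p></div>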
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results of the Study</head><p>The data clustering has been performed in eight samples from Ukrainian emotive prose. These are the texts from the following literary works: "Tsyklon" by O. Honchar, "Sobor" by O. Honchar, "Lev ta mysha" by L. Hlibov, "Konyk strybunets" by L. Hlibov, "Malyy Myron" by I. Franko, "Na loni pryrody" by I. Franko and "Zakhar Berkut" by I. Franko. In these comparisons, the authorial style effect is determined. The results of the data clustering for the mentioned literary works are shown in Figure <ref type="figure">3</ref>.</p><p>In Figure <ref type="figure">3</ref>, we see that the results of the data clustering are successful for "Tsyklon" and "Sobor" by O. Honchar, "Lev ta mysha" and "Konyk strybunets" by L. Hlibov, "Malyy Myron" and "Na loni pryrody" by I. Franko, but not very successful for "Zakhar Berkut" by I. Franko. Not all the researched literary works by I. Franko are in the same cluster. Therefore, we replace Single Linkage with Complete Linkage (Figure <ref type="figure">4</ref>).</p><p>The matrix of distances is shown in Table <ref type="table">1</ref>. In this table, we can see that there is a small distance between two literary works by I. Franko, "Malyy Myron" and "Na loni pryrody". This result indicates that the two literary works have the same author.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1. The matrix of distances between the researched texts</head><p>The comparison of the literary works "Lev ta mysha" and "Konyk strybunets" by L. Hlibov shows a small distance of 11,9. The distance for the comparison of "Tsyklon" and "Sobor" by O. Honchar is greater: 16,6. The greatest distance, 28,8, is for the comparison of literary works by different authors: "Na loni pryrody" by I. Franko and "Konyk strybunets" by L. Hlibov.</p><p>In Figure <ref type="figure">4</ref>, we see that the use of Complete Linkage has given a better result, as all the literary works by one author ("Malyy Myron", "Na loni pryrody" and "Zakhar Berkut" by I. Franko) are in one cluster. Consequently, the use of Complete Linkage is more efficient for solving this task.</p><p>The whole process of the data clustering is presented in Table 2. The task of text differentiation has also been done on the phonological level with the help of the classical method, the Student's t-test. The English texts (Th. Moore's poetry and the colloquial style) have been differentiated in eight consonant groups. The essential differences between the compared texts are shown in Tables <ref type="table" target="#tab_2">3, 4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2.</head><p>The whole process of the data clustering for the researched texts</p><p>Table <ref type="table">3</ref>. The results of the calculations for the comparison between Moore's poetry and the colloquial style in an unidentified position (column headings: CG, MP x̄, MP Σ(x_i - x̄)², CS x̄, CS Σ(x_i - x̄)²)</p><p>In Tables <ref type="table" target="#tab_2">3, 4</ref>, 5 and 6 the following designations are used: CG - consonant groups; MP - Moore's poetry; CS - the colloquial style; Lb - labials; Dr - dorsals; Cr - coronals; Vl - velars; Ns - nasals; Sn - sonorous consonants; Fr - fricatives; St - stops; S is the pooled standard deviation; t is the Student's statistic; 2Q is the significance level; x̄ is the mean value of the frequencies of occurrence of consonant groups;</p><formula xml:id="formula_5">\sum (x_i - \bar{x})^2</formula><p>is the sum of squared differences between the interval midpoints and the mean value of the frequencies of occurrence of consonant groups; x̄₁ - x̄₂ is the difference between the two compared samples. In an unidentified position, the applied Student's t-test has given a very good result: essential differences have been established in six out of eight consonant groups. For the groups of the labial and sonorous consonants the differences are statistically insignificant. This degree of similarity can be explained by the use of words from the colloquial style in the researched Moore's poetry.</p><p>The results have been obtained with a test validity of 95% in the comparisons presented in Tables 3, 4, 5 and 6.</p><p>In the position at the beginning of a word, the results are also good (Tables <ref type="table" target="#tab_4">5, 6</ref>). Statistically significant differences have been revealed in five out of eight consonant groups. In addition to the labial and sonorous consonants, the differences are statistically insignificant for the velars (Table 6).</p><p>Having analyzed the results of this research, we can state that both the data clustering method and the Student's t-test are efficient for text differentiation on the phonological and lexical levels. However, the former is simpler, while the latter is more reliable.</p></div>
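<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tabulated values of S and t can be checked with a short Python sketch of the Student's t-test. The function names pooled_s and student_t are hypothetical, and the sample count of 31 per text is an assumption: it is not stated in the paper and is inferred here only because it reproduces the tabulated values. The input numbers are the means and sums of squares listed for the dorsal consonants (Dr) in an unidentified position.</p><p>
```python
import math


def pooled_s(ss1: float, ss2: float, n: int, m: int) -> float:
    """Pooled standard deviation from the two sums of squared deviations."""
    return math.sqrt((ss1 + ss2) / (n + m - 2))


def student_t(mean1: float, mean2: float, s: float, n: int, m: int) -> float:
    """Two-sample Student statistic: |mean1 - mean2| / (s * sqrt((n + m) / (n * m)))."""
    return abs(mean1 - mean2) / (s * math.sqrt((n + m) / (n * m)))


# Assumption: 31 samples per text (inferred from the tables, not stated in the paper).
N = M = 31

# Dorsal consonants (Dr): Moore's poetry mean 425,0 and sum of squares 8178,00;
# colloquial style mean 362,9 and sum of squares 32500,3.
s = pooled_s(8178.00, 32500.3, N, M)
t = student_t(425.0, 362.9, s, N, M)

# Two-sided 5% critical value of Student's distribution for 60 degrees of freedom.
T_CRITICAL = 2.000

print(round(s, 2), round(t, 2), t > T_CRITICAL)
```
</p><p>Under the assumed sample count, the sketch gives S of about 26,04 and t of about 9,39 for the dorsal consonants, which agrees with the values tabulated for this group; since t exceeds the critical value, the difference is essential at a test validity of 95%.</p></div>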
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>The use of a machine learning method (data clustering) and a classical method (the Student's t-test) has solved the task of text differentiation, the practical application of which is authorship attribution. The proposed combination of the data clustering method and the Student's t-test is the novelty of the research. The text differentiation task has been successfully done on the lexical level. The texts by I. Franko, O. Honchar and L. Hlibov have been analyzed. The small distance established between the researched texts indicates that they are written by the same author. Consequently, the authorial style effect has been revealed. A good example is the comparison of "Malyy Myron" and "Na loni pryrody" by I. Franko in which the distance is equal to 13,4. The applied classical method, the Student's t-test, has given a good result for determining the style factor effect on the phonological level. The texts of Th. Moore's poetry and the colloquial style differ in 6 out of 8 consonant groups for an unidentified position in a word and in 5 out of 8 for the position at the beginning of a word. The results of the research have shown that data clustering is a simpler method than the Student's t-test. It shows better results if Complete Linkage is used. However, the Student's t-test ensures more reliable data with a test validity of 95%. The practical application of the results is the style</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Calculations for the literary work "Tsyklon" by O. Honchar</figDesc><graphic coords="4,72.00,72.00,448.85,150.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 : Figure 4 :</head><label>34</label><figDesc>Figure 3: The results of the data clustering for "Tsyklon" by O. Honchar, "Sobor" by O. Honchar, "Lev ta mysha" by L. Hlibov, "Konyk strybunets" by L. Hlibov, "Malyy Myron" by I. Franko, "Na loni pryrody" by I. Franko and "Zakhar Berkut" by I. Franko</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,191.15,594.96,459.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="7,72.00,98.86,449.45,129.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc></figDesc><table><row><cell>Start</cell></row><row><cell>Downloading English and Ukrainian texts English text</cell></row><row><cell>Changing uppercase to lowercase in all the words</cell></row><row><cell>Forming a sample of Ukrainian function words</cell></row><row><cell>Performing the data hierarchical clustering for Ukrainian function words</cell></row><row><cell>Transcribing the English texts of poetry and the colloquial style</cell></row><row><cell>Forming samples of consonants</cell></row><row><cell>Calculating absolute frequencies for consonant groups</cell></row><row><cell>Calculating mean frequencies for consonant groups</cell></row><row><cell>Performing the Pearson's test</cell></row><row><cell>Performing the Student's t-test</cell></row><row><cell>Comparing the results of two methods appthree tests</cell></row><row><cell>End</cell></row><row><cell>Figure 2: A block-scheme of the algorithm of the program functioning for the text differentiation by the data clustering and the Student's t-test</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4 .</head><label>4</label><figDesc>The essential differences between Moore's poetry and the colloquial style in an unidentified position</figDesc><table><row><cell cols="6">Table 3</cell></row><row><cell>CG</cell><cell>MP x̄</cell><cell>MP Σ(x_i - x̄)²</cell><cell>CS x̄</cell><cell cols="2">CS Σ(x_i - x̄)²</cell></row><row><cell>Lb</cell><cell>137,9</cell><cell>4156,56</cell><cell>131,9</cell><cell cols="2">7611,48</cell></row><row><cell>Dr</cell><cell>425,0</cell><cell>8178,00</cell><cell>362,9</cell><cell cols="2">32500,3</cell></row><row><cell>Cr</cell><cell>5,9</cell><cell>143,58</cell><cell>18,6</cell><cell cols="2">5175,36</cell></row><row><cell>Vl</cell><cell>59,3</cell><cell>3242,26</cell><cell>72,6</cell><cell cols="2">5157,36</cell></row><row><cell>Ns</cell><cell>82,9</cell><cell>1902,71</cell><cell>76,8</cell><cell cols="2">4202,84</cell></row><row><cell>Sn</cell><cell>233,9</cell><cell>4890,01</cell><cell>226,9</cell><cell cols="2">14575,5</cell></row><row><cell>Fr</cell><cell>210,3</cell><cell>8529,18</cell><cell>158,9</cell><cell cols="2">7948,71</cell></row><row><cell>St</cell><cell>182,7</cell><cell>10670</cell><cell>226,5</cell><cell cols="2">5725,74</cell></row><row><cell cols="6">Table 4</cell></row><row><cell>CG</cell><cell>S</cell><cell>t</cell><cell>2Q</cell><cell cols="2">x̄₁ - x̄₂</cell></row><row><cell>Lb</cell><cell>14,00</cell><cell>1,69</cell><cell>&gt; 5%</cell><cell cols="2">Unessential</cell></row><row><cell>Dr</cell><cell>26,04</cell><cell>9,39</cell><cell>&lt; 0,1%</cell><cell cols="2">Essential</cell></row><row><cell>Cr</cell><cell>9,42</cell><cell>5,31</cell><cell>&lt; 0,1%</cell><cell cols="2">Essential</cell></row><row><cell>Vl</cell><cell>11,83</cell><cell>4,43</cell><cell>&lt; 0,1%</cell><cell cols="2">Essential</cell></row><row><cell>Ns</cell><cell>10,09</cell><cell>2,38</cell><cell>&lt; 5%</cell><cell cols="2">Essential</cell></row><row><cell>Sn</cell><cell>18,01</cell><cell>1,53</cell><cell>&gt; 10%</cell><cell cols="2">Unessential</cell></row><row><cell>Fr</cell><cell>16,57</cell><cell>12,21</cell><cell>&lt; 0,1%</cell><cell cols="2">Essential</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5 .</head><label>5</label><figDesc>The results of the calculations for the comparison between Moore's poetry and the colloquial style at the beginning of a word</figDesc><table><row><cell>CG</cell><cell>MP x̄</cell><cell>MP Σ(x_i - x̄)²</cell><cell>CS x̄</cell><cell>CS Σ(x_i - x̄)²</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 6 .</head><label>6</label><figDesc>The essential differences between Moore's poetry and the colloquial style at the beginning of a word</figDesc><table><row><cell>CG</cell><cell>S</cell><cell>t</cell><cell>2Q</cell><cell cols="3">x̄₁ - x̄₂</cell></row><row><cell>Lb</cell><cell>10,74</cell><cell>0,07</cell><cell>&gt; 80%</cell><cell cols="3">Unessential</cell></row><row><cell>Dr</cell><cell>13,76</cell><cell>4,49</cell><cell>&lt; 0,1%</cell><cell cols="3">Essential</cell></row><row><cell>Cr</cell><cell>7,68</cell><cell>4,77</cell><cell>&lt; 0,1%</cell><cell cols="3">Essential</cell></row><row><cell>Vl</cell><cell>8,17</cell><cell>0,24</cell><cell>&gt; 80%</cell><cell cols="3">Unessential</cell></row><row><cell>Ns</cell><cell>3,40</cell><cell>2,32</cell><cell>&lt; 5%</cell><cell cols="3">Essential</cell></row><row><cell>Sn</cell><cell>11,10</cell><cell>1,88</cell><cell>&gt; 5%</cell><cell cols="3">Unessential</cell></row><row><cell>Fr</cell><cell>12,85</cell><cell>9,99</cell><cell>&lt; 0,1%</cell><cell cols="3">Essential</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Shared Tasks on Authorship Analysis at PAN</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ghanem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Giachanou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Specht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-45442-5_66</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval, 42nd European Conference on IR Research, ECIR 2020</title>
				<meeting><address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-04-14">April 14-17, 2020</date>
			<biblScope unit="page" from="508" to="516" />
		</imprint>
	</monogr>
	<note>Proceedings, Part II</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Specht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF 2018 Evaluation Labs. CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2125</biblScope>
			<biblScope unit="page" from="1" to="25" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Statistics for Linguistics with R: A Practical Introduction</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Th</forename><surname>Gries</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Trends in Linguistics: Studies &amp; Monographs</title>
				<imprint>
			<publisher>Mouton de Gruyter</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page">348</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Technology for the psychological portraits formation of social networks users for the IT specialists recruitment based on Big Five, NLP and Big Data Analysis</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lytvyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vysotska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rzheuskyi</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2392/paper12.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2392</biblScope>
			<biblScope unit="page" from="147" to="171" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Sarcasm Detection In News Headlines Based on Machine Learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zanchak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vysotska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Albota</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE 16th International Conference on Computer Sciences and Information Technologies, CSIT 2021</title>
				<meeting>the IEEE 16th International Conference on Computer Sciences and Information Technologies, CSIT 2021<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-09">Sept. 2021</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="131" to="137" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Information technology for determining structure of social group based on fuzzy c-means</title>
		<author>
			<persName><forename type="first">O</forename><surname>Mulesa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Geche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Batyuk</surname></persName>
		</author>
		<idno type="DOI">10.1109/STC-CSIT.2015.7325431</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Xth International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2015</title>
				<meeting>the IEEE Xth International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2015<address><addrLine>Lviv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="60" to="62" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Attitudes Toward Feminism in Ukraine: A Sentiment Analysis of Tweets</title>
		<author>
			<persName><forename type="first">O</forename><surname>Levchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dilai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Intelligent Systems and Computing III. CSIT</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Shakhovska</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Medykovskyy</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018. 2019</date>
			<biblScope unit="volume">871</biblScope>
			<biblScope unit="page" from="119" to="131" />
		</imprint>
	</monogr>
	<note>Advances in Intelligent Systems and Computing</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Linguistically manipulative, disputable, semantic nature of the community Reddit feed post</title>
		<author>
			<persName><forename type="first">S</forename><surname>Albota</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International conference on computational linguistics and intelligent systems, COLINS 2021</title>
				<meeting>the 5th International conference on computational linguistics and intelligent systems, COLINS 2021<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-04-22">April 22-23, 2021</date>
			<biblScope unit="volume">2870</biblScope>
			<biblScope unit="page" from="769" to="783" />
		</imprint>
	</monogr>
	<note>main conference</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Computational Linguistics Tools in Mapping Emotional Dislocation of Translated Fiction</title>
		<author>
			<persName><forename type="first">I</forename><surname>Bekhta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Hrytsiv</surname></persName>
		</author>
		<ptr target="WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems</title>
				<meeting>the 5th International Conference on Computational Linguistics and Intelligent Systems<address><addrLine>COLINS; Kharkiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<publisher>Workshop</publisher>
			<date type="published" when="2021-04-22">April 22-23, 2021</date>
			<biblScope unit="volume">I</biblScope>
			<biblScope unit="page" from="685" to="699" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Quantitative parameters of Lucy Montgomery&apos;s literary style</title>
		<author>
			<persName><forename type="first">N</forename><surname>Hrytsiv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Shestakevych</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Shyyka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International conference on computational linguistics and intelligent systems</title>
				<meeting>the 5th International conference on computational linguistics and intelligent systems<address><addrLine>COLINS; Kharkiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-04-22">April 22-23, 2021</date>
			<biblScope unit="volume">2870</biblScope>
			<biblScope unit="page" from="670" to="684" />
		</imprint>
	</monogr>
	<note>I: main conference</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Meiosis and litotes in The Catcher in the Rye by Jerome David Salinger: text mining</title>
		<author>
			<persName><forename type="first">M</forename><surname>Karp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kunanets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kucher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International conference on computational linguistics and intelligent systems</title>
				<meeting>the 5th International conference on computational linguistics and intelligent systems<address><addrLine>COLINS; Kharkiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-04-22">April 22-23, 2021</date>
			<biblScope unit="volume">2870</biblScope>
			<biblScope unit="page" from="166" to="178" />
		</imprint>
	</monogr>
	<note>I: main conference</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Anthropocentrism as implementation of a testator/testatrix&apos;s communicative goal</title>
		<author>
			<persName><forename type="first">O</forename><surname>Kulyna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International conference on computational linguistics and intelligent systems, COLINS 2021</title>
				<meeting>the 5th International conference on computational linguistics and intelligent systems, COLINS 2021<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-04-22">April 22-23, 2021</date>
			<biblScope unit="volume">2870</biblScope>
			<biblScope unit="page" from="845" to="854" />
		</imprint>
	</monogr>
	<note>main conference</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Software-Based Approach towards Automated Authorship Acknowledgement: Chi-Square Test on One Consonant Group</title>
		<author>
			<persName><forename type="first">I</forename><surname>Khomytska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Teslyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kryvinska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bazylevych</surname></persName>
		</author>
		<idno type="DOI">10.3390/electronics9071138</idno>
		<ptr target="https://doi.org/10.3390/electronics9071138" />
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">1138</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Application of global optimization methods to increase the accuracy of classification in the data mining tasks</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Doroshenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="98" to="109" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Classification of imbalanced classes using the committee of neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Doroshenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tkachenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Scientific and Technical Conference on Computer Sciences and Information Technologies</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="400" to="403" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Perebyjnis</surname></persName>
		</author>
		<title level="m">Statystychni metody dlia lingvistiv</title>
				<meeting><address><addrLine>Vinnytsia, Ukraine</addrLine></address></meeting>
		<imprint>
			<publisher>Nova Knyha</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note>in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Using function words for authorship attribution: Bag-of-words vs. sequential rules</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Boukhaled</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-G</forename><surname>Ganascia</surname></persName>
		</author>
		<ptr target="https://hal.sorbonne-universite.fr/hal-01198407/document" />
	</analytic>
	<monogr>
		<title level="m">the 11th International Workshop on Natural Language Processing and Cognitive Science</title>
				<meeting><address><addrLine>Venice, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter</publisher>
			<date type="published" when="2014-10">Oct 2014. 2015</date>
			<biblScope unit="page" from="115" to="122" />
		</imprint>
	</monogr>
	<note>Natural Language Processing and Cognitive Science Proceedings</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Authorship Attribution by Differentiation of Phonostatistical Structures of Styles</title>
		<author>
			<persName><forename type="first">I</forename><surname>Khomytska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Teslyuk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Scientific and Technical Conference on Computer Sciences and Information Technologies</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="5" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">O</forename><surname>Klymchuk</surname></persName>
		</author>
		<title level="m">Klasternyy analiz: vykorystannia u psykholohichnyh doslidzhenniah, Praktychna psykhologia ta sotsialna robota</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="30" to="36" />
		</imprint>
	</monogr>
	<note>in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">C</forename><surname>Gomez</surname></persName>
		</author>
		<title level="m">Statistical Methods in Language and Linguistic Research</title>
				<meeting><address><addrLine>Murcia, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
		<respStmt>
			<orgName>University of</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Kornai</surname></persName>
		</author>
		<title level="m">Mathematical Linguistics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">M</forename><surname>Turchyn</surname></persName>
		</author>
		<title level="m">Matematychna statystyka. Navch. posib.</title>
				<meeting><address><addrLine>Kyiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<publisher>Vydavnychyj tsentr &quot;Akademia&quot;</publisher>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
	<note>in Ukrainian</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Approach for minimization of phoneme groups in authorship attribution</title>
		<author>
			<persName><forename type="first">I</forename><surname>Khomytska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Teslyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bazylevych</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Shylinska</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computing</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="55" to="62" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Matematicheskaya statistika</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Ivchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yu</forename><forename type="middle">I</forename><surname>Medvedev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Moskva: Vyssh. Shk</title>
		<imprint>
			<biblScope unit="page">248</biblScope>
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Software Architecture Design of the Real-Time Processes Monitoring Platform</title>
		<author>
			<persName><forename type="first">A</forename><surname>Batyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Voityshyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Verhun</surname></persName>
		</author>
		<idno type="DOI">10.1109/DSMP.2018.8478589</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Second International Conference on Data Stream Mining &amp; Processing, DSMP 2018</title>
				<meeting>the IEEE Second International Conference on Data Stream Mining &amp; Processing, DSMP 2018<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="98" to="101" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
