<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evaluation of the Gene Expression Profiles Complex Proximity Metric Effectiveness Based on a Hybrid Technique of Gene Expression Data Extraction</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lyudmyla</forename><surname>Yasinska-Damri</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ukrainian Academy of Printing</orgName>
								<address>
									<addrLine>Pid Goloskom street, 19</addrLine>
									<postCode>79000</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Igor</forename><surname>Liakh</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Uzhhorod National University</orgName>
								<address>
									<addrLine>University street, 14</addrLine>
									<postCode>88000</postCode>
									<settlement>Uzhhorod</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sergii</forename><surname>Babichev</surname></persName>
							<email>sbabichev@ksu.ks.ua</email>
							<affiliation key="aff2">
								<orgName type="institution">Kherson State University</orgName>
								<address>
									<addrLine>University street, 27</addrLine>
									<postCode>73000</postCode>
									<settlement>Kherson</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bohdan</forename><surname>Durnyak</surname></persName>
							<email>durnyak@uad.lviv.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Ukrainian Academy of Printing</orgName>
								<address>
									<addrLine>Pid Goloskom street, 19</addrLine>
									<postCode>79000</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Evaluation of the Gene Expression Profiles Complex Proximity Metric Effectiveness Based on a Hybrid Technique of Gene Expression Data Extraction</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2CCBAB68F0417CAF4F6B3E1B5B06B89B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:39+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Gene expression profiles, proximity metrics, OPTICS clustering algorithm, gene expression profiles classification, inductive methods of objective clustering, clustering quality criteria, classification accuracy</term>
					<term>0000-0002-8629-8658 (L. Yasinska-Damri)</term>
					<term>0000-0001-5417-9403 (I. Liakh)</term>
					<term>0000-0001-6797-1467 (S. Babichev)</term>
					<term>0000-0003-1526-9005 (B. Durnyak)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Processing gene expression data to develop diagnostic systems for complex diseases and/or to reconstruct gene regulatory networks (GRN) is one of the current directions of modern bioinformatics. An important stage in solving this problem is the extraction of mutually correlated gene expression profiles (GEP) with respect to the proximity metric used. In this research, we evaluate a complex GEP proximity metric calculated as a combination of the modified mutual information criterion and Pearson's chi-squared test, using the OPTICS clustering algorithm implemented according to the principles of the objective clustering inductive technique (OCIT). The classification accuracy of the examined objects was used as the main criterion to assess the effectiveness of the applied method. The simulation results have shown that the proposed technique allows us to form an optimal GEP cluster structure in terms of the maximum values of the classification accuracy criterion.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction and literature review</head><p>The development of models for disease diagnostics and/or gene regulatory network (GRN) reconstruction using gene expression data (GED) is one of the current directions of modern bioinformatics. As a rule, the initial GED is formed as a high-dimensional array whose components represent the studied patterns and genes. The value of gene expression depends on the amount of the given type of gene, which determines the corresponding properties of the examined biological organism. A gene expression profile (GEP) is the vector of expression values of a gene evaluated for the examined patterns.</p><p>Reconstruction of a GRN that adequately reflects the nature of gene interactions under different states of a biological organism, in order to develop both effective medicines and methods of disease diagnostics and treatment, is possible provided that groups of highly and mutually expressed genes are extracted. For this reason, gene expression data preprocessing is very important at the early stage of GRN formation or of the development of a disease diagnostic model. Figure <ref type="figure" target="#fig_1">1</ref> illustrates a stepwise procedure for implementing this process. The filtration procedure, in this case, involves removing genes with zero expression at the first step and genes with low expression, in terms of an empirically established threshold, at the second step.</p><p>Moreover, the data can contain gene expression profiles that differ statistically significantly from the GEP of the main group. Such genes do not correlate with the profiles of other genes, so they can also be removed from the data. A high-quality implementation of this stage significantly reduces the number of genes for further research. 
This fact also contributes to enhancing the quality of the further steps of GED processing for solving the problem described below.</p><p>Figure <ref type="figure" target="#fig_1">1</ref>: Block-chart of a step-by-step procedure of GED processing to form clusters of highly and mutually expressed GEP</p><p>In <ref type="bibr" target="#b0">[1]</ref>, the authors presented the "limma" module (Linear Models for Microarray and RNA-Seq Data), which contains various functions for generating, filtering and interpreting gene expression data obtained using both DNA microchip experiments and the mRNA molecules sequencing method. This module is to some extent an alternative to the "Bioconductor" package, implemented in the data mining and machine learning R software <ref type="bibr" target="#b1">[2]</ref>, and it is based on the use of linear models to identify differentially expressed genes in a multifactor experiment. This module also contains functions for gene ontology analysis, which is very important for adequate GRN reconstruction, because interpreting genes and their interactions on the basis of conceptual interconnections makes it possible to identify target genes and to establish the nature of the interconnections between target and other genes for the disease in question.</p><p>The papers <ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref><ref type="bibr" target="#b4">[5]</ref> considered various tools and techniques of GED filtering available in the "Bioconductor" package, using quantitative quality criteria for GED obtained by the DNA microarray method <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref> and by mRNA molecules sequencing <ref type="bibr" target="#b4">[5]</ref>. As a result of the simulation, the authors proposed a stepwise algorithm for extracting highly and mutually expressed gene expression profiles for their further grouping into clusters. 
In a review <ref type="bibr" target="#b5">[6]</ref>, the authors conducted a comparative analysis of current software for processing GED in order to extract the most informative genes. Their analysis supports the feasibility of using the R software for GEP processing to form clusters of highly and mutually expressed genes, because this software contains all the modules and functions needed to process gene expression data for the task at hand.</p><p>The review <ref type="bibr" target="#b6">[7]</ref> presents research results focused on various hybrid techniques for extracting clusters of mutually correlated GEP in order to create cancer diagnostic systems. The reviewed works applied various combinations of filtering, clustering and classification techniques using various types of statistical criteria and GEP proximity metrics. The classification accuracy of the examined objects was applied as the principal quality metric to assess the effectiveness of the respective hybrid model. 
The following filtration techniques and methods for estimating the proximity of gene expression profiles were analyzed in this review: the mutual information maximization method <ref type="bibr" target="#b7">[8]</ref>, Pearson's χ² test <ref type="bibr" target="#b8">[9]</ref>, the correlation-based feature selection technique <ref type="bibr" target="#b9">[10]</ref>, the Laplacian and Fisher scores <ref type="bibr" target="#b10">[11]</ref>, the information gain method <ref type="bibr" target="#b11">[12]</ref>, the Fisher criterion <ref type="bibr" target="#b12">[13]</ref>, independent component analysis <ref type="bibr" target="#b13">[14]</ref>, maximum relevance minimum redundancy <ref type="bibr" target="#b14">[15]</ref>, the probabilistic random function <ref type="bibr" target="#b15">[16]</ref>, random forest ranking <ref type="bibr" target="#b16">[17]</ref>, the Fisher-Markov selector <ref type="bibr" target="#b17">[18]</ref>, symmetrical uncertainty <ref type="bibr" target="#b18">[19]</ref> and the logarithmic transformation method <ref type="bibr" target="#b19">[20]</ref>. However, we would like to note that in the analyzed research, high classification accuracy is in most cases achieved with a small number of extracted GEP. Moreover, the parameters of the techniques used in the respective hybrid models are set empirically during the simulation process. Undoubtedly, this is one of the main disadvantages of the analyzed models.</p><p>The works <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref> present a partial solution to this task. These papers developed a stepwise procedure of GEP extraction based on the joint application of Shannon entropy, statistical criteria, a clustering technique based on the SOTA clustering algorithm and a random forest binary classifier. The suitable algorithm parameters, with respect to the classification accuracy, were set a priori according to the OCIT principles. 
However, only the correlation proximity metric was used in that research. Thus, the brief review presented above allows us to conclude that an effective model of GEP extraction based on the joint application of various proximity metrics, clustering and classification techniques is currently absent. This problem can be solved by the joint application of various techniques that are used successfully in current data science research <ref type="bibr" target="#b22">[23]</ref><ref type="bibr" target="#b23">[24]</ref><ref type="bibr" target="#b24">[25]</ref><ref type="bibr" target="#b25">[26]</ref>.</p><p>In this work, we consider a hybrid GEP proximity metric calculated as a combination of the modified mutual information maximization method and Pearson's χ² test. The modified mutual information maximization method, in this instance, takes into account various methods of Shannon entropy evaluation.</p><p>The objective of the research is the development and evaluation of a hybrid model of GEP extraction based on the joint application of the hybrid proximity metric, the OPTICS clustering algorithm implemented using the principles of OCIT, and a random forest binary classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Materials and methods</head><p>In the general case, the internal clustering quality criterion should consider both the allocation of gene expression profiles inside clusters and the allocation of the clusters' medians relative to each other. Thus, this criterion should be complex and contain two components. If K is the number of clusters, the first component of this criterion can be calculated as follows:</p><formula xml:id="formula_0">QCW = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{N_k} \sum_{i=1}^{N_k} d(e_i, C_k) <label>(1)</label></formula><p>where:</p><p>N_k and C_k are the number of GEP in the k-th cluster and the median of the k-th cluster, respectively;</p><formula xml:id="formula_1">d(e_i, C_k)</formula><p>is the distance between the i-th profile and the median of this cluster, calculated using the complex proximity metric, which combines the modified mutual information maximization method (considering various methods of Shannon entropy calculation) and Pearson's χ² test, and whose effectiveness is shown in <ref type="bibr" target="#b26">[27]</ref>.</p><p>The second component of the internal criterion can be assessed as the average distance between the medians of the allocated clusters:</p><formula xml:id="formula_2">QCB = \frac{2}{K(K-1)} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} d(C_i, C_j) (2)</formula><p>In <ref type="bibr" target="#b20">[21]</ref>, the authors performed modelling to assess the performance of different types of internal criteria containing (1) and (<ref type="formula">2</ref>) as components. As a result, a hybrid internal criterion formed as a ratio of the Calinski-Harabasz criterion and the WB index was proposed:</p><formula xml:id="formula_3">QC_{int} = \frac{K (K-1) \, QCW^{2}}{(N-K) \, QCB^{2}} (<label>3</label>)</formula><p>where N is the number of objects to be grouped. 
This criterion was used as the internal one during the modelling. The efficiency of both the hybrid GEP proximity metric and the quality criteria for grouping profiles into clusters was assessed using the density-based clustering algorithm OPTICS <ref type="bibr" target="#b27">[28]</ref>, which is a logical development of the DBSCAN density algorithm and allows us to form a multicluster structure based on the respective proximity metric. The OPTICS clustering algorithm is suitable because it not only forms a multicluster structure containing clusters of gene expression profiles that are close in terms of the density of their allocation in the feature space, but also identifies as noise those profiles whose allocation density relative to other GEP is much lower than the density of the main groups of GEP.</p><p>We would like to note that the criterion calculated by formulas (1)-(<ref type="formula" target="#formula_3">3</ref>) does not always allow us to objectively form an adequate clustering because of the reproducibility error, which is inherent in most prevailing clustering algorithms. In other words, satisfactory grouping results obtained on one dataset are not always repeated on another similar dataset. In <ref type="bibr" target="#b28">[29]</ref>, the authors proposed reducing the reproducibility error by using "fresh data" (not used when creating the model) to verify the obtained model of object distribution into clusters, and by making the final decision on the cluster structure through the joint use of internal, external and balance criteria, which take into account possible discrepancies between the internal and external criteria. 
This idea was further developed in <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>, where the objective clustering inductive technology was described and implemented. The authors proposed an external quality criterion in the form of the normalized difference of the internal criteria assessed on two equivalent subsets (containing the same number of pairwise similar objects) at the appropriate hierarchical level of cluster structure formation:</p><formula xml:id="formula_5">QC_{ext} = \frac{|QC_{int}^{1} - QC_{int}^{2}|}{QC_{int}^{1} + QC_{int}^{2}} <label>(4)</label></formula><p>The main idea is as follows. The minimal reproducibility error corresponds to the maximum degree of similarity of the object allocation in clusters obtained on two equivalent subsets. Since the internal criteria consider both the distribution of patterns in clusters and the allocation of the clusters' medians relative to each other, objective clustering (minimum value of the reproducibility error) corresponds in this case to the minimal difference between the corresponding values of the internal criteria. The normalization in formula (4) maps the external criterion onto the range from 0 (zero reproducibility error) to 1 (maximum error). The balance criterion was calculated using the Harrington desirability function according to the algorithm described in detail in <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31]</ref>.</p><p>The random forest classifier was used at the classification step. This choice follows from the authors' previous research, presented in <ref type="bibr" target="#b20">[21]</ref>, where various types of binary classifiers were studied for classifying samples of patients examined for lung cancer. These samples also contained gene expression data as attributes. 
The effectiveness of the respective model was assessed using the classification accuracy of the examined samples.</p><p>Figure <ref type="figure" target="#fig_0">2</ref> shows a block chart of the stepwise procedure performed during the modelling. The practical implementation of this algorithm involves the following stages:</p><p>Stage I. Formation of GEP data and functions to calculate the respective criteria.</p><p>1.1. Forming an array of GED whose components represent the assessed patterns and the genes, where the expression value reflects the relative amount of the given type of gene for the examined pattern.</p><p>1.2. Formation of the function to estimate the proximity between GEP on the basis of the joint application of the modified mutual information maximization proximity metric and Pearson's χ² test <ref type="bibr" target="#b27">[28]</ref>.</p><p>1.3. Formation of the functions to calculate the internal, external and hybrid balance quality criteria.</p><p>1.4. Formation of the function to calculate the classification accuracy of the examined samples.</p><p>1.5. Formation of two equivalent subsets of GEP by iteratively distributing the two nearest GEP, according to the hybrid proximity metric, between the two subsets.</p><p>Stage II. Setup of the density-based OPTICS clustering algorithm.</p><p>2.1. Setup of the range for changing the minimum number of points within the ε-neighborhood: MinPtsmin, MinPtsmax.</p><p>2.2. Creating a reachability chart. Setup of both the range and the step of variation of the ε-neighborhood values: Epsmin, Epsmax, dEps.</p><p>2.3. Calculation of the distances between all pairs of gene expression profiles in the equal-power subsets and formation of the matrices of distances between the corresponding profiles. The obtained distance matrices are used as input data when the clustering procedure is implemented by applying the OPTICS algorithm.</p></div>
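<div xmlns="http://www.tei-c.org/ns/1.0"><p>The criteria in formulas (1)-(4) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are illustrative, the toy profiles are one-dimensional numbers, and abs_dist is a hypothetical stand-in for the hybrid proximity metric, which is not reproduced here.</p>
<p><code lang="python">
```python
from itertools import combinations
from statistics import mean


def qcw(clusters, medians, dist):
    """Formula (1): mean over clusters of the mean profile-to-median distance."""
    return mean(
        mean(dist(e, c) for e in members)
        for members, c in zip(clusters, medians)
    )


def qcb(medians, dist):
    """Formula (2): average pairwise distance between cluster medians."""
    return mean(dist(a, b) for a, b in combinations(medians, 2))


def qc_int(clusters, medians, dist):
    """Formula (3): K(K-1)QCW^2 / ((N-K)QCB^2); smaller values are better."""
    k = len(clusters)
    n = sum(len(members) for members in clusters)
    w = qcw(clusters, medians, dist)
    b = qcb(medians, dist)
    return k * (k - 1) * w ** 2 / ((n - k) * b ** 2)


def qc_ext(qc1, qc2):
    """Formula (4): normalized difference of two internal criteria, in [0, 1]."""
    return abs(qc1 - qc2) / (qc1 + qc2)


def abs_dist(x, y):
    # Hypothetical stand-in for the hybrid proximity metric (assumption).
    return abs(x - y)


# Toy data: three clusters of one-dimensional "profiles" and their medians.
toy_clusters = [[0.0, 2.0], [9.0, 11.0], [20.0, 22.0]]
toy_medians = [1.0, 10.0, 21.0]
toy_qc_int = qc_int(toy_clusters, toy_medians, abs_dist)
```
</code></p>
<p>Any distance function with the same two-argument signature can be passed as dist, so the real metric could be substituted without changing the criteria themselves.</p></div>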
<div xmlns="http://www.tei-c.org/ns/1.0"><p>3.7. If k ≤ MinPtsmax, go to step 3.2 of this procedure. Otherwise, create charts of the clustering and classification quality criteria versus the Eps value for each of the MinPts values.</p><p>Stage IV. Analysis of the obtained results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>4.1. Analysis of the obtained charts. Forming conclusions regarding the effectiveness of the hybrid GEP proximity metric in the process of forming subsets of informative genes for their further use in creating disease diagnostic systems and/or GRN reconstruction.</p></div>
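<div xmlns="http://www.tei-c.org/ns/1.0"><p>The parameter scan of Stage III can be sketched in Python as two nested loops. This is a simplified illustration under stated assumptions, not the authors' code: gap_cluster is a hypothetical one-dimensional stand-in for OPTICS, toy_criterion is a placeholder for the internal criterion of formula (3), and the classification step 3.5 and the balance criterion are omitted.</p>
<p><code lang="python">
```python
from statistics import mean


def stage_three(subset_a, subset_b, minpts_values, eps_values, cluster, criterion):
    """Steps 3.1-3.7: scan MinPts and Eps, keep only reproducible partitions."""
    results = []
    for min_pts in minpts_values:              # steps 3.1 and 3.7
        for eps in eps_values:                 # steps 3.2 and 3.6
            part_a = cluster(subset_a, min_pts, eps)   # step 3.3
            part_b = cluster(subset_b, min_pts, eps)
            k1, k2 = len(part_a), len(part_b)
            if not (k1 == k2 and k1 > 2):      # step 3.4: try the next Eps
                continue
            qc1, qc2 = criterion(part_a), criterion(part_b)
            ext = abs(qc1 - qc2) / (qc1 + qc2)         # formula (4)
            results.append((min_pts, eps, k1, qc1, qc2, ext))
    return results


def gap_cluster(points, min_pts, eps):
    # Hypothetical 1-D stand-in for OPTICS (assumption): split the sorted
    # points at gaps larger than eps and drop small groups as noise.
    groups, current = [], []
    for p in sorted(points):
        if current and p - current[-1] > eps:
            groups.append(current)
            current = []
        current.append(p)
    if current:
        groups.append(current)
    return [g for g in groups if len(g) >= min_pts]


def toy_criterion(partition):
    # Placeholder for the internal criterion (assumption): mean cluster width.
    return mean(max(g) - min(g) for g in partition)


subset_a = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.0, 9.1, 9.2]
subset_b = [0.05, 0.15, 0.25, 5.05, 5.15, 5.25, 9.05, 9.15, 9.25]
results = stage_three(subset_a, subset_b, [3], [0.05, 0.5, 6.0],
                      gap_cluster, toy_criterion)
```
</code></p>
<p>In this toy run only the middle Eps value yields the same number of clusters (more than two) on both subsets, so only that parameter pair is kept, which mirrors the filtering rule of step 3.4.</p></div>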
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiment, results and discussion</head><p>The proposed algorithm was implemented using the GSE19188 gene expression dataset of patients examined for early-stage lung cancer <ref type="bibr" target="#b31">[32]</ref>. The data were obtained in a DNA microchip experiment and comprised 156 microchips: 65 contained GED of healthy patients and 91 contained GED of patients with a lung cancer tumor (mild form). The 400 most informative GEP in terms of classification accuracy (approximately 93%) <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref> were used during the simulation. The MinPts value was varied from 3 to 5. This interval was established empirically. The modelling showed that a larger number of points within the Eps neighborhood degrades the simulation results in terms of the number of clusters in the equal-power subsets, the GEP clustering quality criteria and the sample classification accuracy. The Eps values were varied from the minimum, calculated as the minimum distance between gene expression profiles in the equal-power subsets, to 1.5 times that minimum distance. This range was also set empirically. For larger Eps values, the GEP were allocated into 2 clusters, and the clustering results were repeated. The resulting range of Eps values was divided into 20 equal sections, and the width of a section was used as the step of the Eps value. According to the algorithm presented above, the clustering and classification quality criteria were calculated only for cases where the numbers of clusters allocated on the two equal-power subsets were equal. This condition minimizes the reproducibility error. 
Tables <ref type="table" target="#tab_0">1 and 2</ref> and Figures <ref type="figure" target="#fig_3">3 and 4</ref> present the modelling results.</p></div>
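<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Eps grid described above can be reproduced with a short Python sketch. The minimum distance of 0.000618 used here is an assumption: it is not stated in the text and is inferred from the largest reported value Eps = 0.00092700 being 1.5 times the minimum. The function name eps_grid is illustrative.</p>
<p><code lang="python">
```python
def eps_grid(min_dist, factor=1.5, sections=20):
    """Return the Eps values from min_dist to factor * min_dist, inclusive."""
    step = (factor * min_dist - min_dist) / sections
    return [min_dist + i * step for i in range(sections + 1)]


# Assumed minimum distance (inferred, not reported in the paper).
grid = eps_grid(0.000618)
step = grid[1] - grid[0]
```
</code></p>
<p>Under this assumption the step is about 0.00001545, and the grid contains values such as 0.00066435 and 0.00091155 that appear among the reported Eps values.</p></div>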
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>The result of the division of GEP into clusters when MinPts = 3</p><p>The analysis of the obtained results supports the feasibility of using the proposed GEP proximity metric for selecting mutually correlated profiles in a multicluster structure formed by the OPTICS clustering algorithm. The proposed method creates the conditions for assessing suitable algorithm parameters in terms of the optimal grouping of GEP into clusters on the one hand, and the minimum value of the reproducibility error on the other. As can be seen from Tables <ref type="table" target="#tab_0">1 and 2</ref>, when the MinPts parameter value is 3, there are seven cluster structures. The first clustering contains six clusters, the next four clusterings contain five clusters each, and the last two clusterings contain three clusters each. When the MinPts value is 4 or 5, the first clustering contains four clusters; in the other cases, three clusters were obtained in each clustering. It should be noted that the initial data contained approximately 400 gene expression profiles that were carefully selected by stepwise application of the SOTA clustering algorithm <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref>. The accuracy of sample classification when the full set of gene expression profiles was used as attributes was approximately 93%.</p><p>The analysis of the results also shows that in all cases some of the gene expression profiles are identified as noise. These genes are not contained in any cluster. The presence of "noise" genes can be explained by the fact that the density of these GEP, in terms of the proximity metric used, is less than the conditional boundary value assessed by the OPTICS clustering algorithm. 
Analysis of the charts presented in Figure <ref type="figure" target="#fig_2">3</ref> has also shown that the internal and external criteria are not optimal for assessing suitable parameters of the OPTICS algorithm, because the minimum values of these metrics do not correspond to the maximum values of the object classification accuracy in the corresponding clusters. The maximum value of the hybrid balance criterion, which contains both the internal and external criteria as components, is achieved for a three-cluster structure with the following parameters of the OPTICS algorithm: MinPts = 3, Eps = 0.00091155 or Eps = 0.00092700 (the same results are achieved in both instances). The results of the classification of objects contained in the corresponding clusters, presented in Figure <ref type="figure" target="#fig_3">4</ref>, confirm the above conclusions. As can be seen from the charts, with these parameters of the algorithm the classification accuracy is maximal for the first two clusters, and the second cluster contains the largest number of genes, i.e. it is the main one in terms of the number of gene expression profiles. The third cluster contains only six genes. The classification results in the fourth, fifth and sixth clusters are not adequate because they are the same in all cases and slightly worse than the classification results in the first three clusters. It should be noted that the maximum values of the hybrid balance criterion, which determines the quality of GEP clustering, correspond to the maximum values of the classification accuracy of the samples that contain the extracted gene expression profiles as attributes. This fact indicates the high efficiency of the proposed hybrid proximity metric and of the technique for assessing the quality of GEP clustering.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>This paper has described a hybrid model of GEP cluster formation for extracting groups of mutually similar GEP, in terms of the applied proximity metric, based on the OPTICS clustering algorithm implemented according to OCIT principles. A hybrid proximity metric was applied to assess the distance between GEP during the simulation. This metric is calculated by jointly applying the modified mutual information maximization metric (considering various methods of Shannon entropy evaluation) and Pearson's χ² test. The effectiveness of this hybrid proximity metric has been shown in <ref type="bibr" target="#b26">[27]</ref>. A structural block chart has been presented of the stepwise algorithm for setting suitable OPTICS parameters in terms of a hybrid balance clustering quality criterion, which contains the internal and external clustering quality criteria as components. The high efficiency of the proposed model has been confirmed by the agreement between the quality criteria for clustering gene expression profiles and the classification accuracy of objects that contain these GEP as attributes.</p><p>An analysis of the simulation results has indicated that the internal and external clustering quality criteria do not allow the optimal parameters of the OPTICS algorithm to be determined. The minimal values of these criteria do not correspond to the maximum values of the object classification accuracy in the corresponding clusters. 
The maximal value of the hybrid balance criterion, which is formed considering both the internal and external criteria, has been achieved for a three-cluster structure with the following parameters of the OPTICS algorithm: MinPts = 3, Eps = 0.00091155 or Eps = 0.00092700 (the same results are achieved in both instances).</p><p>The analysis of the object classification results has confirmed the high effectiveness of the proposed technique, since the classification accuracy is maximal for the first two clusters, and the second cluster contains the largest number of genes, i.e. it is the main one in terms of the number of gene expression profiles. The third cluster contains only six genes. The fourth, fifth and sixth clusters contained the same number of gene expression profiles in all cases. Additionally, the classification accuracy in these clusters is slightly worse than in the first three clusters. It should be noted that the maximum values of the hybrid balance criterion, which determines the quality of GEP clustering, correspond to the maximum values of the classification accuracy of the samples that contain the extracted gene expression profiles as attributes. This fact indicates the high efficiency of the proposed hybrid proximity metric and of the model for assessing the quality of GEP clustering. However, we would like to note that the proposed proximity metric is appropriate for high-dimensional gene expression profiles. For other types of data, it is necessary to investigate other metrics more suitable for that data. 
This is a limitation of the proposed model.</p><p>Further research will apply the proposed hybrid proximity metric within hybrid GEP clustering and classification techniques based on other clustering and classification algorithms.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Structural block-chart of the algorithm for forming a multicluster structure based on the OPTICS algorithm implemented using the principles of OCIT. Stage III. Stepwise clustering of GEP within the specified ranges of variation of the algorithm parameters.</figDesc><graphic coords="5,77.65,109.90,439.65,444.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>3.1.</head><label>1</label><figDesc>MinPts value initialization: k = MinPtsmin. 3.2. Eps value initialization: e = Epsmin. 3.3. Clustering of the gene expression profiles contained in the equivalent subsets, forming partitions with K1 and K2 clusters. 3.4. If K1 = K2 &gt; 2, calculation of the internal and external quality criteria by formulas (1)-(4). Otherwise, increase the value of the Eps parameter (e = e + de) and go to step 3.3 of this procedure. 3.5. Classification of the objects that contain gene expression profiles in each of the allocated clusters. Calculation of the classification quality criterion (Accuracy). 3.6. If e ≤ Epsmax, go to step 3.3 of this procedure. Otherwise, calculate the hybrid balance criterion and increase the MinPts value by one: k = k + 1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Simulation results for the criteria-based analysis of the cluster structure obtained with the OPTICS algorithm implemented on the basis of OCIT: distribution of the internal criterion assessed on the first (a) and second (b) equivalent subsets of GEP; the external criterion (c) and the hybrid balance criterion (d) as the Eps and MinPts values vary from their minimum to their maximum values</figDesc><graphic coords="7,74.15,72.00,446.55,326.65" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Simulation results for the classification accuracy of objects whose attributes are the gene expression profiles allocated to clusters using the OPTICS algorithm: a) the first cluster; b) the second cluster; c) the third cluster</figDesc><graphic coords="8,86.15,198.50,431.40,307.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="2,72.70,160.45,449.55,120.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2</head><label>2</label><figDesc>The result of the division of GEP into clusters when MinPts = 4 and 5</figDesc><table>
<row><cell>Eps, ×10⁻³</cell><cell cols="6">Clusters</cell></row>
<row><cell></cell><cell>1</cell><cell>2</cell><cell>3</cell><cell>4</cell><cell>5</cell><cell>6</cell></row>
<row><cell>0.66435</cell><cell>24</cell><cell>311</cell><cell>6</cell><cell>14</cell><cell>6</cell><cell>7</cell></row>
<row><cell>0.69525</cell><cell>24</cell><cell>322</cell><cell>6</cell><cell>14</cell><cell>6</cell><cell>-</cell></row>
<row><cell>0.71070</cell><cell>24</cell><cell>323</cell><cell>6</cell><cell>14</cell><cell>6</cell><cell>-</cell></row>
<row><cell>0.72615</cell><cell>24</cell><cell>329</cell><cell>6</cell><cell>14</cell><cell>6</cell><cell>-</cell></row>
<row><cell>0.74160</cell><cell>24</cell><cell>332</cell><cell>6</cell><cell>14</cell><cell>6</cell><cell>-</cell></row>
<row><cell>0.91155</cell><cell>24</cell><cell>359</cell><cell>6</cell><cell>-</cell><cell>-</cell><cell>-</cell></row>
<row><cell>0.92700</cell><cell>24</cell><cell>359</cell><cell>6</cell><cell>-</cell><cell>-</cell><cell>-</cell></row>
<row><cell>Eps, ×10⁻³</cell><cell cols="4">MinPts = 4</cell><cell>Eps, ×10⁻³</cell><cell cols="4">MinPts = 5</cell></row>
<row><cell></cell><cell cols="4">Clusters</cell><cell></cell><cell cols="4">Clusters</cell></row>
<row><cell></cell><cell>1</cell><cell>2</cell><cell>3</cell><cell>4</cell><cell></cell><cell>1</cell><cell>2</cell><cell>3</cell><cell>4</cell></row>
<row><cell>0.64890</cell><cell>24</cell><cell>158</cell><cell>115</cell><cell>11</cell><cell>0.64890</cell><cell>23</cell><cell>155</cell><cell>115</cell><cell>11</cell></row>
<row><cell>0.69525</cell><cell>24</cell><cell>322</cell><cell>14</cell><cell>-</cell><cell>0.66435</cell><cell>24</cell><cell>308</cell><cell>14</cell><cell>-</cell></row>
<row><cell>0.71070</cell><cell>24</cell><cell>323</cell><cell>14</cell><cell>-</cell><cell>0.67980</cell><cell>24</cell><cell>314</cell><cell>14</cell><cell>-</cell></row>
<row><cell>0.72615</cell><cell>24</cell><cell>326</cell><cell>14</cell><cell>-</cell><cell>0.69525</cell><cell>24</cell><cell>321</cell><cell>14</cell><cell>-</cell></row>
<row><cell>0.74160</cell><cell>24</cell><cell>331</cell><cell>14</cell><cell>-</cell><cell>0.71070</cell><cell>24</cell><cell>322</cell><cell>14</cell><cell>-</cell></row>
<row><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>0.72615</cell><cell>24</cell><cell>326</cell><cell>14</cell><cell>-</cell></row>
<row><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>0.74160</cell><cell>24</cell><cell>331</cell><cell>14</cell><cell>-</cell></row>
</table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">limma powers differential expression analyses for RNA-sequencing and microarray studies</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Ritchie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Phipson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wu</surname></persName>
		</author>
		<idno type="DOI">10.1093/nar/gkv007</idno>
	</analytic>
	<monogr>
		<title level="j">Nucl. Acids Res</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="issue">7</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">R: a language for data analysis and graphics</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ihaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gentleman</surname></persName>
		</author>
		<idno type="DOI">10.2307/1390807</idno>
	</analytic>
	<monogr>
		<title level="j">J. of Comp. and Graph. Statistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="299" to="314" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">analysis of microarray GEP of lung cancer</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kornelyuk</surname></persName>
		</author>
		<idno type="DOI">10.7124/bc.00090F</idno>
	</analytic>
	<monogr>
		<title level="j">Biopolymers and Cell</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="70" to="79" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Techniques of DNA microarray data pre-processing based on the complex use of Bioconductor tools and Shannon entropy</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Durnyak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2353</biblScope>
			<biblScope unit="page" from="365" to="377" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Exploratory analysis of neuroblastoma data genes expressions based on Bioconductor package tools</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Durnyak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Senkivskyy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2488</biblScope>
			<biblScope unit="page" from="268" to="279" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A Review of Feature Extraction Software for Microarray Gene Expression Data</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">S</forename><surname>Ting</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Mohamad</surname></persName>
		</author>
		<idno type="DOI">10.1155/2014/213656</idno>
	</analytic>
	<monogr>
		<title level="j">BioMed Res. Int</title>
		<imprint>
			<biblScope unit="volume">2014</biblScope>
			<biblScope unit="page">213656</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A survey on hybrid feature selection methods in microarray gene expression data for cancer classification</title>
		<author>
			<persName><forename type="first">N</forename><surname>Almugren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alshamlan</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2019.2922987</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">8736725</biblScope>
			<biblScope unit="page" from="78533" to="78548" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A hybrid feature selection algorithm for gene expression data classification</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yan</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neucom.2016.07.080</idno>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">256</biblScope>
			<biblScope unit="page" from="56" to="62" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A novel hybrid feature selection method for microarray data analysis</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">P</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Leu</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.asoc.2009.11.010</idno>
	</analytic>
	<monogr>
		<title level="j">Appl. Soft Comput</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="208" to="213" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A hybrid feature selection method for DNA microarray data</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">Y</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.compbiomed.2011.02.004</idno>
	</analytic>
	<monogr>
		<title level="j">Comput. Biol. Med</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="228" to="237" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Valizade Hasanloei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sheikhpour</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10822-017-0094-6</idno>
	</analytic>
	<monogr>
		<title level="j">J. Comput. Aided Mol. Des</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="375" to="384" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Induction of decision trees</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
		<idno type="DOI">10.1007/BF00116251</idno>
	</analytic>
	<monogr>
		<title level="j">Mach Learn</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="81" to="106" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A Fisher&apos;s Criterion-Based Linear Discriminant Analysis for Predicting the Critical Values of Coal and Gas Outbursts Using the Initial Gas Flow in a Borehole</title>
		<author>
			<persName><forename type="first">L</forename><surname>Xiaowei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chenglin</surname></persName>
		</author>
		<idno type="DOI">10.1155/2017/7189803</idno>
	</analytic>
	<monogr>
		<title level="j">Mathematical Problems in Engineering</title>
		<imprint>
			<biblScope unit="volume">2017</biblScope>
			<biblScope unit="issue">7189803</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Independent component analysis: recent advances</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hyvärinen</surname></persName>
		</author>
		<idno type="DOI">10.1098/rsta.2011.0534</idno>
	</analytic>
	<monogr>
		<title level="j">Phil. Trans. R. Soc</title>
		<imprint>
			<biblScope unit="volume">371</biblScope>
			<biblScope unit="page">20110534</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">mRMR-ABC: A hybrid gene selection algorithm for cancer classification using microarray gene expression profiling</title>
		<author>
			<persName><forename type="first">H</forename><surname>Alshamlan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Badr</surname></persName>
		</author>
		<idno type="DOI">10.1155/2015/604910</idno>
	</analytic>
	<monogr>
		<title level="j">Biomed. Res. Intern</title>
		<imprint>
			<biblScope unit="volume">2015</biblScope>
			<biblScope unit="page">604910</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy</title>
		<author>
			<persName><forename type="first">P</forename><surname>Moradi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gholampour</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.asoc.2016.01.044</idno>
	</analytic>
	<monogr>
		<title level="j">Applied Soft Computing</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="117" to="130" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Gene selection and classification approach for microarray data based on random forest ranking and BBHA</title>
		<author>
			<persName><forename type="first">E</forename><surname>Pashaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ozen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Aydin</surname></persName>
		</author>
		<idno type="DOI">10.1109/BHI.2016.7455896</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE-EMBS Int. Conf. Biomed. Health Inform. (BHI)</title>
				<meeting>IEEE-EMBS Int. Conf. Biomed. Health Inform. (BHI)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="308" to="311" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Multiobjective binary biogeography based optimization for feature selection using gene expression data</title>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yin</surname></persName>
		</author>
		<idno type="DOI">10.1109/TNB.2013.2294716</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Nanobiosci</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="343" to="353" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Shreem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abdullah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Z A</forename><surname>Nazri</surname></persName>
		</author>
		<idno type="DOI">10.1080/00207721.2014.924600</idno>
	</analytic>
	<monogr>
		<title level="j">Int. J. Syst. Sci</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1312" to="1329" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">GOA-based DBN: Grasshopper optimization algorithm-based deep belief neural networks for cancer classification</title>
		<author>
			<persName><forename type="first">P</forename><surname>Tumuluru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ravi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Appl. Eng. Res</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">24</biblScope>
			<biblScope unit="page" from="14218" to="14231" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Technique of Gene Expression Profiles Extraction Based on the Complex Use of Clustering and Classification Methods</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Škvor</surname></persName>
		</author>
		<idno type="DOI">10.3390/diagnostics10080584</idno>
	</analytic>
	<monogr>
		<title level="j">Diagnostics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">8</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Information Technology of Gene Expression Profiles Processing for Purpose of Gene Regulatory Networks Reconstruction</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lytvynenko</surname></persName>
		</author>
		<idno type="DOI">10.1109/DSMP.2018.8478452</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 IEEE 2nd International Conference on Data Stream Mining and Processing</title>
				<meeting>the 2018 IEEE 2nd International Conference on Data Stream Mining and Processing<address><addrLine>DSMP</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">8478452</biblScope>
			<biblScope unit="page" from="336" to="341" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">An approach towards missing data management using improved grnn-sgtm ensemble method</title>
		<author>
			<persName><forename type="first">I</forename><surname>Izonin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tkachenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Verhun</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jestch.2020.10.005</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal Engineering Science and Technology</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>in press</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Identifying textual content based on thematic analysis of similar texts in big data</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lytvyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Salo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vysotska</surname></persName>
		</author>
		<idno type="DOI">10.1109/STC-CSIT.2019.8929808</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="84" to="91" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">The architecture of distant competencies analyzing system for it recruitment</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rzheuskyi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kutyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vysotska</surname></persName>
		</author>
		<idno type="DOI">10.1109/STC-CSIT.2019.8929762</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 2019 14th Int. Sc. and Techn. Conf. on Comp. Sc. and Inf. Techn</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="254" to="261" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">An approach towards increasing prediction accuracy for the recovery of missing iot data based on the grnn-sgtm ensemble</title>
		<author>
			<persName><forename type="first">R</forename><surname>Tkachenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Izonin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kryvinska</surname></persName>
		</author>
		<idno type="DOI">10.3390/s20092625</idno>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">9</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Comparison Analysis of Gene Expression Profiles Proximity Metrics</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yasinska-Damri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Liakh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Durnyak</surname></persName>
		</author>
		<idno type="DOI">10.3390/sym13101812</idno>
	</analytic>
	<monogr>
		<title level="j">Symmetry</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">10</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">OPTICS: Ordering Points to Identify the Clustering Structure</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ankerst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Breunig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<idno type="DOI">10.1145/304181.304187</idno>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Record (ACM Special Interest Group on Management of Data)</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="49" to="60" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Inductive Learning Algorithms for Complex Systems Modeling</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Madala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Ivakhnenko</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1994">1994</date>
			<publisher>CRC Press</publisher>
			<biblScope unit="page">365</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Application of Optics Density-Based Clustering Algorithm Using Inductive Methods of Complex System Analysis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Durnyak</surname></persName>
		</author>
		<idno type="DOI">10.1109/STC-CSIT.2019.8929869</idno>
	</analytic>
	<monogr>
		<title level="m">International Scientific and Technical Conference on Computer Sciences and Information Technologies</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="169" to="172" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Estimation of the inductive model of objective clustering stability based on k-means algorithm for different level of data noise</title>
		<author>
			<persName><forename type="first">S</forename><surname>Babichev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lytvynenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Taif</surname></persName>
		</author>
		<idno type="DOI">10.15588/1607-3274-2016-4-7</idno>
	</analytic>
	<monogr>
		<title level="j">Radio Electronics, Comp. Science, Control</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="54" to="60" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Gene expression-based classification of non-small cell lung carcinomas and survival prediction</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Aerts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hamer</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0010312</idno>
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page">e10312</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
