<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Techniques of DNA Microarray Data Pre-processing Based on the Complex Use of Bioconductor Tools and Shannon Entropy</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Jan Evangelista Purkyne University in Usti nad Labem</institution>
          ,
          <addr-line>Ceske mladeze, 8, Usti nad Labem, 40096</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ukrainian Academy of Printing</institution>
          ,
          <addr-line>Pid Holoskom, 19, Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper presents a comparative analysis of DNA microarray data pre-processing techniques aimed at choosing the optimal combination of methods in terms of the minimum value of the Shannon entropy criterion. Tools of the Bioconductor package for R were used in the simulation. DNA microarray data of patients examined for lung cancer, taken from the ArrayExpress database, served as the experimental data. As a result of the research, a step-by-step data processing algorithm for determining the optimal combination of methods is proposed. The simulation showed that the optimal combination of methods for the investigated data is the following: rma background correction, invariant set normalization, and mas PM correction and summarization. This combination corresponds to the minimum value of the Shannon entropy criterion.</p>
      </abstract>
      <kwd-group>
        <kwd>DNA microarray</kwd>
        <kwd>gene expression profiles</kwd>
        <kwd>background correction</kwd>
        <kwd>normalization</kwd>
        <kwd>PM correction</kwd>
        <kwd>summarization</kwd>
        <kwd>Shannon entropy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>One of the current directions of modern bioinformatics is the reconstruction and
simulation of gene regulatory networks based on data from DNA microchip
experiments [1]. This process requires preliminary processing of the
experimental data in order to form the array of gene expression
profiles. DNA microchip data are presented as a matrix of light intensities whose values
are proportional to the expression of the corresponding genes. The expression
value of a gene determines the amount of the corresponding protein that
the gene will produce. Each DNA microchip holds the results of the
experiment for one investigated object or for one set of experimental
conditions, so the number of DNA microchips equals
the number of investigated objects. The experimental conditions differ
in most cases, which makes the data pre-processing stage very important.
Pre-processing involves four
steps: background correction, normalization, PM correction and summarization [2–5].
Each step can be implemented using various methods.</p>
      <p>At present there is no technique that objectively determines the optimal
combination of methods for forming the array of gene expression profiles. In this paper we propose a
technique of gene expression array formation based on the Shannon entropy criterion,
which is calculated using the James-Stein shrinkage estimator [6]. The
optimal combination of methods is the one that gives the minimum value of the
Shannon entropy criterion.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Formal problem statement</title>
      <p>The data obtained in DNA microchip experiments are presented
as a matrix of light intensities. A block chart of the procedure by which this
matrix is formed during the experiment is presented in fig. 1.
During hybridization, complementary single-stranded nucleotide chains carrying fluorescent labels
join into a single molecule. The light intensity at a given point of the microchip
is therefore proportional to the quantity of
hybridized RNA molecules that correspond to the appropriate type of protein. The
next stages of DNA microchip processing are filtering, which removes
unhybridized samples, and scanning, which forms the matrix of light intensities.
Fig. 2 presents the step-by-step procedure for transforming the light intensity
values into the expression values of the corresponding genes. As can be seen from fig. 2,
each step can be carried out by various methods, and the choice of their combination
directly influences the quality of the obtained gene expression
data. Thus, the main problem is to determine the optimal combination of
methods for processing DNA microchip data in order to increase the informativity
of the obtained gene expression data.</p>
      <p>Issues concerning DNA microarray processing are presented in [7–9], where the
authors consider in detail the stages of DNA microarray creation and the
peculiarities of their processing. In [10] the author considers the use of
neuro-fuzzy modeling for processing the data of microarray experiments.
In [11] the authors present the results of research on
gene expression analysis for object classification using a
Bayesian network. However, none of the works mentioned above investigates
the choice of an appropriate combination of
methods for processing DNA microchip data based on quantitative criteria.</p>
      <p>A classification and detailed description of the background correction methods are
presented in [2–5]. The Ideal Mismatch method was proposed by Affymetrix [3].
It uses both the Perfect Match (PM) samples,
which fully correspond to the investigated genes, and the Mismatch (MM)
samples, in which the middle nucleotide is replaced with its complement. The Robust
Multichip Average (RMA) background correction method uses only the PM
samples [4], which reduces the cost of microchip preparation because no
MM samples are needed. In this case the light intensity values are modeled as the sum
of a useful signal, which is exponentially distributed, and a normally distributed
noise component. The Distribution Free Convolution Model (DFCM) background
correction method [5] also assumes that the light intensity values combine
a useful signal and a noise component, but it makes no
assumptions about the distributions of these components. This method uses
both the PM and MM samples. The main idea and a detailed description of
the Affymetrix Microarray Suite 5.0 (MAS 5.0) background correction technique
are presented in [2,3].</p>
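      <p>As an illustration of the convolution model described above for RMA, the following minimal Python sketch computes the expected signal for a given observed intensity. It assumes that the noise parameters mu and sigma and the signal rate alpha are already known; the function names and the numeric parameter values are hypothetical and chosen only for the example, and the way actual RMA implementations estimate these parameters from the data is not reproduced here.</p>
      <preformat>
```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_signal(observed, mu, sigma, alpha):
    """Return E[S | O = observed] for O = S + N,
    with S ~ Exponential(rate alpha) and N ~ Normal(mu, sigma^2).

    Given O, the signal S follows a normal law with mean
    a = observed - mu - sigma^2 * alpha and standard deviation sigma,
    truncated to non-negative values; the function returns its mean.
    """
    a = observed - mu - sigma * sigma * alpha
    z = a / sigma
    return a + sigma * normal_pdf(z) / normal_cdf(z)


# Hypothetical parameter values, for illustration only.
mu, sigma, alpha = 100.0, 20.0, 0.01
corrected = [expected_signal(o, mu, sigma, alpha) for o in (50.0, 150.0, 1000.0)]
print(corrected)
```
      </preformat>
      <p>Under these assumptions the corrected value stays positive even when the observed intensity lies below the noise mean, which simple subtraction of the background would not guarantee.</p>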
      <p>
        The techniques of DNA microchip data normalization are presented in [2,12–16].
This stage is necessary because data obtained under different experimental
conditions correlate poorly. The aim of
normalization is to bring the empirical data of the microchips to the same
distribution. This step minimizes the technological differences between the
parameters of different genes and, as a result, makes it possible to compare the
expression values of the corresponding genes obtained under different
experimental conditions. The results of a comparative
analysis of various methods of PM correction and summarization of DNA
microchip data are presented in [
        <xref ref-type="bibr" rid="ref1">2,14,17,18</xref>
        ]. The PM correction stage is performed in
order to reduce the effect of nonspecific hybridization by correcting the light
intensities of the PM samples with regard to the light intensities of the corresponding MM
samples. The summarization stage calculates the gene expression
values from the light intensities of the samples for the investigated genes.
      </p>
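      <p>One common way to bring several microchips to the same distribution is quantile normalization, one of the methods compared in [2]. The Python sketch below only illustrates that idea; it is not the invariant set method selected later in this paper, and the function name and the toy intensities are hypothetical.</p>
      <preformat>
```python
def quantile_normalize(arrays):
    """Give every microchip the same empirical distribution.

    arrays -- list of equal-length lists of intensities, one list per microchip.
    Ties are resolved by position, which is a simplification of this sketch.
    """
    n_chips = len(arrays)
    n_probes = len(arrays[0])
    sorted_chips = [sorted(chip) for chip in arrays]
    # Reference distribution: mean of the r-th smallest values across chips.
    reference = [
        sum(chip[r] for chip in sorted_chips) / n_chips for r in range(n_probes)
    ]
    normalized = []
    for chip in arrays:
        # Indices of the probes ordered by increasing intensity.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy microchips with three probes each (hypothetical values).
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```
      </preformat>
      <p>After this transformation each microchip contains exactly the same set of values, and only the assignment of values to probes differs between microchips.</p>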
      <p>
        However, it should be noted that, in spite of the achievements in this subject area,
there is still no effective technique for choosing the optimal combination of methods of DNA
microchip data processing. This problem can be solved using
modern techniques of complex data processing, which are currently applied in different fields of
scientific research [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">19–23</xref>
        ].
      </p>
      <p>The aim of the paper is to improve the technique of DNA microarray data
processing through the complex use of Bioconductor tools and Shannon entropy for
the purpose of forming the gene expression array.</p>
    </sec>
    <sec id="sec-3">
      <title>4 Materials and methods</title>
      <p>The Shannon entropy criterion, calculated using the James-Stein shrinkage
estimator [6], was used as the main criterion to estimate the informativity of the gene
expression data during the simulation. This technique combines
two different models: a high-dimensional model with low bias and high
variance, and a lower-dimensional model with larger bias but smaller variance.
The probability that a value falls into the i-th cell is estimated in accordance with the
James-Stein shrinkage technique by the formula:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>p_i^{Shrink} = \lambda p_i + (1 - \lambda) p_i^{ML}</tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>p_i^{ML} = n_i / n</tex-math></inline-formula> is the probability
that a gene expression value falls into the i-th cell, calculated by the maximum likelihood method;
<inline-formula><tex-math>p_i = 1/k</tex-math></inline-formula> is the shrinkage
target, i.e. the probability of the i-th cell in the case of a uniform distribution of the gene
expression values over the k cells; <inline-formula><tex-math>n_i</tex-math></inline-formula> is the number of features in the i-th cell; and n is the
number of features in the investigated vector. Here
<inline-formula><tex-math>p_i^{ML}</tex-math></inline-formula> corresponds to the high-dimensional model with low bias and high variance, and
<inline-formula><tex-math>p_i</tex-math></inline-formula> corresponds to the model with higher bias and lower variance of the feature
distribution. The shrinkage intensity λ is calculated as follows:</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math>\lambda = \frac{1 - \sum_{i=1}^{k} \left(p_i^{ML}\right)^2}{(n - 1) \sum_{i=1}^{k} \left(p_i - p_i^{ML}\right)^2}</tex-math>
      </disp-formula>
      <p>The value of the
Shannon entropy is calculated by the standard formula, taking into account
the method of probability estimation:</p>
      <disp-formula>
        <label>(3)</label>
        <tex-math>H^{Shrink} = -\sum_{i=1}^{k} p_i^{Shrink} \log_2 p_i^{Shrink}</tex-math>
      </disp-formula>
      <p>A smaller value of criterion (3) corresponds to a higher level of informativity of the investigated
vector.</p>
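      <p>Formulas (1)–(3) can be sketched in Python as follows. The sketch assumes that the expression values of one microchip are first binned into k equal-width cells and that the shrinkage target is the uniform probability 1/k. The number of cells, the binning rule, the function names and the clipping of λ to the interval [0, 1] are assumptions of this example and are not specified in the text.</p>
      <preformat>
```python
import math


def bin_counts(values, k):
    """Count how many values fall into each of k equal-width cells."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # The maximum value is placed into the last cell.
        idx = k - 1 if width == 0 else min(k - 1, int((v - lo) / width))
        counts[idx] += 1
    return counts


def shrinkage_entropy(counts):
    """Shannon entropy (base 2) with James-Stein shrinkage probabilities.

    counts -- number of features in each cell; their sum n must exceed 1.
    """
    n = sum(counts)
    k = len(counts)
    p_ml = [c / n for c in counts]       # maximum likelihood estimates
    target = 1.0 / k                     # uniform shrinkage target
    denom = (n - 1) * sum((target - p) ** 2 for p in p_ml)
    if denom == 0:
        lam = 1.0                        # data already uniform
    else:
        lam = (1.0 - sum(p * p for p in p_ml)) / denom      # formula (2)
    lam = min(1.0, max(0.0, lam))        # assumption: keep lambda in [0, 1]
    p_shrink = [lam * target + (1.0 - lam) * p for p in p_ml]   # formula (1)
    return -sum(p * math.log2(p) for p in p_shrink if p > 0)   # formula (3)


# Toy counts for four cells (hypothetical values).
print(shrinkage_entropy([40, 20, 10, 10]))
```
      </preformat>
      <p>For a single microchip the criterion would then be obtained as shrinkage_entropy(bin_counts(values, k)) for a chosen number of cells k.</p>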
      <p>A structural block chart of the algorithm used to determine the optimal
combination of DNA microarray data processing methods is shown in fig. 3.
The algorithm involves the following steps:</p>
      <p>1. Load the DNA microarray data.</p>
      <p>2. Select the current stage of data processing (background correction, normalization,
PM correction or summarization). Fix the methods of the other stages
at random.</p>
      <p>3. Choose the first method for the current stage.</p>
      <p>4. Process the DNA microarray data by the selected methods.</p>
      <p>5. Calculate the Shannon entropy by formulas (1)–(3) for each
microchip, then calculate the average value of the Shannon entropy over all DNA
microarrays.</p>
      <p>6. If the number of the current method is less than the number of methods
available at this stage, choose the next method and go to step 4.
Otherwise, fix the method that corresponds to the minimum value of the
Shannon entropy.</p>
      <p>7. If the number of the current stage is less than the number of stages, go
to the next stage and return to step 3. Otherwise, process the DNA microarray
data using the determined combination of methods.</p>
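      <p>The stage-by-stage search described in the steps above can be sketched in Python as follows. The callback evaluate is hypothetical: in the setting of this paper it would run the pre-processing with the given combination and return the average Shannon entropy over all microchips, whereas here it is replaced by a toy function with arbitrary penalties that are not measured entropies. The method names m1, m2 and m3 are placeholders.</p>
      <preformat>
```python
import random


def select_combination(stages, evaluate, seed=0):
    """Stage-by-stage search for the combination with the minimum criterion.

    stages   -- dict: stage name to list of method names (dict order = stage order)
    evaluate -- hypothetical callback: dict(stage to method) to average entropy
    """
    rng = random.Random(seed)
    # Step 2: the methods of all stages are first fixed at random.
    current = {stage: rng.choice(methods) for stage, methods in stages.items()}
    for stage, methods in stages.items():
        scores = {}
        for method in methods:                 # steps 3 and 6: every method in turn
            trial = dict(current)
            trial[stage] = method
            scores[method] = evaluate(trial)   # steps 4 and 5
        # Step 6: fix the method with the minimum criterion value.
        current[stage] = min(scores, key=scores.get)
    return current                             # step 7: final combination


# Toy stand-in for the real evaluation: arbitrary penalties, not real results.
TOY_PENALTY = {"m1": 0.3, "m2": 0.1, "m3": 0.2}


def toy_evaluate(combination):
    return sum(TOY_PENALTY[m] for m in combination.values())


stages = {
    "background correction": ["m1", "m2", "m3"],
    "normalization": ["m1", "m2"],
    "PM correction": ["m2", "m3"],
    "summarization": ["m1", "m3"],
}
print(select_combination(stages, toy_evaluate))
```
      </preformat>
      <p>The number of evaluations in such a search grows with the sum of the numbers of methods per stage rather than with their product, so the search is not guaranteed to find the global minimum when the stages interact.</p>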
    </sec>
    <sec id="sec-4">
      <title>5 Experiments</title>
      <p>
        The simulation of DNA microchip data pre-processing was performed in R
[
        <xref ref-type="bibr" rid="ref7">24</xref>
        ] using functions of the Bioconductor package [
        <xref ref-type="bibr" rid="ref8">25</xref>
        ]. The gene expression profiles of lung cancer
patients E-GEOD-68571 [
        <xref ref-type="bibr" rid="ref9">26</xref>
        ] from the ArrayExpress database
[
        <xref ref-type="bibr" rid="ref10">27</xref>
        ] were used as the experimental data. The data
include 96 DNA microchips of patients examined for lung cancer.
Each DNA microchip covers 7129 genes. Ten patients were identified as
healthy, and the 86 sick patients were divided into three groups according to their state of health.
      </p>
    </sec>
    <sec id="sec-5">
      <title>6 Results and discussions</title>
      <p>The distribution of light intensity values on the selected DNA microarrays is
presented in fig. 5. Fig. 6 shows the MA charts for all pairs of five selected DNA
microchips. An MA chart plots the difference of the logarithms of the PM (Perfect
Match) sample values (M) against the average of the logarithms of the PM sample values
(A). The parameters M and A for the i-th gene and samples k and n are calculated as
follows:</p>
      <disp-formula>
        <label>(4)</label>
        <tex-math>M = \log_2\left(\frac{x_{ki}}{x_{ni}}\right), \quad A = \frac{1}{2}\log_2\left(x_{ki} \cdot x_{ni}\right)</tex-math>
      </disp-formula>
      <p>The chart is created from the PM values for all possible pairs of the investigated samples.
When the data processing is of the highest quality, the data should be distributed
in a rather narrow range, and the points of the MA diagram should lie close to the
line M = 0, with the smallest average deviation from it.</p>
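      <p>As a minimal illustration of how the M and A values of one MA chart are obtained for a pair of microchips, consider the following Python sketch. The function name and the toy PM intensities are hypothetical; all intensities are assumed to be positive.</p>
      <preformat>
```python
import math


def ma_values(x_k, x_n):
    """Return the lists M and A for two microchips.

    x_k, x_n -- positive PM intensities of the same genes on samples k and n.
    M is the log2 ratio, A is the average of the two log2 intensities.
    """
    m = [math.log2(a / b) for a, b in zip(x_k, x_n)]
    a_vals = [0.5 * math.log2(a * b) for a, b in zip(x_k, x_n)]
    return m, a_vals


# Toy intensities of two genes on two microchips (hypothetical values).
m, a_vals = ma_values([8.0, 4.0], [2.0, 4.0])
print(m, a_vals)
```
      </preformat>
      <p>A gene with equal intensities on both microchips gives M = 0, which is why well pre-processed data are expected to concentrate around that line.</p>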
      <p>
        The analysis of the obtained diagrams confirms that the initial data
require pre-processing. The data distributions of the
various microchips differ significantly (Fig. 5a). The kernel density plots
shown in fig. 5b are likewise scattered irregularly along the axis of the logarithm of
light intensity. Finally, the corresponding points on the MA diagrams (Fig. 6) also have
different distributions. These facts do not allow us to compare the investigated
gene expression profiles objectively. Fig. 7 and Fig. 8 present the results of
background correction of the DNA microarrays by the methods
“rma”, “mas” and “DFCM”. The “Ideal Mismatch” method was not used because of the lower
quality of its operation [
        <xref ref-type="bibr" rid="ref11">28</xref>
        ]. The analysis of the obtained charts allows us to conclude
that background correction improves the image quality. The processed data are
distributed more uniformly than the unprocessed data. However, it should be
noted that visual analysis of the diagrams does not allow us to compare the quality of
the methods objectively in order to choose the best one. Fig. 9 presents the
results of determining the optimal combination of
methods for processing the DNA microarray data based on the minimum value of the
Shannon entropy in accordance with the technique described above.
      </p>
      <p>The analysis of the obtained charts allows us to conclude that the optimal methods
in terms of the minimum value of the Shannon entropy criterion are the following:
the “rma” background correction method, the “invariant set” normalization method, and the “mas”
methods of PM correction and summarization. This combination of methods was
used to process the investigated DNA microarrays. Fig. 10 presents the boxplots of the
gene expression vectors of the investigated samples for both the non-processed (fig.
10a) and processed (fig. 10b) data.</p>
      <p>As can be seen from fig. 10b, the gene expression values of all samples are distributed in
the same range. The change of this range can be explained as follows. The
expression values of most genes are low, but some genes have
significantly higher expression values, which suggests that these genes determine some
important processes in the investigated objects. The expression values of these genes
set the variation range of the expression of the other genes. The analysis of the
boxplots also shows that the expression values of most genes
lie in a very narrow range for the various objects. This may mean that these
genes are responsible for functions inherent to all investigated objects.
However, each of the investigated samples contains genes whose expression
goes beyond the inter-quartile range. These genes are very important for further
research, since they allow the investigated objects to be distinguished by their
particularities.</p>
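      <p>Genes whose expression lies far outside the inter-quartile range can be flagged programmatically. The Python sketch below uses the common boxplot convention of whiskers at 1.5 inter-quartile ranges beyond the quartiles; this convention, the function name and the toy expression values are assumptions of the example rather than details given in the paper.</p>
      <preformat>
```python
import statistics


def flag_outlier_genes(expression, whisker=1.5):
    """Return the indices of values outside the boxplot whiskers.

    expression -- expression values of all genes in one sample.
    whisker    -- assumed whisker length in units of the inter-quartile range.
    """
    q1, _, q3 = statistics.quantiles(expression, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [i for i, v in enumerate(expression) if v > upper or lower > v]


# Toy expression vector of one sample (hypothetical values).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(flag_outlier_genes(sample))
```
      </preformat>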
    </sec>
    <sec id="sec-6">
      <title>7 Conclusions</title>
      <p>In this paper we have proposed a technique for forming the array of gene expression
values obtained from DNA microarray experiments. The initial data are
presented as a set of DNA microchips, each of which contains a matrix of light
intensities whose values are proportional to the expression values of the
corresponding genes. Four stages were performed during the simulation:
background correction, normalization, PM correction and summarization. Each
stage can be carried out by different methods. The Shannon entropy criterion,
calculated using the James-Stein shrinkage estimator, was used as the main
criterion to estimate the informativity of the gene expression data.</p>
      <p>The simulation was performed in R using
functions of the Bioconductor package. The gene expression profiles of lung cancer patients
E-GEOD-68571 from the ArrayExpress database were used as the experimental data.
The results of the simulation have shown that the
optimal combination of methods in terms of the minimum value of the Shannon
entropy is the following: the “rma” background correction method, the “invariant set”
normalization method, and the “mas” methods of PM correction and summarization. This
combination of methods was used to process the investigated DNA
microchips.</p>
      <p>Boxplots of both the non-processed and processed data were created as
the simulation results. The analysis of the obtained results has shown that the
expression values of most genes lie in a very narrow
range for the various objects. This may mean that these genes are responsible for functions inherent to
all investigated objects. However, each of the investigated samples contains genes
whose expression goes beyond the inter-quartile range. These genes
may be very important for further research, since they allow
the investigated objects to be distinguished by their particularities.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>1. Zak, D.E., Vadigepalli, R., Gonye, G.E., Doyle, F.J., Schwaber, J.S., Ogunnaike, B.A.:
Unconventional systems analysis problems in molecular biology: A case study in gene
regulatory network modeling. Computers and Chemical Engineering, 29 (3), pp. 547-563
(2005) doi: 10.1016/j.compchemeng.2004.08.016</p>
      <p>2. Bolstad, B.M., Irizarry, R.A., Åstrand, M., Speed, T.P.: A comparison of normalization
methods for high density oligonucleotide array data based on variance and bias.
Bioinformatics, 19 (2), pp. 185-193 (2003) doi: 10.1093/bioinformatics/19.2.185</p>
      <p>3. Affymetrix. Statistical Algorithms Description Document. Affymetrix, Inc., Santa Clara,
CA, pp. 1-27 (2002)</p>
      <p>4. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U.,
Speed, T.P.: Exploration, normalization, and summaries of high density oligonucleotide
array probe level data. Selected Works of Terry Speed, pp. 601-616 (2012) doi:
10.1007/978-1-4614-1347-9_15</p>
      <p>5. Chen, Z., McGee, M., Liu, Q., Kong, M., Deng, Y., Scheuermann, R.H.: A distribution-free
convolution model for background correction of oligonucleotide microarray data. BMC
Genomics, 10 (SUPPL. 1), art. no. S19 (2009) doi: 10.1186/1471-2164-10-S1-S19</p>
      <p>6. Hausser, J., Strimmer, K.: Entropy inference and the James-Stein estimator, with application
to nonlinear gene association networks. Journal of Machine Learning Research, 10, pp.
1469-1484 (2009)</p>
      <p>7. Kohane, I.S., Kho, A.T., Butte, A.J.: Microarrays for an integrative genomics. Cambridge,
Massachusetts, England: A Bradford book, the MIT press, 236 p. (2003)</p>
      <p>8. Ivakhno, S.S., Kornelyuk, A.I.: Microarrays: Technologies overview and data analysis.
Ukrain'skyi Biokhimichnyi Zhurnal, 76 (2), pp. 5-19 (2004)</p>
      <p>9. Babichev, S.A., Kornelyuk, A.I., Lytvynenko, V.I., Osypenko, V.V.: Computational
analysis of microarray gene expression profiles of lung cancer. Biopolymers and Cell, 32
(1), pp. 70-79 (2016) doi: 10.7124/bc.00090F</p>
      <p>10. Wang, Z.: Neuro-Fuzzy Modeling for Microarray Cancer Gene Expression Data. Oxford
University Computing Laboratory, 107 p. (2005)</p>
      <p>11. Loren van Themaat, E.V.: On the Use of Learning Bayesian Networks to Analyze Gene
Expression Data: Classification and Gene Network Reconstruction. University of
Amsterdam, 73 p. (2005)</p>
      <p>12. Gentleman, R., Carey, V., Huber, W., Irizarry, R., Dudoit, S.: Bioinformatics and
Computational Biology Solutions Using R and Bioconductor. Springer, 473 p. (2005)</p>
      <p>13. Park, T., Yi, S.-G., Kang, S.-H., Lee, S.Y., Lee, Y.-S., Simon, R.: Evaluation of
normalization methods for microarray data. BMC Bioinformatics, 4, art. no. 33, 13 p.
(2003) doi: 10.1186/1471-2105-4-33</p>
      <p>14. Raddatz, B.B., Spitzbarth, I., Matheis, K.A., Kalkuhl, A., Deschl, U., Baumgärtner, W.,
Ulrich, R.: Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A
Review. Veterinary Pathology, 54 (5), pp. 734-755 (2017) doi: 10.1177/0300985817709887</p>
      <p>15. Åstrand, M.: Contrast normalization of oligonucleotide arrays. Journal of Computational
Biology, 10 (1), pp. 95-102 (2003) doi: 10.1089/106652703763255697</p>
      <p>16. Chen, Y.-J., Kodell, R., Sistare, F., Thompson, K.L., Morris, S., Chen, J.J.: Normalization
methods for analysis of microarray gene-expression data. Journal of Biopharmaceutical
Statistics, 13 (1), pp. 57-74 (2003) doi: 10.1081/BIP-120017726</p>
      <p>17. Barbará, D., Wu, X.: An Approximate Median Polish Algorithm for Large
Multidimensional Data Sets. Springer-Verlag London Ltd. Knowledge and Information
Systems, vol. 5, pp. 416-438 (2003)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          18.
          <string-name>
            <surname>Lazaridis</surname>
            ,
            <given-names>E.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinibaldi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bloom</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jove</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>A simple method to improve probe set estimates from oligonucleotide arrays</article-title>
          .
          <source>Mathematical Biosciences</source>
          ,
          <volume>176</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>53</fpage>
          -
          <lpage>58</lpage>
          (
          <year>2002</year>
          ) doi: 10.1016/S0025-5564(01)00100-6
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          19.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvynenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osypenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Implementation of the objective clustering inductive technology based on DBSCAN clustering algorithm</article-title>
          .
          <source>Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          , CSIT 2017, 1, art. no. 8098832, pp.
          <fpage>479</fpage>
          -
          <lpage>484</lpage>
          (
          <year>2017</year>
          ) doi: 10.1109/STC-CSIT.2017.8098832
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          20.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korobchynskyi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lahodynskyi</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korchomnyi</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basanets</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borynskyi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Development of a technique for the reconstruction and validation of gene network models based on gene expression profiles</article-title>
          .
          <source>Eastern-European Journal of Enterprise Technologies</source>
          ,
          <volume>1</volume>
          (
          <issue>4-91</issue>
          ), pp.
          <fpage>19</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2018</year>
          ) doi: 10.15587/1729-4061.2018.123634
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          21.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krejci</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bicanek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvynenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Gene expression sequences clustering based on the internal and external clustering quality criteria</article-title>
          .
          <source>Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          , CSIT 2017, 1, art. no. 8098744, pp.
          <fpage>91</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2017</year>
          ) doi: 10.1109/STC-CSIT.2017.8098744
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          22.
          <string-name>
            <surname>Tkachenko</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doroshenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Izonin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsymbal</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havrysh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Imbalance data classification via neural-like structures of geometric transformations model: Local and global approaches</article-title>
          .
          <source>Advances in Intelligent Systems and Computing</source>
          ,
          <volume>754</volume>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>122</lpage>
          (
          <year>2019</year>
          ) doi: 10.1007/978-3-319-91008-6_12
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          23.
          <string-name>
            <surname>Peleshko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Izonin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borzov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Design and implementation of visitors queue density analysis and registration method for retail videosurveillance purposes</article-title>
          .
          <source>Proceedings of the 2016 IEEE 1st International Conference on Data Stream Mining and Processing</source>
          , DSMP 2016, art. no. 7583531, pp.
          <fpage>159</fpage>
          -
          <lpage>162</lpage>
          (
          <year>2016</year>
          ) doi: 10.1109/DSMP.2016.7583531
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          24.
          <string-name>
            <surname>Ihaka</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gentleman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>R: A Language for Data Analysis and Graphics</article-title>
          .
          <source>Journal of Computational and Graphical Statistics</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>299</fpage>
          -
          <lpage>314</lpage>
          (
          <year>1996</year>
          ) doi: 10.1080/10618600.1996.10474713
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          25.
          Electronic resource: https://www.bioconductor.org/about/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          26.
          <string-name>
            <surname>Beer</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kardia</surname>
            ,
            <given-names>S.L.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.-C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giordano</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levin</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Misek</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gharib</surname>
            ,
            <given-names>T.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lizyness</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayasaka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iannettoni</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orringer</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanash</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Gene-expression profiles predict survival of patients with lung adenocarcinoma</article-title>
          .
          <source>Nature Medicine</source>
          ,
          <volume>8</volume>
          (
          <issue>8</issue>
          ), pp.
          <fpage>816</fpage>
          -
          <lpage>824</lpage>
          (
          <year>2002</year>
          ) doi: 10.1038/nm733
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          27.
          Electronic resource: https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-6857/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          28.
          <string-name>
            <surname>Baldi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hatfield</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          :
          <article-title>DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling</article-title>
          . Cambridge University Press, pp.
          <fpage>22</fpage>
          -
          <lpage>23</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>