<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Use of Inductive Methods to Identify Subtypes of Glioblastomas in Gene Clustering</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Astana Medical University</institution>
          ,
          <addr-line>Astana</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Molecular Biology and Genetics</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kherson National Technical University</institution>
          ,
          <addr-line>Kherson</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The article presents an inductive clustering model of RNA-seq data for solving the problem of identifying glioblastoma subtypes by inductive methods based on the k-means and c-means algorithms. Comparative studies of inductive and classical iterative clustering algorithms are carried out using clustering evaluation criteria and data visualization. The basic principles of creating an inductive model of objective clustering are formulated, the ways and prospects of a possible implementation of the model are shown, and the advantages of the objective clustering model over traditional data clustering methods are determined.</p>
      </abstract>
      <kwd-group>
        <kwd>Inductive Modeling</kwd>
        <kwd>Glioblastoma Multiforme</kwd>
        <kwd>Clustering of Biological Objects</kwd>
        <kwd>Group Method of Data Handling</kwd>
        <kwd>K-Means Algorithm</kwd>
        <kwd>External Balance Criterion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Glioblastoma multiforme (GBM) is one of the most common and most aggressive types of brain cancer [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the leading cause of death among adult brain tumors, accounting for 52% of all brain tumors. According to the World Health Organization classification of central nervous system tumors, the standard term for this brain tumor is “glioblastoma,” and it has two forms: giant cell glioblastoma and gliosarcoma. Glioblastoma has a very poor prognosis despite the existence of many therapeutic methods, including surgical resection of as much of the tumor volume as possible followed by concomitant or subsequent chemoradiotherapy. Despite advances in genomics and the classification of glioma subtypes [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ], glioblastoma has a worse prognosis than any other cancer of the central nervous system, with an average survival of 14 months.
      </p>
      <p>
        While genomic data continue to grow rapidly, their clinical use and translation into treatment lag behind. The big data currently stored in The Cancer Genome Atlas (TCGA) network provide a window for creating new clinical hypotheses [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        One of the main questions is how genomics can be used to obtain clinically relevant information that improves therapy for patients. Two techniques are currently used to produce gene expression arrays: DNA microarrays [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
 and RNA sequencing [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>The RNA sequencing method (RNA-seq) directly yields read counts of the studied genes for the studied samples. For this reason it is more accurate than the DNA microarray method. The count for a gene determines the level of activity of that gene, i.e. its expression. At the next stage, the problem arises of identifying the boundary value that allows the genes to be divided into low-expressed and highly expressed. The data also need to be normalized, which involves converting the count values to the same suitable range.</p>
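      <p>As an illustration of the two preprocessing steps just described, the sketch below splits genes by expression level and rescales counts to a common range. The boundary value and the simple min-max rescaling are our assumptions for illustration, not a procedure taken from the article.</p>
```python
# Illustrative sketch (not the authors' exact procedure): splitting genes
# into low- and highly expressed by a boundary value, then rescaling
# read counts into a common [0, 1] range.

def split_by_expression(counts, boundary):
    """Partition gene names into low/high expressed by total read count."""
    low = [g for g, c in counts.items() if boundary > sum(c)]
    high = [g for g, c in counts.items() if sum(c) >= boundary]
    return low, high

def min_max_normalize(values):
    """Rescale a list of counts to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant gene: map to zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# hypothetical read counts for three genes over three samples
counts = {"TP53": [120, 80, 95], "EGFR": [5, 2, 7], "IDH1": [300, 250, 280]}
low, high = split_by_expression(counts, boundary=100)
```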
      <p>
        Numerous studies have shown that RNA sequencing technology is more efficient than DNA microarray technology in terms of the quality of the data obtained [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
 Identifying cancer subtypes is an important component of a personalized medicine system. It is critical when choosing the right treatment for patients, as different subtypes of cancer can respond well to different treatments. A number of computational methods have been developed to detect cancer subtypes; however, existing methods rarely use information from gene regulation networks to facilitate subtype identification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. One of the computational approaches that allows this problem to be solved is clustering. Clustering methods are divided into hierarchical and iterative [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      </p>
      <p>Hierarchical algorithms are associated with the construction of dendrograms. In agglomerative algorithms, before clustering starts, all objects are considered separate clusters, which are successively merged as the algorithm proceeds.</p>
      <p>
        However, the hierarchical cluster analysis procedure works well for a small number of objects and is not suitable for large data because of the complexity of the agglomerative algorithm and the unwieldy size of the resulting dendrograms. In iterative algorithms, the data are immediately divided into several clusters, the number of which is estimated from the conditions of the problem. The elements are then moved between clusters so that a certain criterion is optimized, for example, so that variability within the clusters is minimized [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      </p>
      <p>However, iterative clustering algorithms, in particular, k-means, have several
disadvantages:</p>
      <p>• The algorithm is not guaranteed to reach the global minimum of the total squared deviation, only one of its local minima.</p>
      <p>• The result depends on the choice of the initial centers of the clusters; their
optimal choice is unknown.</p>
      <p>• The number of clusters must be known in advance.</p>
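      <p>The first two disadvantages listed above can be demonstrated with a minimal sketch: two different initial centers converge to two different local minima of the total squared deviation. The toy data and Lloyd-style loop below are illustrative assumptions, not the article's implementation.</p>
```python
# A minimal k-means sketch illustrating the initialization problem:
# two different starting centers converge to two different local minima.

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations; returns (final centers, total squared error)."""
    for _ in range(iters):
        # assign each point to the nearest current center
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # recompute each center as the mean of its group
        new = [tuple(sum(x) / len(g) for x in zip(*g)) for g in groups]
        if new == centers:
            break
        centers = new
    sse = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points)
    return centers, sse

pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
_, sse_good = kmeans(pts, [(0, 1), (10, 1)])   # splits left/right: SSE 4
_, sse_bad = kmeans(pts, [(5, -1), (5, 3)])    # splits bottom/top: SSE 100
```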
      <p>
        High subjectivity is one of the key shortcomings of existing iterative algorithms. Increasing the objectivity of clustering is possible through the use of inductive methods for modeling complex systems based on the inductive approach to data processing
 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], in which the data are processed as two equally powerful subsets, and the final decision on how the objects are separated into clusters is made on the basis of the integrated use of external balance criteria and internal criteria for assessing the quality of clustering. Thus, the development of models and methods for clustering objects based on inductive modeling to solve the problem of identifying cancer subtypes is an urgent task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
      <p>A block diagram of the identification of experimental data obtained by RNA sequencing of glioblastoma samples is presented in Figure 1.</p>
      <p>For all four approaches, all of the indicated clustering procedures are applied both to the initial data and to the data matrix after the Feature Selection procedure using PCA. The results are evaluated using the Dunn index, the Calinski-Harabasz index, and entropy, with graphical visualization using the Silhouette method.</p>
      <p>The aim of the work is to develop inductive models of objective clustering of glioblastoma multiforme subtypes based on the k-means and c-means algorithms and to assess the quality of the solutions obtained.</p>
    </sec>
    <sec id="sec-3">
      <title>Review of the Literature</title>
      <p>
        The basic concepts for creating an inductive method for clustering objects are described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Further development of this theory is reflected in [14]. The concept of objective cluster analysis is presented in the following sections and was further developed in [15]. The authors determine the basic principles of creating an objective inductive clustering model, show the ways and prospects of its implementation, and determine the advantages of an inductive clustering model over traditional methods of data clustering.
      </p>
      <p>Theoretical developments on the implementation of clustering methods for systems of inductive modeling of complex processes are presented in [16]. In [13], the authors presented an inductive model of objective clustering based on the k-means algorithm. An algorithm was proposed and practically implemented for dividing the source data into two equal-sized subsets. That paper presents studies assessing the stability of the model to the noise component using the "Seeds" data. However, it should be noted that, despite the successful results achieved in this area, an objective clustering model based on the analysis of cluster systems currently has no practical implementation for solving problems in bioinformatics.</p>
    </sec>
    <sec id="sec-4">
      <title>Materials and Methods</title>
      <sec id="sec-4-1">
        <title>Data</title>
        <p>
          A glioblastoma (GBM) gene expression dataset was downloaded from TCGA. This is a small dataset with 1500 genes and 100 cancer samples extracted from gene expression data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Normalization</title>
        <p>The data were normalized feature-wise in accordance with formula (1), where xij is the value of attribute i in column j, x′ij is the normalized value of this attribute, and medj is the median of column j. This normalization method was chosen because, as a result, the attributes in all columns have the same median, the range of attribute variation is at most from -1 to 1, and the proportion of each column's data falling within the interquartile distance (50%) is the largest compared with other normalization methods.</p>
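      <p>Since formula (1) itself is not reproduced in the text, the sketch below shows one plausible reading that matches the stated properties: every column obtains the same median (zero) and its values fall within [-1, 1]. The exact scaling is our assumption.</p>
```python
# A plausible reconstruction (our assumption) of median normalization:
# center each column on its median, then scale into [-1, 1].

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def normalize_column(column):
    """Center a column on its median and scale into [-1, 1]."""
    med = median(column)
    centered = [x - med for x in column]
    scale = max(abs(c) for c in centered) or 1  # guard against zero division
    return [c / scale for c in centered]

col = [2, 4, 6, 10]
norm = normalize_column(col)
```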
      </sec>
      <sec id="sec-4-3">
        <title>Splitting Into Equidistant Sets</title>
        <p>The algorithm for dividing the original set of objects Ω into two equally powerful disjoint subsets ΩA and ΩB consists of the following steps [16]:</p>
        <p>1. Calculation of the pairwise distances between the objects of the original data sample.</p>
        <p>2. Selection of the pair of objects the distance between which is minimal.</p>
        <p>3. Distribution of one object of the pair into the subset ΩA and of the other into the subset ΩB; steps 2-3 are repeated for the remaining objects. If the number of objects is odd, the last object is distributed into both subsets.</p>
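      <p>The splitting steps above can be sketched directly. The Euclidean distance and the tuple representation of objects are assumptions made for illustration.</p>
```python
# A sketch of the splitting procedure from [16]: the closest remaining
# pair is found and its two objects are sent to opposite subsets.

def split_equidistant(objects):
    """Split objects into two equally powerful subsets A and B."""
    remaining = list(objects)
    a, b = [], []
    while len(remaining) > 1:
        # steps 1-2: find the pair with the minimal pairwise distance
        best = None
        for i in range(len(remaining)):
            for j in range(i + 1, len(remaining)):
                d = sum((p - q) ** 2
                        for p, q in zip(remaining[i], remaining[j]))
                if best is None or best[0] > d:
                    best = (d, i, j)
        _, i, j = best
        # step 3: one object of the pair goes to A, the other to B
        a.append(remaining[j])
        b.append(remaining[i])
        remaining.pop(j)   # pop the larger index first so i stays valid
        remaining.pop(i)
    if remaining:          # odd count: the last object goes to both subsets
        a.append(remaining[0])
        b.append(remaining[0])
    return a, b

objs = [(0,), (1,), (10,), (11,), (20,)]
A, B = split_equidistant(objs)
```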
      </sec>
      <sec id="sec-4-4">
        <title>Inductive k-means Algorithm</title>
        <p>The k-means algorithm is one of the machine learning algorithms that solves the clustering problem. It is a non-hierarchical, iterative clustering method that has gained great popularity due to its simplicity, clarity of implementation, and rather high quality of results. It was invented in the 1950s by the mathematician Hugo Steinhaus [17] and almost simultaneously by Stuart Lloyd [18], and became particularly popular after the publication of MacQueen's work [19] in 1967.</p>
        <p>The algorithm is a version of the EM algorithm, which is also used to separate a mixture of Gaussians. The main idea of k-means is that the data are randomly divided into clusters, after which the center of mass of each cluster obtained at the previous step is iteratively recalculated, and the vectors are then reassigned to clusters according to which of the new centers is closer in the selected metric.</p>
        <p>The purpose of the algorithm is to divide n observations into k clusters so that
each observation belongs to exactly one cluster located at the smallest distance from
the observation.</p>
        <p>Step 1. Start.</p>
        <p>Step 2. Formation of the initial set Ω of studied objects. The data are presented as a matrix X = {x_ij}, i = 1, …, n, j = 1, …, m, where n is the number of rows (the number of objects under investigation) and m is the number of columns (the number of features characterizing the objects).</p>
        <p>Step 3. Data preprocessing (normalization):</p>
        <p>• median normalization (Feature Median), obtained by centering on the median of the data attributes: z_ij = (x_ij − med_j) / mad_j, where x_ij (z_ij) is the i-th observation of the j-th variable (its normalized value), med_j is the median of the j-th variable, and mad_j is the mean absolute deviation of the j-th variable;</p>
        <p>• normalization using a standardized score (z-score), a measure of the relative spread of an observed or measured value that shows by how many standard deviations it deviates from the mean. It is a dimensionless statistic used to compare values of different dimensions or measurement scales: z_ij = (x_ij − X̄) / S, where X̄ is the mean and S is the standard deviation of the j-th variable. The best normalization method depends on the data to be normalized; typically, the z-score is very commonly used [20].</p>
        <p>Step 4. Dividing Ω into two equally powerful subsets in accordance with the above algorithm. The resulting subsets can be formally represented as follows: Ω_A = {x_ij^A}, Ω_B = {x_ij^B}, i = 1, …, n_A (n_B); n_A + n_B = n; j = 1, …, m.</p>
        <p>Step 5. Choosing the initial number of clusters k = k_min.</p>
        <p>Step 6. Configuring the k-means clustering algorithm.</p>
        <p>For each equidistant subset:</p>
        <p>Step 7. Sequential clustering and cluster fixing.</p>
        <p>Step 8. Calculation of the internal criteria of clustering quality:</p>
        <p>Silhouette: SWC = (1/K) · Σ_{j=1..K} S(x_j);</p>
        <p>Dunn index: DI(k), the minimum over pairs of clusters of the intercluster distance divided by the maximum cluster diameter;</p>
        <p>Calinski-Harabasz index: QCCH = [QCB · (N − K)] / [QCW · (K − 1)];</p>
        <p>Entropy: PE = −(1/Q) · Σ_{q=1..Q} Σ_{k=1..K} u_qk · ln(u_qk).</p>
        <p>Step 9. Calculation of the external balance criterion: ECB = (IC_A − IC_B)² / (IC_A + IC_B)² → opt.</p>
        <p>Step 10. If the value of the balance criterion reaches the optimum, then:</p>
        <p>Step 11. Fixing the resulting clustering; otherwise the number of clusters is increased by 1 and steps 5-9 are repeated.</p>
        <p>Step 12. Determining the optimal number of clusters k_opt.</p>
        <p>Step 13. Clustering the data (the complete set Ω of objects under study) and fixing the clusters.</p>
        <p>Step 14. Validation of the clustering results.</p>
        <p>Step 15. Visualization of the clustering results.</p>
        <p>Step 16. End.</p>
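      <p>Step 9 and the stopping rule of Steps 10-12 can be sketched as follows. The criterion values in ic_by_k are hypothetical, and the closed form of the balance criterion follows the (IC_A − IC_B)² / (IC_A + IC_B)² reading of the garbled formula in the text.</p>
```python
# A sketch of the external balance criterion: internal quality indices
# computed on the two equidistant subsets should agree, so their
# normalized squared difference is driven toward zero.

def balance_criterion(ic_a, ic_b):
    """ECB = (IC_A - IC_B)^2 / (IC_A + IC_B)^2."""
    return (ic_a - ic_b) ** 2 / (ic_a + ic_b) ** 2

def optimal_k(ic_by_k):
    """Pick the k whose internal criteria on subsets A and B agree best."""
    return min(ic_by_k, key=lambda k: balance_criterion(*ic_by_k[k]))

# hypothetical internal-criterion values (e.g. Silhouette) per k,
# computed on subset A and subset B respectively
ic_by_k = {2: (0.40, 0.60), 3: (0.55, 0.54), 4: (0.30, 0.45)}
k_opt = optimal_k(ic_by_k)
```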
      </sec>
      <sec id="sec-4-5">
        <title>Inductive Fuzzy C-Means Algorithm</title>
        <p>The fuzzy c-means clustering method (also called fuzzy clustering or soft k-means) allows an existing set of elements to be split into a given number of fuzzy sets. Fuzzy c-means can be considered an improved version of k-means in which, for each element of the considered set, the degree of its membership (or responsibility) in each of the clusters is calculated. The algorithm was developed by J.C. Dunn in 1973 [21] and improved by J.C. Bezdek in 1981 [22].</p>
        <p>Fig. 2. Pseudocode of the inductive k-means (a) and c-means (b) algorithms.</p>
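      <p>The membership degrees that distinguish fuzzy c-means from k-means can be sketched as follows. This is the standard Bezdek membership update with fuzzifier m = 2, shown as an assumption for illustration, not code from the article.</p>
```python
# A sketch of the fuzzy membership degrees: each element belongs to
# every cluster with a weight in [0, 1], and the weights sum to 1.

def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (Bezdek)."""
    d = [sum((a - b) ** 2 for a, b in zip(point, c)) ** 0.5 for c in centers]
    if 0.0 in d:                     # point coincides with a center
        return [1.0 if x == 0.0 else 0.0 for x in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(d)))
            for i in range(len(d))]

# a point twice as close to the first of two cluster centers
u = memberships((2.0,), [(0.0,), (10.0,)])
```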
      </sec>
      <sec id="sec-4-6">
        <title>Clustering Quality Assessment</title>
        <sec id="sec-4-6-1">
          <title>Clustering Quality Criteria</title>
          <p>The following criteria were used to assess the quality of clustering.</p>
          <p>1. Silhouette [23]: SWC = (1/K) · Σ_{j=1..K} S(x_j) → max, where K is the number of clusters and S(x_j) is the silhouette width of cluster j. The best partition is characterized by the maximum SWC, which is achieved when the distances inside the clusters are small and the distances between elements of neighboring clusters are large.</p>
          <p>2. Dunn index [24]: compares the intercluster separation with the cluster diameter. The higher the index value, the better the clustering.</p>
          <p>3. Calinski-Harabasz index [25]: QCCH = [QCB · (N − K)] / [QCW · (K − 1)] → max, where N is the number of objects, K is the number of clusters, QCB is the between-cluster dispersion, and QCW is the within-cluster dispersion. The maximum index value corresponds to the optimal cluster structure.</p>
          <p>4. Entropy [26]: PE = −(1/Q) · Σ_{q=1..Q} Σ_{k=1..K} u_qk · ln(u_qk) → min. Entropy is known as a numerical expression of the ordering of a system. The entropy of a partition reaches its minimum at the highest ordering of the system (for a crisp partition, the entropy is zero). That is, the greater the degree of membership of an element in one cluster (and the smaller its degree of membership in all other clusters), the lower the entropy and the better the clustering.</p>
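          <p>The entropy criterion can be sketched directly from its definition; the membership matrices below are toy examples.</p>
```python
# A sketch of the partition-entropy criterion: for a crisp partition each
# membership row is one-hot and PE is zero; the fuzzier the memberships,
# the larger the entropy.

import math

def partition_entropy(u):
    """PE = -(1/Q) * sum_q sum_k u_qk * ln(u_qk), with 0 * ln(0) = 0."""
    q = len(u)
    total = 0.0
    for row in u:
        for x in row:
            if x > 0.0:
                total += x * math.log(x)
    return -total / q

crisp = [[1.0, 0.0], [0.0, 1.0]]   # each object fully in one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # maximally uncertain memberships
```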
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments and Results</title>
      <p>For the experiment, we used data from the CancerSubtypes package, which is designed to assist in the identification and validation of cancer subtypes based on arrays of genomic cancer data. The package is implemented in the R language and is available as a Bioconductor package at http://bioconductor.org/packages/CancerSubtypes/. The glioblastoma (GBM) gene expression dataset was downloaded from TCGA. This is a small dataset with 1500 genes and 100 cancer samples extracted from gene expression data.</p>
      <p>The inductive clustering algorithm was applied to the complete dataset (1500x100) preprocessed by normalization (median, z-score), and to data whose preprocessing included normalization and dimensionality reduction based on principal component analysis (PCA). As a result, the number of genes was reduced to 44 components (44x100).</p>
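      <p>The PCA-based reduction can be sketched with an SVD. The variance threshold and the synthetic data below are our assumptions; the article does not state how the 44 components were selected.</p>
```python
# A sketch of PCA dimensionality reduction via SVD (our assumption),
# analogous to compressing many genes down to the few components that
# carry most of the variance.

import numpy as np

def pca_reduce(x, var_threshold=0.95):
    """Project rows of x onto the fewest components explaining var_threshold."""
    centered = x - x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    n = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    return centered @ vt[:n].T

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))            # 3 latent factors
data = base @ rng.normal(size=(3, 50))      # 100 samples x 50 "genes"
reduced = pca_reduce(data, var_threshold=0.99)
```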
      <p>For the experiment, two clustering algorithms were used: k-means and c-means. The results are presented in Table 1. The clustering results were visualized using the Silhouette method. This method allows one to interpret and verify the consistency of the data within clusters and provides a concise graphical representation. The Silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The Silhouette value ranges from -1 to +1. A high positive value indicates that the object agrees well with its own cluster and poorly with neighbouring clusters. If most objects have a high positive value, the clustering configuration is appropriate. If many points have a low or negative value, the clustering configuration may have too many or too few clusters. Figures 3 and 4 show a graphical representation of the clustering results of the inductive k-means and c-means algorithms.</p>
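      <p>The Silhouette value described above can be computed per object as follows. The formula s = (b − a) / max(a, b) is the standard Kaufman-Rousseeuw definition [23], and the toy points are illustrative.</p>
```python
# A sketch of the Silhouette value for a single object: cohesion a
# against separation b, giving a score in [-1, +1].

def silhouette(point, own_cluster, other_clusters):
    """s = (b - a) / max(a, b): a is the mean distance to the own
    cluster, b the mean distance to the nearest other cluster."""
    dist = lambda p, q: sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    others = [p for p in own_cluster if p != point]
    a = sum(dist(point, p) for p in others) / len(others)
    b = min(sum(dist(point, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

own = [(0.0,), (1.0,)]      # a tight cluster
far = [(10.0,), (11.0,)]    # a distant cluster
s = silhouette((0.0,), own, [far])
```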
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This paper presents the results of studies on the identification and validation of cancer subtypes based on arrays of genomic cancer data. As an experiment, we used the GeneExp dataset obtained from TCGA (The Cancer Genome Atlas) project data.</p>
      <p>The source data matrix contained 1500 genes and 100 cancer samples. At the first stage, we normalized the genes. At the second stage, the number of genes was reduced to 44 components using principal component analysis (PCA). We then ran the inductive clustering algorithm and compared various clustering methods using internal clustering quality criteria as the measure of the effectiveness of the corresponding clustering method.</p>
      <p>An analysis of the processed data allows us to conclude that the proposed method is highly effective, since its implementation can significantly reduce the set of components of cancer genomic data for subsequent processing.</p>
      <p>13. Babichev, S., Taif, M., Lytvynenko, V.: Estimation of the inductive model of objects clustering stability based on the k-means algorithm for different levels of data noise. Radio Electronics, Computer Science, Management, no. 4, pp. 54-60 (2016)</p>
      <p>14. Stepashko, V.S.: Elements of inductive modeling theory. In: State and Prospects of Informatics Development in Ukraine: Monograph. Scientific Thought, pp. 471-486 (2010)</p>
      <p>15. Osypenko, V.V.: The methodology of inductive system analysis as a tool of engineering researches analytical planning. Ann. Warsaw Univ. Life Sci. - SGGW, no. 58, pp. 67-71 (2011)</p>
      <p>16. Sarycheva, L.V.: Objective cluster analysis of the data on the basis of the Group Method of Data Handling. Problems of Management and Informatics, no. 2, pp. 86-104 (2008) [in Russian]</p>
      <p>17. Steinhaus, H.: Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. 4, 801-804 (1956)</p>
      <p>18. Lloyd, S.P.: Least squares quantization in PCM. Bell Telephone Laboratories Paper (1957). Published much later as: Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inform. Theory (1982)</p>
      <p>19. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, no. 14 (1967)</p>
      <p>20. Melnik, M.: Fundamentals of Applied Statistics. Energoatomizdat, Moscow, 416 p. (1983)</p>
      <p>21. Dunn, J.C.: A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics 3(3), 32-57, doi: 10.1080/01969727308546046 (1973)</p>
      <p>22. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. ISBN 0-306-40671-3 (1981)</p>
      <p>23. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley (2005)</p>
      <p>24. Bezdek, J.C., Dunn, J.C.: Optimal fuzzy partitions: a heuristic for estimating the parameters in a mixture of normal distributions. IEEE Transactions on Computers, 835-838 (1975)</p>
      <p>25. Calinski, R.B., Harabasz, J.: A dendrite method for cluster analysis. Communications in Statistics 3(1), 1-27 (1974)</p>
      <p>26. Sripada, S., Rao, G.: Comparison of purity and entropy of k-means clustering and fuzzy c-means clustering. Indian Journal of Computer Science and Engineering 2(3), ISSN 0976-5166 (2011)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bleeker</surname>
            ,
            <given-names>F. E .</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molenaar</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leenstra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Recent advances in the molecular understanding of glioblastoma</article-title>
          .
          <source>Journal of Neuro-Oncology</source>
          .
          <volume>108</volume>
          (
          <issue>1</issue>
          ):
          <fpage>11</fpage>
          -
          <lpage>27</lpage>
          .
          <source>PMC 3337398. PMID 22270850 doi: 10.1007 / s11060-011-0793-0</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Verhaak</surname>
            ,
            <given-names>R. G.</given-names>
          </string-name>
          et al.:
          <article-title>Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1</article-title>
          .
          <source>Cancer Cell</source>
          .
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>98</fpage>
          -
          <lpage>110</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Network</surname>
          </string-name>
          , T. C.: Corrigendum:
          <article-title>Comprehensive genomic characterization defines human glioblastoma genes and core pathways</article-title>
          .
          <source>Nature</source>
          .
          <volume>494</volume>
          (
          <issue>7438</issue>
          ),
          <volume>506</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Frattini</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          et al.:
          <article-title>The integrated landscape of driver genomic alterations in glioblastoma</article-title>
          .
          <source>Nat. Genet</source>
          .
          <volume>45</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1141</fpage>
          -
          <lpage>1149</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>5. The Cancer Genome Atlas homepage</article-title>
          .
          <source>NCI and the NHGRI. Retrieved</source>
          <year>2009</year>
          -
          <volume>04</volume>
          -28. http://cancergenome.nih.gov/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cha</surname>
            ,
            <given-names>Y.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>You</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>D.K.</given-names>
          </string-name>
          :
          <article-title>Microstructure arrays of DNA using topographic control</article-title>
          .
          <source>Nature Communications</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ), art.
          <source>no. 2512 doi: 10.1038/s41467- 019-10540-2</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lian</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>Z.-M.:</given-names>
          </string-name>
          <article-title>Unveiling novel targets of paclitaxel resistance by single molecule long-read RNA sequencing in breast cancer</article-title>
          .
          <source>Scientific Reports</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ), art.
          <source>no. 6032 doi: 10</source>
          .1038/s41598-019-42184-z (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snyder</surname>
            ,
            <given-names>M. :</given-names>
          </string-name>
          <article-title>RNA-Seq: a revolutionary tool for transcriptomics</article-title>
          .
          <source>Nature Reviews. Genetics</source>
          .
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>63</lpage>
          . doi:
          <volume>10</volume>
          .1038 / nrg2484.
          <source>PMC 2949280. PMID 19015660</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
          </string-name>
          , T. D.,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.:
          <article-title>Identifying cancer subtypes from mirna-tf-mrna regulatory networks and expression data [J]</article-title>
          .
          <source>PloS one</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ): e015279 (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Omran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engelbrecht</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An overview of clustering methods</article-title>
          .
          <source>Intell. Data Anal</source>
          .
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <fpage>583</fpage>
          -
          <lpage>605</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Celebi</surname>
            ,
            <given-names>M. E .</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kingravi</surname>
            ,
            <given-names>H. A .</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vela</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          :
          <article-title>A comparative study of efficient initialization methods for the k-means clustering algorithm</article-title>
          .
          <source>Expert Systems with Applications</source>
          .
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>200</fpage>
          -
          <lpage>210</lpage>
          . arXiv:
          <fpage>1209</fpage>
          .
          <year>1960</year>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Madala</surname>
            ,
            <given-names>H. R.</given-names>
          </string-name>
          :
          <article-title>Inductive Learning Algorithms for Complex Systems Modeling</article-title>
          . CRC Press,
          <volume>365</volume>
          p. (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>