<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Two Step Density-Based Object-Inductive Clustering Algorithm</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Jan Evangelista Purkyne University in Usti nad Labem</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kherson National Technical University</institution>
          ,
          <addr-line>Kherson</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National University of Water and Environmental Engineering</institution>
          ,
          <addr-line>Rivne</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The article presents the results of a study into the practical implementation of a two-step DBSCAN and OPTICS clustering algorithm within the objective clustering inductive technology. The architecture of the objective clustering technology, founded on the two-step DBSCAN and OPTICS clustering algorithm, was developed. The technology involves simultaneous clustering of the data by the DBSCAN algorithm on two equal-power subsets containing the same number of pairwise similar objects, with subsequent refinement of the obtained clusters by the OPTICS algorithm. The optimal parameters of the algorithm were found on the basis of the maximum value of the complex balance criterion of clustering quality, calculated as the geometric mean of the Harrington desirability indices for the internal and external clustering quality criteria.</p>
      </abstract>
      <kwd-group>
        <kwd>Clustering</kwd>
        <kwd>Density-based clustering</kwd>
        <kwd>Objective clustering</kwd>
        <kwd>Inductive clustering</kwd>
        <kwd>Clustering quality criteria</kwd>
        <kwd>Two Step Clustering</kwd>
        <kwd>DBSCAN</kwd>
        <kwd>OPTICS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Density-based algorithms are highly efficient and simple [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Different methods are best suited for different databases. In this paper, we consider the
DBSCAN and OPTICS clustering algorithms, which are used to find clusters of
various shapes, densities and sizes in spatial data sets with noise.
      </p>
      <p>
        The DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
clustering algorithm was proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It is based on the assumption that the
density of points inside a cluster is higher than the density outside it. The algorithm
can find nonlinearly separable clusters of arbitrary shape and can detect clusters that are
completely surrounded by, but not connected to, other clusters. It does not need the
number of clusters to be specified in advance, distinguishes noise and is resistant to
outliers.
      </p>
      <p>However, the DBSCAN algorithm is not without flaws. Boundary points that
can be reached from more than one cluster may be assigned to any of these clusters,
depending on the order in which the points are visited.</p>
      <p>
        The OPTICS (Ordering Points To Identify the Clustering Structure) clustering
algorithm, like DBSCAN, finds density-based clusters in the data space; it was
proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, unlike DBSCAN, this algorithm uses the distance between
neighboring objects to build a reachability field, which is used to separate clusters
of different densities from noise; this solves the problem of finding meaningful clusters
in data of varying density. To do this, the data are ordered so that spatially close
points become adjacent in the ordering. For each point, a special distance is
stored that represents the density that must be accepted for a cluster so that both points
belong to the same cluster. The result of this procedure is presented in the form of a
dendrogram.
      </p>
      <p>The idea underlying density-based algorithms is that inside each cluster there is a
typical density of points (objects) that is noticeably higher than the density outside the
cluster, while the density in noise areas is lower than the density of any cluster.</p>
      <p>
        On the other hand, inductive clustering methods [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] make it possible, for inaccurate noisy data
and short samples, to find, by the minimum of a chosen quadratic criterion, a
non-physical model (decision rule) whose structure is simpler than that of the full
physical model.
      </p>
      <p>Examining the set of candidate models by external criteria is necessary only for
non-physical models. When the noise dispersion is small, it is advisable to use
internal selection criteria; as the noise increases, it is advisable to move to
nonparametric algorithms. The use of inductive clustering methods is advisable because
they almost always ensure that the optimal number of clusters is found that is
adequate to the noise level in the data sample.</p>
      <p>The main idea of this work is to combine the density-based DBSCAN and
OPTICS algorithms, which can recognize clusters of various shapes and identify
meaningful clusters in data of different densities, into a two-step algorithm within an
inductive clustering method; this should significantly improve accuracy when
recognizing complex objects. It is assumed that by combining these methods,
some of the problems listed above can be solved with a sufficiently good
result.</p>
      <p>The aim of the work is to develop a methodological basis for constructing hybrid
inductive cluster-analysis algorithms for isolating (clustering) objects with complex
nonlinear forms with high recognition accuracy and resolution.</p>
    </sec>
    <sec id="sec-2">
      <title>Review of the Literature</title>
      <p>
        The classification of several clustering algorithms by their categories is presented in
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Each of them has its advantages and disadvantages. The choice of an appropriate
clustering algorithm is determined by the type of data being examined and the purpose
of the current task.
      </p>
      <p>Non-parametric algorithms capable of distinguishing clusters of arbitrary shape
also allow obtaining a hierarchical representation of data.</p>
      <p>
        The approach used by these algorithms for non-parametric density estimation is
that the density is characterized by the number of nearby elements [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus, the
proximity of a pair of elements is determined by the number of common neighboring
elements. The most prominent representative of this approach is the DBSCAN clustering
algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Its basic idea is that if an element has at least a specified number (MinPts)
of neighboring elements within the radius, then all its "neighbors" are placed in the same
cluster with it. Elements that do not have a sufficient number of "neighbors" and are not
included in any cluster are treated as "noise". DBSCAN can select clusters of complex
shape and cope with outliers and "noise" in the data. The disadvantages of the
algorithm are the complexity of setting the parameter values (ε and MinPts) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the
difficulty in identifying clusters with significantly different densities.
      </p>
      <p>
        The OPTICS algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a generalization of DBSCAN, where elements are
ordered into a spanning tree so that the spatially close elements are close together. In
this case, there is no need to carefully adjust the appropriate parameter, and the result
is a hierarchical result [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. One of the major drawbacks of the existing clustering
algorithms is the reproducibility error. The basic idea for solving this problem was
proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
        ], the authors showed that a decrease in the reproducibility error can be
achieved through the use of inductive modeling methods for complex systems,
which are a logical continuation of the group method of data handling. The issues of
creating a methodology for analyzing inductive systems as a tool for analytical
planning of engineering research are considered in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the authors first proposed
a hybrid inductive clustering algorithm based on DBSCAN.
      </p>
      <p>
        The work [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] presents the results of computational experiments using the objective
cluster inductive technology on multidimensional high-dimensional data. The authors
showed that implementing this technology on the basis of some clustering
algorithm involves, at the first stage, determining the affinity function between objects,
between clusters, and between objects and clusters. Then the investigated data must be
divided into two equal-power subsets containing the same number of pairs of similar
objects. The formation of the internal, external and complex balance criteria of
clustering quality is carried out at the next stage. Optimal clustering is
determined on the basis of the extreme values of the criteria used during the sequential
enumeration of admissible clusterings.
      </p>
      <p>
        The article [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] describes the results of a study of the practical implementation of the
DBSCAN clustering algorithm within the objective clustering inductive
technology. In that paper, the optimal parameters of the algorithm were
found using the maximum value of the complex clustering quality criterion,
which is calculated as the geometric mean of the Harrington desirability
indices for the external and internal clustering quality criteria.
      </p>
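      <p>The complex criterion mentioned here can be sketched as follows; a minimal illustration assuming Harrington's one-sided desirability function d = exp(−exp(−Y)) applied to already-scaled criterion values, with the combined criterion taken as the geometric mean (the function names and the scaling are ours):</p>

```python
from math import exp, sqrt

def harrington_desirability(y):
    """Harrington's desirability function: maps a scaled criterion value into (0, 1)."""
    return exp(-exp(-y))

def complex_balance_criterion(ic_scaled, ec_scaled):
    """Geometric mean of the desirability indices for the internal and external
    clustering quality criteria; larger values indicate better clustering."""
    return sqrt(harrington_desirability(ic_scaled) * harrington_desirability(ec_scaled))

print(round(complex_balance_criterion(2.0, 1.0), 3))  # ≈ 0.778
```

      <p>Maximizing this combined value over the algorithm's parameters is the selection rule described in the cited work.</p>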
      <p>
        The work [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] investigated the problem of clustering complex data with the inductive
objective clustering technology. A hybrid data clustering model based on the
integrated use of the R and KNIME software tools was implemented. The model's
performance was evaluated on various types of data. The simulation results showed
the high efficiency of the proposed technology. It was shown that the proposed method
reduces the reproducibility error, because the final decision on the optimal parameters
of the clustering algorithm is made on the basis of a parallel analysis of clustering
results obtained on equal-power data sets, taking into account the difference between
the clustering results obtained on these subsets.
      </p>
      <p>In this article, we describe a hybrid model of the objective cluster inductive
technology founded on the two-step DBSCAN and OPTICS clustering algorithm. The
practical implementation of the proposed model was performed in R.</p>
    </sec>
    <sec id="sec-3">
      <title>Problem Statement</title>
      <p>
        The clustering problem is formulated as follows: let X be the set of objects and Y
the set of cluster numbers (names, labels). A distance function ρ(x, x′) between
objects is also given. It is required to divide the sample into non-overlapping subsets
(clusters) so that each cluster consists of objects that are close in the metric ρ, while
the objects of different clusters differ significantly. In addition, each object xi ∈ X
corresponds to a cluster number yi. The clustering algorithm can then be considered as
a function a : X → Y that assigns a cluster number y ∈ Y to any object x ∈ X. In some
cases, the set Y is known in advance, but more often the task is to find the optimal
number of clusters according to one or another clustering quality criterion [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>In inductive clustering methods, the cluster model is selected using the minimum
of the external balance criterion, which characterizes the quality of clustering of the
corresponding model on two equal-power sets.</p>
      <p>Formally, the optimal inductive clustering model can be presented as:
M : R → K | e ≤ e0, τ ≤ τ0, CR → opt
where K is the result of clustering, e is the clustering error on the training and test
samples, τ is the time interval of the clustering process, and CR is the set of internal and
external criteria for assessing the clustering quality.</p>
    </sec>
    <sec id="sec-4">
      <title>Materials and Methods</title>
      <sec id="sec-4-1">
        <title>DBSCAN Clustering Algorithm</title>
        <p>The idea underlying the algorithm is that inside each cluster there is a typical
density of points (objects) that is noticeably higher than the density outside the cluster,
while the density in noise areas is lower than the density of any of the clusters. For
each point of a cluster, its neighborhood of a given radius must contain at least a
certain number of points; this number is specified by a threshold value.</p>
        <p>
          Most algorithms that produce a flat partition create clusters of a shape close to
spherical, since they minimize the distance of objects to the center of the cluster
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
          The DBSCAN authors have shown experimentally that their algorithm is capable of
recognizing clusters of different shapes. More precisely, for each
point of a cluster, its neighborhood of a given radius must contain at least a certain
number of points; this number is given by a threshold value [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>
          The basis of this algorithm is several definitions [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]:
the ε-neighborhood of an object is the set of objects within the radius ε of that object;
a core (root) object is an object whose ε-neighborhood contains at least a minimum number MinPts of objects;
an object p is directly density-reachable from an object q if p lies in the ε-neighborhood of q and q is a core object;
an object p is density-reachable from an object q for the given ε and MinPts if there is a sequence of objects p1, …, pn, where p1 = q and pn = p, such that pi+1 is directly density-reachable from pi, 1 ≤ i &lt; n;
an object p is density-connected to an object q for the given ε and MinPts if there is an object o such that both p and q are density-reachable from o.
To search for clusters, the DBSCAN algorithm checks the ε-neighborhood of each
object. If the ε-neighborhood of an object p contains more than MinPts points,
a new cluster with the core object p is created. DBSCAN then iteratively collects
the objects directly density-reachable from the core objects, which can lead to the union of
several density-reachable clusters. The process terminates when no new object can be
added to any cluster.
        </p>
        <p>Although the DBSCAN algorithm does not need the number of clusters to be
specified in advance, it requires the values of the parameters ε and MinPts, which
directly affect the clustering result. The optimal values of these parameters are
difficult to determine, especially for multidimensional data spaces. In addition, the
distribution of data in such spaces is often asymmetric, which makes it impossible to
use global density parameters for clustering.</p>
        <p>The DBSCAN algorithm works as follows.</p>
        <p>Input: the set of objects S, Eps and MinPts.</p>
        <p>An object can be in one of three states:
1. Not marked.
2. Marked, but not an internal object of any cluster.</p>
        <p>3. Assigned to some cluster.</p>
        <p>Step 1. Set flag(s) = "not marked" for all elements of the set S. Assign the current
cluster Cj the number zero, j = 0. The set of noise points Noise = ∅.
Step 2. For each si ∈ S such that flag(si) = "not marked", execute:
Step 3. flag(si) = "marked";
Step 4. Ni = NEps(si) = { q ∈ S | dist(si, q) ≤ Eps }</p>
        <sec id="sec-4-1-1">
          <title>Step 5. If si  MinPt , then Noise  Noise  si </title>
          <p>Otherwise, the number of the next cluster j = j + 1;
EXPANDCLUSTER(si, Ni, Cj, Eps, MinPts);
Output: the set of clusters C = {Cj}.</p>
          <p>EXPANDCLUSTER
Input: the current object si, its Eps-neighborhood Ni, the current cluster Cj, and
Eps, MinPts.</p>
          <p>Step 1. Cj = Cj ∪ {si};
Step 2. For all points sk ∈ Ni:
Step 3. If flag(sk) = "not marked", then
Step 4. flag(sk) = "marked";
Step 5. Nik = NEps(sk);
Step 6. If |Nik| ≥ MinPts, then Ni = Ni ∪ Nik;
Step 7. If ∄ p : sk ∈ Cp, p = 1, …, |C|, then Cj = Cj ∪ {sk};</p>
          <p>Output: cluster Cj.</p>
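          <p>The steps above can be sketched as follows; this is a minimal brute-force illustration in Python (the paper's own implementation is in R), using a naive O(n²) neighborhood search instead of an R*-tree index, and with names such as region_query being ours:</p>

```python
from math import dist  # Euclidean distance between coordinate tuples (Python 3.8+)

def region_query(S, i, eps):
    """Eps-neighborhood of S[i]: indices of all points within eps (the point itself counts)."""
    return [j for j in range(len(S)) if dist(S[i], S[j]) <= eps]

def dbscan(S, eps, min_pts):
    """Return one label per point: cluster ids 0, 1, ... or -1 for noise."""
    labels = [None] * len(S)              # None means "not marked"
    cluster = -1
    for i in range(len(S)):
        if labels[i] is not None:
            continue
        neighbors = region_query(S, i, eps)
        if len(neighbors) < min_pts:      # Step 5: not a core point -> provisionally noise
            labels[i] = -1
            continue
        cluster += 1                      # EXPANDCLUSTER: grow a new cluster from core point i
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:           # border point previously marked as noise
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(S, j, eps)
            if len(j_neighbors) >= min_pts:   # j is also a core point: expand through it
                seeds.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=2.0, min_pts=2))  # → [0, 0, 0, 1, 1, 1, -1]
```

          <p>Note how the isolated point (50, 50) is labeled -1 (noise), illustrating the noise-handling behavior described above; the order-dependence of border-point assignment mentioned earlier is also visible in this scheme.</p>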
          <p>
            As the research shows [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], the considered clustering algorithm has a number of
advantages that make it possible to use this method for working with clusters of
different natures (forms); the algorithm can work with large-scale samples and with
n-dimensional objects (objects with more than three attributes), provided the function
for calculating the distance is appropriately selected (in the general case, the
Minkowski metric can be used). However, a significant disadvantage is the rather
laborious procedure for determining the parameters required for the correct operation
of the algorithm. The drawbacks of the DBSCAN algorithm are shown in more detail
in Table 1.
          </p>
          <p>Table 1. Disadvantages of the DBSCAN algorithm.
1. DBSCAN is not a fully deterministic algorithm: boundary points that can be
reached from more than one cluster may be assigned to either cluster, depending on
the order of data processing.
2. The quality of DBSCAN depends on the distance measure used. The Euclidean
distance is the most common; but for multidimensional data this measure can be
almost useless due to the so-called "curse of dimensionality", which makes it difficult
to find a suitable value for ε. This effect is also present in any other algorithm based
on the Euclidean distance.
3. DBSCAN cannot cluster data with large differences in density, since the
combination of MinPts and ε cannot be selected appropriately for all clusters.
4. If the scale of the data is not well understood, it may be very difficult to choose
a meaningful distance threshold ε.
5. A significant drawback is the very laborious procedure for determining the
parameters required for the correct operation of the algorithm.</p>
          <p>
            In the general case, the DBSCAN algorithm has quadratic computational
complexity due to the search for the Eps-neighborhood. However, the authors of the
algorithm used a special data structure, R*-trees, for this purpose; as a result, the
search for the Eps-neighborhood of one point takes O(log n), and the total
computational complexity of DBSCAN is O(n log n) [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>OPTICS Clustering Algorithm</title>
        <p>
          The concept of the OPTICS algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is similar to that of DBSCAN, but the algorithm is
designed to overcome one of the main weaknesses of the DBSCAN algorithm: the
problem of finding meaningful clusters in data of varying density.
        </p>
        <p>To do this, the database points are (linearly) ordered so that spatially close
points become adjacent in the ordering. In addition, for each point a special distance
is stored that represents the density that must be accepted for a cluster so that both
points belong to the same cluster. The result is presented in the form of a dendrogram.</p>
        <p>
          In this case, there is no need to carefully adjust the appropriate parameter, and the
result is a hierarchical result [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. However, the parameter ε is specified in the algorithm
as the maximum radius to be considered. Ideally, it can be set very large, but this leads
to exorbitant computational costs.
        </p>
        <p>
          OPTICS density algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] also makes it possible to extract a hierarchical structure and
clusters of complex shape. The data are ordered into a spanning tree so that
spatially close elements are located nearby. The hierarchy is represented in the
form of a reachability diagram, on which the reachability distances for the constructed
sequence of elements are marked. The peaks in the diagram correspond to the
divisions between clusters, and their height to the distance. If necessary, a dendrogram
can easily be constructed from a reachability diagram. Since for each element only the
adjacent elements within a limited radius ε are considered, the OPTICS algorithm can
be implemented with a computational complexity low enough for processing
large data arrays.
        </p>
        <p>
          DBSCAN requires two parameters, the optimal values of which are difficult to
determine. Therefore, an OPTICS algorithm was proposed in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which makes it
possible to order the initial set and simplify the clustering process. In accordance with it,
a reachability diagram is constructed, thanks to which, with a fixed MinPts value, it is
possible to process not only the specified value ε, but also all ε* &lt; ε.
        </p>
        <p>Unlike DBSCAN, the OPTICS algorithm also considers points that are part of
a denser cluster, so each point is assigned a core distance, which describes the
distance to its MinPts-th nearest point:</p>
        <p>core-dist(ε,MinPts)(p) = UNDEFINED, if |Nε(p)| &lt; MinPts; otherwise, the
MinPts-th smallest of the distances to the points of Nε(p), taken in ascending order
(2)
where core-dist(ε,MinPts)(p) is the core distance of the point p.</p>
        <p>The reachability distance of a point o from a point p is equal to either the
distance between p and o or the core distance of the point p, whichever value is
greater:</p>
        <p>reachability-dist(ε,MinPts)(o, p) = UNDEFINED, if |Nε(p)| &lt; MinPts;
reachability-dist(ε,MinPts)(o, p) = max(core-dist(ε,MinPts)(p), dist(p, o)), if |Nε(p)| ≥ MinPts
(3)
where reachability-dist(ε,MinPts)(o, p) is the reachability distance. If p and o are
nearest neighbors, we can assume that p and o belong to the same cluster.</p>
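        <p>Formulas (2) and (3) can be sketched in Python as follows; a minimal illustration with a brute-force neighborhood scan, in which the point itself counts as its own nearest neighbor (conventions on this vary between implementations), and with names such as core_distance being ours:</p>

```python
from math import dist, inf

def core_distance(S, p, eps, min_pts):
    """Formula (2): distance from S[p] to its min_pts-th nearest point within eps.
    inf plays the role of UNDEFINED when the eps-neighborhood is too sparse."""
    dists = sorted(dist(S[p], q) for q in S if dist(S[p], q) <= eps)
    return dists[min_pts - 1] if len(dists) >= min_pts else inf

def reachability_distance(S, o, p, eps, min_pts):
    """Formula (3): max(core distance of p, dist(p, o)); inf if p is not a core point."""
    cd = core_distance(S, p, eps, min_pts)
    return max(cd, dist(S[p], S[o])) if cd != inf else inf

S = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (9.0, 9.0)]
print(core_distance(S, 0, eps=3.0, min_pts=2))             # → 1.0
print(reachability_distance(S, 2, 0, eps=3.0, min_pts=2))  # → 2.0
```

        <p>OPTICS orders the points by repeatedly expanding the point with the smallest reachability distance; valleys in the resulting reachability plot then correspond to clusters.</p>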
        <p>Both the core and the reachability distances are undefined if there is no
sufficiently dense cluster (with respect to ε). If ε is taken large enough, this never
happens, but then every ε-neighborhood query returns the entire database, which leads
to a running time of O(n²). The parameter ε is required to cut off loose clusters that are
no longer interesting, and thereby to speed up the algorithm. Strictly speaking, the
parameter ε is optional: it may simply be set to the maximum possible value. However,
when a spatial index is available, it affects the computational complexity. OPTICS
differs from DBSCAN in that this parameter has little influence on the result; it only
sets the maximum reachability value considered.</p>
        <p>The advantage of the algorithm is that it can efficiently process clusters in data
of varying density and retrieves objects in a specific order using the ordering
mechanism. The disadvantages of the algorithm include the fact that it is less sensitive
to erroneous data than DBSCAN.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Inductive Clustering Algorithm</title>
        <p>
          Among the main principles of inductive modeling of complex systems are the three
mentioned above, namely [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]: the principle of self-organization; the principle of
external complement and the principle of freedom of decision-making.
        </p>
        <p>The principle of self-organization of models, based on the inductive approach to
the simulation of complex systems, categorically rejects the path of expanding and
complicating the model and increasing the output volume of information about the
object, and postulates the existence of an optimal, scaled modeling area and of one
model of optimal complexity. Such a model can be synthesized by self-organization,
that is, by searching among many candidate models according to appropriately
selected external criteria for model selection. Optimizing the model over some
ensemble of criteria determines the simulation results at the given levels of noise and
volume of observations.</p>
        <p>The principle of the external complement is connected with Gödel's theorem:
"... only external criteria, based on new information, allow us to synthesize the true
model of the object hidden in the noisy data." In other words, according to this
principle, only external criteria (i.e., those calculated on the basis of "fresh" data not
used for the synthesis of the model) pass through minima as the complexity of the
model increases. This principle is applied by dividing the original data table into two
parts, A and B.</p>
        <p>The principle of freedom of decision-making. In accordance with this principle,
for each generation (or series) of model selection there is a certain minimum of
selected combinations, called the freedom of choice, which ensures the convergence of
multi-row selection to the model of optimal complexity. The principles of freedom of
decision-making and of a step-by-step (multi-row) decision procedure were first
implemented in the perceptron. The perceptron consists of several adjustable layers of
links. After each series of links, a special device is required that passes the most
probable solutions to the next series. At the last stage a single, final decision is taken.
In other words, in the purposeful selection of models to determine the model of
optimal complexity in accordance with the stated principles, the following rules must
be observed:
for each generation (or series of selection) of models there is a certain minimum of
selected combinations, called the freedom of choice;
too many generations lead to degeneracy (the information matrix becomes
poorly defined);
the more difficult the selection problem, the more generations are needed to obtain
a model of optimal complexity;
the freedom of choice is ensured by the fact that for each subsequent series of
selection not one solution is passed, but several of the best, selected in the previous
row.</p>
        <p>
          Gabor formulated this principle as follows: decisions at a given moment must be
made in such a way that at the next moment of time, when the need for the
next decision arises, the freedom of decision-making is preserved [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          These principles formed the basis of technology for solving the problems of
inductive synthesis of models according to experimental data. The most general
formulation of the problem of inductive synthesis of models by experimental data, or
structural-parametric identification, is given in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. According to these papers, such a
statement consists in finding the extremum of some criterion CR on the set of different
models Φ:
f* = arg min(f ∈ Φ) CR(f)
(4)
Since (4) does not complete the formulation of the problem, it needs to be further
specified, in particular:
specify a priori expert information about the type, character and volume of the
initial information known from the analysis of the experiment;
specify the class of basic functions from which the set Φ must be formed;
determine the method of generating the models f;
specify a method for evaluating the parameters;
specify a model comparison criterion CR(f) and a method for minimizing it.
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], it is noted that viewing clustering as a model allows us to transfer to the
theory of cluster analysis all the basic concepts of the theory of self-organization of
models based on the group method of data handling (GMDH). Self-organization of
clustering models means their selection in order to choose the optimal clustering. The
more inaccurate the data, the simpler the optimal clustering (complexity is measured
by the number of clusters and the number of attributes). In objective cluster analysis
(OCA) algorithms, clusters are formed by the internal criterion (the more complex, the
more precise), and their optimal number and the composition of the ensemble of
attributes are found by the external criterion (which forms a minimum in the region of
under-complicated clusterings, optimal for a given level of noise dispersion). The
enumeration of clustering variants is implemented by the OCA algorithm [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
The construction of a hierarchical clustering tree organizes and reduces the
enumeration without losing the optimal clustering. Physical clustering is based on the
clustering balance criterion. To calculate the criterion, the data sample is divided
into two equal parts. A clustering tree is constructed on each sub-sample, and the
balance criterion is calculated at each step with the same number of clusters. The
criterion selects a clusterization in which the number and the coordinates of the centers
(midpoints) of the corresponding clusters coincide [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]:
        </p>
        <p>BL </p>
        <p>
          1 M K (xoA  xoB )2  min
MK j1 i1
(5)
K is the number of clusters in this step of constructing a tree; M is the coordinate
number; xoA are the coordinates of cluster centers constructed on part A; xoB are the
coordinates of cluster centers constructed on part B.
1. The Dunn index [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] compares the distance between clusters with the diameter of the clusters. The
higher the index value, the better the clustering.
2. The Calinski–Harabasz index [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]
        </p>
        <p>DI  K   min</p>
        <p>iK
QCB   N  K </p>
        <p>QCW   K 1
QCCH 
 max
(6)
(7)
(8)
(9)
(10)
N is the amount of objects, K is the amount of clusters. The maximum value of the
index corresponds to the optimal cluster structure.</p>
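        <p>As a concrete illustration, both internal indices can be computed directly from their definitions; the toy data and helper functions below are ours, not the paper's:</p>

```python
# Internal clustering-quality indices: Dunn (6) and Calinski-Harabasz (7),
# sketched in pure Python for two small toy clusters.
from math import dist  # Euclidean distance, Python 3.8+

def dunn_index(clusters):
    """Minimum inter-cluster distance divided by maximum cluster diameter."""
    diam = max(dist(p, q) for c in clusters for p in c for q in c)
    sep = min(dist(p, q)
              for i, a in enumerate(clusters)
              for b in clusters[i + 1:]
              for p in a for q in b)
    return sep / diam

def calinski_harabasz(clusters):
    """(QC_B / (K - 1)) / (QC_W / (N - K)) -> max."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    points = [p for c in clusters for p in c]
    grand = [sum(x) / n for x in zip(*points)]              # overall centroid
    centers = [[sum(x) / len(c) for x in zip(*c)] for c in clusters]
    qc_b = sum(len(c) * dist(m, grand) ** 2 for c, m in zip(clusters, centers))
    qc_w = sum(dist(p, m) ** 2 for c, m in zip(clusters, centers) for p in c)
    return (qc_b / (k - 1)) / (qc_w / (n - k))

clusters = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
print(dunn_index(clusters))         # large -> well-separated clusters
print(calinski_harabasz(clusters))  # large -> compact, separated clusters
```

        <p>On two compact, well-separated clusters both indices are large, matching the "higher is better" reading above.</p>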
        <p>
          For the calculation of the external criterion of balance, the approach taken in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]
was taken as the basis. In this paper, the external criterion (EC) of
controlled clusterization is defined as the normalized value of the sum of the
squared deviations between the values of the internal criteria (IC) of the clustering
quality (1)–(2):
        </p>
        <p>ECB = (IC_A − IC_B)² / (IC_A + IC_B)² , K_A = K_B</p>
        <p>To create equal conditions for clustering on the subsets when using the DBSCAN
clustering algorithm, an equal number of clusters is determined at the clustering stage.
The modulus of the difference between the values of the external balance criteria with the
same number of clusters on each subset reaches a minimum value:</p>
        <sec id="sec-4-3-1">
          <title>|ECB_KP − ECB_KP+1| → min</title>
          <p>where KP and KP+1 are the numbers of clusters P and P + 1. For each KP and KP+1, the values
eps_KP and eps_KP+1 are fixed to define the clusters on the studied set:
eps ∈ [eps_KP, eps_KP+1], Δeps = 0.001
minPts ∈ [minPts_min, minPts_max], ΔminPts = 1
To eliminate one of the main weaknesses of the DBSCAN algorithm, namely the problem of
finding meaningful clusters in data of different densities, we further use the
OPTICS algorithm.</p>
        </sec>
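        <p>The external balance criterion reduces to a one-line function; the normalization shown is our reading of the reconstructed formula, and the numeric inputs are illustrative:</p>

```python
# External balance criterion sketch: normalized squared deviation between the
# internal criterion values computed on the two equal-power subsets A and B.
# The exact normalization is an assumption reconstructed from the text.
def external_balance(ic_a, ic_b):
    return (ic_a - ic_b) ** 2 / (ic_a + ic_b) ** 2

# Internal criterion values on subsets A and B (illustrative numbers):
print(external_balance(448.0, 452.0))  # near zero -> consistent clusterings
print(external_balance(100.0, 900.0))  # large -> the subsets disagree
```

        <p>A value near zero indicates that clustering results on the two subsets agree, which is what the minimization above seeks.</p>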
      </sec>
      <sec id="sec-4-4">
        <title>Two-Step Density-Based Objective Inductive Technology Based on DBSCAN and OPTICS Clustering Algorithm</title>
        <p>The main idea of this study is the combined use of a hybrid architecture that combines
several computational paradigms, the main focus of which is on obtaining synergistic
effects from their combination or, in other words, hybridization. In a hybrid
architecture that combines several paradigms, the effectiveness of one approach can
compensate for the weakness of the other (Fig. 1).</p>
        <p>By combining different approaches, it is possible to circumvent the disadvantages
inherent in each separately. Hybrid algorithms usually consist of various components
that are combined in the interests of achieving their goals.</p>
        <p>In our study, data processing begins with dividing the studied data into two equally
powerful subsets using inductive objective clustering based on DBSCAN, then the
definition of meaningful clusters is performed using the OPTICS algorithm for data
with different densities (Fig. 1)</p>
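        <p>The division into two equal-power subsets of pairwise similar objects can be sketched with a greedy nearest-neighbour pairing; this pairing rule is our assumption for illustration, not the authors' published procedure:</p>

```python
# Hypothetical sketch: split a set of objects into two equal-power subsets by
# pairing each object with its most similar unassigned neighbour and sending
# one object of each pair to A and the other to B.
from math import dist

def split_pairwise(objects):
    a, b, rest = [], [], list(objects)
    while len(rest) >= 2:
        p = rest.pop(0)
        q = min(rest, key=lambda r: dist(p, r))  # most similar remaining object
        rest.remove(q)
        a.append(p)
        b.append(q)
    return a, b

data = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (9, 1), (9, 1.2)]
a, b = split_pairwise(data)
print(a, b)  # two equal-power subsets of pairwise similar objects
```

        <p>Each subset then receives one object from every similar pair, so the two halves have the same size and closely matched structure.</p>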
        <p>The integration and hybridization of various methods and information technologies
makes it possible to solve complex problems that cannot be solved on the basis of any
particular method or technology. In the case of the integration of heterogeneous
information technologies, one should expect synergistic effects of a higher
order than when combining various models within one technology.</p>
        <p>Hybridization helps to take advantage of each of the interacting components while
reducing the effects of their disadvantages and limitations. Hybrid intelligent systems,
that is, those that combine several components, have recently attracted considerable
attention due to their ability to solve complex problems that are characterized by
inaccuracies, uncertainty, unpredictability, high dimensionality and environmental
variability. They can use both expert knowledge and raw data, often providing original and
promising ways to solve problems.</p>
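        <p>A minimal sketch of such a hybrid pipeline, assuming scikit-learn's DBSCAN and OPTICS implementations with illustrative parameters on synthetic data:</p>

```python
# Two-step density-based pipeline sketch on synthetic blobs, assuming
# scikit-learn is available; eps/min_samples values are illustrative only.
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN, OPTICS

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.4, random_state=0)

# Step 1: density clustering with DBSCAN on the full data set.
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Step 2: refine with OPTICS, which handles clusters of varying density.
opt_labels = OPTICS(min_samples=5).fit_predict(X)

print(len(set(db_labels) - {-1}), len(set(opt_labels) - {-1}))
```

        <p>DBSCAN fixes one global density threshold, while the OPTICS pass orders points by reachability and can separate clusters whose densities differ, which is the compensation effect the hybrid architecture relies on.</p>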
        <p>The more detailed diagram of the proposed hybrid objective clustering technology is
shown in Fig. 2. This includes such steps:</p>
        <p>Step 1. Start.
Step 2. Formation of the initial set of objects under study. Representation of the
data in the form of an n × m matrix, where n is the number of rows or the number
of objects studied, and m is the number of columns or the number of features that
characterize the objects.
Step 3. Division of the set into two equal-power subsets:
A = {x_ij^A}, B = {x_ij^B}, j = 1, ..., m, i = 1, ..., n_A (n_B), n_A + n_B = n
Step 4. Configure the DBSCAN clustering algorithm.</p>
        <p>For each equally powerful subset:
Step 5. Data clustering on a subset by the DBSCAN algorithm.</p>
        <p>Step 6. Fixing the results of clustering with:
eps ∈ [eps_min, eps_max], Δeps = 0.001 (11)
minPts ∈ [minPts_min, minPts_max], ΔminPts = 1 (12)
Step 7. Calculation of the internal criteria of clustering quality for each
clustering result.</p>
        <p>Step 8. Calculation of the external balance criterion in accordance with
formula (3).
Step 9. If the modulus of the difference between the values of the external balance
criteria with the same number of clusters on each subset does not reach the minimum
value (4), then Steps 7–8 are repeated.</p>
        <p>Otherwise:
Step 10. Fixing the results of the DBSCAN clustering algorithm on the studied set in
accordance with (5).
Step 11. Setting up the OPTICS clustering algorithm.</p>
        <p>Step 12. Identify data clusters with different densities.</p>
        <p>Step 13. Fixation of clustering results by the OPTICS algorithm.</p>
        <p>Step 14. End.</p>
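        <p>The density clustering of Step 5 can be sketched with a minimal pure-Python DBSCAN; this is a toy illustration of the standard algorithm, not the authors' implementation:</p>

```python
# Minimal pure-Python DBSCAN sketch for Step 5 (toy illustration).
from math import dist

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)          # None = unvisited, -1 = noise
    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        cluster += 1                       # i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border point reached from a core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:         # j is a core point: expand further
                queue.extend(nb)
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
print(dbscan(pts, eps=0.5, min_pts=3))
```

        <p>The two dense groups are labelled as clusters 0 and 1, and the isolated point is marked as noise (-1), exactly the behaviour that the technology relies on for separating noise objects.</p>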
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiment, Results and Discussion</title>
      <p>
        For the first experiment, the test data sets provided by the School of Computing
of the University of Eastern Finland [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] were used. The results of clustering are presented in Table 3.
      </p>
      <p>In the second experiment, the algorithms were evaluated using the indices. Analysis
of the results obtained on data that contain clusters of different forms shows that the
use of the DBSCAN algorithm within the objective clustering inductive technology allows us
to adequately group the objects under study. At the same time, the points whose
distribution density in the feature space is lower than the distribution density of the
objects that make up the clusters are grouped into a separate cluster.</p>
      <p>These points are identified as noise. In accordance with the principles of inductive
modeling of complex systems, at the last step, the best solutions are formed that
answer (4) for optimal combinations of the algorithm parameters.</p>
      <p>Fig. 4. Data D31: the number of classes is 31; the number of dimensions is 2; the number of instances is 3100.</p>
      <p>Fig. 5. Data Flame: the number of classes is 2; the number of dimensions is 2; the number of instances is 240.</p>
      <p>Fig. 6. Data Jain: the number of classes is 2; the number of dimensions is 2; the number of instances is 373.</p>
      <p>Fig. 7. Data Pathbased: the number of classes is 3; the number of dimensions is 2; the number of instances is 300.</p>
      <p>Fig. 8. Data R15: the number of classes is 15; the number of dimensions is 2; the number of instances is 600.</p>
      <p>The choice of the final solution using the OPTICS algorithm to determine clusters
on data with different densities is determined by the goals of the problem being
solved. As the results showed (Table 1), the best solutions for choosing the
parameters of the DBSCAN algorithm from the point of view of the internal criteria are the
following: Aggregation data: eps = 0.168, minPts = 4; Compound data: eps = 0.175,
minPts = 4; Iris data: eps = 0.71, minPts = 3.</p>
      <p>Thus, we can conclude that the proposed hybrid objective clustering model based
on the density algorithm DBSCAN, followed by the use of the OPTICS algorithm,
allows detecting meaningful clusters in data with different densities.</p>
      <p>The article demonstrates the results of the implementation of the objective clustering
inductive technology based on the DBSCAN clustering algorithm with the subsequent
use of the OPTICS algorithm. The implementation of this technology involves the
simultaneous clustering of data on two equal-power subsets, which include the same
number of pairs of similar objects.</p>
      <p>The external, balance and internal criteria for the quality of clustering were used to
determine the studied data objective clustering. The Calinski-Harabasz and Dunn’s
criteria were used as an internal quality criterion for clustering.</p>
      <p>The external criterion was calculated as the normalized difference of the internal
quality criteria; the internal quality criteria were calculated on the two equal-power
subsets. The balance criterion was used as the external criterion. The determination of
the eps-neighborhood and of the MinPts value within it was performed by the maximal
value of the complex balance clustering quality criterion during the operation of the
algorithm.</p>
      <p>The Aggregation, D31, Flame, Jain, Pathbased, R15 and Compound data sets of the
School of Computing of the University of Eastern Finland, as well as the well-known
Iris data, were used as experimental data.</p>
      <p>The results of the simulation showed the high efficiency of the proposed technology. In
the case of the Aggregation and Compound data, the studied objects were adequately
divided into clusters. The noise component was identified by the distribution density of
the objects while the algorithm was running.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.-P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Knowledge Discovery in Large Spatial Databases: Focusing Techniques for efficient Class Identification</article-title>
          .
          <source>In: Proceedings of the 4th Int. Symp. on large Spatial Databases</source>
          , Portland, ME
          , Vol.
          <volume>951</volume>
          ,
          <fpage>67</fpage>
          -
          <lpage>82</lpage>
          . (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ankerst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breunig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.-P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>OPTICS: Ordering Points To Identify the Clustering Structure</article-title>
          .
          <source>In: Proceedings of Int. Conf. on Management of Data (SIGMOD99)</source>
          ,
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          . (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ivakhnenko</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Objective Clustering Based on the Model Self-Organization Theory, Avtomatika</article-title>
          , Vol.
          <volume>5</volume>
          ,
          <fpage>6</fpage>
          -
          <lpage>15</lpage>
          . (
          <year>1987</year>
          )
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubes</surname>
          </string-name>
          , R. C.
          <article-title>Algorithms for Clustering Data</article-title>
          .
          <article-title>Prentice-Hall advanced reference series</article-title>
          . Prentice-Hall, Inc., Upper Saddle River, NJ. (
          <year>1988</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nagpal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jatain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaur</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Review based on data clustering algorithms</article-title>
          .
          <source>In: Proceeding of the IEEE Conference, Information &amp; Communication Technologies (ICT)</source>
          ,
          <fpage>298</fpage>
          -
          <lpage>303</lpage>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.:</given-names>
          </string-name>
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>In: Proc. Second International Conference on Knowledge Discovery and Data Mining (KDD-96 )</source>
          . AAAI Press, Vol.
          <volume>96</volume>
          /34,
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          . (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sarmah</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A grid-density based technique for finding clusters in satellite image</article-title>
          , Vol.
          <volume>33</volume>
          /5,
          <fpage>589</fpage>
          -
          <lpage>604</lpage>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ankerst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breunig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>OPTICS: ordering points to identify the clustering structure</article-title>
          .
          <source>In Proc. ACM SIGMOD international conference on Management of data</source>
          , Vol.
          <volume>28</volume>
          /2,
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          . (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Madala</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivakhnenko</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Inductive Learning Algorithms for Complex Systems Modeling</article-title>
          . In: CRC Press Inc.,
          <source>Boca Raton</source>
          ,
          <volume>365</volume>
          p. (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Stepashko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bulgakova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zosimov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Construction and research of the generalized iterative GMDH algorithm with active neurons</article-title>
          .
          <source>In: Advances in Intelligent Systems and Computing II</source>
          ,
          <fpage>492</fpage>
          -
          <lpage>510</lpage>
          . (
          <year>2018</year>
          ). DOI: 10.1007/978-3-319-70581-1_35
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bulgakova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stepashko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zosimov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Numerical study of the generalized iterative algorithm GIA GMDH with active neurons</article-title>
          .
          <source>In: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies</source>
          , Vol.
          <volume>1</volume>
          , art. No.
          <volume>8098836</volume>
          ,
          <fpage>496</fpage>
          -
          <lpage>500</lpage>
          . (
          <year>2017</year>
          ). DOI: 10.1109/STC-CSIT.2017.8098836
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Osypenko</surname>
            ,
            <given-names>V. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reshetjuk</surname>
            ,
            <given-names>V.M.:</given-names>
          </string-name>
          <article-title>The methodology of inductive system analysis as a tool of engineering researches analytical planning</article-title>
          . In: Ann. Warsaw Univ.
          <source>Life Sci</source>
          , SGGW
          , Vol.
          <volume>58</volume>
          ,
          <fpage>67</fpage>
          -
          <lpage>71</lpage>
          .(
          <year>2011</year>
          ). [Electronic resource]. - Access mode: http://annalswuls.sggw.pl/?q=node/234
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lurie</surname>
            ,
            <given-names>I.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osipenko</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Litvinenko</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taif</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kornilovska</surname>
            ,
            <given-names>N.V.</given-names>
          </string-name>
          :
          <article-title>Hybridization of the algorithm of inductive cluster analysis using estimation of data distribution</article-title>
          .
          <source>In: Lviv Polytechnic: Information systems and networks</source>
          , Vol.
          <volume>832</volume>
          ,
          <fpage>178</fpage>
          -
          <lpage>190</lpage>
          . (
          <year>2015</year>
          )
          <article-title>(in Ukrainian)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvynenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korobchinskyi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Objective clustering inductive technology of gene expression sequences features</article-title>
          ,
          <source>Communications in Computer and Information Science: In the book “Beyond Databases, Architectures and Structures”</source>
          ,
          <fpage>359</fpage>
          -
          <lpage>372</lpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvynenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osypenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Implementation of the objective clustering inductive technology based on the DBSCAN clustering algorithm</article-title>
          .
          <source>In: Proceeding of the XIIth IEEE international scientific and technical conference</source>
          ,
          <volume>479</volume>
          -
          <fpage>484</fpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vyshemyrska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvynenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Implementation of DBSCAN Clustering Algorithm within the Framework of the Objective Clustering Inductive Technology based on R and KNIME</article-title>
          . In: Radio Electronics, Computer Science, Control-2019, No.
          <volume>1</volume>
          ,
          <fpage>77</fpage>
          -
          <lpage>88</lpage>
          . (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.-P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise</article-title>
          .
          <source>In: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. Portland, OR</source>
          ,
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          .(
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Bäcklund</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hedblom</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neijman</surname>
          </string-name>
          , N.:
          <article-title>A Density-Based Spatial Clustering of Application with Noise</article-title>
          .
          <source>Linköpings Universitet</source>
          . (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Son</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          :
          <article-title>Density-based algorithms for active and anytime clustering</article-title>
          . In: Ludwig Maximilians University Munich (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ivakhnenko</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Heuristic Self-Organization</article-title>
          . In: Problems of Engineering Cybernetics. Automatica, No.
          <volume>6</volume>
          ,
          <fpage>207</fpage>
          -
          <lpage>219</lpage>
          . (
          <year>1970</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Gabor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Planning Perspectives</article-title>
          .
          <source>Automation</source>
          , No.
          <volume>2</volume>
          ,
          <fpage>16</fpage>
          -
          <lpage>22</lpage>
          . (
          <year>1972</year>
          )
          <article-title>(in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Stepashko</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Elements of the theory of inductive modeling. The state and prospects of the development of computer science in Ukraine: monograph</article-title>
          .
          <source>Kyiv: Scientific Opinion</source>
          ,
          <fpage>471</fpage>
          -
          <lpage>486</lpage>
          . (
          <year>2010</year>
          ).
          <article-title>(in Ukrainian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zholnarsky</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Agglomerative Cluster Analysis Procedures for Multidimensional Objects: A Test for Convergence</article-title>
          .
          <source>In: Pattern Recognition and Image Analysis</source>
          , Vol.
          <volume>25</volume>
          , No.
          <volume>4</volume>
          ,
          <fpage>389</fpage>
          -
          <lpage>390</lpage>
          . (
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Ivakhnenko</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivakhnenko</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>The Review of Problems Solvable by Algorithms of the Group Method of Data Handling (GMDH)</article-title>
          .
          <source>In: Pattern Recognition and Image Analysis</source>
          , Vol.
          <volume>5</volume>
          , No.
          <volume>4</volume>
          ,
          <fpage>527</fpage>
          -
          <lpage>535</lpage>
          . (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Bezdek</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distributions</article-title>
          .
          <source>In: Proceeding of the IEEE Transactions on Computers</source>
          ,
<fpage>835</fpage>
-
<lpage>838</lpage>
          . (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Calinski</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harabasz</surname>
            ,
<given-names>J.</given-names>
</string-name>
:
          <article-title>A dendrite method for cluster analysis</article-title>
          .
          <source>In: Comm. in Statistics</source>
          , Vol.
          <volume>3</volume>
          :
          <issue>1</issue>
          ,
<fpage>1</fpage>
-
<lpage>27</lpage>
          . (
          <year>1974</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvynenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Inductive model of data clustering based on the agglomerative hierarchical algorithm</article-title>
          .
          <source>In: Proceeding of the 2016 IEEE First International Conference on Data Stream Mining and Processing (DSMP)</source>
          ,
          <fpage>19</fpage>
          -
          <lpage>22</lpage>
          . (
          <year>2016</year>
). Available at: http://ieeexplore.ieee.org/document/7583499/
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>28. https://cs.joensuu.fi/sipu/datasets/</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>