CEUR Workshop Proceedings, Vol-2667, paper 20: https://ceur-ws.org/Vol-2667/paper20.pdf
Modified spectral clustering method for graphs decomposition

Dinar Yakupov (yaqup@mail.ru), Vladimir Mokshin (vladimir.mokshin@mail.ru)
Institute of Technical Cybernetics & Informatics,
Kazan National Research Technical University named after A.N. Tupolev – KAI, Kazan, Russia

Abstract—Among a large number of tasks on graphs, studies related to the placement of objects with the aim of increasing the information content of complex multi-parameter systems find wide practical application (for example, in transport and computer networks, piping systems, and image processing). Despite years of research, exact and efficient algorithms for placement problems have not been found. It is proposed to consider the solution of the allocation problem in the context of decomposition of the initial network into k regions, in each of which a vertex with some centrality property is sought. This article provides an analysis of sources on solving the placement problem in graphs, as well as of methods for decomposing graph structures. Following the main provisions of the theory of spectral clustering, the disadvantages of the commonly applied splitting criteria Rcut and Ncut are indicated. It is shown that the distance minimization criterion Dcut proposed in this paper makes it possible to obtain high-quality graph decompositions. The results are illustrated by the search for sensor placement vertices in the well-known ZJ and D-Town networks of the EPANET hydraulic modeling system.

Keywords—graph, spectral, clustering, distance minimization, decomposition

I. INTRODUCTION

Graph models provide an opportunity to study an object based on its topology, without delving into the physical nature of the processes occurring in the system under consideration, which, in turn, greatly simplifies the calculations [1-3]. Among the many problems on graphs, studies related to splitting the original graph into a predetermined number of connected disjoint components have found wide practical application [4-8]. Methods of decomposition of graph structures make a significant contribution to the speed of search algorithms, which is especially important under restrictions on computational and time resources. However, the widespread spectral clustering algorithms based on minimization of the Rcut and Ncut cuts do not always solve the object placement problem in the best way. The reason is that decomposition by these criteria takes into account the number of cut edges and the sizes of the resulting subgraphs, but not the distances between the vertices or the character of their location within these subdomains. This paper provides an example showing that, when an original graph of 12 vertices and 12 edges is decomposed into two parts, there are two splitting variants that satisfy the Rcut and Ncut criteria equally well, but when these variants are compared in terms of the distances between vertices within the subdomains, the second variant is preferable. The use of this criterion, designated in the paper as Dcut, in the decomposition of graphs allowed us to solve the problem of placing objects in the network with a high-quality result, comparable to and even better than that of spectral methods based on the Rcut or Ncut criteria, which confirms the applicability of the Dcut criterion in spectral graph clustering methods.

II. THE ANALYSIS OF THE SOURCES

Solving the problem of the optimal placement of objects in k nodes of a network with |C| vertices by exhaustive search requires |C|! / (k!(|C| − k)!) iterations. For example, for a small network with |C| = 100 and k = 5, about 10^7.88 iterations are required, which makes practical application of this method impossible. Trial-and-error, greedy, and stochastic algorithms are the most widely used approaches to the placement problem.

The trial-and-error algorithm is based on iterations. This approach provides a solution fairly close to the optimum, but requires a lot of time.

The principle of the greedy algorithm is to make locally optimal decisions at each stage, assuming that the final solution will also be optimal. The solution is fast, but not accurate.

Solving by evolutionary computation or simulated annealing is based on choosing combinations of nodes for placing objects by a probabilistic approach; it guarantees neither the solution time nor the quality of the solution as a whole.

In [8, 9] the solution of the placement problem is considered in the context of the decomposition of the initial network into k regions, in each of which a vertex with some centrality property is sought. When decomposing a graph, it is necessary to minimize the number of edges connecting the vertices of different subdomains. A prerequisite is the connectivity of the selected subgraphs.

Many methods are used to solve the graph decomposition problem [10-15].

Since the 1990s, spectral graph theory has been used in many fields [16]. Its main advantage is simplicity: any system represented as a graph can be analyzed through the spectrum of an associated matrix alone.

Fiedler [17] showed that the eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix can be used to solve the problem of bipartite graph decomposition. Hagen and Kahng [18] introduced the rational cut criterion (Rcut) to assess the quality of a decomposition. Shi and Malik [19] used Fiedler's conclusions for iterative bipartite partitioning and introduced the normalized cut criterion (Ncut). The development and application of the theory of spectral clustering are also considered in [20-25].
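The exhaustive-search iteration count quoted above can be verified directly. A quick sketch (illustrative, not from the paper):

```python
import math

# Number of ways to choose k = 5 placement nodes out of |C| = 100 network
# vertices: |C|! / (k! * (|C| - k)!)
n_iterations = math.comb(100, 5)
print(n_iterations)                        # 75287520
print(round(math.log10(n_iterations), 2))  # 7.88, i.e. about 10^7.88
```

At roughly 7.5 · 10^7 candidate placements for even this small instance, full enumeration is indeed impractical, which motivates the decomposition-based approach of [8, 9].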


Copyright Β© 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Data Science

III. THEORY OF SPECTRAL CLUSTERING OF GRAPHS

A. Fundamentals

The class of spectral decomposition methods [26-29] combines elements of graph theory and linear algebra. These methods are based on the properties of the eigenvalues and eigenvectors of the Laplacian matrix of the graph.

The eigenvectors contain information about the topology of the graph. Depending on the problem, spectral graph theory uses the principal eigenvector [30], the Fiedler eigenvector [17], or the group of the first k eigenvectors [19]. A review of spectral clustering methods is presented in [31, 32].

Spectral clustering algorithms consist of three main steps:

1) For the original graph G, the adjacency matrix W, the diagonal matrix of vertex degrees D, and the Laplacian matrix L are formed. In addition to the unnormalized Laplacian matrix, its normalized equivalents are also used, for example, the random-walk normalized Laplacian [19] or the symmetric normalized Laplacian [20].

2) The eigenvalues and eigenvectors of the unnormalized or normalized Laplacian matrix are determined; they are used to form the matrix of eigenvectors U.

3) The set of vertices is divided into k clusters by classical clustering methods, for example, by the k-means method applied to the rows of the matrix U.

B. Graph cut point of view

Methods of spectral clustering aim to obtain subgraphs such that the difference between the constituent elements within a subdomain is minimal while the difference between subgraphs is maximal. In addition, the subgraphs must be connected and balanced in size. To implement these conditions, the criteria proposed in [18, 19] are used:

    Rcut(C_k) = Σ_{i=1..k} cut(C_i, G) / |C_i| → min,    (1)

    Ncut(C_k) = Σ_{i=1..k} cut(C_i, G) / vol(C_i) → min,    (2)

where G is the initial graph, C_i is the i-th subgraph, k is the number of subdomains into which the original graph is divided, cut(C_i, G) is the sum of the weights of the cut edges, |C_i| is the number of vertices in subgraph i, and vol(C_i) is the sum of the weights of the edges in subgraph i.

It should be noted that the values of both criteria tend to a minimum with decreasing cut weight and with balanced subgraphs (|C_1| = |C_2| = … = |C_k| or vol(C_1) = vol(C_2) = … = vol(C_k)). According to [32], the Rcut criterion is preferred for decomposition with unnormalized matrices, and Ncut for decomposition with normalized matrices.

However, the criteria under consideration do not always clearly indicate a solution. Figure 1 shows a graph with 12 vertices and 12 edges. The weight of each edge is 10.

Fig. 1. The original graph with 12 vertices and 12 edges.

When searching for the optimal decomposition into k = 2 subgraphs, we get two variants (figure 2).

Fig. 2. Decomposition of the original graph into 2 parts: a) with a partition on edges 2-11 and 5-8; b) with a partition on edges 3-4 and 9-10.

Next, according to (1) and (2), we define the values of the Rcut and Ncut criteria for the first (figure 2.a) and second (figure 2.b) variants of the partition:

    Rcut_1 = (cut_{2,11} + cut_{5,8}) / |C_1| + (cut_{2,11} + cut_{5,8}) / |C_2| = 6.67,    (3)

    Ncut_1 = (cut_{2,11} + cut_{5,8}) / vol(C_1) + (cut_{2,11} + cut_{5,8}) / vol(C_2) = 0.8,    (4)

    Rcut_2 = (cut_{3,4} + cut_{9,10}) / |C_1'| + (cut_{3,4} + cut_{9,10}) / |C_2'| = 6.67,    (5)

    Ncut_2 = (cut_{3,4} + cut_{9,10}) / vol(C_1') + (cut_{3,4} + cut_{9,10}) / vol(C_2') = 0.8.    (6)

From the calculations it is clear that from the point of view of Rcut and Ncut both splitting variants give the same result. However, when solving placement problems, it is important to consider the distances between all vertices in the subgraphs. Tables 1 and 2 show the distances between vertices in the subgraphs for decomposition by the first (figure 2.a) and second (figure 2.b) variants of the partition.

It can be seen from the tables that the second variant of splitting the graph into 2 parts provides a smaller total distance within the subgraphs, which indicates a greater degree of grouping of vertices in the second variant of decomposition.

TABLE I. DISTANCES BETWEEN VERTICES IN SUBGRAPHS (DECOMPOSITION BY THE FIRST VARIANT OF THE PARTITION)

    Subgraph C1
    Nodes    1    2    3    4    5    6
      1      0   10   20   30   40   50
      2     10    0   10   20   30   40
      3     20   10    0   10   20   30
      4     30   20   10    0   10   20
      5     40   30   20   10    0   10
      6     50   40   30   20   10    0
    Sum: 700

    Subgraph C2
    Nodes    7    8    9   10   11   12
      7      0   10   20   30   40   50
      8     10    0   10   20   30   40
      9     20   10    0   10   20   30
     10     30   20   10    0   10   20
     11     40   30   20   10    0   10
     12     50   40   30   20   10    0
    Sum: 700
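The equal Rcut and Ncut values in (3)-(6) can be reproduced with a short script. The edge list below is an assumption reconstructed from the cuts named in the text (two 6-vertex chains 1-6 and 7-12, bridged by the edges 2-11 and 5-8); vol(C_i) follows the paper's definition as the sum of internal edge weights:

```python
# Hypothetical reconstruction of the 12-vertex, 12-edge example graph;
# every edge has weight 10.
edges = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 12),
         (2, 11), (5, 8)}
W = 10

def rcut_ncut(parts):
    """Rcut and Ncut for a partition, with vol(C_i) = internal edge weight."""
    rcut = ncut = 0.0
    for part in parts:
        cut = W * sum((u in part) != (v in part) for u, v in edges)
        vol = W * sum((u in part) and (v in part) for u, v in edges)
        rcut += cut / len(part)
        ncut += cut / vol
    return round(rcut, 2), round(ncut, 2)

variant1 = [{1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12}]   # cuts 2-11 and 5-8
variant2 = [{1, 2, 3, 10, 11, 12}, {4, 5, 6, 7, 8, 9}]   # cuts 3-4 and 9-10
print(rcut_ncut(variant1))  # (6.67, 0.8)
print(rcut_ncut(variant2))  # (6.67, 0.8)
```

Both partitions cut two weight-10 edges and leave five internal edges per part, so neither criterion can prefer one over the other.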




VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                                             86

TABLE II. DISTANCES BETWEEN VERTICES IN SUBGRAPHS (DECOMPOSITION BY THE SECOND VARIANT OF THE PARTITION)

    Subgraph C1'
    Nodes    1    2    3   10   11   12
      1      0   10   20   30   20   30
      2     10    0   10   20   10   20
      3     20   10    0   30   20   30
     10     30   20   30    0   10   20
     11     20   10   20   10    0   10
     12     30   20   30   20   10    0
    Sum: 580

    Subgraph C2'
    Nodes    4    5    6    7    8    9
      4      0   10   20   30   20   30
      5     10    0   10   20   10   20
      6     20   10    0   30   20   30
      7     30   20   30    0   10   20
      8     20   10   20   10    0   10
      9     30   20   30   20   10    0
    Sum: 580

C. Distance minimization criteria

Let us consider the solution of the problem of placing k = 2 objects on the basis of the preliminary decomposition of the graph according to variants 1 and 2.

The problem under consideration can be formulated as follows. There is a graph G whose nodes are characterized by certain parameter estimates P_i and whose edges are characterized by weights W_j. After placing the next object in a vertex, the estimates are recalculated according to the formulas:

    P_s = 1,    (7)

    P_i = max(P_y · 1/W_{i,y}),    (8)

where P_s is the estimate of the determinism of the parameter value in the node where the object is placed, P_y is the estimate of the determinism of the parameter value of a neighboring node, and W_{i,y} is the weight of the edge connecting two adjacent vertices i and y.

It is necessary to find an arrangement of objects in the nodes that provides a minimum of the average value of the uncertainty estimate of the target parameter:

    F = 1 − mean(P) → min.    (9)

The values of the estimates P_i of each vertex for decomposition variants 1 and 2 are shown in figure 3.

Fig. 3. Values of P_i scores: a) splitting by edges 2-11 and 5-8; b) splitting by edges 3-4 and 9-10.

White color indicates the vertices where the objects are placed; next to the vertices the values of the determinism estimates P_i are given. As a result, we get:

    F_1 = 1 − mean(P) = 0.796,    (10)

    F_2 = 1 − mean(P') = 0.780.    (11)

The value of F_1 is greater than F_2, which means that the placement based on the second decomposition variant gives better results. This is mainly due to the fact that the vertices in the subgraphs of the second decomposition variant are grouped more tightly.

Thus, the use of a criterion that takes into account the lengths of distances in subgraphs is justified when solving the problems of placing objects on graphs. In this paper, we propose the following distance minimization criterion:

    Dcut(C_k) = Σ_{i=1..k} ( cut(C_i, G)/vol(C_i) · dist_i/(|C_i|·(|C_i| − 1)) · |C|/|C_i| ) → min,    (12)

where dist_i is the sum of the distances between all vertices in subgraph i, and |C| is the number of nodes in the original graph.

D. The algorithm of nodes priority distribution

The structurally optimal number of subgraphs can be estimated by the largest difference between consecutive eigenvalues of the normalized Laplacian matrix. Figure 4 shows the first 10 eigenvalues of the normalized Laplacian matrix for the D-Town network graph.

Fig. 4. Eigenvalues of the normalized Laplacian matrix.

In the figure, you can see that the largest difference lies between the 7th and 8th eigenvalues, which means that the best partition of this graph corresponds to 7 subgraphs.

Fig. 5. Fiedler eigenvector coordinates (7 subgraphs).

Consider the plot (figure 5) of the values of the elements of the Fiedler eigenvector for the graph under consideration.
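The Dcut criterion (12) distinguishes the two decompositions that Rcut and Ncut could not. The sketch below is an illustration, with the 12-vertex example graph reconstructed as assumed earlier (two chains bridged by edges 2-11 and 5-8); it recovers the distance sums of Tables I-II by Floyd-Warshall restricted to each subgraph and then evaluates (12):

```python
# Example graph reconstructed from the text; every edge weight is 10.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 12),
         (2, 11), (5, 8)]
W, N = 10, 12
INF = float("inf")

def dist_sum(part):
    """Sum of shortest-path distances over all ordered vertex pairs of a
    subgraph (Floyd-Warshall using only the subgraph's own edges)."""
    nodes = sorted(part)
    d = {(u, v): 0 if u == v else INF for u in nodes for v in nodes}
    for u, v in edges:
        if u in part and v in part:
            d[u, v] = d[v, u] = W
    for m in nodes:
        for u in nodes:
            for v in nodes:
                d[u, v] = min(d[u, v], d[u, m] + d[m, v])
    return sum(d.values())

def dcut(parts):
    """Dcut per eq. (12): cut/vol * mean pairwise distance factor * size factor."""
    total = 0.0
    for part in parts:
        cut = W * sum((u in part) != (v in part) for u, v in edges)
        vol = W * sum((u in part) and (v in part) for u, v in edges)
        c = len(part)
        total += (cut / vol) * (dist_sum(part) / (c * (c - 1))) * (N / c)
    return round(total, 2)

v1 = [{1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12}]   # dist sums 700 + 700
v2 = [{1, 2, 3, 10, 11, 12}, {4, 5, 6, 7, 8, 9}]   # dist sums 580 + 580
print(dcut(v1), dcut(v2))  # 37.33 30.93 -> variant 2 is preferred
```

Under these assumptions the second variant yields the smaller Dcut, matching the paper's conclusion that it groups the vertices more tightly.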




Each node of the graph corresponds to a value of an element of the eigenvector. Dotted lines mark the boundaries between clusters. The character of the plot shows that, indeed, 7 grouped sections can be distinguished.

But what happens if the graph needs to be divided into more parts? Figure 6 shows the values of the elements of the Fiedler eigenvector for the graph under consideration, divided into 10 subgraphs. It is obvious that the boundaries between subgraphs become less unambiguous, and the probability of some nodes falling into neighboring subdomains increases, which can lead to subgraphs with a disconnected structure. As the number of subdomains increases, the probability of disconnected subgraphs grows.

Fig. 6. Fiedler eigenvector coordinates (10 subgraphs).

Figure 7 shows the pipeline network, represented by the original graph model, divided into 10 subgraphs. Multi-colored sections correspond to different subgraphs, and white nodes are the central points of the subgraphs. The area highlighted in the figure is part of subgraph "4" but is not connected to its other half. The values of the Fiedler eigenvector elements of these "cut off" nodes are marked in black in figure 6. As can be seen, these nodes are spread across three clusters with borders (-0.069; -0.046], (-0.046; -0.032], (-0.032; -0.019], which correspond to the original subgraph "4" and the neighboring subgraphs "2" and "3".

Fig. 7. The graph divided into 10 sub-areas. The "cut off" area is highlighted.

Thus, with the growth of the number of clusters k, the probability of formation of disconnected subgraphs increases.

Connectivity of subgraphs is a necessary condition, for example, when solving problems of division, path search, etc. This connectivity problem is solved by the proposed priority node distribution algorithm:

Input: graph G(V, E), number of subdomains k.
Output: k connected subgraphs.
Steps:
• Step 1. A standard procedure of spectral clustering of the graph is performed.
• Step 2. Subgraphs are formed.
• Step 3. Connectivity is checked. If the subgraph is connected, go to step 10; if not, go to step 4.
• Step 4. The boundary node is located (the vertex whose eigenvector components lie farthest from the cluster centroid).
• Step 5. The neighboring boundary vertices that are not part of the current subgraph are determined.
• Step 6. The subgraphs of the neighboring nodes found in step 5 are determined.
• Step 7. The subgraph whose centroid is closest to the boundary vertex is determined.
• Step 8. The boundary node is passed to the subgraph determined in step 7.
• Step 9. Go to step 2.
• Step 10. If the subgraph is the last one, exit; otherwise go to the next subgraph (step 3).

Using this node priority distribution algorithm guarantees the connectivity of the resulting subgraphs at low computational cost.

IV. CASE STUDY

The proposed solution, based on the application of a criterion that takes into account distance lengths in subgraphs, was tested on the problem of placing pressure sensors in the water supply networks ZJ and D-Town of the EPANET hydraulic modeling system (figure 8).

ZJ is a network with 114 nodes and 164 pipes; D-Town has 407 nodes and 459 pipes. The nodes (consumers) of the considered networks are characterized by pressure determinism estimates, and the edges (pipelines) by lengths L_j. After installing the next sensor in the network, the determinism estimates are recalculated by the formulas:

    P_s = 1,    (13)

    P_i = max(P_y · α_1 · α_2² · f(L_{i,y})),    (14)

where P_s is the estimate of the determinism of the pressure value in the node where the sensor is installed, P_y is the estimate of the determinism of the pressure value of a neighboring node, α_1 is the estimated error of determining the specific resistance of the pipeline, α_2 is the estimated error of determining the water consumption values, and f(L_{i,y}) is a function of the length of the pipeline section to the next node.

The task is to arrange the sensors in the nodes in such a way as to provide a minimum of the average value of the estimate of the pressure uncertainty in the network (9).
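The recalculation of the determinism estimates in (7)-(8) and (13)-(14) can be read as a fixed-point propagation from the sensor nodes outward. The sketch below is a simplified illustration of that reading, with f taken as 1/W as in (8); the toy network and all names are assumptions, not from the paper:

```python
# Toy illustration: propagate determinism estimates from sensor nodes over a
# weighted graph, assuming eq. (8): P_s = 1 at sensors, and for every other
# node P_i = max over neighbors y of P_y / W_iy, iterated to a fixed point.
edges = {(1, 2): 2.0, (2, 3): 4.0, (3, 4): 2.0, (2, 4): 8.0}

def neighbors(i):
    for (u, v), w in edges.items():
        if u == i:
            yield v, w
        elif v == i:
            yield u, w

def propagate(nodes, sensors, n_iter=20):
    P = {i: 1.0 if i in sensors else 0.0 for i in nodes}
    for _ in range(n_iter):
        for i in nodes:
            if i not in sensors:
                P[i] = max(P[y] / w for y, w in neighbors(i))
    return P

nodes = [1, 2, 3, 4]
P = propagate(nodes, sensors={1})
F = 1 - sum(P.values()) / len(P)   # uncertainty objective, eq. (9)
print(P, round(F, 3))
```

Here determinism decays multiplicatively with edge weight along the best path from a sensor, so placing sensors at the centers of tightly grouped subgraphs (what Dcut encourages) keeps the average determinism high and F low.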





TABLE IV. PERFORMANCE INDICATORS OF THE ALGORITHMS (D-TOWN NETWORK) WITH DIFFERENT CRITERIA (RCUT, NCUT, DCUT)

    Indicator     TE      Gr      SCr     SCn     SCd
    F             0.535   0.591   0.561   0.542   0.539
    Σ Iter        8140    210     20      20      20
    T, min        188.5   4.6     6.5     6.6     6.7
    1 − δ̄, %      100.0   89.5    95.2    98.7    99.3
    max(δ), %     0.0     17.4    9.8     7.3     3.2

The solution obtained by the trial-and-error algorithm was chosen as the reference. The main problem of this algorithm is the required time and computational resources (1140 and 8140 iterations take 18.9 and 188.5 minutes, respectively). The fastest (0.6 and 4.6 minutes) and at the same time the least accurate is the greedy algorithm (96.8% and 89.5%). The algorithms based on spectral clustering showed results close to the reference at low computational cost (about 1.7-1.8 and 6.5-6.7 minutes). The Rcut criterion provides a solution with an accuracy of 100.1% and 95.2%, and the Ncut criterion with 100.2% and 98.7%. The best results among the spectral clustering methods are obtained with the Dcut criterion: 100.2% and 99.3% relative to the results of the trial-and-error algorithm.
                                                                              Thus, the application of the proposed Dcut distance
                                                                          minimization criterion for graph decomposition allowed us
                                                                          to solve the problem of placing objects in the network with a
                                      b)                                  high quality result, comparable and even better than spectral
Fig. 8. Water supply networks: a. ZJ, b. D-Π’own.                          methods based on Rcut or Ncut criteria, which confirms the
                                                                          applicability of the Dcut criterion in spectral methods of
    For the ZJ network, options for installing sensors in the             graph clustering.
number from 1 to 10 are considered, for the D-Π’own
network - from 1 to 20. Nodes with the highest centrality in                                      V. CONCLUSION
the group are selected as sensor placement vertices.
                                                                              This article offers a look at the problems of solving the
    Solutions are considered: trial and error (TE), greedy                problems of placing objects in the network in the context of
algorithm (Gr), algorithms based on spectral clustering (SC).             finding a solution in pre-defined subdomains obtained using
Algorithms based on spectral clustering (SC) are considered               the tools of the theory of spectral clustering of graphs by the
in the context of using various criteria: SCr - spectral                  criterion of minimizing distances in the desired subgraphs.
clustering by Rcut criterion, SCn - spectral clustering by Ncut           The analysis of sources for solving the problem of placement
criterion, SCd - spectral clustering by Dcut criterion. The               in graphs, as well as methods of decomposition of graph
criteria to assess the effectiveness of the algorithms: 1)                structures are given. It is shown that many combinatorial
average uncertainty estimates (F), 2) number of iterations                problems on graphs can be solved with acceptable accuracy
(βˆ‘ πΌπ‘‘π‘’π‘Ÿ), 3) elapsed time (T), 4) accuracy rate (1 βˆ’ 𝛿̅), where           and in a short time, performing a search not in the entire set
Ξ΄ is the relative error between the results of the considered             of graph elements, but on local sets grouped by a certain
algorithm and the algorithm of trial-and-error, 5) the highest            criterion. The proven theory of spectral clustering of graphs
relative error (max(Ξ΄)).                                                  is proposed as a decomposition tool. Following the main
                                                                          provisions of this theory, the disadvantages of the applied
    Tables 3 and 4 show the results of calculating the                    criteria for splitting Rcut and Ncut are indicated. It is shown
performance indicators of the algorithms. The best accuracy               that the application of the Dcut distance minimization
scores are shown in bold.                                                 criterion proposed in this paper allows us to obtain good
                                                                          results when decomposing a graph. The obtained results are
    TABLE III. PERFORMANCE INDICATORS OF ALGORITHMS (ZJ                   based on the examples of searching for sensor placement
     NETWORK)WITH DIFFERENT CRITERIAS (RCUT, NCUT, DCUT)
                                                                          vertices in the known ZJ and D-Π’own networks of the
   Indicator        TE         Gr          SCr      SCn     SCd           EPANET hydraulic modeling system.
       𝑭          0.571      0.590        0.572    0.570   0.570             Further work will be aimed at studying the possibilities
    βˆ‘ 𝑰𝒕𝒆𝒓        1140       55           10       10      10
                                                                          of applying the Dcut criterion in solving various
                                                                          combinatorial problems on graphs using spectral clustering
    Π’, ΠΌΠΈΠ½        18.9       0.6          1.7      1.8     1.8            methods.
   πŸβˆ’Μ…
     𝜹, %         100.0      96.8         100.1    100.2   100.2
                                                                                                 ACKNOWLEDGMENT
   max(𝜹),%       0.0        7.6          1.1      1.0     1.0
                                                                             Authors thank all colleagues from The Department of
                                                                          Automated Information Processing Systems & Control of
                                                                          KNRTU named after A.N. Tupolev.


