       Combined Method for Effective Clustering based
          on Parallel SOM and Spectral Clustering

        Lukáš Vojáček, Jan Martinovič, Kateřina Slaninová,
                 Pavla Dráždilová, and Jiří Dvorský

   Department of Computer Science, FEI, VŠB – Technical University of Ostrava,
          17. listopadu 15, 708 33, Ostrava-Poruba, Czech Republic
      lukas.vojacek@vsb.cz, jan.martinovic@vsb.cz, slaninova@opf.slu.cz,
                 pavla.drazdilova@vsb.cz, jiri.dvorsky@vsb.cz


          Abstract. This paper addresses the problem of clustering large,
          high-dimensional datasets. We propose a two-phase combined method
          that copes with high dimensionality while exploiting a standard
          clustering algorithm. The first phase is a learning phase using an
          artificial neural network, specifically a Self-organizing map (SOM),
          which we consider a suitable method for reducing the complexity of
          the problem. Because the learning phase of artificial neural networks
          can be time-consuming (especially for large high-dimensional
          datasets), we accelerate it by parallelization to improve the
          computational efficiency. The second phase of the proposed method
          performs the clustering itself. Since the visualization provided by
          Self-organizing maps depends on the map dimension and is not always
          clear and comprehensible for clustering applications, we use a
          spectral clustering algorithm to obtain the final clusters. According
          to our results, the proposed combined method is sufficiently fast and
          quite accurate.


   1    Theoretical Background

Artificial neural networks (ANNs) are mathematical models inspired by the
structure and functionality of biological neural systems, capable of parallel
and distributed computation [11]. An ANN can be thought of as a non-linear
statistical data modeling tool for finding and visualizing complex relationships
between the input data collections and the output map. Moreover, an ANN can be
used as an adaptive system that is able to change its structure during the
learning phase in response to external input or internal information in the
network.
    The model typically consists of interconnected groups of neurons organized
in layers. The basic ANN has three layers (the input neurons, a second layer of
neurons, and the output layer of neurons). The layers are interconnected through
synapses, which have assigned weights used in the calculation of the network
function f(x). The network function is then defined as a composition of the
functions of the neurons on each layer of the network.




    A typical feature of an ANN is its ability to learn. Given a class of
functions F, learning is the process of finding the optimal solution f∗ ∈ F for
a specific task, using a set of observations. The efficiency is measured by a
cost function C : F → R such that for the optimal solution f∗, C(f∗) ≤ C(f) for
all f ∈ F.
    Two basic approaches to the learning phase of an ANN are known: supervised
learning and unsupervised learning, together with their extensions and
combinations. Supervised learning is the approach where the network function is
inferred from a supervised training data collection. The set of training
examples consists of pairs of input vectors and the appropriate output values.
The network function is then typically used for pattern recognition or
classification (for discrete output) or for regression (for continuous output).
As an example of a commonly used supervised algorithm, we can mention the
multilayer Perceptron with the Backpropagation method.
    The unsupervised learning approach is used for solving problems oriented to
the discovery and determination of the data structure. Commonly used algorithms
include Self-organizing maps (SOM) and their extensions. ANNs with unsupervised
learning are commonly used for tasks like clustering, estimation of statistical
distributions, filtering, or compression.

1.1   Self Organizing Maps
The Self-organizing map (SOM), also called the Kohonen map, is a type of
artificial neural network introduced by professor Teuvo Kohonen in 1982 [16].
The input space of the training samples is represented in a low-dimensional
(often two-dimensional) space, called a map. The model is capable of projecting
a high-dimensional space onto a lower-dimensional one [19] and is efficient for
structure visualization, because it preserves the topology of the input space
through its neighborhood function. The obtained low-dimensional map is often
used for pattern detection, clustering, or for the characterization and analysis
of the input space.
    The SOM technique has been applied in many areas such as speech recognition
[21, 8], image classification [15, 2], document clustering [14, 9], etc. A
detailed description of SOM applications is provided in [8].
    The SOM model consists of two layers of nodes: the input layer, which
receives and transmits the input information, and the output layer, called the
map, which represents the output characteristics. The output layer is commonly
organized as a two-dimensional grid of nodes, but extensions such as a hexagonal
grid are also known. The two layers are feed-forward connected. It is known that
maps with a smaller output-layer grid behave similarly to K-means clustering
[1]. Larger output maps are able to describe the topological characteristics of
the input data collection (often using the U-matrix for the interpretation of
the distances between the nodes).
    The SOM input layer is given by input vectors x ∈ Rn. Each output node k ∈
{1, . . . , K} is connected to the input layer by means of a weight (reference)
vector wk ∈ Rn. During the learning phase, the weight vectors wk(t) are computed
for the network at time t, where t = 0, 1, . . . is a discrete time index for
each input vector x(t). A complete pass of the input vectors through the network
is called an epoch. The learning (training) phase is performed through
competitive learning: for each training example x(t), the similarity to all
weight vectors wk(t) is computed. The output neuron with the most similar vector
is called the best matching unit (BMU). The weights of the winning neuron and of
the neurons in its closest neighborhood are then updated and adjusted towards
the given input vector. The weight vectors are commonly initialized randomly, or
by using other data mining methods. The concrete implementation of the SOM
depends on the method used for updating the weight vectors during the training
phase.
    SOM networks are especially suitable for the presentation of hidden
knowledge. Both the structure of the data clusters and query results can be
easily visualized. For an overall view of the learned data, we use the so-called
Unified distance matrix (U-matrix), which records the values within clusters and
on cluster boundaries. Each value is assigned to the neuron that wins the
competition for it, and the distances between neighbouring neurons are recorded
as grayness levels. Darker colors usually mean a greater distance. On the other
hand, close data can be colored with similar colors; in this case the boundary
between clusters appears as a steep change in color hue.
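
    For illustration, the following minimal sketch (ours, not from the paper;
the function name, the rectangular layout, and the 4-connected neighborhood are
assumptions) computes such a U-matrix from the weight vectors of a SOM:

    import numpy as np

    def u_matrix(weights):
        # weights: (rows, cols, dim) array of SOM weight vectors.
        # For every node we record the mean Euclidean distance to its
        # 4-connected lattice neighbours; large values (dark colors)
        # mark cluster boundaries, small values cluster interiors.
        rows, cols, _ = weights.shape
        u = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                dists = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
                u[r, c] = np.mean(dists)
        return u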

1.2   Spectral Clustering
The spectral clustering algorithm uses the eigenvalues and eigenvectors of a
similarity matrix derived from the data set to find the clusters.
    Given a set of data points {x1, . . . , xn} ⊂ Rl, let aij ≥ 0 denote the
similarity (cosine measure) between the data points xi and xj.
    Let G = (V, E) be an undirected graph with vertex set V = {v1, . . . , vn}.
Each vertex vi in this graph represents the data point xi. Two vertices are
connected if the similarity aij between the corresponding data points xi and xj
is positive, and the edge is weighted by aij. The weighted adjacency matrix of
the graph is the matrix A = (aij), i, j = 1, . . . , n. If aij = 0, then
(vi, vj) ∉ E(G). Since the graph is undirected, A is symmetric. The degree of a
vertex vi ∈ V is defined as di = Σ_{j=1}^{n} aij. The degree matrix D is defined
as the diagonal matrix with the degrees d1, . . . , dn on the diagonal. The
unnormalized graph Laplacian matrix is defined as L = D − A. In [6], Fiedler
defines the second smallest eigenvalue a(G) of the Laplacian matrix L(G) as the
algebraic connectivity of the graph G. In his honor, the corresponding
eigenvector is called the Fiedler vector. The Spectral Partitioning Algorithm,
which uses the Fiedler vector, is summarized in [22]. Other properties of the
algebraic connectivity are given in [7].
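
    As a minimal illustration (ours, not from the cited works), the algebraic
connectivity and the Fiedler vector of a graph given by a dense similarity
matrix can be obtained with NumPy as follows:

    import numpy as np

    def fiedler_vector(A):
        # A: symmetric weighted adjacency matrix with a_ij >= 0
        D = np.diag(A.sum(axis=1))            # degree matrix
        L = D - A                             # unnormalized graph Laplacian
        eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
        return eigvals[1], eigvecs[:, 1]      # a(G) and the Fiedler vector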
    A survey of data clustering relevant to the clustering of document
collections is published in [12], while an analysis of spectral clustering is
described in [13]. Kannan et al. [13] developed a natural bicriteria measure for
assessing the quality of a clustering and studied how to use the spectral
algorithm. A practical implementation of the clustering algorithm is presented
by Cheng et al. [3].
In [5], Ding et al. proposed a new graph partitioning method based on the
min-max clustering principle: the similarity between two subgraphs (the cut set)
is minimized, while the similarity within each subgraph (the sum of the
similarities between all pairs of nodes within the subgraph) is maximized. Shi
and Malik [23] treated image segmentation as a graph partitioning problem and
proposed a global criterion, the normalized cut, for segmenting the graph. They
showed that an efficient computational technique based on a generalized
eigenvalue problem can be used to optimize this criterion. A recursive algorithm
is used in [4], where Dasgupta et al. analyzed the second eigenvector technique
of spectral partitioning on the planted partition random graph model by
constructing a recursive algorithm.


2     SOM Partitioning

2.1   SOM Algorithms

Several variants of the SOM algorithm are known [17, 20]. Depending on the
implementation, a serial or a parallel version of the algorithm can be used.


Serial SOM Algorithms The standard on-line SOM can be considered the
conventional variant of the serial algorithm.
    The on-line SOM algorithm is the conventional method, where the weight
vectors wk(t) are updated recursively during the training phase for each input
vector x(t). The BMU dc is commonly selected by calculating the similarity using
the Euclidean distance:

                        dk(t) = ‖x(t) − wk(t)‖,                         (1)

                        dc(t) ≡ min_k dk(t).                            (2)
The weight vectors are then updated using a learning-rate factor σ(t) and a
neighborhood function hck(t):

                wk(t + 1) = wk(t) + σ(t)hck(t)[x(t) − wk(t)].           (3)

The learning-rate factor σ(t) corrects the weight vectors and is reduced during
the learning phase. Which weight vectors wk(t) are updated is determined by the
neighborhood function hck(t), which depends on the distance between the nodes c
and k. The neighborhood radius typically decreases during the learning phase,
from an initial value (often comparable to the dimension, or half the dimension,
of the lattice) down to a single neuron (one node in the lattice). The standard
Gaussian neighborhood function is commonly used. Several variants of the serial
on-line SOM algorithm have been published to improve its computational
efficiency; as an example, we can mention WEBSOM [18].
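
    A minimal NumPy sketch of one on-line training step following Eqs. (1)–(3)
(our illustration; the lattice coordinates and the Gaussian neighborhood with
radius radius_t are assumptions consistent with the text) might look like this:

    import numpy as np

    def online_som_step(weights, coords, x, sigma_t, radius_t):
        # weights: (K, n) weight vectors w_k(t); coords: (K, 2) lattice
        # positions of the K output nodes; x: (n,) input vector x(t).
        d = np.linalg.norm(x - weights, axis=1)          # Eq. (1)
        c = int(np.argmin(d))                            # Eq. (2): BMU index
        lat = np.linalg.norm(coords - coords[c], axis=1)
        h = np.exp(-lat**2 / (2.0 * radius_t**2))        # Gaussian h_ck(t)
        weights += sigma_t * h[:, None] * (x - weights)  # Eq. (3)
        return c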


Parallel SOM Algorithms Until recently, most conventional algorithms were
designed as sequential. Sequential algorithms were well suited to past
generations of computers, which basically performed operations in a sequential
fashion. With the development of parallel computation, where operations are
performed simultaneously, there is a growing need to redesign serial algorithms
into parallel implementations.
    The parallelization of SOM learning algorithms can be implemented by network
partitioning, i.e., the neural network is partitioned among the processors. Each
input data sample is then processed by its assigned processor or parallel task.
Network partitioning has been implemented by several authors [10, 24].
    The main principle of our parallel implementation is based on the division
of the neural network into parts, where each part is assigned to one processor.
This division is shown in Fig. 1 for a map of 5 × 4 nodes, divided into 3 parts
associated with 3 processors. It is not always possible to divide the map into
identical parts; in these cases, the neural network (map) is divided using the
principle that the parts of the network differ by at most one neuron.

[Fig. 1. SOM Map Division]

    The training phase is based on the serial version of the SOM algorithm.
Each process finds its own BMU in its part of the map; this node is then
compared with the BMUs obtained by the other processes. The information about
the BMU of the whole network is then transmitted to all processes, which update
the weights of the appropriate nodes in their parts of the network according to
this information.
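
    The following sketch (our illustration; it simulates the processes with a
list of map parts instead of real MPI ranks) shows the core of one such step: a
local BMU search per partition, followed by a global reduction (with MPI this
would be an all-reduce with a minimum-location operation):

    import numpy as np

    def global_bmu(parts, x):
        # parts: list of (K_p, n) weight arrays, one per simulated process
        # x: (n,) input vector
        local = []                      # each process finds its local BMU ...
        for p, w in enumerate(parts):
            d = np.linalg.norm(x - w, axis=1)
            k = int(np.argmin(d))
            local.append((d[k], p, k))
        # ... and the local BMUs are reduced to the BMU of the whole network,
        # which is then broadcast back to all processes
        _, p_best, k_best = min(local)
        return p_best, k_best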

2.2   Spectral Clustering
Our method addresses the problem of effective clustering of large,
high-dimensional datasets. For this reason, we had to find a suitable
(sufficiently adequate) method for reducing the complexity of the problem, and
we decided to use the SOM method. Consider a training set of n-dimensional
objects O = {oi; oi ∈ Rn, i = 1, . . . , m}, where |O| = m. SOM allows us to
transform the original problem of dimension m × n into a 2-dimensional matrix A
of dimension l × l, represented by the SOM map, where l ≪ max(m, n). Our
motivation was to facilitate tasks with objects of high dimension.
    Because the visualization provided by SOM depends on the map dimension and
is not always clear and comprehensible for clustering applications, we decided
to use an appropriate clustering algorithm on top of it. As the SOM map can be
viewed as a graph, and similarity measures can be derived from the node weights,
we decided to use the spectral clustering method for dividing the SOM map into
clusters of close nodes. In other words, we transformed the SOM map into a
similarity matrix of the individual nodes.
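
    The paper does not spell out this construction in code; one possible sketch
(ours; the lattice-neighbourhood radius and the clipping of negative
similarities to satisfy aij ≥ 0 are assumptions) builds the matrix from the
cosine similarity of neighbouring node weights:

    import numpy as np

    def som_similarity_matrix(weights, coords, radius=1.5):
        # weights: (K, n) SOM node weight vectors; coords: (K, 2) lattice
        # positions. Only lattice neighbours within `radius` are connected;
        # the edge weight is the cosine similarity of their weight vectors.
        K = weights.shape[0]
        A = np.zeros((K, K))
        norms = np.linalg.norm(weights, axis=1)
        for i in range(K):
            for j in range(i + 1, K):
                if np.linalg.norm(coords[i] - coords[j]) <= radius:
                    sim = weights[i] @ weights[j] / (norms[i] * norms[j] + 1e-12)
                    A[i, j] = A[j, i] = max(sim, 0.0)
        return A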
    For each graph G represented by the similarity matrix, the second smallest
eigenvalue λ2 of the Laplacian matrix gives the algebraic connectivity a(G) [6].
This value is a lower bound for the edge and vertex connectivity of the graph
[7]. We use this knowledge when computing the eigenvector corresponding to this
second smallest eigenvalue (the algebraic connectivity).
    By the recursive application of a minimal cut to the graph, we obtain a
sequence of clusters. For each connected subgraph Gi we obtain its algebraic
connectivity a(Gi). The graph division is finished when a(Gi) drops below a
chosen threshold, as presented in Algorithm 1.


Algorithm 1 Application of the Spectral Clustering Method to SOM
 1. Construct the SOM map.
 2. Consider the graph G1 = (V, E), which is given by the similarity matrix
    A(G1) of dimension l × l derived from the SOM map.
 3. Construct the Laplacian matrix L(G1) = D(G1) − A(G1), where
    dii = Σ_{j=1}^{l} aij.
 4. Compute a(G1) and u(G1), where the algebraic connectivity a(G1) = λ2(G1) and
    u(G1) is the corresponding eigenvector (the Fiedler vector).
 5. Divide the graph Gj into the subgraphs G′j+1 = {vi ∈ Vj, where ui ≤ 0} and
    G″j+1 = {vi ∈ Vj, where ui > 0}. It is possible that such a subgraph is not
    connected; in that case, find all connected components, which form the
    subgraphs G(i)j+1.
 6. For each subgraph, compute a(G(i)j+1) and the corresponding eigenvector
    u(G(i)j+1).
 7. If a(G(i)j+1) < θ, where θ ∈ R is the threshold for the algebraic
    connectivity, do not divide this subgraph further; otherwise continue with
    step 5.
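
    A compact sketch of this recursion (ours; it reuses the fiedler_vector
sketch from Sect. 1.2 and SciPy's connected_components for step 5, with theta
playing the role of the threshold θ) could look as follows:

    import numpy as np
    from scipy.sparse.csgraph import connected_components

    def spectral_divide(A, nodes, theta):
        # A: similarity matrix of the current (sub)graph; nodes: array of
        # node indices it covers; theta: threshold on a(G) (step 7).
        if len(nodes) < 2:
            return [nodes]
        a_G, u = fiedler_vector(A)               # steps 4 and 6
        if a_G < theta:                          # step 7: stop dividing
            return [nodes]
        clusters = []
        for mask in (u <= 0, u > 0):             # step 5: cut by Fiedler sign
            sub = A[np.ix_(mask, mask)]
            # a half may itself be disconnected: split it into components
            n_comp, labels = connected_components(sub, directed=False)
            for comp in range(n_comp):
                cm = labels == comp
                clusters += spectral_divide(sub[np.ix_(cm, cm)],
                                            nodes[mask][cm], theta)
        return clusters

    # usage (theta is an illustrative value):
    # clusters = spectral_divide(A, np.arange(A.shape[0]), theta=0.1)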




3     Experiments
The experiments were divided into two phases, according to the phases of the
proposed algorithm. The first phase of the experiments was oriented to the
acceleration of the SOM algorithm. As mentioned above, we tested the parallel
implementation of the SOM training phase for various dimensions of the SOM map
and of the input vector.
    The second phase of the experiments was related to the division of the SOM
map using spectral clustering. Both phases are documented by the obtained
results and the appropriate figures; see below.

3.1   SOM Acceleration
All the experiments were performed on a Windows HPC Server 2008 cluster with 6
computing nodes, where each node has 8 processors and 12 GB of memory.


   The first experiment was performed on a training set of 300 two-dimensional
samples, while the dimension of the neural network (SOM map) was varied. The
results are presented in Table 1, where the records marked with an asterisk (*)
were computed on a single computing node only. In these cases there is no
network communication between the processes, and the computation is therefore
faster.


Table 1. Computing Time Dependence on Map Dimension and Number of Processors

      Processors                      Computing Time [hh:mm:ss]
                       Map Dimension        Map Dimension    Map Dimension
                         100 × 100            300 × 300        500 × 500
          1*                0:01:20            0:12:57            0:35:57
          8*                0:00:16            0:01:56            0:21:43
          16                0:00:14            0:01:06            0:03:18
          24                0:00:13            0:00:50            0:02:05
          32                0:00:15            0:00:48            0:01:37
          40                0:00:15            0:00:38            0:01:21



    The second experiment was performed on a map with a fixed dimension of
100 × 100 nodes, while the input vector dimension of the training data set was
varied. A training set of 150 records was used. The results are presented in
Table 2, where the records marked with an asterisk (*) were computed on a single
computing node only. In these cases there is no network communication between
the processes; the computation is therefore faster.


Table 2. Computing Time Dependence on Input Vector Dimension and Number of
Processors

      Processors                      Computing Time [hh:mm:ss]
                       Input Vector Dimension 4     Input Vector Dimension 8
          1*                      0:06:29                   0:12:08
          8*                      0:01:01                   0:01:48
          16                      0:00:36                   0:01:00
          24                      0:00:27                   0:00:44
          32                      0:00:25                   0:00:38
          40                      0:00:23                   0:00:34



    As we can see from Tables 1 and 2, the acceleration of the SOM algorithm is
appreciable. With a growing number of processors, the computational
effectiveness increases and the computation time is reduced substantially.


3.2    SOM Map Division

The next phase of the experiments was oriented to testing the proposed algorithm
for the SOM map division using spectral clustering, concretely with the Fiedler
vector. We used three training data collections, called TwoDiamonds, Lsun, and
Hepta, from the Fundamental Clustering Problems Suite (FCPS,
http://www.uni-marburg.de/fb12/datenbionik/data?language_sync=1). A short
description of the selected datasets used in our experiments is given in
Table 3.


        Table 3. Fundamental Clustering Problems Suite – selected datasets

      Name           #Cases   #Vars   #Clusters   Main Clustering Problem
      Hepta           212      3         7       different densities in clusters
      Lsun            400      2         3       different variances in clusters
      TwoDiamonds     800      2         2       touching clusters



    For a comprehensible interpretation, we used the U-matrix and its 3D
visualization of the SOM map obtained after the dimension reduction of the input
dataset. The visualization of the first input data set, TwoDiamonds, the SOM map
after the first phase of the proposed algorithm, and its division by spectral
clustering are presented in Fig. 2. The U-matrix for this dataset, presented in
Figs. 2(b) and 2(c), accurately matches the division provided by the Fiedler
vector in Fig. 2(d).
    The same situation occurs for the second input data set, Lsun; for its
visualization see Fig. 3(a). As we can see from Figs. 3(b) and 3(c), the
division provided by the SOM and the Fiedler vector, Fig. 3(d), matches as
accurately as in the experiment with the first dataset.
    The visualization of the third input data set, Hepta, the SOM map after the
first phase of the proposed algorithm, and its division by spectral clustering
are presented in Fig. 4. From Figs. 4(b) and 4(c) we can see that the division
in Fig. 4(d) does not correspond as closely as in both previous experiments. The
experiment with the Hepta dataset demonstrates that some edges are determined
differently by spectral clustering than in the U-matrix, but both results are
correct.


[Fig. 2. Fundamental Clustering Problems Suite – TwoDiamonds (n = 800, dimension = 2, classes = 2): (a) cluster visualization, (b) U-matrix 3D, (c) U-matrix, (d) SOM division using the Fiedler vector.]
[Fig. 3. Fundamental Clustering Problems Suite – Lsun (n = 400, dimension = 2, classes = 3): (a) cluster visualization, (b) U-matrix 3D, (c) U-matrix, (d) SOM division using the Fiedler vector.]
[Fig. 4. Fundamental Clustering Problems Suite – Hepta (n = 212, dimension = 3, classes = 7): (a) cluster visualization, (b) U-matrix 3D, (c) U-matrix, (d) SOM division using the Fiedler vector.]

4    Conclusion

In this paper we presented a parallel implementation of the SOM neural network
algorithm. The parallel implementation was tested on an HPC cluster containing
6 nodes and 40 processor cores. The achieved speed-up was very good.
    Moreover, the partitioning of the resulting SOM map using the spectral
clustering method was presented. The spectral clustering was applied to the
U-matrix. This method automatically detected the parts of the SOM that represent
the clusters in the original high-dimensional data. The detected clusters
correspond to the clusters perceived by a human being.
    In future work we intend to focus on a more precise specification of the
threshold for the algebraic connectivity in the spectral clustering.


Acknowledgment

This work was partially supported by the Grant Agency of the Czech Republic
under grant No. 205/09/1079, and by SGS, VŠB – Technical University of Ostrava,
Czech Republic, under grant No. SP2011/172.


References

 1. F. Bacao, V. Lobo, and M. Painho. Self-organizing maps as substitutes for k-
    means clustering. In Computational Science - ICCS 2005, Pt. 3, Lecture Notes in
    Computer Science, pages 209–217, 2005.
 2. H. Bekel, G. Heidemann, and H. Ritter. Interactive image data labeling using
    self-organizing maps in an augmented reality scenario. Neural Networks, 18(5-
    6):566–574, June-July 2005.
 3. D. Cheng, R. Kannan, S. Vempala, and G. Wang. On a recursive spectral algorithm
    for clustering from pairwise similarities. Technical report, MIT, 2003.
 4. A. Dasgupta, J. Hopcroft, R. Kannan, and P. Mitra. Spectral clustering by re-
    cursive partitioning. In ESA’06: Proceedings of the 14th conference on Annual
    European Symposium, pages 256–267. Springer-Verlag, 2006.
 5. C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm
    for graph partitioning and data clustering. In ICDM ’01: Proceedings of the 2001
    IEEE International Conference on Data Mining, pages 107–114, Washington, DC,
    USA, 2001. IEEE Computer Society.
 6. M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical
    Journal, 23(98):298–305, 1973.
 7. M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its
    application to graph theory. Czechoslovak Mathematical Journal, pages 619–633,
    1975.
 8. B. Gas, M. Chetouani, J.-L. Zarader, and C. Charbuillet. Predictive kohonen
    map for speech features extraction. In W. Duch, J. Kacprzyk, E. Oja, and
    S. Zadrozny, editors, Artificial Neural Networks: Formal Models and Their Ap-
    plications - ICANN 2005, volume 3697 of Lecture Notes in Computer Science,
    pages 793–798. Springer Berlin / Heidelberg, 2005.
 9. A. Georgakis and H. Li. An ensemble of som networks for document organization
    and retrieval. In Proceedings of AKRR’05, International and Interdisciplinary
    Conference on Adaptive Knowledge Representation and Reasoning, pages 141–147,
    2005.
10. W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming
    with the Message-Passing Interface. MIT Press, 1999.
11. S. Haykin. Neural Networks. A Comprehensive Foundation. Macmillan, New York,
    1994.


12. D. Húsek, J. Pokorný, H. Řezánková, and V. Snášel. Data clustering: From docu-
    ments to the web. In Web Data Management Practices: Emerging Techniques and
    Technologies, chapter Data Clustering: From Documents to the Web, pages 1–33.
    Idea Group Inc, 2006.
13. R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. J.
    ACM, 51(3):497–515, May 2004.
14. K. Kishida. Techniques of document clustering: A review. Library and Information
    Science, 35(1):106–120, January 2005.
15. O. Kohonen, T. Jaaskelainen, M. Hauta-Kasari, J. Parkkinen, and K. Miyazawa.
    Organizing spectral image database using self-organizing maps. Journal of Imaging
    Science and Technology, 49(4):431–441, July-August 2005.
16. T. Kohonen. Self-Organization and Associative Memory, volume 8 of Springer
    Series in Information Sciences. Springer, Berlin, Heidelberg, 1984. 3rd ed. 1989.
17. T. Kohonen. Things you haven’t heard about the self-organizing map. In Proc.
    ICNN’93, International Conference on Neural Networks, pages 1147–1156, Piscat-
    away, NJ, 1993. IEEE, IEEE Service Center.
18. T. Kohonen. Exploration of very large databases by self-organizing maps. In
    Proceedings of ICNN’97, International Conference on Neural Networks, pages PL1–
    PL6. IEEE Service Center, Piscataway, NJ, 1997.
19. T. Kohonen. Self Organizing Maps. Springer-Verlag, 3rd edition, 2001.
20. R. D. Lawrence, G. S. Almasi, and H. E. Rushmeier. A scalable parallel
    algorithm for self-organizing maps with applications to sparse data mining
    problems. Data Mining and Knowledge Discovery, 3:171–195, 1999.
21. S. Meenakshisundaram, W. L. Woo, and S. S. Dlay. Generalization issues in
    multiclass classification – new framework using mixture of experts. WSEAS
    Transactions on Information Science and Applications, 4:1676–1681, Dec 2004.
22. A. Pothen, H. D. Simon, and K.-P. Liou. Partitioning sparse matrices with
    eigenvectors of graphs. SIAM J. Matrix Anal. Appl., 11(3):430–452, July 1990.
23. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE
    Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905,
    2000.
24. C. H. Wu, R. E. Hodges, and C. J. Wang. Parallelizing the self-organizing feature
    map on multiprocessor systems. Parallel Computing, 17(6–7):821–832, September
    1991.